
Bruce K. Driver

Math 280 (Probability Theory) Lecture Notes

June 10, 2010 File: prob.tex
Contents
Part Homework Problems
-3 Math 280A Homework Problems Fall 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
-3.1 Homework 1. Due Wednesday, September 30, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
-3.2 Homework 2. Due Wednesday, October 7, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
-3.3 Homework 3. Due Wednesday, October 21, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
-3.4 Homework 4. Due Wednesday, October 28, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
-3.5 Homework 5. Due Wednesday, November 4, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
-3.6 Homework 6. Due Wednesday, November 18, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
-3.7 Homework 7. Due Wednesday, November 25, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
-3.8 Homework 8. Due Monday, December 7, 2009 by 11:00AM (Put under my office door if I am not in.) . . . . . . . . . . . . . . . . . . . . . . . . 3
-2 Math 280B Homework Problems Winter 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.1 Homework 1. Due Wednesday, January 13, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.2 Homework 2. Due Wednesday, January 20, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.3 Homework 3. Due Wednesday, January 27, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.4 Homework 4. Due Wednesday, February 3, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.5 Homework 5. Due Wednesday, February 10, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.6 Homework 6. Due Friday, February 19, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.7 Homework 7. Due Monday, March 1, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.8 Homework 8. Due Monday, March 8, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
-2.9 Homework 9. (Not) Due Monday, March 15, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
-1 Math 280C Homework Problems Spring 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-1.1 Homework 1. Due Wednesday, April 7, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-1.2 Homework 2. Due Wednesday, April 14, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-1.3 Homework 3. Due Wednesday April 21, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-1.4 Homework 4. Due Wednesday April 28, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-1.5 Homework 5. Due Wednesday May 5, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-1.6 Homework 6. Due Wednesday May 12, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-1.7 Homework 7. Due Wednesday May 19, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
-1.8 Homework 8. Due Wednesday June 2, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
0 Math 286 Homework Problems Spring 2008 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Part I Background Material
1 Limsups, Liminfs and Extended Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2 Basic Probabilistic Notions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Part II Formal Development
3 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1 Set Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 Algebraic sub-structures of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4 Finitely Additive Measures / Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1 Examples of Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 Simple Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.1 The algebraic structure of simple functions* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3 Simple Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1 Appendix: Bonferroni Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.2 Appendix: Riemann-Stieltjes integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 Simple Independence and the Weak Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4.1 Complex Weierstrass Approximation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.2 Product Measures and Fubini's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.5 Simple Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.6 Appendix: A Multi-dimensional Weierstrass Approximation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5 Countably Additive Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2 π – λ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2.1 A Density Result* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Construction of Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.4 Radon Measures on ℝ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.4.1 Lebesgue Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5 A Discrete Kolmogorov's Extension Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.6 Appendix: Regularity and Uniqueness Results* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Page: 4 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
5.7 Appendix: Completions of Measure Spaces* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.8 Appendix Monotone Class Theorems* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.1 Measurable Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2 Factoring Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Summary of Measurability Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.4 Distributions / Laws of Random Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.5 Generating All Distributions from the Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7 Integration Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.1 Integrals of positive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2 Integrals of Complex Valued Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2.1 Square Integrable Random Variables and Correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.2.2 Some Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.3 Integration on ℝ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.4 Densities and Change of Variables Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.5 Some Common Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.5.1 Normal (Gaussian) Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.6 Stirling's Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.6.1 Two applications of Stirling's formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.6.2 A primitive Stirling type approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.7 Comparison of the Lebesgue and the Riemann Integral* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.8 Measurability on Complete Measure Spaces* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.9 More Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8 Functional Forms of the π – λ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.1 Multiplicative System Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.3 A Strengthening of the Multiplicative System Theorem* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
8.4 The Bounded Approximation Theorem* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
9 Multiple and Iterated Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9.1 Iterated Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9.2 Tonelli's Theorem and Product Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9.3 Fubini's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
9.4 Fubini's Theorem and Completions* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
9.5 Lebesgue Measure on ℝᵈ and the Change of Variables Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
9.6 The Polar Decomposition of Lebesgue Measure* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
9.7 More Spherical Coordinates* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.8 Gaussian Random Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.8.1 *Gaussian measures with possibly degenerate covariances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9.9 Kolmogorov's Extension Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.9.1 Regularity and compactness results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.9.2 Kolmogorov's Extension Theorem and Infinite Product Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.10 Appendix: Standard Borel Spaces* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
9.11 More Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
10.1 Basic Properties of Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
10.2 Examples of Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
10.2.1 An Example of Ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
10.3 Gaussian Random Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
10.4 Summing independent random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
10.5 A Strong Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
10.6 A Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
10.7 The Second Borel-Cantelli Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
10.8 Kolmogorov and Hewitt-Savage Zero-One Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
10.8.1 Hewitt-Savage Zero-One Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
10.9 Another Construction of Independent Random Variables* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
11 The Standard Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11.1 Poisson Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11.2 Exponential Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
11.2.1 Appendix: More properties of Exponential random Variables* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
11.3 The Standard Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
11.4 Poisson Process Extras* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
12 Lᵖ spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
12.1 Modes of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
12.2 Almost Everywhere and Measure Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
12.3 Jensen's, Hölder's and Minkowski's Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
12.4 Completeness of Lᵖ spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
12.5 Density Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
12.6 Relationships between different Lᵖ spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
12.6.1 Summary: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
12.7 Uniform Integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
12.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
12.9 Appendix: Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
13 Hilbert Space Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
13.1 Compactness Results for L^p Spaces* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
13.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
14 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
14.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
14.2 Additional Properties of Conditional Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
14.3 Construction of Regular Conditional Distributions* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
15 The Radon-Nikodym Theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
16 Some Ergodic Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Part III Stochastic Processes I
17 The Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
17.1 Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
17.2 Discrete Time Homogeneous Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
17.3 Continuous time homogeneous Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
17.4 First Step Analysis and Hitting Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
17.5 Finite state space chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
17.5.1 Invariant distributions and return times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
17.5.2 Some worked examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
17.5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
17.6 Appendix: Kolmogorov's extension theorem II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
17.7 Removing the standard Borel restriction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
17.8 *Appendix: More Probability Kernel Constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
18 (Sub and Super) Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
18.1 (Sub and Super) Martingale Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
18.2 Decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
18.3 Stopping Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
18.4 Stochastic Integrals and Optional Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
18.5 Submartingale Maximal Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
18.6 Submartingale Upcrossing Inequality and Convergence Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
18.7 *Supermartingale inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
18.7.1 Maximal Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
18.7.2 The upcrossing inequality and convergence result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
18.8 Martingale Closure and Regularity Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
18.9 Backwards (Reverse) Submartingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
18.10 Some More Martingale Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
18.10.1 More Random Walk Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
18.11Appendix: Some Alternate Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
19 Some Martingale Examples and Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
19.1 A Polya Urn Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
19.2 Galton Watson Branching Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
19.3 Kakutani's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
Part IV (Weak) Convergence of Random Sums
20 Random Sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
20.1 Weak Laws of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
20.1.1 A WLLN Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
20.2 Kolmogorov's Convergence Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
20.3 The Strong Law of Large Numbers Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
20.3.1 Strong Law of Large Number Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
20.4 Kolmogorov's Three Series Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
20.4.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
20.5 Maximal Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
21 Weak Convergence Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
21.1 Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
21.2 Total Variation Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
21.3 A Coupling Estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
21.4 Weak Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
21.5 Derived Weak Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
21.6 Convergence of Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
21.7 Weak Convergence Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
21.8 Compactness and tightness of measures on (R, B_R) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
21.9 Metric Space Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
21.9.1 A point set topology review. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
21.9.2 Proof of Skorohod's Theorem 21.58 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
21.9.3 Proof of the Portmanteau Theorem 21.59 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
21.9.4 Proof of Prokhorov's compactness Theorem 21.61 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
22 Characteristic Functions (Fourier Transform) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
22.1 Basic Properties of the Characteristic Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
22.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
22.3 Continuity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
22.4 A Fourier Transform Inversion Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
22.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
22.6 Appendix: Bochner's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
22.7 Appendix: Some Calculus Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
23 Weak Convergence of Random Sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
23.1 Lindeberg-Feller CLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
23.2 Infinitely Divisible Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
23.3 Stable Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
23.4 *Appendix: Lévy exponent and Lévy Process facts (Very Preliminary!) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
Part V Stochastic Processes II
24 Gaussian Random Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
24.1 Gaussian Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
24.2 Existence of Gaussian Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
24.3 Gaussian Field Interpretation of Pre-Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
25 Versions and Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
25.1 Kolmogorov's Continuity Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
25.2 Kolmogorov's Tightness Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
25.2.1 Appendix: Alternate Proofs (please ignore) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
26 Brownian Motion I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
26.1 Donsker's Invariance Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
26.2 Path Regularity Properties of BM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
26.3 Scaling Properties of B. M. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
27 Filtrations and Stopping Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
27.1 Measurability Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
27.2 Stopping and optional times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
27.3 Filtration considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
27.3.1 ***More Augmentation Results (This subsection needs serious editing.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
28 Continuous time (sub)martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
28.1 Submartingale Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
28.2 Regularizing a submartingale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
Part VI Markov Processes II
29 The Strong Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
29.1 The denumerable strong Markov property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
29.2 The strong Markov property in continuous time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
29.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
29.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
30 Long Run Behavior of Discrete Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
30.1 The Main Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
30.1.1 More nite state space examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
30.2 The Strong Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
30.3 Irreducible Recurrent Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
31 Brownian Motion II (Markov Property) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
31.1 Some Brownian Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
32 The Feynman-Kac Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
32.1 The Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
32.2 Solving the heat equation on R^n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
32.3 Wiener Measure Heuristics and the Feynman-Kac formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
32.4 Proving the Feynman-Kac Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
32.5 Appendix: Extensions of Theorem 32.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
33 Feller Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
34 *Nelson's Continuity Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
Part VII Continuous Time Markov Chains
35 Basics of continuous time chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
35.1 Construction of continuous time Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
35.2 Markov Properties in more detail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
36 Continuous Time M.C. Finite State Space Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
36.1 Matrix Exponentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
36.2 Characterizing Markov Semi-Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
36.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
37 Jump and Hold Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
37.1 Hitting and Expected Return times and Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
37.2 Long time behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
37.3 Formal Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
38 Continuous Time M.C. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
38.1 Birth and Death Process basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
38.2 Pure Birth Process: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
38.2.1 Infinitesimal description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
38.2.2 Yule Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
38.2.3 Sojourn description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
38.3 Pure Death Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
Page: 10 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
38.3.1 Cable Failure Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
38.3.2 Linear Death Process basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
38.3.3 Linear death process in more detail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
38.4 Birth and Death Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
38.4.1 Linear birth and death process with immigration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
Part VIII Appendices
39 Basic Metric Space Facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
39.1 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
39.2 Completeness in Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
39.3 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
39.4 Function Space Compactness Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
39.5 Supplementary Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
39.5.1 Word of Caution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
39.5.2 Riemannian Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
39.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
Part
Homework Problems
-3
Math 280A Homework Problems Fall 2009
Problems are from Resnick, S., A Probability Path, Birkhäuser, 1999, or from
the lecture notes. The problems from the lecture notes are hyperlinked to their
location.
-3.1 Homework 1. Due Wednesday, September 30, 2009
Read over Chapter 1.
Hand in Exercises 1.1, 1.2, and 1.3.
-3.2 Homework 2. Due Wednesday, October 7, 2009
Look at Resnick, p. 20-27: 9, 12, 17, 19, 27, 30, 36, and Exercise 3.9 from
the lecture notes.
Hand in Resnick, p. 20-27: 5, 18, 23, 40*, 41, and Exercise 4.1 from the
lecture notes.
*Notes on Resnick's #40: (i) B((0, 1]) should be B([0, 1)) in the statement
of this problem, (ii) $k$ is an integer, (iii) $r \ge 2$.
-3.3 Homework 3. Due Wednesday, October 21, 2009
Look at Lecture note Exercises: 4.7, 4.8, 4.9.
Hand in Resnick, p. 63–70: 7* and 13.
Hand in Lecture note Exercises: 4.3, 4.4, 4.5, 4.6, 4.10 – 4.15.
*Hint: For #7 you might label the coupons as $1, 2, \dots, N$ and let $A_i$ be
the event that the collector does not have the $i^{\text{th}}$ coupon after buying $n$
boxes of cereal.
-3.4 Homework 4. Due Wednesday, October 28, 2009
Look at Lecture note Exercises: 5.5, 5.10.
Look at Resnick, p. 63–70: 5, 14, 16, 19.
Hand in Resnick, p. 63–70: 3, 6, 11.
Hand in Lecture note Exercises: 5.6 – 5.9.
-3.5 Homework 5. Due Wednesday, November 4, 2009
Look at Resnick, p. 85–90: 3, 7, 8, 12, 17, 21.
Hand in from Resnick, p. 85–90: 4, 6*, 9, 15, 18**.
*Note: In #6, the random variable $X$ is understood to take values in the
extended real numbers.
**I would write the left side in terms of an expectation.
Look at Lecture note Exercises 6.3, 6.7.
Hand in Lecture note Exercises: 6.4, 6.6, 6.10.
-3.6 Homework 6. Due Wednesday, November 18, 2009
Look at Lecture note Exercises 7.4, 7.9, 7.12, 7.17, 7.18, and 7.27.
Hand in Lecture note Exercises: 7.5, 7.7, 7.8, 7.11, 7.13, 7.14, 7.16.
Look at from Resnick, p. 155–166: 6, 13, 26, 37.
Hand in from Resnick, p. 155–166: 7, 38.
-3.7 Homework 7. Due Wednesday, November 25, 2009
Look at Lecture note Exercises 9.12 – 9.14.
Look at from Resnick 5.10: #18, 19, 20, 22, 31.
Hand in Lecture note Exercises: 8.1, 8.2, 8.3, 8.4, 8.5, 9.4, 9.5, 9.6, 9.7, and
9.9.
Hand in from Resnick 5.10: #9, 29.
See next page!
-3.8 Homework 8. Due Monday, December 7, 2009 by
11:00AM (Put under my office door if I am not in.)
Look at Lecture note Exercise 10.1, 10.2, 10.7, 10.8, 10.10.
Look at from Resnick 4.5: 3, 5, 6, 8, 19, 28, 29.
Look at from Resnick 5.10: #6, 7, 8, 11, 13, 16, 22, 34
Hand in Lecture note Exercises: 9.8, 10.9.
Hand in from Resnick 4.5: 1, 9*, 11, 18, 25. *Exercise 10.10 may be useful
here.
Hand in from Resnick 5.10: #14, 26.
-2
Math 280B Homework Problems Winter 2010
-2.1 Homework 1. Due Wednesday, January 13, 2010
Hand in Lecture note Exercise 11.1, 11.2, 11.3, 11.4, 11.5, 11.6.
Look at from Resnick 5.10: #39
-2.2 Homework 2. Due Wednesday, January 20, 2010
Look at from Resnick 6.7: 3, 4, 14, 15 (Hint: see Corollary 12.9 or use
$|a - b| = 2(a - b)^{+} - (a - b)$), 16, 17, 19, 24, 27, 30.
Look at Lecture note Exercise 12.12
Hand in from Resnick 6.7: 1a, d, 12, 13, 18 (Also assume $EX_n = 0$)*, 33.
Hand in lecture note exercises: 12.1, 12.3.
*For Problem 18, please add the missing assumption that the random
variables should have mean zero. (The assertion to prove is false without
this assumption.) With this assumption, $\operatorname{Var}(X) = E[X^2]$. Also note that
$\operatorname{Cov}(X, Y) = 0$ is equivalent to $E[XY] = EX \cdot EY$.
-2.3 Homework 3. Due Wednesday, January 27, 2010
Look at from Resnick 6.7:
Look at Lecture note Exercises 13.3, 13.5.
Hand in from Resnick 6.7: 5*, 7 (Hint: Observe that $X_n \overset{d}{=} \sqrt{n}\, N(0, 1)$.)
*For one possible proof of #5 it is useful to first show that $\{X_n\}_{n=1}^{\infty}$ is
uniformly integrable (U.I.).
Hand in lecture note exercises: 13.2, 13.4, 13.6.
-2.4 Homework 4. Due Wednesday, February 3, 2010
Look at Resnick Chapter 10: 11.
Hand in lecture note exercises: 10.3, 14.1, 14.2, 14.3, 14.4.
Hand in from Resnick 10.17: 2†, 5*, 7‡, 8**.
†In part 2b, please explain what convention you are using when the denominator is 0.
*A Poisson process, $\{N(t)\}_{t \ge 0}$, with parameter $\lambda$ satisfies (by definition): (i)
$N$ has independent increments, so that $N(s)$ and $N(t) - N(s)$ are independent;
(ii) if $0 \le u < v$ then $N(v) - N(u)$ has the Poisson distribution with parameter
$\lambda(v - u)$.
‡For 7a and 7b it is illuminating to find a formula for $E[g(X_1)\,|\,X_1 + X_2]$.
**Hint: use Exercise 10.3 to first show $\operatorname{Cov}(Y, f(Y)) \ge 0$.
-2.5 Homework 5. Due Wednesday, February 10, 2010
Look at the following Exercises from the Lecture Notes: 12.13, 16.2
Do the following Exercises from the Lecture Notes: 12.14, 12.15, 14.6, 14.8,
14.9
-2.6 Homework 6. Due Friday, February 19, 2010
Look at the following Exercises from the Lecture Notes: 17.5, 17.15, 17.16
Do the following Exercises from the Lecture Notes: 17.1, 17.2, 17.3, 17.4,
17.6, 17.8.
-2.7 Homework 7. Due Monday, March 1, 2010
Do the following Exercises from the Lecture Notes: 17.7, 17.9, 17.10, 17.11,
17.12, 17.13.
-2.8 Homework 8. Due Monday, March 8, 2010
Hand in the following Exercises from the Lecture Notes: 18.1, 18.2, 18.3,
18.5,
Resnick Chapter 10: Hand in 14, 15, 16, 33.
-2.9 Homework 9. (Not) Due Monday, March 15, 2010
The following homework will not be collected but it would certainly be good
if you did the problems. Solutions will appear during finals week.
Hand in the following Exercises from the Lecture Notes: 18.7, 18.18.
Look at the following Exercises from the Lecture Notes: 18.4, 18.6, 18.8.
Resnick Chapter 10.17: Hand in 19. For this problem please define
$X_{n+1}/X_n = Z_{n+1}$ where
\[
Z_{n+1} =
\begin{cases}
X_{n+1}/X_n & \text{if } X_n \neq 0 \\
1 & \text{if } X_n = 0 = X_{n+1} \\
X_{n+1} & \text{if } X_n = 0 \text{ and } X_{n+1} \neq 0.
\end{cases}
\]
-1
Math 280C Homework Problems Spring 2010
-1.1 Homework 1. Due Wednesday, April 7, 2010
Look at from Resnick 10.17: 22.
Look at Lecture note Exercises: 18.13, 18.14, 18.15, 18.16, 18.17, 18.20,
18.22, 18.26, 18.27, 18.31.
Hand in from Resnick 10.17: 18b, 23, 28 (hint: consider $\ln X_n$).
Hand in lecture note exercises: 18.19, 18.21, 18.23, 18.24, 18.25.
-1.2 Homework 2. Due Wednesday, April 14, 2010
Look at from Resnick 10: #25.
Look at Lecture note Exercises: 20.2.
Hand in from Resnick 7: #1, #2, #28*, #42 (#15 was assigned here in
error previously; see Exercise 20.4 on the next homework.)
[*Correction to #28: In the second part of the problem, the condition
$E[X_i X_j] \le \rho(i - j)$ for $i > j$ should be $E[X_i X_j] \le \rho(i - j)$ for $i \ge j$.]
Hand in lecture note exercises: 19.2, 19.3, 19.4, 20.1.
-1.3 Homework 3. Due Wednesday April 21, 2010
Look at from Resnick 7: #12. Hint: let $\{U_n : n = 0, 1, 2, \dots\}$ be i.i.d.
random variables uniformly distributed on (0,1), take $X_0 = U_0$, and then
define $X_n$ inductively so that $X_{n+1} = X_n \cdot U_{n+1}$.
Hand in from Resnick 7: #13, #16, #33, #36 (assume each $X_n$ is integrable!).
Hand in lecture note exercises: 20.3, 20.4
Hints and comments. For Resnick 7.36: it must be assumed that $E[X_n] < \infty$
for each $n$, else there is a subtraction-of-infinities problem. In addition the
conclusion reached in the second part of the problem can fail to be true if
the expectations are infinite. Use the assumptions to bound $E[X_n]$ in terms of
$E[X_n : X_n \le x]$. Then use the two series theorem of Exercise 20.4.
-1.4 Homework 4. Due Wednesday April 28, 2010
Look at lecture note exercises: 21.4, 21.7.
Hand in from Resnick 8.8: #4a-d, #13 (Assume $\sigma_n^2 = \operatorname{Var}(N_n) > 0$ for
all $n$.), #20, #31.
Hand in lecture note exercises: 21.2, 21.3, 21.5, 21.6.
-1.5 Homework 5. Due Wednesday May 5, 2010
Look at from Resnick 8.8: #14, #36.
Resnick Chapter 8: Hand in 8.7* (assume the central limit theorem here),
8.17, 8.30**, 8.34***.
Comments and hints:
1. *In 8.7 you will need to use the central limit theorem along with the
$\delta$-method.
2. **For 8.30, ignore the part of the question referring to the moment generating
function. Hint: use problem 8.31 and the convergence of types theorem.
3. ***For 8.34 use an adaptation of the $\delta$-method. Example 21.46 may be
helpful here as well.
-1.6 Homework 6. Due Wednesday May 12, 2010
Hand in lecture note exercises: 21.9, 21.10, 21.11, 21.12, 22.2, 22.3, 22.4
Resnick Chapter 9: Look at: 9.5 (special case of 21.11), 9.6, 9.22, 9.33
Resnick Chapter 9: Hand in 9.9 a-e., 9.10
-1.7 Homework 7. Due Wednesday May 19, 2010
Look at Resnick Chapter 9: 9.1 (now obsolete), 9.8.
Hand in Resnick Chapter 9: #11 (Exercise 21.11 may be useful here), 28,
34 (assume $\sigma_n^2 > 0$), 35 (hint: show $P[X_n \neq 0 \text{ i.o.}] = 0$.), 38 (Hint: make
use of Proposition 10.61.)
-1.8 Homework 8. Due Wednesday June 2, 2010
Look at lecture note exercises: 24.2 – 24.6, 25.2, and 27.3 – 27.5.
Hand in lecture note exercises: 25.1, 26.1, and 26.2.
0
Math 286 Homework Problems Spring 2008
Part I
Background Material
1
Limsups, Liminfs and Extended Limits
Notation 1.1 The extended real numbers is the set $\bar{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$, i.e. it
is $\mathbb{R}$ with two new points called $\infty$ and $-\infty$. We use the following conventions,
$\pm\infty \cdot 0 = 0$, $\pm\infty \cdot a = \pm\infty$ if $a \in \mathbb{R}$ with $a > 0$, $\pm\infty \cdot a = \mp\infty$ if $a \in \mathbb{R}$ with
$a < 0$, $\pm\infty + a = \pm\infty$ for any $a \in \mathbb{R}$, $\infty + \infty = \infty$ and $-\infty - \infty = -\infty$ while
$\infty - \infty$ is not defined. A sequence $\{a_n\} \subset \bar{\mathbb{R}}$ is said to converge to $\infty$ ($-\infty$) if for
all $M \in \mathbb{R}$ there exists $m \in \mathbb{N}$ such that $a_n \ge M$ ($a_n \le M$) for all $n \ge m$.
Lemma 1.2. Suppose $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ are convergent sequences in $\bar{\mathbb{R}}$,
then:
1. If $a_n \le b_n$ for a.a. $n$,¹ then $\lim_{n\to\infty} a_n \le \lim_{n\to\infty} b_n$.
2. If $c \in \mathbb{R}$, then $\lim_{n\to\infty}(c a_n) = c \lim_{n\to\infty} a_n$.
3. $\{a_n + b_n\}_{n=1}^{\infty}$ is convergent and
\[ \lim_{n\to\infty}(a_n + b_n) = \lim_{n\to\infty} a_n + \lim_{n\to\infty} b_n \tag{1.1} \]
provided the right side is not of the form $\infty - \infty$.
4. $\{a_n b_n\}_{n=1}^{\infty}$ is convergent and
\[ \lim_{n\to\infty}(a_n b_n) = \lim_{n\to\infty} a_n \cdot \lim_{n\to\infty} b_n \tag{1.2} \]
provided the right hand side is not of the form $0 \cdot \infty$ or $0 \cdot (-\infty)$.

Before going to the proof consider the simple example where $a_n = n$ and
$b_n = -n^{\alpha}$ with $\alpha > 0$. Then
\[ \lim_{n\to\infty}(a_n + b_n) = \begin{cases} \infty & \text{if } \alpha < 1 \\ 0 & \text{if } \alpha = 1 \\ -\infty & \text{if } \alpha > 1 \end{cases} \]
while
\[ \lim_{n\to\infty} a_n + \lim_{n\to\infty} b_n = \text{``}\infty - \infty.\text{''} \]
This shows that the requirement that the right side of Eq. (1.1) is not of the form
$\infty - \infty$ is necessary in Lemma 1.2. Similarly, considering the examples $a_n = n$
and $b_n = n^{-\alpha}$ with $\alpha > 0$ shows the necessity for assuming the right hand side of
Eq. (1.2) is not of the form $\infty \cdot 0$.

¹Here we use a.a. $n$ as an abbreviation for almost all $n$. So $a_n \le b_n$ a.a. $n$ iff there
exists $N < \infty$ such that $a_n \le b_n$ for all $n \ge N$.
Proof. The proofs of items 1. and 2. are left to the reader.
Proof of Eq. (1.1). Let $a := \lim_{n\to\infty} a_n$ and $b := \lim_{n\to\infty} b_n$. Case 1., suppose
$b = \infty$ in which case we must assume $a > -\infty$. In this case, for every $M > 0$,
there exists $N$ such that $b_n \ge M$ and $a_n \ge a - 1$ for all $n \ge N$ and this implies
\[ a_n + b_n \ge M + a - 1 \text{ for all } n \ge N. \]
Since $M$ is arbitrary it follows that $a_n + b_n \to \infty$ as $n \to \infty$. The cases where
$b = -\infty$ or $a = \pm\infty$ are handled similarly. Case 2. If $a, b \in \mathbb{R}$, then for every
$\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that
\[ |a - a_n| \le \varepsilon \text{ and } |b - b_n| \le \varepsilon \text{ for all } n \ge N. \]
Therefore,
\[ |a + b - (a_n + b_n)| = |a - a_n + b - b_n| \le |a - a_n| + |b - b_n| \le 2\varepsilon \]
for all $n \ge N$. Since $\varepsilon > 0$ is arbitrary, it follows that $\lim_{n\to\infty}(a_n + b_n) = a + b$.
Proof of Eq. (1.2). It will be left to the reader to prove the case where $\lim a_n$
and $\lim b_n$ exist in $\mathbb{R}$. I will only consider the case where $a = \lim_{n\to\infty} a_n \neq 0$
and $\lim_{n\to\infty} b_n = \infty$ here. Let us also suppose that $a > 0$ (the case $a < 0$ is
handled similarly) and let $\alpha := \min\left(\frac{a}{2}, 1\right)$. Given any $M < \infty$, there exists
$N \in \mathbb{N}$ such that $a_n \ge \alpha$ and $b_n \ge M$ for all $n \ge N$ and for this choice of $N$,
\[ a_n b_n \ge \alpha M \text{ for all } n \ge N. \]
Since $\alpha > 0$ is fixed and $M$ is arbitrary it follows
that $\lim_{n\to\infty}(a_n b_n) = \infty$ as desired.
For any subset $\Lambda \subset \bar{\mathbb{R}}$, let $\sup \Lambda$ and $\inf \Lambda$ denote the least upper bound and
greatest lower bound of $\Lambda$ respectively. The convention being that $\sup \Lambda = \infty$
if $\infty \in \Lambda$ or $\Lambda$ is not bounded from above and $\inf \Lambda = -\infty$ if $-\infty \in \Lambda$ or $\Lambda$ is
not bounded from below. We will also use the conventions that $\sup \emptyset = -\infty$
and $\inf \emptyset = +\infty$.
Notation 1.3 Suppose that $\{x_n\}_{n=1}^{\infty} \subset \bar{\mathbb{R}}$ is a sequence of numbers. Then
\[ \liminf_{n\to\infty} x_n = \lim_{n\to\infty} \inf\{x_k : k \ge n\} \text{ and} \tag{1.3} \]
\[ \limsup_{n\to\infty} x_n = \lim_{n\to\infty} \sup\{x_k : k \ge n\}. \tag{1.4} \]
We will also write $\underline{\lim}$ for $\liminf_{n\to\infty}$ and $\overline{\lim}$ for $\limsup_{n\to\infty}$.
Remark 1.4. Notice that if $a_n := \inf\{x_k : k \ge n\}$ and $b_n := \sup\{x_k : k \ge n\}$,
then $\{a_n\}$ is an increasing sequence while $\{b_n\}$ is a decreasing sequence.
Therefore the limits in Eq. (1.3) and Eq. (1.4) always exist in $\bar{\mathbb{R}}$ and
\[ \liminf_{n\to\infty} x_n = \sup_{n} \inf\{x_k : k \ge n\} \text{ and} \]
\[ \limsup_{n\to\infty} x_n = \inf_{n} \sup\{x_k : k \ge n\}. \]
The following proposition contains some basic properties of liminfs and limsups.
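The tail-infimum and tail-supremum description of Eqs. (1.3) and (1.4) is easy to experiment with numerically. The following Python sketch (all names and the cut-off choices are ours, purely illustrative) approximates $\liminf$ and $\limsup$ of $x_n = (-1)^n(1 + 1/n)$ from a long truncated sequence:

```python
def tail_inf_sup(x, n):
    """Return (inf, sup) of the truncated tail {x_k : k >= n} of the list x."""
    tail = x[n:]
    return min(tail), max(tail)

# x_n = (-1)^n (1 + 1/n) for n = 1, ..., 10000 (stored 0-indexed)
x = [(-1) ** n * (1 + 1 / n) for n in range(1, 10_001)]

# tail infima increase while tail suprema decrease, as in Remark 1.4
cut_points = range(0, 9_000, 1_000)
infs = [tail_inf_sup(x, n)[0] for n in cut_points]
sups = [tail_inf_sup(x, n)[1] for n in cut_points]

liminf_approx = infs[-1]  # close to liminf x_n = -1
limsup_approx = sups[-1]  # close to limsup x_n = +1
```

The tail infima increase toward $-1$ and the tail suprema decrease toward $+1$, matching the monotonicity noted in Remark 1.4.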
Proposition 1.5. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be two sequences of real numbers.
Then
1. $\liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n$ and $\lim_{n\to\infty} a_n$ exists in $\bar{\mathbb{R}}$ iff
\[ \liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n \in \bar{\mathbb{R}}. \]
2. There is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ of $\{a_n\}_{n=1}^{\infty}$ such that $\lim_{k\to\infty} a_{n_k} =
\limsup_{n\to\infty} a_n$. Similarly, there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ of $\{a_n\}_{n=1}^{\infty}$ such that
$\lim_{k\to\infty} a_{n_k} = \liminf_{n\to\infty} a_n$.
3.
\[ \limsup_{n\to\infty}(a_n + b_n) \le \limsup_{n\to\infty} a_n + \limsup_{n\to\infty} b_n \tag{1.5} \]
whenever the right side of this equation is not of the form $\infty - \infty$.
4. If $a_n \ge 0$ and $b_n \ge 0$ for all $n \in \mathbb{N}$, then
\[ \limsup_{n\to\infty}(a_n b_n) \le \limsup_{n\to\infty} a_n \cdot \limsup_{n\to\infty} b_n, \tag{1.6} \]
provided the right hand side of (1.6) is not of the form $0 \cdot \infty$ or $\infty \cdot 0$.
Proof. 1. Since
\[ \inf\{a_k : k \ge n\} \le \sup\{a_k : k \ge n\} \ \forall n, \]
$\liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n$.
Now suppose that $\liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n = a \in \mathbb{R}$. Then for all $\varepsilon > 0$,
there is an integer $N$ such that
\[ a - \varepsilon \le \inf\{a_k : k \ge N\} \le \sup\{a_k : k \ge N\} \le a + \varepsilon, \]
i.e.
\[ a - \varepsilon \le a_k \le a + \varepsilon \text{ for all } k \ge N. \]
Hence by the definition of the limit, $\lim_{k\to\infty} a_k = a$. If $\liminf_{n\to\infty} a_n = \infty$,
then we know for all $M \in (0, \infty)$ there is an integer $N$ such that
\[ M \le \inf\{a_k : k \ge N\} \]
and hence $\lim_{n\to\infty} a_n = \infty$. The case where $\limsup_{n\to\infty} a_n = -\infty$ is handled
similarly.
Conversely, suppose that $\lim_{n\to\infty} a_n = A \in \bar{\mathbb{R}}$ exists. If $A \in \mathbb{R}$, then for
every $\varepsilon > 0$ there exists $N(\varepsilon) \in \mathbb{N}$ such that $|A - a_n| \le \varepsilon$ for all $n \ge N(\varepsilon)$, i.e.
\[ A - \varepsilon \le a_n \le A + \varepsilon \text{ for all } n \ge N(\varepsilon). \]
From this we learn that
\[ A - \varepsilon \le \liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n \le A + \varepsilon. \]
Since $\varepsilon > 0$ is arbitrary, it follows that
\[ A \le \liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n \le A, \]
i.e. that $A = \liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n$. If $A = \infty$, then for all $M > 0$
there exists $N = N(M)$ such that $a_n \ge M$ for all $n \ge N$. This shows that
$\liminf_{n\to\infty} a_n \ge M$ and since $M$ is arbitrary it follows that
\[ \infty \le \liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n. \]
The proof for the case $A = -\infty$ is analogous to the $A = \infty$ case.
2. – 4. The remaining items are left as an exercise to the reader. It may
be useful to keep the following simple example in mind. Let $a_n = (-1)^n$ and
$b_n = -a_n = (-1)^{n+1}$. Then $a_n + b_n = 0$ so that
\[ 0 = \lim_{n\to\infty}(a_n + b_n) = \liminf_{n\to\infty}(a_n + b_n) = \limsup_{n\to\infty}(a_n + b_n) \]
while
\[ \liminf_{n\to\infty} a_n = \liminf_{n\to\infty} b_n = -1 \text{ and } \limsup_{n\to\infty} a_n = \limsup_{n\to\infty} b_n = 1. \]
Thus in this case we have
\[ \limsup_{n\to\infty}(a_n + b_n) < \limsup_{n\to\infty} a_n + \limsup_{n\to\infty} b_n \text{ and} \]
\[ \liminf_{n\to\infty}(a_n + b_n) > \liminf_{n\to\infty} a_n + \liminf_{n\to\infty} b_n. \]
We will refer to the following basic proposition as the monotone convergence
theorem for sums (MCT for short).
Proposition 1.6 (MCT for sums). Suppose that for each $n \in \mathbb{N}$, $\{f_n(i)\}_{i=1}^{\infty}$
is a sequence in $[0, \infty]$ such that $\lim_{n\to\infty} f_n(i) = f(i)$ by which we mean
$f_n(i) \uparrow f(i)$ as $n \to \infty$. Then
\[ \lim_{n\to\infty} \sum_{i=1}^{\infty} f_n(i) = \sum_{i=1}^{\infty} f(i), \text{ i.e. } \lim_{n\to\infty} \sum_{i=1}^{\infty} f_n(i) = \sum_{i=1}^{\infty} \lim_{n\to\infty} f_n(i). \]
We allow for the possibility that these expressions may equal $+\infty$.
Proof. Let $M := \lim_{n\to\infty} \sum_{i=1}^{\infty} f_n(i)$. As $f_n(i) \le f(i)$ for all $n$ it follows
that $\sum_{i=1}^{\infty} f_n(i) \le \sum_{i=1}^{\infty} f(i)$ for all $n$ and therefore passing to the limit shows
$M \le \sum_{i=1}^{\infty} f(i)$. If $N \in \mathbb{N}$ we have,
\[ \sum_{i=1}^{N} f(i) = \sum_{i=1}^{N} \lim_{n\to\infty} f_n(i) = \lim_{n\to\infty} \sum_{i=1}^{N} f_n(i) \le \lim_{n\to\infty} \sum_{i=1}^{\infty} f_n(i) = M. \]
Letting $N \to \infty$ in this equation then shows $\sum_{i=1}^{\infty} f(i) \le M$ which completes
the proof.
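As a sanity check on the MCT for sums, one can truncate the infinite sum at a large index and watch the monotone convergence numerically. In this illustrative Python sketch (our own choices throughout) we take $f(i) = 1/i^2$ and $f_n(i) = \frac{n}{n+1} f(i) \uparrow f(i)$, with a truncation level $I$ standing in for the infinite sum:

```python
def f(i):
    # limiting summand f(i) = 1/i^2; sum_i f(i) = pi^2/6
    return 1.0 / (i * i)

def f_n(n, i):
    # f_n(i) = (n/(n+1)) f(i) increases to f(i) as n -> infinity
    return n / (n + 1) * f(i)

I = 100_000  # truncation level standing in for the infinite sum

# the truncated sums increase in n toward the truncated sum of f
sums = [sum(f_n(n, i) for i in range(1, I + 1)) for n in (1, 10, 100, 1000)]
target = sum(f(i) for i in range(1, I + 1))  # close to pi^2/6 = 1.64493...
```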
Proposition 1.7 (Tonelli's theorem for sums). If $\{a_{kn}\}_{k,n=1}^{\infty} \subset [0, \infty]$,
then
\[ \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} a_{kn} = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} a_{kn}. \]
Here we allow for one and hence both sides to be infinite.
Proof. First Proof. Let $S_N(k) := \sum_{n=1}^{N} a_{kn}$, then by the MCT (Proposition 1.6),
\[ \lim_{N\to\infty} \sum_{k=1}^{\infty} S_N(k) = \sum_{k=1}^{\infty} \lim_{N\to\infty} S_N(k) = \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} a_{kn}. \]
On the other hand,
\[ \sum_{k=1}^{\infty} S_N(k) = \sum_{k=1}^{\infty} \sum_{n=1}^{N} a_{kn} = \sum_{n=1}^{N} \sum_{k=1}^{\infty} a_{kn} \]
so that
\[ \lim_{N\to\infty} \sum_{k=1}^{\infty} S_N(k) = \lim_{N\to\infty} \sum_{n=1}^{N} \sum_{k=1}^{\infty} a_{kn} = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} a_{kn}. \]
Second Proof. Let
\[ M := \sup\left\{ \sum_{k=1}^{K} \sum_{n=1}^{N} a_{kn} : K, N \in \mathbb{N} \right\} = \sup\left\{ \sum_{n=1}^{N} \sum_{k=1}^{K} a_{kn} : K, N \in \mathbb{N} \right\} \]
and
\[ L := \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} a_{kn}. \]
Since
\[ L = \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} a_{kn} = \lim_{K\to\infty} \sum_{k=1}^{K} \sum_{n=1}^{\infty} a_{kn} = \lim_{K\to\infty} \lim_{N\to\infty} \sum_{k=1}^{K} \sum_{n=1}^{N} a_{kn} \]
and $\sum_{k=1}^{K} \sum_{n=1}^{N} a_{kn} \le M$ for all $K$ and $N$, it follows that $L \le M$. Conversely,
\[ \sum_{k=1}^{K} \sum_{n=1}^{N} a_{kn} \le \sum_{k=1}^{K} \sum_{n=1}^{\infty} a_{kn} \le \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} a_{kn} = L \]
and therefore taking the supremum of the left side of this inequality over $K$
and $N$ shows that $M \le L$. Thus we have shown
\[ \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} a_{kn} = M. \]
By symmetry (or by a similar argument), we also have that $\sum_{n=1}^{\infty} \sum_{k=1}^{\infty} a_{kn} =
M$ and hence the proof is complete.
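Tonelli's theorem is easy to illustrate with a nonnegative double sequence whose iterated sums are computable in either order. The Python sketch below (our own construction) uses $a_{kn} = 2^{-k} 3^{-n}$, for which both iterated sums equal $\left(\sum_k 2^{-k}\right)\left(\sum_n 3^{-n}\right) = 1 \cdot \frac{1}{2} = \frac{1}{2}$; truncating at 60 terms leaves only a geometrically small tail:

```python
def a(k, n):
    # a_{kn} = 2^{-k} 3^{-n} >= 0, so Tonelli's theorem applies
    return 1.0 / (2 ** k * 3 ** n)

K = N = 60  # truncation level; the neglected tails are geometrically small

# sum over n inside, k outside, and then in the opposite order
sum_k_first = sum(sum(a(k, n) for n in range(1, N + 1)) for k in range(1, K + 1))
sum_n_first = sum(sum(a(k, n) for k in range(1, K + 1)) for n in range(1, N + 1))
# both are (sum_k 2^{-k})(sum_n 3^{-n}) = 1 * 1/2 = 1/2 up to tiny tails
```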
You are asked to prove the next three results in the exercises.
Proposition 1.8 (Fubini for sums). Suppose $\{a_{kn}\}_{k,n=1}^{\infty} \subset \mathbb{R}$ such that
\[ \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} |a_{kn}| = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} |a_{kn}| < \infty. \]
Then
\[ \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} a_{kn} = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} a_{kn}. \]
Example 1.9 (Counter example). Let $\{S_{mn}\}_{m,n=1}^{\infty}$ be any sequence of complex
numbers such that $\lim_{m\to\infty} S_{mn} = 1$ for all $n$ and $\lim_{n\to\infty} S_{mn} = 0$ for all $m$.
For example, take $S_{mn} = 1_{m \ge n} + \frac{1}{n} 1_{m < n}$. Then define $\{a_{ij}\}_{i,j=1}^{\infty}$ so that
\[ S_{mn} = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}. \]
Then
\[ \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \lim_{m\to\infty} \lim_{n\to\infty} S_{mn} = 0 \neq 1 = \lim_{n\to\infty} \lim_{m\to\infty} S_{mn} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}. \]
To find $a_{ij}$, set $S_{mn} = 0$ if $m = 0$ or $n = 0$, then
\[ S_{mn} - S_{m-1,n} = \sum_{j=1}^{n} a_{mj} \]
and
\[ a_{mn} = S_{mn} - S_{m-1,n} - (S_{m,n-1} - S_{m-1,n-1}) = S_{mn} - S_{m-1,n} - S_{m,n-1} + S_{m-1,n-1}. \]
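The failure of interchanging the order of summation in Example 1.9 can be seen numerically by pushing the inner index much further than the outer one. In this illustrative Python sketch (truncation levels are our own choices), summing the rows first gives approximately 0, while summing the columns first gives exactly 1, because in that order only the column $j = 1$ carries mass:

```python
def S(m, n):
    # S_{mn} = 1_{m >= n} + (1/n) 1_{m < n}, extended by S = 0 when m = 0 or n = 0
    if m == 0 or n == 0:
        return 0.0
    return 1.0 if m >= n else 1.0 / n

def a(i, j):
    # second difference of S, recovering a_{ij} as in the example
    return S(i, j) - S(i - 1, j) - S(i, j - 1) + S(i - 1, j - 1)

OUTER, INNER = 20, 2000  # push the inner index much further to mimic the inner limit

# sum over j first (inner limit n -> infinity): every row sum tends to 0
rows_then_cols = sum(sum(a(i, j) for j in range(1, INNER + 1))
                     for i in range(1, OUTER + 1))

# sum over i first (inner limit m -> infinity): column j = 1 contributes 1, the rest 0
cols_then_rows = sum(sum(a(i, j) for i in range(1, INNER + 1))
                     for j in range(1, OUTER + 1))
```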
Proposition 1.10 (Fatou's Lemma for sums). Suppose that for each $n \in \mathbb{N}$,
$\{h_n(i)\}_{i=1}^{\infty}$ is any sequence in $[0, \infty]$, then
\[ \sum_{i=1}^{\infty} \liminf_{n\to\infty} h_n(i) \le \liminf_{n\to\infty} \sum_{i=1}^{\infty} h_n(i). \]
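Fatou's inequality for sums can be strict. In the following Python sketch (a toy example of our own), $h_n$ puts unit mass at $i = 1$ for even $n$ and at $i = 2$ for odd $n$, so $\liminf_{n} h_n(i) = 0$ pointwise while $\sum_i h_n(i) = 1$ for every $n$; finite ranges suffice because the sequence is 2-periodic in $n$:

```python
def h(n, i):
    # h_n puts unit mass at i = 1 when n is even and at i = 2 when n is odd
    if n % 2 == 0:
        return 1.0 if i == 1 else 0.0
    return 1.0 if i == 2 else 0.0

I_MAX, N_MAX = 10, 50  # finite ranges suffice since h is 2-periodic in n

# left side: sum_i liminf_n h_n(i); by periodicity the tail infimum is the minimum
lhs = sum(min(h(n, i) for n in range(1, N_MAX + 1)) for i in range(1, I_MAX + 1))

# right side: liminf_n sum_i h_n(i); every inner sum equals 1
rhs = min(sum(h(n, i) for i in range(1, I_MAX + 1)) for n in range(1, N_MAX + 1))
```

Here the inequality is strict: the left side is 0 while the right side is 1.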
The next proposition is referred to as the dominated convergence theorem
(DCT for short) for sums.
Proposition 1.11 (DCT for sums). Suppose that for each $n \in \mathbb{N}$,
$\{f_n(i)\}_{i=1}^{\infty} \subset \mathbb{R}$ is a sequence and $\{g_n(i)\}_{i=1}^{\infty}$ is a sequence in $[0, \infty)$ such that;
1. $\sum_{i=1}^{\infty} g_n(i) < \infty$ for all $n$,
2. $f(i) = \lim_{n\to\infty} f_n(i)$ and $g(i) := \lim_{n\to\infty} g_n(i)$ exists for each $i$,
3. $|f_n(i)| \le g_n(i)$ for all $i$ and $n$,
4. $\lim_{n\to\infty} \sum_{i=1}^{\infty} g_n(i) = \sum_{i=1}^{\infty} g(i) < \infty$.
Then
\[ \lim_{n\to\infty} \sum_{i=1}^{\infty} f_n(i) = \sum_{i=1}^{\infty} \lim_{n\to\infty} f_n(i) = \sum_{i=1}^{\infty} f(i). \]
(Often this proposition is used in the special case where $g_n = g$ for all $n$.)
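A numerical illustration of the DCT for sums, with our own choice $f_n(i) = \frac{n}{n+1} \cdot \frac{(-1)^i}{i^2}$ dominated by the constant-in-$n$ sequence $g(i) = 1/i^2$ (so hypotheses 1.–4. hold with $g_n = g$); the truncation level stands in for the infinite sum:

```python
def f_n(n, i):
    # f_n(i) = (n/(n+1)) (-1)^i / i^2; |f_n(i)| <= g(i) = 1/i^2 for every n
    return (n / (n + 1)) * (-1) ** i / (i * i)

I = 100_000  # truncation level standing in for the infinite sum

lhs = sum(f_n(1000, i) for i in range(1, I + 1))        # the sum at a large n
rhs = sum((-1) ** i / (i * i) for i in range(1, I + 1))  # sum of the pointwise limit
# both are close to -pi^2/12 = -0.82246...
```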
Exercise 1.1. Prove Proposition 1.8. Hint: Let $a_{kn}^{+} := \max(a_{kn}, 0)$ and $a_{kn}^{-} :=
\max(-a_{kn}, 0)$ and observe that $a_{kn} = a_{kn}^{+} - a_{kn}^{-}$ and $a_{kn}^{+} + a_{kn}^{-} = |a_{kn}|$.
Now apply Proposition 1.7 with $a_{kn}$ replaced by $a_{kn}^{+}$ and $a_{kn}^{-}$.
Exercise 1.2. Prove Proposition 1.10. Hint: apply the MCT (Proposition 1.6)
with $f_n(i) := \inf_{m \ge n} h_m(i)$.
Exercise 1.3. Prove Proposition 1.11. Hint: Apply Fatou's lemma twice. Once
with $h_n(i) = g_n(i) + f_n(i)$ and once with $h_n(i) = g_n(i) - f_n(i)$.
2
Basic Probabilistic Notions
Definition 2.1. A sample space $\Omega$ is a set which represents all possible
outcomes of an experiment.
Example 2.2. 1. The sample space for flipping a coin one time could be taken
to be, $\Omega = \{0, 1\}$.
2. The sample space for flipping a coin $N$-times could be taken to be, $\Omega =
\{0, 1\}^N$ and for flipping an infinite number of times,
\[ \Omega = \{\omega = (\omega_1, \omega_2, \dots) : \omega_i \in \{0, 1\}\} = \{0, 1\}^{\mathbb{N}}. \]
3. If we have a roulette wheel with 38 entries, then we might take
\[ \Omega = \{00, 0, 1, 2, \dots, 36\} \]
for one spin,
\[ \Omega = \{00, 0, 1, 2, \dots, 36\}^{N} \]
for $N$ spins, and
\[ \Omega = \{00, 0, 1, 2, \dots, 36\}^{\mathbb{N}} \]
for an infinite number of spins.
4. If we throw darts at a board of radius $R$, we may take
\[ \Omega = D_R := \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le R^2\} \]
for one throw, $\Omega = D_R^N$ for $N$ throws, and $\Omega = D_R^{\mathbb{N}}$ for an infinite number of
throws.
5. Suppose we release a perfume particle at location $x \in \mathbb{R}^3$ and follow its
motion for all time, $0 \le t < \infty$. In this case, we might take,
\[ \Omega = \{\omega \in C([0, \infty), \mathbb{R}^3) : \omega(0) = x\}. \]
Definition 2.3. An event, $A$, is a subset of $\Omega$. Given $A \subset \Omega$ we also define
the indicator function of $A$ by
\[ 1_A(\omega) := \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A. \end{cases} \]
Example 2.4. Suppose that $\Omega = \{0, 1\}^{\mathbb{N}}$ is the sample space for flipping a coin
an infinite number of times. Here $\omega_n = 1$ represents the fact that a head was
thrown on the $n^{\text{th}}$ toss, while $\omega_n = 0$ represents a tail on the $n^{\text{th}}$ toss.
1. $A = \{\omega : \omega_3 = 1\}$ represents the event that the third toss was a head.
2. $A = \bigcup_{i=1}^{\infty} \{\omega : \omega_i = \omega_{i+1} = 1\}$ represents the event that (at least) two
heads are tossed in a row at some time.
3. $A = \bigcap_{N=1}^{\infty} \bigcup_{n \ge N} \{\omega : \omega_n = 1\}$ is the event where there are infinitely
many heads tossed in the sequence.
4. $A = \bigcup_{N=1}^{\infty} \bigcap_{n \ge N} \{\omega : \omega_n = 1\}$ is the event where heads occurs from
some time onwards, i.e. $\omega \in A$ iff there exists $N = N(\omega)$ such that $\omega_n = 1$
for all $n \ge N$.
Ideally we would like to assign a probability, $P(A)$, to all events $A \subset \Omega$.
Given a physical experiment, we think of assigning this probability as follows.
Run the experiment many times to get sample points, $\omega(n) \in \Omega$ for each $n \in \mathbb{N}$,
then try to define $P(A)$ by
\[ P(A) = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} 1_A(\omega(k)) \tag{2.1} \]
\[ = \lim_{N\to\infty} \frac{1}{N} \#\{1 \le k \le N : \omega(k) \in A\}. \tag{2.2} \]
That is we think of $P(A)$ as being the long term relative frequency that the
event $A$ occurred for the sequence of experiments, $\{\omega(k)\}_{k=1}^{\infty}$.
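The frequency description in Eq. (2.1) suggests a Monte Carlo estimate: simulate many independent runs $\omega(1), \omega(2), \dots$ and average the indicator. The Python sketch below (the event, sample size, and seed are our own illustrative choices) estimates the probability of getting at least two heads in three fair tosses, whose exact value is $4/8 = 1/2$:

```python
import random

random.seed(0)

def run_experiment():
    # one run of the experiment: flip a fair coin 3 times; omega lies in {0,1}^3
    return tuple(random.randint(0, 1) for _ in range(3))

def indicator_A(omega):
    # A = "at least two heads among the three tosses"
    return 1 if sum(omega) >= 2 else 0

# truncated version of Eq. (2.1): relative frequency over N runs
N = 100_000
p_hat = sum(indicator_A(run_experiment()) for _ in range(N)) / N
```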
Similarly suppose that $A$ and $B$ are two events and we wish to know how
likely the event $A$ is given that we know that $B$ has occurred. Thus we would
like to compute:
\[ P(A|B) = \lim_{N\to\infty} \frac{\#\{k : 1 \le k \le N \text{ and } \omega(k) \in A \cap B\}}{\#\{k : 1 \le k \le N \text{ and } \omega(k) \in B\}}, \]
which represents the frequency that $A$ occurs given that we know that $B$ has
occurred. This may be rewritten as
\[ P(A|B) = \lim_{N\to\infty} \frac{\frac{1}{N}\#\{k : 1 \le k \le N \text{ and } \omega(k) \in A \cap B\}}{\frac{1}{N}\#\{k : 1 \le k \le N \text{ and } \omega(k) \in B\}} = \frac{P(A \cap B)}{P(B)}. \]
Definition 2.5. If $B$ is a non-null event, i.e. $P(B) > 0$, define the conditional
probability of $A$ given $B$ by,
\[ P(A|B) := \frac{P(A \cap B)}{P(B)}. \]
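The frequency-ratio description of $P(A|B)$ preceding Definition 2.5 can likewise be simulated: count how often $A \cap B$ occurs among the runs where $B$ occurs. In this illustrative Python sketch (events and sample size are our own choices), $B$ is "first toss is heads" and $A$ is "both tosses are heads" for two fair tosses, so $P(A|B) = P(A \cap B)/P(B) = (1/4)/(1/2) = 1/2$:

```python
import random

random.seed(1)

N = 200_000
flips = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(N)]  # two fair tosses per run

count_B = sum(1 for x, y in flips if x == 1)              # runs where B occurred
count_AB = sum(1 for x, y in flips if x == 1 and y == 1)  # runs where A and B occurred

# frequency ratio, tending to P(A and B)/P(B) = 1/2 as N -> infinity
p_A_given_B = count_AB / count_B
```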
There are of course a number of problems with this definition of $P$ in Eq.
(2.1) including the fact that it is not mathematical nor necessarily well defined.
For example the limit may not exist. But ignoring these technicalities for the
moment, let us point out a few key properties that $P$ should have.
1. $P(A) \in [0, 1]$ for all $A \subset \Omega$.
2. $P(\emptyset) = 0$ and $P(\Omega) = 1$.
3. Additivity. If $A$ and $B$ are disjoint events, i.e. $A \cap B = AB = \emptyset$, then
$1_{A \cup B} = 1_A + 1_B$ so that
\[ P(A \cup B) = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} 1_{A \cup B}(\omega(k)) = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} \left[1_A(\omega(k)) + 1_B(\omega(k))\right] \]
\[ = \lim_{N\to\infty} \left[ \frac{1}{N} \sum_{k=1}^{N} 1_A(\omega(k)) + \frac{1}{N} \sum_{k=1}^{N} 1_B(\omega(k)) \right] = P(A) + P(B). \]
4. Countable Additivity. If $\{A_j\}_{j=1}^{\infty}$ are pairwise disjoint events (i.e. $A_j \cap
A_k = \emptyset$ for all $j \neq k$), then again, $1_{\bigcup_{j=1}^{\infty} A_j} = \sum_{j=1}^{\infty} 1_{A_j}$ and therefore we
might hope that,
\[ P\left(\bigcup_{j=1}^{\infty} A_j\right) = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} 1_{\bigcup_{j=1}^{\infty} A_j}(\omega(k)) = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} \sum_{j=1}^{\infty} 1_{A_j}(\omega(k)) \]
\[ = \lim_{N\to\infty} \sum_{j=1}^{\infty} \frac{1}{N} \sum_{k=1}^{N} 1_{A_j}(\omega(k)) \overset{?}{=} \sum_{j=1}^{\infty} \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} 1_{A_j}(\omega(k)) \quad \text{(by a leap of faith)} \]
\[ = \sum_{j=1}^{\infty} P(A_j). \]
Example 2.6. Let us consider the tossing of a coin $N$ times with a fair coin. In
this case we would expect that every $\omega \in \Omega$ is equally likely, i.e. $P(\{\omega\}) = \frac{1}{2^N}$.
Assuming this we are then forced to define
\[ P(A) = \frac{1}{2^N} \#(A). \]
Observe that this probability has the following property. Suppose that $\alpha \in
\{0, 1\}^k$ is a given sequence, then
\[ P(\{\omega : (\omega_1, \dots, \omega_k) = \alpha\}) = \frac{1}{2^N} \cdot 2^{N-k} = \frac{1}{2^k}. \]
That is if we ignore the flips after time $k$, the resulting probabilities are the
same as if we only flipped the coin $k$ times.
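For the finite sample space $\Omega = \{0,1\}^N$ of Example 2.6, the measure $P(A) = \#(A)/2^N$ can be computed by exhaustive enumeration, and the cylinder-set identity $P(\{\omega : (\omega_1, \dots, \omega_k) = \alpha\}) = 1/2^k$ then comes out exactly. A small Python sketch (with $N = 10$, $k = 3$, and a sample $\alpha$ of our own choosing):

```python
from itertools import product

N = 10  # number of tosses; Omega = {0,1}^N and P({omega}) = 1/2^N

Omega = list(product((0, 1), repeat=N))

def P(A):
    # normalized counting measure of Example 2.6
    return len(A) / 2 ** N

# cylinder set pinning the first k coordinates to alpha
k, alpha = 3, (1, 0, 1)
cylinder = [w for w in Omega if w[:k] == alpha]

p_cyl = P(cylinder)  # = 2^{N-k}/2^N = 1/2^k = 0.125
```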
Example 2.7. The previous example suggests that if we flip a fair coin an infinite
number of times, so that now $\Omega = \{0, 1\}^{\mathbb{N}}$, then we should define
\[ P(\{\omega \in \Omega : (\omega_1, \dots, \omega_k) = \alpha\}) = \frac{1}{2^k} \tag{2.3} \]
for any $k \ge 1$ and $\alpha \in \{0, 1\}^k$. Assuming there exists a probability, $P : 2^{\Omega} \to
[0, 1]$, such that Eq. (2.3) holds, we would like to compute, for example, the
probability of the event $B$ where an infinite number of heads are tossed. To try
to compute this, let
\[ A_n = \{\omega \in \Omega : \omega_n = 1\} = \{\text{heads at time } n\}, \]
\[ B_N := \bigcup_{n \ge N} A_n = \{\text{at least one heads at time } N \text{ or later}\} \]
and
\[ B = \bigcap_{N=1}^{\infty} B_N = \{A_n \text{ i.o.}\} = \bigcap_{N=1}^{\infty} \bigcup_{n \ge N} A_n. \]
Since
\[ B_N^c = \bigcap_{n \ge N} A_n^c \subset \bigcap_{M \ge n \ge N} A_n^c = \{\omega \in \Omega : \omega_N = \omega_{N+1} = \dots = \omega_M = 0\}, \]
we see that
\[ P(B_N^c) \le \left(\tfrac{1}{2}\right)^{M-N} \to 0 \text{ as } M \to \infty. \]
Therefore, $P(B_N) = 1$ for all $N$. If we assume that $P$ is continuous under taking
decreasing limits we may conclude, using $B_N \downarrow B$, that
\[ P(B) = \lim_{N\to\infty} P(B_N) = 1. \]
Without this continuity assumption we would not be able to compute $P(B)$.
The unfortunate fact is that we can not always assign a desired probability
function, $P(A)$, for all $A \subset \Omega$. For example we have the following negative
theorem.
Theorem 2.8 (No-Go Theorem). Let $S = \{z \in \mathbb{C} : |z| = 1\}$ be the unit circle.
Then there is no probability function, $P : 2^S \to [0, 1]$ such that $P(S) = 1$,
$P$ is invariant under rotations, and $P$ is continuous under taking decreasing
limits.
Proof. We are going to use the fact proved below in Proposition 5.3, that
the continuity condition on $P$ is equivalent to the countable additivity of $P$. For $z \in S$
and $N \subset S$ let
\[ zN := \{zn \in S : n \in N\}, \tag{2.4} \]
that is to say $e^{i\theta}N$ is the set $N$ rotated counter clockwise by angle $\theta$. By
assumption, we are supposing that
\[ P(zN) = P(N) \tag{2.5} \]
for all $z \in S$ and $N \subset S$.
Let
\[ R := \{z = e^{i2\pi t} : t \in \mathbb{Q}\} = \{z = e^{i2\pi t} : t \in \mathbb{Q} \cap [0, 1)\}, \]
a countable subgroup of $S$. As above $R$ acts on $S$ by rotations and divides $S$
up into equivalence classes, where $z, w \in S$ are equivalent if $z = rw$ for some
$r \in R$. Choose (using the axiom of choice) one representative point $n$ from each
of these equivalence classes and let $N \subset S$ be the set of these representative
points. Then every point $z \in S$ may be uniquely written as $z = nr$ with $n \in N$
and $r \in R$. That is to say
\[ S = \coprod_{r \in R} (rN) \tag{2.6} \]
where $\coprod_{\alpha} A_{\alpha}$ is used to denote the union of pairwise disjoint sets $\{A_{\alpha}\}$. By
Eqs. (2.5) and (2.6),
\[ 1 = P(S) = \sum_{r \in R} P(rN) = \sum_{r \in R} P(N). \tag{2.7} \]
We have thus arrived at a contradiction, since the right side of Eq. (2.7) is either
equal to 0 or to $\infty$ depending on whether $P(N) = 0$ or $P(N) > 0$.
To avoid this problem, we are going to have to relinquish the idea that $P$
should necessarily be defined on all of $2^{\Omega}$. So we are going to only define $P$ on
particular subsets, $\mathcal{B} \subset 2^{\Omega}$. We will develop this below.
Part II
Formal Development
3
Preliminaries
3.1 Set Operations
Let $\mathbb{N}$ denote the positive integers, $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$ be the non-negative integers
and $\mathbb{Z} = \mathbb{N}_0 \cup (-\mathbb{N})$ the positive and negative integers including 0, $\mathbb{Q}$ the
rational numbers, $\mathbb{R}$ the real numbers, and $\mathbb{C}$ the complex numbers. We will
also use $\mathbb{F}$ to stand for either of the fields $\mathbb{R}$ or $\mathbb{C}$.
Notation 3.1 Given two sets $X$ and $Y$, let $Y^X$ denote the collection of all
functions $f : X \to Y$. If $X = \mathbb{N}$, we will say that $f \in Y^{\mathbb{N}}$ is a sequence
with values in $Y$ and often write $f_n$ for $f(n)$ and express $f$ as $\{f_n\}_{n=1}^{\infty}$. If
$X = \{1, 2, \dots, N\}$, we will write $Y^N$ in place of $Y^{\{1, 2, \dots, N\}}$ and denote $f \in Y^N$
by $f = (f_1, f_2, \dots, f_N)$ where $f_n = f(n)$.
Notation 3.2 More generally if $\{X_{\alpha} : \alpha \in A\}$ is a collection of non-empty sets,
let $X_A = \prod_{\alpha \in A} X_{\alpha}$ and $\pi_{\alpha} : X_A \to X_{\alpha}$ be the canonical projection map defined
by $\pi_{\alpha}(x) = x_{\alpha}$. If $X_{\alpha} = X$ for some fixed space $X$, then we will write
$\prod_{\alpha \in A} X_{\alpha}$ as $X^A$ rather than $X_A$.
Recall that an element $x \in X_A$ is a choice function, i.e. an assignment
$x_{\alpha} := x(\alpha) \in X_{\alpha}$ for each $\alpha \in A$. The axiom of choice states that $X_A \neq \emptyset$
provided that $X_{\alpha} \neq \emptyset$ for each $\alpha \in A$.
Notation 3.3 Given a set X, let 2^X denote the power set of X, i.e. the collection
of all subsets of X including the empty set.

The reason for writing the power set of X as 2^X is that if we think of 2 as
meaning {0, 1}, then an element a ∈ 2^X = {0, 1}^X is completely determined
by the set

  A := {x ∈ X : a(x) = 1} ⊂ X.

In this way elements in {0, 1}^X are in one to one correspondence with subsets
of X.
For A ∈ 2^X let

  A^c := X \ A = {x ∈ X : x ∉ A}

and more generally if A, B ⊂ X let

  B \ A := {x ∈ B : x ∉ A} = B ∩ A^c.

We also define the symmetric difference of A and B by

  A △ B := (B \ A) ∪ (A \ B).

As usual if {A_α}_{α∈I} is an indexed collection of subsets of X we define the union
and the intersection of this collection by

  ⋃_{α∈I} A_α := {x ∈ X : ∃ α ∈ I such that x ∈ A_α}

and

  ⋂_{α∈I} A_α := {x ∈ X : x ∈ A_α for all α ∈ I}.
Notation 3.4 We will also write ∐_{α∈I} A_α for ⋃_{α∈I} A_α in the case that
{A_α}_{α∈I} are pairwise disjoint, i.e. A_α ∩ A_β = ∅ if α ≠ β.
Notice that ∪ is closely related to ∃ and ∩ is closely related to ∀. For example
let {A_n}_{n=1}^∞ be a sequence of subsets from X and define

  inf_{k≥n} A_k := ⋂_{k≥n} A_k,
  sup_{k≥n} A_k := ⋃_{k≥n} A_k,
  limsup_{n→∞} A_n := {A_n i.o.} := {x ∈ X : #{n : x ∈ A_n} = ∞}

and

  liminf_{n→∞} A_n := {A_n a.a.} := {x ∈ X : x ∈ A_n for all n sufficiently large}.

(One should read {A_n i.o.} as "A_n infinitely often" and {A_n a.a.} as "A_n almost
always.") Then x ∈ {A_n i.o.} iff

  ∀ N ∈ N ∃ n ≥ N such that x ∈ A_n

and this may be expressed as

  {A_n i.o.} = ⋂_{N=1}^∞ ⋃_{n≥N} A_n.

Similarly, x ∈ {A_n a.a.} iff

  ∃ N ∈ N such that ∀ n ≥ N, x ∈ A_n

which may be written as

  {A_n a.a.} = ⋃_{N=1}^∞ ⋂_{n≥N} A_n.
Definition 3.5. Given a set A ⊂ X, let

  1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A

be the indicator function of A.
Lemma 3.6. We have:

1. (⋃_n A_n)^c = ⋂_n A_n^c,
2. {A_n i.o.}^c = {A_n^c a.a.},
3. limsup_{n→∞} A_n = {x ∈ X : ∑_{n=1}^∞ 1_{A_n}(x) = ∞},
4. liminf_{n→∞} A_n = {x ∈ X : ∑_{n=1}^∞ 1_{A_n^c}(x) < ∞},
5. sup_{k≥n} 1_{A_k}(x) = 1_{⋃_{k≥n} A_k}(x) = 1_{sup_{k≥n} A_k}(x),
6. inf_{k≥n} 1_{A_k}(x) = 1_{⋂_{k≥n} A_k}(x) = 1_{inf_{k≥n} A_k}(x),
7. 1_{limsup_{n→∞} A_n} = limsup_{n→∞} 1_{A_n}, and
8. 1_{liminf_{n→∞} A_n} = liminf_{n→∞} 1_{A_n}.
Definition 3.7. A set X is said to be countable if X is empty or there is an
injective function f : X → N; otherwise X is said to be uncountable.
Lemma 3.8 (Basic Properties of Countable Sets).

1. If A ⊂ X is a subset of a countable set X then A is countable.
2. Any infinite subset Λ ⊂ N is in one to one correspondence with N.
3. A non-empty set X is countable iff there exists a surjective map, g : N → X.
4. If X and Y are countable then X × Y is countable.
5. Suppose for each m ∈ N that A_m is a countable subset of a set X, then
   A = ⋃_{m=1}^∞ A_m is countable. In short, the countable union of countable sets
   is still countable.
6. If X is an infinite set and Y is a set with at least two elements, then Y^X
   is uncountable. In particular 2^X is uncountable for any infinite set X.
Proof. 1. If f : X → N is an injective map then so is the restriction, f|_A,
of f to the subset A. 2. Let f(1) = min Λ and define f inductively by

  f(n + 1) = min(Λ \ {f(1), . . . , f(n)}).

Since Λ is infinite the process continues indefinitely. The function f : N → Λ
defined this way is a bijection.
3. If g : N → X is a surjective map, let

  f(x) = min g^{−1}({x}) = min{n ∈ N : g(n) = x}.

Then f : X → N is injective which combined with item
2. (taking Λ = f(X)) shows X is countable. Conversely if f : X → N is
injective let x_0 ∈ X be a fixed point and define g : N → X by g(n) = f^{−1}(n)
for n ∈ f(X) and g(n) = x_0 otherwise.
4. Let us first construct a bijection, h, from N to N × N. To do this put the
elements of N × N into an array of the form

  (1, 1) (1, 2) (1, 3) . . .
  (2, 1) (2, 2) (2, 3) . . .
  (3, 1) (3, 2) (3, 3) . . .
    ⋮      ⋮      ⋮

and then count these elements by counting the sets {(i, j) : i + j = k} one
at a time. For example let h(1) = (1, 1), h(2) = (2, 1), h(3) = (1, 2), h(4) =
(3, 1), h(5) = (2, 2), h(6) = (1, 3) and so on. If f : N → X and g : N → Y are
surjective functions, then the function (f × g) ∘ h : N → X × Y is surjective
where (f × g)(m, n) := (f(m), g(n)) for all (m, n) ∈ N × N.
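The diagonal counting scheme just described is easy to implement. The sketch below (the function name h is ours) enumerates N × N along the anti-diagonals i + j = k, moving through each diagonal with i decreasing, exactly as in the example h(1) = (1, 1), h(2) = (2, 1), h(3) = (1, 2), . . . :

```python
def h(n):
    """Return the n-th pair (i, j) in the diagonal enumeration of N x N,
    counting the finite sets {(i, j) : i + j = k} for k = 2, 3, ... and
    running through each diagonal with i decreasing, so h(2) = (2, 1)."""
    k, count = 2, 0
    while True:
        # the diagonal i + j = k contains exactly k - 1 pairs
        if count + (k - 1) >= n:
            step = n - count          # 1-based position within this diagonal
            i = k - step              # i runs k-1, k-2, ..., 1
            return (i, k - i)
        count += k - 1
        k += 1

print([h(n) for n in range(1, 7)])
# [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]
```

Since every diagonal is finite and each pair lies on exactly one diagonal, h is a bijection; checking injectivity on an initial segment is a one-liner.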
5. If A = ∅ then A is countable by definition so we may assume A ≠ ∅.
Without loss of generality we may assume A_1 ≠ ∅ and by replacing A_m by
A_1 if necessary we may also assume A_m ≠ ∅ for all m. For each m ∈ N let
a_m : N → A_m be a surjective function and then define f : N × N → ⋃_{m=1}^∞ A_m by
f(m, n) := a_m(n). The function f is surjective and hence so is the composition,
f ∘ h : N → ⋃_{m=1}^∞ A_m, where h : N → N × N is the bijection defined above.
6. Let us begin by showing 2^N = {0, 1}^N is uncountable. For sake of
contradiction suppose f : N → {0, 1}^N is a surjection and write f(n) as
(f_1(n), f_2(n), f_3(n), . . . ). Now define a ∈ {0, 1}^N by a_n := 1 − f_n(n). By
construction f_n(n) ≠ a_n for all n and so a ∉ f(N). This contradicts the as-
sumption that f is surjective and shows 2^N is uncountable. For the general
case, since Y_0^X ⊂ Y^X for any subset Y_0 ⊂ Y, if Y_0^X is uncountable then so
is Y^X. In this way we may assume Y_0 is a two point set which may as well
be Y_0 = {0, 1}. Moreover, since X is an infinite set we may find an injective
map x : N → X and use this to set up an injection, i : 2^N → 2^X, by setting
i(A) := {x_n : n ∈ A} ⊂ X for all A ⊂ N. If 2^X were countable we could find
a surjective map f : 2^X → N in which case f ∘ i : 2^N → N would be surjec-
tive as well. However this is impossible since we have already seen that 2^N is
uncountable.
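Cantor's diagonal trick in step 6 can be run on any finite list of 0/1 sequences: flipping the diagonal produces a sequence that disagrees with the n-th listed sequence in its n-th entry. A small sketch (the listed rows are an arbitrary illustration):

```python
# Any attempted enumeration f(0), f(1), ... of {0,1}-sequences, here
# truncated to length N for illustration, misses the diagonal flip.
N = 8
f = [[(n * (k + 1)) % 2 for k in range(N)] for n in range(N)]  # arbitrary rows

a = [1 - f[n][n] for n in range(N)]  # a_n := 1 - f_n(n)

# a differs from every listed row, at least in the diagonal entry:
assert all(a[n] != f[n][n] for n in range(N))
assert all(a != row for row in f)
print("the diagonal sequence is not among the", N, "listed rows")
```

The same check succeeds no matter how the rows are chosen, which is the content of the proof: no list can be exhaustive.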
3.2 Exercises
Let f : X → Y be a function and {A_i}_{i∈I} be an indexed family of subsets of Y;
verify the following assertions.
Exercise 3.1. (⋂_{i∈I} A_i)^c = ⋃_{i∈I} A_i^c.
Exercise 3.2. Suppose that B ⊂ Y, show that B \ (⋃_{i∈I} A_i) = ⋂_{i∈I} (B \ A_i).
Exercise 3.3. f^{−1}(⋃_{i∈I} A_i) = ⋃_{i∈I} f^{−1}(A_i).
Exercise 3.4. f^{−1}(⋂_{i∈I} A_i) = ⋂_{i∈I} f^{−1}(A_i).
Exercise 3.5. Find a counterexample which shows that f(C ∩ D) = f(C) ∩ f(D)
need not hold.
Example 3.9. Let X = {a, b, c} and Y = {1, 2} and define f(a) = f(b) = 1
and f(c) = 2. Then ∅ = f({a} ∩ {b}) ≠ f({a}) ∩ f({b}) = {1} and {1, 2} =
f({a}^c) ≠ f({a})^c = {2}.
3.3 Algebraic sub-structures of sets
Definition 3.10. A collection of subsets 𝒞 of a set X is a π-system or
multiplicative system if 𝒞 is closed under taking finite intersections.
Definition 3.11. A collection of subsets 𝒜 of a set X is an algebra (field)
if

1. ∅, X ∈ 𝒜,
2. A ∈ 𝒜 implies that A^c ∈ 𝒜,
3. 𝒜 is closed under finite unions, i.e. if A_1, . . . , A_n ∈ 𝒜 then A_1 ∪ · · · ∪ A_n ∈ 𝒜.

In view of conditions 1. and 2., condition 3. is equivalent to

3′. 𝒜 is closed under finite intersections.
Definition 3.12. A collection of subsets ℬ of X is a σ-algebra (or some-
times called a σ-field) if ℬ is an algebra which is also closed under countable
unions, i.e. if {A_i}_{i=1}^∞ ⊂ ℬ, then ⋃_{i=1}^∞ A_i ∈ ℬ. (Notice that since ℬ is also
closed under taking complements, ℬ is also closed under taking countable inter-
sections.)
Example 3.13. Here are some examples of algebras.

1. ℬ = 2^X; then ℬ is a σ-algebra.
2. ℬ = {∅, X} is a σ-algebra called the trivial σ-field.
3. Let X = {1, 2, 3}; then 𝒜 = {∅, X, {1}, {2, 3}} is an algebra while 𝒮 :=
{∅, X, {2, 3}} is not an algebra but is a π-system.
Proposition 3.14. Let ℰ be any collection of subsets of X. Then there exists
a unique smallest algebra 𝒜(ℰ) and σ-algebra σ(ℰ) which contains ℰ.

Proof. Simply take

  𝒜(ℰ) := ⋂ {𝒜 : 𝒜 is an algebra such that ℰ ⊂ 𝒜}

and

  σ(ℰ) := ⋂ {ℳ : ℳ is a σ-algebra such that ℰ ⊂ ℳ}.
Example 3.15. Suppose X = {1, 2, 3} and ℰ = {∅, X, {1, 2}, {1, 3}}, see Figure
3.1. Then

Fig. 3.1. A collection of subsets.

  𝒜(ℰ) = σ(ℰ) = 2^X.

On the other hand if ℰ = {{1, 2}}, then 𝒜(ℰ) = {∅, X, {1, 2}, {3}}.
Exercise 3.6. Suppose that ℰ_i ⊂ 2^X for i = 1, 2. Show that 𝒜(ℰ_1) = 𝒜(ℰ_2)
iff ℰ_1 ⊂ 𝒜(ℰ_2) and ℰ_2 ⊂ 𝒜(ℰ_1). Similarly show, σ(ℰ_1) = σ(ℰ_2) iff ℰ_1 ⊂ σ(ℰ_2)
and ℰ_2 ⊂ σ(ℰ_1). Give a simple example where 𝒜(ℰ_1) = 𝒜(ℰ_2) while ℰ_1 ≠ ℰ_2.
In this course we will often be interested in the Borel σ-algebra on a
topological space.

Definition 3.16 (Borel σ-field). The Borel σ-algebra, ℬ = ℬ_R =
ℬ(R), on R is the smallest σ-field containing all of the open subsets of R.
More generally if (X, τ) is a topological space, the Borel σ-algebra on X is
ℬ_X := σ(τ), i.e. the smallest σ-algebra containing all open (closed) subsets
of X.
Exercise 3.7. Verify the Borel σ-algebra, ℬ_R, is generated by any of the
following collections of sets:

1. {(a, ∞) : a ∈ R}, 2. {(a, ∞) : a ∈ Q} or 3. {[a, ∞) : a ∈ Q}.

Hint: make use of Exercise 3.6.
We will postpone a more in depth study of σ-algebras until later. For now,
let us concentrate on understanding the simpler notion of an algebra.
Definition 3.17. Let X be a set. We say that a family of sets 𝒫 ⊂ 2^X is a
partition of X if distinct members of 𝒫 are disjoint and if X is the union of
the sets in 𝒫.
Example 3.18. Let X be a set and ℰ = {A_1, . . . , A_n} where A_1, . . . , A_n is a
partition of X. In this case

  𝒜(ℰ) = σ(ℰ) = {⋃_{i∈Λ} A_i : Λ ⊂ {1, 2, . . . , n}}

where ⋃_{i∈Λ} A_i := ∅ when Λ = ∅. Notice that

  #(𝒜(ℰ)) = #(2^{{1,2,...,n}}) = 2^n.
Example 3.19. Suppose that X is a set and that 𝒜 ⊂ 2^X is a finite algebra, i.e.
#(𝒜) < ∞. For each x ∈ X let

  A_x = ⋂ {A ∈ 𝒜 : x ∈ A} ∈ 𝒜,

wherein we have used that 𝒜 is finite to insure A_x ∈ 𝒜. Hence A_x is the smallest set
in 𝒜 which contains x.
Now suppose that y ∈ X. If x ∈ A_y then A_x ⊂ A_y so that A_x ∩ A_y = A_x.
On the other hand, if x ∉ A_y then x ∈ A_x \ A_y and therefore A_x ⊂ A_x \ A_y, i.e.
A_x ∩ A_y = ∅. Therefore we have shown, either A_x ∩ A_y = ∅ or A_x ∩ A_y = A_x.
By reversing the roles of x and y it also follows that either A_y ∩ A_x = ∅ or
A_y ∩ A_x = A_y. Therefore we may conclude, either A_x = A_y or A_x ∩ A_y = ∅ for
all x, y ∈ X.
Let us now define {B_i}_{i=1}^k to be an enumeration of {A_x}_{x∈X}. It is
straightforward to conclude that

  𝒜 = {⋃_{i∈Λ} B_i : Λ ⊂ {1, 2, . . . , k}}.

For example observe that for any A ∈ 𝒜, we have A = ⋃_{x∈A} A_x = ⋃_{i∈Λ} B_i where
Λ := {i : B_i ⊂ A}.
Proposition 3.20. Suppose that ℬ ⊂ 2^X is a σ-algebra and ℬ is at most
a countable set. Then there exists a unique finite partition 𝒫 of X such that
𝒫 ⊂ ℬ and every element B ∈ ℬ is of the form

  B = ⋃ {A ∈ 𝒫 : A ⊂ B}.   (3.1)

In particular ℬ is actually a finite set and #(ℬ) = 2^n for some n ∈ N.
Proof. We proceed as in Example 3.19. For each x ∈ X let

  A_x = ⋂ {A ∈ ℬ : x ∈ A} ∈ ℬ,

wherein we have used that ℬ is a countable σ-algebra to insure A_x ∈ ℬ. Just as
above either A_x ∩ A_y = ∅ or A_x = A_y and therefore 𝒫 = {A_x : x ∈ X} ⊂ ℬ is a
(necessarily countable) partition of X for which Eq. (3.1) holds for all B ∈ ℬ.
Enumerate the elements of 𝒫 as 𝒫 = {P_n}_{n=1}^N where N ∈ N or N = ∞. If
N = ∞, then the correspondence

  a ∈ {0, 1}^N → A_a = ⋃ {P_n : a_n = 1} ∈ ℬ

is bijective and therefore, by Lemma 3.8, ℬ is uncountable. Thus any countable
σ-algebra is necessarily finite. This finishes the proof modulo the uniqueness
assertion which is left as an exercise to the reader.
Example 3.21 (Countable/Co-countable σ-Field). Let X = R and ℰ :=
{{x} : x ∈ R}. Then σ(ℰ) consists of those subsets, A ⊂ R, such that A is
countable or A^c is countable. Similarly, 𝒜(ℰ) consists of those subsets, A ⊂ R,
such that A is finite or A^c is finite. More generally we have the following exercise.
Exercise 3.8. Let X be a set, I be an infinite index set, and ℰ = {A_i}_{i∈I} be a
partition of X. Prove that the algebra, 𝒜(ℰ), and the σ-algebra, σ(ℰ), generated
by ℰ are given by

  𝒜(ℰ) = {⋃_{i∈Λ} A_i : Λ ⊂ I with #(Λ) < ∞ or #(Λ^c) < ∞}

and

  σ(ℰ) = {⋃_{i∈Λ} A_i : Λ ⊂ I with Λ countable or Λ^c countable}

respectively. Here we are using the convention that ⋃_{i∈Λ} A_i := ∅ when Λ = ∅.
In particular if I is countable, then

  σ(ℰ) = {⋃_{i∈Λ} A_i : Λ ⊂ I}.
Proposition 3.22. Let X be a set and ℰ ⊂ 2^X. Let ℰ^c := {A^c : A ∈ ℰ} and
ℰ_c := ℰ ∪ {X, ∅} ∪ ℰ^c. Then

  𝒜(ℰ) = {finite unions of finite intersections of elements from ℰ_c}.   (3.2)
Proof. Let 𝒜 denote the right member of Eq. (3.2). From the definition of
an algebra, it is clear that ℰ ⊂ 𝒜 ⊂ 𝒜(ℰ). Hence to finish the proof it suffices
to show 𝒜 is an algebra. The proofs of these assertions are routine except for
possibly showing that 𝒜 is closed under complementation. To check 𝒜 is closed
under complementation, let Z ∈ 𝒜 be expressed as

  Z = ⋃_{i=1}^N ⋂_{j=1}^K A_{ij}

where A_{ij} ∈ ℰ_c. Therefore, writing B_{ij} = A_{ij}^c ∈ ℰ_c, we find that

  Z^c = ⋂_{i=1}^N ⋃_{j=1}^K B_{ij} = ⋃_{j_1,...,j_N=1}^K (B_{1j_1} ∩ B_{2j_2} ∩ · · · ∩ B_{Nj_N}) ∈ 𝒜

wherein we have used the fact that B_{1j_1} ∩ B_{2j_2} ∩ · · · ∩ B_{Nj_N} is a finite intersection
of sets from ℰ_c.
Remark 3.23. One might think that in general σ(ℰ) may be described as the
countable unions of countable intersections of sets in ℰ_c. However this is in
general false, since if

  Z = ⋃_{i=1}^∞ ⋂_{j=1}^∞ A_{ij}

with A_{ij} ∈ ℰ_c, then

  Z^c = ⋃_{j_1=1, j_2=1, ..., j_ℓ=1, ...}^∞ (⋂_{ℓ=1}^∞ A_{ℓ, j_ℓ}^c)

which is now an uncountable union. Thus the above description is not correct.
In general it is complicated to explicitly describe σ(ℰ), see Proposition 1.23 on
page 39 of Folland for details. Also see Proposition 3.20.
Exercise 3.9. Let τ be a topology on a set X and 𝒜 = 𝒜(τ) be the algebra
generated by τ. Show 𝒜 is the collection of subsets of X which may be written
as a finite union of sets of the form F ∩ V where F is closed and V is open.
Solution to Exercise (3.9). In this case τ_c is the collection of sets which are
either open or closed. Now if each V_i is open in X and each F_j is closed in X, then
(⋂_{i=1}^n V_i) ∩ (⋂_{j=1}^m F_j) is simply a set of the form V ∩ F where V is open
and F is closed. Therefore the result is an immediate consequence of Proposition 3.22.
Definition 3.24. A set 𝒮 ⊂ 2^X is said to be a semialgebra or elementary
class provided that

- ∅ ∈ 𝒮,
- 𝒮 is closed under finite intersections,
- if E ∈ 𝒮, then E^c is a finite disjoint union of sets from 𝒮. (In particular
  X = ∅^c is a finite disjoint union of elements from 𝒮.)
Proposition 3.25. Suppose 𝒮 ⊂ 2^X is a semi-field, then 𝒜 = 𝒜(𝒮) consists of
sets which may be written as finite disjoint unions of sets from 𝒮.
Proof. (Although it is possible to give a proof using Proposition 3.22, it is
just as simple to give a direct proof.) Let 𝒜 denote the collection of sets which
may be written as finite disjoint unions of sets from 𝒮. Clearly 𝒮 ⊂ 𝒜 ⊂ 𝒜(𝒮) so
it suffices to show 𝒜 is an algebra since 𝒜(𝒮) is the smallest algebra containing
𝒮. By the properties of 𝒮, we know that ∅, X ∈ 𝒜. The following two steps now
finish the proof.

1. (𝒜 is closed under finite intersections.) Suppose that A_i = ∐_{F∈Λ_i} F ∈ 𝒜
where, for i = 1, 2, . . . , n, Λ_i is a finite collection of disjoint sets from 𝒮. Then

  ⋂_{i=1}^n A_i = ⋂_{i=1}^n (∐_{F∈Λ_i} F) = ∐_{(F_1,...,F_n)∈Λ_1×···×Λ_n} (F_1 ∩ F_2 ∩ · · · ∩ F_n)

and this is a disjoint (you check) union of elements from 𝒮. Therefore 𝒜 is
closed under finite intersections.

2. (𝒜 is closed under complementation.) If A = ∐_{F∈Λ} F with Λ being a finite
collection of disjoint sets from 𝒮, then A^c = ⋂_{F∈Λ} F^c. Since, by assumption,
F^c ∈ 𝒜 for all F ∈ 𝒮 and 𝒜 is closed under finite intersections by step 1.,
it follows that A^c ∈ 𝒜.
Example 3.26. Let X = R, then

  𝒮 := {(a, b] ∩ R : a, b ∈ R̄} = {(a, b] : a ∈ [−∞, ∞) and a < b < ∞} ∪ {∅, R}

is a semi-field. The algebra, 𝒜(𝒮), generated by 𝒮 consists of finite disjoint
unions of sets from 𝒮. For example,

  A = (0, π] ∪ (2π, 7] ∪ (11, ∞) ∈ 𝒜(𝒮).
Exercise 3.10. Let 𝒜 ⊂ 2^X and ℬ ⊂ 2^Y be semi-fields. Show the collection

  𝒮 := {A × B : A ∈ 𝒜 and B ∈ ℬ}

is also a semi-field.
Solution to Exercise (3.10). Clearly ∅ = ∅ × ∅ ∈ 𝒮. Let A_i ∈ 𝒜
and B_i ∈ ℬ, then

  ⋂_{i=1}^n (A_i × B_i) = (⋂_{i=1}^n A_i) × (⋂_{i=1}^n B_i) ∈ 𝒮

showing 𝒮 is closed under finite intersections. For A × B ∈ 𝒮,

  (A × B)^c = (A^c × B^c) ∐ (A^c × B) ∐ (A × B^c)

and by assumption A^c = ∐_{i=1}^n A_i with A_i ∈ 𝒜 and B^c = ∐_{j=1}^m B_j with B_j ∈ ℬ.
Therefore

  A^c × B^c = (∐_{i=1}^n A_i) × (∐_{j=1}^m B_j) = ∐_{i=1,j=1}^{n,m} A_i × B_j,

  A^c × B = ∐_{i=1}^n A_i × B, and A × B^c = ∐_{j=1}^m A × B_j

showing (A × B)^c may be written as a finite disjoint union of elements from 𝒮.
4
Finitely Additive Measures / Integration
Definition 4.1. Suppose that ℰ ⊂ 2^X is a collection of subsets of X and μ :
ℰ → [0, ∞] is a function. Then

1. μ is additive or finitely additive on ℰ if

  μ(E) = ∑_{i=1}^n μ(E_i)   (4.1)

whenever E = ∐_{i=1}^n E_i ∈ ℰ with E_i ∈ ℰ for i = 1, 2, . . . , n < ∞.

2. μ is σ-additive (or countably additive) on ℰ if Eq. (4.1) holds even
when n = ∞.

3. μ is sub-additive (finitely sub-additive) on ℰ if

  μ(E) ≤ ∑_{i=1}^n μ(E_i)

whenever E = ⋃_{i=1}^n E_i ∈ ℰ with n ∈ N ∪ {∞} (n ∈ N).

4. μ is a finitely additive measure if ℰ = 𝒜 is an algebra, μ(∅) = 0, and
μ is finitely additive on 𝒜.

5. μ is a premeasure if μ is a finitely additive measure which is σ-additive
on 𝒜.

6. μ is a measure if μ is a premeasure on a σ-algebra. Furthermore if
μ(X) = 1, we say μ is a probability measure on X.
Proposition 4.2 (Basic properties of finitely additive measures). Sup-
pose μ is a finitely additive measure on an algebra, 𝒜 ⊂ 2^X, A, B ∈ 𝒜 with
A ⊂ B and {A_j}_{j=1}^n ⊂ 𝒜, then:

1. (μ is monotone) μ(A) ≤ μ(B) if A ⊂ B.
2. For A, B ∈ 𝒜, the following strong additivity formula holds;

  μ(A ∪ B) + μ(A ∩ B) = μ(A) + μ(B).   (4.2)

3. (μ is finitely subadditive) μ(⋃_{j=1}^n A_j) ≤ ∑_{j=1}^n μ(A_j).
4. μ is countably sub-additive on 𝒜 iff

  μ(A) ≤ ∑_{i=1}^∞ μ(A_i) for A = ∐_{i=1}^∞ A_i   (4.3)

where A ∈ 𝒜 and {A_i}_{i=1}^∞ ⊂ 𝒜 are pairwise disjoint sets.
5. (μ is countably superadditive) If A = ∐_{i=1}^∞ A_i with A_i, A ∈ 𝒜, then

  μ(∐_{i=1}^∞ A_i) ≥ ∑_{i=1}^∞ μ(A_i).   (4.4)

(See Remark 4.9 for an example where this inequality is strict.)
6. A finitely additive measure, μ, is a premeasure iff μ is countably sub-additive.
Proof.

1. Since B is the disjoint union of A and (B \ A) and B \ A = B ∩ A^c ∈ 𝒜 it
follows that

  μ(B) = μ(A) + μ(B \ A) ≥ μ(A).

2. Since

  A ∪ B = [A \ (A ∩ B)] ∐ [B \ (A ∩ B)] ∐ (A ∩ B),

  μ(A ∪ B) = μ(A ∪ B \ (A ∩ B)) + μ(A ∩ B)
           = μ(A \ (A ∩ B)) + μ(B \ (A ∩ B)) + μ(A ∩ B).

Adding μ(A ∩ B) to both sides of this equation proves Eq. (4.2).

3. Let Ẽ_j = E_j \ (E_1 ∪ · · · ∪ E_{j−1}) so that the Ẽ_j's are pairwise disjoint and
E = ∐_{j=1}^n Ẽ_j. Since Ẽ_j ⊂ E_j it follows from the monotonicity of μ that

  μ(E) = ∑_{j=1}^n μ(Ẽ_j) ≤ ∑_{j=1}^n μ(E_j).

4. If A = ⋃_{i=1}^∞ B_i with A ∈ 𝒜 and B_i ∈ 𝒜, then A = ∐_{i=1}^∞ A_i where A_i :=
B_i \ (B_1 ∪ . . . ∪ B_{i−1}) ∈ 𝒜 and B_0 = ∅. Therefore using the monotonicity of μ
and Eq. (4.3),

  μ(A) ≤ ∑_{i=1}^∞ μ(A_i) ≤ ∑_{i=1}^∞ μ(B_i).

5. Suppose that A = ∐_{i=1}^∞ A_i with A_i, A ∈ 𝒜, then ∐_{i=1}^n A_i ⊂ A for all n
and so by the monotonicity and finite additivity of μ, ∑_{i=1}^n μ(A_i) ≤ μ(A).
Letting n → ∞ in this equation shows μ is superadditive.

6. This is a combination of items 4. and 5.
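For a concrete check of the strong additivity formula (4.2), take the measure on a finite set given by summing point weights (the set and weights below are our own illustration):

```python
# A finitely additive (indeed sigma-additive) measure on 2^X given by
# point weights: mu(E) = sum of w(x) over x in E.
X = {1, 2, 3, 4, 5}
w = {1: 0.5, 2: 1.0, 3: 0.25, 4: 2.0, 5: 0.75}   # arbitrary weights

def mu(E):
    return sum(w[x] for x in E)

A, B = {1, 2, 3}, {3, 4}

# strong additivity, Eq. (4.2): mu(A ∪ B) + mu(A ∩ B) = mu(A) + mu(B)
assert abs(mu(A | B) + mu(A & B) - (mu(A) + mu(B))) < 1e-12
# monotonicity (item 1.) and finite subadditivity (item 3.):
assert mu(A) <= mu(A | B)
assert mu(A | B) <= mu(A) + mu(B)
print("Eq. (4.2), monotonicity, and subadditivity hold for this example")
```

The overlap A ∩ B = {3} is counted twice on the right side of (4.2) and exactly once in each term on the left, which is why the two extra terms balance.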
4.1 Examples of Measures
Most σ-algebras and σ-additive measures are somewhat difficult to describe
and define. However, there are a few special cases where we can describe ex-
plicitly what is going on.
Example 4.3. Suppose that Ω is a finite set, ℬ := 2^Ω, and p : Ω → [0, 1] is a
function such that ∑_{ω∈Ω} p(ω) = 1. Then

  P(A) := ∑_{ω∈A} p(ω) for all A ⊂ Ω

defines a measure on 2^Ω.
Example 4.4. Suppose that X is any set and x ∈ X is a point. For A ⊂ X, let

  δ_x(A) = 1 if x ∈ A and δ_x(A) = 0 if x ∉ A.

Then μ = δ_x is a measure on X called the Dirac delta measure at x.
Example 4.5. Suppose ℬ ⊂ 2^X is a σ-algebra, μ is a measure on ℬ, and λ > 0,
then λ · μ is also a measure on ℬ. Moreover, if {μ_j}_{j=1}^∞ is a sequence of
measures on ℬ, then μ := ∑_{j=1}^∞ μ_j, i.e.

  μ(A) := ∑_{j=1}^∞ μ_j(A) for all A ∈ ℬ,

defines another measure on ℬ. To prove this we must show that μ is countably
additive. Suppose that A = ∐_{i=1}^∞ A_i with A_i ∈ ℬ, then (using Tonelli for sums,
Proposition 1.7),

  μ(A) = ∑_{j=1}^∞ μ_j(A) = ∑_{j=1}^∞ ∑_{i=1}^∞ μ_j(A_i)
       = ∑_{i=1}^∞ ∑_{j=1}^∞ μ_j(A_i) = ∑_{i=1}^∞ μ(A_i).
Example 4.6. Suppose that X is a countable set and λ : X → [0, ∞] is a func-
tion. Let X = {x_n}_{n=1}^∞ be an enumeration of X and then we may define a
measure μ on 2^X by,

  μ = μ_λ := ∑_{n=1}^∞ λ(x_n) δ_{x_n}.

We will now show this measure is independent of our choice of enumeration of
X by showing,

  μ(A) = ∑_{x∈A} λ(x) := sup_{Λ ⊂⊂ A} ∑_{x∈Λ} λ(x)  ∀ A ⊂ X.   (4.5)

Here we are using the notation, Λ ⊂⊂ A, to indicate that Λ is a finite subset of A.

To verify Eq. (4.5), let M := sup_{Λ ⊂⊂ A} ∑_{x∈Λ} λ(x) and for each N ∈ N let

  Λ_N := {x_n : x_n ∈ A and 1 ≤ n ≤ N}.

Then by definition of μ,

  μ(A) = ∑_{n=1}^∞ λ(x_n) δ_{x_n}(A) = lim_{N→∞} ∑_{n=1}^N λ(x_n) 1_{x_n ∈ A}
       = lim_{N→∞} ∑_{x∈Λ_N} λ(x) ≤ M.

On the other hand if Λ ⊂⊂ A, then

  ∑_{x∈Λ} λ(x) = ∑_{n : x_n ∈ Λ} λ(x_n) = μ(Λ) ≤ μ(A)

from which it follows that M ≤ μ(A). This shows that μ is independent of how
we enumerate X.
The above example has a natural extension to the case where X is uncount-
able and λ : X → [0, ∞] is any function. In this setting we simply may define
μ : 2^X → [0, ∞] using Eq. (4.5). We leave it to the reader to verify that this is
indeed a measure on 2^X.
We will construct many more measures in Chapter 5 below. The starting
point of these constructions will be the construction of finitely additive measures
using the next proposition.
Proposition 4.7 (Construction of Finitely Additive Measures). Sup-
pose 𝒮 ⊂ 2^X is a semi-algebra (see Definition 3.24) and 𝒜 = 𝒜(𝒮) is the
algebra generated by 𝒮. Then every additive function μ : 𝒮 → [0, ∞] such that
μ(∅) = 0 extends uniquely to an additive measure (which we still denote by μ)
on 𝒜.
Proof. Since (by Proposition 3.25) every element A ∈ 𝒜 is of the form
A = ∐_i E_i for a finite collection of E_i ∈ 𝒮, it is clear that if μ extends to a
measure then the extension is unique and must be given by
  μ(A) = ∑_i μ(E_i).   (4.6)

To prove existence, the main point is to show that μ(A) in Eq. (4.6) is well
defined; i.e. if we also have A = ∐_j F_j with F_j ∈ 𝒮, then we must show

  ∑_i μ(E_i) = ∑_j μ(F_j).   (4.7)

But E_i = ∐_j (E_i ∩ F_j) and the additivity of μ on 𝒮 implies μ(E_i) = ∑_j μ(E_i ∩ F_j)
and hence

  ∑_i μ(E_i) = ∑_i ∑_j μ(E_i ∩ F_j) = ∑_{i,j} μ(E_i ∩ F_j).

Similarly,

  ∑_j μ(F_j) = ∑_{i,j} μ(E_i ∩ F_j)

which combined with the previous equation shows that Eq. (4.7) holds. It is
now easy to verify that μ extended to 𝒜 as in Eq. (4.6) is an additive measure
on 𝒜.
Proposition 4.8. Let X = R, 𝒮 be the semi-algebra,

  𝒮 = {(a, b] ∩ R : −∞ ≤ a ≤ b ≤ ∞},   (4.8)

and 𝒜 = 𝒜(𝒮) be the algebra formed by taking finite disjoint unions of elements
from 𝒮, see Proposition 3.25. To each finitely additive probability measure μ :
𝒜 → [0, ∞], there is a unique increasing function F : R̄ → [0, 1] such that
F(−∞) = 0, F(∞) = 1 and

  μ((a, b] ∩ R) = F(b) − F(a)  ∀ a ≤ b in R̄.   (4.9)

Conversely, given an increasing function F : R̄ → [0, 1] such that F(−∞) = 0,
F(∞) = 1 there is a unique finitely additive measure μ = μ_F on 𝒜 such that
the relation in Eq. (4.9) holds. (Eventually we will only be interested in the case
where F(−∞) = lim_{a→−∞} F(a) and F(∞) = lim_{b→∞} F(b).)
Proof. Given a finitely additive probability measure μ, let

  F(x) := μ((−∞, x] ∩ R) for all x ∈ R̄.

Then F(∞) = 1, F(−∞) = 0 and for b > a,

  F(b) − F(a) = μ((−∞, b] ∩ R) − μ((−∞, a]) = μ((a, b] ∩ R).

Conversely, suppose F : R̄ → [0, 1] as in the statement of the theorem is
given. Define μ on 𝒮 using the formula in Eq. (4.9). The argument will be
completed by showing μ is additive on 𝒮 and hence, by Proposition 4.7, has a
unique extension to a finitely additive measure on 𝒜. Suppose that

  (a, b] = ∐_{i=1}^n (a_i, b_i].

By reordering (a_i, b_i] if necessary, we may assume that

  a = a_1 < b_1 = a_2 < b_2 = a_3 < · · · < b_{n−1} = a_n < b_n = b.

Therefore, by the telescoping series argument,

  μ((a, b] ∩ R) = F(b) − F(a) = ∑_{i=1}^n [F(b_i) − F(a_i)] = ∑_{i=1}^n μ((a_i, b_i] ∩ R).
Remark 4.9. Suppose that F : R̄ → R̄ is any non-decreasing function such that
F(R) ⊂ R. Then the same methods used in the proof of Proposition 4.8 show
that there exists a unique finitely additive measure, μ = μ_F, on 𝒜 = 𝒜(𝒮) such
that Eq. (4.9) holds. If F(∞) > lim_{b→∞} F(b) and A_i = (i, i + 1] for i ∈ N, then

  ∑_{i=1}^∞ μ_F(A_i) = ∑_{i=1}^∞ (F(i + 1) − F(i)) = lim_{N→∞} ∑_{i=1}^N (F(i + 1) − F(i))
  = lim_{N→∞} (F(N + 1) − F(1)) < F(∞) − F(1) = μ_F(∐_{i=1}^∞ A_i).

This shows that strict inequality can hold in Eq. (4.4) and that μ_F is not
a premeasure. Similarly one shows μ_F is not a premeasure if F(−∞) <
lim_{a→−∞} F(a) or if F is not right continuous at some point a ∈ R. Indeed,
in the latter case consider

  (a, a + 1] = ∐_{n=1}^∞ (a + 1/(n + 1), a + 1/n].

Working as above we find,

  ∑_{n=1}^∞ μ_F((a + 1/(n + 1), a + 1/n]) = F(a + 1) − F(a+)

while μ_F((a, a + 1]) = F(a + 1) − F(a). We will eventually show in Chapter 5
below that μ_F extends uniquely to a σ-additive measure on ℬ_R whenever F
is increasing, right continuous, and F(±∞) = lim_{x→±∞} F(x).
Before constructing σ-additive measures (see Chapter 5 below), we are
going to pause to discuss a preliminary notion of integration and develop some
of its properties. Hopefully this will help the reader to develop the necessary
intuition before heading to the general theory. First we need to describe the
functions we are allowed to integrate.
4.2 Simple Random Variables
Definition 4.10 (Simple random variables). A function, f : Ω → Y is said
to be simple if f(Ω) ⊂ Y is a finite set. If 𝒜 ⊂ 2^Ω is an algebra, we say that a
simple function f : Ω → Y is measurable if {f = y} := f^{−1}({y}) ∈ 𝒜 for all
y ∈ Y. A measurable simple function, f : Ω → C, is called a simple random
variable relative to 𝒜.

Notation 4.11 Given an algebra, 𝒜 ⊂ 2^Ω, let S(𝒜) denote the collection of
simple random variables from Ω to C. For example if A ∈ 𝒜, then 1_A ∈ S(𝒜)
is a measurable simple function.
Lemma 4.12. Let 𝒜 ⊂ 2^Ω be an algebra, then;

1. S(𝒜) is a sub-algebra of all functions from Ω to C.
2. f : Ω → C is an 𝒜-simple random variable iff there exist λ_i ∈ C and
A_i ∈ 𝒜 for 1 ≤ i ≤ n for some n ∈ N such that

  f = ∑_{i=1}^n λ_i 1_{A_i}.   (4.10)

3. For any function, F : C → C, F ∘ f ∈ S(𝒜) for all f ∈ S(𝒜). In particular,
|f| ∈ S(𝒜) if f ∈ S(𝒜).

Proof. 1. Let us observe that 1_Ω = 1 and 1_∅ = 0 are in S(𝒜). If f, g ∈ S(𝒜)
and c ∈ C \ {0}, then

  {f + cg = λ} = ⋃_{a,b∈C : a+cb=λ} ({f = a} ∩ {g = b}) ∈ 𝒜   (4.11)

and

  {f · g = λ} = ⋃_{a,b∈C : ab=λ} ({f = a} ∩ {g = b}) ∈ 𝒜   (4.12)

from which it follows that f + cg and f · g are back in S(𝒜).

2. Since S(𝒜) is an algebra, every f of the form in Eq. (4.10) is in S(𝒜).
Conversely if f ∈ S(𝒜) it follows by definition that f = ∑_{λ∈f(Ω)} λ 1_{{f=λ}}
which is of the form in Eq. (4.10).

3. If F : C → C, then

  F ∘ f = ∑_{λ∈f(Ω)} F(λ) 1_{{f=λ}} ∈ S(𝒜).
Exercise 4.1 (𝒜-measurable simple functions). As in Example 3.19, let
𝒜 ⊂ 2^X be a finite algebra and B_1, . . . , B_k be the partition of X associated to
𝒜. Show that a function, f : X → C, is an 𝒜-simple function iff f is constant
on B_i for each i. Thus any 𝒜-simple function is of the form,

  f = ∑_{i=1}^k λ_i 1_{B_i}   (4.13)

for some λ_i ∈ C.
Corollary 4.13. Suppose that Λ is a finite set and Z : X → Λ is a function.
Let

  𝒜 := 𝒜(Z) := Z^{−1}(2^Λ) := {Z^{−1}(E) : E ⊂ Λ}.

Then 𝒜 is an algebra and f : X → C is an 𝒜-simple function iff f = F ∘ Z
for some function F : Λ → C.

Proof. For λ ∈ Λ, let

  A_λ := {Z = λ} = {x ∈ X : Z(x) = λ}.

Then {A_λ}_{λ∈Λ} is the partition of X determined by 𝒜. Therefore f is an 𝒜-
simple function iff f|_{A_λ} is constant for each λ ∈ Λ. Let us denote this constant
value by F(λ). As Z = λ on A_λ, F : Λ → C is a function such that f = F ∘ Z.
Conversely if F : Λ → C is a function and f = F ∘ Z, then f = F(λ) on A_λ,
i.e. f is an 𝒜-simple function.
4.2.1 The algebraic structure of simple functions*

Definition 4.14. A simple function algebra, S, is a subalgebra¹ of the
bounded complex functions on X such that 1 ∈ S and each function in S is
a simple function. If S is a simple function algebra, let

  𝒜(S) := {A ⊂ X : 1_A ∈ S}.

(It is easily checked that 𝒜(S) is a sub-algebra of 2^X.)

¹ To be more explicit we are assuming that S is a linear subspace of bounded functions
which is closed under pointwise multiplication.
Lemma 4.15. Suppose that S is a simple function algebra, f ∈ S and λ ∈ f(X),
the range of f. Then {f = λ} ∈ 𝒜(S).

Proof. Let {λ_i}_{i=0}^n be an enumeration of f(X) with λ_0 = λ. Then

  g := (∏_{i=1}^n (λ − λ_i))^{−1} ∏_{i=1}^n (f − λ_i 1) ∈ S.

Moreover, we see that g = 0 on ⋃_{i=1}^n {f = λ_i} while g = 1 on {f = λ}. So we
have shown g = 1_{{f=λ}} ∈ S and therefore that {f = λ} ∈ 𝒜(S).
Exercise 4.2. Continuing the notation introduced above:

1. Show 𝒜(S) is an algebra of sets.
2. Show S(𝒜) is a simple function algebra.
3. Show that the map

  𝒜 ∈ {algebras in 2^X} ↦ S(𝒜) ∈ {simple function algebras on X}

is bijective and the map, S ↦ 𝒜(S), is the inverse map.
Solution to Exercise (4.2).

1. Since 0 = 1_∅ and 1 = 1_X are in S, it follows that ∅ and X are in 𝒜(S). If A ∈ 𝒜(S),
then 1_{A^c} = 1 − 1_A ∈ S and so A^c ∈ 𝒜(S). Finally, if A, B ∈ 𝒜(S) then
1_{A∩B} = 1_A · 1_B ∈ S and thus A ∩ B ∈ 𝒜(S).

2. If f, g ∈ S(𝒜) and c ∈ F, then

  {f + cg = λ} = ⋃_{a,b∈F : a+cb=λ} ({f = a} ∩ {g = b}) ∈ 𝒜

and

  {f · g = λ} = ⋃_{a,b∈F : ab=λ} ({f = a} ∩ {g = b}) ∈ 𝒜

from which it follows that f + cg and f · g are back in S(𝒜).

3. If f : X → C is a simple function such that 1_{{f=λ}} ∈ S for all λ ∈ C,
then f = ∑_{λ∈C} λ 1_{{f=λ}} ∈ S. Conversely, by Lemma 4.15, if f ∈ S then
1_{{f=λ}} ∈ S for all λ ∈ C. Therefore, a simple function, f : X → C, is in S
iff 1_{{f=λ}} ∈ S for all λ ∈ C. With this preparation, we are now ready to
complete the verification.
First off,

  A ∈ 𝒜(S(𝒜)) ⟺ 1_A ∈ S(𝒜) ⟺ A ∈ 𝒜

which shows that 𝒜(S(𝒜)) = 𝒜. Similarly,

  f ∈ S(𝒜(S)) ⟺ {f = λ} ∈ 𝒜(S) ∀ λ ∈ C
             ⟺ 1_{{f=λ}} ∈ S ∀ λ ∈ C
             ⟺ f ∈ S

which shows S(𝒜(S)) = S.
4.3 Simple Integration
Definition 4.16 (Simple Integral). Suppose now that P is a finitely additive
probability measure on an algebra 𝒜 ⊂ 2^X. For f ∈ S(𝒜) the integral or
expectation, E(f) = E_P(f), is defined by

  E_P(f) = ∫_X f dP = ∑_{y∈C} y P(f = y).   (4.14)
Example 4.17. Suppose that A ∈ 𝒜, then

  E 1_A = 0 · P(A^c) + 1 · P(A) = P(A).   (4.15)
Remark 4.18. Let us recall that our intuitive notion of P(A) was given as in
Eq. (2.1) by

  P(A) = lim_{N→∞} (1/N) ∑_{k=1}^N 1_A(ω(k))

where ω(k) was the result of the k-th independent experiment. If we use
this interpretation back in Eq. (4.14) we arrive at,

  E(f) = ∑_{y∈C} y P(f = y) = ∑_{y∈C} y · lim_{N→∞} (1/N) ∑_{k=1}^N 1_{f(ω(k))=y}
  = lim_{N→∞} (1/N) ∑_{y∈C} y ∑_{k=1}^N 1_{f(ω(k))=y}
  = lim_{N→∞} (1/N) ∑_{k=1}^N ∑_{y∈C} f(ω(k)) 1_{f(ω(k))=y}
  = lim_{N→∞} (1/N) ∑_{k=1}^N f(ω(k)).

Thus informally, E f should represent the limiting average of the values of f
over many independent experiments. We will come back to this later when
we study the strong law of large numbers.
Proposition 4.19. The expectation operator, E = E_P : S(𝒜) → C, satisfies:

1. If f ∈ S(𝒜) and λ ∈ C, then

  E(λf) = λE(f).   (4.16)

2. If f, g ∈ S(𝒜), then

  E(f + g) = E(g) + E(f).   (4.17)

Items 1. and 2. say that E(·) is a linear functional on S(𝒜).

3. If f = ∑_{j=1}^N λ_j 1_{A_j} for some λ_j ∈ C and some A_j ∈ 𝒜, then

  E(f) = ∑_{j=1}^N λ_j P(A_j).   (4.18)

4. E is positive, i.e. E(f) ≥ 0 for all 0 ≤ f ∈ S(𝒜). More generally, if
f, g ∈ S(𝒜) and f ≤ g, then E(f) ≤ E(g).

5. For all f ∈ S(𝒜),

  |Ef| ≤ E|f|.   (4.19)
Proof.

1. If λ ≠ 0, then

  E(λf) = ∑_{y∈C} y P(λf = y) = ∑_{y∈C} y P(f = y/λ)
        = ∑_{z∈C} λz P(f = z) = λE(f).

The case λ = 0 is trivial.

2. Writing {f = a, g = b} for f^{−1}({a}) ∩ g^{−1}({b}), then

  E(f + g) = ∑_{z∈C} z P(f + g = z)
  = ∑_{z∈C} z P(⋃_{a+b=z} {f = a, g = b})
  = ∑_{z∈C} z ∑_{a+b=z} P(f = a, g = b)
  = ∑_{z∈C} ∑_{a+b=z} (a + b) P(f = a, g = b)
  = ∑_{a,b} (a + b) P(f = a, g = b).

But

  ∑_{a,b} a P(f = a, g = b) = ∑_a a ∑_b P(f = a, g = b)
  = ∑_a a P(⋃_b {f = a, g = b})
  = ∑_a a P(f = a) = Ef

and similarly,

  ∑_{a,b} b P(f = a, g = b) = Eg.

Equation (4.17) is now a consequence of the last three displayed equations.

3. If f = ∑_{j=1}^N λ_j 1_{A_j}, then

  Ef = E(∑_{j=1}^N λ_j 1_{A_j}) = ∑_{j=1}^N λ_j E1_{A_j} = ∑_{j=1}^N λ_j P(A_j).

4. If f ≥ 0 then

  E(f) = ∑_{a≥0} a P(f = a) ≥ 0

and if f ≤ g, then g − f ≥ 0 so that

  E(g) − E(f) = E(g − f) ≥ 0.

5. By the triangle inequality,

  |Ef| = |∑_{λ∈C} λ P(f = λ)| ≤ ∑_{λ∈C} |λ| P(f = λ) = E|f|,

wherein the last equality we have used Eq. (4.18) and the fact that |f| =
∑_{λ∈C} |λ| 1_{{f=λ}}.
Remark 4.20. If Ω is a finite set and 𝒜 = 2^Ω, then

  f(·) = ∑_{ω∈Ω} f(ω) 1_{{ω}}

and hence

  E_P f = ∑_{ω∈Ω} f(ω) P({ω}).
Remark 4.21. All of the results in Proposition 4.19 and Remark 4.20 remain
valid when P is replaced by a finite measure, μ : 𝒜 → [0, ∞), i.e. it is enough
to assume μ(X) < ∞.
Exercise 4.3. Let P be a finitely additive probability measure on an algebra
𝒜 ⊂ 2^X and for A, B ∈ 𝒜 let ρ(A, B) := P(A △ B) where A △ B = (A \ B) ∪
(B \ A). Show;

1. ρ(A, B) = E|1_A − 1_B| and then use this (or not) to show
2. ρ(A, C) ≤ ρ(A, B) + ρ(B, C) for all A, B, C ∈ 𝒜.

Remark: it is now easy to see that ρ : 𝒜 × 𝒜 → [0, 1] satisfies the axioms of
a metric except for the condition that ρ(A, B) = 0 does not imply that A = B
but only that A = B modulo a set of probability zero.
Remark 4.22 (Chebyshev's Inequality). Suppose that f ∈ S(𝒜), ε > 0, and
p > 0, then

  1_{|f|≥ε} ≤ (|f|^p / ε^p) 1_{|f|≥ε} ≤ ε^{−p} |f|^p

and therefore, see item 4. of Proposition 4.19,

  P(|f| ≥ ε) = E[1_{|f|≥ε}] ≤ E[(|f|^p / ε^p) 1_{|f|≥ε}] ≤ ε^{−p} E|f|^p.   (4.20)

Observe that

  |f|^p = ∑_{λ∈C} |λ|^p 1_{{f=λ}}

is a simple random variable and {|f| ≥ ε} = ⋃_{|λ|≥ε} {f = λ} ∈ 𝒜 as well.
Therefore, (|f|^p / ε^p) 1_{|f|≥ε} is still a simple random variable.
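On a finite probability space the inequality (4.20) can be verified exhaustively. A quick sketch (the uniform space and the function f below are arbitrary choices):

```python
from fractions import Fraction

# Uniform probability on a small sample space and a simple function f.
Omega = list(range(10))
P = {w: Fraction(1, 10) for w in Omega}
f = {w: w - 4 for w in Omega}                      # values -4, ..., 5

def prob(event):
    return sum(P[w] for w in Omega if event(w))

def E(g):
    return sum(g(w) * P[w] for w in Omega)

# Chebyshev/Markov, Eq. (4.20): P(|f| >= eps) <= eps^{-p} E|f|^p.
for eps in (1, 2, 3, 4):
    for p in (1, 2):
        lhs = prob(lambda w: abs(f[w]) >= eps)
        rhs = Fraction(1, eps ** p) * E(lambda w: abs(f[w]) ** p)
        assert lhs <= rhs
print("Eq. (4.20) verified for this example")
```

For instance, with ε = 3 and p = 1 the left side is 1/2 while the Markov bound E|f|/3 = 5/6, illustrating how slack the inequality can be.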
Lemma 4.23 (Inclusion Exclusion Formula). If A_n ∈ 𝒜 for n =
1, 2, . . . , M such that μ(⋃_{n=1}^M A_n) < ∞, then

  μ(⋃_{n=1}^M A_n) = ∑_{k=1}^M (−1)^{k+1} ∑_{1≤n_1<n_2<···<n_k≤M} μ(A_{n_1} ∩ · · · ∩ A_{n_k}).   (4.21)
Proof. This may be proved inductively from Eq. (4.2). We will give a dif-
ferent and perhaps more illuminating proof here. Let A :=
M
n=1
A
n
.
Since A
c
=
_

M
n=1
A
n
_
c
=
M
n=1
A
c
n
, we have
1 1
A
= 1
A
c =
M

n=1
1
A
c
n
=
M

n=1
(1 1
An
)
= 1 +
M

k=1
(1)
k

1n1<n2<<nkM
1
An
1
1
An
k
= 1 +
M

k=1
(1)
k

1n1<n2<<nkM
1
An
1
An
k
from which it follows that
1

M
n=1
An
= 1
A
=
M

k=1
(1)
k+1

1n1<n2<<nkM
1
An
1
An
k
. (4.22)
Integrating this identity with respect to gives Eq. (4.21).
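As a quick numerical illustration of Eq. (4.21) (our own example, with $\mu$ taken to be counting measure on a small finite universe and three arbitrarily chosen sets):

```python
from itertools import combinations

# Three arbitrary finite sets; mu = counting measure, so mu(A) = len(A).
A = [set(range(0, 6)), set(range(4, 9)), {0, 8, 9}]
M = len(A)

lhs = len(set().union(*A))  # mu of the union

# Right side of Eq. (4.21): alternating sum over k-fold intersections.
rhs = sum(
    (-1) ** (k + 1)
    * sum(len(set.intersection(*combo)) for combo in combinations(A, k))
    for k in range(1, M + 1)
)
```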
Remark 4.24. The following identity holds even when $\mu\big( \bigcup_{n=1}^{M} A_n \big) = \infty$:
\begin{align*}
\mu\Big( \bigcup_{n=1}^{M} A_n \Big) &+ \sum_{\substack{k=2 \\ k \text{ even}}}^{M} \; \sum_{1 \le n_1 < n_2 < \dots < n_k \le M} \mu\left( A_{n_1} \cap \dots \cap A_{n_k} \right) \\
&= \sum_{\substack{k=1 \\ k \text{ odd}}}^{M} \; \sum_{1 \le n_1 < n_2 < \dots < n_k \le M} \mu\left( A_{n_1} \cap \dots \cap A_{n_k} \right). \tag{4.23}
\end{align*}
This can be proved by moving every term with a negative sign on the right side of Eq. (4.22) to the left side and then integrating the resulting identity. Alternatively, Eq. (4.23) follows directly from Eq. (4.21) if $\mu\big( \bigcup_{n=1}^{M} A_n \big) < \infty$, and when $\mu\big( \bigcup_{n=1}^{M} A_n \big) = \infty$ one easily verifies that both sides of Eq. (4.23) are infinite.
To better understand Eq. (4.22), consider the case $M = 3$ where,
\begin{align*}
1 - 1_A &= (1 - 1_{A_1})(1 - 1_{A_2})(1 - 1_{A_3}) \\
&= 1 - (1_{A_1} + 1_{A_2} + 1_{A_3}) \\
&\quad + 1_{A_1} 1_{A_2} + 1_{A_1} 1_{A_3} + 1_{A_2} 1_{A_3} - 1_{A_1} 1_{A_2} 1_{A_3}
\end{align*}
so that
\[
1_{A_1 \cup A_2 \cup A_3} = 1_{A_1} + 1_{A_2} + 1_{A_3} - \left( 1_{A_1 \cap A_2} + 1_{A_1 \cap A_3} + 1_{A_2 \cap A_3} \right) + 1_{A_1 \cap A_2 \cap A_3}.
\]
Here is an alternate proof of Eq. (4.22). Let $\omega \in \Omega$ and, by relabeling the sets $A_n$ if necessary, we may assume that $\omega \in A_1 \cap \dots \cap A_m$ and $\omega \notin A_{m+1} \cup \dots \cup A_M$ for some $0 \le m \le M$. (When $m = 0$, both sides of Eq. (4.22) are zero and so we will only consider the case where $1 \le m \le M$.) With this notation we have
\begin{align*}
\sum_{k=1}^{M} (-1)^{k+1} \sum_{1 \le n_1 < n_2 < \dots < n_k \le M} 1_{A_{n_1} \cap \dots \cap A_{n_k}}(\omega)
&= \sum_{k=1}^{m} (-1)^{k+1} \sum_{1 \le n_1 < n_2 < \dots < n_k \le m} 1_{A_{n_1} \cap \dots \cap A_{n_k}}(\omega) \\
&= \sum_{k=1}^{m} (-1)^{k+1} \binom{m}{k} = 1 - \sum_{k=0}^{m} (-1)^{k} \binom{m}{k} \\
&= 1 - (1 - 1)^{m} = 1.
\end{align*}
This verifies Eq. (4.22) since $1_{\bigcup_{n=1}^{M} A_n}(\omega) = 1$.
Example 4.25 (Coincidences). Let $\Omega$ be the set of permutations (think of card shuffling), $\sigma : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}$, and define $P(A) := \frac{\#(A)}{n!}$ to be the uniform distribution (Haar measure) on $\Omega$. We wish to compute the probability of the event, $B$, that a random permutation fixes some index $i$. To do this, let $A_i := \{ \sigma \in \Omega : \sigma(i) = i \}$ and observe that $B = \bigcup_{i=1}^{n} A_i$. So by the Inclusion Exclusion Formula, we have
\[
P(B) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < i_2 < \dots < i_k \le n} P\left( A_{i_1} \cap \dots \cap A_{i_k} \right).
\]
Since
\[
P\left( A_{i_1} \cap \dots \cap A_{i_k} \right) = P\left( \{ \sigma : \sigma(i_1) = i_1, \dots, \sigma(i_k) = i_k \} \right) = \frac{(n-k)!}{n!}
\]
and
\[
\#\{ 1 \le i_1 < i_2 < i_3 < \dots < i_k \le n \} = \binom{n}{k},
\]
we find
\[
P(B) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \frac{(n-k)!}{n!} = \sum_{k=1}^{n} (-1)^{k+1} \frac{1}{k!}. \tag{4.24}
\]
For large $n$ this gives,
\[
P(B) = -\sum_{k=1}^{n} \frac{(-1)^{k}}{k!} \cong 1 - \sum_{k=0}^{\infty} \frac{(-1)^{k}}{k!} = 1 - e^{-1} \cong 0.632.
\]
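Eq. (4.24) can be confirmed by brute-force enumeration of permutations; the check below is ours and the value of $n$ is an arbitrary choice.

```python
from itertools import permutations
from math import exp, factorial

n = 6
with_fixed_point = sum(
    any(s[i] == i for i in range(n)) for s in permutations(range(n))
)
p_exact = with_fixed_point / factorial(n)

# Eq. (4.24): P(B) = sum_{k=1}^n (-1)^{k+1}/k!
p_formula = sum((-1) ** (k + 1) / factorial(k) for k in range(1, n + 1))
```

Already for $n = 6$ the two quantities agree (to rounding) and both are within $2 \times 10^{-4}$ of $1 - e^{-1} \cong 0.632$.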
Example 4.26 (Expected number of coincidences). Continuing the notation in Example 4.25, we now wish to compute the expected number of fixed points of a random permutation, $\sigma$, i.e. how many cards in the shuffled stack have not moved on average. To this end, let $X_i = 1_{A_i}$ and observe that
\[
N(\sigma) = \sum_{i=1}^{n} X_i(\sigma) = \sum_{i=1}^{n} 1_{\sigma(i) = i} = \#\{ i : \sigma(i) = i \}
\]
denotes the number of fixed points of $\sigma$. Hence we have
\[
E N = \sum_{i=1}^{n} E X_i = \sum_{i=1}^{n} P(A_i) = \sum_{i=1}^{n} \frac{(n-1)!}{n!} = 1.
\]
Let us check the above formulas when $n = 3$. In this case we have
\[
\begin{array}{c|c}
\sigma & N(\sigma) \\ \hline
(1\,2\,3) & 3 \\
(1\,3\,2) & 1 \\
(2\,1\,3) & 1 \\
(2\,3\,1) & 0 \\
(3\,1\,2) & 0 \\
(3\,2\,1) & 1
\end{array}
\]
and so
\[
P(\exists \text{ a fixed point}) = \frac{4}{6} = \frac{2}{3} \cong 0.67 \quad (\text{compare with } 1 - e^{-1} \cong 0.632)
\]
while
\[
\sum_{k=1}^{3} (-1)^{k+1} \frac{1}{k!} = 1 - \frac{1}{2} + \frac{1}{6} = \frac{2}{3}
\]
and
\[
E N = \frac{1}{6}(3 + 1 + 1 + 0 + 0 + 1) = 1.
\]
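The computations above (both $EN = 1$ and the $n = 3$ table) can be replicated by enumeration; this check is our addition.

```python
from itertools import permutations
from math import factorial

def expected_fixed_points(n):
    """E N computed by brute force over all n! permutations."""
    total = sum(
        sum(s[i] == i for i in range(n)) for s in permutations(range(n))
    )
    return total / factorial(n)

def prob_some_fixed_point(n):
    """P(exists a fixed point), by brute force."""
    hits = sum(any(s[i] == i for i in range(n)) for s in permutations(range(n)))
    return hits / factorial(n)
```

For every $n$ the expected number of fixed points comes out to exactly $1$, and for $n = 3$ the probability of a fixed point is $4/6 = 2/3$, as in the table.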
The next three problems generalize the results above. The following notation will be used throughout these exercises;

1. $(\Omega, \mathcal{A}, P)$ is a finitely additive probability space, so $P(\Omega) = 1$,
2. $A_i \in \mathcal{A}$ for $i = 1, 2, \dots, n$,
3. $N(\omega) := \sum_{i=1}^{n} 1_{A_i}(\omega) = \#\{ i : \omega \in A_i \}$, and
4. $\{ S_k \}_{k=1}^{n}$ are given by
\[
S_k := \sum_{1 \le i_1 < \dots < i_k \le n} P\left( A_{i_1} \cap \dots \cap A_{i_k} \right)
= \sum_{\Lambda \subset \{1, 2, \dots, n\},\, \#\Lambda = k} P\Big( \bigcap_{i \in \Lambda} A_i \Big).
\]
Exercise 4.4. For $1 \le k \le n$, show;

1. (as functions on $\Omega$) that
\[
\binom{N}{k} = \sum_{\Lambda \subset \{1, 2, \dots, n\},\, \#\Lambda = k} 1_{\bigcap_{i \in \Lambda} A_i}, \tag{4.25}
\]
where by definition
\[
\binom{m}{k} =
\begin{cases}
0 & \text{if } k > m \\
\frac{m!}{k!\,(m-k)!} & \text{if } 1 \le k \le m \\
1 & \text{if } k = 0
\end{cases}. \tag{4.26}
\]
2. Conclude from Eq. (4.25) that for all $z \in \mathbb{C}$,
\[
(1 + z)^{N} = 1 + \sum_{k=1}^{n} z^{k} \sum_{1 \le i_1 < i_2 < \dots < i_k \le n} 1_{A_{i_1} \cap \dots \cap A_{i_k}} \tag{4.27}
\]
provided $(1 + z)^{0} = 1$ even when $z = -1$.
3. Conclude from Eq. (4.25) that $S_k = E_P \binom{N}{k}$.
Exercise 4.5. Taking expectations of Eq. (4.27) implies,
\[
E\left[ (1 + z)^{N} \right] = 1 + \sum_{k=1}^{n} S_k\, z^{k}. \tag{4.28}
\]
Show that setting $z = -1$ in Eq. (4.28) gives another proof of the inclusion exclusion formula. Hint: use the definition of the expectation to write out $E\left[ (1 + z)^{N} \right]$ explicitly.
Exercise 4.6. Let $1 \le m \le n$. In this problem you are asked to compute the probability that there are exactly $m$ coincidences. Namely you should show,
\begin{align*}
P(N = m) &= \sum_{k=m}^{n} (-1)^{k-m} \binom{k}{m} S_k \\
&= \sum_{k=m}^{n} (-1)^{k-m} \binom{k}{m} \sum_{1 \le i_1 < \dots < i_k \le n} P\left( A_{i_1} \cap \dots \cap A_{i_k} \right).
\end{align*}
Hint: differentiate Eq. (4.28) $m$ times with respect to $z$ and then evaluate the result at $z = -1$. In order to do this you will find it useful to derive formulas for;
\[
\frac{d^{m}}{dz^{m}}\Big|_{z=-1} (1 + z)^{n} \quad \text{and} \quad \frac{d^{m}}{dz^{m}}\Big|_{z=-1} z^{k}.
\]
Example 4.27. Let us again go back to Example 4.26 where we computed,
\[
S_k = \binom{n}{k} \frac{(n-k)!}{n!} = \frac{1}{k!}.
\]
Therefore it follows from Exercise 4.6 that
\begin{align*}
P(\text{exactly } m \text{ fixed points}) = P(N = m)
&= \sum_{k=m}^{n} (-1)^{k-m} \binom{k}{m} \frac{1}{k!} \\
&= \frac{1}{m!} \sum_{k=m}^{n} (-1)^{k-m} \frac{1}{(k-m)!}.
\end{align*}
So if $n$ is much bigger than $m$ we may conclude that
\[
P(\text{exactly } m \text{ fixed points}) \cong \frac{1}{m!}\, e^{-1}.
\]
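The approximation $P(N = m) \cong e^{-1}/m!$ (a Poisson(1) law for the number of fixed points) can be checked against exact enumeration; the code and the choice $n = 7$ are ours.

```python
from itertools import permutations
from math import comb, exp, factorial

def p_exact(n, m):
    """P(N = m) by enumerating all permutations of {0, ..., n-1}."""
    hits = sum(
        sum(s[i] == i for i in range(n)) == m for s in permutations(range(n))
    )
    return hits / factorial(n)

def p_formula(n, m):
    """Formula from Exercise 4.6 with S_k = 1/k!."""
    return sum(
        (-1) ** (k - m) * comb(k, m) / factorial(k) for k in range(m, n + 1)
    )
```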
Let us check our results are consistent with Eq. (4.24);
\begin{align*}
P(\exists \text{ a fixed point}) &= \sum_{m=1}^{n} P(N = m)
= \sum_{m=1}^{n} \sum_{k=m}^{n} (-1)^{k-m} \binom{k}{m} \frac{1}{k!} \\
&= \sum_{1 \le m \le k \le n} (-1)^{k-m} \binom{k}{m} \frac{1}{k!}
= \sum_{k=1}^{n} \sum_{m=1}^{k} (-1)^{k-m} \binom{k}{m} \frac{1}{k!} \\
&= \sum_{k=1}^{n} \left[ \sum_{m=0}^{k} (-1)^{k-m} \binom{k}{m} - (-1)^{k} \right] \frac{1}{k!}
= -\sum_{k=1}^{n} (-1)^{k} \frac{1}{k!}
\end{align*}
wherein we have used,
\[
\sum_{m=0}^{k} (-1)^{k-m} \binom{k}{m} = (1 - 1)^{k} = 0.
\]
4.3.1 Appendix: Bonferroni Inequalities

In this appendix (see Feller Volume 1., p. 106-111 for more) we want to discuss what happens if we truncate the sums in the inclusion exclusion formula of Lemma 4.23. In order to do this we will need the following lemma whose combinatorial meaning was explained to me by Jeff Remmel.

Lemma 4.28. Let $n \in \mathbb{N}_0$ and $0 \le k \le n$, then
\[
\sum_{l=0}^{k} (-1)^{l} \binom{n}{l} = (-1)^{k} \binom{n-1}{k} 1_{n > 0} + 1_{n = 0}. \tag{4.29}
\]
Proof. The case $n = 0$ is trivial. We give two proofs for when $n \in \mathbb{N}$.

First proof. Just use induction on $k$. When $k = 0$, Eq. (4.29) holds since $1 = 1$. The induction step is as follows,
\begin{align*}
\sum_{l=0}^{k+1} (-1)^{l} \binom{n}{l}
&= (-1)^{k} \binom{n-1}{k} + (-1)^{k+1} \binom{n}{k+1} \\
&= \frac{(-1)^{k+1}}{(k+1)!} \left[ n (n-1) \dots (n-k) - (k+1)(n-1) \dots (n-k) \right] \\
&= \frac{(-1)^{k+1}}{(k+1)!} \left[ (n-1) \dots (n-k)\left( n - (k+1) \right) \right] = (-1)^{k+1} \binom{n-1}{k+1}.
\end{align*}
Second proof. Let $X = \{1, 2, \dots, n\}$ and observe that
\[
m_k := \sum_{l=0}^{k} (-1)^{l} \binom{n}{l}
= \sum_{l=0}^{k} (-1)^{l}\, \#\left\{ \Lambda \in 2^{X} : \#(\Lambda) = l \right\}
= \sum_{\Lambda \in 2^{X} :\, \#(\Lambda) \le k} (-1)^{\#(\Lambda)}. \tag{4.30}
\]
Define $T : 2^{X} \to 2^{X}$ by
\[
T(\Lambda) =
\begin{cases}
\Lambda \cup \{1\} & \text{if } 1 \notin \Lambda \\
\Lambda \setminus \{1\} & \text{if } 1 \in \Lambda
\end{cases}.
\]
Observe that $T$ is a bijection of $2^{X}$ such that $T$ takes even cardinality sets to odd cardinality sets and vice versa. Moreover, if we let
\[
\Gamma_k := \left\{ \Lambda \in 2^{X} : \#(\Lambda) \le k \text{ and } 1 \in \Lambda \text{ if } \#(\Lambda) = k \right\},
\]
then $T(\Gamma_k) = \Gamma_k$ for all $1 \le k \le n$. Since
\[
\sum_{\Lambda \in \Gamma_k} (-1)^{\#(\Lambda)} = \sum_{\Lambda \in \Gamma_k} (-1)^{\#(T(\Lambda))} = -\sum_{\Lambda \in \Gamma_k} (-1)^{\#(\Lambda)},
\]
we see that $\sum_{\Lambda \in \Gamma_k} (-1)^{\#(\Lambda)} = 0$. Using this observation with Eq. (4.30) implies
\[
m_k = \sum_{\Lambda \in \Gamma_k} (-1)^{\#(\Lambda)} + \sum_{\#(\Lambda) = k \,\&\, 1 \notin \Lambda} (-1)^{\#(\Lambda)} = 0 + (-1)^{k} \binom{n-1}{k}.
\]
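Eq. (4.29) is also simple to verify exhaustively for small $n$ (a check we add here):

```python
from math import comb

def lhs(n, k):
    return sum((-1) ** l * comb(n, l) for l in range(k + 1))

def rhs(n, k):
    # (-1)^k C(n-1, k) 1_{n>0} + 1_{n=0}; math.comb(n-1, k) is 0 when k > n-1.
    return (-1) ** k * comb(n - 1, k) if n > 0 else 1

ok = all(lhs(n, k) == rhs(n, k) for n in range(9) for k in range(n + 1))
```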
Corollary 4.29 (Bonferroni Inequalities). Let $\mu : \mathcal{A} \to [0, \mu(X)]$ be a finitely additive finite measure on $\mathcal{A} \subset 2^{X}$, $A_n \in \mathcal{A}$ for $n = 1, 2, \dots, M$, $N := \sum_{n=1}^{M} 1_{A_n}$, and
\[
S_k := \sum_{1 \le i_1 < \dots < i_k \le M} \mu\left( A_{i_1} \cap \dots \cap A_{i_k} \right) = E_{\mu} \binom{N}{k}.
\]
Then for $1 \le k \le M$,
\[
\mu\Big( \bigcup_{n=1}^{M} A_n \Big) = \sum_{l=1}^{k} (-1)^{l+1} S_l + (-1)^{k}\, E_{\mu} \binom{N-1}{k}. \tag{4.31}
\]
This leads to the Bonferroni inequalities;
\[
\mu\Big( \bigcup_{n=1}^{M} A_n \Big) \le \sum_{l=1}^{k} (-1)^{l+1} S_l \quad \text{if } k \text{ is odd}
\]
and
\[
\mu\Big( \bigcup_{n=1}^{M} A_n \Big) \ge \sum_{l=1}^{k} (-1)^{l+1} S_l \quad \text{if } k \text{ is even}.
\]
Proof. By Lemma 4.28,
\[
\sum_{l=0}^{k} (-1)^{l} \binom{N}{l} = (-1)^{k} \binom{N-1}{k} 1_{N > 0} + 1_{N = 0}.
\]
Therefore integrating this equation with respect to $\mu$ gives,
\[
\mu(X) + \sum_{l=1}^{k} (-1)^{l} S_l = \mu(N = 0) + (-1)^{k}\, E_{\mu} \binom{N-1}{k}
\]
and therefore,
\begin{align*}
\mu\Big( \bigcup_{n=1}^{M} A_n \Big) = \mu(N > 0) &= \mu(X) - \mu(N = 0) \\
&= -\sum_{l=1}^{k} (-1)^{l} S_l + (-1)^{k}\, E_{\mu} \binom{N-1}{k}.
\end{align*}
The Bonferroni inequalities are a simple consequence of Eq. (4.31) and the fact that
\[
\binom{N-1}{k} \ge 0 \implies E_{\mu} \binom{N-1}{k} \ge 0.
\]
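The alternating upper/lower bounds are easy to see numerically with $\mu$ equal to counting measure; the sets below are arbitrary choices of ours.

```python
from itertools import combinations

# mu = counting measure on a small universe; four arbitrary sets.
A = [set(range(0, 6)), set(range(4, 9)), {0, 8, 9}, {2, 3, 4}]
M = len(A)
exact = len(set().union(*A))

def S(k):
    return sum(len(set.intersection(*c)) for c in combinations(A, k))

# Truncated inclusion-exclusion sums for k = 1, ..., M.
partial = [
    sum((-1) ** (l + 1) * S(l) for l in range(1, k + 1)) for k in range(1, M + 1)
]
```

Odd truncations over-count the union, even truncations under-count it, and $k = M$ recovers Eq. (4.21) exactly.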
4.3.2 Appendix: Riemann Stieltjes integral

In this subsection, let $X$ be a set, $\mathcal{A} \subset 2^{X}$ be an algebra of sets, and $P := \mu : \mathcal{A} \to [0, \infty)$ be a finitely additive measure with $\mu(X) < \infty$. As above let
\[
E_{\mu} f := \int_X f\, d\mu := \sum_{\lambda \in \mathbb{C}} \lambda\, \mu(f = \lambda) \quad \forall\, f \in S(\mathcal{A}). \tag{4.32}
\]

Notation 4.30 For any function, $f : X \to \mathbb{C}$, let $\|f\|_u := \sup_{x \in X} |f(x)|$. Further, let $\bar{S} := \bar{S}(\mathcal{A})$ denote those functions, $f : X \to \mathbb{C}$, such that there exists $f_n \in S(\mathcal{A})$ such that $\lim_{n \to \infty} \|f - f_n\|_u = 0$.
Exercise 4.7. Prove the following statements.

1. For all $f \in S(\mathcal{A})$,
\[
|E_{\mu} f| \le \mu(X)\, \|f\|_u. \tag{4.33}
\]
2. If $f \in \bar{S}$ and $f_n \in S := S(\mathcal{A})$ such that $\lim_{n \to \infty} \|f - f_n\|_u = 0$, show $\lim_{n \to \infty} E_{\mu} f_n$ exists. Also show that defining $E_{\mu} f := \lim_{n \to \infty} E_{\mu} f_n$ is well defined, i.e. you must show that $\lim_{n \to \infty} E_{\mu} f_n = \lim_{n \to \infty} E_{\mu} g_n$ if $g_n \in S$ such that $\lim_{n \to \infty} \|f - g_n\|_u = 0$.
3. Show $E_{\mu} : \bar{S} \to \mathbb{C}$ is still linear and still satisfies Eq. (4.33).
4. Show $|f| \in \bar{S}$ if $f \in \bar{S}$ and that Eq. (4.19) is still valid, i.e. $|E_{\mu} f| \le E_{\mu} |f|$ for all $f \in \bar{S}$.

Let us now specialize the above results to the case where $X = [0, T]$ for some $T < \infty$. Let $\mathcal{S} := \{ (a, b] : 0 \le a \le b \le T \} \cup \{ \{0\} \}$, which is easily seen to be a semi-algebra. The following proposition is fairly straightforward and will be left to the reader.
Proposition 4.31 (Riemann Stieltjes integral). Let $F : [0, T] \to \mathbb{R}$ be an increasing function, then;

1. there exists a unique finitely additive measure, $\mu_F$, on $\mathcal{A} := \mathcal{A}(\mathcal{S})$ such that $\mu_F((a, b]) = F(b) - F(a)$ for all $0 \le a \le b \le T$ and $\mu_F(\{0\}) = 0$. (In fact one could allow for $\mu_F(\{0\}) = \lambda$ for any $\lambda \ge 0$, but we would then have to write $\mu_{F, \lambda}$ rather than $\mu_F$.)
2. Show $C([0, T], \mathbb{C}) \subset \bar{S}(\mathcal{A})$. More precisely, suppose $\pi := \{ 0 = t_0 < t_1 < \dots < t_n = T \}$ is a partition of $[0, T]$ and $c = (c_1, \dots, c_n) \in [0, T]^n$ with $t_{i-1} \le c_i \le t_i$ for each $i$. Then for $f \in C([0, T], \mathbb{C})$, let
\[
f_{\pi, c} := f(0)\, 1_{\{0\}} + \sum_{i=1}^{n} f(c_i)\, 1_{(t_{i-1}, t_i]}. \tag{4.34}
\]
Show that $\|f - f_{\pi, c}\|_u$ is small provided $|\pi| := \max\{ |t_i - t_{i-1}| : i = 1, 2, \dots, n \}$ is small.
3. Using the above results, show
\[
\int_{[0, T]} f\, d\mu_F = \lim_{|\pi| \to 0} \sum_{i=1}^{n} f(c_i)\left( F(t_i) - F(t_{i-1}) \right)
\]
where the $c_i$ may be chosen arbitrarily subject to the constraint that $t_{i-1} \le c_i \le t_i$.

It is customary to write $\int_0^T f\, dF$ for $\int_{[0, T]} f\, d\mu_F$. This integral satisfies the estimates,
\[
\left| \int_{[0, T]} f\, d\mu_F \right| \le \int_{[0, T]} |f|\, d\mu_F \le \|f\|_u \left( F(T) - F(0) \right) \quad \forall\, f \in \bar{S}(\mathcal{A}).
\]
When $F(t) = t$,
\[
\int_0^T f\, dF = \int_0^T f(t)\, dt
\]
is the usual Riemann integral.
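Item 3 can be illustrated numerically. Here we take $F(t) = t^2$ on $[0, 1]$ and $f(t) = t$, for which $\int_0^1 t\, dF = \int_0^1 2 t^2\, dt = 2/3$; the uniform partition and the right-endpoint tags $c_i = t_i$ are our choices, not requirements.

```python
def riemann_stieltjes(f, F, T, n):
    """Sum of f(c_i)(F(t_i) - F(t_{i-1})) over a uniform partition, c_i = t_i."""
    ts = [T * i / n for i in range(n + 1)]
    return sum(f(ts[i]) * (F(ts[i]) - F(ts[i - 1])) for i in range(1, n + 1))

approx = riemann_stieltjes(lambda t: t, lambda t: t * t, 1.0, 100_000)
```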
Exercise 4.8. Let $a \in (0, T)$, $\lambda > 0$, and
\[
G(x) = \lambda\, 1_{x \ge a} =
\begin{cases}
\lambda & \text{if } x \ge a \\
0 & \text{if } x < a
\end{cases}.
\]

1. Explicitly compute $\int_{[0, T]} f\, d\mu_G$ for all $f \in C([0, T], \mathbb{C})$.
2. If $F(x) = x + \lambda\, 1_{x \ge a}$, describe $\int_{[0, T]} f\, d\mu_F$ for all $f \in C([0, T], \mathbb{C})$. Hint: if $F(x) = G(x) + H(x)$ where $G$ and $H$ are two increasing functions on $[0, T]$, show
\[
\int_{[0, T]} f\, d\mu_F = \int_{[0, T]} f\, d\mu_G + \int_{[0, T]} f\, d\mu_H.
\]
Exercise 4.9. Suppose that $F, G : [0, T] \to \mathbb{R}$ are two increasing functions such that $F(0) = G(0)$, $F(T) = G(T)$, and $F(x) \ne G(x)$ for at most countably many points, $x \in (0, T)$. Show
\[
\int_{[0, T]} f\, d\mu_F = \int_{[0, T]} f\, d\mu_G \quad \text{for all } f \in C([0, T], \mathbb{C}). \tag{4.35}
\]
Note well, given $F(0) = G(0)$, $\mu_F = \mu_G$ on $\mathcal{A}$ iff $F = G$.

One of the points of the previous exercise is to show that Eq. (4.35) holds when $G(x) := F(x+)$, the right continuous version of $F$. The exercise applies since an increasing function can have at most countably many jumps, see Remark 21.16. So if we only want to integrate continuous functions, we may always assume that $F : [0, T] \to \mathbb{R}$ is right continuous.
4.4 Simple Independence and the Weak Law of Large Numbers

To motivate the exercises in this section, let us imagine that we are following the outcomes of two independent experiments with values $\{\xi_k\}_{k=1}^{\infty} \subset \Lambda_1$ and $\{\lambda_k\}_{k=1}^{\infty} \subset \Lambda_2$, where $\Lambda_1$ and $\Lambda_2$ are two finite sets of outcomes. Here we are using the term independent in an intuitive form to mean that knowing the outcome of one of the experiments gives us no information about the outcome of the other.

As an example of independent experiments, suppose that one experiment is the outcome of spinning a roulette wheel and the second is the outcome of rolling a die. We expect these two experiments will be independent.

As an example of dependent experiments, suppose that the dice roller now has two dice, one red and one black. The person rolling dice throws his black or red die after the roulette ball has stopped and landed on either black or red respectively. If the black and the red dice are weighted differently, we expect that these two experiments are no longer independent.
Lemma 4.32 (Heuristic). Suppose that $\{\xi_k\}_{k=1}^{\infty} \subset \Lambda_1$ and $\{\lambda_k\}_{k=1}^{\infty} \subset \Lambda_2$ are the outcomes of repeatedly running two experiments independent of each other and for $x \in \Lambda_1$ and $y \in \Lambda_2$,
\begin{align*}
p(x, y) &:= \lim_{N \to \infty} \frac{1}{N} \#\{ 1 \le k \le N : \xi_k = x \text{ and } \lambda_k = y \}, \\
p_1(x) &:= \lim_{N \to \infty} \frac{1}{N} \#\{ 1 \le k \le N : \xi_k = x \}, \text{ and} \\
p_2(y) &:= \lim_{N \to \infty} \frac{1}{N} \#\{ 1 \le k \le N : \lambda_k = y \}. \tag{4.36}
\end{align*}
Then $p(x, y) = p_1(x)\, p_2(y)$. In particular this then implies for any $h : \Lambda_1 \times \Lambda_2 \to \mathbb{R}$ we have,
\[
E h = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} h(\xi_k, \lambda_k) = \sum_{(x, y) \in \Lambda_1 \times \Lambda_2} h(x, y)\, p_1(x)\, p_2(y).
\]
Proof. (Heuristic.) Let us imagine running the first experiment repeatedly with the results being recorded as $\big\{ \xi_k^{\ell} \big\}_{k=1}^{\infty}$, where $\ell \in \mathbb{N}$ indicates the $\ell^{\text{th}}$ run of the experiment. Then we have postulated that, independent of $\ell$,
\[
p(x, y) := \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} 1_{\xi_k^{\ell} = x \text{ and } \lambda_k = y}
= \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} 1_{\xi_k^{\ell} = x}\, 1_{\lambda_k = y}.
\]
So for any $L \in \mathbb{N}$ we must also have,
\begin{align*}
p(x, y) = \frac{1}{L} \sum_{\ell=1}^{L} p(x, y)
&= \frac{1}{L} \sum_{\ell=1}^{L} \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} 1_{\xi_k^{\ell} = x}\, 1_{\lambda_k = y} \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \frac{1}{L} \sum_{\ell=1}^{L} 1_{\xi_k^{\ell} = x}\, 1_{\lambda_k = y}.
\end{align*}
Taking the limit of this equation as $L \to \infty$ and interchanging the order of the limits (this is faith based) implies,
\[
p(x, y) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} 1_{\lambda_k = y} \cdot \lim_{L \to \infty} \frac{1}{L} \sum_{\ell=1}^{L} 1_{\xi_k^{\ell} = x}. \tag{4.37}
\]
Since for fixed $k$, $\big\{ \xi_k^{\ell} \big\}_{\ell=1}^{\infty}$ is just another run of the first experiment, by our postulate, we conclude that
\[
\lim_{L \to \infty} \frac{1}{L} \sum_{\ell=1}^{L} 1_{\xi_k^{\ell} = x} = p_1(x) \tag{4.38}
\]
independent of the choice of $k$. Therefore combining Eqs. (4.36), (4.37), and (4.38) implies,
\[
p(x, y) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} 1_{\lambda_k = y}\, p_1(x) = p_2(y)\, p_1(x).
\]
To understand this Lemma in another but equivalent way, let $X_1 : \Lambda_1 \times \Lambda_2 \to \Lambda_1$ and $X_2 : \Lambda_1 \times \Lambda_2 \to \Lambda_2$ be the projection maps, $X_1(x, y) = x$ and $X_2(x, y) = y$ respectively. Further suppose that $f : \Lambda_1 \to \mathbb{R}$ and $g : \Lambda_2 \to \mathbb{R}$ are functions, then using the heuristics Lemma 4.32 implies,
\begin{align*}
E\left[ f(X_1)\, g(X_2) \right] &= \sum_{(x, y) \in \Lambda_1 \times \Lambda_2} f(x)\, g(y)\, p_1(x)\, p_2(y) \\
&= \sum_{x \in \Lambda_1} f(x)\, p_1(x) \cdot \sum_{y \in \Lambda_2} g(y)\, p_2(y) = E f(X_1) \cdot E g(X_2).
\end{align*}
Hopefully these heuristic computations will convince you that the mathematical notion of independence developed below is relevant. In what follows, we will use the obvious generalization of our results above to the setting of $n$ independent experiments. For notational simplicity we will now assume that $\Lambda_1 = \Lambda_2 = \dots = \Lambda_n = \Lambda$.
Let $\Lambda$ be a finite set, $n \in \mathbb{N}$, $\Omega = \Lambda^n$, and let $X_i : \Omega \to \Lambda$ be defined by $X_i(\omega) = \omega_i$ for $\omega \in \Omega$ and $i = 1, 2, \dots, n$. We further suppose $p : \Omega \to [0, 1]$ is a function such that
\[
\sum_{\omega \in \Omega} p(\omega) = 1
\]
and $P : 2^{\Omega} \to [0, 1]$ is the probability measure defined by
\[
P(A) := \sum_{\omega \in A} p(\omega) \quad \text{for all } A \in 2^{\Omega}. \tag{4.39}
\]
Exercise 4.10 (Simple Independence 1.). Suppose $q_i : \Lambda \to [0, 1]$ are functions such that $\sum_{\lambda \in \Lambda} q_i(\lambda) = 1$ for $i = 1, 2, \dots, n$ and now define $p(\omega) = \prod_{i=1}^{n} q_i(\omega_i)$. Show for any functions, $f_i : \Lambda \to \mathbb{R}$, that
\[
E_P\left[ \prod_{i=1}^{n} f_i(X_i) \right] = \prod_{i=1}^{n} E_P\left[ f_i(X_i) \right] = \prod_{i=1}^{n} E_{Q_i} f_i
\]
where $Q_i$ is the measure on $\Lambda$ defined by, $Q_i(\gamma) = \sum_{\lambda \in \gamma} q_i(\lambda)$ for all $\gamma \subset \Lambda$.
Exercise 4.11 (Simple Independence 2.). Prove the converse of the previous exercise. Namely, if
\[
E_P\left[ \prod_{i=1}^{n} f_i(X_i) \right] = \prod_{i=1}^{n} E_P\left[ f_i(X_i) \right] \tag{4.40}
\]
for any functions, $f_i : \Lambda \to \mathbb{R}$, then there exist functions $q_i : \Lambda \to [0, 1]$ with $\sum_{\lambda \in \Lambda} q_i(\lambda) = 1$, such that $p(\omega) = \prod_{i=1}^{n} q_i(\omega_i)$.
Definition 4.33 (Independence). We say simple random variables, $X_1, \dots, X_n$ with values in $\Lambda$ on some probability space, $(\Omega, \mathcal{A}, P)$, are independent (more precisely $P$-independent) if Eq. (4.40) holds for all functions, $f_i : \Lambda \to \mathbb{R}$.
Exercise 4.12 (Simple Independence 3.). Let $X_1, \dots, X_n : \Omega \to \Lambda$ and $P : 2^{\Omega} \to [0, 1]$ be as described before Exercise 4.10. Show $X_1, \dots, X_n$ are independent iff
\[
P(X_1 \in A_1, \dots, X_n \in A_n) = P(X_1 \in A_1) \dots P(X_n \in A_n) \tag{4.41}
\]
for all choices of $A_i \subset \Lambda$. Also explain why it is enough to restrict the $A_i$ to single point subsets of $\Lambda$.
Exercise 4.13 (A Weak Law of Large Numbers). Suppose that $\Lambda \subset \mathbb{R}$ is a finite set, $n \in \mathbb{N}$, $\Omega = \Lambda^n$, $p(\omega) = \prod_{i=1}^{n} q(\omega_i)$ where $q : \Lambda \to [0, 1]$ is such that $\sum_{\lambda \in \Lambda} q(\lambda) = 1$, and let $P : 2^{\Omega} \to [0, 1]$ be the probability measure defined as in Eq. (4.39). Further let $X_i(\omega) = \omega_i$ for $i = 1, 2, \dots, n$, $\mu := E X_i$, $\sigma^2 := E(X_i - \mu)^2$, and
\[
S_n = \frac{1}{n}(X_1 + \dots + X_n).
\]

1. Show $\mu = \sum_{\lambda \in \Lambda} \lambda\, q(\lambda)$ and
\[
\sigma^2 = \sum_{\lambda \in \Lambda} (\lambda - \mu)^2\, q(\lambda) = \sum_{\lambda \in \Lambda} \lambda^2\, q(\lambda) - \mu^2. \tag{4.42}
\]
2. Show $E S_n = \mu$.
3. Let $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$. Show
\[
E\left[ (X_i - \mu)(X_j - \mu) \right] = \delta_{ij}\, \sigma^2.
\]
4. Using the fact that $S_n - \mu$ may be expressed as $\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)$, show
\[
E(S_n - \mu)^2 = \frac{1}{n}\, \sigma^2. \tag{4.43}
\]
5. Conclude using Eq. (4.43) and Remark 4.22 that
\[
P(|S_n - \mu| \ge \varepsilon) \le \frac{1}{n \varepsilon^2}\, \sigma^2. \tag{4.44}
\]
So for large $n$, $S_n$ is concentrated near $\mu = E X_i$ with probability approaching 1 for $n$ large. This is a version of the weak law of large numbers.
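A simulation illustrating Eq. (4.44) for a fair die ($\Lambda = \{1, \dots, 6\}$, $q$ uniform, so $\mu = 3.5$ and $\sigma^2 = 35/12$); the sample sizes, threshold, and seed below are arbitrary choices of ours.

```python
import random

random.seed(0)

mu, var = 3.5, 35 / 12          # mean and variance of one fair-die roll
n, trials, eps = 1000, 2000, 0.25

deviations = 0
for _ in range(trials):
    s_n = sum(random.randint(1, 6) for _ in range(n)) / n
    deviations += abs(s_n - mu) >= eps

freq = deviations / trials       # empirical P(|S_n - mu| >= eps)
bound = var / (n * eps ** 2)     # Chebyshev bound from Eq. (4.44)
```

The empirical frequency comes out far below the Chebyshev bound ($\approx 0.047$ here); the bound is crude but is all the weak law needs.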
Definition 4.34 (Covariance). Let $(\Omega, \mathcal{B}, P)$ be a finitely additive probability space. The covariance, $\operatorname{Cov}(X, Y)$, of $X, Y \in S(\mathcal{B})$ is defined by
\[
\operatorname{Cov}(X, Y) = E\left[ (X - \mu_X)(Y - \mu_Y) \right] = E[XY] - EX \cdot EY
\]
where $\mu_X := EX$ and $\mu_Y := EY$. The variance of $X$ is
\[
\operatorname{Var}(X) := \operatorname{Cov}(X, X) = E\left[ X^2 \right] - (EX)^2.
\]
We say that $X$ and $Y$ are uncorrelated if $\operatorname{Cov}(X, Y) = 0$, i.e. $E[XY] = EX \cdot EY$. More generally we say $\{X_k\}_{k=1}^{n} \subset S(\mathcal{B})$ are uncorrelated iff $\operatorname{Cov}(X_i, X_j) = 0$ for all $i \ne j$.

Remark 4.35. 1. Observe that $X$ and $Y$ are independent iff $f(X)$ and $g(Y)$ are uncorrelated for all functions, $f$ and $g$, on the range of $X$ and $Y$ respectively. In particular if $X$ and $Y$ are independent then $\operatorname{Cov}(X, Y) = 0$.
2. If you look at your proof of the weak law of large numbers in Exercise 4.13 you will see that it suffices to assume that $\{X_i\}_{i=1}^{n}$ are uncorrelated rather than the stronger condition of being independent.
Exercise 4.14 (Bernoulli Random Variables). Let $\Lambda = \{0, 1\}$, $X : \Lambda \to \mathbb{R}$ be defined by $X(0) = 0$ and $X(1) = 1$, $x \in [0, 1]$, and define $Q = x\, \delta_1 + (1 - x)\, \delta_0$, i.e. $Q(\{0\}) = 1 - x$ and $Q(\{1\}) = x$. Verify,
\[
\mu(x) := E_Q X = x \quad \text{and} \quad \sigma^2(x) := E_Q (X - x)^2 = (1 - x)\, x \le 1/4.
\]
Theorem 4.36 (Weierstrass Approximation Theorem via Bernstein's Polynomials.). Suppose that $f \in C([0, 1], \mathbb{C})$ and
\[
p_n(x) := \sum_{k=0}^{n} \binom{n}{k} f\left( \frac{k}{n} \right) x^{k} (1 - x)^{n-k}.
\]
Then
\[
\lim_{n \to \infty} \sup_{x \in [0, 1]} |f(x) - p_n(x)| = 0.
\]

Proof. Let $x \in [0, 1]$, $\Lambda = \{0, 1\}$, $q(0) = 1 - x$, $q(1) = x$, $\Omega = \Lambda^n$, and
\[
P_x(\{\omega\}) = q(\omega_1) \dots q(\omega_n) = x^{\sum_{i=1}^{n} \omega_i} \cdot (1 - x)^{n - \sum_{i=1}^{n} \omega_i}.
\]
As above, let $S_n = \frac{1}{n}(X_1 + \dots + X_n)$, where $X_i(\omega) = \omega_i$, and observe that
\[
P_x\left( S_n = \frac{k}{n} \right) = \binom{n}{k} x^{k} (1 - x)^{n-k}.
\]
Therefore, writing $E_x$ for $E_{P_x}$, we have
\[
E_x\left[ f(S_n) \right] = \sum_{k=0}^{n} f\left( \frac{k}{n} \right) \binom{n}{k} x^{k} (1 - x)^{n-k} = p_n(x).
\]
Hence we find
\begin{align*}
|p_n(x) - f(x)| = |E_x f(S_n) - f(x)| &= |E_x\left[ f(S_n) - f(x) \right]| \\
&\le E_x\,|f(S_n) - f(x)| \\
&= E_x\left[ |f(S_n) - f(x)| : |S_n - x| \ge \varepsilon \right] \\
&\quad + E_x\left[ |f(S_n) - f(x)| : |S_n - x| < \varepsilon \right] \\
&\le 2M \cdot P_x(|S_n - x| \ge \varepsilon) + \delta(\varepsilon)
\end{align*}
where
\[
M := \max_{y \in [0, 1]} |f(y)| \quad \text{and} \quad
\delta(\varepsilon) := \sup\{ |f(y) - f(x)| : x, y \in [0, 1] \text{ and } |y - x| \le \varepsilon \}
\]
is the modulus of continuity of $f$. Now by the above exercises,
\[
P_x(|S_n - x| \ge \varepsilon) \le \frac{1}{4 n \varepsilon^2} \quad \text{(see Figure 4.1)} \tag{4.45}
\]
and hence we may conclude that
\[
\max_{x \in [0, 1]} |p_n(x) - f(x)| \le \frac{M}{2 n \varepsilon^2} + \delta(\varepsilon)
\]
and therefore, that
\[
\limsup_{n \to \infty} \max_{x \in [0, 1]} |p_n(x) - f(x)| \le \delta(\varepsilon).
\]
This completes the proof, since by uniform continuity of $f$, $\delta(\varepsilon) \to 0$ as $\varepsilon \downarrow 0$.
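The Bernstein polynomials of the theorem are immediate to code; the test function $f(x) = |x - 1/2|$ (continuous but not differentiable) and the grid are our choices for illustration.

```python
from math import comb

def bernstein(f, n, x):
    """p_n(x) = sum_k C(n,k) f(k/n) x^k (1-x)^(n-k)."""
    return sum(
        comb(n, k) * f(k / n) * x ** k * (1 - x) ** (n - k) for k in range(n + 1)
    )

f = lambda x: abs(x - 0.5)
err = max(abs(f(i / 200) - bernstein(f, 400, i / 200)) for i in range(201))
```

For $n = 400$ the uniform error on this grid is already a few hundredths, consistent with the slow $O(1/\sqrt{n})$ rate near the kink at $x = 1/2$; note also that $p_n$ interpolates $f$ exactly at the endpoints.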
4.4.1 Complex Weierstrass Approximation Theorem

The main goal of this subsection is to prove Theorem 4.42 which states that any continuous $2\pi$-periodic function on $\mathbb{R}$ may be well approximated by trigonometric polynomials. The main ingredient is the following two dimensional generalization of Theorem 4.36. All of the results in this section have natural generalizations to higher dimensions as well, see Theorem 4.50.
Theorem 4.37 (Weierstrass Approximation Theorem). Suppose that $K = [0, 1]^2$, $f \in C(K, \mathbb{C})$, and
\[
p_n(x, y) := \sum_{k, l = 0}^{n} f\left( \frac{k}{n}, \frac{l}{n} \right) \binom{n}{k} \binom{n}{l} x^{k} (1 - x)^{n-k}\, y^{l} (1 - y)^{n-l}. \tag{4.46}
\]
Then $p_n \to f$ uniformly on $K$.
Fig. 4.1. Plots of $P_x(S_n = k/n)$ versus $k/n$ for $n = 100$ with $x = 1/4$ (black), $x = 1/2$ (red), and $x = 5/6$ (green).
Proof. We are going to follow the argument given in the proof of Theorem 4.36. By considering the real and imaginary parts of $f$ separately, it suffices to assume $f \in C([0, 1]^2, \mathbb{R})$. For $(x, y) \in K$ and $n \in \mathbb{N}$ we may choose a collection of independent Bernoulli simple random variables $\{X_i, Y_i\}_{i=1}^{n}$ such that $P(X_i = 1) = x$ and $P(Y_i = 1) = y$ for all $1 \le i \le n$. Then letting $S_n := \frac{1}{n} \sum_{i=1}^{n} X_i$ and $T_n := \frac{1}{n} \sum_{i=1}^{n} Y_i$, we have
\[
E\left[ f(S_n, T_n) \right] = \sum_{k, l = 0}^{n} f\left( \frac{k}{n}, \frac{l}{n} \right) P(n S_n = k,\, n T_n = l) = p_n(x, y)
\]
where $p_n(x, y)$ is the polynomial given in Eq. (4.46), wherein the assumed independence is needed to show,
\[
P(n S_n = k,\, n T_n = l) = \binom{n}{k} \binom{n}{l} x^{k} (1 - x)^{n-k}\, y^{l} (1 - y)^{n-l}.
\]
Thus if $M = \sup\{ |f(x, y)| : (x, y) \in K \}$, $\varepsilon > 0$,
\[
\delta_{\varepsilon} = \sup\{ |f(x', y') - f(x, y)| : (x, y), (x', y') \in K \text{ and } \|(x', y') - (x, y)\| \le \varepsilon \},
\]
and
\[
A := \{ \|(S_n, T_n) - (x, y)\| > \varepsilon \},
\]
we have,
\begin{align*}
|f(x, y) - p_n(x, y)| &= |E\left( f(x, y) - f(S_n, T_n) \right)| \\
&\le E\,|f(x, y) - f(S_n, T_n)| \\
&= E\left[ |f(x, y) - f(S_n, T_n)| : A \right] + E\left[ |f(x, y) - f(S_n, T_n)| : A^c \right] \\
&\le 2M \cdot P(A) + \delta_{\varepsilon} \cdot P(A^c) \le 2M \cdot P(A) + \delta_{\varepsilon}. \tag{4.47}
\end{align*}
To estimate $P(A)$, observe that if
\[
\|(S_n, T_n) - (x, y)\|^2 = (S_n - x)^2 + (T_n - y)^2 > \varepsilon^2,
\]
then either
\[
(S_n - x)^2 > \varepsilon^2 / 2 \quad \text{or} \quad (T_n - y)^2 > \varepsilon^2 / 2
\]
and therefore by sub-additivity and Eq. (4.45) we know
\[
P(A) \le P\left( |S_n - x| > \varepsilon / \sqrt{2} \right) + P\left( |T_n - y| > \varepsilon / \sqrt{2} \right)
\le \frac{1}{2 n \varepsilon^2} + \frac{1}{2 n \varepsilon^2} = \frac{1}{n \varepsilon^2}. \tag{4.48}
\]
Using this estimate in Eq. (4.47) gives,
\[
|f(x, y) - p_n(x, y)| \le 2M \cdot \frac{1}{n \varepsilon^2} + \delta_{\varepsilon}
\]
and as the right side is independent of $(x, y) \in K$ we may conclude,
\[
\limsup_{n \to \infty} \sup_{(x, y) \in K} |f(x, y) - p_n(x, y)| \le \delta_{\varepsilon}
\]
which completes the proof since $\delta_{\varepsilon} \to 0$ as $\varepsilon \downarrow 0$ because $f$ is uniformly continuous on $K$.
Remark 4.38. We can easily improve our estimate on $P(A)$ in Eq. (4.48) by a factor of two as follows. As in the proof of Theorem 4.36,
\begin{align*}
E\left[ \|(S_n, T_n) - (x, y)\|^2 \right] &= E\left[ (S_n - x)^2 + (T_n - y)^2 \right] \\
&= \operatorname{Var}(S_n) + \operatorname{Var}(T_n) \\
&= \frac{1}{n}\left[ x(1 - x) + y(1 - y) \right] \le \frac{1}{2n}.
\end{align*}
Therefore by Chebyshev's inequality,
\[
P(A) = P(\|(S_n, T_n) - (x, y)\| > \varepsilon) \le \frac{1}{\varepsilon^2}\, E\,\|(S_n, T_n) - (x, y)\|^2 \le \frac{1}{2 n \varepsilon^2}.
\]
Corollary 4.39. Suppose that $K = [a, b] \times [c, d]$ is any compact rectangle in $\mathbb{R}^2$. Then every function, $f \in C(K, \mathbb{C})$, may be uniformly approximated by polynomial functions in $(x, y) \in \mathbb{R}^2$.

Proof. Let $F(x, y) := f(a + x(b - a), c + y(d - c))$, a continuous function of $(x, y) \in [0, 1]^2$. Given $\varepsilon > 0$, we may use Theorem 4.37 to find a polynomial, $p(x, y)$, such that $\sup_{(x, y) \in [0, 1]^2} |F(x, y) - p(x, y)| \le \varepsilon$. Letting $\xi = a + x(b - a)$ and $\eta := c + y(d - c)$, it now follows that
\[
\sup_{(\xi, \eta) \in K} \left| f(\xi, \eta) - p\left( \frac{\xi - a}{b - a}, \frac{\eta - c}{d - c} \right) \right| \le \varepsilon
\]
which completes the proof since $p\left( \frac{\xi - a}{b - a}, \frac{\eta - c}{d - c} \right)$ is a polynomial in $(\xi, \eta)$.
Here is a version of the complex Weierstrass approximation theorem.

Theorem 4.40 (Complex Weierstrass Approximation Theorem). Suppose that $K \subset \mathbb{C}$ is a compact rectangle. Then there exist polynomials in $(z = x + iy,\, \bar{z} = x - iy)$, $q_n(z, \bar{z})$ for $z \in \mathbb{C}$, such that $\sup_{z \in K} |q_n(z, \bar{z}) - f(z)| \to 0$ as $n \to \infty$ for every $f \in C(K, \mathbb{C})$.

Proof. The mapping $(x, y) \in \mathbb{R} \times \mathbb{R} \to z = x + iy \in \mathbb{C}$ is an isomorphism of vector spaces. Letting $\bar{z} = x - iy$ as usual, we have $x = \frac{z + \bar{z}}{2}$ and $y = \frac{z - \bar{z}}{2i}$. Therefore under this identification any polynomial $p(x, y)$ on $\mathbb{R} \times \mathbb{R}$ may be written as a polynomial $q$ in $(z, \bar{z})$, namely
\[
q(z, \bar{z}) = p\left( \frac{z + \bar{z}}{2}, \frac{z - \bar{z}}{2i} \right).
\]
Conversely a polynomial $q$ in $(z, \bar{z})$ may be thought of as a polynomial $p$ in $(x, y)$, namely $p(x, y) = q(x + iy, x - iy)$. Hence the result now follows from Theorem 4.37.
Example 4.41. Let $K = S^1 = \{ z \in \mathbb{C} : |z| = 1 \}$ and $\mathcal{A}$ be the set of polynomials in $(z, \bar{z})$ restricted to $S^1$. Then $\mathcal{A}$ is dense in $C(S^1)$. To prove this, first observe that if $f \in C(S^1)$ then $F(z) = |z|\, f\left( \frac{z}{|z|} \right)$ for $z \ne 0$ and $F(0) = 0$ defines $F \in C(\mathbb{C})$ such that $F|_{S^1} = f$. By applying Theorem 4.40 to $F$ restricted to a compact rectangle containing $S^1$ we may find $q_n(z, \bar{z})$ converging uniformly to $F$ on $K$ and hence on $S^1$. Since $\bar{z} = z^{-1}$ on $S^1$, we have shown polynomials in $z$ and $z^{-1}$ are dense in $C(S^1)$.
Theorem 4.42 (Density of Trigonometric Polynomials). Any $2\pi$-periodic continuous function, $f : \mathbb{R} \to \mathbb{C}$, may be uniformly approximated by a trigonometric polynomial of the form
\[
p(x) = \sum_{\lambda \in \Lambda} a_{\lambda}\, e^{i \lambda x}
\]
where $\Lambda$ is a finite subset of $\mathbb{Z}$ and $a_{\lambda} \in \mathbb{C}$ for all $\lambda \in \Lambda$.

Proof. For $z \in S^1$, define $F(z) := f(\theta)$ where $\theta \in \mathbb{R}$ is chosen so that $z = e^{i \theta}$. Since $f$ is $2\pi$-periodic, $F$ is well defined since if $\theta$ solves $e^{i \theta} = z$ then all other solutions are of the form $\{ \theta + 2\pi n : n \in \mathbb{Z} \}$. Since the map $\theta \to e^{i \theta}$ is a local homeomorphism, i.e. for any $J = (a, b)$ with $b - a < 2\pi$, the map $\theta \in J \to \tilde{J} := \{ e^{i \theta} : \theta \in J \} \subset S^1$ is a homeomorphism, it follows that $F(z) = f \circ \theta^{-1}(z)$ for $z \in \tilde{J}$. This shows $F$ is continuous when restricted to $\tilde{J}$. Since such sets cover $S^1$, it follows that $F$ is continuous.

By Example 4.41, the polynomials in $z$ and $\bar{z} = z^{-1}$ are dense in $C(S^1)$. Hence for any $\varepsilon > 0$ there exists
\[
p(z, \bar{z}) = \sum_{0 \le m, n \le N} a_{m, n}\, z^{m} \bar{z}^{n}
\]
such that $|F(z) - p(z, \bar{z})| \le \varepsilon$ for all $z \in S^1$. Taking $z = e^{i \theta}$ then implies
\[
\sup_{\theta} \left| f(\theta) - p\left( e^{i \theta}, e^{-i \theta} \right) \right| \le \varepsilon
\]
where
\[
p\left( e^{i \theta}, e^{-i \theta} \right) = \sum_{0 \le m, n \le N} a_{m, n}\, e^{i (m - n) \theta}
\]
is the desired trigonometric polynomial.
4.4.2 Product Measures and Fubini's Theorem

In the last part of this section we will extend some of the above ideas to more general finitely additive measure spaces. A finitely additive measure space is a triple, $(X, \mathcal{A}, \mu)$, where $X$ is a set, $\mathcal{A} \subset 2^{X}$ is an algebra, and $\mu : \mathcal{A} \to [0, \infty]$ is a finitely additive measure. Let $(Y, \mathcal{B}, \nu)$ be another finitely additive measure space.

Definition 4.43. Let $\mathcal{A} \otimes \mathcal{B}$ be the smallest sub-algebra of $2^{X \times Y}$ containing all sets of the form $\mathcal{S} := \{ A \times B : A \in \mathcal{A} \text{ and } B \in \mathcal{B} \}$. As we have seen in Exercise 3.10, $\mathcal{S}$ is a semi-algebra and therefore $\mathcal{A} \otimes \mathcal{B}$ consists of subsets, $C \subset X \times Y$, which may be written as a disjoint union;
\[
C = \bigcup_{i=1}^{n} A_i \times B_i \quad \text{with } A_i \times B_i \in \mathcal{S}. \tag{4.49}
\]
Theorem 4.44 (Product Measure and Fubini's Theorem). Assume that $\mu(X) < \infty$ and $\nu(Y) < \infty$ for simplicity. Then there is a unique finitely additive measure, $\mu \otimes \nu$, on $\mathcal{A} \otimes \mathcal{B}$ such that $\mu \otimes \nu (A \times B) = \mu(A)\, \nu(B)$ for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$. Moreover if $f \in S(\mathcal{A} \otimes \mathcal{B})$ then;

1. $y \to f(x, y)$ is in $S(\mathcal{B})$ for all $x \in X$ and $x \to f(x, y)$ is in $S(\mathcal{A})$ for all $y \in Y$.
2. $x \to \int_Y f(x, y)\, d\nu(y)$ is in $S(\mathcal{A})$ and $y \to \int_X f(x, y)\, d\mu(x)$ is in $S(\mathcal{B})$.
3. we have,
\begin{align*}
\int_X \left( \int_Y f(x, y)\, d\nu(y) \right) d\mu(x)
&= \int_{X \times Y} f(x, y)\, d(\mu \otimes \nu)(x, y) \\
&= \int_Y \left( \int_X f(x, y)\, d\mu(x) \right) d\nu(y).
\end{align*}
We will refer to $\mu \otimes \nu$ as the product measure of $\mu$ and $\nu$.

Proof. According to Eq. (4.49),
\[
1_C(x, y) = \sum_{i=1}^{n} 1_{A_i \times B_i}(x, y) = \sum_{i=1}^{n} 1_{A_i}(x)\, 1_{B_i}(y)
\]
from which it follows that $1_C(x, \cdot) \in S(\mathcal{B})$ for each $x \in X$ and
\[
\int_Y 1_C(x, y)\, d\nu(y) = \sum_{i=1}^{n} 1_{A_i}(x)\, \nu(B_i).
\]
It now follows from this equation that $x \to \int_Y 1_C(x, y)\, d\nu(y) \in S(\mathcal{A})$ and that
\[
\int_X \left( \int_Y 1_C(x, y)\, d\nu(y) \right) d\mu(x) = \sum_{i=1}^{n} \mu(A_i)\, \nu(B_i).
\]
Similarly one shows that
\[
\int_Y \left( \int_X 1_C(x, y)\, d\mu(x) \right) d\nu(y) = \sum_{i=1}^{n} \mu(A_i)\, \nu(B_i).
\]
In particular this shows that we may define
\[
(\mu \otimes \nu)(C) = \sum_{i=1}^{n} \mu(A_i)\, \nu(B_i)
\]
and with this definition we have,
\[
\int_X \left( \int_Y 1_C(x, y)\, d\nu(y) \right) d\mu(x)
= (\mu \otimes \nu)(C)
= \int_Y \left( \int_X 1_C(x, y)\, d\mu(x) \right) d\nu(y).
\]
From either of these representations it is easily seen that $\mu \otimes \nu$ is a finitely additive measure on $\mathcal{A} \otimes \mathcal{B}$ with the desired properties. Moreover, we have already verified the Theorem in the special case where $f = 1_C$ with $C \in \mathcal{A} \otimes \mathcal{B}$. Since the general element, $f \in S(\mathcal{A} \otimes \mathcal{B})$, is a linear combination of such functions, it is easy to verify using the linearity of the integral and the fact that $S(\mathcal{A})$ and $S(\mathcal{B})$ are vector spaces that the theorem is true in general.
Example 4.45. Suppose that $f \in S(\mathcal{A})$ and $g \in S(\mathcal{B})$. Let $f \otimes g(x, y) := f(x)\, g(y)$. Since we have,
\begin{align*}
f \otimes g(x, y) &= \left( \sum_a a\, 1_{f = a}(x) \right) \left( \sum_b b\, 1_{g = b}(y) \right) \\
&= \sum_{a, b} a b\, 1_{\{f = a\} \times \{g = b\}}(x, y)
\end{align*}
it follows that $f \otimes g \in S(\mathcal{A} \otimes \mathcal{B})$. Moreover, using Fubini's Theorem 4.44 it follows that
\[
\int_{X \times Y} f \otimes g\, d(\mu \otimes \nu) = \left( \int_X f\, d\mu \right) \left( \int_Y g\, d\nu \right).
\]
4.5 Simple Conditional Expectation

In this section, $\mathcal{B}$ is a sub-algebra of $2^{\Omega}$, $P : \mathcal{B} \to [0, 1]$ is a finitely additive probability measure, and $\mathcal{A} \subset \mathcal{B}$ is a finite sub-algebra. As in Example 3.19, for each $\omega \in \Omega$, let $A_{\omega} := \bigcap \{ A \in \mathcal{A} : \omega \in A \}$ and recall that either $A_{\omega} = A_{\omega'}$ or $A_{\omega} \cap A_{\omega'} = \emptyset$ for all $\omega, \omega' \in \Omega$. In particular there is a partition, $\{ B_1, \dots, B_n \}$, of $\Omega$ such that $A_{\omega} \in \{ B_1, \dots, B_n \}$ for all $\omega \in \Omega$.
Definition 4.46 (Conditional expectation). Let $X : \Omega \to \mathbb{R}$ be a $\mathcal{B}$ simple random variable, i.e. $X \in S(\mathcal{B})$, and
\[
\bar{X}(\omega) := \frac{1}{P(A_{\omega})}\, E\left[ 1_{A_{\omega}} X \right] \quad \text{for all } \omega \in \Omega, \tag{4.50}
\]
where by convention, $\bar{X}(\omega) = 0$ if $P(A_{\omega}) = 0$. We will denote $\bar{X}$ by $E[X | \mathcal{A}]$ or $E_{\mathcal{A}} X$ and call it the conditional expectation of $X$ given $\mathcal{A}$. Alternatively we may write $\bar{X}$ as
\[
\bar{X} = \sum_{i=1}^{n} \frac{E\left[ 1_{B_i} X \right]}{P(B_i)}\, 1_{B_i}, \tag{4.51}
\]
again with the convention that $E\left[ 1_{B_i} X \right] / P(B_i) = 0$ if $P(B_i) = 0$.
It should be noted, from Exercise 4.1, that $\bar{X} = E_{\mathcal{A}} X \in S(\mathcal{A})$. Heuristically, if $(\omega(1), \omega(2), \omega(3), \dots)$ is the sequence of outcomes of independently running our experiment repeatedly, then
\[
\bar{X}|_{B_i} = \frac{E\left[ 1_{B_i} X \right]}{P(B_i)}
= \frac{\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} 1_{B_i}(\omega(n))\, X(\omega(n))}{\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} 1_{B_i}(\omega(n))}
= \lim_{N \to \infty} \frac{\sum_{n=1}^{N} 1_{B_i}(\omega(n))\, X(\omega(n))}{\sum_{n=1}^{N} 1_{B_i}(\omega(n))}.
\]
So to compute $\bar{X}|_{B_i}$ empirically, we remove all experimental outcomes from the list, $(\omega(1), \omega(2), \omega(3), \dots) \in \Omega^{\mathbb{N}}$, which are not in $B_i$ to form a new list, $(\bar{\omega}(1), \bar{\omega}(2), \bar{\omega}(3), \dots) \in B_i^{\mathbb{N}}$. We then compute $\bar{X}|_{B_i}$ using the empirical formula for the expectation of $X$ relative to the bar list, i.e.
\[
\bar{X}|_{B_i} = \lim_{\bar{N} \to \infty} \frac{1}{\bar{N}} \sum_{n=1}^{\bar{N}} X(\bar{\omega}(n)).
\]
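The empirical recipe above is easy to simulate. In the sketch below (the experiment, seed, and sample size are our choices) we roll a fair die, take $X(\omega) = \omega^2$, and condition on the partition block $B = \{\text{even}\}$: keeping only the outcomes in $B$ and averaging $X$ over that "bar list" estimates $\bar{X}|_B = E[1_B X] / P(B) = 56/3$.

```python
import random

random.seed(1)

# Experiment: roll a fair die; X(omega) = omega^2.
# Finite sub-algebra generated by the partition {even, odd}.
X = lambda w: w * w
in_B = lambda w: w % 2 == 0          # the block B = {2, 4, 6}

N = 100_000
outcomes = [random.randint(1, 6) for _ in range(N)]

# "Bar list": keep only the outcomes that landed in B, then average X.
bar_list = [X(w) for w in outcomes if in_B(w)]
xbar_on_B = sum(bar_list) / len(bar_list)

# Exact value: E[1_B X] / P(B) = ((4 + 16 + 36)/6) / (1/2) = 56/3.
```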
Exercise 4.15 (Simple conditional expectation). Let $X \in S(\mathcal{B})$ and, for simplicity, assume all functions are real valued. Prove the following assertions;

1. (Orthogonal Projection Property 1.) If $Z \in S(\mathcal{A})$, then
\[
E[XZ] = E\left[ \bar{X} Z \right] = E\left[ E_{\mathcal{A}} X \cdot Z \right] \tag{4.52}
\]
and
\[
(E_{\mathcal{A}} Z)(\omega) =
\begin{cases}
Z(\omega) & \text{if } P(A_{\omega}) > 0 \\
0 & \text{if } P(A_{\omega}) = 0
\end{cases}. \tag{4.53}
\]
In particular, $E_{\mathcal{A}}\left[ E_{\mathcal{A}} Z \right] = E_{\mathcal{A}} Z$. This basically says that $E_{\mathcal{A}}$ is orthogonal projection from $S(\mathcal{B})$ onto $S(\mathcal{A})$ relative to the inner product
\[
(f, g) = E[fg] \quad \text{for all } f, g \in S(\mathcal{B}).
\]
2. (Orthogonal Projection Property 2.) If $Y \in S(\mathcal{A})$ satisfies, $E[XZ] = E[YZ]$ for all $Z \in S(\mathcal{A})$, then $Y(\omega) = \bar{X}(\omega)$ whenever $P(A_{\omega}) > 0$. In particular, $P\big( Y \ne \bar{X} \big) = 0$. Hint: use item 1. to compute $E\big[ \big( \bar{X} - Y \big)^2 \big]$.
3. (Best Approximation Property.) For any $Y \in S(\mathcal{A})$,
\[
E\left[ \big( X - \bar{X} \big)^2 \right] \le E\left[ (X - Y)^2 \right] \tag{4.54}
\]
with equality iff $\bar{X} = Y$ almost surely (a.s. for short), where $\bar{X} = Y$ a.s. iff $P\big( \bar{X} \ne Y \big) = 0$. In words, $\bar{X} = E_{\mathcal{A}} X$ is the best ($L^2$) approximation to $X$ by an $\mathcal{A}$ measurable random variable.
4. (Contraction Property.) $E\,|\bar{X}| \le E\,|X|$. (It is typically not true that $|\bar{X}(\omega)| \le |X(\omega)|$ for all $\omega$.)
5. (Pull Out Property.) If $Z \in S(\mathcal{A})$, then
\[
E_{\mathcal{A}}[ZX] = Z\, E_{\mathcal{A}} X.
\]
Example 4.47 (Heuristics of independence and conditional expectations). Let us suppose that we have an experiment consisting of spinning a spinner with values in $\Lambda_1 = \{1, 2, \dots, 10\}$ and rolling a die with values in $\Lambda_2 = \{1, 2, 3, 4, 5, 6\}$. So the outcome of an experiment is represented by a point, $\omega = (x, y) \in \Omega = \Lambda_1 \times \Lambda_2$. Let $X(x, y) = x$, $Y(x, y) = y$, $\mathcal{B} = 2^{\Omega}$, and
\[
\mathcal{A} = \mathcal{A}(X) = X^{-1}\left( 2^{\Lambda_1} \right) = \left\{ X^{-1}(A) : A \subset \Lambda_1 \right\} \subset \mathcal{B},
\]
so that $\mathcal{A}$ is the smallest algebra of subsets of $\Omega$ such that $\{X = x\} \in \mathcal{A}$ for all $x \in \Lambda_1$. Notice that the partition associated to $\mathcal{A}$ is precisely
\[
\{ \{X = 1\}, \{X = 2\}, \dots, \{X = 10\} \}.
\]
Let us now suppose that the spins of the spinner are empirically independent of the throws of the dice. As usual let us run the experiment repeatedly to produce a sequence of results, $\omega_n = (x_n, y_n)$ for all $n \in \mathbb{N}$. If $g : \Lambda_2 \to \mathbb{R}$ is a function, we have (heuristically) that
\[
E_{\mathcal{A}}[g(Y)](x, y) = \lim_{N \to \infty} \frac{\sum_{n=1}^{N} g(Y(\omega_n))\, 1_{X(\omega_n) = x}}{\sum_{n=1}^{N} 1_{X(\omega_n) = x}}
= \lim_{N \to \infty} \frac{\sum_{n=1}^{N} g(y_n)\, 1_{x_n = x}}{\sum_{n=1}^{N} 1_{x_n = x}}.
\]
As the $\{y_n\}$ sequence of results is independent of the $\{x_n\}$ sequence, we should expect by the usual mantra$^2$ that
\[
\lim_{N \to \infty} \frac{\sum_{n=1}^{N} g(y_n)\, 1_{x_n = x}}{\sum_{n=1}^{N} 1_{x_n = x}}
= \lim_{N \to \infty} \frac{1}{M(N)} \sum_{n=1}^{M(N)} g(\tilde{y}_n) = E[g(Y)],
\]
where $M(N) = \sum_{n=1}^{N} 1_{x_n = x}$ and $(\tilde{y}_1, \tilde{y}_2, \dots)$ is the subsequence of those $y_l$ for which $x_l = x$. (We are also assuming here that $P(X = x) > 0$ so that we expect, $M(N) \cong P(X = x) \cdot N$ for $N$ large, in particular $M(N) \to \infty$.) Thus under the assumption that $X$ and $Y$ are describing independent experiments we have heuristically deduced that $E_{\mathcal{A}}[g(Y)] : \Omega \to \mathbb{R}$ is the constant function;
\[
E_{\mathcal{A}}[g(Y)](x, y) = E[g(Y)] \quad \text{for all } (x, y) \in \Omega. \tag{4.55}
\]
Let us further observe that if $f : \Lambda_1 \to \mathbb{R}$ is any other function, then $f(X)$ is an $\mathcal{A}$ simple function and therefore by Eq. (4.55) and Exercise 4.15,
\[
E[f(X)] \cdot E[g(Y)] = E\left[ f(X)\, E[g(Y)] \right] = E\left[ f(X)\, E_{\mathcal{A}}[g(Y)] \right] = E[f(X)\, g(Y)].
\]
This observation along with Exercise 4.12 gives another proof of Lemma 4.32.

$^2$ That is, it should not matter which sequence of independent experiments is used to compute the time averages.
Lemma 4.48 (Conditional Expectation and Independence). Let $\Omega = \Omega_1 \times \Omega_2$, $X, Y$, $\mathcal{B} = 2^{\Omega}$, and $\mathcal{A} = X^{-1}\left(2^{\Omega_1}\right)$ be as in Example 4.47 above. Assume that $P : \mathcal{B} \to [0, 1]$ is a probability measure. If $X$ and $Y$ are $P$ – independent, then Eq. (4.55) holds.

Proof. From the definitions of conditional expectation and of independence we have,
$$\mathbb{E}_{\mathcal{A}}[g(Y)](x, y) = \frac{\mathbb{E}[1_{X = x}\, g(Y)]}{P(X = x)} = \frac{\mathbb{E}[1_{X = x}]\, \mathbb{E}[g(Y)]}{P(X = x)} = \mathbb{E}[g(Y)].$$
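The time-average heuristic above can be checked numerically. The sketch below simulates the spinner/dice experiment of the example, assuming (this is my choice, not stated explicitly in the notes) a uniform spinner on $\{1, \dots, 10\}$ and a fair die on $\{1, \dots, 6\}$; the conditional time average over trials with $x_n = x$ should approach $\mathbb{E}[g(Y)]$ regardless of $x$.

```python
import random

def conditional_time_average(g, x, trials=200_000, seed=0):
    """Estimate E_A[g(Y)](x) = (sum_n g(y_n) 1_{x_n = x}) / (sum_n 1_{x_n = x})
    for an (assumed) uniform spinner X on 1..10 and fair die Y on 1..6,
    simulated independently of each other."""
    rng = random.Random(seed)
    num = den = 0
    for _ in range(trials):
        xn = rng.randint(1, 10)   # spinner outcome
        yn = rng.randint(1, 6)    # die outcome, drawn independently of xn
        if xn == x:
            num += g(yn)
            den += 1
    return num / den

g = lambda y: y * y
exact = sum(g(y) for y in range(1, 7)) / 6   # E[g(Y)] = 91/6 for a fair die
approx = conditional_time_average(g, x=3)
```

With a fixed seed the estimate is deterministic; for any choice of $x$ the conditional average lands near $91/6 \approx 15.17$, which is the content of Eq. (4.55).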
The following theorem summarizes much of what we (i.e. you) have shown regarding the underlying notion of independence of a pair of simple functions.

Theorem 4.49 (Independence result summary). Let $(\Omega, \mathcal{B}, P)$ be a finitely additive probability space, $\Lambda \subset \mathbb{R}$ be a finite set, and $X, Y : \Omega \to \Lambda$ be two $\mathcal{B}$ – measurable simple functions, i.e. $\{X = x\} \in \mathcal{B}$ and $\{Y = y\} \in \mathcal{B}$ for all $x, y \in \Lambda$. Further let $\mathcal{A} = \mathcal{A}(X) := \mathcal{A}(\{X = x\} : x \in \Lambda)$. Then the following are equivalent;

1. $P(X = x, Y = y) = P(X = x)\, P(Y = y)$ for all $x \in \Lambda$ and $y \in \Lambda$,
2. $\mathbb{E}[f(X)\, g(Y)] = \mathbb{E}[f(X)]\, \mathbb{E}[g(Y)]$ for all functions, $f : \Lambda \to \mathbb{R}$ and $g : \Lambda \to \mathbb{R}$,
3. $\mathbb{E}_{\mathcal{A}(X)}[g(Y)] = \mathbb{E}[g(Y)]$ for all $g : \Lambda \to \mathbb{R}$, and
4. $\mathbb{E}_{\mathcal{A}(Y)}[f(X)] = \mathbb{E}[f(X)]$ for all $f : \Lambda \to \mathbb{R}$.

We say that $X$ and $Y$ are $P$ – independent if any one (and hence all) of the above conditions holds.
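The equivalence of conditions 1. and 2. can be verified by brute force for small finite distributions: since every $f : \Lambda \to \mathbb{R}$ is a linear combination of indicators, it suffices to test the indicator functions. A minimal sketch (my own illustration, with $\Lambda = \{0, 1\}$ and two hand-picked joint distributions, one product and one not):

```python
from itertools import product

Lam = [0, 1]  # the finite set Λ

def E(h, pmf):
    """Expectation of h(x, y) under a joint pmf given as {(x, y): prob}."""
    return sum(h(x, y) * p for (x, y), p in pmf.items())

def factorizes(pmf):
    """Condition 1: P(X=x, Y=y) = P(X=x) P(Y=y) for all x, y."""
    px = {x: sum(p for (a, _), p in pmf.items() if a == x) for x in Lam}
    py = {y: sum(p for (_, b), p in pmf.items() if b == y) for y in Lam}
    return all(abs(pmf[(x, y)] - px[x] * py[y]) < 1e-12
               for x, y in product(Lam, Lam))

def products_factor(pmf):
    """Condition 2 tested on the spanning indicators f = 1_{x0}, g = 1_{y0}."""
    ok = True
    for x0, y0 in product(Lam, Lam):
        f = lambda x, y: 1.0 if x == x0 else 0.0
        g = lambda x, y: 1.0 if y == y0 else 0.0
        fg = lambda x, y: f(x, y) * g(x, y)
        ok = ok and abs(E(fg, pmf) - E(f, pmf) * E(g, pmf)) < 1e-12
    return ok

indep = {(x, y): 0.25 for x in Lam for y in Lam}                # product measure
dep = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}     # X = Y a.s.
```

Both checks agree on both examples, as Theorem 4.49 predicts.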
4.6 Appendix: A Multi-dimensional Weierstrass Approximation Theorem
The following theorem is the multi-dimensional generalization of Theorem 4.36.

Theorem 4.50 (Weierstrass Approximation Theorem). Suppose that $K = [a_1, b_1] \times \dots \times [a_d, b_d]$ with $-\infty < a_i < b_i < \infty$ is a compact rectangle in $\mathbb{R}^d$. Then for every $f \in C(K, \mathbb{C})$, there exist polynomials $p_n$ on $\mathbb{R}^d$ such that $p_n \to f$ uniformly on $K$.
Proof. By a simple scaling and translation of the arguments of $f$ we may assume without loss of generality that $K = [0, 1]^d$. By considering the real and imaginary parts of $f$ separately, it suffices to assume $f \in C([0, 1]^d, \mathbb{R})$.

Given $x \in K$, let $\left\{X_n = \left(X_n^1, \dots, X_n^d\right)\right\}_{n=1}^{\infty}$ be i.i.d. random vectors with values in $\mathbb{R}^d$ such that
$$P(X_n = \eta) = \prod_{i=1}^{d} (1 - x_i)^{1 - \eta_i}\, x_i^{\eta_i}$$
for all $\eta = (\eta_1, \dots, \eta_d) \in \{0, 1\}^d$. Since each $X_n^j$ is a Bernoulli random variable with $P\left(X_n^j = 1\right) = x_j$, we know that
$$\mathbb{E} X_n = x \text{ and } \operatorname{Var}\left(X_n^j\right) = x_j - x_j^2 = x_j(1 - x_j).$$
As usual let $S_n := X_1 + \dots + X_n \in \mathbb{R}^d$, then
$$\mathbb{E}\left[\frac{S_n}{n}\right] = x \text{ and}$$
$$\mathbb{E}\left\|\frac{S_n}{n} - x\right\|^2 = \sum_{j=1}^{d} \mathbb{E}\left(\frac{S_n^j}{n} - x_j\right)^2 = \sum_{j=1}^{d} \operatorname{Var}\left(\frac{S_n^j}{n} - x_j\right) = \sum_{j=1}^{d} \operatorname{Var}\left(\frac{S_n^j}{n}\right) = \frac{1}{n^2} \sum_{j=1}^{d} \sum_{k=1}^{n} \operatorname{Var}\left(X_k^j\right) = \frac{1}{n} \sum_{j=1}^{d} x_j(1 - x_j) \le \frac{d}{4n}.$$
This shows $S_n / n \to x$ in $L^2(P)$ and hence by Chebyshev's inequality, $S_n / n \xrightarrow{P} x$, and by a continuity theorem, $f\left(\frac{S_n}{n}\right) \xrightarrow{P} f(x)$ as $n \to \infty$. This along with the dominated convergence theorem shows
$$p_n(x) := \mathbb{E}\left[f\left(\frac{S_n}{n}\right)\right] \to f(x) \text{ as } n \to \infty, \tag{4.56}$$
where
$$p_n(x) = \sum_{\omega : \{1, 2, \dots, n\} \to \{0, 1\}^d} f\left(\frac{\omega(1) + \dots + \omega(n)}{n}\right) P(X_1 = \omega(1), \dots, X_n = \omega(n))$$
$$= \sum_{\omega : \{1, 2, \dots, n\} \to \{0, 1\}^d} f\left(\frac{\omega(1) + \dots + \omega(n)}{n}\right) \prod_{k=1}^{n} \prod_{i=1}^{d} (1 - x_i)^{1 - \omega_i(k)}\, x_i^{\omega_i(k)}$$
is a polynomial of degree $nd$. In fact more is true.
Suppose $\varepsilon > 0$ is given, $M = \sup\{|f(x)| : x \in K\}$, and
$$\delta_{\varepsilon} = \sup\{|f(y) - f(x)| : x, y \in K \text{ and } \|y - x\| \le \varepsilon\}.$$
By uniform continuity of $f$ on $K$, $\lim_{\varepsilon \downarrow 0} \delta_{\varepsilon} = 0$. Therefore,
$$|f(x) - p_n(x)| = \left|\mathbb{E}\left[f(x) - f\left(\frac{S_n}{n}\right)\right]\right| \le \mathbb{E}\left|f(x) - f\left(\frac{S_n}{n}\right)\right|$$
$$\le \mathbb{E}\left[\left|f(x) - f\left(\frac{S_n}{n}\right)\right| : \left\|\frac{S_n}{n} - x\right\| > \varepsilon\right] + \mathbb{E}\left[\left|f(x) - f\left(\frac{S_n}{n}\right)\right| : \left\|\frac{S_n}{n} - x\right\| \le \varepsilon\right]$$
$$\le 2M\, P\left(\left\|\frac{S_n}{n} - x\right\| > \varepsilon\right) + \delta_{\varepsilon}. \tag{4.57}$$
By Chebyshev's inequality,
$$P\left(\left\|\frac{S_n}{n} - x\right\| > \varepsilon\right) \le \frac{1}{\varepsilon^2}\, \mathbb{E}\left\|\frac{S_n}{n} - x\right\|^2 \le \frac{d}{4n\varepsilon^2},$$
and therefore, Eq. (4.57) yields the estimate
$$\sup_{x \in K} |f(x) - p_n(x)| \le \frac{2dM}{4n\varepsilon^2} + \delta_{\varepsilon}$$
and hence
$$\limsup_{n \to \infty} \sup_{x \in K} |f(x) - p_n(x)| \le \delta_{\varepsilon} \to 0 \text{ as } \varepsilon \downarrow 0.$$
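For $d = 1$ the polynomials $p_n$ constructed in this proof are the classical Bernstein polynomials, $p_n(x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1-x)^{n-k}$, since $S_n \sim \mathrm{Binomial}(n, x)$. A minimal numerical sketch of the uniform convergence (the test function and grid are my own choices):

```python
from math import comb

def bernstein(f, n, x):
    """p_n(x) = E[f(S_n / n)] with S_n ~ Binomial(n, x): the n-th Bernstein
    polynomial of f on [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda t: abs(t - 0.5)   # continuous on [0, 1] but not differentiable at 1/2

def sup_error(n, grid=101):
    """Approximate sup_{x in [0,1]} |f(x) - p_n(x)| on an equispaced grid."""
    pts = [i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in pts)
```

The worst-case error occurs near the kink at $x = 1/2$ and decays on the order of $n^{-1/2}$, consistent with the $d/(4n\varepsilon^2)$ Chebyshev bound in the proof.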
Here is a version of the complex Weierstrass approximation theorem.

Theorem 4.51 (Complex Weierstrass Approximation Theorem). Suppose that $K \subset \mathbb{C}^d \cong \mathbb{R}^d \times \mathbb{R}^d$ is a compact rectangle. Then there exist polynomials in $(z = x + iy, \bar{z} = x - iy)$, $q_n(z, \bar{z})$ for $z \in \mathbb{C}^d$, such that $\sup_{z \in K} |q_n(z, \bar{z}) - f(z)| \to 0$ as $n \to \infty$ for every $f \in C(K, \mathbb{C})$.
Proof. The mapping $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d \to z = x + iy \in \mathbb{C}^d$ is an isomorphism of vector spaces. Letting $\bar{z} = x - iy$ as usual, we have $x = \frac{z + \bar{z}}{2}$ and $y = \frac{z - \bar{z}}{2i}$. Therefore under this identification any polynomial $p(x, y)$ on $\mathbb{R}^d \times \mathbb{R}^d$ may be written as a polynomial $q$ in $(z, \bar{z})$, namely
$$q(z, \bar{z}) = p\left(\frac{z + \bar{z}}{2}, \frac{z - \bar{z}}{2i}\right).$$
Conversely a polynomial $q$ in $(z, \bar{z})$ may be thought of as a polynomial $p$ in $(x, y)$, namely $p(x, y) = q(x + iy, x - iy)$. Hence the result now follows from Theorem 4.50.
Example 4.52. Let $K = S^1 = \{z \in \mathbb{C} : |z| = 1\}$ and $\mathcal{A}$ be the set of polynomials in $(z, \bar{z})$ restricted to $S^1$. Then $\mathcal{A}$ is dense in $C(S^1)$. To prove this first observe if $f \in C(S^1)$ then $F(z) = |z|\, f\left(\frac{z}{|z|}\right)$ for $z \ne 0$ and $F(0) = 0$ defines $F \in C(\mathbb{C})$ such that $F|_{S^1} = f$. By applying Theorem 4.51 to $F$ restricted to a compact rectangle containing $S^1$ we may find $q_n(z, \bar{z})$ converging uniformly to $F$ on $K$ and hence on $S^1$. Since $\bar{z} = z^{-1}$ on $S^1$, we have shown polynomials in $z$ and $z^{-1}$ are dense in $C(S^1)$. This example generalizes in an obvious way to $K = \left(S^1\right)^d \subset \mathbb{C}^d$.
Exercise 4.16. Use Example 4.52 to show that any $2\pi$ – periodic continuous function, $g : \mathbb{R}^d \to \mathbb{C}$, may be uniformly approximated by a trigonometric polynomial of the form
$$p(x) = \sum_{\lambda \in \Lambda} a_{\lambda} e^{i \lambda \cdot x}$$
where $\Lambda$ is a finite subset of $\mathbb{Z}^d$ and $a_{\lambda} \in \mathbb{C}$ for all $\lambda \in \Lambda$. Hint: start by showing there exists a unique continuous function, $f : \left(S^1\right)^d \to \mathbb{C}$ such that $f\left(e^{i x_1}, \dots, e^{i x_d}\right) = g(x)$ for all $x = (x_1, \dots, x_d) \in \mathbb{R}^d$.
Solution to Exercise (4.16). I will write out the solution when $d = 1$. For $z \in S^1$, define $F(z) := f(e^{i\theta})$ where $\theta \in \mathbb{R}$ is chosen so that $z = e^{i\theta}$. Since $f$ is $2\pi$ – periodic, $F$ is well defined since if $\theta$ solves $e^{i\theta} = z$ then all other solutions are of the form $\{\theta + 2\pi n : n \in \mathbb{Z}\}$. Since the map $\theta \to e^{i\theta}$ is a local homeomorphism, i.e. for any $J = (a, b)$ with $b - a < 2\pi$, the map $J \ni \theta \to \tilde{J} := \left\{e^{i\theta} : \theta \in J\right\} \subset S^1$ is a homeomorphism, it follows that $F(z) = f \circ \theta^{-1}(z)$ for $z \in \tilde{J}$. This shows $F$ is continuous when restricted to $\tilde{J}$. Since such sets cover $S^1$, it follows that $F$ is continuous. It now follows from Example 4.52 that polynomials in $z$ and $z^{-1}$ are dense in $C(S^1)$. Hence for any $\varepsilon > 0$ there exists
$$p(z, \bar{z}) = \sum a_{m,n} z^m \bar{z}^n = \sum a_{m,n} z^m z^{-n} = \sum a_{m,n} z^{m-n}$$
such that $|F(z) - p(z, \bar{z})| \le \varepsilon$ for all $z$. Taking $z = e^{i\theta}$ then implies there exists $b_n \in \mathbb{C}$ and $N \in \mathbb{N}$ such that
$$p_{\varepsilon}(\theta) := \sum_{n=-N}^{N} b_n e^{i n \theta} \tag{4.58}$$
satisfies
$$\sup_{\theta} |f(\theta) - p_{\varepsilon}(\theta)| \le \varepsilon.$$
Exercise 4.17. Suppose $f \in C(\mathbb{R}, \mathbb{C})$ is a $2\pi$ – periodic function (i.e. $f(x + 2\pi) = f(x)$ for all $x \in \mathbb{R}$) and
$$\int_0^{2\pi} f(x)\, e^{-inx}\, dx = 0 \text{ for all } n \in \mathbb{Z},$$
show again that $f \equiv 0$. Hint: Use Exercise 4.16.
Solution to Exercise (4.17). By assumption, $\int_0^{2\pi} f(\theta)\, e^{-in\theta}\, d\theta = 0$ for all $n$ and so by the linearity of the Riemann integral,
$$0 = \int_0^{2\pi} f(\theta)\, p_{\varepsilon}(\theta)\, d\theta. \tag{4.59}$$
Choose trigonometric polynomials, $p_{\varepsilon}$, as in Eq. (4.58), such that $p_{\varepsilon}(\theta) \to \bar{f}(\theta)$ uniformly in $\theta$ as $\varepsilon \downarrow 0$. Passing to the limit in Eq. (4.59) implies
$$0 = \lim_{\varepsilon \downarrow 0} \int_0^{2\pi} f(\theta)\, p_{\varepsilon}(\theta)\, d\theta = \int_0^{2\pi} f(\theta)\, \bar{f}(\theta)\, d\theta = \int_0^{2\pi} |f(\theta)|^2\, d\theta.$$
From this it follows that $f \equiv 0$, for if $|f(\theta_0)| > 0$ for some $\theta_0$ then $|f(\theta)| > 0$ for $\theta$ in a neighborhood of $\theta_0$ by continuity of $f$. It would then follow that $\int_0^{2\pi} |f(\theta)|^2\, d\theta > 0$.
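The argument rests on the orthogonality of the exponentials $e^{in\theta}$ on $[0, 2\pi]$: the integral $\frac{1}{2\pi}\int_0^{2\pi} f(\theta) e^{-in\theta} d\theta$ picks out exactly the coefficient of $e^{in\theta}$ in $f$, so a function with all coefficients zero "looks like" zero against every trigonometric polynomial. A numerical sketch of this coefficient extraction (the test function is my own; the equispaced Riemann sum is exact up to rounding for trigonometric polynomials of low degree):

```python
import cmath
import math

def fourier_coefficient(f, n, steps=4096):
    """Approximate (1 / 2*pi) * Integral_0^{2*pi} f(theta) e^{-i n theta} dtheta
    by an equispaced Riemann sum, which is exact (to rounding) when f is a
    trigonometric polynomial of degree << steps."""
    h = 2 * math.pi / steps
    total = sum(f(k * h) * cmath.exp(-1j * n * k * h) for k in range(steps))
    return total * h / (2 * math.pi)

# A sample trigonometric polynomial: f(theta) = 3 e^{2 i theta} - i e^{-i theta}.
f = lambda t: 3.0 * cmath.exp(2j * t) - 1j * cmath.exp(-1j * t)
c2 = fourier_coefficient(f, 2)        # should recover 3
c_minus1 = fourier_coefficient(f, -1) # should recover -i
c0 = fourier_coefficient(f, 0)        # should recover 0
```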
5
Countably Additive Measures
Let $\mathcal{A} \subset 2^{\Omega}$ be an algebra and $\mu : \mathcal{A} \to [0, \infty]$ be a finitely additive measure. Recall that $\mu$ is a premeasure on $\mathcal{A}$ if $\mu$ is $\sigma$ – additive on $\mathcal{A}$. If $\mu$ is a premeasure on $\mathcal{A}$ and $\mathcal{A}$ is a $\sigma$ – algebra (Definition 3.12), we say that $\mu$ is a measure on $(\Omega, \mathcal{A})$ and that $(\Omega, \mathcal{A})$ is a measurable space.

Definition 5.1. Let $(\Omega, \mathcal{B})$ be a measurable space. We say that $P : \mathcal{B} \to [0, 1]$ is a probability measure on $(\Omega, \mathcal{B})$ if $P$ is a measure on $\mathcal{B}$ such that $P(\Omega) = 1$. In this case we say that $(\Omega, \mathcal{B}, P)$ is a probability space.
5.1 Overview

The goal of this chapter is to develop methods for proving the existence of probability measures with desirable properties. The main results of this chapter may be summarized in the following theorem.

Theorem 5.2. A finitely additive probability measure $P$ on an algebra, $\mathcal{A} \subset 2^{\Omega}$, extends to a $\sigma$ – additive measure on $\sigma(\mathcal{A})$ iff $P$ is a premeasure on $\mathcal{A}$. If the extension exists it is unique.

Proof. The uniqueness assertion is proved in Proposition 5.15 below. The existence assertion of the theorem is the content of Theorem 5.27.

In order to use this theorem it is necessary to determine when a finitely additive probability measure is in fact a premeasure. The following proposition is sometimes useful in this regard.
Proposition 5.3 (Equivalent premeasure conditions). Suppose that $P$ is a finitely additive probability measure on an algebra, $\mathcal{A} \subset 2^{\Omega}$. Then the following are equivalent:

1. $P$ is a premeasure on $\mathcal{A}$, i.e. $P$ is $\sigma$ – additive on $\mathcal{A}$.
2. For all $A_n \in \mathcal{A}$ such that $A_n \uparrow A \in \mathcal{A}$, $P(A_n) \uparrow P(A)$.
3. For all $A_n \in \mathcal{A}$ such that $A_n \downarrow A \in \mathcal{A}$, $P(A_n) \downarrow P(A)$.
4. For all $A_n \in \mathcal{A}$ such that $A_n \uparrow \Omega$, $P(A_n) \uparrow 1$.
5. For all $A_n \in \mathcal{A}$ such that $A_n \downarrow \emptyset$, $P(A_n) \downarrow 0$.
Proof. We will start by showing $1 \iff 2 \iff 3$.

1. $\implies$ 2. Suppose $A_n \in \mathcal{A}$ such that $A_n \uparrow A \in \mathcal{A}$. Let $A'_n := A_n \setminus A_{n-1}$ with $A_0 := \emptyset$. Then $\{A'_n\}_{n=1}^{\infty}$ are disjoint, $A_n = \cup_{k=1}^{n} A'_k$ and $A = \cup_{k=1}^{\infty} A'_k$. Therefore,
$$P(A) = \sum_{k=1}^{\infty} P(A'_k) = \lim_{n \to \infty} \sum_{k=1}^{n} P(A'_k) = \lim_{n \to \infty} P\left(\cup_{k=1}^{n} A'_k\right) = \lim_{n \to \infty} P(A_n).$$

2. $\implies$ 1. If $\{A_n\}_{n=1}^{\infty} \subset \mathcal{A}$ are disjoint and $A := \cup_{n=1}^{\infty} A_n \in \mathcal{A}$, then $\cup_{n=1}^{N} A_n \uparrow A$. Therefore,
$$P(A) = \lim_{N \to \infty} P\left(\cup_{n=1}^{N} A_n\right) = \lim_{N \to \infty} \sum_{n=1}^{N} P(A_n) = \sum_{n=1}^{\infty} P(A_n).$$

2. $\implies$ 3. If $A_n \in \mathcal{A}$ such that $A_n \downarrow A \in \mathcal{A}$, then $A_n^c \uparrow A^c$ and therefore,
$$\lim_{n \to \infty} (1 - P(A_n)) = \lim_{n \to \infty} P(A_n^c) = P(A^c) = 1 - P(A).$$

3. $\implies$ 2. If $A_n \in \mathcal{A}$ such that $A_n \uparrow A \in \mathcal{A}$, then $A_n^c \downarrow A^c$ and therefore we again have,
$$\lim_{n \to \infty} (1 - P(A_n)) = \lim_{n \to \infty} P(A_n^c) = P(A^c) = 1 - P(A).$$

The same proof used for 2. $\iff$ 3. shows 4. $\iff$ 5. and it is clear that 3. $\implies$ 5. To finish the proof we will show 5. $\implies$ 2.

5. $\implies$ 2. If $A_n \in \mathcal{A}$ such that $A_n \uparrow A \in \mathcal{A}$, then $A \setminus A_n \downarrow \emptyset$ and therefore
$$\lim_{n \to \infty} [P(A) - P(A_n)] = \lim_{n \to \infty} P(A \setminus A_n) = 0.$$
Remark 5.4. Observe that the equivalence of items 1. and 2. in the above proposition holds without the restriction that $P(\Omega) = 1$ and in fact $P(\Omega) = \infty$ may be allowed for this equivalence.

Lemma 5.5. If $\mu : \mathcal{A} \to [0, \infty]$ is a premeasure, then $\mu$ is countably sub-additive on $\mathcal{A}$.
Proof. Suppose that $A_n \in \mathcal{A}$ with $\cup_{n=1}^{\infty} A_n \in \mathcal{A}$. Let $A'_1 := A_1$ and for $n \ge 2$, let $A'_n := A_n \setminus (A_1 \cup \dots \cup A_{n-1}) \in \mathcal{A}$. Then $\cup_{n=1}^{\infty} A_n = \coprod_{n=1}^{\infty} A'_n$ and therefore by the countable additivity and monotonicity of $\mu$ we have,
$$\mu\left(\cup_{n=1}^{\infty} A_n\right) = \mu\left(\coprod_{n=1}^{\infty} A'_n\right) = \sum_{n=1}^{\infty} \mu(A'_n) \le \sum_{n=1}^{\infty} \mu(A_n).$$
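The disjointification trick in this proof ($A'_n := A_n \setminus (A_1 \cup \dots \cup A_{n-1})$, which keeps the union while making the pieces disjoint and shrinking each piece) is purely set-theoretic and easy to exercise directly. A minimal sketch with hand-picked finite sets (my own example data):

```python
def disjointify(sets):
    """Return A'_1 = A_1, A'_n = A_n \\ (A_1 u ... u A_{n-1}): the pieces are
    pairwise disjoint, each A'_n is contained in A_n, and the union is unchanged."""
    seen = set()
    out = []
    for A in sets:
        out.append(A - seen)  # remove everything already covered
        seen |= A
    return out

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
Aprime = disjointify(A)
```

Because $A'_n \subset A_n$, monotonicity of $\mu$ then turns the exact sum $\sum \mu(A'_n)$ into the sub-additivity bound $\sum \mu(A_n)$.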
Let us now specialize to the case where $\Omega = \mathbb{R}$ and $\mathcal{A} = \mathcal{A}(\{(a, b] \cap \mathbb{R} : -\infty \le a \le b \le \infty\})$. In this case we will describe probability measures, $P$, on $\mathcal{B}_{\mathbb{R}}$ by their cumulative distribution functions.

Definition 5.6. Given a probability measure, $P$ on $\mathcal{B}_{\mathbb{R}}$, the cumulative distribution function (CDF) of $P$ is defined as the function, $F = F_P : \mathbb{R} \to [0, 1]$ given as
$$F(x) := P((-\infty, x]). \tag{5.1}$$
Example 5.7. Suppose that
$$P = p\, \delta_{-1} + q\, \delta_1 + r\, \delta_{\pi}$$
with $p, q, r > 0$ and $p + q + r = 1$. In this case,
$$F(x) = \begin{cases} 0 & \text{for } x < -1 \\ p & \text{for } -1 \le x < 1 \\ p + q & \text{for } 1 \le x < \pi \\ 1 & \text{for } \pi \le x < \infty \end{cases}.$$
A plot of $F(x)$ with $p = .2$, $q = .3$, and $r = .5$.
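The step-function CDF of a discrete measure with finitely many atoms can be computed directly from Eq. (5.1). A minimal sketch (atom locations and masses chosen to match the values $p = .2$, $q = .3$, $r = .5$ quoted for the plot; the middle atom locations are as read off the example's displayed formula for $F$):

```python
import math

# (location, mass) pairs for the atoms of P = p*delta_{-1} + q*delta_1 + r*delta_pi.
atoms = [(-1.0, 0.2), (1.0, 0.3), (math.pi, 0.5)]

def F(x):
    """CDF F(x) = P((-inf, x]) of the discrete measure sum_i m_i * delta_{a_i}:
    accumulate the mass of every atom located at or to the left of x."""
    return sum(m for a, m in atoms if a <= x)
```

Note that $F$ is right continuous at each atom (the atom's mass is included at its own location), non-decreasing, and jumps by exactly the atom's mass, matching Lemma 5.8 below.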
Lemma 5.8. If $F = F_P : \mathbb{R} \to [0, 1]$ is a distribution function for a probability measure, $P$, on $\mathcal{B}_{\mathbb{R}}$, then:

1. $F$ is non-decreasing,
2. $F$ is right continuous,
3. $F(-\infty) := \lim_{x \to -\infty} F(x) = 0$, and $F(\infty) := \lim_{x \to \infty} F(x) = 1$.

Proof. The monotonicity of $P$ shows that $F(x)$ in Eq. (5.1) is non-decreasing. For $b \in \mathbb{R}$ let $A_n = (-\infty, b_n]$ with $b_n \downarrow b$ as $n \to \infty$. The continuity of $P$ implies
$$F(b_n) = P((-\infty, b_n]) \downarrow P((-\infty, b]) = F(b).$$
Since $\{b_n\}_{n=1}^{\infty}$ was an arbitrary sequence such that $b_n \downarrow b$, we have shown $F(b+) := \lim_{y \downarrow b} F(y) = F(b)$. This shows that $F$ is right continuous. Similar arguments show that $F(\infty) = 1$ and $F(-\infty) = 0$.
It turns out that Lemma 5.8 has the following important converse.

Theorem 5.9. To each function $F : \mathbb{R} \to [0, 1]$ satisfying properties 1. – 3. in Lemma 5.8, there exists a unique probability measure, $P_F$, on $\mathcal{B}_{\mathbb{R}}$ such that
$$P_F((a, b]) = F(b) - F(a) \text{ for all } -\infty < a \le b < \infty.$$

Proof. The uniqueness assertion is proved in Corollary 5.17 below or see Exercises 5.2 and 5.11 below. The existence portion of the theorem is a special case of Theorem 5.33 below.
Example 5.10 (Uniform Distribution). The function,
$$F(x) := \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } 0 \le x < 1 \\ 1 & \text{for } 1 \le x < \infty \end{cases},$$
is the distribution function for a measure, $m$ on $\mathcal{B}_{\mathbb{R}}$ which is concentrated on $(0, 1]$. The measure, $m$ is called the uniform distribution or Lebesgue measure on $(0, 1]$.
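As a numerical aside (my own illustration, not part of the formal development), the uniform CDF is easy to code, and the empirical CDF of a large i.i.d. uniform sample converges uniformly to it, anticipating the way CDFs characterize measures in Corollary 5.17:

```python
import random

def F(x):
    """CDF of the uniform distribution (Lebesgue measure) on (0, 1]."""
    return 0.0 if x <= 0 else (x if x < 1 else 1.0)

def empirical_sup_gap(n=100_000, seed=1):
    """sup over sample points of |F_n(x) - F(x)|, where F_n is the empirical
    CDF of n pseudo-random uniform draws (checked just left and right of each
    order statistic, where the sup is attained)."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))
    return max(max(abs((i + 1) / n - F(x)), abs(i / n - F(x)))
               for i, x in enumerate(xs))
```

With $n = 10^5$ samples the uniform gap is far below $0.01$ with overwhelming probability (and deterministically so for the fixed seed used here).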
With this summary in hand, let us now start the formal development. We begin with the uniqueness statement in Theorem 5.2.

5.2 π – λ Theorem

Recall that a collection, $\mathcal{P} \subset 2^{\Omega}$, is a π – class or π – system if it is closed under finite intersections. We also need the notion of a λ – system.

Definition 5.11 (λ – system). A collection of sets, $\mathcal{L} \subset 2^{\Omega}$, is λ – class or λ – system if
Fig. 5.1. The cumulative distribution function for the uniform distribution.
a. $\Omega \in \mathcal{L}$
b. If $A, B \in \mathcal{L}$ and $A \subset B$, then $B \setminus A \in \mathcal{L}$. (Closed under proper differences.)
c. If $A_n \in \mathcal{L}$ and $A_n \uparrow A$, then $A \in \mathcal{L}$. (Closed under countable increasing unions.)
Remark 5.12. If $\mathcal{L}$ is a collection of subsets of $\Omega$ which is both a λ – class and a π – system then $\mathcal{L}$ is a σ – algebra. Indeed, since $A^c = \Omega \setminus A$, we see that any λ – system is closed under complementation. If $\mathcal{L}$ is also a π – system, it is closed under intersections and therefore $\mathcal{L}$ is an algebra. Since $\mathcal{L}$ is also closed under increasing unions, $\mathcal{L}$ is a σ – algebra.
Lemma 5.13 (Alternate Axioms for a λ – System*). Suppose that $\mathcal{L} \subset 2^{\Omega}$ is a collection of subsets of $\Omega$. Then $\mathcal{L}$ is a λ – class iff $\mathcal{L}$ satisfies the following postulates:

1. $\Omega \in \mathcal{L}$
2. $A \in \mathcal{L}$ implies $A^c \in \mathcal{L}$. (Closed under complementation.)
3. If $\{A_n\}_{n=1}^{\infty} \subset \mathcal{L}$ are disjoint, then $\coprod_{n=1}^{\infty} A_n \in \mathcal{L}$. (Closed under disjoint unions.)

Proof. Suppose that $\mathcal{L}$ satisfies a. – c. above. Clearly then postulates 1. and 2. hold. Suppose that $A, B \in \mathcal{L}$ such that $A \cap B = \emptyset$, then $A \subset B^c$ and
$$A^c \cap B^c = B^c \setminus A \in \mathcal{L}.$$
Taking complements of this result shows $A \cup B \in \mathcal{L}$ as well. So by induction, $B_m := \coprod_{n=1}^{m} A_n \in \mathcal{L}$. Since $B_m \uparrow \coprod_{n=1}^{\infty} A_n$ it follows from postulate c. that $\coprod_{n=1}^{\infty} A_n \in \mathcal{L}$.

Now suppose that $\mathcal{L}$ satisfies postulates 1. – 3. above. Notice that $\emptyset \in \mathcal{L}$ and by postulate 3., $\mathcal{L}$ is closed under finite disjoint unions. Therefore if $A, B \in \mathcal{L}$ with $A \subset B$, then $B^c \in \mathcal{L}$ and $A \cap B^c = \emptyset$ allows us to conclude that $A \coprod B^c \in \mathcal{L}$. Taking complements of this result shows $B \setminus A = A^c \cap B \in \mathcal{L}$ as well, i.e. postulate b. holds. If $A_n \in \mathcal{L}$ with $A_n \uparrow A$, then $B_n := A_n \setminus A_{n-1} \in \mathcal{L}$ for all $n$, where by convention $A_0 = \emptyset$. Hence it follows by postulate 3. that $\cup_{n=1}^{\infty} A_n = \coprod_{n=1}^{\infty} B_n \in \mathcal{L}$.
Theorem 5.14 (Dynkin's π – λ Theorem). If $\mathcal{L}$ is a λ – class which contains a π – class, $\mathcal{P}$, then $\sigma(\mathcal{P}) \subset \mathcal{L}$.

Proof. We start by proving the following assertion; for any element $C \in \mathcal{L}$, the collection of sets,
$$\mathcal{L}^C := \{D \in \mathcal{L} : C \cap D \in \mathcal{L}\},$$
is a λ – system. To prove this claim, observe that: a. $\Omega \in \mathcal{L}^C$. b. If $A \subset B$ with $A, B \in \mathcal{L}^C$, then $A \cap C, B \cap C \in \mathcal{L}$ with $A \cap C \subset B \cap C$ and therefore,
$$(B \setminus A) \cap C = [B \cap C] \setminus A = [B \cap C] \setminus [A \cap C] \in \mathcal{L}.$$
This shows that $\mathcal{L}^C$ is closed under proper differences. c. If $A_n \in \mathcal{L}^C$ with $A_n \uparrow A$, then $A_n \cap C \in \mathcal{L}$ and $A_n \cap C \uparrow A \cap C \in \mathcal{L}$, i.e. $A \in \mathcal{L}^C$. Hence we have verified $\mathcal{L}^C$ is still a λ – system.

For the rest of the proof, we may assume without loss of generality that $\mathcal{L}$ is the smallest λ – class containing $\mathcal{P}$ – if not just replace $\mathcal{L}$ by the intersection of all λ – classes containing $\mathcal{P}$. Then for $C \in \mathcal{P}$ we know that $\mathcal{L}^C \subset \mathcal{L}$ is a λ – class containing $\mathcal{P}$ and hence $\mathcal{L}^C = \mathcal{L}$. Since $C \in \mathcal{P}$ was arbitrary, we have shown, $C \cap D \in \mathcal{L}$ for all $C \in \mathcal{P}$ and $D \in \mathcal{L}$. We may now conclude that if $C \in \mathcal{L}$, then $\mathcal{P} \subset \mathcal{L}^C \subset \mathcal{L}$ and hence again $\mathcal{L}^C = \mathcal{L}$. Since $C \in \mathcal{L}$ is arbitrary, we have shown $C \cap D \in \mathcal{L}$ for all $C, D \in \mathcal{L}$, i.e. $\mathcal{L}$ is a π – system. So by Remark 5.12, $\mathcal{L}$ is a σ – algebra. Since $\sigma(\mathcal{P})$ is the smallest σ – algebra containing $\mathcal{P}$ it follows that $\sigma(\mathcal{P}) \subset \mathcal{L}$.
As an immediate corollary, we have the following uniqueness result.

Proposition 5.15. Suppose that $\mathcal{P} \subset 2^{\Omega}$ is a π – system. If $P$ and $Q$ are two probability$^1$ measures on $\sigma(\mathcal{P})$ such that $P = Q$ on $\mathcal{P}$, then $P = Q$ on $\sigma(\mathcal{P})$.

Proof. Let $\mathcal{L} := \{A \in \sigma(\mathcal{P}) : P(A) = Q(A)\}$. One easily shows $\mathcal{L}$ is a λ – class which contains $\mathcal{P}$ by assumption. Indeed, $\Omega \in \mathcal{L}$, if $A, B \in \mathcal{L}$ with $A \subset B$, then
$$P(B \setminus A) = P(B) - P(A) = Q(B) - Q(A) = Q(B \setminus A)$$
$^1$ More generally, $P$ and $Q$ could be two measures such that $P(\Omega) = Q(\Omega) < \infty$.
so that $B \setminus A \in \mathcal{L}$, and if $A_n \in \mathcal{L}$ with $A_n \uparrow A$, then $P(A) = \lim_{n \to \infty} P(A_n) = \lim_{n \to \infty} Q(A_n) = Q(A)$ which shows $A \in \mathcal{L}$. Therefore $\sigma(\mathcal{P}) \subset \mathcal{L} \subset \sigma(\mathcal{P})$ and the proof is complete.
Example 5.16. Let $\Omega := \{a, b, c, d\}$ and let $\mu$ and $\nu$ be the probability measures on $2^{\Omega}$ determined by, $\mu(\{x\}) = \frac{1}{4}$ for all $x \in \Omega$ and $\nu(\{a\}) = \nu(\{d\}) = \frac{1}{8}$ and $\nu(\{b\}) = \nu(\{c\}) = 3/8$. In this example,
$$\mathcal{L} := \left\{A \in 2^{\Omega} : \mu(A) = \nu(A)\right\}$$
is λ – system which is not an algebra. Indeed, $A = \{a, b\}$ and $B = \{a, c\}$ are in $\mathcal{L}$ but $A \cap B \notin \mathcal{L}$.
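Since $\Omega$ has only 16 subsets, the claims of this example can be checked exhaustively. A minimal sketch enumerating $\mathcal{L}$ and verifying both the λ – system axioms (in the difference/complement form of Lemma 5.13) and the failure of closure under intersection:

```python
from itertools import chain, combinations

Omega = ['a', 'b', 'c', 'd']
mu = {'a': 1/4, 'b': 1/4, 'c': 1/4, 'd': 1/4}
nu = {'a': 1/8, 'b': 3/8, 'c': 3/8, 'd': 1/8}

def subsets(s):
    """All 2^|s| subsets of s, as frozensets."""
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def m(meas, A):
    """Measure of the set A under the point-mass dictionary meas."""
    return sum(meas[x] for x in A)

# L = { A : mu(A) = nu(A) }, computed by brute force.
L = {A for A in subsets(Omega) if abs(m(mu, A) - m(nu, A)) < 1e-12}
```

The enumeration confirms $\{a, b\}, \{a, c\} \in \mathcal{L}$ while $\{a\} = \{a, b\} \cap \{a, c\} \notin \mathcal{L}$, even though $\mathcal{L}$ contains $\Omega$ and is closed under complements and proper differences.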
Exercise 5.1. Suppose that $\mu$ and $\nu$ are two measures (not assumed to be finite) on a measure space, $(\Omega, \mathcal{B})$ such that $\mu = \nu$ on a π – system, $\mathcal{P}$. Further assume $\mathcal{B} = \sigma(\mathcal{P})$ and there exists $\Omega_n \in \mathcal{P}$ such that; i) $\mu(\Omega_n) = \nu(\Omega_n) < \infty$ for all $n$ and ii) $\Omega_n \uparrow \Omega$ as $n \to \infty$. Show $\mu = \nu$ on $\mathcal{B}$.

Hint: Consider the measures, $\mu_n(A) := \mu(A \cap \Omega_n)$ and $\nu_n(A) = \nu(A \cap \Omega_n)$.
Solution to Exercise (5.1). Let $\mu_n(A) := \mu(A \cap \Omega_n)$ and $\nu_n(A) = \nu(A \cap \Omega_n)$ for all $A \in \mathcal{B}$. Then $\mu_n$ and $\nu_n$ are finite measures such that $\mu_n(\Omega) = \nu_n(\Omega)$ and $\mu_n = \nu_n$ on $\mathcal{P}$. Therefore by Proposition 5.15, $\mu_n = \nu_n$ on $\mathcal{B}$. So by the continuity properties of $\mu$ and $\nu$, it follows that
$$\mu(A) = \lim_{n \to \infty} \mu(A \cap \Omega_n) = \lim_{n \to \infty} \mu_n(A) = \lim_{n \to \infty} \nu_n(A) = \lim_{n \to \infty} \nu(A \cap \Omega_n) = \nu(A)$$
for all $A \in \mathcal{B}$.
Corollary 5.17. A probability measure, $P$, on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ is uniquely determined by its cumulative distribution function,
$$F(x) := P((-\infty, x]).$$

Proof. This follows from Proposition 5.15 wherein we use the fact that $\mathcal{P} := \{(-\infty, x] : x \in \mathbb{R}\}$ is a π – system such that $\mathcal{B}_{\mathbb{R}} = \sigma(\mathcal{P})$.
Remark 5.18. Corollary 5.17 generalizes to $\mathbb{R}^n$. Namely a probability measure, $P$, on $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n})$ is uniquely determined by its CDF,
$$F(x) := P((-\infty, x]) \text{ for all } x \in \mathbb{R}^n$$
where now
$$(-\infty, x] := (-\infty, x_1] \times (-\infty, x_2] \times \dots \times (-\infty, x_n].$$
5.2.1 A Density Result*

Exercise 5.2 (Density of $\mathcal{A}$ in $\sigma(\mathcal{A})$). Suppose that $\mathcal{A} \subset 2^{\Omega}$ is an algebra, $\mathcal{B} := \sigma(\mathcal{A})$, and $P$ is a probability measure on $\mathcal{B}$. Let $\rho(A, B) := P(A \triangle B)$. The goal of this exercise is to use the π – λ theorem to show that $\mathcal{A}$ is dense in $\mathcal{B}$ relative to the "metric," $\rho$. More precisely you are to show using the following outline that for every $B \in \mathcal{B}$ and $\varepsilon > 0$ there exists $A \in \mathcal{A}$ such that $P(A \triangle B) < \varepsilon$.

1. Recall from Exercise 4.3 that $\rho(A, B) = P(A \triangle B) = \mathbb{E}|1_A - 1_B|$.
2. Observe; if $B = \cup_i B_i$ and $A = \cup_i A_i$, then
$$B \setminus A = \cup_i [B_i \setminus A] \subset \cup_i (B_i \setminus A_i) \subset \cup_i A_i \triangle B_i \text{ and}$$
$$A \setminus B = \cup_i [A_i \setminus B] \subset \cup_i (A_i \setminus B_i) \subset \cup_i A_i \triangle B_i$$
so that
$$A \triangle B \subset \cup_i (A_i \triangle B_i).$$
3. We also have
$$(B_2 \setminus B_1) \setminus (A_2 \setminus A_1) = B_2 \cap B_1^c \cap (A_2 \setminus A_1)^c = B_2 \cap B_1^c \cap (A_2 \cap A_1^c)^c = B_2 \cap B_1^c \cap (A_2^c \cup A_1)$$
$$= [B_2 \cap B_1^c \cap A_2^c] \cup [B_2 \cap B_1^c \cap A_1] \subset (B_2 \setminus A_2) \cup (A_1 \setminus B_1)$$
and similarly,
$$(A_2 \setminus A_1) \setminus (B_2 \setminus B_1) \subset (A_2 \setminus B_2) \cup (B_1 \setminus A_1)$$
so that
$$(A_2 \setminus A_1) \triangle (B_2 \setminus B_1) \subset (B_2 \setminus A_2) \cup (A_1 \setminus B_1) \cup (A_2 \setminus B_2) \cup (B_1 \setminus A_1) = (A_1 \triangle B_1) \cup (A_2 \triangle B_2).$$
4. Observe that if $A_n \in \mathcal{B}$ and $A_n \uparrow A$, then
$$P(B \triangle A_n) = P(B \setminus A_n) + P(A_n \setminus B) \to P(B \setminus A) + P(A \setminus B) = P(A \triangle B).$$
5. Let $\mathcal{L}$ be the collection of sets $B \in \mathcal{B}$ for which the assertion of the theorem holds. Show $\mathcal{L}$ is a λ – system which contains $\mathcal{A}$.
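The set-inclusion identities in steps 2. and 3. are finite set algebra, so they can be stress-tested exhaustively on random subsets of a small universe. A minimal sketch (universe size, number of trials, and seed are my own choices):

```python
import random

def symm_diff(A, B):
    """Symmetric difference A triangle B = (A \\ B) u (B \\ A)."""
    return (A - B) | (B - A)

rng = random.Random(7)
U = set(range(12))

def rand_set():
    """A uniformly random subset of the universe U."""
    return {x for x in U if rng.random() < 0.5}

checks = []
for _ in range(500):
    A1, A2, B1, B2 = rand_set(), rand_set(), rand_set(), rand_set()
    # Step 2 (two-set case): (A1 u A2) triangle (B1 u B2) is contained in
    # (A1 triangle B1) u (A2 triangle B2).
    checks.append(symm_diff(A1 | A2, B1 | B2)
                  <= symm_diff(A1, B1) | symm_diff(A2, B2))
    # Step 3: (A2 \ A1) triangle (B2 \ B1) is contained in the same union.
    checks.append(symm_diff(A2 - A1, B2 - B1)
                  <= symm_diff(A1, B1) | symm_diff(A2, B2))
```

Both inclusions hold in every trial, as the algebraic derivations in the outline guarantee; they are exactly what makes $\rho(A, B) = P(A \triangle B)$ behave well under unions and differences.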
Solution to Exercise (5.2). Since $\mathcal{L}$ contains the π – system, $\mathcal{A}$, it suffices by the π – λ theorem to show $\mathcal{L}$ is a λ – system. Clearly, $\Omega \in \mathcal{L}$ since $\Omega \in \mathcal{A} \subset \mathcal{L}$. If $B_1 \subset B_2$ with $B_i \in \mathcal{L}$ and $\varepsilon > 0$, there exists $A_i \in \mathcal{A}$ such that $P(B_i \triangle A_i) = \mathbb{E}|1_{A_i} - 1_{B_i}| < \varepsilon / 2$ and therefore,
$$P((B_2 \setminus B_1) \triangle (A_2 \setminus A_1)) \le P((A_1 \triangle B_1) \cup (A_2 \triangle B_2)) \le P(A_1 \triangle B_1) + P(A_2 \triangle B_2) < \varepsilon.$$
Also if $B_n \uparrow B$ with $B_n \in \mathcal{L}$, there exists $A_n \in \mathcal{A}$ such that $P(B_n \triangle A_n) < \varepsilon 2^{-n}$ and therefore,
$$P([\cup_n B_n] \triangle [\cup_n A_n]) \le \sum_{n=1}^{\infty} P(B_n \triangle A_n) < \varepsilon.$$
Moreover, if we let $B := \cup_n B_n$ and $A^N := \cup_{n=1}^{N} A_n$, then
$$P(B \triangle A^N) = P(B \setminus A^N) + P(A^N \setminus B) \to P(B \setminus A) + P(A \setminus B) = P(B \triangle A)$$
where $A := \cup_n A_n$. Hence it follows for $N$ large enough that $P(B \triangle A^N) < \varepsilon$. Since $\varepsilon > 0$ was arbitrary we have shown $B \in \mathcal{L}$ as desired.
5.3 Construction of Measures

Definition 5.19. Given a collection of subsets, $\mathcal{E}$, of $\Omega$, let $\mathcal{E}_{\sigma}$ denote the collection of subsets of $\Omega$ which are finite or countable unions of sets from $\mathcal{E}$. Similarly let $\mathcal{E}_{\delta}$ denote the collection of subsets of $\Omega$ which are finite or countable intersections of sets from $\mathcal{E}$. We also write $\mathcal{E}_{\sigma\delta} = (\mathcal{E}_{\sigma})_{\delta}$ and $\mathcal{E}_{\delta\sigma} = (\mathcal{E}_{\delta})_{\sigma}$, etc.
Lemma 5.20. Suppose that $\mathcal{A} \subset 2^{\Omega}$ is an algebra. Then:

1. $\mathcal{A}_{\sigma}$ is closed under taking countable unions and finite intersections.
2. $\mathcal{A}_{\delta}$ is closed under taking countable intersections and finite unions.
3. $\{A^c : A \in \mathcal{A}_{\sigma}\} = \mathcal{A}_{\delta}$ and $\{A^c : A \in \mathcal{A}_{\delta}\} = \mathcal{A}_{\sigma}$.

Proof. By construction $\mathcal{A}_{\sigma}$ is closed under countable unions. Moreover if $A = \cup_{i=1}^{\infty} A_i$ and $B = \cup_{j=1}^{\infty} B_j$ with $A_i, B_j \in \mathcal{A}$, then
$$A \cap B = \cup_{i,j=1}^{\infty} A_i \cap B_j \in \mathcal{A}_{\sigma},$$
which shows that $\mathcal{A}_{\sigma}$ is also closed under finite intersections. Item 3. is straightforward and item 2. follows from items 1. and 3.
Remark 5.21. Let us recall from Proposition 5.3 and Remark 5.4 that a finitely additive measure $\mu : \mathcal{A} \to [0, \infty]$ is a premeasure on $\mathcal{A}$ iff $\mu(A_n) \uparrow \mu(A)$ for all $\{A_n\}_{n=1}^{\infty} \subset \mathcal{A}$ such that $A_n \uparrow A \in \mathcal{A}$. Furthermore if $\mu(\Omega) < \infty$, then $\mu$ is a premeasure on $\mathcal{A}$ iff $\mu(A_n) \downarrow 0$ for all $\{A_n\}_{n=1}^{\infty} \subset \mathcal{A}$ such that $A_n \downarrow \emptyset$.
Proposition 5.22. Given a premeasure, $\mu : \mathcal{A} \to [0, \infty]$, we extend $\mu$ to $\mathcal{A}_{\sigma}$ by defining
$$\mu(B) := \sup\{\mu(A) : \mathcal{A} \ni A \subset B\}. \tag{5.2}$$
This function $\mu : \mathcal{A}_{\sigma} \to [0, \infty]$ then satisfies;

1. (Monotonicity) If $A, B \in \mathcal{A}_{\sigma}$ with $A \subset B$ then $\mu(A) \le \mu(B)$.
2. (Continuity) If $A_n \in \mathcal{A}$ and $A_n \uparrow A \in \mathcal{A}_{\sigma}$, then $\mu(A_n) \uparrow \mu(A)$ as $n \to \infty$.
3. (Strong Additivity) If $A, B \in \mathcal{A}_{\sigma}$, then
$$\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B). \tag{5.3}$$
4. (Sub-Additivity on $\mathcal{A}_{\sigma}$) The function $\mu$ is sub-additive on $\mathcal{A}_{\sigma}$, i.e. if $\{A_n\}_{n=1}^{\infty} \subset \mathcal{A}_{\sigma}$, then
$$\mu\left(\cup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} \mu(A_n). \tag{5.4}$$
5. (σ – Additivity on $\mathcal{A}_{\sigma}$) The function $\mu$ is countably additive on $\mathcal{A}_{\sigma}$.

Proof. 1. and 2. Monotonicity follows directly from Eq. (5.2) which then implies $\mu(A_n) \le \mu(B)$ for all $n$. Therefore $M := \lim_{n \to \infty} \mu(A_n) \le \mu(B)$. To prove the reverse inequality, let $\mathcal{A} \ni A \subset B$. Then by the continuity of $\mu$ on $\mathcal{A}$ and the fact that $A_n \cap A \uparrow A$ we have $\mu(A_n \cap A) \uparrow \mu(A)$. As $\mu(A_n) \ge \mu(A_n \cap A)$ for all $n$ it then follows that $M := \lim_{n \to \infty} \mu(A_n) \ge \mu(A)$. As $A \in \mathcal{A}$ with $A \subset B$ was arbitrary we may conclude,
$$\mu(B) = \sup\{\mu(A) : \mathcal{A} \ni A \subset B\} \le M.$$

3. Suppose that $A, B \in \mathcal{A}_{\sigma}$ and $\{A_n\}_{n=1}^{\infty}$ and $\{B_n\}_{n=1}^{\infty}$ are sequences in $\mathcal{A}$ such that $A_n \uparrow A$ and $B_n \uparrow B$ as $n \to \infty$. Then passing to the limit as $n \to \infty$ in the identity,
$$\mu(A_n \cup B_n) + \mu(A_n \cap B_n) = \mu(A_n) + \mu(B_n)$$
proves Eq. (5.3). In particular, it follows that $\mu$ is finitely additive on $\mathcal{A}_{\sigma}$.

4. and 5. Let $\{A_n\}_{n=1}^{\infty}$ be any sequence in $\mathcal{A}_{\sigma}$ and choose $\{A_{n,i}\}_{i=1}^{\infty} \subset \mathcal{A}$ such that $A_{n,i} \uparrow A_n$ as $i \to \infty$. Then we have,
$$\mu\left(\cup_{n=1}^{N} A_{n,N}\right) \le \sum_{n=1}^{N} \mu(A_{n,N}) \le \sum_{n=1}^{N} \mu(A_n) \le \sum_{n=1}^{\infty} \mu(A_n). \tag{5.5}$$
Since $\mathcal{A} \ni \cup_{n=1}^{N} A_{n,N} \uparrow \cup_{n=1}^{\infty} A_n \in \mathcal{A}_{\sigma}$, we may let $N \to \infty$ in Eq. (5.5) to conclude Eq. (5.4) holds. If we further assume that $\{A_n\}_{n=1}^{\infty} \subset \mathcal{A}_{\sigma}$ are pairwise disjoint, by the finite additivity and monotonicity of $\mu$ on $\mathcal{A}_{\sigma}$, we have
$$\sum_{n=1}^{\infty} \mu(A_n) = \lim_{N \to \infty} \sum_{n=1}^{N} \mu(A_n) = \lim_{N \to \infty} \mu\left(\coprod_{n=1}^{N} A_n\right) \le \mu\left(\coprod_{n=1}^{\infty} A_n\right).$$
This inequality along with Eq. (5.4) shows that $\mu$ is σ – additive on $\mathcal{A}_{\sigma}$.
Suppose $\mu$ is a finite premeasure on an algebra, $\mathcal{A} \subset 2^{\Omega}$, and $A \in \mathcal{A}_{\delta}$. Since $A^c \in \mathcal{A}_{\sigma}$ and $\Omega = A \coprod A^c$, it is reasonable to require $\mu(\Omega) = \mu(A) + \mu(A^c)$. From this observation we may extend $\mu$ to a function on $\mathcal{A}_{\delta} \cup \mathcal{A}_{\sigma}$ by defining
$$\mu(A) := \mu(\Omega) - \mu(A^c) \text{ for all } A \in \mathcal{A}_{\delta}. \tag{5.6}$$
Lemma 5.23. Suppose $\mu$ is a finite premeasure on an algebra, $\mathcal{A} \subset 2^{\Omega}$, and $\mu$ has been extended to $\mathcal{A}_{\delta} \cup \mathcal{A}_{\sigma}$ as described in Proposition 5.22 and Eq. (5.6) above.

1. If $A \in \mathcal{A}_{\delta}$ then $\mu(A) = \inf\{\mu(B) : A \subset B \in \mathcal{A}\}$.
2. If $A \in \mathcal{A}_{\delta}$ and $A_n \in \mathcal{A}$ such that $A_n \downarrow A$, then $\mu(A) = \lim_{n \to \infty} \mu(A_n)$.
3. $\mu$ is strongly additive when restricted to $\mathcal{A}_{\delta}$.
4. If $A \in \mathcal{A}_{\delta}$ and $C \in \mathcal{A}_{\sigma}$ such that $A \subset C$, then $\mu(C \setminus A) = \mu(C) - \mu(A)$.
Proof.

1. Since $\mu(B) = \mu(\Omega) - \mu(B^c)$ and $A \subset B$ iff $B^c \subset A^c$, it follows that
$$\inf\{\mu(B) : A \subset B \in \mathcal{A}\} = \inf\{\mu(\Omega) - \mu(B^c) : \mathcal{A} \ni B^c \subset A^c\} = \mu(\Omega) - \sup\{\mu(B) : \mathcal{A} \ni B \subset A^c\} = \mu(\Omega) - \mu(A^c) = \mu(A).$$

2. Similarly, since $A_n^c \uparrow A^c \in \mathcal{A}_{\sigma}$, by the definition of $\mu(A)$ and Proposition 5.22 it follows that
$$\mu(A) = \mu(\Omega) - \mu(A^c) = \mu(\Omega) - \lim_{n \to \infty} \mu(A_n^c) = \lim_{n \to \infty} [\mu(\Omega) - \mu(A_n^c)] = \lim_{n \to \infty} \mu(A_n).$$

3. Suppose $A, B \in \mathcal{A}_{\delta}$ and $A_n, B_n \in \mathcal{A}$ such that $A_n \downarrow A$ and $B_n \downarrow B$, then $A_n \cup B_n \downarrow A \cup B$ and $A_n \cap B_n \downarrow A \cap B$ and therefore,
$$\mu(A \cup B) + \mu(A \cap B) = \lim_{n \to \infty} [\mu(A_n \cup B_n) + \mu(A_n \cap B_n)] = \lim_{n \to \infty} [\mu(A_n) + \mu(B_n)] = \mu(A) + \mu(B).$$
All we really need is the finite additivity of $\mu$ which can be proved as follows. Suppose that $A, B \in \mathcal{A}_{\delta}$ are disjoint, then $A \cap B = \emptyset$ implies $A^c \cup B^c = \Omega$. So by the strong additivity of $\mu$ on $\mathcal{A}_{\sigma}$ it follows that
$$\mu(\Omega) + \mu(A^c \cap B^c) = \mu(A^c) + \mu(B^c)$$
from which it follows that
$$\mu(A \coprod B) = \mu(\Omega) - \mu(A^c \cap B^c) = \mu(\Omega) - [\mu(A^c) + \mu(B^c) - \mu(\Omega)] = \mu(A) + \mu(B).$$

4. Since $A^c, C \in \mathcal{A}_{\sigma}$ we may use the strong additivity of $\mu$ on $\mathcal{A}_{\sigma}$ to conclude,
$$\mu(A^c \cup C) + \mu(A^c \cap C) = \mu(A^c) + \mu(C).$$
Because $\Omega = A^c \cup C$, and $\mu(A^c) = \mu(\Omega) - \mu(A)$, the above equation may be written as
$$\mu(\Omega) + \mu(C \setminus A) = \mu(\Omega) - \mu(A) + \mu(C)$$
which finishes the proof.
Notation 5.24 (Inner and outer measures) Let $\mu : \mathcal{A} \to [0, \infty)$ be a finite premeasure extended to $\mathcal{A}_{\delta} \cup \mathcal{A}_{\sigma}$ as above. Then for any $B \subset \Omega$ let
$$\mu_*(B) := \sup\{\mu(A) : \mathcal{A}_{\delta} \ni A \subset B\} \text{ and}$$
$$\mu^*(B) := \inf\{\mu(C) : B \subset C \in \mathcal{A}_{\sigma}\}.$$
We refer to $\mu_*(B)$ and $\mu^*(B)$ as the inner and outer content of $B$ respectively.
If $B$ has the same inner and outer content it is reasonable to define the measure of $B$ as this common value. As we will see in Theorem 5.27 below, this extension becomes a σ – additive measure on a σ – algebra of subsets of $\Omega$.

Definition 5.25 (Measurable Sets). Suppose $\mu$ is a finite premeasure on an algebra $\mathcal{A} \subset 2^{\Omega}$. We say that $B \subset \Omega$ is measurable if $\mu_*(B) = \mu^*(B)$. We will denote the collection of measurable subsets of $\Omega$ by $\mathcal{B} = \mathcal{B}(\mu)$ and define $\bar{\mu} : \mathcal{B} \to [0, \mu(\Omega)]$ by
$$\bar{\mu}(B) := \mu_*(B) = \mu^*(B) \text{ for all } B \in \mathcal{B}. \tag{5.7}$$
Remark 5.26. Observe that $\mu_*(B) = \mu^*(B)$ iff for all $\varepsilon > 0$ there exists $A \in \mathcal{A}_{\delta}$ and $C \in \mathcal{A}_{\sigma}$ such that $A \subset B \subset C$ and
$$\mu(C \setminus A) = \mu(C) - \mu(A) < \varepsilon,$$
wherein we have used Lemma 5.23 for the first equality. Moreover we will use below that if $B \in \mathcal{B}$ and $\mathcal{A}_{\delta} \ni A \subset B \subset C \in \mathcal{A}_{\sigma}$, then
$$\mu(A) \le \mu_*(B) = \bar{\mu}(B) = \mu^*(B) \le \mu(C). \tag{5.8}$$
Theorem 5.27 (Finite Premeasure Extension Theorem). Suppose $\mu$ is a finite premeasure on an algebra $\mathcal{A} \subset 2^{\Omega}$ and let $\bar{\mu} : \mathcal{B} := \mathcal{B}(\mu) \to [0, \mu(\Omega)]$ be as in Definition 5.25. Then $\mathcal{B}$ is a σ – algebra on $\Omega$ which contains $\mathcal{A}$ and $\bar{\mu}$ is a σ – additive measure on $\mathcal{B}$. Moreover, $\bar{\mu}$ is the unique measure on $\mathcal{B}$ such that $\bar{\mu}|_{\mathcal{A}} = \mu$.

Proof. It is clear that $\mathcal{A} \subset \mathcal{B}$ and that $\mathcal{B}$ is closed under complementation. Now suppose that $B_i \in \mathcal{B}$ for $i = 1, 2$ and $\varepsilon > 0$ is given. We may then choose $A_i \subset B_i \subset C_i$ such that $A_i \in \mathcal{A}_{\delta}$, $C_i \in \mathcal{A}_{\sigma}$, and $\mu(C_i \setminus A_i) < \varepsilon$ for $i = 1, 2$. Then with $A = A_1 \cup A_2$, $B = B_1 \cup B_2$ and $C = C_1 \cup C_2$, we have $\mathcal{A}_{\delta} \ni A \subset B \subset C \in \mathcal{A}_{\sigma}$. Since
$$C \setminus A = (C_1 \setminus A) \cup (C_2 \setminus A) \subset (C_1 \setminus A_1) \cup (C_2 \setminus A_2),$$
it follows from the sub-additivity of $\mu$ that
$$\mu(C \setminus A) \le \mu(C_1 \setminus A_1) + \mu(C_2 \setminus A_2) < 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, we have shown that $B \in \mathcal{B}$. Hence we now know that $\mathcal{B}$ is an algebra.

Because $\mathcal{B}$ is an algebra, to verify that $\mathcal{B}$ is a σ – algebra it suffices to show that $B = \coprod_{n=1}^{\infty} B_n \in \mathcal{B}$ whenever $\{B_n\}_{n=1}^{\infty}$ is a disjoint sequence in $\mathcal{B}$. To prove $B \in \mathcal{B}$, let $\varepsilon > 0$ be given and choose $A_i \subset B_i \subset C_i$ such that $A_i \in \mathcal{A}_{\delta}$, $C_i \in \mathcal{A}_{\sigma}$, and $\mu(C_i \setminus A_i) < \varepsilon 2^{-i}$ for all $i$. Since the $\{A_i\}_{i=1}^{\infty}$ are pairwise disjoint we may use Lemma 5.23 to show,
$$\sum_{i=1}^{n} \mu(C_i) = \sum_{i=1}^{n} (\mu(A_i) + \mu(C_i \setminus A_i)) = \mu\left(\coprod_{i=1}^{n} A_i\right) + \sum_{i=1}^{n} \mu(C_i \setminus A_i) \le \mu(\Omega) + \sum_{i=1}^{n} \varepsilon 2^{-i}.$$
Passing to the limit, $n \to \infty$, in this equation then shows
$$\sum_{i=1}^{\infty} \mu(C_i) \le \mu(\Omega) + \varepsilon < \infty. \tag{5.9}$$
Let $B = \cup_{i=1}^{\infty} B_i$, $C := \cup_{i=1}^{\infty} C_i \in \mathcal{A}_{\sigma}$ and for $n \in \mathbb{N}$ let $A^n := \coprod_{i=1}^{n} A_i \in \mathcal{A}_{\delta}$. Then $\mathcal{A}_{\delta} \ni A^n \subset B \subset C \in \mathcal{A}_{\sigma}$, $C \setminus A^n \in \mathcal{A}_{\sigma}$ and
$$C \setminus A^n = \cup_{i=1}^{\infty} (C_i \setminus A^n) \subset \left[\cup_{i=1}^{n} (C_i \setminus A_i)\right] \cup \left[\cup_{i=n+1}^{\infty} C_i\right] \in \mathcal{A}_{\sigma}.$$
Therefore, using the sub-additivity of $\mu$ on $\mathcal{A}_{\sigma}$ and the estimate in Eq. (5.9),
$$\mu(C \setminus A^n) \le \sum_{i=1}^{n} \mu(C_i \setminus A_i) + \sum_{i=n+1}^{\infty} \mu(C_i) \le \varepsilon + \sum_{i=n+1}^{\infty} \mu(C_i) \to \varepsilon \text{ as } n \to \infty.$$
Since $\varepsilon > 0$ is arbitrary, it follows that $B \in \mathcal{B}$ and that
$$\sum_{i=1}^{n} \mu(A_i) = \mu(A^n) \le \bar{\mu}(B) \le \mu(C) \le \sum_{i=1}^{\infty} \mu(C_i).$$
Letting $n \to \infty$ in this equation then shows,
$$\sum_{i=1}^{\infty} \mu(A_i) \le \bar{\mu}(B) \le \sum_{i=1}^{\infty} \mu(C_i). \tag{5.10}$$
On the other hand, since $A_i \subset B_i \subset C_i$, it follows (see Eq. (5.8)) that
$$\sum_{i=1}^{\infty} \mu(A_i) \le \sum_{i=1}^{\infty} \bar{\mu}(B_i) \le \sum_{i=1}^{\infty} \mu(C_i). \tag{5.11}$$
As
$$\sum_{i=1}^{\infty} \mu(C_i) - \sum_{i=1}^{\infty} \mu(A_i) = \sum_{i=1}^{\infty} \mu(C_i \setminus A_i) \le \sum_{i=1}^{\infty} \varepsilon 2^{-i} = \varepsilon,$$
we may conclude from Eqs. (5.10) and (5.11) that
$$\left|\bar{\mu}(B) - \sum_{i=1}^{\infty} \bar{\mu}(B_i)\right| \le \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, we have shown $\bar{\mu}(B) = \sum_{i=1}^{\infty} \bar{\mu}(B_i)$. This completes the proof that $\mathcal{B}$ is a σ – algebra and that $\bar{\mu}$ is a measure on $\mathcal{B}$.

Since we really had no choice as to how to extend $\mu$, it is to be expected that the extension is unique. You are asked to supply the details in Exercise 5.3 below.
Exercise 5.3. Let $\mu$, $\bar{\mu}$, $\mathcal{A}$, and $\mathcal{B} := \mathcal{B}(\mu)$ be as in Theorem 5.27. Further suppose that $\mathcal{B}_0 \subset 2^{\Omega}$ is a σ – algebra such that $\mathcal{A} \subset \mathcal{B}_0 \subset \mathcal{B}$ and $\nu : \mathcal{B}_0 \to [0, \mu(\Omega)]$ is a σ – additive measure on $\mathcal{B}_0$ such that $\nu = \mu$ on $\mathcal{A}$. Show that $\nu = \bar{\mu}$ on $\mathcal{B}_0$ as well. (When $\mathcal{B}_0 = \sigma(\mathcal{A})$ this exercise is of course a consequence of Proposition 5.15. It is not necessary to use this information to complete the exercise.)
Corollary 5.28. Suppose that $\mathcal{A} \subset 2^{\Omega}$ is an algebra and $\mu : \mathcal{B}_0 := \sigma(\mathcal{A}) \to [0, \mu(\Omega)]$ is a σ – additive measure. Then for every $B \in \sigma(\mathcal{A})$ and $\varepsilon > 0$;

1. there exists $\mathcal{A}_{\delta} \ni A \subset B \subset C \in \mathcal{A}_{\sigma}$ such that $\mu(C \setminus A) < \varepsilon$ and
2. there exists $A \in \mathcal{A}$ such that $\mu(A \triangle B) < \varepsilon$.

Exercise 5.4. Prove Corollary 5.28 by considering $\bar{\nu}$ where $\nu := \mu|_{\mathcal{A}}$. Hint: you may find Exercise 4.3 useful here.
Theorem 5.29. Suppose that $\mu$ is a σ – finite premeasure on an algebra $\mathcal{A}$. Then
$$\bar{\mu}(B) := \inf\{\mu(C) : B \subset C \in \mathcal{A}_{\sigma}\} \quad \forall\, B \in \sigma(\mathcal{A}) \tag{5.12}$$
defines a measure on $\sigma(\mathcal{A})$ and this measure is the unique extension of $\mu$ on $\mathcal{A}$ to a measure on $\sigma(\mathcal{A})$. Recall that
$$\mu(C) = \sup\{\mu(A) : \mathcal{A} \ni A \subset C\}.$$

Proof. Let $\{\Omega_n\}_{n=1}^{\infty} \subset \mathcal{A}$ be chosen so that $\mu(\Omega_n) < \infty$ for all $n$ and $\Omega_n \uparrow \Omega$ as $n \to \infty$ and let
$$\mu_n(A) := \mu(A \cap \Omega_n) \text{ for all } A \in \mathcal{A}.$$
Each $\mu_n$ is a premeasure (as is easily verified) on $\mathcal{A}$ and hence by Theorem 5.27 each $\mu_n$ has an extension, $\bar{\mu}_n$, to a measure on $\sigma(\mathcal{A})$. Since the measures $\bar{\mu}_n$ are increasing, $\bar{\mu} := \lim_{n \to \infty} \bar{\mu}_n$ is a measure which extends $\mu$.

The proof will be completed by verifying that Eq. (5.12) holds. Let $B \in \sigma(\mathcal{A})$, $B_m = \Omega_m \cap B$ and $\varepsilon > 0$ be given. By Theorem 5.27, there exists $C_m \in \mathcal{A}_{\sigma}$ such that $B_m \subset C_m \subset \Omega_m$ and $\bar{\mu}(C_m \setminus B_m) = \bar{\mu}_m(C_m \setminus B_m) < \varepsilon 2^{-m}$. Then $C := \cup_{m=1}^{\infty} C_m \in \mathcal{A}_{\sigma}$ and
$$\bar{\mu}(C \setminus B) \le \bar{\mu}\left(\cup_{m=1}^{\infty} (C_m \setminus B)\right) \le \sum_{m=1}^{\infty} \bar{\mu}(C_m \setminus B) \le \sum_{m=1}^{\infty} \bar{\mu}(C_m \setminus B_m) < \varepsilon.$$
Thus
$$\bar{\mu}(B) \le \mu(C) = \bar{\mu}(B) + \bar{\mu}(C \setminus B) \le \bar{\mu}(B) + \varepsilon$$
which, since $\varepsilon > 0$ is arbitrary, shows $\bar{\mu}$ satisfies Eq. (5.12). The uniqueness of the extension $\bar{\mu}$ is proved in Exercise 5.11.
The following slight reformulation of Theorem 5.29 can be useful.

Corollary 5.30. Let $\mathcal{A}$ be an algebra of sets and $\{\Omega_m\}_{m=1}^{\infty} \subset \mathcal{A}$ a given sequence of sets such that $\Omega_m \uparrow \Omega$ as $m \to \infty$. Let
$$\mathcal{A}_f := \{A \in \mathcal{A} : A \subset \Omega_m \text{ for some } m \in \mathbb{N}\}.$$
Notice that $\mathcal{A}_f$ is a ring, i.e. closed under differences, intersections and unions and contains the empty set. Further suppose that $\mu : \mathcal{A}_f \to [0, \infty)$ is an additive set function such that $\mu(A_n) \downarrow 0$ for any sequence, $\{A_n\} \subset \mathcal{A}_f$ such that $A_n \downarrow \emptyset$ as $n \to \infty$. Then $\mu$ extends uniquely to a σ – finite measure on $\mathcal{B} := \sigma(\mathcal{A})$.
Proof. Existence. By assumption, $\mu_m := \mu|_{\mathcal{A}_m} : \mathcal{A}_m \to [0, \infty)$ is a premeasure on $(\Omega_m, \mathcal{A}_m)$ and hence by Theorem 5.29 extends to a measure $\mu'_m$ on $(\Omega_m, \sigma(\mathcal{A}_m) = \mathcal{B}_{\Omega_m})$. Let $\bar{\mu}_m(B) := \mu'_m(B \cap \Omega_m)$ for all $B \in \mathcal{B}$. Then $\{\bar{\mu}_m\}_{m=1}^{\infty}$ is an increasing sequence of measures on $(\Omega, \mathcal{B})$ and hence $\bar{\mu} := \lim_{m \to \infty} \bar{\mu}_m$ defines a measure on $(\Omega, \mathcal{B})$ such that $\bar{\mu}|_{\mathcal{A}_f} = \mu$.

Uniqueness. If $\mu_1$ and $\mu_2$ are two such extensions, then $\mu_1(\Omega_m \cap B) = \mu_2(\Omega_m \cap B)$ for all $B \in \mathcal{A}$ and therefore by Proposition 5.15 or Exercise 5.11 we know that $\mu_1(\Omega_m \cap B) = \mu_2(\Omega_m \cap B)$ for all $B \in \mathcal{B}$. We may now let $m \to \infty$ to see that in fact $\mu_1(B) = \mu_2(B)$ for all $B \in \mathcal{B}$, i.e. $\mu_1 = \mu_2$.
5.4 Radon Measures on ℝ

We say that a measure, $\mu$, on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ is a Radon measure if $\mu([a, b]) < \infty$ for all $-\infty < a < b < \infty$. In this section we will give a characterization of all Radon measures on ℝ. We first need the following general result characterizing premeasures on an algebra generated by a semi-algebra.
Proposition 5.31. Suppose that $\mathcal{S} \subset 2^{\Omega}$ is a semi-algebra, $\mathcal{A} = \mathcal{A}(\mathcal{S})$ and $\mu : \mathcal{A} \to [0, \infty]$ is a finitely additive measure. Then $\mu$ is a premeasure on $\mathcal{A}$ iff $\mu$ is countably sub-additive on $\mathcal{S}$.

Proof. Clearly if $\mu$ is a premeasure on $\mathcal{A}$ then $\mu$ is σ – additive and hence sub-additive on $\mathcal{S}$. Because of Proposition 4.2, to prove the converse it suffices to show that the sub-additivity of $\mu$ on $\mathcal{S}$ implies the sub-additivity of $\mu$ on $\mathcal{A}$.

So suppose $A = \coprod_{n=1}^{\infty} A_n \in \mathcal{A}$ with each $A_n \in \mathcal{A}$. By Proposition 3.25 we may write $A = \coprod_{j=1}^{k} E_j$ and $A_n = \coprod_{i=1}^{N_n} E_{n,i}$ with $E_j, E_{n,i} \in \mathcal{S}$. Intersecting the identity, $A = \coprod_{n=1}^{\infty} A_n$, with $E_j$ implies
$$E_j = A \cap E_j = \coprod_{n=1}^{\infty} A_n \cap E_j = \coprod_{n=1}^{\infty} \coprod_{i=1}^{N_n} E_{n,i} \cap E_j.$$
By the assumed sub-additivity of $\mu$ on $\mathcal{S}$,
$$\mu(E_j) \le \sum_{n=1}^{\infty} \sum_{i=1}^{N_n} \mu(E_{n,i} \cap E_j).$$
Summing this equation on $j$ and using the finite additivity of $\mu$ shows
$$\mu(A) = \sum_{j=1}^{k} \mu(E_j) \le \sum_{j=1}^{k} \sum_{n=1}^{\infty} \sum_{i=1}^{N_n} \mu(E_{n,i} \cap E_j) = \sum_{n=1}^{\infty} \sum_{i=1}^{N_n} \sum_{j=1}^{k} \mu(E_{n,i} \cap E_j) = \sum_{n=1}^{\infty} \sum_{i=1}^{N_n} \mu(E_{n,i}) = \sum_{n=1}^{\infty} \mu(A_n).$$
Suppose now that $\mu$ is a Radon measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ and $F : \mathbb{R} \to \mathbb{R}$ is chosen so that
$$\mu((a, b]) = F(b) - F(a) \text{ for all } -\infty < a \le b < \infty. \tag{5.13}$$
For example if $\mu(\mathbb{R}) < \infty$ we can take $F(x) = \mu((-\infty, x])$ while if $\mu(\mathbb{R}) = \infty$ we might take
$$F(x) = \begin{cases} \mu((0, x]) & \text{if } x \ge 0 \\ -\mu((x, 0]) & \text{if } x \le 0 \end{cases}.$$
The function $F$ is uniquely determined modulo translation by a constant.
Lemma 5.32. If is a Radon measure on (1, B
R
) and F : 1 1 is chosen
so that ((a, b]) = F (b) F (a) , then F is increasing and right continuous.
Proof. The function $F$ is increasing by the monotonicity of $\mu$. To see that $F$ is right continuous, let $b \in \mathbb{R}$ and choose $a \in (-\infty, b)$ and any sequence $\{b_n\}_{n=1}^{\infty} \subset (b, \infty)$ such that $b_n \downarrow b$ as $n \to \infty$. Since $\mu((a, b_1]) < \infty$ and $(a, b_n] \downarrow (a, b]$ as $n \to \infty$, it follows that
$$F(b_n) - F(a) = \mu((a, b_n]) \downarrow \mu((a,b]) = F(b) - F(a).$$
Since $\{b_n\}_{n=1}^{\infty}$ was an arbitrary sequence such that $b_n \downarrow b$, we have shown $\lim_{y \downarrow b} F(y) = F(b)$.
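The correspondence $\mu \mapsto F$ of Lemma 5.32 can be checked numerically on a small example. The following Python sketch is illustrative only (the discrete measure, its atoms, and the helper names `mu` and `F` are my own choices, not from the text): it builds $F(x) = \mu((-\infty,x])$ for a finite sum of point masses and verifies that $\mu((a,b]) = F(b) - F(a)$ and that $F$ is right (but not left) continuous at an atom.

```python
# A finite measure mu = sum of point masses w * delta_x (illustrative choice).
atoms = {-1.0: 0.5, 0.0: 0.25, 2.0: 0.25}

def mu(a, b):
    """mu((a, b]) for the discrete measure above."""
    return sum(w for x, w in atoms.items() if a < x <= b)

def F(x):
    """Distribution function F(x) = mu((-inf, x])."""
    return sum(w for t, w in atoms.items() if t <= x)

# mu((a, b]) = F(b) - F(a) for several choices of a < b, as in Eq. (5.13).
for a, b in [(-2, 0), (-1, 2), (0.5, 3)]:
    assert abs(mu(a, b) - (F(b) - F(a))) < 1e-12

# Right continuity at the atom x = 0: F(0+) = F(0), while the left limit
# F(0-) differs from F(0) by the mass mu({0}) = 0.25.
assert abs(F(0 + 1e-9) - F(0)) < 1e-8
assert abs(F(0) - F(0 - 1e-9) - 0.25) < 1e-8
```

The jump of $F$ at an atom is exactly the mass $\mu(\{x\})$, which is why only right continuity survives for half-open intervals $(a,b]$.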
The key result of this section is the converse to this lemma.

Theorem 5.33. Suppose $F : \mathbb{R} \to \mathbb{R}$ is a right continuous increasing function. Then there exists a unique Radon measure, $\mu = \mu_F$, on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ such that Eq. (5.13) holds.
Proof. Let $\mathcal{S} := \{(a,b] \cap \mathbb{R} : -\infty \le a \le b \le \infty\}$, and let $\mathcal{A} = \mathcal{A}(\mathcal{S})$ consist of those sets, $A \subset \mathbb{R}$, which may be written as finite disjoint unions of sets from $\mathcal{S}$ as in Example 3.26. Recall that $\mathcal{B}_{\mathbb{R}} = \sigma(\mathcal{A}) = \sigma(\mathcal{S})$. Further define $F(\pm\infty) := \lim_{x \to \pm\infty} F(x)$ and let $\mu = \mu_F$ be the finitely additive measure on $(\mathbb{R}, \mathcal{A})$ described in Proposition 4.8 and Remark 4.9. To finish the proof it suffices by Theorem 5.29 to show that $\mu$ is a premeasure on $\mathcal{A} = \mathcal{A}(\mathcal{S})$. So in light of Proposition 5.31, it suffices to show $\mu$ is sub-additive on $\mathcal{S}$, i.e. we must show
$$\mu(J) \le \sum_{n=1}^{\infty} \mu(J_n), \tag{5.14}$$
where $J = \coprod_{n=1}^{\infty} J_n$ with $J = (a,b] \cap \mathbb{R}$ and $J_n = (a_n, b_n] \cap \mathbb{R}$. Recall from Proposition 4.2 that the finite additivity of $\mu$ implies
$$\sum_{n=1}^{\infty} \mu(J_n) \le \mu(J). \tag{5.15}$$
We begin with the special case where $-\infty < a < b < \infty$. Our proof will be by "continuous induction." The strategy is to show $a \in \Lambda$ where
$$\Lambda := \left\{ \alpha \in [a,b] : \mu(J \cap (\alpha, b]) \le \sum_{n=1}^{\infty} \mu(J_n \cap (\alpha, b]) \right\}. \tag{5.16}$$
As $b \in J$, there exists a $k$ such that $b \in J_k$ and hence $(a_k, b_k] = (a_k, b]$ for this $k$. It now easily follows that $a_k \in \Lambda$, so that $\Lambda$ is not empty. To finish the proof we are going to show $\bar{a} := \inf \Lambda \in \Lambda$ and that $\bar{a} = a$.

If $\bar{a} \notin \Lambda$, there would exist $\alpha_m \in \Lambda$ such that $\alpha_m \downarrow \bar{a}$, i.e.
$$\mu(J \cap (\alpha_m, b]) \le \sum_{n=1}^{\infty} \mu(J_n \cap (\alpha_m, b]). \tag{5.17}$$
Since $\mu(J_n \cap (\alpha_m, b]) \le \mu(J_n)$ and $\sum_{n=1}^{\infty} \mu(J_n) \le \mu(J) < \infty$ by Eq. (5.15), we may use the right continuity of $F$ and the dominated convergence theorem for sums in order to pass to the limit as $m \to \infty$ in Eq. (5.17) to learn,
$$\mu(J \cap (\bar{a}, b]) \le \sum_{n=1}^{\infty} \mu(J_n \cap (\bar{a}, b]).$$
This shows $\bar{a} \in \Lambda$, which contradicts the assumption that $\bar{a} \notin \Lambda$.

If $\bar{a} > a$, then $\bar{a} \in J_l = (a_l, b_l]$ for some $l$. Letting $\alpha = a_l < \bar{a}$, we have,
$$\begin{aligned}
\mu(J \cap (\alpha, b]) &= \mu(J \cap (\alpha, \bar{a}]) + \mu(J \cap (\bar{a}, b]) \\
&\le \mu(J_l \cap (\alpha, \bar{a}]) + \sum_{n=1}^{\infty} \mu(J_n \cap (\bar{a}, b]) \\
&= \mu(J_l \cap (\alpha, \bar{a}]) + \mu(J_l \cap (\bar{a}, b]) + \sum_{n \ne l} \mu(J_n \cap (\bar{a}, b]) \\
&= \mu(J_l \cap (\alpha, b]) + \sum_{n \ne l} \mu(J_n \cap (\bar{a}, b]) \\
&\le \sum_{n=1}^{\infty} \mu(J_n \cap (\alpha, b]).
\end{aligned}$$
This shows $\alpha \in \Lambda$ with $\alpha < \bar{a}$, which violates the definition of $\bar{a}$. Thus we must conclude that $\bar{a} = a$.
The hard work is now done but we still have to check the cases where $a = -\infty$ or $b = \infty$. For example, suppose that $b = \infty$ so that $J = (a, \infty) = \coprod_{n=1}^{\infty} J_n$ with $J_n = (a_n, b_n] \cap \mathbb{R}$. Then
$$I_M := (a, M] = J \cap I_M = \coprod_{n=1}^{\infty} J_n \cap I_M,$$
and so by what we have already proved,
$$F(M) - F(a) = \mu(I_M) \le \sum_{n=1}^{\infty} \mu(J_n \cap I_M) \le \sum_{n=1}^{\infty} \mu(J_n).$$
Now let $M \to \infty$ in this last inequality to find that
$$\mu((a, \infty)) = F(\infty) - F(a) \le \sum_{n=1}^{\infty} \mu(J_n).$$
The other cases, where $a = -\infty$ and $b \in \mathbb{R}$, and $a = -\infty$ and $b = \infty$, are handled similarly.
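Theorem 5.33 can be illustrated concretely. The Python sketch below is an illustrative assumption of mine (the function `F` with a unit jump at $0$ and the helper `mu_F` are not from the text): it takes a right-continuous increasing $F$, sets $\mu_F((a,b]) := F(b) - F(a)$ as in Eq. (5.13), and checks finite additivity over a partition plus the point mass created by the jump.

```python
import math

def F(x):
    """A right-continuous increasing function: x plus a unit jump at 0."""
    return x + (1.0 if x >= 0 else 0.0)

def mu_F(a, b):
    """mu_F((a, b]) := F(b) - F(a), as in Eq. (5.13)."""
    return F(b) - F(a)

# Finite additivity: partition (-1, 1] at -1 < -0.5 < 0 < 0.25 < 1.
pts = [-1, -0.5, 0, 0.25, 1]
total = sum(mu_F(pts[i], pts[i + 1]) for i in range(len(pts) - 1))
assert math.isclose(total, mu_F(-1, 1))

# The jump of F at 0 appears as a point mass: mu_F((-eps, 0]) -> 1 as eps -> 0.
assert math.isclose(mu_F(-1e-9, 0.0), 1.0, abs_tol=1e-8)
```

Because the intervals are half-open on the left, the jump of $F$ at $0$ is captured by any interval of the form $(-\varepsilon, 0]$, consistent with right continuity.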
5.4.1 Lebesgue Measure

If $F(x) = x$ for all $x \in \mathbb{R}$, we denote $\mu_F$ by $m$ and call $m$ Lebesgue measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.
Theorem 5.34. Lebesgue measure $m$ is invariant under translations, i.e. for $B \in \mathcal{B}_{\mathbb{R}}$ and $x \in \mathbb{R}$,
$$m(x + B) = m(B). \tag{5.18}$$
Lebesgue measure, $m$, is the unique measure on $\mathcal{B}_{\mathbb{R}}$ such that $m((0,1]) = 1$ and Eq. (5.18) holds for $B \in \mathcal{B}_{\mathbb{R}}$ and $x \in \mathbb{R}$. Moreover, $m$ has the scaling property
$$m(\lambda B) = |\lambda|\, m(B), \tag{5.19}$$
where $\lambda \in \mathbb{R}$, $B \in \mathcal{B}_{\mathbb{R}}$ and $\lambda B := \{\lambda x : x \in B\}$.
Proof. Let $m_x(B) := m(x + B)$; then one easily shows that $m_x$ is a measure on $\mathcal{B}_{\mathbb{R}}$ such that $m_x((a,b]) = b - a$ for all $a < b$. Therefore, $m_x = m$ by the uniqueness assertion in Exercise 5.11. For the converse, suppose that $m$ is translation invariant and $m((0,1]) = 1$. Given $n \in \mathbb{N}$, we have
$$(0,1] = \coprod_{k=1}^{n} \left(\frac{k-1}{n}, \frac{k}{n}\right] = \coprod_{k=1}^{n} \left(\frac{k-1}{n} + \left(0, \frac{1}{n}\right]\right).$$
Therefore,
$$1 = m((0,1]) = \sum_{k=1}^{n} m\left(\frac{k-1}{n} + \left(0, \frac{1}{n}\right]\right) = \sum_{k=1}^{n} m\left(\left(0, \frac{1}{n}\right]\right) = n \cdot m\left(\left(0, \frac{1}{n}\right]\right).$$
That is to say, $m((0, \frac{1}{n}]) = 1/n$. Similarly, $m((0, \frac{l}{n}]) = l/n$ for all $l, n \in \mathbb{N}$ and therefore, by the translation invariance of $m$,
$$m((a,b]) = b - a \text{ for all } a, b \in \mathbb{Q} \text{ with } a < b.$$
Finally for $a, b \in \mathbb{R}$ such that $a < b$, choose $a_n, b_n \in \mathbb{Q}$ such that $b_n \downarrow b$ and $a_n \uparrow a$; then $(a_n, b_n] \downarrow (a,b]$ and thus
$$m((a,b]) = \lim_{n \to \infty} m((a_n, b_n]) = \lim_{n \to \infty} (b_n - a_n) = b - a,$$
i.e. $m$ is Lebesgue measure. To prove Eq. (5.19) we may assume that $\lambda \ne 0$, since the case $\lambda = 0$ is trivial to prove. Now let $m_\lambda(B) := |\lambda|^{-1} m(\lambda B)$. It is easily checked that $m_\lambda$ is again a measure on $\mathcal{B}_{\mathbb{R}}$ which satisfies
$$m_\lambda((a,b]) = \lambda^{-1} m((\lambda a, \lambda b]) = \lambda^{-1}(\lambda b - \lambda a) = b - a$$
if $\lambda > 0$, and
$$m_\lambda((a,b]) = |\lambda|^{-1} m([\lambda b, \lambda a)) = -|\lambda|^{-1}(\lambda b - \lambda a) = b - a$$
if $\lambda < 0$. Hence $m_\lambda = m$.
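For finite disjoint unions of half-open intervals, both conclusions of Theorem 5.34 reduce to arithmetic on interval lengths, which can be checked mechanically. The Python sketch below is illustrative (the helper names `length`, `translate`, `scale` and the sample set `B` are my own assumptions): it verifies Eq. (5.18) and Eq. (5.19) on a concrete set.

```python
import math

def length(intervals):
    """Lebesgue measure of a finite disjoint union of intervals (a, b]."""
    return sum(b - a for a, b in intervals)

def translate(intervals, x):
    """x + B, interval by interval."""
    return [(a + x, b + x) for a, b in intervals]

def scale(intervals, lam):
    """lam * B for lam != 0; for lam < 0 the endpoints swap order."""
    return [(min(lam * a, lam * b), max(lam * a, lam * b)) for a, b in intervals]

B = [(0.0, 1.0), (2.5, 4.0)]   # an illustrative disjoint union of intervals

# Translation invariance, Eq. (5.18): m(x + B) = m(B).
assert math.isclose(length(translate(B, 7.3)), length(B))

# Scaling, Eq. (5.19): m(lam * B) = |lam| * m(B), checked for lam = -2.
assert math.isclose(length(scale(B, -2.0)), 2.0 * length(B))
```

Of course the content of the theorem is that these identities extend from interval unions to all Borel sets via the uniqueness assertion of Exercise 5.11.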
5.5 A Discrete Kolmogorov's Extension Theorem

For this section, let $S$ be a finite or countable set (we refer to $S$ as state space), $\Omega := S^{\infty} := S^{\mathbb{N}}$ (think of $\mathbb{N}$ as time and $\Omega$ as path space),
$$\mathcal{A}_n := \{B \times \Omega : B \subset S^n\} \text{ for all } n \in \mathbb{N},$$
$\mathcal{A} := \cup_{n=1}^{\infty} \mathcal{A}_n$, and $\mathcal{B} := \sigma(\mathcal{A})$. We call the elements, $A \in \mathcal{A}$, the cylinder subsets of $\Omega$. Notice that $A \subset \Omega$ is a cylinder set iff there exists $n \in \mathbb{N}$ and $B \subset S^n$ such that
$$A = B \times \Omega := \{\omega \in \Omega : (\omega_1, \ldots, \omega_n) \in B\}.$$
Also observe that we may write $A$ as $A = B' \times \Omega$ where $B' = B \times S^k \subset S^{n+k}$ for any $k \ge 0$.
Exercise 5.5. Show;

1. $\mathcal{A}_n$ is a $\sigma$-algebra for each $n \in \mathbb{N}$,
2. $\mathcal{A}_n \subset \mathcal{A}_{n+1}$ for all $n$, and
3. $\mathcal{A} \subset 2^{\Omega}$ is an algebra of subsets of $\Omega$. (In fact, you might show that $\mathcal{A} = \cup_{n=1}^{\infty} \mathcal{A}_n$ is an algebra whenever $\{\mathcal{A}_n\}_{n=1}^{\infty}$ is an increasing sequence of algebras.)
Lemma 5.35 (Baby Tychonov Theorem). Suppose $\{C_n\}_{n=1}^{\infty} \subset \mathcal{A}$ is a decreasing sequence of non-empty cylinder sets. Further assume there exist $N_n \in \mathbb{N}$ and $B_n \subset\subset S^{N_n}$ such that $C_n = B_n \times \Omega$. (This last assumption is vacuous when $S$ is a finite set. Recall that we write $\Lambda \subset\subset A$ to indicate that $\Lambda$ is a finite subset of $A$.) Then $\cap_{n=1}^{\infty} C_n \ne \emptyset$.
Proof. Since $C_{n+1} \subset C_n$, if $N_n > N_{n+1}$ we would have $B_{n+1} \times S^{N_n - N_{n+1}} \subset B_n$. If $S$ is an infinite set this would imply $B_n$ is an infinite set, and hence we must have $N_{n+1} \ge N_n$ for all $n$ when $\#(S) = \infty$. On the other hand, if $S$ is a finite set, we can always replace $B_{n+1}$ by $B_{n+1} \times S^k$ for some appropriate $k$ and arrange it so that $N_{n+1} \ge N_n$ for all $n$. So from now on we assume that $N_{n+1} \ge N_n$.

Case 1. $\lim_{n \to \infty} N_n < \infty$, in which case there exists some $N \in \mathbb{N}$ such that $N_n = N$ for all large $n$. Thus for large $n$, $C_n = B_n \times \Omega$ with $B_n \subset\subset S^N$ and $B_{n+1} \subset B_n$, and hence $\#(B_n) \downarrow$ as $n \to \infty$. By assumption, $\lim_{n \to \infty} \#(B_n) \ne 0$ and therefore $\#(B_n) = k > 0$ for all $n$ large. It then follows that there exists $n_0 \in \mathbb{N}$ such that $B_n = B_{n_0}$ for all $n \ge n_0$. Therefore $\cap_{n=1}^{\infty} C_n = B_{n_0} \times \Omega \ne \emptyset$.
Case 2. $\lim_{n \to \infty} N_n = \infty$. By assumption, there exists $\omega(n) = (\omega_1(n), \omega_2(n), \ldots) \in \Omega$ such that $\omega(n) \in C_n$ for all $n$. Moreover, since $\omega(n) \in C_n \subset C_k$ for all $k \le n$, it follows that
$$(\omega_1(n), \omega_2(n), \ldots, \omega_{N_k}(n)) \in B_k \text{ for all } n \ge k, \tag{5.20}$$
and as $B_k$ is a finite set, $\{\omega_i(n)\}_{n=1}^{\infty}$ must take values in a finite set for all $1 \le i \le N_k$. As $N_k \to \infty$ as $k \to \infty$, it follows that $\{\omega_i(n)\}_{n=1}^{\infty}$ takes values in a finite set for all $i \in \mathbb{N}$.

Using this observation, we may find $s_1 \in S$ and an infinite subset, $\Gamma_1 \subset \mathbb{N}$, such that $\omega_1(n) = s_1$ for all $n \in \Gamma_1$. Similarly, there exist $s_2 \in S$ and an infinite set, $\Gamma_2 \subset \Gamma_1$, such that $\omega_2(n) = s_2$ for all $n \in \Gamma_2$. Continuing this procedure inductively, there exist (for all $j \in \mathbb{N}$) infinite subsets, $\Gamma_j \subset \mathbb{N}$, and points $s_j \in S$ such that $\Gamma_1 \supset \Gamma_2 \supset \Gamma_3 \supset \ldots$ and $\omega_j(n) = s_j$ for all $n \in \Gamma_j$.

We are now going to complete the proof by showing $s := (s_1, s_2, \ldots) \in \cap_{n=1}^{\infty} C_n$. By the construction above, for all $N \in \mathbb{N}$ we have
$$(\omega_1(n), \ldots, \omega_N(n)) = (s_1, \ldots, s_N) \text{ for all } n \in \Gamma_N.$$
Taking $N = N_k$ and $n \in \Gamma_{N_k}$ with $n \ge k$, we learn from Eq. (5.20) that
$$(s_1, \ldots, s_{N_k}) = (\omega_1(n), \ldots, \omega_{N_k}(n)) \in B_k.$$
But this is equivalent to showing $s \in C_k$. Since $k \in \mathbb{N}$ was arbitrary, it follows that $s \in \cap_{n=1}^{\infty} C_n$.
Let $\bar{S} := S$ if $S$ is a finite set and $\bar{S} := S \cup \{\infty\}$ if $S$ is an infinite set. Here, $\infty$ is simply another point not in $S$ which we call infinity. Let $\{x_n\}_{n=1}^{\infty} \subset \bar{S}$ be a sequence; then we say $\lim_{n \to \infty} x_n = \infty$ if for every $A \subset\subset S$, $x_n \notin A$ for almost all $n$, and we say that $\lim_{n \to \infty} x_n = s \in S$ if $x_n = s$ for almost all $n$. For example this is the usual notion of convergence for $S = \{\frac{1}{n} : n \in \mathbb{N}\}$ and $\bar{S} = S \cup \{0\} \subset [0,1]$, where $0$ is playing the role of infinity here. Observe that either $\lim_{n \to \infty} x_n = \infty$ or there exists a finite subset $F \subset S$ such that $x_n \in F$ infinitely often. Moreover, there must then be some point, $s \in F$, such that $x_n = s$ infinitely often. Thus if we let $n_1 < n_2 < \ldots$ in $\mathbb{N}$ be chosen such that $x_{n_k} = s$ for all $k$, then $\lim_{k \to \infty} x_{n_k} = s$. Thus we have shown that every sequence in $\bar{S}$ has a convergent subsequence.
Lemma 5.36 (Baby Tychonov Theorem I.). Let $\bar{\Omega} := \bar{S}^{\mathbb{N}}$ and $\{\omega(n)\}_{n=1}^{\infty}$ be a sequence in $\bar{\Omega}$. Then there is a subsequence, $\{n_k\}_{k=1}^{\infty}$ of $\{n\}_{n=1}^{\infty}$, such that $\lim_{k \to \infty} \omega(n_k)$ exists in $\bar{\Omega}$, by which we mean $\lim_{k \to \infty} \omega_i(n_k)$ exists in $\bar{S}$ for all $i \in \mathbb{N}$.
Proof. This follows by the usual Cantor diagonalization argument. Indeed, let $\{n_k^1\}_{k=1}^{\infty} \subset \{n\}_{n=1}^{\infty}$ be chosen so that $\lim_{k \to \infty} \omega_1(n_k^1) = s_1 \in \bar{S}$ exists. Then choose $\{n_k^2\}_{k=1}^{\infty} \subset \{n_k^1\}_{k=1}^{\infty}$ so that $\lim_{k \to \infty} \omega_2(n_k^2) = s_2 \in \bar{S}$ exists. Continue on in this way to inductively choose
$$\{n_k^1\}_{k=1}^{\infty} \supset \{n_k^2\}_{k=1}^{\infty} \supset \cdots \supset \{n_k^l\}_{k=1}^{\infty} \supset \cdots$$
such that $\lim_{k \to \infty} \omega_l(n_k^l) = s_l \in \bar{S}$. The desired subsequence, $\{n_k\}_{k=1}^{\infty}$ of $\{n\}_{n=1}^{\infty}$, may now be defined by $n_k := n_k^k$.
Corollary 5.37 (Baby Tychonov Theorem II.). Suppose that $\{F_n\}_{n=1}^{\infty} \subset \bar{\Omega}$ is a decreasing sequence of non-empty sets which are closed under taking sequential limits. Then $\cap_{n=1}^{\infty} F_n \ne \emptyset$.
Proof. Since $F_n \ne \emptyset$, there exists $\omega(n) \in F_n$ for all $n$. Using Lemma 5.36, there exists $\{n_k\}_{k=1}^{\infty} \subset \{n\}_{n=1}^{\infty}$ such that $\omega := \lim_{k \to \infty} \omega(n_k)$ exists in $\bar{\Omega}$. Since $\omega(n_k) \in F_n$ for all $n_k \ge n$, it follows that $\omega \in F_n$ for all $n$, i.e. $\omega \in \cap_{n=1}^{\infty} F_n$ and hence $\cap_{n=1}^{\infty} F_n \ne \emptyset$.
Example 5.38. Suppose that $1 \le N_1 < N_2 < N_3 < \ldots$, $F_n = K_n \times \Omega$ with $K_n \subset\subset S^{N_n}$, and that $\{F_n\}_{n=1}^{\infty}$ is a decreasing sequence of non-empty sets. Then $\cap_{n=1}^{\infty} F_n \ne \emptyset$. To prove this, let $\bar{F}_n := K_n \times \bar{\Omega}$, in which case the $\bar{F}_n$ are non-empty sets closed under taking limits. Therefore by Corollary 5.37, $\cap_n \bar{F}_n \ne \emptyset$. This completes the proof since it is easy to check that $\cap_{n=1}^{\infty} F_n = \cap_n \bar{F}_n \ne \emptyset$.
Corollary 5.39. If $S$ is a finite set and $\{A_n\}_{n=1}^{\infty} \subset \mathcal{A}$ is a decreasing sequence of non-empty cylinder sets, then $\cap_{n=1}^{\infty} A_n \ne \emptyset$.

Proof. This follows directly from Example 5.38 since necessarily, $A_n = K_n \times \Omega$ for some $K_n \subset\subset S^{N_n}$.
Theorem 5.40 (Kolmogorov's Extension Theorem I.). Let us continue the notation above with the further assumption that $S$ is a finite set. Then every finitely additive probability measure, $P : \mathcal{A} \to [0,1]$, has a unique extension to a probability measure on $\mathcal{B} := \sigma(\mathcal{A})$.

Proof. From Theorem 5.27, it suffices to show $\lim_{n \to \infty} P(A_n) = 0$ whenever $\{A_n\}_{n=1}^{\infty} \subset \mathcal{A}$ with $A_n \downarrow \emptyset$. However, by Lemma 5.35 with $C_n = A_n$, if $A_n \in \mathcal{A}$ and $A_n \downarrow \emptyset$, we must have that $A_n = \emptyset$ for a.a. $n$, and in particular $P(A_n) = 0$ for a.a. $n$. This certainly implies $\lim_{n \to \infty} P(A_n) = 0$.
For the next three exercises, suppose that $S$ is a finite set and continue the notation from above. Further suppose that $P : \sigma(\mathcal{A}) \to [0,1]$ is a probability measure and for $n \in \mathbb{N}$ and $(s_1, \ldots, s_n) \in S^n$, let
$$p_n(s_1, \ldots, s_n) := P(\{\omega \in \Omega : \omega_1 = s_1, \ldots, \omega_n = s_n\}). \tag{5.21}$$
Exercise 5.6 (Consistency Conditions). If $p_n$ is defined as above, show:

1. $\sum_{s \in S} p_1(s) = 1$ and
2. for all $n \in \mathbb{N}$ and $(s_1, \ldots, s_n) \in S^n$,
$$p_n(s_1, \ldots, s_n) = \sum_{s \in S} p_{n+1}(s_1, \ldots, s_n, s).$$
Exercise 5.7 (Converse to 5.6). Suppose for each $n \in \mathbb{N}$ we are given functions, $p_n : S^n \to [0,1]$, such that the consistency conditions in Exercise 5.6 hold. Then there exists a unique probability measure, $P$ on $\sigma(\mathcal{A})$, such that Eq. (5.21) holds for all $n \in \mathbb{N}$ and $(s_1, \ldots, s_n) \in S^n$.
Example 5.41 (Existence of iid simple R.V.'s). Suppose now that $q : S \to [0,1]$ is a function such that $\sum_{s \in S} q(s) = 1$. Then there exists a unique probability measure $P$ on $\sigma(\mathcal{A})$ such that, for all $n \in \mathbb{N}$ and $(s_1, \ldots, s_n) \in S^n$, we have
$$P(\{\omega \in \Omega : \omega_1 = s_1, \ldots, \omega_n = s_n\}) = q(s_1) \cdots q(s_n).$$
This is a special case of Exercise 5.7 with $p_n(s_1, \ldots, s_n) := q(s_1) \cdots q(s_n)$.
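The consistency conditions of Exercise 5.6 can be verified mechanically for the iid cylinder probabilities of Example 5.41 on a small state space. The Python sketch below uses an illustrative two-point state space and weight function `q` of my own choosing:

```python
from itertools import product

S = ["H", "T"]                 # an illustrative two-point state space
q = {"H": 0.3, "T": 0.7}       # q(s) >= 0 with sum over s of q(s) = 1

def p(seq):
    """p_n(s_1, ..., s_n) = q(s_1) ... q(s_n), as in Example 5.41."""
    out = 1.0
    for s in seq:
        out *= q[s]
    return out

# Condition 1 of Exercise 5.6: sum over s in S of p_1(s) equals 1.
assert abs(sum(p((s,)) for s in S) - 1.0) < 1e-12

# Condition 2: p_n(s_1,...,s_n) = sum over s of p_{n+1}(s_1,...,s_n,s).
for seq in product(S, repeat=3):
    assert abs(p(seq) - sum(p(seq + (s,)) for s in S)) < 1e-12
```

The second loop is exactly the tower of marginals that Exercise 5.7 uses to glue the $p_n$ into a single measure $P$ on $\sigma(\mathcal{A})$.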
Theorem 5.42 (Kolmogorov's Extension Theorem II). Suppose now that $S$ is a countably infinite set and $P : \mathcal{A} \to [0,1]$ is a finitely additive measure such that $P|_{\mathcal{A}_n}$ is a $\sigma$-additive measure for each $n \in \mathbb{N}$. Then $P$ extends uniquely to a probability measure on $\mathcal{B} := \sigma(\mathcal{A})$.

Proof. From Theorem 5.27 it suffices to show: if $\{A_m\}_{m=1}^{\infty} \subset \mathcal{A}$ is a decreasing sequence of subsets such that $\varepsilon := \inf_m P(A_m) > 0$, then $\cap_{m=1}^{\infty} A_m \ne \emptyset$. You are asked to verify this property of $P$ in the next couple of exercises.

For the next couple of exercises the hypothesis of Theorem 5.42 is to be assumed.
Exercise 5.8. Suppose $n \in \mathbb{N}$, $A \in \mathcal{A}_n$, and $\varepsilon > 0$ are given. Show there exists $F \in \mathcal{A}_n$ such that $F \subset A$, $F = K \times \Omega$ with $K \subset\subset S^n$, and $P(A \setminus F) < \varepsilon$.
Exercise 5.9. Let $\{A_m\}_{m=1}^{\infty} \subset \mathcal{A}$ be a decreasing sequence of subsets such that $\varepsilon := \inf_m P(A_m) > 0$. Using Exercise 5.8, choose $F_m = K_m \times \Omega \subset A_m$ with $K_m \subset\subset S^{N_m}$ and $P(A_m \setminus F_m) \le \varepsilon / 2^{m+1}$. Further define $C_m := F_1 \cap \cdots \cap F_m$ for each $m$. Show;

1. $A_m \setminus C_m \subset (A_1 \setminus F_1) \cup (A_2 \setminus F_2) \cup \cdots \cup (A_m \setminus F_m)$, and use this to conclude that $P(A_m \setminus C_m) \le \varepsilon/2$.
2. Conclude $C_m$ is not empty for each $m$.
3. Use Lemma 5.35 to conclude that $\emptyset \ne \cap_{m=1}^{\infty} C_m \subset \cap_{m=1}^{\infty} A_m$.
Exercise 5.10. Convince yourself that the results of Exercises 5.6 and 5.7 are valid when $S$ is a countable set. (See Example 4.6.)
In summary, the main result of this section states: to any sequence of functions, $p_n : S^n \to [0,1]$, such that $\sum_{\lambda \in S^n} p_n(\lambda) = 1$ and $\sum_{s \in S} p_{n+1}(\lambda, s) = p_n(\lambda)$ for all $n$ and $\lambda \in S^n$, there exists a unique probability measure, $P$, on $\mathcal{B} := \sigma(\mathcal{A})$ such that
$$P(B \times \Omega) = \sum_{\lambda \in B} p_n(\lambda) \quad \text{for all } B \subset S^n \text{ and } n \in \mathbb{N}.$$
Example 5.43 (Markov Chain Probabilities). Let $S$ be a finite or at most countable state space and $p : S \times S \to [0,1]$ be a Markov kernel, i.e.
$$\sum_{y \in S} p(x, y) = 1 \text{ for all } x \in S. \tag{5.22}$$
Also let $\pi : S \to [0,1]$ be a probability function, i.e. $\sum_{x \in S} \pi(x) = 1$. We now take
$$\Omega := S^{\mathbb{N}_0} = \{\omega = (s_0, s_1, \ldots) : s_j \in S\}$$
and let $X_n : \Omega \to S$ be given by
$$X_n(s_0, s_1, \ldots) = s_n \text{ for all } n \in \mathbb{N}_0.$$
Then there exists a unique probability measure, $P_\pi$, on $\sigma(\mathcal{A})$ such that
$$P_\pi(X_0 = x_0, \ldots, X_n = x_n) = \pi(x_0)\, p(x_0, x_1) \cdots p(x_{n-1}, x_n)$$
for all $n \in \mathbb{N}_0$ and $x_0, x_1, \ldots, x_n \in S$. To see such a measure exists, we need only verify that
$$p_n(x_0, \ldots, x_n) := \pi(x_0)\, p(x_0, x_1) \cdots p(x_{n-1}, x_n)$$
verifies the hypothesis of Exercise 5.6, taking into account a shift of the $n$-index.
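The verification asked for at the end of Example 5.43 amounts to summing out the last state using Eq. (5.22). This can be checked numerically; the Python sketch below uses an illustrative two-state kernel and initial distribution of my own choosing (not from the text):

```python
from itertools import product

# Illustrative two-state Markov kernel p and initial distribution pi.
pi = {"a": 0.5, "b": 0.5}
p = {("a", "a"): 0.9, ("a", "b"): 0.1,
     ("b", "a"): 0.2, ("b", "b"): 0.8}

def p_n(path):
    """p_n(x_0,...,x_n) = pi(x_0) p(x_0,x_1) ... p(x_{n-1},x_n)."""
    prob = pi[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= p[(x, y)]
    return prob

# Markov kernel property, Eq. (5.22): each row of p sums to one.
for x in pi:
    assert abs(sum(p[(x, y)] for y in pi) - 1.0) < 1e-12

# Consistency: summing out the last state recovers the shorter cylinder,
# which is the (shifted) hypothesis of Exercise 5.6.
for path in product(pi, repeat=3):
    assert abs(p_n(path) - sum(p_n(path + (y,)) for y in pi)) < 1e-12
```

With the consistency loop passing, Exercise 5.7 supplies the measure $P_\pi$ on path space.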
5.6 Appendix: Regularity and Uniqueness Results*

The goal of this appendix is to approximate measurable sets from inside and outside by classes of sets which are relatively easy to understand. Our first few results are already contained in Carathéodory's existence-of-measures proof. Nevertheless, we state these results again and give another somewhat independent proof.
Theorem 5.44 (Finite Regularity Result). Suppose $\mathcal{A} \subset 2^{\Omega}$ is an algebra, $\mathcal{B} = \sigma(\mathcal{A})$ and $\mu : \mathcal{B} \to [0, \infty)$ is a finite measure, i.e. $\mu(\Omega) < \infty$. Then for every $\varepsilon > 0$ and $B \in \mathcal{B}$ there exist $A \in \mathcal{A}_\delta$ and $C \in \mathcal{A}_\sigma$ such that $A \subset B \subset C$ and $\mu(C \setminus A) < \varepsilon$.
Proof. Let $\mathcal{B}_0$ denote the collection of $B \in \mathcal{B}$ such that for every $\varepsilon > 0$ there exist $A \in \mathcal{A}_\delta$ and $C \in \mathcal{A}_\sigma$ such that $A \subset B \subset C$ and $\mu(C \setminus A) < \varepsilon$. It is now clear that $\mathcal{A} \subset \mathcal{B}_0$ and that $\mathcal{B}_0$ is closed under complementation. Now suppose that $B_i \in \mathcal{B}_0$ for $i = 1, 2, \ldots$ and $\varepsilon > 0$ is given. By assumption there exist $A_i \in \mathcal{A}_\delta$ and $C_i \in \mathcal{A}_\sigma$ such that $A_i \subset B_i \subset C_i$ and $\mu(C_i \setminus A_i) < 2^{-i}\varepsilon$.

Let $A := \cup_{i=1}^{\infty} A_i$, $A^N := \cup_{i=1}^{N} A_i \in \mathcal{A}_\delta$, $B := \cup_{i=1}^{\infty} B_i$, and $C := \cup_{i=1}^{\infty} C_i \in \mathcal{A}_\sigma$. Then $A^N \subset A \subset B \subset C$ and
$$C \setminus A = \left[\cup_{i=1}^{\infty} C_i\right] \setminus A = \cup_{i=1}^{\infty} [C_i \setminus A] \subset \cup_{i=1}^{\infty} [C_i \setminus A_i].$$
Therefore,
$$\mu(C \setminus A) = \mu(\cup_{i=1}^{\infty} [C_i \setminus A]) \le \sum_{i=1}^{\infty} \mu(C_i \setminus A) \le \sum_{i=1}^{\infty} \mu(C_i \setminus A_i) < \varepsilon.$$
Since $C \setminus A^N \downarrow C \setminus A$, it also follows that $\mu(C \setminus A^N) < \varepsilon$ for sufficiently large $N$, and this shows $B = \cup_{i=1}^{\infty} B_i \in \mathcal{B}_0$. Hence $\mathcal{B}_0$ is a sub-$\sigma$-algebra of $\mathcal{B} = \sigma(\mathcal{A})$ which contains $\mathcal{A}$, which shows $\mathcal{B}_0 = \mathcal{B}$.
Many theorems in the sequel will require some control on the size of a measure $\mu$. The relevant notion for our purposes (and most purposes) is that of a $\sigma$-finite measure, defined next.
Definition 5.45. Suppose $\Omega$ is a set, $\mathcal{E} \subset \mathcal{B} \subset 2^{\Omega}$ and $\mu : \mathcal{B} \to [0, \infty]$ is a function. The function $\mu$ is $\sigma$-finite on $\mathcal{E}$ if there exist $E_n \in \mathcal{E}$ such that $\mu(E_n) < \infty$ and $\Omega = \cup_{n=1}^{\infty} E_n$. If $\mathcal{B}$ is a $\sigma$-algebra and $\mu$ is a measure on $\mathcal{B}$ which is $\sigma$-finite on $\mathcal{B}$, we will say $(\Omega, \mathcal{B}, \mu)$ is a $\sigma$-finite measure space.
The reader should check that if $\mu$ is a finitely additive measure on an algebra, $\mathcal{B}$, then $\mu$ is $\sigma$-finite on $\mathcal{B}$ iff there exist $\Omega_n \in \mathcal{B}$ such that $\Omega_n \uparrow \Omega$ and $\mu(\Omega_n) < \infty$.
Corollary 5.46 ($\sigma$-Finite Regularity Result). Theorem 5.44 continues to hold under the weaker assumption that $\mu : \mathcal{B} \to [0, \infty]$ is a measure which is $\sigma$-finite on $\mathcal{A}$.
Proof. Let $\Omega_n \in \mathcal{A}$ be such that $\cup_{n=1}^{\infty} \Omega_n = \Omega$ and $\mu(\Omega_n) < \infty$ for all $n$. Since $A \in \mathcal{B} \mapsto \mu_n(A) := \mu(\Omega_n \cap A)$ is a finite measure on $\mathcal{B}$ for each $n$, by Theorem 5.44, for every $B \in \mathcal{B}$ there exists $C_n \in \mathcal{A}_\sigma$ such that $B \subset C_n$ and
$$\mu(\Omega_n \cap [C_n \setminus B]) = \mu_n(C_n \setminus B) < 2^{-n}\varepsilon.$$
Now let $C := \cup_{n=1}^{\infty} [\Omega_n \cap C_n] \in \mathcal{A}_\sigma$ and observe that $B \subset C$ and
$$\mu(C \setminus B) = \mu(\cup_{n=1}^{\infty} ([\Omega_n \cap C_n] \setminus B)) \le \sum_{n=1}^{\infty} \mu([\Omega_n \cap C_n] \setminus B) = \sum_{n=1}^{\infty} \mu(\Omega_n \cap [C_n \setminus B]) < \varepsilon.$$
Applying this result to $B^c$ shows there exists $D \in \mathcal{A}_\sigma$ such that $B^c \subset D$ and
$$\mu(B \setminus D^c) = \mu(D \setminus B^c) < \varepsilon.$$
So if we let $A := D^c \in \mathcal{A}_\delta$, then $A \subset B \subset C$ and
$$\mu(C \setminus A) = \mu([B \setminus A] \cup [(C \setminus B) \setminus A]) \le \mu(B \setminus A) + \mu(C \setminus B) < 2\varepsilon,$$
and the result is proved.
Exercise 5.11. Suppose $\mathcal{A} \subset 2^{\Omega}$ is an algebra and $\mu$ and $\nu$ are two measures on $\mathcal{B} = \sigma(\mathcal{A})$.

a. Suppose that $\mu$ and $\nu$ are finite measures such that $\mu = \nu$ on $\mathcal{A}$. Show $\mu = \nu$.
b. Generalize the previous assertion to the case where you only assume that $\mu$ and $\nu$ are $\sigma$-finite on $\mathcal{A}$.
Corollary 5.47. Suppose $\mathcal{A} \subset 2^{\Omega}$ is an algebra and $\mu : \mathcal{B} = \sigma(\mathcal{A}) \to [0, \infty]$ is a measure which is $\sigma$-finite on $\mathcal{A}$. Then for all $B \in \mathcal{B}$, there exist $A \in \mathcal{A}_{\delta\sigma}$ and $C \in \mathcal{A}_{\sigma\delta}$ such that $A \subset B \subset C$ and $\mu(C \setminus A) = 0$.
Proof. By Theorem 5.44, given $B \in \mathcal{B}$, we may choose $A_n \in \mathcal{A}_\delta$ and $C_n \in \mathcal{A}_\sigma$ such that $A_n \subset B \subset C_n$ and $\mu(C_n \setminus B) \le 1/n$ and $\mu(B \setminus A_n) \le 1/n$. By replacing $A_N$ by $\cup_{n=1}^{N} A_n$ and $C_N$ by $\cap_{n=1}^{N} C_n$, we may assume that $A_n \uparrow$ and $C_n \downarrow$ as $n$ increases. Let $A = \cup A_n \in \mathcal{A}_{\delta\sigma}$ and $C = \cap C_n \in \mathcal{A}_{\sigma\delta}$; then $A \subset B \subset C$ and
$$\mu(C \setminus A) = \mu(C \setminus B) + \mu(B \setminus A) \le \mu(C_n \setminus B) + \mu(B \setminus A_n) \le 2/n \to 0 \text{ as } n \to \infty.$$
Exercise 5.12. Let $\mathcal{B} = \mathcal{B}_{\mathbb{R}^n} = \sigma(\{\text{open subsets of } \mathbb{R}^n\})$ be the Borel $\sigma$-algebra on $\mathbb{R}^n$ and $\mu$ be a probability measure on $\mathcal{B}$. Further, let $\mathcal{B}_0$ denote those sets $B \in \mathcal{B}$ such that for every $\varepsilon > 0$ there exist $F \subset B \subset V$ such that $F$ is closed, $V$ is open, and $\mu(V \setminus F) < \varepsilon$. Show:

1. $\mathcal{B}_0$ contains all closed subsets of $\mathbb{R}^n$. Hint: given a closed subset, $F \subset \mathbb{R}^n$, and $k \in \mathbb{N}$, let $V_k := \cup_{x \in F} B(x, 1/k)$, where $B(x, \delta) := \{y \in \mathbb{R}^n : |y - x| < \delta\}$. Show $V_k \downarrow F$ as $k \to \infty$.
2. Show $\mathcal{B}_0$ is a $\sigma$-algebra and use this along with the first part of this exercise to conclude $\mathcal{B} = \mathcal{B}_0$. Hint: follow closely the method used in the first step of the proof of Theorem 5.44.
3. Show for every $\varepsilon > 0$ and $B \in \mathcal{B}$, there exists a compact subset, $K \subset \mathbb{R}^n$, such that $K \subset B$ and $\mu(B \setminus K) < \varepsilon$. Hint: take $K := F \cap \{x \in \mathbb{R}^n : |x| \le n\}$ for some sufficiently large $n$.
5.7 Appendix: Completions of Measure Spaces*

Definition 5.48. A set $E \subset \Omega$ is a null set if $E \in \mathcal{B}$ and $\mu(E) = 0$. If $P$ is some property which is either true or false for each $x \in \Omega$, we will use the terminology $P$ a.e. (to be read $P$ almost everywhere) to mean
$$E := \{x \in \Omega : P \text{ is false for } x\}$$
is a null set. For example if $f$ and $g$ are two measurable functions on $(\Omega, \mathcal{B}, \mu)$, $f = g$ a.e. means that $\mu(f \ne g) = 0$.

Definition 5.49. A measure space $(\Omega, \mathcal{B}, \mu)$ is complete if every subset of a null set is in $\mathcal{B}$, i.e. for all $F \subset \Omega$, $F \subset E \in \mathcal{B}$ with $\mu(E) = 0$ implies that $F \in \mathcal{B}$.

Proposition 5.50 (Completion of a Measure). Let $(\Omega, \mathcal{B}, \mu)$ be a measure space. Set
$$\mathcal{N} = \mathcal{N}^{\mu} := \{N \subset \Omega : \exists\, F \in \mathcal{B} \text{ such that } N \subset F \text{ and } \mu(F) = 0\},$$
$$\bar{\mathcal{B}} = \bar{\mathcal{B}}^{\mu} := \{A \cup N : A \in \mathcal{B} \text{ and } N \in \mathcal{N}\},$$
and
$$\bar{\mu}(A \cup N) := \mu(A) \text{ for } A \in \mathcal{B} \text{ and } N \in \mathcal{N};$$
see Fig. 5.2. Then $\bar{\mathcal{B}}$ is a $\sigma$-algebra, $\bar{\mu}$ is a well defined measure on $\bar{\mathcal{B}}$, $\bar{\mu}$ is the unique measure on $\bar{\mathcal{B}}$ which extends $\mu$ on $\mathcal{B}$, and $(\Omega, \bar{\mathcal{B}}, \bar{\mu})$ is a complete measure space. The $\sigma$-algebra, $\bar{\mathcal{B}}$, is called the completion of $\mathcal{B}$ relative to $\mu$, and $\bar{\mu}$ is called the completion of $\mu$.
Proof. Clearly $\Omega, \emptyset \in \bar{\mathcal{B}}$. Let $A \in \mathcal{B}$ and $N \in \mathcal{N}$, and choose $F \in \mathcal{B}$ such that $N \subset F$ and $\mu(F) = 0$.

Fig. 5.2. Completing a $\sigma$-algebra.

Since $N^c = (F \setminus N) \cup F^c$,
$$(A \cup N)^c = A^c \cap N^c = A^c \cap ((F \setminus N) \cup F^c) = [A^c \cap (F \setminus N)] \cup [A^c \cap F^c],$$
where $[A^c \cap (F \setminus N)] \in \mathcal{N}$ and $[A^c \cap F^c] \in \mathcal{B}$. Thus $\bar{\mathcal{B}}$ is closed under complements. If $A_i \in \mathcal{B}$ and $N_i \subset F_i \in \mathcal{B}$ with $\mu(F_i) = 0$, then
$$\cup (A_i \cup N_i) = (\cup A_i) \cup (\cup N_i) \in \bar{\mathcal{B}},$$
since $\cup A_i \in \mathcal{B}$ and $\cup N_i \subset \cup F_i$ with $\mu(\cup F_i) \le \sum \mu(F_i) = 0$. Therefore, $\bar{\mathcal{B}}$ is a $\sigma$-algebra. Suppose $A \cup N_1 = B \cup N_2$ with $A, B \in \mathcal{B}$ and $N_1, N_2 \in \mathcal{N}$. Then $A \subset A \cup N_1 \subset A \cup N_1 \cup F_2 = B \cup F_2$, which shows that
$$\mu(A) \le \mu(B) + \mu(F_2) = \mu(B).$$
Similarly, we show that $\mu(B) \le \mu(A)$, so that $\mu(A) = \mu(B)$ and hence $\bar{\mu}(A \cup N) := \mu(A)$ is well defined. It is left as an exercise to show $\bar{\mu}$ is a measure, i.e. that it is countably additive.
5.8 Appendix: Monotone Class Theorems*

This appendix may be safely skipped!

Definition 5.51 (Monotone Class). $\mathcal{C} \subset 2^{\Omega}$ is a monotone class if it is closed under countable increasing unions and countable decreasing intersections.

Lemma 5.52 (Monotone Class Theorem*). Suppose $\mathcal{A} \subset 2^{\Omega}$ is an algebra and $\mathcal{C}$ is the smallest monotone class containing $\mathcal{A}$. Then $\mathcal{C} = \sigma(\mathcal{A})$.
Proof. For $C \in \mathcal{C}$ let
$$\mathcal{C}(C) = \{B \in \mathcal{C} : C \cap B,\ C \cap B^c,\ B \cap C^c \in \mathcal{C}\};$$
then $\mathcal{C}(C)$ is a monotone class. Indeed, if $B_n \in \mathcal{C}(C)$ and $B_n \uparrow B$, then $B_n^c \downarrow B^c$ and so
$$\mathcal{C} \ni C \cap B_n \uparrow C \cap B, \quad \mathcal{C} \ni C \cap B_n^c \downarrow C \cap B^c, \quad \text{and} \quad \mathcal{C} \ni B_n \cap C^c \uparrow B \cap C^c.$$
Since $\mathcal{C}$ is a monotone class, it follows that $C \cap B, C \cap B^c, B \cap C^c \in \mathcal{C}$, i.e. $B \in \mathcal{C}(C)$. This shows that $\mathcal{C}(C)$ is closed under increasing limits, and a similar argument shows that $\mathcal{C}(C)$ is closed under decreasing limits. Thus we have shown that $\mathcal{C}(C)$ is a monotone class for all $C \in \mathcal{C}$.

If $A \in \mathcal{A} \subset \mathcal{C}$, then $A \cap B, A \cap B^c, B \cap A^c \in \mathcal{C}$ for all $B \in \mathcal{A}$ and hence it follows that $\mathcal{A} \subset \mathcal{C}(A) \subset \mathcal{C}$. Since $\mathcal{C}$ is the smallest monotone class containing $\mathcal{A}$ and $\mathcal{C}(A)$ is a monotone class containing $\mathcal{A}$, we conclude that $\mathcal{C}(A) = \mathcal{C}$ for any $A \in \mathcal{A}$. Let $B \in \mathcal{C}$ and notice that $A \in \mathcal{C}(B)$ happens iff $B \in \mathcal{C}(A)$. This observation and the fact that $\mathcal{C}(A) = \mathcal{C}$ for all $A \in \mathcal{A}$ implies $\mathcal{A} \subset \mathcal{C}(B) \subset \mathcal{C}$ for all $B \in \mathcal{C}$. Again, since $\mathcal{C}$ is the smallest monotone class containing $\mathcal{A}$ and $\mathcal{C}(B)$ is a monotone class, we conclude that $\mathcal{C}(B) = \mathcal{C}$ for all $B \in \mathcal{C}$. That is to say, if $A, B \in \mathcal{C}$ then $A \in \mathcal{C} = \mathcal{C}(B)$ and hence $A \cap B, A \cap B^c, A^c \cap B \in \mathcal{C}$. So $\mathcal{C}$ is closed under complements (since $\Omega \in \mathcal{A} \subset \mathcal{C}$), finite intersections, and increasing unions, from which it easily follows that $\mathcal{C}$ is a $\sigma$-algebra.
6 Random Variables
Notation 6.1 If $f : X \to Y$ is a function and $\mathcal{E} \subset 2^Y$, let
$$f^{-1}\mathcal{E} := f^{-1}(\mathcal{E}) := \{f^{-1}(E) \mid E \in \mathcal{E}\}.$$
If $\mathcal{G} \subset 2^X$, let
$$f_*\mathcal{G} := \{A \in 2^Y \mid f^{-1}(A) \in \mathcal{G}\}.$$
Definition 6.2. Let $\mathcal{E} \subset 2^X$ be a collection of sets, $A \subset X$, $i_A : A \to X$ be the inclusion map ($i_A(x) = x$ for all $x \in A$), and
$$\mathcal{E}_A = i_A^{-1}(\mathcal{E}) = \{A \cap E : E \in \mathcal{E}\}.$$
The following results will be used frequently (often without further reference) in the sequel.
Lemma 6.3 (A key measurability lemma). If $f : X \to Y$ is a function and $\mathcal{E} \subset 2^Y$, then
$$\sigma(f^{-1}(\mathcal{E})) = f^{-1}(\sigma(\mathcal{E})). \tag{6.1}$$
In particular, if $A \subset Y$ then
$$(\sigma(\mathcal{E}))_A = \sigma(\mathcal{E}_A). \tag{6.2}$$
(Similar assertions hold with $\sigma(\cdot)$ being replaced by $\mathcal{A}(\cdot)$.)
Proof. Since $\mathcal{E} \subset \sigma(\mathcal{E})$, it follows that $f^{-1}(\mathcal{E}) \subset f^{-1}(\sigma(\mathcal{E}))$. Moreover, by Exercise 6.1 below, $f^{-1}(\sigma(\mathcal{E}))$ is a $\sigma$-algebra and therefore,
$$\sigma(f^{-1}(\mathcal{E})) \subset f^{-1}(\sigma(\mathcal{E})).$$
To finish the proof we must show $f^{-1}(\sigma(\mathcal{E})) \subset \sigma(f^{-1}(\mathcal{E}))$, i.e. that $f^{-1}(B) \in \sigma(f^{-1}(\mathcal{E}))$ for all $B \in \sigma(\mathcal{E})$. To do this we follow the usual measure theoretic mantra, namely let
$$\mathcal{M} := \{B \subset Y : f^{-1}(B) \in \sigma(f^{-1}(\mathcal{E}))\} = f_*\sigma(f^{-1}(\mathcal{E})).$$
We will now finish the proof by showing $\sigma(\mathcal{E}) \subset \mathcal{M}$. This is easily achieved by observing that $\mathcal{M}$ is a $\sigma$-algebra (see Exercise 6.1) which contains $\mathcal{E}$, and therefore $\sigma(\mathcal{E}) \subset \mathcal{M}$.

Equation (6.2) is a special case of Eq. (6.1). Indeed, taking $f = i_A : A \to X$ we have
$$(\sigma(\mathcal{E}))_A = i_A^{-1}(\sigma(\mathcal{E})) = \sigma(i_A^{-1}(\mathcal{E})) = \sigma(\mathcal{E}_A).$$
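On finite sets, Lemma 6.3 can be verified by brute force. The Python sketch below is illustrative (the sets $X$, $Y$, the map $f$, the generating collection, and the helper `sigma` are all my own assumptions): it generates the $\sigma$-algebra of a finite collection by closing under complements and unions, then checks that $\sigma(f^{-1}(\mathcal{E})) = f^{-1}(\sigma(\mathcal{E}))$.

```python
from itertools import combinations

def sigma(space, collection):
    """Sigma-algebra on a finite `space` generated by `collection`.

    Closes under complement and pairwise union until stable; on a
    finite space this yields the generated sigma-algebra.
    """
    sets = {frozenset(), frozenset(space)} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        for A in list(sets):
            comp = frozenset(space) - A
            if comp not in sets:
                sets.add(comp)
                changed = True
        for A, B in combinations(list(sets), 2):
            if A | B not in sets:
                sets.add(A | B)
                changed = True
    return sets

X, Y = {1, 2, 3, 4}, {"u", "v", "w"}
f = {1: "u", 2: "u", 3: "v", 4: "w"}
E = [{"u"}]                                  # a generating collection on Y

def preimage(A):
    return frozenset(x for x in X if f[x] in A)

lhs = sigma(X, [preimage(frozenset(A)) for A in E])   # sigma(f^{-1}(E))
rhs = {preimage(B) for B in sigma(Y, E)}              # f^{-1}(sigma(E))
assert lhs == rhs
```

Here both sides come out to $\{\emptyset, X, \{1,2\}, \{3,4\}\}$; the point of the lemma is that this identity needs no finiteness at all.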
Exercise 6.1. If $f : X \to Y$ is a function and $\mathcal{F} \subset 2^Y$ and $\mathcal{B} \subset 2^X$ are algebras ($\sigma$-algebras), then $f^{-1}\mathcal{F}$ and $f_*\mathcal{B}$ are algebras ($\sigma$-algebras).
Example 6.4. Let $\mathcal{E} = \{(a,b] : -\infty < a < b < \infty\}$ and $\mathcal{B} = \sigma(\mathcal{E})$ be the Borel $\sigma$-field on $\mathbb{R}$. Then
$$\mathcal{E}_{(0,1]} = \{(a,b] : 0 \le a < b \le 1\}$$
and we have
$$\mathcal{B}_{(0,1]} = \sigma(\mathcal{E}_{(0,1]}).$$
In particular, if $A \in \mathcal{B}$ is such that $A \subset (0,1]$, then $A \in \sigma(\mathcal{E}_{(0,1]})$.
6.1 Measurable Functions

Definition 6.5. A measurable space is a pair $(X, \mathcal{M})$, where $X$ is a set and $\mathcal{M}$ is a $\sigma$-algebra on $X$.

To motivate the notion of a measurable function, suppose $(X, \mathcal{M}, \mu)$ is a measure space and $f : X \to \mathbb{R}_+$ is a function. Roughly speaking, we are going to define $\int_X f\, d\mu$ as a certain limit of sums of the form,
$$\sum_{0 < a_1 < a_2 < a_3 < \ldots} a_i\, \mu(f^{-1}((a_i, a_{i+1}])).$$
For this to make sense we will need to require $f^{-1}((a,b]) \in \mathcal{M}$ for all $a < b$. Because of Corollary 6.11 below, this last condition is equivalent to the condition $f^{-1}(\mathcal{B}_{\mathbb{R}}) \subset \mathcal{M}$.
Definition 6.6. Let $(X, \mathcal{M})$ and $(Y, \mathcal{F})$ be measurable spaces. A function $f : X \to Y$ is measurable, or more precisely, $\mathcal{M}/\mathcal{F}$-measurable or $(\mathcal{M}, \mathcal{F})$-measurable, if $f^{-1}(\mathcal{F}) \subset \mathcal{M}$, i.e. if $f^{-1}(A) \in \mathcal{M}$ for all $A \in \mathcal{F}$.
Remark 6.7. Let $f : X \to Y$ be a function. Given a $\sigma$-algebra $\mathcal{F} \subset 2^Y$, the $\sigma$-algebra $\mathcal{M} := f^{-1}(\mathcal{F})$ is the smallest $\sigma$-algebra on $X$ such that $f$ is $(\mathcal{M}, \mathcal{F})$-measurable. Similarly, if $\mathcal{M}$ is a $\sigma$-algebra on $X$, then
$$\mathcal{F} = f_*\mathcal{M} = \{A \in 2^Y \mid f^{-1}(A) \in \mathcal{M}\}$$
is the largest $\sigma$-algebra on $Y$ such that $f$ is $(\mathcal{M}, \mathcal{F})$-measurable.
Example 6.8 (Indicator Functions). Let $(X, \mathcal{M})$ be a measurable space and $A \subset X$. Then $1_A$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{R}})$-measurable iff $A \in \mathcal{M}$. Indeed, $1_A^{-1}(W)$ is either $\emptyset$, $X$, $A$ or $A^c$ for any $W \subset \mathbb{R}$, with $1_A^{-1}(\{1\}) = A$.
Example 6.9. Suppose $f : X \to Y$ with $Y$ being a finite or countable set and $\mathcal{F} = 2^Y$. Then $f$ is measurable iff $f^{-1}(\{y\}) \in \mathcal{M}$ for all $y \in Y$.
Proposition 6.10. Suppose that $(X, \mathcal{M})$ and $(Y, \mathcal{F})$ are measurable spaces and further assume $\mathcal{E} \subset \mathcal{F}$ generates $\mathcal{F}$, i.e. $\mathcal{F} = \sigma(\mathcal{E})$. Then a map, $f : X \to Y$, is measurable iff $f^{-1}(\mathcal{E}) \subset \mathcal{M}$.

Proof. If $f$ is $\mathcal{M}/\mathcal{F}$-measurable, then $f^{-1}(\mathcal{E}) \subset f^{-1}(\mathcal{F}) \subset \mathcal{M}$. Conversely if $f^{-1}(\mathcal{E}) \subset \mathcal{M}$, then $\sigma(f^{-1}(\mathcal{E})) \subset \mathcal{M}$, and so making use of Lemma 6.3,
$$f^{-1}(\mathcal{F}) = f^{-1}(\sigma(\mathcal{E})) = \sigma(f^{-1}(\mathcal{E})) \subset \mathcal{M}.$$
Corollary 6.11. Suppose that $(X, \mathcal{M})$ is a measurable space. Then the following conditions on a function $f : X \to \mathbb{R}$ are equivalent:

1. $f$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{R}})$-measurable,
2. $f^{-1}((a, \infty)) \in \mathcal{M}$ for all $a \in \mathbb{R}$,
3. $f^{-1}((a, \infty)) \in \mathcal{M}$ for all $a \in \mathbb{Q}$,
4. $f^{-1}((-\infty, a]) \in \mathcal{M}$ for all $a \in \mathbb{R}$.

Exercise 6.2. Prove Corollary 6.11. Hint: See Exercise 3.7.
Exercise 6.3. If $\mathcal{M}$ is the $\sigma$-algebra generated by $\mathcal{E} \subset 2^X$, then $\mathcal{M}$ is the union of the $\sigma$-algebras generated by countable subsets $\mathcal{F} \subset \mathcal{E}$.

Exercise 6.4. Let $(X, \mathcal{M})$ be a measure space and $f_n : X \to \mathbb{R}$ be a sequence of measurable functions on $X$. Show that $\{x : \lim_{n \to \infty} f_n(x) \text{ exists in } \mathbb{R}\} \in \mathcal{M}$. Similarly show the same holds if $\mathbb{R}$ is replaced by $\mathbb{C}$.

Exercise 6.5. Show that every monotone function $f : \mathbb{R} \to \mathbb{R}$ is $(\mathcal{B}_{\mathbb{R}}, \mathcal{B}_{\mathbb{R}})$-measurable.
Definition 6.12. Given measurable spaces $(X, \mathcal{M})$ and $(Y, \mathcal{F})$ and a subset $A \subset X$, we say a function $f : A \to Y$ is measurable iff $f$ is $\mathcal{M}_A/\mathcal{F}$-measurable.
Proposition 6.13 (Localizing Measurability). Let $(X, \mathcal{M})$ and $(Y, \mathcal{F})$ be measurable spaces and $f : X \to Y$ be a function.

1. If $f$ is measurable and $A \subset X$, then $f|_A : A \to Y$ is $\mathcal{M}_A/\mathcal{F}$-measurable.
2. Suppose there exist $A_n \in \mathcal{M}$ such that $X = \cup_{n=1}^{\infty} A_n$ and $f|_{A_n}$ is $\mathcal{M}_{A_n}/\mathcal{F}$-measurable for all $n$. Then $f$ is $\mathcal{M}$-measurable.

Proof. 1. If $f : X \to Y$ is measurable, then $f^{-1}(B) \in \mathcal{M}$ for all $B \in \mathcal{F}$ and therefore
$$f|_A^{-1}(B) = A \cap f^{-1}(B) \in \mathcal{M}_A \text{ for all } B \in \mathcal{F}.$$
2. If $B \in \mathcal{F}$, then
$$f^{-1}(B) = \cup_{n=1}^{\infty} (f^{-1}(B) \cap A_n) = \cup_{n=1}^{\infty} f|_{A_n}^{-1}(B).$$
Since each $A_n \in \mathcal{M}$, $\mathcal{M}_{A_n} \subset \mathcal{M}$, and so the previous displayed equation shows $f^{-1}(B) \in \mathcal{M}$.
Lemma 6.14 (Composing Measurable Functions). Suppose that $(X, \mathcal{M})$, $(Y, \mathcal{F})$ and $(Z, \mathcal{G})$ are measurable spaces. If $f : (X, \mathcal{M}) \to (Y, \mathcal{F})$ and $g : (Y, \mathcal{F}) \to (Z, \mathcal{G})$ are measurable functions, then $g \circ f : (X, \mathcal{M}) \to (Z, \mathcal{G})$ is measurable as well.

Proof. By assumption $g^{-1}(\mathcal{G}) \subset \mathcal{F}$ and $f^{-1}(\mathcal{F}) \subset \mathcal{M}$, so that
$$(g \circ f)^{-1}(\mathcal{G}) = f^{-1}(g^{-1}(\mathcal{G})) \subset f^{-1}(\mathcal{F}) \subset \mathcal{M}.$$
Definition 6.15 ($\sigma$-Algebras Generated by Functions). Let $X$ be a set and suppose there is a collection of measurable spaces $\{(Y_\alpha, \mathcal{F}_\alpha) : \alpha \in I\}$ and functions $f_\alpha : X \to Y_\alpha$ for all $\alpha \in I$. Let $\sigma(f_\alpha : \alpha \in I)$ denote the smallest $\sigma$-algebra on $X$ such that each $f_\alpha$ is measurable, i.e.
$$\sigma(f_\alpha : \alpha \in I) = \sigma\left(\cup_\alpha f_\alpha^{-1}(\mathcal{F}_\alpha)\right).$$
Example 6.16. Suppose that $Y$ is a finite set, $\mathcal{F} = 2^Y$, and $X = Y^N$ for some $N \in \mathbb{N}$. Let $\pi_i : Y^N \to Y$ be the projection maps, $\pi_i(y_1, \ldots, y_N) = y_i$. Then, as the reader should check,
$$\sigma(\pi_1, \ldots, \pi_n) = \left\{A \times Y^{N-n} : A \subset Y^n\right\}.$$
Proposition 6.17. Assuming the notation in Definition 6.15 (so $f_\alpha : X \to Y_\alpha$ for all $\alpha \in I$), additionally let $(Z, \mathcal{M})$ be a measurable space. Then $g : Z \to X$ is $(\mathcal{M}, \sigma(f_\alpha : \alpha \in I))$-measurable iff $f_\alpha \circ g$ (that is, $Z \xrightarrow{\ g\ } X \xrightarrow{\ f_\alpha\ } Y_\alpha$) is $(\mathcal{M}, \mathcal{F}_\alpha)$-measurable for all $\alpha \in I$.

Proof. ($\Rightarrow$) If $g$ is $(\mathcal{M}, \sigma(f_\alpha : \alpha \in I))$-measurable, then the composition $f_\alpha \circ g$ is $(\mathcal{M}, \mathcal{F}_\alpha)$-measurable by Lemma 6.14.
($\Leftarrow$) Since $\sigma(f_\alpha : \alpha \in I) = \sigma(\mathcal{E})$ where $\mathcal{E} := \cup_\alpha f_\alpha^{-1}(\mathcal{F}_\alpha)$, according to Proposition 6.10 it suffices to show $g^{-1}(A) \in \mathcal{M}$ for $A \in f_\alpha^{-1}(\mathcal{F}_\alpha)$. But this is true, since if $A = f_\alpha^{-1}(B)$ for some $B \in \mathcal{F}_\alpha$, then $g^{-1}(A) = g^{-1}(f_\alpha^{-1}(B)) = (f_\alpha \circ g)^{-1}(B) \in \mathcal{M}$ because $f_\alpha \circ g : Z \to Y_\alpha$ is assumed to be measurable.
Definition 6.18. If $\{(Y_\alpha, \mathcal{F}_\alpha) : \alpha \in I\}$ is a collection of measurable spaces, then the product measure space, $(Y, \mathcal{F})$, is $Y := \prod_{\alpha \in I} Y_\alpha$, $\mathcal{F} := \sigma(\pi_\alpha : \alpha \in I)$, where $\pi_\alpha : Y \to Y_\alpha$ is the component projection. We call $\mathcal{F}$ the product $\sigma$-algebra and denote it by $\mathcal{F} = \otimes_{\alpha \in I} \mathcal{F}_\alpha$.
Let us record an important special case of Proposition 6.17.

Corollary 6.19. If $(Z, \mathcal{M})$ is a measurable space, then $g : Z \to Y = \prod_{\alpha \in I} Y_\alpha$ is $(\mathcal{M}, \mathcal{F} := \otimes_{\alpha \in I} \mathcal{F}_\alpha)$-measurable iff $\pi_\alpha \circ g : Z \to Y_\alpha$ is $(\mathcal{M}, \mathcal{F}_\alpha)$-measurable for all $\alpha \in I$.

As a special case of the above corollary, if $I = \{1, 2, \ldots, n\}$, then $Y = Y_1 \times \cdots \times Y_n$ and $g = (g_1, \ldots, g_n) : Z \to Y$ is measurable iff each component, $g_i : Z \to Y_i$, is measurable. Here is another closely related result.
Proposition 6.20. Suppose $X$ is a set, $\{(Y_\alpha, \mathcal{F}_\alpha) : \alpha \in I\}$ is a collection of measurable spaces, and we are given maps, $f_\alpha : X \to Y_\alpha$, for all $\alpha \in I$. If $f : X \to Y := \prod_{\alpha \in I} Y_\alpha$ is the unique map such that $\pi_\alpha \circ f = f_\alpha$, then
$$\sigma(f_\alpha : \alpha \in I) = \sigma(f) = f^{-1}(\mathcal{F}),$$
where $\mathcal{F} := \otimes_{\alpha \in I} \mathcal{F}_\alpha$.
Proof. Since $\pi_\alpha \circ f = f_\alpha$ is $\sigma(f_\alpha : \alpha \in I)/\mathcal{F}_\alpha$-measurable for all $\alpha \in I$, it follows from Corollary 6.19 that $f : X \to Y$ is $\sigma(f_\alpha : \alpha \in I)/\mathcal{F}$-measurable. Since $\sigma(f)$ is the smallest $\sigma$-algebra on $X$ such that $f$ is measurable, we may conclude that $\sigma(f) \subset \sigma(f_\alpha : \alpha \in I)$.

Conversely, for each $\alpha \in I$, $f_\alpha = \pi_\alpha \circ f$ is $\sigma(f)/\mathcal{F}_\alpha$-measurable, being the composition of two measurable functions. Since $\sigma(f_\alpha : \alpha \in I)$ is the smallest $\sigma$-algebra on $X$ such that each $f_\alpha : X \to Y_\alpha$ is measurable, we learn that $\sigma(f_\alpha : \alpha \in I) \subset \sigma(f)$.
Exercise 6.6. Suppose that $(Y_1, \mathcal{F}_1)$ and $(Y_2, \mathcal{F}_2)$ are measurable spaces and $\mathcal{E}_i$ is a subset of $\mathcal{F}_i$ such that $Y_i \in \mathcal{E}_i$ and $\mathcal{F}_i = \sigma(\mathcal{E}_i)$ for $i = 1$ and $2$. Show $\mathcal{F}_1 \otimes \mathcal{F}_2 = \sigma(\mathcal{E})$ where $\mathcal{E} := \{A_1 \times A_2 : A_i \in \mathcal{E}_i \text{ for } i = 1, 2\}$. Hints:

1. First show that if $Y$ is a set and $\mathcal{S}_1$ and $\mathcal{S}_2$ are two non-empty subsets of $2^Y$, then $\sigma(\sigma(\mathcal{S}_1) \cup \sigma(\mathcal{S}_2)) = \sigma(\mathcal{S}_1 \cup \mathcal{S}_2)$. (In fact, one has that $\sigma(\cup_{\alpha \in I}\, \sigma(\mathcal{S}_\alpha)) = \sigma(\cup_{\alpha \in I}\, \mathcal{S}_\alpha)$ for any collection of non-empty subsets, $\{\mathcal{S}_\alpha\}_{\alpha \in I} \subset 2^Y$.)
2. After this you might start your proof as follows;
$$\mathcal{F}_1 \otimes \mathcal{F}_2 := \sigma\left(\pi_1^{-1}(\mathcal{F}_1) \cup \pi_2^{-1}(\mathcal{F}_2)\right) = \sigma\left(\pi_1^{-1}(\sigma(\mathcal{E}_1)) \cup \pi_2^{-1}(\sigma(\mathcal{E}_2))\right) = \ldots.$$
Remark 6.21. The reader should convince herself that Exercise 6.6 admits the following extension. If $I$ is any finite or countable index set, $\{(Y_i, \mathcal{F}_i)\}_{i \in I}$ are measurable spaces, and $\mathcal{E}_i \subset \mathcal{F}_i$ are such that $Y_i \in \mathcal{E}_i$ and $\mathcal{F}_i = \sigma(\mathcal{E}_i)$ for all $i \in I$, then
$$\otimes_{i \in I}\, \mathcal{F}_i = \sigma\left(\left\{\prod_{i \in I} A_i : A_j \in \mathcal{E}_j \text{ for all } j \in I\right\}\right)$$
and in particular,
$$\otimes_{i \in I}\, \mathcal{F}_i = \sigma\left(\left\{\prod_{i \in I} A_i : A_j \in \mathcal{F}_j \text{ for all } j \in I\right\}\right).$$
The last fact is easily verified directly without the aid of Exercise 6.6.
Exercise 6.7. Suppose that $(Y_1, \mathcal{F}_1)$ and $(Y_2, \mathcal{F}_2)$ are measurable spaces and $\emptyset \ne B_i \subset Y_i$ for $i = 1, 2$. Show
$$[\mathcal{F}_1 \otimes \mathcal{F}_2]_{B_1 \times B_2} = [\mathcal{F}_1]_{B_1} \otimes [\mathcal{F}_2]_{B_2}.$$
Hint: you may find it useful to use the result of Exercise 6.6 with
$$\mathcal{E} := \{A_1 \times A_2 : A_i \in \mathcal{F}_i \text{ for } i = 1, 2\}.$$
Definition 6.22. A function $f : X \to Y$ between two topological spaces is Borel measurable if $f^{-1}(\mathcal{B}_Y) \subset \mathcal{B}_X$.

Proposition 6.23. Let $X$ and $Y$ be two topological spaces and $f : X \to Y$ be a continuous function. Then $f$ is Borel measurable.
Proof. Using Lemma 6.3 and $\mathcal{B}_Y = \sigma(\tau_Y)$,
$$f^{-1}(\mathcal{B}_Y) = f^{-1}(\sigma(\tau_Y)) = \sigma(f^{-1}(\tau_Y)) \subset \sigma(\tau_X) = \mathcal{B}_X.$$
Example 6.24. For $i = 1, 2, \ldots, n$, let $\pi_i : \mathbb{R}^n \to \mathbb{R}$ be defined by $\pi_i(x) = x_i$. Then each $\pi_i$ is continuous and therefore $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_{\mathbb{R}}$-measurable.
Lemma 6.25. Let $\mathcal{E}$ denote the collection of open rectangles in $\mathbb{R}^n$. Then $\mathcal{B}_{\mathbb{R}^n} = \sigma(\mathcal{E})$. We also have that $\mathcal{B}_{\mathbb{R}^n} = \sigma(\pi_1, \ldots, \pi_n) = \mathcal{B}_{\mathbb{R}} \otimes \cdots \otimes \mathcal{B}_{\mathbb{R}}$, and in particular, $A_1 \times \cdots \times A_n \in \mathcal{B}_{\mathbb{R}^n}$ whenever $A_i \in \mathcal{B}_{\mathbb{R}}$ for $i = 1, 2, \ldots, n$. Therefore $\mathcal{B}_{\mathbb{R}^n}$ may be described as the $\sigma$-algebra generated by $\{A_1 \times \cdots \times A_n : A_i \in \mathcal{B}_{\mathbb{R}}\}$. (Also see Remark 6.21.)
Proof. Assertion 1. Since $\mathcal{E} \subset \mathcal{B}_{\mathbb{R}^n}$, it follows that $\sigma(\mathcal{E}) \subset \mathcal{B}_{\mathbb{R}^n}$. Let
$$\mathcal{E}_0 := \{(a, b) : a, b \in \mathbb{Q}^n \text{ with } a < b\},$$
where, for $a, b \in \mathbb{R}^n$, we write $a < b$ iff $a_i < b_i$ for $i = 1, 2, \ldots, n$, and let
$$(a, b) = (a_1, b_1) \times \cdots \times (a_n, b_n). \tag{6.3}$$
Since every open set, $V \subset \mathbb{R}^n$, may be written as a (necessarily) countable union of elements from $\mathcal{E}_0$, we have
$$V \in \sigma(\mathcal{E}_0) \subset \sigma(\mathcal{E}),$$
i.e. $\sigma(\mathcal{E}_0)$, and hence $\sigma(\mathcal{E})$, contains all open subsets of $\mathbb{R}^n$. Hence we may conclude that
$$\mathcal{B}_{\mathbb{R}^n} = \sigma(\text{open sets}) \subset \sigma(\mathcal{E}_0) \subset \sigma(\mathcal{E}) \subset \mathcal{B}_{\mathbb{R}^n}.$$
Assertion 2. Since each $\pi_i : \mathbb{R}^n \to \mathbb{R}$ is continuous, it is $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_{\mathbb{R}}$-measurable and therefore, $\sigma(\pi_1, \ldots, \pi_n) \subset \mathcal{B}_{\mathbb{R}^n}$. Moreover, if $(a, b)$ is as in Eq. (6.3), then
$$(a, b) = \cap_{i=1}^{n} \pi_i^{-1}((a_i, b_i)) \in \sigma(\pi_1, \ldots, \pi_n).$$
Therefore, $\mathcal{E} \subset \sigma(\pi_1, \ldots, \pi_n)$ and $\mathcal{B}_{\mathbb{R}^n} = \sigma(\mathcal{E}) \subset \sigma(\pi_1, \ldots, \pi_n)$.
Assertion 3. If $A_i \in \mathcal{B}_{\mathbb{R}}$ for $i = 1, 2, \ldots, n$, then
$$A_1 \times \cdots \times A_n = \cap_{i=1}^{n} \pi_i^{-1}(A_i) \in \sigma(\pi_1, \ldots, \pi_n) = \mathcal{B}_{\mathbb{R}^n}.$$
Corollary 6.26. If $(X, \mathcal{M})$ is a measurable space, then
$$f = (f_1, f_2, \ldots, f_n) : X \to \mathbb{R}^n$$
is $(\mathcal{M}, \mathcal{B}_{\mathbb{R}^n})$-measurable iff $f_i : X \to \mathbb{R}$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{R}})$-measurable for each $i$. In particular, a function $f : X \to \mathbb{C}$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{C}})$-measurable iff $\operatorname{Re} f$ and $\operatorname{Im} f$ are $(\mathcal{M}, \mathcal{B}_{\mathbb{R}})$-measurable.

Proof. This is an application of Lemma 6.25 and Corollary 6.19 with $Y_i = \mathbb{R}$ for each $i$.
Corollary 6.27. Let $(X, \mathcal{M})$ be a measurable space and $f, g : X \to \mathbb{C}$ be $(\mathcal{M}, \mathcal{B}_\mathbb{C})$ measurable functions. Then $f \pm g$ and $f \cdot g$ are also $(\mathcal{M}, \mathcal{B}_\mathbb{C})$ measurable.

Proof. Define $F : X \to \mathbb{C} \times \mathbb{C}$, $A_\pm : \mathbb{C} \times \mathbb{C} \to \mathbb{C}$ and $M : \mathbb{C} \times \mathbb{C} \to \mathbb{C}$ by $F(x) = (f(x), g(x))$, $A_\pm(w, z) = w \pm z$ and $M(w, z) = wz$. Then $A_\pm$ and $M$ are continuous and hence $(\mathcal{B}_{\mathbb{C}^2}, \mathcal{B}_\mathbb{C})$ measurable. Also $F$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{C}^2})$ measurable since $\pi_1 \circ F = f$ and $\pi_2 \circ F = g$ are $(\mathcal{M}, \mathcal{B}_\mathbb{C})$ measurable. Therefore $A_\pm \circ F = f \pm g$ and $M \circ F = f \cdot g$, being compositions of measurable functions, are also measurable.
Lemma 6.28. Let $(X, \mathcal{M})$ be a measurable space and $f : X \to \mathbb{C}$ be a $(\mathcal{M}, \mathcal{B}_\mathbb{C})$ measurable function. Then
$$F(x) := \begin{cases} \frac{1}{f(x)} & \text{if } f(x) \neq 0 \\ 0 & \text{if } f(x) = 0 \end{cases}$$
is measurable.

Proof. Define $i : \mathbb{C} \to \mathbb{C}$ by
$$i(z) = \begin{cases} \frac{1}{z} & \text{if } z \neq 0 \\ 0 & \text{if } z = 0. \end{cases}$$
For any open set $V \subset \mathbb{C}$ we have
$$i^{-1}(V) = i^{-1}(V \setminus \{0\}) \cup i^{-1}(V \cap \{0\}).$$
Because $i$ is continuous except at $z = 0$, $i^{-1}(V \setminus \{0\})$ is an open set and hence in $\mathcal{B}_\mathbb{C}$. Moreover, $i^{-1}(V \cap \{0\}) \in \mathcal{B}_\mathbb{C}$ since $i^{-1}(V \cap \{0\})$ is either the empty set or the one point set $\{0\}$. Therefore $i^{-1}(\tau_\mathbb{C}) \subset \mathcal{B}_\mathbb{C}$ and hence $i^{-1}(\mathcal{B}_\mathbb{C}) = i^{-1}(\sigma(\tau_\mathbb{C})) = \sigma(i^{-1}(\tau_\mathbb{C})) \subset \mathcal{B}_\mathbb{C}$, which shows that $i$ is Borel measurable. Since $F = i \circ f$ is the composition of measurable functions, $F$ is also measurable.
Remark 6.29. For the real case of Lemma 6.28, define $i$ as above but now take $z$ to be real. From the plot of $i$, the reader may easily verify that $i^{-1}((-\infty, a])$ is an infinite half interval for all $a$ and therefore $i$ is measurable. See Example 6.34 for another proof of this fact.
We will often deal with functions $f : X \to \bar{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$. When talking about measurability in this context we will refer to the $\sigma$-algebra on $\bar{\mathbb{R}}$ defined by
$$\mathcal{B}_{\bar{\mathbb{R}}} := \sigma(\{[a, \infty] : a \in \mathbb{R}\}). \tag{6.4}$$

Proposition 6.30 (The Structure of $\mathcal{B}_{\bar{\mathbb{R}}}$). Let $\mathcal{B}_\mathbb{R}$ and $\mathcal{B}_{\bar{\mathbb{R}}}$ be as above, then
$$\mathcal{B}_{\bar{\mathbb{R}}} = \{A \cup B : A \in \mathcal{B}_\mathbb{R} \text{ and } B \subset \{-\infty, \infty\}\}. \tag{6.5}$$
In particular, $\{\infty\}, \{-\infty\} \in \mathcal{B}_{\bar{\mathbb{R}}}$ and $\mathcal{B}_\mathbb{R} \subset \mathcal{B}_{\bar{\mathbb{R}}}$.
Proof. Let us first observe that
$$\{-\infty\} = \cap_{n=1}^\infty [-\infty, -n) = \cap_{n=1}^\infty [-n, \infty]^c \in \mathcal{B}_{\bar{\mathbb{R}}},$$
$$\{\infty\} = \cap_{n=1}^\infty [n, \infty] \in \mathcal{B}_{\bar{\mathbb{R}}} \text{ and } \mathbb{R} = \bar{\mathbb{R}} \setminus \{\pm\infty\} \in \mathcal{B}_{\bar{\mathbb{R}}}.$$
Letting $i : \mathbb{R} \to \bar{\mathbb{R}}$ be the inclusion map,
$$i^{-1}(\mathcal{B}_{\bar{\mathbb{R}}}) = \sigma\left(i^{-1}(\{[a, \infty] : a \in \mathbb{R}\})\right) = \sigma\left(\{i^{-1}([a, \infty]) : a \in \mathbb{R}\}\right) = \sigma\left(\{[a, \infty] \cap \mathbb{R} : a \in \mathbb{R}\}\right) = \sigma(\{[a, \infty) : a \in \mathbb{R}\}) = \mathcal{B}_\mathbb{R}.$$
Thus we have shown
$$\mathcal{B}_\mathbb{R} = i^{-1}(\mathcal{B}_{\bar{\mathbb{R}}}) = \{A \cap \mathbb{R} : A \in \mathcal{B}_{\bar{\mathbb{R}}}\}.$$
This implies:

1. $A \in \mathcal{B}_{\bar{\mathbb{R}}} \implies A \cap \mathbb{R} \in \mathcal{B}_\mathbb{R}$ and
2. if $A \subset \bar{\mathbb{R}}$ is such that $A \cap \mathbb{R} \in \mathcal{B}_\mathbb{R}$ there exists $B \in \mathcal{B}_{\bar{\mathbb{R}}}$ such that $A \cap \mathbb{R} = B \cap \mathbb{R}$. Because $A \triangle B \subset \{\pm\infty\}$ and $\{\infty\}, \{-\infty\} \in \mathcal{B}_{\bar{\mathbb{R}}}$ we may conclude that $A \in \mathcal{B}_{\bar{\mathbb{R}}}$ as well.

This proves Eq. (6.5).

The proofs of the next two corollaries are left to the reader; see Exercises 6.8 and 6.9.
Corollary 6.31. Let $(X, \mathcal{M})$ be a measurable space and $f : X \to \bar{\mathbb{R}}$ be a function. Then the following are equivalent:

1. $f$ is $(\mathcal{M}, \mathcal{B}_{\bar{\mathbb{R}}})$ - measurable,
2. $f^{-1}((a, \infty]) \in \mathcal{M}$ for all $a \in \mathbb{R}$,
3. $f^{-1}((-\infty, a]) \in \mathcal{M}$ for all $a \in \mathbb{R}$,
4. $f^{-1}(\{-\infty\}) \in \mathcal{M}$, $f^{-1}(\{\infty\}) \in \mathcal{M}$ and $f^0 : X \to \mathbb{R}$ defined by
$$f^0(x) := \begin{cases} f(x) & \text{if } f(x) \in \mathbb{R} \\ 0 & \text{if } f(x) \in \{\pm\infty\} \end{cases}$$
is measurable.
Corollary 6.32. Let $(X, \mathcal{M})$ be a measurable space, $f, g : X \to \bar{\mathbb{R}}$ be functions and define $f \cdot g : X \to \bar{\mathbb{R}}$ and $(f + g) : X \to \bar{\mathbb{R}}$ using the conventions, $0 \cdot \infty = 0$ and $(f + g)(x) = 0$ if $f(x) = \infty$ and $g(x) = -\infty$ or $f(x) = -\infty$ and $g(x) = \infty$. Then $f \cdot g$ and $f + g$ are measurable functions on $X$ if both $f$ and $g$ are measurable.

Exercise 6.8. Prove Corollary 6.31 noting that the equivalence of items 1. - 3. is a direct analogue of Corollary 6.11. Use Proposition 6.30 to handle item 4.

Exercise 6.9. Prove Corollary 6.32.
Proposition 6.33 (Closure under sups, infs and limits). Suppose that $(X, \mathcal{M})$ is a measurable space and $f_j : (X, \mathcal{M}) \to \bar{\mathbb{R}}$ for $j \in \mathbb{N}$ is a sequence of $\mathcal{M}/\mathcal{B}_{\bar{\mathbb{R}}}$ measurable functions. Then
$$\sup_j f_j, \quad \inf_j f_j, \quad \limsup_{j \to \infty} f_j \quad \text{and} \quad \liminf_{j \to \infty} f_j$$
are all $\mathcal{M}/\mathcal{B}_{\bar{\mathbb{R}}}$ measurable functions. (Note that this result is in general false when $(X, \mathcal{M})$ is a topological space and measurable is replaced by continuous in the statement.)

Proof. Define $g_+(x) := \sup_j f_j(x)$, then
$$\{x : g_+(x) \le a\} = \{x : f_j(x) \le a \ \forall j\} = \cap_j \{x : f_j(x) \le a\} \in \mathcal{M}$$
so that $g_+$ is measurable. Similarly if $g_-(x) = \inf_j f_j(x)$ then
$$\{x : g_-(x) \ge a\} = \cap_j \{x : f_j(x) \ge a\} \in \mathcal{M}.$$
Since
$$\limsup_{j \to \infty} f_j = \inf_n \sup\{f_j : j \ge n\} \quad \text{and} \quad \liminf_{j \to \infty} f_j = \sup_n \inf\{f_j : j \ge n\}$$
we are done by what we have already proved.
Example 6.34. As we saw in Remark 6.29, $i : \mathbb{R} \to \mathbb{R}$ defined by
$$i(z) = \begin{cases} \frac{1}{z} & \text{if } z \neq 0 \\ 0 & \text{if } z = 0 \end{cases}$$
is measurable by a simple direct argument. For an alternative argument, let
$$i_n(z) := \frac{z}{z^2 + \frac{1}{n}} \quad \text{for all } n \in \mathbb{N}.$$
Then $i_n$ is continuous and $\lim_{n \to \infty} i_n(z) = i(z)$ for all $z \in \mathbb{R}$ from which it follows that $i$ is Borel measurable.
Example 6.35. Let $\{r_n\}_{n=1}^\infty$ be an enumeration of the points in $[0,1] \cap \mathbb{Q}$ and define
$$f(x) = \sum_{n=1}^\infty 2^{-n} \frac{1}{\sqrt{|x - r_n|}}$$
with the convention that
$$\frac{1}{\sqrt{|x - r_n|}} = 5 \text{ if } x = r_n.$$
Then $f : \mathbb{R} \to \bar{\mathbb{R}}$ is measurable. Indeed, if
$$g_n(x) = \begin{cases} \frac{1}{\sqrt{|x - r_n|}} & \text{if } x \neq r_n \\ 0 & \text{if } x = r_n \end{cases}$$
then $g_n(x) = \sqrt{|i(x - r_n)|}$ is measurable, as the composition of measurable functions is measurable. Therefore $g_n + 5 \cdot 1_{\{r_n\}}$ is measurable as well. Finally,
$$f(x) = \lim_{N \to \infty} \sum_{n=1}^N 2^{-n} \frac{1}{\sqrt{|x - r_n|}}$$
is measurable since sums of measurable functions are measurable and limits of measurable functions are measurable. Moral: if you can explicitly write a function $f : \mathbb{R} \to \bar{\mathbb{R}}$ down then it is going to be measurable.
Definition 6.36. Given a function $f : X \to \bar{\mathbb{R}}$ let $f_+(x) := \max\{f(x), 0\}$ and $f_-(x) := \max(-f(x), 0) = -\min(f(x), 0)$. Notice that $f = f_+ - f_-$.
Corollary 6.37. Suppose $(X, \mathcal{M})$ is a measurable space and $f : X \to \bar{\mathbb{R}}$ is a function. Then $f$ is measurable iff $f_\pm$ are measurable.

Proof. If $f$ is measurable, then Proposition 6.33 implies $f_\pm$ are measurable. Conversely if $f_\pm$ are measurable then so is $f = f_+ - f_-$.
Definition 6.38. Let $(X, \mathcal{M})$ be a measurable space. A function $\varphi : X \to \mathbb{F}$ ($\mathbb{F}$ denotes either $\mathbb{R}$, $\mathbb{C}$ or $[0, \infty] \subset \bar{\mathbb{R}}$) is a simple function if $\varphi$ is $\mathcal{M}$ - $\mathcal{B}_\mathbb{F}$ measurable and $\varphi(X)$ contains only finitely many elements.

Any such simple function can be written as
$$\varphi = \sum_{i=1}^n \lambda_i 1_{A_i} \quad \text{with } A_i \in \mathcal{M} \text{ and } \lambda_i \in \mathbb{F}. \tag{6.6}$$
Indeed, take $\lambda_1, \lambda_2, \dots, \lambda_n$ to be an enumeration of the range of $\varphi$ and $A_i = \varphi^{-1}(\{\lambda_i\})$. Note that this argument shows that any simple function may be written intrinsically as
$$\varphi = \sum_{y \in \mathbb{F}} y 1_{\varphi^{-1}(\{y\})}. \tag{6.7}$$
The next theorem shows that simple functions are "pointwise dense" in the space of measurable functions.

Theorem 6.39 (Approximation Theorem). Let $f : X \to [0, \infty]$ be measurable and define, see Figure 6.1,
$$\varphi_n(x) := \sum_{k=0}^{2^{2n}-1} \frac{k}{2^n} 1_{f^{-1}\left(\left(\frac{k}{2^n}, \frac{k+1}{2^n}\right]\right)}(x) + 2^n 1_{f^{-1}((2^n, \infty])}(x) = \sum_{k=0}^{2^{2n}-1} \frac{k}{2^n} 1_{\left\{\frac{k}{2^n} < f \le \frac{k+1}{2^n}\right\}}(x) + 2^n 1_{\{f > 2^n\}}(x),$$
then $\varphi_n \le f$ for all $n$, $\varphi_n(x) \uparrow f(x)$ for all $x \in X$ and $\varphi_n \uparrow f$ uniformly on the sets $X_M := \{x \in X : f(x) \le M\}$ with $M < \infty$.

Moreover, if $f : X \to \mathbb{C}$ is a measurable function, then there exist simple functions $\varphi_n$ such that $\lim_{n \to \infty} \varphi_n(x) = f(x)$ for all $x$ and $|\varphi_n| \uparrow |f|$ as $n \to \infty$.
Fig. 6.1. Constructing the simple function, $\varphi_2$, approximating a function, $f : X \to [0, \infty]$. The graph of $\varphi_2$ is in red.

Proof. Since $f^{-1}\left(\left(\frac{k}{2^n}, \frac{k+1}{2^n}\right]\right)$ and $f^{-1}((2^n, \infty])$ are in $\mathcal{M}$ as $f$ is measurable, $\varphi_n$ is a measurable simple function for each $n$. Because
$$\left(\frac{k}{2^n}, \frac{k+1}{2^n}\right] = \left(\frac{2k}{2^{n+1}}, \frac{2k+1}{2^{n+1}}\right] \cup \left(\frac{2k+1}{2^{n+1}}, \frac{2k+2}{2^{n+1}}\right],$$
if $x \in f^{-1}\left(\left(\frac{2k}{2^{n+1}}, \frac{2k+1}{2^{n+1}}\right]\right)$ then $\varphi_n(x) = \varphi_{n+1}(x) = \frac{2k}{2^{n+1}}$ and if $x \in f^{-1}\left(\left(\frac{2k+1}{2^{n+1}}, \frac{2k+2}{2^{n+1}}\right]\right)$ then $\varphi_n(x) = \frac{2k}{2^{n+1}} < \frac{2k+1}{2^{n+1}} = \varphi_{n+1}(x)$. Similarly
$$(2^n, \infty] = (2^n, 2^{n+1}] \cup (2^{n+1}, \infty],$$
and so for $x \in f^{-1}((2^{n+1}, \infty])$, $\varphi_n(x) = 2^n < 2^{n+1} = \varphi_{n+1}(x)$ and for $x \in f^{-1}((2^n, 2^{n+1}])$, $\varphi_{n+1}(x) \ge 2^n = \varphi_n(x)$. Therefore $\varphi_n \le \varphi_{n+1}$ for all $n$. It is clear by construction that $0 \le \varphi_n(x) \le f(x)$ for all $x$ and that $0 \le f(x) - \varphi_n(x) \le 2^{-n}$ if $x \in X_{2^n} = \{f \le 2^n\}$. Hence we have shown that $\varphi_n(x) \uparrow f(x)$ for all $x \in X$ and $\varphi_n \uparrow f$ uniformly on bounded sets.

For the second assertion, first assume that $f : X \to \mathbb{R}$ is a measurable function and choose $\varphi_n^\pm$ to be non-negative simple functions such that $\varphi_n^\pm \uparrow f_\pm$ as $n \to \infty$ and define $\varphi_n = \varphi_n^+ - \varphi_n^-$. Then (using $\varphi_n^+ \cdot \varphi_n^- \le f_+ \cdot f_- = 0$)
$$|\varphi_n| = \varphi_n^+ + \varphi_n^- \le \varphi_{n+1}^+ + \varphi_{n+1}^- = |\varphi_{n+1}|$$
and clearly $|\varphi_n| = \varphi_n^+ + \varphi_n^- \uparrow f_+ + f_- = |f|$ and $\varphi_n = \varphi_n^+ - \varphi_n^- \to f_+ - f_- = f$ as $n \to \infty$. Now suppose that $f : X \to \mathbb{C}$ is measurable. We may now choose simple functions $u_n$ and $v_n$ such that $|u_n| \uparrow |\operatorname{Re} f|$, $|v_n| \uparrow |\operatorname{Im} f|$, $u_n \to \operatorname{Re} f$ and $v_n \to \operatorname{Im} f$ as $n \to \infty$. Let $\varphi_n = u_n + i v_n$, then
$$|\varphi_n|^2 = u_n^2 + v_n^2 \uparrow |\operatorname{Re} f|^2 + |\operatorname{Im} f|^2 = |f|^2$$
and $\varphi_n = u_n + i v_n \to \operatorname{Re} f + i \operatorname{Im} f = f$ as $n \to \infty$.
6.2 Factoring Random Variables

Lemma 6.40. Suppose that $(\mathbb{Y}, \mathcal{F})$ is a measurable space and $Y : \Omega \to \mathbb{Y}$ is a map. Then to every $(\sigma(Y), \mathcal{B}_{\bar{\mathbb{R}}})$ measurable function, $h : \Omega \to \bar{\mathbb{R}}$, there is a $(\mathcal{F}, \mathcal{B}_{\bar{\mathbb{R}}})$ measurable function $H : \mathbb{Y} \to \bar{\mathbb{R}}$ such that $h = H \circ Y$. More generally, $\bar{\mathbb{R}}$ may be replaced by any "standard Borel space,"¹ i.e. a space, $(S, \mathcal{B}_S)$, which is measure theoretically isomorphic to a Borel subset of $\mathbb{R}$.

¹ Standard Borel spaces include almost any measurable space that we will consider in these notes. For example they include all complete separable metric spaces equipped with the Borel $\sigma$-algebra, see Section 9.10.

$$(\Omega, \sigma(Y)) \xrightarrow{\ Y\ } (\mathbb{Y}, \mathcal{F}) \xrightarrow{\ H\ } (S, \mathcal{B}_S), \qquad h = H \circ Y.$$
Proof. First suppose that $h = 1_A$ where $A \in \sigma(Y) = Y^{-1}(\mathcal{F})$. Let $B \in \mathcal{F}$ such that $A = Y^{-1}(B)$, then $1_A = 1_{Y^{-1}(B)} = 1_B \circ Y$ and hence the lemma is valid in this case with $H = 1_B$. More generally if $h = \sum a_i 1_{A_i}$ is a simple function, then there exist $B_i \in \mathcal{F}$ such that $1_{A_i} = 1_{B_i} \circ Y$ and hence $h = H \circ Y$ with $H := \sum a_i 1_{B_i}$, a simple function on $\mathbb{Y}$.

For a general $(\sigma(Y), \mathcal{B}_{\bar{\mathbb{R}}})$ measurable function, $h$, from $\Omega \to \bar{\mathbb{R}}$, choose simple functions $h_n$ converging to $h$. Let $H_n : \mathbb{Y} \to \bar{\mathbb{R}}$ be simple functions such that $h_n = H_n \circ Y$. Then it follows that
$$h = \lim_{n \to \infty} h_n = \limsup_{n \to \infty} h_n = \limsup_{n \to \infty} H_n \circ Y = H \circ Y$$
where $H := \limsup_{n \to \infty} H_n$, a measurable function from $\mathbb{Y}$ to $\bar{\mathbb{R}}$.

For the last assertion we may assume that $S \in \mathcal{B}_\mathbb{R}$ and $\mathcal{B}_S = (\mathcal{B}_\mathbb{R})_S = \{A \cap S : A \in \mathcal{B}_\mathbb{R}\}$. Since $i_S : S \to \mathbb{R}$ is measurable, what we have just proved shows there exists $H : \mathbb{Y} \to \mathbb{R}$ which is $(\mathcal{F}, \mathcal{B}_\mathbb{R})$ measurable such that $h = i_S \circ h = H \circ Y$. The only problem with $H$ is that $H(\mathbb{Y})$ may not be contained in $S$. To fix this, let
$$H_S = \begin{cases} H|_{H^{-1}(S)} & \text{on } H^{-1}(S) \\ * & \text{on } \mathbb{Y} \setminus H^{-1}(S) \end{cases}$$
where $*$ is some fixed arbitrary point in $S$. It follows from Proposition 6.13 that $H_S : \mathbb{Y} \to S$ is $(\mathcal{F}, \mathcal{B}_S)$ measurable and we still have $h = H_S \circ Y$ as the range of $Y$ must necessarily be in $H^{-1}(S)$.
Here is how this lemma will often be used in these notes.

Corollary 6.41. Suppose that $(\Omega, \mathcal{B})$ is a measurable space, $X_n : \Omega \to \mathbb{R}$ are $\mathcal{B}/\mathcal{B}_\mathbb{R}$ measurable functions, and $\mathcal{B}_n := \sigma(X_1, \dots, X_n) \subset \mathcal{B}$ for each $n \in \mathbb{N}$. Then $h : \Omega \to \mathbb{R}$ is $\mathcal{B}_n$ measurable iff there exists $H : \mathbb{R}^n \to \mathbb{R}$ which is $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_\mathbb{R}$ measurable such that $h = H(X_1, \dots, X_n)$.
$$(\Omega, \mathcal{B}_n = \sigma(Y)) \xrightarrow{\ Y := (X_1, \dots, X_n)\ } (\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}) \xrightarrow{\ H\ } (\mathbb{R}, \mathcal{B}_\mathbb{R}), \qquad h = H \circ Y.$$

Proof. By Lemma 6.25 and Corollary 6.19, the map, $Y := (X_1, \dots, X_n) : \Omega \to \mathbb{R}^n$, is $(\mathcal{B}, \mathcal{B}_{\mathbb{R}^n} = \mathcal{B}_\mathbb{R} \otimes \cdots \otimes \mathcal{B}_\mathbb{R})$ measurable and by Proposition 6.20, $\mathcal{B}_n = \sigma(X_1, \dots, X_n) = \sigma(Y)$. Thus we may apply Lemma 6.40 to see that there exists a $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_\mathbb{R}$ measurable map, $H : \mathbb{R}^n \to \mathbb{R}$, such that $h = H \circ Y = H(X_1, \dots, X_n)$.
6.3 Summary of Measurability Statements

It may be worthwhile to gather the statements of the main measurability results of Sections 6.1 and 6.2 in one place. To do this let $(\Omega, \mathcal{B})$, $(X, \mathcal{M})$, and $(Y_\alpha, \mathcal{F}_\alpha)_{\alpha \in I}$ be measurable spaces and $f_\alpha : \Omega \to Y_\alpha$ be given maps for all $\alpha \in I$. Also let $\pi_\alpha : Y \to Y_\alpha$ be the projection map,
$$\mathcal{F} := \otimes_{\alpha \in I} \mathcal{F}_\alpha := \sigma(\pi_\alpha : \alpha \in I)$$
be the product $\sigma$-algebra on $Y$, and $f : \Omega \to Y$ be the unique map determined by $\pi_\alpha \circ f = f_\alpha$ for all $\alpha \in I$. Then the following measurability results hold;

1. For $A \subset \Omega$, the indicator function, $1_A$, is $(\mathcal{B}, \mathcal{B}_\mathbb{R})$ measurable iff $A \in \mathcal{B}$ (Example 6.8).
2. If $\mathcal{E} \subset \mathcal{M}$ generates $\mathcal{M}$ (i.e. $\mathcal{M} = \sigma(\mathcal{E})$), then a map, $g : \Omega \to X$, is $(\mathcal{B}, \mathcal{M})$ measurable iff $g^{-1}(\mathcal{E}) \subset \mathcal{B}$ (Lemma 6.3 and Proposition 6.10).
3. The notion of measurability may be localized (Proposition 6.13).
4. Compositions of measurable functions are measurable (Lemma 6.14).
5. Continuous functions between two topological spaces are also Borel measurable (Proposition 6.23).
6. $\sigma(f) = \sigma(f_\alpha : \alpha \in I)$ (Proposition 6.20).
7. A map, $h : X \to \Omega$, is $(\mathcal{M}, \sigma(f) = \sigma(f_\alpha : \alpha \in I))$ measurable iff $f_\alpha \circ h$ is $(\mathcal{M}, \mathcal{F}_\alpha)$ measurable for all $\alpha \in I$ (Proposition 6.17).
8. A map, $h : X \to Y$, is $(\mathcal{M}, \mathcal{F})$ measurable iff $\pi_\alpha \circ h$ is $(\mathcal{M}, \mathcal{F}_\alpha)$ measurable for all $\alpha \in I$ (Corollary 6.19).
9. If $I = \{1, 2, \dots, n\}$, then
$$\otimes_{\alpha \in I} \mathcal{F}_\alpha = \mathcal{F}_1 \otimes \cdots \otimes \mathcal{F}_n = \sigma(\{A_1 \times A_2 \times \cdots \times A_n : A_i \in \mathcal{F}_i \text{ for } i \in I\});$$
this is a special case of Remark 6.21.
10. $\mathcal{B}_{\mathbb{R}^n} = \mathcal{B}_\mathbb{R} \otimes \cdots \otimes \mathcal{B}_\mathbb{R}$ ($n$ - times) for all $n \in \mathbb{N}$, i.e. the Borel $\sigma$-algebra on $\mathbb{R}^n$ is the same as the product $\sigma$-algebra (Lemma 6.25).
11. The collection of measurable functions from $(\Omega, \mathcal{B})$ to $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$ is closed under the usual pointwise algebraic operations (Corollary 6.32). They are also closed under countable supremums, infimums, and limits (Proposition 6.33).
12. The collection of measurable functions from $(\Omega, \mathcal{B})$ to $(\mathbb{C}, \mathcal{B}_\mathbb{C})$ is closed under the usual pointwise algebraic operations and countable limits (Corollary 6.27 and Proposition 6.33). The limiting assertion follows by considering the real and imaginary parts of all functions involved.
13. The class of measurable functions from $(\Omega, \mathcal{B})$ to $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$ and from $(\Omega, \mathcal{B})$ to $(\mathbb{C}, \mathcal{B}_\mathbb{C})$ may be well approximated by measurable simple functions (Theorem 6.39).
14. If $X_i : \Omega \to \mathbb{R}$ are $\mathcal{B}/\mathcal{B}_\mathbb{R}$ measurable maps and $\mathcal{B}_n := \sigma(X_1, \dots, X_n)$, then $h : \Omega \to \mathbb{R}$ is $\mathcal{B}_n$ measurable iff $h = H(X_1, \dots, X_n)$ for some $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_\mathbb{R}$ measurable map, $H : \mathbb{R}^n \to \mathbb{R}$ (Corollary 6.41).
15. We also have the more general factorization Lemma 6.40.

For the most part, most of our future measurability issues can be resolved by one or more of the items on this list.
6.4 Distributions / Laws of Random Vectors

The proof of the following proposition is routine and will be left to the reader.

Proposition 6.42. Let $(X, \mathcal{M}, \mu)$ be a measure space, $(Y, \mathcal{F})$ be a measurable space and $f : X \to Y$ be a measurable map. Define a function $\nu : \mathcal{F} \to [0, \infty]$ by $\nu(A) := \mu(f^{-1}(A))$ for all $A \in \mathcal{F}$. Then $\nu$ is a measure on $(Y, \mathcal{F})$. (In the future we will denote $\nu$ by $f_* \mu$ or $\mu \circ f^{-1}$ or $\operatorname{Law}_\mu(f)$ and call $f_* \mu$ the push-forward of $\mu$ by $f$ or the law of $f$ under $\mu$.)
Definition 6.43. Suppose that $\{X_i\}_{i=1}^n$ is a sequence of random variables on a probability space, $(\Omega, \mathcal{B}, P)$. The probability measure,
$$\mu = (X_1, \dots, X_n)_* P = P \circ (X_1, \dots, X_n)^{-1} \text{ on } \mathcal{B}_{\mathbb{R}^n}$$
(see Proposition 6.42) is called the joint distribution (or law) of $(X_1, \dots, X_n)$. To be more explicit,
$$\mu(B) := P((X_1, \dots, X_n) \in B) := P(\{\omega \in \Omega : (X_1(\omega), \dots, X_n(\omega)) \in B\})$$
for all $B \in \mathcal{B}_{\mathbb{R}^n}$.
Corollary 6.44. The joint distribution, $\mu$, is uniquely determined from the knowledge of
$$P((X_1, \dots, X_n) \in A_1 \times \cdots \times A_n) \text{ for all } A_i \in \mathcal{B}_\mathbb{R}$$
or from the knowledge of
$$P(X_1 \le x_1, \dots, X_n \le x_n) \text{ for all } x = (x_1, \dots, x_n) \in \mathbb{R}^n.$$

Proof. Apply Proposition 5.15 with $\mathcal{P}$ being the $\pi$-systems defined by
$$\mathcal{P} := \{A_1 \times \cdots \times A_n \in \mathcal{B}_{\mathbb{R}^n} : A_i \in \mathcal{B}_\mathbb{R}\}$$
for the first case and
$$\mathcal{P} := \{(-\infty, x_1] \times \cdots \times (-\infty, x_n] \subset \mathbb{R}^n : x_i \in \mathbb{R}\}$$
for the second case.
Definition 6.45. Suppose that $\{X_i\}_{i=1}^n$ and $\{Y_i\}_{i=1}^n$ are two finite sequences of random variables on two probability spaces, $(\Omega, \mathcal{B}, P)$ and $(\Omega', \mathcal{B}', P')$ respectively. We write $(X_1, \dots, X_n) \stackrel{d}{=} (Y_1, \dots, Y_n)$ if $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$ have the same distribution / law, i.e. if
$$P((X_1, \dots, X_n) \in B) = P'((Y_1, \dots, Y_n) \in B) \text{ for all } B \in \mathcal{B}_{\mathbb{R}^n}.$$
More generally, if $\{X_i\}_{i=1}^\infty$ and $\{Y_i\}_{i=1}^\infty$ are two sequences of random variables on two probability spaces, $(\Omega, \mathcal{B}, P)$ and $(\Omega', \mathcal{B}', P')$, we write $\{X_i\}_{i=1}^\infty \stackrel{d}{=} \{Y_i\}_{i=1}^\infty$ iff $(X_1, \dots, X_n) \stackrel{d}{=} (Y_1, \dots, Y_n)$ for all $n \in \mathbb{N}$.
Proposition 6.46. Let us continue using the notation in Definition 6.45. Further let
$$X = (X_1, X_2, \dots) : \Omega \to \mathbb{R}^\mathbb{N} \text{ and } Y := (Y_1, Y_2, \dots) : \Omega' \to \mathbb{R}^\mathbb{N}$$
and let $\mathcal{F} := \otimes_{n \in \mathbb{N}} \mathcal{B}_\mathbb{R}$ be the product $\sigma$-algebra on $\mathbb{R}^\mathbb{N}$. Then $\{X_i\}_{i=1}^\infty \stackrel{d}{=} \{Y_i\}_{i=1}^\infty$ iff $X_* P = Y_* P'$ as measures on $(\mathbb{R}^\mathbb{N}, \mathcal{F})$.
Proof. Let
$$\mathcal{P} := \cup_{n=1}^\infty \{A_1 \times A_2 \times \cdots \times A_n \times \mathbb{R}^\mathbb{N} : A_i \in \mathcal{B}_\mathbb{R} \text{ for } 1 \le i \le n\}.$$
Notice that $\mathcal{P}$ is a $\pi$-system and it is easy to show $\sigma(\mathcal{P}) = \mathcal{F}$ (see Exercise 6.6). Therefore by Proposition 5.15, $X_* P = Y_* P'$ iff $X_* P = Y_* P'$ on $\mathcal{P}$. Now for $A_1 \times A_2 \times \cdots \times A_n \times \mathbb{R}^\mathbb{N} \in \mathcal{P}$ we have,
$$X_* P(A_1 \times A_2 \times \cdots \times A_n \times \mathbb{R}^\mathbb{N}) = P((X_1, \dots, X_n) \in A_1 \times A_2 \times \cdots \times A_n)$$
and hence the condition becomes,
$$P((X_1, \dots, X_n) \in A_1 \times A_2 \times \cdots \times A_n) = P'((Y_1, \dots, Y_n) \in A_1 \times A_2 \times \cdots \times A_n)$$
for all $n \in \mathbb{N}$ and $A_i \in \mathcal{B}_\mathbb{R}$. Another application of Proposition 5.15, or using Corollary 6.44, allows us to conclude that $X_* P = Y_* P'$ iff $(X_1, \dots, X_n) \stackrel{d}{=} (Y_1, \dots, Y_n)$ for all $n \in \mathbb{N}$.
Corollary 6.47. Continue the notation above and assume that $\{X_i\}_{i=1}^\infty \stackrel{d}{=} \{Y_i\}_{i=1}^\infty$. Further let
$$X_\pm := \begin{cases} \limsup_{n \to \infty} X_n & \text{in the } + \text{ case} \\ \liminf_{n \to \infty} X_n & \text{in the } - \text{ case} \end{cases}$$
and define $Y_\pm$ similarly. Then $(X_-, X_+) \stackrel{d}{=} (Y_-, Y_+)$ as random variables into $(\bar{\mathbb{R}}^2, \mathcal{B}_{\bar{\mathbb{R}}} \otimes \mathcal{B}_{\bar{\mathbb{R}}})$. In particular,
$$P\left(\lim_{n \to \infty} X_n \text{ exists in } \mathbb{R}\right) = P'\left(\lim_{n \to \infty} Y_n \text{ exists in } \mathbb{R}\right). \tag{6.8}$$

Proof. First suppose that $(\Omega', \mathcal{B}', P') = (\mathbb{R}^\mathbb{N}, \mathcal{F}, P' := X_* P)$ where $Y_i(a_1, a_2, \dots) := a_i = \pi_i(a_1, a_2, \dots)$. Then for $C \in \mathcal{B}_{\bar{\mathbb{R}}} \otimes \mathcal{B}_{\bar{\mathbb{R}}}$ we have,
$$X^{-1}(\{(Y_-, Y_+) \in C\}) = \{(Y_- \circ X, Y_+ \circ X) \in C\} = \{(X_-, X_+) \in C\},$$
since, for example,
$$Y_- \circ X = \liminf_{n \to \infty} Y_n \circ X = \liminf_{n \to \infty} X_n = X_-.$$
Therefore it follows that
$$P((X_-, X_+) \in C) = P \circ X^{-1}((Y_-, Y_+) \in C) = P'((Y_-, Y_+) \in C). \tag{6.9}$$
The general result now follows by two applications of this special case.

For the last assertion, take
$$C = \{(x, x) : x \in \mathbb{R}\} \in \mathcal{B}_{\mathbb{R}^2} = \mathcal{B}_\mathbb{R} \otimes \mathcal{B}_\mathbb{R} \subset \mathcal{B}_{\bar{\mathbb{R}}} \otimes \mathcal{B}_{\bar{\mathbb{R}}}.$$
Then $(X_-, X_+) \in C$ iff $X_- = X_+ \in \mathbb{R}$, which happens iff $\lim_{n \to \infty} X_n$ exists in $\mathbb{R}$. Similarly, $(Y_-, Y_+) \in C$ iff $\lim_{n \to \infty} Y_n$ exists in $\mathbb{R}$ and therefore Eq. (6.8) holds as a consequence of Eq. (6.9).
Exercise 6.10. Let $\{X_i\}_{i=1}^\infty$ and $\{Y_i\}_{i=1}^\infty$ be two sequences of random variables such that $\{X_i\}_{i=1}^\infty \stackrel{d}{=} \{Y_i\}_{i=1}^\infty$. Let $\{S_n\}_{n=1}^\infty$ and $\{T_n\}_{n=1}^\infty$ be defined by $S_n := X_1 + \cdots + X_n$ and $T_n := Y_1 + \cdots + Y_n$. Prove the following assertions.

1. Suppose that $f : \mathbb{R}^n \to \mathbb{R}^k$ is a $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_{\mathbb{R}^k}$ measurable function, then $f(X_1, \dots, X_n) \stackrel{d}{=} f(Y_1, \dots, Y_n)$.
2. Use your result in item 1. to show $\{S_n\}_{n=1}^\infty \stackrel{d}{=} \{T_n\}_{n=1}^\infty$.

Hint: Apply item 1. with $k = n$ after making a judicious choice for $f : \mathbb{R}^n \to \mathbb{R}^n$.
6.5 Generating All Distributions from the Uniform Distribution

Theorem 6.48. Given a distribution function, $F : \mathbb{R} \to [0, 1]$, let $G : (0, 1) \to \mathbb{R}$ be defined (see Figure 6.2) by,
$$G(y) := \inf\{x : F(x) \ge y\}.$$
Then $G : (0, 1) \to \mathbb{R}$ is Borel measurable and $G_* m = \mu_F$ where $\mu_F$ is the unique measure on $(\mathbb{R}, \mathcal{B}_\mathbb{R})$ such that $\mu_F((a, b]) = F(b) - F(a)$ for all $-\infty < a < b < \infty$.
Fig. 6.2. A pictorial definition of $G$.

Proof. Since $G : (0, 1) \to \mathbb{R}$ is a non-decreasing function, $G$ is measurable. We also claim that, for all $x_0 \in \mathbb{R}$,
$$G^{-1}((0, x_0]) = \{y : G(y) \le x_0\} = (0, F(x_0)] \cap (0, 1), \tag{6.10}$$
see Figure 6.3.

Fig. 6.3. As can be seen from this picture, $G(y) \le x_0$ iff $y \le F(x_0)$ and similarly, $G(y) \le x_1$ iff $y \le F(x_1)$.

To give a formal proof of Eq. (6.10), if $G(y) = \inf\{x : F(x) \ge y\} \le x_0$, there exist $x_n \ge x_0$ with $x_n \downarrow x_0$ such that $F(x_n) \ge y$. By the right continuity of $F$, it follows that $F(x_0) \ge y$. Thus we have shown
$$\{G \le x_0\} \subset (0, F(x_0)] \cap (0, 1).$$
For the converse, if $y \le F(x_0)$ then $G(y) = \inf\{x : F(x) \ge y\} \le x_0$, i.e. $y \in \{G \le x_0\}$. Indeed, $y \in G^{-1}((-\infty, x_0])$ iff $G(y) \le x_0$. Observe that
$$G(F(x_0)) = \inf\{x : F(x) \ge F(x_0)\} \le x_0$$
and hence $G(y) \le x_0$ whenever $y \le F(x_0)$. This shows that
$$(0, F(x_0)] \cap (0, 1) \subset G^{-1}((0, x_0]).$$
As a consequence we have $G_* m = \mu_F$. Indeed,
$$(G_* m)((-\infty, x]) = m(G^{-1}((-\infty, x])) = m(\{y \in (0, 1) : G(y) \le x\}) = m((0, F(x)] \cap (0, 1)) = F(x).$$
See section 2.5.2 on p. 61 of Resnick for more details.
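Theorem 6.48 underlies "inverse transform sampling." As an illustration outside the notes, for the exponential distribution function $F(x) = 1 - e^{-x}$ (for $x \ge 0$) the quantile function works out to $G(y) = -\ln(1 - y)$, so composing $G$ with uniform samples on $(0,1)$ produces $\mu_F$-distributed samples. A sketch under these assumptions:

```python
import math
import random

def G(y):
    # Quantile function G(y) = inf{x : F(x) >= y} for F(x) = 1 - exp(-x):
    # solving 1 - exp(-x) >= y gives x >= -ln(1 - y).
    # (y is assumed to lie in (0, 1), matching the theorem's domain.)
    return -math.log(1.0 - y)

random.seed(0)
samples = [G(random.random()) for _ in range(100_000)]

# Empirical check that G_* m = mu_F: the empirical CDF at x = 1 should be
# close to F(1) = 1 - e^{-1} ~ 0.632.
ecdf_1 = sum(s <= 1.0 for s in samples) / len(samples)
assert abs(ecdf_1 - (1 - math.exp(-1))) < 0.02
```

The same recipe works for any distribution function $F$ once $G$ (or a numerical version of it) is available.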
Theorem 6.49 (Durrett's Version). Given a distribution function, $F : \mathbb{R} \to [0, 1]$, let $Y : (0, 1) \to \mathbb{R}$ be defined (see Figure 6.4) by,
$$Y(x) := \sup\{y : F(y) < x\}.$$
Then $Y : (0, 1) \to \mathbb{R}$ is Borel measurable and $Y_* m = \mu_F$ where $\mu_F$ is the unique measure on $(\mathbb{R}, \mathcal{B}_\mathbb{R})$ such that $\mu_F((a, b]) = F(b) - F(a)$ for all $-\infty < a < b < \infty$.

Fig. 6.4. A pictorial definition of $Y(x)$.
Proof. Since $Y : (0, 1) \to \mathbb{R}$ is a non-decreasing function, $Y$ is measurable. Also observe, if $y < Y(x)$, then $F(y) < x$ and hence,
$$F(Y(x)-) = \lim_{y \uparrow Y(x)} F(y) \le x.$$
For $y > Y(x)$, we have $F(y) \ge x$ and therefore,
$$F(Y(x)) = F(Y(x)+) = \lim_{y \downarrow Y(x)} F(y) \ge x$$
and so we have shown
$$F(Y(x)-) \le x \le F(Y(x)).$$
We will now show
$$\{x \in (0, 1) : Y(x) \le y_0\} = (0, F(y_0)] \cap (0, 1). \tag{6.11}$$
For the inclusion "$\subset$," if $x \in (0, 1)$ and $Y(x) \le y_0$, then $x \le F(Y(x)) \le F(y_0)$, i.e. $x \in (0, F(y_0)] \cap (0, 1)$. Conversely if $x \in (0, 1)$ and $x \le F(y_0)$ then (by definition of $Y(x)$) $y_0 \ge Y(x)$.

From the identity in Eq. (6.11), it follows that $Y$ is measurable and
$$(Y_* m)((-\infty, y_0]) = m(Y^{-1}((-\infty, y_0])) = m((0, F(y_0)] \cap (0, 1)) = F(y_0).$$
Therefore, $\operatorname{Law}(Y) = \mu_F$ as desired.
7

Integration Theory

In this chapter, we will greatly extend the "simple" integral or expectation which was developed in Section 4.3 above. Recall there that if $(\Omega, \mathcal{B}, \mu)$ was a measurable space and $\varphi : \Omega \to [0, \infty)$ was a measurable simple function, then we let
$$\mathbb{E}_\mu \varphi := \sum_{\lambda \in [0, \infty)} \lambda \, \mu(\varphi = \lambda).$$
The convention being used here is that $0 \cdot \mu(\varphi = 0) = 0$ even when $\mu(\varphi = 0) = \infty$. This convention is necessary in order to make the integral linear; at a minimum we will want $\mathbb{E}_\mu[0] = 0$. Please be careful not to blindly apply the $0 \cdot \infty = 0$ convention in other circumstances.
7.1 Integrals of positive functions

Definition 7.1. Let $L^+ = L^+(\mathcal{B}) = \{f : \Omega \to [0, \infty] : f \text{ is measurable}\}$. Define
$$\int_\Omega f(\omega) \, d\mu(\omega) = \int_\Omega f \, d\mu := \sup\{\mathbb{E}_\mu \varphi : \varphi \text{ is simple and } \varphi \le f\}.$$
We say that $f \in L^+$ is integrable if $\int_\Omega f \, d\mu < \infty$. If $A \in \mathcal{B}$, let
$$\int_A f(\omega) \, d\mu(\omega) = \int_A f \, d\mu := \int_\Omega 1_A f \, d\mu.$$
We also use the notation,
$$\mathbb{E} f = \int_\Omega f \, d\mu \text{ and } \mathbb{E}[f : A] := \int_A f \, d\mu.$$
Remark 7.2. Because of item 3. of Proposition 4.19, if $\varphi$ is a non-negative simple function, $\int_\Omega \varphi \, d\mu = \mathbb{E}_\mu \varphi$ so that $\int_\Omega$ is an extension of $\mathbb{E}_\mu$.
Lemma 7.3. Let $f, g \in L^+(\mathcal{B})$. Then:

1. if $\lambda \ge 0$, then
$$\int_\Omega \lambda f \, d\mu = \lambda \int_\Omega f \, d\mu$$
wherein $\lambda \int_\Omega f \, d\mu \equiv 0$ if $\lambda = 0$, even if $\int_\Omega f \, d\mu = \infty$.
2. if $0 \le f \le g$, then
$$\int_\Omega f \, d\mu \le \int_\Omega g \, d\mu. \tag{7.1}$$
3. For all $\varepsilon > 0$ and $p > 0$,
$$\mu(f \ge \varepsilon) \le \frac{1}{\varepsilon^p} \int_\Omega f^p 1_{\{f \ge \varepsilon\}} \, d\mu \le \frac{1}{\varepsilon^p} \int_\Omega f^p \, d\mu. \tag{7.2}$$
The inequality in Eq. (7.2) is called Chebyshev's Inequality for $p = 1$ and Markov's inequality for $p = 2$.
4. If $\int_\Omega f \, d\mu < \infty$ then $\mu(f = \infty) = 0$ (i.e. $f < \infty$ a.e.) and the set $\{f > 0\}$ is $\sigma$-finite.
Proof. 1. We may assume $\lambda > 0$ in which case,
$$\int_\Omega \lambda f \, d\mu = \sup\{\mathbb{E}_\mu \varphi : \varphi \text{ is simple and } \varphi \le \lambda f\} = \sup\{\mathbb{E}_\mu \varphi : \varphi \text{ is simple and } \lambda^{-1}\varphi \le f\} = \sup\{\mathbb{E}_\mu[\lambda\psi] : \psi \text{ is simple and } \psi \le f\} = \sup\{\lambda \, \mathbb{E}_\mu[\psi] : \psi \text{ is simple and } \psi \le f\} = \lambda \int_\Omega f \, d\mu.$$
2. Since
$$\{\varphi : \varphi \text{ is simple and } \varphi \le f\} \subset \{\varphi : \varphi \text{ is simple and } \varphi \le g\},$$
Eq. (7.1) follows from the definition of the integral.

3. Since $1_{\{f \ge \varepsilon\}} \le 1_{\{f \ge \varepsilon\}} \frac{1}{\varepsilon} f \le \frac{1}{\varepsilon} f$ we have
$$1_{\{f \ge \varepsilon\}} \le 1_{\{f \ge \varepsilon\}} \left(\frac{1}{\varepsilon} f\right)^p \le \left(\frac{1}{\varepsilon} f\right)^p$$
and by monotonicity and the multiplicative property of the integral,
$$\mu(f \ge \varepsilon) = \int_\Omega 1_{\{f \ge \varepsilon\}} \, d\mu \le \left(\frac{1}{\varepsilon}\right)^p \int_\Omega 1_{\{f \ge \varepsilon\}} f^p \, d\mu \le \left(\frac{1}{\varepsilon}\right)^p \int_\Omega f^p \, d\mu.$$
4. If $\mu(f = \infty) > 0$, then $\varphi_n := n 1_{\{f = \infty\}}$ is a simple function such that $\varphi_n \le f$ for all $n$ and hence
$$n \, \mu(f = \infty) = \mathbb{E}_\mu(\varphi_n) \le \int_\Omega f \, d\mu$$
for all $n$. Letting $n \to \infty$ shows $\int_\Omega f \, d\mu = \infty$. Thus if $\int_\Omega f \, d\mu < \infty$ then $\mu(f = \infty) = 0$. Moreover,
$$\{f > 0\} = \cup_{n=1}^\infty \{f \ge 1/n\}$$
with $\mu(f \ge 1/n) \le n \int_\Omega f \, d\mu < \infty$ for each $n$.
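As a quick numeric sanity check of Eq. (7.2) (an aside, not part of the notes), take $\Omega$ to be a finite set with counting measure, so that every integral is a plain sum; the inequality $\mu(f \ge \varepsilon) \le \varepsilon^{-p} \int_\Omega f^p \, d\mu$ can then be verified directly for a hypothetical list of values of $f$:

```python
def check_chebyshev(f_values, eps, p):
    """Return (mu(f >= eps), eps^{-p} * integral of f^p) for counting measure."""
    mu_level_set = sum(1 for v in f_values if v >= eps)   # mu(f >= eps)
    integral_fp = sum(v ** p for v in f_values)           # integral of f^p
    return mu_level_set, integral_fp / eps ** p

# A hypothetical nonnegative f on a 10-point space:
f_values = [0.1, 0.4, 0.9, 1.5, 2.0, 2.5, 3.1, 0.0, 0.7, 1.1]
for eps in [0.5, 1.0, 2.0]:
    for p in [1, 2, 3]:
        lhs, rhs = check_chebyshev(f_values, eps, p)
        assert lhs <= rhs, (eps, p)
print("Eq. (7.2) holds for all tested (eps, p)")
```

Of course the assertions must pass for any nonnegative data, since the proof above is fully general; running them merely illustrates the inequality.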
Theorem 7.4 (Monotone Convergence Theorem). Suppose $f_n \in L^+$ is a sequence of functions such that $f_n \uparrow f$ ($f$ is necessarily in $L^+$) then
$$\int f_n \uparrow \int f \text{ as } n \to \infty.$$

Proof. Since $f_n \le f_m \le f$, for all $n \le m < \infty$,
$$\int f_n \le \int f_m \le \int f$$
from which it follows that $\int f_n$ is increasing in $n$ and
$$\lim_{n \to \infty} \int f_n \le \int f. \tag{7.3}$$
For the opposite inequality, let $\varphi : \Omega \to [0, \infty)$ be a simple function such that $0 \le \varphi \le f$, $\alpha \in (0, 1)$ and $\Omega_n := \{f_n \ge \alpha\varphi\}$. Notice that $\Omega_n \uparrow \Omega$ and $f_n \ge \alpha 1_{\Omega_n} \varphi$ and so by definition of $\int f_n$,
$$\int f_n \ge \mathbb{E}_\mu[\alpha 1_{\Omega_n} \varphi] = \alpha \, \mathbb{E}_\mu[1_{\Omega_n} \varphi]. \tag{7.4}$$
Then using the identity
$$1_{\Omega_n} \varphi = 1_{\Omega_n} \sum_{y > 0} y 1_{\{\varphi = y\}} = \sum_{y > 0} y 1_{\{\varphi = y\} \cap \Omega_n},$$
and the linearity of $\mathbb{E}_\mu$ we have,
$$\lim_{n \to \infty} \mathbb{E}_\mu[1_{\Omega_n} \varphi] = \lim_{n \to \infty} \sum_{y > 0} y \, \mu(\{\varphi = y\} \cap \Omega_n) = \sum_{y > 0} y \lim_{n \to \infty} \mu(\{\varphi = y\} \cap \Omega_n) \quad \text{(finite sum)} = \sum_{y > 0} y \, \mu(\varphi = y) = \mathbb{E}_\mu[\varphi],$$
wherein we have used the continuity of $\mu$ under increasing unions for the third equality. This identity allows us to let $n \to \infty$ in Eq. (7.4) to conclude $\lim_{n \to \infty} \int f_n \ge \alpha \, \mathbb{E}_\mu[\varphi]$ and since $\alpha \in (0, 1)$ was arbitrary we may further conclude, $\mathbb{E}_\mu[\varphi] \le \lim_{n \to \infty} \int f_n$. The latter inequality being true for all simple functions $\varphi$ with $\varphi \le f$ then implies that
$$\int f = \sup_{0 \le \varphi \le f} \mathbb{E}_\mu[\varphi] \le \lim_{n \to \infty} \int f_n,$$
which combined with Eq. (7.3) proves the theorem.
Remark 7.5 (Explicit Integral Formula). Given $f : \Omega \to [0, \infty]$ measurable, we know from the approximation Theorem 6.39 that $\varphi_n \uparrow f$ where
$$\varphi_n := \sum_{k=0}^{2^{2n}-1} \frac{k}{2^n} 1_{\left\{\frac{k}{2^n} < f \le \frac{k+1}{2^n}\right\}} + 2^n 1_{\{f > 2^n\}}.$$
Therefore by the monotone convergence theorem,
$$\int_\Omega f \, d\mu = \lim_{n \to \infty} \int_\Omega \varphi_n \, d\mu = \lim_{n \to \infty} \left[\sum_{k=0}^{2^{2n}-1} \frac{k}{2^n} \, \mu\left(\frac{k}{2^n} < f \le \frac{k+1}{2^n}\right) + 2^n \mu(f > 2^n)\right].$$
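To illustrate the formula numerically (an aside, not from the notes), take $\mu$ to be Lebesgue measure on $[0, 1]$ and $f(x) = x$. Then $\mu\left(\frac{k}{2^n} < f \le \frac{k+1}{2^n}\right) = 2^{-n}$ when the interval sits inside $[0,1]$ and $0$ otherwise, and the sums converge up to $\int_0^1 x \, dx = \frac{1}{2}$:

```python
def lebesgue_integral_of_identity(n):
    # Remark 7.5's sum for f(x) = x on [0,1] under Lebesgue measure m:
    # m(k/2^n < f <= (k+1)/2^n) is the length of ((k/2^n, (k+1)/2^n] ∩ [0,1]).
    total = 0.0
    for k in range(2 ** (2 * n)):
        lo, hi = k / 2 ** n, (k + 1) / 2 ** n
        measure = max(0.0, min(hi, 1.0) - min(lo, 1.0))
        total += (k / 2 ** n) * measure
    # The tail term 2^n * m(f > 2^n) vanishes here since f <= 1.
    return total

print(lebesgue_integral_of_identity(6))  # 0.4921875 = 1/2 - 2^{-7}
```

Each level of the dyadic sum gives $\frac{1}{2} - 2^{-(n+1)}$, so the approximations increase to $\frac{1}{2}$ exactly as the monotone convergence theorem predicts.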
Corollary 7.6. If $f_n \in L^+$ is a sequence of functions then
$$\int \sum_{n=1}^\infty f_n = \sum_{n=1}^\infty \int f_n.$$
In particular, if $\sum_{n=1}^\infty \int f_n < \infty$ then $\sum_{n=1}^\infty f_n < \infty$ a.e.

Proof. First off we show that
$$\int (f_1 + f_2) = \int f_1 + \int f_2$$
by choosing non-negative simple functions $\varphi_n$ and $\psi_n$ such that $\varphi_n \uparrow f_1$ and $\psi_n \uparrow f_2$. Then $(\varphi_n + \psi_n)$ is simple as well and $(\varphi_n + \psi_n) \uparrow (f_1 + f_2)$, so by the monotone convergence theorem,
$$\int (f_1 + f_2) = \lim_{n \to \infty} \int (\varphi_n + \psi_n) = \lim_{n \to \infty} \left(\int \varphi_n + \int \psi_n\right) = \lim_{n \to \infty} \int \varphi_n + \lim_{n \to \infty} \int \psi_n = \int f_1 + \int f_2.$$
Now to the general case. Let $g_N := \sum_{n=1}^N f_n$ and $g = \sum_{n=1}^\infty f_n$, then $g_N \uparrow g$ and so again by the monotone convergence theorem and the additivity just proved,
$$\sum_{n=1}^\infty \int f_n := \lim_{N \to \infty} \sum_{n=1}^N \int f_n = \lim_{N \to \infty} \int \sum_{n=1}^N f_n = \lim_{N \to \infty} \int g_N = \int g =: \int \sum_{n=1}^\infty f_n.$$
Remark 7.7. It is in the proof of Corollary 7.6 (i.e. the linearity of the integral) that we really make use of the assumption that all of our functions are measurable. In fact the definition $\int f \, d\mu$ makes sense for all functions $f : \Omega \to [0, \infty]$, not just measurable functions. Moreover the monotone convergence theorem holds in this generality with no change in the proof. However, in the proof of Corollary 7.6, we use the approximation Theorem 6.39 which relies heavily on the measurability of the functions to be approximated.
Example 7.8 (Sums as Integrals I). Suppose, $\Omega = \mathbb{N}$, $\mathcal{B} := 2^\mathbb{N}$, $\mu(A) = \#(A)$ for $A \subset \Omega$ is the counting measure on $\mathcal{B}$, and $f : \mathbb{N} \to [0, \infty]$ is a function. Since
$$f = \sum_{n=1}^\infty f(n) 1_{\{n\}},$$
it follows from Corollary 7.6 that
$$\int_\mathbb{N} f \, d\mu = \sum_{n=1}^\infty \int_\mathbb{N} f(n) 1_{\{n\}} \, d\mu = \sum_{n=1}^\infty f(n) \mu(\{n\}) = \sum_{n=1}^\infty f(n).$$
Thus the integral relative to counting measure is simply the infinite sum.
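In code (an aside, not from the notes), the identity $\int_\mathbb{N} f \, d\mu = \sum_{n=1}^\infty f(n)$ just says that integrating against counting measure is summation; for $f(n) = 2^{-n}$ the truncated integrals increase to $1$:

```python
def integral_counting_measure(f, N):
    # Integral of f over {1, ..., N} with counting measure: mu({n}) = 1,
    # so the integral is a plain sum of values.
    return sum(f(n) for n in range(1, N + 1))

f = lambda n: 2.0 ** (-n)
partial = integral_counting_measure(f, 50)
assert abs(partial - 1.0) < 1e-12  # sum_{n>=1} 2^{-n} = 1
print(partial)
```

By the monotone convergence theorem the truncations $f \cdot 1_{\{1,\dots,N\}} \uparrow f$ justify passing to the full infinite sum.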
Lemma 7.9 (Sums as Integrals II*). Let $\Omega$ be a set and $\rho : \Omega \to [0, \infty]$ be a function, let $\mu = \sum_{\omega \in \Omega} \rho(\omega) \delta_\omega$ on $\mathcal{B} = 2^\Omega$, i.e.
$$\mu(A) = \sum_{\omega \in A} \rho(\omega).$$
If $f : \Omega \to [0, \infty]$ is a function (which is necessarily measurable), then
$$\int_\Omega f \, d\mu = \sum_\Omega \rho f.$$

Proof. Suppose that $\varphi : \Omega \to [0, \infty)$ is a simple function, then $\varphi = \sum_{z \in [0, \infty)} z 1_{\{\varphi = z\}}$ and
$$\sum_\Omega \rho\varphi = \sum_{\omega \in \Omega} \rho(\omega) \sum_{z \in [0, \infty)} z 1_{\{\varphi = z\}}(\omega) = \sum_{z \in [0, \infty)} z \sum_{\omega \in \Omega} \rho(\omega) 1_{\{\varphi = z\}}(\omega) = \sum_{z \in [0, \infty)} z \, \mu(\varphi = z) = \int_\Omega \varphi \, d\mu.$$
So if $\varphi : \Omega \to [0, \infty)$ is a simple function such that $\varphi \le f$, then
$$\int_\Omega \varphi \, d\mu = \sum_\Omega \rho\varphi \le \sum_\Omega \rho f.$$
Taking the sup over $\varphi$ in this last equation then shows that
$$\int_\Omega f \, d\mu \le \sum_\Omega \rho f.$$
For the reverse inequality, let $\Lambda \subset\subset \Omega$ be a finite set and $N \in (0, \infty)$. Set $f^N(\omega) = \min\{N, f(\omega)\}$ and let $\varphi_{N, \Lambda}$ be the simple function given by $\varphi_{N, \Lambda}(\omega) := 1_\Lambda(\omega) f^N(\omega)$. Because $\varphi_{N, \Lambda}(\omega) \le f(\omega)$,
$$\sum_\Lambda \rho f^N = \sum_\Omega \rho\varphi_{N, \Lambda} = \int_\Omega \varphi_{N, \Lambda} \, d\mu \le \int_\Omega f \, d\mu.$$
Since $f^N \uparrow f$ as $N \to \infty$, we may let $N \to \infty$ in this last equation to conclude
$$\sum_\Lambda \rho f \le \int_\Omega f \, d\mu.$$
Since $\Lambda$ is arbitrary, this implies
$$\sum_\Omega \rho f \le \int_\Omega f \, d\mu.$$
Exercise 7.1. Suppose that $\mu_n : \mathcal{B} \to [0, \infty]$ are measures on $\mathcal{B}$ for $n \in \mathbb{N}$. Also suppose that $\mu_n(A)$ is increasing in $n$ for all $A \in \mathcal{B}$. Prove that $\mu : \mathcal{B} \to [0, \infty]$ defined by $\mu(A) := \lim_{n \to \infty} \mu_n(A)$ is also a measure.
Proposition 7.10. Suppose that $f \ge 0$ is a measurable function. Then $\int_\Omega f \, d\mu = 0$ iff $f = 0$ a.e. Also if $f, g \ge 0$ are measurable functions such that $f \le g$ a.e. then $\int f \, d\mu \le \int g \, d\mu$. In particular if $f = g$ a.e. then $\int f \, d\mu = \int g \, d\mu$.
Proof. If $f = 0$ a.e. and $\varphi \le f$ is a simple function then $\varphi = 0$ a.e. This implies that $\mu(\varphi^{-1}(\{y\})) = 0$ for all $y > 0$ and hence $\int_\Omega \varphi \, d\mu = 0$ and therefore $\int_\Omega f \, d\mu = 0$. Conversely, if $\int f \, d\mu = 0$, then by Chebyshev's inequality (Lemma 7.3),
$$\mu(f \ge 1/n) \le n \int f \, d\mu = 0 \text{ for all } n.$$
Therefore, $\mu(f > 0) \le \sum_{n=1}^\infty \mu(f \ge 1/n) = 0$, i.e. $f = 0$ a.e.

For the second assertion let $E$ be the exceptional set where $f > g$, i.e.
$$E := \{\omega \in \Omega : f(\omega) > g(\omega)\}.$$
By assumption $E$ is a null set and $1_{E^c} f \le 1_{E^c} g$ everywhere. Because $g = 1_{E^c} g + 1_E g$ and $1_E g = 0$ a.e.,
$$\int g \, d\mu = \int 1_{E^c} g \, d\mu + \int 1_E g \, d\mu = \int 1_{E^c} g \, d\mu$$
and similarly $\int f \, d\mu = \int 1_{E^c} f \, d\mu$. Since $1_{E^c} f \le 1_{E^c} g$ everywhere,
$$\int f \, d\mu = \int 1_{E^c} f \, d\mu \le \int 1_{E^c} g \, d\mu = \int g \, d\mu.$$
Corollary 7.11. Suppose that $\{f_n\}$ is a sequence of non-negative measurable functions and $f$ is a measurable function such that $f_n \uparrow f$ off a null set, then
$$\int f_n \uparrow \int f \text{ as } n \to \infty.$$

Proof. Let $E \subset \Omega$ be a null set such that $f_n 1_{E^c} \uparrow f 1_{E^c}$ as $n \to \infty$. Then by the monotone convergence theorem and Proposition 7.10,
$$\int f_n = \int f_n 1_{E^c} \uparrow \int f 1_{E^c} = \int f \text{ as } n \to \infty.$$
Lemma 7.12 (Fatou's Lemma). If $f_n : \Omega \to [0, \infty]$ is a sequence of measurable functions then
$$\int \liminf_{n \to \infty} f_n \le \liminf_{n \to \infty} \int f_n.$$

Proof. Define $g_k := \inf_{n \ge k} f_n$ so that $g_k \uparrow \liminf_{n \to \infty} f_n$ as $k \to \infty$. Since $g_k \le f_n$ for all $n \ge k$,
$$\int g_k \le \int f_n \text{ for all } n \ge k$$
and therefore
$$\int g_k \le \liminf_{n \to \infty} \int f_n \text{ for all } k.$$
We may now use the monotone convergence theorem to let $k \to \infty$ to find
$$\int \liminf_{n \to \infty} f_n = \int \lim_{k \to \infty} g_k \stackrel{\text{MCT}}{=} \lim_{k \to \infty} \int g_k \le \liminf_{n \to \infty} \int f_n.$$
The following Corollary and the next lemma are simple applications of Corollary 7.6.

Corollary 7.13. Suppose that $(\Omega, \mathcal{B}, \mu)$ is a measure space and $\{A_n\}_{n=1}^\infty \subset \mathcal{B}$ is a collection of sets such that $\mu(A_i \cap A_j) = 0$ for all $i \neq j$, then
$$\mu(\cup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \mu(A_n).$$

Proof. Since
$$\mu(\cup_{n=1}^\infty A_n) = \int_\Omega 1_{\cup_{n=1}^\infty A_n} \, d\mu \text{ and } \sum_{n=1}^\infty \mu(A_n) = \int_\Omega \sum_{n=1}^\infty 1_{A_n} \, d\mu,$$
it suffices to show
$$\sum_{n=1}^\infty 1_{A_n} = 1_{\cup_{n=1}^\infty A_n} \quad \mu\text{-a.e.} \tag{7.5}$$
Now $\sum_{n=1}^\infty 1_{A_n} \ge 1_{\cup_{n=1}^\infty A_n}$ and $\sum_{n=1}^\infty 1_{A_n}(\omega) \neq 1_{\cup_{n=1}^\infty A_n}(\omega)$ iff $\omega \in A_i \cap A_j$ for some $i \neq j$, that is
$$\left\{\omega : \sum_{n=1}^\infty 1_{A_n}(\omega) \neq 1_{\cup_{n=1}^\infty A_n}(\omega)\right\} = \cup_{i < j} A_i \cap A_j$$
and the latter set has measure 0 being the countable union of sets of measure zero. This proves Eq. (7.5) and hence the corollary.
Lemma 7.14 (The First Borel-Cantelli Lemma). Let $(\Omega, \mathcal{B}, \mu)$ be a measure space, $A_n \in \mathcal{B}$, and set
$$\{A_n \text{ i.o.}\} = \{\omega \in \Omega : \omega \in A_n \text{ for infinitely many } n\text{'s}\} = \cap_{N=1}^\infty \cup_{n \ge N} A_n.$$
If $\sum_{n=1}^\infty \mu(A_n) < \infty$ then $\mu(\{A_n \text{ i.o.}\}) = 0$.

Proof. (First Proof.) Let us first observe that
$$\{A_n \text{ i.o.}\} = \left\{\omega \in \Omega : \sum_{n=1}^\infty 1_{A_n}(\omega) = \infty\right\}.$$
Hence if $\sum_{n=1}^\infty \mu(A_n) < \infty$ then
$$\infty > \sum_{n=1}^\infty \mu(A_n) = \sum_{n=1}^\infty \int_\Omega 1_{A_n} \, d\mu = \int_\Omega \sum_{n=1}^\infty 1_{A_n} \, d\mu$$
implies that $\sum_{n=1}^\infty 1_{A_n}(\omega) < \infty$ for $\mu$ - a.e. $\omega$. That is to say $\mu(\{A_n \text{ i.o.}\}) = 0$.

(Second Proof.) Of course we may give a strictly measure theoretic proof of this fact:
$$\mu(\{A_n \text{ i.o.}\}) = \lim_{N \to \infty} \mu\left(\cup_{n \ge N} A_n\right) \le \lim_{N \to \infty} \sum_{n \ge N} \mu(A_n)$$
and the last limit is zero since $\sum_{n=1}^\infty \mu(A_n) < \infty$.
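The first proof hinges on the interchange $\sum_n \mu(A_n) = \int_\Omega \sum_n 1_{A_n} \, d\mu$ supplied by Corollary 7.6. As an aside from the notes, this interchange is easy to verify exactly on a finite toy space; the events `A` below are hypothetical choices on $\Omega = \{0, \dots, 11\}$ with the uniform probability:

```python
from fractions import Fraction

Omega = range(12)
mu = {w: Fraction(1, 12) for w in Omega}          # uniform probability measure
A = [set(range(0, 12, k)) for k in (2, 3, 4)]     # hypothetical events A_n

lhs = sum(sum(mu[w] for w in An) for An in A)     # sum_n mu(A_n)
counts = {w: sum(w in An for An in A) for w in Omega}
rhs = sum(counts[w] * mu[w] for w in Omega)       # integral of sum_n 1_{A_n}
assert lhs == rhs
print(lhs)  # 13/12, exactly
```

Using `Fraction` keeps both sides exact, so the equality is verified with no floating-point slack.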
Example 7.15. Suppose that $(\Omega, \mathcal{B}, P)$ is a probability space (i.e. $P(\Omega) = 1$) and $X_n : \Omega \to \{0, 1\}$ are Bernoulli random variables with $P(X_n = 1) = p_n$ and $P(X_n = 0) = 1 - p_n$. If $\sum_{n=1}^\infty p_n < \infty$, then $P(\{X_n = 1 \text{ i.o.}\}) = 0$ and hence $P(\{X_n = 0 \text{ a.a.}\}) = 1$. In particular, $P(\lim_{n \to \infty} X_n = 0) = 1$.
7.2 Integrals of Complex Valued Functions

Definition 7.16. A measurable function $f : \Omega \to \bar{\mathbb{R}}$ is integrable if $f_+ := f 1_{\{f \ge 0\}}$ and $f_- = -f 1_{\{f \le 0\}}$ are integrable. We write $L^1(\mu; \mathbb{R})$ for the space of real valued integrable functions. For $f \in L^1(\mu; \mathbb{R})$, let
$$\int_\Omega f \, d\mu = \int_\Omega f_+ \, d\mu - \int_\Omega f_- \, d\mu.$$
To shorten notation in this chapter we may simply write $\int f \, d\mu$ or even $\int f$ for $\int_\Omega f \, d\mu$.

Convention: If $f, g : \Omega \to \bar{\mathbb{R}}$ are two measurable functions, let $f + g$ denote the collection of measurable functions $h : \Omega \to \bar{\mathbb{R}}$ such that $h(\omega) = f(\omega) + g(\omega)$ whenever $f(\omega) + g(\omega)$ is well defined, i.e. is not of the form $\infty - \infty$ or $-\infty + \infty$. We use a similar convention for $f - g$. Notice that if $f, g \in L^1(\mu; \mathbb{R})$ and $h_1, h_2 \in f + g$, then $h_1 = h_2$ a.e. because $|f| < \infty$ and $|g| < \infty$ a.e.

Notation 7.17 (Abuse of notation) We will sometimes denote the integral $\int_\Omega f \, d\mu$ by $\mu(f)$. With this notation we have $\mu(A) = \mu(1_A)$ for all $A \in \mathcal{B}$.
Remark 7.18. Since
$$f_\pm \le |f| \le f_+ + f_-,$$
a measurable function $f$ is integrable iff $\int |f| \, d\mu < \infty$. Hence
$$L^1(\mu; \mathbb{R}) := \left\{f : \Omega \to \bar{\mathbb{R}} : f \text{ is measurable and } \int_\Omega |f| \, d\mu < \infty\right\}.$$
If $f, g \in L^1(\mu; \mathbb{R})$ and $f = g$ a.e. then $f_\pm = g_\pm$ a.e. and so it follows from Proposition 7.10 that $\int f \, d\mu = \int g \, d\mu$. In particular if $f, g \in L^1(\mu; \mathbb{R})$ we may define
$$\int_\Omega (f + g) \, d\mu = \int_\Omega h \, d\mu$$
where $h$ is any element of $f + g$.
Proposition 7.19. The map
$$f \in L^1(\mu; \mathbb{R}) \mapsto \int_\Omega f \, d\mu \in \mathbb{R}$$
is linear and has the monotonicity property: $\int f \, d\mu \le \int g \, d\mu$ for all $f, g \in L^1(\mu; \mathbb{R})$ such that $f \le g$ a.e.
Proof. Let f, g L
1
(; 1) and a, b 1. By modifying f and g on a null set,
we may assume that f, g are real valued functions. We have af +bg L
1
(; 1)
because
[af +bg[ [a[ [f[ +[b[ [g[ L
1
(; 1) .
If a < 0, then
(af)
+
= af

and (af)

= af
+
so that _
af = a
_
f

+a
_
f
+
= a(
_
f
+

_
f

) = a
_
f.
A similar calculation works for a > 0 and the case a = 0 is trivial so we have
shown that _
af = a
_
f.
Now set h = f +g. Since h = h
+
h

,
h
+
h

= f
+
f

+g
+
g

or
Page: 83 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
84 7 Integration Theory
\[
h_+ + f_- + g_- = h_- + f_+ + g_+.
\]
Therefore,
\[
\int h_+ + \int f_- + \int g_- = \int h_- + \int f_+ + \int g_+
\]
and hence
\[
\int h = \int h_+ - \int h_- = \int f_+ + \int g_+ - \int f_- - \int g_- = \int f + \int g.
\]
Finally if $f_+ - f_- = f \le g = g_+ - g_-$ then $f_+ + g_- \le g_+ + f_-$ which implies that
\[
\int f_+ + \int g_- \le \int g_+ + \int f_-
\]
or equivalently that
\[
\int f = \int f_+ - \int f_- \le \int g_+ - \int g_- = \int g.
\]
The monotonicity property is also a consequence of the linearity of the integral, the fact that $f \le g$ a.e. implies $0 \le g - f$ a.e. and Proposition 7.10.
Definition 7.20. A measurable function $f : \Omega \to \mathbb{C}$ is integrable if $\int_\Omega |f|\,d\mu < \infty$. Analogously to the real case, let
\[
L^1(\mu;\mathbb{C}) := \left\{ f : \Omega \to \mathbb{C} : f \text{ is measurable and } \int_\Omega |f|\,d\mu < \infty \right\}
\]
denote the complex valued integrable functions. Because $\max(|\operatorname{Re} f|, |\operatorname{Im} f|) \le |f| \le \sqrt{2}\,\max(|\operatorname{Re} f|, |\operatorname{Im} f|)$, we have $\int |f|\,d\mu < \infty$ iff
\[
\int |\operatorname{Re} f|\,d\mu + \int |\operatorname{Im} f|\,d\mu < \infty.
\]
For $f \in L^1(\mu;\mathbb{C})$ define
\[
\int f\,d\mu = \int \operatorname{Re} f\,d\mu + i \int \operatorname{Im} f\,d\mu.
\]
It is routine to show the integral is still linear on $L^1(\mu;\mathbb{C})$ (prove!). In the remainder of this section, let $L^1(\mu)$ be either $L^1(\mu;\mathbb{C})$ or $L^1(\mu;\mathbb{R})$. If $A \in \mathcal{B}$ and $f \in L^1(\mu;\mathbb{C})$ or $f : \Omega \to [0, \infty]$ is a measurable function, let
\[
\int_A f\,d\mu := \int_\Omega 1_A f\,d\mu.
\]
Proposition 7.21. Suppose that $f \in L^1(\mu;\mathbb{C})$, then
\[
\left| \int_\Omega f\,d\mu \right| \le \int_\Omega |f|\,d\mu. \tag{7.6}
\]
Proof. Start by writing $\int_\Omega f\,d\mu = R e^{i\theta}$ with $R \ge 0$. We may assume that $R = \left| \int_\Omega f\,d\mu \right| > 0$ since otherwise there is nothing to prove. Since
\[
R = e^{-i\theta} \int_\Omega f\,d\mu = \int_\Omega e^{-i\theta} f\,d\mu = \int_\Omega \operatorname{Re}\left( e^{-i\theta} f \right) d\mu + i \int_\Omega \operatorname{Im}\left( e^{-i\theta} f \right) d\mu,
\]
it must be that $\int_\Omega \operatorname{Im}\left( e^{-i\theta} f \right) d\mu = 0$. Using the monotonicity in Proposition 7.10,
\[
\left| \int_\Omega f\,d\mu \right| = \int_\Omega \operatorname{Re}\left( e^{-i\theta} f \right) d\mu \le \int_\Omega \left| \operatorname{Re}\left( e^{-i\theta} f \right) \right| d\mu \le \int_\Omega |f|\,d\mu.
\]
Proposition 7.22. Let $f, g \in L^1(\mu)$, then
1. The set $\{f \ne 0\}$ is $\sigma$-finite, in fact $\{|f| \ge \frac{1}{n}\} \uparrow \{f \ne 0\}$ and $\mu(|f| \ge \frac{1}{n}) < \infty$ for all $n$.
2. The following are equivalent:
   a) $\int_E f = \int_E g$ for all $E \in \mathcal{B}$
   b) $\int_\Omega |f - g| = 0$
   c) $f = g$ a.e.
Proof. 1. By Chebyshev's inequality, Lemma 7.3,
\[
\mu\left( |f| \ge \tfrac{1}{n} \right) \le n \int_\Omega |f|\,d\mu < \infty
\]
for all $n$.
2. (a) $\implies$ (c). Notice that
\[
\int_E f = \int_E g \iff \int_E (f - g) = 0
\]
for all $E \in \mathcal{B}$. Taking $E = \{\operatorname{Re}(f - g) > 0\}$ and using $1_E \operatorname{Re}(f - g) \ge 0$, we learn that
\[
0 = \operatorname{Re} \int_E (f - g)\,d\mu = \int 1_E \operatorname{Re}(f - g) \implies 1_E \operatorname{Re}(f - g) = 0 \text{ a.e.}
\]
This implies that $1_E = 0$ a.e. which happens iff
\[
\mu(\operatorname{Re}(f - g) > 0) = \mu(E) = 0.
\]
Similarly $\mu(\operatorname{Re}(f - g) < 0) = 0$ so that $\operatorname{Re}(f - g) = 0$ a.e. Similarly, $\operatorname{Im}(f - g) = 0$ a.e. and hence $f - g = 0$ a.e., i.e. $f = g$ a.e.
(c) $\implies$ (b) is clear and so is (b) $\implies$ (a) since
\[
\left| \int_E f - \int_E g \right| \le \int |f - g| = 0.
\]
Lemma 7.23 (Integral Comparison I). Suppose that $h \in L^1(\mu)$ satisfies
\[
\int_A h\,d\mu \ge 0 \text{ for all } A \in \mathcal{B}, \tag{7.7}
\]
then $h \ge 0$ a.e.
Proof. Since by assumption,
\[
0 = \operatorname{Im} \int_A h\,d\mu = \int_A \operatorname{Im} h\,d\mu \text{ for all } A \in \mathcal{B},
\]
we may apply Proposition 7.22 to conclude that $\operatorname{Im} h = 0$ a.e. Thus we may now assume that $h$ is real valued. Taking $A = \{h < 0\}$ in Eq. (7.7) implies
\[
\int_\Omega 1_A |h|\,d\mu = -\int_\Omega 1_A h\,d\mu = -\int_A h\,d\mu \le 0.
\]
However $1_A |h| \ge 0$ and therefore it follows that $\int_\Omega 1_A |h|\,d\mu = 0$ and so Proposition 7.22 implies $1_A |h| = 0$ a.e. which then implies $\mu(A) = \mu(h < 0) = 0$.
Lemma 7.24 (Integral Comparison II). Suppose $(\Omega, \mathcal{B}, \mu)$ is a $\sigma$-finite measure space (i.e. there exists $\Omega_n \in \mathcal{B}$ such that $\Omega_n \uparrow \Omega$ and $\mu(\Omega_n) < \infty$ for all $n$) and $f, g : \Omega \to [0, \infty]$ are $\mathcal{B}$-measurable functions. Then $f \ge g$ a.e. iff
\[
\int_A f\,d\mu \ge \int_A g\,d\mu \text{ for all } A \in \mathcal{B}. \tag{7.8}
\]
In particular $f = g$ a.e. iff equality holds in Eq. (7.8).
Proof. It was already shown in Proposition 7.10 that $f \ge g$ a.e. implies Eq. (7.8). For the converse assertion, let $B_n := \{f \le n\} \cap \Omega_n$. Then from Eq. (7.8),
\[
\infty > n \mu(\Omega_n) \ge \int f 1_{B_n}\,d\mu \ge \int g 1_{B_n}\,d\mu,
\]
from which it follows that both $f 1_{B_n}$ and $g 1_{B_n}$ are in $L^1(\mu)$ and hence $h := f 1_{B_n} - g 1_{B_n} \in L^1(\mu)$. Using Eq. (7.8) again we know that
\[
\int_A h = \int f 1_{B_n \cap A}\,d\mu - \int g 1_{B_n \cap A}\,d\mu \ge 0 \text{ for all } A \in \mathcal{B}.
\]
An application of Lemma 7.23 implies $h \ge 0$ a.e., i.e. $f 1_{B_n} \ge g 1_{B_n}$ a.e. Since $B_n \uparrow \{f < \infty\}$, we may conclude that
\[
f 1_{\{f < \infty\}} = \lim_{n\to\infty} f 1_{B_n} \ge \lim_{n\to\infty} g 1_{B_n} = g 1_{\{f < \infty\}} \text{ a.e.}
\]
Since $f \ge g$ whenever $f = \infty$, we have shown $f \ge g$ a.e.
If equality holds in Eq. (7.8), then we know that $g \ge f$ and $f \ge g$ a.e., i.e. $f = g$ a.e.
Notice that we cannot drop the $\sigma$-finiteness assumption in Lemma 7.24. For example, let $\mu$ be the measure on $\mathcal{B}$ such that $\mu(A) = \infty$ when $A \ne \emptyset$, $g = 3$, and $f = 2$. Then equality holds in Eq. (7.8) (both sides are infinite unless $A = \emptyset$ when they are both zero) even though $f < g$ everywhere.
Definition 7.25. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and $L^1(\mu) = L^1(\Omega, \mathcal{B}, \mu)$ denote the set of $L^1(\mu)$ functions modulo the equivalence relation; $f \sim g$ iff $f = g$ a.e. We make this into a normed space using the norm
\[
\|f - g\|_{L^1} = \int |f - g|\,d\mu
\]
and into a metric space using $\rho_1(f, g) = \|f - g\|_{L^1}$.
Warning: in the future we will often not make much of a distinction between $L^1(\mu)$ and $L^1(\mu)$ as a space of functions. On occasion this can be dangerous and this danger will be pointed out when necessary.
Remark 7.26. More generally we may define $L^p(\mu) = L^p(\Omega, \mathcal{B}, \mu)$ for $p \in [1, \infty)$ as the set of measurable functions $f$ such that
\[
\int_\Omega |f|^p\,d\mu < \infty
\]
modulo the equivalence relation; $f \sim g$ iff $f = g$ a.e.
We will see later that
\[
\|f\|_{L^p} = \left( \int |f|^p\,d\mu \right)^{1/p} \text{ for } f \in L^p(\mu)
\]
is a norm and $(L^p(\mu), \|\cdot\|_{L^p})$ is a Banach space in this norm and in particular,
\[
\|f + g\|_p \le \|f\|_p + \|g\|_p \text{ for all } f, g \in L^p(\mu).
\]
Theorem 7.27 (Dominated Convergence Theorem). Suppose $f_n, g_n, g \in L^1(\mu)$, $f_n \to f$ a.e., $|f_n| \le g_n \in L^1(\mu)$, $g_n \to g$ a.e. and $\int_\Omega g_n\,d\mu \to \int_\Omega g\,d\mu$. Then $f \in L^1(\mu)$ and
\[
\int_\Omega f\,d\mu = \lim_{n\to\infty} \int_\Omega f_n\,d\mu.
\]
(In most typical applications of this theorem $g_n = g \in L^1(\mu)$ for all $n$.)
Proof. Notice that $|f| = \lim_{n\to\infty} |f_n| \le \lim_{n\to\infty} |g_n| \le g$ a.e. so that $f \in L^1(\mu)$. By considering the real and imaginary parts of $f$ separately, it suffices to prove the theorem in the case where $f$ is real. By Fatou's Lemma,
\[
\int_\Omega (g \pm f)\,d\mu = \int_\Omega \liminf_{n\to\infty} (g_n \pm f_n)\,d\mu \le \liminf_{n\to\infty} \int_\Omega (g_n \pm f_n)\,d\mu
= \lim_{n\to\infty} \int_\Omega g_n\,d\mu + \liminf_{n\to\infty} \left( \pm \int_\Omega f_n\,d\mu \right)
= \int_\Omega g\,d\mu + \liminf_{n\to\infty} \left( \pm \int_\Omega f_n\,d\mu \right).
\]
Since $\liminf_{n\to\infty} (-a_n) = -\limsup_{n\to\infty} a_n$, we have shown,
\[
\int_\Omega g\,d\mu \pm \int_\Omega f\,d\mu \le \int_\Omega g\,d\mu +
\begin{cases}
\liminf_{n\to\infty} \int_\Omega f_n\,d\mu \\
-\limsup_{n\to\infty} \int_\Omega f_n\,d\mu
\end{cases}
\]
and therefore
\[
\limsup_{n\to\infty} \int_\Omega f_n\,d\mu \le \int_\Omega f\,d\mu \le \liminf_{n\to\infty} \int_\Omega f_n\,d\mu.
\]
This shows that $\lim_{n\to\infty} \int_\Omega f_n\,d\mu$ exists and is equal to $\int_\Omega f\,d\mu$.
Exercise 7.2. Give another proof of Proposition 7.21 by first proving Eq. (7.6) with $f$ being a simple function, in which case the triangle inequality for complex numbers will do the trick. Then use the approximation Theorem 6.39 along with the dominated convergence Theorem 7.27 to handle the general case.
Corollary 7.28. Let $\{f_n\}_{n=1}^\infty \subset L^1(\mu)$ be a sequence such that $\sum_{n=1}^\infty \|f_n\|_{L^1(\mu)} < \infty$, then $\sum_{n=1}^\infty f_n$ is convergent a.e. and
\[
\int_\Omega \left( \sum_{n=1}^\infty f_n \right) d\mu = \sum_{n=1}^\infty \int_\Omega f_n\,d\mu.
\]
Proof. The condition $\sum_{n=1}^\infty \|f_n\|_{L^1(\mu)} < \infty$ is equivalent to $\sum_{n=1}^\infty |f_n| \in L^1(\mu)$. Hence $\sum_{n=1}^\infty f_n$ is almost everywhere convergent and if $S_N := \sum_{n=1}^N f_n$, then
\[
|S_N| \le \sum_{n=1}^N |f_n| \le \sum_{n=1}^\infty |f_n| \in L^1(\mu).
\]
So by the dominated convergence theorem,
\[
\int_\Omega \left( \sum_{n=1}^\infty f_n \right) d\mu = \int_\Omega \lim_{N\to\infty} S_N\,d\mu = \lim_{N\to\infty} \int_\Omega S_N\,d\mu = \lim_{N\to\infty} \sum_{n=1}^N \int_\Omega f_n\,d\mu = \sum_{n=1}^\infty \int_\Omega f_n\,d\mu.
\]
Example 7.29 (Sums as integrals). Suppose $\Omega = \mathbb{N}$, $\mathcal{B} := 2^{\mathbb{N}}$, $\mu$ is counting measure on $\mathcal{B}$ (see Example 7.8), and $f : \mathbb{N} \to \mathbb{C}$ is a function. From Example 7.8 we have $f \in L^1(\mu)$ iff $\sum_{n=1}^\infty |f(n)| < \infty$, i.e. iff the sum $\sum_{n=1}^\infty f(n)$ is absolutely convergent. Moreover, if $f \in L^1(\mu)$, we may again write
\[
f = \sum_{n=1}^\infty f(n) 1_{\{n\}}
\]
and then use Corollary 7.28 to conclude that
\[
\int_{\mathbb{N}} f\,d\mu = \sum_{n=1}^\infty \int_{\mathbb{N}} f(n) 1_{\{n\}}\,d\mu = \sum_{n=1}^\infty f(n) \mu(\{n\}) = \sum_{n=1}^\infty f(n).
\]
So again the integral relative to counting measure is simply the infinite sum provided the sum is absolutely convergent.
However if $f(n) = (-1)^n \frac{1}{n}$, then
\[
\sum_{n=1}^\infty f(n) := \lim_{N\to\infty} \sum_{n=1}^N f(n)
\]
is perfectly well defined while $\int_{\mathbb{N}} f\,d\mu$ is not. In fact in this case we have,
\[
\int_{\mathbb{N}} f_\pm\,d\mu = \infty.
\]
The point is that when we write $\sum_{n=1}^\infty f(n)$ the ordering of the terms in the sum may matter. On the other hand, $\int_{\mathbb{N}} f\,d\mu$ knows nothing about the integer ordering.
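The dichotomy in this example is easy to see numerically: for $f(n) = (-1)^n/n$ the ordered partial sums settle down (to $-\ln 2$), while the partial sums of $|f(n)|$ grow without bound, so $f$ is not in $L^1$ of counting measure. A quick Python sketch (an illustration, not part of the text):

```python
import math

# f(n) = (-1)^n / n: the signed series converges conditionally (to -ln 2),
# but the series of absolute values diverges like the harmonic series.
def f(n):
    return (-1) ** n / n

signed = sum(f(n) for n in range(1, 200001))
absolute = sum(abs(f(n)) for n in range(1, 200001))

print(signed, -math.log(2))   # both ~ -0.693147
print(absolute)               # ~ 12.8, and grows like ln N as N -> infinity
```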
The following corollary will routinely be used in the sequel, often without explicit mention.
Corollary 7.30 (Differentiation Under the Integral). Suppose that $J \subset \mathbb{R}$ is an open interval and $f : J \times \Omega \to \mathbb{C}$ is a function such that
1. $\omega \to f(t, \omega)$ is measurable for each $t \in J$.
2. $f(t_0, \cdot) \in L^1(\mu)$ for some $t_0 \in J$.
3. $\frac{\partial f}{\partial t}(t, \omega)$ exists for all $(t, \omega)$.
4. There is a function $g \in L^1(\mu)$ such that $\left| \frac{\partial f}{\partial t}(t, \cdot) \right| \le g$ for each $t \in J$.
Then $f(t, \cdot) \in L^1(\mu)$ for all $t \in J$ (i.e. $\int_\Omega |f(t, \omega)|\,d\mu(\omega) < \infty$), $t \to \int_\Omega f(t, \omega)\,d\mu(\omega)$ is a differentiable function on $J$, and
\[
\frac{d}{dt} \int_\Omega f(t, \omega)\,d\mu(\omega) = \int_\Omega \frac{\partial f}{\partial t}(t, \omega)\,d\mu(\omega).
\]
Proof. By considering the real and imaginary parts of $f$ separately, we may assume that $f$ is real. Also notice that
\[
\frac{\partial f}{\partial t}(t, \omega) = \lim_{n\to\infty} n \left( f(t + n^{-1}, \omega) - f(t, \omega) \right)
\]
and therefore, for each $\omega \in \Omega$, $\frac{\partial f}{\partial t}(t, \omega)$ is a sequential limit of measurable functions and hence is measurable for all $t \in J$. By the mean value theorem,
\[
|f(t, \omega) - f(t_0, \omega)| \le g(\omega)\,|t - t_0| \text{ for all } t \in J \tag{7.9}
\]
and hence
\[
|f(t, \omega)| \le |f(t, \omega) - f(t_0, \omega)| + |f(t_0, \omega)| \le g(\omega)\,|t - t_0| + |f(t_0, \omega)|.
\]
This shows $f(t, \cdot) \in L^1(\mu)$ for all $t \in J$. Let $G(t) := \int_\Omega f(t, \omega)\,d\mu(\omega)$, then
\[
\frac{G(t) - G(t_0)}{t - t_0} = \int_\Omega \frac{f(t, \omega) - f(t_0, \omega)}{t - t_0}\,d\mu(\omega).
\]
By assumption,
\[
\lim_{t \to t_0} \frac{f(t, \omega) - f(t_0, \omega)}{t - t_0} = \frac{\partial f}{\partial t}(t_0, \omega) \text{ for all } \omega \in \Omega
\]
and by Eq. (7.9),
\[
\left| \frac{f(t, \omega) - f(t_0, \omega)}{t - t_0} \right| \le g(\omega) \text{ for all } t \in J \text{ and } \omega \in \Omega.
\]
Therefore, we may apply the dominated convergence theorem to conclude
\[
\lim_{n\to\infty} \frac{G(t_n) - G(t_0)}{t_n - t_0} = \lim_{n\to\infty} \int_\Omega \frac{f(t_n, \omega) - f(t_0, \omega)}{t_n - t_0}\,d\mu(\omega) = \int_\Omega \lim_{n\to\infty} \frac{f(t_n, \omega) - f(t_0, \omega)}{t_n - t_0}\,d\mu(\omega) = \int_\Omega \frac{\partial f}{\partial t}(t_0, \omega)\,d\mu(\omega)
\]
for all sequences $t_n \in J \setminus \{t_0\}$ such that $t_n \to t_0$. Therefore, $\dot{G}(t_0) = \lim_{t \to t_0} \frac{G(t) - G(t_0)}{t - t_0}$ exists and
\[
\dot{G}(t_0) = \int_\Omega \frac{\partial f}{\partial t}(t_0, \omega)\,d\mu(\omega).
\]
Corollary 7.31. Suppose that $\{a_n\}_{n=0}^\infty \subset \mathbb{C}$ is a sequence of complex numbers such that the series
\[
f(z) := \sum_{n=0}^\infty a_n (z - z_0)^n
\]
is convergent for $|z - z_0| < R$, where $R$ is some positive number. Then $f : D(z_0, R) \to \mathbb{C}$ is complex differentiable on $D(z_0, R)$ and
\[
f'(z) = \sum_{n=0}^\infty n a_n (z - z_0)^{n-1} = \sum_{n=1}^\infty n a_n (z - z_0)^{n-1}. \tag{7.10}
\]
By induction it follows that $f^{(k)}$ exists for all $k$ and that
\[
f^{(k)}(z) = \sum_{n=0}^\infty n(n-1) \dots (n-k+1)\,a_n (z - z_0)^{n-k}.
\]
Proof. Let $\rho < R$ be given and choose $r \in (\rho, R)$. Since $z = z_0 + r \in D(z_0, R)$, by assumption the series $\sum_{n=0}^\infty a_n r^n$ is convergent and in particular $M := \sup_n |a_n r^n| < \infty$. We now apply Corollary 7.30 with $X = \mathbb{N}_0$, $\mu$ being counting measure, $J = D(z_0, \rho)$ and $g(z, n) := a_n (z - z_0)^n$. Since
\[
|g'(z, n)| = \left| n a_n (z - z_0)^{n-1} \right| \le n |a_n| \rho^{n-1} \le \frac{1}{r} n \left( \frac{\rho}{r} \right)^{n-1} |a_n| r^n \le \frac{1}{r} n \left( \frac{\rho}{r} \right)^{n-1} M
\]
and the function $G(n) := \frac{M}{r} n \left( \frac{\rho}{r} \right)^{n-1}$ is summable (by the Ratio test for example), we may use $G$ as our dominating function. It then follows from Corollary 7.30 that
\[
f(z) = \int_X g(z, n)\,d\mu(n) = \sum_{n=0}^\infty a_n (z - z_0)^n
\]
is complex differentiable with the derivative given as in Eq. (7.10).
Definition 7.32 (Moment Generating Function). Let $(\Omega, \mathcal{B}, P)$ be a probability space and $X : \Omega \to \mathbb{R}$ a random variable. The moment generating function of $X$ is $M_X : \mathbb{R} \to [0, \infty]$ defined by
\[
M_X(t) := E\left[ e^{tX} \right].
\]
Proposition 7.33. Suppose there exists $\varepsilon > 0$ such that $E\left[ e^{\varepsilon |X|} \right] < \infty$, then $M_X(t)$ is a smooth function of $t \in (-\varepsilon, \varepsilon)$ and
\[
M_X(t) = \sum_{n=0}^\infty \frac{t^n}{n!} E X^n \text{ if } |t| \le \varepsilon. \tag{7.11}
\]
In particular,
\[
E X^n = \left( \frac{d}{dt} \right)^n \Big|_{t=0} M_X(t) \text{ for all } n \in \mathbb{N}_0. \tag{7.12}
\]
Proof. If $|t| \le \varepsilon$, then
\[
E\left[ \sum_{n=0}^\infty \frac{|t|^n}{n!} |X|^n \right] \le E\left[ \sum_{n=0}^\infty \frac{\varepsilon^n}{n!} |X|^n \right] = E\left[ e^{\varepsilon |X|} \right] < \infty.
\]
In particular, $e^{tX} \le e^{\varepsilon |X|}$ for all $|t| \le \varepsilon$. Hence it follows from Corollary 7.28 that, for $|t| \le \varepsilon$,
\[
M_X(t) = E\left[ e^{tX} \right] = E\left[ \sum_{n=0}^\infty \frac{t^n}{n!} X^n \right] = \sum_{n=0}^\infty \frac{t^n}{n!} E X^n.
\]
Equation (7.12) now is a consequence of Corollary 7.31.
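The series representation (7.11) is easy to test numerically. The sketch below uses an illustrative toy random variable of our own choosing, $X \sim \mathrm{Bernoulli}(1/2)$, for which $M_X(t) = (1 + e^t)/2$ and $E X^n = 1/2$ for $n \ge 1$:

```python
import math

# Illustrative choice (not from the text): X ~ Bernoulli(1/2), so
# M_X(t) = (1 + e^t)/2 and E[X^n] = 1/2 for n >= 1, E[X^0] = 1.
def M(t):
    return (1.0 + math.exp(t)) / 2.0

def moment(n):
    return 1.0 if n == 0 else 0.5

t = 0.3
# Partial sum of Eq. (7.11): sum_n t^n/n! * E[X^n]
series = sum(t ** n / math.factorial(n) * moment(n) for n in range(30))
print(series, M(t))  # the two values agree to machine precision
```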
Exercise 7.3. Let $d \in \mathbb{N}$, $\Omega = \mathbb{N}_0^d$, $\mathcal{B} = 2^\Omega$, $\mu : \mathcal{B} \to \mathbb{N}_0 \cup \{\infty\}$ be counting measure on $\Omega$, and for $x \in \mathbb{R}^d$ and $\omega \in \Omega$, let $x^\omega := x_1^{\omega_1} \dots x_d^{\omega_d}$. Further suppose that $f : \Omega \to \mathbb{C}$ is a function and $r_i > 0$ for $1 \le i \le d$ such that
\[
\sum_{\omega \in \Omega} |f(\omega)|\,r^\omega = \int_\Omega |f(\omega)|\,r^\omega\,d\mu(\omega) < \infty,
\]
where $r := (r_1, \dots, r_d)$. Show;
1. There is a constant, $C < \infty$, such that $|f(\omega)| \le \frac{C}{r^\omega}$ for all $\omega \in \Omega$.
2. Let
\[
U := \left\{ x \in \mathbb{R}^d : |x_i| < r_i \ \forall\,i \right\} \text{ and } \bar{U} = \left\{ x \in \mathbb{R}^d : |x_i| \le r_i \ \forall\,i \right\}.
\]
Show $\sum_\omega |f(\omega) x^\omega| < \infty$ for all $x \in \bar{U}$ and the function, $F : \bar{U} \to \mathbb{R}$, defined by
\[
F(x) = \sum_{\omega \in \Omega} f(\omega) x^\omega
\]
is continuous on $\bar{U}$.
3. Show, for all $x \in U$ and $1 \le i \le d$, that
\[
\frac{\partial}{\partial x_i} F(x) = \sum_{\omega \in \Omega} \omega_i f(\omega) x^{\omega - e_i}
\]
where $e_i = (0, \dots, 0, 1, 0, \dots, 0)$ is the $i^{\text{th}}$ standard basis vector on $\mathbb{R}^d$.
4. For any $\alpha \in \Omega$, let $\partial^\alpha := \left( \frac{\partial}{\partial x_1} \right)^{\alpha_1} \dots \left( \frac{\partial}{\partial x_d} \right)^{\alpha_d}$ and $\alpha! := \prod_{i=1}^d \alpha_i!$. Explain why we may now conclude that
\[
\partial^\alpha F(x) = \sum_{\omega \ge \alpha} \frac{\omega!}{(\omega - \alpha)!} f(\omega) x^{\omega - \alpha} \text{ for all } x \in U. \tag{7.13}
\]
5. Conclude that $f(\alpha) = \frac{(\partial^\alpha F)(0)}{\alpha!}$ for all $\alpha \in \Omega$.
6. If $g : \Omega \to \mathbb{C}$ is another function such that $\sum_\omega g(\omega) x^\omega = \sum_\omega f(\omega) x^\omega$ for $x$ in a neighborhood of $0 \in \mathbb{R}^d$, then $g(\omega) = f(\omega)$ for all $\omega \in \Omega$.
Solution to Exercise (7.3). We take each item in turn.
1. If no such $C$ existed, then there would exist $\omega(n) \in \Omega$ such that $|f(\omega(n))|\,r^{\omega(n)} \ge n$ for all $n \in \mathbb{N}$ and therefore $\sum_\omega |f(\omega)|\,r^\omega \ge n$ for all $n \in \mathbb{N}$, which violates the assumption that $\sum_\omega |f(\omega)|\,r^\omega < \infty$.
2. If $x \in \bar{U}$, then $|x^\omega| \le r^\omega$ and therefore $\sum_\omega |f(\omega) x^\omega| \le \sum_\omega |f(\omega)|\,r^\omega < \infty$. The continuity of $F$ now follows by the DCT where we can take $g(\omega) := |f(\omega)|\,r^\omega$ as the integrable dominating function.
3. For notational simplicity assume that $i = 1$ and let $\rho_i \in (0, r_i)$ be chosen. Then for $|x_i| \le \rho_i$, we have,
\[
\left| \omega_1 f(\omega) x^{\omega - e_1} \right| \le \omega_1 \rho^{\omega - e_1} \frac{C}{r^\omega} =: g(\omega)
\]
where $\rho = (\rho_1, \dots, \rho_d)$. Notice that $g(\omega)$ is summable since,
\[
\sum_{\omega \in \Omega} g(\omega) \le C \sum_{\omega_1 = 0}^\infty \frac{\omega_1}{r_1} \left( \frac{\rho_1}{r_1} \right)^{\omega_1 - 1} \prod_{i=2}^d \sum_{\omega_i = 0}^\infty \left( \frac{\rho_i}{r_i} \right)^{\omega_i} \le \frac{C}{r_1} \prod_{i=2}^d \frac{1}{1 - \frac{\rho_i}{r_i}} \sum_{\omega_1 = 0}^\infty \omega_1 \left( \frac{\rho_1}{r_1} \right)^{\omega_1 - 1} < \infty,
\]
where the last sum is finite as we saw in the proof of Corollary 7.31. Thus we may apply Corollary 7.30 in order to differentiate past the integral (= sum).
4. This is a simple matter of induction. Notice that each time we differentiate, the resulting function is still defined and differentiable on all of $U$.
5. Setting $x = 0$ in Eq. (7.13) shows $(\partial^\alpha F)(0) = \alpha!\,f(\alpha)$.
6. This follows directly from the previous item since,
\[
\alpha!\,f(\alpha) = \partial^\alpha \left( \sum_\omega f(\omega) x^\omega \right) \Big|_{x=0} = \partial^\alpha \left( \sum_\omega g(\omega) x^\omega \right) \Big|_{x=0} = \alpha!\,g(\alpha).
\]
7.2.1 Square Integrable Random Variables and Correlations
Suppose that $(\Omega, \mathcal{B}, P)$ is a probability space. We say that $X : \Omega \to \mathbb{R}$ is integrable if $X \in L^1(P)$ and square integrable if $X \in L^2(P)$. When $X$ is integrable we let $a_X := EX$ be the mean of $X$.
Now suppose that $X, Y : \Omega \to \mathbb{R}$ are two square integrable random variables. Since
\[
0 \le (|X| - |Y|)^2 = |X|^2 + |Y|^2 - 2|X||Y|,
\]
it follows that
\[
|XY| \le \frac{1}{2} |X|^2 + \frac{1}{2} |Y|^2 \in L^1(P).
\]
In particular by taking $Y = 1$, we learn that $|X| \le \frac{1}{2}\left( 1 + |X|^2 \right)$ which shows that every square integrable random variable is also integrable.
Definition 7.34. The covariance, $\operatorname{Cov}(X, Y)$, of two square integrable random variables, $X$ and $Y$, is defined by
\[
\operatorname{Cov}(X, Y) = E\left[ (X - a_X)(Y - a_Y) \right] = E[XY] - EX \cdot EY
\]
where $a_X := EX$ and $a_Y := EY$. The variance of $X$ is
\[
\operatorname{Var}(X) := \operatorname{Cov}(X, X) = E\left[ X^2 \right] - (EX)^2. \tag{7.14}
\]
We say that $X$ and $Y$ are uncorrelated if $\operatorname{Cov}(X, Y) = 0$, i.e. $E[XY] = EX \cdot EY$. More generally we say $\{X_k\}_{k=1}^n \subset L^2(P)$ are uncorrelated iff $\operatorname{Cov}(X_i, X_j) = 0$ for all $i \ne j$.
It follows from Eq. (7.14) that
\[
\operatorname{Var}(X) \le E\left[ X^2 \right] \text{ for all } X \in L^2(P). \tag{7.15}
\]
Lemma 7.35. The covariance function, $\operatorname{Cov}(X, Y)$, is bilinear in $X$ and $Y$ and $\operatorname{Cov}(X, Y) = 0$ if either $X$ or $Y$ is constant. For any constant $k$, $\operatorname{Var}(X + k) = \operatorname{Var}(X)$ and $\operatorname{Var}(kX) = k^2 \operatorname{Var}(X)$. If $\{X_k\}_{k=1}^n$ are uncorrelated $L^2(P)$ random variables and $S_n := \sum_{k=1}^n X_k$, then
\[
\operatorname{Var}(S_n) = \sum_{k=1}^n \operatorname{Var}(X_k).
\]
Proof. We leave most of this simple proof to the reader. As an example of the type of argument involved, let us prove $\operatorname{Var}(X + k) = \operatorname{Var}(X)$;
\[
\operatorname{Var}(X + k) = \operatorname{Cov}(X + k, X + k) = \operatorname{Cov}(X + k, X) + \operatorname{Cov}(X + k, k) = \operatorname{Cov}(X, X) + \operatorname{Cov}(k, X) = \operatorname{Cov}(X, X) = \operatorname{Var}(X),
\]
wherein we have used the bilinearity of $\operatorname{Cov}(\cdot, \cdot)$ and the property that $\operatorname{Cov}(Y, k) = 0$ whenever $k$ is a constant.
Exercise 7.4 (A Weak Law of Large Numbers). Assume $\{X_n\}_{n=1}^\infty$ is a sequence of uncorrelated square integrable random variables which are identically distributed, i.e. $X_n \stackrel{d}{=} X_m$ for all $m, n \in \mathbb{N}$. Let $S_n := \sum_{k=1}^n X_k$, $\mu := E X_k$ and $\sigma^2 := \operatorname{Var}(X_k)$ (these are independent of $k$). Show;
\[
E\left[ \frac{S_n}{n} \right] = \mu,
\]
\[
E\left( \frac{S_n}{n} - \mu \right)^2 = \operatorname{Var}\left( \frac{S_n}{n} \right) = \frac{\sigma^2}{n}, \text{ and}
\]
\[
P\left( \left| \frac{S_n}{n} - \mu \right| > \varepsilon \right) \le \frac{\sigma^2}{n \varepsilon^2}
\]
for all $\varepsilon > 0$ and $n \in \mathbb{N}$. (Compare this with Exercise 4.13.)
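A small simulation makes the Chebyshev-type bound in this exercise concrete. The parameters below (fair coin flips, $n = 400$, $\varepsilon = 0.1$, 2000 trials) are illustrative choices of ours, not part of the exercise:

```python
import random

random.seed(0)

# Simulate P(|S_n/n - mu| > eps) for i.i.d. (hence uncorrelated) fair coin
# flips: mu = 1/2, sigma^2 = 1/4.  The exercise's bound is sigma^2/(n eps^2).
def empirical_tail(n, eps, trials=2000):
    count = 0
    for _ in range(trials):
        s = sum(random.randint(0, 1) for _ in range(n))
        if abs(s / n - 0.5) > eps:
            count += 1
    return count / trials

n, eps = 400, 0.1
bound = 0.25 / (n * eps ** 2)   # = 0.0625
print(empirical_tail(n, eps), "<=", bound)
```

In practice the empirical frequency is far below the bound; Chebyshev is crude but dimension-free.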
7.2.2 Some Discrete Distributions
Definition 7.36 (Generating Function). Suppose that $N : \Omega \to \mathbb{N}_0$ is an integer valued random variable on a probability space, $(\Omega, \mathcal{B}, P)$. The generating function associated to $N$ is defined by
\[
G_N(z) := E\left[ z^N \right] = \sum_{n=0}^\infty P(N = n) z^n \text{ for } |z| \le 1. \tag{7.16}
\]
By Corollary 7.31, it follows that $P(N = n) = \frac{1}{n!} G_N^{(n)}(0)$ so that $G_N$ can be used to completely recover the distribution of $N$.
Proposition 7.37 (Generating Functions). The generating function satisfies,
\[
G_N^{(k)}(z) = E\left[ N(N-1) \dots (N-k+1)\,z^{N-k} \right] \text{ for } |z| < 1
\]
and
\[
G^{(k)}(1) = \lim_{z \uparrow 1} G^{(k)}(z) = E\left[ N(N-1) \dots (N-k+1) \right],
\]
where it is possible that one and hence both sides of this equation are infinite. In particular, $G'(1) := \lim_{z \uparrow 1} G'(z) = EN$ and if $E N^2 < \infty$,
\[
\operatorname{Var}(N) = G''(1) + G'(1) - \left[ G'(1) \right]^2. \tag{7.17}
\]
Proof. By Corollary 7.31 for $|z| < 1$,
\[
G_N^{(k)}(z) = \sum_{n=0}^\infty P(N = n)\,n(n-1) \dots (n-k+1)\,z^{n-k} = E\left[ N(N-1) \dots (N-k+1)\,z^{N-k} \right]. \tag{7.18}
\]
Since, for $z \in (0, 1)$,
\[
0 \le N(N-1) \dots (N-k+1)\,z^{N-k} \uparrow N(N-1) \dots (N-k+1) \text{ as } z \uparrow 1,
\]
we may apply the MCT to pass to the limit as $z \uparrow 1$ in Eq. (7.18) to find,
\[
G^{(k)}(1) = \lim_{z \uparrow 1} G^{(k)}(z) = E\left[ N(N-1) \dots (N-k+1) \right].
\]
Exercise 7.5 (Some Discrete Distributions). Let $p \in (0, 1]$ and $\lambda > 0$. In the four parts below, the distribution of $N$ will be described. You should work out the generating function, $G_N(z)$, in each case and use it to verify the given formulas for $EN$ and $\operatorname{Var}(N)$.
1. Bernoulli$(p)$: $P(N = 1) = p$ and $P(N = 0) = 1 - p$. You should find $EN = p$ and $\operatorname{Var}(N) = p - p^2$.
2. Binomial$(n, p)$: $P(N = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for $k = 0, 1, \dots, n$. ($P(N = k)$ is the probability of $k$ successes in a sequence of $n$ independent yes/no experiments with probability of success being $p$.) You should find $EN = np$ and $\operatorname{Var}(N) = n(p - p^2)$.
3. Geometric$(p)$: $P(N = k) = p(1 - p)^{k-1}$ for $k \in \mathbb{N}$. ($P(N = k)$ is the probability that the $k^{\text{th}}$ trial is the first time of success out of a sequence of independent trials with probability of success being $p$.) You should find $EN = 1/p$ and $\operatorname{Var}(N) = \frac{1 - p}{p^2}$.
4. Poisson$(\lambda)$: $P(N = k) = \frac{\lambda^k}{k!} e^{-\lambda}$ for all $k \in \mathbb{N}_0$. You should find $EN = \lambda = \operatorname{Var}(N)$.
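For part 4, the identities $EN = G'(1)$ and Eq. (7.17) can be checked numerically straight from the Poisson probabilities, since $G'(1) = \sum_k k\,P(N = k)$ and $G''(1) = \sum_k k(k-1)\,P(N = k)$. The value $\lambda = 2.5$ below is an arbitrary illustrative choice:

```python
import math

# N ~ Poisson(lam); recover EN and Var(N) via G'(1), G''(1) and Eq. (7.17).
lam = 2.5
def pmf(k):
    return lam ** k / math.factorial(k) * math.exp(-lam)

Gp1 = sum(k * pmf(k) for k in range(100))              # G'(1)  = E N
Gpp1 = sum(k * (k - 1) * pmf(k) for k in range(100))   # G''(1) = E[N(N-1)]
EN = Gp1
VarN = Gpp1 + Gp1 - Gp1 ** 2                           # Eq. (7.17)
print(EN, VarN)  # both ~ 2.5 = lam
```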
Exercise 7.6. Let $S_{n,p} \stackrel{d}{=}$ Binomial$(n, p)$, $k \in \mathbb{N}$, $p_n = \lambda_n / n$ where $\lambda_n \to \lambda > 0$ as $n \to \infty$. Show that
\[
\lim_{n\to\infty} P(S_{n, p_n} = k) = \frac{\lambda^k}{k!} e^{-\lambda} = P(\text{Poisson}(\lambda) = k).
\]
Thus we see that for $p = O(1/n)$ and $k$ not too large relative to $n$, for large $n$,
\[
P(\text{Binomial}(n, p) = k) \approx P(\text{Poisson}(pn) = k) = \frac{(pn)^k}{k!} e^{-pn}.
\]
(We will come back to the Poisson distribution and the related Poisson process later on.)
Solution to Exercise (7.6). We have,
\[
P(S_{n, p_n} = k) = \binom{n}{k} (\lambda_n / n)^k (1 - \lambda_n / n)^{n-k} = \frac{\lambda_n^k}{k!} \cdot \frac{n(n-1) \dots (n-k+1)}{n^k} (1 - \lambda_n / n)^{n-k}.
\]
The result now follows since,
\[
\lim_{n\to\infty} \frac{n(n-1) \dots (n-k+1)}{n^k} = 1
\]
and
\[
\lim_{n\to\infty} \ln (1 - \lambda_n / n)^{n-k} = \lim_{n\to\infty} (n - k) \ln(1 - \lambda_n / n) = -\lim_{n\to\infty} \left[ (n - k) \lambda_n / n \right] = -\lambda.
\]
7.3 Integration on ℝ
Notation 7.38 If $m$ is Lebesgue measure on $\mathcal{B}_{\mathbb{R}}$, $f$ is a non-negative Borel measurable function and $a < b$ with $a, b \in \bar{\mathbb{R}}$, we will often write
\[
\int_a^b f(x)\,dx \text{ or } \int_a^b f\,dm \text{ for } \int_{(a, b] \cap \mathbb{R}} f\,dm.
\]
Example 7.39. Suppose $-\infty < a < b < \infty$, $f \in C([a, b], \mathbb{R})$ and $m$ be Lebesgue measure on $\mathbb{R}$. Given a partition,
\[
\pi = \{ a = a_0 < a_1 < \dots < a_n = b \},
\]
let
\[
\operatorname{mesh}(\pi) := \max \{ |a_j - a_{j-1}| : j = 1, \dots, n \}
\]
and
\[
f_\pi(x) := \sum_{l=0}^{n-1} f(a_l) 1_{(a_l, a_{l+1}]}(x).
\]
Then
\[
\int_a^b f_\pi\,dm = \sum_{l=0}^{n-1} f(a_l)\,m((a_l, a_{l+1}]) = \sum_{l=0}^{n-1} f(a_l)\,(a_{l+1} - a_l)
\]
is a Riemann sum. Therefore if $\{\pi_k\}_{k=1}^\infty$ is a sequence of partitions with $\lim_{k\to\infty} \operatorname{mesh}(\pi_k) = 0$, we know that
\[
\lim_{k\to\infty} \int_a^b f_{\pi_k}\,dm = \int_a^b f(x)\,dx \tag{7.19}
\]
where the latter integral is the Riemann integral. Using the (uniform) continuity of $f$ on $[a, b]$, it easily follows that $\lim_{k\to\infty} f_{\pi_k}(x) = f(x)$ and that $|f_{\pi_k}(x)| \le g(x) := M 1_{(a, b]}(x)$ for all $x \in (a, b]$ where $M := \max_{x \in [a, b]} |f(x)| < \infty$. Since $\int_{\mathbb{R}} g\,dm = M(b - a) < \infty$, we may apply the D.C.T. to conclude,
\[
\lim_{k\to\infty} \int_a^b f_{\pi_k}\,dm = \int_a^b \lim_{k\to\infty} f_{\pi_k}\,dm = \int_a^b f\,dm.
\]
This equation with Eq. (7.19) shows
\[
\int_a^b f\,dm = \int_a^b f(x)\,dx
\]
whenever $f \in C([a, b], \mathbb{R})$, i.e. the Lebesgue and the Riemann integral agree on continuous functions. See Theorem 7.70 below for a more general statement along these lines.
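The convergence of the Riemann sums in Eq. (7.19) can be illustrated numerically; here $f = \sin$ on $[0, \pi]$, an illustrative choice with exact integral $2$:

```python
import math

# Left-endpoint Riemann sums of a continuous f on a uniform partition of [a,b];
# as mesh -> 0 they converge to the (Riemann = Lebesgue) integral.
def riemann_sum(f, a, b, n):
    h = (b - a) / n
    return sum(f(a + l * h) * h for l in range(n))

for n in (10, 100, 1000):
    print(n, riemann_sum(math.sin, 0.0, math.pi, n))  # approaches 2
```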
Theorem 7.40 (The Fundamental Theorem of Calculus). Suppose $-\infty < a < b < \infty$, $f \in C((a, b), \mathbb{R}) \cap L^1((a, b), m)$ and $F(x) := \int_a^x f(y)\,dm(y)$. Then
1. $F \in C([a, b], \mathbb{R}) \cap C^1((a, b), \mathbb{R})$.
2. $F'(x) = f(x)$ for all $x \in (a, b)$.
3. If $G \in C([a, b], \mathbb{R}) \cap C^1((a, b), \mathbb{R})$ is an anti-derivative of $f$ on $(a, b)$ (i.e. $f = G'|_{(a,b)}$) then
\[
\int_a^b f(x)\,dm(x) = G(b) - G(a).
\]
Proof. Since $F(x) := \int_{\mathbb{R}} 1_{(a, x)}(y) f(y)\,dm(y)$, $\lim_{x \to z} 1_{(a, x)}(y) = 1_{(a, z)}(y)$ for $m$-a.e. $y$ and $\left| 1_{(a, x)}(y) f(y) \right| \le 1_{(a, b)}(y)\,|f(y)|$ is an $L^1$ function, it follows from the dominated convergence Theorem 7.27 that $F$ is continuous on $[a, b]$. Simple manipulations show,
\[
\left| \frac{F(x + h) - F(x)}{h} - f(x) \right| = \frac{1}{|h|}
\begin{cases}
\left| \int_x^{x+h} [f(y) - f(x)]\,dm(y) \right| & \text{if } h > 0 \\
\left| \int_{x+h}^x [f(y) - f(x)]\,dm(y) \right| & \text{if } h < 0
\end{cases}
\le \frac{1}{|h|}
\begin{cases}
\int_x^{x+h} |f(y) - f(x)|\,dm(y) & \text{if } h > 0 \\
\int_{x+h}^x |f(y) - f(x)|\,dm(y) & \text{if } h < 0
\end{cases}
\le \sup \{ |f(y) - f(x)| : y \in [x - |h|, x + |h|] \}
\]
and the latter expression, by the continuity of $f$, goes to zero as $h \to 0$. This shows $F' = f$ on $(a, b)$.
For the converse direction, we have by assumption that $G'(x) = F'(x)$ for $x \in (a, b)$. Therefore by the mean value theorem, $F - G = C$ for some constant $C$. Hence
\[
\int_a^b f(x)\,dm(x) = F(b) = F(b) - F(a) = (G(b) + C) - (G(a) + C) = G(b) - G(a).
\]
We can use the above results to integrate some non-Riemann integrable
functions:
Example 7.41. For all $\lambda > 0$,
\[
\int_0^\infty e^{-\lambda x}\,dm(x) = \frac{1}{\lambda} \quad \text{and} \quad \int_{\mathbb{R}} \frac{1}{1 + x^2}\,dm(x) = \pi.
\]
The proofs of these identities are similar. By the monotone convergence theorem, Example 7.39 and the fundamental theorem of calculus for Riemann integrals (or Theorem 7.40 above),
\[
\int_0^\infty e^{-\lambda x}\,dm(x) = \lim_{N\to\infty} \int_0^N e^{-\lambda x}\,dm(x) = \lim_{N\to\infty} \int_0^N e^{-\lambda x}\,dx = -\lim_{N\to\infty} \frac{1}{\lambda} e^{-\lambda x} \Big|_0^N = \frac{1}{\lambda}
\]
and
\[
\int_{\mathbb{R}} \frac{1}{1 + x^2}\,dm(x) = \lim_{N\to\infty} \int_{-N}^N \frac{1}{1 + x^2}\,dm(x) = \lim_{N\to\infty} \int_{-N}^N \frac{1}{1 + x^2}\,dx = \lim_{N\to\infty} \left[ \tan^{-1}(N) - \tan^{-1}(-N) \right] = \pi.
\]
Let us also consider the functions $x^{-p}$. Using the MCT and the fundamental theorem of calculus,
\[
\int_{(0, 1]} \frac{1}{x^p}\,dm(x) = \lim_{n\to\infty} \int_0^1 1_{(\frac{1}{n}, 1]}(x) \frac{1}{x^p}\,dm(x) = \lim_{n\to\infty} \int_{1/n}^1 \frac{1}{x^p}\,dx = \lim_{n\to\infty} \frac{x^{-p+1}}{1 - p} \Big|_{1/n}^1 =
\begin{cases}
\frac{1}{1 - p} & \text{if } p < 1 \\
\infty & \text{if } p > 1.
\end{cases}
\]
If $p = 1$ we find
\[
\int_{(0, 1]} \frac{1}{x}\,dm(x) = \lim_{n\to\infty} \int_{1/n}^1 \frac{1}{x}\,dx = \lim_{n\to\infty} \ln(x) \Big|_{1/n}^1 = \infty.
\]
Exercise 7.7. Show
\[
\int_1^\infty \frac{1}{x^p}\,dm(x) =
\begin{cases}
\infty & \text{if } p \le 1 \\
\frac{1}{p - 1} & \text{if } p > 1.
\end{cases}
\]
Example 7.42 (Integration of Power Series). Suppose $R > 0$ and $\{a_n\}_{n=0}^\infty$ is a sequence of complex numbers such that $\sum_{n=0}^\infty |a_n| r^n < \infty$ for all $r \in (0, R)$. Then
\[
\int_\alpha^\beta \left( \sum_{n=0}^\infty a_n x^n \right) dm(x) = \sum_{n=0}^\infty a_n \int_\alpha^\beta x^n\,dm(x) = \sum_{n=0}^\infty a_n \frac{\beta^{n+1} - \alpha^{n+1}}{n + 1}
\]
for all $-R < \alpha < \beta < R$. Indeed this follows from Corollary 7.28 since
\[
\sum_{n=0}^\infty \int_\alpha^\beta |a_n|\,|x|^n\,dm(x) \le \sum_{n=0}^\infty \left[ \int_0^{|\alpha|} |a_n|\,|x|^n\,dm(x) + \int_0^{|\beta|} |a_n|\,|x|^n\,dm(x) \right] \le \sum_{n=0}^\infty |a_n| \frac{|\alpha|^{n+1} + |\beta|^{n+1}}{n + 1} \le 2r \sum_{n=0}^\infty |a_n| r^n < \infty
\]
where $r = \max(|\alpha|, |\beta|)$.
Example 7.43. Let $\{r_n\}_{n=1}^\infty$ be an enumeration of the rational points in $[0, 1]$ and define
\[
f(x) = \sum_{n=1}^\infty 2^{-n} \frac{1}{\sqrt{|x - r_n|}}
\]
with the convention that
\[
\frac{1}{\sqrt{|x - r_n|}} = 5 \text{ if } x = r_n.
\]
Since, by Theorem 7.40,
\[
\int_0^1 \frac{1}{\sqrt{|x - r_n|}}\,dx = \int_{r_n}^1 \frac{1}{\sqrt{x - r_n}}\,dx + \int_0^{r_n} \frac{1}{\sqrt{r_n - x}}\,dx = 2\sqrt{x - r_n} \Big|_{r_n}^1 - 2\sqrt{r_n - x} \Big|_0^{r_n} = 2\left( \sqrt{1 - r_n} + \sqrt{r_n} \right) \le 4,
\]
we find
\[
\int_{[0, 1]} f(x)\,dm(x) = \sum_{n=1}^\infty 2^{-n} \int_{[0, 1]} \frac{1}{\sqrt{|x - r_n|}}\,dx \le \sum_{n=1}^\infty 2^{-n} \cdot 4 = 4 < \infty.
\]
In particular, $m(f = \infty) = 0$, i.e. $f < \infty$ for almost every $x \in [0, 1]$ and this implies that
\[
\sum_{n=1}^\infty 2^{-n} \frac{1}{\sqrt{|x - r_n|}} < \infty \text{ for a.e. } x \in [0, 1].
\]
This result is somewhat surprising since the singularities of the summands form a dense subset of $[0, 1]$.
Example 7.44. The following limit holds,
\[
\lim_{n\to\infty} \int_0^n \left( 1 - \frac{x}{n} \right)^n dm(x) = 1. \tag{7.20}
\]
DCT Proof. To verify this, let $f_n(x) := \left( 1 - \frac{x}{n} \right)^n 1_{[0, n]}(x)$. Then $\lim_{n\to\infty} f_n(x) = e^{-x}$ for all $x \ge 0$. Moreover, by simple calculus (since $y = 1 - x$ is the tangent line to $y = e^{-x}$ at $x = 0$ and $e^{-x}$ is convex),
\[
1 - x \le e^{-x} \text{ for all } x \in \mathbb{R}.
\]
Therefore, for $x < n$, we have
\[
0 \le 1 - \frac{x}{n} \le e^{-x/n} \implies \left( 1 - \frac{x}{n} \right)^n \le \left( e^{-x/n} \right)^n = e^{-x},
\]
from which it follows that
\[
0 \le f_n(x) \le e^{-x} \text{ for all } x \ge 0.
\]
From Example 7.41, we know
\[
\int_0^\infty e^{-x}\,dm(x) = 1 < \infty,
\]
so that $e^{-x}$ is an integrable function on $[0, \infty)$. Hence by the dominated convergence theorem,
\[
\lim_{n\to\infty} \int_0^n \left( 1 - \frac{x}{n} \right)^n dm(x) = \lim_{n\to\infty} \int_0^\infty f_n(x)\,dm(x) = \int_0^\infty \lim_{n\to\infty} f_n(x)\,dm(x) = \int_0^\infty e^{-x}\,dm(x) = 1.
\]
MCT Proof. The limit in Eq. (7.20) may also be computed using the monotone convergence theorem. To do this we must show that $n \to f_n(x)$ is increasing in $n$ for each $x$ and for this it suffices to consider $n > x$. But for $n > x$,
\[
\frac{d}{dn} \ln f_n(x) = \frac{d}{dn} \left[ n \ln \left( 1 - \frac{x}{n} \right) \right] = \ln \left( 1 - \frac{x}{n} \right) + \frac{n}{1 - \frac{x}{n}} \cdot \frac{x}{n^2} = \ln \left( 1 - \frac{x}{n} \right) + \frac{\frac{x}{n}}{1 - \frac{x}{n}} = h(x/n)
\]
where, for $0 \le y < 1$,
\[
h(y) := \ln(1 - y) + \frac{y}{1 - y}.
\]
Since $h(0) = 0$ and
\[
h'(y) = -\frac{1}{1 - y} + \frac{1}{(1 - y)^2} = \frac{y}{(1 - y)^2} > 0 \text{ for } y \in (0, 1),
\]
it follows that $h \ge 0$. Thus we have shown, $f_n(x) \uparrow e^{-x}$ as $n \to \infty$ as claimed.
Example 7.45. Suppose that $f_n(x) := n 1_{(0, \frac{1}{n}]}(x)$ for $n \in \mathbb{N}$. Then $\lim_{n\to\infty} f_n(x) = 0$ for all $x \in \mathbb{R}$ while
\[
\lim_{n\to\infty} \int_{\mathbb{R}} f_n(x)\,dx = \lim_{n\to\infty} 1 = 1 \ne 0 = \int_{\mathbb{R}} \lim_{n\to\infty} f_n(x)\,dx.
\]
The problem is that the best dominating function we can take is
\[
g(x) = \sup_n f_n(x) = \sum_{n=1}^\infty n\,1_{(\frac{1}{n+1}, \frac{1}{n}]}(x).
\]
Notice that
\[
\int_{\mathbb{R}} g(x)\,dx = \sum_{n=1}^\infty n \left( \frac{1}{n} - \frac{1}{n+1} \right) = \sum_{n=1}^\infty \frac{1}{n+1} = \infty.
\]
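The failure mode is visible numerically: each $\int f_n\,dm = 1$ exactly, while the partial integrals of the envelope $g$ grow like $\ln N$:

```python
# f_n = n * 1_{(0, 1/n]}: its integral is n * m((0, 1/n]) = 1 for every n.
def integral_fn(n):
    return n * (1.0 / n)

print([integral_fn(n) for n in (1, 10, 100)])  # all equal to 1

# Integral of g over (1/(N+1), 1]: sum_{n=1}^N n*(1/n - 1/(n+1)) = H_{N+1} - 1.
def integral_g(N):
    return sum(n * (1.0 / n - 1.0 / (n + 1)) for n in range(1, N + 1))

print(integral_g(10), integral_g(1000), integral_g(100000))  # grows like ln N
```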
Example 7.46 (Jordan's Lemma). In this example, let us consider the limit;
\[
\lim_{n\to\infty} \int_0^\pi \cos\left( \frac{\sin\theta}{n} \right) e^{-n \sin(\theta)}\,d\theta.
\]
Let
\[
f_n(\theta) := 1_{(0, \pi]}(\theta) \cos\left( \frac{\sin\theta}{n} \right) e^{-n \sin(\theta)}.
\]
Then
\[
|f_n| \le 1_{(0, \pi]} \in L^1(m)
\]
and
\[
\lim_{n\to\infty} f_n(\theta) = 1_{(0, \pi]}(\theta)\,1_{\{\pi\}}(\theta) = 1_{\{\pi\}}(\theta).
\]
Therefore by the D.C.T.,
\[
\lim_{n\to\infty} \int_0^\pi \cos\left( \frac{\sin\theta}{n} \right) e^{-n \sin(\theta)}\,d\theta = \int_{\mathbb{R}} 1_{\{\pi\}}(\theta)\,dm(\theta) = m(\{\pi\}) = 0.
\]
Example 7.47. Recall from Example 7.41 that
\[
\lambda^{-1} = \int_{[0, \infty)} e^{-\lambda x}\,dm(x) \text{ for all } \lambda > 0.
\]
Let $\varepsilon > 0$. For $\lambda \ge 2\varepsilon > 0$ and $n \in \mathbb{N}$ there exists $C_n(\varepsilon) < \infty$ such that
\[
0 \le \left( -\frac{d}{d\lambda} \right)^n e^{-\lambda x} = x^n e^{-\lambda x} \le C_n(\varepsilon) e^{-\varepsilon x}.
\]
Using this fact, Corollary 7.30 and induction gives
\[
n!\,\lambda^{-n-1} = \left( -\frac{d}{d\lambda} \right)^n \lambda^{-1} = \int_{[0, \infty)} \left( -\frac{d}{d\lambda} \right)^n e^{-\lambda x}\,dm(x) = \int_{[0, \infty)} x^n e^{-\lambda x}\,dm(x).
\]
That is
\[
n! = \lambda^{n+1} \int_{[0, \infty)} x^n e^{-\lambda x}\,dm(x). \tag{7.21}
\]
Remark 7.48. Corollary 7.30 may be generalized by allowing the hypotheses to hold for $x \in X \setminus E$ where $E \in \mathcal{B}$ is a fixed null set, i.e. $E$ must be independent of $t$. Consider what happens if we formally apply Corollary 7.30 to $g(t) := \int_0^\infty 1_{x \le t}\,dm(x)$:
\[
\dot{g}(t) = \frac{d}{dt} \int_0^\infty 1_{x \le t}\,dm(x) \stackrel{?}{=} \int_0^\infty \frac{\partial}{\partial t} 1_{x \le t}\,dm(x).
\]
The last integral is zero since $\frac{\partial}{\partial t} 1_{x \le t} = 0$ unless $t = x$, in which case it is not defined. On the other hand $g(t) = t$ so that $\dot{g}(t) = 1$. (The reader should decide which hypothesis of Corollary 7.30 has been violated in this example.)
Exercise 7.8 (Folland 2.28 on p. 60.). Compute the following limits and justify your calculations:
1. $\lim_{n\to\infty} \int_0^\infty \frac{\sin(\frac{x}{n})}{(1 + \frac{x}{n})^n}\,dx$.
2. $\lim_{n\to\infty} \int_0^1 \frac{1 + n x^2}{(1 + x^2)^n}\,dx$.
3. $\lim_{n\to\infty} \int_0^\infty \frac{n \sin(x/n)}{x(1 + x^2)}\,dx$.
4. For all $a \in \mathbb{R}$ compute,
\[
f(a) := \lim_{n\to\infty} \int_a^\infty n(1 + n^2 x^2)^{-1}\,dx.
\]
Exercise 7.9 (Integration by Parts). Suppose that $f, g : \mathbb{R} \to \mathbb{R}$ are two continuously differentiable functions such that $f'g$, $fg'$, and $fg$ are all Lebesgue integrable functions on $\mathbb{R}$. Prove the following integration by parts formula;
\[
\int_{\mathbb{R}} f'(x) g(x)\,dx = -\int_{\mathbb{R}} f(x) g'(x)\,dx. \tag{7.22}
\]
Similarly show that if $f, g : [0, \infty) \to [0, \infty)$ are two continuously differentiable functions such that $f'g$, $fg'$, and $fg$ are all Lebesgue integrable functions on $[0, \infty)$, then
\[
\int_0^\infty f'(x) g(x)\,dx = -f(0) g(0) - \int_0^\infty f(x) g'(x)\,dx. \tag{7.23}
\]
Outline: 1. First notice that Eq. (7.22) holds if $f(x) = 0$ for $|x| \ge N$ for some $N < \infty$ by undergraduate calculus.
2. Let $\psi : \mathbb{R} \to [0, 1]$ be a continuously differentiable function such that $\psi(x) = 1$ if $|x| \le 1$ and $\psi(x) = 0$ if $|x| \ge 2$. For any $\varepsilon > 0$ let $\psi_\varepsilon(x) = \psi(\varepsilon x)$. Write out the identity in Eq. (7.22) with $f(x)$ being replaced by $f(x) \psi_\varepsilon(x)$.
3. Now use the dominated convergence theorem to pass to the limit as $\varepsilon \downarrow 0$ in the identity you found in step 2.
4. A similar outline works to prove Eq. (7.23).
Solution to Exercise (7.9). If $f$ has compact support in $[-N, N]$ for some $N < \infty$, then by undergraduate integration by parts,
\[
\int_{\mathbb{R}} f'(x) g(x)\,dx = \int_{-N}^N f'(x) g(x)\,dx = f(x) g(x) \Big|_{-N}^N - \int_{-N}^N f(x) g'(x)\,dx = -\int_{-N}^N f(x) g'(x)\,dx = -\int_{\mathbb{R}} f(x) g'(x)\,dx.
\]
Similarly if $f$ has compact support in $[0, \infty)$, then
\[
\int_0^\infty f'(x) g(x)\,dx = \int_0^N f'(x) g(x)\,dx = f(x) g(x) \Big|_0^N - \int_0^N f(x) g'(x)\,dx = -f(0) g(0) - \int_0^\infty f(x) g'(x)\,dx.
\]
For general $f$ we may apply this identity with $f(x)$ replaced by $\psi_\varepsilon(x) f(x)$ to learn,
\[
\int_{\mathbb{R}} f'(x) g(x) \psi_\varepsilon(x)\,dx + \int_{\mathbb{R}} f(x) g(x) \psi_\varepsilon'(x)\,dx = -\int_{\mathbb{R}} \psi_\varepsilon(x) f(x) g'(x)\,dx. \tag{7.24}
\]
Since $\psi_\varepsilon(x) \to 1$ boundedly as $\varepsilon \downarrow 0$ and $|\psi_\varepsilon'(x)| = \varepsilon |\psi'(\varepsilon x)| \le C \varepsilon$, we may use the DCT to conclude,
\[
\lim_{\varepsilon \downarrow 0} \int_{\mathbb{R}} f'(x) g(x) \psi_\varepsilon(x)\,dx = \int_{\mathbb{R}} f'(x) g(x)\,dx,
\]
\[
\lim_{\varepsilon \downarrow 0} \int_{\mathbb{R}} f(x) g'(x) \psi_\varepsilon(x)\,dx = \int_{\mathbb{R}} f(x) g'(x)\,dx, \text{ and}
\]
\[
\left| \int_{\mathbb{R}} f(x) g(x) \psi_\varepsilon'(x)\,dx \right| \le C \varepsilon \int_{\mathbb{R}} |f(x) g(x)|\,dx \to 0 \text{ as } \varepsilon \downarrow 0.
\]
Therefore passing to the limit as $\varepsilon \downarrow 0$ in Eq. (7.24) completes the proof of Eq. (7.22). Equation (7.23) is proved in the same way.
Definition 7.49 (Gamma Function). The Gamma function, $\Gamma : \mathbb{R}_+ \to \mathbb{R}_+$, is defined by
\[
\Gamma(x) := \int_0^\infty u^{x-1} e^{-u}\,du. \tag{7.25}
\]
(The reader should check that $\Gamma(x) < \infty$ for all $x > 0$.)
Here are some of the more basic properties of this function.
Example 7.50 ($\Gamma$-function properties). Let $\Gamma$ be the gamma function, then;
1. $\Gamma(1) = 1$ as is easily verified.
2. $\Gamma(x + 1) = x \Gamma(x)$ for all $x > 0$ as follows by integration by parts;
\[
\Gamma(x + 1) = \int_0^\infty e^{-u} u^{x+1} \frac{du}{u} = \int_0^\infty u^x \left( -\frac{d}{du} e^{-u} \right) du = x \int_0^\infty u^{x-1} e^{-u}\,du = x \Gamma(x).
\]
In particular, it follows from items 1. and 2. and induction that
\[
\Gamma(n + 1) = n! \text{ for all } n \in \mathbb{N}. \tag{7.26}
\]
(Equation (7.26) was also proved in Eq. (7.21).)
3. $\Gamma(1/2) = \sqrt{\pi}$. This last assertion is a bit trickier. One proof is to make use of the fact (proved below in Lemma 9.29) that
\[
\int_{-\infty}^\infty e^{-a r^2}\,dr = \sqrt{\frac{\pi}{a}} \text{ for all } a > 0. \tag{7.27}
\]
Taking $a = 1$ and making the change of variables, $u = r^2$, below implies,
\[
\sqrt{\pi} = \int_{-\infty}^\infty e^{-r^2}\,dr = 2 \int_0^\infty e^{-r^2}\,dr = \int_0^\infty u^{-1/2} e^{-u}\,du = \Gamma(1/2).
\]
4. A simple induction argument using items 2. and 3. now shows that
\[
\Gamma\left( n + \frac{1}{2} \right) = \frac{(2n - 1)!!}{2^n} \sqrt{\pi}
\]
where $(-1)!! := 1$ and $(2n - 1)!! = (2n - 1)(2n - 3) \dots 3 \cdot 1$ for $n \in \mathbb{N}$.
7.4 Densities and Change of Variables Theorems
Exercise 7.10 (Measures and Densities). Let $(X, \mathcal{M}, \mu)$ be a measure space and $\rho : X \to [0, \infty]$ be a measurable function. For $A \in \mathcal{M}$, set
\[
\nu(A) := \int_A \rho\,d\mu.
\]
1. Show $\nu : \mathcal{M} \to [0, \infty]$ is a measure.
2. Let $f : X \to [0, \infty]$ be a measurable function, show
\[
\int_X f\,d\nu = \int_X f \rho\,d\mu. \tag{7.28}
\]
Hint: first prove the relationship for characteristic functions, then for simple functions, and then for general positive measurable functions.
3. Show that a measurable function $f : X \to \mathbb{C}$ is in $L^1(\nu)$ iff $|f| \rho \in L^1(\mu)$ and if $f \in L^1(\nu)$ then Eq. (7.28) still holds.
Solution to Exercise (7.10). The fact that $\nu$ is a measure follows easily from Corollary 7.6. Clearly Eq. (7.28) holds when $f = 1_A$ by definition of $\nu$. It then holds for positive simple functions, $f$, by linearity. Finally for general $f \in L^+$, choose simple functions, $\varphi_n$, such that $0 \le \varphi_n \uparrow f$. Then using MCT twice we find
\[
\int_X f\,d\nu = \lim_{n\to\infty} \int_X \varphi_n\,d\nu = \lim_{n\to\infty} \int_X \varphi_n \rho\,d\mu = \int_X \lim_{n\to\infty} \varphi_n \rho\,d\mu = \int_X f \rho\,d\mu.
\]
By what we have just proved, for all $f : X \to \mathbb{C}$ we have
\[
\int_X |f|\,d\nu = \int_X |f| \rho\,d\mu
\]
so that $f \in L^1(\nu)$ iff $|f| \rho \in L^1(\mu)$. If $f \in L^1(\nu)$ and $f$ is real,
\[
\int_X f\,d\nu = \int_X f_+\,d\nu - \int_X f_-\,d\nu = \int_X f_+ \rho\,d\mu - \int_X f_- \rho\,d\mu = \int_X [f_+ - f_-] \rho\,d\mu = \int_X f \rho\,d\mu.
\]
The complex case easily follows from this identity.
The complex case easily follows from this identity.
Notation 7.51 It is customary to informally describe dened in Exercise
7.10 by writing d = d.
Exercise 7.11 (Abstract Change of Variables Formula). Let $(X, \mathcal{M}, \mu)$ be a measure space, $(Y, \mathcal{F})$ be a measurable space and $f : X \to Y$ be a measurable map. Recall that $\nu = f_* \mu : \mathcal{F} \to [0, \infty]$ defined by $\nu(A) := \mu(f^{-1}(A))$ for all $A \in \mathcal{F}$ is a measure on $\mathcal{F}$.
1. Show
\[
\int_Y g\,d\nu = \int_X (g \circ f)\,d\mu \tag{7.29}
\]
for all measurable functions $g : Y \to [0, \infty]$. Hint: see the hint from Exercise 7.10.
2. Show a measurable function $g : Y \to \mathbb{C}$ is in $L^1(\nu)$ iff $g \circ f \in L^1(\mu)$ and that Eq. (7.29) holds for all $g \in L^1(\nu)$.
Example 7.52. Suppose $(\Omega, \mathcal{B}, P)$ is a probability space and $\{X_i\}_{i=1}^n$ are random variables on $\Omega$ with $\nu := \operatorname{Law}_P(X_1, \dots, X_n)$, then
\[
E[g(X_1, \dots, X_n)] = \int_{\mathbb{R}^n} g\,d\nu
\]
for all $g : \mathbb{R}^n \to \mathbb{R}$ which are Borel measurable and either bounded or non-negative. This follows directly from Exercise 7.11 with $f := (X_1, \dots, X_n) : \Omega \to \mathbb{R}^n$ and $\mu = P$.
Remark 7.53. As a special case of Example 7.52, suppose that $X$ is a random variable on a probability space, $(\Omega, \mathcal{B}, P)$, and $F(x) := P(X \le x)$. Then
\[
E[f(X)] = \int_{\mathbb{R}} f(x)\,dF(x) \tag{7.30}
\]
where $dF(x)$ is shorthand for $d\mu_F(x)$ and $\mu_F$ is the unique probability measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ such that $\mu_F((-\infty, x]) = F(x)$ for all $x \in \mathbb{R}$. Moreover if $F : \mathbb{R} \to [0, 1]$ happens to be a $C^1$-function, then
\[
d\mu_F(x) = F'(x)\,dm(x) \tag{7.31}
\]
and Eq. (7.30) may be written as
\[
E[f(X)] = \int_{\mathbb{R}} f(x) F'(x)\,dm(x). \tag{7.32}
\]
To verify Eq. (7.31) it suffices to observe, by the fundamental theorem of calculus, that
\[
\mu_F((a, b]) = F(b) - F(a) = \int_a^b F'(x)\,dx = \int_{(a, b]} F'\,dm.
\]
From this equation we may deduce that $\mu_F(A) = \int_A F'\,dm$ for all $A \in \mathcal{B}_{\mathbb{R}}$. Equation (7.32) now follows from Exercise 7.10.
Exercise 7.12. Let $F : \mathbb{R} \to \mathbb{R}$ be a $C^1$-function such that $F'(x) > 0$ for all $x \in \mathbb{R}$ and $\lim_{x\to\pm\infty}F(x) = \pm\infty$. (Notice that $F$ is strictly increasing so that $F^{-1} : \mathbb{R} \to \mathbb{R}$ exists and moreover, by the inverse function theorem, $F^{-1}$ is a $C^1$-function.) Let $m$ be Lebesgue measure on $\mathcal{B}_{\mathbb{R}}$ and
\[
\nu(A) = m(F(A)) = m\left(\left(F^{-1}\right)^{-1}(A)\right) = \left(\left(F^{-1}\right)_*m\right)(A)
\]
for all $A \in \mathcal{B}_{\mathbb{R}}$. Show $d\nu = F'\,dm$. Use this result to prove the change of variables formula,
\[
\int_{\mathbb{R}} h \circ F\cdot F'\,dm = \int_{\mathbb{R}} h\,dm \tag{7.33}
\]
which is valid for all Borel measurable functions $h : \mathbb{R} \to [0,\infty]$.

Hint: Start by showing $d\nu = F'\,dm$ on sets of the form $A = (a,b]$ with $a, b \in \mathbb{R}$ and $a < b$. Then use the uniqueness assertions in Exercise 5.11 to conclude $d\nu = F'\,dm$ on all of $\mathcal{B}_{\mathbb{R}}$. To prove Eq. (7.33) apply Exercise 7.11 with $g = h \circ F$ and $f = F^{-1}$.
Solution to Exercise (7.12). Let $d\mu = F'\,dm$ and $A = (a,b]$; then
\[
\nu((a,b]) = m(F((a,b])) = m((F(a), F(b)]) = F(b) - F(a)
\]
while
\[
\mu((a,b]) = \int_{(a,b]}F'\,dm = \int_a^b F'(x)\,dx = F(b) - F(a).
\]
It follows that both $\mu = \nu = \mu_F$ where $\mu_F$ is the measure described in Theorem 5.33. By Exercise 7.11 with $g = h \circ F$ and $f = F^{-1}$, we find
\[
\int_{\mathbb{R}} h \circ F\cdot F'\,dm = \int_{\mathbb{R}} h \circ F\,d\nu
= \int_{\mathbb{R}} h \circ F\,d\left(\left(F^{-1}\right)_*m\right)
= \int_{\mathbb{R}} (h \circ F)\circ F^{-1}\,dm
= \int_{\mathbb{R}} h\,dm.
\]
This result is also valid for all $h \in L^1(m)$.
7.5 Some Common Continuous Distributions
Example 7.54 (Uniform Distribution). Suppose that $X$ has the uniform distribution in $[0,b]$ for some $b \in (0,\infty)$, i.e. $X_*P = \frac{1}{b}\,m$ on $[0,b]$. More explicitly,
\[
\mathbb{E}[f(X)] = \frac{1}{b}\int_0^b f(x)\,dx \text{ for all bounded measurable } f.
\]
The moment generating function for $X$ is
\[
M_X(t) = \frac{1}{b}\int_0^b e^{tx}\,dx = \frac{1}{bt}\left(e^{tb} - 1\right)
= \sum_{n=1}^\infty \frac{1}{n!}(bt)^{n-1}
= \sum_{n=0}^\infty \frac{b^n}{(n+1)!}\,t^n.
\]
On the other hand (see Proposition 7.33),
\[
M_X(t) = \sum_{n=0}^\infty \frac{t^n}{n!}\,\mathbb{E}X^n.
\]
Thus it follows that
\[
\mathbb{E}X^n = \frac{b^n}{n+1}.
\]
Of course this may be calculated directly just as easily,
\[
\mathbb{E}X^n = \frac{1}{b}\int_0^b x^n\,dx = \frac{1}{b(n+1)}\,x^{n+1}\Big|_0^b = \frac{b^n}{n+1}.
\]
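As a quick numerical sanity check (not part of the original notes), the moment formula $\mathbb{E}X^n = b^n/(n+1)$ can be compared against a midpoint Riemann-sum approximation of $\frac{1}{b}\int_0^b x^n\,dx$; the helper name below is illustrative only:

```python
def uniform_moment(b, n, steps=100_000):
    """Approximate E[X^n] = (1/b) * integral_0^b x^n dx by a midpoint Riemann sum."""
    h = b / steps
    total = sum(((i + 0.5) * h) ** n for i in range(steps))
    return total * h / b

# For b = 2, n = 3 the exact value is b^n/(n+1) = 8/4 = 2.
approx = uniform_moment(2.0, 3)
exact = 2.0 ** 3 / 4
```

The composite midpoint rule has error $O(\mathrm{steps}^{-2})$, so the two values agree to many digits.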
Denition 7.55. A random variable T 0 is said to be exponential with
parameter [0, ) provided, P (T > t) = e
t
for all t 0. We will write
T
d
= E () for short.
If $\lambda > 0$, we have
\[
P(T > t) = e^{-\lambda t} = \lambda\int_t^\infty e^{-\lambda\tau}\,d\tau
\]
from which it follows that $P(T \in (t, t + dt)) = 1_{t\ge 0}\,\lambda e^{-\lambda t}\,dt$. Applying Corollary 7.30 repeatedly implies,
\[
\mathbb{E}T = \lambda\int_0^\infty \tau e^{-\lambda\tau}\,d\tau
= -\lambda\frac{d}{d\lambda}\int_0^\infty e^{-\lambda\tau}\,d\tau
= -\lambda\frac{d}{d\lambda}\lambda^{-1} = \lambda^{-1}
\]
and more generally that
\[
\mathbb{E}T^k = \lambda\int_0^\infty \tau^k e^{-\lambda\tau}\,d\tau
= \lambda\left(-\frac{d}{d\lambda}\right)^k\int_0^\infty e^{-\lambda\tau}\,d\tau
= \lambda\left(-\frac{d}{d\lambda}\right)^k\lambda^{-1} = k!\,\lambda^{-k}. \tag{7.34}
\]
In particular we see that
\[
\mathrm{Var}(T) = 2\lambda^{-2} - \lambda^{-2} = \lambda^{-2}. \tag{7.35}
\]
Alternatively we may compute the moment generating function for $T$,
\[
M_T(a) := \mathbb{E}\left[e^{aT}\right]
= \int_0^\infty e^{a\tau}\,\lambda e^{-\lambda\tau}\,d\tau
= \lambda\int_0^\infty e^{-(\lambda - a)\tau}\,d\tau
= \frac{\lambda}{\lambda - a} = \frac{1}{1 - a\lambda^{-1}} \tag{7.36}
\]
which is valid for $a < \lambda$. On the other hand (see Proposition 7.33), we know that
\[
\mathbb{E}\left[e^{aT}\right] = \sum_{n=0}^\infty \frac{a^n}{n!}\,\mathbb{E}[T^n] \text{ for } |a| < \lambda. \tag{7.37}
\]
Comparing this with Eq. (7.36) again shows that Eq. (7.34) is valid.
Here is yet another way to understand and generalize Eq. (7.36). We simply make the change of variables, $u = \lambda\tau$, in the integral in Eq. (7.34) to learn,
\[
\mathbb{E}T^k = \lambda^{-k}\int_0^\infty u^k e^{-u}\,du = \lambda^{-k}\,\Gamma(k+1).
\]
This last equation is valid for all $k \in (-1,\infty)$; in particular $k$ need not be an integer.
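The identity $\mathbb{E}T^k = \lambda^{-k}\Gamma(k+1)$ (which reduces to $k!\,\lambda^{-k}$ for integer $k$) is easy to check numerically. The following sketch, with hypothetical helper names, compares it against a truncated midpoint approximation of $\lambda\int_0^\infty \tau^k e^{-\lambda\tau}\,d\tau$:

```python
import math

def exp_moment(lam, k):
    """E[T^k] for T ~ E(lam) via the Gamma-function formula lam^{-k} * Gamma(k+1)."""
    return lam ** (-k) * math.gamma(k + 1)

def exp_moment_numeric(lam, k, steps=200_000, tmax=50.0):
    """Midpoint approximation of lam * integral_0^tmax t^k e^{-lam t} dt."""
    h = tmax / steps
    return sum(((i + 0.5) * h) ** k * lam * math.exp(-lam * (i + 0.5) * h)
               for i in range(steps)) * h
```

Integer $k$ recovers $k!/\lambda^k$, while $k = 1/2$ gives $\Gamma(3/2)/\sqrt{\lambda}$; no integrality is needed.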
Theorem 7.56 (Memoryless property). A random variable, $T \in (0,\infty]$, has an exponential distribution iff it satisfies the memoryless property:
\[
P(T > s + t \mid T > s) = P(T > t) \text{ for all } s, t \ge 0,
\]
where as usual, $P(A \mid B) := P(A \cap B)/P(B)$ when $P(B) > 0$. (Note that $T \stackrel{d}{=} E(0)$ means that $P(T > t) = e^{-0\cdot t} = 1$ for all $t \ge 0$ and therefore that $T = \infty$ a.s.)
Proof. (The following proof is taken from [43].) Suppose first that $T \stackrel{d}{=} E(\lambda)$ for some $\lambda > 0$. Then
\[
P(T > s + t \mid T > s) = \frac{P(T > s + t)}{P(T > s)}
= \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(T > t).
\]
For the converse, let $g(t) := P(T > t)$, then by assumption,
\[
\frac{g(t+s)}{g(s)} = P(T > s + t \mid T > s) = P(T > t) = g(t)
\]
whenever $g(s) \ne 0$, and $g(t)$ is a decreasing function. Therefore if $g(s) = 0$ for some $s > 0$ then $g(t) = 0$ for all $t > s$. Thus it follows that
\[
g(t+s) = g(t)\,g(s) \text{ for all } s, t \ge 0.
\]
Since $T > 0$, we know that $g(1/n) = P(T > 1/n) > 0$ for some $n$ and therefore $g(1) = g(1/n)^n > 0$, so we may write $g(1) = e^{-\lambda}$ for some $0 \le \lambda < \infty$.

Observe for $p, q \in \mathbb{N}$, $g(p/q) = g(1/q)^p$ and taking $p = q$ then shows $e^{-\lambda} = g(1) = g(1/q)^q$. Therefore, $g(p/q) = e^{-\lambda p/q}$ so that $g(t) = e^{-\lambda t}$ for all $t \in \mathbb{Q}_+ := \mathbb{Q}\cap\mathbb{R}_+$. Given $r, s \in \mathbb{Q}_+$ and $t \in \mathbb{R}$ such that $r \le t \le s$ we have, since $g$ is decreasing, that
\[
e^{-\lambda r} = g(r) \ge g(t) \ge g(s) = e^{-\lambda s}.
\]
Hence letting $s \downarrow t$ and $r \uparrow t$ in the above equations shows that $g(t) = e^{-\lambda t}$ for all $t \in \mathbb{R}_+$ and therefore $T \stackrel{d}{=} E(\lambda)$.
Exercise 7.13 (Gamma Distributions). Let $X$ be a positive random variable. For $k, \theta > 0$, we say that $X \stackrel{d}{=} \mathrm{Gamma}(k,\theta)$ if
\[
(X_*P)(dx) = f(x; k, \theta)\,dx \text{ for } x > 0,
\]
where
\[
f(x; k, \theta) := \frac{x^{k-1}e^{-x/\theta}}{\theta^k\,\Gamma(k)} \text{ for } x > 0, \text{ and } k, \theta > 0.
\]
Find the moment generating function (see Definition 7.32), $M_X(t) = \mathbb{E}\left[e^{tX}\right]$ for $t < \theta^{-1}$. Differentiate your result in $t$ to show
\[
\mathbb{E}[X^m] = k(k+1)\cdots(k+m-1)\,\theta^m \text{ for all } m \in \mathbb{N}_0.
\]
In particular, $\mathbb{E}[X] = k\theta$ and $\mathrm{Var}(X) = k\theta^2$. (Notice that when $k = 1$ and $\theta = \lambda^{-1}$, $X \stackrel{d}{=} E(\lambda)$.)
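Assuming the standard fact that the $\mathrm{Gamma}(k,\theta)$ moment generating function works out to $M_X(t) = (1-\theta t)^{-k}$, the stated moments can be cross-checked numerically; this is an illustrative sketch, not part of the notes:

```python
import math

def gamma_moment(k, theta, m):
    """E[X^m] for X ~ Gamma(k, theta) via theta^m * Gamma(k+m)/Gamma(k)."""
    return theta ** m * math.gamma(k + m) / math.gamma(k)

def gamma_moment_rising(k, theta, m):
    """The rising-factorial form k(k+1)...(k+m-1) * theta^m from the exercise."""
    out = theta ** m
    for j in range(m):
        out *= k + j
    return out
```

In particular `gamma_moment(k, theta, 1)` equals $k\theta$, and the second central moment works out to $k\theta^2$.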
7.5.1 Normal (Gaussian) Random Variables
Denition 7.57 (Normal / Gaussian Random Variables). A random
variable, Y, is normal with mean standard deviation
2
i
P (Y B) =
1

2
2
_
B
e

1
2
2
(y)
2
dy for all B B
R
. (7.38)
We will abbreviate this by writing Y
d
= N
_
,
2
_
. When = 0 and
2
= 1 we
will simply write N for N (0, 1) and if Y
d
= N, we will say Y is a standard
normal random variable.
Observe that Eq. (7.38) is equivalent to writing
\[
\mathbb{E}[f(Y)] = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{\mathbb{R}} f(y)\,e^{-\frac{1}{2\sigma^2}(y-\mu)^2}\,dy
\]
for all bounded measurable functions, $f : \mathbb{R} \to \mathbb{R}$. Also observe that $Y \stackrel{d}{=} N\left(\mu,\sigma^2\right)$ is equivalent to $Y \stackrel{d}{=} \sigma N + \mu$. Indeed, by making the change of variable, $y = \sigma x + \mu$, we find
\[
\mathbb{E}[f(\sigma N + \mu)] = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f(\sigma x + \mu)\,e^{-\frac12 x^2}\,dx
= \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f(y)\,e^{-\frac{1}{2\sigma^2}(y-\mu)^2}\,\frac{dy}{\sigma}
= \frac{1}{\sqrt{2\pi\sigma^2}}\int_{\mathbb{R}} f(y)\,e^{-\frac{1}{2\sigma^2}(y-\mu)^2}\,dy.
\]
Lastly the constant, $\left(2\pi\sigma^2\right)^{-1/2}$, is chosen so that
\[
\frac{1}{\sqrt{2\pi\sigma^2}}\int_{\mathbb{R}} e^{-\frac{1}{2\sigma^2}(y-\mu)^2}\,dy
= \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac12 y^2}\,dy = 1,
\]
see Example 7.50 and Lemma 9.29.
Exercise 7.14. Suppose that $X \stackrel{d}{=} N(0,1)$ and $f : \mathbb{R} \to \mathbb{R}$ is a $C^1$-function such that $Xf(X)$, $f'(X)$ and $f(X)$ are all integrable random variables. Show
\[
\mathbb{E}[Xf(X)] = -\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f(x)\,\frac{d}{dx}e^{-\frac12 x^2}\,dx
= \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f'(x)\,e^{-\frac12 x^2}\,dx = \mathbb{E}[f'(X)].
\]
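The Gaussian integration-by-parts identity $\mathbb{E}[Xf(X)] = \mathbb{E}[f'(X)]$ can be spot-checked by numerical integration; here is a small sketch (the test function $f(x) = x^3$ is an arbitrary choice, not from the notes):

```python
import math

def gauss_expect(g, steps=200_000, xmax=10.0):
    """Midpoint approximation of E[g(X)] for X ~ N(0,1), truncated to [-xmax, xmax]."""
    h = 2 * xmax / steps
    c = 1 / math.sqrt(2 * math.pi)
    total = 0.0
    for i in range(steps):
        x = -xmax + (i + 0.5) * h
        total += g(x) * c * math.exp(-x * x / 2)
    return total * h

# With f(x) = x^3: E[X f(X)] = E[X^4] = 3 and E[f'(X)] = 3 E[X^2] = 3.
lhs = gauss_expect(lambda x: x * x ** 3)
rhs = gauss_expect(lambda x: 3 * x ** 2)
```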
Example 7.58. Suppose that $X \stackrel{d}{=} N(0,1)$ and define $\alpha_k := \mathbb{E}\left[X^{2k}\right]$ for all $k \in \mathbb{N}_0$. By Exercise 7.14,
\[
\alpha_{k+1} = \mathbb{E}\left[X^{2k+1}\cdot X\right] = (2k+1)\,\alpha_k \text{ with } \alpha_0 = 1.
\]
Hence it follows that
\[
\alpha_1 = \alpha_0 = 1,\quad \alpha_2 = 3\alpha_1 = 3,\quad \alpha_3 = 5\cdot 3
\]
and by a simple induction argument,
\[
\mathbb{E}X^{2k} = \alpha_k = (2k-1)!!, \tag{7.39}
\]
where $(-1)!! := 1$. Actually we can use the $\Gamma$-function to say more. Namely for any $\beta > -1$,
\[
\mathbb{E}|X|^\beta = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} |x|^\beta e^{-\frac12 x^2}\,dx
= \sqrt{\frac{2}{\pi}}\int_0^\infty x^\beta e^{-\frac12 x^2}\,dx.
\]
Now make the change of variables, $y = x^2/2$ (i.e. $x = \sqrt{2y}$ and $dx = \frac{1}{\sqrt{2}}\,y^{-1/2}\,dy$) to learn,
\[
\mathbb{E}|X|^\beta = \frac{1}{\sqrt{\pi}}\int_0^\infty (2y)^{\beta/2} e^{-y} y^{-1/2}\,dy
= \frac{1}{\sqrt{\pi}}\,2^{\beta/2}\int_0^\infty y^{(\beta+1)/2} e^{-y} y^{-1}\,dy
= \frac{1}{\sqrt{\pi}}\,2^{\beta/2}\,\Gamma\left(\frac{\beta+1}{2}\right). \tag{7.40}
\]
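Eq. (7.40) is easy to verify numerically; for example $\beta = 2k$ recovers the double factorial of Eq. (7.39). A quick sketch (helper names are illustrative, not from the notes):

```python
import math

def abs_moment(beta):
    """E|X|^beta for X ~ N(0,1) via Eq. (7.40): 2^{beta/2} Gamma((beta+1)/2) / sqrt(pi)."""
    return 2 ** (beta / 2) * math.gamma((beta + 1) / 2) / math.sqrt(math.pi)

def double_factorial(n):
    """Odd double factorial n!! (returns 1 for n <= 0, matching (-1)!! := 1)."""
    out = 1
    while n > 1:
        out *= n
        n -= 2
    return out
```

For instance `abs_moment(1)` equals $\sqrt{2/\pi}$, the mean absolute value of a standard normal.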
Exercise 7.15. Suppose that $X \stackrel{d}{=} N(0,1)$ and $\lambda \in \mathbb{R}$. Show
\[
f(\lambda) := \mathbb{E}\left[e^{i\lambda X}\right] = \exp\left(-\lambda^2/2\right). \tag{7.41}
\]
Hint: Use Corollary 7.30 to show, $f'(\lambda) = i\,\mathbb{E}\left[Xe^{i\lambda X}\right]$, and then use Exercise 7.14 to see that $f(\lambda)$ satisfies a simple ordinary differential equation.
Solution to Exercise (7.15). Using Corollary 7.30 and Exercise 7.14,
\[
f'(\lambda) = i\,\mathbb{E}\left[Xe^{i\lambda X}\right]
= i\,\mathbb{E}\left[\frac{d}{dx}e^{i\lambda x}\Big|_{x=X}\right]
= i\,(i\lambda)\,\mathbb{E}\left[e^{i\lambda X}\right]
= -\lambda f(\lambda) \text{ with } f(0) = 1.
\]
Solving for the unique solution of this differential equation gives Eq. (7.41).
Exercise 7.16. Suppose that $X \stackrel{d}{=} N(0,1)$ and $t \in \mathbb{R}$. Show $\mathbb{E}\left[e^{tX}\right] = \exp\left(t^2/2\right)$. (You could follow the hint in Exercise 7.15 or you could use a completion of the squares argument along with the translation invariance of Lebesgue measure.)
Exercise 7.17. Use Exercise 7.16 and Proposition 7.33 to give another proof that $\mathbb{E}X^{2k} = (2k-1)!!$ when $X \stackrel{d}{=} N(0,1)$.
Exercise 7.18. Let $X \stackrel{d}{=} N(0,1)$ and $\beta \in \mathbb{R}$. Find $\rho : \mathbb{R}_+ \to \mathbb{R}_+ := (0,\infty)$ such that
\[
\mathbb{E}\left[f\left(|X|^\beta\right)\right] = \int_{\mathbb{R}_+} f(x)\,\rho(x)\,dx
\]
for all continuous functions, $f : \mathbb{R}_+ \to \mathbb{R}$ with compact support in $\mathbb{R}_+$.
Lemma 7.59 (Gaussian tail estimates). Suppose that $X$ is a standard normal random variable, i.e.
\[
P(X \in A) = \frac{1}{\sqrt{2\pi}}\int_A e^{-x^2/2}\,dx \text{ for all } A \in \mathcal{B}_{\mathbb{R}},
\]
then for all $x \ge 0$,
\[
P(X \ge x) \le \min\left(\frac12 - \frac{x}{\sqrt{2\pi}}\,e^{-x^2/2},\ \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2}\right)
\le \frac12 e^{-x^2/2}. \tag{7.42}
\]
Moreover (see [47, Lemma 2.5]),
\[
P(X \ge x) \ge \max\left(\frac12 - \frac{x}{\sqrt{2\pi}},\ \frac{x}{x^2+1}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\right) \tag{7.43}
\]
which combined with Eq. (7.42) proves Mills' ratio (see [21]);
\[
\lim_{x\to\infty}\frac{P(X \ge x)}{\frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2}} = 1. \tag{7.44}
\]
Proof. See Figure 7.1 where: the green curve is the plot of $P(X \ge x)$, the black is the plot of
\[
\min\left(\frac12 - \frac{x}{\sqrt{2\pi}}\,e^{-x^2/2},\ \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2}\right),
\]
the red is the plot of $\frac12 e^{-x^2/2}$, and the blue is the plot of
\[
\max\left(\frac12 - \frac{x}{\sqrt{2\pi}},\ \frac{x}{x^2+1}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\right).
\]
The formal proof of these estimates, for the reader who is not convinced by Figure 7.1, is given below.

Fig. 7.1. Plots of $P(X \ge x)$ and its estimates.
We begin by observing that
\[
P(X \ge x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-y^2/2}\,dy
\le \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{y}{x}\,e^{-y^2/2}\,dy
= -\frac{1}{\sqrt{2\pi}}\,\frac1x\,e^{-y^2/2}\Big|_x^\infty
= \frac{1}{\sqrt{2\pi}}\,\frac1x\,e^{-x^2/2}. \tag{7.45}
\]
If we only want to prove Mills' ratio (7.44), we could proceed as follows. Let $\alpha > 1$, then for $x > 0$,
\[
P(X \ge x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-y^2/2}\,dy
\ge \frac{1}{\sqrt{2\pi}}\int_x^{\alpha x}\frac{y}{\alpha x}\,e^{-y^2/2}\,dy
= -\frac{1}{\sqrt{2\pi}}\,\frac{1}{\alpha x}\,e^{-y^2/2}\Big|_{y=x}^{y=\alpha x}
= \frac{1}{\sqrt{2\pi}}\,\frac{1}{\alpha x}\,e^{-x^2/2}\left(1 - e^{-(\alpha^2-1)x^2/2}\right)
\]
from which it follows,
\[
\liminf_{x\to\infty}\left[\sqrt{2\pi}\,x\,e^{x^2/2}\,P(X \ge x)\right] \ge 1/\alpha \to 1 \text{ as } \alpha \downarrow 1.
\]
The estimate in Eq. (7.45) shows $\limsup_{x\to\infty}\left[\sqrt{2\pi}\,x\,e^{x^2/2}\,P(X \ge x)\right] \le 1$.
To get more precise estimates, we begin by observing,
\[
P(X \ge x) = \frac12 - \frac{1}{\sqrt{2\pi}}\int_0^x e^{-y^2/2}\,dy \tag{7.46}
\]
\[
\le \frac12 - \frac{1}{\sqrt{2\pi}}\int_0^x e^{-x^2/2}\,dy
= \frac12 - \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,x.
\]
This equation along with Eq. (7.45) gives the first inequality in Eq. (7.42). To prove the second inequality observe that $\sqrt{2\pi} > 2$, so
\[
\frac{1}{\sqrt{2\pi}}\,\frac1x\,e^{-x^2/2} \le \frac12 e^{-x^2/2} \text{ if } x \ge 1.
\]
For $x \le 1$ we must show,
\[
\frac12 - \frac{x}{\sqrt{2\pi}}\,e^{-x^2/2} \le \frac12 e^{-x^2/2}
\]
or equivalently that $f(x) := e^{x^2/2} - \sqrt{2/\pi}\,x \le 1$ for $0 \le x \le 1$. Since $f$ is convex $\left(f''(x) = \left(x^2+1\right)e^{x^2/2} > 0\right)$, $f(0) = 1$ and $f(1) \cong 0.85 < 1$, it follows that $f \le 1$ on $[0,1]$. This proves the second inequality in Eq. (7.42).
It follows from Eq. (7.46) that
\[
P(X \ge x) = \frac12 - \frac{1}{\sqrt{2\pi}}\int_0^x e^{-y^2/2}\,dy
\ge \frac12 - \frac{1}{\sqrt{2\pi}}\int_0^x 1\,dy
= \frac12 - \frac{x}{\sqrt{2\pi}} \text{ for all } x \ge 0.
\]
So to nish the proof of Eq. (7.43) we must show,
f (x) :=
1

2
xe
x
2
/2

_
1 +x
2
_
P (X x)
=
1

2
_
xe
x
2
/2

_
1 +x
2
_
_

x
e
y
2
/2
dy
_
0 for all 0 x < .
This follows by observing that f (0) = 1/2 < 0, lim
x
f (x) = 0 and
f
t
(x) =
1

2
_
e
x
2
/2
_
1 x
2
_
2xP (X x) +
_
1 +x
2
_
e
x
2
/2
_
= 2
_
1

2
e
x
2
/2
xP (X y)
_
0,
where the last inequality is a consequence Eq. (7.42).
7.6 Stirling's Formula
On occasion one is faced with estimating an integral of the form, $\int_J e^{-G(t)}\,dt$, where $J = (a,b) \subset \mathbb{R}$ and $G(t)$ is a $C^2$-function with a unique (for simplicity) global minimum at some point $t_0 \in J$. The idea is that the majority contribution of the integral will often come from some neighborhood, $(t_0 - \alpha, t_0 + \alpha)$, of $t_0$. Moreover, it may happen that $G(t)$ can be well approximated on this neighborhood by its Taylor expansion to order 2;
\[
G(t) \cong G(t_0) + \tfrac12\ddot G(t_0)\,(t - t_0)^2.
\]
Notice that the linear term is zero since $t_0$ is a minimum and therefore $\dot G(t_0) = 0$. We will further assume that $\ddot G(t_0) \ne 0$ and hence $\ddot G(t_0) > 0$. Under these hypotheses we will have,
\[
\int_J e^{-G(t)}\,dt \cong e^{-G(t_0)}\int_{|t-t_0|<\alpha}\exp\left(-\tfrac12\ddot G(t_0)\,(t - t_0)^2\right)dt.
\]
Making the change of variables, $s = \sqrt{\ddot G(t_0)}\,(t - t_0)$, in the above integral then gives,
\[
\int_J e^{-G(t)}\,dt
\cong \frac{1}{\sqrt{\ddot G(t_0)}}\,e^{-G(t_0)}\int_{|s|<\sqrt{\ddot G(t_0)}\,\alpha}e^{-\frac12 s^2}\,ds
= \frac{1}{\sqrt{\ddot G(t_0)}}\,e^{-G(t_0)}\left[\sqrt{2\pi} - O\left(\frac{1}{\sqrt{\ddot G(t_0)}\,\alpha}\,e^{-\frac12\ddot G(t_0)\alpha^2}\right)\right].
\]
If $\alpha$ is sufficiently large, for example if $\sqrt{\ddot G(t_0)}\,\alpha = 3$, then the error term is about $0.0037$ and we should be able to conclude that
\[
\int_J e^{-G(t)}\,dt \cong \sqrt{\frac{2\pi}{\ddot G(t_0)}}\,e^{-G(t_0)}. \tag{7.47}
\]
The proof of the next theorem (Stirling's formula for the Gamma function) will illustrate these ideas and what one has to do to carry them out rigorously.
Theorem 7.60 (Stirling's formula). The Gamma function (see Definition 7.49) satisfies Stirling's formula,
\[
\lim_{x\to\infty}\frac{\Gamma(x+1)}{\sqrt{2\pi}\,e^{-x}\,x^{x+1/2}} = 1. \tag{7.48}
\]
In particular, if $n \in \mathbb{N}$, we have
\[
n! = \Gamma(n+1) \sim \sqrt{2\pi}\,e^{-n}\,n^{n+1/2}
\]
where we write $a_n \sim b_n$ to mean, $\lim_{n\to\infty}\frac{a_n}{b_n} = 1$. (See Example 7.65 below for a slightly cruder but more elementary estimate of $n!$.)
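Before the proof, the convergence in Eq. (7.48) is easy to observe numerically (a sanity check, not part of the notes); the ratio approaches $1$ from above, roughly like $1 + \frac{1}{12n}$:

```python
import math

def stirling_ratio(n):
    """n! divided by its Stirling approximation sqrt(2 pi) e^{-n} n^{n+1/2}."""
    return math.factorial(n) / (math.sqrt(2 * math.pi) * math.exp(-n) * n ** (n + 0.5))
```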
Proof. (The following proof is an elaboration of the proof found on pages 236--237 in Krantz's Real Analysis and Foundations.) We begin with the formula for $\Gamma(x+1)$;
\[
\Gamma(x+1) = \int_0^\infty e^{-t}t^x\,dt = \int_0^\infty e^{-G_x(t)}\,dt, \tag{7.49}
\]
where
\[
G_x(t) := t - x\ln t.
\]
Then $\dot G_x(t) = 1 - x/t$, $\ddot G_x(t) = x/t^2$, and $G_x$ has a global minimum (since $\ddot G_x > 0$) at $t_0 = x$ where
\[
G_x(x) = x - x\ln x \text{ and } \ddot G_x(x) = 1/x.
\]
So if Eq. (7.47) is valid in this case we should expect,
\[
\Gamma(x+1) \cong \sqrt{2\pi x}\,e^{-(x - x\ln x)} = \sqrt{2\pi}\,e^{-x}\,x^{x+1/2}
\]
which would give Stirling's formula. The rest of the proof will be spent on rigorously justifying the approximations involved.

Let us begin by making the change of variables $s = \sqrt{\ddot G(t_0)}\,(t - t_0) = \frac{1}{\sqrt{x}}\,(t - x)$ as suggested above. Then
\[
G_x(t) - G_x(x) = (t - x) - x\ln(t/x)
= \sqrt{x}\,s - x\ln\left(\frac{x + \sqrt{x}\,s}{x}\right)
= x\left[\frac{s}{\sqrt{x}} - \ln\left(1 + \frac{s}{\sqrt{x}}\right)\right]
= s^2\,q\left(\frac{s}{\sqrt{x}}\right)
\]
where
\[
q(u) := \frac{1}{u^2}\left[u - \ln(1+u)\right] \text{ for } u > -1 \text{ with } q(0) := \frac12.
\]
Setting $q(0) = 1/2$ makes $q$ a continuous and in fact smooth function on $(-1,\infty)$, see Figure 7.2. Using the power series expansion for $\ln(1+u)$ we find,
\[
q(u) = \frac12 + \sum_{k=3}^\infty\frac{(-u)^{k-2}}{k} \text{ for } |u| < 1. \tag{7.50}
\]
Fig. 7.2. Plot of $q(u)$.
Making the change of variables, $t = x + \sqrt{x}\,s$, in the second integral in Eq. (7.49) yields,
\[
\Gamma(x+1) = e^{-(x - x\ln x)}\sqrt{x}\int_{-\sqrt{x}}^\infty e^{-q\left(s/\sqrt{x}\right)s^2}\,ds
= x^{x+1/2}e^{-x}\,I(x),
\]
where
\[
I(x) = \int_{-\sqrt{x}}^\infty e^{-q\left(s/\sqrt{x}\right)s^2}\,ds
= \int_{-\infty}^\infty 1_{s\ge-\sqrt{x}}\,e^{-q\left(s/\sqrt{x}\right)s^2}\,ds. \tag{7.51}
\]
From Eq. (7.50) it follows that $\lim_{u\to 0}q(u) = 1/2$ and therefore,
\[
\int_{-\infty}^\infty\lim_{x\to\infty}\left[1_{s\ge-\sqrt{x}}\,e^{-q\left(s/\sqrt{x}\right)s^2}\right]ds
= \int_{-\infty}^\infty e^{-\frac12 s^2}\,ds = \sqrt{2\pi}. \tag{7.52}
\]
So if there exists a dominating function, $F \in L^1(\mathbb{R}, m)$, such that
\[
1_{s\ge-\sqrt{x}}\,e^{-q\left(s/\sqrt{x}\right)s^2} \le F(s) \text{ for all } s \in \mathbb{R} \text{ and } x \ge 1,
\]
we can apply the DCT to learn that $\lim_{x\to\infty}I(x) = \sqrt{2\pi}$, which will complete the proof of Stirling's formula.
We now construct the desired function $F$. From Eq. (7.50) it follows that $q(u) \ge 1/2$ for $-1 < u \le 0$. Since $u - \ln(1+u) > 0$ for $u \ne 0$ ($u \mapsto u - \ln(1+u)$ is convex and has a minimum of $0$ at $u = 0$) we may conclude that $q(u) > 0$ for all $u > -1$; therefore by compactness (on $[0,M]$), $\min_{-1<u\le M}q(u) =: \varepsilon(M) > 0$ for all $M \in (0,\infty)$, see Remark 7.61 for more explicit estimates. Lastly, since $\frac1u\ln(1+u) \to 0$ as $u \to \infty$, there exists $M < \infty$ ($M = 3$ would do) such that $\frac1u\ln(1+u) \le \frac12$ for $u \ge M$ and hence,
\[
q(u) = \frac1u\left[1 - \frac1u\ln(1+u)\right] \ge \frac{1}{2u} \text{ for } u \ge M.
\]
So there exists $\varepsilon > 0$ and $M < \infty$ such that (for all $x \ge 1$),
\[
1_{s\ge-\sqrt{x}}\,e^{-q\left(s/\sqrt{x}\right)s^2}
\le 1_{-\sqrt{x}<s\le M\sqrt{x}}\,e^{-\varepsilon s^2} + 1_{s\ge M\sqrt{x}}\,e^{-\sqrt{x}\,s/2}
\le 1_{-\sqrt{x}<s\le M\sqrt{x}}\,e^{-\varepsilon s^2} + 1_{s\ge M\sqrt{x}}\,e^{-s/2}
\le e^{-\varepsilon s^2} + e^{-|s|/2} =: F(s) \in L^1(\mathbb{R}, ds).
\]
We will sometimes use the following variant of Eq. (7.48);
\[
\lim_{x\to\infty}\frac{\Gamma(x)}{\sqrt{\frac{2\pi}{x}}\left(\frac{x}{e}\right)^x} = 1. \tag{7.53}
\]
To prove this let $x \to x - 1$ in Eq. (7.48) in order to find,
\[
1 = \lim_{x\to\infty}\frac{\Gamma(x)}{\sqrt{2\pi}\,e^{-x}\,e\,(x-1)^{x-1/2}}
= \lim_{x\to\infty}\frac{\Gamma(x)}{\sqrt{\frac{2\pi}{x}}\left(\frac{x}{e}\right)^x}
\cdot\frac{1}{\sqrt{\frac{x}{x-1}}\,e\left(1 - \frac1x\right)^x}
\]
which gives Eq. (7.53) since
\[
\lim_{x\to\infty}\sqrt{\frac{x}{x-1}}\,e\left(1 - \frac1x\right)^x = 1.
\]
Remark 7.61 (Estimating $q(u)$ by Taylor's Theorem). Another way to estimate $q(u)$ is to use Taylor's theorem with integral remainder. In general if $h$ is a $C^2$-function on $[0,1]$, then by the fundamental theorem of calculus and integration by parts,
\[
h(1) - h(0) = \int_0^1\dot h(t)\,dt = -\int_0^1\dot h(t)\,d(1-t)
= -\dot h(t)(1-t)\Big|_0^1 + \int_0^1\ddot h(t)(1-t)\,dt
= \dot h(0) + \frac12\int_0^1\ddot h(t)\,d\nu(t) \tag{7.54}
\]
where $d\nu(t) := 2(1-t)\,dt$, which is a probability measure on $[0,1]$. Applying this to $h(t) = F(a + t(b-a))$ for a $C^2$-function $F$ on an interval of points between $a$ and $b$ in $\mathbb{R}$ then implies,
\[
F(b) - F(a) = (b-a)\dot F(a) + \frac12(b-a)^2\int_0^1\ddot F(a + t(b-a))\,d\nu(t). \tag{7.55}
\]
(Similar formulas hold to any order.) Applying this result with $F(x) = x - \ln(1+x)$, $a = 0$, and $b = u \in (-1,\infty)$ gives,
\[
u - \ln(1+u) = \frac12 u^2\int_0^1\frac{1}{(1+tu)^2}\,d\nu(t),
\]
i.e.
\[
q(u) = \frac12\int_0^1\frac{1}{(1+tu)^2}\,d\nu(t).
\]
From this expression for $q(u)$ it now easily follows that
\[
q(u) \ge \frac12\int_0^1\frac{1}{(1+0)^2}\,d\nu(t) = \frac12 \text{ if } -1 < u \le 0
\]
and
\[
q(u) \ge \frac12\int_0^1\frac{1}{(1+u)^2}\,d\nu(t) = \frac{1}{2(1+u)^2} \text{ if } u \ge 0.
\]
So an explicit formula for $\varepsilon(M)$ is $\varepsilon(M) = (1+M)^{-2}/2$.
7.6.1 Two applications of Stirling's formula
In this subsection suppose $x \in (0,1)$ and $S_n \stackrel{d}{=} \mathrm{Binomial}(n,x)$ for all $n \in \mathbb{N}$, i.e.
\[
P_x(S_n = k) = \binom{n}{k}x^k(1-x)^{n-k} \text{ for } 0 \le k \le n. \tag{7.56}
\]
Recall that $\mathbb{E}S_n = nx$ and $\mathrm{Var}(S_n) = n\sigma^2$ where $\sigma^2 := x(1-x)$. The weak law of large numbers states (Exercise 4.13) that
\[
P\left(\left|\frac{S_n}{n} - x\right| \ge \varepsilon\right) \le \frac{1}{n\varepsilon^2}\,\sigma^2
\]
and therefore, $\frac{S_n}{n}$ is concentrating near its mean value, $x$, for $n$ large, i.e. $S_n \cong nx$ for $n$ large. The next central limit theorem describes the fluctuations of $S_n$ about $nx$.
Theorem 7.62 (De Moivre-Laplace Central Limit Theorem). For all $-\infty < a < b < \infty$,
\[
\lim_{n\to\infty}P\left(a \le \frac{S_n - nx}{\sigma\sqrt{n}} \le b\right)
= \frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac12 y^2}\,dy = P(a \le N \le b)
\]
where $N \stackrel{d}{=} N(0,1)$. Informally, $\frac{S_n - nx}{\sigma\sqrt{n}} \stackrel{d}{\cong} N$ or equivalently, $S_n \stackrel{d}{\cong} nx + \sigma\sqrt{n}\,N$, which is valid in a neighborhood of $nx$ whose length is of order $\sqrt{n}$.
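A numerical illustration of the theorem (a sketch, not part of the notes): the exact binomial probability, computed with log-Gamma to avoid underflow, against the normal approximation.

```python
import math

def phi_interval(a, b):
    """P(a <= N <= b) for N ~ N(0,1)."""
    cdf = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return cdf(b) - cdf(a)

def binom_interval(n, x, a, b):
    """Exact P(a <= (S_n - n x)/(sigma sqrt(n)) <= b) for S_n ~ Binomial(n, x)."""
    sigma = math.sqrt(x * (1 - x))
    lo = n * x + a * sigma * math.sqrt(n)
    hi = n * x + b * sigma * math.sqrt(n)
    total = 0.0
    for k in range(max(0, math.ceil(lo)), min(n, math.floor(hi)) + 1):
        logp = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(x) + (n - k) * math.log(1 - x))
        total += math.exp(logp)
    return total
```

The discrepancy is of order $1/\sqrt{n}$, consistent with the theorem.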
Proof. (We are not going to cover all the technical details in this proof as we will give much more general versions of this theorem later.) Starting with the definition of the Binomial distribution we have,
\[
p_n := P\left(a \le \frac{S_n - nx}{\sigma\sqrt{n}} \le b\right)
= P\left(S_n \in nx + \sigma\sqrt{n}\,[a,b]\right)
= \sum_{k\in nx+\sigma\sqrt{n}[a,b]}P(S_n = k)
= \sum_{k\in nx+\sigma\sqrt{n}[a,b]}\binom{n}{k}x^k(1-x)^{n-k}.
\]
Letting $k = nx + \sigma\sqrt{n}\,y_k$, i.e. $y_k = (k - nx)/\left(\sigma\sqrt{n}\right)$, we see that $\Delta y_k := y_{k+1} - y_k = 1/\left(\sigma\sqrt{n}\right)$. Therefore we may write $p_n$ as
\[
p_n = \sum_{y_k\in[a,b]}\sigma\sqrt{n}\binom{n}{k}x^k(1-x)^{n-k}\,\Delta y_k. \tag{7.57}
\]
So to nish the proof we need to show, for k = O(

n) (y
k
= O(1)), that

n
_
n
k
_
x
k
(1 x)
nk

2
e

1
2
y
2
k
as n (7.58)
in which case the sum in Eq. (7.57) may be well approximated by the Riemann
sum;
p
n

yk[a,b]
1

2
e

1
2
y
2
k
y
k

1

2
_
b
a
e

1
2
y
2
dy as n .
By Stirling's formula,
\[
\sigma\sqrt{n}\binom{n}{k} = \sigma\sqrt{n}\,\frac{1}{k!}\,\frac{n!}{(n-k)!}
\sim \frac{\sigma\sqrt{n}}{\sqrt{2\pi}}\,\frac{n^{n+1/2}}{k^{k+1/2}(n-k)^{n-k+1/2}}
= \frac{\sigma}{\sqrt{2\pi}}\,\frac{1}{\left(\frac{k}{n}\right)^{k+1/2}\left(1-\frac{k}{n}\right)^{n-k+1/2}}
\]
\[
= \frac{\sigma}{\sqrt{2\pi}}\,\frac{1}{\left(x+\frac{\sigma}{\sqrt{n}}y_k\right)^{k+1/2}\left(1-x-\frac{\sigma}{\sqrt{n}}y_k\right)^{n-k+1/2}}
\sim \frac{\sigma}{\sqrt{2\pi}}\,\frac{1}{\sqrt{x(1-x)}}\,\frac{1}{\left(x+\frac{\sigma}{\sqrt{n}}y_k\right)^{k}\left(1-x-\frac{\sigma}{\sqrt{n}}y_k\right)^{n-k}}
\]
\[
= \frac{1}{\sqrt{2\pi}}\,\frac{1}{\left(x+\frac{\sigma}{\sqrt{n}}y_k\right)^{k}\left(1-x-\frac{\sigma}{\sqrt{n}}y_k\right)^{n-k}}.
\]
In order to shorten the notation, let $z_k := \frac{\sigma}{\sqrt{n}}\,y_k = O\left(n^{-1/2}\right)$ so that $k = nx + nz_k = n(x + z_k)$. In this notation we have shown,
\[
\sigma\sqrt{n}\binom{n}{k}x^k(1-x)^{n-k}
\sim \frac{1}{\sqrt{2\pi}}\,\frac{x^k(1-x)^{n-k}}{(x+z_k)^k(1-x-z_k)^{n-k}}
= \frac{1}{\sqrt{2\pi}}\,\frac{1}{\left(1+\frac{z_k}{x}\right)^k\left(1-\frac{z_k}{1-x}\right)^{n-k}}
\]
\[
= \frac{1}{\sqrt{2\pi}}\,\frac{1}{\left(1+\frac{z_k}{x}\right)^{n(x+z_k)}\left(1-\frac{z_k}{1-x}\right)^{n(1-x-z_k)}}
=: \frac{1}{\sqrt{2\pi}}\,q(n,k). \tag{7.59}
\]
Taking logarithms and using Taylor's theorem we learn
\[
-n(x+z_k)\ln\left(1+\frac{z_k}{x}\right)
= -n(x+z_k)\left[\frac{z_k}{x} - \frac{1}{2x^2}z_k^2 + O\left(n^{-3/2}\right)\right]
= -nz_k - \frac{n}{2x}z_k^2 + O\left(n^{-1/2}\right)
\]
and
\[
-n(1-x-z_k)\ln\left(1-\frac{z_k}{1-x}\right)
= -n(1-x-z_k)\left[-\frac{z_k}{1-x} - \frac{1}{2(1-x)^2}z_k^2 + O\left(n^{-3/2}\right)\right]
= nz_k - \frac{n}{2(1-x)}z_k^2 + O\left(n^{-1/2}\right).
\]
Adding these expressions shows,
\[
\ln q(n,k) = -\frac{n}{2}z_k^2\left(\frac1x + \frac{1}{1-x}\right) + O\left(n^{-1/2}\right)
= -\frac{n}{2\sigma^2}z_k^2 + O\left(n^{-1/2}\right)
= -\frac12 y_k^2 + O\left(n^{-1/2}\right).
\]
Combining this with Eq. (7.59) shows,
\[
\sigma\sqrt{n}\binom{n}{k}x^k(1-x)^{n-k}
\sim \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 y_k^2 + O\left(n^{-1/2}\right)\right)
\]
which gives the desired estimate in Eq. (7.58).
The previous central limit theorem has shown that
\[
\frac{S_n}{n} \stackrel{d}{\cong} x + \frac{\sigma}{\sqrt{n}}\,N
\]
which implies the major fluctuations of $S_n/n$ occur within intervals about $x$ of length $O\left(\frac{1}{\sqrt{n}}\right)$. The next result aims to understand the rare events where $S_n/n$ makes a "large deviation" from its mean value, $x$; in this case a large deviation is something of size $O(1)$ as $n \to \infty$.
Theorem 7.63 (Binomial Large Deviation Bounds). Let us continue to use the notation in Theorem 7.62. Then for all $y \in (0,x)$,
\[
\lim_{n\to\infty}\frac1n\ln P_x\left(\frac{S_n}{n} \le y\right)
= y\ln\frac{x}{y} + (1-y)\ln\frac{1-x}{1-y}.
\]
Roughly speaking,
\[
P_x\left(\frac{S_n}{n} \le y\right) \approx e^{-nI_x(y)}
\]
where $I_x(y)$ is the "rate function",
\[
I_x(y) := y\ln\frac{y}{x} + (1-y)\ln\frac{1-y}{1-x},
\]
see Figure 7.3 for the graph of $I_{1/2}$.
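The stated limit can be observed numerically (a sketch, not part of the notes): summing the binomial tail in log space and comparing $-\frac1n\ln P_x\left(\frac{S_n}{n}\le y\right)$ against $I_x(y)$.

```python
import math

def rate(x, y):
    """The rate function I_x(y) = y ln(y/x) + (1-y) ln((1-y)/(1-x))."""
    return y * math.log(y / x) + (1 - y) * math.log((1 - y) / (1 - x))

def empirical_rate(n, x, y):
    """-(1/n) ln P_x(S_n/n <= y), with the binomial tail summed in log space."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(x) + (n - k) * math.log(1 - x)
            for k in range(int(n * y) + 1)]
    m = max(logs)
    return -(m + math.log(sum(math.exp(v - m) for v in logs))) / n
```

The $O\left(\frac{\ln n}{n}\right)$ correction means even moderate $n$ gets within a percent or so of $I_x(y)$.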
Proof. By denition of the binomial distribution,
P
x
_
S
n
n
y
_
= P
x
(S
n
ny) =

kny
_
n
k
_
x
k
(1 x)
nk
.
If a
k
0, then we have the following crude estimates on

m1
k=0
a
k
,
Fig. 7.3. A plot of the rate function, I
1/2
.
max
k<m
a
k

m1

k=0
a
k
m max
k<m
a
k
. (7.60)
In order to apply this with $a_k = \binom{n}{k}x^k(1-x)^{n-k}$ and $m = [ny]$, we need to find the maximum of the $a_k$ for $0 \le k \le ny$. This is easy to do since $a_k$ is increasing for $0 \le k \le ny$, as we now show. Consider,
\[
\frac{a_{k+1}}{a_k}
= \frac{\binom{n}{k+1}x^{k+1}(1-x)^{n-k-1}}{\binom{n}{k}x^k(1-x)^{n-k}}
= \frac{k!\,(n-k)!\,x}{(k+1)!\,(n-k-1)!\,(1-x)}
= \frac{(n-k)\,x}{(k+1)(1-x)}.
\]
The latter expression is greater than or equal to $1$ iff
\[
a_{k+1} \ge a_k \iff (n-k)\,x \ge (k+1)(1-x) \iff nx \ge k + 1 - x \iff k \le (n+1)x - 1.
\]
Thus for $k \le (n+1)x - 1$ we may conclude that $\binom{n}{k}x^k(1-x)^{n-k}$ is increasing in $k$; since $y < x$, this covers the range $0 \le k \le ny$ once $n$ is large.
Thus the crude bound in Eq. (7.60) implies,
\[
\binom{n}{[ny]}x^{[ny]}(1-x)^{n-[ny]}
\le P_x\left(\frac{S_n}{n} \le y\right)
\le [ny]\binom{n}{[ny]}x^{[ny]}(1-x)^{n-[ny]}
\]
or equivalently,
\[
\frac1n\ln\left[\binom{n}{[ny]}x^{[ny]}(1-x)^{n-[ny]}\right]
\le \frac1n\ln P_x\left(\frac{S_n}{n} \le y\right)
\le \frac1n\ln\left[(ny)\binom{n}{[ny]}x^{[ny]}(1-x)^{n-[ny]}\right].
\]
By Stirling's formula, for $k$ such that $k$ and $n - k$ are large we have,
\[
\binom{n}{k} \sim \frac{1}{\sqrt{2\pi}}\,\frac{n^{n+1/2}}{k^{k+1/2}(n-k)^{n-k+1/2}}
= \frac{1}{\sqrt{2\pi}}\,\frac{1}{\sqrt{n}}\,\frac{1}{\left(\frac{k}{n}\right)^{k+1/2}\left(1-\frac{k}{n}\right)^{n-k+1/2}}
\]
and therefore,
\[
\frac1n\ln\binom{n}{k}
\cong -\frac{k}{n}\ln\left(\frac{k}{n}\right) - \left(1-\frac{k}{n}\right)\ln\left(1-\frac{k}{n}\right).
\]
So taking $k = [ny]$, we learn that
\[
\lim_{n\to\infty}\frac1n\ln\binom{n}{[ny]} = -y\ln y - (1-y)\ln(1-y)
\]
and therefore,
\[
\lim_{n\to\infty}\frac1n\ln P_x\left(\frac{S_n}{n} \le y\right)
= -y\ln y - (1-y)\ln(1-y) + y\ln x + (1-y)\ln(1-x)
= y\ln\frac{x}{y} + (1-y)\ln\left(\frac{1-x}{1-y}\right).
\]
As a consistency check it is worth noting, by Jensen's inequality described below, that
\[
-I_x(y) = y\ln\frac{x}{y} + (1-y)\ln\left(\frac{1-x}{1-y}\right)
\le \ln\left(y\,\frac{x}{y} + (1-y)\,\frac{1-x}{1-y}\right) = \ln(1) = 0.
\]
This must be the case since
\[
-I_x(y) = \lim_{n\to\infty}\frac1n\ln P_x\left(\frac{S_n}{n} \le y\right)
\le \lim_{n\to\infty}\frac1n\ln 1 = 0.
\]
7.6.2 A primitive Stirling type approximation
Theorem 7.64. Suppose that $f : (0,\infty) \to \mathbb{R}$ is an increasing concave down function (like $f(x) = \ln x$) and let $s_n := \sum_{k=1}^n f(k)$, then
\[
s_n - \frac12(f(n) + f(1)) \le \int_1^n f(x)\,dx
\le s_n - \frac12\left[f(n+1) + 2f(1)\right] + \frac12 f(2)
\le s_n - \frac12\left[f(n) + 2f(1)\right] + \frac12 f(2).
\]
Proof. On the interval, $[k-1,k]$, we have that $f(x)$ is larger than the straight line segment joining $(k-1, f(k-1))$ and $(k, f(k))$ and thus
\[
\frac12(f(k) + f(k-1)) \le \int_{k-1}^k f(x)\,dx.
\]
Summing this equation on $k = 2, \dots, n$ shows,
\[
s_n - \frac12(f(n) + f(1)) = \sum_{k=2}^n\frac12(f(k) + f(k-1))
\le \sum_{k=2}^n\int_{k-1}^k f(x)\,dx = \int_1^n f(x)\,dx.
\]
For the upper bound on the integral we observe that $f(x) \le f(k) + f'(k)(x-k)$ for all $x$ and therefore,
\[
\int_{k-1}^k f(x)\,dx \le \int_{k-1}^k\left[f(k) + f'(k)(x-k)\right]dx = f(k) - \frac12 f'(k).
\]
Summing this equation on $k = 2, \dots, n$ then implies,
\[
\int_1^n f(x)\,dx \le \sum_{k=2}^n f(k) - \frac12\sum_{k=2}^n f'(k).
\]
Since $f''(x) \le 0$, $f'(x)$ is decreasing and therefore $f'(x) \le f'(k-1)$ for $x \in [k-1,k]$, and integrating this inequality over $[k-1,k]$ gives
\[
f(k) - f(k-1) \le f'(k-1).
\]
Summing the result on $k = 3, \dots, n+1$ then shows,
\[
f(n+1) - f(2) \le \sum_{k=2}^n f'(k)
\]
and thus it follows that
\[
\int_1^n f(x)\,dx \le \sum_{k=2}^n f(k) - \frac12(f(n+1) - f(2))
= s_n - \frac12\left[f(n+1) + 2f(1)\right] + \frac12 f(2)
\le s_n - \frac12\left[f(n) + 2f(1)\right] + \frac12 f(2).
\]
Example 7.65 (Approximating $n!$). Let us take $f(x) = \ln x$ and recall that
\[
\int_1^n\ln x\,dx = n\ln n - n + 1.
\]
Thus we may conclude that
\[
s_n - \frac12\ln n \le n\ln n - n + 1 \le s_n - \frac12\ln n + \frac12\ln 2.
\]
Thus it follows that
\[
\left(n + \frac12\right)\ln n - n + 1 - \ln\sqrt{2} \le s_n \le \left(n + \frac12\right)\ln n - n + 1.
\]
Since $s_n = \ln(n!)$, exponentiating this identity then implies,
\[
\frac{e}{\sqrt{2}}\,e^{-n}n^{n+1/2} \le n! \le e\,e^{-n}n^{n+1/2}
\]
which compares well with Stirling's formula (Theorem 7.60) which states,
\[
n! \sim \sqrt{2\pi}\,e^{-n}n^{n+1/2}.
\]
Observe that
\[
\frac{e}{\sqrt{2}} \cong 1.922 \le \sqrt{2\pi} \cong 2.506 \le e \cong 2.718.
\]
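These elementary bounds can be confirmed directly (a sanity check, not part of the notes):

```python
import math

def primitive_bounds(n):
    """The bounds (e/sqrt(2)) e^{-n} n^{n+1/2} <= n! <= e e^{-n} n^{n+1/2}."""
    base = math.exp(-n) * n ** (n + 0.5)
    return (math.e / math.sqrt(2)) * base, math.e * base
```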
7.7 Comparison of the Lebesgue and the Riemann Integral*
For the rest of this chapter, let $-\infty < a < b < \infty$ and $f : [a,b] \to \mathbb{R}$ be a bounded function. A partition of $[a,b]$ is a finite subset $\pi \subset [a,b]$ containing $\{a,b\}$. To each partition
\[
\pi = \{a = t_0 < t_1 < \cdots < t_n = b\} \tag{7.61}
\]
of $[a,b]$ let
\[
\mathrm{mesh}(\pi) := \max\{|t_j - t_{j-1}| : j = 1, \dots, n\},
\]
\[
M_j = \sup\{f(x) : t_{j-1} \le x \le t_j\},\quad m_j = \inf\{f(x) : t_{j-1} \le x \le t_j\},
\]
\[
G_\pi = f(a)1_{\{a\}} + \sum_1^n M_j\,1_{(t_{j-1},t_j]},\quad
g_\pi = f(a)1_{\{a\}} + \sum_1^n m_j\,1_{(t_{j-1},t_j]}
\]
and
\[
S_\pi f = \sum M_j(t_j - t_{j-1}) \text{ and } s_\pi f = \sum m_j(t_j - t_{j-1}).
\]
Notice that
\[
S_\pi f = \int_a^b G_\pi\,dm \text{ and } s_\pi f = \int_a^b g_\pi\,dm.
\]
The upper and lower Riemann integrals are defined respectively by
\[
\overline{\int_a^b}f(x)\,dx = \inf_\pi S_\pi f \text{ and } \underline{\int_a^b}f(x)\,dx = \sup_\pi s_\pi f.
\]
Denition 7.66. The function f is Riemann integrable i
_
b
a
f =
_
b
a
f 1
and which case the Riemann integral
_
b
a
f is dened to be the common value:
_
b
a
f(x)dx =
_
b
a
f(x)dx =
_
b
a
f(x)dx.
The proof of the following Lemma is left to the reader as Exercise 7.29.
Lemma 7.67. If $\pi'$ and $\pi$ are two partitions of $[a,b]$ and $\pi \subset \pi'$ then
\[
G_\pi \ge G_{\pi'} \ge f \ge g_{\pi'} \ge g_\pi \text{ and } S_\pi f \ge S_{\pi'}f \ge s_{\pi'}f \ge s_\pi f.
\]
There exists an increasing sequence of partitions $\{\pi_k\}_{k=1}^\infty$ such that $\mathrm{mesh}(\pi_k) \downarrow 0$ and
\[
S_{\pi_k}f \downarrow \overline{\int_a^b}f \text{ and } s_{\pi_k}f \uparrow \underline{\int_a^b}f \text{ as } k \to \infty.
\]
If we let
\[
G := \lim_{k\to\infty}G_{\pi_k} \text{ and } g := \lim_{k\to\infty}g_{\pi_k} \tag{7.62}
\]
then by the dominated convergence theorem,
\[
\int_{[a,b]}g\,dm = \lim_{k\to\infty}\int_{[a,b]}g_{\pi_k}\,dm = \lim_{k\to\infty}s_{\pi_k}f = \underline{\int_a^b}f(x)\,dx \tag{7.63}
\]
and
\[
\int_{[a,b]}G\,dm = \lim_{k\to\infty}\int_{[a,b]}G_{\pi_k}\,dm = \lim_{k\to\infty}S_{\pi_k}f = \overline{\int_a^b}f(x)\,dx. \tag{7.64}
\]
Notation 7.68 For $x \in [a,b]$, let
\[
H(x) = \limsup_{y\to x}f(y) := \lim_{\varepsilon\downarrow 0}\sup\{f(y) : |y-x| \le \varepsilon,\ y \in [a,b]\}
\]
and
\[
h(x) = \liminf_{y\to x}f(y) := \lim_{\varepsilon\downarrow 0}\inf\{f(y) : |y-x| \le \varepsilon,\ y \in [a,b]\}.
\]
Lemma 7.69. The functions $H, h : [a,b] \to \mathbb{R}$ satisfy:

1. $h(x) \le f(x) \le H(x)$ for all $x \in [a,b]$ and $h(x) = H(x)$ iff $f$ is continuous at $x$.
2. If $\{\pi_k\}_{k=1}^\infty$ is any increasing sequence of partitions such that $\mathrm{mesh}(\pi_k) \downarrow 0$ and $G$ and $g$ are defined as in Eq. (7.62), then
\[
G(x) = H(x) \ge f(x) \ge h(x) = g(x)\ \ \forall\,x \notin \pi := \cup_{k=1}^\infty\pi_k. \tag{7.65}
\]
(Note $\pi$ is a countable set.)
3. $H$ and $h$ are Borel measurable.
Proof. Let $G_k := G_{\pi_k} \downarrow G$ and $g_k := g_{\pi_k} \uparrow g$.

1. It is clear that $h(x) \le f(x) \le H(x)$ for all $x$ and $H(x) = h(x)$ iff $\lim_{y\to x}f(y)$ exists and is equal to $f(x)$. That is, $H(x) = h(x)$ iff $f$ is continuous at $x$.
2. For $x \notin \pi$,
\[
G_k(x) \ge H(x) \ge f(x) \ge h(x) \ge g_k(x)\ \ \forall\,k
\]
and letting $k \to \infty$ in this equation implies
\[
G(x) \ge H(x) \ge f(x) \ge h(x) \ge g(x)\ \ \forall\,x \notin \pi. \tag{7.66}
\]
Moreover, given $\varepsilon > 0$ and $x \notin \pi$,
\[
\sup\{f(y) : |y-x| \le \varepsilon,\ y \in [a,b]\} \ge G_k(x)
\]
for all $k$ large enough, since eventually $G_k(x)$ is the supremum of $f(y)$ over some interval contained in $[x-\varepsilon, x+\varepsilon]$. Again letting $k \to \infty$ implies $\sup_{|y-x|\le\varepsilon}f(y) \ge G(x)$ and therefore, that
\[
H(x) = \limsup_{y\to x}f(y) \ge G(x)
\]
for all $x \notin \pi$. Combining this equation with Eq. (7.66) then implies $H(x) = G(x)$ if $x \notin \pi$. A similar argument shows that $h(x) = g(x)$ if $x \notin \pi$ and hence Eq. (7.65) is proved.
3. The functions $G$ and $g$ are limits of measurable functions and hence measurable. Since $H = G$ and $h = g$ except possibly on the countable set $\pi$, both $H$ and $h$ are also Borel measurable. (You justify this statement.)
Theorem 7.70. Let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then
\[
\overline{\int_a^b}f = \int_{[a,b]}H\,dm \text{ and } \underline{\int_a^b}f = \int_{[a,b]}h\,dm \tag{7.67}
\]
and the following statements are equivalent:

1. $H(x) = h(x)$ for $m$-a.e. $x$,
2. the set
\[
E := \{x \in [a,b] : f \text{ is discontinuous at } x\}
\]
is an $m$-null set,
3. $f$ is Riemann integrable.

If $f$ is Riemann integrable then $f$ is Lebesgue measurable$^2$, i.e. $f$ is $\mathcal{L}/\mathcal{B}$-measurable where $\mathcal{L}$ is the Lebesgue $\sigma$-algebra and $\mathcal{B}$ is the Borel $\sigma$-algebra on $[a,b]$. Moreover if we let $\bar m$ denote the completion of $m$, then
\[
\int_{[a,b]}H\,dm = \int_a^b f(x)\,dx = \int_{[a,b]}f\,d\bar m = \int_{[a,b]}h\,dm. \tag{7.68}
\]

$^2$ $f$ need not be Borel measurable.

Proof. Let $\{\pi_k\}_{k=1}^\infty$ be an increasing sequence of partitions of $[a,b]$ as described in Lemma 7.67 and let $G$ and $g$ be defined as in Lemma 7.69. Since $m(\pi) = 0$, $H = G$ a.e. and $h = g$ a.e., so Eq. (7.67) is a consequence of Eqs. (7.63) and (7.64). From Eq. (7.67), $f$ is Riemann integrable iff
\[
\int_{[a,b]}H\,dm = \int_{[a,b]}h\,dm
\]
and because $h \le f \le H$ this happens iff $h(x) = H(x)$ for $m$-a.e. $x$. Since $E = \{x : H(x) \ne h(x)\}$, this last condition is equivalent to $E$ being an $m$-null set. In light of these results and Eq. (7.65), the remaining assertions, including Eq. (7.68), are now consequences of Lemma 7.73.

Notation 7.71 In view of this theorem we will often write $\int_a^b f(x)\,dx$ for $\int_a^b f\,dm$.
7.8 Measurability on Complete Measure Spaces*
In this subsection we will discuss a couple of measurability results concerning completions of measure spaces.

Proposition 7.72. Suppose that $(X, \mathcal{B}, \mu)$ is a complete measure space$^3$ and $f : X \to \mathbb{R}$ is measurable.

1. If $g : X \to \mathbb{R}$ is a function such that $f(x) = g(x)$ for $\mu$-a.e. $x$, then $g$ is measurable.
2. If $f_n : X \to \mathbb{R}$ are measurable and $f : X \to \mathbb{R}$ is a function such that $\lim_{n\to\infty}f_n = f$, $\mu$-a.e., then $f$ is measurable as well.

Proof. 1. Let $E = \{x : f(x) \ne g(x)\}$ which is assumed to be in $\mathcal{B}$ and $\mu(E) = 0$. Then $g = 1_{E^c}f + 1_E g$ since $f = g$ on $E^c$. Now $1_{E^c}f$ is measurable so $g$ will be measurable if we show $1_E g$ is measurable. For this consider,
\[
(1_E g)^{-1}(A) =
\begin{cases}
E^c \cup (1_E g)^{-1}(A\setminus\{0\}) & \text{if } 0 \in A \\
(1_E g)^{-1}(A) & \text{if } 0 \notin A.
\end{cases} \tag{7.69}
\]
Since $(1_E g)^{-1}(B) \subset E$ if $0 \notin B$ and $\mu(E) = 0$, it follows by completeness of $\mathcal{B}$ that $(1_E g)^{-1}(B) \in \mathcal{B}$ if $0 \notin B$. Therefore Eq. (7.69) shows that $1_E g$ is measurable.

2. Let $E = \{x : \lim_{n\to\infty}f_n(x) \ne f(x)\}$; by assumption $E \in \mathcal{B}$ and $\mu(E) = 0$. Since $g := 1_{E^c}f = \lim_{n\to\infty}1_{E^c}f_n$, $g$ is measurable. Because $f = g$ on $E^c$ and $\mu(E) = 0$, $f = g$ a.e. so by part 1. $f$ is also measurable.

The above results are in general false if $(X, \mathcal{B}, \mu)$ is not complete. For example, let $X = \{0,1,2\}$, $\mathcal{B} = \{\{0\}, \{1,2\}, X, \emptyset\}$ and $\mu = \delta_0$. Take $g(0) = 0$, $g(1) = 1$, $g(2) = 2$; then $g = 0$ a.e. yet $g$ is not measurable.
Lemma 7.73. Suppose that $(X, \mathcal{M}, \mu)$ is a measure space, $\bar{\mathcal{M}}$ is the completion of $\mathcal{M}$ relative to $\mu$, and $\bar\mu$ is the extension of $\mu$ to $\bar{\mathcal{M}}$. Then a function $f : X \to \mathbb{R}$ is $(\bar{\mathcal{M}}, \mathcal{B} = \mathcal{B}_{\mathbb{R}})$-measurable iff there exists a function $g : X \to \mathbb{R}$ that is $(\mathcal{M},\mathcal{B})$-measurable such that $E := \{x : f(x) \ne g(x)\} \in \bar{\mathcal{M}}$ and $\bar\mu(E) = 0$, i.e. $f(x) = g(x)$ for $\bar\mu$-a.e. $x$. Moreover for such a pair $f$ and $g$, $f \in L^1(\bar\mu)$ iff $g \in L^1(\mu)$, in which case
\[
\int_X f\,d\bar\mu = \int_X g\,d\mu. \tag{7.70}
\]

$^3$ Recall this means that if $N \subset X$ is a set such that $N \subset A \in \mathcal{B}$ and $\mu(A) = 0$, then $N \in \mathcal{B}$ as well.
Proof. Suppose first that such a function $g$ exists so that $\bar\mu(E) = 0$. Since $g$ is also $(\bar{\mathcal{M}},\mathcal{B})$-measurable, we see from Proposition 7.72 that $f$ is $(\bar{\mathcal{M}},\mathcal{B})$-measurable. Conversely if $f$ is $(\bar{\mathcal{M}},\mathcal{B})$-measurable, by considering $f_\pm$ we may assume that $f \ge 0$. Choose $(\bar{\mathcal{M}},\mathcal{B})$-measurable simple functions $\varphi_n \ge 0$ such that $\varphi_n \uparrow f$ as $n \to \infty$. Writing
\[
\varphi_n = \sum a_k 1_{A_k}
\]
with $A_k \in \bar{\mathcal{M}}$, we may choose $B_k \in \mathcal{M}$ such that $B_k \subset A_k$ and $\bar\mu(A_k\setminus B_k) = 0$. Letting
\[
\tilde\varphi_n := \sum a_k 1_{B_k}
\]
we have produced a $(\mathcal{M},\mathcal{B})$-measurable simple function $\tilde\varphi_n \ge 0$ such that $E_n := \{\varphi_n \ne \tilde\varphi_n\}$ has zero measure. Since $\bar\mu\left(\cup_n E_n\right) \le \sum_n\bar\mu(E_n)$, there exists $F \in \mathcal{M}$ such that $\cup_n E_n \subset F$ and $\mu(F) = 0$. It now follows that
\[
1_{F^c}\,\varphi_n = 1_{F^c}\,\tilde\varphi_n \uparrow g := 1_{F^c}f \text{ as } n \to \infty.
\]
This shows that $g = 1_{F^c}f$ is $(\mathcal{M},\mathcal{B})$-measurable and that $\{f \ne g\} \subset F$ has measure zero. Since $f = g$, $\bar\mu$-a.e., $\int_X f\,d\bar\mu = \int_X g\,d\bar\mu$, so to prove Eq. (7.70) it suffices to prove
\[
\int_X g\,d\bar\mu = \int_X g\,d\mu.
\]
Because $\bar\mu = \mu$ on $\mathcal{M}$, this identity is easily verified for non-negative $\mathcal{M}$-measurable simple functions. Then by the monotone convergence theorem and the approximation Theorem 6.39 it holds for all $\mathcal{M}$-measurable functions $g : X \to [0,\infty]$. The rest of the assertions follow in the standard way by considering $(\mathrm{Re}\,g)_\pm$ and $(\mathrm{Im}\,g)_\pm$.
7.9 More Exercises
Exercise 7.19. Let $\mu$ be a measure on an algebra $\mathcal{A} \subset 2^X$; then $\mu(A) + \mu(B) = \mu(A\cup B) + \mu(A\cap B)$ for all $A, B \in \mathcal{A}$.
Exercise 7.20 (From problem 12 on p. 27 of Folland.). Let $(X, \mathcal{M}, \mu)$ be a finite measure space and for $A, B \in \mathcal{M}$ let $\rho(A,B) = \mu(A\Delta B)$ where $A\Delta B = (A\setminus B)\cup(B\setminus A)$. It is clear that $\rho(A,B) = \rho(B,A)$. Show:

1. $\rho$ satisfies the triangle inequality:
\[
\rho(A,C) \le \rho(A,B) + \rho(B,C) \text{ for all } A, B, C \in \mathcal{M}.
\]
2. Define $A \sim B$ iff $\mu(A\Delta B) = 0$ and notice that $\rho(A,B) = 0$ iff $A \sim B$. Show $\sim$ is an equivalence relation.
3. Let $\mathcal{M}/\!\sim$ denote $\mathcal{M}$ modulo the equivalence relation, $\sim$, and let $[A] := \{B \in \mathcal{M} : B \sim A\}$. Show that $\bar\rho([A],[B]) := \rho(A,B)$ gives a well defined metric on $\mathcal{M}/\!\sim$.
4. Similarly show $\tilde\mu([A]) = \mu(A)$ is a well defined function on $\mathcal{M}/\!\sim$ and show $\tilde\mu : (\mathcal{M}/\!\sim) \to \mathbb{R}_+$ is $\bar\rho$-continuous.
Exercise 7.21. Suppose that $\mu_n : \mathcal{M} \to [0,\infty]$ are measures on $\mathcal{M}$ for $n \in \mathbb{N}$. Also suppose that $\mu_n(A)$ is increasing in $n$ for all $A \in \mathcal{M}$. Prove that $\mu : \mathcal{M} \to [0,\infty]$ defined by $\mu(A) := \lim_{n\to\infty}\mu_n(A)$ is also a measure.
Exercise 7.22. Now suppose that $\Lambda$ is some index set and for each $\lambda \in \Lambda$, $\mu_\lambda : \mathcal{M} \to [0,\infty]$ is a measure on $\mathcal{M}$. Define $\mu : \mathcal{M} \to [0,\infty]$ by $\mu(A) = \sum_{\lambda\in\Lambda}\mu_\lambda(A)$ for each $A \in \mathcal{M}$. Show that $\mu$ is also a measure.
Exercise 7.23. Let $(X, \mathcal{B}, \mu)$ be a measure space and $\{A_n\}_{n=1}^\infty \subset \mathcal{B},$ show
$$\mu(\{A_n \text{ a.a.}\}) \leq \liminf_{n\to\infty} \mu(A_n)$$
and if $\mu(\cup_{m \geq n} A_m) < \infty$ for some $n,$ then
$$\mu(\{A_n \text{ i.o.}\}) \geq \limsup_{n\to\infty} \mu(A_n).$$
Exercise 7.24 (Folland 2.13 on p. 52.). Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of non-negative measurable functions such that $f_n \to f$ pointwise and
$$\lim_{n\to\infty} \int f_n = \int f < \infty.$$
Then
$$\int_E f = \lim_{n\to\infty} \int_E f_n$$
for all measurable sets $E \in \mathcal{B}.$ The conclusion need not hold if $\lim_{n\to\infty} \int f_n = \int f = \infty.$ Hint: "Fatou times two."
Exercise 7.25. Give examples of measurable functions $f_n$ on $\mathbb{R}$ such that $f_n$ decreases to $0$ uniformly yet $\int f_n \, dm = \infty$ for all $n.$ Also give an example of a sequence of measurable functions $g_n$ on $[0, 1]$ such that $g_n \to 0$ while $\int g_n \, dm = 1$ for all $n.$
Exercise 7.26. Suppose $\{a_n\}_{n=-\infty}^\infty \subset \mathbb{C}$ is a summable sequence (i.e. $\sum_{n=-\infty}^\infty |a_n| < \infty$), then $f(\theta) := \sum_{n=-\infty}^\infty a_n e^{in\theta}$ is a continuous function for $\theta \in \mathbb{R}$ and
$$a_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta) e^{-in\theta} \, d\theta.$$
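The coefficient formula in Exercise 7.26 is easy to corroborate numerically for a trigonometric polynomial. In the sketch below (an illustration, not part of the text; the three nonzero coefficients are a made-up example) the integral is approximated by a Riemann sum over $[-\pi, \pi].$

```python
import cmath
import math

# Hypothetical summable sequence: a_n = 0 except for the three entries below.
a = {-1: 0.5, 0: 2.0, 3: -1.25}

def f(theta):
    # f(theta) = sum_n a_n e^{i n theta}
    return sum(c * cmath.exp(1j * n * theta) for n, c in a.items())

def coeff(n, steps=20000):
    # a_n = (1/(2 pi)) * integral_{-pi}^{pi} f(theta) e^{-i n theta} dtheta,
    # approximated by a left Riemann sum with `steps` equispaced nodes.
    h = 2 * math.pi / steps
    total = sum(f(-math.pi + k * h) * cmath.exp(-1j * n * (-math.pi + k * h))
                for k in range(steps))
    return total * h / (2 * math.pi)

print(abs(coeff(3) - a[3]))   # recovery error is tiny
print(abs(coeff(1)))          # a_1 = 0 is recovered as well
```

For a finite trigonometric polynomial the equispaced Riemann sum is essentially exact, since the sum of $e^{im\theta}$ over equispaced nodes vanishes unless $m$ is a multiple of the node count.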
Exercise 7.27. For any function $f \in L^1(m),$ show $x \to \int_{(-\infty, x]} f(t) \, dm(t)$ is continuous in $x.$ Also find a finite measure, $\mu,$ on $\mathcal{B}_{\mathbb{R}}$ such that $x \to \int_{(-\infty, x]} f(t) \, d\mu(t)$ is not continuous.
Exercise 7.28. Folland 2.31b and 2.31e on p. 60. (The answer in 2.13b is wrong by a factor of $-1$ and the sum is on $k = 1$ to $\infty.$ In part (e), $s$ should be taken to be $a.$ You may also freely use the Taylor series expansion
$$(1 - z)^{-1/2} = \sum_{n=0}^\infty \frac{(2n-1)!!}{2^n \, n!} z^n = \sum_{n=0}^\infty \frac{(2n)!}{4^n (n!)^2} z^n \text{ for } |z| < 1.)$$
Exercise 7.29. Prove Lemma 7.67.
8
Functional Forms of the π-λ Theorem
In this chapter we will develop a very useful function analogue of the π-λ theorem. The results in this section will be used often in the sequel.
8.1 Multiplicative System Theorems
Notation 8.1 Let $\Omega$ be a set and $\mathbb{H}$ be a subset of the bounded real valued functions on $\Omega.$ We say that $\mathbb{H}$ is closed under bounded convergence if, for every sequence, $\{f_n\}_{n=1}^\infty \subset \mathbb{H},$ satisfying:
1. there exists $M < \infty$ such that $|f_n(\omega)| \leq M$ for all $\omega \in \Omega$ and $n \in \mathbb{N},$
2. $f(\omega) := \lim_{n\to\infty} f_n(\omega)$ exists for all $\omega \in \Omega,$ then $f \in \mathbb{H}.$
A subset, $\mathbb{M},$ of $\mathbb{H}$ is called a multiplicative system if $\mathbb{M}$ is closed under finite pointwise products.
The following result may be found in Dellacherie [11, p. 14]. The style of
proof given here may be found in Janson [26, Appendix A., p. 309].
Theorem 8.2 (Dynkin's Multiplicative System Theorem). Suppose that $\mathbb{H}$ is a vector subspace of bounded functions from $\Omega$ to $\mathbb{R}$ which contains the constant functions and is closed under bounded convergence. If $\mathbb{M} \subset \mathbb{H}$ is a multiplicative system, then $\mathbb{H}$ contains all bounded $\sigma(\mathbb{M})$-measurable functions.
Proof. In this proof, we may (and do) assume that $\mathbb{H}$ is the smallest subspace of bounded functions on $\Omega$ which contains the constant functions, contains $\mathbb{M},$ and is closed under bounded convergence. (As usual such a space exists by taking the intersection of all such spaces.) The remainder of the proof will be broken into four steps.

Step 1. ($\mathbb{H}$ is an algebra of functions.) For $f \in \mathbb{H},$ let $\mathbb{H}_f := \{g \in \mathbb{H} : gf \in \mathbb{H}\}.$ The reader will now easily verify that $\mathbb{H}_f$ is a linear subspace of $\mathbb{H},$ $1 \in \mathbb{H}_f,$ and $\mathbb{H}_f$ is closed under bounded convergence. Moreover if $f \in \mathbb{M},$ since $\mathbb{M}$ is a multiplicative system, $\mathbb{M} \subset \mathbb{H}_f.$ Hence by the definition of $\mathbb{H},$ $\mathbb{H} = \mathbb{H}_f,$ i.e. $fg \in \mathbb{H}$ for all $f \in \mathbb{M}$ and $g \in \mathbb{H}.$ Having proved this it now follows for any $f \in \mathbb{H}$ that $\mathbb{M} \subset \mathbb{H}_f$ and therefore as before, $\mathbb{H}_f = \mathbb{H}.$ Thus we may conclude that $fg \in \mathbb{H}$ whenever $f, g \in \mathbb{H},$ i.e. $\mathbb{H}$ is an algebra of functions.
Step 2. ($\mathcal{B} := \{A \subset \Omega : 1_A \in \mathbb{H}\}$ is a $\sigma$-algebra.) Using the fact that $\mathbb{H}$ is an algebra containing constants, the reader will easily verify that $\mathcal{B}$ is closed under complementation, finite intersections, and contains $\Omega,$ i.e. $\mathcal{B}$ is an algebra. Using the fact that $\mathbb{H}$ is closed under bounded convergence, it follows that $\mathcal{B}$ is closed under increasing unions and hence that $\mathcal{B}$ is a $\sigma$-algebra.
Step 3. ($\mathbb{H}$ contains all bounded $\mathcal{B}$-measurable functions.) Since $\mathbb{H}$ is a vector space and $\mathbb{H}$ contains $1_A$ for all $A \in \mathcal{B},$ $\mathbb{H}$ contains all $\mathcal{B}$-measurable simple functions. Since every bounded $\mathcal{B}$-measurable function may be written as a bounded limit of such simple functions (see Theorem 6.39), it follows that $\mathbb{H}$ contains all bounded $\mathcal{B}$-measurable functions.
Step 4. ($\sigma(\mathbb{M}) \subset \mathcal{B}.$) Let $\varphi_n(x) = 0 \vee [(nx) \wedge 1]$ (see Figure 8.1 below) so that $\varphi_n(x) \uparrow 1_{x>0}.$ Given $f \in \mathbb{M}$ and $a \in \mathbb{R},$ let $F_n := \varphi_n(f - a)$ and $M := \sup_{\omega \in \Omega} |f(\omega) - a|.$ By the Weierstrass approximation Theorem 4.36, we may find polynomial functions, $p_l(x),$ such that $p_l \to \varphi_n$ uniformly on $[-M, M].$ Since $p_l$ is a polynomial and $\mathbb{H}$ is an algebra, $p_l(f - a) \in \mathbb{H}$ for all $l.$ Moreover, $p_l(f - a) \to F_n$ uniformly as $l \to \infty,$ from which it follows that $F_n \in \mathbb{H}$ for all $n.$ Since $F_n \uparrow 1_{\{f > a\}}$ it follows that $1_{\{f > a\}} \in \mathbb{H},$ i.e. $\{f > a\} \in \mathcal{B}.$ As the sets $\{f > a\}$ with $a \in \mathbb{R}$ and $f \in \mathbb{M}$ generate $\sigma(\mathbb{M}),$ it follows that $\sigma(\mathbb{M}) \subset \mathcal{B}.$
Fig. 8.1. Plots of $\varphi_1,$ $\varphi_2$ and $\varphi_3.$
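The cut-off functions used in Step 4 are easy to evaluate directly. The sketch below (an illustration, not part of the text) computes $\varphi_n(x) = 0 \vee [(nx) \wedge 1]$ and checks on sample points that it is increasing in $n$ and tends to the indicator of $\{x > 0\}.$

```python
def phi(n, x):
    # phi_n(x) = max(0, min(n*x, 1)): a piecewise-linear ramp from 0 to 1 on [0, 1/n].
    return max(0.0, min(n * x, 1.0))

# phi_n(x) is increasing in n and converges to 1_{x > 0}:
for x in (-0.5, 0.01, 2.0):
    vals = [phi(n, x) for n in (1, 10, 100, 1000)]
    assert all(vals[i] <= vals[i + 1] for i in range(len(vals) - 1))
    print(x, vals)
```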
Second proof.* (This proof may safely be skipped.) This proof will make use of Dynkin's π-λ Theorem 5.14. Let
$$\mathcal{L} := \{A \subset \Omega : 1_A \in \mathbb{H}\}.$$
We then have $\Omega \in \mathcal{L}$ since $1_\Omega = 1 \in \mathbb{H},$ if $A, B \in \mathcal{L}$ with $A \subset B$ then $B \setminus A \in \mathcal{L}$ since $1_{B \setminus A} = 1_B - 1_A \in \mathbb{H},$ and if $A_n \in \mathcal{L}$ with $A_n \uparrow A,$ then $A \in \mathcal{L}$ because $1_{A_n} \in \mathbb{H}$ and $1_{A_n} \to 1_A \in \mathbb{H}.$ Therefore $\mathcal{L}$ is a λ-system.

Let $\varphi_n(x) = 0 \vee [(nx) \wedge 1]$ (see Figure 8.1 above) so that $\varphi_n(x) \uparrow 1_{x>0}.$ Given $f_1, f_2, \ldots, f_k \in \mathbb{M}$ and $a_1, \ldots, a_k \in \mathbb{R},$ let
$$F_n := \prod_{i=1}^k \varphi_n(f_i - a_i)$$
and let
$$M := \sup_{i=1,\ldots,k} \sup_{\omega \in \Omega} |f_i(\omega) - a_i|.$$
By the Weierstrass approximation Theorem 4.36, we may find polynomial functions, $p_l(x),$ such that $p_l \to \varphi_n$ uniformly on $[-M, M].$ Since $p_l$ is a polynomial it is easily seen that $\prod_{i=1}^k p_l(f_i - a_i) \in \mathbb{H}.$ Moreover,
$$\prod_{i=1}^k p_l(f_i - a_i) \to F_n \text{ uniformly as } l \to \infty,$$
from which it follows that $F_n \in \mathbb{H}$ for all $n.$ Since
$$F_n \uparrow \prod_{i=1}^k 1_{\{f_i > a_i\}} = 1_{\cap_{i=1}^k \{f_i > a_i\}}$$
it follows that $1_{\cap_{i=1}^k \{f_i > a_i\}} \in \mathbb{H}$ or equivalently that $\cap_{i=1}^k \{f_i > a_i\} \in \mathcal{L}.$ Therefore $\mathcal{L}$ contains the π-system, $\mathcal{P},$ consisting of finite intersections of sets of the form, $\{f > a\}$ with $f \in \mathbb{M}$ and $a \in \mathbb{R}.$

As a consequence of the above paragraphs and the π-λ Theorem 5.14, $\mathcal{L}$ contains $\sigma(\mathcal{P}) = \sigma(\mathbb{M}).$ In particular it follows that $1_A \in \mathbb{H}$ for all $A \in \sigma(\mathbb{M}).$ Since any positive $\sigma(\mathbb{M})$-measurable function may be written as an increasing limit of simple functions (see Theorem 6.39), it follows that $\mathbb{H}$ contains all non-negative bounded $\sigma(\mathbb{M})$-measurable functions. Finally, since any bounded $\sigma(\mathbb{M})$-measurable function may be written as the difference of two such non-negative functions, it follows that $\mathbb{H}$ contains all bounded $\sigma(\mathbb{M})$-measurable functions.
Corollary 8.3. Suppose $\mathbb{H}$ is a subspace of bounded real valued functions such that $1 \in \mathbb{H}$ and $\mathbb{H}$ is closed under bounded convergence. If $\mathcal{P} \subset 2^\Omega$ is a multiplicative class such that $1_A \in \mathbb{H}$ for all $A \in \mathcal{P},$ then $\mathbb{H}$ contains all bounded $\sigma(\mathcal{P})$-measurable functions.

Proof. Let $\mathbb{M} = \{1\} \cup \{1_A : A \in \mathcal{P}\}.$ Then $\mathbb{M} \subset \mathbb{H}$ is a multiplicative system and the proof is completed with an application of Theorem 8.2.
Example 8.4. Suppose $\mu$ and $\nu$ are two probability measures on $(\Omega, \mathcal{B})$ such that
$$\int_\Omega f \, d\mu = \int_\Omega f \, d\nu  (8.1)$$
for all $f$ in a multiplicative subset, $\mathbb{M},$ of bounded measurable functions on $\Omega.$ Then $\mu = \nu$ on $\sigma(\mathbb{M}).$ Indeed, apply Theorem 8.2 with $\mathbb{H}$ being the bounded measurable functions on $\Omega$ such that Eq. (8.1) holds. In particular if $\mathbb{M} = \{1\} \cup \{1_A : A \in \mathcal{P}\}$ with $\mathcal{P}$ being a multiplicative class we learn that $\mu = \nu$ on $\sigma(\mathbb{M}) = \sigma(\mathcal{P}).$
Here is a complex version of Theorem 8.2.
Theorem 8.5 (Complex Multiplicative System Theorem). Suppose $\mathbb{H}$ is a complex linear subspace of the bounded complex functions on $\Omega,$ $1 \in \mathbb{H},$ $\mathbb{H}$ is closed under complex conjugation, and $\mathbb{H}$ is closed under bounded convergence. If $\mathbb{M} \subset \mathbb{H}$ is a multiplicative system which is closed under conjugation, then $\mathbb{H}$ contains all bounded complex valued $\sigma(\mathbb{M})$-measurable functions.
Proof. Let $\mathbb{M}_0 = \operatorname{span}_{\mathbb{C}}(\mathbb{M} \cup \{1\})$ be the complex span of $\mathbb{M}.$ As the reader should verify, $\mathbb{M}_0$ is an algebra, $\mathbb{M}_0 \subset \mathbb{H},$ $\mathbb{M}_0$ is closed under complex conjugation and $\sigma(\mathbb{M}_0) = \sigma(\mathbb{M}).$ Let
$$\mathbb{H}^{\mathbb{R}} := \{f \in \mathbb{H} : f \text{ is real valued}\} \text{ and } \mathbb{M}_0^{\mathbb{R}} := \{f \in \mathbb{M}_0 : f \text{ is real valued}\}.$$
Then $\mathbb{H}^{\mathbb{R}}$ is a real linear space of bounded real valued functions containing $1$ which is closed under bounded convergence and $\mathbb{M}_0^{\mathbb{R}} \subset \mathbb{H}^{\mathbb{R}}.$ Moreover, $\mathbb{M}_0^{\mathbb{R}}$ is a multiplicative system (as the reader should check) and therefore by Theorem 8.2, $\mathbb{H}^{\mathbb{R}}$ contains all bounded $\sigma(\mathbb{M}_0^{\mathbb{R}})$-measurable real valued functions. Since $\mathbb{H}$ and $\mathbb{M}_0$ are complex linear spaces closed under complex conjugation, for any $f \in \mathbb{H}$ or $f \in \mathbb{M}_0,$ the functions $\operatorname{Re} f = \frac{1}{2}(f + \bar{f})$ and $\operatorname{Im} f = \frac{1}{2i}(f - \bar{f})$ are in $\mathbb{H}$ or $\mathbb{M}_0$ respectively. Therefore $\mathbb{M}_0 = \mathbb{M}_0^{\mathbb{R}} + i\mathbb{M}_0^{\mathbb{R}},$ $\sigma(\mathbb{M}_0^{\mathbb{R}}) = \sigma(\mathbb{M}_0) = \sigma(\mathbb{M}),$ and $\mathbb{H} = \mathbb{H}^{\mathbb{R}} + i\mathbb{H}^{\mathbb{R}}.$ Hence if $f : \Omega \to \mathbb{C}$ is a bounded $\sigma(\mathbb{M})$-measurable function, then $f = \operatorname{Re} f + i \operatorname{Im} f \in \mathbb{H}$ since $\operatorname{Re} f$ and $\operatorname{Im} f$ are in $\mathbb{H}^{\mathbb{R}}.$
Lemma 8.6. Suppose that $-\infty < a < b < \infty$ and let $\operatorname{Trig}(\mathbb{R}) \subset C(\mathbb{R}, \mathbb{C})$ be the complex linear span of $\{x \to e^{i\lambda x} : \lambda \in \mathbb{R}\}.$ Then there exists $f_n \in C_c(\mathbb{R}, [0, 1])$ and $g_n \in \operatorname{Trig}(\mathbb{R})$ such that $\lim_{n\to\infty} f_n(x) = 1_{(a,b]}(x) = \lim_{n\to\infty} g_n(x)$ for all $x \in \mathbb{R}.$
Proof. The assertion involving $f_n \in C_c(\mathbb{R}, [0, 1])$ was the content of one of your homework assignments. For the assertion involving $g_n \in \operatorname{Trig}(\mathbb{R}),$ it will suffice to show that any $f \in C_c(\mathbb{R})$ may be written as $f(x) = \lim_{n\to\infty} g_n(x)$ for some $g_n \in \operatorname{Trig}(\mathbb{R})$ where the limit is uniform for $x$ in compact subsets of $\mathbb{R}.$

So suppose that $f \in C_c(\mathbb{R})$ and $L > 0$ such that $f(x) = 0$ if $|x| \geq L/4.$ Then
$$f_L(x) := \sum_{n=-\infty}^\infty f(x + nL)$$
is a continuous $L$-periodic function on $\mathbb{R},$ see Figure 8.2.

Fig. 8.2. This is a plot of $f_8(x)$ where $f(x) = (1 - x^2) 1_{|x| \leq 1}.$ The center hump by itself would be the plot of $f(x).$

If $\varepsilon > 0$ is given, we may apply Theorem 4.42 to find a finite set $\Lambda \subset \mathbb{Z}$ and coefficients $\{a_\lambda\}_{\lambda \in \Lambda} \subset \mathbb{C}$ such that
$$\left| f_L\left(\frac{L}{2\pi} x\right) - \sum_{\lambda \in \Lambda} a_\lambda e^{i\lambda x} \right| \leq \varepsilon \text{ for all } x \in \mathbb{R},$$
wherein we have used the fact that $x \to f_L\left(\frac{L}{2\pi} x\right)$ is a $2\pi$-periodic function of $x.$ Equivalently we have,
$$\max_{x \in \mathbb{R}} \left| f_L(x) - \sum_{\lambda \in \Lambda} a_\lambda e^{i \frac{2\pi}{L} \lambda x} \right| \leq \varepsilon.$$
In particular it follows that $f_L(x)$ is a uniform limit of functions from $\operatorname{Trig}(\mathbb{R}).$ Since $\lim_{L\to\infty} f_L(x) = f(x)$ uniformly on compact subsets of $\mathbb{R},$ it is easy to conclude there exists $g_n \in \operatorname{Trig}(\mathbb{R})$ such that $\lim_{n\to\infty} g_n(x) = f(x)$ uniformly on compact subsets of $\mathbb{R}.$
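The periodization $f_L$ in the proof above can be sketched numerically. The code below (an illustration, not part of the text) uses the bump $f(x) = (1 - x^2) 1_{|x| \leq 1}$ from Figure 8.2; since $f$ has compact support, the infinite sum reduces to finitely many nonzero terms.

```python
def f(x):
    # The bump from Figure 8.2: f(x) = (1 - x^2) on |x| <= 1 and 0 otherwise.
    return (1.0 - x * x) if abs(x) <= 1.0 else 0.0

def f_L(x, L, N=50):
    # Truncation of the periodization sum_{n = -oo}^{oo} f(x + n L);
    # N = 50 terms is more than enough here since f has compact support.
    return sum(f(x + n * L) for n in range(-N, N + 1))

L = 8.0
# f_L is L-periodic, and for L = 8 the translated bumps do not overlap,
# so f_L agrees with f near the origin:
print(f_L(0.3, L), f_L(0.3 + L, L), f_L(0.3 - 3 * L, L))
```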
Corollary 8.7. Each of the following σ-algebras on $\mathbb{R}^d$ are equal to $\mathcal{B}_{\mathbb{R}^d}$;
1. $\mathcal{M}_1 := \sigma\left(\cup_{i=1}^d \{x \to f(x_i) : f \in C_c(\mathbb{R})\}\right),$
2. $\mathcal{M}_2 := \sigma\left(\{x \to f_1(x_1) \cdots f_d(x_d) : f_i \in C_c(\mathbb{R})\}\right),$
3. $\mathcal{M}_3 = \sigma\left(C_c(\mathbb{R}^d)\right),$ and
4. $\mathcal{M}_4 := \sigma\left(\{x \to e^{i\lambda \cdot x} : \lambda \in \mathbb{R}^d\}\right).$
Proof. As the functions defining each $\mathcal{M}_i$ are continuous and hence Borel measurable, it follows that $\mathcal{M}_i \subset \mathcal{B}_{\mathbb{R}^d}$ for each $i.$ So to finish the proof it suffices to show $\mathcal{B}_{\mathbb{R}^d} \subset \mathcal{M}_i$ for each $i.$

$\mathcal{M}_1$ case. Let $a, b \in \mathbb{R}$ with $-\infty < a < b < \infty.$ By Lemma 8.6, there exists $f_n \in C_c(\mathbb{R})$ such that $\lim_{n\to\infty} f_n = 1_{(a,b]}.$ Therefore it follows that $x \to 1_{(a,b]}(x_i)$ is $\mathcal{M}_1$-measurable for each $i.$ Moreover if $-\infty < a_i < b_i < \infty$ for each $i,$ then we may conclude that
$$x \to \prod_{i=1}^d 1_{(a_i,b_i]}(x_i) = 1_{(a_1,b_1] \times \cdots \times (a_d,b_d]}(x)$$
is $\mathcal{M}_1$-measurable as well and hence $(a_1, b_1] \times \cdots \times (a_d, b_d] \in \mathcal{M}_1.$ As such sets generate $\mathcal{B}_{\mathbb{R}^d}$ we may conclude that $\mathcal{B}_{\mathbb{R}^d} \subset \mathcal{M}_1$ and therefore $\mathcal{M}_1 = \mathcal{B}_{\mathbb{R}^d}.$

$\mathcal{M}_2$ case. As above, we may find $f_{i,n} \to 1_{(a_i,b_i]}$ as $n \to \infty$ for each $1 \leq i \leq d$ and therefore,
$$1_{(a_1,b_1] \times \cdots \times (a_d,b_d]}(x) = \lim_{n\to\infty} f_{1,n}(x_1) \cdots f_{d,n}(x_d) \text{ for all } x \in \mathbb{R}^d.$$
This shows that $1_{(a_1,b_1] \times \cdots \times (a_d,b_d]}$ is $\mathcal{M}_2$-measurable and therefore $(a_1,b_1] \times \cdots \times (a_d,b_d] \in \mathcal{M}_2.$

$\mathcal{M}_3$ case. This is easy since $\mathcal{B}_{\mathbb{R}^d} = \mathcal{M}_2 \subset \mathcal{M}_3.$

$\mathcal{M}_4$ case. By Lemma 8.6 there exists $g_n \in \operatorname{Trig}(\mathbb{R})$ such that $\lim_{n\to\infty} g_n = 1_{(a,b]}.$ Since $x \to g_n(x_i)$ is in the span of $\{x \to e^{i\lambda \cdot x} : \lambda \in \mathbb{R}^d\}$ for each $n,$ it follows that $x \to 1_{(a,b]}(x_i)$ is $\mathcal{M}_4$-measurable for all $-\infty < a < b < \infty.$ Therefore, just as in the proof of case 1., we may now conclude that $\mathcal{B}_{\mathbb{R}^d} \subset \mathcal{M}_4.$
Corollary 8.8. Suppose that $\mathbb{H}$ is a subspace of complex valued functions on $\mathbb{R}^d$ which is closed under complex conjugation and bounded convergence. If $\mathbb{H}$ contains any one of the following collections of functions;
1. $\mathbb{M} := \{x \to f_1(x_1) \cdots f_d(x_d) : f_i \in C_c(\mathbb{R})\},$
2. $\mathbb{M} := C_c(\mathbb{R}^d),$ or
3. $\mathbb{M} := \{x \to e^{i\lambda \cdot x} : \lambda \in \mathbb{R}^d\},$
then $\mathbb{H}$ contains all bounded complex Borel measurable functions on $\mathbb{R}^d.$
Proof. Observe that if $f \in C_c(\mathbb{R})$ such that $f(x) = 1$ in a neighborhood of $0,$ then $f_n(x) := f(x/n) \to 1$ as $n \to \infty.$ Therefore in cases 1. and 2., $\mathbb{H}$ contains the constant function, $1,$ since
$$1 = \lim_{n\to\infty} f_n(x_1) \cdots f_n(x_d).$$
In case 3., $1 \in \mathbb{M} \subset \mathbb{H}$ as well. The result now follows from Theorem 8.5 and Corollary 8.7.
Proposition 8.9 (Change of Variables Formula). Suppose that $-\infty < a < b < \infty$ and $u : [a, b] \to \mathbb{R}$ is a continuously differentiable function. Let $[c, d] = u([a, b])$ where $c = \min u([a, b])$ and $d = \max u([a, b]).$ (By the intermediate value theorem $u([a, b])$ is an interval.) Then for all bounded measurable functions, $f : [c, d] \to \mathbb{R}$ we have
$$\int_{u(a)}^{u(b)} f(x) \, dx = \int_a^b f(u(t)) \, \dot{u}(t) \, dt.  (8.2)$$
Moreover, Eq. (8.2) is also valid if $f : [c, d] \to \mathbb{R}$ is measurable and
$$\int_a^b |f(u(t))| \, |\dot{u}(t)| \, dt < \infty.  (8.3)$$
Proof. Let $\mathbb{H}$ denote the space of bounded measurable functions such that Eq. (8.2) holds. It is easily checked that $\mathbb{H}$ is a linear space closed under bounded convergence. Next we show that $\mathbb{M} = C([c, d], \mathbb{R}) \subset \mathbb{H}$ which coupled with Corollary 8.8 will show that $\mathbb{H}$ contains all bounded measurable functions from $[c, d]$ to $\mathbb{R}.$

Let $f : [c, d] \to \mathbb{R}$ be a continuous function and let $F$ be an anti-derivative of $f.$ Then by the fundamental theorem of calculus,
$$\int_a^b f(u(t)) \, \dot{u}(t) \, dt = \int_a^b F'(u(t)) \, \dot{u}(t) \, dt = \int_a^b \frac{d}{dt} F(u(t)) \, dt = F(u(t)) \Big|_a^b = F(u(b)) - F(u(a)) = \int_{u(a)}^{u(b)} F'(x) \, dx = \int_{u(a)}^{u(b)} f(x) \, dx.$$
Thus $\mathbb{M} \subset \mathbb{H}$ and the first assertion of the proposition is proved.

Now suppose that $f : [c, d] \to \mathbb{R}$ is measurable and Eq. (8.3) holds. For $M < \infty,$ let $f^M(x) := f(x) 1_{|f(x)| \leq M}$, a bounded measurable function. Therefore applying Eq. (8.2) with $f$ replaced by $|f^M|$ shows,
$$\left| \int_{u(a)}^{u(b)} |f^M(x)| \, dx \right| = \left| \int_a^b |f^M(u(t))| \, \dot{u}(t) \, dt \right| \leq \int_a^b |f^M(u(t))| \, |\dot{u}(t)| \, dt.$$
Using the MCT, we may let $M \uparrow \infty$ in the previous inequality to learn
$$\left| \int_{u(a)}^{u(b)} |f(x)| \, dx \right| \leq \int_a^b |f(u(t))| \, |\dot{u}(t)| \, dt < \infty.$$
Now apply Eq. (8.2) with $f$ replaced by $f^M$ to learn
$$\int_{u(a)}^{u(b)} f^M(x) \, dx = \int_a^b f^M(u(t)) \, \dot{u}(t) \, dt.$$
Using the DCT we may now let $M \to \infty$ in this equation to show that Eq. (8.2) remains valid.
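Equation (8.2) is easy to test numerically. The sketch below (an illustration, not part of the text; the choices $u(t) = t^2$ and $f = \cos$ are hypothetical) compares the two sides using a midpoint-rule quadrature.

```python
import math

def u(t):
    # A hypothetical smooth change of variables u(t) = t^2 on [0, 2].
    return t * t

def du(t):
    # Its derivative, u'(t) = 2 t.
    return 2.0 * t

def f(x):
    return math.cos(x)

def midpoint(g, a, b, n=100000):
    # Midpoint-rule quadrature; accurate enough for a smoke test.
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

a, b = 0.0, 2.0
lhs = midpoint(f, u(a), u(b))                    # integral_{u(a)}^{u(b)} f(x) dx
rhs = midpoint(lambda t: f(u(t)) * du(t), a, b)  # integral_a^b f(u(t)) u'(t) dt
print(lhs, rhs)
```

Here the exact common value is $\sin(4) - \sin(0),$ which both quadratures reproduce to high accuracy.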
Exercise 8.1. Suppose that $u : \mathbb{R} \to \mathbb{R}$ is a continuously differentiable function such that $\dot{u}(t) \geq 0$ for all $t$ and $\lim_{t\to\pm\infty} u(t) = \pm\infty.$ Show that
$$\int_{\mathbb{R}} f(x) \, dx = \int_{\mathbb{R}} f(u(t)) \, \dot{u}(t) \, dt  (8.4)$$
for all measurable functions $f : \mathbb{R} \to [0, \infty].$ In particular applying this result to $u(t) = at + b$ where $a > 0$ implies,
$$\int_{\mathbb{R}} f(x) \, dx = a \int_{\mathbb{R}} f(at + b) \, dt.$$
Definition 8.10. The Fourier transform or characteristic function of a finite measure, $\mu,$ on $(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d}),$ is the function, $\hat{\mu} : \mathbb{R}^d \to \mathbb{C}$ defined by
$$\hat{\mu}(\lambda) := \int_{\mathbb{R}^d} e^{i\lambda \cdot x} \, d\mu(x) \text{ for all } \lambda \in \mathbb{R}^d.$$
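For a discrete measure the defining integral reduces to a finite sum, which makes the basic properties of $\hat{\mu}$ easy to check. The weights and atoms below are a made-up example, not from the text.

```python
import cmath

# Hypothetical discrete probability measure:
# mu = 0.2 * delta_{-1} + 0.5 * delta_0 + 0.3 * delta_2.
points  = [-1.0, 0.0, 2.0]
weights = [0.2, 0.5, 0.3]

def mu_hat(lam):
    # mu_hat(lam) = integral e^{i lam x} dmu(x) = sum_k p_k e^{i lam x_k}
    return sum(p * cmath.exp(1j * lam * x) for x, p in zip(points, weights))

# Two basic facts: mu_hat(0) equals the total mass, here 1,
# and |mu_hat(lam)| <= 1 for a probability measure.
print(mu_hat(0.0))
print(abs(mu_hat(1.7)))
```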
Corollary 8.11. Suppose that $\mu$ and $\nu$ are two probability measures on $(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d}).$ Then any one of the next three conditions implies that $\mu = \nu$;
1. $\int_{\mathbb{R}^d} f_1(x_1) \cdots f_d(x_d) \, d\nu(x) = \int_{\mathbb{R}^d} f_1(x_1) \cdots f_d(x_d) \, d\mu(x)$ for all $f_i \in C_c(\mathbb{R}).$
2. $\int_{\mathbb{R}^d} f(x) \, d\nu(x) = \int_{\mathbb{R}^d} f(x) \, d\mu(x)$ for all $f \in C_c(\mathbb{R}^d).$
3. $\hat{\nu} = \hat{\mu}.$
Item 3. asserts that the Fourier transform is injective.
Proof. Let $\mathbb{H}$ be the collection of bounded complex measurable functions from $\mathbb{R}^d$ to $\mathbb{C}$ such that
$$\int_{\mathbb{R}^d} f \, d\mu = \int_{\mathbb{R}^d} f \, d\nu.  (8.5)$$
It is easily seen that $\mathbb{H}$ is a linear space closed under complex conjugation and bounded convergence (by the DCT). Since $\mathbb{H}$ contains one of the multiplicative systems appearing in Corollary 8.8, it contains all bounded Borel measurable functions from $\mathbb{R}^d$ to $\mathbb{C}.$ Thus we may take $f = 1_A$ with $A \in \mathcal{B}_{\mathbb{R}^d}$ in Eq. (8.5) to learn, $\mu(A) = \nu(A)$ for all $A \in \mathcal{B}_{\mathbb{R}^d}.$
In many cases we can replace the condition in item 3. of Corollary 8.11 by;
$$\int_{\mathbb{R}^d} e^{\lambda \cdot x} \, d\mu(x) = \int_{\mathbb{R}^d} e^{\lambda \cdot x} \, d\nu(x) \text{ for all } \lambda \in U,  (8.6)$$
where $U$ is a neighborhood of $0 \in \mathbb{R}^d.$ In order to do this, one must at least assume that the integrals involved are finite for all $\lambda \in U.$ The idea is to show that Condition (8.6) implies $\hat{\nu} = \hat{\mu}.$ You are asked to carry out this argument in Exercise 8.2 making use of the following lemma.
Lemma 8.12 (Analytic Continuation). Let $\varepsilon > 0$ and $S_\varepsilon := \{x + iy \in \mathbb{C} : |x| < \varepsilon\}$ be an $\varepsilon$ strip in $\mathbb{C}$ about the imaginary axis. Suppose that $h : S_\varepsilon \to \mathbb{C}$ is a function such that for each $b \in \mathbb{R},$ there exists $\{c_n(b)\}_{n=0}^\infty \subset \mathbb{C}$ such that
$$h(z + ib) = \sum_{n=0}^\infty c_n(b) \, z^n \text{ for all } |z| < \varepsilon.  (8.7)$$
If $c_n(0) = 0$ for all $n \in \mathbb{N}_0,$ then $h \equiv 0.$
Proof. It suffices to prove the following assertion; if for some $b \in \mathbb{R}$ we know that $c_n(b) = 0$ for all $n,$ then $c_n(y) = 0$ for all $n$ and $y \in (b - \varepsilon, b + \varepsilon).$ We now prove this assertion.

Let us assume that $b \in \mathbb{R}$ and $c_n(b) = 0$ for all $n \in \mathbb{N}_0.$ It then follows from Eq. (8.7) that $h(z + ib) = 0$ for all $|z| < \varepsilon.$ Thus if $|y - b| < \varepsilon,$ we may conclude that $h(x + iy) = 0$ for $x$ in a (possibly very small) neighborhood $(-\delta, \delta)$ of $0.$ Since
$$\sum_{n=0}^\infty c_n(y) \, x^n = h(x + iy) = 0 \text{ for all } |x| < \delta,$$
it follows that
$$0 = \frac{1}{n!} \frac{d^n}{dx^n} h(x + iy) \Big|_{x=0} = c_n(y)$$
and the proof is complete.
8.2 Exercises
Exercise 8.2. Suppose $\varepsilon > 0$ and $X$ and $Y$ are two random variables such that $\mathbb{E}\left[e^{tX}\right] = \mathbb{E}\left[e^{tY}\right] < \infty$ for all $|t| \leq \varepsilon.$ Show;
1. $\mathbb{E}\left[e^{\varepsilon|X|}\right]$ and $\mathbb{E}\left[e^{\varepsilon|Y|}\right]$ are finite.
2. $\mathbb{E}\left[e^{itX}\right] = \mathbb{E}\left[e^{itY}\right]$ for all $t \in \mathbb{R}.$ Hint: Consider $h(z) := \mathbb{E}\left[e^{zX}\right] - \mathbb{E}\left[e^{zY}\right]$ for $z \in S_\varepsilon.$ Now show for $|z| < \varepsilon$ and $b \in \mathbb{R},$ that
$$h(z + ib) = \mathbb{E}\left[e^{ibX} e^{zX}\right] - \mathbb{E}\left[e^{ibY} e^{zY}\right] = \sum_{n=0}^\infty c_n(b) \, z^n  (8.8)$$
where
$$c_n(b) := \frac{1}{n!} \left( \mathbb{E}\left[e^{ibX} X^n\right] - \mathbb{E}\left[e^{ibY} Y^n\right] \right).  (8.9)$$
3. Conclude from item 2. that $X \stackrel{d}{=} Y,$ i.e. that $\operatorname{Law}_P(X) = \operatorname{Law}_P(Y).$
Exercise 8.3. Let $(\Omega, \mathcal{B}, P)$ be a probability space and $X, Y : \Omega \to \mathbb{R}$ be a pair of random variables such that
$$\mathbb{E}[f(X) \, g(Y)] = \mathbb{E}[f(X) \, g(X)]$$
for every pair of bounded measurable functions, $f, g : \mathbb{R} \to \mathbb{R}.$ Show $P(X = Y) = 1.$ Hint: Let $\mathbb{H}$ denote the bounded Borel measurable functions, $h : \mathbb{R}^2 \to \mathbb{R},$ such that
$$\mathbb{E}[h(X, Y)] = \mathbb{E}[h(X, X)].$$
Use Theorem 8.2 to show $\mathbb{H}$ is the vector space of all bounded Borel measurable functions. Then take $h(x, y) = 1_{\{x = y\}}.$
Exercise 8.4 (Density of $\mathcal{A}$-simple functions). Let $(\Omega, \mathcal{B}, P)$ be a probability space and assume that $\mathcal{A}$ is a sub-algebra of $\mathcal{B}$ such that $\mathcal{B} = \sigma(\mathcal{A}).$ Let $\mathbb{H}$ denote the bounded measurable functions $f : \Omega \to \mathbb{R}$ such that for every $\varepsilon > 0$ there exists an $\mathcal{A}$-simple function, $\varphi : \Omega \to \mathbb{R},$ such that $\mathbb{E}|f - \varphi| < \varepsilon.$ Show $\mathbb{H}$ consists of all bounded measurable functions, $f : \Omega \to \mathbb{R}.$ Hint: let $\mathbb{M}$ denote the collection of $\mathcal{A}$-simple functions.
Corollary 8.13. Suppose that $(\Omega, \mathcal{B}, P)$ is a probability space, $\{X_n\}_{n=1}^\infty$ is a collection of random variables on $\Omega,$ and $\mathcal{B}_\infty := \sigma(X_1, X_2, X_3, \ldots).$ Then for all $\varepsilon > 0$ and all bounded $\mathcal{B}_\infty$-measurable functions, $f : \Omega \to \mathbb{R},$ there exists an $n \in \mathbb{N}$ and a bounded $\mathcal{B}_{\mathbb{R}^n}$-measurable function $G : \mathbb{R}^n \to \mathbb{R}$ such that $\mathbb{E}|f - G(X_1, \ldots, X_n)| < \varepsilon.$ Moreover we may assume that $\sup_{x \in \mathbb{R}^n} |G(x)| \leq M := \sup_{\omega \in \Omega} |f(\omega)|.$
Proof. Apply Exercise 8.4 with $\mathcal{A} := \cup_{n=1}^\infty \sigma(X_1, \ldots, X_n)$ in order to find an $\mathcal{A}$-measurable simple function, $\varphi,$ such that $\mathbb{E}|f - \varphi| < \varepsilon.$ By the definition of $\mathcal{A}$ we know that $\varphi$ is $\sigma(X_1, \ldots, X_n)$-measurable for some $n \in \mathbb{N}.$ It now follows by the factorization Lemma 6.40 that $\varphi = G(X_1, \ldots, X_n)$ for some $\mathcal{B}_{\mathbb{R}^n}$-measurable function $G : \mathbb{R}^n \to \mathbb{R}.$ If necessary, replace $G$ by $[G \wedge M] \vee (-M)$ in order to insure $\sup_{x \in \mathbb{R}^n} |G(x)| \leq M.$
Exercise 8.5 (Density of $\mathcal{A}$ in $\mathcal{B} = \sigma(\mathcal{A})$). Keeping the same notation as in Exercise 8.4 but now take $f = 1_B$ for some $B \in \mathcal{B}$ and given $\varepsilon > 0,$ write $\varphi = \sum_{i=0}^n \lambda_i 1_{A_i}$ where $\lambda_0 = 0,$ $\{\lambda_i\}_{i=1}^n$ is an enumeration of $\varphi(\Omega) \setminus \{0\},$ and $A_i := \{\varphi = \lambda_i\}.$ Show;
1.
$$\mathbb{E}|1_B - \varphi| = P(A_0 \cap B) + \sum_{i=1}^n \left[ |1 - \lambda_i| \, P(B \cap A_i) + |\lambda_i| \, P(A_i \setminus B) \right]  (8.10)$$
$$\geq P(A_0 \cap B) + \sum_{i=1}^n \min\{P(B \cap A_i), P(A_i \setminus B)\}.  (8.11)$$
2. Now let $\psi = \sum_{i=0}^n \alpha_i 1_{A_i}$ with
$$\alpha_i = \begin{cases} 1 & \text{if } P(A_i \setminus B) \leq P(B \cap A_i) \\ 0 & \text{if } P(A_i \setminus B) > P(B \cap A_i) \end{cases}.$$
Then show that
$$\mathbb{E}|1_B - \psi| = P(A_0 \cap B) + \sum_{i=1}^n \min\{P(B \cap A_i), P(A_i \setminus B)\} \leq \mathbb{E}|1_B - \varphi|.$$
Observe that $\psi = 1_D$ where $D = \cup_{i : \alpha_i = 1} A_i \in \mathcal{A}$ and so you have shown; for every $\varepsilon > 0$ there exists a $D \in \mathcal{A}$ such that
$$P(B \triangle D) = \mathbb{E}|1_B - 1_D| < \varepsilon.$$
Exercise 8.6. Suppose that $(X_i, \mathcal{B}_i)_{i=1}^n$ are measurable spaces and for each $i,$ $\mathbb{M}_i$ is a multiplicative system of real bounded measurable functions on $X_i$ such that $\sigma(\mathbb{M}_i) = \mathcal{B}_i$ and there exist $\varphi_n \in \mathbb{M}_i$ such that $\varphi_n \to 1$ boundedly as $n \to \infty.$ Given $f_i : X_i \to \mathbb{R}$ let $f_1 \otimes \cdots \otimes f_n : X_1 \times \cdots \times X_n \to \mathbb{R}$ be defined by
$$(f_1 \otimes \cdots \otimes f_n)(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n).$$
Show
$$\mathbb{M}_1 \otimes \cdots \otimes \mathbb{M}_n := \{f_1 \otimes \cdots \otimes f_n : f_i \in \mathbb{M}_i \text{ for } 1 \leq i \leq n\}$$
is a multiplicative system of bounded measurable functions on $(X := X_1 \times \cdots \times X_n, \ \mathcal{B} := \mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_n)$ such that $\sigma(\mathbb{M}_1 \otimes \cdots \otimes \mathbb{M}_n) = \mathcal{B}.$
Solution to Exercise (8.6). I will give the proof in the case that $n = 2.$ The generalization to higher $n$ is straightforward.

Let $\pi_i : X \to X_i$ be the projection maps, $\pi_1(x_1, x_2) = x_1$ and $\pi_2(x_1, x_2) = x_2.$ For $f_i \in \mathbb{M}_i,$ $f_i \circ \pi_i : X \to \mathbb{R}$ is the composition of measurable functions and hence measurable. Therefore $f_1 \otimes f_2 = (f_1 \circ \pi_1) \cdot (f_2 \circ \pi_2)$ is a bounded $\mathcal{B}_1 \otimes \mathcal{B}_2$-measurable function and therefore $\sigma(\mathbb{M}_1 \otimes \mathbb{M}_2) \subset \mathcal{B}_1 \otimes \mathcal{B}_2.$ Since it is clear that $\mathbb{M}_1 \otimes \mathbb{M}_2$ is a multiplicative system, to finish the proof we must show $\mathcal{B}_1 \otimes \mathcal{B}_2 \subset \sigma(\mathbb{M}_1 \otimes \mathbb{M}_2).$

Let $g \in \mathbb{M}_2$ and let
$$\mathbb{H}_g := \{f \in (\mathcal{B}_1)_b : f \otimes g \text{ is } \sigma(\mathbb{M}_1 \otimes \mathbb{M}_2) \text{-measurable}\}.$$
You may easily check that $\mathbb{H}_g$ is closed under bounded convergence, $\mathbb{M}_1 \subset \mathbb{H}_g,$ and $\mathbb{H}_g$ contains the constant functions. Since $\sigma(\mathbb{M}_1) = \mathcal{B}_1$ it now follows by Dynkin's multiplicative systems Theorem 8.2, that $\mathbb{H}_g = (\mathcal{B}_1)_b.$ Thus we have shown that $(\mathcal{B}_1)_b \otimes \mathbb{M}_2$ consists of $\sigma(\mathbb{M}_1 \otimes \mathbb{M}_2)$-measurable functions. By the same logic we may now conclude that $(\mathcal{B}_1)_b \otimes (\mathcal{B}_2)_b$ consists of $\sigma(\mathbb{M}_1 \otimes \mathbb{M}_2)$-measurable functions as well. In particular this shows for any $A_i \in \mathcal{B}_i$ that $1_{A_1 \times A_2} = 1_{A_1} \otimes 1_{A_2}$ is $\sigma(\mathbb{M}_1 \otimes \mathbb{M}_2)$-measurable and therefore $A_1 \times A_2 \in \sigma(\mathbb{M}_1 \otimes \mathbb{M}_2)$ for all $A_i \in \mathcal{B}_i.$ As the sets $\{A_1 \times A_2 : A_i \in \mathcal{B}_i\}$ generate $\mathcal{B}_1 \otimes \mathcal{B}_2$ we may conclude that $\mathcal{B}_1 \otimes \mathcal{B}_2 \subset \sigma(\mathbb{M}_1 \otimes \mathbb{M}_2).$
8.3 A Strengthening of the Multiplicative System Theorem*
Notation 8.14 We say that $\mathbb{H} \subset \ell^\infty(\Omega, \mathbb{R})$ is closed under monotone convergence if, for every sequence, $\{f_n\}_{n=1}^\infty \subset \mathbb{H},$ satisfying:
1. there exists $M < \infty$ such that $0 \leq f_n(\omega) \leq M$ for all $\omega \in \Omega$ and $n \in \mathbb{N},$
2. $f_n(\omega)$ is increasing in $n$ for all $\omega \in \Omega,$ then $f := \lim_{n\to\infty} f_n \in \mathbb{H}.$
Clearly if $\mathbb{H}$ is closed under bounded convergence then it is also closed under monotone convergence. I learned the proof of the converse from Pat Fitzsimmons but this result appears in Sharpe [61, p. 365].
Proposition 8.15. *Let $\Omega$ be a set. Suppose that $\mathbb{H}$ is a vector subspace of bounded real valued functions from $\Omega$ to $\mathbb{R}$ which is closed under monotone convergence. Then $\mathbb{H}$ is closed under uniform convergence as well, i.e. if $\{f_n\}_{n=1}^\infty \subset \mathbb{H}$ with $\sup_{n \in \mathbb{N}} \sup_{\omega \in \Omega} |f_n(\omega)| < \infty$ and $f_n \to f$ uniformly, then $f \in \mathbb{H}.$
Proof. Let us first assume that $\{f_n\}_{n=1}^\infty \subset \mathbb{H}$ such that $f_n$ converges uniformly to a bounded function, $f : \Omega \to \mathbb{R}.$ Let $\|f\|_\infty := \sup_{\omega \in \Omega} |f(\omega)|.$ Let $\varepsilon > 0$ be given. By passing to a subsequence if necessary, we may assume $\|f - f_n\|_\infty \leq \varepsilon \, 2^{-(n+1)}.$ Let
$$g_n := f_n - \delta_n + M$$
with $\delta_n$ and $M$ constants to be determined shortly. We then have
$$g_{n+1} - g_n = f_{n+1} - f_n + \delta_n - \delta_{n+1} \geq -\varepsilon \, 2^{-(n+1)} + \delta_n - \delta_{n+1}.$$
Taking $\delta_n := \varepsilon \, 2^{-n},$ then $\delta_n - \delta_{n+1} = \varepsilon \, 2^{-n}(1 - 1/2) = \varepsilon \, 2^{-(n+1)}$ in which case $g_{n+1} - g_n \geq 0$ for all $n.$ By choosing $M$ sufficiently large, we will also have $g_n \geq 0$ for all $n.$ Since $\mathbb{H}$ is a vector space containing the constant functions, $g_n \in \mathbb{H}$ and since $g_n \uparrow f + M,$ it follows that $f = f + M - M \in \mathbb{H}.$ So we have shown that $\mathbb{H}$ is closed under uniform convergence.
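The bookkeeping in the proof can be checked on a concrete example. In the sketch below (hypothetical data, not from the text) $f_n$ approximates $f(x) = x$ with exactly the allowed error $\varepsilon \, 2^{-(n+1)},$ and $g_n = f_n - \varepsilon \, 2^{-n} + M$ is verified on sample points to be non-negative and pointwise increasing.

```python
# Hypothetical data: f(x) = x on [0, 1] and approximants f_n realized by a
# constant downward shift of size eps * 2^{-(n+1)}.
eps, M = 1.0, 2.0

def f_n(n, x):
    # ||f - f_n||_inf = eps * 2^{-(n+1)} by construction.
    return x - eps * 2.0 ** (-(n + 1))

def g_n(n, x):
    # g_n := f_n - delta_n + M with delta_n = eps * 2^{-n}, as in the proof.
    return f_n(n, x) - eps * 2.0 ** (-n) + M

for x in [k / 10.0 for k in range(11)]:
    vals = [g_n(n, x) for n in range(1, 12)]
    assert all(vals[i] <= vals[i + 1] for i in range(len(vals) - 1))  # increasing
    assert vals[0] >= 0.0                                            # g_n >= 0
print("g_n is nondecreasing in n and nonnegative on the sample points")
```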
This proposition immediately leads to the following strengthening of Theorem 8.2.

Theorem 8.16. *Suppose that $\mathbb{H}$ is a vector subspace of bounded real valued functions on $\Omega$ which contains the constant functions and is closed under monotone convergence. If $\mathbb{M} \subset \mathbb{H}$ is a multiplicative system, then $\mathbb{H}$ contains all bounded $\sigma(\mathbb{M})$-measurable functions.

Proof. Proposition 8.15 reduces this theorem to Theorem 8.2.
8.4 The Bounded Approximation Theorem*
This section should be skipped until needed (if ever!).
Notation 8.17 Given a collection of bounded functions, $\mathbb{M},$ from a set, $\Omega,$ to $\mathbb{R},$ let $\mathbb{M}_\uparrow$ ($\mathbb{M}_\downarrow$) denote the bounded monotone increasing (decreasing) limits of functions from $\mathbb{M}.$ More explicitly a bounded function, $f : \Omega \to \mathbb{R},$ is in $\mathbb{M}_\uparrow$ respectively $\mathbb{M}_\downarrow$ iff there exists $f_n \in \mathbb{M}$ such that $f_n \uparrow f$ respectively $f_n \downarrow f.$
Theorem 8.18 (Bounded Approximation Theorem*). Let $(\Omega, \mathcal{B}, \mu)$ be a finite measure space and $\mathbb{M}$ be an algebra of bounded $\mathbb{R}$-valued measurable functions such that:
1. $\sigma(\mathbb{M}) = \mathcal{B},$
2. $1 \in \mathbb{M},$ and
3. $|f| \in \mathbb{M}$ for all $f \in \mathbb{M}.$
Then for every bounded $\sigma(\mathbb{M})$-measurable function, $g : \Omega \to \mathbb{R},$ and every $\varepsilon > 0,$ there exists $f \in \mathbb{M}_\downarrow$ and $h \in \mathbb{M}_\uparrow$ such that $f \leq g \leq h$ and $\mu(h - f) < \varepsilon.$¹

¹ Bruce: rework the Daniel integral section in the Analysis notes to stick to lattices of bounded functions.
Proof. Let us begin with a few simple observations.
1. $\mathbb{M}$ is a "lattice": if $f, g \in \mathbb{M}$ then
$$f \vee g = \frac{1}{2}(f + g + |f - g|) \in \mathbb{M}$$
and
$$f \wedge g = \frac{1}{2}(f + g - |f - g|) \in \mathbb{M}.$$
2. If $f, g \in \mathbb{M}_\uparrow$ or $f, g \in \mathbb{M}_\downarrow$ then $f + g \in \mathbb{M}_\uparrow$ or $f + g \in \mathbb{M}_\downarrow$ respectively.
3. If $\lambda \geq 0$ and $f \in \mathbb{M}_\uparrow$ ($f \in \mathbb{M}_\downarrow$), then $\lambda f \in \mathbb{M}_\uparrow$ ($\lambda f \in \mathbb{M}_\downarrow$).
4. If $f \in \mathbb{M}_\uparrow$ then $-f \in \mathbb{M}_\downarrow$ and visa versa.
5. If $f_n \in \mathbb{M}_\uparrow$ and $f_n \uparrow f$ where $f : \Omega \to \mathbb{R}$ is a bounded function, then $f \in \mathbb{M}_\uparrow.$ Indeed, by assumption there exists $f_{n,i} \in \mathbb{M}$ such that $f_{n,i} \uparrow f_n$ as $i \to \infty.$ By observation (1), $g_n := \max\{f_{i,j} : i, j \leq n\} \in \mathbb{M}.$ Moreover it is clear that $g_n \leq \max\{f_k : k \leq n\} = f_n \leq f$ and hence $g_n \uparrow g := \lim_{n\to\infty} g_n \leq f.$ Since $f_{i,j} \leq g$ for all $i, j,$ it follows that $f_n = \lim_{j\to\infty} f_{n,j} \leq g$ and consequently that $f = \lim_{n\to\infty} f_n \leq g \leq f.$ So we have shown that $g_n \uparrow f \in \mathbb{M}_\uparrow.$
Now let $\mathbb{H}$ denote the collection of bounded measurable functions which satisfy the assertion of the theorem. Clearly, $\mathbb{M} \subset \mathbb{H}$ and in fact it is also easy to see that $\mathbb{M}_\uparrow$ and $\mathbb{M}_\downarrow$ are contained in $\mathbb{H}$ as well. For example, if $f \in \mathbb{M}_\uparrow,$ by definition, there exists $f_n \in \mathbb{M} \subset \mathbb{M}_\downarrow$ such that $f_n \uparrow f.$ Since $\mathbb{M}_\downarrow \ni f_n \leq f \leq f \in \mathbb{M}_\uparrow$ and $\mu(f - f_n) \to 0$ by the dominated convergence theorem, it follows that $f \in \mathbb{H}.$ A similar argument shows $\mathbb{M}_\downarrow \subset \mathbb{H}.$ We will now show $\mathbb{H}$ is a vector sub-space of the bounded $\mathcal{B} = \sigma(\mathbb{M})$-measurable functions.
$\mathbb{H}$ is closed under addition. If $g_i \in \mathbb{H}$ for $i = 1, 2,$ and $\varepsilon > 0$ is given, we may find $f_i \in \mathbb{M}_\downarrow$ and $h_i \in \mathbb{M}_\uparrow$ such that $f_i \leq g_i \leq h_i$ and $\mu(h_i - f_i) < \varepsilon/2$ for $i = 1, 2.$ Since $h = h_1 + h_2 \in \mathbb{M}_\uparrow,$ $f := f_1 + f_2 \in \mathbb{M}_\downarrow,$ $f \leq g_1 + g_2 \leq h,$ and
$$\mu(h - f) = \mu(h_1 - f_1) + \mu(h_2 - f_2) < \varepsilon,$$
it follows that $g_1 + g_2 \in \mathbb{H}.$
$\mathbb{H}$ is closed under scalar multiplication. If $g \in \mathbb{H}$ then $\lambda g \in \mathbb{H}$ for all $\lambda \in \mathbb{R}.$ Indeed suppose that $\varepsilon > 0$ is given and $f \in \mathbb{M}_\downarrow$ and $h \in \mathbb{M}_\uparrow$ such that $f \leq g \leq h$ and $\mu(h - f) < \varepsilon.$ Then for $\lambda \geq 0,$ $\mathbb{M}_\downarrow \ni \lambda f \leq \lambda g \leq \lambda h \in \mathbb{M}_\uparrow$ and
$$\mu(\lambda h - \lambda f) = \lambda \, \mu(h - f) < \lambda \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, it follows that $\lambda g \in \mathbb{H}$ for $\lambda \geq 0.$ Similarly, $\mathbb{M}_\downarrow \ni -h \leq -g \leq -f \in \mathbb{M}_\uparrow$ and
$$\mu(-f - (-h)) = \mu(h - f) < \varepsilon,$$
which shows $-g \in \mathbb{H}$ as well.
Because of Theorem 8.16, to complete this proof, it suffices to show $\mathbb{H}$ is closed under monotone convergence. So suppose that $g_n \in \mathbb{H}$ and $g_n \uparrow g,$ where $g : \Omega \to \mathbb{R}$ is a bounded function. Since $\mathbb{H}$ is a vector space, it follows that $0 \leq \delta_n := g_{n+1} - g_n \in \mathbb{H}$ for all $n \in \mathbb{N}.$ So if $\varepsilon > 0$ is given, we can find, $\mathbb{M}_\downarrow \ni u_n \leq \delta_n \leq v_n \in \mathbb{M}_\uparrow$ such that $\mu(v_n - u_n) \leq \varepsilon \, 2^{-n}$ for all $n.$ By replacing $u_n$ by $u_n \vee 0 \in \mathbb{M}_\downarrow$ (by observation 1.), we may further assume that $u_n \geq 0.$ Let
$$v := \sum_{n=1}^\infty v_n = \lim_{N\to\infty} \sum_{n=1}^N v_n \in \mathbb{M}_\uparrow \text{ (using observations 2. and 5.)}$$
and for $N \in \mathbb{N},$ let
$$u^N := \sum_{n=1}^N u_n \in \mathbb{M}_\downarrow \text{ (using observation 2.).}$$
Then
$$\sum_{n=1}^\infty \delta_n = \lim_{N\to\infty} \sum_{n=1}^N \delta_n = \lim_{N\to\infty} (g_{N+1} - g_1) = g - g_1$$
and $u^N \leq g - g_1 \leq v.$ Moreover,
$$\mu(v - u^N) = \sum_{n=1}^N \mu(v_n - u_n) + \sum_{n=N+1}^\infty \mu(v_n) \leq \sum_{n=1}^N \varepsilon \, 2^{-n} + \sum_{n=N+1}^\infty \mu(v_n) \leq \varepsilon + \sum_{n=N+1}^\infty \mu(v_n).$$
However, since
$$\sum_{n=1}^\infty \mu(v_n) \leq \sum_{n=1}^\infty \left( \mu(\delta_n) + \varepsilon \, 2^{-n} \right) = \sum_{n=1}^\infty \mu(\delta_n) + \varepsilon = \mu(g - g_1) + \varepsilon < \infty,$$
it follows for $N \in \mathbb{N}$ sufficiently large that $\sum_{n=N+1}^\infty \mu(v_n) < \varepsilon.$ Therefore, for this $N,$ we have $\mu(v - u^N) < 2\varepsilon$ and since $\varepsilon > 0$ is arbitrary, it follows that $g - g_1 \in \mathbb{H}.$ Since $g_1 \in \mathbb{H}$ and $\mathbb{H}$ is a vector space, we may conclude that $g = (g - g_1) + g_1 \in \mathbb{H}.$
9
Multiple and Iterated Integrals
9.1 Iterated Integrals
Notation 9.1 (Iterated Integrals) If $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ are two measure spaces and $f : X \times Y \to \mathbb{C}$ is a $\mathcal{M} \otimes \mathcal{N}$-measurable function, the iterated integrals of $f$ (when they make sense) are:
$$\int_X d\mu(x) \int_Y d\nu(y) \, f(x, y) := \int_X \left[ \int_Y f(x, y) \, d\nu(y) \right] d\mu(x)$$
and
$$\int_Y d\nu(y) \int_X d\mu(x) \, f(x, y) := \int_Y \left[ \int_X f(x, y) \, d\mu(x) \right] d\nu(y).$$
Notation 9.2 Suppose that $f : X \to \mathbb{C}$ and $g : Y \to \mathbb{C}$ are functions, let $f \otimes g$ denote the function on $X \times Y$ given by
$$f \otimes g(x, y) = f(x) g(y).$$
Notice that if $f, g$ are measurable, then $f \otimes g$ is $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\mathbb{C}})$-measurable. To prove this let $F(x, y) = f(x)$ and $G(x, y) = g(y)$ so that $f \otimes g = F \cdot G$ will be measurable provided that $F$ and $G$ are measurable. Now $F = f \circ \pi_1$ where $\pi_1 : X \times Y \to X$ is the projection map. This shows that $F$ is the composition of measurable functions and hence measurable. Similarly one shows that $G$ is measurable.
9.2 Tonelli's Theorem and Product Measure
Theorem 9.3. Suppose $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ are σ-finite measure spaces and $f$ is a nonnegative $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\bar{\mathbb{R}}})$-measurable function, then for each $y \in Y,$
$$x \to f(x, y) \text{ is } \mathcal{M}/\mathcal{B}_{[0,\infty]} \text{ measurable},  (9.1)$$
for each $x \in X,$
$$y \to f(x, y) \text{ is } \mathcal{N}/\mathcal{B}_{[0,\infty]} \text{ measurable},  (9.2)$$
$$x \to \int_Y f(x, y) \, d\nu(y) \text{ is } \mathcal{M}/\mathcal{B}_{[0,\infty]} \text{ measurable},  (9.3)$$
$$y \to \int_X f(x, y) \, d\mu(x) \text{ is } \mathcal{N}/\mathcal{B}_{[0,\infty]} \text{ measurable},  (9.4)$$
and
$$\int_X d\mu(x) \int_Y d\nu(y) \, f(x, y) = \int_Y d\nu(y) \int_X d\mu(x) \, f(x, y).  (9.5)$$
Proof. Suppose that $E = A \times B \in \mathcal{E} := \mathcal{M} \times \mathcal{N}$ and $f = 1_E.$ Then
$$f(x, y) = 1_{A \times B}(x, y) = 1_A(x) 1_B(y)$$
and one sees that Eqs. (9.1) and (9.2) hold. Moreover
$$\int_Y f(x, y) \, d\nu(y) = \int_Y 1_A(x) 1_B(y) \, d\nu(y) = 1_A(x) \nu(B),$$
so that Eq. (9.3) holds and we have
$$\int_X d\mu(x) \int_Y d\nu(y) \, f(x, y) = \nu(B) \mu(A).  (9.6)$$
Similarly,
$$\int_X f(x, y) \, d\mu(x) = \mu(A) 1_B(y) \text{ and } \int_Y d\nu(y) \int_X d\mu(x) \, f(x, y) = \nu(B) \mu(A)$$
from which it follows that Eqs. (9.4) and (9.5) hold in this case as well.

For the moment let us now further assume that $\mu(X) < \infty$ and $\nu(Y) < \infty$ and let $\mathbb{H}$ be the collection of all bounded $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\mathbb{R}})$-measurable functions on $X \times Y$ such that Eqs. (9.1) - (9.5) hold. Using the fact that measurable functions are closed under pointwise limits and the dominated convergence theorem (the dominating function always being a constant), one easily shows that $\mathbb{H}$ is closed under bounded convergence. Since we have just verified that $1_E \in \mathbb{H}$ for all $E$ in the multiplicative class, $\mathcal{E},$ it follows by Corollary 8.3 that $\mathbb{H}$ is the space of all bounded $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\mathbb{R}})$-measurable functions on $X \times Y.$ Moreover, if $f : X \times Y \to [0, \infty]$ is a $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\bar{\mathbb{R}}})$-measurable function, let $f_M = M \wedge f$ so that $f_M \uparrow f$ as $M \to \infty.$ Then Eqs. (9.1) - (9.5) hold with $f$ replaced by $f_M$ for all $M \in \mathbb{N}.$ Repeated use of the monotone convergence theorem allows us to pass to the limit $M \to \infty$ in these equations to deduce the theorem in the case $\mu$ and $\nu$ are finite measures.

For the σ-finite case, choose $X_n \in \mathcal{M},$ $Y_n \in \mathcal{N}$ such that $X_n \uparrow X,$ $Y_n \uparrow Y,$ $\mu(X_n) < \infty$ and $\nu(Y_n) < \infty$ for all $m, n \in \mathbb{N}.$ Then define $\mu_m(A) = \mu(X_m \cap A)$ and $\nu_n(B) = \nu(Y_n \cap B)$ for all $A \in \mathcal{M}$ and $B \in \mathcal{N}$ or equivalently $d\mu_m = 1_{X_m} d\mu$ and $d\nu_n = 1_{Y_n} d\nu.$ By what we have just proved Eqs. (9.1) - (9.5) hold with $\mu$ replaced by $\mu_m$ and $\nu$ by $\nu_n$ for all $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\bar{\mathbb{R}}})$-measurable functions, $f : X \times Y \to [0, \infty].$ The validity of Eqs. (9.1) - (9.5) then follows by passing to the limits $m \to \infty$ and then $n \to \infty$ making use of the monotone convergence theorem in the following context. For all $u \in L^+(X, \mathcal{M}),$
$$\int_X u \, d\mu_m = \int_X u \, 1_{X_m} \, d\mu \uparrow \int_X u \, d\mu \text{ as } m \to \infty,$$
and for all $v \in L^+(Y, \mathcal{N}),$
$$\int_Y v \, d\nu_n = \int_Y v \, 1_{Y_n} \, d\nu \uparrow \int_Y v \, d\nu \text{ as } n \to \infty.$$
Corollary 9.4. Suppose $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ are σ-finite measure spaces. Then there exists a unique measure $\pi$ on $\mathcal{M} \otimes \mathcal{N}$ such that $\pi(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{M}$ and $B \in \mathcal{N}.$ Moreover $\pi$ is given by
$$\pi(E) = \int_X d\mu(x) \int_Y d\nu(y) \, 1_E(x, y) = \int_Y d\nu(y) \int_X d\mu(x) \, 1_E(x, y)  (9.7)$$
for all $E \in \mathcal{M} \otimes \mathcal{N}$ and $\pi$ is σ-finite.

Proof. Notice that any measure $\pi$ such that $\pi(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{M}$ and $B \in \mathcal{N}$ is necessarily σ-finite. Indeed, let $X_n \in \mathcal{M}$ and $Y_n \in \mathcal{N}$ be chosen so that $\mu(X_n) < \infty,$ $\nu(Y_n) < \infty,$ $X_n \uparrow X$ and $Y_n \uparrow Y,$ then $X_n \times Y_n \in \mathcal{M} \otimes \mathcal{N},$ $X_n \times Y_n \uparrow X \times Y$ and $\pi(X_n \times Y_n) < \infty$ for all $n.$ The uniqueness assertion is a consequence of the combination of Exercises 3.10 and 5.11 (see Proposition 3.25) with $\mathcal{E} = \mathcal{M} \times \mathcal{N}.$ For the existence, it suffices to observe, using the monotone convergence theorem, that $\pi$ defined in Eq. (9.7) is a measure on $\mathcal{M} \otimes \mathcal{N}.$ Moreover this measure satisfies $\pi(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{M}$ and $B \in \mathcal{N}$ from Eq. (9.6).
Notation 9.5 The measure $\pi$ is called the product measure of $\mu$ and $\nu$ and will be denoted by $\mu \otimes \nu.$
Theorem 9.6 (Tonelli's Theorem). Suppose $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ are σ-finite measure spaces and $\pi = \mu \otimes \nu$ is the product measure on $\mathcal{M} \otimes \mathcal{N}.$ If $f \in L^+(X \times Y, \mathcal{M} \otimes \mathcal{N}),$ then $f(\cdot, y) \in L^+(X, \mathcal{M})$ for all $y \in Y,$ $f(x, \cdot) \in L^+(Y, \mathcal{N})$ for all $x \in X,$
$$\int_Y f(\cdot, y) \, d\nu(y) \in L^+(X, \mathcal{M}), \quad \int_X f(x, \cdot) \, d\mu(x) \in L^+(Y, \mathcal{N})$$
and
$$\int_{X \times Y} f \, d\pi = \int_X d\mu(x) \int_Y d\nu(y) \, f(x, y)  (9.8)$$
$$= \int_Y d\nu(y) \int_X d\mu(x) \, f(x, y).  (9.9)$$

Proof. By Theorem 9.3 and Corollary 9.4, the theorem holds when $f = 1_E$ with $E \in \mathcal{M} \otimes \mathcal{N}.$ Using the linearity of all of the statements, the theorem is also true for non-negative simple functions. Then using the monotone convergence theorem repeatedly along with the approximation Theorem 6.39, one deduces the theorem for general $f \in L^+(X \times Y, \mathcal{M} \otimes \mathcal{N}).$
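Counting measure on $\mathbb{N}$ is σ-finite, so Tonelli's theorem specializes to the statement that a doubly indexed series of non-negative terms may be summed in either order. A small sketch (an illustration only, on a truncated grid, with exact rational arithmetic so the two iterated sums agree exactly):

```python
from fractions import Fraction

def f(i, j):
    # Non-negative "function" on N x N; the full double series sums to 4.
    return Fraction(1, 2 ** (i + j))

N = 20  # truncation of the grid for illustration
row_first = sum(sum(f(i, j) for j in range(N)) for i in range(N))
col_first = sum(sum(f(i, j) for i in range(N)) for j in range(N))
print(row_first == col_first, float(row_first))
```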
Example 9.7. In this example we are going to show $I:=\int_{\mathbb R}e^{-x^2/2}\,dm(x)=\sqrt{2\pi}$. To this end we observe, using Tonelli's theorem, that
\begin{align*}
I^2&=\left(\int_{\mathbb R}e^{-x^2/2}\,dm(x)\right)^2=\int_{\mathbb R}e^{-y^2/2}\left(\int_{\mathbb R}e^{-x^2/2}\,dm(x)\right)dm(y)\\
&=\int_{\mathbb R^2}e^{-(x^2+y^2)/2}\,dm^2(x,y)
\end{align*}
where $m^2=m\otimes m$ is Lebesgue measure on $\left(\mathbb R^2,\mathcal B_{\mathbb R^2}=\mathcal B_{\mathbb R}\otimes\mathcal B_{\mathbb R}\right)$. From the monotone convergence theorem,
\[
I^2=\lim_{R\to\infty}\int_{D_R}e^{-(x^2+y^2)/2}\,dm^2(x,y)
\]
where $D_R=\left\{(x,y):x^2+y^2<R^2\right\}$. Using the change of variables theorem described in Section 9.5 below,$^1$ we find
\[
\int_{D_R}e^{-(x^2+y^2)/2}\,dm^2(x,y)=\int_{(0,R)\times(0,2\pi)}e^{-r^2/2}\,r\,dr\,d\theta
=2\pi\int_0^R e^{-r^2/2}\,r\,dr=2\pi\left(1-e^{-R^2/2}\right).
\]

$^1$ Alternatively, you can easily show that the integral $\int_{D_R}f\,dm^2$ agrees with the multiple integral in undergraduate analysis when $f$ is continuous. Then use the change of variables theorem from undergraduate analysis.
Page: 120 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
From this we learn that
\[
I^2=\lim_{R\to\infty}2\pi\left(1-e^{-R^2/2}\right)=2\pi
\]
as desired.
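A numerical sanity check of Example 9.7 in Python (not part of the notes; a simple midpoint rule, with the truncation at $|x|\le10$ and the grid size chosen ad hoc):

```python
import math

def midpoint(f, a, b, n=200_000):
    # composite midpoint rule for the integral of f over [a, b]
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# I = integral of e^{-x^2/2} over R; the tail beyond |x| = 10 is below e^{-50}.
I = midpoint(lambda x: math.exp(-x * x / 2), -10.0, 10.0)
print(I, math.sqrt(2 * math.pi))
```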
9.3 Fubini's Theorem
Notation 9.8 If $(X,\mathcal M,\mu)$ is a measure space and $f:X\to\mathbb C$ is any measurable function, let
\[
\overline{\int_X}f\,d\mu:=\begin{cases}\int_X f\,d\mu & \text{if }\int_X|f|\,d\mu<\infty\\ 0 & \text{otherwise.}\end{cases}
\]
Theorem 9.9 (Fubini's Theorem). Suppose $(X,\mathcal M,\mu)$ and $(Y,\mathcal N,\nu)$ are $\sigma$-finite measure spaces, $\pi=\mu\otimes\nu$ is the product measure on $\mathcal M\otimes\mathcal N$ and $f:X\times Y\to\mathbb C$ is a $\mathcal M\otimes\mathcal N$ measurable function. Then the following three conditions are equivalent:
\begin{align}
&\int_{X\times Y}|f|\,d\pi<\infty,\ \text{i.e. } f\in L^1(\pi), \tag{9.10}\\
&\int_X\left(\int_Y|f(x,y)|\,d\nu(y)\right)d\mu(x)<\infty\quad\text{and} \tag{9.11}\\
&\int_Y\left(\int_X|f(x,y)|\,d\mu(x)\right)d\nu(y)<\infty. \tag{9.12}
\end{align}
If any one (and hence all) of these conditions hold, then $f(x,\cdot)\in L^1(\nu)$ for $\mu$-a.e. $x$, $f(\cdot,y)\in L^1(\mu)$ for $\nu$-a.e. $y$, $\overline{\int_Y}f(\cdot,y)\,d\nu(y)\in L^1(\mu)$, $\overline{\int_X}f(x,\cdot)\,d\mu(x)\in L^1(\nu)$, and Eqs. (9.8) and (9.9) are still valid after putting a bar over the integral symbols.
Proof. The equivalence of Eqs. (9.10)--(9.12) is a direct consequence of Tonelli's Theorem 9.6. Now suppose $f\in L^1(\pi)$ is a real valued function and let
\[
E:=\left\{x\in X:\int_Y|f(x,y)|\,d\nu(y)=\infty\right\}. \tag{9.13}
\]
Then by Tonelli's theorem, $x\mapsto\int_Y|f(x,y)|\,d\nu(y)$ is measurable and hence $E\in\mathcal M$. Moreover Tonelli's theorem implies
\[
\int_X\left(\int_Y|f(x,y)|\,d\nu(y)\right)d\mu(x)=\int_{X\times Y}|f|\,d\pi<\infty
\]
which implies that $\mu(E)=0$. Let $f_\pm$ be the positive and negative parts of $f$; then
\begin{align}
\overline{\int_Y}f(x,y)\,d\nu(y)&=\int_Y 1_{E^c}(x)f(x,y)\,d\nu(y)\notag\\
&=\int_Y 1_{E^c}(x)\left[f_+(x,y)-f_-(x,y)\right]d\nu(y)\notag\\
&=\int_Y 1_{E^c}(x)f_+(x,y)\,d\nu(y)-\int_Y 1_{E^c}(x)f_-(x,y)\,d\nu(y). \tag{9.14}
\end{align}
Noting that $1_{E^c}(x)f_\pm(x,y)=\left(1_{E^c}\otimes 1_Y\cdot f_\pm\right)(x,y)$ is a positive $\mathcal M\otimes\mathcal N$ measurable function, it follows from another application of Tonelli's theorem that $x\mapsto\overline{\int_Y}f(x,y)\,d\nu(y)$ is $\mathcal M$ measurable, being the difference of two measurable functions. Moreover
\[
\int_X\left|\overline{\int_Y}f(x,y)\,d\nu(y)\right|d\mu(x)\le\int_X\left(\int_Y|f(x,y)|\,d\nu(y)\right)d\mu(x)<\infty,
\]
which shows $\overline{\int_Y}f(\cdot,y)\,d\nu(y)\in L^1(\mu)$. Integrating Eq. (9.14) on $x$ and using Tonelli's theorem repeatedly implies,
\begin{align}
\int_X\left(\overline{\int_Y}f(x,y)\,d\nu(y)\right)d\mu(x)
&=\int_X d\mu(x)\int_Y d\nu(y)\,1_{E^c}(x)f_+(x,y)-\int_X d\mu(x)\int_Y d\nu(y)\,1_{E^c}(x)f_-(x,y)\notag\\
&=\int_Y d\nu(y)\int_X d\mu(x)\,1_{E^c}(x)f_+(x,y)-\int_Y d\nu(y)\int_X d\mu(x)\,1_{E^c}(x)f_-(x,y)\notag\\
&=\int_Y d\nu(y)\int_X d\mu(x)\,f_+(x,y)-\int_Y d\nu(y)\int_X d\mu(x)\,f_-(x,y)\notag\\
&=\int_{X\times Y}f_+\,d\pi-\int_{X\times Y}f_-\,d\pi=\int_{X\times Y}(f_+-f_-)\,d\pi=\int_{X\times Y}f\,d\pi \tag{9.15}
\end{align}
which proves Eq. (9.8) holds.

Now suppose that $f=u+iv$ is complex valued and again let $E$ be as in Eq. (9.13). Just as above we still have $E\in\mathcal M$ and $\mu(E)=0$ and
\begin{align*}
\overline{\int_Y}f(x,y)\,d\nu(y)&=\int_Y 1_{E^c}(x)f(x,y)\,d\nu(y)=\int_Y 1_{E^c}(x)\left[u(x,y)+iv(x,y)\right]d\nu(y)\\
&=\int_Y 1_{E^c}(x)u(x,y)\,d\nu(y)+i\int_Y 1_{E^c}(x)v(x,y)\,d\nu(y).
\end{align*}
The last line is measurable in $x$ as we have just proved. Similarly one shows $\overline{\int_Y}f(\cdot,y)\,d\nu(y)\in L^1(\mu)$ and Eq. (9.8) still holds by a computation similar to that done in Eq. (9.15). The assertions pertaining to Eq. (9.9) may be proved in the same way.
The previous theorems generalize to products of any finite number of $\sigma$-finite measure spaces.
Theorem 9.10. Suppose $\{(X_i,\mathcal M_i,\mu_i)\}_{i=1}^n$ are $\sigma$-finite measure spaces and $X:=X_1\times\dots\times X_n$. Then there exists a unique measure $\pi$ on $(X,\mathcal M_1\otimes\dots\otimes\mathcal M_n)$ such that
\[
\pi(A_1\times\dots\times A_n)=\mu_1(A_1)\dots\mu_n(A_n)\ \text{for all } A_i\in\mathcal M_i. \tag{9.16}
\]
(This measure and its completion will be denoted by $\mu_1\otimes\dots\otimes\mu_n$.) If $f:X\to[0,\infty]$ is a $\mathcal M_1\otimes\dots\otimes\mathcal M_n$ measurable function then
\[
\int_X f\,d\pi=\int_{X_{\sigma(1)}}d\mu_{\sigma(1)}(x_{\sigma(1)})\dots\int_{X_{\sigma(n)}}d\mu_{\sigma(n)}(x_{\sigma(n)})\,f(x_1,\dots,x_n) \tag{9.17}
\]
where $\sigma$ is any permutation of $\{1,2,\dots,n\}$. In particular $f\in L^1(\pi)$ iff
\[
\int_{X_{\sigma(1)}}d\mu_{\sigma(1)}(x_{\sigma(1)})\dots\int_{X_{\sigma(n)}}d\mu_{\sigma(n)}(x_{\sigma(n)})\,|f(x_1,\dots,x_n)|<\infty
\]
for some (and hence all) permutations, $\sigma$. Furthermore, if $f\in L^1(\pi)$, then
\[
\int_X f\,d\pi=\int_{X_{\sigma(1)}}d\mu_{\sigma(1)}(x_{\sigma(1)})\dots\int_{X_{\sigma(n)}}d\mu_{\sigma(n)}(x_{\sigma(n)})\,f(x_1,\dots,x_n) \tag{9.18}
\]
for all permutations $\sigma$.
Proof. (* I would consider skipping this tedious proof.) The proof will be by induction on $n$ with the case $n=2$ being covered in Theorems 9.6 and 9.9. So let $n\ge3$ and assume the theorem is valid for $n-1$ factors or less. To simplify notation, for $1\le i\le n$, let $X^i:=\prod_{j\ne i}X_j$, $\mathcal M^i:=\otimes_{j\ne i}\mathcal M_j$, and $\mu^i:=\otimes_{j\ne i}\mu_j$ be the product measure on $\left(X^i,\mathcal M^i\right)$ which is assumed to exist by the induction hypothesis. Also let $\mathcal M:=\mathcal M_1\otimes\dots\otimes\mathcal M_n$ and for $x=(x_1,\dots,x_i,\dots,x_n)\in X$ let
\[
x^i:=(x_1,\dots,\hat{x_i},\dots,x_n):=(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n).
\]
Here is an outline of the argument with some details being left to the reader.

1. If $f:X\to[0,\infty]$ is $\mathcal M$-measurable, then
\[
(x_1,\dots,\hat{x_i},\dots,x_n)\mapsto\int_{X_i}f(x_1,\dots,x_i,\dots,x_n)\,d\mu_i(x_i)
\]
is $\mathcal M^i$-measurable. Thus by the induction hypothesis, the right side of Eq. (9.17) is well defined.
2. If $\sigma\in S_n$ (the permutations of $\{1,2,\dots,n\}$) we may define a measure $\pi$ on $(X,\mathcal M)$ by;
\[
\pi(A):=\int_{X_{\sigma(1)}}d\mu_{\sigma(1)}(x_{\sigma(1)})\dots\int_{X_{\sigma(n)}}d\mu_{\sigma(n)}(x_{\sigma(n)})\,1_A(x_1,\dots,x_n). \tag{9.19}
\]
It is easy to check that $\pi$ is a measure which satisfies Eq. (9.16). Using the $\sigma$-finiteness assumptions and the fact that
\[
\mathcal P:=\{A_1\times\dots\times A_n:A_i\in\mathcal M_i\ \text{for } 1\le i\le n\}
\]
is a $\pi$-system such that $\sigma(\mathcal P)=\mathcal M$, it follows from Exercise 5.1 that there is only one such measure satisfying Eq. (9.16). Thus the formula for $\pi$ in Eq. (9.19) is independent of $\sigma\in S_n$.
3. From Eq. (9.19) and the usual simple function approximation arguments we may conclude that Eq. (9.17) is valid.

Now suppose that $f\in L^1(X,\mathcal M,\pi)$.

4. Using step 1 it is easy to check that
\[
(x_1,\dots,\hat{x_i},\dots,x_n)\mapsto\overline{\int_{X_i}}f(x_1,\dots,x_i,\dots,x_n)\,d\mu_i(x_i)
\]
is $\mathcal M^i$ measurable. Indeed,
\[
(x_1,\dots,\hat{x_i},\dots,x_n)\mapsto\int_{X_i}|f(x_1,\dots,x_i,\dots,x_n)|\,d\mu_i(x_i)
\]
is $\mathcal M^i$ measurable and therefore
\[
E:=\left\{(x_1,\dots,\hat{x_i},\dots,x_n):\int_{X_i}|f(x_1,\dots,x_i,\dots,x_n)|\,d\mu_i(x_i)<\infty\right\}\in\mathcal M^i.
\]
Now let $u:=\operatorname{Re}f$ and $v:=\operatorname{Im}f$ and let $u_\pm$ and $v_\pm$ be the positive and negative parts of $u$ and $v$ respectively; then
\begin{align*}
\overline{\int_{X_i}}f(x)\,d\mu_i(x_i)&=\int_{X_i}1_E\left(x^i\right)f(x)\,d\mu_i(x_i)\\
&=\int_{X_i}1_E\left(x^i\right)u(x)\,d\mu_i(x_i)+i\int_{X_i}1_E\left(x^i\right)v(x)\,d\mu_i(x_i).
\end{align*}
Both of these latter terms are $\mathcal M^i$ measurable since, for example,
\[
\int_{X_i}1_E\left(x^i\right)u(x)\,d\mu_i(x_i)=\int_{X_i}1_E\left(x^i\right)u_+(x)\,d\mu_i(x_i)-\int_{X_i}1_E\left(x^i\right)u_-(x)\,d\mu_i(x_i)
\]
which is $\mathcal M^i$ measurable by step 1.
5. It now follows by induction that the right side of Eq. (9.18) is well defined.
6. Let $i:=n$ and $T:X\to X_i\times X^i$ be the obvious identification;
\[
T\left(x_i,(x_1,\dots,\hat{x_i},\dots,x_n)\right)=(x_1,\dots,x_n).
\]
One easily verifies $T$ is $\mathcal M/\mathcal M_i\otimes\mathcal M^i$ measurable (use Corollary 6.19 repeatedly) and that $\pi\circ T^{-1}=\mu_i\otimes\mu^i$ (see Exercise 5.1).
7. Let $f\in L^1(\pi)$. Combining step 6. with the abstract change of variables Theorem (Exercise 7.11) implies
\[
\int_X f\,d\pi=\int_{X_i\times X^i}(f\circ T)\,d\left(\mu_i\otimes\mu^i\right). \tag{9.20}
\]
By Theorem 9.9, we also have
\begin{align}
\int_{X_i\times X^i}(f\circ T)\,d\left(\mu_i\otimes\mu^i\right)&=\int_{X^i}d\mu^i\left(x^i\right)\int_{X_i}d\mu_i(x_i)\,f\circ T\left(x_i,x^i\right)\notag\\
&=\int_{X^i}d\mu^i\left(x^i\right)\int_{X_i}d\mu_i(x_i)\,f(x_1,\dots,x_n). \tag{9.21}
\end{align}
Then by the induction hypothesis,
\[
\int_{X^i}d\mu^i\left(x^i\right)\int_{X_i}d\mu_i(x_i)\,f(x_1,\dots,x_n)=\prod_{j\ne i}\int_{X_j}d\mu_j(x_j)\int_{X_i}d\mu_i(x_i)\,f(x_1,\dots,x_n) \tag{9.22}
\]
where the ordering of the integrals in the last product is inconsequential. Combining Eqs. (9.20)--(9.22) completes the proof.
Convention: We are now going to drop the bar above the integral sign with the understanding that $\int_X f\,d\mu=0$ whenever $f:X\to\mathbb C$ is a measurable function such that $\int_X|f|\,d\mu=\infty$. However if $f$ is a non-negative (i.e. $f:X\to[0,\infty]$) non-integrable function, we will interpret $\int_X f\,d\mu$ to be infinite.
Example 9.11. In this example we will show
\[
\lim_{M\to\infty}\int_0^M\frac{\sin x}{x}\,dx=\pi/2. \tag{9.23}
\]
To see this write $\frac1x=\int_0^\infty e^{-tx}\,dt$ and use Fubini-Tonelli to conclude that
\begin{align*}
\int_0^M\frac{\sin x}{x}\,dx&=\int_0^M\left(\int_0^\infty e^{-tx}\sin x\,dt\right)dx\\
&=\int_0^\infty\left(\int_0^M e^{-tx}\sin x\,dx\right)dt\\
&=\int_0^\infty\frac{1}{1+t^2}\left(1-te^{-Mt}\sin M-e^{-Mt}\cos M\right)dt\\
&\to\int_0^\infty\frac{1}{1+t^2}\,dt=\frac\pi2\ \text{as } M\to\infty,
\end{align*}
wherein we have used the dominated convergence theorem (for instance, take $g(t):=\frac{1}{1+t^2}\left(1+te^{-t}+e^{-t}\right)$) to pass to the limit.
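A numerical look at Eq. (9.23) in Python (not part of the notes; the grid size is an ad hoc choice, and Example 9.12 below explains why the error decays only like $1/M$):

```python
import math

def si(M, n=100_000):
    # composite midpoint rule for the integral of sin(x)/x over (0, M)
    h = M / n
    return h * sum(math.sin(x) / x for x in ((k + 0.5) * h for k in range(n)))

for M in (50, 200, 800):
    print(M, si(M), math.pi / 2)
```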
The next example is a refinement of this result.
Example 9.12. We have
\[
\int_0^\infty\frac{\sin x}{x}e^{-\Lambda x}\,dx=\frac\pi2-\arctan\Lambda\ \text{for all }\Lambda>0 \tag{9.24}
\]
and for $\Lambda,M\in[0,\infty)$,
\[
\left|\int_0^M\frac{\sin x}{x}e^{-\Lambda x}\,dx-\frac\pi2+\arctan\Lambda\right|\le C\,\frac{e^{-M\Lambda}}{M} \tag{9.25}
\]
where $C=\max_{x\ge0}\frac{1+x}{1+x^2}=\frac{1}{2\sqrt2-2}\cong1.2$. In particular Eq. (9.23) is valid.

To verify these assertions, first notice that by the fundamental theorem of calculus,
\[
|\sin x|=\left|\int_0^x\cos y\,dy\right|\le\left|\int_0^x|\cos y|\,dy\right|\le\left|\int_0^x 1\,dy\right|=|x|
\]
so $\left|\frac{\sin x}{x}\right|\le1$ for all $x\ne0$. Making use of the identity $\int_0^\infty e^{-tx}\,dt=1/x$ and Fubini's theorem,
\begin{align}
\int_0^M\frac{\sin x}{x}e^{-\Lambda x}\,dx&=\int_0^M dx\,\sin x\,e^{-\Lambda x}\int_0^\infty e^{-tx}\,dt\notag\\
&=\int_0^\infty dt\int_0^M dx\,\sin x\,e^{-(\Lambda+t)x}\notag\\
&=\int_0^\infty\frac{1-(\cos M+(\Lambda+t)\sin M)\,e^{-M(\Lambda+t)}}{(\Lambda+t)^2+1}\,dt\notag\\
&=\int_0^\infty\frac{1}{(\Lambda+t)^2+1}\,dt-\int_0^\infty\frac{\cos M+(\Lambda+t)\sin M}{(\Lambda+t)^2+1}e^{-M(\Lambda+t)}\,dt\notag\\
&=\frac\pi2-\arctan\Lambda-\varepsilon(M,\Lambda) \tag{9.26}
\end{align}
where
\[
\varepsilon(M,\Lambda)=\int_0^\infty\frac{\cos M+(\Lambda+t)\sin M}{(\Lambda+t)^2+1}e^{-M(\Lambda+t)}\,dt.
\]
Since
\[
\left|\frac{\cos M+(\Lambda+t)\sin M}{(\Lambda+t)^2+1}\right|\le\frac{1+(\Lambda+t)}{(\Lambda+t)^2+1}\le C,
\]
\[
|\varepsilon(M,\Lambda)|\le C\int_0^\infty e^{-M(\Lambda+t)}\,dt=C\,\frac{e^{-M\Lambda}}{M}.
\]
This estimate along with Eq. (9.26) proves Eq. (9.25), from which Eq. (9.23) follows by taking $\Lambda=0$ and letting $M\to\infty$, and Eq. (9.24) follows (using the dominated convergence theorem again) by letting $M\to\infty$.
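Eq. (9.24) can be checked numerically in Python (not part of the notes; the truncation point and grid are ad hoc, justified for $\Lambda\ge1/2$ by the tail bound of Eq. (9.25)):

```python
import math

def lhs(lam, M=40.0, n=200_000):
    # midpoint rule for the integral of (sin x / x) e^{-lam x} over (0, M);
    # for lam >= 0.5 the neglected tail is below e^{-lam M}/M < 1e-10.
    h = M / n
    return h * sum(math.sin(x) / x * math.exp(-lam * x)
                   for x in ((k + 0.5) * h for k in range(n)))

for lam in (0.5, 1.0, 2.0):
    print(lam, lhs(lam), math.pi / 2 - math.atan(lam))
```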
Lemma 9.13. Suppose that $X$ is a random variable and $\varphi:\mathbb R\to\mathbb R$ is a $C^1$ function such that $\lim_{x\to-\infty}\varphi(x)=0$ and either $\varphi'(x)\ge0$ for all $x$ or $\int_{\mathbb R}|\varphi'(x)|\,dx<\infty$. Then
\[
\mathbb E[\varphi(X)]=\int_{-\infty}^\infty\varphi'(y)\,P(X>y)\,dy.
\]
Similarly if $X\ge0$ and $\varphi:[0,\infty)\to\mathbb R$ is a $C^1$ function such that $\varphi(0)=0$ and either $\varphi'\ge0$ or $\int_0^\infty|\varphi'(x)|\,dx<\infty$, then
\[
\mathbb E[\varphi(X)]=\int_0^\infty\varphi'(y)\,P(X>y)\,dy.
\]
Proof. By the fundamental theorem of calculus for all $-\infty<M<\infty$ and $x\in\mathbb R$,
\[
\varphi(x)=\varphi(-M)+\int_{-M}^x\varphi'(y)\,dy. \tag{9.27}
\]
Under the stated assumptions on $\varphi$, we may use either the monotone or the dominated convergence theorem to let $M\to\infty$ in Eq. (9.27) to find,
\[
\varphi(x)=\int_{-\infty}^x\varphi'(y)\,dy=\int_{\mathbb R}1_{y<x}\,\varphi'(y)\,dy\ \text{for all }x\in\mathbb R.
\]
Therefore,
\[
\mathbb E[\varphi(X)]=\mathbb E\left[\int_{\mathbb R}1_{y<X}\,\varphi'(y)\,dy\right]=\int_{\mathbb R}\mathbb E\left[1_{y<X}\right]\varphi'(y)\,dy=\int_{-\infty}^\infty\varphi'(y)\,P(X>y)\,dy,
\]
where we applied Fubini's theorem for the second equality. The proof of the second assertion is similar and will be left to the reader.
Example 9.14. Here are a couple of examples involving Lemma 9.13.

1. Suppose $X$ is a random variable, then
\[
\mathbb E\left[e^X\right]=\int_{-\infty}^\infty P(X>y)\,e^y\,dy=\int_0^\infty P(X>\ln u)\,du, \tag{9.28}
\]
where we made the change of variables, $u=e^y$, to get the second equality.
2. If $X\ge0$ and $p\ge1$, then
\[
\mathbb EX^p=p\int_0^\infty y^{p-1}P(X>y)\,dy. \tag{9.29}
\]
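Eq. (9.29) is easy to test for a distribution whose tail is explicit. A Python sketch (not part of the notes; the exponential distribution and the quadrature parameters are assumptions chosen for convenience):

```python
import math

# For X ~ Exp(1): P(X > y) = e^{-y}, so Eq. (9.29) predicts
# E[X^p] = p * integral of y^{p-1} e^{-y} over (0, inf) = Gamma(p + 1).
def moment_via_tail(p, M=60.0, n=200_000):
    h = M / n
    return p * h * sum(y ** (p - 1) * math.exp(-y)
                       for y in ((k + 0.5) * h for k in range(n)))

for p in (1, 2, 3):
    print(p, moment_via_tail(p), math.gamma(p + 1))
```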
9.4 Fubini's Theorem and Completions*

Notation 9.15 Given $E\subset X\times Y$ and $x\in X$, let
\[
{}_xE:=\{y\in Y:(x,y)\in E\}.
\]
Similarly if $y\in Y$ is given let
\[
E^y:=\{x\in X:(x,y)\in E\}.
\]
If $f:X\times Y\to\mathbb C$ is a function let $f_x=f(x,\cdot)$ and $f^y:=f(\cdot,y)$ so that $f_x:Y\to\mathbb C$ and $f^y:X\to\mathbb C$.
Theorem 9.16. Suppose $(X,\mathcal M,\mu)$ and $(Y,\mathcal N,\nu)$ are complete $\sigma$-finite measure spaces. Let $(X\times Y,\mathcal L,\lambda)$ be the completion of $(X\times Y,\mathcal M\otimes\mathcal N,\mu\otimes\nu)$. If $f$ is $\mathcal L$ measurable and (a) $f\ge0$ or (b) $f\in L^1(\lambda)$, then $f_x$ is $\mathcal N$ measurable for $\mu$ a.e. $x$ and $f^y$ is $\mathcal M$ measurable for $\nu$ a.e. $y$, and in case (b) $f_x\in L^1(\nu)$ and $f^y\in L^1(\mu)$ for $\mu$ a.e. $x$ and $\nu$ a.e. $y$ respectively. Moreover,
\[
\left(x\mapsto\int_Y f_x\,d\nu\right)\in L^1(\mu)\quad\text{and}\quad\left(y\mapsto\int_X f^y\,d\mu\right)\in L^1(\nu)
\]
and
\[
\int_{X\times Y}f\,d\lambda=\int_Y d\nu\int_X d\mu\,f=\int_X d\mu\int_Y d\nu\,f.
\]
Proof. If $E\in\mathcal M\otimes\mathcal N$ is a $\mu\otimes\nu$ null set (i.e. $(\mu\otimes\nu)(E)=0$), then
\[
0=(\mu\otimes\nu)(E)=\int_X\nu({}_xE)\,d\mu(x)=\int_Y\mu(E^y)\,d\nu(y).
\]
This shows that
\[
\mu(\{x:\nu({}_xE)\ne0\})=0\ \text{and}\ \nu(\{y:\mu(E^y)\ne0\})=0,
\]
i.e. $\nu({}_xE)=0$ for $\mu$ a.e. $x$ and $\mu(E^y)=0$ for $\nu$ a.e. $y$. If $h$ is $\mathcal L$ measurable and $h=0$ for $\lambda$ a.e., then there exists $E\in\mathcal M\otimes\mathcal N$ such that $\{(x,y):h(x,y)\ne0\}\subset E$ and $(\mu\otimes\nu)(E)=0$. Therefore $|h(x,y)|\le1_E(x,y)$ and $(\mu\otimes\nu)(E)=0$. Since
\begin{align*}
\{h_x\ne0\}&=\{y\in Y:h(x,y)\ne0\}\subset{}_xE\ \text{and}\\
\{h^y\ne0\}&=\{x\in X:h(x,y)\ne0\}\subset E^y
\end{align*}
we learn that for $\mu$ a.e. $x$ and $\nu$ a.e. $y$ that $\{h_x\ne0\}\in\mathcal N$, $\{h^y\ne0\}\in\mathcal M$, $\nu(\{h_x\ne0\})=0$ and $\mu(\{h^y\ne0\})=0$. This implies $\int_Y h(x,y)\,d\nu(y)$ exists and equals $0$ for $\mu$ a.e. $x$ and similarly that $\int_X h(x,y)\,d\mu(x)$ exists and equals $0$ for $\nu$ a.e. $y$. Therefore
\[
0=\int_{X\times Y}h\,d\lambda=\int_Y\left(\int_X h\,d\mu\right)d\nu=\int_X\left(\int_Y h\,d\nu\right)d\mu.
\]
For general $f\in L^1(\lambda)$, we may choose $g\in L^1(\mathcal M\otimes\mathcal N,\mu\otimes\nu)$ such that $f(x,y)=g(x,y)$ for $\lambda$ a.e. $(x,y)$. Define $h:=f-g$. Then $h=0$, $\lambda$ a.e. Hence by what we have just proved and Theorem 9.6, $f=g+h$ has the following properties:

1. For $\mu$ a.e. $x$, $y\mapsto f(x,y)=g(x,y)+h(x,y)$ is in $L^1(\nu)$ and
\[
\int_Y f(x,y)\,d\nu(y)=\int_Y g(x,y)\,d\nu(y).
\]
2. For $\nu$ a.e. $y$, $x\mapsto f(x,y)=g(x,y)+h(x,y)$ is in $L^1(\mu)$ and
\[
\int_X f(x,y)\,d\mu(x)=\int_X g(x,y)\,d\mu(x).
\]

From these assertions and Theorem 9.6, it follows that
\begin{align*}
\int_X d\mu(x)\int_Y d\nu(y)\,f(x,y)&=\int_X d\mu(x)\int_Y d\nu(y)\,g(x,y)\\
&=\int_Y d\nu(y)\int_X d\mu(x)\,g(x,y)\\
&=\int_{X\times Y}g(x,y)\,d(\mu\otimes\nu)(x,y)=\int_{X\times Y}f(x,y)\,d\lambda(x,y).
\end{align*}
Similarly it is shown that
\[
\int_Y d\nu(y)\int_X d\mu(x)\,f(x,y)=\int_{X\times Y}f(x,y)\,d\lambda(x,y).
\]
9.5 Lebesgue Measure on $\mathbb R^d$ and the Change of Variables Theorem

Notation 9.17 Let
\[
m^d:=\overbrace{m\otimes\dots\otimes m}^{d\text{ times}}\ \text{on}\ \mathcal B_{\mathbb R^d}=\overbrace{\mathcal B_{\mathbb R}\otimes\dots\otimes\mathcal B_{\mathbb R}}^{d\text{ times}}
\]
be the $d$-fold product of Lebesgue measure $m$ on $\mathcal B_{\mathbb R}$. We will also use $m^d$ to denote its completion and let $\mathcal L_d$ be the completion of $\mathcal B_{\mathbb R^d}$ relative to $m^d$. A subset $A\in\mathcal L_d$ is called a Lebesgue measurable set and $m^d$ is called $d$-dimensional Lebesgue measure, or just Lebesgue measure for short.

Definition 9.18. A function $f:\mathbb R^d\to\mathbb R$ is Lebesgue measurable if $f^{-1}(\mathcal B_{\mathbb R})\subset\mathcal L_d$.

Notation 9.19 I will often be sloppy in the sequel and write $m$ for $m^d$ and $dx$ for $dm(x)=dm^d(x)$, i.e.
\[
\int_{\mathbb R^d}f(x)\,dx=\int_{\mathbb R^d}f\,dm=\int_{\mathbb R^d}f\,dm^d.
\]
Hopefully the reader will understand the meaning from the context.

Theorem 9.20. Lebesgue measure $m^d$ is translation invariant. Moreover $m^d$ is the unique translation invariant measure on $\mathcal B_{\mathbb R^d}$ such that $m^d((0,1]^d)=1$.
Proof. Let $A=J_1\times\dots\times J_d$ with $J_i\in\mathcal B_{\mathbb R}$ and $x\in\mathbb R^d$. Then
\[
x+A=(x_1+J_1)\times(x_2+J_2)\times\dots\times(x_d+J_d)
\]
and therefore by translation invariance of $m$ on $\mathcal B_{\mathbb R}$ we find that
\[
m^d(x+A)=m(x_1+J_1)\dots m(x_d+J_d)=m(J_1)\dots m(J_d)=m^d(A)
\]
and hence $m^d(x+A)=m^d(A)$ for all $A\in\mathcal B_{\mathbb R^d}$ since it holds for $A$ in a multiplicative system which generates $\mathcal B_{\mathbb R^d}$. From this fact we see that the measures $m^d(x+\cdot)$ and $m^d(\cdot)$ have the same null sets. Using this it is easily seen that $m(x+A)=m(A)$ for all $A\in\mathcal L_d$. The proof of the second assertion is Exercise 9.13.
Exercise 9.1. In this problem you are asked to show there is no reasonable notion of Lebesgue measure on an infinite dimensional Hilbert space. To be more precise, suppose $H$ is an infinite dimensional Hilbert space and $m$ is a countably additive measure on $\mathcal B_H$ which is invariant under translations and satisfies, $m(B_0(\varepsilon))>0$ for all $\varepsilon>0$. Show $m(V)=\infty$ for all non-empty open subsets $V\subset H$.
Theorem 9.21 (Change of Variables Theorem). Let $\Omega\subset_o\mathbb R^d$ be an open set and $T:\Omega\to T(\Omega)\subset_o\mathbb R^d$ be a $C^1$-diffeomorphism,$^2$ see Figure 9.1. Then for any Borel measurable function, $f:T(\Omega)\to[0,\infty]$,
\[
\int_\Omega f(T(x))\,|\det T'(x)|\,dx=\int_{T(\Omega)}f(y)\,dy, \tag{9.30}
\]
where $T'(x)$ is the linear transformation on $\mathbb R^d$ defined by $T'(x)v:=\frac{d}{dt}\big|_0T(x+tv)$. More explicitly, viewing vectors in $\mathbb R^d$ as columns, $T'(x)$ may be represented by the matrix
\[
T'(x)=\begin{pmatrix}\partial_1T_1(x)&\dots&\partial_dT_1(x)\\ \vdots&\ddots&\vdots\\ \partial_1T_d(x)&\dots&\partial_dT_d(x)\end{pmatrix}, \tag{9.31}
\]
i.e. the $i$-$j$ matrix entry of $T'(x)$ is given by $T'(x)_{ij}=\partial_jT_i(x)$ where $T(x)=(T_1(x),\dots,T_d(x))^{\mathrm{tr}}$ and $\partial_i=\partial/\partial x_i$.

Remark 9.22. Theorem 9.21 is best remembered as the statement: if we make the change of variables $y=T(x)$, then $dy=|\det T'(x)|\,dx$. As usual, you must also change the limits of integration appropriately, i.e. if $x$ ranges through $\Omega$ then $y$ must range through $T(\Omega)$.

$^2$ That is $T:\Omega\to T(\Omega)\subset_o\mathbb R^d$ is a continuously differentiable bijection and the inverse map $T^{-1}:T(\Omega)\to\Omega$ is also continuously differentiable.
Fig. 9.1. The geometric setup of Theorem 9.21.
Note: you may skip the rest of this section!
Proof. The proof will be by induction on $d$. The case $d=1$ was essentially done in Exercise 7.12. Nevertheless, for the sake of completeness let us give a proof here. Suppose $d=1$, $a<\alpha<\beta<b$ such that $[a,b]$ is a compact subinterval of $\Omega$. Then $|\det T'|=|T'|$ and
\[
\int_{[a,b]}1_{T((\alpha,\beta])}(T(x))\,|T'(x)|\,dx=\int_{[a,b]}1_{(\alpha,\beta]}(x)\,|T'(x)|\,dx=\int_\alpha^\beta|T'(x)|\,dx.
\]
If $T'(x)>0$ on $[a,b]$, then
\[
\int_\alpha^\beta|T'(x)|\,dx=\int_\alpha^\beta T'(x)\,dx=T(\beta)-T(\alpha)
=m(T((\alpha,\beta]))=\int_{T([a,b])}1_{T((\alpha,\beta])}(y)\,dy
\]
while if $T'(x)<0$ on $[a,b]$, then
\[
\int_\alpha^\beta|T'(x)|\,dx=-\int_\alpha^\beta T'(x)\,dx=T(\alpha)-T(\beta)
=m(T((\alpha,\beta]))=\int_{T([a,b])}1_{T((\alpha,\beta])}(y)\,dy.
\]
Combining the previous three equations shows
\[
\int_{[a,b]}f(T(x))\,|T'(x)|\,dx=\int_{T([a,b])}f(y)\,dy \tag{9.32}
\]
whenever $f$ is of the form $f=1_{T((\alpha,\beta])}$ with $a<\alpha<\beta<b$. An application of Dynkin's multiplicative system Theorem 8.16 then implies that Eq. (9.32) holds for every bounded measurable function $f:T([a,b])\to\mathbb R$. (Observe that $|T'(x)|$ is continuous and hence bounded for $x$ in the compact interval, $[a,b]$.)

Recall that $\Omega=\coprod_{n=1}^N(a_n,b_n)$ where $a_n,b_n\in\mathbb R\cup\{\pm\infty\}$ for $n=1,2,\dots<N$ with $N=\infty$ possible. Hence if $f:T(\Omega)\to\mathbb R_+$ is a Borel measurable function and $a_n<\alpha_k<\beta_k<b_n$ with $\alpha_k\downarrow a_n$ and $\beta_k\uparrow b_n$, then by what we have already proved and the monotone convergence theorem
\begin{align*}
\int_\Omega 1_{(a_n,b_n)}\cdot(f\circ T)\,|T'|\,dm&=\int_\Omega\left(1_{T((a_n,b_n))}f\right)\circ T\,|T'|\,dm\\
&=\lim_{k\to\infty}\int_\Omega\left(1_{T([\alpha_k,\beta_k])}f\right)\circ T\,|T'|\,dm\\
&=\lim_{k\to\infty}\int_{T(\Omega)}1_{T([\alpha_k,\beta_k])}\cdot f\,dm=\int_{T(\Omega)}1_{T((a_n,b_n))}\cdot f\,dm.
\end{align*}
Summing this equality on $n$ then shows Eq. (9.30) holds.

To carry out the induction step, we now suppose $d>1$ and suppose the theorem is valid with $d$ being replaced by $d-1$. For notational compactness, let us write vectors in $\mathbb R^d$ as row vectors rather than column vectors. Nevertheless, the matrix associated to the differential, $T'(x)$, will always be taken to be given as in Eq. (9.31).

Case 1. Suppose $T(x)$ has the form
\[
T(x)=(x_i,T_2(x),\dots,T_d(x)) \tag{9.33}
\]
or
\[
T(x)=(T_1(x),\dots,T_{d-1}(x),x_i) \tag{9.34}
\]
for some $i\in\{1,\dots,d\}$. For definiteness we will assume $T$ is as in Eq. (9.33); the case of $T$ in Eq. (9.34) may be handled similarly. For $t\in\mathbb R$, let $i_t:\mathbb R^{d-1}\to\mathbb R^d$ be the inclusion map defined by
\[
i_t(w):=w_t:=(w_1,\dots,w_{i-1},t,w_{i+1},\dots,w_{d-1}),
\]
let $\Omega_t$ be the (possibly empty) open subset of $\mathbb R^{d-1}$ defined by
\[
\Omega_t:=\left\{w\in\mathbb R^{d-1}:(w_1,\dots,w_{i-1},t,w_{i+1},\dots,w_{d-1})\in\Omega\right\}
\]
and let $T_t:\Omega_t\to\mathbb R^{d-1}$ be defined by
\[
T_t(w)=(T_2(w_t),\dots,T_d(w_t)),
\]
see Figure 9.2. Expanding $\det T'(w_t)$ along the first row of the matrix $T'(w_t)$ shows
\[
|\det T'(w_t)|=|\det T'_t(w)|.
\]

Fig. 9.2. In this picture $d=i=3$ and $\Omega$ is an egg-shaped region with an egg-shaped hole. The picture indicates the geometry associated with the map $T$ and slicing the set $\Omega$ along planes where $x_3=t$.

Now by the Fubini-Tonelli Theorem and the induction hypothesis,
\begin{align*}
\int_\Omega f\circ T\,|\det T'|\,dm&=\int_{\mathbb R^d}1_\Omega\cdot f\circ T\,|\det T'|\,dm\\
&=\int_{\mathbb R^d}1_\Omega(w_t)\,(f\circ T)(w_t)\,|\det T'(w_t)|\,dw\,dt\\
&=\int_{\mathbb R}\left(\int_{\Omega_t}(f\circ T)(w_t)\,|\det T'(w_t)|\,dw\right)dt\\
&=\int_{\mathbb R}\left(\int_{\Omega_t}f(t,T_t(w))\,|\det T'_t(w)|\,dw\right)dt\\
&=\int_{\mathbb R}\left(\int_{T_t(\Omega_t)}f(t,z)\,dz\right)dt=\int_{\mathbb R}\left(\int_{\mathbb R^{d-1}}1_{T(\Omega)}(t,z)\,f(t,z)\,dz\right)dt\\
&=\int_{T(\Omega)}f(y)\,dy
\end{align*}
wherein the last two equalities we have used Fubini-Tonelli along with the identity;
\[
T(\Omega)=\coprod_{t\in\mathbb R}T(i_t(\Omega_t))=\coprod_{t\in\mathbb R}\{(t,z):z\in T_t(\Omega_t)\}.
\]
Case 2. (Eq. (9.30) is true locally.) Suppose that $T:\Omega\to\mathbb R^d$ is a general map as in the statement of the theorem and $x_0\in\Omega$ is an arbitrary point. We will now show there exists an open neighborhood $W\subset\Omega$ of $x_0$ such that
\[
\int_W f\circ T\,|\det T'|\,dm=\int_{T(W)}f\,dm
\]
holds for all Borel measurable functions, $f:T(W)\to[0,\infty]$. Let $M_i$ be the $1$-$i$ minor of $T'(x_0)$, i.e. the determinant of $T'(x_0)$ with the first row and $i^{\text{th}}$ column removed. Since
\[
0\ne\det T'(x_0)=\sum_{i=1}^d(-1)^{i+1}\,\partial_iT_1(x_0)\,M_i,
\]
there must be some $i$ such that $M_i\ne0$. Fix an $i$ such that $M_i\ne0$ and let,
\[
S(x):=(x_i,T_2(x),\dots,T_d(x)). \tag{9.35}
\]
Observe that $|\det S'(x_0)|=|M_i|\ne0$. Hence by the inverse function Theorem, there exists an open neighborhood $W$ of $x_0$ such that $W\subset_o\Omega$, $S(W)\subset_o\mathbb R^d$ and $S:W\to S(W)$ is a $C^1$-diffeomorphism. Let $R:S(W)\to T(W)\subset_o\mathbb R^d$ be the $C^1$-diffeomorphism defined by
\[
R(z):=T\circ S^{-1}(z)\ \text{for all }z\in S(W).
\]
Because
\[
(T_1(x),\dots,T_d(x))=T(x)=R(S(x))=R((x_i,T_2(x),\dots,T_d(x)))
\]
for all $x\in W$, if
\[
(z_1,z_2,\dots,z_d)=S(x)=(x_i,T_2(x),\dots,T_d(x))
\]
then
\[
R(z)=\left(T_1\left(S^{-1}(z)\right),z_2,\dots,z_d\right). \tag{9.36}
\]
Observe that $S$ is a map of the form in Eq. (9.33), $R$ is a map of the form in Eq. (9.34), $T'(x)=R'(S(x))\,S'(x)$ (by the chain rule) and (by the multiplicative property of the determinant)
\[
|\det T'(x)|=|\det R'(S(x))|\,|\det S'(x)|\quad\forall\,x\in W.
\]
So if $f:T(W)\to[0,\infty]$ is a Borel measurable function, two applications of the results in Case 1. shows,
\begin{align*}
\int_W f\circ T\,|\det T'|\,dm&=\int_W\left(f\circ R\,|\det R'|\right)\circ S\,|\det S'|\,dm\\
&=\int_{S(W)}f\circ R\,|\det R'|\,dm=\int_{R(S(W))}f\,dm=\int_{T(W)}f\,dm
\end{align*}
and Case 2. is proved.
Case 3. (General Case.) Let $f:T(\Omega)\to[0,\infty]$ be a general non-negative Borel measurable function and let
\[
K_n:=\{x\in\Omega:\operatorname{dist}(x,\Omega^c)\ge1/n\ \text{and}\ |x|\le n\}.
\]
Then each $K_n$ is a compact subset of $\Omega$ and $K_n\uparrow\Omega$ as $n\to\infty$. Using the compactness of $K_n$ and case 2, for each $n\in\mathbb N$, there is a finite open cover $\mathcal W_n$ of $K_n$ such that $W\subset\Omega$ and Eq. (9.30) holds with $\Omega$ replaced by $W$ for each $W\in\mathcal W_n$. Let $\{W_i\}_{i=1}^\infty$ be an enumeration of $\cup_{n=1}^\infty\mathcal W_n$ and set $\tilde W_1=W_1$ and $\tilde W_i:=W_i\setminus(W_1\cup\dots\cup W_{i-1})$ for all $i\ge2$. Then $\Omega=\coprod_{i=1}^\infty\tilde W_i$ and by repeated use of case 2.,
\begin{align*}
\int_\Omega f\circ T\,|\det T'|\,dm&=\sum_{i=1}^\infty\int_\Omega 1_{\tilde W_i}\cdot(f\circ T)\,|\det T'|\,dm\\
&=\sum_{i=1}^\infty\int_{W_i}\left(1_{T(\tilde W_i)}f\right)\circ T\,|\det T'|\,dm\\
&=\sum_{i=1}^\infty\int_{T(W_i)}1_{T(\tilde W_i)}\cdot f\,dm=\sum_{i=1}^\infty\int_{T(\Omega)}1_{T(\tilde W_i)}\cdot f\,dm\\
&=\int_{T(\Omega)}f\,dm.
\end{align*}
Remark 9.23. When $d=1$, one often learns the change of variables formula as
\[
\int_a^b f(T(x))\,T'(x)\,dx=\int_{T(a)}^{T(b)}f(y)\,dy \tag{9.37}
\]
where $f:[a,b]\to\mathbb R$ is a continuous function and $T$ is a $C^1$ function defined in a neighborhood of $[a,b]$. If $T'>0$ on $(a,b)$ then $T((a,b))=(T(a),T(b))$ and Eq. (9.37) implies Eq. (9.30) with $\Omega=(a,b)$. On the other hand if $T'<0$ on $(a,b)$ then $T((a,b))=(T(b),T(a))$ and Eq. (9.37) is equivalent to
\[
\int_{(a,b)}f(T(x))\,(-|T'(x)|)\,dx=\int_{T(a)}^{T(b)}f(y)\,dy=-\int_{T((a,b))}f(y)\,dy
\]
which again implies Eq. (9.30). On the other hand Eq. (9.37) is more general than Eq. (9.30) since it does not require $T$ to be injective. The standard proof of Eq. (9.37) is as follows. For $z\in T([a,b])$, let
\[
F(z):=\int_{T(a)}^zf(y)\,dy.
\]
Then by the chain rule and the fundamental theorem of calculus,
\begin{align*}
\int_a^bf(T(x))\,T'(x)\,dx&=\int_a^bF'(T(x))\,T'(x)\,dx=\int_a^b\frac{d}{dx}\left[F(T(x))\right]dx\\
&=F(T(x))\big|_a^b=\int_{T(a)}^{T(b)}f(y)\,dy.
\end{align*}
An application of Dynkin's multiplicative systems theorem now shows that Eq. (9.37) holds for all bounded measurable functions $f$ on $(a,b)$. Then by the usual truncation argument, it also holds for all positive measurable functions on $(a,b)$.
Exercise 9.2. Continuing the setup in Theorem 9.21, show that
\[
f\in L^1\left(T(\Omega),m^d\right)\ \text{iff}\ \int_\Omega|f\circ T|\,|\det T'|\,dm<\infty
\]
and if $f\in L^1\left(T(\Omega),m^d\right)$, then Eq. (9.30) holds.
Example 9.24. Continuing the setup in Theorem 9.21, if $A\in\mathcal B_\Omega$, then
\begin{align*}
m(T(A))&=\int_{\mathbb R^d}1_{T(A)}(y)\,dy=\int_{\mathbb R^d}1_{T(A)}(Tx)\,|\det T'(x)|\,dx\\
&=\int_{\mathbb R^d}1_A(x)\,|\det T'(x)|\,dx
\end{align*}
wherein the second equality we have made the change of variables, $y=T(x)$. Hence we have shown
\[
d(m\circ T)=|\det T'(\cdot)|\,dm.
\]
Taking $T\in GL(d,\mathbb R)=GL(\mathbb R^d)$ (the space of $d\times d$ invertible matrices) in the previous example implies $m\circ T=|\det T|\,m$, i.e.
\[
m(T(A))=|\det T|\,m(A)\ \text{for all }A\in\mathcal B_{\mathbb R^d}. \tag{9.38}
\]
This equation also shows that $m\circ T$ and $m$ have the same null sets and hence the equality in Eq. (9.38) is valid for any $A\in\mathcal L_d$. In particular we may conclude that $m$ is invariant under those $T\in GL(d,\mathbb R)$ with $|\det(T)|=1$. For example if $T$ is a rotation (i.e. $T^{\mathrm{tr}}T=I$), then $|\det T|=1$ and hence $m$ is invariant under all rotations. This is not obvious from the definition of $m^d$ as a product measure!
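Eq. (9.38) is easy to see concretely in $d=2$: the image of the unit square under a linear map is a parallelogram whose area can be computed directly. A Python sketch (not part of the notes; the particular matrix is an arbitrary example):

```python
# The image of the unit square under a linear map T is a parallelogram; its
# area, computed by the shoelace formula, equals |det T| as in Eq. (9.38).
T = [[2.0, 1.0], [0.5, 3.0]]
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]

corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [(T[0][0] * x + T[0][1] * y, T[1][0] * x + T[1][1] * y)
         for x, y in corners]

# shoelace area of the image polygon
area = 0.5 * abs(sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(image, image[1:] + image[:1])))
print(area, abs(det_T))
```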
Example 9.25. Suppose that $T(x)=x+b$ for some $b\in\mathbb R^d$. In this case $T'(x)=I$ and therefore it follows that
\[
\int_{\mathbb R^d}f(x+b)\,dx=\int_{\mathbb R^d}f(y)\,dy
\]
for all measurable $f:\mathbb R^d\to[0,\infty]$ or for any $f\in L^1(m)$. In particular Lebesgue measure is invariant under translations.
Example 9.26 (Polar Coordinates). Suppose $T:(0,\infty)\times(0,2\pi)\to\mathbb R^2$ is defined by
\[
x=T(r,\theta)=(r\cos\theta,r\sin\theta),
\]
i.e. we are making the change of variable,
\[
x_1=r\cos\theta\ \text{and}\ x_2=r\sin\theta\ \text{for}\ 0<r<\infty\ \text{and}\ 0<\theta<2\pi.
\]
In this case
\[
T'(r,\theta)=\begin{pmatrix}\cos\theta&-r\sin\theta\\ \sin\theta&r\cos\theta\end{pmatrix}
\]
and therefore
\[
dx=|\det T'(r,\theta)|\,dr\,d\theta=r\,dr\,d\theta.
\]
Observing that
\[
\mathbb R^2\setminus T((0,\infty)\times(0,2\pi))=\ell:=\{(x,0):x\ge0\}
\]
has $m^2$-measure zero, it follows from the change of variables Theorem 9.21 that
\[
\int_{\mathbb R^2}f(x)\,dx=\int_0^{2\pi}d\theta\int_0^\infty dr\,r\,f(r(\cos\theta,\sin\theta)) \tag{9.39}
\]
for any Borel measurable function $f:\mathbb R^2\to[0,\infty]$.
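Eq. (9.39) can be checked numerically against the Cartesian iterated integral; for $f(x)=e^{-|x|^2}$ both sides equal $\pi$. A Python sketch (not part of the notes; truncation points and grid sizes are ad hoc):

```python
import math

def midpoint(f, a, b, n=600):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

f = lambda x1, x2: math.exp(-(x1 * x1 + x2 * x2))

# Cartesian iterated integral over [-6, 6]^2 (the tail is negligible).
cart = midpoint(lambda x1: midpoint(lambda x2: f(x1, x2), -6, 6), -6, 6)

# Polar form as in Eq. (9.39): integrate r e^{-r^2} in r, then over theta.
polar = midpoint(lambda th: midpoint(lambda r: r * math.exp(-r * r), 0, 6),
                 0, 2 * math.pi)
print(cart, polar, math.pi)
```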
Example 9.27 (Holomorphic Change of Variables). Suppose that $f:\Omega\subset_o\mathbb C\cong\mathbb R^2\to\mathbb C$ is an injective holomorphic function such that $f'(z)\ne0$ for all $z\in\Omega$. We may express $f$ as
\[
f(x+iy)=U(x,y)+iV(x,y)
\]
for all $z=x+iy\in\Omega$. Hence if we make the change of variables,
\[
w=u+iv=f(x+iy)=U(x,y)+iV(x,y)
\]
then
\[
du\,dv=\left|\det\begin{pmatrix}U_x&U_y\\V_x&V_y\end{pmatrix}\right|dx\,dy=|U_xV_y-U_yV_x|\,dx\,dy.
\]
Recalling that $U$ and $V$ satisfy the Cauchy Riemann equations, $U_x=V_y$ and $U_y=-V_x$ with $f'=U_x+iV_x$, we learn
\[
U_xV_y-U_yV_x=U_x^2+V_x^2=|f'|^2.
\]
Therefore
\[
du\,dv=|f'(x+iy)|^2\,dx\,dy.
\]
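As a concrete instance of Example 9.27 (a Python sketch, not part of the notes; the map $f(z)=z^2$ and the quarter-annulus domain are assumptions chosen so that $f$ is injective):

```python
import math

# f(z) = z^2 is injective with f' != 0 on the quarter-annulus
# D = {1 < |z| < 2, 0 < arg z < pi/2}, and f(D) = {1 < |w| < 4, 0 < arg w < pi}.
# Example 9.27 predicts area(f(D)) equals the integral of |f'|^2 over D; in
# polar coordinates that is (pi/2) * integral of |2r|^2 * r over (1, 2).
def midpoint(g, a, b, n=2000):
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

lhs = (math.pi / 2) * midpoint(lambda r: 4 * r ** 3, 1, 2)
area_fD = 0.5 * math.pi * (4 ** 2 - 1 ** 2)  # half-annulus of radii 1 and 4
print(lhs, area_fD)
```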
Fig. 9.3. The region $\Omega$ consists of the two curved rectangular regions shown.
Example 9.28. In this example we will evaluate the integral
\[
I:=\iint_\Omega\left(x^4-y^4\right)dx\,dy
\]
where
\[
\Omega=\left\{(x,y):1<x^2-y^2<2,\ 0<xy<1\right\},
\]
see Figure 9.3. We are going to do this by making the change of variables,
\[
(u,v):=T(x,y)=\left(x^2-y^2,xy\right),
\]
in which case
\[
du\,dv=\left|\det\begin{pmatrix}2x&-2y\\y&x\end{pmatrix}\right|dx\,dy=2\left(x^2+y^2\right)dx\,dy.
\]
Notice that
\[
\left(x^4-y^4\right)dx\,dy=\left(x^2-y^2\right)\left(x^2+y^2\right)dx\,dy=u\left(x^2+y^2\right)dx\,dy=\frac12u\,du\,dv.
\]
The function $T$ is not injective on $\Omega$ but it is injective on each of its connected components. Let $D$ be the connected component in the first quadrant so that $\Omega=-D\cup D$ and $T(\pm D)=(1,2)\times(0,1)$. The change of variables theorem then implies
\[
I_\pm:=\iint_{\pm D}\left(x^4-y^4\right)dx\,dy=\frac12\iint_{(1,2)\times(0,1)}u\,du\,dv=\frac12\,\frac{u^2}{2}\Big|_1^2\cdot1=\frac34
\]
and therefore $I=I_++I_-=2\cdot(3/4)=3/2$.
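The value $I=3/2$ can be corroborated by integrating directly in the $(x,y)$ variables, with no change of variables. A Monte Carlo sketch in Python (not part of the notes; the bounding box and sample count are assumptions, and on $\Omega$ one checks $x^2\le1+\sqrt2$ so $\Omega\subset[-2,2]^2$):

```python
import random

random.seed(0)

def g(x, y):
    # (x^4 - y^4) times the indicator of Omega = {1 < x^2-y^2 < 2, 0 < xy < 1}
    return x ** 4 - y ** 4 if 1 < x * x - y * y < 2 and 0 < x * y < 1 else 0.0

N = 400_000
box_area = 16.0  # Omega is contained in [-2, 2]^2
I_est = box_area * sum(g(random.uniform(-2, 2), random.uniform(-2, 2))
                       for _ in range(N)) / N
print(I_est)  # should be close to 3/2
```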
Exercise 9.3 (Spherical Coordinates). Let $T:(0,\infty)\times(0,\pi)\times(0,2\pi)\to\mathbb R^3$ be defined by
\begin{align*}
T(r,\varphi,\theta)&=(r\sin\varphi\cos\theta,\ r\sin\varphi\sin\theta,\ r\cos\varphi)\\
&=r(\sin\varphi\cos\theta,\ \sin\varphi\sin\theta,\ \cos\varphi),
\end{align*}
see Figure 9.4. By making the change of variables $x=T(r,\varphi,\theta)$, show
\[
\int_{\mathbb R^3}f(x)\,dx=\int_0^\pi d\varphi\int_0^{2\pi}d\theta\int_0^\infty dr\,r^2\sin\varphi\,f(T(r,\varphi,\theta))
\]
for any Borel measurable function, $f:\mathbb R^3\to[0,\infty]$.

Fig. 9.4. The relation of $x$ to $(r,\varphi,\theta)$ in spherical coordinates.
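As a sanity check on the spherical-coordinate formula, taking $f=1_{B(0,1)}$ makes the integrand factor, and the triple integral should give the volume $4\pi/3$ of the unit ball. A Python sketch (not part of the notes; the grid size is an ad hoc choice):

```python
import math

def midpoint(f, a, b, n=1000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# With f = 1_{r<1} the spherical integral factors into
# (integral of r^2 over (0,1)) * (integral of sin(phi) over (0,pi)) * 2*pi.
vol = (midpoint(lambda r: r * r, 0, 1)
       * midpoint(math.sin, 0, math.pi)
       * (2 * math.pi))
print(vol, 4 * math.pi / 3)
```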
Lemma 9.29. Let $a>0$ and
\[
I_d(a):=\int_{\mathbb R^d}e^{-a|x|^2}\,dm(x).
\]
Then $I_d(a)=(\pi/a)^{d/2}$.

Proof. By Tonelli's theorem and induction,
\begin{align}
I_d(a)&=\int_{\mathbb R^{d-1}\times\mathbb R}e^{-a|y|^2}e^{-at^2}\,m^{d-1}(dy)\,dt\notag\\
&=I_{d-1}(a)\,I_1(a)=I_1^d(a). \tag{9.40}
\end{align}
So it suffices to compute:
\[
I_2(a)=\int_{\mathbb R^2}e^{-a|x|^2}\,dm(x)=\int_{\mathbb R^2\setminus\{0\}}e^{-a(x_1^2+x_2^2)}\,dx_1\,dx_2.
\]
Using polar coordinates, see Eq. (9.39), we find,
\begin{align*}
I_2(a)&=\int_0^\infty dr\,r\int_0^{2\pi}d\theta\,e^{-ar^2}=2\pi\int_0^\infty re^{-ar^2}\,dr\\
&=2\pi\lim_{M\to\infty}\int_0^M re^{-ar^2}\,dr=2\pi\lim_{M\to\infty}\frac{-e^{-ar^2}}{2a}\Big|_0^M=\frac{2\pi}{2a}=\pi/a.
\end{align*}
This shows that $I_2(a)=\pi/a$ and the result now follows from Eq. (9.40).
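A numerical check of Lemma 9.29 through Eq. (9.40) in Python (not part of the notes; truncation and grid are ad hoc): compute $I_1(a)$ by quadrature and verify $I_1(a)^2=I_2(a)=\pi/a$.

```python
import math

def I1(a, L=12.0, n=200_000):
    # midpoint approximation of I_1(a), truncated at |x| = L
    h = 2 * L / n
    return h * sum(math.exp(-a * (-L + (k + 0.5) * h) ** 2) for k in range(n))

for a in (0.5, 1.0, 2.0):
    print(a, I1(a) ** 2, math.pi / a)  # Eq. (9.40): I_2(a) = I_1(a)^2
```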
9.6 The Polar Decomposition of Lebesgue Measure*

Let
\[
S^{d-1}=\Big\{x\in\mathbb R^d:|x|^2:=\sum_{i=1}^dx_i^2=1\Big\}
\]
be the unit sphere in $\mathbb R^d$ equipped with its Borel $\sigma$-algebra, $\mathcal B_{S^{d-1}}$, and let $\Phi:\mathbb R^d\setminus\{0\}\to(0,\infty)\times S^{d-1}$ be defined by $\Phi(x):=\left(|x|,|x|^{-1}x\right)$. The inverse map, $\Phi^{-1}:(0,\infty)\times S^{d-1}\to\mathbb R^d\setminus\{0\}$, is given by $\Phi^{-1}(r,\omega)=r\omega$. Since $\Phi$ and $\Phi^{-1}$ are continuous, they are both Borel measurable. For $E\in\mathcal B_{S^{d-1}}$ and $a>0$, let
\[
E_a:=\{r\omega:r\in(0,a]\ \text{and}\ \omega\in E\}=\Phi^{-1}((0,a]\times E)\in\mathcal B_{\mathbb R^d}.
\]

Definition 9.30. For $E\in\mathcal B_{S^{d-1}}$, let $\sigma(E):=d\cdot m(E_1)$. We call $\sigma$ the surface measure on $S^{d-1}$.

It is easy to check that $\sigma$ is a measure. Indeed if $E\in\mathcal B_{S^{d-1}}$, then $E_1=\Phi^{-1}((0,1]\times E)\in\mathcal B_{\mathbb R^d}$ so that $m(E_1)$ is well defined. Moreover if $E=\coprod_{i=1}^\infty E_i$, then $E_1=\coprod_{i=1}^\infty(E_i)_1$ and
\[
\sigma(E)=d\cdot m(E_1)=\sum_{i=1}^\infty d\cdot m((E_i)_1)=\sum_{i=1}^\infty\sigma(E_i).
\]
The intuition behind this definition is as follows. If $E\subset S^{d-1}$ is a set and $\varepsilon>0$ is a small number, then the volume of
\[
(1,1+\varepsilon]\cdot E=\{r\omega:r\in(1,1+\varepsilon]\ \text{and}\ \omega\in E\}
\]
Fig. 9.5. Motivating the definition of surface measure for a sphere.

should be approximately given by $m((1,1+\varepsilon]\cdot E)\cong\sigma(E)\,\varepsilon$, see Figure 9.5 below. On the other hand
\[
m((1,1+\varepsilon]\cdot E)=m(E_{1+\varepsilon}\setminus E_1)=\left((1+\varepsilon)^d-1\right)m(E_1).
\]
Therefore we expect the area of $E$ should be given by
\[
\sigma(E)=\lim_{\varepsilon\downarrow0}\frac{\left((1+\varepsilon)^d-1\right)m(E_1)}{\varepsilon}=d\cdot m(E_1).
\]
The following theorem is motivated by Example 9.26 and Exercise 9.3.
Theorem 9.31 (Polar Coordinates). If $f:\mathbb R^d\to[0,\infty]$ is a $(\mathcal B_{\mathbb R^d},\mathcal B)$-measurable function then
\[
\int_{\mathbb R^d}f(x)\,dm(x)=\int_{(0,\infty)\times S^{d-1}}f(r\,\omega)\,r^{d-1}\,dr\,d\sigma(\omega). \tag{9.41}
\]
In particular if $f:\mathbb R_+\to\mathbb R_+$ is measurable then
\[
\int_{\mathbb R^d}f(|x|)\,dx=\int_0^\infty f(r)\,dV(r) \tag{9.42}
\]
where $V(r)=m(B(0,r))=r^dm(B(0,1))=d^{-1}\sigma\left(S^{d-1}\right)r^d$.

Proof. By Exercise 7.11,
\[
\int_{\mathbb R^d}f\,dm=\int_{\mathbb R^d\setminus\{0\}}\left(f\circ\Phi^{-1}\right)\circ\Phi\,dm=\int_{(0,\infty)\times S^{d-1}}\left(f\circ\Phi^{-1}\right)d(\Phi_*m) \tag{9.43}
\]
and therefore to prove Eq. (9.41) we must work out the measure $\Phi_*m$ on $\mathcal B_{(0,\infty)}\otimes\mathcal B_{S^{d-1}}$ defined by
\[
\Phi_*m(A):=m\left(\Phi^{-1}(A)\right)\quad\forall\,A\in\mathcal B_{(0,\infty)}\otimes\mathcal B_{S^{d-1}}. \tag{9.44}
\]
If $A=(a,b]\times E$ with $0<a<b$ and $E\in\mathcal B_{S^{d-1}}$, then
\[
\Phi^{-1}(A)=\{r\omega:r\in(a,b]\ \text{and}\ \omega\in E\}=bE_1\setminus aE_1
\]
wherein we have used $E_a=aE_1$ in the last equality. Therefore by the basic scaling properties of $m$ and the fundamental theorem of calculus,
\begin{align}
(\Phi_*m)((a,b]\times E)&=m(bE_1\setminus aE_1)=m(bE_1)-m(aE_1)\notag\\
&=b^dm(E_1)-a^dm(E_1)=d\cdot m(E_1)\int_a^br^{d-1}\,dr. \tag{9.45}
\end{align}
Letting $d\rho(r)=r^{d-1}\,dr$, i.e.
\[
\rho(J)=\int_Jr^{d-1}\,dr\quad\forall\,J\in\mathcal B_{(0,\infty)}, \tag{9.46}
\]
Eq. (9.45) may be written as
\[
(\Phi_*m)((a,b]\times E)=\rho((a,b])\,\sigma(E)=(\rho\otimes\sigma)((a,b]\times E). \tag{9.47}
\]
Since
\[
\mathcal E=\{(a,b]\times E:0<a<b\ \text{and}\ E\in\mathcal B_{S^{d-1}}\},
\]
is a $\pi$-class (in fact it is an elementary class) such that $\sigma(\mathcal E)=\mathcal B_{(0,\infty)}\otimes\mathcal B_{S^{d-1}}$, it follows from the $\pi$-$\lambda$ Theorem and Eq. (9.47) that $\Phi_*m=\rho\otimes\sigma$. Using this result in Eq. (9.43) gives
\[
\int_{\mathbb R^d}f\,dm=\int_{(0,\infty)\times S^{d-1}}\left(f\circ\Phi^{-1}\right)d(\rho\otimes\sigma)
\]
which combined with Tonelli's Theorem 9.6 proves Eq. (9.41).
Corollary 9.32. The surface area $\sigma(S^{d-1})$ of the unit sphere $S^{d-1}\subset\mathbb R^d$ is
\[
\sigma(S^{d-1})=\frac{2\pi^{d/2}}{\Gamma(d/2)} \tag{9.48}
\]
where $\Gamma$ is the gamma function as in Examples 7.47 and 7.50.

Proof. Using Theorem 9.31 we find
\[
I_d(1)=\int_0^\infty dr\,r^{d-1}e^{-r^2}\int_{S^{d-1}}d\sigma=\sigma(S^{d-1})\int_0^\infty r^{d-1}e^{-r^2}\,dr.
\]
We simplify this last integral by making the change of variables $u=r^2$ so that $r=u^{1/2}$ and $dr=\frac12u^{-1/2}\,du$. The result is
\begin{align}
\int_0^\infty r^{d-1}e^{-r^2}\,dr&=\int_0^\infty u^{\frac{d-1}{2}}e^{-u}\,\frac12u^{-1/2}\,du\notag\\
&=\frac12\int_0^\infty u^{\frac d2-1}e^{-u}\,du=\frac12\Gamma(d/2). \tag{9.49}
\end{align}
Combining the last two equations with Lemma 9.29, which states that $I_d(1)=\pi^{d/2}$, we conclude that
\[
\pi^{d/2}=I_d(1)=\frac12\sigma(S^{d-1})\,\Gamma(d/2)
\]
which proves Eq. (9.48).
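Eq. (9.48) reproduces the familiar low-dimensional values, which is a quick way to test the formula. A Python sketch (not part of the notes):

```python
import math

def sphere_area(d):
    # Eq. (9.48): surface area of S^{d-1} is 2 * pi^{d/2} / Gamma(d/2)
    return 2 * math.pi ** (d / 2) / math.gamma(d / 2)

print(sphere_area(2))  # circumference of the unit circle, 2*pi
print(sphere_area(3))  # area of the usual sphere, 4*pi
print(sphere_area(4))  # 2*pi^2
```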
9.7 More Spherical Coordinates*

In this section we will define spherical coordinates in all dimensions. Along the way we will develop an explicit method for computing surface integrals on spheres. As usual when $n=2$ define spherical coordinates $(r,\theta)\in(0,\infty)\times[0,2\pi)$ so that
\[
\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}r\cos\theta\\r\sin\theta\end{pmatrix}=T_2(\theta,r).
\]
For $n=3$ we let $x_3=r\cos\varphi_1$ and then
\[
\begin{pmatrix}x_1\\x_2\end{pmatrix}=T_2(\theta,r\sin\varphi_1),
\]
as can be seen from Figure 9.6, so that

Fig. 9.6. Setting up polar coordinates in two and three dimensions.

\[
\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}=\begin{pmatrix}T_2(\theta,r\sin\varphi_1)\\r\cos\varphi_1\end{pmatrix}=\begin{pmatrix}r\sin\varphi_1\cos\theta\\r\sin\varphi_1\sin\theta\\r\cos\varphi_1\end{pmatrix}=:T_3(\theta,\varphi_1,r).
\]
We continue to work inductively this way to define
\[
\begin{pmatrix}x_1\\\vdots\\x_n\\x_{n+1}\end{pmatrix}=\begin{pmatrix}T_n(\theta,\varphi_1,\dots,\varphi_{n-2},r\sin\varphi_{n-1})\\r\cos\varphi_{n-1}\end{pmatrix}=T_{n+1}(\theta,\varphi_1,\dots,\varphi_{n-2},\varphi_{n-1},r).
\]
So for example,
\begin{align*}
x_1&=r\sin\varphi_2\sin\varphi_1\cos\theta\\
x_2&=r\sin\varphi_2\sin\varphi_1\sin\theta\\
x_3&=r\sin\varphi_2\cos\varphi_1\\
x_4&=r\cos\varphi_2
\end{align*}
and more generally,
\begin{align}
x_1&=r\sin\varphi_{n-2}\dots\sin\varphi_2\sin\varphi_1\cos\theta\notag\\
x_2&=r\sin\varphi_{n-2}\dots\sin\varphi_2\sin\varphi_1\sin\theta\notag\\
x_3&=r\sin\varphi_{n-2}\dots\sin\varphi_2\cos\varphi_1\notag\\
&\ \ \vdots\notag\\
x_{n-2}&=r\sin\varphi_{n-2}\sin\varphi_{n-3}\cos\varphi_{n-4}\notag\\
x_{n-1}&=r\sin\varphi_{n-2}\cos\varphi_{n-3}\notag\\
x_n&=r\cos\varphi_{n-2}. \tag{9.50}
\end{align}
By the change of variables formula,
\[
\int_{\mathbb{R}^{n}}f(x)\,dm(x)=\int_{0}^{\infty}dr\int_{0\le\phi_{i}\le\pi,\,0\le\theta\le2\pi}d\phi_{1}\cdots d\phi_{n-2}\,d\theta\left[\Delta_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r)\,f(T_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r))\right] \tag{9.51}
\]
where
\[
\Delta_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r):=\left|\det T_{n}'(\theta,\phi_{1},\dots,\phi_{n-2},r)\right|.
\]
Proposition 9.33. The Jacobian, $\Delta_{n}$, is given by
\[
\Delta_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r)=r^{n-1}\sin^{n-2}\phi_{n-2}\cdots\sin^{2}\phi_{2}\sin\phi_{1}. \tag{9.52}
\]
If $f$ is a function on $rS^{n-1}$, the sphere of radius $r$ centered at $0$ inside of $\mathbb{R}^{n}$, then
\begin{align*}
\int_{rS^{n-1}}f(x)\,d\sigma(x)&=r^{n-1}\int_{S^{n-1}}f(r\omega)\,d\sigma(\omega)\\
&=\int_{0\le\phi_{i}\le\pi,\,0\le\theta\le2\pi}f(T_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r))\,\Delta_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r)\,d\phi_{1}\cdots d\phi_{n-2}\,d\theta. \tag{9.53}
\end{align*}
Proof. We are going to compute $\Delta_{n}$ inductively. Letting $\rho:=r\sin\phi_{n-1}$ and writing $\frac{\partial T_{n}}{\partial\xi}$ for $\frac{\partial T_{n}}{\partial\xi}(\theta,\phi_{1},\dots,\phi_{n-2},\rho)$, we have
\begin{align*}
&\Delta_{n+1}(\theta,\phi_{1},\dots,\phi_{n-2},\phi_{n-1},r)\\
&\quad=\left|\det\begin{bmatrix}\frac{\partial T_{n}}{\partial\theta}&\frac{\partial T_{n}}{\partial\phi_{1}}&\cdots&\frac{\partial T_{n}}{\partial\phi_{n-2}}&\frac{\partial T_{n}}{\partial\rho}\,r\cos\phi_{n-1}&\frac{\partial T_{n}}{\partial\rho}\sin\phi_{n-1}\\
0&0&\cdots&0&-r\sin\phi_{n-1}&\cos\phi_{n-1}\end{bmatrix}\right|\\
&\quad=r\left(\cos^{2}\phi_{n-1}+\sin^{2}\phi_{n-1}\right)\Delta_{n}(\theta,\phi_{1},\dots,\phi_{n-2},\rho)\\
&\quad=r\,\Delta_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r\sin\phi_{n-1}),
\end{align*}
i.e.
\[
\Delta_{n+1}(\theta,\phi_{1},\dots,\phi_{n-2},\phi_{n-1},r)=r\,\Delta_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r\sin\phi_{n-1}). \tag{9.54}
\]
To arrive at this result we have expanded the determinant along the bottom row. Starting with $\Delta_{2}(\theta,r)=r$ already derived in Example 9.26, Eq. (9.54) implies
\begin{align*}
\Delta_{3}(\theta,\phi_{1},r)&=r\,\Delta_{2}(\theta,r\sin\phi_{1})=r^{2}\sin\phi_{1}\\
\Delta_{4}(\theta,\phi_{1},\phi_{2},r)&=r\,\Delta_{3}(\theta,\phi_{1},r\sin\phi_{2})=r^{3}\sin^{2}\phi_{2}\sin\phi_{1}\\
&\ \ \vdots\\
\Delta_{n}(\theta,\phi_{1},\dots,\phi_{n-2},r)&=r^{n-1}\sin^{n-2}\phi_{n-2}\cdots\sin^{2}\phi_{2}\sin\phi_{1}
\end{align*}
which proves Eq. (9.52). Equation (9.53) now follows from Eqs. (9.41), (9.51) and (9.52).
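Proposition 9.33 can be spot-checked by differentiating $T_{3}$ numerically and comparing the determinant of the Jacobian matrix with $\Delta_{3}=r^{2}\sin\phi_{1}$; a sketch (the helper names and the finite-difference step are our choices, not part of the text):

```python
import math

def T3(theta, phi1, r):
    # Spherical coordinates in R^3, as defined before Eq. (9.50).
    return (r * math.sin(phi1) * math.cos(theta),
            r * math.sin(phi1) * math.sin(theta),
            r * math.cos(phi1))

def jacobian_det3(f, args, h=1e-6):
    """|det f'(args)| for f: R^3 -> R^3 by central finite differences."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        up, dn = list(args), list(args)
        up[j] += h; dn[j] -= h
        fu, fd = f(*up), f(*dn)
        for i in range(3):
            J[i][j] = (fu[i] - fd[i]) / (2 * h)
    a, b, c = J[0]; d, e, f_ = J[1]; g, h2, i_ = J[2]
    return abs(a * (e * i_ - f_ * h2) - b * (d * i_ - f_ * g) + c * (d * h2 - e * g))

theta, phi1, r = 0.7, 1.1, 2.3
num = jacobian_det3(T3, (theta, phi1, r))
exact = r ** 2 * math.sin(phi1)          # Delta_3 from Eq. (9.52)
assert abs(num - exact) < 1e-5
```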
As a simple application, Eq. (9.53) implies
\[
\sigma\left(S^{n-1}\right)=\int_{0\le\phi_{i}\le\pi,\,0\le\theta\le2\pi}\sin^{n-2}\phi_{n-2}\cdots\sin^{2}\phi_{2}\sin\phi_{1}\,d\phi_{1}\cdots d\phi_{n-2}\,d\theta=2\pi\prod_{k=1}^{n-2}\gamma_{k}=\sigma\left(S^{n-2}\right)\gamma_{n-2} \tag{9.55}
\]
where $\gamma_{k}:=\int_{0}^{\pi}\sin^{k}\phi\,d\phi$. If $k\ge1$, we have by integration by parts that
\begin{align*}
\gamma_{k}&=\int_{0}^{\pi}\sin^{k}\phi\,d\phi=-\int_{0}^{\pi}\sin^{k-1}\phi\,d\cos\phi=2\delta_{k,1}+(k-1)\int_{0}^{\pi}\sin^{k-2}\phi\,\cos^{2}\phi\,d\phi\\
&=2\delta_{k,1}+(k-1)\int_{0}^{\pi}\sin^{k-2}\phi\left(1-\sin^{2}\phi\right)d\phi=2\delta_{k,1}+(k-1)\left[\gamma_{k-2}-\gamma_{k}\right]
\end{align*}
and hence $\gamma_{k}$ satisfies $\gamma_{0}=\pi$, $\gamma_{1}=2$ and the recursion relation
\[
\gamma_{k}=\frac{k-1}{k}\gamma_{k-2}\quad\text{for }k\ge2.
\]
Hence we may conclude
\[
\gamma_{0}=\pi,\ \gamma_{1}=2,\ \gamma_{2}=\frac{1}{2}\pi,\ \gamma_{3}=\frac{2}{3}2,\ \gamma_{4}=\frac{3}{4}\frac{1}{2}\pi,\ \gamma_{5}=\frac{4}{5}\frac{2}{3}2,\ \gamma_{6}=\frac{5}{6}\frac{3}{4}\frac{1}{2}\pi
\]
and more generally by induction that
\[
\gamma_{2k}=\pi\frac{(2k-1)!!}{(2k)!!}\quad\text{and}\quad\gamma_{2k+1}=2\frac{(2k)!!}{(2k+1)!!}.
\]
Indeed,
\[
\gamma_{2(k+1)+1}=\frac{2k+2}{2k+3}\gamma_{2k+1}=\frac{2k+2}{2k+3}\,2\frac{(2k)!!}{(2k+1)!!}=2\frac{[2(k+1)]!!}{(2(k+1)+1)!!}
\]
and
\[
\gamma_{2(k+1)}=\frac{2k+1}{2k+2}\gamma_{2k}=\frac{2k+1}{2k+2}\,\pi\frac{(2k-1)!!}{(2k)!!}=\pi\frac{(2k+1)!!}{(2k+2)!!}.
\]
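The recursion and the double-factorial formulas for $\gamma_{k}$ can be verified against direct numerical quadrature; a sketch (the midpoint rule and the helper names are our choices):

```python
import math

def gamma_k(k, steps=200_000):
    # Midpoint-rule approximation of gamma_k = int_0^pi sin^k(phi) dphi.
    h = math.pi / steps
    return sum(math.sin((j + 0.5) * h) ** k for j in range(steps)) * h

def dfact(n):
    # Double factorial with the convention (-1)!! = 0!! = 1.
    return 1 if n <= 0 else n * dfact(n - 2)

for k in range(8):
    if k % 2 == 0:
        closed = math.pi * dfact(k - 1) / dfact(k)   # gamma_{2m} = pi (2m-1)!!/(2m)!!
    else:
        closed = 2 * dfact(k - 1) / dfact(k)         # gamma_{2m+1} = 2 (2m)!!/(2m+1)!!
    assert abs(gamma_k(k) - closed) < 1e-6
```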
The recursion relation in Eq. (9.55) may be written as
\[
\sigma\left(S^{n}\right)=\sigma\left(S^{n-1}\right)\gamma_{n-1} \tag{9.56}
\]
which combined with $\sigma\left(S^{1}\right)=2\pi$ implies
\begin{align*}
\sigma\left(S^{1}\right)&=2\pi,\\
\sigma\left(S^{2}\right)&=2\pi\cdot\gamma_{1}=2\pi\cdot2,\\
\sigma\left(S^{3}\right)&=2\pi\cdot2\cdot\gamma_{2}=2\pi\cdot2\cdot\frac{1}{2}\pi=\frac{2^{2}\pi^{2}}{2!!},\\
\sigma\left(S^{4}\right)&=\frac{2^{2}\pi^{2}}{2!!}\cdot\gamma_{3}=\frac{2^{2}\pi^{2}}{2!!}\cdot2\cdot\frac{2}{3}=\frac{2^{3}\pi^{2}}{3!!},\\
\sigma\left(S^{5}\right)&=2\pi\cdot2\cdot\frac{1}{2}\pi\cdot\frac{2}{3}\cdot2\cdot\frac{3}{4}\cdot\frac{1}{2}\pi=\frac{2^{3}\pi^{3}}{4!!},\\
\sigma\left(S^{6}\right)&=2\pi\cdot2\cdot\frac{1}{2}\pi\cdot\frac{2}{3}\cdot2\cdot\frac{3}{4}\cdot\frac{1}{2}\pi\cdot\frac{4}{5}\cdot\frac{2}{3}\cdot2=\frac{2^{4}\pi^{3}}{5!!}
\end{align*}
and more generally that
\[
\sigma\left(S^{2n}\right)=\frac{2(2\pi)^{n}}{(2n-1)!!}\quad\text{and}\quad\sigma\left(S^{2n+1}\right)=\frac{(2\pi)^{n+1}}{(2n)!!} \tag{9.57}
\]
which is verified inductively using Eq. (9.56). Indeed,
\[
\sigma\left(S^{2n+1}\right)=\sigma\left(S^{2n}\right)\gamma_{2n}=\frac{2(2\pi)^{n}}{(2n-1)!!}\,\pi\frac{(2n-1)!!}{(2n)!!}=\frac{(2\pi)^{n+1}}{(2n)!!}
\]
and
\[
\sigma\left(S^{2(n+1)}\right)=\sigma\left(S^{2n+2}\right)=\sigma\left(S^{2n+1}\right)\gamma_{2n+1}=\frac{(2\pi)^{n+1}}{(2n)!!}\,2\frac{(2n)!!}{(2n+1)!!}=\frac{2(2\pi)^{n+1}}{(2n+1)!!}.
\]
Using
\[
(2n)!!=2n\left(2(n-1)\right)\cdots\left(2\cdot1\right)=2^{n}n!
\]
we may write $\sigma\left(S^{2n+1}\right)=\frac{2\pi^{n+1}}{n!}$, which shows that Eqs. (9.41) and (9.57) are in agreement. We may also write the formula in Eq. (9.57) as
\[
\sigma\left(S^{n}\right)=\begin{cases}\dfrac{2(2\pi)^{n/2}}{(n-1)!!}&\text{for }n\text{ even}\\[6pt]\dfrac{(2\pi)^{\frac{n+1}{2}}}{(n-1)!!}&\text{for }n\text{ odd.}\end{cases}
\]
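The piecewise double-factorial formula just derived can be checked against the gamma-function formula of Corollary 9.32 (with $d=n+1$); a sketch (function names are ours):

```python
import math

def dfact(n):
    # Double factorial with the convention (-1)!! = 0!! = 1.
    return 1 if n <= 0 else n * dfact(n - 2)

def area_gamma(n):
    # sigma(S^n) = 2 pi^{(n+1)/2} / Gamma((n+1)/2), i.e. Eq. (9.48) with d = n + 1.
    return 2 * math.pi ** ((n + 1) / 2) / math.gamma((n + 1) / 2)

def area_dfact(n):
    # The piecewise double-factorial form of Eq. (9.57).
    if n % 2 == 0:
        return 2 * (2 * math.pi) ** (n / 2) / dfact(n - 1)
    return (2 * math.pi) ** ((n + 1) / 2) / dfact(n - 1)

for n in range(1, 12):
    assert abs(area_gamma(n) - area_dfact(n)) < 1e-9 * area_gamma(n)
```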
9.8 Gaussian Random Vectors

Definition 9.34 (Gaussian Random Vectors). Let $(\Omega,\mathcal{B},P)$ be a probability space and $X:\Omega\to\mathbb{R}^{d}$ be a random vector. We say that $X$ is Gaussian if there exists a $d\times d$ symmetric matrix $Q$ and a vector $\mu\in\mathbb{R}^{d}$ such that
\[
\mathbb{E}\left[e^{i\lambda\cdot X}\right]=\exp\left(-\frac{1}{2}Q\lambda\cdot\lambda+i\mu\cdot\lambda\right)\quad\text{for all }\lambda\in\mathbb{R}^{d}. \tag{9.58}
\]
We will write $X\stackrel{d}{=}N(Q,\mu)$ to denote a Gaussian random vector such that Eq. (9.58) holds.

Notice that if there exists a random variable satisfying Eq. (9.58) then its law is uniquely determined by $Q$ and $\mu$ because of Corollary 8.11. In the exercises below you will develop some basic properties of Gaussian random vectors; see Theorem 9.38 for a summary of what you will prove.
Exercise 9.4. Show that Q must be non-negative in Eq. (9.58).
Definition 9.35. Given a Gaussian random vector, $X$, we call the pair, $(Q,\mu)$, appearing in Eq. (9.58) the characteristics of $X$. We will also abbreviate the statement that $X$ is a Gaussian random vector with characteristics $(Q,\mu)$ by writing $X\stackrel{d}{=}N(Q,\mu)$.
Lemma 9.36. Suppose that $X\stackrel{d}{=}N(Q,\mu)$ and $A:\mathbb{R}^{d}\to\mathbb{R}^{m}$ is an $m\times d$ real matrix and $\alpha\in\mathbb{R}^{m}$, then $AX+\alpha\stackrel{d}{=}N\left(AQA^{\operatorname{tr}},A\mu+\alpha\right)$. In short we might abbreviate this by saying, $A\,N(Q,\mu)+\alpha\stackrel{d}{=}N\left(AQA^{\operatorname{tr}},A\mu+\alpha\right)$.

Proof. Let $\lambda\in\mathbb{R}^{m}$, then
\begin{align*}
\mathbb{E}\left[e^{i\lambda\cdot(AX+\alpha)}\right]&=e^{i\lambda\cdot\alpha}\,\mathbb{E}\left[e^{iA^{\operatorname{tr}}\lambda\cdot X}\right]=e^{i\lambda\cdot\alpha}\exp\left(-\frac{1}{2}QA^{\operatorname{tr}}\lambda\cdot A^{\operatorname{tr}}\lambda+i\mu\cdot A^{\operatorname{tr}}\lambda\right)\\
&=e^{i\lambda\cdot\alpha}\exp\left(-\frac{1}{2}AQA^{\operatorname{tr}}\lambda\cdot\lambda+iA\mu\cdot\lambda\right)\\
&=\exp\left(-\frac{1}{2}AQA^{\operatorname{tr}}\lambda\cdot\lambda+i\left(A\mu+\alpha\right)\cdot\lambda\right)
\end{align*}
from which it follows that $AX+\alpha\stackrel{d}{=}N\left(AQA^{\operatorname{tr}},A\mu+\alpha\right)$.
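Lemma 9.36 is also easy to probe by Monte Carlo; the sketch below (the matrix, sample size, and seed are arbitrary choices of ours, assuming NumPy is available) compares the empirical mean and covariance of $AX+\alpha$ with the predicted $A\mu+\alpha$ and $AQA^{\operatorname{tr}}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000

# X ~ N(Q, mu) with Q = I and mu = 0, built from i.i.d. standard normals.
X = rng.standard_normal((n, d))
A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0]])
alpha = np.array([1.0, -2.0])

Y = X @ A.T + alpha                # samples of AX + alpha

# Lemma 9.36 predicts Y ~ N(A Q A^tr, A mu + alpha) = N(A A^tr, alpha).
emp_mean = Y.mean(axis=0)
emp_cov = np.cov(Y, rowvar=False)

assert np.allclose(emp_mean, alpha, atol=0.05)
assert np.allclose(emp_cov, A @ A.T, atol=0.1)
```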
Exercise 9.5. Let $P$ be the probability measure on $\Omega:=\mathbb{R}^{d}$ defined by
\[
dP(x):=\left(\frac{1}{2\pi}\right)^{d/2}e^{-\frac{1}{2}x\cdot x}\,dx=\prod_{i=1}^{d}\left(\frac{1}{\sqrt{2\pi}}e^{-x_{i}^{2}/2}\,dx_{i}\right).
\]
Show that $N:\Omega\to\mathbb{R}^{d}$ defined by $N(x)=x$ is Gaussian and satisfies Eq. (9.58) with $Q=I$ and $\mu=0$. Also show
\[
\mu_{i}=\mathbb{E}N_{i}\quad\text{and}\quad\delta_{ij}=\operatorname{Cov}\left(N_{i},N_{j}\right)\quad\text{for all }1\le i,j\le d. \tag{9.59}
\]
Hint: use Exercise 7.15 and (of course) Fubini's theorem.
Exercise 9.6. Let $A$ be any real $m\times d$ matrix and $\alpha\in\mathbb{R}^{m}$ and set $X:=AN+\alpha$ where $\Omega=\mathbb{R}^{d}$, $P$, and $N$ are as in Exercise 9.5. Show that $X$ is Gaussian by showing Eq. (9.58) holds with $Q=AA^{\operatorname{tr}}$ ($A^{\operatorname{tr}}$ is the transpose of the matrix $A$) and $\mu=\alpha$. Also show that
\[
\mu_{i}=\mathbb{E}X_{i}\quad\text{and}\quad Q_{ij}=\operatorname{Cov}\left(X_{i},X_{j}\right)\quad\text{for all }1\le i,j\le m. \tag{9.60}
\]
Remark 9.37 (Spectral Theorem). Recall that if $Q$ is a real symmetric $d\times d$ matrix, then the spectral theorem asserts there exists an orthonormal basis, $\{u_{j}\}_{j=1}^{d}$, such that $Qu_{j}=\lambda_{j}u_{j}$ for some $\lambda_{j}\in\mathbb{R}$. Moreover, $\lambda_{j}\ge0$ for all $j$ is equivalent to $Q$ being non-negative. When $Q\ge0$ we may define $Q^{1/2}$ by
\[
Q^{1/2}u_{j}:=\sqrt{\lambda_{j}}\,u_{j}\quad\text{for }1\le j\le d.
\]
Notice that $Q^{1/2}\ge0$, $Q=\left(Q^{1/2}\right)^{2}$, and $Q^{1/2}$ is still symmetric. If $Q$ is positive definite, we may also define $Q^{-1/2}$ by
\[
Q^{-1/2}u_{j}:=\frac{1}{\sqrt{\lambda_{j}}}\,u_{j}\quad\text{for }1\le j\le d
\]
so that $Q^{-1/2}=\left[Q^{1/2}\right]^{-1}$.
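The spectral construction of $Q^{1/2}$ in Remark 9.37 translates directly into code; a minimal sketch (the helper name `sqrtm_sym` is ours, assuming NumPy is available):

```python
import numpy as np

def sqrtm_sym(Q):
    """Q^{1/2} for a symmetric non-negative matrix Q via the spectral theorem."""
    lam, U = np.linalg.eigh(Q)        # Q = U diag(lam) U^T with orthonormal U
    lam = np.clip(lam, 0.0, None)     # guard against tiny negative round-off
    return U @ np.diag(np.sqrt(lam)) @ U.T

Q = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
R = sqrtm_sym(Q)

assert np.allclose(R, R.T)                          # Q^{1/2} is symmetric
assert np.allclose(R @ R, Q)                        # (Q^{1/2})^2 = Q
assert (np.linalg.eigvalsh(R) >= -1e-12).all()      # Q^{1/2} >= 0
```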
Exercise 9.7. Suppose that $Q$ is a positive definite (for simplicity) $d\times d$ real matrix and $\mu\in\mathbb{R}^{d}$ and let $\Omega=\mathbb{R}^{d}$, $P$, and $N$ be as in Exercise 9.5. By Exercise 9.6 we know that $X=Q^{1/2}N+\mu$ is a Gaussian random vector satisfying Eq. (9.58). Use the multi-dimensional change of variables formula to show
\[
\operatorname{Law}_{P}(X)(dy)=\frac{1}{\sqrt{\det(2\pi Q)}}\exp\left(-\frac{1}{2}Q^{-1}(y-\mu)\cdot(y-\mu)\right)dy.
\]
Let us summarize some of what the preceding exercises have shown.

Theorem 9.38. To each positive definite $d\times d$ real symmetric matrix $Q$ and $\mu\in\mathbb{R}^{d}$ there exist Gaussian random vectors, $X$, satisfying Eq. (9.58). Moreover for such an $X$,
\[
\operatorname{Law}_{P}(X)(dy)=\frac{1}{\sqrt{\det(2\pi Q)}}\exp\left(-\frac{1}{2}Q^{-1}(y-\mu)\cdot(y-\mu)\right)dy
\]
where $Q$ and $\mu$ may be computed from $X$ using
\[
\mu_{i}=\mathbb{E}X_{i}\quad\text{and}\quad Q_{ij}=\operatorname{Cov}\left(X_{i},X_{j}\right)\quad\text{for all }1\le i,j\le d. \tag{9.61}
\]
When $Q$ is degenerate, i.e. $\operatorname{Nul}(Q)\ne\{0\}$, then $X=Q^{1/2}N+\mu$ is still a Gaussian random vector satisfying Eq. (9.58). However now $\operatorname{Law}_{P}(X)$ is a measure on $\mathbb{R}^{d}$ which is concentrated on the non-trivial subspace, $\operatorname{Nul}(Q)^{\perp}$ -- the details of this are left to the reader for now.
Exercise 9.8 (Gaussian random vectors are highly integrable). Suppose that $X:\Omega\to\mathbb{R}^{d}$ is a Gaussian random vector, say $X\stackrel{d}{=}N(Q,\mu)$. Let $\|x\|:=\sqrt{x\cdot x}$ and $m:=\max\left\{Qx\cdot x:\|x\|=1\right\}$ be the largest eigenvalue$^{3}$ of $Q$. Then $\mathbb{E}\left[e^{\varepsilon\|X\|^{2}}\right]<\infty$ for every $\varepsilon<\frac{1}{2m}$.

$^{3}$ For those who know about operator norms observe that $m=\|Q\|$ in this case.
Because of Eq. (9.61), for all $\lambda\in\mathbb{R}^{d}$ we have
\[
\mu\cdot\lambda=\sum_{i=1}^{d}\mathbb{E}X_{i}\,\lambda_{i}=\mathbb{E}\left(\lambda\cdot X\right)
\]
and
\begin{align*}
Q\lambda\cdot\lambda&=\sum_{i,j}Q_{ij}\lambda_{i}\lambda_{j}=\sum_{i,j}\lambda_{i}\lambda_{j}\operatorname{Cov}\left(X_{i},X_{j}\right)\\
&=\operatorname{Cov}\left(\sum_{i}\lambda_{i}X_{i},\sum_{j}\lambda_{j}X_{j}\right)=\operatorname{Var}\left(\lambda\cdot X\right).
\end{align*}
Therefore we may reformulate the definition of a Gaussian random vector as follows.
Definition 9.39 (Gaussian Random Vectors). Let $(\Omega,\mathcal{B},P)$ be a probability space. A random vector, $X:\Omega\to\mathbb{R}^{d}$, is Gaussian iff for all $\lambda\in\mathbb{R}^{d}$,
\[
\mathbb{E}\left[e^{i\lambda\cdot X}\right]=\exp\left(-\frac{1}{2}\operatorname{Var}\left(\lambda\cdot X\right)+i\mathbb{E}\left(\lambda\cdot X\right)\right). \tag{9.62}
\]
In short, $X$ is a Gaussian random vector iff $\lambda\cdot X$ is a Gaussian random variable for all $\lambda\in\mathbb{R}^{d}$.
Remark 9.40. To conclude that a random vector, $X:\Omega\to\mathbb{R}^{d}$, is Gaussian it is not enough to check that each of its components, $\{X_{i}\}_{i=1}^{d}$, are Gaussian random variables. The following simple counterexample was provided by Nate Eldredge. Let $(X,Y):\Omega\to\mathbb{R}^{2}$ be a random vector such that $(X,Y)_{*}P=\mu\otimes\nu$ where $d\mu(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^{2}}dx$ and $\nu=\frac{1}{2}\left(\delta_{1}+\delta_{-1}\right)$. Then $(X,YX):\Omega\to\mathbb{R}^{2}$ is a random vector such that both components, $X$ and $YX$, are Gaussian random variables but $(X,YX)$ is not a Gaussian random vector.
Exercise 9.9. Prove the assertion made in Remark 9.40. Hint: explicitly compute $\mathbb{E}\left[e^{i(\lambda_{1}X+\lambda_{2}XY)}\right]$.
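The counterexample of Remark 9.40 is easy to see by simulation; a sketch (sample size and seed are arbitrary choices of ours, assuming NumPy is available). Both marginals pass as standard normals, yet the linear combination $X+YX$ vanishes on $\{Y=-1\}$ and so has an atom at $0$, which no Gaussian random variable other than $0$ can have:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

X = rng.standard_normal(n)
Y = rng.choice([-1.0, 1.0], size=n)   # Y independent of X, P(Y = +/-1) = 1/2

# Each component is (approximately) standard normal; YX ~ N(0,1) by symmetry.
for Z in (X, Y * X):
    assert abs(Z.mean()) < 0.02 and abs(Z.var() - 1.0) < 0.02

# But (X, YX) is not jointly Gaussian: X + YX is identically 0 on {Y = -1},
# so roughly half the samples hit 0 exactly.
W = X + Y * X
assert abs((W == 0.0).mean() - 0.5) < 0.01
```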
9.8.1 *Gaussian measures with possibly degenerate covariances
The main aim of this subsection is to explicitly describe Gaussian measures
with possibly degenerate covariances, Q. The case where Q > 0 has already
been done in Theorem 9.38.
Remark 9.41. Recall that if $Q$ is a real symmetric $N\times N$ matrix, then the spectral theorem asserts there exists an orthonormal basis, $\{u_{j}\}_{j=1}^{N}$, such that $Qu_{j}=\lambda_{j}u_{j}$ for some $\lambda_{j}\in\mathbb{R}$. Moreover, $\lambda_{j}\ge0$ for all $j$ is equivalent to $Q$ being non-negative. Hence if $Q\ge0$ and $f:\{\lambda_{j}:j=1,2,\dots,N\}\to\mathbb{R}$, we may define $f(Q)$ to be the unique linear transformation on $\mathbb{R}^{N}$ such that $f(Q)u_{j}=f(\lambda_{j})u_{j}$.
Example 9.42. When $Q\ge0$ and $f(x):=\sqrt{x}$, we write $Q^{1/2}$ or $\sqrt{Q}$ for $f(Q)$. Notice that $Q^{1/2}\ge0$ and $Q=Q^{1/2}Q^{1/2}$.
Example 9.43. When $Q$ is symmetric and
\[
f(x)=\begin{cases}1/x&\text{if }x\ne0\\0&\text{if }x=0\end{cases}
\]
we will denote $f(Q)$ by $Q^{-1}$. As the notation suggests, $f(Q)$ is the inverse of $Q$ when $Q$ is invertible, which happens iff $\lambda_{i}\ne0$ for all $i$. When $Q$ is not invertible,
\[
Q^{-1}:=f(Q)=Q|_{\operatorname{Ran}(Q)}^{-1}P, \tag{9.63}
\]
where $P:\mathbb{R}^{N}\to\mathbb{R}^{N}$ is orthogonal projection onto $\operatorname{Ran}(Q)$. Observe that $P=g(Q)$ where $g(x)=1_{x\ne0}$.
Lemma 9.44. For any $Q\ge0$ we can find a matrix, $A$, such that $Q=AA^{\operatorname{tr}}$. In fact it suffices to take $A=Q^{1/2}$.
Proposition 9.45. Suppose $X\stackrel{d}{=}N(Q,c)$ (see Definition 9.35) where $c\in\mathbb{R}^{N}$ and $Q$ is a positive semi-definite $N\times N$ real matrix. If $\mu=\mu_{(Q,c)}=P\circ X^{-1}$, then
\[
\int_{\mathbb{R}^{N}}f(x)\,d\mu(x)=\frac{1}{Z}\int_{c+\operatorname{Ran}(Q)}f(x)\exp\left(-\frac{1}{2}Q^{-1}(x-c)\cdot(x-c)\right)dx
\]
where $dx$ is now Lebesgue measure on $c+\operatorname{Ran}(Q)$, $Q^{-1}$ is defined as in Eq. (9.63), and $Z:=\sqrt{\det\left(2\pi Q|_{\operatorname{Ran}(Q)}\right)}$.
Proof. Let $k=\dim\operatorname{Ran}(Q)$ and choose a linear transformation, $U:\mathbb{R}^{k}\to\mathbb{R}^{N}$, such that $\operatorname{Ran}(U)=\operatorname{Ran}(Q)$ and $U:\mathbb{R}^{k}\to\operatorname{Ran}(Q)$ is an isometric isomorphism. Letting $A:=Q^{1/2}U$, we have
\[
AA^{\operatorname{tr}}=Q^{1/2}UU^{\operatorname{tr}}Q^{1/2}=Q^{1/2}P_{\operatorname{Ran}(Q)}Q^{1/2}=Q.
\]
Therefore, if $Y\stackrel{d}{=}N(I_{k\times k},0)$, then $X=AY+c\stackrel{d}{=}N(Q,c)$ by Lemma 9.36. Observe that $X-c=Q^{1/2}UY$ takes values in $\operatorname{Ran}(Q)$ and hence the law of $X-c$ is a probability measure on $\mathbb{R}^{N}$ which is concentrated on $\operatorname{Ran}(Q)$. From this it follows that $\mu=P\circ X^{-1}$ is a probability measure on $\mathbb{R}^{N}$ which is concentrated on the affine space, $c+\operatorname{Ran}(Q)$. At any rate from Theorem 9.38 we have
Theorem 9.38 we have
_
R
N
f (x) d(x) =
_
R
k
f (Ay +c)
_
1
2
_
k/2
e

1
2
]y]
2
dy
=
_
R
k
f
_
Q
1/2
Uy +c
_
_
1
2
_
k/2
e

1
2
]y]
2
dy.
Since
Q
1/2
Uy +c = UU
tr
Q
1/2
Uy +c,
we may make the change of variables, z = U
tr
Q
1/2
Uy, using
dz =
_
det Q[
Ran(Q)
dy =


i:i,=0

i
dy
and
[y[
2
=

_
U
tr
Q
1/2
U
_
1
z

2
=

U
tr
Q
1/2
Uz

2
=
_
Q
1/2
Uz, Q
1/2
Uz
_
R
N
=
_
Q
1
Uz, Uz
_
R
N
,
to nd
\begin{align*}
\int_{\mathbb{R}^{N}}f(x)\,d\mu(x)&=\int_{\mathbb{R}^{k}}f(Uz+c)\left(\frac{1}{2\pi}\right)^{k/2}e^{-\frac{1}{2}\left\|\left(U^{\operatorname{tr}}Q^{1/2}U\right)^{-1}z\right\|^{2}}\frac{dz}{\sqrt{\det Q|_{\operatorname{Ran}(Q)}}}\\
&=\int_{\mathbb{R}^{k}}f(Uz+c)\left(\frac{1}{2\pi}\right)^{k/2}\frac{1}{\sqrt{\det Q|_{\operatorname{Ran}(Q)}}}e^{-\frac{1}{2}\left(Q^{-1}Uz,Uz\right)_{\mathbb{R}^{N}}}dz\\
&=\int_{\mathbb{R}^{k}}f(Uz+c)\frac{1}{\sqrt{\det\left(2\pi Q|_{\operatorname{Ran}(Q)}\right)}}e^{-\frac{1}{2}\left(Q^{-1}Uz,Uz\right)_{\mathbb{R}^{N}}}dz\\
&=\int_{\mathbb{R}^{k}}f(Uz+c)\frac{1}{\sqrt{\det\left(2\pi Q|_{\operatorname{Ran}(Q)}\right)}}e^{-\frac{1}{2}\left(Q^{-1}(Uz+c-c),(Uz+c-c)\right)_{\mathbb{R}^{N}}}dz.
\end{align*}
This completes the proof, since $x=Uz+c\in c+\operatorname{Ran}(Q)$ is by definition distributed as Lebesgue measure on $c+\operatorname{Ran}(Q)$ when $z$ is distributed as Lebesgue measure on $\mathbb{R}^{k}$.
9.9 Kolmogorov's Extension Theorems

In this section we will extend the results of Section 5.5 to spaces which are not simply products of discrete spaces. We begin with a couple of results involving the topology on $\mathbb{R}^{N}$.
9.9.1 Regularity and compactness results

Theorem 9.46 (Inner-Outer Regularity). Suppose $\mu$ is a probability measure on $\left(\mathbb{R}^{N},\mathcal{B}_{\mathbb{R}^{N}}\right)$, then for all $B\in\mathcal{B}_{\mathbb{R}^{N}}$ we have
\[
\mu(B)=\inf\left\{\mu(V):B\subset V\text{ and }V\text{ is open}\right\} \tag{9.64}
\]
and
\[
\mu(B)=\sup\left\{\mu(K):K\subset B\text{ with }K\text{ compact}\right\}. \tag{9.65}
\]
Proof. In this proof, $C$ and $C_{i}$ will always denote closed subsets of $\mathbb{R}^{N}$ and $V$, $V_{i}$ will always be open subsets of $\mathbb{R}^{N}$. Let $\mathcal{F}$ be the collection of sets, $A\in\mathcal{B}$, such that for all $\varepsilon>0$ there exists an open set $V$ and a closed set, $C$, such that $C\subset A\subset V$ and $\mu(V\setminus C)<\varepsilon$. The key point of the proof is to show $\mathcal{F}=\mathcal{B}$, for this certainly implies Equation (9.64) and also that
\[
\mu(B)=\sup\left\{\mu(C):C\subset B\text{ with }C\text{ closed}\right\}. \tag{9.66}
\]
Moreover, by MCT, we know that if $C$ is closed and $K_{n}:=C\cap\left\{x\in\mathbb{R}^{N}:|x|\le n\right\}$, then $\mu(K_{n})\uparrow\mu(C)$. This observation along with Eq. (9.66) shows Eq. (9.65) is valid as well.

To prove $\mathcal{F}=\mathcal{B}$, it suffices to show $\mathcal{F}$ is a $\sigma$-algebra which contains all closed subsets of $\mathbb{R}^{N}$. To prove the latter assertion, given a closed subset, $C\subset\mathbb{R}^{N}$, and $\varepsilon>0$, let
\[
C_{\varepsilon}:=\bigcup_{x\in C}B(x,\varepsilon)
\]
where $B(x,\varepsilon):=\left\{y\in\mathbb{R}^{N}:|y-x|<\varepsilon\right\}$. Then $C_{\varepsilon}$ is an open set and $C_{\varepsilon}\downarrow C$ as $\varepsilon\downarrow0$. (You prove.) Hence by the DCT, we know that $\mu(C_{\varepsilon}\setminus C)\downarrow0$ from which it follows that $C\in\mathcal{F}$.
We will now show that $\mathcal{F}$ is an algebra. Clearly $\mathcal{F}$ contains the empty set and if $A\in\mathcal{F}$ with $C\subset A\subset V$ and $\mu(V\setminus C)<\varepsilon$, then $V^{c}\subset A^{c}\subset C^{c}$ with $\mu(C^{c}\setminus V^{c})=\mu(V\setminus C)<\varepsilon$. This shows $A^{c}\in\mathcal{F}$. Similarly if $A_{i}\in\mathcal{F}$ for $i=1,2$ and $C_{i}\subset A_{i}\subset V_{i}$ with $\mu(V_{i}\setminus C_{i})<\varepsilon$, then
\[
C:=C_{1}\cup C_{2}\subset A_{1}\cup A_{2}\subset V_{1}\cup V_{2}=:V
\]
and
\begin{align*}
\mu(V\setminus C)&\le\mu(V_{1}\setminus C)+\mu(V_{2}\setminus C)\\
&\le\mu(V_{1}\setminus C_{1})+\mu(V_{2}\setminus C_{2})<2\varepsilon.
\end{align*}
This implies that $A_{1}\cup A_{2}\in\mathcal{F}$ and we have shown $\mathcal{F}$ is an algebra.
We now show that $\mathcal{F}$ is a $\sigma$-algebra. To do this it suffices to show $A:=\bigcup_{n=1}^{\infty}A_{n}\in\mathcal{F}$ if $A_{n}\in\mathcal{F}$ with $A_{n}\cap A_{m}=\emptyset$ for $m\ne n$. Let $C_{n}\subset A_{n}\subset V_{n}$ with $\mu(V_{n}\setminus C_{n})<\varepsilon2^{-n}$ for all $n$ and let $C^{N}:=\bigcup_{n\le N}C_{n}$ and $V:=\bigcup_{n=1}^{\infty}V_{n}$. Then $C^{N}\subset A\subset V$ and
\begin{align*}
\mu\left(V\setminus C^{N}\right)&\le\sum_{n=1}^{\infty}\mu\left(V_{n}\setminus C^{N}\right)\le\sum_{n=1}^{N}\mu\left(V_{n}\setminus C_{n}\right)+\sum_{n=N+1}^{\infty}\mu(V_{n})\\
&\le\sum_{n=1}^{N}\varepsilon2^{-n}+\sum_{n=N+1}^{\infty}\left[\mu(A_{n})+\varepsilon2^{-n}\right]\le\varepsilon+\sum_{n=N+1}^{\infty}\mu(A_{n}).
\end{align*}
The last term is less than $2\varepsilon$ for $N$ sufficiently large because $\sum_{n=1}^{\infty}\mu(A_{n})=\mu(A)<\infty$.
Notation 9.47 Let $I:=[0,1]$, $Q:=I^{\mathbb{N}}$, $\pi_{j}:Q\to I$ be the projection map, $\pi_{j}(x)=x_{j}$ (where $x=(x_{1},x_{2},\dots,x_{j},\dots)$) for all $j\in\mathbb{N}$, and $\mathcal{B}_{Q}:=\sigma(\pi_{j}:j\in\mathbb{N})$ be the product $\sigma$-algebra on $Q$. Let us further say that a sequence $\{x(m)\}_{m=1}^{\infty}\subset Q$, where $x(m)=(x_{1}(m),x_{2}(m),\dots)$, converges to $x\in Q$ iff $\lim_{m\to\infty}x_{j}(m)=x_{j}$ for all $j\in\mathbb{N}$. (This is just pointwise convergence.)
Lemma 9.48 (Baby Tychonoff's Theorem). The infinite dimensional cube, $Q$, is compact, i.e. every sequence $\{x(m)\}_{m=1}^{\infty}\subset Q$ has a convergent subsequence, $\{x(m_{k})\}_{k=1}^{\infty}$.

Proof. Since $I$ is compact, it follows that for each $j\in\mathbb{N}$, $\{x_{j}(m)\}_{m=1}^{\infty}$ has a convergent subsequence. It now follows by Cantor's diagonalization method that there is a subsequence, $\{m_{k}\}_{k=1}^{\infty}$, of $\mathbb{N}$ such that $\lim_{k\to\infty}x_{j}(m_{k})\in I$ exists for all $j\in\mathbb{N}$.
Corollary 9.49 (Finite Intersection Property). Suppose that $K_{m}\subset Q$ are sets which are, (i) closed under taking sequential limits$^{4}$, and (ii) have the finite intersection property (i.e. $\bigcap_{m=1}^{n}K_{m}\ne\emptyset$ for all $n\in\mathbb{N}$), then $\bigcap_{m=1}^{\infty}K_{m}\ne\emptyset$.

Proof. By assumption, for each $n\in\mathbb{N}$, there exists $x(n)\in\bigcap_{m=1}^{n}K_{m}$. Hence by Lemma 9.48 there exists a subsequence, $x(n_{k})$, such that $x:=\lim_{k\to\infty}x(n_{k})$ exists in $Q$. Since $x(n_{k})\in\bigcap_{m=1}^{n}K_{m}$ for all $k$ large, and each $K_{m}$ is closed under sequential limits, it follows that $x\in K_{m}$ for all $m$. Thus we have shown $x\in\bigcap_{m=1}^{\infty}K_{m}$ and hence $\bigcap_{m=1}^{\infty}K_{m}\ne\emptyset$.
9.9.2 Kolmogorov's Extension Theorem and Infinite Product Measures

Theorem 9.50 (Kolmogorov's Extension Theorem). Let $I:=[0,1]$. For each $n\in\mathbb{N}$, let $\mu_{n}$ be a probability measure on $(I^{n},\mathcal{B}_{I^{n}})$ such that $\mu_{n+1}(A\times I)=\mu_{n}(A)$. Then there exists a unique measure, $P$, on $(Q,\mathcal{B}_{Q})$ such that
\[
P(A\times Q)=\mu_{n}(A) \tag{9.67}
\]
for all $A\in\mathcal{B}_{I^{n}}$ and $n\in\mathbb{N}$.

$^{4}$ For example, if $K_{m}=K'_{m}\times Q$ with $K'_{m}$ being a closed subset of $I^{m}$, then $K_{m}$ is closed under sequential limits.
Proof. Let $\mathcal{A}:=\bigcup\mathcal{B}_{n}$ where $\mathcal{B}_{n}:=\{A\times Q:A\in\mathcal{B}_{I^{n}}\}=\sigma(\pi_{1},\dots,\pi_{n})$, where $\pi_{i}(x)=x_{i}$ if $x=(x_{1},x_{2},\dots)\in Q$. Then define $P$ on $\mathcal{A}$ by Eq. (9.67), which is easily seen (Exercise 9.10) to be a well defined finitely additive measure on $\mathcal{A}$. So to finish the proof it suffices to show: if $B_{n}\in\mathcal{A}$ is a decreasing sequence such that
\[
\inf_{n}P(B_{n})=\lim_{n\to\infty}P(B_{n})=\varepsilon>0,
\]
then $B:=\bigcap B_{n}\ne\emptyset$.
To simplify notation, we may reduce to the case where $B_{n}\in\mathcal{B}_{n}$ for all $n$. To see this is permissible, let us choose $1\le n_{1}<n_{2}<n_{3}<\dots$ such that $B_{k}\in\mathcal{B}_{n_{k}}$ for all $k$. (This is possible since $\mathcal{B}_{n}$ is increasing in $n$.) We now define a new decreasing sequence of sets, $\{\tilde B_{k}\}_{k=1}^{\infty}$, as follows,
\[
\left(\tilde B_{1},\tilde B_{2},\dots\right)=\Big(\underbrace{Q,\dots,Q}_{n_{1}-1\text{ times}},\underbrace{B_{1},\dots,B_{1}}_{n_{2}-n_{1}\text{ times}},\underbrace{B_{2},\dots,B_{2}}_{n_{3}-n_{2}\text{ times}},\underbrace{B_{3},\dots,B_{3}}_{n_{4}-n_{3}\text{ times}},\dots\Big).
\]
We then have $\tilde B_{n}\in\mathcal{B}_{n}$ for all $n$, $\lim_{n\to\infty}P\left(\tilde B_{n}\right)=\varepsilon>0$, and $B=\bigcap_{n=1}^{\infty}\tilde B_{n}$. Hence we may replace $B_{n}$ by $\tilde B_{n}$ if necessary so as to have $B_{n}\in\mathcal{B}_{n}$ for all $n$.
Since $B_{n}\in\mathcal{B}_{n}$, there exists $B'_{n}\in\mathcal{B}_{I^{n}}$ such that $B_{n}=B'_{n}\times Q$ for all $n$. Using the regularity Theorem 9.46, there are compact sets, $K'_{n}\subset B'_{n}\subset I^{n}$, such that $\mu_{n}\left(B'_{n}\setminus K'_{n}\right)\le\varepsilon2^{-n-1}$ for all $n\in\mathbb{N}$. Let $K_{n}:=K'_{n}\times Q$, then $P(B_{n}\setminus K_{n})\le\varepsilon2^{-n-1}$ for all $n$. Moreover,
\begin{align*}
P\left(B_{n}\setminus\left[\cap_{m=1}^{n}K_{m}\right]\right)&=P\left(\cup_{m=1}^{n}\left[B_{n}\setminus K_{m}\right]\right)\le\sum_{m=1}^{n}P(B_{n}\setminus K_{m})\\
&\le\sum_{m=1}^{n}P(B_{m}\setminus K_{m})\le\sum_{m=1}^{n}\varepsilon2^{-m-1}\le\varepsilon/2.
\end{align*}
So, for all $n\in\mathbb{N}$,
\[
P\left(\cap_{m=1}^{n}K_{m}\right)=P(B_{n})-P\left(B_{n}\setminus\left[\cap_{m=1}^{n}K_{m}\right]\right)\ge\varepsilon-\varepsilon/2=\varepsilon/2,
\]
and in particular, $\cap_{m=1}^{n}K_{m}\ne\emptyset$. An application of Corollary 9.49 now implies, $\emptyset\ne\cap_{n}K_{n}\subset\cap_{n}B_{n}$.
Exercise 9.10. Show that Eq. (9.67) defines a well defined finitely additive measure on $\mathcal{A}:=\bigcup\mathcal{B}_{n}$.
The next result is an easy corollary of Theorem 9.50.

Theorem 9.51. Suppose $\{(X_{n},\mathcal{M}_{n})\}_{n\in\mathbb{N}}$ are standard Borel spaces (see Appendix 9.10 below), $X:=\prod_{n\in\mathbb{N}}X_{n}$, $\pi_{n}:X\to X_{n}$ be the $n^{\mathrm{th}}$ projection map, $\mathcal{B}_{n}:=\sigma(\pi_{k}:k\le n)$, $\mathcal{B}:=\sigma(\pi_{n}:n\in\mathbb{N})$, and $T_{n}:=X_{n+1}\times X_{n+2}\times\cdots$. Further suppose that for each $n\in\mathbb{N}$ we are given a probability measure, $\mu_{n}$, on $\mathcal{M}_{1}\otimes\cdots\otimes\mathcal{M}_{n}$ such that
\[
\mu_{n+1}\left(A\times X_{n+1}\right)=\mu_{n}(A)\quad\text{for all }n\in\mathbb{N}\text{ and }A\in\mathcal{M}_{1}\otimes\cdots\otimes\mathcal{M}_{n}.
\]
Then there exists a unique probability measure, $P$, on $(X,\mathcal{B})$ such that
\[
P(A\times T_{n})=\mu_{n}(A)\quad\text{for all }A\in\mathcal{M}_{1}\otimes\cdots\otimes\mathcal{M}_{n}.
\]
Proof. Since each $(X_{n},\mathcal{M}_{n})$ is measure theoretically isomorphic to a Borel subset of $I$, we may assume that $X_{n}\in\mathcal{B}_{I}$ and $\mathcal{M}_{n}=(\mathcal{B}_{I})_{X_{n}}$ for all $n$. Given $A\in\mathcal{B}_{I^{n}}$, let $\bar\mu_{n}(A):=\mu_{n}\left(A\cap[X_{1}\times\cdots\times X_{n}]\right)$, a probability measure on $\mathcal{B}_{I^{n}}$. Furthermore,
\begin{align*}
\bar\mu_{n+1}(A\times I)&=\mu_{n+1}\left([A\times I]\cap[X_{1}\times\cdots\times X_{n+1}]\right)\\
&=\mu_{n+1}\left(\left(A\cap[X_{1}\times\cdots\times X_{n}]\right)\times X_{n+1}\right)\\
&=\mu_{n}\left(A\cap[X_{1}\times\cdots\times X_{n}]\right)=\bar\mu_{n}(A).
\end{align*}
Hence by Theorem 9.50, there is a unique probability measure, $\bar P$, on $I^{\mathbb{N}}$ such that
\[
\bar P\left(A\times I^{\mathbb{N}}\right)=\bar\mu_{n}(A)\quad\text{for all }n\in\mathbb{N}\text{ and }A\in\mathcal{B}_{I^{n}}.
\]
We will now check that $P:=\bar P|_{X}$, the restriction of $\bar P$ to subsets of $X$, is the desired measure. First off we have
\begin{align*}
\bar P(X)&=\lim_{n\to\infty}\bar P\left(X_{1}\times\cdots\times X_{n}\times I^{\mathbb{N}}\right)=\lim_{n\to\infty}\bar\mu_{n}\left(X_{1}\times\cdots\times X_{n}\right)\\
&=\lim_{n\to\infty}\mu_{n}\left(X_{1}\times\cdots\times X_{n}\right)=1.
\end{align*}
Secondly, if $A\in\mathcal{M}_{1}\otimes\cdots\otimes\mathcal{M}_{n}$, we have
\[
P(A\times T_{n})=\bar P(A\times T_{n})=\bar P\left(\left[A\times I^{\mathbb{N}}\right]\cap X\right)=\bar P\left(A\times I^{\mathbb{N}}\right)=\bar\mu_{n}(A)=\mu_{n}(A).
\]
Here is an example of this theorem in action.

Theorem 9.52 (Infinite Product Measures). Suppose that $\{\nu_{n}\}_{n=1}^{\infty}$ is a sequence of probability measures on $(\mathbb{R},\mathcal{B}_{\mathbb{R}})$ and $\mathcal{B}:=\otimes_{n\in\mathbb{N}}\mathcal{B}_{\mathbb{R}}$ is the product $\sigma$-algebra on $\mathbb{R}^{\mathbb{N}}$. Then there exists a unique probability measure, $\nu$, on $\left(\mathbb{R}^{\mathbb{N}},\mathcal{B}\right)$, such that
\[
\nu\left(A_{1}\times A_{2}\times\cdots\times A_{n}\times\mathbb{R}^{\mathbb{N}}\right)=\nu_{1}(A_{1})\cdots\nu_{n}(A_{n})\quad\text{for all }A_{i}\in\mathcal{B}_{\mathbb{R}}\text{ and }n\in\mathbb{N}. \tag{9.68}
\]
Moreover, this measure satisfies
\[
\int_{\mathbb{R}^{\mathbb{N}}}f(x_{1},\dots,x_{n})\,d\nu(x)=\int_{\mathbb{R}^{n}}f(x_{1},\dots,x_{n})\,d\nu_{1}(x_{1})\cdots d\nu_{n}(x_{n}) \tag{9.69}
\]
for all $n\in\mathbb{N}$ and $f:\mathbb{R}^{n}\to\mathbb{R}$ which are bounded and measurable or non-negative and measurable.
Proof. The measure $\nu$ is created by applying Theorem 9.51 with $\mu_{n}:=\nu_{1}\otimes\cdots\otimes\nu_{n}$ on $\left(\mathbb{R}^{n},\mathcal{B}_{\mathbb{R}^{n}}=\otimes_{k=1}^{n}\mathcal{B}_{\mathbb{R}}\right)$ for each $n\in\mathbb{N}$. Observe that
\[
\mu_{n+1}(A\times\mathbb{R})=\mu_{n}(A)\,\nu_{n+1}(\mathbb{R})=\mu_{n}(A),
\]
so that $\{\mu_{n}\}_{n=1}^{\infty}$ satisfies the needed consistency conditions. Thus there exists a unique measure $\nu$ on $\left(\mathbb{R}^{\mathbb{N}},\mathcal{B}\right)$ such that
\[
\nu\left(A\times\mathbb{R}^{\mathbb{N}}\right)=\mu_{n}(A)\quad\text{for all }A\in\mathcal{B}_{\mathbb{R}^{n}}\text{ and }n\in\mathbb{N}.
\]
Taking $A=A_{1}\times A_{2}\times\cdots\times A_{n}$ with $A_{i}\in\mathcal{B}_{\mathbb{R}}$ then gives Eq. (9.68). For this measure, it follows that Eq. (9.69) holds when $f=1_{A_{1}\times\cdots\times A_{n}}$. Thus by an application of Theorem 8.2, with $\mathbb{M}=\left\{1_{A_{1}\times\cdots\times A_{n}}:A_{i}\in\mathcal{B}_{\mathbb{R}}\right\}$ and $\mathbb{H}$ being the set of bounded measurable functions, $f:\mathbb{R}^{n}\to\mathbb{R}$, for which Eq. (9.69) holds, it follows that Eq. (9.69) holds for all bounded and measurable functions, $f:\mathbb{R}^{n}\to\mathbb{R}$. The statement involving non-negative functions follows by a simple limiting argument involving the MCT.

It turns out that the existence of infinite product measures requires no topological restrictions on the measure spaces involved. See Corollary 17.57 below.
9.10 Appendix: Standard Borel Spaces*

For more information along the lines of this section, see Royden [58] and Parthasarathy [46].

Definition 9.53. Two measurable spaces, $(X,\mathcal{M})$ and $(Y,\mathcal{N})$, are said to be isomorphic if there exists a bijective map, $f:X\to Y$, such that $f(\mathcal{M})=\mathcal{N}$ and $f^{-1}(\mathcal{N})=\mathcal{M}$, i.e. both $f$ and $f^{-1}$ are measurable. In this case we say $f$ is a measure theoretic isomorphism and we will write $X\cong Y$.

Definition 9.54. A measurable space, $(X,\mathcal{M})$, is said to be a standard Borel space if $(X,\mathcal{M})\cong(B,\mathcal{B}_{B})$ where $B$ is a Borel subset of $\left((0,1),\mathcal{B}_{(0,1)}\right)$.

Definition 9.55 (Polish spaces). A Polish space is a separable topological space $(X,\tau)$ which admits a complete metric, $\rho$, such that $\tau=\tau_{\rho}$.
The main goal of this chapter is to prove every Borel subset of a Polish space is a standard Borel space, see Corollary 9.65 below. Along the way we will show a number of spaces, including $[0,1]$, $(0,1]$, $[0,1]^{d}$, $\mathbb{R}^{d}$, $\{0,1\}^{\mathbb{N}}$, and $\mathbb{R}^{\mathbb{N}}$, are all (measure theoretically) isomorphic to $(0,1)$. Moreover we also will see that a countable product of standard Borel spaces is again a standard Borel space, see Corollary 9.62.

*On first reading, you may wish to skip the rest of this section.
Lemma 9.56. Suppose $(X,\mathcal{M})$ and $(Y,\mathcal{N})$ are measurable spaces such that $X=\sum_{n=1}^{\infty}X_{n}$, $Y=\sum_{n=1}^{\infty}Y_{n}$ (disjoint unions), with $X_{n}\in\mathcal{M}$ and $Y_{n}\in\mathcal{N}$. If $(X_{n},\mathcal{M}_{X_{n}})$ is isomorphic to $(Y_{n},\mathcal{N}_{Y_{n}})$ for all $n$, then $X\cong Y$. Moreover, if $(X_{n},\mathcal{M}_{n})$ and $(Y_{n},\mathcal{N}_{n})$ are isomorphic measure spaces, then $\left(X:=\prod_{n=1}^{\infty}X_{n},\ \otimes_{n=1}^{\infty}\mathcal{M}_{n}\right)$ and $\left(Y:=\prod_{n=1}^{\infty}Y_{n},\ \otimes_{n=1}^{\infty}\mathcal{N}_{n}\right)$ are isomorphic.
Proof. For each $n\in\mathbb{N}$, let $f_{n}:X_{n}\to Y_{n}$ be a measure theoretic isomorphism. Then define $f:X\to Y$ by $f=f_{n}$ on $X_{n}$. Clearly, $f:X\to Y$ is a bijection and if $B\in\mathcal{N}$, then
\[
f^{-1}(B)=\bigcup_{n=1}^{\infty}f^{-1}(B\cap Y_{n})=\bigcup_{n=1}^{\infty}f_{n}^{-1}(B\cap Y_{n})\in\mathcal{M}.
\]
This shows $f$ is measurable and by similar considerations, $f^{-1}$ is measurable as well. Therefore, $f:X\to Y$ is the desired measure theoretic isomorphism.

For the second assertion, let $f_{n}:X_{n}\to Y_{n}$ be a measure theoretic isomorphism for all $n\in\mathbb{N}$ and then define
\[
f(x)=\left(f_{1}(x_{1}),f_{2}(x_{2}),\dots\right)\quad\text{with }x=(x_{1},x_{2},\dots)\in X.
\]
Again it is clear that $f$ is bijective and measurable, since
\[
f^{-1}\left(\prod_{n=1}^{\infty}B_{n}\right)=\prod_{n=1}^{\infty}f_{n}^{-1}(B_{n})\in\otimes_{n=1}^{\infty}\mathcal{M}_{n}
\]
for all $B_{n}\in\mathcal{N}_{n}$ and $n\in\mathbb{N}$. Similar reasoning shows that $f^{-1}$ is measurable as well.
Proposition 9.57. Let $-\infty<a<b<\infty$. The following measurable spaces equipped with their Borel $\sigma$-algebras are all isomorphic: $(0,1)$, $[0,1]$, $(0,1]$, $[0,1)$, $(a,b)$, $[a,b]$, $(a,b]$, $[a,b)$, $\mathbb{R}$, and $(0,1)\cup\Lambda$ where $\Lambda$ is a finite or countable subset of $\mathbb{R}\setminus(0,1)$.

Proof. It is easy to see that any bounded open, closed, or half open interval is isomorphic to any other such interval using an affine transformation. Let us now show $(-1,1)\cong[-1,1]$. To prove this it suffices, by Lemma 9.56, to observe that
\[
(-1,1)=\{0\}\cup\bigcup_{n=0}^{\infty}\left([2^{-n-1},2^{-n})\cup(-2^{-n},-2^{-n-1}]\right)
\]
and
\[
[-1,1]=\{0\}\cup\bigcup_{n=0}^{\infty}\left((2^{-n-1},2^{-n}]\cup[-2^{-n},-2^{-n-1})\right).
\]
Similarly $(0,1)$ is isomorphic to $(0,1]$ because
\[
(0,1)=\bigcup_{n=0}^{\infty}[2^{-n-1},2^{-n})\quad\text{and}\quad(0,1]=\bigcup_{n=0}^{\infty}(2^{-n-1},2^{-n}].
\]
The assertion involving $\mathbb{R}$ can be proved using the bijection, $\tan:(-\pi/2,\pi/2)\to\mathbb{R}$.
If $\Lambda=\{1\}$, then by Lemma 9.56 and what we have already proved, $(0,1)\cup\{1\}=(0,1]\cong(0,1)$. Similarly if $N\in\mathbb{N}$ with $N\ge2$ and $\Lambda=\{2,\dots,N+1\}$, then
\[
(0,1)\cup\Lambda\cong(0,1]\cup\Lambda=\left(0,2^{-N+1}\right]\cup\left(\bigcup_{n=1}^{N-1}\left(2^{-n},2^{-n+1}\right]\right)\cup\Lambda
\]
while
\[
(0,1)=\left(0,2^{-N+1}\right)\cup\left(\bigcup_{n=1}^{N-1}\left(2^{-n},2^{-n+1}\right)\right)\cup\left\{2^{-n}:n=1,2,\dots,N\right\}
\]
and so again it follows from what we have proved and Lemma 9.56 that $(0,1)\cong(0,1)\cup\Lambda$. Finally if $\Lambda=\{2,3,4,\dots\}$ is a countable set, we can show $(0,1)\cong(0,1)\cup\Lambda$ with the aid of the identities,
\[
(0,1)=\left(\bigcup_{n=1}^{\infty}\left(2^{-n},2^{-n+1}\right)\right)\cup\left\{2^{-n}:n\in\mathbb{N}\right\}
\]
and
\[
(0,1)\cup\Lambda\cong(0,1]\cup\Lambda=\left(\bigcup_{n=1}^{\infty}\left(2^{-n},2^{-n+1}\right]\right)\cup\Lambda.
\]
Notation 9.58 Suppose $(X,\mathcal{M})$ is a measurable space and $A$ is a set. Let $\pi_{a}:X^{A}\to X$ denote the projection operator onto the $a^{\mathrm{th}}$ component of $X^{A}$ (i.e. $\pi_{a}(\omega)=\omega(a)$ for all $a\in A$) and let $\mathcal{M}^{\otimes A}:=\sigma(\pi_{a}:a\in A)$ be the product $\sigma$-algebra on $X^{A}$.
Lemma 9.59. If $\varphi:A\to B$ is a bijection of sets and $(X,\mathcal{M})$ is a measurable space, then $\left(X^{A},\mathcal{M}^{\otimes A}\right)\cong\left(X^{B},\mathcal{M}^{\otimes B}\right)$.

Proof. The map $f:X^{B}\to X^{A}$ defined by $f(\omega)=\omega\circ\varphi$ for all $\omega\in X^{B}$ is a bijection with $f^{-1}(\alpha)=\alpha\circ\varphi^{-1}$. If $a\in A$ and $\omega\in X^{B}$, we have
\[
\pi_{a}^{X^{A}}\circ f(\omega)=f(\omega)(a)=\omega(\varphi(a))=\pi_{\varphi(a)}^{X^{B}}(\omega),
\]
where $\pi_{a}^{X^{A}}$ and $\pi_{b}^{X^{B}}$ are the projection operators on $X^{A}$ and $X^{B}$ respectively. Thus $\pi_{a}^{X^{A}}\circ f=\pi_{\varphi(a)}^{X^{B}}$ for all $a\in A$, which shows $f$ is measurable. Similarly, $\pi_{b}^{X^{B}}\circ f^{-1}=\pi_{\varphi^{-1}(b)}^{X^{A}}$, showing $f^{-1}$ is measurable as well.
Proposition 9.60. Let $\Omega:=\{0,1\}^{\mathbb{N}}$, $\pi_{i}:\Omega\to\{0,1\}$ be projection onto the $i^{\mathrm{th}}$ component, and $\mathcal{B}:=\sigma(\pi_{1},\pi_{2},\dots)$ be the product $\sigma$-algebra on $\Omega$. Then $(\Omega,\mathcal{B})\cong\left((0,1),\mathcal{B}_{(0,1)}\right)$.
Proof. We will begin by using a specific binary digit expansion of a point $x\in[0,1)$ to construct a map from $[0,1)\to\Omega$. To this end, let $r_{1}(x)=x$, $\varepsilon_{1}(x):=1_{x\ge2^{-1}}$ and $r_{2}(x):=x-2^{-1}\varepsilon_{1}(x)\in[0,2^{-1})$, then let $\varepsilon_{2}:=1_{r_{2}\ge2^{-2}}$ and $r_{3}:=r_{2}-2^{-2}\varepsilon_{2}\in\left[0,2^{-2}\right)$. Working inductively, we construct $\{\varepsilon_{k}(x),r_{k}(x)\}_{k=1}^{\infty}$ such that $\varepsilon_{k}(x)\in\{0,1\}$, and
\[
r_{k+1}(x)=r_{k}(x)-2^{-k}\varepsilon_{k}(x)=x-\sum_{j=1}^{k}2^{-j}\varepsilon_{j}(x)\in\left[0,2^{-k}\right) \tag{9.70}
\]
for all $k$. Let us now define $g:[0,1)\to\Omega$ by $g(x):=(\varepsilon_{1}(x),\varepsilon_{2}(x),\dots)$. Since each component function, $\pi_{j}\circ g=\varepsilon_{j}:[0,1)\to\{0,1\}$, is measurable, it follows that $g$ is measurable.

By construction,
\[
x=\sum_{j=1}^{k}2^{-j}\varepsilon_{j}(x)+r_{k+1}(x)
\]
and $r_{k+1}(x)\to0$ as $k\to\infty$, therefore
\[
x=\sum_{j=1}^{\infty}2^{-j}\varepsilon_{j}(x)\quad\text{and}\quad r_{k+1}(x)=\sum_{j=k+1}^{\infty}2^{-j}\varepsilon_{j}(x). \tag{9.71}
\]
Hence if we define $f:\Omega\to[0,1]$ by $f(\omega)=\sum_{j=1}^{\infty}2^{-j}\omega_{j}$, then $f(g(x))=x$ for all $x\in[0,1)$. This shows $g$ is injective, $f$ is surjective, and $f$ is injective on the range of $g$.
We now claim that $\Omega_{0}:=g([0,1))$, the range of $g$, consists of those $\omega\in\Omega$ such that $\omega_{i}=0$ for infinitely many $i$. Indeed, if there exists a $k\in\mathbb{N}$ such that $\varepsilon_{j}(x)=1$ for all $j\ge k$, then (by Eq. (9.71)) $r_{k+1}(x)=2^{-k}$, which would contradict Eq. (9.70). Hence $g([0,1))\subset\Omega_{0}$. Conversely if $\omega\in\Omega_{0}$ and $x=f(\omega)\in[0,1)$, it is not hard to show inductively that $\varepsilon_{j}(x)=\omega_{j}$ for all $j$, i.e. $g(x)=\omega$. For example, if $\omega_{1}=1$ then $x\ge2^{-1}$ and hence $\varepsilon_{1}(x)=1$. Alternatively, if $\omega_{1}=0$, then
\[
x=\sum_{j=2}^{\infty}2^{-j}\omega_{j}<\sum_{j=2}^{\infty}2^{-j}=2^{-1}
\]
so that $\varepsilon_{1}(x)=0$. Hence it follows that $r_{2}(x)=\sum_{j=2}^{\infty}2^{-j}\omega_{j}$ and by similar reasoning we learn $r_{2}(x)\ge2^{-2}$ iff $\omega_{2}=1$, i.e. $\varepsilon_{2}(x)=1$ iff $\omega_{2}=1$. The full induction argument is now left to the reader.
Since single point sets are in $\mathcal{B}$ and
\[
\Lambda:=\Omega\setminus\Omega_{0}=\bigcup_{n=1}^{\infty}\left\{\omega\in\Omega:\omega_{j}=1\text{ for }j\ge n\right\}
\]
is a countable set, it follows that $\Lambda\in\mathcal{B}$ and therefore $\Omega_{0}=\Omega\setminus\Lambda\in\mathcal{B}$. Hence we may now conclude that $g:\left([0,1),\mathcal{B}_{[0,1)}\right)\to(\Omega_{0},\mathcal{B}_{\Omega_{0}})$ is a measurable bijection with measurable inverse given by $f|_{\Omega_{0}}$, i.e. $\left([0,1),\mathcal{B}_{[0,1)}\right)\cong(\Omega_{0},\mathcal{B}_{\Omega_{0}})$. An application of Lemma 9.56 and Proposition 9.57 now implies
\[
\Omega=\Omega_{0}\cup\Lambda\cong[0,1)\cup\mathbb{N}\cong[0,1)\cong(0,1).
\]
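The digit algorithm in the proof of Proposition 9.60 can be sketched in code (the function names `g` and `f` follow the proof; truncating to finitely many digits is our simplification, and `math.fsum` is used so the reconstruction of a double-precision $x$ is essentially exact):

```python
import math

def g(x, k=60):
    """First k binary digits eps_1(x), ..., eps_k(x) of x in [0, 1)."""
    eps, r = [], x
    for j in range(1, k + 1):
        bit = 1 if r >= 2.0 ** (-j) else 0   # eps_j(x) = 1_{r_j >= 2^{-j}}
        eps.append(bit)
        r -= bit * 2.0 ** (-j)               # r_{j+1} = r_j - 2^{-j} eps_j, Eq. (9.70)
    return eps

def f(eps):
    """f(omega) = sum_j 2^{-j} omega_j, Eq. (9.71)."""
    return math.fsum(b * 2.0 ** (-(j + 1)) for j, b in enumerate(eps))

for x in (0.0, 0.5, 0.3, 0.3125, 0.7071067811865476):
    assert abs(f(g(x)) - x) < 2.0 ** -50     # f o g = id on [0, 1)
```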
Corollary 9.61. The following spaces are all isomorphic to $\left((0,1),\mathcal{B}_{(0,1)}\right)$: $(0,1)^{d}$ and $\mathbb{R}^{d}$ for any $d\in\mathbb{N}$, and $[0,1]^{\mathbb{N}}$ and $\mathbb{R}^{\mathbb{N}}$, where both of these spaces are equipped with their natural product $\sigma$-algebras.
Proof. In light of Lemma 9.56 and Proposition 9.57 we know that $(0,1)^{d}\cong\mathbb{R}^{d}$ and $(0,1)^{\mathbb{N}}\cong[0,1]^{\mathbb{N}}\cong\mathbb{R}^{\mathbb{N}}$. So, using Proposition 9.60, it suffices to show $(0,1)^{d}\cong\Omega\cong(0,1)^{\mathbb{N}}$ and to do this it suffices to show $\Omega^{d}\cong\Omega$ and $\Omega^{\mathbb{N}}\cong\Omega$.

To reduce the problem further, let us observe that $\Omega^{d}\cong\{0,1\}^{\mathbb{N}\times\{1,2,\dots,d\}}$ and $\Omega^{\mathbb{N}}\cong\{0,1\}^{\mathbb{N}^{2}}$. For example, let $g:\Omega^{\mathbb{N}}\to\{0,1\}^{\mathbb{N}^{2}}$ be defined by $g(\omega)(i,j)=\omega(i)(j)$ for all $\omega\in\Omega^{\mathbb{N}}=\left(\{0,1\}^{\mathbb{N}}\right)^{\mathbb{N}}$. Then $g$ is a bijection and since $\pi_{(i,j)}^{\{0,1\}^{\mathbb{N}^{2}}}\circ g(\omega)=\pi_{j}\left(\pi_{i}^{\Omega^{\mathbb{N}}}(\omega)\right)$, it follows that $g$ is measurable. The inverse, $g^{-1}:\{0,1\}^{\mathbb{N}^{2}}\to\Omega^{\mathbb{N}}$, to $g$ is given by $g^{-1}(\alpha)(i)(j)=\alpha(i,j)$. To see this map is measurable, observe that $\pi_{i}^{\Omega^{\mathbb{N}}}\circ g^{-1}:\{0,1\}^{\mathbb{N}^{2}}\to\Omega=\{0,1\}^{\mathbb{N}}$ is given by $\pi_{i}^{\Omega^{\mathbb{N}}}\circ g^{-1}(\alpha)=g^{-1}(\alpha)(i)=\alpha(i,\cdot)$ and hence
\[
\pi_{j}\circ\pi_{i}^{\Omega^{\mathbb{N}}}\circ g^{-1}(\alpha)=\alpha(i,j)=\pi_{(i,j)}^{\{0,1\}^{\mathbb{N}^{2}}}(\alpha)
\]
from which it follows that $\pi_{j}\circ\pi_{i}^{\Omega^{\mathbb{N}}}\circ g^{-1}=\pi_{(i,j)}^{\{0,1\}^{\mathbb{N}^{2}}}$ is measurable for all $i,j\in\mathbb{N}$; hence $\pi_{i}^{\Omega^{\mathbb{N}}}\circ g^{-1}$ is measurable for all $i\in\mathbb{N}$ and hence $g^{-1}$ is measurable. This shows $\Omega^{\mathbb{N}}\cong\{0,1\}^{\mathbb{N}^{2}}$. The proof that $\Omega^{d}\cong\{0,1\}^{\mathbb{N}\times\{1,2,\dots,d\}}$ is analogous.

We may now complete the proof with a couple of applications of Lemma 9.59. Indeed $\mathbb{N}$, $\mathbb{N}\times\{1,2,\dots,d\}$, and $\mathbb{N}^{2}$ all have the same cardinality and therefore,
\[
\{0,1\}^{\mathbb{N}\times\{1,2,\dots,d\}}\cong\{0,1\}^{\mathbb{N}^{2}}\cong\{0,1\}^{\mathbb{N}}=\Omega.
\]
Corollary 9.62. Suppose that $(X_{n},\mathcal{M}_{n})$ for $n\in\mathbb{N}$ are standard Borel spaces, then $X:=\prod_{n=1}^{\infty}X_{n}$ equipped with the product $\sigma$-algebra, $\mathcal{M}:=\otimes_{n=1}^{\infty}\mathcal{M}_{n}$, is again a standard Borel space.
Proof. Let $A_{n}\in\mathcal{B}_{[0,1]}$ be Borel sets on $[0,1]$ such that there exists a measurable isomorphism, $f_{n}:X_{n}\to A_{n}$. Then $f:X\to A:=\prod_{n=1}^{\infty}A_{n}$ defined by $f(x_{1},x_{2},\dots)=(f_{1}(x_{1}),f_{2}(x_{2}),\dots)$ is easily seen to be a measure theoretic isomorphism when $A$ is equipped with the product $\sigma$-algebra, $\otimes_{n=1}^{\infty}\mathcal{B}_{A_{n}}$. So according to Corollary 9.61, to finish the proof it suffices to show $\otimes_{n=1}^{\infty}\mathcal{B}_{A_{n}}=\mathcal{M}_{A}$ where $\mathcal{M}:=\otimes_{n=1}^{\infty}\mathcal{B}_{[0,1]}$ is the product $\sigma$-algebra on $[0,1]^{\mathbb{N}}$.

The $\sigma$-algebra, $\otimes_{n=1}^{\infty}\mathcal{B}_{A_{n}}$, is generated by sets of the form, $B:=\prod_{n=1}^{\infty}B_{n}$ where $B_{n}\in\mathcal{B}_{A_{n}}\subset\mathcal{B}_{[0,1]}$. On the other hand, the $\sigma$-algebra, $\mathcal{M}_{A}$, is generated by sets of the form, $A\cap\tilde B$ where $\tilde B:=\prod_{n=1}^{\infty}\tilde B_{n}$ with $\tilde B_{n}\in\mathcal{B}_{[0,1]}$. Since
\[
A\cap\tilde B=\prod_{n=1}^{\infty}\left(\tilde B_{n}\cap A_{n}\right)=\prod_{n=1}^{\infty}B_{n}
\]
where $B_{n}=\tilde B_{n}\cap A_{n}$ is the generic element in $\mathcal{B}_{A_{n}}$, we see that $\otimes_{n=1}^{\infty}\mathcal{B}_{A_{n}}$ and $\mathcal{M}_{A}$ can both be generated by the same collections of sets, and we may conclude that $\otimes_{n=1}^{\infty}\mathcal{B}_{A_{n}}=\mathcal{M}_{A}$.
Our next goal is to show that any Polish space with its Borel $\sigma$-algebra is a standard Borel space.

Notation 9.63 Let $Q:=[0,1]^{\mathbb{N}}$ denote the (infinite dimensional) unit cube in $\mathbb{R}^{\mathbb{N}}$. For $a,b\in Q$ let
\[
d(a,b):=\sum_{n=1}^{\infty}\frac{1}{2^{n}}\left|a_{n}-b_{n}\right|=\sum_{n=1}^{\infty}\frac{1}{2^{n}}\left|\pi_{n}(a)-\pi_{n}(b)\right|. \tag{9.72}
\]

Exercise 9.11. Show $d$ is a metric and that the Borel $\sigma$-algebra on $(Q,d)$ is the same as the product $\sigma$-algebra.
Solution to Exercise (9.11). It is easily seen that $d$ is a metric on $Q$ which, by Eq. (9.72), is measurable relative to the product $\sigma$-algebra, $\mathcal{M}$. Therefore, $\mathcal{M}$ contains all open balls and hence contains the Borel $\sigma$-algebra, $\mathcal{B}$. Conversely, since
\[
\left|\pi_{n}(a)-\pi_{n}(b)\right|\le2^{n}d(a,b),
\]
each of the projection operators, $\pi_{n}:Q\to[0,1]$, is continuous. Therefore each $\pi_{n}$ is $\mathcal{B}$-measurable and hence $\mathcal{M}=\sigma\left(\{\pi_{n}\}_{n=1}^{\infty}\right)\subset\mathcal{B}$.
Theorem 9.64. To every separable metric space $(X,\rho)$, there exists a continuous injective map $G:X\to Q$ such that $G:X\to G(X)\subset Q$ is a homeomorphism. Moreover if the metric, $\rho$, is also complete, then $G(X)$ is a $G_{\delta}$ set, i.e. $G(X)$ is the countable intersection of open subsets of $(Q,d)$. In short, any separable metrizable space $X$ is homeomorphic to a subset of $(Q,d)$ and if $X$ is a Polish space then $X$ is homeomorphic to a $G_{\delta}$ subset of $(Q,d)$.
Proof. (This proof follows that in Rogers and Williams [55, Theorem 82.5 on p. 106].) By replacing $\rho$ by $\frac{\rho}{1+\rho}$ if necessary, we may assume that $0\le\rho<1$. Let $D=\{a_{n}\}_{n=1}^{\infty}$ be a countable dense subset of $X$ and define
\[
G(x)=\left(\rho(x,a_{1}),\rho(x,a_{2}),\rho(x,a_{3}),\dots\right)\in Q
\]
and
\[
\gamma(x,y)=d\left(G(x),G(y)\right)=\sum_{n=1}^{\infty}\frac{1}{2^{n}}\left|\rho(x,a_{n})-\rho(y,a_{n})\right|
\]
for $x,y\in X$. To prove the first assertion, we must show $G$ is injective and $\gamma$ is a metric on $X$ which is compatible with the topology determined by $\rho$.

If $G(x)=G(y)$, then $\rho(x,a)=\rho(y,a)$ for all $a\in D$. Since $D$ is a dense subset of $X$, we may choose $\alpha_{k}\in D$ such that
\[
0=\lim_{k\to\infty}\rho(x,\alpha_{k})=\lim_{k\to\infty}\rho(y,\alpha_{k})=\rho(y,x)
\]
and therefore $x=y$. A simple argument using the dominated convergence theorem shows $y\to\gamma(x,y)$ is $\rho$-continuous, i.e. $\gamma(x,y)$ is small if $\rho(x,y)$ is small. Conversely,
\begin{align*}
\rho(x,y)&\le\rho(x,a_{n})+\rho(y,a_{n})=2\rho(x,a_{n})+\rho(y,a_{n})-\rho(x,a_{n})\\
&\le2\rho(x,a_{n})+\left|\rho(x,a_{n})-\rho(y,a_{n})\right|\le2\rho(x,a_{n})+2^{n}\gamma(x,y).
\end{align*}
Hence if $\varepsilon>0$ is given, we may choose $n$ so that $2\rho(x,a_{n})<\varepsilon/2$ and so if $\gamma(x,y)<2^{-(n+1)}\varepsilon$, it will follow that $\rho(x,y)<\varepsilon$. This shows $\tau_{\gamma}=\tau_{\rho}$. Since $G:(X,\gamma)\to(Q,d)$ is isometric, $G$ is a homeomorphism.
Now suppose that (X, ) is a complete metric space. Let S := G(X) and
be the metric on S dened by (G(x) , G(y)) = (x, y) for all x, y X. Then
(S, ) is a complete metric (being the isometric image of a complete metric
space) and by what we have just prove,

=
dS
. Consequently, if u S and >
0 is given, we may nd
t
() such that B

(u,
t
()) B
d
(u, ) . Taking () =
min(
t
() , ) , we have diam
d
(B
d
(u, ())) < and diam

(B
d
(u, ())) <
where
diam

(A) := sup (u, v) : u, v A and


diam
d
(A) := supd (u, v) : u, v A .
Let

S denote the closure of S inside of (Q, d) and for each n N let
^
n
:= N
d
: diam
d
(N) diam

(N S) < 1/n
and let U
n
:= ^
n

d
. From the previous paragraph, it follows that S U
n
and therefore S

S (

n=1
U
n
) .
Conversely if u

S (

n=1
U
n
) and n N, there exists N
n
^
n
such
that u N
n
. Moreover, since N
1
N
n
is an open neighborhood of u

S,
there exists u
n
N
1
N
n
S for each n N. From the denition of
^
n
, we have lim
n
d (u, u
n
) = 0 and (u
n
, u
m
) max
_
n
1
, m
1
_
0 as
m, n . Since (S, ) is complete, it follows that u
n

n=1
is convergent in
(S, ) to some element u
0
S. Since (S, d
S
) has the same topology as (S, )
it follows that d (u
n
, u
0
) 0 as well and thus that u = u
0
S. We have
now shown, S =

S (

n=1
U
n
) . This completes the proof because we may
write

S =
_

n=1
S
1/n
_
where S
1/n
:=
_
u Q : d
_
u,

S
_
< 1/n
_
and therefore,
S = (

n=1
U
n
)
_

n=1
S
1/n
_
is a G

set.
Corollary 9.65. Every Polish space, $X$, with its Borel $\sigma$-algebra is a standard Borel space. Consequently any Borel subset of $X$ is also a standard Borel space.

Proof. Theorem 9.64 shows that $X$ is homeomorphic to a measurable (in fact a $G_\delta$) subset $Q_0$ of $(Q, d)$ and hence $X \cong Q_0$. Since $Q$ is a standard Borel space so is $Q_0$ and hence so is $X$.
9.11 More Exercises
Exercise 9.12. Let $(X_j, \mathcal{M}_j, \mu_j)$ for $j = 1, 2, 3$ be $\sigma$-finite measure spaces. Let $F : (X_1 \times X_2) \times X_3 \to X_1 \times X_2 \times X_3$ be defined by
\[ F((x_1, x_2), x_3) = (x_1, x_2, x_3). \]
1. Show $F$ is $((\mathcal{M}_1 \otimes \mathcal{M}_2) \otimes \mathcal{M}_3, \mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3)$ measurable and $F^{-1}$ is $(\mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3, (\mathcal{M}_1 \otimes \mathcal{M}_2) \otimes \mathcal{M}_3)$ measurable. That is
\[ F : ((X_1 \times X_2) \times X_3, (\mathcal{M}_1 \otimes \mathcal{M}_2) \otimes \mathcal{M}_3) \to (X_1 \times X_2 \times X_3, \mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3) \]
is a measure theoretic isomorphism.
2. Let $\pi := F_*[(\mu_1 \otimes \mu_2) \otimes \mu_3]$, i.e. $\pi(A) = [(\mu_1 \otimes \mu_2) \otimes \mu_3](F^{-1}(A))$ for all $A \in \mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3$. Then $\pi$ is the unique measure on $\mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3$ such that
\[ \pi(A_1 \times A_2 \times A_3) = \mu_1(A_1)\,\mu_2(A_2)\,\mu_3(A_3) \]
for all $A_i \in \mathcal{M}_i$. We will write $\pi := \mu_1 \otimes \mu_2 \otimes \mu_3$.
3. Let $f : X_1 \times X_2 \times X_3 \to [0, \infty]$ be a $(\mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3, \mathcal{B}_{\bar{\mathbb{R}}})$ measurable function. Verify the identity,
\[ \int_{X_1 \times X_2 \times X_3} f\, d\pi = \int_{X_3} d\mu_3(x_3) \int_{X_2} d\mu_2(x_2) \int_{X_1} d\mu_1(x_1)\, f(x_1, x_2, x_3), \]
makes sense and is correct.
4. (Optional.) Also show the above identity holds for any one of the six possible orderings of the iterated integrals.
Exercise 9.13. Prove the second assertion of Theorem 9.20. That is show $m^d$ is the unique translation invariant measure on $\mathcal{B}_{\mathbb{R}^d}$ such that $m^d((0, 1]^d) = 1$. Hint: Look at the proof of Theorem 5.34.
Exercise 9.14. (Part of Folland Problem 2.46 on p. 69.) Let $X = [0, 1]$, $\mathcal{M} = \mathcal{B}_{[0,1]}$ be the Borel $\sigma$-field on $X$, $m$ be Lebesgue measure on $[0, 1]$ and $\nu$ be counting measure, $\nu(A) = \#(A)$. Finally let $D = \{(x, x) \in X^2 : x \in X\}$ be the diagonal in $X^2$. Show
\[ \int_X \Big[\int_X 1_D(x, y)\, d\nu(y)\Big] dm(x) \ne \int_X \Big[\int_X 1_D(x, y)\, dm(x)\Big] d\nu(y) \]
by explicitly computing both sides of this equation.
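For orientation, here is a sketch of the computation the exercise is asking for; it shows how the conclusion of Tonelli's theorem fails when one of the measures (counting measure on $[0,1]$) is not $\sigma$-finite:

```latex
\begin{align*}
\int_X \Big[\int_X 1_D(x,y)\, d\nu(y)\Big] dm(x)
  &= \int_X \nu(\{x\})\, dm(x) = \int_X 1\, dm(x) = 1, \quad\text{while}\\
\int_X \Big[\int_X 1_D(x,y)\, dm(x)\Big] d\nu(y)
  &= \int_X m(\{y\})\, d\nu(y) = \int_X 0\, d\nu(y) = 0.
\end{align*}
```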
Exercise 9.15. Folland Problem 2.48 on p. 69. (Counterexample related to the Fubini Theorem involving counting measures.)

Exercise 9.16. Folland Problem 2.50 on p. 69 pertaining to the area under a curve. (Note the $\mathcal{M} \times \mathcal{B}_{\mathbb{R}}$ should be $\mathcal{M} \otimes \mathcal{B}_{\bar{\mathbb{R}}}$ in this problem.)
Exercise 9.17. Folland Problem 2.55 on p. 77. (Explicit integrations.)
Exercise 9.18. Folland Problem 2.56 on p. 77. Let $f \in L^1((0, a), dm)$, $g(x) = \int_x^a \frac{f(t)}{t}\, dt$ for $x \in (0, a)$, show $g \in L^1((0, a), dm)$ and
\[ \int_0^a g(x)\, dx = \int_0^a f(t)\, dt. \]
Exercise 9.19. Show
\[ \int_0^\infty \Big|\frac{\sin x}{x}\Big|\, dm(x) = \infty. \]
So $\frac{\sin x}{x} \notin L^1([0, \infty), m)$ and $\int_0^\infty \frac{\sin x}{x}\, dm(x)$ is not defined as a Lebesgue integral.
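A quick numerical sanity check of the divergence (not part of the exercise): the integral over $[0, N\pi]$ grows like $(2/\pi)\log N$, since the lobe over $[(k-1)\pi, k\pi]$ contributes roughly $2/(k\pi)$. The sketch below uses a hand-rolled trapezoid rule; the grid sizes are arbitrary choices.

```python
import numpy as np

def I(N, pts=1_000_000):
    """Trapezoid-rule approximation of the integral of |sin x|/x over [0, N*pi]."""
    x = np.linspace(1e-9, N * np.pi, pts)
    f = np.abs(np.sin(x)) / x
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2)

# The partial integrals keep growing, at roughly (2/pi) * log N:
growth = I(100) - I(10)
print(I(10), I(100), growth, (2 / np.pi) * np.log(10))
```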
Exercise 9.20. Folland Problem 2.57 on p. 77.

Exercise 9.21. Folland Problem 2.58 on p. 77.

Exercise 9.22. Folland Problem 2.60 on p. 77. Properties of the $\Gamma$ function.

Exercise 9.23. Folland Problem 2.61 on p. 77. Fractional integration.

Exercise 9.24. Folland Problem 2.62 on p. 80. Rotation invariance of surface measure on $S^{n-1}$.

Exercise 9.25. Folland Problem 2.64 on p. 80. On the integrability of $|x|^a |\log|x||^b$ for $x$ near $0$ and $x$ near $\infty$ in $\mathbb{R}^n$.
Exercise 9.26. Show, using Problem 9.24, that
\[ \int_{S^{d-1}} \omega_i \omega_j\, d\sigma(\omega) = \frac{1}{d}\, \delta_{ij}\, \sigma\left(S^{d-1}\right). \]
Hint: show $\int_{S^{d-1}} \omega_i^2\, d\sigma(\omega)$ is independent of $i$ and therefore
\[ \int_{S^{d-1}} \omega_i^2\, d\sigma(\omega) = \frac{1}{d} \sum_{j=1}^d \int_{S^{d-1}} \omega_j^2\, d\sigma(\omega). \]
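The identity in Exercise 9.26 is easy to check by Monte Carlo, using the standard fact that a normalized standard Gaussian vector is uniformly distributed on the sphere (the dimension, sample size, and seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 200_000

# Uniform points on S^{d-1}: normalize standard Gaussian vectors.
g = rng.normal(size=(N, d))
w = g / np.linalg.norm(g, axis=1, keepdims=True)

# Estimate the matrix of second moments E[w_i w_j]; it should be (1/d) * I.
M = (w[:, :, None] * w[:, None, :]).mean(axis=0)
print(M)
```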
10 Independence
As usual, $(\Omega, \mathcal{B}, P)$ will be some fixed probability space. Recall that for $A, B \in \mathcal{B}$ with $P(B) > 0$ we let
\[ P(A|B) := \frac{P(A \cap B)}{P(B)} \]
which is to be read as: the probability of $A$ given $B$.

Definition 10.1. We say that $A$ is independent of $B$ if $P(A|B) = P(A)$ or equivalently that
\[ P(A \cap B) = P(A)\, P(B). \]
We further say a finite sequence of collections of sets, $\{\mathcal{C}_i\}_{i=1}^n$, are independent if
\[ P(\cap_{j \in J} A_j) = \prod_{j \in J} P(A_j) \]
for all $A_i \in \mathcal{C}_i$ and $J \subset \{1, 2, \dots, n\}$.
10.1 Basic Properties of Independence
If $\{\mathcal{C}_i\}_{i=1}^n$ are independent classes then so are $\{\mathcal{C}_i \cup \{\Omega\}\}_{i=1}^n$. Moreover, if we assume that $\Omega \in \mathcal{C}_i$ for each $i$, then $\{\mathcal{C}_i\}_{i=1}^n$ are independent iff
\[ P\left(\cap_{j=1}^n A_j\right) = \prod_{j=1}^n P(A_j) \text{ for all } (A_1, \dots, A_n) \in \mathcal{C}_1 \times \dots \times \mathcal{C}_n. \]
Theorem 10.2. Suppose that $\{\mathcal{C}_i\}_{i=1}^n$ is a finite sequence of independent $\pi$-classes. Then $\{\sigma(\mathcal{C}_i)\}_{i=1}^n$ are also independent.

Proof. As mentioned above, we may always assume without loss of generality that $\Omega \in \mathcal{C}_i$. Fix $A_j \in \mathcal{C}_j$ for $j = 2, 3, \dots, n$. We will begin by showing that
\[ Q(A) := P(A \cap A_2 \cap \dots \cap A_n) = P(A)\, P(A_2) \dots P(A_n) \text{ for all } A \in \sigma(\mathcal{C}_1). \tag{10.1} \]
Since $Q(\cdot)$ and $P(A_2) \dots P(A_n)\, P(\cdot)$ are both finite measures agreeing on $\Omega$ and on $A$ in the $\pi$-system $\mathcal{C}_1$, Eq. (10.1) is a direct consequence of Proposition 5.15. Since $(A_2, \dots, A_n) \in \mathcal{C}_2 \times \dots \times \mathcal{C}_n$ were arbitrary we may now conclude that $\sigma(\mathcal{C}_1), \mathcal{C}_2, \dots, \mathcal{C}_n$ are independent.

By applying the result we have just proved to the sequence, $\mathcal{C}_2, \dots, \mathcal{C}_n, \sigma(\mathcal{C}_1)$, it follows that $\sigma(\mathcal{C}_2), \mathcal{C}_3, \dots, \mathcal{C}_n, \sigma(\mathcal{C}_1)$ are independent. Similarly we show inductively that
\[ \sigma(\mathcal{C}_j), \mathcal{C}_{j+1}, \dots, \mathcal{C}_n, \sigma(\mathcal{C}_1), \dots, \sigma(\mathcal{C}_{j-1}) \]
are independent for each $j = 1, 2, \dots, n$. The desired result occurs at $j = n$.
Definition 10.3. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $\{(S_i, \mathcal{S}_i)\}_{i=1}^n$ be a collection of measurable spaces and $Y_i : \Omega \to S_i$ be a measurable map for $1 \le i \le n$. The maps $\{Y_i\}_{i=1}^n$ are $P$-independent iff $\{\mathcal{C}_i\}_{i=1}^n$ are $P$-independent, where
\[ \mathcal{C}_i := Y_i^{-1}(\mathcal{S}_i) = \sigma(Y_i) \subset \mathcal{B} \text{ for } 1 \le i \le n. \]
Theorem 10.4 (Independence and Product Measures). Let $(\Omega, \mathcal{B}, P)$ be a probability space, $\{(S_i, \mathcal{S}_i)\}_{i=1}^n$ be a collection of measurable spaces and $Y_i : \Omega \to S_i$ be a measurable map for $1 \le i \le n$. Further let $\mu_i := P \circ Y_i^{-1} = \operatorname{Law}_P(Y_i)$. Then $\{Y_i\}_{i=1}^n$ are independent iff
\[ \operatorname{Law}_P(Y_1, \dots, Y_n) = \mu_1 \otimes \dots \otimes \mu_n, \]
where $(Y_1, \dots, Y_n) : \Omega \to S_1 \times \dots \times S_n$ and
\[ \operatorname{Law}_P(Y_1, \dots, Y_n) = P \circ (Y_1, \dots, Y_n)^{-1} : \mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n \to [0, 1] \]
is the joint law of $Y_1, \dots, Y_n$.
Proof. Recall that the general element of $\mathcal{C}_i$ is of the form $A_i = Y_i^{-1}(B_i)$ with $B_i \in \mathcal{S}_i$. Therefore for $A_i = Y_i^{-1}(B_i) \in \mathcal{C}_i$ we have
\[ P(A_1 \cap \dots \cap A_n) = P((Y_1, \dots, Y_n) \in B_1 \times \dots \times B_n) = ((Y_1, \dots, Y_n)_* P)(B_1 \times \dots \times B_n). \]
If $(Y_1, \dots, Y_n)_* P = \mu_1 \otimes \dots \otimes \mu_n$ it follows that
\begin{align*}
P(A_1 \cap \dots \cap A_n) &= \mu_1 \otimes \dots \otimes \mu_n (B_1 \times \dots \times B_n) \\
&= \mu_1(B_1) \cdots \mu_n(B_n) = P(Y_1 \in B_1) \cdots P(Y_n \in B_n) \\
&= P(A_1) \cdots P(A_n)
\end{align*}
and therefore $\{\mathcal{C}_i\}$ are $P$-independent and hence $\{Y_i\}$ are $P$-independent.

Conversely if $\{Y_i\}$ are $P$-independent, i.e. $\{\mathcal{C}_i\}$ are $P$-independent, then
\begin{align*}
P((Y_1, \dots, Y_n) \in B_1 \times \dots \times B_n) &= P(A_1 \cap \dots \cap A_n) = P(A_1) \cdots P(A_n) \\
&= P(Y_1 \in B_1) \cdots P(Y_n \in B_n) \\
&= \mu_1(B_1) \cdots \mu_n(B_n) = \mu_1 \otimes \dots \otimes \mu_n (B_1 \times \dots \times B_n).
\end{align*}
Since
\[ \pi := \{B_1 \times \dots \times B_n : B_i \in \mathcal{S}_i \text{ for } 1 \le i \le n\} \]
is a $\pi$-system which generates $\mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n$ and
\[ (Y_1, \dots, Y_n)_* P = \mu_1 \otimes \dots \otimes \mu_n \text{ on } \pi, \]
it follows that $(Y_1, \dots, Y_n)_* P = \mu_1 \otimes \dots \otimes \mu_n$ on all of $\mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n$.
Remark 10.5. When we have a collection of not necessarily independent random functions, $Y_i : \Omega \to S_i$, as in Theorem 10.4, it is not in general possible to recover the joint distribution, $\pi := \operatorname{Law}_P(Y_1, \dots, Y_n)$, from the individual distributions, $\mu_i = \operatorname{Law}_P(Y_i)$ for $1 \le i \le n$. For example suppose that $S_i = \mathbb{R}$ for $i = 1, 2$, $\mu$ is a probability measure on $(\mathbb{R}, \mathcal{B}_\mathbb{R})$, and $(Y_1, Y_2)$ have joint distribution, $\pi$, given by,
\[ \pi(C) = \int_\mathbb{R} 1_C(x, x)\, d\mu(x) \text{ for all } C \in \mathcal{B}_{\mathbb{R}^2}. \]
If we let $\mu_i = \operatorname{Law}_P(Y_i)$, then for all $A \in \mathcal{B}_\mathbb{R}$ we have
\[ \mu_1(A) = P(Y_1 \in A) = P((Y_1, Y_2) \in A \times \mathbb{R}) = \pi(A \times \mathbb{R}) = \int_\mathbb{R} 1_{A \times \mathbb{R}}(x, x)\, d\mu(x) = \mu(A). \]
Similarly we show that $\mu_2 = \mu$. On the other hand, if $\mu$ is not concentrated at one point, $\mu \otimes \mu$ is another probability measure on $(\mathbb{R}^2, \mathcal{B}_{\mathbb{R}^2})$ with the same marginals as $\pi$, i.e. $\pi(A \times \mathbb{R}) = \mu(A) = \pi(\mathbb{R} \times A)$ for all $A \in \mathcal{B}_\mathbb{R}$.
Lemma 10.6. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $\{(S_i, \mathcal{S}_i)\}_{i=1}^n$ and $\{(T_i, \mathcal{T}_i)\}_{i=1}^n$ be two collections of measurable spaces, $F_i : S_i \to T_i$ be a measurable map for each $i$ and $Y_i : \Omega \to S_i$ be a collection of $P$-independent measurable maps. Then $\{F_i \circ Y_i\}_{i=1}^n$ are also $P$-independent.

Proof. Notice that
\[ \sigma(F_i \circ Y_i) = (F_i \circ Y_i)^{-1}(\mathcal{T}_i) = Y_i^{-1}\left(F_i^{-1}(\mathcal{T}_i)\right) \subset Y_i^{-1}(\mathcal{S}_i) = \mathcal{C}_i. \]
The fact that $\{F_i \circ Y_i\}_{i=1}^n$ is independent now follows easily from the assumption that $\{\mathcal{C}_i\}$ are $P$-independent.
Example 10.7. If $\Omega := \prod_{i=1}^n S_i$, $\mathcal{B} := \mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n$, $Y_i(s_1, \dots, s_n) = s_i$ for all $(s_1, \dots, s_n) \in \Omega$, and $\mathcal{C}_i := Y_i^{-1}(\mathcal{S}_i)$ for all $i$, then the probability measures, $P$, on $(\Omega, \mathcal{B})$ for which $\{\mathcal{C}_i\}_{i=1}^n$ are independent are precisely the product measures,
\[ P = \mu_1 \otimes \dots \otimes \mu_n \]
where $\mu_i$ is a probability measure on $(S_i, \mathcal{S}_i)$ for $1 \le i \le n$. Notice that in this setting,
\[ \mathcal{C}_i := Y_i^{-1}(\mathcal{S}_i) = \{S_1 \times \dots \times S_{i-1} \times B \times S_{i+1} \times \dots \times S_n : B \in \mathcal{S}_i\} \subset \mathcal{B}. \]
Proposition 10.8. Suppose that $(\Omega, \mathcal{B}, P)$ is a probability space and $\{Z_j\}_{j=1}^n$ are independent integrable random variables. Then $\prod_{j=1}^n Z_j$ is also integrable and
\[ \mathbb{E}\Big[\prod_{j=1}^n Z_j\Big] = \prod_{j=1}^n \mathbb{E} Z_j. \]

Proof. Let $\mu_j := P \circ Z_j^{-1} : \mathcal{B}_\mathbb{R} \to [0, 1]$ be the law of $Z_j$ for each $j$. Then we know $(Z_1, \dots, Z_n)_* P = \mu_1 \otimes \dots \otimes \mu_n$. Therefore by Example 7.52 and Tonelli's theorem,
\[ \mathbb{E}\Big[\prod_{j=1}^n |Z_j|\Big] = \int_{\mathbb{R}^n} \prod_{j=1}^n |z_j|\, d\left(\otimes_{j=1}^n \mu_j\right)(z) = \prod_{j=1}^n \int_\mathbb{R} |z_j|\, d\mu_j(z_j) = \prod_{j=1}^n \mathbb{E}|Z_j| < \infty \]
which shows that $\prod_{j=1}^n Z_j$ is integrable. Thus again by Example 7.52 and Fubini's theorem,
\[ \mathbb{E}\Big[\prod_{j=1}^n Z_j\Big] = \int_{\mathbb{R}^n} \prod_{j=1}^n z_j\, d\left(\otimes_{j=1}^n \mu_j\right)(z) = \prod_{j=1}^n \int_\mathbb{R} z_j\, d\mu_j(z_j) = \prod_{j=1}^n \mathbb{E} Z_j. \]
Theorem 10.9. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $\{(S_i, \mathcal{S}_i)\}_{i=1}^n$ be a collection of measurable spaces and $Y_i : \Omega \to S_i$ be a measurable map for $1 \le i \le n$. Further let $\mu_i := P \circ Y_i^{-1} = \operatorname{Law}_P(Y_i)$ and $\pi := P \circ (Y_1, \dots, Y_n)^{-1} : \mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n \to [0,1]$ be the joint distribution of
\[ (Y_1, \dots, Y_n) : \Omega \to S_1 \times \dots \times S_n. \]
Then the following are equivalent,
1. $\{Y_i\}_{i=1}^n$ are independent,
2. $\pi = \mu_1 \otimes \mu_2 \otimes \dots \otimes \mu_n$,
3. for all bounded measurable functions, $f : (S_1 \times \dots \times S_n, \mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n) \to (\mathbb{R}, \mathcal{B}_\mathbb{R})$,
\[ \mathbb{E} f(Y_1, \dots, Y_n) = \int_{S_1 \times \dots \times S_n} f(x_1, \dots, x_n)\, d\mu_1(x_1) \dots d\mu_n(x_n), \tag{10.2} \]
(where the integrals may be taken in any order),
4. $\mathbb{E}\left[\prod_{i=1}^n f_i(Y_i)\right] = \prod_{i=1}^n \mathbb{E}[f_i(Y_i)]$ for all bounded (or non-negative) measurable functions, $f_i : S_i \to \mathbb{R}$ or $\mathbb{C}$.
Proof. ($1 \iff 2$) has already been proved in Theorem 10.4. The fact that ($2. \implies 3.$) now follows from Exercise 7.11 and Fubini's theorem. Similarly, ($3. \implies 4.$) follows from Exercise 7.11 and Fubini's theorem after taking $f(x_1, \dots, x_n) = \prod_{i=1}^n f_i(x_i)$. Lastly for ($4. \implies 1.$), let $A_i \in \mathcal{S}_i$ and take $f_i := 1_{A_i}$ in 4. to learn,
\[ P\left(\cap_{i=1}^n \{Y_i \in A_i\}\right) = \mathbb{E}\Big[\prod_{i=1}^n 1_{A_i}(Y_i)\Big] = \prod_{i=1}^n \mathbb{E}[1_{A_i}(Y_i)] = \prod_{i=1}^n P(Y_i \in A_i) \]
which shows that the $\{Y_i\}_{i=1}^n$ are independent.
Corollary 10.10. Suppose that $(\Omega, \mathcal{B}, P)$ is a probability space and $\{Y_j : \Omega \to \mathbb{R}\}_{j=1}^n$ is a sequence of random variables with countable ranges, say $\Lambda \subset \mathbb{R}$. Then $\{Y_j\}_{j=1}^n$ are independent iff
\[ P\left(\cap_{j=1}^n \{Y_j = y_j\}\right) = \prod_{j=1}^n P(Y_j = y_j) \tag{10.3} \]
for all choices of $y_1, \dots, y_n \in \Lambda$.
Proof. If the $Y_j$ are independent then clearly Eq. (10.3) holds by definition as $\{Y_j = y_j\} \in Y_j^{-1}(\mathcal{B}_\mathbb{R})$. Conversely if Eq. (10.3) holds and $f_i : \mathbb{R} \to [0, \infty)$ are measurable functions then,
\begin{align*}
\mathbb{E}\Big[\prod_{i=1}^n f_i(Y_i)\Big] &= \sum_{y_1, \dots, y_n} \prod_{i=1}^n f_i(y_i)\, P\left(\cap_{j=1}^n \{Y_j = y_j\}\right) \\
&= \sum_{y_1, \dots, y_n} \prod_{i=1}^n f_i(y_i) \prod_{j=1}^n P(Y_j = y_j) \\
&= \prod_{i=1}^n \sum_{y_i} f_i(y_i)\, P(Y_i = y_i) = \prod_{i=1}^n \mathbb{E}[f_i(Y_i)]
\end{align*}
wherein we have used Tonelli's theorem for sums in the third equality. It now follows that $\{Y_i\}$ are independent using item 4. of Theorem 10.9.
Exercise 10.1. Suppose that $\Omega = (0, 1]$, $\mathcal{B} = \mathcal{B}_{(0,1]}$, and $P = m$ is Lebesgue measure on $\mathcal{B}$. Let $Y_i(\omega) := \omega_i$ be the $i^{\text{th}}$ digit in the base two expansion of $\omega$. To be more precise, $Y_i(\omega) \in \{0, 1\}$ is chosen so that
\[ \omega = \sum_{i=1}^\infty Y_i(\omega)\, 2^{-i}. \]
As long as $\omega \ne k 2^{-n}$ for some $0 < k \le 2^n$, the above equation uniquely determines the $Y_i(\omega)$. Owing to the fact that $\sum_{l=n+1}^\infty 2^{-l} = 2^{-n}$, if $\omega = k 2^{-n}$, there is some ambiguity in the definitions of the $Y_i(\omega)$ for large $i$ which you may resolve any way you choose. Show the random variables, $\{Y_i\}_{i=1}^n$, are i.i.d. for each $n \in \mathbb{N}$ with $P(Y_i = 1) = 1/2 = P(Y_i = 0)$ for all $i$.

Hint: the idea is that knowledge of $(Y_1(\omega), \dots, Y_n(\omega))$ is equivalent to knowing for which $k \in \mathbb{N}_0 \cap [0, 2^n)$ that $\omega \in (2^{-n} k, 2^{-n}(k+1)]$ and that this knowledge in no way helps you predict the value of $Y_{n+1}(\omega)$. More formally, you might start by showing,
\[ P\left(Y_{n+1} = 1 \mid (2^{-n} k, 2^{-n}(k+1)]\right) = \frac{1}{2} = P\left(Y_{n+1} = 0 \mid (2^{-n} k, 2^{-n}(k+1)]\right). \]
See Section 10.9 if you need some more help with this exercise.
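The claim of the exercise can be illustrated numerically: the binary digits of a uniformly sampled point behave like independent fair coin flips. A minimal sketch (the digit formula $Y_i(\omega) = \lfloor 2^i \omega \rfloor \bmod 2$ agrees with the binary expansion for a.e. $\omega$; sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(size=200_000)            # omega ~ Lebesgue measure on (0, 1]

def digit(omega, i):
    """i-th binary digit of omega: floor(2^i * omega) mod 2 (valid a.e.)."""
    return np.floor(2.0**i * omega) % 2

Y = np.array([digit(omega, i) for i in range(1, 5)])   # Y_1, ..., Y_4

# Each digit is a fair coin: P(Y_i = 1) should be close to 1/2.
print([float(Y[i].mean()) for i in range(4)])

# And digits are independent: joint frequencies factor, e.g. P(Y_1=1, Y_2=1) ~ 1/4.
p11 = float(np.mean((Y[0] == 1) & (Y[1] == 1)))
print(p11)
```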
Exercise 10.2. Let $X, Y$ be two random variables on $(\Omega, \mathcal{B}, P)$.

1. Show that $X$ and $Y$ are independent iff $\operatorname{Cov}(f(X), g(Y)) = 0$ (i.e. $f(X)$ and $g(Y)$ are uncorrelated) for all bounded measurable functions, $f, g : \mathbb{R} \to \mathbb{R}$.
2. If $X, Y \in L^2(P)$ and $X$ and $Y$ are independent, then $\operatorname{Cov}(X, Y) = 0$.
3. Show by example that $X, Y \in L^2(P)$ and $\operatorname{Cov}(X, Y) = 0$ does not necessarily imply that $X$ and $Y$ are independent. Hint: try taking $(X, Y) = (X, ZX)$ where $X$ and $Z$ are independent simple random variables such that $\mathbb{E}Z = 0$, similar to Remark 9.40.
Solution to Exercise (10.2). 1. Since
\[ \operatorname{Cov}(f(X), g(Y)) = \mathbb{E}[f(X)\, g(Y)] - \mathbb{E}[f(X)]\, \mathbb{E}[g(Y)], \]
it follows that $\operatorname{Cov}(f(X), g(Y)) = 0$ iff
\[ \mathbb{E}[f(X)\, g(Y)] = \mathbb{E}[f(X)]\, \mathbb{E}[g(Y)] \]
from which item 1. easily follows.

2. Let $f_M(x) = x\, 1_{|x| \le M}$, then by independence,
\[ \mathbb{E}[f_M(X)\, f_M(Y)] = \mathbb{E}[f_M(X)]\, \mathbb{E}[f_M(Y)]. \tag{10.4} \]
Since
\begin{align*}
|f_M(X)\, f_M(Y)| &\le |XY| \le \tfrac{1}{2}\left(X^2 + Y^2\right) \in L^1(P), \\
|f_M(X)| &\le |X| \le \tfrac{1}{2}\left(1 + X^2\right) \in L^1(P), \text{ and} \\
|f_M(Y)| &\le |Y| \le \tfrac{1}{2}\left(1 + Y^2\right) \in L^1(P),
\end{align*}
we may use the DCT three times to pass to the limit as $M \to \infty$ in Eq. (10.4) to learn that $\mathbb{E}[XY] = \mathbb{E}[X]\, \mathbb{E}[Y]$, i.e. $\operatorname{Cov}(X, Y) = 0$.

3. Let $X$ and $Z$ be independent with $P(Z = \pm 1) = \frac{1}{2}$ and take $Y = XZ$. Then $\mathbb{E}Z = 0$ and
\[ \operatorname{Cov}(X, Y) = \mathbb{E}\left[X^2 Z\right] - \mathbb{E}[X]\, \mathbb{E}[XZ] = \mathbb{E}\left[X^2\right] \mathbb{E}Z - \mathbb{E}[X]\, \mathbb{E}[X]\, \mathbb{E}Z = 0. \]
On the other hand it should be intuitively clear that $X$ and $Y$ are not independent since knowledge of $X$ typically will give some information about $Y$. To verify this assertion let us suppose that $X$ is a discrete random variable with $P(X = 0) = 0$. Then
\[ P(X = x, Y = y) = P(X = x, xZ = y) = P(X = x)\, P(xZ = y) \]
while
\[ P(X = x)\, P(Y = y) = P(X = x)\, P(XZ = y). \]
Thus for $X$ and $Y$ to be independent we would have to have,
\[ P(xZ = y) = P(XZ = y) \text{ for all } x, y. \]
This is clearly not going to be true in general. For example, suppose that $P(X = 1) = \frac{1}{2} = P(X = 2)$. Taking $x = y = 1$ in the previously displayed equation would imply
\[ \frac{1}{2} = P(Z = 1) = P(XZ = 1) = P(X = 1, Z = 1) = P(X = 1)\, P(Z = 1) = \frac{1}{4} \]
which is false.
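The counterexample in part 3. can be verified with exact rational arithmetic. The concrete distribution below ($P(X=1) = P(X=2) = \frac12$ and $P(Z = \pm 1) = \frac12$, independent) is one choice consistent with the hint:

```python
from fractions import Fraction as F
from itertools import product

# Joint distribution of independent X in {1, 2} and Z in {-1, 1}; Y = X*Z.
dist = {(x, z): F(1, 4) for x, z in product([1, 2], [-1, 1])}

def E(h):
    """Expectation of h(X, Z) under the joint distribution."""
    return sum(p * h(x, z) for (x, z), p in dist.items())

EX, EY = E(lambda x, z: x), E(lambda x, z: x * z)
cov = E(lambda x, z: x * (x * z)) - EX * EY
print(cov)               # 0: X and Y = XZ are uncorrelated

# ... yet dependent: P(X=1, Y=1) differs from P(X=1) P(Y=1).
pXY = sum(p for (x, z), p in dist.items() if x == 1 and x * z == 1)
pX = sum(p for (x, z), p in dist.items() if x == 1)
pY = sum(p for (x, z), p in dist.items() if x * z == 1)
print(pXY, pX * pY)      # 1/4 vs 1/8
```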
Exercise 10.3 (A correlation inequality). Suppose that $X$ is a random variable and $f, g : \mathbb{R} \to \mathbb{R}$ are two increasing functions such that both $f(X)$ and $g(X)$ are square integrable, i.e. $\mathbb{E}|f(X)|^2 + \mathbb{E}|g(X)|^2 < \infty$. Show $\operatorname{Cov}(f(X), g(X)) \ge 0$. Hint: let $Y$ be another random variable which has the same law as $X$ and is independent of $X$. Then consider
\[ \mathbb{E}[(f(Y) - f(X))(g(Y) - g(X))]. \]
Let us now specialize to the case where $S_i = \mathbb{R}^{m_i}$ and $\mathcal{S}_i = \mathcal{B}_{\mathbb{R}^{m_i}}$ for some $m_i \in \mathbb{N}$.
Theorem 10.11. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $m_j \in \mathbb{N}$, $S_j = \mathbb{R}^{m_j}$, $\mathcal{S}_j = \mathcal{B}_{\mathbb{R}^{m_j}}$, $Y_j : \Omega \to S_j$ be random vectors, and $\mu_j := \operatorname{Law}_P(Y_j) = P \circ Y_j^{-1} : \mathcal{S}_j \to [0, 1]$ for $1 \le j \le n$. Then the following are equivalent;

1. $\{Y_j\}_{j=1}^n$ are independent,
2. $\operatorname{Law}_P(Y_1, \dots, Y_n) = \mu_1 \otimes \mu_2 \otimes \dots \otimes \mu_n$,
3. for all bounded measurable functions, $f : (S_1 \times \dots \times S_n, \mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n) \to (\mathbb{R}, \mathcal{B}_\mathbb{R})$,
\[ \mathbb{E} f(Y_1, \dots, Y_n) = \int_{S_1 \times \dots \times S_n} f(x_1, \dots, x_n)\, d\mu_1(x_1) \dots d\mu_n(x_n), \tag{10.5} \]
(where the integrals may be taken in any order),
4. $\mathbb{E}\left[\prod_{j=1}^n f_j(Y_j)\right] = \prod_{j=1}^n \mathbb{E}[f_j(Y_j)]$ for all bounded (or non-negative) measurable functions, $f_j : S_j \to \mathbb{R}$ or $\mathbb{C}$,
5. $P\left(\cap_{j=1}^n \{Y_j \le y_j\}\right) = \prod_{j=1}^n P(Y_j \le y_j)$ for all $y_j \in S_j$, where we say that $Y_j \le y_j$ iff $(Y_j)_k \le (y_j)_k$ for $1 \le k \le m_j$,
6. $\mathbb{E}\left[\prod_{j=1}^n f_j(Y_j)\right] = \prod_{j=1}^n \mathbb{E}[f_j(Y_j)]$ for all $f_j \in C_c(S_j, \mathbb{R})$,
7. $\mathbb{E}\left[e^{i \sum_{j=1}^n \lambda_j \cdot Y_j}\right] = \prod_{j=1}^n \mathbb{E}\left[e^{i \lambda_j \cdot Y_j}\right]$ for all $\lambda_j \in S_j = \mathbb{R}^{m_j}$.
Proof. The equivalence of 1. – 4. has already been proved in Theorem 10.9. It is also clear that item 4. implies items 5. – 7. upon noting that item 5. may be written as,
\[ \mathbb{E}\Big[\prod_{j=1}^n 1_{(-\infty, y_j]}(Y_j)\Big] = \prod_{j=1}^n \mathbb{E}\left[1_{(-\infty, y_j]}(Y_j)\right] \]
where $(-\infty, y_j] := (-\infty, (y_j)_1] \times \dots \times (-\infty, (y_j)_{m_j}]$. The proofs that either 5. or 6. or 7. implies item 3. are a simple application of the multiplicative system theorem in the form of either Corollary 8.3 or Corollary 8.8. In each case, let $\mathbb{H}$ denote the linear space of bounded measurable functions such that Eq. (10.5) holds. To complete the proof I will simply give you the multiplicative system, $\mathbb{M}$, to use in each of the cases. To describe $\mathbb{M}$, let $N = m_1 + \dots + m_n$ and
\[ y = (y_1, \dots, y_n) = \left(y^1, y^2, \dots, y^N\right) \in \mathbb{R}^N \quad\text{and}\quad \lambda = (\lambda_1, \dots, \lambda_n) = \left(\lambda^1, \lambda^2, \dots, \lambda^N\right) \in \mathbb{R}^N. \]
For showing $5. \implies 3.$ take $\mathbb{M} = \left\{1_{(-\infty, y]} : y \in \mathbb{R}^N\right\}$.
For showing $6. \implies 3.$ take $\mathbb{M}$ to be those functions on $\mathbb{R}^N$ which are of the form, $f(y) = \prod_{l=1}^N f_l\left(y^l\right)$ with each $f_l \in C_c(\mathbb{R})$.

For showing $7. \implies 3.$ take $\mathbb{M}$ to be the functions of the form,
\[ f(y) = \exp\Big(i \sum_{j=1}^n \lambda_j \cdot y_j\Big) = \exp(i \lambda \cdot y). \]
Definition 10.12. A collection of subsets of $\mathcal{B}$, $\{\mathcal{C}_t\}_{t \in T}$, is said to be independent iff $\{\mathcal{C}_t\}_{t \in \Lambda}$ are independent for all finite subsets, $\Lambda \subset T$. More explicitly, we are requiring
\[ P\left(\cap_{t \in \Lambda} A_t\right) = \prod_{t \in \Lambda} P(A_t) \]
whenever $\Lambda$ is a finite subset of $T$ and $A_t \in \mathcal{C}_t$ for all $t \in \Lambda$.

Corollary 10.13. If $\{\mathcal{C}_t\}_{t \in T}$ is a collection of independent classes such that each $\mathcal{C}_t$ is a $\pi$-system, then $\{\sigma(\mathcal{C}_t)\}_{t \in T}$ are independent as well.
Definition 10.14. A collection of random variables, $\{X_t : t \in T\}$, is independent iff $\{\sigma(X_t) : t \in T\}$ are independent.

Example 10.15. Suppose that $\{\mu_n\}_{n=1}^\infty$ is any sequence of probability measures on $(\mathbb{R}, \mathcal{B}_\mathbb{R})$. Let $\Omega = \mathbb{R}^\mathbb{N}$, $\mathcal{B} := \otimes_{n=1}^\infty \mathcal{B}_\mathbb{R}$ be the product $\sigma$-algebra on $\Omega$, and $P := \otimes_{n=1}^\infty \mu_n$ be the product measure. Then the random variables, $\{Y_n\}_{n=1}^\infty$, defined by $Y_n(\omega) = \omega_n$ for all $\omega \in \Omega$ are independent with $\operatorname{Law}_P(Y_n) = \mu_n$ for each $n$.
Lemma 10.16 (Independence of groupings). Suppose that $\{\mathcal{B}_t : t \in T\}$ is an independent family of $\sigma$-fields. Suppose further that $\{T_s\}_{s \in S}$ is a partition of $T$ (i.e. $T = \sum_{s \in S} T_s$) and let
\[ \mathcal{B}_{T_s} = \vee_{t \in T_s} \mathcal{B}_t = \sigma\left(\cup_{t \in T_s} \mathcal{B}_t\right). \]
Then $\{\mathcal{B}_{T_s}\}_{s \in S}$ is again an independent family of $\sigma$-fields.

Proof. Let
\[ \mathcal{C}_s = \left\{\cap_{t \in K} B_t : B_t \in \mathcal{B}_t,\ K \subset\subset T_s\right\}, \]
where $K \subset\subset T_s$ means $K$ is a finite subset of $T_s$. It is now easily checked that $\mathcal{B}_{T_s} = \sigma(\mathcal{C}_s)$ and that $\{\mathcal{C}_s\}_{s \in S}$ is an independent family of $\pi$-systems. Therefore $\{\mathcal{B}_{T_s}\}_{s \in S}$ is an independent family of $\sigma$-algebras by Corollary 10.13.
Corollary 10.17. Suppose that $\{Y_n\}_{n=1}^\infty$ is a sequence of independent random variables (or vectors) and $\Lambda_1, \dots, \Lambda_m$ is a collection of pairwise disjoint subsets of $\mathbb{N}$. Further suppose that $f_i : \mathbb{R}^{\Lambda_i} \to \mathbb{R}$ is a measurable function for each $1 \le i \le m$, then $Z_i := f_i\left(\{Y_l\}_{l \in \Lambda_i}\right)$ is again a collection of independent random variables.

Proof. Notice that $\sigma(Z_i) \subset \sigma\left(\{Y_l\}_{l \in \Lambda_i}\right) = \sigma\left(\cup_{l \in \Lambda_i} \sigma(Y_l)\right)$. Since $\{\sigma(Y_l)\}_{l=1}^\infty$ are independent by assumption, it follows from Lemma 10.16 that $\left\{\sigma\left(\{Y_l\}_{l \in \Lambda_i}\right)\right\}_{i=1}^m$ are independent and therefore so is $\{\sigma(Z_i)\}_{i=1}^m$, i.e. $\{Z_i\}_{i=1}^m$ are independent.
Definition 10.18 (i.i.d.). A sequence of random variables, $\{X_n\}_{n=1}^\infty$, on a probability space, $(\Omega, \mathcal{B}, P)$, is i.i.d. (= independent and identically distributed) if the $X_n$ are independent and $(X_n)_* P = (X_k)_* P$ for all $k, n$. That is we should have
\[ P(X_n \in A) = P(X_k \in A) \text{ for all } k, n \in \mathbb{N} \text{ and } A \in \mathcal{B}_\mathbb{R}. \]
Observe that $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables iff
\[ P(X_1 \in A_1, \dots, X_n \in A_n) = \prod_{j=1}^n P(X_j \in A_j) = \prod_{j=1}^n P(X_1 \in A_j) = \prod_{j=1}^n \mu(A_j) \tag{10.6} \]
where $\mu = (X_1)_* P$. The identity in Eq. (10.6) is to hold for all $n \in \mathbb{N}$ and all $A_j \in \mathcal{B}_\mathbb{R}$. If we choose $\mu_n = \mu$ in Example 10.15, the $\{Y_n\}_{n=1}^\infty$ there are i.i.d. with $\operatorname{Law}_P(Y_n) = P \circ Y_n^{-1} = \mu$ for all $n \in \mathbb{N}$.
The following theorem follows immediately from the definitions and Theorem 10.11.

Theorem 10.19. Let $X := \{X_t : t \in T\}$ be a collection of random variables. Then the following are equivalent:

1. The collection $X$ is independent,
2. \[ P\left(\cap_{t \in \Lambda} \{X_t \in A_t\}\right) = \prod_{t \in \Lambda} P(X_t \in A_t) \]
for all finite subsets, $\Lambda \subset T$, and all $\{A_t\}_{t \in \Lambda} \subset \mathcal{B}_\mathbb{R}$.
3. \[ P\left(\cap_{t \in \Lambda} \{X_t \le x_t\}\right) = \prod_{t \in \Lambda} P(X_t \le x_t) \]
for all finite subsets, $\Lambda \subset T$, and all $\{x_t\}_{t \in \Lambda} \subset \mathbb{R}$.
4. For all finite $\Lambda \subset T$ and $f_t : \mathbb{R}^n \to \mathbb{R}$ which are bounded and measurable for all $t \in \Lambda$,
\[ \mathbb{E}\Big[\prod_{t \in \Lambda} f_t(X_t)\Big] = \prod_{t \in \Lambda} \mathbb{E} f_t(X_t) = \int_{(\mathbb{R}^n)^\Lambda} \prod_{t \in \Lambda} f_t(x_t) \prod_{t \in \Lambda} d\mu_t(x_t). \]
5. $\mathbb{E}\left[\prod_{t \in \Lambda} e^{i \lambda_t \cdot X_t}\right] = \prod_{t \in \Lambda} \hat{\mu}_t(\lambda_t)$.
6. For all finite $\Lambda \subset T$ and bounded measurable $f : (\mathbb{R}^n)^\Lambda \to \mathbb{R}$,
\[ \mathbb{E}[f(X_\Lambda)] = \int_{(\mathbb{R}^n)^\Lambda} f(x) \prod_{t \in \Lambda} d\mu_t(x_t). \]
7. For all finite $\Lambda \subset T$, $\operatorname{Law}_P(X_\Lambda) = \otimes_{t \in \Lambda} \mu_t$.
8. $\operatorname{Law}_P(X) = \otimes_{t \in T} \mu_t$.

Moreover, if $\mathcal{B}_t$ is a sub-$\sigma$-algebra of $\mathcal{B}$ for $t \in T$, then $\{\mathcal{B}_t\}_{t \in T}$ are independent iff for all finite $\Lambda \subset T$,
\[ \mathbb{E}\Big[\prod_{t \in \Lambda} X_t\Big] = \prod_{t \in \Lambda} \mathbb{E} X_t \text{ for all } X_t \in L^\infty(\Omega, \mathcal{B}_t, P). \]

Proof. The equivalence of 1. and 2. follows almost immediately from the definition of independence and the fact that $\sigma(X_t) = \{\{X_t \in A\} : A \in \mathcal{B}_\mathbb{R}\}$. Clearly 2. implies 3. holds. Finally, 3. implies 2. is an application of Corollary 10.13 with $\mathcal{C}_t := \{\{X_t \le a\} : a \in \mathbb{R}\}$ and making use of the observations that $\mathcal{C}_t$ is a $\pi$-system for all $t$ and that $\sigma(\mathcal{C}_t) = \sigma(X_t)$. The remaining equivalences are also easy to check.
Definition 10.20 (Conditional Independence). Let $(\Omega, \mathcal{B}, P)$ be a probability space and $\mathcal{B}_i \subset \mathcal{B}$ be a sub-sigma-algebra of $\mathcal{B}$ for $i = 1, 2, 3$. We say that $\mathcal{B}_1$ is independent of $\mathcal{B}_3$ conditioned on $\mathcal{B}_2$ (written $\mathcal{B}_1 \perp\!\!\!\perp_{\mathcal{B}_2} \mathcal{B}_3$) provided,
\[ P(A \cap B | \mathcal{B}_2) = P(A | \mathcal{B}_2)\, P(B | \mathcal{B}_2) \text{ a.s.} \]
for all $A \in \mathcal{B}_1$ and $B \in \mathcal{B}_3$. This can be equivalently stated as
\[ \mathbb{E}(f g | \mathcal{B}_2) = \mathbb{E}(f | \mathcal{B}_2)\, \mathbb{E}(g | \mathcal{B}_2) \text{ a.s.} \]
for all $f \in (\mathcal{B}_1)_b$ and $g \in (\mathcal{B}_3)_b$, where $\mathcal{B}_b$ denotes the bounded $\mathcal{B}$-measurable functions. If $X, Y, Z$ are measurable functions on $(\Omega, \mathcal{B})$, we say that $X$ is independent of $Z$ conditioned on $Y$ (written as $X \perp\!\!\!\perp_Y Z$) provided $\sigma(X) \perp\!\!\!\perp_{\sigma(Y)} \sigma(Z)$.
Example 10.21. Let $X$ and $Y$ be two i.i.d. random variables such that $P(X = 1) = 1/2 = P(Y = 1)$ and $P(X = 2) = 1/2 = P(Y = 2)$. Then
\[ \mathbb{E}[Y | X = Y] = \mathbb{E}[X | X = Y] = \frac{\frac{1}{4}(1 + 2)}{\frac{1}{4} + \frac{1}{4}} = \frac{3}{2} \]
and
\[ \mathbb{E}[XY | X = Y] = \frac{\frac{1}{4}(1 + 4)}{\frac{1}{4} + \frac{1}{4}} = \frac{5}{2}. \]
Notice that
\[ \mathbb{E}[XY | X = Y] = \frac{5}{2} \ne \frac{9}{4} = \mathbb{E}[Y | X = Y]\, \mathbb{E}[X | X = Y]. \]
So independence does not necessarily imply conditional independence!

See Exercise 10.6 and Theorem 17.4 for a couple more examples involving conditional independence.
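The conditional expectations above can be double-checked with exact rational arithmetic (a small sketch; conditioning on the event $\{X = Y\}$ is done by restricting and renormalizing):

```python
from fractions import Fraction as F
from itertools import product

# X, Y i.i.d. uniform on {1, 2}; every pair has probability 1/4.
pairs = list(product([1, 2], repeat=2))
p = F(1, 4)

pA = sum(p for x, y in pairs if x == y)                    # P(X = Y) = 1/2
EY_A = sum(p * y for x, y in pairs if x == y) / pA         # E[Y | X = Y]
EXY_A = sum(p * x * y for x, y in pairs if x == y) / pA    # E[XY | X = Y]
print(EY_A, EXY_A)       # 3/2 and 5/2
print(EXY_A == EY_A * EY_A)   # False: 5/2 != 9/4
```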
Exercise 10.4. Suppose $\mathbb{M}_i \subset (\mathcal{B}_i)_b$ for $i = 1$ and $i = 3$ are multiplicative systems such that $\mathcal{B}_i = \sigma(\mathbb{M}_i)$. Show $\mathcal{B}_1 \perp\!\!\!\perp_{\mathcal{B}_2} \mathcal{B}_3$ iff
\[ \mathbb{E}(f g | \mathcal{B}_2) = \mathbb{E}(f | \mathcal{B}_2)\, \mathbb{E}(g | \mathcal{B}_2) \text{ a.s. for all } f \in \mathbb{M}_1 \text{ and } g \in \mathbb{M}_3. \tag{10.7} \]
Hint: Do this by two applications of the functional form of the multiplicative systems theorem, see Theorems 8.16 and 8.5 of Chapter 8. For the first application, fix an $f \in \mathbb{M}_1$ and let
\[ \mathbb{H} := \{g \in (\mathcal{B}_3)_b : \mathbb{E}(f g | \mathcal{B}_2) = \mathbb{E}(f | \mathcal{B}_2)\, \mathbb{E}(g | \mathcal{B}_2) \text{ a.s.}\}. \]
(See the proof of Theorem 17.4 if you get stuck.)
10.2 Examples of Independence
10.2.1 An Example of Ranks
Lemma 10.22 (No Ties). Suppose that $X$ and $Y$ are independent random variables on a probability space $(\Omega, \mathcal{B}, P)$. If $F(x) := P(X \le x)$ is continuous, then $P(X = Y) = 0$.

Proof. Let $\mu(A) := P(X \in A)$ and $\nu(A) := P(Y \in A)$. Because $F$ is continuous, $\mu(\{y\}) = F(y) - F(y-) = 0$, and hence
\begin{align*}
P(X = Y) &= \mathbb{E}\left[1_{\{X = Y\}}\right] = \int_{\mathbb{R}^2} 1_{\{x = y\}}\, d(\mu \otimes \nu)(x, y) \\
&= \int_\mathbb{R} d\nu(y) \int_\mathbb{R} d\mu(x)\, 1_{\{x = y\}} = \int_\mathbb{R} \mu(\{y\})\, d\nu(y) \\
&= \int_\mathbb{R} 0\, d\nu(y) = 0.
\end{align*}
Second Proof. For sake of comparison, let's give a proof where we do not allow ourselves to use Fubini's theorem. To this end let $\left\{a_l := \frac{l}{N}\right\}_{l=-\infty}^\infty$ (or for the moment any sequence such that $a_l < a_{l+1}$ for all $l \in \mathbb{Z}$ and $\lim_{l \to \pm\infty} a_l = \pm\infty$). Then
\[ \{(x, x) : x \in \mathbb{R}\} \subset \cup_{l \in \mathbb{Z}} \left[(a_l, a_{l+1}] \times (a_l, a_{l+1}]\right] \]
and therefore,
\begin{align*}
P(X = Y) &\le \sum_{l \in \mathbb{Z}} P(X \in (a_l, a_{l+1}],\ Y \in (a_l, a_{l+1}]) = \sum_{l \in \mathbb{Z}} [F(a_{l+1}) - F(a_l)]^2 \\
&\le \sup_{l \in \mathbb{Z}} [F(a_{l+1}) - F(a_l)] \cdot \sum_{l \in \mathbb{Z}} [F(a_{l+1}) - F(a_l)] = \sup_{l \in \mathbb{Z}} [F(a_{l+1}) - F(a_l)].
\end{align*}
Since $F$ is continuous and $F(+\infty) = 1$ and $F(-\infty) = 0$, it is easily seen that $F$ is uniformly continuous on $\mathbb{R}$. Therefore, if we choose $a_l = \frac{l}{N}$, we have
\[ P(X = Y) \le \limsup_{N \to \infty} \sup_{l \in \mathbb{Z}} \left[F\left(\frac{l+1}{N}\right) - F\left(\frac{l}{N}\right)\right] = 0. \]
Let $\{X_n\}_{n=1}^\infty$ be i.i.d. with common continuous distribution function, $F$. So by Lemma 10.22 we know that
\[ P(X_i = X_j) = 0 \text{ for all } i \ne j. \]
Let $R_n$ denote the rank of $X_n$ in the list $(X_1, \dots, X_n)$, i.e.
\[ R_n := \sum_{j=1}^n 1_{X_j \ge X_n} = \#\{j \le n : X_j \ge X_n\}. \]
Thus $R_n = k$ if $X_n$ is the $k^{\text{th}}$ largest element in the list, $(X_1, \dots, X_n)$. For example if
\[ (X_1, X_2, X_3, X_4, X_5, X_6, X_7, \dots) = (9, -8, 3, 7, 23, 0, -11, \dots), \]
we have $R_1 = 1$, $R_2 = 2$, $R_3 = 2$, $R_4 = 2$, $R_5 = 1$, $R_6 = 5$, and $R_7 = 7$. Observe that the rank order, from lowest to highest, of $(X_1, X_2, X_3, X_4, X_5)$ is $(X_2, X_3, X_4, X_1, X_5)$. This can be determined by the values of $R_i$ for $i = 1, 2, \dots, 5$ as follows. Since $R_5 = 1$, we must have $X_5$ in the last slot, i.e. $(*, *, *, *, X_5)$. Since $R_4 = 2$, we know out of the remaining slots, $X_4$ must be in the second from the far most right, i.e. $(*, *, X_4, *, X_5)$. Since $R_3 = 2$, we know that $X_3$ is again the second from the right of the remaining slots, i.e. we now know, $(*, X_3, X_4, *, X_5)$. Similarly, $R_2 = 2$ implies $(X_2, X_3, X_4, *, X_5)$ and finally $R_1 = 1$ gives, $(X_2, X_3, X_4, X_1, X_5)$ ($= (-8, 3, 7, 9, 23)$ in the example). As another example, if $R_i = i$ for $i = 1, 2, \dots, n$, then $X_n < X_{n-1} < \dots < X_1$.
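The bookkeeping in this example is easy to mechanize; a short sketch (the helper name `ranks` is ours) reproduces the rank sequence from the sample values above:

```python
def ranks(xs):
    """R_n = #{ j <= n : x_j >= x_n }, the rank of x_n among the first n values."""
    return [sum(x >= xs[n] for x in xs[: n + 1]) for n in range(len(xs))]

data = [9, -8, 3, 7, 23, 0, -11]
print(ranks(data))                  # [1, 2, 2, 2, 1, 5, 7]
print(sorted(data[:5]))             # [-8, 3, 7, 9, 23]
```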
Theorem 10.23 (Renyi Theorem). Let $\{X_n\}_{n=1}^\infty$ be i.i.d. and assume that $F(x) := P(X_n \le x)$ is continuous. Then $\{R_n\}_{n=1}^\infty$ is an independent sequence,
\[ P(R_n = k) = \frac{1}{n} \text{ for } k = 1, 2, \dots, n, \]
and the events, $A_n = \{X_n \text{ is a record}\} = \{R_n = 1\}$, are independent as $n$ varies and
\[ P(A_n) = P(R_n = 1) = \frac{1}{n}. \]
Proof. By Problem 6 on p. 110 of Resnick or by Fubini's theorem, $(X_1, \dots, X_n)$ and $(X_{\sigma 1}, \dots, X_{\sigma n})$ have the same distribution for any permutation $\sigma$.

Since $F$ is continuous, it now follows that up to a set of measure zero,
\[ \Omega = \sum_\sigma \{X_{\sigma 1} < X_{\sigma 2} < \dots < X_{\sigma n}\} \]
and therefore
\[ 1 = P(\Omega) = \sum_\sigma P(X_{\sigma 1} < X_{\sigma 2} < \dots < X_{\sigma n}). \]
Since $P(X_{\sigma 1} < X_{\sigma 2} < \dots < X_{\sigma n})$ is independent of $\sigma$ we may now conclude that
\[ P(X_{\sigma 1} < X_{\sigma 2} < \dots < X_{\sigma n}) = \frac{1}{n!} \]
for all $\sigma$. As observed before the statement of the theorem, to each realization $(\varepsilon_1, \dots, \varepsilon_n)$ (here $\varepsilon_i \in \mathbb{N}$ with $\varepsilon_i \le i$) of $(R_1, \dots, R_n)$ there is a uniquely determined permutation, $\sigma = \sigma(\varepsilon_1, \dots, \varepsilon_n)$, such that $X_{\sigma 1} < X_{\sigma 2} < \dots < X_{\sigma n}$. (Notice that there are $n!$ permutations of $\{1, 2, \dots, n\}$ and there are also $n!$ choices for the $(\varepsilon_1, \dots, \varepsilon_n)$ with $1 \le \varepsilon_i \le i$.) From this it follows that
\[ \{(R_1, \dots, R_n) = (\varepsilon_1, \dots, \varepsilon_n)\} = \{X_{\sigma 1} < X_{\sigma 2} < \dots < X_{\sigma n}\} \]
and therefore,
\[ P((R_1, \dots, R_n) = (\varepsilon_1, \dots, \varepsilon_n)) = P(X_{\sigma 1} < X_{\sigma 2} < \dots < X_{\sigma n}) = \frac{1}{n!}. \]
Since
\begin{align*}
P(R_n = \varepsilon_n) &= \sum_{(\varepsilon_1, \dots, \varepsilon_{n-1})} P((R_1, \dots, R_n) = (\varepsilon_1, \dots, \varepsilon_n)) \\
&= \sum_{(\varepsilon_1, \dots, \varepsilon_{n-1})} \frac{1}{n!} = (n-1)! \cdot \frac{1}{n!} = \frac{1}{n},
\end{align*}
we have shown that
\[ P((R_1, \dots, R_n) = (\varepsilon_1, \dots, \varepsilon_n)) = \frac{1}{n!} = \prod_{j=1}^n \frac{1}{j} = \prod_{j=1}^n P(R_j = \varepsilon_j). \]
10.3 Gaussian Random Vectors
As you saw in Exercise 10.2, uncorrelated random variables are typically not independent. However, if the random variables involved are jointly Gaussian (see Definition 9.34), then independence and being uncorrelated are actually the same thing!

Lemma 10.24. Suppose that $Z = (X, Y)^{\mathrm{tr}}$ is a Gaussian random vector with $X \in \mathbb{R}^k$ and $Y \in \mathbb{R}^l$. Then $X$ is independent of $Y$ iff $\operatorname{Cov}(X_i, Y_j) = 0$ for all $1 \le i \le k$ and $1 \le j \le l$. This lemma also holds more generally. Namely if $\left\{X^l\right\}_{l=1}^n$ is a sequence of random vectors such that $\left(X^1, \dots, X^n\right)$ is a Gaussian random vector, then $\left\{X^l\right\}_{l=1}^n$ are independent iff $\operatorname{Cov}\left(X_i^l, X_k^{l'}\right) = 0$ for all $l \ne l'$ and $i$ and $k$.
Proof. We know by Exercise 10.2 that if $X_i$ and $Y_j$ are independent, then $\operatorname{Cov}(X_i, Y_j) = 0$. For the converse direction, if $\operatorname{Cov}(X_i, Y_j) = 0$ for all $1 \le i \le k$ and $1 \le j \le l$ and $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^l$, then
\[ \operatorname{Var}(x \cdot X + y \cdot Y) = \operatorname{Var}(x \cdot X) + \operatorname{Var}(y \cdot Y) + 2 \operatorname{Cov}(x \cdot X, y \cdot Y) = \operatorname{Var}(x \cdot X) + \operatorname{Var}(y \cdot Y). \]
Therefore using the fact that $(X, Y)$ is a Gaussian random vector,
\begin{align*}
\mathbb{E}\left[e^{i x \cdot X} e^{i y \cdot Y}\right] &= \mathbb{E}\left[e^{i(x \cdot X + y \cdot Y)}\right] = \exp\left(-\tfrac{1}{2} \operatorname{Var}(x \cdot X + y \cdot Y) + i \mathbb{E}(x \cdot X + y \cdot Y)\right) \\
&= \exp\left(-\tfrac{1}{2} \operatorname{Var}(x \cdot X) + i \mathbb{E}(x \cdot X) - \tfrac{1}{2} \operatorname{Var}(y \cdot Y) + i \mathbb{E}(y \cdot Y)\right) \\
&= \mathbb{E}\left[e^{i x \cdot X}\right] \mathbb{E}\left[e^{i y \cdot Y}\right],
\end{align*}
and because $x$ and $y$ were arbitrary, we may conclude from Theorem 10.11 that $X$ and $Y$ are independent.
Corollary 10.25. Suppose that $X : \Omega \to \mathbb{R}^k$ and $Y : \Omega \to \mathbb{R}^l$ are two independent Gaussian random vectors, then $(X, Y)$ is also a Gaussian random vector. This corollary generalizes to multiple independent Gaussian random vectors.

Proof. Let $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^l$, then
\begin{align*}
\mathbb{E}\left[e^{i(x,y) \cdot (X,Y)}\right] &= \mathbb{E}\left[e^{i(x \cdot X + y \cdot Y)}\right] = \mathbb{E}\left[e^{i x \cdot X} e^{i y \cdot Y}\right] = \mathbb{E}\left[e^{i x \cdot X}\right] \mathbb{E}\left[e^{i y \cdot Y}\right] \\
&= \exp\left(-\tfrac{1}{2} \operatorname{Var}(x \cdot X) + i \mathbb{E}(x \cdot X)\right) \exp\left(-\tfrac{1}{2} \operatorname{Var}(y \cdot Y) + i \mathbb{E}(y \cdot Y)\right) \\
&= \exp\left(-\tfrac{1}{2} \operatorname{Var}(x \cdot X) + i \mathbb{E}(x \cdot X) - \tfrac{1}{2} \operatorname{Var}(y \cdot Y) + i \mathbb{E}(y \cdot Y)\right) \\
&= \exp\left(-\tfrac{1}{2} \operatorname{Var}(x \cdot X + y \cdot Y) + i \mathbb{E}(x \cdot X + y \cdot Y)\right)
\end{align*}
which shows that $(X, Y)$ is again Gaussian.
Notation 10.26 Suppose that $\{X_i\}_{i=1}^n$ is a collection of $\mathbb{R}$-valued variables or $\mathbb{R}^d$-valued random vectors. We will write $X_1 \overset{\perp}{+} X_2 \overset{\perp}{+} \dots \overset{\perp}{+} X_n$ for $X_1 + \dots + X_n$ under the additional assumption that the $\{X_i\}_{i=1}^n$ are independent.
Corollary 10.27. Suppose that $\{X_i\}_{i=1}^n$ are independent Gaussian random variables, then $S_n := \sum_{i=1}^n X_i$ is a Gaussian random variable with:
\[ \operatorname{Var}(S_n) = \sum_{i=1}^n \operatorname{Var}(X_i) \quad\text{and}\quad \mathbb{E} S_n = \sum_{i=1}^n \mathbb{E} X_i, \tag{10.8} \]
i.e.
\[ X_1 \overset{\perp}{+} X_2 \overset{\perp}{+} \dots \overset{\perp}{+} X_n \overset{d}{=} N\Big(\sum_{i=1}^n \operatorname{Var}(X_i),\ \sum_{i=1}^n \mathbb{E} X_i\Big). \]
In particular if $\{X_i\}_{i=1}^\infty$ are i.i.d. Gaussian random variables with $\mathbb{E} X_i = \mu$ and $\sigma^2 = \operatorname{Var}(X_i)$, then
\[ \frac{S_n}{n} - \mu \overset{d}{=} N\Big(0, \frac{\sigma^2}{n}\Big) \quad\text{and} \tag{10.9} \]
\[ \frac{S_n - n\mu}{\sigma \sqrt{n}} \overset{d}{=} N(0, 1). \tag{10.10} \]
Equation (10.10) is a very special case of the central limit theorem while Eq. (10.9) leads to a very special case of the strong law of large numbers, see Corollary 10.28.

Proof. The fact that $S_n$, $\frac{S_n}{n}$, and $\frac{S_n - n\mu}{\sigma\sqrt{n}}$ are all Gaussian follows from Corollary 10.25 and Lemma 9.36 or by direct calculation. The formulas for the variances and means of these random variables are routine to compute.

Recall the first Borel–Cantelli Lemma 7.14 states that if $\{A_n\}_{n=1}^\infty$ are measurable sets, then
\[ \sum_{n=1}^\infty P(A_n) < \infty \implies P(\{A_n \text{ i.o.}\}) = 0. \tag{10.11} \]
Corollary 10.28. Let $\{X_i\}_{i=1}^\infty$ be i.i.d. Gaussian random variables with $\mathbb{E} X_i = \mu$ and $\sigma^2 = \operatorname{Var}(X_i)$. Then $\lim_{n\to\infty} \frac{S_n}{n} = \mu$ a.s. and moreover for every $\alpha < \frac{1}{2}$, there exists $N_\alpha : \Omega \to \mathbb{N} \cup \{\infty\}$, such that $P(N_\alpha = \infty) = 0$ and
\[ \Big|\frac{S_n}{n} - \mu\Big| \le n^{-\alpha} \text{ for } n \ge N_\alpha. \]
In particular, $\lim_{n\to\infty} \frac{S_n}{n} = \mu$ a.s.

Proof. Let $Z \overset{d}{=} N(0, 1)$ so that $\frac{\sigma}{\sqrt{n}} Z \overset{d}{=} N\big(0, \frac{\sigma^2}{n}\big)$. From Eq. (10.9) and Eq. (7.42),
\[ P\Big(\Big|\frac{S_n}{n} - \mu\Big| \ge \varepsilon\Big) = P\Big(\Big|\frac{\sigma}{\sqrt{n}} Z\Big| \ge \varepsilon\Big) = P\Big(|Z| \ge \frac{\sqrt{n}}{\sigma}\varepsilon\Big) \le \exp\Big(-\frac{1}{2}\Big(\frac{\sqrt{n}\,\varepsilon}{\sigma}\Big)^2\Big) = \exp\Big(-\frac{\varepsilon^2}{2\sigma^2} n\Big). \]
Taking $\varepsilon = n^{-\alpha}$ with $1 - 2\alpha > 0$, it follows that
\[ \sum_{n=1}^\infty P\Big(\Big|\frac{S_n}{n} - \mu\Big| \ge n^{-\alpha}\Big) \le \sum_{n=1}^\infty \exp\Big(-\frac{1}{2\sigma^2} n^{1 - 2\alpha}\Big) < \infty \]
and so by the first Borel–Cantelli lemma,
\[ P\Big(\Big\{\Big|\frac{S_n}{n} - \mu\Big| \ge n^{-\alpha}\Big\} \text{ i.o.}\Big) = 0. \]
Therefore, $P$-a.s., $\big|\frac{S_n}{n} - \mu\big| \le n^{-\alpha}$ a.a., and in particular $\lim_{n\to\infty} \frac{S_n}{n} = \mu$ a.s.
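Equation (10.9) is easy to see in simulation: over many independent replications, the sample averages $S_n/n$ concentrate around $\mu$ with spread $\sigma/\sqrt{n}$. The parameters below ($\mu = 1$, $\sigma = 2$, $n$, the number of replications, and the seed) are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 1.0, 2.0, 10_000, 2_000

X = rng.normal(mu, sigma, size=(reps, n))
Sn_over_n = X.mean(axis=1)            # reps independent copies of S_n / n

# S_n/n should be Gaussian with mean mu and std sigma/sqrt(n) = 0.02:
print(Sn_over_n.mean(), Sn_over_n.std())
```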
Theorem 10.29. Suppose that $Z = (X, Y)^{\mathrm{tr}}$ is a mean zero Gaussian random vector with $X \in \mathbb{R}^k$ and $Y \in \mathbb{R}^l$. Let $Q = Q_X := \mathbb{E}\left[XX^{\mathrm{tr}}\right]$ and then let
\[ W := Y - \mathbb{E}\left[YX^{\mathrm{tr}}\right] Q^{-1} X \]
where $Q^{-1} = \left(Q|_{\operatorname{Ran}(Q)}\right)^{-1} P$ is as in Example 9.43. Then $(X, W)^{\mathrm{tr}}$ is again a Gaussian random vector and moreover $W$ is independent of $X$. The covariance matrix for $W$ is
\[ \mathbb{E}\left[WW^{\mathrm{tr}}\right] = \mathbb{E}\left[YY^{\mathrm{tr}}\right] - \mathbb{E}\left[YX^{\mathrm{tr}}\right] Q^{-1} \mathbb{E}\left[XY^{\mathrm{tr}}\right]. \tag{10.12} \]
Proof. Let $\Lambda$ be any $l \times k$ matrix and let $W := Y - \Lambda X$. Since
\[ \begin{pmatrix} X \\ W \end{pmatrix} = \begin{pmatrix} I & 0 \\ -\Lambda & I \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}, \]
according to Lemma 9.36 $(X, W)^{\mathrm{tr}}$ is still Gaussian. So according to Lemma 10.24, in order to make $W$ independent of $X$ it suffices to choose $\Lambda$ so that $W$ and $X$ are uncorrelated, i.e.
\[ 0 = \operatorname{Cov}(W_j, X_i) = \operatorname{Cov}\Big(Y_j - \sum_k \Lambda_{jk} X_k,\ X_i\Big) = \mathbb{E}[Y_j X_i] - \sum_k \Lambda_{jk}\, \mathbb{E}(X_k X_i). \]
In matrix notation, we want to choose $\Lambda$ so that
\[ \mathbb{E}\left[YX^{\mathrm{tr}}\right] = \Lambda\, \mathbb{E}\left[XX^{\mathrm{tr}}\right]. \tag{10.13} \]
In the case $Q := \mathbb{E}\left[XX^{\mathrm{tr}}\right]$ is non-degenerate, we see that $\Lambda := \mathbb{E}\left[YX^{\mathrm{tr}}\right] Q^{-1}$ is the desired solution. In fact this works for general $Q$ where $Q^{-1}$ is defined in Example 9.43. To see this is correct, recall
\[ v \cdot Qv = v \cdot \mathbb{E}\left[XX^{\mathrm{tr}} v\right] = \mathbb{E}\left[(v \cdot X)^2\right] \]
from which it follows that
\[ \operatorname{Nul}(Q) = \left\{v \in \mathbb{R}^k : v \cdot X = 0 \text{ a.s.}\right\}. \]
Hence it follows that
\[ \mathbb{E}\left[YX^{\mathrm{tr}}\right] v = \Lambda\, \mathbb{E}\left[XX^{\mathrm{tr}}\right] v \text{ for all } v \in \operatorname{Nul}(Q) \]
no matter how $\Lambda$ is chosen. On the other hand if $v \in \operatorname{Ran}(Q) = \operatorname{Nul}(Q)^\perp$,
154 10 Independence
$$\Lambda\, E\big[XX^{\operatorname{tr}}\big]\, v = E\big[YX^{\operatorname{tr}}\big]\, Q^{-1} Q v = E\big[YX^{\operatorname{tr}}\big]\, v$$

as desired.
To prove Eq. (10.12) let $B := E[YX^{\operatorname{tr}}]$ so that $W := Y - BQ^{-1}X$. We then have

$$\begin{aligned}
E\big[WW^{\operatorname{tr}}\big] &= E\big[\big(Y - BQ^{-1}X\big)\big(Y - BQ^{-1}X\big)^{\operatorname{tr}}\big] \\
&= E\big[\big(Y - BQ^{-1}X\big)\big(Y^{\operatorname{tr}} - X^{\operatorname{tr}} Q^{-1} B^{\operatorname{tr}}\big)\big] \\
&= E\big[YY^{\operatorname{tr}} - YX^{\operatorname{tr}} Q^{-1}B^{\operatorname{tr}} - BQ^{-1}XY^{\operatorname{tr}} + BQ^{-1}XX^{\operatorname{tr}} Q^{-1}B^{\operatorname{tr}}\big] \\
&= E\big[YY^{\operatorname{tr}}\big] - BQ^{-1}B^{\operatorname{tr}} - BQ^{-1}B^{\operatorname{tr}} + BQ^{-1}QQ^{-1}B^{\operatorname{tr}} \\
&= E\big[YY^{\operatorname{tr}}\big] - BQ^{-1}B^{\operatorname{tr}} = E\big[YY^{\operatorname{tr}}\big] - E\big[YX^{\operatorname{tr}}\big]\, Q^{-1} E\big[XY^{\operatorname{tr}}\big].
\end{aligned}$$
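In the scalar case ($k = l = 1$) the theorem says: for a mean zero jointly Gaussian pair $(X, Y)$ with $Q = E[X^2] > 0$, the residual $W = Y - (E[YX]/Q)\,X$ is uncorrelated with (hence, being jointly Gaussian, independent of) $X$, and $E[W^2] = E[Y^2] - E[YX]^2/Q$. The following Monte Carlo sketch is ours, not part of the notes; the sample size, seed, and the parameters `rho`, `s` are arbitrary choices.

```python
import random

random.seed(0)

# Build a mean zero Gaussian pair (X, Y) = (Z1, rho*Z1 + s*Z2) with Z1, Z2
# independent standard normals, so Q = E[X^2] = 1, E[YX] = rho, E[Y^2] = rho^2 + s^2.
rho, s = 0.8, 0.5
samples = [(z1, rho * z1 + s * z2)
           for z1, z2 in ((random.gauss(0, 1), random.gauss(0, 1))
                          for _ in range(200_000))]

Q = sum(x * x for x, _ in samples) / len(samples)      # ~ 1
EYX = sum(x * y for x, y in samples) / len(samples)    # ~ rho
Lam = EYX / Q                                          # the Lambda of the proof

# The residual W = Y - Lam*X is uncorrelated with X by construction of Lam,
# and its variance should be E[Y^2] - E[YX]^2/Q = s^2.
W = [y - Lam * x for x, y in samples]
cov_WX = sum(w * x for w, (x, _) in zip(W, samples)) / len(samples)
var_W = sum(w * w for w in W) / len(W)

print(cov_WX)  # ~ 0 (exactly, up to floating point, since Lam is the sample regressor)
print(var_W)   # ~ s**2 = 0.25
```

Note that the sample covariance of $W$ and $X$ vanishes identically (not just approximately) because `Lam` is computed from the same sample, mirroring how Eq. (10.13) forces the population covariance to vanish.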
Corollary 10.30. Suppose that $Z = (X, Y)^{\operatorname{tr}}$ is a mean zero Gaussian random vector with $X \in \mathbb{R}^k$ and $Y \in \mathbb{R}^l$,

$$A := E\big[YX^{\operatorname{tr}}\big]\, Q^{-1}, \qquad Q_W := E\big[YY^{\operatorname{tr}}\big] - E\big[YX^{\operatorname{tr}}\big]\, Q^{-1} E\big[XY^{\operatorname{tr}}\big],$$

and suppose $W \overset{d}{=} N(Q_W, 0)$. If $f : \mathbb{R}^k \times \mathbb{R}^l \to \mathbb{R}$ is a bounded measurable function, then

$$E[f(X, Y)\,|\,X] = E[f(x, Ax + W)]\big|_{x = X}.$$

As an important special case, if $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^l$, then

$$E\big[e^{i(x\cdot X + y\cdot Y)}\,|\,X\big] = e^{i(x\cdot X + y\cdot AX)}\, e^{-\frac{1}{2}\operatorname{Var}(y\cdot W)} = e^{i(x\cdot X + y\cdot AX)}\, e^{-\frac{1}{2} Q_W y\cdot y}. \quad (10.14)$$
Proof. Using the notation in Theorem 10.29,

$$E[f(X, Y)\,|\,X] = E[f(X, AX + W)\,|\,X]$$

where $W \overset{d}{=} N(Q_W, 0)$ and $W$ is independent of $X$. The result now follows by an application of Exercise 14.4. Let us now specialize to the case where $f(X, Y) = e^{i(x\cdot X + y\cdot Y)}$, in which case

$$\begin{aligned}
E\big[e^{i(x\cdot X + y\cdot Y)}\,|\,X\big] &= E\big[e^{i(x\cdot x' + y\cdot(Ax' + W))}\big]\big|_{x' = X} \\
&= e^{i(x\cdot X + y\cdot AX)}\, E\big[e^{i y\cdot W}\big] = e^{i(x\cdot X + y\cdot AX)}\, e^{-\frac{1}{2}\operatorname{Var}(y\cdot W)} \\
&= e^{i(x\cdot X + y\cdot AX)}\, e^{-\frac{1}{2} Q_W y\cdot y}.
\end{aligned}$$
Exercise 10.5. Suppose now that $(X, Y, Z)^{\operatorname{tr}}$ is a mean zero Gaussian random vector with $X \in \mathbb{R}^k$, $Y \in \mathbb{R}^l$, and $Z \in \mathbb{R}^m$. Show for all $y \in \mathbb{R}^l$ and $z \in \mathbb{R}^m$ that

$$E[\exp(i(y\cdot Y + z\cdot Z))\,|\,X] = \exp(-\operatorname{Cov}(y\cdot W_1, z\cdot W_2))\, E[\exp(i y\cdot Y)\,|\,X]\, E[\exp(i z\cdot Z)\,|\,X].$$

In performing these computations please use the following definitions,

$$Q := Q_X := E\big[XX^{\operatorname{tr}}\big],$$

$$A := E\left[\begin{pmatrix} Y \\ Z \end{pmatrix} X^{\operatorname{tr}}\right] Q^{-1} = \begin{pmatrix} E[YX^{\operatorname{tr}}]\, Q^{-1} \\ E[ZX^{\operatorname{tr}}]\, Q^{-1} \end{pmatrix} =: \begin{pmatrix} A_1 \\ A_2 \end{pmatrix},$$

and

$$W := \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} = \begin{pmatrix} Y \\ Z \end{pmatrix} - AX = \begin{pmatrix} Y - A_1 X \\ Z - A_2 X \end{pmatrix}.$$
Exercise 10.6. Keeping the same notation as in Exercise 10.5, show $Y \perp\!\!\!\perp_X Z$ (see Definition 10.20) iff

$$E\big[YZ^{\operatorname{tr}}\big] = E\big[YX^{\operatorname{tr}}\big]\, Q^{-1} E\big[XZ^{\operatorname{tr}}\big],$$

where $Q = Q_X := E\big[XX^{\operatorname{tr}}\big]$.
10.4 Summing independent random variables
Exercise 10.7. Suppose that $X \overset{d}{=} N(0, a^2)$ and $Y \overset{d}{=} N(0, b^2)$ and $X$ and $Y$ are independent. Show by direct computation using the formulas for the distributions of $X$ and $Y$ that $X + Y \overset{d}{=} N(0, a^2 + b^2)$.
Solution to Exercise (10.7). If $f : \mathbb{R} \to \mathbb{R}$ is a bounded measurable function, then

$$E[f(X + Y)] = \frac{1}{Z}\int_{\mathbb{R}^2} f(x + y)\, e^{-\frac{1}{2a^2}x^2}\, e^{-\frac{1}{2b^2}y^2}\, dx\, dy,$$

where $Z = 2\pi ab$. Let us make the change of variables, $(x, z) = (x, x + y)$, and observe that $dx\,dy = dx\,dz$ (you check). Therefore we have,

$$E[f(X + Y)] = \frac{1}{Z}\int_{\mathbb{R}^2} f(z)\, e^{-\frac{1}{2a^2}x^2}\, e^{-\frac{1}{2b^2}(z-x)^2}\, dx\, dz$$

which shows $\operatorname{Law}_P(X + Y)(dz) = \rho(z)\, dz$ where

$$\rho(z) = \frac{1}{Z}\int_{\mathbb{R}} e^{-\frac{1}{2a^2}x^2}\, e^{-\frac{1}{2b^2}(z-x)^2}\, dx. \quad (10.15)$$

Working the exponent, for any $c \in \mathbb{R}$ we have

$$\begin{aligned}
\frac{1}{a^2}x^2 + \frac{1}{b^2}(z - x)^2 &= \frac{1}{a^2}x^2 + \frac{1}{b^2}\big(x^2 - 2xz + z^2\big) \\
&= \left(\frac{1}{a^2} + \frac{1}{b^2}\right)x^2 - \frac{2}{b^2}xz + \frac{1}{b^2}z^2 \\
&= \left(\frac{1}{a^2} + \frac{1}{b^2}\right)\big[(x - cz)^2 + 2cxz - c^2 z^2\big] - \frac{2}{b^2}xz + \frac{1}{b^2}z^2.
\end{aligned}$$

Let us now choose $c$ (to complete the squares) so that

$$c\left(\frac{1}{a^2} + \frac{1}{b^2}\right) = \frac{1}{b^2} \implies c = \frac{a^2}{a^2 + b^2},$$

in which case,

$$\frac{1}{a^2}x^2 + \frac{1}{b^2}(z - x)^2 = \left(\frac{1}{a^2} + \frac{1}{b^2}\right)(x - cz)^2 + \left[\frac{1}{b^2} - c^2\left(\frac{1}{a^2} + \frac{1}{b^2}\right)\right] z^2$$

where,

$$\frac{1}{b^2} - c^2\left(\frac{1}{a^2} + \frac{1}{b^2}\right) = \frac{1}{b^2}(1 - c) = \frac{1}{a^2 + b^2}.$$

So making the change of variables, $w = x - cz$, in the integral in Eq. (10.15) implies,

$$\rho(z) = \frac{1}{Z}\int_{\mathbb{R}} \exp\left(-\frac{1}{2}\left(\frac{1}{a^2} + \frac{1}{b^2}\right)w^2 - \frac{1}{2}\frac{1}{a^2 + b^2}z^2\right) dw = \frac{1}{\tilde{Z}}\exp\left(-\frac{1}{2}\frac{1}{a^2 + b^2}z^2\right)$$

where,

$$\frac{1}{\tilde{Z}} = \frac{1}{Z}\int_{\mathbb{R}} \exp\left(-\frac{1}{2}\left(\frac{1}{a^2} + \frac{1}{b^2}\right)w^2\right) dw = \frac{1}{2\pi ab}\sqrt{2\pi\left(\frac{1}{a^2} + \frac{1}{b^2}\right)^{-1}} = \frac{1}{2\pi ab}\sqrt{\frac{2\pi a^2 b^2}{a^2 + b^2}} = \frac{1}{\sqrt{2\pi(a^2 + b^2)}}.$$

Thus it follows that $X \overset{\perp}{+} Y \overset{d}{=} N(0, a^2 + b^2)$.
Exercise 10.8. Show that the sum, $N_1 + N_2$, of two independent Poisson random variables, $N_1$ and $N_2$, with parameters $\lambda_1$ and $\lambda_2$ respectively is again a Poisson random variable with parameter $\lambda_1 + \lambda_2$. (You could use generating functions or do this by hand.) In short, $\operatorname{Poi}(\lambda_1) \overset{\perp}{+} \operatorname{Poi}(\lambda_2) \overset{d}{=} \operatorname{Poi}(\lambda_1 + \lambda_2)$.
Solution to Exercise (10.8). Let $z \in \mathbb{C}$; then by independence,

$$E\big[z^{N_1 + N_2}\big] = E\big[z^{N_1} z^{N_2}\big] = E\big[z^{N_1}\big]\, E\big[z^{N_2}\big] = e^{\lambda_1(z-1)}\, e^{\lambda_2(z-1)} = e^{(\lambda_1 + \lambda_2)(z-1)}$$

from which it follows that $N_1 + N_2 \overset{d}{=} \operatorname{Poisson}(\lambda_1 + \lambda_2)$.
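Doing it "by hand" amounts to the convolution $P(N_1 + N_2 = n) = \sum_{k=0}^{n} P(N_1 = k)\,P(N_2 = n - k)$, which the binomial theorem collapses to the $\operatorname{Poi}(\lambda_1 + \lambda_2)$ mass function. A quick numerical confirmation (our sketch; the parameter values are arbitrary):

```python
import math

def poi_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam1, lam2 = 1.7, 2.4

# Convolve the two pmfs and compare with Poi(lam1 + lam2) term by term.
max_err = max(
    abs(sum(poi_pmf(lam1, k) * poi_pmf(lam2, n - k) for k in range(n + 1))
        - poi_pmf(lam1 + lam2, n))
    for n in range(40)
)
print(max_err)  # ~ 0 up to floating point rounding
```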
Example 10.31 (Gamma Distribution Sums). We will show here that $\operatorname{Gamma}(k, \theta) \overset{\perp}{+} \operatorname{Gamma}(l, \theta) \overset{d}{=} \operatorname{Gamma}(k + l, \theta)$. In Exercise 7.13 you showed if $k, \theta > 0$ then

$$E\big[e^{tX}\big] = (1 - \theta t)^{-k} \quad \text{for } t < \frac{1}{\theta},$$

where $X$ is a positive random variable with $X \overset{d}{=} \operatorname{Gamma}(k, \theta)$, i.e.

$$(X_* P)(dx) = \frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)}\, dx \quad \text{for } x > 0.$$

Suppose that $X$ and $Y$ are independent random variables with $X \overset{d}{=} \operatorname{Gamma}(k, \theta)$ and $Y \overset{d}{=} \operatorname{Gamma}(l, \theta)$ for some $l > 0$. It now follows that

$$E\big[e^{t(X+Y)}\big] = E\big[e^{tX} e^{tY}\big] = E\big[e^{tX}\big]\, E\big[e^{tY}\big] = (1 - \theta t)^{-k}(1 - \theta t)^{-l} = (1 - \theta t)^{-(k+l)}.$$

Therefore it follows from Exercise 8.2 that $X + Y \overset{d}{=} \operatorname{Gamma}(k + l, \theta)$.
Example 10.32 (Exponential Distribution Sums). If $\{T_k\}_{k=1}^{n}$ are independent random variables such that $T_k \overset{d}{=} E(\lambda)$ for all $k$, then

$$T_1 \overset{\perp}{+} T_2 \overset{\perp}{+} \cdots \overset{\perp}{+} T_n \overset{d}{=} \operatorname{Gamma}\big(n, \lambda^{-1}\big).$$

This follows directly from Example 10.31 using $E(\lambda) \overset{d}{=} \operatorname{Gamma}\big(1, \lambda^{-1}\big)$ and induction. We will verify this directly later on in Corollary 11.8.
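For $n = 2$ the claim can be checked by direct convolution: $\int_0^z \lambda e^{-\lambda x}\,\lambda e^{-\lambda(z-x)}\, dx = \lambda^2 z\, e^{-\lambda z}$, which is the $\operatorname{Gamma}(2, \lambda^{-1})$ density. A numerical sketch of ours (the rate $\lambda$ is arbitrary):

```python
import math

lam = 1.5

def exp_pdf(x):
    # Density of E(lam): lam * exp(-lam * x) on x >= 0.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def gamma2_pdf(z):
    # Gamma(2, 1/lam) density: z e^{-lam z} / ((1/lam)^2 Gamma(2)) = lam^2 z e^{-lam z}.
    return lam ** 2 * z * math.exp(-lam * z)

def conv(z, steps=2000):
    # Midpoint rule for the convolution integral over [0, z].
    h = z / steps
    return sum(exp_pdf(x) * exp_pdf(z - x)
               for x in ((i + 0.5) * h for i in range(steps))) * h

errs = [abs(conv(z) - gamma2_pdf(z)) for z in (0.3, 1.0, 2.5)]
print(max(errs))  # ~ 0 (the integrand is constant in x, so the sum is exact)
```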
Example 10.31 may also be verified using brute force. To this end, suppose that $f : \mathbb{R}_+ \to \mathbb{R}_+$ is a measurable function; then

$$\begin{aligned}
E[f(X+Y)] &= \int_{\mathbb{R}^2_+} f(x+y)\, \frac{x^{k-1}e^{-x/\theta}}{\theta^k \Gamma(k)}\, \frac{y^{l-1}e^{-y/\theta}}{\theta^l \Gamma(l)}\, dx\, dy \\
&= \frac{1}{\theta^{k+l}\Gamma(k)\Gamma(l)} \int_{\mathbb{R}^2_+} f(x+y)\, x^{k-1} y^{l-1} e^{-(x+y)/\theta}\, dx\, dy.
\end{aligned}$$

Let us now make the change of variables, $x = x$ and $z = x + y$, so that $dx\,dy = dx\,dz$, to find,

$$E[f(X+Y)] = \frac{1}{\theta^{k+l}\Gamma(k)\Gamma(l)} \int 1_{0 \le x \le z < \infty}\, f(z)\, x^{k-1}(z-x)^{l-1} e^{-z/\theta}\, dx\, dz. \quad (10.16)$$

To finish the proof we must now do the $x$-integral and show,

$$\int_0^z x^{k-1}(z-x)^{l-1}\, dx = z^{k+l-1}\, \frac{\Gamma(k)\Gamma(l)}{\Gamma(k+l)}.$$

(In fact we already know this must be correct from our Laplace transform computations above.) First make the change of variable, $x = zt$, to find,

$$\int_0^z x^{k-1}(z-x)^{l-1}\, dx = z^{k+l-1} B(k, l)$$

where $B(k, l)$ is the beta function defined by;

$$B(k, l) := \int_0^1 t^{k-1}(1-t)^{l-1}\, dt \quad \text{for } \operatorname{Re} k, \operatorname{Re} l > 0. \quad (10.17)$$

Combining these results with Eq. (10.16) then shows,

$$E[f(X+Y)] = \frac{B(k,l)}{\theta^{k+l}\Gamma(k)\Gamma(l)} \int_0^{\infty} f(z)\, z^{k+l-1} e^{-z/\theta}\, dz. \quad (10.18)$$

Since we already know that

$$\int_0^{\infty} z^{k+l-1} e^{-z/\theta}\, dz = \theta^{k+l}\,\Gamma(k+l),$$

it follows by taking $f = 1$ in Eq. (10.18) that

$$1 = \frac{B(k,l)}{\theta^{k+l}\Gamma(k)\Gamma(l)}\, \theta^{k+l}\,\Gamma(k+l)$$

which implies,

$$B(k,l) = \frac{\Gamma(k)\Gamma(l)}{\Gamma(k+l)}. \quad (10.19)$$

Therefore, using this back in Eq. (10.18) implies

$$E[f(X+Y)] = \frac{1}{\theta^{k+l}\Gamma(k+l)} \int_0^{\infty} f(z)\, z^{k+l-1} e^{-z/\theta}\, dz$$

from which it follows that $X + Y \overset{d}{=} \operatorname{Gamma}(k+l, \theta)$.
Let us pause to give a direct verification of Eq. (10.19). By definition of the gamma function,

$$\Gamma(k)\Gamma(l) = \int_{\mathbb{R}^2_+} x^{k-1}e^{-x}\, y^{l-1}e^{-y}\, dx\, dy = \int_{\mathbb{R}^2_+} x^{k-1} y^{l-1} e^{-(x+y)}\, dx\, dy.$$

Making the change of variables, $x = x$ and $z = x + y$, it follows,

$$\Gamma(k)\Gamma(l) = \int 1_{0 \le x \le z < \infty}\, x^{k-1}(z-x)^{l-1} e^{-z}\, dx\, dz.$$

Now make the change of variables, $x = zt$, to find,

$$\begin{aligned}
\Gamma(k)\Gamma(l) &= \int_0^{\infty} dz\, e^{-z} \int_0^1 dt\, (zt)^{k-1}(z - tz)^{l-1}\, z \\
&= \int_0^{\infty} e^{-z} z^{k+l-1}\, dz \int_0^1 t^{k-1}(1-t)^{l-1}\, dt \\
&= \Gamma(k+l)\, B(k, l).
\end{aligned}$$
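Eq. (10.19) is easy to confirm numerically; the sketch below (ours, not Driver's) compares a midpoint-rule evaluation of $B(k,l)$ against $\Gamma(k)\Gamma(l)/\Gamma(k+l)$ for a few non-integer arguments.

```python
import math

def beta_numeric(k, l, steps=200_000):
    # Midpoint rule for B(k, l) = int_0^1 t^{k-1} (1-t)^{l-1} dt.
    h = 1.0 / steps
    return sum(t ** (k - 1) * (1 - t) ** (l - 1)
               for t in ((i + 0.5) * h for i in range(steps))) * h

def beta_gamma(k, l):
    # The right-hand side of Eq. (10.19).
    return math.gamma(k) * math.gamma(l) / math.gamma(k + l)

errs = [abs(beta_numeric(k, l) - beta_gamma(k, l))
        for k, l in ((2.0, 3.0), (1.5, 2.5), (4.2, 1.7))]
print(max(errs))  # small quadrature error only
```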
Definition 10.33 (Beta distribution). The $\beta$-distribution is

$$d\beta_{x,y}(t) = \frac{t^{x-1}(1-t)^{y-1}\, dt}{B(x, y)}.$$

Observe that

$$\int_0^1 t\, d\beta_{x,y}(t) = \frac{B(x+1, y)}{B(x, y)} = \frac{\ \frac{\Gamma(x+1)\Gamma(y)}{\Gamma(x+y+1)}\ }{\ \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}\ } = \frac{x}{x+y}$$

and

$$\int_0^1 t^2\, d\beta_{x,y}(t) = \frac{B(x+2, y)}{B(x, y)} = \frac{\ \frac{\Gamma(x+2)\Gamma(y)}{\Gamma(x+y+2)}\ }{\ \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}\ } = \frac{(x+1)\, x}{(x+y+1)(x+y)}.$$
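These two moment formulas also give $\operatorname{Var}(\beta_{x,y}) = \frac{xy}{(x+y)^2(x+y+1)}$. The sketch below (ours; the parameter values are arbitrary) checks the first two moments by direct integration against the closed forms above.

```python
import math

def beta_moment(x, y, p, steps=100_000):
    # int_0^1 t^p dbeta_{x,y}(t) via midpoint rule, using B(x,y) from Eq. (10.19).
    B = math.gamma(x) * math.gamma(y) / math.gamma(x + y)
    h = 1.0 / steps
    return sum(t ** (p + x - 1) * (1 - t) ** (y - 1)
               for t in ((i + 0.5) * h for i in range(steps))) * h / B

x, y = 2.5, 4.0
m1 = beta_moment(x, y, 1)
m2 = beta_moment(x, y, 2)
print(m1)  # ~ x/(x+y)
print(m2)  # ~ (x+1)x/((x+y+1)(x+y))
```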
10.5 A Strong Law of Large Numbers

Theorem 10.34 (A simple form of the strong law of large numbers). If $\{X_n\}_{n=1}^{\infty}$ is a sequence of i.i.d. random variables such that $E\big[|X_n|^4\big] < \infty$, then

$$\lim_{n\to\infty} \frac{S_n}{n} = \mu \text{ a.s.}$$

where $S_n := \sum_{k=1}^{n} X_k$ and $\mu := EX_n = EX_1$.
Exercise 10.9. Use the following outline to give a proof of Theorem 10.34.

1. First show that $x^p \le 1 + x^4$ for all $x \ge 0$ and $1 \le p \le 4$. Use this to conclude;

$$E|X_n|^p \le 1 + E|X_n|^4 < \infty \quad \text{for } 1 \le p \le 4.$$

Thus $\gamma := E\big[(X_n - \mu)^4\big]$ and the square of the standard deviation $\sigma$ of $X_n$, defined by,

$$\sigma^2 := E\big[X_n^2\big] - \mu^2 = E\big[(X_n - \mu)^2\big] < \infty,$$

are finite constants independent of $n$.

2. Show for all $n \in \mathbb{N}$ that

$$E\left[\left(\frac{S_n}{n} - \mu\right)^4\right] = \frac{1}{n^4}\big[n\gamma + 3n(n-1)\sigma^4\big] = \frac{1}{n^2}\big[n^{-1}\gamma + 3\big(1 - n^{-1}\big)\sigma^4\big].$$

(Thus $\frac{S_n}{n} \to \mu$ in $L^4(P)$.)

3. Use item 2. and Chebyshev's inequality to show, for every $\varepsilon > 0$,

$$P\left(\left|\frac{S_n}{n} - \mu\right| > \varepsilon\right) \le \frac{n^{-1}\gamma + 3\big(1 - n^{-1}\big)\sigma^4}{\varepsilon^4\, n^2}.$$

4. Use item 3. and the first Borel-Cantelli Lemma 7.14 to conclude $\lim_{n\to\infty}\frac{S_n}{n} = \mu$ a.s.
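The fourth-moment identity in item 2. can be sanity-checked exactly for fair coin flips, i.e. $X_k \in \{0, 1\}$ with $\mu = 1/2$, $\sigma^2 = 1/4$ and $\gamma = 1/16$, since then $E[(S_n/n - \mu)^4]$ can also be computed directly from the binomial distribution. A sketch of ours:

```python
import math

mu, sigma2, gamma = 0.5, 0.25, 0.0625  # Bernoulli(1/2) central moments

def fourth_moment_exact(n):
    # E[(S_n/n - mu)^4] with S_n ~ Binomial(n, 1/2), computed from the pmf.
    return sum(math.comb(n, k) * 0.5 ** n * (k / n - mu) ** 4
               for k in range(n + 1))

def fourth_moment_formula(n):
    # The formula of item 2: (n*gamma + 3n(n-1)*sigma^4) / n^4.
    return (n * gamma + 3 * n * (n - 1) * sigma2 ** 2) / n ** 4

errs = [abs(fourth_moment_exact(n) - fourth_moment_formula(n)) for n in (1, 2, 5, 20, 50)]
print(max(errs))  # ~ 0
```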
10.6 A Central Limit Theorem

In this section we will give a couple of preliminary versions of the central limit theorem following [30, Chapter 2.14]. Let us set up some notation. Given a square integrable random variable $Y$, let

$$\bar{Y} := \frac{Y - EY}{\sigma(Y)} \quad \text{where } \sigma(Y) := \sqrt{E(Y - EY)^2} = \sqrt{\operatorname{Var}(Y)}.$$

Let us also recall that if $Z \overset{d}{=} N\big(0, \sigma^2\big)$, then $Z \overset{d}{=} \sigma\, N(0,1)$ and so by Eq. (7.40) with $\beta = 3$ we have,

$$E|Z|^3 = \sigma^3\, E|N(0,1)|^3 = \sqrt{8/\pi}\, \sigma^3. \quad (10.20)$$
Theorem 10.35 (A CLT proof w/o Fourier). Suppose that $\{X_k\}_{k=1}^{\infty} \subset L^3(P)$ is a sequence of independent random variables such that

$$C := \sup_k E|X_k - EX_k|^3 < \infty.$$

Then for every function, $f \in C^3(\mathbb{R})$ with $M := \sup_{x\in\mathbb{R}}\big|f^{(3)}(x)\big| < \infty$, we have

$$\left|Ef(N) - Ef\big(\bar{S}_n\big)\right| \le \frac{M}{3!}\left(1 + \sqrt{8/\pi}\right)\frac{C\, n}{\sigma(S_n)^3}, \quad (10.21)$$

where $S_n := X_1 + \cdots + X_n$ and $N \overset{d}{=} N(0,1)$. In particular if we further assume that

$$\beta := \liminf_{n\to\infty} \frac{1}{n}\,\sigma(S_n)^2 = \liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n}\operatorname{Var}(X_i) > 0, \quad (10.22)$$

then it follows that

$$\left|Ef(N) - Ef\big(\bar{S}_n\big)\right| = O\left(\frac{1}{\sqrt{n}}\right) \text{ as } n \to \infty, \quad (10.23)$$

which is to say, $\bar{S}_n$ is close in distribution to $N$, which we abbreviate by $\bar{S}_n \overset{d}{\cong} N$ for large $n$.

(It should be noted that the estimate in Eq. (10.21) is valid for any finite collection of random variables, $\{X_k\}_{k=1}^{n}$.)
Proof. Let $n \in \mathbb{N}$ be fixed and then let $\{Y_k, N_k\}_{k=1}^{\infty}$ be a collection of independent random variables such that

$$Y_k \overset{d}{=} \bar{X}_k = \frac{X_k - EX_k}{\sigma(S_n)} \quad \text{and} \quad N_k \overset{d}{=} N(0, \operatorname{Var}(Y_k)) \quad \text{for } 1 \le k \le n.$$

Let $S_n^Y := Y_1 + \cdots + Y_n \overset{d}{=} \bar{S}_n$ and $T_n := N_1 + \cdots + N_n$. Since

$$\sum_{k=1}^{n}\operatorname{Var}(N_k) = \sum_{k=1}^{n}\operatorname{Var}(Y_k) = \frac{1}{\sigma(S_n)^2}\sum_{k=1}^{n}\operatorname{Var}(X_k - EX_k) = \frac{1}{\sigma(S_n)^2}\sum_{k=1}^{n}\operatorname{Var}(X_k) = 1,$$

it follows by Corollary 10.27 that $T_n \overset{d}{=} N(0,1)$.

To compare $Ef\big(\bar{S}_n\big)$ with $Ef(N)$ we may compare $Ef\big(S_n^Y\big)$ with $Ef(T_n)$, which we will do by interpolating between $S_n^Y$ and $T_n$. To this end, for $0 \le k \le n$, let

$$V_k := N_1 + \cdots + N_k + Y_{k+1} + \cdots + Y_n$$

with the convention that $V_n = T_n$ and $V_0 = S_n^Y$. Then by a telescoping series argument, it follows that

$$f(T_n) - f\big(S_n^Y\big) = f(V_n) - f(V_0) = \sum_{k=1}^{n}\big[f(V_k) - f(V_{k-1})\big]. \quad (10.24)$$

We now make use of Taylor's theorem with integral remainder in the form,

$$f(x + \Delta) - f(x) = f'(x)\Delta + \frac{1}{2}f''(x)\Delta^2 + r(x, \Delta)\,\Delta^3 \quad (10.25)$$

where

$$r(x, \Delta) := \frac{1}{2}\int_0^1 f'''(x + t\Delta)(1-t)^2\, dt.$$

Taking Eq. (10.25) with $\Delta$ replaced by $\delta$ and subtracting the results then implies

$$f(x + \Delta) - f(x + \delta) = f'(x)(\Delta - \delta) + \frac{1}{2}f''(x)\big(\Delta^2 - \delta^2\big) + \rho(x, \Delta, \delta), \quad (10.26)$$

where

$$|\rho(x, \Delta, \delta)| = \big|r(x, \Delta)\,\Delta^3 - r(x, \delta)\,\delta^3\big| \le \frac{M}{3!}\big(|\Delta|^3 + |\delta|^3\big), \quad (10.27)$$

wherein we have used the simple estimate, $|r(x, \Delta)| \vee |r(x, \delta)| \le M/3!$.

If we define

$$U_k := N_1 + \cdots + N_{k-1} + Y_{k+1} + \cdots + Y_n,$$

then $V_k = U_k + N_k$ and $V_{k-1} = U_k + Y_k$. Hence, using Eq. (10.26) with $x = U_k$, $\Delta = N_k$ and $\delta = Y_k$, it follows that

$$f(V_k) - f(V_{k-1}) = f(U_k + N_k) - f(U_k + Y_k) = f'(U_k)(N_k - Y_k) + \frac{1}{2}f''(U_k)\big(N_k^2 - Y_k^2\big) + R_k \quad (10.28)$$

where

$$|R_k| \le \frac{M}{3!}\big(|N_k|^3 + |Y_k|^3\big). \quad (10.29)$$

Taking expectations of Eq. (10.28) using; Eq. (10.29), $EN_k = 0 = EY_k$, $EN_k^2 = EY_k^2$, and the fact that $U_k$ is independent of both $Y_k$ and $N_k$, we find

$$|E[f(V_k) - f(V_{k-1})]| = |ER_k| \le \frac{M}{3!}\, E\big[|N_k|^3 + |Y_k|^3\big].$$

Making use of Eq. (10.20) it follows that

$$E|N_k|^3 = \sqrt{8/\pi}\,\operatorname{Var}(N_k)^{3/2} = \sqrt{8/\pi}\,\operatorname{Var}(Y_k)^{3/2} = \sqrt{8/\pi}\,\big(EY_k^2\big)^{3/2} \le \sqrt{8/\pi}\, E|Y_k|^3,$$

wherein we have used Jensen's (or Hölder's) inequality (see Chapter 12 below) for the last inequality. Combining these estimates with Eq. (10.24) shows,

$$\left|E\big[f(T_n) - f\big(S_n^Y\big)\big]\right| = \left|\sum_{k=1}^{n} ER_k\right| \le \sum_{k=1}^{n} E|R_k| \le \frac{M}{3!}\sum_{k=1}^{n} E\big[|N_k|^3 + |Y_k|^3\big] \le \frac{M}{3!}\left(1 + \sqrt{8/\pi}\right)\sum_{k=1}^{n} E\big[|Y_k|^3\big]. \quad (10.30)$$

Since

$$E|Y_k|^3 = E\left|\frac{X_k - EX_k}{\sigma(S_n)}\right|^3 \le \frac{C}{\sigma(S_n)^3}$$

and

$$\left|Ef(N) - Ef\big(\bar{S}_n\big)\right| = \left|E\big[f(T_n) - f\big(S_n^Y\big)\big]\right|,$$

we see that Eq. (10.21) now follows from Eq. (10.30).
Corollary 10.36. Suppose that $\{X_n\}_{n=1}^{\infty}$ is a sequence of i.i.d. random variables in $L^3(P)$, $C := E|X_1 - EX_1|^3 < \infty$, $S_n := X_1 + \cdots + X_n$, and $N \overset{d}{=} N(0,1)$. Then for every function, $f \in C^3(\mathbb{R})$ with $M := \sup_{x\in\mathbb{R}}\big|f^{(3)}(x)\big| < \infty$, we have

$$\left|Ef(N) - Ef\big(\bar{S}_n\big)\right| \le \frac{M}{3!}\,\frac{1}{\sqrt{n}}\left(1 + \sqrt{8/\pi}\right)\frac{C}{\operatorname{Var}(X_1)^{3/2}}. \quad (10.31)$$

(This is a specialized form of the Berry-Esseen theorem.)
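For symmetric $\pm 1$ coin flips and $f(x) = \cos x$ the left side of Eq. (10.31) can be evaluated in closed form, since $E\cos\big(\bar{S}_n\big) = \cos(1/\sqrt{n})^n$ (the characteristic function of a $\pm 1$ step is $\cos t$) and $E\cos(N) = e^{-1/2}$; here $\operatorname{Var}(X_1) = 1$, $C = E|X_1|^3 = 1$ and $M = \sup|f^{(3)}| = 1$, so the bound can be checked exactly. A sketch of ours:

```python
import math

def lhs(n):
    # |E cos(S_n/sqrt(n)) - E cos(N(0,1))| = |cos(1/sqrt n)^n - exp(-1/2)|
    # for iid uniform {-1, +1} steps.
    return abs(math.cos(1 / math.sqrt(n)) ** n - math.exp(-0.5))

def rhs(n):
    # The bound (M/3!) (1/sqrt n) (1 + sqrt(8/pi)) C / Var^{3/2} with M = C = Var = 1.
    return (1 / 6) * (1 + math.sqrt(8 / math.pi)) / math.sqrt(n)

ok = all(lhs(n) <= rhs(n) for n in (1, 2, 5, 10, 100, 1000))
print(ok)  # True
```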
By a slight modification of the proof of Theorem 10.35 we have the following central limit theorem.

Theorem 10.37 (A CLT proof w/o Fourier). Suppose that $\{X_n\}_{n=1}^{\infty}$ is a sequence of i.i.d. random variables in $L^2(P)$, $S_n := X_1 + \cdots + X_n$, and $N \overset{d}{=} N(0,1)$. Then for every function, $f \in C^2(\mathbb{R})$ with $M := \sup_{x\in\mathbb{R}}\big|f^{(2)}(x)\big| < \infty$ and $f''$ uniformly continuous on $\mathbb{R}$, we have,

$$\lim_{n\to\infty} Ef\big(\bar{S}_n\big) = Ef(N).$$
Proof. In this proof we use the following form of Taylor's theorem;

$$f(x + \Delta) - f(x) = f'(x)\Delta + \frac{1}{2}f''(x)\Delta^2 + r(x, \Delta)\,\Delta^2 \quad (10.32)$$

where

$$r(x, \Delta) = \int_0^1\big[f''(x + t\Delta) - f''(x)\big](1-t)\, dt.$$

Taking Eq. (10.32) with $\Delta$ replaced by $\delta$ and subtracting the results then implies

$$f(x + \Delta) - f(x + \delta) = f'(x)(\Delta - \delta) + \frac{1}{2}f''(x)\big(\Delta^2 - \delta^2\big) + \rho(x, \Delta, \delta)$$

where now,

$$\rho(x, \Delta, \delta) = r(x, \Delta)\,\Delta^2 - r(x, \delta)\,\delta^2.$$

Since $f''$ is uniformly continuous it follows that

$$\varepsilon(\delta) := \frac{1}{2}\sup\{|f''(x + t\delta) - f''(x)| : x \in \mathbb{R} \text{ and } 0 \le t \le 1\} \to 0 \text{ as } \delta \to 0.$$

Thus we may conclude that

$$|r(x, \delta)| \le \int_0^1 |f''(x + t\delta) - f''(x)|\,(1-t)\, dt \le \int_0^1 2\varepsilon(\delta)(1-t)\, dt = \varepsilon(\delta),$$

and therefore that

$$|\rho(x, \Delta, \delta)| \le \varepsilon(\Delta)\,\Delta^2 + \varepsilon(\delta)\,\delta^2.$$

So working just as in the proof of Theorem 10.35 we may conclude,

$$\left|Ef(N) - Ef\big(\bar{S}_n\big)\right| \le \sum_{k=1}^{n} E|R_k|$$

where now,

$$|R_k| \le \varepsilon(N_k)\, N_k^2 + \varepsilon(Y_k)\, Y_k^2.$$

Since the $\{Y_k\}_{k=1}^{n}$ and the $\{N_k\}_{k=1}^{n}$ are now i.i.d., it follows that

$$\left|Ef(N) - Ef\big(\bar{S}_n\big)\right| \le n\, E\big[\varepsilon(N_1)\, N_1^2 + \varepsilon(Y_1)\, Y_1^2\big].$$

Since $\operatorname{Var}(S_n) = n\operatorname{Var}(X_1)$, we have

$$Y_1 = \frac{X_1 - EX_1}{\sqrt{n}\,\sigma(X_1)}, \qquad \operatorname{Var}(N_1) = \operatorname{Var}(Y_1) = \frac{1}{n},$$

and therefore $N_1 \overset{d}{=} \frac{1}{\sqrt{n}} N$. Combining these observations shows,

$$\left|Ef(N) - Ef\big(\bar{S}_n\big)\right| \le E\left[\varepsilon\left(\frac{1}{\sqrt{n}}N\right) N^2 + \varepsilon\left(\frac{X_1 - EX_1}{\sqrt{n}\,\sigma(X_1)}\right)\frac{(X_1 - EX_1)^2}{\sigma^2(X_1)}\right]$$

which goes to zero as $n \to \infty$ by the DCT.
Lemma 10.38. Suppose that $\{W\} \cup \{W_n\}_{n=1}^{\infty}$ is a collection of random variables such that $\lim_{n\to\infty} Ef(W_n) = Ef(W)$ for all $f \in C_c^{\infty}(\mathbb{R})$; then $\lim_{n\to\infty} Ef(W_n) = Ef(W)$ for all bounded continuous functions, $f : \mathbb{R} \to \mathbb{R}$.

Proof. According to Theorem 21.29 below it suffices to show $\lim_{n\to\infty} Ef(W_n) = Ef(W)$ for all $f \in C_c(\mathbb{R})$. For such a function, $f \in C_c(\mathbb{R})$, we may find$^1$ $f_k \in C_c^{\infty}(\mathbb{R})$ with all supports being contained in a compact subset of $\mathbb{R}$ such that

$$\varepsilon_k := \sup_{x\in\mathbb{R}} |f(x) - f_k(x)| \to 0 \text{ as } k \to \infty.$$

We then have,

$$\begin{aligned}
|Ef(W) - Ef(W_n)| &\le |Ef(W) - Ef_k(W)| + |Ef_k(W) - Ef_k(W_n)| + |Ef_k(W_n) - Ef(W_n)| \\
&\le E|f(W) - f_k(W)| + |Ef_k(W) - Ef_k(W_n)| + E|f_k(W_n) - f(W_n)| \\
&\le 2\varepsilon_k + |Ef_k(W) - Ef_k(W_n)|.
\end{aligned}$$

Therefore it follows that

$$\limsup_{n\to\infty} |Ef(W) - Ef(W_n)| \le 2\varepsilon_k + \limsup_{n\to\infty} |Ef_k(W) - Ef_k(W_n)| = 2\varepsilon_k \to 0 \text{ as } k \to \infty.$$
Corollary 10.39. Suppose that $\{X_n\}_{n=1}^{\infty}$ is a sequence of independent random variables; then under the hypothesis on this sequence in either of Theorem 10.35 or Theorem 10.37 we have that $\lim_{n\to\infty} Ef\big(\bar{S}_n\big) = Ef(N(0,1))$ for all $f : \mathbb{R} \to \mathbb{R}$ which are bounded and continuous.

For more on the methods employed in this section the reader is advised to look up "Stein's method." In Chapters 22 and 23 below, we will relax the assumptions in the above theorem. The proofs later will be based on the characteristic functional, or equivalently the Fourier transform.
10.7 The Second Borel-Cantelli Lemma

Lemma 10.40. If $0 \le x \le \frac{1}{2}$, then

$$e^{-2x} \le 1 - x \le e^{-x}. \quad (10.33)$$

Moreover, the upper bound in Eq. (10.33) is valid for all $x \in \mathbb{R}$.
Fig. 10.1. A graph of $1 - x$ and $e^{-x}$ showing that $1 - x \le e^{-x}$ for all $x$.

Proof. The upper bound follows by the convexity of $e^{-x}$, see Figure 10.1. For the lower bound we use the convexity of $\varphi(x) = e^{-2x}$ to conclude that the line joining $(0, 1) = (0, \varphi(0))$ and $\big(1/2, e^{-1}\big) = (1/2, \varphi(1/2))$ lies above $\varphi(x)$ for $0 \le x \le 1/2$. Then we use the fact that the line $1 - x$ lies above this line to conclude the lower bound in Eq. (10.33), see Figure 10.2. See Example 12.54 below for a more formal proof of this lemma.

Fig. 10.2. A graph of $1 - x$ (in red), the line joining $(0, 1)$ and $\big(1/2, e^{-1}\big)$ (in green), $e^{-x}$ (in purple), and $e^{-2x}$ (in black) showing that $e^{-2x} \le 1 - x \le e^{-x}$ for all $x \in [0, 1/2]$.
For $\{a_n\}_{n=1}^{\infty} \subset [0, 1]$, let

$$\prod_{n=1}^{\infty}(1 - a_n) := \lim_{N\to\infty} \prod_{n=1}^{N}(1 - a_n).$$

The limit exists since $\prod_{n=1}^{N}(1 - a_n)$ decreases as $N$ increases.

$^1$ We will eventually prove this standard real analysis fact later in the course.

Exercise 10.10. Show; if $\{a_n\}_{n=1}^{\infty} \subset [0, 1)$, then

$$\prod_{n=1}^{\infty}(1 - a_n) = 0 \iff \sum_{n=1}^{\infty} a_n = \infty.$$

The implication, "$\Longleftarrow$", holds even if $a_n = 1$ is allowed.
Solution to Exercise (10.10). By Eq. (10.33) we always have,

$$\prod_{n=1}^{N}(1 - a_n) \le \prod_{n=1}^{N} e^{-a_n} = \exp\left(-\sum_{n=1}^{N} a_n\right)$$

which upon passing to the limit as $N \to \infty$ gives

$$\prod_{n=1}^{\infty}(1 - a_n) \le \exp\left(-\sum_{n=1}^{\infty} a_n\right).$$

Hence if $\sum_{n=1}^{\infty} a_n = \infty$ then $\prod_{n=1}^{\infty}(1 - a_n) = 0$.

Conversely, suppose that $\sum_{n=1}^{\infty} a_n < \infty$. In this case $a_n \to 0$ as $n \to \infty$ and so there exists an $m \in \mathbb{N}$ such that $a_n \in [0, 1/2]$ for all $n \ge m$. Therefore by Eq. (10.33), for any $N \ge m$,

$$\begin{aligned}
\prod_{n=1}^{N}(1 - a_n) &= \prod_{n=1}^{m}(1 - a_n) \cdot \prod_{n=m+1}^{N}(1 - a_n) \ge \prod_{n=1}^{m}(1 - a_n) \cdot \prod_{n=m+1}^{N} e^{-2a_n} \\
&= \prod_{n=1}^{m}(1 - a_n)\, \exp\left(-2\sum_{n=m+1}^{N} a_n\right) \ge \prod_{n=1}^{m}(1 - a_n)\, \exp\left(-2\sum_{n=m+1}^{\infty} a_n\right).
\end{aligned}$$

So again letting $N \to \infty$ shows,

$$\prod_{n=1}^{\infty}(1 - a_n) \ge \prod_{n=1}^{m}(1 - a_n)\, \exp\left(-2\sum_{n=m+1}^{\infty} a_n\right) > 0.$$
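The dichotomy is visible numerically: with $a_n = 1/n^2$ (summable) the partial products stay bounded away from $0$ — in fact $\prod_{n=2}^{N}\big(1 - 1/n^2\big) = \frac{N+1}{2N} \to \frac{1}{2}$ by telescoping — while with $a_n = 1/n$ (non-summable) $\prod_{n=2}^{N}(1 - 1/n) = 1/N \to 0$. A sketch of ours:

```python
N = 10_000

prod_summable = 1.0      # a_n = 1/n^2, n >= 2: sum converges, product stays positive
prod_divergent = 1.0     # a_n = 1/n,   n >= 2: sum diverges, product -> 0
for n in range(2, N + 1):
    prod_summable *= 1 - 1 / n ** 2
    prod_divergent *= 1 - 1 / n

print(prod_summable)   # ~ (N+1)/(2N) ~ 0.5
print(prod_divergent)  # = 1/N -> 0
```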
Lemma 10.41 (Second Borel-Cantelli Lemma). Suppose that $\{A_n\}_{n=1}^{\infty}$ are independent sets. If

$$\sum_{n=1}^{\infty} P(A_n) = \infty, \quad (10.34)$$

then

$$P(A_n \text{ i.o.}) = 1. \quad (10.35)$$

Combining this with the first Borel-Cantelli Lemma 7.14 gives the (Borel) Zero-One law,

$$P(A_n \text{ i.o.}) = \begin{cases} 0 & \text{if } \sum_{n=1}^{\infty} P(A_n) < \infty \\ 1 & \text{if } \sum_{n=1}^{\infty} P(A_n) = \infty \end{cases}.$$
Proof. We are going to prove Eq. (10.35) by showing,

$$0 = P\big(\{A_n \text{ i.o.}\}^c\big) = P\big(A_n^c \text{ a.a.}\big) = P\big(\cup_{n=1}^{\infty} \cap_{k\ge n} A_k^c\big).$$

Since $\cap_{k\ge n} A_k^c \uparrow \cup_{n=1}^{\infty} \cap_{k\ge n} A_k^c$ as $n \to \infty$ and $\cap_{k=n}^{m} A_k^c \downarrow \cap_{k\ge n} A_k^c$ as $m \to \infty$,

$$P\big(\cup_{n=1}^{\infty} \cap_{k\ge n} A_k^c\big) = \lim_{n\to\infty} P\big(\cap_{k\ge n} A_k^c\big) = \lim_{n\to\infty} \lim_{m\to\infty} P\big(\cap_{n\le k\le m} A_k^c\big).$$

Making use of the independence of $\{A_k\}_{k=1}^{\infty}$ and hence the independence of $\{A_k^c\}_{k=1}^{\infty}$, we have

$$P\big(\cap_{n\le k\le m} A_k^c\big) = \prod_{n\le k\le m} P(A_k^c) = \prod_{n\le k\le m}(1 - P(A_k)). \quad (10.36)$$

Using the upper estimate in Eq. (10.33) along with Eq. (10.36) shows

$$P\big(\cap_{n\le k\le m} A_k^c\big) \le \prod_{n\le k\le m} e^{-P(A_k)} = \exp\left(-\sum_{k=n}^{m} P(A_k)\right).$$

Using Eq. (10.34), we find from the above inequality that $\lim_{m\to\infty} P\big(\cap_{n\le k\le m} A_k^c\big) = 0$ and hence

$$P\big(\cup_{n=1}^{\infty} \cap_{k\ge n} A_k^c\big) = \lim_{n\to\infty} \lim_{m\to\infty} P\big(\cap_{n\le k\le m} A_k^c\big) = \lim_{n\to\infty} 0 = 0.$$

Note: we could also appeal to Exercise 10.10 above to give a proof of the Borel Zero-One law without appealing to the first Borel-Cantelli Lemma.
Example 10.42 (Example 7.15 continued). Suppose that $X_n$ are now independent Bernoulli random variables with $P(X_n = 1) = p_n$ and $P(X_n = 0) = 1 - p_n$. Then $P(\lim_{n\to\infty} X_n = 0) = 1$ iff $\sum p_n < \infty$. Indeed, $P(\lim_{n\to\infty} X_n = 0) = 1$ iff $P(X_n = 0 \text{ a.a.}) = 1$ iff $P(X_n = 1 \text{ i.o.}) = 0$ iff $\sum p_n = \sum P(X_n = 1) < \infty$.
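The Borel zero-one dichotomy of this example can be seen in simulation: with $p_n = 1/n$ the events $\{X_n = 1\}$ keep occurring, while with $p_n = 1/n^2$ only finitely many occur. A seeded sketch of ours (the range, seed, and thresholds are arbitrary choices):

```python
import random

random.seed(42)
N = 100_000

# Count occurrences of {X_n = 1} for p_n = 1/n (divergent sum) versus
# p_n = 1/n^2 (convergent sum), over n = 2, ..., N.
count_harmonic = sum(random.random() < 1 / n for n in range(2, N + 1))
count_square = sum(random.random() < 1 / n ** 2 for n in range(2, N + 1))

print(count_harmonic)  # roughly sum_{n<=N} 1/n ~ ln(N) ~ 11
print(count_square)    # roughly sum_{n<=N} 1/n^2 ~ 0.6, i.e. usually 0 or 1
```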
Proposition 10.43 (Extremal behaviour of iid random variables). Suppose that $\{X_n\}_{n=1}^{\infty}$ is a sequence of i.i.d. random variables and $c_n$ is an increasing sequence of positive real numbers such that for all $\alpha > 1$ we have

$$\sum_{n=1}^{\infty} P\big(X_1 > \alpha^{-1} c_n\big) = \infty \quad (10.37)$$

while

$$\sum_{n=1}^{\infty} P(X_1 > \alpha c_n) < \infty. \quad (10.38)$$

Then

$$\limsup_{n\to\infty} \frac{X_n}{c_n} = 1 \text{ a.s.} \quad (10.39)$$
Proof. By the second Borel-Cantelli Lemma, Eq. (10.37) implies

$$P\big(X_n > \alpha^{-1} c_n \text{ i.o. } n\big) = 1$$

from which it follows that

$$\limsup_{n\to\infty} \frac{X_n}{c_n} \ge \alpha^{-1} \text{ a.s.}$$

Taking $\alpha = \alpha_k = 1 + 1/k$, we find

$$P\left(\limsup_{n\to\infty} \frac{X_n}{c_n} \ge 1\right) = P\left(\cap_{k=1}^{\infty}\left\{\limsup_{n\to\infty} \frac{X_n}{c_n} \ge \alpha_k^{-1}\right\}\right) = 1.$$

Similarly, by the first Borel-Cantelli lemma, Eq. (10.38) implies

$$P(X_n > \alpha c_n \text{ i.o. } n) = 0$$

or equivalently,

$$P(X_n \le \alpha c_n \text{ a.a. } n) = 1.$$

That is to say,

$$\limsup_{n\to\infty} \frac{X_n}{c_n} \le \alpha \text{ a.s.}$$

and hence working as above,

$$P\left(\limsup_{n\to\infty} \frac{X_n}{c_n} \le 1\right) = P\left(\cap_{k=1}^{\infty}\left\{\limsup_{n\to\infty} \frac{X_n}{c_n} \le \alpha_k\right\}\right) = 1.$$

Hence,

$$P\left(\limsup_{n\to\infty} \frac{X_n}{c_n} = 1\right) = P\left(\left\{\limsup_{n\to\infty} \frac{X_n}{c_n} \ge 1\right\} \cap \left\{\limsup_{n\to\infty} \frac{X_n}{c_n} \le 1\right\}\right) = 1.$$
Example 10.44. Let $\{X_n\}_{n=1}^{\infty}$ be i.i.d. standard normal random variables. Then by Mills' ratio (see Lemma 7.59),

$$P(X_n \ge \alpha c_n) \sim \frac{1}{\sqrt{2\pi}\,\alpha c_n}\, e^{-\frac{\alpha^2 c_n^2}{2}}.$$

Now, suppose that we take $c_n$ so that

$$e^{-c_n^2/2} = \frac{1}{n} \implies c_n = \sqrt{2\ln(n)}.$$

It then follows that

$$P(X_n \ge \alpha c_n) \sim \frac{1}{\sqrt{2\pi}\,\alpha\sqrt{2\ln(n)}}\, e^{-\alpha^2 \ln(n)} = \frac{1}{2\sqrt{\pi}\,\alpha\sqrt{\ln(n)}}\, \frac{1}{n^{\alpha^2}}$$

and therefore

$$\sum_{n=1}^{\infty} P(X_n \ge \alpha c_n) = \infty \text{ if } \alpha < 1 \quad \text{and} \quad \sum_{n=1}^{\infty} P(X_n \ge \alpha c_n) < \infty \text{ if } \alpha > 1.$$

Hence an application of Proposition 10.43 shows

$$\limsup_{n\to\infty} \frac{X_n}{\sqrt{2\ln n}} = 1 \text{ a.s.}$$
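Even at moderate $n$ the running maximum of i.i.d. standard normals tracks $\sqrt{2\ln n}$ reasonably well. A seeded Monte Carlo sketch of ours (sample size, seed, and tolerances are arbitrary; convergence is slow, so the ratio sits a bit below 1):

```python
import math, random

random.seed(7)
n = 200_000

running_max = max(random.gauss(0, 1) for _ in range(n))
ratio = running_max / math.sqrt(2 * math.log(n))
print(ratio)  # close to 1, typically ~0.9 at this n
```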
Example 10.45. Let $\{E_n\}_{n=1}^{\infty}$ be a sequence of i.i.d. random variables with exponential distributions determined by

$$P(E_n > x) = e^{-(x\vee 0)}, \quad \text{or} \quad P(E_n \le x) = 1 - e^{-(x\vee 0)}.$$

(Observe that $P(E_n \le 0) = 0$ so that $E_n > 0$ a.s.) Then for $c_n > 0$ and $\alpha > 0$, we have

$$\sum_{n=1}^{\infty} P(E_n > \alpha c_n) = \sum_{n=1}^{\infty} e^{-\alpha c_n} = \sum_{n=1}^{\infty}\big(e^{-c_n}\big)^{\alpha}.$$

Hence if we choose $c_n = \ln n$ so that $e^{-c_n} = 1/n$, then we have

$$\sum_{n=1}^{\infty} P(E_n > \alpha \ln n) = \sum_{n=1}^{\infty}\left(\frac{1}{n}\right)^{\alpha}$$

which is convergent iff $\alpha > 1$. So by Proposition 10.43, it follows that

$$\limsup_{n\to\infty} \frac{E_n}{\ln n} = 1 \text{ a.s.}$$
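The analogous numerical check for exponentials: $\max_{k\le n} E_k - \ln n$ has a limiting Gumbel distribution, so the ratio of the running maximum to $\ln n$ tends to $1$. A seeded sketch of ours (sample size, seed, and tolerances arbitrary):

```python
import math, random

random.seed(11)
n = 100_000

running_max = max(random.expovariate(1.0) for _ in range(n))
ratio = running_max / math.log(n)
print(ratio)  # close to 1, with O(1/ln n) fluctuations
```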
Example 10.46. *Suppose now that $\{X_n\}_{n=1}^{\infty}$ are i.i.d. distributed by the Poisson distribution with intensity, $\lambda$, i.e.

$$P(X_1 = k) = \frac{\lambda^k}{k!}\, e^{-\lambda}.$$

In this case we have

$$P(X_1 \ge n) = e^{-\lambda}\sum_{k=n}^{\infty} \frac{\lambda^k}{k!} \ge \frac{\lambda^n}{n!}\, e^{-\lambda}$$

and

$$\sum_{k=n}^{\infty} \frac{\lambda^k}{k!}\, e^{-\lambda} = \frac{\lambda^n}{n!}\, e^{-\lambda}\sum_{k=n}^{\infty} \frac{n!}{k!}\,\lambda^{k-n} = \frac{\lambda^n}{n!}\, e^{-\lambda}\sum_{k=0}^{\infty} \frac{n!}{(k+n)!}\,\lambda^k \le \frac{\lambda^n}{n!}\, e^{-\lambda}\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = \frac{\lambda^n}{n!}.$$

Thus we have shown that

$$\frac{\lambda^n}{n!}\, e^{-\lambda} \le P(X_1 \ge n) \le \frac{\lambda^n}{n!}.$$

Thus in terms of convergence issues, we may assume that

$$P(X_1 \ge x) \approx \frac{\lambda^x}{x!} \approx \frac{\lambda^x}{\sqrt{2\pi x}\, e^{-x} x^x}$$

wherein we have used Stirling's formula, $x! \approx \sqrt{2\pi x}\, e^{-x} x^x$.

Now suppose that we wish to choose $c_n$ so that

$$P(X_1 \ge c_n) \approx 1/n.$$

This suggests that we need to solve the equation, $x^x = n$. Taking logarithms of this equation implies that

$$x = \frac{\ln n}{\ln x}$$

and upon iteration we find,

$$x = \frac{\ln n}{\ln\left(\frac{\ln n}{\ln x}\right)} = \frac{\ln n}{\ell_2(n) - \ell_2(x)} = \frac{\ln n}{\ell_2(n) - \ell_2\left(\frac{\ln n}{\ln x}\right)} = \frac{\ln n}{\ell_2(n) - \ell_3(n) + \ell_3(x)},$$

where $\ell_k := \underbrace{\ln \circ \ln \circ \cdots \circ \ln}_{k\text{ - times}}$. Since $x \le \ln(n)$, it follows that $\ell_3(x) \le \ell_3(n)$ and hence

$$x = \frac{\ln(n)}{\ell_2(n) + O(\ell_3(n))} = \frac{\ln(n)}{\ell_2(n)}\left(1 + O\left(\frac{\ell_3(n)}{\ell_2(n)}\right)\right).$$

Thus we are led to take $c_n := \frac{\ln(n)}{\ell_2(n)}$. We then have, for $\alpha \in (0, \infty)$, that

$$\begin{aligned}
(\alpha c_n)^{\alpha c_n} &= \exp(\alpha c_n[\ln\alpha + \ln c_n]) = \exp\left(\alpha\,\frac{\ln(n)}{\ell_2(n)}\,[\ln\alpha + \ell_2(n) - \ell_3(n)]\right) \\
&= \exp\left(\alpha\left(\frac{\ln\alpha - \ell_3(n)}{\ell_2(n)} + 1\right)\ln(n)\right) = n^{\alpha(1 + \varepsilon_n(\alpha))}
\end{aligned}$$

where

$$\varepsilon_n(\alpha) := \frac{\ln\alpha - \ell_3(n)}{\ell_2(n)}.$$

Hence we have

$$P(X_1 \ge \alpha c_n) \approx \frac{\lambda^{\alpha c_n}}{\sqrt{2\pi\alpha c_n}\, e^{-\alpha c_n}\,(\alpha c_n)^{\alpha c_n}} = \frac{(\lambda e)^{\alpha c_n}}{\sqrt{2\pi\alpha c_n}}\, \frac{1}{n^{\alpha(1+\varepsilon_n(\alpha))}}.$$

Since

$$\ln(\lambda e)^{\alpha c_n} = \alpha c_n \ln(\lambda e) = \alpha\,\frac{\ln n}{\ell_2(n)}\,\ln(\lambda e) = \ln n \cdot \frac{\alpha\ln(\lambda e)}{\ell_2(n)},$$

it follows that

$$(\lambda e)^{\alpha c_n} = n^{\frac{\alpha\ln(\lambda e)}{\ell_2(n)}}.$$

Therefore,

$$P(X_1 \ge \alpha c_n) \approx \frac{n^{\frac{\alpha\ln(\lambda e)}{\ell_2(n)}}}{\sqrt{2\pi\alpha\,\frac{\ln(n)}{\ell_2(n)}}}\, \frac{1}{n^{\alpha(1+\varepsilon_n(\alpha))}} = \sqrt{\frac{\ell_2(n)}{2\pi\alpha\ln(n)}}\, \frac{1}{n^{\alpha(1+\delta_n(\alpha))}}$$

where $\delta_n(\alpha) := \varepsilon_n(\alpha) - \frac{\ln(\lambda e)}{\ell_2(n)} \to 0$ as $n \to \infty$. From this observation, we may show,

$$\sum_{n=1}^{\infty} P(X_1 \ge \alpha c_n) < \infty \text{ if } \alpha > 1 \quad \text{and} \quad \sum_{n=1}^{\infty} P(X_1 \ge \alpha c_n) = \infty \text{ if } \alpha < 1,$$

and so by Proposition 10.43 we may conclude that

$$\limsup_{n\to\infty} \frac{X_n}{\ln(n)/\ell_2(n)} = 1 \text{ a.s.}$$
10.8 Kolmogorov and Hewitt-Savage Zero-One Laws

Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables on a measurable space, $(\Omega, \mathcal{B})$. Let $\mathcal{B}_n := \sigma(X_1, \ldots, X_n)$, $\mathcal{B}_{\infty} := \sigma(X_1, X_2, \ldots)$, $\mathcal{T}_n := \sigma(X_{n+1}, X_{n+2}, \ldots)$, and $\mathcal{T} := \cap_{n=1}^{\infty} \mathcal{T}_n \subset \mathcal{B}_{\infty}$. We call $\mathcal{T}$ the tail $\sigma$-field and events, $A \in \mathcal{T}$, are called tail events.
Example 10.47. Let $S_n := X_1 + \cdots + X_n$ and $\{b_n\}_{n=1}^{\infty} \subset (0, \infty)$ such that $b_n \uparrow \infty$. Here are some examples of tail events and tail measurable random variables:

1. $\big\{\sum_{n=1}^{\infty} X_n \text{ converges}\big\} \in \mathcal{T}$. Indeed,

$$\left\{\sum_{k=1}^{\infty} X_k \text{ converges}\right\} = \left\{\sum_{k=n+1}^{\infty} X_k \text{ converges}\right\} \in \mathcal{T}_n \text{ for all } n \in \mathbb{N}.$$

2. Both $\limsup_{n\to\infty} X_n$ and $\liminf_{n\to\infty} X_n$ are $\mathcal{T}$-measurable, as are $\limsup_{n\to\infty}\frac{S_n}{b_n}$ and $\liminf_{n\to\infty}\frac{S_n}{b_n}$.

3. $\big\{\lim X_n \text{ exists in } \bar{\mathbb{R}}\big\} = \big\{\limsup_{n\to\infty} X_n = \liminf_{n\to\infty} X_n\big\} \in \mathcal{T}$ and similarly,

$$\left\{\lim \frac{S_n}{b_n} \text{ exists in } \bar{\mathbb{R}}\right\} = \left\{\limsup_{n\to\infty} \frac{S_n}{b_n} = \liminf_{n\to\infty} \frac{S_n}{b_n}\right\} \in \mathcal{T}$$

and

$$\left\{\lim \frac{S_n}{b_n} \text{ exists in } \mathbb{R}\right\} = \left\{-\infty < \limsup_{n\to\infty} \frac{S_n}{b_n} = \liminf_{n\to\infty} \frac{S_n}{b_n} < \infty\right\} \in \mathcal{T}.$$

4. $\big\{\lim_{n\to\infty} \frac{S_n}{b_n} = 0\big\} \in \mathcal{T}$. Indeed, for any $k \in \mathbb{N}$,

$$\lim_{n\to\infty} \frac{S_n}{b_n} = \lim_{n\to\infty} \frac{X_{k+1} + \cdots + X_n}{b_n}$$

from which it follows that $\big\{\lim_{n\to\infty} \frac{S_n}{b_n} = 0\big\} \in \mathcal{T}_k$ for all $k$.
Definition 10.48. Let $(\Omega, \mathcal{B}, P)$ be a probability space. A $\sigma$-field, $\mathcal{F} \subset \mathcal{B}$, is almost trivial iff $P(\mathcal{F}) = \{0, 1\}$, i.e. $P(A) \in \{0, 1\}$ for all $A \in \mathcal{F}$.

The following conditions on a sub-$\sigma$-algebra, $\mathcal{F} \subset \mathcal{B}$, are equivalent; 1) $\mathcal{F}$ is almost trivial, 2) $P(A) = P(A)^2$ for all $A \in \mathcal{F}$, and 3) $\mathcal{F}$ is independent of itself. For example, if $\mathcal{F}$ is independent of itself, then $P(A) = P(A \cap A) = P(A)\,P(A)$ for all $A \in \mathcal{F}$, which implies $P(A) = 0$ or $1$. If $\mathcal{F}$ is almost trivial and $A, B \in \mathcal{F}$, then $P(A \cap B) = 1 = P(A)\,P(B)$ if $P(A) = P(B) = 1$ and $P(A \cap B) = 0 = P(A)\,P(B)$ if either $P(A) = 0$ or $P(B) = 0$. Therefore $\mathcal{F}$ is independent of itself.
Lemma 10.49. Suppose that $X : \Omega \to \bar{\mathbb{R}}$ is a random variable which is $\mathcal{F}$-measurable, where $\mathcal{F} \subset \mathcal{B}$ is almost trivial. Then there exists $c \in \bar{\mathbb{R}}$ such that $X = c$ a.s.

Proof. Since $\{X = \infty\}$ and $\{X = -\infty\}$ are in $\mathcal{F}$, if $P(X = \infty) > 0$ or $P(X = -\infty) > 0$, then $P(X = \infty) = 1$ or $P(X = -\infty) = 1$ respectively. Hence, it suffices to finish the proof under the added condition that $P(X \in \mathbb{R}) = 1$.

For each $x \in \mathbb{R}$, $\{X \le x\} \in \mathcal{F}$ and therefore, $P(X \le x)$ is either $0$ or $1$. Since the function, $F(x) := P(X \le x) \in \{0, 1\}$, is right continuous, non-decreasing and $F(-\infty) = 0$ and $F(+\infty) = 1$, there is a unique point $c \in \mathbb{R}$ where $F(c) = 1$ and $F(c-) = 0$. At this point, we have $P(X = c) = 1$.

Alternatively, if $X : \Omega \to \mathbb{R}$ is an integrable $\mathcal{F}$-measurable random variable, we know that $X$ is independent of itself and therefore $X^2$ is integrable and $EX^2 = (EX)^2 =: c^2$. Thus it follows that $E\big[(X - c)^2\big] = 0$, i.e. $X = c$ a.s. For general $X : \Omega \to \mathbb{R}$, let $X_M := (M \wedge X) \vee (-M)$; then $X_M = EX_M$ a.s. For sufficiently large $M$ we know by MCT that $P(|X| < M) > 0$ and since $X = X_M = EX_M$ a.s. on $\{|X| < M\}$, it follows that $c = EX_M$ is constant independent of $M$ for $M$ large. Therefore, $X = \lim_{M\to\infty} X_M \overset{\text{a.s.}}{=} \lim_{M\to\infty} c = c$.
Proposition 10.50 (Kolmogorov's Zero-One Law). Suppose that $P$ is a probability measure on $(\Omega, \mathcal{B})$ such that $\{X_n\}_{n=1}^{\infty}$ are independent random variables. Then $\mathcal{T}$ is almost trivial, i.e. $P(A) \in \{0, 1\}$ for all $A \in \mathcal{T}$. In particular the tail events in Example 10.47 have probability either $0$ or $1$.

Proof. For each $n \in \mathbb{N}$, $\mathcal{T} \subset \sigma(X_{n+1}, X_{n+2}, \ldots)$, which is independent of $\mathcal{B}_n := \sigma(X_1, \ldots, X_n)$. Therefore $\mathcal{T}$ is independent of $\cup_{n=1}^{\infty}\mathcal{B}_n$, which is a multiplicative system, and hence $\mathcal{T}$ is independent of $\mathcal{B}_{\infty} = \sigma\big(\cup_{n=1}^{\infty}\mathcal{B}_n\big)$. As $\mathcal{T} \subset \mathcal{B}_{\infty}$ it follows that $\mathcal{T}$ is independent of itself, i.e. $\mathcal{T}$ is almost trivial.


Corollary 10.51. Keeping the assumptions in Proposition 10.50 and let
b
n

n=1
(0, ) such that b
n
. Then limsup
n
X
n
, liminf
n
X
n
,
limsup
n
Sn
bn
, and liminf
n
Sn
bn
are all constant almost surely. In particular, ei-
ther P
__
lim
n
Sn
bn
exists
__
= 0 or P
__
lim
n
Sn
bn
exists
__
= 1 and in the latter
case lim
n
Sn
bn
= c a.s for some c

1.
Example 10.52. Suppose that A
n

n=1
are independent sets and let X
n
:= 1
An
for all n and T =
n1
(X
n
, X
n+1
, . . . ) . Then A
n
i.o. T and therefore
by the Kolmogorov 0-1 law, P (A
n
i.o.) = 0 or 1. Of course, in this case the
Borel zero - one law (Lemma 10.41) tells when P (A
n
i.o.) is 0 and when it
is 1 depending on whether

n=1
P (A
n
) is nite or innite respectively.
10.8.1 Hewitt-Savage Zero-One Law

In this subsection, let $\Omega := \mathbb{R}^{\infty} = \mathbb{R}^{\mathbb{N}}$ and $X_n(\omega) = \omega_n$ for all $\omega \in \Omega$ and $n \in \mathbb{N}$, and let $\mathcal{B} := \sigma(X_1, X_2, \ldots)$ be the product $\sigma$-algebra on $\Omega$. We say a permutation (i.e. a bijective map on $\mathbb{N}$), $\pi : \mathbb{N} \to \mathbb{N}$, is finite if $\pi(n) = n$ for a.a. $n$. Define $T_{\pi} : \Omega \to \Omega$ by $T_{\pi}(\omega) = (\omega_{\pi 1}, \omega_{\pi 2}, \ldots)$. Since $X_i \circ T_{\pi}(\omega) = \omega_{\pi i} = X_{\pi i}(\omega)$ for all $i$, it follows that $T_{\pi}$ is $\mathcal{B}/\mathcal{B}$-measurable.

Let us further suppose that $\mu$ is a probability measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ and let $P = \otimes_{n=1}^{\infty}\mu$ be the infinite product measure on $\big(\Omega = \mathbb{R}^{\mathbb{N}}, \mathcal{B}\big)$. Then $\{X_n\}_{n=1}^{\infty}$ are i.i.d. random variables with $\operatorname{Law}_P(X_n) = \mu$ for all $n$. If $\pi : \mathbb{N} \to \mathbb{N}$ is a finite permutation and $A_i \in \mathcal{B}_{\mathbb{R}}$ for all $i$, then

$$T_{\pi}^{-1}(A_1 \times A_2 \times A_3 \times \cdots) = A_{\pi^{-1}1} \times A_{\pi^{-1}2} \times \cdots.$$

Since sets of the form, $A_1 \times A_2 \times A_3 \times \cdots$, form a $\pi$-system generating $\mathcal{B}$ and

$$P \circ T_{\pi}^{-1}(A_1 \times A_2 \times A_3 \times \cdots) = \prod_{i=1}^{\infty}\mu\big(A_{\pi^{-1}i}\big) = \prod_{i=1}^{\infty}\mu(A_i) = P(A_1 \times A_2 \times A_3 \times \cdots),$$

we may conclude that $P \circ T_{\pi}^{-1} = P$.
Definition 10.53. The permutation invariant $\sigma$-field, $\mathcal{S} \subset \mathcal{B}$, is the collection of sets, $A \in \mathcal{B}$, such that $T_{\pi}^{-1}(A) = A$ for all finite permutations $\pi$. (You should check that $\mathcal{S}$ is a $\sigma$-field!)
Proposition 10.54 (Hewitt-Savage Zero-One Law). Let $\mu$ be a probability measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ and $P = \otimes_{n=1}^{\infty}\mu$ be the infinite product measure on $\big(\Omega = \mathbb{R}^{\mathbb{N}}, \mathcal{B}\big)$, so that $\{X_n\}_{n=1}^{\infty}$ (recall that $X_n(\omega) = \omega_n$) is an i.i.d. sequence with $\operatorname{Law}_P(X_n) = \mu$ for all $n$. Then $\mathcal{S}$ is $P$-almost trivial.
Proof. Let $B \in \mathcal{S}$, $f = 1_B$, and $g = G(X_1, \ldots, X_n)$ be a $\sigma(X_1, X_2, \ldots, X_n)$-measurable function such that $\sup_{\omega}|g(\omega)| \le 1$. Further let $\pi$ be a finite permutation such that $\{\pi 1, \ldots, \pi n\} \cap \{1, 2, \ldots, n\} = \emptyset$; for example we could take $\pi(j) = j + n$, $\pi(j + n) = j$ for $j = 1, 2, \ldots, n$, and $\pi(j + 2n) = j + 2n$ for all $j \in \mathbb{N}$. Then $g \circ T_{\pi} = G(X_{\pi 1}, \ldots, X_{\pi n})$ is independent of $g$ and therefore,

$$(Eg)^2 = Eg \cdot E[g \circ T_{\pi}] = E[g \cdot g \circ T_{\pi}].$$

Since $f \circ T_{\pi} = 1_{T_{\pi}^{-1}(B)} = 1_B = f$, it follows that $Ef = Ef^2 = E[f \cdot f \circ T_{\pi}]$ and therefore,

$$\begin{aligned}
\big|Ef - (Eg)^2\big| &= |E[f \cdot f \circ T_{\pi} - g \cdot g \circ T_{\pi}]| \\
&\le E\big[|f - g|\, |f \circ T_{\pi}|\big] + E\big[|g|\, |f \circ T_{\pi} - g \circ T_{\pi}|\big] \\
&\le E|f - g| + E|f \circ T_{\pi} - g \circ T_{\pi}| = 2E|f - g|. \quad (10.40)
\end{aligned}$$

According to Corollary 8.13 (or see Corollary 5.28 or Theorem 5.44 or Exercise 8.5), we may choose $g = g_k$ as above with $E|f - g_k| \to 0$ as $k \to \infty$ and so, passing to the limit in Eq. (10.40) with $g = g_k$, we may conclude,

$$\big|P(B) - [P(B)]^2\big| = \big|Ef - (Ef)^2\big| \le 0.$$

That is $P(B) \in \{0, 1\}$ for all $B \in \mathcal{S}$.

In a nutshell, here is the crux of the above proof. First off we know that for $B \in \mathcal{S} \subset \mathcal{B}$, there exists $g$ which is $\sigma(X_1, \ldots, X_n)$-measurable such that $f := 1_B \cong g$. Since $P \circ T_{\pi}^{-1} = P$ it also follows that $f = f \circ T_{\pi} \cong g \circ T_{\pi}$. For judiciously chosen $\pi$, we know that $g$ and $g \circ T_{\pi}$ are independent. Therefore

$$Ef^2 = E[f \cdot f \circ T_{\pi}] \cong E[g \cdot g \circ T_{\pi}] = E[g]\, E[g \circ T_{\pi}] = (Eg)^2 \cong (Ef)^2.$$

As the approximation of $f$ by $g$ may be made as accurate as we please, it follows that $P(B) = Ef^2 = (Ef)^2 = [P(B)]^2$ for all $B \in \mathcal{S}$.
Example 10.55 (Some Random Walk 0-1 Law Results). Continue the notation in Proposition 10.54.

1. As above, if $S_n = X_1 + \cdots + X_n$, then $P(S_n \in B \text{ i.o.}) \in \{0, 1\}$ for all $B \in \mathcal{B}_{\mathbb{R}}$. Indeed, if $\pi$ is a finite permutation,

$$T_{\pi}^{-1}(\{S_n \in B \text{ i.o.}\}) = \{S_n \circ T_{\pi} \in B \text{ i.o.}\} = \{S_n \in B \text{ i.o.}\}.$$

Hence $\{S_n \in B \text{ i.o.}\}$ is in the permutation invariant $\sigma$-field, $\mathcal{S}$. The same goes for $\{S_n \in B \text{ a.a.}\}$.

2. If $P(X_1 \ne 0) > 0$, then $\limsup_{n\to\infty} S_n = \infty$ a.s. or $\limsup_{n\to\infty} S_n = -\infty$ a.s. Indeed,

$$T_{\pi}^{-1}\left(\left\{\limsup_{n\to\infty} S_n \le x\right\}\right) = \left\{\limsup_{n\to\infty} S_n \circ T_{\pi} \le x\right\} = \left\{\limsup_{n\to\infty} S_n \le x\right\}$$

which shows that $\limsup_{n\to\infty} S_n$ is $\mathcal{S}$-measurable. Therefore, $\limsup_{n\to\infty} S_n = c$ a.s. for some $c \in \bar{\mathbb{R}}$. Since $(X_2, X_3, \ldots) \overset{d}{=} (X_1, X_2, \ldots)$, it follows (see Corollary 6.47 and Exercise 6.10) that

$$c = \limsup_{n\to\infty} S_n \overset{d}{=} \limsup_{n\to\infty}(X_2 + X_3 + \cdots + X_{n+1}) = \limsup_{n\to\infty}(S_{n+1} - X_1) = \left(\limsup_{n\to\infty} S_{n+1}\right) - X_1 = c - X_1.$$

By Exercise 10.11 below we may now conclude that $c = c - X_1$ a.s., which is possible iff $c \in \{\pm\infty\}$ or $X_1 = 0$ a.s. Since the latter is not allowed, $\limsup_{n\to\infty} S_n = \infty$ or $\limsup_{n\to\infty} S_n = -\infty$ a.s.

3. Now assume that $P(X_1 \ne 0) > 0$ and $X_1 \overset{d}{=} -X_1$, i.e. $P(X_1 \in A) = P(-X_1 \in A)$ for all $A \in \mathcal{B}_{\mathbb{R}}$. By 2. we know $\limsup_{n\to\infty} S_n = c$ a.s. with $c \in \{\pm\infty\}$. Since $\{X_n\}_{n=1}^{\infty}$ and $\{-X_n\}_{n=1}^{\infty}$ are i.i.d. and $-X_n \overset{d}{=} X_n$, it follows that $\{X_n\}_{n=1}^{\infty} \overset{d}{=} \{-X_n\}_{n=1}^{\infty}$. The results of Exercises 6.10 and 10.11 then imply that $c \overset{d}{=} \limsup_{n\to\infty} S_n \overset{d}{=} \limsup_{n\to\infty}(-S_n)$ and in particular,

$$c \overset{\text{a.s.}}{=} \limsup_{n\to\infty}(-S_n) = -\liminf_{n\to\infty} S_n \ge -\limsup_{n\to\infty} S_n = -c.$$

Since $c = -\infty$ does not satisfy $c \ge -c$, we must have $c = \infty$. Hence in this symmetric case we have shown,

$$\limsup_{n\to\infty} S_n = \infty \text{ and } \liminf_{n\to\infty} S_n = -\infty \text{ a.s.}$$
Exercise 10.11. Suppose that $(\Omega, \mathcal{B}, P)$ is a probability space, $Y : \Omega \to \bar{\mathbb{R}}$ is a random variable and $c \in \bar{\mathbb{R}}$ is a constant. Then $Y = c$ a.s. iff $Y \overset{d}{=} c$.

Solution to Exercise (10.11). If $Y = c$ a.s. then $P(Y \in A) = P(c \in A)$ for all $A \in \mathcal{B}_{\mathbb{R}}$ and therefore $Y \overset{d}{=} c$. Conversely, if $Y \overset{d}{=} c$, then $P(Y = c) = P(c = c) = 1$, i.e. $Y = c$ a.s.
10.9 Another Construction of Independent Random Variables*

This section may be skipped as the results are a special case of those given above. The arguments given here avoid the use of Kolmogorov's existence theorem for product measures.
Example 10.56. Suppose that $\Omega = \Lambda^n$ where $\Lambda$ is a finite set, $\mathcal{B} = 2^\Omega$, $P(\{\omega\}) = \prod_{j=1}^n q_j(\omega_j)$ where $q_j : \Lambda \to [0, 1]$ are functions such that $\sum_{\lambda \in \Lambda} q_j(\lambda) = 1$. Let

$$\mathcal{C}_i := \left\{\Lambda^{i-1} \times A \times \Lambda^{n-i} : A \subset \Lambda\right\}.$$

Then $\{\mathcal{C}_i\}_{i=1}^n$ are independent. Indeed, if $B_i := \Lambda^{i-1} \times A_i \times \Lambda^{n-i}$, then $\cap_{i=1}^n B_i = A_1 \times A_2 \times \dots \times A_n$ and we have
Page: 165 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
$$P\left(\cap_{i=1}^n B_i\right) = \sum_{\omega \in A_1 \times A_2 \times \dots \times A_n} \prod_{i=1}^n q_i(\omega_i) = \prod_{i=1}^n \sum_{\lambda \in A_i} q_i(\lambda)$$

while

$$P(B_i) = \sum_{\omega \in \Lambda^{i-1} \times A_i \times \Lambda^{n-i}} \prod_{j=1}^n q_j(\omega_j) = \sum_{\lambda \in A_i} q_i(\lambda).$$
Example 10.57. Continue the notation of Example 10.56 and further assume that $\Lambda \subset \mathbb{R}$, and let $X_i : \Omega \to \Lambda$ be defined by $X_i(\omega) = \omega_i$. Then $\{X_i\}_{i=1}^n$ are independent random variables. Indeed, $\sigma(X_i) = \mathcal{C}_i$ with $\mathcal{C}_i$ as in Example 10.56.
Alternatively, from Exercise 4.10, we know that

$$E_P\left[\prod_{i=1}^n f_i(X_i)\right] = \prod_{i=1}^n E_P\left[f_i(X_i)\right]$$

for all $f_i : \Lambda \to \mathbb{R}$. Taking $A_i \subset \Lambda$ and $f_i := 1_{A_i}$ in the above identity shows that

$$P(X_1 \in A_1, \dots, X_n \in A_n) = E_P\left[\prod_{i=1}^n 1_{A_i}(X_i)\right] = \prod_{i=1}^n E_P\left[1_{A_i}(X_i)\right] = \prod_{i=1}^n P(X_i \in A_i)$$

as desired.
Theorem 10.58 (Existence of i.i.d. simple R.V.s). Suppose that $\{q_i\}_{i=0}^n$ is a sequence of positive numbers such that $\sum_{i=0}^n q_i = 1$. Then there exists a sequence $\{X_k\}_{k=1}^\infty$ of simple random variables taking values in $\Lambda = \{0, 1, 2, \dots, n\}$ on $((0,1], \mathcal{B}, m)$ such that

$$m(\{X_1 = i_1, \dots, X_k = i_k\}) = q_{i_1} \cdots q_{i_k}$$

for all $i_1, i_2, \dots, i_k \in \{0, 1, 2, \dots, n\}$ and all $k \in \mathbb{N}$. (See Example 10.15 above and Theorem 10.62 below for the general case of this theorem.)
Proof. For $i = 0, 1, \dots, n$, let $\sigma_{-1} = 0$ and $\sigma_j := \sum_{i=0}^j q_i$, and for any interval $(a, b]$, let

$$T_i((a,b]) := \left(a + \sigma_{i-1}(b-a),\ a + \sigma_i(b-a)\right].$$

Given $i_1, i_2, \dots, i_k \in \{0, 1, 2, \dots, n\}$, let

$$J_{i_1, i_2, \dots, i_k} := T_{i_k}\left(T_{i_{k-1}}\left(\dots T_{i_1}((0,1])\right)\right)$$

and define $\{X_k\}_{k=1}^\infty$ on $(0,1]$ by

$$X_k := \sum_{i_1, i_2, \dots, i_k \in \{0,1,2,\dots,n\}} i_k\, 1_{J_{i_1, i_2, \dots, i_k}},$$

see Figure 10.3. Repeated applications of Corollary 6.27 show that the functions $X_k : (0,1] \to \mathbb{R}$ are measurable.
Fig. 10.3. Here we suppose that $p_0 = 2/3$ and $p_1 = 1/3$ and then we construct $J_l$ and $J_{l,k}$ for $l, k \in \{0, 1\}$.
Observe that

$$m(T_i((a,b])) = q_i(b-a) = q_i\, m((a,b]), \tag{10.41}$$

and so by induction,

$$m(J_{i_1, i_2, \dots, i_k}) = q_{i_k} q_{i_{k-1}} \cdots q_{i_1}.$$

The reader should convince herself/himself that

$$\{X_1 = i_1, \dots, X_k = i_k\} = J_{i_1, i_2, \dots, i_k}$$
and therefore, we have

$$m(\{X_1 = i_1, \dots, X_k = i_k\}) = m(J_{i_1, i_2, \dots, i_k}) = q_{i_k} q_{i_{k-1}} \cdots q_{i_1}$$

as desired.
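The interval-splitting in the proof is easy to experiment with numerically. The following Python sketch (not part of the formal development; the two-point weights $2/3$, $1/3$ of Figure 10.3 are just one illustrative choice) builds the intervals $J_{i_1,\dots,i_k}$ and checks that their lengths are the products $q_{i_1} \cdots q_{i_k}$.

```python
import itertools

def split(interval, q):
    """Split (a, b] into the subintervals T_i((a, b]) of proportional lengths q_i."""
    a, b = interval
    sigma, pieces = 0.0, []
    for qi in q:
        pieces.append((a + sigma * (b - a), a + (sigma + qi) * (b - a)))
        sigma += qi
    return pieces

def J(indices, q):
    """J_{i_1,...,i_k} = T_{i_k}(T_{i_{k-1}}(... T_{i_1}((0, 1]))): T_{i_1} is applied first."""
    interval = (0.0, 1.0)
    for i in indices:
        interval = split(interval, q)[i]
    return interval

q = [2 / 3, 1 / 3]
for idx in itertools.product(range(len(q)), repeat=3):
    a, b = J(idx, q)
    prod = 1.0
    for i in idx:
        prod *= q[i]
    # m(J_{i_1,...,i_k}) = q_{i_1} * ... * q_{i_k}
    assert abs((b - a) - prod) < 1e-12
print("all interval lengths equal the corresponding products of the q_i")
```

On each level the splitting refines the previous one, which is exactly why the digits $X_k$ come out independent.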
Corollary 10.59 (Independent variables on product spaces). Suppose $\Lambda = \{0, 1, 2, \dots, n\}$, $q_i > 0$ with $\sum_{i=0}^n q_i = 1$, $\Omega = \Lambda^\infty = \Lambda^{\mathbb{N}}$, and for $i \in \mathbb{N}$, let $Y_i : \Omega \to \mathbb{R}$ be defined by $Y_i(\omega) = \omega_i$ for all $\omega \in \Omega$. Further let $\mathcal{B} := \sigma(Y_1, Y_2, \dots, Y_n, \dots)$. Then there exists a unique probability measure $P : \mathcal{B} \to [0,1]$ such that

$$P(\{Y_1 = i_1, \dots, Y_k = i_k\}) = q_{i_1} \cdots q_{i_k}.$$
Proof. Let $\{X_i\}_{i=1}^\infty$ be as in Theorem 10.58 and define $T : (0,1] \to \Omega$ by

$$T(x) = (X_1(x), X_2(x), \dots, X_k(x), \dots).$$

Observe that $T$ is measurable since $Y_i \circ T = X_i$ is measurable for all $i$. We now define $P := T_* m$. Then we have

$$P(\{Y_1 = i_1, \dots, Y_k = i_k\}) = m\left(T^{-1}(\{Y_1 = i_1, \dots, Y_k = i_k\})\right) = m(\{Y_1 \circ T = i_1, \dots, Y_k \circ T = i_k\}) = m(\{X_1 = i_1, \dots, X_k = i_k\}) = q_{i_1} \cdots q_{i_k}.$$
Theorem 10.60. Given a finite subset $\Lambda \subset \mathbb{R}$ and a function $q : \Lambda \to [0, 1]$ such that $\sum_{\lambda \in \Lambda} q(\lambda) = 1$, there exists a probability space $(\Omega, \mathcal{B}, P)$ and an independent sequence of random variables $\{X_n\}_{n=1}^\infty$ such that $P(X_n = \lambda) = q(\lambda)$ for all $\lambda \in \Lambda$.

Proof. Use Corollary 10.10 to show that the random variables constructed in Example 5.41 or Theorem 10.58 fit the bill.
Proposition 10.61. Suppose that $\{X_n\}_{n=1}^\infty$ is a sequence of i.i.d. random variables with distribution $P(X_n = 0) = P(X_n = 1) = \frac{1}{2}$. If we let $U := \sum_{n=1}^\infty 2^{-n} X_n$, then $P(U \leq x) = (0 \vee x) \wedge 1$, i.e. $U$ has the uniform distribution on $[0,1]$.
Proof. Let us recall that $P(\{X_n = 0 \text{ a.a.}\}) = 0 = P(\{X_n = 1 \text{ a.a.}\})$. Hence we may, by shrinking $\Omega$ if necessary, assume that $\{X_n = 0 \text{ a.a.}\} = \emptyset = \{X_n = 1 \text{ a.a.}\}$. With this simplification, we have

$$\left\{U < \tfrac{1}{2}\right\} = \{X_1 = 0\}, \quad \left\{U < \tfrac{1}{4}\right\} = \{X_1 = 0, X_2 = 0\} \text{ and } \left\{\tfrac{1}{2} \leq U < \tfrac{3}{4}\right\} = \{X_1 = 1, X_2 = 0\}$$

and hence that

$$\left\{U < \tfrac{3}{4}\right\} = \left\{U < \tfrac{1}{2}\right\} \cup \left\{\tfrac{1}{2} \leq U < \tfrac{3}{4}\right\} = \{X_1 = 0\} \cup \{X_1 = 1, X_2 = 0\}.$$

From these identities, it follows that

$$P(U < 0) = 0, \quad P\left(U < \tfrac{1}{4}\right) = \tfrac{1}{4}, \quad P\left(U < \tfrac{1}{2}\right) = \tfrac{1}{2}, \text{ and } P\left(U < \tfrac{3}{4}\right) = \tfrac{3}{4}.$$

More generally, we claim that if $x = \sum_{j=1}^n \varepsilon_j 2^{-j}$ with $\varepsilon_j \in \{0, 1\}$, then

$$P(U < x) = x. \tag{10.42}$$

The proof is by induction on $n$. Indeed, we have already verified (10.42) when $n = 1, 2$. Suppose we have verified (10.42) up to some $n \in \mathbb{N}$ and let $x = \sum_{j=1}^n \varepsilon_j 2^{-j}$ and consider

$$P\left(U < x + 2^{-(n+1)}\right) = P(U < x) + P\left(x \leq U < x + 2^{-(n+1)}\right) = x + P\left(x \leq U < x + 2^{-(n+1)}\right).$$

Since

$$\left\{x \leq U < x + 2^{-(n+1)}\right\} = \left(\cap_{j=1}^n \{X_j = \varepsilon_j\}\right) \cap \{X_{n+1} = 0\}$$

we see that

$$P\left(x \leq U < x + 2^{-(n+1)}\right) = 2^{-(n+1)}$$

and hence

$$P\left(U < x + 2^{-(n+1)}\right) = x + 2^{-(n+1)}$$

which completes the induction argument.

Since $x \to P(U < x)$ is left continuous we may now conclude that $P(U < x) = x$ for all $x \in (0,1)$ and since $x \to x$ is continuous we may also deduce that $P(U \leq x) = x$ for all $x \in (0,1)$. Hence we may conclude that $P(U \leq x) = (0 \vee x) \wedge 1$.
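The claim of Proposition 10.61 is easy to check by simulation. The sketch below (an informal illustration, truncating the series at 30 bits) builds samples of $U$ from fair-coin bits and compares the empirical CDF with the uniform CDF at a few dyadic points.

```python
import random

random.seed(0)

def sample_U(n_bits=30):
    """Truncated version of U = sum_{n>=1} 2^{-n} X_n with fair-coin bits X_n."""
    return sum(random.getrandbits(1) * 2.0 ** -(n + 1) for n in range(n_bits))

samples = sorted(sample_U() for _ in range(20000))
# Empirical CDF at dyadic points should be close to x itself.
for x in (0.25, 0.5, 0.75):
    ecdf = sum(u <= x for u in samples) / len(samples)
    assert abs(ecdf - x) < 0.02, (x, ecdf)
print("empirical CDF of U is close to the uniform CDF")
```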
We may now show the existence of independent random variables with arbitrary distributions.

Theorem 10.62. Suppose that $\{\mu_n\}_{n=1}^\infty$ is a sequence of probability measures on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$. Then there exists a probability space $(\Omega, \mathcal{B}, P)$ and a sequence $\{Y_n\}_{n=1}^\infty$ of independent random variables with $\operatorname{Law}(Y_n) := P \circ Y_n^{-1} = \mu_n$ for all $n$.
Proof. By Theorem 10.60, there exists a sequence of i.i.d. random variables $\{Z_n\}_{n=1}^\infty$ such that $P(Z_n = 1) = P(Z_n = 0) = \frac{1}{2}$. These random variables may be put into a two dimensional array, $\{X_{i,j} : i, j \in \mathbb{N}\}$, see the proof of Lemma 3.8. For each $i$, let $U_i := \sum_{j=1}^\infty 2^{-j} X_{i,j}$ — a $\sigma\left(\{X_{i,j}\}_{j=1}^\infty\right)$-measurable random variable. According to Proposition 10.61, $U_i$ is uniformly distributed on $[0,1]$. Moreover by the grouping Lemma 10.16, $\left\{\sigma\left(\{X_{i,j}\}_{j=1}^\infty\right)\right\}_{i=1}^\infty$ are independent $\sigma$-algebras and hence $\{U_i\}_{i=1}^\infty$ is a sequence of i.i.d. random variables with the uniform distribution.

Finally, let $F_i(x) := \mu_i((-\infty, x])$ for all $x \in \mathbb{R}$ and let $G_i(y) := \inf\{x : F_i(x) \geq y\}$. Then according to Theorem 6.48, $Y_i := G_i(U_i)$ has $\mu_i$ as its distribution. Moreover each $Y_i$ is $\sigma\left(\{X_{i,j}\}_{j=1}^\infty\right)$-measurable and therefore the $\{Y_i\}_{i=1}^\infty$ are independent random variables.
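The quantile-transform step $Y_i := G_i(U_i)$ in the proof is the standard inverse-transform sampling recipe. Here is a minimal sketch for one particular target law, the exponential distribution with rate $2$ (an arbitrary choice for illustration), whose CDF has an explicit inverse.

```python
import math
import random

random.seed(1)

def G(y, lam=2.0):
    """Generalized inverse of F(x) = 1 - exp(-lam * x), the E(lam) CDF."""
    return -math.log(1.0 - y) / lam

# Y := G(U) with U uniform on [0, 1) has the exponential distribution E(lam).
ys = [G(random.random()) for _ in range(50000)]
mean = sum(ys) / len(ys)
assert abs(mean - 0.5) < 0.02  # E[Y] = 1/lam = 1/2
print("sample mean:", round(mean, 3))
```

For a general $\mu_i$ the inverse is replaced by the right-continuous generalized inverse $G_i(y) = \inf\{x : F_i(x) \geq y\}$, which is what makes the construction work for laws with atoms or flat spots.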
11
The Standard Poisson Process

11.1 Poisson Random Variables

Recall from Exercise 7.5 that a random variable, $X$, is Poisson distributed with intensity $a$ if

$$P(X = k) = \frac{a^k}{k!} e^{-a} \text{ for all } k \in \mathbb{N}_0.$$
We will abbreviate this in the future by writing $X \overset{d}{=} \operatorname{Poi}(a)$. Let us also recall that

$$E\left[z^X\right] = \sum_{k=0}^\infty z^k \frac{a^k}{k!} e^{-a} = e^{az} e^{-a} = e^{a(z-1)}$$

and as in Exercise 7.5 we have $EX = a = \operatorname{Var}(X)$.
Lemma 11.1. If $X \overset{d}{=} \operatorname{Poi}(a)$ and $Y \overset{d}{=} \operatorname{Poi}(b)$ and $X$ and $Y$ are independent, then $X + Y \overset{d}{=} \operatorname{Poi}(a+b)$.

Proof. For $k \in \mathbb{N}_0$,

$$P(X + Y = k) = \sum_{l=0}^k P(X = l, Y = k - l) = \sum_{l=0}^k P(X = l) P(Y = k - l) = \sum_{l=0}^k e^{-a} \frac{a^l}{l!} e^{-b} \frac{b^{k-l}}{(k-l)!} = \frac{e^{-(a+b)}}{k!} \sum_{l=0}^k \binom{k}{l} a^l b^{k-l} = \frac{e^{-(a+b)}}{k!} (a+b)^k.$$

Alternative Proof. Notice that

$$E\left[z^{X+Y}\right] = E\left[z^X\right] E\left[z^Y\right] = e^{a(z-1)} e^{b(z-1)} = \exp((a+b)(z-1)).$$

This suffices to complete the proof.
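The convolution computation in the proof can be verified numerically term by term; the intensities $a$ and $b$ below are arbitrary.

```python
import math

def poi_pmf(k, a):
    """Poisson(a) probability mass at k."""
    return a ** k / math.factorial(k) * math.exp(-a)

a, b = 1.3, 2.1
for k in range(10):
    conv = sum(poi_pmf(l, a) * poi_pmf(k - l, b) for l in range(k + 1))
    # The convolution of Poi(a) and Poi(b) is Poi(a + b).
    assert abs(conv - poi_pmf(k, a + b)) < 1e-12
print("convolution of Poi(a) and Poi(b) matches Poi(a + b)")
```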
Lemma 11.2. Suppose that $\{N_i\}_{i=1}^\infty$ are independent Poisson random variables with parameters $\{\lambda_i\}_{i=1}^\infty$ such that $\sum_{i=1}^\infty \lambda_i = \infty$. Then $\sum_{i=1}^\infty N_i = \infty$ a.s.

Fig. 11.1. This plot shows $1 - e^{-\lambda} \geq \frac{1}{2}(1 \wedge \lambda)$.
Proof. From Figure 11.1 we see that $1 - e^{-\lambda} \geq \frac{1}{2}(1 \wedge \lambda)$ for all $\lambda \geq 0$. Therefore,

$$\sum_{i=1}^\infty P(N_i \geq 1) = \sum_{i=1}^\infty (1 - P(N_i = 0)) = \sum_{i=1}^\infty \left(1 - e^{-\lambda_i}\right) \geq \frac{1}{2} \sum_{i=1}^\infty (\lambda_i \wedge 1) = \infty$$

and so by the second Borel–Cantelli Lemma, $P(N_i \geq 1 \text{ i.o.}) = 1$. From this it certainly follows that $\sum_{i=1}^\infty N_i = \infty$ a.s.

Alternatively, let $\Lambda_n = \lambda_1 + \dots + \lambda_n$, then

$$P\left(\sum_{i=1}^\infty N_i \geq k\right) \geq P\left(\sum_{i=1}^n N_i \geq k\right) = 1 - e^{-\Lambda_n} \sum_{l=0}^{k-1} \frac{\Lambda_n^l}{l!} \to 1 \text{ as } n \to \infty.$$

Therefore $P\left(\sum_{i=1}^\infty N_i \geq k\right) = 1$ for all $k \in \mathbb{N}$ and hence

$$P\left(\sum_{i=1}^\infty N_i = \infty\right) = P\left(\cap_{k=1}^\infty \left\{\sum_{i=1}^\infty N_i \geq k\right\}\right) = 1.$$
11.2 Exponential Random Variables

Recall from Definition 7.55 that $T \overset{d}{=} E(\lambda)$ is an exponential random variable with parameter $\lambda \in [0, \infty)$ provided $P(T > t) = e^{-\lambda t}$ for all $t \geq 0$. We have seen that

$$E\left[e^{aT}\right] = \frac{1}{1 - a\lambda^{-1}} \text{ for } a < \lambda, \tag{11.1}$$

$ET = \lambda^{-1}$ and $\operatorname{Var}(T) = \lambda^{-2}$, and (see Theorem 7.56) that $T$ being exponential is characterized by the following memoryless property;

$$P(T > s + t \mid T > s) = P(T > t) \text{ for all } s, t \geq 0.$$
Theorem 11.3. Let $\{T_j\}_{j=1}^\infty$ be independent random variables such that $T_j \overset{d}{=} E(\lambda_j)$ with $0 < \lambda_j < \infty$ for all $j$. Then:

1. If $\sum_{n=1}^\infty \lambda_n^{-1} < \infty$ then $P\left(\sum_{n=1}^\infty T_n = \infty\right) = 0$ (i.e. $P\left(\sum_{n=1}^\infty T_n < \infty\right) = 1$).
2. If $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$ then $P\left(\sum_{n=1}^\infty T_n = \infty\right) = 1$.

(By Kolmogorov's zero-one law (see Proposition 10.50) it follows that $P\left(\sum_{n=1}^\infty T_n = \infty\right)$ is always either 0 or 1. We are showing here that $P\left(\sum_{n=1}^\infty T_n = \infty\right) = 1$ iff $E\left[\sum_{n=1}^\infty T_n\right] = \infty$.)
Proof. 1. Since

$$E\left[\sum_{n=1}^\infty T_n\right] = \sum_{n=1}^\infty E[T_n] = \sum_{n=1}^\infty \lambda_n^{-1} < \infty$$

it follows that $\sum_{n=1}^\infty T_n < \infty$ a.s., i.e. $P\left(\sum_{n=1}^\infty T_n = \infty\right) = 0$.

2. By the DCT, independence, and Eq. (11.1) with $a = -1$,

$$E\left[e^{-\sum_{n=1}^\infty T_n}\right] = \lim_{N\to\infty} E\left[e^{-\sum_{n=1}^N T_n}\right] = \lim_{N\to\infty} \prod_{n=1}^N E\left[e^{-T_n}\right] = \lim_{N\to\infty} \prod_{n=1}^N \left(\frac{1}{1 + \lambda_n^{-1}}\right) = \prod_{n=1}^\infty (1 - a_n)$$

where

$$a_n = 1 - \frac{1}{1 + \lambda_n^{-1}} = \frac{1}{1 + \lambda_n}.$$

Hence by Exercise 10.10, $E\left[e^{-\sum_{n=1}^\infty T_n}\right] = 0$ iff $\infty = \sum_{n=1}^\infty a_n$, which happens iff $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$, as you should verify. This completes the proof since $E\left[e^{-\sum_{n=1}^\infty T_n}\right] = 0$ iff $e^{-\sum_{n=1}^\infty T_n} = 0$ a.s., or equivalently $\sum_{n=1}^\infty T_n = \infty$ a.s.
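The convergent case of Theorem 11.3 can be illustrated by simulation. In the sketch below (an informal check, using the arbitrary choice $\lambda_n = n^2$ so that $\sum \lambda_n^{-1} = \pi^2/6 < \infty$) the partial sums of the $T_n$ stay bounded across independent runs.

```python
import random

random.seed(2)

def total_time(lams):
    """One sample of sum_n T_n with T_n ~ E(lams[n]) independent."""
    return sum(random.expovariate(lam) for lam in lams)

# lambda_n = n^2: sum of 1/lambda_n converges, so sum of T_n is finite a.s.
lams_conv = [n * n for n in range(1, 2001)]
totals = [total_time(lams_conv) for _ in range(200)]
# E[sum T_n] = sum 1/n^2 ~ 1.645, and the sums are tightly concentrated.
assert max(totals) < 20
print("mean of the convergent case:", round(sum(totals) / len(totals), 3))
```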
11.2.1 Appendix: More Properties of Exponential Random Variables*

Theorem 11.4. Let $I$ be a countable set and let $\{T_k\}_{k \in I}$ be independent random variables such that $T_k \overset{d}{=} E(q_k)$ with $q := \sum_{k \in I} q_k \in (0, \infty)$. Let $T := \inf_k T_k$ and let $K = k$ on the set where $T_j > T_k$ for all $j \neq k$. On the complement of all these sets, define $K = *$ where $*$ is some point not in $I$. Then $P(K = *) = 0$, $K$ and $T$ are independent, $T \overset{d}{=} E(q)$, and $P(K = k) = q_k/q$.
Proof. Let $k \in I$ and $t \in \mathbb{R}_+$ and let $\Lambda_n \subset_f I$ be such that $\Lambda_n \uparrow I \setminus \{k\}$, then

$$P(K = k, T > t) = P\left(\cap_{j \neq k} \{T_j > T_k\}, T_k > t\right) = \lim_{n\to\infty} P\left(\cap_{j \in \Lambda_n} \{T_j > T_k\}, T_k > t\right) = \lim_{n\to\infty} \int_{[0,\infty)^{\Lambda_n \cup \{k\}}} \prod_{j \in \Lambda_n} 1_{t_j > t_k}\, 1_{t_k > t}\, d\mu_n\left(\{t_j\}_{j \in \Lambda_n}\right) q_k e^{-q_k t_k}\, dt_k$$

where $\mu_n$ is the joint distribution of $\{T_j\}_{j \in \Lambda_n}$. So by Fubini's theorem,

$$P(K = k, T > t) = \lim_{n\to\infty} \int_t^\infty q_k e^{-q_k t_k}\, dt_k \int_{[0,\infty)^{\Lambda_n}} \prod_{j \in \Lambda_n} 1_{t_j > t_k}\, d\mu_n\left(\{t_j\}_{j \in \Lambda_n}\right)$$
$$= \lim_{n\to\infty} \int_t^\infty P\left(\cap_{j \in \Lambda_n} \{T_j > t_k\}\right) q_k e^{-q_k t_k}\, dt_k = \int_t^\infty P\left(\cap_{j \neq k} \{T_j > \tau\}\right) q_k e^{-q_k \tau}\, d\tau$$
$$= \int_t^\infty \prod_{j \neq k} e^{-q_j \tau}\, q_k e^{-q_k \tau}\, d\tau = \int_t^\infty \prod_{j \in I} e^{-q_j \tau}\, q_k\, d\tau = \int_t^\infty e^{-\sum_{j \in I} q_j \tau} q_k\, d\tau = \int_t^\infty e^{-q\tau} q_k\, d\tau = \frac{q_k}{q} e^{-qt}. \tag{11.2}$$

Taking $t = 0$ shows that $P(K = k) = \frac{q_k}{q}$, and summing this on $k$ shows $P(K \in I) = 1$ so that $P(K = *) = 0$. Moreover, summing Eq. (11.2) on $k$ now shows that $P(T > t) = e^{-qt}$ so that $T$ is exponential. Moreover we have shown that

$$P(K = k, T > t) = P(K = k) P(T > t),$$

proving the desired independence.
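Theorem 11.4 is the "exponential race": the minimum is exponential with the total rate and the winner's identity has probabilities proportional to the rates. A quick Monte-Carlo sketch (rates $0.5, 1.0, 1.5$ are an arbitrary choice):

```python
import random

random.seed(3)
q = [0.5, 1.0, 1.5]          # rates q_k; total rate q = 3.0
wins = [0, 0, 0]
n = 30000
for _ in range(n):
    times = [random.expovariate(qk) for qk in q]
    wins[times.index(min(times))] += 1
# P(K = k) = q_k / q
for k, qk in enumerate(q):
    assert abs(wins[k] / n - qk / 3.0) < 0.02
print("winner frequencies:", [round(w / n, 3) for w in wins])
```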
Theorem 11.5. Suppose that $S \overset{d}{=} E(\lambda)$ and $R \overset{d}{=} E(\mu)$ are independent. Then for $t \geq 0$ we have

$$\mu P(S \leq t < S + R) = \lambda P(R \leq t < R + S).$$

(When $\lambda = \mu$ this reduces to $P(S \leq t < S + R) = P(R \leq t < R + S)$; for $\lambda \neq \mu$ the rate factors are needed, as the computation below shows.)
Proof. We have

$$P(S \leq t < S + R) = \int_0^t \lambda e^{-\lambda s} P(t < s + R)\, ds = \int_0^t \lambda e^{-\lambda s} e^{-\mu(t-s)}\, ds = \lambda e^{-\mu t} \int_0^t e^{-(\lambda - \mu)s}\, ds = \lambda e^{-\mu t}\, \frac{1 - e^{-(\lambda - \mu)t}}{\lambda - \mu} = \lambda\, \frac{e^{-\mu t} - e^{-\lambda t}}{\lambda - \mu}.$$

Alternatively:

$$P(S \leq t < S + R) = \int_{\mathbb{R}_+^2} 1_{s \leq t < s + r}\, \lambda e^{-\lambda s} \mu e^{-\mu r}\, ds\, dr = \int_0^t ds \int_{t-s}^\infty dr\, \lambda e^{-\lambda s} \mu e^{-\mu r} = \int_0^t ds\, \lambda e^{-\lambda s} e^{-\mu(t-s)} = \lambda\, \frac{e^{-\mu t} - e^{-\lambda t}}{\lambda - \mu}.$$

Therefore,

$$\mu P(S \leq t < S + R) = \lambda\mu\, \frac{e^{-\mu t} - e^{-\lambda t}}{\lambda - \mu},$$

which is symmetric in the interchange of $\lambda$ and $\mu$. The same computation with the roles of $S$ and $R$ reversed gives

$$\lambda P(R \leq t < R + S) = \lambda\mu\, \frac{e^{-\lambda t} - e^{-\mu t}}{\mu - \lambda} = \lambda\mu\, \frac{e^{-\mu t} - e^{-\lambda t}}{\lambda - \mu}.$$
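A Monte-Carlo sketch of the straddle probabilities (rates $\lambda = 2$, $\mu = 1$ and $t = 1$ are arbitrary choices). Note that the two probabilities agree only after weighting by the rates, $\mu P(S \leq t < S + R) = \lambda P(R \leq t < R + S)$; without the weights they differ when $\lambda \neq \mu$.

```python
import math
import random

random.seed(4)

def straddle_prob(rate_first, rate_second, t, n=200000):
    """Monte-Carlo estimate of P(first <= t < first + second)."""
    hits = 0
    for _ in range(n):
        a = random.expovariate(rate_first)
        b = random.expovariate(rate_second)
        if a <= t < a + b:
            hits += 1
    return hits / n

lam, mu, t = 2.0, 1.0, 1.0
p1 = straddle_prob(lam, mu, t)   # P(S <= t < S + R)
p2 = straddle_prob(mu, lam, t)   # P(R <= t < R + S)
closed_form = lam * mu * (math.exp(-mu * t) - math.exp(-lam * t)) / (lam - mu)
assert abs(mu * p1 - closed_form) < 0.01
assert abs(lam * p2 - closed_form) < 0.01
print(round(mu * p1, 3), round(lam * p2, 3), round(closed_form, 3))
```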
Example 11.6. Suppose $T$ is a positive random variable such that

$$P(T \geq t + s \mid T \geq s) = P(T \geq t) \text{ for all } s, t \geq 0,$$

or equivalently

$$P(T \geq t + s) = P(T \geq t) P(T \geq s) \text{ for all } s, t \geq 0,$$

then $P(T \geq t) = e^{-at}$ for some $a > 0$. (Such exponential random variables are often used to model waiting times.) The distribution function for $T$ is $F_T(t) := P(T \leq t) = 1 - e^{-a(t \vee 0)}$. Since $F_T(t)$ is piecewise differentiable, the law of $T$, $\mu := P \circ T^{-1}$, has a density,

$$d\mu(t) = F_T'(t)\, dt = a e^{-at} 1_{t \geq 0}\, dt.$$

Therefore,

$$E\left[e^{i\lambda T}\right] = \int_0^\infty a e^{-at} e^{i\lambda t}\, dt = \frac{a}{a - i\lambda} = \hat\mu(\lambda).$$

Since

$$\hat\mu'(\lambda) = i \frac{a}{(a - i\lambda)^2} \text{ and } \hat\mu''(\lambda) = -2 \frac{a}{(a - i\lambda)^3}$$

it follows that

$$ET = \frac{\hat\mu'(0)}{i} = a^{-1} \text{ and } ET^2 = \frac{\hat\mu''(0)}{i^2} = \frac{2}{a^2}$$

and hence $\operatorname{Var}(T) = \frac{2}{a^2} - \left(\frac{1}{a}\right)^2 = a^{-2}$.
11.3 The Standard Poisson Process

Let $\{T_k\}_{k=1}^\infty$ be an i.i.d. sequence of random exponential times with parameter $\lambda$, i.e. $P(T_k \in [t, t + dt]) = \lambda e^{-\lambda t}\, dt$. For each $n \in \mathbb{N}$ let $W_n := T_1 + \dots + T_n$ be the waiting time for the $n^{\text{th}}$ event to occur. Because of Theorem 11.3 we know that $\lim_{n\to\infty} W_n = \infty$ a.s.

Definition 11.7 (Poisson Process I). For any subset $A \subset \mathbb{R}_+$ let $N(A) := \sum_{n=1}^\infty 1_A(W_n)$ count the number of waiting times which occurred in $A$. When $A = (0, t]$ we will write $N_t := N((0,t])$ for all $t \geq 0$ and refer to $\{N_t\}_{t \geq 0}$ as the Poisson Process with intensity $\lambda$. (Observe that $\{N_t = n\} = \{W_n \leq t < W_{n+1}\}$.)
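Definition 11.7 is directly simulable: generate exponential gaps and count how many partial sums land in $(0, t]$. The sketch below (with the arbitrary choices $\lambda = 1.5$ and $t = 2$) checks that the sample mean of $N_t$ is close to $\lambda t$, anticipating Exercise 11.3.

```python
import random

random.seed(5)
lam, t = 1.5, 2.0

def N_t():
    """N_t = number of waiting times W_n = T_1 + ... + T_n landing in (0, t]."""
    count, w = 0, 0.0
    while True:
        w += random.expovariate(lam)   # next gap T_k ~ E(lam)
        if w > t:
            return count
        count += 1

counts = [N_t() for _ in range(20000)]
mean = sum(counts) / len(counts)
assert abs(mean - lam * t) < 0.05    # E[N_t] = lam * t = 3.0
print("mean of N_t:", round(mean, 3))
```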
The next few results summarize a number of the basic properties of this Poisson process. Many of the proofs will be left as exercises to the reader. We will use the following notation below; for each $n \in \mathbb{N}$ and $T \geq 0$ let

$$\Delta_n(T) := \{(w_1, \dots, w_n) \in \mathbb{R}^n : 0 < w_1 < w_2 < \dots < w_n < T\}$$

and let

$$\Delta_n := \cup_{T > 0} \Delta_n(T) = \{(w_1, \dots, w_n) \in \mathbb{R}^n : 0 < w_1 < w_2 < \dots < w_n < \infty\}.$$

(We equip each of these spaces with their Borel $\sigma$-algebras.)
Exercise 11.1. Show $m_n(\Delta_n(T)) = T^n/n!$ where $m_n$ is Lebesgue measure on $\mathcal{B}_{\mathbb{R}^n}$.

Exercise 11.2. If $n \in \mathbb{N}$ and $g : \Delta_n \to \mathbb{R}$ is bounded (non-negative) measurable, then

$$E[g(W_1, \dots, W_n)] = \int_{\Delta_n} g(w_1, w_2, \dots, w_n)\, \lambda^n e^{-\lambda w_n}\, dw_1 \dots dw_n. \tag{11.3}$$
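The volume formula of Exercise 11.1 can be sanity-checked by Monte-Carlo: a uniform point of $[0,T]^n$ lies in the ordered simplex $\Delta_n(T)$ with probability $1/n!$. (The values $n = 3$, $T = 2$ below are arbitrary.)

```python
import math
import random

random.seed(6)
n, T, trials = 3, 2.0, 100000
hits = 0
for _ in range(trials):
    w = [random.uniform(0, T) for _ in range(n)]
    if all(w[i] < w[i + 1] for i in range(n - 1)):
        hits += 1
vol = (hits / trials) * T ** n        # estimated m_n(Delta_n(T))
assert abs(vol - T ** n / math.factorial(n)) < 0.1
print("estimated volume:", round(vol, 3))
```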
As a simple corollary we have the following direct proof of Example 10.32.

Corollary 11.8. If $n \in \mathbb{N}$, then $W_n \overset{d}{=} \operatorname{Gamma}\left(n, \lambda^{-1}\right)$.

Proof. Taking $g(w_1, w_2, \dots, w_n) = f(w_n)$ in Eq. (11.3) we find, with the aid of Exercise 11.1, that

$$E[f(W_n)] = \int_{\Delta_n} f(w_n)\, \lambda^n e^{-\lambda w_n}\, dw_1 \dots dw_n = \int_0^\infty f(w)\, \lambda^n \frac{w^{n-1}}{(n-1)!} e^{-\lambda w}\, dw$$

which shows that $W_n \overset{d}{=} \operatorname{Gamma}\left(n, \lambda^{-1}\right)$.
Corollary 11.9. If $t \in \mathbb{R}_+$ and $f : \Delta_n(t) \to \mathbb{R}$ is a bounded (or non-negative) measurable function, then

$$E[f(W_1, \dots, W_n) : N_t = n] = \lambda^n e^{-\lambda t} \int_{\Delta_n(t)} f(w_1, w_2, \dots, w_n)\, dw_1 \dots dw_n. \tag{11.4}$$

Proof. Making use of the observation that $\{N_t = n\} = \{W_n \leq t < W_{n+1}\}$, we may apply Eq. (11.3) at level $n + 1$ with

$$g(w_1, w_2, \dots, w_{n+1}) = f(w_1, w_2, \dots, w_n)\, 1_{w_n \leq t < w_{n+1}}$$

to learn

$$E[f(W_1, \dots, W_n) : N_t = n] = \int_{0 < w_1 < \dots < w_n \leq t < w_{n+1}} f(w_1, w_2, \dots, w_n)\, \lambda^{n+1} e^{-\lambda w_{n+1}}\, dw_1 \dots dw_n\, dw_{n+1} = \int_{\Delta_n(t)} f(w_1, w_2, \dots, w_n)\, \lambda^n e^{-\lambda t}\, dw_1 \dots dw_n.$$
Exercise 11.3. Show $N_t \overset{d}{=} \operatorname{Poi}(\lambda t)$ for all $t > 0$.
Definition 11.10 (Order Statistics). Suppose that $X_1, \dots, X_n$ are non-negative random variables such that $P(X_i = X_j) = 0$ for all $i \neq j$. The order statistics of $X_1, \dots, X_n$ are the random variables $\tilde X_1, \tilde X_2, \dots, \tilde X_n$ defined by

$$\tilde X_k = \max_{\#(\Lambda) = k} \min\{X_i : i \in \Lambda\} \tag{11.5}$$

where $\Lambda$ always denotes a subset of $\{1, 2, \dots, n\}$ in Eq. (11.5).

The reader should verify that $\tilde X_1 \leq \tilde X_2 \leq \dots \leq \tilde X_n$, that $\{X_1, \dots, X_n\} = \{\tilde X_1, \tilde X_2, \dots, \tilde X_n\}$ with repetitions, and that $\tilde X_1 < \tilde X_2 < \dots < \tilde X_n$ if $X_i \neq X_j$ for all $i \neq j$. In particular if $P(X_i = X_j) = 0$ for all $i \neq j$ then $P\left(\cup_{i \neq j}\{X_i = X_j\}\right) = 0$ and $\tilde X_1 < \tilde X_2 < \dots < \tilde X_n$ a.s.
Exercise 11.4. Suppose that $X_1, \dots, X_n$ are non-negative¹ random variables such that $P(X_i = X_j) = 0$ for all $i \neq j$. Show;

1. If $f : \mathbb{R}^n_+ \to \mathbb{R}$ is bounded (non-negative) measurable, then

$$E\left[f\left(\tilde X_1, \dots, \tilde X_n\right)\right] = \sum_{\sigma \in S_n} E\left[f(X_{\sigma 1}, \dots, X_{\sigma n}) : X_{\sigma 1} < X_{\sigma 2} < \dots < X_{\sigma n}\right], \tag{11.6}$$

where $S_n$ is the permutation group on $\{1, 2, \dots, n\}$.

2. If we further assume that $X_1, \dots, X_n$ are i.i.d. random variables, then

$$E\left[f\left(\tilde X_1, \dots, \tilde X_n\right)\right] = n!\, E[f(X_1, \dots, X_n) : X_1 < X_2 < \dots < X_n]. \tag{11.7}$$

(It is not important that $f\left(\tilde X_1, \dots, \tilde X_n\right)$ is not defined on the null set $\cup_{i \neq j}\{X_i = X_j\}$.)

3. If $f : \mathbb{R}^n_+ \to \mathbb{R}$ is a bounded (non-negative) measurable symmetric function (i.e. $f(w_{\sigma 1}, \dots, w_{\sigma n}) = f(w_1, \dots, w_n)$ for all $\sigma \in S_n$ and $(w_1, \dots, w_n) \in \mathbb{R}^n_+$) then

$$E\left[f\left(\tilde X_1, \dots, \tilde X_n\right)\right] = E[f(X_1, \dots, X_n)].$$

4. Suppose that $Y_1, \dots, Y_n$ is another collection of non-negative random variables such that $P(Y_i = Y_j) = 0$ for all $i \neq j$ and such that

$$E[f(X_1, \dots, X_n)] = E[f(Y_1, \dots, Y_n)]$$

¹ The non-negativity of the $X_i$ is not really necessary here, but this is all we need to consider.
for all bounded (non-negative) measurable symmetric functions from $\mathbb{R}^n_+ \to \mathbb{R}$. Show that $\left(\tilde X_1, \dots, \tilde X_n\right) \overset{d}{=} \left(\tilde Y_1, \dots, \tilde Y_n\right)$.

Hint: if $g : \Delta_n \to \mathbb{R}$ is a bounded measurable function, define $f : \mathbb{R}^n_+ \to \mathbb{R}$ by;

$$f(y_1, \dots, y_n) = \sum_{\sigma \in S_n} 1_{y_{\sigma 1} < y_{\sigma 2} < \dots < y_{\sigma n}}\, g(y_{\sigma 1}, y_{\sigma 2}, \dots, y_{\sigma n})$$

and then show $f$ is symmetric.
Exercise 11.5. Let $t \in \mathbb{R}_+$ and $\{U_i\}_{i=1}^n$ be i.i.d. uniformly distributed random variables on $[0, t]$. Show that the order statistics $\left(\tilde U_1, \dots, \tilde U_n\right)$ of $(U_1, \dots, U_n)$ have the same distribution as $(W_1, \dots, W_n)$ given $\{N_t = n\}$. (Thus, given $N_t = n$, the collection of points $\{W_1, \dots, W_n\}$ has the same distribution as the collection of points $\{U_1, \dots, U_n\}$ in $[0, t]$.)
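The conditional-uniformity statement of Exercise 11.5 can be probed by rejection sampling, a rough numerical sketch (with the arbitrary choices $\lambda = 1$, $t = 1$, $n = 2$): conditioned on $N_t = 2$, the first waiting time $W_1$ should match the minimum of two uniforms on $[0, t]$, whose mean is $t/3$.

```python
import random

random.seed(7)
lam, t, n_target = 1.0, 1.0, 2

def waiting_times_given_Nt():
    """Rejection-sample (W_1, ..., W_n) conditioned on N_t = n_target."""
    while True:
        w, times = 0.0, []
        while True:
            w += random.expovariate(lam)
            if w > t:
                break
            times.append(w)
        if len(times) == n_target:
            return times

mean_W1 = sum(waiting_times_given_Nt()[0] for _ in range(5000)) / 5000
mean_U1 = sum(min(random.uniform(0, t) for _ in range(n_target)) for _ in range(5000)) / 5000
assert abs(mean_W1 - mean_U1) < 0.03
print(round(mean_W1, 3), round(mean_U1, 3))
```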
Theorem 11.11 (Joint Distributions). If $\{A_i\}_{i=1}^k \subset \mathcal{B}_{[0,t]}$ is a partition of $[0, t]$, then $\{N(A_i)\}_{i=1}^k$ are independent random variables and $N(A) \overset{d}{=} \operatorname{Poi}(\lambda m(A))$ for all $A \in \mathcal{B}_{[0,t]}$ with $m(A) < \infty$. In particular, if $0 < t_1 < t_2 < \dots < t_n$, then $\left\{N_{t_i} - N_{t_{i-1}}\right\}_{i=1}^n$ are independent random variables and $N_t - N_s \overset{d}{=} \operatorname{Poi}(\lambda(t - s))$ for all $0 \leq s < t < \infty$. (We say that $\{N_t\}_{t \geq 0}$ is a stochastic process with independent increments.)
Proof. If $z \in \mathbb{C}$ and $A \in \mathcal{B}_{[0,t]}$, then

$$z^{N(A)} = z^{\sum_{i=1}^n 1_A(W_i)} \text{ on } \{N_t = n\}.$$

Let $n \in \mathbb{N}$, $z_i \in \mathbb{C}$, and define

$$f(w_1, \dots, w_n) = z_1^{\sum_{i=1}^n 1_{A_1}(w_i)} \cdots z_k^{\sum_{i=1}^n 1_{A_k}(w_i)}$$

which is a symmetric function. On $\{N_t = n\}$ we have

$$z_1^{N(A_1)} \cdots z_k^{N(A_k)} = f(W_1, \dots, W_n)$$

and therefore,

$$E\left[z_1^{N(A_1)} \cdots z_k^{N(A_k)} \mid N_t = n\right] = E[f(W_1, \dots, W_n) \mid N_t = n] = E[f(U_1, \dots, U_n)]$$
$$= E\left[z_1^{\sum_{i=1}^n 1_{A_1}(U_i)} \cdots z_k^{\sum_{i=1}^n 1_{A_k}(U_i)}\right] = \prod_{i=1}^n E\left[z_1^{1_{A_1}(U_i)} \cdots z_k^{1_{A_k}(U_i)}\right] = \left(E\left[z_1^{1_{A_1}(U_1)} \cdots z_k^{1_{A_k}(U_1)}\right]\right)^n = \left(\frac{1}{t} \sum_{i=1}^k m(A_i)\, z_i\right)^n,$$

wherein we have made use of the fact that $\{A_i\}_{i=1}^k$ is a partition of $[0, t]$ so that

$$z_1^{1_{A_1}(U_1)} \cdots z_k^{1_{A_k}(U_1)} = \sum_{i=1}^k z_i 1_{A_i}(U_1).$$

Thus it follows that

$$E\left[z_1^{N(A_1)} \cdots z_k^{N(A_k)}\right] = \sum_{n=0}^\infty E\left[z_1^{N(A_1)} \cdots z_k^{N(A_k)} \mid N_t = n\right] P(N_t = n) = \sum_{n=0}^\infty \left(\frac{1}{t} \sum_{i=1}^k m(A_i)\, z_i\right)^n \frac{(\lambda t)^n}{n!} e^{-\lambda t}$$
$$= \sum_{n=0}^\infty \frac{1}{n!} \left(\lambda \sum_{i=1}^k m(A_i)\, z_i\right)^n e^{-\lambda t} = \exp\left(\lambda\left(\sum_{i=1}^k m(A_i)\, z_i - t\right)\right) = \exp\left(\lambda \sum_{i=1}^k m(A_i)(z_i - 1)\right).$$

From this result it follows that $\{N(A_i)\}_{i=1}^k$ are independent random variables and $N(A) \overset{d}{=} \operatorname{Poi}(\lambda m(A))$ for all $A \in \mathcal{B}_{[0,t]}$ with $m(A) < \infty$.

Alternatively; suppose that $a_i \in \mathbb{N}_0$ and $n := a_1 + \dots + a_k$, then
$$P[N(A_1) = a_1, \dots, N(A_k) = a_k \mid N_t = n] = P\left(\sum_{i=1}^n 1_{A_l}(U_i) = a_l \text{ for } 1 \leq l \leq k\right) = \frac{n!}{a_1! \cdots a_k!} \prod_{l=1}^k \left(\frac{m(A_l)}{t}\right)^{a_l} = \frac{n!}{t^n} \prod_{l=1}^k \frac{[m(A_l)]^{a_l}}{a_l!}$$

and therefore,

$$P[N(A_1) = a_1, \dots, N(A_k) = a_k] = P[N(A_1) = a_1, \dots, N(A_k) = a_k \mid N_t = n]\, P(N_t = n)$$
$$= \frac{n!}{t^n} \prod_{l=1}^k \frac{[m(A_l)]^{a_l}}{a_l!}\, e^{-\lambda t} \frac{(\lambda t)^n}{n!} = \prod_{l=1}^k \frac{[\lambda m(A_l)]^{a_l}}{a_l!} e^{-\lambda m(A_l)}$$

which shows that $\{N(A_l)\}_{l=1}^k$ are independent and that $N(A_l) \overset{d}{=} \operatorname{Poi}(\lambda m(A_l))$ for each $l$.
Remark 11.12. If $A \in \mathcal{B}_{[0,\infty)}$ with $m(A) = \infty$, then $N(A) = \infty$ a.s. To prove this observe that $N(A) = \lim_{n\to\infty} N(A \cap [0, n])$. Therefore for any $k \in \mathbb{N}$, we have

$$P(N(A) \geq k) \geq P(N(A \cap [0, n]) \geq k) = 1 - e^{-\lambda m(A \cap [0,n])} \sum_{0 \leq l < k} \frac{(\lambda m(A \cap [0,n]))^l}{l!} \to 1 \text{ as } n \to \infty.$$

This shows that $N(A) \geq k$ a.s. for all $k \in \mathbb{N}$, i.e. $N(A) = \infty$ a.s.
Exercise 11.6 (A Generalized Poisson Process I). Suppose that $(S, \mathcal{B}_S, \mu)$ is a finite measure space with $\mu(S) < \infty$. Define $\Omega = \cup_{n=0}^\infty S^n$ where $S^0 = \{*\}$, where $*$ is some arbitrary point. Define $\mathcal{B}_\Omega$ to be those sets $B = \cup_{n=0}^\infty B_n$ where $B_n \in \mathcal{B}_{S^n} := \mathcal{B}_S^{\otimes n}$, the product $\sigma$-algebra on $S^n$. Now define a probability measure $P$ on $(\Omega, \mathcal{B}_\Omega)$ by

$$P(B) := e^{-\mu(S)} \sum_{n=0}^\infty \frac{1}{n!} \mu^{\otimes n}(B_n)$$

where $\mu^{\otimes 0}(\{*\}) = 1$ by definition. (We denote $P$ schematically by $P := e^{-\mu(S)} e^{\otimes \mu}$.) Finally for every $\omega \in \Omega$, let $N_\omega$ be the point measure on $(S, \mathcal{B}_S)$ defined by; $N_* = 0$ and

$$N_\omega = \sum_{i=1}^n \delta_{s_i} \text{ if } \omega = (s_1, \dots, s_n) \in S^n \text{ for } n \geq 1.$$

So for $A \in \mathcal{B}_S$, we have $N_*(A) = 0$ and $N_\omega(A) = \sum_{i=1}^n 1_A(s_i)$. Show;

1. For each $A \in \mathcal{B}_S$, $\omega \to N_\omega(A)$ is a Poisson random variable with intensity $\mu(A)$, i.e. $N(A) \overset{d}{=} \operatorname{Poi}(\mu(A))$.
2. If $\{A_k\}_{k=1}^m \subset \mathcal{B}_S$ are disjoint sets, the $\{N_\omega(A_k)\}_{k=1}^m$ are independent random variables.

An integer valued random measure on $(S, \mathcal{B}_S)$ ($\omega \to N_\omega$) satisfying properties 1. and 2. of Exercise 11.6 is called a Poisson process on $(S, \mathcal{B}_S)$ with intensity measure $\mu$. For more motivation as to why Poisson processes are important see Proposition 21.11 below.
Exercise 11.7 (A Generalized Poisson Process II). Let $(S, \mathcal{B}_S, \mu)$ be as in Exercise 11.6, let $\{Y_i\}_{i=1}^\infty$ be i.i.d. $S$-valued random variables with $\operatorname{Law}_P(Y_i) = \mu(\cdot)/\mu(S)$, and let $\nu$ be a $\operatorname{Poi}(\mu(S))$ random variable which is independent of $\{Y_i\}$. Show $N := \sum_{i=1}^\nu \delta_{Y_i}$ is a Poisson process on $(S, \mathcal{B}_S)$ with intensity measure $\mu$.
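The construction of Exercise 11.7 is simple to simulate. In the sketch below (an informal check; $S = [0, 2)$ with $\mu$ Lebesgue measure is an arbitrary choice) we draw $\nu \sim \operatorname{Poi}(\mu(S))$ points i.i.d. from $\mu/\mu(S)$ and verify that counts in two disjoint halves have the right means and are uncorrelated.

```python
import math
import random

random.seed(8)

def sample_poisson(mean):
    """Knuth-style Poisson sampler (fine for modest means)."""
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# S = [0, 2) with mu = Lebesgue measure; N = sum_{i<=nu} delta_{Y_i}, nu ~ Poi(mu(S)).
mu_S = 2.0
def counts():
    nu = sample_poisson(mu_S)
    ys = [random.uniform(0.0, 2.0) for _ in range(nu)]
    return sum(y < 1.0 for y in ys), sum(y >= 1.0 for y in ys)

pairs = [counts() for _ in range(20000)]
m1 = sum(a for a, _ in pairs) / len(pairs)
m2 = sum(b for _, b in pairs) / len(pairs)
cov = sum(a * b for a, b in pairs) / len(pairs) - m1 * m2
assert abs(m1 - 1.0) < 0.03       # N(A) ~ Poi(mu(A)) with mu(A) = 1
assert abs(cov) < 0.05            # counts in disjoint regions are uncorrelated
print(round(m1, 3), round(cov, 4))
```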
Exercise 11.8 (A Generalized Poisson Process III). Suppose now that $(S, \mathcal{B}_S, \mu)$ is a $\sigma$-finite measure space and $S = \cup_{l=1}^\infty S_l$ is a partition of $S$ such that $0 < \mu(S_l) < \infty$ for all $l$. For each $l \in \mathbb{N}$, using either of the constructions above we may construct a Poisson point process $N_l$ on $(S, \mathcal{B}_S)$ with intensity measure $\mu_l$, where $\mu_l(A) := \mu(A \cap S_l)$ for all $A \in \mathcal{B}_S$. We do this in such a way that $\{N_l\}_{l=1}^\infty$ are all independent. Show that $N := \sum_{l=1}^\infty N_l$ is a Poisson point process on $(S, \mathcal{B}_S)$ with intensity measure $\mu$. To be more precise, observe that $N$ is a random measure on $(S, \mathcal{B}_S)$ which satisfies (as you should show);

1. For each $A \in \mathcal{B}_S$ with $\mu(A) < \infty$, show $N(A) \overset{d}{=} \operatorname{Poi}(\mu(A))$.
2. If $\{A_k\}_{k=1}^m \subset \mathcal{B}_S$ are disjoint sets with $\mu(A_k) < \infty$, show $\{N(A_k)\}_{k=1}^m$ are independent random variables.
3. If $A \in \mathcal{B}_S$ with $\mu(A) = \infty$, show $N(A) = \infty$ a.s.
11.4 Poisson Process Extras*

(This subsection still needs work!) In Definition 11.7 we really gave a construction of a Poisson process as defined in Definition 11.13. The goal of this section
is to show that the Poisson process $\{N_t\}_{t \geq 0}$, as defined in Definition 11.13, is uniquely determined and is essentially equivalent to what we have already done above.
Definition 11.13 (Poisson Process II). Let $(\Omega, \mathcal{B}, P)$ be a probability space and $N_t : \Omega \to \mathbb{N}_0$ be a random variable for each $t \geq 0$. We say that $\{N_t\}_{t \geq 0}$ is a Poisson process with intensity $\lambda$ if; 1) $N_0 = 0$, 2) $N_t - N_s \overset{d}{=} \operatorname{Poi}(\lambda(t - s))$ for all $0 \leq s < t < \infty$, 3) $\{N_t\}_{t \geq 0}$ has independent increments, and 4) $t \to N_t(\omega)$ is right continuous and non-decreasing for all $\omega \in \Omega$.

Let $N_\infty(\omega) := \lim_{t \to \infty} N_t(\omega)$ and observe that $N_\infty \geq \sum_{k=1}^\infty (N_k - N_{k-1}) = \infty$ a.s. by Lemma 11.2. Therefore, we may and do assume that $N_\infty(\omega) = \infty$ for all $\omega \in \Omega$.
Lemma 11.14. There is zero probability that $\{N_t\}_{t \geq 0}$ makes a jump greater than or equal to 2.

Proof. Suppose that $T \in (0, \infty)$ is fixed and $\omega$ is a sample point where $t \to N_t(\omega)$ makes a jump of 2 or more for some $t \in [0, T]$. Then for all $n \in \mathbb{N}$ we must have $\omega \in \cup_{k=1}^n \left\{N_{\frac{k}{n}T} - N_{\frac{k-1}{n}T} \geq 2\right\}$. Therefore,

$$P^*\left(\left\{\omega : [0, T] \ni t \to N_t(\omega) \text{ has a jump} \geq 2\right\}\right) \leq \sum_{k=1}^n P\left(N_{\frac{k}{n}T} - N_{\frac{k-1}{n}T} \geq 2\right) = \sum_{k=1}^n O\left(T^2/n^2\right) = O(1/n) \to 0$$

as $n \to \infty$. (Here $P^*$ denotes outer measure: I am leaving open the possibility that the set of $\omega$ where a jump of size 2 or larger occurs is not measurable.)
Theorem 11.15. Suppose that $\{N_t\}_{t \geq 0}$ is a Poisson process with intensity $\lambda$ as in Definition 11.13, and let

$$W_n := \inf\{t : N_t = n\} \text{ for all } n \in \mathbb{N}_0$$

be the first time $N_t$ reaches $n$. (The $\{W_n\}_{n=0}^\infty$ are well defined off a set of measure zero and $W_n < W_{n+1}$ for all $n$ by the right continuity of $\{N_t\}_{t \geq 0}$.) Then the $\{T_n := W_n - W_{n-1}\}_{n=1}^\infty$ are i.i.d. $E(\lambda)$ random variables. Thus the two descriptions of a Poisson process given in Definitions 11.7 and 11.13 are equivalent.
Proof. Suppose that $J_i = (a_i, b_i]$ with $b_i \leq a_{i+1} < \infty$ for all $i$. We will begin by showing

$$P\left(\cap_{i=1}^n \{W_i \in J_i\}\right) = \lambda^n \prod_{i=1}^{n-1} m(J_i) \int_{J_n} e^{-\lambda w_n}\, dw_n \tag{11.8}$$
$$= \lambda^n \int_{J_1 \times J_2 \times \dots \times J_n} e^{-\lambda w_n}\, dw_1 \dots dw_n. \tag{11.9}$$

To show this let $K_i := (b_{i-1}, a_i]$ where $b_0 = 0$. Then

$$\cap_{i=1}^n \{W_i \in J_i\} = \left(\cap_{i=1}^n \{N(K_i) = 0\}\right) \cap \left(\cap_{i=1}^{n-1} \{N(J_i) = 1\}\right) \cap \{N(J_n) \geq 1\}$$

and therefore,

$$P\left(\cap_{i=1}^n \{W_i \in J_i\}\right) = \prod_{i=1}^n e^{-\lambda m(K_i)} \cdot \prod_{i=1}^{n-1} e^{-\lambda m(J_i)} \lambda m(J_i) \cdot \left(1 - e^{-\lambda m(J_n)}\right) = \lambda^{n-1} \prod_{i=1}^{n-1} m(J_i) \left(e^{-\lambda a_n} - e^{-\lambda b_n}\right) = \lambda^{n-1} \prod_{i=1}^{n-1} m(J_i) \int_{J_n} \lambda e^{-\lambda w_n}\, dw_n.$$

We may now apply a $\pi$ – $\lambda$ argument, using $\sigma(\{J_1 \times \dots \times J_n\}) = \mathcal{B}_{\Delta_n}$, to show

$$E[g(W_1, \dots, W_n)] = \int_{\Delta_n} g(w_1, \dots, w_n)\, \lambda^n e^{-\lambda w_n}\, dw_1 \dots dw_n$$

holds for all bounded $\mathcal{B}_{\Delta_n}/\mathcal{B}_{\mathbb{R}}$ measurable functions $g : \Delta_n \to \mathbb{R}$. Undoing the change of variables you made in Exercise 11.2 allows us to conclude that $\{T_n\}_{n=1}^\infty$ are i.i.d. $E(\lambda)$ distributed random variables.
12
$L^p$ spaces

Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and for $0 < p < \infty$ and a measurable function $f : \Omega \to \mathbb{C}$ let

$$\|f\|_p := \left(\int_\Omega |f|^p\, d\mu\right)^{1/p} \tag{12.1}$$

and when $p = \infty$, let

$$\|f\|_\infty = \inf\{a \geq 0 : \mu(|f| > a) = 0\}. \tag{12.2}$$

For $0 < p \leq \infty$, let

$$L^p(\Omega, \mathcal{B}, \mu) = \{f : \Omega \to \mathbb{C} : f \text{ is measurable and } \|f\|_p < \infty\}/\sim$$

where $f \sim g$ iff $f = g$ a.e. Notice that $\|f - g\|_p = 0$ iff $f \sim g$ and if $f \sim g$ then $\|f\|_p = \|g\|_p$. In general we will (by abuse of notation) use $f$ to denote both the function $f$ and the equivalence class containing $f$.

Remark 12.1. Suppose that $\|f\|_\infty \leq M$, then for all $a > M$, $\mu(|f| > a) = 0$ and therefore $\mu(|f| > M) = \lim_{n\to\infty} \mu(|f| > M + 1/n) = 0$, i.e. $|f(\omega)| \leq M$ for $\mu$-a.e. $\omega$. Conversely, if $|f| \leq M$ a.e. and $a > M$ then $\mu(|f| > a) = 0$ and hence $\|f\|_\infty \leq M$. This leads to the identity:

$$\|f\|_\infty = \inf\{a \geq 0 : |f(\omega)| \leq a \text{ for } \mu\text{-a.e. } \omega\}.$$
12.1 Modes of Convergence

Let $\{f_n\}_{n=1}^\infty \cup \{f\}$ be a collection of complex valued measurable functions on $\Omega$. We have the following notions of convergence and Cauchy sequences.

Definition 12.2. 1. $f_n \to f$ a.e. if there is a set $E \in \mathcal{B}$ such that $\mu(E) = 0$ and $\lim_{n\to\infty} 1_{E^c} f_n = 1_{E^c} f$.
2. $f_n \to f$ in $\mu$-measure if $\lim_{n\to\infty} \mu(|f_n - f| > \varepsilon) = 0$ for all $\varepsilon > 0$. We will abbreviate this by saying $f_n \to f$ in $L^0$ or by $f_n \overset{\mu}{\to} f$.
3. $f_n \to f$ in $L^p$ iff $f \in L^p$ and $f_n \in L^p$ for all $n$, and $\lim_{n\to\infty} \|f_n - f\|_p = 0$.

Definition 12.3. 1. $\{f_n\}$ is a.e. Cauchy if there is a set $E \in \mathcal{B}$ such that $\mu(E) = 0$ and $\{1_{E^c} f_n\}$ is a pointwise Cauchy sequence.
2. $\{f_n\}$ is Cauchy in $\mu$-measure (or $L^0$-Cauchy) if $\lim_{m,n\to\infty} \mu(|f_n - f_m| > \varepsilon) = 0$ for all $\varepsilon > 0$.
3. $\{f_n\}$ is Cauchy in $L^p$ if $\lim_{m,n\to\infty} \|f_n - f_m\|_p = 0$.

When $\mu$ is a probability measure, we describe $f_n \overset{\mu}{\to} f$ as $f_n$ converging to $f$ in probability. If a sequence $\{f_n\}_{n=1}^\infty$ is $L^p$-convergent, then it is $L^p$-Cauchy. For example, when $p \in [1, \infty]$ and $f_n \to f$ in $L^p$, we have (using Minkowski's inequality of Theorem 12.22 below)

$$\|f_n - f_m\|_p \leq \|f_n - f\|_p + \|f - f_m\|_p \to 0 \text{ as } m, n \to \infty.$$

The case where $p = 0$ will be handled in Theorem 12.8 below.
Lemma 12.4 ($L^p$-convergence implies convergence in probability). Let $p \in [1, \infty)$. If $\{f_n\} \subset L^p$ is $L^p$-convergent (Cauchy) then $\{f_n\}$ is also convergent (Cauchy) in measure.

Proof. By Chebyshev's inequality (7.2),

$$\mu(|f| \geq \varepsilon) = \mu(|f|^p \geq \varepsilon^p) \leq \frac{1}{\varepsilon^p} \int_\Omega |f|^p\, d\mu = \frac{1}{\varepsilon^p} \|f\|_p^p$$

and therefore if $\{f_n\}$ is $L^p$-Cauchy, then

$$\mu(|f_n - f_m| \geq \varepsilon) \leq \frac{1}{\varepsilon^p} \|f_n - f_m\|_p^p \to 0 \text{ as } m, n \to \infty$$

showing $\{f_n\}$ is $L^0$-Cauchy. A similar argument holds for the $L^p$-convergent case.
Example 12.5. Let us consider a number of examples here to get a feeling for these different notions of convergence. In each of these examples we will work in the measure space $\left(\mathbb{R}_+, \mathcal{B} = \mathcal{B}_{\mathbb{R}_+}, m\right)$.

1. Let $f_n = \frac{1}{n} 1_{[0,n]}$ as in Figure 12.1. In this case $f_n \nrightarrow 0$ in $L^1$ but $f_n \to 0$ a.e., $f_n \to 0$ in $L^p$ for all $p > 1$ and $f_n \overset{m}{\to} 0$.

Fig. 12.1. Graphs of $f_n = \frac{1}{n} 1_{[0,n]}$ for $n = 1, 2, 3, 4$.

2. Let $f_n = 1_{[n-1,n]}$ as in the figure below. Then $f_n \to 0$ a.e., yet $f_n \nrightarrow 0$ in any $L^p$-space or in measure.

3. Now suppose that $f_n = n \cdot 1_{[0,1/n]}$ as in Figure 12.2. In this case $f_n \to 0$ a.e., $f_n \overset{m}{\to} 0$ but $f_n \nrightarrow 0$ in $L^1$ or in any $L^p$ for $p \geq 1$. Observe that $\|f_n\|_p = n^{1-1/p}$ for all $p \geq 1$.

Fig. 12.2. Graphs of $f_n = n \cdot 1_{[0,1/n]}$ for $n = 1, 2, 3, 4$.

4. For $n \in \mathbb{N}$ and $1 \leq k \leq n$, let $g_{n,k} := 1_{\left(\frac{k-1}{n}, \frac{k}{n}\right]}$. Then define $\{f_n\}$ as

$$(f_1, f_2, f_3, \dots) = (g_{1,1}, g_{2,1}, g_{2,2}, g_{3,1}, g_{3,2}, g_{3,3}, g_{4,1}, g_{4,2}, g_{4,3}, g_{4,4}, \dots)$$

as depicted in the figures below. For this sequence of functions we have $f_n \to 0$ in $L^p$ for all $1 \leq p < \infty$ and $f_n \overset{m}{\to} 0$ but $f_n \nrightarrow 0$ a.e. and $f_n \nrightarrow 0$ in $L^\infty$. In this case, $\|g_{n,k}\|_p = \left(\frac{1}{n}\right)^{1/p}$ for $1 \leq p < \infty$ while $\|g_{n,k}\|_\infty = 1$ for all $n, k$.
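The norm bookkeeping in Example 12.5 reduces to the identity $\|c\,1_A\|_p = c\, m(A)^{1/p}$ for indicator-type functions. A small numerical sketch (using the arbitrary choice $n = 100$) of how the $L^1$ and $L^2$ norms behave in items 1 and 3:

```python
def norm_p(height, width, p):
    """||c * 1_A||_p = c * m(A)^(1/p) for a function of constant height c on a set of measure `width`."""
    return height * width ** (1.0 / p)

n = 100
# Example 12.5 item 1: f_n = (1/n) 1_[0,n] -- L^1 norm is identically 1, L^2 norm -> 0.
assert abs(norm_p(1 / n, n, 1) - 1.0) < 1e-9
assert norm_p(1 / n, n, 2) < 0.2
# Example 12.5 item 3: f_n = n 1_[0,1/n] -- L^1 norm is identically 1, L^2 norm = n^(1/2) -> infinity.
assert abs(norm_p(n, 1 / n, 1) - 1.0) < 1e-9
assert norm_p(n, 1 / n, 2) > 5
print("norms behave as claimed in Example 12.5")
```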
12.2 Almost Everywhere and Measure Convergence

Theorem 12.6 (Egorov: a.s. convergence implies convergence in probability). Suppose $\mu(\Omega) = 1$ and $f_n \to f$ a.s. Then for all $\varepsilon > 0$ there exists $E = E_\varepsilon \in \mathcal{B}$ such that $\mu(E) < \varepsilon$ and $f_n \to f$ uniformly on $E^c$. In particular $f_n \overset{\mu}{\to} f$ as $n \to \infty$.

Proof. Let $f_n \to f$ a.e. Then for all $\varepsilon > 0$,

$$0 = \mu(\{|f_n - f| > \varepsilon \text{ i.o. } n\}) = \lim_{N\to\infty} \mu\left(\cup_{n \geq N} \{|f_n - f| > \varepsilon\}\right) \geq \limsup_{N\to\infty} \mu(|f_N - f| > \varepsilon) \tag{12.3}$$
from which it follows that $f_n \overset{\mu}{\to} f$ as $n \to \infty$.

We now prove that the convergence is uniform off a small exceptional set. By Eq. (12.3), there exists an increasing sequence $\{N_k\}_{k=1}^\infty$ such that $\mu(E_k) < \varepsilon 2^{-k}$, where

$$E_k := \cup_{n \geq N_k} \left\{|f_n - f| > \frac{1}{k}\right\}.$$

If we now set $E := \cup_{k=1}^\infty E_k$, then $\mu(E) < \sum_k \varepsilon 2^{-k} = \varepsilon$ and for $\omega \notin E$ we have $|f_n(\omega) - f(\omega)| \leq \frac{1}{k}$ for all $n \geq N_k$ and $k \in \mathbb{N}$. That is $f_n \to f$ uniformly on $E^c$.
Lemma 12.7. Suppose $a_n \in \mathbb{C}$, $|a_{n+1} - a_n| \leq \varepsilon_n$, and $\sum_{n=1}^\infty \varepsilon_n < \infty$. Then $\lim_{n\to\infty} a_n = a \in \mathbb{C}$ exists and $|a - a_n| \leq \delta_n := \sum_{k=n}^\infty \varepsilon_k$.

Proof. Let $m > n$, then

$$|a_m - a_n| = \left|\sum_{k=n}^{m-1} (a_{k+1} - a_k)\right| \leq \sum_{k=n}^{m-1} |a_{k+1} - a_k| \leq \sum_{k=n}^\infty \varepsilon_k =: \delta_n. \tag{12.4}$$

So $|a_m - a_n| \leq \delta_{\min(m,n)} \to 0$ as $m, n \to \infty$, i.e. $\{a_n\}$ is Cauchy. Let $m \to \infty$ in (12.4) to find $|a - a_n| \leq \delta_n$.
Theorem 12.8. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and $\{f_n\}_{n=1}^\infty$ be a sequence of measurable functions on $\Omega$.

1. If $f$ and $g$ are measurable functions and $f_n \overset{\mu}{\to} f$ and $f_n \overset{\mu}{\to} g$, then $f = g$ a.e.
2. If $f_n \overset{\mu}{\to} f$ and $g_n \overset{\mu}{\to} g$, then $\lambda f_n \overset{\mu}{\to} \lambda f$ for all $\lambda \in \mathbb{C}$ and $f_n + g_n \overset{\mu}{\to} f + g$.
3. If $f_n \overset{\mu}{\to} f$, then $\{f_n\}_{n=1}^\infty$ is Cauchy in measure.
4. If $\{f_n\}_{n=1}^\infty$ is Cauchy in measure, there exists a measurable function $f$ and a subsequence $g_j = f_{n_j}$ of $\{f_n\}$ such that $\lim_{j\to\infty} g_j =: f$ exists a.e.
5. (Completeness of convergence in measure.) If $\{f_n\}_{n=1}^\infty$ is Cauchy in measure and $f$ is as in item 4., then $f_n \overset{\mu}{\to} f$.
Proof. One of the basic tricks here is to observe that if $\varepsilon > 0$ and $a, b \geq 0$ are such that $a + b \geq \varepsilon$, then either $a \geq \varepsilon/2$ or $b \geq \varepsilon/2$.

1. Suppose that $f$ and $g$ are measurable functions such that $f_n \overset{\mu}{\to} g$ and $f_n \overset{\mu}{\to} f$ as $n \to \infty$ and $\varepsilon > 0$ is given. Since

$$|f - g| \leq |f - f_n| + |f_n - g|,$$

if $|f - g| \geq \varepsilon$, then either $|f - f_n| \geq \varepsilon/2$ or $|f_n - g| \geq \varepsilon/2$. Thus it follows that

$$\{|f - g| > \varepsilon\} \subset \{|f - f_n| > \varepsilon/2\} \cup \{|g - f_n| > \varepsilon/2\},$$

and therefore,

$$\mu(|f - g| > \varepsilon) \leq \mu(|f - f_n| > \varepsilon/2) + \mu(|g - f_n| > \varepsilon/2) \to 0 \text{ as } n \to \infty.$$

Hence

$$\mu(|f - g| > 0) = \mu\left(\cup_{n=1}^\infty \left\{|f - g| > \frac{1}{n}\right\}\right) \leq \sum_{n=1}^\infty \mu\left(|f - g| > \frac{1}{n}\right) = 0,$$

i.e. $f = g$ a.e.

2. The first claim is easy and the second follows similarly to the proof of the first item.

3. Suppose $f_n \overset{\mu}{\to} f$, $\varepsilon > 0$ and $m, n \in \mathbb{N}$, then $|f_n - f_m| \leq |f - f_n| + |f_m - f|$. So by the basic trick,

$$\mu(|f_n - f_m| > \varepsilon) \leq \mu(|f_n - f| > \varepsilon/2) + \mu(|f_m - f| > \varepsilon/2) \to 0 \text{ as } m, n \to \infty.$$

4. Suppose $\{f_n\}$ is $L^0(\mu)$-Cauchy and let $\varepsilon_n > 0$ be such that $\sum_{n=1}^\infty \varepsilon_n < \infty$ ($\varepsilon_n = 2^{-n}$ would do) and set $\delta_n = \sum_{k=n}^\infty \varepsilon_k$. Choose $g_j = f_{n_j}$ where $\{n_j\}$ is a subsequence of $\mathbb{N}$ such that

$$\mu(\{|g_{j+1} - g_j| > \varepsilon_j\}) \leq \varepsilon_j.$$

Let

$$F_N := \cup_{j \geq N} \{|g_{j+1} - g_j| > \varepsilon_j\} \text{ and } E := \cap_{N=1}^\infty F_N = \{|g_{j+1} - g_j| > \varepsilon_j \text{ i.o.}\}.$$

Since $\mu(F_N) \leq \delta_N < \infty$ and $F_N \downarrow E$, it follows¹ that $0 = \mu(E) = \lim_{N\to\infty} \mu(F_N)$. For $\omega \notin E$, $|g_{j+1}(\omega) - g_j(\omega)| \leq \varepsilon_j$ for a.a. $j$ and so by Lemma 12.7, $f(\omega) := \lim_{j\to\infty} g_j(\omega)$ exists. For $\omega \in E$ we may define $f(\omega) \equiv 0$.

¹ Alternatively, $\mu(E) = 0$ by the first Borel–Cantelli lemma and the fact that $\sum_{j=1}^\infty \mu(\{|g_{j+1} - g_j| > \varepsilon_j\}) \leq \sum_{j=1}^\infty \varepsilon_j < \infty$.

5. Next we will show $g_N \overset{\mu}{\to} f$ as $N \to \infty$ where $f$ and $g_N$ are as above. If $\omega \in F_N^c = \cap_{j \geq N} \{|g_{j+1} - g_j| \leq \varepsilon_j\}$, then
$|g_{j+1}(\omega) - g_j(\omega)| \leq \varepsilon_j$ for all $j \geq N$. Another application of Lemma 12.7 shows $|f(\omega) - g_j(\omega)| \leq \delta_j$ for all $j \geq N$, i.e.

$$F_N^c \subset \cap_{j \geq N} \{|f - g_j| \leq \delta_j\} \subset \{|f - g_N| \leq \delta_N\}.$$

Therefore, by taking complements of this equation, $\{|f - g_N| > \delta_N\} \subset F_N$ and hence

$$\mu(|f - g_N| > \delta_N) \leq \mu(F_N) \leq \delta_N \to 0 \text{ as } N \to \infty$$

and in particular, $g_N \overset{\mu}{\to} f$ as $N \to \infty$.

With this in hand, it is straightforward to show $f_n \overset{\mu}{\to} f$. Indeed, by the usual trick, for all $j \in \mathbb{N}$,

$$\mu(|f_n - f| > \varepsilon) \leq \mu(|f - g_j| > \varepsilon/2) + \mu(|g_j - f_n| > \varepsilon/2).$$

Therefore, letting $j \to \infty$ in this inequality gives

$$\mu(|f_n - f| > \varepsilon) \leq \limsup_{j\to\infty} \mu(|g_j - f_n| > \varepsilon/2) \to 0 \text{ as } n \to \infty,$$

wherein we have used that $\{f_n\}_{n=1}^\infty$ is Cauchy in measure and $g_j \overset{\mu}{\to} f$.
Corollary 12.9 (Dominated Convergence Theorem). Let $(\Omega, \mathcal{B}, \mu)$ be a measure space. Suppose $f_n$, $g_n$, and $g$ are in $L^1$ and $f \in L^0$ are functions such that

$$|f_n| \leq g_n \text{ a.e.}, \quad f_n \overset{\mu}{\to} f, \quad g_n \overset{\mu}{\to} g, \text{ and } \int g_n \to \int g \text{ as } n \to \infty.$$

Then $f \in L^1$ and $\lim_{n\to\infty} \|f - f_n\|_1 = 0$, i.e. $f_n \to f$ in $L^1$. In particular $\lim_{n\to\infty} \int f_n = \int f$.

Proof. First notice that $|f| \leq g$ a.e. and hence $f \in L^1$ since $g \in L^1$. To see that $|f| \leq g$, use item 4. of Theorem 12.8 to find subsequences $\{f_{n_k}\}$ and $\{g_{n_k}\}$ of $\{f_n\}$ and $\{g_n\}$ respectively which are almost everywhere convergent. Then

$$|f| = \lim_{k\to\infty} |f_{n_k}| \leq \lim_{k\to\infty} g_{n_k} = g \text{ a.e.}$$

If (for the sake of contradiction) $\lim_{n\to\infty} \|f - f_n\|_1 \neq 0$, there exist $\varepsilon > 0$ and a subsequence $\{f_{n_k}\}$ of $\{f_n\}$ such that

$$\int |f - f_{n_k}| \geq \varepsilon \text{ for all } k. \tag{12.5}$$

Using item 4. of Theorem 12.8 again, we may assume (by passing to a further subsequence if necessary) that $f_{n_k} \to f$ and $g_{n_k} \to g$ almost everywhere. Noting $|f - f_{n_k}| \leq g + g_{n_k} \to 2g$ and $\int (g + g_{n_k}) \to \int 2g$, an application of the dominated convergence Theorem 7.27 implies $\lim_{k\to\infty} \int |f - f_{n_k}| = 0$ which contradicts Eq. (12.5).
Exercise 12.1 (Fatou's Lemma). Let $(\Omega, \mathcal{B}, \mu)$ be a measure space. If $f_n \ge 0$ and $f_n \to f$ in measure, then
\[ \int_\Omega f \, d\mu \le \liminf_{n \to \infty} \int_\Omega f_n \, d\mu. \]
Lemma 12.10. Suppose $1 \le p < \infty$, $\{f_n\}_{n=1}^\infty \subset L^p(\mu)$, and $f_n \xrightarrow{\mu} f$, then $\|f\|_p \le \liminf_{n \to \infty} \|f_n\|_p$. Moreover if $\{f_n\}_{n=1}^\infty \cup \{f\} \subset L^p(\mu)$, then $\|f - f_n\|_p \to 0$ as $n \to \infty$ iff $\lim_{n \to \infty} \|f_n\|_p = \|f\|_p < \infty$ and $f_n \xrightarrow{\mu} f$.

Proof. Choose a subsequence, $g_k = f_{n_k}$, such that $\liminf_{n \to \infty} \|f_n\|_p = \lim_{k \to \infty} \|g_k\|_p$. By passing to a further subsequence if necessary, we may further assume that $g_k \to f$ a.e. Therefore, by Fatou's lemma,
\[ \|f\|_p^p = \int_\Omega |f|^p \, d\mu = \int_\Omega \lim_{k \to \infty} |g_k|^p \, d\mu \le \liminf_{k \to \infty} \int_\Omega |g_k|^p \, d\mu = \liminf_{n \to \infty} \|f_n\|_p^p \]
which proves the first assertion.
If $\|f - f_n\|_p \to 0$ as $n \to \infty$, then by the triangle inequality, $\big| \|f\|_p - \|f_n\|_p \big| \le \|f - f_n\|_p$, which shows $\int |f_n|^p \, d\mu \to \int |f|^p \, d\mu$ if $f_n \to f$ in $L^p$. Chebyshev's inequality implies $f_n \xrightarrow{\mu} f$ if $f_n \to f$ in $L^p$.
Conversely if $\lim_{n \to \infty} \|f_n\|_p = \|f\|_p < \infty$ and $f_n \xrightarrow{\mu} f$, let $F_n := |f - f_n|^p$ and $G_n := 2^{p-1} \left[ |f|^p + |f_n|^p \right]$. Then $F_n \xrightarrow{\mu} 0$,$^2$ $F_n \le G_n \in L^1$, and $\int G_n \, d\mu \to \int G \, d\mu$ where $G := 2^p |f|^p \in L^1$. Therefore, by Corollary 12.9,
\[ \int |f - f_n|^p \, d\mu = \int F_n \, d\mu \to \int 0 \, d\mu = 0. \]
Exercise 12.2. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space, $p \in [1, \infty)$, and suppose that $0 \le f \in L^1(\mu)$, $0 \le f_n \in L^1(\mu)$ for all $n$, $f_n \xrightarrow{\mu} f$, and $\int f_n \, d\mu \to \int f \, d\mu$. Then $f_n \to f$ in $L^1(\mu)$. In particular if $f, f_n \in L^p(\mu)$ and $f_n \to f$ in $L^p(\mu)$, then $|f_n|^p \to |f|^p$ in $L^1(\mu)$.

Solution to Exercise (12.2). Let $F_n := |f - f_n| \le f + f_n =: g_n$ and $g := 2f$. Then $F_n \xrightarrow{\mu} 0$, $g_n \xrightarrow{\mu} g$, and $\int g_n \, d\mu \to \int g \, d\mu$. So by Corollary 12.9,
\[ \int |f - f_n| \, d\mu = \int F_n \, d\mu \to 0 \text{ as } n \to \infty. \]
Proposition 12.11. Suppose $(\Omega, \mathcal{B}, \mu)$ is a probability space and $\{f_n\}_{n=1}^\infty$ is a sequence of measurable functions on $\Omega$. Then $\{f_n\}_{n=1}^\infty$ converges to $f$ in probability iff every subsequence, $\{f'_n\}_{n=1}^\infty$, of $\{f_n\}_{n=1}^\infty$ has a further subsequence, $\{f''_n\}_{n=1}^\infty$, which is almost surely convergent to $f$.

$^2$ This is because $F_n \ge \varepsilon$ iff $|f - f_n| \ge \varepsilon^{1/p}$.
Proof. If $\{f_n\}_{n=1}^\infty$ is convergent and hence Cauchy in probability, then any subsequence, $\{f'_n\}_{n=1}^\infty$, is also Cauchy in probability. Hence by item 4. of Theorem 12.8 there is a further subsequence, $\{f''_n\}_{n=1}^\infty$, of $\{f'_n\}_{n=1}^\infty$ which is convergent almost surely.
Conversely if $\{f_n\}_{n=1}^\infty$ does not converge to $f$ in probability, then there exist an $\varepsilon > 0$ and a subsequence, $\{n_k\}$, such that $\inf_k \mu(|f - f_{n_k}| \ge \varepsilon) > 0$. Any subsequence of $\{f_{n_k}\}$ would have the same property and hence can not be almost surely convergent because of Egorov's Theorem 12.6.
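Proposition 12.11 can be illustrated numerically. The sketch below is not from the text; the `typewriter` helper is my own construction of the classical "typewriter" sequence of indicator functions on $[0,1]$ with Lebesgue measure. It converges to $0$ in probability but at no point of $[0,1)$, while the subsequence $\{f_{2^k}\}$ does converge, exactly as the proposition predicts.

```python
# The "typewriter" sequence: f_n = 1_{[j 2^-k, (j+1) 2^-k)} where n = 2^k + j.
# mu(f_n != 0) = 2^-k -> 0 (convergence in probability to 0), yet every
# x in [0,1) has f_n(x) = 1 once per "row" k, so f_n(x) does not converge.

def typewriter(n, x):
    """Evaluate the n-th typewriter function at x, for n >= 1."""
    k = n.bit_length() - 1          # largest k with 2^k <= n
    j = n - 2 ** k
    return 1.0 if j * 2.0 ** -k <= x < (j + 1) * 2.0 ** -k else 0.0

def support_measure(n):
    """Lebesgue measure of the set {f_n != 0}."""
    return 2.0 ** -(n.bit_length() - 1)

print([support_measure(n) for n in (1, 2, 4, 8, 1024)])   # shrinks to 0

# f_n(0.3) = 1 infinitely often (once per row k), so no pointwise limit:
hits = [n for n in range(1, 2 ** 12) if typewriter(n, 0.3) == 1.0]
print(len(hits))                    # one hit per row k = 0, ..., 11

# But along the subsequence n_k = 2^k the values at 0.3 converge to 0:
print([typewriter(2 ** k, 0.3) for k in range(8)])
```

Every subsequence of $\{f_n\}$ contains blocks with support measure $\to 0$, from which an a.e.-convergent further subsequence can always be extracted, matching the proposition.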
Corollary 12.12. Suppose $(\Omega, \mathcal{B}, \mu)$ is a probability space, $f_n \xrightarrow{\mu} f$ and $g_n \xrightarrow{\mu} g$, and $\varphi : \mathbb{R} \to \mathbb{R}$ and $\psi : \mathbb{R}^2 \to \mathbb{R}$ are continuous functions. Then
1. $\varphi(f_n) \xrightarrow{\mu} \varphi(f)$,
2. $\psi(f_n, g_n) \xrightarrow{\mu} \psi(f, g)$, and
3. $f_n \cdot g_n \xrightarrow{\mu} f \cdot g$.

Proof. Items 1. and 3. follow from item 2. by taking $\psi(x, y) = \varphi(x)$ and $\psi(x, y) = x \cdot y$ respectively. So it suffices to prove item 2. To do this we will make repeated use of Theorem 12.8.
Given any subsequence, $\{n_k\}$, of $\mathbb{N}$ there is a subsequence, $\{n'_k\}$ of $\{n_k\}$, such that $f_{n'_k} \to f$ a.s. and yet a further subsequence $\{n''_k\}$ of $\{n'_k\}$ such that $g_{n''_k} \to g$ a.s. Hence, by the continuity of $\psi$, it now follows that
\[ \lim_{k \to \infty} \psi\big( f_{n''_k}, g_{n''_k} \big) = \psi(f, g) \text{ a.s.} \]
which completes the proof.
Example 12.13. It is not possible to drop the assumption that $\mu(\Omega) < \infty$ in Corollary 12.12. For example, let $\Omega = \mathbb{R}$, $\mathcal{B} = \mathcal{B}_{\mathbb{R}}$, $\mu = m$ be Lebesgue measure, $f_n(x) = \frac{1}{n}$ and $g_n(x) = x^2 = g(x)$. Then $f_n \xrightarrow{\mu} 0$, $g_n \xrightarrow{\mu} g$ while $f_n g_n$ does not converge to $0 = 0 \cdot g$ in measure. Also if we let $\varphi(y) = y^2$, $f_n(x) = x + 1/n$ and $f(x) = x$ for all $x \in \mathbb{R}$, then $f_n \xrightarrow{\mu} f$ while
\[ [\varphi(f_n) - \varphi(f)](x) = (x + 1/n)^2 - x^2 = \frac{2}{n} x + \frac{1}{n^2} \]
does not go to $0$ in measure as $n \to \infty$.
12.3 Jensen's, Hölder's and Minkowski's Inequalities
Theorem 12.14 (Jensen's Inequality). Suppose that $(\Omega, \mathcal{B}, \mu)$ is a probability space, i.e. $\mu$ is a positive measure and $\mu(\Omega) = 1$. Also suppose that $f \in L^1(\mu)$, $f : \Omega \to (a, b)$, and $\varphi : (a, b) \to \mathbb{R}$ is a convex function (i.e. $\varphi''(x) \ge 0$ on $(a, b)$). Then
\[ \varphi\Big( \int_\Omega f \, d\mu \Big) \le \int_\Omega \varphi(f) \, d\mu \]
where if $\varphi(f) \notin L^1(\mu)$, then $\varphi(f)$ is integrable in the extended sense and $\int_\Omega \varphi(f) \, d\mu = \infty$.

Proof. Let $t = \int_\Omega f \, d\mu \in (a, b)$ and let $\beta \in \mathbb{R}$ ($\beta = \varphi'(t)$ when $\varphi'(t)$ exists) be such that $\varphi(s) - \varphi(t) \ge \beta(s - t)$ for all $s \in (a, b)$. (See Lemma 12.52 and Figure 12.5 when $\varphi$ is $C^1$ and Theorem 12.55 below for the existence of such a $\beta$ in the general case.) Then integrating the inequality, $\varphi(f) - \varphi(t) \ge \beta(f - t)$, implies that
\[ 0 \le \int_\Omega \varphi(f) \, d\mu - \varphi(t) = \int_\Omega \varphi(f) \, d\mu - \varphi\Big( \int_\Omega f \, d\mu \Big). \]
Moreover, if $\varphi(f)$ is not integrable, then $\varphi(f) \ge \varphi(t) + \beta(f - t)$ which shows that the negative part of $\varphi(f)$ is integrable. Therefore, $\int_\Omega \varphi(f) \, d\mu = \infty$ in this case.
Example 12.15. Since $e^x$ for $x \in \mathbb{R}$, $-\ln x$ for $x > 0$, and $x^p$ for $x \ge 0$ and $p \ge 1$ are all convex functions, we have the following inequalities
\[ \exp\Big( \int_\Omega f \, d\mu \Big) \le \int_\Omega e^f \, d\mu, \tag{12.6} \]
\[ \int_\Omega \log(|f|) \, d\mu \le \log\Big( \int_\Omega |f| \, d\mu \Big) \]
and for $p \ge 1$,
\[ \Big| \int_\Omega f \, d\mu \Big|^p \le \Big( \int_\Omega |f| \, d\mu \Big)^p \le \int_\Omega |f|^p \, d\mu. \]
Example 12.16. As a special case of Eq. (12.6), if $p_i, s_i > 0$ for $i = 1, 2, \dots, n$ and $\sum_{i=1}^n \frac{1}{p_i} = 1$, then
\[ s_1 \dots s_n = e^{\sum_{i=1}^n \ln s_i} = e^{\sum_{i=1}^n \frac{1}{p_i} \ln s_i^{p_i}} \le \sum_{i=1}^n \frac{1}{p_i} e^{\ln s_i^{p_i}} = \sum_{i=1}^n \frac{s_i^{p_i}}{p_i}. \tag{12.7} \]
Indeed, we have applied Eq. (12.6) with $\Omega = \{1, 2, \dots, n\}$, $\mu = \sum_{i=1}^n \frac{1}{p_i} \delta_i$ and $f(i) := \ln s_i^{p_i}$. As a special case of Eq. (12.7), suppose that $s, t, p, q \in (1, \infty)$ with $q = \frac{p}{p-1}$ (i.e. $\frac{1}{p} + \frac{1}{q} = 1$) then
\[ st \le \frac{1}{p} s^p + \frac{1}{q} t^q. \tag{12.8} \]
(When $p = q = 2$, the inequality in Eq. (12.8) follows from the inequality $0 \le (s - t)^2$.)
As another special case of Eq. (12.7), take $p_i = n$ and $s_i = a_i^{1/n}$ with $a_i > 0$, then we get the arithmetic-geometric mean inequality,
\[ \sqrt[n]{a_1 \dots a_n} \le \frac{1}{n} \sum_{i=1}^n a_i. \tag{12.9} \]
Example 12.17. Let $(\Omega, \mathcal{B}, \mu)$ be a probability space, $0 < p < q < \infty$, and $f : \Omega \to \mathbb{C}$ a measurable function. Then by Jensen's inequality,
\[ \Big( \int_\Omega |f|^p \, d\mu \Big)^{q/p} \le \int_\Omega \big( |f|^p \big)^{q/p} \, d\mu = \int_\Omega |f|^q \, d\mu \]
from which it follows that $\|f\|_p \le \|f\|_q$. In particular, $L^q(\mu) \subset L^p(\mu)$ for all $0 < p < q < \infty$. See Corollary 12.31 for an alternative proof.
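On a finite probability space the monotonicity $\|f\|_p \le \|f\|_q$ is easy to observe directly. The sketch below is not from the text; it uses the uniform measure $\mu(\{i\}) = 1/n$ on $n$ points.

```python
def lp_norm(f, p):
    """||f||_p with respect to the uniform probability measure on len(f) points."""
    n = len(f)
    return (sum(abs(x) ** p for x in f) / n) ** (1.0 / p)

f = [0.3, 2.5, 1.1, 0.0, 4.2]
norms = [lp_norm(f, p) for p in (1, 2, 3, 6)]
print(norms)                                       # non-decreasing in p
increasing = all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
print(increasing)
```

Note that the normalization by $1/n$ is essential: without it (counting measure) the inequality reverses, as in Corollary 12.36 below.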
Theorem 12.18 (Hölder's inequality). Suppose that $1 \le p \le \infty$ and $q := \frac{p}{p-1}$, or equivalently $p^{-1} + q^{-1} = 1$. If $f$ and $g$ are measurable functions then
\[ \|fg\|_1 \le \|f\|_p \cdot \|g\|_q. \tag{12.10} \]
Assuming $p \in (1, \infty)$ and $\|f\|_p \cdot \|g\|_q < \infty$, equality holds in Eq. (12.10) iff $|f|^p$ and $|g|^q$ are linearly dependent as elements of $L^1$ which happens iff
\[ |g|^q \|f\|_p^p = \|g\|_q^q |f|^p \text{ a.e.} \tag{12.11} \]

Proof. The cases $p = 1$ and $q = \infty$ or $p = \infty$ and $q = 1$ are easy to deal with and will be left to the reader. So we now assume that $p, q \in (1, \infty)$. If $\|f\|_p = 0$ or $\infty$ or $\|g\|_q = 0$ or $\infty$, Eq. (12.10) is again easily verified. So we will now assume that $0 < \|f\|_p, \|g\|_q < \infty$. Taking $s = |f| / \|f\|_p$ and $t = |g| / \|g\|_q$ in Eq. (12.8) gives,
\[ \frac{|fg|}{\|f\|_p \|g\|_q} \le \frac{1}{p} \frac{|f|^p}{\|f\|_p^p} + \frac{1}{q} \frac{|g|^q}{\|g\|_q^q} \tag{12.12} \]
with equality iff $|g| / \|g\|_q = |f|^{p-1} / \|f\|_p^{p-1} = |f|^{p/q} / \|f\|_p^{p/q}$, i.e. $|g|^q \|f\|_p^p = \|g\|_q^q |f|^p$. Integrating Eq. (12.12) implies
\[ \frac{\|fg\|_1}{\|f\|_p \|g\|_q} \le \frac{1}{p} + \frac{1}{q} = 1 \]
with equality iff Eq. (12.11) holds. The proof is finished since it is easily checked that equality holds in Eq. (12.10) when $|f|^p = c |g|^q$ or $|g|^q = c |f|^p$ for some constant $c$.
Example 12.19. Suppose that $a_k \in \mathbb{C}$ for $k = 1, 2, \dots, n$ and $p \in [1, \infty)$, then
\[ \Big| \sum_{k=1}^n a_k \Big|^p \le n^{p-1} \sum_{k=1}^n |a_k|^p. \tag{12.13} \]
Indeed, by Hölder's inequality applied using the measure space $\{1, 2, \dots, n\}$ equipped with counting measure, we have
\[ \Big| \sum_{k=1}^n a_k \Big| = \Big| \sum_{k=1}^n a_k \cdot 1 \Big| \le \Big( \sum_{k=1}^n |a_k|^p \Big)^{1/p} \Big( \sum_{k=1}^n 1^q \Big)^{1/q} = n^{1/q} \Big( \sum_{k=1}^n |a_k|^p \Big)^{1/p} \]
where $q = \frac{p}{p-1}$. Taking the $p^{\text{th}}$ power of this inequality then gives Eq. (12.13).
Theorem 12.20 (Generalized Hölder's inequality). Suppose that $f_i : \Omega \to \mathbb{C}$ are measurable functions for $i = 1, \dots, n$ and $p_1, \dots, p_n$ and $r$ are positive numbers such that $\sum_{i=1}^n p_i^{-1} = r^{-1}$, then
\[ \Big\| \prod_{i=1}^n f_i \Big\|_r \le \prod_{i=1}^n \|f_i\|_{p_i}. \tag{12.14} \]

Proof. One may prove this theorem by induction based on Hölder's Theorem 12.18 above. Alternatively we may give a proof along the lines of the proof of Theorem 12.18, which is what we will do here.
Since Eq. (12.14) is easily seen to hold if $\|f_i\|_{p_i} = 0$ for some $i$, we will assume that $\|f_i\|_{p_i} > 0$ for all $i$. By assumption, $\sum_{i=1}^n \frac{r}{p_i} = 1$, hence we may replace $s_i$ by $s_i^r$ and $p_i$ by $p_i / r$ for each $i$ in Eq. (12.7) to find
\[ s_1^r \dots s_n^r \le \sum_{i=1}^n \frac{(s_i^r)^{p_i / r}}{p_i / r} = r \sum_{i=1}^n \frac{s_i^{p_i}}{p_i}. \]
Now replace $s_i$ by $|f_i| / \|f_i\|_{p_i}$ in the previous inequality and integrate the result to find
\[ \frac{1}{\prod_{i=1}^n \|f_i\|_{p_i}^r} \Big\| \prod_{i=1}^n f_i \Big\|_r^r \le r \sum_{i=1}^n \frac{1}{p_i} \frac{1}{\|f_i\|_{p_i}^{p_i}} \int_\Omega |f_i|^{p_i} \, d\mu = r \sum_{i=1}^n \frac{1}{p_i} = 1. \]
Definition 12.21. A norm on a vector space $Z$ is a function $\|\cdot\| : Z \to [0, \infty)$ such that
1. (Homogeneity) $\|\lambda f\| = |\lambda| \, \|f\|$ for all $\lambda \in \mathbb{F}$ and $f \in Z$.
2. (Triangle inequality) $\|f + g\| \le \|f\| + \|g\|$ for all $f, g \in Z$.
3. (Positive definite) $\|f\| = 0$ implies $f = 0$.
A pair $(Z, \|\cdot\|)$ where $Z$ is a vector space and $\|\cdot\|$ is a norm on $Z$ is called a normed vector space.
Theorem 12.22 (Minkowski's Inequality). If $1 \le p \le \infty$ and $f, g \in L^p(\mu)$ then
\[ \|f + g\|_p \le \|f\|_p + \|g\|_p. \tag{12.15} \]
In particular, $\big( L^p(\mu), \|\cdot\|_p \big)$ is a normed vector space for all $1 \le p \le \infty$.

Proof. When $p = \infty$, $|f| \le \|f\|_\infty$ a.e. and $|g| \le \|g\|_\infty$ a.e. so that $|f + g| \le |f| + |g| \le \|f\|_\infty + \|g\|_\infty$ a.e. and therefore
\[ \|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty. \]
When $p < \infty$,
\[ |f + g|^p \le \big( 2 \max(|f|, |g|) \big)^p = 2^p \max(|f|^p, |g|^p) \le 2^p \big( |f|^p + |g|^p \big), \]
which implies$^3$ $f + g \in L^p$ since
\[ \|f + g\|_p^p \le 2^p \big( \|f\|_p^p + \|g\|_p^p \big) < \infty. \]
Furthermore, when $p = 1$ we have
\[ \|f + g\|_1 = \int_\Omega |f + g| \, d\mu \le \int_\Omega |f| \, d\mu + \int_\Omega |g| \, d\mu = \|f\|_1 + \|g\|_1. \]
We now consider $p \in (1, \infty)$. We may assume $\|f + g\|_p$, $\|f\|_p$ and $\|g\|_p$ are all positive since otherwise the theorem is easily verified. Integrating
\[ |f + g|^p = |f + g| \, |f + g|^{p-1} \le (|f| + |g|) \, |f + g|^{p-1} \]
and then applying Hölder's inequality with $q = p / (p - 1)$ gives
\[ \int_\Omega |f + g|^p \, d\mu \le \int_\Omega |f| \, |f + g|^{p-1} \, d\mu + \int_\Omega |g| \, |f + g|^{p-1} \, d\mu \le \big( \|f\|_p + \|g\|_p \big) \, \big\| |f + g|^{p-1} \big\|_q, \tag{12.16} \]
where
\[ \big\| |f + g|^{p-1} \big\|_q^q = \int_\Omega \big( |f + g|^{p-1} \big)^q \, d\mu = \int_\Omega |f + g|^p \, d\mu = \|f + g\|_p^p. \tag{12.17} \]
Combining Eqs. (12.16) and (12.17) implies
\[ \|f + g\|_p^p \le \|f\|_p \|f + g\|_p^{p/q} + \|g\|_p \|f + g\|_p^{p/q}. \tag{12.18} \]
Solving this inequality for $\|f + g\|_p$ gives Eq. (12.15).

$^3$ In light of Example 12.19, the last $2^p$ in the above inequality may be replaced by $2^{p-1}$.
12.4 Completeness of $L^p$ spaces
Definition 12.23 (Banach space). A normed vector space $(Z, \|\cdot\|)$ is a Banach space if it is complete, i.e. all Cauchy sequences are convergent. To be more precise we are assuming that if $\{x_n\}_{n=1}^\infty \subset Z$ satisfies $\lim_{m,n \to \infty} \|x_n - x_m\| = 0$, then there exists an $x \in Z$ such that $\lim_{n \to \infty} \|x - x_n\| = 0$.
Theorem 12.24. Let $\|\cdot\|_\infty$ be as defined in Eq. (12.2), then $(L^\infty(\Omega, \mathcal{B}, \mu), \|\cdot\|_\infty)$ is a Banach space. A sequence $\{f_n\}_{n=1}^\infty \subset L^\infty$ converges to $f \in L^\infty$ iff there exists $E \in \mathcal{B}$ such that $\mu(E) = 0$ and $f_n \to f$ uniformly on $E^c$. Moreover, bounded simple functions are dense in $L^\infty$.

Proof. By Minkowski's Theorem 12.22, $\|\cdot\|_\infty$ satisfies the triangle inequality. The reader may easily check the remaining conditions that ensure $\|\cdot\|_\infty$ is a norm. Suppose that $\{f_n\}_{n=1}^\infty \subset L^\infty$ is a sequence such that $f_n \to f \in L^\infty$, i.e. $\|f - f_n\|_\infty \to 0$ as $n \to \infty$. Then for all $k \in \mathbb{N}$, there exists $N_k < \infty$ such that
\[ \mu\big( |f - f_n| > k^{-1} \big) = 0 \text{ for all } n \ge N_k. \]
Let
\[ E = \bigcup_{k=1}^\infty \bigcup_{n \ge N_k} \big\{ |f - f_n| > k^{-1} \big\}. \]
Then $\mu(E) = 0$ and for $x \in E^c$, $|f(x) - f_n(x)| \le k^{-1}$ for all $n \ge N_k$. This shows that $f_n \to f$ uniformly on $E^c$. Conversely, if there exists $E \in \mathcal{B}$ such that $\mu(E) = 0$ and $f_n \to f$ uniformly on $E^c$, then for any $\varepsilon > 0$,
\[ \mu(|f - f_n| \ge \varepsilon) = \mu\big( \{|f - f_n| \ge \varepsilon\} \cap E^c \big) = 0 \]
for all $n$ sufficiently large. That is to say $\limsup_{n \to \infty} \|f - f_n\|_\infty \le \varepsilon$ for all $\varepsilon > 0$. The density of simple functions follows from the approximation Theorem 6.39. So the last item to prove is the completeness of $L^\infty$.
Suppose $\varepsilon_{m,n} := \|f_m - f_n\|_\infty \to 0$ as $m, n \to \infty$. Let $E_{m,n} = \{|f_n - f_m| > \varepsilon_{m,n}\}$ and $E := \bigcup E_{m,n}$, then $\mu(E) = 0$ and
\[ \sup_{x \in E^c} |f_m(x) - f_n(x)| \le \varepsilon_{m,n} \to 0 \text{ as } m, n \to \infty. \]
Therefore, $f := \lim_{n \to \infty} f_n$ exists on $E^c$ and the limit is uniform on $E^c$. Letting $f = \lim_{n \to \infty} 1_{E^c} f_n$, it then follows that $\lim_{n \to \infty} \|f_n - f\|_\infty = 0$.
Theorem 12.25 (Completeness of $L^p(\mu)$). For $1 \le p \le \infty$, $L^p(\mu)$ equipped with the $L^p$ norm, $\|\cdot\|_p$ (see Eq. (12.1)), is a Banach space.
Proof. By Minkowski's Theorem 12.22, $\|\cdot\|_p$ satisfies the triangle inequality. As above the reader may easily check the remaining conditions that ensure $\|\cdot\|_p$ is a norm. So we are left to prove the completeness of $L^p(\mu)$ for $1 \le p < \infty$, the case $p = \infty$ being done in Theorem 12.24.
Let $\{f_n\}_{n=1}^\infty \subset L^p(\mu)$ be a Cauchy sequence. By Chebyshev's inequality (Lemma 12.4), $\{f_n\}$ is $L^0$-Cauchy (i.e. Cauchy in measure) and by Theorem 12.8 there exists a subsequence $\{g_j\}$ of $\{f_n\}$ such that $g_j \to f$ a.e. By Fatou's Lemma,
\[ \|g_j - f\|_p^p = \int \liminf_{k \to \infty} |g_j - g_k|^p \, d\mu \le \liminf_{k \to \infty} \int |g_j - g_k|^p \, d\mu = \liminf_{k \to \infty} \|g_j - g_k\|_p^p \to 0 \text{ as } j \to \infty. \]
In particular, $\|f\|_p \le \|g_j - f\|_p + \|g_j\|_p < \infty$ so that $f \in L^p$ and $g_j \xrightarrow{L^p} f$. The proof is finished because,
\[ \|f_n - f\|_p \le \|f_n - g_j\|_p + \|g_j - f\|_p \to 0 \text{ as } j, n \to \infty. \]
See Definition 14.2 for a very important example of where completeness is used. To end this section we are going to record a few results we will need later regarding subspaces of $L^p(\mu)$ which are induced by sub-$\sigma$-algebras, $\mathcal{B}_0 \subset \mathcal{B}$.

Lemma 12.26. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and $\mathcal{B}_0$ be a sub-$\sigma$-algebra of $\mathcal{B}$. Then for $1 \le p < \infty$, the map $i : L^p(\Omega, \mathcal{B}_0, \mu) \to L^p(\Omega, \mathcal{B}, \mu)$ defined by $i([f]_0) = [f]$ is a well defined linear isometry. Here we are writing,
\[ [f]_0 = \{ g \in L^p(\Omega, \mathcal{B}_0, \mu) : g = f \text{ a.e.} \} \quad \text{and} \quad [f] = \{ g \in L^p(\Omega, \mathcal{B}, \mu) : g = f \text{ a.e.} \}. \]
Moreover the image of $i$, $i(L^p(\Omega, \mathcal{B}_0, \mu))$, is a closed subspace of $L^p(\Omega, \mathcal{B}, \mu)$.
Proof. This proof is routine and most of it will be left to the reader. Let us just check that $i(L^p(\Omega, \mathcal{B}_0, \mu))$ is a closed subspace of $L^p(\Omega, \mathcal{B}, \mu)$. To this end, suppose that $i([f_n]_0) = [f_n]$ is a convergent sequence in $L^p(\Omega, \mathcal{B}, \mu)$. Because $i$ is an isometry it follows that $\{[f_n]_0\}_{n=1}^\infty$ is a Cauchy and hence convergent sequence in $L^p(\Omega, \mathcal{B}_0, \mu)$. Letting $f \in L^p(\Omega, \mathcal{B}_0, \mu)$ be such that $\|f - f_n\|_{L^p(\mu)} \to 0$, we will have, since $i$ is isometric, that $[f_n] \to [f] = i([f]_0) \in i(L^p(\Omega, \mathcal{B}_0, \mu))$ as desired.
Exercise 12.3. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and $\mathcal{B}_0$ be a sub-$\sigma$-algebra of $\mathcal{B}$. Further suppose that to every $B \in \mathcal{B}$ there exists $A \in \mathcal{B}_0$ such that $\mu(B \triangle A) = 0$. Show for all $1 \le p < \infty$ that $i(L^p(\Omega, \mathcal{B}_0, \mu)) = L^p(\Omega, \mathcal{B}, \mu)$, i.e. to each $f \in L^p(\Omega, \mathcal{B}, \mu)$ there exists a $g \in L^p(\Omega, \mathcal{B}_0, \mu)$ such that $f = g$ a.e. Hints: 1. verify the last assertion for simple functions in $L^p(\Omega, \mathcal{B}_0, \mu)$. 2. then make use of Theorem 6.39 and Exercise 6.4.
Exercise 12.4. Suppose that $1 \le p < \infty$, $(\Omega, \mathcal{B}, \mu)$ is a $\sigma$-finite measure space and $\mathcal{B}_0$ is a sub-$\sigma$-algebra of $\mathcal{B}$. Show that $i(L^p(\Omega, \mathcal{B}_0, \mu)) = L^p(\Omega, \mathcal{B}, \mu)$ implies: to every $B \in \mathcal{B}$ there exists $A \in \mathcal{B}_0$ such that $\mu(B \triangle A) = 0$.
Solution to Exercise (12.4). Let $B \in \mathcal{B}$ with $\mu(B) < \infty$. Then $1_B \in L^p(\Omega, \mathcal{B}, \mu)$ and hence by assumption there exists $g \in L^p(\Omega, \mathcal{B}_0, \mu)$ such that $g = 1_B$ a.e. Let $A := \{g = 1\} \in \mathcal{B}_0$ and observe that $A \triangle B \subset \{g \ne 1_B\}$. Therefore $\mu(A \triangle B) = \mu(g \ne 1_B) = 0$. For the general case we use the fact that $(\Omega, \mathcal{B}, \mu)$ is a $\sigma$-finite measure space to conclude that each $B \in \mathcal{B}$ may be written as a disjoint union, $B = \bigcup_{n=1}^\infty B_n$, with $B_n \in \mathcal{B}$ and $\mu(B_n) < \infty$. By what we have just proved we may find $A_n \in \mathcal{B}_0$ such that $\mu(B_n \triangle A_n) = 0$. I now claim that $A := \bigcup_{n=1}^\infty A_n \in \mathcal{B}_0$ satisfies $\mu(A \triangle B) = 0$. Indeed, notice that
\[ A \setminus B = \bigcup_{n=1}^\infty A_n \setminus B \subset \bigcup_{n=1}^\infty A_n \setminus B_n; \]
similarly $B \setminus A \subset \bigcup_{n=1}^\infty B_n \setminus A_n$, and therefore $A \triangle B \subset \bigcup_{n=1}^\infty A_n \triangle B_n$. Therefore by sub-additivity of $\mu$, $\mu(A \triangle B) \le \sum_{n=1}^\infty \mu(A_n \triangle B_n) = 0$.
Convention: From now on we will drop the cumbersome notation and simply identify $[f]$ with $f$ and $L^p(\Omega, \mathcal{B}_0, \mu)$ with its image, $i(L^p(\Omega, \mathcal{B}_0, \mu))$, in $L^p(\Omega, \mathcal{B}, \mu)$.
12.5 Density Results
Theorem 12.27 (Density Theorem). Let $p \in [1, \infty)$, $(\Omega, \mathcal{B}, \mu)$ be a measure space and $\mathbb{M}$ be an algebra of bounded $\mathbb{R}$-valued measurable functions such that
1. $\mathbb{M} \subset L^p(\mu, \mathbb{R})$ and $\sigma(\mathbb{M}) = \mathcal{B}$.
2. There exists $\psi_k \in \mathbb{M}$ such that $\psi_k \to 1$ boundedly.
Then to every function $f \in L^p(\mu, \mathbb{R})$, there exist $\varphi_n \in \mathbb{M}$ such that $\lim_{n \to \infty} \|f - \varphi_n\|_{L^p(\mu)} = 0$, i.e. $\mathbb{M}$ is dense in $L^p(\mu, \mathbb{R})$.
Proof. Fix $k \in \mathbb{N}$ for the moment and let $\mathbb{H}$ denote those bounded $\mathcal{B}$-measurable functions, $f : \Omega \to \mathbb{R}$, for which there exists $\{\varphi_n\}_{n=1}^\infty \subset \mathbb{M}$ such that $\lim_{n \to \infty} \|\psi_k f - \varphi_n\|_{L^p(\mu)} = 0$. A routine check shows $\mathbb{H}$ is a subspace of the bounded measurable $\mathbb{R}$-valued functions on $\Omega$, $1 \in \mathbb{H}$, $\mathbb{M} \subset \mathbb{H}$ and $\mathbb{H}$ is closed under bounded convergence. To verify the latter assertion, suppose $f_n \in \mathbb{H}$ and $f_n \to f$ boundedly. Then, by the dominated convergence theorem, $\lim_{n \to \infty} \|\psi_k (f - f_n)\|_{L^p(\mu)} = 0$.$^4$ (Take the dominating function to be $g = [2C |\psi_k|]^p$ where $C$ is a constant bounding all of the $\{|f_n|\}_{n=1}^\infty$.) We may now choose $\varphi_n \in \mathbb{M}$ such that $\|\varphi_n - \psi_k f_n\|_{L^p(\mu)} \le \frac{1}{n}$, then
\[ \limsup_{n \to \infty} \|\psi_k f - \varphi_n\|_{L^p(\mu)} \le \limsup_{n \to \infty} \|\psi_k (f - f_n)\|_{L^p(\mu)} + \limsup_{n \to \infty} \|\psi_k f_n - \varphi_n\|_{L^p(\mu)} = 0 \tag{12.19} \]
which implies $f \in \mathbb{H}$.
An application of Dynkin's Multiplicative System Theorem 8.16 now shows $\mathbb{H}$ contains all bounded measurable functions on $\Omega$. Let $f \in L^p(\mu)$ be given. The dominated convergence theorem implies $\lim_{k \to \infty} \big\| \psi_k 1_{\{|f| \le k\}} f - f \big\|_{L^p(\mu)} = 0$. (Take the dominating function to be $g = [2C |f|]^p$ where $C$ is a bound on all of the $|\psi_k|$.) Using this and what we have just proved, there exists $\varphi_k \in \mathbb{M}$ such that
\[ \big\| \psi_k 1_{\{|f| \le k\}} f - \varphi_k \big\|_{L^p(\mu)} \le \frac{1}{k}. \]
The same line of reasoning used in Eq. (12.19) now implies $\lim_{k \to \infty} \|f - \varphi_k\|_{L^p(\mu)} = 0$.

$^4$ It is at this point that the proof would break down if $p = \infty$.
Example 12.28. Let $\mu$ be a measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ such that $\mu([-M, M]) < \infty$ for all $M < \infty$. Then $C_c(\mathbb{R}, \mathbb{R})$ (the space of continuous functions on $\mathbb{R}$ with compact support) is dense in $L^p(\mu)$ for all $1 \le p < \infty$. To see this, apply Theorem 12.27 with $\mathbb{M} = C_c(\mathbb{R}, \mathbb{R})$ and $\psi_k := 1_{[-k,k]}$.
Theorem 12.29. Suppose $p \in [1, \infty)$, $\mathcal{A} \subset \mathcal{B} \subset 2^\Omega$ is an algebra such that $\sigma(\mathcal{A}) = \mathcal{B}$ and $\mu$ is $\sigma$-finite on $\mathcal{A}$. Let $\mathbb{S}(\mathcal{A}, \mu)$ denote the measurable simple functions, $\varphi : \Omega \to \mathbb{R}$, such that $\{\varphi = y\} \in \mathcal{A}$ for all $y \in \mathbb{R}$ and $\mu(\varphi \ne 0) < \infty$. Then $\mathbb{S}(\mathcal{A}, \mu)$ is a dense subspace of $L^p(\mu)$.

Proof. Let $\mathbb{M} := \mathbb{S}(\mathcal{A}, \mu)$. By assumption there exist $\Omega_k \in \mathcal{A}$ such that $\mu(\Omega_k) < \infty$ and $\Omega_k \uparrow \Omega$ as $k \to \infty$. If $A \in \mathcal{A}$, then $\Omega_k \cap A \in \mathcal{A}$ and $\mu(\Omega_k \cap A) < \infty$ so that $1_{\Omega_k \cap A} \in \mathbb{M}$. Therefore $1_A = \lim_{k \to \infty} 1_{\Omega_k \cap A}$ is $\sigma(\mathbb{M})$-measurable for every $A \in \mathcal{A}$. So we have shown that $\mathcal{A} \subset \sigma(\mathbb{M}) \subset \mathcal{B}$ and therefore $\mathcal{B} = \sigma(\mathcal{A}) \subset \sigma(\mathbb{M}) \subset \mathcal{B}$, i.e. $\sigma(\mathbb{M}) = \mathcal{B}$. The theorem now follows from Theorem 12.27 after observing $\psi_k := 1_{\Omega_k} \in \mathbb{M}$ and $\psi_k \to 1$ boundedly.
Theorem 12.30 (Separability of $L^p$ Spaces). Suppose $p \in [1, \infty)$, $\mathcal{A} \subset \mathcal{B}$ is a countable algebra such that $\sigma(\mathcal{A}) = \mathcal{B}$ and $\mu$ is $\sigma$-finite on $\mathcal{A}$. Then $L^p(\mu)$ is separable and
\[ \mathbb{D} = \Big\{ \sum a_j 1_{A_j} : a_j \in \mathbb{Q} + i\mathbb{Q}, \ A_j \in \mathcal{A} \text{ with } \mu(A_j) < \infty \Big\} \]
is a countable dense subset.

Proof. It is left to the reader to check that $\mathbb{D}$ is dense in $\mathbb{S}(\mathcal{A}, \mu)$ relative to the $L^p(\mu)$ norm. Once this is done, the proof is then complete since $\mathbb{S}(\mathcal{A}, \mu)$ is a dense subspace of $L^p(\mu)$ by Theorem 12.29.
12.6 Relationships between different $L^p$ spaces
The $L^p(\mu)$ norm controls two types of behaviors of $f$, namely the behavior at infinity and the behavior of local singularities. So in particular, if $f$ blows up at a point $x_0$, then locally near $x_0$ it is harder for $f$ to be in $L^p(\mu)$ as $p$ increases. On the other hand a function $f \in L^p(\mu)$ is allowed to decay at infinity slower and slower as $p$ increases. With these insights in mind, we should not in general expect $L^p(\mu) \subset L^q(\mu)$ or $L^q(\mu) \subset L^p(\mu)$. However, there are two notable exceptions. (1) If $\mu(\Omega) < \infty$, then there is no behavior at infinity to worry about and $L^q(\mu) \subset L^p(\mu)$ for all $q \ge p$ as is shown in Corollary 12.31 below. (2) If $\mu$ is counting measure, i.e. $\mu(A) = \#(A)$, then functions in $L^p(\mu)$ for any $p$ can not blow up on a set of positive measure, so there are no local singularities. In this case $L^p(\mu) \subset L^q(\mu)$ for all $q \ge p$, see Corollary 12.36 below.
Corollary 12.31 (Example 12.17 revisited). If $\mu(\Omega) < \infty$ and $0 < p \le q \le \infty$, then $L^q(\mu) \subset L^p(\mu)$, the inclusion map is bounded and in fact
\[ \|f\|_p \le [\mu(\Omega)]^{\left( \frac{1}{p} - \frac{1}{q} \right)} \|f\|_q. \]

Proof. Take $a \in [1, \infty]$ such that
\[ \frac{1}{p} = \frac{1}{a} + \frac{1}{q}, \text{ i.e. } a = \frac{pq}{q - p}. \]
Then by Theorem 12.20,
\[ \|f\|_p = \|f \cdot 1\|_p \le \|f\|_q \cdot \|1\|_a = \mu(\Omega)^{1/a} \|f\|_q = \mu(\Omega)^{\left( \frac{1}{p} - \frac{1}{q} \right)} \|f\|_q. \]
The reader may easily check this final formula is correct even when $q = \infty$ provided we interpret $1/p - 1/\infty$ to be $1/p$.
The rest of this section may be skipped.
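The bound in Corollary 12.31 can be spot-checked numerically. The sketch below is not from the text; it uses a finite measure assigning point masses $w_i$ to finitely many points.

```python
def lp_norm(f, w, p):
    """||f||_p for the measure assigning mass w[i] to the i-th point."""
    return sum(wi * abs(x) ** p for x, wi in zip(f, w)) ** (1.0 / p)

f = [0.3, 2.0, 5.5, 1.2]
w = [0.5, 1.0, 0.25, 2.0]                          # total mass mu(Omega) = 3.75
mu_total = sum(w)

# ||f||_p <= mu(Omega)^(1/p - 1/q) * ||f||_q for each p <= q:
bound_holds = all(
    lp_norm(f, w, p) <= mu_total ** (1.0 / p - 1.0 / q) * lp_norm(f, w, q) + 1e-9
    for p, q in [(1, 2), (1.5, 4), (2, 8)])
print(bound_holds)
```

When $\mu(\Omega) = 1$ the prefactor is $1$ and this reduces to $\|f\|_p \le \|f\|_q$ of Example 12.17.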
Example 12.32 (Power Inequalities). Let $a := (a_1, \dots, a_n)$ with $a_i > 0$ for $i = 1, 2, \dots, n$ and for $p \in \mathbb{R} \setminus \{0\}$, let
\[ \|a\|_p := \Big( \frac{1}{n} \sum_{i=1}^n a_i^p \Big)^{1/p}. \]
Then by Corollary 12.31, $p \to \|a\|_p$ is increasing in $p$ for $p > 0$. For $p = -q < 0$, we have
\[ \|a\|_p = \Big( \frac{1}{n} \sum_{i=1}^n a_i^{-q} \Big)^{-1/q} = \Bigg( \frac{1}{\frac{1}{n} \sum_{i=1}^n \left( \frac{1}{a_i} \right)^q} \Bigg)^{1/q} = \Big\| \frac{1}{a} \Big\|_q^{-1} \]
where $\frac{1}{a} := (1/a_1, \dots, 1/a_n)$. So for $p < 0$, as $p$ increases, $q = -p$ decreases, so that $\left\| \frac{1}{a} \right\|_q$ is decreasing and hence $\left\| \frac{1}{a} \right\|_q^{-1}$ is increasing. Hence we have shown that $p \to \|a\|_p$ is increasing for $p \in \mathbb{R} \setminus \{0\}$.
We now claim that $\lim_{p \to 0} \|a\|_p = \sqrt[n]{a_1 \dots a_n}$. To prove this, write $a_i^p = e^{p \ln a_i} = 1 + p \ln a_i + O(p^2)$ for $p$ near zero. Therefore,
\[ \frac{1}{n} \sum_{i=1}^n a_i^p = 1 + p \frac{1}{n} \sum_{i=1}^n \ln a_i + O(p^2). \]
Hence it follows that
\[ \lim_{p \to 0} \|a\|_p = \lim_{p \to 0} \Big( \frac{1}{n} \sum_{i=1}^n a_i^p \Big)^{1/p} = \lim_{p \to 0} \Big( 1 + p \frac{1}{n} \sum_{i=1}^n \ln a_i + O(p^2) \Big)^{1/p} = e^{\frac{1}{n} \sum_{i=1}^n \ln a_i} = \sqrt[n]{a_1 \dots a_n}. \]
So if we now define $\|a\|_0 := \sqrt[n]{a_1 \dots a_n}$, the map $p \in \mathbb{R} \to \|a\|_p \in (0, \infty)$ is continuous and increasing in $p$.
We will now show that $\lim_{p \to \infty} \|a\|_p = \max_i a_i =: M$ and $\lim_{p \to -\infty} \|a\|_p = \min_i a_i =: m$. Indeed, for $p > 0$,
\[ \frac{1}{n} M^p \le \frac{1}{n} \sum_{i=1}^n a_i^p \le M^p \]
and therefore,
\[ \Big( \frac{1}{n} \Big)^{1/p} M \le \|a\|_p \le M. \]
Since $\left( \frac{1}{n} \right)^{1/p} \to 1$ as $p \to \infty$, it follows that $\lim_{p \to \infty} \|a\|_p = M$. For $p = -q < 0$, we have
\[ \lim_{p \to -\infty} \|a\|_p = \lim_{q \to \infty} \Big\| \frac{1}{a} \Big\|_q^{-1} = \frac{1}{\max_i (1/a_i)} = \frac{1}{1/m} = m = \min_i a_i. \]
Conclusion. If we extend the definition of $\|a\|_p$ to $p = \infty$ and $p = -\infty$ by $\|a\|_\infty = \max_i a_i$ and $\|a\|_{-\infty} = \min_i a_i$, then $\mathbb{R} \ni p \to \|a\|_p \in (0, \infty)$ is a continuous non-decreasing function of $p$.
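The monotonicity and the limiting values in Example 12.32 are easy to see numerically. The sketch below is not from the text; it evaluates $\|a\|_p$ over a range of $p$, using the geometric mean as the continuous extension at $p = 0$.

```python
import math

def power_mean(a, p):
    """||a||_p of Example 12.32, with p = 0 defined as the geometric mean."""
    if p == 0:
        return math.exp(sum(math.log(x) for x in a) / len(a))
    return (sum(x ** p for x in a) / len(a)) ** (1.0 / p)

a = [0.5, 1.0, 2.0, 8.0]
ps = [-50, -2, -1, 0, 1, 2, 50]
means = [power_mean(a, p) for p in ps]
print(means)

monotone = all(x <= y + 1e-9 for x, y in zip(means, means[1:]))
near_min = abs(power_mean(a, -50) - min(a)) < 0.1    # p -> -inf gives min a_i
near_max = abs(power_mean(a, 50) - max(a)) < 0.5     # p -> +inf gives max a_i
print(monotone, near_min, near_max)
```

The values interpolate the harmonic mean ($p = -1$), geometric mean ($p = 0$), and arithmetic mean ($p = 1$) as special cases.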
Proposition 12.33. Suppose that $0 < p_0 < p_1 \le \infty$, $\lambda \in (0, 1)$ and $p_\lambda \in (p_0, p_1)$ be defined by
\[ \frac{1}{p_\lambda} = \frac{\lambda}{p_0} + \frac{1 - \lambda}{p_1} \tag{12.20} \]
with the interpretation that $(1 - \lambda)/p_1 = 0$ if $p_1 = \infty$.$^5$ Then $L^{p_\lambda} \subset L^{p_0} + L^{p_1}$, i.e. every function $f \in L^{p_\lambda}$ may be written as $f = g + h$ with $g \in L^{p_0}$ and $h \in L^{p_1}$. For $1 \le p_0 < p_1 \le \infty$ and $f \in L^{p_0} + L^{p_1}$ let
\[ \|f\| := \inf \big\{ \|g\|_{p_0} + \|h\|_{p_1} : f = g + h \big\}. \]
Then $(L^{p_0} + L^{p_1}, \|\cdot\|)$ is a Banach space and the inclusion map from $L^{p_\lambda}$ to $L^{p_0} + L^{p_1}$ is bounded; in fact $\|f\| \le 2 \|f\|_{p_\lambda}$ for all $f \in L^{p_\lambda}$.
Proof. Let $M > 0$, then the local singularities of $f$ are contained in the set $E := \{|f| > M\}$ and the behavior of $f$ at infinity is solely determined by $f$ on $E^c$. Hence let $g = f 1_E$ and $h = f 1_{E^c}$ so that $f = g + h$. By our earlier discussion we expect that $g \in L^{p_0}$ and $h \in L^{p_1}$ and this is the case since,
\[ \|g\|_{p_0}^{p_0} = \int |f|^{p_0} 1_{|f| > M} \, d\mu = M^{p_0} \int \Big| \frac{f}{M} \Big|^{p_0} 1_{|f| > M} \, d\mu \le M^{p_0} \int \Big| \frac{f}{M} \Big|^{p_\lambda} 1_{|f| > M} \, d\mu \le M^{p_0 - p_\lambda} \|f\|_{p_\lambda}^{p_\lambda} < \infty \]
and
\[ \|h\|_{p_1}^{p_1} = \big\| f 1_{|f| \le M} \big\|_{p_1}^{p_1} = \int |f|^{p_1} 1_{|f| \le M} \, d\mu = M^{p_1} \int \Big| \frac{f}{M} \Big|^{p_1} 1_{|f| \le M} \, d\mu \le M^{p_1} \int \Big| \frac{f}{M} \Big|^{p_\lambda} 1_{|f| \le M} \, d\mu \le M^{p_1 - p_\lambda} \|f\|_{p_\lambda}^{p_\lambda} < \infty. \]
Moreover this shows
\[ \|f\| \le M^{1 - p_\lambda / p_0} \|f\|_{p_\lambda}^{p_\lambda / p_0} + M^{1 - p_\lambda / p_1} \|f\|_{p_\lambda}^{p_\lambda / p_1}. \]
Taking $M = \delta \|f\|_{p_\lambda}$ then gives
\[ \|f\| \le \big( \delta^{1 - p_\lambda / p_0} + \delta^{1 - p_\lambda / p_1} \big) \|f\|_{p_\lambda} \]
and then taking $\delta = 1$ shows $\|f\| \le 2 \|f\|_{p_\lambda}$. The proof that $(L^{p_0} + L^{p_1}, \|\cdot\|)$ is a Banach space is left as Exercise 12.11 to the reader.

$^5$ A little algebra shows that $\lambda$ may be computed in terms of $p_0$, $p_\lambda$ and $p_1$ by
\[ \lambda = \frac{p_0}{p_\lambda} \cdot \frac{p_1 - p_\lambda}{p_1 - p_0}. \]
Corollary 12.34 (Interpolation of $L^p$ norms). Suppose that $0 < p_0 < p_1 \le \infty$, $\lambda \in (0, 1)$ and $p_\lambda \in (p_0, p_1)$ be defined as in Eq. (12.20), then $L^{p_0} \cap L^{p_1} \subset L^{p_\lambda}$ and
\[ \|f\|_{p_\lambda} \le \|f\|_{p_0}^\lambda \|f\|_{p_1}^{1 - \lambda}. \tag{12.21} \]
Further assume $1 \le p_0 < p_\lambda < p_1 \le \infty$, and for $f \in L^{p_0} \cap L^{p_1}$ let
\[ \|f\| := \|f\|_{p_0} + \|f\|_{p_1}. \]
Then $(L^{p_0} \cap L^{p_1}, \|\cdot\|)$ is a Banach space and the inclusion map of $L^{p_0} \cap L^{p_1}$ into $L^{p_\lambda}$ is bounded, in fact
\[ \|f\|_{p_\lambda} \le \max \big( \lambda^{-1}, (1 - \lambda)^{-1} \big) \big( \|f\|_{p_0} + \|f\|_{p_1} \big). \tag{12.22} \]
The heuristic explanation of this corollary is that if $f \in L^{p_0} \cap L^{p_1}$, then $f$ has local singularities no worse than an $L^{p_1}$ function and behavior at infinity no worse than an $L^{p_0}$ function. Hence $f \in L^{p_\lambda}$ for any $p_\lambda$ between $p_0$ and $p_1$.

Proof. Let $\lambda$ be determined as above, $a = p_0 / \lambda$ and $b = p_1 / (1 - \lambda)$, then by Theorem 12.20,
\[ \|f\|_{p_\lambda} = \big\| |f|^\lambda |f|^{1 - \lambda} \big\|_{p_\lambda} \le \big\| |f|^\lambda \big\|_a \big\| |f|^{1 - \lambda} \big\|_b = \|f\|_{p_0}^\lambda \|f\|_{p_1}^{1 - \lambda}. \]
It is easily checked that $\|\cdot\|$ is a norm on $L^{p_0} \cap L^{p_1}$. To show this space is complete, suppose that $\{f_n\} \subset L^{p_0} \cap L^{p_1}$ is a $\|\cdot\|$-Cauchy sequence. Then $\{f_n\}$ is both $L^{p_0}$ and $L^{p_1}$-Cauchy. Hence there exist $f \in L^{p_0}$ and $g \in L^{p_1}$ such that $\lim_{n \to \infty} \|f - f_n\|_{p_0} = 0$ and $\lim_{n \to \infty} \|g - f_n\|_{p_1} = 0$. By Chebyshev's inequality (Lemma 12.4) $f_n \to f$ and $f_n \to g$ in measure and therefore by Theorem 12.8, $f = g$ a.e. It now is clear that $\lim_{n \to \infty} \|f - f_n\| = 0$. The estimate in Eq. (12.22) is left as Exercise 12.10 to the reader.
Remark 12.35. Combining Proposition 12.33 and Corollary 12.34 gives
\[ L^{p_0} \cap L^{p_1} \subset L^{p_\lambda} \subset L^{p_0} + L^{p_1} \]
for $0 < p_0 < p_1 \le \infty$, $\lambda \in (0, 1)$ and $p_\lambda \in (p_0, p_1)$ as in Eq. (12.20).
Corollary 12.36. Suppose now that $\mu$ is counting measure on $\Omega$. Then $L^p(\mu) \subset L^q(\mu)$ for all $0 < p < q \le \infty$ and $\|f\|_q \le \|f\|_p$.

Proof. Suppose that $0 < p < q = \infty$, then
\[ \|f\|_\infty^p = \sup \{ |f(x)|^p : x \in \Omega \} \le \sum_{x \in \Omega} |f(x)|^p = \|f\|_p^p, \]
i.e. $\|f\|_\infty \le \|f\|_p$ for all $0 < p < \infty$. For $0 < p \le q \le \infty$, apply Corollary 12.34 with $p_0 = p$ and $p_1 = \infty$ to find
\[ \|f\|_q \le \|f\|_p^{p/q} \|f\|_\infty^{1 - p/q} \le \|f\|_p^{p/q} \|f\|_p^{1 - p/q} = \|f\|_p. \]
12.6.1 Summary:
1. $L^{p_0} \cap L^{p_1} \subset L^q \subset L^{p_0} + L^{p_1}$ for any $q \in (p_0, p_1)$.
2. If $p \le q$, then $\ell^p \subset \ell^q$ and $\|f\|_q \le \|f\|_p$.
3. Since $\mu(|f| > \varepsilon) \le \varepsilon^{-p} \|f\|_p^p$, $L^p$-convergence implies $L^0$-convergence.
4. $L^0$-convergence implies almost everywhere convergence for some subsequence.
5. If $\mu(\Omega) < \infty$ then almost everywhere convergence implies uniform convergence off certain sets of small measure and in particular we have $L^0$-convergence.
6. If $\mu(\Omega) < \infty$, then $L^q \subset L^p$ for all $p \le q$ and $L^q$-convergence implies $L^p$-convergence.
12.7 Uniform Integrability
This section will address the question as to what extra conditions are needed in order that an $L^0$-convergent sequence is $L^p$-convergent. This will lead us to the notion of uniform integrability. To simplify matters a bit here, it will be assumed that $(\Omega, \mathcal{B}, \mu)$ is a finite measure space for this section.
Notation 12.37 For $f \in L^1(\mu)$ and $E \in \mathcal{B}$, let
\[ \mu(f : E) := \int_E f \, d\mu \]
and more generally if $A, B \in \mathcal{B}$ let
\[ \mu(f : A, B) := \int_{A \cap B} f \, d\mu. \]
When $\mu$ is a probability measure, we will often write $\mathbb{E}[f : E]$ for $\mu(f : E)$ and $\mathbb{E}[f : A, B]$ for $\mu(f : A, B)$.

Definition 12.38. A collection of functions, $\Lambda \subset L^1(\mu)$, is said to be uniformly integrable if,
\[ \lim_{a \to \infty} \sup_{f \in \Lambda} \mu(|f| : |f| \ge a) = 0. \tag{12.23} \]
In words, $\Lambda \subset L^1(\mu)$ is uniformly integrable if tail expectations can be made uniformly small.
The condition in Eq. (12.23) implies $\sup_{f \in \Lambda} \|f\|_1 < \infty$.$^6$ Indeed, choose $a$ sufficiently large so that $\sup_{f \in \Lambda} \mu(|f| : |f| \ge a) \le 1$, then for $f \in \Lambda$
\[ \|f\|_1 = \mu(|f| : |f| \ge a) + \mu(|f| : |f| < a) \le 1 + a \mu(\Omega). \]
Example 12.39. If $\Lambda = \{f\}$ with $f \in L^1(\mu)$, then $\Lambda$ is uniformly integrable. Indeed, $\lim_{a \to \infty} \mu(|f| : |f| \ge a) = 0$ by the dominated convergence theorem.
Exercise 12.5. Suppose $A$ is an index set, $\{f_\alpha\}_{\alpha \in A}$ and $\{g_\alpha\}_{\alpha \in A}$ are two collections of random variables. If $\{g_\alpha\}_{\alpha \in A}$ is uniformly integrable and $|f_\alpha| \le |g_\alpha|$ for all $\alpha \in A$, show $\{f_\alpha\}_{\alpha \in A}$ is uniformly integrable as well.

Solution to Exercise (12.5). For $a > 0$ we have
\[ \mathbb{E}[|f_\alpha| : |f_\alpha| \ge a] \le \mathbb{E}[|g_\alpha| : |f_\alpha| \ge a] \le \mathbb{E}[|g_\alpha| : |g_\alpha| \ge a]. \]
Therefore,
\[ \lim_{a \to \infty} \sup_\alpha \mathbb{E}[|f_\alpha| : |f_\alpha| \ge a] \le \lim_{a \to \infty} \sup_\alpha \mathbb{E}[|g_\alpha| : |g_\alpha| \ge a] = 0. \]
Definition 12.40. A collection of functions, $\Lambda \subset L^1(\mu)$, is said to be uniformly absolutely continuous if for all $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ \sup_{f \in \Lambda} \mu(|f| : E) < \varepsilon \text{ whenever } \mu(E) < \delta. \tag{12.24} \]
Equivalently put,
\[ \lim_{\delta \to 0} \sup \{ \mu(|f| : E) : f \in \Lambda \text{ and } \mu(E) < \delta \} = 0. \tag{12.25} \]
Remark 12.41. It is not in general true that $\{f_n\} \subset L^1(\mu)$ being uniformly absolutely continuous implies $\sup_n \|f_n\|_1 < \infty$. For example take $\Omega = \{*\}$ and $\mu(\{*\}) = 1$. Let $f_n(*) = n$. Since for $\delta < 1$ a set $E \subset \Omega$ such that $\mu(E) < \delta$ is in fact the empty set, $\{f_n\}_{n=1}^\infty$ is uniformly absolutely continuous. However, for finite measure spaces without atoms, for every $\delta > 0$ we may find a finite partition of $\Omega$ by sets $\{E_\ell\}_{\ell=1}^k$ with $\mu(E_\ell) < \delta$. If Eq. (12.24) holds with $\varepsilon = 1$, then
\[ \mu(|f_n|) = \sum_{\ell=1}^k \mu(|f_n| : E_\ell) \le k \]
showing that $\mu(|f_n|) \le k$ for all $n$.

$^6$ This is not necessarily the case if $\mu(\Omega) = \infty$. Indeed, if $\Omega = \mathbb{R}$ and $\mu = m$ is Lebesgue measure, the sequence of functions, $\big\{ f_n := 1_{[-n,n]} \big\}_{n=1}^\infty$, is uniformly integrable but not bounded in $L^1(m)$.
Proposition 12.42. A subset $\Lambda \subset L^1(\mu)$ is uniformly integrable iff $\Lambda \subset L^1(\mu)$ is bounded and uniformly absolutely continuous.

Proof. ($\Longrightarrow$) We have already seen that uniformly integrable subsets, $\Lambda$, are bounded in $L^1(\mu)$. Moreover, for $f \in \Lambda$ and $E \in \mathcal{B}$,
\[ \mu(|f| : E) = \mu(|f| : |f| \ge M, E) + \mu(|f| : |f| < M, E) \le \mu(|f| : |f| \ge M) + M \mu(E). \]
Therefore,
\[ \lim_{\delta \to 0} \sup \{ \mu(|f| : E) : f \in \Lambda \text{ and } \mu(E) < \delta \} \le \sup_{f \in \Lambda} \mu(|f| : |f| \ge M) \to 0 \text{ as } M \to \infty, \]
which verifies that $\Lambda$ is uniformly absolutely continuous.
($\Longleftarrow$) Let $K := \sup_{f \in \Lambda} \|f\|_1 < \infty$. Then for $f \in \Lambda$, we have
\[ \mu(|f| \ge a) \le \|f\|_1 / a \le K / a \text{ for all } a > 0. \]
Hence given $\varepsilon > 0$ and $\delta > 0$ as in the definition of uniform absolute continuity, we may choose $a = K / \delta$ in which case
\[ \sup_{f \in \Lambda} \mu(|f| : |f| \ge a) < \varepsilon. \]
Since $\varepsilon > 0$ was arbitrary, it follows that $\lim_{a \to \infty} \sup_{f \in \Lambda} \mu(|f| : |f| \ge a) = 0$ as desired.
Corollary 12.43. Suppose $\{f_\alpha\}_{\alpha \in A}$ and $\{g_\alpha\}_{\alpha \in A}$ are two uniformly integrable collections of functions, then $\{f_\alpha + g_\alpha\}_{\alpha \in A}$ is also uniformly integrable.

Proof. By Proposition 12.42, $\{f_\alpha\}_{\alpha \in A}$ and $\{g_\alpha\}_{\alpha \in A}$ are both bounded in $L^1(\mu)$ and are both uniformly absolutely continuous. Since $\|f_\alpha + g_\alpha\|_1 \le \|f_\alpha\|_1 + \|g_\alpha\|_1$ it follows that $\{f_\alpha + g_\alpha\}_{\alpha \in A}$ is bounded in $L^1(\mu)$ as well. Moreover, for $\varepsilon > 0$ we may choose $\delta > 0$ such that $\mu(|f_\alpha| : E) < \varepsilon$ and $\mu(|g_\alpha| : E) < \varepsilon$ whenever $\mu(E) < \delta$. For this choice of $\varepsilon$ and $\delta$, we then have
\[ \mu(|f_\alpha + g_\alpha| : E) \le \mu(|f_\alpha| + |g_\alpha| : E) < 2\varepsilon \text{ whenever } \mu(E) < \delta, \]
showing $\{f_\alpha + g_\alpha\}_{\alpha \in A}$ is uniformly absolutely continuous. Another application of Proposition 12.42 completes the proof.
Exercise 12.6 (Problem 5 on p. 196 of Resnick.). Suppose that $\{X_n\}_{n=1}^\infty$ is a sequence of integrable and i.i.d. random variables. Then $\left\{ \frac{S_n}{n} \right\}_{n=1}^\infty$ is uniformly integrable.
Theorem 12.44 (Vitali Convergence Theorem). Let $(\Omega, \mathcal{B}, \mu)$ be a finite measure space, $\Lambda := \{f_n\}_{n=1}^\infty$ be a sequence of functions in $L^1(\mu)$, and $f : \Omega \to \mathbb{C}$ be a measurable function. Then $f \in L^1(\mu)$ and $\|f - f_n\|_1 \to 0$ as $n \to \infty$ iff $f_n \to f$ in measure and $\Lambda$ is uniformly integrable.

Proof. ($\Longrightarrow$) If $f_n \to f$ in $L^1(\mu)$, then by Chebyshev's inequality it follows that $f_n \to f$ in measure. Given $\varepsilon > 0$ we may choose $N = N_\varepsilon \in \mathbb{N}$ such that $\|f - f_n\|_1 \le \varepsilon / 2$ for $n \ge N_\varepsilon$. Since convergent sequences are bounded, we have $K := \sup_n \|f_n\|_1 < \infty$ and $\mu(|f_n| \ge a) \le K / a$ for all $a > 0$. Applying Proposition 12.42 with $\Lambda = \{f\}$, for any $a$ sufficiently large we will have $\sup_n \mu(|f| : |f_n| \ge a) \le \varepsilon / 2$. Thus for $a$ sufficiently large and $n \ge N$, it follows that
\[ \mu(|f_n| : |f_n| \ge a) \le \mu(|f - f_n| : |f_n| \ge a) + \mu(|f| : |f_n| \ge a) \le \|f - f_n\|_1 + \mu(|f| : |f_n| \ge a) \le \varepsilon / 2 + \varepsilon / 2 = \varepsilon. \]
By Example 12.39 we also know that $\limsup_{a \to \infty} \max_{n < N} \mu(|f_n| : |f_n| \ge a) = 0$ for any finite $N$. Therefore we have shown,
\[ \limsup_{a \to \infty} \sup_n \mu(|f_n| : |f_n| \ge a) \le \varepsilon \]
and as $\varepsilon > 0$ was arbitrary it follows that $\{f_n\}_{n=1}^\infty$ is uniformly integrable.
($\Longleftarrow$) If $f_n \to f$ in measure and $\Lambda = \{f_n\}_{n=1}^\infty$ is uniformly integrable then we know $M := \sup_n \|f_n\|_1 < \infty$. Hence an application of Fatou's lemma, see Exercise 12.1, gives
\[ \int_\Omega |f| \, d\mu \le \liminf_{n \to \infty} \int_\Omega |f_n| \, d\mu \le M < \infty, \]
i.e. $f \in L^1(\mu)$. It then follows by Example 12.39 and Corollary 12.43 that $\Lambda_0 := \{f - f_n\}_{n=1}^\infty$ is uniformly integrable.
Therefore,
\[ \|f - f_n\|_1 = \mu(|f - f_n| : |f - f_n| \ge a) + \mu(|f - f_n| : |f - f_n| < a) \le \varepsilon(a) + \int_\Omega 1_{|f - f_n| < a} |f - f_n| \, d\mu \tag{12.26} \]
where
\[ \varepsilon(a) := \sup_m \mu(|f - f_m| : |f - f_m| \ge a) \to 0 \text{ as } a \to \infty. \]
Since $1_{|f - f_n| < a} |f - f_n| \le a \in L^1(\mu)$ and $1_{|f - f_n| < a} |f - f_n| \xrightarrow{\mu} 0$ because
\[ \mu\big( 1_{|f - f_n| < a} |f - f_n| > \varepsilon \big) \le \mu(|f - f_n| > \varepsilon) \to 0 \text{ as } n \to \infty, \]
we may pass to the limit in Eq. (12.26), with the aid of the dominated convergence theorem (see Corollary 12.9), to find
\[ \limsup_{n \to \infty} \|f - f_n\|_1 \le \varepsilon(a) \to 0 \text{ as } a \to \infty. \]
Example 12.45. Let $\Omega = [0,1]$, $\mathcal{B} = \mathcal{B}_{[0,1]}$, and $P = m$ be Lebesgue measure on $\mathcal{B}$. Then the collection of functions $f_\varepsilon := \frac{1}{\varepsilon} 1_{[0,\varepsilon]}$ for $\varepsilon \in (0,1)$ is bounded in $L^1(P)$ and $f_\varepsilon \to 0$ a.e. as $\varepsilon \downarrow 0$, but
\[
0 = \int_\Omega \lim_{\varepsilon \downarrow 0} f_\varepsilon \, dP \neq \lim_{\varepsilon \downarrow 0} \int_\Omega f_\varepsilon \, dP = 1.
\]
This is a typical example of a bounded and pointwise convergent sequence in $L^1$ which is not uniformly integrable. This is easy to check directly as well since
\[
\sup_{\varepsilon \in (0,1)} m(|f_\varepsilon| : |f_\varepsilon| \geq a) = 1 \text{ for all } a > 0.
\]
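As a companion to this example, here is a small numerical sketch (ours, not part of the notes): the tail expectation $\mathbb{E}[f_\varepsilon : f_\varepsilon \geq a]$ equals $1$ for every cutoff $a$ as soon as $\varepsilon \leq 1/a$, so the sup over $\varepsilon$ never decays.

```python
# Sketch (ours): tail expectations of f_eps = (1/eps) * 1_[0,eps] under
# Lebesgue measure on [0,1].  E[f_eps : f_eps >= a] = (1/eps)*eps = 1
# whenever 1/eps >= a, so the sup over eps never decays as the cutoff a grows.

def tail_expectation(eps: float, a: float) -> float:
    """E[f_eps ; f_eps >= a] for f_eps = (1/eps) 1_[0,eps]."""
    height = 1.0 / eps
    return height * eps if height >= a else 0.0   # all mass sits at height 1/eps

for a in [10.0, 100.0, 1000.0]:
    sup_tail = max(tail_expectation(10.0 ** -k, a) for k in range(1, 8))
    print(a, sup_tail)   # the sup stays near 1 for every cutoff a
```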
Example 12.46. Let $\Omega = [0,1]$, $P$ be Lebesgue measure on $\mathcal{B} = \mathcal{B}_{[0,1]}$, and for $\varepsilon \in (0,1)$ let $a_\varepsilon > 0$ with $\lim_{\varepsilon \downarrow 0} a_\varepsilon = \infty$ and let $f_\varepsilon := a_\varepsilon 1_{[0,\varepsilon]}$. Then $\mathbb{E} f_\varepsilon = \varepsilon a_\varepsilon$ and so $\sup_{\varepsilon > 0} \|f_\varepsilon\|_1 =: K < \infty$ iff $\varepsilon a_\varepsilon \leq K$ for all $\varepsilon$. Since
\[
\sup_\varepsilon \mathbb{E}[f_\varepsilon : f_\varepsilon \geq M] = \sup_\varepsilon \left[\varepsilon a_\varepsilon 1_{a_\varepsilon \geq M}\right],
\]
if $\{f_\varepsilon\}$ is uniformly integrable and $\delta > 0$ is given, for large $M$ we have $\varepsilon a_\varepsilon \leq \delta$ for $\varepsilon$ small enough so that $a_\varepsilon \geq M$. From this we conclude that $\limsup_{\varepsilon \downarrow 0} (\varepsilon a_\varepsilon) \leq \delta$, and since $\delta > 0$ was arbitrary, $\lim_{\varepsilon \downarrow 0} \varepsilon a_\varepsilon = 0$ if $\{f_\varepsilon\}$ is uniformly integrable. By reversing these steps one sees the converse is also true.

Alternatively. No matter how $a_\varepsilon > 0$ is chosen, $\lim_{\varepsilon \downarrow 0} f_\varepsilon = 0$ a.s. So from Theorem 12.44, if $\{f_\varepsilon\}$ is uniformly integrable we would have to have
\[
\lim_{\varepsilon \downarrow 0} (\varepsilon a_\varepsilon) = \lim_{\varepsilon \downarrow 0} \mathbb{E} f_\varepsilon = \mathbb{E} 0 = 0.
\]
Corollary 12.47. Let $(\Omega,\mathcal{B},\mu)$ be a finite measure space, $p \in [1,\infty)$, $\{f_n\}_{n=1}^\infty$ be a sequence of functions in $L^p(\mu)$, and $f : \Omega \to \mathbb{C}$ be a measurable function. Then $f \in L^p(\mu)$ and $\|f - f_n\|_p \to 0$ as $n \to \infty$ iff $f_n \to f$ in measure and $\Lambda := \{|f_n|^p\}_{n=1}^\infty$ is uniformly integrable.
Proof. ($\Longleftarrow$) Suppose that $f_n \to f$ in measure and $\Lambda := \{|f_n|^p\}_{n=1}^\infty$ is uniformly integrable. By Corollary 12.12, $|f_n|^p \to |f|^p$ in measure and $h_n := |f - f_n|^p \to 0$ in measure, and by Theorem 12.44, $|f|^p \in L^1(\mu)$ and $|f_n|^p \to |f|^p$ in $L^1(\mu)$. It now follows by an application of Lemma 12.10 that $\|f - f_n\|_p \to 0$ as $n \to \infty$.

($\Longrightarrow$) Suppose $f \in L^p$ and $f_n \to f$ in $L^p$. Again $f_n \to f$ in measure by Lemma 12.4.
Page: 189 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
Let
\[
h_n := \left| |f_n|^p - |f|^p \right| \leq |f_n|^p + |f|^p =: g_n \in L^1
\]
and $g := 2|f|^p \in L^1$. Then $g_n \to g$ and $h_n \to 0$ in measure, and $\int g_n \, d\mu \to \int g \, d\mu$. Therefore, by the dominated convergence theorem in Corollary 12.9, $\lim_{n \to \infty} \int h_n \, d\mu = 0$, i.e. $|f_n|^p \to |f|^p$ in $L^1(\mu)$.$^7$ Hence it follows from Theorem 12.44 that $\Lambda$ is uniformly integrable.
The following lemma gives concrete necessary and sufficient conditions for verifying that a sequence of functions is uniformly integrable.
Lemma 12.48. Suppose that $\mu(\Omega) < \infty$ and $\Lambda \subset L^0(\mu)$ is a collection of functions.

1. If there exists a measurable function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\lim_{x \to \infty} \varphi(x)/x = \infty$ and
\[
K := \sup_{f \in \Lambda} \mu(\varphi(|f|)) < \infty, \tag{12.27}
\]
then $\Lambda$ is uniformly integrable. (A typical example for $\varphi$ in item 1. is $\varphi(x) = x^p$ for some $p > 1$.)
2. *(Skip this if you like.) Conversely if $\Lambda$ is uniformly integrable, there exists a non-decreasing continuous function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\varphi(0) = 0$, $\lim_{x \to \infty} \varphi(x)/x = \infty$, and Eq. (12.27) is valid.
Proof. 1. Let $\varphi$ be as in item 1. above and set $\varepsilon_a := \sup_{x \geq a} \frac{x}{\varphi(x)} \to 0$ as $a \to \infty$ by assumption. Then for $f \in \Lambda$,
\[
\mu(|f| : |f| \geq a) = \mu\left(\frac{|f|}{\varphi(|f|)} \varphi(|f|) : |f| \geq a\right)
\leq \varepsilon_a \, \mu(\varphi(|f|) : |f| \geq a) \leq \varepsilon_a \, \mu(\varphi(|f|)) \leq K \varepsilon_a.
\]
$^7$ Here is an alternative proof. By the mean value theorem,
\[
\left||f|^p - |f_n|^p\right| \leq p (\max(|f|, |f_n|))^{p-1} \left||f| - |f_n|\right| \leq p (|f| + |f_n|)^{p-1} \left||f| - |f_n|\right|
\]
and therefore by H\"older's inequality,
\begin{align*}
\int \left||f|^p - |f_n|^p\right| d\mu &\leq p \int (|f| + |f_n|)^{p-1} \left||f| - |f_n|\right| d\mu \leq p \int (|f| + |f_n|)^{p-1} |f - f_n| \, d\mu \\
&\leq p \|f - f_n\|_p \left\|(|f| + |f_n|)^{p-1}\right\|_q = p \left\||f| + |f_n|\right\|_p^{p/q} \|f - f_n\|_p \\
&\leq p \left(\|f\|_p + \|f_n\|_p\right)^{p/q} \|f - f_n\|_p,
\end{align*}
where $q := p/(p-1)$. This shows that $\int \left||f|^p - |f_n|^p\right| d\mu \to 0$ as $n \to \infty$.
Hence
\[
\lim_{a \to \infty} \sup_{f \in \Lambda} \mu\left(|f| \, 1_{|f| \geq a}\right) \leq \lim_{a \to \infty} K \varepsilon_a = 0.
\]
2. *(Skip this if you like.) By assumption, $\varepsilon_a := \sup_{f \in \Lambda} \mu\left(|f| \, 1_{|f| \geq a}\right) \to 0$ as $a \to \infty$. Therefore we may choose $a_n \uparrow \infty$ such that
\[
\sum_{n=0}^\infty (n+1) \, \varepsilon_{a_n} < \infty,
\]
where by convention $a_0 := 0$. Now define $\varphi$ so that $\varphi(0) = 0$ and
\[
\varphi'(x) = \sum_{n=0}^\infty (n+1) \, 1_{(a_n, a_{n+1}]}(x),
\]
i.e.
\[
\varphi(x) = \int_0^x \varphi'(y) \, dy = \sum_{n=0}^\infty (n+1) \left(x \wedge a_{n+1} - x \wedge a_n\right).
\]
By construction $\varphi$ is continuous, $\varphi(0) = 0$, $\varphi'(x)$ is increasing (so $\varphi$ is convex) and $\varphi'(x) \geq (n+1)$ for $x \geq a_n$. In particular
\[
\frac{\varphi(x)}{x} \geq \frac{\varphi(a_n) + (n+1)(x - a_n)}{x} \to n + 1 \text{ as } x \to \infty,
\]
from which we conclude $\lim_{x \to \infty} \varphi(x)/x = \infty$. We also have $\varphi'(x) \leq (n+1)$ on $[0, a_{n+1}]$ and therefore
\[
\varphi(x) \leq (n+1) x \text{ for } x \leq a_{n+1}.
\]
So for $f \in \Lambda$,
\begin{align*}
\mu(\varphi(|f|)) &= \sum_{n=0}^\infty \mu\left(\varphi(|f|) \, 1_{(a_n, a_{n+1}]}(|f|)\right) \leq \sum_{n=0}^\infty (n+1) \, \mu\left(|f| \, 1_{(a_n, a_{n+1}]}(|f|)\right) \\
&\leq \sum_{n=0}^\infty (n+1) \, \mu\left(|f| \, 1_{|f| \geq a_n}\right) \leq \sum_{n=0}^\infty (n+1) \, \varepsilon_{a_n}
\end{align*}
and hence
\[
\sup_{f \in \Lambda} \mu(\varphi(|f|)) \leq \sum_{n=0}^\infty (n+1) \, \varepsilon_{a_n} < \infty. \qquad \blacksquare
\]
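The criterion of item 1. can be made tangible with $\varphi(x) = x^2$, for which $\varepsilon_a = \sup_{x \geq a} x/x^2 = 1/a$; a minimal numerical sketch (ours, with an assumed test family $f_n = \sqrt{n}\,1_{[0,1/n]}$, which is $L^2$-bounded with $K = 1$):

```python
# Numerical sketch (ours) of item 1 with phi(x) = x^2: if K := sup E[f^2] is
# finite then E[|f| : |f| >= a] <= eps_a * K, where eps_a = 1/a.
# Test family on ([0,1], m): f_n = sqrt(n) * 1_[0, 1/n], with E[f_n^2] = 1.

import math

def second_moment(n: int) -> float:
    return n * (1.0 / n)                    # (sqrt(n))^2 on a set of measure 1/n

def tail_expectation(n: int, a: float) -> float:
    h = math.sqrt(n)
    return h * (1.0 / n) if h >= a else 0.0

K = max(second_moment(n) for n in range(1, 1001))
for a in [2.0, 5.0, 10.0]:
    worst = max(tail_expectation(n, a) for n in range(1, 1001))
    assert worst <= K / a + 1e-12           # the uniform tail bound K * eps_a
    print(a, worst)
```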
Exercise 12.7. Show directly that if $\mu(\Omega) < \infty$, $\varphi$ is as in Lemma 12.48, and $f_n \in L^1(\mu)$ are such that $f_n \to f$ in measure and $K := \sup_n \mathbb{E}[\varphi(|f_n|)] < \infty$, then $\|f - f_n\|_1 \to 0$ as $n \to \infty$.
Solution to Exercise (12.7). Letting $\varepsilon_a := \sup_{x \geq a} \frac{x}{\varphi(x)}$ as above, we have $x \leq \varepsilon_a \varphi(x)$ for $x \geq a$. Therefore,
\begin{align*}
\mathbb{E}|f_n| &= \mathbb{E}[|f_n| : |f_n| \geq a] + \mathbb{E}[|f_n| : |f_n| < a] \\
&\leq \varepsilon_a \mathbb{E}[\varphi(|f_n|) : |f_n| \geq a] + a \leq \varepsilon_a \mathbb{E}[\varphi(|f_n|)] + a = a + \varepsilon_a K,
\end{align*}
from which it follows that $\sup_n \mathbb{E}|f_n| < \infty$. Hence by Fatou's lemma,
\[
\mathbb{E}|f| \leq \liminf_{n \to \infty} \mathbb{E}|f_n| \leq \sup_n \mathbb{E}|f_n| < \infty.
\]
Similarly,
\[
\mathbb{E}[|f - f_n| : |f_n| \geq a] \leq \mathbb{E}[|f| : |f_n| \geq a] + \mathbb{E}[|f_n| : |f_n| \geq a]
\leq \mathbb{E}[|f| : |f_n| \geq a] + \varepsilon_a K,
\]
and for $b > 0$,
\begin{align*}
\mathbb{E}[|f| : |f_n| \geq a] &= \mathbb{E}[|f| : |f| \geq b, \, |f_n| \geq a] + \mathbb{E}[|f| : |f| < b, \, |f_n| \geq a] \\
&\leq \mathbb{E}[|f| : |f| \geq b] + b \, P(|f_n| \geq a) \leq \mathbb{E}[|f| : |f| \geq b] + \frac{b}{a} \sup_n \mathbb{E}|f_n|.
\end{align*}
Therefore,
\[
\limsup_{a \to \infty} \mathbb{E}[|f - f_n| : |f_n| \geq a] \leq \limsup_{a \to \infty} \left(\mathbb{E}[|f| : |f_n| \geq a] + \varepsilon_a K\right) \leq \mathbb{E}[|f| : |f| \geq b].
\]
Now by the DCT, $\lim_{n \to \infty} \mathbb{E}[|f - f_n| : |f_n| < a] = 0$ and hence
\begin{align*}
\limsup_{n \to \infty} \mathbb{E}[|f - f_n|] &\leq \limsup_{n \to \infty} \mathbb{E}[|f - f_n| : |f_n| \geq a] + \limsup_{n \to \infty} \mathbb{E}[|f - f_n| : |f_n| < a] \\
&\leq \mathbb{E}[|f| : |f| \geq b].
\end{align*}
Another application of the DCT now shows $\mathbb{E}[|f| : |f| \geq b] \to 0$ as $b \to \infty$, which completes the proof.
12.8 Exercises
Exercise 12.8. Let $f \in L^p \cap L^\infty$ for some $p < \infty$. Show $\|f\|_\infty = \lim_{q \to \infty} \|f\|_q$. If we further assume $\mu(X) < \infty$, show $\|f\|_\infty = \lim_{q \to \infty} \|f\|_q$ for all measurable functions $f : X \to \mathbb{C}$. In particular, $f \in L^\infty$ iff $\lim_{q \to \infty} \|f\|_q < \infty$. Hints: Use Corollary 12.34 to show $\limsup_{q \to \infty} \|f\|_q \leq \|f\|_\infty$, and to show $\liminf_{q \to \infty} \|f\|_q \geq \|f\|_\infty$, let $M < \|f\|_\infty$ and make use of Chebyshev's inequality.
Exercise 12.9. Let $\infty > a, b > 1$ with $a^{-1} + b^{-1} = 1$. Give a calculus proof of the inequality
\[
st \leq \frac{s^a}{a} + \frac{t^b}{b} \text{ for all } s, t \geq 0.
\]
Hint: by taking $s = x t^{b/a}$, show that it suffices to prove
\[
x \leq \frac{x^a}{a} + \frac{1}{b} \text{ for all } x \geq 0,
\]
and then maximize the function $f(x) = x - x^a / a$ for $x \in [0, \infty)$.
Exercise 12.10. Prove Eq. (12.22) in Corollary 12.34. (Part of Folland 6.3 on p. 186.) Hint: Use the inequality, with $a, b \geq 1$ with $a^{-1} + b^{-1} = 1$ chosen appropriately,
\[
st \leq \frac{s^a}{a} + \frac{t^b}{b},
\]
applied to the right side of Eq. (12.21).
Exercise 12.11. Complete the proof of Proposition 12.33 by showing $(L^p + L^r, \|\cdot\|)$ is a Banach space.
Exercise 12.12. Let $(\Omega, \mathcal{B}, P)$ be a probability space. Show directly that for any $g \in L^1(P)$, $\Lambda = \{g\}$ is uniformly absolutely continuous. (We already know this is true by combining Example 12.39 with Proposition 12.42.)
Solution to Exercise (12.12). First Proof. If the statement were false, there would exist $\varepsilon > 0$ and sets $E_n$ such that $P(E_n) \to 0$ while $P(|g| : E_n) \geq \varepsilon$ for all $n$. Since $|1_{E_n} g| \leq |g| \in L^1$ and, for any $\delta > 0$, $P(1_{E_n} |g| > \delta) \leq P(E_n) \to 0$ as $n \to \infty$ so that $1_{E_n} |g| \to 0$ in measure, the dominated convergence theorem of Corollary 12.9 implies $\lim_{n \to \infty} P(|g| : E_n) = 0$. This contradicts $P(|g| : E_n) \geq \varepsilon$ for all $n$ and the proof is complete.
Second Proof. Let $\varphi = \sum_{i=1}^n c_i 1_{B_i}$ be a simple function such that $\|g - \varphi\|_1 < \varepsilon/2$. Then
\begin{align*}
P(|g| : E) &\leq P(|\varphi| : E) + P(|g - \varphi| : E) \\
&\leq \sum_{i=1}^n |c_i| \, P(E \cap B_i) + \|g - \varphi\|_1 \leq \left(\sum_{i=1}^n |c_i|\right) P(E) + \varepsilon/2.
\end{align*}
This shows $P(|g| : E) < \varepsilon$ provided that $P(E) < \frac{\varepsilon}{2} \left(\sum_{i=1}^n |c_i|\right)^{-1}$.
Exercise 12.13. Suppose that $(\Omega, \mathcal{B}, P)$ is a probability space and $\{X_n\}_{n=1}^\infty$ is a sequence of uncorrelated (i.e. $\mathrm{Cov}(X_n, X_m) = 0$ if $m \neq n$) square integrable random variables such that $\mu = \mathbb{E}X_n$ and $\sigma^2 = \mathrm{Var}(X_n)$ for all $n$. Let $S_n := X_1 + \dots + X_n$. Show
\[
\left\|\frac{S_n}{n} - \mu\right\|_2^2 = \frac{\sigma^2}{n} \to 0 \text{ as } n \to \infty.
\]
Solution to Exercise (12.13). To say that the $\{X_n\}_{n=1}^\infty$ are uncorrelated is equivalent to saying that $\{X_n - \mu\}_{n=1}^\infty$ is an orthogonal set. Thus by the Pythagorean theorem,
\begin{align*}
\left\|\frac{S_n}{n} - \mu\right\|_2^2 &= \frac{1}{n^2} \|S_n - n\mu\|_2^2 = \frac{1}{n^2} \|(X_1 - \mu) + \dots + (X_n - \mu)\|_2^2 \\
&= \frac{1}{n^2} \sum_{i=1}^n \|X_i - \mu\|_2^2 = \frac{n \sigma^2}{n^2} \to 0 \text{ as } n \to \infty.
\end{align*}
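The identity can be confirmed on a toy finite model (our construction, not from the notes): the binary digits of a uniform point of $\{0, \dots, 2^n - 1\}$ give $n$ independent $\pm 1$ variables with $\mu = 0$ and $\sigma^2 = 1$, so the mean-square error can be computed exactly by enumeration.

```python
# Finite sanity check (ours): the bits of omega ~ Uniform{0,...,2^n - 1} give
# independent +/-1 variables X_i with mean 0 and variance 1, so the exact
# brute-force value of E[(S_n/n - 0)^2] should equal sigma^2 / n = 1/n.

def mean_square_error(n: int) -> float:
    total = 0.0
    for omega in range(2 ** n):
        s = sum(1 if (omega >> i) & 1 else -1 for i in range(n))  # S_n(omega)
        total += (s / n) ** 2
    return total / 2 ** n

for n in [1, 2, 4, 8]:
    print(n, mean_square_error(n))   # equals 1/n in each case
```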
Exercise 12.14. Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. integrable random variables, $S_n := X_1 + \dots + X_n$, and $\mu := \mathbb{E}X_n$. Show $\frac{S_n}{n} \to \mu$ in $L^1(P)$ as $n \to \infty$. (Incidentally, this shows that $\left\{\frac{S_n}{n}\right\}_{n=1}^\infty$ is U.I.) Hint: for $M \in (0, \infty)$, let $X_i^M := X_i 1_{|X_i| \leq M}$ and $S_n^M := X_1^M + \dots + X_n^M$, and use Exercise 12.13 to see that
\[
\frac{S_n^M}{n} \to \mathbb{E}X_1^M \text{ in } L^2(P) \subset L^1(P) \text{ for all } M.
\]
Use this to show $\lim_{n \to \infty} \left\|\frac{S_n}{n} - \mathbb{E}X_1\right\|_1 = 0$ by getting good control on $\left\|\frac{S_n}{n} - \frac{S_n^M}{n}\right\|_1$ and $\left|\mathbb{E}X_n - \mathbb{E}X_n^M\right|$.
Exercise 12.15. Suppose $1 \leq p < \infty$, $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables such that $\mathbb{E}|X_n|^p < \infty$, $S_n := X_1 + \dots + X_n$, and $\mu := \mathbb{E}X_n$. Show $\frac{S_n}{n} \to \mu$ in $L^p(P)$ as $n \to \infty$. Hint: show $\left\{\left|\frac{S_n}{n} - \mu\right|^p\right\}_{n=1}^\infty$ is U.I.; this is not meant to be hard!
12.9 Appendix: Convex Functions
Reference: see the appendix (page 500) of Revuz and Yor.
Definition 12.49. Given any function $\varphi : (a,b) \to \mathbb{R}$, we say that $\varphi$ is convex if for all $a < x_0 \leq x_1 < b$ and $t \in [0,1]$,
\[
\varphi(x_t) \leq h_t := (1-t)\varphi(x_0) + t\varphi(x_1), \tag{12.28}
\]
where
\[
x_t := x_0 + t(x_1 - x_0) = (1-t)x_0 + t x_1, \tag{12.29}
\]
see Figure 12.3 below.
Fig. 12.3. A convex function along with three cords corresponding to $x_0 = -5$ and $x_1 = -2$, $x_0 = -2$ and $x_1 = 5/2$, and $x_0 = -5$ and $x_1 = 5/2$, with slopes $m_1 = -15/3$, $m_2 = 15/6$ and $m_3 = -1/2$ respectively. Notice that $m_1 \leq m_3 \leq m_2$.
Lemma 12.50. Let $\varphi : (a,b) \to \mathbb{R}$ be a function and
\[
F(x_0, x_1) := \frac{\varphi(x_1) - \varphi(x_0)}{x_1 - x_0} \text{ for } a < x_0 < x_1 < b.
\]
Then the following are equivalent;

1. $\varphi$ is convex,
2. $F(x_0, x_1)$ is non-decreasing in $x_0$ for all $a < x_0 < x_1 < b$, and
3. $F(x_0, x_1)$ is non-decreasing in $x_1$ for all $a < x_0 < x_1 < b$.
Fig. 12.4. A convex function with three cords. Notice the slope relationships; $m_1 \leq m_3 \leq m_2$.

Proof. Let $x_t$ and $h_t$ be as in Eq. (12.28); then $(x_t, h_t)$ is on the line segment joining $(x_0, \varphi(x_0))$ to $(x_1, \varphi(x_1))$, and the statement that $\varphi$ is convex is then equivalent to the assertion that $\varphi(x_t) \leq h_t$ for all $0 \leq t \leq 1$. Since $(x_t, h_t)$ lies on a straight line, we always have the following three slopes equal;
\[
\frac{h_t - \varphi(x_0)}{x_t - x_0} = \frac{\varphi(x_1) - \varphi(x_0)}{x_1 - x_0} = \frac{\varphi(x_1) - h_t}{x_1 - x_t}.
\]
In light of this identity, it is now clear that the convexity of $\varphi$ is equivalent to either
\[
F(x_0, x_t) = \frac{\varphi(x_t) - \varphi(x_0)}{x_t - x_0} \leq \frac{h_t - \varphi(x_0)}{x_t - x_0} = \frac{\varphi(x_1) - \varphi(x_0)}{x_1 - x_0} = F(x_0, x_1)
\]
or
\[
F(x_0, x_1) = \frac{\varphi(x_1) - \varphi(x_0)}{x_1 - x_0} = \frac{\varphi(x_1) - h_t}{x_1 - x_t} \leq \frac{\varphi(x_1) - \varphi(x_t)}{x_1 - x_t} = F(x_t, x_1)
\]
holding for all $x_0 < x_t < x_1$.
Lemma 12.51 (A generalized FTC). If $\varphi \in PC^1((a,b) \to \mathbb{R})$,$^8$ then for all $a < x < y < b$,
\[
\varphi(y) - \varphi(x) = \int_x^y \varphi'(t) \, dt.
\]
Proof. Let $b_1, \dots, b_{l-1}$ be the points of non-differentiability of $\varphi$ in $(x,y)$ and set $b_0 = x$ and $b_l = y$. Then
\[
\varphi(y) - \varphi(x) = \sum_{k=1}^l \left[\varphi(b_k) - \varphi(b_{k-1})\right] = \sum_{k=1}^l \int_{b_{k-1}}^{b_k} \varphi'(t) \, dt = \int_x^y \varphi'(t) \, dt. \qquad \blacksquare
\]
Figure 12.5 below serves as motivation for the following elementary lemma on convex functions.

Fig. 12.5. A convex function, $\varphi$, along with a cord and a tangent line. Notice that the tangent line is always below $\varphi$ and the cord lies above $\varphi$ between the points of intersection of the cord with the graph of $\varphi$.
$^8$ $PC^1$ denotes the space of piecewise $C^1$ functions, i.e. $\varphi \in PC^1((a,b) \to \mathbb{R})$ means that $\varphi$ is continuous and there are a finite number of points,
\[
\{a = a_0 < a_1 < a_2 < \dots < a_{n-1} < a_n = b\},
\]
such that $\varphi|_{[a_{j-1}, a_j] \cap (a,b)}$ is $C^1$ for all $j = 1, 2, \dots, n$.
Lemma 12.52 (Convex Functions). Let $\varphi \in PC^1((a,b) \to \mathbb{R})$ and for $x \in (a,b)$, let
\[
\varphi'(x+) := \lim_{h \downarrow 0} \frac{\varphi(x+h) - \varphi(x)}{h} \quad \text{and} \quad \varphi'(x-) := \lim_{h \uparrow 0} \frac{\varphi(x+h) - \varphi(x)}{h}.
\]
(Of course, $\varphi'(x\pm) = \varphi'(x)$ at points $x \in (a,b)$ where $\varphi$ is differentiable.)

1. If $\varphi'(x) \leq \varphi'(y)$ for all $a < x < y < b$ with $x$ and $y$ being points where $\varphi$ is differentiable, then for any $x_0 \in (a,b)$ we have $\varphi'(x_0-) \leq \varphi'(x_0+)$, and for $m \in [\varphi'(x_0-), \varphi'(x_0+)]$ we have,
\[
\varphi(x_0) + m(x - x_0) \leq \varphi(x) \quad \text{for all } x_0, x \in (a,b). \tag{12.30}
\]
2. If $\varphi \in PC^2((a,b) \to \mathbb{R})$$^9$ with $\varphi''(x) \geq 0$ for almost all $x \in (a,b)$, then Eq. (12.30) holds with $m = \varphi'(x_0)$.
3. If either of the hypotheses in items 1. and 2. above hold, then $\varphi$ is convex.

(This lemma applies to the functions $e^{\lambda x}$ for all $\lambda \in \mathbb{R}$, $|x|^\alpha$ for $\alpha > 1$, and $-\ln x$ to name a few examples. See Appendix 12.9 below for much more on convex functions.)
Proof. 1. If $x_0$ is a point where $\varphi$ is not differentiable, then by the mean value theorem, for all $h > 0$ small, there exist $c_+(h) \in (x_0, x_0 + h)$ and $c_-(h) \in (x_0 - h, x_0)$ such that
\[
\frac{\varphi(x_0 - h) - \varphi(x_0)}{-h} = \varphi'(c_-(h)) \leq \varphi'(c_+(h)) = \frac{\varphi(x_0 + h) - \varphi(x_0)}{h}.
\]
Letting $h \downarrow 0$ in this equation shows $\varphi'(x_0-) \leq \varphi'(x_0+)$. Furthermore, if $x < x_0 < y$ with $x$ and $y$ being points of differentiability of $\varphi$, then for small $h > 0$,
\[
\varphi'(x) \leq \varphi'(c_-(h)) \leq \varphi'(c_+(h)) \leq \varphi'(y).
\]
Letting $h \downarrow 0$ in these inequalities shows,
$^9$ $PC^2$ denotes the space of piecewise $C^2$ functions, i.e. $\varphi \in PC^2((a,b) \to \mathbb{R})$ means that $\varphi$ is $C^1$ and there are a finite number of points,
\[
\{a = a_0 < a_1 < a_2 < \dots < a_{n-1} < a_n = b\},
\]
such that $\varphi|_{[a_{j-1}, a_j] \cap (a,b)}$ is $C^2$ for all $j = 1, 2, \dots, n$.
\[
\varphi'(x) \leq \varphi'(x_0-) \leq \varphi'(x_0+) \leq \varphi'(y). \tag{12.31}
\]
Now let $m \in [\varphi'(x_0-), \varphi'(x_0+)]$. By the fundamental theorem of calculus in Lemma 12.51 and making use of Eq. (12.31), if $x > x_0$ then
\[
\varphi(x) - \varphi(x_0) = \int_{x_0}^x \varphi'(t) \, dt \geq \int_{x_0}^x m \, dt = m(x - x_0),
\]
and if $x < x_0$, then
\[
\varphi(x_0) - \varphi(x) = \int_x^{x_0} \varphi'(t) \, dt \leq \int_x^{x_0} m \, dt = m(x_0 - x).
\]
These two equations imply Eq. (12.30) holds.
2. Notice that $\varphi' \in PC^1((a,b))$ and therefore,
\[
\varphi'(y) - \varphi'(x) = \int_x^y \varphi''(t) \, dt \geq 0 \text{ for all } a < x \leq y < b,
\]
which shows that item 1. may be used.

Alternatively; Taylor's theorem with integral remainder (see Eq. (7.55) with $F = \varphi$, $a = x_0$, and $b = x$) implies
\[
\varphi(x) = \varphi(x_0) + \varphi'(x_0)(x - x_0) + (x - x_0)^2 \int_0^1 \varphi''(x_0 + \tau(x - x_0))(1 - \tau) \, d\tau
\geq \varphi(x_0) + \varphi'(x_0)(x - x_0).
\]
3. For any $\xi \in (a,b)$, let $h_\xi(x) := \varphi(\xi) + \varphi'(\xi-)(x - \xi)$. By Eq. (12.30) we know that $h_\xi(x) \leq \varphi(x)$ for all $\xi, x \in (a,b)$, with equality when $\xi = x$, and therefore,
\[
\varphi(x) = \sup_{\xi \in (a,b)} h_\xi(x).
\]
Since $h_\xi$ is an affine function for each $\xi \in (a,b)$, it follows that
\[
h_\xi(x_t) = (1-t) h_\xi(x_0) + t h_\xi(x_1) \leq (1-t)\varphi(x_0) + t\varphi(x_1)
\]
for all $t \in [0,1]$. Thus we may conclude that
\[
\varphi(x_t) = \sup_{\xi \in (a,b)} h_\xi(x_t) \leq (1-t)\varphi(x_0) + t\varphi(x_1)
\]
as desired.

*For fun, here are three more proofs of Eq. (12.28) under the hypothesis of item 2. Clearly these proofs may be omitted.
3a. By Lemma 12.50 it suffices to show either
\[
\frac{d}{dx} \frac{\varphi(y) - \varphi(x)}{y - x} \geq 0 \quad \text{or} \quad \frac{d}{dy} \frac{\varphi(y) - \varphi(x)}{y - x} \geq 0 \quad \text{for } a < x < y < b.
\]
For the first case,
\[
\frac{d}{dx} \frac{\varphi(y) - \varphi(x)}{y - x} = \frac{\varphi(y) - \varphi(x) - \varphi'(x)(y - x)}{(y - x)^2} = \int_0^1 \varphi''(x + t(y - x))(1 - t) \, dt \geq 0.
\]
Similarly,
\[
\frac{d}{dy} \frac{\varphi(y) - \varphi(x)}{y - x} = \frac{\varphi'(y)(y - x) - [\varphi(y) - \varphi(x)]}{(y - x)^2},
\]
where we now use,
\[
\varphi(x) - \varphi(y) = \varphi'(y)(x - y) + (x - y)^2 \int_0^1 \varphi''(y + t(x - y))(1 - t) \, dt,
\]
so that
\[
\varphi'(y)(y - x) - [\varphi(y) - \varphi(x)] = (x - y)^2 \int_0^1 \varphi''(y + t(x - y))(1 - t) \, dt \geq 0
\]
again.
3b. Let
\[
f(t) := \varphi(u) + t(\varphi(v) - \varphi(u)) - \varphi(u + t(v - u)).
\]
Then $f(0) = f(1) = 0$ with $\ddot{f}(t) = -(v - u)^2 \varphi''(u + t(v - u)) \leq 0$ for almost all $t$. By the mean value theorem, there exists $t_0 \in (0,1)$ such that $\dot{f}(t_0) = 0$, and then by the fundamental theorem of calculus it follows that
\[
\dot{f}(t) = \int_{t_0}^t \ddot{f}(\tau) \, d\tau.
\]
In particular, $\dot{f}(t) \leq 0$ for $t > t_0$ and $\dot{f}(t) \geq 0$ for $t < t_0$, and hence $f(t) \geq f(1) = 0$ for $t \geq t_0$ and $f(t) \geq f(0) = 0$ for $t \leq t_0$, i.e. $f(t) \geq 0$.
3c. Let $h : [0,1] \to \mathbb{R}$ be a piecewise $C^2$ function. Then by the fundamental theorem of calculus and integration by parts,
\[
h(t) = h(0) + \int_0^t \dot{h}(\tau) \, d\tau = h(0) + t \dot{h}(t) - \int_0^t \ddot{h}(\tau) \, \tau \, d\tau
\]
and
\[
h(1) = h(t) + \int_t^1 \dot{h}(\tau) \, d\tau = h(t) - (t - 1)\dot{h}(t) - \int_t^1 \ddot{h}(\tau)(\tau - 1) \, d\tau.
\]
Thus we have shown,
\[
h(t) = h(0) + t \dot{h}(t) - \int_0^t \ddot{h}(\tau) \, \tau \, d\tau \quad \text{and} \quad
h(t) = h(1) + (t - 1)\dot{h}(t) + \int_t^1 \ddot{h}(\tau)(\tau - 1) \, d\tau.
\]
So multiplying the first equation by $(1-t)$ and adding to it the second equation multiplied by $t$ shows,
\[
h(t) = (1-t) h(0) + t h(1) - \int_0^1 G(t, \tau) \, \ddot{h}(\tau) \, d\tau, \tag{12.32}
\]
where
\[
G(t, \tau) := \begin{cases} \tau (1 - t) & \text{if } \tau \leq t \\ t (1 - \tau) & \text{if } \tau \geq t \end{cases}.
\]
(The function $G(t, \tau)$ is the Green's function for the operator $-d^2/dt^2$ on $[0,1]$ with Dirichlet boundary conditions. The formula in Eq. (12.32) is a standard representation formula for $h(t)$ which appears naturally in the study of harmonic functions.)

We now take $h(t) := \varphi(x_0 + t(x_1 - x_0))$ in Eq. (12.32) to learn
\begin{align*}
\varphi(x_0 + t(x_1 - x_0)) &= (1-t)\varphi(x_0) + t\varphi(x_1) - (x_1 - x_0)^2 \int_0^1 G(t, \tau) \, \varphi''(x_0 + \tau(x_1 - x_0)) \, d\tau \\
&\leq (1-t)\varphi(x_0) + t\varphi(x_1),
\end{align*}
because $\varphi'' \geq 0$ and $G(t, \tau) \geq 0$.
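The representation formula (12.32) can be checked numerically; here is a small sketch of ours using the test function $h(t) = t^2$, for which $\ddot{h} \equiv 2$, $h(0) = 0$, and $h(1) = 1$:

```python
# Check (ours) of the representation formula (12.32) for h(t) = t^2, whose
# second derivative is the constant 2: the right-hand side
# (1-t) h(0) + t h(1) - int_0^1 G(t, tau) h''(tau) dtau should give back t^2.

def G(t: float, tau: float) -> float:
    """Green's function of -d^2/dt^2 on [0,1] with Dirichlet boundary data."""
    return tau * (1 - t) if tau <= t else t * (1 - tau)

def rhs(t: float, n: int = 100000) -> float:
    integral = sum(G(t, (k + 0.5) / n) * 2.0 for k in range(n)) / n  # midpoint rule
    return t * 1.0 - integral                                        # h(0) = 0, h(1) = 1

for t in [0.1, 0.5, 0.9]:
    assert abs(rhs(t) - t * t) < 1e-6
print("Eq. (12.32) reproduces h(t) = t^2")
```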
Example 12.53. The functions $\exp(x)$ and $-\log(x)$ are convex, and $|x|^p$ is convex iff $p \geq 1$, as follows from Lemma 12.52.
Example 12.54 (Proof of Lemma 10.40). Taking $\varphi(x) = e^{-x}$ in Lemma 12.52, Eq. (12.30) with $x_0 = 0$ implies (see Figure 10.1),
\[
1 - x \leq \varphi(x) = e^{-x} \text{ for all } x \in \mathbb{R}.
\]
Taking $\varphi(x) = e^{-2x}$ in Lemma 12.52, Eq. (12.28) with $x_0 = 0$ and $x_1 = \frac{1}{2}$ implies, for all $t \in [0,1]$,
\[
e^{-t} = \varphi\left((1-t) \cdot 0 + t \cdot \tfrac{1}{2}\right) \leq (1-t)\varphi(0) + t \varphi\left(\tfrac{1}{2}\right) = 1 - t + t e^{-1} \leq 1 - \tfrac{1}{2} t,
\]
wherein the last inequality we used $e^{-1} < \frac{1}{2}$. Taking $t = 2x$ in this equation then gives (see Figure 10.2)
\[
e^{-2x} \leq 1 - x \text{ for } 0 \leq x \leq \tfrac{1}{2}. \tag{12.33}
\]
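Both inequalities are easy to confirm on a grid; a quick numerical check of ours:

```python
# Grid check (ours) of the two inequalities derived above:
# 1 - x <= e^{-x} for all real x, and e^{-2x} <= 1 - x on [0, 1/2].

import math

xs = [i / 1000.0 for i in range(-3000, 3001)]        # grid on [-3, 3]
assert all(1 - x <= math.exp(-x) + 1e-12 for x in xs)

half = [i / 1000.0 for i in range(0, 501)]           # grid on [0, 1/2]
assert all(math.exp(-2 * x) <= 1 - x + 1e-12 for x in half)
print("both inequalities hold on the test grids")
```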
Theorem 12.55. Suppose that $\varphi : (a,b) \to \mathbb{R}$ is convex and for $x, y \in (a,b)$ with $x < y$, let$^{10}$
\[
F(x, y) := \frac{\varphi(y) - \varphi(x)}{y - x}.
\]
Then;

1. $F(x, y)$ is increasing in each of its arguments.
2. The following limits exist,
\begin{align}
\varphi'_+(x) &:= F(x, x+) := \lim_{y \downarrow x} F(x, y) < \infty \text{ and} \tag{12.34} \\
\varphi'_-(y) &:= F(y-, y) := \lim_{x \uparrow y} F(x, y) > -\infty. \tag{12.35}
\end{align}
3. The functions $\varphi'_\pm$ are both increasing functions and further satisfy,
\[
-\infty < \varphi'_-(x) \leq \varphi'_+(x) \leq \varphi'_-(y) < \infty \quad \text{for all } a < x < y < b. \tag{12.36}
\]
4. For any $t \in \left[\varphi'_-(x), \varphi'_+(x)\right]$,
\[
\varphi(y) \geq \varphi(x) + t(y - x) \text{ for all } x, y \in (a,b). \tag{12.37}
\]
5. For $a < \alpha < \beta < b$, let $K := \max\left\{\left|\varphi'_+(\alpha)\right|, \left|\varphi'_-(\beta)\right|\right\}$. Then
\[
|\varphi(y) - \varphi(x)| \leq K |y - x| \text{ for all } x, y \in [\alpha, \beta].
\]
That is, $\varphi$ is Lipschitz continuous on $[\alpha, \beta]$.
6. The function $\varphi'_+$ is right continuous and $\varphi'_-$ is left continuous.
7. The sets of discontinuity points for $\varphi'_+$ and for $\varphi'_-$ are the same as the set of points of non-differentiability of $\varphi$. Moreover this set is at most countable.
$^{10}$ The same formula would define $F(x, y)$ for $y < x$. However, since $F(x, y) = F(y, x)$, we would gain no new information by this extension.
Proof. (The first two items are a repetition of Lemma 12.50.)
1. and 2. If we let $h_t = t\varphi(x_1) + (1-t)\varphi(x_0)$, then $(x_t, h_t)$ is on the line segment joining $(x_0, \varphi(x_0))$ to $(x_1, \varphi(x_1))$, and the statement that $\varphi$ is convex is then equivalent to $\varphi(x_t) \leq h_t$ for all $0 \leq t \leq 1$. Since
\[
\frac{h_t - \varphi(x_0)}{x_t - x_0} = \frac{\varphi(x_1) - \varphi(x_0)}{x_1 - x_0} = \frac{\varphi(x_1) - h_t}{x_1 - x_t},
\]
the convexity of $\varphi$ is equivalent to
\[
\frac{\varphi(x_t) - \varphi(x_0)}{x_t - x_0} \leq \frac{h_t - \varphi(x_0)}{x_t - x_0} = \frac{\varphi(x_1) - \varphi(x_0)}{x_1 - x_0} \quad \text{for all } x_0 \leq x_t \leq x_1,
\]
and to
\[
\frac{\varphi(x_1) - \varphi(x_0)}{x_1 - x_0} = \frac{\varphi(x_1) - h_t}{x_1 - x_t} \leq \frac{\varphi(x_1) - \varphi(x_t)}{x_1 - x_t} \quad \text{for all } x_0 \leq x_t \leq x_1.
\]
Convexity also implies
\[
\frac{\varphi(x_t) - \varphi(x_0)}{x_t - x_0} \leq \frac{h_t - \varphi(x_0)}{x_t - x_0} = \frac{\varphi(x_1) - h_t}{x_1 - x_t} \leq \frac{\varphi(x_1) - \varphi(x_t)}{x_1 - x_t}.
\]
These inequalities may be written more compactly as,
\[
\frac{\varphi(v) - \varphi(u)}{v - u} \leq \frac{\varphi(w) - \varphi(u)}{w - u} \leq \frac{\varphi(w) - \varphi(v)}{w - v}, \tag{12.38}
\]
valid for all $a < u < v < w < b$; again see Figure 12.4. The first (second) inequality in Eq. (12.38) shows $F(x, y)$ is increasing in $y$ ($x$). This then implies the limits in item 2. are monotone and hence exist as claimed.
3. Let $a < x < y < b$. Using the increasing nature of $F$,
\[
-\infty < \varphi'_-(x) = F(x-, x) \leq F(x, x+) = \varphi'_+(x) < \infty
\]
and
\[
\varphi'_+(x) = F(x, x+) \leq F(y-, y) = \varphi'_-(y)
\]
as desired.
4. Let $t \in \left[\varphi'_-(x), \varphi'_+(x)\right]$. Then
\[
t \leq \varphi'_+(x) = F(x, x+) \leq F(x, y) = \frac{\varphi(y) - \varphi(x)}{y - x},
\]
or equivalently,
\[
\varphi(y) \geq \varphi(x) + t(y - x) \text{ for } y \geq x.
\]
Therefore Eq. (12.37) holds for $y \geq x$. Similarly, for $y < x$,
\[
t \geq \varphi'_-(x) = F(x-, x) \geq F(y, x) = \frac{\varphi(x) - \varphi(y)}{x - y},
\]
or equivalently,
\[
\varphi(y) \geq \varphi(x) - t(x - y) = \varphi(x) + t(y - x) \text{ for } y \leq x.
\]
Hence we have proved Eq. (12.37) for all $x, y \in (a,b)$.
5. For $\alpha \leq x < y \leq \beta$, we have
\[
\varphi'_+(\alpha) \leq \varphi'_+(x) = F(x, x+) \leq F(x, y) \leq F(y-, y) = \varphi'_-(y) \leq \varphi'_-(\beta) \tag{12.39}
\]
and in particular,
\[
-K \leq \varphi'_+(\alpha) \leq \frac{\varphi(y) - \varphi(x)}{y - x} \leq \varphi'_-(\beta) \leq K.
\]
This last inequality implies $|\varphi(y) - \varphi(x)| \leq K(y - x)$, which is the desired Lipschitz bound.
6. For $a < c < x < y < b$, we have $\varphi'_+(x) = F(x, x+) \leq F(x, y)$ and letting $x \downarrow c$ (using the continuity of $F$) we learn $\varphi'_+(c+) \leq F(c, y)$. We may now let $y \downarrow c$ to conclude $\varphi'_+(c+) \leq \varphi'_+(c)$. Since $\varphi'_+(c) \leq \varphi'_+(c+)$, it follows that $\varphi'_+(c) = \varphi'_+(c+)$ and hence that $\varphi'_+$ is right continuous.

Similarly, for $a < x < y < c < b$, we have $\varphi'_-(y) \geq F(x, y)$ and letting $y \uparrow c$ (using the continuity of $F$) we learn $\varphi'_-(c-) \geq F(x, c)$. Now let $x \uparrow c$ to conclude $\varphi'_-(c-) \geq \varphi'_-(c)$. Since $\varphi'_-(c-) \leq \varphi'_-(c)$, it follows that $\varphi'_-(c) = \varphi'_-(c-)$, i.e. $\varphi'_-$ is left continuous.
7. Since $\varphi'_\pm$ are increasing functions, they have at most countably many points of discontinuity. Letting $x \uparrow y$ in Eq. (12.36), using the left continuity of $\varphi'_-$, shows $\varphi'_-(y) = \varphi'_+(y-)$. Hence if $\varphi'_+$ is continuous at $y$, then $\varphi'_-(y) = \varphi'_+(y-) = \varphi'_+(y)$ and $\varphi$ is differentiable at $y$. Conversely, if $\varphi$ is differentiable at $y$, then
\[
\varphi'_+(y-) = \varphi'_-(y) = \varphi'(y) = \varphi'_+(y),
\]
which shows $\varphi'_+$ is continuous at $y$. Thus we have shown that the set of discontinuity points of $\varphi'_+$ is the same as the set of points of non-differentiability of $\varphi$. That the discontinuity set of $\varphi'_-$ is the same as the non-differentiability set of $\varphi$ is proved similarly.
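A concrete illustration of ours for the one-sided derivatives and the supporting lines of Eq. (12.37), using the convex function $\varphi(x) = |x|$ at its single non-differentiability point $x = 0$:

```python
# Illustration (ours) of items 2-4 for the convex function phi(x) = |x|:
# phi'_-(0) = -1 and phi'_+(0) = +1, and every slope t in [-1, 1] gives a
# supporting line through (0, 0), i.e. Eq. (12.37) with x = 0.

def phi(x: float) -> float:
    return abs(x)

def F(x: float, y: float) -> float:
    return (phi(y) - phi(x)) / (y - x)

assert F(0.0, 1e-9) == 1.0      # difference quotient from the right: phi'_+(0)
assert F(-1e-9, 0.0) == -1.0    # difference quotient from the left:  phi'_-(0)

ys = [i / 100.0 for i in range(-300, 301)]
for t in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    assert all(phi(y) >= phi(0.0) + t * (y - 0.0) for y in ys)
print("every t in [phi'_-(0), phi'_+(0)] gives a supporting line")
```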
Corollary 12.56. If $\varphi : (a,b) \to \mathbb{R}$ is a convex function and $D \subset (a,b)$ is a dense set, then
\[
\varphi(y) = \sup_{x \in D} \left[\varphi(x) + \varphi'_\pm(x)(y - x)\right] \text{ for all } y \in (a,b).
\]
Proof. Let $\varphi_-(y) := \sup_{x \in D} \left[\varphi(x) + \varphi'_-(x)(y - x)\right]$. According to Eq. (12.37) above, we know that $\varphi(y) \geq \varphi_-(y)$ for all $y \in (a,b)$. Now suppose that $x \in (a,b)$ and $x_n \in D$ with $x_n \uparrow x$. Then passing to the limit in the estimate, $\varphi_-(y) \geq \varphi(x_n) + \varphi'_-(x_n)(y - x_n)$, shows (using the left continuity of $\varphi'_-$) that $\varphi_-(y) \geq \varphi(x) + \varphi'_-(x)(y - x)$. Since $x \in (a,b)$ is arbitrary, we may take $x = y$ to discover $\varphi_-(y) \geq \varphi(y)$ and hence $\varphi(y) = \varphi_-(y)$. The proof that $\varphi(y) = \sup_{x \in D} \left[\varphi(x) + \varphi'_+(x)(y - x)\right]$ is similar.
Lemma 12.57. Suppose that $\varphi : (a,b) \to \mathbb{R}$ is a non-decreasing function such that
\[
\varphi\left(\tfrac{1}{2}(x + y)\right) \leq \tfrac{1}{2}\left[\varphi(x) + \varphi(y)\right] \text{ for all } x, y \in (a,b); \tag{12.40}
\]
then $\varphi$ is convex. The result remains true if $\varphi$ is assumed to be continuous rather than non-decreasing.
Proof. Let $x_0, x_1 \in (a,b)$ and $x_t := x_0 + t(x_1 - x_0)$ as above. For $n \in \mathbb{N}$ let $\mathcal{U}_n := \left\{\frac{k}{2^n} : 1 \leq k < 2^n\right\}$. We begin by showing Eq. (12.40) implies
\[
\varphi(x_t) \leq (1-t)\varphi(x_0) + t\varphi(x_1) \text{ for all } t \in \mathcal{U} := \cup_n \mathcal{U}_n. \tag{12.41}
\]
We will do this by induction on $n$. For $n = 1$, this follows directly from Eq. (12.40). So now suppose that Eq. (12.41) holds for all $t \in \mathcal{U}_n$ and suppose that $t = \frac{2k+1}{2^{n+1}} \in \mathcal{U}_{n+1}$. Observing that
\[
x_t = \frac{1}{2}\left(x_{k/2^n} + x_{(k+1)/2^n}\right),
\]
we may again use Eq. (12.40) to conclude,
\[
\varphi(x_t) \leq \frac{1}{2}\left[\varphi\left(x_{k/2^n}\right) + \varphi\left(x_{(k+1)/2^n}\right)\right].
\]
Then use the induction hypothesis to conclude,
\[
\varphi(x_t) \leq \frac{1}{2}\left[\left(1 - \tfrac{k}{2^n}\right)\varphi(x_0) + \tfrac{k}{2^n}\varphi(x_1) + \left(1 - \tfrac{k+1}{2^n}\right)\varphi(x_0) + \tfrac{k+1}{2^n}\varphi(x_1)\right] = (1-t)\varphi(x_0) + t\varphi(x_1)
\]
as desired.

For general $t \in (0,1)$, let $\tau \in \mathcal{U}$ be such that $\tau > t$. Since $\varphi$ is increasing, Eq. (12.41) gives,
\[
\varphi(x_t) \leq \varphi(x_\tau) \leq (1 - \tau)\varphi(x_0) + \tau\varphi(x_1).
\]
We may now let $\tau \downarrow t$ to complete the proof. This same technique clearly also works if we were to assume that $\varphi$ is continuous rather than monotonic.
13 Hilbert Space Basics
Definition 13.1. Let $H$ be a complex vector space. An inner product on $H$ is a function, $\langle \cdot | \cdot \rangle : H \times H \to \mathbb{C}$, such that

1. $\langle ax + by | z \rangle = a\langle x | z \rangle + b\langle y | z \rangle$, i.e. $x \to \langle x | z \rangle$ is linear.
2. $\langle x | y \rangle = \overline{\langle y | x \rangle}$.
3. $\|x\|^2 := \langle x | x \rangle \geq 0$ with $\|x\|^2 = 0$ iff $x = 0$.

Notice that combining properties (1) and (2) shows that $x \to \langle z | x \rangle$ is conjugate linear for fixed $z \in H$, i.e.
\[
\langle z | ax + by \rangle = \bar{a}\langle z | x \rangle + \bar{b}\langle z | y \rangle.
\]
The following identity will be used frequently in the sequel without further mention,
\[
\|x + y\|^2 = \langle x + y | x + y \rangle = \|x\|^2 + \|y\|^2 + \langle x | y \rangle + \langle y | x \rangle
= \|x\|^2 + \|y\|^2 + 2\mathrm{Re}\langle x | y \rangle. \tag{13.1}
\]
Theorem 13.2 (Schwarz Inequality). Let $(H, \langle \cdot | \cdot \rangle)$ be an inner product space; then for all $x, y \in H$,
\[
|\langle x | y \rangle| \leq \|x\| \|y\|,
\]
and equality holds iff $x$ and $y$ are linearly dependent.

Proof. If $y = 0$, the result holds trivially. So assume that $y \neq 0$ and observe; if $x = \alpha y$ for some $\alpha \in \mathbb{C}$, then $\langle x | y \rangle = \alpha \|y\|^2$ and hence
\[
|\langle x | y \rangle| = |\alpha| \|y\|^2 = \|x\| \|y\|.
\]
Now suppose that $x \in H$ is arbitrary and let $z := x - \|y\|^{-2} \langle x | y \rangle y$. (So $\|y\|^{-2} \langle x | y \rangle y$ is the orthogonal projection of $x$ along $y$, see Figure 13.1.) Then
\[
0 \leq \|z\|^2 = \left\|x - \frac{\langle x | y \rangle}{\|y\|^2} y\right\|^2
= \|x\|^2 + \frac{|\langle x | y \rangle|^2}{\|y\|^4} \|y\|^2 - 2\mathrm{Re}\left\langle x \,\Big|\, \frac{\langle x | y \rangle}{\|y\|^2} y \right\rangle
= \|x\|^2 - \frac{|\langle x | y \rangle|^2}{\|y\|^2},
\]
from which it follows that $0 \leq \|y\|^2 \|x\|^2 - |\langle x | y \rangle|^2$, with equality iff $z = 0$, or equivalently iff $x = \|y\|^{-2} \langle x | y \rangle y$.
Fig. 13.1. The picture behind the proof of the Schwarz inequality.
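A numerical companion of ours to the proof, carried out in $\mathbb{C}^3$ with the standard inner product:

```python
# Companion (ours) to the proof: in C^3 with <x|y> = sum x_k conj(y_k),
# verify |<x|y>| <= ||x|| ||y|| and that z := x - (<x|y>/||y||^2) y is
# orthogonal to y, exactly as in the construction above.

import math

def ip(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(ip(x, x).real)

x = [1 + 2j, 0.5 - 1j, 3 + 0j]
y = [2 - 1j, 1 + 1j, -1 + 0.5j]

assert abs(ip(x, y)) <= norm(x) * norm(y) + 1e-12   # Schwarz inequality
c = ip(x, y) / ip(y, y)                             # projection coefficient
z = [a - c * b for a, b in zip(x, y)]
assert abs(ip(z, y)) < 1e-12                        # z is orthogonal to y
print("Schwarz holds and x - proj_y(x) is orthogonal to y")
```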
Corollary 13.3. Let $(H, \langle \cdot | \cdot \rangle)$ be an inner product space and $\|x\| := \sqrt{\langle x | x \rangle}$. Then the Hilbertian norm, $\|\cdot\|$, is a norm on $H$. Moreover $\langle \cdot | \cdot \rangle$ is continuous on $H \times H$, where $H$ is viewed as the normed space $(H, \|\cdot\|)$.

Proof. If $x, y \in H$, then, using Schwarz's inequality,
\[
\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\mathrm{Re}\langle x | y \rangle \leq \|x\|^2 + \|y\|^2 + 2\|x\|\|y\| = (\|x\| + \|y\|)^2.
\]
Taking the square root of this inequality shows $\|\cdot\|$ satisfies the triangle inequality.

Checking that $\|\cdot\|$ satisfies the remaining axioms of a norm is now routine and will be left to the reader. If $x, y, \Delta x, \Delta y \in H$, then
\[
|\langle x + \Delta x | y + \Delta y \rangle - \langle x | y \rangle| = |\langle x | \Delta y \rangle + \langle \Delta x | y \rangle + \langle \Delta x | \Delta y \rangle|
\leq \|x\|\|\Delta y\| + \|y\|\|\Delta x\| + \|\Delta x\|\|\Delta y\| \to 0 \text{ as } \Delta x, \Delta y \to 0,
\]
from which it follows that $\langle \cdot | \cdot \rangle$ is continuous.
Definition 13.4. Let $(H, \langle \cdot | \cdot \rangle)$ be an inner product space. We say $x, y \in H$ are orthogonal and write $x \perp y$ iff $\langle x | y \rangle = 0$. More generally, if $A \subset H$ is a set, $x \in H$ is orthogonal to $A$ (write $x \perp A$) iff $\langle x | y \rangle = 0$ for all $y \in A$. Let $A^\perp = \{x \in H : x \perp A\}$ be the set of vectors orthogonal to $A$. A subset $S \subset H$ is an orthogonal set if $x \perp y$ for all distinct elements $x, y \in S$. If $S$ further satisfies $\|x\| = 1$ for all $x \in S$, then $S$ is said to be an orthonormal set.
Proposition 13.5. Let $(H, \langle \cdot | \cdot \rangle)$ be an inner product space; then

1. (Parallelogram Law)
\[
\|a + b\|^2 + \|a - b\|^2 = 2\|a\|^2 + 2\|b\|^2 \tag{13.2}
\]
for all $a, b \in H$.
2. (Pythagorean Theorem) If $S \subset H$ is a finite orthogonal set, then
\[
\left\|\sum_{x \in S} x\right\|^2 = \sum_{x \in S} \|x\|^2. \tag{13.3}
\]
3. If $A \subset H$ is a set, then $A^\perp$ is a closed linear subspace of $H$.
Proof. I will assume that $H$ is a complex Hilbert space, the real case being easier. Items 1. and 2. are proved by the following elementary computations;
\[
\|a + b\|^2 + \|a - b\|^2 = \|a\|^2 + \|b\|^2 + 2\mathrm{Re}\langle a | b \rangle + \|a\|^2 + \|b\|^2 - 2\mathrm{Re}\langle a | b \rangle = 2\|a\|^2 + 2\|b\|^2,
\]
and
\[
\left\|\sum_{x \in S} x\right\|^2 = \left\langle \sum_{x \in S} x \,\Big|\, \sum_{y \in S} y \right\rangle = \sum_{x, y \in S} \langle x | y \rangle = \sum_{x \in S} \langle x | x \rangle = \sum_{x \in S} \|x\|^2.
\]
Item 3. is a consequence of the continuity of $\langle \cdot | \cdot \rangle$ and the fact that
\[
A^\perp = \cap_{x \in A} \mathrm{Nul}(\langle \cdot | x \rangle),
\]
where $\mathrm{Nul}(\langle \cdot | x \rangle) = \{y \in H : \langle y | x \rangle = 0\}$, a closed subspace of $H$. Alternatively, if $x_n \in A^\perp$ and $x_n \to x$ in $H$, then
\[
0 = \lim_{n \to \infty} \langle x_n | a \rangle = \left\langle \lim_{n \to \infty} x_n \,\Big|\, a \right\rangle = \langle x | a \rangle \quad \text{for all } a \in A,
\]
which shows that $x \in A^\perp$.
Definition 13.6. A Hilbert space is an inner product space $(H, \langle \cdot | \cdot \rangle)$ such that the induced Hilbertian norm is complete.

Example 13.7. For any measure space, $(\Omega, \mathcal{B}, \mu)$, $H := L^2(\mu)$ with inner product,
\[
\langle f | g \rangle = \int_\Omega f(\omega) \overline{g(\omega)} \, d\mu(\omega),
\]
is a Hilbert space; see Theorem 12.25 for the completeness assertion.

Definition 13.8. A subset $C$ of a vector space $X$ is said to be convex if for all $x, y \in C$ the line segment $[x, y] := \{tx + (1-t)y : 0 \leq t \leq 1\}$ joining $x$ to $y$ is contained in $C$ as well. (Notice that any vector subspace of $X$ is convex.)
Theorem 13.9 (Best Approximation Theorem). Suppose that $H$ is a Hilbert space and $M \subset H$ is a closed convex subset of $H$. Then for any $x \in H$ there exists a unique $y \in M$ such that
\[
\|x - y\| = d(x, M) = \inf_{z \in M} \|x - z\|.
\]
Moreover, if $M$ is a vector subspace of $H$, then the point $y$ may also be characterized as the unique point in $M$ such that $(x - y) \perp M$.
Proof. Let $x \in H$, $\delta := d(x, M)$, $y, z \in M$, and, referring to Figure 13.2, let $w = z + (y - x)$ and $c = (z + y)/2 \in M$. It then follows by the parallelogram law (Eq. (13.2) with $a = (y - x)$ and $b = (z - x)$) and the fact that $c \in M$ that
\begin{align*}
2\|y - x\|^2 + 2\|z - x\|^2 &= \|w - x\|^2 + \|y - z\|^2 = \|z + y - 2x\|^2 + \|y - z\|^2 \\
&= 4\|x - c\|^2 + \|y - z\|^2 \geq 4\delta^2 + \|y - z\|^2.
\end{align*}
Thus we have shown, for all $y, z \in M$, that
\[
\|y - z\|^2 \leq 2\|y - x\|^2 + 2\|z - x\|^2 - 4\delta^2. \tag{13.4}
\]
Uniqueness. If $y, z \in M$ minimize the distance to $x$, then $\|y - x\| = \delta = \|z - x\|$ and it follows from Eq. (13.4) that $y = z$.

Existence. Let $y_n \in M$ be chosen such that $\|y_n - x\| = \delta_n \to \delta = d(x, M)$. Taking $y = y_m$ and $z = y_n$ in Eq. (13.4) shows
\[
\|y_n - y_m\|^2 \leq 2\delta_m^2 + 2\delta_n^2 - 4\delta^2 \to 0 \text{ as } m, n \to \infty.
\]
Therefore, by completeness of $H$, $\{y_n\}_{n=1}^\infty$ is convergent. Because $M$ is closed, $y := \lim_{n \to \infty} y_n \in M$, and because the norm is continuous,
\[
\|y - x\| = \lim_{n \to \infty} \|y_n - x\| = \delta = d(x, M).
\]
Fig. 13.2. In this figure $y, z \in M$ and, by convexity, $c = (z + y)/2 \in M$.

So $y$ is the desired point in $M$ which is closest to $x$.

Orthogonality property. Now suppose $M$ is a closed subspace of $H$ and $x \in H$. Let $y \in M$ be the closest point in $M$ to $x$. Then for $w \in M$, the function
\[
g(t) := \|x - (y + tw)\|^2 = \|x - y\|^2 - 2t\mathrm{Re}\langle x - y | w \rangle + t^2 \|w\|^2
\]
has a minimum at $t = 0$ and therefore $0 = g'(0) = -2\mathrm{Re}\langle x - y | w \rangle$. Since $w \in M$ is arbitrary, this implies that $(x - y) \perp M$, see Figure 13.3.
Fig. 13.3. The orthogonality relationships of closest points.
Finally suppose $y \in M$ is any point such that $(x - y) \perp M$. Then for $z \in M$, by the Pythagorean theorem,
\[
\|x - z\|^2 = \|x - y + y - z\|^2 = \|x - y\|^2 + \|y - z\|^2 \geq \|x - y\|^2,
\]
which shows $d(x, M)^2 \geq \|x - y\|^2$, i.e. $\|x - y\| = d(x, M)$. That is to say, $y$ is the point in $M$ closest to $x$.
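The subspace case of the theorem is easy to see concretely; a sketch of ours in $\mathbb{R}^3$ with $M = \mathrm{span}\{v\}$ for an assumed test vector $v$:

```python
# Sketch (ours) of the subspace case in R^3: the closest point to x in
# M = span{v} is y = (<x,v>/<v,v>) v; it satisfies (x - y) _|_ M, and no
# sampled point of M along the line comes closer than y.

import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x = [3.0, 1.0, -2.0]
v = [1.0, 2.0, 2.0]

c = dot(x, v) / dot(v, v)                 # projection coefficient
y = [c * t for t in v]
assert abs(dot([a - b for a, b in zip(x, y)], v)) < 1e-12   # (x - y) _|_ M

best = math.dist(x, y)
for k in range(-200, 201):
    z = [(c + k / 100.0) * t for t in v]
    assert best <= math.dist(x, z) + 1e-12                  # y is the minimizer
print("closest point found via the orthogonality characterization")
```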
Notation 13.10 If $A : X \to Y$ is a linear operator between two normed spaces, we let
\[
\|A\| := \sup_{x \in X \setminus \{0\}} \frac{\|Ax\|_Y}{\|x\|_X} = \sup_{\|x\|_X = 1} \|Ax\|_Y.
\]
We refer to $\|A\|$ as the operator norm of $A$ and call $A$ a bounded operator if $\|A\| < \infty$. We further let $L(X, Y)$ be the set of bounded operators from $X$ to $Y$.
Exercise 13.1. Show that a linear operator, $A : X \to Y$, is bounded iff it is continuous.
Solution to Exercise (13.1). If $A$ is continuous at $x = 0$, then (as $A0 = 0$) there exists $\delta > 0$ such that $\|Ax\|_Y \leq 1$ for $\|x\|_X \leq \delta$. Thus if $x \neq 0$, we have $\left\|\frac{\delta}{\|x\|_X} x\right\|_X = \delta$ and therefore,
\[
\frac{\delta}{\|x\|_X} \|Ax\|_Y = \left\|A \frac{\delta}{\|x\|_X} x\right\|_Y \leq 1,
\]
from which it follows that $\|Ax\|_Y \leq \delta^{-1} \|x\|_X$, which shows that $\|A\| \leq \delta^{-1} < \infty$. Conversely, if $\|A\| < \infty$, then
\[
\|Ax - Ax'\|_Y = \|A(x - x')\|_Y \leq \|A\| \|x - x'\|_X,
\]
from which it follows that $A$ is continuous.
Definition 13.11. Suppose that $A : H \to H$ is a bounded operator. The adjoint of $A$, denoted $A^*$, is the unique operator $A^* : H \to H$ such that $\langle Ax | y \rangle = \langle x | A^* y \rangle$. (The proof that $A^*$ exists and is unique will be given in Proposition 13.16 below.) A bounded operator $A : H \to H$ is self-adjoint or Hermitian if $A = A^*$.

Definition 13.12. Let $H$ be a Hilbert space and $M \subset H$ be a closed subspace. The orthogonal projection of $H$ onto $M$ is the function $P_M : H \to H$ such that for $x \in H$, $P_M(x)$ is the unique element in $M$ such that $(x - P_M(x)) \perp M$, i.e. $P_M(x)$ is the unique element in $M$ such that
\[
\langle x | m \rangle = \langle P_M(x) | m \rangle \text{ for all } m \in M. \tag{13.5}
\]
Given a linear transformation A, we will let Ran(A) and Nul (A) denote the
range and the null-space of A respectively.
Theorem 13.13 (Projection Theorem). Let $H$ be a Hilbert space and $M \subset H$ be a closed subspace. The orthogonal projection $P_M$ satisfies:

1. $P_M$ is linear and hence we will write $P_M x$ rather than $P_M(x)$.
2. $P_M^2 = P_M$ ($P_M$ is a projection).
3. $P_M^* = P_M$ ($P_M$ is self-adjoint).
4. $\mathrm{Ran}(P_M) = M$ and $\mathrm{Nul}(P_M) = M^\perp$.
5. If $N \subset M \subset H$ is another closed subspace, then $P_N P_M = P_M P_N = P_N$.
Proof.

1. Let $x_1, x_2 \in H$ and $\alpha \in \mathbb{C}$; then $P_M x_1 + \alpha P_M x_2 \in M$ and
\[
P_M x_1 + \alpha P_M x_2 - (x_1 + \alpha x_2) = \left[P_M x_1 - x_1 + \alpha (P_M x_2 - x_2)\right] \in M^\perp,
\]
showing $P_M x_1 + \alpha P_M x_2 = P_M(x_1 + \alpha x_2)$, i.e. $P_M$ is linear.
2. Obviously $\mathrm{Ran}(P_M) = M$ and $P_M x = x$ for all $x \in M$. Therefore $P_M^2 = P_M$.
3. Let $x, y \in H$; then since $(x - P_M x)$ and $(y - P_M y)$ are in $M^\perp$,
\begin{align*}
\langle P_M x | y \rangle &= \langle P_M x | P_M y + y - P_M y \rangle = \langle P_M x | P_M y \rangle \\
&= \langle P_M x + (x - P_M x) | P_M y \rangle = \langle x | P_M y \rangle.
\end{align*}
4. We have already seen $\mathrm{Ran}(P_M) = M$, and $P_M x = 0$ iff $x = x - 0 \in M^\perp$, i.e. $\mathrm{Nul}(P_M) = M^\perp$.
5. If $N \subset M \subset H$, it is clear that $P_M P_N = P_N$ since $P_M = \mathrm{Id}$ on $N = \mathrm{Ran}(P_N) \subset M$. Taking adjoints gives the other identity, namely that $P_N P_M = P_N$.

Alternative proof 1 of $P_N P_M = P_N$. If $x \in H$, then $(x - P_M x) \perp M$ and therefore $(x - P_M x) \perp N$. We also have $(P_M x - P_N P_M x) \perp N$ and therefore,
\[
x - P_N P_M x = (x - P_M x) + (P_M x - P_N P_M x) \in N^\perp,
\]
which shows $P_N P_M x = P_N x$.

Alternative proof 2 of $P_N P_M = P_N$. If $x \in H$ and $n \in N$, we have
\[
\langle P_N P_M x | n \rangle = \langle P_M x | P_N n \rangle = \langle P_M x | n \rangle = \langle x | P_M n \rangle = \langle x | n \rangle.
\]
Since this holds for all $n \in N$, we may conclude (recalling Eq. (13.5)) that $P_N P_M x = P_N x$.
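Item 5. can be seen in coordinates; a concrete instance of ours in $\mathbb{R}^3$:

```python
# Concrete instance (ours) of item 5 in R^3 with N = span{e1} contained in
# M = span{e1, e2}: composing the two projections collapses to P_N.

def P_M(x):   # orthogonal projection onto the x1-x2 plane
    return [x[0], x[1], 0.0]

def P_N(x):   # orthogonal projection onto the x1 axis
    return [x[0], 0.0, 0.0]

x = [2.0, -3.0, 7.0]
assert P_N(P_M(x)) == P_N(x) == P_M(P_N(x))
print(P_N(P_M(x)))   # [2.0, 0.0, 0.0]
```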
Corollary 13.14. If $M \subset H$ is a proper closed subspace of a Hilbert space $H$, then $H = M \oplus M^\perp$.

Proof. Given $x \in H$, let $y = P_M x$ so that $x - y \in M^\perp$. Then $x = y + (x - y) \in M + M^\perp$. If $x \in M \cap M^\perp$, then $x \perp x$, i.e. $\|x\|^2 = \langle x | x \rangle = 0$. So $M \cap M^\perp = \{0\}$.
Exercise 13.2. Suppose $M$ is a subset of $H$; then $M^{\perp\perp} = \overline{\mathrm{span}(M)}$, where (as usual) $\mathrm{span}(M)$ denotes all finite linear combinations of elements from $M$.
Theorem 13.15 (Riesz Theorem). Let $H^*$ be the dual space of $H$, i.e. $f \in H^*$ iff $f : H \to \mathbb{F}$ is linear and continuous. The map
\[
z \in H \xrightarrow{\ j\ } \langle \cdot | z \rangle \in H^* \tag{13.6}
\]
is a conjugate linear$^1$ isometric isomorphism, where for $f \in H^*$ we let,
\[
\|f\|_{H^*} := \sup_{x \in H \setminus \{0\}} \frac{|f(x)|}{\|x\|} = \sup_{\|x\| = 1} |f(x)|.
\]
Proof. Let $f \in H^*$ and $M = \mathrm{Nul}(f)$, a closed proper subspace of $H$ since $f$ is continuous. If $f = 0$, then clearly $f(\cdot) = \langle \cdot | 0 \rangle$. If $f \neq 0$ there exists $y \in H \setminus M$. Then for any $\alpha \in \mathbb{C}$ we have $e := \alpha(y - P_M y) \in M^\perp$; we now choose $\alpha$ so that $f(e) = 1$. Hence if $x \in H$,
\[
f(x - f(x) e) = f(x) - f(x) f(e) = f(x) - f(x) = 0,
\]
which shows $x - f(x) e \in M$. As $e \in M^\perp$, it follows that
\[
0 = \langle x - f(x) e | e \rangle = \langle x | e \rangle - f(x) \|e\|^2,
\]
which shows $f(\cdot) = \langle \cdot | z \rangle = jz$ where $z := e / \|e\|^2$, and thus $j$ is surjective.

The map $j$ is conjugate linear by the axioms of the inner products. Moreover, for $x, z \in H$,
\[
|\langle x | z \rangle| \leq \|x\| \|z\| \text{ for all } x \in H,
\]
with equality when $x = z$. This implies that $\|jz\|_{H^*} = \|\langle \cdot | z \rangle\|_{H^*} = \|z\|$. Therefore $j$ is isometric and this implies $j$ is injective.
Proposition 13.16 (Adjoints). Let $H$ and $K$ be Hilbert spaces and $A : H \to K$ be a bounded operator. Then there exists a unique bounded operator $A^{*} : K \to H$ such that
$$\langle Ax|y\rangle_K = \langle x|A^{*}y\rangle_H \text{ for all } x \in H \text{ and } y \in K. \tag{13.7}$$
Moreover, for all $A, B \in L(H, K)$ and $\lambda \in \mathbb{C}$,

¹ Recall that $j$ is conjugate linear if
$$j(z_1 + \lambda z_2) = j z_1 + \bar{\lambda}\, j z_2$$
for all $z_1, z_2 \in H$ and $\lambda \in \mathbb{C}$.
Page: 202 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
13 Hilbert Space Basics 203
1. $(A + \lambda B)^{*} = A^{*} + \bar{\lambda} B^{*}$,
2. $A^{**} := (A^{*})^{*} = A$,
3. $\|A^{*}\| = \|A\|$ and
4. $\|A^{*}A\| = \|A\|^2$.
5. If $K = H$, then $(AB)^{*} = B^{*}A^{*}$. In particular $A \in L(H)$ has a bounded inverse iff $A^{*}$ has a bounded inverse, and $(A^{*})^{-1} = (A^{-1})^{*}$.
Proof. For each $y \in K$, the map $x \mapsto \langle Ax|y\rangle_K$ is in $H^{*}$ and therefore there exists, by Theorem 13.15, a unique vector $z \in H$ (we will denote this $z$ by $A^{*}(y)$) such that
$$\langle Ax|y\rangle_K = \langle x|z\rangle_H \text{ for all } x \in H.$$
This shows there is a unique map $A^{*} : K \to H$ such that $\langle Ax|y\rangle_K = \langle x|A^{*}(y)\rangle_H$ for all $x \in H$ and $y \in K$.
To see $A^{*}$ is linear, let $y_1, y_2 \in K$ and $\lambda \in \mathbb{C}$; then for any $x \in H$,
$$\langle Ax|y_1 + \lambda y_2\rangle_K = \langle Ax|y_1\rangle_K + \bar{\lambda} \langle Ax|y_2\rangle_K = \langle x|A^{*}(y_1)\rangle_H + \bar{\lambda} \langle x|A^{*}(y_2)\rangle_H = \langle x|A^{*}(y_1) + \lambda A^{*}(y_2)\rangle_H,$$
and by the uniqueness of $A^{*}(y_1 + \lambda y_2)$ we find
$$A^{*}(y_1 + \lambda y_2) = A^{*}(y_1) + \lambda A^{*}(y_2).$$
This shows $A^{*}$ is linear and so we will now write $A^{*}y$ instead of $A^{*}(y)$.
Since
$$\langle A^{*}y|x\rangle_H = \overline{\langle x|A^{*}y\rangle_H} = \overline{\langle Ax|y\rangle_K} = \langle y|Ax\rangle_K,$$
it follows that $A^{**} = A$. The assertion that $(A + \lambda B)^{*} = A^{*} + \bar{\lambda} B^{*}$ is Exercise 13.3.
Items 3. and 4. Making use of Schwarz's inequality (Theorem 13.2), we have
$$\|A^{*}\| = \sup_{k \in K : \|k\|=1} \|A^{*}k\| = \sup_{k \in K : \|k\|=1}\ \sup_{h \in H : \|h\|=1} |\langle A^{*}k|h\rangle| = \sup_{h \in H : \|h\|=1}\ \sup_{k \in K : \|k\|=1} |\langle k|Ah\rangle| = \sup_{h \in H : \|h\|=1} \|Ah\| = \|A\|,$$
so that $\|A^{*}\| = \|A\|$. Since
$$\|A^{*}A\| \le \|A^{*}\| \|A\| = \|A\|^2$$
and
$$\|A\|^2 = \sup_{h \in H : \|h\|=1} \|Ah\|^2 = \sup_{h \in H : \|h\|=1} |\langle Ah|Ah\rangle| = \sup_{h \in H : \|h\|=1} |\langle h|A^{*}Ah\rangle| \le \sup_{h \in H : \|h\|=1} \|A^{*}Ah\| = \|A^{*}A\|, \tag{13.8}$$
we also have $\|A^{*}A\| \le \|A\|^2 \le \|A^{*}A\|$, which shows $\|A\|^2 = \|A^{*}A\|$.

Alternatively, from Eq. (13.8),
$$\|A\|^2 \le \|A^{*}A\| \le \|A\| \|A^{*}\|, \tag{13.9}$$
which then implies $\|A\| \le \|A^{*}\|$. Replacing $A$ by $A^{*}$ in this last inequality shows $\|A^{*}\| \le \|A\|$ and hence that $\|A^{*}\| = \|A\|$. Using this identity back in Eq. (13.9) proves $\|A\|^2 = \|A^{*}A\|$.
Now suppose that $K = H$. Then
$$\langle ABh|k\rangle = \langle Bh|A^{*}k\rangle = \langle h|B^{*}A^{*}k\rangle,$$
which shows $(AB)^{*} = B^{*}A^{*}$. If $A^{-1}$ exists then
$$(A^{-1})^{*} A^{*} = (A A^{-1})^{*} = I^{*} = I \text{ and } A^{*} (A^{-1})^{*} = (A^{-1} A)^{*} = I^{*} = I.$$
This shows that $A^{*}$ is invertible and $(A^{*})^{-1} = (A^{-1})^{*}$. Similarly if $A^{*}$ is invertible then so is $A = A^{**}$.
Exercise 13.3. Let $H, K, M$ be Hilbert spaces, $A, B \in L(H, K)$, $C \in L(K, M)$ and $\lambda \in \mathbb{C}$. Show $(A + \lambda B)^{*} = A^{*} + \bar{\lambda} B^{*}$ and $(CA)^{*} = A^{*} C^{*} \in L(M, H)$.
Exercise 13.4. Let $H = \mathbb{C}^n$ and $K = \mathbb{C}^m$ equipped with the usual inner products, i.e. $\langle z|w\rangle_H = z \cdot \bar{w}$ for $z, w \in H$. Let $A$ be an $m \times n$ matrix thought of as a linear operator from $H$ to $K$. Show the matrix associated to $A^{*} : K \to H$ is the conjugate transpose of $A$.
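Exercise 13.4 is easy to check numerically; the following sketch (our own illustration with randomly chosen data) verifies the defining relation (13.7) for a complex matrix and its conjugate transpose, using the convention $\langle z|w\rangle = z \cdot \bar{w}$ from the exercise.

```python
import numpy as np

rng = np.random.default_rng(1)

# A : C^3 -> C^2 as a 2x3 complex matrix; its adjoint is the conjugate transpose.
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
A_star = A.conj().T

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)

# <z|w> = z . conj(w), the convention of Exercise 13.4 (np.dot does not conjugate).
inner = lambda z, w: np.dot(z, w.conj())

# Defining relation (13.7): <Ax|y>_K = <x|A*y>_H.
assert np.isclose(inner(A @ x, y), inner(x, A_star @ y))
```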
Lemma 13.17. Suppose $A : H \to K$ is a bounded operator; then:
1. $\operatorname{Nul}(A^{*}) = \operatorname{Ran}(A)^{\perp}$.
2. $\overline{\operatorname{Ran}(A)} = \operatorname{Nul}(A^{*})^{\perp}$.
3. If $K = H$ and $V \subset H$ is an $A$-invariant subspace (i.e. $A(V) \subset V$), then $V^{\perp}$ is $A^{*}$-invariant.

Proof. An element $y \in K$ is in $\operatorname{Nul}(A^{*})$ iff $0 = \langle A^{*}y|x\rangle = \langle y|Ax\rangle$ for all $x \in H$, which happens iff $y \in \operatorname{Ran}(A)^{\perp}$. Because, by Exercise 13.2, $\overline{\operatorname{Ran}(A)} = \operatorname{Ran}(A)^{\perp\perp}$, and so by the first item, $\overline{\operatorname{Ran}(A)} = \operatorname{Nul}(A^{*})^{\perp}$. Now suppose $A(V) \subset V$ and $y \in V^{\perp}$; then
$$\langle A^{*}y|x\rangle = \langle y|Ax\rangle = 0 \text{ for all } x \in V,$$
which shows $A^{*}y \in V^{\perp}$.
The next elementary theorem (referred to as the bounded linear transformation theorem, or B.L.T. theorem for short) is often useful.

Theorem 13.18 (B.L.T. Theorem). Suppose that $Z$ is a normed space, $X$ is a Banach space, and $S \subset Z$ is a dense linear subspace of $Z$. If $T : S \to X$ is a bounded linear transformation (i.e. there exists $C < \infty$ such that $\|Tz\| \le C\|z\|$ for all $z \in S$), then $T$ has a unique extension to an element $\bar{T} \in L(Z, X)$ and this extension still satisfies $\|\bar{T}z\| \le C\|z\|$ for all $z \in \bar{S}$.
Proof. Let $z \in Z$ and choose $z_n \in S$ such that $z_n \to z$. Since
$$\|Tz_m - Tz_n\| \le C\|z_m - z_n\| \to 0 \text{ as } m, n \to \infty,$$
it follows by the completeness of $X$ that $\lim_{n\to\infty} Tz_n =: \bar{T}z$ exists. Moreover, if $w_n \in S$ is another sequence converging to $z$, then
$$\|Tz_n - Tw_n\| \le C\|z_n - w_n\| \to C\|z - z\| = 0$$
and therefore $\bar{T}z$ is well defined. It is now a simple matter to check that $\bar{T} : Z \to X$ is still linear and that
$$\|\bar{T}z\| = \lim_{n\to\infty} \|Tz_n\| \le \lim_{n\to\infty} C\|z_n\| = C\|z\| \text{ for all } z \in Z.$$
Thus $\bar{T}$ is an extension of $T$ to all of $Z$. The uniqueness of this extension is easy to prove and will be left to the reader.
13.1 Compactness Results for L
p
Spaces*
In this section we are going to identify the sequentially weak compact subsets
of L
p
(, B, P) for 1 p < , where (, B, P) is a probability space. The key
to our proofs will be the following Hilbert space compactness result.
Theorem 13.19. Suppose $\{x_n\}_{n=1}^{\infty}$ is a bounded sequence in a Hilbert space $H$ (i.e. $C := \sup_n \|x_n\| < \infty$); then there exists a subsequence, $y_k := x_{n_k}$, and an $x \in H$ such that $\lim_{k\to\infty} \langle y_k|h\rangle = \langle x|h\rangle$ for all $h \in H$. We say that $y_k$ converges to $x$ weakly in this case and denote this by $y_k \overset{w}{\to} x$.
Proof. Let $H_0 := \overline{\operatorname{span}}(x_k : k \in \mathbb{N})$. Then $H_0$ is a closed separable Hilbert subspace of $H$ and $\{x_k\}_{k=1}^{\infty} \subset H_0$. Let $\{h_n\}_{n=1}^{\infty}$ be a countable dense subset of $H_0$. Since $|\langle x_k|h_n\rangle| \le \|x_k\| \|h_n\| \le C\|h_n\| < \infty$, the sequence $\{\langle x_k|h_n\rangle\}_{k=1}^{\infty} \subset \mathbb{C}$ is bounded and hence has a convergent subsequence for all $n \in \mathbb{N}$. By Cantor's diagonalization argument we can find a subsequence, $y_k := x_{n_k}$, of $\{x_n\}$ such that $\lim_{k\to\infty} \langle y_k|h_n\rangle$ exists for all $n \in \mathbb{N}$.

We now show $\varphi(z) := \lim_{k\to\infty} \langle y_k|z\rangle$ exists for all $z \in H_0$. Indeed, for any $k, l, n \in \mathbb{N}$, we have
$$|\langle y_k|z\rangle - \langle y_l|z\rangle| = |\langle y_k - y_l|z\rangle| \le |\langle y_k - y_l|h_n\rangle| + |\langle y_k - y_l|z - h_n\rangle| \le |\langle y_k - y_l|h_n\rangle| + 2C\|z - h_n\|.$$
Letting $k, l \to \infty$ in this estimate then shows
$$\limsup_{k,l\to\infty} |\langle y_k|z\rangle - \langle y_l|z\rangle| \le 2C\|z - h_n\|.$$
Since we may choose $n \in \mathbb{N}$ such that $\|z - h_n\|$ is as small as we please, we may conclude that $\limsup_{k,l\to\infty} |\langle y_k|z\rangle - \langle y_l|z\rangle| = 0$, i.e. $\varphi(z) := \lim_{k\to\infty} \langle y_k|z\rangle$ exists.

The function $\psi(z) := \lim_{k\to\infty} \langle z|y_k\rangle$ is a bounded linear functional on $H_0$ because
$$|\psi(z)| = \liminf_{k\to\infty} |\langle z|y_k\rangle| \le C\|z\|.$$
Therefore by the Riesz Theorem 13.15, there exists $x \in H_0$ such that $\psi(z) = \langle z|x\rangle$ for all $z \in H_0$. Thus, for this $x \in H_0$ we have shown
$$\lim_{k\to\infty} \langle y_k|z\rangle = \langle x|z\rangle \text{ for all } z \in H_0. \tag{13.10}$$

To finish the proof we need only observe that Eq. (13.10) is valid for all $z \in H$. Indeed if $z \in H$, then $z = z_0 + z_1$ where $z_0 = P_{H_0} z \in H_0$ and $z_1 = z - P_{H_0} z \in H_0^{\perp}$. Since $y_k, x \in H_0$, we have
$$\lim_{k\to\infty} \langle y_k|z\rangle = \lim_{k\to\infty} \langle y_k|z_0\rangle = \langle x|z_0\rangle = \langle x|z\rangle \text{ for all } z \in H.$$
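A finite-dimensional caricature (our own, not from the text) helps build intuition for why weak limits need not be norm limits: the standard basis vectors of $\ell^2$ all have norm one, yet $\langle e_k|h\rangle = h(k) \to 0$ for any fixed square-summable $h$, so $e_k \overset{w}{\to} 0$ while $\|e_k\| \equiv 1$. The sketch below truncates $\ell^2$ to $\mathbb{R}^n$.

```python
import numpy as np

n = 10_000
h = 1.0 / np.arange(1, n + 1)      # a fixed (truncated) square-summable vector

def e(k):
    """Standard basis vector e_k of R^n."""
    v = np.zeros(n)
    v[k] = 1.0
    return v

norms = [np.linalg.norm(e(k)) for k in (0, 100, 5000)]
pairings = [float(e(k) @ h) for k in (0, 100, 5000)]   # <e_k|h> = h[k]

assert all(nv == 1.0 for nv in norms)   # no norm convergence to 0
assert pairings[2] < 1e-3               # but the pairings with h vanish
```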
Since unbounded subsets of $H$ are clearly not sequentially weakly compact, Theorem 13.19 states that a subset of $H$ is sequentially weakly precompact iff it is bounded. Let us now use Theorem 13.19 to identify the sequentially compact subsets of $L^p(\Omega, \mathcal{B}, P)$ for all $1 \le p < \infty$. We begin with the case $p = 1$.
Theorem 13.20. If $\{X_n\}_{n=1}^{\infty}$ is a uniformly integrable subset of $L^1(\Omega, \mathcal{B}, P)$, there exists a subsequence $Y_k := X_{n_k}$ of $\{X_n\}_{n=1}^{\infty}$ and $X \in L^1(\Omega, \mathcal{B}, P)$ such that
$$\lim_{k\to\infty} E[Y_k h] = E[Xh] \text{ for all } h \in \mathcal{B}_b. \tag{13.11}$$
Proof. For each $m \in \mathbb{N}$ let $X_n^m := X_n 1_{|X_n| \le m}$. The truncated sequence $\{X_n^m\}_{n=1}^{\infty}$ is a bounded subset of the Hilbert space $L^2(\Omega, \mathcal{B}, P)$ for all $m \in \mathbb{N}$. Therefore by Theorem 13.19, $\{X_n^m\}_{n=1}^{\infty}$ has a weakly convergent subsequence for all $m \in \mathbb{N}$. By Cantor's diagonalization argument, we can find $Y_k^m := X_{n_k}^m$ and $X^m \in L^2(\Omega, \mathcal{B}, P)$ such that $Y_k^m \overset{w}{\to} X^m$ as $k \to \infty$ and in particular
$$\lim_{k\to\infty} E[Y_k^m h] = E[X^m h] \text{ for all } h \in \mathcal{B}_b.$$

Our next goal is to show $X^m \to X$ in $L^1(\Omega, \mathcal{B}, P)$. To this end, for $m < M$ and $h \in \mathcal{B}_b$ we have
$$|E[(X^M - X^m)h]| = \lim_{k\to\infty} |E[(Y_k^M - Y_k^m)h]| \le \liminf_{k\to\infty} E[|Y_k^M - Y_k^m| |h|] \le \|h\|_{\infty} \liminf_{k\to\infty} E[|Y_k| : M \ge |Y_k| > m] \le \|h\|_{\infty} \liminf_{k\to\infty} E[|Y_k| : |Y_k| > m].$$
Taking $h = \operatorname{sgn}(X^M - X^m)$ in this inequality shows
$$E[|X^M - X^m|] \le \liminf_{k\to\infty} E[|Y_k| : |Y_k| > m],$$
with the right member of this inequality going to zero as $m, M \to \infty$ with $M \ge m$ by the assumed uniform integrability of the $\{X_n\}$. Therefore there exists $X \in L^1(\Omega, \mathcal{B}, P)$ such that $\lim_{m\to\infty} E|X - X^m| = 0$.

We are now ready to verify Eq. (13.11) is valid. For $h \in \mathcal{B}_b$,
$$|E[(X - Y_k)h]| \le |E[(X^m - Y_k^m)h]| + |E[(X - X^m)h]| + |E[(Y_k - Y_k^m)h]| \le |E[(X^m - Y_k^m)h]| + \|h\|_{\infty} \left( E[|X - X^m|] + E[|Y_k| : |Y_k| > m] \right) \le |E[(X^m - Y_k^m)h]| + \|h\|_{\infty} \left( E[|X - X^m|] + \sup_l E[|Y_l| : |Y_l| > m] \right).$$
Passing to the limit as $k \to \infty$ in the above inequality shows
$$\limsup_{k\to\infty} |E[(X - Y_k)h]| \le \|h\|_{\infty} \left( E[|X - X^m|] + \sup_l E[|Y_l| : |Y_l| > m] \right).$$
Since $X^m \to X$ in $L^1$ and $\sup_l E[|Y_l| : |Y_l| > m] \to 0$ by uniform integrability, it follows that $\limsup_{k\to\infty} |E[(X - Y_k)h]| = 0$.
Example 13.21. Let $(\Omega, \mathcal{B}, P) = ((0,1), \mathcal{B}_{(0,1)}, m)$ where $m$ is Lebesgue measure and let $X_n(\omega) = 2^n 1_{0 < \omega < 2^{-n}}$. Then $EX_n = 1$ for all $n$ and hence $\{X_n\}_{n=1}^{\infty}$ is bounded in $L^1(\Omega, \mathcal{B}, P)$ (but is not uniformly integrable). Suppose for sake of contradiction that there existed $X \in L^1(\Omega, \mathcal{B}, P)$ and a subsequence, $Y_k := X_{n_k}$, such that $Y_k \overset{w}{\to} X$. Then for $h \in \mathcal{B}_b$ and any $\varepsilon > 0$ we would have
$$E[Xh 1_{(\varepsilon,1)}] = \lim_{k\to\infty} E[Y_k h 1_{(\varepsilon,1)}] = 0.$$
Then by DCT it would follow that $E[Xh] = 0$ for all $h \in \mathcal{B}_b$ and hence that $X \equiv 0$. On the other hand we would also have
$$0 = E[X \cdot 1] = \lim_{k\to\infty} E[Y_k \cdot 1] = 1,$$
and we have reached the desired contradiction. Hence we must conclude that bounded subsets of $L^1(\Omega, \mathcal{B}, P)$ need not be weakly compact and thus we can not drop the uniform integrability assumption made in Theorem 13.20.
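The failure of uniform integrability in Example 13.21 is easy to see by direct computation; the following short sketch (our own arithmetic check, not from the text) confirms that $EX_n = 1$ for every $n$ while the tail expectation $E[X_n : X_n > m]$ does not vanish as $n \to \infty$ for any fixed cutoff $m$.

```python
# X_n = 2^n on (0, 2^{-n}) and 0 elsewhere on ((0,1), Lebesgue).
def EXn(n):
    """E X_n = 2^n * length((0, 2^{-n})) = 1, exactly."""
    return 2.0 ** n * 2.0 ** (-n)

def tail_mass(n, m):
    """E[X_n : X_n > m]: all the mass once 2^n exceeds the cutoff m."""
    return EXn(n) if 2.0 ** n > m else 0.0

assert all(EXn(n) == 1.0 for n in range(1, 20))   # bounded in L^1
assert tail_mass(15, m=100) == 1.0                # tail does NOT go to 0 in n
```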
When $1 < p < \infty$, the situation is simpler.

Theorem 13.22. Let $p \in (1, \infty)$ and $q = p(p-1)^{-1} \in (1, \infty)$ be its conjugate exponent. If $\{X_n\}_{n=1}^{\infty}$ is a bounded sequence in $L^p(\Omega, \mathcal{B}, P)$, there exists $X \in L^p(\Omega, \mathcal{B}, P)$ and a subsequence $Y_k := X_{n_k}$ of $\{X_n\}_{n=1}^{\infty}$ such that
$$\lim_{k\to\infty} E[Y_k h] = E[Xh] \text{ for all } h \in L^q(\Omega, \mathcal{B}, P). \tag{13.12}$$
Proof. Let $C := \sup_{n\in\mathbb{N}} \|X_n\|_p < \infty$ and recall that Lemma 12.48 guarantees that $\{X_n\}_{n=1}^{\infty}$ is a uniformly integrable subset of $L^1(\Omega, \mathcal{B}, P)$. Therefore by Theorem 13.20, there exists $X \in L^1(\Omega, \mathcal{B}, P)$ and a subsequence, $Y_k := X_{n_k}$, such that Eq. (13.11) holds. We will complete the proof by showing: a) $X \in L^p(\Omega, \mathcal{B}, P)$, and b) Eq. (13.12) is valid.

a) For $h \in \mathcal{B}_b$ we have
$$|E[Xh]| \le \liminf_{k\to\infty} E[|Y_k h|] \le \liminf_{k\to\infty} \|Y_k\|_p \|h\|_q \le C\|h\|_q.$$
For $M < \infty$, taking $h = \operatorname{sgn}(X)|X|^{p-1} 1_{|X| \le M}$ in the previous inequality shows
$$E\left[|X|^p 1_{|X| \le M}\right] \le C \left\| \operatorname{sgn}(X)|X|^{p-1} 1_{|X| \le M} \right\|_q = C \left( E\left[|X|^{(p-1)q} 1_{|X| \le M}\right] \right)^{1/q} = C \left( E\left[|X|^p 1_{|X| \le M}\right] \right)^{1/q},$$
from which it follows that
$$\left( E\left[|X|^p 1_{|X| \le M}\right] \right)^{1/p} = \left( E\left[|X|^p 1_{|X| \le M}\right] \right)^{1 - 1/q} \le C.$$
Using the monotone convergence theorem, we may let $M \to \infty$ in this equation to find $\|X\|_p = (E[|X|^p])^{1/p} \le C < \infty$.

b) Now that we know $X \in L^p(\Omega, \mathcal{B}, P)$, it makes sense to consider $E[(X - Y_k)h]$ for all $h \in L^q(\Omega, \mathcal{B}, P)$. For $M < \infty$, let $h^M := h 1_{|h| \le M}$; then
$$|E[(X - Y_k)h]| \le \left|E\left[(X - Y_k)h^M\right]\right| + \left|E\left[(X - Y_k)h 1_{|h| > M}\right]\right| \le \left|E\left[(X - Y_k)h^M\right]\right| + \|X - Y_k\|_p \left\|h 1_{|h| > M}\right\|_q \le \left|E\left[(X - Y_k)h^M\right]\right| + 2C \left\|h 1_{|h| > M}\right\|_q.$$
Since $h^M \in \mathcal{B}_b$, we may pass to the limit $k \to \infty$ in the previous inequality to find
$$\limsup_{k\to\infty} |E[(X - Y_k)h]| \le 2C \left\|h 1_{|h| > M}\right\|_q.$$
This completes the proof, since $\left\|h 1_{|h| > M}\right\|_q \to 0$ as $M \to \infty$ by DCT.
13.2 Exercises
Exercise 13.5. Suppose that $\{M_n\}_{n=1}^{\infty}$ is an increasing sequence of closed subspaces of a Hilbert space $H$. Let $M$ be the closure of $M_0 := \cup_{n=1}^{\infty} M_n$. Show $\lim_{n\to\infty} P_{M_n} x = P_M x$ for all $x \in H$. Hint: first prove this for $x \in M_0$ and then for $x \in M$. Also consider the case where $x \in M^{\perp}$.
Solution to Exercise (13.5). Let $P_n := P_{M_n}$ and $P = P_M$. If $y \in M_0$, then $P_n y = y = Py$ for all $n$ sufficiently large, and therefore $\lim_{n\to\infty} P_n y = Py$. Now suppose that $x \in M$ and $y \in M_0$. Then
$$\|Px - P_n x\| \le \|Px - Py\| + \|Py - P_n y\| + \|P_n y - P_n x\| \le 2\|x - y\| + \|Py - P_n y\|,$$
and passing to the limit as $n \to \infty$ then shows
$$\limsup_{n\to\infty} \|Px - P_n x\| \le 2\|x - y\|.$$
The left hand side may be made as small as we like by choosing $y \in M_0$ arbitrarily close to $x \in M = \bar{M}_0$.

For the general case, if $x \in H$, then $x = Px + y$ where $y = x - Px \in M^{\perp} \subset M_n^{\perp}$ for all $n$. Therefore,
$$P_n x = P_n P x \to Px \text{ as } n \to \infty$$
by what we have just proved.
Exercise 13.6 (A Martingale Convergence Theorem). Suppose that $\{M_n\}_{n=1}^{\infty}$ is an increasing sequence of closed subspaces of a Hilbert space $H$, $P_n := P_{M_n}$, and $\{x_n\}_{n=1}^{\infty}$ is a sequence of elements from $H$ such that $x_n = P_n x_{n+1}$ for all $n \in \mathbb{N}$. Show;
1. $P_m x_n = x_m$ for all $1 \le m \le n < \infty$,
2. $(x_n - x_m) \perp M_m$ for all $n \ge m$,
3. $\|x_n\|$ is increasing as $n$ increases,
4. if $\sup_n \|x_n\| = \lim_{n\to\infty} \|x_n\| < \infty$, then $x := \lim_{n\to\infty} x_n$ exists in $M$ and $x_n = P_n x$ for all $n \in \mathbb{N}$. (Hint: show $\{x_n\}_{n=1}^{\infty}$ is a Cauchy sequence.)
Remark 13.23. Let $H = \ell^2 := L^2(\mathbb{N}, \text{counting measure})$,
$$M_n = \{(a(1), \dots, a(n), 0, 0, \dots) : a(i) \in \mathbb{C} \text{ for } 1 \le i \le n\},$$
and $x_n(i) = 1_{i \le n}$; then $x_m = P_m x_n$ for all $n \ge m$ while $\|x_n\|^2 = n \to \infty$ as $n \to \infty$. Thus, we can not drop the assumption that $\sup_n \|x_n\| < \infty$ in Exercise 13.6.

The rest of this section may be safely skipped.
Exercise 13.7. *Suppose that $(X, \|\cdot\|)$ is a normed space such that the parallelogram law, Eq. (13.2), holds for all $x, y \in X$; then there exists a unique inner product $\langle \cdot|\cdot\rangle$ on $X$ such that $\|x\| := \sqrt{\langle x|x\rangle}$ for all $x \in X$. In this case we say that $\|\cdot\|$ is a Hilbertian norm.
Solution to Exercise (13.7). If $\|\cdot\|$ is going to come from an inner product $\langle \cdot|\cdot\rangle$, it follows from Eq. (13.1) that
$$2\operatorname{Re}\langle x|y\rangle = \|x + y\|^2 - \|x\|^2 - \|y\|^2$$
and
$$-2\operatorname{Re}\langle x|y\rangle = \|x - y\|^2 - \|x\|^2 - \|y\|^2.$$
Subtracting these two equations gives the polarization identity,
$$4\operatorname{Re}\langle x|y\rangle = \|x + y\|^2 - \|x - y\|^2. \tag{13.13}$$
Replacing $y$ by $iy$ in this equation then implies that
$$4\operatorname{Im}\langle x|y\rangle = \|x + iy\|^2 - \|x - iy\|^2,$$
from which we find
$$\langle x|y\rangle = \frac{1}{4} \sum_{\varepsilon \in G} \varepsilon \|x + \varepsilon y\|^2 \tag{13.14}$$
where $G = \{\pm 1, \pm i\}$, a cyclic subgroup of $S^1 \subset \mathbb{C}$. Hence, if $\langle \cdot|\cdot\rangle$ is going to exist we must define it by Eq. (13.14), and the uniqueness has been proved.
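The polarization formula (13.14) can be tested numerically; the sketch below (our own illustration, with randomly chosen vectors) recovers the standard inner product of $\mathbb{C}^4$ from its norm alone, using the convention $\langle x|y\rangle = x \cdot \bar{y}$ (linear in $x$, conjugate linear in $y$).

```python
import numpy as np

rng = np.random.default_rng(2)

def polarize(x, y):
    """<x|y> = (1/4) * sum_{eps in {1,-1,i,-i}} eps * ||x + eps*y||^2, Eq. (13.14)."""
    return 0.25 * sum(eps * np.linalg.norm(x + eps * y) ** 2
                      for eps in (1, -1, 1j, -1j))

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Compare with the direct inner product x . conj(y).
assert np.isclose(polarize(x, y), np.dot(x, y.conj()))
```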
For existence, define $\langle x|y\rangle$ by Eq. (13.14), in which case
$$\langle x|x\rangle = \frac{1}{4} \sum_{\varepsilon \in G} \varepsilon \|x + \varepsilon x\|^2 = \frac{1}{4} \left[ \|2x\|^2 + i\|x + ix\|^2 - i\|x - ix\|^2 \right] = \|x\|^2 + \frac{i}{4}|1 + i|^2 \|x\|^2 - \frac{i}{4}|1 - i|^2 \|x\|^2 = \|x\|^2.$$
So to finish the proof, it only remains to show that $\langle x|y\rangle$ defined by Eq. (13.14) is an inner product.
Since
$$4\langle y|x\rangle = \sum_{\varepsilon \in G} \varepsilon \|y + \varepsilon x\|^2 = \sum_{\varepsilon \in G} \varepsilon \|\varepsilon(y + \varepsilon x)\|^2 = \sum_{\varepsilon \in G} \varepsilon \|\varepsilon y + \varepsilon^2 x\|^2$$
$$= \|y + x\|^2 - \|{-y} + x\|^2 + i\|iy - x\|^2 - i\|{-iy} - x\|^2 = \|x + y\|^2 - \|x - y\|^2 + i\|x - iy\|^2 - i\|x + iy\|^2 = \overline{4\langle x|y\rangle},$$
it suffices to show $x \mapsto \langle x|y\rangle$ is linear for all $y \in H$. For this we will need to derive an identity from Eq. (13.2). To do this we make use of Eq. (13.2) three times to find
$$\|x + y + z\|^2 = -\|x + y - z\|^2 + 2\|x + y\|^2 + 2\|z\|^2$$
$$= \|x - y - z\|^2 - 2\|x - z\|^2 - 2\|y\|^2 + 2\|x + y\|^2 + 2\|z\|^2$$
$$= \|y + z - x\|^2 - 2\|x - z\|^2 - 2\|y\|^2 + 2\|x + y\|^2 + 2\|z\|^2$$
$$= -\|y + z + x\|^2 + 2\|y + z\|^2 + 2\|x\|^2 - 2\|x - z\|^2 - 2\|y\|^2 + 2\|x + y\|^2 + 2\|z\|^2.$$
Solving this equation for $\|x + y + z\|^2$ gives
$$\|x + y + z\|^2 = \|y + z\|^2 + \|x + y\|^2 - \|x - z\|^2 + \|x\|^2 + \|z\|^2 - \|y\|^2. \tag{13.15}$$
Using Eq. (13.15), for $x, y, z \in H$,
$$4\operatorname{Re}\langle x + z|y\rangle = \|x + z + y\|^2 - \|x + z - y\|^2$$
$$= \|y + z\|^2 + \|x + y\|^2 - \|x - z\|^2 + \|x\|^2 + \|z\|^2 - \|y\|^2 - \left( \|z - y\|^2 + \|x - y\|^2 - \|x - z\|^2 + \|x\|^2 + \|z\|^2 - \|y\|^2 \right)$$
$$= \|z + y\|^2 - \|z - y\|^2 + \|x + y\|^2 - \|x - y\|^2 = 4\operatorname{Re}\langle x|y\rangle + 4\operatorname{Re}\langle z|y\rangle. \tag{13.16}$$
Now suppose that $\delta \in G$; then since $|\delta| = 1$,
$$4\langle \delta x|y\rangle = \sum_{\varepsilon \in G} \varepsilon \|\delta x + \varepsilon y\|^2 = \sum_{\varepsilon \in G} \varepsilon \|x + \bar{\delta}\varepsilon y\|^2 = \sum_{\varepsilon \in G} \delta\varepsilon \|x + \varepsilon y\|^2 = 4\delta\langle x|y\rangle, \tag{13.17}$$
where in the third equality the substitution $\varepsilon \to \delta\varepsilon$ was made in the sum. So Eq. (13.17) says $\langle \pm ix|y\rangle = \pm i\langle x|y\rangle$ and $\langle -x|y\rangle = -\langle x|y\rangle$. Therefore
$$\operatorname{Im}\langle x|y\rangle = \operatorname{Re}(-i\langle x|y\rangle) = \operatorname{Re}\langle -ix|y\rangle,$$
which combined with Eq. (13.16) shows
$$\operatorname{Im}\langle x + z|y\rangle = \operatorname{Re}\langle -ix - iz|y\rangle = \operatorname{Re}\langle -ix|y\rangle + \operatorname{Re}\langle -iz|y\rangle = \operatorname{Im}\langle x|y\rangle + \operatorname{Im}\langle z|y\rangle,$$
and therefore (again in combination with Eq. (13.16)),
$$\langle x + z|y\rangle = \langle x|y\rangle + \langle z|y\rangle \text{ for all } x, y, z \in H.$$
Because of this equation and Eq. (13.17), to finish the proof that $x \mapsto \langle x|y\rangle$ is linear it suffices to show $\langle \lambda x|y\rangle = \lambda\langle x|y\rangle$ for all $\lambda > 0$. Now if $\lambda = m \in \mathbb{N}$, then
$$\langle mx|y\rangle = \langle x + (m-1)x|y\rangle = \langle x|y\rangle + \langle (m-1)x|y\rangle,$$
so that by induction $\langle mx|y\rangle = m\langle x|y\rangle$. Replacing $x$ by $x/m$ then shows that $\langle x|y\rangle = m\langle m^{-1}x|y\rangle$, so that $\langle m^{-1}x|y\rangle = m^{-1}\langle x|y\rangle$, and so if $m, n \in \mathbb{N}$, we find
$$\left\langle \frac{n}{m}x \Big| y \right\rangle = n\left\langle \frac{1}{m}x \Big| y \right\rangle = \frac{n}{m}\langle x|y\rangle,$$
so that $\langle \lambda x|y\rangle = \lambda\langle x|y\rangle$ for all $\lambda > 0$ and $\lambda \in \mathbb{Q}$. By continuity, it now follows that $\langle \lambda x|y\rangle = \lambda\langle x|y\rangle$ for all $\lambda > 0$.
An alternate ending: In the case where $X$ is real, the latter parts of the proof are easier to digest as we can use Eq. (13.13) as the formula for the inner product. For example, we have
$$4\langle x|2z\rangle = \|x + 2z\|^2 - \|x - 2z\|^2 = \left( \|x + z + z\|^2 + \|x + z - z\|^2 \right) - \left( \|x - z + z\|^2 + \|x - z - z\|^2 \right)$$
$$= 2\left( \|x + z\|^2 + \|z\|^2 \right) - 2\left( \|x - z\|^2 + \|z\|^2 \right) = 2\left( \|x + z\|^2 - \|x - z\|^2 \right) = 8\langle x|z\rangle,$$
from which it follows that $\langle x|2z\rangle = 2\langle x|z\rangle$. Similarly,
$$4\left[ \langle x|z\rangle + \langle y|z\rangle \right] = \|x + z\|^2 - \|x - z\|^2 + \|y + z\|^2 - \|y - z\|^2 = \left( \|x + z\|^2 + \|y + z\|^2 \right) - \left( \|x - z\|^2 + \|y - z\|^2 \right)$$
$$= \frac{1}{2}\left( \|x + y + 2z\|^2 + \|x - y\|^2 \right) - \frac{1}{2}\left( \|x + y - 2z\|^2 + \|x - y\|^2 \right) = \frac{1}{2}\left( \|x + y + 2z\|^2 - \|x + y - 2z\|^2 \right) = 2\langle x + y|2z\rangle = 4\langle x + y|z\rangle,$$
from which it follows that $\langle x + y|z\rangle = \langle x|z\rangle + \langle y|z\rangle$. From this identity one shows as above that $\langle \cdot|\cdot\rangle$ is a real inner product on $X$.
Now suppose that $X$ is complex, and let
$$Q(x, y) := \frac{1}{4}\left( \|x + y\|^2 - \|x - y\|^2 \right).$$
We should expect that $Q(\cdot, \cdot) = \operatorname{Re}\langle \cdot|\cdot\rangle$ and therefore we should define
$$\langle x|y\rangle := Q(x, y) - iQ(ix, y).$$
Since
$$4Q(ix, y) = \|ix + y\|^2 - \|ix - y\|^2 = \|i(ix + y)\|^2 - \|i(ix - y)\|^2 = \|x - iy\|^2 - \|x + iy\|^2 = -4Q(x, iy),$$
it follows that $Q(ix, x) = 0$, so that $\langle x|x\rangle = \|x\|^2$, and that
$$\langle y|x\rangle = Q(y, x) - iQ(iy, x) = Q(y, x) + iQ(y, ix) = \overline{\langle x|y\rangle}.$$
Since $x \mapsto \langle x|y\rangle$ is real linear, we now need only show that $\langle ix|y\rangle = i\langle x|y\rangle$. However,
$$\langle ix|y\rangle = Q(ix, y) - iQ(i(ix), y) = Q(ix, y) + iQ(x, y) = i\langle x|y\rangle,$$
as desired.
14 Conditional Expectation

In this section let $(\Omega, \mathcal{B}, P)$ be a probability space and $\mathcal{G} \subset \mathcal{B}$ be a sub-sigma-algebra of $\mathcal{B}$. We will write $f \in \mathcal{G}_b$ iff $f : \Omega \to \mathbb{C}$ is bounded and $f$ is $(\mathcal{G}, \mathcal{B}_{\mathbb{C}})$-measurable. If $A \in \mathcal{B}$ and $P(A) > 0$, we will let
$$E[X|A] := \frac{E[X : A]}{P(A)} \quad \text{and} \quad P(B|A) := E[1_B|A] := \frac{P(A \cap B)}{P(A)}$$
for all integrable random variables, $X$, and $B \in \mathcal{B}$. We will often use the factorization Lemma 6.40 in this section. Because of this let us repeat it here.
torization Lemma 6.40 in this section. Because of this let us repeat it here.
Lemma 14.1. Suppose that (Y, T) is a measurable space and Y : Y is a
map. Then to every ((Y ), B
R
) measurable function, H :

1, there is a
(T, B
R
) measurable function h : Y

1 such that H = h Y.
Proof. First suppose that H = 1
A
where A (Y ) = Y
1
(T). Let B T
such that A = Y
1
(B) then 1
A
= 1
Y
1
(B)
= 1
B
Y and hence the lemma
is valid in this case with h = 1
B
. More generally if H =

a
i
1
Ai
is a simple
function, then there exists B
i
T such that 1
Ai
= 1
Bi
Y and hence H = hY
with h :=

a
i
1
Bi
a simple function on

1.
For a general (T, B
R
) measurable function, H, from

1, choose simple
functions H
n
converging to H. Let h
n
: Y

1 be simple functions such that
H
n
= h
n
Y. Then it follows that
H = lim
n
H
n
= limsup
n
H
n
= limsup
n
h
n
Y = h Y
where h := limsup
n
h
n
a measurable function from Y to

1.
Definition 14.2 (Conditional Expectation). Let $E_{\mathcal{G}} : L^2(\Omega, \mathcal{B}, P) \to L^2(\Omega, \mathcal{G}, P)$ denote orthogonal projection of $L^2(\Omega, \mathcal{B}, P)$ onto the closed subspace $L^2(\Omega, \mathcal{G}, P)$. For $f \in L^2(\Omega, \mathcal{B}, P)$, we say that $E_{\mathcal{G}} f \in L^2(\Omega, \mathcal{G}, P)$ is the conditional expectation of $f$ given $\mathcal{G}$.
Remark 14.3 (Basic Properties of $E_{\mathcal{G}}$). Let $f \in L^2(\Omega, \mathcal{B}, P)$. By the orthogonal projection Theorem 13.13 we know that $F \in L^2(\Omega, \mathcal{G}, P)$ is $E_{\mathcal{G}} f$ a.s. iff either of the following two conditions hold;
1. $\|f - F\|_2 \le \|f - g\|_2$ for all $g \in L^2(\Omega, \mathcal{G}, P)$ or
2. $E[fh] = E[Fh]$ for all $h \in L^2(\Omega, \mathcal{G}, P)$.

Moreover if $\mathcal{G}_0 \subset \mathcal{G}_1 \subset \mathcal{B}$ then $L^2(\Omega, \mathcal{G}_0, P) \subset L^2(\Omega, \mathcal{G}_1, P) \subset L^2(\Omega, \mathcal{B}, P)$ and therefore,
$$E_{\mathcal{G}_0} E_{\mathcal{G}_1} f = E_{\mathcal{G}_1} E_{\mathcal{G}_0} f = E_{\mathcal{G}_0} f \text{ a.s. for all } f \in L^2(\Omega, \mathcal{B}, P). \tag{14.1}$$
It is also useful to observe that condition 2. above may be expressed as
$$E[f : A] = E[F : A] \text{ for all } A \in \mathcal{G} \tag{14.2}$$
or
$$E[fh] = E[Fh] \text{ for all } h \in \mathcal{G}_b. \tag{14.3}$$
Indeed, if Eq. (14.2) holds, then by linearity we have $E[fh] = E[Fh]$ for all $\mathcal{G}$-measurable simple functions, $h$, and hence, by the approximation Theorem 6.39 and the DCT, for all $h \in \mathcal{G}_b$. Therefore Eq. (14.2) implies Eq. (14.3). If Eq. (14.3) holds and $h \in L^2(\Omega, \mathcal{G}, P)$, we may use the DCT to show
$$E[fh] \overset{\text{DCT}}{=} \lim_{n\to\infty} E\left[ fh 1_{|h| \le n} \right] \overset{(14.3)}{=} \lim_{n\to\infty} E\left[ Fh 1_{|h| \le n} \right] \overset{\text{DCT}}{=} E[Fh],$$
which is condition 2. in Remark 14.3. Taking $h = 1_A$ with $A \in \mathcal{G}$ in condition 2. of Remark 14.3, we learn that Eq. (14.2) is satisfied as well.
Theorem 14.4. Let $(\Omega, \mathcal{B}, P)$ and $\mathcal{G} \subset \mathcal{B}$ be as above and let $f, g \in L^1(\Omega, \mathcal{B}, P)$. The operator $E_{\mathcal{G}} : L^2(\Omega, \mathcal{B}, P) \to L^2(\Omega, \mathcal{G}, P)$ extends uniquely to a linear contraction from $L^1(\Omega, \mathcal{B}, P)$ to $L^1(\Omega, \mathcal{G}, P)$. This extension enjoys the following properties;
1. If $f \ge 0$, $P$-a.e. then $E_{\mathcal{G}} f \ge 0$, $P$-a.e.
2. Monotonicity. If $f \ge g$, $P$-a.e. then $E_{\mathcal{G}} f \ge E_{\mathcal{G}} g$, $P$-a.e.
3. $L^1$-contraction property. $|E_{\mathcal{G}} f| \le E_{\mathcal{G}} |f|$, $P$-a.e.
4. Averaging Property. If $f \in L^1(\Omega, \mathcal{B}, P)$ then $F = E_{\mathcal{G}} f$ iff $F \in L^1(\Omega, \mathcal{G}, P)$ and
$$E(Fh) = E(fh) \text{ for all } h \in \mathcal{G}_b. \tag{14.4}$$
5. Pull out property or product rule. If $g \in \mathcal{G}_b$ and $f \in L^1(\Omega, \mathcal{B}, P)$, then $E_{\mathcal{G}}(gf) = g \cdot E_{\mathcal{G}} f$, $P$-a.e.
6. Tower or smoothing property. If $\mathcal{G}_0 \subset \mathcal{G}_1 \subset \mathcal{B}$, then
$$E_{\mathcal{G}_0} E_{\mathcal{G}_1} f = E_{\mathcal{G}_1} E_{\mathcal{G}_0} f = E_{\mathcal{G}_0} f \text{ a.s. for all } f \in L^1(\Omega, \mathcal{B}, P). \tag{14.5}$$
Proof. By the definition of orthogonal projection, for $f \in L^2(\Omega, \mathcal{B}, P)$ and $h \in \mathcal{G}_b$,
$$E(fh) = E(f \cdot E_{\mathcal{G}} h) = E(E_{\mathcal{G}} f \cdot h). \tag{14.6}$$
Taking
$$h = \operatorname{sgn}(E_{\mathcal{G}} f) := \frac{\overline{E_{\mathcal{G}} f}}{|E_{\mathcal{G}} f|} 1_{|E_{\mathcal{G}} f| > 0} \tag{14.7}$$
in Eq. (14.6) shows
$$E(|E_{\mathcal{G}} f|) = E(E_{\mathcal{G}} f \cdot h) = E(fh) \le E(|fh|) \le E(|f|). \tag{14.8}$$
It follows from this equation and the B.L.T. Theorem 13.18 that $E_{\mathcal{G}}$ extends uniquely to a contraction from $L^1(\Omega, \mathcal{B}, P)$ to $L^1(\Omega, \mathcal{G}, P)$. Moreover, by a simple limiting argument, Eq. (14.6) remains valid for all $f \in L^1(\Omega, \mathcal{B}, P)$ and $h \in \mathcal{G}_b$. Indeed (without reference to Theorem 13.18), if $f_n := f 1_{|f| \le n} \in L^2(\Omega, \mathcal{B}, P)$, then $f_n \to f$ in $L^1(\Omega, \mathcal{B}, P)$ and hence
$$E[|E_{\mathcal{G}} f_n - E_{\mathcal{G}} f_m|] = E[|E_{\mathcal{G}}(f_n - f_m)|] \le E[|f_n - f_m|] \to 0 \text{ as } m, n \to \infty.$$
By the completeness of $L^1(\Omega, \mathcal{G}, P)$, $F := L^1(\Omega, \mathcal{G}, P)\text{-}\lim_{n\to\infty} E_{\mathcal{G}} f_n$ exists. Moreover the function $F$ satisfies,
$$E(F \cdot h) = E\left( \lim_{n\to\infty} E_{\mathcal{G}} f_n \cdot h \right) = \lim_{n\to\infty} E(f_n \cdot h) = E(f \cdot h) \tag{14.9}$$
for all $h \in \mathcal{G}_b$, and by Proposition 7.22 there is at most one $F \in L^1(\Omega, \mathcal{G}, P)$ which satisfies Eq. (14.9). We will again denote $F$ by $E_{\mathcal{G}} f$. This proves the existence and uniqueness of $F$ satisfying the defining relation in Eq. (14.4) of item 4. The same argument used in Eq. (14.8) again shows $E|F| \le E|f|$ and therefore that $E_{\mathcal{G}} : L^1(\Omega, \mathcal{B}, P) \to L^1(\Omega, \mathcal{G}, P)$ is a contraction.

Items 1 and 2. If $f \in L^1(\Omega, \mathcal{B}, P)$ with $f \ge 0$, then
$$E(E_{\mathcal{G}} f \cdot h) = E(fh) \ge 0 \text{ for all } h \in \mathcal{G}_b \text{ with } h \ge 0. \tag{14.10}$$
An application of Lemma 7.23 then shows that $E_{\mathcal{G}} f \ge 0$ a.s.¹ The proof of item 2. follows by applying item 1. with $f$ replaced by $f - g \ge 0$.

¹ This can also easily be proved directly here by taking $h = 1_{E_{\mathcal{G}} f < 0}$ in Eq. (14.10).

Item 3. If $f$ is real, $\pm f \le |f|$ and so by item 2., $\pm E_{\mathcal{G}} f \le E_{\mathcal{G}} |f|$, i.e. $|E_{\mathcal{G}} f| \le E_{\mathcal{G}} |f|$, $P$-a.e. For complex $f$, let $h \ge 0$ be a bounded and $\mathcal{G}$-measurable function. Then
$$E[|E_{\mathcal{G}} f| h] = E\left[ E_{\mathcal{G}} f \cdot \operatorname{sgn}(E_{\mathcal{G}} f) h \right] = E\left[ f \cdot \operatorname{sgn}(E_{\mathcal{G}} f) h \right] \le E[|f| h] = E[E_{\mathcal{G}} |f| \cdot h].$$
Since $h \ge 0$ is an arbitrary $\mathcal{G}$-measurable function, it follows, by Lemma 7.23, that $|E_{\mathcal{G}} f| \le E_{\mathcal{G}} |f|$, $P$-a.s. Recall that item 4. has already been proved.

Item 5. If $h, g \in \mathcal{G}_b$ and $f \in L^1(\Omega, \mathcal{B}, P)$, then
$$E[(g E_{\mathcal{G}} f) h] = E[E_{\mathcal{G}} f \cdot hg] = E[f \cdot hg] = E[gf \cdot h] = E[E_{\mathcal{G}}(gf) h].$$
Thus $E_{\mathcal{G}}(gf) = g \cdot E_{\mathcal{G}} f$, $P$-a.e.

Item 6. By item 5. of the projection Theorem 13.13, Eq. (14.5) holds on $L^2(\Omega, \mathcal{B}, P)$. By continuity of conditional expectation on $L^1(\Omega, \mathcal{B}, P)$ and the density of $L^2(\Omega, \mathcal{B}, P)$ in $L^1(\Omega, \mathcal{B}, P)$, Eq. (14.5) continues to hold on $L^1(\Omega, \mathcal{B}, P)$.

Second Proof. For $h \in (\mathcal{G}_0)_b$, we have
$$E[E_{\mathcal{G}_0} E_{\mathcal{G}_1} f \cdot h] = E[E_{\mathcal{G}_1} f \cdot h] = E[f \cdot h] = E[E_{\mathcal{G}_0} f \cdot h],$$
which shows $E_{\mathcal{G}_0} E_{\mathcal{G}_1} f = E_{\mathcal{G}_0} f$ a.s. By the product rule in item 5., it also follows that
$$E_{\mathcal{G}_1}[E_{\mathcal{G}_0} f] = E_{\mathcal{G}_1}[E_{\mathcal{G}_0} f \cdot 1] = E_{\mathcal{G}_0} f \cdot E_{\mathcal{G}_1}[1] = E_{\mathcal{G}_0} f \text{ a.s.}$$
Notice that $E_{\mathcal{G}_1}[E_{\mathcal{G}_0} f]$ need only be $\mathcal{G}_1$-measurable. What the statement says is that there are representatives of $E_{\mathcal{G}_1}[E_{\mathcal{G}_0} f]$ which are $\mathcal{G}_0$-measurable, and any such representative is also a representative of $E_{\mathcal{G}_0} f$.
Remark 14.5. There is another standard construction of $E_{\mathcal{G}} f$ based on the characterization in Eq. (14.4) and the Radon-Nikodym Theorem 15.8 below. It goes as follows: for $0 \le f \in L^1(P)$, let $Q$ be the measure defined by $dQ := f\,dP$. Then $Q|_{\mathcal{G}} \ll P|_{\mathcal{G}}$ and hence there exists $0 \le g \in L^1(\Omega, \mathcal{G}, P)$ such that $dQ|_{\mathcal{G}} = g\,dP|_{\mathcal{G}}$. This then implies that
$$\int_A f\,dP = Q(A) = \int_A g\,dP \text{ for all } A \in \mathcal{G},$$
i.e. $g = E_{\mathcal{G}} f$. For general real-valued $f \in L^1(P)$, define $E_{\mathcal{G}} f = E_{\mathcal{G}} f_+ - E_{\mathcal{G}} f_-$, and then for complex $f \in L^1(P)$ let $E_{\mathcal{G}} f = E_{\mathcal{G}} \operatorname{Re} f + i E_{\mathcal{G}} \operatorname{Im} f$.
Notation 14.6 In the future, we will often write $E_{\mathcal{G}} f$ as $E[f|\mathcal{G}]$. Moreover, if $(\mathcal{X}, \mathcal{M})$ is a measurable space and $X : \Omega \to \mathcal{X}$ is a measurable map, we will often simply denote $E[f|\sigma(X)]$ by $E[f|X]$. We will further let $P(A|\mathcal{G}) := E[1_A|\mathcal{G}]$ be the conditional probability of $A$ given $\mathcal{G}$, and $P(A|X) := P(A|\sigma(X))$ be the conditional probability of $A$ given $X$.
Exercise 14.1. Suppose $f \in L^1(\Omega, \mathcal{B}, P)$ and $f > 0$ a.s. Show $E[f|\mathcal{G}] > 0$ a.s. (i.e. show $g > 0$ a.s. for any version, $g$, of $E[f|\mathcal{G}]$.) Use this result to conclude that if $f \in (a, b)$ a.s. for some $a, b$ such that $-\infty \le a < b \le \infty$, then $E[f|\mathcal{G}] \in (a, b)$ a.s. More precisely, you are to show that any version, $g$, of $E[f|\mathcal{G}]$ satisfies $g \in (a, b)$ a.s.
14.1 Examples
Example 14.7. Suppose $\mathcal{G}$ is the trivial $\sigma$-algebra, i.e. $\mathcal{G} = \{\emptyset, \Omega\}$. In this case $E_{\mathcal{G}} f = Ef$ a.s.

Example 14.8. On the opposite extreme, if $\mathcal{G} = \mathcal{B}$, then $E_{\mathcal{G}} f = f$ a.s.
Exercise 14.2 (Exercise 4.15 revisited.). Suppose $(\Omega, \mathcal{B}, P)$ is a probability space and $\mathcal{P} := \{A_i\}_{i=1}^{\infty} \subset \mathcal{B}$ is a partition of $\Omega$. (Recall this means $\Omega = \sum_{i=1}^{\infty} A_i$.) Let $\mathcal{G}$ be the $\sigma$-algebra generated by $\mathcal{P}$. Show:
1. $B \in \mathcal{G}$ iff $B = \cup_{i \in \Lambda} A_i$ for some $\Lambda \subset \mathbb{N}$.
2. $g : \Omega \to \mathbb{R}$ is $\mathcal{G}$-measurable iff $g = \sum_{i=1}^{\infty} \lambda_i 1_{A_i}$ for some $\lambda_i \in \mathbb{R}$.
3. For $f \in L^1(\Omega, \mathcal{B}, P)$, let $E[f|A_i] := E[1_{A_i} f] / P(A_i)$ if $P(A_i) \neq 0$ and $E[f|A_i] = 0$ otherwise. Show
$$E_{\mathcal{G}} f = \sum_{i=1}^{\infty} E[f|A_i] 1_{A_i} \text{ a.s.} \tag{14.11}$$
Solution to Exercise (14.2). We will only prove part 3. here. To do this, suppose that $E_{\mathcal{G}} f = \sum_{i=1}^{\infty} \lambda_i 1_{A_i}$ for some $\lambda_i \in \mathbb{R}$. Then
$$E[f : A_j] = E[E_{\mathcal{G}} f : A_j] = E\left[ \sum_{i=1}^{\infty} \lambda_i 1_{A_i} : A_j \right] = \lambda_j P(A_j),$$
which holds automatically if $P(A_j) = 0$, no matter how $\lambda_j$ is chosen. Therefore, we must take
$$\lambda_j = \frac{E[f : A_j]}{P(A_j)} = E[f|A_j],$$
which verifies Eq. (14.11).
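Equation (14.11) says that on a partition, conditioning is just block averaging; the following Monte Carlo sketch (our own illustration, with $f(\omega) = \omega^2$ and a four-block partition of $(0,1)$, neither taken from the text) verifies both the averaging property (14.4) and the tower property numerically.

```python
import numpy as np

rng = np.random.default_rng(3)

# P = Lebesgue measure on (0,1), approximated by uniform samples.
omega = rng.uniform(size=100_000)
f = omega ** 2                                   # f(w) = w^2
blocks = np.minimum((omega * 4).astype(int), 3)  # partition (0,1) into quarters

# E_G f: on each block A_i, replace f by its block average E[f : A_i]/P(A_i).
cond_exp = np.empty_like(f)
for i in range(4):
    mask = blocks == i
    cond_exp[mask] = f[mask].mean()

# Averaging property: E[(E_G f) h] = E[f h] for G-measurable h (here h = 1_{A_2}).
h = (blocks == 2).astype(float)
assert np.isclose((cond_exp * h).mean(), (f * h).mean())
# Tower property with G_0 trivial: E[E_G f] = E[f].
assert np.isclose(cond_exp.mean(), f.mean())
```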
Example 14.9. If $S$ is a countable or finite set equipped with the $\sigma$-algebra $2^S$, and $X : \Omega \to S$ is a measurable map, then
$$E[Z|X] = \sum_{s \in S} E[Z|X = s] 1_{X = s} \text{ a.s.},$$
where by convention we set $E[Z|X = s] = 0$ if $P(X = s) = 0$. This is an immediate consequence of Exercise 14.2 with $\mathcal{G} = \sigma(X)$, which is generated by the partition $\{X = s\}$ for $s \in S$. Thus if we define $F(s) := E[Z|X = s]$, we will have $E[Z|X] = F(X)$ a.s.
Lemma 14.10. Suppose $(\mathcal{X}, \mathcal{M})$ is a measurable space, $X : \Omega \to \mathcal{X}$ is a measurable function, and $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{B}$. If $X$ is independent of $\mathcal{G}$ and $f : \mathcal{X} \to \mathbb{R}$ is a measurable function such that $f(X) \in L^1(\Omega, \mathcal{B}, P)$, then $E_{\mathcal{G}}[f(X)] = E[f(X)]$ a.s. Conversely if $E_{\mathcal{G}}[f(X)] = E[f(X)]$ a.s. for all bounded measurable functions, $f : \mathcal{X} \to \mathbb{R}$, then $X$ is independent of $\mathcal{G}$.

Proof. Suppose that $X$ is independent of $\mathcal{G}$, $f : \mathcal{X} \to \mathbb{R}$ is a measurable function such that $f(X) \in L^1(\Omega, \mathcal{B}, P)$, $\mu := E[f(X)]$, and $A \in \mathcal{G}$. Then, by independence,
$$E[f(X) : A] = E[f(X) 1_A] = E[f(X)] E[1_A] = E[\mu 1_A] = E[\mu : A].$$
Therefore $E_{\mathcal{G}}[f(X)] = \mu = E[f(X)]$ a.s.

Conversely if $E_{\mathcal{G}}[f(X)] = E[f(X)] = \mu$ and $A \in \mathcal{G}$, then
$$E[f(X) 1_A] = E[f(X) : A] = E[\mu : A] = E[\mu 1_A] = E[f(X)] E[1_A].$$
Since this last equation is assumed to hold true for all $A \in \mathcal{G}$ and all bounded measurable functions, $f : \mathcal{X} \to \mathbb{R}$, $X$ is independent of $\mathcal{G}$.
The following remark is often useful in computing conditional expectations, and the following exercises should help you gain some more intuition about conditional expectations.

Remark 14.11 (Note well.). According to Lemma 14.1, $E(f|X) = \tilde{f}(X)$ a.s. for some measurable function, $\tilde{f} : \mathcal{X} \to \mathbb{R}$. So computing $E(f|X) = \tilde{f}(X)$ is equivalent to finding a function, $\tilde{f} : \mathcal{X} \to \mathbb{R}$, such that
$$E[f \cdot h(X)] = E\left[ \tilde{f}(X) h(X) \right] \tag{14.12}$$
for all bounded and measurable functions, $h : \mathcal{X} \to \mathbb{R}$. The function, $\tilde{f} : \mathcal{X} \to \mathbb{R}$, is often denoted by writing $\tilde{f}(x) = E(f|X = x)$. If $P(X = x) > 0$, then $E(f|X = x) = E(f : X = x) / P(X = x)$, consistent with our previous definitions; compare with Example 14.9. If $P(X = x) = 0$, $E(f|X = x)$ is not given a value but is just a convenient notational way to denote a function $\tilde{f} : \mathcal{X} \to \mathbb{R}$ such that Eq. (14.12) holds. (Roughly speaking, you should think that $E(f|X = x) = E[f \delta_x(X)] / E[\delta_x(X)]$ where $\delta_x$ is the Dirac delta function at $x$. If this last comment is confusing to you, please ignore it!)
Example 14.12. Suppose that $X$ is a random variable, $t \in \mathbb{R}$, and $f : \mathbb{R} \to \mathbb{R}$ is a measurable function such that $f(X) \in L^1(P)$. We wish to compute $E[f(X) | X \wedge t] = h(X \wedge t)$. So we are looking for a function, $h : (-\infty, t] \to \mathbb{R}$, such that
$$E[f(X) u(X \wedge t)] = E[h(X \wedge t) u(X \wedge t)] \tag{14.13}$$
for all bounded measurable functions, $u : (-\infty, t] \to \mathbb{R}$. Taking $u = 1_{\{t\}}$ in Eq. (14.13) implies,
$$E[f(X) : X \ge t] = h(t) P(X \ge t)$$
and therefore we should take,
$$h(t) = E[f(X) | X \ge t],$$
which by convention we set to be (say) zero if $P(X \ge t) = 0$. Now suppose that $u(t) = 0$; then Eq. (14.13) becomes,
$$E[f(X) u(X) : X < t] = E[h(X) u(X) : X < t],$$
from which it follows that $f(X) 1_{X < t} = h(X) 1_{X < t}$ a.s. Thus we can take
$$h(x) := \begin{cases} f(x) & \text{if } x < t \\ E[f(X) | X \ge t] & \text{if } x = t \end{cases}$$
and we have shown,
$$E[f(X) | X \wedge t] = 1_{X < t} f(X) + 1_{X \ge t} E[f(X) | X \ge t] = 1_{X \wedge t < t} f(X) + 1_{X \wedge t = t} E[f(X) | X \ge t].$$
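The formula of Example 14.12 can be checked by simulation. The sketch below is our own illustration (the choices $X \sim N(0,1)$, $f(x) = x^2$, $t = 0.5$, and the test function $u$ are not from the text): it builds the claimed version of $E[f(X)|X \wedge t]$ from a sample and verifies the defining property (14.13) for one bounded $u$.

```python
import numpy as np

rng = np.random.default_rng(4)

X = rng.standard_normal(1_000_000)
f = X ** 2
t = 0.5

# The claimed version: f(X) where X < t, and the average of f over {X >= t} there.
g = np.where(X < t, f, f[X >= t].mean())

# Check (14.13) with the bounded test function u(s) = 1_{s >= 0} applied to X ^ t.
u = (np.minimum(X, t) >= 0).astype(float)
assert np.isclose((f * u).mean(), (g * u).mean())
```

Here the two empirical means agree essentially exactly: on $\{X < t\}$ the integrands coincide pointwise, and on $\{X \ge t\}$ the sample mean was used to define `g`.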
Proposition 14.13. Suppose that $(\Omega, \mathcal{B}, P)$ is a probability space, $(\mathcal{X}, \mathcal{M}, \mu)$ and $(\mathcal{Y}, \mathcal{N}, \nu)$ are two $\sigma$-finite measure spaces, $X : \Omega \to \mathcal{X}$ and $Y : \Omega \to \mathcal{Y}$ are measurable functions, and there exists $0 \le \rho \in L^1(\mu \otimes \nu)$ such that
$$P((X, Y) \in U) = \int_U \rho(x, y)\,d\mu(x)\,d\nu(y) \text{ for all } U \in \mathcal{M} \otimes \mathcal{N}.$$
Let
$$\bar{\rho}(x) := \int_{\mathcal{Y}} \rho(x, y)\,d\nu(y) \tag{14.14}$$
and for $x \in \mathcal{X}$ and $B \in \mathcal{N}$, let
$$Q(x, B) := \begin{cases} \frac{1}{\bar{\rho}(x)} \int_B \rho(x, y)\,d\nu(y) & \text{if } \bar{\rho}(x) \in (0, \infty) \\ \delta_{y_0}(B) & \text{if } \bar{\rho}(x) \in \{0, \infty\}, \end{cases} \tag{14.15}$$
where $y_0$ is some arbitrary but fixed point in $\mathcal{Y}$. Then for any bounded (or non-negative) measurable function, $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, we have
$$E[f(X, Y)|X] = Q(X, f(X, \cdot)) =: \int_{\mathcal{Y}} f(X, y)\,Q(X, dy) = g(X) \text{ a.s.} \tag{14.16}$$
where,
$$g(x) := \int_{\mathcal{Y}} f(x, y)\,Q(x, dy) = Q(x, f(x, \cdot)).$$
As usual we use the notation,
$$Q(x, v) := \begin{cases} \frac{1}{\bar{\rho}(x)} \int_{\mathcal{Y}} v(y) \rho(x, y)\,d\nu(y) & \text{if } \bar{\rho}(x) \in (0, \infty) \\ \delta_{y_0}(v) = v(y_0) & \text{if } \bar{\rho}(x) \in \{0, \infty\} \end{cases}$$
for all bounded measurable functions, $v : \mathcal{Y} \to \mathbb{R}$.
Proof. Our goal is to compute $E[f(X, Y)|X]$. According to Remark 14.11, we are searching for a bounded measurable function, $g : \mathcal{X} \to \mathbb{R}$, such that
$$E[f(X, Y) h(X)] = E[g(X) h(X)] \text{ for all } h \in \mathcal{M}_b. \tag{14.17}$$
(Throughout this argument we are going to repeatedly use the Tonelli-Fubini theorems.) We now explicitly write out both sides of Eq. (14.17);
$$E[f(X, Y) h(X)] = \int_{\mathcal{X} \times \mathcal{Y}} h(x) f(x, y) \rho(x, y)\,d\mu(x)\,d\nu(y) = \int_{\mathcal{X}} h(x) \left[ \int_{\mathcal{Y}} f(x, y) \rho(x, y)\,d\nu(y) \right] d\mu(x), \tag{14.18}$$
$$E[g(X) h(X)] = \int_{\mathcal{X} \times \mathcal{Y}} h(x) g(x) \rho(x, y)\,d\mu(x)\,d\nu(y) = \int_{\mathcal{X}} h(x) g(x) \bar{\rho}(x)\,d\mu(x). \tag{14.19}$$
Since the right sides of Eqs. (14.18) and (14.19) must be equal for all $h \in \mathcal{M}_b$, we must demand (see Lemmas 7.23 and 7.24) that
$$\int_{\mathcal{Y}} f(x, y) \rho(x, y)\,d\nu(y) = g(x) \bar{\rho}(x) \text{ for a.e. } x. \tag{14.20}$$
There are two possible problems in solving this equation for $g(x)$ at a particular point $x$; the first is when $\bar{\rho}(x) = 0$ and the second is when $\bar{\rho}(x) = \infty$. Since
$$\int_{\mathcal{X}} \bar{\rho}(x)\,d\mu(x) = \int_{\mathcal{X}} \left[ \int_{\mathcal{Y}} \rho(x, y)\,d\nu(y) \right] d\mu(x) = 1,$$
we know that $\bar{\rho}(x) < \infty$ for a.e. $x$ and therefore it does not matter how $g$ is defined on $\{\bar{\rho} = \infty\}$ as long as it is measurable. If
$$0 = \bar{\rho}(x) = \int_{\mathcal{Y}} \rho(x, y)\,d\nu(y),$$
then $\rho(x, y) = 0$ for a.e. $y$ and therefore,
$$\int_{\mathcal{Y}} f(x, y) \rho(x, y)\,d\nu(y) = 0. \tag{14.21}$$
Hence Eq. (14.20) will be valid no matter how we choose $g(x)$ for $x \in \{\bar{\rho} = 0\}$. So a valid solution of Eq. (14.20) is
$$g(x) := \begin{cases} \frac{1}{\bar{\rho}(x)} \int_{\mathcal{Y}} f(x, y) \rho(x, y)\,d\nu(y) & \text{if } \bar{\rho}(x) \in (0, \infty) \\ f(x, y_0) = \delta_{y_0}(f(x, \cdot)) & \text{if } \bar{\rho}(x) \in \{0, \infty\}, \end{cases}$$
and with this choice we will have $E[f(X, Y)|X] = g(X) = Q(X, f)$ a.s. as desired. (Observe here that when $\bar{\rho}(x) < \infty$, $\rho(x, \cdot) \in L^1(\nu)$ and hence $\int_{\mathcal{Y}} f(x, y) \rho(x, y)\,d\nu(y)$ is a well defined integral.)

It is comforting to observe that
$$P(\bar{\rho}(X) = 0) = \int_{\mathcal{X}} 1_{\bar{\rho} = 0}\,\bar{\rho}\,d\mu = 0$$
and similarly
$$P(\bar{\rho}(X) = \infty) = \int_{\mathcal{X}} 1_{\bar{\rho} = \infty}\,\bar{\rho}\,d\mu = 0.$$
Thus it follows that $P(X \in \{x \in \mathcal{X} : \bar{\rho}(x) = 0 \text{ or } \infty\}) = 0$, while the set $\{x \in \mathcal{X} : \bar{\rho}(x) = 0 \text{ or } \infty\}$ is precisely where there is ambiguity in defining $g(x)$.

Just for added security, let us check directly that $g(X) = E[f(X, Y)|X]$ a.s. According to Eq. (14.19) we have
$$E[g(X) h(X)] = \int_{\mathcal{X}} h(x) g(x) \bar{\rho}(x)\,d\mu(x) = \int_{\{0 < \bar{\rho} < \infty\}} h(x) g(x) \bar{\rho}(x)\,d\mu(x)$$
$$= \int_{\{0 < \bar{\rho} < \infty\}} h(x) \bar{\rho}(x) \left[ \frac{1}{\bar{\rho}(x)} \int_{\mathcal{Y}} f(x, y) \rho(x, y)\,d\nu(y) \right] d\mu(x) = \int_{\{0 < \bar{\rho} < \infty\}} h(x) \left[ \int_{\mathcal{Y}} f(x, y) \rho(x, y)\,d\nu(y) \right] d\mu(x)$$
$$= \int_{\mathcal{X}} h(x) \left[ \int_{\mathcal{Y}} f(x, y) \rho(x, y)\,d\nu(y) \right] d\mu(x) = E[f(X, Y) h(X)] \quad \text{(by Eq. (14.18))},$$
wherein we have repeatedly used $\mu(\{\bar{\rho} = \infty\}) = 0$ and that Eq. (14.21) holds when $\bar{\rho}(x) = 0$. This completes the verification that $g(X) = E[f(X, Y)|X]$ a.s.
Proposition 14.13 shows that conditional expectation is a generalization of the notion of performing integration over a partial subset of the variables in the integrand, whereas to compute the expectation one should integrate over all of the variables. Proposition 14.13 also gives an example of regular conditional probabilities, which we now define.
Definition 14.14. Let $(X,\mathcal{M})$ and $(Y,\mathcal{N})$ be measurable spaces. A function, $Q : X \times \mathcal{N} \to [0,1]$, is a probability kernel on $X \times Y$ if
1. $Q(x,\cdot) : \mathcal{N} \to [0,1]$ is a probability measure on $(Y,\mathcal{N})$ for each $x \in X$, and
2. $Q(\cdot,B) : X \to [0,1]$ is $\mathcal{M}/\mathcal{B}_{\mathbb{R}}$ measurable for all $B \in \mathcal{N}$.
If $Q$ is a probability kernel on $X \times Y$ and $f : Y \to \mathbb{R}$ is a bounded measurable function or a positive measurable function, then $x \to Q(x,f) := \int_Y f(y)\,Q(x,dy)$ is $\mathcal{M}/\mathcal{B}_{\mathbb{R}}$ measurable. This is clear for simple functions and then for general functions via simple limiting arguments.
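As a concrete instance of Definition 14.14, the sketch below takes $X = [0,1]$ (a parameter space) and $Y = \{0,1,2\}$ and lets $Q(x,\cdot)$ be the Binomial$(2,x)$ distribution, so the kernel and its integrals $Q(x,f)$ are elementary finite sums. The Binomial choice is purely illustrative, not one appearing in the text:

```python
from math import comb

# A concrete probability kernel on X x Y with X = [0,1] and
# Y = {0, 1, 2}: Q(x, .) is the Binomial(2, x) distribution.
def Q(x, B):
    """Q(x, B) = probability that a Binomial(2, x) variable lands in B."""
    return sum(comb(2, y) * x ** y * (1 - x) ** (2 - y) for y in B)

def Q_f(x, f):
    """Q(x, f) = integral of f(y) Q(x, dy), a finite sum here."""
    return sum(f(y) * Q(x, [y]) for y in (0, 1, 2))

# Property 1 of Definition 14.14: Q(x, .) is a probability measure.
for x in (0.0, 0.3, 1.0):
    assert abs(Q(x, [0, 1, 2]) - 1.0) < 1e-12

# Q(x, f) for f(y) = y recovers the Binomial mean 2x.
assert abs(Q_f(0.3, lambda y: y) - 0.6) < 1e-12
```

Property 2 (measurability of $x \mapsto Q(x,B)$) is automatic here since $Q(\cdot,B)$ is a polynomial in $x$.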
Definition 14.15. Let $(X,\mathcal{M})$ and $(Y,\mathcal{N})$ be measurable spaces and let $X : \Omega \to X$ and $Y : \Omega \to Y$ be measurable functions. A probability kernel, $Q$, on $X \times Y$ is said to be a regular conditional distribution of $Y$ given $X$ iff $Q(X,B)$ is a version of $P(Y \in B|X)$ for each $B \in \mathcal{N}$. Equivalently, we should have $Q(X,f) = E[f(Y)|X]$ a.s. for all $f \in \mathcal{N}_b$.
The probability kernel, $Q$, defined in Eq. (14.15) is an example of a regular conditional distribution of $Y$ given $X$.
Remark 14.16. Unfortunately, regular conditional distributions do not always exist; see Doob [13, p. 624]. However, if we require $Y$ to be a standard Borel space (i.e. $Y$ is isomorphic to a Borel subset of $\mathbb{R}$), then a regular conditional distribution of $Y$ given $X$ will always exist; see Theorem 14.32 in the appendix to this chapter. Moreover, it is known that "reasonable" measure spaces are standard Borel spaces; see Section 9.10 above for more details. So in most instances of interest a regular conditional distribution of $Y$ given $X$ will exist.
Exercise 14.3. Suppose that $(X,\mathcal{M})$ and $(Y,\mathcal{N})$ are measurable spaces, $X : \Omega \to X$ and $Y : \Omega \to Y$ are measurable functions, and there exists a regular conditional distribution, $Q$, of $Y$ given $X$. Show:
1. For all bounded measurable functions, $f : (X \times Y, \mathcal{M} \otimes \mathcal{N}) \to \mathbb{R}$, the function $X \ni x \to Q(x, f(x,\cdot))$ is measurable and
$$Q(X, f(X,\cdot)) = E[f(X,Y)|X] \quad \text{a.s.} \qquad (14.22)$$
Hint: let $H$ denote the set of bounded measurable functions, $f$, on $X \times Y$ such that the two assertions are valid.
2. If $A \in \mathcal{M} \otimes \mathcal{N}$ and $\mu := P \circ X^{-1}$ is the law of $X$, then
$$P((X,Y) \in A) = \int_X Q(x, 1_A(x,\cdot))\,d\mu(x) = \int_X d\mu(x) \int_Y 1_A(x,y)\,Q(x,dy). \qquad (14.23)$$
Exercise 14.4. Keeping the same notation as in Exercise 14.3, further assume that $X$ and $Y$ are independent. Find a regular conditional distribution of $Y$ given $X$ and prove
$$E[f(X,Y)|X] = h_f(X) \quad \text{a.s. for all bounded measurable } f : X \times Y \to \mathbb{R},$$
where
$$h_f(x) := E[f(x,Y)] \quad \text{for all } x \in X,$$
i.e.
$$E[f(X,Y)|X] = E[f(x,Y)]\,|_{x=X} \quad \text{a.s.}$$
Exercise 14.5 (Exercise 14.4 strengthened). Let $(X,\mathcal{M})$, $(Y,\mathcal{N})$ be measurable spaces, $(\Omega,\mathcal{F},P)$ a probability space, and $X : \Omega \to X$ and $Y : \Omega \to Y$ be measurable functions. Let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-field such that $X$ is $\mathcal{G}/\mathcal{M}$ measurable and $Y$ is independent of $\mathcal{G}$. Then for any bounded measurable $f : X \times Y \to \mathbb{R}$ we have
$$E[f(X,Y)|\mathcal{G}] = h_f(X) \quad \text{where } h_f(x) := E[f(x,Y)]. \qquad (14.24)$$
Exercise 14.6. Suppose $(\Omega,\mathcal{B},P)$ and $(\Omega',\mathcal{B}',P')$ are two probability spaces, $(X,\mathcal{M})$ and $(Y,\mathcal{N})$ are measurable spaces, and $X : \Omega \to X$, $X' : \Omega' \to X$, $Y : \Omega \to Y$, and $Y' : \Omega' \to Y$ are measurable functions such that $P \circ (X,Y)^{-1} = P' \circ (X',Y')^{-1}$, i.e. $(X,Y) \overset{d}{=} (X',Y')$. If $f : (X \times Y, \mathcal{M} \otimes \mathcal{N}) \to \mathbb{R}$ is a bounded measurable function and $\tilde{f} : (X,\mathcal{M}) \to \mathbb{R}$ is a measurable function such that $\tilde{f}(X) = E[f(X,Y)|X]$ $P$-a.s., then
$$E'[f(X',Y')|X'] = \tilde{f}(X') \quad P'\text{-a.s.}$$
Now suppose that $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{B}$ and let $P_{\mathcal{G}} : \mathcal{B} \to L^1(\Omega,\mathcal{G},P)$ be defined by $P_{\mathcal{G}}(B) = P(B|\mathcal{G}) := E_{\mathcal{G}} 1_B \in L^1(\Omega,\mathcal{B},P)$ for all $B \in \mathcal{B}$. If $B = \sum_{n=1}^{\infty} B_n$ (a disjoint union) with $B_n \in \mathcal{B}$, then $1_B = \sum_{n=1}^{\infty} 1_{B_n}$ and this sum converges in $L^1(P)$ (in fact in all $L^p(P)$) by the DCT. Since $E_{\mathcal{G}} : L^1(\Omega,\mathcal{B},P) \to L^1(\Omega,\mathcal{G},P)$ is a contraction and therefore continuous, it follows that
$$P_{\mathcal{G}}(B) = E_{\mathcal{G}} 1_B = E_{\mathcal{G}} \sum_{n=1}^{\infty} 1_{B_n} = \sum_{n=1}^{\infty} E_{\mathcal{G}} 1_{B_n} = \sum_{n=1}^{\infty} P_{\mathcal{G}}(B_n), \qquad (14.25)$$
where all equalities are in $L^1(\Omega,\mathcal{G},P)$. Now suppose that we have chosen a representative, $\bar{P}_{\mathcal{G}}(B) : \Omega \to [0,1]$, of $P_{\mathcal{G}}(B)$ for each $B \in \mathcal{B}$. From Eq. (14.25) it follows that
$$\bar{P}_{\mathcal{G}}(B)(\omega) = \sum_{n=1}^{\infty} \bar{P}_{\mathcal{G}}(B_n)(\omega) \quad \text{for } P\text{-a.e. } \omega. \qquad (14.26)$$
However, note well, the exceptional set of $\omega$'s depends on the sets $B, B_n \in \mathcal{B}$. The goal of regular conditioning is to carefully choose the representative, $\bar{P}_{\mathcal{G}}(B) : \Omega \to [0,1]$, such that Eq. (14.26) holds for all $\omega \in \Omega$ and all $B, B_n \in \mathcal{B}$ with $B = \sum_{n=1}^{\infty} B_n$.
Definition 14.17. If $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{B}$, a regular conditional distribution given $\mathcal{G}$ is a probability kernel, $Q : \Omega \times \mathcal{B} \to [0,1]$, on $(\Omega,\mathcal{G}) \times (\Omega,\mathcal{B})$ such that
$$Q(\cdot, B) = P(B|\mathcal{G})(\cdot) \quad \text{a.s. for every } B \in \mathcal{B}. \qquad (14.27)$$
This corresponds to the $Q$ in Definition 14.15 provided $(X,\mathcal{M}) = (\Omega,\mathcal{G})$, $(Y,\mathcal{N}) = (\Omega,\mathcal{B})$, and $X(\omega) = Y(\omega) = \omega$ for all $\omega \in \Omega$.
14.2 Additional Properties of Conditional Expectations
The next theorem is devoted to extending the notion of conditional expectation to all non-negative functions and to proving conditional versions of the MCT, DCT, and Fatou's lemma.
Theorem 14.18 (Extending $E_{\mathcal{G}}$). If $f : \Omega \to [0,\infty]$ is $\mathcal{B}$-measurable, there is a $\mathcal{G}$-measurable function, $F : \Omega \to [0,\infty]$, satisfying
$$E[f : A] = E[F : A] \quad \text{for all } A \in \mathcal{G}. \qquad (14.28)$$
By Lemma 7.24, the function $F$ is uniquely determined up to sets of measure zero and hence we denote any such version of $F$ by $E_{\mathcal{G}} f$.
1. Properties 2., 5. (with $0 \le g \in \mathcal{G}_b$), and 6. of Theorem 14.4 still hold for any $\mathcal{B}$-measurable functions such that $0 \le f \le g$. Namely:
a) Order preserving: $E_{\mathcal{G}} f \le E_{\mathcal{G}} g$ a.s. when $0 \le f \le g$.
b) Pull out property: $E_{\mathcal{G}}[hf] = h\,E_{\mathcal{G}}[f]$ a.s. for all $\mathcal{G}$-measurable $h \ge 0$.
c) Tower or smoothing property: if $\mathcal{G}_0 \subset \mathcal{G}_1 \subset \mathcal{B}$, then
$$E_{\mathcal{G}_0} E_{\mathcal{G}_1} f = E_{\mathcal{G}_1} E_{\mathcal{G}_0} f = E_{\mathcal{G}_0} f \quad \text{a.s.}$$
2. Conditional Monotone Convergence (cMCT). Suppose that, almost surely, $0 \le f_n \le f_{n+1}$ for all $n$; then $\lim_{n\to\infty} E_{\mathcal{G}} f_n = E_{\mathcal{G}}[\lim_{n\to\infty} f_n]$ a.s.
3. Conditional Fatou's Lemma (cFatou). Suppose again that $0 \le f_n \in L^1(\Omega,\mathcal{B},P)$ a.s.; then
$$E_{\mathcal{G}}\left[ \liminf_{n\to\infty} f_n \right] \le \liminf_{n\to\infty} E_{\mathcal{G}}[f_n] \quad \text{a.s.} \qquad (14.29)$$
Proof. Since $f \wedge n \in L^1(\Omega,\mathcal{B},P)$ and $f \wedge n$ is increasing in $n$, it follows that $F := \lim_{n\to\infty} E_{\mathcal{G}}[f \wedge n]$ exists a.s. Moreover, by two applications of the standard MCT, we have for any $A \in \mathcal{G}$, that
$$E[F : A] = \lim_{n\to\infty} E[E_{\mathcal{G}}[f \wedge n] : A] = \lim_{n\to\infty} E[f \wedge n : A] = E[f : A].$$
Thus Eq. (14.28) holds; that this uniquely determines $F$ follows from Lemma 7.24.
Item 1. a) If $0 \le f \le g$, then
$$E_{\mathcal{G}} f = \lim_{n\to\infty} E_{\mathcal{G}}[f \wedge n] \le \lim_{n\to\infty} E_{\mathcal{G}}[g \wedge n] = E_{\mathcal{G}} g \quad \text{a.s.}$$
and so $E_{\mathcal{G}}$ still preserves order. We will prove items 1b and 1c at the end of this proof.
Item 2. Suppose that, almost surely, $0 \le f_n \le f_{n+1}$ for all $n$; then $E_{\mathcal{G}} f_n$ is a.s. increasing in $n$. Hence, again by two applications of the MCT, for any $A \in \mathcal{G}$, we have
$$E\left[ \lim_{n\to\infty} E_{\mathcal{G}} f_n : A \right] = \lim_{n\to\infty} E[E_{\mathcal{G}} f_n : A] = \lim_{n\to\infty} E[f_n : A] = E\left[ \lim_{n\to\infty} f_n : A \right] = E\left[ E_{\mathcal{G}}\left[ \lim_{n\to\infty} f_n \right] : A \right],$$
which combined with Lemma 7.24 implies that $\lim_{n\to\infty} E_{\mathcal{G}} f_n = E_{\mathcal{G}}[\lim_{n\to\infty} f_n]$ a.s.
Item 3. For $0 \le f_n$, let $g_k := \inf_{n \ge k} f_n$. Then $g_k \le f_k$ for all $k$ and $g_k \uparrow \liminf_{n\to\infty} f_n$, and hence by cMCT and item 1.,
$$E_{\mathcal{G}}\left[ \liminf_{n\to\infty} f_n \right] = \lim_{k\to\infty} E_{\mathcal{G}} g_k \le \liminf_{k\to\infty} E_{\mathcal{G}} f_k \quad \text{a.s.}$$
Item 1. b) If $h \ge 0$ is a $\mathcal{G}$-measurable function and $f \ge 0$, then by cMCT,
$$E_{\mathcal{G}}[hf] \overset{\text{cMCT}}{=} \lim_{n\to\infty} E_{\mathcal{G}}[(h \wedge n)(f \wedge n)] = \lim_{n\to\infty} (h \wedge n)\,E_{\mathcal{G}}[f \wedge n] \overset{\text{cMCT}}{=} h\,E_{\mathcal{G}} f \quad \text{a.s.}$$
Item 1. c) Similarly by multiple uses of cMCT,
$$E_{\mathcal{G}_0} E_{\mathcal{G}_1} f = E_{\mathcal{G}_0} \lim_{n\to\infty} E_{\mathcal{G}_1}(f \wedge n) = \lim_{n\to\infty} E_{\mathcal{G}_0} E_{\mathcal{G}_1}(f \wedge n) = \lim_{n\to\infty} E_{\mathcal{G}_0}(f \wedge n) = E_{\mathcal{G}_0} f$$
and
$$E_{\mathcal{G}_1} E_{\mathcal{G}_0} f = E_{\mathcal{G}_1} \lim_{n\to\infty} E_{\mathcal{G}_0}(f \wedge n) = \lim_{n\to\infty} E_{\mathcal{G}_1} E_{\mathcal{G}_0}[f \wedge n] = \lim_{n\to\infty} E_{\mathcal{G}_0}(f \wedge n) = E_{\mathcal{G}_0} f.$$
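On a finite probability space, conditional expectation given a partition-generated $\sigma$-algebra is just block averaging, which makes the tower property easy to verify directly. A minimal sketch (illustrative data and labels of our own):

```python
import numpy as np

# Omega = {0,...,7} with uniform P.  G0 is generated by the coarse
# partition {0..3},{4..7}; G1 by the finer partition {0,1},{2,3},...
# Conditional expectation given a partition averages over each block.
P = np.full(8, 1 / 8)
f = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

def cond_exp(f, blocks):
    out = np.empty_like(f)
    for blk in blocks:
        blk = list(blk)
        out[blk] = (P[blk] * f[blk]).sum() / P[blk].sum()
    return out

G0 = [range(0, 4), range(4, 8)]                            # coarse
G1 = [range(0, 2), range(2, 4), range(4, 6), range(6, 8)]  # finer

tower1 = cond_exp(cond_exp(f, G1), G0)   # E_{G0} E_{G1} f
tower2 = cond_exp(cond_exp(f, G0), G1)   # E_{G1} E_{G0} f
direct = cond_exp(f, G0)                 # E_{G0} f
assert np.allclose(tower1, direct) and np.allclose(tower2, direct)
```

The second equality holds because $E_{\mathcal{G}_0} f$ is constant on the blocks of the finer partition, so averaging it again over $\mathcal{G}_1$'s blocks changes nothing.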
Theorem 14.19 (Conditional Dominated Convergence (cDCT)). If $f_n \overset{\text{a.s.}}{\to} f$ and $|f_n| \le g \in L^1(\Omega,\mathcal{B},P)$, then $E_{\mathcal{G}} f_n \to E_{\mathcal{G}} f$ a.s.
Proof. From Corollary 12.9 we know that $f_n \to f$ in $L^1(P)$ and therefore $E_{\mathcal{G}} f_n \to E_{\mathcal{G}} f$ in $L^1(P)$, as conditional expectation is a contraction on $L^1(P)$. So we need only prove the almost sure convergence. As usual it suffices to consider the real case.
Following the proof of the dominated convergence theorem, we start with the fact that $0 \le g \pm f_n$ a.s. for all $n$. Hence by cFatou,
$$E_{\mathcal{G}}(g \pm f) = E_{\mathcal{G}}\left[ \liminf_{n\to\infty} (g \pm f_n) \right] \le \liminf_{n\to\infty} E_{\mathcal{G}}(g \pm f_n) = E_{\mathcal{G}} g + \begin{cases} \liminf_{n\to\infty} E_{\mathcal{G}}(f_n) & \text{in the $+$ case,} \\ -\limsup_{n\to\infty} E_{\mathcal{G}}(f_n) & \text{in the $-$ case,} \end{cases}$$
where the above equations hold a.s. Cancelling $E_{\mathcal{G}} g$ from both sides of the equation then implies
$$\limsup_{n\to\infty} E_{\mathcal{G}}(f_n) \le E_{\mathcal{G}} f \le \liminf_{n\to\infty} E_{\mathcal{G}}(f_n) \quad \text{a.s.}$$
Remark 14.20. Suppose that $f_n \overset{P}{\to} f$, $|f_n| \le g_n \in L^1(\Omega,\mathcal{B},P)$, $g_n \overset{P}{\to} g \in L^1(\Omega,\mathcal{B},P)$, and $E g_n \to E g$. Then by the DCT in Corollary 12.9, we know that $f_n \to f$ in $L^1(\Omega,\mathcal{B},P)$. Since $E_{\mathcal{G}}$ is a contraction, it follows that $E_{\mathcal{G}} f_n \to E_{\mathcal{G}} f$ in $L^1(\Omega,\mathcal{B},P)$ and hence $E_{\mathcal{G}} f_n \overset{P}{\to} E_{\mathcal{G}} f$.
Exercise 14.7. Suppose that $(X,\mathcal{M})$ and $(Y,\mathcal{N})$ are measurable spaces and $X : \Omega \to X$ and $Y : \Omega \to Y$ are measurable functions. Further assume that $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{B}$, $X$ is $\mathcal{G}/\mathcal{M}$ measurable, and $Y$ is independent of $\mathcal{G}$. Show for all bounded measurable functions, $f : (X \times Y, \mathcal{M} \otimes \mathcal{N}) \to \mathbb{R}$, that
$$E[f(X,Y)|\mathcal{G}] = h_f(X) = E[f(x,Y)]\,|_{x=X} \quad \text{a.s.,}$$
where, if $\nu := \mathrm{Law}_P(Y)$,
$$h_f(x) := E[f(x,Y)] = \int_Y f(x,y)\,d\nu(y). \qquad (14.30)$$
Solution to Exercise (14.7). Notice by Fubini's theorem that $h_f(x)$ is $\mathcal{M}/\mathcal{B}_{\mathbb{R}}$ measurable and therefore $h_f(X)$ is $\mathcal{G}$-measurable. If $f(x,y) = u(x)\,v(y)$ where $u : (X,\mathcal{M}) \to (\mathbb{R},\mathcal{B}_{\mathbb{R}})$ and $v : (Y,\mathcal{N}) \to (\mathbb{R},\mathcal{B}_{\mathbb{R}})$ are measurable functions, then by Lemma 14.10,
$$E[f(X,Y)|\mathcal{G}] = E[u(X)\,v(Y)|\mathcal{G}] = u(X)\,E[v(Y)|\mathcal{G}] = u(X)\,E[v(Y)] = u(X)\,\nu(v) = h_f(X) \quad \text{a.s.}$$
The proof may now be completed using the multiplicative systems Theorem 8.2. In more detail, let
$$H := \{ f \in [\mathcal{M} \otimes \mathcal{N}]_b : E[f(X,Y)|\mathcal{G}] = h_f(X) \text{ a.s.} \}.$$
Using the linearity of conditional expectations and of expectations along with the DCT and the cDCT (Theorem 14.19), it is easily seen that $H$ is a linear subspace which is closed under bounded convergence. Moreover we have just seen that $H$ contains the multiplicative system of product functions of the form $f(x,y) = u(x)\,v(y)$. Since such functions generate $\mathcal{M} \otimes \mathcal{N}$, it follows that $H$ consists of all bounded measurable functions on $X \times Y$.
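For finite-valued independent $X$ and $Y$ the identity $E[f(X,Y)\,|\,\mathcal{G}] = h_f(X)$ can be verified exactly by summation over the product space. A small sketch (illustrative laws, not from the text):

```python
# X and Y independent with finite laws; G = sigma(X).  We check
# E[f(X,Y) | X = x] = h_f(x) := E[f(x, Y)] exactly.
px = {0: 0.2, 1: 0.5, 2: 0.3}          # law of X
py = {-1: 0.4, 1: 0.6}                 # law nu of Y
f = lambda x, y: x * y + y ** 2        # a bounded test function

h_f = {x: sum(f(x, y) * py[y] for y in py) for x in px}

for x in px:
    # By independence the joint law is P((x, y)) = px[x] * py[y], so
    # E[f(X,Y) | X = x] = sum_y f(x,y) px[x] py[y] / px[x].
    cond = sum(f(x, y) * px[x] * py[y] for y in py) / px[x]
    assert abs(cond - h_f[x]) < 1e-12
```

The factor $p_X(x)$ cancels exactly, which is the discrete face of the independence argument above.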
The next result, Lemma 14.23, shows how to localize conditional expectations. In order to state and prove the lemma we need a little ground work first.
Definition 14.21. Suppose that $\mathcal{F}$ and $\mathcal{G}$ are sub-$\sigma$-fields of $\mathcal{B}$ and $A \in \mathcal{B}$. We say that $\mathcal{F} = \mathcal{G}$ on $A$ iff $\mathcal{F}_A = \mathcal{G}_A$. Recall that $\mathcal{F}_A = \{ B \cap A : B \in \mathcal{F} \}$.
Lemma 14.22. If $A \in \mathcal{F} \cap \mathcal{G}$, then $\mathcal{F}_A \cap \mathcal{G}_A = [\mathcal{F} \cap \mathcal{G}]_A$, and $\mathcal{F}_A = \mathcal{G}_A$ implies
$$\mathcal{F}_A = \mathcal{G}_A = [\mathcal{F} \cap \mathcal{G}]_A. \qquad (14.31)$$
Proof. If $A \in \mathcal{F}$, we have $B \in \mathcal{F}_A$ iff there exists $B' \in \mathcal{F}$ such that $B = A \cap B'$. As $A \in \mathcal{F}$ it follows that $B \in \mathcal{F}$, and therefore we have
$$\mathcal{F}_A = \{ B \subset A : B \in \mathcal{F} \}.$$
Thus if $A \in \mathcal{F} \cap \mathcal{G}$ it follows that $\mathcal{F}_A = \{ B \subset A : B \in \mathcal{F} \}$ and $\mathcal{G}_A = \{ B \subset A : B \in \mathcal{G} \}$, and therefore
$$\mathcal{F}_A \cap \mathcal{G}_A = \{ B \subset A : B \in \mathcal{F} \cap \mathcal{G} \} = [\mathcal{F} \cap \mathcal{G}]_A.$$
Equation (14.31) now clearly follows from this identity when $\mathcal{F}_A = \mathcal{G}_A$.
Lemma 14.23 (Localizing Conditional Expectations). Let $(\Omega,\mathcal{B},P)$ be a probability space, $\mathcal{F}$ and $\mathcal{G}$ be sub-$\sigma$-fields of $\mathcal{B}$, $X, Y \in L^1(\Omega,\mathcal{B},P)$ or $X, Y : (\Omega,\mathcal{B}) \to [0,\infty]$ be measurable, and $A \in \mathcal{F} \cap \mathcal{G}$. If $\mathcal{F} = \mathcal{G}$ on $A$ and $X = Y$ a.s. on $A$, then
$$E_{\mathcal{F}} X = E_{\mathcal{G}} X = E_{\mathcal{F}} Y = E_{\mathcal{G}} Y \quad \text{a.s. on } A. \qquad (14.32)$$
Proof. It suffices to prove $E_{\mathcal{F}} X = E_{\mathcal{G}} Y$ a.s. on $A$, and this is equivalent to $1_A E_{\mathcal{F}} X = 1_A E_{\mathcal{G}} Y$ a.s. As both sides of this equation are $\mathcal{F}$-measurable, by the comparison Lemma 7.24 we need to show
$$E[1_A E_{\mathcal{F}} X : B] = E[1_A E_{\mathcal{G}} Y : B] \quad \text{for all } B \in \mathcal{F}.$$
So let $B \in \mathcal{F}$, in which case $A \cap B \in \mathcal{F}_A = \mathcal{G}_A = [\mathcal{F} \cap \mathcal{G}]_A \subset \mathcal{F} \cap \mathcal{G}$ (see Lemma 14.22). Therefore, using the basic properties of conditional expectations, we have
$$E[1_A E_{\mathcal{F}} X : B] = E[E_{\mathcal{F}} X : A \cap B] = E[X : A \cap B]$$
and similarly
$$E[1_A E_{\mathcal{G}} Y : B] = E[E_{\mathcal{G}} Y : A \cap B] = E[Y : A \cap B].$$
This completes the proof as $E[X : A \cap B] = E[Y : A \cap B]$ because of the assumption that $X = Y$ a.s. on $A$.
Example 14.24. Let us use Lemma 14.23 to show $E[f(X)\,|\,X \wedge t] = f(X) = f(X \wedge t)$ on $\{X < t\}$, a fact we have already seen to be true in Example 14.12. Let us begin by observing that $\{X < t\} = \{X \wedge t < t\} \in \sigma(X) \cap \sigma(X \wedge t)$. Moreover, using $\sigma(X)_A = \sigma(X|_A)$ for all $A \in \mathcal{B}$,² we see that
$$\sigma(X)_{\{X<t\}} = \sigma\left( X|_{\{X<t\}} \right) = \sigma\left( (X \wedge t)|_{\{X<t\}} \right) = \sigma(X \wedge t)_{\{X<t\}}.$$
Therefore it follows that
$$E[f(X)\,|\,X \wedge t] = E[f(X)\,|\,\sigma(X \wedge t)] = E[f(X)\,|\,\sigma(X)] = f(X) \quad \text{a.s. on } \{X < t\}.$$
What goes wrong with the above argument if you replace $\{X < t\}$ by $\{X \le t\}$ everywhere? (Notice that the same argument shows: if $X = Y$ on $A \in \sigma(X) \cap \sigma(Y)$, then $E[f(X)\,|\,Y] = f(Y) = f(X)$ a.s. on $A$.)
Theorem 14.25 (Conditional Jensen's inequality). Let $(\Omega,\mathcal{B},P)$ be a probability space, $-\infty \le a < b \le \infty$, and $\varphi : (a,b) \to \mathbb{R}$ be a convex function. Assume $f \in L^1(\Omega,\mathcal{B},P;\mathbb{R})$ is a random variable satisfying $f \in (a,b)$ a.s. and $\varphi(f) \in L^1(\Omega,\mathcal{B},P;\mathbb{R})$. Then $\varphi(E_{\mathcal{G}} f) \in L^1(\Omega,\mathcal{G},P)$,
$$\varphi(E_{\mathcal{G}} f) \le E_{\mathcal{G}}[\varphi(f)] \quad \text{a.s.,} \qquad (14.33)$$
and
$$E[\varphi(E_{\mathcal{G}} f)] \le E[\varphi(f)]. \qquad (14.34)$$
Proof. Let $\Lambda := \mathbb{Q} \cap (a,b)$, a countable dense subset of $(a,b)$. By Theorem 12.55 (also see Lemma 12.52, and Figure 12.5 when $\varphi$ is $C^1$),
$$\varphi(y) \ge \varphi(x) + \varphi'_-(x)(y - x) \quad \text{for all } x, y \in (a,b), \qquad (14.35)$$
² Here is the verification that $\sigma(X)_A = \sigma(X|_A)$. Let $i_A : A \to \Omega$ be the inclusion map. Since $\sigma(X) = X^{-1}(\mathcal{B}_{\mathbb{R}})$ and $\sigma(X)_A = i_A^{-1}(\sigma(X))$, it follows that
$$\sigma(X)_A = i_A^{-1}\left( X^{-1}(\mathcal{B}_{\mathbb{R}}) \right) = (X \circ i_A)^{-1}(\mathcal{B}_{\mathbb{R}}) = \sigma(X \circ i_A) = \sigma(X|_A).$$
where $\varphi'_-(x)$ is the left hand derivative of $\varphi$ at $x$. Taking $y = f$ and then taking conditional expectations imply,
$$E_{\mathcal{G}}[\varphi(f)] \ge E_{\mathcal{G}}\left[ \varphi(x) + \varphi'_-(x)(f - x) \right] = \varphi(x) + \varphi'_-(x)\left( E_{\mathcal{G}} f - x \right) \quad \text{a.s.} \qquad (14.36)$$
Since this is true for all $x \in (a,b)$ (and hence all $x$ in the countable set, $\Lambda$) we may conclude that
$$E_{\mathcal{G}}[\varphi(f)] \ge \sup_{x \in \Lambda} \left[ \varphi(x) + \varphi'_-(x)\left( E_{\mathcal{G}} f - x \right) \right] \quad \text{a.s.}$$
By Exercise 14.1, $E_{\mathcal{G}} f \in (a,b)$ a.s., and hence it follows from Corollary 12.56 that
$$\sup_{x \in \Lambda} \left[ \varphi(x) + \varphi'_-(x)\left( E_{\mathcal{G}} f - x \right) \right] = \varphi(E_{\mathcal{G}} f) \quad \text{a.s.}$$
Combining the last two estimates proves Eq. (14.33).
From Eq. (14.33) and Eq. (14.35) with $y = E_{\mathcal{G}} f$ and $x \in (a,b)$ fixed we find,
$$\varphi(x) + \varphi'_-(x)\left( E_{\mathcal{G}} f - x \right) \le \varphi(E_{\mathcal{G}} f) \le E_{\mathcal{G}}[\varphi(f)]. \qquad (14.37)$$
Therefore
$$|\varphi(E_{\mathcal{G}} f)| \le |E_{\mathcal{G}}[\varphi(f)]| \vee \left| \varphi(x) + \varphi'_-(x)\left( E_{\mathcal{G}} f - x \right) \right| \in L^1(\Omega,\mathcal{G},P), \qquad (14.38)$$
which implies that $\varphi(E_{\mathcal{G}} f) \in L^1(\Omega,\mathcal{G},P)$. Taking expectations of Eq. (14.33) is now allowed and immediately gives Eq. (14.34).
Remark 14.26 (On Theorem 14.25 and its proof). From Eq. (14.35),
$$\varphi(f) \ge \varphi(E_{\mathcal{G}} f) + \varphi'_-(E_{\mathcal{G}} f)(f - E_{\mathcal{G}} f). \qquad (14.39)$$
Therefore taking $E_{\mathcal{G}}$ of this equation implies that
$$E_{\mathcal{G}}[\varphi(f)] \ge \varphi(E_{\mathcal{G}} f) + E_{\mathcal{G}}\left[ \varphi'_-(E_{\mathcal{G}} f)(f - E_{\mathcal{G}} f) \right] = \varphi(E_{\mathcal{G}} f) + \varphi'_-(E_{\mathcal{G}} f)\,E_{\mathcal{G}}[f - E_{\mathcal{G}} f] = \varphi(E_{\mathcal{G}} f). \qquad (14.40)$$
The technical problem with this argument is the justification that $E_{\mathcal{G}}\left[ \varphi'_-(E_{\mathcal{G}} f)(f - E_{\mathcal{G}} f) \right] = \varphi'_-(E_{\mathcal{G}} f)\,E_{\mathcal{G}}[f - E_{\mathcal{G}} f]$, since there is no reason for $\varphi'_-$ to be a bounded function. The proof we give in Theorem 14.25 circumvents this technical detail.
On the other hand, let us now suppose that $\varphi \in C^1(\mathbb{R})$ is convex and, for the moment, that $|f| \le M < \infty$ a.s. Then $E_{\mathcal{G}} f \in [-M,M]$ a.s. and hence $\varphi'_-(E_{\mathcal{G}} f) = \varphi'(E_{\mathcal{G}} f)$ is bounded and Eq. (14.40) is now valid. Moreover, taking $x = 0$ in Eq. (14.37) shows
$$\varphi(0) + \varphi'(0)\,E_{\mathcal{G}} f \le \varphi(E_{\mathcal{G}} f) \le E_{\mathcal{G}}[\varphi(f)].$$
If $f$ is unbounded we may apply the above inequality with $f$ replaced by $f_M := f\,1_{|f| \le M}$ in order to conclude,
$$\varphi(0) + \varphi'(0)\,E_{\mathcal{G}} f_M \le \varphi(E_{\mathcal{G}} f_M) \le E_{\mathcal{G}}[\varphi(f_M)].$$
If we further assume that $\varphi(f_M) \ge 0$ is increasing as $M$ increases (for example this is the case if $\varphi(x) = |x|^p$ for some $p > 1$), we may conclude by passing to the limit along a nicely chosen subsequence that
$$\varphi(0) + \varphi'(0)\,E_{\mathcal{G}} f \le \varphi(E_{\mathcal{G}} f) \le E_{\mathcal{G}}[\varphi(f)],$$
where we used $E_{\mathcal{G}}[\varphi(f_M)] \uparrow E_{\mathcal{G}}[\varphi(f)]$ by cMCT.
Corollary 14.27. The conditional expectation operator, $E_{\mathcal{G}}$, maps $L^p(\Omega,\mathcal{B},P)$ into $L^p(\Omega,\mathcal{B},P)$ and the map remains a contraction for all $1 \le p \le \infty$.
Proof. The cases $p = \infty$ and $p = 1$ have already been covered in Theorem 14.4. So now suppose $1 < p < \infty$, and apply Jensen's inequality with $\varphi(x) = |x|^p$ to find $|E_{\mathcal{G}} f|^p \le E_{\mathcal{G}} |f|^p$ a.s. Taking expectations of this inequality gives the desired result.
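Corollary 14.27 can be illustrated on a finite probability space, where $E_{\mathcal{G}}$ is averaging over the blocks of a partition and the $L^p$ norms are finite sums. A sketch with illustrative data (not from the text):

```python
import numpy as np

# Finite probability space; conditional expectation = block averaging
# over the partition generating G.  We check ||E_G f||_p <= ||f||_p,
# the contraction property of Corollary 14.27, for several p.
rng = np.random.default_rng(1)
P = rng.random(10)
P /= P.sum()
f = rng.standard_normal(10)
blocks = [range(0, 3), range(3, 7), range(7, 10)]  # partition for G

Ef = np.empty_like(f)
for blk in blocks:
    blk = list(blk)
    Ef[blk] = (P[blk] * f[blk]).sum() / P[blk].sum()

for p in (1.0, 2.0, 3.5):
    norm = lambda g: ((P * np.abs(g) ** p).sum()) ** (1 / p)
    assert norm(Ef) <= norm(f) + 1e-12
```

The inequality is exactly Jensen's inequality $|E_{\mathcal{G}} f|^p \le E_{\mathcal{G}}|f|^p$ applied block by block and then summed.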
Exercise 14.8 (Martingale Convergence Theorem for $p = 1$ and $2$). Let $(\Omega,\mathcal{B},P)$ be a probability space and let $\{\mathcal{B}_n\}_{n=1}^{\infty}$ be an increasing sequence of sub-$\sigma$-algebras of $\mathcal{B}$. Show:
1. The closure, $M$, of $\cup_{n=1}^{\infty} L^2(\Omega,\mathcal{B}_n,P)$ is $L^2(\Omega,\mathcal{B}_{\infty},P)$, where $\mathcal{B}_{\infty} := \vee_{n=1}^{\infty} \mathcal{B}_n := \sigma\left( \cup_{n=1}^{\infty} \mathcal{B}_n \right)$. Hint: make use of Theorem 12.29.
2. For every $X \in L^2(\Omega,\mathcal{B},P)$, $X_n := E[X|\mathcal{B}_n] \to E[X|\mathcal{B}_{\infty}]$ in $L^2(P)$. Hint: see Exercise 13.5.
3. For every $X \in L^1(\Omega,\mathcal{B},P)$, $X_n := E[X|\mathcal{B}_n] \to E[X|\mathcal{B}_{\infty}]$ in $L^1(P)$. Hint: make use of item 2. by a truncation argument using the contractive properties of conditional expectations.
(Eventually we will show that $X_n = E[X|\mathcal{B}_n] \to E[X|\mathcal{B}_{\infty}]$ a.s. as well.)
Exercise 14.9 (Martingale Convergence Theorem for general $p$). Let $1 \le p < \infty$, $(\Omega,\mathcal{B},P)$ be a probability space, and $\{\mathcal{B}_n\}_{n=1}^{\infty}$ be an increasing sequence of sub-$\sigma$-algebras of $\mathcal{B}$. Show for all $X \in L^p(\Omega,\mathcal{B},P)$ that $X_n := E[X|\mathcal{B}_n] \to E[X|\mathcal{B}_{\infty}]$ in $L^p(P)$. (Hint: show that $\{|E[X|\mathcal{B}_n]|^p\}_{n=1}^{\infty}$ is uniformly integrable and $E[X|\mathcal{B}_n] \overset{P}{\to} E[X|\mathcal{B}_{\infty}]$ with the aid of item 3. of Exercise 14.8.)
14.3 Construction of Regular Conditional Distributions*
Lemma 14.28. Suppose that $h : \mathbb{Q} \to [0,1]$ is an increasing (i.e. non-decreasing) function and $H(t) := \inf\{ h(s) : s \in \mathbb{Q}, \ t < s \}$ for all $t \in \mathbb{R}$. Then $H : \mathbb{R} \to [0,1]$ is an increasing right continuous function.
Proof. If $t_1 < t_2$, then
$$\{ h(s) : s \in \mathbb{Q}, \ t_1 < s \} \supset \{ h(s) : s \in \mathbb{Q}, \ t_2 < s \}$$
and therefore $H(t_1) \le H(t_2)$. Let $H(t+) := \lim_{\tau \downarrow t} H(\tau)$. Then for any $s \in \mathbb{Q}$ with $s > t$ we have $H(t) \le H(t+) \le h(s)$, and then taking the infimum over such $s$ we learn that $H(t) \le H(t+) \le H(t)$, i.e. $H(t+) = H(t)$.
Lemma 14.29. Suppose that $(X,\mathcal{M})$ is a measurable space and $F : X \times \mathbb{R} \to \mathbb{R}$ is a function such that: 1) $F(\cdot,t) : X \to \mathbb{R}$ is $\mathcal{M}/\mathcal{B}_{\mathbb{R}}$ measurable for all $t \in \mathbb{R}$, and 2) $F(x,\cdot) : \mathbb{R} \to \mathbb{R}$ is right continuous for all $x \in X$. Then $F$ is $\mathcal{M} \otimes \mathcal{B}_{\mathbb{R}}/\mathcal{B}_{\mathbb{R}}$ measurable.
Proof. For $n \in \mathbb{N}$, the function,
$$F_n(x,t) := \sum_{k=-\infty}^{\infty} F\left( x, (k+1)2^{-n} \right) 1_{(k2^{-n},(k+1)2^{-n}]}(t),$$
is $\mathcal{M} \otimes \mathcal{B}_{\mathbb{R}}/\mathcal{B}_{\mathbb{R}}$ measurable. Using the right continuity assumption, it follows that $F(x,t) = \lim_{n\to\infty} F_n(x,t)$ for all $(x,t) \in X \times \mathbb{R}$ and therefore $F$ is also $\mathcal{M} \otimes \mathcal{B}_{\mathbb{R}}/\mathcal{B}_{\mathbb{R}}$ measurable.
Proposition 14.30. Let $\mathcal{B}_{\mathbb{R}}$ be the Borel $\sigma$-algebra on $\mathbb{R}$. Then $\mathcal{B}_{\mathbb{R}}$ contains a countable sub-algebra, $\mathcal{A}_{\mathbb{R}} \subset \mathcal{B}_{\mathbb{R}}$, which generates $\mathcal{B}_{\mathbb{R}}$ and has the amazing property that every finitely additive probability measure on $\mathcal{A}_{\mathbb{R}}$ extends uniquely to a countably additive probability measure on $\mathcal{B}_{\mathbb{R}}$.
Proof. By the results in Appendix 9.10, we know that $(\mathbb{R},\mathcal{B}_{\mathbb{R}})$ is measure theoretically isomorphic to $\left( \{0,1\}^{\mathbb{N}}, \mathcal{F} \right)$, where $\mathcal{F}$ is the product $\sigma$-algebra. As we saw in Section 5.5, $\mathcal{F}$ is generated by the countable algebra $\mathcal{A} := \cup_{n=1}^{\infty} \mathcal{A}_n$, where
$$\mathcal{A}_n := \left\{ B \times \{0,1\}^{\mathbb{N} \setminus \{1,\dots,n\}} : B \subset \{0,1\}^n \right\} \quad \text{for all } n \in \mathbb{N}.$$
According to the baby Kolmogorov Theorem 5.40, any finitely additive probability measure on $\mathcal{A}$ has a unique extension to a probability measure on $\mathcal{F}$. The algebra $\mathcal{A}$ may now be transferred by the measure theoretic isomorphism to the desired sub-algebra, $\mathcal{A}_{\mathbb{R}}$, of $\mathcal{B}_{\mathbb{R}}$.
Theorem 14.31. Suppose that $(X,\mathcal{M})$ is a measurable space, $X : \Omega \to X$ is a measurable function and $Y : \Omega \to \mathbb{R}$ is a random variable. Then there exists a probability kernel, $Q$, on $X \times \mathbb{R}$ such that $E[f(Y)|X] = Q(X,f)$, $P$-a.s., for all bounded measurable functions, $f : \mathbb{R} \to \mathbb{R}$.
Proof. First proof. For each $r \in \mathbb{Q}$, let $q_r : X \to [0,1]$ be a measurable function such that
$$E[1_{Y \le r}|X] = q_r(X) \quad \text{a.s.}$$
Let $\mu := P \circ X^{-1}$ be the law of $X$. Then using the basic properties of conditional expectation, $q_r \le q_s$ a.s. for all $r \le s$, $\lim_{r \uparrow \infty} q_r = 1$, and $\lim_{r \downarrow -\infty} q_r = 0$, a.s. Hence the set, $X_0 \subset X$, where $q_r(x) \le q_s(x)$ for all $r \le s$, $\lim_{r \uparrow \infty} q_r(x) = 1$, and $\lim_{r \downarrow -\infty} q_r(x) = 0$, satisfies $\mu(X_0) = P(X \in X_0) = 1$. For $t \in \mathbb{R}$, let
$$F(x,t) := 1_{X_0}(x) \inf\{ q_r(x) : r \in \mathbb{Q}, \ r > t \} + 1_{X \setminus X_0}(x)\,1_{t \ge 0}.$$
Then $F(\cdot,t) : X \to \mathbb{R}$ is measurable for each $t \in \mathbb{R}$ and, by Lemma 14.28, $F(x,\cdot)$ is a distribution function on $\mathbb{R}$ for each $x \in X$. Hence an application of Lemma 14.29 shows $F : X \times \mathbb{R} \to [0,1]$ is measurable.
For each $x \in X$ and $B \in \mathcal{B}_{\mathbb{R}}$, let $Q(x,B) := \mu_{F(x,\cdot)}(B)$, where $\mu_F$ denotes the probability measure on $\mathbb{R}$ determined by a distribution function, $F : \mathbb{R} \to [0,1]$. We will now show that $Q$ is the desired probability kernel. To prove this, let $H$ be the collection of bounded measurable functions, $f : \mathbb{R} \to \mathbb{R}$, such that $X \ni x \to Q(x,f) \in \mathbb{R}$ is measurable and $E[f(Y)|X] = Q(X,f)$, $P$-a.s. It is easily seen that $H$ is a linear subspace which is closed under bounded convergence. We will finish the proof by showing that $H$ contains the multiplicative class, $M = \left\{ 1_{(-\infty,t]} : t \in \mathbb{R} \right\}$, so that the multiplicative systems Theorem 8.2 may be applied.
Notice that $Q\left( x, 1_{(-\infty,t]} \right) = F(x,t)$ is measurable. Now let $r \in \mathbb{Q}$ and let $g : X \to \mathbb{R}$ be a bounded measurable function; then
$$E[1_{Y \le r}\,g(X)] = E[E[1_{Y \le r}|X]\,g(X)] = E[q_r(X)\,g(X)] = E[q_r(X)\,1_{X_0}(X)\,g(X)].$$
For $t \in \mathbb{R}$, we may let $r \downarrow t$ in the above equality (use DCT) to learn,
$$E[1_{Y \le t}\,g(X)] = E[F(X,t)\,1_{X_0}(X)\,g(X)] = E[F(X,t)\,g(X)].$$
Since $g$ was arbitrary, we may conclude that
$$Q\left( X, 1_{(-\infty,t]} \right) = F(X,t) = E[1_{Y \le t}|X] \quad \text{a.s.}$$
This completes the proof.
Second proof. Let $\mathcal{A} := \mathcal{A}_{\mathbb{R}}$ be the algebra described in Proposition 14.30. For each $A \in \mathcal{A}$, let $\nu_A : X \to \mathbb{R}$ be a measurable function such that $\nu_A(X) = P(Y \in A|X)$ a.s. If $A = A_1 \cup A_2$ with $A_i \in \mathcal{A}$ and $A_1 \cap A_2 = \emptyset$, then
$$\nu_{A_1}(X) + \nu_{A_2}(X) = P(Y \in A_1|X) + P(Y \in A_2|X) = P(Y \in A_1 \cup A_2|X) = \nu_{A_1+A_2}(X) \quad \text{a.s.}$$
Thus if $\mu := \mathrm{Law}_P(X)$, we have $\nu_{A_1}(x) + \nu_{A_2}(x) = \nu_{A_1+A_2}(x)$ for $\mu$-a.e. $x$. Since
$$\nu_{\mathbb{R}}(X) = P(Y \in \mathbb{R}|X) = 1 \quad \text{a.s.,}$$
we know that $\nu_{\mathbb{R}}(x) = 1$ for $\mu$-a.e. $x$.
Thus if we let $X_0$ denote those $x \in X$ such that $\nu_{\mathbb{R}}(x) = 1$ and $\nu_{A_1}(x) + \nu_{A_2}(x) = \nu_{A_1+A_2}(x)$ for all disjoint pairs, $(A_1,A_2) \in \mathcal{A}^2$, we have $\mu(X_0) = 1$ and, for each $x \in X_0$, $\mathcal{A} \ni A \to Q_0(x,A) := \nu_A(x)$ is a finitely additive probability measure on $\mathcal{A}$. According to Proposition 14.30, $Q_0(x,\cdot)$ extends to a probability measure, $Q(x,\cdot)$, on $\mathcal{B}_{\mathbb{R}}$ for all $x \in X_0$. For $x \notin X_0$ we let $Q(x,\cdot) := \delta_0$, where $\delta_0(B) = 1_B(0)$ for all $B \in \mathcal{B}_{\mathbb{R}}$.
We will now show that $Q$ is the desired probability kernel. To prove this, let $H$ be the collection of bounded measurable functions, $f : \mathbb{R} \to \mathbb{R}$, such that $X \ni x \to Q(x,f) \in \mathbb{R}$ is measurable and $E[f(Y)|X] = Q(X,f)$, $P$-a.s. By construction, $H$ contains the multiplicative system, $\{1_A : A \in \mathcal{A}\}$. Moreover it is easily seen that $H$ is a linear subspace which is closed under bounded convergence. Therefore by the multiplicative systems Theorem 8.2, $H$ consists of all bounded measurable functions on $\mathbb{R}$.
This result leads fairly immediately to the following far reaching generalization.
Theorem 14.32. Suppose that $(X,\mathcal{M})$ is a measurable space and $(Y,\mathcal{N})$ is a standard Borel space,³ see Appendix 9.10. Suppose that $X : \Omega \to X$ and $Y : \Omega \to Y$ are measurable functions. Then there exists a probability kernel, $Q$, on $X \times Y$ such that $E[f(Y)|X] = Q(X,f)$, $P$-a.s., for all bounded measurable functions, $f : Y \to \mathbb{R}$.
Proof. By definition of a standard Borel space, we may assume that $Y \in \mathcal{B}_{\mathbb{R}}$ and $\mathcal{N} = \mathcal{B}_Y$. In this case $Y$ may also be viewed to be a measurable map from $\Omega \to \mathbb{R}$ such that $Y(\Omega) \subset Y$. By Theorem 14.31, we may find a probability kernel, $Q_0$, on $X \times \mathbb{R}$ such that
$$E[f(Y)|X] = Q_0(X,f), \quad P\text{-a.s.,} \qquad (14.41)$$
for all bounded measurable functions, $f : \mathbb{R} \to \mathbb{R}$.
Taking $f = 1_Y$ in Eq. (14.41) shows
$$1 = E[1_Y(Y)|X] = Q_0(X,Y) \quad \text{a.s.}$$
³ According to the counter example in Doob [13, p. 624], it is not sufficient to assume that $\mathcal{N}$ is countably generated!
Thus if we let $X_0 := \{ x \in X : Q_0(x,Y) = 1 \}$, we know that $P(X \in X_0) = 1$. Let us now define
$$Q(x,B) := 1_{X_0}(x)\,Q_0(x,B) + 1_{X \setminus X_0}(x)\,\delta_y(B) \quad \text{for } (x,B) \in X \times \mathcal{B}_Y,$$
where $y$ is an arbitrary but fixed point in $Y$. Then $Q$ is a probability kernel on $X \times Y$. Moreover if $B \in \mathcal{B}_Y \subset \mathcal{B}_{\mathbb{R}}$, then
$$Q(X,B) = 1_{X_0}(X)\,Q_0(X,B) = 1_{X_0}(X)\,E[1_B(Y)|X] = E[1_B(Y)|X] \quad \text{a.s.}$$
This shows that $Q$ is the desired regular conditional probability.
Corollary 14.33. Suppose $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{B}$, $(Y,\mathcal{N})$ is a standard Borel space, and $Y : \Omega \to Y$ is a measurable function. Then there exists a probability kernel, $Q$, on $(\Omega,\mathcal{G}) \times (Y,\mathcal{N})$ such that $E[f(Y)|\mathcal{G}] = Q(\cdot,f)$, $P$-a.s., for all bounded measurable functions, $f : Y \to \mathbb{R}$.
Proof. This is a special case of Theorem 14.32 applied with $(X,\mathcal{M}) = (\Omega,\mathcal{G})$ and $X : \Omega \to \Omega$ being the identity map, which is $\mathcal{B}/\mathcal{G}$ measurable.
Corollary 14.34. Suppose that $(\Omega,\mathcal{B},P)$ is a probability space such that $(\Omega,\mathcal{B})$ is a standard Borel space and $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{B}$. Then there exists a probability kernel, $Q$, on $(\Omega,\mathcal{G}) \times (\Omega,\mathcal{B})$ such that $E[Z|\mathcal{G}] = Q(\cdot,Z)$, $P$-a.s., for all bounded $\mathcal{B}$-measurable random variables, $Z : \Omega \to \mathbb{R}$.
Proof. This is a special case of Corollary 14.33 with $(Y,\mathcal{N}) = (\Omega,\mathcal{B})$ and $Y : \Omega \to \Omega$ being the identity map, which is $\mathcal{B}/\mathcal{B}$ measurable.
Remark 14.35. It turns out that every standard Borel space $(X,\mathcal{M})$ possesses a countable sub-algebra $\mathcal{A}$ generating $\mathcal{M}$ with the property that every finitely additive probability measure on $\mathcal{A}$ extends to a probability measure on $\mathcal{M}$, see [7]. With this in hand, the second proof of Theorem 14.31 extends easily to give another proof of Theorem 14.32 all in one go. As the next example shows, it is a bit tricky to produce the algebra $\mathcal{A}$.
Example 14.36. Let $\Omega := \{0,1\}^{\mathbb{N}}$, let $\pi_i : \Omega \to \{0,1\}$ be projection onto the $i^{\text{th}}$ component, and let $\mathcal{B} := \sigma(\pi_1, \pi_2, \dots)$ be the product $\sigma$-algebra on $\Omega$. Further let $\mathcal{A} := \cup_{n=1}^{\infty} \mathcal{A}_n$, where
$$\mathcal{A}_n := \left\{ B \times \{0,1\}^{\mathbb{N} \setminus \{1,\dots,n\}} : B \subset \{0,1\}^n \right\} \quad \text{for all } n \in \mathbb{N}.$$
Suppose that $X = \{e_n\}_{n=1}^{\infty}$ where $e_n(i) = \delta_{in}$ for $i, n \in \mathbb{N}$. I now claim that
$$\mathcal{A}_X = \{ A \subset X : \#(A) < \infty \text{ or } \#(A^c) < \infty \} =: \mathcal{C}$$
is the so called cofinite algebra. To see this, observe that $\mathcal{A}$ is generated by sets of the form $\{\pi_i = 1\}$ for $i \in \mathbb{N}$. Therefore $\mathcal{A}_X$ is generated by sets of the form $\{\pi_i = 1\} \cap X = \{e_i\}$. But these one point sets are easily seen to generate $\mathcal{C}$.
Now suppose that $\varphi : X \to [0,1]$ is a function such that $Z := \sum_{n \in \mathbb{N}} \varphi(e_n) \in (0,1)$, and let $\lambda(B) := \sum_{a \in B} \varphi(a)$ for all $B \subset X$. Then $\lambda$ is a measure on $2^X$ with $\lambda(X) = Z < 1$.
Using this measure $\lambda$, we may define $P_0 : \mathcal{A}_X = \mathcal{C} \to [0,1]$ by,
$$P_0(A) := \begin{cases} \lambda(A) & \text{if } \#(A) < \infty, \\ 1 - \lambda(A^c) & \text{if } \#(A^c) < \infty. \end{cases}$$
I claim that $P_0$ is a finitely additive probability measure on $\mathcal{A}_X = \mathcal{C}$ which has no extension to a countably additive probability measure on $2^X$. To see that $P_0$ is finitely additive, let $A, B \in \mathcal{C}$ be disjoint sets. If both $A$ and $B$ are finite sets, then
$$P_0(A \cup B) = \lambda(A \cup B) = \lambda(A) + \lambda(B) = P_0(A) + P_0(B).$$
If one of the sets is an infinite set, say $B$, then $\#(B^c) < \infty$ and $\#(A) < \infty$, for otherwise $A \cap B \ne \emptyset$. As $A \cap B = \emptyset$ we know that $A \subset B^c$ and therefore,
$$P_0(A \cup B) = 1 - \lambda\left( [A \cup B]^c \right) = 1 - \lambda(A^c \cap B^c) = 1 - \lambda(B^c \setminus A) = 1 - \left( \lambda(B^c) - \lambda(A) \right) = 1 - \lambda(B^c) + \lambda(A) = P_0(B) + P_0(A).$$
Thus we have shown that $P_0 : \mathcal{A}_X \to [0,1]$ is a finitely additive probability measure. If $P$ were a countably additive extension of $P_0$, we would have to have,
$$1 = P_0(X) = P(X) = \sum_{n=1}^{\infty} P(\{e_n\}) = \sum_{n=1}^{\infty} P_0(\{e_n\}) = \sum_{n=1}^{\infty} \varphi(e_n) = Z < 1,$$
which is clearly a contradiction.
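The arithmetic of this example is easy to machine-check with exact rationals. Below we take $\varphi(e_n) = \tfrac{1}{2} 4^{-n}$, so $Z = 1/6$ (our illustrative choice; any summable $\varphi$ with $Z \in (0,1)$ works), encode a set in $\mathcal{C}$ by a finite set together with a cofinite flag, and confirm both a finite-additivity instance and the countable-additivity failure:

```python
from fractions import Fraction

# phi(e_n) = (1/2) * 4^{-n}, so Z = sum_n phi(e_n) = 1/6 < 1 (an
# illustrative choice).  A set A in the cofinite algebra C is encoded
# as (S, cofinite): A = S if cofinite is False, else A = S^c.
phi = lambda n: Fraction(1, 2) * Fraction(1, 4) ** n
lam = lambda S: sum(phi(n) for n in S)      # lam of a finite set

def P0(S, cofinite):
    return 1 - lam(S) if cofinite else lam(S)

# Finite additivity on a disjoint pair: A = {e_1, e_3} (finite) and
# B = complement of {e_1,...,e_4} (cofinite), so A u B is the
# complement of {e_2, e_4}.
assert P0({1, 3}, False) + P0({1, 2, 3, 4}, True) == P0({2, 4}, True)

# Countable additivity fails: the singletons {e_n} partition X, yet
# every partial sum of P0({e_n}) stays below Z = 1/6 while P0(X) = 1.
Z_partial = sum(phi(n) for n in range(1, 60))
assert Z_partial < Fraction(1, 6) and P0(set(), True) == 1
```

Working in `Fraction` makes the additivity identities exact equalities rather than floating point approximations.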
There is however a way to fix this example, as shown in [7]. It is to replace $\mathcal{A}_X$ in this example by the algebra, $\mathcal{A}$, generated by $\mathcal{E} := \{ \{n\} : n \ge 2 \}$. This algebra may be described as those $A \subset \mathbb{N}$ such that either $A \subset \{2,3,\dots\}$ with $\#(A) < \infty$, or $1 \in A$ with $\#(A^c) < \infty$. Thus if $A_k \in \mathcal{A}$ with $A_k \downarrow \emptyset$, we must have that $1 \notin A_k$ for $k$ large and therefore $\#(A_k) < \infty$ for $k$ large. Moreover $\#(A_k)$ is decreasing in $k$. If $\lim_{k\to\infty} \#(A_k) = m > 0$, we must have that $A_k = A_l$ for all $k, l$ large and therefore $\cap_k A_k \ne \emptyset$. Thus we must conclude that $A_k = \emptyset$ for large $k$. We therefore may conclude that any finitely additive probability measure, $P_0$, on $\mathcal{A}$ has a unique extension to a probability measure on $\sigma(\mathcal{A}) = 2^{\mathbb{N}}$.
15
The Radon-Nikodym Theorem
Theorem 15.1 (A Baby Radon-Nikodym Theorem). Suppose $(X,\mathcal{M})$ is a measurable space and $\lambda$ and $\nu$ are two finite positive measures on $\mathcal{M}$ such that $\lambda(A) \le \nu(A)$ for all $A \in \mathcal{M}$. Then there exists a measurable function, $\rho : X \to [0,1]$, such that $d\lambda = \rho\,d\nu$.
Proof. If $f$ is a non-negative simple function, then
$$\lambda(f) = \sum_{a \ge 0} a\,\lambda(f = a) \le \sum_{a \ge 0} a\,\nu(f = a) = \nu(f).$$
In light of Theorem 6.39 and the MCT, this inequality continues to hold for all non-negative measurable functions. Furthermore, if $f \in L^1(\nu)$, then $\lambda(|f|) \le \nu(|f|) < \infty$, hence $f \in L^1(\lambda)$, and
$$|\lambda(f)| \le \lambda(|f|) \le \nu(|f|) \le \nu(X)^{1/2}\,\|f\|_{L^2(\nu)}.$$
Therefore, $L^2(\nu) \ni f \to \lambda(f) \in \mathbb{C}$ is a continuous linear functional on $L^2(\nu)$. By the Riesz representation Theorem 13.15, there exists a unique $\rho \in L^2(\nu)$ such that
$$\lambda(f) = \int_X f \rho\,d\nu \quad \text{for all } f \in L^2(\nu).$$
In particular this equation holds for all bounded measurable functions, $f : X \to \mathbb{R}$, and for such a function we have
$$\lambda(f) = \operatorname{Re} \lambda(f) = \operatorname{Re} \int_X f \rho\,d\nu = \int_X f \operatorname{Re} \rho\,d\nu. \qquad (15.1)$$
Thus by replacing $\rho$ by $\operatorname{Re} \rho$ if necessary we may assume $\rho$ is real.
Taking $f = 1_{\rho < 0}$ in Eq. (15.1) shows
$$0 \le \lambda(\rho < 0) = \int_X 1_{\rho < 0}\,\rho\,d\nu \le 0,$$
from which we conclude that $1_{\rho < 0}\,\rho = 0$, $\nu$-a.e., i.e. $\nu(\rho < 0) = 0$. Therefore $\rho \ge 0$, $\nu$-a.e. Similarly for $\alpha > 1$,
$$\nu(\rho > \alpha) \ge \lambda(\rho > \alpha) = \int_X 1_{\rho > \alpha}\,\rho\,d\nu \ge \alpha\,\nu(\rho > \alpha),$$
which is possible iff $\nu(\rho > \alpha) = 0$. Letting $\alpha \downarrow 1$, it follows that $\nu(\rho > 1) = 0$ and hence $0 \le \rho \le 1$, $\nu$-a.e.
Definition 15.2. Let $\mu$ and $\nu$ be two positive measures on a measurable space, $(X,\mathcal{M})$. Then:
1. $\mu$ and $\nu$ are mutually singular (written as $\mu \perp \nu$) if there exists $A \in \mathcal{M}$ such that $\nu(A) = 0$ and $\mu(A^c) = 0$. We say that $\mu$ lives on $A$ and $\nu$ lives on $A^c$.
2. The measure $\nu$ is absolutely continuous relative to $\mu$ (written as $\nu \ll \mu$) provided $\nu(A) = 0$ whenever $\mu(A) = 0$.
As an example, suppose that $\mu$ is a positive measure and $\rho \ge 0$ is a measurable function. Then the measure, $\nu := \rho\mu$, is absolutely continuous relative to $\mu$. Indeed, if $\mu(A) = 0$ then
$$\nu(A) = \int_A \rho\,d\mu = 0.$$
We will eventually show that if $\mu$ and $\nu$ are $\sigma$-finite and $\nu \ll \mu$, then $d\nu = \rho\,d\mu$ for some measurable function, $\rho \ge 0$.
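On a finite set this example is completely explicit: with weights $\mu$ and $\nu := \rho\mu$, every $\mu$-null set is $\nu$-null and the density is recovered by pointwise division wherever $\mu$ charges a point. A sketch with exact rationals (illustrative data of our own):

```python
import itertools
from fractions import Fraction as F

# mu is a measure on a four point set, rho >= 0, and nu := rho * mu.
X = ["a", "b", "c", "d"]
mu = {"a": F(1), "b": F(2), "c": F(0), "d": F(3)}
rho = {"a": F(1, 2), "b": F(3), "c": F(7), "d": F(0)}
nu = {x: rho[x] * mu[x] for x in X}

measure = lambda m, A: sum(m[x] for x in A)

# nu << mu: every mu-null subset is nu-null (checked over all subsets).
for r in range(len(X) + 1):
    for A in itertools.combinations(X, r):
        if measure(mu, A) == 0:
            assert measure(nu, A) == 0

# The density rho is recovered wherever mu puts mass.
for x in X:
    if mu[x] > 0:
        assert nu[x] / mu[x] == rho[x]
```

Note that $\rho$ is only determined $\mu$-a.e.: at the point "c", where $\mu$ vanishes, any value of $\rho$ gives the same $\nu$.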
Definition 15.3 (Lebesgue Decomposition). Let $\mu$ and $\nu$ be two positive measures on a measurable space, $(X,\mathcal{M})$. Two positive measures $\nu_a$ and $\nu_s$ form a Lebesgue decomposition of $\nu$ relative to $\mu$ if $\nu = \nu_a + \nu_s$, $\nu_a \ll \mu$, and $\nu_s \perp \mu$.
Lemma 15.4. If $\mu_1, \mu_2$ and $\nu$ are positive measures on $(X,\mathcal{M})$ such that $\mu_1 \perp \nu$ and $\mu_2 \perp \nu$, then $(\mu_1 + \mu_2) \perp \nu$. More generally, if $\{\mu_i\}_{i=1}^{\infty}$ is a sequence of positive measures such that $\mu_i \perp \nu$ for all $i$, then $\mu = \sum_{i=1}^{\infty} \mu_i$ is singular relative to $\nu$.
Proof. It suffices to prove the second assertion since we can then take $\mu_j \equiv 0$ for all $j \ge 3$. Choose $A_i \in \mathcal{M}$ such that $\nu(A_i) = 0$ and $\mu_i(A_i^c) = 0$ for all $i$. Letting $A := \cup_i A_i$, we have $\nu(A) = 0$. Moreover, since $A^c = \cap_i A_i^c \subset A_m^c$ for all $m$, we have $\mu_i(A^c) = 0$ for all $i$ and therefore, $\mu(A^c) = 0$. This shows that $\mu \perp \nu$.
Lemma 15.5. Let $\mu$ and $\nu$ be positive measures on $(X,\mathcal{M})$. If there exists a Lebesgue decomposition, $\nu = \nu_s + \nu_a$, of the measure $\nu$ relative to $\mu$, then this decomposition is unique. Moreover, if $\nu$ is a $\sigma$-finite measure then so are $\nu_s$ and $\nu_a$.
Proof. Since $\nu_s \perp \mu$, there exists $A \in \mathcal{M}$ such that $\mu(A) = 0$ and $\nu_s(A^c) = 0$, and because $\nu_a \ll \mu$, we also know that $\nu_a(A) = 0$. So for $C \in \mathcal{M}$,
$$\nu(C \cap A) = \nu_s(C \cap A) + \nu_a(C \cap A) = \nu_s(C \cap A) = \nu_s(C) \qquad (15.2)$$
and
$$\nu(C \cap A^c) = \nu_s(C \cap A^c) + \nu_a(C \cap A^c) = \nu_a(C \cap A^c) = \nu_a(C). \qquad (15.3)$$
Now suppose we have another Lebesgue decomposition, $\nu = \tilde{\nu}_a + \tilde{\nu}_s$, with $\tilde{\nu}_s \perp \mu$ and $\tilde{\nu}_a \ll \mu$. Working as above, we may choose $\tilde{A} \in \mathcal{M}$ such that $\mu(\tilde{A}) = 0$ and $\tilde{A}^c$ is $\tilde{\nu}_s$-null. Then $B := A \cup \tilde{A}$ is still a $\mu$-null set, and $B^c = A^c \cap \tilde{A}^c$ is a null set for both $\nu_s$ and $\tilde{\nu}_s$. Therefore we may use Eqs. (15.2) and (15.3) with $A$ replaced by $B$ to conclude,
$$\tilde{\nu}_s(C) = \nu(C \cap B) = \nu_s(C) \quad \text{and} \quad \tilde{\nu}_a(C) = \nu(C \cap B^c) = \nu_a(C) \quad \text{for all } C \in \mathcal{M}.$$
Lastly, if $\nu$ is a $\sigma$-finite measure, then there exist $X_n \in \mathcal{M}$ such that $X = \sum_{n=1}^{\infty} X_n$ (a disjoint union) and $\nu(X_n) < \infty$ for all $n$. Since $\infty > \nu(X_n) = \nu_a(X_n) + \nu_s(X_n)$, we must have $\nu_a(X_n) < \infty$ and $\nu_s(X_n) < \infty$, showing $\nu_a$ and $\nu_s$ are $\sigma$-finite as well.
Lemma 15.6. Suppose $\mu$ is a positive measure on $(X,\mathcal{M})$ and $f, g : X \to [0,\infty]$ are functions such that the measures, $f\,d\mu$ and $g\,d\mu$, are $\sigma$-finite and further satisfy,
$$\int_A f\,d\mu = \int_A g\,d\mu \quad \text{for all } A \in \mathcal{M}. \qquad (15.4)$$
Then $f(x) = g(x)$ for $\mu$-a.e. $x$. (BRUCE: this lemma is very closely related to Lemma 7.24 above.)
Proof. By assumption there exist $X_n \in \mathcal{M}$ such that $X_n \uparrow X$, $\int_{X_n} f\,d\mu < \infty$, and $\int_{X_n} g\,d\mu < \infty$ for all $n$. Replacing $A$ by $A \cap X_n$ in Eq. (15.4) implies
$$\int_A 1_{X_n} f\,d\mu = \int_{A \cap X_n} f\,d\mu = \int_{A \cap X_n} g\,d\mu = \int_A 1_{X_n} g\,d\mu$$
for all $A \in \mathcal{M}$. Since $1_{X_n} f$ and $1_{X_n} g$ are in $L^1(\mu)$ for all $n$, this equation implies $1_{X_n} f = 1_{X_n} g$, $\mu$-a.e. Letting $n \to \infty$ then shows that $f = g$, $\mu$-a.e.
Remark 15.7. Lemma 15.6 is in general false without the $\sigma$-finiteness assumption. A trivial counterexample is to take $\mathcal{M} = 2^X$, $\mu(A) = \infty$ for all non-empty $A \in \mathcal{M}$, $f = 1_X$, and $g = 2 \cdot 1_X$. Then Eq. (15.4) holds yet $f \ne g$.
Theorem 15.8 (Radon-Nikodym Theorem for Positive Measures). Suppose that $\mu$ and $\nu$ are $\sigma$-finite positive measures on $(X,\mathcal{M})$. Then $\nu$ has a unique Lebesgue decomposition $\nu = \nu_a + \nu_s$ relative to $\mu$ and there exists a unique (modulo sets of $\mu$-measure $0$) function $\rho : X \to [0,\infty)$ such that $d\nu_a = \rho\,d\mu$. Moreover, $\nu_s = 0$ iff $\nu \ll \mu$.
Proof. The uniqueness assertions follow directly from Lemmas 15.5 and 15.6.
Existence when $\mu$ and $\nu$ are both finite measures. (Von Neumann's proof; see Remark 15.9 for the motivation for this proof.) First suppose that $\mu$ and $\nu$ are finite measures and let $\lambda := \mu + \nu$. By Theorem 15.1, $d\nu = h\,d\lambda$ with $0 \le h \le 1$, and this implies, for all non-negative measurable functions $f$, that
$$\nu(f) = \lambda(fh) = \mu(fh) + \nu(fh) \qquad (15.5)$$
or equivalently
$$\nu(f(1-h)) = \mu(fh). \qquad (15.6)$$
Taking $f = 1_{\{h=1\}}$ in Eq. (15.6) shows that
$$\mu(h = 1) = \nu\left( 1_{\{h=1\}}(1-h) \right) = 0,$$
i.e. $0 \le h(x) < 1$ for $\mu$-a.e. $x$. Let
$$\rho := 1_{\{h<1\}} \frac{h}{1-h},$$
and then take $f = g\,1_{\{h<1\}}(1-h)^{-1}$ with $g \ge 0$ in Eq. (15.6) to learn
$$\nu(g\,1_{\{h<1\}}) = \mu\left( g\,1_{\{h<1\}}(1-h)^{-1} h \right) = \mu(\rho g).$$
Hence if we define
$$\nu_a := 1_{\{h<1\}} \nu \quad \text{and} \quad \nu_s := 1_{\{h=1\}} \nu,$$
we then have $\nu_s \perp \mu$ (since $\nu_s$ lives on $\{h = 1\}$ while $\mu(h = 1) = 0$) and $\nu_a = \rho\mu$, and in particular $\nu_a \ll \mu$. Hence $\nu = \nu_a + \nu_s$ is the desired Lebesgue decomposition of $\nu$. If we further assume that $\nu \ll \mu$, then $\mu(h = 1) = 0$ implies $\nu(h = 1) = 0$, hence $\nu_s = 0$, and we conclude that $\nu = \nu_a = \rho\mu$.
Existence when $\mu$ and $\nu$ are $\sigma$-finite measures. Write $X = \sum_{n=1}^{\infty} X_n$ where $X_n \in \mathcal{B}$ are chosen so that $\mu(X_n) < \infty$ and $\nu(X_n) < \infty$ for all $n.$ Let $d\mu_n = 1_{X_n}\, d\mu$ and $d\nu_n = 1_{X_n}\, d\nu.$ Then by what we have just proved there exist $\rho_n \in L^1(X, \mu_n) \subset L^1(X, \mu)$ and measures $\nu_n^s$ such that $d\nu_n = \rho_n\, d\mu_n + d\nu_n^s$ with $\nu_n^s \perp \mu_n.$ Since $\mu_n$ and $\nu_n^s$ live on $X_n$ there exists $A_n \in \mathcal{B}_{X_n}$ such that $\mu(A_n) = \mu_n(A_n) = 0$ and
Page: 222 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
\[
\nu_n^s(X \setminus A_n) = \nu_n^s(X_n \setminus A_n) = 0.
\]
This shows that $\nu_n^s \perp \mu$ for all $n$ and so by Lemma 15.4, $\nu_s := \sum_{n=1}^{\infty} \nu_n^s$ is singular relative to $\mu.$ Since
\[
\nu = \sum_{n=1}^{\infty} \nu_n = \sum_{n=1}^{\infty} \big(\rho_n\,\mu_n + \nu_n^s\big) = \sum_{n=1}^{\infty} \big(\rho_n 1_{X_n}\,\mu + \nu_n^s\big) = \rho\,\mu + \nu_s, \tag{15.7}
\]
where $\rho := \sum_{n=1}^{\infty} 1_{X_n}\rho_n,$ it follows that $\nu = \nu_a + \nu_s$ with $\nu_a = \rho\,\mu.$ Hence this is the desired Lebesgue decomposition of $\nu$ relative to $\mu.$
Remark 15.9. Here is the motivation for the above construction. Suppose that $d\nu = d\nu_s + \rho\, d\mu$ is the Radon–Nikodym decomposition and $X = A \sqcup B$ such that $\nu_s(B) = 0$ and $\mu(A) = 0.$ Then we find
\[
\nu_s(f) + \mu(\rho f) = \nu(f) = \lambda(hf) = \nu(hf) + \mu(hf).
\]
Letting $f \to 1_A f$ then implies that
\[
\nu(1_A f) = \nu_s(1_A f) = \nu(1_A h f)
\]
which shows that $h = 1,$ $\nu$-a.e. on $A.$ Also letting $f \to 1_B f$ implies that
\[
\mu(\rho\, 1_B f) = \nu(h\, 1_B f) + \mu(h\, 1_B f) = \mu(\rho\, h\, 1_B f) + \mu(h\, 1_B f)
\]
which implies, $\mu$-a.e. on $B,$ that $\rho = \rho h + h,$ i.e.
\[
\rho\,(1 - h) = h, \quad \mu\text{-a.e. on } B.
\]
In particular it follows that $h < 1,$ $\mu$-a.e. on $B$ and that $\rho = \frac{h}{1-h} 1_{\{h<1\}},$ $\mu$-a.e. So up to sets of measure zero, $A = \{h = 1\}$ and $B = \{h < 1\}$ and therefore,
\[
d\nu = 1_{\{h=1\}}\, d\nu + 1_{\{h<1\}}\, d\nu = 1_{\{h=1\}}\, d\nu + \frac{h}{1-h}\, 1_{\{h<1\}}\, d\mu.
\]
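To see the von Neumann construction in action, here is a small numerical sketch on a three-point space; the measures `mu` and `nu` below are my own toy data, not taken from the text.

```python
# Sketch of von Neumann's proof of Radon-Nikodym on a three-point space.
# The measures mu and nu are illustrative toy data (not from the text).
mu = {0: 1.0, 1: 2.0, 2: 0.0}
nu = {0: 3.0, 1: 0.5, 2: 4.0}
lam = {x: mu[x] + nu[x] for x in mu}                   # lambda = mu + nu
h = {x: nu[x] / lam[x] for x in mu}                    # d(nu) = h d(lambda), 0 <= h <= 1
rho = {x: (h[x] / (1 - h[x]) if h[x] < 1 else 0.0) for x in mu}
nu_a = {x: rho[x] * mu[x] for x in mu}                 # absolutely continuous part
nu_s = {x: (nu[x] if h[x] == 1 else 0.0) for x in mu}  # singular part, lives on {h = 1}
for x in mu:
    assert abs(nu[x] - (nu_a[x] + nu_s[x])) < 1e-12    # nu = nu_a + nu_s
    assert not (mu[x] > 0 and nu_s[x] > 0)             # nu_s and mu have disjoint supports
```

Here $\{h = 1\}$ is exactly the point where $\mu$ vanishes, so the singular part is carried there, matching the decomposition in the proof.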
16
Some Ergodic Theory
The goal of this chapter is to show (in certain circumstances) that time averages are the same as spatial averages. We start with a simple Hilbert space version of the type of theorem that we are after. For more on the following mean ergodic theorem, see [22] and [14].
Theorem 16.1 (Von Neumann's Mean Ergodic Theorem). Let $U : H \to H$ be an isometry on a Hilbert space $H,$ $M = \operatorname{Nul}(U - I),$ $P = P_M$ be orthogonal projection onto $M,$ and $S_n = \sum_{k=0}^{n-1} U^k.$ Then $\frac{S_n}{n} \to P_M$ strongly, by which we mean $\lim_{n\to\infty} \frac{S_n}{n} x = P_M x$ for all $x \in H.$
Proof. Since $U$ is an isometry we have $(Ux, Uy) = (x, y)$ for all $x, y \in H$ and therefore $U^* U = I.$ In general it is not true that $UU^* = I;$ instead, $UU^* = P_{\operatorname{Ran}(U)}.$ Thus $UU^* = I$ iff $U$ is surjective, i.e. $U$ is unitary.

Before starting the proof in earnest we need to prove
\[
\operatorname{Nul}(U^* - I) = \operatorname{Nul}(U - I).
\]
If $x \in \operatorname{Nul}(U - I)$ then $x = Ux$ and therefore $U^* x = U^* U x = x,$ i.e. $x \in \operatorname{Nul}(U^* - I).$ Conversely if $x \in \operatorname{Nul}(U^* - I)$ then $U^* x = x$ and we have
\[
\|Ux - x\|^2 = 2\|x\|^2 - 2\operatorname{Re}(Ux, x) = 2\|x\|^2 - 2\operatorname{Re}(x, U^* x) = 2\|x\|^2 - 2\operatorname{Re}(x, x) = 0
\]
which shows that $Ux = x,$ i.e. $x \in \operatorname{Nul}(U - I).$ With this remark in hand we can easily complete the proof.

Let us first observe that
\[
\frac{S_n}{n}(U - I) = \frac{1}{n}\left[U^n - I\right] \to 0 \text{ as } n \to \infty.
\]
Thus if $x = (U - I)y \in \operatorname{Ran}(U - I),$ we have
\[
\frac{S_n}{n} x = \frac{1}{n}(U^n y - y) \to 0 \text{ as } n \to \infty.
\]
More generally if $x \in \overline{\operatorname{Ran}(U - I)}$ and $x' \in \operatorname{Ran}(U - I),$ we have, since $\left\|\frac{S_n}{n}\right\| \le 1,$ that
\[
\left\|\frac{S_n}{n} x - \frac{S_n}{n} x'\right\| \le \|x - x'\|
\]
and hence
\[
\limsup_{n\to\infty} \left\|\frac{S_n}{n} x\right\| = \limsup_{n\to\infty} \left\|\frac{S_n}{n} x - \frac{S_n}{n} x'\right\| \le \|x - x'\|.
\]
Letting $x' \in \operatorname{Ran}(U - I)$ tend to $x \in \overline{\operatorname{Ran}(U - I)}$ allows us to conclude that $\limsup_{n\to\infty} \left\|\frac{S_n}{n} x\right\| = 0.$

For
\[
x \in \operatorname{Ran}(U - I)^\perp = \overline{\operatorname{Ran}(U - I)}^{\,\perp} = \operatorname{Nul}(U^* - I) = \operatorname{Nul}(U - I) = M
\]
we have $\frac{S_n}{n} x = x.$ So for general $x \in H,$ we have $x = P_M x + y$ with $y \in M^\perp = \overline{\operatorname{Ran}(U - I)}$ and therefore,
\[
\frac{S_n}{n} x = \frac{S_n}{n} P_M x + \frac{S_n}{n} y = P_M x + \frac{S_n}{n} y \to P_M x \text{ as } n \to \infty.
\]
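As a quick sanity check of the theorem, one can run a toy computation (my own example, not from the text): the cyclic shift on $\mathbb{R}^3$ is an isometry whose fixed space $M$ consists of the constant vectors, so $P_M x$ replaces each coordinate by the coordinate average.

```python
# Numerical sanity check of the mean ergodic theorem with a toy isometry:
# U is the cyclic shift on R^3, so Nul(U - I) = constant vectors and
# P_M x = (mean(x), mean(x), mean(x)).
def U(x):
    return [x[1], x[2], x[0]]          # cyclic shift; clearly norm preserving

x = [1.0, 2.0, 6.0]
n = 3000
avg = [0.0, 0.0, 0.0]
y = x
for _ in range(n):                     # accumulate S_n x = sum_{k<n} U^k x
    avg = [a + yi for a, yi in zip(avg, y)]
    y = U(y)
avg = [a / n for a in avg]             # S_n x / n
mean = sum(x) / 3.0                    # P_M x has this value in each slot
assert all(abs(a - mean) < 1e-9 for a in avg)
```

For a periodic isometry the averages converge exactly once $n$ is a multiple of the period; for a general isometry one only gets convergence in the limit.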
For the rest of this section, suppose that $(\Omega, \mathcal{B}, \mu)$ is a $\sigma$-finite measure space and $\theta : \Omega \to \Omega$ is a measurable map such that $\theta_*\mu = \mu.$ After Theorem 16.6 we will further assume that $\mu = P$ is a probability measure. For more results along the lines of this chapter, the reader is referred to Kallenberg [28, Chapter 10]. The reader may also benefit from Norris's notes in [44].
Definition 16.2. Let
\[
\mathcal{B}_\theta := \left\{ A \in \mathcal{B} : \theta^{-1}(A) = A \right\}
\]
and
\[
\mathcal{B}'_\theta := \left\{ A \in \mathcal{B} : \mu\big(\theta^{-1}(A) \triangle A\big) = 0 \right\}
\]
be the invariant $\sigma$-field and almost invariant $\sigma$-field respectively.
In what follows we will make use of the following easily proved set identities. Let $\{A_n\}_{n=1}^{\infty},$ $\{B_n\}_{n=1}^{\infty},$ and $A, B, C$ be a collection of subsets of $\Omega,$ then;

1. $A \triangle C \subset [A \triangle B] \cup [B \triangle C],$
2. $\left[\cup_{n=1}^{\infty} A_n\right] \triangle \left[\cup_{n=1}^{\infty} B_n\right] \subset \cup_{n=1}^{\infty} \left[A_n \triangle B_n\right],$
3. $\left[\cap_{n=1}^{\infty} A_n\right] \triangle \left[\cap_{n=1}^{\infty} B_n\right] \subset \cup_{n=1}^{\infty} \left[A_n \triangle B_n\right],$
4. $B \triangle \{A_n \text{ i.o.}\} \subset \cup_{n=1}^{\infty} \left[B \triangle A_n\right].$
Lemma 16.3. The elements of $\mathcal{B}'_\theta$ are the same as the elements in $\mathcal{B}_\theta$ modulo null sets, i.e.
\[
\mathcal{B}'_\theta = \left\{ B \in \mathcal{B} : \exists\, A \in \mathcal{B}_\theta \text{ such that } \mu(A \triangle B) = 0 \right\}.
\]
Moreover if $B \in \mathcal{B}'_\theta,$ then
\[
A := \left\{ \omega \in \Omega : \theta^k(\omega) \in B \text{ i.o. } k \right\} \in \mathcal{B}_\theta \tag{16.1}
\]
and $\mu(A \triangle B) = 0.$ (We could have just as well taken $A$ to be equal to $\left\{ \omega : \theta^k(\omega) \in B \text{ a.a. } k \right\}.$)
Proof. If $A \in \mathcal{B}_\theta$ and $B \in \mathcal{B}$ such that $\mu(A \triangle B) = 0,$ then
\[
\mu\big(A \triangle \theta^{-1}(B)\big) = \mu\big(\theta^{-1}(A) \triangle \theta^{-1}(B)\big) = \mu\big(\theta^{-1}(A \triangle B)\big) = \mu(A \triangle B) = 0
\]
and therefore it follows that
\[
\mu\big(B \triangle \theta^{-1}(B)\big) \le \mu(B \triangle A) + \mu\big(A \triangle \theta^{-1}(B)\big) = 0.
\]
This shows that $B \in \mathcal{B}'_\theta.$

Conversely if $B \in \mathcal{B}'_\theta$ then by the invariance of $\mu$ under $\theta$ it follows that $\mu\big(\theta^{-l}(B) \triangle \theta^{-(l+1)}(B)\big) = 0$ for all $l = 0, 1, 2, 3, \ldots.$ In particular we learn that
\[
\mu\big(\theta^{-k}(B) \triangle B\big) \le \mu\Big( \bigcup_{l=0}^{k-1} \big[\theta^{-l}(B) \triangle \theta^{-(l+1)}(B)\big] \Big) \le \sum_{l=0}^{k-1} \mu\big(\theta^{-l}(B) \triangle \theta^{-(l+1)}(B)\big) = 0.
\]
Thus if $A = \left\{ \theta^{-k}(B) \text{ i.o. } k \right\}$ as in Eq. (16.1) we have,
\[
\mu(B \triangle A) \le \sum_{k=1}^{\infty} \mu\big(B \triangle \theta^{-k}(B)\big) = 0.
\]
This completes the proof since
\[
\theta^{-1}(A) = \left\{ \omega : \theta^{k+1}(\omega) \in B \text{ i.o. } k \right\} = A
\]
and thus $A \in \mathcal{B}_\theta.$
Definition 16.4. A $\mathcal{B}$-measurable function, $f : \Omega \to \mathbb{R},$ is (almost) invariant iff $f \circ \theta = f$ ($f \circ \theta = f$ a.s.).
Lemma 16.5. A $\mathcal{B}$-measurable function, $f : \Omega \to \mathbb{R},$ is (almost) invariant iff $f$ is $\mathcal{B}_\theta$ ($\mathcal{B}'_\theta$) measurable. Moreover, if $f$ is almost invariant, then there exists an invariant function, $g : \Omega \to \mathbb{R},$ such that $f = g,$ $\mu$-a.e. (This latter assertion has already been explained in Exercises 12.3 and 12.4.)
Proof. If $f$ is invariant, $f \circ \theta = f,$ then $\theta^{-1}(\{f \le x\}) = \{f \circ \theta \le x\} = \{f \le x\},$ which shows that $\{f \le x\} \in \mathcal{B}_\theta$ for all $x \in \mathbb{R}$ and therefore $f$ is $\mathcal{B}_\theta$-measurable. Similarly if $f$ is almost invariant so that $f \circ \theta = f$ ($\mu$-a.e.), then
\[
\mu\big( \theta^{-1}(\{f \le x\}) \triangle \{f \le x\} \big) = \mu\big( \{f \circ \theta \le x\} \triangle \{f \le x\} \big) = \mu\big( 1_{(-\infty, x]} \circ f \circ \theta \neq 1_{(-\infty, x]} \circ f \big) = 0
\]
from which it follows that $\{f \le x\} \in \mathcal{B}'_\theta$ for all $x \in \mathbb{R},$ that is $f$ is $\mathcal{B}'_\theta$-measurable.

Conversely if $f : \Omega \to \mathbb{R}$ is $\mathcal{B}_\theta$ ($\mathcal{B}'_\theta$)-measurable, then for all $-\infty < a < b < \infty,$ $\{a < f \le b\} \in \mathcal{B}_\theta$ ($\mathcal{B}'_\theta$) from which it follows that $1_{\{a < f \le b\}}$ is (almost) invariant. Thus for every $N \in \mathbb{N}$ the function defined by;
\[
f_N := \sum_{n=-N^2}^{N^2} \frac{n}{N}\, 1_{\left\{\frac{n-1}{N} < f \le \frac{n}{N}\right\}},
\]
is (almost) invariant. As $f = \lim_{N\to\infty} f_N,$ it follows that $f$ is (almost) invariant as well.

In the case where $f$ is almost invariant, we can choose $D_N(n) \in \mathcal{B}_\theta$ such that
\[
\mu\Big( D_N(n) \triangle \Big\{\frac{n-1}{N} < f \le \frac{n}{N}\Big\} \Big) = 0 \text{ for all } n \text{ and } N
\]
and then set
\[
g_N := \sum_{n=-N^2}^{N^2} \frac{n}{N}\, 1_{D_N(n)}.
\]
We then have $g_N = f_N$ a.e. and $g_N$ is $\mathcal{B}_\theta$-measurable. We may thus conclude that $\bar g := \limsup_{N\to\infty} g_N$ is $\mathcal{B}_\theta$-measurable. It now follows that $g := \bar g\, 1_{\{|\bar g| < \infty\}}$ is a $\mathcal{B}_\theta$-measurable function such that $g = f$ a.e.
Theorem 16.6. Suppose that $(\Omega, \mathcal{B}, \mu)$ is a $\sigma$-finite measure space and $\theta : \Omega \to \Omega$ is a measurable map such that $\theta_*\mu = \mu.$ Then;

1. $U : L^2(\mu) \to L^2(\mu)$ defined by $Uf := f \circ \theta$ is an isometry. The isometry $U$ is unitary if $\theta^{-1}$ exists as a measurable map.
2. The map,
\[
L^2(\Omega, \mathcal{B}_\theta, \mu) \ni f \to f \in \operatorname{Nul}(U - I)
\]
is unitary. In other words, $Uf = f$ iff there exists $g \in L^2(\Omega, \mathcal{B}_\theta, \mu)$ such that $f = g$ a.e.
3. For every $f \in L^2(\mu)$ we have,
\[
L^2(\mu)\text{--}\lim_{n\to\infty} \frac{f + f \circ \theta + \cdots + f \circ \theta^{n-1}}{n} = E_{\mathcal{B}_\theta} f
\]
where $E_{\mathcal{B}_\theta}$ denotes orthogonal projection from $L^2(\Omega, \mathcal{B}, \mu)$ onto $L^2(\Omega, \mathcal{B}_\theta, \mu),$ i.e. $E_{\mathcal{B}_\theta}$ is conditional expectation.
Proof. 1. To see that $U$ is an isometry observe that
\[
\|Uf\|^2 = \int_\Omega |f \circ \theta|^2\, d\mu = \int_\Omega |f|^2\, d(\theta_*\mu) = \int_\Omega |f|^2\, d\mu = \|f\|^2
\]
for all $f \in L^2(\mu).$

2. $f \in \operatorname{Nul}(U - I)$ iff $f = Uf = f \circ \theta$ a.e., i.e. iff $f$ is almost invariant. According to Lemma 16.5 this happens iff there exists a $\mathcal{B}_\theta$-measurable function, $g,$ such that $f = g$ a.e. Necessarily, $g \in L^2(\mu)$ so that $g \in L^2(\Omega, \mathcal{B}_\theta, \mu)$ as required.

3. The last assertion now follows from items 1. and 2. and the mean ergodic Theorem 16.1.
Assumption 1. From now on we will assume that $\mu = P$ is a probability measure such that $P \circ \theta^{-1} = P.$
Exercise 16.1. For every $Z \in L^1(P),$ show that $E[Z \circ \theta\,|\,\mathcal{B}_\theta] = E[Z\,|\,\mathcal{B}_\theta]$ a.s. More generally, show for any sub-$\sigma$-algebra, $\mathcal{G} \subset \mathcal{B},$ that $E\big[Z \circ \theta\,\big|\,\theta^{-1}\mathcal{G}\big] = E[Z\,|\,\mathcal{G}] \circ \theta$ a.s.
Solution to Exercise (16.1). First observe that $E[Z\,|\,\mathcal{G}] \circ \theta$ is $\theta^{-1}\mathcal{G}$-measurable, being the composition of
\[
\big(\Omega, \theta^{-1}\mathcal{G}\big) \xrightarrow{\ \theta\ } (\Omega, \mathcal{G}) \xrightarrow{\ E[Z|\mathcal{G}]\ } (\mathbb{R}, \mathcal{B}_{\mathbb{R}}).
\]
Now let $A \in \mathcal{G},$ then
\begin{align*}
E\big[ E\big[Z \circ \theta\,\big|\,\theta^{-1}\mathcal{G}\big] : 1_A \circ \theta \big] &= E[Z \circ \theta : 1_A \circ \theta] = E[(Z \cdot 1_A) \circ \theta] \\
&= E[Z \cdot 1_A] = E\big[E[Z\,|\,\mathcal{G}] \cdot 1_A\big] \\
&= E\big[(E[Z\,|\,\mathcal{G}] \cdot 1_A) \circ \theta\big] = E\big[ E[Z\,|\,\mathcal{G}] \circ \theta : 1_A \circ \theta \big].
\end{align*}
As $A \in \mathcal{G}$ is arbitrary, it follows that $E\big[Z \circ \theta\,\big|\,\theta^{-1}\mathcal{G}\big] = E[Z\,|\,\mathcal{G}] \circ \theta$ a.s. Taking $\mathcal{G} = \mathcal{B}_\theta$ then shows,
\[
E[Z \circ \theta\,|\,\mathcal{B}_\theta] = E\big[Z \circ \theta\,\big|\,\theta^{-1}\mathcal{B}_\theta\big] = E[Z\,|\,\mathcal{B}_\theta] \circ \theta = E[Z\,|\,\mathcal{B}_\theta] \text{ a.s.}
\]
Exercise 16.2. Let $1 \le p < \infty.$ Following the ideas introduced in Exercises 14.8 and 14.9, show
\[
L^p(P)\text{--}\lim_{n\to\infty} \frac{f + f \circ \theta + \cdots + f \circ \theta^{n-1}}{n} = E_{\mathcal{B}_\theta} f \quad \text{for all } f \in L^p(\Omega, \mathcal{B}, P).
\]
(Some of these ideas will again be used in the proof of Theorem 16.9 below.)
Definition 16.7. A sequence of random variables $\xi = \{\xi_k\}_{k=1}^{\infty}$ is stationary if $(\xi_2, \xi_3, \ldots) \stackrel{d}{=} (\xi_1, \xi_2, \ldots).$

If we temporarily let
\[
\theta(x_1, x_2, x_3, \ldots) = (x_2, x_3, \ldots) \quad \text{for } (x_1, x_2, x_3, \ldots) \in \mathbb{R}^{\mathbb{N}}, \tag{16.2}
\]
the stationarity condition states that $\theta \circ \xi \stackrel{d}{=} \xi.$ Equivalently, if $\mu := \operatorname{Law}_P(\xi_1, \xi_2, \ldots)$ on $\big(\mathbb{R}^{\mathbb{N}}, \mathcal{B}_{\mathbb{R}}^{\otimes\mathbb{N}}\big),$ then $\xi = \{\xi_k\}_{k=1}^{\infty}$ is stationary iff $\theta_*\mu = \mu.$ Let us also observe that $\xi$ stationary implies $\theta^2 \circ \xi \stackrel{d}{=} \theta \circ \xi \stackrel{d}{=} \xi$ and $\theta^3 \circ \xi \stackrel{d}{=} \theta \circ \xi \stackrel{d}{=} \xi,$ etc., so that $\theta^n \circ \xi \stackrel{d}{=} \xi$ for all $n \in \mathbb{N}.$¹

In what follows, for $x = (x_1, x_2, x_3, \ldots) \in \mathbb{R}^{\mathbb{N}}$ we will let $S_0(x) = 0,$
\[
S_n(x) = x_1 + x_2 + \cdots + x_n, \quad \text{and} \quad S_n^*(x) := \max(S_1, S_2, \ldots, S_n)
\]
for all $n \in \mathbb{N}.$
Lemma 16.8 (Maximal Ergodic Lemma). Suppose $\xi := \{\xi_k\}_{k=1}^{\infty}$ is a stationary sequence and $S_n(\xi) = \xi_1 + \cdots + \xi_n$ as above, then
\[
E\Big[\xi_1 : \sup_n S_n(\xi) > 0\Big] \ge 0. \tag{16.3}
\]
Proof. In this proof, $\theta$ will be as in Eq. (16.2). If $1 \le k \le n,$ then
\[
S_k(\xi) = \xi_1 + S_{k-1}(\theta \circ \xi) \le \xi_1 + S_{k-1}^*(\theta \circ \xi) \le \xi_1 + S_n^*(\theta \circ \xi) \le \xi_1 + \left[S_n^*(\theta \circ \xi)\right]_+
\]
and therefore, $S_n^*(\xi) \le \xi_1 + \left[S_n^*(\theta \circ \xi)\right]_+.$ So we may conclude that
\begin{align*}
E[\xi_1 : S_n^*(\xi) > 0] &\ge E\big[ S_n^*(\xi) - \left[S_n^*(\theta \circ \xi)\right]_+ : S_n^*(\xi) > 0 \big] \\
&= E\big[ \left[S_n^*(\xi)\right]_+ - \left[S_n^*(\theta \circ \xi)\right]_+ 1_{S_n^*(\xi) > 0} \big] \\
&\ge E\big[ \left[S_n^*(\xi)\right]_+ - \left[S_n^*(\theta \circ \xi)\right]_+ \big] = E\left[S_n^*(\xi)\right]_+ - E\left[S_n^*(\theta \circ \xi)\right]_+ = 0,
\end{align*}
wherein we used $\theta \circ \xi \stackrel{d}{=} \xi$ for the last equality. Letting $n \to \infty,$ making use of the MCT and the observation that $\{S_n^*(\xi) > 0\} \uparrow \{\sup_n S_n(\xi) > 0\},$ gives Eq. (16.3).
¹ In other words if $\{\xi_k\}_{k=1}^{\infty}$ is stationary, then lopping off the first random variable on each side of the identity, $(\xi_2, \xi_3, \ldots) \stackrel{d}{=} (\xi_1, \xi_2, \ldots),$ implies that
\[
(\xi_3, \xi_4, \ldots) \stackrel{d}{=} (\xi_2, \xi_3, \ldots) \stackrel{d}{=} (\xi_1, \xi_2, \ldots).
\]
Continuing this way inductively shows that stationarity is equivalent to
\[
(\xi_n, \xi_{n+1}, \xi_{n+2}, \ldots) \stackrel{d}{=} (\xi_1, \xi_2, \ldots) \text{ for all } n \in \mathbb{N}.
\]
Theorem 16.9 (Birkhoff's Ergodic Theorem). Suppose that $f \in L^1(\Omega, \mathcal{B}, P)$ or $f \ge 0$ and is $\mathcal{B}$-measurable, then
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} f \circ \theta^{k-1} = E[f\,|\,\mathcal{B}_\theta] \text{ a.s.} \tag{16.4}
\]
Moreover if $f \in L^p(\Omega, \mathcal{B}, P)$ for some $1 \le p < \infty$ then the convergence in Eq. (16.4) holds in $L^p$ as well.
Proof. Let us begin with the general observation that if $\eta = (\eta_1, \eta_2, \ldots)$ is a sequence of random variables such that $\eta_i \circ \theta = \eta_{i+1}$ for $i = 1, 2, \ldots,$ then $\eta$ is stationary. This is because,
\[
(\eta_1, \eta_2, \ldots) \stackrel{d}{=} (\eta_1, \eta_2, \ldots) \circ \theta = (\eta_1 \circ \theta, \eta_2 \circ \theta, \ldots) = (\eta_2, \eta_3, \ldots).
\]
We will first prove Eq. (16.4) under the assumption that $f \in L^1(P).$ We now let $g := E[f\,|\,\mathcal{B}_\theta]$ and $\eta_k := f \circ \theta^{k-1} - g$ for all $k \in \mathbb{N}.$ Since $g$ is $\mathcal{B}_\theta$-measurable we know that $g \circ \theta = g$ and therefore,
\[
\eta_k \circ \theta = \big(f \circ \theta^{k-1} - g\big) \circ \theta = f \circ \theta^k - g = \eta_{k+1}
\]
and therefore $\eta = (\eta_1, \eta_2, \ldots)$ is stationary. To simplify notation let us write $S_n$ for $S_n(\eta) = \eta_1 + \cdots + \eta_n.$ To finish the proof we need to show that $\lim_{n\to\infty} \frac{S_n}{n} = 0$ a.s., for then
\[
\frac{1}{n}\sum_{k=1}^{n} f \circ \theta^{k-1} = \frac{1}{n}S_n + g \to g = E[f\,|\,\mathcal{B}_\theta] \text{ a.s.}
\]
In order to show $\lim_{n\to\infty} \frac{S_n}{n} = 0$ a.s. it suffices to show $M(\eta) := \limsup_{n\to\infty} \frac{S_n(\eta)}{n} \le 0$ a.s. If we can do this we can also show that $M(-\eta) = \limsup_{n\to\infty} \frac{S_n(-\eta)}{n} \le 0,$ i.e.
\[
\liminf_{n\to\infty} \frac{S_n(\eta)}{n} \ge 0 \ge \limsup_{n\to\infty} \frac{S_n(\eta)}{n} \text{ a.s.,}
\]
which shows that $\lim_{n\to\infty} \frac{S_n}{n} = 0$ a.s. Finally, in order to prove $M(\eta) \le 0$ a.s. it suffices to show $P(M(\eta) > \varepsilon) = 0$ for all $\varepsilon > 0.$ This is what we will do now.

Since $S_n \circ \theta = S_{n+1} - \eta_1,$ it follows that
\[
M(\eta) \circ \theta = \limsup_{n\to\infty} \frac{1}{n}\big(S_{n+1} - \eta_1\big) = \limsup_{n\to\infty} \left[\frac{n+1}{n} \cdot \frac{1}{n+1} S_{n+1}\right] = M(\eta).
\]
Thus $M(\eta)$ is an invariant function and therefore $A_\varepsilon := \{M(\eta) > \varepsilon\} \in \mathcal{B}_\theta.$ Using $E[\eta_1\,|\,\mathcal{B}_\theta] = E[f - g\,|\,\mathcal{B}_\theta] = g - g = 0$ a.s. it follows that
\[
0 = E\big[E[\eta_1\,|\,\mathcal{B}_\theta] : M(\eta) > \varepsilon\big] = E[\eta_1 : M(\eta - \varepsilon) > 0] = E[\eta_1 - \varepsilon : M(\eta - \varepsilon) > 0] + \varepsilon P(A_\varepsilon),
\]
where $\eta - \varepsilon$ denotes the sequence $(\eta_1 - \varepsilon, \eta_2 - \varepsilon, \ldots),$ so that $\{M(\eta - \varepsilon) > 0\} = A_\varepsilon.$ If we now define
\[
\tilde\eta_n := (\eta_n - \varepsilon)\, 1_{A_\varepsilon},
\]
which is still stationary since
\[
\tilde\eta_n \circ \theta = (\eta_n \circ \theta - \varepsilon)\, 1_{A_\varepsilon} \circ \theta = (\eta_{n+1} - \varepsilon)\, 1_{A_\varepsilon} = \tilde\eta_{n+1},
\]
then it is easily verified² that
\[
A_\varepsilon = \{M(\eta - \varepsilon) > 0\} = \Big\{ \sup_n S_n(\tilde\eta) > 0 \Big\}.
\]
Therefore by an application of the maximal ergodic Lemma 16.8 we have,
\[
-\varepsilon P(M(\eta) > \varepsilon) = E[\eta_1 - \varepsilon : A_\varepsilon] = E\Big[ \tilde\eta_1 : \sup_n S_n(\tilde\eta) > 0 \Big] \ge 0,
\]
which shows $P(M(\eta) > \varepsilon) = 0.$

Now suppose that $f \in L^p(P).$ To prove the $L^p$-convergence of the limit in Eq. (16.4) it suffices by Corollary 12.47 to show $\big\{ \big|\frac{1}{n}S_n(\eta)\big|^p \big\}_{n=1}^{\infty}$ is uniformly integrable. This can be done as in the second solution to Exercise 12.6 (Resnick 6.7, #5). Here are the details.

First observe that $\{|\eta_k|^p\}_{k=1}^{\infty}$ are uniformly integrable. Indeed, by stationarity,
\[
E[|\eta_k|^p : |\eta_k|^p \ge a] = E[|\eta_1|^p : |\eta_1|^p \ge a]
\]
and therefore
\[
\sup_k E[|\eta_k|^p : |\eta_k|^p \ge a] = E[|\eta_1|^p : |\eta_1|^p \ge a] \xrightarrow{\mathrm{DCT}} 0 \text{ as } a \to \infty.
\]
Thus if $\varepsilon > 0$ is given we may find (see Proposition 12.42) $\delta > 0$ such that $E[|\eta_k|^p : A] \le \varepsilon$ whenever $A \in \mathcal{B}$ with $P(A) \le \delta.$ Then for such an $A$ we have (using Jensen's inequality relative to normalized counting measure on $\{1, 2, \ldots, n\}$),
\[
E\Big[ \Big|\frac{1}{n}S_n(\eta)\Big|^p : A \Big] \le E\Big[ \frac{1}{n}S_n(|\eta|^p) : A \Big] = \frac{1}{n}\sum_{k=1}^{n} E[|\eta_k|^p : A] \le \frac{1}{n} \cdot n\varepsilon = \varepsilon.
\]
² Since $A_\varepsilon \subset \{\sup_n S_n/n > \varepsilon\},$ it follows that
\begin{align*}
A_\varepsilon &= \Big\{ \sup_n \frac{S_n}{n} > \varepsilon \Big\} \cap A_\varepsilon = \{\sup_n (S_n - n\varepsilon) > 0\} \cap A_\varepsilon \\
&= \{\sup_n S_n(\eta - \varepsilon) > 0\} \cap A_\varepsilon = \{\sup_n S_n(\eta - \varepsilon)\, 1_{A_\varepsilon} > 0\} = \Big\{ \sup_n S_n(\tilde\eta) > 0 \Big\}.
\end{align*}
Another application of Proposition 12.42 shows $\big\{ \big|\frac{1}{n}S_n(\eta)\big|^p \big\}_{n=1}^{\infty}$ is uniformly integrable, as
\[
\sup_n E\Big|\frac{1}{n}S_n(\eta)\Big|^p \le \sup_n \frac{1}{n}\sum_{k=1}^{n} E[|\eta_k|^p] = E|\eta_1|^p < \infty.
\]
Finally we need to consider the case where $f \ge 0$ but $f \notin L^1(P).$ As before, let $g = E[f\,|\,\mathcal{B}_\theta] \ge 0.$ For $r \in (0, \infty)$ let $f_r := f\, 1_{g \le r}.$ We then have
\[
E[f_r\,|\,\mathcal{B}_\theta] = E[f\, 1_{g \le r}\,|\,\mathcal{B}_\theta] = 1_{g \le r}\, E[f\,|\,\mathcal{B}_\theta] = 1_{g \le r}\, g
\]
and in particular, $E f_r = E(1_{g \le r}\, g) \le r < \infty.$ Thus by the $L^1$-case already proved,
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} f_r \circ \theta^{k-1} = 1_{g \le r}\, g \text{ a.s.}
\]
On the other hand, since $g$ is invariant, we see that $f_r \circ \theta^k = f \circ \theta^k \cdot 1_{g \le r}$ and therefore
\[
\frac{1}{n}\sum_{k=1}^{n} f_r \circ \theta^{k-1} = \Big( \frac{1}{n}\sum_{k=1}^{n} f \circ \theta^{k-1} \Big)\, 1_{g \le r}.
\]
Using these identities and the fact that $r < \infty$ was arbitrary we may conclude that
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} f \circ \theta^{k-1} = g \text{ a.s. on } \{g < \infty\}. \tag{16.5}
\]
To take care of the set where $g = \infty,$ again let $r \in (0, \infty)$ but now take $f_r = f \wedge r \le f.$ It then follows that
\[
\liminf_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} f \circ \theta^{k-1} \ge \liminf_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} f_r \circ \theta^{k-1} = E[f \wedge r\,|\,\mathcal{B}_\theta].
\]
Letting $r \uparrow \infty$ and using the cMCT implies,
\[
\liminf_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} f \circ \theta^{k-1} \ge E[f\,|\,\mathcal{B}_\theta] = g
\]
and therefore $\liminf_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} f \circ \theta^{k-1} = \infty$ a.s. on $\{g = \infty\}.$ This then shows that
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} f \circ \theta^{k-1} = \infty = g \text{ a.s. on } \{g = \infty\},
\]
which combined with Eq. (16.5) completes the proof.
As a corollary we have the following version of the strong law of large numbers; see also Theorem 20.30 and Example 18.78 below for other proofs.
Theorem 16.10 (Kolmogorov's Strong Law of Large Numbers). Suppose that $\{X_n\}_{n=1}^{\infty}$ are i.i.d. random variables and let $S_n := X_1 + \cdots + X_n.$ If $X_n$ are integrable or $X_n \ge 0,$ then
\[
\lim_{n\to\infty} \frac{1}{n}S_n = EX_1 \text{ a.s.}
\]
and $\frac{1}{n}S_n \to EX_1$ in $L^1(P)$ when $E|X_n| < \infty.$
Proof. We may assume that $\Omega = \mathbb{R}^{\mathbb{N}},$ $\mathcal{B}$ is the product $\sigma$-algebra, and $P = \mu^{\otimes\mathbb{N}}$ where $\mu = \operatorname{Law}_P(X_1).$ In this model, $X_n(\omega) = \omega_n$ for all $\omega \in \Omega$ and we take $\theta : \Omega \to \Omega$ as in Eq. (16.2). With this notation we have $X_n = X_1 \circ \theta^{n-1}$ and therefore, $S_n = \sum_{k=1}^{n} X_1 \circ \theta^{k-1}.$ So by Birkhoff's ergodic theorem
\[
\lim_{n\to\infty} \frac{1}{n}S_n = E[X_1\,|\,\mathcal{B}_\theta] =: g \text{ a.s.}
\]
If $A \in \mathcal{B}_\theta,$ then $A = \theta^{-n}(A) \in \sigma(X_{n+1}, X_{n+2}, \ldots)$ and therefore $A \in \mathcal{T} = \cap_n \sigma(X_{n+1}, X_{n+2}, \ldots),$ the tail $\sigma$-algebra. However by Kolmogorov's 0-1 law (Proposition 10.50), we know that $\mathcal{T}$ is almost trivial and therefore so is $\mathcal{B}_\theta.$ Hence we may conclude that $g = c$ a.s. where $c \in [0, \infty]$ is a constant, see Lemma 10.49.

If $X_1 \ge 0$ a.s. and $EX_1 = \infty$ then we must have $c = E[X_1\,|\,\mathcal{B}_\theta] = \infty$ a.s., for if $c < \infty,$ then $EX_1 = E\big[E[X_1\,|\,\mathcal{B}_\theta]\big] = E[c] < \infty.$ When $X_1 \in L^1(P),$ the convergence in Birkhoff's ergodic theorem is also in $L^1$ and therefore we may conclude that
\[
c = Ec = \lim_{n\to\infty} E\Big[\frac{1}{n}S_n\Big] = \lim_{n\to\infty} \frac{1}{n}E[S_n] = EX_1.
\]
Thus we have shown in all cases that $\lim_{n\to\infty} \frac{1}{n}S_n = E[X_1\,|\,\mathcal{B}_\theta] = EX_1$ a.s.
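The strong law is easy to test empirically. The following Monte Carlo sketch (my own illustration, not part of the text) checks that sample means of i.i.d. uniform draws settle near $EX_1 = 1/2.$

```python
# Monte Carlo illustration of the strong law of large numbers:
# the running average of i.i.d. Uniform(0,1) draws should be close to 1/2.
import random

random.seed(0)
n = 200_000
S = sum(random.random() for _ in range(n))   # S_n = X_1 + ... + X_n
assert abs(S / n - 0.5) < 0.01               # S_n / n is near E X_1 = 1/2
```

The tolerance 0.01 is far larger than the typical fluctuation (about $0.29/\sqrt{n}$), so the check is robust to the choice of seed.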
Part III
Stochastic Processes I
In the sequel $(\Omega, \mathcal{B}, P)$ will be a probability space and $(S, \mathcal{S})$ will denote a measurable space which we refer to as state space. If we say that $f : \Omega \to S$ is a function we will always assume that it is $\mathcal{B}/\mathcal{S}$-measurable. We also let $\mathcal{S}_b$ denote the bounded $\mathcal{S}/\mathcal{B}_{\mathbb{R}}$-measurable functions from $S$ to $\mathbb{R}.$ On occasion we will assume that $(S, \mathcal{S})$ is a standard Borel space in order to have available to us the existence of regular conditional distributions (see Remark 14.16 and Theorem 14.32) and the use of Kolmogorov's extension Theorem 17.54 for proving the existence of Markov processes.

In the rest of this book we will devote most of our time to studying stochastic processes, i.e. a collection of random variables or more generally random functions, $X := \{X_t : \Omega \to S\}_{t \in T},$ indexed by some parameter space, $T.$ The weakest description of such a stochastic process will be through its finite dimensional distributions.

Definition 16.11. Given a stochastic process, $X := \{X_t : \Omega \to S\}_{t \in T},$ and a finite subset, $\Lambda \subset T,$ we say that
\[
\mu_\Lambda := \operatorname{Law}_P\big( \{X_t\}_{t \in \Lambda} \big) \text{ on } \big(S^\Lambda, \mathcal{S}^{\otimes\Lambda}\big)
\]
is a finite dimensional distribution of $X.$

Unless $T$ is a countable or finite set or $X_t$ has some continuity properties in $t,$ knowledge of the finite dimensional distributions alone is not going to be adequate for our purposes; however it is a starting point. For now we are going to restrict our attention to the case where $T = \mathbb{N}_0$ or $T = \mathbb{R}_+ := [0, \infty)$ ($t \in T$ is typically interpreted as a time). Later in this part we will further restrict attention to stochastic processes indexed by $\mathbb{N}_0,$ leaving the technically more complicated case where $T = \mathbb{R}_+$ to later parts of the book.
Definition 16.12. An increasing (i.e. non-decreasing) sequence $\{\mathcal{B}_t\}_{t \in T}$ of sub-$\sigma$-algebras of $\mathcal{B}$ is called a filtration. We will let $\mathcal{B}_\infty := \vee_{t \in T} \mathcal{B}_t := \sigma(\cup_{t \in T} \mathcal{B}_t).$ A four-tuple, $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \in T}, P\big),$ where $(\Omega, \mathcal{B}, P)$ is a probability space and $\{\mathcal{B}_t\}_{t \in T}$ is a filtration is called a filtered probability space. We say that a stochastic process, $\{X_t\}_{t \in T},$ of random functions from $\Omega \to S$ is adapted to the filtration if $X_t$ is $\mathcal{B}_t/\mathcal{S}$-measurable for every $t \in T.$

A typical way to make a filtration is to start with a stochastic process $\{X_t\}_{t \in T}$ and then define $\mathcal{B}_t^X := \sigma(X_s : s \le t).$ Clearly $\{X_t\}_{t \in T}$ will always be adapted to this filtration.
In this part of the book we are going to study stochastic processes with certain dependency structures. This will take us to the notion of Markov processes and martingales. Before starting our study of Markov processes it will be helpful to record a few more facts about probability kernels.

Given a probability kernel, $Q,$ on $S \times S$ (so $Q : S \times \mathcal{S} \to [0, 1]$), we may associate a linear transformation, $T = T_Q : \mathcal{S}_b \to \mathcal{S}_b$ defined by
\[
(Tf)(x) = Q(x; f) = \int_S Q(x, dy)\, f(y) \quad \text{for all } f \in \mathcal{S}_b. \tag{16.6}
\]
It is easy to check that $T$ satisfies;

1. $T1 = 1,$
2. $Tf \ge 0$ if $0 \le f \in \mathcal{S}_b,$
3. if $f_n \in \mathcal{S}_b$ and $f_n \to f$ boundedly then $Tf_n \to Tf$ boundedly as well.
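When $S$ is a finite set these properties are elementary to check by hand: a probability kernel is just a stochastic matrix and $T_Q$ acts by matrix-vector multiplication. A minimal sketch, with a two-state kernel of my own choosing:

```python
# For a finite state space S = {0, 1}, a probability kernel is a stochastic
# matrix and (T_Q f)(x) = sum_y Q[x][y] * f(y).  The kernel Q is toy data.
Q = [[0.9, 0.1],
     [0.4, 0.6]]                       # each row sums to one

def T(f):                              # T_Q acting on a function f : S -> R
    return [sum(Q[x][y] * f[y] for y in range(2)) for x in range(2)]

assert all(abs(v - 1.0) < 1e-12 for v in T([1.0, 1.0]))  # property 1: T1 = 1
assert all(v >= 0 for v in T([0.0, 5.0]))                # property 2: positivity
```

Property 3 (bounded convergence) is immediate here since the sums are finite.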
Notice that an operator $T : \mathcal{S}_b \to \mathcal{S}_b$ satisfying conditions 1. and 2. above also satisfies $Tf \le Tg$ when $f \le g,$ and $Tf$ is real if $f$ is. Indeed if $f = f_+ - f_-$ is real then
\[
Tf = T(f_+ - f_-) = Tf_+ - Tf_-
\]
with $0 \le Tf_\pm,$ and if $f \le g$ then $0 \le g - f$ which implies
\[
Tg - Tf = T(g - f) \ge 0.
\]
As $f \le |f|$ when $f$ is real, we have $Tf \le T|f|$ and therefore $|Tf| \le T|f|.$ More generally if $f$ is complex and $x \in S,$ we may choose $\alpha \in \mathbb{R}$ such that $e^{i\alpha}(Tf)(x) \ge 0$ and therefore,
\[
|(Tf)(x)| = e^{i\alpha}(Tf)(x) = \big(T(e^{i\alpha} f)\big)(x) = \big(T \operatorname{Re}(e^{i\alpha} f)\big)(x) + i\big(T \operatorname{Im}(e^{i\alpha} f)\big)(x).
\]
Furthermore we must have $\big(T \operatorname{Im}(e^{i\alpha} f)\big)(x) = 0,$ and using $\operatorname{Re}(e^{i\alpha} f) \le |f|$ we find,
\[
|(Tf)(x)| = \big(T \operatorname{Re}(e^{i\alpha} f)\big)(x) \le (T|f|)(x).
\]
As $x \in S$ was arbitrary we have shown that $|Tf| \le T|f|.$ Thus if $|f| \le M$ for some $0 \le M < \infty$ we may conclude,
\[
|Tf| \le T|f| \le T(M \cdot 1) = M \cdot T1 = M.
\]
Proposition 16.13. If $T : \mathcal{S}_b \to \mathcal{S}_b$ is a linear transformation satisfying the three properties listed after Eq. (16.6), then $Q(x, A) := (T1_A)(x)$ for all $A \in \mathcal{S}$ and $x \in S$ is a probability kernel such that Eq. (16.6) holds.

The proof of this proposition is straightforward and will be left to the reader. Let me just remark that if $Q(x, A) := (T1_A)(x)$ for all $x \in S$ and $A \in \mathcal{S},$ then $Tf = Q(\cdot; f)$ for all simple functions in $\mathcal{S}_b$ and then by approximation for all $f \in \mathcal{S}_b.$
Corollary 16.14. If $Q_1$ and $Q_2$ are two probability kernels on $(S, \mathcal{S}) \times (S, \mathcal{S}),$ then $T_{Q_1} T_{Q_2} = T_Q$ where $Q$ is the probability kernel given by
\[
Q(x, A) = (T_{Q_1} T_{Q_2} 1_A)(x) = Q_1(x; Q_2(\cdot, A)) = \int_S Q_1(x, dy)\, Q_2(y, A)
\]
for all $A \in \mathcal{S}$ and $x \in S.$ We will denote $Q$ by $Q_1 Q_2.$
From now on we will identify the probability kernel $Q : S \times \mathcal{S} \to [0, 1]$ with the linear transformation $T = T_Q$ and simply write $Qf$ for $Q(\cdot; f).$ The last construction that we need involving probability kernels is the following extension of the notion of product measure.
Proposition 16.15. Suppose that $\nu$ is a probability measure on $(S, \mathcal{S})$ and $Q_k : S \times \mathcal{S} \to [0, 1]$ are probability kernels on $(S, \mathcal{S}) \times (S, \mathcal{S})$ for $1 \le k \le n.$ Then there exists a probability measure $\mu$ on $\big(S^{n+1}, \mathcal{S}^{\otimes(n+1)}\big)$ such that for all $f \in \mathcal{S}_b^{\otimes(n+1)}$ we have
\[
\mu(f) = \int_S d\nu(x_0) \int_S Q_1(x_0, dx_1) \int_S Q_2(x_1, dx_2) \cdots \int_S Q_n(x_{n-1}, dx_n)\, f(x_0, \ldots, x_n). \tag{16.7}
\]
Part of the assertion here is that all functions appearing are bounded and measurable so that all of the above integrals make sense. We will denote $\mu$ in the future by,
\[
d\mu(x_0, \ldots, x_n) = d\nu(x_0)\, Q_1(x_0, dx_1)\, Q_2(x_1, dx_2) \cdots Q_n(x_{n-1}, dx_n).
\]
Proof. The fact that all of the iterated integrals make sense in Eq. (16.7) follows from Exercise 14.3, the measurability statements in Fubini's theorem, and induction. The measure $\mu$ is defined by setting $\mu(A) = \mu(1_A)$ for all $A \in \mathcal{S}^{\otimes(n+1)}.$ It is a simple matter to check that $\mu$ is a measure on $\big(S^{n+1}, \mathcal{S}^{\otimes(n+1)}\big)$ and that $\int_{S^{n+1}} f\, d\mu$ agrees with the right side of Eq. (16.7) for all $f \in \mathcal{S}_b^{\otimes(n+1)}.$
Remark 16.16. As usual the measure $\mu$ is determined by its value on product functions of the form $f(x_0, \ldots, x_n) = \prod_{i=0}^{n} f_i(x_i)$ with $f_i \in \mathcal{S}_b.$ For such a function we have
\[
\mu(f) = \mathbb{E}_\nu\big[ f_0 \cdot Q_1 M_{f_1} Q_2 M_{f_2} \cdots Q_{n-1} M_{f_{n-1}} Q_n f_n \big]
\]
where $M_f : \mathcal{S}_b \to \mathcal{S}_b$ is defined by $M_f g = fg,$ i.e. $M_f$ is multiplication by $f.$
17
The Markov Property
For purposes of this section, $T = \mathbb{N}_0$ or $\mathbb{R}_+,$ $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \in T}, P\big)$ is a filtered probability space, and $(S, \mathcal{S})$ is a measurable space. We will often write $t \ge 0$ to mean that $t \in T.$ Thus we will often denote a stochastic process by $\{X_t\}_{t \ge 0}$ instead of $\{X_t\}_{t \in T}.$
Definition 17.1 (The Markov Property). A stochastic process $\{X_t : \Omega \to S\}_{t \in T}$ is said to satisfy the Markov property if $X_t$ is adapted and
\[
E_{\mathcal{B}_s} f(X_t) := E[f(X_t)\,|\,\mathcal{B}_s] = E[f(X_t)\,|\,X_s] \text{ a.s. for all } 0 \le s < t \tag{17.1}
\]
and for every $f \in \mathcal{S}_b.$
If Eq. (17.1) holds then by the factorization Lemma 6.40 there exists $F \in \mathcal{S}_b$ such that $F(X_s) = E[f(X_t)\,|\,X_s].$ Conversely if we want to verify Eq. (17.1) it suffices to find an $F \in \mathcal{S}_b$ such that $E_{\mathcal{B}_s} f(X_t) = F(X_s)$ a.s. This is because, by the tower property of conditional expectation,
\[
E_{\mathcal{B}_s} f(X_t) = F(X_s) = E[F(X_s)\,|\,X_s] = E[E_{\mathcal{B}_s} f(X_t)\,|\,X_s] = E[f(X_t)\,|\,X_s] \text{ a.s.} \tag{17.2}
\]
Poetically speaking, a stochastic process with the Markov property is forgetful in the sense that knowing the positions of the process up to some time $s \le t$ does not give any more information about the position of the process, $X_t,$ at time $t$ than knowing where the process was at time $s.$ We will in fact show (Theorem 17.4 below) that given $X_s,$ what the process did before time $s$ is independent of what it will do after time $s.$
Lemma 17.2. If $\{X_t\}_{t \ge 0}$ satisfies the Markov property relative to the filtration $\{\mathcal{B}_t\}_{t \ge 0},$ it also satisfies the Markov property relative to $\big\{ \mathcal{B}_t^X = \sigma(X_s : s \le t) \big\}_{t \ge 0}.$

Proof. It is clear that $\{X_t\}_{t \in T}$ is $\mathcal{B}_t^X$-adapted and that $\sigma(X_s) \subset \mathcal{B}_s^X \subset \mathcal{B}_s$ for all $s \in T.$ Therefore using the tower property of conditional expectation we have,
\[
E_{\mathcal{B}_s^X} f(X_t) = E_{\mathcal{B}_s^X} E_{\mathcal{B}_s} f(X_t) = E_{\mathcal{B}_s^X} E_{\sigma(X_s)} f(X_t) = E_{\sigma(X_s)} f(X_t).
\]
Remark 17.3. If $T = \mathbb{N}_0,$ a stochastic process $\{X_n\}_{n \ge 0}$ is Markov iff for all $f \in \mathcal{S}_b,$
\[
E[f(X_{m+1})\,|\,\mathcal{B}_m] = E[f(X_{m+1})\,|\,X_m] \text{ a.s. for all } m \ge 0. \tag{17.3}
\]
Indeed if Eq. (17.3) holds for all $m,$ we may use induction on $n$ to show
\[
E[f(X_n)\,|\,\mathcal{B}_m] = E[f(X_n)\,|\,X_m] \text{ a.s. for all } n \ge m. \tag{17.4}
\]
It is clear that Eq. (17.4) holds for $n = m$ and $n = m + 1.$ So now suppose Eq. (17.4) holds for a given $n \ge m.$ Using Eq. (17.3) with $m = n$ implies
\[
E_{\mathcal{B}_n} f(X_{n+1}) = E[f(X_{n+1})\,|\,X_n] = F(X_n)
\]
for some $F \in \mathcal{S}_b.$ Thus by the tower property of conditional expectations and the induction hypothesis,
\begin{align*}
E_{\mathcal{B}_m} f(X_{n+1}) &= E_{\mathcal{B}_m} E_{\mathcal{B}_n} f(X_{n+1}) = E_{\mathcal{B}_m} F(X_n) = E[F(X_n)\,|\,X_m] \\
&= E[E_{\mathcal{B}_n} f(X_{n+1})\,|\,X_m] = E[f(X_{n+1})\,|\,X_m].
\end{align*}
The next theorem and Exercise 17.1 show that a stochastic process has the Markov property iff it has the property that given its present state, its past and future are independent.

Theorem 17.4 (Markov Independence). Suppose that $\{X_t\}_{t \in T}$ is an adapted stochastic process with the Markov property and let $\mathcal{F}_s := \sigma(X_t : t \ge s)$ be the future $\sigma$-algebra. Then $\mathcal{B}_s$ is independent of $\mathcal{F}_s$ given $X_s,$ which we abbreviate as $\mathcal{B}_s \perp_{X_s} \mathcal{F}_s.$ In more detail we are asserting that
\[
P(A \cap B\,|\,X_s) = P(A\,|\,X_s)\, P(B\,|\,X_s) \text{ a.s.}
\]
for all $A \in \mathcal{B}_s$ and $B \in \mathcal{F}_s,$ or equivalently that
\[
E[FG\,|\,X_s] = E[F\,|\,X_s]\, E[G\,|\,X_s] \text{ a.s.} \tag{17.5}
\]
for all $F \in (\mathcal{B}_s)_b$ and $G \in (\mathcal{F}_s)_b$ and $s \in T.$
Proof. Suppose first that $G = \prod_{i=1}^{n} g_i(X_{t_i})$ with $s < t_1 < t_2 < \cdots < t_n$ and $g_i \in \mathcal{S}_b.$ Then by the Markov property and the tower property of conditional expectations,
\begin{align*}
E_{\mathcal{B}_s}[G] &= E_{\mathcal{B}_s}\Big[ \prod_{i=1}^{n} g_i(X_{t_i}) \Big] = E_{\mathcal{B}_s} E_{\mathcal{B}_{t_{n-1}}}\Big[ \prod_{i=1}^{n} g_i(X_{t_i}) \Big] = E_{\mathcal{B}_s}\Big[ \prod_{i=1}^{n-1} g_i(X_{t_i}) \cdot E_{\mathcal{B}_{t_{n-1}}} g_n(X_{t_n}) \Big] \\
&= E_{\mathcal{B}_s}\Big[ \prod_{i=1}^{n-1} g_i(X_{t_i}) \cdot E\big[g_n(X_{t_n})\,\big|\,X_{t_{n-1}}\big] \Big] = E_{\mathcal{B}_s}\Big[ \prod_{i=1}^{n-1} \tilde g_i(X_{t_i}) \Big]
\end{align*}
where $\tilde g_i = g_i$ for $i < n - 1$ and $\tilde g_{n-1} = g_{n-1} \cdot \bar g,$ where $\bar g \in \mathcal{S}_b$ is chosen so that $E\big[g_n(X_{t_n})\,\big|\,X_{t_{n-1}}\big] = \bar g\big(X_{t_{n-1}}\big)$ a.s. Continuing this way inductively we learn that $E_{\mathcal{B}_s}[G] = F(X_s)$ a.s. for some $F \in \mathcal{S}_b$ and therefore,
\[
E[G\,|\,X_s] = E[E_{\mathcal{B}_s} G\,|\,X_s] = E[F(X_s)\,|\,X_s] = F(X_s) = E_{\mathcal{B}_s}[G] \text{ a.s.}
\]
Now suppose that $G = g_0(X_s) \prod_{i=1}^{n} g_i(X_{t_i})$ where $g_i \in \mathcal{S}_b$ and $s < t_1 < t_2 < \cdots < t_n,$ and let $F \in (\mathcal{B}_s)_b.$ Then
\begin{align*}
E_{\mathcal{B}_s}[FG] &= F \cdot E_{\mathcal{B}_s} G = F\, g_0(X_s)\, E_{\mathcal{B}_s}\Big[ \prod_{i=1}^{n} g_i(X_{t_i}) \Big] = F\, g_0(X_s)\, E\Big[ \prod_{i=1}^{n} g_i(X_{t_i})\,\Big|\,X_s \Big] \\
&= F\, E\Big[ g_0(X_s) \prod_{i=1}^{n} g_i(X_{t_i})\,\Big|\,X_s \Big] = F\, E[G\,|\,X_s].
\end{align*}
We may now condition this equation on $X_s$ to arrive at Eq. (17.5) for product functions, $G,$ as above. An application of the multiplicative system Theorem 8.2 may now be used to show that Eq. (17.5) holds for general $G \in (\mathcal{F}_s)_b.$
Exercise 17.1. Suppose that $\{X_t\}_{t \ge 0}$ is an adapted stochastic process such that Eq. (17.5) holds for all $F \in (\mathcal{B}_s)_b$ and $G \in (\mathcal{F}_s)_b$ and $s \in T.$ Show that $\{X_t\}_{t \in T}$ has the Markov property.
17.1 Markov Processes

If $S$ is a standard Borel space (i.e. $S$ is isomorphic to a Borel subset of $[0, 1]$), we may find regular conditional probability kernels, $Q_{s,t} : S \times \mathcal{S} \to [0, 1]$ for all $0 \le s < t,$ such that
\[
E[f(X_t)\,|\,X_s] = Q_{s,t}(X_s; f) = (Q_{s,t} f)(X_s) \text{ a.s.} \tag{17.6}
\]
Moreover by the Markov property, if $0 \le \sigma < s < t,$ then
\begin{align*}
(Q_{\sigma,t} f)(X_\sigma) &= E[f(X_t)\,|\,X_\sigma] = E[E_{\mathcal{B}_s} f(X_t)\,|\,X_\sigma] \\
&= E[(Q_{s,t} f)(X_s)\,|\,X_\sigma] = (Q_{\sigma,s} Q_{s,t} f)(X_\sigma) = Q_{\sigma,s}\big(X_\sigma; Q_{s,t}(\cdot; f)\big), \quad P\text{-a.s.}
\end{align*}
If we let $\mu_t := \operatorname{Law}_P(X_t) : \mathcal{S} \to [0, 1]$ for all $t \in T,$ we have just shown that for every $f \in \mathcal{S}_b,$
\[
Q_{\sigma,t}(\cdot; f) = Q_{\sigma,s}\big(\cdot; Q_{s,t}(\cdot; f)\big) \quad \mu_\sigma\text{-a.s.} \tag{17.7}
\]
In the sequel we want to assume that such kernels exist and that Eq. (17.7) holds everywhere, not just $\mu_\sigma$-a.s. Thus we make the following definitions.


Denition 17.5 (Markov transition kernels). We say a collection of prob-
ability kernels, Q
s,t

0st<
, on S S are Markov transition kernels
if Q
s,s
(x, dy) =
x
(dy) (as an operator Q
s,s
= I
Sb
) for all s T and the
Chapmann-Kolmogorov equations hold;
Q
,t
= Q
,s
Q
s,t
for all 0 s t. (17.8)
Recall that Eq. (17.8) is equivalent to
Q
,t
(x, A) =
_
S
Q
,s
(x, dy) Q
s,t
(y, A) for all x S and A o (17.9)
or
Q
,t
(x; f) = Q
,s
(x; Q
s,t
(; f)) for all x S and f o
b
. (17.10)
Thus Markov transition kernels should satisfy Eq. (17.7) everywhere not just
almost everywhere.
The reader should keep in mind that Q
,t
(x, A) represents the jump prob-
ability of starting at x at time and ending up in A o at time t. With this
in mind, Q
,s
(x, dy) Q
s,t
(y, A) intuitively is the probability of jumping from x
at time s to y at time t followed by a jump into A at time u. Thus Eq. (17.9)
states that averaging these probabilities over the intermediate location (y) of
the particle at time t gives the jump probability of starting at x at time s and
ending up in A o at time t. This interpretation is rigorously true when S is
a nite or countable set.
Definition 17.6 (Markov process). A Markov process is an adapted stochastic process, $\{X_t : \Omega \to S\}_{t \ge 0},$ with the Markov property such that there are Markov transition kernels $\{Q_{s,t}\}_{0 \le s \le t < \infty},$ on $S \times S,$ such that Eq. (17.6) holds.
Definition 17.7. A stochastic process, $\big\{X_t : \Omega \to S := \mathbb{R}^d\big\}_{t \in T},$ has independent increments if for all finite subsets, $\Lambda = \{0 \le t_0 < t_1 < \cdots < t_n\} \subset T,$ the random variables $\{X_0\} \cup \big\{X_{t_k} - X_{t_{k-1}}\big\}_{k=1}^{n}$ are independent. We refer to $X_t - X_s$ for $s < t$ as an increment of $X.$
Exercise 17.2. Suppose that $\big\{X_t : \Omega \to S := \mathbb{R}^d\big\}_{t \in T}$ is a stochastic process with independent increments and let $\mathcal{B}_t := \mathcal{B}_t^X$ for all $t \in T.$ Show, for all $0 \le s < t,$ that $(X_t - X_s)$ is independent of $\mathcal{B}_s^X$ and then use this to show $\{X_t\}_{t \in T}$ is a Markov process with transition kernels defined, for $0 \le s \le t,$ by
\[
Q_{s,t}(x, A) := E[1_A(x + X_t - X_s)] \quad \text{for all } A \in \mathcal{S} \text{ and } x \in \mathbb{R}^d. \tag{17.11}
\]
You should verify that $\{Q_{s,t}\}_{0 \le s \le t}$ are indeed Markov transition kernels, i.e. satisfy the Chapman–Kolmogorov equations.
Example 17.8 (Random Walks). Suppose that $\big\{\xi_n : \Omega \to S := \mathbb{R}^d\big\}_{n=0}^{\infty}$ are independent random vectors and $X_m := \sum_{k=0}^{m} \xi_k$ and $\mathcal{B}_m := \sigma(\xi_0, \ldots, \xi_m)$ for each $m \in T = \mathbb{N}_0.$ Then $\{X_m\}_{m \ge 0}$ has independent increments and therefore has the Markov property with Markov transition kernels given by
\[
Q_{s,t}(x; f) = E[f(x + X_t - X_s)] = E\bigg[ f\Big( x + \sum_{s < k \le t} \xi_k \Big) \bigg],
\]
or in other words,
\[
Q_{s,t}(x, \cdot) = \operatorname{Law}_P\Big( x + \sum_{s < k \le t} \xi_k \Big).
\]
The one step transition kernels are determined by
\[
(Q_{n,n+1} f)(x) = E[f(x + \xi_{n+1})] \quad \text{for } n \in \mathbb{N}_0.
\]
Exercise 17.3. Let us now suppose that $\{\xi_n : \Omega \to S\}_{n=0}^{\infty}$ are independent random functions where $(S, \mathcal{S})$ is a general measurable space, $\mathcal{B}_n := \sigma(\xi_0, \xi_1, \ldots, \xi_n)$ for $n \ge 0,$ $u_n : S \times S \to S$ are measurable functions for $n \ge 1,$ and $X_n : \Omega \to S$ for $n \in \mathbb{N}_0$ are defined by $X_0 = \xi_0$ and then inductively for $n \ge 1$ by
\[
X_{n+1} = u_{n+1}(X_n, \xi_{n+1}) \quad \text{for } n \ge 0.
\]
Convince yourself that for $0 \le m < n$ there is a measurable function, $\varphi_{n,m} : S^{n-m+1} \to S,$ determined by the $u_k,$ such that $X_n = \varphi_{n,m}(X_m, \xi_{m+1}, \ldots, \xi_n).$ (You need not write the proof of this assertion in your solution.) In particular, $X_n = \varphi_{n,0}(\xi_0, \ldots, \xi_n)$ is $\mathcal{B}_n/\mathcal{S}$-measurable so that $X = \{X_n\}_{n \ge 0}$ is adapted. Show $\{X_n\}_{n \ge 0}$ is a Markov process with transition kernels,
\[
Q_{m,n}(x, \cdot) = \operatorname{Law}_P\big( \varphi_{n,m}(x, \xi_{m+1}, \ldots, \xi_n) \big) \quad \text{for all } 0 \le m \le n,
\]
where (by definition) $Q_{m,m}(x, \cdot) = \delta_x(\cdot).$ Please explicitly verify that $\{Q_{m,n}\}_{0 \le m \le n}$ are Markov transition kernels, i.e. satisfy the Chapman–Kolmogorov equations.
Remark 17.9. Suppose that $T = \mathbb{N}_0$ and $\{Q_{m,n} : 0 \le m \le n\}$ are Markov transition kernels on $S \times S.$ Since
\[
Q_{m,n} = Q_{m,m+1} Q_{m+1,m+2} \cdots Q_{n-1,n}, \tag{17.12}
\]
it follows that the $Q_{m,n}$ are uniquely determined by knowing the one step transition kernels, $\{Q_{n,n+1}\}_{n=0}^{\infty}.$ Conversely if $\{Q_{n,n+1}\}_{n=0}^{\infty}$ are arbitrarily given probability kernels on $S \times S$ and $Q_{m,n}$ are defined as in Eq. (17.12), then the resulting $\{Q_{m,n} : 0 \le m \le n\}$ are Markov transition kernels on $S \times S.$ Moreover if $S$ is a countable set, then we may let
\[
q_{m,n}(x, y) := Q_{m,n}(x, \{y\}) = P(X_n = y\,|\,X_m = x) \quad \text{for all } x, y \in S \tag{17.13}
\]
so that
\[
Q_{m,n}(x, A) = \sum_{y \in A} q_{m,n}(x, y).
\]
In this case it is easily checked that
\[
q_{m,n}(x, y) = \sum_{x_i \in S:\, m < i < n} q_{m,m+1}(x, x_{m+1})\, q_{m+1,m+2}(x_{m+1}, x_{m+2}) \cdots q_{n-1,n}(x_{n-1}, y). \tag{17.14}
\]
The reader should observe that this is simply matrix multiplication!
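To make the last observation concrete, here is a sketch with two-state one-step matrices of my own choosing; Eq. (17.14) is exactly the $(x, y)$ entry of the matrix product.

```python
# Sketch: for a countable (here 2-point) state space, Eq. (17.14) says the
# two-step kernel is the matrix product of the one-step kernels:
# q_{0,2} = q_{0,1} q_{1,2}.  The one-step matrices are illustrative toy data.
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

q01 = [[0.7, 0.3], [0.2, 0.8]]
q12 = [[0.5, 0.5], [0.1, 0.9]]
q02 = matmul(q01, q12)                 # two-step transition probabilities

# summing over the intermediate state reproduces Chapman-Kolmogorov by hand:
by_hand = sum(q01[0][k] * q12[k][1] for k in range(2))
assert abs(q02[0][1] - by_hand) < 1e-12
assert all(abs(sum(row) - 1.0) < 1e-12 for row in q02)  # rows still sum to 1
```

The same identity, iterated, gives Eq. (17.12): $n$-step kernels are products of one-step kernels.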
Exercise 17.4 (Polya's Urn). Suppose that an urn contains $r$ red balls and $g$ green balls. At each time ($t \in T = \mathbb{N}_0$) we draw a ball out, then replace it and add $c$ more balls of the color drawn. It is reasonable to model this as a Markov process with $S := \mathbb{N}_0 \times \mathbb{N}_0$ and $X_n := (r_n, g_n) \in S$ being the number of red and green balls respectively in the urn at time $n$. Find
\[
q_{n,n+1}\left((r,g), (r', g')\right) = P\left(X_{n+1} = (r', g') \mid X_n = (r, g)\right)
\]
for this model.
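A short simulation sketch of the urn dynamics may help build intuition before writing down the kernel (the draw probability used below is just the definition of the model; the starting values $r = 2$, $g = 3$, $c = 1$ are arbitrary):

```python
import random

def polya_step(r, g, c, rng):
    # Draw a ball uniformly at random: red with probability r/(r+g);
    # replace it and add c more balls of the drawn color.
    if rng.random() < r / (r + g):
        return r + c, g
    return r, g + c

rng = random.Random(0)
r, g, c = 2, 3, 1  # arbitrary illustrative values
history = [(r, g)]
for _ in range(10):
    r, g = polya_step(r, g, c, rng)
    history.append((r, g))
```

Note that the total number of balls grows deterministically by $c$ at each step, so the state actually lives on a one-parameter family of diagonals in $\mathbb{N}_0 \times \mathbb{N}_0$.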
Page: 237 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
238 17 The Markov Property
Theorem 17.10 (Finite Dimensional Distributions). Suppose that $X = \{X_t\}_{t \ge 0}$ is a Markov process with Markov transition kernels $\{Q_{s,t}\}_{0 \le s \le t}$. Further let $\nu := \operatorname{Law}_P(X_0)$, then for all $0 = t_0 < t_1 < t_2 < \dots < t_n$ we have
\[
\operatorname{Law}_P(X_{t_0}, X_{t_1}, \dots, X_{t_n})(dx_0, dx_1, \dots, dx_n) = d\nu(x_0) \prod_{i=1}^{n} Q_{t_{i-1},t_i}(x_{i-1}, dx_i) \tag{17.15}
\]
or equivalently,
\[
\mathbb{E}[f(X_{t_0}, X_{t_1}, \dots, X_{t_n})] = \int_{S^{n+1}} f(x_0, x_1, \dots, x_n)\, d\nu(x_0) \prod_{i=1}^{n} Q_{t_{i-1},t_i}(x_{i-1}, dx_i) \tag{17.16}
\]
for all $f \in \mathcal{S}^{\otimes(n+1)}_b$.

Proof. Because of the multiplicative system Theorem 8.2, it suffices to prove Eq. (17.16) for functions of the form $f(x_0, \dots, x_n) = \prod_{i=0}^{n} f_i(x_i)$ where $f_i \in \mathcal{S}_b$. The proof is now easily completed by induction on $n$. It is true for $n = 0$ by definition of $\nu$. Now assume it is true for some $n - 1 \ge 0$. We then have, making use of the inductive hypothesis, that
\begin{align*}
\mathbb{E}[f(X_{t_0}, X_{t_1}, \dots, X_{t_n})]
&= \mathbb{E}\left[\mathbb{E}_{\mathcal{B}_{t_{n-1}}}\left[\prod_{i=0}^{n} f_i(X_{t_i})\right]\right] \\
&= \mathbb{E}\left[Q_{t_{n-1},t_n}\left(X_{t_{n-1}}, f_n\right) \prod_{i=0}^{n-1} f_i(X_{t_i})\right] \\
&= \int_{S^n} Q_{t_{n-1},t_n}(x_{n-1}, f_n) \prod_{i=0}^{n-1} f_i(x_i)\, d\nu(x_0) \prod_{i=1}^{n-1} Q_{t_{i-1},t_i}(x_{i-1}, dx_i) \\
&= \int_{S^n} \left[\int_S Q_{t_{n-1},t_n}(x_{n-1}, dx_n)\, f_n(x_n)\right] \prod_{i=0}^{n-1} f_i(x_i)\, d\nu(x_0) \prod_{i=1}^{n-1} Q_{t_{i-1},t_i}(x_{i-1}, dx_i) \\
&= \int_{S^{n+1}} f(x_0, x_1, \dots, x_n)\, d\nu(x_0) \prod_{i=1}^{n} Q_{t_{i-1},t_i}(x_{i-1}, dx_i)
\end{align*}
as desired.
Theorem 17.11 (Existence of Markov processes). Suppose that $\{Q_{s,t}\}_{0 \le s \le t}$ are Markov transition kernels on a standard Borel space, $(S, \mathcal{S})$. Let $\Omega := S^T$, $X_t : \Omega \to S$ be the projection map, $X_t(\omega) = \omega(t)$, and $\mathcal{B}_t = \mathcal{B}^X_t = \sigma(X_s : s \le t)$ for all $t \in T$ and $\mathcal{B} := \mathcal{S}^{\otimes T} = \sigma(X_t : t \in T)$. Then to each probability measure, $\nu$, on $(S, \mathcal{S})$ there exists a unique probability measure $P_\nu$ on $(\Omega, \mathcal{B})$ such that 1) $\operatorname{Law}_{P_\nu}(X_0) = \nu$ and 2) $\{X_t\}_{t \ge 0}$ is a Markov process having $\{Q_{s,t}\}_{0 \le s \le t}$ as its Markov transition kernels.

Proof. This is mainly an exercise in applying Kolmogorov's extension Theorem 17.54 as described in the appendix to this chapter. I will only briefly sketch the proof here.

For each $\Lambda = \{0 = t_0 < t_1 < t_2 < \dots < t_n\} \subset T$, let $P_\Lambda$ be the measure on $\left(S^{n+1}, \mathcal{S}^{\otimes(n+1)}\right)$ defined by
\[
dP_\Lambda(x_0, x_1, \dots, x_n) = d\nu(x_0) \prod_{i=1}^{n} Q_{t_{i-1},t_i}(x_{i-1}, dx_i).
\]
Using the Chapman-Kolmogorov equations one shows that the $\{P_\Lambda\}_{\Lambda \subset_f T}$ ($\Lambda \subset_f T$ denotes a finite subset of $T$) are consistently defined measures as described in the statement of Theorem 17.54. Therefore it follows by an application of that theorem that there exists a unique measure $P_\nu$ on $(\Omega, \mathcal{B})$ such that
\[
\operatorname{Law}_{P_\nu}(X|_\Lambda) = P_\Lambda \text{ for all } \Lambda \subset_f T. \tag{17.17}
\]
In light of Theorem 17.10, in order to finish the proof we need only show that $\{X_t\}_{t \ge 0}$ is a Markov process having $\{Q_{s,t}\}_{0 \le s \le t}$ as its Markov transition kernels. For if this is the case, its finite dimensional distributions must be given as in Eq. (17.17) and therefore $P_\nu$ is uniquely determined. So let us now verify the desired Markov property.

Again let $\Lambda = \{0 = t_0 < t_1 < t_2 < \dots < t_n\} \subset T$ with $t_{n-1} = s < t = t_n$ and suppose that $f(x_0, \dots, x_n) = h(x_0, \dots, x_{n-1})\, g(x_n)$ with $h \in \mathcal{S}^{\otimes n}_b$ and $g \in \mathcal{S}_b$. By the definition of $P_\nu$ we then have (writing $\mathbb{E}_\nu$ for $\mathbb{E}_{P_\nu}$),
\begin{align*}
\mathbb{E}_\nu\left[h\left(X_{t_0}, \dots, X_{t_{n-1}}\right) g(X_t)\right]
&= \int_{S^{n+1}} h(x_0, \dots, x_{n-1})\, g(x_n)\, d\nu(x_0) \prod_{i=1}^{n} Q_{t_{i-1},t_i}(x_{i-1}, dx_i) \\
&= \int_{S^n} h(x_0, \dots, x_{n-1})\, Q_{t_{n-1},t_n}(x_{n-1}, g)\, d\nu(x_0) \prod_{i=1}^{n-1} Q_{t_{i-1},t_i}(x_{i-1}, dx_i) \\
&= \mathbb{E}_\nu\left[h\left(X_{t_0}, \dots, X_{t_{n-1}}\right) Q_{t_{n-1},t_n}\left(X_{t_{n-1}}, g\right)\right] \\
&= \mathbb{E}_\nu\left[h\left(X_{t_0}, \dots, X_{t_{n-1}}\right) Q_{s,t}(X_s, g)\right].
\end{align*}
It then follows by an application of the multiplicative system theorem that
\[
\mathbb{E}_\nu[H\, g(X_t)] = \mathbb{E}_\nu[H\, Q_{s,t}(X_s, g)] \text{ for all } H \in (\mathcal{B}_s)_b
\]
and therefore that
\[
\mathbb{E}_\nu[g(X_t) \mid \mathcal{B}_s] = Q_{s,t}(X_s, g) \text{ a.s.}
\]
We are now going to specialize to the more manageable class of time homogeneous Markov processes.

Definition 17.12. We say that a collection of Markov transition kernels, $\{Q_{s,t}\}_{0 \le s \le t}$, is time homogeneous if $Q_{s,t} = Q_{0,t-s}$ for all $0 \le s \le t$. In this case we usually let $Q_t := Q_{0,t}$. The condition that $Q_{s,s}(x, \cdot) = \delta_x$ now reduces to $Q_0(x, \cdot) = \delta_x$ and the Chapman-Kolmogorov equations reduce to
\[
Q_s Q_t = Q_{s+t} \text{ for all } s, t \ge 0, \tag{17.18}
\]
i.e.
\[
\int_S Q_s(x, dy)\, Q_t(y, A) = Q_{s+t}(x, A) \text{ for all } s, t \ge 0,\ x \in S, \text{ and } A \in \mathcal{S}. \tag{17.19}
\]
A collection of operators $\{Q_t\}_{t \ge 0}$ with $Q_0 = \mathrm{Id}$ satisfying Eq. (17.18) is called a one parameter semi-group.

Definition 17.13. A Markov process is time homogeneous if it has time homogeneous Markov transition kernels. In this case we will have,
\[
\mathbb{E}[f(X_t) \mid \mathcal{B}_s] = Q_{t-s}(X_s, f) = (Q_{t-s} f)(X_s) \text{ a.s.} \tag{17.20}
\]
for all $0 \le s \le t$ and $f \in \mathcal{S}_b$.
Theorem 17.14 (The time homogeneous Markov property). Suppose that $(S, \mathcal{S})$ is a measurable space, $Q_t : S \times \mathcal{S} \to [0,1]$ are time homogeneous Markov transition kernels, $\left(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}\right)$ is a filtered measure space, $\{X_t : \Omega \to S\}_{t \ge 0}$ are adapted functions, and for each $x \in S$ there exists a probability measure, $P_x$, on $(\Omega, \mathcal{B})$ such that;

1. $X_0(\omega) = x$ for $P_x$-a.e. $\omega$, and
2. $\{X_t\}_{t \ge 0}$ is a time homogeneous Markov process with transition kernels $\{Q_t\}_{t \ge 0}$ relative to $P_x$.

Let us further suppose that $P$ is any probability measure on $(\Omega, \mathcal{B})$ such that $\{X_t\}_{t \ge 0}$ is a time homogeneous Markov process with transition kernels being $\{Q_t\}$. Then for all $F \in (\mathcal{S}^{\otimes T})_b$ and $t \ge 0$ we have that $S \ni x \to \mathbb{E}_x F(X)$ is $\mathcal{S}/\mathcal{B}_{\mathbb{R}}$ measurable and
\[
\mathbb{E}_P[F(X_{t+\cdot}) \mid \mathcal{B}_t] = \mathbb{E}_P[F(X_{t+\cdot}) \mid X_t] = \mathbb{E}_{X_t}[F(X)] \quad P\text{-a.s.} \tag{17.21}
\]
Warning: In this equation $\mathbb{E}_{X_t}$ does not denote $\mathbb{E}_{\sigma(X_t)} = \mathbb{E}[\cdot \mid X_t]$ but instead$^1$ it means the composition of $X_t$ with the function $S \ni x \to \mathbb{E}_x[F(X)]$. In more detail we are saying,
\[
\mathbb{E}_P[F(X_{t+\cdot}) \mid \mathcal{B}_t](\omega) = \mathbb{E}_{X_t(\omega)}[F(X)] = \int_\Omega F(X(\omega'))\, P_{X_t(\omega)}(d\omega').
\]
Proof. Let $\nu_t = \operatorname{Law}_P(X_t)$. If $g \in \mathcal{S}_b$, $f \in \mathcal{S}^{\otimes(n+1)}_b$, and $F(X) := f(X_{t_0}, \dots, X_{t_n})$, then
\begin{align*}
\mathbb{E}_P[g(X_t)\, F(X_{t+\cdot})]
&= \mathbb{E}_P[g(X_t)\, f(X_{t_0+t}, \dots, X_{t_n+t})] \\
&= \int g(x_0)\, f(x_0, \dots, x_n)\, d\nu_t(x_0) \prod_{j=1}^{n} Q_{t_j - t_{j-1}}(x_{j-1}, dx_j) \\
&= \int d\nu_t(x_0)\, g(x_0)\, \mathbb{E}_{x_0} f(X_{t_0}, \dots, X_{t_n}) \\
&= \int d\nu_t(x_0)\, g(x_0)\, \mathbb{E}_{x_0} F(X) = \mathbb{E}_P[g(X_t)\, \mathbb{E}_{X_t} F(X)].
\end{align*}
An application of the multiplicative systems Theorem 8.2 shows this equation is valid for all $F \in (\mathcal{S}^{\otimes T})_b$ and this tells us that
\[
\mathbb{E}_P[F(X_{t+\cdot}) \mid X_t] = \mathbb{E}_{X_t} F(X) \quad P\text{-a.s.}
\]
for all $F \in (\mathcal{S}^{\otimes T})_b$.

Now suppose that $G \in (\mathcal{B}_t)_b$ and $F \in (\mathcal{S}^{\otimes T})_b$. As $F(X_{t+\cdot})$ is measurable with respect to $\sigma(X_s : s \ge t)$, it follows by Theorem 17.4 that
\[
\mathbb{E}_P[G\, F(X_{t+\cdot}) \mid X_t] = \mathbb{E}_P[G \mid X_t]\, \mathbb{E}_P[F(X_{t+\cdot}) \mid X_t] \text{ a.s.}
\]
Thus we may conclude that
\begin{align*}
\mathbb{E}_P[G\, F(X_{t+\cdot})]
&= \mathbb{E}_P\left[\mathbb{E}_P[G \mid X_t]\, \mathbb{E}_P[F(X_{t+\cdot}) \mid X_t]\right] \\
&= \mathbb{E}_P\left[\mathbb{E}_P[G \mid X_t]\, \mathbb{E}_{X_t} F(X)\right] \\
&= \mathbb{E}_P\left[\mathbb{E}_P\left[\mathbb{E}_{X_t} F(X)\, G \mid X_t\right]\right] = \mathbb{E}_P\left[\mathbb{E}_{X_t}[F(X)]\, G\right].
\end{align*}
This being valid for all $G \in (\mathcal{B}_t)_b$ is equivalent to Eq. (17.21).

$^1$ Unfortunately we now have a lot of different meanings for $\mathbb{E}_{(\cdot)}$ depending on what $(\cdot)$ happens to be. So if $(\cdot) = P$ is a measure, then $\mathbb{E}_P$ stands for expectation relative to $P$. If $(\cdot) = \mathcal{G}$ is a $\sigma$-algebra, it stands for conditional expectation relative to $\mathcal{G}$ and a given probability measure which is not indicated in the notation. Finally, if $x \in S$ we are writing $\mathbb{E}_x$ for $\mathbb{E}_{P_x}$.
Remark 17.15. Admittedly Theorem 17.14 is a bit hard to parse on first reading. Therefore it is useful to rephrase what it says in the case that the state space, $S$, is finite or countable and $x \in S$ and $t > 0$ are such that $P(X_t = x) > 0$. Under these additional hypotheses we may combine Theorems 17.4 and 17.14 to find that $\{X_s\}_{s \le t}$ and $\{X_s\}_{s \ge t}$ are $P(\cdot \mid X_t = x)$ independent and moreover,
\[
\operatorname{Law}_{P(\cdot \mid X_t = x)}(X_{t+\cdot}) = \operatorname{Law}_{P_x}(X_\cdot). \tag{17.22}
\]
The last assertion simply states that, given $X_t = x$, the process, $X$, after time $t$ behaves just like the process starting afresh from $x$.
17.2 Discrete Time Homogeneous Markov Processes

The proof of the following easy lemma is left to the reader.

Lemma 17.16. If $Q_n : S \times \mathcal{S} \to [0,1]$ for $n \in \mathbb{N}_0$ are time homogeneous Markov kernels, then $Q_n = Q^n$ where $Q := Q_1$ and $Q^0 := I$. Conversely if $Q$ is a probability kernel on $S \times \mathcal{S}$, then $Q_n := Q^n$ for $n \in \mathbb{N}_0$ are time homogeneous Markov kernels.
Example 17.17 (Random Walks Revisited). Suppose that $\xi_0 : \Omega \to S := \mathbb{R}^d$ is independent of $\left\{\xi_n : \Omega \to S := \mathbb{R}^d\right\}_{n=1}^{\infty}$ which are now assumed to be i.i.d. If $X_m = \sum_{k=0}^{m} \xi_k$ is as in Example 17.8, then $\{X_m\}_{m \ge 0}$ is a time homogeneous Markov process with
\[
Q_m(x, \cdot) = \operatorname{Law}_P(x + X_m - X_0)
\]
and the one step transition kernel, $Q = Q_1$, is given by
\[
Qf(x) = Q(x, f) = \mathbb{E}[f(x + \xi_1)] = \int_S f(x + y)\, d\mu(y)
\]
where $\mu := \operatorname{Law}_P(\xi_1)$. For example if $d = 1$ and $P(\xi_i = 1) = p$ and $P(\xi_i = -1) = q := 1 - p$ for some $0 \le p \le 1$, then we may take $S = \mathbb{Z}$ and we then have
\[
Qf(x) = Q(x, f) = p f(x+1) + q f(x-1).
\]
Example 17.18 (Ehrenfest Urn Model). Let a beaker filled with a particle fluid mixture be divided into two parts $A$ and $B$ by a semipermeable membrane. Let $X_n = (\#$ of particles in $A)$, which we assume evolves by choosing a particle at random from $A \cup B$ and then replacing this particle in the opposite bin from which it was found. Modeling $X_n$ as a Markov process we find,
\[
P(X_{n+1} = j \mid X_n = i) =
\begin{cases}
0 & \text{if } j \notin \{i-1, i+1\} \\
i/N & \text{if } j = i - 1 \\
(N-i)/N & \text{if } j = i + 1
\end{cases}
=: q(i,j).
\]
As these probabilities do not depend on $n$, $X_n$ is a time homogeneous Markov chain.
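The kernel $q(i,j)$ above is easy to tabulate as an $(N+1) \times (N+1)$ matrix; a brief sketch ($N = 4$ particles is an arbitrary choice for illustration):

```python
N = 4  # total number of particles (arbitrary illustrative value)

# q[i][j] = P(X_{n+1} = j | X_n = i) for the Ehrenfest urn model.
q = [[0.0] * (N + 1) for _ in range(N + 1)]
for i in range(N + 1):
    if i > 0:
        q[i][i - 1] = i / N        # a particle of A is drawn and moved to B
    if i < N:
        q[i][i + 1] = (N - i) / N  # a particle of B is drawn and moved to A
```

By construction each row sums to one, as the two cases $j = i \pm 1$ exhaust the possible transitions.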
Exercise 17.5. Consider a rat in a maze consisting of 7 rooms which is laid out as in the following figure:
\[
\begin{array}{ccc}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & &
\end{array}
\]
In this figure rooms are connected by either vertical or horizontal adjacent passages only, so that 1 is connected to 2 and 4 but not to 5, and 7 is only connected to 4. At each time $t \in \mathbb{N}_0$ the rat moves from her current room to one of the adjacent rooms with equal probability (the rat always changes rooms at each time step). Find the one step $7 \times 7$ transition matrix, $q$, with entries given by $q(i,j) := P(X_{n+1} = j \mid X_n = i)$, where $X_n$ denotes the room the rat is in at time $n$.

Solution to Exercise (17.5). The rat moves to an adjacent room with probability $1/D$, where $D$ is the number of doors in the room where the rat is currently located. The transition matrix (rows and columns indexed by rooms $1, \dots, 7$) is therefore,
\[
q =
\begin{pmatrix}
0 & 1/2 & 0 & 1/2 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 1/3 & 0 & 0 \\
0 & 1/2 & 0 & 0 & 0 & 1/2 & 0 \\
1/3 & 0 & 0 & 0 & 1/3 & 0 & 1/3 \\
0 & 1/3 & 0 & 1/3 & 0 & 1/3 & 0 \\
0 & 0 & 1/2 & 0 & 1/2 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0
\end{pmatrix} \tag{17.23}
\]
and the corresponding jump diagram is given in Figure 17.1.
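One way to double-check the matrix in Eq. (17.23) is to rebuild it from the adjacency structure of the maze and verify that each row is a probability vector:

```python
# Adjacency of the 7-room maze (rooms 1..7), read off the figure.
doors = {1: [2, 4], 2: [1, 3, 5], 3: [2, 6],
         4: [1, 5, 7], 5: [2, 4, 6], 6: [3, 5], 7: [4]}

# q[i-1][j-1] = 1/(number of doors of room i) if j is adjacent to i, else 0.
q = [[1.0 / len(doors[i]) if j in doors[i] else 0.0
      for j in range(1, 8)] for i in range(1, 8)]
```

The entries agree with Eq. (17.23); in particular room 7 has a single door, so its row is the point mass at room 4.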
Exercise 17.6 (2-state MC). Consider the following simple (i.e. no-brainer) two state game consisting of moving between two sites labeled 1 and 2. At each site you find a coin with sides labeled 1 and 2. The probability of flipping a 2 at site 1 is $a \in (0,1)$ and of flipping a 1 at site 2 is $b \in (0,1)$. If you are at site $i$ at time $n$, then you flip the coin at this site and move or stay at the current site as indicated by the coin toss. We summarize this scheme by the jump diagram of Figure 17.2.

Fig. 17.1. The jump diagram for our rat in the maze.

Fig. 17.2. The generic jump diagram for a two state Markov chain.

It is reasonable to suppose that your location, $X_n$, at time $n$ is modeled by a Markov process with state space, $S = \{1, 2\}$. Explain (briefly) why this is a time homogeneous chain and find the one step transition probabilities,
\[
q(i,j) = P(X_{n+1} = j \mid X_n = i) \text{ for } i, j \in S.
\]
Use your result and basic linear (matrix) algebra to compute $\lim_{n \to \infty} P(X_n = 1)$. Your answer should be independent of the possible starting distributions, $\nu = (\nu_1, \nu_2)$ for $X_0$, where $\nu_i := P(X_0 = i)$.
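The exercise can be explored numerically by iterating the distribution under the chain; the values $a = 0.3$, $b = 0.2$ below are arbitrary, and the experiment only illustrates (it does not prove) that the starting distribution washes out:

```python
a, b = 0.3, 0.2  # arbitrary illustrative parameters

# One-step transition matrix of the two-state chain.
q = [[1 - a, a],
     [b, 1 - b]]

def step(mu, q):
    # One step of the distribution: (mu q)_j = sum_i mu_i q(i, j).
    return [sum(mu[i] * q[i][j] for i in range(2)) for j in range(2)]

mu1 = [1.0, 0.0]  # start at site 1
mu2 = [0.0, 1.0]  # start at site 2
for _ in range(200):
    mu1, mu2 = step(mu1, q), step(mu2, q)
```

After 200 iterations the two distributions are numerically indistinguishable, since the second eigenvalue of $q$ is $1 - a - b$ with $|1 - a - b| < 1$.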
17.3 Continuous time homogeneous Markov processes

An analogous (to Lemma 17.16) infinitesimal description of time homogeneous Markov kernels in the continuous time case can involve a considerable number of technicalities. Nevertheless, in this section we are going to ignore these difficulties in order to give a general impression of how the story goes. We will cover the missing details more precisely later.

So let $\{Q_t\}_{t \in \mathbb{R}_+}$ be a time homogeneous collection of Markov transition kernels. We define the infinitesimal generator of $\{Q_t\}_{t \ge 0}$ by,
\[
Af := \frac{d}{dt}\Big|_{0^+} Q_t f = \lim_{t \downarrow 0} \frac{Q_t f - f}{t}. \tag{17.24}
\]
For now we make the (often unreasonable) assumption that the limit in Eq. (17.24) holds for all $f \in \mathcal{S}_b$. This assumption is OK when $S$ is a finite or sometimes even a countable state space. For more complicated state spaces we will have to restrict the set of $f \in \mathcal{S}_b$ that we consider when computing $Af$ by Eq. (17.24). You should get a feeling for this issue by working through Exercise 17.8 which involves Brownian motion.

Since we are assuming $\frac{d}{dt}|_{0^+} Q_t f$ exists, we must also have $\lim_{t \downarrow 0} Q_t f = f$. More generally for $t, h > 0$, using the semi-group property, we have
\[
Q_{t+h} - Q_t = (Q_h - I) Q_t = Q_t (Q_h - I) \tag{17.25}
\]
and therefore,
\[
Q_{t+h} f - Q_t f = (Q_h - I) Q_t f \to 0 \text{ as } h \downarrow 0
\]
so that $Q_t f$ is right continuous. Similarly, for $0 < h < t$,
\[
Q_{t-h} - Q_t = -(Q_h - I) Q_{t-h} = -Q_{t-h}(Q_h - I) \tag{17.26}
\]
and
\[
|Q_{t-h} f - Q_t f| = |Q_{t-h}(Q_h - I) f| \le Q_{t-h} |(Q_h - I) f| \le \sup_S |(Q_h - I) f|
\]
which will tend to zero as $h \downarrow 0$ provided $Q_h f \to f$ uniformly (another fantasy in general). With this as justification we will assume that $t \to Q_t f$ is continuous in $t$.

Taking Eq. (17.25) divided by $h$ and Eq. (17.26) divided by $-h$ and then letting $h \downarrow 0$ implies,
\[
\left(\frac{d}{dt}\right)^+ Q_t f = A Q_t f = Q_t A f
\]
and
\[
\left(\frac{d}{dt}\right)^- Q_t = A Q_t = Q_t A,
\]
where $\left(\frac{d}{dt}\right)^+$ and $\left(\frac{d}{dt}\right)^-$ denote the right and left derivatives at $t$. So in principle we can expect that $\{Q_t\}_{t \ge 0}$ is uniquely determined by its infinitesimal generator $A$ by solving the differential equation,
\[
\frac{d}{dt} Q_t = A Q_t = Q_t A \text{ with } Q_0 = \mathrm{Id}. \tag{17.27}
\]
Assuming all of this works out as sketched, it is now reasonable to denote $Q_t$ by $e^{tA}$. Let us now give a few examples to illustrate the discussion above.
Example 17.19. Suppose that $S = \{1, 2, \dots, n\}$ and $Q_t$ is a Markov semi-group with infinitesimal generator, $A$, so that $\frac{d}{dt} Q_t = A Q_t = Q_t A$. By assumption $Q_t(i,j) \ge 0$ for all $i, j \in S$ and $\sum_{j=1}^{n} Q_t(i,j) = 1$ for all $i \in S$. We may write this last condition as $Q_t \mathbf{1} = \mathbf{1}$ for all $t \ge 0$, where $\mathbf{1}$ denotes the vector in $\mathbb{R}^n$ with all entries being 1. Differentiating $Q_t \mathbf{1} = \mathbf{1}$ at $t = 0$ shows that $A \mathbf{1} = 0$, i.e. $\sum_{j=1}^{n} A_{ij} = 0$ for all $i \in S$. Since
\[
A_{ij} = \lim_{t \downarrow 0} \frac{Q_t(i,j) - \delta_{ij}}{t},
\]
if $i \ne j$ we will have,
\[
A_{ij} = \lim_{t \downarrow 0} \frac{Q_t(i,j)}{t} \ge 0.
\]
Thus we have shown the infinitesimal generator, $A$, of $Q_t$ must satisfy $A_{ij} \ge 0$ for all $i \ne j$ and $\sum_{j=1}^{n} A_{ij} = 0$ for all $i \in S$. You are asked to prove the converse in Exercise 17.7. So an explicit example of an infinitesimal generator when $S = \{1, 2, 3\}$ is
\[
A = \begin{pmatrix} -3 & 1 & 2 \\ 4 & -6 & 2 \\ 7 & 1 & -8 \end{pmatrix}.
\]
Exercise 17.7. Suppose that $S = \{1, 2, \dots, n\}$ and $A$ is a matrix such that $A_{ij} \ge 0$ for $i \ne j$ and $\sum_{j=1}^{n} A_{ij} = 0$ for all $i$. Show
\[
Q_t = e^{tA} := \sum_{n=0}^{\infty} \frac{t^n}{n!} A^n \tag{17.28}
\]
is a time homogeneous Markov kernel.

Hints: 1. To show $Q_t(i,j) \ge 0$ for all $t \ge 0$ and $i, j \in S$, write $Q_t = e^{-t\lambda} e^{t(\lambda I + A)}$ where $\lambda > 0$ is chosen so that $\lambda I + A$ has only non-negative entries. 2. To show $\sum_{j \in S} Q_t(i,j) = 1$, compute $\frac{d}{dt} Q_t \mathbf{1}$.
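A numerical companion to the exercise (not a proof): exponentiate the explicit generator from Example 17.19 by a truncated power series and check that the result is a Markov kernel. Pure Python; the truncation length and $t = 0.1$ are pragmatic choices:

```python
A = [[-3.0, 1.0, 2.0],
     [4.0, -6.0, 2.0],
     [7.0, 1.0, -8.0]]  # the generator from Example 17.19

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, t, terms=60):
    # Truncated power series e^{tA} = sum_k (tA)^k / k!.
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]  # holds (tA)^k / k!, starting at k = 0
    for k in range(1, terms):
        power = matmul(power, A)
        power = [[t / k * x for x in row] for row in power]
        result = [[result[i][j] + power[i][j] for j in range(n)]
                  for i in range(n)]
    return result

Qt = expm(A, 0.1)
```

Since $A^k \mathbf{1} = 0$ for $k \ge 1$, every partial sum already has unit row sums, while nonnegativity of the entries is exactly what Hint 1 establishes.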
Example 17.20 (Poisson Process). By Exercise 17.2, it follows that the Poisson process, $\{N_t \in S := \mathbb{N}_0\}_{t \ge 0}$, with intensity $\lambda$ has the Markov property. For all $0 \le s \le t$ we have,
\begin{align*}
P(N_t = y \mid N_s = x) &= P(N_s + (N_t - N_s) = y \mid N_s = x) \\
&= P(N_t - N_s = y - x \mid N_s = x) \\
&= 1_{y \ge x} \frac{(\lambda(t-s))^{y-x}}{(y-x)!} e^{-\lambda(t-s)} =: q_{t-s}(x,y).
\end{align*}
With this notation it follows that
\[
\mathbb{E}[f(N_t) \mid N_s] = (Q_{t-s} f)(N_s)
\]
where
\begin{align*}
Q_t f(x) &= \sum_{y \in S} q_t(x,y) f(y) = \sum_{y \in S} 1_{y \ge x} \frac{(\lambda t)^{y-x}}{(y-x)!} e^{-\lambda t} f(y) \\
&= \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} e^{-\lambda t} f(x+n). \tag{17.29}
\end{align*}
In particular $\{N_t\}_{t \ge 0}$ is a time homogeneous Markov process. It is easy (but technically unnecessary) to directly verify the semi-group property;
\[
(q_t * q_s)(x,z) := \sum_{y \in S} q_t(x,y)\, q_s(y,z) = q_{s+t}(x,z).
\]
This can be done using the binomial theorem as follows;
\begin{align*}
\sum_{y \in S} q_t(x,y)\, q_s(y,z)
&= \sum_{y \in S} 1_{y \ge x} \frac{(\lambda t)^{y-x}}{(y-x)!} e^{-\lambda t}\, 1_{z \ge y} \frac{(\lambda s)^{z-y}}{(z-y)!} e^{-\lambda s} \\
&= \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} e^{-\lambda t}\, 1_{z \ge x+n} \frac{(\lambda s)^{z-x-n}}{(z-x-n)!} e^{-\lambda s} \\
&= 1_{z \ge x}\, e^{-\lambda(t+s)} \sum_{n=0}^{z-x} \frac{(\lambda t)^n}{n!} \frac{(\lambda s)^{z-x-n}}{(z-x-n)!} \\
&= 1_{z \ge x}\, e^{-\lambda(t+s)} \frac{(\lambda(t+s))^{z-x}}{(z-x)!} = q_{s+t}(x,z).
\end{align*}
To identify the infinitesimal generator, $A = \frac{d}{dt}|_{0^+} Q_t$, in this example, observe that
\begin{align*}
\frac{d}{dt} Q_t f(x) &= \frac{d}{dt}\left[e^{-\lambda t} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} f(x+n)\right] \\
&= -\lambda Q_t f(x) + \lambda e^{-\lambda t} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} f(x+n+1) \\
&= -\lambda Q_t f(x) + \lambda (Q_t f)(x+1) \\
&= \lambda\, Q_t\left[f(\cdot + 1) - f(\cdot)\right](x)
\end{align*}
and hence
\[
Af(x) = \lambda\left(f(x+1) - f(x)\right).
\]
Finally let us try to solve Eq. (17.27) in order to recover $Q_t$ from $A$. Formally we can hope that $Q_t = e^{tA}$ where $e^{tA}$ is given as its power series expansion. To simplify the computation it is convenient to write $A = \lambda(T - I)$ where $If = f$ and $Tf = f(\cdot + 1)$. Since $I$ and $T$ commute we further expect
\[
e^{tA} = e^{\lambda t(T - I)} = e^{-\lambda t I} e^{\lambda t T} = e^{-\lambda t} e^{\lambda t T}
\]
where
\[
\left(e^{\lambda t T} f\right)(x) = \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} (T^n f)(x) = \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} f(x+n).
\]
Putting this all together we find
\[
\left(e^{tA} f\right)(x) = e^{-\lambda t} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} f(x+n),
\]
which is indeed in agreement with $Q_t f(x)$ as we saw in Eq. (17.29).
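The binomial-theorem verification above can also be confirmed numerically; the intensity, times, and states below are arbitrary choices:

```python
import math

lam = 1.5  # arbitrary intensity for illustration

def q(t, x, y):
    # Poisson transition probability q_t(x, y) from Example 17.20.
    if y < x:
        return 0.0
    return (lam * t) ** (y - x) / math.factorial(y - x) * math.exp(-lam * t)

s, t, x, z = 0.7, 0.4, 0, 5

# Chapman-Kolmogorov: sum over the (finitely many) intermediate states.
conv = sum(q(t, x, y) * q(s, y, z) for y in range(x, z + 1))
direct = q(s + t, x, z)
```

The sum is finite because $q_t(x,y)$ vanishes for $y < x$ and $q_s(y,z)$ for $y > z$, exactly as in the computation above.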
Definition 17.21 (Brownian Motion). Let $\left(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \in \mathbb{R}_+}, P\right)$ be a filtered probability space. A real valued adapted process, $\{X_t : \Omega \to S = \mathbb{R}\}_{t \in \mathbb{R}_+}$, is called a Brownian motion if;

1. $\{X_t\}_{t \in \mathbb{R}_+}$ has independent increments,
2. for $0 \le s < t$, $X_t - X_s \stackrel{d}{=} N(0, t-s)$, i.e. $X_t - X_s$ is a normal mean zero random variable with variance $(t - s)$,
3. $t \to X_t(\omega)$ is continuous for all $\omega \in \Omega$.
Exercise 17.8 (Brownian Motion). Assuming a Brownian motion $\{B_t\}_{t \ge 0}$ exists as described in Definition 17.21, show;

1. The process is a time homogeneous Markov process with transition kernels given by;
\[
Q_t(x, dy) = q_t(x,y)\, dy \tag{17.30}
\]
where
\[
q_t(x,y) = \frac{1}{\sqrt{2\pi t}} e^{-\frac{1}{2t}|y-x|^2}. \tag{17.31}
\]
2. Show by direct computation that $Q_t Q_s = Q_{t+s}$ for all $s, t > 0$. Hint: one of the many ways to do this is to use basic facts you have already proved about sums of independent Gaussian random variables along with the identity,
\[
(Q_t f)(x) = \mathbb{E}\left[f\left(x + \sqrt{t} Z\right)\right],
\]
where $Z \stackrel{d}{=} N(0,1)$.
3. Show by direct computation that $q_t(x,y)$ satisfies the heat equation,
\[
\frac{d}{dt} q_t(x,y) = \frac{1}{2} \frac{d^2}{dx^2} q_t(x,y) = \frac{1}{2} \frac{d^2}{dy^2} q_t(x,y) \text{ for } t > 0.
\]
4. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is a twice continuously differentiable function with compact support. Show
\[
\frac{d}{dt} Q_t f = A Q_t f = Q_t A f \text{ for all } t > 0,
\]
where
\[
Af(x) = \frac{1}{2} f''(x).
\]
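Part 3. of the exercise can be sanity-checked numerically by finite differences at a single point (the step size $h$ and the point $(t, x, y)$ are arbitrary; this is a spot check, not the requested computation):

```python
import math

def q(t, x, y):
    # Heat kernel q_t(x, y) of Eq. (17.31).
    return math.exp(-((y - x) ** 2) / (2 * t)) / math.sqrt(2 * math.pi * t)

t, x, y, h = 1.0, 0.3, 1.1, 1e-4

# Central differences approximating d/dt and (1/2) d^2/dy^2 of q_t(x, y).
dqdt = (q(t + h, x, y) - q(t - h, x, y)) / (2 * h)
d2qdy2 = (q(t, x, y + h) - 2 * q(t, x, y) + q(t, x, y - h)) / h ** 2
```

The two quantities agree to roughly the truncation error of the difference quotients, consistent with the heat equation.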
Combining Exercise 17.8 with Theorem 17.11 proves the following corollary.

Corollary 17.22. There exists a Markov process $\{B_t\}_{t \ge 0}$ satisfying properties 1. and 2. of Definition 17.21.

To get the path continuity property of Brownian motion requires additional arguments which we will carry out in a number of ways later, see Theorems 26.3, 26.7, and 34.5. Modulo technical details, Exercise 17.8 shows that $A = \frac{1}{2} \frac{d^2}{dx^2}$ is the infinitesimal generator of Brownian motion, i.e. of $Q_t$ in Eqs. (17.30) and (17.31). The technical details we have ignored involve the proper function spaces in which to carry out these computations along with a proper description of the domain of the operator $A$. We will have to postpone these somewhat delicate issues until later. By the way, it is no longer necessarily a good idea to try to recover $Q_t$ as $\sum_{n=0}^{\infty} \frac{t^n}{n!} A^n$ in this example since in order for $\sum_{n=0}^{\infty} \frac{t^n}{n!} A^n f$ to make sense one needs to assume that $f$ is at least $C^\infty$, and even this will not guarantee convergence of the sum.

A Lévy process is a generalization of this type of process: a process with independent stationary increments which has right continuous paths.
Example 17.23. If $\{N_t\}_{t \ge 0}$ is a Poisson process and $\{B_t\}_{t \ge 0}$ is a Brownian motion which is independent of $\{N_t\}_{t \ge 0}$, then $X_t = B_t + N_t$ is a Lévy process, i.e. has independent stationary increments and is right continuous. The process $X$ is a time homogeneous Markov process with Markov transition kernels given by;
\begin{align*}
(Q_t f)(x) &= \mathbb{E} f(x + N_t + B_t) = \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi t}} e^{-\frac{1}{2t}|y|^2}\, \mathbb{E}[f(x + y + N_t)]\, dy \\
&= \frac{e^{-\lambda t}}{\sqrt{2\pi t}} \sum_{n=0}^{\infty} \int_{\mathbb{R}} e^{-\frac{1}{2t}|y|^2} \frac{(\lambda t)^n}{n!} f(x + y + n)\, dy \\
&= \frac{e^{-\lambda t}}{\sqrt{2\pi t}} \sum_{n=0}^{\infty} \int_{\mathbb{R}} e^{-\frac{1}{2t}|y - n|^2} \frac{(\lambda t)^n}{n!} f(x + y)\, dy \\
&= \int_{\mathbb{R}} q_t(x,y) f(y)\, dy
\end{align*}
where
\[
q_t(x,y) = \frac{e^{-\lambda t}}{\sqrt{2\pi t}} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} e^{-\frac{1}{2t}|y - x - n|^2}.
\]
The infinitesimal generator, $A = \frac{d}{dt}|_{0^+} Q_t$, of this process satisfies,
\[
(Af)(x) = \frac{1}{2} f''(x) + \lambda\left(f(x+1) - f(x)\right),
\]
at least for all $f \in C_c^2(\mathbb{R})$. This example will be significantly generalized in Theorem 17.28 below.
In order to continue with giving examples in the continuous time case we will need a simple measure theoretic result.

Lemma 17.24. If $\Omega$ is a set, $\Omega_0 \subset \Omega$, and $\mathcal{B}_0$ is a $\sigma$-algebra on $\Omega_0$, then
\[
\tilde{\mathcal{B}}_0 := \{A \subset \Omega : A \cap \Omega_0 \in \mathcal{B}_0\}
\]
is a $\sigma$-algebra on $\Omega$. Moreover, $f : \Omega \to \mathbb{R}$ is $\tilde{\mathcal{B}}_0$ measurable iff $f|_{\Omega_0}$ is $\mathcal{B}_0$ measurable.

Proof. It is clear that $\emptyset, \Omega \in \tilde{\mathcal{B}}_0$ and that $\tilde{\mathcal{B}}_0$ is closed under countable unions since $\{A_n\} \subset \tilde{\mathcal{B}}_0$ iff $\{A_n \cap \Omega_0\} \subset \mathcal{B}_0$, which implies $[\cup A_n] \cap \Omega_0 = \cup [A_n \cap \Omega_0] \in \mathcal{B}_0$, and this implies that $\cup A_n \in \tilde{\mathcal{B}}_0$. Lastly if $A \in \tilde{\mathcal{B}}_0$ then $A \cap \Omega_0 \in \mathcal{B}_0$ implies that
\[
A^c \cap \Omega_0 = \Omega_0 \setminus A = \Omega_0 \setminus [A \cap \Omega_0] \in \mathcal{B}_0
\]
and therefore $A^c \in \tilde{\mathcal{B}}_0$.

For the second assertion, let us observe that for $W \in \mathcal{B}_{\mathbb{R}}$ we have
\[
f^{-1}(W) \cap \Omega_0 = \left(f|_{\Omega_0}\right)^{-1}(W)
\]
so that $f^{-1}(W) \in \tilde{\mathcal{B}}_0$ iff $f^{-1}(W) \cap \Omega_0 \in \mathcal{B}_0$ iff $\left(f|_{\Omega_0}\right)^{-1}(W) \in \mathcal{B}_0$. It now clearly follows that $f : \Omega \to \mathbb{R}$ is $\tilde{\mathcal{B}}_0$ measurable iff $f|_{\Omega_0}$ is $\mathcal{B}_0$ measurable.
Definition 17.25. Suppose that $\Omega = \cup_{n=0}^{\infty} \Omega_n$ and $\mathcal{B}_n$ is a $\sigma$-algebra on $\Omega_n$ for all $n$. Then we let $\bigwedge_{n=0}^{\infty} \mathcal{B}_n =: \mathcal{B}$ be the $\sigma$-algebra on $\Omega$ such that $A \subset \Omega$ is measurable iff $A \cap \Omega_n \in \mathcal{B}_n$ for all $n$. That is to say $\mathcal{B} = \cap_{n=0}^{\infty} \tilde{\mathcal{B}}_n$ with
\[
\tilde{\mathcal{B}}_n = \{A \subset \Omega : A \cap \Omega_n \in \mathcal{B}_n\}.
\]
From Lemma 17.24 it follows that $f : \Omega \to \mathbb{R}$ is $\bigwedge_{n=0}^{\infty} \mathcal{B}_n$ measurable iff $f^{-1}(W) \in \tilde{\mathcal{B}}_n$ for all $n$ iff $\left(f|_{\Omega_n}\right)^{-1}(W) \in \mathcal{B}_n$ for all $n$ iff $f|_{\Omega_n}$ is $\mathcal{B}_n$ measurable for all $n$. We in fact do not really use any properties of the $\Omega_n$ for these statements; it is not even necessary for $n$ to run over a countable index set!
Theorem 17.26. Suppose that $\left(\Omega, \{\tilde{\mathcal{B}}_n\}_{n \in \mathbb{N}_0}, \mathcal{B}, P, \{Y_n : \Omega \to S\}_{n \in \mathbb{N}_0}\right)$ is a time homogeneous Markov chain with one step transition kernel, $\hat{Q}$. Further suppose that $\{N_t\}_{t \ge 0}$ is a Poisson process with parameter $\lambda$ which is independent of $\tilde{\mathcal{B}}_\infty := \vee_n \tilde{\mathcal{B}}_n$. Let $\mathcal{B}_t$ be the $\sigma$-algebra on $\Omega$ such that
\[
[\mathcal{B}_t]_{\{N_t = n\}} := \left[\sigma(N_s : s \le t) \vee \tilde{\mathcal{B}}_n\right]_{\{N_t = n\}}.
\]
To be more explicit, $A \subset \Omega$ is in $\mathcal{B}_t$ iff
\[
A \cap \{N_t = n\} \in \sigma(N_s : s \le t) \vee \tilde{\mathcal{B}}_n \text{ for all } n \in \mathbb{N}_0.
\]
Finally let $X_t := Y_{N_t}$ for $t \in \mathbb{R}_+$. Then $\{\mathcal{B}_t\}_{t \in \mathbb{R}_+}$ is a filtration, $\{X_t\}_{t \ge 0}$ is adapted to this filtration and is a time homogeneous Markov process with transition semi-group given by
\[
Q_t = e^{-\lambda t} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} \hat{Q}^n = e^{-\lambda t} e^{\lambda t \hat{Q}} = e^{\lambda t (\hat{Q} - I)}.
\]
Proof. Let us begin by showing that $\{\mathcal{B}_t\}$ is increasing. If $0 \le s < t$ and $A \in \mathcal{B}_s$, then $A \cap \{N_s = m\} \in \sigma(N_r : r \le s) \vee \tilde{\mathcal{B}}_m$; therefore, for $n \ge m$, we have
\[
A \cap \{N_s = m\} \cap \{N_t = n\} \in \sigma(N_r : r \le t) \vee \tilde{\mathcal{B}}_m \subset \sigma(N_r : r \le t) \vee \tilde{\mathcal{B}}_n
\]
and therefore,
\[
A \cap \{N_t = n\} = \cup_{m \le n} \left[A \cap \{N_s = m\} \cap \{N_t = n\}\right] \in \sigma(N_r : r \le t) \vee \tilde{\mathcal{B}}_n
\]
for all $n \in \mathbb{N}_0$. Thus we have shown $A \in \mathcal{B}_t$ and therefore $\mathcal{B}_s \subset \mathcal{B}_t$ for all $s \le t$.

Since $X_t|_{\{N_t = n\}} = Y_n|_{\{N_t = n\}}$ is $\left[\sigma(N_r : r \le t) \vee \tilde{\mathcal{B}}_n\right]_{\{N_t = n\}}$ measurable for all $n \in \mathbb{N}_0$, it follows by Lemma 17.24 that $X_t$ is $\mathcal{B}_t$ measurable.

We now need to verify the Markov property. To this end let $t \ge s$, $f \in \mathcal{S}_b$ and $g \in (\mathcal{B}_s)_b$. Then for each $n \in \mathbb{N}_0$, $g|_{\{N_s = n\}}$ is measurable with respect to $\left[\sigma(N_r : r \le s) \vee \tilde{\mathcal{B}}_n\right]_{\{N_s = n\}}$ and therefore (using the independence of $N$ from $\tilde{\mathcal{B}}_\infty$ and the Markov property of $Y$),
\begin{align*}
\mathbb{E}[f(X_t)\, g] &= \sum_{m,n=0}^{\infty} \mathbb{E}[f(X_t)\, g : N_s = n,\ N_t - N_s = m] \\
&= \sum_{m,n=0}^{\infty} \mathbb{E}\left[f(Y_{n+m})\, 1_{N_t - N_s = m}\, g\, 1_{N_s = n}\right] \\
&= \sum_{m,n=0}^{\infty} P(N_t - N_s = m)\, \mathbb{E}\left[f(Y_{n+m})\, g\, 1_{N_s = n}\right] \\
&= \sum_{m,n=0}^{\infty} e^{-\lambda(t-s)} \frac{(\lambda(t-s))^m}{m!}\, \mathbb{E}\left[\left(\hat{Q}^m f\right)(Y_n)\, g\, 1_{N_s = n}\right] \\
&= \sum_{n=0}^{\infty} \mathbb{E}\left[(Q_{t-s} f)(Y_n)\, g\, 1_{N_s = n}\right] = \mathbb{E}\left[(Q_{t-s} f)(Y_{N_s})\, g\right] \\
&= \mathbb{E}\left[(Q_{t-s} f)(X_s)\, g\right].
\end{align*}
This shows that
\[
\mathbb{E}[f(X_t) \mid \mathcal{B}_s] = (Q_{t-s} f)(X_s) \quad P\text{-a.s.}
\]
which completes the proof.
Corollary 17.27. Suppose that $S$ is a countable or finite set and $a : S \times S \to \mathbb{R}$ is a function such that $a(x,y) \ge 0$ for all $x \ne y$, and there exists $\lambda < \infty$ such that
\[
a_x := \sum_{y \ne x} a(x,y) \le \lambda \text{ for all } x \in S.
\]
Let us now define $a(x,x) = -a_x$ and $A : \mathcal{S}_b \to \mathcal{S}_b$ by
\[
Af(x) := \sum_{y \in S} a(x,y) f(y) = \sum_{y \ne x} a(x,y) [f(y) - f(x)] \text{ for all } x \in S.
\]
Then;

1. The functions, $q : S \times S \to [0,1]$, defined by
\[
q(x,y) :=
\begin{cases}
\lambda^{-1} a(x,y) & \text{if } x \ne y \\
1 - \lambda^{-1} a_x & \text{if } x = y
\end{cases}
\]
are the matrix elements of a one step Markov transition kernel, $\hat{Q}$, i.e. $\hat{Q} : \mathcal{S}_b \to \mathcal{S}_b$ is defined by
\[
\hat{Q} f(x) = \sum_{y \in S} q(x,y) f(y).
\]
In this operator theoretic notation we have $A = \lambda\left(\hat{Q} - I\right)$.
2. If $\{Y_n\}_{n=0}^{\infty}$ is a time homogeneous Markov chain with one step transition kernel, $\hat{Q}$, and $\{N_t\}_{t \ge 0}$ is an independent Poisson process with intensity $\lambda$, then $X_t := Y_{N_t}$ is a time homogeneous Markov process with transition kernels,
\[
Q_t = e^{tA} = e^{-\lambda t} e^{\lambda t \hat{Q}}.
\]
In particular, $A$ is the infinitesimal generator of a Markov transition semi-group.

Notice that
\[
0 \le e^{\lambda t \hat{Q}}(x,y) = \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} \hat{Q}^n(x,y)
\]
and
\[
\sum_{y \in S} e^{\lambda t \hat{Q}}(x,y) = \sum_{y \in S} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} \hat{Q}^n(x,y) = \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} \sum_{y \in S} \hat{Q}^n(x,y) = \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} \cdot 1 = e^{\lambda t},
\]
so that $Q_t(x,y) \ge 0$ for all $x, y$ and $\sum_{y \in S} Q_t(x,y) = 1$ for all $x \in S$, as must be the case.
The compound Poisson process in the next theorem gives another example of a Lévy process and an example of the construction in Theorem 17.26. (The reader should compare the following result with Theorem 23.36 below.)

Theorem 17.28 (Compound Poisson Process). Suppose that $\{Z_i\}_{i=1}^{\infty}$ are i.i.d. random vectors in $\mathbb{R}^d$ and $\{N_t\}_{t \ge 0}$ is an independent Poisson process with intensity $\lambda$. Further let $Z_0 : \Omega \to \mathbb{R}^d$ be independent of $\{N_t\}_{t \ge 0}$ and the $\{Z_i\}_{i=1}^{\infty}$, and then define, for $t \in \mathbb{R}_+$,
\[
\mathcal{B}_t = \bigwedge_{n=0}^{\infty} \left[\sigma(N_s : s \le t,\ Z_0, \dots, Z_n)\right]_{\{N_t = n\}}
\]
and $X_t := S_{N_t}$ where $S_n := Z_0 + Z_1 + \dots + Z_n$. Then $\{\mathcal{B}_t\}_{t \ge 0}$ is a filtration (i.e. it is increasing), $\{X_t\}_{t \ge 0}$ is a $\mathcal{B}_t$-adapted process such that for all $0 \le s < t$, $X_t - X_s$ is independent of $\mathcal{B}_s$. The increments are stationary and therefore $\{X_t\}_{t \ge 0}$ is a Lévy process. The time homogeneous transition kernel is given by
\[
(Q_t f)(x) = \mathbb{E}[f(x + Z_1 + \dots + Z_{N_t})] = \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} e^{-\lambda t}\, \mathbb{E}[f(x + Z_1 + \dots + Z_n)].
\]
If we define $\left(\hat{Q} f\right)(x) := \mathbb{E}[f(x + Z_1)]$, the above equation may be written as,
\[
Q_t = e^{-\lambda t} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} \hat{Q}^n = e^{\lambda t (\hat{Q} - I)}.
\]
Proof. Let us begin by showing that $\{\mathcal{B}_t\}$ is increasing. First observe that
\[
\left[\sigma(N_s : s \le t, Z_0, \dots, Z_n)\right]_{\{N_t = n\}} = \left\{A \cap \{N_t = n\} : A \in \sigma(N_s : s \le t, Z_0, \dots, Z_n)\right\}.
\]
If $0 \le s < t$ and $A \in \mathcal{B}_s$, then $A \cap \{N_s = m\} \in \sigma(N_r : r \le s, Z_0, \dots, Z_m)$; therefore, for $n \ge m$, we have
\[
A \cap \{N_s = m\} \cap \{N_t = n\} \in \sigma(N_r : r \le t, Z_0, \dots, Z_m) \subset \sigma(N_r : r \le t, Z_0, \dots, Z_n)
\]
and we may conclude that
\[
A \cap \{N_t = n\} = \cup_{m \le n} \left[A \cap \{N_s = m\} \cap \{N_t = n\}\right] \in \sigma(N_r : r \le t, Z_0, \dots, Z_n)
\]
for all $n \in \mathbb{N}_0$. Thus we have shown $A \in \mathcal{B}_t$ and therefore $\mathcal{B}_s \subset \mathcal{B}_t$ for all $s \le t$.

Since
\[
X_t|_{\{N_t = n\}} = [Z_0 + Z_1 + \dots + Z_n]|_{\{N_t = n\}}
\]
is $\left[\sigma(N_s : s \le t, Z_0, \dots, Z_n)\right]_{\{N_t = n\}}$ measurable for all $n \in \mathbb{N}_0$, it follows by Lemma 17.24 that $X_t$ is $\mathcal{B}_t$ measurable.

We now show that $X_t - X_s$ is independent of $\mathcal{B}_s$ for all $t \ge s$. To this end, let $f \in (\mathcal{B}_{\mathbb{R}^d})_b$ and $g \in (\mathcal{B}_s)_b$. Then for each $n \in \mathbb{N}_0$, we have
\[
g|_{\{N_s = n\}} = G_n\left(\{N_r\}_{r \le s}, Z_0, \dots, Z_n\right)
\]
while
\[
(X_t - X_s)|_{\{N_s = n\}} = Z_{n+1} + \dots + Z_{N_t}.
\]
Therefore we have,
\[
\mathbb{E}[f(X_t - X_s)\, g] = \sum_{m,n=0}^{\infty} \mathbb{E}[f(X_t - X_s)\, g : N_s = n,\ N_t - N_s = m]
\]
and if we let $a_{m,n}$ be the summand on the right side of this equation we have,
\begin{align*}
a_{m,n} &= \mathbb{E}\left[f(Z_{n+1} + \dots + Z_{n+m})\, G_n\left(\{N_r\}_{r \le s}, Z_0, \dots, Z_n\right) : N_s = n,\ N_t - N_s = m\right] \\
&= \mathbb{E}\left[f(Z_{n+1} + \dots + Z_{n+m})\, 1_{N_t - N_s = m}\, G_n\left(\{N_r\}_{r \le s}, Z_0, \dots, Z_n\right) 1_{N_s = n}\right] \\
&= \mathbb{E}\left[f(Z_{n+1} + \dots + Z_{n+m})\, 1_{N_t - N_s = m}\right] \mathbb{E}\left[G_n\left(\{N_r\}_{r \le s}, Z_0, \dots, Z_n\right) 1_{N_s = n}\right] \\
&= e^{-\lambda(t-s)} \frac{(\lambda(t-s))^m}{m!}\, \mathbb{E}[f(Z_1 + \dots + Z_m)]\, \mathbb{E}\left[G_n\left(\{N_r\}_{r \le s}, Z_0, \dots, Z_n\right) 1_{N_s = n}\right].
\end{align*}
Therefore it follows that
\begin{align*}
\mathbb{E}[f(X_t - X_s)\, g] &= \sum_{m,n=0}^{\infty} a_{m,n} = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} a_{m,n} \\
&= (Q_{t-s} f)(0) \sum_{n=0}^{\infty} \mathbb{E}\left[G_n\left(\{N_r\}_{r \le s}, Z_0, \dots, Z_n\right) 1_{N_s = n}\right] = (Q_{t-s} f)(0)\, \mathbb{E}[g],
\end{align*}
from which it follows that $X_t - X_s$ is independent of $\mathcal{B}_s$ and
\[
\mathbb{E}[f(X_t - X_s)] = (Q_{t-s} f)(0).
\]
(This equation shows that the distribution of the increments is stationary.) We now know by Exercise 17.2 that $X_t$ is a Markov process and the transition kernel is given by
\[
(Q_{t-s} f)(x) = \mathbb{E}[f(x + X_t - X_s)] = (Q_{t-s} f(x + \cdot))(0)
\]
as described above.
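A simulation sketch of the compound Poisson process of Theorem 17.28 (jump law, intensity, horizon, and sample size below are arbitrary choices, and $Z_0$ is taken to be $0$):

```python
import random

def compound_poisson(t, lam, jump, rng):
    # Sample X_t = Z_1 + ... + Z_{N_t}: jump times of the Poisson process
    # are generated from exponential inter-arrival times with rate lam.
    x, time = 0.0, rng.expovariate(lam)
    while time <= t:
        x += jump(rng)
        time += rng.expovariate(lam)
    return x

rng = random.Random(42)
lam, t = 2.0, 5.0
# Hypothetical jump law for illustration: standard normal increments Z_i.
samples = [compound_poisson(t, lam, lambda r: r.gauss(0.0, 1.0), rng)
           for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With this jump law, $\mathbb{E} X_t = \lambda t\, \mathbb{E} Z_1 = 0$ and $\operatorname{Var}(X_t) = \lambda t\, \mathbb{E} Z_1^2 = \lambda t$, which the sample moments should roughly reproduce.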
17.4 First Step Analysis and Hitting Probabilities

In this section we suppose that $T = \mathbb{N}_0$, $\left(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \in T}\right)$ is a filtered measure space, $X_t : \Omega \to S$ is a $\mathcal{B}_t/\mathcal{S}$ measurable function for all $t \in T$, $Q : S \times \mathcal{S} \to [0,1]$ is a Markov transition kernel, and for each $x \in S$ there exists a probability, $P_x$, on $(\Omega, \mathcal{B})$ such that $P_x(X_0 = x) = 1$ and $\{X_t\}_{t \ge 0}$ is a time homogeneous Markov process with $Q$ as its one step Markov transition kernel. To shorten notation we will write $\mathbb{E}_x$ for the expectation relative to the measure $P_x$.

Definition 17.29 (Hitting times). For $B \in \mathcal{S}$, let
\[
T_B(X) := \min\{n \ge 0 : X_n \in B\}
\]
with the convention that $\min \emptyset = \infty$. We call $T_B(X) = T_B(X_0, X_1, \dots)$ the first hitting time of $B$ by $X = \{X_n\}_n$.
Notation 17.30. For $A \in \mathcal{S}$, let $Q_A : A \times \mathcal{S}_A \to [0,1]$ be the restriction of $Q$ to $A$, so that $Q_A(x, C) := Q(x, C)$ for all $x \in A$ and $C \in \mathcal{S}_A$. As with probability kernels we may identify $Q_A$ with an operator from $(\mathcal{S}_A)_b$ to itself via,
\[
(Q_A f)(x) = \int_A Q(x, dy)\, f(y) \text{ for all } x \in A \text{ and } f \in (\mathcal{S}_A)_b.
\]
Theorem 17.31. Let $n$ denote a non-negative integer. If $h : B \to \mathbb{R}$ is measurable and either bounded or non-negative, then, for $x \in A$ and $n \ge 1$,
\[
\mathbb{E}_x[h(X_n) : T_B = n] = \left(Q_A^{n-1} Q[1_B h]\right)(x)
\]
and
\[
\mathbb{E}_x[h(X_{T_B}) : T_B < \infty] = \left(\sum_{n=0}^{\infty} Q_A^n Q[1_B h]\right)(x). \tag{17.32}
\]
If $g : A \to [0, \infty]$ is a measurable function, then for all $x \in A$ and $n \in \mathbb{N}_0$,
\[
\mathbb{E}_x\left[g(X_n) 1_{n < T_B}\right] = (Q_A^n g)(x).
\]
In particular we have
\[
\mathbb{E}_x\left[\sum_{n < T_B} g(X_n)\right] = \sum_{n=0}^{\infty} (Q_A^n g)(x) =: u(x), \tag{17.33}
\]
where by convention, $\sum_{n < T_B} g(X_n) = 0$ when $T_B = 0$.
Proof. Let $x \in A$. In computing each of these quantities we will use;
\begin{align*}
\{T_B > n\} &= \{X_i \in A \text{ for } 0 \le i \le n\} \text{ and} \\
\{T_B = n\} &= \{X_i \in A \text{ for } 0 \le i \le n-1\} \cap \{X_n \in B\}.
\end{align*}
From the second identity above it follows that (with $x_0 := x$)
\begin{align*}
\mathbb{E}_x[h(X_n) : T_B = n] &= \mathbb{E}_x\left[h(X_n) : (X_1, \dots, X_{n-1}) \in A^{n-1},\ X_n \in B\right] \\
&= \int_{A^{n-1} \times B} \prod_{j=1}^{n} Q(x_{j-1}, dx_j)\, h(x_n) = \left(Q_A^{n-1} Q[1_B h]\right)(x)
\end{align*}
and therefore
\[
\mathbb{E}_x[h(X_{T_B}) : T_B < \infty] = \sum_{n=1}^{\infty} \mathbb{E}_x[h(X_n) : T_B = n] = \sum_{n=1}^{\infty} Q_A^{n-1} Q[1_B h] = \sum_{n=0}^{\infty} Q_A^n Q[1_B h].
\]
Similarly,
\[
\mathbb{E}_x\left[g(X_n) 1_{n < T_B}\right] = \int_{A^n} Q(x, dx_1)\, Q(x_1, dx_2) \dots Q(x_{n-1}, dx_n)\, g(x_n) = (Q_A^n g)(x)
\]
and therefore,
\[
\mathbb{E}_x\left[\sum_{n=0}^{\infty} g(X_n) 1_{n < T_B}\right] = \sum_{n=0}^{\infty} \mathbb{E}_x\left[g(X_n) 1_{n < T_B}\right] = \sum_{n=0}^{\infty} (Q_A^n g)(x).
\]
In practice it is not so easy to sum the series in Eqs. (17.32) and (17.33). Thus we would like to have another way to compute these quantities. Since $\sum_{n=0}^{\infty} Q_A^n$ is a geometric series, we expect that
\[
\sum_{n=0}^{\infty} Q_A^n = (I - Q_A)^{-1},
\]
which is basically correct, at least when $(I - Q_A)$ is invertible. This suggests that if $u(x) = \mathbb{E}_x[h(X_{T_B}) : T_B < \infty]$, then (see Eq. (17.32))
\[
u = Q_A u + Q[1_B h] \text{ on } A, \tag{17.34}
\]
and if $u(x) = \mathbb{E}_x\left[\sum_{n < T_B} g(X_n)\right]$, then (see Eq. (17.33))
\[
u = Q_A u + g \text{ on } A. \tag{17.35}
\]
That these equations are valid is the content of Corollaries 17.33 and 17.34 below, which we will prove using the first step analysis in the next theorem. We will give another direct proof in Theorem 17.39 below as well.
Theorem 17.32 (First step analysis). Let us keep the assumptions in Theorem 17.14 and add the further assumption that $T = \mathbb{N}_0$. Then for all $F \in (\mathcal{S}^{\otimes \mathbb{N}})_b$ or $F : S^{\mathbb{N}_0} \to [0, \infty]$ measurable;
\[
\mathbb{E}_x[F(X_0, X_1, \dots)] = \int_S Q(x, dy)\, \mathbb{E}_y F(x, X_0, X_1, \dots). \tag{17.36}
\]
This equation can be iterated to show more generally that
\[
\mathbb{E}_x[F(X_0, X_1, \dots)] = \int_{S^n} \prod_{j=1}^{n} Q(x_{j-1}, dx_j)\, \mathbb{E}_{x_n}[F(x_0, x_1, \dots, x_{n-1}, X_0, X_1, \dots)] \tag{17.37}
\]
where $x_0 := x$.
Proof. Since $X_0(\omega) = x$ for $P_x$-a.e. $\omega$, we have $F(X_0, X_1, \dots) = F(x, X_1, X_2, \dots)$ a.s. Therefore by Theorem 17.14 we know that
\[
\mathbb{E}_x[F(X_0, X_1, \dots) \mid \mathcal{B}_1] = \mathbb{E}_x[F(x, X_1, X_2, \dots) \mid \mathcal{B}_1] = \mathbb{E}_{X_1} F(x, X_0, X_1, \dots).
\]
Taking expectations of this equation shows,
\[
\mathbb{E}_x[F(X_0, X_1, \dots)] = \mathbb{E}_x\left[\mathbb{E}_{X_1} F(x, X_0, X_1, \dots)\right] = \int_S Q(x, dy)\, \mathbb{E}_y F(x, X_0, X_1, \dots).
\]
Corollary 17.33. Suppose that $B \in \mathcal{S}$, $A := B^c \in \mathcal{S}$, $h : B \to \mathbb{R}$ is a measurable function which is either bounded or non-negative, and
\[
u(x) := \mathbb{E}_x[h(X_{T_B}) : T_B < \infty] \text{ for } x \in A.
\]
Then $u : A \to \mathbb{R}$ satisfies Eq. (17.34), i.e. $u = Q_A u + Q[1_B h]$ on $A$, or in more detail,
\[
u(x) = \int_A Q(x, dy)\, u(y) + \int_B Q(x, dy)\, h(y) \text{ for all } x \in A.
\]
In particular, when $h \equiv 1$, $u(x) = P_x(T_B < \infty)$ is a solution to the equation,
\[
u = Q_A u + Q 1_B \text{ on } A. \tag{17.38}
\]
Proof. To shorten the notation we will use the convention that $h(X_{T_B}) = 0$ if $T_B = \infty$, so that we may simply write $u(x) := \mathbb{E}_x[h(X_{T_B})]$. Let
\[
F(X_0, X_1, \dots) = h\left(X_{T_B(X)}\right) = h\left(X_{T_B(X)}\right) 1_{T_B(X) < \infty},
\]
then for $x \in A$ we have $F(x, X_0, X_1, \dots) = F(X_0, X_1, \dots)$. Therefore by the first step analysis (Theorem 17.32) we learn
\begin{align*}
u(x) &= \mathbb{E}_x h\left(X_{T_B(X)}\right) = \mathbb{E}_x F(x, X_1, \dots) = \int_S Q(x, dy)\, \mathbb{E}_y F(x, X_0, X_1, \dots) \\
&= \int_S Q(x, dy)\, \mathbb{E}_y F(X_0, X_1, \dots) = \int_S Q(x, dy)\, \mathbb{E}_y\left[h\left(X_{T_B(X)}\right)\right] \\
&= \int_A Q(x, dy)\, \mathbb{E}_y\left[h\left(X_{T_B(X)}\right)\right] + \int_B Q(x, dy)\, h(y) \\
&= \int_A Q(x, dy)\, u(y) + \int_B Q(x, dy)\, h(y) = (Q_A u)(x) + (Q[1_B h])(x).
\end{align*}
Corollary 17.34. Suppose that B ∈ 𝒮, A := B^c ∈ 𝒮, and g : A → [0, ∞] is a measurable function. Further let u(x) := E_x[Σ_{n<T_B} g(X_n)]. Then u satisfies Eq. (17.35), i.e. u = Q_A u + g on A, or in more detail,

  u(x) = ∫_A Q(x, dy) u(y) + g(x) for all x ∈ A.

In particular, if we take g ≡ 1 in this equation we learn that

  E_x T_B = ∫_A Q(x, dy) E_y T_B + 1 for all x ∈ A.
Proof. Let F(X_0, X_1, ...) = Σ_{n<T_B(X_0,X_1,...)} g(X_n) be the sum of the values of g along the chain before its first exit from A, i.e. entrance into B. With this interpretation in mind, if x ∈ A, it is easy to see that

  F(x, X_0, X_1, ...) = { g(x) if X_0 ∈ B;  g(x) + F(X_0, X_1, ...) if X_0 ∈ A }
                      = g(x) + 1_{X_0 ∈ A} F(X_0, X_1, ...).

Therefore by the first step analysis (Theorem 17.32) it follows that

  u(x) = E_x[F(X_0, X_1, ...)] = ∫_S Q(x, dy) E_y[F(x, X_0, X_1, ...)]
       = ∫_S Q(x, dy) E_y[g(x) + 1_{X_0 ∈ A} F(X_0, X_1, ...)]
       = g(x) + ∫_A Q(x, dy) E_y[F(X_0, X_1, ...)]
       = g(x) + ∫_A Q(x, dy) u(y).
The problem with Corollaries 17.33 and 17.34 is that the solutions to Eqs.
(17.34) and (17.35) may not be unique as we will see in the next examples.
Theorem 17.39 below will explain when these ambiguities may occur and how
to deal with them when they do.
Example 17.35 (Biased random walks I). Let p ∈ (1/2, 1) and consider the biased random walk {S_n}_{n≥0} on S = ℤ, where S_n = X_0 + X_1 + ··· + X_n, {X_i}_{i=1}^∞ are i.i.d. with P(X_i = 1) = p and P(X_i = −1) = q := 1 − p, and X_0 = x for some x ∈ ℤ. Let B := {0} and u(x) := P_x(T_B < ∞). Clearly u(0) = 1 and by the first step analysis,

  u(x) = p u(x + 1) + q u(x − 1) for x ≠ 0. (17.39)
From Exercise 17.10 below, we know that the general solution to Eq. (17.39) is of the form

  u(x) = a λ_+^x + b λ_−^x,

where λ_± are the roots of the characteristic polynomial, pλ² − λ + q = 0. Since constants solve Eq. (17.39) we know that one root is 1, as is easily verified. The other root² is q/p. Thus the general solution is of the form w(x) = a + b(q/p)^x.

² Indeed, p(q/p)² − (q/p) + q = (q/p)[q − 1 + p] = 0.

In all cases we are going to choose a and b so that 1 = u(0) = w(0) (i.e. a + b = 1), so that w(x) = a + (1 − a)(q/p)^x. For x > 0 we choose a = a_+ so that w_+(x) := a_+ + (1 − a_+)(q/p)^x satisfies w_+(1) = u(1), and for x < 0 we choose a = a_− so that w_−(x) := a_− + (1 − a_−)(q/p)^x satisfies w_−(−1) = u(−1). With these choices we will have u(x) = w_+(x) for x ≥ 0 and u(x) = w_−(x) for x ≤ 0; see Exercise 17.10 and Remark 17.37. Observe that

  u(1) = a_+ + (1 − a_+)(q/p)  so that  a_+ = (u(1) − (q/p)) / (1 − (q/p)),

and

  u(−1) = a_− + (1 − a_−)(p/q)  so that  a_− = ((p/q) − u(−1)) / ((p/q) − 1).

Case 1. x < 0: As x → −∞, we will have |u(x)| → ∞ unless a_− = 1. Thus we must take a_− = 1 and we have shown

  P_x(T_0 < ∞) = w_−(x) = 1 for all x < 0.

Case 2. x > 0: For n ∈ ℕ₀, let T_n = min{m : X_m = n} be the first time X hits n. By the MCT we have

  P_x(T_0 < ∞) = lim_{n→∞} P_x(T_0 < T_n).

So we will now try to compute u(x) = P_x(T_0 < T_n). By the first step analysis (take B = {0, n}, h(0) = 1 and h(n) = 0 in Corollary 17.33) we will still have that u(x) satisfies Eq. (17.39) for 0 < x < n, but now the boundary conditions are u(0) = 1 and u(n) = 0. Accordingly u(x) for 0 ≤ x ≤ n is still of the form w(x) = a + (1 − a)(q/p)^x, but we may now determine a = a_n using the boundary condition

  0 = u(n) = a + (1 − a)(q/p)^n = (q/p)^n + a(1 − (q/p)^n),

from which it follows that

  a_n = (q/p)^n / ((q/p)^n − 1) → 0 as n → ∞.

Thus we have shown

  P_x(T_0 < T_n) = (q/p)^n/((q/p)^n − 1) + [1 − (q/p)^n/((q/p)^n − 1)](q/p)^x
                 = ((q/p)^x − (q/p)^n)/(1 − (q/p)^n) → (q/p)^x as n → ∞,

and therefore, since T_n ↑ ∞ P_x-a.s. as n → ∞,

  P_x(T_0 < ∞) = (q/p)^x for all x > 0.
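As a quick numerical sanity check (not part of the original notes), the sketch below solves the two-point boundary value problem for u(x) = P_x(T_0 < T_n) directly as a tridiagonal linear system and compares it with the closed form just derived; the values p = 0.6 and n = 20 are arbitrary illustrative choices.

```python
# Check of Example 17.35: u(x) = P_x(T_0 < T_n) solves
#   u(x) = p*u(x+1) + q*u(x-1) for 0 < x < n,  u(0) = 1,  u(n) = 0,
# and equals ((q/p)**x - (q/p)**n) / (1 - (q/p)**n).

def hit_prob_numeric(p, n):
    """Solve the boundary value problem with the Thomas algorithm."""
    q = 1.0 - p
    m = n - 1                       # unknowns u(1), ..., u(n-1)
    sub, diag, sup = [-q] * m, [1.0] * m, [-p] * m
    rhs = [0.0] * m
    rhs[0] = q                      # from the known boundary value u(0) = 1
    for i in range(1, m):           # forward elimination
        f = sub[i] / diag[i - 1]
        diag[i] -= f * sup[i - 1]
        rhs[i] -= f * rhs[i - 1]
    u = [0.0] * m                   # back substitution
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - sup[i] * u[i + 1]) / diag[i]
    return [1.0] + u + [0.0]        # prepend u(0) = 1, append u(n) = 0

def hit_prob_closed(p, n, x):
    r = (1.0 - p) / p
    return (r ** x - r ** n) / (1.0 - r ** n)

p, n = 0.6, 20
u = hit_prob_numeric(p, n)
assert all(abs(u[x] - hit_prob_closed(p, n, x)) < 1e-9 for x in range(n + 1))
# and P_x(T_0 < T_n) -> (q/p)**x as n grows, e.g. at x = 3:
assert abs(hit_prob_closed(p, 200, 3) - (0.4 / 0.6) ** 3) < 1e-12
```

The same solver applies to any birth-death absorption problem of this type; only the boundary data changes.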
Example 17.36 (Biased random walks II). Continue the notation in Example 17.35. Let us now try to compute E_x T_0. Since P_x(T_0 = ∞) > 0 for x > 0, we already know that E_x T_0 = ∞ for all x > 0. Nevertheless we will deduce this fact again here.

Letting u(x) = E_x T_0, it follows by the first step analysis that, for x ≠ 0,

  u(x) = p[1 + u(x + 1)] + q[1 + u(x − 1)]
       = p u(x + 1) + q u(x − 1) + 1 (17.40)

with u(0) = 0. Notice that u(x) ≡ ∞ is a solution to this equation, while if u(a) < ∞ for some a ≠ 0, then Eq. (17.40) implies that u(x) < ∞ for all x ≠ 0 with the same sign as a.

A particular solution to this equation may be found by trying u(x) = αx, to learn

  αx = pα(x + 1) + qα(x − 1) + 1 = αx + α(p − q) + 1,

which is valid for all x provided α = (q − p)^{−1}. The general finite solution to Eq. (17.40) is therefore

  u(x) = (q − p)^{−1} x + a + b(q/p)^x. (17.41)

Using the boundary condition u(0) = 0 allows us to conclude that a + b = 0 and therefore

  u(x) = u_a(x) = (q − p)^{−1} x + a[1 − (q/p)^x]. (17.42)

Notice that u_a(x) → −∞ as x → +∞ no matter how a is chosen, and therefore we must conclude that the desired solution to Eq. (17.40) is u(x) = ∞ for x > 0, as we already mentioned.
The question now is: for x < 0, is it again the case that u(x) = ∞, or is u(x) = u_a(x) for some a ∈ ℝ? Since lim_{x→−∞} u_a(x) = −∞ unless a ≤ 0, we may restrict our attention to a ≤ 0. To work out which a ≤ 0 is correct, observe by MCT that

  E_x T_0 = lim_{n→−∞} E_x[T_n ∧ T_0] = lim_{n→−∞} E_x[T_{{n,0}}].

So let n ∈ ℤ with n < 0 be fixed for the moment. By item 8. of Theorem 17.39 we may conclude that u(x) := E_x[T_{{n,0}}] < ∞ for all n ≤ x ≤ 0. Then by the first step analysis, u(x) satisfies Eq. (17.40) for n < x < 0 and has boundary conditions u(n) = 0 = u(0). Using the boundary condition u(n) = 0 to determine a = a_n in Eq. (17.42) implies

  0 = u_{a_n}(n) = (q − p)^{−1} n + a_n[1 − (q/p)^n],

so that

  a_n = n / ([1 − (q/p)^n](p − q)) → 0 as n → −∞.

Thus we conclude that

  E_x T_0 = lim_{n→−∞} E_x[T_n ∧ T_0] = lim_{n→−∞} u_{a_n}(x) = x/(q − p) = |x|/(p − q) for x < 0.
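The limiting identity E_x T_0 = |x|/(p − q) for x < 0 can also be checked numerically; the sketch below (with the arbitrary illustrative values p = 0.6 and n = −200) solves the finite system for E_x[T_{{n,0}}] and confirms that it is already very close to |x|/(p − q).

```python
# Check of Example 17.36: for x < 0, E_x[T_{n,0}] -> |x|/(p - q) as n -> -oo.
# We solve u(x) = p*u(x+1) + q*u(x-1) + 1 on n < x < 0 with u(n) = u(0) = 0.

def expected_exit_time(p, n):
    """Return {x: E_x[T_{n,0}]} for n < x < 0 via the tridiagonal system."""
    q = 1.0 - p
    m = -n - 1                      # unknowns u(n+1), ..., u(-1)
    sub, diag, sup = [-q] * m, [1.0] * m, [-p] * m
    rhs = [1.0] * m                 # the "+1" toll paid each step
    for i in range(1, m):           # Thomas algorithm: forward elimination
        f = sub[i] / diag[i - 1]
        diag[i] -= f * sup[i - 1]
        rhs[i] -= f * rhs[i - 1]
    u = [0.0] * m                   # back substitution
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - sup[i] * u[i + 1]) / diag[i]
    return {n + 1 + i: u[i] for i in range(m)}

p, q = 0.6, 0.4
u = expected_exit_time(p, -200)
for x in (-1, -5, -10):
    assert abs(u[x] - abs(x) / (p - q)) < 1e-6
```

For n this far from 0, the correction term a_n[1 − (q/p)^x] is astronomically small, which is why the agreement is essentially exact.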
Remark 17.37 (More on the boundary conditions). If we were to use Corollary 17.34 directly to derive Eq. (17.40) in the case that u(x) := E_x[T_{{n,0}}] < ∞ for all n ≤ x ≤ 0, we would find, for x ≠ 0, that

  u(x) = Σ_{y ∉ {n,0}} q(x, y) u(y) + 1,

which implies that u(x) satisfies Eq. (17.40) for n < x < 0 provided u(n) and u(0) are taken to be equal to zero. Let us again choose a and b so that

  w(x) := (q − p)^{−1} x + a + b(q/p)^x

satisfies w(0) = 0 and w(−1) = u(−1). Then both w and u satisfy Eq. (17.40) for n < x ≤ 0 and agree at 0 and −1, and therefore are equal³ for n ≤ x ≤ 0; in particular 0 = u(n) = w(n). Thus the correct boundary conditions on w in order for w = u are w(0) = w(n) = 0, as we have used above.

³ Observe from Eq. (17.40) that, for x ≠ 0,

  u(x − 1) = q^{−1}[u(x) − p u(x + 1) − 1].

From this equation it follows easily that u(x) for x ≤ 0 is determined by its values at x = 0 and x = −1.
Definition 17.38. Suppose (A, 𝒜) is a measurable space. A sub-probability kernel on (A, 𝒜) is a function ρ : A × 𝒜 → [0, 1] such that ρ(·, C) is 𝒜/B_ℝ-measurable for all C ∈ 𝒜 and ρ(x, ·) : 𝒜 → [0, 1] is a measure for all x ∈ A.

As with probability kernels, we will identify ρ with the linear map ρ : 𝒜_b → 𝒜_b given by

  (ρf)(x) = ρ(x, f) = ∫_A f(y) ρ(x, dy).

Of course we have in mind that 𝒜 = 𝒮_A and ρ = Q_A. In the following let ‖g‖_∞ := sup_{x∈A} |g(x)| for all g ∈ 𝒜_b.
Theorem 17.39. Let ρ be a sub-probability kernel on a measurable space (A, 𝒜) and define u_n(x) := (ρ^n 1)(x) for all x ∈ A and n ∈ ℕ₀. Then:

1. {u_n} is a decreasing sequence, so that u := lim_{n→∞} u_n exists and is in 𝒜_b. (When ρ = Q_A, u_n(x) = P_x(T_B > n) ↓ u(x) = P_x(T_B = ∞) as n → ∞.)
2. The function u satisfies ρu = u.
3. If w ∈ 𝒜_b and w = ρw, then |w| ≤ ‖w‖_∞ u. In particular, the equation w = ρw has a non-zero solution w ∈ 𝒜_b iff u ≢ 0.
4. If u ≡ 0 and g ∈ 𝒜_b, then there is at most one w ∈ 𝒜_b such that w = ρw + g.
5. Let

  U := Σ_{n=0}^∞ u_n = Σ_{n=0}^∞ ρ^n 1 : A → [0, ∞] (17.43)

and suppose that U(x) < ∞ for all x ∈ A. Then for each g ∈ 𝒮_b,

  w = Σ_{n=0}^∞ ρ^n g (17.44)

is absolutely convergent,

  |w| ≤ ‖g‖_∞ U, (17.45)

ρ(x, |w|) < ∞ for all x ∈ A, and w solves w = ρw + g. Moreover, if v also solves v = ρv + g and |v| ≤ CU for some C < ∞, then v = w.

Observe that when ρ = Q_A,

  U(x) = Σ_{n=0}^∞ P_x(T_B > n) = Σ_{n=0}^∞ E_x[1_{T_B > n}] = E_x[Σ_{n=0}^∞ 1_{T_B > n}] = E_x[T_B].

6. If g : A → [0, ∞] is any measurable function, then

  w := Σ_{n=0}^∞ ρ^n g : A → [0, ∞]

is a solution to w = ρw + g. (It may be that w ≡ ∞, though!) Moreover, if v : A → [0, ∞] satisfies v = ρv + g, then w ≤ v. Thus w is the minimal non-negative solution to v = ρv + g.
7. If there exists α < 1 such that u ≤ α on A, then u ≡ 0. (When ρ = Q_A, this states that if P_x(T_B = ∞) ≤ α for all x ∈ A, then P_x(T_B = ∞) = 0 for all x ∈ A.)
8. If there exist α < 1 and n ∈ ℕ such that u_n = ρ^n 1 ≤ α on A, then there exists C < ∞ such that

  u_k(x) = (ρ^k 1)(x) ≤ C β^k for all x ∈ A and k ∈ ℕ₀,

where β := α^{1/n} < 1. In particular, U ≤ C(1 − β)^{−1} and u ≡ 0 under this assumption. (When ρ = Q_A this assertion states: if P_x(T_B > n) ≤ α for all x ∈ A, then P_x(T_B > k) ≤ Cβ^k for all k ∈ ℕ₀ and E_x T_B ≤ C(1 − β)^{−1}.)
Proof. We will prove each item in turn.

1. First observe that u_1(x) = ρ(x, A) ≤ 1 = u_0(x) and therefore

  u_{n+1} = ρ^{n+1} 1 = ρ^n u_1 ≤ ρ^n 1 = u_n.

We now let u := lim_{n→∞} u_n, so that u : A → [0, 1].
2. Using DCT we may let n → ∞ in the identity ρu_n = u_{n+1} in order to show ρu = u.
3. If w ∈ 𝒜_b with w = ρw, then

  |w| = |ρ^n w| ≤ ρ^n |w| ≤ ‖w‖_∞ ρ^n 1 = ‖w‖_∞ u_n.

Letting n → ∞ shows that |w| ≤ ‖w‖_∞ u.
4. If w_i ∈ 𝒜_b solve w_i = ρw_i + g for i = 1, 2, then w := w_2 − w_1 satisfies w = ρw and therefore |w| ≤ ‖w‖_∞ u = 0.
5. Let U := Σ_{n=0}^∞ u_n = Σ_{n=0}^∞ ρ^n 1 : A → [0, ∞] and suppose that U(x) < ∞ for all x ∈ A. Then u_n(x) → 0 as n → ∞, and so bounded solutions to ρu = u are necessarily zero. Moreover we have, for all k ∈ ℕ₀, that

  ρ^k U = Σ_{n=0}^∞ ρ^k u_n = Σ_{n=0}^∞ u_{n+k} = Σ_{n=k}^∞ u_n ≤ U. (17.46)

Since the tails of convergent series tend to zero, it follows that lim_{k→∞} ρ^k U = 0.

Now if g ∈ 𝒮_b, we have

  Σ_{n=0}^∞ |ρ^n g| ≤ Σ_{n=0}^∞ ρ^n |g| ≤ Σ_{n=0}^∞ ρ^n ‖g‖_∞ = ‖g‖_∞ U < ∞, (17.47)

and therefore Σ_{n=0}^∞ ρ^n g is absolutely convergent. Making use of Eqs. (17.46) and (17.47) we see that

  Σ_{n=1}^∞ |ρ^n g| ≤ ‖g‖_∞ ρU ≤ ‖g‖_∞ U < ∞,

and therefore (using DCT),

  w = Σ_{n=0}^∞ ρ^n g = g + Σ_{n=1}^∞ ρ^n g = g + ρ Σ_{n=1}^∞ ρ^{n−1} g = g + ρw,

i.e. w solves w = ρw + g.

If v : A → ℝ is measurable such that |v| ≤ CU and v = ρv + g, then y := w − v solves y = ρy with |y| ≤ (C + ‖g‖_∞)U. It follows that

  |y| = |ρ^n y| ≤ (C + ‖g‖_∞) ρ^n U → 0 as n → ∞,

i.e. 0 = y = w − v.
6. If g ≥ 0, we may always define w by Eq. (17.44), allowing for w(x) = ∞ for some or even all x ∈ A. As in the proof of the previous item (with DCT being replaced by MCT), it follows that w = ρw + g. If v ≥ 0 also solves v = ρv + g, then

  v = g + ρ(g + ρv) = g + ρg + ρ²v,

and more generally by induction we have

  v = Σ_{k=0}^n ρ^k g + ρ^{n+1} v ≥ Σ_{k=0}^n ρ^k g.

Letting n → ∞ in this last equation shows that v ≥ w.
7. If u ≤ α < 1 on A, then by item 3. with w = u we find that

  u ≤ ‖u‖_∞ u ≤ αu,

which clearly implies u ≡ 0.
8. If u_n ≤ α < 1, then for any m ∈ ℕ we have

  u_{n+m} = ρ^m u_n ≤ α ρ^m 1 = α u_m.

Taking m = kn in this inequality shows u_{(k+1)n} ≤ α u_{kn}. Thus a simple induction argument shows u_{kn} ≤ α^k for all k ∈ ℕ₀. For general l ∈ ℕ₀ we write l = kn + r with 0 ≤ r < n. We then have

  u_l = u_{kn+r} ≤ u_{kn} ≤ α^k = α^{(l−r)/n} ≤ C α^{l/n}, where C := α^{−(n−1)/n}.
Corollary 17.40. If h : B → [0, ∞] is measurable, then u(x) := E_x[h(X_{T_B}) : T_B < ∞] is the unique minimal non-negative solution to Eq. (17.34), while if g : A → [0, ∞] is measurable, then u(x) = E_x[Σ_{n<T_B} g(X_n)] is the unique minimal non-negative solution to Eq. (17.35).

Exercise 17.9. Keeping the notation of Examples 17.35 and 17.36, use Corollary 17.40 to show again that P_x(T_B < ∞) = (q/p)^x for all x > 0 and E_x T_0 = x/(q − p) for x < 0. You should do so without making use of the extraneous hitting times T_n for n ≠ 0.
Corollary 17.41. If P_x(T_B = ∞) = 0 for all x ∈ A and h : B → ℝ is a bounded measurable function, then u(x) := E_x[h(X_{T_B})] is the unique solution to Eq. (17.34).
Corollary 17.42. Suppose now that A = B^c is a finite subset of S and there exists α ∈ (0, 1) such that P_x(T_B = ∞) ≤ α for all x ∈ A. Then there exist C < ∞ and β ∈ (0, 1) such that P_x(T_B > n) ≤ Cβ^n.

Proof. We know that

  lim_{n→∞} P_x(T_B > n) = P_x(T_B = ∞) ≤ α for all x ∈ A.

Therefore if β ∈ (α, 1), then using the fact that A is a finite set, there exists an n sufficiently large such that P_x(T_B > n) ≤ β for all x ∈ A. The result now follows from item 8. of Theorem 17.39.
17.5 Finite state space chains
In this subsection I would like to write out the above theorems in the special case where S is a finite set. In this case we will let q(x, y) := Q(x, {y}), so that

  (Qf)(x) = Σ_{y∈S} q(x, y) f(y).

Thus if we view f : S → ℝ as a column vector and Q as the matrix with q(x, y) in the x-th row and y-th column, then Qf is simply matrix multiplication. As above, we now suppose that S is partitioned into two nonempty subsets B and A = B^c. We further assume that P_x(T_B < ∞) > 0 for all x ∈ A, i.e. it is possible with positive probability for the chain {X_n}_{n=0}^∞ to visit B when started from any point in A. Because of Corollary 17.42, we know that in fact there exist C < ∞ and β ∈ (0, 1) such that P_x(T_B > n) ≤ Cβ^n for all n ∈ ℕ₀. In particular, it follows that E_x T_B < ∞ and P_x(T_B < ∞) = 1 for all x ∈ A.

If we let Q_A = Q_{A,A} be the matrix with entries (q(x, y))_{x,y∈A} and I be the corresponding identity matrix, then (I − Q_A)^{−1} exists according to Theorem 17.39. Let us further let R = Q_{A,B} be the matrix with entries (q(x, y))_{x∈A, y∈B}. Thus Q decomposes, with rows and columns indexed by A and then B, as

  Q = [ Q_A  R
         ∗   ∗ ].

To summarize: Q_A is Q with the rows and columns indexed by B deleted, and R is the Q matrix with the columns indexed by A deleted and the rows indexed by B deleted. Given a function h : B → ℝ, let (Rh)(x) = Σ_{y∈B} q(x, y) h(y) for all x ∈ A, which again may be thought of as matrix multiplication.
Theorem 17.43. Let us continue to use the notation and assumptions as described above. If h : B → ℝ and g : A → ℝ are given functions, then for all x ∈ A we have

  E_x[h(X_{T_B})] = ((I − Q_A)^{−1} Rh)(x) and
  E_x[Σ_{n<T_B} g(X_n)] = ((I − Q_A)^{−1} g)(x).
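Theorem 17.43 is easy to test numerically. The sketch below is an illustrative check only (the four-state unbiased walk on {0, 1, 2, 3} absorbed at B = {0, 3} with A = {1, 2} is my own toy choice, not an example from the text): there Q_A = [[0, 1/2], [1/2, 0]], the walk exits from 1 to 0 and from 2 to 3 with probability 1/2 each, and both claimed formulas can be verified against the known answers E_x T_B = x(3 − x) and P_x(hit 3 first) = x/3.

```python
# Minimal illustration of Theorem 17.43 on a two-point interior A = {1, 2}.

def solve(mat, rhs):
    """Solve mat @ x = rhs by Gauss-Jordan elimination (small dense systems)."""
    n = len(mat)
    a = [row[:] + [r] for row, r in zip(mat, rhs)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))  # partial pivoting
        a[i], a[p] = a[p], a[i]
        for r in range(n):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [x - f * y for x, y in zip(a[r], a[i])]
    return [a[i][n] / a[i][i] for i in range(n)]

I_minus_QA = [[1.0, -0.5], [-0.5, 1.0]]

# E_x[T_B] = (I - Q_A)^{-1} 1   (take g = 1 in the theorem)
exp_T = solve(I_minus_QA, [1.0, 1.0])
assert abs(exp_T[0] - 2.0) < 1e-12 and abs(exp_T[1] - 2.0) < 1e-12  # x*(3-x)

# P_x(X_{T_B} = 3) = (I - Q_A)^{-1} R delta_3, and R delta_3 = (0, 1/2)
hit3 = solve(I_minus_QA, [0.0, 0.5])
assert abs(hit3[0] - 1.0 / 3.0) < 1e-12 and abs(hit3[1] - 2.0 / 3.0) < 1e-12
```

Solving the linear system directly, rather than forming the inverse, is the standard numerical practice and matches the theorem statement term for term.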
Remark 17.44. Here is a story to go along with the above scenario. Suppose that g(x) is the toll you have to pay for visiting a site x ∈ A, while h(y) is the amount of prize money you get when landing on a point y ∈ B. Then E_x[Σ_{0≤n<T} g(X_n)] is the expected toll you have to pay before your first exit from A, while E_x[h(X_T)] is your expected winnings upon hitting B.
Here are some typical choices for h and g.

1. If y ∈ B and h = δ_y, then

  P_x(X_{T_B} = y) = ((I − Q_A)^{−1} R δ_y)(x) = ((I − Q_A)^{−1} R)_{x,y}.

2. If y ∈ A and g = δ_y, then

  Σ_{n<T_B} g(X_n) = Σ_{n<T_B} δ_y(X_n) = (# of visits to y before hitting B),

and hence

  E_x(# of visits to y before hitting B) = ((I − Q_A)^{−1} δ_y)(x) = ((I − Q_A)^{−1})_{x,y}.
3. If g ≡ 1, i.e. g(y) = 1 for all y ∈ A, then Σ_{n<T_B} g(X_n) = T_B and we find

  E_x T_B = ((I − Q_A)^{−1} 1)(x) = Σ_{y∈A} ((I − Q_A)^{−1})_{x,y},

where E_x T_B is the expected hitting time of B when starting from x.
Example 17.45. Let us continue the rat in the maze Exercise 17.5 and now suppose that room 3 contains food while room 7 contains a mouse trap:

  ( 1  2  3 (food)
    4  5  6
    7 (trap) ).

We would like to compute the probability that the rat reaches the food before he is trapped. To answer this question we let A = {1, 2, 4, 5, 6}, B = {3, 7}, and T := T_B be the first hitting time of B. Then deleting the 3 and 7 rows of q in Eq. (17.23) leaves the matrix (columns indexed by 1, ..., 7):

  1 [  0   1/2   0   1/2   0    0    0  ]
  2 [ 1/3   0   1/3   0   1/3   0    0  ]
  4 [ 1/3   0    0    0   1/3   0   1/3 ]
  5 [  0   1/3   0   1/3   0   1/3   0  ]
  6 [  0    0   1/2   0   1/2   0    0  ]

Deleting the 3 and 7 columns from this matrix gives (rows and columns indexed by 1, 2, 4, 5, 6)

  Q_A = [  0   1/2  1/2   0    0
          1/3   0    0   1/3   0
          1/3   0    0   1/3   0
           0   1/3  1/3   0   1/3
           0    0    0   1/2   0  ],

and deleting the 1, 2, 4, 5, and 6 columns gives (columns indexed by 3, 7)

  R = Q_{A,B} = [  0    0
                  1/3   0
                   0   1/3
                   0    0
                  1/2   0  ].
I Q
A
=
_

_
1
1
2

1
2
0 0

1
3
1 0
1
3
0

1
3
0 1
1
3
0
0
1
3

1
3
1
1
3
0 0 0
1
2
1
_

_
,
and using a computer algebra package we nd
(I Q
A
)
1
=
1 2 4 5 6
_

_
11
6
5
4
5
4
1
1
3
5
6
7
4
3
4
1
1
3
5
6
3
4
7
4
1
1
3
2
3
1 1 2
2
3
1
3
1
2
1
2
1
4
3
_

_
1
2
4
5
6
.
In particular we may conclude that

  (E_1 T, E_2 T, E_4 T, E_5 T, E_6 T)^tr = (I − Q_A)^{−1} 1 = (17/3, 14/3, 14/3, 16/3, 11/3)^tr,

and

  [ P_1(X_T = 3)  P_1(X_T = 7) ]                        [ 7/12  5/12 ]
  [ P_2(X_T = 3)  P_2(X_T = 7) ]                        [ 3/4   1/4  ]
  [ P_4(X_T = 3)  P_4(X_T = 7) ]  = (I − Q_A)^{−1} R =  [ 5/12  7/12 ]
  [ P_5(X_T = 3)  P_5(X_T = 7) ]                        [ 2/3   1/3  ]
  [ P_6(X_T = 3)  P_6(X_T = 7) ]                        [ 5/6   1/6  ].
Since the event of hitting 3 before 7 is the same as the event {X_T = 3}, the desired hitting probabilities are

  (P_1(X_T = 3), P_2(X_T = 3), P_4(X_T = 3), P_5(X_T = 3), P_6(X_T = 3))^tr = (7/12, 3/4, 5/12, 2/3, 5/6)^tr.
We can also derive these hitting probabilities from scratch using the first step analysis. In order to do this let

  h_i = P_i(X_T = 3) = P_i(X_n hits 3 (food) before 7 (trapped)).
By the first step analysis we will have

  h_i = Σ_j P_i(X_T = 3 | X_1 = j) P_i(X_1 = j)
      = Σ_j q(i, j) P_i(X_T = 3 | X_1 = j)
      = Σ_j q(i, j) P_j(X_T = 3)
      = Σ_j q(i, j) h_j,

where h_3 = 1 and h_7 = 0. Looking at the jump diagram (Figure 17.3) we easily
[Figure 17.3 (jump diagram): from 1, jumps to 2 and 4 with probability 1/2 each; from 2, jumps to 1, 3, 5 with probability 1/3 each; from 4, jumps to 1, 5, 7 with probability 1/3 each; from 5, jumps to 2, 4, 6 with probability 1/3 each; from 6, jumps to 3 and 5 with probability 1/2 each; 7 (the trap) holds with probability 1.]
Fig. 17.3. The jump diagram for our proverbial rat in the maze.

find

  h_1 = (1/2)(h_2 + h_4),
  h_2 = (1/3)(h_1 + h_3 + h_5) = (1/3)(h_1 + 1 + h_5),
  h_4 = (1/3)(h_1 + h_5 + h_7) = (1/3)(h_1 + h_5),
  h_5 = (1/3)(h_2 + h_4 + h_6),
  h_6 = (1/2)(h_3 + h_5) = (1/2)(1 + h_5),

and the solutions to these equations are (as seen before) given by

  h_1 = 7/12, h_2 = 3/4, h_4 = 5/12, h_5 = 2/3, h_6 = 5/6. (17.48)
Similarly, if

  k_i := P_i(X_T = 7) = P_i(X_n is trapped before dinner),

we need only use the above equations with h replaced by k, now taking k_3 = 0 and k_7 = 1, to find

  k_1 = (1/2)(k_2 + k_4),
  k_2 = (1/3)(k_1 + k_5),
  k_4 = (1/3)(k_1 + k_5 + 1),
  k_5 = (1/3)(k_2 + k_4 + k_6),
  k_6 = (1/2) k_5,

and then solve to find

  k_1 = 5/12, k_2 = 1/4, k_4 = 7/12, k_5 = 1/3, k_6 = 1/6. (17.49)

Notice that the hitting probabilities in Eqs. (17.48) and (17.49) add up to 1 as they should.
17.5.1 Invariant distributions and return times

For this subsection suppose that S = {1, 2, ..., n} and Q_{ij} is a Markov matrix. To each state i ∈ S, let

  τ_i := min{n ≥ 1 : X_n = i} (17.50)

be the first passage time of the chain to site i.

Proposition 17.46. The Markov matrix Q has an invariant distribution.

Proof. If 1 := (1, 1, ..., 1)^tr, then Q1 = 1, from which it follows that

  0 = det(Q − I) = det(Q^tr − I).

Therefore there exists a non-zero row vector π such that Q^tr π^tr = π^tr, or equivalently that πQ = π. At this point we would be done if we knew that π_i ≥ 0 for all i, but we don't. So let ν_i := |π_i| and observe that
  ν_i = |π_i| = |Σ_{k=1}^n π_k Q_{ki}| ≤ Σ_{k=1}^n |π_k| Q_{ki} = Σ_{k=1}^n ν_k Q_{ki}.

We now claim that in fact ν = νQ. If this were not the case, we would have ν_i < Σ_{k=1}^n ν_k Q_{ki} for some i, and therefore

  0 < Σ_{i=1}^n ν_i < Σ_{i=1}^n Σ_{k=1}^n ν_k Q_{ki} = Σ_{k=1}^n Σ_{i=1}^n ν_k Q_{ki} = Σ_{k=1}^n ν_k,

which is a contradiction. So all that is left to do is normalize ν so that Σ_{i=1}^n ν_i = 1 and we are done.
We are now going to assume that Q is irreducible, which means that for all i ≠ j there exists n ∈ ℕ such that Q^n_{ij} > 0. Alternatively put, this implies that P_i(T_j < ∞) = P_i(τ_j < ∞) > 0 for all i ≠ j. By Corollary 17.42 we know that E_i[τ_j] = E_i T_j < ∞ for all i ≠ j, and it is not too hard to see that E_i τ_i < ∞ also holds. The fact that E_i τ_i < ∞ for all i ∈ S will come out of the proof of the next proposition as well.

Proposition 17.47. If Q is irreducible, then there is precisely one invariant distribution, π, which is given by π_i = 1/(E_i τ_i) > 0 for all i ∈ S.
Proof. We begin by using the first step analysis to write equations for E_i[τ_j] as follows:

  E_i[τ_j] = Σ_{k=1}^n E_i[τ_j | X_1 = k] Q_{ik} = Σ_{k≠j} E_i[τ_j | X_1 = k] Q_{ik} + Q_{ij} · 1
           = Σ_{k≠j} (E_k[τ_j] + 1) Q_{ik} + Q_{ij} · 1 = Σ_{k≠j} E_k[τ_j] Q_{ik} + 1,

and therefore

  E_i[τ_j] = Σ_{k≠j} Q_{ik} E_k[τ_j] + 1. (17.51)

Now suppose that π is any invariant distribution for Q; then multiplying Eq. (17.51) by π_i and summing on i shows

  Σ_{i=1}^n π_i E_i[τ_j] = Σ_{i=1}^n Σ_{k≠j} π_i Q_{ik} E_k[τ_j] + Σ_{i=1}^n π_i · 1
                         = Σ_{k≠j} π_k E_k[τ_j] + 1.

Since Σ_{k≠j} π_k E_k[τ_j] < ∞, we may cancel it from both sides of this equation in order to learn π_j E_j[τ_j] = 1.

We may use Eq. (17.51) to compute E_i[τ_j] in examples. To do this, fix j and set v_i := E_i τ_j. Then Eq. (17.51) states that v = Q^{(j)} v + 1, where Q^{(j)} denotes Q with the j-th column replaced by all zeros. Thus we have

  (E_i τ_j)_{i=1}^n = (I − Q^{(j)})^{−1} 1, (17.52)

i.e.

  (E_1 τ_j, ..., E_n τ_j)^tr = (I − Q^{(j)})^{−1} (1, ..., 1)^tr. (17.53)
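Eq. (17.53) can be checked mechanically: zero out column j of Q and solve the resulting linear system. The sketch below does this exactly, using the three-state chain Q = [0 1 0; 1/2 0 1/2; 1 0 0] (the chain that reappears in Example 17.50 below), for which (E_1 τ_3, E_2 τ_3, E_3 τ_3) = (4, 3, 5).

```python
# Exact check of Eq. (17.53): v_i = E_i[tau_j] solves (I - Q^{(j)}) v = 1,
# where Q^{(j)} is Q with its j-th column zeroed out.
from fractions import Fraction as F

Q = [[0, 1, 0], [F(1, 2), 0, F(1, 2)], [1, 0, 0]]

def expected_passage_times(Q, j):
    """Solve (I - Q^{(j)}) v = 1 over the rationals; j is a 0-based column."""
    n = len(Q)
    a = [[(1 if r == c else 0) - (Q[r][c] if c != j else 0) for c in range(n)]
         + [F(1)] for r in range(n)]          # augmented (I - Q^{(j)} | 1)
    for i in range(n):                        # Gauss-Jordan elimination
        piv = next(r for r in range(i, n) if a[r][i] != 0)
        a[i], a[piv] = a[piv], a[i]
        for r in range(n):
            if r != i:
                f = F(a[r][i]) / a[i][i]
                a[r] = [x - f * y for x, y in zip(a[r], a[i])]
    return [a[i][n] / a[i][i] for i in range(n)]

v = expected_passage_times(Q, 2)              # tau_3 (0-based index 2)
assert v == [4, 3, 5]
assert F(1) / v[2] == F(1, 5)                 # 1/E_3[tau_3] = pi_3
```

Note that the formula delivers the return time E_j τ_j as well, since zeroing the j-th column forces the chain to "stop counting" only upon re-entry into j.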
17.5.2 Some worked examples

Example 17.48. Let S = {1, 2} and Q = [0 1; 1 0], with jump diagram in Figure 17.4. In this case Q^{2n} = I while Q^{2n+1} = Q, and therefore lim_{n→∞} Q^n does not exist.

[Figure 17.4: states 1 and 2 with probability-1 jumps in each direction.]
Fig. 17.4. A non-random chain.

On the other hand, it is easy to see that the invariant distribution, π, for Q is π = [1/2 1/2] and, moreover,

  (Q + Q² + ··· + Q^N)/N → (1/2)[1 1; 1 1] = [π; π].

Let us compute

  (E_1 τ_1, E_2 τ_1)^tr = ([1 0; 0 1] − [0 1; 0 0])^{−1} (1, 1)^tr = (2, 1)^tr

and

  (E_1 τ_2, E_2 τ_2)^tr = ([1 0; 0 1] − [0 0; 1 0])^{−1} (1, 1)^tr = (1, 2)^tr,

so that indeed π_1 = 1/E_1 τ_1 and π_2 = 1/E_2 τ_2. Of course τ_1 = 2 (P_1-a.s.) and τ_2 = 2 (P_2-a.s.), so that it is obvious that E_1 τ_1 = E_2 τ_2 = 2.
[Figure 17.5: states 1 and 2, each with a probability-1 self-loop.]
Fig. 17.5. A simple non-irreducible chain.

Example 17.49. Again let S = {1, 2} and Q = [1 0; 0 1], with jump diagram in Figure 17.5. In this case the chain is not irreducible, and every π = [a b] with a + b = 1 and a, b ≥ 0 is an invariant distribution.
Example 17.50. Suppose that S = {1, 2, 3} and

  Q = [  0    1    0
        1/2   0   1/2
         1    0    0  ]

has the jump graph given by Figure 17.6. Notice that Q²_{11} > 0 and Q³_{11} > 0, so that Q is aperiodic.

[Figure 17.6: jumps 1 → 2 (probability 1), 2 → 1 and 2 → 3 (probability 1/2 each), 3 → 1 (probability 1).]
Fig. 17.6. A simple 3 state jump diagram.

We now find the invariant distribution:

  Nul(Q − I)^tr = Nul [ −1   1/2    1
                          1   −1    0
                          0   1/2  −1 ] = ℝ (2, 2, 1)^tr.

Therefore the invariant distribution is given by π = (1/5)[2 2 1]. Let us now observe that

  Q² = [ 1/2   0   1/2
         1/2  1/2   0
          0    1    0  ],   Q³ = [ 1/2  1/2   0
                                   1/4  1/2  1/4
                                   1/2   0   1/2 ],

and

  Q^{20} = [ 409/1024  205/512   205/1024
             205/512   409/1024  205/1024
             205/512   205/512    51/256  ]
         ≈ [ 0.39941  0.40039  0.20020
             0.40039  0.39941  0.20020
             0.40039  0.40039  0.19922 ].

Let us also compute E_i τ_3 via

  (E_1 τ_3, E_2 τ_3, E_3 τ_3)^tr = ( [1 0 0; 0 1 0; 0 0 1] − [0 1 0; 1/2 0 0; 1 0 0] )^{−1} (1, 1, 1)^tr = (4, 3, 5)^tr,

so that 1/E_3 τ_3 = 1/5 = π_3.
Example 17.51. The transition matrix

  Q = [ 1/4  1/2  1/4
        1/2   0   1/2
        1/3  1/3  1/3 ]

is represented by the jump diagram in Figure 17.7. This chain is aperiodic.

[Figure 17.7: from 1, jumps to 2 with probability 1/2 and to 3 with probability 1/4; from 2, jumps to 1 and 3 with probability 1/2 each; from 3, jumps to 1 and 2 with probability 1/3 each.]
Fig. 17.7. In the above diagram there are jumps from 1 to 1 with probability 1/4 and jumps from 3 to 3 with probability 1/3 which are not explicitly shown but must be inferred by conservation of probability.

We find the invariant distribution as

  Nul(Q − I)^tr = Nul [ −3/4  1/2   1/3
                          1/2  −1    1/3
                          1/4  1/2  −2/3 ] = ℝ (1, 5/6, 1)^tr = ℝ (6, 5, 6)^tr,

so that

  π = (1/17)[6 5 6] ≈ [0.35294 0.29412 0.35294].

In this case

  Q^{10} ≈ [ 0.35298  0.29404  0.35298
             0.35289  0.29423  0.35289
             0.35295  0.29410  0.35295 ].

Let us also compute

  (E_1 τ_2, E_2 τ_2, E_3 τ_2)^tr = ( I − [1/4 0 1/4; 1/2 0 1/2; 1/3 0 1/3] )^{−1} (1, 1, 1)^tr = (11/5, 17/5, 13/5)^tr,

so that 1/E_2 τ_2 = 5/17 = π_2.
Example 17.52. Consider the following Markov matrix:

  Q = [ 1/4  1/4  1/4  1/4
        1/4   0    0   3/4
        1/2  1/2   0    0
         0   1/4  3/4   0  ]

with jump diagram in Figure 17.8.

[Figure 17.8: from 1, jumps to 1, 2, 3, 4 with probability 1/4 each; from 2, jumps to 1 with probability 1/4 and to 4 with probability 3/4; from 3, jumps to 1 and 2 with probability 1/2 each; from 4, jumps to 2 with probability 1/4 and to 3 with probability 3/4.]
Fig. 17.8. The jump diagram for Q.

Since this matrix is doubly stochastic (i.e. Σ_{i=1}^4 Q_{ij} = 1 for all j as well as Σ_{j=1}^4 Q_{ij} = 1 for all i), it is easy to check that π = (1/4)[1 1 1 1]. Let us compute E_3 τ_3 as follows:

  (E_1 τ_3, E_2 τ_3, E_3 τ_3, E_4 τ_3)^tr = ( I − [1/4 1/4 0 1/4; 1/4 0 0 3/4; 1/2 1/2 0 0; 0 1/4 0 0] )^{−1} (1, 1, 1, 1)^tr
                                          = (50/17, 52/17, 4, 30/17)^tr,

so that E_3 τ_3 = 4 = 1/π_3, as it should be. Similarly,

  (E_1 τ_2, E_2 τ_2, E_3 τ_2, E_4 τ_2)^tr = ( I − [1/4 0 1/4 1/4; 1/4 0 0 3/4; 1/2 0 0 0; 0 0 3/4 0] )^{−1} (1, 1, 1, 1)^tr
                                          = (54/17, 4, 44/17, 50/17)^tr,

and again E_2 τ_2 = 4 = 1/π_2.
17.5.3 Exercises

Exercise 17.10 (2nd order recurrence relations). Let a, b, c be real numbers with a ≠ 0 ≠ c, let α, β ∈ ℤ with α < β, and suppose u(x) for x ∈ [α, β] ∩ ℤ solves the second order homogeneous recurrence relation

  a u(x + 1) + b u(x) + c u(x − 1) = 0 (17.54)

for α < x < β. Show:

1. For any λ ∈ ℂ,

  a λ^{x+1} + b λ^x + c λ^{x−1} = λ^{x−1} p(λ), (17.55)

where p(λ) = aλ² + bλ + c is the characteristic polynomial associated to Eq. (17.54).

Let λ_± = (−b ± √(b² − 4ac))/(2a) be the roots of p(λ) and suppose for the moment that b² − 4ac ≠ 0. From Eq. (17.54) it follows that for any choice of A_± ∈ ℝ, the function

  w(x) := A_+ λ_+^x + A_− λ_−^x

solves Eq. (17.54) for all x ∈ ℤ.

2. Show there is a unique choice of constants A_± ∈ ℝ such that the function u(x) is given by

  u(x) := A_+ λ_+^x + A_− λ_−^x for all α ≤ x ≤ β.

3. Now suppose that b² = 4ac and λ_0 := −b/(2a) is the double root of p(λ). Show for any choice of A_0 and A_1 in ℝ that

  w(x) := (A_0 + A_1 x) λ_0^x

solves Eq. (17.54) for all x ∈ ℤ. Hint: Differentiate Eq. (17.55) with respect to λ and then set λ = λ_0.

4. Again show that any function u solving Eq. (17.54) is of the form u(x) = (A_0 + A_1 x) λ_0^x for α ≤ x ≤ β, for some unique choice of constants A_0, A_1 ∈ ℝ.
In the next couple of exercises you are going to use first step analysis to show that a simple unbiased random walk on ℤ is null recurrent. We let {X_n}_{n=0}^∞ be the Markov chain with values in ℤ with transition probabilities given by

  P(X_{n+1} = x ± 1 | X_n = x) = 1/2 for all n ∈ ℕ₀ and x ∈ ℤ.

Further let a, b ∈ ℤ with a < 0 < b and

  T_{a,b} := min{n : X_n ∈ {a, b}} and T_b := inf{n : X_n = b}.

We know by Corollary⁴ 17.42 that E_0[T_{a,b}] < ∞, from which it follows that P(T_{a,b} < ∞) = 1 for all a < 0 < b.
Exercise 17.11. Let w_x := P_x(X_{T_{a,b}} = b) := P(X_{T_{a,b}} = b | X_0 = x).

1. Use first step analysis to show, for a < x < b, that

  w_x = (1/2)(w_{x+1} + w_{x−1}) (17.56)

provided we define w_a = 0 and w_b = 1.

2. Use the results of Exercise 17.10 to show

  P_x(X_{T_{a,b}} = b) = w_x = (x − a)/(b − a). (17.57)

⁴ Apply this corollary to the finite walk in [a, b] ∩ ℤ.

3. Let

  T_b := min{n : X_n = b} if X_n hits b, and T_b := ∞ otherwise,

be the first time X_n hits b. Explain why {X_{T_{a,b}} = b} ⊂ {T_b < ∞}, and use this along with Eq. (17.57) to conclude⁵ that P_x(T_b < ∞) = 1 for all x < b. (By symmetry this result holds true for all x ∈ ℤ.)
Exercise 17.12. The goal of this exercise is to give a second proof of the fact that P_x(T_b < ∞) = 1. Here is the outline:

1. Let w_x := P_x(T_b < ∞). Again use first step analysis to show that w_x satisfies Eq. (17.56) for all x, with w_b = 1.
2. Use Exercise 17.10 to show that there is a constant, c, such that

  w_x = c(x − b) + 1 for all x ∈ ℤ.

3. Explain why c must be zero, to again show that P_x(T_b < ∞) = 1 for all x ∈ ℤ.
Exercise 17.13. Let T = T_{a,b} and u_x := E_x T := E[T | X_0 = x].

1. Use first step analysis to show, for a < x < b, that

  u_x = (1/2)(u_{x+1} + u_{x−1}) + 1 (17.58)

with the convention that u_a = 0 = u_b.

2. Show that

  u_x = A_0 + A_1 x − x² (17.59)

solves Eq. (17.58) for any choice of constants A_0 and A_1.

3. Choose A_0 and A_1 so that u_x satisfies the boundary conditions u_a = 0 = u_b. Use this to conclude that

  E_x T_{a,b} = −ab + (b + a)x − x² = −a(b − x) + bx − x². (17.60)
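Since Eq. (17.60) factors as (x − a)(b − x), its correctness is easy to verify directly against the recursion; the sketch below does so with the arbitrary test values a = −3, b = 4.

```python
# Check that u_x = -a*b + (b + a)*x - x**2 from Eq. (17.60) really solves the
# first-step recursion u_x = (u_{x+1} + u_{x-1})/2 + 1 with u_a = u_b = 0.

a, b = -3, 4          # arbitrary test values with a < 0 < b

def u(x):
    return -a * b + (b + a) * x - x * x   # equals (x - a) * (b - x)

assert u(a) == 0 and u(b) == 0            # boundary conditions
for x in range(a + 1, b):
    assert u(x) == (u(x + 1) + u(x - 1)) / 2 + 1
assert u(0) == -a * b                     # E_0 T_{a,b} = |a| * b
```

Note how u(0) = |a|·b blows up as a → −∞, which is exactly the observation exploited in Remark 17.53 below.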
Remark 17.53. Notice that T_{a,b} ↑ T_b = inf{n : X_n = b} as a ↓ −∞, and so passing to the limit as a ↓ −∞ in Eq. (17.60) shows

  E_x T_b = ∞ for all x < b.

Combining the last couple of exercises together shows that {X_n} is null recurrent.

⁵ The fact that P_j(T_b < ∞) = 1 also follows from Example 10.55 above.
Exercise 17.14. Let T = T_b. The goal of this exercise is to give a second proof of the fact that u_x := E_x T = ∞ for all x ≠ b. Here is the outline. Let u_x := E_x T ∈ [0, ∞].

1. Note that u_b = 0 and, by a first step analysis, that u_x satisfies Eq. (17.58) for all x ≠ b, allowing for the possibility that some of the u_x may be infinite.
2. Argue, using Eq. (17.58), that if u_x < ∞ for some x < b, then u_y < ∞ for all y < b. Similarly, if u_x < ∞ for some x > b, then u_y < ∞ for all y > b.
3. If u_x < ∞ for all x > b, then u_x must be of the form in Eq. (17.59) for some A_0 and A_1 in ℝ such that u_b = 0. However, this would imply u_x = E_x T → −∞ as x → ∞, which is impossible since E_x T ≥ 0 for all x. Thus we must conclude that E_x T = u_x = ∞ for all x > b. (A similar argument works if we assume that u_x < ∞ for all x < b.)
17.6 Appendix: Kolmogorov's extension theorem II

The Kolmogorov extension Theorem 9.51 generalizes to the case where ℕ is replaced by an arbitrary index set, T. Let us set up the notation for this theorem. Let T be an arbitrary index set, {(S_t, 𝒮_t)}_{t∈T} a collection of standard Borel spaces, S := Π_{t∈T} S_t, 𝒮 := ⊗_{t∈T} 𝒮_t, and for Λ ⊂ T let

  S_Λ := Π_{t∈Λ} S_t, 𝒮_Λ := ⊗_{t∈Λ} 𝒮_t,

and let X_Λ : S → S_Λ be the projection map, X_Λ(x) := x|_Λ. If Λ ⊂ Λ′ ⊂ T, also let X_{Λ,Λ′} : S_{Λ′} → S_Λ be the projection map, X_{Λ,Λ′}(x) := x|_Λ for all x ∈ S_{Λ′}.
Theorem 17.54 (Kolmogorov). For each Λ ⊂_f T (i.e. Λ ⊂ T with #(Λ) < ∞), let μ_Λ be a probability measure on (S_Λ, 𝒮_Λ). We further suppose that {μ_Λ}_{Λ ⊂_f T} satisfy the following compatibility relations:

  μ_{Λ′} ∘ X_{Λ,Λ′}^{−1} = μ_Λ for all Λ ⊂ Λ′ ⊂_f T. (17.61)

Then there exists a unique probability measure, P, on (S, 𝒮) such that P ∘ X_Λ^{−1} = μ_Λ for all Λ ⊂_f T.
Proof. (For a slight variation on the proof of this theorem given here, see Exercise 17.16.) Let

  𝒜 := ∪_{Λ ⊂_f T} X_Λ^{−1}(𝒮_Λ),

and for A = X_Λ^{−1}(A′) ∈ 𝒜 let P(A) := μ_Λ(A′). The compatibility conditions in Eq. (17.61) imply that P is a well defined, finitely additive measure on the algebra 𝒜. We now complete the proof by showing that P is continuous on 𝒜.

To this end, suppose A_n := X_{Λ_n}^{−1}(A′_n) ∈ 𝒜 with A_n ↓ ∅ as n → ∞. Let Λ := ∪_{n=1}^∞ Λ_n, a countable subset of T. Owing to Theorem 9.51, there is a unique probability measure, P_Λ, on (S_Λ, 𝒮_Λ) such that P_Λ(X_{Γ,Λ}^{−1}(A)) = μ_Γ(A) for all Γ ⊂_f Λ and A ∈ 𝒮_Γ. Hence if we let Ã_n := X_{Λ_n,Λ}^{−1}(A′_n), we then have

  P(A_n) = μ_{Λ_n}(A′_n) = P_Λ(Ã_n),

with Ã_n ↓ ∅ as n → ∞. Since P_Λ is a measure, we may conclude

  lim_{n→∞} P(A_n) = lim_{n→∞} P_Λ(Ã_n) = 0.
Exercise 17.15. Let us write Λ ⊂_c T to mean Λ ⊂ T and Λ is at most countable. Show

  𝒮 = ∪_{Λ ⊂_c T} X_Λ^{−1}(𝒮_Λ). (17.62)

Hint: Verify Eq. (17.62) by showing 𝒮_0 := ∪_{Λ ⊂_c T} X_Λ^{−1}(𝒮_Λ) is a σ-algebra.
Exercise 17.16. For each Λ ⊂ T, let 𝒮′_Λ := X_Λ^{−1}(𝒮_Λ) = σ(X_i : i ∈ Λ) ⊂ 𝒮. Show:

1. If U, V ⊂ T, then 𝒮′_U ∩ 𝒮′_V = 𝒮′_{U∩V}.
2. By Theorem 9.51, if U, V ⊂_c T, there exist unique probability measures P_U and P_V on 𝒮′_U and 𝒮′_V respectively such that P_U ∘ X_Λ^{−1} = μ_Λ for all Λ ⊂_f U and P_V ∘ X_Λ^{−1} = μ_Λ for all Λ ⊂_f V. Show P_U = P_V on 𝒮′_U ∩ 𝒮′_V. Hence for any A ∈ 𝒮 we may define P(A) := P_U(A) provided A ∈ 𝒮′_U.
3. Show that P defined in the previous item is a countably additive measure on 𝒮.
17.7 Removing the standard Borel restriction

Theorem 17.55. Let {(S_n, 𝒮_n)}_{n∈ℕ₀} be a collection of measurable spaces, S = Π_{n=0}^∞ S_n and 𝒮 := ⊗_{n=0}^∞ 𝒮_n. Moreover, for each n ∈ ℕ₀ let S^n := S_0 × ··· × S_n and 𝒮^n := 𝒮_0 ⊗ ··· ⊗ 𝒮_n. We further suppose that μ_0 is a given probability measure on (S_0, 𝒮_0) and that T_n : S^{n−1} × 𝒮_n → [0, 1] for n = 1, 2, ... are given probability kernels from S^{n−1} to S_n. Finally, let μ_n be the probability measure on (S^n, 𝒮^n) defined inductively by

  μ_n(dx_0, ..., dx_n) = μ_{n−1}(dx_0, ..., dx_{n−1}) T_n(x_0, ..., x_{n−1}, dx_n), n ∈ ℕ. (17.63)

Then there exists a unique probability measure, P, on (S, 𝒮) such that

  P(f) = ∫_{S^n} F dμ_n whenever f(x) = F(x_0, ..., x_n) for some F ∈ (𝒮^n)_b.
Page: 259 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
260 17 The Markov Property
Remark 17.56 (Heuristic proof). Before giving the formal proof of this theorem let me indicate the main ideas. Let $X_i : S \to S_i$ be the projection maps and $\mathcal{B}_n := \sigma(X_0, \dots, X_n)$. If $P$ exists, then
\[
P[F(X_0, \dots, X_{n+1})\,|\,\mathcal{B}_n] = T_{n+1}(X_0, \dots, X_n; F(X_0, \dots, X_n, \cdot)) = (T_{n+1} F(X_0, \dots, X_n, \cdot))(X_0, \dots, X_n).
\]
Indeed,
\begin{align*}
& E[T_{n+1}(X_0, \dots, X_n; F(X_0, \dots, X_n, \cdot))\, G(X_0, \dots, X_n)] \\
&= \int T_{n+1}(x_0, \dots, x_n, dx_{n+1})\, F(x_0, \dots, x_n, x_{n+1})\, G(x_0, \dots, x_n)\, d\mu_n(x_0, \dots, x_n) \\
&= \int F(x_0, \dots, x_n, x_{n+1})\, G(x_0, \dots, x_n)\, d\mu_{n+1}(x_0, \dots, x_n, x_{n+1}) \\
&= E[F(X_0, \dots, X_{n+1})\, G(X_0, \dots, X_n)].
\end{align*}
Now suppose that $f_n = F_n(X_0, \dots, X_n)$ is a decreasing sequence of functions such that $\lim_{n \to \infty} P(f_n) =: \delta > 0$. Letting $f_\infty := \lim_{n \to \infty} f_n$, we would have $f_n \ge f_\infty$ for all $n$. Let $\tilde{f}_n := E[f_\infty | \mathcal{B}_n]$. We also have
\begin{align*}
\tilde{f}_n(X_0, X_1, \dots, X_n) = E[f_\infty | \mathcal{B}_n] &= E\big[E[f_\infty | \mathcal{B}_{n+1}]\,\big|\,\mathcal{B}_n\big] = E\big[\tilde{f}_{n+1} | \mathcal{B}_n\big] \\
&= \int \tilde{f}_{n+1}(X_0, X_1, \dots, X_n, x_{n+1})\, T_{n+1}(X_0, \dots, X_n, dx_{n+1}),
\end{align*}
and $P\big(\tilde{f}_n\big) = P(f_\infty) = \lim_{m \to \infty} P(f_m) = \delta > 0$ (we only use the case where $n = 0$ here). Since $P\big(\tilde{f}_0(X_0)\big) = \delta > 0$, there exists $x_0 \in S_0$ such that
\[
\delta \le \tilde{f}_0(x_0) = E\big[\tilde{f}_1 | \mathcal{B}_0\big] = \int \tilde{f}_1(x_0, x_1)\, T_1(x_0, dx_1),
\]
and so similarly there exists $x_1 \in S_1$ such that
\[
\delta \le \tilde{f}_1(x_0, x_1) = \int \tilde{f}_2(x_0, x_1, x_2)\, T_2(x_0, x_1, dx_2).
\]
Again it follows that there must exist an $x_2 \in S_2$ such that $\delta \le \tilde{f}_2(x_0, x_1, x_2)$. We continue on in this way to find an $x \in S$ such that $f_n(x) \ge \tilde{f}_n(x_0, \dots, x_n) \ge \delta$ for all $n$. Thus if $\lim_{n \to \infty} P(f_n) = \delta > 0$, then $\lim_{n \to \infty} f_n(x) \ge \delta \ne 0$ as desired.
Proof. Now on to the formal proof. Let $\mathcal{C}$ denote the space of finitely based bounded cylinder functions on $S$, i.e. functions of the form $f(x) = F(x_0, \dots, x_n)$ with $F \in \mathcal{S}^n_b$. For such an $f$ we define
\[
I(f) := \mu_n(F).
\]
It is easy to check that $I$ is a well defined positive linear functional on $\mathcal{C}$.

Now suppose that $0 \le f_n \in \mathcal{C}$ is a decreasing sequence of functions such that $\lim_{n \to \infty} I(f_n) = \delta > 0$. We wish to show that $\lim_{n \to \infty} f_n(x) \ne 0$ for some $x \in S$. By assumption, $f_n(x) = F_n(x_0, \dots, x_{N_n})$ for some $N_n \in \mathbb{N}$, and we may assume $N_0 < N_1 < N_2 < \dots$. Moreover if, for example, $N_0 = 2 < N_1 = 5 < N_2 = 7 < \dots$, we may replace $(f_0, f_1, \dots)$ by
\[
(g_0, g_1, g_2, \dots) = (1, 1, f_0, f_0, f_0, f_1, f_1, f_2, \dots).
\]
Noting that $\lim_{n \to \infty} g_n = \lim_{n \to \infty} f_n$, $\lim_{n \to \infty} I(g_n) = \lim_{n \to \infty} I(f_n)$, and $g_n(x) = G_n(x_0, \dots, x_n)$ for some $G_n \in \mathcal{S}^n_b$, we may now assume that $f_n(x) = F_n(x_0, \dots, x_n)$ with $F_n \in \mathcal{S}^n_b$.
For any $k \le n$ let
\[
F_n^k(x_0, \dots, x_k) := \int \cdots \int F_n(x_0, \dots, x_n) \prod_{l=k}^{n-1} T_{l+1}(x_0, \dots, x_l, dx_{l+1}),
\]
which is an explicit version of $E_{\mu_n}[F_n(x_0, \dots, x_n)\,|\,x_0, \dots, x_k] = E[f_n | \mathcal{B}_k](x)$. By construction of the measures $\mu_n$ it follows that
\[
\mu_k\big(F_n^k\big) = \mu_n(F_n) = I(f_n) \quad \text{for all } k \le n.  (17.64)
\]
Since
\[
F_n(x_0, \dots, x_n) = f_n(x) \ge f_{n+1}(x) = F_{n+1}(x_0, \dots, x_n, x_{n+1}),
\]
it follows (using that each $T_{l+1}$ is a probability kernel, so integrating out an extra coordinate changes nothing) that
\begin{align*}
F_n^k(x_0, \dots, x_k) &= \int F_n(x_0, \dots, x_n) \prod_{l=k}^{n} T_{l+1}(x_0, \dots, x_l, dx_{l+1}) \\
&\ge \int F_{n+1}(x_0, \dots, x_n, x_{n+1}) \prod_{l=k}^{n} T_{l+1}(x_0, \dots, x_l, dx_{l+1}) = F_{n+1}^k(x_0, \dots, x_k).
\end{align*}
Thus we may define $F^k(x_0, \dots, x_k) := \lim_{n \to \infty} F_n^k(x_0, \dots, x_k)$, which is formally equal to $E[f_\infty | \mathcal{B}_k](x)$. Hence we expect that
\[
F^k(x_0, \dots, x_k) = \int F^{k+1}(x_0, \dots, x_k, x_{k+1})\, T_{k+1}(x_0, \dots, x_k, dx_{k+1})  (17.65)
\]
by the tower property for conditional expectations. This is indeed the case since, by dominated convergence,
\[
\int F^{k+1}(x_0, \dots, x_k, x_{k+1})\, T_{k+1}(x_0, \dots, x_k, dx_{k+1}) = \lim_{n \to \infty} \int F_n^{k+1}(x_0, \dots, x_k, x_{k+1})\, T_{k+1}(x_0, \dots, x_k, dx_{k+1}),
\]
while
\begin{align*}
& \int F_n^{k+1}(x_0, \dots, x_k, x_{k+1})\, T_{k+1}(x_0, \dots, x_k, dx_{k+1}) \\
&= \int \Big[ \int \cdots \int F_n(x_0, \dots, x_n) \prod_{l=k+1}^{n} T_{l+1}(x_0, \dots, x_l, dx_{l+1}) \Big] T_{k+1}(x_0, \dots, x_k, dx_{k+1}) \\
&= \int \cdots \int F_n(x_0, \dots, x_n) \prod_{l=k}^{n} T_{l+1}(x_0, \dots, x_l, dx_{l+1}) = F_n^k(x_0, \dots, x_k).
\end{align*}
We may now pass to the limit as $n \to \infty$ in Eq. (17.64) to find
\[
\mu_k\big(F^k\big) = \delta > 0 \quad \text{for all } k.
\]
For $k = 0$ it follows that $F^0(x_0) \ge \delta$ for some $x_0 \in S_0$, for otherwise $\mu_0\big(F^0\big) < \delta$. But by Eq. (17.65),
\[
F^0(x_0) = \int F^1(x_0, x_1)\, T_1(x_0, dx_1),
\]
and so there exists $x_1$ such that $\delta \le F^1(x_0, x_1)$. Since
\[
F^1(x_0, x_1) = \int F^2(x_0, x_1, x_2)\, T_2(x_0, x_1, dx_2),
\]
there then exists $x_2$ such that $\delta \le F^2(x_0, x_1, x_2)$, etc. Thus in the end we find an $x = (x_0, x_1, \dots) \in S$ such that $\delta \le F^k(x_0, \dots, x_k)$ for all $k$.
Finally recall that
\[
F_n^k(x_0, \dots, x_k) \ge F^k(x_0, \dots, x_k) \quad \text{for all } k \le n.
\]
Taking $k = n$ then implies,
\[
f_n(x) = F_n^n(x_0, \dots, x_n) \ge F^n(x_0, \dots, x_n) \ge \delta \quad \text{for all } n.
\]
Therefore we have constructed an $x \in S$ such that $f_\infty(x) = \lim_{n \to \infty} f_n(x) \ge \delta > 0$.

We may now use the Caratheodory extension theorem to show that $P$ extends to a countably additive measure on $(S, \mathcal{S})$. Indeed, suppose $A_n \in \mathcal{A} := \mathcal{A}(X_i : i \in \mathbb{N}_0)$. If $A_n \downarrow \emptyset$ then $1_{A_n} \downarrow 0$ and, by what we have just proved, $P(A_n) = P(1_{A_n}) \to 0$ as $n \to \infty$.
Corollary 17.57 (Infinite Product Measures). Let $\{(S_n, \mathcal{S}_n, \nu_n)\}_{n \in \mathbb{N}_0}$ be a collection of probability spaces. Then there exists a unique probability measure, $P$, on $(S, \mathcal{S})$ such that
\[
P(f) = \int_{S^n} F(x_0, \dots, x_n)\, d\nu_0(x_0) \dots d\nu_n(x_n)
\]
whenever $f(x) = F(x_0, \dots, x_n)$ for some $F \in (\mathcal{S}^n)_b$.
Proof. Let $\mu_0 = \nu_0$ and
\[
T_n(x_0, \dots, x_{n-1}, dx_n) = \nu_n(dx_n).
\]
Then in this case we will have
\[
\mu_n(dx_0, \dots, dx_n) = d\nu_0(x_0)\, d\nu_1(x_1) \dots d\nu_n(x_n)
\]
as desired.
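The construction in Eq. (17.63) is exactly the recipe used to simulate such a chain of coordinates: draw $x_0$ from $\mu_0$ and then draw each $x_k$ from the kernel $T_k(x_0, \dots, x_{k-1}, \cdot)$. A minimal Python sketch, in which the concrete choices of $\mu_0$ (standard normal) and of the kernels (normal centered at the previous coordinate) are illustrative assumptions only, not part of the theorem:

```python
import random

# Sequential sampling following Eq. (17.63): draw x_0 from mu_0, then draw
# x_k from the kernel T_k(x_0, ..., x_{k-1}, .).  The concrete choices below
# (normal initial law, kernel centered at the last coordinate) are
# illustrative assumptions.
def sample_mu_0():
    return random.gauss(0.0, 1.0)

def sample_T(history):
    # T_k(x_0, ..., x_{k-1}, dx_k): here a normal centered at x_{k-1}
    return random.gauss(history[-1], 1.0)

def sample_path(n):
    path = [sample_mu_0()]
    for _ in range(n):
        path.append(sample_T(path))
    return path

random.seed(0)
print(sample_path(5))
```

A draw of `sample_path(n)` is a sample from $\mu_n$; the (projective) consistency of the family $\{\mu_n\}$ is what the theorem upgrades to a single measure $P$ on the infinite product.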
17.8 *Appendix: More Probability Kernel Constructions

Lemma 17.58. Suppose that $(X, \mathcal{A})$, $(Y, \mathcal{F})$, and $(Z, \mathcal{B})$ are measurable spaces and $Q : X \times \mathcal{F} \to [0, 1]$ and $R : Y \times \mathcal{B} \to [0, 1]$ are probability kernels. Then for every bounded measurable function, $F : (Y \times Z, \mathcal{F} \otimes \mathcal{B}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, the map
\[
y \mapsto \int_Z R(y, dz)\, F(y, z)
\]
is measurable. Moreover, if we define $P(x; A)$ for $A \in \mathcal{F} \otimes \mathcal{B}$ and $x \in X$ by
\[
P(x, A) = \int_Y Q(x, dy) \int_Z R(y, dz)\, 1_A(y, z),
\]
then $P : X \times (\mathcal{F} \otimes \mathcal{B}) \to [0, 1]$ is a probability kernel such that
\[
P(x, F) = \int_Y Q(x, dy) \int_Z R(y, dz)\, F(y, z)
\]
for all bounded measurable functions, $F : (Y \times Z, \mathcal{F} \otimes \mathcal{B}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$. We will denote the kernel $P$ by $Q \otimes R$ and write
\[
(Q \otimes R)(x, dy, dz) = Q(x, dy)\, R(y, dz).
\]
Moreover, if $S(z, dw)$ is another probability kernel, then $(Q \otimes R) \otimes S = Q \otimes (R \otimes S)$.
Proof. A routine exercise in using the multiplicative systems theorem. To verify the last assertion it suffices to consider the kernels on sets of the form $A \times B \times C$, in which case,
\begin{align*}
(Q \otimes (R \otimes S))(x, A \times B \times C) &= \int_Y Q(x, dy) \int_{Z \times W} (R \otimes S)(y, dz, dw)\, 1_{A \times B \times C}(y, z, w) \\
&= \int_Y Q(x, dy)\, 1_A(y)\, (R \otimes S)(y; B \times C) \\
&= \int_Y Q(x, dy)\, 1_A(y) \int_{Z \times W} R(y, dz)\, S(z, dw)\, 1_{B \times C}(z, w) \\
&= \int_Y Q(x, dy)\, 1_A(y) \int_Z R(y, dz)\, S(z, C)\, 1_B(z),
\end{align*}
while
\begin{align*}
((Q \otimes R) \otimes S)(x, A \times B \times C) &= \int_{Y \times Z} (Q \otimes R)(x, dy, dz) \int_W S(z, dw)\, 1_{A \times B \times C}(y, z, w) \\
&= \int_{Y \times Z} (Q \otimes R)(x, dy, dz)\, 1_{A \times B}(y, z)\, S(z, C) \\
&= \int_Y Q(x, dy) \int_Z R(y, dz)\, 1_{A \times B}(y, z)\, S(z, C) \\
&= \int_Y Q(x, dy)\, 1_A(y) \int_Z R(y, dz)\, S(z, C)\, 1_B(z).
\end{align*}
Corollary 17.59. Keeping the notation in Lemma 17.58, let $QR$ be the probability kernel given by $QR(x, dz) = \int_Y Q(x, dy)\, R(y, dz)$, so that
\[
QR(x; B) = (Q \otimes R)(x; Y \times B).
\]
Then we have $Q(RS) = (QR)S$.

Proof. Let $C \in \mathcal{B}_W$; then
\[
Q(RS)(x; C) = \int_Y Q(x, dy)\,(RS)(y; C) = \int_Y Q(x, dy)\,(R \otimes S)(y; Z \times C) = [Q \otimes (R \otimes S)](x; Y \times Z \times C).
\]
Similarly one shows that
\[
(QR)S(x; C) = [(Q \otimes R) \otimes S](x; Y \times Z \times C),
\]
and then the result follows from Lemma 17.58.
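When all the spaces are finite, a probability kernel is just a row-stochastic matrix and the composition $QR$ of Corollary 17.59 is the matrix product, so the identity $Q(RS) = (QR)S$ reduces to associativity of matrix multiplication. A small numerical illustration (the particular matrices below are arbitrary):

```python
# Finite-state illustration: a kernel on finite spaces is a row-stochastic
# matrix Q[x][y], and the composed kernel QR(x, dz) = sum_y Q(x, y) R(y, z)
# is the matrix product, so Q(RS) = (QR)S is associativity of matmul.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Q = [[0.5, 0.5], [0.2, 0.8]]
R = [[0.9, 0.1], [0.3, 0.7]]
S = [[0.6, 0.4], [0.1, 0.9]]

left = matmul(matmul(Q, R), S)   # (QR)S
right = matmul(Q, matmul(R, S))  # Q(RS)
assert all(abs(left[i][j] - right[i][j]) < 1e-12
           for i in range(2) for j in range(2))
# composed kernels are again row stochastic
assert all(abs(sum(row) - 1.0) < 1e-12 for row in left)
print("Q(RS) == (QR)S on this example")
```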
18 (Sub and Super) Martingales

Let us start with a reminder of a few key notions that were already introduced in Chapter 17. As usual we will let $(S, \mathcal{S})$ denote a measurable space called state space. (Often in this chapter we will take $(S, \mathcal{S}) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.) As in Chapter 17, we will fix a filtered probability space, $\big(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n \in \mathbb{N}_0}, P\big)$, i.e. $\mathcal{B}_n \subset \mathcal{B}_{n+1} \subset \mathcal{B}$ for all $n = 0, 1, 2, \dots$. We further define
\[
\mathcal{B}_\infty := \bigvee_{n=0}^\infty \mathcal{B}_n := \sigma\Big( \bigcup_{n=0}^\infty \mathcal{B}_n \Big) \subset \mathcal{B}.  (18.1)
\]
Also recall that a sequence of random functions, $Y_n : \Omega \to S$ for $n \in \mathbb{N}_0$, is said to be adapted to the filtration if $Y_n$ is $\mathcal{B}_n/\mathcal{S}$-measurable for all $n$.
Definition 18.1. Let $X := \{X_n\}_{n=0}^\infty$ be an adapted sequence of integrable random variables. Then;

1. $X$ is a $\{\mathcal{B}_n\}_{n=0}^\infty$-martingale if $E[X_{n+1}|\mathcal{B}_n] = X_n$ a.s. for all $n \in \mathbb{N}_0$.
2. $X$ is a $\{\mathcal{B}_n\}_{n=0}^\infty$-submartingale if $E[X_{n+1}|\mathcal{B}_n] \ge X_n$ a.s. for all $n \in \mathbb{N}_0$.
3. $X$ is a $\{\mathcal{B}_n\}_{n=0}^\infty$-supermartingale if $E[X_{n+1}|\mathcal{B}_n] \le X_n$ a.s. for all $n \in \mathbb{N}_0$.
It is often fruitful to view $X_n$ as your earnings at time $n$ while playing some game of chance. In this interpretation, your expected earnings at time $n+1$ given the history of the game up to time $n$ are the same as, greater than, or less than your earnings at time $n$ according as $X = \{X_n\}_{n=0}^\infty$ is a martingale, submartingale, or supermartingale respectively. In this interpretation, martingales are fair games, submartingales are games which are favorable to the gambler (unfavorable to the casino), and supermartingales are games which are unfavorable to the gambler (favorable to the casino), see Example 18.4.

By induction one shows that $X$ is a supermartingale, martingale, or submartingale iff
\[
E[X_m|\mathcal{B}_n] \le X_n \ (= X_n,\ \ge X_n) \quad \text{a.s. for all } m \ge n,  (18.2)
\]
respectively. This last equation may also be expressed as
\[
E[X_m|\mathcal{B}_n] \le X_{m \wedge n} \ (= X_{m \wedge n},\ \ge X_{m \wedge n}) \quad \text{a.s. for all } m, n \in \mathbb{N}_0.  (18.3)
\]
The reader should also note that $EX_n$ is then decreasing, constant, or increasing respectively. The next lemma shows that we may shrink the filtration, $\{\mathcal{B}_n\}_{n=0}^\infty$, within limits and still have $X$ retain the property of being a supermartingale, martingale, or submartingale.
Lemma 18.2 (Shrinking the filtration). Suppose that $X$ is a $\{\mathcal{B}_n\}_{n=0}^\infty$ supermartingale, martingale, or submartingale respectively and $\{\mathcal{B}'_n\}_{n=0}^\infty$ is another filtration such that $\sigma(X_0, \dots, X_n) \subset \mathcal{B}'_n \subset \mathcal{B}_n$ for all $n$. Then $X$ is a $\{\mathcal{B}'_n\}_{n=0}^\infty$ supermartingale, martingale, or submartingale respectively.

Proof. Since $\{X_n\}_{n=0}^\infty$ is adapted to $\{\mathcal{B}'_n\}_{n=0}^\infty$ and $\sigma(X_0, \dots, X_n) \subset \mathcal{B}'_n \subset \mathcal{B}_n$ for all $n$,
\[
E_{\mathcal{B}'_n} X_{n+1} = E_{\mathcal{B}'_n} E_{\mathcal{B}_n} X_{n+1} \ \le\ (=,\ \ge)\ E_{\mathcal{B}'_n} X_n = X_n,
\]
when $X$ is a $\{\mathcal{B}_n\}_{n=0}^\infty$ supermartingale, martingale, or submartingale respectively, read from left to right.

Enlarging the filtration is another matter altogether. In what follows we will simply say $X$ is a supermartingale, martingale, or submartingale if it is a $\{\mathcal{B}_n\}_{n=0}^\infty$ supermartingale, martingale, or submartingale.
18.1 (Sub and Super) Martingale Examples
Example 18.3. Suppose that $\{Z_n\}_{n=0}^\infty$ are independent integrable random variables such that $EZ_n = 0$ for all $n \ge 1$. Then $S_n := \sum_{k=0}^n Z_k$ is a martingale relative to the filtration, $\mathcal{B}_n^Z := \sigma(Z_0, \dots, Z_n)$. Indeed,
\[
E[S_{n+1} - S_n | \mathcal{B}_n] = E[Z_{n+1}|\mathcal{B}_n] = EZ_{n+1} = 0.
\]
This same computation also shows that $\{S_n\}_{n \ge 0}$ is a submartingale if $EZ_n \ge 0$ and a supermartingale if $EZ_n \le 0$ for all $n$.
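The martingale property in this example can be checked exactly on a toy case by enumerating sample paths. In the sketch below the assumed model is $Z_k = \pm 1$ with probability $1/2$ each (a special case of the example), and the atoms of $\mathcal{B}_n^Z$ are the cylinder sets fixing $(z_0, \dots, z_n)$:

```python
import itertools

# Exact check that S_n = Z_0 + ... + Z_n is a martingale when the Z_k are
# independent with mean zero; assumed model: Z_k = +-1 with probability 1/2.
# E[S_{n+1} | B_n] = S_n is equivalent to E[S_{n+1} 1_A] = E[S_n 1_A] for
# every atom A of B_n = sigma(Z_0, ..., Z_n).
N = 4
paths = list(itertools.product([-1, 1], repeat=N + 1))
prob = 1.0 / len(paths)  # each path carries probability 2^-(N+1)

def check_martingale(n):
    atoms = {}
    for z in paths:
        atoms.setdefault(z[: n + 1], []).append(z)
    for members in atoms.values():
        e_Sn = sum(sum(z[: n + 1]) * prob for z in members)
        e_Snext = sum(sum(z[: n + 2]) * prob for z in members)
        if abs(e_Sn - e_Snext) > 1e-12:
            return False
    return True

assert all(check_martingale(n) for n in range(N))
print("E[S_{n+1} 1_A] = E[S_n 1_A] holds on every atom")
```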
Exercise 18.1. Construct an example of a martingale, $\{M_n\}_{n=0}^\infty$, such that $E|M_n| \to \infty$ as $n \to \infty$.
Example 18.4 (Setting the odds). Let $S$ be a finite set (think of the outcomes of a spinner, or dice, or a roulette wheel) and $p : S \to (0, 1)$ be a probability function.$^1$ Let $\{Z_n\}_{n=1}^\infty$ be random functions with values in $S$ such that $p(s) := P(Z_n = s)$ for all $s \in S$. ($Z_n$ represents the outcome of the $n^{\text{th}}$ game.) Also let $\alpha : S \to [0, \infty)$ be the house's payoff function, i.e. for each dollar you (the gambler) bet on $s \in S$, the house will pay $\alpha(s)$ dollars back if $s$ is rolled. Further let $W : \Omega \to \mathbb{W}$ be a measurable function into some other measurable space, $(\mathbb{W}, \mathcal{F})$, which is to represent your random (or not so random) whims. We now assume that $Z_n$ is independent of $\sigma(W, Z_1, \dots, Z_{n-1})$ for each $n$, i.e. the dice are not influenced by the previous plays or your whims. If we let $\mathcal{B}_n := \sigma(W, Z_1, \dots, Z_n)$ with $\mathcal{B}_0 := \sigma(W)$, then we are assuming that $Z_n$ is independent of $\mathcal{B}_{n-1}$ for each $n \in \mathbb{N}$.

As a gambler, you are allowed to choose, before the $n^{\text{th}}$ game is played, the amounts $\{C_n(s)\}_{s \in S}$ that you want to bet on each of the possible outcomes of the $n^{\text{th}}$ game. Assuming that you are not clairvoyant (i.e. can not see the future), these amounts may be random but must be $\mathcal{B}_{n-1}$-measurable, that is $C_n(s) = C_n(W, Z_1, \dots, Z_{n-1}, s)$, i.e. $\{C_n(s)\}_{n=1}^\infty$ is a previsible process (see Definition 18.5 below). Thus if $X_0$ denotes your initial wealth (assumed to be a non-random quantity) and $X_n$ denotes your wealth just after the $n^{\text{th}}$ game is played, then
\[
X_n - X_{n-1} = -\sum_{s \in S} C_n(s) + C_n(Z_n)\,\alpha(Z_n),
\]
where $\sum_{s \in S} C_n(s)$ is your total bet on the $n^{\text{th}}$ game and $C_n(Z_n)\,\alpha(Z_n)$ represents the house's payoff to you for the $n^{\text{th}}$ game. Therefore it follows that
\[
X_n = X_0 + \sum_{k=1}^n \Big( -\sum_{s \in S} C_k(s) + C_k(Z_k)\,\alpha(Z_k) \Big),
\]
$X_n$ is $\mathcal{B}_n$-measurable for each $n$, and
\begin{align*}
E_{\mathcal{B}_{n-1}}[X_n - X_{n-1}] &= -\sum_{s \in S} C_n(s) + E_{\mathcal{B}_{n-1}}[C_n(Z_n)\,\alpha(Z_n)] \\
&= -\sum_{s \in S} C_n(s) + \sum_{s \in S} C_n(s)\,\alpha(s)\,p(s) = \sum_{s \in S} C_n(s)\,(\alpha(s)\,p(s) - 1).
\end{align*}
Thus it follows that, no matter the choice of the betting strategy, $\{C_n(s) : s \in S\}_{n=1}^\infty$, we will have

$^1$ To be concrete, take $S = \{2, \dots, 12\}$ representing the possible values for the sums of the upward pointing faces of two dice. Assuming the dice are independent and fair then determines $p : S \to (0, 1)$. For example $p(2) = p(12) = 1/36$, $p(3) = p(11) = 1/18$, $p(7) = 1/6$, etc.
\[
E_{\mathcal{B}_{n-1}}[X_n - X_{n-1}] \
\begin{cases}
\ge 0 & \text{if } \alpha(\cdot)\,p(\cdot) \ge 1 \\
= 0 & \text{if } \alpha(\cdot)\,p(\cdot) = 1 \\
\le 0 & \text{if } \alpha(\cdot)\,p(\cdot) \le 1
\end{cases},
\]
that is, $\{X_n\}_{n \ge 0}$ is a submartingale, martingale, or supermartingale depending on whether $\alpha p \ge 1$, $\alpha p = 1$, or $\alpha p \le 1$.

Moral: If the Casino wants to be guaranteed to make money on average, it had better choose $\alpha : S \to [0, \infty)$ such that $\alpha(s) < 1/p(s)$ for all $s \in S$. In this case the expected earnings of the gambler will be decreasing, which means the expected earnings of the Casino will be increasing.
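For the two-dice model of the footnote, the drift formula $\sum_{s} C_n(s)\,(\alpha(s)\,p(s) - 1)$ is easy to evaluate. In the sketch below the payoff $\alpha$ and the bets $C$ are made-up numbers, chosen so that $\alpha(s) < 1/p(s)$ for every $s$:

```python
from fractions import Fraction

# Probabilities for the sum of two fair dice, as in the footnote:
# p(2) = p(12) = 1/36, p(3) = p(11) = 2/36, ..., p(7) = 6/36.
p = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
assert sum(p.values()) == 1

# Hypothetical house payoffs with alpha(s) < 1/p(s) for every s, and a
# (previsible, here constant) bet of one dollar on each outcome.
alpha = {s: Fraction(9, 10) / p[s] for s in p}   # 10% under fair odds
C = {s: Fraction(1) for s in p}

# Expected one-game gain E[X_n - X_{n-1} | B_{n-1}] = sum_s C(s)(alpha(s)p(s) - 1)
drift = sum(C[s] * (alpha[s] * p[s] - 1) for s in p)
print(drift)  # -11/10: the gambler's wealth is a supermartingale
```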
Definition 18.5. We say $\{C_n : \Omega \to S\}_{n=1}^\infty$ is predictable or previsible if each $C_n$ is $\mathcal{B}_{n-1}/\mathcal{S}$-measurable for all $n \in \mathbb{N}$.
A typical example is when $\{X_n : \Omega \to S\}_{n=0}^\infty$ is a sequence of measurable functions on a probability space $(\Omega, \mathcal{B}, P)$ and $\mathcal{B}_n := \sigma(X_0, \dots, X_n)$. An application of Lemma 14.1 shows that a sequence of random variables, $\{Y_n\}_{n=0}^\infty$, is adapted to the filtration iff there are $\mathcal{S}^{\otimes(n+1)}/\mathcal{B}_{\mathbb{R}}$-measurable functions, $f_n : S^{n+1} \to \mathbb{R}$, such that $Y_n = f_n(X_0, \dots, X_n)$ for all $n \in \mathbb{N}_0$, and a sequence of random variables, $\{Z_n\}_{n=1}^\infty$, is predictable iff there are measurable functions, $f_n : S^n \to \mathbb{R}$, such that $Z_n = f_n(X_0, \dots, X_{n-1})$ for all $n \in \mathbb{N}$.
Example 18.6. Suppose that $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ is a filtered probability space and $X \in L^1(\Omega, \mathcal{B}, P)$. Then $X_n := E[X|\mathcal{B}_n]$ is a martingale. Indeed, by the tower property of conditional expectations,
\[
E[X_{n+1}|\mathcal{B}_n] = E\big[E[X|\mathcal{B}_{n+1}]\,\big|\,\mathcal{B}_n\big] = E[X|\mathcal{B}_n] = X_n \quad \text{a.s.}
\]
Example 18.7. Suppose that $\Omega = [0, 1]$, $\mathcal{B} = \mathcal{B}_{[0,1]}$, and $P = m$, Lebesgue measure. Let
\[
\mathcal{P}_n := \Big\{ \big( \tfrac{k}{2^n}, \tfrac{k+1}{2^n} \big] \Big\}_{k=1}^{2^n - 1} \cup \Big\{ \big[ 0, \tfrac{1}{2^n} \big] \Big\}
\]
and $\mathcal{B}_n := \sigma(\mathcal{P}_n)$ for each $n \in \mathbb{N}$. Then $M_n := 2^n 1_{(0, 2^{-n}]}$ for $n \in \mathbb{N}$ is a martingale (Exercise 18.2) such that $E|M_n| = 1$ for all $n$. However, there is no $X \in L^1(\Omega, \mathcal{B}, P)$ such that $M_n = E[X|\mathcal{B}_n]$. To verify this last assertion, suppose such an $X$ existed. We would then have for $2^n > k > 0$ and any $m > n$, that
\[
E\Big[ X : \big( \tfrac{k}{2^n}, \tfrac{k+1}{2^n} \big] \Big] = E\Big[ E_{\mathcal{B}_m} X : \big( \tfrac{k}{2^n}, \tfrac{k+1}{2^n} \big] \Big] = E\Big[ M_m : \big( \tfrac{k}{2^n}, \tfrac{k+1}{2^n} \big] \Big] = 0.
\]
Using $E[X : A] = 0$ for all $A$ in the $\pi$-system, $\mathcal{Q} := \bigcup_{n=1}^\infty \big\{ \big( \tfrac{k}{2^n}, \tfrac{k+1}{2^n} \big] : 0 < k < 2^n \big\}$, an application of the $\pi$--$\lambda$ theorem shows $E[X : A] = 0$ for all $A \in \sigma(\mathcal{Q}) = \mathcal{B}$. Therefore $X = 0$ a.s. by Proposition 7.22. But this is impossible since $1 = EM_n = EX$.
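Both claims about $M_n = 2^n 1_{(0, 2^{-n}]}$, unit expectation for every $n$ and pointwise convergence to $0$ on $(0, 1]$, can be checked with exact arithmetic:

```python
from fractions import Fraction

# M_n = 2^n * 1_{(0, 2^-n]} under Lebesgue measure on [0, 1]:
# E|M_n| = 2^n * m((0, 2^-n]) = 1 for every n, while for a fixed
# omega in (0, 1] we have M_n(omega) = 0 as soon as 2^-n < omega.
def M(n, omega):
    return 2 ** n if Fraction(0) < omega <= Fraction(1, 2 ** n) else 0

for n in range(1, 10):
    expectation = Fraction(2 ** n) * Fraction(1, 2 ** n)  # value * interval length
    assert expectation == 1

omega = Fraction(1, 3)
assert M(1, omega) == 2      # 1/3 <= 1/2, so omega is still in the support
assert all(M(n, omega) == 0 for n in range(2, 20))  # 2^-n < 1/3 from n = 2 on
print("E[M_n] = 1 for all n, yet M_n -> 0 pointwise on (0, 1]")
```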
Moral: not all $L^1$-bounded martingales are of the form in Example 18.6. Proposition 18.8 shows what is missing from this martingale in order for it to be of the form in Example 18.6. See the comments after Example 18.11 for another example of an $L^1$-bounded martingale which is not of the form in Example 18.6.
Exercise 18.2. Show that $M_n := 2^n 1_{(0, 2^{-n}]}$ for $n \in \mathbb{N}$, as defined in Example 18.7, is a martingale.
Proposition 18.8. Suppose $1 \le p < \infty$ and $X \in L^p(\Omega, \mathcal{B}, P)$. Then the collection of random variables, $\Gamma := \{E[X|\mathcal{G}] : \mathcal{G} \text{ a sub-}\sigma\text{-algebra of } \mathcal{B}\}$, is a bounded subset of $L^p(\Omega, \mathcal{B}, P)$ which is also uniformly integrable.

Proof. Since $E_{\mathcal{G}}$ is a contraction on all $L^p$-spaces, it follows that $\Gamma$ is bounded in $L^p$ with
\[
\sup_{\mathcal{G} \subset \mathcal{B}} \|E[X|\mathcal{G}]\|_p \le \|X\|_p.
\]
For $p > 1$ the uniform integrability of $\Gamma$ follows directly from Lemma 12.48.

We now concentrate on the $p = 1$ case. Recall that $|E_{\mathcal{G}} X| \le E_{\mathcal{G}}|X|$ a.s. and therefore,
\[
E[|E_{\mathcal{G}} X| : |E_{\mathcal{G}} X| \ge a] \le E[|X| : |E_{\mathcal{G}} X| \ge a] \quad \text{for all } a > 0.
\]
But by Chebyshev's inequality,
\[
P(|E_{\mathcal{G}} X| \ge a) \le \frac{1}{a} E|E_{\mathcal{G}} X| \le \frac{1}{a} E|X|.
\]
Since $|X|$ is uniformly integrable, it follows from Proposition 12.42 that, by choosing $a$ sufficiently large, $E[|X| : |E_{\mathcal{G}} X| \ge a]$ is as small as we please uniformly in $\mathcal{G} \subset \mathcal{B}$ and therefore,
\[
\lim_{a \to \infty} \sup_{\mathcal{G} \subset \mathcal{B}} E[|E_{\mathcal{G}} X| : |E_{\mathcal{G}} X| \ge a] = 0.
\]
Example 18.9. This example generalizes Example 18.7. Suppose $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ is a filtered probability space and $Q$ is another probability measure on $(\Omega, \mathcal{B})$. Let us assume that $Q|_{\mathcal{B}_n} \ll P|_{\mathcal{B}_n}$ for all $n$, which by the Radon-Nikodym Theorem 15.8 implies there exists $0 \le X_n \in L^1(\Omega, \mathcal{B}_n, P)$ with $EX_n = 1$ such that $dQ|_{\mathcal{B}_n} = X_n\, dP|_{\mathcal{B}_n}$, or equivalently put, for any $B \in \mathcal{B}_n$ we have
\[
Q(B) = \int_B X_n\, dP = E[X_n : B].
\]
Since $B \in \mathcal{B}_n \subset \mathcal{B}_{n+1}$, we also have $E[X_{n+1} : B] = Q(B) = E[X_n : B]$ for all $B \in \mathcal{B}_n$ and hence $E[X_{n+1}|\mathcal{B}_n] = X_n$ a.s., i.e. $X = \{X_n\}_{n=0}^\infty$ is a positive martingale.

Example 18.7 is of this form with $Q = \delta_0$. Notice that $\delta_0|_{\mathcal{B}_n} \ll m|_{\mathcal{B}_n}$ for all $n < \infty$, while $\delta_0 \perp m$ on $\mathcal{B}_{[0,1]} = \mathcal{B}_\infty$. See Section 19.3 for more in the direction of this example.
Lemma 18.10. Let $X := \{X_n\}_{n=0}^\infty$ be an adapted process of integrable random variables on a filtered probability space, $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$, and let $d_n := X_n - X_{n-1}$ with $X_{-1} := EX_0$. Then $X$ is a martingale (respectively submartingale or supermartingale) iff $E[d_{n+1}|\mathcal{B}_n] = 0$ ($E[d_{n+1}|\mathcal{B}_n] \ge 0$ or $E[d_{n+1}|\mathcal{B}_n] \le 0$ respectively) for all $n \in \mathbb{N}_0$.

Conversely if $\{d_n\}_{n=1}^\infty$ is an adapted sequence of integrable random variables and $X_0$ is a $\mathcal{B}_0$-measurable integrable random variable, then $X_n = X_0 + \sum_{j=1}^n d_j$ is a martingale (respectively submartingale or supermartingale) iff $E[d_{n+1}|\mathcal{B}_n] = 0$ ($E[d_{n+1}|\mathcal{B}_n] \ge 0$ or $E[d_{n+1}|\mathcal{B}_n] \le 0$ respectively) for all $n \in \mathbb{N}$.

Proof. We prove the assertions for martingales only, the others all being similar. Clearly $X$ is a martingale iff
\[
0 = E[X_{n+1}|\mathcal{B}_n] - X_n = E[X_{n+1} - X_n|\mathcal{B}_n] = E[d_{n+1}|\mathcal{B}_n].
\]
The second assertion is an easy consequence of the first assertion.
Example 18.11. Suppose that $\{Z_n\}_{n=0}^\infty$ is a sequence of independent integrable random variables, $X_n = Z_0 \cdots Z_n$, and $\mathcal{B}_n := \sigma(Z_0, \dots, Z_n)$. (Observe that $E|X_n| = \prod_{k=0}^n E|Z_k| < \infty$.) Since
\[
E[X_{n+1}|\mathcal{B}_n] = E[X_n Z_{n+1}|\mathcal{B}_n] = X_n\, E[Z_{n+1}|\mathcal{B}_n] = X_n\, E[Z_{n+1}] \quad \text{a.s.},
\]
it follows that $\{X_n\}_{n=0}^\infty$ is a martingale if $EZ_n = 1$ for all $n$. If we further assume, for all $n$, that $Z_n \ge 0$ so that $X_n \ge 0$, then $\{X_n\}_{n=0}^\infty$ is a supermartingale (submartingale) provided $EZ_n \le 1$ ($EZ_n \ge 1$) for all $n$.

Let us specialize the above example even more by taking $Z_n \overset{d}{=} p + U$, where $p \ge 0$ and $U$ is uniformly distributed on $[0, 1]$. In this case we have by the strong law of large numbers that
\[
\frac{1}{n} \ln X_n = \frac{1}{n} \sum_{k=0}^n \ln Z_k \to E[\ln(p + U)] \quad \text{a.s.}
\]
An elementary computation shows
\[
E[\ln(p + U)] = \int_0^1 \ln(p + x)\, dx = \int_p^{p+1} \ln y\, dy = (y \ln y - y)\Big|_{y=p}^{y=p+1} = (p + 1)\ln(p + 1) - p \ln p - 1.
\]

Fig. 18.1. The graph of $E[\ln(p + U)]$ as a function of $p$. This function has a zero at $p = p_c \cong 0.54221$.

Hence we may conclude that
\[
X_n \approx \exp\big( n\, E[\ln(p + U)] \big) \to
\begin{cases}
0 & \text{if } p < p_c \\
? & \text{if } p = p_c \\
\infty & \text{if } p > p_c
\end{cases}
\quad \text{a.s.}
\]
Notice that $EZ_n = p + 1/2$ and therefore $\{X_n\}$ is a martingale precisely when $p = 1/2$ and is a sub-martingale for $p > 1/2$. So for $1/2 < p < p_c$, $\{X_n\}_{n=1}^\infty$ is a positive sub-martingale with $EX_n = (p + 1/2)^{n+1}$, yet $\lim_{n \to \infty} X_n = 0$ a.s. Have a look at the Excel file (Product positive-(sub)martingales.xls) in order to construct sample paths of $\{X_n\}_{n=0}^\infty$.
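To put a number on $p_c$, one can solve $(p + 1)\ln(p + 1) - p \ln p - 1 = 0$ numerically; since the left hand side is increasing in $p$ (its derivative is $\ln((p+1)/p) > 0$), bisection on a bracketing interval suffices. A minimal sketch:

```python
import math

# Numerically locate p_c, the zero of p -> E[ln(p + U)] from Example 18.11,
# where E[ln(p + U)] = (p + 1) ln(p + 1) - p ln p - 1 (with p ln p read as 0
# at p = 0).  The function is increasing in p, so bisection works.
def mean_log_payoff(p):
    plogp = p * math.log(p) if p > 0 else 0.0
    return (p + 1) * math.log(p + 1) - plogp - 1.0

lo, hi = 0.1, 1.0  # f(0.1) < 0 < f(1.0)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if mean_log_payoff(mid) < 0:
        lo = mid
    else:
        hi = mid
p_c = 0.5 * (lo + hi)
print(round(p_c, 4))  # about 0.5422, in agreement with Fig. 18.1
```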
Proposition 18.12. Suppose that $X = \{X_n\}_{n=0}^\infty$ is a martingale and $\varphi$ is a convex function such that $\varphi(X_n) \in L^1$ for all $n$. Then $\varphi(X) = \{\varphi(X_n)\}_{n=0}^\infty$ is a submartingale. If $\varphi$ is also assumed to be increasing, it suffices to assume that $X$ is a submartingale in order to conclude that $\varphi(X)$ is a submartingale. (For example, if $X$ is a positive submartingale, $p \in (1, \infty)$, and $EX_n^p < \infty$ for all $n$, then $X^p := \{X_n^p\}_{n=0}^\infty$ is another positive submartingale.)

Proof. When $X$ is a martingale, by the conditional Jensen's inequality 14.25,
\[
\varphi(X_n) = \varphi(E_{\mathcal{B}_n} X_{n+1}) \le E_{\mathcal{B}_n}[\varphi(X_{n+1})],
\]
which shows $\varphi(X)$ is a submartingale. Similarly, if $X$ is a submartingale and $\varphi$ is convex and increasing, then $\varphi$ preserves the inequality, $X_n \le E_{\mathcal{B}_n} X_{n+1}$, and hence
\[
\varphi(X_n) \le \varphi(E_{\mathcal{B}_n} X_{n+1}) \le E_{\mathcal{B}_n}[\varphi(X_{n+1})],
\]
so again $\varphi(X)$ is a submartingale.
Proposition 18.13 (Markov Chains and Martingales). Suppose that $\big(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n \in \mathbb{N}_0}, \{X_n : \Omega \to S\}_{n \ge 0}, Q, P\big)$ is a time homogeneous Markov chain and $f : \mathbb{N}_0 \times S \to \mathbb{R}$ is a measurable function which is either non-negative or satisfies $E[|f(n, X_n)|] < \infty$ for all $n$, and let $Z_n := f(n, X_n)$. Then $\{Z_n\}_{n=0}^\infty$ is a martingale (sub-martingale) if $Qf(n+1, \cdot) = f(n, \cdot)$ ($Qf(n+1, \cdot) \ge f(n, \cdot)$) for all $n \ge 0$. In particular, if $f : S \to \mathbb{R}$ is a function such that $Qf = f$ ($Qf \ge f$), then $Z_n = f(X_n)$ is a martingale (sub-martingale). (Also see Exercise 18.5 below.)

Proof. Using the Markov property and the definition of $Q$, we have
\[
E[Z_{n+1}|\mathcal{B}_n] = E[f(n+1, X_{n+1})|\mathcal{B}_n] = [Qf(n+1, \cdot)](X_n).
\]
The latter expression is equal to (greater than or equal to) $Z_n$ if $Qf(n+1, \cdot) = f(n, \cdot)$ ($Qf(n+1, \cdot) \ge f(n, \cdot)$) for all $n \ge 0$.

One way to find solutions to the equation $Qf(n+1, \cdot) = f(n, \cdot)$, at least for a finite number of $n$, is to let $g : S \to \mathbb{R}$ be an arbitrary function and $T \in \mathbb{N}$ be given and then define
\[
f(n, y) := \big( Q^{T-n} g \big)(y) \quad \text{for } 0 \le n \le T.
\]
Then $Qf(n+1, \cdot) = Q\big( Q^{T-n-1} g \big) = Q^{T-n} g = f(n, \cdot)$ and we will have that
\[
Z_n = f(n, X_n) = \big( Q^{T-n} g \big)(X_n)
\]
is a martingale for $0 \le n \le T$. If $f(n, \cdot)$ satisfies $Qf(n+1, \cdot) = f(n, \cdot)$ for all $n$, then we must have, with $f_0 := f(0, \cdot)$,
\[
f(n, \cdot) = Q^{-n} f_0,
\]
where $Q^{-1} g$ denotes a function $h$ solving $Qh = g$. In general $Q$ is not invertible, and hence there may be no solution to $Qh = g$ or there might be many solutions. In special cases one can often make sense of these expressions, as you will see in Exercise 18.5.

In the following discussion we continue the notation in Exercise 17.35, where $S = \mathbb{Z}$, $S_n = X_0 + X_1 + \dots + X_n$, where $\{X_i\}_{i=1}^\infty$ are i.i.d. with $P(X_i = 1) = p \in (0, 1)$ and $P(X_i = -1) = q := 1 - p$, and $X_0$ is an $S$-valued random variable independent of $\{X_i\}_{i=1}^\infty$. Recall that $\{S_n\}_{n=0}^\infty$ is a time homogeneous Markov chain with transition kernel determined by
\[
Qf(x) = p f(x + 1) + q f(x - 1).
\]
As we have seen, if $f(x) = a + b\,(q/p)^x$, then $Qf = f$ and therefore
\[
M_n = a + b\,(q/p)^{S_n}
\]
is a martingale for all $a, b \in \mathbb{R}$.

Now suppose that $\lambda \ne 0$ and observe that $Q\lambda^x = \big( p\lambda + q\lambda^{-1} \big)\lambda^x$. Thus it follows that we may set $Q^{-1}\lambda^x = \big( p\lambda + q\lambda^{-1} \big)^{-1}\lambda^x$ and therefore conclude that
\[
f(n, x) := Q^{-n}\lambda^x = \big( p\lambda + q\lambda^{-1} \big)^{-n}\lambda^x
\]
satisfies $Qf(n+1, \cdot) = f(n, \cdot)$. So if we suppose that $X_0$ is bounded, so that $S_n$ is bounded for all $n$, we will have that
\[
\Big\{ M_n = \big( p\lambda + q\lambda^{-1} \big)^{-n}\lambda^{S_n} \Big\}_{n \ge 0}
\]
is a martingale for all $\lambda \ne 0$.
Exercise 18.3. For $\lambda \in \mathbb{R}$ let
\[
f_\lambda(n, x) := Q^{-n} e^{\lambda x} = \big( p e^\lambda + q e^{-\lambda} \big)^{-n} e^{\lambda x},
\]
so that $Qf_\lambda(n+1, \cdot) = f_\lambda(n, \cdot)$ for all $\lambda \in \mathbb{R}$. Compute;

1. $f_\lambda^{(k)}(n, x) := \big( \frac{d}{d\lambda} \big)^k f_\lambda(n, x)$ for $k = 1, 2$.
2. Use your results to show,
\[
M_n^{(1)} := S_n - n(p - q)
\]
and
\[
M_n^{(2)} := (S_n - n(p - q))^2 - 4npq
\]
are martingales.

(If you are ambitious you might also find $M_n^{(3)}$.)
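A quick sanity check on item 2: the one-step conditional increments of $M_n^{(1)}$ and $M_n^{(2)}$ can be verified to vanish on every atom of $\sigma(X_1, \dots, X_n)$ by brute force (with the simplifying assumption $X_0 = 0$ and a made-up bias $p = 0.3$):

```python
import itertools

# Check by enumeration that M1_n = S_n - n(p - q) and
# M2_n = (S_n - n(p - q))**2 - 4*n*p*q are martingales for the biased walk
# S_n = X_1 + ... + X_n (assuming X_0 = 0), P(X_i = 1) = p, P(X_i = -1) = q.
p = 0.3
q = 1.0 - p
N = 4

def conditional_increment_is_zero(M):
    # verify E[M(n+1, .) - M(n, .) | X_1, ..., X_n] = 0 on every atom
    for n in range(N):
        for steps in itertools.product([1, -1], repeat=n):
            s = sum(steps)
            inc = (p * (M(n + 1, s + 1) - M(n, s))
                   + q * (M(n + 1, s - 1) - M(n, s)))
            if abs(inc) > 1e-12:
                return False
    return True

M1 = lambda n, s: s - n * (p - q)
M2 = lambda n, s: (s - n * (p - q)) ** 2 - 4 * n * p * q
assert conditional_increment_is_zero(M1)
assert conditional_increment_is_zero(M2)
print("M1 and M2 pass the one-step martingale check")
```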
Remark 18.14. If $\{M_n(\lambda)\}_{n=0}^\infty$ is a martingale depending differentiably on a parameter $\lambda \in \mathbb{R}$, then for all $A \in \mathcal{B}_n$,
\[
E\Big[ \frac{d}{d\lambda} M_{n+1}(\lambda) : A \Big] = \frac{d}{d\lambda} E[M_{n+1}(\lambda) : A] = \frac{d}{d\lambda} E[M_n(\lambda) : A] = E\Big[ \frac{d}{d\lambda} M_n(\lambda) : A \Big],
\]
provided it is permissible to interchange $\frac{d}{d\lambda}$ with the expectations in this equation. Thus under suitable hypotheses, we will have that $\big\{ \frac{d}{d\lambda} M_n(\lambda) \big\}_{n \ge 0}$ is another martingale.
18.2 Decompositions
Notation 18.15 Given a sequence $\{Z_k\}_{k=0}^\infty$, let $\Delta_k Z := Z_k - Z_{k-1}$ for $k = 1, 2, \dots$.
Lemma 18.16 (Doob Decomposition). Each adapted sequence, $\{Z_n\}_{n=0}^\infty$, of integrable random variables has a unique decomposition,
\[
Z_n = M_n + A_n,  (18.4)
\]
where $\{M_n\}_{n=0}^\infty$ is a martingale and $A_n$ is a predictable process such that $A_0 = 0$. Moreover this decomposition is given by $A_0 = 0$,
\[
A_n := \sum_{k=1}^n E_{\mathcal{B}_{k-1}}[\Delta_k Z] \quad \text{for } n \ge 1  (18.5)
\]
and
\begin{align}
M_n &= Z_n - A_n = Z_n - \sum_{k=1}^n E_{\mathcal{B}_{k-1}}[\Delta_k Z]  (18.6) \\
&= Z_0 + \sum_{k=1}^n \big( Z_k - E_{\mathcal{B}_{k-1}} Z_k \big).  (18.7)
\end{align}
In particular, $\{Z_n\}_{n=0}^\infty$ is a submartingale (supermartingale) iff $A_n$ is increasing (decreasing) almost surely.
Proof. Assuming $Z_n$ has a decomposition as in Eq. (18.4), then
\[
E_{\mathcal{B}_n}[\Delta_{n+1} Z] = E_{\mathcal{B}_n}[\Delta_{n+1} M + \Delta_{n+1} A] = \Delta_{n+1} A,  (18.8)
\]
wherein we have used that $M$ is a martingale and $A$ is predictable, so that $E_{\mathcal{B}_n}[\Delta_{n+1} M] = 0$ and $E_{\mathcal{B}_n}[\Delta_{n+1} A] = \Delta_{n+1} A$. Hence we must define, for $n \ge 1$,
\[
A_n := \sum_{k=1}^n \Delta_k A = \sum_{k=1}^n E_{\mathcal{B}_{k-1}}[\Delta_k Z],
\]
which is a predictable process. This proves the uniqueness of the decomposition and the validity of Eq. (18.5).

For existence, from Eq. (18.5) it follows that
\[
E_{\mathcal{B}_n}[\Delta_{n+1} Z] = \Delta_{n+1} A = E_{\mathcal{B}_n}[\Delta_{n+1} A].
\]
Hence, if we define $M_n := Z_n - A_n$, then
\[
E_{\mathcal{B}_n}[\Delta_{n+1} M] = E_{\mathcal{B}_n}[\Delta_{n+1} Z - \Delta_{n+1} A] = 0,
\]
and hence $\{M_n\}_{n=0}^\infty$ is a martingale. Moreover, Eq. (18.7) follows from Eq. (18.6) since,
\[
M_n = Z_0 + \sum_{k=1}^n \big( \Delta_k Z - E_{\mathcal{B}_{k-1}}[\Delta_k Z] \big)
\]
and
\[
\Delta_k Z - E_{\mathcal{B}_{k-1}}[\Delta_k Z] = Z_k - Z_{k-1} - E_{\mathcal{B}_{k-1}}[Z_k - Z_{k-1}] = Z_k - Z_{k-1} - \big( E_{\mathcal{B}_{k-1}} Z_k - Z_{k-1} \big) = Z_k - E_{\mathcal{B}_{k-1}} Z_k.
\]
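As a concrete instance of the Doob decomposition, take $Z_n = S_n^2$ for a fair $\pm 1$ random walk started at $0$: then $E_{\mathcal{B}_{k-1}}[\Delta_k Z] = E_{\mathcal{B}_{k-1}}[2 S_{k-1} X_k + 1] = 1$, so Eq. (18.5) gives $A_n = n$ and $M_n = S_n^2 - n$. The sketch below checks the martingale property of $M$ by enumeration:

```python
import itertools

# Doob decomposition for Z_n = S_n^2 with S_n a fair +-1 walk, S_0 = 0:
# E[Delta_k Z | B_{k-1}] = 1, so A_n = n and M_n = S_n^2 - n should be a
# martingale; we check E[M_{n+1} | B_n] = M_n on every atom of
# B_n = sigma(X_1, ..., X_n).
def doob_check(N):
    for n in range(N):
        for prefix in itertools.product([1, -1], repeat=n):
            s = sum(prefix)
            # E[M_{n+1} | B_n], averaging over the next +-1 step
            m_next = 0.5 * ((s + 1) ** 2 - (n + 1)) + 0.5 * ((s - 1) ** 2 - (n + 1))
            if abs(m_next - (s ** 2 - n)) > 1e-12:
                return False
    return True

assert doob_check(6)
print("S_n^2 = (S_n^2 - n) + n is the Doob decomposition for the fair walk")
```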
Remark 18.17. Suppose that $X = \{X_n\}_{n=0}^\infty$ is a submartingale and $X_n = M_n + A_n$ is its Doob decomposition. Then $A_\infty := \lim_{n \to \infty} A_n$ exists a.s.,
\[
EA_n = E[X_n - M_n] = EX_n - EM_0 = E[X_n - X_0],  (18.9)
\]
and hence by MCT,
\[
EA_\infty = \lim_{n \to \infty} E[X_n - X_0].  (18.10)
\]
Hence if $\lim_{n \to \infty} E[X_n - X_0] = \sup_n E[X_n - X_0] < \infty$, then $EA_\infty < \infty$ and so by DCT, $A_n \to A_\infty$ in $L^1(\Omega, \mathcal{B}, P)$. In particular, if $\sup_n E|X_n| < \infty$, we may conclude that $\{X_n\}_{n=0}^\infty$ is $L^1(\Omega, \mathcal{B}, P)$-convergent iff $\{M_n\}_{n=0}^\infty$ is $L^1(\Omega, \mathcal{B}, P)$-convergent. (We will see below in Corollary 18.54 that $X_\infty := \lim_{n \to \infty} X_n$ and $M_\infty := \lim_{n \to \infty} M_n$ exist almost surely under the assumption that $\sup_n E|X_n| < \infty$.)
Example 18.18. Suppose that $N = \{N_n\}_{n=0}^\infty$ is a square integrable martingale, i.e. $EN_n^2 < \infty$ for all $n$. Then from Proposition 18.12, $X := \{X_n = N_n^2\}_{n=0}^\infty$ is a positive submartingale. In this case
\begin{align*}
E_{\mathcal{B}_{k-1}} \Delta_k X &= E_{\mathcal{B}_{k-1}}\big[ N_k^2 - N_{k-1}^2 \big] = E_{\mathcal{B}_{k-1}}[(N_k - N_{k-1})(N_k + N_{k-1})] \\
&= E_{\mathcal{B}_{k-1}}[(N_k - N_{k-1})(N_k - N_{k-1})] = E_{\mathcal{B}_{k-1}}(N_k - N_{k-1})^2,
\end{align*}
wherein the second to last equality we have used
\[
E_{\mathcal{B}_{k-1}}[(N_k - N_{k-1})\, N_{k-1}] = N_{k-1}\, E_{\mathcal{B}_{k-1}}(N_k - N_{k-1}) = 0 \quad \text{a.s.}
\]
in order to change $(N_k + N_{k-1})$ to $(N_k - N_{k-1})$. Hence the increasing predictable process, $A_n$, in the Doob decomposition may be written as
\[
A_n = \sum_{k \le n} E_{\mathcal{B}_{k-1}} \Delta_k X = \sum_{k \le n} E_{\mathcal{B}_{k-1}} (\Delta_k N)^2.  (18.11)
\]
Exercise 18.4 (Very similar to above example?). Suppose $\{M_n\}_{n=0}^\infty$ is a square integrable martingale. Show;

1. $E\big[ M_{n+1}^2 - M_n^2 | \mathcal{B}_n \big] = E\big[ (M_{n+1} - M_n)^2 | \mathcal{B}_n \big]$. Conclude from this that the Doob decomposition of $M_n^2$ is of the form,
\[
M_n^2 = N_n + A_n
\]
where
\[
A_n := \sum_{1 \le k \le n} E\big[ (M_k - M_{k-1})^2 | \mathcal{B}_{k-1} \big].
\]
2. If we further assume that $M_k - M_{k-1}$ is independent of $\mathcal{B}_{k-1}$ for all $k = 1, 2, \dots$, explain why,
\[
A_n = \sum_{1 \le k \le n} E(M_k - M_{k-1})^2.
\]
The next exercise shows how to characterize Markov processes via martingales.

Exercise 18.5 (Martingale problem I). Suppose that $\{X_n\}_{n=0}^\infty$ is an $(S, \mathcal{S})$-valued adapted process on some filtered probability space, $\big(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n \in \mathbb{N}_0}, P\big)$, and $Q$ is a probability kernel on $S$. To each $f : S \to \mathbb{R}$ which is bounded and measurable, let
\[
M_n^f := f(X_n) - \sum_{k < n} (Qf(X_k) - f(X_k)) = f(X_n) - \sum_{k < n} ((Q - I) f)(X_k).
\]
Show;

1. If $\{X_n\}_{n \ge 0}$ is a time homogeneous Markov chain with transition kernel, $Q$, then $\big\{ M_n^f \big\}_{n \ge 0}$ is a martingale for each $f \in \mathcal{S}_b$.
2. Conversely, if $\big\{ M_n^f \big\}_{n \ge 0}$ is a martingale for each $f \in \mathcal{S}_b$, then $\{X_n\}_{n \ge 0}$ is a time homogeneous Markov chain with transition kernel, $Q$.
Remark 18.19. If $X$ is a real valued random variable, then $X = X^+ - X^-$, $|X| = X^+ + X^-$, and $X^+ \le |X| = 2X^+ - X$, so that
\[
EX^+ \le E|X| = 2EX^+ - EX.
\]
Hence if $\{X_n\}_{n=0}^\infty$ is a submartingale then
\[
EX_n^+ \le E|X_n| = 2EX_n^+ - EX_n \le 2EX_n^+ - EX_0,
\]
from which it follows that
\[
\sup_n EX_n^+ \le \sup_n E|X_n| \le 2 \sup_n EX_n^+ - EX_0.  (18.12)
\]
In particular, an integrable submartingale $\{X_n\}_{n=0}^\infty$ is $L^1(P)$-bounded iff $\{X_n^+\}_{n=0}^\infty$ is $L^1(P)$-bounded.
Theorem 18.20 (Krickeberg Decomposition). Suppose that $X$ is an integrable submartingale such that $C := \sup_n E[X_n^+] < \infty$, or equivalently $\sup_n E|X_n| < \infty$, see Eq. (18.12). Then
\[
M_n := \lim_{p \to \infty} E\big[ X_p^+ | \mathcal{B}_n \big] \quad \text{exists a.s.},
\]
$M = \{M_n\}_{n=0}^\infty$ is a positive martingale, $Y = \{Y_n\}_{n=0}^\infty$ with $Y_n := M_n - X_n$ is a positive supermartingale, and hence $X_n = M_n - Y_n$. So $X$ can be decomposed into the difference of a positive martingale and a positive supermartingale.
Proof. From Proposition 18.12 we know that $X^+ = \{X_n^+\}$ is still a positive submartingale. Therefore for each $n \in \mathbb{N}$ and $p \ge n$,
\[
E_{\mathcal{B}_n}\big[ X_{p+1}^+ \big] = E_{\mathcal{B}_n} E_{\mathcal{B}_p}\big[ X_{p+1}^+ \big] \ge E_{\mathcal{B}_n} X_p^+ \quad \text{a.s.}
\]
Therefore $E_{\mathcal{B}_n} X_p^+$ is increasing in $p$ for $p \ge n$ and hence $M_n := \lim_{p \to \infty} E_{\mathcal{B}_n}\big[ X_p^+ \big]$ exists in $[0, \infty]$. By Fatou's lemma, we know that
\[
EM_n \le \liminf_{p \to \infty} E\big[ E_{\mathcal{B}_n}\big[ X_p^+ \big] \big] \le \liminf_{p \to \infty} E\big[ X_p^+ \big] \le C < \infty,
\]
which shows $M_n$ is integrable. By cMCT and the tower property of conditional expectation,
\[
E_{\mathcal{B}_n} M_{n+1} = E_{\mathcal{B}_n} \lim_{p \to \infty} E_{\mathcal{B}_{n+1}}\big[ X_p^+ \big] = \lim_{p \to \infty} E_{\mathcal{B}_n} E_{\mathcal{B}_{n+1}}\big[ X_p^+ \big] = \lim_{p \to \infty} E_{\mathcal{B}_n}\big[ X_p^+ \big] = M_n \quad \text{a.s.},
\]
which shows $M = \{M_n\}$ is a martingale.

We now define $Y_n := M_n - X_n$. Using the submartingale property of $X^+$ implies,
\[
Y_n = M_n - X_n = \lim_{p \to \infty} E_{\mathcal{B}_n}\big[ X_p^+ \big] - X_n^+ + X_n^- = \lim_{p \to \infty} E_{\mathcal{B}_n}\big[ X_p^+ - X_n^+ \big] + X_n^- \ge 0 \quad \text{a.s.}
\]
Moreover,
\[
E[Y_{n+1}|\mathcal{B}_n] = E[M_{n+1} - X_{n+1}|\mathcal{B}_n] = M_n - E[X_{n+1}|\mathcal{B}_n] \le M_n - X_n = Y_n,
\]
wherein we have used that $M$ is a martingale in the second equality and that $X$ is a submartingale in the last inequality.
18.3 Stopping Times
Definition 18.21. Again let $\{\mathcal{B}_n\}_{n=0}^\infty$ be a filtration on $(\Omega, \mathcal{B})$ and assume that $\mathcal{B} = \mathcal{B}_\infty := \bigvee_{n=0}^\infty \mathcal{B}_n := \sigma\big( \bigcup_{n=0}^\infty \mathcal{B}_n \big)$. A function, $\tau : \Omega \to \bar{\mathbb{N}} := \mathbb{N} \cup \{0, \infty\}$, is said to be a stopping time if $\{\tau \le n\} \in \mathcal{B}_n$ for all $n \in \bar{\mathbb{N}}$. Equivalently put, $\tau : \Omega \to \bar{\mathbb{N}}$ is a stopping time iff the process, $n \mapsto 1_{\{\tau \le n\}}$, is adapted.
Lemma 18.22. Let $\{\mathcal{B}_n\}_{n=0}^\infty$ be a filtration on $(\Omega, \mathcal{B})$ and $\tau : \Omega \to \bar{\mathbb{N}}$ be a function. Then the following are equivalent;

1. $\tau$ is a stopping time.
2. $\{\tau \le n\} \in \mathcal{B}_n$ for all $n \in \mathbb{N}_0$.
3. $\{\tau > n\} = \{\tau \ge n + 1\} \in \mathcal{B}_n$ for all $n \in \mathbb{N}_0$.
4. $\{\tau = n\} \in \mathcal{B}_n$ for all $n \in \mathbb{N}_0$.

Moreover, if any of these conditions hold for $n \in \mathbb{N}_0$ then they also hold for $n = \infty$.

Proof. (1. $\iff$ 2.) Observe that if $\{\tau \le n\} \in \mathcal{B}_n$ for all $n \in \mathbb{N}_0$, then $\{\tau < \infty\} = \bigcup_{n=1}^\infty \{\tau \le n\} \in \mathcal{B}_\infty$ and therefore $\{\tau = \infty\} = \{\tau < \infty\}^c \in \mathcal{B}_\infty$ and hence $\{\tau \le \infty\} = \{\tau < \infty\} \cup \{\tau = \infty\} \in \mathcal{B}_\infty$. Hence in order to check that $\tau$ is a stopping time, it suffices to show $\{\tau \le n\} \in \mathcal{B}_n$ for all $n \in \mathbb{N}_0$.

The equivalence of 2., 3., and 4. follows from the identities
\begin{align*}
\{\tau > n\}^c &= \{\tau \le n\}, \\
\{\tau = n\} &= \{\tau \le n\} \setminus \{\tau \le n - 1\}, \text{ and} \\
\{\tau \le n\} &= \bigcup_{k=0}^n \{\tau = k\},
\end{align*}
from which we conclude that 2. $\implies$ 3. $\implies$ 4. $\implies$ 1.
Clearly any constant function, $\tau : \Omega \to \bar{\mathbb{N}}$, is a stopping time. The reader should also observe that if $\mathcal{B}_n = \sigma(X_0, \dots, X_n)$, then $\tau : \Omega \to \bar{\mathbb{N}}$ is a stopping time iff, for each $n \in \mathbb{N}_0$, there exists a measurable function, $f_n : S^{n+1} \to \mathbb{R}$, such that $1_{\{\tau = n\}} = f_n(X_0, \dots, X_n)$. In other words, if $\tau(\omega) = n$ and $\omega'$ is any other point in $\Omega$ such that $X_k(\omega) = X_k(\omega')$ for $k \le n$, then $\tau(\omega') = n$. Here is another common example of a stopping time.
Example 18.23 (Hitting times). Let $(S, \mathcal{S})$ be a state space, $X := \{X_n : \Omega \to S\}_{n=0}^\infty$ be an adapted process on the filtered space, $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty)$, and $A \in \mathcal{S}$. Then the first hitting time of $A$,
\[
\tau := \inf\{n \in \mathbb{N}_0 : X_n \in A\}
\]
(with the convention that $\inf \emptyset = \infty$), is a stopping time. To see this, observe that
\[
\{\tau = n\} = \{X_0 \in A^c, \dots, X_{n-1} \in A^c, X_n \in A\} \in \sigma(X_0, \dots, X_n) \subset \mathcal{B}_n.
\]
More generally if $\sigma$ is a stopping time, then the first hitting time after $\sigma$,
\[
\tau:=\inf\left\{k\ge\sigma:X_k\in A\right\},
\]
is also a stopping time. Indeed,
\begin{align*}
\{\tau=n\}&=\{\sigma\le n\}\cap\left\{X_\sigma\notin A,\dots,X_{n-1}\notin A,X_n\in A\right\}\\
&=\cup_{0\le k\le n}\left(\{\sigma=k\}\cap\left\{X_k\notin A,\dots,X_{n-1}\notin A,X_n\in A\right\}\right)
\end{align*}
which is in $\mathcal{B}_n$ for all $n$. Here we use the convention that $\{X_k\notin A,\dots,X_{n-1}\notin A,X_n\in A\}=\{X_n\in A\}$ if $k=n$.

On the other hand the last hitting time, $\tau=\sup\{n\in\mathbb{N}_0:X_n\in A\}$, of a set $A$ is typically not a stopping time. Indeed, in this case
\[
\{\tau=n\}=\left\{X_n\in A,X_{n+1}\notin A,X_{n+2}\notin A,\dots\right\}\in\sigma\left(X_n,X_{n+1},\dots\right)
\]
which typically will not be in $\mathcal{B}_n$.
Proposition 18.24 (New Stopping Times from Old). Let $(\Omega,\mathcal{B},\{\mathcal{B}_n\}_{n=0}^\infty)$ be a filtered measure space and suppose $\sigma$, $\tau$, and $\{\tau_n\}_{n=1}^\infty$ are all stopping times. Then

1. $\tau\wedge\sigma$, $\tau\vee\sigma$, $\tau+\sigma$ are all stopping times.
2. If $\tau_k\uparrow\tau_\infty$ or $\tau_k\downarrow\tau_\infty$, then $\tau_\infty$ is a stopping time.
3. In general, $\sup_k\tau_k=\lim_{k\to\infty}\max\{\tau_1,\dots,\tau_k\}$ and $\inf_k\tau_k=\lim_{k\to\infty}\min\{\tau_1,\dots,\tau_k\}$ are also stopping times.
Proof.

1. Since $\{\tau\vee\sigma>n\}=\{\tau>n\}\cup\{\sigma>n\}\in\mathcal{B}_n$, $\{\tau\wedge\sigma\le n\}=\{\tau\le n\}\cup\{\sigma\le n\}\in\mathcal{B}_n$ for all $n$, and
\[
\{\tau+\sigma=n\}=\cup_{k=0}^n\{\tau=k,\ \sigma=n-k\}\in\mathcal{B}_n
\]
for all $n$, it follows that $\tau\wedge\sigma$, $\tau\vee\sigma$, $\tau+\sigma$ are all stopping times.
2. If $\tau_k\uparrow\tau_\infty$, then $\{\tau_\infty\le n\}=\cap_k\{\tau_k\le n\}\in\mathcal{B}_n$ and so $\tau_\infty$ is a stopping time. Similarly, if $\tau_k\downarrow\tau_\infty$, then $\{\tau_\infty>n\}=\cap_k\{\tau_k>n\}\in\mathcal{B}_n$ and so $\tau_\infty$ is a stopping time. (Recall that $\{\tau_\infty>n\}=\{\tau_\infty\ge n+1\}$.)
3. This follows from items 1. and 2.
Lemma 18.25. If $\tau$ is a stopping time, then the processes, $f_n:=1_{\{\tau\le n\}}$ and $f_n:=1_{\{\tau=n\}}$, are adapted and $f_n:=1_{\{\tau<n\}}$ is predictable. Moreover, if $\sigma$ and $\tau$ are two stopping times, then $f_n:=1_{\{\sigma<n\le\tau\}}$ is predictable.
Proof. These are all trivial to prove. For example, if $f_n:=1_{\{\sigma<n\le\tau\}}$, then $f_n$ is $\mathcal{B}_{n-1}$ measurable since,
\[
\{\sigma<n\le\tau\}=\{\sigma<n\}\cap\{n\le\tau\}=\{\sigma\le n-1\}\cap\{\tau\le n-1\}^c\in\mathcal{B}_{n-1}.
\]
Notation 18.26 (Stochastic intervals) If $\sigma,\tau:\Omega\to\bar{\mathbb{N}}$, let
\[
(\sigma,\tau]:=\left\{(\omega,n)\in\Omega\times\bar{\mathbb{N}}:\sigma(\omega)<n\le\tau(\omega)\right\}
\]
and we will write $1_{(\sigma,\tau]}$ for the process, $1_{\{\sigma<n\le\tau\}}$.
Our next goal is to define the stopped $\sigma$-algebra, $\mathcal{B}_\tau$. To motivate the upcoming definition, suppose $X_n:\Omega\to\mathbb{R}$ are given functions for all $n\in\mathbb{N}_0$, $\mathcal{B}_n:=\sigma(X_0,\dots,X_n)$, and $\tau:\Omega\to\mathbb{N}_0$ is a $\{\mathcal{B}_n\}$ stopping time. Recalling that a function $Y:\Omega\to\mathbb{R}$ is $\mathcal{B}_n$ measurable iff $Y(\omega)=f_n(X_0(\omega),\dots,X_n(\omega))$ for some measurable function, $f_n:\mathbb{R}^{n+1}\to\mathbb{R}$, it is reasonable to suggest that $Y$ is $\mathcal{B}_\tau$ measurable iff $Y(\omega)=f_{\tau(\omega)}\left(X_0(\omega),\dots,X_{\tau(\omega)}(\omega)\right)$, where the $f_n:\mathbb{R}^{n+1}\to\mathbb{R}$ are measurable functions. If this is the case, then we would have $1_{\{\tau=n\}}Y=f_n(X_0,\dots,X_n)$ is $\mathcal{B}_n$ measurable for all $n$. Hence we should define $A$ to be in $\mathcal{B}_\tau$ iff $1_A$ is $\mathcal{B}_\tau$ measurable iff $1_{\{\tau=n\}}1_A$ is $\mathcal{B}_n$ measurable for all $n$, which happens iff $\{\tau=n\}\cap A\in\mathcal{B}_n$ for all $n$.
Definition 18.27 (Stopped $\sigma$-algebra). Given a stopping time $\tau$ on a filtered measure space $(\Omega,\mathcal{B},\{\mathcal{B}_n\}_{n=0}^\infty)$ with $\mathcal{B}_\infty:=\vee_{n=0}^\infty\mathcal{B}_n:=\sigma\left(\cup_{n=0}^\infty\mathcal{B}_n\right)$, let
\[
\mathcal{B}_\tau:=\left\{A\subset\Omega:\{\tau=n\}\cap A\in\mathcal{B}_n\ \text{for all}\ n\le\infty\right\}. \tag{18.13}
\]
Lemma 18.28. Suppose $\sigma$ and $\tau$ are stopping times.

1. A set, $A\subset\Omega$, is in $\mathcal{B}_\tau$ iff $A\cap\{\tau\le n\}\in\mathcal{B}_n$ for all $n\le\infty$.
2. $\mathcal{B}_\tau$ is a sub-$\sigma$-algebra of $\mathcal{B}_\infty$.
3. If $\sigma\le\tau$, then $\mathcal{B}_\sigma\subset\mathcal{B}_\tau$.
Proof. 1. Since
\[
A\cap\{\tau\le n\}=\cup_{k\le n}\left[A\cap\{\tau=k\}\right]\quad\text{and}\quad
A\cap\{\tau=n\}=\left[A\cap\{\tau\le n\}\right]\setminus\left[A\cap\{\tau\le n-1\}\right],
\]
it easily follows that $A$ is in $\mathcal{B}_\tau$ iff $A\cap\{\tau\le n\}\in\mathcal{B}_n$ for all $n\le\infty$.

2. Since $\Omega\cap\{\tau=n\}=\{\tau=n\}\in\mathcal{B}_n$ for all $n$, it follows that $\Omega\in\mathcal{B}_\tau$. If $A\in\mathcal{B}_\tau$, then, for all $n\le\infty$,
\[
A^c\cap\{\tau=n\}=\{\tau=n\}\setminus A=\{\tau=n\}\setminus\left[A\cap\{\tau=n\}\right]\in\mathcal{B}_n.
\]
This shows $A^c\in\mathcal{B}_\tau$. Similarly if $\{A_k\}_{k=1}^\infty\subset\mathcal{B}_\tau$, then
\[
\{\tau=n\}\cap\left(\cup_{k=1}^\infty A_k\right)=\cup_{k=1}^\infty\left(\{\tau=n\}\cap A_k\right)\in\mathcal{B}_n
\]
and hence $\cup_{k=1}^\infty A_k\in\mathcal{B}_\tau$. This completes the proof that $\mathcal{B}_\tau$ is a $\sigma$-algebra. Since $A=\cup_{n\le\infty}\left[A\cap\{\tau=n\}\right]$, it also follows that $\mathcal{B}_\tau\subset\mathcal{B}_\infty$.

3. Now suppose that $\sigma\le\tau$ and $A\in\mathcal{B}_\sigma$. Since $A\cap\{\sigma\le n\}$ and $\{\tau\le n\}$ are in $\mathcal{B}_n$ for all $n\le\infty$, we find
\[
A\cap\{\tau\le n\}=\left[A\cap\{\sigma\le n\}\right]\cap\{\tau\le n\}\in\mathcal{B}_n\ \text{for all}\ n\le\infty,
\]
which shows $A\in\mathcal{B}_\tau$.
Proposition 18.29 ($\mathcal{B}_\tau$ measurable random variables). Let $(\Omega,\mathcal{B},\{\mathcal{B}_n\}_{n=0}^\infty)$ be a filtered measure space. Let $\tau$ be a stopping time and $Z:\Omega\to\mathbb{R}$ be a function. Then the following are equivalent;

1. $Z$ is $\mathcal{B}_\tau$ measurable,
2. $1_{\{\tau\le n\}}Z$ is $\mathcal{B}_n$ measurable for all $n\le\infty$,
3. $1_{\{\tau=n\}}Z$ is $\mathcal{B}_n$ measurable for all $n\le\infty$,
4. there exist $Y_n:\Omega\to\mathbb{R}$ which are $\mathcal{B}_n$ measurable for all $n\le\infty$ such that
\[
Z=Y_\tau:=\sum_{n\le\infty}1_{\{\tau=n\}}Y_n.
\]
Proof. 1. $\implies$ 2. By definition, if $A\in\mathcal{B}_\tau$, then $1_{\{\tau\le n\}}1_A=1_{\{\tau\le n\}\cap A}$ is $\mathcal{B}_n$ measurable for all $n\le\infty$. Consequently any simple $\mathcal{B}_\tau$ measurable function, $Z$, satisfies $1_{\{\tau\le n\}}Z$ is $\mathcal{B}_n$ measurable for all $n$. So by the usual limiting argument (Theorem 6.39), it follows that $1_{\{\tau\le n\}}Z$ is $\mathcal{B}_n$ measurable for all $n$ for any $\mathcal{B}_\tau$ measurable function, $Z$.

2. $\implies$ 3. This property follows from the identity,
\[
1_{\{\tau=n\}}Z=1_{\{\tau\le n\}}Z-1_{\{\tau<n\}}Z.
\]

3. $\implies$ 4. Simply take $Y_n=1_{\{\tau=n\}}Z$.

4. $\implies$ 1. Since $Z=\sum_{n\le\infty}1_{\{\tau=n\}}Y_n$, it suffices to show $1_{\{\tau=n\}}Y_n$ is $\mathcal{B}_\tau$ measurable if $Y_n$ is $\mathcal{B}_n$ measurable. Further, by the usual limiting arguments using Theorem 6.39, it suffices to assume that $Y_n=1_A$ for some $A\in\mathcal{B}_n$. In this case $1_{\{\tau=n\}}Y_n=1_{A\cap\{\tau=n\}}$. Hence we must show $A\cap\{\tau=n\}\in\mathcal{B}_\tau$, which indeed is true because
\[
A\cap\{\tau=n\}\cap\{\tau=k\}=
\begin{cases}
\emptyset\in\mathcal{B}_k & \text{if } k\ne n,\\
A\cap\{\tau=n\}\in\mathcal{B}_k & \text{if } k=n.
\end{cases}
\]
Alternative proof of 1. $\implies$ 2. If $Z$ is $\mathcal{B}_\tau$ measurable, then $\{Z\in B\}\cap\{\tau\le n\}\in\mathcal{B}_n$ for all $n\le\infty$ and $B\in\mathcal{B}_{\mathbb{R}}$. Hence if $B\in\mathcal{B}_{\mathbb{R}}$ with $0\notin B$, then
\[
\left\{1_{\{\tau\le n\}}Z\in B\right\}=\{Z\in B\}\cap\{\tau\le n\}\in\mathcal{B}_n\ \text{for all}\ n\le\infty
\]
and similarly,
\[
\left\{1_{\{\tau\le n\}}Z=0\right\}^c=\left\{1_{\{\tau\le n\}}Z\ne0\right\}=\{Z\ne0\}\cap\{\tau\le n\}\in\mathcal{B}_n\ \text{for all}\ n.
\]
From these two observations, it follows that $\left\{1_{\{\tau\le n\}}Z\in B\right\}\in\mathcal{B}_n$ for all $B\in\mathcal{B}_{\mathbb{R}}$ and therefore, $1_{\{\tau\le n\}}Z$ is $\mathcal{B}_n$ measurable.
Exercise 18.6. Suppose $\tau$ is a stopping time, $(S,\mathcal{S})$ is a measurable space, and $Z:\Omega\to S$ is a function. Show that $Z$ is $\mathcal{B}_\tau/\mathcal{S}$ measurable iff $Z|_{\{\tau=n\}}$ is $\left(\mathcal{B}_n\right)_{\{\tau=n\}}/\mathcal{S}$ measurable for all $n\in\bar{\mathbb{N}}$.
Lemma 18.30 ($\mathcal{B}_\tau$ conditioning). Suppose $\tau$ is a stopping time and $Z\in L^1(\Omega,\mathcal{B},P)$ or $Z\ge0$, then
\[
E[Z\mid\mathcal{B}_\tau]=\sum_{n\le\infty}1_{\{\tau=n\}}E[Z\mid\mathcal{B}_n]=Y_\tau \tag{18.14}
\]
where
\[
Y_n:=E[Z\mid\mathcal{B}_n]\ \text{for all}\ n\in\bar{\mathbb{N}}. \tag{18.15}
\]
Proof. By Proposition 18.29, $Y_\tau$ is $\mathcal{B}_\tau$ measurable. Moreover if $Z$ is integrable, then
\begin{align}
\sum_{n\le\infty}E\left[1_{\{\tau=n\}}\left|Y_n\right|\right]
&=\sum_{n\le\infty}E\left[1_{\{\tau=n\}}\left|E[Z\mid\mathcal{B}_n]\right|\right]
\le\sum_{n\le\infty}E\left[1_{\{\tau=n\}}E\left[|Z|\mid\mathcal{B}_n\right]\right]\nonumber\\
&=\sum_{n\le\infty}E\left[E\left[1_{\{\tau=n\}}|Z|\mid\mathcal{B}_n\right]\right]
=\sum_{n\le\infty}E\left[1_{\{\tau=n\}}|Z|\right]=E|Z|<\infty \tag{18.16}
\end{align}
and therefore
\[
E|Y_\tau|=E\left|\sum_{n\le\infty}1_{\{\tau=n\}}Y_n\right|\le\sum_{n\le\infty}E\left[1_{\{\tau=n\}}\left|Y_n\right|\right]\le E|Z|<\infty.
\]
Furthermore if $A\in\mathcal{B}_\tau$, then
\begin{align*}
E[Z:A]&=\sum_{n\le\infty}E[Z:A\cap\{\tau=n\}]=\sum_{n\le\infty}E[Y_n:A\cap\{\tau=n\}]\\
&=\sum_{n\le\infty}E\left[1_{\{\tau=n\}}Y_n:A\right]=E\left[\sum_{n\le\infty}1_{\{\tau=n\}}Y_n:A\right]=E[Y_\tau:A],
\end{align*}
wherein the interchange of the sum and the expectation in the second to last equality is justified by the estimate in Eq. (18.16) or by the fact that everything in sight is positive when $Z\ge0$.
Exercise 18.7. Suppose $\sigma$ and $\tau$ are two stopping times. Show;

1. $\{\sigma<\tau\}$, $\{\sigma=\tau\}$, and $\{\sigma\le\tau\}$ are all in $\mathcal{B}_\sigma\cap\mathcal{B}_\tau$,
2. $\mathcal{B}_{\sigma\wedge\tau}=\mathcal{B}_\sigma\cap\mathcal{B}_\tau$,
3. $\mathcal{B}_{\sigma\vee\tau}=\mathcal{B}_\sigma\vee\mathcal{B}_\tau:=\sigma\left(\mathcal{B}_\sigma\cup\mathcal{B}_\tau\right)$, and
4. $\mathcal{B}_{\sigma\wedge\tau}=\mathcal{B}_\sigma$ on $C$ where $C$ is any one of the following three sets; $\{\sigma\le\tau\}$, $\{\sigma<\tau\}$, or $\{\sigma=\tau\}$.
For the sake of completeness (as it will be needed in the next exercise), let me check for you directly that $\{\sigma=\tau\}\in\mathcal{B}_{\sigma\wedge\tau}$. This will be the case iff $\{\sigma=\tau\}\cap\{\sigma\wedge\tau=n\}\in\mathcal{B}_n$ for all $n\in\mathbb{N}_0$. This is however true since,
\[
\{\sigma=\tau\}\cap\{\sigma\wedge\tau=n\}=\{\sigma=\tau=n\}=\{\sigma=n\}\cap\{\tau=n\}\in\mathcal{B}_n.
\]
Similarly one shows that $\{\sigma<\tau\}$, $\{\sigma\le\tau\}$, $\{\tau<\sigma\}$, and $\{\tau\le\sigma\}$ are in $\mathcal{B}_{\sigma\wedge\tau}$.
Theorem 18.31 (Tower Property II). Let $X\in L^1(\Omega,\mathcal{B},P)$ or $X:\Omega\to[0,\infty]$ be a $\mathcal{B}$ measurable function. Then for any two stopping times, $\sigma$ and $\tau$,
\[
E_{\mathcal{B}_\tau}E_{\mathcal{B}_\sigma}X=E_{\mathcal{B}_\sigma}E_{\mathcal{B}_\tau}X=E_{\mathcal{B}_{\sigma\wedge\tau}}X. \tag{18.17}
\]
Proof. As usual it suffices to consider the case where $X\ge0$, for in this case there will be no convergence issues to worry about.

First Proof. Notice that, by Lemma 18.30,
\[
1_{\{\tau\le\sigma\}}E_{\mathcal{B}_\tau}=\sum_{n\le\infty}1_{\{\tau\le\sigma\}}1_{\{\tau=n\}}E_{\mathcal{B}_n}=1_{\{\tau\le\sigma\}}\sum_{n\le\infty}1_{\{\sigma\wedge\tau=n\}}E_{\mathcal{B}_n}=1_{\{\tau\le\sigma\}}E_{\mathcal{B}_{\sigma\wedge\tau}}
\]
and similarly,
\[
1_{\{\sigma<\tau\}}E_{\mathcal{B}_\sigma}=\sum_{n\le\infty}1_{\{\sigma<\tau\}}1_{\{\sigma=n\}}E_{\mathcal{B}_n}=1_{\{\sigma<\tau\}}\sum_{n\le\infty}1_{\{\sigma\wedge\tau=n\}}E_{\mathcal{B}_n}=1_{\{\sigma<\tau\}}E_{\mathcal{B}_{\sigma\wedge\tau}}.
\]
Using these remarks and the fact that $\{\tau\le\sigma\}$ and $\{\sigma<\tau\}$ are both in $\mathcal{B}_{\sigma\wedge\tau}=\mathcal{B}_\sigma\cap\mathcal{B}_\tau$, we find;
\begin{align*}
E_{\mathcal{B}_\sigma}E_{\mathcal{B}_\tau}
&=E_{\mathcal{B}_\sigma}\left(1_{\{\tau\le\sigma\}}+1_{\{\sigma<\tau\}}\right)E_{\mathcal{B}_\tau}
=E_{\mathcal{B}_\sigma}\left[1_{\{\tau\le\sigma\}}E_{\mathcal{B}_{\sigma\wedge\tau}}\right]+1_{\{\sigma<\tau\}}E_{\mathcal{B}_\sigma}E_{\mathcal{B}_\tau}\\
&=1_{\{\tau\le\sigma\}}E_{\mathcal{B}_\sigma}E_{\mathcal{B}_{\sigma\wedge\tau}}+1_{\{\sigma<\tau\}}E_{\mathcal{B}_{\sigma\wedge\tau}}E_{\mathcal{B}_\tau}
=1_{\{\tau\le\sigma\}}E_{\mathcal{B}_{\sigma\wedge\tau}}+1_{\{\sigma<\tau\}}E_{\mathcal{B}_{\sigma\wedge\tau}}
=E_{\mathcal{B}_{\sigma\wedge\tau}}.
\end{align*}
Second Proof. In this proof we are going to make use of the localization Lemma 14.23. Since $\mathcal{B}_{\sigma\wedge\tau}\subset\mathcal{B}_\sigma$, it follows by item 4. of Exercise 18.7 that $\mathcal{B}_{\sigma\wedge\tau}=\mathcal{B}_\sigma$ on $\{\sigma\le\tau\}$ and on $\{\sigma<\tau\}$. We will actually use the first statement in the form, $\mathcal{B}_{\sigma\wedge\tau}=\mathcal{B}_\sigma$ on $\{\sigma\le\tau\}$. From Lemma 18.30, we have
\[
1_{\{\sigma\le\tau\}}E_{\mathcal{B}_\sigma}=1_{\{\sigma\le\tau\}}E_{\mathcal{B}_{\sigma\wedge\tau}}\quad\text{and}\quad
1_{\{\sigma>\tau\}}E_{\mathcal{B}_\tau}=1_{\{\sigma>\tau\}}E_{\mathcal{B}_{\sigma\wedge\tau}}.
\]
Using these relations and the basic properties of conditional expectation we arrive at,
\begin{align*}
E_{\mathcal{B}_\sigma}E_{\mathcal{B}_\tau}X
&=E_{\mathcal{B}_\sigma}E_{\mathcal{B}_\tau}\left[1_{\{\sigma\le\tau\}}X+1_{\{\sigma>\tau\}}X\right]\\
&=E_{\mathcal{B}_\sigma}\left[1_{\{\sigma\le\tau\}}E_{\mathcal{B}_\tau}X\right]+E_{\mathcal{B}_\sigma}\left[1_{\{\sigma>\tau\}}E_{\mathcal{B}_\tau}X\right]\\
&=E_{\mathcal{B}_\sigma}\left[1_{\{\sigma\le\tau\}}E_{\mathcal{B}_\tau}X\right]+1_{\{\sigma>\tau\}}E_{\mathcal{B}_\sigma}E_{\mathcal{B}_{\sigma\wedge\tau}}X\\
&=1_{\{\sigma\le\tau\}}E_{\mathcal{B}_\sigma}\left[E_{\mathcal{B}_\tau}X\right]+1_{\{\sigma>\tau\}}E_{\mathcal{B}_{\sigma\wedge\tau}}X\\
&=1_{\{\sigma\le\tau\}}E_{\mathcal{B}_{\sigma\wedge\tau}}X+1_{\{\sigma>\tau\}}E_{\mathcal{B}_{\sigma\wedge\tau}}X=E_{\mathcal{B}_{\sigma\wedge\tau}}X\ \text{a.s.}
\end{align*}
Exercise 18.8. Show, by example, that it is not necessarily true that
\[
E_{\mathcal{G}_1}E_{\mathcal{G}_2}=E_{\mathcal{G}_1\cap\mathcal{G}_2}
\]
for arbitrary $\mathcal{G}_1$ and $\mathcal{G}_2$ sub-sigma-algebras of $\mathcal{B}$. Hint: it suffices to take $(\Omega,\mathcal{B},P)$ with $\Omega=\{1,2,3\}$, $\mathcal{B}=2^\Omega$, and $P(\{j\})=\frac13$ for $j=1,2,3$.
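A quick numerical experiment in the spirit of the hint makes the failure concrete. The following Python sketch is illustrative only; in particular, the choice of $\mathcal{G}_1$ and $\mathcal{G}_2$ below (generated by the partitions $\{\{1,2\},\{3\}\}$ and $\{\{1\},\{2,3\}\}$) is one natural guess, not necessarily the example the author has in mind.

```python
from fractions import Fraction

P = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

def cond_exp(f, partition):
    """Conditional expectation of f given the sigma-algebra generated by a
    partition of {1, 2, 3}: replace f by its average over each block."""
    g = {}
    for block in partition:
        pb = sum(P[w] for w in block)
        avg = sum(P[w] * f[w] for w in block) / pb
        for w in block:
            g[w] = avg
    return g

G1 = [{1, 2}, {3}]      # hypothetical choice of G_1
G2 = [{1}, {2, 3}]      # hypothetical choice of G_2
triv = [{1, 2, 3}]      # G_1 and G_2 intersect in the trivial sigma-algebra

f = {1: Fraction(1), 2: Fraction(0), 3: Fraction(0)}
lhs = cond_exp(cond_exp(f, G2), G1)  # E_{G1} E_{G2} f
rhs = cond_exp(f, triv)              # E_{G1 ∩ G2} f = E f = 1/3 everywhere

assert lhs != rhs  # the iterated projection is not E_{G1 ∩ G2}
```

Here $E_{\mathcal{G}_1}E_{\mathcal{G}_2}f$ takes the value $\frac12$ at $\omega=1$, while $E_{\mathcal{G}_1\cap\mathcal{G}_2}f\equiv\frac13$.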
Exercise 18.9 (Geometry of commuting projections). Suppose that $H$ is a Hilbert space and $H_i\subset H$ for $i=1,2$ are two closed subspaces. Let $P_i=P_{H_i}$ denote orthogonal projection onto $H_i$ and $P=P_M$ be orthogonal projection onto $M:=H_1\cap H_2$. Show;

1. Suppose there exists $M_0\subset H_1\cap H_2$ such that $M_1\perp M_2$ where $M_i:=\{h\in H_i:h\perp M_0\}$, so that $H_1=M_0\oplus M_1$ and $H_2=M_0\oplus M_2$. Then $M_0=H_1\cap H_2$ and $P_1P_2=P=P_2P_1$.
2. If $P_1P_2=P_2P_1$, then $P_1P_2=P=P_2P_1$. Moreover if we let $M_0=H_1\cap H_2$ and $M_i$ be as above, then $M_1\perp M_2$.
Solution to Exercise (18.9). 1. The assumptions imply that $P_i=P_{M_0}+P_{M_i}$ for $i=1,2$. Moreover since $M_1\perp M_2$ and $M_i\perp M_0$ for $i=1,2$, it follows that $P_{M_0}P_{M_i}=P_{M_i}P_{M_0}=0$ and $P_{M_1}P_{M_2}=0=P_{M_2}P_{M_1}$. Therefore,
\[
P_1P_2=\left(P_{M_0}+P_{M_1}\right)\left(P_{M_0}+P_{M_2}\right)=P_{M_0}^2=P_{M_0}
\]
and similarly $P_2P_1=P_{M_0}$. Finally since $P_1P_2(H)\supset P_1P_2(M)=M$ it follows that $M\subset P_{M_0}(H)=M_0\subset M$ and so $M=M_0$.
2. Let $Q:=P_1P_2=P_2P_1$ and notice that $Qh\in H_1$ and $Qh\in H_2$ for all $h\in H$, i.e. $Q(H)\subset M$, that $Q|_M=I_M$ and so $Q(H)=M$, and that
\[
Q^*=\left(P_1P_2\right)^*=P_2^*P_1^*=P_2P_1=Q.
\]
Therefore it follows that $Q=P_M$ as claimed.

We now let $M_i=\{h\in H_i:h\perp M_0\}$ where $M_0=M=H_1\cap H_2$. Then $M_i=(I-P_M)H_i$ and for $h_i\in H_i$ we have, using $P_M=P_1P_2$,
\begin{align*}
\left((I-P_M)h_1,(I-P_M)h_2\right)&=\left(h_1,(I-P_M)h_2\right)\\
&=(h_1,h_2)-(h_1,P_1P_2h_2)\\
&=(h_1,h_2)-(P_1h_1,P_2h_2)=(h_1,h_2)-(h_1,h_2)=0.
\end{align*}
Hence it follows that $M_1\perp M_2$.
Alternative proof. Let $P_i$ be orthogonal projection onto $H_i$, $P_0$ be orthogonal projection onto $H_0:=H_1\cap H_2$, and $Q_i$ be orthogonal projection onto $M_i$. Then $P_i=P_0+Q_i$ for $i=1,2$. Indeed, we have $P_0Q_i=Q_iP_0=0$ so that
\[
\left(P_0+Q_i\right)^*=P_0^*+Q_i^*=P_0+Q_i\quad\text{and}\quad\left(P_0+Q_i\right)^2=P_0^2+Q_i^2=P_0+Q_i,
\]
which shows that $P_0+Q_i$ is an orthogonal projection. Moreover it is easy to check that $\operatorname{Ran}(P_0+Q_i)=H_i$, so that $P_i=P_0+Q_i$ as claimed. Having said this we have in general that
\[
P_1P_2=\left(P_0+Q_1\right)\left(P_0+Q_2\right)=P_0+Q_1Q_2
\]
and similarly that $P_2P_1=P_0+Q_2Q_1$, from which it follows that $P_1P_2=P_2P_1$ iff $Q_1Q_2=Q_2Q_1$. Lastly if $Q_1Q_2=Q_2Q_1$, then for all $x\in H$ we will have
\[
Q_1Q_2x=Q_2Q_1x\in M_1\cap M_2=\{0\}
\]
which shows that $Q_1Q_2=Q_2Q_1=0$. Thus we have shown that $Q_1Q_2=Q_2Q_1$ iff $Q_1Q_2=0=Q_2Q_1$, which happens iff $M_1\perp M_2$.
Exercise 18.10. Let $\sigma$ and $\tau$ be stopping times and apply the results of Exercise 18.9 with $M_0:=L^2(\Omega,\mathcal{B}_{\sigma\wedge\tau},P)$, $H_1:=L^2(\Omega,\mathcal{B}_\sigma,P)$, and $H_2:=L^2(\Omega,\mathcal{B}_\tau,P)$ to give another proof of Theorem 18.31.
Solution to Exercise (18.10). In order to apply Exercise 18.9, we need to show if $X\in H_1$ and $Y\in H_2$ are both orthogonal to $M_0$ then $X\perp Y$, i.e. $E[XY]=0$. In order to compute this last expectation, let $\tilde X_n:=E[X\mid\mathcal{B}_n]$ and $\tilde Y_n:=E[Y\mid\mathcal{B}_n]$ for all $n\le\infty$ so that (by Lemma 18.30)
\[
X=E[X\mid\mathcal{B}_\sigma]=\sum_{n\le\infty}1_{\{\sigma=n\}}E[X\mid\mathcal{B}_n]=\sum_{n\le\infty}1_{\{\sigma=n\}}\tilde X_n=\tilde X_\sigma
\]
and similarly $Y=\tilde Y_\tau$. We then have,
\begin{align*}
XY&=\tilde X_\sigma\tilde Y_\tau=\tilde X_\sigma\tilde Y_\tau\left(1_{\{\sigma\le\tau\}}+1_{\{\sigma>\tau\}}\right)
=\left(1_{\{\sigma\le\tau\}}\tilde X_\sigma\right)\tilde Y_\tau+\tilde X_\sigma\left(\tilde Y_\tau1_{\{\sigma>\tau\}}\right)\\
&=\left(1_{\{\sigma\le\tau\}}\tilde X_{\sigma\wedge\tau}\right)Y+X\left(\tilde Y_{\sigma\wedge\tau}1_{\{\sigma>\tau\}}\right).
\end{align*}
Since $1_{\{\sigma\le\tau\}}\tilde X_{\sigma\wedge\tau}$ and $\tilde Y_{\sigma\wedge\tau}1_{\{\sigma>\tau\}}$ are both in $M_0=L^2(\Omega,\mathcal{B}_{\sigma\wedge\tau},P)$ (notice that $L^2(P)\ni\tilde X_{\sigma\wedge\tau}=E[X\mid\mathcal{B}_{\sigma\wedge\tau}]$ by Lemma 18.30 again) and both $X$ and $Y$ are orthogonal to $M_0$ by assumption, we may conclude
\[
E[XY]=E\left[\left(1_{\{\sigma\le\tau\}}\tilde X_{\sigma\wedge\tau}\right)Y\right]+E\left[X\left(\tilde Y_{\sigma\wedge\tau}1_{\{\sigma>\tau\}}\right)\right]=0+0=0.
\]
Having checked the hypothesis of item 1. of Exercise 18.9 we may conclude that
\[
E_{\mathcal{B}_\sigma}E_{\mathcal{B}_\tau}=E_{\mathcal{B}_\tau}E_{\mathcal{B}_\sigma}=E_{\mathcal{B}_{\sigma\wedge\tau}}
\]
on $L^2(\Omega,\mathcal{B},P)$. This then extends to $L^1(\Omega,\mathcal{B},P)$ by the standard limiting arguments we used in constructing conditional expectations.
18.4 Stochastic Integrals and Optional Stopping
Notation 18.32 Suppose that $\{c_n\}_{n=1}^\infty$ and $\{x_n\}_{n=0}^\infty$ are two sequences of numbers, let $c\cdot x=\left\{(c\cdot x)_n\right\}_{n\in\mathbb{N}_0}$ denote the sequence of numbers defined by $(c\cdot x)_0=0$ and
\[
(c\cdot x)_n=\sum_{j=1}^nc_j\left(x_j-x_{j-1}\right)=\sum_{j=1}^nc_j\Delta_jx\ \text{for}\ n\ge1,
\]
where $\Delta_jx:=x_j-x_{j-1}$. (For convenience of notation later we will interpret $\sum_{j=1}^0c_j\Delta_jx=0$.)
For a gambling interpretation of $(c\cdot x)_n$, let $x_j$ represent the price of a stock at time $j$. Suppose that you, the investor, buy $c_j$ shares at time $j-1$ and then sell these shares back at time $j$. With this interpretation, $c_j\Delta_jx$ represents your profit (or loss if negative) in the time interval from $j-1$ to $j$, and $(c\cdot x)_n$ represents your profit (or loss) from time $0$ to time $n$. By the way, if you want to buy $5$ shares of the stock at time $n=3$ and then sell them all at time $9$, you would take $c_k=5\cdot1_{3<k\le9}$ so that
\[
(c\cdot x)_9=5\sum_{3<k\le9}\Delta_kx=5\left(x_9-x_3\right)
\]
would represent your profit (loss) for this transaction. The next example formalizes this notion.
Example 18.33. Suppose that $0\le\sigma\le\tau$ where $\sigma,\tau\in\bar{\mathbb{N}}_0$, and let $c_n:=1_{\sigma<n\le\tau}$. Then
\[
(c\cdot x)_n=\sum_{j=1}^n1_{\sigma<j\le\tau}\left(x_j-x_{j-1}\right)=\sum_{j=1}^\infty1_{\sigma<j\le\tau\wedge n}\left(x_j-x_{j-1}\right)
=\sum_{j=1}^\infty1_{\sigma\wedge n<j\le\tau\wedge n}\left(x_j-x_{j-1}\right)=x_{\tau\wedge n}-x_{\sigma\wedge n}.
\]
More generally if $\sigma,\tau\in\bar{\mathbb{N}}_0$ are arbitrary and $c_n:=1_{\sigma<n\le\tau}$, we will have $c_n=1_{\sigma\wedge\tau<n\le\tau}$ and therefore
\[
(c\cdot x)_n=x_{\tau\wedge n}-x_{\sigma\wedge\tau\wedge n}.
\]
Proposition 18.34 (The Discrete Stochastic Integral). Let $X=\{X_n\}_{n=0}^\infty$ be an adapted integrable process, i.e. $E|X_n|<\infty$ for all $n$. If $X$ is a martingale and $\{C_n\}_{n=1}^\infty$ is a predictable sequence of bounded random variables, then $\{(C\cdot X)_n\}_{n=1}^\infty$ is still a martingale. If $X:=\{X_n\}_{n=0}^\infty$ is a submartingale (supermartingale) (necessarily real valued) and $C_n\ge0$, then $\{(C\cdot X)_n\}_{n=1}^\infty$ is a submartingale (supermartingale).

Conversely if $X$ is an adapted process of integrable functions such that $E[(C\cdot X)_n]=0$ for all bounded predictable processes, $\{C_n\}_{n=1}^\infty$, then $X$ is a martingale. Similarly if $X$ is a real valued adapted process such that
\[
E\left[(C\cdot X)_n\right]\ \le0,\ =0,\ \text{or}\ \ge0 \tag{18.18}
\]
for all $n$ and for all bounded, non-negative predictable processes, $C$, then $X$ is a supermartingale, martingale, or submartingale respectively. (In other words, $X$ is a submartingale if no matter what your (non-negative) betting strategy is you will make money on average.)
Proof. For any adapted process $X$, we have
\begin{align}
E\left[(C\cdot X)_{n+1}\mid\mathcal{B}_n\right]&=E\left[(C\cdot X)_n+C_{n+1}\left(X_{n+1}-X_n\right)\mid\mathcal{B}_n\right]\nonumber\\
&=(C\cdot X)_n+C_{n+1}E\left[X_{n+1}-X_n\mid\mathcal{B}_n\right]. \tag{18.19}
\end{align}
The first assertions easily follow from this identity.

Now suppose that $X$ is an adapted process of integrable functions such that $E[(C\cdot X)_n]=0$ for all bounded predictable processes, $\{C_n\}_{n=1}^\infty$. Taking expectations of Eq. (18.19) then allows us to conclude that
\[
E\left[C_{n+1}E\left[X_{n+1}-X_n\mid\mathcal{B}_n\right]\right]=0
\]
for all bounded $\mathcal{B}_n$ measurable random variables, $C_{n+1}$. Taking $C_{n+1}:=\operatorname{sgn}\left(E\left[X_{n+1}-X_n\mid\mathcal{B}_n\right]\right)$ shows $\left|E\left[X_{n+1}-X_n\mid\mathcal{B}_n\right]\right|=0$ a.s. and hence $X$ is a martingale. Similarly, if Eq. (18.18) holds for all $n\ge1$ and all bounded, non-negative predictable processes $C$, then taking $A\in\mathcal{B}_n$ and $C_k=\delta_{k,n+1}1_A$ in Eq. (18.18) allows us to conclude that
\[
E\left[X_{n+1}-X_n:A\right]=E\left[(C\cdot X)_{n+1}\right]\ \le0,\ =0,\ \text{or}\ \ge0,
\]
i.e. $X$ is a supermartingale, martingale, or submartingale respectively.
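The forward direction of this proposition can be confirmed exhaustively on a finite example. In the Python sketch below (illustrative, not part of the text; the particular predictable integrand is an arbitrary choice) $X$ is a simple symmetric random walk and every path is enumerated, so $E[(C\cdot X)_n]=0$ is checked exactly.

```python
import itertools
from fractions import Fraction

n = 5
num_paths = 2 ** n
total = Fraction(0)
for steps in itertools.product([-1, 1], repeat=n):
    X = [0]
    for s in steps:
        X.append(X[-1] + s)
    # A predictable integrand: C_j may depend only on X_0, ..., X_{j-1}.
    C = lambda j: 1 if X[j - 1] >= 0 else 2
    CX = sum(C(j) * (X[j] - X[j - 1]) for j in range(1, n + 1))
    total += Fraction(CX, num_paths)  # each path has probability 2^{-n}

assert total == 0  # E[(C . X)_n] = 0 since X is a martingale
```

The cancellation happens pairwise: for each fixed prefix, the two equally likely continuations carry the same $C_j$ but opposite increments.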
Example 18.35. Suppose that $\{X_n\}_{n=0}^\infty$ is an integrable process whose increments $X_k-X_{k-1}$ have mean zero and are independent of $(X_0,\dots,X_{k-1})$, and $f_k:\mathbb{R}^k\to\mathbb{R}$ are bounded measurable functions for $k\in\mathbb{N}$. Then $\{Y_n\}_{n=0}^\infty$, defined by $Y_0=0$ and
\[
Y_n:=\sum_{k=1}^nf_k\left(X_0,\dots,X_{k-1}\right)\left(X_k-X_{k-1}\right)\ \text{for}\ n\in\mathbb{N}, \tag{18.20}
\]
is a martingale sequence relative to $\left\{\mathcal{B}_n^X\right\}_{n\ge0}$, as follows from Proposition 18.34 applied with the predictable process $C_k:=f_k(X_0,\dots,X_{k-1})$.
Notation 18.36 Given an adapted process, $X$, and a stopping time $\tau$, let $X_n^\tau:=X_{\tau\wedge n}$. We call $X^\tau:=\left\{X_n^\tau\right\}_{n=0}^\infty$ the process $X$ stopped by $\tau$. Observe that
\[
\left|X_n^\tau\right|=\left|X_{\tau\wedge n}\right|=\left|\sum_{0\le k\le n}1_{\{\tau\wedge n=k\}}X_k\right|\le\sum_{0\le k\le n}1_{\{\tau\wedge n=k\}}\left|X_k\right|\le\sum_{0\le k\le n}\left|X_k\right|,
\]
so that $X_n^\tau\in L^1(P)$ for all $n$ provided $X_n\in L^1(P)$ for all $n$.
Example 18.37. Suppose that $X=\{X_n\}_{n=0}^\infty$ is a supermartingale, martingale, or submartingale, with $E|X_n|<\infty$, and let $\sigma$ and $\tau$ be stopping times. Then for any $A\in\mathcal{B}_\sigma$, the process $C_n:=1_A1_{\{\sigma<n\le\tau\}}$ is predictable since for all $n\in\mathbb{N}$ we have
\begin{align*}
A\cap\{\sigma<n\le\tau\}&=\left(A\cap\{\sigma<n\}\right)\cap\{n\le\tau\}\\
&=\left(A\cap\{\sigma\le n-1\}\right)\cap\{\tau\le n-1\}^c\in\mathcal{B}_{n-1}.
\end{align*}
Therefore by Proposition 18.34, $\{(C\cdot X)_n\}_{n=0}^\infty$ is a supermartingale, martingale, or submartingale respectively, where (as in Example 18.33)
\[
(C\cdot X)_n=\sum_{k=1}^n1_A1_{\{\sigma<k\le\tau\}}\Delta_kX=1_A\sum_{k=1}^\infty1_{\{\sigma\wedge\tau<k\le\tau\wedge n\}}\Delta_kX=1_A\left(X_n^\tau-X_n^{\sigma\wedge\tau}\right).
\]
Theorem 18.38 (Optional stopping theorem). Suppose $X=\{X_n\}_{n=0}^\infty$ is a supermartingale, martingale, or submartingale with either $E|X_n|<\infty$ for all $n$ or $X_n\ge0$ for all $n$. Then for every stopping time, $\tau$, $X^\tau$ is a $\{\mathcal{B}_n\}_{n=0}^\infty$ supermartingale, martingale, or submartingale respectively.
Proof. When $E|X_n|<\infty$ for all $n\ge0$ we may take $\sigma=0$ and $A=\Omega$ in Example 18.37 in order to learn that $\left\{X_n^\tau-X_0\right\}_{n=0}^\infty$ is a supermartingale, martingale, or submartingale respectively, and therefore so is $X_n^\tau=X_0+\left(X_n^\tau-X_0\right)$. When $X_n$ is only non-negative we have to give a different proof which does not involve any subtractions (which might be undefined).

For the second proof we simply observe that $1_{\{\tau\le n\}}X_\tau=\sum_{k=0}^n1_{\{\tau=k\}}X_k$ is $\mathcal{B}_n$ measurable, $\{\tau>n\}\in\mathcal{B}_n$, and
\[
X_{\tau\wedge(n+1)}=1_{\{\tau\le n\}}X_\tau+1_{\{\tau>n\}}X_{n+1}.
\]
Therefore
\begin{align*}
E_{\mathcal{B}_n}\left[X_{\tau\wedge(n+1)}\right]
&=E_{\mathcal{B}_n}\left[1_{\{\tau\le n\}}X_\tau+1_{\{\tau>n\}}X_{n+1}\right]
=1_{\{\tau\le n\}}X_\tau+1_{\{\tau>n\}}E_{\mathcal{B}_n}X_{n+1}\\
&\ \le,\ =,\ \text{or}\ \ge\ 1_{\{\tau\le n\}}X_\tau+1_{\{\tau>n\}}X_n=X_{\tau\wedge n},
\end{align*}
where the first, second, or third (in)equality holds depending on whether $X$ is a supermartingale, martingale, or submartingale respectively. (This second proof works for both cases at once. For another proof see Remark 18.40.)
Theorem 18.39 (Optional sampling theorem I). Suppose that $\sigma$ and $\tau$ are two stopping times and $\tau$ is bounded, i.e. there exists $N\in\mathbb{N}$ such that $\tau\le N<\infty$ a.s. If $X=\{X_n\}_{n=0}^\infty$ is a supermartingale, martingale, or submartingale, with either $E|X_n|<\infty$ or $X_n\ge0$ for all $0\le n\le N$, then
\[
E\left[X_\tau\mid\mathcal{B}_\sigma\right]\ \le,\ =,\ \text{or}\ \ge\ X_{\sigma\wedge\tau}\quad\text{a.s.} \tag{18.21}
\]
respectively.²

² This is the natural generalization of Eq. (18.3) to the stopping time setting.
Proof. First suppose that $E|X_n|<\infty$ for $0\le n\le N$ and let $A\in\mathcal{B}_\sigma$. From Example 18.37 we know that $1_A\left(X_n^\tau-X_n^{\sigma\wedge\tau}\right)$ is a supermartingale, martingale, or submartingale respectively and in particular for all $n\in\mathbb{N}_0$ we have
\[
E\left[1_A\left(X_n^\tau-X_n^{\sigma\wedge\tau}\right)\right]\ \le0,\ =0,\ \text{or}\ \ge0\ \text{respectively}.
\]
Taking $n=N$ in this equation and using $\tau\le N$ then implies, for all $A\in\mathcal{B}_\sigma$, that
\[
E\left[\left(X_\tau-X_{\sigma\wedge\tau}\right):A\right]\ \le0,\ =0,\ \text{or}\ \ge0\ \text{respectively},
\]
and this is equivalent to Eq. (18.21).

When we only assume that $X_n\ge0$ for all $n$ we again have to give a different proof which avoids subtractions which may be undefined. One way to do this is to use Theorem 18.38 in order to conclude that $X^\tau$ is a supermartingale, martingale, or submartingale respectively and in particular that
\[
E\left[X_\tau\mid\mathcal{B}_n\right]=E\left[X_N^\tau\mid\mathcal{B}_n\right]\ \le,\ =,\ \text{or}\ \ge\ X_{\tau\wedge n\wedge N}\ \text{for all}\ n.
\]
Combining this result with Lemma 18.30 then implies
\[
E\left[X_\tau\mid\mathcal{B}_\sigma\right]=\sum_{n\le\infty}1_{\{\sigma=n\}}E\left[X_\tau\mid\mathcal{B}_n\right]\ \le,\ =,\ \text{or}\ \ge\ \sum_{n\le\infty}1_{\{\sigma=n\}}X_{\tau\wedge n\wedge N}=X_{\tau\wedge\sigma\wedge N}=X_{\sigma\wedge\tau}. \tag{18.22}
\]
(This second proof again covers both cases at once!)
Exercise 18.11. Give another proof of Theorem 18.39 when $E|X_n|<\infty$ by using the tower property in Theorem 18.31 along with the Doob decomposition of Lemma 18.16.
Solution to Exercise (18.11). First suppose $X$ is a martingale, in which case $X_n=E_{\mathcal{B}_n}X_N$ for all $n\le N$, and hence by Lemma 18.30,
\[
X_\tau=\sum_{n\le N}1_{\{\tau=n\}}X_n=\sum_{n\le N}1_{\{\tau=n\}}E_{\mathcal{B}_n}X_N=\sum_{n\le\infty}1_{\{\tau=n\}}E_{\mathcal{B}_n}X_N=E_{\mathcal{B}_\tau}X_N.
\]
Therefore, by Theorem 18.31,
\[
E_{\mathcal{B}_\sigma}X_\tau=E_{\mathcal{B}_\sigma}E_{\mathcal{B}_\tau}X_N=E_{\mathcal{B}_{\sigma\wedge\tau}}X_N=X_{\sigma\wedge\tau}.
\]
Now suppose that $X$ is a submartingale. By the Doob decomposition of Lemma 18.16, $X_n=M_n+A_n$ where $M$ is a martingale and $A$ is an increasing predictable process. In this case we have
\[
E_{\mathcal{B}_\sigma}X_\tau=E_{\mathcal{B}_\sigma}M_\tau+E_{\mathcal{B}_\sigma}A_\tau
=M_{\sigma\wedge\tau}+E_{\mathcal{B}_\sigma}A_\tau
\ge M_{\sigma\wedge\tau}+E_{\mathcal{B}_\sigma}A_{\sigma\wedge\tau}
=M_{\sigma\wedge\tau}+A_{\sigma\wedge\tau}=X_{\sigma\wedge\tau}.
\]
The supermartingale case follows from the submartingale result just proved applied to $-X$.
Exercise 18.12. Give yet another (full) proof of Theorem 18.39 using the following outline;

1. Show by induction on $n$ starting with $n=N$ that
\[
E\left[X_\tau\mid\mathcal{B}_n\right]\ \le,\ =,\ \text{or}\ \ge\ X_{\tau\wedge n}\ \text{a.s. for all}\ 0\le n\le N. \tag{18.23}
\]
2. Observe the above inequality holds as an equality for $n>N$ as well.
3. Combine this result with Lemma 18.30 to complete the proof.

This argument makes it clear why we must at least initially assume that $\tau\le N$ for some $N\in\mathbb{N}$. To relax this restriction will require a limiting argument which will be the topic of Section 18.8 below.
Solution to Exercise (18.12). To keep the notation manageable I will give the proof in the case that $X_n$ is a submartingale. Since $\tau\le N$ everywhere, $X_\tau$ is $\mathcal{B}_\tau\subset\mathcal{B}_N$ measurable, and it follows that Eq. (18.23) holds for all $n\ge N$. Now consider $n=N-1$. Using
\[
X_\tau=1_{\{\tau=N\}}X_N+1_{\{\tau\le N-1\}}X_\tau,
\]
where $\{\tau=N\}=\{\tau\le N-1\}^c\in\mathcal{B}_{N-1}$ and $1_{\{\tau\le N-1\}}X_\tau$ is $\mathcal{B}_{N-1}$ measurable, we learn,
\begin{align*}
E\left[X_\tau\mid\mathcal{B}_{N-1}\right]&=1_{\{\tau=N\}}E\left[X_N\mid\mathcal{B}_{N-1}\right]+1_{\{\tau\le N-1\}}X_\tau\\
&\ge1_{\{\tau=N\}}X_{N-1}+1_{\{\tau\le N-1\}}X_\tau=X_{\tau\wedge(N-1)}.
\end{align*}
Applying this same argument with $\tau$ replaced by $\tau\wedge(N-k)\le N-k$ shows
\[
E\left[X_{\tau\wedge(N-k)}\mid\mathcal{B}_{N-k-1}\right]\ge X_{\tau\wedge(N-k)\wedge(N-k-1)}=X_{\tau\wedge(N-k-1)}
\]
for any $0\le k\le N-1$. Combining these observations with the tower property of conditional expectations allows us to conclude that
\[
E\left[X_\tau\mid\mathcal{B}_{N-k}\right]\ge X_{\tau\wedge(N-k)}.
\]
For example,
\begin{align*}
E\left[X_\tau\mid\mathcal{B}_{N-3}\right]&=E\left[E\left[X_\tau\mid\mathcal{B}_{N-1}\right]\mid\mathcal{B}_{N-3}\right]
\ge E\left[X_{\tau\wedge(N-1)}\mid\mathcal{B}_{N-3}\right]\\
&=E\left[E\left[X_{\tau\wedge(N-1)}\mid\mathcal{B}_{N-2}\right]\mid\mathcal{B}_{N-3}\right]
\ge E\left[X_{\tau\wedge(N-2)}\mid\mathcal{B}_{N-3}\right]\ge X_{\tau\wedge(N-3)}.
\end{align*}
Thus we have now verified Eq. (18.23) for all $n\in\mathbb{N}_0$.

We now combine this result with Lemma 18.30 to learn,
\[
E\left[X_\tau\mid\mathcal{B}_\sigma\right]=\sum_{n\le\infty}1_{\{\sigma=n\}}E\left[X_\tau\mid\mathcal{B}_n\right]\ge\sum_{n\le\infty}1_{\{\sigma=n\}}X_{\tau\wedge n}=X_{\tau\wedge\sigma}.
\]
Remark 18.40. Theorem 18.39 can be used to give a simple proof of the optional stopping Theorem 18.38. For example, if $X=\{X_n\}_{n=0}^\infty$ is a submartingale and $\tau$ is a stopping time, then
\[
E_{\mathcal{B}_n}X_{\tau\wedge(n+1)}\ge X_{[\tau\wedge(n+1)]\wedge n}=X_{\tau\wedge n},
\]
i.e. $X^\tau$ is a submartingale.
18.5 Submartingale Maximal Inequalities
Notation 18.41 (Running Maximum) If $X=\{X_n\}_{n=0}^\infty$ is a sequence of (extended) real numbers, we let
\[
X_N^*:=\max\left\{X_0,\dots,X_N\right\}. \tag{18.24}
\]
Proposition 18.42 (Maximal Inequalities of Bernstein and Levy). Let $\{X_n\}$ be a submartingale on a filtered probability space, $(\Omega,\mathcal{B},\{\mathcal{B}_n\}_{n=0}^\infty,P)$. Then³ for any $a\ge0$ and $N\in\mathbb{N}$,
\[
aP\left(X_N^*\ge a\right)\le E\left[X_N:X_N^*\ge a\right]\le E\left[X_N^+\right], \tag{18.25}
\]
\begin{align}
aP\left(\min_{n\le N}X_n\le-a\right)&\le E\left[X_N:\min_{k\le N}X_k>-a\right]-E[X_0] \tag{18.26}\\
&\le E\left[X_N^+\right]-E[X_0], \tag{18.27}
\end{align}
and
\[
aP\left(\max_{n\le N}|X_n|\ge a\right)\le2E\left[X_N^+\right]-E[X_0]. \tag{18.28}
\]

³ The first inequality is the most important.
Proof. Let $\tau:=\inf\{n:X_n\ge a\}$ and observe that
\[
X_\tau\ge a\ \text{on}\ \{\tau\le N\}=\left\{X_N^*\ge a\right\}. \tag{18.29}
\]
Since $\{\tau\le N\}\in\mathcal{B}_{\tau\wedge N}$, it follows by the optional sampling Theorem 18.39 that
\[
E\left[X_\tau:\tau\le N\right]=E\left[X_{\tau\wedge N}:\tau\le N\right]\le E\left[X_N:\tau\le N\right]
\]
which combined with Eq. (18.29) implies,
\[
aP\left(X_N^*\ge a\right)=aP\left(\tau\le N\right)\le E\left[X_\tau:\tau\le N\right]\le E\left[X_N:X_N^*\ge a\right],
\]
i.e. Eq. (18.25) holds.

More generally if $X$ is any integrable process and $\tau$ is the random time defined by $\tau:=\inf\{n:X_n\ge a\}$, we still have Eq. (18.29) and
\begin{align}
aP\left(X_N^*\ge a\right)&=E\left[a:\tau\le N\right]\le E\left[X_\tau:\tau\le N\right] \tag{18.30}\\
&=E\left[X_N:\tau\le N\right]-E\left[X_N-X_\tau:\tau\le N\right]\nonumber\\
&=E\left[X_N:\tau\le N\right]-E\left[X_N-X_{\tau\wedge N}\right]. \tag{18.31}
\end{align}
Let me emphasize again that in deriving Eq. (18.31), we have not used any special properties (not even adaptedness) of $X$. If $X$ is now assumed to be a submartingale, by the optional sampling Theorem 18.39, $E_{\mathcal{B}_{\tau\wedge N}}X_N\ge X_{\tau\wedge N}$ and in particular $E\left[X_N-X_{\tau\wedge N}\right]\ge0$. Combining this observation with Eq. (18.31) and Eq. (18.29) again gives Eq. (18.25).
Secondly we may apply Eq. (18.31) with $X_n$ replaced by $-X_n$ to find
\begin{align}
aP\left(\min_{n\le N}X_n\le-a\right)&=aP\left(\max_{n\le N}(-X_n)\ge a\right)\nonumber\\
&\le-E\left[X_N:\tau\le N\right]+E\left[X_N-X_{\tau\wedge N}\right] \tag{18.32}
\end{align}
where now,
\[
\tau:=\inf\left\{n:-X_n\ge a\right\}=\inf\left\{n:X_n\le-a\right\}.
\]
By the optional sampling Theorem 18.39, $E\left[X_{\tau\wedge N}-X_0\right]\ge0$ and adding this to the right side of Eq. (18.32) gives the estimate
\begin{align*}
aP\left(\min_{n\le N}X_n\le-a\right)&\le-E\left[X_N:\tau\le N\right]+E\left[X_N-X_{\tau\wedge N}\right]+E\left[X_{\tau\wedge N}-X_0\right]\\
&=E\left[X_N-X_0\right]-E\left[X_N:\tau\le N\right]\\
&=E\left[X_N:\tau>N\right]-E[X_0]\\
&=E\left[X_N:\min_{k\le N}X_k>-a\right]-E[X_0]
\end{align*}
which proves Eq. (18.26) and hence Eq. (18.27). Adding Eqs. (18.25) and (18.27) gives the estimate in Eq. (18.28).
Remark 18.43. It is of course possible to give a direct proof of Proposition 18.42. For example,
\begin{align*}
E\left[X_N:\max_{n\le N}X_n\ge a\right]&=\sum_{k=1}^NE\left[X_N:X_1<a,\dots,X_{k-1}<a,X_k\ge a\right]\\
&\ge\sum_{k=1}^NE\left[X_k:X_1<a,\dots,X_{k-1}<a,X_k\ge a\right]\\
&\ge\sum_{k=1}^NE\left[a:X_1<a,\dots,X_{k-1}<a,X_k\ge a\right]=aP\left(\max_{n\le N}X_n\ge a\right)
\end{align*}
which proves Eq. (18.25).
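As a sanity check, Eq. (18.25) can be verified by brute force on a small example. The following Python sketch (an illustration, not from the text; the submartingale $X_n=|S_n|$ and the values of $N$ and $a$ are arbitrary choices) enumerates all equally likely paths of a simple symmetric random walk $S$.

```python
import itertools

N, a = 5, 2
paths = list(itertools.product([-1, 1], repeat=N))
p = 1 / len(paths)  # each sign sequence is equally likely

lhs = rhs = 0.0
for steps in paths:
    S, X = 0, [0]             # X_n = |S_n| is a submartingale
    for s in steps:
        S += s
        X.append(abs(S))
    if max(X) >= a:           # the event {X_N^* >= a}
        lhs += a * p          # builds up a * P(X_N^* >= a)
        rhs += X[-1] * p      # builds up E[X_N : X_N^* >= a]

assert 0 < lhs <= rhs + 1e-12  # Eq. (18.25)
```

Since the event $\{X_N^*\ge a\}$ is decided path by path, the two sides are computed exactly up to floating point rounding.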
Corollary 18.44. Suppose that $\{Y_n\}_{n=1}^\infty$ is a non-negative supermartingale, $a>0$ and $N\in\mathbb{N}$, then
\[
aP\left(\max_{n\le N}Y_n\ge a\right)\le E\left[Y_0\wedge a\right]-E\left[Y_N\wedge a:\max_{n\le N}Y_n<a\right]\le E\left[Y_0\wedge a\right]. \tag{18.33}
\]
Proof. Let $X_n:=-Y_n$ in Eq. (18.26) to learn
\[
aP\left(\min_{n\le N}\left(-Y_n\right)\le-a\right)\le E\left[-Y_N:\min_{n\le N}\left(-Y_n\right)>-a\right]+E\left[Y_0\right]
\]
or equivalently that
\[
aP\left(\max_{n\le N}Y_n\ge a\right)\le E\left[Y_0\right]-E\left[Y_N:\max_{n\le N}Y_n<a\right]\le E\left[Y_0\right]. \tag{18.34}
\]
Since $\varphi_a(x):=a\wedge x$ is concave and nondecreasing, it follows by Jensen's inequality that
\[
E\left[\varphi_a\left(Y_n\right)\mid\mathcal{B}_m\right]\le\varphi_a\left(E\left[Y_n\mid\mathcal{B}_m\right]\right)\le\varphi_a\left(Y_m\right)\ \text{for all}\ m\le n.
\]
In this way we see that $\varphi_a(Y_n)=Y_n\wedge a$ is a supermartingale as well. Applying Eq. (18.34) with $Y_n$ replaced by $Y_n\wedge a$ proves Eq. (18.33).
Lemma 18.45. Suppose that $X$ and $Y$ are two non-negative random variables such that $P(Y\ge y)\le\frac1yE\left[X:Y\ge y\right]$ for all $y>0$. Then for all $p\in(1,\infty)$,
\[
EY^p\le\left(\frac{p}{p-1}\right)^pEX^p. \tag{18.35}
\]
Proof. We will begin by proving Eq. (18.35) under the additional assumption that $Y\in L^p(\Omega,\mathcal{B},P)$. We have
\begin{align*}
EY^p&=p\,E\int_0^\infty1_{y\le Y}\,y^{p-1}\,dy=p\int_0^\infty E\left[1_{y\le Y}\right]y^{p-1}\,dy\\
&=p\int_0^\infty P\left(Y\ge y\right)y^{p-1}\,dy\le p\int_0^\infty\frac1yE\left[X:Y\ge y\right]y^{p-1}\,dy\\
&=p\,E\int_0^\infty X1_{y\le Y}\,y^{p-2}\,dy=\frac{p}{p-1}E\left[XY^{p-1}\right].
\end{align*}
Now apply Hölder's inequality, with $q=p\left(p-1\right)^{-1}$, to find
\[
E\left[XY^{p-1}\right]\le\|X\|_p\cdot\left\|Y^{p-1}\right\|_q=\|X\|_p\cdot\left[E|Y|^p\right]^{1/q}.
\]
Combining these two inequalities and solving for $\|Y\|_p$ shows $\|Y\|_p\le\frac{p}{p-1}\|X\|_p$, which proves Eq. (18.35) under the additional restriction of $Y$ being in $L^p(\Omega,\mathcal{B},P)$.

To remove the integrability restriction on $Y$, for $M>0$ let $Z:=Y\wedge M$ and observe that
\[
P\left(Z\ge y\right)=P\left(Y\ge y\right)\le\frac1yE\left[X:Y\ge y\right]=\frac1yE\left[X:Z\ge y\right]\ \text{if}\ y\le M
\]
while
\[
P\left(Z\ge y\right)=0\le\frac1yE\left[X:Z\ge y\right]\ \text{if}\ y>M.
\]
Since $Z$ is bounded, the special case just proved shows
\[
E\left[\left(Y\wedge M\right)^p\right]=EZ^p\le\left(\frac{p}{p-1}\right)^pEX^p.
\]
We may now use the MCT to pass to the limit, $M\uparrow\infty$, and hence conclude that Eq. (18.35) holds in general.
Corollary 18.46 (Doob's Inequality). If $X=\{X_n\}_{n=0}^\infty$ is a non-negative submartingale and $1<p<\infty$, then
\[
E\left[\left(X_N^*\right)^p\right]\le\left(\frac{p}{p-1}\right)^pEX_N^p. \tag{18.36}
\]
Proof. Equation 18.36 follows by applying Lemma 18.45 with the aid of
Proposition 18.42.
Corollary 18.47 (Doob's Inequality). If $\{M_n\}_{n=0}^\infty$ is a martingale and $1<p<\infty$, then for all $a>0$,
\[
P\left(|M|_N^*\ge a\right)\le\frac1aE\left[\left|M_N\right|:|M|_N^*\ge a\right]\le\frac1aE\left[\left|M_N\right|\right] \tag{18.37}
\]
and
\[
E\left[\left(|M|_N^*\right)^p\right]\le\left(\frac{p}{p-1}\right)^pE\left|M_N\right|^p. \tag{18.38}
\]
Proof. By the conditional Jensen's inequality, it follows that $X_n:=|M_n|$ is a submartingale. Hence Eq. (18.37) follows from Eq. (18.25) and Eq. (18.38) follows from Eq. (18.36).
Example 18.48. Let $\{X_n\}$ be a sequence of independent integrable random variables with mean zero, $S_0=0$, $S_n:=X_1+\dots+X_n$ for $n\in\mathbb{N}$, and $|S|_n^*=\max_{j\le n}\left|S_j\right|$. Since $\{S_n\}_{n=0}^\infty$ is a martingale, by cJensen's inequality, $\left\{\left|S_n\right|^p\right\}_{n=1}^\infty$ is a (possibly extended) submartingale for any $p\in[1,\infty)$. Therefore an application of Eq. (18.25) of Proposition 18.42 shows
\[
P\left(|S|_N^*\ge\alpha\right)=P\left(\left(|S|_N^*\right)^p\ge\alpha^p\right)\le\alpha^{-p}E\left[\left|S_N\right|^p:|S|_N^*\ge\alpha\right].
\]
(When $p=2$, this is Kolmogorov's inequality in Theorem 20.42 below.) From Corollary 18.47 we also know that
\[
E\left[\left(|S|_N^*\right)^p\right]\le\left(\frac{p}{p-1}\right)^pE\left|S_N\right|^p.
\]
In particular when $p=2$, this inequality becomes,
\[
E\left[\left(|S|_N^*\right)^2\right]\le4\,E\left|S_N\right|^2=4\sum_{n=1}^NE\left|X_n\right|^2.
\]
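The $p=2$ bound is easy to test numerically. The following Python sketch (illustrative, not from the text; the $\pm1$ step distribution is an arbitrary choice of mean zero increments) enumerates all walk paths and computes both sides of Doob's $L^2$ inequality exactly.

```python
import itertools
from fractions import Fraction

N = 6
paths = list(itertools.product([-1, 1], repeat=N))
p = Fraction(1, len(paths))  # each path is equally likely

lhs = rhs = Fraction(0)
for steps in paths:
    S, max_abs = 0, 0
    for x in steps:
        S += x
        max_abs = max(max_abs, abs(S))
    lhs += p * max_abs ** 2   # accumulates E[(|S|_N^*)^2]
    rhs += p * S ** 2         # accumulates E[|S_N|^2]

assert rhs == N               # E[S_N^2] = sum of the N variances = N here
assert lhs <= 4 * rhs         # Doob: E[(|S|_N^*)^2] <= 4 E[S_N^2]
```

Exact rational arithmetic makes the variance identity $E|S_N|^2=\sum_{n=1}^NE|X_n|^2$ visible with no rounding.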
18.6 Submartingale Upcrossing Inequality and Convergence Theorems
The main results of this section are consequences of the following example and lemma, which say that the optimal strategy for betting on a sub-martingale is to go "all in." Any other strategy, including "buy low and sell high," will not fare better (on average) than going all in.
Example 18.49. Suppose that $\{X_n\}_{n=0}^\infty$ represents the value of a stock which is known to be a sub-martingale. At time $n-1$ you are allowed to buy $C_n\in[0,1]$
shares of the stock which you will then sell at time $n$. Your net gain (loss) in this transaction is $C_nX_n-C_nX_{n-1}=C_n\Delta_nX$ and your wealth at time $n$ will be
\[
W_n=W_0+\sum_{k=1}^nC_k\Delta_kX.
\]
The next lemma asserts that the way to maximize your expected gain is to choose $C_k=1$ for all $k$, i.e. buy the maximum amount of stock you can at each stage. We will refer to this as the "all in" strategy.
Lemma 18.50 (All In). If $\{X_n\}_{n=0}^\infty$ is a sub-martingale and $\{C_k\}_{k=1}^\infty$ is a previsible process with values in $[0,1]$, then
\[
E\left[\sum_{k=1}^nC_k\Delta_kX\right]\le E\left[X_n-X_0\right]
\]
with equality when $C_k=1$ for all $k$, i.e. the optimal strategy is to go all in.
Proof. Notice that $\{1-C_k\}_{k=1}^\infty$ is a previsible non-negative process and therefore by Proposition 18.34,
\[
E\left[\sum_{k=1}^n\left(1-C_k\right)\Delta_kX\right]\ge0.
\]
Since
\[
X_n-X_0=\sum_{k=1}^n\Delta_kX=\sum_{k=1}^nC_k\Delta_kX+\sum_{k=1}^n\left(1-C_k\right)\Delta_kX,
\]
it follows that
\[
E\left[X_n-X_0\right]=E\left[\sum_{k=1}^nC_k\Delta_kX\right]+E\left[\sum_{k=1}^n\left(1-C_k\right)\Delta_kX\right]\ge E\left[\sum_{k=1}^nC_k\Delta_kX\right].
\]
We are now going to apply Lemma 18.50 to the time honored gambling strategy of "buying low and selling high" in order to prove the important upcrossing inequality of Doob, see Theorem 18.51. To be more precise, suppose that $\{X_n\}_{n=0}^\infty$ is a sub-martingale representing a stock price and $-\infty<a<b<\infty$ are given numbers. The (sub-optimal) strategy we wish to employ is to buy the stock when it first drops below $a$, then sell the first time it rises above $b$, and then repeat this strategy over and over again.

Given a function, $\mathbb{N}_0\ni n\mapsto X_n\in\mathbb{R}$, and $-\infty<a<b<\infty$, let
\begin{align}
\tau_0&=0,\qquad\tau_1=\inf\left\{n\ge\tau_0:X_n\le a\right\},\nonumber\\
\tau_2&=\inf\left\{n\ge\tau_1:X_n\ge b\right\},\qquad\tau_3:=\inf\left\{n\ge\tau_2:X_n\le a\right\},\nonumber\\
&\ \ \vdots\nonumber\\
\tau_{2k}&=\inf\left\{n\ge\tau_{2k-1}:X_n\ge b\right\},\qquad\tau_{2k+1}:=\inf\left\{n\ge\tau_{2k}:X_n\le a\right\},\tag{18.39}\\
&\ \ \vdots\nonumber
\end{align}
with the usual convention that $\inf\emptyset=\infty$ in the definitions above, see Figures 18.2 and 18.3.
Fig. 18.2. A sample path of the positive part of a random walk with level crossings of $a=1$ and $b=2$ being marked off.
In terms of these stopping times our betting strategy may be described as,
\[
C_n=\sum_{k=1}^\infty1_{\{\tau_{2k-1}<n\le\tau_{2k}\}}\ \text{for}\ n\in\mathbb{N}, \tag{18.40}
\]
see Figure 18.3 for a more intuitive description of $\{C_n\}_{n=1}^\infty$.

Fig. 18.3. In this figure we are taking $a=0.85$ and $b=1.20$. There are two upcrossings and we imagine buying below $0.85$ and selling above $1.20$. The graph of $C_n$ is given in blue in the above figure.
Observe that $\tau_{n+1}\ge\tau_n+1$ for all $n\ge1$ and hence $\tau_n\ge n-1$ for all $n\ge1$. Further, for each $N\in\mathbb{N}$ let
\[
U_N^X\left(a,b\right)=\max\left\{k:\tau_{2k}\le N\right\} \tag{18.41}
\]
be the number of upcrossings of $X$ across $[a,b]$ in the time interval, $[0,N]$.

In Figure 18.3 you will notice that there are two upcrossings and at the end we are holding a stock for a loss of no more than $\left(a-X_N\right)^+$. In this example $X_0=0.90$ and we do not purchase a stock until time $1$, i.e. $C_n=1$ for the first time at $n=2$. On the other hand if $X_0<a$, then on the first upcrossing we would be guaranteed to make at least
\[
b-X_0=b-a+a-X_0=b-a+\left(a-X_0\right)^+.
\]
With these observations in mind, if there is at least one upcrossing, then
\begin{align}
W_N:=\sum_{k=1}^NC_k\Delta_kX&\ge\left(b-a\right)U_N^X\left(a,b\right)+\left(a-X_0\right)^+-\left(a-X_N\right)^+ \tag{18.42}\\
&=\left(b-a\right)U_N^X\left(a,b\right)+\left(X_0-a\right)^--\left(X_N-a\right)^-. \tag{18.43}
\end{align}
In words, the inequality in Eq. (18.43) states that our net gain in buying at or below $a$ and selling at or above $b$ is at least equal to $(b-a)$ times the number of times we buy low and sell high, plus a possible bonus for buying below $a$ at time $0$ and a penalty for holding the stock below $a$ at the end of the day.

The key inequality in Eq. (18.43) may also be verified when no upcrossings occur. Here are the three cases to consider.
Here are the three case to consider.
1. If X
n
> a for all 0 n N, then C
n
= 0 for all n so W
N
= 0 while
(X
0
a)

(X
N
a)

= 0 0 = 0 as well.
2. If X
0
a and X
n
< b for all 0 n N, then C
n
= 1 for all n so that
W
N
= X
N
X
0
= (X
N
a) (X
0
a)
= (X
N
a) + (X
0
a)

(X
N
a)

+ (X
0
a)

.
3. If X
0
> a, but
1
N and X
n
< b for all 0 n N, then
W
N
= X
N
X
1
X
N
a (X
N
a)

= (X
N
a)

+ (X
0
a)

.
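These case-by-case checks can be automated. The Python sketch below (an illustration, not from the text; the walk and the levels $a,b$ are arbitrary choices) implements the stopping times of Eq. (18.39) and the strategy of Eq. (18.40) by tracking whether a share is currently held, and verifies the key inequality of Eq. (18.43) on every $\pm1$ random walk path.

```python
import itertools

def upcross_data(X, a, b):
    """Return (W_N, U_N) for the buy-at-or-below-a / sell-at-or-above-b
    strategy C_n = sum_k 1_{tau_{2k-1} < n <= tau_{2k}} of Eq. (18.40)."""
    W, U = 0, 0
    holding = X[0] <= a            # the tau_1 = 0 case
    for n in range(1, len(X)):
        if holding:
            W += X[n] - X[n - 1]   # C_n = 1: collect the increment Delta_n X
            if X[n] >= b:          # n = tau_{2k}: sell, one more upcrossing
                holding = False
                U += 1
        elif X[n] <= a:            # n = tau_{2k+1}: buy, effective from n+1
            holding = True
    return W, U

def neg(t):
    return max(-t, 0)              # t^- = max(-t, 0)

a, b = -1, 1
for steps in itertools.product([-1, 1], repeat=8):
    X = [0]
    for s in steps:
        X.append(X[-1] + s)
    W, U = upcross_data(X, a, b)
    assert W >= (b - a) * U + neg(X[0] - a) - neg(X[-1] - a)  # Eq. (18.43)
```

The inequality holds on every one of the $2^8$ paths, upcrossings or not, exactly as the three cases above predict.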
Theorem 18.51 (Doob's Upcrossing Inequality). If $\{X_n\}_{n=0}^\infty$ is a submartingale and $-\infty<a<b<\infty$, then for all $N\in\mathbb{N}$,
\[
E\left[U_N^X\left(a,b\right)\right]\le\frac1{b-a}\left[E\left(X_N-a\right)^+-E\left(X_0-a\right)^+\right].
\]
Proof. First Proof. Let $\{C_k\}_{k=1}^\infty$ be the buy low sell high strategy defined in Eq. (18.40). Taking expectations of the inequality in Eq. (18.43), making use of Lemma 18.50, implies,
\begin{align*}
E\left[\left(X_N-a\right)-\left(X_0-a\right)\right]&=E\left[X_N-X_0\right]\ge E\left[(C\cdot X)_N\right]\\
&\ge\left(b-a\right)E\,U_N^X\left(a,b\right)+E\left(X_0-a\right)^--E\left(X_N-a\right)^-.
\end{align*}
The result follows from this inequality and the fact that $\left(X_n-a\right)=\left(X_n-a\right)^+-\left(X_n-a\right)^-$.
Remark 18.52 (*Second Proof). Here is a variant on the above proof which may safely be skipped. We first suppose that $X_n\ge0$, $a=0$ and $b>0$. Let
\begin{align*}
\tau_0&=0,\qquad\tau_1=\inf\left\{n\ge\tau_0:X_n=0\right\},\\
\tau_2&=\inf\left\{n\ge\tau_1:X_n\ge b\right\},\qquad\tau_3:=\inf\left\{n\ge\tau_2:X_n=0\right\},\\
&\ \ \vdots\\
\tau_{2k}&=\inf\left\{n\ge\tau_{2k-1}:X_n\ge b\right\},\qquad\tau_{2k+1}:=\inf\left\{n\ge\tau_{2k}:X_n=0\right\},\\
&\ \ \vdots
\end{align*}
a sequence of stopping times. Suppose that $N$ is given and we choose $k$ such that $2k > N$. Then we know that $\tau_{2k} \ge N$. Thus if we let $\tau_n^t := \tau_n \wedge N$, we know that $\tau_n^t = N$ for all $n \ge 2k$. Therefore,
$$X_N - X_0 = \sum_{n=1}^{2k}\left(X_{\tau_n^t} - X_{\tau_{n-1}^t}\right) = \sum_{n=1}^{k}\left(X_{\tau_{2n}^t} - X_{\tau_{2n-1}^t}\right) + \sum_{n=1}^{k}\left(X_{\tau_{2n-1}^t} - X_{\tau_{2n-2}^t}\right) \ge b\, U_N^X(0,b) + \sum_{n=1}^{k}\left(X_{\tau_{2n-1}^t} - X_{\tau_{2n-2}^t}\right), \quad (18.44)$$
wherein we have used $X_{\tau_{2n}^t} - X_{\tau_{2n-1}^t} \ge b$ if there were an upcrossing in the interval $\left[\tau_{2n-1}^t, \tau_{2n}^t\right]$ and $X_{\tau_{2n}^t} - X_{\tau_{2n-1}^t} \ge 0$ otherwise,⁴ see Figure 18.4. Taking expectations of Eq. (18.44) implies
$$E X_N - E X_0 \ge b\, E\, U_N^X(0,b) + \sum_{n=1}^{k} E\left[X_{\tau_{2n-1}^t} - X_{\tau_{2n-2}^t}\right] \ge b\, E\, U_N^X(0,b)$$
wherein we have used the optional sampling theorem to guarantee,
$$E\left[X_{\tau_{2n-1}^t} - X_{\tau_{2n-2}^t}\right] \ge 0.$$
If $X$ is a general submartingale and $-\infty < a < b < \infty$, we know by cJensen's inequality (with $\varphi(x) = (x-a)^+$ which is convex and increasing) that $(X_n - a)^+$ is still a submartingale and moreover
$$U_N^X(a,b) = U_N^{(X-a)^+}(0, b-a),$$
and therefore
$$(b-a)\, E\left[U_N^X(a,b)\right] = (b-a)\, E\left[U_N^{(X-a)^+}(0, b-a)\right] \le E(X_N - a)^+ - E(X_0 - a)^+.$$
The second proof is now complete; nevertheless it is worth contemplating a bit how it is that $E\left[X_{\tau_{2n-1}^t} - X_{\tau_{2n-2}^t}\right] \ge 0$ given that the strategy being employed is now to buy high and sell low. On $\{\tau_{2n-1} \le N\}$, $X_{\tau_{2n-1}} - X_{\tau_{2n-2}} \le 0 - b = -b$ and therefore,
$$0 \le E\left[X_{\tau_{2n-1}^t} - X_{\tau_{2n-2}^t}\right] = E\left[X_{\tau_{2n-1}} - X_{\tau_{2n-2}} : \tau_{2n-1} \le N\right] + E\left[X_{\tau_{2n-1}^t} - X_{\tau_{2n-2}^t} : \tau_{2n-1} > N\right]$$
$$\le -b\, P(\tau_{2n-1} \le N) + E\left[X_N - X_{\tau_{2n-2}^t} : \tau_{2n-1} > N\right].$$
Therefore we must have
$$E\left[X_N - X_{\tau_{2n-2} \wedge N} : \tau_{2n-1} > N\right] \ge b\, P(\tau_{2n-1} \le N)$$
so that $X_N$ must be sufficiently large sufficiently often on the set where $\tau_{2n-1} > N$.

⁴ If $\tau_{2n-1} \ge N$, then $X_{\tau_{2n}^t} - X_{\tau_{2n-1}^t} = X_N - X_N = 0$, while if $\tau_{2n-1} < N$, then $X_{\tau_{2n}^t} - X_{\tau_{2n-1}^t} = X_{\tau_{2n}^t} - 0 \ge 0$.
Fig. 18.4. A sample path of a positive submartingale along with stopping times $\tau_{2j-1}$ and $\tau_{2j}$, which are the successive hitting times of $0$ and $2$ respectively. If we take $N = 70$ in this case, then observe that $X_{\tau_8 \wedge 70} - X_{\tau_7 \wedge 70} \ge 2$ while $X_{\tau_{10} \wedge 70} - X_{\tau_9 \wedge 70} = 0$.
Lemma 18.53. Suppose $X = \{X_n\}_{n=0}^\infty$ is a sequence of extended real numbers such that $U_\infty^X(a,b) < \infty$ for all $a, b \in \mathbb{Q}$ with $a < b$. Then $X_\infty := \lim_{n\to\infty} X_n$ exists in $\bar{\mathbb{R}}$.
Proof. If $\lim_{n\to\infty} X_n$ does not exist in $\bar{\mathbb{R}}$, then there would exist $a, b \in \mathbb{Q}$ such that
$$\liminf_{n\to\infty} X_n < a < b < \limsup_{n\to\infty} X_n$$
and for this choice of $a$ and $b$, we must have $X_n < a$ and $X_n > b$ infinitely often. Therefore, $U_\infty^X(a,b) = \infty$.
Corollary 18.54. Suppose $\{X_n\}_{n=0}^\infty$ is an integrable submartingale such that $\sup_n E X_n^+ < \infty$ (or equivalently $C := \sup_n E|X_n| < \infty$, see Remark 18.19), then $X_\infty := \lim_{n\to\infty} X_n$ exists in $\mathbb{R}$ a.s. and $X_\infty \in L^1(\Omega, \mathcal{B}, P)$. Moreover $\{X_n\}_{n \in \bar{\mathbb{N}}_0}$ is a submartingale (that is we also have $X_n \le E[X_\infty | \mathcal{B}_n]$ a.s. for all $n$), iff $\{X_n^+\}_{n=1}^\infty$ is uniformly integrable.
Proof. For any $-\infty < a < b < \infty$, by Doob's upcrossing inequality (Theorem 18.51) and the MCT,
$$E\left[U_\infty^X(a,b)\right] \le \frac{1}{b-a}\left[\sup_N E(X_N - a)^+ - E(X_0 - a)^+\right] < \infty$$
where
$$U_\infty^X(a,b) := \lim_{N\to\infty} U_N^X(a,b)$$
is the total number of upcrossings of $X$ across $[a,b]$.⁵ In particular it follows that
$$\Omega_0 := \bigcap\left\{U_\infty^X(a,b) < \infty : a, b \in \mathbb{Q} \text{ with } a < b\right\}$$
has probability one. Hence by Lemma 18.53, for $\omega \in \Omega_0$ we have $X_\infty(\omega) := \lim_{n\to\infty} X_n(\omega)$ exists in $\bar{\mathbb{R}}$. By Fatou's lemma we know that
$$E[|X_\infty|] = E\left[\liminf_{n\to\infty} |X_n|\right] \le \liminf_{n\to\infty} E[|X_n|] \le C < \infty \quad (18.45)$$
and therefore that $X_\infty \in \mathbb{R}$ a.s.
Since (as we have already shown) $X_n^+ \to X_\infty^+$ a.s., if $\{X_n^+\}_{n=1}^\infty$ is uniformly integrable, then $X_n^+ \to X_\infty^+$ in $L^1(P)$ by Vitali's convergence Theorem 12.44. Therefore for $A \in \mathcal{B}_n$ we have by Fatou's lemma that
$$E[X_n 1_A] \le \limsup_{m\to\infty} E[X_m 1_A] = \limsup_{m\to\infty}\left(E[X_m^+ 1_A] - E[X_m^- 1_A]\right) = E[X_\infty^+ 1_A] - \liminf_{m\to\infty} E[X_m^- 1_A]$$
$$\le E[X_\infty^+ 1_A] - E\left[\liminf_{m\to\infty} X_m^- 1_A\right] = E[X_\infty^+ 1_A] - E[X_\infty^- 1_A] = E[X_\infty 1_A].$$
Since $A \in \mathcal{B}_n$ was arbitrary we may conclude that $X_n \le E[X_\infty | \mathcal{B}_n]$ a.s. for all $n$.
⁵ Notice that $(X_N - a)^+ \le |X_N - a| \le |X_N| + |a|$ so that $\sup_N E(X_N - a)^+ \le C + |a| < \infty$.
Conversely if we suppose that $X_n \le E[X_\infty | \mathcal{B}_n]$ a.s. for all $n$, then by cJensen's inequality (with $\varphi(x) = x \vee 0$ being an increasing convex function),
$$X_n^+ \le (E[X_\infty | \mathcal{B}_n])^+ \le E[X_\infty^+ | \mathcal{B}_n] \text{ a.s. for all } n,$$
and therefore $\{X_n^+\}_{n=1}^\infty$ is uniformly integrable by Proposition 18.8 and Exercise 12.5.
Second Proof. We may also give another proof of the first assertion based on the Krickeberg decomposition Theorem 18.20 and the supermartingale convergence Corollary 18.63 below. Indeed, by the Krickeberg decomposition Theorem 18.20, $X_n = M_n - Y_n$ where $M$ is a positive martingale and $Y$ is a positive supermartingale. Hence by two applications of Corollary 18.63 we may conclude that
$$X_\infty = \lim_{n\to\infty} X_n = \lim_{n\to\infty} M_n - \lim_{n\to\infty} Y_n$$
exists in $\mathbb{R}$ almost surely.
Remark 18.55. If $\{X_n\}_{n=0}^\infty$ is a submartingale such that $\{X_n^+\}_{n=0}^\infty$ is uniformly integrable, it does not necessarily follow that $\{X_n\}_{n=0}^\infty$ is uniformly integrable. Indeed, let $X_n = -M_n$ where $M_n$ is the non-uniformly integrable martingale in Example 18.7. Then $X_n$ is a negative (sub)martingale and hence $X_n^+ \equiv 0$ is uniformly integrable but $\{X_n\}_{n=0}^\infty$ is not uniformly integrable. This also shows that assuming the positive part of a martingale is uniformly integrable is not sufficient to show the martingale itself is uniformly integrable. Keep in mind in this example that $\lim_{n\to\infty} X_n = 0$ a.s. while $E X_n = -1$ for all $n$ and so clearly $\lim_{n\to\infty} E X_n = -1 \ne 0 = E[\lim_{n\to\infty} X_n]$ in this case.
Notation 18.56 Given a probability space, $(\Omega, \mathcal{B}, P)$ and $A, B \in \mathcal{B}$, we say $A = B$ a.s. iff $P(A \triangle B) = 0$ or equivalently iff $1_A = 1_B$ a.s.
Corollary 18.57 (Localizing Corollary 18.54). Suppose $M = \{M_n\}_{n=0}^\infty$ is a martingale and $c < \infty$ such that $\Delta_n M \le c$ a.s. for all $n$. Then
$$\left\{\lim_{n\to\infty} M_n \text{ exists in } \mathbb{R}\right\} = \left\{\sup_n M_n < \infty\right\} \text{ a.s.}$$
Proof. Let $\tau_a := \inf\{n : M_n \ge a\}$ for all $a \in \mathbb{N}$. Then by the optional stopping theorem, $n \mapsto M_n^{\tau_a}$ is still a martingale. Since $M_n^{\tau_a} \le a + c$,⁶ it follows that $E\left(M_n^{\tau_a}\right)^+ \le a + c < \infty$ for all $n$. Hence we may apply Corollary 18.54 to conclude, $\lim_{n\to\infty} M_n^{\tau_a} = M_\infty^{\tau_a}$ exists in $\mathbb{R}$ almost surely. Therefore $n \mapsto M_n$ is convergent in $\mathbb{R}$ almost surely on the set
$$\bigcup_a \left\{M^{\tau_a} = M\right\} = \left\{\sup_n M_n < \infty\right\}.$$
Conversely if $n \mapsto M_n$ is convergent in $\mathbb{R}$, then $\sup_n M_n < \infty$.

⁶ If $n < \tau_a$ then $M_n < a$, and if $n \ge \tau_a$ then $M_n^{\tau_a} = M_{\tau_a} \le M_{\tau_a - 1} + c < a + c$.
Corollary 18.58. Suppose $M = \{M_n\}_{n=0}^\infty$ is a martingale, and $c < \infty$ such that $|\Delta_n M| \le c$ a.s. for all $n$. Let
$$C := \left\{\lim_{n\to\infty} M_n \text{ exists in } \mathbb{R}\right\}$$
and
$$D := \left\{\limsup_{n\to\infty} M_n = \infty \text{ and } \liminf_{n\to\infty} M_n = -\infty\right\}.$$
Then, $P(C \cup D) = 1$. (In words, either $\lim_{n\to\infty} M_n$ exists in $\mathbb{R}$ or $\{M_n\}_{n=1}^\infty$ is wildly oscillating as $n \to \infty$.)
Proof. Since both $M$ and $-M$ satisfy the hypothesis of Corollary 18.57, we may conclude that (almost surely),
$$C = \left\{\sup_n M_n < \infty\right\} = \left\{\inf_n M_n > -\infty\right\} \text{ a.s.}$$
and hence almost surely,
$$C^c = \left\{\sup_n M_n = \infty\right\} = \left\{\inf_n M_n = -\infty\right\} = \left\{\sup_n M_n = \infty\right\} \cap \left\{\inf_n M_n = -\infty\right\} = D.$$
Corollary 18.59. Suppose $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ is a filtered probability space and $A_n \in \mathcal{B}_n$ for all $n$. Then
$$\left\{\sum_n 1_{A_n} = \infty\right\} = \{A_n \text{ i.o.}\} = \left\{\sum_n E[1_{A_n} | \mathcal{B}_{n-1}] = \infty\right\} \text{ a.s.} \quad (18.46)$$
Proof. Let $\Delta_n M := 1_{A_n} - E[1_{A_n} | \mathcal{B}_{n-1}]$ so that $E[\Delta_n M | \mathcal{B}_{n-1}] = 0$ for all $n$. Thus if
$$M_n := \sum_{k \le n} \Delta_k M = \sum_{k \le n}\left(1_{A_k} - E[1_{A_k} | \mathcal{B}_{k-1}]\right),$$
then $M$ is a martingale with $|\Delta_n M| \le 1$ for all $n$. Let $C$ and $D$ be as in Corollary 18.58. Since $\{A_n \text{ i.o.}\} = \left\{\sum_n 1_{A_n} = \infty\right\}$, it follows that
$$\{A_n \text{ i.o.}\} = \left\{\sum_n E[1_{A_n} | \mathcal{B}_{n-1}] = \infty\right\} \text{ a.s. on } C.$$
Moreover, on $\left\{\sup_n M_n = \infty\right\}$ we must have $\sum_n 1_{A_n} = \infty$ and on $\left\{\inf_n M_n = -\infty\right\}$ that $\sum_n E[1_{A_n} | \mathcal{B}_{n-1}] = \infty$, and so
$$\sum_n 1_{A_n} = \infty \text{ and } \sum_n E[1_{A_n} | \mathcal{B}_{n-1}] = \infty \text{ a.s. on } D.$$
Thus it follows that Eq. (18.46) holds on $C \cup D$ a.s., which completes the proof since $\Omega = C \cup D$ a.s.
See Durrett [15, Chapter 4.3] for more in this direction.
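Specializing Corollary 18.59 to independent events $A_n$ recovers the second Borel-Cantelli lemma, since then $E[1_{A_n} | \mathcal{B}_{n-1}] = P(A_n)$. A small simulation sketch, not from the text; the probabilities $p_n = n^{-1/2}$ (divergent sum) and $p_n = n^{-2}$ (convergent sum) and the horizon are illustrative choices:

```python
import random

random.seed(0)

# For independent A_n with P(A_n) = p_n, Corollary 18.59 says
# {sum 1_{A_n} = inf} = {sum p_n = inf} a.s.
def count_hits(p, N):
    """Count how many of the independent events A_1, ..., A_N occur."""
    return sum(random.random() < p(n) for n in range(1, N + 1))

N = 100_000
many = count_hits(lambda n: n ** -0.5, N)  # divergent sum: many occurrences
few = count_hits(lambda n: n ** -2.0, N)   # convergent sum: finitely many
assert many > few
```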
18.7 *Supermartingale inequalities
As the optional sampling theorem was our basic tool for deriving submartingale
inequalities, the following optional switching lemma will be our basic tool for
deriving positive supermartingale inequalities.
Lemma 18.60 (Optional switching lemma). Suppose that $X$ and $Y$ are two supermartingales and $\tau$ is a stopping time such that $X_\tau \ge Y_\tau$ on $\{\tau < \infty\}$. Then
$$Z_n = 1_{n < \tau} X_n + 1_{n \ge \tau} Y_n = \begin{cases} X_n & \text{if } n < \tau \\ Y_n & \text{if } n \ge \tau \end{cases}$$
is again a supermartingale. (In short we can switch from $X$ to $Y$ at time, $\tau$, provided $Y_\tau \le X_\tau$ at the switching time, $\tau$.) This lemma is valid if $X_n, Y_n \in L^1(\Omega, \mathcal{B}_n, P)$ for all $n$ or if both $X_n, Y_n \ge 0$ for all $n$. In the latter case, we should be using the extended notion of conditional expectations.
Proof. We begin by observing,
$$Z_{n+1} = 1_{n+1 < \tau} X_{n+1} + 1_{n+1 \ge \tau} Y_{n+1} = 1_{n+1 < \tau} X_{n+1} + 1_{n \ge \tau} Y_{n+1} + 1_{\tau = n+1} Y_{n+1}$$
$$\le 1_{n+1 < \tau} X_{n+1} + 1_{n \ge \tau} Y_{n+1} + 1_{\tau = n+1} X_{n+1} = 1_{n < \tau} X_{n+1} + 1_{n \ge \tau} Y_{n+1}.$$
Since $\{n < \tau\}$ and $\{n \ge \tau\}$ are $\mathcal{B}_n$-measurable, it now follows from the supermartingale property of $X$ and $Y$ that
$$E_{\mathcal{B}_n} Z_{n+1} \le E_{\mathcal{B}_n}[1_{n < \tau} X_{n+1} + 1_{n \ge \tau} Y_{n+1}] = 1_{n < \tau} E_{\mathcal{B}_n}[X_{n+1}] + 1_{n \ge \tau} E_{\mathcal{B}_n}[Y_{n+1}] \le 1_{n < \tau} X_n + 1_{n \ge \tau} Y_n = Z_n.$$
18.7.1 Maximal Inequalities
Theorem 18.61 (Supermartingale maximal inequality). Let $X$ be a positive supermartingale (in the extended sense) and $a$ be $\mathcal{B}_0$-measurable with $a \ge 0$, then
$$a\, P\left(\sup_n X_n \ge a \,\middle|\, \mathcal{B}_0\right) \le a \wedge X_0 \quad (18.47)$$
and moreover
$$P\left(\sup_n X_n = \infty \,\middle|\, \mathcal{B}_0\right) = 0 \text{ on } \{X_0 < \infty\}. \quad (18.48)$$
In particular if $X_0 < \infty$ a.s. then $\sup_n X_n < \infty$ a.s.
Proof. First Proof. Simply apply Corollary 18.44 with $Y_n = ((2a) \wedge X_n)\, 1_A$ where $A \in \mathcal{B}_0$ to find
$$a\, E\left[P\left(\sup_n X_n \ge a \,\middle|\, \mathcal{B}_0\right) : A\right] = a\, P\left(\sup_n X_n \ge a : A\right) \le E[a \wedge X_0 : A].$$
Since this holds for all $A \in \mathcal{B}_0$, Eq. (18.47) follows.
Second Proof. Let $\tau := \inf\{n : X_n \ge a\}$, which is a stopping time since $\{\tau \le n\} = \bigcup_{k \le n}\{X_k \ge a\} \in \mathcal{B}_n$ for all $n$. Since $X_\tau \ge a$ on $\{\tau < \infty\}$ and $Y_n := a$ is a supermartingale, it follows by the switching Lemma 18.60 that
$$Z_n := 1_{n < \tau} X_n + a 1_{n \ge \tau}$$
is a supermartingale (in the extended sense). In particular it follows
$$a\, P(\tau \le n | \mathcal{B}_0) = E_{\mathcal{B}_0}[a 1_{\tau \le n}] \le E_{\mathcal{B}_0} Z_n \le Z_0,$$
and
$$Z_0 = 1_{0 < \tau} X_0 + a 1_{\tau = 0} = 1_{X_0 < a} X_0 + 1_{X_0 \ge a}\, a = a \wedge X_0.$$
Therefore, using the cMCT,
$$a\, P\left(\sup_n X_n \ge a \,\middle|\, \mathcal{B}_0\right) = a\, P(\tau < \infty | \mathcal{B}_0) = \lim_{n\to\infty} a\, P(\tau \le n | \mathcal{B}_0) \le Z_0 = a \wedge X_0$$
which proves Eq. (18.47).
For the last assertion, take $a > 0$ to be constant in Eq. (18.47) and then use the cDCT to let $a \uparrow \infty$ to conclude
$$P\left(\sup_n X_n = \infty \,\middle|\, \mathcal{B}_0\right) = \lim_{a\to\infty} P\left(\sup_n X_n \ge a \,\middle|\, \mathcal{B}_0\right) \le \lim_{a\to\infty} 1 \wedge \frac{X_0}{a} = 1_{X_0 = \infty}.$$
Multiplying this equation by $1_{X_0 < \infty}$ and then taking expectations implies
$$E\left[1_{\sup_n X_n = \infty}\, 1_{X_0 < \infty}\right] \le E[1_{X_0 = \infty}\, 1_{X_0 < \infty}] = 0$$
which implies $1_{\sup_n X_n = \infty}\, 1_{X_0 < \infty} = 0$ a.s., i.e. $\sup_n X_n < \infty$ a.s. on $\{X_0 < \infty\}$.
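For a constant level $a$ and $X_0 \equiv 1$, Eq. (18.47) reduces to the unconditional bound $a\, P(\sup_n X_n \ge a) \le 1$. A Monte Carlo sketch (not from the text); the positive martingale, a product of i.i.d. mean-one factors $2U_i$ with $U_i \sim \mathrm{Uniform}(0,1)$, and all parameters below are illustrative:

```python
import random

random.seed(1)

# Estimate P(sup_n X_n >= a) for the product martingale X_n = prod 2*U_i.
def sup_of_path(steps=200):
    x, m = 1.0, 1.0
    for _ in range(steps):
        x *= 2.0 * random.random()  # multiply by a mean-one factor
        m = max(m, x)
    return m

a, trials = 4.0, 20_000
freq = sum(sup_of_path() >= a for _ in range(trials)) / trials
assert a * freq <= 1.05  # maximal inequality a*P(sup >= a) <= 1, up to MC error
```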
18.7.2 The upcrossing inequality and convergence result
Theorem 18.62 (Dubins' Upcrossing Inequality). Suppose $X = \{X_n\}_{n=0}^\infty$ is a positive supermartingale and $0 < a < b < \infty$. Then
$$P\left(U_\infty^X(a,b) \ge k \,\middle|\, \mathcal{B}_0\right) \le \left(\frac{a}{b}\right)^k\left(1 \wedge \frac{X_0}{a}\right), \text{ for } k \ge 1, \quad (18.49)$$
and $U_\infty^X(a,b) < \infty$ a.s.; in fact
$$E\left[U_\infty^X(a,b)\right] \le \frac{1}{b/a - 1} = \frac{a}{b-a} < \infty.$$
Proof. Since
$$U_N^X(a,b) = U_N^{X/a}(1, b/a),$$
it suffices to consider the case where $a = 1$ and $b > 1$. Let $\tau_n$ be the stopping times defined in Eq. (18.39) with $a = 1$ and $b > 1$, i.e.
$$\tau_0 = 0, \quad \tau_1 = \inf\{n \ge \tau_0 : X_n \le 1\},$$
$$\tau_2 = \inf\{n \ge \tau_1 : X_n \ge b\}, \quad \tau_3 := \inf\{n \ge \tau_2 : X_n \le 1\},$$
$$\vdots$$
$$\tau_{2k} = \inf\{n \ge \tau_{2k-1} : X_n \ge b\}, \quad \tau_{2k+1} := \inf\{n \ge \tau_{2k} : X_n \le 1\}, \dots$$
see Figure 18.2.
Let $k \ge 1$ and use the switching Lemma 18.60 repeatedly to define a new positive supermartingale $Y_n = Y_n^{(k)}$ (see Exercise 18.13 below) as follows,
$$Y_n^{(k)} = 1_{n < \tau_1} + 1_{\tau_1 \le n < \tau_2} X_n + b\, 1_{\tau_2 \le n < \tau_3} + b X_n 1_{\tau_3 \le n < \tau_4} + b^2 1_{\tau_4 \le n < \tau_5} + b^2 X_n 1_{\tau_5 \le n < \tau_6} + \cdots$$
$$+ b^{k-1} 1_{\tau_{2k-2} \le n < \tau_{2k-1}} + b^{k-1} X_n 1_{\tau_{2k-1} \le n < \tau_{2k}} + b^k 1_{\tau_{2k} \le n}. \quad (18.50)$$
Since $E[Y_n | \mathcal{B}_0] \le Y_0$ a.s., $Y_n \ge b^k 1_{\tau_{2k} \le n}$, and
$$Y_0 = 1_{0 < \tau_1} + 1_{\tau_1 = 0} X_0 = 1_{X_0 > 1} + 1_{X_0 \le 1} X_0 = 1 \wedge X_0,$$
we may infer that
$$b^k P(\tau_{2k} \le n | \mathcal{B}_0) = E\left[b^k 1_{\tau_{2k} \le n} \,\middle|\, \mathcal{B}_0\right] \le E[Y_n | \mathcal{B}_0] \le 1 \wedge X_0 \text{ a.s.}$$
Using cMCT, we may now let $n \to \infty$ to conclude
$$P\left(U_\infty^X(1,b) \ge k \,\middle|\, \mathcal{B}_0\right) \le P(\tau_{2k} < \infty | \mathcal{B}_0) \le \frac{1}{b^k}(1 \wedge X_0) \text{ a.s.}$$
which is Eq. (18.49). Using cDCT, we may let $k \to \infty$ in this equation to discover $P\left(U_\infty^X(1,b) = \infty \,\middle|\, \mathcal{B}_0\right) = 0$ a.s. and in particular, $U_\infty^X(1,b) < \infty$ a.s. In fact we have
$$E\left[U_\infty^X(1,b)\right] = \sum_{k=1}^\infty P\left(U_\infty^X(1,b) \ge k\right) \le \sum_{k=1}^\infty E\left[\frac{1}{b^k}(1 \wedge X_0)\right] = \frac{1}{b}\cdot\frac{1}{1 - 1/b}\, E[1 \wedge X_0] \le \frac{1}{b-1} < \infty.$$
Exercise 18.13. In this exercise you are asked to fill in the details showing $Y_n$ in Eq. (18.50) is still a supermartingale. To do this, define $Y_n^{(k)}$ via Eq. (18.50) and then show (making use of the switching Lemma 18.60 twice) $Y_n^{(k+1)}$ is a supermartingale under the assumption that $Y_n^{(k)}$ is a supermartingale. Finish off the induction argument by observing that the constant processes, $U_n := 1$ and $V_n := 0$ are supermartingales such that $U_{\tau_1} = 1 \ge 0 = V_{\tau_1}$ on $\{\tau_1 < \infty\}$, and therefore by the switching Lemma 18.60,
$$Y_n^{(1)} = 1_{0 \le n < \tau_1} U_n + 1_{\tau_1 \le n} V_n = 1_{0 \le n < \tau_1}$$
is also a supermartingale.
Corollary 18.63 (Positive Supermartingale convergence). Suppose $X = \{X_n\}_{n=0}^\infty$ is a positive supermartingale (possibly in the extended sense), then $X_\infty = \lim_{n\to\infty} X_n$ exists a.s. and we have
$$E[X_\infty | \mathcal{B}_n] \le X_n \text{ for all } n \in \mathbb{N}. \quad (18.51)$$
In particular,
$$E X_\infty \le E X_n \le E X_0 \text{ for all } n < \infty. \quad (18.52)$$
Proof. The set,
$$\Omega_0 := \bigcap\left\{U_\infty^X(a,b) < \infty : a, b \in \mathbb{Q} \text{ with } a < b\right\},$$
has full measure ($P(\Omega_0) = 1$) by Dubins' upcrossing inequality in Theorem 18.62. So by Lemma 18.53, for $\omega \in \Omega_0$ we have $X_\infty(\omega) := \lim_{n\to\infty} X_n(\omega)$ exists⁷ in $[0,\infty]$. For definiteness, let $X_\infty = 0$ on $\Omega_0^c$. Equation (18.51) is now a consequence of cFatou;
$$E[X_\infty | \mathcal{B}_n] = E\left[\lim_{m\to\infty} X_m \,\middle|\, \mathcal{B}_n\right] \le \liminf_{m\to\infty} E[X_m | \mathcal{B}_n] \le \liminf_{m\to\infty} X_n = X_n \text{ a.s.}$$
The supermartingale property guarantees that $E X_n \le E X_0$ for all $n < \infty$ while taking expectations of Eq. (18.51) implies $E X_\infty \le E X_n$.
Theorem 18.64 (Optional sampling II, positive supermartingales). Suppose that $X = \{X_n\}_{n=0}^\infty$ is a positive supermartingale, $X_\infty := \lim_{n\to\infty} X_n$ (which exists a.s. by Corollary 18.63), and $\sigma$ and $\tau$ are arbitrary stopping times. Then $X_n^\tau := X_{\tau \wedge n}$ is a positive $\{\mathcal{B}_n\}_{n=0}^\infty$ supermartingale, $X_\tau = \lim_{n\to\infty} X_n^\tau$, and
$$E[X_\tau | \mathcal{B}_\sigma] \le X_{\sigma \wedge \tau} \text{ a.s.} \quad (18.53)$$
Moreover, if $E X_0 < \infty$, then $E[X_\tau] \le E[X_{\sigma \wedge \tau}] < \infty$.
Proof. We already know that $X^\tau$ is a positive supermartingale by the optional stopping Theorem 18.38. Hence an application of Corollary 18.63 implies that $\lim_{n\to\infty} X_n^\tau = \lim_{n\to\infty} X_{\tau \wedge n}$ is convergent and
$$E\left[\lim_{n\to\infty} X_n^\tau \,\middle|\, \mathcal{B}_m\right] \le X_m^\tau = X_{\tau \wedge m} \text{ for all } m < \infty. \quad (18.54)$$
On the set $\{\tau < \infty\}$, $\lim_{n\to\infty} X_{\tau \wedge n} = X_\tau$, and on the set $\{\tau = \infty\}$, $\lim_{n\to\infty} X_{\tau \wedge n} = \lim_{n\to\infty} X_n = X_\infty = X_\tau$ a.s. Therefore it follows that $\lim_{n\to\infty} X_n^\tau = X_\tau$ and Eq. (18.54) may be expressed as
$$E[X_\tau | \mathcal{B}_m] \le X_{\tau \wedge m} \text{ for all } m < \infty. \quad (18.55)$$
An application of Lemma 18.30 now implies
$$E[X_\tau | \mathcal{B}_\sigma] = \sum_{m \le \infty} 1_{\sigma = m} E[X_\tau | \mathcal{B}_m] \le \sum_{m \le \infty} 1_{\sigma = m} X_{\tau \wedge m} = X_{\tau \wedge \sigma} \text{ a.s.}$$

⁷ If $E X_0 < \infty$, this may also be deduced by applying Corollary 18.54 to $\{-X_n\}_{n=0}^\infty$.
18.8 Martingale Closure and Regularity Results
We are now going to give a couple of theorems which have already been alluded
to in Exercises 13.6, 14.8, and 14.9.
Theorem 18.65. Let $M := \{M_n\}_{n=0}^\infty$ be an $L^1$-bounded martingale, i.e. $C := \sup_n E|M_n| < \infty$, and let $M_\infty := \lim_{n\to\infty} M_n$, which exists a.s. and satisfies $E|M_\infty| < \infty$ by Corollary 18.54. Then the following are equivalent;
1. There exists $X \in L^1(\Omega, \mathcal{B}, P)$ such that $M_n = E[X | \mathcal{B}_n]$ for all $n$.
2. $\{M_n\}_{n=0}^\infty$ is uniformly integrable.
3. $M_n \to M_\infty$ in $L^1(\Omega, \mathcal{B}, P)$.
Moreover, if any of the above equivalent conditions hold we may take $X = M_\infty$, i.e. $M_n = E[M_\infty | \mathcal{B}_n]$.
Proof. 1. $\Longrightarrow$ 2. This was already proved in Proposition 18.8.
2. $\Longrightarrow$ 3. The knowledge that $M_\infty := \lim_{n\to\infty} M_n$ exists a.s. along with the assumed uniform integrability implies $L^1$-convergence by Vitali convergence Theorem 12.44.
3. $\Longrightarrow$ 1. If $M_n \to M_\infty$ in $L^1(\Omega, \mathcal{B}, P)$, then by the martingale property and the $L^1(P)$-continuity of conditional expectation we find,
$$M_n = E[M_m | \mathcal{B}_n] \to E[M_\infty | \mathcal{B}_n] \text{ as } m \to \infty,$$
and thus, $M_n = E[M_\infty | \mathcal{B}_n]$ a.s.
Definition 18.66. A martingale satisfying any and all of the equivalent statements in Theorem 18.65 is said to be regular.
Theorem 18.67. Suppose $1 < p < \infty$ and $M := \{M_n\}_{n=0}^\infty$ is an $L^p$-bounded martingale. Then $M_n \to M_\infty$ almost surely and in $L^p$. In particular, $M_n$ is a regular martingale.
Proof. The almost sure convergence follows from Corollary 18.54. So, because of Corollary 12.47, to finish the proof it suffices to show $\{|M_n|^p\}_{n=0}^\infty$ is uniformly integrable. But by Doob's inequality, Corollary 18.47, and the MCT, we find
$$E\left[\sup_k |M_k|^p\right] \le \left(\frac{p}{p-1}\right)^p \sup_k E[|M_k|^p] < \infty.$$
As $|M_n|^p \le \sup_k |M_k|^p \in L^1(P)$ for all $n \in \mathbb{N}$, it follows by Example 12.39 and Exercise 12.5 that $\{|M_n|^p\}_{n=0}^\infty$ is uniformly integrable.
Theorem 18.68 (Optional sampling III, regular martingales). Suppose that $M = \{M_n\}_{n=0}^\infty$ is a regular martingale, and $\sigma$ and $\tau$ are arbitrary stopping times. Define $M_\tau := \lim_{n\to\infty} M_{\tau \wedge n}$, which exists a.s. Then $M_\tau \in L^1(P)$,
$$M_\tau = E[M_\infty | \mathcal{B}_\tau], \quad E|M_\tau| \le E|M_\infty| < \infty \quad (18.56)$$
and
$$E[M_\tau | \mathcal{B}_\sigma] = M_{\sigma \wedge \tau} \text{ a.s.} \quad (18.57)$$
Proof. By Theorem 18.65, $M_\infty \in L^1(\Omega, \mathcal{B}, P)$ and $M_n := E_{\mathcal{B}_n} M_\infty$ a.s. for all $n$. By Lemma 18.30,
$$E_{\mathcal{B}_\tau} M_\infty = \sum_{n \le \infty} 1_{\tau = n} E_{\mathcal{B}_n} M_\infty = \sum_{n \le \infty} 1_{\tau = n} M_n = M_\tau.$$
Hence we have $|M_\tau| = |E_{\mathcal{B}_\tau} M_\infty| \le E_{\mathcal{B}_\tau}|M_\infty|$ a.s. and $E|M_\tau| \le E|M_\infty| < \infty$. An application of Theorem 18.31 now concludes the proof;
$$E_{\mathcal{B}_\sigma} M_\tau = E_{\mathcal{B}_\sigma} E_{\mathcal{B}_\tau} M_\infty = E_{\mathcal{B}_{\sigma \wedge \tau}} M_\infty = M_{\sigma \wedge \tau}.$$
Definition 18.69. Let $M = \{M_n\}_{n=0}^\infty$ be a martingale. We say that $\tau$ is a regular stopping time for $M$ if $M^\tau$ is a regular martingale.
Example 18.70. Every bounded martingale is regular. More generally if $\tau$ is a stopping time such that $M^\tau$ is bounded, then $\tau$ is a regular stopping time for $M$.
Remark 18.71. If $\tau$ is regular for $M$, then $\lim_{n\to\infty} M_n^\tau =: M_\tau$ exists a.s. and in $L^1(P)$ and hence
$$\lim_{n\to\infty} M_n = M_\tau \text{ a.s. on } \{\tau = \infty\}. \quad (18.58)$$
Thus if $\tau$ is regular for $M$, we may define $M_\infty$ as, $M_\infty := M_\tau = \lim_{n\to\infty} M_n$, and we will have
$$M_{\tau \wedge n} = M_n^\tau = E_{\mathcal{B}_n} M_\tau \quad (18.59)$$
and
$$E|M_\tau| = \lim_{n\to\infty} E|M_n^\tau| \le \sup_n E|M_n^\tau| < \infty.$$
Theorem 18.72. Suppose $M = \{M_n\}_{n=0}^\infty$ is a martingale and $\sigma, \tau$ are stopping times such that $\tau$ is a regular stopping time for $M$. Then
$$E_{\mathcal{B}_\sigma} M_\tau = M_{\sigma \wedge \tau}, \quad (18.60)$$
and if $\sigma \le \tau$ a.s. then
$$M_n^\sigma = E_{\mathcal{B}_n}[E_{\mathcal{B}_\sigma} M_\tau] \quad (18.61)$$
and $\sigma$ is regular for $M$.
Proof. By assumption, $M_\tau = \lim_{n\to\infty} M_{\tau \wedge n}$ exists almost surely and in $L^1(P)$ and $M_n^\tau = E[M_\tau | \mathcal{B}_n]$ for all $n$.
1. Equation (18.60) is a consequence of;
$$E_{\mathcal{B}_\sigma} M_\tau = \sum_{n \le \infty} 1_{\sigma = n} E_{\mathcal{B}_n} M_\tau = \sum_{n \le \infty} 1_{\sigma = n} M_n^\tau = M_{\sigma \wedge \tau}.$$
2. Applying $E_{\mathcal{B}_\sigma}$ to Eq. (18.59) using the optional sampling Theorem 18.39 and the tower property of conditional expectation (see Theorem 18.31) shows
$$M_n^\sigma = M_{\sigma \wedge n} = E_{\mathcal{B}_\sigma} M_{\tau \wedge n} = E_{\mathcal{B}_\sigma} E_{\mathcal{B}_n} M_\tau = E_{\mathcal{B}_n}[E_{\mathcal{B}_\sigma} M_\tau].$$
The regularity of $M^\sigma$ now follows by item 1. of Theorem 18.65.
Proposition 18.73. Suppose that $M$ is a martingale and $\tau$ is a stopping time. Then $\tau$ is regular for $M$ iff;
1. $E[|M_\tau| : \tau < \infty] < \infty$ and
2. $\{M_n 1_{n < \tau}\}_{n=0}^\infty$ is a uniformly integrable sequence of random variables.
Moreover, condition 1. is automatically satisfied if $M$ is $L^1$-bounded, i.e. if $C := \sup_n E|M_n| < \infty$.
Proof. ($\Longrightarrow$) If $\tau$ is regular for $M$, then $M_\tau \in L^1(P)$ and $M_n^\tau = E_{\mathcal{B}_n} M_\tau$, so that $M_n = E_{\mathcal{B}_n} M_\tau$ a.s. on $\{n \le \tau\}$. In particular it follows that
$$E[|M_\tau| : \tau < \infty] \le E|M_\tau| < \infty$$
and
$$|M_n 1_{n < \tau}| = |E_{\mathcal{B}_n} M_\tau|\, 1_{n < \tau} \le E_{\mathcal{B}_n}|M_\tau| \text{ a.s.}$$
from which it follows that $\{M_n 1_{n < \tau}\}_{n=0}^\infty$ is uniformly integrable.
($\Longleftarrow$) Our goal is to show $\{M_n^\tau\}_{n=0}^\infty$ is uniformly integrable. We begin with the identity;
$$E[|M_n^\tau| : |M_n^\tau| \ge a] = E[|M_n^\tau| : |M_n^\tau| \ge a,\ \tau \le n] + E[|M_n^\tau| : |M_n^\tau| \ge a,\ n < \tau].$$
Since
$$E[|M_n^\tau| : |M_n^\tau| \ge a,\ \tau \le n] = E[|M_\tau| : |M_\tau| \ge a,\ \tau \le n] \le E[|M_\tau 1_{\tau < \infty}| : |M_\tau 1_{\tau < \infty}| \ge a],$$
it follows (by assumption 1., i.e. $E[|M_\tau 1_{\tau < \infty}|] < \infty$) that
$$\lim_{a\to\infty} \sup_n E[|M_n^\tau| : |M_n^\tau| \ge a,\ \tau \le n] = 0.$$
Moreover for any $a > 0$,
$$\sup_n E[|M_n^\tau| : |M_n^\tau| \ge a,\ n < \tau] = \sup_n E[|M_n 1_{n < \tau}| : |M_n 1_{n < \tau}| \ge a]$$
and the latter term goes to zero as $a \to \infty$ by assumption 2. Hence we have shown,
$$\lim_{a\to\infty} \sup_n E[|M_n^\tau| : |M_n^\tau| \ge a] = 0$$
as desired.
Now to prove the last assertion. If $C := \sup_n E|M_n| < \infty$, then (by Corollary 18.54) $M_\infty := \lim_{n\to\infty} M_n$ exists a.s. and $E|M_\infty| < \infty$. Therefore,
$$E[|M_\tau| : \tau < \infty] \le E|M_\tau| = E\left[\lim_{n\to\infty} |M_{\tau \wedge n}|\right] \le \liminf_{n\to\infty} E|M_{\tau \wedge n}| \le \liminf_{n\to\infty} E|M_n| < \infty$$
wherein we have used Fatou's lemma, the optional sampling theorem to conclude $M_{\tau \wedge n} = E_{\mathcal{B}_{\tau \wedge n}} M_n$, cJensen to conclude $|M_{\tau \wedge n}| \le E_{\mathcal{B}_{\tau \wedge n}}|M_n|$, and the tower property of conditional expectation to conclude $E|M_{\tau \wedge n}| \le E|M_n|$.
Corollary 18.74. Suppose that $M$ is an $L^1$-bounded martingale and $J \in \mathcal{B}_{\mathbb{R}}$ is a bounded set, then $\tau = \inf\{n : M_n \notin J\}$ is a regular stopping time for $M$.
Proof. According to Proposition 18.73, it suffices to show $\{M_n 1_{n < \tau}\}_{n=0}^\infty$ is a uniformly integrable sequence of random variables. However, if we choose $A < \infty$ such that $J \subset [-A, A]$, since $M_n 1_{n < \tau} \in J \cup \{0\}$ we have $|M_n 1_{n < \tau}| \le A$, which is sufficient to complete the proof.
18.9 Backwards (Reverse) Submartingales
In this section we will consider submartingales indexed by $\mathbb{Z}_{\le 0} := \{\dots, -n, -n+1, \dots, -2, -1, 0\}$. So again we assume that we have an increasing filtration, $\{\mathcal{B}_n : n \le 0\}$, i.e. $\cdots \subset \mathcal{B}_{-2} \subset \mathcal{B}_{-1} \subset \mathcal{B}_0 \subset \mathcal{B}$. As usual, we say an adapted process $\{X_n\}_{n \le 0}$ is a submartingale (martingale) provided $E[X_m - X_n | \mathcal{B}_n] \ge 0$ ($= 0$) for all $n \le m \le 0$. Observe that $E X_m \ge E X_n$ for $m \ge n$, so that $E X_n$ decreases as $n$ decreases. Also observe that $\left(X_{-n}, X_{-(n-1)}, \dots, X_{-1}, X_0\right)$ is a finite string submartingale relative to the filtration, $\mathcal{B}_{-n} \subset \mathcal{B}_{-(n-1)} \subset \cdots \subset \mathcal{B}_{-1} \subset \mathcal{B}_0$.
It turns out that backwards submartingales are even better behaved than forward submartingales. In order to understand why, consider the case where
Fig. 18.5. A sample path of a backwards martingale on $[-100, 0]$ indicating the downcrossings of $X_0, X_{-1}, \dots, X_{-100}$ and the upcrossings of $X_{-100}, X_{-99}, \dots, X_0$. The total number of each is the same.
$\{M_n\}_{n \le 0}$ is a backwards martingale. We then have, $E[M_n | \mathcal{B}_m] = M_{m \wedge n}$ for all $m, n \le 0$. Taking $n = 0$ in this equation implies that $M_m = E[M_0 | \mathcal{B}_m]$ and so the only backwards martingales are of the form $M_m = E[M_0 | \mathcal{B}_m]$ for some $M_0 \in L^1(P)$. We have seen in Example 18.7 that this need not be the case for forward martingales.
Theorem 18.75 (Backwards (or reverse) submartingale convergence). Let $\{\mathcal{B}_n : n \le 0\}$ be a reverse filtration and $\{X_n\}_{n \le 0}$ a backwards submartingale. Then $X_{-\infty} = \lim_{n\to-\infty} X_n$ exists a.s. in $\mathbb{R} \cup \{-\infty\}$ and $X_{-\infty}^+ \in L^1(\Omega, \mathcal{B}, P)$.
If we further assume that
$$C := \lim_{n\to-\infty} E X_n = \inf_{n \le 0} E X_n > -\infty, \quad (18.62)$$
then 1) $X_n = M_n + A_n$ where $\{M_n\}_{-\infty < n \le 0}$ is a martingale and $\{A_n\}_{-\infty < n \le 0}$ is a predictable process such that $A_{-\infty} = \lim_{n\to-\infty} A_n = 0$, 2) $\{X_n\}_{n \le 0}$ is uniformly integrable, 3) $X_{-\infty} \in L^1(\Omega, \mathcal{B}, P)$, and 4) $\lim_{n\to-\infty} E|X_n - X_{-\infty}| = 0$.
Proof. The number of downcrossings of $\left(X_0, X_{-1}, \dots, X_{-(n-1)}, X_{-n}\right)$ across $[a,b]$ (denoted by $D_n(a,b)$) is equal to the number of upcrossings of $\left(X_{-n}, X_{-(n-1)}, \dots, X_{-1}, X_0\right)$ across $[a,b]$, see Figure 18.5. Since $\left(X_{-n}, X_{-(n-1)}, \dots, X_{-1}, X_0\right)$ is a $\mathcal{B}_{-n} \subset \mathcal{B}_{-(n-1)} \subset \cdots \subset \mathcal{B}_{-1} \subset \mathcal{B}_0$ submartingale, we may apply Doob's upcrossing inequality (Theorem 18.51) to find;
$$(b-a)\, E[D_n(a,b)] \le E(X_0 - a)^+ - E(X_{-n} - a)^+ \le E(X_0 - a)^+ < \infty. \quad (18.63)$$
Letting $D_\infty(a,b) := \lim_{n\to\infty} D_n(a,b)$ be the total number of downcrossings of $(X_0, X_{-1}, \dots, X_{-n}, \dots)$, using the MCT to pass to the limit in Eq. (18.63), we have
$$(b-a)\, E[D_\infty(a,b)] \le E(X_0 - a)^+ < \infty.$$
In particular it follows that $D_\infty(a,b) < \infty$ a.s. for all $a < b$.
As in the proof of Corollary 18.54 (making use of the obvious downcrossing analogue of Lemma 18.53), it follows that $X_{-\infty} := \lim_{n\to-\infty} X_n$ exists in $\bar{\mathbb{R}}$ a.s. At the end of the proof, we will show that $X_{-\infty}$ takes values in $\mathbb{R} \cup \{-\infty\}$ almost surely, i.e. $X_{-\infty} < \infty$ a.s.
Now suppose that $C > -\infty$. We begin by computing the Doob decomposition of $X_n$ as $X_n = M_n + A_n$ with $A_n$ being predictable, increasing and satisfying, $A_{-\infty} = \lim_{n\to-\infty} A_n = 0$. If such an $A$ is to exist, following Lemma 18.16, we should define
$$A_n = \sum_{k \le n} E[\Delta_k X | \mathcal{B}_{k-1}].$$
This is a well defined increasing predictable process since the submartingale property implies $E[\Delta_k X | \mathcal{B}_{k-1}] \ge 0$. Moreover we have
$$E A_0 = \sum_{k \le 0} E\left[E[\Delta_k X | \mathcal{B}_{k-1}]\right] = \sum_{k \le 0} E[\Delta_k X] = \lim_{N\to\infty}(E X_0 - E X_{-N}) = E X_0 - \inf_{n \le 0} E X_n = E X_0 - C < \infty.$$
As $0 \le A_n \le A_0 \in L^1(P)$, it follows that $\{A_n\}_{n \le 0}$ is uniformly integrable. Moreover if we define $M_n := X_n - A_n$, then
$$E[\Delta_n M | \mathcal{B}_{n-1}] = E[\Delta_n X - \Delta_n A | \mathcal{B}_{n-1}] = E[\Delta_n X | \mathcal{B}_{n-1}] - \Delta_n A = 0 \text{ a.s.}$$
Thus $M$ is a martingale and therefore, $M_n = E[M_0 | \mathcal{B}_n]$ with $M_0 = X_0 - A_0 \in L^1(P)$. An application of Proposition 18.8 implies $\{M_n\}_{n \le 0}$ is uniformly integrable and hence $X_n = M_n + A_n$ is uniformly integrable as well. (See Remark
18.76 for an alternate proof of the uniform integrability of $X$.) Therefore $X_{-\infty} \in L^1(\Omega, \mathcal{B}, P)$ and $X_n \to X_{-\infty}$ in $L^1(\Omega, \mathcal{B}, P)$ as $n \to -\infty$.
To finish the proof we must show, without the assumption that $C > -\infty$, that $X_{-\infty}^+ \in L^1(\Omega, \mathcal{B}, P)$, which will also imply $P(X_{-\infty} = \infty) = 0$. To prove this, notice that $X_{-\infty}^+ = \lim_{n\to-\infty} X_n^+$ and that by Jensen's inequality, $\{X_n^+\}_{n \le 0}$ is a non-negative backwards submartingale. Since $\inf E X_n^+ \ge 0 > -\infty$, it follows by what we have just proved that $X_{-\infty}^+ \in L^1(\Omega, \mathcal{B}, P)$.
Remark 18.76 (*Not necessary to read.). Let us give a direct proof of the fact that $X$ is uniformly integrable if $C > -\infty$. We begin with Jensen's inequality;
$$E|X_n| = 2E X_n^+ - E X_n \le 2E X_0^+ - E X_n \le 2E X_0^+ - C =: K < \infty, \quad (18.64)$$
which shows that $\{X_n\}_{n \le 0}$ is $L^1$-bounded. For uniform integrability we will use the following identity;
$$E[|X| : |X| \ge \lambda] = E[X : X \ge \lambda] - E[X : X \le -\lambda] = E[X : X \ge \lambda] - (E X - E[X : X > -\lambda]) = E[X : X \ge \lambda] + E[X : X > -\lambda] - E X.$$
Taking $X = X_n$ and $k \ge n$, we find
$$E[|X_n| : |X_n| \ge \lambda] = E[X_n : X_n \ge \lambda] + E[X_n : X_n > -\lambda] - E X_n$$
$$\le E[X_k : X_n \ge \lambda] + E[X_k : X_n > -\lambda] - E X_k + (E X_k - E X_n)$$
$$= E[X_k : X_n \ge \lambda] - E[X_k : X_n \le -\lambda] + (E X_k - E X_n)$$
$$\le E[|X_k| : |X_n| \ge \lambda] + (E X_k - E X_n).$$
Given $\varepsilon > 0$ we may choose $k = k_\varepsilon < 0$ such that if $n \le k$, then $0 \le E X_k - E X_n \le \varepsilon$, and hence
$$\limsup_{\lambda\to\infty} \sup_{n \le k} E[|X_n| : |X_n| \ge \lambda] \le \limsup_{\lambda\to\infty} \sup_{n \le k} E[|X_k| : |X_n| \ge \lambda] + \varepsilon = \varepsilon$$
wherein we have used Eq. (18.64) and Chebyshev's inequality to conclude $P(|X_n| \ge \lambda) \le K/\lambda$, and then the uniform integrability of the singleton set, $\{|X_k|\} \subset L^1(\Omega, \mathcal{B}, P)$. From this it now easily follows that $\{X_n\}_{n \le 0}$ is uniformly integrable.
Corollary 18.77. Suppose $1 \le p < \infty$ and $X_n = M_n$ in Theorem 18.75, where $\{M_n\}$ is an $L^p$-bounded martingale on $-\mathbb{N}_0$. Then $M_{-\infty} := \lim_{n\to-\infty} M_n$ exists a.s. and in $L^p(P)$. Moreover $M_{-\infty} = E[M_0 | \mathcal{B}_{-\infty}]$, where $\mathcal{B}_{-\infty} = \bigcap_{n \le 0} \mathcal{B}_n$.
Proof. Since $M_n = E[M_0 | \mathcal{B}_n]$ for all $n$, it follows by cJensen that $|M_n|^p \le E[|M_0|^p | \mathcal{B}_n]$ for all $n$. By Proposition 18.8, $\{E[|M_0|^p | \mathcal{B}_n]\}_{n \le 0}$ is uniformly integrable and so is $\{|M_n|^p\}_{n \le 0}$. By Theorem 18.75, $M_n \to M_{-\infty}$ a.s. Hence we may now apply Corollary 12.47 to see that $M_n \to M_{-\infty}$ in $L^p(P)$.
Example 18.78 (Kolmogorov's SLLN). In this example we are going to give another proof of the strong law of large numbers in Theorem 16.10; also see Theorem 20.30 below for a third proof. Let $\{X_n\}_{n=1}^\infty$ be i.i.d. random variables such that $E X_n = 0$ and let $S_0 = 0$, $S_n := X_1 + \cdots + X_n$ and $\mathcal{B}_n = \sigma(S_n, S_{n+1}, S_{n+2}, \dots)$ so that $S_n$ is $\mathcal{B}_n$-measurable for all $n \ge 0$.
1. For any permutation $\pi$ of the set $\{1, 2, \dots, n\}$,
$$(X_1, \dots, X_n, S_n, S_{n+1}, S_{n+2}, \dots) \stackrel{d}{=} (X_{\pi 1}, \dots, X_{\pi n}, S_n, S_{n+1}, S_{n+2}, \dots)$$
and in particular
$$(X_j, S_n, S_{n+1}, S_{n+2}, \dots) \stackrel{d}{=} (X_1, S_n, S_{n+1}, S_{n+2}, \dots) \text{ for all } j \le n.$$
2. By Exercise 14.6 we may conclude that
$$E[X_j | \mathcal{B}_n] = E[X_1 | \mathcal{B}_n] \text{ a.s. for all } j \le n. \quad (18.65)$$
To see this directly notice that if $\pi$ is any permutation of $\mathbb{N}$ leaving $\{n+1, n+2, \dots\}$ fixed, then
$$E[g(X_1, \dots, X_n) f(S_n, S_{n+1}, \dots)] = E[g(X_{\pi 1}, \dots, X_{\pi n}) f(S_n, S_{n+1}, \dots)]$$
for all bounded measurable $f$ and $g$ such that $g(X_1, \dots, X_n) \in L^1(P)$. From this equation it follows that
$$E[g(X_1, \dots, X_n) | \mathcal{B}_n] = E[g(X_{\pi 1}, \dots, X_{\pi n}) | \mathcal{B}_n] \text{ a.s.}$$
and then taking $g(x_1, \dots, x_n) = x_1$ gives the desired result.
3. Summing Eq. (18.65) over $j = 1, 2, \dots, n$ gives,
$$S_n = E[S_n | S_n, S_{n+1}, S_{n+2}, \dots] = n\, E[X_1 | S_n, S_{n+1}, S_{n+2}, \dots]$$
from which it follows that
$$M_{-n} := \frac{S_n}{n} = E[X_1 | S_n, S_{n+1}, S_{n+2}, \dots] \quad (18.66)$$
and hence $\left\{M_{-n} = \frac{1}{n} S_n\right\}$ is a backwards martingale.
4. By Theorem 18.75 we know;
$$\lim_{n\to\infty} \frac{S_n}{n} = \lim_{n\to\infty} M_{-n} =: M_{-\infty} \text{ exists a.s. and in } L^1(P).$$
5. Since $M_{-\infty} = \lim_{n\to\infty} \frac{S_n}{n}$ is a $\{\sigma(X_1, \dots, X_n)\}_{n=1}^\infty$-tail random variable, it follows by Corollary 10.51 (basically by Kolmogorov's zero-one law of Proposition 10.50) that $\lim_{n\to\infty} \frac{S_n}{n} = c$ a.s. for some constant $c$.
6. Since $\frac{S_n}{n} \to c$ in $L^1(P)$ we may conclude that
$$c = \lim_{n\to\infty} E\left[\frac{S_n}{n}\right] = E X_1.$$
Thus we have given another proof of Kolmogorov's strong law of large numbers.
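The conclusion of Example 18.78 is easy to illustrate numerically. A quick simulation sketch (the distribution Uniform(-1, 1), the seed, and the sample size are illustrative choices):

```python
import random

random.seed(2)

# For i.i.d. mean-zero X_i, the backwards martingale M_{-n} = S_n/n
# converges a.s. to E[X_1] = 0; here X_i ~ Uniform(-1, 1).
n, s = 200_000, 0.0
for _ in range(n):
    s += random.uniform(-1.0, 1.0)
assert abs(s / n) < 0.01  # |S_n/n| is already small for this n
```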
18.10 Some More Martingale Exercises
(The next four problems were taken directly from
http://math.nyu.edu/she/martingalenote.pdf.)
Exercise 18.14. Suppose Harriet has 7 dollars. Her plan is to make one dollar
bets on fair coin tosses until her wealth reaches either 0 or 50, and then to go
home. What is the expected amount of money that Harriet will have when she
goes home? What is the probability that she will have 50 when she goes home?
Exercise 18.15. Consider a contract that at time $N$ will be worth either 100 or 0. Let $S_n$ be its price at time $0 \le n \le N$. If $S_n$ is a martingale, and $S_0 = 47$, then what is the probability that the contract will be worth 100 at time $N$?
Exercise 18.16. Pedro plans to buy the contract in the previous problem at time 0 and sell it the first time $T$ at which the price goes above 55 or below 15. What is the expected value of $S_T$? You may assume that the value, $S_n$, of the contract is bounded (there is only a finite amount of money in the world) up to time $N$. Also note, by assumption, $T \le N$.
Exercise 18.17. Suppose $S_N$ is with probability one either 100 or 0 and that $S_0 = 50$. Suppose further there is at least a 60% probability that the price will at some point dip below 40 and then subsequently rise above 60 before time $N$. Prove that $S_n$ cannot be a martingale. (I don't know if this problem is correct! But if we modify the 40 to a 30, the buy low sell high strategy will show that $S_n$ is not a martingale.)
Exercise 18.18. Let $(M_n)_{n=0}^\infty$ be a martingale with $M_0 = 0$ and $E[M_n^2] < \infty$ for all $n$. Show that for all $\lambda > 0$,
$$P\left(\max_{1 \le m \le n} M_m \ge \lambda\right) \le \frac{E[M_n^2]}{E[M_n^2] + \lambda^2}.$$
Hints: First show that for any $c > 0$, $\left\{X_n := (M_n + c)^2\right\}_{n=0}^\infty$ is a submartingale and then observe,
$$\left\{\max_{1 \le m \le n} M_m \ge \lambda\right\} \subset \left\{\max_{1 \le m \le n} X_m \ge (\lambda + c)^2\right\}.$$
Now use Doob's Maximal inequality (Proposition 18.42) to estimate the probability of the last set and then choose $c$ so as to optimize the resulting estimate you get for $P(\max_{1 \le m \le n} M_m \ge \lambda)$. (Notice that this result applies to $-M_n$ as well, so it also holds that;
$$P\left(\min_{1 \le m \le n} M_m \le -\lambda\right) \le \frac{E[M_n^2]}{E[M_n^2] + \lambda^2} \text{ for all } \lambda > 0.)$$
Exercise 18.19. Let $\{Z_n\}_{n=1}^\infty$ be independent random variables, $S_0 = 0$ and $S_n := Z_1 + \cdots + Z_n$, and $f_n(\lambda) := E\left[e^{i\lambda Z_n}\right]$. Suppose $E e^{i\lambda S_N} = \prod_{n=1}^N f_n(\lambda)$ converges to a continuous function, $F(\lambda)$, as $N \to \infty$. Show for each $\lambda \in \mathbb{R}$ that
$$P\left(\lim_{n\to\infty} e^{i\lambda S_n} \text{ exists}\right) = 1. \quad (18.67)$$
Hints:
1. Show it is enough to find an $\varepsilon > 0$ such that Eq. (18.67) holds for $|\lambda| \le \varepsilon$.
2. Choose $\varepsilon > 0$ such that $|F(\lambda) - 1| < 1/2$ for $|\lambda| \le \varepsilon$. For $|\lambda| \le \varepsilon$, show
$$M_n(\lambda) := \frac{e^{i\lambda S_n}}{E e^{i\lambda S_n}}$$
is a bounded complex⁸ martingale relative to the filtration, $\mathcal{B}_n = \sigma(Z_1, \dots, Z_n)$.
Lemma 18.79 (Protter [49, See the lemma on p. 22.]). Let $\{x_n\}_{n=1}^\infty \subset \mathbb{R}$ be such that $\left\{e^{iux_n}\right\}_{n=1}^\infty$ is convergent for Lebesgue almost every $u \in \mathbb{R}$. Then $\lim_{n\to\infty} x_n$ exists in $\mathbb{R}$.
Proof. Let $U$ be a uniform random variable with values in $[0,1]$. By assumption, for any $t \in \mathbb{R}$, $\lim_{n\to\infty} e^{itUx_n}$ exists a.s. Thus if $n_k$ and $m_k$ are any increasing sequences we have
$$\lim_{k\to\infty} e^{itUx_{n_k}} = \lim_{n\to\infty} e^{itUx_n} = \lim_{k\to\infty} e^{itUx_{m_k}} \text{ a.s.}$$
and therefore,
$$e^{it(Ux_{n_k} - Ux_{m_k})} = \frac{e^{itUx_{n_k}}}{e^{itUx_{m_k}}} \to 1 \text{ a.s. as } k \to \infty.$$
Hence by DCT it follows that
$$E\left[e^{it(Ux_{n_k} - Ux_{m_k})}\right] \to 1 \text{ as } k \to \infty$$
and therefore
$$(x_{n_k} - x_{m_k})\, U = Ux_{n_k} - Ux_{m_k} \to 0$$
in distribution and hence in probability. But this can only happen if $(x_{n_k} - x_{m_k}) \to 0$ as $k \to \infty$. As $n_k$ and $m_k$ were arbitrary, this suffices to show $\{x_n\}$ is a Cauchy sequence.
Exercise 18.20 (Continuation of Exercise 18.19. See Doob [13, Chapter VII.5]). Let $\{Z_n\}_{n=1}^\infty$ be independent random variables. Use Exercise 18.19 and Lemma 18.79 to prove that the series, $\sum_{n=1}^\infty Z_n$, converges in $\mathbb{R}$ a.s. iff $\prod_{n=1}^N f_n(\lambda)$ converges to a continuous function, $F(\lambda)$, as $N \to \infty$. Conclude from this that $\sum_{n=1}^\infty Z_n$ is a.s. convergent iff $\sum_{n=1}^\infty Z_n$ is convergent in distribution.

⁸ Please use the obvious generalization of a martingale for complex valued processes. It will be useful to observe that the real and imaginary parts of a complex martingale are real martingales.
18.10.1 More Random Walk Exercises
For the next four exercises, let $\{Z_n\}_{n=1}^\infty$ be a sequence of Bernoulli random variables with $P(Z_n = \pm 1) = \frac{1}{2}$ and let $S_0 = 0$ and $S_n := Z_1 + \cdots + Z_n$. Then $S$ becomes a martingale relative to the filtration, $\mathcal{B}_n := \sigma(Z_1, \dots, Z_n)$ with $\mathcal{B}_0 := \{\emptyset, \Omega\}$; of course $S_n$ is the (fair) simple random walk on $\mathbb{Z}$. For any $a \in \mathbb{Z}$, let
$$\tau_a := \inf\{n : S_n = a\}.$$
Exercise 18.21. For $a < 0 < b$ with $a, b \in \mathbb{Z}$, let $\tau = \tau_a \wedge \tau_b$. Explain why $\tau$ is regular for $S$. Use this to show $P(\tau = \infty) = 0$. Hint: make use of Remark 18.71 and the fact that $|S_n - S_{n-1}| = |Z_n| = 1$ for all $n$.
Exercise 18.22. In this exercise, you are asked to use the central limit Theorem 10.35 to prove again that $P(\tau = \infty) = 0$ from Exercise 18.21. Hints: Use the central limit theorem to show
$$\frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x)\, e^{-x^2/2}\, dx \ge f(0)\, P(\tau = \infty) \quad (18.68)$$
for all $f \in C^3(\mathbb{R} \to [0,\infty))$ with $M := \sup_{x\in\mathbb{R}} \left|f^{(3)}(x)\right| < \infty$. Use this inequality to conclude that $P(\tau = \infty) = 0$.
Exercise 18.23. Show
$$ P(\tau_b < \tau_a) = \frac{|a|}{b + |a|} \quad (18.69) $$
and use this to conclude $P(\tau_b < \infty) = 1$, i.e. every $b \in \mathbb{N}$ is almost surely visited by $S_n$. (This last result also follows by the Hewitt-Savage Zero-One Law, see Example 10.55 where it is shown $b$ is visited infinitely often.)

Hint: Using properties of martingales and Exercise 18.21, compute $\lim_{n\to\infty} \mathbb{E}\left[S_{\tau_a \wedge \tau_b \wedge n}\right]$ in two different ways.
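The hitting probability in Eq. (18.69) can be checked numerically without any simulation: $h(x) := P(\tau_b < \tau_a \mid S_0 = x)$ is discrete harmonic, $h(x) = \frac{1}{2}(h(x+1) + h(x-1))$, with boundary values $h(a) = 0$ and $h(b) = 1$. A hedged sketch (the endpoints $a = -3$, $b = 5$ are illustrative choices, not values from the text):

```python
# Solve the discrete Dirichlet problem for the fair walk by
# Gauss-Seidel iteration and compare with Eq. (18.69).
def hit_prob(a, b, sweeps=20000):
    h = {x: 0.0 for x in range(a, b + 1)}
    h[b] = 1.0                       # boundary condition h(b) = 1
    for _ in range(sweeps):          # Gauss-Seidel sweeps
        for x in range(a + 1, b):
            h[x] = 0.5 * (h[x + 1] + h[x - 1])
    return h

h = hit_prob(-3, 5)
print(h[0])  # should agree with |a|/(b + |a|) = 3/8
```

The exact solution of the boundary value problem is linear in $x$, which is why the martingale argument of the exercise produces the same number.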
Exercise 18.24. Let $\tau := \tau_a \wedge \tau_b$. In this problem you are asked to show $\mathbb{E}[\tau] = |a|\,b$ with the aid of the following outline.
1. Use Exercise 18.4 above to conclude $N_n := S_n^2 - n$ is a martingale.
2. Now show
$$ 0 = \mathbb{E}N_0 = \mathbb{E}N_{\tau \wedge n} = \mathbb{E}S_{\tau \wedge n}^2 - \mathbb{E}[\tau \wedge n]. \quad (18.70) $$
3. Now use DCT and MCT along with Exercise 18.23 to compute the limit as $n \to \infty$ in Eq. (18.70) to find
$$ \mathbb{E}[\tau_a \wedge \tau_b] = \mathbb{E}[\tau] = b\,|a|. \quad (18.71) $$
4. By considering the limit, $a \to -\infty$, in Eq. (18.71), show $\mathbb{E}[\tau_b] = \infty$.
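The identity $\mathbb{E}[\tau] = |a|\,b$ can likewise be checked numerically: $m(x) := \mathbb{E}[\tau \mid S_0 = x]$ solves the discrete Poisson equation $m(x) = 1 + \frac{1}{2}(m(x+1) + m(x-1))$ with $m(a) = m(b) = 0$, whose closed form is $m(x) = (x-a)(b-x)$. A sketch with the same illustrative endpoints as before:

```python
# Mean exit time of (a, b) for the fair walk via Gauss-Seidel.
def mean_exit_time(a, b, sweeps=20000):
    m = {x: 0.0 for x in range(a, b + 1)}   # boundary values stay 0
    for _ in range(sweeps):
        for x in range(a + 1, b):
            m[x] = 1.0 + 0.5 * (m[x + 1] + m[x - 1])
    return m

m = mean_exit_time(-3, 5)
print(m[0])  # should agree with |a| * b = 15
```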
For the next group of exercises we are now going to suppose that $P(Z_n = 1) = p > \frac{1}{2}$ and $P(Z_n = -1) = q = 1 - p < \frac{1}{2}$. As before let $\mathcal{B}_n = \sigma(Z_1, \dots, Z_n)$, $S_0 = 0$ and $S_n = Z_1 + \dots + Z_n$ for $n \in \mathbb{N}$. Let us review the method above and what you did in Exercise 17.10 above.

In order to follow the procedures above, we start by looking for a function, $\varphi$, such that $\varphi(S_n)$ is a martingale. Such a function must satisfy,
$$ \varphi(S_n) = \mathbb{E}_{\mathcal{B}_n}\varphi(S_{n+1}) = \varphi(S_n + 1)\,p + \varphi(S_n - 1)\,q, $$
and this then leads us to try to solve the following difference equation for $\varphi$;
$$ \varphi(x) = p\,\varphi(x+1) + q\,\varphi(x-1) \text{ for all } x \in \mathbb{Z}. \quad (18.72) $$
Similar to the theory of second order ODEs, this equation has two linearly independent solutions which could be found by solving Eq. (18.72) with initial conditions $\varphi(0) = 1$, $\varphi(1) = 0$ and then with $\varphi(0) = 0$, $\varphi(1) = 1$, for example. Rather than doing this, motivated by second order constant coefficient ODEs, let us try to find solutions of the form $\varphi(x) = \lambda^x$ with $\lambda$ to be determined. Doing so leads to the equation $\lambda^x = p\,\lambda^{x+1} + q\,\lambda^{x-1}$, or equivalently to the characteristic equation,
$$ p\,\lambda^2 - \lambda + q = 0. $$
The solutions to this equation are
$$ \lambda = \frac{1 \pm \sqrt{1 - 4pq}}{2p} = \frac{1 \pm \sqrt{1 - 4p(1-p)}}{2p} = \frac{1 \pm \sqrt{4p^2 - 4p + 1}}{2p} = \frac{1 \pm \sqrt{(2p-1)^2}}{2p} = \{1, (1-p)/p\} = \{1, q/p\}. $$
The most general solution to Eq. (18.72) is then given by
$$ \varphi(x) = A + B\,(q/p)^x. $$
Below we will take $A = 0$ and $B = 1$. As before let $\tau_a = \inf\{n \ge 0 : S_n = a\}$.
Exercise 18.25. Let $a < 0 < b$ and $\tau := \tau_a \wedge \tau_b$.
1. Apply the method in Exercise 18.21 with $S_n$ replaced by $M_n := (q/p)^{S_n}$ to show $P(\tau = \infty) = 0$.
2. Now use the method in Exercise 18.23 to show
$$ P(\tau_a < \tau_b) = \frac{(q/p)^b - 1}{(q/p)^b - (q/p)^a}. \quad (18.73) $$
3. By letting $a \to -\infty$ in Eq. (18.73), conclude $P(\tau_b = \infty) = 0$.
4. By letting $b \to \infty$ in Eq. (18.73), conclude $P(\tau_a < \infty) = (q/p)^{|a|}$.
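Eq. (18.73) can be verified numerically by first-step analysis: $h(x) := P(\tau_a < \tau_b \mid S_0 = x)$ solves $h(x) = p\,h(x+1) + q\,h(x-1)$ with $h(a) = 1$, $h(b) = 0$. A sketch (the values $p = 0.6$, $a = -3$, $b = 5$ are illustrative, not from the text):

```python
# Solve the biased-walk Dirichlet problem by Gauss-Seidel and
# compare against the closed form of Eq. (18.73).
def biased_ruin(p, a, b, sweeps=30000):
    q = 1.0 - p
    h = {x: 0.0 for x in range(a, b + 1)}
    h[a] = 1.0                        # boundary condition h(a) = 1
    for _ in range(sweeps):
        for x in range(a + 1, b):
            h[x] = p * h[x + 1] + q * h[x - 1]
    return h

p, a, b = 0.6, -3, 5
r = (1 - p) / p
print(biased_ruin(p, a, b)[0], (r**b - 1) / (r**b - r**a))  # should agree
```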
Exercise 18.26. Verify,
$$ M_n := S_n - n(p - q) $$
and
$$ N_n := M_n^2 - \sigma^2 n $$
are martingales, where $\sigma^2 = 1 - (p-q)^2$. (This should be simple; see either Exercise 18.4 or Exercise 18.3.)

Exercise 18.27. Using Exercise 18.26, show
$$ \mathbb{E}(\tau_a \wedge \tau_b) = \frac{b\left[1 - (q/p)^a\right] + a\left[(q/p)^b - 1\right]}{(q/p)^b - (q/p)^a}\,(p - q)^{-1}. \quad (18.74) $$
By considering the limit of this equation as $a \to -\infty$, show
$$ \mathbb{E}[\tau_b] = \frac{b}{p - q}, $$
and by considering the limit as $b \to \infty$, show $\mathbb{E}[\tau_a] = \infty$.
18.11 Appendix: Some Alternate Proofs

This section may be safely omitted (for now).

Proof. (Alternate proof of Theorem 18.39.) Let $A \in \mathcal{B}_\sigma$. Then, writing $\Delta_{k+1}X := X_{k+1} - X_k$,
$$ \mathbb{E}[X_\tau - X_\sigma : A] = \mathbb{E}\Big[\sum_{k=0}^{N-1} 1_{\{\sigma \le k < \tau\}}\,\Delta_{k+1}X : A\Big] = \sum_{k=0}^{N-1} \mathbb{E}\left[\Delta_{k+1}X : A \cap \{\sigma \le k < \tau\}\right]. $$
Since $A \in \mathcal{B}_\sigma$, $A \cap \{\sigma \le k\} \in \mathcal{B}_k$, and since $\{k < \tau\} = \{\tau \le k\}^c \in \mathcal{B}_k$, it follows that $A \cap \{\sigma \le k\} \cap \{k < \tau\} \in \mathcal{B}_k$. Hence we know that
$$ \mathbb{E}\left[\Delta_{k+1}X : A \cap \{\sigma \le k < \tau\}\right] \ge 0, \ = 0, \text{ or } \le 0 \text{ respectively} $$
(in the submartingale, martingale, and supermartingale cases), and hence that
$$ \mathbb{E}[X_\tau - X_\sigma : A] \ge 0, \ = 0, \text{ or } \le 0 \text{ respectively.} $$
Since this is true for all $A \in \mathcal{B}_\sigma$, Eq. (18.21) follows.
Lemma 18.80. Suppose $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ is a filtered probability space, $1 \le p < \infty$, and let $\mathcal{B}_\infty := \vee_{n=1}^\infty \mathcal{B}_n := \sigma(\cup_{n=1}^\infty \mathcal{B}_n)$. Then $\cup_{n=1}^\infty L^p(\Omega, \mathcal{B}_n, P)$ is dense in $L^p(\Omega, \mathcal{B}_\infty, P)$.
Proof. Let $M_n := L^p(\Omega, \mathcal{B}_n, P)$; then $\{M_n\}$ is an increasing sequence of closed subspaces of $M_\infty = L^p(\Omega, \mathcal{B}_\infty, P)$. Further, let $\mathbb{A}$ be the algebra of functions consisting of those $f \in \cup_{n=1}^\infty M_n$ such that $f$ is bounded. As a consequence of the density Theorem 12.27, we know that $\mathbb{A}$, and hence $\cup_{n=1}^\infty M_n$, is dense in $M_\infty = L^p(\Omega, \mathcal{B}_\infty, P)$. This completes the proof. However, for the reader's convenience let us quickly review the proof of Theorem 12.27 in this context.

Let $\mathbb{H}$ denote those bounded $\mathcal{B}_\infty$ measurable functions, $f : \Omega \to \mathbb{R}$, for which there exists $\{\varphi_n\}_{n=1}^\infty \subset \mathbb{A}$ such that $\lim_{n\to\infty}\|f - \varphi_n\|_{L^p(P)} = 0$. A routine check shows $\mathbb{H}$ is a subspace of the bounded $\mathcal{B}_\infty$ measurable $\mathbb{R}$-valued functions on $\Omega$, $1 \in \mathbb{H}$, $\mathbb{A} \subset \mathbb{H}$, and $\mathbb{H}$ is closed under bounded convergence. To verify the latter assertion, suppose $f_n \in \mathbb{H}$ and $f_n \to f$ boundedly. Then, by the dominated (or bounded) convergence theorem, $\lim_{n\to\infty}\|f - f_n\|_{L^p(P)} = 0$.^9 We may now choose $\varphi_n \in \mathbb{A}$ such that $\|\varphi_n - f_n\|_{L^p(P)} \le \frac{1}{n}$; then
$$ \limsup_{n\to\infty}\|f - \varphi_n\|_{L^p(P)} \le \limsup_{n\to\infty}\|f - f_n\|_{L^p(P)} + \limsup_{n\to\infty}\|f_n - \varphi_n\|_{L^p(P)} = 0, $$
which implies $f \in \mathbb{H}$.

An application of Dynkin's Multiplicative System Theorem 8.16 now shows $\mathbb{H}$ contains all bounded $\sigma(\mathbb{A}) = \mathcal{B}_\infty$ measurable functions on $\Omega$. Since for any $f \in L^p(\Omega, \mathcal{B}_\infty, P)$ we have $f\,1_{\{|f| \le n\}} \in \mathbb{H}$, there exists $\varphi_n \in \mathbb{A}$ such that $\|f\,1_{\{|f| \le n\}} - \varphi_n\|_p \le n^{-1}$. Using the DCT we know that $f\,1_{\{|f| \le n\}} \to f$ in $L^p$, and therefore by Minkowski's inequality it follows that $\varphi_n \to f$ in $L^p$.
Theorem 18.81. Suppose $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ is a filtered probability space, $1 \le p < \infty$, and let $\mathcal{B}_\infty := \vee_{n=1}^\infty \mathcal{B}_n := \sigma(\cup_{n=1}^\infty \mathcal{B}_n)$. Then for every $X \in L^p(\Omega, \mathcal{B}, P)$, $X_n = \mathbb{E}[X|\mathcal{B}_n]$ is a martingale and $X_n \to X_\infty := \mathbb{E}[X|\mathcal{B}_\infty]$ in $L^p(\Omega, \mathcal{B}_\infty, P)$ as $n \to \infty$.
Proof. We have already seen in Example 18.6 that $X_n = \mathbb{E}[X|\mathcal{B}_n]$ is always a martingale. Since conditional expectation is a contraction on $L^p$, it follows that $\mathbb{E}|X_n|^p \le \mathbb{E}|X|^p < \infty$ for all $n \in \bar{\mathbb{N}}$. So to finish the proof we need to show $X_n \to X_\infty$ in $L^p(\Omega, \mathcal{B}, P)$ as $n \to \infty$.

Let $M_n := L^p(\Omega, \mathcal{B}_n, P)$ and $M_\infty = L^p(\Omega, \mathcal{B}_\infty, P)$. If $X \in \cup_{n=1}^\infty M_n$, then $X_n = X$ for all sufficiently large $n$ and for $n = \infty$. Now suppose that $X \in M_\infty$ and $Y \in \cup_{n=1}^\infty M_n$. Then
$$ \|\mathbb{E}_{\mathcal{B}_\infty}X - \mathbb{E}_{\mathcal{B}_n}X\|_p \le \|\mathbb{E}_{\mathcal{B}_\infty}X - \mathbb{E}_{\mathcal{B}_\infty}Y\|_p + \|\mathbb{E}_{\mathcal{B}_\infty}Y - \mathbb{E}_{\mathcal{B}_n}Y\|_p + \|\mathbb{E}_{\mathcal{B}_n}Y - \mathbb{E}_{\mathcal{B}_n}X\|_p \le 2\,\|X - Y\|_p + \|\mathbb{E}_{\mathcal{B}_\infty}Y - \mathbb{E}_{\mathcal{B}_n}Y\|_p, $$
and hence
$$ \limsup_{n\to\infty}\|\mathbb{E}_{\mathcal{B}_\infty}X - \mathbb{E}_{\mathcal{B}_n}X\|_p \le 2\,\|X - Y\|_p. $$
Using the density Lemma 18.80, we may choose $Y \in \cup_{n=1}^\infty M_n$ as close to $X \in M_\infty$ as we please, and therefore it follows that $\limsup_{n\to\infty}\|\mathbb{E}_{\mathcal{B}_\infty}X - \mathbb{E}_{\mathcal{B}_n}X\|_p = 0$.

For general $X \in L^p(\Omega, \mathcal{B}, P)$, it suffices to observe that $X_\infty := \mathbb{E}[X|\mathcal{B}_\infty] \in L^p(\Omega, \mathcal{B}_\infty, P)$ and, by the tower property of conditional expectations,
$$ \mathbb{E}[X_\infty|\mathcal{B}_n] = \mathbb{E}[\mathbb{E}[X|\mathcal{B}_\infty]|\mathcal{B}_n] = \mathbb{E}[X|\mathcal{B}_n] = X_n. $$
So again $X_n \to X_\infty$ in $L^p$ as desired.

^9 It is at this point that the proof would break down if $p = \infty$.
We are now ready to prove the converse of Theorem 18.81.

Theorem 18.82. Suppose $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ is a filtered probability space, $1 \le p < \infty$, $\mathcal{B}_\infty := \vee_{n=1}^\infty \mathcal{B}_n := \sigma(\cup_{n=1}^\infty \mathcal{B}_n)$, and $\{X_n\}_{n=1}^\infty \subset L^p(\Omega, \mathcal{B}, P)$ is a martingale. Further assume that $\sup_n \|X_n\|_p < \infty$ and that $\{X_n\}_{n=1}^\infty$ is uniformly integrable if $p = 1$. Then there exists $X_\infty \in L^p(\Omega, \mathcal{B}_\infty, P)$ such that $X_n = \mathbb{E}[X_\infty|\mathcal{B}_n]$. Moreover, by Theorem 18.81 we know that $X_n \to X_\infty$ in $L^p(\Omega, \mathcal{B}_\infty, P)$ as $n \to \infty$, and hence $X_\infty$ is uniquely determined by $\{X_n\}_{n=1}^\infty$.

Proof. By Theorems 13.20 and 13.22 there exists $X_\infty \in L^p(\Omega, \mathcal{B}_\infty, P)$ and a subsequence, $Y_k = X_{n_k}$, such that
$$ \lim_{k\to\infty}\mathbb{E}[Y_k h] = \mathbb{E}[X_\infty h] \text{ for all } h \in L^q(\Omega, \mathcal{B}_\infty, P), $$
where $q := p\,(p-1)^{-1}$. Using the martingale property, if $h \in (\mathcal{B}_n)_b$ for some $n$, it follows that $\mathbb{E}[Y_k h] = \mathbb{E}[X_n h]$ for all large $k$, and therefore that
$$ \mathbb{E}[X_\infty h] = \mathbb{E}[X_n h] \text{ for all } h \in (\mathcal{B}_n)_b. $$
This implies that $X_n = \mathbb{E}[X_\infty|\mathcal{B}_n]$ as desired.
Theorem 18.83 (Almost sure convergence). Suppose $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ is a filtered probability space, $1 \le p < \infty$, and let $\mathcal{B}_\infty := \vee_{n=1}^\infty \mathcal{B}_n := \sigma(\cup_{n=1}^\infty \mathcal{B}_n)$. Then for every $X \in L^1(\Omega, \mathcal{B}, P)$, the martingale $X_n = \mathbb{E}[X|\mathcal{B}_n]$ converges almost surely to $X_\infty := \mathbb{E}[X|\mathcal{B}_\infty]$.

Before starting the proof, recall from Proposition 1.5 that if $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ are two bounded sequences, then
$$ \limsup_{n\to\infty}(a_n + b_n) - \liminf_{n\to\infty}(a_n + b_n) \le \limsup_{n\to\infty}a_n + \limsup_{n\to\infty}b_n - \Big(\liminf_{n\to\infty}a_n + \liminf_{n\to\infty}b_n\Big) = \limsup_{n\to\infty}a_n - \liminf_{n\to\infty}a_n + \limsup_{n\to\infty}b_n - \liminf_{n\to\infty}b_n. \quad (18.75) $$
Proof. Since
$$ X_n = \mathbb{E}[X|\mathcal{B}_n] = \mathbb{E}[\mathbb{E}[X|\mathcal{B}_\infty]|\mathcal{B}_n] = \mathbb{E}[X_\infty|\mathcal{B}_n], $$
there is no loss in generality in assuming $X = X_\infty$. If $X \in M_n := L^1(\Omega, \mathcal{B}_n, P)$, then $X_m = X_\infty$ a.s. for all $m \ge n$, and hence $X_m \to X_\infty$ a.s. Therefore the theorem is valid for any $X$ in the dense (by Lemma 18.80) subspace $\cup_{n=1}^\infty M_n$ of $L^1(\Omega, \mathcal{B}_\infty, P)$.

For general $X \in L^1(\Omega, \mathcal{B}_\infty, P)$, let $Y_j \in \cup_n M_n$ be such that $Y_j \to X$ in $L^1(\Omega, \mathcal{B}_\infty, P)$, and let $Y_{j,n} := \mathbb{E}[Y_j|\mathcal{B}_n]$ and $X_n := \mathbb{E}[X|\mathcal{B}_n]$. We know that $Y_{j,n} \to Y_{j,\infty}$ a.s. for each $j \in \mathbb{N}$, and our goal is to show $X_n \to X_\infty$ a.s. By Doob's inequality in Corollary 18.47 and the $L^1$-contraction property of conditional expectation, we know that
$$ P(X_N^* \ge a) \le \frac{1}{a}\,\mathbb{E}|X_N| \le \frac{1}{a}\,\mathbb{E}|X|, $$
and so, passing to the limit as $N \to \infty$, we learn that
$$ P\Big(\sup_n |X_n| \ge a\Big) \le \frac{1}{a}\,\mathbb{E}|X| \text{ for all } a > 0. \quad (18.76) $$
Letting $a \uparrow \infty$ then shows $P(\sup_n |X_n| = \infty) = 0$, and hence $\sup_n |X_n| < \infty$ a.s. Hence we may use Eq. (18.75) with $a_n = X_n - Y_{j,n}$ and $b_n := Y_{j,n}$ to find
$$ D := \limsup_{n\to\infty}X_n - \liminf_{n\to\infty}X_n \le \limsup_{n\to\infty}a_n - \liminf_{n\to\infty}a_n + \limsup_{n\to\infty}b_n - \liminf_{n\to\infty}b_n = \limsup_{n\to\infty}a_n - \liminf_{n\to\infty}a_n \le 2\,\sup_n|a_n| = 2\,\sup_n|X_n - Y_{j,n}|, $$
wherein we have used $\limsup_{n\to\infty}b_n - \liminf_{n\to\infty}b_n = 0$ a.s., since $Y_{j,n} \to Y_{j,\infty}$ a.s.

We now apply Doob's inequality one more time, i.e. use Eq. (18.76) with $X_n$ replaced by $X_n - Y_{j,n}$ and $X$ by $X - Y_j$, to conclude,
$$ P(D \ge a) \le P\Big(\sup_n|X_n - Y_{j,n}| \ge \frac{a}{2}\Big) \le \frac{2}{a}\,\mathbb{E}|X - Y_j| \to 0 \text{ as } j \to \infty. $$
Since $a > 0$ is arbitrary here, it follows that $D = 0$ a.s., i.e. $\limsup_{n\to\infty}X_n = \liminf_{n\to\infty}X_n$, and hence $\lim_{n\to\infty}X_n$ exists in $\mathbb{R}$ almost surely. Since we already know that $X_n \to X_\infty$ in $L^1(\Omega, \mathcal{B}, P)$, we may conclude that $\lim_{n\to\infty}X_n = X_\infty$ a.s.
Alternative proof (see Stroock [64, Corollary 5.2.7]). Let $\mathbb{H}$ denote those $X \in L^1(\Omega, \mathcal{B}_\infty, P)$ such that $X_n := \mathbb{E}[X|\mathcal{B}_n] \to X$ a.s. As we saw above, $\mathbb{H}$ contains the dense subspace $\cup_{n=1}^\infty M_n$. It is also easy to see that $\mathbb{H}$ is a linear space. Thus it suffices to show that $\mathbb{H}$ is closed in $L^1(P)$. To prove this, let $X^{(k)} \in \mathbb{H}$ with $X^{(k)} \to X$ in $L^1(P)$, and let $X_n^{(k)} := \mathbb{E}\big[X^{(k)}|\mathcal{B}_n\big]$. Then by the maximal inequality in Eq. (18.76),
$$ P\Big(\sup_n\big|X_n - X_n^{(k)}\big| \ge a\Big) \le \frac{1}{a}\,\mathbb{E}\big|X - X^{(k)}\big| \text{ for all } a > 0 \text{ and } k \in \mathbb{N}. $$
Therefore,
$$ P\Big(\sup_{n\ge N}|X - X_n| \ge 3a\Big) \le P\big(\big|X - X^{(k)}\big| \ge a\big) + P\Big(\sup_{n\ge N}\big|X^{(k)} - X_n^{(k)}\big| \ge a\Big) + P\Big(\sup_{n\ge N}\big|X_n^{(k)} - X_n\big| \ge a\Big) \le \frac{2}{a}\,\mathbb{E}\big|X - X^{(k)}\big| + P\Big(\sup_{n\ge N}\big|X^{(k)} - X_n^{(k)}\big| \ge a\Big), $$
and hence
$$ \limsup_{N\to\infty}P\Big(\sup_{n\ge N}|X - X_n| \ge 3a\Big) \le \frac{2}{a}\,\mathbb{E}\big|X - X^{(k)}\big| \to 0 \text{ as } k \to \infty. $$
Thus we have shown
$$ \limsup_{N\to\infty}P\Big(\sup_{n\ge N}|X - X_n| \ge 3a\Big) = 0 \text{ for all } a > 0. $$
Since
$$ \Big\{\limsup_{n\to\infty}|X - X_n| \ge 3a\Big\} \subset \Big\{\sup_{n\ge N}|X - X_n| \ge 3a\Big\} \text{ for all } N, $$
it follows that
$$ P\Big(\limsup_{n\to\infty}|X - X_n| \ge 3a\Big) = 0 \text{ for all } a > 0, $$
and therefore $\limsup_{n\to\infty}|X - X_n| = 0$ ($P$-a.s.), which shows that $X \in \mathbb{H}$. (This proof works equally well in the case that $X$ is a Banach valued random variable. One only needs to replace the absolute values in the proof by the Banach norm.)
19 Some Martingale Examples and Applications
Exercise 19.1. Let $S_n$ be the total assets of an insurance company in year $n \in \mathbb{N}_0$. Assume $S_0 > 0$ is a constant and that for all $n \ge 1$, $S_n = S_{n-1} + \xi_n$, where $\xi_n = c - Z_n$ and $\{Z_n\}_{n=1}^\infty$ are i.i.d. random variables having the normal distribution with mean $\mu < c$ and variance $\sigma^2$. (The number $c$ is to be interpreted as the yearly premium.) Let $R = \{S_n \le 0 \text{ for some } n\}$ be the event that the company eventually becomes bankrupt, i.e. is Ruined. Show
$$ P(\text{Ruin}) = P(R) \le e^{-2(c-\mu)S_0/\sigma^2}. $$

Solution to Exercise (19.1). Let us first find $\lambda$ such that $1 = \mathbb{E}\big[e^{-\lambda\xi_n}\big]$. To do this let $N$ be a standard normal random variable, in which case
$$ 1 \overset{\text{set}}{=} \mathbb{E}\big[e^{-\lambda\xi_n}\big] = \mathbb{E}\big[e^{-\lambda(c-\mu-\sigma N)}\big] = e^{-\lambda(c-\mu)}\,e^{\lambda^2\sigma^2/2}, $$
which leads to the equation for $\lambda$;
$$ \frac{\lambda^2\sigma^2}{2} - \lambda(c-\mu) = 0. $$
Hence we should take $\lambda = 2(c-\mu)/\sigma^2$ (the other solution, $\lambda = 0$, is uninteresting). Since $\mathbb{E}\big[e^{-\lambda\xi_n}\big] = 1$, we know from Example 18.11 that
$$ Y_n := \exp(-\lambda S_n) = e^{-\lambda S_0}\prod_{j=1}^n e^{-\lambda\xi_j} $$
is a non-negative $\mathcal{B}_n = \sigma(Z_1, \dots, Z_n)$ martingale. By the super-martingale or the sub-martingale convergence theorem (see Corollaries 18.63 and 18.54), it follows that $\lim_{n\to\infty}Y_n = Y_\infty$ exists a.s. Thus if $\tau$ is any stopping time we will have;
$$ \mathbb{E}Y_\tau = \mathbb{E}\lim_{n\to\infty}Y_{\tau\wedge n} \le \liminf_{n\to\infty}\mathbb{E}Y_{\tau\wedge n} = \mathbb{E}Y_0 = e^{-\lambda S_0}, $$
as follows from Fatou's Lemma and the optional sampling Theorem 18.39. If $\tau = \inf\{n : S_n \le 0\}$ is the time of the company's ruin, we have $S_\tau \le 0$ on $R = \{\tau < \infty\}$, and because $-\lambda < 0$ and $S_\tau \le 0$ on $R$, it follows that $Y_\tau = e^{-\lambda S_\tau} \ge 1$ on $R$. This leads to the desired estimate;
$$ P(R) \le \mathbb{E}[Y_\tau : \tau < \infty] \le \mathbb{E}Y_\tau \le e^{-\lambda S_0} = e^{-2(c-\mu)S_0/\sigma^2}. $$
Observe that, by the strong law of large numbers, $\lim_{n\to\infty}\frac{S_n}{n} = \mathbb{E}\xi_1 = c - \mu > 0$ a.s. Thus for large $n$ we have $S_n \sim n(c-\mu)$ as $n \to \infty$. The question we have addressed is what happens to $S_n$ for intermediate values; in particular, what is the likelihood that $S_n$ makes a sufficiently large deviation from the typical value of $n(c-\mu)$ in order for the company to go bankrupt.
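The choice of the exponent above can be checked numerically: the condition $1 = \mathbb{E}[e^{-\lambda\xi}]$ is equivalent to $f(\lambda) := -\lambda(c-\mu) + \lambda^2\sigma^2/2 = 0$, whose positive root is $\lambda = 2(c-\mu)/\sigma^2$. A hedged sketch (the values of $c$, $\mu$, $\sigma$ are illustrative):

```python
# Locate the positive root of the cumulant equation by bisection and
# compare with the closed form 2(c - mu)/sigma^2 from the solution.
def ruin_exponent(c, mu, sigma, lo=1e-9, hi=100.0, iters=200):
    f = lambda lam: -lam * (c - mu) + 0.5 * lam**2 * sigma**2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:   # f < 0 strictly between the roots 0 and lambda
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c, mu, sigma = 2.0, 1.0, 1.5
print(ruin_exponent(c, mu, sigma), 2 * (c - mu) / sigma**2)  # should agree
```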
19.1 A Polya Urn Model

In this section we are going to analyze the long run behavior of the Polya urn Markov process which was introduced in Exercise 17.4. Recall that if the urn contains $r$ red balls and $g$ green balls at a given time, we draw one of these balls at random, replace it, and add $c$ more balls of the same color drawn. Let $(r_n, g_n)$ be the number of red and green balls in the urn at time $n$. Then we have
$$ P((r_{n+1}, g_{n+1}) = (r + c, g)\,|\,(r_n, g_n) = (r, g)) = \frac{r}{r+g} $$
and
$$ P((r_{n+1}, g_{n+1}) = (r, g + c)\,|\,(r_n, g_n) = (r, g)) = \frac{g}{r+g}. $$
Let us observe that $r_n + g_n = r_0 + g_0 + nc$, and hence if we let $X_n$ be the fraction of green balls in the urn at time $n$, then
$$ X_n := \frac{g_n}{r_n + g_n} = \frac{g_n}{r_0 + g_0 + nc}. $$
We now claim that $\{X_n\}_{n=0}^\infty$ is a martingale relative to
$$ \mathcal{B}_n := \sigma((r_k, g_k) : k \le n) = \sigma(X_k : k \le n). $$
Indeed,
$$ \mathbb{E}[X_{n+1}|\mathcal{B}_n] = \mathbb{E}[X_{n+1}|X_n] = \frac{r_n}{r_n + g_n}\cdot\frac{g_n}{r_n + g_n + c} + \frac{g_n}{r_n + g_n}\cdot\frac{g_n + c}{r_n + g_n + c} = \frac{g_n}{r_n + g_n}\cdot\frac{r_n + g_n + c}{r_n + g_n + c} = X_n. $$
Since $X_n \ge 0$ and $\mathbb{E}X_n = \mathbb{E}X_0 < \infty$ for all $n$, it follows by Corollary 18.54 that $X_\infty := \lim_{n\to\infty}X_n$ exists a.s. The distribution of $X_\infty$ is described in the next theorem.
Theorem 19.1. Let $\gamma := g/c$, $\beta := r/c$, and $\mu := \operatorname{Law}_P(X_\infty)$. Then $\mu$ is the beta distribution on $[0,1]$ with parameters $\gamma, \beta$, i.e.
$$ d\mu(x) = \frac{\Gamma(\gamma+\beta)}{\Gamma(\gamma)\,\Gamma(\beta)}\,x^{\gamma-1}(1-x)^{\beta-1}\,dx \text{ for } x \in [0,1]. \quad (19.1) $$

Proof. We will begin by computing the distribution of $X_n$. As an example, the probability of drawing 3 greens and then 2 reds is
$$ \frac{g}{r+g}\cdot\frac{g+c}{r+g+c}\cdot\frac{g+2c}{r+g+2c}\cdot\frac{r}{r+g+3c}\cdot\frac{r+c}{r+g+4c}. $$
More generally, the probability of first drawing $m$ greens and then $n-m$ reds is
$$ \frac{g(g+c)\cdots(g+(m-1)c)\cdot r(r+c)\cdots(r+(n-m-1)c)}{(r+g)(r+g+c)\cdots(r+g+(n-1)c)}. $$
Since this is the same probability for any of the $\binom{n}{m}$ ways of drawing $m$ greens and $n-m$ reds in $n$ draws, we have
$$ P(\text{Draw } m \text{ greens}) = \binom{n}{m}\frac{g(g+c)\cdots(g+(m-1)c)\,r(r+c)\cdots(r+(n-m-1)c)}{(r+g)(r+g+c)\cdots(r+g+(n-1)c)} = \binom{n}{m}\frac{\gamma(\gamma+1)\cdots(\gamma+m-1)\,\beta(\beta+1)\cdots(\beta+n-m-1)}{(\gamma+\beta)(\gamma+\beta+1)\cdots(\gamma+\beta+n-1)}. \quad (19.2) $$
Before going to the general case let us warm up with the special case $g = r = c = 1$. In this case Eq. (19.2) becomes,
$$ P(\text{Draw } m \text{ greens}) = \binom{n}{m}\frac{1\cdot2\cdots m\cdot1\cdot2\cdots(n-m)}{2\cdot3\cdots(n+1)} = \frac{1}{n+1}. $$
On the set $\{\text{Draw } m \text{ greens}\}$ we have $X_n = \frac{1+m}{2+n}$, and hence it follows for any $f \in C([0,1])$ that
$$ \mathbb{E}[f(X_n)] = \sum_{m=0}^n f\Big(\frac{m+1}{n+2}\Big)\,P(\text{Draw } m \text{ greens}) = \sum_{m=0}^n f\Big(\frac{m+1}{n+2}\Big)\frac{1}{n+1}. $$
Therefore
$$ \mathbb{E}[f(X_\infty)] = \lim_{n\to\infty}\mathbb{E}[f(X_n)] = \int_0^1 f(x)\,dx \quad (19.3) $$
and hence we may conclude that $X_\infty$ has the uniform distribution on $[0,1]$.
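The warm-up claim $P(\text{Draw } m \text{ greens}) = 1/(n+1)$ can be checked exactly (no simulation, no rounding) by a small dynamic program over the urn chain; the code below uses rational arithmetic, and $g$, $r$, $c$, $n$ are the parameters of the general model:

```python
from fractions import Fraction

# p[m] = P(exactly m greens drawn in n draws) for the Polya urn with
# initial counts g, r and reinforcement c, computed step by step.
def green_count_distribution(g, r, c, n):
    p = {0: Fraction(1)}              # greens drawn so far -> probability
    for step in range(n):
        total = g + r + step * c      # balls in the urn before this draw
        q = {}
        for m, prob in p.items():
            # draw green: urn holds g + m*c greens at this point
            q[m + 1] = q.get(m + 1, Fraction(0)) + prob * Fraction(g + m * c, total)
            # draw red: urn holds r + (step - m)*c reds
            q[m] = q.get(m, Fraction(0)) + prob * Fraction(r + (step - m) * c, total)
        p = q
    return p

p = green_count_distribution(1, 1, 1, 5)
print([p[m] for m in range(6)])  # six entries, each equal to 1/6
```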
For the general case, recall from Example 7.50 that $n! = \Gamma(n+1)$, $\Gamma(t+1) = t\,\Gamma(t)$, and therefore for $m \in \mathbb{N}$,
$$ \Gamma(x+m) = (x+m-1)(x+m-2)\cdots(x+1)\,x\,\Gamma(x). \quad (19.4) $$
Also recall Stirling's formula in Eq. (7.53) (also see Theorem 7.60), that
$$ \Gamma(x) = \sqrt{2\pi}\,x^{x-1/2}e^{-x}\,[1 + r(x)] \quad (19.5) $$
where $|r(x)| \to 0$ as $x \to \infty$. To finish the proof we will follow the strategy of the proof of Eq. (19.3), using Stirling's formula to estimate the expression for $P(\text{Draw } m \text{ greens})$ in Eq. (19.2).

On the set $\{\text{Draw } m \text{ greens}\}$ we have
$$ X_n = \frac{g + mc}{r + g + nc} = \frac{\gamma + m}{\gamma + \beta + n} =: x_m, $$
where $\beta := r/c$ and $\gamma := g/c$. For later use, notice that $\Delta_m x := x_{m+1} - x_m = \frac{1}{\gamma + \beta + n}$.

Using this notation we may rewrite Eq. (19.2) as
$$ P(\text{Draw } m \text{ greens}) = \binom{n}{m}\frac{\frac{\Gamma(\gamma+m)}{\Gamma(\gamma)}\cdot\frac{\Gamma(\beta+n-m)}{\Gamma(\beta)}}{\frac{\Gamma(\gamma+\beta+n)}{\Gamma(\gamma+\beta)}} = \frac{\Gamma(\gamma+\beta)}{\Gamma(\gamma)\,\Gamma(\beta)}\cdot\frac{\Gamma(n+1)}{\Gamma(m+1)\,\Gamma(n-m+1)}\cdot\frac{\Gamma(\gamma+m)\,\Gamma(\beta+n-m)}{\Gamma(\gamma+\beta+n)}. \quad (19.6) $$
Now by Stirling's formula,
$$ \frac{\Gamma(\gamma+m)}{\Gamma(m+1)} = \frac{(\gamma+m)^{\gamma+m-1/2}e^{-(\gamma+m)}\,[1+r(\gamma+m)]}{(1+m)^{m+1-1/2}e^{-(m+1)}\,[1+r(1+m)]} = (\gamma+m)^{\gamma-1}\Big(\frac{\gamma+m}{m+1}\Big)^{m+1/2}e^{-(\gamma-1)}\,\frac{1+r(\gamma+m)}{1+r(m+1)} = (\gamma+m)^{\gamma-1}\Big(\frac{1+\gamma/m}{1+1/m}\Big)^{m+1/2}e^{-(\gamma-1)}\,\frac{1+r(\gamma+m)}{1+r(m+1)}. $$
We will keep $m$ fairly large, so that
$$ \Big(\frac{1+\gamma/m}{1+1/m}\Big)^{m+1/2} = \exp\Big((m+1/2)\ln\Big(\frac{1+\gamma/m}{1+1/m}\Big)\Big) \cong \exp((m+1/2)(\gamma/m - 1/m)) \cong e^{\gamma-1}. $$
Hence we have
$$ \frac{\Gamma(\gamma+m)}{\Gamma(m+1)} \cong (\gamma+m)^{\gamma-1}. $$
Similarly, keeping $n - m$ fairly large, we also have
$$ \frac{\Gamma(\beta+n-m)}{\Gamma(n-m+1)} \cong (\beta+n-m)^{\beta-1} \quad \text{and} \quad \frac{\Gamma(\gamma+\beta+n)}{\Gamma(n+1)} \cong (\gamma+\beta+n)^{\gamma+\beta-1}. $$
Combining these estimates with Eq. (19.6) gives,
$$ P(\text{Draw } m \text{ greens}) \cong \frac{\Gamma(\gamma+\beta)}{\Gamma(\gamma)\,\Gamma(\beta)}\cdot\frac{(\gamma+m)^{\gamma-1}\,(\beta+n-m)^{\beta-1}}{(\gamma+\beta+n)^{\gamma+\beta-1}} = \frac{\Gamma(\gamma+\beta)}{\Gamma(\gamma)\,\Gamma(\beta)}\Big(\frac{\gamma+m}{\gamma+\beta+n}\Big)^{\gamma-1}\Big(\frac{\beta+n-m}{\gamma+\beta+n}\Big)^{\beta-1}\frac{1}{\gamma+\beta+n} = \frac{\Gamma(\gamma+\beta)}{\Gamma(\gamma)\,\Gamma(\beta)}\,(x_m)^{\gamma-1}(1-x_m)^{\beta-1}\,\Delta_m x. $$
Therefore, for any $f \in C([0,1])$, it follows that
$$ \mathbb{E}[f(X_\infty)] = \lim_{n\to\infty}\mathbb{E}[f(X_n)] = \lim_{n\to\infty}\sum_{m=0}^n f(x_m)\,\frac{\Gamma(\gamma+\beta)}{\Gamma(\gamma)\,\Gamma(\beta)}\,(x_m)^{\gamma-1}(1-x_m)^{\beta-1}\,\Delta_m x = \int_0^1 f(x)\,\frac{\Gamma(\gamma+\beta)}{\Gamma(\gamma)\,\Gamma(\beta)}\,x^{\gamma-1}(1-x)^{\beta-1}\,dx. $$
19.2 Galton Watson Branching Process

This section is taken from [15, p. 245-249]. Let $\{\xi_i^n : i, n \ge 1\}$ be a sequence of i.i.d. non-negative integer valued random variables. Suppose that $Z_n$ is the number of people in the $n^{\text{th}}$ generation and $\xi_1^{n+1}, \dots, \xi_{Z_n}^{n+1}$ are the numbers of offspring of the $Z_n$ people of generation $n$. Then
$$ Z_{n+1} = \xi_1^{n+1} + \dots + \xi_{Z_n}^{n+1} = \sum_{k=1}^\infty\big(\xi_1^{n+1} + \dots + \xi_k^{n+1}\big)\,1_{\{Z_n = k\}} \quad (19.7) $$
represents the number of people present in generation $n+1$. We complete the description of the process $Z_n$ by setting $Z_0 = 1$ and $Z_{n+1} = 0$ if $Z_n = 0$, i.e. once the population dies out it remains extinct forever after. The process $\{Z_n\}_{n\ge0}$ is called a Galton-Watson Branching process, see Figure 19.1.

Fig. 19.1. A possible realization of a Galton Watson tree.

To understand $Z_n$ a bit better, observe that $Z_1 = \xi_1^1$, $Z_2 = \xi_1^2 + \dots + \xi_{Z_1}^2$, $Z_3 = \xi_1^3 + \dots + \xi_{Z_2}^3$, etc. The sample path in Figure 19.1 corresponds to
$$ \xi_1^1 = 3; \quad \xi_1^2 = 2,\ \xi_2^2 = 0,\ \xi_3^2 = 3; \quad \xi_1^3 = \xi_2^3 = \xi_3^3 = \xi_4^3 = 0,\ \xi_5^3 = 4; \quad \text{and } \xi_1^4 = \xi_2^4 = \xi_3^4 = \xi_4^4 = 0. $$
We will use later the intuitive fact that the different branches of the Galton-Watson tree evolve independently of one another; you will be asked to make this precise in Exercise 19.4.
Let $\xi \overset{d}{=} \xi_i^m$ and let $p_k := P(\xi = k)$ be the offspring distribution,
$$ \mu := \mathbb{E}\xi = \sum_{k=0}^\infty k\,p_k, $$
which we assume to be finite. Let $\mathcal{B}_0 = \{\emptyset, \Omega\}$ and
$$ \mathcal{B}_n := \sigma(\xi_i^m : i \ge 1 \text{ and } 1 \le m \le n). $$
Making use of Eq. (19.7) and the independence of $\{\xi_i^{n+1}\}_{i=1}^\infty$ from $\mathcal{B}_n$, we find
$$ \mathbb{E}[Z_{n+1}|\mathcal{B}_n] = \mathbb{E}\Big[\sum_{k=1}^\infty\big(\xi_1^{n+1} + \dots + \xi_k^{n+1}\big)\,1_{\{Z_n = k\}}\Big|\mathcal{B}_n\Big] = \sum_{k=1}^\infty 1_{\{Z_n = k\}}\,\mathbb{E}\big[\big(\xi_1^{n+1} + \dots + \xi_k^{n+1}\big)\big|\mathcal{B}_n\big] = \sum_{k=1}^\infty 1_{\{Z_n = k\}}\,\mu k = \mu\,Z_n. \quad (19.8) $$
So we have shown $M_n := Z_n/\mu^n$ is a positive martingale, and since $M_0 = Z_0/\mu^0 = 1$ it follows that
$$ 1 = \mathbb{E}M_0 = \mathbb{E}M_n = \frac{\mathbb{E}Z_n}{\mu^n} \implies \mathbb{E}Z_n = \mu^n < \infty. \quad (19.9) $$

Theorem 19.2. If $\mu < 1$, then, almost surely, $Z_n = 0$ for a.a. $n$.

Proof. When $\mu < 1$, we have
$$ \mathbb{E}\sum_{n=0}^\infty Z_n = \sum_{n=0}^\infty \mu^n = \frac{1}{1-\mu} < \infty, $$
and therefore $\sum_{n=0}^\infty Z_n < \infty$ a.s. As $Z_n \in \mathbb{N}_0$ for all $n$, this can only happen if $Z_n = 0$ for almost all $n$, a.s.

Theorem 19.3. If $\mu = 1$ and $P(\xi_i^m = 1) < 1$,^1 then again, almost surely, $Z_n = 0$ for a.a. $n$.

^1 The assumption here is equivalent to $p_0 > 0$ and $\mu = 1$.

Proof. In this case $\{Z_n\}_{n=1}^\infty$ is a martingale which, being positive, is $L^1$-bounded. Therefore, $\lim_{n\to\infty}Z_n =: Z_\infty$ exists. Because $Z_n$ is integer valued, it must happen that $Z_n = Z_\infty$ for a.a. $n$. If $k \in \mathbb{N}$, since
$$ \{Z_\infty = k\} = \{Z_n = k \text{ a.a. } n\} = \cup_{N=1}^\infty\{Z_n = k \text{ for all } n \ge N\}, $$
we have
$$ P(Z_\infty = k) = \lim_{N\to\infty}P(Z_n = k \text{ for all } n \ge N). $$
However,
$$ P(Z_n = k \text{ for all } n \ge N) = P(\xi_1^n + \dots + \xi_k^n = k \text{ for all } n > N) = \big[P(\xi_1^n + \dots + \xi_k^n = k)\big]^\infty = 0, $$
because $P(\xi_1^n + \dots + \xi_k^n = k) < 1$. Indeed, since $p_1 = P(\xi = 1) < 1$ and $\mu = 1$, it follows that $p_l = P(\xi = l) > 0$ for some $l > 1$, and therefore $P(\xi_1^n + \dots + \xi_k^n = kl) > 0$, which then implies $P(\xi_1^n + \dots + \xi_k^n = k) < 1$. Therefore we have shown $P(Z_\infty = k) = 0$ for all $k > 0$; therefore $Z_\infty = 0$ a.s. and hence, almost surely, $Z_n = 0$ for a.a. $n$.

Remark 19.4. By the way, the branching process $\{Z_n\}_{n=0}^\infty$ with $\mu = 1$ and $P(\xi = 1) < 1$ gives a nice example of a non-regular martingale. Indeed, if $Z$ were regular, we would have
$$ Z_n = \mathbb{E}\Big[\lim_{m\to\infty}Z_m\Big|\mathcal{B}_n\Big] = \mathbb{E}[0|\mathcal{B}_n] = 0, $$
which is clearly false.

We now wish to consider the case where $\mu := \mathbb{E}[\xi_i^m] > 1$. Let $\xi \overset{d}{=} \xi_i^m$ and for $\lambda \in \mathbb{C}$ with $|\lambda| \le 1$ we let
$$ \varphi(\lambda) := \mathbb{E}\big[\lambda^\xi\big] = \sum_{k\ge0}p_k\lambda^k. $$
Notice that $\varphi(1) = 1$ and for $\lambda = s \in (-1, 1)$ we have
$$ \varphi'(s) = \sum_{k\ge0}k\,p_k\,s^{k-1} \quad \text{and} \quad \varphi''(s) = \sum_{k\ge0}k(k-1)\,p_k\,s^{k-2} \ge 0 $$
with
$$ \lim_{s\uparrow1}\varphi'(s) = \sum_{k\ge0}k\,p_k = \mathbb{E}[\xi] =: \mu \quad \text{and} \quad \lim_{s\uparrow1}\varphi''(s) = \sum_{k\ge0}k(k-1)\,p_k = \mathbb{E}[\xi(\xi-1)]. $$
Therefore $\varphi$ is convex with $\varphi(0) = p_0$, $\varphi(1) = 1$ and $\varphi'(1) = \mu$.
Fig. 19.2. Figure associated to $\varphi(s) = \frac{1}{8}\left(1 + 3s + 3s^2 + s^3\right)$, which is relevant for Exercise 3.13 of Durrett on p. 249. In this case $\rho \cong 0.23607$.

Lemma 19.5. If $\mu = \varphi'(1) > 1$, there exists a unique $\rho < 1$ so that $\varphi(\rho) = \rho$.

Proof. See Figure 19.2 below.

Theorem 19.6 (See Durrett [15], p. 247-248). If $\mu > 1$, then
$$ P(Z_n = 0 \text{ for some } n) = \rho. $$

Proof. Since $\{Z_m = 0\} \subset \{Z_{m+1} = 0\}$, it follows that $\{Z_m = 0\} \uparrow \{Z_n = 0 \text{ for some } n\}$ and therefore, if
$$ \rho_m := P(Z_m = 0), $$
then
$$ P(Z_n = 0 \text{ for some } n) = \lim_{m\to\infty}\rho_m. $$
We now show $\rho_m = \varphi(\rho_{m-1})$. To see this, conditioned on the set $\{Z_1 = k\}$, $Z_m = 0$ iff all $k$ families die out in the remaining $m-1$ time units. Since each family evolves independently, the probability^2 of this event is $\rho_{m-1}^k$. Combining this with $P(Z_1 = k) = P(\xi_1^1 = k) = p_k$ allows us to conclude,
$$ \rho_m = P(Z_m = 0) = \sum_{k=0}^\infty P(Z_m = 0,\,Z_1 = k) = \sum_{k=0}^\infty P(Z_m = 0|Z_1 = k)\,P(Z_1 = k) = \sum_{k=0}^\infty p_k\,\rho_{m-1}^k = \varphi(\rho_{m-1}). $$

Fig. 19.3. The graphical interpretation of $\rho_m = \varphi(\rho_{m-1})$ starting with $\rho_0 = 0$.

It is now easy to see that $\rho_m \uparrow \rho$ as $m \to \infty$; again see Figure 19.3.
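The iteration $\rho_m = \varphi(\rho_{m-1})$ is easy to run for the generating function of Figure 19.2 (offspring weights $p_0 = 1/8$, $p_1 = 3/8$, $p_2 = 3/8$, $p_3 = 1/8$, mean $\mu = \varphi'(1) = 3/2 > 1$), recovering the extinction probability $\rho \cong 0.23607$:

```python
# Fixed-point iteration for phi(s) = (1 + 3s + 3s^2 + s^3)/8, starting
# from rho_0 = P(Z_0 = 0) = 0 as in Figure 19.3.
def phi(s):
    return (1 + 3 * s + 3 * s**2 + s**3) / 8

rho = 0.0
for _ in range(200):
    rho = phi(rho)       # rho_m increases monotonically to rho

print(rho)  # approximately 0.23607, the smaller fixed point of phi
```

For this particular cubic the fixed point is exactly $\sqrt{5} - 2$, since $s^3 + 3s^2 - 5s + 1 = 0$ at that value.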
Let $S = \mathbb{N}_0$, let $\{Y_i\}_{i=1}^\infty$ be i.i.d. $S$-valued random variables such that $Y_i \overset{d}{=} \xi_i^n$ (i.e. $P(Y_i = l) = p_l$ for $l \in \mathbb{N}_0$), and for $f : S \to \mathbb{C}$ bounded or non-negative let $Qf(0) = f(0)$ and
$$ Qf(k) := \mathbb{E}[f(Y_1 + \dots + Y_k)] \text{ for all } k \ge 1. $$
Notice that
$$ Qf(k) = \sum_{l\in S}f(l)\,p_l^k \quad \text{where} \quad p_l^k := P(Y_1 + \dots + Y_k = l) = \sum_{l_1 + \dots + l_k = l}p_{l_1}\cdots p_{l_k}, $$
with the convention that $p_n^0 = \delta_{0,n}$. As above, for $\lambda \in \mathbb{C}$ with $|\lambda| \le 1$ we let
$$ \varphi(\lambda) := \mathbb{E}\big[\lambda^{Y_1}\big] = \sum_{k\ge0}p_k\lambda^k $$

^2 This argument is made precise with the aid of Exercise 19.4.
be the moment generating function for the $Y_i$.

Exercise 19.2. Let $\mathcal{B}_n := \mathcal{B}_n^Z = \sigma(Z_0, \dots, Z_n)$ so that $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}, P)$ is a filtered probability space. Show that $\{Z_n\}_{n=0}^\infty$ is a time homogeneous Markov process with one step transition kernel being $Q$. In particular verify that
$$ P(Z_n = j|Z_{n-1} = k) = p_j^k \text{ for all } j, k \in S \text{ and } n \ge 1 $$
and
$$ \mathbb{E}\big[\lambda^{Z_n}\big|\mathcal{B}_{n-1}\big] = \varphi(\lambda)^{Z_{n-1}} \text{ a.s.} \quad (19.10) $$
for all $\lambda \in \mathbb{C}$ with $|\lambda| \le 1$.

Exercise 19.3. In the notation used in this section (Section 19.2), show for all $n \in \mathbb{N}$ and $\lambda_i \in \mathbb{C}$ with $|\lambda_i| \le 1$ that
$$ \mathbb{E}\Big[\prod_{j=1}^n\lambda_j^{Z_j}\Big] = \varphi(\lambda_1\varphi(\dots\lambda_{n-2}\varphi(\lambda_{n-1}\varphi(\lambda_n))\dots)). $$
For example you should show,
$$ \mathbb{E}\big[\lambda_1^{Z_1}\lambda_2^{Z_2}\lambda_3^{Z_3}\big] = \varphi(\lambda_1\varphi(\lambda_2\varphi(\lambda_3))) $$
and
$$ \mathbb{E}\big[\lambda_1^{Z_1}\lambda_2^{Z_2}\lambda_3^{Z_3}\lambda_4^{Z_4}\big] = \varphi(\lambda_1\varphi(\lambda_2\varphi(\lambda_3\varphi(\lambda_4)))). $$

Exercise 19.4. Suppose that $n \ge 2$ and $f : \mathbb{N}_0^{n-1} \to \mathbb{C}$ is a bounded function or a non-negative function. Show for all $k \ge 1$ that
$$ \mathbb{E}[f(Z_2, \dots, Z_n)|Z_1 = k] = \mathbb{E}\Big[f\Big(\sum_{l=1}^k\big(Z_1^l, \dots, Z_{n-1}^l\big)\Big)\Big] \quad (19.11) $$
where $\{Z_n^l\}_{n=0}^\infty$ for $1 \le l \le k$ are i.i.d. Galton-Watson branching processes such that $\{Z_n^l\}_{n=0}^\infty \overset{d}{=} \{Z_n\}_{n=0}^\infty$ for each $l$.

Suggestion: it suffices to prove Eq. (19.11) for $f$ of the form,
$$ f(k_2, \dots, k_n) = \prod_{j=2}^n\lambda_j^{k_j}. \quad (19.12) $$
19.3 Kakutani's Theorem

For broad generalizations of the results in this section, see [24, Chapter IV] or [25].

Proposition 19.7. Suppose that $\mu$ and $\nu$ are finite positive measures on $(X, \mathcal{M})$, $\nu = \nu_a + \nu_s$ is the Lebesgue decomposition of $\nu$ relative to $\mu$, and $\rho : X \to [0,\infty)$ is a measurable function such that $d\nu_a = \rho\,d\mu$, so that
$$ d\nu = d\nu_a + d\nu_s = \rho\,d\mu + d\nu_s. $$
If $g : X \to [0,\infty)$ is another measurable function such that $g\,d\mu \le d\nu$ (i.e. $\int_B g\,d\mu \le \nu(B)$ for all $B \in \mathcal{M}$), then $g \le \rho$, $\mu$-a.e.

Proof. Let $A \in \mathcal{M}$ be chosen so that $\mu(A^c) = 0$ and $\nu_s(A) = 0$. Then, for all $B \in \mathcal{M}$,
$$ \int_B g\,d\mu = \int_{B\cap A}g\,d\mu \le \nu(B\cap A) = \int_{B\cap A}\rho\,d\mu \le \int_B\rho\,d\mu. $$
So by the comparison Lemma 7.24, $g \le \rho$, $\mu$-a.e.
Example 19.8. This example generalizes Example 18.9. Suppose $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ is a filtered probability space and $Q$ is any other probability measure on $(\Omega, \mathcal{B})$. By the Radon-Nikodym Theorem 15.8, for each $n \in \mathbb{N}$ we may write
$$ dQ|_{\mathcal{B}_n} = X_n\,dP|_{\mathcal{B}_n} + dR_n \quad (19.13) $$
where $R_n$ is a measure on $(\Omega, \mathcal{B}_n)$ which is singular relative to $P|_{\mathcal{B}_n}$ and $0 \le X_n \in L^1(\Omega, \mathcal{B}_n, P)$. In this case the most we can say in general is that $X := \{X_n\}_n$ is a positive supermartingale. To verify this assertion, for $B \in \mathcal{B}_n$ and $n \le m \le \infty$, we have
$$ Q(B) = \mathbb{E}[X_m : B] + R_m(B) \ge \mathbb{E}[X_m : B] = \mathbb{E}[\mathbb{E}_{\mathcal{B}_n}(X_m) : B], $$
from which it follows that $\mathbb{E}_{\mathcal{B}_n}(X_m)\,dP|_{\mathcal{B}_n} \le dQ|_{\mathcal{B}_n}$. So according to Proposition 19.7,
$$ \mathbb{E}_{\mathcal{B}_n}(X_m) \le X_n \ (P\text{-a.s.}) \text{ for all } n \le m \le \infty. \quad (19.14) $$
Proposition 19.9. Keeping the assumptions and notation used in Example 19.8, $\lim_{n\to\infty}X_n = X_\infty$ a.s., and in particular the Lebesgue decomposition of $Q|_{\mathcal{B}_\infty}$ relative to $P|_{\mathcal{B}_\infty}$ may be written as
$$ dQ|_{\mathcal{B}_\infty} = \Big(\lim_{n\to\infty}X_n\Big)\,dP|_{\mathcal{B}_\infty} + dR_\infty. \quad (19.15) $$

Proof. By Example 19.8, we know that $\{X_n\}_n$ is a positive supermartingale, and by letting $m = \infty$ in Eq. (19.14) we know
$$ \mathbb{E}_{\mathcal{B}_n}X_\infty \le X_n \text{ a.s.} \quad (19.16) $$
By the supermartingale convergence Corollary 18.63, or by the submartingale convergence Corollary 18.54 applied to $-X_n$, we know that $Y := \lim_{n\to\infty}X_n$ exists almost surely. To finish the proof it suffices to show that $Y = X_\infty$ a.s., where $X_\infty$ is defined so that Eq. (19.13) holds for $n = \infty$.

From the regular martingale convergence Theorem 18.65 we also know that $\lim_{n\to\infty}\mathbb{E}_{\mathcal{B}_n}X_\infty = X_\infty$ a.s. as well. So passing to the limit in Eq. (19.16) implies $X_\infty \le Y$ a.s. To prove the reverse inequality, $Y \le X_\infty$ a.s., let $B \in \mathcal{B}_m$ and $n \ge m$. Then
$$ Q(B) = \mathbb{E}[X_n : B] + R_n(B) \ge \mathbb{E}[X_n : B], $$
and so by Fatou's lemma,
$$ \mathbb{E}[Y : B] = \mathbb{E}\Big[\liminf_{n\to\infty}X_n : B\Big] \le \liminf_{n\to\infty}\mathbb{E}[X_n : B] \le Q(B). \quad (19.17) $$
Since $m \in \mathbb{N}$ was arbitrary, we have proved $\mathbb{E}[Y : B] \le Q(B)$ for all $B$ in the algebra $\mathcal{A} := \cup_{m\in\mathbb{N}}\mathcal{B}_m$. As a consequence of the regularity Theorem 5.44, or of the monotone class Lemma 5.52, or of Theorem^3 5.27, it follows that $\mathbb{E}[Y : B] \le Q(B)$ for all $B \in \sigma(\mathcal{A}) = \mathcal{B}_\infty$. An application of Proposition 19.7 then implies $Y \le X_\infty$ a.s.
Theorem 19.10. Let $(\Omega, \mathcal{B}, \{\mathcal{B}_n\}_{n=0}^\infty, P)$ be a filtered probability space and $Q$ be a probability measure on $(\Omega, \mathcal{B})$ such that $Q|_{\mathcal{B}_n} \ll P|_{\mathcal{B}_n}$ for all $n \in \mathbb{N}$. Let $M_n := \frac{dQ|_{\mathcal{B}_n}}{dP|_{\mathcal{B}_n}}$ be a version of the Radon-Nikodym derivative of $Q|_{\mathcal{B}_n}$ relative to $P|_{\mathcal{B}_n}$, see Theorem 15.8. Recall from Example 18.9 that $\{M_n\}_{n=1}^\infty$ is a positive martingale, and let $M_\infty = \lim_{n\to\infty}M_n$, which exists a.s. Then the following are equivalent;
1. $Q|_{\mathcal{B}_\infty} \ll P|_{\mathcal{B}_\infty}$,
2. $\mathbb{E}_P M_\infty = 1$,
3. $M_n \to M_\infty$ in $L^1(P)$, and
4. $\{M_n\}_{n=1}^\infty$ is uniformly integrable.

^3 This theorem implies that for $B \in \mathcal{B}_\infty$,
$$ \mathbb{E}[Y : B] = \inf\{\mathbb{E}[Y : A] : B \subset A \in \mathcal{A}\} \quad \text{and} \quad Q(B) = \inf\{Q(A) : B \subset A \in \mathcal{A}\}, $$
and since, by MCT, $\mathbb{E}[Y : A] \le Q(A)$ for all $A \in \mathcal{A}$, it follows that Eq. (19.17) holds for all $B \in \mathcal{B}_\infty$.
Proof. Recall from Proposition 19.9 (where $X_n$ is now $M_n$) that in general,
$$ dQ|_{\mathcal{B}_\infty} = M_\infty\,dP|_{\mathcal{B}_\infty} + dR_\infty \quad (19.18) $$
where $R_\infty$ is singular relative to $P|_{\mathcal{B}_\infty}$. Therefore, $Q|_{\mathcal{B}_\infty} \ll P|_{\mathcal{B}_\infty}$ iff $R_\infty = 0$, which happens iff $R_\infty(\Omega) = 0$, i.e. iff
$$ 1 = Q(\Omega) = \int_\Omega M_\infty\,dP|_{\mathcal{B}_\infty} = \mathbb{E}_P M_\infty. $$
This proves the equivalence of items 1. and 2. If item 2. holds, then $M_n \to M_\infty$ in $L^1(P)$ by the DCT, Corollary 12.9, with $g_n = f_n = M_n$ and $g = f = M_\infty$, and so item 3. holds. The implication 3. $\implies$ 2. is easy, and the equivalence of items 3. and 4. follows from Theorem 12.44, or simply see Theorem 18.65.
Remark 19.11. Recall from Exercise 10.10 that if $0 < a_n \le 1$, then $\prod_{n=1}^\infty a_n > 0$ iff $\sum_{n=1}^\infty(1 - a_n) < \infty$. Indeed, $\prod_{n=1}^\infty a_n > 0$ iff
$$ -\infty < \ln\Big(\prod_{n=1}^\infty a_n\Big) = \sum_{n=1}^\infty\ln a_n = \sum_{n=1}^\infty\ln(1 - (1 - a_n)), $$
and $\sum_{n=1}^\infty\ln(1 - (1 - a_n)) > -\infty$ iff $\sum_{n=1}^\infty(1 - a_n) < \infty$. Recall that $\ln(1 - (1 - a_n)) \cong -(1 - a_n)$ for $a_n$ near 1.
Theorem 19.12 (Kakutani's Theorem). Let $\{X_n\}_{n=1}^\infty$ be independent non-negative random variables with $\mathbb{E}X_n = 1$ for all $n$. Further, let $M_0 = 1$ and $M_n := X_1X_2\cdots X_n$, a martingale relative to the filtration $\mathcal{B}_n := \sigma(X_1, \dots, X_n)$, as was shown in Example 18.11. According to Corollary 18.63, $M_\infty := \lim_{n\to\infty}M_n$ exists a.s. and $\mathbb{E}M_\infty \le 1$. The following statements are equivalent;
1. $\mathbb{E}M_\infty = 1$,
2. $M_n \to M_\infty$ in $L^1(\Omega, \mathcal{B}, P)$,
3. $\{M_n\}_{n=1}^\infty$ is uniformly integrable,
4. $\prod_{n=1}^\infty\mathbb{E}\big[\sqrt{X_n}\big] > 0$,
5. $\sum_{n=1}^\infty\big(1 - \mathbb{E}\big[\sqrt{X_n}\big]\big) < \infty$.
Moreover, if any one, and hence all, of the above statements fails to hold, then $P(M_\infty = 0) = 1$.

Proof. If $a_n := \mathbb{E}\big[\sqrt{X_n}\big]$, then $0 < a_n$ and $a_n^2 \le \mathbb{E}X_n = 1$, with equality iff $X_n = 1$ a.s. So Remark 19.11 gives the equivalence of items 4. and 5.

The equivalence of items 1., 2. and 3. follows by the same techniques used in the proof of Theorem 19.10 above. We will now complete the proof by showing 4. $\implies$ 3. and not(4.) $\implies P(M_\infty = 0) = 1$, which clearly implies not(1.).

For both parts of the argument, let $N_0 = 1$ and let $N_n$ be the martingale (again see Example 18.11) defined by
$$ N_n := \prod_{k=1}^n\frac{\sqrt{X_k}}{a_k} = \frac{\sqrt{M_n}}{\prod_{k=1}^n a_k}. \quad (19.19) $$
Further observe that, in all cases, $N_\infty = \lim_{n\to\infty}N_n$ exists in $[0,\infty)$ a.s.; see Corollary 18.54 or Corollary 18.63.

(4. $\implies$ 3.) Since
$$ N_n^2 = \prod_{k=1}^n\frac{X_k}{a_k^2} = \frac{M_n}{\big(\prod_{k=1}^n a_k\big)^2}, $$
$$ \mathbb{E}\big[N_n^2\big] = \frac{\mathbb{E}M_n}{\big(\prod_{k=1}^n a_k\big)^2} = \frac{1}{\big(\prod_{k=1}^n a_k\big)^2} \le \frac{1}{\big(\prod_{k=1}^\infty a_k\big)^2} < \infty, $$
and hence $\{N_n\}_{n=1}^\infty$ is bounded in $L^2$. Therefore, using
$$ M_n = \Big(\prod_{k=1}^n a_k\Big)^2N_n^2 \le N_n^2 \quad (19.20) $$
and Doob's inequality in Corollary 18.47, we find
$$ \mathbb{E}\Big[\sup_n M_n\Big] \le \mathbb{E}\Big[\sup_n N_n^2\Big] \le 4\,\sup_n\mathbb{E}\big[N_n^2\big] < \infty. \quad (19.21) $$
Equation (19.21) certainly implies $\{M_n\}_{n=1}^\infty$ is uniformly integrable; see Proposition 12.42.

(Not(4.) $\implies P(M_\infty = 0) = 1$.) If
$$ \prod_{n=1}^\infty\mathbb{E}\big[\sqrt{X_n}\big] = \lim_{n\to\infty}\prod_{k=1}^n a_k = 0, $$
we may pass to the limit in Eq. (19.20) to find
$$ M_\infty = \lim_{n\to\infty}M_n = \lim_{n\to\infty}\Big[\Big(\prod_{k=1}^n a_k\Big)^2N_n^2\Big] = 0\cdot\Big(\lim_{n\to\infty}N_n\Big)^2 = 0 \text{ a.s.} $$
Lemma 19.13. Given two probability measures, $\mu$ and $\nu$, on a measurable space $(\Omega, \mathcal{B})$, there exists a positive measure $\rho$ such that $d\rho := \sqrt{\frac{d\mu}{d\lambda}\cdot\frac{d\nu}{d\lambda}}\,d\lambda$, where $\lambda$ is any other finite measure on $(\Omega, \mathcal{B})$ such that $\mu \ll \lambda$ and $\nu \ll \lambda$. We will write $\sqrt{d\mu\,d\nu}$ for $d\rho$ in the future.

Proof. The main point is to show that $\rho$ is well defined. So suppose $\lambda_1$ and $\lambda_2$ are two finite measures such that $\mu \ll \lambda_i$ and $\nu \ll \lambda_i$ for $i = 1, 2$. Further let $\lambda := \lambda_1 + \lambda_2$, so that $\lambda_i \ll \lambda$ for $i = 1, 2$. Observe that
$$ d\lambda_1 = \frac{d\lambda_1}{d\lambda}\,d\lambda, \quad d\mu = \frac{d\mu}{d\lambda_1}\,d\lambda_1 = \frac{d\mu}{d\lambda_1}\frac{d\lambda_1}{d\lambda}\,d\lambda, \quad \text{and} \quad d\nu = \frac{d\nu}{d\lambda_1}\,d\lambda_1 = \frac{d\nu}{d\lambda_1}\frac{d\lambda_1}{d\lambda}\,d\lambda. $$
So
$$ \sqrt{\frac{d\mu}{d\lambda}\cdot\frac{d\nu}{d\lambda}}\,d\lambda = \sqrt{\frac{d\mu}{d\lambda_1}\frac{d\lambda_1}{d\lambda}\cdot\frac{d\nu}{d\lambda_1}\frac{d\lambda_1}{d\lambda}}\,d\lambda = \sqrt{\frac{d\mu}{d\lambda_1}\cdot\frac{d\nu}{d\lambda_1}}\,\frac{d\lambda_1}{d\lambda}\,d\lambda = \sqrt{\frac{d\mu}{d\lambda_1}\cdot\frac{d\nu}{d\lambda_1}}\,d\lambda_1, $$
and by symmetry,
$$ \sqrt{\frac{d\mu}{d\lambda}\cdot\frac{d\nu}{d\lambda}}\,d\lambda = \sqrt{\frac{d\mu}{d\lambda_2}\cdot\frac{d\nu}{d\lambda_2}}\,d\lambda_2. $$
This shows
$$ \sqrt{\frac{d\mu}{d\lambda_2}\cdot\frac{d\nu}{d\lambda_2}}\,d\lambda_2 = \sqrt{\frac{d\mu}{d\lambda_1}\cdot\frac{d\nu}{d\lambda_1}}\,d\lambda_1, $$
and hence $d\rho = \sqrt{d\mu\,d\nu}$ is well defined.
Definition 19.14. Two probability measures, $\mu$ and $\nu$, on a measure space $(\Omega, \mathcal{B})$ are said to be equivalent (written $\mu \sim \nu$) if $\mu \ll \nu$ and $\nu \ll \mu$, i.e. if $\mu$ and $\nu$ are absolutely continuous relative to one another. The Hellinger integral of $\mu$ and $\nu$ is defined as
$$ H(\mu, \nu) := \int_\Omega\sqrt{d\mu\,d\nu} = \int_\Omega\sqrt{\frac{d\mu}{d\lambda}\cdot\frac{d\nu}{d\lambda}}\,d\lambda \quad (19.22) $$
where $\lambda$ is any measure (for example $\lambda = \frac{1}{2}(\mu + \nu)$ would work) on $(\Omega, \mathcal{B})$ such that there exist $\frac{d\mu}{d\lambda}$ and $\frac{d\nu}{d\lambda}$ in $L^1(\Omega, \mathcal{B}, \lambda)$ with $d\mu = \frac{d\mu}{d\lambda}\,d\lambda$ and $d\nu = \frac{d\nu}{d\lambda}\,d\lambda$. Lemma 19.13 guarantees that $H(\mu, \nu)$ is well defined.

Proposition 19.15. The Hellinger integral, $H(\mu, \nu)$, of two probability measures, $\mu$ and $\nu$, is well defined. Moreover $H(\mu, \nu)$ satisfies;
1. $0 \le H(\mu, \nu) \le 1$,
2. $H(\mu, \nu) = 1$ iff $\mu = \nu$,
3. $H(\mu, \nu) = 0$ iff $\mu \perp \nu$, and
4. if $\nu \ll \mu$, or more generally if $\nu \not\perp \mu$, then $H(\mu, \nu) > 0$.
Furthermore^4,
$$ H(\mu, \nu) = \inf\Big\{\sum_{i=1}^n\sqrt{\mu(A_i)\,\nu(A_i)} : \Omega = \coprod_{i=1}^nA_i \text{ and } n \in \mathbb{N}\Big\}. \quad (19.23) $$

^4 This statement and its proof may be safely omitted.

Proof. Items 1. and 2. are both an easy consequence of the Schwarz inequality and its converse. For item 3., if $H(\mu, \nu) = 0$, then $\sqrt{\frac{d\mu}{d\lambda}\cdot\frac{d\nu}{d\lambda}} = 0$, $\lambda$-a.e. Therefore, if we let
$$ A := \Big\{\frac{d\mu}{d\lambda} \neq 0\Big\}, $$
then $\frac{d\mu}{d\lambda} = 1_A\,\frac{d\mu}{d\lambda}$ $\lambda$-a.e. and $\frac{d\nu}{d\lambda}\,1_{A^c} = \frac{d\nu}{d\lambda}$ $\lambda$-a.e. Hence it follows that $\mu(A^c) = 0$ and $\nu(A) = 0$, and hence $\mu \perp \nu$.

For item 4., if $\nu \not\perp \mu$ and in particular $\nu \ll \mu$, then
$$ H(\mu, \nu) = \int_\Omega\sqrt{\frac{d\mu}{d\mu}\cdot\frac{d\nu}{d\mu}}\,d\mu = \int_\Omega\sqrt{\frac{d\nu}{d\mu}}\,d\mu. $$
For sake of contradiction, if $H(\mu, \nu) = 0$ then $\sqrt{\frac{d\nu}{d\mu}} = 0$ and hence $\frac{d\nu}{d\mu} = 0$, $\mu$-a.e. The latter would imply $\nu = 0$, which is impossible. Therefore, $H(\mu, \nu) > 0$ if $\nu \ll \mu$; the general case $\nu \not\perp \mu$ then follows from item 3. The last statement is left to the reader as Exercise 19.6.
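To see the Hellinger integral in a computable case (an illustrative example, not from the text): for Bernoulli measures $\mu = \mathrm{Ber}(p)$ and $\nu = \mathrm{Ber}(q)$ on $\{0,1\}$, counting measure serves as $\lambda$ in Eq. (19.22), giving $H(\mu, \nu) = \sqrt{pq} + \sqrt{(1-p)(1-q)}$. The sketch below checks item 2. of Proposition 19.15 in this case and the summability condition appearing in Corollary 19.16 below:

```python
import math

# Hellinger integral of Ber(p) and Ber(q), from Eq. (19.22) with
# lambda = counting measure on {0, 1}.
def hellinger_bernoulli(p, q):
    return math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))

print(hellinger_bernoulli(0.5, 0.5))   # equal measures give H = 1

# For mu_n = Ber(1/2 + 1/(2n)) vs nu_n = Ber(1/2), one has
# 1 - H_n ~ 1/(8 n^2), so sum (1 - H_n) < infinity (case 1 of the
# dichotomy: the product measures are equivalent).
tail = sum(1 - hellinger_bernoulli(0.5 + 0.5 / n, 0.5) for n in range(2, 100000))
print(tail)  # a small finite number
```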
Exercise 19.5. Find a counterexample to the statement that $H(\mu,\nu) > 0$ implies $\mu \ll \nu$.

Exercise 19.6. Prove Eq. (19.23).
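The well-definedness just proved is easy to check numerically for discrete measures, where one may take $\lambda$ to be counting measure on the union of the supports and Eq. (19.22) becomes a finite sum. The sketch below is not part of the notes; the function name `hellinger` and the two-point examples are my own choices.

```python
import math

def hellinger(mu, nu):
    """Hellinger integral H(mu, nu) = sum_x sqrt(mu({x}) * nu({x})) for
    discrete probability measures given as dicts {point: mass}.
    Any dominating counting measure gives the same value (Lemma 19.13)."""
    support = set(mu) | set(nu)
    return sum(math.sqrt(mu.get(x, 0.0) * nu.get(x, 0.0)) for x in support)

# Identical measures: H = 1 (the Schwarz inequality is saturated).
p = {0: 0.3, 1: 0.7}
assert abs(hellinger(p, p) - 1.0) < 1e-12

# Mutually singular measures (disjoint supports): H = 0.
assert hellinger({0: 1.0}, {1: 1.0}) == 0.0

# A two-point example: mu = delta_1 and nu = (1 - q) delta_0 + q delta_1
# with q = r**2 gives H(mu, nu) = r, strictly between 0 and 1.
r = 0.6
H = hellinger({1: 1.0}, {0: 1.0 - r**2, 1: r**2})
assert abs(H - r) < 1e-12
```

The intermediate value $0 < H < 1$ in the last example reflects that the two measures share mass at one point but are not equal.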
Corollary 19.16 (Kakutani [27]). Let $\Omega = \mathbb{R}^\mathbb{N}$, $Y_n(\omega) = \omega_n$ for all $\omega \in \Omega$ and $n \in \mathbb{N}$, and $\mathcal{B} := \mathcal{B}_\Omega = \sigma(Y_n : n \in \mathbb{N})$ be the product $\sigma$-algebra on $\Omega$. Further, let $\mu := \otimes_{n=1}^\infty \mu_n$ and $\nu := \otimes_{n=1}^\infty \nu_n$ be product measures on $(\Omega, \mathcal{B}_\Omega)$ associated to two sequences of probability measures, $\{\mu_n\}_{n=1}^\infty$ and $\{\nu_n\}_{n=1}^\infty$, on $(\mathbb{R}, \mathcal{B}_\mathbb{R})$, see Theorem 10.62 (take $\mu := P \circ (Y_1, Y_2, \ldots)^{-1}$). Let us further assume that $\mu_n \sim \nu_n$ for all $n$ so that
\[
0 < H(\mu_n,\nu_n) = \int_\mathbb{R} \sqrt{\frac{d\mu_n}{d\nu_n}}\;d\nu_n \le 1.
\]
Then precisely one of the two cases below holds;

1. $\sum_{n=1}^\infty (1 - H(\mu_n,\nu_n)) < \infty$, which happens iff $\prod_{n=1}^\infty H(\mu_n,\nu_n) > 0$, which happens iff $\mu \sim \nu$, or
2. $\sum_{n=1}^\infty (1 - H(\mu_n,\nu_n)) = \infty$, which happens iff $\prod_{n=1}^\infty H(\mu_n,\nu_n) = 0$, which happens iff $\mu \perp \nu$.

In case 1., where $\mu \sim \nu$, we have
\[
\frac{d\mu}{d\nu} = \prod_{n=1}^\infty \frac{d\mu_n}{d\nu_n}(Y_n) \quad \nu\text{-a.s.} \tag{19.24}
\]
and in all cases we have
\[
H(\mu,\nu) = \prod_{n=1}^\infty H(\mu_n,\nu_n).
\]
Proof. Let $P = \nu$, $Q = \mu$, $\mathcal{B}_n := \sigma(Y_1, \ldots, Y_n)$, $X_n := \frac{d\mu_n}{d\nu_n}(Y_n)$, and
\[
M_n := X_1 \cdots X_n = \frac{d\mu_1}{d\nu_1}(Y_1)\cdots\frac{d\mu_n}{d\nu_n}(Y_n).
\]
If $f : \mathbb{R}^n \to \mathbb{R}$ is a bounded measurable function, then
\[
\mathbb{E}_\mu\left( f(Y_1,\ldots,Y_n) \right)
= \int_{\mathbb{R}^n} f(y_1,\ldots,y_n)\,d\mu_1(y_1)\cdots d\mu_n(y_n)
= \int_{\mathbb{R}^n} f(y_1,\ldots,y_n)\,\frac{d\mu_1}{d\nu_1}(y_1)\cdots\frac{d\mu_n}{d\nu_n}(y_n)\,d\nu_1(y_1)\cdots d\nu_n(y_n)
\]
\[
= \mathbb{E}_\nu\left[ f(Y_1,\ldots,Y_n)\,\frac{d\mu_1}{d\nu_1}(Y_1)\cdots\frac{d\mu_n}{d\nu_n}(Y_n) \right]
= \mathbb{E}_\nu\left[ f(Y_1,\ldots,Y_n)\,M_n \right],
\]
from which it follows that $d\mu|_{\mathcal{B}_n} = M_n\,d\nu|_{\mathcal{B}_n}$.

Hence by Theorem 19.10, $M_\infty := \lim_{n\to\infty} M_n$ exists $\nu$-a.s. and the Lebesgue decomposition of $\mu$ is given by
\[
d\mu = M_\infty\,d\nu + dR_\infty
\]
where $R_\infty \perp \nu$. Moreover $\mu \ll \nu$ iff $R_\infty = 0$, which happens iff $\mathbb{E}_\nu M_\infty = 1$, and $\mu \perp \nu$ iff $R_\infty = \mu$, which happens iff $M_\infty = 0$ $\nu$-a.s. From Theorem 19.12,
\[
\mathbb{E}_\nu M_\infty = 1 \iff 0 < \prod_{n=1}^\infty \mathbb{E}_\nu\left[ \sqrt{X_n} \right] = \prod_{n=1}^\infty \int_\mathbb{R} \sqrt{\frac{d\mu_n}{d\nu_n}}\;d\nu_n = \prod_{n=1}^\infty H(\mu_n,\nu_n)
\]
and in this case
\[
d\mu = M_\infty\,d\nu = \left( \prod_{k=1}^\infty X_k \right) d\nu = \left( \prod_{n=1}^\infty \frac{d\mu_n}{d\nu_n}(Y_n) \right) d\nu.
\]
On the other hand, if
\[
\prod_{n=1}^\infty \mathbb{E}_\nu\left[ \sqrt{X_n} \right] = \prod_{n=1}^\infty H(\mu_n,\nu_n) = 0,
\]
Theorem 19.12 implies $M_\infty = 0$, $\nu$-a.s., in which case Theorem 19.10 implies $\mu = R_\infty$ and so $\mu \perp \nu$.

(The rest of the argument may be safely omitted.) For the last assertion, if $\prod_{n=1}^\infty H(\mu_n,\nu_n) = 0$ then $\mu \perp \nu$ and hence $H(\mu,\nu) = 0$. Conversely if $\prod_{n=1}^\infty H(\mu_n,\nu_n) > 0$, then $M_n \to M_\infty$ in $L^1(\nu)$ and therefore
\[
\mathbb{E}_\nu\left[ \left| \sqrt{M_n} - \sqrt{M_\infty} \right|^2 \right]
\le \mathbb{E}_\nu\left[ \left| \sqrt{M_n} - \sqrt{M_\infty} \right| \left( \sqrt{M_n} + \sqrt{M_\infty} \right) \right]
= \mathbb{E}_\nu\left[ \left| M_n - M_\infty \right| \right] \to 0 \text{ as } n \to \infty.
\]
Since $d\mu = M_\infty\,d\nu$ in this case, it follows that
\[
H(\mu,\nu) = \mathbb{E}_\nu\left[ \sqrt{M_\infty} \right]
= \lim_{n\to\infty} \mathbb{E}_\nu\left[ \sqrt{M_n} \right]
= \lim_{n\to\infty} \prod_{k=1}^n H(\mu_k,\nu_k)
= \prod_{k=1}^\infty H(\mu_k,\nu_k).
\]
Example 19.17. Suppose that $\mu_n = \delta_1$ for all $n$ and $\nu_n = (1 - p_n^2)\,\delta_0 + p_n^2\,\delta_1$ with $p_n \in (0,1)$. Then $\mu_n \ll \nu_n$ with
\[
\frac{d\mu_n}{d\nu_n} = 1_{\{1\}}\,p_n^{-2}
\]
and
\[
H(\mu_n,\nu_n) = \int_\mathbb{R} \sqrt{1_{\{1\}}\,p_n^{-2}}\;d\nu_n = \sqrt{p_n^{-2}}\;p_n^2 = p_n.
\]
So in this case $\mu \ll \nu$ iff $\sum_{n=1}^\infty (1 - p_n) < \infty$. Observe that $\nu$ is never absolutely continuous relative to $\mu$.

On the other hand, if we further assume in Corollary 19.16 that $\mu_n \sim \nu_n$, then either $\mu \sim \nu$ or $\mu \perp \nu$, depending on whether $\prod_{n=1}^\infty H(\mu_n,\nu_n) > 0$ or $\prod_{n=1}^\infty H(\mu_n,\nu_n) = 0$ respectively.
In the next group of problems you will be given probability measures, $\mu_n$ and $\nu_n$, on $\mathbb{R}$ and you will be asked to decide if $\mu := \otimes_{n=1}^\infty \mu_n$ and $\nu := \otimes_{n=1}^\infty \nu_n$ are equivalent. For the solutions of these problems you will want to make use of the following Gaussian integral formula;
\[
\int_\mathbb{R} \exp\left( -\frac{a}{2}x^2 + bx \right) dx
= \int_\mathbb{R} \exp\left( -\frac{a}{2}\left( x - \frac{b}{a} \right)^2 + \frac{b^2}{2a} \right) dx
= e^{b^2/2a} \int_\mathbb{R} \exp\left( -\frac{a}{2}x^2 \right) dx
= \sqrt{\frac{2\pi}{a}}\; e^{b^2/2a},
\]
which is valid for all $a > 0$ and $b \in \mathbb{R}$.
Exercise 19.7 (A Discrete Cameron–Martin Theorem). Suppose $t > 0$, $a_n \in \mathbb{R}$, $d\mu_n(x) = \frac{1}{\sqrt{2\pi t}}e^{-x^2/2t}\,dx$ and $d\nu_n(x) = \frac{1}{\sqrt{2\pi t}}e^{-(x+a_n)^2/2t}\,dx$. Show $\mu \sim \nu$ iff $\sum_{k=1}^\infty a_k^2 < \infty$.
Exercise 19.8. Suppose $s, t > 0$, $a_n \in \mathbb{R}$, $d\mu_n(x) = \frac{1}{\sqrt{2\pi t}}e^{-x^2/2t}\,dx$ and $d\nu_n(x) = \frac{1}{\sqrt{2\pi s}}e^{-(x+a_n)^2/2s}\,dx$. Show $\mu \perp \nu$ if $s \neq t$.
Exercise 19.9. Suppose $t_n \in (0,\infty)$, $d\mu_n(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx$ and $d\nu_n(x) = \frac{1}{\sqrt{2\pi t_n}}e^{-x^2/2t_n}\,dx$. If $\sum_{n=1}^\infty (t_n - 1)^2 < \infty$ then $\mu \sim \nu$.
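As a numeric sanity check related to the shifted-Gaussian setting of Exercise 19.7 (this sketch is not part of the notes; the helper names and the particular sequences $a_n$ are my own choices): for two Gaussians with common variance $t$ and means differing by $a$, completing the square as in the Gaussian integral formula above gives the closed form $H = e^{-a^2/8t}$, and the Kakutani product $\prod_n H(\mu_n,\nu_n)$ stays positive exactly when $\sum_n a_n^2 < \infty$.

```python
import math

def hellinger_gauss_shift(a, t):
    """Closed form H(N(0,t), N(-a,t)) = exp(-a^2 / (8 t)), obtained by
    completing the square in sqrt(p(x) q(x))."""
    return math.exp(-a * a / (8.0 * t))

def hellinger_numeric(a, t, lo=-50.0, hi=50.0, n=200000):
    # Crude midpoint Riemann sum of sqrt(p(x) q(x)) as a sanity check.
    dx = (hi - lo) / n
    s = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)
        q = math.exp(-(x + a) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)
        s += math.sqrt(p * q) * dx
    return s

assert abs(hellinger_numeric(1.3, 2.0) - hellinger_gauss_shift(1.3, 2.0)) < 1e-6

# Kakutani dichotomy: a_n = 1/n has sum a_n^2 < infinity, so the infinite
# product of Hellinger integrals stays positive (mu ~ nu); for a_n = 1/sqrt(n)
# the exponent sum diverges and the product would be 0 (mu ⟂ nu).
prod_sq_summable = math.exp(sum(-(1.0 / n) ** 2 / 8.0 for n in range(1, 10**5)))
assert prod_sq_summable > 0.5  # exp(-(pi^2/6)/8) ≈ 0.81
```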
Part IV

(Weak) Convergence of Random Sums

20 Random Sums

As usual let $(\Omega, \mathcal{B}, P)$ be a probability space. The general theme of this chapter is to consider arrays of random variables, $\{X_k^n\}_{k=1}^n$, for each $n \in \mathbb{N}$. We are going to look for conditions under which $\lim_{n\to\infty} \sum_{k=1}^n X_k^n$ exists almost surely or in $L^p$ for some $0 \le p < \infty$. Typically we will start with a sequence of random variables, $\{X_k\}_{k=1}^\infty$, and consider the convergence of
\[
S_n = \frac{X_1 + \cdots + X_n - b_n}{a_n}
\]
for appropriate choices of sequences of numbers, $\{a_n\}$ and $\{b_n\}$. This fits into our general scheme by taking $X_k^n := (X_k - b_n/n)/a_n$.
20.1 Weak Laws of Large Numbers

Theorem 20.1 (An $L^2$ Weak Law of Large Numbers). Let $\{X_n\}_{n=1}^\infty$ be a sequence of uncorrelated square integrable random variables, $\mu_n = \mathbb{E}X_n$ and $\sigma_n^2 = \operatorname{Var}(X_n)$. If there exists an increasing positive sequence, $\{a_n\}$, and $\mu \in \mathbb{R}$ such that
\[
\lim_{n\to\infty} \frac{1}{a_n}\sum_{j=1}^n \mu_j = \mu
\quad\text{and}\quad
\lim_{n\to\infty} \frac{1}{a_n^2}\sum_{j=1}^n \sigma_j^2 = 0,
\]
then $\frac{S_n}{a_n} \to \mu$ in $L^2(P)$ (and hence also in probability).

Exercise 20.1. Prove Theorem 20.1.
Example 20.2. Suppose that $\{X_k\}_{k=1}^\infty \subset L^2(P)$ are uncorrelated identically distributed random variables. Then
\[
\frac{S_n}{n} \xrightarrow{L^2(P)} \mu = \mathbb{E}X_1 \text{ as } n \to \infty.
\]
To see this, simply apply Theorem 20.1 with $a_n = n$. More generally if $b_n > 0$ is such that $\lim_{n\to\infty}\left( n/b_n^2 \right) = 0$, then
\[
\operatorname{Var}\left( \frac{S_n}{b_n} \right) = \frac{1}{b_n^2}\,n\operatorname{Var}(X_1) \to 0 \text{ as } n \to \infty
\]
and therefore $(S_n - n\mu)/b_n \to 0$ in $L^2(P)$.

Note well: $L^2(P)$ convergence implies $L^p(P)$ convergence for $0 \le p \le 2$, where by $L^0(P)$ convergence we mean convergence in probability. The remainder of this chapter is mostly devoted to proving a.s. convergence for the quantities in Theorem 12.25 and Proposition 20.10 under various assumptions. These results will be described in the next section.
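A quick Monte Carlo illustration of the $L^2$ weak law (a sketch of my own, not from the notes; Uniform(0,1) summands, sample sizes, and trial counts are arbitrary choices): the mean-square error $\mathbb{E}[(S_n/n - \mu)^2]$ should decay like $\operatorname{Var}(X_1)/n = 1/(12n)$.

```python
import random

def l2_error(n, trials=2000, seed=0):
    """Monte Carlo estimate of E[(S_n/n - mu)^2] for i.i.d. Uniform(0,1)
    summands (mu = 1/2). Theorem 20.1 predicts decay Var(X_1)/n = 1/(12 n)."""
    rng = random.Random(seed)
    mu = 0.5
    acc = 0.0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        acc += (s / n - mu) ** 2
    return acc / trials

e10, e100 = l2_error(10), l2_error(100)
# Theoretical values: 1/120 ≈ 0.00833 and 1/1200 ≈ 0.000833.
assert e100 < e10              # the L^2 error shrinks as n grows
assert abs(e10 * 120 - 1.0) < 0.2  # matches 1/(12 n) up to Monte Carlo noise
```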
Theorem 20.3 (Weak Law of Large Numbers). Suppose that $\{X_n\}_{n=1}^\infty$ is a sequence of independent random variables. Let
\[
S_n := \sum_{j=1}^n X_j \quad\text{and}\quad a_n := \sum_{k=1}^n \mathbb{E}\left( X_k : |X_k| \le n \right).
\]
If
\[
\lim_{n\to\infty} \sum_{k=1}^n P(|X_k| > n) = 0 \quad\text{and} \tag{20.1}
\]
\[
\lim_{n\to\infty} \frac{1}{n^2}\sum_{k=1}^n \mathbb{E}\left( X_k^2 : |X_k| \le n \right) = 0, \tag{20.2}
\]
then
\[
\frac{S_n - a_n}{n} \xrightarrow{P} 0.
\]
Proof. A key ingredient in this proof and proofs of other versions of the law of large numbers is to introduce truncations of the $X_k$. In this case we consider
\[
S'_n := \sum_{k=1}^n X_k 1_{|X_k| \le n}.
\]
Since $\{S_n \neq S'_n\} \subset \bigcup_{k=1}^n \{|X_k| > n\}$,
\[
P\left( \left| \frac{S_n - a_n}{n} - \frac{S'_n - a_n}{n} \right| > \varepsilon \right)
= P\left( \left| \frac{S_n - S'_n}{n} \right| > \varepsilon \right)
\le P(S_n \neq S'_n) \le \sum_{k=1}^n P(|X_k| > n) \to 0 \text{ as } n \to \infty.
\]
Hence it suffices to show $\frac{S'_n - a_n}{n} \xrightarrow{P} 0$ as $n \to \infty$, and for this it suffices to show $\frac{S'_n - a_n}{n} \xrightarrow{L^2(P)} 0$ as $n \to \infty$. Observe that $\mathbb{E}S'_n = a_n$ and therefore,
\[
\mathbb{E}\left[ \left( \frac{S'_n - a_n}{n} \right)^2 \right]
= \frac{1}{n^2}\operatorname{Var}(S'_n)
= \frac{1}{n^2}\sum_{k=1}^n \operatorname{Var}\left( X_k 1_{|X_k| \le n} \right)
\le \frac{1}{n^2}\sum_{k=1}^n \mathbb{E}\left( X_k^2 1_{|X_k| \le n} \right) \to 0 \text{ as } n \to \infty,
\]
wherein we have used $\operatorname{Var}(Y) = \mathbb{E}Y^2 - (\mathbb{E}Y)^2 \le \mathbb{E}Y^2$ in the last inequality.

We are now going to use this result to prove Feller's weak law of large numbers, which will be valid under an assumption weaker than the existence of first moments.
Remark 20.4. If $X \in L^1(P)$, Chebyshev's inequality along with the dominated convergence theorem implies
\[
\tau(x) := xP(|X| \ge x) \le \mathbb{E}\left[ |X| : |X| \ge x \right] \to 0 \text{ as } x \to \infty.
\]
If $X$ is a random variable such that $\tau(x) = xP(|X| \ge x) \to 0$ as $x \to \infty$, we say that $X$ is in weak $L^1$.

Exercise 20.2. Let $\Omega = (0,1]$, $\mathcal{B} = \mathcal{B}_{(0,1]}$ be the Borel $\sigma$-algebra, $P = m$ be Lebesgue measure on $(\Omega,\mathcal{B})$, and $X(y) := (y|\ln y|)^{-1} 1_{y \le 1/2}$ for $y \in \Omega$. Show that $X \notin L^1(P)$ yet $\lim_{x\to\infty} xP(|X| \ge x) = 0$.
Lemma 20.5. Let $X$ be a random variable such that $\tau(x) := xP(|X| \ge x) \to 0$ as $x \to \infty$. Then
\[
\lim_{n\to\infty} \frac{1}{n}\mathbb{E}\left[ |X|^2 : |X| \le n \right] = 0. \tag{20.3}
\]

Proof. To prove this we observe that
\[
\mathbb{E}\left[ |X|^2 : |X| \le n \right]
= \mathbb{E}\left[ 2\int 1_{0 \le x \le |X| \le n}\,x\,dx \right]
= 2\int P(0 \le x \le |X| \le n)\,x\,dx
\le 2\int_0^n xP(|X| \ge x)\,dx = 2\int_0^n \tau(x)\,dx,
\]
so that
\[
\frac{1}{n}\mathbb{E}\left[ |X|^2 : |X| \le n \right] \le \frac{2}{n}\int_0^n \tau(x)\,dx.
\]
It is now easy to check (we leave it to the reader) that
\[
\lim_{n\to\infty} \frac{1}{n}\int_0^n \tau(x)\,dx = 0.
\]
Corollary 20.6 (Feller's WLLN). If $\{X_n\}_{n=1}^\infty$ are i.i.d. and $\tau(x) := xP(|X_1| > x) \to 0$ as $x \to \infty$, then the hypotheses of Theorem 20.3 are satisfied, so that
\[
\frac{S_n}{n} - \mathbb{E}\left( X_1 : |X_1| \le n \right) \xrightarrow{P} 0.
\]

Proof. Since
\[
\sum_{k=1}^n P(|X_k| > n) = nP(|X_1| > n) = \tau(n) \to 0 \text{ as } n \to \infty,
\]
Eq. (20.1) is satisfied. Equation (20.2) follows from Lemma 20.5 and the identity,
\[
\frac{1}{n^2}\sum_{k=1}^n \mathbb{E}\left( X_k^2 : |X_k| \le n \right) = \frac{1}{n}\mathbb{E}\left[ |X_1|^2 : |X_1| \le n \right].
\]
As a direct corollary of Feller's WLLN and Remark 20.4 we get Khintchin's weak law of large numbers.

Corollary 20.7 (Khintchin's WLLN). If $\{X_n\}_{n=1}^\infty$ are i.i.d. $L^1(P)$ random variables, then $\frac{1}{n}S_n \xrightarrow{P} \mu = \mathbb{E}X_1$. This convergence holds in $L^1(P)$ as well since $\left\{ \frac{1}{n}S_n \right\}_{n=1}^\infty$ is uniformly integrable under these hypotheses.

This result is also clearly a consequence of Kolmogorov's strong law of large numbers.
20.1.1 A WLLN Example

Theorem 20.8 (Shannon's Theorem). Let $\{X_i\}_{i=1}^\infty$ be a sequence of i.i.d. random variables with values in $\{1, 2, \ldots, r\} \subset \mathbb{N}$, $p(k) := P(X_i = k) > 0$ for $1 \le k \le r$, and
\[
H(p) := -\mathbb{E}\left[ \ln p(X_1) \right] = -\sum_{k=1}^r p(k)\ln p(k)
\]
be the entropy of $p = \{p(k)\}_{k=1}^r$. If we define $\pi_n(\omega) := p(X_1(\omega))\cdots p(X_n(\omega))$ to be the probability of the realization $(X_1(\omega),\ldots,X_n(\omega))$, then for all $\varepsilon > 0$,
\[
P\left( e^{-n(H(p)+\varepsilon)} \le \pi_n \le e^{-n(H(p)-\varepsilon)} \right) \to 1 \text{ as } n \to \infty.
\]
Thus the probability, $\pi_n$, that the random sample $\{X_1,\ldots,X_n\}$ should occur is approximately $e^{-nH(p)}$ with high probability. The number $H(p)$ is called the entropy of the distribution $\{p(k)\}_{k=1}^r$.
Proof. Since $\{\ln p(X_i)\}_{i=1}^\infty$ are i.i.d., it follows by the weak law of large numbers that
\[
-\frac{1}{n}\ln\pi_n = -\frac{1}{n}\sum_{i=1}^n \ln p(X_i)
\xrightarrow{P} -\mathbb{E}\left[ \ln p(X_1) \right] = -\sum_{k=1}^r p(k)\ln p(k) =: H(p),
\]
i.e. for every $\varepsilon > 0$,
\[
\lim_{n\to\infty} P\left( \left| H(p) + \frac{1}{n}\ln\pi_n \right| > \varepsilon \right) = 0.
\]
Since
\[
\left\{ \left| H(p) + \frac{1}{n}\ln\pi_n \right| > \varepsilon \right\}
= \left\{ H(p) + \frac{1}{n}\ln\pi_n > \varepsilon \right\} \cup \left\{ H(p) + \frac{1}{n}\ln\pi_n < -\varepsilon \right\}
\]
\[
= \left\{ \frac{1}{n}\ln\pi_n > -H(p) + \varepsilon \right\} \cup \left\{ \frac{1}{n}\ln\pi_n < -H(p) - \varepsilon \right\}
= \left\{ \pi_n > e^{-n(H(p)-\varepsilon)} \right\} \cup \left\{ \pi_n < e^{-n(H(p)+\varepsilon)} \right\},
\]
it follows that
\[
\left\{ \left| H(p) + \frac{1}{n}\ln\pi_n \right| > \varepsilon \right\}^c
= \left\{ e^{-n(H(p)+\varepsilon)} \le \pi_n \le e^{-n(H(p)-\varepsilon)} \right\},
\]
and therefore
\[
P\left( e^{-n(H(p)+\varepsilon)} \le \pi_n \le e^{-n(H(p)-\varepsilon)} \right) \to 1 \text{ as } n \to \infty.
\]
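Shannon's theorem (the asymptotic equipartition property) can be watched numerically. The sketch below is my own and not part of the notes; the particular distribution $p$, the tolerance $\varepsilon$, and the trial counts are arbitrary choices.

```python
import math, random

def aep_fraction(p, n, eps, trials=2000, seed=1):
    """Fraction of length-n i.i.d. samples from p whose probability pi_n
    lies in [exp(-n(H+eps)), exp(-n(H-eps))] -- the event in Theorem 20.8."""
    rng = random.Random(seed)
    symbols = list(p)
    weights = [p[k] for k in symbols]
    H = -sum(q * math.log(q) for q in weights)  # entropy in nats
    hits = 0
    for _ in range(trials):
        seq = rng.choices(symbols, weights, k=n)
        log_pi = sum(math.log(p[s]) for s in seq)
        if -n * (H + eps) <= log_pi <= -n * (H - eps):
            hits += 1
    return hits / trials

p = {1: 0.5, 2: 0.25, 3: 0.25}   # H(p) = 1.5 bits ≈ 1.04 nats
f_large = aep_fraction(p, 400, 0.2)
# With n = 400 the sample entropy concentrates tightly around H(p),
# so the AEP band captures essentially every realization.
assert f_large > 0.95
```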
For our next example, let $\{X_n\}_{n=1}^\infty$ be i.i.d. random variables with common distribution function, $F(x) := P(X_n \le x)$. For $x \in \mathbb{R}$ let $F_n(x)$ be the empirical distribution function defined by,
\[
F_n(x) := \frac{1}{n}\sum_{j=1}^n 1_{X_j \le x}
= \left( \frac{1}{n}\sum_{j=1}^n \delta_{X_j} \right)\left( (-\infty, x] \right).
\]
Since $\mathbb{E}1_{X_j \le x} = F(x)$ and $\left\{ 1_{X_j \le x} \right\}_{j=1}^\infty$ are Bernoulli random variables, the weak law of large numbers implies $F_n(x) \xrightarrow{P} F(x)$ as $n \to \infty$. As usual, for $p \in (0,1)$ let
\[
F^{\leftarrow}(p) := \inf\{x : F(x) \ge p\}
\]
and recall that $F^{\leftarrow}(p) \le x$ iff $F(x) \ge p$. Let us notice that
\[
F_n^{\leftarrow}(p) = \inf\{x : F_n(x) \ge p\}
= \inf\left\{ x : \sum_{j=1}^n 1_{X_j \le x} \ge np \right\}
= \inf\left\{ x : \#\{j \le n : X_j \le x\} \ge np \right\}.
\]
Recall from Definition 11.10 that the order statistic of $(X_1,\ldots,X_n)$ is the finite sequence, $\left( X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)} \right)$, where $\left( X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)} \right)$ denotes $(X_1,\ldots,X_n)$ arranged in increasing order with possible repetitions. It follows from the formula in Definition 11.10 that $X_k^{(n)}$ are all random variables for $k \le n$, but it will be useful to give another proof. Indeed, $X_k^{(n)} \le x$ iff $\#\{j \le n : X_j \le x\} \ge k$ iff $\sum_{j=1}^n 1_{X_j \le x} \ge k$, i.e.
\[
\left\{ X_k^{(n)} \le x \right\} = \left\{ \sum_{j=1}^n 1_{X_j \le x} \ge k \right\} \in \mathcal{B}.
\]
Moreover, if we let $\lceil x \rceil = \min\{n \in \mathbb{Z} : n \ge x\}$, the reader may easily check that
\[
F_n^{\leftarrow}(p) = X_{\lceil np \rceil}^{(n)}.
\]
Proposition 20.9. Keeping the notation above, suppose that $p \in (0,1)$ is a point where
\[
F\left( F^{\leftarrow}(p) - \varepsilon \right) < p < F\left( F^{\leftarrow}(p) + \varepsilon \right) \text{ for all } \varepsilon > 0;
\]
then $X_{\lceil np \rceil}^{(n)} = F_n^{\leftarrow}(p) \xrightarrow{P} F^{\leftarrow}(p)$ as $n \to \infty$. Thus we can recover, with high probability, the $p^{\text{th}}$ quantile of the distribution $F$ by observing $\{X_i\}_{i=1}^n$.
Proof. Let $\varepsilon > 0$. Then
\[
\left\{ F_n^{\leftarrow}(p) - F^{\leftarrow}(p) > \varepsilon \right\}^c
= \left\{ F_n^{\leftarrow}(p) \le \varepsilon + F^{\leftarrow}(p) \right\}
= \left\{ F_n\left( \varepsilon + F^{\leftarrow}(p) \right) \ge p \right\},
\]
so that
\[
\left\{ F_n^{\leftarrow}(p) - F^{\leftarrow}(p) > \varepsilon \right\}
= \left\{ F_n\left( F^{\leftarrow}(p) + \varepsilon \right) < p \right\}
= \left\{ F_n\left( \varepsilon + F^{\leftarrow}(p) \right) - F\left( \varepsilon + F^{\leftarrow}(p) \right) < p - F\left( F^{\leftarrow}(p) + \varepsilon \right) \right\}.
\]
Letting $\delta_\varepsilon := F\left( F^{\leftarrow}(p) + \varepsilon \right) - p > 0$, we have, as $n \to \infty$, that
\[
P\left( F_n^{\leftarrow}(p) - F^{\leftarrow}(p) > \varepsilon \right)
= P\left( F_n\left( \varepsilon + F^{\leftarrow}(p) \right) - F\left( \varepsilon + F^{\leftarrow}(p) \right) < -\delta_\varepsilon \right) \to 0.
\]
Similarly, let $\delta'_\varepsilon := p - F\left( F^{\leftarrow}(p) - \varepsilon \right) > 0$ and observe that
\[
\left\{ F^{\leftarrow}(p) - F_n^{\leftarrow}(p) \ge \varepsilon \right\}
= \left\{ F_n^{\leftarrow}(p) \le F^{\leftarrow}(p) - \varepsilon \right\}
= \left\{ F_n\left( F^{\leftarrow}(p) - \varepsilon \right) \ge p \right\}
\]
and hence,
\[
P\left( F^{\leftarrow}(p) - F_n^{\leftarrow}(p) \ge \varepsilon \right)
= P\left( F_n\left( F^{\leftarrow}(p) - \varepsilon \right) - F\left( F^{\leftarrow}(p) - \varepsilon \right) \ge p - F\left( F^{\leftarrow}(p) - \varepsilon \right) \right)
\]
\[
= P\left( F_n\left( F^{\leftarrow}(p) - \varepsilon \right) - F\left( F^{\leftarrow}(p) - \varepsilon \right) \ge \delta'_\varepsilon \right) \to 0 \text{ as } n \to \infty.
\]
Thus we have shown that $X_{\lceil np \rceil}^{(n)} \xrightarrow{P} F^{\leftarrow}(p)$ as $n \to \infty$.
20.2 Kolmogorov's Convergence Criteria

Proposition 20.10 ($L^2$-Convergence of Random Sums). Suppose that $\{Y_k\}_{k=1}^\infty \subset L^2(P)$ are uncorrelated. If $\sum_{k=1}^\infty \operatorname{Var}(Y_k) < \infty$ then
\[
\sum_{k=1}^\infty (Y_k - \mu_k) \text{ converges in } L^2(P),
\]
where $\mu_k := \mathbb{E}Y_k$.

Proof. Letting $S_n := \sum_{k=1}^n (Y_k - \mu_k)$, it suffices by the completeness of $L^2(P)$ (see Theorem 12.25) to show $\|S_n - S_m\|_2 \to 0$ as $m, n \to \infty$. Supposing $n > m$, we have
\[
\|S_n - S_m\|_2^2 = \mathbb{E}\left( \sum_{k=m+1}^n (Y_k - \mu_k) \right)^2
= \sum_{k=m+1}^n \operatorname{Var}(Y_k) = \sum_{k=m+1}^n \sigma_k^2 \to 0 \text{ as } m, n \to \infty.
\]
Theorem 20.11 (Kolmogorov's Convergence Criteria). Suppose that $\{Y_n\}_{n=1}^\infty$ are independent square integrable random variables. If $\sum_{j=1}^\infty \operatorname{Var}(Y_j) < \infty$, then $\sum_{j=1}^\infty (Y_j - \mathbb{E}Y_j)$ converges a.s. In particular if $\sum_{j=1}^\infty \operatorname{Var}(Y_j) < \infty$ and $\sum_{j=1}^\infty \mathbb{E}Y_j$ is convergent, then $\sum_{j=1}^\infty Y_j$ converges a.s. and in $L^2(P)$.

Proof. This is a special case of Theorem 18.67. Indeed, let $S_n := \sum_{j=1}^n (Y_j - \mathbb{E}Y_j)$ with $S_0 = 0$. Then $\{S_n\}_{n=0}^\infty$ is a martingale relative to the filtration, $\mathcal{B}_n = \sigma(S_0, \ldots, S_n)$. By assumption we have
\[
\mathbb{E}S_n^2 = \sum_{j=1}^n \operatorname{Var}(Y_j) \le \sum_{j=1}^\infty \operatorname{Var}(Y_j) < \infty,
\]
so that $\{S_n\}_{n=0}^\infty$ is bounded in $L^2(P)$. Therefore by Theorem 18.67,
\[
\sum_{j=1}^\infty (Y_j - \mathbb{E}Y_j) = \lim_{n\to\infty} S_n \text{ exists a.s. and in } L^2(P).
\]
Another way to prove this is to appeal to Proposition 20.10 above and Levy's Theorem 20.46 below. A second method is to make use of Kolmogorov's inequality, and we will give this proof below.
Example 20.12 (Brownian Motion). Let $\{N_n\}_{n=1}^\infty$ be i.i.d. standard normal random variables, i.e.
\[
P(N_n \in A) = \int_A \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx \text{ for all } A \in \mathcal{B}_\mathbb{R}.
\]
Let $\{\omega_n\}_{n=1}^\infty \subset \mathbb{R}$, $\{a_n\}_{n=1}^\infty \subset \mathbb{R}$, and $t \in \mathbb{R}$; then $\sum_{n=1}^\infty a_n N_n \sin\omega_n t$ converges a.s. provided $\sum_{n=1}^\infty a_n^2 < \infty$. This is a simple consequence of Kolmogorov's convergence criteria, Theorem 20.11, and the facts that $\mathbb{E}\left[ a_n N_n \sin\omega_n t \right] = 0$ and
\[
\operatorname{Var}\left( a_n N_n \sin\omega_n t \right) = a_n^2 \sin^2\omega_n t \le a_n^2.
\]
As a special case, if we take $\omega_n = (2n-1)\frac{\pi}{2}$ and $a_n = \frac{\sqrt{2}}{(2n-1)\frac{\pi}{2}}$, then it follows that
\[
B_t := \sqrt{2}\sum_{k=1,3,5,\ldots} \frac{N_k}{k\frac{\pi}{2}} \sin\left( k\frac{\pi}{2}t \right) \tag{20.4}
\]
is a.s. convergent for all $t \in \mathbb{R}$. The factor $\frac{\sqrt{2}}{k\frac{\pi}{2}}$ has been determined by requiring,
\[
\int_0^1 \left( \frac{d}{dt}\,\frac{\sqrt{2}}{k\frac{\pi}{2}} \sin\left( k\frac{\pi}{2}t \right) \right)^2 dt = 1,
\]
as seen by,
\[
\int_0^1 \left( \frac{d}{dt}\sin\left( \frac{k\pi}{2}t \right) \right)^2 dt
= \frac{k^2\pi^2}{4}\int_0^1 \cos^2\left( \frac{k\pi}{2}t \right) dt
= \frac{k^2\pi^2}{4}\cdot\frac{1}{2} = \frac{k^2\pi^2}{8},
\]
so that
\[
\int_0^1 \left( \frac{d}{dt}\,\frac{\sqrt{2}}{k\frac{\pi}{2}} \sin\left( \frac{k\pi}{2}t \right) \right)^2 dt
= \frac{2}{(k\pi/2)^2}\cdot\frac{k^2\pi^2}{8} = 1.
\]
Fact: Wiener in 1923 showed the series in Eq. (20.4) is in fact almost surely uniformly convergent. Given this, the process, $t \mapsto B_t$, is almost surely continuous. The process $\{B_t : 0 \le t \le 1\}$ is Brownian Motion.
Kolmogorov's convergence criteria becomes a powerful tool when combined with the following real variable lemma.
Lemma 20.13 (Kronecker's Lemma). Suppose that $\{x_k\} \subset \mathbb{R}$ and $\{a_k\} \subset (0,\infty)$ are sequences such that $a_k \uparrow \infty$ and $\sum_{k=1}^\infty \frac{x_k}{a_k}$ is convergent in $\mathbb{R}$. Then
\[
\lim_{n\to\infty} \frac{1}{a_n}\sum_{k=1}^n x_k = 0.
\]

Proof. Before going to the proof, let us warm up by proving the following continuous version of the lemma. Let $a(s) \in (0,\infty)$ and $x(s) \in \mathbb{R}$ be continuous functions such that $a(s) \uparrow \infty$ as $s \to \infty$ and $\int_1^\infty \frac{x(s)}{a(s)}\,ds$ exists. We are going to show
\[
\lim_{n\to\infty} \frac{1}{a(n)}\int_1^n x(s)\,ds = 0.
\]
Let $X(s) := \int_0^s x(u)\,du$ and
\[
r(s) := \int_s^\infty \frac{X'(u)}{a(u)}\,du = \int_s^\infty \frac{x(u)}{a(u)}\,du.
\]
Then by assumption, $r(s) \to 0$ as $s \to \infty$ and $X'(s) = -a(s)r'(s)$. Integrating this equation shows
\[
X(s) - X(s_0) = -\int_{s_0}^s a(u)r'(u)\,du
= -a(u)r(u)\Big|_{u=s_0}^s + \int_{s_0}^s r(u)a'(u)\,du.
\]
Dividing this equation by $a(s)$ and then letting $s \to \infty$ gives
\[
\limsup_{s\to\infty} \frac{|X(s)|}{a(s)}
= \limsup_{s\to\infty} \left| \frac{a(s_0)r(s_0) - a(s)r(s)}{a(s)} + \frac{1}{a(s)}\int_{s_0}^s r(u)a'(u)\,du \right|
\]
\[
\le \limsup_{s\to\infty} \left[ |r(s)| + \frac{1}{a(s)}\int_{s_0}^s |r(u)|\,a'(u)\,du \right]
\le \limsup_{s\to\infty} \left[ \frac{a(s) - a(s_0)}{a(s)} \sup_{u \ge s_0}|r(u)| \right]
= \sup_{u \ge s_0}|r(u)| \to 0 \text{ as } s_0 \to \infty.
\]
With this as warm-up, we go to the discrete case. Let
\[
S_k := \sum_{j=1}^k x_j \quad\text{and}\quad r_k := \sum_{j=k}^\infty \frac{x_j}{a_j},
\]
so that $r_k \to 0$ as $k \to \infty$ by assumption. Since $x_k = a_k(r_k - r_{k+1})$, we find
\[
\frac{S_n}{a_n} = \frac{1}{a_n}\sum_{k=1}^n a_k(r_k - r_{k+1})
= \frac{1}{a_n}\left[ \sum_{k=1}^n a_k r_k - \sum_{k=2}^{n+1} a_{k-1} r_k \right]
= \frac{1}{a_n}\left[ a_1 r_1 - a_n r_{n+1} + \sum_{k=2}^n (a_k - a_{k-1})r_k \right] \quad\text{(summation by parts)}.
\]
Using the fact that $a_k - a_{k-1} \ge 0$ for all $k \ge 2$, and
\[
\lim_{n\to\infty} \frac{1}{a_n}\sum_{k=2}^m (a_k - a_{k-1})\,|r_k| = 0
\]
for any $m \in \mathbb{N}$; we may conclude
\[
\limsup_{n\to\infty} \left| \frac{S_n}{a_n} \right|
\le \limsup_{n\to\infty} \frac{1}{a_n}\left[ \sum_{k=2}^n (a_k - a_{k-1})\,|r_k| \right]
= \limsup_{n\to\infty} \frac{1}{a_n}\left[ \sum_{k=m}^n (a_k - a_{k-1})\,|r_k| \right]
\]
\[
\le \sup_{k \ge m}|r_k| \cdot \limsup_{n\to\infty} \frac{1}{a_n}\left[ \sum_{k=m}^n (a_k - a_{k-1}) \right]
= \sup_{k \ge m}|r_k| \cdot \limsup_{n\to\infty} \frac{a_n - a_{m-1}}{a_n}
= \sup_{k \ge m}|r_k|.
\]
This completes the proof since $\sup_{k \ge m}|r_k| \to 0$ as $m \to \infty$. (See Kallenberg for a better proof.)
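A deterministic numeric check of Kronecker's lemma (my own example, not from the notes): take $x_k = (-1)^k/\ln(k+1)$ and $a_k = k$. The series $\sum x_k/a_k$ converges by the alternating series test, so the lemma forces $\frac{1}{n}\sum_{k\le n} x_k \to 0$ even though $\sum x_k$ itself does not converge to $0$ term-wise fast.

```python
import math

def kronecker_ratio(n):
    """(1/a_n) * sum_{k<=n} x_k with x_k = (-1)^k / ln(k+1) and a_k = k.
    sum_k x_k / a_k converges (alternating), so this ratio must -> 0."""
    s = sum((-1) ** k / math.log(k + 1) for k in range(1, n + 1))
    return s / n

r1 = abs(kronecker_ratio(10**3))
r2 = abs(kronecker_ratio(10**5))
# The partial sums S_n stay bounded, so |S_n / n| decays roughly like 1/n.
assert r2 < r1 and r2 < 1e-3
```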
Corollary 20.14. Let $\{X_n\}$ be a sequence of independent square integrable random variables and $\{b_n\}$ be a sequence such that $b_n \uparrow \infty$. If
\[
\sum_{k=1}^\infty \frac{\operatorname{Var}(X_k)}{b_k^2} < \infty
\]
then
\[
\frac{S_n - \mathbb{E}S_n}{b_n} \to 0 \text{ a.s. and in } L^2(P).
\]

Proof. By Kolmogorov's convergence criteria, Theorem 20.11,
\[
\sum_{k=1}^\infty \frac{X_k - \mathbb{E}X_k}{b_k} \text{ is convergent a.s. and in } L^2(P).
\]
Therefore an application of Kronecker's Lemma 20.13 implies
\[
0 = \lim_{n\to\infty} \frac{1}{b_n}\sum_{k=1}^n (X_k - \mathbb{E}X_k) = \lim_{n\to\infty} \frac{S_n - \mathbb{E}S_n}{b_n} \text{ a.s.}
\]
Similarly by Kronecker's Lemma 20.13 we know that
\[
0 = \lim_{n\to\infty} \frac{1}{b_n^2}\sum_{k=1}^n \operatorname{Var}(X_k)
= \lim_{n\to\infty} \mathbb{E}\left( \frac{S_n - \mathbb{E}S_n}{b_n} \right)^2,
\]
which gives the $L^2(P)$ convergence statement as well.
As an immediate corollary we have the following.

Corollary 20.15 ($L^2$ SSLN). Let $\{X_n\}$ be a sequence of independent random variables such that $\sigma^2 = \mathbb{E}X_n^2 < \infty$ and $\mu = \mathbb{E}X_n$ are independent of $n$. As above let $S_n = \sum_{k=1}^n X_k$. If $\{b_n\}_{n=1}^\infty \subset (0,\infty)$ is a sequence such that $b_n \uparrow \infty$ and $\sum_{n=1}^\infty \frac{1}{b_n^2} < \infty$, then
\[
\frac{1}{b_n}(S_n - n\mu) \to 0 \text{ a.s. and in } L^2(P). \tag{20.5}
\]
We may rewrite Eq. (20.5) as $S_n = n\mu + o(1)\,b_n$ or
\[
\frac{S_n}{n} = \mu + o(1)\,\frac{b_n}{n}.
\]
Example 20.16. For example, we could take $b_n = n$, or $b_n = n^p$ for any $p > 1/2$, or $b_n = n^{1/2}(\ln n)^{1/2+\varepsilon}$ for any $\varepsilon > 0$. The idea here is that
\[
\sum_{n=2}^\infty \frac{1}{\left[ n^{1/2}(\ln n)^{1/2+\varepsilon} \right]^2}
= \sum_{n=2}^\infty \frac{1}{n(\ln n)^{1+2\varepsilon}},
\]
which may be analyzed by comparison with the integral
\[
\int_2^\infty \frac{1}{x\ln^{1+2\varepsilon} x}\,dx
= \int_{\ln 2}^\infty \frac{1}{e^y y^{1+2\varepsilon}}\,e^y\,dy
= \int_{\ln 2}^\infty \frac{1}{y^{1+2\varepsilon}}\,dy < \infty,
\]
wherein we have made the change of variables, $y = \ln x$. When $b_n = n^{1/2}(\ln n)^{1/2+\varepsilon}$ we may conclude that
\[
\frac{S_n}{n} = \mu + o(1)\,\frac{(\ln n)^{1/2+\varepsilon}}{n^{1/2}},
\]
i.e. the fluctuations of $\frac{S_n}{n}$ about the mean, $\mu$, have order smaller than $n^{-1/2}(\ln n)^{1/2+\varepsilon}$.
Fact 20.17 (Missing Reference) Under the hypotheses in Corollary 20.15 (this is the law of the iterated logarithm),
\[
\limsup_{n\to\infty} \frac{S_n - n\mu}{n^{1/2}(\ln\ln n)^{1/2}} = \sigma\sqrt{2} \text{ a.s.}
\]
We end this section with another example of using Kolmogorov's convergence criteria in conjunction with Kronecker's Lemma 20.13.

Lemma 20.18. Let $\{X_n\}_{n=1}^\infty$ be independent square integrable random variables such that $\mathbb{E}S_n \uparrow \infty$ as $n \to \infty$. Then
\[
\sum_{n=1}^\infty \operatorname{Var}\left( \frac{X_n}{\mathbb{E}S_n} \right)
= \sum_{n=1}^\infty \frac{\operatorname{Var}(X_n)}{(\mathbb{E}S_n)^2} < \infty
\implies \frac{S_n}{\mathbb{E}S_n} \to 1 \text{ a.s.}
\]

Proof. By Kolmogorov's convergence criteria, Theorem 20.11, we know that
\[
\sum_{n=1}^\infty \frac{X_n - \mathbb{E}X_n}{\mathbb{E}S_n} \text{ is a.s. convergent.}
\]
It then follows by Kronecker's Lemma 20.13 that
\[
0 = \lim_{n\to\infty} \frac{1}{\mathbb{E}S_n}\sum_{i=1}^n (X_i - \mathbb{E}X_i)
= \lim_{n\to\infty} \left( \frac{S_n}{\mathbb{E}S_n} - 1 \right) \text{ a.s.}
\]
Example 20.19. Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. square integrable random variables with $\mu := \mathbb{E}X_n > 0$ and $\sigma^2 := \operatorname{Var}(X_n) < \infty$. Since $\mathbb{E}S_n = n\mu$ and
\[
\sum_{n=1}^\infty \frac{\operatorname{Var}(X_n)}{(\mathbb{E}S_n)^2} = \sum_{n=1}^\infty \frac{\sigma^2}{n^2\mu^2} < \infty,
\]
we may conclude that $\lim_{n\to\infty} \frac{S_n}{n\mu} = 1$ a.s., i.e. $S_n/n \to \mu$ a.s. as we already know.
We now assume that $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables with a continuous distribution function and let $A_j$ denote the event that $X_j$ is a record, i.e.
\[
A_j := \left\{ X_j > \max\{X_1, X_2, \ldots, X_{j-1}\} \right\}.
\]
Recall from Renyi's Theorem 10.23 that $\{A_j\}_{j=1}^\infty$ are independent and $P(A_j) = \frac{1}{j}$ for all $j$.

Proposition 20.20. Keeping the preceding notation, let $S_n := \sum_{j=1}^n 1_{A_j}$ denote the number of records in the first $n$ observations. Then $\lim_{n\to\infty} \frac{S_n}{\ln n} = 1$ a.s.
Proof. In this case
\[
\mathbb{E}S_n = \sum_{j=1}^n \mathbb{E}1_{A_j} = \sum_{j=1}^n \frac{1}{j} \sim \int_1^n \frac{1}{x}\,dx = \ln n
\]
and
\[
\operatorname{Var}(1_{A_n}) = \mathbb{E}1_{A_n}^2 - \left( \mathbb{E}1_{A_n} \right)^2 = \frac{1}{n} - \frac{1}{n^2} = \frac{n-1}{n^2},
\]
so that
\[
\sum_{n=1}^\infty \operatorname{Var}\left( \frac{1_{A_n}}{\mathbb{E}S_n} \right)
= \sum_{n=1}^\infty \left( \frac{1}{n} - \frac{1}{n^2} \right)\frac{1}{\left( \sum_{j=1}^n \frac{1}{j} \right)^2}
\le \sum_{n=1}^\infty \frac{1}{\left( \sum_{j=1}^n \frac{1}{j} \right)^2}\,\frac{1}{n}
\le 1 + \int_2^\infty \frac{1}{\ln^2 x}\,\frac{1}{x}\,dx
= 1 + \int_{\ln 2}^\infty \frac{1}{y^2}\,dy < \infty.
\]
Therefore by Lemma 20.18 we may conclude that $\lim_{n\to\infty} \frac{S_n}{\mathbb{E}S_n} = 1$ a.s. So to finish the proof it only remains to show
\[
\lim_{n\to\infty} \frac{\mathbb{E}S_n}{\ln n} = \lim_{n\to\infty} \frac{\sum_{j=1}^n \frac{1}{j}}{\ln n} = 1. \tag{20.6}
\]
To see this write
\[
\ln(n+1) = \int_1^{n+1} \frac{1}{x}\,dx
= \sum_{j=1}^n \int_j^{j+1} \frac{1}{x}\,dx
= \sum_{j=1}^n \int_j^{j+1} \left( \frac{1}{x} - \frac{1}{j} \right) dx + \sum_{j=1}^n \frac{1}{j}
= \varepsilon_n + \sum_{j=1}^n \frac{1}{j} \tag{20.7}
\]
where
\[
|\varepsilon_n| = \left| \sum_{j=1}^n \left( \ln\frac{j+1}{j} - \frac{1}{j} \right) \right|
= \sum_{j=1}^n \left( \frac{1}{j} - \ln(1 + 1/j) \right)
\le \sum_{j=1}^\infty \frac{1}{j^2}
\]
and hence we conclude that $\lim_{n\to\infty} |\varepsilon_n| < \infty$. So dividing Eq. (20.7) by $\ln n$ and letting $n \to \infty$ gives the desired limit in Eq. (20.6).
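Proposition 20.20 is easy to test by simulation (a sketch of my own; the uniform distribution and $n = 10^6$ are arbitrary choices — any continuous distribution gives the same record structure). Note how few records appear: $\ln 10^6 \approx 13.8$.

```python
import math, random

def count_records(n, seed=7):
    """Number of records among n i.i.d. continuous observations."""
    rng = random.Random(seed)
    best = -math.inf
    records = 0
    for _ in range(n):
        x = rng.random()
        if x > best:
            records += 1
            best = x
    return records

n = 10**6
ratio = count_records(n) / math.log(n)
# S_n / ln n -> 1 a.s.; fluctuations are of order 1/sqrt(ln n), so at
# n = 10^6 the ratio is only loosely pinned down.
assert 0.3 < ratio < 3.0
```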
20.3 The Strong Law of Large Numbers Revisited

Definition 20.21. Two sequences, $\{X_n\}$ and $\{X'_n\}$, of random variables are tail equivalent if
\[
\mathbb{E}\left[ \sum_{n=1}^\infty 1_{X_n \neq X'_n} \right]
= \sum_{n=1}^\infty P(X_n \neq X'_n) < \infty.
\]

Proposition 20.22. Suppose $\{X_n\}$ and $\{X'_n\}$ are tail equivalent. Then

1. $\sum (X_n - X'_n)$ converges a.s.
2. The sum $\sum X_n$ is convergent a.s. iff the sum $\sum X'_n$ is convergent a.s. More generally, the events $\left\{ \sum X_n \text{ is convergent} \right\}$ and $\left\{ \sum X'_n \text{ is convergent} \right\}$ agree up to a $P$-null set.
3. If there exists a random variable, $X$, and a sequence $a_n \uparrow \infty$ such that
\[
\lim_{n\to\infty} \frac{1}{a_n}\sum_{k=1}^n X_k = X \text{ a.s.}
\]
then
\[
\lim_{n\to\infty} \frac{1}{a_n}\sum_{k=1}^n X'_k = X \text{ a.s.}
\]

Proof. If $\{X_n\}$ and $\{X'_n\}$ are tail equivalent, we know by the first Borel–Cantelli Lemma 7.14 that $P(X_n = X'_n \text{ for a.a. } n) = 1$. The proposition is an easy consequence of this observation.
Remark 20.23. In what follows we will typically have a sequence, $\{X_n\}_{n=1}^\infty$, of independent random variables and $X'_n = f_n(X_n)$ for some cutoff functions, $f_n : \mathbb{R} \to \mathbb{R}$. In this case the collection of sets, $\left\{ A_n := \{X_n \neq X'_n\} \right\}_{n=1}^\infty$, are independent and so by the Borel zero-one law (Lemma 10.41) we will have
\[
P(X_n \neq X'_n \text{ i.o. } n) = 0 \iff \sum_{n=1}^\infty P(X_n \neq X'_n) < \infty.
\]
So in this case $\{X_n\}$ and $\{X'_n\}$ are tail equivalent iff $P(X_n = X'_n \text{ a.a. } n) = 1$. For example if $\{k_n\}_{n=1}^\infty \subset (0,\infty)$ and $X'_n := X_n 1_{|X_n| \le k_n}$, then the following are equivalent;

1. $P(|X_n| \le k_n \text{ a.a. } n) = 1$,
2. $P(|X_n| > k_n \text{ i.o. } n) = 0$,
3. $\sum_{n=1}^\infty P(X_n \neq X'_n) = \sum_{n=1}^\infty P(|X_n| > k_n) < \infty$,
4. $\{X_n\}$ and $\{X'_n\}$ are tail equivalent.
Lemma 20.24. Suppose that $X : \Omega \to \mathbb{R}$ is a random variable and $p \in (0,\infty)$, then
\[
\mathbb{E}|X|^p = \int_0^\infty p s^{p-1} P(|X| \ge s)\,ds = \int_0^\infty p s^{p-1} P(|X| > s)\,ds.
\]

Proof. By the fundamental theorem of calculus,
\[
|X|^p = \int_0^{|X|} p s^{p-1}\,ds
= p\int_0^\infty 1_{s \le |X|}\,s^{p-1}\,ds
= p\int_0^\infty 1_{s < |X|}\,s^{p-1}\,ds.
\]
Taking expectations of this identity along with an application of Tonelli's theorem completes the proof.
Lemma 20.25. If $X$ is a random variable and $\varepsilon > 0$, then
\[
\sum_{n=1}^\infty P(|X| \ge n\varepsilon) \le \frac{1}{\varepsilon}\mathbb{E}|X| \le \sum_{n=0}^\infty P(|X| \ge n\varepsilon). \tag{20.8}
\]

Proof. First observe that for all $y \ge 0$ we have,
\[
\sum_{n=1}^\infty 1_{n \le y} \le y \le \sum_{n=1}^\infty 1_{n \le y} + 1 = \sum_{n=0}^\infty 1_{n \le y}. \tag{20.9}
\]
Taking $y = |X|/\varepsilon$ in Eq. (20.9) and then taking expectations gives the estimate in Eq. (20.8).
Proposition 20.26. Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables, then the following are equivalent:

1. $\mathbb{E}|X_1| < \infty$.
2. There exists $\varepsilon > 0$ such that $\sum_{n=1}^\infty P(|X_1| \ge n\varepsilon) < \infty$.
3. For all $\varepsilon > 0$, $\sum_{n=1}^\infty P(|X_1| \ge n\varepsilon) < \infty$.
4. $\lim_{n\to\infty} \frac{|X_n|}{n} = 0$ a.s.

Proof. The equivalence of items 1., 2., and 3. easily follows from Lemma 20.25. So to finish the proof it suffices to show 3. is equivalent to 4. To this end we start by noting that $\lim_{n\to\infty} \frac{|X_n|}{n} = 0$ a.s. iff
\[
0 = P\left( \frac{|X_n|}{n} \ge \varepsilon \text{ i.o.} \right) = P(|X_n| \ge n\varepsilon \text{ i.o.}) \text{ for all } \varepsilon > 0. \tag{20.10}
\]
Because $\{|X_n| \ge n\varepsilon\}_{n=1}^\infty$ are independent sets, the Borel zero-one law (Lemma 10.41) shows the statement in Eq. (20.10) is equivalent to
\[
\sum_{n=1}^\infty P(|X_n| \ge n\varepsilon) < \infty \text{ for all } \varepsilon > 0.
\]
Corollary 20.27. Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables such that $\frac{1}{n}S_n \to c \in \mathbb{R}$ a.s., then $X_n \in L^1(P)$ and $\mu := \mathbb{E}X_n = c$.

Proof. If $\frac{1}{n}S_n \to c$ a.s. then $\varepsilon_n := \frac{S_{n+1}}{n+1} - \frac{S_n}{n} \to 0$ a.s. and therefore,
\[
\frac{X_{n+1}}{n+1} = \frac{S_{n+1}}{n+1} - \frac{S_n}{n+1}
= \varepsilon_n + S_n\left( \frac{1}{n} - \frac{1}{n+1} \right)
= \varepsilon_n + \frac{1}{n+1}\cdot\frac{S_n}{n} \to 0 + 0\cdot c = 0.
\]
Hence an application of Proposition 20.26 shows $X_n \in L^1(P)$. Moreover by Exercise 12.6, $\left\{ \frac{1}{n}S_n \right\}_{n=1}^\infty$ is a uniformly integrable sequence and therefore,
\[
\mu = \mathbb{E}\left[ \frac{1}{n}S_n \right] \to \mathbb{E}\left[ \lim_{n\to\infty} \frac{1}{n}S_n \right] = \mathbb{E}[c] = c.
\]
Lemma 20.28. For all $x > 0$,
\[
\varphi(x) := \sum_{n=1}^\infty \frac{1}{n^2} 1_{x \le n} = \sum_{n \ge x} \frac{1}{n^2} \le 2\min\left( \frac{1}{x}, 1 \right).
\]

Proof. The proof will be by comparison with the integral, $\int_a^\infty \frac{1}{t^2}\,dt = 1/a$. For example,
\[
\sum_{n=1}^\infty \frac{1}{n^2} \le 1 + \int_1^\infty \frac{1}{t^2}\,dt = 1 + 1 = 2
\]
and so
\[
\sum_{n \ge x} \frac{1}{n^2} \le \sum_{n=1}^\infty \frac{1}{n^2} \le 2 \le \frac{2}{x} \text{ for } 0 < x \le 1.
\]
Similarly, for $x > 1$,
\[
\sum_{n \ge x} \frac{1}{n^2} \le \frac{1}{x^2} + \int_x^\infty \frac{1}{t^2}\,dt
= \frac{1}{x^2} + \frac{1}{x} = \frac{1}{x}\left( 1 + \frac{1}{x} \right) \le \frac{2}{x},
\]
see Figure 20.1 below.
Lemma 20.29. Suppose that $X : \Omega \to \mathbb{R}$ is a random variable, then
\[
\sum_{n=1}^\infty \frac{1}{n^2}\mathbb{E}\left[ |X|^2 : |X| \le n \right] \le 2\mathbb{E}|X|.
\]
Fig. 20.1. Estimating $\sum_{n \ge x} 1/n^2$ with an integral.
Proof. This is a simple application of Lemma 20.28;
\[
\sum_{n=1}^\infty \frac{1}{n^2}\mathbb{E}\left[ |X|^2 : |X| \le n \right]
= \mathbb{E}\left[ |X|^2 \sum_{n=1}^\infty \frac{1}{n^2} 1_{|X| \le n} \right]
= \mathbb{E}\left[ |X|^2 \varphi(|X|) \right]
\le 2\mathbb{E}\left[ |X|^2 \left( \frac{1}{|X|} \wedge 1 \right) \right]
\le 2\mathbb{E}|X|.
\]
With this as preparation we are now in a position to give another proof of Kolmogorov's strong law of large numbers which has already appeared in Theorem 16.10 and Example 18.78.

Theorem 20.30 (Kolmogorov's Strong Law of Large Numbers). Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables and let $S_n := X_1 + \cdots + X_n$. Then there exists $\mu \in \mathbb{R}$ such that $\frac{1}{n}S_n \to \mu$ a.s. iff $X_n$ is integrable, in which case $\mathbb{E}X_n = \mu$.
Proof. The implication, $\frac{1}{n}S_n \to \mu$ a.s. implies $X_n \in L^1(P)$ and $\mathbb{E}X_n = \mu$, has already been proved in Corollary 20.27. So let us now assume $X_n \in L^1(P)$ and let $\mu := \mathbb{E}X_n$.

Let $X'_n := X_n 1_{|X_n| \le n}$. By Lemma 20.25,
\[
\sum_{n=1}^\infty P(X'_n \neq X_n) = \sum_{n=1}^\infty P(|X_n| > n)
= \sum_{n=1}^\infty P(|X_1| > n) \le \mathbb{E}|X_1| < \infty,
\]
and hence $\{X_n\}$ and $\{X'_n\}$ are tail equivalent. Therefore, by Proposition 20.22, it suffices to show $\lim_{n\to\infty} \frac{1}{n}S'_n = \mu$ a.s. where $S'_n := X'_1 + \cdots + X'_n$. But by Lemma 20.29,
\[
\sum_{n=1}^\infty \frac{\operatorname{Var}(X'_n)}{n^2}
\le \sum_{n=1}^\infty \frac{\mathbb{E}|X'_n|^2}{n^2}
= \sum_{n=1}^\infty \frac{\mathbb{E}\left[ |X_n|^2 1_{|X_n| \le n} \right]}{n^2}
= \sum_{n=1}^\infty \frac{\mathbb{E}\left[ |X_1|^2 1_{|X_1| \le n} \right]}{n^2}
\le 2\mathbb{E}|X_1| < \infty. \tag{20.11}
\]
Therefore by Kolmogorov's convergence criteria, Theorem 20.11,
\[
\sum_{n=1}^\infty \frac{X'_n - \mathbb{E}X'_n}{n} \text{ is almost surely convergent.}
\]
Kronecker's Lemma 20.13 then implies
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n (X'_k - \mathbb{E}X'_k) = 0 \text{ a.s.}
\]
So to finish the proof, it only remains to observe
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n \mathbb{E}X'_k
= \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n \mathbb{E}\left[ X_k 1_{|X_k| \le k} \right]
= \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n \mathbb{E}\left[ X_1 1_{|X_1| \le k} \right] = \mu.
\]
Here we have used the dominated convergence theorem to see that $a_k := \mathbb{E}\left[ X_1 1_{|X_1| \le k} \right] \to \mu$ as $k \to \infty$, from which it is easy (and standard) to check that $\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n a_k = \mu$.
Remark 20.31. If $\mathbb{E}|X_1| = \infty$ but $\mathbb{E}X_1^- < \infty$, then $\frac{1}{n}S_n \to \infty$ a.s. To prove this, for $M > 0$ let $X_n^M := X_n \wedge M$ and $S_n^M := \sum_{i=1}^n X_i^M$. It follows from Theorem 20.30 that $\frac{1}{n}S_n^M \to \mu_M := \mathbb{E}X_1^M$ a.s. Since $S_n \ge S_n^M$, we may conclude that
\[
\liminf_{n\to\infty} \frac{S_n}{n} \ge \liminf_{n\to\infty} \frac{1}{n}S_n^M = \mu_M \text{ a.s.}
\]
Since $\mu_M \uparrow \infty$ as $M \uparrow \infty$, it follows that $\liminf_{n\to\infty} \frac{S_n}{n} = \infty$ a.s. and hence that $\lim_{n\to\infty} \frac{S_n}{n} = \infty$ a.s.
Exercise 20.3 (Resnick 7.9). Let $\{X_n\}_{n=1}^\infty$ be i.i.d. with $\mathbb{E}|X_1| < \infty$ and $\mathbb{E}X_1 = 0$. Following the ideas in the proof of Theorem 20.30, show for any bounded sequence $\{c_n\}_{n=1}^\infty$ of real numbers that
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n c_k X_k = 0 \text{ a.s.}
\]
20.3.1 Strong Law of Large Number Examples

Example 20.32 (Renewal Theory). Let $\{X_i\}_{i=1}^\infty$ be i.i.d. non-negative integrable random variables such that $P(X_i > 0) > 0$. Think of the $X_i$ as the life time of bulb number $i$, $\mu := \mathbb{E}X_i$ is the mean life time of each bulb, and $T_n := X_1 + \cdots + X_n$ is the time that the $n^{\text{th}}$ bulb burns out. (We assume the bulbs are replaced immediately on burning out.) By convention, we set $T_0 = 0$. Let
\[
N_t := \sup\{n \ge 0 : T_n \le t\}
\]
denote the number of bulbs which have burned out up to time $t$. Since $\mathbb{E}X_i < \infty$, $X_i < \infty$ a.s. and therefore $T_n < \infty$ a.s. for all $n$. From this observation it follows that $N_t \uparrow \infty$ as $t \uparrow \infty$ on the set $\Omega_1 := \bigcap_{i=1}^\infty \{X_i < \infty\}$, a subset of $\Omega$ with full measure.

It is reasonable to guess that $N_t \sim t/\mu$ and indeed we will show;
\[
\lim_{t\to\infty} \frac{1}{t}N_t = \frac{1}{\mu} \text{ a.s.} \tag{20.12}
\]
To prove Eq. (20.12), by the SSLN, if $\Omega_0 := \left\{ \lim_{n\to\infty} \frac{1}{n}T_n = \mu \right\}$ then $P(\Omega_0) = 1$. From the definition of $N_t$, $T_{N_t} \le t < T_{N_t+1}$ and so
\[
\frac{T_{N_t}}{N_t} \le \frac{t}{N_t} < \frac{T_{N_t+1}}{N_t}.
\]
For $\omega \in \Omega_0 \cap \Omega_1$ we have
\[
\mu = \lim_{t\to\infty} \frac{T_{N_t(\omega)}(\omega)}{N_t(\omega)}
\le \liminf_{t\to\infty} \frac{t}{N_t(\omega)}
\le \limsup_{t\to\infty} \frac{t}{N_t(\omega)}
\le \lim_{t\to\infty} \left( \frac{T_{N_t(\omega)+1}(\omega)}{N_t(\omega)+1}\cdot\frac{N_t(\omega)+1}{N_t(\omega)} \right) = \mu.
\]
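Eq. (20.12) is easy to watch in simulation (my own sketch, not from the notes; Uniform(0,1) lifetimes are an arbitrary choice, giving $\mu = 1/2$ so that $N_t/t$ should approach $2$).

```python
import random

def renewal_count(t, rng):
    """N_t = sup{n >= 0 : T_n <= t} for i.i.d. Uniform(0,1) lifetimes."""
    total, n = 0.0, 0
    while True:
        total += rng.random()   # next lifetime X_{n+1}
        if total > t:           # T_{n+1} > t, so N_t = n
            return n
        n += 1

rng = random.Random(11)
t = 50000.0
rate = renewal_count(t, rng) / t
assert abs(rate - 2.0) < 0.05  # N_t / t -> 1/mu = 2, per Eq. (20.12)
```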
Example 20.33 (Renewal Theory II). Let $\{X_i\}_{i=1}^\infty$ be i.i.d. and $\{Y_i\}_{i=1}^\infty$ be i.i.d. non-negative integrable random variables with $\{X_i\}_{i=1}^\infty$ being independent of the $\{Y_i\}_{i=1}^\infty$, and let $\mu = \mathbb{E}X_1$ and $\nu = \mathbb{E}Y_1$. Again assume that $P(X_i > 0) > 0$. We will interpret $Y_i$ to be the amount of time the $i^{\text{th}}$ bulb remains out after burning out before it is replaced by bulb number $i+1$. Let $R_t$ be the amount of time that we have a working bulb in the time interval $[0, t]$. We are now going to show
\[
\lim_{t\to\infty} \frac{1}{t}R_t = \frac{\mathbb{E}X_1}{\mathbb{E}X_1 + \mathbb{E}Y_1} = \frac{\mu}{\mu + \nu} \text{ a.s.}
\]
To prove this, let $T_n := \sum_{i=1}^n (X_i + Y_i)$ be the time that the $n^{\text{th}}$ bulb is replaced and
\[
N_t := \sup\{n \ge 0 : T_n \le t\}
\]
denote the number of bulbs which have burned out up to time $t$. By Example 20.32 we know that
\[
\lim_{t\to\infty} \frac{1}{t}N_t = \frac{1}{\mu + \nu} \text{ a.s., i.e. } N_t = \frac{1}{\mu+\nu}\,t + o(t) \text{ a.s.}
\]
Let us now set $\tilde{R}_t := \sum_{i=1}^{N_t} X_i$ and observe that
\[
\tilde{R}_t \le R_t \le \tilde{R}_t + X_{N_t+1}.
\]
By Proposition 20.26 we know that $X_n/n \to 0$ a.s. and therefore,
\[
\lim_{t\to\infty} \frac{X_{N_t+1}}{t}
= \lim_{t\to\infty} \left( \frac{X_{N_t+1}}{N_t+1}\cdot\frac{N_t+1}{t} \right)
= 0\cdot\frac{1}{\mu+\nu} = 0 \text{ a.s.}
\]
Thus it follows that $\lim_{t\to\infty} \frac{1}{t}R_t = \lim_{t\to\infty} \frac{1}{t}\tilde{R}_t$ a.s. and the latter limit may be computed using the strong law of large numbers;
\[
\frac{1}{t}\tilde{R}_t = \frac{1}{t}\sum_{i=1}^{N_t} X_i
= \frac{N_t}{t}\cdot\frac{1}{N_t}\sum_{i=1}^{N_t} X_i
\to \frac{1}{\mu+\nu}\cdot\mu = \frac{\mu}{\mu+\nu} \text{ a.s.}
\]
Theorem 20.34 (Glivenko–Cantelli Theorem). Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables and $F(x) := P(X_i \le x)$. Further let $\mu_n := \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$ be the empirical distribution with empirical distribution function,
\[
F_n(x) := \mu_n\left( (-\infty, x] \right) = \frac{1}{n}\sum_{i=1}^n 1_{X_i \le x}.
\]
Then
\[
\lim_{n\to\infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0 \text{ a.s.}
\]

Proof. Since $\left\{ 1_{X_i \le x} \right\}_{i=1}^\infty$ are i.i.d. random variables with $\mathbb{E}1_{X_i \le x} = P(X_i \le x) = F(x)$, it follows by the strong law of large numbers that
\[
\lim_{n\to\infty} F_n(x) = F(x) \text{ a.s. for all } x \in \mathbb{R}. \tag{20.13}
\]
Our goal is to now show that this convergence is uniform.¹ To do this we will use another application of the strong law of large numbers applied to $\left\{ 1_{X_i < x} \right\}$ in order to conclude that, for all $x \in \mathbb{R}$,
\[
\lim_{n\to\infty} F_n(x-) = F(x-) \text{ a.s. for all } x \in \mathbb{R}. \tag{20.14}
\]
Keep in mind that the exceptional set of probability zero depends on $x$.

Given $k \in \mathbb{N}$, let $\Lambda_k := \left\{ \frac{i}{k} : i = 1, 2, \ldots, k-1 \right\}$ and let $x_i := \inf\{x : F(x) \ge i/k\}$ for $i = 1, 2, \ldots, k-1$, see Figure 20.2.

Fig. 20.2. Constructing the sequence of points $\{x_i\}_{i=0}^k$.

Let us further set $x_k = \infty$ and $x_0 = -\infty$ and let $\Omega_k$ denote the subset of $\Omega$ of full measure where Eqs. (20.13) and (20.14) hold for $x \in \{x_i : 1 \le i \le k-1\}$. For $\omega \in \Omega_k$ we may find $N(\omega) \in \mathbb{N}$ ($N$ is random) so that
\[
|F_n(x_i) - F(x_i)| < 1/k \text{ and } |F_n(x_i-) - F(x_i-)| < 1/k
\]
for $n \ge N(\omega)$, $1 \le i \le k-1$, and $\omega \in \Omega_k$ with $P(\Omega_k) = 1$.

Observe that it is possible that $x_i = x_{i+1}$ for some of the $i$. This can occur when $F$ has jumps of size greater than $1/k$,² see Figure 20.2. Now suppose $i$ has been chosen so that $x_{i-1} < x_i$ and let $x \in (x_{i-1}, x_i)$. We then have for $\omega \in \Omega_k$ and $n \ge N(\omega)$ that
\[
F_n(x) \le F_n(x_i-) \le F(x_i-) + 1/k \le F(x) + 2/k
\]
and
\[
F_n(x) \ge F_n(x_{i-1}) \ge F(x_{i-1}) - 1/k \ge F(x_i-) - 2/k \ge F(x) - 2/k.
\]
From this it follows on $\Omega_k$ that $|F(x) - F_n(x)| \le 2/k$ for $n \ge N$ and therefore,
\[
\sup_{x \in \mathbb{R}} |F(x) - F_n(x)| \le 2/k.
\]
Hence it follows on $\Omega_0 := \bigcap_{k=1}^\infty \Omega_k$ (a set with $P(\Omega_0) = 1$) that
\[
\lim_{n\to\infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0.
\]

¹ Observation. If $F$ is continuous then, by what we have just shown, there is a set $\Omega_0 \subset \Omega$ such that $P(\Omega_0) = 1$ and on $\Omega_0$, $F_n(r) \to F(r)$ for all $r \in \mathbb{Q}$. Moreover on $\Omega_0$, if $x \in \mathbb{R}$ and $r \le x \le s$ with $r, s \in \mathbb{Q}$, we have
\[
F(r) = \lim_{n\to\infty} F_n(r) \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le \lim_{n\to\infty} F_n(s) = F(s).
\]
We may now let $s \downarrow x$ and $r \uparrow x$ to conclude, on $\Omega_0$, that
\[
F(x) \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le F(x) \text{ for all } x \in \mathbb{R},
\]
i.e. on $\Omega_0$, $\lim_{n\to\infty} F_n(x) = F(x)$. Thus, in this special case we have shown that off a fixed null set independent of $x$, $\lim_{n\to\infty} F_n(x) = F(x)$ for all $x \in \mathbb{R}$.

² In fact if $F(x) = \delta_0\left( (-\infty, x] \right) = 1_{x \ge 0}$, then $x_1 = \cdots = x_{k-1} = 0$ for all $k$.
20.4 Kolmogorov's Three Series Theorem

The next theorem generalizes Theorem 20.11 by giving necessary and sufficient conditions for a random series of independent random variables to converge.

Theorem 20.35 (Kolmogorov's Three Series Theorem). Suppose that $\{X_n\}_{n=1}^\infty$ are independent random variables. Then the random series, $\sum_{n=1}^\infty X_n$, is almost surely convergent in $\mathbb{R}$ iff there exists $c > 0$ such that

1. $\sum_{n=1}^\infty P(|X_n| > c) < \infty$,
2. $\sum_{n=1}^\infty \operatorname{Var}\left(X_n 1_{|X_n|\le c}\right) < \infty$, and
3. $\sum_{n=1}^\infty \mathbb{E}\left(X_n 1_{|X_n|\le c}\right)$ converges.

Moreover, if the three series above converge for some $c > 0$ then they converge for all values of $c > 0$.

Remark 20.36. We have seen another necessary and sufficient condition in Exercise 18.20, namely $\sum_{n=1}^\infty X_n$ is almost surely convergent in $\mathbb{R}$ iff $\sum_{n=1}^\infty X_n$ is convergent in distribution. We will also see below that $\sum_{n=1}^\infty X_n$ is almost surely convergent in $\mathbb{R}$ iff $\sum_{n=1}^\infty X_n$ is convergent in probability, see Lévy's Theorem 20.46 below.
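As a concrete sanity check on the three-series conditions (the example series and truncation level here are my own, not part of the notes), consider the random-sign harmonic series $\sum_n \varepsilon_n/n$ with $P(\varepsilon_n = \pm1) = 1/2$. Taking $c = 1$, the first and third series vanish term by term while the second reduces to $\sum 1/n^2 < \infty$, so Theorem 20.35 predicts almost sure convergence. A minimal numeric sketch:

```python
import math

def three_series(p_tail, var_trunc, mean_trunc, n_terms=10_000):
    """Partial sums of the three series in Theorem 20.35 for a fixed c;
    each argument gives the n-th term of the corresponding series."""
    s1 = sum(p_tail(n) for n in range(1, n_terms + 1))
    s2 = sum(var_trunc(n) for n in range(1, n_terms + 1))
    s3 = sum(mean_trunc(n) for n in range(1, n_terms + 1))
    return s1, s2, s3

# X_n = eps_n / n with fair random signs, truncated at c = 1:
# P(|X_n| > 1) = 0, Var(X_n 1_{|X_n|<=1}) = 1/n^2, E[X_n 1_{|X_n|<=1}] = 0.
s1, s2, s3 = three_series(
    p_tail=lambda n: 0.0,
    var_trunc=lambda n: 1.0 / n**2,
    mean_trunc=lambda n: 0.0,
)
```

All three partial sums stay bounded ($s_2 \to \pi^2/6$), so the theorem yields almost sure convergence of $\sum \varepsilon_n/n$.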
Proof. Proof of sufficiency. Suppose the three series converge for some $c > 0$. If we let $X_n' := X_n 1_{|X_n|\le c}$, then
$$\sum_{n=1}^\infty P(X_n' \ne X_n) = \sum_{n=1}^\infty P(|X_n| > c) < \infty.$$
Hence $\{X_n\}$ and $\{X_n'\}$ are tail equivalent and so it suffices to show $\sum_{n=1}^\infty X_n'$ is almost surely convergent. However, by the convergence of the second series we learn
Page: 317 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
$$\sum_{n=1}^\infty \operatorname{Var}(X_n') = \sum_{n=1}^\infty \operatorname{Var}\left(X_n 1_{|X_n|\le c}\right) < \infty$$
and so by Kolmogorov's convergence criteria, Theorem 20.11, $\sum_{n=1}^\infty (X_n' - \mathbb{E}X_n')$ is almost surely convergent. Finally, the third series guarantees that $\sum_{n=1}^\infty \mathbb{E}X_n' = \sum_{n=1}^\infty \mathbb{E}\left(X_n 1_{|X_n|\le c}\right)$ is convergent, therefore we may conclude $\sum_{n=1}^\infty X_n'$ is convergent.

The necessity proof will be completed after the next two lemmas. Another proof of necessity may be found in Chapter 23, see Theorem 23.10.
Lemma 20.37. Suppose that $\{Y_n\}_{n=1}^\infty$ are independent random variables such that there exists $c < \infty$ with $|Y_n| \le c < \infty$ a.s. and further assume $\mathbb{E}Y_n = 0$. If $\sum_{n=1}^\infty Y_n$ is almost surely convergent then $\sum_{n=1}^\infty \mathbb{E}Y_n^2 < \infty$. More precisely the following estimate holds,
$$\sum_{j=1}^\infty \mathbb{E}Y_j^2 \le \frac{(\lambda + c)^2}{P(\sup_n |S_n| \le \lambda)} \quad \text{for all } \lambda > 0, \tag{20.15}$$
where as usual, $S_n := \sum_{j=1}^n Y_j$.

Remark 20.38. It follows from Eq. (20.15) that if $P(\sup_n |S_n| < \infty) > 0$, then $\sum_{j=1}^\infty \mathbb{E}Y_j^2 < \infty$ and hence by Kolmogorov's convergence criteria (Theorem 20.11), $\sum_{j=1}^\infty Y_j = \lim_{n\to\infty} S_n$ exists a.s. and in particular, $P(\sup_n |S_n| < \infty) = 1$. This also follows from the fact that $\{\sup_n |S_n| < \infty\}$ is a tail event and hence $P(\sup_n |S_n| < \infty)$ is either 0 or 1; as $P(\sup_n |S_n| < \infty) > 0$ we must have $P(\sup_n |S_n| < \infty) = 1$.
Proof. We will begin by proving that for every $N \in \mathbb{N}$ and $\lambda > 0$ we have
$$\mathbb{E}\left[S_N^2\right] \le \frac{(\lambda+c)^2}{P\left(\sup_{n\le N}|S_n| \le \lambda\right)} \le \frac{(\lambda+c)^2}{P(\sup_n |S_n| \le \lambda)}. \tag{20.16}$$
To prove Eq. (20.16), let $\tau$ be the stopping time,
$$\tau = \tau_\lambda := \inf\{n \ge 1 : |S_n| > \lambda\}$$
where $\inf\emptyset = \infty$. Then
$$\mathbb{E}\left[S_N^2\right] = \mathbb{E}\left[S_N^2 : \tau \le N\right] + \mathbb{E}\left[S_N^2 : \tau > N\right] \le \mathbb{E}\left[S_N^2 : \tau \le N\right] + \lambda^2 P[\tau > N]$$
and
$$\begin{aligned}
\mathbb{E}\left[S_N^2 : \tau \le N\right] &= \sum_{j=1}^N \mathbb{E}\left[S_N^2 : \tau = j\right] = \sum_{j=1}^N \mathbb{E}\left[|S_j + S_N - S_j|^2 : \tau = j\right] \\
&= \sum_{j=1}^N \mathbb{E}\left[S_j^2 + 2S_j(S_N - S_j) + (S_N - S_j)^2 : \tau = j\right] \\
&= \sum_{j=1}^N \mathbb{E}\left[S_j^2 : \tau = j\right] + \sum_{j=1}^N \mathbb{E}\left[(S_N - S_j)^2\right] P[\tau = j] \\
&\le \sum_{j=1}^N \mathbb{E}\left[(S_{j-1} + Y_j)^2 : \tau = j\right] + \mathbb{E}\left[S_N^2\right]\sum_{j=1}^N P[\tau = j] \\
&\le \sum_{j=1}^N \mathbb{E}\left[(\lambda+c)^2 : \tau = j\right] + \mathbb{E}\left[S_N^2\right] P[\tau \le N] \\
&= \left((\lambda+c)^2 + \mathbb{E}\left[S_N^2\right]\right) P[\tau \le N].
\end{aligned}$$
Combining the previous two estimates gives;
$$\begin{aligned}
\mathbb{E}\left[S_N^2\right] &\le \left((\lambda+c)^2 + \mathbb{E}\left[S_N^2\right]\right) P[\tau \le N] + \lambda^2 P[\tau > N] \\
&\le \left((\lambda+c)^2 + \mathbb{E}\left[S_N^2\right]\right) P[\tau \le N] + (\lambda+c)^2 P[\tau > N] \\
&= (\lambda+c)^2 + P[\tau \le N]\,\mathbb{E}\left[S_N^2\right],
\end{aligned}$$
from which Eq. (20.16) follows upon noting that
$$\{\tau \le N\} = \left\{\sup_{n\le N}|S_n| > \lambda\right\} = \left\{\sup_{n\le N}|S_n| \le \lambda\right\}^c.$$
Since $S_n$ is convergent a.s., it follows that $P(\sup_n |S_n| < \infty) = 1$ and therefore,
$$\lim_{\lambda\to\infty} P\left(\sup_n |S_n| \le \lambda\right) = 1.$$
Hence for $\lambda$ sufficiently large, $P(\sup_n |S_n| \le \lambda) > 0$ and we learn from Eq. (20.16) that
$$\sum_{j=1}^\infty \mathbb{E}Y_j^2 = \lim_{N\to\infty}\mathbb{E}\left[S_N^2\right] \le \frac{(\lambda+c)^2}{P(\sup_n |S_n| \le \lambda)} < \infty.$$
Lemma 20.39. Suppose that $\{Y_n\}_{n=1}^\infty$ are independent random variables such that there exists $c < \infty$ with $|Y_n| \le c$ a.s. for all $n$. If $\sum_{n=1}^\infty Y_n$ converges in $\mathbb{R}$ a.s. then $\sum_{n=1}^\infty \mathbb{E}Y_n$ converges as well.

Proof. Let $(\Omega_0, \mathcal{B}_0, P_0)$ be the probability space that $\{Y_n\}_{n=1}^\infty$ is defined on and let
$$\Omega := \Omega_0 \times \Omega_0, \quad \mathcal{B} := \mathcal{B}_0 \otimes \mathcal{B}_0, \quad \text{and} \quad P := P_0 \otimes P_0.$$
Further let $Y_n'(\omega_1, \omega_2) := Y_n(\omega_1)$ and $Y_n''(\omega_1, \omega_2) := Y_n(\omega_2)$ and
$$Z_n(\omega_1, \omega_2) := Y_n'(\omega_1, \omega_2) - Y_n''(\omega_1, \omega_2) = Y_n(\omega_1) - Y_n(\omega_2).$$
Then $|Z_n| \le 2c$ a.s., $\mathbb{E}Z_n = 0$, and
$$\sum_{n=1}^\infty Z_n(\omega_1, \omega_2) = \sum_{n=1}^\infty Y_n(\omega_1) - \sum_{n=1}^\infty Y_n(\omega_2) \ \text{ exists}$$
for $P$-a.e. $(\omega_1, \omega_2)$. Hence it follows from Lemma 20.37 that
$$\infty > \sum_{n=1}^\infty \mathbb{E}Z_n^2 = \sum_{n=1}^\infty \operatorname{Var}(Z_n) = \sum_{n=1}^\infty \operatorname{Var}(Y_n' - Y_n'') = \sum_{n=1}^\infty [\operatorname{Var}(Y_n') + \operatorname{Var}(Y_n'')] = 2\sum_{n=1}^\infty \operatorname{Var}(Y_n).$$
Thus by Kolmogorov's convergence theorem, it follows that $\sum_{n=1}^\infty (Y_n - \mathbb{E}Y_n)$ is convergent. Since $\sum_{n=1}^\infty Y_n$ is a.s. convergent, we may conclude that $\sum_{n=1}^\infty \mathbb{E}Y_n$ is also convergent.
We are now ready to complete the proof of Theorem 20.35.

Proof of Theorem 20.35. Our goal is to show if $\{X_n\}_{n=1}^\infty$ are independent random variables such that $\sum_{n=1}^\infty X_n$ is almost surely convergent, then for all $c > 0$ the following three series converge;

1. $\sum_{n=1}^\infty P(|X_n| > c) < \infty$,
2. $\sum_{n=1}^\infty \operatorname{Var}\left(X_n 1_{|X_n|\le c}\right) < \infty$, and
3. $\sum_{n=1}^\infty \mathbb{E}\left(X_n 1_{|X_n|\le c}\right)$ converges.

Since $\sum_{n=1}^\infty X_n$ is almost surely convergent, it follows that $\lim_{n\to\infty} X_n = 0$ a.s. and hence for every $c > 0$, $P(|X_n| \ge c \text{ i.o.}) = 0$. According to the Borel zero one law (Lemma 10.41) this implies for every $c > 0$ that $\sum_{n=1}^\infty P(|X_n| > c) < \infty$. Given this, we now know that $\{X_n\}$ and $\left\{X_n^c := X_n 1_{|X_n|\le c}\right\}$ are tail equivalent for all $c > 0$ and in particular $\sum_{n=1}^\infty X_n^c$ is almost surely convergent for all $c > 0$. So according to Lemma 20.39 (with $Y_n = X_n^c$),
$$\sum_{n=1}^\infty \mathbb{E}X_n^c = \sum_{n=1}^\infty \mathbb{E}\left(X_n 1_{|X_n|\le c}\right) \ \text{ converges.}$$
Letting $Y_n := X_n^c - \mathbb{E}X_n^c$, we may now conclude that $\sum_{n=1}^\infty Y_n$ is almost surely convergent. Since $\{Y_n\}$ is uniformly bounded and $\mathbb{E}Y_n = 0$ for all $n$, an application of Lemma 20.37 allows us to conclude
$$\sum_{n=1}^\infty \operatorname{Var}\left(X_n 1_{|X_n|\le c}\right) = \sum_{n=1}^\infty \mathbb{E}Y_n^2 < \infty.$$
Exercise 20.4 (Two Series Theorem, Resnick 7.15). Prove that the three series theorem reduces to a two series theorem when the random variables are positive. That is, if $X_n \ge 0$ are independent, then $\sum_n X_n < \infty$ a.s. iff for any $c > 0$ we have
$$\sum_n P(X_n > c) < \infty \ \text{ and} \tag{20.17}$$
$$\sum_n \mathbb{E}\left[X_n 1_{X_n\le c}\right] < \infty, \tag{20.18}$$
that is, it is unnecessary to verify the convergence of the second series in Theorem 20.35 involving the variances.
20.4.1 Examples

Lemma 20.40. Suppose that $\{Y_n\}_{n=1}^\infty$ are independent square integrable random variables such that $Y_n \stackrel{d}{=} N(\mu_n, \sigma_n^2)$. Then $\sum_{j=1}^\infty Y_j$ converges a.s. iff $\sum_{j=1}^\infty \sigma_j^2 < \infty$ and $\sum_{j=1}^\infty \mu_j$ converges.

Proof. The implication "$\Leftarrow$" is true without the assumption that the $Y_n$ are normal random variables as pointed out in Theorem 20.11. To prove the converse direction we will make use of Kolmogorov's three series Theorem 20.35. Namely, if $\sum_{j=1}^\infty Y_j$ converges a.s. then the three series in Theorem 20.35 converge for all $c > 0$.

1. Since $Y_n \stackrel{d}{=} \sigma_n N + \mu_n$ (where $N$ denotes a standard normal random variable), we have for any $c > 0$ that
$$\infty > \sum_{n=1}^\infty P(|\sigma_n N + \mu_n| > c). \tag{20.19}$$
If $\lim_{n\to\infty}\mu_n \ne 0$ then there is a $c > 0$ such that either $\mu_n \ge 2c$ for infinitely many $n$ or $\mu_n \le -2c$ for infinitely many $n$. It then follows that either $\{N > 0\} \subset \{|\sigma_n N + \mu_n| > c\}$ for infinitely many $n$ or $\{N < 0\} \subset \{|\sigma_n N + \mu_n| > c\}$ for infinitely many $n$. In either case we would have $P(|\sigma_n N + \mu_n| > c) \ge 1/2$ i.o. which would violate Eq. (20.19) and so we may conclude that $\lim_{n\to\infty}\mu_n = 0$. Similarly if $\lim_{n\to\infty}\sigma_n \ne 0$, then there exists $\alpha < \infty$ such that
$$\{N \ge \alpha\} \subset \{|\sigma_n N + \mu_n| > 1\} \ \text{ for infinitely many } n,$$
which would imply $P(|\sigma_n N + \mu_n| > 1) \ge P(N \ge \alpha) > 0$ for infinitely many $n$. This again violates Eq. (20.19) and thus we may conclude that $\lim_{n\to\infty}\mu_n = \lim_{n\to\infty}\sigma_n = 0$.
2. Let $\varepsilon_n := 1_{|\sigma_n N + \mu_n|\le c} \in \{0, 1\}$. The convergence of the second series for all $c > 0$ implies
$$\infty > \sum_{n=1}^\infty \operatorname{Var}\left(Y_n 1_{|Y_n|\le c}\right) = \sum_{n=1}^\infty \operatorname{Var}\left([\sigma_n N + \mu_n]\,\varepsilon_n\right). \tag{20.20}$$
If we can show
$$\operatorname{Var}\left([\sigma_n N + \mu_n]\,\varepsilon_n\right) \ge \frac{1}{2}\sigma_n^2 \ \text{ for large } n, \tag{20.21}$$
it would then follow from Eq. (20.20) that $\sum_{n=1}^\infty \sigma_n^2 < \infty$. We may now use Kolmogorov's convergence criteria (Theorem 20.11) to infer that $\sum_{n=1}^\infty (Y_n - \mu_n)$ is almost surely convergent which then implies that $\sum_{n=1}^\infty \mu_n$ is convergent as $\mu_n = Y_n - (Y_n - \mu_n)$ and $\sum_{n=1}^\infty Y_n$ and $\sum_{n=1}^\infty (Y_n - \mu_n)$ are both convergent a.s. So to finish the proof we need to prove the estimate in Eq. (20.21).

Let $\beta_n := \operatorname{Var}(N\varepsilon_n)$ and $\alpha_n := P(\varepsilon_n = 1)$ so that $\operatorname{Var}(\varepsilon_n) = \alpha_n(1-\alpha_n)$ and
$$\gamma_n := \operatorname{Cov}(N\varepsilon_n, \varepsilon_n) = \mathbb{E}[N\varepsilon_n\cdot\varepsilon_n] - \mathbb{E}[N\varepsilon_n]\,\mathbb{E}[\varepsilon_n] = \mathbb{E}[N\varepsilon_n]\,(1-\alpha_n).$$
Therefore, using $\operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\operatorname{Cov}(X, Y)$, we find
$$\operatorname{Var}\left([\sigma_n N + \mu_n]\,\varepsilon_n\right) = \operatorname{Var}(\sigma_n N\varepsilon_n + \mu_n\varepsilon_n) = \sigma_n^2\beta_n + \mu_n^2\alpha_n(1-\alpha_n) + 2\sigma_n\mu_n\gamma_n.$$
Making use of the estimate, $2ab \le a^2 + b^2$ valid for all $a, b \ge 0$, it follows that
$$\begin{aligned}
\operatorname{Var}\left([\sigma_n N + \mu_n]\,\varepsilon_n\right) &\ge \sigma_n^2\beta_n + \mu_n^2\alpha_n(1-\alpha_n) - 2|\sigma_n|\,|\mu_n|\,|\gamma_n| \\
&\ge \sigma_n^2(\beta_n - |\gamma_n|) + \mu_n^2\left(\alpha_n(1-\alpha_n) - |\gamma_n|\right) \\
&= \sigma_n^2(\beta_n - |\gamma_n|) + (1-\alpha_n)\left(\alpha_n - |\mathbb{E}[N\varepsilon_n]|\right)\mu_n^2.
\end{aligned}$$
This estimate along with the observations that $1-\alpha_n \ge 0$, $\lim_{n\to\infty}\beta_n = \lim_{n\to\infty}\alpha_n = 1$, $\lim_{n\to\infty}\mathbb{E}[N\varepsilon_n] = 0$ (use DCT) and $\lim_{n\to\infty}\gamma_n = 0$ easily implies Eq. (20.21).
An alternative proof that $\sum_{n=1}^\infty \mu_n$ is convergent using the third series in Theorem 20.35. For all $c > 0$ the third series implies
$$\sum_{n=1}^\infty \mathbb{E}\left[[\sigma_n N + \mu_n]\,1_{|\sigma_n N + \mu_n|\le c}\right] \ \text{ is convergent, i.e. } \ \sum_{n=1}^\infty [\sigma_n\delta_n + \mu_n\alpha_n] \ \text{ is convergent,}$$
where $\delta_n := \mathbb{E}\left[N\,1_{|\sigma_n N + \mu_n|\le c}\right]$ and $\alpha_n := \mathbb{E}\left[1_{|\sigma_n N + \mu_n|\le c}\right]$. With a little effort one can show,
$$|\delta_n| \le e^{-k/\sigma_n^2} \ \text{ and } \ 1-\alpha_n \le e^{-k/\sigma_n^2} \ \text{ for large } n.$$
Since $e^{-k/\sigma_n^2} \le C\sigma_n^2$ for large $n$, it follows that
$$\sum_{n=1}^\infty |\sigma_n\delta_n| \le C\sum_{n=1}^\infty \sigma_n^3 < \infty$$
so that $\sum_{n=1}^\infty \sigma_n\delta_n$ is convergent. Moreover,
$$\sum_{n=1}^\infty |\mu_n(\alpha_n - 1)| \le C\sum_{n=1}^\infty |\mu_n|\,\sigma_n^2 < \infty$$
and hence
$$\sum_{n=1}^\infty \mu_n = \sum_{n=1}^\infty [\sigma_n\delta_n + \mu_n\alpha_n] - \sum_{n=1}^\infty \sigma_n\delta_n - \sum_{n=1}^\infty \mu_n(\alpha_n - 1)$$
must also be convergent.
Example 20.41. As another simple application of Theorem 20.35, let us use it to give a proof of Theorem 20.11. We will apply Theorem 20.35 with $X_n := Y_n - \mathbb{E}Y_n$. We need to then check the three series in the statement of Theorem 20.35 converge. For the first series we have by the Markov inequality,
$$\sum_{n=1}^\infty P(|X_n| > c) \le \sum_{n=1}^\infty \frac{1}{c^2}\mathbb{E}|X_n|^2 = \frac{1}{c^2}\sum_{n=1}^\infty \operatorname{Var}(Y_n) < \infty.$$
For the second series, observe that
$$\sum_{n=1}^\infty \operatorname{Var}\left(X_n 1_{|X_n|\le c}\right) \le \sum_{n=1}^\infty \mathbb{E}\left[\left(X_n 1_{|X_n|\le c}\right)^2\right] \le \sum_{n=1}^\infty \mathbb{E}\left[X_n^2\right] = \sum_{n=1}^\infty \operatorname{Var}(Y_n) < \infty,$$
and, since $\mathbb{E}X_n = 0$, we estimate the third series as;
$$\sum_{n=1}^\infty \left|\mathbb{E}\left[X_n 1_{|X_n|\le c}\right]\right| = \sum_{n=1}^\infty \left|\mathbb{E}\left[X_n 1_{|X_n|> c}\right]\right| \le \sum_{n=1}^\infty \mathbb{E}\left[\frac{1}{c}|X_n|^2\,1_{|X_n|> c}\right] \le \frac{1}{c}\sum_{n=1}^\infty \operatorname{Var}(Y_n) < \infty.$$
20.5 Maximal Inequalities

Theorem 20.42 (Kolmogorov's Inequality). Let $\{X_n\}$ be a sequence of independent random variables with mean zero, $S_n := X_1 + \cdots + X_n$, and $S_n^* = \max_{j\le n}|S_j|$. Then for any $\alpha > 0$ we have
$$P(S_N^* \ge \alpha) \le \frac{1}{\alpha^2}\mathbb{E}\left[S_N^2 : S_N^* \ge \alpha\right]. \tag{20.22}$$
Proof. First proof. As $\{S_n\}_{n=1}^\infty$ is a martingale relative to the filtration, $\mathcal{B}_n = \sigma(S_1, \ldots, S_n)$, the inequality in Eq. (20.22) is a special case of Proposition 18.42 with $X_n = S_n^2$, also see Example 18.48.

*Second direct proof. Let $\tau = \inf\{j : |S_j| \ge \alpha\}$ with the infimum of the empty set being taken to be equal to $\infty$. Observe that
$$\{\tau = j\} = \{|S_1| < \alpha, \ldots, |S_{j-1}| < \alpha, |S_j| \ge \alpha\} \in \sigma(X_1, \ldots, X_j).$$
Now
$$\begin{aligned}
\mathbb{E}\left[S_N^2 : S_N^* \ge \alpha\right] &= \mathbb{E}\left[S_N^2 : \tau \le N\right] = \sum_{j=1}^N \mathbb{E}\left[S_N^2 : \tau = j\right] \\
&= \sum_{j=1}^N \mathbb{E}\left[(S_j + S_N - S_j)^2 : \tau = j\right] \\
&= \sum_{j=1}^N \mathbb{E}\left[S_j^2 + (S_N - S_j)^2 + 2S_j(S_N - S_j) : \tau = j\right] \\
&\overset{(*)}{=} \sum_{j=1}^N \mathbb{E}\left[S_j^2 + (S_N - S_j)^2 : \tau = j\right] \\
&\ge \sum_{j=1}^N \mathbb{E}\left[S_j^2 : \tau = j\right] \ge \alpha^2 \sum_{j=1}^N P[\tau = j] = \alpha^2\,P(S_N^* \ge \alpha).
\end{aligned}$$
The equality, $(*)$, is a consequence of the observations: 1) $1_{\tau=j}S_j$ is $\sigma(X_1, \ldots, X_j)$ measurable, 2) $(S_N - S_j)$ is $\sigma(X_{j+1}, \ldots, X_N)$ measurable and hence $1_{\tau=j}S_j$ and $(S_N - S_j)$ are independent, and so 3)
$$\mathbb{E}[S_j(S_N - S_j) : \tau = j] = \mathbb{E}[S_j 1_{\tau=j}(S_N - S_j)] = \mathbb{E}[S_j 1_{\tau=j}]\,\mathbb{E}[S_N - S_j] = \mathbb{E}[S_j 1_{\tau=j}]\cdot 0 = 0.$$
Remark 20.43 (Another proof of Theorem 20.11). Suppose that $\{Y_j\}_{j=1}^\infty$ are independent random variables such that $\sum_{j=1}^\infty \operatorname{Var}(Y_j) < \infty$ and let $S_n := \sum_{j=1}^n X_j$ where $X_j := Y_j - \mathbb{E}Y_j$. According to Kolmogorov's inequality, Theorem 20.42, for all $M < N$,
$$P\left(\max_{M\le j\le N}|S_j - S_M| \ge \alpha\right) \le \frac{1}{\alpha^2}\mathbb{E}\left[(S_N - S_M)^2\right] = \frac{1}{\alpha^2}\sum_{j=M+1}^N \mathbb{E}\left[X_j^2\right] = \frac{1}{\alpha^2}\sum_{j=M+1}^N \operatorname{Var}(X_j).$$
Letting $N\to\infty$ in this inequality shows, with $Q_M := \sup_{j\ge M}|S_j - S_M|$,
$$P(Q_M \ge \alpha) \le \frac{1}{\alpha^2}\sum_{j=M+1}^\infty \operatorname{Var}(X_j).$$
Since
$$\delta_M := \sup_{j,k\ge M}|S_j - S_k| \le \sup_{j,k\ge M}\left[|S_j - S_M| + |S_M - S_k|\right] \le 2Q_M$$
we may further conclude,
$$P(\delta_M \ge 2\alpha) \le \frac{1}{\alpha^2}\sum_{j=M+1}^\infty \operatorname{Var}(X_j) \to 0 \ \text{ as } M\to\infty,$$
i.e. $\delta_M \xrightarrow{P} 0$ as $M\to\infty$. Since $\delta_M$ is decreasing in $M$, it follows that $\lim_{M\to\infty}\delta_M =: \delta$ exists and because $\delta_M \xrightarrow{P} 0$ we may conclude that $\delta = 0$ a.s. Thus we have shown
$$\lim_{m,n\to\infty}|S_n - S_m| = 0 \ \text{ a.s.}$$
and therefore $\{S_n\}_{n=1}^\infty$ is almost surely Cauchy and hence almost surely convergent. This gives a second proof of Kolmogorov's convergence criteria in Theorem 20.11.
Corollary 20.44 ($L^2$ - SSLN). Let $\{X_n\}$ be a sequence of independent random variables with mean zero and $\sigma^2 = \sup_n \mathbb{E}X_n^2 < \infty$. Letting $S_n = \sum_{k=1}^n X_k$ and $p > 1/2$, we have
$$\frac{1}{n^p}S_n \to 0 \ \text{ a.s.}$$
If $\{Y_n\}$ is a sequence of independent random variables with $\mathbb{E}Y_n = \mu$ and $\sigma^2 = \sup_n \operatorname{Var}(Y_n) < \infty$, then for any $\beta \in (0, 1/2)$,
$$\frac{1}{n}\sum_{k=1}^n Y_k - \mu = O\left(\frac{1}{n^\beta}\right) \ \text{ a.s.}$$
Proof. (The proof of this Corollary may be skipped as it has already been proved, see Corollary 20.15.) From Theorem 20.42, we have for every $\varepsilon > 0$ that
$$P\left(\frac{S_N^*}{N^p} \ge \varepsilon\right) = P(S_N^* \ge N^p\varepsilon) \le \frac{1}{\varepsilon^2 N^{2p}}\mathbb{E}\left[S_N^2\right] \le \frac{1}{\varepsilon^2 N^{2p}}\,CN = \frac{C}{\varepsilon^2}N^{-(2p-1)},$$
where $C := \sigma^2$. Hence if we suppose that $N_n = n^\alpha$ with $\alpha(2p-1) > 1$, then we have
$$\sum_{n=1}^\infty P\left(\frac{S_{N_n}^*}{N_n^p} \ge \varepsilon\right) \le \sum_{n=1}^\infty \frac{C}{\varepsilon^2}n^{-\alpha(2p-1)} < \infty$$
and so by the first Borel Cantelli lemma we have
$$P\left(\frac{S_{N_n}^*}{N_n^p} \ge \varepsilon \ \text{ for } n \text{ i.o.}\right) = 0.$$
From this it follows that $\lim_{n\to\infty} S_{N_n}^*/N_n^p = 0$ a.s.

To finish the proof, for $m \in \mathbb{N}$, we may choose $n = n(m)$ such that
$$n^\alpha = N_n \le m < N_{n+1} = (n+1)^\alpha.$$
Since
$$\frac{S_{N_{n(m)}}^*}{N_{n(m)+1}^p} \le \frac{S_m^*}{m^p} \le \frac{S_{N_{n(m)+1}}^*}{N_{n(m)}^p}$$
and $N_{n+1}/N_n \to 1$ as $n\to\infty$, it follows that
$$0 = \lim_{m\to\infty}\frac{S_{N_{n(m)}}^*}{N_{n(m)}^p} = \lim_{m\to\infty}\frac{S_{N_{n(m)}}^*}{N_{n(m)+1}^p} \le \lim_{m\to\infty}\frac{S_m^*}{m^p} \le \lim_{m\to\infty}\frac{S_{N_{n(m)+1}}^*}{N_{n(m)}^p} = \lim_{m\to\infty}\frac{S_{N_{n(m)+1}}^*}{N_{n(m)+1}^p} = 0 \ \text{ a.s.}$$
That is $\lim_{m\to\infty} S_m^*/m^p = 0$ a.s.
We are going to give three more maximal inequalities before ending this section. In all cases we will start with $\{X_n\}_{n=1}^\infty$ a sequence of (possibly Banach valued, with values in a separable Banach space $Y$) random variables and we will let $S_n := \sum_{k\le n} X_k$ and $S_n^* := \max_{k\le n}\|S_k\|$. If $\tau$ is any $\{\mathcal{B}_n^X\}$ stopping time and $f \ge 0$, then
$$\mathbb{E}[f(S_n - S_\tau) : \tau \le n] = \sum_{k=1}^n \mathbb{E}[f(S_n - S_k) : \tau = k] = \sum_{k=1}^n \mathbb{E}[f(S_n - S_k)]\,P(\tau = k). \tag{20.23}$$
Theorem 20.45 (Skorohod's Inequality). Suppose that $\{X_n\}_{n=1}^\infty$ are independent real or Banach valued random variables. Then for all $\alpha > 0$ we have
$$P(\|S_N\| \ge \alpha) \ge (1 - c_N(\alpha))\,P(S_N^* \ge 2\alpha) \ \text{ and} \tag{20.24}$$
$$P(\|S_N\| > \alpha) \ge (1 - c_N(\alpha))\,P(S_N^* > 2\alpha) \tag{20.25}$$
where
$$c_N(\alpha) := \max_{1\le k\le N} P(\|S_N - S_k\| > \alpha).$$
Proof. We only prove Eq. (20.24) since the proof of Eq. (20.25) is similar and in fact can be deduced from Eq. (20.24) by a simple limiting argument. If $\tau = \inf\{n : \|S_n\| \ge 2\alpha\}$, then $\{\tau \le N\} = \{S_N^* \ge 2\alpha\}$ and on $\{\tau \le N\}$ we have
$$\|S_N\| = \|S_\tau + S_N - S_\tau\| \ge \|S_\tau\| - \|S_N - S_\tau\| \ge 2\alpha - \|S_N - S_\tau\|.$$
From this it follows that
$$\{\tau \le N \ \&\ \|S_N - S_\tau\| \le \alpha\} \subset \{\|S_N\| \ge \alpha\}$$
and therefore,
$$\begin{aligned}
P(\|S_N\| \ge \alpha) &\ge P(\tau \le N \ \&\ \|S_N - S_\tau\| \le \alpha) = \sum_{k=1}^N P(\tau = k)\,P(\|S_N - S_k\| \le \alpha) \\
&\ge \min_{1\le k\le N} P(\|S_N - S_k\| \le \alpha)\sum_{k=1}^N P(\tau = k) = (1 - c_N(\alpha))\,P(S_N^* \ge 2\alpha).
\end{aligned}$$
As an application of Theorem 20.45 we have the following convergence result.

Theorem 20.46 (Lévy's Theorem). Suppose that $\{X_n\}_{n=1}^\infty$ are independent random variables; then $\sum_{n=1}^\infty X_n$ converges in probability iff $\sum_{n=1}^\infty X_n$ converges a.s.
Proof. Let $S_n := \sum_{k=1}^n X_k$. Since almost sure convergence implies convergence in probability, it suffices to show; if $\{S_n\}$ is convergent in probability then $\{S_n\}$ is almost surely convergent. Given $M \in \mathbb{N}$, let $Q_M := \sup_{n\ge M}|S_n - S_M|$ and
for $M < N$, let $Q_{M,N} := \sup_{M\le n\le N}|S_n - S_M|$. Given $\varepsilon \in (0, 1)$, by assumption, there exists $M = M(\varepsilon) \in \mathbb{N}$ such that $\max_{M\le j\le N} P(|S_N - S_j| > \varepsilon) < \varepsilon$ for all $N \ge M$. An application of Skorohod's inequality (Theorem 20.45), then shows
$$P(Q_{M,N} \ge 2\varepsilon) \le \frac{P(|S_N - S_M| > \varepsilon)}{1 - \max_{M\le j\le N} P(|S_N - S_j| > \varepsilon)} \le \frac{\varepsilon}{1-\varepsilon}.$$
Since $Q_{M,N} \uparrow Q_M$ as $N\to\infty$, we may conclude
$$P(Q_M \ge 2\varepsilon) \le \frac{\varepsilon}{1-\varepsilon}.$$
Since,
$$\delta_M := \sup_{m,n\ge M}|S_n - S_m| \le \sup_{m,n\ge M}\left[|S_n - S_M| + |S_M - S_m|\right] = 2Q_M$$
we may further conclude, $P(\delta_M > 4\varepsilon) \le \frac{\varepsilon}{1-\varepsilon}$ and since $\varepsilon > 0$ is arbitrary, it follows that $\delta_M \xrightarrow{P} 0$ as $M\to\infty$. Moreover, since $\delta_M$ is decreasing in $M$, it follows that $\lim_{M\to\infty}\delta_M =: \delta$ exists and because $\delta_M \xrightarrow{P} 0$ we may conclude that $\delta = 0$ a.s. Thus we have shown
$$\lim_{m,n\to\infty}|S_n - S_m| = 0 \ \text{ a.s.}$$
and therefore $\{S_n\}_{n=1}^\infty$ is almost surely Cauchy and hence almost surely convergent.
Remark 20.47 (Yet another proof of Theorem 20.11). Suppose that $\{Y_j\}_{j=1}^\infty$ are independent random variables such that $\sum_{j=1}^\infty \operatorname{Var}(Y_j) < \infty$. By Proposition 20.10, the sum, $\sum_{j=1}^\infty (Y_j - \mathbb{E}Y_j)$, is $L^2(P)$ convergent and hence convergent in probability. An application of Lévy's Theorem 20.46 then shows $\sum_{j=1}^\infty (Y_j - \mathbb{E}Y_j)$ is almost surely convergent which gives another proof of Kolmogorov's convergence criteria in Theorem 20.11.
The next maximal inequality will be useful later in proving the functional central limit theorem. It is actually a simple corollary of Skorohod's inequality (Theorem 20.45) along with Chebyshev's inequality.

Corollary 20.48 (Ottaviani's maximal inequality). Suppose that $\{X_n\}_{n=1}^\infty$ are independent real or Banach valued square integrable random variables. Then for all $\alpha > 0$ we have
$$P(\|S_N\| \ge \alpha) \ge \left(1 - \frac{1}{\alpha^2}\max_{1\le k\le N}\mathbb{E}\|S_N - S_k\|^2\right)P(S_N^* \ge 2\alpha)$$
and in particular if $\alpha^2 > \max_{1\le k\le N}\mathbb{E}\|S_N - S_k\|^2$, then
$$P(S_N^* \ge 2\alpha) \le \left(1 - \frac{1}{\alpha^2}\max_{1\le k\le N}\mathbb{E}\|S_N - S_k\|^2\right)^{-1}P(\|S_N\| \ge \alpha).$$
If we further assume that the $X_n$ are real (or Hilbert valued) mean zero random variables, then
$$P(S_N^* \ge 2\alpha) \le \left(1 - \frac{1}{\alpha^2}\mathbb{E}\|S_N - X_1\|^2\right)^{-1}P(\|S_N\| \ge \alpha). \tag{20.26}$$
Proof. The first and second inequalities follow by Chebyshev's inequality and Skorohod's Theorem 20.45. When the $X_n$ are real or Hilbert valued mean zero square integrable random variables, we have
$$\max_{1\le k\le N}\mathbb{E}\|S_N - S_k\|^2 = \max_{1\le k\le N}\sum_{j=k+1}^N \mathbb{E}\|X_j\|^2 = \sum_{j=2}^N \mathbb{E}\|X_j\|^2 = \mathbb{E}\|S_N - X_1\|^2.$$
Corollary 20.49. Suppose $\beta > 1$ and $\{X_n\}_{n=1}^\infty$ are independent real square integrable random variables with $\mathbb{E}X_n = 0$ and $\operatorname{Var}(X_n) = 1$ for all $n$. Then
$$P\left(S_n^* \ge 2\beta\sqrt{n}\right) \le \left(1 - \frac{1}{\beta^2}\right)^{-1}P\left(|S_n| \ge \beta\sqrt{n}\right)$$
and if we further assume that $\{X_n\}_{n=1}^\infty$ are i.i.d., then
$$\lim_{n\to\infty} P\left(S_n^* \ge 2\beta\sqrt{n}\right) \le \sqrt{\frac{2}{\pi}}\left(1 - \frac{1}{\beta^2}\right)^{-1}\frac{1}{\beta}e^{-\beta^2/2}.$$
Proof. The first inequality follows from Eq. (20.26) of Corollary 20.48 with $\alpha = \beta\sqrt{n}$. For the second inequality we use the central limit theorem to conclude that
$$P\left(|S_n| \ge \beta\sqrt{n}\right) = P\left(\frac{|S_n|}{\sqrt{n}} \ge \beta\right) \to P(|Z| \ge \beta)$$
where $Z$ is a standard normal random variable. We then estimate $P(|Z| \ge \beta)$ using the Gaussian tail estimates in Lemma 7.59.
We can significantly improve on Corollary 20.48 if we further assume that $X_n$ is symmetric for all $n$, in which case the following reflection principle holds.

Theorem 20.50. Suppose that $\{X_n\}_{n=1}^\infty$ are independent real or Banach valued random variables such that $X_n \stackrel{d}{=} -X_n$ for all $n$ and $\tau$ is any $\left\{\mathcal{B}_n^X = \sigma(X_1, \ldots, X_n)\right\}_{n=1}^\infty$ stopping time. If we set
$$\tilde{S}_n := 1_{n\le\tau}S_n + 1_{n>\tau}\left(S_\tau - (S_n - S_\tau)\right) = 1_{n\le\tau}S_n + 1_{n>\tau}\Big(S_\tau - \sum_{\tau<k\le n} X_k\Big),$$
then $\left(\{\tilde{S}_n\}_{n=1}^\infty, \tau\right)$ and $\left(\{S_n\}_{n=1}^\infty, \tau\right)$ have the same distribution. (Notice $\tilde{S}_n = S_n$ for $n \le \tau$ and $\tilde{S}_n$ is $S_n$ reflected about $S_\tau$ for $n > \tau$.)

Proof. Let $N \in \mathbb{N}$ be given and $f : Y^N \to \mathbb{R}$ be a bounded measurable function. Then, for all $k \le N$ we have,
$$\begin{aligned}
\mathbb{E}[f(\tilde{S}_1, \ldots, \tilde{S}_N) : \tau = k] &= \mathbb{E}[f(S_1, \ldots, S_k, S_k - X_{k+1}, \ldots, S_k - X_{k+1} - \cdots - X_N) : \tau = k] \\
&= \mathbb{E}[f(S_1, \ldots, S_k, S_k + X_{k+1}, \ldots, S_k + X_{k+1} + \cdots + X_N) : \tau = k] \\
&= \mathbb{E}[f(S_1, \ldots, S_N) : \tau = k],
\end{aligned}$$
wherein we have used $(-X_{k+1}, \ldots, -X_N) \stackrel{d}{=} (X_{k+1}, \ldots, X_N)$, $(X_{k+1}, \ldots, X_N)$ is independent of $\mathcal{B}_k^X$, and $\{\tau = k\}$ and $(S_1, \ldots, S_k)$ are $\mathcal{B}_k^X$ measurable. This completes the proof since on $\{\tau = \infty\}$, $\{\tilde{S}_n\}_{n=1}^\infty = \{S_n\}_{n=1}^\infty$.
In order to exploit this principle we will need to combine it with the following simple geometric reflection property for Banach spaces; if $r > 0$ and $x, y \in Y$ ($Y$ is a normed space) such that $\|x\| \ge r$ while $\|x - y\| < r$, then $\|x + y\| > r$. This is easy to believe (draw the picture for $Y = \mathbb{R}^2$) and it is also easy to prove;
$$\|x + y\| = \|2x - (x - y)\| \ge \|2x\| - \|x - y\| \ge 2r - \|x - y\| > 2r - r = r.$$

Proposition 20.51 (Reflection Principle). Suppose that $\{X_n\}_{n=1}^\infty$ are independent real or Banach valued random variables such that $X_n \stackrel{d}{=} -X_n$ for all $n$. Then
$$P(S_N^* \ge r) \le P(\|S_N\| \ge r) + P(\|S_N\| > r) \le 2P(\|S_N\| \ge r). \tag{20.27}$$
Proof. Let $\tau := \inf\{n : \|S_n\| \ge r\}$ (a $\left\{\mathcal{B}_n^X\right\}_{n=1}^\infty$ stopping time), then
$$\begin{aligned}
P(S_N^* \ge r) &= P(S_N^* \ge r, \|S_N\| \ge r) + P(S_N^* \ge r, \|S_N\| < r) \\
&= P(\|S_N\| \ge r) + P(\tau \le N, \|S_N\| < r).
\end{aligned}$$
Moreover by the reflection principle (Theorem 20.50),
$$P(\tau \le N, \|S_N\| < r) = P(\tau \le N, \|\tilde{S}_N\| < r) = P(\tau \le N, \|S_\tau - (S_N - S_\tau)\| < r).$$
If $\|S_\tau\| \ge r$ and $\|S_\tau - (S_N - S_\tau)\| < r$, then by the geometric reflection property, $\|S_N\| = \|S_\tau + (S_N - S_\tau)\| > r$ and therefore
$$P(\tau \le N, \|S_\tau - (S_N - S_\tau)\| < r) \le P(\tau \le N, \|S_N\| > r) = P(\|S_N\| > r).$$
Combining this inequality with the first displayed inequality in the proof easily gives the result.
Exercise 20.5 (Simple Random Walk Reflection Principle). Let $\{X_n\}_{n=1}^\infty$ be i.i.d. Bernoulli random variables with $P(X_n = \pm1) = \frac{1}{2}$ for all $n$ and let $S_n := \sum_{k\le n} X_k$ be the standard simple random walk on $\mathbb{Z}$. Show for every $r \in \mathbb{N}$ that
$$P\left(\max_{k\le n} S_k \ge r\right) = P(S_n \ge r) + P(S_n > r).$$
21 Weak Convergence Results

In this chapter we will discuss a couple of different ways to decide whether two probability measures on $(\mathbb{R}, \mathcal{B}_\mathbb{R})$ are close to one another. This will arise later as follows. Suppose $\{Y_n\}_{n=1}^\infty$ is a sequence of random variables and $Y$ is another random variable (possibly defined on a different probability space). We would like to understand when, for large $n$, $Y_n$ and $Y$ have nearly the same distribution, i.e. when is $\mu_n := \operatorname{Law}(Y_n)$ close to $\mu = \operatorname{Law}(Y)$ for large $n$.

It will often be the case that $Y_n = X_1 + \cdots + X_n$ where $\{X_i\}_{i=1}^n$ are independent random variables. For this reason it will be useful to record the procedure for computing the law of $Y_n$ in terms of the laws of the $\{X_i\}_{i=1}^n$. So before going on to the main theme of this chapter let us pause to introduce the relevant notion of the convolution of probability measures on $\mathbb{R}^n$.
21.1 Convolutions

Definition 21.1. Let $\mu$ and $\nu$ be two probability measures on $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n})$. The convolution of $\mu$ and $\nu$, denoted $\mu * \nu$, is the measure, $P\circ(X+Y)^{-1}$ where $X, Y$ are two independent random vectors such that $P\circ X^{-1} = \mu$ and $P\circ Y^{-1} = \nu$.

Of course we may give a more direct definition of the convolution of $\mu$ and $\nu$ by observing for $A \in \mathcal{B}_{\mathbb{R}^n}$ that
$$(\mu*\nu)(A) = P(X + Y \in A) = \int_{\mathbb{R}^n} d\mu(x)\int_{\mathbb{R}^n} d\nu(y)\,1_A(x+y) \tag{21.1}$$
$$= \int_{\mathbb{R}^n} \nu(A - x)\,d\mu(x) \tag{21.2}$$
$$= \int_{\mathbb{R}^n} \mu(A - x)\,d\nu(x). \tag{21.3}$$
This may also be expressed as,
$$(\mu*\nu)(A) = \int_{\mathbb{R}^n\times\mathbb{R}^n} 1_A(x+y)\,d\mu(x)\,d\nu(y) = \int_{\mathbb{R}^n\times\mathbb{R}^n} 1_A(x+y)\,d(\mu\otimes\nu)(x, y). \tag{21.4}$$
Exercise 21.1. Let $\mu, \nu,$ and $\gamma$ be three probability measures on $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n})$. Show;

1. $\mu * \nu = \nu * \mu$.
2. $\mu * (\nu * \gamma) = (\mu * \nu) * \gamma$. (So it is now safe to write $\mu * \nu * \gamma$ for either side of this equation.)
3. $(\mu * \delta_x)(A) = \mu(A - x)$ for all $x \in \mathbb{R}^n$ where $\delta_x(A) := 1_A(x)$ for all $A \in \mathcal{B}_{\mathbb{R}^n}$ and in particular $\mu * \delta_0 = \mu$.

As a consequence of item 2. of this exercise, if $\{Y_i\}_{i=1}^n$ are independent random vectors in $\mathbb{R}^n$ with $\mu_i = \operatorname{Law}(Y_i)$, then
$$\operatorname{Law}(Y_1 + \cdots + Y_n) = \mu_1 * \mu_2 * \cdots * \mu_n. \tag{21.5}$$
Solution to Exercise (21.1). 1. The first item follows from Eq. (21.4). 2. For the second we have,
$$\begin{aligned}
[\mu * (\nu * \gamma)](A) &= \int_{\mathbb{R}^n} (\nu*\gamma)(A - x)\,d\mu(x) = \int_{\mathbb{R}^n} d\mu(x)\int_{\mathbb{R}^n} d\nu(y)\,\gamma(A - x - y) \\
&= \int_{\mathbb{R}^n} d\mu(x)\int_{\mathbb{R}^n} d\nu(y)\int_{\mathbb{R}^n} d\gamma(z)\,1_{A-x-y}(z) \\
&= \int_{\mathbb{R}^n} d\mu(x)\int_{\mathbb{R}^n} d\nu(y)\int_{\mathbb{R}^n} d\gamma(z)\,1_A(x + y + z) \\
&= \int_{[\mathbb{R}^n]^3} 1_A(x + y + z)\,d(\mu\otimes\nu\otimes\gamma)(x, y, z).
\end{aligned}$$
Similarly one shows $[(\mu * \nu) * \gamma](A)$ satisfies the same formula. 3. For the last item,
$$\begin{aligned}
(\mu * \delta_x)(A) &= \int_{\mathbb{R}^n} \delta_x(A - y)\,d\mu(y) = \int_{\mathbb{R}^n} 1_{x\in A-y}\,d\mu(y) \\
&= \int_{\mathbb{R}^n} 1_{x+y\in A}\,d\mu(y) = \int_{\mathbb{R}^n} 1_{y\in A-x}\,d\mu(y) = \mu(A - x).
\end{aligned}$$
Remark 21.2. Suppose that $d\mu(x) = u(x)\,dx$ where $u(x) \ge 0$ and $\int_{\mathbb{R}^n} u(x)\,dx = 1$. Then using the translation invariance of Lebesgue measure and Tonelli's theorem, we have
$$(\mu*\nu)(f) = \int_{\mathbb{R}^n\times\mathbb{R}^n} f(x+y)\,u(x)\,dx\,d\nu(y) = \int_{\mathbb{R}^n\times\mathbb{R}^n} f(x)\,u(x-y)\,dx\,d\nu(y)$$
from which it follows that
$$d(\mu*\nu)(x) = \left(\int_{\mathbb{R}^n} u(x-y)\,d\nu(y)\right)dx.$$
If we further assume that $d\nu(x) = v(x)\,dx$, then we have
$$d(\mu*\nu)(x) = \left(\int_{\mathbb{R}^n} u(x-y)\,v(y)\,dy\right)dx.$$
To simplify notation we write,
$$u * v(x) = \int_{\mathbb{R}^n} u(x-y)\,v(y)\,dy = \int_{\mathbb{R}^n} v(x-y)\,u(y)\,dy.$$

Example 21.3. Suppose that $n = 1$, $d\mu(x) = 1_{[0,1]}(x)\,dx$ and $d\nu(x) = 1_{[-1,0]}(x)\,dx$ so that $\nu(A) = \mu(-A)$. In this case
$$d(\mu*\nu)(x) = \left(1_{[0,1]} * 1_{[-1,0]}\right)(x)\,dx$$
where
$$\begin{aligned}
\left(1_{[0,1]} * 1_{[-1,0]}\right)(x) &= \int_\mathbb{R} 1_{[-1,0]}(x - y)\,1_{[0,1]}(y)\,dy = \int_\mathbb{R} 1_{[0,1]}(y - x)\,1_{[0,1]}(y)\,dy \\
&= \int_\mathbb{R} 1_{[0,1]+x}(y)\,1_{[0,1]}(y)\,dy = m([0,1]\cap(x + [0,1])) = (1 - |x|)_+.
\end{aligned}$$
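The triangle density of Example 21.3 is easy to reproduce numerically; the following Riemann-sum sketch (my own, using a midpoint rule) approximates $(1_{[0,1]} * 1_{[-1,0]})(x)$ and matches $(1 - |x|)_+$:

```python
def u(x):
    """Density of mu: uniform on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def v(x):
    """Density of nu: uniform on [-1, 0]."""
    return 1.0 if -1.0 <= x <= 0.0 else 0.0

def convolve_at(x, n=2000):
    """Midpoint-rule approximation of (u * v)(x) = int u(x - y) v(y) dy,
    integrating over [-1, 0], the support of v."""
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        y = -1.0 + (j + 0.5) * h  # midpoint of the j-th subinterval
        total += u(x - y) * v(y)
    return total * h

approx = {x: convolve_at(x) for x in (-0.5, 0.0, 0.25, 0.9)}
```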
21.2 Total Variation Distance

Definition 21.4. Let $\mu$ and $\nu$ be two probability measures on a measurable space, $(\Omega, \mathcal{B})$. The total variation distance, $d_{TV}(\mu, \nu)$, is defined as
$$d_{TV}(\mu, \nu) := \sup_{A\in\mathcal{B}} |\mu(A) - \nu(A)|, \tag{21.6}$$
i.e. $d_{TV}(\mu, \nu)$ is simply the supremum norm of $\mu - \nu$ as a function on $\mathcal{B}$.

Notation 21.5 Suppose that $X$ and $Y$ are random variables, let
$$d_{TV}(X, Y) := d_{TV}(\mu_X, \mu_Y) = \sup_{A\in\mathcal{B}_\mathbb{R}} |P(X\in A) - P(Y\in A)|,$$
where $\mu_X = P\circ X^{-1}$ and $\mu_Y = P\circ Y^{-1}$.
Example 21.6. For $x \in \mathbb{R}^n$, let $\delta_x(A) := 1_A(x)$ for all $A \in \mathcal{B}_{\mathbb{R}^n}$. Then one easily shows that $d_{TV}(\delta_x, \delta_y) = 1_{x\ne y}$. Thus if $x \ne y$, in this metric $\delta_x$ and $\delta_y$ are one unit apart no matter how close $x$ and $y$ are in $\mathbb{R}^n$. (This is not always a desirable feature and because of this we will introduce shortly another notion of closeness for measures.)

Exercise 21.2. Let $\mathcal{P}$ denote the set of probability measures on $(\Omega, \mathcal{B})$. Show $d_{TV}$ is a complete metric on $\mathcal{P}$.

Exercise 21.3. Suppose that $\mu, \nu,$ and $\gamma$ are probability measures on $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n})$. Show $d_{TV}(\mu*\gamma, \nu*\gamma) \le d_{TV}(\mu, \nu)$. Use this fact along with Exercise 21.2 to show,
$$d_{TV}(\mu_1 * \mu_2 * \cdots * \mu_n, \nu_1 * \nu_2 * \cdots * \nu_n) \le \sum_{i=1}^n d_{TV}(\mu_i, \nu_i)$$
for all choices of probability measures, $\mu_i$ and $\nu_i$ on $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n})$.
Remark 21.7. The function, $\lambda : \mathcal{B}\to\mathbb{R}$ defined by, $\lambda(A) := \mu(A) - \nu(A)$ for all $A\in\mathcal{B}$, is an example of a signed measure. For signed measures, one usually defines
$$\|\lambda\|_{TV} := \sup\left\{\sum_{i=1}^n |\lambda(A_i)| : n\in\mathbb{N} \text{ and partitions, } \{A_i\}_{i=1}^n \subset \mathcal{B} \text{ of } \Omega\right\}.$$
You are asked to show in Exercise 21.4 below, that when $\lambda = \mu - \nu$, $d_{TV}(\mu, \nu) = \frac{1}{2}\|\mu - \nu\|_{TV}$.

Lemma 21.8 (Scheffé's Lemma). Suppose that $m$ is another positive measure on $(\Omega, \mathcal{B})$ such that there exist measurable functions, $f, g : \Omega \to [0, \infty)$, such that $d\mu = f\,dm$ and $d\nu = g\,dm$.$^1$ Then
$$d_{TV}(\mu, \nu) = \frac{1}{2}\int_\Omega |f - g|\,dm.$$
Moreover, if $\{\mu_n\}_{n=1}^\infty$ is a sequence of probability measures of the form, $d\mu_n = f_n\,dm$ with $f_n : \Omega \to [0, \infty)$, and $f_n \to g$, $m$-a.e., then $d_{TV}(\mu_n, \nu) \to 0$ as $n \to \infty$.

$^1$ Fact: it is always possible to do this by taking $m = \mu + \nu$ for example.
Proof. Let $\lambda = \mu - \nu$ and $h := f - g : \Omega \to \mathbb{R}$ so that $d\lambda = h\,dm$. Since
$$\lambda(\Omega) = \mu(\Omega) - \nu(\Omega) = 1 - 1 = 0,$$
if $A \in \mathcal{B}$ we have
$$\lambda(A) + \lambda(A^c) = \lambda(\Omega) = 0.$$
In particular this shows $|\lambda(A)| = |\lambda(A^c)|$ and therefore,
$$|\lambda(A)| = \frac{1}{2}\left[|\lambda(A)| + |\lambda(A^c)|\right] = \frac{1}{2}\left[\left|\int_A h\,dm\right| + \left|\int_{A^c} h\,dm\right|\right] \tag{21.7}$$
$$\le \frac{1}{2}\left[\int_A |h|\,dm + \int_{A^c} |h|\,dm\right] = \frac{1}{2}\int_\Omega |h|\,dm.$$
This shows
$$d_{TV}(\mu, \nu) = \sup_{A\in\mathcal{B}} |\lambda(A)| \le \frac{1}{2}\int_\Omega |h|\,dm.$$
To prove the converse inequality, simply take $A = \{h > 0\}$ (note $A^c = \{h \le 0\}$) in Eq. (21.7) to find
$$|\lambda(A)| = \frac{1}{2}\left[\int_A h\,dm - \int_{A^c} h\,dm\right] = \frac{1}{2}\left[\int_A |h|\,dm + \int_{A^c} |h|\,dm\right] = \frac{1}{2}\int_\Omega |h|\,dm.$$
For the second assertion, observe that $|f_n - g| \to 0$ $m$-a.e., $|f_n - g| \le G_n := f_n + g \in L^1(m)$, $G_n \to G := 2g$ a.e. and $\int_\Omega G_n\,dm = 2 \to 2 = \int_\Omega G\,dm$ as $n \to \infty$. Therefore, by the dominated convergence theorem 7.27,
$$\lim_{n\to\infty} d_{TV}(\mu_n, \nu) = \frac{1}{2}\lim_{n\to\infty}\int_\Omega |f_n - g|\,dm = 0.$$
For a concrete application of Scheffé's Lemma 21.8, see Proposition 21.49 below.
Corollary 21.9. Let $\|h\|_\infty := \sup_{\omega\in\Omega}|h(\omega)|$ when $h : \Omega \to \mathbb{R}$ is a bounded random variable. Continuing the notation in Scheffé's lemma above, we have
$$d_{TV}(\mu, \nu) = \frac{1}{2}\sup\left\{\left|\int_\Omega h\,d\mu - \int_\Omega h\,d\nu\right| : \|h\|_\infty \le 1\right\}. \tag{21.8}$$
Consequently,
$$\left|\int_\Omega h\,d\mu - \int_\Omega h\,d\nu\right| \le 2\,d_{TV}(\mu, \nu)\,\|h\|_\infty \tag{21.9}$$
and in particular, for all bounded and measurable functions, $h : \Omega \to \mathbb{R}$,
$$\lim_{n\to\infty} d_{TV}(\mu_n, \nu) = 0 \implies \lim_{n\to\infty}\int_\Omega h\,d\mu_n = \int_\Omega h\,d\nu. \tag{21.10}$$
Proof. We begin by observing that
$$\left|\int_\Omega h\,d\mu - \int_\Omega h\,d\nu\right| = \left|\int_\Omega h\,(f - g)\,dm\right| \le \int_\Omega |h|\,|f - g|\,dm \le \|h\|_\infty\int_\Omega |f - g|\,dm = 2\,d_{TV}(\mu, \nu)\,\|h\|_\infty.$$
Moreover, from the proof of Scheffé's Lemma 21.8, we have
$$d_{TV}(\mu, \nu) = \frac{1}{2}\left|\int_\Omega h\,d\mu - \int_\Omega h\,d\nu\right|$$
when $h := 1_{f>g} - 1_{f\le g}$. These two equations prove Eqs. (21.8) and (21.9) and the latter implies Eq. (21.10).
Exercise 21.4. Under the hypothesis of Scheffé's Lemma 21.8, show
$$\|\mu - \nu\|_{TV} = \int_\Omega |f - g|\,dm = 2\,d_{TV}(\mu, \nu).$$

Exercise 21.5. Suppose that $\Omega$ is a (at most) countable set, $\mathcal{B} := 2^\Omega$, and $\{\mu_n\}_{n=0}^\infty$ are probability measures on $(\Omega, \mathcal{B})$. Show
$$d_{TV}(\mu_n, \mu_0) = \frac{1}{2}\sum_{\omega\in\Omega}|\mu_n(\omega) - \mu_0(\omega)|$$
and $\lim_{n\to\infty} d_{TV}(\mu_n, \mu_0) = 0$ iff $\lim_{n\to\infty}\mu_n(\omega) = \mu_0(\omega)$ for all $\omega\in\Omega$.

Exercise 21.6. Let $\mu_p(1) = p$ and $\mu_p(0) = 1 - p$ and $\pi_\lambda(n) := e^{-\lambda}\frac{\lambda^n}{n!}$ for all $n \in \mathbb{N}_0$.

1. Find $d_{TV}(\mu_p, \mu_q)$ for all $0 \le p, q \le 1$.
2. Show $d_{TV}(\mu_p, \pi_p) = p\,(1 - e^{-p})$ for all $0 \le p \le 1$. From this estimate and the estimate,
$$1 - e^{-p} = \int_0^p e^{-x}\,dx \le \int_0^p 1\,dx = p, \tag{21.11}$$
it follows that $d_{TV}(\mu_p, \pi_p) \le p^2$ for all $0 \le p \le 1$.
3. Show
$$d_{TV}(\pi_\lambda, \pi_\mu) \le |\lambda - \mu|\,e^{|\lambda-\mu|} \ \text{ for all } \lambda, \mu \in \mathbb{R}_+.$$
I got this estimate with the aid of the fundamental theorem of calculus along with crude estimates on each term in the infinite series for $d_{TV}(\pi_\lambda, \pi_\mu)$. Andy Parrish got a much better estimate, namely that
$$d_{TV}(\pi_\lambda, \pi_\mu) \le |\lambda - \mu| \ \text{ for all } \lambda, \mu \in \mathbb{R}_+! \tag{21.12}$$
The next theorem should be compared with Exercise 7.6 which may be stated as follows. If $\{Z_i\}_{i=1}^n$ are i.i.d. Bernoulli random variables with $P(Z_i = 1) = p = O(1/n)$ and $S = Z_1 + \cdots + Z_n$, then $P(S = k) \cong P(\operatorname{Poisson}(pn) = k)$ which is valid for $k \ll n$.

Theorem 21.10 (Law of rare events). Let $\{Z_i\}_{i=1}^n$ be independent Bernoulli random variables with $P(Z_i = 1) = p_i \in (0, 1)$ and $P(Z_i = 0) = 1 - p_i$, $S := Z_1 + \cdots + Z_n$, $a := p_1 + \cdots + p_n$, and $X \stackrel{d}{=} \operatorname{Poi}(a)$. Then for any$^2$ $A \in \mathcal{B}_\mathbb{R}$ we have
$$|P(S \in A) - P(X \in A)| \le \sum_{i=1}^n p_i^2, \tag{21.13}$$
or in short,
$$d_{TV}\Big(\sum_{i=1}^n Z_i, X\Big) \le \sum_{i=1}^n p_i^2.$$
(Of course this estimate has no content unless $\sum_{i=1}^n p_i^2 < 1$.)
Proof. Let $\{X_i\}_{i=1}^n$ be independent random variables with $X_i \stackrel{d}{=} \operatorname{Pois}(p_i)$ for each $i$, so that $X \stackrel{d}{=} \sum_{i=1}^n X_i$. It then follows from Exercises 21.3 and 21.6 that,
$$d_{TV}\Big(\sum_{i=1}^n Z_i, \sum_{i=1}^n X_i\Big) \le \sum_{i=1}^n d_{TV}(Z_i, X_i) = \sum_{i=1}^n p_i\left(1 - e^{-p_i}\right) \le \sum_{i=1}^n p_i^2.$$
The reader should compare the proof of this theorem with the proof of the central limit theorem in Theorem 10.37. For another less quantitative Poisson limit theorem, see Theorem 23.16.
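The bound (21.13) can be checked exactly for small $n$ by convolving the Bernoulli laws (a sketch of my own; the $p_i$ below are arbitrary illustrative values):

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact pmf of S = Z_1 + ... + Z_n by sequential convolution."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1.0 - p)
            new[k + 1] += q * p
        pmf = new
    return pmf

ps = [0.02, 0.05, 0.01, 0.03, 0.04]
a = sum(ps)
pmf = bernoulli_sum_pmf(ps)
poi = [math.exp(-a) * a**k / math.factorial(k) for k in range(40)]
# d_TV = (1/2) sum_k |P(S = k) - P(X = k)|; beyond the support of S the
# difference is just the Poisson tail.
dtv = 0.5 * (
    sum(abs(pmf[k] - poi[k]) for k in range(len(pmf)))
    + sum(poi[len(pmf):])
)
bound = sum(p**2 for p in ps)  # right side of Eq. (21.13)
```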
For the next result we will suppose that $(Y, \mathcal{M}, \mu)$ is a finite measure space with the following properties;

1. $\{y\} \in \mathcal{M}$ and $\mu(\{y\}) = 0$ for all $y \in Y$,
2. to any $A \in \mathcal{M}$ and $\varepsilon > 0$, there exists a finite partition $\{A_n\}_{n=1}^{N=N(\varepsilon)} \subset \mathcal{M}$ of $A$ such that $\mu(A_n) \le \varepsilon$ for all $n$.

In what follows below we will write $F(A) = o(\mu(A))$ provided there exists an increasing function, $\phi : \mathbb{R}_+ \to \mathbb{R}_+$, such that $\phi(x) \to 0$ as $x \downarrow 0$ and $|F(A)| \le \mu(A)\,\phi(\mu(A))$ for all $A \in \mathcal{M}$.
Proposition 21.11 (Why Poisson). Suppose $(Y, \mathcal{M}, \mu)$ is a finite measure space with the properties given above and $\{N(A) : A \in \mathcal{M}\}$ is a collection of $\mathbb{N}_0$-valued random variables with the following properties;

1. If $\{A_j\}_{j=1}^n \subset \mathcal{M}$ are disjoint, then $\{N(A_i)\}_{i=1}^n$ are independent random variables and
$$N\Big(\bigcup_{i=1}^n A_i\Big) = \sum_{i=1}^n N(A_i) \ \text{ a.s.}$$
2. $P(N(A) \ge 2) = o(\mu(A))$.
3. $|P(N(A) \ge 1) - \mu(A)| = o(\mu(A))$.

Then $N(A) \stackrel{d}{=} \operatorname{Poi}(\mu(A))$ for all $A \in \mathcal{M}$ and in particular $\mathbb{E}N(A) = \mu(A)$ for all $A \in \mathcal{M}$.

$^2$ (Footnote to Theorem 21.10.) Actually, since $S$ and $X$ are $\mathbb{N}_0$-valued, we may as well assume that $A \subset \mathbb{N}_0$.
Proof. Let $A \in \mathcal{M}$ and $\varepsilon > 0$ be given. Choose a partition $\{A_i^\varepsilon\}_{i=1}^N \subset \mathcal{M}$ of $A$ such that $\mu(A_i^\varepsilon) \le \varepsilon$ for all $i$. Let $Z_i := 1_{N(A_i^\varepsilon)\ge1}$ and $S := \sum_{i=1}^N Z_i$. Using
$$N(A) = \sum_{i=1}^N N(A_i^\varepsilon)$$
and Lemma 21.12, we have
$$|P(N(A) = k) - P(S = k)| \le P(N(A) \ne S) \le \sum_{i=1}^N P(Z_i \ne N(A_i^\varepsilon)).$$
Since $\{Z_i \ne N(A_i^\varepsilon)\} = \{N(A_i^\varepsilon) \ge 2\}$ and $P(N(A_i^\varepsilon) \ge 2) = o(\mu(A_i^\varepsilon))$, it follows that
$$|P(N(A) = k) - P(S = k)| \le \sum_{i=1}^N \mu(A_i^\varepsilon)\,\phi(\mu(A_i^\varepsilon)) \le \sum_{i=1}^N \mu(A_i^\varepsilon)\,\phi(\varepsilon) = \phi(\varepsilon)\,\mu(A). \tag{21.14}$$
On the other hand, $\{Z_i\}_{i=1}^N$ are independent Bernoulli random variables with
$$P(Z_i = 1) = P(N(A_i^\varepsilon) \ge 1),$$
and $a_\varepsilon := \sum_{i=1}^N P(N(A_i^\varepsilon) \ge 1)$. Then by the Law of rare events Theorem 21.10,
$$\left|P(S = k) - \frac{a_\varepsilon^k}{k!}e^{-a_\varepsilon}\right| \le \sum_{i=1}^N \left[P(N(A_i^\varepsilon) \ge 1)\right]^2 \le \sum_{i=1}^N \left[\mu(A_i^\varepsilon) + o(\mu(A_i^\varepsilon))\right]^2 \le \sum_{i=1}^N \mu(A_i^\varepsilon)^2\left(1 + \phi'(\varepsilon)\right)^2 \le \left(1 + \phi'(\varepsilon)\right)^2\varepsilon\,\mu(A), \tag{21.15}$$
where $\phi'$ denotes the modulus associated to the $o(\mu(A))$ in property 3.
Combining Eqs. (21.14) and (21.15) shows
$$\left|P(N(A) = k) - \frac{a_\varepsilon^k}{k!}e^{-a_\varepsilon}\right| \le \left[\phi(\varepsilon) + \left(1 + \phi'(\varepsilon)\right)^2\varepsilon\right]\mu(A) \tag{21.16}$$
where $a_\varepsilon$ satisfies
$$|a_\varepsilon - \mu(A)| = \left|\sum_{i=1}^N \left[P(N(A_i^\varepsilon) \ge 1) - \mu(A_i^\varepsilon)\right]\right| \le \sum_{i=1}^N \left|P(N(A_i^\varepsilon) \ge 1) - \mu(A_i^\varepsilon)\right| \le \sum_{i=1}^N \mu(A_i^\varepsilon)\left|\phi'(\mu(A_i^\varepsilon))\right| \le \mu(A)\,\phi'(\varepsilon).$$
Hence we may let $\varepsilon \downarrow 0$ in Eq. (21.16) to find
$$P(N(A) = k) = \frac{(\mu(A))^k}{k!}e^{-\mu(A)}.$$
See [49, p. 13-16.] for another variant of this theorem in the case that $Y = \mathbb{R}_+$. See Theorem 11.11 and Exercises 11.6 – 11.8 for concrete constructions of Poisson processes.
21.3 A Coupling Estimate

Lemma 21.12 (Coupling Estimates). Suppose $X$ and $Y$ are any random variables on a probability space, $(\Omega, \mathcal{B}, P)$ and $A \in \mathcal{B}_\mathbb{R}$. Then
$$|P(X \in A) - P(Y \in A)| \le P(\{X \in A\}\,\triangle\,\{Y \in A\}) \le P(X \ne Y). \tag{21.17}$$
Proof. The proof is simply;
$$|P(X \in A) - P(Y \in A)| = |\mathbb{E}[1_A(X) - 1_A(Y)]| \le \mathbb{E}|1_A(X) - 1_A(Y)| = \mathbb{E}\,1_{\{X\in A\}\triangle\{Y\in A\}} \le \mathbb{E}\,1_{X\ne Y} = P(X \ne Y).$$
Pushing the above proof a little more we have, if $\{A_i\}$ is a partition of $\mathbb{R}$, then
$$\begin{aligned}
\sum_i |P(X \in A_i) - P(Y \in A_i)| &= \sum_i |\mathbb{E}[1_{A_i}(X) - 1_{A_i}(Y)]| \le \sum_i \mathbb{E}|1_{A_i}(X) - 1_{A_i}(Y)| \\
&\le \sum_i \mathbb{E}[1_{X\ne Y} : X \in A_i \text{ or } Y \in A_i] \\
&\le \sum_i \left(\mathbb{E}[1_{X\ne Y} : X \in A_i] + \mathbb{E}[1_{X\ne Y} : Y \in A_i]\right) = 2P(X \ne Y).
\end{aligned}$$
This shows
$$\|X_*P - Y_*P\|_{TV} \le 2P(X \ne Y).$$
This is really not more general than Eq. (21.17) since by the Hahn decomposition theorem we know that in fact the signed measure, $\lambda := X_*P - Y_*P$, has total variation given by
$$\|\lambda\|_{TV} = \lambda(B_+) - \lambda(B_-)$$
where $\mathbb{R} = B_+ \cup B_-$ with $B_+$ being a positive set and $B_-$ being a negative set for $\lambda$. Moreover, since $\lambda(\mathbb{R}) = 0$ we must in fact have $\lambda(B_+) = -\lambda(B_-)$ so that
$$\|X_*P - Y_*P\|_{TV} = \|\lambda\|_{TV} = 2\lambda(B_+) = 2\,|P(X \in B_+) - P(Y \in B_+)| \le 2P(X \ne Y).$$
Here is perhaps a better way to view the above lemma. Suppose that we are given two probability measures, $\mu$ and $\nu$, on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ (or any other measurable space, $(S, \mathcal{B}_S)$). We would like to estimate $\|\mu - \nu\|_{TV}$. The lemma states that if $X, Y$ are random variables (vectors) on some probability space such that $\mathrm{Law}_P(X) = \mu$ and $\mathrm{Law}_P(Y) = \nu$, then
\[
\|\mu - \nu\|_{TV} \le 2 P(X \ne Y).
\]
Suppose that we let $\gamma := \mathrm{Law}_P(X, Y)$ on $(S \times S, \mathcal{B}_S \otimes \mathcal{B}_S)$ and let $\pi_i : S \times S \to S$ be the projection maps for $i = 1, 2$; then $(\pi_1)_* \gamma = \mu$, $(\pi_2)_* \gamma = \nu$, and
\[
\|\mu - \nu\|_{TV} \le 2 P(X \ne Y) = 2\gamma(\pi_1 \ne \pi_2) = 2\gamma\left(S^2 \setminus \Delta\right),
\]
where $\Delta = \{(s, s) : s \in S\}$ is the diagonal in $S^2$. Thus finding a coupling amounts to finding a probability measure, $\gamma$, on $\left(S^2, \mathcal{B}_S \otimes \mathcal{B}_S\right)$ whose marginals are $\mu$ and $\nu$ respectively. Then we will have the coupling estimate,
\[
\|\mu - \nu\|_{TV} \le 2\gamma\left(S^2 \setminus \Delta\right).
\]
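For measures on a countable set the coupling estimate is in fact sharp: an optimal coupling keeps $X = Y$ with probability $\sum_i \mu(i) \wedge \nu(i)$, so that $\|\mu - \nu\|_{TV} = 2\left(1 - \sum_i \mu(i) \wedge \nu(i)\right)$. The following minimal Python sketch of this identity is illustrative only (the function names are not from these notes):

```python
def tv_norm(mu, nu):
    # ||mu - nu||_TV = sum_i |mu(i) - nu(i)| for measures on a countable set
    keys = set(mu) | set(nu)
    return sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)

def optimal_mismatch(mu, nu):
    # the best coupling keeps X = Y with probability sum_i mu(i) ^ nu(i),
    # so the smallest achievable P(X != Y) is 1 - sum_i min(mu(i), nu(i))
    keys = set(mu) | set(nu)
    return 1.0 - sum(min(mu.get(k, 0.0), nu.get(k, 0.0)) for k in keys)

mu = {1: 0.5, 2: 0.3, 3: 0.2}
nu = {1: 0.4, 2: 0.4, 3: 0.2}
print(tv_norm(mu, nu), 2 * optimal_mismatch(mu, nu))  # equal for an optimal coupling
```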
As an example of how to use Lemma 21.12 let us give a coupling proof of
Theorem 21.10.
Proof. (A coupling proof of Theorem 21.10.) We are going to construct a coupling for $S_* P$ and $X_* P$. Finding such a coupling amounts to representing $X$ and $S$ on the same probability space. We are going to do this by building all of the random variables in sight out of $\{U_i\}_{i=1}^n$, where the $\{U_i\}_{i=1}^n$ are i.i.d. random variables distributed uniformly on $[0, 1]$.
If we define,
\[
Z_i := 1_{(1 - p_i, 1]}(U_i) = 1_{1 - p_i < U_i \le 1},
\]
then $\{Z_i\}_{i=1}^n$ are independent Bernoulli random variables with $P(Z_i = 1) = p_i$. We are now also going to construct, out of the $\{U_i\}_{i=1}^n$, a sequence of independent Poisson random variables, $\{X_i\}_{i=1}^n$, with $X_i \overset{d}{=} \mathrm{Poi}(p_i)$. To do this define
\[
\varphi_i(k) := P(\mathrm{Poi}(p_i) \le k) = e^{-p_i} \sum_{j=0}^k \frac{p_i^j}{j!}
\]
with the convention that $\varphi_i(-1) = 0$. Notice that
\[
e^{-p_i} \le \varphi_i(k) \le \varphi_i(k + 1) \le 1 \text{ for all } k \in \mathbb{N}_0
\]
and for $p_i$ small, we have
\[
\varphi_i(0) = e^{-p_i} \cong 1 - p_i \text{ and } \varphi_i(1) = e^{-p_i}(1 + p_i) \cong 1 - p_i^2.
\]
If we define (see Figure 21.1)
\[
X_i := \sum_{k=0}^\infty k\, 1_{\varphi_i(k-1) < U_i \le \varphi_i(k)},
\]
then $X_i \overset{d}{=} \mathrm{Poi}(p_i)$ since
\[
P(X_i = k) = P\left( \varphi_i(k-1) < U_i \le \varphi_i(k) \right) = \varphi_i(k) - \varphi_i(k-1) = e^{-p_i} \frac{p_i^k}{k!}.
\]
It is also clear that $\{X_i\}_{i=1}^n$ are independent and hence, by Lemma 11.1, it follows that $X := \sum_{i=1}^n X_i \overset{d}{=} \mathrm{Poi}(a)$.
An application of Lemma 21.12 now shows
\[
|P(S \in A) - P(X \in A)| \le P(S \ne X)
\]
and since $\{S \ne X\} \subset \cup_{i=1}^n \{X_i \ne Z_i\}$, we may conclude
\[
|P(S \in A) - P(X \in A)| \le \sum_{i=1}^n P(X_i \ne Z_i).
\]
Fig. 21.1. Plots of $X_i$ and $Z_i$ as functions of $U_i$.
As is easily seen from Figure 21.1,
\[
P(X_i \ne Z_i) = \left[ \varphi_i(0) - (1 - p_i) \right] + \left[ 1 - \varphi_i(1) \right]
= \left[ e^{-p_i} - (1 - p_i) \right] + 1 - e^{-p_i}(1 + p_i)
= p_i \left( 1 - e^{-p_i} \right) \le p_i^2,
\]
where we have used the estimate in Eq. (21.11) for the last inequality.
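The coupling just constructed is easy to simulate. The following Python sketch (illustrative code; the helper names are ours, not from the notes) builds $Z_i$ and $X_i$ from the same uniform and checks the mismatch probability $p_i\left(1 - e^{-p_i}\right) \le p_i^2$:

```python
import math
import random

def phi(p, k):
    # phi(p, k) = P(Poi(p) <= k); by convention phi(p, -1) = 0
    return sum(math.exp(-p) * p ** j / math.factorial(j) for j in range(k + 1))

def coupled_pair(p, u):
    # Z = Bernoulli(p) and X = Poi(p), both read off from the same uniform u
    z = 1 if u > 1 - p else 0
    x = 0
    while u > phi(p, x):
        x += 1
    return x, z

random.seed(0)
p, trials = 0.1, 100_000
mismatch = 0
for _ in range(trials):
    x, z = coupled_pair(p, random.random())
    mismatch += (x != z)
exact = p * (1 - math.exp(-p))  # P(X_i != Z_i), as computed above
print(mismatch / trials, exact, p ** 2)
```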
21.4 Weak Convergence
Recall that to each right continuous increasing function, $F : \mathbb{R} \to \mathbb{R}$, there is a unique measure, $\mu_F$, on $\mathcal{B}_{\mathbb{R}}$ such that $\mu_F((a, b]) = F(b) - F(a)$ for all $-\infty < a \le b < \infty$. To simplify notation in this section we will now write $F(A)$ for $\mu_F(A)$ for all $A \in \mathcal{B}_{\mathbb{R}}$ and in particular $F((a, b]) := F(b) - F(a)$ for all $-\infty < a \le b < \infty$.
Example 21.13. Suppose that $P\left(X_n = \frac{i}{n}\right) = \frac{1}{n}$ for $i \in \{1, 2, \dots, n\}$, so that $X_n$ is a discrete approximation to the uniform distribution, i.e. to $U$ where $P(U \in A) = m(A \cap [0, 1])$ for all $A \in \mathcal{B}_{\mathbb{R}}$. If we let $A_n = \left\{ \frac{i}{n} : i = 1, 2, \dots, n \right\}$, then $P(X_n \in A_n) = 1$ while $P(U \in A_n) = 0$. Therefore, it follows that $d_{TV}(X_n, U) = 1$ for all $n$.³

³ More generally, if $\mu$ and $\nu$ are two probability measures on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ such that $\mu(\{x\}) = 0$ for all $x \in \mathbb{R}$ while $\nu$ concentrates on a countable set, then $d_{TV}(\mu, \nu) = 1$.
Nevertheless we would like $X_n$ to be close to $U$ in distribution. Let us observe that if we let $F_n(y) := P(X_n \le y)$ and $F(y) := P(U \le y)$, then
\[
F_n(y) = P(X_n \le y) = \frac{1}{n} \#\left\{ i \in \{1, 2, \dots, n\} : \frac{i}{n} \le y \right\}
\]
and
\[
F(y) := P(U \le y) = (y \wedge 1) \vee 0.
\]
From these formulas, it easily follows that $F(y) = \lim_{n \to \infty} F_n(y)$ for all $y \in \mathbb{R}$, see Figure 21.2.

Fig. 21.2. The plot of $F_5$ in blue and $F$ in black on $[0, 1]$.

This suggests that we should say that $X_n$ converges in distribution to $X$ iff $P(X_n \le y) \to P(X \le y)$ for all $y \in \mathbb{R}$. However, the next simple example shows this definition is also too restrictive.
Example 21.14. Suppose that $P(X_n = 1/n) = 1$ for all $n$ and $P(X_0 = 0) = 1$. Then it is reasonable to insist that $X_n$ converges to $X_0$ in distribution. However, $F_n(y) = 1_{y \ge 1/n} \to 1_{y \ge 0} = F_0(y)$ for all $y \in \mathbb{R}$ except for $y = 0$. Observe that $y = 0$ is the only point of discontinuity of $F_0$.
Notation 21.15 Let $(X, d)$ be a metric space and $f : X \to \mathbb{R}$ be a function. The set of $x \in X$ where $f$ is continuous (respectively discontinuous) at $x$ will be denoted by $C(f)$ (respectively $D(f)$).
Remark 21.16. If $F : \mathbb{R} \to [0, 1]$ is a non-decreasing function, then $D(F)$ is at most countable. To see this, suppose that $\varepsilon > 0$ is given and let $C_\varepsilon := \{ y \in \mathbb{R} : F(y+) - F(y-) \ge \varepsilon \}$. If $y < y'$ with $y, y' \in C_\varepsilon$, then $F(y+) \le F(y'-)$ and $(F(y-), F(y+))$ and $(F(y'-), F(y'+))$ are disjoint intervals of length greater than $\varepsilon$. Hence it follows that
\[
1 = m([0, 1]) \ge \sum_{y \in C_\varepsilon} m\left( (F(y-), F(y+)) \right) \ge \varepsilon \cdot \#(C_\varepsilon)
\]
and hence that $\#(C_\varepsilon) \le \varepsilon^{-1} < \infty$. Therefore $D(F) = \cup_{k=1}^\infty C_{1/k}$ is at most countable.
Definition 21.17. Let $\{F, F_n : n = 1, 2, \dots\}$ be a collection of right continuous non-decreasing functions from $\mathbb{R}$ to $[0, 1]$. Then
1. $F_n$ converges to $F$ vaguely, and we write $F_n \overset{v}{\to} F$, iff $F_n((a, b]) \to F((a, b])$ for all $a, b \in C(F)$.
2. $F_n$ converges to $F$ weakly, and we write $F_n \overset{w}{\to} F$, iff $F_n(x) \to F(x)$ for all $x \in C(F)$.
3. We say $F$ is proper if $F$ is a distribution function of a probability measure, i.e. if $F(\infty) = 1$ and $F(-\infty) = 0$.
Example 21.18. If $X_n$ and $U$ are as in Example 21.13 and $F_n(y) := P(X_n \le y)$ and $F(y) := P(U \le y)$, then $F_n \overset{v}{\to} F$ and $F_n \overset{w}{\to} F$.
Lemma 21.19. Let $\{F, F_n : n = 1, 2, \dots\}$ be a collection of proper distribution functions. Then $F_n \overset{v}{\to} F$ iff $F_n \overset{w}{\to} F$. In the case where $F_n$ and $F$ are proper and $F_n \overset{w}{\to} F$, we will write $F_n \Rightarrow F$.
Proof. If $F_n \overset{w}{\to} F$, then $F_n((a, b]) = F_n(b) - F_n(a) \to F(b) - F(a) = F((a, b])$ for all $a, b \in C(F)$ and therefore $F_n \overset{v}{\to} F$. So now suppose $F_n \overset{v}{\to} F$ and let $a < x$ with $a, x \in C(F)$. Then
\[
F(x) = F(a) + \lim_{n \to \infty} [F_n(x) - F_n(a)] \le F(a) + \liminf_{n \to \infty} F_n(x).
\]
Letting $a \downarrow -\infty$, using the fact that $F$ is proper, implies
\[
F(x) \le \liminf_{n \to \infty} F_n(x).
\]
Likewise, for $a > x$ with $a, x \in C(F)$,
\[
F(x) - F(a) = \lim_{n \to \infty} [F_n(x) - F_n(a)] \ge \limsup_{n \to \infty} [F_n(x) - 1] = \limsup_{n \to \infty} F_n(x) - 1,
\]
which upon letting $a \uparrow \infty$ (so that $F(a) \uparrow 1$) allows us to conclude,
\[
F(x) \ge \limsup_{n \to \infty} F_n(x).
\]
Definition 21.20. A sequence of random variables, $\{X_n\}_{n=1}^\infty$, is said to converge weakly or to converge in distribution to a random variable $X$ (written $X_n \Rightarrow X$) iff $F_n(y) := P(X_n \le y) \Rightarrow F(y) := P(X \le y)$.
Example 21.21 (Central Limit Theorem). The central limit theorem (see Theorem 7.62, Corollary 10.39, and Theorem 22.26) states: if $\{X_n\}_{n=1}^\infty$ are i.i.d. $L^2(P)$ random variables with $\mu := E X_1$ and $\sigma^2 = \mathrm{Var}(X_1)$, then
\[
\frac{S_n - n\mu}{\sqrt{n}} \Rightarrow N(0, \sigma^2) \overset{d}{=} \sigma\, N(0, 1).
\]
Written out explicitly we find
\[
\lim_{n \to \infty} P\left( a < \frac{S_n - n\mu}{\sigma \sqrt{n}} \le b \right) = P(a < N(0, 1) \le b)
= \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2} x^2}\, dx
\]
or equivalently put
\[
\lim_{n \to \infty} P\left( n\mu + \sigma\sqrt{n}\, a < S_n \le n\mu + \sigma\sqrt{n}\, b \right)
= \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2} x^2}\, dx.
\]
More intuitively, we have
\[
S_n \overset{d}{\cong} n\mu + \sqrt{n}\, N(0, \sigma^2) \overset{d}{=} N\left( n\mu, n\sigma^2 \right).
\]
Example 21.22. Suppose that $P(X_n = n) = 1$ for all $n$; then $F_n(y) = 1_{y \ge n} \to 0 = F(y)$ as $n \to \infty$. Notice that $F$ is not a distribution function because all of the mass went off to $+\infty$. Similarly, if we suppose $P(X_n = \pm n) = \frac{1}{2}$ for all $n$, then $F_n = \frac{1}{2} 1_{[-n, n)} + 1_{[n, \infty)} \to \frac{1}{2} = F(y)$ as $n \to \infty$. Again, $F$ is not a distribution function on $\mathbb{R}$ since half the mass went to $-\infty$ while the other half went to $+\infty$.
Example 21.23. Suppose $X$ is a non-zero random variable such that $X \overset{d}{=} -X$; then $X_n := (-1)^n X \overset{d}{=} X$ for all $n$ and therefore $X_n \Rightarrow X$ as $n \to \infty$. On the other hand, $X_n$ does not converge to $X$ almost surely or in probability.
Lemma 21.24. Suppose $X$ is a random variable, $\{c_n\}_{n=1}^\infty \subset \mathbb{R}$, and $X_n := X + c_n$. If $c := \lim_{n \to \infty} c_n$ exists, then $X_n \Rightarrow X + c$.
Proof. Let $F(x) := P(X \le x)$ and
\[
F_n(x) := P(X_n \le x) = P(X + c_n \le x) = F(x - c_n).
\]
Clearly, if $c_n \to c$ as $n \to \infty$, then for all $x \in C(F(\cdot - c))$ we have $F_n(x) \to F(x - c)$. Since $F(x - c) = P(X + c \le x)$, we see that $X_n \Rightarrow X + c$. Observe that $F_n(x) \to F(x - c)$ only for $x \in C(F(\cdot - c))$, but this is sufficient to assert $X_n \Rightarrow X + c$.
Fig. 21.3. The functions $Y$ and $Y^+$ associated to $F$.
Lemma 21.25. Suppose $\{X_n\}_{n=1}^\infty$ is a sequence of random variables on a common probability space and $c \in \mathbb{R}$. Then $X_n \Rightarrow c$ iff $X_n \overset{P}{\to} c$.
Proof. Recall that $X_n \overset{P}{\to} c$ iff for all $\varepsilon > 0$, $P(|X_n - c| > \varepsilon) \to 0$. Since
\[
\{|X_n - c| > \varepsilon\} = \{X_n > c + \varepsilon\} \cup \{X_n < c - \varepsilon\},
\]
it follows that $X_n \overset{P}{\to} c$ iff $P(X_n > x) \to 0$ for all $x > c$ and $P(X_n < x) \to 0$ for all $x < c$. These conditions are also equivalent to $P(X_n \le x) \to 1$ for all $x > c$ and $P(X_n \le x) \le P(X_n < x') \to 0$ for all $x < c$ (where $x < x' < c$). So $X_n \overset{P}{\to} c$ iff
\[
\lim_{n \to \infty} P(X_n \le x) = \begin{cases} 0 & \text{if } x < c \\ 1 & \text{if } x > c \end{cases} = F(x),
\]
where $F(x) = P(c \le x) = 1_{x \ge c}$. Since $C(F) = \mathbb{R} \setminus \{c\}$, we have shown $X_n \overset{P}{\to} c$ iff $X_n \Rightarrow c$.
Notation 21.26 Given a proper distribution function, $F : \mathbb{R} \to [0, 1]$, let $Y = F^{\leftarrow} : (0, 1) \to \mathbb{R}$ be the function defined by
\[
Y(x) = F^{\leftarrow}(x) = \sup\{ y \in \mathbb{R} : F(y) < x \}.
\]
Similarly, let
\[
Y^+(x) := \inf\{ y \in \mathbb{R} : F(y) > x \}.
\]
We will need the following simple observations about $Y$ and $Y^+$, which are easily understood from Figure 21.3.
1. $Y(x) \le Y^+(x)$ and $Y(x) < Y^+(x)$ iff $x$ is the height of a flat spot of $F$.
2. The set,
\[
E := \left\{ x \in (0, 1) : Y(x) < Y^+(x) \right\}, \tag{21.18}
\]
of flat spot heights is at most countable. This is because $\left\{ (Y(x), Y^+(x)) \right\}_{x \in E}$ is a collection of pairwise disjoint intervals which is necessarily countable. (Each such interval contains a rational number.)
3. The following inequality holds,
\[
F(Y(x)-) \le x \le F(Y(x)) \text{ for all } x \in (0, 1). \tag{21.19}
\]
Indeed, if $y > Y(x)$, then $F(y) \ge x$ and by right continuity of $F$ it follows that $F(Y(x)) \ge x$. Similarly, if $y < Y(x)$, then $F(y) < x$ and hence $F(Y(x)-) \le x$.
4. $\{x \in (0, 1) : Y(x) \le y_0\} = (0, F(y_0)] \cap (0, 1)$. To prove this assertion first suppose that $Y(x) \le y_0$; then according to Eq. (21.19) we have $x \le F(Y(x)) \le F(y_0)$, i.e. $x \in (0, F(y_0)] \cap (0, 1)$. Conversely, if $x \in (0, 1)$ and $x \le F(y_0)$, then $Y(x) \le y_0$ by definition of $Y$.
5. As a consequence of item 4. we see that $Y$ is $\mathcal{B}_{(0,1)}/\mathcal{B}_{\mathbb{R}}$ measurable and $m \circ Y^{-1} = \mu_F$, where $m$ is Lebesgue measure on $\left( (0, 1), \mathcal{B}_{(0,1)} \right)$.
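Item 5. is the basis of inverse transform sampling: if $U$ is uniform on $(0, 1)$, then $Y(U)$ has law $\mu_F$. A quick Python check (illustrative code), using $F(y) = 1 - e^{-y}$, the $\exp(1)$ distribution function, as a test case, for which $Y(x) = -\ln(1 - x)$:

```python
import math
import random

random.seed(1)

def Y(x):
    # F^{<-}(x) = sup{y : F(y) < x} = -ln(1 - x) for F(y) = 1 - e^{-y}
    return -math.log(1.0 - x)

samples = [Y(random.random()) for _ in range(100_000)]
mean = sum(samples) / len(samples)
frac_below_1 = sum(1 for s in samples if s <= 1.0) / len(samples)
print(mean, frac_below_1)  # exp(1) has mean 1 and P(Y <= 1) = 1 - e^{-1}
```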
Theorem 21.27 (Baby Skorohod Theorem). Suppose that $\{F_n\}_{n=0}^\infty$ is a collection of distribution functions such that $F_n \Rightarrow F_0$. Then there exists a probability space, $(\Omega, \mathcal{B}, P)$, and random variables, $Y$ and $\{Y_n\}_{n=1}^\infty$, such that $P(Y \le y) = F_0(y)$, $P(Y_n \le y) = F_n(y)$ for all $n \in \mathbb{N}$, and $\lim_{n \to \infty} Y_n = Y$ a.s.
Proof. We will take $\Omega := (0, 1)$, $\mathcal{B} = \mathcal{B}_{(0,1)}$, and $P = m$, Lebesgue measure on $\Omega$, and let $Y_n := F_n^{\leftarrow}$ and $Y := F_0^{\leftarrow}$ as in Notation 21.26. Because of the above comments, $P(Y_n \le y) = F_n(y)$ and $P(Y \le y) = F_0(y)$ for all $y \in \mathbb{R}$. So in order to finish the proof it suffices to show, $Y_n(x) \to Y(x)$ for all $x \notin E$, where $E$ is the countable null set defined as in Eq. (21.18).
We now suppose $x \notin E$. If $y \in C(F_0)$ with $y < Y(x)$, we have $\lim_{n \to \infty} F_n(y) = F_0(y) < x$ and in particular, $F_n(y) < x$ for almost all $n$. This implies that $Y_n(x) \ge y$ for a.a. $n$ and hence that $\liminf_{n \to \infty} Y_n(x) \ge y$. Letting $y \uparrow Y(x)$ with $y \in C(F_0)$ then implies
\[
\liminf_{n \to \infty} Y_n(x) \ge Y(x).
\]
Similarly, for $x \notin E$ and $y \in C(F_0)$ with $Y(x) = Y^+(x) < y$, we have $\lim_{n \to \infty} F_n(y) = F_0(y) > x$ and in particular, $F_n(y) > x$ for almost all $n$. This implies that $Y_n(x) \le y$ for a.a. $n$ and hence that $\limsup_{n \to \infty} Y_n(x) \le y$. Letting $y \downarrow Y(x)$ with $y \in C(F_0)$ then implies
\[
\limsup_{n \to \infty} Y_n(x) \le Y(x).
\]
Hence we have shown, for $x \notin E$, that
\[
\limsup_{n \to \infty} Y_n(x) \le Y(x) \le \liminf_{n \to \infty} Y_n(x),
\]
which shows
\[
\lim_{n \to \infty} F_n^{\leftarrow}(x) = \lim_{n \to \infty} Y_n(x) = Y(x) = F_0^{\leftarrow}(x) \text{ for all } x \notin E. \tag{21.20}
\]
In preparation for the full version of Skorohod's Theorem 21.58 it will be useful to record a special case of Theorem 21.27 which has both a stronger hypothesis and a stronger conclusion.
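The construction in Theorem 21.27 can be made concrete in Example 21.13: there $Y_n(x) = F_n^{\leftarrow}(x) = \lceil nx \rceil / n$ and $Y(x) = x$, so $Y_n \to Y$ pointwise on $(0, 1)$ with rate $1/n$. A short illustrative Python check:

```python
import math

def Y_n(x, n):
    # F_n^{<-}(x) = sup{y : F_n(y) < x} = ceil(n x)/n for the law of Example 21.13
    return max(1, math.ceil(n * x)) / n

# Y(x) = x is the quantile function of the uniform law, so |Y_n(x) - Y(x)| <= 1/n
errs = [abs(Y_n(x, 1000) - x) for x in (0.1, 1 / 3, 0.5, 2 / 3, 0.9)]
print(max(errs))
```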
Theorem 21.28 (Prenatal Skorohod Theorem). Suppose $S = \{1, 2, \dots, m\} \subset \mathbb{R}$ and $\{\mu_n\}_{n=1}^\infty$ is a sequence of probabilities on $S$ such that $\mu_n \Rightarrow \mu$ for some probability $\mu$ on $S$. Let $P := \mu \otimes m$ on $\Omega := S \times (0, 1]$ and $Y(i, \theta) = i$ for all $(i, \theta) \in \Omega$. Then there exists $Y_n : \Omega \to S$ such that $\mathrm{Law}_P(Y_n) = \mu_n$ for all $n$ and $Y_n(i, \theta) = i$ if $\theta \le \mu_n(i)/\mu(i)$, where we take $0/0 = 1$ in this expression. In particular, $\lim_{n \to \infty} Y_n(i, \theta) = Y(i, \theta)$ a.s.
Proof. The main point is to show that for any probability measure, $\nu$, on $S$ there exists $Y_\nu : \Omega \to S$ such that $Y_\nu(i, \theta) = i$ when $\theta \le \nu(i)/\mu(i)$ and $\mathrm{Law}_P(Y_\nu) = \nu$. If we can do this then we need only take $Y_n := Y_{\mu_n}$ for all $n$ to complete the proof.
In the proof to follow we will use the simple observation that for any $a \in (0, 1)$ and $\lambda_i \ge 0$ with $\sum_{i=1}^m \lambda_i = 1$, there exists a partition, $\{J_i\}_{i=1}^m$, of $(a, 1]$ such that $m(J_i) = \lambda_i\, m((a, 1]) = \lambda_i (1 - a)$; simply take $J_i = (a_{i-1}, a_i]$ where $a_0 = a$ and $a_i = a + (1 - a) \sum_{j \le i} \lambda_j$ for $1 \le i \le m$.
Let $\nu$ be any probability on $S$ and let
\[
A_i := \{i\} \times \left( 0, \frac{\nu(i)}{\mu(i)} \wedge 1 \right]
\]
and
\[
C := \left( \cup_{i=1}^m A_i \right)^c = \cup_{i=1}^m \{i\} \times \left( \frac{\nu(i)}{\mu(i)} \wedge 1,\, 1 \right]
\]
and observe that
\[
P(A_i) = \mu(i) \left( \frac{\nu(i)}{\mu(i)} \wedge 1 \right) = \nu(i) \wedge \mu(i).
\]
Using the observation in the previous paragraph we may write $\{k\} \times \left( \frac{\nu(k)}{\mu(k)} \wedge 1,\, 1 \right] = \cup_{i=1}^m C_{k,i}$ with
\[
P(C_{k,i}) = \lambda_i\, P\left( \{k\} \times \left( \frac{\nu(k)}{\mu(k)} \wedge 1,\, 1 \right] \right).
\]
The sets $C_i := \cup_{k=1}^m C_{k,i}$ then form a partition of $C$ such that $P(C_i) = \lambda_i P(C)$ for all $i$.
We now define
\[
Y_\nu := \sum_{i=1}^m i\, 1_{A_i \cup C_i},
\]
so that $Y_\nu = i$ on $A_i \cup C_i$ and in particular $Y_\nu(i, \theta) = i$ when $\theta \le \nu(i)/\mu(i)$.
To finish the proof we need only choose the $\{\lambda_i\}_{i=1}^m$ so that $P(Y_\nu = i) = \nu(i)$ for all $i$, i.e. we must require,
\[
\nu(i) = P(Y_\nu = i) = P(A_i \cup C_i) = P(A_i) + \lambda_i P(C) = \nu(i) \wedge \mu(i) + \lambda_i P(C), \tag{21.21}
\]
and therefore we must define
\[
\lambda_i = \left( \nu(i) - \nu(i) \wedge \mu(i) \right) / P(C) \ge 0.
\]
To see this is an admissible choice (i.e. $\sum_{i=1}^m \lambda_i = 1$) notice that
\[
P(C) = \sum_i \left[ \mu(i) - \nu(i) \wedge \mu(i) \right]
= \sum_{\nu(i) < \mu(i)} \left( \mu(i) - \nu(i) \right)
= \sum_{\nu(i) \ge \mu(i)} \left( \nu(i) - \mu(i) \right), \tag{21.22}
\]
wherein we have used the fact that $\sum_{i \in S} \left( \mu(i) - \nu(i) \right) = 1 - 1 = 0$. Making use of these identities we find,
\[
\sum_{i \in S} \lambda_i = \frac{1}{P(C)} \sum_{\nu(i) \ge \mu(i)} \left( \nu(i) - \mu(i) \right) = 1.
\]
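The proof is constructive and can be implemented directly. A Python sketch (the function names are ours, and we assume $\nu \ne \mu$ so that $P(C) > 0$) which builds $Y_\nu$ on $\Omega = S \times (0, 1]$ and verifies by simulation that $\mathrm{Law}_P(Y_\nu) = \nu$ and $P(Y_\nu \ne Y) = P(C)$:

```python
import random

def make_Y_nu(mu, nu):
    # builds Y_nu as in the proof; assumes nu != mu so that P(C) > 0
    S = sorted(mu)
    PC = sum(max(mu[j] - nu[j], 0.0) for j in S)        # P(C), cf. Eq. (21.22)
    lam = {j: max(nu[j] - mu[j], 0.0) / PC for j in S}  # the lambda_j of the proof
    def Y_nu(i, theta):
        r = min(nu[i] / mu[i], 1.0)
        if theta <= r:            # (i, theta) in A_i: keep the value i
            return i
        a = r                     # otherwise split {i} x (r, 1] into pieces
        for j in S:               # of relative lengths lambda_j
            a += (1.0 - r) * lam[j]
            if theta <= a:
                return j
        return S[-1]              # guard against floating point round-off
    return Y_nu

random.seed(2)
mu = {1: 0.5, 2: 0.3, 3: 0.2}
nu = {1: 0.2, 2: 0.3, 3: 0.5}
Y_nu, trials = make_Y_nu(mu, nu), 100_000
counts, mismatch = {j: 0 for j in mu}, 0
for _ in range(trials):
    i = random.choices(list(mu), weights=list(mu.values()))[0]  # i ~ mu
    j = Y_nu(i, random.random())                                # theta ~ m on (0,1]
    counts[j] += 1
    mismatch += (j != i)
print({j: counts[j] / trials for j in counts}, mismatch / trials)
```

Here $P(C) = \sum_i (\mu(i) - \nu(i))_+ = 0.3$, which the simulated mismatch frequency should reproduce.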
The next theorem summarizes a number of useful equivalent characterizations of weak convergence. (The reader should compare Theorem 21.29 with Corollary 21.9.) In this theorem we will write $BC(\mathbb{R})$ for the bounded continuous functions, $f : \mathbb{R} \to \mathbb{R}$ (or $f : \mathbb{R} \to \mathbb{C}$), and $C_c(\mathbb{R})$ for those $f \in C(\mathbb{R})$ which have compact support, i.e. $f(x) \equiv 0$ if $|x|$ is sufficiently large.
Theorem 21.29. Suppose that $\{\mu_n\}_{n=0}^\infty$ is a sequence of probability measures on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ and for each $n$, let $F_n(y) := \mu_n((-\infty, y])$ be the (proper) distribution function associated to $\mu_n$. Then the following are equivalent.
1. For all $f \in BC(\mathbb{R})$,
\[
\int_{\mathbb{R}} f\, d\mu_n \to \int_{\mathbb{R}} f\, d\mu_0 \text{ as } n \to \infty. \tag{21.23}
\]
2. Eq. (21.23) holds for all $f \in BC(\mathbb{R})$ which are uniformly continuous.
3. Eq. (21.23) holds for all $f \in C_c(\mathbb{R})$.
4. $F_n \Rightarrow F_0$.
5. There exists a probability space $(\Omega, \mathcal{B}, P)$ and random variables, $Y_n$, on this space such that $P \circ Y_n^{-1} = \mu_n$ for all $n$ and $Y_n \to Y_0$ a.s.
Proof. Clearly 1. $\implies$ 2. $\implies$ 3. and 5. $\implies$ 1. by the dominated convergence theorem. Indeed, we have
\[
\int_{\mathbb{R}} f\, d\mu_n = E[f(Y_n)] \overset{\text{D.C.T.}}{\longrightarrow} E[f(Y_0)] = \int_{\mathbb{R}} f\, d\mu_0
\]
for all $f \in BC(\mathbb{R})$. Therefore it suffices to prove 3. $\implies$ 4. and 4. $\implies$ 5. The proof of 4. $\implies$ 5. is the content of Skorohod's Theorem 21.27. Given Skorohod's theorem, we will now complete the proof.
(3. $\implies$ 4.) Let $-\infty < a < b < \infty$ with $a, b \in C(F_0)$ and for $\varepsilon > 0$, let $f_\varepsilon \ge 1_{(a,b]}$ and $g_\varepsilon \le 1_{(a,b]}$ be the functions in $C_c(\mathbb{R})$ pictured in Figure 21.4. Then
\[
\limsup_{n \to \infty} \mu_n((a, b]) \le \limsup_{n \to \infty} \int_{\mathbb{R}} f_\varepsilon\, d\mu_n = \int_{\mathbb{R}} f_\varepsilon\, d\mu_0 \tag{21.24}
\]
and
\[
\liminf_{n \to \infty} \mu_n((a, b]) \ge \liminf_{n \to \infty} \int_{\mathbb{R}} g_\varepsilon\, d\mu_n = \int_{\mathbb{R}} g_\varepsilon\, d\mu_0. \tag{21.25}
\]
Since $f_\varepsilon \downarrow 1_{[a,b]}$ and $g_\varepsilon \uparrow 1_{(a,b)}$ as $\varepsilon \downarrow 0$, we may use the dominated convergence theorem to pass to the limit as $\varepsilon \downarrow 0$ in Eqs. (21.24) and (21.25) to conclude,
\[
\limsup_{n \to \infty} \mu_n((a, b]) \le \mu_0([a, b]) = \mu_0((a, b])
\]
and
\[
\liminf_{n \to \infty} \mu_n((a, b]) \ge \mu_0((a, b)) = \mu_0((a, b]),
\]
where the second equality in each of the equations holds because $a$ and $b$ are points of continuity of $F_0$. Hence we have shown that $\lim_{n \to \infty} \mu_n((a, b])$ exists and is equal to $\mu_0((a, b])$.
Fig. 21.4. The picture definition of the trapezoidal functions, $f_\varepsilon$ and $g_\varepsilon$.
Example 21.30. Suppose that $\{\mu_n\}_{n=1}^\infty$ and $\mu$ are measures on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ such that $\lim_{n \to \infty} d_{TV}(\mu_n, \mu) = 0$; then $\mu_n \Rightarrow \mu$. To prove this simply observe that for $f \in BC(\mathbb{R})$ we have by Corollary 21.9 that
\[
|\mu(f) - \mu_n(f)| \le 2 \|f\|_u\, d_{TV}(\mu_n, \mu) \to 0 \text{ as } n \to \infty.
\]
Corollary 21.31. Suppose that $\{X_n\}_{n=0}^\infty$ is a sequence of random variables such that $X_n \overset{P}{\to} X_0$; then $X_n \Rightarrow X_0$. (Recall that Example 21.23 shows the converse is in general false.)
Proof. Let $g \in BC(\mathbb{R})$; then by Corollary 12.12, $g(X_n) \overset{P}{\to} g(X_0)$ and, since $g$ is bounded, we may apply the dominated convergence theorem (see Corollary 12.9) to conclude that $E[g(X_n)] \to E[g(X_0)]$.
We end this section with a few more equivalent characterizations of weak convergence. The combination of Theorems 21.29 and 21.32 is often called the Portmanteau⁴ Theorem. A review of the notions of closure, interior, and boundary of a set $A$, which are used in the next theorem, may be found in Subsection 21.9.1 below.
Theorem 21.32 (The Baby Portmanteau Theorem). Suppose $\{F_n\}_{n=0}^\infty$ are proper distribution functions. (Recall that we are denoting $\mu_{F_n}(A)$ simply by $F_n(A)$ for all $A \in \mathcal{B}_{\mathbb{R}}$.) Then the following are equivalent.
1. $F_n \Rightarrow F_0$.
2. $\liminf_{n \to \infty} F_n(U) \ge F_0(U)$ for open subsets, $U \subset \mathbb{R}$.
3. $\limsup_{n \to \infty} F_n(C) \le F_0(C)$ for all closed subsets, $C \subset \mathbb{R}$.
4. $\lim_{n \to \infty} F_n(A) = F_0(A)$ for all $A \in \mathcal{B}_{\mathbb{R}}$ such that $F_0(\mathrm{bd}(A)) = 0$.
Proof. (1. $\implies$ 2.) By Skorohod's Theorem 21.27 we may choose random variables, $Y_n$, such that $P(Y_n \le y) = F_n(y)$ for all $y \in \mathbb{R}$ and $n \in \mathbb{N}$, and $Y_n \to Y_0$ a.s. as $n \to \infty$. Since $U$ is open, it follows that
\[
1_U(Y_0) \le \liminf_{n \to \infty} 1_U(Y_n) \text{ a.s.}
\]
and so by Fatou's lemma,
\[
F_0(U) = P(Y_0 \in U) = E[1_U(Y_0)]
\le \liminf_{n \to \infty} E[1_U(Y_n)] = \liminf_{n \to \infty} P(Y_n \in U) = \liminf_{n \to \infty} F_n(U).
\]
(2. $\iff$ 3.) This follows from the observations: 1) $C \subset \mathbb{R}$ is closed iff $U := C^c$ is open, 2) $F(U) = 1 - F(C)$, and 3) $\liminf_{n \to \infty} (-F_n(C)) = -\limsup_{n \to \infty} F_n(C)$.
(2. and 3. $\implies$ 4.) If $F_0(\mathrm{bd}(A)) = 0$, then $A^o \subset A \subset \bar{A}$ with $F_0\left( \bar{A} \setminus A^o \right) = F_0(\mathrm{bd}(A)) = 0$. Therefore
\[
F_0(A) = F_0(A^o) \le \liminf_{n \to \infty} F_n(A^o)
\le \limsup_{n \to \infty} F_n\left( \bar{A} \right) \le F_0\left( \bar{A} \right) = F_0(A).
\]
(4. $\implies$ 1.) Let $a, b \in C(F_0)$ and take $A := (a, b]$. Then $F_0(\mathrm{bd}(A)) = F_0(\{a, b\}) = 0$ and therefore, $\lim_{n \to \infty} F_n((a, b]) = F_0((a, b])$, i.e. $F_n \Rightarrow F_0$.
Exercise 21.7. Suppose that $F$ is a continuous proper distribution function. Show:
1. $F : \mathbb{R} \to [0, 1]$ is uniformly continuous.
2. If $\{F_n\}_{n=1}^\infty$ is a sequence of distribution functions converging weakly to $F$, then $F_n$ converges to $F$ uniformly on $\mathbb{R}$, i.e.
\[
\lim_{n \to \infty} \sup_{x \in \mathbb{R}} |F(x) - F_n(x)| = 0.
\]
In particular, it follows that
\[
\sup_{a < b} \left| \mu_F((a, b]) - \mu_{F_n}((a, b]) \right|
= \sup_{a < b} \left| F(b) - F(a) - (F_n(b) - F_n(a)) \right|
\le \sup_b |F(b) - F_n(b)| + \sup_a |F_n(a) - F(a)| \to 0 \text{ as } n \to \infty.
\]
Hints for part 2. Given $\varepsilon > 0$, show that there exists $-\infty = \alpha_0 < \alpha_1 < \cdots < \alpha_n = \infty$ such that $|F(\alpha_{i+1}) - F(\alpha_i)| \le \varepsilon$ for all $i$. Now show, for $x \in [\alpha_i, \alpha_{i+1})$, that
\[
|F(x) - F_n(x)| \le \left( F(\alpha_{i+1}) - F(\alpha_i) \right) + |F(\alpha_i) - F_n(\alpha_i)| + \left( F_n(\alpha_{i+1}) - F_n(\alpha_i) \right).
\]

⁴ Portmanteau: 1) a new word formed by joining two others and combining their meanings, or 2) a large travelling bag made of stiff leather.
Most of the results above generalize to the case where $\mathbb{R}$ is replaced by a complete separable metric space, as described in Section 21.9 below. The definition of weak convergence in this generality is as follows.

Definition 21.33 (Weak convergence). Let $(S, \rho)$ be a metric space. A sequence of probability measures $\{\mu_n\}_{n=1}^\infty$ is said to converge weakly to a probability $\mu$ if $\lim_{n \to \infty} \mu_n(f) = \mu(f)$ for every $f \in BC(S)$.⁵ We will write this convergence as $\mu_n \Rightarrow \mu$ or $\mu_n \overset{w}{\to} \mu$ as $n \to \infty$.
As a warm up to these general results and to the compactness results to come, let us consider in more detail the case where $S = \mathbb{R}^d$.
Proposition 21.34. Suppose that $\{\mu_n\}_{n=1}^\infty$ and $\mu$ are probability measures on $\left( S := \mathbb{R}^d, \mathcal{B} = \mathcal{B}_{\mathbb{R}^d} \right)$ such that $\mu(f) = \lim_{n \to \infty} \mu_n(f)$ for all $f \in C_c^\infty(S)$; then $\lim_{n \to \infty} \mu_n(f) = \mu(f)$ for all $f \in C_c(S)$.
Proof. Let $\varphi \in C_c^\infty(S)$ be such that $0 \le \varphi \le 1_{C_1}$ (where $C_1 := \{x \in S : |x| \le 1\}$) and $\int_S \varphi(z)\, dz = 1$. For $f \in C_c(S)$ and $\varepsilon > 0$, let
\[
f_\varepsilon(x) := \int_S f(x + \varepsilon z)\, \varphi(z)\, dz. \tag{21.26}
\]
It then follows that
\[
M_\varepsilon := \max_x |f(x) - f_\varepsilon(x)|
= \max_x \left| \int_S [f(x) - f(x + \varepsilon z)]\, \varphi(z)\, dz \right|
\le \max_x \int_S |f(x) - f(x + \varepsilon z)|\, \varphi(z)\, dz
\le \max_x \max_{|z| \le 1} |f(x) - f(x + \varepsilon z)|,
\]
where the latter expression goes to zero as $\varepsilon \downarrow 0$ by the uniform continuity of $f$. Thus we have shown that $f_\varepsilon \to f$ uniformly in $x$ as $\varepsilon \downarrow 0$. Making the change of variables $y = x + \varepsilon z$ in Eq. (21.26) shows
\[
f_\varepsilon(x) = \frac{1}{\varepsilon^d} \int_S f(y)\, \varphi\left( \frac{y - x}{\varepsilon} \right) dy,
\]
from which it follows that $f_\varepsilon$ is smooth. Using this information we find,
\[
\limsup_{n \to \infty} |\mu(f) - \mu_n(f)|
\le \limsup_{n \to \infty} \left[ |\mu(f) - \mu(f_\varepsilon)| + |\mu(f_\varepsilon) - \mu_n(f_\varepsilon)| + |\mu_n(f_\varepsilon) - \mu_n(f)| \right]
\le 2 M_\varepsilon \to 0 \text{ as } \varepsilon \downarrow 0.
\]

⁵ This is actually weak-* convergence when viewing $\mu_n \in BC(S)^*$.
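The uniform convergence $f_\varepsilon \to f$ can be illustrated numerically in one dimension. The sketch below uses the Epanechnikov kernel as mollifier, an assumption made only for illustration (it is continuous but not $C_c^\infty$; the sup-norm estimate above only uses $0 \le \varphi \le 1_{C_1}$ and $\int \varphi = 1$):

```python
def phi(z):
    # Epanechnikov kernel: supported in [-1, 1] and integrates to 1
    return 0.75 * (1.0 - z * z) if abs(z) < 1.0 else 0.0

def f(x):
    # a C_c test function: 1-Lipschitz triangular bump supported on [-1, 1]
    return max(0.0, 1.0 - abs(x))

def f_eps(x, eps, m=500):
    # f_eps(x) = int f(x + eps z) phi(z) dz, via a midpoint rule on [-1, 1]
    h = 2.0 / m
    return sum(f(x + eps * (-1.0 + (k + 0.5) * h)) * phi(-1.0 + (k + 0.5) * h) * h
               for k in range(m))

grid = [i / 50.0 - 2.0 for i in range(201)]               # grid on [-2, 2]
sup_err = max(abs(f(x) - f_eps(x, 0.05)) for x in grid)   # <= max_{|z|<=1}|f(x)-f(x+eps z)|
print(sup_err)
```

Shrinking $\varepsilon$ shrinks the sup-norm error, exactly as the estimate for $M_\varepsilon$ predicts.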
Theorem 21.35. Suppose that $\{\mu_n\}_{n=1}^\infty$ and $\mu$ are probability measures on $\left( S := \mathbb{R}^d, \mathcal{B} = \mathcal{B}_{\mathbb{R}^d} \right)$ (or some other locally compact Hausdorff space) such that $\mu(f) = \lim_{n \to \infty} \mu_n(f)$ for all $f \in C_c(S)$. Then:
1. For all $\varepsilon > 0$ there exists a compact set $K_\varepsilon \subset S$ such that $\mu(K_\varepsilon) \ge 1 - \varepsilon$ and $\mu_n(K_\varepsilon) \ge 1 - \varepsilon$ for all $n \in \mathbb{N}$.
2. If $f \in BC(S)$, then $\lim_{n \to \infty} \mu_n(f) = \mu(f)$.
Proof. For all $R > 0$ let $C_R := \{x \in S : |x| \le R\}$ and then choose $\varphi_R \in C_c(S)$ such that $\varphi_R = 1$ on $C_{R/2}$ and $0 \le \varphi_R \le 1_{C_R}$.
1. With this notation it follows that
\[
\mu_n(C_R) \ge \mu_n(\varphi_R) \to \mu(\varphi_R) \ge \mu\left( C_{R/2} \right).
\]
Choose $R$ so large that $\mu\left( C_{R/2} \right) \ge 1 - \varepsilon/2$. Then there is an $N_\varepsilon$ such that $\mu_n(C_R) \ge 1 - \varepsilon$ for all $n \ge N_\varepsilon$. By increasing $R$ more if necessary we may also assume that $\mu_n(C_R) \ge 1 - \varepsilon$ for all $n < N_\varepsilon$. Taking $K_\varepsilon := C_R$ for this $R$ completes the proof of item 1.
2. Let $f \in BC(S)$ and for $R > 0$ let $f_R := \varphi_R f \in C_c(S)$. Then
\[
\limsup_{n \to \infty} |\mu(f) - \mu_n(f)|
\le \limsup_{n \to \infty} \left[ |\mu(f) - \mu(f_R)| + |\mu(f_R) - \mu_n(f_R)| + |\mu_n(f_R) - \mu_n(f)| \right]
= |\mu(f) - \mu(f_R)| + \limsup_{n \to \infty} |\mu_n(f_R) - \mu_n(f)|. \tag{21.27}
\]
By the dominated convergence theorem, $\lim_{R \to \infty} |\mu(f) - \mu(f_R)| = 0$. For the second term, if $M = \max_{x \in S} |f(x)|$ we will have
\[
\sup_n |\mu_n(f_R) - \mu_n(f)| \le \sup_n \mu_n(|f_R - f|)
\le M \sup_n \mu_n(\varphi_R \ne 1) \le M \sup_n \mu_n\left( S \setminus C_{R/2} \right).
\]
However, by item 1. it follows that $\lim_{R \to \infty} \sup_n \mu_n\left( S \setminus C_{R/2} \right) = 0$. Therefore letting $R \to \infty$ in Eq. (21.27) shows that $\limsup_{n \to \infty} |\mu(f) - \mu_n(f)| = 0$.
21.5 Derived Weak Convergence
Lemma 21.36. Let $(X, d)$ be a metric space, $f : X \to \mathbb{R}$ be a function, and $D(f)$ be the set of $x \in X$ where $f$ is discontinuous. Then $D(f)$ is a Borel measurable subset of $X$.
Proof. For $x \in X$ and $\delta > 0$, let $B_x(\delta) = \{y \in X : d(x, y) < \delta\}$. Given $\delta > 0$, let $f^\delta : X \to \bar{\mathbb{R}}$ be defined by,
\[
f^\delta(x) := \sup_{y \in B_x(\delta)} f(y).
\]
We will begin by showing $f^\delta$ is lower semi-continuous, i.e. $\left\{ f^\delta \le a \right\}$ is closed (or equivalently $\left\{ f^\delta > a \right\}$ is open) for all $a \in \mathbb{R}$. Indeed, if $f^\delta(x) > a$, then there exists $y \in B_x(\delta)$ such that $f(y) > a$. Since this $y$ is in $B_{x'}(\delta)$ whenever $d(x, x') < \delta - d(x, y)$ (because then, $d(x', y) \le d(x, y) + d(x, x') < \delta$), it follows that $f^\delta(x') > a$ for all $x' \in B_x(\delta - d(x, y))$. This shows $\left\{ f^\delta > a \right\}$ is open in $X$.
We similarly define $f_\delta : X \to \bar{\mathbb{R}}$ by
\[
f_\delta(x) := \inf_{y \in B_x(\delta)} f(y).
\]
Since $f_\delta = -(-f)^\delta$, it follows that
\[
\{f_\delta \ge a\} = \left\{ (-f)^\delta \le -a \right\}
\]
is closed for all $a \in \mathbb{R}$, i.e. $f_\delta$ is upper semi-continuous. Moreover, $f_\delta \le f \le f^\delta$ for all $\delta > 0$, and $f^\delta \downarrow f^0$ and $f_\delta \uparrow f_0$ as $\delta \downarrow 0$, where $f_0 \le f \le f^0$ and $f^0 : X \to \bar{\mathbb{R}}$ and $f_0 : X \to \bar{\mathbb{R}}$ are measurable functions. The proof is now complete since it is easy to see that
\[
D(f) = \left\{ f^0 > f_0 \right\} = \left\{ f^0 - f_0 \ne 0 \right\} \in \mathcal{B}_X.
\]
Remark 21.37. Suppose that $x_n \to x$ with $x \in C(f) := D(f)^c$. Then $f(x_n) \to f(x)$ as $n \to \infty$.
Theorem 21.38 (Continuous Mapping Theorem). Let $f : \mathbb{R} \to \mathbb{R}$ be a Borel measurable function. If $X_n \Rightarrow X_0$ and $P(X_0 \in D(f)) = 0$, then $f(X_n) \Rightarrow f(X_0)$. If in addition, $f$ is bounded, then $\lim_{n \to \infty} E f(X_n) = E f(X_0)$. (This result generalizes easily to the case where $f : S \to T$ is a Borel measurable function between metric spaces and $X_n, X_0$ are now $S$-valued random functions.)
Proof. Let $\{Y_n\}_{n=0}^\infty$ be random variables on some probability space as in Theorem 21.27. For $g \in BC(\mathbb{R})$ we observe that $D(g \circ f) \subset D(f)$ and therefore,
\[
P(Y_0 \in D(g \circ f)) \le P(Y_0 \in D(f)) = P(X_0 \in D(f)) = 0.
\]
Hence it follows that $g \circ f \circ Y_n \to g \circ f \circ Y_0$ a.s. So an application of the dominated convergence theorem (see Corollary 12.9) implies
\[
E[g(f(X_n))] = E[g(f(Y_n))] \to E[g(f(Y_0))] = E[g(f(X_0))]. \tag{21.28}
\]
This proves the first assertion. For the second assertion we take $g(x) = (x \wedge M) \vee (-M)$ in Eq. (21.28), where $M$ is a bound on $|f|$.
Theorem 21.39 (Slutzky's Theorem). Suppose that $X_n \Rightarrow X \in \mathbb{R}^m$ and $Y_n \overset{P}{\to} c \in \mathbb{R}^n$, where $c \in \mathbb{R}^n$ is a constant. Assuming all random vectors are on the same probability space, we will have $(X_n, Y_n) \Rightarrow (X, c)$; see Definition 21.33. In particular if $m = n$, by taking $f(x, y) = g(x + y)$ and $f(x, y) = h(x \cdot y)$ with $g \in BC(\mathbb{R}^n)$ and $h \in BC(\mathbb{R})$, we learn $X_n + Y_n \Rightarrow X + c$ and $X_n \cdot Y_n \Rightarrow X \cdot c$ respectively. (The first part of this theorem generalizes to metric spaces as well.)
Proof. According to Theorem 21.35 it suffices to show
\[
\lim_{n \to \infty} E[f(X_n, Y_n)] = E[f(X, c)] \tag{21.29}
\]
for all $f \in BC(\mathbb{R}^{m+n})$ which are uniformly continuous, or even only $f \in C_c(\mathbb{R}^{m+n})$. For a uniformly continuous function we have for every $\varepsilon > 0$ a $\delta := \delta(\varepsilon) > 0$ such that
\[
|f(x, y) - f(x', y')| \le \varepsilon \text{ if } \|(x, y) - (x', y')\| \le \delta.
\]
Then
\[
|E[f(X_n, Y_n) - f(X_n, c)]|
\le E[|f(X_n, Y_n) - f(X_n, c)| : \|Y_n - c\| \le \delta]
+ E[|f(X_n, Y_n) - f(X_n, c)| : \|Y_n - c\| > \delta]
\le \varepsilon + 2M\, P(\|Y_n - c\| > \delta) \to \varepsilon \text{ as } n \to \infty,
\]
where $M = \sup |f|$. Since $X_n \Rightarrow X$, we know $E[f(X_n, c)] \to E[f(X, c)]$ and hence we have shown,
\[
\limsup_{n \to \infty} |E[f(X_n, Y_n) - f(X, c)]|
\le \limsup_{n \to \infty} |E[f(X_n, Y_n) - f(X_n, c)]| + \limsup_{n \to \infty} |E[f(X_n, c) - f(X, c)]| \le \varepsilon.
\]
As $\varepsilon > 0$ was arbitrary this proves Eq. (21.29).
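A Monte Carlo sanity check of Slutzky's theorem (illustrative Python; here $X_n$ is a CLT-standardized sum of uniforms and $Y_n = c + N(0,1)/n \overset{P}{\to} c$, so $X_n + Y_n \Rightarrow N(0,1) + c$):

```python
import math
import random

def Phi(t):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

random.seed(3)
n, trials, c = 200, 10_000, 2.0
vals = []
for _ in range(trials):
    s = sum(random.random() - 0.5 for _ in range(n))
    x_n = s / math.sqrt(n / 12.0)          # => N(0,1) by the CLT
    y_n = c + random.gauss(0.0, 1.0) / n   # -> c in probability
    vals.append(x_n + y_n)
t = 2.5
emp = sum(1 for v in vals if v <= t) / trials
print(emp, Phi(t - c))  # Slutzky: X_n + Y_n => N(0,1) + c
```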
Theorem 21.40 ($\delta$-method). Suppose that $\{X_n\}_{n=1}^\infty$ are random variables, $b \in \mathbb{R}$, $a_n \in \mathbb{R} \setminus \{0\}$ with $\lim_{n \to \infty} a_n = 0$, and
\[
Y_n := \frac{X_n - b}{a_n} \Rightarrow Z.
\]
If $g : \mathbb{R} \to \mathbb{R}$ is a measurable function which is differentiable at $b$, then
\[
\frac{g(X_n) - g(b)}{a_n} \Rightarrow g'(b)\, Z. \tag{21.30}
\]
Put more informally, if $X_n \overset{d}{\cong} b + a_n Z$ then $g(X_n) \overset{d}{\cong} g(b) + g'(b)\, a_n Z$.
Proof. Informally we have $X_n = a_n Y_n + b \overset{d}{\cong} a_n Z + b$ and therefore
\[
\frac{g(X_n) - g(b)}{a_n} \overset{d}{\cong} \frac{g(a_n Z + b) - g(b)}{a_n Z} \cdot Z \to g'(b)\, Z \text{ as } n \to \infty.
\]
We now make the proof rigorous. By Skorohod's Theorem 21.27 we may assume that $\{Y_n\}_{n=1}^\infty$ and $Z$ are on the same probability space, that $Y_n \to Z$ a.s., and we may take $X_n := a_n Y_n + b$. By the definition of the derivative of $g$ at $b$, we have
\[
g(b + \Delta) - g(b) = g'(b)\, \Delta + \varepsilon(\Delta)\, \Delta,
\]
where $\varepsilon(\Delta) \to 0$ as $\Delta \to 0$. Taking $\Delta = a_n Y_n$ in this equation shows
\[
\frac{g(X_n) - g(b)}{a_n} = \frac{g(a_n Y_n + b) - g(b)}{a_n}
= \frac{g'(b)\, a_n Y_n + \varepsilon(a_n Y_n)\, a_n Y_n}{a_n} \to g'(b)\, Z \text{ a.s.},
\]
which implies Eq. (21.30) because of Corollary 21.31.
Example 21.41. Suppose that $\{U_n\}_{n=1}^\infty$ are i.i.d. random variables which are uniformly distributed on $[0, 1]$ and let $Y_n := \prod_{j=1}^n U_j^{1/n}$. Our goal is to find $a_n$ and $b_n$ such that $\frac{Y_n - b_n}{a_n}$ is weakly convergent to a non-constant random variable. To this end, let
\[
X_n := \ln Y_n = \frac{1}{n} \sum_{j=1}^n \ln U_j.
\]
Since
\[
E[\ln U_1] = \int_0^1 \ln x\, dx = -1 \text{ and } E[\ln U_1]^2 = \int_0^1 \ln^2 x\, dx = 2,
\]
we have $\mathrm{Var}(\ln U_1) = 1$ and so by the central limit theorem,
\[
\sqrt{n}\, [X_n - (-1)] = \frac{\sum_{j=1}^n [\ln U_j + 1]}{\sqrt{n}} \Rightarrow Z \overset{d}{=} N(0, 1).
\]
In other words, $X_n \overset{d}{\cong} -1 + \frac{1}{\sqrt{n}} Z$ and so by the $\delta$-method, if $g'(-1)$ exists, then
\[
g(X_n) \overset{d}{\cong} g(-1) + g'(-1) \frac{1}{\sqrt{n}} Z.
\]
Taking $g(x) = e^x$ then implies $Y_n \overset{d}{\cong} e^{-1} + e^{-1} \frac{1}{\sqrt{n}} Z$ or more precisely,
\[
\sqrt{n} \left( \prod_{j=1}^n U_j^{1/n} - e^{-1} \right) = \sqrt{n} \left( Y_n - e^{-1} \right) \Rightarrow e^{-1} Z \overset{d}{=} N\left( 0, e^{-2} \right).
\]
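This prediction is easy to test by simulation: $\sqrt{n}\left(Y_n - e^{-1}\right)$ should have mean close to $0$ and variance close to $e^{-2} \approx 0.135$. An illustrative Python check:

```python
import math
import random

random.seed(4)
n, trials = 400, 10_000
scaled = []
for _ in range(trials):
    # X_n = (1/n) sum_j ln U_j with U_j uniform on (0,1]; Y_n = exp(X_n)
    x_n = sum(math.log(1.0 - random.random()) for _ in range(n)) / n
    scaled.append(math.sqrt(n) * (math.exp(x_n) - math.exp(-1.0)))
mean = sum(scaled) / trials
var = sum((s - mean) ** 2 for s in scaled) / trials
print(mean, var, math.exp(-2.0))  # sample variance should be close to e^{-2}
```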
Exercise 21.8. Given a function, $f : X \to \mathbb{R}$, and a point $x \in X$, let
\[
\liminf_{y \to x} f(y) := \lim_{\varepsilon \downarrow 0} \inf_{y \in B'_x(\varepsilon)} f(y) \tag{21.31}
\]
and
\[
\limsup_{y \to x} f(y) := \lim_{\varepsilon \downarrow 0} \sup_{y \in B'_x(\varepsilon)} f(y), \tag{21.32}
\]
where
\[
B'_x(\varepsilon) := \{ y \in X : 0 < d(x, y) < \varepsilon \}.
\]
Show $f$ is lower (upper) semi-continuous iff $\liminf_{y \to x} f(y) \ge f(x)$ $\left( \limsup_{y \to x} f(y) \le f(x) \right)$ for all $x \in X$.

Solution to Exercise (21.8). Suppose the condition involving Eq. (21.31) holds, $a \in \mathbb{R}$, and $x \in X$ is such that $f(x) > a$. Since,
\[
\lim_{\varepsilon \downarrow 0} \inf_{y \in B'_x(\varepsilon)} f(y) = \liminf_{y \to x} f(y) \ge f(x) > a,
\]
it follows that $\inf_{y \in B'_x(\varepsilon)} f(y) > a$ for some $\varepsilon > 0$. Hence we may conclude that $B_x(\varepsilon) \subset \{f > a\}$, which shows $\{f > a\}$ is open.
Conversely, suppose now that $\{f > a\}$ is open for all $a \in \mathbb{R}$. Given $x \in X$ and $a < f(x)$, there exists $\varepsilon > 0$ such that $B_x(\varepsilon) \subset \{f > a\}$. Hence it follows that $\liminf_{y \to x} f(y) \ge a$, and then letting $a \uparrow f(x)$ implies $\liminf_{y \to x} f(y) \ge f(x)$.
21.6 Convergence of Types
Given a sequence of random variables $\{X_n\}_{n=1}^\infty$ we often look for centerings $\{b_n\}_{n=1}^\infty \subset \mathbb{R}$ and scalings $\{a_n > 0\}_{n=1}^\infty$ such that there exists a non-constant random variable $Y$ such that
\[
\frac{X_n - b_n}{a_n} \Rightarrow Y. \tag{21.33}
\]
Assuming this can be done, it is reasonable to ask how unique are the centering and scaling parameters and the limiting distribution $Y$. To answer this question let us suppose there exists another collection of centerings $\{\beta_n\}_{n=1}^\infty \subset \mathbb{R}$ and scalings $\{\alpha_n > 0\}_{n=1}^\infty$ along with a non-constant random variable $Z$ such that
\[
\frac{X_n - \beta_n}{\alpha_n} \Rightarrow Z. \tag{21.34}
\]
Working informally we expect that
\[
X_n \overset{d}{\cong} \alpha_n Z + \beta_n,
\]
and putting this expression back into Eq. (21.33) leads us to expect;
\[
\frac{\alpha_n Z + \beta_n - b_n}{a_n} = \frac{\alpha_n}{a_n} Z + \frac{\beta_n - b_n}{a_n} \Rightarrow Y.
\]
It is reasonable to expect that this can only happen if the limits
\[
A = \lim_{n \to \infty} \frac{\alpha_n}{a_n} \in (0, \infty) \text{ and } B := \lim_{n \to \infty} \frac{\beta_n - b_n}{a_n} \tag{21.35}
\]
exist and
\[
Y \overset{d}{=} A Z + B. \tag{21.36}
\]
Notice that $A > 0$ as both $Y$ and $Z$ are assumed to be non-constant. That these results are correct is the content of Theorem 21.45 below.
Let us now explain how to choose the $a_n$ and the $b_n$. Let $F_n(x) := P(X_n \le x)$; then Eq. (21.33) states,
\[
F_n(a_n y + b_n) = P(X_n \le a_n y + b_n) = P\left( \frac{X_n - b_n}{a_n} \le y \right) \to P(Y \le y).
\]
Taking $y = 0$ and $y = 1$ in this equation leads us to expect,
\[
\lim_{n \to \infty} F_n(b_n) = P(Y \le 0) =: \gamma_1 \in (0, 1) \text{ and }
\lim_{n \to \infty} F_n(a_n + b_n) = P(Y \le 1) =: \gamma_2 \in (0, 1).
\]
In fact there is nothing so special about $0$ and $1$ in these equations, for if $Y \overset{d}{=} A Z + B$ we will have $Z = A^{-1}(Y - B)$ and so
\[
P(Y \le 0) = P(A Z + B \le 0) = P(Z \le -B/A) \text{ and }
P(Y \le 1) = P(A Z + B \le 1) = P(Z \le (1 - B)/A).
\]
Definition 21.42. Two random variables, $Y$ and $Z$, are said to be of the same type if there exist constants, $A > 0$ and $B \in \mathbb{R}$, such that Eq. (21.36) holds. Alternatively put, if $U(y) := P(Y \le y)$ and $V(z) := P(Z \le z)$, then $U$ and $V$ should satisfy,
\[
V(z) = P(Z \le z) = P(Y \le A z + B) = U(A z + B) \text{ for all } z \in \mathbb{R}.
\]
Remark 21.43. Suppose that $Y \overset{d}{=} A Z + B$ and $Y$ and $Z$ are square integrable random variables. Then
\[
E Y = A\, E Z + B \text{ and } \mathrm{Var}(Y) = A^2\, \mathrm{Var}(Z),
\]
from which it follows that $A^2 = \mathrm{Var}(Y)/\mathrm{Var}(Z)$ and $B = E Y - A\, E Z$. In particular, given $Y \in L^2(P)$ there is a unique $Z$ of the same type such that $E Z = 0$ and $\mathrm{Var}(Z) = 1$. On these grounds it is often reasonable to try to choose $b_n$ and $a_n > 0$ so that $\tilde{X}_n := a_n^{-1}(X_n - b_n)$ has mean zero and variance one.
We will need the following elementary observation for the proof of Theorem 21.45.

Lemma 21.44. If $Y$ is a non-constant (a.s.) random variable and $U(y) := P(Y \le y)$, then $U^{\leftarrow}(\gamma_1) < U^{\leftarrow}(\gamma_2)$ for all $\gamma_1$ sufficiently close to $0$ and $\gamma_2$ sufficiently close to $1$; see Notation 21.26 for the meaning of $U^{\leftarrow}$.
Proof. Observe that $Y$ is constant iff $U(y) = 1_{y \ge c}$ for some $c \in \mathbb{R}$, i.e. iff $U$ only takes on the values $\{0, 1\}$. So since $Y$ is not constant, there exists $y \in \mathbb{R}$ such that $0 < U(y) < 1$. Hence if $\gamma_2 > U(y)$ then $U^{\leftarrow}(\gamma_2) \ge y$ and if $\gamma_1 < U(y)$ then $U^{\leftarrow}(\gamma_1) \le y$. Moreover, if we suppose that $\gamma_1$ is not the height of a flat spot of $U$, then in fact, $U^{\leftarrow}(\gamma_1) < U^{\leftarrow}(\gamma_2)$. This inequality then remains valid as $\gamma_1$ decreases and $\gamma_2$ increases.
Theorem 21.45 (Convergence of Types). Suppose $\{X_n\}_{n=1}^\infty$ is a sequence of random variables, $a_n, \alpha_n \in (0, \infty)$ and $b_n, \beta_n \in \mathbb{R}$ are constants, and $Y$ and $Z$ are non-constant random variables. Then
1. if both Eq. (21.33) and Eq. (21.34) hold, then the limits in Eq. (21.35) exist, $Y \overset{d}{=} A Z + B$, and in particular $Y$ and $Z$ are of the same type;
2. if the limits in Eq. (21.35) hold, then either of the convergences in Eqs. (21.33) or (21.34) implies the other, with $Z$ and $Y$ related by Eq. (21.36);
3. if there are some constants, $a_n > 0$ and $b_n \in \mathbb{R}$, and a non-constant random variable $Y$, such that Eq. (21.33) holds, then Eq. (21.34) holds using $\alpha_n$ and $\beta_n$ of the form,
\[
\alpha_n := F_n^{\leftarrow}(\gamma_2) - F_n^{\leftarrow}(\gamma_1) \text{ and } \beta_n := F_n^{\leftarrow}(\gamma_1) \tag{21.37}
\]
for some $0 < \gamma_1 < \gamma_2 < 1$. If the $F_n$ are invertible functions, Eq. (21.37) may be written as
\[
F_n(\beta_n) = \gamma_1 \text{ and } F_n(\alpha_n + \beta_n) = \gamma_2. \tag{21.38}
\]
Proof. (2) Assume the limits in Eq. (21.35) hold. If Eq. (21.33) is satisfied, then by Slutzky's Theorem 13.22,
\[
\frac{X_n - \beta_n}{\alpha_n}
= \left[ \frac{X_n - b_n}{a_n} - \frac{\beta_n - b_n}{a_n} \right] \frac{a_n}{\alpha_n}
\Rightarrow A^{-1}(Y - B) =: Z.
\]
Similarly, if Eq. (21.34) is satisfied, then
\[
\frac{X_n - b_n}{a_n} = \frac{X_n - \beta_n}{\alpha_n} \cdot \frac{\alpha_n}{a_n} + \frac{\beta_n - b_n}{a_n} \Rightarrow A Z + B =: Y.
\]
(1) If $F_n(y) := P(X_n \le y)$, then
\[
P\left( \frac{X_n - b_n}{a_n} \le y \right) = F_n(a_n y + b_n) \text{ and }
P\left( \frac{X_n - \beta_n}{\alpha_n} \le y \right) = F_n(\alpha_n y + \beta_n).
\]
By assumption we have
\[
F_n(a_n y + b_n) \Rightarrow U(y) \text{ and } F_n(\alpha_n y + \beta_n) \Rightarrow V(y).
\]
If $w := \sup\{ y : F_n(a_n y + b_n) < x \}$, then $a_n w + b_n = F_n^{\leftarrow}(x)$ and hence
\[
\sup\{ y : F_n(a_n y + b_n) < x \} = \frac{F_n^{\leftarrow}(x) - b_n}{a_n}.
\]
Similarly,
\[
\sup\{ y : F_n(\alpha_n y + \beta_n) < x \} = \frac{F_n^{\leftarrow}(x) - \beta_n}{\alpha_n}.
\]
With these identities, it now follows from the proof of Skorohod's Theorem 21.27 (see Eq. (21.20)) that there exists an at most countable subset, $\Lambda$, of $(0,1)$ such that
\[
\frac{F_n^{\leftarrow}(x) - b_n}{a_n} = \sup\{y : F_n(a_n y + b_n) < x\} \to U^{\leftarrow}(x) \quad \text{and}
\]
\[
\frac{F_n^{\leftarrow}(x) - \beta_n}{\alpha_n} = \sup\{y : F_n(\alpha_n y + \beta_n) < x\} \to V^{\leftarrow}(x)
\]
for all $x \notin \Lambda$. Since $Y$ and $Z$ are not constants a.s., we can choose, by Lemma 21.44, $\gamma_1 < \gamma_2$ not in $\Lambda$ such that $U^{\leftarrow}(\gamma_1) < U^{\leftarrow}(\gamma_2)$ and $V^{\leftarrow}(\gamma_1) < V^{\leftarrow}(\gamma_2)$.
In particular it follows that
\[
\frac{F_n^{\leftarrow}(\gamma_2) - F_n^{\leftarrow}(\gamma_1)}{a_n}
= \frac{F_n^{\leftarrow}(\gamma_2) - b_n}{a_n} - \frac{F_n^{\leftarrow}(\gamma_1) - b_n}{a_n}
\to U^{\leftarrow}(\gamma_2) - U^{\leftarrow}(\gamma_1) > 0 \tag{21.39}
\]
and similarly
\[
\frac{F_n^{\leftarrow}(\gamma_2) - F_n^{\leftarrow}(\gamma_1)}{\alpha_n}
\to V^{\leftarrow}(\gamma_2) - V^{\leftarrow}(\gamma_1) > 0.
\]
Taking ratios of the last two displayed equations shows,
Page: 340 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
\[
\frac{\alpha_n}{a_n} \to A := \frac{U^{\leftarrow}(\gamma_2) - U^{\leftarrow}(\gamma_1)}{V^{\leftarrow}(\gamma_2) - V^{\leftarrow}(\gamma_1)} \in (0, \infty).
\]
Moreover,
\[
\frac{F_n^{\leftarrow}(\gamma_1) - b_n}{a_n} \to U^{\leftarrow}(\gamma_1) \tag{21.40}
\]
and
\[
\frac{F_n^{\leftarrow}(\gamma_1) - \beta_n}{a_n}
= \frac{F_n^{\leftarrow}(\gamma_1) - \beta_n}{\alpha_n} \cdot \frac{\alpha_n}{a_n}
\to A\, V^{\leftarrow}(\gamma_1),
\]
and therefore,
\[
\frac{\beta_n - b_n}{a_n}
= \frac{F_n^{\leftarrow}(\gamma_1) - b_n}{a_n} - \frac{F_n^{\leftarrow}(\gamma_1) - \beta_n}{a_n}
\to U^{\leftarrow}(\gamma_1) - A\, V^{\leftarrow}(\gamma_1) =: B.
\]
(3) Now suppose that we define $\alpha_n := F_n^{\leftarrow}(\gamma_2) - F_n^{\leftarrow}(\gamma_1)$ and $\beta_n := F_n^{\leftarrow}(\gamma_1)$; then according to Eqs. (21.39) and (21.40) we have
\[
\alpha_n / a_n \to U^{\leftarrow}(\gamma_2) - U^{\leftarrow}(\gamma_1) \in (0, \infty)
\quad \text{and} \quad
\frac{\beta_n - b_n}{a_n} \to U^{\leftarrow}(\gamma_1) \text{ as } n \to \infty.
\]
Thus we may always center and scale the $X_n$ using $\alpha_n$ and $\beta_n$ of the form described in Eq. (21.37).
21.7 Weak Convergence Examples
Example 21.46. Suppose that $\{X_n\}_{n=1}^{\infty}$ are i.i.d. $\exp(\lambda)$ random variables, i.e. $X_n \ge 0$ a.s. and $P(X_n \ge x) = e^{-\lambda x}$ for all $x \ge 0$. In this case
\[
F(x) := P(X_1 \le x) = 1 - e^{-\lambda (x \vee 0)} = \left(1 - e^{-\lambda x}\right)_+.
\]
Consider $M_n := \max(X_1, \dots, X_n)$. We have, for $x \ge 0$, that
\[
F_n(x) := P(M_n \le x) = P\left(\cap_{j=1}^{n} \{X_j \le x\}\right)
= \prod_{j=1}^{n} P(X_j \le x) = [F(x)]^n = \left(1 - e^{-\lambda x}\right)^n.
\]
We now wish to find $a_n > 0$ and $b_n \in \mathbb{R}$ such that $\frac{M_n - b_n}{a_n} \implies Y$.
1. To this end we note that
\[
P\left(\frac{M_n - b_n}{a_n} \le x\right) = P(M_n \le a_n x + b_n)
= F_n(a_n x + b_n) = [F(a_n x + b_n)]^n.
\]
If we demand (c.f. Eq. (21.38) above)
\[
P\left(\frac{M_n - b_n}{a_n} \le 0\right) = F_n(b_n) = [F(b_n)]^n \to \gamma_1 \in (0, 1),
\]
then $b_n \to \infty$ and we find
\[
\ln \gamma_1 \sim n \ln F(b_n) = n \ln\left(1 - e^{-\lambda b_n}\right) \sim -n e^{-\lambda b_n}.
\]
From this it follows that $b_n \sim \lambda^{-1} \ln n$. Given this, we now try to find $a_n$ by requiring
\[
P\left(\frac{M_n - b_n}{a_n} \le 1\right) = F_n(a_n + b_n) = [F(a_n + b_n)]^n \to \gamma_2 \in (0, 1).
\]
However, by what we have done above, this requires $a_n + b_n \sim \lambda^{-1} \ln n$. Hence we may as well take $a_n$ to be constant, and for simplicity we take $a_n = 1$.
2. We now compute
\[
\lim_{n \to \infty} P\left(M_n - \lambda^{-1} \ln n \le x\right)
= \lim_{n \to \infty} \left(1 - e^{-\lambda (x + \lambda^{-1} \ln n)}\right)^n
= \lim_{n \to \infty} \left(1 - \frac{e^{-\lambda x}}{n}\right)^n
= \exp\left(-e^{-\lambda x}\right).
\]
The function $F(x) = \exp(-e^{-x})$ is the CDF of a Gumbel distribution, see Figure 21.5. Thus letting $Y$ be a random variable with this distribution (i.e. $P(Y \le x) = \exp(-e^{-x})$) we have shown $\lambda M_n - \ln n \implies Y$, i.e.
\[
\lambda \max(X_1, \dots, X_n) - \ln n \implies Y.
\]
Example 21.47. For $p \in (0, 1)$, let $X_p$ denote the number of trials to get a success in a sequence of independent trials with success probability $p$. Then $P(X_p > n) = (1 - p)^n$ and therefore for $x > 0$,
\[
P(p X_p > x) = P\left(X_p > \frac{x}{p}\right) = (1-p)^{[x/p]}
= e^{[x/p] \ln(1-p)} \sim e^{-p [x/p]} \to e^{-x} \text{ as } p \downarrow 0.
\]
Therefore $p X_p \implies T$ where $T \stackrel{d}{=} \exp(1)$, i.e. $P(T > x) = e^{-x}$ for $x \ge 0$ or, alternatively, $P(T \le y) = 1 - e^{-(y \vee 0)}$.
Fig. 21.5. Here is a plot of the density function for $Y$ when $\lambda = 1$.
Remarks on this example. Let us see in a couple of ways where the appropriate centering and scaling of the $X_p$ come from in this example. For this let $q = 1 - p$; then $P(X_p = n) = (1-p)^{n-1} p = q^{n-1} p$ for $n \in \mathbb{N}$. Also let
\[
F_p(x) = P(X_p \le x) = P(X_p \le [x]) = 1 - q^{[x]}
\]
where $[x] := \sum_{n=1}^{\infty} n\, 1_{[n, n+1)}(x)$.
Method 1. Our goal is to choose $a_p > 0$ and $b_p \in \mathbb{R}$ such that $\lim_{p \downarrow 0} F_p(a_p x + b_p)$ exists. As above, we first demand (taking $x = 0$) that
\[
\lim_{p \downarrow 0} F_p(b_p) = \gamma_1 \in (0, 1).
\]
Since $\gamma_1 \sim F_p(b_p) \sim 1 - q^{b_p}$, we require $q^{b_p} \sim 1 - \gamma_1$ and hence $c \sim b_p \ln q = b_p \ln(1-p) \sim -b_p p$. This suggests that we take $b_p = 1/p$, say. Having done this, we would like to choose $a_p$ such that
\[
F_0(x) := \lim_{p \downarrow 0} F_p(a_p x + b_p) \text{ exists.}
\]
Since $F_0(x) \sim F_p(a_p x + b_p) \sim 1 - q^{a_p x + b_p}$, this requires that
\[
(1-p)^{a_p x + b_p} = q^{a_p x + b_p} \to 1 - F_0(x)
\]
and hence that
\[
\ln(1 - F_0(x)) = \lim_{p \downarrow 0} (a_p x + b_p) \ln q
= \lim_{p \downarrow 0} (a_p x + b_p)(-p) = -\lim_{p \downarrow 0} p\, a_p x - 1.
\]
From this (setting $x = 1$) we see that $p\, a_p \to c > 0$. Hence we might take $a_p = 1/p$ as well. We then have
\[
F_p(a_p x + b_p) = F_p\left(p^{-1} x + p^{-1}\right) = 1 - (1-p)^{[p^{-1}(x+1)]},
\]
which is equal to $0$ if $x \le -1$, and for $x > -1$ we find
\[
(1-p)^{[p^{-1}(x+1)]} = \exp\left([p^{-1}(x+1)] \ln(1-p)\right) \to \exp(-(x+1)).
\]
Hence we have shown
\[
\lim_{p \downarrow 0} F_p(a_p x + b_p) = \left[1 - \exp(-(x+1))\right] 1_{x \ge -1},
\]
i.e.
\[
\frac{X_p - 1/p}{1/p} = p X_p - 1 \implies T - 1,
\]
or again that $p X_p \implies T$.
Method 2. (Center and scale using the first moment and the variance of $X_p$.) The generating function is given by
\[
f(z) := E\left[z^{X_p}\right] = \sum_{n=1}^{\infty} z^n q^{n-1} p = \frac{pz}{1 - qz}.
\]
Observe that $f(z)$ is well defined for $|z| < \frac{1}{q}$ and that $f(1) = 1$, reflecting the fact that $P(X_p \in \mathbb{N}) = 1$, i.e. a success must occur almost surely. Moreover, we have
\[
f'(z) = E\left[X_p z^{X_p - 1}\right], \quad
f''(z) = E\left[X_p (X_p - 1) z^{X_p - 2}\right], \quad \dots,
\]
\[
f^{(k)}(z) = E\left[X_p (X_p - 1) \cdots (X_p - k + 1) z^{X_p - k}\right]
\]
and in particular,
\[
E\left[X_p (X_p - 1) \cdots (X_p - k + 1)\right] = f^{(k)}(1)
= \left(\frac{d}{dz}\right)^k \Big|_{z=1} \frac{pz}{1 - qz}.
\]
Since
\[
\frac{d}{dz} \frac{pz}{1 - qz} = \frac{p(1 - qz) + qpz}{(1 - qz)^2} = \frac{p}{(1 - qz)^2}
\quad \text{and} \quad
\frac{d^2}{dz^2} \frac{pz}{1 - qz} = 2 \frac{pq}{(1 - qz)^3},
\]
it follows that
\[
\mu_p := E X_p = \frac{p}{(1 - q)^2} = \frac{1}{p}
\quad \text{and} \quad
E[X_p (X_p - 1)] = 2 \frac{pq}{(1 - q)^3} = \frac{2q}{p^2}.
\]
Therefore,
\[
\sigma_p^2 = \operatorname{Var}(X_p) = E X_p^2 - (E X_p)^2
= \frac{2q}{p^2} + \frac{1}{p} - \left(\frac{1}{p}\right)^2
= \frac{2q + p - 1}{p^2} = \frac{q}{p^2} = \frac{1 - p}{p^2}.
\]
Thus, if we had used $\mu_p$ and $\sigma_p$ to center and scale $X_p$ we would have considered
\[
\frac{X_p - \frac{1}{p}}{\frac{\sqrt{1-p}}{p}} = \frac{p X_p - 1}{\sqrt{1 - p}} \implies T - 1
\]
instead.
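The moment computations above are easy to sanity-check by truncating the series $\sum_{n \ge 1} n^j q^{n-1} p$ numerically. A quick sketch of ours (not part of the notes; the function name is invented):

```python
# Truncated-series check that E X_p = 1/p and Var(X_p) = (1-p)/p^2
# for X_p geometric on {1, 2, ...} with P(X_p = n) = (1-p)^{n-1} p.
def geometric_moments(p, terms=20000):
    q = 1.0 - p
    mean = sum(n * q ** (n - 1) * p for n in range(1, terms))
    second = sum(n * n * q ** (n - 1) * p for n in range(1, terms))
    return mean, second - mean ** 2

p = 0.2
mean, var = geometric_moments(p)
print(mean, 1 / p)            # both ~ 5
print(var, (1 - p) / p ** 2)  # both ~ 20
```

The truncation error is negligible here since $q^n$ decays geometrically.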
Theorem 21.48 (This is already done in Theorem 7.62). Let $\{X_n\}_{n=1}^{\infty}$ be i.i.d. random variables such that $P(X_n = \pm 1) = 1/2$ and let $S_n := X_1 + \dots + X_n$, the position of a drunk after $n$ steps. Observe that $|S_n|$ is an odd integer if $n$ is odd and an even integer if $n$ is even. Then $\frac{S_m}{\sqrt{m}} \implies N(0, 1)$ as $m \to \infty$.
Proof. (Sketch of the proof.) We start by observing that $S_{2n} = 2k$ iff
\[
\#\{i \le 2n : X_i = 1\} = n + k
\quad \text{while} \quad
\#\{i \le 2n : X_i = -1\} = 2n - (n + k) = n - k
\]
and therefore,
\[
P(S_{2n} = 2k) = \binom{2n}{n+k} \left(\frac{1}{2}\right)^{2n}
= \frac{(2n)!}{(n+k)!\,(n-k)!} \left(\frac{1}{2}\right)^{2n}.
\]
Recall Stirling's formula states
\[
n! \sim n^n e^{-n} \sqrt{2 \pi n} \text{ as } n \to \infty
\]
and therefore,
\[
P(S_{2n} = 2k)
\sim \frac{(2n)^{2n} e^{-2n} \sqrt{4 \pi n}}
{(n+k)^{n+k} e^{-(n+k)} \sqrt{2\pi(n+k)} \cdot (n-k)^{n-k} e^{-(n-k)} \sqrt{2\pi(n-k)}}
\left(\frac{1}{2}\right)^{2n}
\]
\[
= \sqrt{\frac{n}{\pi (n+k)(n-k)}}
\left(1 + \frac{k}{n}\right)^{-(n+k)} \left(1 - \frac{k}{n}\right)^{-(n-k)}
\]
\[
= \frac{1}{\sqrt{\pi n}} \frac{1}{\sqrt{\left(1 + \frac{k}{n}\right)\left(1 - \frac{k}{n}\right)}}
\left(1 - \frac{k^2}{n^2}\right)^{-n}
\left(1 + \frac{k}{n}\right)^{-k} \left(1 - \frac{k}{n}\right)^{k}
\]
\[
= \frac{1}{\sqrt{\pi n}} \left(1 - \frac{k^2}{n^2}\right)^{-n}
\left(1 + \frac{k}{n}\right)^{-k - 1/2} \left(1 - \frac{k}{n}\right)^{k - 1/2}.
\]
So if we let $x := 2k/\sqrt{2n}$, i.e. $k = x\sqrt{n/2}$ and $k/n = \frac{x}{\sqrt{2n}}$, we have
\[
P\left(\frac{S_{2n}}{\sqrt{2n}} = x\right)
\sim \frac{1}{\sqrt{\pi n}} \left(1 - \frac{x^2}{2n}\right)^{-n}
\left(1 + \frac{x}{\sqrt{2n}}\right)^{-x\sqrt{n/2} - 1/2}
\left(1 - \frac{x}{\sqrt{2n}}\right)^{x\sqrt{n/2} - 1/2}
\]
\[
\sim \frac{1}{\sqrt{\pi n}}\, e^{x^2/2}\,
e^{-\frac{x}{\sqrt{2n}}\left(x\sqrt{n/2} + 1/2\right)}\,
e^{-\frac{x}{\sqrt{2n}}\left(x\sqrt{n/2} - 1/2\right)}
= \frac{1}{\sqrt{\pi n}}\, e^{-x^2/2},
\]
wherein we have repeatedly used
\[
(1 + a_n)^{b_n} = e^{b_n \ln(1 + a_n)} \sim e^{b_n a_n} \text{ when } a_n \to 0.
\]
We now compute
\[
P\left(a \le \frac{S_{2n}}{\sqrt{2n}} \le b\right)
= \sum_{a \le x \le b} P\left(\frac{S_{2n}}{\sqrt{2n}} = x\right)
\approx \sum_{a \le x \le b} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \cdot \frac{2}{\sqrt{2n}} \tag{21.41}
\]
where the sum is over $x$ of the form $x = \frac{2k}{\sqrt{2n}}$ with $k \in \{0, \pm 1, \dots, \pm n\}$. Since $\frac{2}{\sqrt{2n}}$ is the increment of $x$ as $k$ increases by $1$, we see the latter expression in Eq. (21.41) is the Riemann sum approximation to
\[
\frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.
\]
This proves $\frac{S_{2n}}{\sqrt{2n}} \implies N(0, 1)$. Since
\[
\frac{S_{2n+1}}{\sqrt{2n+1}} = \frac{S_{2n} + X_{2n+1}}{\sqrt{2n+1}}
= \frac{S_{2n}}{\sqrt{2n}} \cdot \frac{1}{\sqrt{1 + \frac{1}{2n}}} + \frac{X_{2n+1}}{\sqrt{2n+1}},
\]
it follows directly (or see Slutsky's Theorem 21.32) that $\frac{S_{2n+1}}{\sqrt{2n+1}} \implies N(0, 1)$ as well.
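The local limit used in the proof can be compared directly with the exact binomial probabilities. A quick numerical sketch of ours (for illustration only):

```python
import math

# P(S_{2n} = 2k) = C(2n, n+k) 2^{-2n}; the proof's local CLT says that
# with x = 2k/sqrt(2n) this is approximately e^{-x^2/2} / sqrt(pi * n).
def walk_prob(n, k):
    return math.comb(2 * n, n + k) * 0.5 ** (2 * n)

n, k = 500, 10
x = 2 * k / math.sqrt(2 * n)
exact = walk_prob(n, k)
approx = math.exp(-x * x / 2) / math.sqrt(math.pi * n)
print(exact, approx)
```

For $n = 500$ the two values agree to within about a percent, consistent with the $O(1/n)$ error of Stirling's formula.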
Proposition 21.49. Suppose that $\{U_n\}_{n=1}^{\infty}$ are i.i.d. random variables which are uniformly distributed in $(0, 1)$. Let $U_{(k,n)}$ denote the $k^{\text{th}}$ smallest number from the list $\{U_1, U_2, \dots, U_n\}$. Further let $k(n)$ be chosen so that $\lim_{n \to \infty} k(n) = \infty$ while $\lim_{n \to \infty} \frac{k(n)}{n} = 0$, and let
\[
X_n := \frac{U_{(k(n),n)} - k(n)/n}{\sqrt{k(n)}/n}.
\]
Then $d_{TV}(X_n, N(0,1)) \to 0$ as $n \to \infty$.
Proof. (Sketch only. See Resnick, Proposition 8.2.1 for more details.) Observe that, for $x \in (0, 1)$,
\[
P\left(U_{(k,n)} \le x\right) = P\left(\sum_{i=1}^{n} 1_{U_i \le x} \ge k\right)
= \sum_{l=k}^{n} \binom{n}{l} x^l (1-x)^{n-l}.
\]
From this it follows that $\rho_n(x) := 1_{(0,1)}(x)\, \frac{d}{dx} P\left(U_{(k,n)} \le x\right)$ is the probability density for $U_{(k,n)}$. It now turns out that $\rho_n(x)$ is a Beta distribution,
\[
\rho_n(x) = \binom{n}{k} k\, x^{k-1} (1-x)^{n-k}.
\]
Giving a direct computation of this result is not so illuminating, so let us go another route. To do this we are going to estimate $P\left(U_{(k,n)} \in (x, x+\delta]\right)$ for $\delta \in (0, 1)$. Observe that if $U_{(k,n)} \in (x, x+\delta]$, then there must be at least one $U_i \in (x, x+\delta]$, for otherwise $U_{(k,n)} \le x + \delta$ would imply $U_{(k,n)} \le x$ as well and hence $U_{(k,n)} \notin (x, x+\delta]$. Let
\[
\Omega_i := \{U_i \in (x, x+\delta] \text{ and } U_j \notin (x, x+\delta] \text{ for } j \ne i\}.
\]
Since
\[
P(U_i, U_j \in (x, x+\delta] \text{ for some } i \ne j \text{ with } i, j \le n)
\le \sum_{i < j \le n} P(U_i, U_j \in (x, x+\delta])
\le \binom{n}{2} \delta^2 \le n^2 \delta^2,
\]
we see that
\[
P\left(U_{(k,n)} \in (x, x+\delta]\right)
= \sum_{i=1}^{n} P\left(\{U_{(k,n)} \in (x, x+\delta]\} \cap \Omega_i\right) + O(\delta^2)
= n P\left(\{U_{(k,n)} \in (x, x+\delta]\} \cap \Omega_1\right) + O(\delta^2).
\]
Now on the set $\Omega_1$, $U_{(k,n)} \in (x, x+\delta]$ iff there are exactly $k-1$ of $U_2, \dots, U_n$ in $[0, x]$ and $n - k$ of these in $[x + \delta, 1]$. This leads to the conclusion that
\[
P\left(U_{(k,n)} \in (x, x+\delta]\right)
= n \binom{n-1}{k-1} \delta\, x^{k-1} (1 - (x+\delta))^{n-k} + O(\delta^2)
\]
and therefore,
\[
\rho_n(x) = \lim_{\delta \downarrow 0} \frac{P\left(U_{(k,n)} \in (x, x+\delta]\right)}{\delta}
= \frac{n!}{(k-1)!\,(n-k)!}\, x^{k-1} (1-x)^{n-k}.
\]
By Stirling's formula,
\[
\frac{n!}{(k-1)!\,(n-k)!}
\sim \frac{n^n e^{-n} \sqrt{2\pi n}}
{(k-1)^{k-1} e^{-(k-1)} \sqrt{2\pi(k-1)} \cdot (n-k)^{n-k} e^{-(n-k)} \sqrt{2\pi(n-k)}}
\]
\[
= \frac{\sqrt{n}\, e^{-1}}{\sqrt{2\pi}}
\frac{1}{\left(\frac{k-1}{n}\right)^{k-1} \sqrt{\frac{k-1}{n}}
\left(\frac{n-k}{n}\right)^{n-k} \sqrt{\frac{n-k}{n}}}
= \frac{\sqrt{n}\, e^{-1}}{\sqrt{2\pi}}
\left(\frac{k-1}{n}\right)^{-(k-1/2)} \left(1 - \frac{k}{n}\right)^{-(n-k+1/2)}.
\]
Since
\[
\left(\frac{k-1}{n}\right)^{-(k-1/2)}
= \left(\frac{k}{n}\right)^{-(k-1/2)} \left(\frac{k-1}{k}\right)^{-(k-1/2)}
= \left(\frac{k}{n}\right)^{-(k-1/2)} \left(1 - \frac{1}{k}\right)^{-(k-1/2)}
\sim e^{1} \left(\frac{k}{n}\right)^{-(k-1/2)},
\]
we arrive at
\[
\frac{n!}{(k-1)!\,(n-k)!}
\sim \frac{\sqrt{n}}{\sqrt{2\pi}}
\left(\frac{k}{n}\right)^{-(k-1/2)} \left(1 - \frac{k}{n}\right)^{-(n-k+1/2)}.
\]
By the change of variables formula, with
\[
x = \frac{u - k(n)/n}{\sqrt{k(n)}/n},
\]
noting that $du = \frac{\sqrt{k(n)}}{n}\, dx$, $x = -\sqrt{k(n)}$ at $u = 0$, and
\[
x = \frac{1 - k(n)/n}{\sqrt{k(n)}/n} = \frac{n - k(n)}{\sqrt{k(n)}}
= \sqrt{n}\, \sqrt{\frac{n}{k(n)}} \left(1 - \frac{k(n)}{n}\right) =: b_n
\]
at $u = 1$, we have
\[
E[F(X_n)] = \int_0^1 \rho_n(u)\, F\left(\frac{u - k(n)/n}{\sqrt{k(n)}/n}\right) du
= \int_{-\sqrt{k(n)}}^{b_n} \frac{\sqrt{k(n)}}{n}\,
\rho_n\left(\frac{\sqrt{k(n)}}{n} x + \frac{k(n)}{n}\right) F(x)\, dx.
\]
Using this information, it is then shown in Resnick that
\[
\frac{\sqrt{k(n)}}{n}\, \rho_n\left(\frac{\sqrt{k(n)}}{n} x + \frac{k(n)}{n}\right)
\to \frac{e^{-x^2/2}}{\sqrt{2\pi}},
\]
which upon an application of Scheffé's Lemma 21.8 completes the proof.
Remark 21.50. It is possible to understand the normalization constants in the definition of $X_n$ by computing the mean and the variance of $U_{(k,n)}$. After some computations (see Chapter ??), one arrives at
\[
E U_{(k,n)} = \int_0^1 \frac{n!}{(k-1)!\,(n-k)!}\, x^{k-1}(1-x)^{n-k}\, x\, dx
= \frac{k}{n+1} \cong \frac{k}{n},
\]
\[
E U_{(k,n)}^2 = \int_0^1 \frac{n!}{(k-1)!\,(n-k)!}\, x^{k-1}(1-x)^{n-k}\, x^2\, dx
= \frac{(k+1)k}{(n+2)(n+1)},
\]
and
\[
\operatorname{Var}\left(U_{(k,n)}\right)
= \frac{(k+1)k}{(n+2)(n+1)} - \frac{k^2}{(n+1)^2}
= \frac{k}{n+1}\left(\frac{k+1}{n+2} - \frac{k}{n+1}\right)
= \frac{k}{n+1} \cdot \frac{n-k+1}{(n+2)(n+1)} \cong \frac{k}{n^2}.
\]
21.8 Compactness and tightness of measures on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$

Suppose that $\Lambda \subset \mathbb{R}$ is a dense set and $F$ and $\tilde{F}$ are two right continuous functions. If $F = \tilde{F}$ on $\Lambda$, then $F = \tilde{F}$ on $\mathbb{R}$. Indeed, for $x \in \mathbb{R}$ we have
\[
F(x) = \lim_{\Lambda \ni \lambda \downarrow x} F(\lambda)
= \lim_{\Lambda \ni \lambda \downarrow x} \tilde{F}(\lambda) = \tilde{F}(x).
\]
Lemma 21.51. If $G : \Lambda \to \mathbb{R}$ is a non-decreasing function, then
\[
F(x) := G_+(x) := \inf\{G(\lambda) : \lambda \in \Lambda,\ x < \lambda\} \tag{21.42}
\]
is a non-decreasing right continuous function.

Proof. To show $F$ is right continuous, let $x \in \mathbb{R}$ and $\lambda \in \Lambda$ be such that $\lambda > x$. Then for any $y \in (x, \lambda)$,
\[
F(x) \le F(y) = G_+(y) \le G(\lambda)
\]
and therefore,
\[
F(x) \le F(x+) := \lim_{y \downarrow x} F(y) \le G(\lambda).
\]
Since $\lambda > x$ with $\lambda \in \Lambda$ is arbitrary, we may conclude $F(x) \le F(x+) \le G_+(x) = F(x)$, i.e. $F(x+) = F(x)$.
Proposition 21.52. Suppose that $\{F_n\}_{n=1}^{\infty}$ is a sequence of distribution functions and $\Lambda \subset \mathbb{R}$ is a dense set such that $G(\lambda) := \lim_{n \to \infty} F_n(\lambda) \in [0, 1]$ exists
for all $\lambda \in \Lambda$. If we define $F := G_+$ as in Eq. (21.42), then $F_n(x) \to F(x)$ for all $x \in \mathcal{C}(F)$, the continuity points of $F$. (Note well: as we have already seen, it is possible that $F(\infty) < 1$ and $F(-\infty) > 0$, so that $F$ need not be a distribution function for a measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.)
Proof. Suppose that $x, y \in \mathbb{R}$ with $x < y$ and $s, t \in \Lambda$ are chosen so that $x < s < y < t$. Then passing to the limit in the inequality
\[
F_n(s) \le F_n(y) \le F_n(t)
\]
implies
\[
F(x) = G_+(x) \le G(s) \le \liminf_{n \to \infty} F_n(y)
\le \limsup_{n \to \infty} F_n(y) \le G(t).
\]
Taking the infimum over $t \in \Lambda \cap (y, \infty)$ and then letting $x \in \mathbb{R}$ tend up to $y$, we may conclude
\[
F(y-) \le \liminf_{n \to \infty} F_n(y) \le \limsup_{n \to \infty} F_n(y) \le F(y)
\text{ for all } y \in \mathbb{R}.
\]
This completes the proof, since $F(y-) = F(y)$ for $y \in \mathcal{C}(F)$.
The next theorem deals with weak convergence of measures on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$. So as not to have to introduce any new machinery, the reader should identify $\bar{\mathbb{R}}$ with $[-1, 1]$ via the map
\[
[-1, 1] \ni x \to \tan\left(\frac{\pi}{2} x\right) \in \bar{\mathbb{R}}.
\]
Hence a probability measure on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$ may be identified with a probability measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ which is supported on $[-1, 1]$. Using this identification, we see that $-\infty$ should only be considered a point of continuity of a distribution function, $F : \bar{\mathbb{R}} \to [0, 1]$, iff $F(-\infty) = 0$. On the other hand, $\infty$ is always a point of continuity.
Theorem 21.53 (Helly's Selection Theorem). Every sequence of probability measures, $\{\mu_n\}_{n=1}^{\infty}$, on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$ has a subsequence which is weakly convergent to a probability measure, $\mu_0$, on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$.
Proof. Using the identification described above, rather than viewing $\mu_n$ as probability measures on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$, we may view them as probability measures on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ which are supported on $[-1, 1]$, i.e. $\mu_n([-1, 1]) = 1$. As usual, let
\[
F_n(x) := \mu_n((-\infty, x]) = \mu_n((-\infty, x] \cap [-1, 1]).
\]
Since $\{F_n(x)\}_{n=1}^{\infty} \subset [0, 1]$ and $[0, 1]$ is compact, for each $x \in \mathbb{R}$ we may find a convergent subsequence of $\{F_n(x)\}_{n=1}^{\infty}$. Hence by Cantor's diagonalization argument we may find a subsequence, $\{G_k := F_{n_k}\}_{k=1}^{\infty}$, of the $\{F_n\}_{n=1}^{\infty}$ such that $G(x) := \lim_{k \to \infty} G_k(x)$ exists for all $x \in \Lambda := \mathbb{Q}$.

Letting $F_0(x) := G(x+)$ as in Eq. (21.42), it follows from Lemma 21.51 and Proposition 21.52 that $G_k(x) = F_{n_k}(x) \to F_0(x)$ for all $x \in \mathcal{C}(F_0)$. Moreover, $G_k(x) = 0$ for all $x \in \Lambda \cap (-\infty, -1)$ and $G_k(x) = 1$ for all $x \in \Lambda \cap [1, \infty)$. Therefore, $F_0(x) = 1$ for all $x \ge 1$ and $F_0(x) = 0$ for all $x < -1$, and the corresponding measure, $\mu_0$, is supported on $[-1, 1]$. Hence $\mu_0$ may now be transferred back to a measure on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$.
Example 21.54. Here are three simple examples showing that probabilities may indeed transfer to the points at $\pm\infty$: 1) $\mu_n = \delta_n \implies \delta_{\infty}$, 2) $\mu_n = \delta_{-n} \implies \delta_{-\infty}$, and 3) $\frac{1}{2}(\delta_n + \delta_{-n}) \implies \frac{1}{2}(\delta_{\infty} + \delta_{-\infty})$.
The next question we would like to address is when the limiting measure, $\mu_0$ on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$, is concentrated on $\mathbb{R}$. The following notion of tightness is the key to answering this question.
Definition 21.55. A collection of probability measures, $\Gamma$, on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ is tight iff for every $\varepsilon > 0$ there exists $M_{\varepsilon} < \infty$ such that
\[
\inf_{\mu \in \Gamma} \mu([-M_{\varepsilon}, M_{\varepsilon}]) \ge 1 - \varepsilon. \tag{21.43}
\]
We further say that a collection of random variables, $\{X_{\alpha} : \alpha \in A\}$, is tight iff the collection of probability measures, $\{P \circ X_{\alpha}^{-1} : \alpha \in A\}$, is tight. Equivalently put, $\{X_{\alpha} : \alpha \in A\}$ is tight iff
\[
\lim_{M \to \infty} \sup_{\alpha \in A} P(|X_{\alpha}| \ge M) = 0. \tag{21.44}
\]
Observe that the definition of uniform integrability (see Definition 12.38) is considerably stronger than the notion of tightness. It is also worth observing that if $\varepsilon > 0$ and $C := \sup_{\alpha \in A} E |X_{\alpha}|^{\varepsilon} < \infty$, then by Chebyshev's inequality,
\[
\sup_{\alpha \in A} P(|X_{\alpha}| \ge M)
\le \sup_{\alpha \in A} \frac{1}{M^{\varepsilon}} E |X_{\alpha}|^{\varepsilon}
\le \frac{C}{M^{\varepsilon}} \to 0 \text{ as } M \to \infty,
\]
and therefore $\{X_{\alpha} : \alpha \in A\}$ is tight.
Theorem 21.56. Let $\Gamma := \{\mu_n\}_{n=1}^{\infty}$ be a sequence of probability measures on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$. Then $\Gamma$ is tight iff every subsequential limit measure, $\mu_0$, on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$ is supported on $\mathbb{R}$. In particular, if $\Gamma$ is tight, there is a weakly convergent subsequence of $\Gamma$ converging to a probability measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$. (This is greatly generalized in Prokhorov's Theorem 21.61 below.)
Proof. Suppose that $\mu_{n_k} \implies \mu_0$ with $\mu_0$ being a probability measure on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$. As usual, let $F_0(x) := \mu_0([-\infty, x])$. If $\Gamma$ is tight and $\varepsilon > 0$ is given, we may find $M_{\varepsilon} < \infty$ such that $\pm M_{\varepsilon} \in \mathcal{C}(F_0)$ and $\mu_n([-M_{\varepsilon}, M_{\varepsilon}]) \ge 1 - \varepsilon$ for all $n$. Hence it follows that
\[
\mu_0([-M_{\varepsilon}, M_{\varepsilon}]) = \lim_{k \to \infty} \mu_{n_k}([-M_{\varepsilon}, M_{\varepsilon}]) \ge 1 - \varepsilon
\]
and by letting $\varepsilon \downarrow 0$ we conclude that
\[
\mu_0(\mathbb{R}) = \lim_{\varepsilon \downarrow 0} \mu_0([-M_{\varepsilon}, M_{\varepsilon}]) = 1.
\]
Conversely, suppose there is a subsequence $\{\mu_{n_k}\}_{k=1}^{\infty}$ such that $\mu_{n_k} \implies \mu_0$ with $\mu_0$ being a probability measure on $(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$ such that $\mu_0(\mathbb{R}) < 1$. In this case $\varepsilon_0 := \mu_0(\{-\infty, \infty\}) > 0$ and hence for all $M < \infty$ we have
\[
\mu_0([-M, M]) \le \mu_0(\bar{\mathbb{R}}) - \mu_0(\{-\infty, \infty\}) = 1 - \varepsilon_0.
\]
By choosing $M$ so that $\pm M$ are points of continuity of $F_0$, it then follows that
\[
\lim_{k \to \infty} \mu_{n_k}([-M, M]) = \mu_0([-M, M]) \le 1 - \varepsilon_0.
\]
Therefore,
\[
\inf_{n \in \mathbb{N}} \mu_n([-M, M]) \le 1 - \varepsilon_0 \text{ for all } M < \infty
\]
and $\{\mu_n\}_{n=1}^{\infty}$ is not tight.
21.9 Metric Space Extensions

The goal of this section is to extend the notions of weak convergence when $\mathbb{R}$ is replaced by a metric space $(S, \rho)$. Standard references for the material here are [6] and [46]; also see [18] and [31]. Throughout this section, $(S, \rho)$ will be a metric space and $\mathcal{B}_S$ will be the Borel $\sigma$-algebra on $S$, i.e. the $\sigma$-algebra generated by the open subsets of $S$. Recall that $V \subset S$ is open iff it is the union of open balls of the form
\[
B(x, r) := \{y \in S : \rho(x, y) < r\}
\]
where $x \in S$ and $r > 0$. It should be noted that if $S$ is separable (i.e. contains a countable dense set, $\Lambda \subset S$), then every open set may be written as a union of balls with $x \in \Lambda$ and $r \in \mathbb{Q}$, and so in the separable case
\[
\mathcal{B}_S = \sigma(B(x, r) : x \in \Lambda,\ r \in \mathbb{Q})
= \sigma(B(x, r) : x \in S,\ r > 0).
\]
Let us now state the theorems of this section.

Definition 21.57. Let $(S, \tau)$ be a topological space, $\mathcal{B} := \sigma(\tau)$ be the Borel $\sigma$-algebra, and $\mu$ be a probability measure on $(S, \mathcal{B})$. We say that $A \in \mathcal{B}$ is a continuity set for $\mu$ provided $\mu(\operatorname{bd}(A)) = 0$. Notice that this is equivalent to saying that $\mu(A^{\circ}) = \mu(A) = \mu(\bar{A})$.
Theorem 21.58 (Skorohod Theorem). Let $(S, \rho)$ be a separable metric space and $\{\mu_n\}_{n=0}^{\infty}$ be probability measures on $(S, \mathcal{B}_S)$ such that $\lim_{n \to \infty} \mu_n(A) = \mu_0(A)$ for all $A \in \mathcal{B}_S$ such that $\mu_0(\operatorname{bd}(A)) = 0$.$^6$ Then there exists a probability space, $(\Omega, \mathcal{B}, P)$, and measurable functions, $Y_n : \Omega \to S$, such that $\mu_n = P \circ Y_n^{-1}$ for all $n \in \mathbb{N}_0 := \mathbb{N} \cup \{0\}$ and $\lim_{n \to \infty} Y_n = Y_0$ a.s.
Proposition 21.59 (The Portmanteau Theorem). Suppose that $S$ is a complete separable metric space and $\mu$ and $\{\mu_n\}_{n=1}^{\infty}$ are probability measures on $(S, \mathcal{B} := \mathcal{B}_S)$. Then the following are equivalent:

1. $\mu_n \implies \mu$ as $n \to \infty$, i.e. $\mu_n(f) \to \mu(f)$ for all $f \in BC(S)$.
2. $\mu_n(f) \to \mu(f)$ for every $f \in BC(S)$ which is uniformly continuous.
3. $\limsup_{n \to \infty} \mu_n(F) \le \mu(F)$ for all closed $F \subset S$.
4. $\liminf_{n \to \infty} \mu_n(G) \ge \mu(G)$ for all open $G \subset S$.
5. $\lim_{n \to \infty} \mu_n(A) = \mu(A)$ for all $A \in \mathcal{B}$ such that $\mu(\operatorname{bd}(A)) = 0$.
Definition 21.60. Let $S$ be a topological space. A collection $\Gamma$ of probability measures on $(S, \mathcal{B}_S)$ is said to be tight if for every $\varepsilon > 0$ there exists a compact set $K_{\varepsilon} \in \mathcal{B}_S$ such that $\mu(K_{\varepsilon}) \ge 1 - \varepsilon$ for all $\mu \in \Gamma$.

Theorem 21.61 (Prokhorov's Theorem). Suppose $S$ is a separable metrizable space and $\Gamma = \{\mu_n\}_{n=1}^{\infty}$ is a tight sequence of probability measures on $\mathcal{B}_S$. Then there exists a subsequence $\{\mu_{n_k}\}_{k=1}^{\infty}$ which is weakly convergent to a probability measure $\mu$ on $\mathcal{B}_S$.

Conversely, if we further assume that $(S, \rho)$ is complete and $\Gamma$ is a sequentially compact subset of the probability measures on $(S, \mathcal{B}_S)$ with the weak topology, then $\Gamma$ is tight. (The converse direction is not so important for us.)
For the next few exercises, let $(S_1, \rho_1)$ and $(S_2, \rho_2)$ be separable metric spaces and $\mathcal{B}_{S_1}$ and $\mathcal{B}_{S_2}$ be the Borel $\sigma$-algebras on $S_1$ and $S_2$ respectively. Further define a metric, $\rho$, on $S := S_1 \times S_2$ by
\[
\rho((x_1, x_2), (y_1, y_2)) = \rho_1(x_1, y_1) \vee \rho_2(x_2, y_2)
\]
and let $\mathcal{B}_{S_1 \times S_2}$ be the Borel $\sigma$-algebra on $S_1 \times S_2$. For $i = 1, 2$, let $\pi_i : S_1 \times S_2 \to S_i$ be the projection maps and recall that
\[
\mathcal{B}_{S_1} \otimes \mathcal{B}_{S_2} = \sigma(\pi_1, \pi_2)
= \sigma\left(\pi_1^{-1}(\mathcal{B}_{S_1}) \cup \pi_2^{-1}(\mathcal{B}_{S_2})\right).
\]
Exercise 21.9 (Continuous Mapping Theorem). Let $(S_1, \rho_1)$ and $(S_2, \rho_2)$ be separable metric spaces and $\mathcal{B}_{S_1}$ and $\mathcal{B}_{S_2}$ be the Borel $\sigma$-algebras on $S_1$ and $S_2$ respectively. Further suppose that $\mu$ and $\{\mu_n\}_{n=1}^{\infty}$ are probability measures
$^6$ In Proposition 21.59 below we will see that this assumption is equivalent to assuming $\mu_n \implies \mu_0$.
on $(S_1, \mathcal{B}_{S_1})$ such that $\mu_n \implies \mu$. If $f : S_1 \to S_2$ is a Borel measurable function such that $\mu(\mathcal{D}(f)) = 0$ (see Notation 21.15), then $f_* \mu_n \implies f_* \mu$, where $f_* \mu := \mu \circ f^{-1}$.
Exercise 21.10. Prove the analogue of Lemma 6.25, namely show $\mathcal{B}_{S_1 \times S_2} = \mathcal{B}_{S_1} \otimes \mathcal{B}_{S_2}$. Hint: you may find Exercise 6.6 helpful.
Exercise 21.11. Let $(S_1, \rho_1)$ and $(S_2, \rho_2)$ be separable metric spaces and $\mathcal{B}_{S_1}$ and $\mathcal{B}_{S_2}$ be the Borel $\sigma$-algebras on $S_1$ and $S_2$ respectively. Further suppose that $\mu$, $\{\mu_n\}_{n=1}^{\infty}$ and $\nu$, $\{\nu_n\}_{n=1}^{\infty}$ are probability measures on $(S_1, \mathcal{B}_{S_1})$ and $(S_2, \mathcal{B}_{S_2})$ respectively. If $\mu_n \implies \mu$ and $\nu_n \implies \nu$, then $\mu_n \otimes \nu_n \implies \mu \otimes \nu$.
Exercises 21.10 and 21.11 have obvious generalizations to finite product spaces. In particular, if $\{X_n^{(i)}\}_{n=0}^{\infty}$ are sequences of random variables for $1 \le i \le K$ such that for each $n$, $\{X_n^{(i)}\}_{i=1}^{K}$ are independent random variables with $X_n^{(i)} \implies X_0^{(i)}$ as $n \to \infty$ for each $1 \le i \le K$, then
\[
\left(X_n^{(1)}, X_n^{(2)}, \dots, X_n^{(K)}\right)
\implies \left(X_0^{(1)}, X_0^{(2)}, \dots, X_0^{(K)}\right) \text{ as } n \to \infty.
\]
These comments will be useful for Exercise 21.12 below.
Definition 21.62 (Convergence of finite dimensional distributions). Let $\{X_n(t) : t \ge 0\}_{n=0}^{\infty}$ be a collection of random processes, $X_n(t) : \Omega \to \mathbb{R}$. We say that $X_n$ converges to $X_0$ in finite dimensional distributions, and write $X_n \overset{f.d.}{\implies} X_0$, provided for every finite subset $\Lambda := \{0 = t_0 < t_1 < t_2 < \dots < t_K\}$ of $\mathbb{R}_+$ we have
\[
(X_n(t_0), \dots, X_n(t_K)) \implies (X_0(t_0), \dots, X_0(t_K)) \text{ as } n \to \infty.
\]
Exercise 21.12. Let $\{X_n\}_{n=1}^{\infty}$ be an i.i.d. sequence of random variables with zero mean and $\operatorname{Var}(X_n) = 1$. For $t \ge 0$, let $B_n(t) := \frac{1}{\sqrt{n}} S_{[nt]}$ where $[nt]$ is the largest integer less than or equal to $nt$ and $S_m := \sum_{k \le m} X_k$, with $S_0 = 0$ by definition. Show that $B_n \overset{f.d.}{\implies} B$ where $\{B(t) : t \ge 0\}$ is a Brownian motion as defined in Definition 17.21. You might use the following outline.

1. For any $0 \le s < t < \infty$, explain why $B_n(t) - B_n(s) \implies N(0, t - s)$.
2. Given $\Lambda := \{0 = t_0 < t_1 < t_2 < \dots < t_K\} \subset \mathbb{R}_+$, argue that $\{B_n(t_i) - B_n(t_{i-1})\}_{i=1}^{K}$ are independent and then show
\[
\{B_n(t_i) - B_n(t_{i-1})\}_{i=1}^{K} \implies \{B(t_i) - B(t_{i-1})\}_{i=1}^{K} \text{ as } n \to \infty.
\]
3. Now show that $\{B_n(t_i)\}_{i=1}^{K} \implies \{B(t_i)\}_{i=1}^{K}$ as $n \to \infty$.
The rest of this section is devoted to the proofs of these results. (These proofs may safely be skipped on first reading.)
21.9.1 A point set topology review

Before getting down to business, let me recall a few basic point set topology results which we will need. Recall that if $(S, \tau)$ is a topological space and $A \subset S$, the closure of $A$ is defined by
\[
\bar{A} := \cap \{C : A \subset C \subset S \text{ with } C \text{ closed}\}
\]
and the interior of $A$ is defined by
\[
A^{\circ} = \cup \{V \in \tau : V \subset A\}.
\]
Thus $\bar{A}$ is the smallest closed set containing $A$ and $A^{\circ}$ is the largest open set contained in $A$. The relationship between the interior and closure operations is
\[
(A^{\circ})^c = \cap \{V^c : V \in \tau \text{ with } V \subset A\}
= \cap \{C : C \text{ closed with } A^c \subset C \subset S\} = \overline{A^c}.
\]
Finally, recall that the topological boundary of a set $A \subset S$ is defined by $\operatorname{bd}(A) := \bar{A} \setminus A^{\circ}$, which may also be expressed as
\[
\operatorname{bd}(A) = \bar{A} \cap (A^{\circ})^c = \bar{A} \cap \overline{A^c}\ (= \operatorname{bd}(A^c)).
\]
In the case of a metric space we may describe $\bar{A}$ and $\operatorname{bd}(A)$ as
\[
\bar{A} = \{x \in S : \exists\, x_n \in A \text{ with } x = \lim_{n \to \infty} x_n\}
\]
and
\[
\operatorname{bd}(A) = \{x \in S : \exists\, x_n \in A \text{ and } y_n \in A^c
\text{ with } \lim_{n \to \infty} y_n = x = \lim_{n \to \infty} x_n\}.
\]
So the boundary of $A$ consists of those points in $S$ which are arbitrarily close to points inside of $A$ and outside of $A$. In the metric space case of most interest, the next lemma is easily proved using this characterization.
Lemma 21.63. For any subsets, $A$ and $B$, of $S$ we have $\operatorname{bd}(A \cap B) \subset \operatorname{bd}(A) \cup \operatorname{bd}(B)$, $\operatorname{bd}(A \setminus B) \subset \operatorname{bd}(A) \cup \operatorname{bd}(B)$, and $\operatorname{bd}(A \cup B) \subset \operatorname{bd}(A) \cup \operatorname{bd}(B)$.

Proof. We begin by observing that $A^{\circ} \cap B^{\circ} \subset [A \cap B]^{\circ}$ and $\overline{A \cap B} \subset \bar{A} \cap \bar{B}$, from which it follows that
\[
A^{\circ} \cap B^{\circ} \subset [A \cap B]^{\circ} \subset A \cap B \subset \overline{A \cap B} \subset \bar{A} \cap \bar{B}
\]
and hence,
\[
\operatorname{bd}(A \cap B) \subset \left(\bar{A} \cap \bar{B}\right) \setminus [A^{\circ} \cap B^{\circ}].
\]
Combining this inclusion with
\[
\left(\bar{A} \cap \bar{B}\right) \setminus [A^{\circ} \cap B^{\circ}]
= \bar{A} \cap \bar{B} \cap [A^{\circ} \cap B^{\circ}]^c
= \bar{A} \cap \bar{B} \cap [(A^{\circ})^c \cup (B^{\circ})^c]
\]
\[
= \left[\bar{A} \cap \bar{B} \cap (A^{\circ})^c\right] \cup \left[\bar{A} \cap \bar{B} \cap (B^{\circ})^c\right]
\subset \left[\bar{A} \cap (A^{\circ})^c\right] \cup \left[\bar{B} \cap (B^{\circ})^c\right]
= \operatorname{bd}(A) \cup \operatorname{bd}(B)
\]
completes the proof of the first assertion. The second and third assertions are easy consequences of the first because
\[
\operatorname{bd}(A \setminus B) = \operatorname{bd}(A \cap B^c)
\subset \operatorname{bd}(A) \cup \operatorname{bd}(B^c) = \operatorname{bd}(A) \cup \operatorname{bd}(B)
\]
and
\[
\operatorname{bd}(A \cup B) = \operatorname{bd}([A \cup B]^c) = \operatorname{bd}(A^c \cap B^c)
\subset \operatorname{bd}(A^c) \cup \operatorname{bd}(B^c) = \operatorname{bd}(A) \cup \operatorname{bd}(B).
\]
21.9.2 Proof of Skorohod's Theorem 21.58

Lemma 21.64. Let $(S, \rho)$ be a separable metric space, $\mathcal{B}$ be the Borel $\sigma$-algebra on $S$, and $\mu$ be a probability measure on $\mathcal{B}$. Then for every $\varepsilon > 0$ there exists a countable partition, $\{B_n\}_{n=1}^{\infty}$, of $S$ such that $B_n \in \mathcal{B}$, $\operatorname{diam}(B_n) \le \varepsilon$, and $B_n$ is a continuity set (i.e. $\mu(\operatorname{bd}(B_n)) = 0$) for all $n$.
Proof. For $x \in S$ and $r \ge 0$ let $S(x, r) := \{y \in S : \rho(x, y) = r\}$. For any finite subset $\Lambda \subset [0, \infty)$, the sets $\{S(x, r)\}_{r \in \Lambda}$ are disjoint subsets of $S$ and therefore,
\[
\sum_{r \in \Lambda} \mu(S(x, r)) \le \mu(S) = 1.
\]
As $\Lambda \subset_f [0, \infty)$ was arbitrary, we may conclude that
\[
\sum_{r \ge 0} \mu(S(x, r)) \le 1 < \infty
\]
and therefore the set $Q_x := \{r \ge 0 : \mu(S(x, r)) > 0\}$ is at most countable. If $B(x, r) := \{y \in S : \rho(x, y) < r\}$ and $C(x, r) := \{y \in S : \rho(x, y) \le r\}$ are the open and closed $r$-balls about $x$ respectively, we have $S(x, r) = C(x, r) \setminus B(x, r)$. As
\[
\operatorname{bd}(B(x, r)) = \overline{B(x, r)} \setminus B(x, r)
\subset C(x, r) \setminus B(x, r) = S(x, r),
\]
it follows that $B(x, r)$ is a continuity set for all $r \notin Q_x$. With these preparations in hand we are now ready to complete the proof.

Let $\{x_n\}_{n=1}^{\infty}$ be a countable dense subset of $S$ and let $Q := \cup_{n=1}^{\infty} Q_{x_n}$, a countable subset of $[0, \infty)$. Choose $r \in [0, \infty) \setminus Q$ such that $r \le \varepsilon/2$ and then define
\[
B_n := B(x_n, r) \setminus [B(x_1, r) \cup \dots \cup B(x_{n-1}, r)].
\]
It is clear that $\{B_n\}_{n=1}^{\infty} \subset \mathcal{B}$ is a partition of $S$ with $\operatorname{diam}(B_n) \le 2r \le \varepsilon$. Moreover, we know that
\[
\operatorname{bd}(B_n) \subset \operatorname{bd}(B(x_n, r)) \cup \operatorname{bd}(B(x_1, r) \cup \dots \cup B(x_{n-1}, r))
\subset \cup_{k=1}^{n} \operatorname{bd}(B(x_k, r)) \subset \cup_{k=1}^{n} S(x_k, r)
\]
and therefore, as $r \notin Q$, we have
\[
\mu(\operatorname{bd}(B_n)) \le \sum_{k=1}^{n} \mu(S(x_k, r)) = 0,
\]
so that $B_n$ is a continuity set for each $n \in \mathbb{N}$.
We are now ready to prove Skorohod's Theorem 21.58.

Proof. (of Skorohod's Theorem 21.58) We will be following the proof in Kallenberg [28, Theorem 4.30 on page 79]. In this proof we will be using an auxiliary probability space, $(\Omega_0, \mathcal{B}_0, P_0)$, which is sufficiently rich so as to support the collection of independent random variables needed in the proof.$^7$ The final probability space will then be given by $(\Omega, \mathcal{B}, P) = (\Omega_0 \times S, \mathcal{B}_0 \otimes \mathcal{B}_S, P_0 \otimes \mu_0)$ and the random variable $Y$ will be defined by $Y(\omega, x) := x$ for all $(\omega, x) \in \Omega$. Let us now start the proof.
Given $p \in \mathbb{N}$, use Lemma 21.64 to construct a partition, $\{B_n\}_{n=1}^{\infty}$, of $S$ such that $\operatorname{diam}(B_n) < 2^{-p}$ and $\mu_0(\operatorname{bd}(B_n)) = 0$ for all $n$. Choose $m$ sufficiently large so that $\mu_0\left(\cup_{n=m+1}^{\infty} B_n\right) < 2^{-p}$ and let $B_0 := \cup_{n=m+1}^{\infty} B_n$, so that $\{B_k\}_{k=0}^{m}$ is a partition of $S$. Now define
\[
\kappa := \sum_{k=0}^{m} k\, 1_{B_k}(Y) = \sum_{k=0}^{m} k\, 1_{Y \in B_k}
\]
and let $\vartheta$ be a random variable on $\Omega$ which is independent of $Y$ and has the uniform distribution on $[0, 1]$. For each $n \in \mathbb{N}$, the Prenatal Skorohod Theorem 21.28 implies there exists $\kappa_n : (0, 1) \times \{0, \dots, m\} \to \{0, \dots, m\}$ such that $\kappa_n(\vartheta, k) = k$ when $\vartheta \le \mu_n(B_k)/\mu_0(B_k)$ and $\operatorname{Law}(\kappa_n(\vartheta, \kappa)) = \{\mu_n(B_k)\}_{k=0}^{m}$. Now let $\kappa_n := \kappa_n(\vartheta, \kappa)$, so that $P(\kappa_n = k) = \mu_n(B_k)$ for all $n \in \mathbb{N}$ and $0 \le k \le m$, and $\kappa_n = \kappa$ when $\vartheta \le \mu_n(B_{\kappa})/\mu_0(B_{\kappa})$. Since $\mu_0(\operatorname{bd}(B_k)) = 0$ for all $k$, it follows that $\mu_n(B_k) \to \mu_0(B_k)$ for all $0 \le k \le m$ and therefore $\lim_{n \to \infty} \kappa_n = \kappa$, $P$-a.s.
Now choose $\xi_n^k$ independent of everything else such that $P\left(\xi_n^k \in A\right) = \mu_n(A | B_k)$ for all $n$ and $0 \le k \le m$. Then define
\[
Y_n^p := \xi_n^{\kappa_n(\vartheta, \kappa)} = \sum_{k=0}^{m} 1_{\kappa_n(\vartheta, \kappa) = k}\, \xi_n^k.
\]
$^7$ An examination of the proof will show that $\Omega_0$ can be taken to be $(0, 1) \times S^{\mathbb{N}}$ equipped with a well chosen infinite product measure.
Notice that
\[
P(Y_n^p \in A)
= \sum_{k=0}^{m} P\left(\xi_n^k \in A \text{ and } \kappa_n(\vartheta, \kappa) = k\right)
= \sum_{k=0}^{m} \mu_n(A | B_k)\, \mu_n(B_k)
= \sum_{k=0}^{m} \mu_n(A \cap B_k) = \mu_n(A),
\]
and
\[
\{\rho(Y_n^p, Y) > 2^{-p}\} \subset \{Y \in B_0\} \cup \{\kappa_n \ne \kappa\},
\]
so that
\[
P\left(\cup_{n \ge N} \{\rho(Y_n^p, Y) > 2^{-p}\}\right)
\le P(Y \in B_0) + P(\cup_{n \ge N} \{\kappa_n \ne \kappa\})
< 2^{-p} + P(\cup_{n \ge N} \{\kappa_n \ne \kappa\}).
\]
Since $\kappa_n \to \kappa$ a.s., it follows that
\[
0 = P(\{\kappa_n \ne \kappa \text{ i.o. } n\})
= \lim_{N \to \infty} P(\cup_{n \ge N} \{\kappa_n \ne \kappa\})
\]
and so there exists $n_p < \infty$ such that
\[
P\left(\cup_{n \ge n_p} \{\rho(Y_n^p, Y) > 2^{-p}\}\right) < 2^{-p}.
\]
To finish the proof, construct $\{Y_n^p\}_{n=1}^{\infty}$ and $n_p \in \mathbb{N}$ as above for each $p \in \mathbb{N}$. By replacing $n_p$ by $\sum_{i=1}^{p} n_i$ if necessary, we may assume that $n_1 < n_2 < n_3 < \dots$. As
\[
\sum_p P\left(\cup_{n \ge n_p} \{\rho(Y_n^p, Y) > 2^{-p}\}\right) < \sum_p 2^{-p} < \infty,
\]
it follows from the first Borel-Cantelli lemma that $P(N) = 0$ where
\[
N := \left\{\cup_{n \ge n_p} \{\rho(Y_n^p, Y) > 2^{-p}\} \text{ i.o. } p\right\}.
\]
So off the null set $N$ we have $\rho(Y_n^p, Y) \le 2^{-p}$ for all $n \ge n_p$ and a.a. $p$. We now define $\{Y_n\}_{n=1}^{\infty}$ by
\[
Y_n := Y_n^p \text{ for } n_p \le n < n_{p+1} \text{ and } p \in \mathbb{N}.
\]
Then by construction we have $\operatorname{Law}(Y_n) = \mu_n$ for all $n$ and $\rho(Y_n, Y) \to 0$ a.s., so we may take $Y_0 := Y$ (note $\operatorname{Law}(Y) = \mu_0$).
21.9.3 Proof of the Portmanteau Theorem 21.59

Proof. (of Proposition 21.59) 1. $\implies$ 2. is obvious.
For 2. $\implies$ 3., let
\[
\varphi(t) := \begin{cases} 1 & \text{if } t \le 0 \\ 1 - t & \text{if } 0 \le t \le 1 \\ 0 & \text{if } t \ge 1 \end{cases} \tag{21.45}
\]
and let $f_n(x) := \varphi(n \rho(x, F))$. Then $f_n \in BC(S, [0, 1])$ is uniformly continuous, $0 \le 1_F \le f_n$ for all $n$, and $f_n \downarrow 1_F$ as $n \to \infty$ (using that $F$ is closed). Passing to the limit $n \to \infty$ in the equation
\[
0 \le \mu_n(F) \le \mu_n(f_m)
\]
gives
\[
0 \le \limsup_{n \to \infty} \mu_n(F) \le \mu(f_m),
\]
and then letting $m \to \infty$ in this inequality implies item 3.

3. $\iff$ 4. Assuming item 3., let $F = G^c$; then
\[
1 - \liminf_{n \to \infty} \mu_n(G)
= \limsup_{n \to \infty} (1 - \mu_n(G))
= \limsup_{n \to \infty} \mu_n(G^c)
\le \mu(G^c) = 1 - \mu(G),
\]
which implies 4. Similarly 4. $\implies$ 3.
3. $\iff$ 5. Recall that $\operatorname{bd}(A) = \bar{A} \setminus A^{\circ}$, so if $\mu(\operatorname{bd}(A)) = 0$ and 3. (and hence also 4.) holds, we have
\[
\limsup_{n \to \infty} \mu_n(A) \le \limsup_{n \to \infty} \mu_n(\bar{A}) \le \mu(\bar{A}) = \mu(A) \quad \text{and}
\]
\[
\liminf_{n \to \infty} \mu_n(A) \ge \liminf_{n \to \infty} \mu_n(A^{\circ}) \ge \mu(A^{\circ}) = \mu(A),
\]
from which it follows that $\lim_{n \to \infty} \mu_n(A) = \mu(A)$. Conversely, let $F \subset S$ be closed and set $F_{\delta} := \{x \in S : \rho(x, F) \le \delta\}$.$^8$ Then
\[
\operatorname{bd}(F_{\delta}) \subset F_{\delta} \setminus \{x \in S : \rho(x, F) < \delta\} = A_{\delta}
\]
where $A_{\delta} := \{x \in S : \rho(x, F) = \delta\}$. Since $\{A_{\delta}\}_{\delta > 0}$ are all disjoint, we must have
\[
\sum_{\delta > 0} \mu(A_{\delta}) \le \mu(S) \le 1
\]
and in particular the set $\Lambda := \{\delta > 0 : \mu(A_{\delta}) > 0\}$ is at most countable. Let $\delta_m \notin \Lambda$ be chosen so that $\delta_m \downarrow 0$ as $m \to \infty$; then
\[
\mu(F_{\delta_m}) = \lim_{n \to \infty} \mu_n(F_{\delta_m}) \ge \limsup_{n \to \infty} \mu_n(F).
\]
Let $m \to \infty$ in this equation to conclude $\mu(F) \ge \limsup_{n \to \infty} \mu_n(F)$, as desired.
$^8$ We let $\rho(x, F) := \inf\{\rho(x, y) : y \in F\}$, so that $\rho(x, F)$ is the distance of $x$ from $F$. Recall that $\rho(\cdot, F) : S \to [0, \infty)$ is a continuous map.
To finish the proof it suffices to show 5. $\implies$ 1., which is easily done using Skorohod's Theorem 21.58 just as was done in the proof of Theorem 21.29. For those not wanting to use Skorohod's theorem, we also provide a direct proof that 3. $\implies$ 1.

Alternate finish to the proof (3. $\implies$ 1.). By an affine change of variables it suffices to consider $f \in C(S, (0, 1))$, in which case we have
\[
\sum_{i=1}^{k} \frac{i-1}{k} 1_{\left\{\frac{i-1}{k} \le f < \frac{i}{k}\right\}}
\le f \le
\sum_{i=1}^{k} \frac{i}{k} 1_{\left\{\frac{i-1}{k} \le f < \frac{i}{k}\right\}}. \tag{21.46}
\]
Let $F_i := \left\{\frac{i}{k} \le f\right\}$ and notice that $F_k = \emptyset$. Then for any probability measure $\mu$,
\[
\sum_{i=1}^{k} \frac{i-1}{k} \left[\mu(F_{i-1}) - \mu(F_i)\right]
\le \mu(f) \le
\sum_{i=1}^{k} \frac{i}{k} \left[\mu(F_{i-1}) - \mu(F_i)\right]. \tag{21.47}
\]
Since
\[
\sum_{i=1}^{k} \frac{i-1}{k} \left[\mu(F_{i-1}) - \mu(F_i)\right]
= \sum_{i=1}^{k} \frac{i-1}{k} \mu(F_{i-1}) - \sum_{i=1}^{k} \frac{i-1}{k} \mu(F_i)
= \sum_{i=1}^{k-1} \frac{i}{k} \mu(F_i) - \sum_{i=1}^{k} \frac{i-1}{k} \mu(F_i)
= \frac{1}{k} \sum_{i=1}^{k-1} \mu(F_i)
\]
and
\[
\sum_{i=1}^{k} \frac{i}{k} \left[\mu(F_{i-1}) - \mu(F_i)\right]
= \sum_{i=1}^{k} \frac{i-1}{k} \left[\mu(F_{i-1}) - \mu(F_i)\right]
+ \sum_{i=1}^{k} \frac{1}{k} \left[\mu(F_{i-1}) - \mu(F_i)\right]
\le \frac{1}{k} \sum_{i=1}^{k-1} \mu(F_i) + \frac{1}{k},
\]
Eq. (21.47) becomes
\[
\frac{1}{k} \sum_{i=1}^{k-1} \mu(F_i) \le \mu(f)
\le \frac{1}{k} \sum_{i=1}^{k-1} \mu(F_i) + \frac{1}{k}.
\]
Using this equation with $\mu_n$ in place of $\mu$ and then as stated, along with item 3. applied to the closed sets $F_i$, we find
\[
\limsup_{n \to \infty} \mu_n(f)
\le \limsup_{n \to \infty} \left[\frac{1}{k} \sum_{i=1}^{k-1} \mu_n(F_i) + \frac{1}{k}\right]
\le \frac{1}{k} \sum_{i=1}^{k-1} \mu(F_i) + \frac{1}{k}
\le \mu(f) + \frac{1}{k}.
\]
Since $k$ is arbitrary, $\limsup_{n \to \infty} \mu_n(f) \le \mu(f)$. Replacing $f$ by $1 - f$ in this inequality also gives $\liminf_{n \to \infty} \mu_n(f) \ge \mu(f)$, and hence we have shown $\lim_{n \to \infty} \mu_n(f) = \mu(f)$ as claimed.
21.9.4 Proof of Prokhorov's compactness Theorem 21.61

The following proof relies on results not proved in these notes up to this point. The missing results may be found by searching for the Riesz-Markov Theorem in the notes at
http://www.math.ucsd.edu/bdriver/240A-C-03-04/240 lecture notes.htm.

Proof. (of Prokhorov's compactness Theorem 21.61) First suppose that $S$ is compact. In this case $C(S)$ is a Banach space which is separable by the Stone-Weierstrass theorem, see Exercise ?? in the analysis notes. By the Riesz theorem, Corollary ?? of the analysis notes, we know that $C(S)^*$ is in one to one correspondence with the complex measures on $(S, \mathcal{B}_S)$. We have also seen that the unit ball in $C(S)^*$ is metrizable and weak-* compact, see Theorem ?? of the analysis notes. Hence there exists a subsequence $\{\mu_{n_k}\}_{k=1}^{\infty}$ which is weak-* convergent to a probability measure $\mu$ on $S$. Alternatively, use Cantor's diagonalization procedure on a countable dense set $\Gamma \subset C(S)$ to find $\{\mu_{n_k}\}_{k=1}^{\infty}$ such that $\Lambda(f) := \lim_{k \to \infty} \mu_{n_k}(f)$ exists for all $f \in \Gamma$. Then for $g \in C(S)$ and $f \in \Gamma$, we have
\[
|\mu_{n_k}(g) - \mu_{n_l}(g)|
\le |\mu_{n_k}(g) - \mu_{n_k}(f)| + |\mu_{n_k}(f) - \mu_{n_l}(f)| + |\mu_{n_l}(f) - \mu_{n_l}(g)|
\le 2 \|g - f\|_{\infty} + |\mu_{n_k}(f) - \mu_{n_l}(f)|,
\]
which shows
\[
\limsup_{k, l \to \infty} |\mu_{n_k}(g) - \mu_{n_l}(g)| \le 2 \|g - f\|_{\infty}.
\]
Letting $f \in \Gamma$ tend to $g$ in $C(S)$ shows $\limsup_{k, l \to \infty} |\mu_{n_k}(g) - \mu_{n_l}(g)| = 0$, and hence $\Lambda(g) := \lim_{k \to \infty} \mu_{n_k}(g)$ exists for all $g \in C(S)$. It is now clear that $\Lambda(g) \ge 0$ for all $g \ge 0$, so that $\Lambda$ is a positive linear functional on $C(S)$ and thus there is a probability measure $\mu$ such that $\Lambda(g) = \mu(g)$.
General case. By Theorem 9.64 we may assume that $S$ is a subset of a compact metric space which we will denote by $\bar{S}$. We now extend $\mu_n$ to $\bar{S}$ by
setting $\bar{\mu}_n(A) := \mu_n(A \cap S)$ for all $A \in \mathcal{B}_{\bar{S}}$. By what we have just proved, there is a subsequence $\{\bar{\mu}_k' := \bar{\mu}_{n_k}\}_{k=1}^{\infty}$ such that $\bar{\mu}_k'$ converges weakly to a probability measure $\bar{\mu}$ on $\bar{S}$. The main thing we now have to prove is that $\bar{\mu}(S) = 1$; this is where the tightness assumption is going to be used. Given $\varepsilon > 0$, let $K_{\varepsilon} \subset S$ be a compact set such that $\mu_n(K_{\varepsilon}) \ge 1 - \varepsilon$ for all $n$. Since $K_{\varepsilon}$ is compact in $S$ it is compact in $\bar{S}$ as well, and in particular a closed subset of $\bar{S}$. Therefore by Proposition 21.59,
\[
\bar{\mu}(K_{\varepsilon}) \ge \limsup_{k \to \infty} \bar{\mu}_k'(K_{\varepsilon}) \ge 1 - \varepsilon.
\]
Since $\varepsilon > 0$ is arbitrary, this shows that $S_0 := \cup_{n=1}^{\infty} K_{1/n}$ satisfies $\bar{\mu}(S_0) = 1$. Because $S_0 \in \mathcal{B}_S \cap \mathcal{B}_{\bar{S}}$, we may view $\bar{\mu}$ as a measure on $\mathcal{B}_S$ by letting $\mu(A) := \bar{\mu}(A \cap S_0)$ for all $A \in \mathcal{B}_S$. Given a closed subset $F \subset S$, choose $\bar{F} \subset \bar{S}$ closed such that $F = \bar{F} \cap S$. Then
\[
\limsup_{k \to \infty} \mu_{n_k}(F)
= \limsup_{k \to \infty} \bar{\mu}_k'(\bar{F})
\le \bar{\mu}(\bar{F}) = \bar{\mu}(\bar{F} \cap S_0) = \mu(F),
\]
which shows $\mu_{n_k} \implies \mu$.
Converse direction. Suppose now that (S, ) is complete and is a se-
quentially compact subset of the probability measures on (S, B
S
) . We rst will
prove if G
n

n=1
is a sequence of open subsets of S such that G
n
S, then
c := sup
n
inf

(G
n
) = lim
n
inf

(G
n
) = 1.
Suppose for sake of contradiction that c < 1 and let c
t
(c, 1) . By our assump-
tion we have inf

(G
n
) c for all n therefore there exists
n
such that

n
(G
n
) c
t
for all n N. By passing to a subsequence of n and correspond-
ing subsequence G
t
n
of the G
n
, we may assume that
n
:=
kn
= for
some probability measure on S and
n
(G
t
n
) c
t
for all n where G
t
n
S as
n . For xed N N we have
n
(G
t
N
)
n
(G
t
n
) c
t
for n N. Passing to
the limit as n in these inequalities then implies
(G
t
N
) liminf
n

n
(G
t
N
) c
t
< 1.
However this is absurd since (G
t
N
) 1 as N since is a probability
measure on S and G
t
N
S as N .
We may now nish the proof as follows. Let > 0 be given and let
x
k

k=1
be a countable dense subset of S. For each m N the open sets
G
n
:=
n
k=1
B
_
x
k
,
1
m
_
S and so by the above claim there exists n
m
such
V
m
:= G
nm
satises inf
k

k
(V
m
) 1 2
m
. We now let A :=
m
V
m
so that

k
(A) 1 for all k. As A is totally bounded and S is complete, K

:=

A is
the desired compact subset of S such that
k
(K

) 1 for all k.
22 Characteristic Functions (Fourier Transform)

Notation 22.1 Given a measure $\mu$ on a measurable space $(\Omega,\mathcal{B})$ and a function $f \in L^1(\mu)$, we will often write $\mu(f)$ for $\int_\Omega f\,d\mu$.

Let us recall Definition 8.10 here.

Definition 22.2. Given a probability measure $\mu$ on $(\mathbb{R}^n,\mathcal{B}_{\mathbb{R}^n})$, let
$\hat\mu(\lambda) := \int_{\mathbb{R}^n} e^{i\lambda\cdot x}\,d\mu(x)$
be the Fourier transform or characteristic function of $\mu$. If $X = (X_1,\dots,X_n) : \Omega \to \mathbb{R}^n$ is a random vector on some probability space $(\Omega,\mathcal{B},P)$, then we let $f(\lambda) := f_X(\lambda) := \mathbb{E}[e^{i\lambda\cdot X}]$. Of course, if $\mu := P \circ X^{-1}$, then $f_X(\lambda) = \hat\mu(\lambda)$.

From Corollary 8.11 we know that if $\mu$ and $\nu$ are two probability measures on $(\mathbb{R}^n,\mathcal{B}_{\mathbb{R}^n})$ such that $\hat\mu = \hat\nu$, then $\mu = \nu$, i.e. the Fourier transform map is injective. In this chapter we are going to, among other things, characterize those functions which are characteristic functions and we will also construct an inversion formula.
22.1 Basic Properties of the Characteristic Function

Definition 22.3. A function $f : \mathbb{R}^n \to \mathbb{C}$ is said to be positive definite iff $f(-\lambda) = \overline{f(\lambda)}$ for all $\lambda \in \mathbb{R}^n$ and, for all $m \in \mathbb{N}$ and $\{\lambda_j\}_{j=1}^m \subset \mathbb{R}^n$, the matrix $\left[f(\lambda_j - \lambda_k)\right]_{j,k=1}^m$ is non-negative. More explicitly we require
$\sum_{j,k=1}^m f(\lambda_j - \lambda_k)\,\xi_j\bar\xi_k \ge 0$ for all $(\xi_1,\dots,\xi_m) \in \mathbb{C}^m$.

Notation 22.4 For $l \in \mathbb{N}_0$, let $C^l(\mathbb{R}^n,\mathbb{C})$ denote the vector space of functions $f : \mathbb{R}^n \to \mathbb{C}$ which are $l$-times continuously differentiable. More explicitly, if $\partial_j := \frac{\partial}{\partial x_j}$, then $f \in C^l(\mathbb{R}^n,\mathbb{C})$ iff the partial derivatives $\partial_{j_1}\dots\partial_{j_k} f$ exist and are continuous for $k = 1,2,\dots,l$ and all $j_1,\dots,j_k \in \{1,2,\dots,n\}$.
Proposition 22.5 (Basic Properties of $\hat\mu$). Let $\mu$ and $\nu$ be two probability measures on $(\mathbb{R}^n,\mathcal{B}_{\mathbb{R}^n})$. Then:
1. $\hat\mu(0) = 1$, and $|\hat\mu(\lambda)| \le 1$ for all $\lambda$.
2. $\hat\mu(\lambda)$ is continuous.
3. $\hat\mu(-\lambda) = \overline{\hat\mu(\lambda)}$ for all $\lambda \in \mathbb{R}^n$ and in particular, $\hat\mu$ is real valued iff $\mu$ is symmetric, i.e. iff $\mu(-A) = \mu(A)$ for all $A \in \mathcal{B}_{\mathbb{R}^n}$. (If $\mu = P \circ X^{-1}$ for some random vector $X$, then $\mu$ is symmetric iff $X \stackrel{d}{=} -X$.)
4. $\hat\mu$ is a positive definite function. (Bochner's Theorem 22.43 below asserts that if $f$ is a function satisfying the properties of $\hat\mu$ in items 1. - 4. above, then $f = \hat\mu$ for some probability measure $\mu$.)
5. If $\int_{\mathbb{R}^n}|x|^l\,d\mu(x) < \infty$, then $\hat\mu \in C^l(\mathbb{R}^n,\mathbb{C})$ and
$\partial_{j_1}\dots\partial_{j_m}\hat\mu(\lambda) = \int_{\mathbb{R}^n}(ix_{j_1})\cdots(ix_{j_m})\,e^{i\lambda\cdot x}\,d\mu(x)$ for all $m \le l$.
6. If $X$ and $Y$ are independent random vectors then
$f_{X+Y}(\lambda) = f_X(\lambda)\,f_Y(\lambda)$ for all $\lambda \in \mathbb{R}^n$.
This may be alternatively expressed as
$\widehat{\mu * \nu}(\lambda) = \hat\mu(\lambda)\,\hat\nu(\lambda)$ for all $\lambda \in \mathbb{R}^n$.
7. If $a \in \mathbb{R}$, $b \in \mathbb{R}^n$, and $X : \Omega \to \mathbb{R}^n$ is a random vector, then
$f_{aX+b}(\lambda) = e^{i\lambda\cdot b}\,f_X(a\lambda).$

Proof. The proofs of items 1., 2., 6., and 7. are elementary and will be left to the reader. It is also easy to see that $\hat\mu(-\lambda) = \overline{\hat\mu(\lambda)}$ and $\hat\mu(-\lambda) = \hat\mu(\lambda)$ if $\mu$ is symmetric. Therefore if $\mu$ is symmetric, then $\hat\mu(\lambda)$ is real. Conversely if $\hat\mu(\lambda)$ is real then
$\hat\mu(\lambda) = \overline{\hat\mu(\lambda)} = \hat\mu(-\lambda) = \int_{\mathbb{R}^n} e^{i\lambda\cdot x}\,d\nu(x) = \hat\nu(\lambda)$
where $\nu(A) := \mu(-A)$. The uniqueness Corollary 8.11 then implies $\mu = \nu$, i.e. $\mu$ is symmetric. This proves item 3.
Item 5. follows by induction using Corollary 7.30. For item 4., let $m \in \mathbb{N}$, $\{\lambda_j\}_{j=1}^m \subset \mathbb{R}^n$ and $(\xi_1,\dots,\xi_m) \in \mathbb{C}^m$. Then
$\sum_{j,k=1}^m \hat\mu(\lambda_j - \lambda_k)\,\xi_j\bar\xi_k = \int_{\mathbb{R}^n}\sum_{j,k=1}^m e^{i(\lambda_j-\lambda_k)\cdot x}\,\xi_j\bar\xi_k\,d\mu(x) = \int_{\mathbb{R}^n}\sum_{j,k=1}^m e^{i\lambda_j\cdot x}\xi_j\,\overline{e^{i\lambda_k\cdot x}\xi_k}\,d\mu(x) = \int_{\mathbb{R}^n}\Big|\sum_{j=1}^m e^{i\lambda_j\cdot x}\xi_j\Big|^2\,d\mu(x) \ge 0.$
Example 22.6 (Example 21.3 continued). Let $d\mu(x) = 1_{[0,1]}(x)\,dx$ and $\nu(A) = \mu(-A)$. Then
$\hat\mu(\lambda) = \int_0^1 e^{i\lambda x}\,dx = \frac{e^{i\lambda}-1}{i\lambda},$
$\hat\nu(\lambda) = \hat\mu(-\lambda) = \overline{\hat\mu(\lambda)} = \frac{e^{-i\lambda}-1}{-i\lambda},$ and
$\widehat{\mu*\nu}(\lambda) = \hat\mu(\lambda)\hat\nu(\lambda) = |\hat\mu(\lambda)|^2 = \left|\frac{e^{i\lambda}-1}{i\lambda}\right|^2 = \frac{2}{\lambda^2}\left[1-\cos\lambda\right].$
According to Example 21.3 we also have $d(\mu*\nu)(x) = (1-|x|)_+\,dx$ and so directly we find
$\widehat{\mu*\nu}(\lambda) = \int_\mathbb{R} e^{i\lambda x}(1-|x|)_+\,dx = \int_\mathbb{R}\cos(\lambda x)(1-|x|)_+\,dx = 2\int_0^1(1-x)\cos\lambda x\,dx = 2\int_0^1(1-x)\,d\frac{\sin\lambda x}{\lambda} = -2\int_0^1 d(1-x)\,\frac{\sin\lambda x}{\lambda} = 2\int_0^1\frac{\sin\lambda x}{\lambda}\,dx = -2\left.\frac{\cos\lambda x}{\lambda^2}\right|_{x=0}^{x=1} = 2\,\frac{1-\cos\lambda}{\lambda^2}.$
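The identity $\widehat{\mu*\nu}(\lambda) = |\hat\mu(\lambda)|^2 = 2(1-\cos\lambda)/\lambda^2$ from Example 22.6 is easy to confirm numerically. The following Python snippet (an added illustration, not part of the original derivation) compares the closed form with $|\hat\mu(\lambda)|^2$ and with a direct numerical integration of the triangle density:

```python
import numpy as np

def phi_uniform(lam):
    # Characteristic function of Uniform[0,1]: (e^{i lam} - 1)/(i lam)
    return (np.exp(1j * lam) - 1.0) / (1j * lam)

def phi_triangle(lam):
    # Closed form from Example 22.6: 2 (1 - cos lam) / lam^2
    return 2.0 * (1.0 - np.cos(lam)) / lam**2

lam = 1.7  # any nonzero test point
# mu * nu has characteristic function |phi_uniform|^2
assert abs(abs(phi_uniform(lam))**2 - phi_triangle(lam)) < 1e-12

# Cross-check against direct numerical integration of cos(lam x)(1-|x|)_+
x = np.linspace(-1.0, 1.0, 200001)
direct = np.sum(np.cos(lam * x) * (1.0 - np.abs(x))) * (x[1] - x[0])
assert abs(direct - phi_triangle(lam)) < 1e-6
```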
For the most part we are now going to stick to the one dimensional case, i.e. $X$ will be a random variable and $\mu$ will be a probability measure on $(\mathbb{R},\mathcal{B}_\mathbb{R})$. The following lemma is a special case of item 5. of Proposition 22.5.
Lemma 22.7. Suppose $n \in \mathbb{N}$ and $X$ is a random variable such that $\mathbb{E}[|X|^n] < \infty$. If $\mu = P \circ X^{-1}$ is the distribution of $X$, then $\hat\mu(\lambda) := \mathbb{E}[e^{i\lambda X}]$ is $C^n$-differentiable and
$\hat\mu^{(l)}(\lambda) = \mathbb{E}\left[(iX)^l e^{i\lambda X}\right] = \int_\mathbb{R}(ix)^l e^{i\lambda x}\,d\mu(x)$ for $l = 0,1,2,\dots,n$.
In particular it follows that
$\mathbb{E}\left[X^l\right] = \frac{\hat\mu^{(l)}(0)}{i^l}.$

The following theorem is a partial converse to this lemma. Hence the combination of Lemma 22.7 and Theorem 22.8 (see also Corollary 22.35 below) shows that there is a correspondence between the number of moments of $X$ and the differentiability of $f_X$.
Theorem 22.8. Let $X$ be a random variable, $m \in \{0,1,2,\dots\}$, and $f(\lambda) = \mathbb{E}[e^{i\lambda X}]$. If $f \in C^{2m}(\mathbb{R},\mathbb{C})$ is such that $g := f^{(2m)}$ is differentiable in a neighborhood of $0$ and $g''(0) = f^{(2m+2)}(0)$ exists, then $\mathbb{E}[X^{2m+2}] < \infty$ and $f \in C^{2m+2}(\mathbb{R},\mathbb{C})$.

Proof. This will be proved by induction on $m$. Let $m \in \mathbb{N}_0$ be given and suppose that
$u(\lambda) = \mathbb{E}\left[X^{2m}\cos(\lambda X)\right] = \operatorname{Re}\mathbb{E}\left[X^{2m}e^{i\lambda X}\right]$
is differentiable in a neighborhood of $0$ and further suppose that $u''(0)$ exists. Since $u$ is an even function of $\lambda$, $u'$ is an odd function of $\lambda$ near $0$ and therefore $u'(0) = 0$. By the mean value theorem, to each $\lambda > 0$ with $\lambda$ near $0$, there exists $0 < c_\lambda < \lambda$ such that
$\frac{u(\lambda) - u(0)}{\lambda} = u'(c_\lambda) = u'(c_\lambda) - u'(0)$
and so
$\frac{u(0)-u(\lambda)}{\lambda c_\lambda} = -\frac{u'(c_\lambda)-u'(0)}{c_\lambda} \to -u''(0)$ as $\lambda \downarrow 0$. (22.1)
Using Eq. (22.1) along with Fatou's lemma, we may pass to the limit as $\lambda \downarrow 0$ in the inequality
$\mathbb{E}\left[X^{2m}\frac{1-\cos(\lambda X)}{\lambda^2}\right] \le \mathbb{E}\left[X^{2m}\frac{1-\cos(\lambda X)}{\lambda c_\lambda}\right] = \frac{u(0)-u(\lambda)}{\lambda c_\lambda},$
to find
$\frac12\mathbb{E}\left[X^{2m+2}\right] = \liminf_{\lambda\downarrow 0}\mathbb{E}\left[X^{2m}\frac{1-\cos(\lambda X)}{\lambda^2}\right] \le \liminf_{\lambda\downarrow 0}\frac{u(0)-u(\lambda)}{\lambda c_\lambda} = -u''(0) < \infty.$
With this result in hand the theorem is now easily proved by induction. We start with $m = 0$ and recall from Proposition 22.5 that $f \in C(\mathbb{R},\mathbb{C})$. Assuming $f$ is differentiable in a neighborhood of $0$ and $f''(0)$ exists, we may apply the above result with $u = \operatorname{Re} f$ in order to learn $\mathbb{E}[X^2] < \infty$. An application of Lemma 22.7 then implies that $f \in C^2(\mathbb{R},\mathbb{C})$. The induction step is handled in much the same way upon noting
$f^{(2m)}(\lambda) = (-1)^m\mathbb{E}\left[X^{2m}e^{i\lambda X}\right]$ so that $u(\lambda) := (-1)^m\operatorname{Re}f^{(2m)}(\lambda) = \mathbb{E}\left[X^{2m}\cos(\lambda X)\right].$
Corollary 22.9. Suppose that $X$ is an $\mathbb{R}^d$-valued random vector such that for all $\lambda \in \mathbb{R}^d$ the function
$f_\lambda(t) := f_X(t\lambda) = \mathbb{E}\left[e^{it\lambda\cdot X}\right]$
is $2m$ times differentiable in a neighborhood of $t = 0$. Then $\mathbb{E}\|X\|^{2m} < \infty$ and $f_X \in C^{2m}(\mathbb{R}^d,\mathbb{C})$.

Proof. Applying Theorem 22.8 with $X$ replaced by $\lambda\cdot X$ shows that $\mathbb{E}\left[|\lambda\cdot X|^{2m}\right] < \infty$ for all $\lambda \in \mathbb{R}^d$. In particular, taking $\lambda = e_i$ (the $i$-th standard basis vector) implies that $\mathbb{E}\left[|X_i|^{2m}\right] < \infty$ for $1 \le i \le d$. So by Minkowski's inequality,
$\left\|\,\|X\|\,\right\|_{L^{2m}(P)} = \left\|\,\Big\|\sum_{i=1}^d X_i e_i\Big\|\,\right\|_{L^{2m}(P)} \le \left\|\sum_{i=1}^d |X_i|\,\|e_i\|\right\|_{L^{2m}(P)} \le \sum_{i=1}^d\|e_i\|\,\|X_i\|_{L^{2m}(P)} < \infty,$
i.e. $\mathbb{E}\|X\|^{2m} < \infty$. The fact that $f_X \in C^{2m}(\mathbb{R}^d,\mathbb{C})$ now follows from Proposition 22.5.
22.2 Examples

Example 22.10. If $-\infty < a < b < \infty$ and $d\mu(x) = \frac{1}{b-a}1_{[a,b]}(x)\,dx$, then
$\hat\mu(\lambda) = \frac{1}{b-a}\int_a^b e^{i\lambda x}\,dx = \frac{e^{i\lambda b}-e^{i\lambda a}}{i\lambda(b-a)}.$
If $a = -c$ and $b = c$ with $c > 0$, then
$\hat\mu(\lambda) = \frac{\sin\lambda c}{\lambda c}.$
Observe that
$\hat\mu(\lambda) = 1 - \frac{1}{3!}\lambda^2 c^2 + \dots$
and therefore $\hat\mu'(0) = 0$ and $\hat\mu''(0) = -\frac13 c^2$, and hence it follows that
$\int_\mathbb{R}x\,d\mu(x) = 0$ and $\int_\mathbb{R}x^2\,d\mu(x) = \frac{c^2}{3}.$
Example 22.11. Suppose $Z$ is a Poisson random variable with mean $a > 0$, i.e. $P(Z = n) = e^{-a}\frac{a^n}{n!}$. Then
$f_Z(\lambda) = \mathbb{E}\left[e^{i\lambda Z}\right] = e^{-a}\sum_{n=0}^\infty e^{i\lambda n}\frac{a^n}{n!} = e^{-a}\sum_{n=0}^\infty\frac{\left(ae^{i\lambda}\right)^n}{n!} = \exp\left(a\left(e^{i\lambda}-1\right)\right).$
Differentiating this result gives
$f_Z'(\lambda) = iae^{i\lambda}\exp\left(a\left(e^{i\lambda}-1\right)\right)$ and
$f_Z''(\lambda) = -\left(a^2e^{i2\lambda} + ae^{i\lambda}\right)\exp\left(a\left(e^{i\lambda}-1\right)\right)$
from which we conclude
$\mathbb{E}Z = \frac{1}{i}f_Z'(0) = a$ and $\mathbb{E}Z^2 = -f_Z''(0) = a^2 + a.$
Therefore, $\mathbb{E}Z = a = \mathrm{Var}(Z)$.
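The moment computations in Example 22.11 can be checked by differentiating $f_Z$ numerically; the snippet below (an added illustration, not part of the notes) uses central finite differences at $\lambda = 0$ to recover $\mathbb{E}Z = a$ and $\mathrm{Var}(Z) = a$:

```python
import numpy as np

def f_Z(lam, a):
    # Characteristic function of Poisson(a): exp(a (e^{i lam} - 1))
    return np.exp(a * (np.exp(1j * lam) - 1.0))

a = 2.5
h = 1e-5
# Central finite differences for f'(0) and f''(0)
d1 = (f_Z(h, a) - f_Z(-h, a)) / (2 * h)
d2 = (f_Z(h, a) - 2 * f_Z(0.0, a) + f_Z(-h, a)) / h**2

EZ = (d1 / 1j).real        # E Z   = f'(0)/i
EZ2 = (d2 / 1j**2).real    # E Z^2 = f''(0)/i^2
assert abs(EZ - a) < 1e-6
assert abs(EZ2 - (a**2 + a)) < 1e-3
assert abs((EZ2 - EZ**2) - a) < 1e-3   # Var(Z) = a
```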
Example 22.12. Suppose $T \stackrel{d}{=} \exp(a)$, i.e. $T \ge 0$ a.s. and $P(T \ge t) = e^{-at}$ for all $t \ge 0$. Recall that $\mu = \mathrm{Law}(T)$ is given by
$d\mu(t) = F_T'(t)\,dt = ae^{-at}1_{t\ge 0}\,dt.$
Therefore,
$\mathbb{E}\left[e^{i\lambda T}\right] = \int_0^\infty ae^{-at}e^{i\lambda t}\,dt = \frac{a}{a-i\lambda} = \hat\mu(\lambda).$
Since
$\hat\mu'(\lambda) = i\frac{a}{(a-i\lambda)^2}$ and $\hat\mu''(\lambda) = -2\frac{a}{(a-i\lambda)^3}$
it follows that
$\mathbb{E}T = \frac{\hat\mu'(0)}{i} = a^{-1}$ and $\mathbb{E}T^2 = \frac{\hat\mu''(0)}{i^2} = \frac{2}{a^2}$
and hence $\mathrm{Var}(T) = \frac{2}{a^2} - \left(\frac1a\right)^2 = a^{-2}$.
Example 22.13. From Exercise 7.15, if $d\mu(x) := \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx$, then $\hat\mu(\lambda) = e^{-\lambda^2/2}$ and we may deduce
$\int_\mathbb{R}x\,d\mu(x) = 0$ and $\int_\mathbb{R}x^2\,d\mu(x) = 1.$
Recall from Section 9.8 that we have defined a random vector $X \in \mathbb{R}^d$ to be Gaussian iff
$\mathbb{E}\left[e^{i\lambda\cdot X}\right] = \exp\left(-\frac12\mathrm{Var}(\lambda\cdot X) + i\mathbb{E}(\lambda\cdot X)\right).$
We define a probability measure $\mu$ on $(\mathbb{R}^d,\mathcal{B}_{\mathbb{R}^d})$ to be Gaussian iff there is a Gaussian random vector $X$ such that $\mathrm{Law}(X) = \mu$. This can be expressed directly in terms of $\hat\mu$ as: $\mu$ is Gaussian iff
$\hat\mu(\lambda) = \exp\left(-\frac12 q(\lambda,\lambda) + i\lambda\cdot m\right)$ for all $\lambda \in \mathbb{R}^d$
where
$m := \int_{\mathbb{R}^d}x\,d\mu(x)$ and $q(\lambda,\lambda) := \int_{\mathbb{R}^d}(\lambda\cdot x)^2\,d\mu(x) - (\lambda\cdot m)^2.$
Example 22.14. If $\mu$ is a probability measure on $(\mathbb{R},\mathcal{B}_\mathbb{R})$ and $n \in \mathbb{N}$, then $\hat\mu^n$ is the characteristic function of a probability measure, namely the measure
$\mu^{*n} := \underbrace{\mu * \dots * \mu}_{n\text{ times}}.$ (22.2)
Alternatively put, if $\{X_k\}_{k=1}^n$ are i.i.d. random variables with $\mu = P \circ X_k^{-1}$, then
$f_{X_1+\dots+X_n}(\lambda) = f_{X_1}^n(\lambda).$
Example 22.15. Suppose that $\{\mu_n\}_{n=0}^\infty$ are probability measures on $(\mathbb{R},\mathcal{B}_\mathbb{R})$ and $\{p_n\}_{n=0}^\infty \subset [0,1]$ such that $\sum_{n=0}^\infty p_n = 1$. Then $\sum_{n=0}^\infty p_n\hat\mu_n$ is the characteristic function of the probability measure
$\mu := \sum_{n=0}^\infty p_n\mu_n.$
Here is a more interesting interpretation of $\mu$. Let $\{X_n\}_{n=0}^\infty \cup \{T\}$ be independent random variables with $P \circ X_n^{-1} = \mu_n$ and $P(T = n) = p_n$ for all $n \in \mathbb{N}_0$. Then $\mu(A) = P(X_T \in A)$, where $X_T(\omega) := X_{T(\omega)}(\omega)$. Indeed,
$\mu(A) = P(X_T \in A) = \sum_{n=0}^\infty P(X_T \in A,\,T = n) = \sum_{n=0}^\infty P(X_n \in A,\,T = n) = \sum_{n=0}^\infty P(X_n \in A)\,P(T = n) = \sum_{n=0}^\infty p_n\mu_n(A).$
Let us also observe that
$\hat\mu(\lambda) = \mathbb{E}\left[e^{i\lambda X_T}\right] = \sum_{n=0}^\infty\mathbb{E}\left[e^{i\lambda X_T} : T = n\right] = \sum_{n=0}^\infty\mathbb{E}\left[e^{i\lambda X_n} : T = n\right] = \sum_{n=0}^\infty\mathbb{E}\left[e^{i\lambda X_n}\right]P(T = n) = \sum_{n=0}^\infty p_n\hat\mu_n(\lambda).$
Example 22.16. If $\mu$ is a probability measure on $(\mathbb{R},\mathcal{B}_\mathbb{R})$ then $\sum_{n=0}^\infty p_n\hat\mu^n$ is the characteristic function of a probability measure $\nu$ on $(\mathbb{R},\mathcal{B}_\mathbb{R})$. In this case $\nu = \sum_{n=0}^\infty p_n\mu^{*n}$ where $\mu^{*n}$ is defined in Eq. (22.2) (with $\mu^{*0} := \delta_0$). As an explicit example, if $a > 0$ and $p_n = \frac{a^n}{n!}e^{-a}$, then
$\sum_{n=0}^\infty p_n\hat\mu^n = \sum_{n=0}^\infty\frac{a^n}{n!}e^{-a}\hat\mu^n = e^{-a}e^{a\hat\mu} = e^{a(\hat\mu - 1)}$
is the characteristic function of a probability measure. In other words, if $\{X_k\}_{k=1}^\infty$ are i.i.d. with law $\mu$, $T$ is an independent Poisson random variable with mean $a$, and $S_T := X_1 + \dots + X_T$ (with $S_0 := 0$), then
$f_{S_T}(\lambda) = \mathbb{E}\left[e^{i\lambda S_T}\right] = \exp\left(a\left(f_{X_1}(\lambda) - 1\right)\right).$
22.3 Continuity Theorem

Lemma 22.17 (Tail Estimate). Let $X : (\Omega,\mathcal{B},P) \to \mathbb{R}$ be a random variable and $f_X(\lambda) := \mathbb{E}[e^{i\lambda X}]$ be its characteristic function. Then for $a > 0$,
$P(|X| \ge a) \le \frac{a}{2}\int_{-2/a}^{2/a}(1 - f_X(\lambda))\,d\lambda = \frac{a}{2}\int_{-2/a}^{2/a}(1 - \operatorname{Re}f_X(\lambda))\,d\lambda.$ (22.3)

Proof. Recall that the Fourier transform of the uniform distribution on $[-c,c]$ is $\frac{\sin\lambda c}{\lambda c}$ and hence
$\frac{1}{2c}\int_{-c}^c f_X(\lambda)\,d\lambda = \frac{1}{2c}\int_{-c}^c\mathbb{E}\left[e^{i\lambda X}\right]d\lambda = \mathbb{E}\left[\frac{\sin cX}{cX}\right].$
Therefore,
$\frac{1}{2c}\int_{-c}^c(1 - f_X(\lambda))\,d\lambda = 1 - \mathbb{E}\left[\frac{\sin cX}{cX}\right] = \mathbb{E}[Y_c]$ (22.4)
where
$Y_c := 1 - \frac{\sin cX}{cX}.$
Notice that $Y_c \ge 0$ (see Eq. (22.49)) and moreover $Y_c \ge 1/2$ if $|cX| \ge 2$ (as $|\sin cX|/|cX| \le |\sin cX|/2 \le 1/2$ if $|cX| \ge 2$). Hence we may conclude
$\mathbb{E}[Y_c] \ge \mathbb{E}[Y_c : |cX| \ge 2] \ge \mathbb{E}\left[\tfrac12 : |cX| \ge 2\right] = \tfrac12 P(|X| \ge 2/c).$
Combining this estimate with Eq. (22.4) shows
$\frac{1}{2c}\int_{-c}^c(1 - f_X(\lambda))\,d\lambda \ge \tfrac12 P(|X| \ge 2/c).$
Taking $c = 2/a$ in this estimate proves Eq. (22.3).
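As an illustration of Lemma 22.17 (added here, not part of the notes), take $X \stackrel{d}{=} \exp(1)$, for which $\operatorname{Re}f_X(\lambda) = 1/(1+\lambda^2)$ and $P(|X| \ge a) = e^{-a}$; the tail bound (22.3) can then be checked by numerical integration:

```python
import numpy as np

# Tail estimate check for X ~ exp(1): f_X(lam) = 1/(1 - i lam),
# so Re f_X(lam) = 1/(1 + lam^2), and P(|X| >= a) = e^{-a}.
a = 1.5
lam = np.linspace(-2.0 / a, 2.0 / a, 100001)
integrand = 1.0 - 1.0 / (1.0 + lam**2)   # 1 - Re f_X(lam)
h = lam[1] - lam[0]
# Trapezoid rule for the integral in Eq. (22.3)
bound = (a / 2.0) * (np.sum(integrand) - 0.5 * (integrand[0] + integrand[-1])) * h
tail = np.exp(-a)
assert tail <= bound + 1e-9   # Eq. (22.3) holds (here bound ~ 0.61 vs tail ~ 0.22)
```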
Exercise 22.1. Suppose now $X : (\Omega,\mathcal{B},P) \to \mathbb{R}^d$ is a random vector and $f_X(\lambda) := \mathbb{E}[e^{i\lambda\cdot X}]$ is its characteristic function. Show for $a > 0$,
$P(|X|_\infty \ge a) \le 2\left(\frac{a}{4}\right)^d\int_{[-2/a,2/a]^d}(1 - f_X(\lambda))\,d\lambda = 2\left(\frac{a}{4}\right)^d\int_{[-2/a,2/a]^d}(1 - \operatorname{Re}f_X(\lambda))\,d\lambda$ (22.5)
where $|X|_\infty = \max_i|X_i|$ and $d\lambda = d\lambda_1\cdots d\lambda_d$.

Solution to Exercise (22.1). Working as above, we have
$\left(\frac{1}{2c}\right)^d\int_{[-c,c]^d}\left(1 - e^{i\lambda\cdot X}\right)d\lambda = 1 - \prod_{j=1}^d\frac{\sin cX_j}{cX_j} =: Y_c,$ (22.6)
where as before, $Y_c \ge 0$ and $Y_c \ge 1/2$ if $c|X_j| \ge 2$ for some $j$, i.e. if $c|X|_\infty \ge 2$. Therefore taking expectations of Eq. (22.6) implies
$\left(\frac{1}{2c}\right)^d\int_{[-c,c]^d}(1 - f_X(\lambda))\,d\lambda = \mathbb{E}[Y_c] \ge \mathbb{E}[Y_c : |X|_\infty \ge 2/c] \ge \mathbb{E}\left[\tfrac12 : |X|_\infty \ge 2/c\right] = \tfrac12 P(|X|_\infty \ge 2/c).$
Taking $c = 2/a$ in this expression implies Eq. (22.5).
Theorem 22.18 (Continuity Theorem). Suppose that $\{\mu_n\}_{n=1}^\infty$ is a sequence of probability measures on $(\mathbb{R}^d,\mathcal{B}_{\mathbb{R}^d})$ and suppose that $f(\lambda) := \lim_{n\to\infty}\hat\mu_n(\lambda)$ exists for all $\lambda \in \mathbb{R}^d$. If $f$ is continuous at $\lambda = 0$, then $f$ is the characteristic function of a unique probability measure $\mu$ on $\mathcal{B}_{\mathbb{R}^d}$ and $\mu_n \Rightarrow \mu$ as $n \to \infty$.

Proof. I will give the proof when $d = 1$ and leave the straightforward extension to the $d$-dimensional case to the reader.

By the continuity of $f$ at $\lambda = 0$, for every $\varepsilon > 0$ we may choose $a_\varepsilon$ sufficiently large so that
$\frac12 a_\varepsilon\int_{-2/a_\varepsilon}^{2/a_\varepsilon}(1 - \operatorname{Re}f(\lambda))\,d\lambda \le \varepsilon/2.$
According to Lemma 22.17 and the DCT,
$\mu_n(\{x : |x| \ge a_\varepsilon\}) \le \frac12 a_\varepsilon\int_{-2/a_\varepsilon}^{2/a_\varepsilon}(1 - \operatorname{Re}\hat\mu_n(\lambda))\,d\lambda \to \frac12 a_\varepsilon\int_{-2/a_\varepsilon}^{2/a_\varepsilon}(1 - \operatorname{Re}f(\lambda))\,d\lambda \le \varepsilon/2$ as $n \to \infty$.
Hence $\mu_n(\{x : |x| \ge a_\varepsilon\}) \le \varepsilon$ for all sufficiently large $n$, say $n \ge N$. By increasing $a_\varepsilon$ if necessary we can assure that $\mu_n(\{x : |x| \ge a_\varepsilon\}) \le \varepsilon$ for all $n$ and hence $\Gamma := \{\mu_n\}_{n=1}^\infty$ is tight.

By Theorem 21.56, we may find a subsequence $\{\mu_{n_k}\}_{k=1}^\infty$ and a probability measure $\mu$ on $\mathcal{B}_\mathbb{R}$ such that $\mu_{n_k} \Rightarrow \mu$ as $k \to \infty$. Since $x \to e^{i\lambda x}$ is a bounded and continuous function, it follows that
$\hat\mu(\lambda) = \lim_{k\to\infty}\hat\mu_{n_k}(\lambda) = f(\lambda)$ for all $\lambda \in \mathbb{R}$,
that is $f$ is the characteristic function of a probability measure $\mu$.

We now claim that $\mu_n \Rightarrow \mu$ as $n \to \infty$. If not, we could find a bounded continuous function $g$ such that $\lim_{n\to\infty}\mu_n(g) \ne \mu(g)$ or equivalently, there would exist $\varepsilon > 0$ and a subsequence $\{\mu_k' := \mu_{n_k}\}$ such that
$|\mu(g) - \mu_k'(g)| \ge \varepsilon$ for all $k \in \mathbb{N}$.
However by Theorem 21.56 again, there is a further subsequence $\{\mu_l'' = \mu_{k_l}'\}$ of $\{\mu_k'\}$ such that $\mu_l'' \Rightarrow \nu$ for some probability measure $\nu$. Since $\hat\nu(\lambda) = \lim_{l\to\infty}\hat\mu_l''(\lambda) = f(\lambda) = \hat\mu(\lambda)$, it follows that $\mu = \nu$. This leads to a contradiction since
$\varepsilon \le \lim_{l\to\infty}|\mu(g) - \mu_l''(g)| = |\mu(g) - \nu(g)| = 0.$

Remark 22.19. One could also use Proposition 22.40 and Bochner's Theorem 22.43 below to conclude: if $f(\lambda) := \lim_{n\to\infty}\hat\mu_n(\lambda)$ exists and is continuous at $0$, then $f$ is the characteristic function of a probability measure. Indeed, the condition of a function being positive definite is preserved under taking pointwise limits.
Corollary 22.20. Suppose that $\{X_n\}_{n=1}^\infty \cup \{X\}$ are random vectors in $\mathbb{R}^d$. Then $X_n \Rightarrow X$ iff $\lim_{n\to\infty}\mathbb{E}\left[e^{i\lambda\cdot X_n}\right] = \mathbb{E}\left[e^{i\lambda\cdot X}\right]$ for all $\lambda \in \mathbb{R}^d$.

Proof. Since $f(x) := e^{i\lambda\cdot x}$ is in $BC(\mathbb{R}^d)$ for all $\lambda \in \mathbb{R}^d$, if $X_n \Rightarrow X$ then $\lim_{n\to\infty}\mathbb{E}\left[e^{i\lambda\cdot X_n}\right] = \mathbb{E}\left[e^{i\lambda\cdot X}\right]$. Conversely, if $\lim_{n\to\infty}\mathbb{E}\left[e^{i\lambda\cdot X_n}\right] = \mathbb{E}\left[e^{i\lambda\cdot X}\right]$ for all $\lambda \in \mathbb{R}^d$, let $\mu_n := \mathrm{Law}(X_n)$ and $\mu := \mathrm{Law}(X)$. Then $\hat\mu_n \to \hat\mu$ pointwise with $\hat\mu$ continuous, and so the continuity Theorem 22.18 shows $\mu_n \Rightarrow \mu$, which is equivalent to $X_n \Rightarrow X$.

The proof of the next corollary is a straightforward consequence of Corollary 22.20 used for dimension $d$ and dimension $1$.

Corollary 22.21. Suppose that $\{X_n\}_{n=1}^\infty \cup \{X\}$ are random vectors in $\mathbb{R}^d$. Then $X_n \Rightarrow X$ iff $\lambda\cdot X_n \Rightarrow \lambda\cdot X$ for all $\lambda \in \mathbb{R}^d$.
Lemma 22.22. If $\{\mu_n\}_{n=1}^\infty$ is a tight sequence of probability measures on $\mathbb{R}^d$, then the corresponding characteristic functions $\{\hat\mu_n\}_{n=1}^\infty$ are equicontinuous on $\mathbb{R}^d$.

Proof. By the tightness of the $\{\mu_n\}_{n=1}^\infty$, given $\varepsilon > 0$ there exists $M_\varepsilon < \infty$ such that $\mu_n\left(\mathbb{R}^d\setminus[-M_\varepsilon,M_\varepsilon]^d\right) \le \varepsilon$ for all $n$. Let $\lambda, h \in \mathbb{R}^d$, then
$|\hat\mu_n(\lambda+h) - \hat\mu_n(\lambda)| \le \int_{\mathbb{R}^d}\left|e^{ix\cdot(\lambda+h)} - e^{ix\cdot\lambda}\right|d\mu_n(x) = \int_{\mathbb{R}^d}\left|e^{ix\cdot h} - 1\right|d\mu_n(x) \le 2\varepsilon + \sup_{x\in[-M_\varepsilon,M_\varepsilon]^d}\left|e^{ix\cdot h} - 1\right|.$
Therefore it follows that
$\limsup_{h\to 0}\,\sup_n\,\sup_{\lambda\in\mathbb{R}^d}|\hat\mu_n(\lambda+h) - \hat\mu_n(\lambda)| \le 2\varepsilon$
and as $\varepsilon > 0$ was arbitrary the result follows.
Corollary 22.23 (Uniform Convergence). If $\mu_n \Rightarrow \mu$ as $n \to \infty$ then $\hat\mu_n(\lambda) \to \hat\mu(\lambda)$ uniformly on compact subsets of $\mathbb{R}$ ($\mathbb{R}^n$).

Proof. This is a consequence of Theorem 22.18, Lemma 22.22, and the Arzela-Ascoli Theorem 39.36. For completeness here is a sketch of the proof. Let $K$ be a compact subset of $\mathbb{R}$ ($\mathbb{R}^n$) and $\varepsilon > 0$ be given. Applying Lemma 22.22 to $\{\mu_n\} \cup \{\mu\}$ we know that there exists $\delta > 0$ such that
$\sup_\lambda|\hat\mu_n(\lambda+h) - \hat\mu_n(\lambda)| \le \varepsilon$ and $\sup_\lambda|\hat\mu(\lambda+h) - \hat\mu(\lambda)| \le \varepsilon$ (22.7)
whenever $\|h\| \le \delta$. Let $F \subset K$ be a finite set such that $K \subset \bigcup_{\lambda\in F}B(\lambda,\delta)$. Since we already know that $\hat\mu_n \to \hat\mu$ pointwise we will have
$\lim_{n\to\infty}\max_{\lambda\in F}|\hat\mu_n(\lambda) - \hat\mu(\lambda)| = 0.$
Since every point in $K$ is within $\delta$ of a point in $F$, we may use Eq. (22.7) to conclude that
$\sup_{\lambda\in K}|\hat\mu_n(\lambda) - \hat\mu(\lambda)| \le 2\varepsilon + \max_{\lambda\in F}|\hat\mu_n(\lambda) - \hat\mu(\lambda)|$
and therefore, $\limsup_{n\to\infty}\sup_{\lambda\in K}|\hat\mu_n(\lambda) - \hat\mu(\lambda)| \le 2\varepsilon$. As $\varepsilon > 0$ was arbitrary the result follows.
The following lemma will be needed before giving our first applications of the continuity theorem.

Lemma 22.24. Suppose that $\{z_n\}_{n=1}^\infty \subset \mathbb{C}$ satisfies $\lim_{n\to\infty}nz_n = \lambda \in \mathbb{C}$. Then
$\lim_{n\to\infty}(1+z_n)^n = e^\lambda.$

Proof. Since $nz_n \to \lambda$, it follows that $z_n \sim \frac{\lambda}{n} \to 0$ as $n \to \infty$ and therefore by Lemma 22.45 below, $(1+z_n) = e^{\ln(1+z_n)}$ and
$\ln(1+z_n) = z_n + O\left(z_n^2\right) = z_n + O\left(\frac{1}{n^2}\right).$
Therefore,
$(1+z_n)^n = \left(e^{\ln(1+z_n)}\right)^n = e^{n\ln(1+z_n)} = e^{n(z_n + O(1/n^2))} \to e^\lambda$ as $n \to \infty$.
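A quick numerical illustration of Lemma 22.24 (added here as a sanity check, not part of the notes): with $z_n = \lambda/n + 1/n^2$ we have $nz_n \to \lambda$, and one sees $(1+z_n)^n \to e^\lambda$ at roughly a $1/n$ rate:

```python
import cmath

# Lemma 22.24: if n z_n -> lam in C, then (1 + z_n)^n -> e^lam.
lam = 0.3 + 1.2j
for n in [10**3, 10**5]:
    z_n = lam / n + 1.0 / n**2      # n z_n = lam + 1/n -> lam
    approx = (1 + z_n)**n
    err = abs(approx - cmath.exp(lam))
    assert err < 10.0 / n           # error decays roughly like 1/n
```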
Proposition 22.25 (Weak Law of Large Numbers revisited). Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. integrable random variables. Then $\frac{S_n}{n}\overset{P}{\to}\mathbb{E}X_1 =: c$.

Proof. Let $f(\lambda) := f_{X_1}(\lambda) = \mathbb{E}\left[e^{i\lambda X_1}\right]$, in which case
$f_{S_n/n}(\lambda) = \left[f\left(\frac{\lambda}{n}\right)\right]^n.$
By Taylor's theorem (see Appendix 22.7), $f(\lambda) = 1 + \lambda k(\lambda)$ where
$\lim_{\lambda\to 0}k(\lambda) = k(0) = f'(0) = i\mathbb{E}[X_1].$
It now follows from Lemma 22.24 that
$f_{S_n/n}(\lambda) = \left[1 + \frac{\lambda}{n}k\left(\frac{\lambda}{n}\right)\right]^n \to e^{i\lambda c}$ as $n \to \infty$
which is the characteristic function of the constant random variable $c$. By the continuity Theorem 22.18, it follows that $\frac{S_n}{n} \Rightarrow c$ and since $c$ is constant we may apply Lemma 21.25 to conclude $\frac{S_n}{n}\overset{P}{\to}c = \mathbb{E}X_1$.
We are now ready to continue our investigation of central limit theorems that was begun with Theorem 10.37 above.

Theorem 22.26 (The Basic Central Limit Theorem). Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. square integrable random variables such that $\mathbb{E}X_1 = 0$ and $\mathbb{E}X_1^2 = 1$. Then $\frac{S_n}{\sqrt n} \Rightarrow N(0,1)$.

Proof. If $f(\lambda) := \mathbb{E}\left[e^{i\lambda X_1}\right]$, then by Taylor's theorem (see Appendix 22.7),
$f(\lambda) = f(0) + f'(0)\lambda + \frac12 k(\lambda)\lambda^2 = 1 + \frac12 k(\lambda)\lambda^2$ (22.8)
where
$\lim_{\lambda\to 0}k(\lambda) = k(0) = f''(0) = -\mathbb{E}\left[X_1^2\right] = -1.$
Hence, using Lemma 22.24, we find
$\mathbb{E}\left[e^{i\lambda\frac{S_n}{\sqrt n}}\right] = \left[f\left(\frac{\lambda}{\sqrt n}\right)\right]^n = \left[1 + \frac12 k\left(\frac{\lambda}{\sqrt n}\right)\frac{\lambda^2}{n}\right]^n \to e^{-\lambda^2/2}.$
Since $e^{-\lambda^2/2}$ is the characteristic function of $N(0,1)$ (Example 22.13), the result now follows from the continuity Theorem 22.18.

Alternative proof. Again it suffices to show
$\lim_{n\to\infty}\mathbb{E}\left[e^{i\lambda\frac{S_n}{\sqrt n}}\right] = e^{-\lambda^2/2}$ for all $\lambda \in \mathbb{R}$.
We do this using Lemma 23.7 below as follows:
$\left|f_{S_n/\sqrt n}(\lambda) - e^{-\lambda^2/2}\right| = \left|\left[f\left(\frac{\lambda}{\sqrt n}\right)\right]^n - \left[e^{-\lambda^2/2n}\right]^n\right| \le n\left|f\left(\frac{\lambda}{\sqrt n}\right) - e^{-\lambda^2/2n}\right| = n\left|1 + \frac12 k\left(\frac{\lambda}{\sqrt n}\right)\frac{\lambda^2}{n} - \left(1 - \frac{\lambda^2}{2n} + O\left(\frac{1}{n^2}\right)\right)\right| \to 0$ as $n \to \infty$,
since $k(\lambda/\sqrt n) \to -1$.
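The convergence $[f(\lambda/\sqrt n)]^n \to e^{-\lambda^2/2}$ at the heart of the proof is easy to observe numerically. The sketch below (an added illustration, not part of the notes) uses Rademacher variables, $P(X_1 = \pm 1) = 1/2$, for which $f(\lambda) = \cos\lambda$:

```python
import math

# CLT at the level of characteristic functions: for Rademacher X_1,
# f(lam) = cos(lam), and [f(lam/sqrt(n))]^n -> exp(-lam^2/2).
lam = 2.0
target = math.exp(-lam**2 / 2)
errs = []
for n in [10, 100, 10000]:
    approx = math.cos(lam / math.sqrt(n))**n
    errs.append(abs(approx - target))
assert errs[-1] < 1e-3
assert errs[0] > errs[-1]   # the error decreases with n
```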
Corollary 22.27. If $\{X_n\}_{n=1}^\infty$ are i.i.d. square integrable random variables such that $\mathbb{E}X_1 = 0$ and $\mathbb{E}X_1^2 = 1$, then
$\sup_{y\in\mathbb{R}}\left|P\left(\frac{S_n}{\sqrt n} \le y\right) - P(N(0,1) \le y)\right| \to 0$ as $n \to \infty$. (22.9)

Proof. This is a direct consequence of Theorem 22.26 and Exercise 21.7.

Berry (1941) and Esseen (1942) showed there exists a constant $C < \infty$ such that, if $\rho^3 := \mathbb{E}|X_1|^3 < \infty$, then
$\sup_{y\in\mathbb{R}}\left|P\left(\frac{S_n}{\sqrt n} \le y\right) - P(N(0,1) \le y)\right| \le C\rho^3/\sqrt n.$
In particular the rate of convergence is $n^{-1/2}$. The exact value of the best constant $C$ is still unknown but it is known to be less than $1$. We will not prove this theorem here. However we have seen a hint that such a result should be true in Theorem 10.37 above.
Remark 22.28 (Why normal?). It is now a reasonable question to ask why the limiting random variable is normal in Theorem 22.26. One way to understand this is: if, under the assumptions of Theorem 22.26, we know $\frac{S_n}{\sqrt n} \Rightarrow L$ where $L$ is some random variable with $\mathbb{E}L = 0$ and $\mathbb{E}L^2 = 1$, then
$\frac{S_{2n}}{\sqrt{2n}} = \frac{1}{\sqrt 2}\left[\frac{1}{\sqrt n}\sum_{k=1,\,k\text{ odd}}^{2n}X_k + \frac{1}{\sqrt n}\sum_{k=1,\,k\text{ even}}^{2n}X_k\right] \Rightarrow \frac{1}{\sqrt 2}(L_1 + L_2)$ (22.10)
where $L_1 \stackrel{d}{=} L \stackrel{d}{=} L_2$ and $L_1$ and $L_2$ are independent, see Exercise 21.11. In particular this implies that
$f_L(\lambda) = \left[f_L\left(\frac{\lambda}{\sqrt 2}\right)\right]^2$ for all $\lambda \in \mathbb{R}$. (22.11)
We could also arrive at Eq. (22.11) by passing to the limit in the identity
$f_{S_{2n}/\sqrt{2n}}(\lambda) = f_{S_n/\sqrt n}\left(\frac{\lambda}{\sqrt 2}\right)f_{S_n/\sqrt n}\left(\frac{\lambda}{\sqrt 2}\right).$
Iterating Eq. (22.11) and then using Eq. (22.8) and Lemma 22.24 above we again deduce that
$f_L(\lambda) = \left[f_L\left(\frac{\lambda}{2^{n/2}}\right)\right]^{2^n} = \left[1 + \frac12 k\left(\frac{\lambda}{2^{n/2}}\right)\frac{\lambda^2}{2^n}\right]^{2^n} \to e^{-\frac12\lambda^2} = f_{N(0,1)}(\lambda).$
That is we must have $L \stackrel{d}{=} N(0,1)$. What we have proved is that if $L$ is any square integrable random variable with zero mean and variance equal to one such that $L \stackrel{d}{=} \frac{1}{\sqrt 2}(L_1 + L_2)$ where $L_1$ and $L_2$ are two independent copies of $L$, then $L \stackrel{d}{=} N(0,1)$.
Theorem 22.29 (The multi-dimensional Central Limit Theorem). Suppose that $\{X_n\}_{n=1}^\infty$ are i.i.d. square integrable random vectors in $\mathbb{R}^d$ and let $m := \mathbb{E}X_1$ and $Q := \mathbb{E}\left[(X_1-m)(X_1-m)^{\mathrm{tr}}\right]$, that is $m \in \mathbb{R}^d$ and $Q$ is the $d\times d$ matrix defined by
$m_j := \mathbb{E}(X_1)_j$ and $Q_{ij} := \mathbb{E}\left[(X_1-m)_i(X_1-m)_j\right] = \mathrm{Cov}\left((X_1)_i,(X_1)_j\right)$
for all $1 \le i,j \le d$. Then
$\frac{1}{\sqrt n}\sum_{k=1}^n(X_k - m) \Rightarrow Z$ (22.12)
where $Z \stackrel{d}{=} N(0,Q)$, i.e. $Z$ is a random vector such that
$\mathbb{E}\left[e^{i\lambda\cdot Z}\right] = \exp\left(-\frac12 Q\lambda\cdot\lambda\right)$ for all $\lambda \in \mathbb{R}^d$.

Proof. Let $\lambda \in \mathbb{R}^d$, then
$\lambda\cdot Z \stackrel{d}{=} N(0, Q\lambda\cdot\lambda) \stackrel{d}{=} \sqrt{Q\lambda\cdot\lambda}\,N(0,1)$
and $\{\lambda\cdot X_k\}_{k=1}^\infty$ are i.i.d. random variables with $\mathbb{E}[\lambda\cdot X_k] = \lambda\cdot m$ and $\mathrm{Var}(\lambda\cdot X_k) = Q\lambda\cdot\lambda$. If $Q\lambda\cdot\lambda = 0$ then $\lambda\cdot X_k = \lambda\cdot m$ a.s. and $\lambda\cdot Z = 0$ a.s. and we will have
$\lambda\cdot\left[\frac{1}{\sqrt n}\sum_{k=1}^n(X_k - m)\right] = 0 \Rightarrow 0 = \lambda\cdot Z.$ (22.13)
If $Q\lambda\cdot\lambda > 0$ then $\left\{\frac{\lambda\cdot X_k - \lambda\cdot m}{\sqrt{Q\lambda\cdot\lambda}}\right\}_{k=1}^\infty$ satisfy the hypotheses of Theorem 22.26 and therefore,
$\frac{1}{\sqrt{Q\lambda\cdot\lambda}}\lambda\cdot\left[\frac{1}{\sqrt n}\sum_{k=1}^n(X_k - m)\right] = \frac{1}{\sqrt n}\sum_{k=1}^n\frac{\lambda\cdot X_k - \lambda\cdot m}{\sqrt{Q\lambda\cdot\lambda}} \Rightarrow N(0,1)$
which combined with Eq. (22.13) implies that for all $\lambda \in \mathbb{R}^d$ we have
$\lambda\cdot\left[\frac{1}{\sqrt n}\sum_{k=1}^n(X_k - m)\right] \Rightarrow \sqrt{Q\lambda\cdot\lambda}\,N(0,1) \stackrel{d}{=} \lambda\cdot Z.$
We may now apply Corollary 22.21 to conclude that Eq. (22.12) holds.
22.4 A Fourier Transform Inversion Formula

Corollary 8.11 guarantees the injectivity of the Fourier transform on the space of probability measures. Our next goal is to find an inversion formula for the Fourier transform. To motivate the construction below, let us first recall a few facts about Fourier series. To keep our exposition as simple as possible, we now restrict ourselves to the one dimensional case.

For $L > 0$, let $e_n^L(x) := e^{i\frac{\pi n}{L}x}$ and let
$(f,g)_L := \frac{1}{2L}\int_{-L}^L f(x)\bar g(x)\,dx$
for $f,g \in L^2([-L,L],dx)$. Then it is well known (and fairly elementary to prove) that $\left\{e_n^L : n \in \mathbb{Z}\right\}$ is an orthonormal basis for $L^2([-L,L],dx)$. In particular, if $f \in C_c(\mathbb{R})$ with $\mathrm{supp}(f) \subset [-L,L]$, then for $x \in [-L,L]$,
$f(x) = \sum_{n\in\mathbb{Z}}\left(f,e_n^L\right)_L e_n^L(x) = \frac{1}{2L}\sum_{n\in\mathbb{Z}}\left[\int_{-L}^L f(y)e^{-i\frac{\pi n}{L}y}\,dy\right]e^{i\frac{\pi n}{L}x} = \frac{1}{2L}\sum_{n\in\mathbb{Z}}\hat f\left(\frac{\pi n}{L}\right)e^{i\frac{\pi n}{L}x}$ (22.14)
where
$\hat f(\lambda) = \int_{-\infty}^\infty f(y)e^{-i\lambda y}\,dy.$
Letting $L \to \infty$ in Eq. (22.14) then suggests that
$\frac{1}{2L}\sum_{n\in\mathbb{Z}}\hat f\left(\frac{\pi n}{L}\right)e^{i\frac{\pi n}{L}x} \to \frac{1}{2\pi}\int_{-\infty}^\infty\hat f(\lambda)e^{i\lambda x}\,d\lambda$
and we are led to expect,
$f(x) = \frac{1}{2\pi}\int_{-\infty}^\infty\hat f(\lambda)e^{i\lambda x}\,d\lambda.$ (22.15)
Now suppose that $f(x) = \rho(x)$ where $\rho(x)$ is a probability density for a measure $\mu$ (i.e. $d\mu(x) := \rho(x)\,dx$) so that $\hat\rho(\lambda) = \hat\mu(-\lambda)$. From Eq. (22.15) we expect that
$\mu((a,b]) = \int_a^b\rho(x)\,dx = \int_a^b\left[\frac{1}{2\pi}\int_{-\infty}^\infty\hat\mu(\lambda)e^{-i\lambda x}\,d\lambda\right]dx = \frac{1}{2\pi}\int_{-\infty}^\infty\hat\mu(\lambda)\left[\int_a^b e^{-i\lambda x}\,dx\right]d\lambda = \frac{1}{2\pi}\int_{-\infty}^\infty\hat\mu(\lambda)\left[\frac{e^{-i\lambda a}-e^{-i\lambda b}}{i\lambda}\right]d\lambda = \lim_{c\to\infty}\frac{1}{2\pi}\int_{-c}^c\hat\mu(\lambda)\left[\frac{e^{-i\lambda a}-e^{-i\lambda b}}{i\lambda}\right]d\lambda.$ (22.16)
We will prove this formula is essentially correct in Theorem 22.31 below. The following lemma is the key to computing the limit appearing in Eq. (22.16) which will be the heart of the proof of the inversion formula.
Lemma 22.30. For $c > 0$, let
$S(c) := \int_{-c}^c\frac{\sin\lambda}{\lambda}\,d\lambda.$ (22.17)
Then $S(c)$ is a continuous function such that $S(c) \to \pi$ boundedly as $c \to \infty$, see Figure 22.1. Moreover for any $y \in \mathbb{R}$ we have
$\int_{-c}^c\frac{\sin\lambda y}{\lambda}\,d\lambda = \operatorname{sgn}(y)\,S(c|y|)$ (22.18)
where
$\operatorname{sgn}(y) = 1$ if $y > 0$, $-1$ if $y < 0$, and $0$ if $y = 0$.

Fig. 22.1. The graph of $S(c)$ in black and $\pi$ in red.

Proof. The first assertion has already been dealt with in Example 9.11. We will repeat the argument here for the reader's convenience. By symmetry and Fubini's theorem,
$S(c) = 2\int_0^c\frac{\sin\lambda}{\lambda}\,d\lambda = 2\int_0^c\sin\lambda\left[\int_0^\infty e^{-\lambda t}\,dt\right]d\lambda = 2\int_0^\infty\left[\int_0^c\sin\lambda\,e^{-\lambda t}\,d\lambda\right]dt = 2\int_0^\infty\frac{1}{1+t^2}\left[1 - e^{-tc}(\cos c + t\sin c)\right]dt = \pi - 2\int_0^\infty\frac{1}{1+t^2}e^{-tc}\left[\cos c + t\sin c\right]dt.$ (22.19)
The integral in Eq. (22.19) tends to $0$ as $c \to \infty$ by the dominated convergence theorem. The second assertion in Eq. (22.18) is a consequence of the change of variables $z = \lambda y$.
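The convergence $S(c) \to \pi$ (with the oscillation around $\pi$ visible in Figure 22.1) is easy to observe numerically; the following snippet is an added illustration, not part of the notes:

```python
import math

def S(c, steps=200000):
    # S(c) = int_{-c}^{c} sin(lam)/lam d lam = 2 * int_0^c sin(lam)/lam d lam,
    # computed with a midpoint rule (the integrand extends continuously to 0).
    h = c / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        total += math.sin(lam) / lam
    return 2.0 * total * h

# S(c) oscillates around pi with amplitude roughly 2|cos c|/c
for c in [50.0, 500.0]:
    assert abs(S(c) - math.pi) < 3.0 / c
```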
Theorem 22.31 (Fourier Inversion Formula). If $\mu$ is a probability measure on $(\mathbb{R},\mathcal{B}_\mathbb{R})$ and $-\infty < a < b < \infty$, then
$\lim_{c\to\infty}\frac{1}{2\pi}\int_{-c}^c\hat\mu(\lambda)\left[\frac{e^{-i\lambda a}-e^{-i\lambda b}}{i\lambda}\right]d\lambda = \mu((a,b)) + \frac12\left(\mu(\{a\}) + \mu(\{b\})\right).$ (22.20)
(At the end points, the limit picks up only half of the mass.)

Proof. Let $I(c)$ denote the integral appearing in Eq. (22.20). By Fubini's theorem and Lemma 22.30,
$I(c) := \int_{-c}^c\hat\mu(\lambda)\left[\frac{e^{-i\lambda a}-e^{-i\lambda b}}{i\lambda}\right]d\lambda = \int_{-c}^c\left[\int_\mathbb{R}e^{i\lambda x}\,d\mu(x)\right]\left[\frac{e^{-i\lambda a}-e^{-i\lambda b}}{i\lambda}\right]d\lambda = \int_\mathbb{R}d\mu(x)\int_{-c}^c d\lambda\,e^{i\lambda x}\left[\frac{e^{-i\lambda a}-e^{-i\lambda b}}{i\lambda}\right] = \int_\mathbb{R}d\mu(x)\int_{-c}^c d\lambda\left[\frac{e^{i\lambda(x-a)}-e^{i\lambda(x-b)}}{i\lambda}\right].$ (22.21)
Since
$\frac{e^{i\lambda(x-\alpha)}}{i\lambda} = -\frac{i}{\lambda}\cos(\lambda(x-\alpha)) + \frac{1}{\lambda}\sin(\lambda(x-\alpha))$
it follows that $\operatorname{Im}\left[\left(e^{i\lambda(x-a)}-e^{i\lambda(x-b)}\right)/i\lambda\right]$ is an odd function of $\lambda$ and
$\operatorname{Re}\left[\frac{e^{i\lambda(x-a)}-e^{i\lambda(x-b)}}{i\lambda}\right] = \frac{1}{\lambda}\left[\sin(\lambda(x-a)) - \sin(\lambda(x-b))\right],$
and therefore (using Lemma 22.30)
$I(c) = \int_\mathbb{R}d\mu(x)\int_{-c}^c d\lambda\,\operatorname{Re}\left[\frac{e^{i\lambda(x-a)}-e^{i\lambda(x-b)}}{i\lambda}\right] = \int_\mathbb{R}d\mu(x)\int_{-c}^c d\lambda\left[\frac{\sin(\lambda(x-a)) - \sin(\lambda(x-b))}{\lambda}\right] = \int_\mathbb{R}d\mu(x)\left[\operatorname{sgn}(x-a)S(c|x-a|) - \operatorname{sgn}(x-b)S(c|x-b|)\right].$
Using Lemma 22.30 again along with the DCT we may pass to the limit as $c \to \infty$ in the previous identity to get the result:
$\lim_{c\to\infty}\frac{1}{2\pi}I(c) = \frac12\int_\mathbb{R}d\mu(x)\left[\operatorname{sgn}(x-a) - \operatorname{sgn}(x-b)\right] = \frac12\int_\mathbb{R}d\mu(x)\left[2\cdot 1_{(a,b)}(x) + 1_{\{a\}}(x) + 1_{\{b\}}(x)\right] = \mu((a,b)) + \frac12\left[\mu(\{a\}) + \mu(\{b\})\right].$
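The inversion formula (22.20) can be tested numerically. The sketch below (an added illustration, not part of the notes) takes $\mu = \exp(1)$, so $\hat\mu(\lambda) = 1/(1-i\lambda)$ and $\mu((a,b)) = e^{-a}-e^{-b}$, and evaluates the truncated integral on a midpoint grid (which conveniently avoids $\lambda = 0$):

```python
import numpy as np

# Inversion formula check for mu = exp(1) (density e^{-x} on x >= 0):
# mu_hat(lam) = 1/(1 - i lam), and mu((a,b)) = e^{-a} - e^{-b}.
a, b = 0.5, 1.5
c = 400.0
n = 1_000_000
h = 2 * c / n
lam = -c + (np.arange(n) + 0.5) * h        # midpoint grid, skips lam = 0
mu_hat = 1.0 / (1.0 - 1j * lam)
kernel = (np.exp(-1j * lam * a) - np.exp(-1j * lam * b)) / (1j * lam)
I = np.sum(mu_hat * kernel).real * h / (2 * np.pi)
assert abs(I - (np.exp(-a) - np.exp(-b))) < 1e-3
```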
Corollary 22.32. Suppose that $\mu$ is a probability measure on $(\mathbb{R},\mathcal{B}_\mathbb{R})$ such that $\hat\mu \in L^1(m)$. Then $d\mu = \rho\,dm$ where $\rho$ is the continuous probability density on $\mathbb{R}$ given by
$\rho(x) := \frac{1}{2\pi}\int_\mathbb{R}\hat\mu(\lambda)e^{-i\lambda x}\,d\lambda.$ (22.22)

Proof. The function $\rho$ defined in Eq. (22.22) is continuous by the dominated convergence theorem. Moreover for any $-\infty < a < b < \infty$ we have
$\int_a^b\rho(x)\,dx = \frac{1}{2\pi}\int_a^b dx\int_\mathbb{R}d\lambda\,\hat\mu(\lambda)e^{-i\lambda x} = \frac{1}{2\pi}\int_\mathbb{R}d\lambda\,\hat\mu(\lambda)\int_a^b dx\,e^{-i\lambda x} = \frac{1}{2\pi}\int_\mathbb{R}d\lambda\,\hat\mu(\lambda)\left[\frac{e^{-i\lambda a}-e^{-i\lambda b}}{i\lambda}\right] = \lim_{c\to\infty}\frac{1}{2\pi}\int_{-c}^c\hat\mu(\lambda)\left[\frac{e^{-i\lambda a}-e^{-i\lambda b}}{i\lambda}\right]d\lambda = \mu((a,b)) + \frac12\left[\mu(\{a\}) + \mu(\{b\})\right],$
wherein we have used Theorem 22.31 to evaluate the limit. Letting $a \uparrow b$ over $a \in \mathbb{R}$ such that $\mu(\{a\}) = 0$ in this identity shows $\mu(\{b\}) = 0$ for all $b \in \mathbb{R}$. Therefore we have shown
$\mu((a,b]) = \int_a^b\rho(x)\,dx$ for all $-\infty < a < b < \infty$.
Using one of the multiplicative systems theorems, it is now easy to verify that $\mu(A) = \int_A\rho(x)\,dx$ for all $A \in \mathcal{B}_\mathbb{R}$ or $\int_\mathbb{R}h\,d\mu = \int_\mathbb{R}h\rho\,dm$ for all bounded measurable functions $h : \mathbb{R} \to \mathbb{R}$. This then implies that $\rho \ge 0$, $m$-a.e. (since $\rho$ is continuous we may further conclude that $\rho(x) \ge 0$ for every $x \in \mathbb{R}$) and $d\mu = \rho\,dm$.
Example 22.33. Let $\rho(x) = (1-|x|)_+$ be the triangle density in Figure 22.2.

Fig. 22.2. The triangular density function.

Recall from Example 22.6 that
$\int_\mathbb{R}e^{i\lambda x}(1-|x|)_+\,dx = 2\,\frac{1-\cos\lambda}{\lambda^2}.$
Alternatively by direct calculation,
$\int_\mathbb{R}e^{i\lambda x}(1-|x|)_+\,dx = 2\operatorname{Re}\int_0^1 e^{i\lambda x}(1-x)\,dx = 2\operatorname{Re}\left[\left(I - \frac1i\frac{d}{d\lambda}\right)\int_0^1 e^{i\lambda x}\,dx\right] = 2\operatorname{Re}\left[\left(I - \frac1i\frac{d}{d\lambda}\right)\frac{e^{i\lambda}-1}{i\lambda}\right] = 2\,\frac{1-\cos\lambda}{\lambda^2}.$
Hence it follows from Corollary 22.32 (the identity could also be verified directly using residue calculus techniques from complex variables) that
$(1-|x|)_+ = \frac{1}{\pi}\int_\mathbb{R}\frac{1-\cos\lambda}{\lambda^2}e^{-i\lambda x}\,d\lambda.$ (22.23)
Evaluating Eq. (22.23) at $x = 0$ gives the identity
$1 = \frac{1}{\pi}\int_\mathbb{R}\frac{1-\cos\lambda}{\lambda^2}\,d\lambda,$ (22.24)
from which we deduce that
$d\mu(x) := \frac{1}{\pi}\,\frac{1-\cos x}{x^2}\,dx$ (22.25)
is a probability measure which (from Eq. (22.23)) has characteristic function
$\hat\mu(\lambda) = (1-|\lambda|)_+.$ (22.26)
Corollary 22.34. For all random variables $X$ we have
$\mathbb{E}|X| = \frac{1}{\pi}\int_\mathbb{R}\frac{1-\operatorname{Re}f_X(\lambda)}{\lambda^2}\,d\lambda.$ (22.27)

Proof. For $M \in \mathbb{R}\setminus\{0\}$, make the change of variables $\lambda \to M\lambda$ in Eq. (22.24) to find
$|M| = \frac{1}{\pi}\int_\mathbb{R}\frac{1-\cos(M\lambda)}{\lambda^2}\,d\lambda.$ (22.28)
Observe the identity holds for $M = 0$ as well. Taking $M = X$ in Eq. (22.28) and then taking expectations implies
$\mathbb{E}|X| = \frac{1}{\pi}\int_\mathbb{R}\mathbb{E}\left[\frac{1-\cos(\lambda X)}{\lambda^2}\right]d\lambda = \frac{1}{\pi}\int_\mathbb{R}\frac{1-\operatorname{Re}f_X(\lambda)}{\lambda^2}\,d\lambda.$
Suppose that we did not know the value of $c := \int_{-\infty}^\infty\frac{1-\cos\lambda}{\lambda^2}\,d\lambda$ is $\pi$; we could still proceed as above to learn
$\mathbb{E}|X| = \frac{1}{c}\int_\mathbb{R}\frac{1-\operatorname{Re}f_X(\lambda)}{\lambda^2}\,d\lambda.$
We could then evaluate $c$ by making a judicious choice of $X$. For example if $X \stackrel{d}{=} N(0,1)$, we would have on one hand
$\mathbb{E}|X| = \frac{1}{\sqrt{2\pi}}\int_\mathbb{R}|x|e^{-x^2/2}\,dx = \frac{2}{\sqrt{2\pi}}\int_0^\infty xe^{-x^2/2}\,dx = \sqrt{\frac{2}{\pi}}.$
On the other hand, $f_X(\lambda) = e^{-\lambda^2/2}$ and so
$\sqrt{\frac{2}{\pi}} = \frac{1}{c}\int_\mathbb{R}\left(1 - e^{-\lambda^2/2}\right)d\left(-\lambda^{-1}\right) = \frac{1}{c}\int_\mathbb{R}d\left(1 - e^{-\lambda^2/2}\right)\lambda^{-1} = \frac{1}{c}\int_\mathbb{R}e^{-\lambda^2/2}\,d\lambda = \frac{\sqrt{2\pi}}{c},$
from which it follows, again, that $c = \pi$.
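Formula (22.27) is also easy to check numerically for $X \stackrel{d}{=} N(0,1)$, where $\operatorname{Re}f_X(\lambda) = e^{-\lambda^2/2}$ and $\mathbb{E}|X| = \sqrt{2/\pi}$; the snippet below (an added illustration) approximates the integral with a midpoint rule plus the exact $2/L$ tail of $1/\lambda^2$:

```python
import math

# E|X| = (1/pi) * int_R (1 - Re f_X(lam)) / lam^2 d lam for X ~ N(0,1):
# Re f_X(lam) = exp(-lam^2/2) and E|X| = sqrt(2/pi).
L, n = 60.0, 600000
h = L / n
total = 0.0
for k in range(n):
    lam = (k + 0.5) * h   # midpoint rule on (0, L); the integrand is even
    total += (1.0 - math.exp(-lam * lam / 2.0)) / (lam * lam)
# Tail: for |lam| > L the integrand is essentially 1/lam^2, contributing 2/L.
integral = 2.0 * total * h + 2.0 / L
E_abs_X = integral / math.pi
assert abs(E_abs_X - math.sqrt(2.0 / math.pi)) < 1e-5
```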
Corollary 22.35. Suppose $X$ is a random variable and there exists $\varepsilon > 0$ such that $u(\lambda) := \operatorname{Re}f_X(\lambda) = \mathbb{E}[\cos\lambda X]$ is continuously differentiable for $\lambda \in (-2\varepsilon,2\varepsilon)$. If we further assume that
$\int_0^\varepsilon\frac{|u'(\lambda)|}{\lambda}\,d\lambda < \infty,$ (22.29)
then $\mathbb{E}|X| < \infty$ and $f_X \in C^1(\mathbb{R},\mathbb{C})$. (Since $u$ is even, $u'$ is odd and $u'(0) = 0$. Hence if $u'(\lambda)$ were $\alpha$-Hölder continuous for some $\alpha > 0$, then Eq. (22.29) would hold.)

Proof. According to Eq. (22.27),
$\pi\mathbb{E}|X| = \int_\mathbb{R}\frac{1-u(\lambda)}{\lambda^2}\,d\lambda = \int_{|\lambda|\le\varepsilon}\frac{1-u(\lambda)}{\lambda^2}\,d\lambda + \int_{|\lambda|>\varepsilon}\frac{1-u(\lambda)}{\lambda^2}\,d\lambda.$
Since $0 \le 1-u(\lambda) \le 2$ and $2/\lambda^2$ is integrable for $|\lambda| > \varepsilon$, to show $\mathbb{E}|X| < \infty$ we must show
$\infty > \int_{|\lambda|\le\varepsilon}\frac{1-u(\lambda)}{\lambda^2}\,d\lambda = \lim_{\delta\downarrow 0}\int_{\delta\le|\lambda|\le\varepsilon}\frac{1-u(\lambda)}{\lambda^2}\,d\lambda.$
By an integration by parts we find
$\int_{\delta\le|\lambda|\le\varepsilon}\frac{1-u(\lambda)}{\lambda^2}\,d\lambda = \int_{\delta\le|\lambda|\le\varepsilon}(1-u(\lambda))\,d\left(-\lambda^{-1}\right) = -\int_{\delta\le|\lambda|\le\varepsilon}\lambda^{-1}u'(\lambda)\,d\lambda + \frac{u(\varepsilon)+u(-\varepsilon)-2}{\varepsilon} + \frac{2-u(\delta)-u(-\delta)}{\delta}.$
Since $u$ is even with $u'(0) = 0$, we have $\frac{2-u(\delta)-u(-\delta)}{\delta} = \frac{2(1-u(\delta))}{\delta} \to 0$ as $\delta \downarrow 0$, while $u'$ being odd shows $\lambda^{-1}u'(\lambda)$ is even and hence
$-\int_{\delta\le|\lambda|\le\varepsilon}\lambda^{-1}u'(\lambda)\,d\lambda \le \int_{\delta\le|\lambda|\le\varepsilon}\frac{|u'(\lambda)|}{|\lambda|}\,d\lambda \le 2\int_0^\varepsilon\frac{|u'(\lambda)|}{\lambda}\,d\lambda.$
Passing to the limit as $\delta \downarrow 0$ we learn
$\int_{|\lambda|\le\varepsilon}\frac{1-u(\lambda)}{\lambda^2}\,d\lambda \le 2\int_0^\varepsilon\frac{|u'(\lambda)|}{\lambda}\,d\lambda + \frac{u(\varepsilon)+u(-\varepsilon)-2}{\varepsilon} < \infty.$
Thus $\mathbb{E}|X| < \infty$ and an application of Lemma 22.7 then shows $f_X \in C^1(\mathbb{R},\mathbb{C})$.
22.5 Exercises
Exercise 22.2. For x, 1, let (also see Eq. (22.32))
(, x) :=
_
_
_
e
ix
1ix
x
2
if x ,= 0

1
2

2
if x = 0.
(22.30)
Let x
k

n
k=1
1 0 , Z
k

n
k=1
N be independent random variables with
N
d
= N (0, 1) and Z
k
being Poisson random variables with mean a
k
> 0, i.e.
P (Z
k
= n) = e
ak
a
n
k
n!
for n = 0, 1, 2 . . . . With Y :=

n
k=1
x
k
(Z
k
a
k
) + N,
show
f
Y
() := E
_
e
iY

= exp
__
R
(, x) d (x)
_
where is the discrete measure on (1, B
R
) given by
=
2

0
+
n

k=1
a
k
x
2
k

xk
. (22.31)
It is easy to see that (, 0) = lim
x0
(, x) . In fact by Taylors theorem
with integral remainder we have
(, x) =
1
2

2
_
1
0
e
itx
d (t) (22.32)
where d (t) = 2 (1 t) dt is a probability measure on [0, 1] . From this formula
it is clear that is a smooth function of (, x) .
Exercise 22.3. To each nite and compactly supported measure, , on (1, B
R
)
show there exists a sequence
n

n=1
of nitely supported nite measures on
(1, B
R
) such that
n
= . Here we say is compactly supported if there
exists M < such that (x : [x[ M) = 0 and we say is nitely supported
if there exists a nite subset, 1 such that (1 ) = 0.
Exercise 22.4. Show that if is a nite measure on (1, B
R
) , then
f () := exp
__
R
(, x) d (x)
_
(22.33)
is the characteristic function of a probability measure on (1, B
R
) . Here is an
outline to follow. (You may nd the calculus estimates in Section 22.7 to be of
help.)
1. Show f () is continuous.
2. Now suppose that is compactly supported. Show, using Exercises 22.2,
22.3, and the continuity Theorem 22.18 that exp
__
R
(, x) d (x)
_
is the
characteristic function of a probability measure on (1, B
R
) .
3. For the general case, approximate by a sequence of nite measures with
compact support as in item 2.
Exercise 22.5 (Exercise 2.3 in [65]). Let $\mu$ be the probability measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ such that
$$\mu(\{n\}) = p(n) = c\, \frac{1}{n^2 \ln |n|} 1_{|n| \geq 2}$$
with $c$ chosen so that $\sum_{n \in \mathbb{Z}} p(n) = 1$. Show that $\hat{\mu} \in C^1(\mathbb{R}, \mathbb{C})$ even though $\int_{\mathbb{R}} |x|\, d\mu(x) = \infty$. To do this show
$$g(t) := \sum_{n \geq 2} \frac{1 - \cos nt}{n^2 \ln n}$$
is continuously differentiable.
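The two claims in Exercise 22.5 can be seen numerically: up to the normalizing constant $c$, the partial sums of $\sum 1/(n^2 \ln n)$ stabilize quickly while the partial sums of $\sum |n|\, p(n) \propto \sum 1/(n \ln n)$ keep growing (like $\ln \ln N$). The following sketch (the helper `partial` and the cutoffs are ours) illustrates this:

```python
import math

def partial(N, weight=lambda n: 1.0):
    # sum over 2 <= |n| <= N of weight(n) / (n^2 ln|n|), using symmetry n -> -n;
    # this is the mass (weight = 1) or the |x|-moment (weight = n), up to c
    return 2.0 * sum(weight(n) / (n * n * math.log(n)) for n in range(2, N + 1))

mass_small, mass_big = partial(10**3), partial(10**5)      # converging
mean_small, mean_big = (partial(10**3, weight=lambda n: n),
                        partial(10**5, weight=lambda n: n))  # diverging
```

The mass changes only by $O(1/(N \ln N))$ between the two cutoffs, while the first-moment partial sums still increase by more than $0.5$.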
Page: 364 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
Fig. 22.3. Here is a piecewise linear convex function. We will assume that $d_n > 0$ for all $n$ and that $\varphi(\lambda) = 0$ for $\lambda$ sufficiently large. This last restriction may be removed later by a limiting argument.
Exercise 22.6 (Polya's Criterion [5, Problem 26.3 on p. 305.] and [15, p. 104-107.]). Suppose $\varphi(\lambda)$ is a non-negative symmetric continuous function such that $\varphi(0) = 1$ and $\varphi(\lambda)$ is non-increasing and convex for $\lambda \geq 0$. Show $\varphi(\lambda) = \hat{\nu}(\lambda)$ for some probability measure, $\nu$, on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.
Solution to Exercise (22.6). Because of the continuity theorem and some simple limiting arguments, it suffices to prove the result for a function $\varphi$ as pictured in Figure 22.3. From Example 22.33, we know that $(1 - |\lambda|)_+ = \hat{\mu}(\lambda)$ where $\mu$ is the probability measure,
$$d\mu(x) := \frac{1}{\pi} \frac{1 - \cos x}{x^2}\, dx.$$
For $a > 0$, let $\mu_a(A) := \mu(aA)$, in which case $\mu_a(f) = \mu\left( f\left( a^{-1}(\cdot) \right) \right)$ for all bounded measurable $f$ and in particular,
$$\hat{\mu}_a(\lambda) = \hat{\mu}\left( a^{-1}\lambda \right) = \left( 1 - \left| \frac{\lambda}{a} \right| \right)_+.$$
To finish the proof it suffices to show that $\varphi(\lambda)$ may be expressed as
$$\varphi(\lambda) = \sum_{n=1}^\infty p_n \hat{\mu}_{a_n}(\lambda) = \sum_{n=1}^\infty p_n \left( 1 - \frac{|\lambda|}{a_n} \right)_+ \tag{22.34}$$
for some $a_n > 0$ and $p_n \geq 0$ such that $\sum_{n=1}^\infty p_n = 1$. Indeed, if this is the case we may take $\nu := \sum_{n=1}^\infty p_n \mu_{a_n}$, see Figure 22.4.
Fig. 22.4. The Fourier transform of $\nu = \frac{1}{2}\mu_1 + \frac{1}{6}\mu_3 + \frac{1}{3}\mu_7$ is in blue and the Fourier transform of $\nu = \frac{3}{8}\delta_0 + \frac{1}{8}\mu_1 + \frac{1}{6}\mu_3 + \frac{1}{3}\mu_7$ is given in red.
It is pretty clear that we should take $a_n = d_1 + \cdots + d_n$ for all $n \in \mathbb{N}$. (Here $-s_n$ denotes the slope of $\varphi$ on the interval $(a_{n-1}, a_n)$.) Since we are assuming $\varphi(\lambda) = 0$ for $\lambda$ large, there is a first index, $N \in \mathbb{N}$, such that
$$0 = \varphi(a_N) = 1 - \sum_{n=1}^N d_n s_n. \tag{22.35}$$
Notice that $s_n = 0$ for all $n > N$.
Since $\varphi'(\lambda) = -\sum_{n=k}^\infty p_n \frac{1}{a_n}$ when $a_{k-1} < \lambda < a_k$, we must require
$$s_k = \sum_{n=k}^\infty p_n \frac{1}{a_n} \quad \text{for all } k,$$
which then implies $p_k \frac{1}{a_k} = s_k - s_{k+1}$, or equivalently that
$$p_k = a_k (s_k - s_{k+1}). \tag{22.36}$$
Since $\varphi$ is convex, we know that $s_k \geq s_{k+1}$ for all $k$ and therefore $p_k \geq 0$ and $p_k = 0$ for all $k > N$. Moreover,
$$\sum_{k=1}^\infty p_k = \sum_{k=1}^\infty a_k (s_k - s_{k+1}) = \sum_{k=1}^\infty a_k s_k - \sum_{k=2}^\infty a_{k-1} s_k = a_1 s_1 + \sum_{k=2}^\infty s_k (a_k - a_{k-1}) = d_1 s_1 + \sum_{k=2}^\infty s_k d_k = \sum_{k=1}^\infty s_k d_k = 1,$$
where the last equality follows from Eq. (22.35). Working backwards with $p_k$ defined as in Eq. (22.36), it is now easily shown that
$$\frac{d}{d\lambda} \sum_{n=1}^\infty p_n \left( 1 - \frac{|\lambda|}{a_n} \right)_+ = \varphi'(\lambda)$$
for $\lambda \notin \{a_1, a_2, \ldots\}$ and since both functions are equal to $1$ at $\lambda = 0$ we may conclude that Eq. (22.34) is indeed valid.
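The bookkeeping in Eqs. (22.35)-(22.36) is easy to exercise numerically: start from a known mixture (so that $\varphi$ is explicitly of the form (22.34)), read off the slope magnitudes $s_k$, and recover the weights via $p_k = a_k(s_k - s_{k+1})$. The concrete weights and knots below are ours, chosen only for illustration:

```python
def phi(lam, ps, As):
    # Polya mixture, Eq. (22.34): sum_n p_n (1 - |lam| / a_n)_+
    return sum(p * max(1.0 - abs(lam) / a, 0.0) for p, a in zip(ps, As))

ps, As = [0.5, 1/6, 1/3], [1.0, 3.0, 7.0]

# slope magnitudes: -phi'(lam) = s_k on (a_{k-1}, a_k), so s_k = sum_{n >= k} p_n / a_n,
# with s = 0 beyond the last knot
s = [sum(p / a for p, a in zip(ps[k:], As[k:])) for k in range(len(ps))] + [0.0]

# recover the mixture weights via Eq. (22.36)
p_rec = [As[k] * (s[k] - s[k + 1]) for k in range(len(As))]
```

The recovered weights match the original ones and sum to one, as the solution's telescoping computation predicts.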
22.6 Appendix: Bochner's Theorem

Definition 22.36. A function $f \in C(\mathbb{R}^n, \mathbb{C})$ is said to have rapid decay or rapid decrease if
$$\sup_{x \in \mathbb{R}^n} (1 + |x|)^N |f(x)| < \infty \quad \text{for } N = 1, 2, \ldots.$$
Equivalently, for each $N \in \mathbb{N}$ there exists a constant $C_N < \infty$ such that $|f(x)| \leq C_N (1 + |x|)^{-N}$ for all $x \in \mathbb{R}^n$. A function $f \in C(\mathbb{R}^n, \mathbb{C})$ is said to have (at most) polynomial growth if there exists $N < \infty$ such that
$$\sup (1 + |x|)^{-N} |f(x)| < \infty,$$
i.e. there exist $N \in \mathbb{N}$ and $C < \infty$ such that $|f(x)| \leq C(1 + |x|)^N$ for all $x \in \mathbb{R}^n$.

Definition 22.37 (Schwartz Test Functions). Let $\mathcal{S}$ denote the space of functions $f \in C^\infty(\mathbb{R}^n)$ such that $f$ and all of its partial derivatives have rapid decay and let
$$\|f\|_{N, \alpha} = \sup_{x \in \mathbb{R}^n} \left| (1 + |x|)^N \partial^\alpha f(x) \right|$$
so that
$$\mathcal{S} = \left\{ f \in C^\infty(\mathbb{R}^n) : \|f\|_{N, \alpha} < \infty \text{ for all } N \text{ and } \alpha \right\}.$$
Also let $\mathcal{P}$ denote those functions $g \in C^\infty(\mathbb{R}^n)$ such that $g$ and all of its derivatives have at most polynomial growth, i.e. $g \in C^\infty(\mathbb{R}^n)$ is in $\mathcal{P}$ iff for all multi-indices $\alpha$, there exists $N_\alpha < \infty$ such that
$$\sup (1 + |x|)^{-N_\alpha} |\partial^\alpha g(x)| < \infty.$$
(Notice that any polynomial function on $\mathbb{R}^n$ is in $\mathcal{P}$.)

Definition 22.38. A function $\chi : \mathbb{R}^n \to \mathbb{C}$ is said to be positive (semi) definite iff the matrices $A := \left( \chi(\xi_k - \xi_j) \right)_{k,j=1}^m$ are positive definite for all $m \in \mathbb{N}$ and $\{\xi_j\}_{j=1}^m \subset \mathbb{R}^n$.
Lemma 22.39. If $\mu$ is a finite positive measure on $\mathcal{B}_{\mathbb{R}^n}$, then $\chi := \hat{\mu} \in C(\mathbb{R}^n, \mathbb{C})$ is a positive definite function.

Proof. The dominated convergence theorem implies $\hat{\mu} \in C(\mathbb{R}^n, \mathbb{C})$. Since $\mu$ is a positive measure (and hence real),
$$\overline{\hat{\mu}(\lambda)} = \int_{\mathbb{R}^n} \overline{e^{i\lambda \cdot x}}\, d\mu(x) = \int_{\mathbb{R}^n} e^{-i\lambda \cdot x}\, d\mu(x) = \hat{\mu}(-\lambda).$$
From this it follows that for any $m \in \mathbb{N}$ and $\{\xi_j\}_{j=1}^m \subset \mathbb{R}^n$, the matrix $A := \left( \hat{\mu}(\xi_k - \xi_j) \right)_{k,j=1}^m$ is self-adjoint. Moreover if $\lambda \in \mathbb{C}^m$,
$$\sum_{k,j=1}^m \hat{\mu}(\xi_k - \xi_j) \lambda_k \bar{\lambda}_j = \int_{\mathbb{R}^n} \sum_{k,j=1}^m e^{i(\xi_k - \xi_j) \cdot x} \lambda_k \bar{\lambda}_j\, d\mu(x) = \int_{\mathbb{R}^n} \sum_{k,j=1}^m e^{i\xi_k \cdot x} \lambda_k \overline{e^{i\xi_j \cdot x} \lambda_j}\, d\mu(x) = \int_{\mathbb{R}^n} \left| \sum_{k=1}^m e^{i\xi_k \cdot x} \lambda_k \right|^2 d\mu(x) \geq 0,$$
showing $A$ is positive definite.
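Lemma 22.39 is easy to probe numerically: take $\hat{\mu}(\lambda) = e^{-\lambda^2/2}$ (the characteristic function of the standard normal), build the matrix $A = (\hat{\mu}(\xi_k - \xi_j))$, and check that the quadratic form in the proof is non-negative for many random coefficient vectors. The sample points and the Monte-Carlo check are our own illustration:

```python
import cmath, random

def mu_hat(lam):
    # characteristic function of a standard normal on R (n = 1)
    return cmath.exp(-0.5 * lam * lam)

xi = [-1.5, -0.2, 0.0, 0.7, 2.3]
A = [[mu_hat(a - b) for b in xi] for a in xi]

rng = random.Random(0)
min_form = float("inf")
for _ in range(200):
    lam = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in xi]
    # the quadratic form sum_{k,j} A[k][j] lam_k conj(lam_j) from the proof
    q = sum(A[k][j] * lam[k] * lam[j].conjugate()
            for k in range(len(xi)) for j in range(len(xi)))
    min_form = min(min_form, q.real)
```

Up to roundoff, the smallest observed value of the form is non-negative, as positive definiteness requires.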
Proposition 22.40. Suppose that $\chi : \mathbb{R}^n \to \mathbb{C}$ is a positive definite function with $\chi(0) = 1$. If $\chi$ is continuous at $0$ then in fact $\chi$ is uniformly continuous on all of $\mathbb{R}^n$.

Proof. Taking $\xi_1 = x$, $\xi_2 = y$ and $\xi_3 = 0$ in Definition 22.38 we conclude that
$$A := \begin{pmatrix} 1 & \chi(x - y) & \chi(x) \\ \chi(y - x) & 1 & \chi(y) \\ \overline{\chi(x)} & \overline{\chi(y)} & 1 \end{pmatrix} = \begin{pmatrix} 1 & \chi(x - y) & \chi(x) \\ \overline{\chi(x - y)} & 1 & \chi(y) \\ \overline{\chi(x)} & \overline{\chi(y)} & 1 \end{pmatrix}$$
is positive definite. In particular,
$$0 \leq \det A = 1 + \chi(x - y)\chi(y)\overline{\chi(x)} + \chi(x)\overline{\chi(x - y)}\,\overline{\chi(y)} - |\chi(x)|^2 - |\chi(y)|^2 - |\chi(x - y)|^2.$$
Combining this inequality with the identity,
$$|\chi(x) - \chi(y)|^2 = |\chi(x)|^2 + |\chi(y)|^2 - \chi(x)\overline{\chi(y)} - \overline{\chi(x)}\chi(y),$$
gives
$$0 \leq 1 - |\chi(x - y)|^2 + \chi(x - y)\chi(y)\overline{\chi(x)} + \chi(x)\overline{\chi(x - y)}\,\overline{\chi(y)} - \left( |\chi(x) - \chi(y)|^2 + \chi(x)\overline{\chi(y)} + \chi(y)\overline{\chi(x)} \right)$$
$$= 1 - |\chi(x - y)|^2 - |\chi(x) - \chi(y)|^2 + 2\operatorname{Re}\left( (\chi(x - y) - 1)\chi(y)\overline{\chi(x)} \right)$$
$$\leq 1 - |\chi(x - y)|^2 - |\chi(x) - \chi(y)|^2 + 2|\chi(x - y) - 1|.$$
Hence we have
$$|\chi(x) - \chi(y)|^2 \leq 1 - |\chi(x - y)|^2 + 2|\chi(x - y) - 1| = (1 - |\chi(x - y)|)(1 + |\chi(x - y)|) + 2|\chi(x - y) - 1| \leq 4|1 - \chi(x - y)|,$$
which completes the proof.
Remark 22.41. The function $f(\lambda) = 1_{\{0\}}(\lambda)$ is positive definite since the matrix $\left( f(\xi_i - \xi_j) \right)_{i,j=1}^n$ is the $n \times n$ identity matrix for all choices of distinct $\{\xi_i\}_{i=1}^n$ in $\mathbb{R}^n$. Note however that $f$ is not continuous at $\lambda = 0$.
Lemma 22.42. If $\chi \in C(\mathbb{R}^n, \mathbb{C})$ is a positive definite function, then
1. $\chi(0) \geq 0$.
2. $\chi(-\xi) = \overline{\chi(\xi)}$ for all $\xi \in \mathbb{R}^n$.
3. $|\chi(\xi)| \leq \chi(0)$ for all $\xi \in \mathbb{R}^n$.
4. If we further assume that $\chi$ is continuous, then
$$\int_{\mathbb{R}^n \times \mathbb{R}^n} \chi(\xi - \eta) f(\xi) \overline{f(\eta)}\, d\xi\, d\eta \geq 0 \tag{22.37}$$
for all $f \in \mathcal{S}(\mathbb{R}^d)$.

Proof. Taking $m = 1$ and $\xi_1 = 0$ we learn $\chi(0)|\lambda|^2 \geq 0$ for all $\lambda \in \mathbb{C}$ which proves item 1. Taking $m = 2$, $\xi_1 = \xi$ and $\xi_2 = \eta$, the matrix
$$A := \begin{pmatrix} \chi(0) & \chi(\xi - \eta) \\ \chi(\eta - \xi) & \chi(0) \end{pmatrix}$$
is positive definite from which we conclude $\chi(\xi - \eta) = \overline{\chi(\eta - \xi)}$ (since $A = A^*$ by definition) and
$$0 \leq \det \begin{pmatrix} \chi(0) & \chi(\xi - \eta) \\ \chi(\eta - \xi) & \chi(0) \end{pmatrix} = |\chi(0)|^2 - |\chi(\xi - \eta)|^2,$$
and hence $|\chi(\xi)| \leq \chi(0)$ for all $\xi$. This proves items 2. and 3. Item 4. follows by approximating the integral in Eq. (22.37) by Riemann sums,
$$\int_{\mathbb{R}^n \times \mathbb{R}^n} \chi(\xi - \eta) f(\xi) \overline{f(\eta)}\, d\xi\, d\eta = \lim_{\varepsilon \downarrow 0} \varepsilon^{2n} \sum_{\xi, \eta \in (\varepsilon\mathbb{Z}^n) \cap [-\varepsilon^{-1}, \varepsilon^{-1}]^n} \chi(\xi - \eta) f(\xi) \overline{f(\eta)} \geq 0.$$
The details are left to the reader, keeping in mind this is where we must use the assumption that $\chi$ is continuous.
Theorem 22.43 (Bochner's Theorem). Suppose $\chi \in C(\mathbb{R}^n, \mathbb{C})$ is a positive definite function which is continuous at $0$, then there exists a unique positive measure $\mu$ on $\mathcal{B}_{\mathbb{R}^n}$ such that $\chi = \hat{\mu}$.

Proof. If $\chi(\lambda) = \hat{\mu}(\lambda)$, then for $f \in \mathcal{S}$ we would have
$$\int_{\mathbb{R}^n} f\, d\mu = \int_{\mathbb{R}^n} \left( f^\vee \right)^\wedge d\mu = \int_{\mathbb{R}^n} f^\vee(\lambda)\, \hat{\mu}(\lambda)\, d\lambda.$$
This suggests that we define
$$I(f) := \int_{\mathbb{R}^n} \chi(\lambda) f^\vee(\lambda)\, d\lambda \quad \text{for all } f \in \mathcal{S}.$$
We will now show $I$ is positive in the sense that if $f \in \mathcal{S}$ and $f \geq 0$ then $I(f) \geq 0$. For general $f \in \mathcal{S}$ we have
$$I(|f|^2) = \int_{\mathbb{R}^n} \chi(\lambda) \left( |f|^2 \right)^\vee(\lambda)\, d\lambda = \int_{\mathbb{R}^n} \chi(\lambda) \left( f^\vee \star \bar{f}^\vee \right)(\lambda)\, d\lambda$$
$$= \int \chi(\lambda)\, f^\vee(\lambda - \eta)\, \bar{f}^\vee(\eta)\, d\eta\, d\lambda = \int \chi(\lambda)\, f^\vee(\lambda - \eta)\, \overline{f^\vee(-\eta)}\, d\eta\, d\lambda$$
$$= \int \chi(\xi - \eta)\, f^\vee(\xi)\, \overline{f^\vee(\eta)}\, d\xi\, d\eta \geq 0. \tag{22.38}$$
For $t > 0$ let $p_t(x) := t^{-n/2} e^{-|x|^2 / 2t} \in \mathcal{S}$ and define
$$I_t(x) := (Ip_t)(x) := I(p_t(x - \cdot)) = I\left( \left| \sqrt{p_t(x - \cdot)} \right|^2 \right)$$
which is non-negative by Eq. (22.38) and the fact that $\sqrt{p_t(x - \cdot)} \in \mathcal{S}$. Using
$$[p_t(x - \cdot)]^\vee(\lambda) = \int_{\mathbb{R}^n} p_t(x - y) e^{i\lambda \cdot y}\, dy = \int_{\mathbb{R}^n} p_t(y) e^{i\lambda \cdot (y + x)}\, dy = e^{i\lambda \cdot x} p_t^\vee(\lambda) = e^{i\lambda \cdot x} e^{-t|\lambda|^2/2},$$
we find
$$\langle I_t, \psi \rangle = \int_{\mathbb{R}^n} I(p_t(x - \cdot)) \psi(x)\, dx = \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^n} \chi(\lambda) [p_t(x - \cdot)]^\vee(\lambda) \psi(x)\, d\lambda \right) dx$$
$$= \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^n} \chi(\lambda) e^{i\lambda \cdot x} e^{-t|\lambda|^2/2} \psi(x)\, d\lambda \right) dx = \int_{\mathbb{R}^n} \chi(\lambda) \psi^\vee(\lambda) e^{-t|\lambda|^2/2}\, d\lambda,$$
which coupled with the dominated convergence theorem shows
$$\langle Ip_t, \psi \rangle \to \int_{\mathbb{R}^n} \chi(\lambda) \psi^\vee(\lambda)\, d\lambda = I(\psi) \quad \text{as } t \downarrow 0.$$
Hence if $\psi \geq 0$, then $I(\psi) = \lim_{t \downarrow 0} \langle I_t, \psi \rangle \geq 0$.

Let $K \subset \mathbb{R}^n$ be a compact set and $\psi \in C_c(\mathbb{R}^n, [0, \infty))$ be a function such that $\psi = 1$ on $K$. If $f \in C_c^\infty(\mathbb{R}^n, \mathbb{R})$ is a smooth function with $\operatorname{supp}(f) \subset K$, then $0 \leq \|f\|_\infty \psi - f \in \mathcal{S}$ and hence
$$0 \leq \langle I, \|f\|_\infty \psi - f \rangle = \|f\|_\infty \langle I, \psi \rangle - \langle I, f \rangle$$
and therefore $\langle I, f \rangle \leq \|f\|_\infty \langle I, \psi \rangle$. Replacing $f$ by $-f$ implies $-\langle I, f \rangle \leq \|f\|_\infty \langle I, \psi \rangle$ and hence we have proved
$$|\langle I, f \rangle| \leq C(\operatorname{supp}(f)) \|f\|_\infty \tag{22.39}$$
for all $f \in \mathcal{D}_{\mathbb{R}^n} := C_c^\infty(\mathbb{R}^n, \mathbb{R})$ where $C(K)$ is a finite constant for each compact subset of $\mathbb{R}^n$. Because of the estimate in Eq. (22.39), it follows that $I|_{\mathcal{D}_{\mathbb{R}^n}}$ has a unique extension $I$ to $C_c(\mathbb{R}^n, \mathbb{R})$ still satisfying the estimates in Eq. (22.39) and moreover this extension is still positive. So by the Riesz-Markov Theorem ??, there exists a unique Radon measure $\mu$ on $\mathbb{R}^n$ such that $\langle I, f \rangle = \mu(f)$ for all $f \in C_c(\mathbb{R}^n, \mathbb{R})$.

To finish the proof we must show $\hat{\mu}(\lambda) = \chi(\lambda)$ for all $\lambda \in \mathbb{R}^n$ given
$$\mu(f) = \int_{\mathbb{R}^n} \chi(\lambda) f^\vee(\lambda)\, d\lambda \quad \text{for all } f \in C_c^\infty(\mathbb{R}^n, \mathbb{R}). \tag{22.40}$$
Let $f \in C_c^\infty(\mathbb{R}^n, \mathbb{R}_+)$ be a radial function such that $f(0) = 1$ and $f(x)$ is decreasing as $|x|$ increases. Let $f_\varepsilon(x) := f(\varepsilon x)$, then by Theorem ??,
$$\mathcal{F}^{-1}\left( e^{i\lambda \cdot x} f_\varepsilon(x) \right)(\xi) = \varepsilon^{-n} f^\vee\left( \frac{\xi - \lambda}{\varepsilon} \right)$$
and therefore, from Eq. (22.40),
$$\int_{\mathbb{R}^n} e^{i\lambda \cdot x} f_\varepsilon(x)\, d\mu(x) = \int_{\mathbb{R}^n} \chi(\xi)\, \varepsilon^{-n} f^\vee\left( \frac{\xi - \lambda}{\varepsilon} \right) d\xi. \tag{22.41}$$
Because $\int_{\mathbb{R}^n} f^\vee(\xi)\, d\xi = \mathcal{F} f^\vee(0) = f(0) = 1$, we may apply the approximate $\delta$-function Theorem 22.44 below to Eq. (22.41) to find (using the continuity of $\chi$ here!)
$$\int_{\mathbb{R}^n} e^{i\lambda \cdot x} f_\varepsilon(x)\, d\mu(x) \to \chi(\lambda) \quad \text{as } \varepsilon \downarrow 0. \tag{22.42}$$
On the other hand, when $\lambda = 0$, the monotone convergence theorem implies $\mu(f_\varepsilon) \uparrow \mu(1) = \mu(\mathbb{R}^n)$ and therefore $\mu(\mathbb{R}^n) = \mu(1) = \chi(0) < \infty$. Now knowing that $\mu$ is a finite measure we may use the dominated convergence theorem to conclude
$$\mu\left( e^{i\lambda \cdot x} f_\varepsilon(x) \right) \to \mu\left( e^{i\lambda \cdot x} \right) = \hat{\mu}(\lambda) \quad \text{as } \varepsilon \downarrow 0$$
for all $\lambda$. Combining this equation with Eq. (22.42) shows $\hat{\mu}(\lambda) = \chi(\lambda)$ for all $\lambda \in \mathbb{R}^n$.
Theorem 22.44 (Approximate $\delta$-functions). Let $p \in [1, \infty]$, $\varphi \in L^1(\mathbb{R}^d)$, $a := \int_{\mathbb{R}^d} \varphi(x)\, dx$, and for $t > 0$ let $\varphi_t(x) = t^{-d} \varphi(x/t)$. Then
1. If $f \in L^p$ with $p < \infty$ then $\varphi_t \star f \to af$ in $L^p$ as $t \downarrow 0$.
2. If $f \in BC(\mathbb{R}^d)$ and $f$ is uniformly continuous then $\|\varphi_t \star f - af\|_\infty \to 0$ as $t \downarrow 0$.
3. If $f \in L^\infty$ and $f$ is continuous on an open set $U \subset \mathbb{R}^d$ then $\varphi_t \star f \to af$ uniformly on compact subsets of $U$ as $t \downarrow 0$.

Proof. Making the change of variables $y = tz$ implies
$$\varphi_t \star f(x) = \int_{\mathbb{R}^d} f(x - y) \varphi_t(y)\, dy = \int_{\mathbb{R}^d} f(x - tz) \varphi(z)\, dz$$
so that
$$\varphi_t \star f(x) - af(x) = \int_{\mathbb{R}^d} [f(x - tz) - f(x)] \varphi(z)\, dz = \int_{\mathbb{R}^d} [\tau_{tz} f(x) - f(x)] \varphi(z)\, dz. \tag{22.43}$$
Hence by Minkowski's inequality for integrals (Theorem ?? of the analysis notes), Proposition ?? and the dominated convergence theorem,
$$\|\varphi_t \star f - af\|_p \leq \int_{\mathbb{R}^d} \|\tau_{tz} f - f\|_p |\varphi(z)|\, dz \to 0 \quad \text{as } t \downarrow 0.$$
Item 2. is proved similarly. Indeed, from Eq. (22.43),
$$\|\varphi_t \star f - af\|_\infty \leq \int_{\mathbb{R}^d} \|\tau_{tz} f - f\|_\infty |\varphi(z)|\, dz,$$
which again tends to zero by the dominated convergence theorem because $\lim_{t \downarrow 0} \|\tau_{tz} f - f\|_\infty = 0$ uniformly in $z$ by the uniform continuity of $f$.

Item 3. Let $B_R = B(0, R)$ be a large ball in $\mathbb{R}^d$ and $K$ a compact subset of $U$, then
$$\sup_{x \in K} |\varphi_t \star f(x) - af(x)| \leq \left| \int_{B_R} [f(x - tz) - f(x)] \varphi(z)\, dz \right| + \left| \int_{B_R^c} [f(x - tz) - f(x)] \varphi(z)\, dz \right|$$
$$\leq \int_{B_R} |\varphi(z)|\, dz \cdot \sup_{x \in K,\, z \in B_R} |f(x - tz) - f(x)| + 2\|f\|_\infty \int_{B_R^c} |\varphi(z)|\, dz$$
$$\leq \|\varphi\|_1 \sup_{x \in K,\, z \in B_R} |f(x - tz) - f(x)| + 2\|f\|_\infty \int_{|z| > R} |\varphi(z)|\, dz$$
so that using the uniform continuity of $f$ on compact subsets of $U$,
$$\limsup_{t \downarrow 0} \sup_{x \in K} |\varphi_t \star f(x) - af(x)| \leq 2\|f\|_\infty \int_{|z| > R} |\varphi(z)|\, dz \to 0 \quad \text{as } R \to \infty.$$
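The convergence in Theorem 22.44 is easy to see numerically. In the sketch below (the test function, the Gaussian kernel, and the midpoint-rule quadrature are our own choices) we approximate $(\varphi_t \star f)(x)$ for a bounded uniformly continuous $f$ and watch the sup-norm error over a few sample points shrink as $t \downarrow 0$:

```python
import math

def f(x):
    # bounded, uniformly continuous test function
    return 1.0 / (1.0 + x * x)

def phi(z):
    # standard Gaussian density; total mass a = 1
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def smoothed(t, x, h=0.01, Z=8.0):
    # midpoint-rule approximation of (phi_t * f)(x) = int f(x - t z) phi(z) dz
    n = int(2 * Z / h)
    return h * sum(f(x - t * (-Z + (k + 0.5) * h)) * phi(-Z + (k + 0.5) * h)
                   for k in range(n))

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]
err_big = max(abs(smoothed(0.5, x) - f(x)) for x in xs)
err_small = max(abs(smoothed(0.05, x) - f(x)) for x in xs)
```

Shrinking $t$ by a factor of 10 shrinks the error by roughly a factor of 100 here, consistent with the second-order Taylor behavior of a smooth $f$.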
22.7 Appendix: Some Calculus Estimates

We end this section by gathering together a number of calculus estimates that we will need in the future.

1. Taylor's theorem with integral remainder states, if $f \in C^k(\mathbb{R})$ and $z, \Delta \in \mathbb{R}$, or $f$ is holomorphic in a neighborhood of $z \in \mathbb{C}$ and $\Delta \in \mathbb{C}$ is sufficiently small so that $f(z + t\Delta)$ is defined for $t \in [0, 1]$, then
$$f(z + \Delta) = \sum_{n=0}^{k-1} f^{(n)}(z) \frac{\Delta^n}{n!} + \Delta^k r_k(z, \Delta) \tag{22.44}$$
$$= \sum_{n=0}^{k-1} f^{(n)}(z) \frac{\Delta^n}{n!} + \Delta^k \left[ \frac{1}{k!} f^{(k)}(z) + \varepsilon(z, \Delta) \right] \tag{22.45}$$
where
$$r_k(z, \Delta) = \frac{1}{(k-1)!} \int_0^1 f^{(k)}(z + t\Delta)(1 - t)^{k-1}\, dt \tag{22.46}$$
$$= \frac{1}{k!} f^{(k)}(z) + \varepsilon(z, \Delta) \tag{22.47}$$
and
$$\varepsilon(z, \Delta) = \frac{1}{(k-1)!} \int_0^1 \left[ f^{(k)}(z + t\Delta) - f^{(k)}(z) \right] (1 - t)^{k-1}\, dt \to 0 \text{ as } \Delta \to 0. \tag{22.48}$$
To prove this, use integration by parts to show,
$$r_k(z, \Delta) = \frac{1}{k!} \int_0^1 f^{(k)}(z + t\Delta) \left( -\frac{d}{dt} \right) (1 - t)^k\, dt$$
$$= -\frac{1}{k!} \left[ f^{(k)}(z + t\Delta)(1 - t)^k \right]_{t=0}^{t=1} + \frac{\Delta}{k!} \int_0^1 f^{(k+1)}(z + t\Delta)(1 - t)^k\, dt$$
$$= \frac{1}{k!} f^{(k)}(z) + \Delta\, r_{k+1}(z, \Delta),$$
i.e.
$$\Delta^k r_k(z, \Delta) = \frac{1}{k!} f^{(k)}(z) \Delta^k + \Delta^{k+1} r_{k+1}(z, \Delta).$$
The result now follows by induction.
2. For $y \in \mathbb{R}$, $\sin y = y \int_0^1 \cos(ty)\, dt$ and hence
$$|\sin y| \leq |y|. \tag{22.49}$$
3. For $y \in \mathbb{R}$ we have
$$\cos y = 1 - y^2 \int_0^1 \cos(ty)(1 - t)\, dt \geq 1 - y^2 \int_0^1 (1 - t)\, dt = 1 - \frac{y^2}{2}.$$
Equivalently put³,
$$g(y) := \cos y - 1 + y^2/2 \geq 0 \text{ for all } y \in \mathbb{R}. \tag{22.50}$$

³ Alternatively,
$$|\sin y| = \left| \int_0^y \cos x\, dx \right| \leq \left| \int_0^y |\cos x|\, dx \right| \leq |y|$$
and for $y \geq 0$ we have,
$$\cos y - 1 = -\int_0^y \sin x\, dx \geq -\int_0^y x\, dx = -y^2/2.$$
This last inequality may also be proved as a simple calculus exercise following from: $g(\pm\infty) = \infty$ and $g'(y) = 0$ iff $\sin y = y$ which happens iff $y = 0$.
4. Since
$$|e^z - 1 - z| = \left| z^2 \int_0^1 e^{tz}(1 - t)\, dt \right| \leq |z|^2 \int_0^1 e^{t \operatorname{Re} z}(1 - t)\, dt \leq |z|^2 \int_0^1 e^{0 \vee \operatorname{Re} z}(1 - t)\, dt,$$
we have shown
$$|e^z - 1 - z| \leq e^{0 \vee \operatorname{Re} z} \cdot \frac{|z|^2}{2}. \tag{22.51}$$
In particular if $\operatorname{Re} z \leq 0$, then
$$|e^z - 1 - z| \leq |z|^2 / 2. \tag{22.52}$$
5. Since $e^{iy} - 1 = iy \int_0^1 e^{ity}\, dt$, $\left| e^{iy} - 1 \right| \leq |y|$ and hence
$$\left| e^{iy} - 1 \right| \leq 2 \wedge |y| \text{ for all } y \in \mathbb{R}. \tag{22.53}$$
Lemma 22.45. For $z = re^{i\theta}$ with $-\pi < \theta < \pi$ and $r > 0$, let $\ln z = \ln r + i\theta$. Then $\ln : \mathbb{C} \setminus (-\infty, 0] \to \mathbb{C}$ is a holomorphic function such that $e^{\ln z} = z$⁴ and if $|z| < 1$ then
$$|\ln(1 + z) - z| \leq |z|^2 \frac{1}{2(1 - |z|)^2} \text{ for } |z| < 1. \tag{22.54}$$

Proof. Clearly $e^{\ln z} = z$ and $\ln z$ is continuous. Therefore by the inverse function theorem for holomorphic functions, $\ln z$ is holomorphic and
$$z \frac{d}{dz} \ln z = e^{\ln z} \frac{d}{dz} \ln z = 1.$$
Therefore, $\frac{d}{dz} \ln z = \frac{1}{z}$ and $\frac{d^2}{dz^2} \ln z = -\frac{1}{z^2}$. So by Taylor's theorem,
$$\ln(1 + z) = z - z^2 \int_0^1 \frac{1}{(1 + tz)^2} (1 - t)\, dt. \tag{22.55}$$
If $t \geq 0$ and $|z| < 1$, then
$$\left| \frac{1}{1 + tz} \right| \leq \sum_{n=0}^\infty |tz|^n = \frac{1}{1 - t|z|} \leq \frac{1}{1 - |z|},$$
and therefore,
$$\left| \int_0^1 \frac{1}{(1 + tz)^2} (1 - t)\, dt \right| \leq \frac{1}{2(1 - |z|)^2}. \tag{22.56}$$
Eq. (22.54) is now a consequence of Eq. (22.55) and Eq. (22.56).

⁴ For the purposes of this lemma it suffices to define $\ln(1 + z) = -\sum_{n=1}^\infty (-z)^n / n$ and to then observe: 1)
$$\frac{d}{dz} \ln(1 + z) = \sum_{n=0}^\infty (-z)^n = \frac{1}{1 + z},$$
and 2) the functions $1 + z$ and $e^{\ln(1+z)}$ both solve
$$f'(z) = \frac{1}{1 + z} f(z) \text{ with } f(0) = 1$$
and therefore $e^{\ln(1+z)} = 1 + z$.
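Eq. (22.54) can likewise be spot-checked with the principal branch of the complex logarithm; since $|z| < 1$ keeps $1 + z$ away from the branch cut, `cmath.log` computes exactly the $\ln$ of the lemma. The sample points below are our own choice:

```python
import cmath

def gap(z):
    # |ln(1+z) - z|, the left side of Eq. (22.54)
    return abs(cmath.log(1 + z) - z)

def bound(z):
    # the right side of Eq. (22.54)
    return abs(z) ** 2 / (2 * (1 - abs(z)) ** 2)

zs = [r * cmath.exp(1j * th) for r in (0.1, 0.4, 0.7, 0.9)
      for th in (0.0, 1.0, 2.0, 3.0, -2.5)]
ok = all(gap(z) <= bound(z) + 1e-12 for z in zs)
```

Note the bound blows up as $|z| \uparrow 1$, which matches the loss of control near the boundary of the disk in Eq. (22.56).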
Lemma 22.46. For all $y \in \mathbb{R}$ and $n \in \mathbb{N}_0$,
$$\left| e^{iy} - \sum_{k=0}^n \frac{(iy)^k}{k!} \right| \leq \frac{|y|^{n+1}}{(n+1)!} \tag{22.57}$$
and in particular,
$$\left| e^{iy} - 1 \right| \leq |y| \wedge 2 \tag{22.58}$$
and
$$\left| e^{iy} - \left( 1 + iy - \frac{y^2}{2!} \right) \right| \leq y^2 \wedge \frac{|y|^3}{3!}. \tag{22.59}$$
More generally for all $n \in \mathbb{N}$ we have
$$\left| e^{iy} - \sum_{k=0}^n \frac{(iy)^k}{k!} \right| \leq \frac{|y|^{n+1}}{(n+1)!} \wedge \frac{2|y|^n}{n!}. \tag{22.60}$$

Proof. By Taylor's theorem (see Eq. (22.44) with $f(y) = e^{iy}$, $z = 0$ and $\Delta = y$) we have
$$\left| e^{iy} - \sum_{k=0}^n \frac{(iy)^k}{k!} \right| = \left| \frac{y^{n+1}}{n!} \int_0^1 i^{n+1} e^{ity}(1 - t)^n\, dt \right| \leq \frac{|y|^{n+1}}{n!} \int_0^1 (1 - t)^n\, dt = \frac{|y|^{n+1}}{(n+1)!}$$
which is Eq. (22.57). Using Eq. (22.57) with $n = 0$ and the simple estimate $\left| e^{iy} - 1 \right| \leq 2$ gives Eq. (22.58). Similarly, Eq. (22.59) follows from the estimates coming from Eq. (22.57) with $n = 1$ and $n = 2$ respectively;
$$\left| e^{iy} - \left( 1 + iy - \frac{y^2}{2!} \right) \right| \leq \left| e^{iy} - (1 + iy) \right| + \frac{y^2}{2} \leq \frac{y^2}{2} + \frac{y^2}{2} = y^2$$
and
$$\left| e^{iy} - \left( 1 + iy - \frac{y^2}{2!} \right) \right| \leq \frac{|y|^3}{3!}.$$
Equation (22.60) is proved similarly and hence its proof will be omitted.
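The Taylor-remainder bounds (22.57) and (22.60) hold for every real $y$ and every order, so they can be verified on a grid (our own choice of points and orders):

```python
import cmath, math

def taylor_gap(y, n):
    # |e^{iy} - sum_{k <= n} (iy)^k / k!|
    s = sum((1j * y) ** k / math.factorial(k) for k in range(n + 1))
    return abs(cmath.exp(1j * y) - s)

ys = (-7.0, -2.2, -0.5, 0.0, 0.3, 1.0, 4.5, 10.0)
# Eq. (22.57): remainder <= |y|^{n+1} / (n+1)!
ok_57 = all(taylor_gap(y, n) <= abs(y) ** (n + 1) / math.factorial(n + 1) + 1e-12
            for y in ys for n in range(5))
# Eq. (22.60): also <= 2 |y|^n / n!, useful when |y| is large
ok_60 = all(taylor_gap(y, n) <= min(abs(y) ** (n + 1) / math.factorial(n + 1),
                                    2 * abs(y) ** n / math.factorial(n)) + 1e-12
            for y in ys for n in range(1, 5))
```

The second bound in (22.60) is the one that matters for large $|y|$, where the factorial bound alone is useless.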
Lemma 22.47. If $X$ is a square integrable random variable, then
$$\left| f(\lambda) - \left( 1 + i\lambda \mathbb{E}X - \frac{\lambda^2}{2!} \mathbb{E}\left[ X^2 \right] \right) \right| \leq \mathbb{E}\left| e^{i\lambda X} - \left( 1 + i\lambda X - \frac{\lambda^2 X^2}{2!} \right) \right| \leq \lambda^2 \varepsilon(\lambda) \tag{22.61}$$
where
$$\varepsilon(\lambda) := \mathbb{E}\left[ X^2 \wedge \frac{|\lambda| |X|^3}{3!} \right] \to 0 \text{ as } \lambda \to 0. \tag{22.62}$$

Proof. Using Eq. (22.59) with $y = \lambda X$ and taking expectations gives Eq. (22.61). The DCT, with $X^2 \in L^1(P)$ being the dominating function, allows us to conclude that $\lim_{\lambda \to 0} \varepsilon(\lambda) = 0$.
23
Weak Convergence of Random Sums

Throughout this chapter, we will assume the following standing notation unless otherwise stated. For each $n \in \mathbb{N}$, let $\{X_{n,k}\}_{k=1}^n$ be independent random variables and let
$$S_n := \sum_{k=1}^n X_{n,k}. \tag{23.1}$$
Also let
$$f_{nk}(\lambda) := \mathbb{E}\left[ e^{i\lambda X_{n,k}} \right] \tag{23.2}$$
denote the characteristic function of $X_{n,k}$. In Section 23.1 we are going to describe necessary and sufficient conditions on the array, $\{X_{n,k} : 1 \leq k \leq n < \infty\}$, so that $S_n \implies N(0, 1)$. In the later sections we are going to explore other possible limiting distributions for the $\{S_n\}_{n=1}^\infty$. This will lead us to the notions of infinitely divisible and stable random variables.
23.1 Lindeberg-Feller CLT

Assumption 2 Until further notice we are going to assume $\mathbb{E}[X_{n,k}] = 0$, $\sigma_{n,k}^2 = \mathbb{E}\left[ X_{n,k}^2 \right] < \infty$, and $\operatorname{Var}(S_n) = \sum_{k=1}^n \sigma_{n,k}^2 = 1$.

Example 23.1. Suppose $\{X_n\}_{n=1}^\infty$ are mean zero square integrable random variables with $\sigma_k^2 = \operatorname{Var}(X_k)$. If we let $s_n^2 := \sum_{k=1}^n \operatorname{Var}(X_k) = \sum_{k=1}^n \sigma_k^2$, $\sigma_{n,k}^2 := \sigma_k^2 / s_n^2$, and $X_{n,k} := X_k / s_n$, then $\{X_{n,k}\}_{k=1}^n$ satisfy the above hypothesis and $S_n = \frac{1}{s_n} \sum_{k=1}^n X_k$.

Our main interest in this chapter is to consider the limiting behavior of $S_n$ as $n \to \infty$. In order to do this, it will be useful to put conditions on the $\{X_{n,k}\}$ such that no one term dominates the sum defining $S_n$ in Eq. (23.1) in the limit as $n \to \infty$.

Definition 23.2. Let $\{X_{n,k}\}$ be as above.
1. $\{X_{n,k}\}$ satisfies the Lindeberg Condition (LC) iff
$$\lim_{n \to \infty} \sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2 : |X_{n,k}| > t \right] = 0 \text{ for all } t > 0. \tag{23.3}$$
(The (LC) condition is really a condition about small $t$.)
2. $\{X_{n,k}\}$ satisfies condition (M) if
$$D_n := \max\left\{ \sigma_{n,k}^2 : k \leq n \right\} \to 0 \text{ as } n \to \infty. \tag{23.4}$$
3. $\{X_{n,k}\}$ is uniformly asymptotically negligible (UAN) if for all $\varepsilon > 0$,
$$\lim_{n \to \infty} \max_{k \leq n} P(|X_{n,k}| > \varepsilon) = 0. \tag{23.5}$$
Clearly it suces to test the Lindeberg condition for small t only. Each of
these conditions imposes constraints on the size of the tails of the X
n,k
, see
Lemma 23.6 below where it is shown (LC) = (M) = (UAN) . Condition
(M) asserts that all of the terms in the sum

n
k=1

2
n,k
= Var (S
n
) = 1 are
small so that no one term is contributing by itself.
Remark 23.3. The reader should observe that in order for condition (M) to hold
in the setup in Example 23.1 it is necessary that lim
n
s
2
n
= .
Example 23.4. Suppose $\{X_n\}_{n=1}^\infty$ are i.i.d. with $\mathbb{E}X_n = 0$ and $\operatorname{Var}(X_n) = \sigma^2$. Then $\left\{ X_{n,k} := \frac{1}{\sigma\sqrt{n}} X_k \right\}_{k=1}^n$ satisfy (LC). Indeed,
$$\sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2 : |X_{n,k}| > t \right] = \frac{1}{n\sigma^2} \sum_{k=1}^n \mathbb{E}\left[ X_k^2 : |X_k| > \sigma\sqrt{n}\, t \right] = \frac{1}{\sigma^2} \mathbb{E}\left[ X_1^2 : |X_1| > \sigma\sqrt{n}\, t \right]$$
which tends to zero as $n \to \infty$ by DCT.
Lemma 23.5. Let $\{X_{n,k}\}_{k=1}^n$ for $n \in \mathbb{N}$ be as in Assumption 2. If $\{X_{n,k}\}_{k=1}^n$ satisfy the Liapunov condition;
$$\lim_{n \to \infty} \sum_{k=1}^n \mathbb{E}|X_{n,k}|^\alpha = 0 \text{ for some } \alpha > 2, \tag{23.6}$$
then (LC) holds. More generally, if $\{X_{n,k}\}$ satisfies the Liapunov condition,
$$\lim_{n \to \infty} \sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2\, \varphi(|X_{n,k}|) \right] = 0$$
where $\varphi : [0, \infty) \to [0, \infty)$ is a non-decreasing function such that $\varphi(t) > 0$ for all $t > 0$, then $\{X_{n,k}\}$ satisfies (LC).

Proof. Assuming Eq. (23.6), then for any $t > 0$,
$$\sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2 : |X_{n,k}| > t \right] \leq \sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2 \left| \frac{X_{n,k}}{t} \right|^{\alpha - 2} : |X_{n,k}| > t \right] \leq \frac{1}{t^{\alpha - 2}} \sum_{k=1}^n \mathbb{E}|X_{n,k}|^\alpha \to 0 \text{ as } n \to \infty.$$
The generalization is proved similarly;
$$\sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2 : |X_{n,k}| > t \right] \leq \sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2 \frac{\varphi(|X_{n,k}|)}{\varphi(t)} : |X_{n,k}| > t \right] \leq \frac{1}{\varphi(t)} \sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2\, \varphi(|X_{n,k}|) \right] \to 0 \text{ as } n \to \infty.$$
Lemma 23.6. Let $\{X_{n,k} : 1 \leq k \leq n < \infty\}$ be as above, then $(LC) \implies (M) \implies (UAN)$. Moreover the Lindeberg Condition (LC) implies the following strong form of (UAN);
$$\sum_{k=1}^n P(|X_{n,k}| > \varepsilon) \leq \frac{1}{\varepsilon^2} \sum_{k=1}^n \mathbb{E}\left[ |X_{n,k}|^2 : |X_{n,k}| > \varepsilon \right] \to 0. \tag{23.7}$$

Proof. For $k \leq n$,
$$\sigma_{n,k}^2 = \mathbb{E}\left[ X_{n,k}^2 \right] = \mathbb{E}\left[ X_{n,k}^2 1_{|X_{n,k}| \leq t} \right] + \mathbb{E}\left[ X_{n,k}^2 1_{|X_{n,k}| > t} \right] \leq t^2 + \mathbb{E}\left[ X_{n,k}^2 1_{|X_{n,k}| > t} \right] \leq t^2 + \sum_{m=1}^n \mathbb{E}\left[ X_{n,m}^2 1_{|X_{n,m}| > t} \right]$$
and therefore using (LC) we find
$$\limsup_{n \to \infty} \max_{k \leq n} \sigma_{n,k}^2 \leq t^2 \text{ for all } t > 0.$$
This clearly implies (M) holds. For $\varepsilon > 0$ we have by Chebyschev's inequality that
$$P(|X_{n,k}| > \varepsilon) \leq \frac{1}{\varepsilon^2} \mathbb{E}\left[ |X_{n,k}|^2 : |X_{n,k}| > \varepsilon \right] \leq \frac{\sigma_{n,k}^2}{\varepsilon^2} \tag{23.8}$$
and therefore,
$$\max_{k \leq n} P(|X_{n,k}| > \varepsilon) \leq \frac{1}{\varepsilon^2} \max_{k \leq n} \sigma_{n,k}^2 = \frac{1}{\varepsilon^2} D_n \to 0 \text{ as } n \to \infty,$$
which shows $(M) \implies (UAN)$. Summing Eq. (23.8) on $k$ gives Eq. (23.7) and the right member of this equation tends to zero as $n \to \infty$ by (LC).
We will need the following lemma for our subsequent applications of the continuity theorem.

Lemma 23.7. Suppose that $a_i, b_i \in \mathbb{C}$ with $|a_i|, |b_i| \leq 1$ for $i = 1, 2, \ldots, n$. Then
$$\left| \prod_{i=1}^n a_i - \prod_{i=1}^n b_i \right| \leq \sum_{i=1}^n |a_i - b_i|.$$

Proof. Let $a := \prod_{i=1}^{n-1} a_i$ and $b := \prod_{i=1}^{n-1} b_i$ and observe that $|a|, |b| \leq 1$ and that
$$|a_n a - b_n b| \leq |a_n a - a_n b| + |a_n b - b_n b| = |a_n| |a - b| + |a_n - b_n| |b| \leq |a - b| + |a_n - b_n|.$$
The proof is now easily completed by induction on $n$.
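Lemma 23.7 is a purely algebraic inequality, so a randomized check over the closed unit disk is a quick way to convince oneself of it (the sampler and seed are our own choices):

```python
import math, random

def prod(zs):
    # product of a list of complex numbers
    out = complex(1.0, 0.0)
    for z in zs:
        out *= z
    return out

def rand_disk(rng):
    # a point of the closed unit disk, |z| <= 1
    r, th = math.sqrt(rng.random()), rng.uniform(0.0, 2 * math.pi)
    return complex(r * math.cos(th), r * math.sin(th))

rng = random.Random(7)
ok = True
for _ in range(500):
    n = rng.randint(1, 8)
    a = [rand_disk(rng) for _ in range(n)]
    b = [rand_disk(rng) for _ in range(n)]
    lhs = abs(prod(a) - prod(b))
    rhs = sum(abs(x - y) for x, y in zip(a, b))
    ok = ok and (lhs <= rhs + 1e-12)
```

The restriction $|a_i|, |b_i| \leq 1$ is essential; the telescoping step in the proof uses it to keep the partial products from amplifying the differences.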
Theorem 23.8 (Lindeberg-Feller CLT (I)). Suppose $\{X_{n,k}\}$ satisfies (LC) and the hypothesis in Assumption 2, then
$$S_n \implies N(0, 1). \tag{23.9}$$
(See Theorem 23.13 for a converse to this theorem.)

To prove this theorem we must show
$$\mathbb{E}\left[ e^{i\lambda S_n} \right] \to e^{-\lambda^2/2} \text{ as } n \to \infty. \tag{23.10}$$
Before starting the formal proof, let me give an informal explanation for Eq. (23.10). Using
$$f_{nk}(\lambda) \approx 1 - \frac{\lambda^2}{2}\sigma_{nk}^2,$$
we might expect
$$\mathbb{E}\left[ e^{i\lambda S_n} \right] = \prod_{k=1}^n f_{nk}(\lambda) = e^{\sum_{k=1}^n \ln f_{nk}(\lambda)} = e^{\sum_{k=1}^n \ln(1 + f_{nk}(\lambda) - 1)}$$
$$\overset{(A)}{\approx} e^{\sum_{k=1}^n (f_{nk}(\lambda) - 1)} \left( = \prod_{k=1}^n e^{(f_{nk}(\lambda) - 1)} \right) \overset{(B)}{\approx} e^{-\sum_{k=1}^n \frac{\lambda^2}{2}\sigma_{nk}^2} = e^{-\frac{\lambda^2}{2}}.$$
The question then becomes under what conditions are these approximations valid. It turns out that approximation (A), namely that
$$\lim_{n \to \infty} \left| \prod_{k=1}^n f_{nk}(\lambda) - \exp\left( \sum_{k=1}^n (f_{nk}(\lambda) - 1) \right) \right| = 0, \tag{23.11}$$
is valid if condition (M) holds, see Lemma 23.11 below, and the approximation (B) is valid, i.e.
$$\lim_{n \to \infty} \sum_{k=1}^n (f_{nk}(\lambda) - 1) = -\frac{1}{2}\lambda^2,$$
if (LC) is satisfied, see Lemma 23.9. These observations would then constitute a proof of Theorem 23.8. The proof we give below of Theorem 23.8 will not quite follow this route and will not use Lemma 23.11 directly. However, this lemma will be used in the proofs of Theorems 23.13 and 23.22.
Proof. (Proof of Theorem 23.8) Since
$$\mathbb{E}\left[ e^{i\lambda S_n} \right] = \prod_{k=1}^n f_{nk}(\lambda) \text{ and } e^{-\lambda^2/2} = \prod_{k=1}^n e^{-\lambda^2 \sigma_{n,k}^2 / 2},$$
we may use Lemma 23.7 to conclude,
$$\left| \mathbb{E}\left[ e^{i\lambda S_n} \right] - e^{-\lambda^2/2} \right| \leq \sum_{k=1}^n \left| f_{nk}(\lambda) - e^{-\lambda^2 \sigma_{n,k}^2 / 2} \right| \leq \sum_{k=1}^n (A_{n,k} + B_{n,k})$$
where $A_{n,k}$ is defined in Eq. (23.12) and
$$B_{n,k} := \left| \left( 1 - \frac{\lambda^2 \sigma_{n,k}^2}{2} \right) - e^{-\lambda^2 \sigma_{n,k}^2 / 2} \right|.$$
Because of Lemma 23.9 below, in order to finish the proof it suffices to show $\lim_{n \to \infty} \sum_{k=1}^n B_{n,k} = 0$. To estimate $\sum_{k=1}^n B_{n,k}$, we use the estimate, $|e^{-u} - 1 + u| \leq u^2/2$ valid for $u \geq 0$ (see Eq. (22.52) with $z = -u$). With this estimate we find,
$$\sum_{k=1}^n B_{n,k} = \sum_{k=1}^n \left| \left( 1 - \frac{\lambda^2 \sigma_{n,k}^2}{2} \right) - e^{-\lambda^2 \sigma_{n,k}^2 / 2} \right| \leq \sum_{k=1}^n \frac{1}{2} \left( \frac{\lambda^2 \sigma_{n,k}^2}{2} \right)^2 = \frac{\lambda^4}{8} \sum_{k=1}^n \sigma_{n,k}^4 \leq \frac{\lambda^4}{8} \max_{k \leq n} \sigma_{n,k}^2 \cdot \sum_{k=1}^n \sigma_{n,k}^2 = \frac{\lambda^4}{8} \max_{k \leq n} \sigma_{n,k}^2 \to 0,$$
wherein we have used (M) (which is implied by (LC)) in taking the limit as $n \to \infty$.
Lemma 23.9. Let
$$A_{n,k} := \left| f_{nk}(\lambda) - 1 + \frac{\lambda^2 \sigma_{n,k}^2}{2} \right| \tag{23.12}$$
and assume that $\{X_{n,k}\}_{k=1}^n$ satisfies (LC), then
$$\limsup_{n \to \infty} \sum_{k=1}^n A_{n,k} = 0 \text{ and } \lim_{n \to \infty} \sum_{k=1}^n (f_{nk}(\lambda) - 1) = -\lambda^2/2 \text{ for all } \lambda \in \mathbb{R}.$$

Proof. Rewriting $A_{n,k}$ and using Lemma 22.47 implies for every $\varepsilon > 0$ that,
$$A_{n,k} = \left| \mathbb{E}\left[ e^{i\lambda X_{n,k}} - 1 + \frac{\lambda^2}{2} X_{n,k}^2 \right] \right| \leq \mathbb{E}\left| e^{i\lambda X_{n,k}} - 1 + \frac{\lambda^2}{2} X_{n,k}^2 \right|$$
$$\leq \lambda^2 \mathbb{E}\left[ X_{n,k}^2 \wedge \frac{|\lambda| |X_{n,k}|^3}{3!} \right]$$
$$\leq \lambda^2 \mathbb{E}\left[ X_{n,k}^2 \wedge \frac{|\lambda| |X_{n,k}|^3}{3!} : |X_{n,k}| \leq \varepsilon \right] + \lambda^2 \mathbb{E}\left[ X_{n,k}^2 \wedge \frac{|\lambda| |X_{n,k}|^3}{3!} : |X_{n,k}| > \varepsilon \right]$$
$$\leq \frac{|\lambda|^3 \varepsilon}{3!} \mathbb{E}\left[ |X_{n,k}|^2 : |X_{n,k}| \leq \varepsilon \right] + \lambda^2 \mathbb{E}\left[ X_{n,k}^2 : |X_{n,k}| > \varepsilon \right] \leq \frac{|\lambda|^3 \varepsilon}{6} \sigma_{n,k}^2 + \lambda^2 \mathbb{E}\left[ X_{n,k}^2 : |X_{n,k}| > \varepsilon \right].$$
Summing this equation on $k$ and making use of (LC) gives;
$$\limsup_{n \to \infty} \sum_{k=1}^n A_{n,k} \leq \frac{|\lambda|^3 \varepsilon}{6} \to 0 \text{ as } \varepsilon \downarrow 0. \tag{23.13}$$
The second limit follows from the first and the simple estimate;
$$\left| \sum_{k=1}^n (f_{nk}(\lambda) - 1) + \lambda^2/2 \right| = \left| \sum_{k=1}^n \left( f_{nk}(\lambda) - 1 + \frac{\lambda^2 \sigma_{n,k}^2}{2} \right) \right| \leq \sum_{k=1}^n A_{n,k}.$$

As an application of Theorem 23.8 we can give half of the proof of Theorem 20.35.
Theorem 23.10 (Converse assertion in Theorem 20.35). If $\{X_n\}_{n=1}^\infty$ are independent random variables and the random series, $\sum_{n=1}^\infty X_n$, is almost surely convergent, then for all $c > 0$ the following three series converge;
1. $\sum_{n=1}^\infty P(|X_n| > c) < \infty$,
2. $\sum_{n=1}^\infty \operatorname{Var}\left( X_n 1_{|X_n| \leq c} \right) < \infty$, and
3. $\sum_{n=1}^\infty \mathbb{E}\left[ X_n 1_{|X_n| \leq c} \right]$ converges.

Proof. Since $\sum_{n=1}^\infty X_n$ is almost surely convergent, it follows that $\lim_{n \to \infty} X_n = 0$ a.s. and hence for every $c > 0$, $P(|X_n| \geq c \text{ i.o.}) = 0$. According to the Borel zero-one law (Lemma 10.41) this implies for every $c > 0$ that $\sum_{n=1}^\infty P(|X_n| > c) < \infty$. Since $X_n \to 0$ a.s., $\{X_n\}$ and $\left\{ X_n^c := X_n 1_{|X_n| \leq c} \right\}$ are tail equivalent for all $c > 0$. In particular $\sum_{n=1}^\infty X_n^c$ is almost surely convergent for all $c > 0$.

Fix $c > 0$, let $Y_n := X_n^c - \mathbb{E}[X_n^c]$ and let
$$s_n^2 := \operatorname{Var}(Y_1 + \cdots + Y_n) = \sum_{k=1}^n \operatorname{Var}(Y_k) = \sum_{k=1}^n \operatorname{Var}(X_k^c) = \sum_{k=1}^n \operatorname{Var}\left( X_k 1_{|X_k| \leq c} \right).$$
For the sake of contradiction, suppose $s_n^2 \to \infty$ as $n \to \infty$. Since $|Y_k| \leq 2c$, it follows that $\sum_{k=1}^n \mathbb{E}\left[ Y_k^2 1_{|Y_k| > s_n t} \right] = 0$ for all sufficiently large $n$ and hence
$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^n \mathbb{E}\left[ Y_k^2 1_{|Y_k| > s_n t} \right] = 0,$$
i.e. $\left\{ Y_{n,k} := Y_k / s_n \right\}_{k=1}^n$ satisfies (LC), see Example 23.1 and Remark 23.3. So by the central limit Theorem 23.8, it follows that
$$\frac{1}{s_n} \sum_{k=1}^n (X_k^c - \mathbb{E}[X_k^c]) = \frac{1}{s_n} \sum_{k=1}^n Y_k \implies N(0, 1).$$
On the other hand we know
$$\lim_{n \to \infty} \frac{1}{s_n} \sum_{k=1}^n X_k^c = \frac{\sum_{k=1}^\infty X_k^c}{\lim_{n \to \infty} s_n} = 0 \text{ a.s.}$$
and so by Slutsky's theorem,
$$-\frac{1}{s_n} \sum_{k=1}^n \mathbb{E}[X_k^c] = \frac{1}{s_n} \sum_{k=1}^n X_k^c - \frac{1}{s_n} \sum_{k=1}^n Y_k \implies N(0, 1).$$
But it is not possible for constant (i.e. non-random) variables, $c_n := \frac{1}{s_n} \sum_{k=1}^n \mathbb{E}[X_k^c]$, to converge to a non-degenerate limit. (Think about this either in terms of characteristic functions or in terms of distribution functions.) Thus we must conclude that
$$\sum_{n=1}^\infty \operatorname{Var}\left( X_n 1_{|X_n| \leq c} \right) = \sum_{n=1}^\infty \operatorname{Var}(X_n^c) = \lim_{n \to \infty} s_n^2 < \infty.$$
An application of Kolmogorov's convergence criteria (Theorem 20.11) implies that $\sum_{n=1}^\infty (X_n^c - \mathbb{E}[X_n^c])$ is convergent a.s. Since we already know that $\sum_{n=1}^\infty X_n^c$ is convergent almost surely we may now conclude $\sum_{n=1}^\infty \mathbb{E}\left[ X_n 1_{|X_n| \leq c} \right]$ is convergent.
Let us now turn to the converse of Theorem 23.8, see Theorem 23.13 below.

Lemma 23.11. Suppose that $\{X_{n,k}\}$ satisfies property (M), i.e. $D_n := \max_{k \leq n} \sigma_{n,k}^2 \to 0$. If we define,
$$\varphi_{n,k}(\lambda) := f_{n,k}(\lambda) - 1 = \mathbb{E}\left[ e^{i\lambda X_{n,k}} - 1 \right],$$
then;
1. $\lim_{n \to \infty} \max_{k \leq n} |\varphi_{n,k}(\lambda)| = 0$ and
2. $\left| f_{S_n}(\lambda) - \prod_{k=1}^n e^{\varphi_{n,k}(\lambda)} \right| \to 0$ as $n \to \infty$, where
$$f_{S_n}(\lambda) = \mathbb{E}\left[ e^{i\lambda S_n} \right] = \prod_{k=1}^n f_{n,k}(\lambda).$$

Proof. For any $\varepsilon > 0$ we have, making use of Eq. (22.58) and Chebyschev's inequality, that
$$|\varphi_{n,k}(\lambda)| = |f_{n,k}(\lambda) - 1| \leq \mathbb{E}\left| e^{i\lambda X_{n,k}} - 1 \right| \leq \mathbb{E}\left[ 2 \wedge |\lambda X_{n,k}| \right]$$
$$\leq \mathbb{E}\left[ 2 \wedge |\lambda X_{n,k}| : |X_{n,k}| \geq \varepsilon \right] + \mathbb{E}\left[ 2 \wedge |\lambda X_{n,k}| : |X_{n,k}| < \varepsilon \right] \leq 2 P(|X_{n,k}| \geq \varepsilon) + |\lambda| \varepsilon \leq 2 \frac{\sigma_{n,k}^2}{\varepsilon^2} + |\lambda| \varepsilon.$$
Therefore,
$$\limsup_{n \to \infty} \max_{k \leq n} |\varphi_{n,k}(\lambda)| \leq \limsup_{n \to \infty} \left( 2 \frac{D_n}{\varepsilon^2} + |\lambda| \varepsilon \right) = |\lambda| \varepsilon \to 0 \text{ as } \varepsilon \downarrow 0.$$
For the second item, observe that $\operatorname{Re} \varphi_{n,k}(\lambda) = \operatorname{Re} f_{n,k}(\lambda) - 1 \leq 0$ and hence $\left| e^{\varphi_{n,k}(\lambda)} \right| = e^{\operatorname{Re} \varphi_{n,k}(\lambda)} \leq 1$. Therefore by Lemma 23.7 and the estimate (22.52) we find;
$$\left| \prod_{k=1}^n e^{\varphi_{n,k}(\lambda)} - \prod_{k=1}^n f_{n,k}(\lambda) \right| \leq \sum_{k=1}^n \left| e^{\varphi_{n,k}(\lambda)} - f_{n,k}(\lambda) \right| = \sum_{k=1}^n \left| e^{\varphi_{n,k}(\lambda)} - (1 + \varphi_{n,k}(\lambda)) \right|$$
$$\leq \frac{1}{2} \sum_{k=1}^n |\varphi_{n,k}(\lambda)|^2 \leq \frac{1}{2} \max_{k \leq n} |\varphi_{n,k}(\lambda)| \sum_{k=1}^n |\varphi_{n,k}(\lambda)|.$$
Since $\mathbb{E}X_{n,k} = 0$ we may express $\varphi_{n,k}$ as
$$\varphi_{n,k}(\lambda) = \mathbb{E}\left[ e^{i\lambda X_{n,k}} - 1 - i\lambda X_{n,k} \right]$$
and then using the estimate in Eq. (22.52) again shows
$$\sum_{k=1}^n |\varphi_{n,k}(\lambda)| = \sum_{k=1}^n \left| \mathbb{E}\left[ e^{i\lambda X_{n,k}} - 1 - i\lambda X_{n,k} \right] \right| \leq \sum_{k=1}^n \mathbb{E}\left[ \frac{1}{2} |\lambda X_{n,k}|^2 \right] = \frac{\lambda^2}{2} \sum_{k=1}^n \sigma_{n,k}^2 = \frac{\lambda^2}{2}.$$
Thus we have shown,
$$\left| \prod_{k=1}^n f_{n,k}(\lambda) - \prod_{k=1}^n e^{\varphi_{n,k}(\lambda)} \right| \leq \frac{\lambda^2}{4} \max_{k \leq n} |\varphi_{n,k}(\lambda)|$$
and the latter expression tends to zero by item 1.
Lemma 23.12. Let $X$ be a random variable such that $\mathbb{E}X^2 < \infty$ and $\mathbb{E}X = 0$. Further let $f(\lambda) := \mathbb{E}\left[ e^{i\lambda X} \right]$ and $u(\lambda) := \operatorname{Re}(f(\lambda) - 1)$. Then for all $c > 0$,
$$u(\lambda) + \frac{\lambda^2}{2} \mathbb{E}\left[ X^2 \right] \geq \mathbb{E}\left[ X^2 \left( \frac{\lambda^2}{2} - \frac{2}{c^2} \right) : |X| > c \right] \tag{23.14}$$
or equivalently
$$\mathbb{E}\left[ \cos \lambda X - 1 + \frac{\lambda^2}{2} X^2 \right] \geq \mathbb{E}\left[ X^2 \left( \frac{\lambda^2}{2} - \frac{2}{c^2} \right) : |X| > c \right]. \tag{23.15}$$
In particular if we choose $|\lambda| \geq \sqrt{6}/|c|$, then
$$\mathbb{E}\left[ \cos \lambda X - 1 + \frac{\lambda^2}{2} X^2 \right] \geq \frac{1}{c^2} \mathbb{E}\left[ X^2 : |X| > c \right]. \tag{23.16}$$

Proof. For all $\lambda \in \mathbb{R}$, we have (see Eq. (22.50)) $\cos \lambda X - 1 + \frac{\lambda^2}{2} X^2 \geq 0$ and $\cos \lambda X - 1 \geq -2$. Therefore,
$$u(\lambda) + \frac{\lambda^2}{2} \mathbb{E}\left[ X^2 \right] = \mathbb{E}\left[ \cos \lambda X - 1 + \frac{\lambda^2}{2} X^2 \right] \geq \mathbb{E}\left[ \cos \lambda X - 1 + \frac{\lambda^2}{2} X^2 : |X| > c \right]$$
$$\geq \mathbb{E}\left[ -2 + \frac{\lambda^2}{2} X^2 : |X| > c \right] \geq \mathbb{E}\left[ -2 \frac{|X|^2}{c^2} + \frac{\lambda^2}{2} X^2 : |X| > c \right],$$
which gives Eq. (23.14).
Theorem 23.13 (Lindeberg-Feller CLT (II)). Suppose $\{X_{n,k}\}$ satisfies (M) and also the central limit theorem in Eq. (23.9) holds, then $\{X_{n,k}\}$ satisfies (LC). So under condition (M), $S_n$ converges to a normal random variable iff (LC) holds.

Proof. By assumption we have
$$\lim_{n \to \infty} \max_{k \leq n} \sigma_{n,k}^2 = 0 \text{ and } \lim_{n \to \infty} \prod_{k=1}^n f_{n,k}(\lambda) = e^{-\lambda^2/2}.$$
The second equality combined with Lemma 23.11 implies,
$$\lim_{n \to \infty} e^{\sum_{k=1}^n \varphi_{n,k}(\lambda)} = \lim_{n \to \infty} \prod_{k=1}^n e^{\varphi_{n,k}(\lambda)} = e^{-\lambda^2/2}.$$
Taking the modulus of this equation then implies,
$$\lim_{n \to \infty} e^{\sum_{k=1}^n \operatorname{Re} \varphi_{n,k}(\lambda)} = \lim_{n \to \infty} \left| e^{\sum_{k=1}^n \varphi_{n,k}(\lambda)} \right| = e^{-\lambda^2/2}$$
from which we may conclude
$$\lim_{n \to \infty} \sum_{k=1}^n \operatorname{Re} \varphi_{n,k}(\lambda) = -\lambda^2/2.$$
We may write this last limit as
$$\lim_{n \to \infty} \sum_{k=1}^n \mathbb{E}\left[ \cos(\lambda X_{n,k}) - 1 + \frac{\lambda^2}{2} X_{n,k}^2 \right] = 0,$$
which by Lemma 23.12 implies
$$\lim_{n \to \infty} \sum_{k=1}^n \mathbb{E}\left[ X_{n,k}^2 : |X_{n,k}| > c \right] = 0$$
for all $c > 0$, which is (LC).
As an application of Theorem 23.8 let us see what it has to say about Brownian motion. In what follows we say that $\{B_t\}_{t \geq 0}$ is a Gaussian process if for all finite subsets, $\Lambda \subset [0, \infty)$, the random variables $\{B_t\}_{t \in \Lambda}$ are jointly Gaussian. We will discuss Gaussian processes in more generality in Chapter 24.

Proposition 23.14. Suppose that $\{B_t\}_{t \geq 0}$ is a stochastic process on some probability space, $(\Omega, \mathcal{B}, P)$ such that;
1. $B_0 = 0$ a.s., $\mathbb{E}B_t = 0$ for all $t \geq 0$,
2. $\mathbb{E}(B_t - B_s)^2 = t - s$ for all $0 \leq s \leq t < \infty$,
3. $B$ has independent increments, i.e. if $0 = t_0 < t_1 < \cdots < t_n < \infty$, then $\left\{ B_{t_j} - B_{t_{j-1}} \right\}_{j=1}^n$ are independent random variables, and
4. (Moment Condition) there exist $p > 2$, $q > 1$ and $c < \infty$ such that $\mathbb{E}|B_t - B_s|^p \leq c|t - s|^q$ for all $s, t \in \mathbb{R}_+$.

Then $B_t - B_s \overset{d}{=} N(0, t - s)$ for all $0 \leq s < t < \infty$. We call a process satisfying these conditions a pre-Brownian motion.

Proof. Let $0 \leq s < t$ and for each $n \in \mathbb{N}$ and $1 \leq k \leq n$ let $X_{n,k} := B_{t_k} - B_{t_{k-1}}$ where $s = t_0 < t_1 < \cdots < t_n = t$ is the uniform partition of $[s, t]$. Under the moment condition hypothesis we find,
$$\sum_{k=1}^n \mathbb{E}\left[ |X_{n,k}|^p \right] \leq c \sum_{k=1}^n \left( \frac{t - s}{n} \right)^q = c(t - s)^q n^{1 - q} \to 0 \text{ as } n \to \infty.$$
Thus we have shown that $\{X_{n,k}\}$ satisfies a Liapunov condition which by Lemma 23.5 implies that $\{X_{n,k}\}$ satisfies (LC). Therefore, $B_t - B_s = \sum_{k=1}^n X_{n,k} \implies N(0, t - s)$ as $n \to \infty$ by the Lindeberg-Feller central limit Theorem 23.8.
Remark 23.15 (Poisson Process). There certainly are other processes satisfying items 1.-3. other than a pre-Brownian motion. Indeed, if $\{N_t\}_{t \geq 0}$ is a Poisson process with intensity $\lambda$ (see Example 25.8), then $B_t := \lambda^{-1/2}(N_t - \lambda t)$ satisfies items 1.-3. above. Recall that $\operatorname{Var}(N_t - N_s) = \lambda(t - s)$ and $\mathbb{E}(N_t - N_s) = \lambda(t - s)$, so $\mathbb{E}B_t = 0$ and
$$\mathbb{E}(B_t - B_s)^2 = \operatorname{Var}(B_t - B_s) = \operatorname{Var}\left( \lambda^{-1/2}(N_t - N_s) \right) = \lambda^{-1}\operatorname{Var}(N_t - N_s) = \lambda^{-1} \cdot \lambda(t - s) = t - s.$$
In this case one can show that $\mathbb{E}[|B_t - B_s|^p] \asymp |t - s|$ for all $1 \leq p < \infty$, so that the moment condition in item 4. fails.
This last remark leads us to our next topic.
23.2 Infinitely Divisible Distributions

In this section we are going to investigate the possible limiting distributions of the $\{S_n\}_{n=1}^\infty$ when we relax the Lindeberg condition. Let us begin with a simple example of the Poisson limit theorem.

Theorem 23.16 (A Poisson Limit Theorem). For each $n \in \mathbb{N}$, let $\{Y_{n,k}\}_{k=1}^n$ be independent Bernoulli random variables with $P(Y_{n,k} = 1) = p_{n,k}$ and $P(Y_{n,k} = 0) = q_{n,k} := 1 - p_{n,k}$. Suppose;
1. $\lim_{n \to \infty} \sum_{k=1}^n p_{n,k} = a \in (0, \infty)$ and
2. $\lim_{n \to \infty} \max_{1 \leq k \leq n} p_{n,k} = 0$. (So no one term is dominating the sums in item 1.)

Then $S_n = \sum_{k=1}^n Y_{n,k} \implies Z$ where $Z$ is a Poisson random variable with mean $a$. (See [15, Section 2.6] for more on this theorem.)
Proof. We will give two proofs of this theorem. The first proof relies on the law of rare events in Theorem 21.10 while the second uses Fourier transform methods.

First proof. Let $Z_n \overset{d}{=} \operatorname{Poi}\left( \sum_{k=1}^n p_{n,k} \right)$, then by Theorem 21.10, we know that
$$d_{TV}(Z_n, S_n) \leq \sum_{k=1}^n p_{n,k}^2 \leq \max_{1 \leq k \leq n} p_{n,k} \cdot \sum_{k=1}^n p_{n,k}.$$
From the assumptions it follows that $\lim_{n \to \infty} d_{TV}(Z_n, S_n) = 0$ and from part 3. of Exercise 21.6 we know that $\lim_{n \to \infty} d_{TV}(Z_n, Z) = 0$. Therefore, $\lim_{n \to \infty} d_{TV}(Z, S_n) = 0$.
Second proof. Recall from Example 22.11 that for any $a > 0$,
$$\mathbb{E}\left[ e^{i\lambda Z} \right] = \exp\left( a\left( e^{i\lambda} - 1 \right) \right).$$
Since
$$\mathbb{E}\left[ e^{i\lambda Y_{n,k}} \right] = e^{i\lambda} p_{n,k} + (1 - p_{n,k}) = 1 + p_{n,k}\left( e^{i\lambda} - 1 \right),$$
it follows that
$$\mathbb{E}\left[ e^{i\lambda S_n} \right] = \prod_{k=1}^n \left[ 1 + p_{n,k}\left( e^{i\lambda} - 1 \right) \right].$$
Since $1 + p_{n,k}\left( e^{i\lambda} - 1 \right)$ lies on the line segment joining $1$ to $e^{i\lambda}$, it follows (see Figure 23.1) that
$$\left| 1 + p_{n,k}\left( e^{i\lambda} - 1 \right) \right| \leq 1.$$

Fig. 23.1. Simple circle geometry reflecting the convexity of the disk.

Hence we may apply Lemma 23.7 to find
$$\left| \prod_{k=1}^n \exp\left( p_{n,k}\left( e^{i\lambda} - 1 \right) \right) - \prod_{k=1}^n \left[ 1 + p_{n,k}\left( e^{i\lambda} - 1 \right) \right] \right| \leq \sum_{k=1}^n \left| \exp(z_{n,k}) - [1 + z_{n,k}] \right|$$
where
$$z_{n,k} = p_{n,k}\left( e^{i\lambda} - 1 \right).$$
Since $\operatorname{Re} z_{n,k} = p_{n,k}(\cos \lambda - 1) \leq 0$, we may use the calculus estimate in Eq. (22.52) to conclude,
$$\left| \prod_{k=1}^n \exp\left( p_{n,k}\left( e^{i\lambda} - 1 \right) \right) - \prod_{k=1}^n \left[ 1 + p_{n,k}\left( e^{i\lambda} - 1 \right) \right] \right| \leq \frac{1}{2} \sum_{k=1}^n |z_{n,k}|^2 \leq \frac{1}{2} \max_{1 \leq k \leq n} |z_{n,k}| \sum_{k=1}^n |z_{n,k}| \leq 2 \max_{1 \leq k \leq n} p_{n,k} \sum_{k=1}^n p_{n,k}.$$
Using the assumptions, we may conclude
$$\left| \prod_{k=1}^n \exp\left( p_{n,k}\left( e^{i\lambda} - 1 \right) \right) - \prod_{k=1}^n \left[ 1 + p_{n,k}\left( e^{i\lambda} - 1 \right) \right] \right| \to 0 \text{ as } n \to \infty.$$
Since
$$\prod_{k=1}^n \exp\left( p_{n,k}\left( e^{i\lambda} - 1 \right) \right) = \exp\left( \sum_{k=1}^n p_{n,k}\left( e^{i\lambda} - 1 \right) \right) \to \exp\left( a\left( e^{i\lambda} - 1 \right) \right),$$
we have shown
$$\lim_{n \to \infty} \mathbb{E}\left[ e^{i\lambda S_n} \right] = \lim_{n \to \infty} \prod_{k=1}^n \left[ 1 + p_{n,k}\left( e^{i\lambda} - 1 \right) \right] = \lim_{n \to \infty} \prod_{k=1}^n \exp\left( p_{n,k}\left( e^{i\lambda} - 1 \right) \right) = \exp\left( a\left( e^{i\lambda} - 1 \right) \right).$$
The result now follows by an application of the continuity Theorem 22.18.
Remark 23.17. Keeping the notation in Theorem 23.16, we have
\[
E[Y_{n,k}] = p_{n,k} \quad\text{and}\quad \operatorname{Var}(Y_{n,k}) = p_{n,k}\left(1 - p_{n,k}\right)
\]
and
\[
s_n^2 := \sum_{k=1}^n \operatorname{Var}(Y_{n,k}) = \sum_{k=1}^n p_{n,k}\left(1 - p_{n,k}\right).
\]
Under the assumptions of Theorem 23.16, we see that \(s_n^2 \to a\) as \(n \to \infty\). Let us now center and normalize the \(Y_{n,k}\) by setting
\[
X_{n,k} := \frac{Y_{n,k} - p_{n,k}}{s_n},
\]
so that
\[
\sigma_{n,k}^2 := \operatorname{Var}(X_{n,k}) = \frac{1}{s_n^2}\operatorname{Var}(Y_{n,k}) = \frac{1}{s_n^2}\, p_{n,k}\left(1 - p_{n,k}\right),
\]
\(E[X_{n,k}] = 0\), \(\operatorname{Var}\left(\sum_{k=1}^n X_{n,k}\right) = 1\), and the \(X_{n,k}\) satisfy condition (M). On the other hand, for small \(t\) and large \(n\) we have
the other hand for small t and large n we have
E
_
X
2
n,k
: [X
n,k
[ > t

= E
_
X
2
n,k
:

Y
n,k
p
n,k
s
n

> t
_
= E
_
X
2
n,k
: [Y
n,k
p
n,k
[ > s
n
t

E
_
X
2
n,k
: [Y
n,k
p
n,k
[ > 2at

= E
_
X
2
n,k
: Y
n,k
= 1

= p
n,k
_
1 p
n,k
s
n
_
2
Page: 379 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
380 23 Weak Convergence of Random Sums
from which it follows that
lim
n
n

k=1
E
_
X
2
n,k
: [X
n,k
[ > t

= lim
n
n

k=1
p
n,k
_
1 p
n,k
s
n
_
2
= a.
Therefore the \(X_{n,k}\) do not satisfy (LC). Nevertheless, we have by Theorem 23.16 along with Slutzky's Theorem 21.39 that
\[
\sum_{k=1}^n X_{n,k} = \frac{\sum_{k=1}^n Y_{n,k} - \sum_{k=1}^n p_{n,k}}{s_n} \Longrightarrow \frac{Z - a}{\sqrt a}
\]
where \(Z\) is a Poisson random variable with mean \(a\). Notice that the limit is not a normal random variable, in agreement with Theorem 23.13.
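The failure of the Lindeberg condition can be made concrete by closed-form evaluation. In the sketch below (my own illustration; the triangular array \(p_{n,k} = a/n\) is an arbitrary choice satisfying the hypotheses), \(X_{n,k}\) takes the value \((1-p)/s_n\) with probability \(p\) and \(-p/s_n\) with probability \(1-p\):

```python
import math

def lindeberg_sum(n, a, t):
    # sum_k E[X_{n,k}^2 : |X_{n,k}| > t] for X_{n,k} = (Y_{n,k} - p)/s_n,
    # with Y_{n,k} ~ Bernoulli(p), p = a/n, and s_n^2 = n p (1 - p)
    p = a / n
    s = math.sqrt(n * p * (1 - p))
    total = 0.0
    if (1 - p) / s > t:          # contribution of the event Y_{n,k} = 1
        total += n * p * ((1 - p) / s) ** 2
    if p / s > t:                # contribution of the event Y_{n,k} = 0
        total += n * (1 - p) * (p / s) ** 2
    return total

a, t = 2.0, 0.25
print([lindeberg_sum(n, a, t) for n in (100, 10_000, 1_000_000)])
# the sums approach 1 rather than 0, so condition (LC) indeed fails
```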
Given this example it is natural to ask what are the possible limiting distributions of row sums of random arrays \(\{X_{n,k}\}\). As it turns out, the answer is often contained in the following definition.

Definition 23.18. A probability distribution, \(\mu\), on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) is infinitely divisible iff for all \(n \in \mathbb{N}\) there exist i.i.d. non-degenerate random variables, \(\{X_{n,k}\}_{k=1}^n\), such that \(X_{n,1} + \dots + X_{n,n} \overset{d}{=} \mu\). This can be formulated in the following two equivalent ways: for all \(n \in \mathbb{N}\) there should exist a non-degenerate probability measure, \(\mu_n\), on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) such that \(\mu_n^{*n} = \mu\); for all \(n \in \mathbb{N}\), \(\hat\mu(\lambda) = \left[g(\lambda)\right]^n\) for some non-constant characteristic function, \(g\).
Theorem 23.19. Suppose that \(\mu\) is a probability measure on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) and \(\operatorname{Law}(X) = \mu\). Then \(\mu\) is infinitely divisible iff there exists an array, \(\{X_{n,k} : 1 \le k \le m_n\}\) with \(\{X_{n,k}\}_{k=1}^{m_n}\) being i.i.d., such that \(\sum_{k=1}^{m_n} X_{n,k} \Longrightarrow X\) and \(m_n \to \infty\) as \(n \to \infty\).
Proof. The only non-trivial direction is \((\Longleftarrow)\). Let me do this in the case that \(m_n = n\) for all \(n\). See Kallenberg [28, Lemma 15.13, p. 294] for the tail bounds needed to cover the full case. In this case, given any \(k\), we have \(S_{nk} = \sum_{i=1}^k S_n^i\) where
\[
S_n^i = \sum_{j=k(i-1)+1}^{ki} X_{n,j}.
\]
Notice that \(\left\{S_n^i\right\}_{i=1}^k\) are i.i.d. for each \(n \in \mathbb{N}\), and since \(S_{nk} \Longrightarrow X\) as \(n \to \infty\) we know that \(\{S_{nk}\}_{n=1}^\infty\) is tight, so there exists \(\varepsilon(r) \downarrow 0\) as \(r \to \infty\) such that
\[
P\left(|S_{nk}| > r\right) \le \varepsilon(r).
\]
Since
\[
P\left(S_n^1 > r\right)^k = P\left(S_n^i > r \text{ for } 1 \le i \le k\right) \le P\left(S_{nk} > kr\right) \le P\left(|S_{nk}| > kr\right) \le \varepsilon(kr)
\]
and similarly,
\[
P\left(-S_n^1 > r\right)^k = P\left(-S_n^i > r \text{ for } 1 \le i \le k\right) \le P\left(-S_{nk} > kr\right) \le P\left(|S_{nk}| > kr\right) \le \varepsilon(kr),
\]
we see that \(P\left(\left|S_n^1\right| > r\right) \le 2\,\varepsilon(kr)^{1/k} \to 0\) as \(r \to \infty\), which shows that \(\left\{S_n^1\right\}_{n=1}^\infty\) has tight distributions as well. Thus there exists a subsequence \(\{n_l\}\) such that \(S_{n_l}^1 \Longrightarrow Y\) as \(l \to \infty\). Let \(\{Y_i\}_{i=1}^k\) be i.i.d. random variables with \(Y_i \overset{d}{=} Y\). Then by Exercise 21.11 it follows that
\[
S_{k n_l} = \sum_{i=1}^k S_{n_l}^i \Longrightarrow Y_1 + \dots + Y_k,
\]
from which we conclude that \(X \overset{d}{=} Y_1 + \dots + Y_k\).
It turns out that the characteristic function of an infinitely divisible distribution has to have a very special form. This will be the subject of the Lévy–Khintchine formula in Theorem 23.21 below. First, though, let us give a couple of examples.
Example 23.20 (Following Theorem 17.28). Suppose that \(\{Z_n\}_{n=1}^\infty\) are i.i.d. random variables with \(\operatorname{Law}(Z_1) = \rho\), and \(N_\alpha \overset{d}{=} \operatorname{Poi}(\alpha)\) and \(Y \overset{d}{=} \sigma N(0,1) + \beta\) are chosen so that \(\{Z_n\}_{n=1}^\infty\), \(N_\alpha\), \(Y\) are all independent. Then \(S := Y + \sum_{n \le N_\alpha} Z_n\) is infinitely divisible. Indeed we have
\[
f_S(\lambda) = E\left[e^{i\lambda S}\right] = E\left[e^{i\lambda Y}\right] E\left[e^{i\lambda \sum_{k\le N_\alpha} Z_k}\right]
= E\left[e^{i\lambda Y}\right] \sum_{n=0}^\infty E\left[e^{i\lambda \sum_{k\le N_\alpha} Z_k}\,\Big|\, N_\alpha = n\right] P\left(N_\alpha = n\right)
\]
\[
= E\left[e^{i\lambda Y}\right] \sum_{n=0}^\infty E\left[e^{i\lambda (Z_1+\dots+Z_n)}\right] P\left(N_\alpha = n\right)
= \exp\left(-\frac12 \sigma^2\lambda^2 + i\beta\lambda\right) \sum_{n=0}^\infty e^{-\alpha}\frac{\alpha^n}{n!}\,\hat\rho(\lambda)^n
\]
\[
= \exp\left(-\frac12 \sigma^2\lambda^2 + i\beta\lambda + \alpha\left(\hat\rho(\lambda) - 1\right)\right) = e^{\psi(\lambda)}
\]
where
\[
\psi(\lambda) = -\frac12 \sigma^2\lambda^2 + i\beta\lambda + \int_\mathbb{R}\left(e^{i\lambda x} - 1\right) d\nu(x)
\]
and \(d\nu(x) := \alpha\, d\rho(x)\) is an arbitrary finite measure on \(\mathbb{R}\). It is interesting to note that if \(X_\alpha := \sum_{n \le N_\alpha} Z_n\) and \(m \in \mathbb{N}\), then the law of the sum of \(m\) independent copies of \(X_{\alpha/m}\) is the law of \(X_\alpha\). This explicitly shows that \(X_\alpha\) is infinitely divisible.
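The divisibility claim for \(X_\alpha\) is an identity of characteristic functions, \(\exp\left(\alpha(\hat\rho(\lambda)-1)\right) = \left[\exp\left(\frac{\alpha}{m}(\hat\rho(\lambda)-1)\right)\right]^m\), which a few lines of arithmetic confirm. In this sketch (my own illustration, not from the notes) \(\rho\) is taken to be the uniform distribution on \(\{1, 2\}\):

```python
import cmath

def rho_hat(lam):
    # characteristic function of rho = uniform distribution on {1, 2}
    return 0.5 * (cmath.exp(1j * lam) + cmath.exp(2j * lam))

def phi(alpha, lam):
    # characteristic function of the compound Poisson variable X_alpha
    return cmath.exp(alpha * (rho_hat(lam) - 1))

alpha, m = 1.7, 5
for lam in (0.3, 1.0, 2.5):
    # m independent copies of X_{alpha/m} sum to a copy of X_alpha
    assert abs(phi(alpha, lam) - phi(alpha / m, lam) ** m) < 1e-12
```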
In a related result, recall from Exercise 22.4 that for any finite measure \(\nu\) on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\), there exists a (necessarily unique) probability measure \(\mu\) on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) such that
\[
\hat\mu(\lambda) = \exp\left(\int_\mathbb{R} \frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu(x)\right).
\]
Notice that any such probability measure is infinitely divisible, since for each \(n \in \mathbb{N}\) there exists a unique probability measure, \(\mu_n\), such that
\[
\hat\mu_n(\lambda) = \exp\left(\frac1n \int_\mathbb{R} \frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu(x)\right).
\]
Keeping these two examples in mind should make the following important theorem plausible.
Theorem 23.21 (Lévy–Khintchine formula). A probability measure \(\mu\) on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) is infinitely divisible iff \(\hat\mu(\lambda) = e^{\psi(\lambda)}\) where
\[
\psi(\lambda) = i\lambda b - \frac12 a\lambda^2 + \int_{\mathbb{R}\setminus\{0\}} \left(e^{i\lambda x} - 1 - i\lambda x\, 1_{|x|\le 1}\right) d\nu(x) \tag{23.17}
\]
for some \(b \in \mathbb{R}\), \(a \ge 0\), and some measure \(\nu\) on \(\mathbb{R}\setminus\{0\}\) such that
\[
\int_{\mathbb{R}\setminus\{0\}} \left(x^2 \wedge 1\right) d\nu(x) < \infty. \tag{23.18}
\]
Proof. We will give the easy direction of this proof, namely the implication \((\Longleftarrow)\). Notice that if the measure \(\nu\) appearing in Eq. (23.17) is a finite measure, then
\[
\psi(\lambda) = i\lambda b' - \frac12 a\lambda^2 + \int_\mathbb{R}\left(e^{i\lambda x} - 1\right) d\nu(x)
\]
where
\[
b' = b - \int_{|x|\le 1} x\, d\nu(x).
\]
Thus we may use Example 23.20 in order to construct a random variable with distribution given by \(\hat\mu\).

For general \(\nu\) satisfying Eq. (23.18), let, for \(\varepsilon > 0\), \(d\nu_\varepsilon(x) := 1_{|x|\ge\varepsilon}\, d\nu(x)\) — a finite measure on \(\mathbb{R}\). Thus by Example 23.20 there exists a probability measure, \(\mu_\varepsilon\), on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) such that \(\hat\mu_\varepsilon = e^{\psi_\varepsilon}\) where
\[
\psi_\varepsilon(\lambda) := i\lambda b - \frac12 a\lambda^2 + \int_{\mathbb{R}\setminus\{0\}}\left(e^{i\lambda x} - 1 - i\lambda x\, 1_{|x|\le 1}\right) 1_{|x|\ge\varepsilon}\, d\nu(x).
\]
As \(e^{i\lambda x} - 1 - i\lambda x\, 1_{|x|\le 1}\) is bounded and less than \(C(\lambda)\, x^2\) for \(|x| \le 1\), we may use the DCT to show \(\psi_\varepsilon(\lambda) \to \psi(\lambda)\). Furthermore, the DCT also shows \(\psi(\lambda)\) is continuous, and therefore \(e^{\psi(\lambda)}\) is continuous. Thus we have shown \(\hat\mu_\varepsilon \to e^\psi\), where the limit is continuous, and therefore by the continuity Theorem 22.18 there exists a probability measure \(\mu\) on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) such that \(\hat\mu = e^\psi\).
The proof of the other implication \((\Longrightarrow)\) will be discussed in Appendix 23.4 below. For more information about Poisson processes and Lévy processes, see Protter [49, Chapter I], [9, Chapter 9.5], and [19, Chapter XVII.2, p. 558-] for analytic proofs. Also see http://www.math.uconn.edu/bass/scdp.pdf, Kallenberg [28, Theorem 15.13, p. 294], and [1].
We are now going to see that we may often drop the identically distributed assumption on \(\{X_{n,k}\}_{k=1}^n\) and yet still have that weak limits of sums of the form \(\sum_{k=1}^n X_{n,k}\) are infinitely divisible distributions. In the next theorem we are going to see this is the case for weak limits under condition (M).
Theorem 23.22 (Limits under (M)). Suppose \(\{X_{n,k}\}_{k=1}^n\) satisfy property (M) and the normalizations in Assumption 2. If \(S_n := \sum_{k=1}^n X_{n,k} \Longrightarrow L\) for some random variable \(L\), then
\[
f_L(\lambda) := E\left[e^{i\lambda L}\right] = \exp\left(\int_\mathbb{R} \frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu(x)\right)
\]
for some finite positive measure, \(\nu\), on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) with \(\nu(\mathbb{R}) \le 1\).
Proof. As before, let \(f_{n,k}(\lambda) = E\left[e^{i\lambda X_{n,k}}\right]\) and \(\varepsilon_{n,k}(\lambda) := f_{n,k}(\lambda) - 1\). By the continuity theorem we are assuming
\[
\lim_{n\to\infty} f_{S_n}(\lambda) = \lim_{n\to\infty}\prod_{k=1}^n f_{n,k}(\lambda) = f(\lambda)
\]
where \(f(\lambda)\) is continuous at \(\lambda = 0\). We are also assuming property (M), i.e.
\[
\lim_{n\to\infty}\max_{k\le n}\sigma_{n,k}^2 = 0.
\]
Under condition (M), we expect \(f_{n,k}(\lambda) \cong 1\) for \(n\) large. Therefore we expect
\[
f_{n,k}(\lambda) = e^{\ln f_{n,k}(\lambda)} = e^{\ln\left[1 + \left(f_{n,k}(\lambda) - 1\right)\right]} \cong e^{\left(f_{n,k}(\lambda) - 1\right)}
\]
and hence that
\[
E\left[e^{i\lambda S_n}\right] = \prod_{k=1}^n f_{n,k}(\lambda) \cong \prod_{k=1}^n e^{\left(f_{n,k}(\lambda)-1\right)} = \exp\left(\sum_{k=1}^n\left(f_{n,k}(\lambda) - 1\right)\right). \tag{23.19}
\]
This is in fact correct, since Lemma 23.11 indeed implies
\[
\lim_{n\to\infty}\left[E\left[e^{i\lambda S_n}\right] - \exp\left(\sum_{k=1}^n\left(f_{n,k}(\lambda) - 1\right)\right)\right] = 0. \tag{23.20}
\]
Since \(E[X_{n,k}] = 0\),
\[
f_{n,k}(\lambda) - 1 = E\left[e^{i\lambda X_{n,k}} - 1\right] = E\left[e^{i\lambda X_{n,k}} - 1 - i\lambda X_{n,k}\right] = \int_\mathbb{R}\left(e^{i\lambda x} - 1 - i\lambda x\right) d\mu_{n,k}(x)
\]
where \(\mu_{n,k} := P \circ X_{n,k}^{-1}\) is the law of \(X_{n,k}\).
Therefore we have
\[
\exp\left(\sum_{k=1}^n\left(f_{n,k}(\lambda) - 1\right)\right)
= \exp\left(\sum_{k=1}^n\int_\mathbb{R}\left(e^{i\lambda x} - 1 - i\lambda x\right)d\mu_{n,k}(x)\right)
= \exp\left(\int_\mathbb{R}\left(e^{i\lambda x} - 1 - i\lambda x\right)\sum_{k=1}^n d\mu_{n,k}(x)\right)
= \exp\left(\int_\mathbb{R}\left(e^{i\lambda x} - 1 - i\lambda x\right)d\mu_n^*(x)\right) \tag{23.21}
\]
where \(\mu_n^* := \sum_{k=1}^n \mu_{n,k}\). Let us further observe that
\[
\int_\mathbb{R} x^2\, d\mu_n^*(x) = \sum_{k=1}^n\int_\mathbb{R} x^2\, d\mu_{n,k}(x) = \sum_{k=1}^n \sigma_{n,k}^2 = 1.
\]
Hence if we define \(d\nu_n(x) := x^2\, d\mu_n^*(x)\), then \(\nu_n\) is a probability measure, and we have from Eqs. (23.20) and (23.21) that
\[
\left|f_{S_n}(\lambda) - \exp\left(\int_\mathbb{R}\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu_n(x)\right)\right| \to 0. \tag{23.22}
\]
Let
\[
\varphi(\lambda, x) := \frac{e^{i\lambda x} - 1 - i\lambda x}{x^2} = -\frac{\lambda^2}{2}\int_0^1 e^{it\lambda x}\, 2(1-t)\, dt \tag{23.23}
\]
(the second equality is from Taylor's theorem) and extend \(\varphi(\lambda, \cdot)\) to \(\bar{\mathbb{R}}\) by setting \(\varphi(\lambda, \pm\infty) = 0\). Then \(\varphi(\lambda, \cdot) \in C\left(\bar{\mathbb{R}}\right)\), and therefore by Helly's selection Theorem 21.53 there is a probability measure \(\bar\nu\) on \(\left(\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}}\right)\) and a subsequence \(\{n_l\}\) of \(\{n\}\) such that \(\nu_{n_l}(\varphi(\lambda, \cdot)) \to \bar\nu(\varphi(\lambda, \cdot))\) for all \(\lambda \in \mathbb{R}\) (in fact \(\nu_{n_l}(h) \to \bar\nu(h)\) for all \(h \in C\left(\bar{\mathbb{R}}\right)\)). Combining this with Eq. (23.22) allows us to conclude,
\[
f_L(\lambda) = \lim_{l\to\infty} E\left[e^{i\lambda S_{n_l}}\right]
= \lim_{l\to\infty}\exp\left(\int_\mathbb{R}\left(e^{i\lambda x} - 1 - i\lambda x\right)d\mu_{n_l}^*(x)\right)
= \lim_{l\to\infty}\exp\left(\int_\mathbb{R}\varphi(\lambda, x)\, d\nu_{n_l}(x)\right)
= \exp\left(\int_{\bar{\mathbb{R}}}\varphi(\lambda, x)\, d\bar\nu(x)\right)
= \exp\left(\int_\mathbb{R}\varphi(\lambda, x)\, d\nu(x)\right)
\]
where \(\nu := \bar\nu|_{\mathcal{B}_\mathbb{R}}\). The last equality follows from the fact that \(\varphi(\lambda, \pm\infty) = 0\). The measure \(\nu\) now satisfies \(\nu(\mathbb{R}) = \bar\nu(\mathbb{R}) \le \bar\nu\left(\bar{\mathbb{R}}\right) = 1\).
We are now going to drop the assumption that \(\operatorname{Var}(S_n) = 1\) for all \(n\) and replace it with the following property.

Definition 23.23. We say that \(\{X_{n,k}\}_{k=1}^n\) has bounded variation (BV) iff
\[
\sup_n \operatorname{Var}(S_n) = \sup_n \sum_{k=1}^n \sigma_{n,k}^2 < \infty. \tag{23.24}
\]
Corollary 23.24 (Limits under (BV)). Suppose \(\{X_{n,k}\}_{k=1}^n\) are independent mean zero random variables for each \(n\) which satisfy properties (M) and (BV). If \(S_n := \sum_{k=1}^n X_{n,k} \Longrightarrow L\) for some random variable \(L\), then
\[
f_L(\lambda) = \exp\left(\int_\mathbb{R}\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu(x)\right) \tag{23.25}
\]
where \(\nu\) is a finite positive measure on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\).
Proof. Let \(s_n^2 := \operatorname{Var}(S_n)\). If \(\lim_{n\to\infty} s_n = 0\), then \(S_n \to 0\) in \(L^2\) and hence weakly, so Eq. (23.25) holds with \(\nu \equiv 0\). So let us now suppose \(\lim_{n\to\infty} s_n \ne 0\). Since \(\{s_n\}_{n=1}^\infty\) is bounded, we may, by passing to a subsequence if necessary, assume \(\lim_{n\to\infty} s_n = s > 0\). By replacing \(X_{n,k}\) by \(X_{n,k}/s_n\), and hence \(S_n\) by \(S_n/s_n\), we then know by Slutzky's Theorem 21.39 that \(S_n/s_n \Longrightarrow L/s\). Hence by an application of Theorem 23.22, we may conclude
\[
f_L(\lambda/s) = f_{L/s}(\lambda) = \exp\left(\int_\mathbb{R}\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu(x)\right)
\]
where \(\nu\) is a finite positive measure on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) such that \(\nu(\mathbb{R}) \le 1\). Letting \(\lambda \to \lambda s\) in this expression then implies
\[
f_L(\lambda) = \exp\left(\int_\mathbb{R}\frac{e^{is\lambda x} - 1 - is\lambda x}{x^2}\, d\nu(x)\right)
= \exp\left(\int_\mathbb{R}\frac{e^{is\lambda x} - 1 - is\lambda x}{(sx)^2}\, s^2\, d\nu(x)\right)
= \exp\left(\int_\mathbb{R}\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu_s(x)\right)
\]
where \(\nu_s\) is the finite measure on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) defined by
\[
\nu_s(A) := s^2\, \nu\left(s^{-1}A\right) \quad\text{for all } A \in \mathcal{B}_\mathbb{R}.
\]
From Eq. (23.23) we see that \(\varphi(\lambda, x) := \left(e^{i\lambda x} - 1 - i\lambda x\right)/x^2\) is a smooth function of \((\lambda, x)\). Moreover,
\[
\frac{d}{d\lambda}\varphi(\lambda, x) = \frac{ixe^{i\lambda x} - ix}{x^2} = i\,\frac{e^{i\lambda x} - 1}{x}
\]
and
\[
\frac{d^2}{d\lambda^2}\varphi(\lambda, x) = i\,\frac{ixe^{i\lambda x}}{x} = -e^{i\lambda x}.
\]
Using these remarks and the fact that \(\nu_s(\mathbb{R}) < \infty\), it is easy to see that
\[
f_L'(\lambda) = \left[\int_\mathbb{R} i\,\frac{e^{i\lambda x} - 1}{x}\, d\nu_s(x)\right] f_L(\lambda)
\]
and
\[
f_L''(\lambda) = \left[-\int_\mathbb{R} e^{i\lambda x}\, d\nu_s(x) + \left(\int_\mathbb{R} i\,\frac{e^{i\lambda x} - 1}{x}\, d\nu_s(x)\right)^2\right] f_L(\lambda),
\]
and in particular, \(f_L'(0) = 0\) and \(f_L''(0) = -\nu_s(\mathbb{R})\). Therefore by Theorem 22.8 the probability measure \(\mu\) on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) such that \(\hat\mu(\lambda) = f_L(\lambda)\) has mean zero and variance \(\nu_s(\mathbb{R}) < \infty\). This latter condition reflects the (BV) assumption that we made.
Theorem 23.25. The following classes of symmetric distributions on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) are equal:
1. \(\mathcal{C}_1\) — all possible limiting distributions under properties (M) and (BV).
2. \(\mathcal{C}_2\) — all distributions with characteristic functions of the form given in Corollary 23.24.
3. \(\mathcal{C}_3\) — all infinitely divisible distributions with mean zero and finite variance.

Proof. The inclusion \(\mathcal{C}_1 \subset \mathcal{C}_2\) is the content of Corollary 23.24. For \(\mathcal{C}_2 \subset \mathcal{C}_3\), observe that if
\[
\hat\mu(\lambda) = \exp\left(\int_\mathbb{R}\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu(x)\right)
\]
then \(\hat\mu(\lambda) = \left[\hat\mu_n(\lambda)\right]^n\) where \(\mu_n\) is the unique probability measure on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\) such that
\[
\hat\mu_n(\lambda) = \exp\left(\frac1n\int_\mathbb{R}\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu(x)\right).
\]
For \(\mathcal{C}_3 \subset \mathcal{C}_1\), simply define \(\{X_{n,k}\}_{k=1}^n\) to be i.i.d. with \(E\left[e^{i\lambda X_{n,k}}\right] = \hat\mu_n(\lambda)\). In this case \(S_n = \sum_{k=1}^n X_{n,k} \overset{d}{=} \mu\).
23.3 Stable Distributions

Definition 23.26. A non-degenerate distribution \(\mu = \operatorname{Law}(X)\) on \(\mathbb{R}\) is stable if, whenever \(X_1\) and \(X_2\) are independent copies of \(X\), for all \(a, b \in \mathbb{R}\) there exist constants \(c, d \in \mathbb{R}\) such that \(aX_1 + bX_2 \overset{d}{=} cX + d\).
Example 23.27. Any Gaussian random variable is stable. Indeed, if \(X \overset{d}{=} \sigma N + \mu\) where \(\sigma > 0\), \(\mu \in \mathbb{R}\), and \(N \overset{d}{=} N(0,1)\), and \(X_i = \sigma N_i + \mu\) where \(N_1\) and \(N_2\) are independent with \(N_i \overset{d}{=} N\), then \(aX_1 + bX_2\) is Gaussian with mean \((a+b)\mu\) and variance \(\left(a^2 + b^2\right)\sigma^2\), so that
\[
aX_1 + bX_2 \overset{d}{=} \sqrt{a^2 + b^2}\,\sigma N + (a+b)\mu
\overset{d}{=} \sqrt{a^2 + b^2}\,(X - \mu) + (a+b)\mu
= \sqrt{a^2 + b^2}\, X + \left(a + b - \sqrt{a^2 + b^2}\right)\mu.
\]
Example 23.28. Poisson random variables are not stable. For suppose that \(Z \overset{d}{=} \operatorname{Pois}(\lambda)\) and \(Z_1 \overset{d}{=} Z_2 \overset{d}{=} Z\) are independent, so \(Z_1 + Z_2 \overset{d}{=} \operatorname{Pois}(2\lambda)\). If we could find \(a, b\) such that
\[
\operatorname{Pois}(2\lambda) \overset{d}{=} Z_1 + Z_2 \overset{d}{=} aZ + b,
\]
we would have
\[
e^{-2\lambda}\frac{(2\lambda)^n}{n!} = P(aZ + b = n) = P\left(Z = \frac{n-b}{a}\right) \quad\text{for all } n.
\]
In particular this implies that \(\frac{n-b}{a} = k_n \in \mathbb{N}_0\) for all \(n \in \mathbb{N}_0\), and the map \(n \mapsto k_n\) must be invertible so that probabilities are conserved. This can only be the case if \(a = 1\) and \(b = 0\), and we would conclude that \(Z \overset{d}{=} \operatorname{Pois}(2\lambda)\), which is absurd.
Lemma 23.29. Suppose that \(\{X_i\}_{i=1}^n\) are i.i.d. random variables such that \(X_1 + \dots + X_n = c\) a.s.; then \(X_i = c/n\) a.s.

Proof. Let \(f(\lambda) := Ee^{i\lambda X_1}\); then
\[
e^{i\lambda c} = E\left[e^{i\lambda(X_1 + \dots + X_n)}\,\big|\, X_1\right] = e^{i\lambda X_1} f(\lambda)^{n-1} \text{ a.s.,}
\]
from which it follows that \(f(\lambda)^{n-1} = e^{i\lambda(c - X_1)}\) a.s. In particular, fixing an \(\omega\) where this equality holds shows that \(e^{i\lambda(c - X_1)} = e^{i\lambda c'}\) a.s., where \(c' := c - X_1(\omega)\). By uniqueness of the Fourier transform it follows that \(X_1 = c - c'\) a.s.; thus \(X_1\) is a.s. constant, and therefore \(c = X_1 + \dots + X_n = nX_1\) a.s., i.e. \(X_1 = c/n\).
Lemma 23.30. If \(\mu\) is a stable distribution, then it is infinitely divisible.

Proof. Let \(\{X_n\}_{n=1}^N\) be i.i.d. random variables with \(\operatorname{Law}(X_n) = \mu = \operatorname{Law}(X)\). As \(\mu\) is stable we know that
\[
X_1 + \dots + X_N \overset{d}{=} aX + b. \tag{23.26}
\]
As \(\mu\) is non-degenerate, it follows from Lemma 23.29 that \(a \ne 0\). Therefore from Eq. (23.26) we find,
\[
X \overset{d}{=} \sum_{i=1}^N \frac1a\left(X_i - b/N\right),
\]
and this shows that \(X\) is infinitely divisible.
The converse of this lemma is not true, as is seen by considering Poisson random variables; see Example 23.28. The following characterization of the stable laws may be found in [9, Chapter 9.9]. For a whole book about stable laws and their properties, see Samorodnitsky and Taqqu [59].

Theorem 23.31. A probability measure \(\mu\) on \(\mathbb{R}\) is a stable distribution iff \(\mu\) is Gaussian or \(\hat\mu(\lambda) = e^{\psi(\lambda)}\) where
\[
\psi(\lambda) = i\lambda b + \int_\mathbb{R}\left(e^{i\lambda x} - 1 - \frac{i\lambda x}{1 + x^2}\right)\frac{m_1 1_{x>0} + m_2 1_{x<0}}{|x|^{1+\alpha}}\, dx
\]
for some constants \(0 < \alpha < 2\), \(m_i \ge 0\), and \(b \in \mathbb{R}\).
To get some feeling for this theorem, let us consider the case of a stable random variable \(X\) which is also assumed to be symmetric. In this case, if \(X_1, X_2\) are independent copies of \(X\), \(a, b \in \mathbb{R}\), and \(c = c(a,b)\) and \(d = d(a,b)\) are chosen so that \(aX_1 + bX_2 \overset{d}{=} cX + d\), then by the symmetry assumption we must have \(d = 0\) and may take \(c > 0\). Letting \(f(\lambda) = E\left[e^{i\lambda X}\right]\), we may now conclude that
\[
f(a\lambda)\, f(b\lambda) = E\left[e^{i\lambda(aX_1 + bX_2)}\right] = E\left[e^{i\lambda cX}\right] = f(c\lambda).
\]
It turns out the solutions to this functional equation are of the form \(f(\lambda) = e^{-k|\lambda|^\alpha}\). If \(f(\lambda)\) is of this form, then
\[
f(a\lambda)\, f(b\lambda) = \exp\left(-k\left(|a|^\alpha + |b|^\alpha\right)|\lambda|^\alpha\right) = f(c\lambda)
\]
where \(c = \left(|a|^\alpha + |b|^\alpha\right)^{1/\alpha}\)
function when 0 < 2. The case = 2 is the Gaussian case, then case = 1
is the Cauchy distribution, for example if
d(x) =
1
(1 +x
2
)
dx then () = e
]]
.
For 1 we nd that we have
f
t
() = k [[
1
f () 0 and
f
tt
() =
_
k
2
[[
22
k ( 1) [[
2
_
f () 0
so that f is a decreasing convex symmetric function for 0. Therefore by
Polyas criteria of Exercise 22.6 it follows that e
k]]

is the characteristic func-


tion of a probability measure for 0 1. The full proof is not denitely not
given here.
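The Cauchy claim above can be checked by quadrature: the assertion is \(\int_\mathbb{R}\frac{\cos(\lambda x)}{\pi(1+x^2)}\,dx = e^{-|\lambda|}\). A crude trapezoid rule on a large window (my own illustration; the window size and step are ad hoc) already matches to a few decimal places:

```python
import math

def cauchy_cf(lam, L=200.0, n=200_000):
    # trapezoid rule for int cos(lam x)/(pi (1 + x^2)) dx over [-L, L];
    # the tail beyond the window contributes O(1/L)
    h = 2 * L / n
    s = 0.0
    for k in range(n + 1):
        x = -L + k * h
        w = 0.5 if k in (0, n) else 1.0
        s += w * math.cos(lam * x) / (math.pi * (1 + x * x))
    return s * h

for lam in (0.0, 0.5, 1.0, 2.0):
    assert abs(cauchy_cf(lam) - math.exp(-abs(lam))) < 5e-3
```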
23.4 *Appendix: Lévy exponent and Lévy Process facts
Very Preliminary!!
We would like to characterize all processes with independent stationary increments with values in \(\mathbb{R}\), or more generally \(\mathbb{R}^d\). We begin with some more examples.
Proposition 23.32. For every finite measure \(\nu\), the function
\[
f(\lambda) := \exp\left(\int_\mathbb{R}\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\, d\nu(x)\right)
\]
is the characteristic function of a probability measure, \(\mu = \mu_\nu\), on \((\mathbb{R}, \mathcal{B}_\mathbb{R})\). The convention here is that
\[
\left.\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2}\right|_{x=0} := \lim_{x\to 0}\frac{e^{i\lambda x} - 1 - i\lambda x}{x^2} = -\frac{\lambda^2}{2}.
\]
Proof. This is the content of Exercise 22.4.
1. If \(\{X_t\}_{t\ge 0}\) is a right continuous process with stationary and independent increments, then let \(f_t(\lambda) := E\left[e^{i\lambda(X_{t+\tau} - X_\tau)}\right]\) for any \(\tau \ge 0\). It then follows that
\[
f_{t+s}(\lambda) = E\left[e^{i\lambda(X_{t+s} - X_0)}\right] = E\left[e^{i\lambda(X_{t+s} - X_t + X_t - X_0)}\right] = E\left[e^{i\lambda(X_{t+s} - X_t)}\right] E\left[e^{i\lambda(X_t - X_0)}\right] = f_s(\lambda)\, f_t(\lambda).
\]
The right continuity of \(X_t\) now ensures that \(t \mapsto f_t\) is also right continuous. The only solution to the above functional equation is therefore of the form \(f_t(\lambda) = e^{t\psi(\lambda)}\) for some function \(\psi(\lambda)\). Since
\[
e^{t\operatorname{Re}\psi(\lambda)} = |f_t(\lambda)| \le 1,
\]
it follows that \(\operatorname{Re}\psi(\lambda) \le 0\). Let \(\lambda \in \mathbb{R}\) be fixed and define \(h(t) := f_t(\lambda)\); then \(h\) is right continuous, \(h(0) = 1\), and \(h(t+s) = h(t)\,h(s)\). Let \(\ln\) be a branch of the logarithm defined near \(1\) such that \(\ln 1 = 0\). Then there exists \(\tau > 0\) such that for all \(t \le \tau\), \(g(t) := \ln h(t)\) is well defined and satisfies \(g(t+s) = g(t) + g(s)\) for all \(0 \le s, t\) with \(s + t \le \tau\). We now set \(g_\tau(t) := g(\tau t)\); then \(g_\tau(s+t) = g_\tau(s) + g_\tau(t)\) for all \(0 \le s, t\) with \(s + t \le 1\), and \(g_\tau\) is still right continuous. As usual it now follows that \(g_\tau(1) = g_\tau(n \cdot 1/n) = n\, g_\tau(1/n)\) for all \(n\), and therefore for all \(0 \le k \le n\) we have \(g_\tau(k/n) = \frac{k}{n}\, g_\tau(1)\). Using the right continuity of \(g_\tau\) it now follows that \(g_\tau(t) = t\, g_\tau(1)\) for all \(0 \le t \le 1\). Thus we have shown \(g(t) = \frac{t}{\tau}\, g(\tau)\) for \(0 \le t \le \tau\), and therefore if we set \(\psi := g(\tau)/\tau\) we have shown \(g(t) = t\psi\) for \(t \in [0, \tau]\), that is,
\[
h(t) = e^{t\psi} \text{ for } 0 \le t \le \tau.
\]
This formula is now seen to be correct for all \(t \ge 0\). Indeed, if \(t = k\tau/2 + \delta\) with \(0 \le \delta < \tau/2\), then
\[
h(t) = h(\tau/2)^k\, h(\delta) = \left(e^{\psi\tau/2}\right)^k e^{\psi\delta} = e^{\psi[k\tau/2 + \delta]} = e^{t\psi}.
\]
Thus we have shown that \(f_t(\lambda) = e^{t\psi(\lambda)}\) for some function \(\psi(\lambda)\). Let us further observe that
\[
\psi(\lambda) = \lim_{t\downarrow 0}\frac{f_t(\lambda) - 1}{t},
\]
from which it follows that \(\psi\) must be measurable. Furthermore,
\[
\psi(-\lambda) = \lim_{t\downarrow 0}\frac{f_t(-\lambda) - 1}{t} = \lim_{t\downarrow 0}\frac{\overline{f_t(\lambda)} - 1}{t} = \overline{\psi(\lambda)}.
\]
We are going to show more.
2. Let \(\{z_i\}_{i=1}^n \subset \mathbb{C}\) be such that \(\sum_{i=1}^n z_i = 0\) and let \(\{\lambda_i\}_{i=1}^n \subset \mathbb{R}\); then
\[
\sum_{i,j=1}^n \psi\left(\lambda_i - \lambda_j\right) z_i \bar z_j
= \lim_{t\downarrow 0}\sum_{i,j=1}^n \frac{f_t\left(\lambda_i - \lambda_j\right) - 1}{t}\, z_i \bar z_j
= \lim_{t\downarrow 0}\frac1t\sum_{i,j=1}^n f_t\left(\lambda_i - \lambda_j\right) z_i \bar z_j,
\]
while for any \(\{z_i\}_{i=1}^n \subset \mathbb{C}\) we have
\[
\sum_{i,j=1}^n f_t\left(\lambda_i - \lambda_j\right) z_i \bar z_j
= \sum_{i,j=1}^n E\left[e^{i(\lambda_i - \lambda_j)X_t}\right] z_i \bar z_j
= \sum_{i,j=1}^n E\left[e^{i\lambda_i X_t} z_i\, \overline{e^{i\lambda_j X_t} z_j}\right]
= E\left[\left|\sum_{i=1}^n e^{i\lambda_i X_t} z_i\right|^2\right] \ge 0.
\]
Therefore it follows that when \(\sum_{i=1}^n z_i = 0\), then \(\sum_{i,j=1}^n \psi\left(\lambda_i - \lambda_j\right) z_i \bar z_j \ge 0\). We say \(\psi\) is conditionally positive definite in this case.
3. The Schoenberg correspondence says (see [1, Theorem 1.1.13]) that if \(\psi\) is continuous at zero, \(\psi(-\lambda) = \overline{\psi(\lambda)}\), and \(\psi\) is conditionally positive definite, then \(e^{t\psi(\lambda)}\) is a characteristic function. We will prove this below using Bochner's Theorem 22.43.
4. But first some examples.
a) Let \(\psi(\lambda) = ia\lambda - b\lambda^2\) with \(a \in \mathbb{R}\) and \(b \ge 0\). Then \(\psi(-\lambda) = -ia\lambda - b\lambda^2 = \overline{\psi(\lambda)}\), and for \(\sum_{i=1}^n z_i = 0\) we have
\[
\sum_{i,j=1}^n \psi\left(\lambda_i - \lambda_j\right) z_i \bar z_j
= \sum_{i,j=1}^n \left[i\left(\lambda_i - \lambda_j\right)a - b\left(\lambda_i - \lambda_j\right)^2\right] z_i \bar z_j.
\]
Noting that
\[
\sum_{i,j=1}^n \lambda_i z_i \bar z_j = \sum_{i=1}^n \lambda_i z_i\, \overline{\sum_{j=1}^n z_j} = \sum_{i=1}^n \lambda_i z_i \cdot 0 = 0
\]
and similarly that \(\sum_{i,j=1}^n \lambda_i^2\, z_i \bar z_j = 0\), it follows that
\[
\sum_{i,j=1}^n \psi\left(\lambda_i - \lambda_j\right) z_i \bar z_j
= \sum_{i,j=1}^n \left[b\left(2\lambda_i\lambda_j\right)\right] z_i \bar z_j
= 2b\left|\sum_{i=1}^n \lambda_i z_i\right|^2 \ge 0.
\]
b) Suppose that \(\{Z_i\}_{i=1}^\infty\) are i.i.d. random variables and \(N\) is an independent Poisson random variable with mean \(\alpha\). Let \(X := Z_1 + \dots + Z_N\); then
\[
f_X(\lambda) = E\left[e^{i\lambda X}\right] = \sum_{n=0}^\infty E\left[e^{i\lambda X} : N = n\right]
= \sum_{n=0}^\infty E\left[e^{i\lambda[Z_1 + \dots + Z_n]} : N = n\right]
= e^{-\alpha}\sum_{n=0}^\infty \frac{\alpha^n}{n!}\left[f_{Z_1}(\lambda)\right]^n
= \exp\left(\alpha\left(f_{Z_1}(\lambda) - 1\right)\right).
\]
So in this case \(\psi(\lambda) = \alpha\left(f_{Z_1}(\lambda) - 1\right)\), and we know by the theory above that \(\psi(\lambda)\) is conditionally positive definite.
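Conditional positive definiteness of such an exponent can be probed numerically: draw random complex vectors constrained to sum to zero and check the quadratic form. A sketch of my own (with the illustrative choice \(\rho = \delta_1\), so that \(\hat\rho(\lambda) = e^{i\lambda}\)):

```python
import cmath, random

def psi(lam, alpha=1.5):
    # Levy exponent of a compound Poisson variable with jump law rho = delta_1
    return alpha * (cmath.exp(1j * lam) - 1)

random.seed(0)
lams = [random.uniform(-5, 5) for _ in range(6)]
for _ in range(100):
    z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in lams]
    z[-1] -= sum(z)   # enforce the constraint sum_i z_i = 0
    quad = sum(psi(li - lj) * zi * zj.conjugate()
               for li, zi in zip(lams, z) for lj, zj in zip(lams, z))
    # the quadratic form is real and non-negative on {sum z_i = 0}
    assert abs(quad.imag) < 1e-9 and quad.real >= -1e-9
```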
Lemma 23.33. Suppose that \(A = \{A_{ij}\}_{i,j=1}^d \subset \mathbb{C}\) is a matrix such that \(A^* = A\) and \(A \ge 0\). Then for all \(n \in \mathbb{N}_0\), the matrix with entries \(\left\{A_{ij}^n\right\}_{i,j=1}^d\) is positive semi-definite.

Proof. Since \(A_{ij} = (Ae_j, e_i)\), where \((v, w) := \sum_{j=1}^d v_j \bar w_j\) is the standard inner product on \(\mathbb{C}^d\), it follows that
\[
A_{ij}^n = \left(A^{\otimes n} e_j^{\otimes n}, e_i^{\otimes n}\right)
\]
and therefore,
\[
\sum_{i,j=1}^d A_{ij}^n\, z_i \bar z_j = \sum_{i,j=1}^d \left(A^{\otimes n} e_j^{\otimes n}, e_i^{\otimes n}\right) z_i \bar z_j = \left(A^{\otimes n}\xi, \xi\right)
\]
where \(\xi := \sum_{j=1}^d \bar z_j\, e_j^{\otimes n} \in \left(\mathbb{C}^d\right)^{\otimes n}\). So it suffices to show \(A^{\otimes n} \ge 0\). To do this, let \(\{u_i\}_{i=1}^d\) be an O.N. basis for \(\mathbb{C}^d\) such that \(Au_i = \alpha_i u_i\) for all \(i\). Since \(A \ge 0\) we know that \(\alpha_i \ge 0\), and therefore
\[
A^{\otimes n}\left(u_{i_1} \otimes \dots \otimes u_{i_n}\right) = \left(\alpha_{i_1} \cdots \alpha_{i_n}\right)\left(u_{i_1} \otimes \dots \otimes u_{i_n}\right)
\]
where \(\alpha_{i_1} \cdots \alpha_{i_n} \ge 0\). This shows that \(A^{\otimes n}\) is unitarily equivalent to a diagonal matrix with non-negative entries and hence is positive semi-definite.
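The lemma is an instance of the Schur product theorem: entrywise powers of a positive semi-definite matrix remain positive semi-definite. A random spot check (my own illustration, not from the notes):

```python
import random

random.seed(1)
d = 4
# A = G G* is Hermitian and positive semi-definite
G = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)]
     for _ in range(d)]
A = [[sum(G[i][k] * G[j][k].conjugate() for k in range(d))
      for j in range(d)] for i in range(d)]

def quad_form(M, z):
    return sum(M[i][j] * z[i] * z[j].conjugate()
               for i in range(d) for j in range(d))

for n in range(4):
    An = [[A[i][j] ** n for j in range(d)] for i in range(d)]  # entrywise power
    for _ in range(50):
        z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)]
        q = quad_form(An, z)
        assert abs(q.imag) < 1e-8 and q.real >= -1e-8
```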
Proposition 23.34. Suppose that \(A = \{A_{ij}\}_{i,j=1}^d \subset \mathbb{C}\) is a matrix such that \(A^* = A\) and \(A\) is conditionally positive definite — for example, \(A_{ij} := \psi\left(\lambda_i - \lambda_j\right)\) as above. Then the matrix with entries \(\left\{e^{A_{ij}}\right\}_{i,j=1}^d\) is positive definite.

Proof. Let \(u := (1, \dots, 1)^{tr} \in \mathbb{C}^d\). Let \(\xi \in \mathbb{C}^d\) and write \(\xi = z + \alpha u\), where \((z, u) = 0\) and \(\alpha := (\xi, u)/d\). Letting \(B := \sqrt{A}\) on \(u^\perp\) and \(0\) on \(\mathbb{C}u\), we have
\[
(A\xi, \xi) = (A(z + \alpha u), z + \alpha u)
= (Az, z) + 2\operatorname{Re}\left[\bar\alpha\,(Az, u)\right] + |\alpha|^2\,(Au, u)
\]
\[
= (Az, z) + 2\operatorname{Re}\left[\bar\alpha\left(B^2 z, u\right)\right] + |\alpha|^2\,(Au, u)
= (Az, z) + 2\operatorname{Re}\left[\bar\alpha\,(Bz, B^* u)\right] + |\alpha|^2\,(Au, u)
\]
\[
\ge (Az, z) - 2\,\|Bz\|\,|\alpha|\,\|B^* u\| + |\alpha|^2\,(Au, u)
\ge (Az, z) - \left(\|Bz\|^2 + |\alpha|^2\|B^* u\|^2\right) + |\alpha|^2\,(Au, u)
= |\alpha|^2\left((Au, u) - \|B^* u\|^2\right).
\]
Since
\[
\left(u\, u^{tr}\xi, \xi\right) = |(\xi, u)|^2 = |\alpha|^2 d^2,
\]
it follows that
\[
\left(\left(A + \varepsilon\, u\, u^{tr}\right)\xi, \xi\right) \ge |\alpha|^2\left((Au, u) - \|B^* u\|^2 + \varepsilon d^2\right) \ge 0
\]
provided \(\varepsilon d^2 \ge \|B^* u\|^2 - (Au, u)\).
We now fix such an \(\varepsilon \ge 0\) so that \(A + \varepsilon\, u\, u^{tr} \ge 0\). It then follows from Lemma 23.33 that
\[
e^\varepsilon\, e^{A_{ij}} = e^{A_{ij} + \varepsilon} = e^{\left(A + \varepsilon\, u\, u^{tr}\right)_{ij}} = \sum_{n=0}^\infty \frac{\left(A + \varepsilon\, u\, u^{tr}\right)_{ij}^n}{n!}
\]
are the matrix entries of a positive semi-definite matrix. Scaling this matrix by \(e^{-\varepsilon} > 0\) then gives the result that \(\left\{e^{A_{ij}}\right\}_{i,j} \ge 0\).
As a consequence it follows that \(e^{t\psi(\lambda)}\) is a positive definite function whenever \(\psi\) is conditionally positive definite.
Proposition 23.35. Suppose that \(\{Z_i\}_{i=1}^\infty\) are i.i.d. random vectors in \(\mathbb{R}^d\) with \(\operatorname{Law}(Z_i) = \rho\), and let \(\{N_t\}_{t\ge 0}\) be an independent Poisson process with intensity \(\lambda\). Then \(\{X_t := S_{N_t}\}_{t\ge 0}\) is a Lévy process with \(E\left[e^{ik\cdot X_t}\right] = e^{t\psi(k)}\) where
\[
\psi(k) = \lambda\int_{\mathbb{R}^d}\left(e^{ik\cdot x} - 1\right)d\rho(x) = \lambda\left(E\left[e^{ik\cdot Z_1}\right] - 1\right).
\]
Proof. It has already been shown in Theorem 17.28 that \(\{X_t\}_{t\ge 0}\) has stationary independent increments, and being right continuous it is a Lévy process. It only remains to compute the Fourier transform,
\[
E\left[e^{ik\cdot X_t}\right] = \left(Q_t\left(x \mapsto e^{ik\cdot x}\right)\right)(0)
= \sum_{n=0}^\infty \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}\, E\left[e^{ik\cdot(Z_1 + \dots + Z_n)}\right]
= \sum_{n=0}^\infty \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}\, \hat\rho(k)^n
= e^{\lambda t\left(\hat\rho(k) - 1\right)}
= \exp\left(\lambda t\int_{\mathbb{R}^d}\left(e^{ik\cdot x} - 1\right)d\rho(x)\right).
\]
More generally, if we let \(B_t\) be Brownian motion in \(\mathbb{R}^d\) with \(\operatorname{Cov}\left(B_t^i, B_t^j\right) = A_{ij}\, t\) and \(b \in \mathbb{R}^d\), then, assuming \(B\) and \(X\) above are independent, \(\tilde X_t := bt + B_t + X_t\) is again a Lévy process whose Fourier transform is given by
\[
E\left[e^{ik\cdot\tilde X_t}\right] = \exp\left(t\left(ib\cdot k - \frac12 Ak\cdot k + \lambda\int_{\mathbb{R}^d}\left(e^{ik\cdot x} - 1\right)d\rho(x)\right)\right).
\]
Thus
\[
\psi(k) = ib\cdot k - \frac12 Ak\cdot k + \lambda\int_{\mathbb{R}^d}\left(e^{ik\cdot x} - 1\right)d\rho(x)
\]
is a Lévy exponent for all choices of \(b \in \mathbb{R}^d\), all \(\lambda > 0\), probability measures \(\rho\) on \(\mathbb{R}^d\), and \(A \ge 0\).
Lévy proved that in general \(\psi(k)\) will be a Lévy exponent iff \(\psi\) has the form given in Eq. (23.27) below.

Theorem 23.36 (Lévy–Khintchine formula). If \(\psi\) is continuous at zero and conditionally positive definite, then
\[
\psi(\lambda) = i\lambda b - \frac12 a\lambda^2 + \int_{\mathbb{R}\setminus\{0\}}\left(e^{i\lambda x} - 1 - i\lambda x\, 1_{|x|\le 1}\right)d\nu(x) \tag{23.27}
\]
for some \(b \in \mathbb{R}\), \(a \ge 0\), and some measure \(\nu\) such that
\[
\int_{\mathbb{R}\setminus\{0\}}\left(x^2 \wedge 1\right)d\nu(x) < \infty.
\]
Part V
Stochastic Processes II
We are now going to discuss continuous time stochastic processes in more detail. We will be using Poisson processes (Definition 11.7) and Brownian motion (Definition 17.21) as our model cases. Up to now we have not proved the existence of Brownian motion. This lapse will be remedied in the next couple of chapters. We are going to begin by constructing a process \(\{B_t\}_{t\ge 0}\) satisfying all of the properties of a Brownian motion in Definition 17.21 except for the continuity of the sample paths. We will then use Kolmogorov's continuity criteria (Theorem 25.7) to show we can modify this process in such a way as to produce an example of Brownian motion. We start with a class of random fields which are relatively easy to understand. (BRUCE: mention the free Euclidean field and its connections to SLE.)
24 Gaussian Random Fields

Recall from Section 9.8 (which the reader should review if necessary) that a random variable, \(Y : \Omega \to \mathbb{R}\), is said to be Gaussian if
\[
E e^{i\lambda Y} = \exp\left(-\frac12 \lambda^2 \operatorname{Var}(Y) + i\lambda\, EY\right) \quad\forall\,\lambda \in \mathbb{R}.
\]
More generally a random vector, \(X : \Omega \to \mathbb{R}^N\), is said to be Gaussian if \(\lambda \cdot X\) is a Gaussian random variable for all \(\lambda \in \mathbb{R}^N\). Equivalently put, \(X : \Omega \to \mathbb{R}^N\) is Gaussian provided
\[
E\left[e^{i\lambda\cdot X}\right] = \exp\left(-\frac12\operatorname{Var}\left(\lambda\cdot X\right) + iE\left(\lambda\cdot X\right)\right) \quad\forall\,\lambda \in \mathbb{R}^N. \tag{24.1}
\]

Remark 24.1. To conclude that a random vector, \(X : \Omega \to \mathbb{R}^N\), is Gaussian it is not enough to check that each of its components is a Gaussian random variable. The following simple counterexample was provided by Nate Eldredge. Let \(X \overset{d}{=} N(0,1)\) and \(Y\) be an independent Bernoulli random variable with \(P(Y = 1) = P(Y = -1) = 1/2\). Then the random vector \((X, XY)^{tr}\) has Gaussian components but is not Gaussian.

Exercise 24.1 (Same as Exercise 9.9.). Prove the assertion made in Remark 24.1 by computing \(E\left[e^{i(\lambda_1 X + \lambda_2 XY)}\right]\). (Another proof that \((X, XY)^{tr}\) is not Gaussian follows from the fact that \(X\) and \(XY\) are uncorrelated but not independent\(^1\), which would then contradict Lemma 10.24.)
24.1 Gaussian Integrals

The following theorem gives a useful way of computing Gaussian integrals of polynomials and exponential functions.

Footnote 1: To formally see that they are not independent, observe that \(|X| \le \frac12\) iff \(|XY| \le \frac12\), and therefore,
\[
P\left(|X| \le \tfrac12 \text{ and } |XY| \le \tfrac12\right) = P\left(|X| \le \tfrac12\right) =: \delta,
\]
while
\[
P\left(|X| \le \tfrac12\right) P\left(|XY| \le \tfrac12\right) = \delta^2 \ne \delta.
\]
Theorem 24.2. Suppose \(X \overset{d}{=} N(Q, 0)\) where \(Q\) is an \(N \times N\) symmetric positive definite matrix. Let \(L = L_Q := Q_{ij}\,\partial_i\partial_j\) (sum on repeated indices) where \(\partial_i := \partial/\partial x_i\). Then for any polynomial function, \(q : \mathbb{R}^N \to \mathbb{R}\),
\[
E[q(X)] = \left(e^{\frac12 L} q\right)(0) := \sum_{n=0}^\infty \frac1{n!}\left(\left(\frac{L}{2}\right)^n q\right)(0) \quad\text{(a finite sum).} \tag{24.2}
\]
Proof. First Proof. The first proof is conceptually clear but technically a bit more difficult. In this proof we will begin by proving Eq. (24.2) when \(q(x) = e^{i\lambda\cdot x}\) where \(\lambda \in \mathbb{R}^N\). The function \(q\) is not a polynomial, but never mind. In this case,
\[
E[q(X)] = E\left[e^{i\lambda\cdot X}\right] = e^{-\frac12 Q\lambda\cdot\lambda}.
\]
On the other hand,
\[
\left(\frac12 Lq\right)(x) = \frac12 Q_{ij}\,\partial_i\partial_j\, e^{i\lambda\cdot x} = -\frac12\left(Q\lambda\cdot\lambda\right)e^{i\lambda\cdot x} = -\frac12\left(Q\lambda\cdot\lambda\right) q(x).
\]
Therefore,
\[
e^{\frac12 L} q = \sum_{n=0}^\infty \frac1{n!}\left(-\frac12 Q\lambda\cdot\lambda\right)^n q = e^{-\frac12 Q\lambda\cdot\lambda}\, q
\]
and hence
\[
\left(e^{\frac12 L} q\right)(0) = e^{-\frac12 Q\lambda\cdot\lambda}.
\]
Thus we have shown
\[
E\left[e^{i\lambda\cdot X}\right] = \left.e^{\frac12 L}\, e^{i\lambda\cdot x}\right|_{x=0}.
\]
The result now formally follows by differentiating this equation in \(\lambda\) and then setting \(\lambda = 0\). Indeed, observe that
\[
E\left[(iX)^\alpha\right] = \partial_\lambda^\alpha\, E\left[e^{i\lambda\cdot X}\right]\big|_{\lambda=0}
= \partial_\lambda^\alpha\, e^{\frac12 L}\, e^{i\lambda\cdot x}\big|_{x=0,\,\lambda=0}
= e^{\frac12 L}\, \partial_\lambda^\alpha\, e^{i\lambda\cdot x}\big|_{x=0,\,\lambda=0}
= e^{\frac12 L}\, (ix)^\alpha\big|_{x=0}.
\]
To justify this last equation we must show,
\[
\partial_\lambda^\alpha\, e^{\frac12 L}\, e^{i\lambda\cdot x} = e^{\frac12 L}\, \partial_\lambda^\alpha\, e^{i\lambda\cdot x},
\]
which is formally true since mixed partial derivatives commute. However, there is also an infinite sum involved, so we have to be a bit more careful. To see what is involved, on one hand
\[
\partial_\lambda^\alpha\, e^{\frac12 L}\, e^{i\lambda\cdot x} = \partial_\lambda^\alpha\sum_{n=0}^\infty\frac1{n!}\left(-\frac12 Q\lambda\cdot\lambda\right)^n e^{i\lambda\cdot x},
\]
while on the other,
\[
e^{\frac12 L}\, \partial_\lambda^\alpha\, e^{i\lambda\cdot x}
= \sum_{n=0}^\infty\frac1{n!}\left(\frac{L}{2}\right)^n\partial_\lambda^\alpha\, e^{i\lambda\cdot x}
= \sum_{n=0}^\infty\frac1{n!}\,\partial_\lambda^\alpha\left(\left(\frac{L}{2}\right)^n e^{i\lambda\cdot x}\right)
= \sum_{n=0}^\infty\partial_\lambda^\alpha\left(\frac1{n!}\left(-\frac12 Q\lambda\cdot\lambda\right)^n e^{i\lambda\cdot x}\right).
\]
Thus to complete the proof we must show,
\[
\partial_\lambda^\alpha\sum_{n=0}^\infty\frac1{n!}\left(-\frac12 Q\lambda\cdot\lambda\right)^n e^{i\lambda\cdot x}
= \sum_{n=0}^\infty\partial_\lambda^\alpha\left(\frac1{n!}\left(-\frac12 Q\lambda\cdot\lambda\right)^n e^{i\lambda\cdot x}\right).
\]
Perhaps the easiest way to do this would be to use the Cauchy estimates\(^2\), which allow one to show that if \(\{f_n(\lambda)\}_{n=0}^\infty\) is a sequence of analytic functions such that \(\sum_{n=0}^\infty f_n(\lambda)\) is uniformly convergent on compact subsets, then \(\sum_{n=0}^\infty \partial_\lambda^\alpha f_n(\lambda)\) is also uniformly convergent on compact subsets and therefore,
\[
\partial_\lambda^\alpha\sum_{n=0}^\infty f_n(\lambda) = \sum_{n=0}^\infty \partial_\lambda^\alpha f_n(\lambda).
\]
Now apply this result with \(f_n(\lambda) := \frac1{n!}\left(-\frac12 Q\lambda\cdot\lambda\right)^n e^{i\lambda\cdot x}\) to get the result. The details are left to the reader.
This proof actually shows more than what is claimed. Namely: 1. \(Q\) may be only non-negative definite, and 2. Eq. (24.2) holds for \(q(x) = p(x)\, e^{i\lambda\cdot x}\) where \(\lambda \in \mathbb{R}^N\) and \(p\) is a polynomial.
Second Proof. Let
\[
u(t, y) := E\left[q\left(y + \sqrt t\, X\right)\right] = Z^{-1}\int_{\mathbb{R}^N} q\left(y + \sqrt t\, x\right) e^{-Q^{-1}x\cdot x/2}\, dx \tag{24.3}
\]
\[
= Z^{-1}\int_{\mathbb{R}^N} q(y + x)\,\frac{e^{-Q^{-1}x\cdot x/2t}}{t^{N/2}}\, dx. \tag{24.4}
\]
Footnote 2: If you want to avoid the Cauchy estimates, it would suffice to show by hand that
\[
\sum_{n=0}^\infty \sup_{|\lambda|\le R}\left|\partial_\lambda^\alpha\left(\frac1{n!}\left(-\frac12 Q\lambda\cdot\lambda\right)^n e^{i\lambda\cdot x}\right)\right| < \infty
\]
for all \(R < \infty\) and all multi-indices, \(\alpha\).
One now verifies that
\[
\partial_t\,\frac{e^{-Q^{-1}x\cdot x/2t}}{t^{N/2}} = \frac12 L\,\frac{e^{-Q^{-1}x\cdot x/2t}}{t^{N/2}}.
\]
Using this result and differentiating under the integral in Eq. (24.4) then shows,
\[
\partial_t u(t, y) = \frac12 L_y\, u(t, y) \quad\text{with } u(0, y) = q(y).
\]
Moreover, from Eq. (24.3), one easily sees that \(u(t, y)\) is a polynomial in \((t, y)\) and the degree in \(y\) is the same as the degree of \(q\). On the other hand,
\[
v(t, y) := \sum_{n=0}^\infty \frac{t^n}{n!}\left(\left(\frac{L}{2}\right)^n q\right)(y) = \left(e^{tL/2} q\right)(y)
\]
satisfies the same equation as \(u\) in the same finite dimensional space of polynomials of degree less than or equal to \(\deg(q)\). Therefore by uniqueness of solutions to ODEs we must have \(u(t, y) = v(t, y)\). The result now follows by taking \(t = 1\) and \(y = 0\) and observing that
\[
u(1, 0) = E\left[q\left(0 + \sqrt 1\, X\right)\right] = E[q(X)] \quad\text{and}\quad v(1, 0) = \left(e^{L/2} q\right)(0).
\]
Third Proof. Let \(u \in \mathbb{R}^N\). Since
\[
\partial_u \exp\left(-\frac12 Q^{-1}x\cdot x\right) = -\left(Q^{-1}x\cdot u\right)\exp\left(-\frac12 Q^{-1}x\cdot x\right),
\]
it follows by integration by parts that
\[
E\left[\left(Q^{-1}X\cdot u\right)p(X)\right]
= -\frac1Z\int_{\mathbb{R}^N} p(x)\,\partial_u\exp\left(-\frac12 Q^{-1}x\cdot x\right)dx
= \frac1Z\int_{\mathbb{R}^N}\left(\partial_u p\right)(x)\exp\left(-\frac12 Q^{-1}x\cdot x\right)dx
= E\left[\left(\partial_u p\right)(X)\right].
\]
Replacing \(u\) by \(Qu\) in this equation leads to the important identity,
\[
E\left[(X\cdot u)\, p(X)\right] = E\left[\left(\partial_{Qu}\, p\right)(X)\right]. \tag{24.5}
\]
It is clear that using this identity and induction it would be possible to compute \(E[p(X)]\) for any polynomial \(p\). So to finish the proof it suffices to show
\[
e^{L/2}\left((x\cdot u)\, p(x)\right)\big|_{x=0} = e^{L/2}\left(\left(\partial_{Qu}\, p\right)(x)\right)\big|_{x=0}.
\]
To see this is correct, notice that
\[
e^{L/2}\left((x\cdot u)\, p(x)\right) = e^{L/2}\left((x\cdot u)\, e^{-L/2}\, e^{L/2}\, p(x)\right) = e^{L/2}\, M_{(x\cdot u)}\, e^{-L/2}\; e^{L/2}\, p(x).
\]
Letting \(q\) be any polynomial and
\[
F_t := e^{tL/2}\, M_{(x\cdot u)}\, e^{-tL/2},
\]
we have
\[
\frac{d}{dt} F_t = e^{tL/2}\left[\frac{L}{2},\, M_{(x\cdot u)}\right]e^{-tL/2} = e^{tL/2}\,\partial_{Qu}\, e^{-tL/2} = \partial_{Qu},
\]
and therefore,
\[
e^{L/2}\, M_{(x\cdot u)}\, e^{-L/2} = F_1 = M_{(x\cdot u)} + \partial_{Qu}.
\]
Hence it follows that
\[
e^{L/2}\left((x\cdot u)\, p(x)\right)\big|_{x=0}
= \left(\left(M_{(x\cdot u)} + \partial_{Qu}\right)e^{L/2}\, p(x)\right)\big|_{x=0}
= \left(\partial_{Qu}\, e^{L/2}\, p(x)\right)\big|_{x=0}
= \left(e^{L/2}\,\partial_{Qu}\, p(x)\right)\big|_{x=0}, \tag{24.6}
\]
which is the same identity as in Eq. (24.5).
Example 24.3. Suppose \(X \overset{d}{=} N(1, 0)\) on \(\mathbb{R}\); then
\[
E\left[X^{2n}\right] = \left(e^{\frac12\frac{d^2}{dx^2}}\, x^{2n}\right)\Big|_{x=0} = \frac1{n!\, 2^n}\left(\frac{d}{dx}\right)^{2n} x^{2n} = \frac{(2n)!}{2^n\, n!}.
\]
24.2 Existence of Gaussian Fields

Definition 24.4. Let \(T\) be a set. A Gaussian random field indexed by \(T\) is a collection of random variables, \(\{X_t\}_{t\in T}\), on some probability space \((\Omega, \mathcal{B}, P)\) such that for any finite subset, \(\Lambda \subset_f T\), \(\{X_t : t \in \Lambda\}\) is a Gaussian random vector.

Associated to a Gaussian random field, \(\{X_t\}_{t\in T}\), are the two functions, \(c : T \to \mathbb{R}\) and \(Q : T \times T \to \mathbb{R}\), defined by \(c(t) := EX_t\) and \(Q(s, t) := \operatorname{Cov}(X_s, X_t)\). By the previous results, the functions \((Q, c)\) uniquely determine the finite dimensional distributions of \(\{X_t : t \in T\}\), i.e. the joint distribution of the random variables, \(\{X_t : t \in \Lambda\}\), for all \(\Lambda \subset_f T\).

Definition 24.5. Suppose \(T\) is a set and \(\{X_t : t \in T\}\) is a random field. For any \(\Lambda \subset T\), let \(\mathcal{B}_\Lambda := \sigma(X_t : t \in \Lambda)\).
Proposition 24.6. Suppose \(T\) is a set and \(c : T \to \mathbb{R}\) and \(Q : T \times T \to \mathbb{R}\) are given functions such that \(Q(s, t) = Q(t, s)\) for all \(s, t \in T\), and for each \(\Lambda \subset_f T\),
\[
\sum_{s,t\in\Lambda} Q(s, t)\,\lambda(s)\,\lambda(t) \ge 0 \quad\text{for all } \lambda : \Lambda \to \mathbb{R}.
\]
Then there exists a probability space, \((\Omega, \mathcal{B}, P)\), and random variables, \(X_t : \Omega \to \mathbb{R}\) for each \(t \in T\), such that \(\{X_t\}_{t\in T}\) is a Gaussian random process with
\[
E[X_s] = c(s) \quad\text{and}\quad \operatorname{Cov}(X_s, X_t) = Q(s, t) \tag{24.7}
\]
for all \(s, t \in T\).
Proof. Since we will construct \((\Omega, \mathcal{B}, P)\) by Kolmogorov's extension Theorem 17.54, let \(\Omega := \mathbb{R}^T\), \(\mathcal{B} = \mathcal{B}_{\mathbb{R}^T}\), and \(X_t(\omega) = \omega_t\) for all \(t \in T\) and \(\omega \in \Omega\). Given \(\Lambda \subset_f T\), let \(\mu_\Lambda\) be the unique Gaussian measure on \(\left(\mathbb{R}^\Lambda, \mathcal{B}_\Lambda := \mathcal{B}_{\mathbb{R}^\Lambda}\right)\) such that
\[
\int_{\mathbb{R}^\Lambda} e^{i\sum_{t\in\Lambda}\lambda(t)\,x(t)}\, d\mu_\Lambda(x)
= \exp\left(-\frac12\sum_{s,t\in\Lambda} Q(s, t)\,\lambda(s)\,\lambda(t) + i\sum_{s\in\Lambda} c(s)\,\lambda(s)\right).
\]
The main point now is to show \(\left\{\left(\mathbb{R}^\Lambda, \mathcal{B}_\Lambda, \mu_\Lambda\right)\right\}_{\Lambda\subset_f T}\) is a consistent family of measures. For this, suppose \(\Lambda \subset \Gamma \subset_f T\) and \(\pi : \mathbb{R}^\Gamma \to \mathbb{R}^\Lambda\) is the projection map, \(\pi(x) = x|_\Lambda\). For any \(\lambda \in \mathbb{R}^\Lambda\), let \(\bar\lambda \in \mathbb{R}^\Gamma\) be defined so that \(\bar\lambda = \lambda\) on \(\Lambda\) and \(\bar\lambda = 0\) on \(\Gamma\setminus\Lambda\). We then have,
\[
\int_{\mathbb{R}^\Lambda} e^{i\sum_{t\in\Lambda}\lambda(t)\,x(t)}\, d\left(\mu_\Gamma\circ\pi^{-1}\right)(x)
= \int_{\mathbb{R}^\Gamma} e^{i\sum_{t\in\Lambda}\lambda(t)\,(\pi x)(t)}\, d\mu_\Gamma(x)
= \int_{\mathbb{R}^\Gamma} e^{i\sum_{t\in\Gamma}\bar\lambda(t)\,x(t)}\, d\mu_\Gamma(x)
\]
\[
= \exp\left(-\frac12\sum_{s,t\in\Gamma} Q(s, t)\,\bar\lambda(s)\,\bar\lambda(t) + i\sum_{s\in\Gamma} c(s)\,\bar\lambda(s)\right)
= \exp\left(-\frac12\sum_{s,t\in\Lambda} Q(s, t)\,\lambda(s)\,\lambda(t) + i\sum_{s\in\Lambda} c(s)\,\lambda(s)\right)
= \int_{\mathbb{R}^\Lambda} e^{i\sum_{t\in\Lambda}\lambda(t)\,x(t)}\, d\mu_\Lambda(x).
\]
Since this is valid for all \(\lambda \in \mathbb{R}^\Lambda\), it follows that \(\mu_\Gamma\circ\pi^{-1} = \mu_\Lambda\) as desired. Hence by Kolmogorov's theorem, there exists a unique probability measure, \(P\), on \((\Omega, \mathcal{B})\) such that
\[
\int_\Omega f(\omega|_\Lambda)\, dP(\omega) = \int_{\mathbb{R}^\Lambda} f(x)\, d\mu_\Lambda(x)
\]
for all \(\Lambda \subset_f T\) and all bounded measurable functions, \(f : \mathbb{R}^\Lambda \to \mathbb{R}\). In particular, it follows that
\[
E\left[e^{i\sum_{t\in\Lambda}\lambda(t)\,X_t}\right] = \int_\Omega e^{i\sum_{t\in\Lambda}\lambda(t)\,\omega(t)}\, dP(\omega)
= \exp\left(-\frac12\sum_{s,t\in\Lambda} Q(s, t)\,\lambda(s)\,\lambda(t) + i\sum_{s\in\Lambda} c(s)\,\lambda(s)\right)
\]
for all \(\lambda \in \mathbb{R}^\Lambda\). From this it follows that \(\{X_t\}_{t\in T}\) is a Gaussian random field satisfying Eq. (24.7).
Exercise 24.2. Suppose \(T = [0, \infty)\) and \(\{X_t : t \in T\}\) is a mean zero Gaussian random field (process). Show that \(\mathcal{B}_{[0,\tau]}\) and \(\mathcal{B}_{[\tau,\infty)}\) are conditionally independent given \(X_\tau\) for all \(0 < \tau < \infty\) iff
\[
Q(s, \tau)\, Q(\tau, t) = Q(\tau, \tau)\, Q(s, t) \quad\forall\; 0 \le s \le \tau \le t < \infty. \tag{24.8}
\]
Hint: use Exercises 10.6 and 10.4.
24.3 Gaussian Field Interpretation of Pre-Brownian Motion
Lemma 24.7. Suppose that $\{B_t\}_{t \ge 0}$ is a pre-Brownian motion as described in Proposition 23.14; see also Corollary 17.22. Then $\{B_t\}_{t \ge 0}$ is a mean zero Gaussian random process with $E[B_t B_s] = s \wedge t$ for all $s, t \ge 0$.
Proof. Suppose we are given $0 = t_0 < t_1 < \cdots < t_n < \infty$ and recall from Proposition 23.14 that $B_0 = 0$ a.s. and $\big\{ B_{t_j} - B_{t_{j-1}} \big\}_{j=1}^n$ are independent mean zero Gaussian random variables. Hence it follows from Corollary 10.25 that $\big( B_{t_j} - B_{t_{j-1}} \big)_{j=1}^n$ is a Gaussian random vector. Since the random vector $\big( B_{t_j} \big)_{j=0}^n$ is a linear transformation of $\big( B_{t_j} - B_{t_{j-1}} \big)_{j=1}^n$, it follows from Lemma 9.36 that $\big( B_{t_j} \big)_{j=0}^n$ is a Gaussian random vector. Since $0 = t_0 < t_1 < \cdots < t_n < \infty$ was arbitrary, it follows that $\{B_t\}_{t \ge 0}$ is a Gaussian process. Since $B_t = B_t - B_0 \overset{d}{=} N(0, t)$, we see that $E B_t = 0$ for all $t$. Moreover we have, for $0 \le s < t < \infty$, that
\begin{align*}
E[B_t B_s] &= E[(B_t - B_s + B_s - B_0)(B_s - B_0)] \\
&= E[(B_t - B_s)(B_s - B_0)] + E\big[ (B_s - B_0)^2 \big] \\
&= E[B_t - B_s] \cdot E[B_s - B_0] + s = 0 \cdot 0 + s = s,
\end{align*}
which completes the proof.
Theorem 24.8. The function $Q(s,t) := s \wedge t$ defined for $s, t \ge 0$ is positive definite.
Proof. We are going to give six proofs of this theorem.

1. Given $0 = t_0 < t_1 < \cdots < t_n < \infty$, choose any independent square integrable random variables $\{X_j\}_{j=1}^n$ such that $E X_j = 0$ and $\mathrm{Var}(X_j) = t_j - t_{j-1}$. Let $Y_j := X_1 + \cdots + X_j$ for $j = 1, 2, \ldots, n$. We then have, for $j \le k$, that
\[
\mathrm{Cov}(Y_j, Y_k) = \sum_{m \le j,\; l \le k} \mathrm{Cov}(X_m, X_l) = \sum_{m \le j,\; l \le k} \delta_{m,l} (t_m - t_{m-1}) = \sum_{m \le j} (t_m - t_{m-1}) = t_j,
\]
i.e. $t_j \wedge t_k = \mathrm{Cov}(Y_j, Y_k)$. But such covariance matrices are always positive definite. Indeed,
\[
\sum_{j,k \le n} (t_j \wedge t_k) \lambda_j \lambda_k = \sum_{j,k \le n} \lambda_j \lambda_k \mathrm{Cov}(Y_j, Y_k) = \mathrm{Var}(\lambda_1 Y_1 + \cdots + \lambda_n Y_n) \ge 0,
\]
with equality holding iff $\lambda_1 Y_1 + \cdots + \lambda_n Y_n = 0$, from which it follows that
\[
0 = E[X_j (\lambda_1 Y_1 + \cdots + \lambda_n Y_n)] = (t_j - t_{j-1})(\lambda_j + \lambda_{j+1} + \cdots + \lambda_n) \quad \text{for each } j,
\]
i.e. $\lambda_j = 0$ for all $j$.
2. According to Exercise 21.12 we can find stochastic processes $\big\{ B_n(t) = \frac{1}{\sqrt{n}} S_{[nt]} \big\}_{n=1}^\infty$ such that $E[B_n(t) B_n(s)] \to s \wedge t$ as $n \to \infty$, and therefore
\[
\sum_{s,t} (s \wedge t) \lambda_s \lambda_t = \lim_{n \to \infty} \sum_{s,t} E[B_n(t) B_n(s)] \lambda_s \lambda_t = \lim_{n \to \infty} E\Big[ \Big( \sum_t \lambda_t B_n(t) \Big)^2 \Big] \ge 0.
\]
3. Appealing to Corollary 17.22, there exists a time homogeneous Markov process $\{B_t\}_{t \ge 0}$ with Markov transition kernels given by
\[
Q_t(x, dy) = \frac{1}{\sqrt{2\pi t}} e^{-\frac{1}{2t} |y - x|^2} \, dy. \tag{24.9}
\]
It is now easy to see that $s \wedge t = \mathrm{Cov}(B_s, B_t)$, which is automatically non-negative definite as we saw in the proof of item 2.
4. Let $0 < t_1 < \cdots < t_n < \infty$ and $\{\lambda_i\}_{i=1}^n \subset \mathbb{R}$ be given. Further let $\Lambda_i := \lambda_i + \lambda_{i+1} + \cdots + \lambda_n$, with the convention that $\Lambda_{n+1} = 0$. We then have, with $t_0 := 0$,
\[
\sum_{i=1}^n (t_i \wedge t_j) \lambda_i = \sum_{i=1}^n (t_i \wedge t_j)(\Lambda_i - \Lambda_{i+1}) = \sum_{i=1}^n \big[ t_i \wedge t_j - t_{i-1} \wedge t_j \big] \Lambda_i = \sum_{1 \le i \le j} [t_i - t_{i-1}] \Lambda_i.
\]
Hence it follows that
\[
\sum_{i,j=1}^n (t_i \wedge t_j) \lambda_i \lambda_j = \sum_{j=1}^n \sum_{1 \le i \le j} [t_i - t_{i-1}] \Lambda_i \lambda_j = \sum_{1 \le i \le j \le n} [t_i - t_{i-1}] \Lambda_i \lambda_j = \sum_{1 \le i \le n} [t_i - t_{i-1}] \Lambda_i^2 \ge 0,
\]
with equality iff $\Lambda_i = 0$ for all $i$, which is equivalent to $\lambda_i = 0$ for all $i$.
5. Let $h_t(\tau) := t \wedge \tau$ be as after Theorem 32.9 below; using the results and notation proved there, we find
\[
\sum_{s,t} (s \wedge t) \lambda_s \lambda_t = \sum_{s,t} \langle h_t, h_s \rangle \lambda_s \lambda_t = \Big\| \sum_t \lambda_t h_t \Big\|^2 \ge 0.
\]
This shows $Q$ is positive semi-definite, and equality holds iff $\sum_t \lambda_t h_t = 0$. After taking the derivative of this identity, it is not hard to see that $\lambda_t = 0$ for all $t$, so that $Q$ is positive definite.
6. The function $Q(s,t) = s \wedge t$ restricted to $s, t \in [0,T]$ for some $T < \infty$ is the Green's function for the positive definite second order differential operator $-\frac{d^2}{dt^2}$ equipped with Dirichlet boundary condition at $t = 0$ and Neumann boundary condition at $t = T$.
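The positive definiteness claimed in Theorem 24.8 can be checked numerically for any finite set of times. The sketch below (an illustration of mine, assuming NumPy; not from the text) forms the matrix $Q_{ij} = t_i \wedge t_j$ for a few arbitrary positive times and verifies both that its smallest eigenvalue is positive and that a random quadratic form $\sum_{j,k} Q_{jk} \lambda_j \lambda_k$, the variance from proof 1, is non-negative:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.1, 5.0, size=8))   # arbitrary distinct positive times
Q = np.minimum.outer(t, t)                   # Q[i, j] = t_i ∧ t_j
eigs = np.linalg.eigvalsh(Q)
min_eig = eigs.min()                         # strict positive definiteness <=> min_eig > 0

# proof-1 identity: sum_{j,k} (t_j ∧ t_k) λ_j λ_k = Var(λ_1 Y_1 + ... + λ_n Y_n) >= 0
lam = rng.normal(size=8)
quad_form = lam @ Q @ lam
```

Any choice of distinct positive times should give `min_eig > 0`, in agreement with the theorem.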
We have already given a Markov process proof of the existence of pre-Brownian motion in Corollary 17.22. Given Theorem 24.8, we can also give a Gaussian process proof of the existence of pre-Brownian motion, which we summarize in the next proposition.
Proposition 24.9 (Pre-Brownian motion). Let $\{B_t\}_{t \ge 0}$ be a mean zero Gaussian process such that $\mathrm{Cov}(B_s, B_t) = s \wedge t$ for all $s, t \ge 0$, and let $\mathcal{B}_t := \sigma(B_s : s \le t)$ and $\mathcal{B}_{t+} := \cap_{\tau > t} \mathcal{B}_\tau$. Then:

1. $B_0 = 0$ a.s.
2. $\{B_t\}_{t \ge 0}$ has independent increments with $B_t - B_s \overset{d}{=} N(0, t - s)$ for all $0 \le s < t < \infty$.
3. For all $t \ge s \ge 0$, $B_t - B_s$ is independent of $\mathcal{B}_{s+}$.
4. $\{B_t\}_{t \ge 0}$ is a time homogeneous Markov process with transition kernels $\{Q_t(x, dy)\}_{t \ge 0}$ given as in Eq. (24.9).

Proof. See Exercises 24.3 -- 24.5.
Exercise 24.3 (Independent increments). Let
\[
\Pi := \{0 = t_0 < t_1 < \cdots < t_n = T\}
\]
be a partition of $[0,T]$, $\Delta_i B := B_{t_i} - B_{t_{i-1}}$, and $\Delta_i t := t_i - t_{i-1}$. Show $\{\Delta_i B\}_{i=1}^n$ are independent mean zero normal random variables with $\mathrm{Var}(\Delta_i B) = \Delta_i t$.
Exercise 24.4 (Increments independent of the past). Let $\mathcal{B}_t := \sigma(B_s : s \le t)$. For each $s \in (0, \infty)$ and $t > s$, show:

1. $B_t - B_s$ is independent of $\mathcal{B}_s$, and
2. more generally, $B_t - B_s$ is independent of $\mathcal{B}_{s+} := \cap_{\tau > s} \mathcal{B}_\tau$.
Exercise 24.5 (The simple Markov property). Show $B_t - B_s$ is independent of $\mathcal{B}_s$ for all $t \ge s$. Use this to show, for any bounded measurable function $f : \mathbb{R} \to \mathbb{R}$, that
\[
E[f(B_t) \,|\, \mathcal{B}_{s+}] = E[f(B_t) \,|\, \mathcal{B}_s] = E[f(B_t) \,|\, \sigma(B_s)] = (p_{t-s} * f)(B_s) =: \big( e^{(t-s)\Delta/2} f \big)(B_s) \quad \text{a.s.,}
\]
where
\[
p_t(x) := \frac{1}{\sqrt{2\pi t}} e^{-\frac{1}{2t} x^2},
\]
so that $p_t * f = Q_t(\cdot, f)$. This problem verifies that $\{B_t\}_{t \ge 0}$ is a Markov process with transition kernels $\{Q_t\}_{t \ge 0}$ which have $\frac{1}{2}\Delta = \frac{1}{2} \frac{d^2}{dx^2}$ as their infinitesimal generator.
Exercise 24.6. Let
\[
\Pi := \{0 = t_0 < t_1 < \cdots < t_n = T\}
\]
and let $f : \mathbb{R}^n \to \mathbb{R}$ be a bounded measurable function. Show
\[
E[f(B_{t_1}, \ldots, B_{t_n})] = \int_{\mathbb{R}^n} f(x_1, \ldots, x_n)\, q_\Pi(x)\, dx,
\]
where
\[
q_\Pi(x) := p_{t_1}(x_1)\, p_{t_2 - t_1}(x_2 - x_1) \cdots p_{t_n - t_{n-1}}(x_n - x_{n-1}).
\]
Hint: Either use Exercise 24.3 by writing
\[
f(x_1, \ldots, x_n) = g(x_1, x_2 - x_1, x_3 - x_2, \ldots, x_n - x_{n-1})
\]
for some function $g$, or use Exercise 24.5 first for functions $f$ of the form
\[
f(x_1, \ldots, x_n) = \prod_{j=1}^n \varphi_j(x_j).
\]
Better yet, do it by both methods!
25 Versions and Modifications
We need to introduce a bit of terminology which we will use throughout this part of the book. As before, we will let $T$ be an index space, which will typically be $\mathbb{R}_+$ or $[0,1]$ in this part of the book. We further still suppose that $(\Omega, \mathcal{B}, P)$ is a given probability space, $(S, \rho)$ is a separable (for simplicity) metric state space, and $X_t : \Omega \to S$ is a measurable stochastic process.
Definition 25.1 (Versions). Suppose $X_t : \Omega \to S$ and $\tilde{X}_t : \Omega \to S$ are two processes defined on $T$. We say that $\tilde{X}$ is a version or a modification of $X$ provided, for each $t \in T$, $X_t = \tilde{X}_t$ a.s. (Notice that the null set may depend on the parameter $t$ in the uncountable set $T$.)
Definition 25.2. We say two processes are indistinguishable iff $P^*(\{Y_\cdot \ne X_\cdot\}) = 0$, i.e. iff there is a measurable set $E \subset \Omega$ such that $P(E) = 0$ and $\{Y_\cdot \ne X_\cdot\} \subset E$, where
\[
\{Y_\cdot \ne X_\cdot\} = \{\omega : Y_t(\omega) \ne X_t(\omega) \text{ for some } t \in [0, \infty)\} = \bigcup_{t \in [0, \infty)} \{\omega : Y_t(\omega) \ne X_t(\omega)\}. \tag{25.1}
\]
So $\tilde{X}$ is a modification of $X$ iff
\[
0 = \sup_{t \in T} P\big( X_t \ne \tilde{X}_t \big) = \sup_{t \in T} P\big( \big\{ \rho\big( X_t, \tilde{X}_t \big) > 0 \big\} \big),
\]
while $\tilde{X}$ is indistinguishable from $X$ iff
\[
0 = P^*\big( \big\{ X_t \ne \tilde{X}_t \text{ for some } t \big\} \big) = P^*\Big( \Big\{ \sup_{t \in T} \rho\big( X_t, \tilde{X}_t \big) > 0 \Big\} \Big).
\]
Thus the formal difference between the two notions is simply whether the supremum is taken outside or inside the probabilities. See Exercise 28.1 for an example of two processes which are modifications of each other but are not indistinguishable.
Exercise 25.1. Suppose $\{Y_t\}_{t \ge 0}$ is a version of a process $\{X_t\}_{t \ge 0}$. Further suppose that $t \to Y_t(\omega)$ and $t \to X_t(\omega)$ are both right continuous everywhere. Show $E := \{Y_\cdot \ne X_\cdot\}$ is a measurable set such that $P(E) = 0$, and hence $X$ and $Y$ are indistinguishable. Hint: replace the union in Eq. (25.1) by an appropriate countable union.
Exercise 25.2. Suppose that $\{X_t : \Omega \to S\}_{t \ge 0}$ is a process such that for each $N \in \mathbb{N}$ there is a right continuous modification $\big\{ \tilde{X}_t^{(N)} \big\}_{0 \le t < N}$ of $\{X_t\}_{0 \le t < N}$. Show that $X$ admits a right continuous modification $\tilde{X}$ defined for all $t \ge 0$.
25.1 Kolmogorov's Continuity Criteria
Let $\mathbb{D}_n := \{ i 2^{-n} : i \in \mathbb{Z} \}$ and $\mathbb{D} := \cup_{n=0}^\infty \mathbb{D}_n$ be the dyadic rational numbers.
Lemma 25.3. Let $\mathbb{D}_+ = \mathbb{D} \cap [0, \infty)$, and let $s \in \mathbb{D}_+$ and $n \in \mathbb{N}_0$ be given. Then:

1. there exists a unique $i = i(n, s) \in \mathbb{N}_0$ such that $i 2^{-n} \le s < (i+1) 2^{-n}$, and
2. $s$ may be uniquely written as
\[
s = \frac{i}{2^n} + \sum_{k=1}^\infty \frac{a_k}{2^{n+k}},
\]
where $a_k = a_k(n, s) \in \{0, 1\}$ with $a_k = 0$ for all sufficiently large $k$.
Example 25.4. Suppose that $s = 85/32 = 85/2^5 \in \mathbb{D}$ and $n = 2 \in \mathbb{N}_0$ are given. Then $2^2 s = 85/8 = 10 + 5/8$, i.e.
\[
s = \frac{10}{2^2} + \frac{5}{2^5}.
\]
Similarly, $2^3 \cdot 5/2^5 = 5/4 = 1 + 1/4$, so that $5/2^5 = 1/2^3 + 1/2^5$, and we have expressed $s$ as
\[
s = \frac{10}{2^2} + \frac{1}{2^3} + \frac{0}{2^4} + \frac{1}{2^5}.
\]
Proof. The first assertion follows from the fact that $\mathbb{D}_+$ is partitioned by $\big\{ [i 2^{-n}, (i+1) 2^{-n}) \cap \mathbb{D} \big\}_{i \in \mathbb{N}_0}$. For the second assertion, define $a_1 = 1$ if $\frac{i}{2^n} + \frac{1}{2^{n+1}} \le s$ and $a_1 = 0$ otherwise; then choose $a_2 = 1$ if $\frac{i}{2^n} + \frac{a_1}{2^{n+1}} + \frac{1}{2^{n+2}} \le s$ and $a_2 = 0$ otherwise, etc. It is easy to check that $s_m := \frac{i}{2^n} + \sum_{k=1}^m \frac{a_k}{2^{n+k}}$ so constructed satisfies $s - \frac{1}{2^{n+m}} < s_m \le s$ for all $m \in \mathbb{N}$. As $s_m \in \mathbb{D}_{m+n}$ and $s \in \mathbb{D}_N$ for some $N$, if $m + n \ge N$ then we must have $s_m = s$, because $0 \le s - s_m < \frac{1}{2^{n+m}}$ and $s, s_m \in \mathbb{D}_{m+n}$.
Suppose now that $(S, \rho)$ is a metric space and $x : Q := \mathbb{D} \cap [0,1] \to S$. For $n \in \mathbb{N}_0$ let
\[
\delta_n(x) = \max\big\{ \rho\big( x(i 2^{-n}), x((i-1) 2^{-n}) \big) : 1 \le i \le 2^n \big\} = \max\Big\{ \rho(x(t), x(s)) : s, t \in Q \cap \mathbb{D}_n \text{ with } |s - t| \le \tfrac{1}{2^n} \Big\}.
\]
If $\gamma \in (0, 1)$ and $x : Q \to S$ is a $\gamma$-H\"older continuous function, i.e.
\[
\rho(x(t), x(s)) \le K |t - s|^\gamma
\]
for some $K < \infty$, then $\delta_n(x) \le K 2^{-n\gamma}$ for all $n$, and in particular for all $\beta \in (0, \gamma)$ we have
\[
\sum_{n=0}^\infty 2^{n\beta} \delta_n(x) \le K \sum_{n=0}^\infty 2^{n\beta} 2^{-n\gamma} = K \Big( 1 - \frac{1}{2^{\gamma - \beta}} \Big)^{-1} < \infty.
\]
Our next goal is to produce the following converse to this statement.
Lemma 25.5. Suppose $\beta > 0$ and $x : Q := \mathbb{D} \cap [0,1] \to S$ is a map such that $\sum_{n=0}^\infty 2^{n\beta} \delta_n(x) < \infty$. Then
\[
\rho(x(t), x(s)) \le 2^{1+\beta} \Big[ \sum_{k=0}^\infty 2^{k\beta} \delta_k(x) \Big] |t - s|^\beta \quad \text{for all } s, t \in Q. \tag{25.2}
\]
Moreover, there exists a unique continuous function $\bar{x} : [0,1] \to S$ extending $x$, and this extension is still $\beta$-H\"older continuous. (When $\beta > 1$ it follows from Exercise 25.3 that $x(t)$ is constant.)
Proof. Let $s, t \in Q$ with $s < t$ and choose $n$ so that
\[
\frac{1}{2^{n+1}} < t - s \le \frac{1}{2^n}, \tag{25.3}
\]
and observe that if $s \in [i 2^{-n}, (i+1) 2^{-n})$, then $t \in [i 2^{-n}, (i+2) 2^{-n})$ and therefore
\[
s = \frac{i}{2^n} + \sum_{k=1}^\infty \frac{a_k}{2^{n+k}} \quad \text{and} \quad t = \frac{j}{2^n} + \sum_{k=1}^\infty \frac{b_k}{2^{n+k}},
\]
where $j \in \{i, i+1\}$ and $a_k, b_k \in \{0, 1\}$ with $a_k = b_k = 0$ for a.a. $k$. Letting
\[
s_m := \frac{i}{2^n} + \sum_{k=1}^m \frac{a_k}{2^{n+k}}
\]
as above, we have $s_N = s$ for large $N$. Since $s_m, s_{m+1} \in Q \cap \mathbb{D}_{n+m+1}$ with $|s_m - s_{m+1}| \le 2^{-(n+m+1)}$, it follows from the definition of $\delta_{n+m+1}$ that $\rho(x(s_{m+1}), x(s_m)) \le \delta_{n+m+1}(x)$, which combined with the triangle inequality shows
\[
\rho(x(s), x(s_0)) \le \sum_{m=1}^N \rho(x(s_m), x(s_{m-1})) \le \sum_{m=1}^\infty \delta_{n+m}(x).
\]
Similarly $\rho(x(t), x(t_0)) \le \sum_{m=1}^\infty \delta_{n+m}(x)$, while $\rho(x(s_0), x(t_0)) \le \delta_n(x)$. One more application of the triangle inequality now shows
\begin{align*}
\rho(x(t), x(s)) &\le \delta_n(x) + 2 \sum_{m=1}^\infty \delta_{n+m}(x) \le 2 \sum_{k=n}^\infty \delta_k(x) = 2 \sum_{k=n}^\infty 2^{-k\beta} 2^{k\beta} \delta_k(x) \le 2 \cdot 2^{-n\beta} \sum_{k=n}^\infty 2^{k\beta} \delta_k(x).
\end{align*}
Combining this with the lower bound in Eq. (25.3) gives the estimate in Eq. (25.2).

For the last assertion we define $\bar{x}(t) := \lim_{Q \ni s \to t} x(s)$. This limit exists since for any sequence $\{s_n\}_{n=1}^\infty \subset Q$ with $s_n \to t \in [0,1]$, the sequence $\{x(s_n)\}_{n=1}^\infty$ is Cauchy in $S$ because of Eq. (25.2), and hence convergent in $S$. It is easy to check that $\lim_{n \to \infty} x(s_n)$ is independent of the choice of the sequence $\{s_n\}_{n=1}^\infty$. A simple limiting argument now shows that
\[
\rho(\bar{x}(t), \bar{x}(s)) \le 2^{1+\beta} \Big[ \sum_{k=0}^\infty 2^{k\beta} \delta_k(x) \Big] |t - s|^\beta \quad \text{for all } s, t \in [0,1],
\]
which shows that $\bar{x}$ is $\beta$-H\"older continuous. As we had no choice but to define $\bar{x}$ the way we did if $\bar{x}$ is to be continuous, the extension is unique.
Exercise 25.3. Show: if $x : Q \to S$ is $\gamma$-H\"older continuous for some $\gamma > 1$, then $x$ is constant.
Solution to Exercise (25.3). Let $s, t \in Q$ with $s < t$ and let $\varepsilon > 0$ be given. Choose a partition $s = t_0 < t_1 < \cdots < t_n = t$ in $Q$ such that $\Delta_i := t_i - t_{i-1} \le \varepsilon$ for all $i$. Then by the triangle inequality and the assumed H\"older continuity,
\[
\rho(x(t), x(s)) \le \sum_{i=1}^n \rho(x(t_{i-1}), x(t_i)) \le \sum_{i=1}^n C \Delta_i^\gamma \le C \varepsilon^{\gamma - 1} \sum_{i=1}^n \Delta_i = C \varepsilon^{\gamma - 1} (t - s).
\]
As $\gamma > 1$ and $\varepsilon > 0$ is arbitrary, we may let $\varepsilon \downarrow 0$ in this equation in order to learn $\rho(x(t), x(s)) = 0$. As $s, t \in Q$ are arbitrary, it follows that $x(t)$ is constant in $t$.
Theorem 25.6 (Kolmogorov's Continuity Criteria). Let $\{X_t\}_{t \in Q}$ be an $S$-valued stochastic process and suppose there exist $\alpha, \beta > 0$ and $C < \infty$ such that
\[
E[\rho(X_t, X_s)^\alpha] \le C |t - s|^{1+\beta} \quad \text{for all } s, t \in Q. \tag{25.4}
\]
Then for all $\gamma \in (0, \beta/\alpha)$ there exists $0 \le K_\gamma(X) \in L^\alpha(P)$ such that
\[
\rho(X_t, X_s) \le K_\gamma(X) |t - s|^\gamma \quad \text{for all } s, t \in Q, \tag{25.5}
\]
and there is a function $M(C, \alpha, \beta, \gamma) < \infty$ such that
\[
E[K_\gamma(X)^\alpha] \le M(C, \alpha, \beta, \gamma). \tag{25.6}
\]
Proof. According to Exercise 25.4 below, when $\alpha < 1$, or more generally when $\alpha < 1 + \beta$, it actually follows that $X_t = X_0$ a.s., and therefore Eq. (25.5) holds for some $K_\gamma$ which is equal to zero almost surely. So we may now suppose that $\alpha \ge 1 + \beta > 1$. Let $\gamma \in (0, \beta/\alpha)$ and
\[
D_\gamma := \sum_{k=0}^\infty 2^{k\gamma} \delta_k(X).
\]
The estimate in Eq. (25.5) with $K_\gamma(X) = 2^{1+\gamma} D_\gamma$ is now a consequence of Lemma 25.5. So it only remains to show $E[D_\gamma^\alpha] < \infty$ when $\gamma \in (0, \beta/\alpha)$. From the following simple estimate,
\[
\delta_k(X)^\alpha = \max_{1 \le i \le 2^k} \rho\big( X_{i 2^{-k}}, X_{(i-1) 2^{-k}} \big)^\alpha \le \sum_{i=1}^{2^k} \rho\big( X_{i 2^{-k}}, X_{(i-1) 2^{-k}} \big)^\alpha,
\]
we find
\[
E[\delta_k(X)^\alpha] \le \sum_{i=1}^{2^k} E\big[ \rho\big( X_{i 2^{-k}}, X_{(i-1) 2^{-k}} \big)^\alpha \big] \le 2^k \cdot C \big( 2^{-k} \big)^{1+\beta} = C 2^{-k\beta}, \tag{25.7}
\]
and therefore $\| \delta_k(X) \|_\alpha \le C^{1/\alpha} 2^{-k\beta/\alpha}$. Combining this inequality with Minkowski's inequality shows
\[
\| D_\gamma \|_\alpha = \Big\| \sum_{k=0}^\infty 2^{k\gamma} \delta_k(X) \Big\|_\alpha \le \sum_{k=0}^\infty 2^{k\gamma} \| \delta_k(X) \|_\alpha \le C^{1/\alpha} \sum_{k=0}^\infty 2^{k\gamma} 2^{-k\beta/\alpha} = C^{1/\alpha} \sum_{k=0}^\infty \big( 2^{\gamma - \beta/\alpha} \big)^k.
\]
This is again finite provided that $\gamma < \beta/\alpha$, and Eq. (25.6) follows.
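For orientation, the hypothesis (25.4) can be checked numerically in the case that motivates this chapter. For a Gaussian increment $B_t - B_s \sim N(0, t-s)$ one has $E|B_t - B_s|^4 = 3|t-s|^2$, i.e. (25.4) holds with $\alpha = 4$, $\beta = 1$, which yields H\"older exponents $\gamma < \beta/\alpha = 1/4$ (and larger $\alpha = p$ pushes $\gamma$ up toward $1/2$). The following Monte Carlo sketch (my own illustration, assuming NumPy) verifies the fourth-moment identity:

```python
import numpy as np

# For N(0, dt) increments, theory gives E[inc^4] = 3 * dt^2, so Eq. (25.4)
# holds with alpha = 4, beta = 1 and Theorem 25.6 gives gamma < 1/4.
rng = np.random.default_rng(2)
dt = 0.01
incs = rng.normal(0.0, np.sqrt(dt), size=1_000_000)
fourth_moment = np.mean(incs**4)     # should be close to 3 * dt**2 = 3e-4
```

The sample fourth moment agrees with $3\,dt^2$ to within Monte Carlo error of a few parts per thousand.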
Theorem 25.7 (Kolmogorov's Continuity Criteria). Let $T \in \mathbb{N}$, $D = [0, T] \subset \mathbb{R}$, let $(S, \rho)$ be a complete separable metric space, and suppose that $X_t : \Omega \to S$ is a process for $t \in D$. Assume there exist $\alpha, C, \beta > 0$ such that
\[
E[\rho(X_t, X_s)^\alpha] \le C |t - s|^{1+\beta} \quad \text{for all } s, t \in D. \tag{25.8}
\]
Then there is a modification $\tilde{X}$ of $X$ which is $\gamma$-H\"older continuous for all $\gamma \in (0, \beta/\alpha)$, and for each such $\gamma$ there is a random variable $K_\gamma(X) \in L^\alpha(P)$ such that
\[
\rho\big( \tilde{X}_t, \tilde{X}_s \big) \le K_\gamma(X) |t - s|^\gamma \quad \text{for all } s, t \in D. \tag{25.9}
\]
(Again, according to Exercise 25.4 we will have $X_t = X_0$ a.s. for all $t \in D$ unless $\alpha \ge 1 + \beta$.)
Proof. From Theorem 25.6 we know that for all $\gamma \in (0, \beta/\alpha)$ there is a random variable $K_\gamma(X) \in L^\alpha(P)$ such that
\[
\rho(X_t, X_s) \le K_\gamma(X) |t - s|^\gamma \quad \text{for all } s, t \in D \cap \mathbb{D}.
\]
On the set $\{K_\gamma(X) < \infty\}$, $\{X_t\}_{t \in D \cap \mathbb{D}}$ has a unique continuous extension to $D$, which we denote by $\big\{ \tilde{X}_t \big\}_{t \in D}$. Moreover, this extension is easily seen to satisfy Eq. (25.9). Lastly, we have for $s \in D \cap \mathbb{D}$ and $t \in D$ that
\[
\rho\big( X_t, \tilde{X}_t \big)^\alpha \le \liminf_{D \cap \mathbb{D} \ni s \to t} \big[ \rho(X_t, X_s) + \rho\big( \tilde{X}_s, \tilde{X}_t \big) \big]^\alpha = \liminf_{D \cap \mathbb{D} \ni s \to t} \rho(X_t, X_s)^\alpha,
\]
and so by Fatou's lemma,
\[
E\big[ \rho\big( X_t, \tilde{X}_t \big)^\alpha \big] \le \liminf_{D \cap \mathbb{D} \ni s \to t} E[\rho(X_t, X_s)^\alpha] \le \liminf_{D \cap \mathbb{D} \ni s \to t} C |t - s|^{1+\beta} = 0.
\]
This certainly implies that $\rho\big( X_t, \tilde{X}_t \big) = 0$ a.s. for every $t \in D$, and therefore that $\tilde{X}$ is a modification of $X$.
Our construction of Brownian motion in Theorem 26.3 below will give us an opportunity to apply Theorem 25.7. At this time let us observe that it is important that $\beta$ is greater than $0$ in the previous two theorems.
Example 25.8. Recall that a Poisson process $\{N_t\}_{t \ge 0}$ with parameter $\lambda$ satisfies (by definition): (i) $N$ has independent increments, and (ii) if $0 \le u < v$ then $N_v - N_u$ has the Poisson distribution with parameter $\lambda(v - u)$. Using the generating function (or the Laplace or Fourier transform, see Example 22.11), one can show that for any $k \in \mathbb{N}$,
\[
E|N_t - N_s|^k \cong \lambda |t - s| \quad \text{for } |t - s| \text{ small.} \tag{25.10}
\]
Notice that we can not use Eq. (25.10) for any $k \in \mathbb{N}$ to satisfy the hypothesis of Theorem 25.7, which is good since $\{N_t\}_{t \ge 0}$ is integer valued and does not have a continuous modification. However, see Example 28.27 below, where it is shown that $\{N_t\}_{t \ge 0}$ has a right continuous modification.
Exercise 25.4. Let $T \in \mathbb{N}$, $D = [0, T] \subset \mathbb{R}$, let $(S, \rho)$ be a complete separable metric space, and suppose that $X_t : \Omega \to S$ is a process for $t \in D$. Assume there exist $\alpha > 0$, $C > 0$, and $\beta > 0$ such that $(1 + \beta)/\alpha > 1$ and
\[
E[\rho(X_t, X_s)^\alpha] \le C |t - s|^{1+\beta} \quad \text{for all } s, t \in D.
\]
Show $X_t = X_0$ a.s. for each $t \in D$. Hint: for $\alpha \in (0, 1)$ use the inequality$^1$ $(a + b)^\alpha \le a^\alpha + b^\alpha$ for all $a, b \ge 0$, while for $\alpha \ge 1$ use Minkowski's inequality. (This Exercise was inspired by questions posed by Dennis Leung.)
Solution to Exercise (25.4). Case 1, where $\alpha \in (0, 1)$. Let $t \in D$, suppose that $0 = t_0 < t_1 < \cdots < t_n = t$ is a partition of $[0, t]$, and let $\varepsilon := \max_i |t_i - t_{i-1}|$. To simplify notation, let $\Delta_i := t_i - t_{i-1}$. Then by the triangle inequality and the estimate$^1$ $(a + b)^\alpha \le a^\alpha + b^\alpha$ for all $a, b \ge 0$, we find
\[
E[\rho(X_t, X_0)^\alpha] \le E\Big[ \Big( \sum_{i=1}^n \rho\big( X_{t_{i-1}}, X_{t_i} \big) \Big)^\alpha \Big] \le \sum_{i=1}^n E\big[ \rho\big( X_{t_{i-1}}, X_{t_i} \big)^\alpha \big] \le \sum_{i=1}^n C \Delta_i^{1+\beta} \le C \varepsilon^\beta \sum_{i=1}^n \Delta_i = C \varepsilon^\beta t.
\]
Since we are free to choose the partition as we wish, we may let $\varepsilon \downarrow 0$ in the previous estimate in order to discover that $E[\rho(X_t, X_0)^\alpha] = 0$ for all $t \in D$. Therefore $X_t = X_0$ a.s. for all $t \in D$.

Case 2. Suppose that $\alpha \ge 1$; then again by the triangle inequality for $\rho$ and Minkowski's inequality,
\[
\| \rho(X_t, X_0) \|_\alpha \le \Big\| \sum_{i=1}^n \rho\big( X_{t_{i-1}}, X_{t_i} \big) \Big\|_\alpha \le \sum_{i=1}^n \big\| \rho\big( X_{t_{i-1}}, X_{t_i} \big) \big\|_\alpha \le C^{1/\alpha} \sum_{i=1}^n \Delta_i^{(1+\beta)/\alpha} \le C^{1/\alpha} \varepsilon^{(1+\beta)/\alpha - 1} \sum_{i=1}^n \Delta_i = C^{1/\alpha} \varepsilon^{(1+\beta)/\alpha - 1} t.
\]
As before, since $(1 + \beta)/\alpha - 1 > 0$, the same argument as before shows that $\| \rho(X_t, X_0) \|_\alpha = 0$, and therefore $X_t = X_0$ a.s. again.

$^1$ If $f(x) = x^\alpha$ for $\alpha \in (0, 1)$, then $f$ is an increasing function which is concave down, i.e. $f'$ is decreasing. For $a > 0$ let $g(x) := f(x + a) - f(x)$. By looking at a picture, or just noting that $g'(x) \le 0$ since $f'$ is decreasing, it follows that $g$ is a decreasing function of $x$. In particular it follows that
\[
f(b + a) - f(b) = g(b) \le g(0) = f(a) - f(0) = f(a).
\]
25.2 Kolmogorov's Tightness Criteria
Before leaving this chapter let us record a compactness result which follows
easily from what we have done so far.
Theorem 25.9 (Tightness Criteria). Let $S$ be a complete metric space satisfying the Heine--Borel property$^2$ (for example $S = \mathbb{R}^d$ for some $d < \infty$). Suppose that $\{ \{B_n(t) : 0 \le t \le 1\} \}_{n=1}^\infty$ is a sequence of $S$-valued continuous stochastic processes, and suppose there exist $\alpha, \beta > 0$ and $C < \infty$ such that
\[
\sup_n E[\rho(B_n(t), B_n(s))^\alpha] \le C |t - s|^{1+\beta} \quad \text{for all } 0 \le s, t \le 1, \tag{25.11}
\]
and for some point $s_0 \in S$ we have
\[
\lim_{N \to \infty} \sup_n P[\rho(B_n(0), s_0) > N] = 0. \tag{25.12}
\]
Then the collection of measures $\{\mu_n := \mathrm{Law}(B_n)\}_{n=1}^\infty$ on $C([0,1], S)$ is tight.

$^2$ The Heine--Borel property means that closed and bounded sets are compact.
Proof. Let $\gamma \in (0, \beta/\alpha)$, and for $\omega \in C([0,1], S)$ let
\[
K_\gamma(\omega) = 2^{1+\gamma} \sum_{k=0}^\infty 2^{k\gamma} \delta_k(\omega),
\]
where
\[
\delta_k(\omega) := \max\Big\{ \rho\Big( \omega\Big( \frac{j-1}{2^k} \Big), \omega\Big( \frac{j}{2^k} \Big) \Big) : 1 \le j \le 2^k \Big\}.
\]
The assumptions of this theorem allow us to apply Theorem 25.6 in order to learn
\[
\sup_n E[K_\gamma(B_n)^\alpha] \le M(C, \alpha, \beta, \gamma) < \infty.
\]
Now let $\Omega_N$ denote those $\omega \in C([0,1], S)$ such that $\rho(\omega(0), s_0) \le N$ and $K_\gamma(\omega) \le N$. We then have that
\begin{align*}
\mu_n(\Omega_N^c) = P(B_n \notin \Omega_N) &= P(\rho(B_n(0), s_0) > N \text{ or } K_\gamma(B_n) > N) \\
&\le P(\rho(B_n(0), s_0) > N) + P(K_\gamma(B_n) > N) \\
&\le P[\rho(B_n(0), s_0) > N] + \frac{1}{N^\alpha} E[K_\gamma(B_n)^\alpha] \\
&\le P[\rho(B_n(0), s_0) > N] + \frac{1}{N^\alpha} M(C, \alpha, \beta, \gamma).
\end{align*}
From this inequality and the hypothesis of the theorem it follows that $\lim_{N \to \infty} \sup_n \mu_n(\Omega_N^c) = 0$. To complete the proof it suffices to observe that for $\omega \in \Omega_N$ we have $\rho(\omega(0), s_0) \le N$ and
\[
\rho(\omega(t), \omega(s)) \le K_\gamma(\omega) |t - s|^\gamma \le N |t - s|^\gamma \quad \text{for all } 0 \le s, t \le 1.
\]
Therefore, by the Arzel\`a--Ascoli Theorem 39.36 and Remark 39.37, it follows that $\Omega_N$ is precompact inside of the complete separable metric space $C([0,1], S)$.
25.2.1 Appendix: Alternate Proofs (please ignore)
(BRUCE: Add Garsia--Rumsey stuff here as well; see Chapter ??.) Let $T \in \mathbb{N}$ and $D = [0, T] \subset \mathbb{R}$, let
\[
\mathbb{D} = \Big\{ \frac{i}{2^n} : i, n \in \mathbb{N}_0 \Big\}
\]
be the dyadic rationals in $[0, \infty)$, and let $(S, \rho)$ be a complete metric space. Suppose $\gamma \in (0, 1)$ and $x : D \to S$ is a $\gamma$-H\"older continuous function, i.e. there exists $K < \infty$ such that
\[
\rho(x(t), x(s)) \le K |t - s|^\gamma \quad \text{for all } s, t \in D.
\]
Then for any $\alpha \in (0, \gamma)$ and $n \ge n_0 := (\log_2 K)/(\gamma - \alpha)$ we have
\[
\rho\Big( x\Big( \frac{i+1}{2^n} \Big), x\Big( \frac{i}{2^n} \Big) \Big) \le K 2^{-n\gamma} = K 2^{-n(\gamma - \alpha)} 2^{-n\alpha} \le 2^{-n\alpha},
\]
provided $\frac{i}{2^n} < T$.
Lemma 25.10. Suppose that $x : D \cap \mathbb{D} \to S$ is a given function. Assume there exists $N \in \mathbb{N}$ such that
\[
\rho\Big( x\Big( \frac{i+1}{2^n} \Big), x\Big( \frac{i}{2^n} \Big) \Big) \le 2^{-n\alpha} \quad \text{for all } n \ge N \tag{25.13}
\]
provided $\frac{i}{2^n} < T$. Then
\[
\rho(x(s), x(t)) \le C |t - s|^\alpha \quad \forall\; s, t \in D \cap \mathbb{D} \text{ with } |t - s| \le 2^{-N}, \tag{25.14}
\]
where
\[
C = C(\alpha) = \frac{1 + 2^\alpha}{1 - 2^{-\alpha}}.
\]
In particular, $x$ uniquely extends to a continuous function on $D$ which is $\alpha$-H\"older continuous. Moreover, the extension $x$ satisfies the H\"older estimate
\[
\rho(x(s), x(t)) \le C_1(\alpha, T)\, 2^{N(1-\alpha)} |t - s|^\alpha \quad \text{for all } s, t \in D, \tag{25.15}
\]
where
\[
C_1(\alpha, T) := 2 C(\alpha) T^{1-\alpha}. \tag{25.16}
\]
Proof. Let $n \ge N$ and $s \in \mathbb{D} \cap D$, and express $s$ as
\[
s = \frac{i}{2^n} + \sum_{k=1}^\infty \frac{a_k}{2^{n+k}},
\]
where $i = i_n(s) \in \mathbb{N}_0$ is chosen so that $i 2^{-n} \le s < (i + 1) 2^{-n}$, and $a_k = 0$ or $1$ with $a_k = 0$ for almost all $k$. Set
\[
s_m = \frac{i}{2^n} + \sum_{k=1}^m \frac{a_k}{2^{n+k}}
\]
and notice that $s_0 = \frac{i}{2^n}$ and $s_m = s$ for all $m$ sufficiently large. Therefore, with $m_0$ sufficiently large, we have
\[
\rho\Big( x(s), x\Big( \frac{i}{2^n} \Big) \Big) = \rho(x(s_{m_0}), x(s_0)) \le \sum_{k=0}^{m_0 - 1} \rho(x(s_{k+1}), x(s_k)).
\]
Since $s_{k+1} - s_k = \frac{a_{k+1}}{2^{n+k+1}} \le 2^{-(n+k+1)}$, it follows from Eq. (25.13) that
\[
\rho\Big( x(s), x\Big( \frac{i}{2^n} \Big) \Big) \le \sum_{k=0}^\infty \Big( \frac{1}{2^{n+1+k}} \Big)^\alpha = 2^{-(n+1)\alpha} \frac{1}{1 - 2^{-\alpha}} = \frac{2^{-\alpha}}{1 - 2^{-\alpha}} 2^{-n\alpha}.
\]
If $0 < t - s \le 2^{-N}$, let $n \ge N$ be chosen so that $2^{-(n+1)} < t - s \le 2^{-n}$. For this choice of $n$, if $i = i_n(s)$ then $j = i_n(t) \in \{i, i+1\}$; see Figure 25.1. Therefore,
Fig. 25.1. The geometry of $s$ and $t$ with $2^{-(n+1)} < t - s \le 2^{-n}$.
\begin{align*}
\rho(x(s), x(t)) &\le \rho(x(s), x(i 2^{-n})) + \rho(x(i 2^{-n}), x(j 2^{-n})) + \rho(x(j 2^{-n}), x(t)) \\
&\le \frac{2^{-\alpha}}{1 - 2^{-\alpha}} 2^{-n\alpha} + 2^{-n\alpha} + \frac{2^{-\alpha}}{1 - 2^{-\alpha}} 2^{-n\alpha} = 2^{-n\alpha}\, \frac{1 + 2^{-\alpha}}{1 - 2^{-\alpha}}.
\end{align*}
Since
\[
2^{-n\alpha} = 2^\alpha 2^{-(n+1)\alpha} < 2^\alpha (t - s)^\alpha,
\]
we may conclude that
\[
\rho(x(s), x(t)) \le (t - s)^\alpha\, \frac{2^\alpha (1 + 2^{-\alpha})}{1 - 2^{-\alpha}} = (t - s)^\alpha\, \frac{1 + 2^\alpha}{1 - 2^{-\alpha}} = C(\alpha) |t - s|^\alpha.
\]
From this estimate it follows that $x$ has an $\alpha$-H\"older continuous extension to $D$. We will continue to denote this extension by $x$.

If $s, t \in D \cap \mathbb{D}$ with $t - s > 2^{-N}$, choose $k \in \mathbb{N}$ such that $t - s = k 2^{-N} + \delta$ with $0 \le \delta < 2^{-N}$. It then follows that
\begin{align*}
\rho(x(s), x(t)) &= \rho\big( x(s), x\big( s + k 2^{-N} + \delta \big) \big) \\
&\le \sum_{j=1}^k \rho\big( x\big( s + (j-1) 2^{-N} \big), x\big( s + j 2^{-N} \big) \big) + \rho\big( x\big( s + k 2^{-N} \big), x\big( s + k 2^{-N} + \delta \big) \big) \\
&\le C(\alpha)(k + 1) 2^{-N\alpha} \le 2 C(\alpha)\, k\, 2^{-N\alpha}.
\end{align*}
Since
\[
k \le 2^N (t - s) \le 2^N (t - s)^\alpha T^{1-\alpha},
\]
we may conclude
\[
\rho(x(s), x(t)) \le 2 C(\alpha)\, 2^{N(1-\alpha)}\, T^{1-\alpha} |t - s|^\alpha = C_1(\alpha, T)\, 2^{N(1-\alpha)} |t - s|^\alpha,
\]
where $C_1(\alpha, T)$ is given as in Eq. (25.16). As $x$ is continuous and $D \cap \mathbb{D}$ is dense in $D$, the above estimate extends to all $s, t \in D$.
Theorem 25.11 (Kolmogorov's Continuity Criteria). Suppose that $X_t : \Omega \to S$ is a process for $t \in D$. Assume there exist positive constants $\alpha$, $\beta$, and $C$ such that
\[
E[\rho(X_t, X_s)^\alpha] \le C |t - s|^{1+\beta} \tag{25.17}
\]
for all $s, t \in D$. Then for any $\gamma \in (0, \beta/\alpha)$ there is a modification $\tilde{X}$ of $X$ which is $\gamma$-H\"older continuous. Moreover, there is a random variable $K_\gamma$ such that
\[
\rho\big( \tilde{X}_t, \tilde{X}_s \big) \le K_\gamma |t - s|^\gamma \quad \text{for all } s, t \in D \tag{25.18}
\]
and $E K_\gamma^p < \infty$ for all $p < \frac{\beta - \alpha\gamma}{1 - \gamma}$.
Proof. Using Chebyshev's inequality,
\begin{align*}
P\Big[ \rho\Big( X\Big( \frac{i+1}{2^n} \Big), X\Big( \frac{i}{2^n} \Big) \Big) \ge 2^{-n\gamma} \Big]
&= P\Big[ \rho\Big( X\Big( \frac{i+1}{2^n} \Big), X\Big( \frac{i}{2^n} \Big) \Big)^\alpha \ge 2^{-n\gamma\alpha} \Big] \\
&\le 2^{n\gamma\alpha}\, E\Big[ \rho\Big( X\Big( \frac{i+1}{2^n} \Big), X\Big( \frac{i}{2^n} \Big) \Big)^\alpha \Big] \le C 2^{-n(1+\beta-\alpha\gamma)}. \tag{25.19}
\end{align*}
Letting
\[
A_n = \Big\{ \max_{0 \le i \le T 2^n - 1} \rho\Big( X\Big( \frac{i+1}{2^n} \Big), X\Big( \frac{i}{2^n} \Big) \Big) \ge 2^{-n\gamma} \Big\} = \bigcup_{0 \le i \le T 2^n - 1} \Big\{ \rho\Big( X\Big( \frac{i+1}{2^n} \Big), X\Big( \frac{i}{2^n} \Big) \Big) \ge 2^{-n\gamma} \Big\},
\]
it follows from Eq. (25.19) that
\[
P(A_n) \le \sum_{0 \le i \le T 2^n - 1} P\Big[ \rho\Big( X\Big( \frac{i+1}{2^n} \Big), X\Big( \frac{i}{2^n} \Big) \Big) \ge 2^{-n\gamma} \Big] \le T 2^n \cdot C 2^{-n(1+\beta-\alpha\gamma)} = C T 2^{-n(\beta - \alpha\gamma)}. \tag{25.20}
\]
Since
\[
\sum_{n=0}^\infty P(A_n) \le C T \frac{1}{1 - 2^{-(\beta - \alpha\gamma)}} < \infty,
\]
it follows by the first Borel--Cantelli lemma that $P(A_n \text{ i.o.}) = 0$ or, equivalently put, if
\[
\Omega_0 := \{ A_n^c \text{ a.a.} \} = \Big\{ \max_{0 \le i \le T 2^n - 1} \rho\Big( X\Big( \frac{i+1}{2^n} \Big), X\Big( \frac{i}{2^n} \Big) \Big) < 2^{-n\gamma} \text{ a.a. } n \Big\},
\]
then $P(\Omega_0) = 1$.

For $\omega \in \Omega_0$, let
\[
\tau(\omega) = \min\{ n : \omega \in A_m^c \text{ for all } m \ge n \} < \infty.
\]
On $\Omega_0$ we know that
\[
\max_{0 \le i \le T 2^n - 1} \rho\Big( X\Big( \frac{i+1}{2^n} \Big), X\Big( \frac{i}{2^n} \Big) \Big) < 2^{-n\gamma} \quad \text{for all } n \ge \tau,
\]
and hence, by Lemma 25.10,
\[
\rho(X_t, X_s) \le C(\gamma) |t - s|^\gamma \quad \text{when } |t - s| \le 2^{-\tau}. \tag{25.21}
\]
Hence on $\Omega_0$, if we define $\tilde{X}_t := \lim_{\mathbb{D} \cap D \ni \sigma \to t} X_\sigma$, the resulting process $\tilde{X}_t$ will be $\gamma$-H\"older continuous on $D$. To complete the definition of $\tilde{X}_t$, fix a point $y \in S$ and set $\tilde{X}_t(\omega) = y$ for all $t \in D$ and $\omega \notin \Omega_0$.

For $t \in D$ and $s \in D \cap \mathbb{D}$, we have that
\[
\rho\big( \tilde{X}_t, X_t \big) \le \rho\big( \tilde{X}_t, \tilde{X}_s \big) + \rho\big( \tilde{X}_s, X_s \big) + \rho(X_s, X_t) = \rho\big( \tilde{X}_t, \tilde{X}_s \big) + \rho(X_s, X_t) \quad \text{a.e.}
\]
By continuity, $\lim_{s \to t} \tilde{X}_s = \tilde{X}_t$, and by Eq. (25.17) it follows that $\lim_{s \to t} X_s = X_t$ in measure; hence we may conclude that $\rho\big( \tilde{X}_t, X_t \big) = 0$ a.s., i.e. $X_t = \tilde{X}_t$ a.e. and $\tilde{X}$ is a version of $X$.

It is only left to prove the quantitative estimate in Eq. (25.18). Because of Eq. (25.15) we have the following estimate:
\[
\rho\big( \tilde{X}_t, \tilde{X}_s \big) \le K_\gamma |t - s|^\gamma \quad \text{for all } s, t \in D, \tag{25.22}
\]
where
\[
K_\gamma := 1_{\Omega_0}\, C_1(\gamma, T)\, 2^{\tau(1-\gamma)}.
\]
Since $\{\tau > N\} = \bigcup_{m \ge N} A_m$, it follows from Eq. (25.20) that
\[
P(\tau > N) \le \sum_{m \ge N} P(A_m) \le \sum_{m \ge N} C 2^{-m(\beta - \alpha\gamma)} \le C \frac{2^{-N(\beta - \alpha\gamma)}}{1 - 2^{-(\beta - \alpha\gamma)}}. \tag{25.23}
\]
Using this estimate we find
\begin{align*}
E K_\gamma^p &= C_1(\gamma, T)^p\, E\big[ 2^{\tau(1-\gamma) p} \big] = C_1(\gamma, T)^p \sum_{n=0}^\infty 2^{n(1-\gamma) p}\, P(\tau = n) \\
&\le C_1(\gamma, T)^p \Big( 1 + \sum_{n=1}^\infty 2^{n(1-\gamma) p}\, P(\tau > n - 1) \Big) \\
&\le C_1(\gamma, T)^p \Big( 1 + \frac{C}{1 - 2^{-(\beta - \alpha\gamma)}} \sum_{n=1}^\infty 2^{n(1-\gamma) p}\, 2^{-(n-1)(\beta - \alpha\gamma)} \Big),
\end{align*}
which is finite provided that $(1 - \gamma) p - (\beta - \alpha\gamma) < 0$.
26 Brownian Motion I
Our next goal is to prove existence of Brownian motion and then describe
some of its basic path properties.
Definition 26.1 (Brownian Motion). A Brownian motion $\{B_t\}_{t \ge 0}$ is an adapted mean zero Gaussian random process on some filtered probability space $\big( \Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P \big)$ satisfying: 1) for each $\omega \in \Omega$, $t \to B_t(\omega)$ is continuous, and 2)
\[
E[B_t B_s] = t \wedge s \quad \text{for all } s, t \ge 0. \tag{26.1}
\]
So a Brownian motion is a pre-Brownian motion with continuous sample paths.
Remark 26.2. If no filtration is given, we can use the process to construct one. Namely, let $\mathcal{B}_t^0 := \sigma(B_s : s \le t)$ and replace $\mathcal{B}_t$ by $\mathcal{B}_t^0$ if necessary. We call $\big\{ \mathcal{B}_t^0 \big\}$ the raw filtration associated to $\{B_t\}$.
Theorem 26.3 (Wiener 1923). Brownian motions exist. Moreover, for any $\gamma \in (0, 1/2)$, $t \to B_t$ is locally $\gamma$-H\"older continuous almost surely.

Proof. Let $\{\tilde{B}_t\}_{t \ge 0}$ be a pre-Brownian motion as in Proposition 24.9. For $0 \le s < t < \infty$, $\tilde{B}_t - \tilde{B}_s$ is a mean zero Gaussian random variable with
\[
E\big[ \big( \tilde{B}_t - \tilde{B}_s \big)^2 \big] = E\big[ \tilde{B}_t^2 + \tilde{B}_s^2 - 2 \tilde{B}_s \tilde{B}_t \big] = t + s - 2s = t - s.
\]
Hence if $N$ is a standard normal random variable, then $\tilde{B}_t - \tilde{B}_s \overset{d}{=} \sqrt{t - s}\, N$, and therefore, for any $p \in [1, \infty)$,
\[
E\big| \tilde{B}_t - \tilde{B}_s \big|^p = (t - s)^{p/2}\, E|N|^p. \tag{26.2}
\]
Hence an application of Theorem 25.7 shows, with $\alpha = p > 2$, $\beta = p/2 - 1$, and
\[
\gamma \in \Big( 0, \frac{p/2 - 1}{p} \Big) = \Big( 0, \frac{1}{2} - \frac{1}{p} \Big),
\]
that there exists a modification $B$ of $\tilde{B}$ such that
\[
|B_t - B_s| \le C_{\gamma, T} |t - s|^\gamma \quad \text{for } s, t \in [0, T).
\]
By applying this result with $T = N \in \mathbb{N}$, we find there exists a continuous version $B$ of $\tilde{B}$ for all $t \in [0, \infty)$, and this version is locally $\gamma$-H\"older continuous for every H\"older exponent $\gamma < 1/2$.
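A standard way to realize the continuous version in practice is to build a path from independent $N(0, \Delta t)$ increments on a fine grid; the defining covariance $E[B_s B_t] = s \wedge t$ can then be checked empirically. The following sketch is my own illustration (assuming NumPy; the helper name `brownian_paths` is invented):

```python
import numpy as np

def brownian_paths(T, n_steps, n_paths, rng):
    """Simulate Brownian paths on [0, T] by cumulating independent N(0, dt) increments."""
    dt = T / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)
    return np.linspace(0.0, T, n_steps + 1), B

rng = np.random.default_rng(4)
t, B = brownian_paths(T=2.0, n_steps=200, n_paths=100_000, rng=rng)
i, j = 50, 150                         # t_i = 0.5, t_j = 1.5
emp_cov = np.mean(B[:, i] * B[:, j])   # theory: 0.5 ∧ 1.5 = 0.5
```

Over $10^5$ paths the empirical value of $E[B_{0.5} B_{1.5}]$ agrees with $0.5 \wedge 1.5 = 0.5$ to within a couple of percent.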
For the rest of this chapter we will assume that $\{B_t\}_{t \ge 0}$ is a Brownian motion on some probability space $\big( \Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P \big)$ and $\mathcal{B}_t := \sigma(B_s : s \le t)$.
26.1 Donsker's Invariance Principle
In this section we will see that Brownian motion may be thought of as a limit of random walks; this is the content of Donsker's invariance principle, or the so-called functional central limit theorem. The setup is to start with a random walk, $S_n := X_1 + \cdots + X_n$, where $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables with zero mean and variance one. We then define for each $n \in \mathbb{N}$ the following continuous process,
\[
B_n(t) := \frac{1}{\sqrt{n}} \big( S_{[nt]} + (nt - [nt])\, X_{[nt]+1} \big), \tag{26.3}
\]
where for $\tau \in \mathbb{R}_+$, $[\tau]$ is the integer part of $\tau$, i.e. the largest integer which is no greater than $\tau$. The first step in this program is to prove convergence in the sense of finite dimensional distributions.
Proposition 26.4. Let $B$ be a standard Brownian motion; then $B_n \overset{f.d.}{\Longrightarrow} B$.

Proof. In Exercise 21.12 you showed that $\big\{ \frac{1}{\sqrt{n}} S_{[nt]} \big\}_{t \ge 0} \overset{f.d.}{\Longrightarrow} B$ as $n \to \infty$. Suppose that $0 < t_1 < t_2 < \cdots < t_k < \infty$ are given, and let
\[
X_n := \frac{1}{\sqrt{n}} \big( S_{[nt_1]}, \ldots, S_{[nt_k]} \big), \qquad Y_n := (B_n(t_1), \ldots, B_n(t_k)),
\]
and $\varepsilon_n := Y_n - X_n$. From Eq. (26.3) and Chebyshev's inequality, it follows that
\[
P(|(\varepsilon_n)_i| > \delta) \le \frac{1}{\delta} E\Big[ \frac{1}{\sqrt{n}} (nt_i - [nt_i]) \big| X_{[nt_i]+1} \big| \Big] \le \frac{1}{\delta \sqrt{n}} E|X_1| \to 0
\]
as $n \to \infty$. This then easily implies that $\varepsilon_n \overset{P}{\to} 0$ as $n \to \infty$, and therefore by Slutzky's Theorem 21.39 it follows that $Y_n = X_n + \varepsilon_n \Longrightarrow (B(t_1), \ldots, B(t_k))$.
Let $\Omega := C([0, \infty), \mathbb{R})$, which becomes a complete metric space in the metric defined by
\[
\rho(\omega_1, \omega_2) := \sum_{n=1}^\infty \frac{1}{2^n} \Big( \max_{0 \le t \le n} |\omega_1(t) - \omega_2(t)| \wedge 1 \Big).
\]
Theorem 26.5 (Donsker's invariance principle). Let $\Omega$, $B_n$, and $B$ be as above. Then $B_n \Longrightarrow B$, i.e.
\[
\lim_{n \to \infty} E[F(B_n)] = E[F(B)]
\]
for all bounded continuous functions $F : \Omega \to \mathbb{R}$.

One method of proof, see [29] and [63], goes by the following two steps. 1) Show that the finite dimensional distributions of $B_n$ converge to those of $B$, as was done in Proposition 26.4. 2) Show the distributions of $B_n$ are tight. A proof of the tightness may be based on Ottaviani's maximal inequality in Corollary 20.48, [28, Corollary 16.7]. Another possible proof of this theorem is based on Skorokhod's representation; see (for example) [28, Theorem 14.9 on p. 275] or [47]. Rather than give the full proof here, I will give the proof of a slightly weaker version of the theorem under the more stringent restriction that the $X_n$ possess fourth moments.
Proposition 26.6 (Random walk approximation bounds). Suppose that $\{X_n\}_{n=1}^\infty \subset L^4(P)$ are i.i.d. random variables with $E X_n = 0$, $E X_n^2 = 1$, and $\kappa := E X_n^4 < \infty$. Then there exists $C < \infty$ such that
\[
E|B_n(t) - B_n(s)|^4 \le C |t - s|^2 \quad \text{for all } s, t \in \mathbb{R}_+. \tag{26.4}
\]
Exercise 26.1. Provide a proof of Proposition 26.6. Hints: Use the results of Exercise 10.9 to verify that Eq. (26.4) holds for $s, t \in D_n := \frac{1}{n} \mathbb{N}_0$. Take care of the case where $s, t \ge 0$ with $|t - s| < 1/n$ by hand, and finish up using these results along with Minkowski's inequality.
Theorem 26.7 (Baby Donsker Theorem). Continuing the notation used in Proposition 26.6, let $T < \infty$ and let $B$ be a Brownian motion. Then
\[
B_n|_{[0,T]} \Longrightarrow B|_{[0,T]}, \tag{26.5}
\]
i.e. $\mathrm{Law}\big( B_n|_{[0,T]} \big) \Longrightarrow \mathrm{Law}\big( B|_{[0,T]} \big)$ as distributions on $\Omega_T := C([0,T], \mathbb{R})$, a complete separable metric space. (See the simulation file "Random Walks to BM.xls.")
Proof. If Eq. (26.5) failed to hold, there would exist $g \in BC(\Omega_T)$ and a subsequence $B'_k := B_{n_k}$ such that
\[
\varepsilon := \inf_k \big| E\big[ g\big( B'_k|_{[0,T]} \big) \big] - E\big[ g\big( B|_{[0,T]} \big) \big] \big| > 0. \tag{26.6}
\]
Since $B_n(0) = 0$ for all $n$ and the estimate in Eq. (26.4) of Proposition 26.6 holds, it follows from Theorem 25.9 that $\big\{ B_n|_{[0,T]} \big\}_{n=1}^\infty$ is tight. So by Prokhorov's Theorem 21.61, there is a further subsequence $B''_l := B'_{k_l}$ which is weakly convergent to some $\Omega_T$-valued process $X$. Replacing $B'_k$ by $B''_l$ in Eq. (26.6) and then letting $l \to \infty$ in the resulting equation shows
\[
\varepsilon \le \big| E\big[ g\big( X|_{[0,T]} \big) \big] - E\big[ g\big( B|_{[0,T]} \big) \big] \big| > 0. \tag{26.7}
\]
On the other hand, by Proposition 26.4 we know that $B_n \overset{f.d.}{\Longrightarrow} B$ as $n \to \infty$, and therefore $X$ and $B$ are continuous processes on $[0, T]$ with the same finite dimensional distributions and hence are indistinguishable by Exercise 25.1. However, this is in contradiction to Eq. (26.7).
26.2 Path Regularity Properties of BM
Definition 26.8. Let $(V, \|\cdot\|)$ be a normed space and $Z \in C([0,T], V)$. For $1 \le p < \infty$, the $p$-variation of $Z$ is
\[
v_p(Z) := \sup_\Pi \Big( \sum_{j=1}^n \big\| Z_{t_j} - Z_{t_{j-1}} \big\|^p \Big)^{1/p},
\]
where the supremum is taken over all partitions $\Pi := \{0 = t_0 < t_1 < \cdots < t_n = T\}$ of $[0,T]$.
Lemma 26.9. The function $v_p(Z)$ is a decreasing function of $p$.

Proof. Let $a := \{a_j\}_{j=1}^n$ be a sequence of non-negative numbers and set
\[
\|a\|_p := \Big( \sum_{j=1}^n a_j^p \Big)^{1/p}.
\]
It will suffice to show $\|a\|_p$ is a decreasing function of $p$. To see this is true, let $q = p + r$ with $r \ge 0$. Then
\[
\|a\|_q^q = \sum_{j=1}^n a_j^{p+r} \le \Big( \max_j a_j \Big)^r \sum_{j=1}^n a_j^p \le \|a\|_p^r\, \|a\|_p^p = \|a\|_p^q,
\]
wherein we have used
\[
\max_j a_j = \Big( \max_j a_j^p \Big)^{1/p} \le \Big( \sum_{j=1}^n a_j^p \Big)^{1/p} = \|a\|_p.
\]
Hence $\|a\|_q \le \|a\|_p$.
Notation 26.10 (Partitions) Given a partition $\Pi := \{0 = t_0 < t_1 < \cdots < t_n = T\}$ of $[0,T]$, let
\[
\Delta_i B := B_{t_i} - B_{t_{i-1}} \quad \text{and} \quad \Delta_i t := t_i - t_{i-1}
\]
for all $i = 1, 2, \ldots, n$. Further let $\mathrm{mesh}(\Pi) := \max_i |\Delta_i t|$ denote the mesh of the partition $\Pi$.
Corollary 26.11. For all $p > 2$ and $T < \infty$, $v_p\big( B|_{[0,T]} \big) < \infty$ a.s. (We will see later that $v_p\big( B|_{[0,T]} \big) = \infty$ a.s. for all $p < 2$.)

Proof. By Theorem 26.3, there exists $K_p < \infty$ a.s. such that
\[
|B_t - B_s| \le K_p |t - s|^{1/p} \quad \text{for all } 0 \le s, t \le T. \tag{26.8}
\]
Thus we have
\[
\sum_i |\Delta_i B|^p \le \sum_i \big( K_p |t_i - t_{i-1}|^{1/p} \big)^p = \sum_i K_p^p |t_i - t_{i-1}| = K_p^p T,
\]
and therefore $v_p\big( B|_{[0,T]} \big)^p \le K_p^p T < \infty$ a.s.
Exercise 26.2 (Quadratic Variation). Let
\[
\Pi_m := \big\{ 0 = t_0^m < t_1^m < \cdots < t_{n_m}^m = T \big\}
\]
be a sequence of partitions such that $\mathrm{mesh}(\Pi_m) \to 0$ as $m \to \infty$. Further let
\[
Q_m := \sum_{i=1}^{n_m} (\Delta_i^m B)^2 := \sum_{i=1}^{n_m} \big( B_{t_i^m} - B_{t_{i-1}^m} \big)^2. \tag{26.9}
\]
Show
\[
\lim_{m \to \infty} E\big[ (Q_m - T)^2 \big] = 0,
\]
and $\lim_{m \to \infty} Q_m = T$ a.s. if $\sum_{m=1}^\infty \mathrm{mesh}(\Pi_m) < \infty$. This result is often abbreviated by writing $dB_t^2 = dt$. Hint: it is useful to observe: 1)
\[
Q_m - T = \sum_{i=1}^{n_m} \big[ (\Delta_i^m B)^2 - \Delta_i t \big],
\]
and 2) using Eq. (26.2) there is a constant $c < \infty$ such that
\[
E\big[ (\Delta_i^m B)^2 - \Delta_i t \big]^2 = c\, (\Delta_i t)^2.
\]
Proposition 26.12. Suppose that T
m

m=1
is a sequence of partitions of [0, T]
such that T
m
T
m+1
for all m and mesh(T
m
) 0 as m . Then Q
m
T
a.s. where Q
m
is dened as in Eq. (26.9).
Proof. It is always possible to nd another sequence of partitions, T
t
n

n=1
,
of [0, T] such that T
t
n
T
t
n+1
, mesh(T
t
n
) 0 as n , #
_
T
t
n+1
_
= #(T
t
n
)+
1, and T
m
= T
t
nm
where n
m

m=1
is a subsequence of N. If we let Q
t
n
denote
the quadratic variations associated to T
t
n
and we can shown Q
t
n
T a.s. then
we will also have Q
m
= Q
t
nm
T a.s. as well. So with these comments we may
now assume that #(T
n+1
) = #(T
n
) + 1.
We already know form Exercise 26.2 that Q
m
T in L
2
(P) . So it suces
to show Q
m
is almost surely convergent. We will do this by showing Q
m

m=1
is a backwards martingale relative to the ltration,
T
m
:= (Q
m
, Q
m+1
, . . . ) .
To do this, suppose that T
m+1
= T
m
v and u = t
i1
, v = t
i+1
T
m
such
that u < v < w. Let X := B
v
B
w
and Y := B
w
B
u
. Then
Q
m
= Q
m+1
(B
v
B
w
)
2
(B
w
B
u
)
2
+ (B
v
B
u
)
2
= Q
m+1
X
2
Y
2
+ (X +Y )
2
= Q
m+1
+ 2XY
therefore,
E[Q
m
[T
m+1
] = Q
m+1
+ 2E[XY [T
m+1
] .
So to nish the proof it suces to show E[XY [T
m+1
] = 0 a.s.
To do this let
b
t
:=
_
B
t
if t v
B
v
(B
t
B
v
) if t v,
that is after t = v, the increments of b are the reections of the increments of B.
Clearly b
t
is still a continuous process and it is easily veried that E[b
t
b
s
] = st.
Thus b
t

t0
is still a Brownian motion. Moreover, if Q
m+n
(b) is the quadratic
variation of b relative to T
m+n
, then
Q
m+n
(b) = Q
m+n
= Q
m+n
(B) for all n N.
On the other hand, under this transformation, X X and Y Y. Since
(X, Y, Q
m+1
, Q
m+2
, . . . ) and (X, Y, Q
m+1
, Q
m+2
, . . . ) have the same distribu-
tion, if we write
E[XY [T
m+1
] = f (Q
m+1
, Q
m+2
, . . . ) a.s., (26.10)
then it follows from Exercise 14.6, that
Page: 409 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
410 26 Brownian Motion I
E[XY [T
m+1
] = f (Q
m+1
, Q
m+2
, . . . ) a.s. (26.11)
Hence we may conclude,
E[XY [T
m+1
] = E[XY [T
m+1
] = E[XY [T
m+1
] ,
and thus E[XY [T
m+1
] = 0 a.s.
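The convergence $Q_m \to T$ asserted above is easy to check numerically. The following is a minimal sketch (assuming `numpy` is available; the partition is taken uniform, which is one admissible choice of $\Pi_m$): it simulates the sum of squared Brownian increments over a fine partition of $[0, T]$ and observes that it concentrates near $T$.

```python
import numpy as np

def quadratic_variation(T, n, rng):
    """Simulate Q_m = sum_i (Delta_i B)^2 over a uniform partition of [0, T]
    with n intervals; the increments Delta_i B are independent N(0, dt)."""
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)
    return float(np.sum(dB ** 2))

rng = np.random.default_rng(0)
T = 1.0
Q = quadratic_variation(T, 100_000, rng)
print(Q)  # concentrates near T = 1; here Var(Q) = 2 T^2 / n is tiny
```

Since $\operatorname{Var}(Q_m) = 2T^2/n$ for the uniform partition, the printed value is within a few thousandths of $T$ for $n = 10^5$.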
Corollary 26.13. If $p < 2$, then $v_p\big(B|_{[0,T]}\big) = \infty$ a.s.

Proof. Choose partitions, $\Pi_m$, of $[0, T]$ such that $\lim_{m\to\infty} Q_m = T$ a.s. where $Q_m$ is as in Eq. (26.9) and let $\Omega_0 := \{\lim_{m\to\infty} Q_m = T\}$ so that $P(\Omega_0) = 1$. If $v_p\big(B|_{[0,T]}(\omega)\big) < \infty$ for some $\omega \in \Omega_0$, then, with $b = B(\omega)$, we would have

$$Q_m(\omega) = \sum_{i=1}^{n_m} (\Delta_i^m b)^2 = \sum_{i=1}^{n_m} |\Delta_i^m b|^p\,|\Delta_i^m b|^{2-p} \le \max_i |\Delta_i^m b|^{2-p} \sum_{i=1}^{n_m} |\Delta_i^m b|^p \le [v_p(b)]^p \max_i |\Delta_i^m b|^{2-p},$$

which tends to zero as $m \to \infty$ by the uniform continuity of $b$ (recall $2 - p > 0$). But this contradicts the fact that $\lim_{m\to\infty} Q_m(\omega) = T$. Thus we must have $v_p\big(B|_{[0,T]}\big) = \infty$ on $\Omega_0$.
Remark 26.14. The reader may find a proof that Corollary 26.13 also holds for $p = 2$ in [?, Theorem 13.69 on p. 382]. You should consider why this result is not in contradiction with Exercise 26.2 and Proposition 26.12. Hint: unlike the case of $p = 1$, when $p > 1$ the quantity

$$v_p^{\Pi}(Z) := \Big(\sum_{j=1}^n \big|Z_{t_j} - Z_{t_{j-1}}\big|^p\Big)^{1/p}$$

does not increase under refinement of partitions, so that

$$v_p(Z) = \sup_{\Pi} v_p^{\Pi}(Z) \ne \lim_{|\Pi|\downarrow 0} v_p^{\Pi}(Z)$$

when $p > 1$.
Corollary 26.15 (Roughness of Brownian Paths). A Brownian motion, $\{B_t\}_{t\ge0}$, is not almost surely $\alpha$-Hölder continuous for any $\alpha > 1/2$.

Proof. According to Exercise 26.2, we may choose partitions, $\Pi_m$, such that $\operatorname{mesh}(\Pi_m) \to 0$ and $Q_m \to T$ a.s. If $B$ were $\alpha$-Hölder continuous for some $\alpha > 1/2$, say with constant $C$, then

$$Q_m = \sum_{i=1}^{n_m} (\Delta_i^m B)^2 \le C^2 \sum_{i=1}^{n_m} (\Delta_i^m t)^{2\alpha} \le C^2 \max_i \big[\Delta_i t\big]^{2\alpha-1} \sum_{i=1}^{n_m} \Delta_i^m t \le C^2 \big[\operatorname{mesh}(\Pi_m)\big]^{2\alpha-1}\,T \to 0 \text{ as } m \to \infty,$$

which contradicts the fact that $Q_m \to T$ as $m \to \infty$.
Lemma 26.16. For any $\alpha > 1/2$, $\limsup_{t\downarrow0} |B_t|/t^\alpha = \infty$ a.s. (See Exercise 31.3 below to see that $\alpha = 1/2$ would work as well.)

Proof. If $\limsup_{t\downarrow0} |B_t|/t^\alpha < \infty$ then there would exist $C < \infty$ such that $|B_t| \le C t^\alpha$ for all $t \le 1$, and in particular, $|B_{1/n}| \le C n^{-\alpha}$ for all $n \in \mathbb{N}$. Hence we have shown

$$\Big\{\limsup_{t\downarrow0} |B_t|/t^\alpha < \infty\Big\} \subset \bigcup_{C\in\mathbb{N}} \bigcap_{n\in\mathbb{N}} \big\{|B_{1/n}| \le C n^{-\alpha}\big\}.$$

This completes the proof because,

$$P\Big(\bigcap_{n\in\mathbb{N}} \big\{|B_{1/n}| \le C n^{-\alpha}\big\}\Big) \le \liminf_{n\to\infty} P\big(|B_{1/n}| \le C n^{-\alpha}\big) = \liminf_{n\to\infty} P\Big(\tfrac{1}{\sqrt{n}}\,|B_1| \le C n^{-\alpha}\Big) = \liminf_{n\to\infty} P\big(|B_1| \le C n^{1/2-\alpha}\big) = P(|B_1| = 0) = 0$$

if $\alpha > 1/2$.
Theorem 26.17 (Nowhere $1/2+$ Hölder Continuous). Let

$$W := \{\omega \in C([0, \infty) \to \mathbb{R}) : \omega(0) = 0\},$$

$\mathcal{B}$ denote the $\sigma$-field on $W$ generated by the projection maps, $b_t(\omega) = \omega(t)$ for all $t \in [0, \infty)$, and $\mu$ be Wiener measure on $(W, \mathcal{B})$, i.e. $\mu$ is the law of a Brownian motion. For $\gamma > 1/2$, let $E_\gamma$ denote the set of $\omega \in W$ such that $\omega$ is $\gamma$-Hölder continuous at some point $t = t_\omega \in [0, 1]$. Then $E_\gamma$ is a $\mu$-null set, i.e. there exists a set $\tilde{E}_\gamma \in \mathcal{B}$ such that

$$E_\gamma \subset \Big\{\omega : \inf_{0\le t\le 1}\ \limsup_{h\to 0} \frac{|\omega(t+h) - \omega(t)|}{|h|^\gamma} < \infty\Big\} \subset \tilde{E}_\gamma$$

and $\mu\big(\tilde{E}_\gamma\big) = 0$. In particular, $\mu$ is concentrated on $\tilde{E}_\gamma^c$, which is a subset of the collection of paths which are nowhere differentiable on $[0, 1]$.
Proof. Let $\gamma > 1/2$ be as above and let $\ell \in \mathbb{N}$ be chosen more specifically later. If $\omega \in E_\gamma$, then there exist $t \in [0, 1]$ and $C < \infty$ such that

$$|\omega(t) - \omega(s)| \le C\,|t - s|^\gamma \text{ for all } s \ge 0 \text{ with } |s - t| \le 1.$$

For all $n \in \mathbb{N}$ we may choose $i \le n$ so that $\big|t - \tfrac{i}{n}\big| < \tfrac{1}{n}$. By the triangle inequality, for all $j = 1, 2, \dots, \ell$, we have

$$\Big|\omega\Big(\tfrac{i+j}{n}\Big) - \omega\Big(\tfrac{i+j-1}{n}\Big)\Big| \le \Big|\omega\Big(\tfrac{i+j}{n}\Big) - \omega(t)\Big| + \Big|\omega(t) - \omega\Big(\tfrac{i+j-1}{n}\Big)\Big| \le C\Big[\Big|\tfrac{i+j}{n} - t\Big|^\gamma + \Big|\tfrac{i+j-1}{n} - t\Big|^\gamma\Big] \le C n^{-\gamma}\big[(\ell+1)^\gamma + \ell^\gamma\big] =: D n^{-\gamma}.$$

Therefore, $\omega \in E_\gamma$ implies there exists $D \in \mathbb{N}$ such that for all $n \in \mathbb{N}$ there exists $i \le n$ such that

$$\Big|\omega\Big(\tfrac{i+j}{n}\Big) - \omega\Big(\tfrac{i+j-1}{n}\Big)\Big| \le D n^{-\gamma} \text{ for } j = 1, 2, \dots, \ell.$$

Letting

$$A_D := \bigcap_{n=1}^\infty \bigcup_{i\le n} \bigcap_{j=1}^\ell \Big\{\omega : \Big|\omega\Big(\tfrac{i+j}{n}\Big) - \omega\Big(\tfrac{i+j-1}{n}\Big)\Big| \le D n^{-\gamma}\Big\},$$

we have shown that $E_\gamma \subset \bigcup_{D\in\mathbb{N}} A_D$. We now complete the proof by showing $P(A_D) = 0$. To do this, we compute, using the independence and scaling of the Brownian increments,

$$P(A_D) \le \liminf_{n\to\infty} P\Big(\bigcup_{i\le n}\bigcap_{j=1}^\ell \Big\{\Big|\omega\Big(\tfrac{i+j}{n}\Big) - \omega\Big(\tfrac{i+j-1}{n}\Big)\Big| \le D n^{-\gamma}\Big\}\Big) \le \liminf_{n\to\infty} \sum_{i\le n} \prod_{j=1}^\ell P\Big(\Big|\omega\Big(\tfrac{i+j}{n}\Big) - \omega\Big(\tfrac{i+j-1}{n}\Big)\Big| \le D n^{-\gamma}\Big)$$
$$= \liminf_{n\to\infty} n\Big[P\Big(\tfrac{1}{\sqrt{n}}\,|N| \le D n^{-\gamma}\Big)\Big]^\ell = \liminf_{n\to\infty} n\Big[P\big(|N| \le D n^{\frac12-\gamma}\big)\Big]^\ell \le \liminf_{n\to\infty} n\Big[C' D n^{\frac12-\gamma}\Big]^\ell = (C'D)^\ell\,\liminf_{n\to\infty} n^{1+\ell\left(\frac12-\gamma\right)}, \tag{26.12}$$

wherein $N$ is a standard normal random variable and we have used

$$P(|N| \le \varepsilon) = \frac{1}{\sqrt{2\pi}} \int_{|x|\le\varepsilon} e^{-\frac12 x^2}\,dx \le \frac{2\varepsilon}{\sqrt{2\pi}} =: C'\varepsilon.$$

The last limit in Eq. (26.12) is zero provided we choose $\gamma > \tfrac12$ and $\ell\big(\gamma - \tfrac12\big) > 1$.
26.3 Scaling Properties of B. M.
Theorem 26.18 (Transformations preserving B. M.). Let $\{B_t\}_{t\ge0}$ be a Brownian motion and $\mathcal{B}_t := \sigma(B_s : s \le t)$. Then;

1. $b_t = -B_t$ is again a Brownian motion.
2. if $c > 0$, then $b_t := c^{-1/2} B_{ct}$ is again a Brownian motion.
3. $b_t := t B_{1/t}$ for $t > 0$ and $b_0 = 0$ is a Brownian motion. In particular, $\lim_{t\downarrow0} t B_{1/t} = 0$ a.s.
4. for all $T \in (0, \infty)$, $b_t := B_{t+T} - B_T$ for $t \ge 0$ is again a Brownian motion which is independent of $\mathcal{B}_T$.
5. for all $T \in (0, \infty)$, $b_t := B_{T-t} - B_T$ for $0 \le t \le T$ is again a Brownian motion on $[0, T]$.
Proof. It is clear that in each of the five cases above $\{b_t\}_{t\ge0}$ is still a Gaussian process. Hence to finish the proof it suffices to verify $\mathbb{E}[b_t b_s] = s \wedge t$, which is routine in all cases. Let us work out item 3. in detail to illustrate the method. For $0 < s < t$,

$$\mathbb{E}[b_s b_t] = st\,\mathbb{E}\big[B_{s^{-1}} B_{t^{-1}}\big] = st\,\big(s^{-1} \wedge t^{-1}\big) = st\cdot t^{-1} = s.$$

Notice that $t \mapsto b_t$ is continuous for $t > 0$, so to finish the proof we must show that $\lim_{t\downarrow0} b_t = 0$ a.s. However, this follows from Kolmogorov's continuity criteria. Since $\{b_t\}_{t\ge0}$ is a pre-Brownian motion, we know there is a version, $\tilde{b}$, which is a.s. continuous for $t \in [0, \infty)$. By Exercise 25.1, we know that

$$E := \big\{\omega : b_t(\omega) \ne \tilde{b}_t(\omega) \text{ for some } t > 0\big\}$$

is a null set. Hence for $\omega \notin E$ it follows that

$$\lim_{t\downarrow0} b_t(\omega) = \lim_{t\downarrow0} \tilde{b}_t(\omega) = 0.$$
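The covariance identity $\mathbb{E}[b_s b_t] = s \wedge t$ verified in the proof can also be checked by simulation. Below is a small sketch (assuming `numpy`; the values of $c$, $s$, $t$ are arbitrary illustrative choices) which samples the pair $(B_{cs}, B_{ct})$ from independent Gaussian increments and estimates the covariance of the scaled process $b_u := c^{-1/2} B_{cu}$ of item 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, c, s, t = 200_000, 4.0, 0.3, 0.7  # illustrative parameters, s < t

# Sample (B_{cs}, B_{ct}) for many independent paths via increments:
# B_{cs} ~ N(0, cs) and B_{ct} - B_{cs} ~ N(0, c(t - s)) independent.
B_cs = rng.normal(0.0, np.sqrt(c * s), n_paths)
B_ct = B_cs + rng.normal(0.0, np.sqrt(c * (t - s)), n_paths)

# Item 2: b_u := c^{-1/2} B_{cu} should again satisfy E[b_s b_t] = s ∧ t.
b_s, b_t = B_cs / np.sqrt(c), B_ct / np.sqrt(c)
cov = float(np.mean(b_s * b_t))
print(cov)  # should be close to min(s, t) = 0.3
```

The same Monte Carlo check applies verbatim to the other four transformations in Theorem 26.18.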
Corollary 26.19 (B. M. Law of Large Numbers). Suppose $\{B_t\}_{t\ge0}$ is a Brownian motion. Then, almost surely, for each $\alpha$,

$$\limsup_{t\to\infty} \frac{|B_t|}{t^\alpha} = \begin{cases} 0 & \text{if } \alpha > 1/2 \\ \infty & \text{if } \alpha \in (0, 1/2). \end{cases} \tag{26.13}$$

Proof. Since $b_t := t B_{1/t}$ for $t > 0$ and $b_0 = 0$ is a Brownian motion, we know that for all $\beta < 1/2$ there exists $C_\beta(\omega) < \infty$ such that, almost surely,

$$t\,\big|B_{1/t}\big| = |b_t| \le C_\beta\,|t|^\beta \text{ for all } t \le 1.$$

Replacing $t$ by $1/t$ in this inequality implies, almost surely, that

$$\frac{1}{t}\,|B_t| \le C_\beta\,|t|^{-\beta} \text{ for all } t \ge 1,$$

or equivalently that

$$|B_t| \le C_\beta\,t^{1-\beta} \text{ for all } t \ge 1. \tag{26.14}$$

Hence if $\alpha > 1/2$, let $\beta < 1/2$ be such that $1 - \beta < \alpha$. Then Eq. (26.13) follows from Eq. (26.14).

On the other hand, taking $\gamma > 1/2$, we know by Lemma 26.16 (or Theorem 26.17) that

$$\limsup_{t\downarrow0} \frac{t\,\big|B_{1/t}\big|}{t^\gamma} = \limsup_{t\downarrow0} \frac{|b_t|}{t^\gamma} = \infty \text{ a.s.}$$

This may be expressed as saying

$$\infty = \limsup_{t\to\infty} \frac{t^{-1}\,|B_t|}{t^{-\gamma}} = \limsup_{t\to\infty} \frac{|B_t|}{t^{1-\gamma}} \text{ a.s.}$$

Since $\alpha := 1 - \gamma$ is any number less than $1/2$, the proof is complete.
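A single long discretized path already illustrates the first case of Eq. (26.13). The sketch below (assuming `numpy`; the path is a random walk with unit time steps, which matches $B$ in law at integer times) tracks $|B_t|/t$, the $\alpha = 1$ case, and observes it is small for large $t$.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n = 1.0, 10_000
# Discretized Brownian path: B at times dt, 2*dt, ..., n*dt via
# cumulative sums of independent N(0, dt) increments.
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
times = dt * np.arange(1, n + 1)
ratio = np.abs(B) / times  # |B_t| / t; should tend to 0 as t -> infinity
print(ratio[-1])
```

Since $B_t$ has standard deviation $\sqrt{t}$, the terminal ratio is typically of order $t^{-1/2} = 0.01$ here.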
27 Filtrations and Stopping Times

For our later development we need to go over some measure theoretic preliminaries about processes indexed by $\mathbb{R}_+ := [0, \infty)$. We will continue this discussion in more depth later. For this chapter we will always suppose that $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\in\mathbb{R}_+}\big)$ is a filtered measurable space, i.e. $\Omega$ is a set, $\mathcal{B} \subset 2^\Omega$ is a $\sigma$-algebra, and $\{\mathcal{B}_t\}_{t\in\mathbb{R}_+}$ is a filtration, which is to say each $\mathcal{B}_t$ is a sub-$\sigma$-algebra of $\mathcal{B}$ and $\mathcal{B}_s \subset \mathcal{B}_t$ for all $s \le t$.
27.1 Measurability Structures
Notation 27.1 ($\mathcal{B}_{t\pm}$) Let

$$\mathcal{B}_\infty = \mathcal{B}_{\infty+} = \bigvee_{t\in\mathbb{R}_+} \mathcal{B}_t = \sigma\Big(\bigcup_{t\in\mathbb{R}_+} \mathcal{B}_t\Big) \subset \mathcal{B},$$

and for $t \in \mathbb{R}_+$, let

$$\mathcal{B}_t^+ = \mathcal{B}_{t+} := \bigcap_{s>t} \mathcal{B}_s.$$

Also let $\mathcal{B}_{0-} := \mathcal{B}_0$ and for $t \in (0, \infty]$ let

$$\mathcal{B}_{t-} := \bigvee_{s<t} \mathcal{B}_s = \sigma\Big(\bigcup_{s<t} \mathcal{B}_s\Big).$$

(Observe that $\mathcal{B}_{\infty-} = \mathcal{B}_\infty$.)

The filtration, $\big\{\mathcal{B}_t^+\big\}_{t\in\mathbb{R}_+}$, peeks infinitesimally into the future while $\mathcal{B}_{t-}$ limits itself to knowing about the state of the system up to the times infinitesimally before time $t$.
Definition 27.2 (Right continuous filtrations). The filtration $\{\mathcal{B}_t\}_{t\ge0}$ is right continuous if $\mathcal{B}_t^+ := \mathcal{B}_{t+} = \mathcal{B}_t$ for all $t \ge 0$.

The next result is trivial but we record it as a lemma nevertheless.

Lemma 27.3 (Right continuous extension). Suppose $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\ge0}\big)$ is a filtered space and $\mathcal{B}_t^+ := \mathcal{B}_{t+} := \bigcap_{s>t} \mathcal{B}_s$. Then $\big\{\mathcal{B}_t^+\big\}_{t\ge0}$ is right continuous. (We refer to $\big\{\mathcal{B}_t^+\big\}_{t\in\mathbb{R}_+}$ as the right continuous filtration associated to $\{\mathcal{B}_t\}_{t\in\mathbb{R}_+}$.)
Exercise 27.1. Suppose $(\Omega, \mathcal{F})$ is a measurable space, $(S, \rho)$ is a separable metric space¹, and $\mathcal{S}$ is the Borel $\sigma$-algebra on $S$, i.e. the $\sigma$-algebra generated by all open subsets of $S$.

1. Let $D \subset S$ be a countable dense set and $\mathbb{Q}_+ := \mathbb{Q} \cap \mathbb{R}_+$. Show $\mathcal{S}$ may be described as the $\sigma$-algebra generated by all open (or closed) balls of the form

$$B(a, \varepsilon) := \{s \in S : \rho(s, a) < \varepsilon\} \tag{27.1}$$
$$(\text{or } C(a, \varepsilon) := \{s \in S : \rho(s, a) \le \varepsilon\}) \tag{27.2}$$

with $a \in D$ and $\varepsilon \in \mathbb{Q}_+$.
2. Show a function, $Y : \Omega \to S$, is $\mathcal{F}/\mathcal{S}$ measurable iff the functions, $\omega \mapsto \rho(x, Y(\omega)) \in \mathbb{R}_+$, are measurable for all $x \in D$. Hint: show, for each $x \in S$, that $\rho(x, \cdot) : S \to \mathbb{R}_+$ is a measurable map.
3. If $X_n : \Omega \to S$ is a sequence of $\mathcal{F}/\mathcal{S}$ measurable maps such that $X(\omega) := \lim_{n\to\infty} X_n(\omega)$ exists in $S$ for all $\omega \in \Omega$, then the limiting function, $X$, is $\mathcal{F}/\mathcal{S}$ measurable as well. (Hint: use item 2.)
Definition 27.4. Suppose $S$ is a metric space, $\mathcal{S}$ is the Borel $\sigma$-algebra on $S$, and $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\in\mathbb{R}_+}\big)$ is a filtered measurable space. A process, $X_t : \Omega \to S$ for $t \in \mathbb{R}_+$, is;

1. adapted if $X_t$ is $\mathcal{B}_t/\mathcal{S}$ measurable for all $t \in \mathbb{R}_+$,
2. right continuous if $t \mapsto X_t(\omega)$ is right continuous for all $\omega \in \Omega$,
3. left continuous if $t \mapsto X_t(\omega)$ is left continuous for all $\omega \in \Omega$, and
4. progressively measurable if, for all $T \in \mathbb{R}_+$, the map $X^T : [0, T] \times \Omega \to S$ defined by $X^T(t, \omega) := X_t(\omega)$ is $\mathcal{B}_{[0,T]} \otimes \mathcal{B}_T/\mathcal{S}$ measurable.
Lemma 27.5. Let $X(t, \omega) := X_t(\omega)$, where we are continuing the notation in Definition 27.4. If $X_t : \Omega \to S$ is a progressively measurable process, then $X : \mathbb{R}_+ \times \Omega \to S$ is $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}/\mathcal{S}$ measurable and $X$ is adapted.

Proof. For $T \in \mathbb{R}_+$, let $\iota_T : \Omega \to [0, T] \times \Omega$ be defined by $\iota_T(\omega) := (T, \omega)$. If $a \in [0, T]$ and $A \in \mathcal{B}_T$, then

$$\iota_T^{-1}([0, a] \times A) = \emptyset \in \mathcal{B}_T \text{ if } a \ne T$$

and $\iota_T^{-1}([0, a] \times A) = A \in \mathcal{B}_T$ if $a = T$. This shows $\iota_T$ is $\mathcal{B}_T/\mathcal{B}_{[0,T]} \otimes \mathcal{B}_T$ measurable. Therefore, the composition, $X^T \circ \iota_T = X_T$, is $\mathcal{B}_T/\mathcal{S}$ measurable for all $T \in \mathbb{R}_+$, which is the statement that $X$ is adapted.

For $V \in \mathcal{S}$ and $T < \infty$, we have

$$X^{-1}(V) \cap ([0, T] \times \Omega) = \big(X^T\big)^{-1}(V) \in \mathcal{B}_{[0,T]} \otimes \mathcal{B}_T \subset \mathcal{B}_{[0,T]} \otimes \mathcal{B}. \tag{27.3}$$

Since $\mathcal{B}_{[0,T]} \otimes \mathcal{B}$ and $\big(\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}\big)_{[0,T]\times\Omega}$ are $\sigma$-algebras which are generated by sets of the form $[0, a] \times A$ with $a \in [0, T]$ and $A \in \mathcal{B}$, they are equal: $\mathcal{B}_{[0,T]} \otimes \mathcal{B} = \big(\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}\big)_{[0,T]\times\Omega}$. This observation along with Eq. (27.3) then implies,

$$X^{-1}(V) \cap ([0, T] \times \Omega) \in \big(\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}\big)_{[0,T]\times\Omega} \subset \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}$$

and therefore,

$$X^{-1}(V) = \bigcup_{T\in\mathbb{N}} \big[X^{-1}(V) \cap ([0, T] \times \Omega)\big] \in \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}.$$

This shows $X$ is $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}/\mathcal{S}$ measurable as claimed.

¹ If you are uncomfortable with this much generality, you may assume $S$ is a subset of $\mathbb{R}^d$ and $\rho(x, y) := |x - y|$ for all $x, y \in S$.
Lemma 27.6. Suppose $S$ is a separable metric space, $\mathcal{S}$ is the Borel $\sigma$-algebra on $S$, $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\in\mathbb{R}_+}\big)$ is a filtered measurable space, and $X_t : \Omega \to S$ for $t \in \mathbb{R}_+$ is an adapted right continuous process. Then $X$ is progressively measurable and the map, $X : \mathbb{R}_+ \times \Omega \to S$ defined by $X(t, \omega) = X_t(\omega)$, is $\big(\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}\big)/\mathcal{S}$ measurable.

Proof. Let $T \in \mathbb{R}_+$. To each $n \in \mathbb{N}$ let $X^n(0, \omega) := X_0(\omega)$ and

$$X^n(t, \omega) := X_{kT2^{-n}}(\omega) \text{ if } \frac{(k-1)T}{2^n} < t \le \frac{kT}{2^n} \text{ for } k \in \{1, 2, \dots, 2^n\}.$$

Then, for $A \in \mathcal{S}$,

$$\big(X^n\big)^{-1}(A) = \big(\{0\} \times X_0^{-1}(A)\big) \cup \bigcup_{k=1}^{2^n} \Big[\Big(\frac{(k-1)T}{2^n}, \frac{kT}{2^n}\Big] \times X_{Tk2^{-n}}^{-1}(A)\Big] \in \mathcal{B}_{[0,T]} \otimes \mathcal{B}_T,$$

showing that $X^n$ is $\big(\mathcal{B}_{[0,T]} \otimes \mathcal{B}_T\big)/\mathcal{S}$ measurable. By the right continuity of $X$, $X^T = \lim_{n\to\infty} X^n$ pointwise, and therefore, by Exercise 27.1, $X^T$ is also $\big(\mathcal{B}_{[0,T]} \otimes \mathcal{B}_T\big)/\mathcal{S}$ measurable. The fact that $X$ is $\big(\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}\big)/\mathcal{S}$ measurable now follows from Lemma 27.5.

Lemma 27.7. Suppose that $T \in (0, \infty)$, $\Omega_T := C([0, T], \mathbb{R})$, and $\mathcal{F}_T := \sigma\big(B_t^T : t \le T\big)$, where $B_t^T(\omega) := \omega(t)$ for all $t \in [0, T]$ and $\omega \in \Omega_T$. Then;

1. The map, $\pi : \Omega \to \Omega_T$ defined by $\pi(\omega) := \omega|_{[0,T]}$, is $\mathcal{B}_T/\mathcal{F}_T$ measurable.
2. A function, $F : \Omega \to \mathbb{R}$, is $\mathcal{B}_T$ measurable iff there exists a function, $f : \Omega_T \to \mathbb{R}$, which is $\mathcal{F}_T$ measurable such that $F = f \circ \pi$.
3. Let $\|\omega\|_T := \max_{t\in[0,T]} |\omega(t)|$, so that $(\Omega_T, \|\cdot\|_T)$ is a Banach space. The Borel $\sigma$-algebra, $\mathcal{B}_{\Omega_T}$, on $\Omega_T$ is the same as $\mathcal{F}_T$.
4. If $F = f \circ \pi$ where $f : \Omega_T \to \mathbb{R}$ is a $\|\cdot\|_T$ continuous function, then $F$ is $\mathcal{B}_T$ measurable.

Proof. 1. Since $B_t^T \circ \pi = B_t$ is $\mathcal{B}_T$ measurable for all $t \in [0, T]$, it follows that $\pi$ is measurable.

2. Clearly if $f : \Omega_T \to \mathbb{R}$ is $\mathcal{F}_T$ measurable, then $F = f \circ \pi : \Omega \to \mathbb{R}$ is $\mathcal{B}_T$ measurable. For the converse assertion, let $\mathbb{H}$ denote the bounded $\mathcal{B}_T$ measurable functions of the form $F = f \circ \pi$ with $f : \Omega_T \to \mathbb{R}$ being $\mathcal{F}_T$ measurable. It is a simple matter to check that $\mathbb{H}$ is a vector space which is closed under bounded convergence and contains all cylinder functions of the form, $G(B_{t_1}, \dots, B_{t_n}) = G\big(B_{t_1}^T, \dots, B_{t_n}^T\big) \circ \pi$ with $\{t_i\}_{i=1}^n \subset [0, T]$. The latter set of functions generates the $\sigma$-algebra, $\mathcal{B}_T$, and so by the multiplicative systems theorem, $\mathbb{H}$ contains all bounded $\mathcal{B}_T$ measurable functions. For a general $\mathcal{B}_T$ measurable function, $F : \Omega \to \mathbb{R}$, the truncation by $N \in \mathbb{N}$, $F_N := (-N) \vee (F \wedge N)$, is of the form $F_N = f_N \circ \pi$ for some $\mathcal{F}_T$ measurable function, $f_N : \Omega_T \to \mathbb{R}$. Since every $\bar{\omega} \in \Omega_T$ extends to an element $\omega$ of $\Omega$, it follows that $\lim_{N\to\infty} f_N(\bar{\omega}) = \lim_{N\to\infty} F_N(\omega) = F(\omega)$ exists. Hence if we let $f := \lim_{N\to\infty} f_N$, we will have $F = f \circ \pi$ with $f$ being an $\mathcal{F}_T$ measurable function.

3. Recall that $\mathcal{B}_{\Omega_T} = \sigma(\text{open sets})$. Since $B_s^T : \Omega_T \to \mathbb{R}$ is continuous for all $s$, it follows that $\sigma\big(B_s^T\big) \subset \mathcal{B}_{\Omega_T}$ for all $s$ and hence $\mathcal{F}_T \subset \mathcal{B}_{\Omega_T}$. Conversely, since

$$\|\omega\|_T = \sup_{t\in\mathbb{Q}\cap[0,T]} |\omega(t)| = \sup_{t\in\mathbb{Q}\cap[0,T]} \big|B_t^T(\omega)\big|,$$

it follows that $\omega \mapsto \|\omega - \omega_0\|_T = \sup_{t\in\mathbb{Q}\cap[0,T]} \big|B_t^T(\omega) - \omega_0(t)\big|$ is $\mathcal{F}_T$ measurable for every $\omega_0 \in \Omega_T$. From this we conclude that each open ball, $B(\omega_0, r) := \{\omega \in \Omega_T : \|\omega - \omega_0\|_T < r\}$, is in $\mathcal{F}_T$. By the classical Weierstrass approximation theorem we know that $\Omega_T$ is separable and hence we may now conclude that $\mathcal{F}_T$ contains all open subsets of $\Omega_T$. This shows that $\mathcal{B}_{\Omega_T} = \sigma(\text{open sets}) \subset \mathcal{F}_T$.

4. Any continuous function, $f : \Omega_T \to \mathbb{R}$, is $\mathcal{B}_{\Omega_T} = \mathcal{F}_T$ measurable and therefore, $F = f \circ \pi$ is $\mathcal{B}_T$ measurable since it is the composition of two measurable functions.
27.2 Stopping and optional times
Definition 27.8. A random time $T : \Omega \to [0, \infty]$ is a stopping time iff $\{T \le t\} \in \mathcal{B}_t$ for all $t \ge 0$ and is an optional time iff $\{T < t\} \in \mathcal{B}_t$ for all $t \ge 0$.
If $T$ is an optional time, the condition $\{T < 0\} = \emptyset \in \mathcal{B}_{0+}$ is vacuous. Moreover, since $\{T < t\} \downarrow \{T = 0\}$ as $t \downarrow 0$, it follows that $\{T = 0\} \in \mathcal{B}_{0+}$ when $T$ is an optional time.
Proposition 27.9. Suppose $T : \Omega \to [0, \infty]$ is a random time. Then;

1. If $T(\omega) = s$ with $s \ge 0$ for all $\omega$, then $T$ is a stopping time.
2. Every stopping time is optional.
3. $T$ is a $\{\mathcal{B}_t\}$ optional time iff $T$ is a $\big\{\mathcal{B}_t^+\big\}$ stopping time. In particular, if $\{\mathcal{B}_t\}$ is right continuous, i.e. $\mathcal{B}_t^+ = \mathcal{B}_t$ for all $t$, then the notions of optional time and stopping time are the same.

Proof. 1.

$$\{T \le t\} = \begin{cases} \emptyset & \text{if } t < s \\ \Omega & \text{if } t \ge s \end{cases}$$

which shows $\{T \le t\}$ is in any $\sigma$-algebra on $\Omega$.

2. If $T$ is a stopping time, $t > 0$, and $t_n \in (0, t)$ with $t_n \uparrow t$, then

$$\{T < t\} = \bigcup_n \{T \le t_n\} \in \mathcal{B}_{t-} \subset \mathcal{B}_t.$$

This shows $T$ is an optional time.

3. If $T$ is $\{\mathcal{B}_t\}$ optional and $t \ge 0$, choose $t_n > t$ such that $t_n \downarrow t$. Then $\{T < t_n\} \downarrow \{T \le t\}$, which implies $\{T \le t\} \in \mathcal{B}_{t+} = \mathcal{B}_t^+$. Conversely, if $T$ is a $\{\mathcal{B}_{t+}\}$ stopping time, $t > 0$, and $t_n \in (0, t)$ with $t_n \uparrow t$, then $\{T \le t_n\} \in \mathcal{B}_{t_n+} \subset \mathcal{B}_t$ for all $n$ and therefore,

$$\{T < t\} = \bigcup_{n=1}^\infty \{T \le t_n\} \in \mathcal{B}_t.$$
Exercise 27.2. Suppose, for all $t \in \mathbb{R}_+$, that $X_t : \Omega \to \mathbb{R}$ is a function. Let $\mathcal{B}_t := \mathcal{B}_t^X := \sigma(X_s : s \le t)$ and $\mathcal{B} = \mathcal{B}_\infty := \bigvee_{0\le t<\infty} \mathcal{B}_t$. (Recall that the general element, $A \in \mathcal{B}_t$, is of the form $A = X_\Lambda^{-1}\big(\tilde{A}\big)$, where $\Lambda$ is a countable subset of $[0, t]$, $\tilde{A} \subset \mathbb{R}^\Lambda$ is a measurable set relative to the product $\sigma$-algebra on $\mathbb{R}^\Lambda$, and $X_\Lambda : \Omega \to \mathbb{R}^\Lambda$ is defined by $X_\Lambda(\omega)(s) = X_s(\omega)$ for all $s \in \Lambda$.) If $T$ is a stopping time and $\omega, \omega' \in \Omega$ satisfy $X_t(\omega) = X_t(\omega')$ for all $t \in [0, T(\omega)] \cap \mathbb{R}$, then show $T(\omega) = T(\omega')$.
Definition 27.10. Given a process, $X_t : \Omega \to S$, and $A \subset S$, let

$$T_A(\omega) := \inf\{t > 0 : X_t(\omega) \in A\} \text{ and}$$
$$D_A(\omega) := \inf\{t \ge 0 : X_t(\omega) \in A\}$$

be the first hitting time and Debut (first entrance time) of $A$. As usual, the infimum of the empty set is taken to be infinity.

Clearly, $D_A \le T_A$, and if $D_A(\omega) > 0$ or, more generally, if $X_0(\omega) \notin A$, then $T_A(\omega) = D_A(\omega)$. Hence we will have $D_A = T_A$ iff $T_A(\omega) = 0$ whenever $X_0(\omega) \in A$.

In the sequel we will typically assume that $(S, \rho)$ is a metric space and $\mathcal{S}$ is the Borel $\sigma$-algebra on $S$. We will also typically assume (or arrange) for our processes to have right continuous sample paths. If $A$ is an open subset of $S$ and $t \mapsto X_t(\omega)$ is right continuous, then $T_A = D_A$. Indeed, if $X_0(\omega) \in A$, then by the right continuity of $X_\cdot(\omega)$, we know that $\lim_{t\downarrow0} X_t(\omega) = X_0(\omega) \in A$ and hence $X_t(\omega) \in A$ for all $t > 0$ sufficiently close to $0$; therefore, $T_A(\omega) = 0$. On the other hand, if $A$ is a closed set and $X_0(\omega) \in \operatorname{bd}(A)$, there is no need for $T_A(\omega) = 0$, and hence in this case, typically $D_A \lneq T_A$.
Proposition 27.11. Suppose $\big(\Omega, \{\mathcal{B}_t\}_{t\ge0}, \mathcal{B}\big)$ is a filtered measurable space, $(S, \rho)$ is a metric space, and $X_t : \Omega \to S$ is a right continuous $\{\mathcal{B}_t\}_{t\ge0}$ adapted process. Then;

1. If $A \subset S$ is an open set, $T_A = D_A$ is an optional time.
2. If $A \subset S$ is closed, then on $\{T_A < \infty\}$ (on $\{D_A < \infty\}$), $X_{T_A} \in A$ ($X_{D_A} \in A$).
3. If $A \subset S$ is closed and $X$ is a continuous process, then $D_A$ is a stopping time.
4. If $A \subset S$ is closed and $X$ is a continuous process, then $T_A$ is an optional time. In fact, $\{T_A \le t\} \in \mathcal{B}_t$ for all $t > 0$ while $\{T_A = 0\} \in \mathcal{B}_{0+}$, see Figure 27.1.

Fig. 27.1. A sample point, $\omega$, where $T_A(\omega) = 0$ with $A = \{a\} \subset \mathbb{R}$.
Proof. 1. By definition, $D_A(\omega) < t$ iff $X_s(\omega) \in A$ for some $s < t$, which by right continuity of $X$ (and $A$ being open) happens iff $X_s(\omega) \in A$ for some $s < t$ with $s \in \mathbb{Q}$. Therefore,

$$\{D_A < t\} = \bigcup_{\mathbb{Q}\ni s<t} X_s^{-1}(A) \in \mathcal{B}_{t-} \subset \mathcal{B}_t.$$

2. If $A \subset S$ is closed and $T_A(\omega) < \infty$ (or $D_A(\omega) < \infty$), there exist $t_n > 0$ such that $X_{t_n} \in A$ and $t_n \downarrow T_A(\omega)$. Since $X$ is right continuous and $A$ is closed, $X_{t_n} \to X_{T_A(\omega)} \in A$ (respectively $X_{t_n} \to X_{D_A(\omega)} \in A$).

For the rest of the argument we will now assume that $X$ is a continuous process and $A$ is a closed subset of $S$.

3. Observe that $D_A(\omega) > t$ iff $X_{[0,t]}(\omega) \cap A = \emptyset$. Since $X$ is continuous, $X_{[0,t]}(\omega)$ is a compact subset of $S$ and therefore $\varepsilon := \rho\big(X_{[0,t]}(\omega), A\big) > 0$, where

$$\rho(A, B) := \inf\{\rho(a, b) : a \in A \text{ and } b \in B\}.$$

Hence we have shown,

$$\{D_A > t\} = \bigcup_{n=1}^\infty \big\{\omega : \rho\big(X_{[0,t]}(\omega), A\big) \ge 1/n\big\} = \bigcup_{n=1}^\infty \bigcap_{s\in\mathbb{Q}\cap[0,t]} \{\rho(X_s, A) \ge 1/n\} \in \mathcal{B}_t,$$

wherein we have used $\rho(\cdot, A) : S \to \mathbb{R}_+$ is continuous and hence measurable. As $\{D_A \le t\} = \{D_A > t\}^c \in \mathcal{B}_t$ for all $t$, we have shown $D_A$ is a stopping time.

4. Suppose $t > 0$. Then $T_A(\omega) > t$ iff $X_{(0,t]}(\omega) \cap A = \emptyset$, which happens iff for all $\varepsilon \in (0, t)$ we have $X_{[\varepsilon,t]}(\omega) \cap A = \emptyset$, or equivalently iff for all $\varepsilon \in (0, t)$, $\rho\big(X_{[\varepsilon,t]}(\omega), A\big) > 0$. Using these observations we find,

$$\{T_A > t\} = \bigcap_{n>1/t} \bigcup_{m=1}^\infty \big\{\omega : \rho\big(X_{[1/n,t]}(\omega), A\big) \ge 1/m\big\} = \bigcap_{n>1/t} \bigcup_{m=1}^\infty \bigcap_{s\in\mathbb{Q}\cap[1/n,t]} \{\rho(X_s, A) \ge 1/m\} \in \mathcal{B}_t.$$

This shows $\{T_A \le t\} = \{T_A > t\}^c \in \mathcal{B}_t$ for all $t > 0$. Since, for $t > 0$, $\{T_A < t\} = \bigcup_{s\in\mathbb{Q}\cap(0,t)} \{T_A \le s\} \in \mathcal{B}_t$, we see that $T_A$ is an optional time.
The only thing keeping $T_A$ from being a stopping time in item 4. above is the fact that $\{T_A = 0\} \in \mathcal{B}_{0+}$ rather than $\{T_A = 0\} \in \mathcal{B}_0$. It should be clear that, in general, $\{T_A = 0\} \notin \mathcal{B}_0$, for $\{T_A = 0\} \in \mathcal{B}_0$ iff $1_{\{T_A=0\}} = f(X_0)$ for some measurable function, $f : S \to \{0, 1\} \subset \mathbb{R}$. But it is clearly impossible to determine whether $T_A = 0$ by only observing $X_0$.
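For a discretized path, the Debut time of a closed set can be computed by a first-index scan. The following is a rough sketch (assuming `numpy`; the set is $A = [\text{level}, \infty)$ and the computed time is only accurate up to the step size, so it illustrates rather than implements Definition 27.10):

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n, level = 1e-3, 50_000, 1.0
# Discretized Brownian path starting at B_0 = 0.
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
times = dt * np.arange(n + 1)

# D_A = inf{t >= 0 : B_t in A} for the closed set A = [level, infinity);
# the infimum of the empty set is infinity, matching Definition 27.10.
hits = np.nonzero(B >= level)[0]
D_A = float(times[hits[0]]) if hits.size else np.inf
print(D_A)
```

Since the discretized path starts at $0 \notin A$, the same scan also computes $T_A$, consistent with the remark that $T_A = D_A$ when $X_0 \notin A$.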
Notation 27.12 If $\tau : \Omega \to [0, \infty]$ is a random $\mathcal{B}_\infty$ measurable time, let

$$\mathcal{B}_\tau := \{A \in \mathcal{B}_\infty : A \cap \{\tau \le t\} \in \mathcal{B}_t \text{ for all } t \in [0, \infty]\} \tag{27.4}$$

and

$$\mathcal{B}_{\tau+} := \{A \in \mathcal{B}_\infty : A \cap \{\tau < t\} \in \mathcal{B}_t \text{ for all } t\}.$$
Exercise 27.3. If $\tau$ is a stopping time, then $\mathcal{B}_\tau$ is a sub-$\sigma$-algebra of $\mathcal{B}_\infty$, and if $\tau$ is an optional time, then $\mathcal{B}_{\tau+}$ is a sub-$\sigma$-algebra of $\mathcal{B}_\infty$.

Exercise 27.4. Suppose $\tau : \Omega \to [0, \infty]$ is the constant function, $\tau = s$. Show $\mathcal{B}_\tau = \mathcal{B}_s$ and $\mathcal{B}_{\tau+} = \bigcap_{t>s} \mathcal{B}_t =: \mathcal{B}_{s+}$, so that the notation introduced in Notation 27.12 is consistent with the previous meanings of $\mathcal{B}_s$ and $\mathcal{B}_{s+}$.

Exercise 27.5. Suppose that $\tau$ is an optional time and let

$$\mathcal{B}_\tau^+ := \big\{A \in \mathcal{B}_\infty : A \cap \{\tau \le t\} \in \mathcal{B}_{t+} = \mathcal{B}_t^+ \text{ for all } t\big\}.$$

Show $\mathcal{B}_{\tau+} = \mathcal{B}_\tau^+$. Hence $\mathcal{B}_{\tau+}$ is precisely the stopped $\sigma$-algebra of the stopping time, $\tau$, relative to the filtration $\big\{\mathcal{B}_t^+\big\}$.
Lemma 27.13. Suppose $T : \Omega \to [0, \infty]$ is a random time.

1. If $T$ is a $\{\mathcal{B}_t\}$ stopping time, then $T$ is $\mathcal{B}_T$ measurable.
2. If $T$ is a $\{\mathcal{B}_t\}$ optional time, then $T$ is $\mathcal{B}_{T+} = \mathcal{B}_T^+$ measurable.

Proof. Because of Exercise 27.5, it suffices to prove the first assertion. For all $s, t \in \mathbb{R}_+$, we have

$$\{T \le t\} \cap \{T \le s\} = \{T \le s \wedge t\} \in \mathcal{B}_{s\wedge t} \subset \mathcal{B}_s.$$

This shows $\{T \le t\} \in \mathcal{B}_T$ for all $t \in \mathbb{R}_+$ and therefore that $T$ is $\mathcal{B}_T$ measurable.
Lemma 27.14. If $\tau$ is a $\{\mathcal{B}_t\}$ stopping time and $X_t : \Omega \to S$ is a $\{\mathcal{B}_t\}$ progressively measurable process, then $X_\tau$, defined on $\{\tau < \infty\}$, is $(\mathcal{B}_\tau)_{\{\tau<\infty\}}/\mathcal{S}$ measurable. Similarly, if $\tau$ is a $\{\mathcal{B}_t\}$ optional time and $X_t : \Omega \to S$ is a $\big\{\mathcal{B}_t^+\big\}$ progressively measurable process, then $X_\tau$, defined on $\{\tau < \infty\}$, is $(\mathcal{B}_{\tau+})_{\{\tau<\infty\}}/\mathcal{S}$ measurable.

Proof. In view of Proposition 27.9 and Exercise 27.5, it suffices to prove the first assertion. For $T \in \mathbb{R}_+$, let $\varphi_T : \{\tau \le T\} \to [0, T] \times \Omega$ be defined by $\varphi_T(\omega) := (\tau(\omega), \omega)$ and let $X^T : [0, T] \times \Omega \to S$ be defined by $X^T(t, \omega) := X_t(\omega)$. By definition, $X^T$ is $\mathcal{B}_{[0,T]} \otimes \mathcal{B}_T/\mathcal{S}$ measurable. Since, for all $A \in \mathcal{B}_T$ and $a \in [0, T]$,

$$\varphi_T^{-1}([0, a] \times A) = \{\tau \le a\} \cap A \in (\mathcal{B}_T)_{\{\tau\le T\}},$$

it follows that $\varphi_T$ is $(\mathcal{B}_T)_{\{\tau\le T\}}/\mathcal{B}_{[0,T]} \otimes \mathcal{B}_T$ measurable and therefore, $X^T \circ \varphi_T : \{\tau \le T\} \to S$ is $(\mathcal{B}_T)_{\{\tau\le T\}}/\mathcal{S}$ measurable.

For $A \in \mathcal{S}$ and $T \in \mathbb{R}_+$,

$$X_\tau^{-1}(A) \cap \{\tau \le T\} = \big\{\omega : \tau(\omega) \le T \text{ and } X_{\tau(\omega)}(\omega) \in A\big\} = \big\{\omega : \tau(\omega) \le T \text{ and } X^T \circ \varphi_T(\omega) \in A\big\} = \{\tau \le T\} \cap \big(X^T \circ \varphi_T\big)^{-1}(A) \in (\mathcal{B}_T)_{\{\tau\le T\}} \subset \mathcal{B}_T.$$

As this is true for arbitrary $T \in \mathbb{R}_+$, we conclude that $X_\tau^{-1}(A) \in \mathcal{B}_\tau$ and since, by definition, $X_\tau^{-1}(A) \subset \{\tau < \infty\}$, it follows that $X_\tau^{-1}(A) \in (\mathcal{B}_\tau)_{\{\tau<\infty\}}$. This completes the proof since $A \in \mathcal{S}$ was arbitrary.
Lemma 27.15 (Properties of Optional/Stopping times). Let $T$ and $S$ be optional times and $\varepsilon > 0$. Then;

1. $T + \varepsilon$ is a stopping time.
2. $T + S$ is an optional time.
3. If $T > 0$ and $T$ is a stopping time, then $T + S$ is again a stopping time.
4. If $T > 0$ and $S > 0$, then $T + S$ is a stopping time.
5. If we further assume that $S$ and $T$ are stopping times, then $T \vee S$, $T \wedge S$, and $T + S$ are stopping times.
6. If $\{T_n\}_{n=1}^\infty$ are optional times, then

$$\sup_{n\ge1} T_n,\quad \inf_{n\ge1} T_n,\quad \liminf_{n\to\infty} T_n,\quad \text{and}\quad \limsup_{n\to\infty} T_n$$

are all optional times. If $\{T_n\}_{n=1}^\infty$ are stopping times, then $\sup_{n\ge1} T_n$ is a stopping time.

Proof. 1. This follows from the observation that

$$\{T + \varepsilon \le t\} = \{T \le t - \varepsilon\} \in \mathcal{B}_{(t-\varepsilon)+} \subset \mathcal{B}_t.$$

Notice that if $t < \varepsilon$, then $\{T + \varepsilon \le t\} = \emptyset \in \mathcal{B}_0$.

2.–4. For item 2., if $\tau > 0$, then

$$\{T + S < \tau\} = \bigcup\big\{\{T < t\} \cap \{S < s\} : s, t \in \mathbb{Q} \cap (0, \tau] \text{ with } s + t < \tau\big\} \in \mathcal{B}_\tau,$$

and if $\tau = 0$, then $\{T + S < 0\} = \emptyset \in \mathcal{B}_0$. If $T > 0$ and $T$ is a stopping time and $\tau > 0$, then

$$\{T + S \le \tau\} = \{S = 0, T \le \tau\} \cup \{0 < S, S + T \le \tau\}$$

and $\{S = 0, T \le \tau\} \in \mathcal{B}_\tau$. Hence it suffices to show $\{0 < S, S + T \le \tau\} \in \mathcal{B}_\tau$. To this end, observe that $0 < S$ and $S + T \le \tau$ happens iff there exists $m \in \mathbb{N}$ such that for all $n \ge m$, there exists $r = r_n \in \mathbb{Q}$ such that

$$0 < r_n < S < r_n + 1/n < \tau \text{ and } T \le \tau - r_n.$$

Indeed, if the latter condition holds, then $S + T \le (r_n + 1/n) + (\tau - r_n) = \tau + 1/n$ for all $n$ and therefore $S + T \le \tau$. Thus we have shown

$$\{0 < S, S + T \le \tau\} = \bigcup_{m\in\mathbb{N}} \bigcap_{n\ge m} \bigcup\big\{\{r < S < r + 1/n\} \cap \{T \le \tau - r\} : r \in \mathbb{Q} \text{ with } 0 < r < r + 1/n < \tau\big\},$$

which is in $\mathcal{B}_\tau$. In showing $\{0 < S, S + T \le \tau\} \in \mathcal{B}_\tau$ we only needed $S$ and $T$ to be optional times, and so if $S > 0$ and $T > 0$, then

$$\{T + S \le \tau\} = \{0 < S, S + T \le \tau\} \in \mathcal{B}_\tau.$$

5. If $T$ and $S$ are stopping times and $\tau \ge 0$, then

$$\{T \vee S \le \tau\} = \{T \le \tau\} \cap \{S \le \tau\} \in \mathcal{B}_\tau,$$
$$\{T \wedge S \le \tau\} = \{T \le \tau\} \cup \{S \le \tau\} \in \mathcal{B}_\tau,$$

and

$$\{S + T > \tau\} = \{T = 0, S > \tau\} \cup \{0 < T < \tau, T + S > \tau\} \cup \{S = 0, T > \tau\} \cup \{S > 0, T \ge \tau\}.$$

The first, third, and fourth events are easily seen to be in $\mathcal{B}_\tau$. As for the second event,

$$\{0 < T < \tau, T + S > \tau\} = \bigcup\big\{\{r < T < \tau\} \cap \{S > \tau - r\} : r \in \mathbb{Q} \text{ with } 0 < r < \tau\big\} \in \mathcal{B}_\tau.$$

Hence $\{S + T \le \tau\} = \{S + T > \tau\}^c \in \mathcal{B}_\tau$, showing $T + S$ is a stopping time.

6. We have

$$\Big\{\sup_n T_n \le t\Big\} = \bigcap_{n=1}^\infty \{T_n \le t\}, \quad \Big\{\inf_n T_n < t\Big\} = \bigcup_{n=1}^\infty \{T_n < t\},$$

which shows that $\sup_n T_n$ is a stopping time if each $T_n$ is a stopping time and that $\inf_n T_n$ is optional if each $T_n$ is optional. Moreover, if each $T_n$ is optional, then $T_n$ is a $\{\mathcal{B}_{t+}\}$ stopping time, hence $\sup_n T_n$ is a $\{\mathcal{B}_{t+}\}$ stopping time, and hence $\sup_n T_n$ is a $\{\mathcal{B}_t\}$ optional time, wherein we have used Proposition 27.9 twice. Since $\liminf_n T_n$ and $\limsup_n T_n$ are built from countable sups and infs of optional times, they are optional as well.
Lemma 27.16 (Stopped $\sigma$-algebras). Suppose $\sigma$ and $\tau$ are stopping times.

1. $\mathcal{B}_\tau = \mathcal{B}_t$ on $\{\tau = t\}$.
2. If $t \in [0, \infty]$, then $\tau \wedge t$ is $\mathcal{B}_t$ measurable.
3. If $\sigma \le \tau$, then $\mathcal{B}_\sigma \subset \mathcal{B}_\tau$.
4. $(\mathcal{B}_\sigma)_{\{\sigma\le\tau\}} \subset \mathcal{B}_\tau$, and in particular $\{\sigma \le \tau\}$, $\{\tau < \sigma\}$, $\{\sigma < \tau\}$, and $\{\tau \le \sigma\}$ are all in $\mathcal{B}_\tau$.
5. $(\mathcal{B}_\sigma)_{\{\sigma<\tau\}} \subset \mathcal{B}_\tau$.
6. $\mathcal{B}_{\sigma\wedge\tau} = \mathcal{B}_\sigma \cap \mathcal{B}_\tau$.
7. If $\mathcal{U} \subset [0, \infty]$ is a countable set and $\tau : \Omega \to \mathcal{U}$ is a function, then $\tau$ is a stopping time iff $\{\tau = t\} \in \mathcal{B}_t$ for all $t \in \mathcal{U}$.
8. If the range of $\tau$ is a countable subset, $\mathcal{U} \subset [0, \infty]$, then $A \in \mathcal{B}_\infty$ is in $\mathcal{B}_\tau$ iff $A \cap \{\tau = t\} \in \mathcal{B}_t$ for all $t \in \mathcal{U}$.
9. If the range of $\tau$ is a countable subset, $\mathcal{U} \subset [0, \infty]$, then a function $f : \Omega \to \mathbb{R}$ is $\mathcal{B}_\tau$ measurable iff $1_{\{\tau=t\}} f$ is $\mathcal{B}_t$ measurable for all $t \in \mathcal{U}$.

Moreover, all of the above results hold if $\sigma$ and $\tau$ are optional times provided every occurrence of $\mathcal{B}$ is replaced by $\mathcal{B}^+$.

Proof. Recall from Definition 14.21 that if $\mathcal{G}$ is a $\sigma$-algebra on $\Omega$ and $A \subset \Omega$, then $\mathcal{G}_A := \{B \cap A : B \in \mathcal{G}\}$ is a sub-$\sigma$-algebra of $2^A$. Moreover, if $\mathcal{G}$ and $\mathcal{F}$ are two $\sigma$-algebras on $\Omega$ and $A \in \mathcal{G} \cap \mathcal{F}$, then (by definition) $\mathcal{G} = \mathcal{F}$ on $A$ iff $\mathcal{G}_A = \mathcal{F}_A$.

1. If $A \in \mathcal{B}_\tau$, then

$$A \cap \{\tau = t\} = [A \cap \{\tau \le t\}] \cap \{\tau < t\}^c \in \mathcal{B}_t.$$

Conversely, if $A \in \mathcal{B}_t$ and $s \in \mathbb{R}_+$,

$$[A \cap \{\tau = t\}] \cap \{\tau \le s\} = \begin{cases} \emptyset & \text{if } s < t \\ A \cap \{\tau = t\} & \text{if } s \ge t, \end{cases}$$

from which it follows that $A \cap \{\tau = t\} \in \mathcal{B}_\tau$.

2. To see $\tau \wedge t$ is $\mathcal{B}_t$ measurable simply observe that

$$\{\tau \wedge t \le s\} = \begin{cases} \Omega \in \mathcal{B}_t & \text{if } t \le s \\ \{\tau \le s\} \in \mathcal{B}_s \subset \mathcal{B}_t & \text{if } t > s \end{cases}$$

and hence $\{\tau \wedge t \le s\} \in \mathcal{B}_t$ for all $s \in [0, \infty]$.

3. If $A \in \mathcal{B}_\sigma$ and $\sigma \le \tau$, then

$$A \cap \{\tau \le t\} = [A \cap \{\sigma \le t\}] \cap \{\tau \le t\} \in \mathcal{B}_t$$

for all $t$ and therefore $A \in \mathcal{B}_\tau$.

4. If $A \in \mathcal{B}_\sigma$ then $A \cap \{\sigma \le \tau\}$ is the generic element of $(\mathcal{B}_\sigma)_{\{\sigma\le\tau\}}$. We now have

$$(A \cap \{\sigma \le \tau\}) \cap \{\tau \le t\} = (A \cap \{\sigma \le \tau \wedge t\}) \cap \{\tau \le t\} = (A \cap \{\sigma \le t\}) \cap \{\sigma \wedge t \le \tau \wedge t\} \cap \{\tau \le t\} \in \mathcal{B}_t,$$

since $A \cap \{\sigma \le t\} \in \mathcal{B}_t$ and $\sigma \wedge t$ and $\tau \wedge t$ are $\mathcal{B}_t$ measurable, so that $\{\sigma \wedge t \le \tau \wedge t\} \in \mathcal{B}_t$. Since $\Omega \in \mathcal{B}_\sigma$, it follows from what we have just proved that $\{\sigma \le \tau\} \in \mathcal{B}_\tau$ and hence also $\{\tau < \sigma\} = \{\sigma \le \tau\}^c \in \mathcal{B}_\tau$. Taking $A = \Omega$ in item 5. below gives $\{\sigma < \tau\} \in \mathcal{B}_\tau$ and hence $\{\tau \le \sigma\} = \{\sigma < \tau\}^c \in \mathcal{B}_\tau$ as well.

5. If $A \in \mathcal{B}_\sigma$, then for all $t$,

$$[A \cap \{\sigma < \tau\}] \cap \{\tau \le t\} = \bigcup_{q\in\mathbb{Q}\cap[0,t]} [A \cap \{\sigma \le q\}] \cap \{q < \tau \le t\} \in \mathcal{B}_t,$$

showing $A \cap \{\sigma < \tau\} \in \mathcal{B}_\tau$.

6. Since $\sigma \wedge \tau$ is a stopping time which is no larger than either $\sigma$ or $\tau$, it follows from item 3. that $\mathcal{B}_{\sigma\wedge\tau} \subset \mathcal{B}_\sigma \cap \mathcal{B}_\tau$. Conversely, if $A \in \mathcal{B}_\sigma \cap \mathcal{B}_\tau$, then

$$A \cap \{\sigma \wedge \tau \le t\} = A \cap [\{\sigma \le t\} \cup \{\tau \le t\}] = [A \cap \{\sigma \le t\}] \cup [A \cap \{\tau \le t\}] \in \mathcal{B}_t$$

for all $t$. From this it follows that $A \in \mathcal{B}_{\sigma\wedge\tau}$.

7. If $\tau$ is a stopping time and $t \in \mathcal{U}$, then $\{\tau = t\} = \{\tau \le t\} \setminus \big[\bigcup_{\mathcal{U}\ni s<t} \{\tau \le s\}\big] \in \mathcal{B}_t$. Conversely, if $\{\tau = t\} \in \mathcal{B}_t$ for all $t \in \mathcal{U}$ and $s \in \mathbb{R}_+$, then

$$\{\tau \le s\} = \bigcup_{\mathcal{U}\ni t\le s} \{\tau = t\} \in \mathcal{B}_s,$$

showing $\tau$ is a stopping time.

8. If $A \cap \{\tau = t\} \in \mathcal{B}_t$ for all $t \in \mathcal{U}$, then for any $s$,

$$A \cap \{\tau \le s\} = \bigcup_{\mathcal{U}\ni t\le s} [A \cap \{\tau = t\}] \in \mathcal{B}_s,$$

which shows $A \in \mathcal{B}_\tau$. Conversely, if $A \in \mathcal{B}_\tau$ and $t \in \mathcal{U}$, then

$$A \cap \{\tau = t\} = [A \cap \{\tau \le t\}] \setminus \Big[\bigcup_{\mathcal{U}\ni s<t} (A \cap \{\tau \le s\})\Big] \in \mathcal{B}_t.$$

9. If $f : \Omega \to \mathbb{R}$ is $\mathcal{B}_\tau$ measurable, then $f$ is a limit of $\mathcal{B}_\tau$ simple functions, say $f_n \to f$. By item 8. it easily follows that $1_{\{\tau=t\}} f_n$ is $\mathcal{B}_t$ measurable for each $t \in \mathcal{U}$ and therefore $1_{\{\tau=t\}} f = \lim_{n\to\infty} 1_{\{\tau=t\}} f_n$ is $\mathcal{B}_t$ measurable for each $t \in \mathcal{U}$.

Conversely, if $f : \Omega \to \mathbb{R}$ is a function such that $1_{\{\tau=t\}} f$ is $\mathcal{B}_t$ measurable for each $t \in \mathcal{U}$, then for every $A \in \mathcal{B}_{\mathbb{R}}$ with $0 \notin A$ we have

$$\{\tau = t\} \cap \{f \in A\} = \big\{1_{\{\tau=t\}} f \in A\big\} \in \mathcal{B}_t \text{ for all } t \in \mathcal{U}.$$

Hence it follows by item 8. that $\{f \in A\} \in \mathcal{B}_\tau$. Similarly,

$$\{\tau = t\} \cap \{f = 0\} = \big\{1_{\{\tau=t\}} f = 0\big\} \cap \{\tau = t\} \in \mathcal{B}_t \text{ for all } t \in \mathcal{U}$$

and so again $\{f = 0\} \in \mathcal{B}_\tau$ by item 8. This suffices to show that $f$ is $\mathcal{B}_\tau$ measurable.
Corollary 27.17. If $\sigma$ and $\tau$ are stopping times and $F$ is a $\mathcal{B}_\sigma$ measurable function, then $1_{\{\sigma\le\tau\}} F$ and $1_{\{\sigma<\tau\}} F$ are $\mathcal{B}_\tau$ measurable.

Proof. If $F = 1_A$ with $A \in \mathcal{B}_\sigma$, then the assertion follows from items 4. and 5. of Lemma 27.16. By linearity, the assertion holds if $F$ is a $\mathcal{B}_\sigma$ measurable simple function and then, by taking limits, for all $\mathcal{B}_\sigma$ measurable functions.
Lemma 27.18 (Optional time approximation lemma). Let $\tau$ be a $\{\mathcal{B}_t\}_{t\ge0}$ optional time and for $n \in \mathbb{N}$, let $\tau_n : \Omega \to [0, \infty]$ be defined by

$$\tau_n := \infty \cdot 1_{\{\tau=\infty\}} + \sum_{k=1}^\infty \frac{k}{2^n}\, 1_{\left\{\frac{k-1}{2^n} \le \tau < \frac{k}{2^n}\right\}}, \tag{27.5}$$

i.e. $\tau_n = 2^{-n}\big(\lfloor 2^n\tau\rfloor + 1\big)$ on $\{\tau < \infty\}$. Then $\{\tau_n\}_{n=1}^\infty$ are stopping times such that;

1. $\tau_n \downarrow \tau$ as $n \to \infty$,
2. $\mathcal{B}_{\tau+} \subset \mathcal{B}_{\tau_n}$ for all $n$, and
3. $\{\tau_n = \infty\} = \{\tau = \infty\}$ for all $n$.

Proof. If $A \in \mathcal{B}_{\tau+}$, then

$$A \cap \big\{\tau_n = k2^{-n}\big\} = A \cap \big\{(k-1)2^{-n} \le \tau < k2^{-n}\big\} = \big[A \cap \big\{\tau < k2^{-n}\big\}\big] \setminus \big[A \cap \big\{\tau < (k-1)2^{-n}\big\}\big] \in \mathcal{B}_{k2^{-n}}.$$

Taking $A = \Omega$ in this equation shows $\{\tau_n = k2^{-n}\} \in \mathcal{B}_{k2^{-n}}$ for all $k \in \mathbb{N}$ and so $\tau_n$ is a stopping time by Lemma 27.16. Moreover, this same lemma shows that $A \in \mathcal{B}_{\tau_n}$. The fact that $\tau_n \downarrow \tau$ as $n \to \infty$ and $\{\tau_n = \infty\} = \{\tau = \infty\}$ should be clear.
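The dyadic discretization in Eq. (27.5) is easy to experiment with. Here is a small sketch (pure Python; `tau = 0.3` is an arbitrary finite sample value of an optional time) of the map $\tau \mapsto \tau_n$:

```python
import math

def tau_n(tau, n):
    """Dyadic approximation of Eq. (27.5): on {tau < infinity} this equals
    k/2^n on the event {(k-1)/2^n <= tau < k/2^n}, i.e. (floor(2^n tau) + 1)/2^n,
    and it equals infinity on {tau = infinity}."""
    if math.isinf(tau):
        return math.inf
    return (math.floor(2 ** n * tau) + 1) / 2 ** n

tau = 0.3  # arbitrary sample value
approx = [tau_n(tau, n) for n in range(1, 8)]
print(approx)  # non-increasing, strictly above tau, within 2^{-n} of tau
```

Note that $\tau_n > \tau$ strictly on $\{\tau < \infty\}$, which is exactly what makes $\mathcal{B}_{\tau+} \subset \mathcal{B}_{\tau_n}$ work in the lemma.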
27.3 Filtration considerations

For this section suppose that $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\ge0}, P\big)$ is a given filtered probability space.
Notation 27.19 (Null sets) Let $\mathcal{N}^P := \{N \in \mathcal{B} : P(N) = 0\}$ be the collection of null sets of $P$.

Definition 27.20. If $\mathcal{A} \subset \mathcal{B}$ is a sub-sigma-algebra of $\mathcal{B}$, then the augmentation of $\mathcal{A}$ is the $\sigma$-algebra,

$$\bar{\mathcal{A}} := \mathcal{A} \vee \mathcal{N}^P := \sigma\big(\mathcal{A} \cup \mathcal{N}^P\big).$$

Definition 27.21 (Usual hypothesis). A filtered probability space, $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\ge0}, P\big)$, is said to satisfy the weak usual hypothesis if:

1. For each $t \in \mathbb{R}_+$, $\mathcal{N}^P \subset \mathcal{B}_t$, i.e. $\mathcal{B}_t$ contains all of the $P$-null sets.
2. The filtration, $\{\mathcal{B}_t\}_{t\in\mathbb{R}_+}$, is right continuous, i.e. $\mathcal{B}_{t+} = \mathcal{B}_t$.

If in addition, $(\Omega, \mathcal{B}, P)$ is complete (i.e. if $N \in \mathcal{N}^P$ and $A \subset N$, then $A \in \mathcal{N}^P$), then we say $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\ge0}, P\big)$ satisfies the usual hypothesis.

It is always possible to make an arbitrary filtered probability space, $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\ge0}, P\big)$, into one satisfying the (weak) usual hypothesis by augmenting the filtration by the null sets and taking the right continuous extension. We are going to develop these two concepts now. (For even more information on the usual hypothesis, see [51, pages 34-36].)
Lemma 27.22 (Augmentation lemma). Continuing the notation in Definition 27.20, we have

$$\bar{\mathcal{A}} = \big\{B \in \mathcal{B} : \text{there exists } A \in \mathcal{A} \text{ such that } A \triangle B \in \mathcal{N}^P\big\}. \tag{27.6}$$

Proof. Let $\mathcal{G}$ denote the right side of Eq. (27.6). If $B \in \mathcal{G}$ and $A \in \mathcal{A}$ is such that $N := A \triangle B \in \mathcal{N}^P$, then

$$B = [A \cap B] \cup [B \setminus A] = [A \setminus (A \setminus B)] \cup [B \setminus A]. \tag{27.7}$$

Since $A \setminus B \subset N$ and $B \setminus A \subset N$, it follows that $A \setminus B$ and $B \setminus A$ are in $\mathcal{N}^P$ and hence that $B \in \mathcal{A} \vee \mathcal{N}^P = \bar{\mathcal{A}}$. Thus we have shown $\mathcal{G} \subset \bar{\mathcal{A}}$. Since it is clear that $\mathcal{A} \subset \mathcal{G}$ and $\mathcal{N}^P \subset \mathcal{G}$, to finish the proof it suffices to show $\mathcal{G}$ is a $\sigma$-algebra, for if we do this, then $\bar{\mathcal{A}} = \mathcal{A} \vee \mathcal{N}^P \subset \mathcal{G}$.

Since $A^c \triangle B^c = A \triangle B$, we see that $\mathcal{G}$ is closed under complementation. Moreover, if $B_j \in \mathcal{G}$, there exists $A_j \in \mathcal{A}$ such that $A_j \triangle B_j \in \mathcal{N}^P$ for all $j$. So letting $A = \cup_j A_j \in \mathcal{A}$ and $B = \cup_j B_j \in \mathcal{B}$, we have

$$A \triangle B \subset \bigcup_j [A_j \triangle B_j] \in \mathcal{N}^P,$$

from which we conclude that $A \triangle B \in \mathcal{N}^P$ and hence $B \in \mathcal{G}$. This shows that $\mathcal{G}$ is closed under countable unions and complementation, and contains $\mathcal{A}$ and hence the empty set and $\Omega$; thus $\mathcal{G}$ is a $\sigma$-algebra.
Lemma 27.23 (Commutation lemma). If $\big(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t\ge0}, P\big)$ is a filtered probability space, then $\overline{\mathcal{B}_{t+}} = \bar{\mathcal{B}}_{t+}$. In words, the augmentation procedure and the right continuity extension procedure commute.

Proof. Since for any $s > t$, $\mathcal{B}_{t+} \subset \mathcal{B}_s$, it follows that $\overline{\mathcal{B}_{t+}} \subset \bar{\mathcal{B}}_s$ and therefore that

$$\overline{\mathcal{B}_{t+}} \subset \bigcap_{s>t} \bar{\mathcal{B}}_s = \bar{\mathcal{B}}_{t+}.$$

Conversely, if $B \in \bar{\mathcal{B}}_{t+} = \bigcap_{s>t} \bar{\mathcal{B}}_s$ and $t_n > t$ are such that $t_n \downarrow t$, then for each $n \in \mathbb{N}$ there exists $A_n \in \mathcal{B}_{t_n}$ such that $A_n \triangle B \in \mathcal{N}^P$. We will now show that $B \in \overline{\mathcal{B}_{t+}}$ by showing $B \triangle A \in \mathcal{N}^P$, where

$$A := \{A_n \text{ i.o.}\} = \bigcap_{m\in\mathbb{N}} \bigcup_{n\ge m} A_n \in \mathcal{B}_{t+}.$$

To prove this let $A'_m := \bigcup_{n\ge m} A_n$ so that $A'_m \downarrow A$ as $m \to \infty$. Then

$$B \triangle A = B \triangle [\cap_m A'_m] = \big(B \setminus [\cap_m A'_m]\big) \cup \big([\cap_m A'_m] \setminus B\big) \subset \big[\cup_m (B \setminus A'_m)\big] \cup (A'_1 \setminus B) \in \mathcal{N}^P$$

because measurable subsets of elements in $\mathcal{N}^P$ are still in $\mathcal{N}^P$, $\mathcal{N}^P$ is closed under countable unions,

$$B \setminus A'_m \subset B \setminus A_m \subset B \triangle A_m \in \mathcal{N}^P, \text{ and}$$

$$A'_1 \setminus B = \bigcup_{n=1}^\infty [A_n \setminus B] \subset \bigcup_{n=1}^\infty [A_n \triangle B] \in \mathcal{N}^P.$$
27.3.1 ***More Augmentation Results (This subsection needs serious editing.)

In this subsection we generalize the augmentation results above to the setting where we adjoin our favorite collection of null like sets.
Definition 27.24. Suppose $(\Omega, \mathcal{B})$ is a measurable space. A collection of subsets, $\mathcal{N} \subset \mathcal{B}$, is a null like collection in $\mathcal{B}$ if; 1) $\mathcal{N}$ is closed under countable unions, and 2) if $A \in \mathcal{B}$ and there exists $N \in \mathcal{N}$ such that $A \subset N$, then $A \in \mathcal{N}$.

Example 27.25. Let $\{P_i\}_{i\in I}$ be any collection of probability measures on a measurable space, $(\Omega, \mathcal{B})$. Then

$$\mathcal{N} := \{N \in \mathcal{B} : P_i(N) = 0 \text{ for all } i \in I\}$$

is a null like collection of subsets in $\mathcal{B}$.

Example 27.26. If $\mathcal{N}$ is a null like collection in $\mathcal{B}$, then

$$\tilde{\mathcal{N}} := \big\{A \in 2^\Omega : A \subset N \text{ for some } N \in \mathcal{N}\big\}$$

is a null like collection in $2^\Omega$.

Example 27.27. Let $\{P_i\}_{i\in I}$ be any collection of probability measures on a measurable space, $(\Omega, \mathcal{B})$. Then

$$\mathcal{N} := \big\{N \in 2^\Omega : \text{there exists } B \in \mathcal{B} \text{ such that } P_i(B) = 0 \text{ for all } i \in I \text{ and } N \subset B\big\}$$

is a null like collection of subsets of $2^\Omega$. Similarly,

$$\mathcal{N}' := \big\{N \in 2^\Omega : \text{for each } i \in I \text{ there exists } B_i \in \mathcal{B} \text{ such that } P_i(B_i) = 0 \text{ and } N \subset B_i\big\}$$

is a null like collection of subsets of $2^\Omega$. These two collections are easily seen to be the same if $I$ is countable; otherwise they may be different.

Example 27.28. If $\mathcal{N}_i \subset \mathcal{B}$ are null like collections in $\mathcal{B}$ for all $i \in I$, then $\mathcal{N} := \bigcap_{i\in I} \mathcal{N}_i$ is another null like collection in $\mathcal{B}$. Indeed, if $B \in \mathcal{B}$ and $B \subset N \in \mathcal{N}$, then $B \subset N \in \mathcal{N}_i$ for all $i$ and therefore, $B \in \mathcal{N}_i$ for all $i$ and hence $B \in \mathcal{N}$. Moreover, it is clear that $\mathcal{N}$ is still closed under countable unions.
Definition 27.29. If $(\Omega,\mathcal{B})$ is a measurable space, $\mathcal{A}\subset\mathcal{B}$ is a sub-sigma-algebra of $\mathcal{B}$, and $\mathcal{N}\subset\mathcal{B}$ is a null like collection in $\mathcal{B}$, we say $\mathcal{A}^{\mathcal{N}} := \mathcal{A}\vee\mathcal{N}$ is the augmentation of $\mathcal{A}$ by $\mathcal{N}$.

Lemma 27.30 (Augmentation lemma). If $\mathcal{A}$ is a sub-sigma-algebra of $\mathcal{B}$ and $\mathcal{N}$ is a null like collection in $\mathcal{B}$, then the augmentation of $\mathcal{A}$ by $\mathcal{N}$ is
\[
\mathcal{A}^{\mathcal{N}} = \{B\in\mathcal{B} : \exists\, A\in\mathcal{A} \ni A\triangle B\in\mathcal{N}\}. \tag{27.8}
\]

Proof. Let $\mathcal{C}$ denote the right side of Eq. (27.8). If $B\in\mathcal{C}$ and $A\in\mathcal{A}$ is such that $N := A\triangle B\in\mathcal{N}$, then
\[
B = [A\cap B]\cup[B\setminus A] = [A\setminus(A\setminus B)]\cup[B\setminus A]. \tag{27.9}
\]
Since $A\setminus B\subset N$ and $B\setminus A\subset N$ implies $A\setminus B$ and $B\setminus A$ are in $\mathcal{N}$, it follows that $B\in\mathcal{A}\vee\mathcal{N} = \mathcal{A}^{\mathcal{N}}$. Thus we have shown $\mathcal{C}\subset\mathcal{A}^{\mathcal{N}}$. Since it is clear that $\mathcal{A}\subset\mathcal{C}$ and $\mathcal{N}\subset\mathcal{C}$, to finish the proof it suffices to show $\mathcal{C}$ is a $\sigma$-algebra. For if we do this, then $\mathcal{A}^{\mathcal{N}} = \mathcal{A}\vee\mathcal{N}\subset\mathcal{C}$.

Since $A^c\triangle B^c = A\triangle B$, we see that $\mathcal{C}$ is closed under complementation. Moreover, if $B_j\in\mathcal{C}$, there exists $A_j\in\mathcal{A}$ such that $A_j\triangle B_j\in\mathcal{N}$ for all $j$. So letting $A = \bigcup_j A_j\in\mathcal{A}$ and $B = \bigcup_j B_j\in\mathcal{B}$, we have
\[
A\triangle B \subset \bigcup_j[A_j\triangle B_j] \in \mathcal{N}
\]
from which we conclude that $A\triangle B\in\mathcal{N}$ and hence $B\in\mathcal{C}$. This shows that $\mathcal{C}$ is closed under countable unions, complementation, and contains $\mathcal{A}$ and hence the empty set and $\Omega$; thus $\mathcal{C}$ is a $\sigma$-algebra.
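As a sanity check of the characterization in Eq. (27.8), one can verify it by brute force on a small finite sample space. The following sketch (my own illustration, not from the text; all names are ad hoc) compares $\mathcal{A}\vee\mathcal{N}$ with the right side of Eq. (27.8) when $\Omega = \{1,2,3,4\}$, $\mathcal{A} = \sigma(\{1,2\})$, and $\mathcal{N}$ consists of all subsets of the "null" set $\{3\}$.

```python
from itertools import combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def sigma_algebra(omega, gens):
    """Closure of gens under complements and (finite = countable) unions."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in gens}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            for s in [omega - a] + [a | b for b in sigma]:
                if s not in sigma:
                    sigma.add(s)
                    changed = True
    return sigma

omega = {1, 2, 3, 4}
A = sigma_algebra(omega, [{1, 2}])     # atoms {1,2} and {3,4}
N = {frozenset(), frozenset({3})}      # null like: all subsets of {3}

# sigma(A ∪ N) versus the set {B : A △ B ∈ N for some A ∈ A} of Eq. (27.8):
augmented = sigma_algebra(omega, A | N)
characterized = {B for B in powerset(omega) if any((a ^ B) in N for a in A)}
assert augmented == characterized
```

Both sides come out to the eight-element $\sigma$-algebra with atoms $\{1,2\}$, $\{3\}$, $\{4\}$, as the lemma predicts.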
Lemma 27.31 (Commutation lemma). Let $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0})$ be a filtered space, let $\mathcal{N}\subset\mathcal{B}$ be a null like collection, and for $\mathcal{G}\subset\mathcal{B}$ let $\bar{\mathcal{G}} := \mathcal{G}\vee\mathcal{N}$. Then $\bar{\mathcal{B}}_{t+} = \overline{\mathcal{B}_{t+}}$.

Proof. Since for any $s>t$, $\mathcal{B}_{t+}\subset\mathcal{B}_s$, it follows that $\overline{\mathcal{B}_{t+}}\subset\bar{\mathcal{B}}_s$ and therefore that
\[
\overline{\mathcal{B}_{t+}} \subset \bigcap_{s>t}\bar{\mathcal{B}}_s = \bar{\mathcal{B}}_{t+}.
\]
Conversely if $B\in\bar{\mathcal{B}}_{t+} = \bigcap_{s>t}\bar{\mathcal{B}}_s$ and $t_n>t$ are such that $t_n\downarrow t$, then for each $n\in\mathbb{N}$ there exists $A_n\in\mathcal{B}_{t_n}$ such that $A_n\triangle B\in\mathcal{N}$. We will now show that $B\in\overline{\mathcal{B}_{t+}}$, by showing $B\triangle A\in\mathcal{N}$ where
\[
A := \{A_n \text{ i.o.}\} = \bigcap_{m\in\mathbb{N}}\bigcup_{n\ge m}A_n \in \mathcal{B}_{t+}.
\]
To prove this let $A'_m := \bigcup_{n\ge m}A_n$ so that $A'_m\downarrow A$ as $m\to\infty$. Then
\[
B\triangle A = B\triangle\Big[\bigcap_m A'_m\Big] = \Big(B\setminus\Big[\bigcap_m A'_m\Big]\Big)\cup\Big(\Big[\bigcap_m A'_m\Big]\setminus B\Big) \subset \Big[\bigcup_m (B\setminus A'_m)\Big]\cup(A'_1\setminus B) \in \mathcal{N}
\]
because measurable subsets of elements in $\mathcal{N}$ are still in $\mathcal{N}$, $\mathcal{N}$ is closed under countable unions, $B\setminus A'_m\subset B\setminus A_m\subset B\triangle A_m\in\mathcal{N}$, and
\[
A'_1\setminus B = \bigcup_{n=1}^\infty[A_n\setminus B] \subset \bigcup_{n=1}^\infty[A_n\triangle B] \in \mathcal{N}.
\]
Corollary 27.32. Suppose $\mathcal{A}$ is a sub-sigma-algebra of $\mathcal{B}$ and $\mathcal{N}$ is a null like collection in $\mathcal{B}$ with the additional property that for all $N\in\mathcal{N}$ there exists $N'\in\mathcal{A}\cap\mathcal{N}$ such that $N\subset N'$. Then
\[
\mathcal{A}^{\mathcal{N}} = \{A\cup N : A\in\mathcal{A} \text{ and } N\in\mathcal{N}\}. \tag{27.10}
\]

Proof. Let $\mathcal{C}$ denote the right side of Eq. (27.10). It is clear that $\mathcal{C}\subset\mathcal{A}^{\mathcal{N}}$. Conversely if $B\in\mathcal{A}^{\mathcal{N}}$, we know by Lemma 27.22 that there exists $A\in\mathcal{A}$ such that $N := A\triangle B\in\mathcal{N}$. Since $C := A\setminus B\subset N$, $C\in\mathcal{N}$ and so by assumption there exists $N'\in\mathcal{A}\cap\mathcal{N}$ such that $C\subset N'$. Therefore, according to Eq. (27.7), we have
\[
B = [A\setminus C]\cup[B\setminus A] = [A\setminus N']\cup[(A\cap N')\setminus C]\cup[B\setminus A].
\]
Since $A\setminus N'\in\mathcal{A}$ and $[(A\cap N')\setminus C]\cup[B\setminus A]\in\mathcal{N}$, it follows that $B\in\mathcal{C}$.
Example 27.33. Let $\nu$ be a probability measure on $\mathbb{R}$. As in Notation ??, let $P^\nu := \int_{\mathbb{R}} d\nu(x)\,P_x$ be the Wiener measure on $\Omega := C([0,\infty),\mathbb{R})$, let $B_t:\Omega\to\mathbb{R}$ be the projection map, $B_t(\omega) = \omega(t)$, let $\mathcal{B}_t = \sigma(B_s : s\le t)$, and let $\mathcal{N}_{t+}(\nu) := \{N\in\mathcal{B}_{t+} : P^\nu(N) = 0\}$. Then by Corollary 29.13, $\mathcal{B}^\nu_{t+} = \mathcal{B}_t\vee\mathcal{N}_{t+}(\nu)$. Hence if we let
\[
\mathcal{N}(\nu) := \{N\in\mathcal{B} : P^\nu(N) = 0\}
\]
and
\[
\bar{\mathcal{N}}(\nu) := \big\{B\in 2^\Omega : B\subset N \text{ for some } N\in\mathcal{N}(\nu)\big\},
\]
then $\mathcal{B}_{t+}\vee\mathcal{N}(\nu) = \mathcal{B}_t\vee\mathcal{N}(\nu) = \bar{\mathcal{B}}_t$ and $\mathcal{B}_{t+}\vee\bar{\mathcal{N}}(\nu) = \mathcal{B}_t\vee\bar{\mathcal{N}}(\nu)$ for all $t\in\mathbb{R}_+$. This shows that the augmented Brownian filtration, $\{\bar{\mathcal{B}}_t\}_{t\ge 0}$, is already right continuous.
Definition 27.34. Recall from Proposition 5.50: if $(\Omega,\mathcal{B},P)$ is a probability space and $\bar{\mathcal{N}}^P := \{A\subset\Omega : A\subset N \text{ for some } N\in\mathcal{B} \text{ with } P(N) = 0\}$, then the completion, $\bar{P}$, of $P$ is a probability measure on $\mathcal{B}\vee\bar{\mathcal{N}}^P$ which extends $P$ so that $\bar{P}(A) = 0$ for all $A\in\bar{\mathcal{N}}^P$.

Suppose that $(\Omega,\mathcal{B})$ is a measurable space and $\mathcal{N}\subset\mathcal{B}$ is a collection of sets closed under countable unions and also satisfying: $A\in\mathcal{N}$ if $A\in\mathcal{B}$ with $A\subset N$ for some $N\in\mathcal{N}$. The main example that we will use below is to let $\{P_i\}_{i\in I}$ be a collection of probability measures on $(\Omega,\mathcal{B})$ and then let
\[
\mathcal{N} := \{N\in\mathcal{B} : P_i(N) = 0 \text{ for all } i\in I\}.
\]
Let us also observe that if $\mathcal{N}_i$ is a collection of null sets as above for each $i\in I$, then $\mathcal{N} = \bigcap_i\mathcal{N}_i$ is also a collection of null sets. Indeed, if $B\in\mathcal{B}$ and $B\subset N\in\mathcal{N}$, then $B\subset N\in\mathcal{N}_i$ for all $i$ and therefore $B\in\mathcal{N}_i$ for all $i$ and hence $B\in\mathcal{N}$. Moreover, it is clear that $\mathcal{N}$ is still closed under countable unions.
Lemma 27.35 (Augmentation). Let us now suppose that $\mathcal{A}$ is a sub-sigma-algebra of $\mathcal{B}$. Then the augmentation of $\mathcal{A}$ by $\mathcal{N}$,
\[
\mathcal{A}^{\mathcal{N}} := \{B\in\mathcal{B} : \exists\, A\in\mathcal{A} \ni A\triangle B\in\mathcal{N}\},
\]
is a sub-sigma-algebra of $\mathcal{B}$. Moreover if $\mathcal{N} = \bigcap_i\mathcal{N}_i$ and $\mathcal{A}_i = \mathcal{A}^{\mathcal{N}_i}$ is the augmentation of $\mathcal{A}$ by $\mathcal{N}_i$, then
\[
\mathcal{A}^{\mathcal{N}} = \bigcap_i\mathcal{A}_i.
\]

Proof. To prove this, first observe that
\[
A\triangle B = (A\setminus B)\cup(B\setminus A) = (A\cap B^c)\cup(B\cap A^c) = (B^c\setminus A^c)\cup(A^c\setminus B^c) = A^c\triangle B^c
\]
from which it follows that $\mathcal{A}^{\mathcal{N}}$ is closed under complementation. Moreover, if $B_j\in\mathcal{A}^{\mathcal{N}}$, then there exists $A_j\in\mathcal{A}$ such that $A_j\triangle B_j\in\mathcal{N}$ for all $j$. So letting $A = \bigcup_j A_j\in\mathcal{A}$ and $B = \bigcup_j B_j\in\mathcal{B}$, we have
\[
A\triangle B \subset \bigcup_j[A_j\triangle B_j] \in \mathcal{N}
\]
from which we conclude that $A\triangle B\in\mathcal{N}$ and hence $B\in\mathcal{A}^{\mathcal{N}}$. This shows that $\mathcal{A}^{\mathcal{N}}$ is closed under countable unions and hence we have shown $\mathcal{A}\subset\mathcal{A}^{\mathcal{N}}\subset\mathcal{B}$ and $\mathcal{A}^{\mathcal{N}}$ is a sigma algebra.
???Now to prove the second assertion of this lemma. It is clear that if $\mathcal{N}\subset\mathcal{N}'$, then $\mathcal{A}^{\mathcal{N}}\subset\mathcal{A}^{\mathcal{N}'}$ and hence it follows that
\[
\mathcal{A}^{\mathcal{N}} \subset \bigcap_i\mathcal{A}_i = \bigcap_i\mathcal{A}^{\mathcal{N}_i}.
\]
For the converse inclusion, suppose that $B\in\bigcap_i\mathcal{A}^{\mathcal{N}_i}$, in which case there exists $A_i\in\mathcal{A}$ such that $B\triangle A_i\in\mathcal{N}_i$ for all $i\in I$.

Suppose that $(\Omega,\mathcal{B},P)$ is a probability space and $\mathcal{A}$ is a sub-sigma-algebra of $\mathcal{B}$. The augmentation, $\mathcal{A}^P$, of $\mathcal{A}$ by the $P$-null sets of $\mathcal{B}$ is the collection of sets:
\[
\mathcal{A}^P := \{B\in\mathcal{B} : \exists\, A\in\mathcal{A} \ni P(B\triangle A) = 0\}.
\]
Notation 27.36 Let $\bar{\mathcal{B}}^P$ denote the completion of $\mathcal{B}$. Let $\mathcal{B}^P_t$ denote the augmentation of $\mathcal{B}_t$ by the $P$-null subsets of $\mathcal{B}$. We also let $\bar{\mathcal{B}}^P_t$ denote the augmentation of $\mathcal{B}_t$ by the $\bar{P}$-null subsets of $\bar{\mathcal{B}}^P$.
28
Continuous time (sub)martingales

For this chapter, let $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\in\mathbb{R}_+},P)$ be a filtered probability space as described in Chapter 27.

Definition 28.1. Given a filtered probability space, $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0},P)$, an adapted process, $X_t:\Omega\to\mathbb{R}$, is said to be a $(\mathcal{B}_t)$ martingale provided $E|X_t|<\infty$ for all $t$ and $E[X_t-X_s|\mathcal{B}_s] = 0$ for all $0\le s\le t<\infty$. If $E[X_t-X_s|\mathcal{B}_s]\ge 0$ or $E[X_t-X_s|\mathcal{B}_s]\le 0$ for all $0\le s\le t<\infty$, then $X$ is said to be a submartingale or supermartingale respectively.

Remark 28.2. If $\sigma$ and $\tau$ are two $\{\mathcal{B}_t\}$ optional times, then $\sigma\wedge\tau$ is as well. Indeed, if $t\in\mathbb{R}_+$, then
\[
\{\sigma\wedge\tau < t\} = \{\sigma<t\}\cup\{\tau<t\}\in\mathcal{B}_t.
\]
The following results are of fundamental importance for a number of results in this chapter. The first result is a simple consequence of the optional sampling Theorem 18.39.

Proposition 28.3 (Discrete optional sampling). Suppose $\{X_t\}_{t\in\mathbb{R}_+}$ is a submartingale on a filtered probability space, $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0},P)$, and $\sigma$ and $\tau$ are two $\{\mathcal{B}_t\}_{t\ge 0}$ stopping times with values in $\mathbb{D}_n := \{k2^{-n} : k\in\bar{\mathbb{N}}\}$ for some $n\in\mathbb{N}$. If $M := \sup_\omega\sigma(\omega)<\infty$, then $X_\sigma\in L^1(\Omega,\mathcal{B}_\sigma,P)$, $X_{\tau\wedge\sigma}\in L^1(\Omega,\mathcal{B}_{\tau\wedge\sigma},P)$, and
\[
X_{\tau\wedge\sigma}\le E[X_\sigma|\mathcal{B}_\tau].
\]

Proof. For $k\in\bar{\mathbb{N}}$, let $\mathcal{F}_k := \mathcal{B}_{k2^{-n}}$ and $Y_k := X_{k2^{-n}}$. Then $\{Y_k\}_{k=0}^\infty$ is an $\{\mathcal{F}_k\}$ submartingale and $2^n\tau$, $2^n\sigma$ are two $\bar{\mathbb{N}}$-valued stopping times with $2^n\sigma\le 2^n M<\infty$. Therefore we may apply the optional sampling Theorem 18.39 to find
\[
X_{\tau\wedge\sigma} = Y_{(2^n\tau)\wedge(2^n\sigma)}\le E[Y_{2^n\sigma}|\mathcal{F}_{2^n\tau}] = E[X_\sigma|\mathcal{B}_\tau].
\]
We have used $\mathcal{F}_{2^n\tau} = \mathcal{B}_\tau$ (you prove) in the last equality.
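For intuition, the conclusion of optional sampling can be tested numerically. The sketch below (my own illustration, not from the text; the specific stopping time is an arbitrary choice) checks the martingale case: for a simple random walk $S$ and the bounded stopping time $\tau$ given by the first hit of $\{-2,5\}$ capped at $50$ steps, one should have $E[S_\tau] = E[S_0] = 0$.

```python
import random

# Monte Carlo check that E[S_tau] = 0 for a bounded stopping time tau
# applied to the martingale S (a simple +/-1 random walk).
random.seed(4)
n_paths = 100_000
total = 0
for _ in range(n_paths):
    s = 0
    for _ in range(50):          # tau is capped at 50 steps, so tau is bounded
        if s in (-2, 5):         # stop on first hit of -2 or 5
            break
        s += random.choice((-1, 1))
    total += s                   # s = S_tau for this path
mean = total / n_paths
assert abs(mean) < 0.05          # E[S_tau] = 0, up to Monte Carlo error
```

Note that boundedness of the stopping time matters: the uncapped first hitting time of $5$ alone would give $E[S_\tau] = 5 \ne 0$.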
Lemma 28.4 ($L^1$ convergence I). Suppose $\{X_t\}_{t\in\mathbb{R}_+}$ is a submartingale on a filtered probability space, $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0},P)$. If $t\in\mathbb{R}_+$ and $\{t_n\}_{n=1}^\infty\subset(t,\infty)$ are such that $t_n\downarrow t$, then $\lim_{n\to\infty}X_{t_n}$ exists almost surely and in $L^1(P)$.

Proof. Let $Y_n := X_{t_n}$ and $\mathcal{F}_n := \mathcal{B}_{t_n}$ for $n\in\mathbb{N}$. Then $\{(Y_n,\mathcal{F}_n)\}_{n\in\mathbb{N}}$ is a backwards submartingale such that $\inf_n EY_n\ge EX_t>-\infty$, and hence the result follows by Theorem 18.75.
Lemma 28.5 ($L^1$ convergence II). Suppose $\{X_t\}_{t\in\mathbb{R}_+}$ is a submartingale on a filtered probability space, $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0},P)$, $\tau$ is a bounded $\{\mathcal{B}_t\}$ optional time, and $\{\tau_n\}_{n=1}^\infty$ is the sequence of approximate stopping times defined in Lemma 27.18. Then $X_{\tau+} := \lim_{n\to\infty}X_{\tau_n}$ exists a.s. and in $L^1(P)$.

Proof. Let $M := \sup_\omega\tau(\omega)$. If $m<n$, then $\tau_m$ and $\tau_n$ take values in $\mathbb{D}_n$, $0\le\tau_n\le\tau_m$, and $\tau_m\le M+1$. Therefore by Proposition 28.3, $X_{\tau_n}\le E[X_{\tau_m}|\mathcal{B}_{\tau_n}]$ and $X_0\le E[X_{\tau_n}|\mathcal{B}_0]$. Hence if we let $Y_n := X_{\tau_n}$ and $\mathcal{F}_n := \mathcal{B}_{\tau_n}$ for $n\in\mathbb{N}$, then $\{(Y_n,\mathcal{F}_n)\}_{n\in\mathbb{N}}$ is a backwards submartingale such that
\[
\inf_{n\in\mathbb{N}}EY_n = \inf_{n\in\mathbb{N}}EX_{\tau_n} \ge EX_0 > -\infty.
\]
The result now follows by an application of Theorem 18.75.
Lemma 28.6 ($L^1$ convergence III). Suppose $(\Omega,\mathcal{B},P)$ is a probability space and $\{\mathcal{B}_n\}_{n=1}^\infty$ is a decreasing sequence of sub $\sigma$-algebras of $\mathcal{B}$. Then for all $Z\in L^1(P)$,
\[
\lim_{n\to\infty}E[Z|\mathcal{B}_n] = E\Big[Z\Big|\bigcap_{n=1}^\infty\mathcal{B}_n\Big] \tag{28.1}
\]
where the above convergence is almost surely and in $L^1(P)$.

Proof. This is a special case of Corollary 18.77 applied to the reverse martingale, $M_m = E[Z|\mathcal{F}_m]$ where, for $m\in\mathbb{N}$, $\mathcal{F}_m := \mathcal{B}_m$. This may also be proved by Hilbert space projection methods when $Z\in L^2(P)$ and then by a limiting argument for all $Z\in L^1(P)$.
Proposition 28.7. Suppose that $Z\in L^1(\Omega,\mathcal{B},P)$ and $\sigma$ and $\tau$ are two stopping times. Then

1. $E[Z|\mathcal{B}_\sigma] = E[Z|\mathcal{B}_{\sigma\wedge\tau}]$ on $\{\sigma\le\tau\}$ and hence on $\{\sigma<\tau\}$.
2. $E[E[Z|\mathcal{B}_\sigma]|\mathcal{B}_\tau] = E[Z|\mathcal{B}_{\sigma\wedge\tau}]$.

Moreover, both results hold if $\sigma$ and $\tau$ are optional times provided every occurrence of the letter $\mathcal{B}$ is replaced by $\mathcal{B}_+$.

Proof. 1. From Corollary 27.17, $1_{\sigma\le\tau}E[Z|\mathcal{B}_\sigma]$ is $\mathcal{B}_{\sigma\wedge\tau}$ measurable and therefore,
\begin{align*}
1_{\sigma\le\tau}E[Z|\mathcal{B}_\sigma] &= E[1_{\sigma\le\tau}E[Z|\mathcal{B}_\sigma]|\mathcal{B}_{\sigma\wedge\tau}] \\
&= 1_{\sigma\le\tau}E[E[Z|\mathcal{B}_\sigma]|\mathcal{B}_{\sigma\wedge\tau}] = 1_{\sigma\le\tau}E[Z|\mathcal{B}_{\sigma\wedge\tau}]
\end{align*}
as desired.

2. Writing
\[
Z = 1_{\sigma\le\tau}Z + 1_{\tau<\sigma}Z
\]
we find, using item 1., that
\[
E[Z|\mathcal{B}_{\sigma\wedge\tau}] = 1_{\sigma\le\tau}E[Z|\mathcal{B}_{\sigma\wedge\tau}] + 1_{\tau<\sigma}E[Z|\mathcal{B}_{\sigma\wedge\tau}] = 1_{\sigma\le\tau}E[Z|\mathcal{B}_\sigma] + 1_{\tau<\sigma}E[Z|\mathcal{B}_\tau]. \tag{28.2}
\]
Another application of item 1. shows,
\[
E[1_{\tau<\sigma}E[Z|\mathcal{B}_\sigma]|\mathcal{B}_\tau] = 1_{\tau<\sigma}E[E[Z|\mathcal{B}_\sigma]|\mathcal{B}_\tau] = 1_{\tau<\sigma}E[E[Z|\mathcal{B}_\sigma]|\mathcal{B}_{\sigma\wedge\tau}] = 1_{\tau<\sigma}E[Z|\mathcal{B}_{\sigma\wedge\tau}] = 1_{\tau<\sigma}E[Z|\mathcal{B}_\tau].
\]
Using this equation and the fact that $1_{\sigma\le\tau}E[Z|\mathcal{B}_\sigma]$ is $\mathcal{B}_{\sigma\wedge\tau}\subset\mathcal{B}_\tau$ measurable, we may condition the identity $E[Z|\mathcal{B}_\sigma] = 1_{\sigma\le\tau}E[Z|\mathcal{B}_\sigma] + 1_{\tau<\sigma}E[Z|\mathcal{B}_\sigma]$ on $\mathcal{B}_\tau$ to find
\[
E[E[Z|\mathcal{B}_\sigma]|\mathcal{B}_\tau] = 1_{\sigma\le\tau}E[Z|\mathcal{B}_\sigma] + 1_{\tau<\sigma}E[Z|\mathcal{B}_\tau] = E[Z|\mathcal{B}_{\sigma\wedge\tau}],
\]
where the last equality is Eq. (28.2).
Lemma 28.8. Suppose $\tau$ is an optional time and $\{\tau_m\}_{m=1}^\infty$ are stopping times such that $\tau_m\downarrow\tau$ as $m\to\infty$ and $\tau<\tau_m$ on $\{\tau<\infty\}$ for all $m$. Then $\mathcal{B}_{\tau_m}\downarrow\mathcal{B}_{\tau+}$ as $m\to\infty$, i.e. $\mathcal{B}_{\tau_m}$ is decreasing in $m$ and
\[
\mathcal{B}_{\tau+} = \bigcap_{m=1}^\infty\mathcal{B}_{\tau_m}. \tag{28.3}
\]

Proof. If $A\in\mathcal{B}_{\tau+}$, then $A\in\mathcal{B}_\infty$ and for all $t\in\mathbb{R}_+$ and $m\in\mathbb{N}$ we have
\[
A\cap\{\tau_m\le t\} = A\cap\{\tau<t\}\cap\{\tau_m\le t\}\in\mathcal{B}_t.
\]
This shows $A\in\bigcap_{m=1}^\infty\mathcal{B}_{\tau_m}$. For the converse, observe that
\[
\{\tau<t\} = \bigcup_{m=1}^\infty\{\tau_m\le t\}\quad\forall\, t\in\mathbb{R}_+.
\]
Therefore if $A\in\bigcap_{m=1}^\infty\mathcal{B}_{\tau_m}$, then $A\in\mathcal{B}_\infty$ and
\[
A\cap\{\tau<t\} = \bigcup_{m=1}^\infty[A\cap\{\tau_m\le t\}]\in\mathcal{B}_t\quad\forall\, t\in\mathbb{R}_+.
\]
Theorem 28.9 (Continuous time optional sampling theorem). Let $\{X_t\}_{t\ge 0}$ be a right continuous $\{\mathcal{B}_t\}$ (or $\{\mathcal{B}^+_t\}$) submartingale and let $\sigma$ and $\tau$ be two $\{\mathcal{B}_t\}$ optional (or stopping) times such that $M := \sup_\omega\sigma(\omega)<\infty$.¹ Then $X_\sigma\in L^1(\Omega,\mathcal{B}^+_\sigma,P)$, $X_{\tau\wedge\sigma}\in L^1(\Omega,\mathcal{B}^+_{\tau\wedge\sigma},P)$ and
\[
X_{\tau\wedge\sigma}\le E[X_\sigma|\mathcal{B}^+_\tau]. \tag{28.4}
\]

Proof. Let $\{\tau_m\}_{m=1}^\infty$ and $\{\sigma_n\}_{n=1}^\infty$ be the sequences of approximate times for $\tau$ and $\sigma$ respectively defined in Lemma 27.18, i.e.
\[
\sigma_n := \infty\cdot 1_{\sigma=\infty} + \sum_{k=1}^\infty\frac{k}{2^n}1_{\frac{k-1}{2^n}\le\sigma<\frac{k}{2^n}}.
\]
By the discrete optional sampling Proposition 28.3, we know that
\[
X_{\tau_m\wedge\sigma_n}\le E[X_{\sigma_n}|\mathcal{B}_{\tau_m}] \text{ a.s.} \tag{28.5}
\]
Since $X_t$ is right continuous, $X_{\sigma_n}(\omega)\to X_\sigma(\omega)$ for all $\omega$, which combined with Lemma 28.5 implies $X_{\sigma_n}\to X_\sigma$ in $L^1(P)$ and in particular $X_\sigma\in L^1(P)$. Similarly, $X_{\tau_n\wedge\sigma_n}\to X_{\tau\wedge\sigma}$ in $L^1(P)$ and therefore $X_{\tau\wedge\sigma}\in L^1(P)$. Using the $L^1(P)$ contractivity of conditional expectation along with the fact that $X_{\tau_m\wedge\sigma_n}\to X_{\tau_m\wedge\sigma}$ on $\Omega$, we may pass to the limit ($n\to\infty$) in Eq. (28.5) to find
\[
X_{\tau_m\wedge\sigma}\le E[X_\sigma|\mathcal{B}_{\tau_m}] \text{ a.s.} \tag{28.6}
\]
From the right continuity of $X_t$ and making use of Lemma 28.6 (or Corollary 18.77) and Lemma 28.8, we may let $m\to\infty$ in Eq. (28.6) to find
\[
X_{\tau\wedge\sigma}\le\lim_{m\to\infty}E[X_\sigma|\mathcal{B}_{\tau_m}] = E\Big[X_\sigma\Big|\bigcap_{m=1}^\infty\mathcal{B}_{\tau_m}\Big] = E[X_\sigma|\mathcal{B}^+_\tau]
\]
which is Eq. (28.4).
Corollary 28.10 (Optional stopping). Let $\{X_t\}_{t\ge 0}$ be a right continuous $\{\mathcal{B}_t\}$ (or $\{\mathcal{B}^+_t\}$) submartingale and let $\tau$ be any $\{\mathcal{B}_t\}$ optional (or stopping) time. Then the stopped process, $X^\tau_t := X_{\tau\wedge t}$, is a right continuous $\{\mathcal{B}^+_t\}$ submartingale.

Proof. Let $0\le s\le t<\infty$ and apply Theorem 28.9 to the two stopping times, $\tau\wedge s$ and $\tau\wedge t$, to find
\[
X^\tau_s = X_{\tau\wedge s}\le E[X_{\tau\wedge t}|\mathcal{B}^+_{\tau\wedge s}] = E[X^\tau_t|\mathcal{B}^+_{\tau\wedge s}].
\]
From Proposition 28.7,
\[
E[X^\tau_t|\mathcal{B}^+_{\tau\wedge s}] = E[X_{\tau\wedge t}|\mathcal{B}^+_{(\tau\wedge t)\wedge s}] = E\big[E[X_{\tau\wedge t}|\mathcal{B}^+_{\tau\wedge t}]\big|\mathcal{B}^+_s\big] = E[X^\tau_t|\mathcal{B}^+_s]
\]
and therefore, we have shown $X^\tau_s\le E[X^\tau_t|\mathcal{B}^+_s]$. Since $X^\tau_s = X_{\tau\wedge s}$ is $\mathcal{B}^+_{\tau\wedge s}$ measurable and $\mathcal{B}^+_{\tau\wedge s}\subset\mathcal{B}^+_s$, it follows that $X^\tau_s$ is $\mathcal{B}^+_s$ measurable.

¹ We will see below in Theorem 28.28, that the boundedness restriction on $\sigma$ may be replaced by the assumption that $\{X^+_t\}_{t\ge 0}$ is uniformly integrable.
28.1 Submartingale Inequalities

Let $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\in\mathbb{R}_+},P)$ be a filtered probability space, let $\mathbb{D}$ be any dense subset of $\mathbb{R}_+$ containing $0$, and let $\mathbb{T}$ denote either $\mathbb{D}$ or $\mathbb{R}_+$. Throughout this section, $\{X_t\}_{t\in\mathbb{T}}$ will be a submartingale which is assumed to be right continuous if $\mathbb{T} = \mathbb{R}_+$. To keep the notation unified, for $T\in\mathbb{R}_+$, we will simply denote $\sup_{\mathbb{D}\ni t\le T}X_t$, $\inf_{\mathbb{D}\ni t\le T}X_t$, and $\sup_{s\in\mathbb{D}\cap[0,T]}|X_s|$ by $\sup_{t\le T}X_t$, $\inf_{t\le T}X_t$, and $X^*_T$ respectively. It is worth observing that if $\mathbb{T} = \mathbb{R}_+$ and $T\in\mathbb{D}$, we have (by the assumed right continuity of $X_t$) that
\[
\sup_{t\le T}X_t = \sup_{\mathbb{D}\ni t\le T}X_t,\quad \inf_{t\le T}X_t = \inf_{\mathbb{D}\ni t\le T}X_t,\quad\text{and}\quad \sup_{s\in[0,T]}|X_s| = \sup_{s\in\mathbb{D}\cap[0,T]}|X_s|. \tag{28.7}
\]
Our immediate goal is to generalize the submartingale inequalities of Section 18.5 to this context.
Proposition 28.11 (Maximal Inequalities of Bernstein and Levy). With $\mathbb{T} = \mathbb{D}$ or $\mathbb{T} = \mathbb{R}_+$, for any $a\ge 0$ and $T\in\mathbb{T}$, we have,
\[
aP\Big(\sup_{t\le T}X_t\ge a\Big) \le E\Big[X_T : \sup_{t\le T}X_t\ge a\Big] \le E[X^+_T], \tag{28.8}
\]
\[
aP\Big(\inf_{t\le T}X_t\le -a\Big) \le E\Big[X_T : \inf_{t\le T}X_t>-a\Big] - E[X_0] \tag{28.9}
\]
\[
\le E[X^+_T] - E[X_0], \tag{28.10}
\]
and
\[
aP(X^*_T\ge a) \le 2E[X^+_T] - E[X_0]. \tag{28.11}
\]
In particular if $\{M_t\}_{t\in\mathbb{T}}$ is a martingale and $a>0$, then
\[
P(M^*_T\ge a) \le \frac{1}{a}E[|M_T| : M^*_T\ge a] \le \frac{1}{a}E[|M_T|]. \tag{28.12}
\]
Proof. First assume $\mathbb{T} = \mathbb{D}$. For each $k\in\mathbb{N}$ let
\[
\Lambda_k = \{0 = t_0 < t_1 < \dots < t_m = T\}\subset\mathbb{D}\cap[0,T]
\]
be a finite subset of $\mathbb{D}\cap[0,T]$ containing $\{0,T\}$ such that $\Lambda_k\uparrow\mathbb{D}\cap[0,T]$. Noting that $\{X_{t_n}\}_{n=0}^m$ is a discrete $(\Omega,\mathcal{B},\{\mathcal{B}_{t_n}\}_{n=0}^m,P)$ submartingale, Proposition 18.42 implies all of the inequalities in Eqs. (28.8)–(28.11) hold provided we replace $\sup_{t\le T}X_t$ by $\max_{t\in\Lambda_k}X_t$, $\inf_{t\le T}X_t$ by $\min_{t\in\Lambda_k}X_t$, and $X^*_T$ by $\max_{t\in\Lambda_k}|X_t|$. Since $\max_{t\in\Lambda_k}X_t\uparrow\sup_{t\le T}X_t$, $\max_{t\in\Lambda_k}|X_t|\uparrow X^*_T$, and $\min_{t\in\Lambda_k}X_t\downarrow\inf_{t\le T}X_t$, we may use the MCT and the DCT to pass to the limit ($k\to\infty$) in order to conclude Eqs. (28.8)–(28.11) are valid as stated. Equation (28.12) follows from Eq. (28.8) applied to $X_t := |M_t|$.

Now suppose that $\{X_t\}_{t\in\mathbb{R}_+}$ and $\{M_t\}_{t\in\mathbb{R}_+}$ are right continuous. Making use of the observations in Eq. (28.7), we see that Eqs. (28.8)–(28.12) remain valid for $\mathbb{T} = \mathbb{R}_+$ by what we have just proved in the case $\mathbb{T} = \mathbb{D}\cup\{T\}$.
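The martingale maximal inequality (28.12) is easy to probe numerically. The following sketch (my own illustration, not from the text; the walk length and level $a$ are arbitrary choices) estimates both sides for a simple random walk sampled along its discrete time skeleton.

```python
import random

# Monte Carlo comparison of P(M*_T >= a) with E|M_T|/a from Eq. (28.12)
# for M a simple +/-1 random walk (a martingale) and T = 100 steps.
random.seed(0)
n_steps, n_paths, a = 100, 20_000, 10.0
hits, abs_end = 0, 0.0
for _ in range(n_paths):
    m, m_star = 0, 0
    for _ in range(n_steps):
        m += random.choice((-1, 1))
        m_star = max(m_star, abs(m))     # running maximum of |M|
    hits += m_star >= a
    abs_end += abs(m)
p_max = hits / n_paths                   # estimate of P(M*_T >= a)
bound = abs_end / n_paths / a            # estimate of E|M_T|/a
assert p_max <= bound                    # Eq. (28.12)
```

With these parameters the estimated probability is well below the bound $E|M_T|/a\approx 0.8$, so the inequality is comfortably (but not vacuously) satisfied.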
Proposition 28.12 (Doob's Inequality). Suppose that $X_t$ is a non-negative submartingale (for example $X_t = |M_t|$ where $M_t$ is a martingale) and $1<p<\infty$. Then for any $T\in\mathbb{T}$,
\[
E\big[(X^*_T)^p\big] \le \Big(\frac{p}{p-1}\Big)^p EX^p_T. \tag{28.13}
\]
Proof. Using the notation in the proof of Proposition 28.11, it follows from Corollary 18.46 that
\[
E\Big[\max_{t\in\Lambda_k}|X_t|^p\Big] \le \Big(\frac{p}{p-1}\Big)^p EX^p_T.
\]
Using the MCT, we may let $k\to\infty$ in this equation to arrive at Eq. (28.13) when $\mathbb{T} = \mathbb{D}$. The case $\mathbb{T} = \mathbb{R}_+$ follows immediately using the comments at the end of the proof of Proposition 28.11.
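As with the maximal inequality, Eq. (28.13) can be probed by simulation. The sketch below (my own illustration, not from the text) takes $p = 2$, where the constant is $(p/(p-1))^p = 4$, and $X_t = |M_t|$ for a simple random walk $M$.

```python
import random

# Monte Carlo check of E[(X*_T)^2] <= 4 E[X_T^2] (Doob's inequality, p = 2)
# with X = |M| for a simple +/-1 random walk M over 200 steps.
random.seed(1)
n_steps, n_paths = 200, 5_000
lhs = rhs = 0.0
for _ in range(n_paths):
    m, x_star = 0, 0
    for _ in range(n_steps):
        m += random.choice((-1, 1))
        x_star = max(x_star, abs(m))   # running maximum of X = |M|
    lhs += x_star ** 2
    rhs += m ** 2
lhs /= n_paths          # estimate of E[(X*_T)^2]
rhs /= n_paths          # estimate of E[X_T^2]  (exactly n_steps in theory)
assert lhs <= 4 * rhs
```

Here $EX_T^2 = 200$ exactly, while the simulated $E[(X^*_T)^2]$ is well under the Doob bound of $800$, reflecting that the constant $4$ is worst-case.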
Lemma 28.13. Suppose that $\{F_n\}$ is a sequence of bounded functions on $[a,b)$ which are uniformly convergent to a function $F$. If $\lambda_n := \lim_{t\downarrow a}F_n(t)$ exists for all $n$, then $\lambda := \lim_{t\downarrow a}F(t)$ exists and $\lambda_n\to\lambda$ as $n\to\infty$. An analogous statement holds for left limits. In particular right (left) continuous functions are preserved under uniform limits.

Proof. Let $\varepsilon_n := \sup_{t\in[a,b)}|F(t)-F_n(t)|$, which by assumption tends to zero as $n\to\infty$. Thus for $s,t>a$, we have
\[
|F(t)-F(s)| \le |F(t)-F_n(t)| + |F_n(t)-F_n(s)| + |F_n(s)-F(s)| \le 2\varepsilon_n + |F_n(t)-F_n(s)|.
\]
Therefore we have
\[
\limsup_{s,t\downarrow a}|F(t)-F(s)| \le 2\varepsilon_n \to 0 \text{ as } n\to\infty
\]
which shows that $\lambda := \lim_{t\downarrow a}F(t)$ exists. Similarly, for any $t>a$,
\[
|\lambda-\lambda_n| \le |\lambda-F(t)| + |F(t)-F_n(t)| + |F_n(t)-\lambda_n| \le |\lambda-F(t)| + |F_n(t)-\lambda_n| + \varepsilon_n
\]
and hence by passing to the limit as $t\downarrow a$ in the previous inequality we have
\[
|\lambda-\lambda_n| \le \varepsilon_n \to 0 \text{ as } n\to\infty.
\]
Corollary 28.14. Suppose that $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0},P)$ is a filtered probability space such that $\mathcal{B}_t$ contains all $P$-null subsets² of $\mathcal{B}$ for all $t\in\mathbb{R}_+$. For any $T\in\mathbb{R}_+$, let $\mathcal{M}_T$ denote the collection of (right) continuous $L^2$ martingales, $M := \{M_t\}_{t\le T}$, equipped with the inner product,
\[
(M,N)_T := E[M_T N_T].
\]
(More precisely, two (right) continuous $L^2$ martingales, $M$ and $N$, are taken to be equal if $P(M_t = N_t\ \forall\, t\le T) = 1$.) Then the space, $(\mathcal{M}_T,(\cdot,\cdot)_T)$, is a Hilbert space and the map, $U:\mathcal{M}_T\to L^2(\Omega,\mathcal{B}_T,P)$ defined by $UM := M_T$, is an isometry.
Proof. Since $M_t = E[M_T|\mathcal{B}_t]$ a.s., if $(M,M)_T = E|M_T|^2 = 0$ then $M = 0$ in $\mathcal{M}_T$. This shows that $U$ is injective, and by definition $U$ is an isometry and $(\cdot,\cdot)_T$ is an inner product on $\mathcal{M}_T$. To finish the proof, we need only show $H := \operatorname{Ran}(U)$ is a closed subspace of $L^2(\Omega,\mathcal{B}_T,P)$, or equivalently that $\mathcal{M}_T$ is complete.

Suppose that $\{M^n\}_{n=1}^\infty$ is a Cauchy sequence in $\mathcal{M}_T$. Then by Doob's inequality (Proposition 28.12) and Hölder's inequality, we have
\[
E\big[(M^n-M^m)^*_T\big] \le \sqrt{E\big[\big((M^n-M^m)^*_T\big)^2\big]} \le \sqrt{4E|M^n_T-M^m_T|^2} = 2\|M^n-M^m\|_T \to 0 \text{ as } m,n\to\infty.
\]
By passing to a subsequence if necessary, we may assume
\[
\sum_{n=1}^\infty E\big[(M^{n+1}-M^n)^*_T\big] \le 2\sum_{n=1}^\infty\|M^{n+1}-M^n\|_T < \infty
\]
from which it follows that
\[
E\Big[\sum_{n=1}^\infty(M^{n+1}-M^n)^*_T\Big] = \sum_{n=1}^\infty E\big[(M^{n+1}-M^n)^*_T\big] < \infty.
\]
So if we let
\[
\Omega_0 := \Big\{\sum_{n=1}^\infty(M^{n+1}-M^n)^*_T < \infty\Big\},
\]
then $P(\Omega_0) = 1$. Hence if $m<l$, the triangle inequality implies
\[
(M^l-M^m)^*_T \le \sum_{n=m}^{l-1}(M^{n+1}-M^n)^*_T \to 0 \text{ on } \Omega_0 \text{ as } m,l\to\infty,
\]
which shows that $\{M^n_\cdot(\omega)\}_{n=1}^\infty$ is a uniformly Cauchy sequence and hence uniformly convergent for all $\omega\in\Omega_0$. Therefore by Lemma 28.13, $t\mapsto M_t(\omega)$ is (right) continuous for all $\omega\in\Omega_0$. We complete the definition of $M$ by setting $M_\cdot(\omega)\equiv 0$ for $\omega\notin\Omega_0$. Since $\mathcal{B}_t$ contains all of the null subsets in $\mathcal{B}$, it is easy to see that $M_\cdot$ is a $\mathcal{B}_t$ adapted process. Moreover, by Fatou's lemma, we have
\[
E\big[\big((M-M^m)^*_T\big)^2\big] = E\Big[\liminf_{n\to\infty}\big((M^n-M^m)^*_T\big)^2\Big] \le \liminf_{n\to\infty}E\big[\big((M^n-M^m)^*_T\big)^2\big] \to 0 \text{ as } m\to\infty.
\]
In particular $M^m_t\to M_t$ in $L^2(P)$ for all $t\le T$, from which it follows that $M$ is still an $L^2$ martingale. As $M$ is (right) continuous, $M\in\mathcal{M}_T$ and
\[
\|M-M^n\|_T = \|M_T-M^n_T\|_{L^2(P)}\to 0 \text{ as } n\to\infty.
\]

² Lemma 28.15 below shows that this hypothesis can always be fulfilled if one is willing to augment the filtration by the $P$-null sets.
28.2 Regularizing a submartingale

Lemma 28.15. Suppose that $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0},P)$ is a filtered probability space and let $\{X_t\}_{t\ge 0}$ be a $\{\mathcal{B}_t\}_{t\ge 0}$ submartingale. Then $\{X_t\}_{t\ge 0}$ is also a $\{\bar{\mathcal{B}}_t\}_{t\ge 0}$ submartingale. Moreover, we may first replace $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0},P)$ by its completion, $(\Omega,\bar{\mathcal{B}},\bar{P})$ (see Proposition 5.50); then $\{X_t\}_{t\ge 0}$ is still a submartingale relative to the filtration $\bar{\mathcal{B}}_t := \mathcal{B}_t\vee\bar{\mathcal{N}}$ where $\bar{\mathcal{N}} := \{B\in\bar{\mathcal{B}} : \bar{P}(B) = 0\}$.

Proof. It suffices to prove the second assertion. By the augmentation Lemma 27.22 we know that $B\in\bar{\mathcal{B}}_s := \mathcal{B}_s\vee\bar{\mathcal{N}}$ iff there exists $A\in\mathcal{B}_s$ such that $B\triangle A\in\bar{\mathcal{N}}$. Then for any $t>s$ we have
\[
E_{\bar{P}}[X_t-X_s : B] = E_{\bar{P}}[X_t-X_s : A] = E_P[X_t-X_s : A] \ge 0.
\]
Proposition 28.16. Suppose that $\{X_t\}_{t\in\mathbb{R}_+}$ is a $\{\mathcal{B}_t\}$ submartingale such that $t\mapsto X_t$ is right continuous in probability, i.e. $X_t\xrightarrow{P}X_s$ as $t\downarrow s$ for all $s\in\mathbb{R}_+$. (For example, this hypothesis will hold if there exists $\varepsilon>0$ such that $\lim_{t\downarrow s}E|X_t-X_s|^\varepsilon = 0$ for all $s\in\mathbb{R}_+$.) Then $\{X_t\}_{t\in\mathbb{R}_+}$ is also a $\{\mathcal{B}^+_t\}$ submartingale.
Proof. Let $0\le s<t<\infty$, $A\in\mathcal{B}^+_s$, and $s_n\in(s,t)$ be such that $s_n\downarrow s$. By Lemma 28.4 we know that $\tilde{X}_s := \lim_{n\to\infty}X_{s_n}$ exists a.s. and in $L^1(P)$, and using the assumption that $X_{s_n}\xrightarrow{P}X_s$ we may conclude that $X_{s_n}\to X_s$ in $L^1(P)$. Since $A\in\mathcal{B}^+_s\subset\mathcal{B}_{s_n}$ for all $n$, we have
\[
E[X_t-X_s : A] = \lim_{n\to\infty}E[X_t-X_{s_n} : A] \ge 0.
\]
Corollary 28.17. Suppose $(\Omega,\mathcal{B},\{\mathcal{B}_t\}_{t\ge 0},P)$ is a filtered probability space and $\{X_t\}_{t\in\mathbb{R}_+}$ is a $\{\mathcal{B}_t\}$ submartingale such that $X_t\xrightarrow{P}X_s$ as $t\downarrow s$ for all $s\in\mathbb{R}_+$. Let $(\Omega,\bar{\mathcal{B}},\bar{P})$ denote the completion of $(\Omega,\mathcal{B},P)$ and let $\mathcal{N}$ and $\bar{\mathcal{N}}$ be the $P$- and $\bar{P}$-null sets respectively. Then $(\Omega,\mathcal{B},\{\mathcal{B}_{t+}\vee\mathcal{N}\}_{t\ge 0},P)$ satisfies the weak usual hypothesis (see Definition 27.21), $(\Omega,\bar{\mathcal{B}},\{\mathcal{B}_{t+}\vee\bar{\mathcal{N}}\}_{t\ge 0},\bar{P})$ satisfies the usual hypothesis, and $\{X_t\}_{t\ge 0}$ is a submartingale relative to each of these filtrations.

Proof. This follows directly from Proposition 28.16, Lemma 28.15, and Lemma 27.23. We use Lemma 27.23 to guarantee that $\{\mathcal{B}_{t+}\vee\mathcal{N}\}_{t\ge 0}$ and $\{\mathcal{B}_{t+}\vee\bar{\mathcal{N}}\}_{t\ge 0}$ are right continuous.
In all of the examples of submartingales appearing in this book, the hypothesis and hence the conclusions of Proposition 28.16 will apply. For this reason there is typically no harm in assuming that our filtration is right continuous. By Corollary 28.17 we may also assume that $\mathcal{B}_t$ contains all $P$-null sets. The results in the following exercise are useful to keep in mind as you are reading the rest of this section.
Exercise 28.1 (Continuous version of Example 18.7). Suppose that $\Omega = (0,1]$, $\mathcal{B} = \mathcal{B}_{(0,1]}$, and $P = m$, Lebesgue measure. Further suppose that $\alpha:[0,\infty)\to\{0,1\}$ is any function of your choosing. Then define, for $t\ge 0$ and $x\in\Omega$,
\[
M^\alpha_t(x) := e^t\big[\alpha(t)1_{0<x\le e^{-t}} + (1-\alpha(t))1_{0<x<e^{-t}}\big] = e^t\big(1_{0<x<e^{-t}} + \alpha(t)1_{x=e^{-t}}\big).
\]
Further let $\mathcal{B}^\alpha_t := \sigma(M^\alpha_s : s\le t)$ for all $t\ge 0$, and for $a\in(0,1]$ let
\[
\mathcal{F}_{(0,a]} := \big\{(0,a]\cup A : A\in\mathcal{B}_{(a,1]}\big\}\cup\mathcal{B}_{(a,1]}
\]
and
\[
\mathcal{F}_{(0,a)} := \big\{(0,a)\cup A : A\in\mathcal{B}_{[a,1]}\big\}\cup\mathcal{B}_{[a,1]}.
\]
Show:

[Fig. 28.1. The graph of $x\mapsto M^0_t(x)$ for some fixed $t$.]

1. $\mathcal{F}_{(0,a]}$ and $\mathcal{F}_{(0,a)}$ are sub sigma algebras of $\mathcal{B}$ such that $\mathcal{F}_{(0,a]}\subsetneq\mathcal{F}_{(0,a)}$ and
\[
\mathcal{B} = \bigvee_{a\in(0,1]}\mathcal{F}_{(0,a]} = \bigvee_{a\in(0,1]}\mathcal{F}_{(0,a)}.
\]
2. For all $b\in(0,1]$,
\[
\mathcal{F}_{(0,b)} = \bigcap_{a<b}\mathcal{F}_{(0,a]} = \bigcap_{a<b}\mathcal{F}_{(0,a)}. \tag{28.14}
\]
3. $M^\alpha_{t+} = M^0_t$ for all $t\ge 0$ and $M^\alpha_{t-} = M^1_t$ for all $t>0$. In particular, the sample paths, $t\mapsto M^\alpha_{t+}(x)$, are right continuous and possess left limits for all $x\in\Omega$.

[Fig. 28.2. A typical sample path of $M^0_\cdot(x)$.]

4. $\mathcal{B}^\alpha_t = \mathcal{F}_{(0,e^{-t}]}$ if $\alpha(t) = 1$ and $\mathcal{B}^\alpha_t = \mathcal{F}_{(0,e^{-t})}$ if $\alpha(t) = 0$.
5. No matter how $\alpha$ is chosen, $\mathcal{B}^\alpha_{t+} = \mathcal{B}^0_t := \mathcal{F}_{(0,e^{-t})}$ for all $t\ge 0$.
6. $M^\alpha_t$ is a $\{\mathcal{B}^\alpha_t\}_{t\ge 0}$ martingale and in fact it is a $\{\mathcal{B}^\alpha_{t+} = \mathcal{B}^0_t\}_{t\ge 0}$ martingale.
7. The map, $[0,\infty)\times(0,1]\ni(t,x)\mapsto M^\alpha_t(x)\in\mathbb{R}_+$, is measurable iff $\{t\in[0,\infty) : \alpha(t) = 1\}\in\mathcal{B}_{\mathbb{R}_+}$.
8. Let
\[
N := \big\{x\in\Omega : M^\alpha_t(x)\ne M^\alpha_{t+}(x) \text{ for some } t\ge 0\big\}.
\]
Show $N = \{x : \alpha(|\ln x|) = 1\}$ and observe that $N$ is measurable iff $\alpha$ is measurable. Also observe that if $\alpha\equiv 1$, then $P(N) = 1$ and hence $M^\alpha_{t+}$ and $M^\alpha_t$ are certainly not indistinguishable, see Definition 25.2.
9. Show $\{M^\alpha_t\}_{t\ge 0}$ is not uniformly integrable.
10. Let $Z\in L^1(\Omega,\mathcal{B},P)$; find a version, $N^\alpha_t$, of $E[Z|\mathcal{B}^\alpha_t]$. Verify that for any sequence, $\{t_n\}_{n=1}^\infty\subset[1,\infty)$ with $t_n\uparrow\infty$, $N^\alpha_{t_n}\to Z$ almost surely and in $L^1(P)$ as $n\to\infty$.
Solution to Exercise (28.1).

1. It is routine to check that $\mathcal{F}_{(0,a]}\subsetneq\mathcal{F}_{(0,a)}\subset\mathcal{B}_{(0,1]}$ are sigma algebras. Since $\bigcup_{a\in(0,1]}\mathcal{F}_{(0,a]}$ contains all sets of the form, $\{(a,1] : 0\le a\le 1\}$, it generates $\mathcal{B}_{(0,1]}$. In fact this is most easily done by observing that $\mathcal{F}_{(0,a)}$ consists of those $A\in\mathcal{B}_{(0,1]}$ such that $(0,a)\subset A$ or $(0,a)\subset A^c$, with a similar characterization of $\mathcal{F}_{(0,a]}$.

2. Since $\mathcal{F}_{(0,a]}\subset\mathcal{F}_{(0,a)}$, $\mathcal{F}_{(0,a]}$ and $\mathcal{F}_{(0,a)}$ are increasing as $a$ decreases, and $\mathcal{F}_{(0,b)}\subset\mathcal{F}_{(0,a]}$ if $a<b$,
\[
\mathcal{F}_{(0,b)} \subset \bigcap_{a<b}\mathcal{F}_{(0,a]} \subset \bigcap_{a<b}\mathcal{F}_{(0,a)}.
\]
Now suppose $\tilde{A}\in\bigcap_{a<b}\mathcal{F}_{(0,a)}$ and $a_n<b$ with $a_n\uparrow b$. Then for each $n$, there exists an $A_n\in\mathcal{B}_{[a_n,1]}$ such that $\tilde{A} = (0,a_n)\cup A_n$ or $\tilde{A} = A_n$. There are now only two alternatives: either 1) $\tilde{A} = (0,a_n)\cup A_n$ for all $n$, or 2) $\tilde{A} = A_n$ for all $n$. In the first case, we must have
\[
(0,b) = \bigcup_{n=1}^\infty(0,a_n)\subset\tilde{A}
\]
and therefore $\tilde{A} = (0,b)\cup A_\infty$ for some $A_\infty\in\mathcal{B}_{[b,1]}$, and thus $\tilde{A}\in\mathcal{F}_{(0,b)}$. In the second case, we know that $\tilde{A}\cap(0,a_n) = \emptyset$ for all $n$ and therefore $\tilde{A}\cap(0,b) = \emptyset$, from which it again follows that $\tilde{A}\in\mathcal{F}_{(0,b)}$. Therefore we have shown
\[
\mathcal{F}_{(0,b)} \subset \bigcap_{a<b}\mathcal{F}_{(0,a]} \subset \bigcap_{a<b}\mathcal{F}_{(0,a)} \subset \mathcal{F}_{(0,b)}
\]
which implies Eq. (28.14).
3. Let $x\in\Omega = (0,1]$ be fixed and observe that
\[
M^\alpha_t(x) = e^t\big[1_{t<-\ln(x)} + \alpha(t)1_{t=-\ln(x)}\big] = e^t\big[1_{t<|\ln(x)|} + \alpha(|\ln(x)|)1_{t=|\ln(x)|}\big].
\]
Therefore, no matter the value of $\alpha(|\ln(x)|)$, $M^\alpha_t(x) = 0$ if $t>|\ln(x)|$ and $M^\alpha_t(x) = e^t$ if $t<|\ln(x)|$. Hence it follows that
\[
M^\alpha_{t+}(x) = e^t 1_{t<-\ln(x)} = M^0_t(x) \tag{28.15}
\]
and
\[
M^\alpha_{t-}(x) = e^t 1_{t\le-\ln(x)} = M^1_t(x). \tag{28.16}
\]

4. Since $e^{-t}M^0_t(x) = 1_{0<x<e^{-t}}$ and $e^{-t}M^1_t(x) = 1_{0<x\le e^{-t}}$, the reader may easily show, $\mathcal{B}^0_t = \mathcal{F}_{(0,e^{-t})}$ and $\mathcal{B}^1_t = \mathcal{F}_{(0,e^{-t}]}$ for all $t\ge 0$. Also, no matter how $\alpha$ is chosen, $M^1_s = M^\alpha_{s-}$ is $\mathcal{B}^\alpha_t$ measurable for all $0<s\le t$, and since $M^1_0\equiv 1$ we may conclude that $M^1_0$ is $\mathcal{B}^\alpha_0\subset\mathcal{B}^\alpha_t$ measurable as well. It is also simple to verify $M^\alpha_t$ is $\mathcal{F}_{(0,e^{-t})}$ measurable for all $t\ge 0$. From these observations we may conclude, no matter how $\alpha$ is chosen, that
\[
\mathcal{F}_{(0,e^{-t}]} = \mathcal{B}^1_t \subset \mathcal{B}^\alpha_t \subset \mathcal{F}_{(0,e^{-t})}.
\]
As $M^\alpha_s$ is $\mathcal{F}_{(0,e^{-t}]}$ measurable for all $s<t$ and $M^\alpha_t$ is $\mathcal{F}_{(0,e^{-t}]}$ measurable if $\alpha(t) = 1$, we may conclude, $\mathcal{B}^\alpha_t = \mathcal{F}_{(0,e^{-t}]}$ if $\alpha(t) = 1$. On the other hand, if $\alpha(t) = 0$, then $\mathcal{B}^\alpha_t$ contains $\mathcal{F}_{(0,e^{-t}]}$ and all $M^0_t$ measurable functions. This then implies that
\[
\mathcal{F}_{(0,e^{-t})} = \sigma(\{e^{-t}\})\vee\mathcal{F}_{(0,e^{-t}]} \subset \mathcal{B}^\alpha_t \subset \mathcal{F}_{(0,e^{-t})}
\]
which forces $\mathcal{B}^\alpha_t = \mathcal{F}_{(0,e^{-t})}$ if $\alpha(t) = 0$.

5. By items 2. and 4.,
\[
\mathcal{B}^0_t = \mathcal{B}^1_{t+} \subset \mathcal{B}^\alpha_{t+} \subset \mathcal{B}^0_{t+} = \mathcal{B}^0_t
\]
from which it follows that $\mathcal{B}^\alpha_{t+} = \mathcal{B}^0_t := \mathcal{F}_{(0,e^{-t})}$ for all $t\ge 0$.
6. If $0\le s\le t$ and $A\in\mathcal{B}^0_s = \mathcal{F}_{(0,e^{-s})}$, then
\[
E[M^\alpha_t : A] = \begin{cases} 0 & \text{if } A\cap(0,e^{-s}) = \emptyset \\ 1 & \text{if } (0,e^{-s})\subset A.\end{cases}
\]
Since this is constant in $t$ for $t\ge s$ and $A\in\mathcal{B}^0_s$, we see that $M^\alpha_t$ is a $\{\mathcal{B}^0_t\}_{t\ge 0}$ martingale.

7. Since $(t,x)\mapsto M^\alpha_t(x)$ is measurable iff $(t,x)\mapsto e^{-t}M^\alpha_t(x)$ is measurable, and as the latter function is an indicator function, this is equivalent to the set, $B := \{(t,x) : e^{-t}M^\alpha_t(x) = 1\}$, being measurable. Now,
\begin{align*}
B &= \{(t,x) : 1_{0<x<e^{-t}} + \alpha(t)1_{x=e^{-t}} = 1\} \\
&= \{(t,x) : 1_{0<x<e^{-t}} = 1\}\cup\{(t,x) : \alpha(t)1_{x=e^{-t}} = 1\} \\
&= \{(t,x) : 1_{0<x<e^{-t}} = 1\}\cup A
\end{align*}
where $A := \{(t,e^{-t}) : \alpha(t) = 1\}\subset[0,\infty)\times(0,1]$. As $A$ is disjoint from the measurable set, $\{(t,x) : 1_{0<x<e^{-t}} = 1\}$, it follows that $B$ is measurable iff $A$ is measurable. If $A$ is measurable, let $\psi:\mathbb{R}_+\to\mathbb{R}_+\times(0,1]$ be the continuous map defined by $\psi(t) = (t,e^{-t})$. Then $\{t : \alpha(t) = 1\} = \psi^{-1}(A)$ is measurable and because $\alpha$ is an indicator function this shows $\alpha$ is measurable. Conversely if $\alpha$ is measurable, then
\[
A = \big\{(t,e^{-t}) : t\ge 0\big\}\cap\big(\{t : \alpha(t) = 1\}\times(0,1]\big)
\]
which is measurable.

8. Since $M^\alpha_{t+}(x) = M^0_t(x)$, we see that $M^\alpha_t(x)\ne M^0_t(x)$ can only happen if $x = e^{-t}$ and $\alpha(t) = 1$, i.e. only if
\[
1 = \alpha(-\ln x) = \alpha(|\ln x|).
\]

9. Since $M^\alpha_t\to 0$ as $t\to\infty$ while $EM^\alpha_t = 1$ for all $t$ (see Figure 28.1), $\{M^\alpha_t\}_{t\ge 0}$ can not be uniformly integrable.

10. Let $Z\in L^1(\Omega,\mathcal{B},P)$; one easily shows that
\[
E[Z|\mathcal{B}^0_t](x) = 1_{0<x<e^{-t}}\,e^t\int_0^{e^{-t}}Z(y)\,dy + 1_{e^{-t}\le x\le 1}\,Z(x)
\]
and
\[
E[Z|\mathcal{B}^1_t](x) = 1_{0<x\le e^{-t}}\,e^t\int_0^{e^{-t}}Z(y)\,dy + 1_{e^{-t}<x\le 1}\,Z(x)
\]
will do the trick.
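The moral of items 9. and 10. can also be seen numerically. The following sketch (my own illustration, not from the text) takes $\alpha\equiv 0$, so that $M_t(x) = e^t 1_{0<x<e^{-t}}$, and checks that $EM_t = 1$ for each fixed $t$ even though $M_t(x)\to 0$ pointwise as $t\to\infty$ — the failure of uniform integrability in item 9.

```python
import math
import random

def M(t, x):
    """M^0_t(x) = e^t on (0, e^{-t}) and 0 elsewhere (the alpha = 0 case)."""
    return math.exp(t) if 0 < x < math.exp(-t) else 0.0

random.seed(3)
xs = [random.random() for _ in range(200_000)]  # samples from Lebesgue measure on (0,1)
means = []
for t in (0.0, 0.5, 1.0):
    mean = sum(M(t, x) for x in xs) / len(xs)
    means.append(mean)
    assert abs(mean - 1.0) < 0.05               # E[M_t] = e^t * e^{-t} = 1 exactly

# pointwise convergence to 0: M(t, x) = 0 once t > -ln(x)
assert M(10.0, 0.37) == 0.0
```

The mass of $M_t$ concentrates on the shrinking interval $(0,e^{-t})$ while its height $e^t$ grows, which is exactly the escape of mass that uniform integrability forbids.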
Definition 28.18 (Upcrossings). Let $\{x_t\}_{t\in\mathbb{T}}$ be a real valued function which is right continuous if $\mathbb{T} = \mathbb{R}_+$. Given $-\infty<a<b<\infty$, $T\in\mathbb{T}$, and a finite subset, $F$, of $\mathbb{T}\cap[0,T]$, let $U^x_F(a,b)$ denote the number of upcrossings of $\{x_t\}_{t\in F}$ across $[a,b]$, see Section 18.6. Also let
\[
U^x_T(a,b) := \sup\big\{U^x_F(a,b) : F\subset_f\mathbb{T}\cap[0,T]\big\} \tag{28.17}
\]
be the number of upcrossings of $\{x_t\}_{t\in\mathbb{T}\cap[0,T]}$ across $[a,b]$.
Lemma 28.19. If $\mathbb{T} = \mathbb{D}$ and $\{F_n\}_{n=1}^\infty$ is a sequence of finite subsets of $\mathbb{D}\cap[0,T]$ such that $F_n\uparrow\mathbb{D}\cap[0,T]$, then
\[
U^x_T(a,b) = \lim_{n\to\infty}U^x_{F_n}(a,b). \tag{28.18}
\]
In particular, $U^X_T(a,b)$ is a $\mathcal{B}_T$ measurable random variable when $\mathbb{T} = \mathbb{D}$.

Proof. It is clear that $U^x_{F_n}(a,b)\le U^x_T(a,b)$ for all $n$ and that $U^x_{F_n}(a,b)$ is increasing with $n$; therefore the limit in Eq. (28.18) exists and satisfies, $\lim_{n\to\infty}U^x_{F_n}(a,b)\le U^x_T(a,b)$. Moreover, for any $F\subset_f\mathbb{D}\cap[0,T]$ we may find an $n\in\mathbb{N}$ sufficiently large so that $F\subset F_n$. For this $n$ we will have
\[
U^x_F(a,b)\le U^x_{F_n}(a,b)\le\lim_{n\to\infty}U^x_{F_n}(a,b).
\]
Taking the supremum over all $F\subset_f\mathbb{D}\cap[0,T]$ in this estimate then shows
\[
U^x_T(a,b)\le\lim_{n\to\infty}U^x_{F_n}(a,b).
\]
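The finite-index count $U^x_F(a,b)$ used above is straightforward to compute directly. Here is a small sketch (my own illustration, not from the text), following the discrete definition from Section 18.6: an upcrossing is completed each time the path, having been at or below $a$, subsequently reaches $b$ or above.

```python
def upcrossings(xs, a, b):
    """Number of completed upcrossings of [a, b] by the finite sequence xs."""
    count, below = 0, False
    for x in xs:
        if not below and x <= a:
            below = True               # path has touched level a (or lower)
        elif below and x >= b:
            count += 1                 # ... and has now risen to level b
            below = False
    return count

path = [0.5, -0.2, 1.3, 0.4, -0.1, 2.0, 1.5]
assert upcrossings(path, 0.0, 1.0) == 2
# monotonicity in F, as used in Lemma 28.19: a subsequence upcrosses no more often
assert upcrossings(path[::2], 0.0, 1.0) <= upcrossings(path, 0.0, 1.0)
```

The monotonicity assertion is the elementary fact behind the supremum in Eq. (28.17) being a limit along any exhausting sequence $F_n$.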
Remark 28.20. It is easy to see that if $\mathbb{T} = \mathbb{R}_+$, $x_t$ is right continuous, and $a<\alpha<\beta<b$, then
\[
U^x_T(a,b)\le\sup\big\{U^x_F(\alpha,\beta) : F\subset_f\mathbb{D}\cap[0,T]\big\}.
\]
Lemma 28.21. Let $T\in\mathbb{R}_+$ and let $\{x_t\}_{t\in\mathbb{D}}$ be a real valued function such that $U^x_T(a,b)<\infty$ for all $-\infty<a<b<\infty$ with $a,b\in\mathbb{Q}$. Then
\[
x_{t-} := \lim_{\mathbb{D}\ni s\uparrow t}x_s \text{ exists in } \bar{\mathbb{R}} \text{ for } t\in(0,T] \tag{28.19}
\]
and
\[
x_{t+} := \lim_{\mathbb{D}\ni s\downarrow t}x_s \text{ exists in } \bar{\mathbb{R}} \text{ for } t\in[0,T). \tag{28.20}
\]
Moreover, if we let $U^x_\infty(a,b) = \lim_{T\to\infty}U^x_T(a,b)$ and further assume that $U^x_\infty(a,b)<\infty$ for all $-\infty<a<b<\infty$ with $a,b\in\mathbb{Q}$, then $x_\infty := \lim_{t\to\infty}x_t$ exists in $\bar{\mathbb{R}}$ as well.

Proof. I will only prove the statement in Eq. (28.19) since all of the others are similar. If $x_{t-}$ does not exist in $\bar{\mathbb{R}}$, then we can find $a,b\in\mathbb{Q}$ such that
\[
\liminf_{\mathbb{D}\ni s\uparrow t}x_s < a < b < \limsup_{\mathbb{D}\ni s\uparrow t}x_s.
\]
From the definition of the liminf and the limsup, it follows that for every $\delta\in(0,t)$ there are infinitely many $s\in(t-\delta,t)\cap\mathbb{D}$ such that $x_s<a$ and infinitely many $s\in(t-\delta,t)\cap\mathbb{D}$ such that $x_s>b$. From this observation it is easy to see that
\[
\infty = U^x_t(a,b)\le U^x_T(a,b).
\]
Lemma 28.22. Suppose that $\mathbb{T} = \mathbb{D}$, $(S,\rho)$ is a metric space, and $\{x_t\in S\}_{t\in\mathbb{D}}$.

1. If for all $t\in\mathbb{R}_+$,
\[
x^+_t := x_{t+} = \lim_{\mathbb{D}\ni s\downarrow t}x_s \text{ exists in } S,
\]
then $\mathbb{R}_+\ni t\mapsto x^+_t\in S$ is right continuous.
2. If we further assume that
\[
x_{t-} := \lim_{\mathbb{D}\ni s\uparrow t}x_s \text{ exists in } S
\]
for all $t>0$, then $\lim_{\tau\uparrow t}x^+_\tau = x_{t-}$ for all $t>0$.
3. Moreover, if $\lim_{\mathbb{D}\ni t\to\infty}x_t$ exists in $S$, then again $\lim_{t\to\infty}x_{t+} = \lim_{\mathbb{D}\ni t\to\infty}x_t$.

Proof. 1. Suppose $t\in\mathbb{R}_+$ and $\varepsilon>0$ is given. By assumption, there exists $\delta>0$ such that for $s\in(t,t+\delta)\cap\mathbb{D}$, we have $\rho(x_{t+},x_s)\le\varepsilon$. Therefore if $\tau\in(t,t+\delta)$, then
\[
\rho(x_{t+},x^+_\tau) = \lim_{\mathbb{D}\ni s\downarrow\tau}\rho(x_{t+},x_s)\le\varepsilon
\]
from which it follows that $x^+_\tau\to x_{t+}$ as $\tau\downarrow t$.

2. Now suppose $t>0$ is such that $x_{t-}$ exists in $S$. Then for all $\varepsilon>0$ there exists a $\delta>0$ such that $\rho(x_{t-},x_s)\le\varepsilon$ if $s\in(t-\delta,t)\cap\mathbb{D}$. Hence, if $\tau\in(t-\delta,t)$ we may conclude,
\[
\rho(x_{t-},x^+_\tau) = \lim_{\mathbb{D}\ni s\downarrow\tau}\rho(x_{t-},x_s)\le\varepsilon
\]
from which it follows that $x^+_\tau\to x_{t-}$ as $\tau\uparrow t$.

3. Now suppose $x_\infty := \lim_{\mathbb{D}\ni s\to\infty}x_s$ exists in $S$. Then for every $\varepsilon>0$, there exists $M = M(\varepsilon)<\infty$ such that $\rho(x_\infty,x_s)\le\varepsilon$ if $s\in\mathbb{D}\cap(M,\infty)$. Hence if $t\in(M,\infty)$ we have
\[
\rho(x_\infty,x_{t+}) = \lim_{\mathbb{D}\ni s\downarrow t}\rho(x_\infty,x_s)\le\varepsilon
\]
from which we conclude that $\lim_{t\to\infty}x_{t+}$ exists in $S$ and is equal to $x_\infty$.
Theorem 28.23 (Doob's upcrossing inequality). Let $\{X_t\}_{t\in\mathbb{D}}$ be a submartingale and $-\infty<a<b<\infty$. Then for all $T\in\mathbb{D}$,
\[
E\big[U^X_T(a,b)\big] \le \frac{1}{b-a}\big[E(X_T-a)_+ - E(X_0-a)_+\big]. \tag{28.21}
\]

Proof. Let $\{F_n\}_{n=1}^\infty$ be a sequence as in Lemma 28.19 and assume without loss of generality that $0,T\in F_n$ for all $n$. It then follows from Theorem 18.51 that
\[
E\big[U^X_{F_n}(a,b)\big] \le \frac{1}{b-a}\big[E(X_T-a)_+ - E(X_0-a)_+\big] \quad\forall\, n\in\mathbb{N}.
\]
By letting $n\to\infty$, Eq. (28.21) follows from this inequality, Lemma 28.19, and the MCT.
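A numerical sanity check of Eq. (28.21) (my own illustration, not from the text; the walk length and the band $[1,3]$ are arbitrary choices): for the submartingale $X = |S|$, with $S$ a simple random walk, the mean number of upcrossings of $[1,3]$ in $100$ steps should not exceed $\big[E(X_T-1)_+ - E(X_0-1)_+\big]/(3-1)$.

```python
import random

def upcrossings(xs, a, b):
    """Completed upcrossings of [a, b] by the finite sequence xs."""
    count, below = 0, False
    for x in xs:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            count, below = count + 1, False
    return count

random.seed(2)
n_steps, n_paths, a, b = 100, 4_000, 1.0, 3.0
mean_up = mean_rhs = 0.0
for _ in range(n_paths):
    s, path = 0, [0]
    for _ in range(n_steps):
        s += random.choice((-1, 1))
        path.append(abs(s))                     # X = |S| is a submartingale
    mean_up += upcrossings(path, a, b)
    mean_rhs += max(path[-1] - a, 0.0) - max(path[0] - a, 0.0)
mean_up /= n_paths                              # estimate of E[U^X_T(a, b)]
mean_rhs /= n_paths                             # estimate of E(X_T-a)_+ - E(X_0-a)_+
assert mean_up <= mean_rhs / (b - a)            # Eq. (28.21)
```

The inequality is not tight here; a fair amount of the right side is spent on the final, incomplete rise of the path, which the upcrossing count ignores.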
Theorem 28.24. Let $\{X_t\}_{t\in\mathbb{D}}$ be a submartingale,
\[
\Omega_0 := \bigcap_{T\in\mathbb{N}}\Big[\Big\{\sup_{\mathbb{D}\ni t\le T}|X_t|<\infty\Big\}\cap\big\{U^X_T(a,b)<\infty : \forall\, a<b \text{ with } a,b\in\mathbb{Q}\big\}\Big], \tag{28.22}
\]
and for all $t\in\mathbb{R}_+$,
\[
Y_t := \limsup_{\mathbb{D}\ni s\downarrow t}X_s \quad\text{and}\quad \tilde{X}_t := Y_t 1_{|Y_t|<\infty}. \tag{28.23}
\]
Then;

1. $P(\Omega_0) = 1$.
2. On $\Omega_0$, $\sup_{t\le T}|X_t|<\infty$ and $X_{t+}$ and $X_{t-}$ exist for all $t\in\mathbb{R}_+$, where by convention $X_{0-} := X_0$.
3. $\{X_{t+}(\omega)\}_{t\in\mathbb{R}_+}$ is right continuous with left hand limits for all $\omega\in\Omega_0$.
4. For any $t\in\mathbb{R}_+$ and any sequence $\{s_n\}_{n=1}^\infty\subset\mathbb{D}\cap(t,\infty)$ such that $s_n\downarrow t$, we have $X_{s_n}\to X_{t+}$ in $L^1(P)$ as $n\to\infty$.
5. The process $\{\tilde{X}_t\}_{t\in\mathbb{R}_+}$ is a $\{\mathcal{B}^+_t\}_{t\ge 0}$ submartingale such that $t\mapsto\tilde{X}_t$ is right continuous and has left limits on $\Omega_0$.
6. $X_t\le E[\tilde{X}_t|\mathcal{B}_t]$ a.s. for all $t\in\mathbb{D}$, with equality at some $t\in\mathbb{D}$ iff $\lim_{\mathbb{D}\ni s\downarrow t}EX_s = EX_t$.
7. If $X_s\xrightarrow{P}X_t$ as $\mathbb{D}\ni s\downarrow t$ at some $t\in\mathbb{D}$,³ then $\tilde{X}_t = X_t$ a.s.
8. If $C := \sup_{t\in\mathbb{D}}E|X_t|<\infty$ (or equivalently $\sup_{t\in\mathbb{D}}EX^+_t<\infty$), then $X_\infty := \lim_{\mathbb{D}\ni t\to\infty}\tilde{X}_t = \lim_{\mathbb{D}\ni t\to\infty}X_t$ exists in $\mathbb{R}$ a.s. and $E|X_\infty|\le C<\infty$. Note: if $\{X^+_t\}_{t\in\mathbb{D}}$ is uniformly integrable then $\sup_{t\in\mathbb{D}}E[X^+_t]<\infty$.
9. $\{X^+_t\}_{t\in\mathbb{D}}$ is uniformly integrable iff there exists $X_\infty\in L^1(\Omega,\mathcal{B},P)$ such that $\{X_t\}_{t\in\mathbb{D}\cup\{\infty\}}$ is a submartingale. In other words, $\{X^+_t\}_{t\in\mathbb{D}}$ is uniformly integrable iff there exists $X_\infty\in L^1(\Omega,\mathcal{B},P)$ such that $X_t\le E[X_\infty|\mathcal{B}_t]$ a.s. for all $t\in\mathbb{D}$.
Proof. 1.–3. The fact that $P(\Omega_0) = 1$ follows from Doob's upcrossing inequality and the maximal inequality in Eq. (28.11). The assertions in items 2. and 3. are now a consequence of the definition of $\Omega_0$ and Lemmas 28.21 and 28.22.

4. Let $Y_n := X_{s_n}$ and $\mathcal{F}_n := \mathcal{B}_{s_n}$ for $n\in\mathbb{N}$. Then $\{(Y_n,\mathcal{F}_n)\}_{n\in\mathbb{N}}$ is a backwards submartingale such that $\inf_n EY_n\ge EX_t$, and hence by Theorem 18.75, $Y_n = X_{s_n}\to X_{t+}$ in $L^1(P)$ as $n\to\infty$.

5. Since $\tilde{X}_t = X_{t+}$ on $\Omega_0$ and $X_{t+}$ is right continuous with left hand limits, $\tilde{X}$ has these properties on $\Omega_0$ as well. Now let $0\le s<t<\infty$ and $s_n,t_n\in\mathbb{D}$ be such that $s_n\downarrow s$, $t_n\downarrow t$ with $s_n<t$ for all $n$. Then by item 4. and the submartingale property of $X$,
\[
E[\tilde{X}_t-\tilde{X}_s : A] = E[X_{t+}-X_{s+} : A] = \lim_{n\to\infty}E[X_{t_n}-X_{s_n} : A]\ge 0
\]
for all $A\in\mathcal{B}_{s+}$.

³ For example, this will hold if $\lim_{\mathbb{D}\ni s\downarrow t}E|X_t-X_s| = 0$.
6. Let $A\in\mathcal{B}_t$ and $t_n\in\mathbb{D}$ with $t_n\downarrow t\in\mathbb{D}$. Then
\[
E[\tilde{X}_t : A] = \lim_{n\to\infty}E[X_{t_n} : A]\ge E[X_t : A].
\]
Since $A\in\mathcal{B}_t$ is arbitrary it follows that $X_t\le E[\tilde{X}_t|\mathcal{B}_t]$ a.s. If equality holds, then, taking $A = \Omega$ above, we find
\[
EX_t = E\tilde{X}_t = \lim_{n\to\infty}E[X_{t_n}].
\]
Since $t_n\in\mathbb{D}$ with $t_n\downarrow t$ was arbitrary, we may conclude that $\lim_{\mathbb{D}\ni s\downarrow t}EX_s = EX_t$. Conversely if $\lim_{\mathbb{D}\ni s\downarrow t}EX_s = EX_t$, then along any sequence, $s_n\in\mathbb{D}$ with $s_n\downarrow t$, we have
\[
EX_t = \lim_{n\to\infty}EX_{s_n} = E\lim_{n\to\infty}X_{s_n} = E\tilde{X}_t = E\,E[\tilde{X}_t|\mathcal{B}_t].
\]
As $X_t\le E[\tilde{X}_t|\mathcal{B}_t]$ a.s., this identity implies $X_t = E[\tilde{X}_t|\mathcal{B}_t]$ a.s.

7. Let $t_n\in\mathbb{D}$ be such that $t_n\downarrow t$. Then, as we have already seen, $X_{t_n}\to\tilde{X}_t$ in $L^1(P)$. However by assumption, $X_{t_n}\xrightarrow{P}X_t$, and therefore we must have $\tilde{X}_t = X_t$ a.s. since limits in probability are unique up to null sets.
The proof or items 8. and 9. will closely mimic their discrete versions given
in Corollary 18.54.
8. The proof here mimics closely the discrete version given in Corollary 18.54.
For any < a < b < , Doobs upcrossing inequality (Theorem 28.23) and
the MCT implies,
E
_
U
X

(a, b)

= lim
DT
E
_
U
X
T
(a, b)

1
b a
_
sup
TD
E(X
T
a)
+
E(X
0
a)
+
_
<
where
U
X

(a, b) = lim
DT
U
X
T
(a, b)
is the total number of upcrossings of X across [a, b] . In particular it follows
that

0
:=
_
U
X

(a, b) < : a, b with a < b


_
has probability one. Hence by Lemma 28.21, for
0
we have X

() :=
lim
Dt
X
t
() exists in

1. By Fatous lemma with | t
n
, it follows
that
E[[X

[] = E
_
liminf
n
[X
n
[
_
liminf
n
E[[X
n
[] C <
and therefore that X

1 a.s.
9. If
_
X
+
t
_
t0
is uniformly integrable, then, by Vitallis convergence The-
orem 12.44 and the fact that X
+
t
X
+

a.s. (as we have already shown),


X
+
t
X
+

in L
1
(P) . Therefore for A B
t
we have, by Fatous lemma, that
E[X
t
1
A
] limsup
Ds
E[X
s
1
A
] = limsup
Ds
_
E
_
X
+
s
1
A

E
_
X

s
1
A
_
= E
_
X
+

1
A

liminf
Ds
E
_
X

s
1
A

E
_
X
+

1
A

E
_
liminf
Ds
X

s
1
A
_
= E
_
X
+

1
A

E
_
X

1
A

= E[X

1
A
] .
Since A B
t
was arbitrary we may conclude that X
t
E[X

[B
t
] a.s. for all
t 1
+
.
Conversely if we suppose that X
t
E[X

[B
t
] a.s. for all t 1
+
, then
by Jensens inequality, X
+
t
E[X
+

[B
t
] and therefore
_
X
+
t
_
t0
is uniformly
integrable by Proposition 18.8 and Exercise 12.5.
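The upcrossing counts $U^X_T(a,b)$ appearing in item 8. are straightforward to compute along a finite stretch of a sample path. The following sketch (our own illustration; the function name and the discretization of the path are not from the text) counts completed upcrossings of $[a,b]$:

```python
def upcrossings(path, a, b):
    """Count the completed upcrossings of [a, b] by the finite sequence
    `path`: each passage from a value <= a to a later value >= b counts
    as one upcrossing."""
    count, below = 0, False
    for x in path:
        if x <= a:
            below = True          # the path has dipped to or below a
        elif x >= b and below:
            count += 1            # ... and has now risen to or above b
            below = False
    return count
```

For instance, the path $0, 2, 0, 2$ makes two upcrossings of $[1/2, 3/2]$. Doob's upcrossing inequality bounds the expectation of such a count by $\sup_T E(X_T - a)^+/(b-a)$, which is the finiteness used in item 8.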
Example 28.25. In this example we show that there exists a right continuous submartingale, $\{X_t\}_{t \ge 0}$, such that $\{X_{s_n}\}_{n=1}^\infty$ is not uniformly integrable for some bounded increasing sequence $\{s_n\}_{n=1}^\infty$. Indeed, let
$$X_t := M_{\tan\left(\frac{\pi}{2}(t \wedge 1)\right)},$$
where $\{M_t\}_{t \ge 0}$ is the martingale constructed in Exercise 28.1. Then it is easily checked that $\{X_t\}_{t \ge 0}$ is a $\{\mathcal{B}_{\tan(\frac{\pi}{2}(t \wedge 1))}\}_{t \ge 0}$ – submartingale. Moreover, if $s_n \in [0,1)$ with $s_n \uparrow 1$, the collection $\{X_{s_n}\}_{n=1}^\infty$ is not uniformly integrable, for if it were we would have
$$1 = \lim_{n \to \infty} E X_{s_n} = E\Big[\lim_{n \to \infty} X_{s_n}\Big] = E[0] = 0.$$
In particular this shows that in item 4. of Theorem 28.24 we can not suppose $\{s_n\}_{n=1}^\infty \subset [0,t)$ with $s_n \uparrow t$.
Exercise 28.2. If $\{X_t\}_{t \ge 0}$ is a right continuous submartingale on a filtered probability space, $(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P)$, then $s \to E X_s$ is right continuous at each $t$ and $X$ is a $\{\mathcal{B}_t^+\}$ – submartingale.
Solution to Exercise (28.2). Let $\{t_n\}_{n=1}^\infty$ be any decreasing sequence in $(t, \infty)$ such that $t_n \downarrow t$ as $n \to \infty$. By Lemma 28.4, we know $Y := \lim_{n \to \infty} X_{t_n}$ exists a.s. and in $L^1(P)$. As $X_{t_n} \to X_t$ as $n \to \infty$, it follows that $X_{t_n} \to X_t$ in $L^1(P)$ and therefore $\lim_{n \to \infty} E X_{t_n} = E X_t$. Since $\{t_n\}_{n=1}^\infty \subset (t, \infty)$ with $t_n \downarrow t$ was an arbitrary sequence, we may conclude that $t \to E X_t$ is right continuous. Similarly, if $0 \le s < t < \infty$ and $s_n \in (s,t)$ with $s_n \downarrow s$, then for any $B \in \mathcal{B}_{s+}$ we have
$$E[X_t - X_s : B] = \lim_{n \to \infty} E[X_t - X_{s_n} : B] \ge 0.$$
Exercise 28.3. Let $\{X_t\}_{t \ge 0}$ be a submartingale on a filtered probability space, $(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P)$, and $t \in \mathbb{R}_+$. If $X_s \overset{P}{\to} X_t$ as $s \downarrow t$, then $s \to E X_s$ is right continuous at $t$.
Solution to Exercise (28.3). Let $\{t_n\}_{n=1}^\infty$ be a decreasing sequence in $(t, \infty)$ such that $t_n \downarrow t$ as $n \to \infty$. By Lemma 28.4, we know $Y := \lim_{n \to \infty} X_{t_n}$ exists a.s. and in $L^1(P)$. As $X_{t_n} \overset{P}{\to} X_t$ as $n \to \infty$, it follows that $X_{t_n} \to X_t$ in $L^1(P)$ and a.s. Therefore $\lim_{n \to \infty} E X_{t_n} = E X_t$. Since $\{t_n\}_{n=1}^\infty \subset (t, \infty)$ with $t_n \downarrow t$ was an arbitrary sequence, we may conclude that $s \to E X_s$ is right continuous at $t$.
Theorem 28.26 (Regularizing Submartingales). Let $\{X_t\}_{t \ge 0}$ be a submartingale on a filtered probability space, $(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P)$, and let $\Omega_0$ and $\tilde{X}_t$ be as in Theorem 28.24 applied to $\{X_t\}_{t \in D}$. Further let
$$\hat{X}_t(\omega) := \begin{cases} \tilde{X}_t(\omega) & \text{if } \omega \in \Omega_0 \\ 0 & \text{if } \omega \notin \Omega_0, \end{cases}$$
and $\{\hat{\mathcal{B}}_{t+} := \mathcal{B}_{t+} \vee \mathcal{N}\}_{t \ge 0}$, where $\mathcal{N}$ is the collection of $P$ – null subsets of $\Omega$ in $\mathcal{B}$. Then:

1. $\{\hat{X}_t\}_{t \ge 0}$ is a $\{\hat{\mathcal{B}}_{t+}\}_{t \ge 0}$ – submartingale which is right continuous with left hand limits.
2. $E[\hat{X}_t | \mathcal{B}_t] \ge X_t$ a.s. for all $t \in \mathbb{R}_+$, with equality holding for all $t \in \mathbb{R}_+$ iff $t \to E X_t$ is right continuous.
3. If $\{\mathcal{B}_t\}_{t \ge 0}$ is right continuous, then $\hat{X}_t \ge X_t$ a.s. for all $t \in \mathbb{R}_+$, with equality holding for all $t \in \mathbb{R}_+$ iff $t \to E X_t$ is right continuous.

In particular, if $\{X_t\}_{t \ge 0}$ is right continuous in probability, then $\{X_t\}_{t \ge 0}$ has a right continuous modification possessing left hand limits, $\{\hat{X}_t\}_{t \ge 0}$, such that $\{\hat{X}_t\}_{t \ge 0}$ is a $\{\hat{\mathcal{B}}_{t+}\}_{t \ge 0}$ – submartingale.
Proof. 1. Since $\Omega_0^c \in \mathcal{N}$ and $\tilde{X}_t$ is $\mathcal{B}_{t+}$ – measurable, it follows that $\hat{X}_t$ is $\hat{\mathcal{B}}_{t+}$ – measurable. Hence $\{\hat{X}_t\}_{t \ge 0}$ is an adapted process. Since $\hat{X}$ is a modification of $\tilde{X}$, which is already a $\{\hat{\mathcal{B}}_{t+}\}_{t \ge 0}$ – submartingale (see Lemma 28.15), it follows that $\hat{X}$ is also a $\{\hat{\mathcal{B}}_{t+}\}_{t \ge 0}$ – submartingale.

2. Since $\hat{X}_t = \tilde{X}_t$ a.s., we may replace $\hat{X}$ by $\tilde{X}$ in statement 2. We now need only follow the proof of item 6. in Theorem 28.24. Indeed, if $t \in \mathbb{R}_+$, $A \in \mathcal{B}_t$, and $t_n \in D$ with $t_n \downarrow t$, then
$$E\big[\tilde{X}_t : A\big] = \lim_{n \to \infty} E[X_{t_n} : A] \ge E[X_t : A].$$
Since $A \in \mathcal{B}_t$ was arbitrary, it follows that
$$X_t \le E\big[\tilde{X}_t | \mathcal{B}_t\big] = E\big[\hat{X}_t | \mathcal{B}_t\big] \quad \text{a.s.} \tag{28.24}$$
If equality holds in Eq. (28.24), then $E X_t = E \hat{X}_t$, which is right continuous in $t$ by Exercise 28.2. Conversely, if $t \to E X_t$ is right continuous, it follows from item 6. of Theorem 28.24 that $\tilde{X}_t = X_t$ a.s.

3. Since $\hat{X}_t = \tilde{X}_t$ a.s. and $\tilde{X}_t$ is $\mathcal{B}_{t+} = \mathcal{B}_t$ – measurable, we find
$$\hat{X}_t = \tilde{X}_t = E\big[\tilde{X}_t | \mathcal{B}_t\big] = E\big[\hat{X}_t | \mathcal{B}_t\big] \quad \text{a.s.}$$
With this observation (i.e. $\hat{X}_t = E[\hat{X}_t | \mathcal{B}_t]$ a.s.), the assertions in item 3. follow directly from those in item 2.

Now suppose that $\{X_t\}_{t \ge 0}$ is right continuous in probability. By Proposition 28.16 and Lemma 28.15, $\{X_t\}_{t \ge 0}$ is a $\{\hat{\mathcal{B}}_{t+}\}_{t \ge 0}$ – submartingale such that, by Exercise 28.3, $t \to E X_t$ is right continuous. Therefore, by items 1. and 3. of the theorem, $\hat{X}$ defined above is the desired modification of $X$.
Example 28.27. Let $\{N_t\}_{t \ge 0}$ be a Poisson process with parameter $\lambda$ as described in Example 25.8. Since $\{N_t\}_{t \ge 0}$ has independent increments, it follows that the compensated process $\{N_t - \lambda t\}_{t \ge 0}$ is a $\{\mathcal{B}_t := \sigma(N_s : s \le t)\}_{t \ge 0}$ – martingale. By Example 22.14 we know that $E|N_t - N_s| = \lambda |t - s|$, and in particular $t \to N_t$ is continuous in probability. Hence it follows from Theorem 28.26 that there is a modification, $\tilde{N}$, of $N$ such that $\{\tilde{N}_t - \lambda t\}_{t \ge 0}$ is a $\{\hat{\mathcal{B}}_{t+}\}_{t \ge 0}$ – martingale whose sample paths are right continuous and possess left hand limits.

The ideas of this example generalize significantly to produce good modifications of large classes of Markov processes; see for example [8, Theorem I.9.4 on p. 46], [20] and [38]. See [49, Chapter I] where this is carried out in the context of Lévy processes. We end this section with another version of the optional sampling theorem.
Theorem 28.28 (Optional sampling II). Suppose $\{X_t\}_{t \ge 0}$ is a right continuous submartingale on a filtered probability space, $(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P)$, such that $\{X_t^+\}_{t \ge 0}$ is uniformly integrable. Then for any two optional times, $\sigma$ and $\tau$, $X_\tau \in L^1(P)$ and
$$X_{\sigma \wedge \tau} \le E\big[X_\tau | \mathcal{B}_\sigma^+\big]. \tag{28.25}$$
In particular, if $\{M_t\}_{t \ge 0}$ is a right continuous uniformly integrable martingale, then
$$M_{\sigma \wedge \tau} = E\big[M_\tau | \mathcal{B}_\sigma^+\big]. \tag{28.26}$$
Proof. Let $X_\infty := \lim_{D \ni t \uparrow \infty} X_t \in L^1(P)$ be as in Theorem 28.24, so that $X_t \le E[X_\infty | \mathcal{B}_t]$ for all $t \in D$. For $t \in \mathbb{R}_+$, let $\{t_n\}_{n=1}^\infty \subset D \cap (t, \infty)$ be such that $t_n \downarrow t$; then by Corollary 18.77,
$$X_t = \lim_{n \to \infty} X_{t_n} \le \lim_{n \to \infty} E[X_\infty | \mathcal{B}_{t_n}] = E\big[X_\infty | \mathcal{B}_t^+\big] \quad \text{a.s.}$$
Conditioning this inequality on $\mathcal{B}_t$ also allows us to conclude that $X_t \le E[X_\infty | \mathcal{B}_t]$.⁴ We may now reduce the inequality in Eq. (28.25) to the case in Theorem 28.9, where $\tau$ is a bounded stopping time, by simply identifying $[0, \pi/2]$ with $[0, \infty]$ via the map $t \to \tan t$. More precisely, let $\{Y_t\}_{0 \le t \le \pi/2}$ be the right continuous $\{\tilde{\mathcal{B}}_t := \mathcal{B}_{\tan t}\}_{0 \le t \le \pi/2}$ – submartingale defined by $Y_t := X_{\tan t}$. As $\tan^{-1}(\sigma)$ and $\tan^{-1}(\tau)$ are two bounded $\{\tilde{\mathcal{B}}_t\}_{0 \le t \le \pi/2}$ – optional times, we may apply Theorem 28.9 to find
$$X_{\sigma \wedge \tau} = Y_{\tan^{-1}(\sigma) \wedge \tan^{-1}(\tau)} \le E\Big[Y_{\tan^{-1}(\tau)} \big| \tilde{\mathcal{B}}^+_{\tan^{-1}(\sigma)}\Big] = E\big[X_\tau | \mathcal{B}_\sigma^+\big] \quad \text{a.s.}$$
For the martingale assertions, simply apply Eq. (28.25) with $X_t = M_t$ and $X_t = -M_t$.

⁴ According to Exercise 28.2, $\{X_t\}_{t \ge 0}$ is also a $\{\mathcal{B}_t^+\}$ – submartingale. Therefore, for the purposes of this theorem, there is no loss in generality in assuming that $\mathcal{B}_t^+ = \mathcal{B}_t$.
Part VI
Markov Processes II
29
The Strong Markov Property
The main theme of this part of the book is that many Markov processes (including all the ones we have seen) satisfy a stronger version of the Markov property than we proved in Theorems 17.4 and 17.14. Our first goal is to describe these stronger Markov properties, which involve stopping times. We will then show, in different classes of examples, how this strong Markov property implies a number of interesting properties of the corresponding Markov process.

Our immediate goal is to extend the Markov property in Theorem 17.14 to allow for restarting the processes at random times. As in Chapter 17 we will assume that $T = \mathbb{N}_0$ or $\mathbb{R}_+$. We will start with a simple extension of the Markov property associated to a stopping time with countable range. We will then use a limiting procedure in order to handle general optional times.

To keep the setup relatively general but not overly general, in this part of the book we will assume that $(S, \rho)$ is a complete separable metric space and let $\mathcal{S}$ denote the Borel $\sigma$ – algebra on $S$. Further suppose that $\{Q_t : S \times \mathcal{S} \to [0,1]\}_{t \in T}$ are time homogeneous Markov transition kernels. As usual we let $S^T$ denote the set of all functions, $\omega : T \to S$, and $\pi_t : S^T \to S$ will denote the projection, $\pi_t(\omega) = \omega(t)$.
Definition 29.1 (RC). Let $RC(T,S)$ denote those $\omega \in S^T$ which are right continuous, i.e. $\omega(t) = \omega(t+) := \lim_{s \downarrow t} \omega(s)$ for all $t \in T$. Similarly we let $C(T,S)$ denote those $\omega \in S^T$ which are continuous.

Notation 29.2 (Path Spaces) Let $\Omega$ denote any of the three path spaces, $S^T$, $RC(T,S)$ or $C(T,S)$, and write in all cases $\pi_t$ for $\pi_t|_\Omega$ and $\mathcal{F}_t := \sigma(\pi_s : s \le t)$, $\mathcal{F}_t^+ := \mathcal{F}_{t+}$, and $\mathcal{F} := \vee_{t \ge 0} \mathcal{F}_t = \sigma(\pi_s : 0 \le s < \infty)$.
Assumption 3 To each $x \in S$ we will assume there exists a probability measure, $P_x$, on $(\Omega, \mathcal{F})$ such that $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \{\pi_t\}_{t \ge 0}, P_x)$ is a time homogeneous Markov process with transition kernels $\{Q_t\}_{t \ge 0}$ such that $P_x(\pi_0 = x) = 1$.

From Theorem 17.11, we know that Assumption 3 automatically holds when $\Omega = S^T$. On the other hand, if $\Omega$ is equal to $RC(T,S)$ or $C(T,S)$ and $T = \mathbb{R}_+$, the existence of the Markov measures $\{P_x\}_{x \in S}$ requires additional restrictions on the Markov semi-group, $\{Q_t\}_{t \in T}$. Our starting point for this chapter is the following minor variant of Theorem 17.14.
Theorem 29.3 (The time homogeneous Markov property). Suppose that $(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P)$ is a filtered probability space and $X_t : \Omega \to S$, $t \ge 0$, are adapted functions such that $\{X_t\}_{t \ge 0}$ is a time homogeneous Markov process with transition kernels $\{Q_t\}_{t \in T}$. If $F : S^T \to \mathbb{R}$ is bounded and $\mathcal{F}/\mathcal{B}_{\mathbb{R}}$ – measurable and $t \in T$, then $S \ni x \to E_x F$ is $\mathcal{S}/\mathcal{B}_{\mathbb{R}}$ – measurable and
$$E_P[F(X_{t+\cdot}) | \mathcal{B}_t] = E_P[F(X_{t+\cdot}) | X_t] = E_{X_t}[F], \ P \text{ – a.s.} \tag{29.1}$$
We omit the proof since it is virtually identical to the proof of Theorem 17.14.
29.1 The denumerable strong Markov property
In this section we are going to prove a strong form of the Markov property which
is useful for discrete time Markov chains. It will be the stepping stone to the
more general strong Markov property in the continuous time case considered in
Theorem 29.6.
Theorem 29.4 (The denumerable strong Markov property). Suppose again that $(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, \{X_t\}_{t \ge 0}, P)$ is a time homogeneous Markov process with transition kernels $\{Q_t\}_{t \in T}$ and $\sigma : \Omega \to [0, \infty]$ is a $\{\mathcal{B}_t\}_{t \ge 0}$ – stopping time with countable range, i.e. $\sigma(\Omega)$ is finite or countable. If $F : S^T \to \mathbb{R}$ is bounded and $\mathcal{F}/\mathcal{B}_{\mathbb{R}}$ – measurable, then
$$E[F(X_{\sigma+\cdot}) | \mathcal{B}_\sigma] = E_{X_\sigma} F, \ P \text{ – a.s. on } \{\sigma < \infty\}. \tag{29.2}$$

Proof. Let $T_0 := \sigma(\Omega) \setminus \{\infty\} = \sigma(\Omega) \cap \mathbb{R}_+$. If $B \in \mathcal{S}$ we have
$$\{X_\sigma \in B\} \cap \{\sigma = t\} = \{X_t \in B\} \cap \{\sigma = t\} \in \mathcal{B}_t \ \text{for all } t \in T_0,$$
from which it follows that $X_\sigma$ is $\mathcal{B}_\sigma/\mathcal{S}$ – measurable on $\{\sigma < \infty\}$ by item 7. of Lemma 27.16. Since $x \to E_x F$ is $\mathcal{S}/\mathcal{B}_{\mathbb{R}}$ – measurable by Theorem 29.3, we may now conclude that $1_{\{\sigma < \infty\}} E_{X_\sigma} F$ is $\mathcal{B}_\sigma/\mathcal{B}_{\mathbb{R}}$ – measurable.

By Theorem 29.3, if $A \in \mathcal{B}_\sigma$ we find,
$$\begin{aligned}
E\big[F(X_{\sigma+\cdot}) 1_{\sigma < \infty} 1_A\big] &= E\Big[\sum_{t \in T_0} F(X_{t+\cdot}) 1_{A \cap \{\sigma = t\}}\Big] = \sum_{t \in T_0} E\big[F(X_{t+\cdot}) 1_{A \cap \{\sigma = t\}}\big] \\
&\overset{(*)}{=} \sum_{t \in T_0} E\big[E_{X_t} F \cdot 1_{A \cap \{\sigma = t\}}\big] = E\Big[\sum_{t \in T_0} 1_{\sigma = t} [E_{X_t} F] 1_A\Big] \\
&= E\big[[E_{X_\sigma} F] 1_A 1_{\sigma < \infty}\big],
\end{aligned}$$
where in $(*)$ we have used Lemma 27.16 again to see that $A \cap \{\sigma = t\} \in \mathcal{B}_t$ for all $t$. Since $A \in \mathcal{B}_\sigma$ is arbitrary and $1_{\sigma < \infty} E_{X_\sigma} F$ is $\mathcal{B}_\sigma$ – measurable, it follows that
$$E\big[F(X_{\sigma+\cdot}) 1_{\sigma < \infty} | \mathcal{B}_\sigma\big] = 1_{\sigma < \infty} E_{X_\sigma} F, \ P \text{ – a.s.,}$$
which is precisely Eq. (29.2).
Let us write out this theorem more explicitly in the case of a discrete time Markov chain.

Corollary 29.5. Suppose $\{X_t\}_{t \in \mathbb{N}_0}$ is a discrete time Markov chain in a countable or finite state space $S$. If $\sigma$ is a stopping time and $x \in S$, then, conditioned on $\{\sigma < \infty$ and $X_\sigma = x\}$, $\mathcal{B}_\sigma$ and $\{X_{\sigma+t} : t \ge 0\}$ are independent, and $\{X_{\sigma+t} : t \ge 0\}$ has the same distribution as $\{X_t\}_{t \ge 0}$ under $P_x$.
Proof. Let $g : \Omega \to \mathbb{R}$ be a bounded $\mathcal{B}_\sigma$ – measurable function and $f : S^{\mathbb{N}_0} \to \mathbb{R}$ be a bounded $\mathcal{S}^{\mathbb{N}_0}$ – measurable function. Then
$$\begin{aligned}
E[g \cdot f(X_{\sigma+\cdot}) : \sigma < \infty \ \& \ X_\sigma = x] &= E[(1_{X_\sigma = x}\, g)\, f(X_{\sigma+\cdot}) : \sigma < \infty] \\
&= E[(1_{X_\sigma = x}\, g)\, E_{X_\sigma}[f(X_\cdot)] : \sigma < \infty] \\
&= E[(1_{X_\sigma = x}\, g)\, E_x[f(X_\cdot)] : \sigma < \infty] \\
&= E_x[f(X_\cdot)] \cdot E[g : \sigma < \infty \ \& \ X_\sigma = x],
\end{aligned}$$
and therefore,
$$E[g\, f(X_{\sigma+\cdot}) \,|\, \sigma < \infty \ \& \ X_\sigma = x] = \frac{E[g : \sigma < \infty \ \& \ X_\sigma = x]}{P[\sigma < \infty \ \& \ X_\sigma = x]}\, E_x f = E(g \,|\, \sigma < \infty \ \& \ X_\sigma = x) \cdot E_x f.$$
Taking $g = 1$ in this equation then shows
$$E[f(X_{\sigma+\cdot}) \,|\, \sigma < \infty \ \& \ X_\sigma = x] = E_x f,$$
which combined with the previously displayed equation completes the proof of the corollary.
In Chapter 30 we will see that Corollary 29.5 allows us to say a fair bit more about the behavior of discrete time Markov chains. In fact, you may jump directly to Chapter 30 now if you wish.
29.2 The strong Markov property in continuous time
Let us now assume that $T = \mathbb{R}_+$. We would like to remove the countable range restriction on the stopping time $\sigma$ appearing in Theorem 29.4. In order to do this we are going to need to impose some continuity conditions on our Markov processes and their transition kernels.
Theorem 29.6 (Strong Markov Property). Let $\Omega$ be either $RC(T,S)$ or $C(T,S)$, let $(\bar\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P)$ be a filtered probability space, and let $\{X_t\}_{t \ge 0}$ be a time homogeneous Markov process with Markov transition kernels $\{Q_t\}_{t \ge 0}$. Suppose that $\sigma$ is a $\{\mathcal{B}_t\}$ – optional time (see Definition 27.8). We further assume that $X_\cdot(\bar\omega) \in \Omega$ (i.e. $(t \to X_t(\bar\omega)) \in \Omega$) for all $\bar\omega \in \bar\Omega$, and assume the existence of a multiplicative system, $\mathbb{M} \subset BC(S)$, such that $\sigma(\mathbb{M}) = \mathcal{S}$ and $Q_t \mathbb{M} \subset \mathbb{M}$ for all $t \in T$. Then for any $F : \Omega \to \mathbb{R}$ which is bounded and $\mathcal{F}$ – measurable, we have
$$E\big[F(X_{\sigma+\cdot}) | \mathcal{B}_\sigma^+\big] = E_{X_\sigma}[F], \ P \text{ – a.s. on } \{\sigma < \infty\}. \tag{29.3}$$
Part of the assertion here is that $1_{\{\sigma < \infty\}} E_{X_\sigma}[F]$ is $\mathcal{B}_\sigma^+$ – measurable. Moreover, if $\sigma$ is a stopping time, then $1_{\{\sigma < \infty\}} E_{X_\sigma}[F]$ is $\mathcal{B}_\sigma$ – measurable and
$$E\big[F(X_{\sigma+\cdot}) | \mathcal{B}_\sigma^+\big] = E[F(X_{\sigma+\cdot}) | \mathcal{B}_\sigma] = E_{X_\sigma}[F], \ P \text{ – a.s. on } \{\sigma < \infty\}. \tag{29.4}$$
Proof. The proof will be divided into five steps.

1. By Lemmas 27.6 and 27.14 we know that $X_\sigma$ is $\mathcal{B}_\sigma^+$ – measurable on $\{\sigma < \infty\}$, which combined with the measurability of $x \to E_x F$ (see Theorem 29.3) implies $1_{\{\sigma < \infty\}} E_{X_\sigma}[F]$ is $\mathcal{B}_\sigma^+$ – measurable. If we further assume that $\sigma$ is a stopping time, the same reasoning implies that $1_{\{\sigma < \infty\}} E_{X_\sigma}[F]$ is $\mathcal{B}_\sigma$ – measurable. With this measurability issue out of the way, it follows from the tower property of conditional expectations that Eq. (29.3) implies Eq. (29.4). So we now embark on the proof of Eq. (29.3).

2. Let $\{\sigma_n\}_{n=1}^\infty$ be the discrete stopping times defined in Eq. (27.5) of Lemma 27.18. Recall that $\{\sigma = \infty\} = \{\sigma_n = \infty\}$, $\sigma_n \downarrow \sigma$ as $n \to \infty$, and $\mathcal{B}_\sigma^+ \subset \mathcal{B}_{\sigma_n}$ for all $n$. Therefore if $A \in \mathcal{B}_\sigma^+ \subset \mathcal{B}_{\sigma_n}$, it follows from Theorem 29.4 that
$$E\big[F(X_{\sigma_n+\cdot}) 1_{\sigma < \infty} 1_A\big] = E\big[F(X_{\sigma_n+\cdot}) 1_{\sigma_n < \infty} 1_A\big] = E\big[1_{\sigma_n < \infty} \big[E_{X_{\sigma_n}} F\big] 1_A\big] = E\big[\big[E_{X_{\sigma_n}} F\big] 1_{\sigma < \infty} 1_A\big]. \tag{29.5}$$
Our goal now is to let $n \to \infty$ in Eq. (29.5). However, we are not going to be able to take this limit for general $F \in \mathcal{F}_b$, and hence the need for three more steps.
3. For the moment suppose that
$$F = \prod_{i=1}^n f_i \circ \pi_{t_i} \tag{29.6}$$
(i.e. $F(\omega) = \prod_{i=1}^n f_i(\omega(t_i))$) for some $f_i \in \mathbb{M}$ and times $0 \le t_1 < t_2 < \dots < t_n$. For $x \in S$ one shows by induction that
$$E_x[F] = \big(Q_{t_1} M_{f_1} Q_{t_2 - t_1} M_{f_2} \dots M_{f_{n-1}} Q_{t_n - t_{n-1}} f_n\big)(x), \tag{29.7}$$
where $M_f$ denotes the operation of multiplication by $f$ on $\mathbb{M}$, i.e. $M_f g = fg$ for all $g \in \mathbb{M}$. Since $\mathbb{M}$ is multiplicative and $Q_t \mathbb{M} \subset \mathbb{M}$ for all $t$, it follows from Eq. (29.7) that $(S \ni x \to E_x F) \in \mathbb{M} \subset C(S)$. The continuity of $E_{(\cdot)} F$ along with the right continuity of $t \to X_t$ then implies
$$E_{X_{\sigma_n}} F \to E_{X_\sigma} F \ \text{boundedly as } n \to \infty. \tag{29.8}$$
We also have that
$$F(X_{\sigma_n+\cdot}) = \prod_{i=1}^n f_i(X_{\sigma_n + t_i}) \to \prod_{i=1}^n f_i(X_{\sigma + t_i}) = F(X_{\sigma+\cdot}) \ \text{as } n \to \infty. \tag{29.9}$$
Therefore for $F$ as in Eq. (29.6) we may use the DCT to pass to the limit in Eq. (29.5) to find
$$E\big[F(X_{\sigma+\cdot}) 1_{\sigma < \infty} 1_A\big] = E\big[[E_{X_\sigma} F] 1_{\sigma < \infty} 1_A\big]. \tag{29.10}$$
4. Let $\mathbb{H}$ denote those $F \in \mathcal{F}_b$ such that Eq. (29.10) is valid and let $\tilde{\mathbb{M}}$ denote those $F \in \mathcal{F}_b$ which are of the form in Eq. (29.6). We have just proved that $\tilde{\mathbb{M}} \subset \mathbb{H}$. The reader may now easily check that $1 \in \mathbb{H}$, that $\mathbb{H}$ is a linear subspace which is closed under bounded convergence, and that $\tilde{\mathbb{M}}$ is a multiplicative system. Therefore an application of the multiplicative systems Theorem 8.2 implies that $\mathbb{H}$ contains all $\sigma(\tilde{\mathbb{M}})$ – bounded measurable functions.

5. To finish the proof we must now show $\sigma(\tilde{\mathbb{M}}) = \mathcal{F}$, and for this it suffices to show that $\pi_t$ is $\sigma(\tilde{\mathbb{M}})$ – measurable for all $t \in \mathbb{R}_+$. Since $f \circ \pi_t \in \tilde{\mathbb{M}}$ for all $f \in \mathbb{M}$, it follows that $\mathbb{M} \subset \mathbb{H}$, where $\mathbb{H}$ now denotes those $f \in \mathcal{S}_b$ such that $f \circ \pi_t$ is $\sigma(\tilde{\mathbb{M}})$ – measurable. Again the reader may easily check that $1 \in \mathbb{H}$ and that $\mathbb{H}$ is a linear subspace which is closed under bounded convergence. Therefore another application of the multiplicative systems Theorem 8.2 shows that $\mathbb{H} = \mathcal{S}_b$. In particular, if $B \in \mathcal{S}$ then $1_B \circ \pi_t$ is $\sigma(\tilde{\mathbb{M}})$ – measurable, and therefore $\pi_t^{-1}(B) \in \sigma(\tilde{\mathbb{M}})$ for all $B \in \mathcal{S}$. This shows that $\pi_t$ is $\sigma(\tilde{\mathbb{M}})/\mathcal{S}$ – measurable for all $t \in \mathbb{R}_+$.
Remark 29.7. We can also see that $X_\sigma$ is $\mathcal{B}_\sigma^+/\mathcal{S}$ – measurable on $\{\sigma < \infty\}$ from the fact that $X_\sigma = \lim_{n \to \infty} X_{\sigma_n}$. In order to do this, first observe that if $A \in \cap_{n=1}^\infty \mathcal{B}_{\sigma_n}$ and $t > 0$, then
$$A \cap \{\sigma < t\} = \cup_{n=1}^\infty \big(A \cap \{\sigma_n < t\}\big) \in \mathcal{B}_t,$$
from which it follows that $A \in \mathcal{B}_\sigma^+$. Therefore for any $x \in S$ we have $\rho(x, X_\sigma) = \lim_{n \to \infty} \rho(x, X_{\sigma_n})$ on $\{\sigma < \infty\}$, where the limit is necessarily $\cap_{n=1}^\infty \mathcal{B}_{\sigma_n}$ – measurable on $\{\sigma < \infty\}$. Therefore it follows that $\rho(x, X_\sigma)$ is $\mathcal{B}_\sigma^+$ – measurable on $\{\sigma < \infty\}$ for all $x \in S$, and so by Exercise 27.1 it follows that $X_\sigma$ is $\mathcal{B}_\sigma^+/\mathcal{S}$ – measurable on $\{\sigma < \infty\}$.
29.3 Examples
In this section we will verify that the examples of continuous time Markov processes we have considered so far all satisfy the hypotheses of Theorem 29.6 and therefore have the strong Markov property.
Example 29.8 (Poisson Process). From the construction given in Theorem 11.15 we know that the Poisson process may be chosen to have right continuous paths in $S = \mathbb{N}_0$. From Example 17.20, the conservative Markov semi-group of the Poisson process is given by
$$Q_t f(x) = \sum_{n=0}^\infty \frac{(\lambda t)^n}{n!} e^{-\lambda t} f(x+n)$$
for all bounded functions, $f : S \to \mathbb{R}$. We may now take $\mathbb{M} = BC(S)$, the bounded continuous functions on $S$. (As continuity is no restriction, we really have that $\mathbb{M}$ consists of all bounded functions on $S$.) Since
$$|Q_t f(x)| \le \sum_{n=0}^\infty \frac{(\lambda t)^n}{n!} e^{-\lambda t} |f(x+n)| \le \sum_{n=0}^\infty \frac{(\lambda t)^n}{n!} e^{-\lambda t} \|f\|_u = \|f\|_u,$$
it follows that $Q_t f$ is bounded whenever $f$ is bounded. Therefore $Q_t \mathbb{M} \subset \mathbb{M}$ for all $t \ge 0$.

More generally, the same argument works for any right continuous Markov chain with a discrete state space.
Proposition 29.9. Suppose that $S$ is a countable or finite state space and $\{Q_t : S \times S \to [0,1]\}_{t \ge 0}$ is a time homogeneous Markov semi-group. (In this case $\mathcal{S} = 2^S$.) If $\mathbb{M}$ denotes the bounded (necessarily continuous) functions on $S$, then $Q_t \mathbb{M} \subset \mathbb{M}$ for all $t \ge 0$.

Proof. As for all probability kernels, we have $\|Q_t f\|_u \le \|f\|_u$, so that $Q_t \mathbb{M} \subset \mathbb{M}$ for all $t \ge 0$.
The next result is the continuous time Markov chain analogue of Corollary 29.5 for discrete time Markov chains.

Corollary 29.10. Let $S$ and $\{Q_t\}_{t \ge 0}$ be as in Proposition 29.9, $\Omega = RC(T,S)$, and suppose that $\{X_t\}_{t \ge 0}$ is an associated right continuous Markov chain. If $\sigma$ is an optional time and $j \in S$, then, conditioned on $\{\sigma < \infty$ and $X_\sigma = j\}$, $\mathcal{B}_\sigma^+$ and $\{X_{t+\sigma} : t \ge 0\}$ are independent, and $\{X_{t+\sigma} : t \ge 0\}$ has the same distribution as $\{X_t\}_{t \ge 0}$ under $P_j$.
Proof. Let $g$ be a bounded $\mathcal{B}_\sigma^+$ – measurable function and $f : \Omega \to \mathbb{R}$ be a bounded $\mathcal{F}$ – measurable function. Then by Theorem 29.6,
$$\begin{aligned}
E[g \cdot f(X_{\sigma+\cdot}) : \sigma < \infty \ \& \ X_\sigma = j] &= E\big[E\big[f(X_{\sigma+\cdot})\, g\, 1_{\sigma < \infty} 1_{X_\sigma = j} | \mathcal{B}_\sigma^+\big]\big] \\
&= E\big[g\, 1_{\sigma < \infty} 1_{X_\sigma = j} E\big[f(X_{\sigma+\cdot}) | \mathcal{B}_\sigma^+\big]\big] \\
&= E\big[g\, 1_{\sigma < \infty} 1_{X_\sigma = j} E_{X_\sigma} f\big] = E\big[g\, 1_{\sigma < \infty} 1_{X_\sigma = j} E_j f\big] \\
&= E\big[g\, 1_{\sigma < \infty} 1_{X_\sigma = j}\big]\, E_j f,
\end{aligned}$$
and therefore,
$$E[g\, f(X_{\sigma+\cdot}) \,|\, \sigma < \infty \ \& \ X_\sigma = j] = \frac{E\big[g\, 1_{\sigma < \infty} 1_{X_\sigma = j}\big]}{P[\sigma < \infty \ \& \ X_\sigma = j]}\, E_j f = E(g \,|\, \sigma < \infty \ \& \ X_\sigma = j)\, E_j f.$$
Taking $g = 1$ in this equation then shows
$$E[f(X_{\sigma+\cdot}) \,|\, \sigma < \infty \ \& \ X_\sigma = j] = E_j f,$$
which combined with the previously displayed equation completes the proof of the corollary.
Example 29.11 (Bounded Rate Markov Chains). Suppose that $S$ is a countable set and $a : S \times S \to \mathbb{R}$ is a function such that $a(x,y) \ge 0$ for all $x \ne y$, and there exists $\lambda < \infty$ such that
$$a_x := \sum_{y \ne x} a(x,y) \le \lambda \ \text{for all } x \in S.$$
Let us now define $a(x,x) = -a_x$ and $A : \mathcal{S}_b \to \mathcal{S}_b$ by
$$Af(x) := \sum_{y \in S} a(x,y) f(y) = \sum_{y \ne x} a(x,y) [f(y) - f(x)] \ \text{for all } x \in S,$$
and $Q_t = e^{tA}$. It was shown in Corollary 17.27 that $\{Q_t\}_{t \ge 0}$ defines a conservative Markov semi-group and that there is a right continuous Markov process associated to this semi-group. In light of Proposition 29.9 and Theorem 29.6, this Markov process has the strong Markov property.
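For a finite state space one can compute $Q_t = e^{tA}$ directly and check that it really is a Markov matrix (nonnegative entries, rows summing to one) and that the semi-group law holds. The generator below is a made-up example, and the truncated Taylor series is our own quick substitute for a production matrix exponential routine:

```python
import numpy as np

# A hypothetical bounded-rate generator on S = {0, 1, 2}: off-diagonal
# entries a(x, y) >= 0 and a(x, x) = -a_x, so every row sums to zero.
A = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.0, 0.4, -0.4]])

def Q(t, A, terms=60):
    """Truncated Taylor series for Q_t = e^{tA}; accurate for modest t*||A||."""
    result, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ (t * A) / k
        result = result + term
    return result
```

One finds, for example, that `Q(0.3, A) @ Q(0.5, A)` agrees with `Q(0.8, A)` to machine precision, as the semi-group law $Q_s Q_t = Q_{s+t}$ requires.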
Example 29.12 (Brownian Motion). Suppose that $S = \mathbb{R}^d$ and
$$Q_t(x, dy) = p_t(x,y)\, dy = p_t(y - x)\, dy,$$
where
$$p_t(x,y) := \Big(\frac{1}{2\pi t}\Big)^{d/2} \exp\Big(-\frac{1}{2t}|x - y|^2\Big) \tag{29.11}$$
and $|x|^2 = \sum_{i=1}^d x_i^2$. As you showed in Exercise 17.8, $\{Q_t\}_{t \ge 0}$ is a time homogeneous Markov semi-group (the associated Markov process being Brownian motion) which may be described as
$$(Q_t f)(x) = E\big[f\big(x + \sqrt{t} Z\big)\big], \tag{29.12}$$
where $Z \overset{d}{=} N(0, I)$. In this case we may take $\mathbb{M} = BC(S)$, the bounded continuous functions on $S$. Indeed, if $f \in BC(S)$, then from Eq. (29.12) and the DCT we see that $Q_t f$ is still continuous and that $\|Q_t f\|_u \le \|f\|_u$, so that $Q_t f \in \mathbb{M}$ whenever $f \in \mathbb{M}$. Since we know Brownian motion has continuous paths (Theorem 26.3), it follows from Theorem 29.6 that Brownian motion satisfies the strong Markov property.

Alternatively we could take
$$\mathbb{M} = \big\{f_{a,\lambda}(x) := a e^{i\lambda \cdot x} : \lambda \in \mathbb{R}^d \ \text{and} \ a > 0\big\}.$$
In this case we have
$$(Q_t f_{a,\lambda})(x) = a E\big[e^{i\lambda \cdot (x + \sqrt{t} Z)}\big] = a E\big[e^{i\lambda \cdot \sqrt{t} Z}\big] e^{i\lambda \cdot x} = a e^{-\frac{t}{2}|\lambda|^2} e^{i\lambda \cdot x} = f_{a e^{-\frac{t}{2}|\lambda|^2}, \lambda}(x).$$
In order to make use of this choice for $\mathbb{M}$, it is necessary to generalize Theorem 29.6 to the complex setting, which we leave to the interested reader.
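The eigenrelation $Q_t e^{i\lambda x} = e^{-t\lambda^2/2} e^{i\lambda x}$ can be confirmed numerically in one dimension by integrating the Gaussian kernel against $e^{i\lambda y}$ on a grid; the particular values of $t$, $\lambda$, and $x$ below are arbitrary choices of ours:

```python
import numpy as np

t, lam, x = 0.7, 1.3, 0.4
y = np.linspace(x - 12.0, x + 12.0, 200001)     # grid covering ~14 std devs
dy = y[1] - y[0]
p = np.exp(-(y - x) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)   # p_t(x, y)

lhs = np.sum(p * np.exp(1j * lam * y)) * dy     # (Q_t f_{1,lam})(x), Riemann sum
rhs = np.exp(-t * lam ** 2 / 2) * np.exp(1j * lam * x)
```

Because the integrand decays like a Gaussian, this crude rectangle rule is accurate to near machine precision, and `lhs` matches `rhs`; the same sum with $f \equiv 1$ confirms that $p_t(x, \cdot)$ integrates to one.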
29.4 Applications
Throughout this section we will continue the notation and hypotheses as in Theorem 29.6. To simplify the statements of the theorems we will further assume that $\bar\Omega = \Omega$, $\mathcal{B} = \mathcal{F}$, $\mathcal{B}_t = \mathcal{F}_t$, and $X_t = \pi_t$. That is, we are going to work in the canonical path space model for our Markov process. For the time being I will write $b\mathcal{B}$ for the bounded $\mathcal{B}$ – measurable functions on the path space $\Omega$ rather than $\mathcal{B}_b$. Let us introduce the shift operator, $\theta_t : \Omega \to \Omega$, defined by $\theta_t(\omega) := \omega(t + \cdot)$ for all $t \ge 0$. With this notation we may now write $Z \circ \theta_t$ for $Z(X_{t+\cdot})$.
Corollary 29.13. For all $Z \in b\mathcal{B}$ and $t \ge 0$,
$$E[Z | \mathcal{B}_{t+}] = E[Z | \mathcal{B}_t], \ P \text{ – a.s.} \tag{29.13}$$
(More precisely, if $U$ is any version of $E[Z | \mathcal{B}_{t+}]$ and $V$ is any version of $E[Z | \mathcal{B}_t]$, then $U = V$, $P$ – a.s.)

Proof. First suppose that $Z = G \cdot F \circ \theta_t$ with $F \in b\mathcal{B}$ and $G \in b\mathcal{B}_t$. For such a $Z$ we have, according to Theorem 29.6 with $\sigma \equiv t$ (a stopping time),
$$\begin{aligned}
E[Z | \mathcal{B}_{t+}] &= E[G \cdot F \circ \theta_t | \mathcal{B}_{t+}] = G \cdot E[F \circ \theta_t | \mathcal{B}_{t+}] \\
&= G \cdot E_{X_t}[F] = G \cdot E[F \circ \theta_t | \mathcal{B}_t] \\
&= E[G \cdot F \circ \theta_t | \mathcal{B}_t] = E[Z | \mathcal{B}_t] \ P \text{ – a.s.}
\end{aligned}$$
An application of the multiplicative systems Theorem 8.2 (or Theorem 8.16) then shows this identity remains valid for all $Z \in b\mathcal{B}$. (In applying Theorem 8.2, you may want to make use of the conditional version of the DCT.)
Corollary 29.14. If $A \in \mathcal{B}_{t+}$, there exists $\tilde{A} \in \mathcal{B}_t$ such that $P(A \triangle \tilde{A}) = 0$. In short, $\mathcal{B}_{t+}$ is equal to $\mathcal{B}_t$ modulo $P$ – null sets.

Proof. Let $A \in \mathcal{B}_{t+}$. Applying Corollary 29.13 to $Z = 1_A \in b\mathcal{B}_{t+}$ gives
$$Z = E[Z | \mathcal{B}_{t+}] = E[Z | \mathcal{B}_t] \ P \text{ – a.s.}$$
Letting $\tilde{Z} \in b\mathcal{B}_t$ be a fixed version of $E[Z | \mathcal{B}_t]$, we have $1_A = \tilde{Z}$ ($P$ – a.s.). Composing this equation with the function $\varphi(x) = (x \vee 0) \wedge 1$ shows that we may assume $\tilde{Z} = 1_{\tilde{A}}$, where $\tilde{A} = \{\tilde{Z} = 1\} \in \mathcal{B}_t$. This completes the proof since
$$P(\tilde{A} \triangle A) = E\big|1_A - 1_{\tilde{A}}\big| = 0.$$
Recall that if $(\Omega, \mathcal{B}, \{\mathcal{B}_t\}_{t \ge 0}, P)$ is a filtered probability space, the augmentation of a filtration, $\{\mathcal{B}_t\}_{t \ge 0}$, is the new filtration $\{\bar{\mathcal{B}}_t := \mathcal{B}_t \vee \mathcal{N}(P)\}_{t \ge 0}$, where
$$\mathcal{N}(P) := \{A \in \mathcal{B} : P(A) = 0\}$$
is the collection of $P$ – null sets in $\mathcal{B}$.
Corollary 29.15 (Augmented filtrations are right continuous). The augmentation, $\{\bar{\mathcal{B}}_t := \mathcal{B}_t \vee \mathcal{N}(P)\}_{t \ge 0}$, of a Brownian filtration, $\{\mathcal{B}_t\}_{t \ge 0}$, is automatically right continuous.

Proof. By Corollary 29.13 we know that $\mathcal{B}_t \subset \mathcal{B}_{t+} \subset \bar{\mathcal{B}}_t$, and therefore $\bar{\mathcal{B}}_t \subset \overline{(\mathcal{B}_{t+})} \subset \bar{\mathcal{B}}_t$, i.e. $\overline{(\mathcal{B}_{t+})} = \bar{\mathcal{B}}_t$. On the other hand, by Lemma 27.23, $\overline{(\mathcal{B}_{t+})} = \bar{\mathcal{B}}_{t+}$, so that $\bar{\mathcal{B}}_{t+} = \bar{\mathcal{B}}_t$. (Also see Example 29.16.)
More generally we have the following result.

Example 29.16 (Augmented filtrations are right continuous). Let $\nu$ be a probability measure on $S$. As in Notation ??, let $P_\nu := \int_S d\nu(x) P_x$ be the Wiener measure on $\Omega := C([0,\infty), S)$, let $B_t : \Omega \to S$ be the projection map, $B_t(\omega) = \omega(t)$, $\mathcal{B}_t = \sigma(B_s : s \le t)$, and $\mathcal{N}_{t+}(\nu) := \{N \in \mathcal{B}_{t+} : P_\nu(N) = 0\}$. Then by Corollary 29.13, $\mathcal{B}_{t+} = \mathcal{B}_t \vee \mathcal{N}_{t+}(\nu)$. Hence if we let
$$\mathcal{N}(\nu) := \{N \in \mathcal{B} : P_\nu(N) = 0\}$$
and
$$\hat{\mathcal{N}}(\nu) := \big\{B \in 2^\Omega : B \subset N \ \text{for some} \ N \in \mathcal{N}(\nu)\big\},$$
then $\mathcal{B}_{t+} \vee \mathcal{N}(\nu) = \mathcal{B}_t \vee \mathcal{N}(\nu) =: \bar{\mathcal{B}}_t$ and $\mathcal{B}_{t+} \vee \hat{\mathcal{N}}(\nu) = \mathcal{B}_t \vee \hat{\mathcal{N}}(\nu)$ for all $t \in \mathbb{R}_+$. This shows that the augmented Brownian filtration, $\{\mathcal{B}_t \vee \mathcal{N}(\nu)\}_{t \ge 0}$, is already right continuous and hence satisfies the weak usual hypothesis (see Definition 27.21). Similarly, the completed and augmented Brownian filtration, $\{\mathcal{B}_t \vee \hat{\mathcal{N}}(\nu)\}_{t \ge 0}$, satisfies the usual hypothesis.
Theorem 29.17 (Blumenthal 0 – 1 Law). The $\sigma$ – field, $\mathcal{B}_{0+}$, is $P_x$ – trivial for all $x \in S$.

Proof. By Corollary 29.14, if $A \in \mathcal{B}_{0+}$ there exists $\tilde{A} \in \mathcal{B}_0$ such that $P_x(A \triangle \tilde{A}) = 0$. In particular it follows that $P_x(A) = P_x(\tilde{A})$. On the other hand, $1_{\tilde{A}}$ is $\mathcal{B}_0$ – measurable and hence $1_{\tilde{A}} = F(X_0) = F(x)$ $P_x$ – a.s. for some $F \in \mathcal{S}_b$. Since $1_{\tilde{A}}$ is constant a.s., it must be either 0 or 1 a.s., and hence $P_x(A) = P_x(\tilde{A}) \in \{0, 1\}$.
30
Long Run Behavior of Discrete Markov Chains
(This chapter needs more editing and in particular should include restatements and proofs of theorems already covered.) In this chapter, $X_n$ will be a Markov chain with a finite or countable state space, $S$. To each state $i \in S$, let
$$R_i := \min\{n \ge 1 : X_n = i\} \tag{30.1}$$
be the first passage time of the chain to site $i$, and let
$$M_i := \sum_{n \ge 1} 1_{X_n = i} \tag{30.2}$$
be the number of visits of $\{X_n\}_{n \ge 1}$ to site $i$.
Definition 30.1. A state $j$ is accessible from $i$ (written $i \to j$) iff $P_i(R_j < \infty) > 0$, and $i \leftrightarrow j$ ($i$ communicates with $j$) iff $i \to j$ and $j \to i$. Notice that $i \to j$ iff there is a path, $i = x_0, x_1, \dots, x_n = j$ in $S$, such that $p(x_0, x_1) p(x_1, x_2) \dots p(x_{n-1}, x_n) > 0$.
Definition 30.2. For each $i \in S$, let $C_i := \{j \in S : i \leftrightarrow j\}$ be the communicating class of $i$. The state space, $S$, is partitioned into a disjoint union of its communicating classes.

Definition 30.3. A communicating class $C \subset S$ is closed provided the probability that $X_n$ leaves $C$, given that it started in $C$, is zero. In other words, $P_{ij} = 0$ for all $i \in C$ and $j \notin C$. (Notice that if $C$ is closed, then $X_n$ restricted to $C$ is a Markov chain.)
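For a finite chain, the partition into communicating classes can be computed mechanically from the transition matrix by a reachability search. The sketch below (function name and example matrix are ours; by convention each state is counted as accessible from itself, so the classes partition $S$ as in Definition 30.2) illustrates the idea:

```python
def communicating_classes(P):
    """Partition states {0, ..., n-1} into communicating classes of the
    transition matrix P (a list of rows): i <-> j iff each is reachable
    from the other through positive-probability steps."""
    n = len(P)
    reach = []                      # reach[i] = states reachable from i
    for i in range(n):
        seen, stack = {i}, [i]
        while stack:
            x = stack.pop()
            for y in range(n):
                if P[x][y] > 0 and y not in seen:
                    seen.add(y)
                    stack.append(y)
        reach.append(seen)
    classes, assigned = [], set()
    for i in range(n):
        if i not in assigned:
            cls = {j for j in reach[i] if i in reach[j]}
            classes.append(cls)
            assigned |= cls
    return classes

# A hypothetical 5-state chain: {0,1} leaks into {2,3}; {2,3} and {4} closed.
P = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.25, 0.0, 0.25],
     [0.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
```

Here `communicating_classes(P)` returns the three classes $\{0,1\}$, $\{2,3\}$, $\{4\}$, the first of which is not closed.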
Definition 30.4. A state $i \in S$ is:

1. transient if $P_i(R_i < \infty) < 1$,
2. recurrent if $P_i(R_i < \infty) = 1$,
   a) positive recurrent if $1/(E_i R_i) > 0$, i.e. $E_i R_i < \infty$,
   b) null recurrent if it is recurrent ($P_i(R_i < \infty) = 1$) and $1/(E_i R_i) = 0$, i.e. $E_i R_i = \infty$.

We let $S_t$, $S_r$, $S_{pr}$, and $S_{nr}$ be the transient, recurrent, positive recurrent, and null recurrent states, respectively.

The next two sections give the main results of this chapter along with some illustrative examples. The remaining sections are devoted to some of the more technical aspects of the proofs.
30.1 The Main Results
(Bruce: see Kallenberg [28, pages 148–158] for more information along the lines of what is to follow.)

Proposition 30.5 (Class properties). The notions of being recurrent, positive recurrent, null recurrent, or transient are all class properties. Namely, if $C \subset S$ is a communicating class, then either all $i \in C$ are recurrent, positive recurrent, null recurrent, or transient. Hence it makes sense to refer to $C$ as being either recurrent, positive recurrent, null recurrent, or transient.

Proof. See Proposition 30.13 for the assertion that being recurrent or transient is a class property. For the fact that positive and null recurrence is a class property, see Proposition 30.39 below.
Lemma 30.6. Let $C \subset S$ be a communicating class. Then
$$C \ \text{not closed} \implies C \ \text{is transient},$$
or equivalently put,
$$C \ \text{is recurrent} \implies C \ \text{is closed}.$$

Proof. If $C$ is not closed and $i \in C$, there is a $j \notin C$ such that $i \to j$, i.e. there is a path $i = x_0, x_1, \dots, x_n = j$, with all of the $\{x_k\}_{k=0}^n$ being distinct, such that
$$P_i(X_0 = i, X_1 = x_1, \dots, X_{n-1} = x_{n-1}, X_n = x_n = j) > 0.$$
Since $j \notin C$ we must have $j \not\to C$ (for if $j \to k$ for some $k \in C$, then $j \to i$ and hence $j \in C$), and therefore on the event
$$A := \{X_0 = i, X_1 = x_1, \dots, X_{n-1} = x_{n-1}, X_n = x_n = j\}$$
we have $X_m \notin C$ for all $m \ge n$. Therefore $R_i = \infty$ on the event $A$, which has positive probability.
Proposition 30.7. Suppose that $C \subset S$ is a finite communicating class and let $T = \inf\{n \ge 0 : X_n \notin C\}$ be the first exit time from $C$. If $C$ is not closed, then not only is $C$ transient but $E_i T < \infty$ for all $i \in C$. We also have the equivalence of the following statements:

1. $C$ is closed.
2. $C$ is positive recurrent.
3. $C$ is recurrent.

In particular, if $\#(S) < \infty$, then the recurrent (= positive recurrent) states are precisely the union of the closed communication classes and the transient states are what is left over.

Proof. These results follow fairly easily from Proposition ??. Also see Corollary 30.20 for another proof.

Remark 30.8. Let $\{X_n\}_{n=0}^\infty$ denote the fair random walk on $\{0, 1, 2, \dots\}$ with 0 being an absorbing state. The communication classes are $\{0\}$ and $\{1, 2, \dots\}$, with the latter class not being closed and hence transient. Using Remark ??, it follows that $E_i T = \infty$ for all $i > 0$, which shows we can not drop the assumption that $\#(C) < \infty$ in the first statement in Proposition 30.7. Similarly, using the fair random walk example, we see that it is not possible to drop the condition that $\#(C) < \infty$ for the equivalence statements as well.
Example 30.9. Let $P$ be the Markov matrix with jump diagram given in Figure 30.9. In this case the communication classes are $\{1,2\}$, $\{3,4\}$, and $\{5\}$. The latter two are closed and hence positively recurrent, while $\{1,2\}$ is transient.

Warning: if $C \subset S$ is closed and $\#(C) = \infty$, $C$ could be recurrent or it could be transient. Transient in this case means the walk goes off to infinity.

The following proposition is a consequence of the strong Markov property in Corollary 29.5 (or Corollary 30.35 below).
Proposition 30.10. If $j \in S$, $k \in \mathbb{N}$, and $\nu : S \to [0,1]$ is any probability on $S$, then
$$P_\nu(M_j \ge k) = P_\nu(R_j < \infty)\, P_j(R_j < \infty)^{k-1}. \tag{30.3}$$

Proof. Intuitively, $\{M_j \ge k\}$ happens iff the chain first visits $j$, with probability $P_\nu(R_j < \infty)$, and then revisits $j$ another $k-1$ times, the probability of each revisit being $P_j(R_j < \infty)$. Since Markov chains are forgetful, these probabilities are all independent and hence we arrive at Eq. (30.3). See Proposition 30.36 below for the formal proof based on the strong Markov property in Corollary 29.5 (or Corollary 30.35 below).
Corollary 30.11. If $j \in S$ and $\nu : S \to [0,1]$ is any probability on $S$, then
$$P_\nu(M_j = \infty) = P_\nu(X_n = j \ \text{i.o.}) = P_\nu(R_j < \infty)\, 1_{j \in S_r}, \tag{30.4}$$
$$P_j(M_j = \infty) = P_j(X_n = j \ \text{i.o.}) = 1_{j \in S_r}, \tag{30.5}$$
$$E_\nu M_j = \sum_{n=1}^\infty \sum_{i \in S} \nu(i) P^n_{ij} = \frac{P_\nu(R_j < \infty)}{1 - P_j(R_j < \infty)}, \tag{30.6}$$
and
$$E_i M_j = \sum_{n=1}^\infty P^n_{ij} = \frac{P_i(R_j < \infty)}{1 - P_j(R_j < \infty)}, \tag{30.7}$$
where the following conventions are used in interpreting the right hand side of Eqs. (30.6) and (30.7): $a/0 := \infty$ if $a > 0$, while $0/0 := 0$.
Proof. Since
M
j
k M
j
= = X
n
= j i.o. n as k ,
it follows, using Eq. (30.3), that
P

(X
n
= j i.o. n) = lim
k
P

(M
j
k) = P

(R
j
< ) lim
k
P
j
(R
j
< )
k1
(30.8)
which gives Eq. (30.4). Equation (30.5) follows by taking =
j
in Eq. (30.4)
and recalling that j S
r
i P
j
(R
j
< ) = 1. Similarly Eq. (30.7) is a special
case of Eq. (30.6) with =
i
. We now prove Eq. (30.6).
Using the denition of M
j
in Eq. (30.2),
E

M
j
= E

n1
1
Xn=j
=

n1
E

1
Xn=j
=

n1
P

(X
n
= j) =

n=1

jS
(j) P
n
jj
Page: 444 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
30.1 The Main Results 445
which is the first equality in Eq. (30.6). For the second, observe that
$$\sum_{k=1}^\infty P_\nu(M_j \ge k) = \sum_{k=1}^\infty E_\nu 1_{M_j \ge k} = E_\nu \sum_{k=1}^\infty 1_{k \le M_j} = E_\nu M_j.$$
On the other hand using Eq. (30.3) we have
$$\sum_{k=1}^\infty P_\nu(M_j \ge k) = \sum_{k=1}^\infty P_\nu(R_j < \infty)\, P_j(R_j < \infty)^{k-1} = \frac{P_\nu(R_j < \infty)}{1 - P_j(R_j < \infty)}$$
provided $a/0 := \infty$ if $a > 0$ while $0/0 := 0$.

It is worth remarking that if $j \in S_t$, then Eq. (30.6) asserts that
$$E_\nu M_j = (\text{the expected number of visits to } j) < \infty$$
which then implies that $M_j$ is a finite valued random variable almost surely. Hence, for almost all sample paths, $X_n$ can visit $j$ at most a finite number of times.
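The finiteness asserted by Eq. (30.7) is easy to probe numerically. The sketch below is mine, not part of the notes; it uses numpy and the standard fundamental-matrix facts for absorbing chains (here taken as known) to compare $\sum_{n \ge 1} P^n_{ij}$ with $P_i(R_j < \infty)/(1 - P_j(R_j < \infty))$ for a small chain in which states 0 and 1 are transient.

```python
import numpy as np

# Toy 3-state chain: states 0, 1 transient, state 2 absorbing.
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.3, 0.3],
              [0.0, 0.0, 1.0]])
T = P[:2, :2]                      # transient block
N = np.linalg.inv(np.eye(2) - T)   # N[i,j] = E_i(# visits to j, counting n = 0)

# First-passage probabilities expressed through N (standard absorbing-chain facts):
Pj_return = 1 - 1 / N[1, 1]        # P_1(R_1 < infinity)
Pi_hit = N[0, 1] / N[1, 1]         # P_0(R_1 < infinity)

lhs = sum(np.linalg.matrix_power(P, n)[0, 1] for n in range(1, 400))  # E_0 M_1
rhs = Pi_hit / (1 - Pj_return)     # right-hand side of Eq. (30.7)
print(lhs, rhs)                    # the two agree (about 1.389 here)
```

The truncation of the series at $n = 400$ is harmless since the transient block's spectral radius is well below one.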
Theorem 30.12 (Recurrent States). Let $j \in S$. Then the following are equivalent:

1. $j$ is recurrent, i.e. $P_j(R_j < \infty) = 1$,
2. $P_j(X_n = j \text{ i.o. } n) = 1$,
3. $E_j M_j = \sum_{n=1}^\infty P^n_{jj} = \infty$.

Proof. The equivalence of the first two items follows directly from Eq. (30.5) and the equivalence of items 1. and 3. follows directly from Eq. (30.7) with $i = j$.
Proposition 30.13. If $i \leftrightarrow j$, then $i$ is recurrent iff $j$ is recurrent, i.e. the property of being recurrent or transient is a class property.

Proof. Since $i$ and $j$ communicate, there exist $\alpha$ and $\beta$ in $\mathbb{N}$ such that $P^\alpha_{ij} > 0$ and $P^\beta_{ji} > 0$. Therefore
$$\sum_{n \ge 1} P^{n+\alpha+\beta}_{ii} \ge \sum_{n \ge 1} P^\alpha_{ij}\, P^n_{jj}\, P^\beta_{ji}$$
which shows that $\sum_{n \ge 1} P^n_{jj} = \infty \implies \sum_{n \ge 1} P^n_{ii} = \infty$. Similarly $\sum_{n \ge 1} P^n_{ii} = \infty \implies \sum_{n \ge 1} P^n_{jj} = \infty$. Thus using item 3. of Theorem 30.12, it follows that $i$ is recurrent iff $j$ is recurrent.
Corollary 30.14. If $C \subset S_r$ is a recurrent communication class, then
$$P_i(R_j < \infty) = 1 \text{ for all } i, j \in C \tag{30.9}$$
and in fact
$$P_i\Big(\cap_{j \in C} \{X_n = j \text{ i.o. } n\}\Big) = 1 \text{ for all } i \in C. \tag{30.10}$$
More generally if $\nu : S \to [0,1]$ is a probability such that $\nu(i) = 0$ for $i \notin C$, then
$$P_\nu\Big(\cap_{j \in C} \{X_n = j \text{ i.o. } n\}\Big) = 1. \tag{30.11}$$
In words, if we start in $C$ then every state in $C$ is visited an infinite number of times. (Notice that $P_i(R_j < \infty) = P_i(\{X_n\}_{n \ge 1} \text{ hits } j)$.)
Proof. Let $i, j \in C \subset S_r$ and choose $m \in \mathbb{N}$ such that $P^m_{ji} > 0$. Since $P_j(M_j = \infty) = 1$ and
$$\{X_m = i \text{ and } X_n = j \text{ for some } n > m\} = \cup_{n > m} \{X_m = i, X_{m+1} \ne j, \dots, X_{n-1} \ne j, X_n = j\},$$
we have
$$\begin{aligned}
P^m_{ji} = P_j(X_m = i) &= P_j(M_j = \infty, X_m = i) \\
&\le P_j(X_m = i \text{ and } X_n = j \text{ for some } n > m) \\
&= \sum_{n > m} P_j(X_m = i, X_{m+1} \ne j, \dots, X_{n-1} \ne j, X_n = j) \\
&= \sum_{n > m} P^m_{ji}\, P_i(X_1 \ne j, \dots, X_{n-m-1} \ne j, X_{n-m} = j) \\
&= \sum_{n > m} P^m_{ji}\, P_i(R_j = n - m) = P^m_{ji} \sum_{k=1}^\infty P_i(R_j = k) \\
&= P^m_{ji}\, P_i(R_j < \infty).
\end{aligned} \tag{30.12}$$
Because $P^m_{ji} > 0$, we may conclude from Eq. (30.12) that $1 \le P_i(R_j < \infty)$, i.e. that $P_i(R_j < \infty) = 1$, and Eq. (30.9) is proved. Feeding this result back into Eq. (30.4) with $\nu = \delta_i$ shows $P_i(M_j = \infty) = 1$ for all $i, j \in C$ and therefore $P_i(\cap_{j \in C} \{M_j = \infty\}) = 1$ for all $i \in C$, which is Eq. (30.10). Equation (30.11) follows by multiplying Eq. (30.10) by $\nu(i)$ and then summing on $i \in C$.
Theorem 30.15 (Transient States). Let $j \in S$. Then the following are equivalent:

1. $j$ is transient, i.e. $P_j(R_j < \infty) < 1$,
2. $P_j(X_n = j \text{ i.o. } n) = 0$, and
3. $E_j M_j = \sum_{n=1}^\infty P^n_{jj} < \infty$.

Moreover, if $i \in S$ and $j \in S_t$, then
$$\sum_{n=1}^\infty P^n_{ij} = E_i M_j < \infty \implies \begin{cases} \lim_{n \to \infty} P^n_{ij} = 0 \\ P_i(X_n = j \text{ i.o. } n) = 0, \end{cases} \tag{30.13}$$
and more generally if $\nu : S \to [0,1]$ is any probability, then
$$\sum_{n=1}^\infty P_\nu(X_n = j) = E_\nu M_j < \infty \implies \begin{cases} \lim_{n \to \infty} P_\nu(X_n = j) = 0 \\ P_\nu(X_n = j \text{ i.o. } n) = 0. \end{cases} \tag{30.14}$$

Proof. The equivalence of the first two items follows directly from Eq. (30.5) and the equivalence of items 1. and 3. follows directly from Eq. (30.7) with $i = j$. The fact that $E_i M_j < \infty$ and $E_\nu M_j < \infty$ for all $j \in S_t$ are consequences of Eqs. (30.7) and (30.6) respectively. The remaining implications in Eqs. (30.13) and (30.14) follow from the first Borel-Cantelli Lemma ?? and the fact that the $n$-th term of a convergent series tends to zero as $n \to \infty$.
Corollary 30.16. 1) If the state space, $S$, is a finite set, then $S_r \ne \emptyset$. 2) Any finite and closed communicating class $C \subset S$ is recurrent.

Proof. First suppose that $\#(S) < \infty$ and, for the sake of contradiction, suppose $S_r = \emptyset$ or equivalently that $S = S_t$. Then by Theorem 30.15, $\lim_{n \to \infty} P^n_{ij} = 0$ for all $i, j \in S$. On the other hand, $\sum_{j \in S} P^n_{ij} = 1$ so that
$$1 = \lim_{n \to \infty} \sum_{j \in S} P^n_{ij} = \sum_{j \in S} \lim_{n \to \infty} P^n_{ij} = \sum_{j \in S} 0 = 0,$$
which is a contradiction. (Notice that if $S$ were infinite, we could not interchange the limit and the above sum without some extra conditions.)

To prove the second statement, restrict $X_n$ to $C$ to get a Markov chain on the finite state space $C$. By what we have just proved, there is a recurrent state $i \in C$. Since recurrence is a class property, it follows that all states in $C$ are recurrent.
Definition 30.17. A function, $\nu : S \to [0,1]$, is a sub-probability if $\sum_{j \in S} \nu(j) \le 1$. We call $\sum_{j \in S} \nu(j)$ the mass of $\nu$. So a probability is a sub-probability with mass one.

Definition 30.18. We say a sub-probability, $\nu : S \to [0,1]$, is invariant if $\nu P = \nu$, i.e.
$$\sum_{i \in S} \nu(i)\, p_{ij} = \nu(j) \text{ for all } j \in S. \tag{30.15}$$
An invariant probability, $\pi : S \to [0,1]$, is called an invariant distribution.
Theorem 30.19. Suppose that $P = (p_{ij})$ is an irreducible Markov kernel and let $\pi_j := \frac{1}{E_j R_j}$ for all $j \in S$. Then:

1. For all $i, j \in S$, we have
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^N 1_{X_n = j} = \pi_j \quad P_i\text{-a.s.} \tag{30.16}$$
and
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N P_i(X_n = j) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^N P^n_{ij} = \pi_j. \tag{30.17}$$
2. If $\nu : S \to [0,1]$ is an invariant sub-probability, then either $\nu(i) > 0$ for all $i$ or $\nu(i) = 0$ for all $i$.
3. $P$ has at most one invariant distribution.
4. $P$ has a (necessarily unique) invariant distribution, $\pi : S \to [0,1]$, iff $P$ is positive recurrent, in which case $\pi(i) = \pi_i = \frac{1}{E_i R_i} > 0$ for all $i \in S$.

(These results may of course be applied to the restriction of a general non-irreducible Markov chain to any one of its communication classes.)

Proof. These results are the contents of Theorem 30.38 and Propositions 30.39 and 30.40 below.
Using this result we can give another proof of Proposition 30.7.

Corollary 30.20. If $C$ is a closed finite communicating class then $C$ is positive recurrent. (Recall that we already know that $C$ is recurrent by Corollary 30.16.)

Proof. For $i, j \in C$, let
$$\pi_j := \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N P_i(X_n = j) = \frac{1}{E_j R_j}$$
as in Theorem 30.21. Since $C$ is closed, $\sum_{j \in C} P_i(X_n = j) = 1$ and therefore,
$$\sum_{j \in C} \pi_j = \lim_{N \to \infty} \frac{1}{N} \sum_{j \in C} \sum_{n=1}^N P_i(X_n = j) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \sum_{j \in C} P_i(X_n = j) = 1.$$
Therefore $\pi_j > 0$ for some $j \in C$ and hence for all $j \in C$ by Theorem 30.19 with $S$ replaced by $C$. Hence we have $E_j R_j < \infty$, i.e. every $j \in C$ is a positive recurrent state.
Theorem 30.21 (General Convergence Theorem). Let $\nu : S \to [0,1]$ be any probability, $i \in S$, $C$ be the communicating class containing $i$,
$$\{X_n \text{ hits } C\} := \{X_n \in C \text{ for some } n\},$$
and
$$\pi_i := \pi_i(\nu) = \frac{P_\nu(X_n \text{ hits } C)}{E_i R_i}, \tag{30.18}$$
where $1/\infty := 0$. Then:

1. $P_\nu$-a.s.,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{X_n = i} = \frac{1}{E_i R_i} 1_{\{X_n \text{ hits } C\}}, \tag{30.19}$$
2. $$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \sum_{j \in S} \nu(j)\, P^n_{ji} = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N P_\nu(X_n = i) = \pi_i, \tag{30.20}$$
3. $\pi$ is an invariant sub-probability for $P$, and
4. the mass of $\pi$ is
$$\sum_{i \in S} \pi_i = \sum_{C:\ \text{pos. recurrent}} P_\nu(X_n \text{ hits } C) \le 1. \tag{30.21}$$
Proof. If $i \in S$ is a transient site, then according to Eq. (30.14), $P_\nu(M_i < \infty) = 1$ and therefore $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{X_n = i} = 0$, which agrees with Eq. (30.19) for $i \in S_t$.

So now suppose that $i \in S_r$, let $C$ be the communication class containing $i$, and let
$$T = \inf\{n \ge 0 : X_n \in C\}$$
be the first time when $X_n$ enters $C$. It is clear that $\{R_i < \infty\} \subset \{T < \infty\}$. On the other hand, for any $j \in C$, it follows by the strong Markov property (Corollary 29.5 (or Corollary 30.35 below)) and Corollary 30.14 that, conditioned on $\{T < \infty, X_T = j\}$, $X_n$ hits $i$ i.o. and hence $P_\nu(R_i < \infty \mid T < \infty, X_T = j) = 1$. Equivalently put,
$$P_\nu(R_i < \infty, T < \infty, X_T = j) = P_\nu(T < \infty, X_T = j) \text{ for all } j \in C.$$
Summing this last equation on $j \in C$ then shows
$$P_\nu(R_i < \infty) = P_\nu(R_i < \infty, T < \infty) = P_\nu(T < \infty)$$
and therefore $\{R_i < \infty\} = \{T < \infty\}$ modulo an event with $P_\nu$-probability zero.

Another application of the strong Markov property (in Corollary 29.5 (or Corollary 30.35 below)), observing that $X_{R_i} = i$ on $\{R_i < \infty\}$, allows us to conclude that the $P_\nu(\cdot \mid R_i < \infty) = P_\nu(\cdot \mid T < \infty)$ law of $(X_{R_i}, X_{R_i+1}, X_{R_i+2}, \dots)$ is the same as the $P_i$ law of $(X_0, X_1, X_2, \dots)$. Therefore, we may apply Theorem 30.19 to conclude that
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{X_n = i} = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{X_{R_i + n} = i} = \frac{1}{E_i R_i} \quad P_\nu(\cdot \mid R_i < \infty)\text{-a.s.}$$
On the other hand, on the event $\{R_i = \infty\}$ we have $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{X_n = i} = 0$. Thus we have shown $P_\nu$-a.s. that
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{X_n = i} = \frac{1}{E_i R_i} 1_{R_i < \infty} = \frac{1}{E_i R_i} 1_{T < \infty} = \frac{1}{E_i R_i} 1_{\{X_n \text{ hits } C\}}$$
which is Eq. (30.19). Taking expectations of this equation, using the dominated convergence theorem, gives Eq. (30.20).
Since $\frac{1}{E_i R_i} = 0$ unless $i$ is a positive recurrent site, it follows that
$$\sum_{i \in S} \pi_i P_{ij} = \sum_{i \in S_{pr}} \pi_i P_{ij} = \sum_{C:\ \text{pos-rec.}} P_\nu(X_n \text{ hits } C) \sum_{i \in C} \frac{1}{E_i R_i} P_{ij}. \tag{30.22}$$
Each positive recurrent class, $C$, is closed; if $i \in C$ and $j \notin C$, then $P_{ij} = 0$. Therefore $\sum_{i \in C} \frac{1}{E_i R_i} P_{ij}$ is zero unless $j \in C$. So if $j \notin S_{pr}$ we have $\sum_{i \in S} \pi_i P_{ij} = 0 = \pi_j$, and if $j \in S_{pr}$, then by Theorem 30.19,
$$\sum_{i \in C} \frac{1}{E_i R_i} P_{ij} = 1_{j \in C} \cdot \frac{1}{E_j R_j}.$$
Using this result in Eq. (30.22) shows that
$$\sum_{i \in S} \pi_i P_{ij} = \sum_{C:\ \text{pos-rec.}} P_\nu(X_n \text{ hits } C)\, 1_{j \in C}\, \frac{1}{E_j R_j} = \pi_j$$
so that $\pi$ is invariant. Similarly, using Theorem 30.19 again,
$$\sum_{i \in S} \pi_i = \sum_{C:\ \text{pos-rec.}} P_\nu(X_n \text{ hits } C) \sum_{i \in C} \frac{1}{E_i R_i} = \sum_{C:\ \text{pos-rec.}} P_\nu(X_n \text{ hits } C).$$

Definition 30.22. A state $i \in S$ is aperiodic if $P^n_{ii} > 0$ for all $n$ sufficiently large.
Lemma 30.23. If $i \in S$ is aperiodic and $j \leftrightarrow i$, then $j$ is aperiodic. So being aperiodic is a class property.

Proof. We have
$$P^{n+m+k}_{jj} = \sum_{w,z \in S} P^n_{j,w}\, P^m_{w,z}\, P^k_{z,j} \ge P^n_{j,i}\, P^m_{i,i}\, P^k_{i,j}.$$
Since $j \leftrightarrow i$, there exist $n, k \in \mathbb{N}$ such that $P^n_{j,i} > 0$ and $P^k_{i,j} > 0$. Since $P^m_{i,i} > 0$ for all large $m$, it follows that $P^{n+m+k}_{jj} > 0$ for all large $m$ and therefore, $j$ is aperiodic as well.

Lemma 30.24. A state $i \in S$ is aperiodic iff $1$ is the greatest common divisor of the set,
$$\{n \in \mathbb{N} : P_i(X_n = i) = P^n_{ii} > 0\}.$$

Proof. Use the number theory Lemma 30.41 below.
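Lemma 30.24 gives a computable test for aperiodicity. A small illustration (mine, assuming numpy; the two chains are toy examples, not from the notes): state 0 of a 3-cycle with a self-loop has return-time gcd 1, while state 0 of the pure 3-cycle has gcd 3.

```python
from functools import reduce
from math import gcd

import numpy as np

# 3-cycle with a self-loop at state 0: aperiodic.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
# Pure 3-cycle: period 3.
C = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

def period(M, i, horizon=30):
    """gcd of {n : M^n[i,i] > 0} up to `horizon`, per Lemma 30.24."""
    returns = [n for n in range(1, horizon)
               if np.linalg.matrix_power(M, n)[i, i] > 0]
    return reduce(gcd, returns)

print(period(P, 0), period(C, 0))  # 1 3
```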
Theorem 30.25. If $P$ is an irreducible, aperiodic, and recurrent Markov chain, then
$$\lim_{n \to \infty} P^n_{ij} = \pi_j = \frac{1}{E_j(R_j)}. \tag{30.23}$$
More generally, if $C$ is an aperiodic communication class, then
$$\lim_{n \to \infty} P_\nu(X_n = i) := \lim_{n \to \infty} \sum_{j \in S} \nu(j)\, P^n_{ji} = P_\nu(R_i < \infty)\, \frac{1}{E_i(R_i)} \text{ for all } i \in C.$$

Proof. I will not prove this theorem here but refer the reader to Norris [43, Theorem 1.8.3] or Kallenberg [28, Chapter 8]. The proof given there is by a coupling argument.
30.1.1 More finite state space examples

Example 30.26 (Analyzing a non-irreducible Markov chain). In this example we are going to analyze the limiting behavior of the non-irreducible Markov chain determined by the Markov matrix,
$$P = \begin{bmatrix} 0 & 1/2 & 0 & 0 & 1/2 \\ 1/2 & 0 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 1/3 & 2/3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
with rows and columns indexed by the states $1, 2, 3, 4, 5$. Here are the steps to follow.

Fig. 30.1. The jump diagram for P above.

1. Find the jump diagram for $P$. In our case it is given in Figure 30.1.
2. Identify the communication classes. In our example they are $\{1,2\}$, $\{5\}$, and $\{3,4\}$. The first is not closed and hence transient while the second two are closed and finite sets and hence recurrent.
3. Find the invariant distributions for the recurrent classes. For $\{5\}$ it is simply $\pi^{tr}_{\{5\}} = [1]$ and for $\{3,4\}$ we must find the invariant distribution for the $2 \times 2$ Markov matrix,
$$Q = \begin{bmatrix} 1/2 & 1/2 \\ 1/3 & 2/3 \end{bmatrix}$$
with rows and columns indexed by the states $3, 4$. We do this in the usual way, namely
$$\operatorname{Nul}\left(I - Q^{tr}\right) = \operatorname{Nul}\left(\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 1/2 & 1/3 \\ 1/2 & 2/3 \end{bmatrix}\right) = \mathbb{R} \begin{bmatrix} 2 \\ 3 \end{bmatrix}$$
so that $\pi^{tr}_{\{3,4\}} = \frac{1}{5}\begin{bmatrix} 2 & 3 \end{bmatrix}$.
4. We can turn $\pi^{tr}_{\{3,4\}}$ and $\pi^{tr}_{\{5\}}$ into invariant distributions for $P$ by padding the row vectors with zeros to get
$$\pi_{\{3,4\}} = \begin{bmatrix} 0 & 0 & 2/5 & 3/5 & 0 \end{bmatrix} \quad \text{and} \quad \pi_{\{5\}} = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
The general invariant distribution may then be written as
$$\pi = \alpha\, \pi_{\{5\}} + \beta\, \pi_{\{3,4\}} \text{ with } \alpha, \beta \ge 0 \text{ and } \alpha + \beta = 1.$$
5. We can now work out $\lim_{n \to \infty} P^n$. If we start at site $i$ we are considering the $i$-th row of $\lim_{n \to \infty} P^n$. If we start in the recurrent class $\{3,4\}$ we will simply get $\pi_{\{3,4\}}$ for these rows and if we start in the recurrent class $\{5\}$ we will get $\pi_{\{5\}}$. However if we start in the non-closed transient class, $\{1,2\}$, we have
$$\text{first row of } \lim_{n \to \infty} P^n = P_1(X_n \text{ hits } 5)\, \pi_{\{5\}} + P_1(X_n \text{ hits } \{3,4\})\, \pi_{\{3,4\}} \tag{30.24}$$
and
$$\text{second row of } \lim_{n \to \infty} P^n = P_2(X_n \text{ hits } 5)\, \pi_{\{5\}} + P_2(X_n \text{ hits } \{3,4\})\, \pi_{\{3,4\}}. \tag{30.25}$$
6. Compute the required hitting probabilities. Let us begin by computing the fraction of one pound of sand put at site 1 that will end up at site 5, i.e. we want to find $h_1 := P_1(X_n \text{ hits } 5)$. To do this let $h_i = P_i(X_n \text{ hits } 5)$ for $i = 1, 2, \dots, 5$. It is clear that $h_5 = 1$, and $h_3 = h_4 = 0$. A first step analysis then shows
$$h_1 = \tfrac{1}{2} P_2(X_n \text{ hits } 5) + \tfrac{1}{2} P_5(X_n \text{ hits } 5), \qquad h_2 = \tfrac{1}{2} P_1(X_n \text{ hits } 5) + \tfrac{1}{2} P_4(X_n \text{ hits } 5)$$
which leads to
$$h_1 = \tfrac{1}{2} h_2 + \tfrac{1}{2}, \qquad h_2 = \tfrac{1}{2} h_1 + \tfrac{1}{2} \cdot 0.$$
The solutions to these equations are
$$P_1(X_n \text{ hits } 5) = h_1 = \tfrac{2}{3} \text{ and } P_2(X_n \text{ hits } 5) = h_2 = \tfrac{1}{3}.$$
Since the process is either going to end up in $\{5\}$ or in $\{3,4\}$, we may also conclude that
$$P_1(X_n \text{ hits } \{3,4\}) = \tfrac{1}{3} \text{ and } P_2(X_n \text{ hits } \{3,4\}) = \tfrac{2}{3}.$$

Example 30.27. Note: If we were to make use of Theorem ?? we would not have set $h_3 = h_4 = 0$ and we would have added the equations,
$$h_3 = \tfrac{1}{2} h_3 + \tfrac{1}{2} h_4, \qquad h_4 = \tfrac{1}{3} h_3 + \tfrac{2}{3} h_4,$$
to those above. The general solution to these equations is $c\,(1,1)$ for some $c \in \mathbb{R}$ and the non-negative minimal solution is the special case where $c = 0$, i.e. $h_3 = h_4 = 0$. The point is, since $\{3,4\}$ is a closed communication class there is no way to hit 5 starting in $\{3,4\}$ and therefore clearly $h_3 = h_4 = 0$.
7. Using these results in Eqs. (30.24) and (30.25) shows,
$$\text{first row of } \lim_{n \to \infty} P^n = \tfrac{2}{3} \pi_{\{5\}} + \tfrac{1}{3} \pi_{\{3,4\}} = \begin{bmatrix} 0 & 0 & \tfrac{2}{15} & \tfrac{1}{5} & \tfrac{2}{3} \end{bmatrix} = \begin{bmatrix} 0.0 & 0.0 & 0.13333 & 0.2 & 0.66667 \end{bmatrix}$$
and
$$\text{second row of } \lim_{n \to \infty} P^n = \tfrac{1}{3} \pi_{\{5\}} + \tfrac{2}{3} \pi_{\{3,4\}} = \tfrac{1}{3}\begin{bmatrix} 0 & 0 & 0 & 0 & 1 \end{bmatrix} + \tfrac{2}{3}\begin{bmatrix} 0 & 0 & \tfrac{2}{5} & \tfrac{3}{5} & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & \tfrac{4}{15} & \tfrac{2}{5} & \tfrac{1}{3} \end{bmatrix} = \begin{bmatrix} 0.0 & 0.0 & 0.26667 & 0.4 & 0.33333 \end{bmatrix}.$$
These answers already compare well with
$$P^{10} = \begin{bmatrix} 9.7656 \times 10^{-4} & 0.0 & 0.13276 & 0.20024 & 0.66602 \\ 0.0 & 9.7656 \times 10^{-4} & 0.26626 & 0.39976 & 0.33301 \\ 0.0 & 0.0 & 0.4 & 0.60000 & 0.0 \\ 0.0 & 0.0 & 0.40000 & 0.6 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 1.0 \end{bmatrix}.$$
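All of the computations in Example 30.26 are easy to confirm numerically. The sketch below (mine, assuming numpy) recomputes the invariant distribution of the $\{3,4\}$ block from the transposed-eigenvector problem of step 3 and compares a high power of $P$ with the limiting rows found in step 7.

```python
import numpy as np

# Transition matrix from Example 30.26 (states 1..5).
P = np.array([
    [0,   1/2, 0,   0,   1/2],
    [1/2, 0,   0,   1/2, 0  ],
    [0,   0,   1/2, 1/2, 0  ],
    [0,   0,   1/3, 2/3, 0  ],
    [0,   0,   0,   0,   1  ],
])

# Invariant distribution of the {3,4} block: eigenvector of Q^tr for eigenvalue 1.
Q = P[2:4, 2:4]
vals, vecs = np.linalg.eig(Q.T)
v = vecs[:, np.argmin(np.abs(vals - 1))].real
pi34 = v / v.sum()
print(pi34)  # approximately [0.4 0.6] = (1/5)[2 3]

# Rows 1 and 2 of lim P^n, as computed in step 7.
row1 = np.array([0, 0, 2/15, 1/5, 2/3])
row2 = np.array([0, 0, 4/15, 2/5, 1/3])
P200 = np.linalg.matrix_power(P, 200)
print(np.abs(P200[0] - row1).max())  # numerically zero
print(np.abs(P200[1] - row2).max())  # numerically zero
```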
30.2 The Strong Markov Property

In proving the results above, we are going to make essential use of a strong form of the Markov property which asserts that Theorem ?? continues to hold even when $n$ is replaced by a random stopping time.

Definition 30.28 (Stopping times). Let $\tau$ be an $\mathbb{N}_0 \cup \{\infty\}$-valued random variable which is a functional of a sequence of random variables, $\{X_n\}_{n=0}^\infty$, which we write by abuse of notation as $\tau = \tau(X_0, X_1, \dots)$. We say that $\tau$ is a stopping time if for all $n \in \mathbb{N}_0$, the indicator random variable, $1_{\tau = n}$, is a functional of $(X_0, \dots, X_n)$. Thus for each $n \in \mathbb{N}_0$ there should exist a function, $\sigma_n$, such that $1_{\tau = n} = \sigma_n(X_0, \dots, X_n)$. In other words, the event $\{\tau = n\}$ may be described using only $(X_0, \dots, X_n)$ for all $n \in \mathbb{N}_0$.
Example 30.29. Here are some examples of random times which are, or are not, stopping times. In these examples we will always use the convention that the minimum of the empty set is $+\infty$.

1. The random time, $\tau = \min\{k : |X_k| \ge 5\}$ (the first time, $k$, such that $|X_k| \ge 5$) is a stopping time since
$$\{\tau = k\} = \{|X_1| < 5, \dots, |X_{k-1}| < 5, |X_k| \ge 5\}.$$
2. Let $W_k := X_1 + \dots + X_k$; then for a level $\alpha$ the random time,
$$\tau = \min\{k : W_k \ge \alpha\}$$
is a stopping time since,
$$\{\tau = k\} = \left\{ W_j = X_1 + \dots + X_j < \alpha \text{ for } j = 1, 2, \dots, k-1, \text{ and } X_1 + \dots + X_{k-1} + X_k \ge \alpha \right\}.$$
3. For $t \ge 0$, let $N(t) = \#\{k : W_k \le t\}$. Then
$$\{N(t) = k\} = \{X_1 + \dots + X_k \le t,\ X_1 + \dots + X_{k+1} > t\}$$
which shows that $N(t)$ is not a stopping time. On the other hand, since
$$\{N(t) + 1 = k\} = \{N(t) = k - 1\} = \{X_1 + \dots + X_{k-1} \le t,\ X_1 + \dots + X_k > t\},$$
we see that $N(t) + 1$ is a stopping time!
4. If $\tau$ is a stopping time then so is $\tau + 1$ because,
$$1_{\tau + 1 = k} = 1_{\tau = k - 1} = \sigma_{k-1}(X_0, \dots, X_{k-1})$$
which is also a function of $(X_0, \dots, X_k)$ which happens not to depend on $X_k$.
5. On the other hand, if $\tau$ is a stopping time it is not necessarily true that $\tau - 1$ is still a stopping time, as seen in item 3. above.
6. One can also see that the last time, $k$, such that $|X_k| \ge 5$ is typically not a stopping time. (Think about this.)
Remark 30.30. If $\tau$ is an $\{X_n\}_{n=0}^\infty$-stopping time then
$$1_{\tau \ge n} = 1 - 1_{\tau < n} = 1 - \sum_{k < n} \sigma_k(X_0, \dots, X_k) =: u_n(X_0, \dots, X_{n-1}).$$
That is, for a stopping time $\tau$, $1_{\tau \ge n}$ is a function of $(X_0, \dots, X_{n-1})$ only, for all $n \in \mathbb{N}_0$.
The following presentation of Wald's equation is taken from Ross [57, p. 59-60].

Theorem 30.31 (Wald's Equation). Suppose that $\{X_n\}_{n=1}^\infty$ is a sequence of i.i.d. random variables, $f(x)$ is a non-negative function of $x \in \mathbb{R}$, and $\tau$ is a stopping time. Then
$$E\left[\sum_{n=1}^\tau f(X_n)\right] = Ef(X_1) \cdot E\tau. \tag{30.26}$$
This identity also holds if the $f(X_n)$ are real valued but integrable and $\tau$ is a stopping time such that $E\tau < \infty$. (See Resnick for more identities along these lines.)

Proof. If $f(X_n) \ge 0$ for all $n$, then the following computations need no justification,
$$\begin{aligned}
E\left[\sum_{n=1}^\tau f(X_n)\right] &= E\left[\sum_{n=1}^\infty f(X_n)\, 1_{n \le \tau}\right] = \sum_{n=1}^\infty E[f(X_n)\, 1_{n \le \tau}] \\
&= \sum_{n=1}^\infty E[f(X_n)\, u_n(X_1, \dots, X_{n-1})] = \sum_{n=1}^\infty E[f(X_n)] \cdot E[u_n(X_1, \dots, X_{n-1})] \\
&= \sum_{n=1}^\infty E[f(X_n)] \cdot E[1_{n \le \tau}] = Ef(X_1) \sum_{n=1}^\infty E[1_{n \le \tau}] \\
&= Ef(X_1) \cdot E\left[\sum_{n=1}^\infty 1_{n \le \tau}\right] = Ef(X_1) \cdot E\tau.
\end{aligned}$$
If $E|f(X_n)| < \infty$ and $E\tau < \infty$, the above computation with $f$ replaced by $|f|$ shows all sums appearing above are equal to $E|f(X_1)| \cdot E\tau < \infty$. Hence we may remove the absolute values to again arrive at Eq. (30.26).
Example 30.32. Let $\{X_n\}_{n=1}^\infty$ be i.i.d. such that $P(X_n = 0) = P(X_n = 1) = 1/2$ and let
$$\tau := \min\{n : X_1 + \dots + X_n = 10\}.$$
For example, $\tau$ is the first time we have flipped 10 heads of a fair coin. By Wald's equation (valid because $X_n \ge 0$ for all $n$) we find
$$10 = E\left[\sum_{n=1}^\tau X_n\right] = EX_1 \cdot E\tau = \frac{1}{2} E\tau$$
and therefore $E\tau = 20 < \infty$.
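Example 30.32 invites a quick Monte Carlo sanity check. A minimal sketch (mine, standard library only; the sample size is arbitrary):

```python
import random

random.seed(0)

def tau():
    """First time the running sum of fair 0/1 coin flips reaches 10."""
    s = n = 0
    while s < 10:
        s += random.randint(0, 1)
        n += 1
    return n

samples = [tau() for _ in range(20_000)]
m = sum(samples) / len(samples)
print(m)  # close to E[tau] = 20
```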
Example 30.33 (Gambler's ruin). Let $\{X_n\}_{n=1}^\infty$ be i.i.d. such that $P(X_n = 1) = P(X_n = -1) = 1/2$ and let
$$\tau := \min\{n : X_1 + \dots + X_n = 1\}.$$
So $\tau$ may represent the first time that a gambler is ahead by 1. Notice that $EX_1 = 0$. If $E\tau < \infty$, then we would have $\tau < \infty$ a.s. and Wald's equation would give,
$$1 = E\left[\sum_{n=1}^\tau X_n\right] = EX_1 \cdot E\tau = 0 \cdot E\tau$$
which can not hold. Hence it must be that
$$E\tau = E[\text{first time that a gambler is ahead by } 1] = \infty.$$
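Simulation cannot certify $E\tau = \infty$, but it can show the symptom: truncated means that keep growing with the truncation level. A rough sketch (mine, standard library only; the cap levels and sample size are arbitrary choices):

```python
import random

random.seed(2)

def tau_capped(cap):
    """First time a fair +/-1 walk hits +1, truncated at `cap` steps."""
    s = 0
    for n in range(1, cap + 1):
        s += random.choice((1, -1))
        if s == 1:
            return n
    return cap

means = {cap: sum(tau_capped(cap) for _ in range(2000)) / 2000
         for cap in (100, 10_000)}
print(means)  # the truncated mean keeps growing (roughly like sqrt(cap))
```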
Theorem 30.34 (Strong Markov Property). Let $\left(\{X_n\}_{n=0}^\infty, \{P_x\}_{x \in S}, p\right)$ be a Markov chain as above and $\tau : \Omega \to [0, \infty]$ be a stopping time as in Definition 30.28. Then
$$E_\nu[f(X_\tau, X_{\tau+1}, \dots)\, g_\tau(X_0, \dots, X_\tau)\, 1_{\tau < \infty}] = E_\nu[[E_{X_\tau} f(X_0, X_1, \dots)]\, g_\tau(X_0, \dots, X_\tau)\, 1_{\tau < \infty}] \tag{30.27}$$
for all $f, g = g_n \ge 0$ or $f$ and $g$ bounded.

Proof. The proof of this deep result is now rather easy to reduce to Theorem ??. Indeed,
$$\begin{aligned}
E_\nu&[f(X_\tau, X_{\tau+1}, \dots)\, g_\tau(X_0, \dots, X_\tau)\, 1_{\tau < \infty}] \\
&= \sum_{n=0}^\infty E_\nu[f(X_n, X_{n+1}, \dots)\, g_n(X_0, \dots, X_n)\, 1_{\tau = n}] \\
&= \sum_{n=0}^\infty E_\nu[f(X_n, X_{n+1}, \dots)\, g_n(X_0, \dots, X_n)\, \sigma_n(X_0, \dots, X_n)] \\
&= \sum_{n=0}^\infty E_\nu[[E_{X_n} f(X_0, X_1, \dots)]\, g_n(X_0, \dots, X_n)\, \sigma_n(X_0, \dots, X_n)] \\
&= \sum_{n=0}^\infty E_\nu[[E_{X_\tau} f(X_0, X_1, \dots)]\, g_\tau(X_0, \dots, X_n)\, 1_{\tau = n}] \\
&= E_\nu[[E_{X_\tau} f(X_0, X_1, \dots)]\, g_\tau(X_0, \dots, X_\tau)\, 1_{\tau < \infty}]
\end{aligned}$$
wherein we have used Theorem ?? in the third equality.

The analogue of Corollary ?? in this more general setting states: conditioned on $\tau < \infty$ and $X_\tau = x$, $(X_\tau, X_{\tau+1}, X_{\tau+2}, \dots)$ is independent of $(X_0, \dots, X_\tau)$ and is distributed as $(X_0, X_1, \dots)$ under $P_x$.
Corollary 30.35. Let $\tau$ be a stopping time, $x \in S$ and $\nu$ be any probability on $S$. Then relative to $P_\nu(\cdot \mid \tau < \infty, X_\tau = x)$, $\{X_{\tau+k}\}_{k \ge 0}$ is independent of $(X_0, \dots, X_\tau)$ and $\{X_{\tau+k}\}_{k \ge 0}$ has the same distribution as $\{X_k\}_{k=0}^\infty$ under $P_x$.

Proof. According to Eq. (30.27),
$$\begin{aligned}
E_\nu&[g(X_0, \dots, X_\tau)\, f(X_\tau, X_{\tau+1}, \dots) : \tau < \infty, X_\tau = x] \\
&= E_\nu[g(X_0, \dots, X_\tau)\, 1_{\tau < \infty}\, \delta_x(X_\tau)\, f(X_\tau, X_{\tau+1}, \dots)] \\
&= E_\nu[g(X_0, \dots, X_\tau)\, 1_{\tau < \infty}\, \delta_x(X_\tau)\, E_{X_\tau}[f(X_0, X_1, \dots)]] \\
&= E_\nu[g(X_0, \dots, X_\tau)\, 1_{\tau < \infty}\, \delta_x(X_\tau)\, E_x[f(X_0, X_1, \dots)]] \\
&= E_\nu[g(X_0, \dots, X_\tau) : \tau < \infty, X_\tau = x] \cdot E_x[f(X_0, X_1, \dots)].
\end{aligned}$$
Dividing this equation by $P_\nu(\tau < \infty, X_\tau = x)$ shows,
$$E_\nu[g(X_0, \dots, X_\tau)\, f(X_\tau, X_{\tau+1}, \dots) \mid \tau < \infty, X_\tau = x] = E_\nu[g(X_0, \dots, X_\tau) \mid \tau < \infty, X_\tau = x] \cdot E_x[f(X_0, X_1, \dots)]. \tag{30.28}$$
Taking $g \equiv 1$ in this equation then shows,
$$E_\nu[f(X_\tau, X_{\tau+1}, \dots) \mid \tau < \infty, X_\tau = x] = E_x[f(X_0, X_1, \dots)]. \tag{30.29}$$
This shows that $\{X_{\tau+k}\}_{k \ge 0}$ under $P_\nu(\cdot \mid \tau < \infty, X_\tau = x)$ has the same distribution as $\{X_k\}_{k=0}^\infty$ under $P_x$ and, in combination, Eqs. (30.28) and (30.29) show that $\{X_{\tau+k}\}_{k \ge 0}$ and $(X_0, \dots, X_\tau)$ are conditionally independent given $\{\tau < \infty, X_\tau = x\}$.
To match notation in the book, let
$$f^{(n)}_{ii} = P_i(R_i = n) = P_i(X_1 \ne i, \dots, X_{n-1} \ne i, X_n = i)$$
and $m_{ij} := E_i(M_j)$, the expected number of visits to $j$ after $n = 0$.

Proposition 30.36. Let $i \in S$ and $n \ge 1$. Then $P^n_{ii}$ satisfies the renewal equation,
$$P^n_{ii} = \sum_{k=1}^n P_i(R_i = k)\, P^{n-k}_{ii}. \tag{30.30}$$
Also if $j \in S$, $k \in \mathbb{N}$, and $\nu : S \to [0,1]$ is any probability on $S$, then Eq. (30.3) holds, i.e.
$$P_\nu(M_j \ge k) = P_\nu(R_j < \infty)\, P_j(R_j < \infty)^{k-1}. \tag{30.31}$$
Proof. To prove Eq. (30.30) we first observe for $n \ge 1$ that $\{X_n = i\}$ is the disjoint union of $\{X_n = i, R_i = k\}$ for $1 \le k \le n$ and therefore,
$$\begin{aligned}
P^n_{ii} = P_i(X_n = i) &= \sum_{k=1}^n P_i(R_i = k, X_n = i) \\
&= \sum_{k=1}^n P_i(X_1 \ne i, \dots, X_{k-1} \ne i, X_k = i, X_n = i) \\
&= \sum_{k=1}^n P_i(X_1 \ne i, \dots, X_{k-1} \ne i, X_k = i)\, P^{n-k}_{ii} = \sum_{k=1}^n P^{n-k}_{ii}\, P_i(R_i = k).
\end{aligned}$$
(Alternatively, we could use the Markov property to show,
$$P^n_{ii} = P_i(X_n = i) = \sum_{k=1}^n E_i\big(1_{R_i = k}\, 1_{X_n = i}\big) = \sum_{k=1}^n E_i\big(1_{R_i = k}\big)\, E_i\big(1_{X_{n-k} = i}\big) = \sum_{k=1}^n P_i(R_i = k)\, P_i(X_{n-k} = i) = \sum_{k=1}^n P^{n-k}_{ii}\, P_i(R_i = k).$$
)

For Eq. (30.31) we have $\{M_j \ge 1\} = \{R_j < \infty\}$ so that $P_i(M_j \ge 1) = P_i(R_j < \infty)$. For $k \ge 2$, since $R_j < \infty$ if $M_j \ge 1$, we have
$$P_i(M_j \ge k) = P_i(M_j \ge k \mid R_j < \infty)\, P_i(R_j < \infty).$$
Since, on $\{R_j < \infty\}$, $X_{R_j} = j$, it follows by the strong Markov property (Corollary 29.5 (or Corollary 30.35 above)) that,
$$\begin{aligned}
P_i(M_j \ge k \mid R_j < \infty) &= P_i\big(M_j \ge k \mid R_j < \infty, X_{R_j} = j\big) \\
&= P_i\left(1 + \sum_{n \ge 1} 1_{X_{R_j + n} = j} \ge k \,\Big|\, R_j < \infty, X_{R_j} = j\right) \\
&= P_j\left(1 + \sum_{n \ge 1} 1_{X_n = j} \ge k\right) = P_j(M_j \ge k - 1).
\end{aligned}$$
By the last two displayed equations,
$$P_i(M_j \ge k) = P_j(M_j \ge k - 1)\, P_i(R_j < \infty). \tag{30.32}$$
Taking $i = j$ in this equation shows,
$$P_j(M_j \ge k) = P_j(M_j \ge k - 1)\, P_j(R_j < \infty)$$
and so by induction,
$$P_j(M_j \ge k) = P_j(R_j < \infty)^k. \tag{30.33}$$
Equation (30.31) now follows from Eqs. (30.32) and (30.33).
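The renewal equation (30.30) is easy to verify numerically for a chain where the first-return distribution is available in closed form. In the sketch below (my own two-state example, assuming numpy), returning to state 0 at time $k \ge 2$ means stepping to state 1, lingering there for $k - 2$ steps, and stepping back.

```python
import numpy as np

# Two-state chain; check the renewal equation (30.30) at i = 0.
P = np.array([[0.9, 0.1], [0.2, 0.8]])

def f(k):
    """First-return probabilities P_0(R_0 = k) for this chain."""
    return P[0, 0] if k == 1 else P[0, 1] * P[1, 1] ** (k - 2) * P[1, 0]

err = max(
    abs(np.linalg.matrix_power(P, n)[0, 0]
        - sum(f(k) * np.linalg.matrix_power(P, n - k)[0, 0]
              for k in range(1, n + 1)))
    for n in range(1, 8))
print(err)  # numerically zero
```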
30.3 Irreducible Recurrent Chains

For this section we are going to assume that $X_n$ is an irreducible recurrent Markov chain. Let us now fix a state, $j \in S$, and define,
$$\begin{aligned}
\tau_1 &= R_j = \min\{n \ge 1 : X_n = j\}, \\
\tau_2 &= \min\{n \ge 1 : X_{n + \tau_1} = j\}, \\
&\ \ \vdots \\
\tau_k &= \min\{n \ge 1 : X_{n + \tau_1 + \dots + \tau_{k-1}} = j\},
\end{aligned}$$
so that $\tau_k$ is the time it takes for the chain to visit $j$ after the $(k-1)$-st visit to $j$. By Corollary 30.14 we know that $P_i(\tau_k < \infty) = 1$ for all $i \in S$ and $k \in \mathbb{N}$. We will use the strong Markov property to prove the following key lemma in our development.

Lemma 30.37. We continue to use the notation above and in particular assume that $X_n$ is an irreducible recurrent Markov chain. Then relative to any $P_i$ with $i \in S$, $\{\tau_n\}_{n=1}^\infty$ is a sequence of independent random variables, $\{\tau_n\}_{n=2}^\infty$ are identically distributed, and $P_i(\tau_n = k) = P_j(\tau_1 = k)$ for all $k \in \mathbb{N}_0$ and $n \ge 2$.

Proof. Let $T_0 = 0$ and then define $T_k$ inductively by $T_{k+1} = \inf\{n > T_k : X_n = j\}$, so that $T_n$ is the time of the $n$-th visit of $\{X_n\}_{n=1}^\infty$ to site $j$. Observe that $T_1 = \tau_1$,
$$\tau_{n+1}(X_0, X_1, \dots) = \tau_1\left(X_{T_n}, X_{T_n + 1}, X_{T_n + 2}, \dots\right),$$
and $(\tau_1, \dots, \tau_n)$ is a function of $(X_0, \dots, X_{T_n})$. Since $P_i(T_n < \infty) = 1$ (Corollary 30.14) and $X_{T_n} = j$, we may apply the strong Markov property in the form of Corollary 29.5 (or Corollary 30.35) to learn:

1. $\tau_{n+1}$ is independent of $(X_0, \dots, X_{T_n})$ and hence $\tau_{n+1}$ is independent of $(\tau_1, \dots, \tau_n)$, and
2. the distribution of $\tau_{n+1}$ under $P_i$ is the same as the distribution of $\tau_1$ under $P_j$.

The result now follows from these two observations and induction.
Theorem 30.38. Suppose that $X_n$ is an irreducible recurrent Markov chain, and let $j \in S$ be a fixed state. Define
$$\pi_j := \frac{1}{E_j(R_j)}, \tag{30.34}$$
with the understanding that $\pi_j = 0$ if $E_j(R_j) = \infty$. Then
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^N 1_{X_n = j} = \pi_j \quad P_i\text{-a.s.} \tag{30.35}$$
for all $i \in S$ and
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^N P^n_{ij} = \pi_j. \tag{30.36}$$

Proof. Let us first note that Eq. (30.36) follows by taking expectations of Eq. (30.35). So we must prove Eq. (30.35).

By Lemma 30.37, the sequence $\{\tau_n\}_{n \ge 2}$ is i.i.d. relative to $P_i$ and $E_i \tau_n = E_j \tau_1 = E_j R_j$ for all $i \in S$. We may now use the strong law of large numbers (Theorem ??) to conclude that
$$\lim_{N \to \infty} \frac{\tau_1 + \tau_2 + \dots + \tau_N}{N} = E_i \tau_2 = E_j \tau_1 = E_j R_j \quad (P_i\text{-a.s.}). \tag{30.37}$$
This may be expressed as follows: let $R^{(N)}_j = \tau_1 + \tau_2 + \dots + \tau_N$ be the time when the chain first visits $j$ for the $N$-th time; then
$$\lim_{N \to \infty} \frac{R^{(N)}_j}{N} = E_j R_j \quad (P_i\text{-a.s.}). \tag{30.38}$$
Let
$$\xi_N = \sum_{n=0}^N 1_{X_n = j}$$
be the number of times $X_n$ visits $j$ up to time $N$. Since $j$ is visited infinitely often, $\xi_N \to \infty$ as $N \to \infty$ and therefore, $\lim_{N \to \infty} \frac{\xi_N + 1}{\xi_N} = 1$. Since there were $\xi_N$ visits to $j$ in the first $N$ steps, the time of the $\xi_N$-th visit to $j$ is less than or equal to $N$, i.e. $R^{(\xi_N)}_j \le N$. Similarly, the time, $R^{(\xi_N + 1)}_j$, of the $(\xi_N + 1)$-st visit to $j$ must be larger than $N$, so we have $R^{(\xi_N)}_j \le N \le R^{(\xi_N + 1)}_j$. Putting these facts together along with Eq. (30.38) shows that
$$\frac{R^{(\xi_N)}_j}{\xi_N} \le \frac{N}{\xi_N} \le \frac{R^{(\xi_N + 1)}_j}{\xi_N + 1} \cdot \frac{\xi_N + 1}{\xi_N}$$
and letting $N \to \infty$ then gives
$$E_j R_j \le \lim_{N \to \infty} \frac{N}{\xi_N} \le E_j R_j \cdot 1,$$
i.e. $\lim_{N \to \infty} \frac{N}{\xi_N} = E_j R_j$ for $P_i$-almost every sample path. Taking reciprocals of this last set of inequalities implies Eq. (30.35).
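The ergodic limit (30.35) is also easy to see in simulation. A minimal sketch (mine, standard library only): for the two-state chain with kernel $\begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}$, the stationary distribution is $(2/3, 1/3)$, so the chain should spend about two thirds of its time in state 0.

```python
import random

random.seed(1)

# Two-state chain with P = [[0.9, 0.1], [0.2, 0.8]]; stationary distribution
# (2/3, 1/3), so Eq. (30.35) predicts an occupation frequency of 2/3 for state 0.
P = [[0.9, 0.1], [0.2, 0.8]]
x, visits0, N = 0, 0, 200_000
for _ in range(N):
    visits0 += (x == 0)
    x = 0 if random.random() < P[x][0] else 1
print(visits0 / N)  # close to 2/3
```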
Proposition 30.39. Suppose that $X_n$ is an irreducible, recurrent Markov chain and let $\pi_j = \frac{1}{E_j(R_j)}$ for all $j \in S$ as in Eq. (30.34). Then either $\pi_i = 0$ for all $i \in S$ (in which case $X_n$ is null recurrent) or $\pi_i > 0$ for all $i \in S$ (in which case $X_n$ is positive recurrent). Moreover if $\pi_i > 0$ then
$$\sum_{i \in S} \pi_i = 1 \text{ and} \tag{30.39}$$
$$\sum_{i \in S} \pi_i P_{ij} = \pi_j \text{ for all } j \in S. \tag{30.40}$$
That is, $\pi = (\pi_i)_{i \in S}$ is the unique stationary distribution for $P$.

Proof. Let us define
$$T^n_{ki} := \frac{1}{n} \sum_{l=1}^n P^l_{ki} \tag{30.41}$$
which, according to Theorem 30.38, satisfies,
$$\lim_{n \to \infty} T^n_{ki} = \pi_i \text{ for all } i, k \in S.$$
Observe that,
$$(T^n P)_{ki} = \frac{1}{n} \sum_{l=1}^n P^{l+1}_{ki} = \frac{1}{n} \sum_{l=1}^n P^l_{ki} + \frac{1}{n}\left(P^{n+1}_{ki} - P_{ki}\right) \to \pi_i \text{ as } n \to \infty.$$
Let $\alpha := \sum_{i \in S} \pi_i$. Since $\pi_i = \lim_{n \to \infty} T^n_{ki}$, Fatou's lemma implies for all $i, j \in S$ that
$$\alpha = \sum_{i \in S} \pi_i = \sum_{i \in S} \liminf_{n \to \infty} T^n_{ki} \le \liminf_{n \to \infty} \sum_{i \in S} T^n_{ki} = 1$$
and
$$\sum_{i \in S} \pi_i P_{ij} = \sum_{i \in S} \lim_{n \to \infty} T^n_{li} P_{ij} \le \liminf_{n \to \infty} \sum_{i \in S} T^n_{li} P_{ij} = \liminf_{n \to \infty} T^{n+1}_{lj} = \pi_j$$
where $l \in S$ is arbitrary. Thus
$$\sum_{i \in S} \pi_i =: \alpha \le 1 \text{ and } \sum_{i \in S} \pi_i P_{ij} \le \pi_j \text{ for all } j \in S. \tag{30.42}$$
By induction it also follows that
$$\sum_{i \in S} \pi_i P^k_{ij} \le \pi_j \text{ for all } j \in S. \tag{30.43}$$
So if $\pi_j = 0$ for some $j \in S$, then given any $i \in S$, there is an integer $k$ such that $P^k_{ij} > 0$, and by Eq. (30.43) we learn that $\pi_i = 0$. This shows that either $\pi_i = 0$ for all $i \in S$ or $\pi_i > 0$ for all $i \in S$.

For the rest of the proof we assume that $\pi_i > 0$ for all $i \in S$. If there were some $j \in S$ such that $\sum_{i \in S} \pi_i P_{ij} < \pi_j$, we would have from Eq. (30.42) that
$$\alpha = \sum_{i \in S} \pi_i = \sum_{i \in S} \sum_{j \in S} \pi_i P_{ij} = \sum_{j \in S} \sum_{i \in S} \pi_i P_{ij} < \sum_{j \in S} \pi_j = \alpha,$$
which is a contradiction and Eq. (30.40) is proved.

From Eq. (30.40) and induction we also have $\sum_{i \in S} \pi_i P^k_{ij} = \pi_j$ for all $j \in S$ and all $k \in \mathbb{N}$, and therefore,
$$\sum_{i \in S} \pi_i T^k_{ij} = \pi_j \text{ for all } j \in S. \tag{30.44}$$
Since $0 \le T^k_{ij} \le 1$ and $\sum_{i \in S} \pi_i \le 1$, we may use the dominated convergence theorem to pass to the limit as $k \to \infty$ in Eq. (30.44) to find
$$\pi_j = \lim_{k \to \infty} \sum_{i \in S} \pi_i T^k_{ij} = \sum_{i \in S} \lim_{k \to \infty} \pi_i T^k_{ij} = \sum_{i \in S} \pi_i \pi_j = \alpha\, \pi_j.$$
Since $\pi_j > 0$, this implies that $\alpha = 1$ and hence Eq. (30.39) is now verified.
Proposition 30.40. Suppose that $P$ is an irreducible Markov kernel which admits a stationary distribution $\mu$. Then $P$ is positive recurrent and $\mu_j = \pi_j = \frac{1}{E_j(R_j)}$ for all $j \in S$. In particular, an irreducible Markov kernel has at most one invariant distribution and it has exactly one iff $P$ is positive recurrent.

Proof. Suppose that $\mu = (\mu_i)$ is a stationary distribution for $P$, i.e. $\sum_{i \in S} \mu_i = 1$ and $\mu_j = \sum_{i \in S} \mu_i P_{ij}$ for all $j \in S$. Then we also have
$$\mu_j = \sum_{i \in S} \mu_i T^k_{ij} \text{ for all } k \in \mathbb{N} \tag{30.45}$$
where $T^k_{ij}$ is defined above in Eq. (30.41). As in the proof of Proposition 30.39, we may use the dominated convergence theorem to find,
$$\mu_j = \lim_{k \to \infty} \sum_{i \in S} \mu_i T^k_{ij} = \sum_{i \in S} \lim_{k \to \infty} \mu_i T^k_{ij} = \sum_{i \in S} \mu_i \pi_j = \pi_j.$$
Alternative Proof. If $P$ were not positive recurrent then $P$ is either transient or null-recurrent, in which case $\lim_{n \to \infty} T^n_{ij} = \frac{1}{E_j(R_j)} = 0$ for all $i, j$. So letting $k \to \infty$, using the dominated convergence theorem, in Eq. (30.45) allows us to conclude that $\mu_j = 0$ for all $j$, which contradicts the fact that $\mu$ was assumed to be a distribution.
Lemma 30.41 (A number theory lemma). Suppose that $1$ is the greatest common divisor of a set of positive integers, $\{n_1, \dots, n_k\}$. Then there exists $N \in \mathbb{N}$ such that the set,
$$A = \{m_1 n_1 + \dots + m_k n_k : m_i \ge 0 \text{ for all } i\},$$
contains all $n \in \mathbb{N}$ with $n \ge N$.

Proof. (The following proof is from Durrett [15].) We first will show that $A$ contains two consecutive positive integers, $a$ and $a + 1$. To prove this let,
$$k := \min\{|b - a| : a, b \in A \text{ with } a \ne b\}$$
and choose $a, b \in A$ with $b = a + k$. If $k > 1$, there exists $n \in A$ such that $k$ does not divide $n$. Let us write $n = mk + r$ with $m \ge 0$ and $1 \le r < k$. It then follows that $(m+1)b$ and $(m+1)a + n$ are in $A$,
$$(m+1)b = (m+1)(a + k) > (m+1)a + mk + r = (m+1)a + n,$$
and
$$(m+1)b - [(m+1)a + n] = k - r < k.$$
This contradicts the definition of $k$ and therefore, $k = 1$.

Let $N = a^2$. If $n \ge N$, then $n - a^2 = ma + r$ for some $m \ge 0$ and $0 \le r < a$. Therefore,
$$n = a^2 + ma + r = (a + m)a + r = (a + m - r)a + r(a + 1) \in A.$$
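Lemma 30.41 can be checked by brute force. The sketch below (mine; the generators $6, 10, 15$ are an arbitrary gcd-1 example) enumerates the numerical semigroup up to a bound and confirms that every integer from 30 on is representable while 29 is not.

```python
from functools import reduce
from math import gcd

gens, bound = (6, 10, 15), 300
assert reduce(gcd, gens) == 1  # the gcd-1 hypothesis of Lemma 30.41

# All non-negative integer combinations of the generators up to `bound`.
A = {0}
for g in gens:
    A = {a + m * g for a in A for m in range(bound // g + 1) if a + m * g <= bound}

print(29 in A, all(n in A for n in range(30, bound + 1)))  # False True
```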
31 Brownian Motion II (Markov Property)

Definition 31.1. For $d \in \mathbb{N}$, we say an $\mathbb{R}^d$-valued process, $\left\{X_t = \left(X^1_t, \dots, X^d_t\right)\right\}_{t \ge 0}$, is a $d$-dimensional Brownian motion provided $\left\{X^i_\cdot\right\}_{i=1}^d$ is an independent collection of one dimensional Brownian motions.

Recall from Example 29.12 we know the hypotheses of Theorem 29.6 are verified for Brownian motion and therefore it satisfies the strong Markov property. In this chapter we will develop some properties of Brownian motion based on the strong Markov property. To keep the setup as clean as possible we will now work with the canonical path space model for Brownian motion.

Notation 31.2 (Canonical path space) In what follows, let $\Omega := C\left(\mathbb{R}_+, \mathbb{R}^d\right)$ and let $\theta_t : \Omega \to \Omega$ and $B_t : \Omega \to \mathbb{R}^d$ be defined by,
$$\theta_t(\omega) = \omega(t + \cdot) \quad \text{and} \quad B_t(\omega) := \omega(t)$$
respectively. Further let $\mathcal{B}_t := \sigma(B_s : s \le t)$, $\mathcal{B}_{t+} := \cap_{s > t} \mathcal{B}_s$, and $\mathcal{B} := \sigma(B_s : s < \infty) = \vee_{t < \infty} \mathcal{B}_t$.
Definition 31.3. Let $\{X_t\}_{t \ge 0}$ be a Brownian motion defined on some probability space, $(Y, \mathcal{M}, \mu)$. For $x \in \mathbb{R}^d$, let $\varphi_x : Y \to \Omega$ be defined by,
$$\varphi_x(y) := \left([0, \infty) \ni t \to x + X_t(y)\right).$$
Then $\varphi_x$ is a measurable map and we let $P_x := \mu \circ \varphi_x^{-1}$ for all $x \in \mathbb{R}^d$. When $x = 0$, the measure, $P_0$, is called Wiener measure on $(\Omega, \mathcal{B})$. More generally for any probability measure $\nu$ on $\left(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d}\right)$, let
$$P_\nu(A) := \int_{\mathbb{R}^d} P_x(A)\, d\nu(x) \text{ for all } A \in \mathcal{B}. \tag{31.1}$$
Exercise 31.1. For $T > 0$, let $\Omega_T := C([0, T], \mathbb{R})$ which, relative to the sup-norm (or uniform norm) $\|\cdot\|_u$, is a Banach space. Show that the Borel $\sigma$-algebra, $\mathcal{B}$, on $\Omega_T$ is the same as $\mathcal{B}_T := \sigma(B_s : s \le T)$.
Corollary 31.4. Let $\nu$ be a probability measure on $\left(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d}\right)$, $T > 0$, and let
$$b_t := B_{t+T} - B_T \text{ for } t \ge 0.$$
Then $b$ is a Brownian motion starting at $0 \in \mathbb{R}^d$ which is independent of $\mathcal{B}_T^+$ relative to the probability measure, $P_\nu$, in Eq. (31.1).

Proof. Let $F : \Omega \to \mathbb{R}$ be a bounded $\mathcal{B}$-measurable function and $A \in \mathcal{B}_T^+$. Since $B_{t+T} = B_t \circ \theta_T$ we may write
$$b_t = (B_t - B_0) \circ \theta_T \quad \text{and} \quad F(b) = F(B - B_0) \circ \theta_T.$$
So by the Markov property in Theorem 29.6 with $\tau = T$ (a stopping time) and the fact that the $P_x$ law of $B_\cdot - B_0$ is the $P_0$ law of $B_\cdot$ for all $x \in \mathbb{R}^d$, we have
$$E_\nu\left[F(b) \mid \mathcal{B}_T^+\right] = E_\nu\left[F(B - B_0) \circ \theta_T \mid \mathcal{B}_T^+\right] = E_{B_T}[F(B - B_0)] = E_0[F(B)] = E_0[F].$$
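Corollary 31.4 can be illustrated with a discretized simulation. In the sketch below (mine, assuming numpy; the grid size and path count are arbitrary), $b_{1/2} = B_{3/2} - B_1$ should have variance $1/2$ and be uncorrelated with $B_1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate many discretized Brownian paths on [0, 2] and check that
# b_t := B_{t+T} - B_T with T = 1 has Var(b_t) = t and is uncorrelated with B_T.
n_paths, n_steps, dt = 100_000, 200, 0.01
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = dW.cumsum(axis=1)          # W[:, k] approximates B_{(k+1) dt}
B_T = W[:, 99]                 # B_1
b_half = W[:, 149] - B_T       # b_{1/2} = B_{3/2} - B_1
print(b_half.var())            # close to 0.5
print(np.corrcoef(B_T, b_half)[0, 1])  # close to 0
```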
Corollary 31.5 (Rapid oscillation of B. M.). Suppose that $d = 1$. Let $T_+ := \inf\{t > 0 : B_t > 0\}$, $T_- := \inf\{t > 0 : B_t < 0\}$, and $T_0 := \inf\{t > 0 : B_t = 0\}$. Then
$$P_0(T_\pm = 0) = 1 = P_0(T_0 = 0).$$
In particular, the typical Brownian path revisits $0$ for infinitely many $t \in (0, \varepsilon)$ for any $\varepsilon > 0$, see Figure 27.1.

Proof. From Proposition 27.11, we know that $T_+$, $T_-$, and $T_0$ are all optional times and therefore,
$$\{T_+ = 0\}, \quad \{T_- = 0\}, \quad \text{and} \quad \{T_0 = 0\}$$
are all $\mathcal{B}_{0+}$-measurable. For any $t > 0$, $\{B_t > 0\} \subset \{T_+ \le t\}$ and therefore,
$$P_0(T_+ \le t) \ge P(B_t > 0) = \frac{1}{2}.$$
Letting $t \downarrow 0$ in this equation shows $P_0(T_+ = 0) \ge \frac{1}{2}$ and so by the Blumenthal $0$-$1$ law (Theorem 29.17) we know $P_0(T_+ = 0) = 1$. Similarly, or by using $B \overset{d}{=} -B$ under $P_0$, $P_0(T_- = 0) = 1$ as well. Finally, if $T_+(\omega) = 0$ and $T_-(\omega) = 0$, it follows by the intermediate value theorem that $T_0(\omega) = 0$. Thus $\{T_+ = 0\} \cap \{T_- = 0\} \subset \{T_0 = 0\}$ and therefore $P_0(T_0 = 0) = 1$.
Exercise 31.2. Suppose d = 1 and f : (0, 1) (0, ) is a continuous function.
Show X := limsup
t0
Bt
f(t)
is B
0+
measurable. Use Blumenthal 0 1 law (The-
orem 29.17), Corollary 31.5, and Lemma 10.49 to conclude that there exists
c [0, ] such that limsup
t0
Bt
f(t)
= c, P
0
a.s.
Exercise 31.3. Show limsup
t0
Bt
t
1/2
= , P
0
a.s. Hint: use a scaling argu-
ment to show for any M (0, ) and n N that
P
0
_
sup
t(0,1/n]
B
t
t
1/2
M
_
P
0
(B
1
M) < 1.
The following corollary is the stopping time analogue of Corollary 31.4.

Corollary 31.6. Let ν be a probability measure on (ℝ^d, B_{ℝ^d}), τ be an optional time with P_ν(τ < ∞) > 0, and let

  b_t := B_{t+τ} − B_τ on {τ < ∞}.

Then, conditioned on {τ < ∞}, b is a Brownian motion starting at 0 ∈ ℝ^d which is independent of B_{τ+}. To be more precise we are claiming, for all bounded measurable functions F : Ω → ℝ and all A ∈ B_{τ+}, that

  E_ν[F(b) | τ < ∞] = E_0[F]  (31.2)

and

  E_ν[F(b) 1_A | τ < ∞] = E_ν[F(b) | τ < ∞] · E_ν[1_A | τ < ∞].  (31.3)

Proof. On {τ < ∞}, B_{t+τ} = B_t ∘ θ_τ and therefore,

  b_t = (B_t − B_0) ∘ θ_τ and F(b) = F(B − B_0) ∘ θ_τ on {τ < ∞}.

So by the Markov property in Theorem 29.6 and the fact that the P_x-law of B_· − B_0 is the P_0-law of B_· for all x ∈ ℝ^d, we have

  E_ν[1_{τ<∞} F(b) | B_{τ+}] = E_ν[1_{τ<∞} F(B − B_0) ∘ θ_τ | B_{τ+}] = 1_{τ<∞} E_{B_τ}[F(B − B_0)] = 1_{τ<∞} E_0[F(B)] = 1_{τ<∞} E_0[F].

Therefore if A ∈ B_{τ+},

  E_ν[F(b) 1_A | τ < ∞] = (1/P_ν(τ < ∞)) E_ν[1_{τ<∞} F(b) 1_A] = E_0[F] · E_ν[1_A | τ < ∞].  (31.4)

Taking A = Ω in this equation proves Eq. (31.2) and then combining Eq. (31.2) and (31.4) proves Eq. (31.3).
Proposition 31.7 (Stitching Lemma). Suppose that {B_t}_{t≥0} and {X_t}_{t≥0} are Brownian motions, τ is an optional time, and {X_t}_{t≥0} is independent of B_{τ+}. Then

  B̃_t := B_{t∧τ} + X_{(t−τ)_+}

is another Brownian motion.

Proof. Given any finite subset, Λ := {t_i}_{i=1}^n ⊂ [0,∞), we must show (B_{t_1}, …, B_{t_n}) has the same distribution as (B̃_{t_1}, …, B̃_{t_n}). In checking this we may replace τ by τ ∧ M where M := max Λ and in this way we may assume without loss of generality that {τ < ∞} = Ω.

By Corollary 31.6, the process b_t := B_{t+τ} − B_τ is again a Brownian motion independent of B_{τ+} and B_t = B_{t∧τ} + b_{(t−τ)_+}. As X ≐ b in distribution and both b and X are independent of B_{τ+}, and therefore independent of (B_{·∧τ}, τ), it follows that

  (B_{·∧τ}, τ, b) ≐ (B_{·∧τ}, τ, X) in distribution.

So we may conclude that F(B_{·∧τ}, τ, b) ≐ F(B_{·∧τ}, τ, X) for any measurable function F. Applying this fact with F : Ω × [0,∞] × Ω → ℝ^n defined by¹

  F(x, t, y) := ( x_{t_1∧t} + y_{(t_1−t)_+}, …, x_{t_n∧t} + y_{(t_n−t)_+} )

shows

  (B_{t_1}, …, B_{t_n}) = F(B_{·∧τ}, τ, b) ≐ F(B_{·∧τ}, τ, X) = (B̃_{t_1}, …, B̃_{t_n}).

(In broad strokes, the three steps in the above proof may be summarized as; 1) B = B_{·∧τ} + b_{(·−τ)_+}, 2) B̃ = B_{·∧τ} + X_{(·−τ)_+}, and 3) (B_{·∧τ}, τ, b) ≐ (B_{·∧τ}, τ, X).)
For the next couple of results we will follow [28, Chapter 13] (also see [29, Section 2.8]) where more results along this line may be found.

Theorem 31.8 (Reflection Principle). Let τ be an optional time and {B_t}_{t≥0} be a Brownian motion. Then the reflected process (see Figure ??),

  B̃_t := B_{t∧τ} − (B_t − B_{t∧τ}) = { B_t if t ≤ τ; B_τ − (B_t − B_τ) if t > τ },

is again a Brownian motion.

¹ You should convince yourself that this function is measurable, see Lemma 27.6.
Page: 456 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
Proof. Again it suffices to show (B_{t_1}, …, B_{t_n}) ≐ (B̃_{t_1}, …, B̃_{t_n}) for all finite subsets, Λ := {t_i}_{i=1}^n ⊂ [0,∞). As in the proof of Proposition 31.7 we may assume that {τ < ∞} = Ω. By Corollary 31.6, the process b_t := B_{t+τ} − B_τ is a Brownian motion independent of B_{τ+}. It then follows that X_t := −b_t is another Brownian motion independent of B_{τ+} and since B̃_t = B_{t∧τ} + X_{(t−τ)_+}, the proof is finished with an application of Proposition 31.7.
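The reflection principle lends itself to a quick numerical sanity check. The following Monte Carlo sketch (not part of the notes; it assumes Python with numpy is available) reflects simulated paths at the first grid time they reach a level z and compares the first two moments of B̃_T with those of B_T ≐ N(0,T); the small residual bias comes from the discrete overshoot of the level z.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, T, z = 20_000, 400, 1.0, 0.5
dt = T / n_steps

# Simulate Brownian paths on a grid.
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)

# First grid time tau at which the path reaches the level z (if any).
hit = B >= z
has_hit = hit.any(axis=1)
tau_idx = np.where(has_hit, hit.argmax(axis=1), n_steps - 1)
B_tau = B[np.arange(n_paths), tau_idx]
BT = B[:, -1]

# Reflected terminal value: tilde B_T = B_T if tau > T, else 2 B_tau - B_T.
BT_ref = np.where(has_hit, 2.0 * B_tau - BT, BT)

# Both B_T and tilde B_T should be (approximately) N(0, T).
print(BT_ref.mean(), BT_ref.std())
```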
Lemma 31.9. If d = 1, then

  P_0(B_t > a) ≤ min( √(t/(2πa²)) e^{−a²/2t}, (1/2) e^{−a²/2t} )

for all t > 0 and a > 0.

Proof. This follows from Lemma 7.59 and the fact that B_t ≐ √t N, where N is a standard normal random variable.

Lemma 31.10 (Running maximum). If B_t is a 1-dimensional Brownian motion and z > 0 and y ≥ 0, then

  P( max_{s≤t} B_s ≥ z, B_t < z − y ) = P(B_t > z + y)  (31.5)

and

  P( max_{s≤t} B_s ≥ z ) = 2P(B_t > z) = P(|B_t| > z).  (31.6)

In particular we have max_{s≤t} B_s ≐ |B_t|.
Proof. For z > 0, let τ := T_z = inf{t > 0 : B_t = z} and let

  B̃_t := B_{t∧τ} − b_{(t−τ)_+}

where b_t := (B_{t+τ} − B_τ) 1_{τ<∞}, see Figure 31.1. (By Corollary 26.19 we actually know that T_z < ∞ a.s. but we will not use this fact in the proof.) Observe that τ̃ := inf{t > 0 : B̃_t = z} = τ and that

  { max_{s≤t} B_s ≥ z } = {τ ≤ t}.

Furthermore, on {τ ≤ t} we have B̃_t = 2z − B_t, so {τ ≤ t, B_t < z − y} = {τ ≤ t, B̃_t > z + y} and hence we have

  P( max_{s≤t} B_s ≥ z, B_t < z − y ) = P(τ ≤ t, B_t < z − y) = P(τ ≤ t, B̃_t > z + y)
   = P(τ ≤ t, B_t > z + y) = P(B_t > z + y),

where the third equality holds because B̃ is again a Brownian motion (Theorem 31.8) and τ is its first hitting time of z,

Fig. 31.1. A Brownian path B along with its reflection about the first time, τ, that B_t hits the level z.

wherein we have used {B_t > z + y} ⊂ {τ ≤ t} for the last equality.

Taking y = 0 in this estimate shows

  P( max_{s≤t} B_s ≥ z, B_t < z ) = P(B_t > z)

and then using P(B_t = z) = 0 we may add P(B_t ≥ z) = P( max_{s≤t} B_s ≥ z, B_t ≥ z ) to both sides of this equation to learn

  P( max_{s≤t} B_s ≥ z ) = 2P(B_t > z).
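The distributional identity max_{s≤t} B_s ≐ |B_t| can be checked by simulation. The sketch below (not from the notes; it assumes Python with numpy) compares the sample mean of the grid-sampled running maximum with that of |B_T|, whose exact mean is E|B_1| = √(2/π) ≈ 0.798; the discrete-time maximum slightly underestimates the continuous one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, T = 10_000, 1_000, 1.0
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)

max_B = B.max(axis=1)       # running maximum sampled on the time grid
abs_BT = np.abs(B[:, -1])   # |B_T|, which has E|B_1| = sqrt(2/pi) for T = 1

print(max_B.mean(), abs_BT.mean(), np.sqrt(2 / np.pi))
```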
Remark 31.11. Notice that

  P( max_{s≤t} B_s ≥ z ) = P(|B_t| > z) = P( √t |B_1| > z ) = P( |B_1| > z/√t ) ↑ 1 as t ↑ ∞

and therefore it follows that sup_{s<∞} B_s = ∞ a.s. In particular this shows that T_z < ∞ a.s. for all z ≥ 0 which we have already seen in Corollary 26.19.
Corollary 31.12. Suppose now that T = inf{t > 0 : |B_t| = a}, i.e. the first time B_t leaves the strip (−a, a). Then

  P_0(T < t) ≤ 4P_0(B_t > a) = (4/√(2πt)) ∫_a^∞ e^{−x²/2t} dx ≤ min( √(8t/(πa²)) e^{−a²/2t}, 1 ).  (31.7)

Notice that P_0(T < t) = P_0(B*_t ≥ a) where B*_t := max{|B_τ| : τ ≤ t}. So Eq. (31.7) may be rewritten as

  P_0(B*_t ≥ a) ≤ 4P_0(B_t > a) ≤ min( √(8t/(πa²)) e^{−a²/2t}, 1 ) ≤ 2e^{−a²/2t}.  (31.8)

Proof. By definition T = T_a ∧ T_{−a} so that {T < t} = {T_a < t} ∪ {T_{−a} < t} and therefore

  P_0(T < t) ≤ P_0(T_a < t) + P_0(T_{−a} < t) = 2P_0(T_a < t) = 4P_0(B_t > a) = (4/√(2πt)) ∫_a^∞ e^{−x²/2t} dx
   ≤ (4/√(2πt)) ∫_a^∞ (x/a) e^{−x²/2t} dx = (4/√(2πt)) · (t/a) e^{−a²/2t} = √(8t/(πa²)) e^{−a²/2t}.

This proves everything but the very last inequality in Eq. (31.8). To prove this inequality first observe the elementary calculus inequality:

  min( (4/(√(2π) y)) e^{−y²/2}, 1 ) ≤ 2e^{−y²/2}.  (31.9)

Indeed Eq. (31.9) holds if 4/(√(2π) y) ≤ 2, i.e. if y ≥ y_0 := 2/√(2π). The fact that Eq. (31.9) holds for y ≤ y_0 follows from the following trivial inequality

  1 ≤ 1.4552 ≈ 2e^{−1/π} = 2e^{−y_0²/2}.

Finally letting y = a/√t in Eq. (31.9) gives the last inequality in Eq. (31.8).
Theorem 31.13 (The Dirichlet problem). Suppose D is an open subset of ℝ^d and τ := inf{t ≥ 0 : B_t ∈ D^c} is the first exit time from D. Given a bounded measurable function, f : bd(D) → ℝ, let u : D → ℝ be defined by (see Figure 31.2),

  u(x) := E_x[f(B_τ) : τ < ∞] for x ∈ D.

Then u ∈ C^∞(D) and Δu = 0 on D, i.e. u is a harmonic function.

Fig. 31.2. Brownian motion starting at x ∈ D and exiting on the boundary of D at B_τ.

Fig. 31.3. Brownian motion starting at x ∈ D and exiting the boundary of B(x,r) at B_σ before exiting on the boundary of D at B_τ.

Proof. (Sketch.) Let x ∈ D and r > 0 be such that B̄(x,r) ⊂ D and let σ := inf{t ≥ 0 : B_t ∉ B(x,r)} as in Figure 31.3. Setting F := f(B_τ) 1_{τ<∞}, we see that F ∘ θ_σ = F on {τ < ∞} and that σ < τ, P_x-a.s. on {τ < ∞}. Moreover, by either Corollary 26.19 or by Lemma 31.10 (see Remark 31.11), we know that σ < ∞, P_x-a.s. Therefore by the strong Markov property,

  u(x) = E_x[F] = E_x[F : σ < ∞] = E_x[F ∘ θ_σ : σ < ∞] = E_x[E_{B_σ} F : σ < ∞] = E_x[u(B_σ)].

Using the rotation invariance of Brownian motion, we may conclude that

  E_x[u(B_σ)] = (1/ρ(bd(B(x,r)))) ∫_{bd(B(x,r))} u(y) dρ(y)
where ρ denotes surface measure on bd(B(x,r)). This shows that u satisfies the mean value property, i.e. u(x) is equal to its average over any sphere centered at x which is contained in D. It is now well known, see for example [29, Proposition 4.2.5 on p. 242], that this property implies u ∈ C^∞(D) and that Δu = 0.

When the boundary of D is sufficiently regular and f is continuous on bd(D), it can be shown, for x ∈ bd(D), that u(y) → f(x) as y ∈ D tends to x. For more details in this direction, see [2], [4], [29, Section 4.2], [17], and [16].
31.1 Some Brownian Martingales
For this chapter, let (Ω, B, {B_t}_{t∈ℝ_+}, P) be a filtered probability space as described in Chapter 27. For the reader's convenience, let us repeat the definition of a (sub or super) martingale here that was given in Definition 28.1.

Definition 31.14. Given a filtered probability space, (Ω, B, {B_t}_{t≥0}, P), an adapted process, X_t : Ω → ℝ, is said to be a (B_t)-martingale provided E|X_t| < ∞ for all t and E[X_t − X_s | B_s] = 0 for all 0 ≤ s ≤ t < ∞. If E[X_t − X_s | B_s] ≥ 0 or E[X_t − X_s | B_s] ≤ 0 for all 0 ≤ s ≤ t < ∞, then X is said to be a submartingale or supermartingale respectively.

Let us also recall the optional sampling Theorem 28.9. In this theorem we assumed that {M_t}_{t≥0} is a right continuous (B_t) (or (B_{t+})) submartingale (martingale) and that σ and τ are two (B_t)-optional times. (Equivalently put, we are assuming that σ and τ are two (B_{t+})-stopping times.) If there is a constant K < ∞ such that σ ≤ τ ≤ K, then

  M_σ ≤ E[M_τ | B_{σ+}]  (31.10)

with equality when {M_t}_{t≥0} is a martingale.
Theorem 31.15. Suppose that h : [0,T]×ℝ^d → ℝ is a continuous function such that ḣ(t,x) := ∂h(t,x)/∂t, ∇_x h(t,x), and Δ_x h(t,x) exist and are continuous on [0,T]×ℝ^d and satisfy

  sup_{0≤t≤T} E_ν[ |h(t,B_t)| + |ḣ(t,B_t)| + |∇_x h(t,B_t)| + |Δ_x h(t,B_t)| ] < ∞.  (31.11)

Then the process,

  M_t := h(t,B_t) − ∫_0^t ( ḣ(τ,B_τ) + ½ Δ_x h(τ,B_τ) ) dτ  (31.12)

is a (B_{t+})_{t∈ℝ_+}-martingale. In particular, if h also satisfies the heat equation in reverse time,

  ḣ(t,x) + ½ Δ_x h(t,x) = 0,  (31.13)

then M_t = h(t,B_t) is a martingale.
Proof. Working formally for the moment,

  (d/dτ) E_ν[ h(τ,B_τ) | B_{s+} ] = (d/dτ)( e^{(τ−s)Δ/2} h(τ,·) )(B_s)
   = ( e^{(τ−s)Δ/2} ḣ(τ,·) )(B_s) + ½ ( e^{(τ−s)Δ/2} Δ h(τ,·) )(B_s)
   = E_ν[ ḣ(τ,B_τ) + ½ Δh(τ,B_τ) | B_{s+} ].

Integrating this equation on [s,t] then shows

  E_ν[ h(t,B_t) − h(s,B_s) | B_{s+} ] = E_ν[ ∫_s^t ( ḣ(τ,B_τ) + ½ Δh(τ,B_τ) ) dτ | B_{s+} ].

This statement is equivalent to the statement that E_ν[M_t − M_s | B_{s+}] = 0, i.e. to the assertion that {M_t}_{t≥0} is a martingale. We now need to justify the above computations.

1. Let us first suppose there exists an R < ∞ such that h(t,x) = 0 if |x| = √(Σ_{i=1}^d x_i²) ≥ R. Using Corollary 7.30 and a couple of integrations by parts, we find

  (d/dτ)( e^{(τ−s)Δ/2} h )(τ,x) = ∫_{ℝ^d} (d/dτ)[ p_{τ−s}(x−y) h(τ,y) ] dy
   = ∫_{ℝ^d} [ ½ (Δp_{τ−s})(x−y) h(τ,y) + p_{τ−s}(x−y) ḣ(τ,y) ] dy
   = ∫_{ℝ^d} [ ½ p_{τ−s}(x−y) Δ_y h(τ,y) + p_{τ−s}(x−y) ḣ(τ,y) ] dy
   = ∫_{ℝ^d} p_{τ−s}(x−y) (Lh)(τ,y) dy =: ( e^{(τ−s)Δ/2} (Lh) )(τ,x),  (31.14)

where

  (Lh)(τ,y) := ḣ(τ,y) + ½ Δ_y h(τ,y).  (31.15)

Since

  ( e^{(τ−s)Δ/2} (Lh) )(τ,x) := ∫_{ℝ^d} p_{τ−s}(x−y) (Lh)(τ,y) dy = E[ (Lh)(τ, x + √(τ−s) N) ]
where N ≐ N(0, I_{d×d}), we see that ( e^{(τ−s)Δ/2} (Lh) )(τ,x) (and similarly that ( e^{(τ−s)Δ/2} h )(τ,x)) is a continuous function in (τ,x) for τ ≥ s. Moreover, ( e^{(τ−s)Δ/2} h )(τ,x)|_{τ=s} = h(s,x) and so by the fundamental theorem of calculus (using Eq. (31.14)),

  ( e^{(t−s)Δ/2} h )(t,x) = h(s,x) + ∫_s^t ( e^{(τ−s)Δ/2} (Lh) )(τ,x) dτ.  (31.16)

Hence for A ∈ B_{s+},

  E_ν[h(t,B_t) : A] = E_ν[ E_ν[h(t,B_t) | B_{s+}] : A ] = E_ν[ ( e^{(t−s)Δ/2} h )(t,B_s) : A ]
   = E_ν[ h(s,B_s) + ∫_s^t ( e^{(τ−s)Δ/2} (Lh) )(τ,B_s) dτ : A ]
   = E_ν[h(s,B_s) : A] + ∫_s^t E_ν[ ( e^{(τ−s)Δ/2} (Lh) )(τ,B_s) : A ] dτ.  (31.17)

Since

  ( e^{(τ−s)Δ/2} (Lh) )(τ,B_s) = E_ν[ (Lh)(τ,B_τ) | B_{s+} ],

we have

  E_ν[ ( e^{(τ−s)Δ/2} (Lh) )(τ,B_s) : A ] = E_ν[ (Lh)(τ,B_τ) : A ]

which combined with Eq. (31.17) shows,

  E_ν[h(t,B_t) : A] = E_ν[h(s,B_s) : A] + ∫_s^t E_ν[(Lh)(τ,B_τ) : A] dτ = E_ν[ h(s,B_s) + ∫_s^t (Lh)(τ,B_τ) dτ : A ].

This proves {M_t}_{t≥0} is a martingale when h(t,x) = 0 if |x| ≥ R.
2. For the general case, let φ ∈ C_c^∞(ℝ^d, [0,1]) such that φ = 1 in a neighborhood of 0 ∈ ℝ^d and for n ∈ ℕ, let φ_n(x) := φ(x/n). Observe that φ_n → 1, ∇φ_n(x) = (1/n)(∇φ)(x/n), and Δφ_n(x) = (1/n²)(Δφ)(x/n) all go to zero boundedly as n → ∞. Applying case 1. to h_n(t,x) = φ_n(x) h(t,x) we find that

  M_t^n := φ_n(B_t) h(t,B_t) − ∫_0^t φ_n(B_τ) ( ḣ(τ,B_τ) + ½ Δh(τ,B_τ) ) dτ + ε_n(t)

is a martingale, where

  ε_n(t) := −∫_0^t ( ½ Δφ_n(B_τ) h(τ,B_τ) + ∇φ_n(B_τ)·∇h(τ,B_τ) ) dτ.

By DCT,

  E_ν|ε_n(t)| ≤ ∫_0^t ½ E_ν|Δφ_n(B_τ) h(τ,B_τ)| dτ + ∫_0^t E_ν[ |∇φ_n(B_τ)| |∇h(τ,B_τ)| ] dτ → 0 as n → ∞,

and similarly,

  E_ν|φ_n(B_t) h(t,B_t) − h(t,B_t)| → 0,
  ∫_0^t E_ν| φ_n(B_τ) ḣ(τ,B_τ) − ḣ(τ,B_τ) | dτ → 0, and
  ½ ∫_0^t E_ν| φ_n(B_τ) Δh(τ,B_τ) − Δh(τ,B_τ) | dτ → 0

as n → ∞. From these comments one easily sees that E_ν|M_t^n − M_t| → 0 as n → ∞ which is sufficient to show {M_t}_{t∈[0,T]} is still a martingale.
Rewriting Eq. (31.12) we have

  h(t,B_t) = M_t + ∫_0^t ( ḣ(τ,B_τ) + ½ Δ_x h(τ,B_τ) ) dτ.

Considering the simplest case where h(t,x) = f(x) and x ∈ ℝ, i.e. d = 1, we have

  f(B_t) = M_t + ½ ∫_0^t f''(B_τ) dτ

where M_t is a martingale provided,

  sup_{0≤t≤T} E[ |f(B_t)| + |f'(B_t)| + |f''(B_t)| ] < ∞.

From this it follows that if f'' ≥ 0 (i.e. f is subharmonic) then f(B_t) is a submartingale and if f'' ≤ 0 (i.e. f is superharmonic) then f(B_t) is a supermartingale. More precisely, if f : ℝ^d → ℝ is a C² function, then f(B_t) is a local submartingale (supermartingale) iff Δf ≥ 0 (Δf ≤ 0).
Corollary 31.16. Suppose h : [0,∞)×ℝ^d → ℝ is a C² function such that (∂_t + ½Δ)h = 0 and both h and h² satisfy the hypothesis of Theorem 31.15. If we let M_t denote the martingale, M_t := h(t,B_t), and

  A_t := ∫_0^t |∇_x h(τ,B_τ)|² dτ,

then N_t := M_t² − A_t is a martingale. Thus the submartingale M_t² has the Doob decomposition,

  M_t² = N_t + A_t,

and we call the increasing process, A_t, the compensator to M_t².

Proof. We need only apply Theorem 31.15 to h². In order to do this we need to compute (∂_t + ½Δ)h². Since

  ∂_i² h² = ∂_i(2h ∂_i h) = 2(∂_i h)² + 2h ∂_i² h,

we see that ½Δh² = hΔh + |∇h|². Therefore,

  (∂_t + ½Δ)h² = 2h ḣ + hΔh + |∇h|² = |∇h|²,

and hence the proposition follows from Eq. (31.12) with h replaced by h².
Exercise 31.4 (h-transforms of B_t). Let {B_t}_{t∈ℝ_+} be a d-dimensional Brownian motion. Show the following processes are martingales;

1. M_t = u(B_t) where u : ℝ^d → ℝ is a harmonic function, Δu = 0, such that sup_{t≤T} E[|u(B_t)| + |∇u(B_t)|] < ∞ for all T < ∞.
2. M_t = λ·B_t for all λ ∈ ℝ^d.
3. M_t = e^{λY_t} cos(λX_t) and M_t = e^{λY_t} sin(λX_t) where B_t = (X_t, Y_t) is a two dimensional Brownian motion and λ ∈ ℝ.
4. M_t = |B_t|² − d·t.
5. M_t = (a·B_t)(b·B_t) − (a·b) t for all a, b ∈ ℝ^d.
6. M_t := e^{λ·B_t − |λ|² t/2} for any λ ∈ ℝ^d.

Exercise 31.5 (Compensators). Let {B_t}_{t∈ℝ_+} be a d-dimensional Brownian motion. Find the compensator, A_t, for each of the following square integrable martingales.

1. M_t = u(B_t) where u : ℝ^d → ℝ is a harmonic function, Δu = 0, such that sup_{t≤T} E[ u²(B_t) + |∇u(B_t)|² ] < ∞ for all T < ∞.
2. M_t = λ·B_t for all λ ∈ ℝ^d.
3. M_t = |B_t|² − d·t.
4. M_t := e^{λ·B_t − |λ|² t/2} for any λ ∈ ℝ^d.
For the next three exercises let a < 0 < b,

  T_y := inf{t > 0 : B_t = y} for y ∈ ℝ,  (31.18)

and τ := T_a ∧ T_b. Recall from the law of large numbers for Brownian motions (Corollary 26.19) that P_0(τ < ∞) = 1. This may also be deduced from Lemma 31.10 but this is using a rather large hammer to conclude a simple result.

Remark 31.17. In applying the optional sampling Theorem 28.9, observe that if τ is any optional time and K < ∞ is a constant, then τ ∧ K is a bounded optional time. Indeed,

  {τ ∧ K < t} = { {τ < t} ∈ B_t if t ≤ K; Ω ∈ B_t if t > K. }

Therefore if σ and τ are any optional times with σ ≤ τ, we may apply Theorem 28.9 with σ and τ replaced by σ ∧ K and τ ∧ K. One may then try to pass to the limit as K ↑ ∞ in the resulting identity.
Exercise 31.6 (Compare with Exercise 18.23). Let −∞ < a < 0 < b < ∞. Show

  P_0(T_b < T_a) = (−a)/(b − a) = |a|/(b + |a|).

Use this to conclude that P_0(T_b < ∞) = 1. Hint: consider the optional sampling Theorem 28.9 with M_t = B_t.
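The gambler's-ruin identity in this exercise can be checked with a simple symmetric random walk (the discrete analogue of B, for which the same optional sampling argument gives exactly |a|/(b+|a|) when a, b are integers). A minimal sketch, not from the notes, assuming Python's standard library:

```python
import random

random.seed(0)
a, b = -1, 2            # integer barriers a < 0 < b, so the walk hits them exactly
n_walks = 20_000
hits_b = 0
for _ in range(n_walks):
    s = 0
    while a < s < b:    # run until the walk exits (a, b)
        s += random.choice((-1, 1))
    if s == b:
        hits_b += 1
p_hat = hits_b / n_walks
print(p_hat, abs(a) / (b + abs(a)))   # estimate vs |a|/(b+|a|) = 1/3
```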
Exercise 31.7 (Compare with Exercise 18.24). Let −∞ < a < 0 < b < ∞. Apply the optional sampling Theorem 28.9 to the martingale, M_t = B_t² − t, to conclude that

  E_0[T_a ∧ T_b] = E_0[τ] = |a| · b.

Use this to conclude that E_0[T_b] = ∞ for all b > 0.
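The expected exit time E_0[τ] = |a|·b can also be checked on the simple symmetric random walk, for which the martingale S_n² − n gives exactly E[number of steps] = |a|·b for integer barriers. A hedged sketch (not from the notes, Python standard library only):

```python
import random

random.seed(1)
a, b = -1, 2
n_walks = 20_000
total_steps = 0
for _ in range(n_walks):
    s, steps = 0, 0
    while a < s < b:
        s += random.choice((-1, 1))
        steps += 1
    total_steps += steps
mean_tau = total_steps / n_walks
print(mean_tau, abs(a) * b)   # sample mean vs |a| b = 2
```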
Exercise 31.8. By considering the martingale,

  M_t := e^{λB_t − ½λ²t},

show

  E_0[ e^{−λT_a} ] = e^{−a√(2λ)}.  (31.19)
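Eq. (31.19) can be probed by Monte Carlo. The sketch below (not from the notes; assumes numpy) simulates the first hitting time of the level a on a time grid and compares E_0[e^{−λT_a}] with e^{−a√(2λ)}; the time discretization makes the estimate biased slightly low, and paths that have not hit by t_max contribute at most e^{−λ t_max}, which is negligible here.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, a = 1.0, 1.0
n_paths, n_steps, t_max = 4_000, 2_000, 8.0
dt = t_max / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)

hit = B >= a
has_hit = hit.any(axis=1)
tau = (hit.argmax(axis=1) + 1) * dt   # first grid time at or above a

# Drop un-hit paths; each would contribute at most e^{-lam*t_max} ~ 3e-4.
est = np.exp(-lam * tau[has_hit]).sum() / n_paths
exact = np.exp(-a * np.sqrt(2 * lam))
print(est, exact)
```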
Remark 31.18. Equation (31.19) may also be proved using Lemma 31.10 directly. Indeed, if

  p_t(x) := (1/√(2πt)) e^{−x²/2t},

then

  E_0[ e^{−λT_a} ] = E_0[ λ ∫_{T_a}^∞ e^{−λt} dt ] = E_0[ λ ∫_0^∞ 1_{T_a<t} e^{−λt} dt ]
   = λ ∫_0^∞ P(T_a < t) e^{−λt} dt = −2 ∫_0^∞ ( ∫_a^∞ p_t(x) dx ) (d/dt) e^{−λt} dt.
Integrating by parts in t, shows

  −2 ∫_0^∞ ( ∫_a^∞ p_t(x) dx ) (d/dt) e^{−λt} dt = 2 ∫_0^∞ ( ∫_a^∞ (d/dt) p_t(x) dx ) e^{−λt} dt
   = 2 ∫_0^∞ ( ∫_a^∞ ½ p_t''(x) dx ) e^{−λt} dt
   = −∫_0^∞ p_t'(a) e^{−λt} dt
   = ∫_0^∞ (1/√(2πt)) (a/t) e^{−a²/2t} e^{−λt} dt
   = (a/√(2π)) ∫_0^∞ t^{−3/2} e^{−a²/2t} e^{−λt} dt,

where the last integral may be evaluated to be e^{−a√(2λ)} using the Laplace transform function of Mathematica.
Alternatively, consider

  ∫_0^∞ p_t(x) e^{−λt} dt = ∫_0^∞ (1/√(2πt)) e^{−x²/2t} e^{−λt} dt = (1/√(2π)) ∫_0^∞ (1/√t) exp( −x²/2t − λt ) dt

which is evaluated in Exercise 34.1 in the real analysis notes. Here is a brief sketch.

Using the theory of the Fourier transform (or characteristic functions if you prefer),

  (√(2π)/(2m)) e^{−m|x|} = ∫_ℝ ( |ξ|² + m² )^{−1} e^{iξx} dξ̄
   = ∫_ℝ ( ∫_0^∞ e^{−λ(|ξ|²+m²)} dλ ) e^{iξx} dξ̄
   = ∫_0^∞ dλ e^{−λm²} ( ∫_ℝ dξ̄ e^{−λ|ξ|²} e^{iξx} )
   = ∫_0^∞ e^{−λm²} (2λ)^{−1/2} e^{−x²/(4λ)} dλ,

where dξ̄ := (2π)^{−1/2} dξ. Now make the appropriate change of variables and carry out the remaining integral to arrive at the result.
32 The Feynman-Kac Formula

The goal of this section is to more deeply explore the connection between Brownian motion and certain heat type equations. We will start with some basic facts about the heat equation and give some heuristics indicating how one might solve these equations via Brownian motion. The resulting probabilistic representation is referred to as the Feynman-Kac formula. At the end we will see how to make the heuristics rigorous. We begin with the heat equation.
32.1 The Heat Equation

Suppose that Ω ⊂ ℝ^d is a region of space filled with a material, ρ(x) is the density of the material at x ∈ Ω and c(x) is the heat capacity. Let u(x,t) denote the temperature at time t ∈ [0,∞) at the spatial point x ∈ Ω. Now suppose that B ⊂ ℝ^d is a subregion in Ω, ∂B is the boundary of B, and E_B(t) is the heat energy contained in the volume B at time t. Then

  E_B(t) = ∫_B ρ(x) c(x) u(t,x) dx.

So on one hand (writing ḟ(t) for ∂f(t)/∂t),

  Ė_B(t) := (d/dt) E_B(t) = ∫_B ρ(x) c(x) u̇(t,x) dx  (32.1)

while on the other hand,

  Ė_B(t) = ∫_{∂B} ⟨G(x)∇u(t,x), n(x)⟩ dσ(x),  (32.2)

where G(x) is an n×n positive definite matrix representing the conduction properties of the material, n(x) is the outward pointing normal to B at x ∈ ∂B, and dσ denotes surface measure on ∂B. (We are using ⟨·,·⟩ to denote the standard dot product on ℝ^d and ∇u(x,t) = ( ∂u(x,t)/∂x_1, …, ∂u(x,t)/∂x_n )^tr.)

In order to see that we have the sign correct in Eq. (32.2), suppose that x ∈ ∂B and

  ⟨G(x)∇u(t,x), n(x)⟩ = ⟨∇u(t,x), G(x)n(x)⟩ > 0.

In this case the temperature is increasing as one moves from x in the direction of G(x)n(x) and since ⟨n(x), G(x)n(x)⟩ > 0, G(x)n(x) is outward pointing to B. Thus if ⟨G(x)∇u(t,x), n(x)⟩ > 0 near x, it is hotter outside B than inside and hence heat energy is flowing into B in a neighborhood of ∂B near x, see Figure 32.1.

Fig. 32.1. The geometry of the test region, B ⊂ Ω.

Comparing Eqs. (32.1) to (32.2) after an application of the divergence theorem shows that

  ∫_B ρ(x) c(x) u̇(t,x) dx = ∫_B ∇·( G(·)∇u(t,·) )(x) dx.  (32.3)

Since this holds for all nice volumes B ⊂ Ω, we conclude that the temperature function should satisfy the following partial differential equation,

  ρ(x) c(x) u̇(t,x) = ∇·( G(·)∇u(t,·) )(x),  (32.4)

or equivalently that

  u̇(t,x) = (1/(ρ(x)c(x))) ∇·( G(x)∇u(t,x) ).  (32.5)

Setting g_{ij}(x) := G_{ij}(x)/(ρ(x)c(x)) and

  z_j(x) := Σ_{i=1}^n ∂( G_{ij}(x)/(ρ(x)c(x)) )/∂x_i,

the above equation may be written as:

  u̇(t,x) = L u(t,x),  (32.6)

where

  (Lf)(x) = Σ_{i,j} g_{ij}(x) (∂²/∂x_i ∂x_j) f(x) + Σ_j z_j(x) (∂/∂x_j) f(x).  (32.7)

The operator L is a prototypical example of a second order elliptic differential operator. In the next section we will consider the special case of the standard Laplacian, Δ, on ℝ^d, i.e. z_j ≡ 0 and g_{ij} = δ_{ij} so that

  Δ = Σ_{i=1}^n ∂²/(∂x_i)².  (32.8)
32.2 Solving the heat equation on ℝ^n

Let

  f̂(k) = (Ff)(k) := (1/2π)^n ∫_{ℝ^d} f(x) e^{−ik·x} dx

be the Fourier transform of f and

  f^∨(x) = (F^{-1}f)(x) := ∫_{ℝ^d} f(k) e^{ik·x} dk

be the inverse Fourier transform. For f nice enough (or in the sense of tempered distributions), we know that

  f(x) = ∫_{ℝ^d} f̂(k) e^{ik·x} dk = (F^{-1} f̂)(x).

Also recall that the Fourier transform and the convolution operation are related by;

  (f * g)^(k) = (1/2π)^n ∫_{ℝ^d×ℝ^d} f(x−y) g(y) e^{−ik·x} dx dy
   = (1/2π)^n ∫_{ℝ^d×ℝ^d} f(x) g(y) e^{−ik·(x+y)} dx dy = (2π)^n f̂(k) ĝ(k).

Inverting this relation gives the relation,

  F^{-1}( f̂ ĝ )(x) = (1/2π)^n (f * g)(x).
The heat equation for a function u : ℝ_+ × ℝ^n → ℂ is the partial differential equation

  ( ∂_t − ½ Δ_x ) u(t,x) = 0 with u(0,x) = f(x),  (32.9)

where f is a given function on ℝ^n. By Fourier transforming Eq. (32.9) in the x-variables only, one finds (after some integration by parts) that (32.9) implies that

  ( ∂/∂t + ½|k|² ) û(t,k) = 0 with û(0,k) = f̂(k).  (32.10)

Solving for û(t,k) gives

  û(t,k) = e^{−t|k|²/2} f̂(k).

Inverting the Fourier transform then shows that

  u(t,x) = F^{-1}( e^{−t|k|²/2} f̂(k) )(x) = (1/2π)^n ( F^{-1}( e^{−t|k|²/2} ) * f )(x).  (32.11)

Let

  g(x) := ( F^{-1} e^{−(t/2)|k|²} )(x) = ∫_{ℝ^d} e^{−(t/2)|k|²} e^{ik·x} dk.

Making the change of variables, k → k/t^{1/2}, and using a standard Gaussian integral formula gives

  g(x) = (1/t)^{n/2} ∫_{ℝ^d} e^{−½|k|²} e^{ik·x/√t} dk = (2π/t)^{n/2} (1/2π)^{n/2} ∫_{ℝ^d} e^{−½|k|²} e^{ik·x/√t} dk
   = (2π/t)^{n/2} exp( −½ |x/√t|² ) = (2π/t)^{n/2} exp( −(1/2t)|x|² ).  (32.12)

Using this result in Eq. (32.11) implies

  u(t,x) = ∫_{ℝ^n} p_t(x−y) f(y) dy

where

  p_t(x) := (1/(2πt))^{n/2} exp( −(1/2t)|x|² ).  (32.13)

This suggests the following theorem.
Theorem 32.1. Let p_t(x) be the heat kernel on ℝ^n defined in Eq. (32.13). Then

  ( ∂_t − ½ Δ_x ) p_t(x−y) = 0 and lim_{t↓0} p_t(x−y) = δ_x(y),  (32.14)

where δ_x is the δ-function at x in ℝ^n. More precisely, if f is a bounded continuous function on ℝ^n, then

  u(t,x) = ∫_{ℝ^n} p_t(x−y) f(y) dy  (32.15)

is a solution to Eq. (32.9) and lim_{t↓0} u(t,x) = f(x) uniformly for x ∈ K, where K is any compact subset of ℝ^n.

Proof. Direct computations show that ( ∂_t − ½Δ_x ) p_t(x−y) = 0 which coupled with a few applications of Corollary 7.30 shows ( ∂_t − ½Δ_x ) u(t,x) = 0 for t > 0. After making the changes of variables, y → x−y and then y → √t y, Eq. (32.15) may be written as

  u(t,x) = ∫_{ℝ^n} p_t(y) f(x−y) dy = ∫_{ℝ^n} p_1(y) f(x−√t y) dy

and therefore,

  |u(t,x) − f(x)| = | ∫_{ℝ^n} p_1(y) ( f(x−√t y) − f(x) ) dy | ≤ ∫_{ℝ^n} p_1(y) | f(x−√t y) − f(x) | dy.  (32.16)

For R > 0,

  sup_{x∈K} ∫_{|y|≤R} p_1(y) | f(x−√t y) − f(x) | dy ≤ sup_{x∈K} sup_{|y|≤R} | f(x−√t y) − f(x) | → 0

as t ↓ 0 by the uniform continuity of f on compact sets. If M = sup_{x∈ℝ^n} |f(x)|, then by Chebyshev's inequality,

  ∫_{|y|>R} p_1(y) | f(x−√t y) − f(x) | dy ≤ 2M ∫_{|y|>R} p_1(y) dy ≤ C · (2M/R)

where C := ∫_{ℝ^n} |y| p_1(y) dy. Hence we have shown,

  limsup_{t↓0} sup_{x∈K} |u(t,x) − f(x)| ≤ C · (2M/R) → 0 as R → ∞.

This shows that lim_{t↓0} u(t,x) = f(x) uniformly on compact subsets of ℝ^n.
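The heat-kernel identity (∂_t − ½∂_x²) p_t(x) = 0 can be confirmed numerically by central finite differences; the sketch below (not part of the notes, assuming numpy) evaluates the residual at a few points for the 1-d kernel of Eq. (32.13).

```python
import numpy as np

def p(t, x):
    # 1-d heat kernel from Eq. (32.13)
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

t, h = 1.0, 1e-3
xs = np.array([0.0, 0.5, 1.0, 2.0])

# central differences for d/dt and d^2/dx^2
dpdt = (p(t + h, xs) - p(t - h, xs)) / (2 * h)
d2pdx2 = (p(t, xs + h) - 2 * p(t, xs) + p(t, xs - h)) / h**2

residual = np.abs(dpdt - 0.5 * d2pdx2).max()
print(residual)   # should be ~0 up to finite-difference error
```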
Notation 32.2 We will let ( e^{tΔ/2} f )(x) be defined by

  ( e^{tΔ/2} f )(x) = ∫_{ℝ^n} p_t(x−y) f(y) dy = (p_t * f)(x).

Hence for nice enough f (for example when f is bounded and continuous), u(t,x) := ( e^{tΔ/2} f )(x) solves the heat equation in Eq. (32.9).

Exercise 32.1 (Semigroup Property). Verify the semi-group identity for p_t;

  p_{t+s} = p_s * p_t for all s, t > 0.  (32.17)
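The semigroup identity (32.17) is easy to verify numerically by quadrature, since the Gaussian tails make a truncated Riemann sum extremely accurate. A sketch (not from the notes, assuming numpy):

```python
import numpy as np

def p(t, x):
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

s, t = 0.3, 0.5
y = np.linspace(-10.0, 10.0, 20_001)
dy = y[1] - y[0]
xs = np.array([-1.0, 0.0, 0.7, 2.0])

# (p_s * p_t)(x) by a Riemann sum, compared against p_{s+t}(x).
conv = np.array([(p(s, x - y) * p(t, y)).sum() * dy for x in xs])
err = np.abs(conv - p(s + t, xs)).max()
print(err)
```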
Proposition 32.3 (Properties of e^{tΔ/2}). Let t ∈ (0,∞), then;

1. for f ∈ L^p(ℝ^n, dx) with 1 ≤ p ≤ ∞, the function

  ( e^{tΔ/2} f )(x) = ∫_{ℝ^n} f(y) (2πt)^{−n/2} e^{−|x−y|²/2t} dy

is smooth¹ in (t,x) for t > 0 and x ∈ ℝ^n.
2. e^{tΔ/2} acts as a contraction on L^p(ℝ^n, dx) for all p ∈ [1,∞] and t > 0.
3. For p ∈ [1,∞), e^{tΔ/2} f = p_t * f → f in L^p(ℝ^n, dx) as t ↓ 0.

Proof. Item 1. follows by multiple applications of Corollary 7.30.

Item 2.

  |(p_t * f)(x)| ≤ ∫_{ℝ^n} |f(y)| p_t(x−y) dy

and hence with the aid of Jensen's inequality we have,

  ‖p_t * f‖^p_{L^p} ≤ ∫_{ℝ^n} ∫_{ℝ^n} |f(y)|^p p_t(x−y) dy dx = ‖f‖^p_{L^p}.

So p_t* is a contraction for all t > 0.

Item 3. First, let us suppose that f ∈ C_c(ℝ^n). From Eq. (32.16) along with Jensen's inequality, we find

  ∫_{ℝ^n} |(p_t * f)(x) − f(x)|^p dx ≤ ∫_{ℝ^n} dx ∫_{ℝ^n} p_1(y) | f(x−√t y) − f(x) |^p dy
   = ∫_{ℝ^n} dy p_1(y) ∫_{ℝ^n} dx | f(x−√t y) − f(x) |^p.  (32.18)

¹ In fact, u(t,x) is real analytic for x ∈ ℝ^n and t > 0. One just notices that p_t(x−y) analytically continues to Re t > 0 and x ∈ ℂ^n and then shows that it is permissible to differentiate under the integral.
Since

  g(t,y) := ∫_{ℝ^n} | f(x−√t y) − f(x) |^p dx ≤ 2^p ∫_{ℝ^n} |f(x)|^p dx

and lim_{t↓0} g(t,y) = 0, we may pass to the limit (using the DCT) in Eq. (32.18) to find lim_{t↓0} ‖p_t * f − f‖_p = 0.

Now suppose g ∈ L^p(ℝ^n) and f ∈ C_c(ℝ^n), then

  ‖p_t * g − g‖_p ≤ ‖p_t * g − p_t * f‖_p + ‖p_t * f − f‖_p + ‖f − g‖_p ≤ 2‖f − g‖_p + ‖p_t * f − f‖_p

and therefore,

  limsup_{t↓0} ‖p_t * g − g‖_p ≤ 2‖f − g‖_p.

Since this inequality is valid for all f ∈ C_c(ℝ^n) and, by Theorem 12.27, C_c(ℝ^n) is dense in L^p(ℝ^n), we may conclude that limsup_{t↓0} ‖p_t * g − g‖_p = 0.
Theorem 32.4 (Forced Heat Equation). Suppose g ∈ C_b(ℝ^d) and f ∈ C_b^{1,2}([0,∞)×ℝ^d), then

  u(t,x) := (p_t * g)(x) + ∫_0^t ( p_{t−τ} * f(τ,·) )(x) dτ

solves ∂u/∂t = ½Δu + f with u(0,·) = g.

Proof. Because of Theorem 32.1, we may without loss of generality assume g = 0 in which case

  u(t,x) = ∫_0^t ( p_τ * f(t−τ,·) )(x) dτ.

Therefore

  ∂u/∂t (t,x) = ( p_t * f(0,·) )(x) + ∫_0^t ( p_τ * ∂_t f(t−τ,·) )(x) dτ = ( p_t * f_0 )(x) − ∫_0^t ( p_τ * ∂_τ[f(t−τ,·)] )(x) dτ

and

  ½Δ u(t,x) = ∫_0^t ( p_τ * ½Δ f(t−τ,·) )(x) dτ.

Hence we find, using integration by parts and approximate δ-function arguments, that

  ( ∂_t − ½Δ ) u(t,x) = ( p_t * f_0 )(x) + lim_{ε↓0} ∫_ε^t ( p_τ * ( −∂_τ − ½Δ ) f(t−τ,·) )(x) dτ
   = ( p_t * f_0 )(x) − lim_{ε↓0} [ ( p_τ * f(t−τ,·) )(x) ]_{τ=ε}^{τ=t} + lim_{ε↓0} ∫_ε^t ( ( ∂_τ p_τ − ½Δ p_τ ) * f(t−τ,·) )(x) dτ
   = ( p_t * f_0 )(x) − ( p_t * f_0 )(x) + lim_{ε↓0} ( p_ε * f(t−ε,·) )(x)
   = f(t,x),

where f_0 := f(0,·) and we have used ∂_τ p_τ = ½Δ p_τ so that the remaining integral vanishes.
32.3 Wiener Measure Heuristics and the Feynman-Kac formula

Theorem 32.5 (Trotter Product Formula). Let A and B be d×d matrices. Then

  e^{A+B} = lim_{n→∞} ( e^{A/n} e^{B/n} )^n.
Proof. By the chain rule,

  (d/dε)|_{ε=0} log( e^{εA} e^{εB} ) = A + B.

Hence by Taylor's theorem with remainder,

  log( e^{εA} e^{εB} ) = ε(A+B) + O(ε²)

which is equivalent to

  e^{εA} e^{εB} = e^{ε(A+B) + O(ε²)}.

Taking ε = 1/n and raising the result to the n-th power gives

  ( e^{n^{-1}A} e^{n^{-1}B} )^n = ( e^{n^{-1}(A+B) + O(n^{-2})} )^n = e^{A+B+O(n^{-1})} → e^{A+B} as n → ∞.
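The matrix Trotter formula can be tested directly. The sketch below (not from the notes, assuming numpy) uses the non-commuting nilpotent matrices A = [[0,1],[0,0]] and B = [[0,0],[1,0]], for which e^{A+B} = [[cosh 1, sinh 1],[sinh 1, cosh 1]]; the error of (e^{A/n}e^{B/n})^n is O(1/n).

```python
import numpy as np

def expm(M, terms=30):
    # Taylor-series matrix exponential; adequate for the small matrices here.
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
n = 2_000

approx = np.linalg.matrix_power(expm(A / n) @ expm(B / n), n)
exact = expm(A + B)
err = np.abs(approx - exact).max()
print(err)
```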
Fact 32.6 (Trotter product formula) For nice enough V,

  e^{T(Δ/2 − V)} = strong– lim_{n→∞} [ e^{(T/n)Δ/2} e^{−(T/n)V} ]^n.  (32.19)

See [63] for a rigorous statement of this type.

Lemma 32.7. Let V : ℝ^d → ℝ be a continuous function which is bounded from below, then

  ( [ e^{(T/n)Δ/2} e^{−(T/n)V} ]^n f )(x_0)
   = ∫_{ℝ^{dn}} p_{T/n}(x_0,x_1) e^{−(T/n)V(x_1)} ⋯ p_{T/n}(x_{n−1},x_n) e^{−(T/n)V(x_n)} f(x_n) dx_1 ⋯ dx_n
   = ( 1/√(2π T/n) )^{dn} ∫_{(ℝ^d)^n} exp( −(n/2T) Σ_{i=1}^n |x_i − x_{i−1}|² − (T/n) Σ_{i=1}^n V(x_i) ) f(x_n) dx_1 ⋯ dx_n.  (32.20)
Notation 32.8 Given T > 0 and n ∈ ℕ, let W_{n,T} denote the set of piecewise C¹ paths, ω : [0,T] → ℝ^d, such that ω(0) = 0 and ω̈(τ) = 0 if τ ∉ {(i/n)T}_{i=0}^n =: P_n(T), see Figure 32.2. Further let dm_n denote the unique translation invariant measure on W_{n,T} which is well defined up to a multiplicative constant.

Fig. 32.2. A typical path in W_{n,T}.

With this notation we may rewrite Lemma 32.7 as follows.

Theorem 32.9. Let T > 0 and n ∈ ℕ be given. For τ ∈ [0,T], let τ_+ = (i/n)T if τ ∈ ( ((i−1)/n)T, (i/n)T ]. Then Eq. (32.20) may be written as,

  ( [ e^{(T/n)Δ/2} e^{−(T/n)V} ]^n f )(x_0)
   = (1/Z_n(T)) ∫_{W_{n,T}} e^{ −∫_0^T ( ½|ω̇(τ)|² + V(x_0+ω(τ_+)) ) dτ } f(x_0+ω(T)) dm_n(ω)

where

  Z_n(T) := ∫_{W_{n,T}} e^{ −½ ∫_0^T |ω̇(τ)|² dτ } dm_n(ω).

Moreover, by Trotter's product formula,

  e^{T(Δ/2 − V)} f(x_0) = lim_{n→∞} (1/Z_n(T)) ∫_{W_{n,T}} e^{ −∫_0^T ( ½|ω̇(τ)|² + V(x_0+ω(τ_+)) ) dτ } f(x_0+ω(T)) dm_n(ω).  (32.21)

Following Feynman, at an informal level (see Figure 32.3), W_{n,T} → W_T as n → ∞, where

  W_T := { ω ∈ C([0,T] → ℝ^d) : ω(0) = 0 }.

Moreover, formally passing to the limit in Eq. (32.21) leads us to the following

Fig. 32.3. A typical path in W_T may be approximated better and better by paths in W_{n,T} as n → ∞.
heuristic expression for ( e^{T(Δ/2−V)} f )(x_0);

  ( e^{T(Δ/2−V)} f )(x_0) = (1/Z(T)) ∫_{W_T} e^{ −∫_0^T ( ½|ω̇(τ)|² + V(x_0+ω(τ)) ) dτ } f(x_0+ω(T)) Dω  (32.22)

where Dω is the non-existent "Lebesgue measure" on W_T, and Z(T) is the normalization constant (or partition function) given by

  Z(T) = ∫_{W_T} e^{ −½ ∫_0^T |ω̇(τ)|² dτ } Dω.

This expression may also be written in the Feynman–Kac form as

  e^{T(Δ/2−V)} f(x_0) = ∫_{W_T} e^{ −∫_0^T V(x_0+ω(τ)) dτ } f(x_0+ω(T)) dμ(ω),  (32.23)

where

  dμ(ω) = (1/Z(T)) e^{ −½ ∫_0^T |ω̇(τ)|² dτ } Dω.  (32.24)

Thus our immediate goal is to make sense out of Eq. (32.24).
Let

  H_T := { h ∈ W_T : ∫_0^T |ḣ(τ)|² dτ < ∞ }

with the convention that ∫_0^T |ḣ(τ)|² dτ := ∞ if h is not absolutely continuous. Further let

  ⟨h,k⟩_T := ∫_0^T ḣ(τ)·k̇(τ) dτ for all h, k ∈ H_T

and X_h(ω) := ⟨h,ω⟩_T for h ∈ H_T. Since

  dμ(ω) = (1/Z(T)) e^{ −½ ‖ω‖²_{H_T} } Dω,  (32.25)

dμ(ω) should be a Gaussian measure on H_T and hence we expect,

  E_μ[X_h X_k] = ⟨h,k⟩_T for all h, k ∈ H_T.  (32.26)

According to Proposition 24.6, there exists a Gaussian random field, {X_h}_{h∈H_T}, on some probability space, (Ω, B, P), such that Eq. (32.26) holds. We are applying this corollary with T → H_T and Q(h,k) := ⟨h,k⟩_T. Notice that if Λ is a finite subset of H_T and λ : Λ → ℝ is a function, then

  Σ_{h,k∈Λ} Q(h,k) λ(h) λ(k) = ‖ Σ_{h∈Λ} λ(h) h ‖²_T ≥ 0.

Heuristically, we are thinking that Ω should be the Hilbert space, H_T, and P should be the measure in Eq. (32.25). In this hypothetical setting, we could define B_t : H_T → ℝ^d to be the projection, B_t(ω) = ω(t) for t ∈ [0,T]. Hence for a ∈ ℝ^d,

  a·B_t(ω) = a·ω(t) = ∫_0^T a 1_{[0,t]}(τ)·ω̇(τ) dτ = ⟨h_{a,t}, ω⟩_T = X_{h_{a,t}}(ω)

where

  h_{a,t}(τ) := ∫_0^τ a 1_{[0,t]}(u) du = a (t ∧ τ).

It then follows from Eq. (32.26) that

  E_μ[ (a·B_t)(b·B_s) ] = ⟨h_{a,t}, h_{b,s}⟩_T = ∫_0^T a·b 1_{[0,t]}(τ) 1_{[0,s]}(τ) dτ = a·b (s ∧ t).  (32.27)

Hence we recognize that the process {B_t}_{t≥0} should be our old friend the multidimensional Brownian motion and so μ = Law(B_·). Assuming this to be the case, the informal expression in Eq. (32.23) leads to the following conjecture;
  ( e^{T(Δ/2−V)} f )(x_0) = E[ e^{ −∫_0^T V(x_0+B_τ) dτ } f(x_0+B_T) ],

which is the Feynman–Kac formula. We will discuss a rigorous proof of this formula in the next section.
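Before turning to the rigorous proof, the conjectured formula can be tested by Monte Carlo in a case with a known closed form. For V(x) = x²/2, f ≡ 1, x_0 = 0 and d = 1, the standard Cameron–Martin/Mehler formula (a fact assumed here, not derived in these notes) gives E[exp(−½∫_0^T B_τ² dτ)] = (cosh T)^{−1/2}. A sketch, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1.0
n_paths, n_steps = 20_000, 500
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)

# Riemann sum for the integral of V(B_s) = B_s^2 / 2 along each path.
I = 0.5 * (B**2).sum(axis=1) * dt
est = np.exp(-I).mean()
exact = 1.0 / np.sqrt(np.cosh(T))   # Cameron-Martin / Mehler closed form
print(est, exact)
```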
32.4 Proving the Feynman-Kac Formula

Suppose that V : ℝ^d → ℝ is a smooth function such that k := inf_{x∈ℝ^d} V(x) > −∞ and, for f ≥ 0 or f bounded and measurable, let²

  T_t f(x) := E_x[ e^{ −∫_0^t V(B_τ) dτ } f(B_t) ] = E_0[ e^{ −∫_0^t V(x+B_τ) dτ } f(x+B_t) ].

Let us observe that for f ≥ 0 and p, q ∈ (1,∞) such that p^{-1} + q^{-1} = 1,

  |(T_t f)(x)| ≤ e^{−kt} E_x[|f(B_t)|] = e^{−kt} ∫_{ℝ^d} |f(y)| p_t(x,y) dy ≤ e^{−kt} ‖f‖_p ‖p_t(x,·)‖_q  (32.28)

² In what follows, the reader feeling queasy about measurability issues should have a look at Lemma 27.6 and Lemma 27.7 in the next section.
where

  ‖f‖_p := ( ∫_{ℝ^d} |f(y)|^p dy )^{1/p}.

In particular if f = 0, m-a.e., then T_t f(x) = 0 and we can use this to see that (T_t f)(x) is well defined for all f ∈ L^p. Since

  ‖p_t(x,·)‖_q^q = ∫_{ℝ^d} p_t(y−x)^q dy = ∫_{ℝ^d} p_t(y)^q dy
   = (1/(2πt)^{dq/2}) ∫_{ℝ^d} e^{ −(1/(2(t/q)))|y|² } dy = (1/(2πt)^{dq/2}) (2πt/q)^{d/2} = q^{−d/2} (2πt)^{−d(q−1)/2},

Eq. (32.28) gives the quantitative estimate;

  |T_t f(x)| ≤ C(p,t) ‖f‖_{L^p(ℝ^d)},  (32.29)

where

  C(p,t) := q^{−d/(2q)} (2πt)^{−d/(2p)} e^{−kt}.  (32.30)
Theorem 32.10 (Feynman-Kac Formula). Suppose f ∈ L²(ℝ^d, m) and t ≥ 0. Then;

1. T_t is a bounded linear operator on L²(ℝ^d) with ‖T_t‖_op ≤ e^{−kt}, i.e. ‖T_t f‖₂ ≤ e^{−kt} ‖f‖₂ for all f ∈ L²(ℝ^d, m).
2. T_t is self-adjoint, i.e. (T_t f, g) = (f, T_t g) for all f, g ∈ L²(ℝ^d, m) where (f,g) := ∫_{ℝ^d} f(x) g(x) dm(x).
3. {T_t}_{t≥0} is a semi-group, i.e. T_{t+s} = T_t T_s for all t, s ≥ 0.
4. T_t is strongly continuous, i.e. lim_{t↓0} ‖T_t f − f‖_{L²} = 0 for all f ∈ L²(ℝ^d, m).
5. Let

  Af := (d/dt)|_{0+} (T_t f) := L²– lim_{t↓0} (T_t f − f)/t

for those f for which the limit exists. Then Af = ( ½Δ − V ) f for all f ∈ C²_c(ℝ^d). The operator A with its natural domain³ is called the infinitesimal generator of {T_t}_{t≥0}.

³ The domain, D(A), of A consists of those f ∈ L²(ℝ^d, m) such that the limit defining Af exists in the L²(ℝ^d, m)-sense. So we are asserting that C²_c(ℝ^d) ⊂ D(A) and Af = ( ½Δ − V ) f for all f ∈ C²_c(ℝ^d).

Remark 32.11. Some functional analysis along with basic elliptic regularity shows that u(t,x) = T_t f(x) solves the heat equation,

  ∂u/∂t (t,x) = ½Δ u(t,x) − V(x) u(t,x) with lim_{t↓0} u(t,·) = f(·) in L².

See Simon [63] for a proof of Theorem 32.10 (in more generality) using Trotter's product formula.
Proof. To simplify notation a bit we will assume $d=1$ in the proof below and let $\|f\|:=\|f\|_{L^2(\mathbb{R},m)}$.

1. By Fubini's theorem and simple estimates,
$$\|T_tf\|^2=\int_{\mathbb{R}}dx\left|\mathbb{E}_0\left[e^{-\int_0^t V(x+B_\tau)\,d\tau}f(x+B_t)\right]\right|^2\le\int_{\mathbb{R}}\mathbb{E}_0\left|e^{-\int_0^t V(x+B_\tau)\,d\tau}f(x+B_t)\right|^2dx\le e^{-2kt}\,\mathbb{E}_0\int_{\mathbb{R}}|f(x+B_t)|^2\,dx=e^{-2kt}\,\|f\|_2^2.$$

2. We have
$$\begin{aligned}(T_tf,g)&=\int_{\mathbb{R}}\mathbb{E}_0\left[e^{-\int_0^t V(x+B_\tau)\,d\tau}f(x+B_t)\right]g(x)\,dx\\&=\mathbb{E}_0\int_{\mathbb{R}}e^{-\int_0^t V(x+B_\tau)\,d\tau}f(x+B_t)\,g(x)\,dx\\&=\mathbb{E}_0\int_{\mathbb{R}}e^{-\int_0^t V(x-B_t+B_\tau)\,d\tau}f(x)\,g(x-B_t)\,dx.\end{aligned}$$
Now let $b_\tau:=B_{t-\tau}-B_t$ so that $\{b_\tau\}_{0\le\tau\le t}$ is another Brownian motion on $[0,t]$ and observe that $b_t=-B_t$. Hence we have
$$\begin{aligned}(T_tf,g)&=\mathbb{E}_0\int_{\mathbb{R}}e^{-\int_0^t V(x+b_{t-\tau})\,d\tau}f(x)\,g(x+b_t)\,dx\\&=\mathbb{E}_0\int_{\mathbb{R}}e^{-\int_0^t V(x+b_\tau)\,d\tau}f(x)\,g(x+b_t)\,dx\\&=\mathbb{E}_0\int_{\mathbb{R}}e^{-\int_0^t V(x+B_\tau)\,d\tau}g(x+B_t)\,f(x)\,dx\\&=\int_{\mathbb{R}}\mathbb{E}_0\left[e^{-\int_0^t V(x+B_\tau)\,d\tau}g(x+B_t)\right]f(x)\,dx=(f,T_tg).\end{aligned}$$
3. Using the Markov property in Theorem 29.3 we find,
$$\begin{aligned}(T_{t+s}f)(x)&=\mathbb{E}_x\left[e^{-\int_0^{t+s}V(B_\tau)\,d\tau}f(B_{t+s})\right]=\mathbb{E}_x\left[e^{-\int_0^t V(B_\tau)\,d\tau}\,e^{-\int_t^{t+s}V(B_\tau)\,d\tau}f(B_{t+s})\right]\\&=\mathbb{E}_x\left[e^{-\int_0^t V(B_\tau)\,d\tau}\left(e^{-\int_0^s V(B_\tau)\,d\tau}f(B_s)\right)\circ\theta_t\right]\\&=\mathbb{E}_x\left[e^{-\int_0^t V(B_\tau)\,d\tau}\,\mathbb{E}_{B_t}\left[e^{-\int_0^s V(B_\tau)\,d\tau}f(B_s)\right]\right]\\&=\mathbb{E}_x\left[e^{-\int_0^t V(B_\tau)\,d\tau}(T_sf)(B_t)\right]=(T_tT_sf)(x).\end{aligned}$$

4. From the estimate,
$$\begin{aligned}\|T_tf-f\|^2&=\int_{\mathbb{R}}dx\left|\mathbb{E}_0\left[e^{-\int_0^t V(x+B_\tau)\,d\tau}f(x+B_t)-f(x)\right]\right|^2\\&\le\int_{\mathbb{R}}\mathbb{E}_0\left|e^{-\int_0^t V(x+B_\tau)\,d\tau}f(x+B_t)-f(x)\right|^2dx\\&=\int_{\mathbb{R}}\mathbb{E}_0\left|\left(e^{-\int_0^t V(x+B_\tau)\,d\tau}-1\right)f(x+B_t)+f(x+B_t)-f(x)\right|^2dx,\end{aligned}$$
it follows that
$$\limsup_{t\downarrow0}\|T_tf-f\|^2\le\limsup_{t\downarrow0}D_t+\limsup_{t\downarrow0}E_t$$
where
$$D_t=2\int_{\mathbb{R}}\mathbb{E}_0\left|\left(e^{-\int_0^t V(x+B_\tau)\,d\tau}-1\right)f(x+B_t)\right|^2dx\quad\text{and}\quad E_t:=2\,\mathbb{E}_0\int_{\mathbb{R}}|f(x+B_t)-f(x)|^2\,dx.$$
Let us now assume for the moment that $f\in C_c(\mathbb{R})$. In this case, using DCT twice, we learn that $\int_{\mathbb{R}}|f(x+B_t)-f(x)|^2\,dx\to0$ boundedly and hence that $\limsup_{t\downarrow0}E_t=0$. Similarly, if $M$ is a bound on $|f|$, then
$$D_t\le2M^2\int_{\mathbb{R}}\mathbb{E}_0\left|e^{-\int_0^t V(x+B_\tau)\,d\tau}-1\right|^2dx\to0\text{ as }t\downarrow0.$$
So in this special case we have shown $\lim_{t\downarrow0}\|T_tf-f\|=0$.

For general $f\in L^2$ and $g\in C_c(\mathbb{R})$ we have
$$\limsup_{t\downarrow0}\|T_tf-f\|\le\limsup_{t\downarrow0}\left[\|T_tf-T_tg\|+\|T_tg-g\|+\|g-f\|\right]\le\limsup_{t\downarrow0}\left[\|T_tg-g\|+\left(1+e^{-kt}\right)\|g-f\|\right]\le2\|g-f\|.$$
This completes the proof because $C_c(\mathbb{R})$ is dense in $L^2(\mathbb{R},m)$ (see Example 12.28) and hence we may make $\|f-g\|$ as small as we please.

5. Finally we sketch the computation of the infinitesimal generator, $A$, on $C_c^2(\mathbb{R})$. By the chain rule,
$$\begin{aligned}\frac{d}{dt}\Big|_{0+}T_tf(x)&=\frac{d}{dt}\Big|_{0+}\mathbb{E}_x\left[e^{-\int_0^t V(B_\tau)\,d\tau}f(B_t)\right]\\&=\frac{d}{dt}\Big|_{0+}\mathbb{E}_x\left[e^{-\int_0^t V(B_\tau)\,d\tau}f(B_0)\right]+\frac{d}{dt}\Big|_{0+}\mathbb{E}_x[f(B_t)]\\&=-\mathbb{E}_x[V(B_0)f(B_0)]+\frac{d}{dt}\Big|_{0+}(p_t*f)(x)=-V(x)f(x)+\frac12\Delta f(x).\end{aligned}$$
Exercise 32.2 (Ultracontractivity of $T_t$). Let $BC(\mathbb{R}^d)$ denote the bounded continuous functions on $\mathbb{R}^d$ and define
$$\|g\|_\infty=\sup_{x\in\mathbb{R}^d}|g(x)|$$
for $g\in BC(\mathbb{R}^d)$. Suppose $1<p<\infty$ and $f\in L^p(\mathbb{R}^d,m)$. Show $u(t,x):=(T_tf)(x)$ is continuous for $(t,x)\in(0,\infty)\times\mathbb{R}^d$ and is bounded in $x$ for fixed $t>0$. In particular, for any $t>0$, show $T_t$ maps $L^p(\mathbb{R}^d,m)$ into $BC(\mathbb{R}^d)$ and
$$\|T_tf\|_\infty\le C(p,t)\,\|f\|_{L^p(\mathbb{R}^d)},$$
where $C(p,t)$ is defined as in Eq. (32.30). Hint: first verify the continuity of $u(t,x)$ under the additional assumption that $f\in C_c(\mathbb{R}^d)$.
32.5 Appendix: Extensions of Theorem 32.1

Proposition 32.12. Suppose $f:\mathbb{R}^d\to\mathbb{R}$ is a measurable function and there exist constants $c,C<\infty$ such that
$$|f(x)|\le Ce^{\frac{c}{2}\|x\|^2}.$$
Then $u(t,x):=p_t*f(x)$ is smooth for $(t,x)\in(0,c^{-1})\times\mathbb{R}^d$ and for all $k\in\mathbb{N}$ and all multi-indices $\alpha$,
$$D^\alpha\left(\frac{\partial}{\partial t}\right)^ku(t,x)=\left(D^\alpha\left(\frac{\partial}{\partial t}\right)^kp_t\right)*f(x).\qquad(32.31)$$
In particular $u$ satisfies the heat equation $u_t=\Delta u/2$ on $(0,c^{-1})\times\mathbb{R}^d$.

Proof. The reader may check that
$$D^\alpha\left(\frac{\partial}{\partial t}\right)^kp_t(x)=q(t^{-1},x)\,p_t(x)$$
where $q$ is a polynomial in its variables. Let $x_0\in\mathbb{R}^d$ and $\varepsilon>0$ be small, then for $x\in B(x_0,\varepsilon)$ and any $\alpha>0$,
$$|x-y|^2\ge|x|^2-2|x||y|+|y|^2\ge|y|^2+|x|^2-\alpha^{-2}|x|^2-\alpha^2|y|^2\ge\left(1-\alpha^2\right)|y|^2-\left(\alpha^{-2}-1\right)\left(|x_0|+\varepsilon\right)^2.$$
Hence
$$\begin{aligned}g(y)&:=\sup\left\{\left|\left(\frac{\partial}{\partial t}\right)^kp_t(x-y)f(y)\right|:\delta\le t\le c^{-1}-\delta\ \&\ x\in B(x_0,\varepsilon)\right\}\\&\le\sup\left\{\left|q(t^{-1},x-y)\right|\frac{e^{-\frac{1}{2t}|x-y|^2}}{(2\pi t)^{n/2}}\,Ce^{\frac{c}{2}|y|^2}:\delta\le t\le c^{-1}-\delta\ \&\ x\in B(x_0,\varepsilon)\right\}\\&\le C(\delta,x_0,\varepsilon)\sup\left\{\left|q(t^{-1},x-y)\right|\frac{e^{\left[-\frac{1}{2t}(1-\alpha^2)+\frac{c}{2}\right]|y|^2}}{(2\pi t)^{n/2}}:\delta\le t\le c^{-1}-\delta\text{ and }x\in B(x_0,\varepsilon)\right\}.\end{aligned}$$
By choosing $\alpha$ close to $0$, the reader should check using the above expression that for any $0<\gamma<\left(\frac1t-c\right)/2$ there is a $C<\infty$ such that $g(y)\le Ce^{-\gamma|y|^2}$. In particular $g\in L^1(\mathbb{R}^d)$. Hence one is justified in differentiating past the integrals in $p_t*f$ and this proves Eq. (32.31).
Lemma 32.13. There exists a polynomial $q_n(x)$ such that for any $\varepsilon>0$ and $\lambda>0$,
$$\int_{\mathbb{R}^n}1_{\|y\|\ge\varepsilon}\,e^{-\lambda\|y\|^2}\,dy\le\varepsilon^n\,q_n\left(\frac{1}{\lambda\varepsilon^2}\right)e^{-\lambda\varepsilon^2}.$$

Proof. Making the change of variables $y\to\varepsilon y$ and then passing to polar coordinates shows
$$\int_{\mathbb{R}^n}1_{\|y\|\ge\varepsilon}\,e^{-\lambda\|y\|^2}\,dy=\varepsilon^n\int_{\mathbb{R}^n}1_{\|y\|\ge1}\,e^{-\lambda\varepsilon^2\|y\|^2}\,dy=\sigma\left(S^{n-1}\right)\varepsilon^n\int_1^\infty e^{-\lambda\varepsilon^2r^2}r^{n-1}\,dr.$$
Letting $\beta=\lambda\varepsilon^2$ and $\gamma_n(\beta):=\int_{r=1}^\infty e^{-\beta r^2}r^n\,dr$, integration by parts shows
$$\gamma_n(\beta)=\int_{r=1}^\infty r^{n-1}\,d\left(-\frac{e^{-\beta r^2}}{2\beta}\right)=\frac{1}{2\beta}e^{-\beta}+\frac{1}{2\beta}\int_{r=1}^\infty(n-1)r^{n-2}e^{-\beta r^2}\,dr=\frac{1}{2\beta}e^{-\beta}+\frac{n-1}{2\beta}\gamma_{n-2}(\beta).$$
Iterating this equation implies
$$\gamma_n(\beta)=\frac{1}{2\beta}e^{-\beta}+\frac{n-1}{2\beta}\left[\frac{1}{2\beta}e^{-\beta}+\frac{n-3}{2\beta}\gamma_{n-4}(\beta)\right]$$
and continuing in this way shows
$$\gamma_n(\beta)=e^{-\beta}r_n\left(\beta^{-1}\right)+\frac{(n-1)!!}{(2\beta)^\ell}\gamma_i(\beta)$$
where $\ell$ is the integer part of $n/2$, $i=0$ if $n$ is even and $i=1$ if $n$ is odd, and $r_n$ is a polynomial. Since
$$\gamma_0(\beta)=\int_{r=1}^\infty e^{-\beta r^2}\,dr\le\gamma_1(\beta)=\int_{r=1}^\infty re^{-\beta r^2}\,dr=\frac{e^{-\beta}}{2\beta},$$
it follows that
$$\gamma_n(\beta)\le e^{-\beta}q_n\left(\beta^{-1}\right)$$
for some polynomial $q_n$.
Proposition 32.14. Suppose $f\in C(\mathbb{R}^d,\mathbb{R})$ such that $|f(x)|\le Ce^{\frac{c}{2}\|x\|^2}$, then $p_t*f\to f$ uniformly on compact subsets as $t\downarrow0$. In particular, in view of Proposition 32.12, $u(t,x):=p_t*f(x)$ is a solution to the heat equation with $u(0,x)=f(x)$.

Proof. Let $M>0$ be fixed and assume $|x|\le M$ throughout. By uniform continuity of $f$ on compact sets, given $\varepsilon>0$ there exists $\delta=\delta(\varepsilon)>0$ such that $|f(x)-f(y)|\le\varepsilon$ if $|x-y|\le\delta$ and $|x|\le M$. Therefore, choosing $a>c/2$ sufficiently small,
$$\begin{aligned}|p_t*f(x)-f(x)|&=\left|\int p_t(y)\left[f(x-y)-f(x)\right]dy\right|\le\int p_t(y)\,|f(x-y)-f(x)|\,dy\\&\le\varepsilon\int_{|y|\le\delta}p_t(y)\,dy+\frac{C}{(2\pi t)^{n/2}}\int_{|y|\ge\delta}\left[e^{\frac{c}{2}|x-y|^2}+e^{\frac{c}{2}|x|^2}\right]e^{-\frac{1}{2t}|y|^2}\,dy\\&\le\varepsilon+\bar C\,(2\pi t)^{-n/2}\int_{|y|\ge\delta}e^{-\left(\frac{1}{2t}-a\right)|y|^2}\,dy.\end{aligned}$$
So by Lemma 32.13, it follows that
$$|p_t*f(x)-f(x)|\le\varepsilon+\bar C\,(2\pi t)^{-n/2}\,\delta^n\,q_n\left(\frac{1}{\delta^2\left(\frac{1}{2t}-a\right)}\right)e^{-\left(\frac{1}{2t}-a\right)\delta^2}$$
and therefore
$$\limsup_{t\downarrow0}\sup_{|x|\le M}|p_t*f(x)-f(x)|\le\varepsilon\to0\text{ as }\varepsilon\downarrow0.$$
Lemma 32.15. If $q(x)$ is a polynomial on $\mathbb{R}^d$, then
$$\int_{\mathbb{R}^d}p_t(x-y)\,q(y)\,dy=\sum_{n=0}^\infty\frac{t^n}{n!}\frac{\Delta^n}{2^n}q(x).$$

Proof. Since
$$f(t,x):=\int_{\mathbb{R}^d}p_t(x-y)\,q(y)\,dy=\int_{\mathbb{R}^d}p_t(y)\,q(x-y)\,dy=\sum_\alpha a_\alpha(t)\,x^\alpha,$$
$f(t,x)$ is a polynomial in $x$ of degree no larger than that of $q$. Moreover $f(t,x)$ solves the heat equation and $f(t,x)\to q(x)$ as $t\downarrow0$. Since $g(t,x):=\sum_{n=0}^\infty\frac{t^n}{n!}\frac{\Delta^n}{2^n}q(x)$ has the same properties as $f$ ($\Delta$ being a bounded operator when acting on polynomials of a fixed degree), we conclude $f(t,x)=g(t,x)$.
Example 32.16. Suppose $q(x)=x_1x_2+x_3^4$, then
$$\begin{aligned}e^{t\Delta/2}q(x)&=x_1x_2+x_3^4+\frac{t}{2}\Delta\left(x_1x_2+x_3^4\right)+\frac{t^2}{2!\,4}\Delta^2\left(x_1x_2+x_3^4\right)\\&=x_1x_2+x_3^4+\frac{t}{2}\,12x_3^2+\frac{t^2}{2!\,4}\,4!\\&=x_1x_2+x_3^4+6tx_3^2+3t^2.\end{aligned}$$
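The computation in the example is easy to confirm numerically. Since $q$ separates, only the $x_3$ factor needs to be smoothed, and $(p_t*y^4)(x_3)=\mathbb{E}[(x_3+\sqrt t\,Z)^4]$ can be evaluated exactly (for polynomials) by Gauss-Hermite quadrature; the parameter values below are illustrative:

```python
import numpy as np

def heat_smooth_1d(f, x, t, n_nodes=8):
    """(p_t * f)(x) = E[f(x + sqrt(t) Z)], Z standard normal, computed by
    Gauss-Hermite quadrature (exact for polynomials of degree < 2*n_nodes)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # substituting z = sqrt(2) u turns the N(0,1) average into a hermgauss sum
    return float(np.dot(weights, f(x + np.sqrt(2 * t) * nodes)) / np.sqrt(np.pi))

t, x1, x2, x3 = 0.35, 1.2, -0.7, 0.9
lhs = x1 * x2 + heat_smooth_1d(lambda y: y ** 4, x3, t)
rhs = x1 * x2 + x3 ** 4 + 6 * t * x3 ** 2 + 3 * t ** 2  # formula from the example
print(lhs, rhs)
```

Both evaluations agree to machine precision, as Lemma 32.15 predicts.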
Proposition 32.17. Suppose $f\in C^\infty(\mathbb{R}^d)$ and there exists a constant $C<\infty$ such that
$$\sum_{|\alpha|=2N+2}|D^\alpha f(x)|\le Ce^{C\|x\|^2},$$
then
$$(p_t*f)(x)=e^{t\Delta/2}f(x)=\sum_{k=0}^N\frac{t^k}{k!}\frac{\Delta^k}{2^k}f(x)+O\left(t^{N+1}\right)\text{ as }t\downarrow0.$$

Proof. Fix $x\in\mathbb{R}^d$ and let
$$f_N(y):=\sum_{|\alpha|\le2N+1}\frac{1}{\alpha!}D^\alpha f(x)\,y^\alpha.$$
Then by Taylor's theorem with remainder,
$$|f(x+y)-f_N(y)|\le C|y|^{2N+2}\sup_{t\in[0,1]}e^{C\|x+ty\|^2}\le C|y|^{2N+2}e^{2C\left[\|x\|^2+\|y\|^2\right]}\le\tilde C|y|^{2N+2}e^{2C\|y\|^2}$$
and thus
$$\left|\int_{\mathbb{R}^d}p_t(y)f(x+y)\,dy-\int_{\mathbb{R}^d}p_t(y)f_N(y)\,dy\right|\le\tilde C\int_{\mathbb{R}^d}p_t(y)|y|^{2N+2}e^{2C\|y\|^2}\,dy=\tilde Ct^{N+1}\int_{\mathbb{R}^d}p_1(y)|y|^{2N+2}e^{2tC\|y\|^2}\,dy=O\left(t^{N+1}\right).$$
Since $f(x+y)$ and $f_N(y)$ agree to order $2N+1$ for $y$ near zero, it follows that
$$\int_{\mathbb{R}^d}p_t(y)f_N(y)\,dy=\sum_{k=0}^N\frac{t^k}{k!}\frac{\Delta^k}{2^k}f_N(0)=\sum_{k=0}^N\frac{t^k}{k!}\frac{\Delta_y^k}{2^k}f(x+y)\Big|_{y=0}=\sum_{k=0}^N\frac{t^k}{k!}\frac{\Delta^k}{2^k}f(x)$$
which completes the proof.
33 Feller Processes

In this chapter we are going to introduce the class of Feller Markov processes. This class of processes contains most of the examples of continuous time Markov processes that are studied in this book.

Throughout this part of the book, let $(S,\rho)$ be a locally compact, separable metric space and recall that for $x\in S$ and $\varepsilon>0$ we let
$$B(x,\varepsilon)=\{y\in S:\rho(x,y)<\varepsilon\}\quad\text{and}\quad C(x,\varepsilon)=\{y\in S:\rho(x,y)\le\varepsilon\}$$
be the open and closed balls about $x$ respectively. Let us now recall that $(S,\rho)$ is $\sigma$-compact, i.e. there exist compact subsets $\{K_n\}_{n=1}^\infty$ such that $K_n\uparrow S$ as $n\to\infty$. To prove this let $\{x_n\}_{n=1}^\infty$ be a countable dense subset of $S$ and let
$$\varepsilon_n:=\sup\{\varepsilon>0:C(x_n,\varepsilon\wedge2)\text{ is compact}\}.$$
Then $C_n:=C(x_n,\varepsilon_n\wedge1)$ is compact for each $n$. We may then take
$$K_n=\bigcup_{l=1}^nC(x_l,\varepsilon_l\wedge1)\text{ for all }n\in\mathbb{N}.$$
By construction each $K_n$ is compact and I will leave it to the reader to verify that $\bigcup_{n=1}^\infty K_n=S$.
Definition 33.1. A function $f:S\to\mathbb{R}$ is said to vanish at infinity if for every $\varepsilon>0$ there exists a compact set $K_\varepsilon\subset S$ such that $|f(x)|\le\varepsilon$ for all $x\in S\setminus K_\varepsilon$. We denote those $f\in C(S)$ which vanish at infinity by $C_0(S)$.

In terms of an exhausting sequence of compact sets $\{K_n\}_{n=1}^\infty$ as constructed above, we can rephrase the definition that $f$ vanishes at infinity as
$$\limsup_{n\to\infty}\sup_{x\in S\setminus K_n}|f(x)|=0.$$
Finally, when $S$ is not compact, let $\hat S:=S\cup\{\infty\}$ denote the one point compactification of $S$, where $V\subset\hat S$ is an open neighborhood of $\infty$ iff $\hat S\setminus V$ is compact in $S$. To any $f\in C_0(S)$ we will define $f(\infty)=0$ and view $f$ as a function on $\hat S$. This map is inverse to the restriction map, $C_\infty(\hat S)\ni f\to f|_S\in C(S)$, where $C_\infty(\hat S)$ consists of those $f\in C(\hat S)$ such that $f(\infty)=0$.
We are now going to extend the notion introduced in Definition 17.12.

Definition 33.2 (sub-Markov transition kernels). We say a collection of sub-probability kernels, $\{Q_t\}_{t\ge0}$ on $(S,\mathcal{B}_S)$, are time homogeneous sub-Markov transition kernels if $Q_0(x,dy)=\delta_x(dy)$ for all $x\in S$ and the Chapman-Kolmogorov equations hold;
$$Q_{s+t}=Q_sQ_t\text{ for all }0\le s,t<\infty.\qquad(33.1)$$

The point is that we are now allowing for the possibility that $(Q_t1)(x)=Q_t(x,S)<1$ and hence the terminology of being sub-Markovian. For $t\ge0$ and $x\in S$ let
$$\alpha_t(x):=1-Q_t(x,S)=1-(Q_t1)(x)$$
and let $\alpha_t(\infty)=1$. Then we may define probability kernels on $(\hat S,\mathcal{B}_{\hat S})$ by
$$\hat Q_t(x,A):=Q_t(x,A\cap S)+\alpha_t(x)\,\delta_\infty(A)$$
for all $x\in\hat S$ and $A\in\mathcal{B}_{\hat S}$, where by convention $Q_t(\infty,\cdot)\equiv0$. Alternatively stated, if $f:\hat S\to\mathbb{R}$ is a bounded measurable function, then
$$\left(\hat Q_tf\right)(x)=\hat Q_t(x,f)=Q_t(x,f|_S)+\alpha_t(x)\,f(\infty).$$
Lemma 33.3. The probability kernels $\{\hat Q_t\}_{t\ge0}$ defined above are Markov transition kernels as in Definition 17.12.

Proof. The point is to verify that the Chapman-Kolmogorov equations hold. If $f:\hat S\to\mathbb{R}$ is a bounded measurable function, then for $x\in S$ we have
$$\begin{aligned}\left(\hat Q_s\hat Q_tf\right)(x)&=Q_s\left(x,\left(\hat Q_tf\right)\big|_S\right)+\alpha_s(x)\left(\hat Q_tf\right)(\infty)\\&=Q_s\left(x,Q_t(\cdot,f|_S)+\alpha_t(\cdot)f(\infty)\right)+\alpha_s(x)f(\infty)\\&=Q_{s+t}(x,f|_S)+\left[(Q_s\alpha_t)(x)+\alpha_s(x)\right]f(\infty).\end{aligned}$$
Since
$$\alpha_s+Q_s\alpha_t=\alpha_s+Q_s\left(1-Q_t(\cdot,S)\right)=1-Q_s1+Q_s1-Q_{s+t}1=1-Q_{s+t}1=\alpha_{s+t},$$
we have shown that $\left(\hat Q_s\hat Q_tf\right)(x)=\left(\hat Q_{s+t}f\right)(x)$ for all $x\in S$. The case where $x=\infty$ is simple to check since $\left(\hat Q_tf\right)(\infty)=f(\infty)$ and therefore,
$$\left(\hat Q_s\hat Q_tf\right)(\infty)=\left(\hat Q_tf\right)(\infty)=f(\infty)=\left(\hat Q_{s+t}f\right)(\infty).$$
According to Theorem 17.11 we may now associate an $\hat S$-valued Markov process to the Markov transition kernels, $\{\hat Q_t\}_{t\ge0}$. However, Theorem 17.11 makes no guarantee that this process will have any reasonable sample path properties, e.g. measurability or continuity properties. To overcome this problem we are going to require more restrictions on $Q_t$.

Definition 33.4 (Feller Semigroup). We say that $\{Q_t\}_{t\ge0}$ is a Feller semi-group if $\{Q_t\}_{t\ge0}$ are sub-Markov transition kernels such that;

1. $Q_t(C_0(S))\subset C_0(S)$ for all $t\ge0$ and
2. $Q_tf\to f$ in the uniform norm as $t\downarrow0$ for all $f\in C_0(S)$.

It is possible to associate to these Feller semi-groups Markov processes with nice sample space properties.

Definition 33.5 (Canonical path space). Let $\Omega=\Omega(\hat S)$ denote the collection of functions, $\omega:\mathbb{R}_+\to\hat S$, which are right continuous and possess left hand limits (rcll for short) and are absorbing at $\infty$. Here we say that $\omega$ is absorbing at $\infty$ if either $\omega(t-):=\lim_{\tau\uparrow t}\omega(\tau)=\infty$ or $\omega(t)=\infty$, then $\omega(s)=\infty$ for all $s>t$. For $\omega\in\Omega$ we let
$$\zeta(\omega):=\inf\{t\ge0:\omega(t)=\infty\}$$
and call $\zeta(\omega)$ the life time of $\omega$.

For example if $S=\mathbb{R}$ and
$$\omega(t):=\begin{cases}\frac{1}{1-t}&\text{if }0\le t<1\\ \infty&\text{if }t\ge1,\end{cases}$$
then $\omega\in\Omega(\hat S)$.
Notation 33.6 For $t\ge0$, let $X_t:\Omega\to\hat S$ be defined by $X_t(\omega)=\omega(t)$ for all $\omega\in\Omega$, $\mathcal{B}_t:=\mathcal{B}_t^X:=\sigma(X_s:s\le t)$, $\mathcal{B}_t^+:=\mathcal{B}_{t+}$, and $\mathcal{B}:=\bigvee_{t\ge0}\mathcal{B}_t=\sigma(X_s:0\le s<\infty)$.

Theorem 33.7. Suppose that $\{Q_t\}_{t\ge0}$ is a Feller semi-group. Then for each probability measure, $\nu$, on $(S,\mathcal{B}_S)$, there is a probability measure $P^\nu$ on $(\Omega,\mathcal{B})$ such that; 1) $\operatorname{Law}_{P^\nu}(X_0)=\nu$ and 2) $\{X_t\}_{t\ge0}$ is a time homogeneous Markov process having $\{Q_t\}_{t\ge0}$ as its Markov transition kernels. Moreover if $Q_t$ is conservative for all $t\ge0$ (i.e. $Q_t1=1$ for all $t$) then we may replace $\Omega$ by the rcll paths on $S$.

Rather than give a proof$^1$ of this theorem we will remind the reader of a few examples which we already know satisfy the hypothesis and the conclusion of the theorem.
Example 33.8 (Poisson Process). Let us continue the set up in Example 29.8 so that $S=\mathbb{N}_0$ and
$$Q_tf(x)=\sum_{n=0}^\infty\frac{(\lambda t)^n}{n!}e^{-\lambda t}f(x+n)$$
for all bounded functions, $f:S\to\mathbb{R}$. In this case $f\in C_0(S)$ iff $\lim_{x\to\infty}f(x)=0$ and if this is the case then for all $t>0$ we have by DCT that,
$$\lim_{x\to\infty}Q_tf(x)=\sum_{n=0}^\infty\frac{(\lambda t)^n}{n!}e^{-\lambda t}\lim_{x\to\infty}f(x+n)=0$$
so that $Q_t(C_0(S))\subset C_0(S)$. Moreover if $f:S\to\mathbb{R}$ is a bounded function, then
$$|Q_tf(x)-f(x)|=\left|\sum_{n=0}^\infty\frac{(\lambda t)^n}{n!}e^{-\lambda t}f(x+n)-f(x)\right|\le\left(1-e^{-\lambda t}\right)|f(x)|+e^{-\lambda t}\sum_{n=1}^\infty\frac{(\lambda t)^n}{n!}|f(x+n)|\le\left[\left(1-e^{-\lambda t}\right)+e^{-\lambda t}\sum_{n=1}^\infty\frac{(\lambda t)^n}{n!}\right]\|f\|_u.$$
It now easily follows that $\lim_{t\downarrow0}\|Q_tf-f\|_u=0$ for all bounded functions on $S$ and hence for all $f\in C_0(S)$. This shows that $\{Q_t\}_{t\ge0}$ is a Feller semi-group and so by Theorem 33.7 we have a rcll Markov process on $S$ which we have already constructed in Section 11.3.

$^1$ The idea of the proof is to start with the Markov process constructed in Theorem 17.11. One then uses martingale regularization techniques (see Section 28.2) in order to show that this Markov process admits a version with sample paths in $\Omega(\hat S)$. The interested reader may find the missing details in Kallenberg [28, Theorem 19.15 on p. 379].
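The strong-continuity estimate in the example can be checked numerically. Since $\sum_{n\ge1}(\lambda t)^n/n!=e^{\lambda t}-1$, the bracketed bound collapses to $2(1-e^{-\lambda t})\|f\|_u$, which tends to $0$ as $t\downarrow0$. The sketch below (with an illustrative bounded $f$ of sup norm $1$, a truncated series for $Q_tf$, and the sup over $x$ approximated over a finite range) confirms the inequality:

```python
import math

lam, t = 2.0, 0.01

def Qt_f(x, f, n_max=60):
    # truncated series for (Q_t f)(x); the tail beyond n_max is negligible here
    return sum(math.exp(-lam * t) * (lam * t) ** n / math.factorial(n) * f(x + n)
               for n in range(n_max))

f = lambda x: 1.0 / (1.0 + x)                         # bounded, sup norm 1
gap = max(abs(Qt_f(x, f) - f(x)) for x in range(200))  # approx ||Q_t f - f||_u
bound = 2 * (1 - math.exp(-lam * t))                   # the collapsed estimate
print(gap, bound)
```

The printed gap sits well below the bound, and both shrink to $0$ with $t$.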
Example 33.9 (Bounded Rate Markov Chains). We now re-examine Example 29.11. Recall that $S$ is a countable set and $a:S\times S\to\mathbb{R}$ is a function such that $a(x,y)\ge0$ for all $x\ne y$, and there exists $\lambda<\infty$ such that
$$a_x:=\sum_{y\ne x}a(x,y)\le\lambda\text{ for all }x\in S.$$
We then define $A$ acting on bounded functions by
$$Af(x):=\sum_{y\ne x}a(x,y)\left[f(y)-f(x)\right]\text{ for all }x\in S$$
and $Q_t=e^{tA}$, so that $Q_tf=\sum_{n=0}^\infty\frac{t^n}{n!}A^nf$ where
$$A^nf(x)=\sum_{y_1,\dots,y_n\in S}a(x,y_1)\,a(y_1,y_2)\dots a(y_{n-1},y_n)\,f(y_n)$$
with $a(x,x):=-\sum_{y\ne x}a(x,y)$ in these formulas. It was shown in Corollary 17.27 that $\{Q_t\}_{t\ge0}$ defines a conservative Markov semi-group and that there is a rcll Markov process associated to this semi-group. However in this case it is not actually true that $Q_t$ is necessarily Feller. For example if we take $S=\mathbb{Z}$ and define $a(x,y)=1_{x\ne0}1_{y=0}$ then $\sum_{y\ne x}a(x,y)=1$ for $x\ne0$ and $0$ if $x=0$. Thus we extend $a$ via
$$a(x,y)=1_{x\ne0}\left(1_{y=0}-1_{x=y}\right).$$
Taking $f=1_{\{0\}}\in C_0(S)$ we find
$$(Af)(x):=\sum_{y\in S}a(x,y)f(y)=a(x,0)=1_{x\ne0}\left(1-1_{x=0}\right)=1_{x\ne0}.$$
So if $e^{tA}f\in C_0(S)$ then $e^{t(A+I)}f=e^te^{tA}f\in C_0(S)$, and since $A+I$ has positive matrix entries this would imply that $(A+I)f\in C_0(S)$ for $f\ge0$. But this is not the case for $f=1_{\{0\}}$. Nevertheless we still have a nice Markov process associated to these Markov kernels.

It is worth observing that $e^{tA}C(S)\subset C(S)$, as $C(S)$ is a Banach space and the series for $e^{tA}f$ is convergent in the uniform norm topology. In fact we have
$$|Af(x)|\le\sum_{y\in S}|a(x,y)|\,|f(y)|\le2\lambda\,\|f\|_u$$
so that $\|Af\|_u\le2\lambda\|f\|_u$. Therefore $A$ is bounded in the operator norm and therefore the series for $e^{tA}$ is convergent in $\operatorname{End}(C(S))$ with $e^{tA}\to I$ as $t\downarrow0$ in the operator norm.
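On a finite state space the series $Q_t=e^{tA}$ can be summed directly, and the claim that it is a conservative Markov semi-group becomes a concrete check. The sketch below uses a hypothetical 3-state rate matrix (any off-diagonal entries $a(x,y)\ge0$ with $a(x,x)=-\sum_{y\ne x}a(x,y)$ would do):

```python
import numpy as np

# Hypothetical 3-state bounded-rate generator: off-diagonal entries are the
# jump rates a(x, y) >= 0 and each diagonal entry is minus its row's rates.
A = np.array([[-1.0, 0.4, 0.6],
              [0.2, -0.5, 0.3],
              [0.0, 1.0, -1.0]])

def expm_series(M, n_terms=60):
    """e^M via its absolutely convergent power series (fine for bounded A)."""
    result, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for n in range(1, n_terms):
        term = term @ M / n
        result = result + term
    return result

Qt = expm_series(0.7 * A)      # Q_t = e^{tA} at t = 0.7
print(Qt.sum(axis=1))          # each row sums to 1: Q_t is conservative
print(Qt.min() >= 0)           # all entries are nonnegative probabilities
```

The semigroup law can be checked the same way: `expm_series(0.3 * A) @ expm_series(0.4 * A)` reproduces `Qt` to machine precision.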
Exercise 33.1. Suppose that $a:S\times S\to[-\lambda,\lambda]$ is as described in Example 33.9. Show that the associated semi-group is Feller provided $\lim_{x\to\infty}a(x,y)=0$ for all $y\in S$, i.e. $a(\cdot,y)\in C_0(S)$ for all $y\in S$.

Solution to Exercise (33.1). Suppose that $f\in C_0(S)$ and $\varepsilon>0$ are given. Choose a finite subset $\Lambda\subset S$ such that $|f|\le\varepsilon$ on $\Lambda^c$, then
$$|Af(x)|\le\sum_{y\in\Lambda}|a(x,y)|\,|f(y)|+\varepsilon\sum_{y\notin\Lambda}|a(x,y)|\le\sum_{y\in\Lambda}|a(x,y)|\,|f(y)|+2\lambda\varepsilon.$$
Therefore
$$\limsup_{x\to\infty}|Af(x)|\le2\lambda\varepsilon\to0\text{ as }\varepsilon\downarrow0.$$
This shows that $Af\in C_0(S)$. As $C_0(S)$ is a closed subspace of $C(S)$ with the uniform norm and $e^{tA}f$ is convergent in $C(S)$ with each partial sum in $C_0(S)$, it follows that $e^{tA}f\in C_0(S)$ as well. We have already seen that $e^{tA}\to I$ in the operator norm on $C(S)$ and therefore $e^{tA}f\to f$ for all $f\in C(S)$ as $t\downarrow0$.
Example 33.10 (Brownian Motion). We now reconsider Example 29.12 so that $S=\mathbb{R}^d$ and
$$Q_t(x,dy)=\left(\frac{1}{2\pi t}\right)^{d/2}\exp\left(-\frac{1}{2t}\|y-x\|^2\right)dy.$$
In this case $\{Q_t\}_{t\ge0}$ is a Feller semi-group. Indeed let $Z\stackrel{d}{=}N(0,I)$. If $f\in C_0(\mathbb{R}^d)$ and $t>0$, then by DCT
$$\lim_{x\to\infty}Q_t(x;f)=\lim_{x\to\infty}\mathbb{E}\left[f\left(x+\sqrt t\,Z\right)\right]=\mathbb{E}\left[\lim_{x\to\infty}f\left(x+\sqrt t\,Z\right)\right]=0$$
and
$$|Q_t(x;f)-f(x)|=\left|\mathbb{E}\left[f\left(x+\sqrt t\,Z\right)-f(x)\right]\right|\le\mathbb{E}\left[\left|f\left(x+\sqrt t\,Z\right)-f(x)\right|:\|Z\|\le M\right]+\mathbb{E}\left[\left|f\left(x+\sqrt t\,Z\right)-f(x)\right|:\|Z\|>M\right].$$
Thus it follows that
$$\|Q_tf-f\|_u\le\sup_x\sup_{\|y\|\le\sqrt tM}|f(x+y)-f(x)|+2\|f\|_u\,P(\|Z\|>M).$$
Since $f$ is uniformly continuous it follows that
$$\limsup_{t\downarrow0}\|Q_tf-f\|_u\le2\|f\|_u\,P(\|Z\|>M)\to0\text{ as }M\to\infty.$$
As we have already seen in Theorem 26.3, there is a corresponding rcll Markov process (called Brownian motion) and in fact we have seen that this process may be taken to be continuous.
34 *Nelson's Continuity Criteria

This chapter is devoted to giving Nelson's convenient criteria for verifying that the sample paths of a Markov process may be chosen to be continuous. Let $(S,\rho)$ be a metric space and suppose that $\left(Q_t,X_t,\Omega,\mathcal{B}_t,\mathcal{B},P,\{P_x\}_{x\in S}\right)$ is a time homogeneous Markov process as in Definition 17.13 and Theorem 17.14. For $\varepsilon,\delta>0$ let
$$c(\varepsilon,\delta):=\sup\{Q_s(x,C(x,\varepsilon)^c):0\le s\le\delta\text{ and }x\in S\}\qquad(34.1)$$
$$=\sup_{0\le s\le\delta,\,x\in S}P_x(\rho(x,X_s)>\varepsilon)\qquad(34.2)$$
where
$$C(x,\varepsilon):=\{y:\rho(y,x)\le\varepsilon\}$$
is the closed ball in $S$ which is centered at $x$. We are going to give a version of Nelson's continuity criteria, see [41, Theorem 2] and [42, p. 339-340]. Roughly speaking, Theorem 34.4 below states that if the probability of jumps of any fixed size over small time intervals is sufficiently small, then the process wants to have continuous sample paths. More precisely we will prove that if $c(\varepsilon,\delta)=o(\delta)$ for all $\varepsilon>0$ then $X_t$ has a continuous modification. We begin with the following preliminary result.
Lemma 34.1. For all $s,t\in\mathbb{R}_+$ and $\varepsilon>0$ we have
$$P(\rho(X_t,X_s)>\varepsilon)\le c(\varepsilon,|t-s|).\qquad(34.3)$$
Hence if $c(\varepsilon,\delta)\to0$ as $\delta\downarrow0$, then $X_s\stackrel{P}{\to}X_t$ as $s\to t$.

Proof. Let $\mu:=\operatorname{Law}_P(X_s)$ and suppose that $0\le s<t$ for definiteness. We then have,
$$P(\rho(X_t,X_s)>\varepsilon)=\int_Sd\mu(x)\,Q_{t-s}\left(x,1_{\rho(x,\cdot)>\varepsilon}\right)=\int_Sd\mu(x)\,Q_{t-s}(x,C(x,\varepsilon)^c)\le\int_Sd\mu(x)\,c(\varepsilon,|t-s|)=c(\varepsilon,|t-s|).$$
We can in fact greatly improve on this estimate as we see in the next theorem.

Theorem 34.2. For all countable subsets $\Lambda\subset\mathbb{R}_+$ with $\sup\Lambda-\inf\Lambda\le\delta$, we have
$$P\left(\sup_{s,t\in\Lambda}\rho(X_s,X_t)>4\varepsilon\right)\le2c(\varepsilon,\delta)\quad\text{and}\qquad(34.4)$$
$$P\left(\sup_{s,t\in\Lambda}\rho(X_s,X_t)>4\varepsilon\right)\le\frac{c(\varepsilon,\delta)}{1-c(\varepsilon,\delta)}\qquad(34.5)$$
where $P$ is any measure on $(\Omega,\mathcal{B})$ such that $\{X_t\}_{t\ge0}$ is a Markov process with transition kernels, $\{Q_t\}_{t\ge0}$. (We will refer to the first inequality as Nelson's inequality. The second inequality is a variant of Skorohod's inequality in Theorem 20.45.)

Proof. We may without loss of generality assume that $\Lambda$ is a finite set. To see this let $\{\Lambda_n\}_{n=1}^\infty$ be a sequence of finite sets such that $\Lambda_n\uparrow\Lambda$ as $n\to\infty$, then
$$\left\{\max_{s,t\in\Lambda_n}\rho(X_s,X_t)>2\varepsilon\right\}\uparrow\left\{\sup_{s,t\in\Lambda}\rho(X_s,X_t)>2\varepsilon\right\}$$
and so the estimates for finite $\Lambda$ will give the estimates for countable $\Lambda$. So we now let $\Lambda=\{t_1<t_2<\dots<t_n\}\subset\mathbb{R}_+$ with $t_n-t_1\le\delta$ and to simplify notation let $Y_j:=X_{t_j}$ for all $1\le j\le n$ and
$$\sigma=\min\{k\in\{1,\dots,n\}:\rho(Y_k,Y_1)>2\varepsilon\}$$
where $\min\emptyset:=\infty$. Notice that
$$\{\sigma\le n\}=\left\{\max_{t\in\Lambda}\rho(X_{t_1},X_t)>2\varepsilon\right\}.$$
Let
$$B:=\{\rho(X_{t_n},X_{t_1})=\rho(Y_n,Y_1)>\varepsilon\}.$$
For $1\le k\le n$ on the event $\{\sigma=k\}\setminus B$ we have
$$2\varepsilon<\rho(Y_k,Y_1)\le\rho(Y_k,Y_n)+\rho(Y_n,Y_1)<\rho(Y_k,Y_n)+\varepsilon$$
and so $\{\sigma=k\}\setminus B\subset\{\rho(Y_k,Y_n)>\varepsilon\}$. So we have shown,
$$\{\sigma=k\}\setminus B\subset\{\sigma=k\ \&\ \rho(Y_k,Y_n)>\varepsilon\}$$
which implies that
$$\{\sigma\le n\}\subset B\cup\bigcup_{k=1}^n\{\sigma=k\ \&\ \rho(Y_k,Y_n)>\varepsilon\}.$$
This leads to the estimate;
$$\begin{aligned}P(\sigma\le n)&\le P(B)+\sum_{k=1}^nP(\sigma=k\ \&\ \rho(Y_k,Y_n)>\varepsilon)\\&=P(\rho(Y_n,Y_1)>\varepsilon)+\sum_{k=1}^nP(\sigma=k\ \&\ \rho(Y_k,Y_n)>\varepsilon)\\&\le c(\varepsilon,|t_n-t_1|)+\sum_{k=1}^nP(\sigma=k\ \&\ \rho(Y_k,Y_n)>\varepsilon),\qquad(34.6)\end{aligned}$$
wherein we used Lemma 34.1 in the last inequality.

Since $\{\sigma=k\}$ is $\mathcal{B}_{t_k}$-measurable we may use the Markov property in Theorem 17.14 to find;
$$\begin{aligned}P(\sigma=k\ \&\ \rho(Y_k,Y_n)>\varepsilon)&=\mathbb{E}\left[1_{\sigma=k}P_{Y_k}\left[\rho(X_0,X_{t_n-t_k})>\varepsilon\right]\right]\\&=\mathbb{E}\left[1_{\sigma=k}Q_{t_n-t_k}\left(Y_k,\{y\in S:\rho(Y_k,y)>\varepsilon\}\right)\right]\\&\le\mathbb{E}\left[1_{\sigma=k}\,c(\varepsilon,\delta)\right]=c(\varepsilon,\delta)\,P(\sigma=k).\end{aligned}$$
Using this estimate back in Eq. (34.6) gives the estimate
$$P\left(\max_{t\in\Lambda}\rho(X_{t_1},X_t)>2\varepsilon\right)=P(\sigma\le n)\le2c(\varepsilon,\delta).$$
This inequality along with the observation;
$$\max_{s,t\in\Lambda}\rho(X_t,X_s)\le\max_{s,t\in\Lambda}\left[\rho(X_t,X_{t_1})+\rho(X_{t_1},X_s)\right]\le2\max_{t\in\Lambda}\rho(X_t,X_{t_1})\qquad(34.7)$$
proves Eq. (34.4).

The proof of Eq. (34.5) is similar. Working on the event $\{\sigma\le n\}\cap\{\rho(Y_\sigma,Y_n)\le\varepsilon\}$ we have
$$2\varepsilon<\rho(Y_\sigma,Y_1)\le\rho(Y_\sigma,Y_n)+\rho(Y_n,Y_1)\le\varepsilon+\rho(Y_n,Y_1)$$
and so we have shown
$$\{\sigma\le n\ \&\ \rho(Y_\sigma,Y_n)\le\varepsilon\}\subset\{\rho(Y_n,Y_1)>\varepsilon\}.$$
Hence it follows (again using the Markov property in Theorem 17.14) that
$$\begin{aligned}P(\rho(Y_n,Y_1)>\varepsilon)&\ge P(\sigma\le n\ \&\ \rho(Y_\sigma,Y_n)\le\varepsilon)=\sum_{k=1}^nP(\sigma=k\ \&\ \rho(Y_k,Y_n)\le\varepsilon)\\&=\sum_{k=1}^n\mathbb{E}\left[1_{\sigma=k}P_{Y_k}\left(\rho(X_0,X_{t_n-t_k})\le\varepsilon\right)\right]=\sum_{k=1}^n\mathbb{E}\left[1_{\sigma=k}Q_{t_n-t_k}(Y_k,C(Y_k,\varepsilon))\right]\\&\ge\sum_{k=1}^n\mathbb{E}\left[1_{\sigma=k}\inf_{x\in S}Q_{t_n-t_k}(x,C(x,\varepsilon))\right]\ge\inf_{x\in S}\inf_{0\le s\le\delta}Q_s(x,C(x,\varepsilon))\sum_{k=1}^nP(\sigma=k)\\&=\inf_{x\in S}\inf_{0\le s\le\delta}Q_s(x,C(x,\varepsilon))\,P(\sigma\le n).\end{aligned}$$
Thus we have shown
$$\begin{aligned}P\left(\max_{t\in\Lambda}\rho(X_{t_1},X_t)>2\varepsilon\right)&\le\frac{1}{\inf_{x\in S}\inf_{0\le s\le\delta}Q_s(x,C(x,\varepsilon))}\,P(\rho(Y_n,Y_1)>\varepsilon)\\&=\frac{1}{1-\sup_{x\in S}\sup_{0\le s\le\delta}Q_s(x,\{y:\rho(y,x)>\varepsilon\})}\,P(\rho(X_{t_n},X_{t_1})>\varepsilon)\\&=\frac{1}{1-c(\varepsilon,\delta)}\,P(\rho(X_{t_n},X_{t_1})>\varepsilon)\le\frac{c(\varepsilon,\delta)}{1-c(\varepsilon,\delta)}\end{aligned}$$
wherein we have used Lemma 34.1 in the last inequality. Equation (34.5) now follows from this inequality and Eq. (34.7).
Corollary 34.3. Continuing the notation used above and further assuming that $1/\delta\in\mathbb{N}$, we have for all $N\in\mathbb{N}$;
$$P\left(\sup\{\rho(X_s,X_t):s,t\in[0,N)\ \&\ |t-s|\le\delta\}\ge4\varepsilon\right)\le2N\,\frac{c(\varepsilon,2\delta)}{\delta}.$$

Proof. Let $J_k:=[(k-1)\delta,k\delta)$ so that $[0,N)=\bigcup_{k=1}^{N/\delta}J_k$. If $s,t\in[0,N)$ with $s<t\le s+\delta$, then $s\in J_k$ for some unique $k$ and consequently $t\in J_k\cup J_{k+1}$. From this observation it follows that
$$\sup\{\rho(X_s,X_t):s,t\in[0,N)\ \&\ |t-s|\le\delta\}=\max_{1\le k\le N/\delta}\sup\{\rho(X_s,X_t):s,t\in[J_k\cup J_{k+1}]\}$$
and therefore,
$$P\left(\sup\{\rho(X_s,X_t):s,t\in[0,N)\ \&\ |t-s|\le\delta\}\ge4\varepsilon\right)\le\sum_{k=1}^{N/\delta}P\left(\sup\{\rho(X_s,X_t):s,t\in[J_k\cup J_{k+1}]\}\ge4\varepsilon\right)\le\sum_{k=1}^{N/\delta}2c(\varepsilon,2\delta)=2N\,\frac{c(\varepsilon,2\delta)}{\delta}.$$
Theorem 34.4 (Nelson's continuity theorem). Suppose now that $(S,\rho)$ is a complete metric space and $(Q_t,X_t,\Omega,\mathcal{B}_t,\mathcal{B},P)$ are as in Theorem 17.14 and suppose for each $\varepsilon>0$ that $c(\varepsilon,\delta)=o(\delta)$ (i.e. $\lim_{\delta\downarrow0}[c(\varepsilon,\delta)/\delta]=0$) where $c(\varepsilon,\delta)$ is defined in Eq. (34.1). Then $\{X_t\}_{t\ge0}$ admits a continuous version.

Proof. Let $\Omega_0$ denote those $\omega\in\Omega$ such that $\mathbb{Q}_+\ni t\to X_t(\omega)$ is uniformly continuous on bounded subsets. Since $\omega\notin\Omega_0$ iff there exist $N\in\mathbb{N}$ and an $\varepsilon=1/k>0$ such that for all $\delta=1/n$ we will have $\sup\{\rho(X_s,X_t):s,t\in\mathbb{Q}_+\cap[0,N)\ \&\ |t-s|\le1/n\}\ge4/k$, it follows that
$$\Omega_0^c=\bigcup_{N\in\mathbb{N}}\bigcup_{k\in\mathbb{N}}\bigcap_{n\in\mathbb{N}}E_{N,k,n}$$
where
$$E_{N,k,n}:=\left\{\sup\{\rho(X_s,X_t):s,t\in\mathbb{Q}_+\cap[0,N)\ \&\ |t-s|\le1/n\}\ge4/k\right\}.$$
However, making use of the continuity properties of $P$ and Corollary 34.3 we learn that
$$P(\Omega_0^c)=\lim_{N\to\infty}\lim_{k\to\infty}\lim_{n\to\infty}P(E_{N,k,n})=0.$$
Thus for $\omega\in\Omega_0$ (a set of full measure), $\mathbb{Q}_+\ni t\to X_t(\omega)$ is uniformly continuous on bounded subsets and therefore extends uniquely to a continuous function, $\tilde X_t(\omega)$ for all $t\in\mathbb{R}_+$. (This is where we use the completeness of $(S,\rho)$.) For $\omega\in\Omega_0^c$ let us define $\tilde X_t(\omega)=s_0$ for some fixed point $s_0\in S$. To finish the proof it suffices to show that $\tilde X_t=X_t$ a.s. for every $t\in\mathbb{R}_+$. But this follows from the facts; 1) $X_s=\tilde X_s$ a.s. when $s\in\mathbb{Q}_+$, 2) $X_s\stackrel{P}{\to}X_t$ as $s\to t$ by Lemma 34.1, and 3) $\tilde X_t=\lim_{\mathbb{Q}\ni s\to t}X_s$ a.s., and therefore $X_t=\tilde X_t$ a.s.
As an example we give yet another proof for the existence of Brownian motion.

Theorem 34.5 (Wiener). Associated to the homogeneous Markov semi-group, $\{Q_t\}_{t\ge0}$ on $\mathbb{R}$ defined by
$$Q_t(x,dy):=\frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{1}{2t}(y-x)^2\right)dy,$$
is a Markov process, $\{B_t\}_{t\ge0}$, with continuous sample paths. (Of course when we start the process at $0$, $\{B_t\}_{t\ge0}$ is a standard Brownian motion as in Definition 17.21.)

Proof. In Exercise 17.8 you showed that $Q_t$ is a time homogeneous Markov kernel and therefore by Theorem 17.11 there exists a corresponding Markov process. Recalling from Exercise 17.8 that
$$Q_t(x,f)=\mathbb{E}\left[f\left(x+\sqrt t\,Z\right)\right]$$
where $Z$ is a standard normal random variable, we find that
$$Q_t\left(x,[x-\varepsilon,x+\varepsilon]^c\right)=P\left(x+\sqrt t\,Z\in[x-\varepsilon,x+\varepsilon]^c\right)=P\left(|Z|>\varepsilon/\sqrt t\right)$$
and therefore,
$$c(\varepsilon,\delta)=\sup_{0\le t\le\delta}P\left(|Z|>\varepsilon/\sqrt t\right)=P\left(|Z|>\varepsilon/\sqrt\delta\right).$$
By Chebyshev's inequality we have for all $p>0$ that $c(\varepsilon,\delta)\le\delta^{p/2}\varepsilon^{-p}\,\mathbb{E}|Z|^p$, which is clearly $o(\delta)$ when $p>2$. Alternatively we may use the Gaussian tail estimates in Lemma 7.59 in order to conclude that
$$c(\varepsilon,\delta)\le\sqrt{\frac{2}{\pi}}\,\frac{\sqrt\delta}{\varepsilon}\,e^{-\varepsilon^2/2\delta}=o(\delta)\text{ as }\delta\downarrow0.$$
The result now follows from Theorem 34.4.
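The rate at which $c(\varepsilon,\delta)/\delta\to0$ in the Gaussian case is easy to see numerically, since $c(\varepsilon,\delta)=P(|Z|>\varepsilon/\sqrt\delta)=\operatorname{erfc}(\varepsilon/\sqrt{2\delta})$ for a standard normal $Z$ (the $\varepsilon$ and $\delta$ values below are illustrative):

```python
import math

def c(eps, delta):
    # P(|Z| > a) = erfc(a / sqrt(2)) with a = eps / sqrt(delta)
    return math.erfc(eps / math.sqrt(2 * delta))

eps = 0.1
ratios = [c(eps, delta) / delta for delta in (1e-2, 1e-3, 1e-4)]
print(ratios)
```

The printed ratios drop off super-polynomially, reflecting the $e^{-\varepsilon^2/2\delta}$ factor in the tail estimate.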
Part VII
Continuous Time Markov Chains

This part of the document needs significant editing! Please read at your own risk. Strictly speaking the material here might logically come before Brownian motion. But as this section still needs so much work, I thought it was best to isolate it by itself for now.
35 Basics of continuous time chains

(This chapter needs serious editing! It was originally written for Math 180 and as such is not properly integrated into the material above. For example there is a lot of redundancy with Chapters 17 and 29. Moreover the notation is not consistent with those chapters.)

In this chapter we are going to begin our study of continuous time homogeneous Markov chains on discrete state spaces $S$. In more detail we will assume that $\{X_t\}_{t\ge0}$ is a stochastic process whose sample paths are right continuous, see Figures 35.1 and 35.2. (These processes need not have left hand limits if there are an infinite number of jumps in a finite time interval. For the most part we will assume that this does not happen almost surely.) Recall from Theorem 17.26 and Corollary 17.27 above that we have seen such Markov chains exist.

Fig. 35.1. Typical sample paths of a continuous time Markov chain in a discrete state space.

Fig. 35.2. A sample path of a birth process. Here the state space is $\{0,1,2,\dots\}$, to be thought of as the possible population size.

As in the discrete time Markov chain setting, to each $i\in S$, we will write $P_i(A):=P(A|X_0=i)$. That is, $P_i$ is the probability associated to the scenario where the chain is forced to start at site $i$. We now define, for $i,j\in S$,
$$P_{ij}(t):=P_i(X(t)=j)\qquad(35.1)$$
which is the probability of finding the chain at time $t$ at site $j$ given the chain starts at $i$.
Definition 35.1. The time homogeneous Markov property states for every $0\le s<t<\infty$ and any choices of $0=t_0<t_1<\dots<t_n=s<t$ and $i_1,\dots,i_n\in S$ that
$$P_i(X(t)=j|X(t_1)=i_1,\dots,X(t_n)=i_n)=P_{i_n,j}(t-s),\qquad(35.2)$$
and consequently,
$$P_i(X(t)=j|X(s)=i_n)=P_{i_n,j}(t-s).\qquad(35.3)$$

Roughly speaking, the Markov property may be stated as follows; the probability that $X(t)=j$ given knowledge of the process up to time $s$ is $P_{X(s),j}(t-s)$. In symbols we might express this last sentence as
$$P_i\left(X(t)=j\,\big|\,\{X(\tau)\}_{\tau\le s}\right)=P_i(X(t)=j|X(s))=P_{X(s),j}(t-s).$$
So again a continuous time Markov process is forgetful in the sense that what the chain does for $t\ge s$ depends only on where the chain is located, $X(s)$, at time $s$ and not how it got there. See Fact 35.3 below for a more general statement of this property.

Definition 35.2 (Informal). A stopping time, $T$, for $\{X(t)\}$, is a random variable with the property that the event $\{T\le t\}$ is determined from the knowledge of $\{X(s):0\le s\le t\}$. Alternatively put, for each $t\ge0$, there is a functional, $f_t$, such that
$$1_{T\le t}=f_t(X(s):0\le s\le t).$$

As in the discrete state space setting, the first time the chain hits some subset of states, $A\subset S$, is a typical example of a stopping time, whereas the last time the chain hits a set $A\subset S$ is typically not a stopping time. Similar to the discrete time setting, the Markov property leads to a strong form of forgetfulness of the chain. This property is again called the strong Markov property which we take for granted here.

Fact 35.3 (Strong Markov Property) If $\{X(t)\}_{t\ge0}$ is a Markov chain, $T$ is a stopping time, and $j\in S$, then, conditioned on $\{T<\infty\}$ and $\{X_T=j\}$,
$$\{X(s):0\le s\le T\}\text{ and }\{X(t+T):t\ge0\}\text{ are independent}$$
and $\{X(t+T):t\ge0\}$ has the same distribution as $\{X(t)\}_{t\ge0}$ under $P_j$. (See Theorem 35.19 and Corollary 35.20.)

We will use the above fact later in our discussions. For the moment, let us go back to more elementary considerations.
Theorem 35.4 (Finite dimensional distributions). Let $0<t_1<t_2<\dots<t_n$ and $i_0,i_1,i_2,\dots,i_n\in S$. Then
$$P_{i_0}(X_{t_1}=i_1,X_{t_2}=i_2,\dots,X_{t_n}=i_n)=P_{i_0,i_1}(t_1)\,P_{i_1,i_2}(t_2-t_1)\dots P_{i_{n-1},i_n}(t_n-t_{n-1}).\qquad(35.4)$$

Proof. The proof is similar to that of Proposition ??. For notational simplicity let us suppose that $n=3$. We then have
$$\begin{aligned}P_{i_0}(X_{t_1}=i_1,X_{t_2}=i_2,X_{t_3}=i_3)&=P_{i_0}(X_{t_3}=i_3|X_{t_1}=i_1,X_{t_2}=i_2)\,P_{i_0}(X_{t_1}=i_1,X_{t_2}=i_2)\\&=P_{i_2,i_3}(t_3-t_2)\,P_{i_0}(X_{t_1}=i_1,X_{t_2}=i_2)\\&=P_{i_2,i_3}(t_3-t_2)\,P_{i_0}(X_{t_2}=i_2|X_{t_1}=i_1)\,P_{i_0}(X_{t_1}=i_1)\\&=P_{i_2,i_3}(t_3-t_2)\,P_{i_1,i_2}(t_2-t_1)\,P_{i_0,i_1}(t_1)\end{aligned}$$
wherein we have used the Markov property once in line 2 and twice in line 4.
Proposition 35.5 (Properties of P). Let $P_{ij}(t):=P_i(X(t)=j)$ be as above. Then:

1. For each $t\ge0$, $P(t)$ is a Markov matrix, i.e.
$$\sum_{j\in S}P_{ij}(t)=1\text{ for all }i\in S\text{ and }P_{ij}(t)\ge0\text{ for all }i,j\in S.$$
2. $\lim_{t\downarrow0}P_{ij}(t)=\delta_{ij}$ for all $i,j\in S$.
3. The Chapman-Kolmogorov equation holds:
$$P(t+s)=P(t)P(s)\text{ for all }s,t\ge0,\qquad(35.5)$$
i.e.
$$P_{ij}(t+s)=\sum_{k\in S}P_{ik}(s)\,P_{kj}(t)\text{ for all }s,t\ge0.\qquad(35.6)$$

We will call a matrix function $\{P(t)\}_{t\ge0}$ satisfying items 1.-3. a continuous time Markov semigroup.

Proof. Most of the assertions follow from the basic properties of conditional probabilities. The assumed right continuity of $X_t$ implies that $\lim_{t\downarrow0}P(t)=P(0)=I$. From Equation (35.4) with $n=2$ we learn that
$$P_{i_0,i_2}(t_2)=\sum_{i_1\in S}P_{i_0}(X_{t_1}=i_1,X_{t_2}=i_2)=\sum_{i_1\in S}P_{i_0,i_1}(t_1)\,P_{i_1,i_2}(t_2-t_1)=\left[P(t_1)P(t_2-t_1)\right]_{i_0,i_2}.$$

At this point it is not so clear how to find a non-trivial (i.e. $P(t)\ne I$ for all $t$) example of a continuous time Markov semi-group. It turns out the Poisson process provides such an example.
Example 35.6. In this example we will take S = 0, 1, 2, . . . and then dene,
for > 0,
P (t) = e
t
0 1 2 3 4 5 6 . . .
_

_
1 t
(t)
2
2!
(t)
3
3!
(t)
4
4!
(t)
5
5!
. . .
0 1 t
(t)
2
2!
(t)
3
3!
(t)
4
4!
. . .
0 0 1 t
(t)
2
2!
(t)
3
3!
. . .
0 0 0 1 t
(t)
2
2!
. . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
_

_
0
1
2
3
.
.
.
.
In components this may be expressed as,
P
ij
(t) = e
t
(t)
ji
(j i)!
1
ij
with the convention that 0! = 1. (See Exercise ?? to see where this example is
coming from.)
If $i, j \in S$, then $P_{ik}(t) P_{kj}(s)$ will be zero unless $i \leq k \leq j$, therefore we have
\[
\sum_{k \in S} P_{ik}(t) P_{kj}(s) = 1_{i \leq j} \sum_{i \leq k \leq j} P_{ik}(t) P_{kj}(s)
= 1_{i \leq j} \, e^{-\lambda(t+s)} \sum_{i \leq k \leq j} \frac{(\lambda t)^{k-i}}{(k-i)!} \frac{(\lambda s)^{j-k}}{(j-k)!}. \tag{35.7}
\]
Let $k = i + m$ with $0 \leq m \leq j - i$, then the above sum may be written as
\[
\sum_{m=0}^{j-i} \frac{(\lambda t)^m}{m!} \frac{(\lambda s)^{j-i-m}}{(j-i-m)!}
= \frac{1}{(j-i)!} \sum_{m=0}^{j-i} \binom{j-i}{m} (\lambda t)^m (\lambda s)^{j-i-m}
\]
and hence by the binomial formula we find,
\[
\sum_{i \leq k \leq j} \frac{(\lambda t)^{k-i}}{(k-i)!} \frac{(\lambda s)^{j-k}}{(j-k)!} = \frac{1}{(j-i)!} (\lambda(t+s))^{j-i}.
\]
Combining this with Eq. (35.7) shows that
\[
\sum_{k \in S} P_{ik}(t) P_{kj}(s) = P_{ij}(s+t).
\]
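The Chapman-Kolmogorov identity just verified can also be checked numerically. The sketch below (an illustration, not part of the notes) builds a truncated version of $P_{ij}(t) = e^{-\lambda t} (\lambda t)^{j-i}/(j-i)! \, 1_{i \leq j}$ on the states $\{0, \ldots, N-1\}$; because the matrix is upper triangular, the entry $(i, j)$ of a product only involves the states $i \leq k \leq j$, so the truncation introduces no error in the entries we test.

```python
import math

import numpy as np

def poisson_P(t, lam, N):
    """Truncated Poisson semigroup: P[i, j] = exp(-lam*t) (lam*t)^(j-i)/(j-i)! for i <= j."""
    P = np.zeros((N, N))
    for i in range(N):
        for j in range(i, N):
            P[i, j] = math.exp(-lam * t) * (lam * t) ** (j - i) / math.factorial(j - i)
    return P

lam, N = 1.5, 30
Ps, Pt = poisson_P(0.3, lam, N), poisson_P(0.7, lam, N)
Pst = poisson_P(1.0, lam, N)
# Upper-triangularity makes the truncation exact for the upper-left block:
assert np.allclose((Ps @ Pt)[:10, :10], Pst[:10, :10])
```

The same check fails for a generic (non-triangular) truncation, which is why the block being compared is kept well inside the truncated state space.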
Proposition 35.7. Let $\{X_t\}_{t \geq 0}$ be the Markov chain determined by $P(t)$ of Example 35.6. Then relative to $P_0$, $\{X_t\}_{t \geq 0}$ is precisely the Poisson process on $[0, \infty)$ with intensity $\lambda$.
Proof. Let $0 \leq s < t$. Since $P_0(X_t = n \mid X_s = k) = P_{kn}(t-s) = 0$ if $n < k$, $\{X_t\}_{t \geq 0}$ is a non-decreasing integer valued process. Suppose that $0 = s_0 < s_1 < s_2 < \cdots < s_n = s$ and $i_k \in S$ for $k = 0, 1, 2, \ldots, n$, then
\begin{align*}
P_0\left( X_t - X_s = i_0 \mid X_{s_j} = i_j \text{ for } 1 \leq j \leq n \right)
&= P_0\left( X_t = i_n + i_0 \mid X_{s_j} = i_j \text{ for } 1 \leq j \leq n \right) \\
&= P_0\left( X_t = i_n + i_0 \mid X_{s_n} = i_n \right)
= e^{-\lambda(t-s)} \frac{(\lambda(t-s))^{i_0}}{i_0!}.
\end{align*}
Since this answer is independent of $i_1, \ldots, i_n$ we also have
\begin{align*}
P_0(X_t - X_s = i_0)
&= \sum_{i_1, \ldots, i_n \in S} P_0\left( X_t - X_s = i_0 \mid X_{s_j} = i_j \text{ for } 1 \leq j \leq n \right) P_0\left( X_{s_j} = i_j \text{ for } 1 \leq j \leq n \right) \\
&= \sum_{i_1, \ldots, i_n \in S} e^{-\lambda(t-s)} \frac{(\lambda(t-s))^{i_0}}{i_0!} \, P_0\left( X_{s_j} = i_j \text{ for } 1 \leq j \leq n \right)
= e^{-\lambda(t-s)} \frac{(\lambda(t-s))^{i_0}}{i_0!}.
\end{align*}
Thus we may conclude that $X_t - X_s$ is a Poisson random variable with intensity $\lambda(t-s)$ which is independent of $\{X_r\}_{r \leq s}$. That is, $\{X_t\}_{t \geq 0}$ is a Poisson process with rate $\lambda$.
The next example is a generalization of the Poisson process example above. You will be asked to work this example out on a future homework set.
Example 35.8. In problem VI.6.P1 on p. 406, you will be asked to consider a discrete time Markov matrix, $\pi_{ij}$, on some discrete state space, $S$, with associated Markov chain $\{Y_n\}$. It is claimed in this problem that if $\{N(t)\}_{t \geq 0}$ is a Poisson process which is independent of $\{Y_n\}$, then $X_t := Y_{N(t)}$ is a continuous time Markov chain. More precisely the claim is that Eq. (35.2) holds with
\[
P(t) = e^{-\lambda t} \sum_{m=0}^{\infty} \frac{(\lambda t)^m}{m!} \pi^m =: e^{\lambda t (\pi - I)},
\]
i.e.
\[
P_{ij}(t) = e^{-\lambda t} \sum_{m=0}^{\infty} \frac{(\lambda t)^m}{m!} (\pi^m)_{ij}.
\]
(We will see a little later, that this example can be used to construct all finite state continuous time Markov chains.)
Notice that in each of these examples, $P(t) = I + Qt + O(t^2)$ for some matrix $Q$. In the first example,
\[
Q_{ij} = \lambda \left( \delta_{i+1, j} - \delta_{ij} \right),
\]
while in the second example, $Q = \lambda(\pi - I)$.
For a general Markov semigroup, $P(t)$, we are going to show (at least when $\#(S) < \infty$) that $P(t) = I + Qt + O(t^2)$ for some matrix $Q$ which we call the infinitesimal generator (or Markov generator) of $P$. We will see that every infinitesimal generator must satisfy:
\[
Q_{ij} \geq 0 \text{ for all } i \neq j, \text{ and} \tag{35.8}
\]
\[
\sum_j Q_{ij} = 0, \text{ i.e. } Q_{ii} = -\sum_{j \neq i} Q_{ij} \text{ for all } i. \tag{35.9}
\]
Moreover, to any such $Q$, the matrix
\[
P(t) = e^{tQ} := \sum_{n=0}^{\infty} \frac{t^n}{n!} Q^n = I + tQ + \frac{t^2}{2!} Q^2 + \frac{t^3}{3!} Q^3 + \cdots
\]
will be a Markov semigroup.
One useful way to understand what is going on here is to choose an initial distribution, $\mu$ on $S$, and then define $\mu(t) := \mu P(t)$. We are going to interpret $\mu_j$ as the amount of sand we have placed at each of the sites, $j \in S$. We are going to interpret $\mu_j(t)$ as the mass at site $j$ at a later time $t$ under the assumption that $\mu$ satisfies $\dot{\mu}(t) = \mu(t) Q$, i.e.
\[
\dot{\mu}_j(t) = \sum_{i \neq j} \mu_i(t) Q_{ij} - q_j \mu_j(t), \tag{35.10}
\]
where $q_j = -Q_{jj}$. (See Example 36.19 below.) Here is how to interpret each term in this equation:

$\dot{\mu}_j(t)$ = rate of change of the amount of sand at $j$ at time $t$,
$\mu_i(t) Q_{ij}$ = rate at which sand is shoveled from site $i$ to $j$,
$q_j \mu_j(t)$ = rate at which sand is shoveled out of site $j$ to all other sites.

With this interpretation Eq. (35.10) has the clear meaning: namely the rate of change of the mass of sand at $j$ at time $t$ should be equal to the rate at which sand is shoveled into site $j$ from all other sites minus the rate at which sand is shoveled out of site $j$. With this interpretation, the condition,
\[
Q_{jj} := -q_j = -\sum_{k \neq j} Q_{j,k},
\]
just states that the total sand in the system should be conserved, i.e. this guarantees the rate of sand leaving $j$ should equal the total rate of sand being sent to all of the other sites from $j$.

Warning: the book denotes $Q$ by $A$ but then denotes the entries of $A$ by $q_{ij}$. I have just decided to write $A = Q$ and identify $Q_{ij}$ and $q_{ij}$. To avoid some technical details, in the next chapter we are mostly going to restrict ourselves to the case where $\#(S) < \infty$. Later we will consider examples in more detail where $\#(S) = \infty$.
35.1 Construction of continuous time Markov processes
(Also see Section 37.3.)
Proposition 35.9 (Measure Theoretic). Let $\Omega$ be a set, $\{\Omega_n\}$ be a finite or countable partition of $\Omega$, and for each $n$ suppose that $\mathcal{B}_n$ is a $\sigma$-algebra on $\Omega_n$. Then
\[
\mathcal{B} := \left\{ \bigcup_n A_n : A_n \in \mathcal{B}_n \right\}
\]
is a $\sigma$-algebra on $\Omega$. Moreover, $f : \Omega \to \mathbb{R}$ is $\mathcal{B}$ measurable iff $f|_{\Omega_n} : \Omega_n \to \mathbb{R}$ is $\mathcal{B}_n$ measurable for all $n$. We will write $\oplus_n \mathcal{B}_n$ for $\mathcal{B}$ in this case.
Proof. It is clear that $\Omega \in \mathcal{B}$ and that $\mathcal{B}$ is closed under countable and finite unions. Thus it only is necessary to observe that $\mathcal{B}$ is closed under complementation which is a consequence of the following identity;
\[
\Omega \setminus \left( \bigcup_n A_n \right) = \bigcup_n \left[ \Omega_n \cap \left( \bigcup_m A_m \right)^c \right] = \bigcup_n \left( \Omega_n \setminus A_n \right).
\]
For the measurability assertion regarding $f$ just observe that $\mathcal{B}|_{\Omega_n} = \mathcal{B}_n$ for all $n$.
Theorem 35.10 (Same as Theorem 17.26). Let $\{\pi_{ij}\}_{i,j \in S}$ be a discrete time Markov matrix over a discrete state space, $S$, and $\{Y_n\}_{n=0}^{\infty}$ be the corresponding Markov chain. Also let $\{N_t\}_{t \geq 0}$ be a Poisson process with rate $\lambda > 0$ which is independent of $\{Y_n\}$. Then $X_t := Y_{N_t}$ is a continuous time Markov chain with transition semigroup given by,
\[
P(t) = e^{\lambda t (\pi - I)} = e^{-\lambda t} e^{\lambda t \pi}.
\]
Proof. Let us begin by computing,
\begin{align*}
E_x f(X_t) = E_x f(Y_{N_t})
&= \sum_{n=0}^{\infty} e^{-\lambda t} \frac{(\lambda t)^n}{n!} E_x f(Y_n) \\
&= e^{-\lambda t} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} \sum_{y \in S} \pi^n(x, y) f(y) \\
&= e^{-\lambda t} \sum_{y \in S} \left( \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} \pi^n(x, y) \right) f(y) \\
&= e^{-\lambda t} \sum_{y \in S} e^{\lambda t \pi}(x, y) f(y)
= \left( e^{\lambda t (\pi - I)} f \right)(x) = P(t) f(x).
\end{align*}
Thus if we show that $\{X_t\}_{t \geq 0}$ is Markov it will follow that $P(t)$ is the desired semigroup.
For each $t \in \mathbb{R}_+$, notice that $\{N_t = n\}_{n=0}^{\infty}$ is a partition of $\Omega$ and so we may define
\[
\mathcal{B}_t := \bigoplus_{n=0}^{\infty} \sigma(N_s : s \leq t, \ Y_0, \ldots, Y_n) \big|_{\{N_t = n\}}.
\]
Observe that $\{\mathcal{B}_t\}$ is an increasing filtration. Indeed, if $s \leq t$, then $\mathcal{B}_s$ is generated by functions of the form,
\[
g := 1_{\{N_s = k\}} f(N_{\sigma_1}, \ldots, N_{\sigma_n}, Y_0, \ldots, Y_k)
\]
where $0 \leq \sigma_1 < \sigma_2 < \cdots < \sigma_n \leq s$. Since $1_{\{N_t = n\}} 1_{\{N_s = k\}} = 0$ unless $k \leq n$, it follows that
\[
1_{\{N_t = n\}} g = 1_{\{N_t = n\}} 1_{\{N_s = k\}} f(N_{\sigma_1}, \ldots, N_{\sigma_n}, Y_0, \ldots, Y_k)
\]
is $\sigma(N_s : s \leq t, \ Y_0, \ldots, Y_n)|_{\{N_t = n\}}$ measurable for each $n \in \mathbb{N}_0$ and therefore $g$ is $\mathcal{B}_t$ measurable. This shows that $\mathcal{B}_s \subset \mathcal{B}_t$ as required.
We are going to now show that $X_t$ is $\{\mathcal{B}_t\}$ - Markov. To this end we have, for $g$ as above (making use of the independence of $N$ and $Y$ and the Markov property for $Y$),
\begin{align*}
E[g \, u(X_t)] &= E[g \, u(Y_{N_t})] = E[g \, u(Y_{N_s + N_t - N_s})] \\
&= E\left[ 1_{\{N_s = k\}} f(N_{\sigma_1}, \ldots, N_{\sigma_n}, Y_0, \ldots, Y_k) \, u(Y_{k + N_t - N_s}) \right] \\
&= e^{-\lambda(t-s)} \sum_{l=0}^{\infty} \frac{[(t-s)\lambda]^l}{l!} E\left[ 1_{\{N_s = k\}} f(N_{\sigma_1}, \ldots, N_{\sigma_n}, Y_0, \ldots, Y_k) \, u(Y_{k+l}) \right] \\
&= e^{-\lambda(t-s)} \sum_{l=0}^{\infty} \frac{[(t-s)\lambda]^l}{l!} E\left[ 1_{\{N_s = k\}} f(N_{\sigma_1}, \ldots, N_{\sigma_n}, Y_0, \ldots, Y_k) \left( \pi^l u \right)(Y_k) \right] \\
&= E\left[ 1_{\{N_s = k\}} f(N_{\sigma_1}, \ldots, N_{\sigma_n}, Y_0, \ldots, Y_k) \, e^{-\lambda(t-s)} \sum_{l=0}^{\infty} \frac{[(t-s)\lambda]^l}{l!} \left( \pi^l u \right)(Y_k) \right] \\
&= E[g \, (P(t-s) u)(Y_{N_s})] = E[g \, (P(t-s) u)(X_s)].
\end{align*}
Thus it follows that $E[u(X_t) \mid \mathcal{B}_s] = (P(t-s) u)(X_s)$ as we wished to show.
Lemma 35.11. If $q(x, y)$ is a rate matrix on $S$ such that $q(x) := \sum_{y \neq x} q(x, y)$ is a bounded function on $S$, then there exists a Markov matrix $\pi(x, y)$ on $S$ and $\lambda > 0$ such that $q(x, y) = \lambda (\pi(x, y) - \delta_{x,y})$.

Proof. Let $\lambda := \sup_{x \in S} q(x)$ and let us look for $\pi(x, y)$ such that $q(x, y) = \lambda (\pi(x, y) - \delta_{xy})$. Clearly we must take $\pi(x, y) = \frac{1}{\lambda} q(x, y)$ for $x \neq y$, and for $x = y$ we will have $-q(x) = \lambda (\pi(x, x) - 1)$ so that $\pi(x, x) = 1 - \frac{1}{\lambda} q(x)$, i.e.
\[
\pi(x, y) = \frac{1}{\lambda} q(x, y) 1_{x \neq y} + 1_{x = y} \left( 1 - \frac{1}{\lambda} q(x) \right).
\]
Notice that $\pi(x, y) \geq 0$ for all $x, y \in S$ and that
\[
\sum_{y \in S} \pi(x, y) = \frac{1}{\lambda} q(x) + \left( 1 - \frac{1}{\lambda} q(x) \right) = 1
\]
so that $\pi(x, y)$ is indeed a Markov matrix.
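The construction in Lemma 35.11 is mechanical enough to sketch in code. The following is an illustration (not from the notes); the helper name `rates_to_jump_chain` is ours, and the example rate matrix is the one from Example 36.19 below.

```python
import numpy as np

def rates_to_jump_chain(Q):
    """Given a rate matrix Q (off-diagonal entries >= 0, rows summing to 0)
    with bounded jump rates, return lam and a Markov matrix pi satisfying
    Q = lam * (pi - I), as in Lemma 35.11."""
    q = -np.diag(Q)                      # holding rates q(x) = -Q_{xx}
    lam = q.max()                        # lam = sup_x q(x)
    pi = Q / lam + np.eye(Q.shape[0])    # pi(x,y) = q(x,y)/lam off diagonal
    return lam, pi

Q = np.array([[-3.0, 1.0, 2.0],
              [0.0, -1.0, 1.0],
              [0.0, 0.0, 0.0]])
lam, pi = rates_to_jump_chain(Q)
assert np.all(pi >= 0) and np.allclose(pi.sum(axis=1), 1.0)  # pi is Markov
assert np.allclose(lam * (pi - np.eye(3)), Q)                # Q recovered
```

Note that absorbing states ($q(x) = 0$) pose no problem: they simply get $\pi(x, x) = 1$.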
Corollary 35.12. If $q(x, y)$ is a rate matrix on $S$ such that $q(x) := \sum_{y \neq x} q(x, y)$ is a bounded function on $S$, then there exists a time homogeneous Markov process $\{X_t\}_{t \geq 0}$ with values in $S$ such that $E_x f(X_t) = (e^{tQ} f)(x)$ for all bounded functions $f$ on $S$ and $t \geq 0$.
Proof. Let $\lambda$ and $\pi$ be as in Lemma 35.11. Then define $X_t = Y_{N_t}$ as in Theorem 35.10.
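The construction $X_t = Y_{N_t}$ of Theorem 35.10 and Corollary 35.12 is also easy to simulate. The sketch below is an illustration under our own naming conventions: `sample_X` draws Poisson($\lambda$) jump times and steps the discrete chain $Y$ at each one.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_X(T, lam, pi, x0):
    """Simulate X_t = Y_{N_t} on [0, T]: N is a rate-lam Poisson process and
    Y is the discrete chain with Markov matrix pi, started at x0.
    Returns the jump times and the state after each jump."""
    times, states, x = [0.0], [x0], x0
    t = rng.exponential(1.0 / lam)       # first Poisson arrival time
    while t < T:
        x = rng.choice(len(pi), p=pi[x]) # one step of the Y chain
        times.append(t)
        states.append(x)
        t += rng.exponential(1.0 / lam)  # next inter-arrival time
    return times, states

pi = np.array([[0.0, 1.0], [0.5, 0.5]])
times, states = sample_X(10.0, 2.0, pi, 0)
assert times[0] == 0.0 and states[0] == 0
```

Note that some Poisson arrivals produce "fictitious" jumps where $Y$ stays put (whenever $\pi(x, x) > 0$); this is exactly the uniformization trick that makes every holding rate equal to $\lambda$.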
35.2 Markov Properties in more detail
This section is pretty much redundant now as it has all been done in Chapter
29.
Notation 35.13 In what follows, let $\Omega$ denote those paths, $\omega : \mathbb{R}_+ \to S$, which are right continuous with left hand limits, where $S$ is equipped with the discrete topology. Let $\theta_t : \Omega \to \Omega$ and $X_t : \Omega \to S$ be defined by,
\[
\theta_t(\omega) = \omega(\cdot + t) \quad \text{and} \quad X_t(\omega) := \omega(t)
\]
respectively. Further let $\mathcal{B}_t := \sigma(X_s : s \leq t)$, $\mathcal{B}_{t+} := \cap_{s > t} \mathcal{B}_s$, and $\mathcal{B} := \sigma(X_s : s < \infty) = \vee_{t < \infty} \mathcal{B}_t$.
We now assume for each $x \in S$ there is a probability measure, $P_x$, on $(\Omega, \mathcal{B})$ such that
\[
P_x(X_0 = y_0, X_{t_1} = y_1, \ldots, X_{t_n} = y_n) = \delta_{x y_0} \prod_{i=1}^{n} P_{y_{i-1}, y_i}(t_i - t_{i-1}) \tag{35.11}
\]
for all $0 = t_0 < t_1 < \cdots < t_n$, $\{y_i\}_{i=0}^{n} \subset S$, and $n \in \mathbb{N}$.
Definition 35.14. A function, $F : \Omega \to \mathbb{R}$, is said to be a cylinder function if there exist times,
\[
0 = t_0 < t_1 < t_2 < \cdots < t_n < \infty,
\]
and a measurable function, $f : S^{n+1} \to \mathbb{R}$, such that
\[
F = f(X_{t_0}, \ldots, X_{t_n}). \tag{35.12}
\]
Notation 35.15 Suppose that $\mu$ is a measure on $S$, let
\[
P_\mu := \int_S d\mu(x) \, P_x = \sum_{x \in S} \mu(x) P_x.
\]
(Some of what follows is already done in Chapter 17.)
Theorem 35.16 (Markov Property I). Let $\mu$ be a probability measure on $S$ and $F \in b\mathcal{B}$ (the space of bounded $\mathcal{B}$ measurable functions), then
\[
E_\mu[F \circ \theta_t \mid \mathcal{B}_t] = E_{X_t}[F] \quad P_\mu \text{ - a.s.}, \tag{35.13}
\]
where
\[
(E_{X_t}[F])(\omega) := E_x[F] \big|_{x = X_t(\omega)} = \int_\Omega F(\omega') \, P_{X_t(\omega)}(d\omega').
\]
Proof. Let $F$ be a bounded cylinder function of the form, $F = f(X_{t_0}, \ldots, X_{t_n})$, and $G$ be a bounded cylinder function of the form, $G = g(X_{s_0}, \ldots, X_{s_m})$ where
\[
0 = s_0 < s_1 < \cdots < s_m = t.
\]
Then
\[
F \circ \theta_t = f(X_{t + t_0}, \ldots, X_{t + t_n})
\]
and
\[
E_\mu[F \circ \theta_t \cdot G] = \sum_{x \in S^{m+1}, \, y \in S^n} g(x) f(x_m, y) \, q(x) \, q_t(x_m; y) \tag{35.14}
\]
where $x \in S^{m+1}$, $y \in S^n$,
\[
q(x) = \mu(x_0) \, p_{s_1 - s_0}(x_0, x_1) \cdots p_{s_m - s_{m-1}}(x_{m-1}, x_m),
\]
and
\[
q_t(x_m; y) = p_{t_1 - t_0}(x_m, y_1) \cdots p_{t_n - t_{n-1}}(y_{n-1}, y_n). \tag{35.15}
\]
According to Eq. (35.11),
\[
\sum_{y \in S^n} f(x_m, y) \, q_t(x_m; y) = E_{x_m}[F],
\]
so we may rewrite Eq. (35.14) as,
\begin{align*}
E_\mu[F \circ \theta_t \cdot G] &= \sum_{x \in S^{m+1}} g(x) \, E_{x_m}[F] \, q(x) \\
&= E_\mu\left[ g(X_{s_0}, \ldots, X_{s_m}) \, E_{X_{s_m}}[F] \right] = E_\mu\left[ G \cdot E_{X_t}[F] \right]. \tag{35.16}
\end{align*}
An application of the multiplicative systems Theorem 8.2 (or Theorem 8.16) shows Eq. (35.16) holds for all $F \in b\mathcal{B}$ and $G \in b\mathcal{B}_t$ which is sufficient to prove Eq. (35.13).
Here is another seemingly mild improvement on the previous theorem.
Theorem 35.17 (Markov Property II). If $F \in b\mathcal{B}$ and $\mu$ is a probability measure on $S$, then
\[
E_\mu[F \circ \theta_t \mid \mathcal{B}_{t+}] = E_{X_t}[F] \quad P_\mu \text{ - a.s.}
\]
and in particular there is a version of $E_\mu[F \circ \theta_t \mid \mathcal{B}_{t+}]$ which is $\sigma(X_t) \subset \mathcal{B}_t$ measurable.
Proof. Again let $F$ be a bounded cylinder function of the form, $F = f(X_{t_0}, \ldots, X_{t_n})$ with $f : S^{n+1} \to \mathbb{R}$ being a bounded function. Recall that
\[
E_x F = \sum_{y \in S^n} f(x, y_1, \ldots, y_n) \, q_t(x; y)
\]
where
\[
q_t(x; y) = p_{t_1 - t_0}(x, y_1) \cdots p_{t_n - t_{n-1}}(y_{n-1}, y_n).
\]
Also observe that
\[
F \circ \theta_\tau(\omega) = f(\omega(t_0 + \tau), \ldots, \omega(t_n + \tau))
\]
is right continuous in $\tau \geq 0$ for all $\omega \in \Omega$. Therefore if $G \in b\mathcal{B}_{t+}$, then by the DCT,
\[
E_\mu[F \circ \theta_t \cdot G] = \lim_{\tau \downarrow t} E_\mu[F \circ \theta_\tau \cdot G].
\]
By Theorem 35.16, for $\tau > t$,
\[
E_\mu[F \circ \theta_\tau \cdot G] = E_\mu\left[ E_\mu[F \circ \theta_\tau \mid \mathcal{B}_\tau] \, G \right] = E_\mu\left[ E_{X_\tau}[F] \, G \right].
\]
Therefore, by another application of the DCT,
\[
E_\mu[F \circ \theta_t \cdot G] = \lim_{\tau \downarrow t} E_\mu[F \circ \theta_\tau \cdot G] = \lim_{\tau \downarrow t} E_\mu\left[ E_{X_\tau}[F] \, G \right] = E_\mu\left[ E_{X_t}[F] \, G \right].
\]
It now follows by an application of the multiplicative systems Theorem 8.2 (or Theorem 8.16) that
\[
E_\mu[F \circ \theta_t \cdot G] = E_\mu\left[ E_{X_t}[F] \, G \right]
\]
for all $F \in b\mathcal{B}$ which completes the proof.
Lemma 35.18 (Optional time approximation lemma). Let $\tau$ be a $\{\mathcal{B}_t\}_{t \geq 0}$ optional time (i.e. $\{\tau < t\} \in \mathcal{B}_t$ for all $t$, see Definition 27.8), and for $n \in \mathbb{N}$, let $\tau_n : \Omega \to [0, \infty]$ be defined by
\[
\tau_n := \infty \cdot 1_{\{\tau = \infty\}} + \sum_{k=1}^{\infty} \frac{k}{2^n} 1_{\left\{ \frac{k-1}{2^n} \leq \tau < \frac{k}{2^n} \right\}}. \tag{35.17}
\]
If $A \in \mathcal{B}_{\tau+}$, then $A \cap \{\tau_n = k 2^{-n}\} \in \mathcal{B}_{k 2^{-n}}$ for all $k \in \mathbb{N}$, $\tau_n$ is a stopping time, and $\tau_n \downarrow \tau$ as $n \to \infty$.
Proof. For $A \in \mathcal{B}_{\tau+}$,
\[
A \cap \left\{ \tau_n = k 2^{-n} \right\} = A \cap \left\{ (k-1) 2^{-n} \leq \tau < k 2^{-n} \right\}
= \left[ A \cap \left\{ \tau < k 2^{-n} \right\} \right] \setminus \left\{ \tau < (k-1) 2^{-n} \right\} \in \mathcal{B}_{k 2^{-n}}.
\]
Taking $A = \Omega$ in this equation shows $\{\tau_n = k 2^{-n}\} \in \mathcal{B}_{k 2^{-n}}$ for all $k \in \mathbb{N}$ and therefore
\[
\left\{ \tau_n \leq k 2^{-n} \right\} = \bigcup_{l=1}^{k} \left\{ \tau_n = l 2^{-n} \right\} \in \mathcal{B}_{k 2^{-n}}
\]
for $k \in \mathbb{N}$. From this it follows that $\{\tau_n \leq t\} \in \mathcal{B}_t$ for all $t$ and hence $\tau_n$ is a stopping time. The fact that $\tau_n \downarrow \tau$ as $n \to \infty$ should be clear.

See Section 27.2 for more on optional and stopping times and in particular see Notation 27.1.
Theorem 35.19 (Strong Markov Property). Suppose that $\tau : \Omega \to [0, \infty]$ is an optional time (i.e. $\{\tau < t\} \in \mathcal{B}_t$ for all $t$, see Definition 27.8), and $F : \Omega \to \mathbb{R}$ is a bounded measurable function. Then for any probability measure, $\mu$, on $(S, \mathcal{B}_S)$ we have
\[
E_\mu\left[ F \circ \theta_\tau \mid \mathcal{B}_{\tau+} \right] = E_{X_\tau} F, \quad P_\mu \text{ - a.s. on } \{\tau < \infty\}. \tag{35.18}
\]
Proof. Suppose that $A \in \mathcal{B}_{\tau+}$, $F := f(X_{t_0}, \ldots, X_{t_n})$ where $f : S^{n+1} \to \mathbb{R}$ is a bounded function, and $\{\tau_n\}_{n=1}^{\infty}$ are the discrete stopping times defined in Eq. (35.17) of Lemma 35.18. Then for each $n$, we may use Theorem 35.17 to conclude,
\begin{align*}
E_\mu\left[ F \circ \theta_{\tau_n} 1_{\{\tau < \infty\}} 1_A \right]
&= E_\mu\left[ F \circ \theta_{\tau_n} 1_{\{\tau_n < \infty\}} 1_A \right] \\
&= \sum_{k=1}^{\infty} E_\mu\left[ F \circ \theta_{k 2^{-n}} 1_{A \cap \{\tau_n = k 2^{-n}\}} \right] \\
&= \sum_{k=1}^{\infty} E_\mu\left[ \left( E_{X_{k 2^{-n}}} F \right) 1_{A \cap \{\tau_n = k 2^{-n}\}} \right] \\
&= E_\mu\left[ 1_{\{\tau_n < \infty\}} \left( E_{X_{\tau_n}} F \right) 1_A \right]
= E_\mu\left[ 1_{\{\tau < \infty\}} \left( E_{X_{\tau_n}} F \right) 1_A \right]. \tag{35.19}
\end{align*}
Observe that $F \circ \theta_{\tau_n} \to F \circ \theta_\tau$ and $E_{X_{\tau_n}} F \to E_{X_\tau} F$ boundedly on $\{\tau < \infty\}$ because $t \to E_{X_t(\omega)} F$ is right continuous and bounded. Using these observations and the DCT, we may pass to the limit ($n \to \infty$) in Eq. (35.19) to arrive at the identity;
\[
E_\mu\left[ F \circ \theta_\tau 1_{\{\tau < \infty\}} 1_A \right] = E_\mu\left[ 1_{\{\tau < \infty\}} \left[ E_{X_\tau} F \right] 1_A \right]. \tag{35.20}
\]
By the multiplicative systems theorem, Eq. (35.20) is seen to be valid for all bounded measurable functions, $F : \Omega \to \mathbb{R}$. Since $A \in \mathcal{B}_{\tau+}$ was arbitrary, Eq. (35.18) is proved.
Corollary 35.20. If $\{X(t)\}_{t \geq 0}$ is a Markov chain, $\tau$ is an optional time (i.e. $\{\tau < t\} \in \mathcal{B}_t$ for all $t$, see Definition 27.8), and $j \in S$, then, conditioned on $\{\tau < \infty$ and $X_\tau = j\}$, $\mathcal{B}_{\tau+}$ and $\{X(t + \tau) : t \geq 0\}$ are independent and $\{X(t + \tau) : t \geq 0\}$ has the same distribution as $\{X(t)\}_{t \geq 0}$ under $P_j$.

Proof. Let $g : \Omega \to \mathbb{R}$ be a bounded $\mathcal{B}_{\tau+}$ measurable function and $f : \Omega \to \mathbb{R}$ be a bounded $\mathcal{B}$ measurable function. Then
\begin{align*}
E_\mu\left[ g \, f(X_{\cdot + \tau}) : \tau < \infty \ \& \ X_\tau = j \right]
&= E_\mu\left[ g \cdot f \circ \theta_\tau : \tau < \infty \ \& \ X_\tau = j \right] \\
&= E_\mu\left[ E_\mu\left[ f \circ \theta_\tau \, g \, 1_{\{\tau < \infty\}} 1_{\{X_\tau = j\}} \mid \mathcal{B}_{\tau+} \right] \right] \\
&= E_\mu\left[ g \, 1_{\{\tau < \infty\}} 1_{\{X_\tau = j\}} E_\mu\left[ f \circ \theta_\tau \mid \mathcal{B}_{\tau+} \right] \right] \\
&= E_\mu\left[ g \, 1_{\{\tau < \infty\}} 1_{\{X_\tau = j\}} E_{X_\tau} f \right]
= E_\mu\left[ g \, 1_{\{\tau < \infty\}} 1_{\{X_\tau = j\}} \right] E_j f
\end{align*}
and therefore,
\[
E_\mu\left[ g \, f(X_{\cdot + \tau}) \mid \tau < \infty \ \& \ X_\tau = j \right]
= \frac{E_\mu\left[ g \, 1_{\{\tau < \infty\}} 1_{\{X_\tau = j\}} \right]}{E_\mu[\tau < \infty \ \& \ X_\tau = j]} \, E_j f
= E_\mu\left( g \mid \tau < \infty \ \& \ X_\tau = j \right) E_j f.
\]
Taking $g = 1$ in this equation then shows,
\[
E_\mu\left[ f(X_{\cdot + \tau}) \mid \tau < \infty \ \& \ X_\tau = j \right] = E_j f
\]
which combined with the previously displayed equation completes the proof of the corollary.
36 Continuous Time M.C. Finite State Space Theory

For simplicity we will begin our study in the case where the state space is finite, say $S = \{1, 2, 3, \ldots, N\}$ for some $N < \infty$. It will be convenient to let
\[
\mathbf{1} := (1, 1, \ldots, 1)^{\mathrm{tr}}
\]
be the column vector with all entries being 1.
Definition 36.1. An $N \times N$ matrix function $P(t)$ for $t \geq 0$ is a Markov semigroup if

1. $P(t)$ is a Markov matrix for all $t \geq 0$, i.e. $P_{ij}(t) \geq 0$ for all $i, j$ and
\[
\sum_{j \in S} P_{ij}(t) = 1 \text{ for all } i \in S. \tag{36.1}
\]
The condition in Eq. (36.1) may be written in matrix notation as,
\[
P(t) \mathbf{1} = \mathbf{1} \text{ for all } t \geq 0. \tag{36.2}
\]
2. $P(0) = I_{N \times N}$,
3. $P(t + s) = P(t) P(s)$ for all $s, t \geq 0$ (Chapman - Kolmogorov),
4. $\lim_{t \downarrow 0} P(t) = I$, i.e. $P$ is continuous at $t = 0$.

Definition 36.2. An $N \times N$ matrix, $Q$, is an infinitesimal generator if
\[
Q_{ij} \geq 0 \text{ for all } i \neq j \text{ and } \sum_{j \in S} Q_{ij} = 0 \text{ for all } i \in S. \tag{36.3}
\]
The condition in Eq. (36.3) may be written in matrix notation as,
\[
Q \mathbf{1} = 0. \tag{36.4}
\]
36.1 Matrix Exponentials

In this section we are going to make use of the following facts from the theory of linear ordinary differential equations.

Theorem 36.3. Let $A$ and $B$ be any $N \times N$ (real) matrices. Then there exists a unique $N \times N$ matrix function $P(t)$ solving the differential equation,
\[
\dot{P}(t) = A P(t) \text{ with } P(0) = B \tag{36.5}
\]
which is in fact given by
\[
P(t) = e^{tA} B \tag{36.6}
\]
where
\[
e^{tA} = \sum_{n=0}^{\infty} \frac{t^n}{n!} A^n = I + tA + \frac{t^2}{2!} A^2 + \frac{t^3}{3!} A^3 + \cdots \tag{36.7}
\]
The matrix function $e^{tA}$ may be characterized as the unique solution to Eq. (36.5) with $B = I$, i.e. the unique solution to
\[
\dot{P}(t) = A P(t) \text{ with } P(0) = I.
\]
Moreover, $e^{tA}$ satisfies the semigroup property (Chapman Kolmogorov equation),
\[
e^{(t+s)A} = e^{tA} e^{sA} \text{ for all } s, t \geq 0. \tag{36.8}
\]

Proof. We will only prove Eq. (36.8) here assuming the first part of the theorem. Fix $s > 0$ and let $R(t) := e^{(t+s)A}$, then
\[
\dot{R}(t) = A e^{(t+s)A} = A R(t) \text{ with } R(0) = e^{sA}.
\]
Therefore by the first part of the theorem,
\[
e^{(t+s)A} = R(t) = e^{tA} R(0) = e^{tA} e^{sA}.
\]
Example 36.4 (Thanks to Mike Gao!). If $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, then $A^n = 0$ for $n \geq 2$, so that
\[
e^{tA} = I + tA = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + t \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.
\]
Similarly if $B = \begin{pmatrix} 0 & 0 \\ -1 & 0 \end{pmatrix}$, then $B^n = 0$ for $n \geq 2$ and
\[
e^{tB} = I + tB = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + t \begin{pmatrix} 0 & 0 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -t & 1 \end{pmatrix}.
\]
Now let $C = A + B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. In this case $C^2 = -I$, $C^3 = -C$, $C^4 = I$, $C^5 = C$, etc., so that
\[
C^{2n} = (-1)^n I \text{ and } C^{2n+1} = (-1)^n C.
\]
Therefore,
\begin{align*}
e^{tC} &= \sum_{n=0}^{\infty} \frac{t^{2n}}{(2n)!} C^{2n} + \sum_{n=0}^{\infty} \frac{t^{2n+1}}{(2n+1)!} C^{2n+1} \\
&= \sum_{n=0}^{\infty} \frac{t^{2n}}{(2n)!} (-1)^n I + \sum_{n=0}^{\infty} \frac{t^{2n+1}}{(2n+1)!} (-1)^n C \\
&= \cos(t) I + \sin(t) C = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}
\end{align*}
which is the matrix representing rotation in the plane by $t$ radians.

Here is another way to compute $e^{tC}$ in this example. Since $C^2 = -I$, we find
\[
\frac{d^2}{dt^2} e^{tC} = C^2 e^{tC} = -e^{tC} \quad \text{with} \quad e^{0C} = I \text{ and } \frac{d}{dt} e^{tC} \Big|_{t=0} = C.
\]
It is now easy to verify the solution to this second order equation is given by,
\[
e^{tC} = \cos t \cdot I + \sin t \cdot C
\]
which agrees with our previous answer.
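The closed form for $e^{tC}$ just derived can be checked against a partial sum of the defining series (36.7). The snippet below is an illustration not in the notes; `expm_series` is our own helper truncating the series, which is accurate here since $\|tC\|$ is small.

```python
import numpy as np

def expm_series(A, nterms=30):
    """Partial sum of the matrix exponential series: sum_{n<nterms} A^n / n!."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for n in range(1, nterms):
        term = term @ A / n   # A^n / n! from A^(n-1) / (n-1)!
        out = out + term
    return out

t = 0.7
C = np.array([[0.0, 1.0], [-1.0, 0.0]])
R = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
assert np.allclose(expm_series(t * C), R)   # e^{tC} = rotation by t radians
```

For matrices of large norm this naive truncation is numerically poor; production code would use a scaling-and-squaring routine instead.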
Remark 36.5. Warning: if $A$ and $B$ are two $N \times N$ matrices it is not generally true that
\[
e^{(A+B)} = e^A e^B \tag{36.9}
\]
as can be seen from Example 36.4.

However we have the following lemma.
Lemma 36.6. If $A$ and $B$ commute, i.e. $AB = BA$, then Eq. (36.9) holds. In particular, taking $B = -A$, shows that $e^{-A} = \left( e^A \right)^{-1}$.

Proof. First proof. Simply verify Eq. (36.9) using explicit manipulations with the infinite series expansion. The point is, because $A$ and $B$ commute, we may use the binomial formula to find;
\[
(A + B)^n = \sum_{k=0}^{n} \binom{n}{k} A^k B^{n-k}.
\]
(Notice that if $A$ and $B$ do not commute we will have
\[
(A + B)^2 = A^2 + AB + BA + B^2 \neq A^2 + 2AB + B^2.)
\]
Therefore,
\begin{align*}
e^{(A+B)} &= \sum_{n=0}^{\infty} \frac{1}{n!} (A+B)^n = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{k=0}^{n} \binom{n}{k} A^k B^{n-k} \\
&= \sum_{0 \leq k \leq n < \infty} \frac{1}{k!} \frac{1}{(n-k)!} A^k B^{n-k} \quad (\text{let } n - k = l) \\
&= \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \frac{1}{k!} \frac{1}{l!} A^k B^l
= \sum_{k=0}^{\infty} \frac{1}{k!} A^k \sum_{l=0}^{\infty} \frac{1}{l!} B^l = e^A e^B.
\end{align*}
Second proof. Here is another proof which uses the ODE interpretation of $e^{tA}$. We will carry it out in a number of steps.

1. By Theorem 36.3 and the product rule,
\[
\frac{d}{dt}\left[ e^{-tA} B e^{tA} \right] = e^{-tA} (-A) B e^{tA} + e^{-tA} B A e^{tA} = e^{-tA} (BA - AB) e^{tA} = 0
\]
since $A$ and $B$ commute. This shows that $e^{-tA} B e^{tA} = B$ for all $t \in \mathbb{R}$.
2. Taking $B = I$ in 1. then shows $e^{-tA} e^{tA} = I$ for all $t$, i.e. $e^{-tA} = \left( e^{tA} \right)^{-1}$. Hence we now conclude from Item 1. that $e^{tA} B = B e^{tA}$ for all $t$.
3. Using Theorem 36.3, Item 2., and the product rule implies
\begin{align*}
\frac{d}{dt}\left[ e^{-tB} e^{-tA} e^{t(A+B)} \right]
&= e^{-tB} (-B) e^{-tA} e^{t(A+B)} + e^{-tB} e^{-tA} (-A) e^{t(A+B)} + e^{-tB} e^{-tA} (A+B) e^{t(A+B)} \\
&= e^{-tB} e^{-tA} (-B) e^{t(A+B)} + e^{-tB} e^{-tA} (-A) e^{t(A+B)} + e^{-tB} e^{-tA} (A+B) e^{t(A+B)} = 0.
\end{align*}
Therefore,
\[
e^{-tB} e^{-tA} e^{t(A+B)} = \left. e^{-tB} e^{-tA} e^{t(A+B)} \right|_{t=0} = I \text{ for all } t,
\]
and hence taking $t = 1$, shows
\[
e^{-B} e^{-A} e^{(A+B)} = I. \tag{36.10}
\]
Multiplying Eq. (36.10) on the left by $e^A e^B$ gives Eq. (36.9).
The next two results give a practical method for computing $e^{tQ}$ in many situations.
Proposition 36.7. If $\Lambda$ is a diagonal matrix,
\[
\Lambda := \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_m \end{pmatrix}
\]
then
\[
e^{t\Lambda} = \begin{pmatrix} e^{t\lambda_1} & & & \\ & e^{t\lambda_2} & & \\ & & \ddots & \\ & & & e^{t\lambda_m} \end{pmatrix}.
\]

Proof. One easily shows that
\[
\Lambda^n = \begin{pmatrix} \lambda_1^n & & & \\ & \lambda_2^n & & \\ & & \ddots & \\ & & & \lambda_m^n \end{pmatrix}
\]
for all $n$ and therefore,
\[
e^{t\Lambda} = \sum_{n=0}^{\infty} \frac{t^n}{n!} \Lambda^n
= \begin{pmatrix} \sum_{n=0}^{\infty} \frac{t^n}{n!} \lambda_1^n & & \\ & \ddots & \\ & & \sum_{n=0}^{\infty} \frac{t^n}{n!} \lambda_m^n \end{pmatrix}
= \begin{pmatrix} e^{t\lambda_1} & & \\ & \ddots & \\ & & e^{t\lambda_m} \end{pmatrix}.
\]
Theorem 36.8. Suppose that $Q$ is a diagonalizable matrix, i.e. there exists an invertible matrix, $S$, such that $S^{-1} Q S = \Lambda$ with $\Lambda$ being a diagonal matrix. In this case we have,
\[
e^{tQ} = S e^{t\Lambda} S^{-1}. \tag{36.11}
\]

Proof. We begin by observing that
\begin{align*}
\left( S^{-1} Q S \right)^2 &= S^{-1} Q S S^{-1} Q S = S^{-1} Q^2 S, \\
\left( S^{-1} Q S \right)^3 &= S^{-1} Q^2 S S^{-1} Q S = S^{-1} Q^3 S, \\
&\ \ \vdots \\
\left( S^{-1} Q S \right)^n &= S^{-1} Q^n S \text{ for all } n \geq 0.
\end{align*}
Therefore we find that
\[
S^{-1} e^{tQ} S = \sum_{n=0}^{\infty} \frac{t^n}{n!} S^{-1} Q^n S = \sum_{n=0}^{\infty} \frac{t^n}{n!} \left( S^{-1} Q S \right)^n = \sum_{n=0}^{\infty} \frac{t^n}{n!} \Lambda^n = e^{t\Lambda}.
\]
Solving this equation for $e^{tQ}$ gives the desired result.
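Theorem 36.8 is exactly how one computes $e^{tQ}$ numerically when $Q$ has distinct eigenvalues. The sketch below (an illustration, not from the notes) compares $S e^{t\Lambda} S^{-1}$, with $S$ and $\Lambda$ obtained from `numpy.linalg.eig`, against a partial sum of the power series; the example rate matrix is our own choice.

```python
import numpy as np

Q = np.array([[-2.0, 2.0], [1.0, -1.0]])   # a 2x2 rate matrix (eigenvalues 0 and -3)
w, S = np.linalg.eig(Q)                    # Q S = S diag(w), columns of S are eigenvectors
t = 0.5
exptQ_diag = S @ np.diag(np.exp(t * w)) @ np.linalg.inv(S)   # Eq. (36.11)

# compare against the truncated power series for e^{tQ}:
out, term = np.eye(2), np.eye(2)
for n in range(1, 40):
    term = term @ (t * Q) / n
    out = out + term
assert np.allclose(exptQ_diag, out)
assert np.allclose(exptQ_diag.sum(axis=1), np.ones(2))  # rows of e^{tQ} sum to 1
```

The last assertion previews Proposition 36.13: since $Q \mathbf{1} = 0$, the rows of $e^{tQ}$ sum to 1.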
36.2 Characterizing Markov Semi-Groups

We now come to the main theorem of this chapter.

Theorem 36.9. The collection of Markov semigroups is in one to one correspondence with the collection of infinitesimal generators. More precisely we have;

1. $P(t) = e^{tQ}$ is a Markov semigroup iff $Q$ is an infinitesimal generator.
2. If $P(t)$ is a Markov semigroup, then $Q := \frac{d}{dt}\big|_{0+} P(t)$ exists, $Q$ is an infinitesimal generator, and $P(t) = e^{tQ}$.

Proof. The proof is completed by Propositions 36.10 - 36.13 below. (You might look at Example 36.4 to see what goes wrong if $Q$ does not satisfy the properties of a Markov generator.)

We are now going to prove a number of results which in total will complete the proof of Theorem 36.9. The first result is technical and you may safely skip its proof.
Proposition 36.10 (Technical proposition). Every Markov semigroup, $\{P(t)\}_{t \geq 0}$, is continuously differentiable.

Proof. First we want to show that $P(t)$ is continuous. For $t, h \geq 0$, we have
\[
P(t+h) - P(t) = P(t) P(h) - P(t) = P(t) (P(h) - I) \to 0 \text{ as } h \downarrow 0.
\]
Similarly if $t > 0$ and $0 \leq h < t$, we have
\[
P(t) - P(t-h) = P(t-h+h) - P(t-h) = P(t-h) P(h) - P(t-h) = P(t-h) [P(h) - I] \to 0 \text{ as } h \downarrow 0
\]
where we use the fact that $P(t-h)$ has entries all bounded by 1 and therefore
\[
\left| (P(t-h) [P(h) - I])_{ij} \right| \leq \sum_k P_{ik}(t-h) \left| (P(h) - I)_{kj} \right| \leq \sum_k \left| (P(h) - I)_{kj} \right| \to 0 \text{ as } h \downarrow 0.
\]
Thus we have shown that $P(t)$ is continuous.

To prove the differentiability of $P(t)$ we use a trick due to Garding. Choose $\varepsilon > 0$ such that
\[
\Lambda := \frac{1}{\varepsilon} \int_0^\varepsilon P(s) \, ds
\]
is invertible. To see this is possible, observe that by the continuity of $P$, $\frac{1}{\varepsilon} \int_0^\varepsilon P(s) \, ds \to I$ as $\varepsilon \downarrow 0$. Therefore, by the continuity of the determinant function,
\[
\det\left( \frac{1}{\varepsilon} \int_0^\varepsilon P(s) \, ds \right) \to \det(I) = 1 \text{ as } \varepsilon \downarrow 0.
\]
With this definition of $\Lambda$, we have
\[
P(t) \Lambda = \frac{1}{\varepsilon} \int_0^\varepsilon P(t) P(s) \, ds = \frac{1}{\varepsilon} \int_0^\varepsilon P(t+s) \, ds = \frac{1}{\varepsilon} \int_t^{t+\varepsilon} P(s) \, ds.
\]
So by the fundamental theorem of calculus, $P(t) \Lambda$ is differentiable and
\[
\frac{d}{dt} [P(t) \Lambda] = \frac{1}{\varepsilon} (P(t+\varepsilon) - P(t)).
\]
As $\Lambda$ is invertible, we may conclude that $P(t)$ is differentiable and that
\[
\dot{P}(t) = \frac{1}{\varepsilon} (P(t+\varepsilon) - P(t)) \Lambda^{-1}.
\]
Since the right hand side of this equation is continuous in $t$ it follows that $\dot{P}(t)$ is continuous as well.
Proposition 36.11. If $\{P(t)\}_{t \geq 0}$ is a Markov semigroup and $Q := \frac{d}{dt}\big|_{0+} P(t)$, then

1. $P(t)$ satisfies $P(0) = I$ and both,
\[
\dot{P}(t) = P(t) Q \quad \text{(Kolmogorov's forward Eq.)}
\]
and
\[
\dot{P}(t) = Q P(t) \quad \text{(Kolmogorov's backwards Eq.)}
\]
hold.
2. $P(t) = e^{tQ}$.
3. $Q$ is an infinitesimal generator.

Proof. 1.-2. We may compute $\dot{P}(t)$ using
\[
\dot{P}(t) = \frac{d}{ds}\Big|_0 P(t+s).
\]
We then may write $P(t+s)$ as $P(t) P(s)$ or as $P(s) P(t)$ and hence
\[
\dot{P}(t) = \frac{d}{ds}\Big|_0 [P(t) P(s)] = P(t) Q \quad \text{and} \quad \dot{P}(t) = \frac{d}{ds}\Big|_0 [P(s) P(t)] = Q P(t).
\]
This proves Item 1. and Item 2. now follows from Theorem 36.3.

3. Since $P(t)$ is continuously differentiable, $P(t) = I + tQ + O(t^2)$, and so for $i \neq j$,
\[
0 \leq P_{ij}(t) = \delta_{ij} + t Q_{ij} + O(t^2) = t Q_{ij} + O(t^2).
\]
Dividing this inequality by $t$ and then letting $t \downarrow 0$ shows $Q_{ij} \geq 0$. Differentiating Eq. (36.2), $P(t) \mathbf{1} = \mathbf{1}$, at $t = 0+$ shows $Q \mathbf{1} = 0$.
Proposition 36.12. Let $Q$ be any matrix such that $Q_{ij} \geq 0$ for all $i \neq j$. Then $\left( e^{tQ} \right)_{ij} \geq 0$ for all $t \geq 0$ and $i, j \in S$.

Proof. Choose $\lambda \in \mathbb{R}$ such that $Q_{ii} \geq -\lambda$ for all $i \in S$. Then $\lambda I + Q$ has all non-negative entries and therefore $e^{t(\lambda I + Q)}$ has non-negative entries for all $t \geq 0$. (Think about the power series expansion for $e^{t(\lambda I + Q)}$.) By Lemma 36.6 we know that $e^{t(\lambda I + Q)} = e^{t\lambda I} e^{tQ}$ and since $e^{t\lambda I} = e^{t\lambda} I$ (you verify), we have¹
\[
e^{t(\lambda I + Q)} = e^{\lambda t} e^{tQ}.
\]
Therefore, $e^{tQ} = e^{-\lambda t} e^{t(\lambda I + Q)}$ again has all non-negative entries and the proof is complete.

¹ Actually if you do not want to use Lemma 36.6, you may check that $e^{t(\lambda I + Q)} = e^{\lambda t} e^{tQ}$ by simply showing both sides of this equation satisfy the same ordinary differential equation.
Proposition 36.13. Suppose that $Q$ is any matrix such that $\sum_{j \in S} Q_{ij} = 0$ for all $i \in S$, i.e. $Q \mathbf{1} = 0$. Then $e^{tQ} \mathbf{1} = \mathbf{1}$.

Proof. Since
\[
\frac{d}{dt} e^{tQ} \mathbf{1} = e^{tQ} Q \mathbf{1} = 0,
\]
it follows that $e^{tQ} \mathbf{1} = e^{tQ} \mathbf{1} \big|_{t=0} = \mathbf{1}$.
Lemma 36.14 (ODE Lemma). If $h(t)$ is a given function and $\lambda \in \mathbb{R}$, then the solution to the differential equation,
\[
\dot{\mu}(t) = -\lambda \mu(t) + h(t) \tag{36.12}
\]
is
\begin{align}
\mu(t) &= e^{-\lambda t} \left( \mu(0) + \int_0^t e^{\lambda s} h(s) \, ds \right) \tag{36.13} \\
&= e^{-\lambda t} \mu(0) + \int_0^t e^{-\lambda(t-s)} h(s) \, ds. \tag{36.14}
\end{align}

Proof. If $\mu(t)$ satisfies Eq. (36.12), then
\[
\frac{d}{dt}\left[ e^{\lambda t} \mu(t) \right] = e^{\lambda t} \left( \dot{\mu}(t) + \lambda \mu(t) \right) = e^{\lambda t} h(t).
\]
Integrating this equation implies,
\[
e^{\lambda t} \mu(t) - \mu(0) = \int_0^t e^{\lambda s} h(s) \, ds.
\]
Solving this equation for $\mu(t)$ gives
\[
\mu(t) = e^{-\lambda t} \mu(0) + e^{-\lambda t} \int_0^t e^{\lambda s} h(s) \, ds \tag{36.15}
\]
which is the same as Eq. (36.13). A direct check shows that $\mu(t)$ so defined solves Eq. (36.12). Indeed, using Eq. (36.15) and the fundamental theorem of calculus shows,
\[
\dot{\mu}(t) = -\lambda e^{-\lambda t} \mu(0) - \lambda e^{-\lambda t} \int_0^t e^{\lambda s} h(s) \, ds + e^{-\lambda t} e^{\lambda t} h(t) = -\lambda \mu(t) + h(t).
\]
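The integrating-factor formula of Lemma 36.14 can be sanity-checked numerically. Below is a sketch (our own illustration, with an arbitrarily chosen forcing term $h(s) = \sin s$): we evaluate Eq. (36.14) by quadrature and confirm that it satisfies the ODE (36.12) at a sample point via a finite difference.

```python
import numpy as np

lam, mu0 = 2.0, 1.0
h = np.sin                                # the forcing term h(t); our choice

def mu(t, n=20000):
    """mu(t) from Eq. (36.14), with the integral done by the trapezoid rule."""
    s = np.linspace(0.0, t, n)
    integrand = np.exp(-lam * (t - s)) * h(s)
    integral = (integrand[:-1] + integrand[1:]).sum() * (s[1] - s[0]) / 2
    return np.exp(-lam * t) * mu0 + integral

# check that mu solves mu' = -lam * mu + h(t), via a central difference at t = 1:
t, eps = 1.0, 1e-4
deriv = (mu(t + eps) - mu(t - eps)) / (2 * eps)
assert abs(deriv - (-lam * mu(t) + h(t))) < 1e-3
```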
Corollary 36.15. Suppose $\lambda \in \mathbb{R}$ and $\mu(t)$ is a function which satisfies $\dot{\mu}(t) \geq -\lambda \mu(t)$, then
\[
\mu(t) \geq e^{-\lambda t} \mu(0) \text{ for all } t \geq 0. \tag{36.16}
\]
In particular if $\mu(0) \geq 0$ then $\mu(t) \geq 0$ for all $t$. In particular if $Q$ is a Markov generator and $P(t) = e^{tQ}$, then
\[
P_{ii}(t) \geq e^{-q_i t} \text{ for all } t > 0
\]
where $q_i := -Q_{ii}$. (If we put all of the sand at site $i$ at time 0, $e^{-q_i t}$ represents the amount of sand at a later time $t$ in the worst case scenario where no one else shovels sand back to site $i$.)

Proof. Let $h(t) := \dot{\mu}(t) + \lambda \mu(t) \geq 0$ and then apply Lemma 36.14 to conclude that
\[
\mu(t) = e^{-\lambda t} \mu(0) + \int_0^t e^{-\lambda(t-s)} h(s) \, ds. \tag{36.17}
\]
Since $e^{-\lambda(t-s)} h(s) \geq 0$, it follows that $\int_0^t e^{-\lambda(t-s)} h(s) \, ds \geq 0$ and therefore ignoring this term in Eq. (36.17) leads to the estimate in Eq. (36.16).
36.3 Examples

Example 36.16 (2 × 2 case I). The most general 2 × 2 rate matrix $Q$, with states labeled $0$ and $1$, is of the form
\[
Q = \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix}
\]
with rate diagram being given in Figure 36.1.

Fig. 36.1. Two state Markov chain rate diagram.

We now find $e^{tQ}$ using Theorem 36.8. To do this we start by observing that
\[
\det(Q - \lambda I) = \det \begin{pmatrix} -\alpha - \lambda & \alpha \\ \beta & -\beta - \lambda \end{pmatrix}
= (\alpha + \lambda)(\beta + \lambda) - \alpha \beta
= \lambda^2 + (\alpha + \beta) \lambda = \lambda (\lambda + \alpha + \beta).
\]
Thus the eigenvalues of $Q$ are $0$, $-(\alpha + \beta)$. The eigenvector for $0$ is $\left( 1 \ 1 \right)^{\mathrm{tr}}$. Moreover,
\[
Q - (-(\alpha + \beta)) I = \begin{pmatrix} \beta & \alpha \\ \beta & \alpha \end{pmatrix}
\]
which has null vector $\left( \alpha \ -\beta \right)^{\mathrm{tr}}$ and therefore we let
\[
S = \begin{pmatrix} 1 & \alpha \\ 1 & -\beta \end{pmatrix} \quad \text{and} \quad S^{-1} = \frac{1}{\alpha + \beta} \begin{pmatrix} \beta & \alpha \\ 1 & -1 \end{pmatrix}.
\]
We then have
\[
S^{-1} Q S = \begin{pmatrix} 0 & 0 \\ 0 & -(\alpha + \beta) \end{pmatrix} =: \Lambda.
\]
So in our case
\[
S^{-1} e^{tQ} S = e^{t\Lambda} = \begin{pmatrix} e^{0 \cdot t} & 0 \\ 0 & e^{-(\alpha + \beta) t} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & e^{-(\alpha + \beta) t} \end{pmatrix}.
\]
Hence we must have,
\begin{align*}
e^{tQ} &= S \begin{pmatrix} 1 & 0 \\ 0 & e^{-(\alpha + \beta) t} \end{pmatrix} S^{-1}
= \frac{1}{\alpha + \beta} \begin{pmatrix} 1 & \alpha \\ 1 & -\beta \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & e^{-(\alpha + \beta) t} \end{pmatrix} \begin{pmatrix} \beta & \alpha \\ 1 & -1 \end{pmatrix} \\
&= \frac{1}{\alpha + \beta} \begin{pmatrix} \beta + \alpha e^{-(\alpha + \beta) t} & \alpha - \alpha e^{-(\alpha + \beta) t} \\ \beta - \beta e^{-(\alpha + \beta) t} & \alpha + \beta e^{-(\alpha + \beta) t} \end{pmatrix}
= \frac{1}{\alpha + \beta} \begin{pmatrix} \beta + \alpha e^{-(\alpha + \beta) t} & \alpha \left( 1 - e^{-(\alpha + \beta) t} \right) \\ \beta \left( 1 - e^{-(\alpha + \beta) t} \right) & \alpha + \beta e^{-(\alpha + \beta) t} \end{pmatrix}.
\end{align*}
Example 36.17 (2 × 2 case II). If $P(t) = e^{tQ}$ and $\mu(t) = \mu(0) P(t)$, then
\[
\dot{\mu}(t) = \mu(t) Q = [\mu_0(t), \mu_1(t)] \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix}
= \left[ -\alpha \mu_0(t) + \beta \mu_1(t), \ \alpha \mu_0(t) - \beta \mu_1(t) \right],
\]
i.e.
\begin{align}
\dot{\mu}_0(t) &= -\alpha \mu_0(t) + \beta \mu_1(t) \tag{36.18} \\
\dot{\mu}_1(t) &= \alpha \mu_0(t) - \beta \mu_1(t). \tag{36.19}
\end{align}
The latter pair of equations is easy to write down using the jump diagram and the movement of sand interpretation. If we assume that $\mu_0(0) + \mu_1(0) = 1$ then we know $\mu_0(t) + \mu_1(t) = 1$ for all later times and therefore we may rewrite Eq. (36.18) as
\[
\dot{\mu}_0(t) = -\alpha \mu_0(t) + \beta (1 - \mu_0(t)) = -\lambda \mu_0(t) + \beta
\]
where $\lambda := \alpha + \beta$. We may use Lemma 36.14 to find
\[
\mu_0(t) = e^{-\lambda t} \mu_0(0) + \beta \int_0^t e^{-\lambda(t-s)} \, ds
= e^{-\lambda t} \mu_0(0) + \frac{\beta}{\lambda} \left( 1 - e^{-\lambda t} \right).
\]
We may also conclude that
\begin{align*}
\mu_1(t) = 1 - \mu_0(t) &= 1 - e^{-\lambda t} \mu_0(0) - \frac{\beta}{\lambda} \left( 1 - e^{-\lambda t} \right) \\
&= 1 - e^{-\lambda t} (1 - \mu_1(0)) - \frac{\beta}{\lambda} \left( 1 - e^{-\lambda t} \right) \\
&= e^{-\lambda t} \mu_1(0) + \left( 1 - e^{-\lambda t} \right) - \frac{\beta}{\lambda} \left( 1 - e^{-\lambda t} \right)
= e^{-\lambda t} \mu_1(0) + \frac{\alpha}{\lambda} \left( 1 - e^{-\lambda t} \right).
\end{align*}
By taking $\mu_0(0) = 1$ and $\mu_1(0) = 0$ we get that the first row of $P(t)$ is equal to
\[
\left[ e^{-\lambda t} + \frac{\beta}{\lambda} \left( 1 - e^{-\lambda t} \right), \ \frac{\alpha}{\lambda} \left( 1 - e^{-\lambda t} \right) \right]
= \frac{1}{\lambda} \left[ \beta + \alpha e^{-\lambda t}, \ \alpha \left( 1 - e^{-\lambda t} \right) \right]
\]
and similarly the second row of $P(t)$ is found by taking $\mu_0(0) = 0$ and $\mu_1(0) = 1$ to find
\[
\frac{1}{\lambda} \left[ \beta \left( 1 - e^{-\lambda t} \right), \ \alpha + \beta e^{-\lambda t} \right].
\]
Hence we have found
\[
P(t) = \frac{1}{\lambda} \begin{pmatrix} \beta + \alpha e^{-\lambda t} & \alpha \left( 1 - e^{-\lambda t} \right) \\ \beta \left( 1 - e^{-\lambda t} \right) & \alpha + \beta e^{-\lambda t} \end{pmatrix}
= I + \frac{1}{\alpha + \beta} \left( 1 - e^{-\lambda t} \right) Q.
\]
Let us verify that this is indeed the correct solution. It is clear that $P(0) = I$ and
\[
\dot{P}(t) = e^{-\lambda t} Q,
\]
while on the other hand,
\[
Q^2 = \begin{pmatrix} \alpha^2 + \alpha \beta & -\alpha^2 - \alpha \beta \\ -\alpha \beta - \beta^2 & \alpha \beta + \beta^2 \end{pmatrix}
= -(\alpha + \beta) \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix} = -\lambda Q
\]
and therefore,
\[
P(t) Q = Q + \frac{1}{\lambda} \left( 1 - e^{-\lambda t} \right) Q^2 = Q - \left( 1 - e^{-\lambda t} \right) Q = e^{-\lambda t} Q
\]
as desired.

We also have
\begin{align*}
P(s) P(t) &= \left( I + \frac{1}{\lambda} \left( 1 - e^{-\lambda s} \right) Q \right) \left( I + \frac{1}{\lambda} \left( 1 - e^{-\lambda t} \right) Q \right) \\
&= I + \frac{1}{\lambda} \left( 2 - e^{-\lambda s} - e^{-\lambda t} \right) Q + \frac{1}{\lambda} \left( 1 - e^{-\lambda s} \right) \frac{1}{\lambda} \left( 1 - e^{-\lambda t} \right) (-\lambda) Q \\
&= I + \frac{1}{\lambda} \left[ \left( 2 - e^{-\lambda s} - e^{-\lambda t} \right) - \left( 1 - e^{-\lambda s} \right) \left( 1 - e^{-\lambda t} \right) \right] Q \\
&= I + \frac{1}{\lambda} \left( 1 - e^{-\lambda(s+t)} \right) Q = P(s+t)
\end{align*}
as it should be. Lastly let us observe that
\[
\lim_{t \to \infty} P(t) = I + \frac{1}{\lambda} Q = \frac{1}{\alpha + \beta} \begin{pmatrix} \beta & \alpha \\ \beta & \alpha \end{pmatrix}.
\]
Moreover we have
\[
\lim_{t \to \infty} \dot{P}(t) = \lim_{t \to \infty} e^{-\lambda t} Q = 0.
\]
Suppose that $\mu$ is any distribution, then
\[
\lim_{t \to \infty} \mu P(t) = \frac{1}{\alpha + \beta} [\beta, \alpha] =: \pi,
\]
independent of $\mu$. Moreover, since
\[
\pi P(s) = \lim_{t \to \infty} \mu P(t) P(s) = \lim_{t \to \infty} \mu P(t+s) = \lim_{t \to \infty} \mu P(t) = \pi,
\]
the limiting distribution is also an invariant distribution. If $\pi$ is any invariant distribution for $P$, we must have
\[
\pi = \lim_{t \to \infty} \pi P(t) = \frac{1}{\alpha + \beta} [\beta, \alpha] = \left[ \frac{\beta}{\alpha + \beta}, \frac{\alpha}{\alpha + \beta} \right] \tag{36.20}
\]
and moreover,
\[
0 = \frac{d}{dt}\Big|_0 \pi = \frac{d}{dt}\Big|_0 \pi P(t) = \pi Q.
\]
The solutions of $\pi Q = 0$ correspond to the null space of $Q^{\mathrm{tr}}$, and
\[
\operatorname{Nul} Q^{\mathrm{tr}} = \operatorname{Nul} \begin{pmatrix} -\alpha & \beta \\ \alpha & -\beta \end{pmatrix} = \mathbb{R} \begin{pmatrix} \beta \\ \alpha \end{pmatrix},
\]
and hence we have again recovered $\pi = \frac{1}{\alpha + \beta} [\beta, \alpha]$.
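The closed form $P(t) = I + \frac{1}{\alpha+\beta}(1 - e^{-\lambda t}) Q$ and the invariant distribution $\pi = \frac{1}{\alpha+\beta}[\beta, \alpha]$ can both be confirmed numerically. The sketch below (an illustration, with $\alpha = 2$, $\beta = 1$ our own choice) compares the closed form against a truncated power series for $e^{tQ}$.

```python
import numpy as np

a, b = 2.0, 1.0                      # alpha, beta
lam = a + b
Q = np.array([[-a, a], [b, -b]])

def expm_series(A, nterms=80):
    """Partial sum of the matrix exponential series sum_n A^n / n!."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for n in range(1, nterms):
        term = term @ A / n
        out = out + term
    return out

for t in (0.1, 1.0, 5.0):
    closed = np.eye(2) + (1 - np.exp(-lam * t)) / lam * Q
    assert np.allclose(closed, expm_series(t * Q))   # matches e^{tQ}

pi = np.array([b, a]) / lam           # claimed invariant distribution (36.20)
assert np.allclose(pi @ Q, np.zeros(2))
```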
Example 36.18 (2 × 2 case III). We now compute $e^{tQ}$ by the power series method as follows. A simple computation shows that
\[
Q^2 = \begin{pmatrix} \alpha^2 + \alpha \beta & -\alpha^2 - \alpha \beta \\ -\alpha \beta - \beta^2 & \alpha \beta + \beta^2 \end{pmatrix} = -(\alpha + \beta) Q = -\lambda Q.
\]
Hence it follows by induction that $Q^n = (-\lambda)^{n-1} Q$ and therefore,
\begin{align*}
P(t) = e^{tQ} &= I + \sum_{n=1}^{\infty} \frac{t^n}{n!} (-\lambda)^{n-1} Q
= I - \frac{1}{\lambda} \sum_{n=1}^{\infty} \frac{(-\lambda t)^n}{n!} Q
= I - \frac{1}{\lambda} \left( e^{-\lambda t} - 1 \right) Q \\
&= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \frac{1}{\lambda} \left( e^{-\lambda t} - 1 \right) \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix} \\
&= \begin{pmatrix} \frac{\alpha}{\lambda} \left( e^{-\lambda t} - 1 \right) + 1 & -\frac{\alpha}{\lambda} \left( e^{-\lambda t} - 1 \right) \\ -\frac{\beta}{\lambda} \left( e^{-\lambda t} - 1 \right) & \frac{\beta}{\lambda} \left( e^{-\lambda t} - 1 \right) + 1 \end{pmatrix}
= \frac{1}{\lambda} \begin{pmatrix} \beta + \alpha e^{-\lambda t} & \alpha \left( 1 - e^{-\lambda t} \right) \\ \beta \left( 1 - e^{-\lambda t} \right) & \alpha + \beta e^{-\lambda t} \end{pmatrix}.
\end{align*}
Let us again verify that this answer is correct;
\[
\dot{P}(t) = e^{-\lambda t} Q \quad \text{while} \quad
P(t) Q = Q - \frac{1}{\lambda} \left( e^{-\lambda t} - 1 \right) (-\lambda) Q = Q + \left( e^{-\lambda t} - 1 \right) Q = e^{-\lambda t} Q.
\]
Example 36.19. Let $S = \{1, 2, 3\}$ and
\[
Q = \begin{pmatrix} -3 & 1 & 2 \\ 0 & -1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\]
(rows and columns indexed by $1, 2, 3$), which we represent by the rate diagram of Figure 36.19. Let $\mu = (\mu_1, \mu_2, \mu_3)$ be a given initial (at $t = 0$) distribution (of sand say) on $S$ and let $\mu(t) := \mu e^{tQ}$ be the distribution at time $t$. Then
\[
\dot{\mu}(t) = \mu e^{tQ} Q = \mu(t) Q.
\]
In this particular example this gives,
\[
\left[ \dot{\mu}_1, \dot{\mu}_2, \dot{\mu}_3 \right] = [\mu_1, \mu_2, \mu_3] \begin{pmatrix} -3 & 1 & 2 \\ 0 & -1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
= \left[ -3 \mu_1, \ \mu_1 - \mu_2, \ 2 \mu_1 + \mu_2 \right],
\]
or equivalently,
\begin{align}
\dot{\mu}_1 &= -3 \mu_1 \tag{36.21} \\
\dot{\mu}_2 &= \mu_1 - \mu_2 \tag{36.22} \\
\dot{\mu}_3 &= 2 \mu_1 + \mu_2. \tag{36.23}
\end{align}
Notice that these equations are easy to read off from Figure 36.19. For example, the second equation represents the fact that the rate of change of sand at site 2 is equal to the rate at which sand is entering site 2 (in this case from 1 with rate $1 \cdot \mu_1$) minus the rate at which sand is leaving site 2 (in this case $1 \cdot \mu_2$ is the rate that sand is being transported to 3). Similarly, site 3 is greedy and never gives up any of its sand while happily receiving sand from site 1 at rate $2 \mu_1$ and from site 2 at rate $1 \cdot \mu_2$. Solving Eq. (36.21) gives,
\[
\mu_1(t) = e^{-3t} \mu_1(0)
\]
and therefore Eq. (36.22) becomes
\[
\dot{\mu}_2 = e^{-3t} \mu_1(0) - \mu_2
\]
which, by Lemma 36.14, has solution,
\[
\mu_2(t) = e^{-t} \mu_2(0) + e^{-t} \int_0^t e^{\tau} e^{-3\tau} \, d\tau \, \mu_1(0)
= \frac{1}{2} \left( e^{-t} - e^{-3t} \right) \mu_1(0) + e^{-t} \mu_2(0).
\]
Using this back in Eq. (36.23) then shows
\[
\dot{\mu}_3 = 2 e^{-3t} \mu_1(0) + \frac{1}{2} \left( e^{-t} - e^{-3t} \right) \mu_1(0) + e^{-t} \mu_2(0)
= \left( \frac{1}{2} e^{-t} + \frac{3}{2} e^{-3t} \right) \mu_1(0) + e^{-t} \mu_2(0)
\]
which integrates to
\begin{align*}
\mu_3(t) &= \left[ \frac{1}{2} \left( 1 - e^{-t} \right) + \frac{1}{2} \left( 1 - e^{-3t} \right) \right] \mu_1(0) + \left( 1 - e^{-t} \right) \mu_2(0) + \mu_3(0) \\
&= \left[ 1 - \frac{1}{2} \left( e^{-t} + e^{-3t} \right) \right] \mu_1(0) + \left( 1 - e^{-t} \right) \mu_2(0) + \mu_3(0).
\end{align*}
Thus we have
\[
\begin{pmatrix} \mu_1(t) \\ \mu_2(t) \\ \mu_3(t) \end{pmatrix}
= \begin{pmatrix} e^{-3t} \mu_1(0) \\ \frac{1}{2} \left( e^{-t} - e^{-3t} \right) \mu_1(0) + e^{-t} \mu_2(0) \\ \left[ 1 - \frac{1}{2} \left( e^{-t} + e^{-3t} \right) \right] \mu_1(0) + \left( 1 - e^{-t} \right) \mu_2(0) + \mu_3(0) \end{pmatrix}
= \begin{pmatrix} e^{-3t} & 0 & 0 \\ \frac{1}{2} \left( e^{-t} - e^{-3t} \right) & e^{-t} & 0 \\ 1 - \frac{1}{2} \left( e^{-t} + e^{-3t} \right) & 1 - e^{-t} & 1 \end{pmatrix}
\begin{pmatrix} \mu_1(0) \\ \mu_2(0) \\ \mu_3(0) \end{pmatrix}.
\]
From this we may conclude that
\[
P(t) = e^{tQ} = \begin{pmatrix} e^{-3t} & 0 & 0 \\ \frac{1}{2} \left( e^{-t} - e^{-3t} \right) & e^{-t} & 0 \\ 1 - \frac{1}{2} \left( e^{-t} + e^{-3t} \right) & 1 - e^{-t} & 1 \end{pmatrix}^{\mathrm{tr}}
= \begin{pmatrix} e^{-3t} & \frac{1}{2} \left( e^{-t} - e^{-3t} \right) & 1 - \frac{1}{2} e^{-t} - \frac{1}{2} e^{-3t} \\ 0 & e^{-t} & 1 - e^{-t} \\ 0 & 0 & 1 \end{pmatrix}.
\]
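The three-state $P(t)$ just derived can be checked directly against the series definition of $e^{tQ}$. The snippet below is an illustration not in the notes; note that the rows of the closed form sum to 1, as a Markov semigroup requires.

```python
import numpy as np

Q = np.array([[-3.0, 1.0, 2.0],
              [0.0, -1.0, 1.0],
              [0.0, 0.0, 0.0]])

def expm_series(A, nterms=60):
    """Partial sum of the matrix exponential series sum_n A^n / n!."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for n in range(1, nterms):
        term = term @ A / n
        out = out + term
    return out

t = 0.8
e1, e3 = np.exp(-t), np.exp(-3 * t)
P_closed = np.array([[e3, 0.5 * (e1 - e3), 1 - 0.5 * e1 - 0.5 * e3],
                     [0.0, e1, 1 - e1],
                     [0.0, 0.0, 1.0]])
assert np.allclose(P_closed, expm_series(t * Q))       # matches e^{tQ}
assert np.allclose(P_closed.sum(axis=1), np.ones(3))   # rows sum to 1
```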
37 Jump and Hold Description

We would now like to make a direct connection between $Q$ and the Markov process $X_t$. To this end, let $S$ denote the first time the process makes a jump between two states. In this section we are going to write $x$ and $y$ for typical elements in the state space. (BRUCE: there is a notational problem here, namely below we are writing $S$ for the first jump time and also using $S$ for the state space.) In this section we will describe the theory a bit informally, saving the formal proofs for Section 37.3.
Theorem 37.1. Let $Q_x := -Q_{x,x} \geq 0$. Then $P_x(S > t) = e^{-Q_x t}$, which shows that relative to $P_x$, $S$ is exponentially distributed with parameter $Q_x$. Moreover, $X_S$ is independent of $S$ and
\[
P_x(X_S = y) = Q_{x,y} / Q_x.
\]
Proof. We give a number of proofs for this theorem. Perhaps the tightest proof appears in Section 37.3 below.

First proof. For the first assertion we let
\[
A_n := \left\{ X\left( \frac{i}{2^n} t \right) = x \text{ for } i = 1, 2, \ldots, 2^n - 1, 2^n \right\}.
\]
Then
\[
A_n \downarrow \{ X(s) = x \text{ for } s \leq t \} = \{ S > t \}
\]
and therefore, $P_x(A_n) \to P_x(S > t)$. Since,
\[
P_x(A_n) = \left[ P_{x,x}(t/2^n) \right]^{2^n} = \left[ 1 - \frac{t Q_x}{2^n} + O\left( (1/2^n)^2 \right) \right]^{2^n} \to e^{-t Q_x} \text{ as } n \to \infty,
\]
we have shown $P_x(S > t) = e^{-t Q_x}$.
For the second assertion, let $T$ be the time between the second and first jump of the process. Then by the strong Markov property (Corollary 35.20), for any $t\ge 0$ and $\varepsilon>0$ small, we have,
\[
\begin{aligned}
P_x(t<S\le t+\varepsilon,\ T\le\varepsilon)
&= \sum_{y\in S}P_x(t<S\le t+\varepsilon,\ T\le\varepsilon,\ X_S=y) \\
&= \sum_{y\in S}P_x(t<S\le t+\varepsilon,\ S\circ\theta_S\le\varepsilon,\ X_S=y) \\
&= \sum_{y\in S}P_x(t<S\le t+\varepsilon,\ X_S=y)\,P_y(S\le\varepsilon) \\
&= \sum_{y\in S}P_x(t<S\le t+\varepsilon,\ X_S=y)\left(1-e^{-Q_y\varepsilon}\right) \\
&\le \max_{y\in S}\left(1-e^{-Q_y\varepsilon}\right)\sum_{y\in S}P_x(t<S\le t+\varepsilon,\ X_S=y) \\
&= \max_{y\in S}\left(1-e^{-Q_y\varepsilon}\right)P_x(t<S\le t+\varepsilon) \\
&= \max_{y\in S}\left(1-e^{-Q_y\varepsilon}\right)\int_t^{t+\varepsilon}Q_x e^{-Q_x\tau}\,d\tau = O\left(\varepsilon^2\right).
\end{aligned}
\]
(Here we have used that the rates, $\{Q_y\}_{y\in S}$, are bounded, which is certainly the case when $\#(S)<\infty$.) Therefore the probability of two jumps occurring in the time interval, $[t,t+\varepsilon]$, may be ignored and we have,
\[
\begin{aligned}
P_x(X_S=y,\ t<S\le t+\varepsilon)
&= P_x(X_{t+\varepsilon}=y,\ S>t)+o(\varepsilon) \\
&= P_x(X_{t+\varepsilon}=y,\ X_t=x,\ S>t)+o(\varepsilon) \\
&= \lim_{n\to\infty}\left(1-\frac{tQ_x}{n}+O(n^{-2})\right)^n P_{x,y}(\varepsilon)+o(\varepsilon) \\
&= e^{-tQ_x}P_{x,y}(\varepsilon)+o(\varepsilon).
\end{aligned}
\]
Also
\[
P_x(t<S\le t+\varepsilon) = \int_t^{t+\varepsilon}Q_x e^{-Q_x s}\,ds
= e^{-Q_x t}-e^{-Q_x(t+\varepsilon)} = \varepsilon\,Q_x e^{-Q_x t}+o(\varepsilon).
\]
Therefore,
\[
P_x(X_S=y\mid S=t)
= \lim_{\varepsilon\downarrow 0}\frac{P_x(X_S=y,\ t<S\le t+\varepsilon)}{P_x(t<S\le t+\varepsilon)}
= \lim_{\varepsilon\downarrow 0}\frac{e^{-tQ_x}P_{x,y}(\varepsilon)+o(\varepsilon)}{\varepsilon\,Q_x e^{-Q_x t}+o(\varepsilon)}
= \frac{1}{Q_x}\lim_{\varepsilon\downarrow 0}\frac{P_{x,y}(\varepsilon)}{\varepsilon}
= Q_{x,y}/Q_x.
\]
This shows that $S$ and $X_S$ are independent and that $P_x(X_S=y) = Q_{x,y}/Q_x$.
Second Proof. For $t>0$ and $\varepsilon>0$, we have that
\[
\begin{aligned}
P_x(S>t,\ X_{t+\varepsilon}=y)
&= \lim_{n\to\infty}P_x\left(X_{t+\varepsilon}=y \text{ and } X\left(\tfrac{i}{2^n}t\right)=x \text{ for } i=1,2,\dots,2^n\right) \\
&= \lim_{n\to\infty}\left[P_{x,x}(t/2^n)\right]^{2^n}P_{xy}(\varepsilon) \\
&= P_{xy}(\varepsilon)\lim_{n\to\infty}\left(1-\frac{tQ_x}{2^n}+O\left(2^{-2n}\right)\right)^{2^n}
= P_{xy}(\varepsilon)\,e^{-tQ_x}.
\end{aligned}
\]
With this computation in hand, we may now compute $P_x(X_S=y,\ t<S\le t+\varepsilon)$ using Figure 37.1 as our guide.

Fig. 37.1. Depicting the first jump of the chain starting at $i$.

So according to Figure 37.1, we must have $\{X_S=y\ \&\ t<S\le t+\varepsilon\}$ iff for all large $n$ there exists $0\le k<n$ such that $S>t+k\varepsilon/n$ \& $X_{t+(k+1)\varepsilon/n}=y$, and therefore
\[
\begin{aligned}
P_x(X_S=y\ \&\ t<S\le t+\varepsilon)
&= \lim_{n\to\infty}P_x\left(S>t+k\varepsilon/n\ \&\ X_{t+(k+1)\varepsilon/n}=y \text{ for some } 0\le k<n\right) \\
&= \lim_{n\to\infty}\sum_{k=0}^{n-1}P_x\left(S>t+k\varepsilon/n\ \&\ X_{t+(k+1)\varepsilon/n}=y\right) \\
&= \lim_{n\to\infty}\sum_{k=0}^{n-1}P_{xy}(\varepsilon/n)\,e^{-(t+k\varepsilon/n)Q_x} \\
&= \lim_{n\to\infty}\sum_{k=0}^{n-1}e^{-(t+k\varepsilon/n)Q_x}\left(\frac{\varepsilon}{n}Q_{xy}+o\left(n^{-1}\right)\right) \\
&= Q_{xy}\int_t^{t+\varepsilon}e^{-Q_x s}\,ds
= \frac{Q_{x,y}}{Q_x}\int_t^{t+\varepsilon}Q_x e^{-Q_x s}\,ds
= \frac{Q_{x,y}}{Q_x}P_x(t<S\le t+\varepsilon).
\end{aligned}
\]
Letting $t\downarrow 0$ and $\varepsilon\uparrow\infty$ in this equation we learn that
\[
P_x(X_S=y) = \frac{Q_{x,y}}{Q_x}
\]
and hence
\[
P_x(X_S=y,\ t<S\le t+\varepsilon) = P_x(X_S=y)\,P_x(t<S\le t+\varepsilon).
\]
This also proves that $X_S$ and $S$ are independent random variables.
Remark 37.2. Technically in the proof above, we have used the identity,
\[
\{X_S=y\ \&\ t<S\le t+\varepsilon\}
= \bigcup_{N=1}^\infty\bigcap_{n\ge N}\bigcup_{0\le k<n}
\left\{S>t+k\varepsilon/n\ \&\ X_{t+(k+1)\varepsilon/n}=y\right\}.
\]
Using Theorem 37.1 along with the strong Markov property in Corollary 35.20 leads to the following description of the Markov process associated to $Q$. Define a Markov matrix, $\tilde P$, by
\[
\tilde P_{xy} := \begin{cases} Q_{x,y}/Q_x & \text{if } x\ne y \\ 0 & \text{if } x=y \end{cases}
\quad\text{for all } x,y\in S, \tag{37.1}
\]
where as before $Q_x = -Q_{x,x}$. The process $X$ starting at $x$ may be described as follows: 1) stay at $x$ for an $\exp(Q_x)$ amount of time, $S_1$, then jump to $x_1$ with probability $\tilde P_{x,x_1}$; 2) stay at $x_1$ for an $\exp(Q_{x_1})$ amount of time, $S_2$, independent of $S_1$, and then jump to $x_2$ with probability $\tilde P_{x_1,x_2}$; 3) stay at $x_2$ for an $\exp(Q_{x_2})$ amount of time, $S_3$, independent of $S_1$ and $S_2$, and then jump to $x_3$ with probability $\tilde P_{x_2,x_3}$; etc. The next corollary formalizes these rules.
Corollary 37.3. Let $Q$ be the infinitesimal generator of a Markov semigroup $\{P(t)\}$. Then the Markov chain, $X_t$, associated to $\{P(t)\}$ may be described as follows. Let $\{Y_k\}_{k=0}^\infty$ denote the discrete time Markov chain with Markov matrix $\tilde P$ as in Eq. (37.1). Let $\{S_j\}_{j=1}^\infty$ be random times such that given $\{Y_j = x_j : j\le n\}$, $S_j \overset{d}{=} \exp\left(Q_{x_{j-1}}\right)$ and the $\{S_j\}_{j=1}^n$ are independent for $1\le j\le n$.¹ Now let $N_t = \max\{j : S_1+\dots+S_j\le t\}$ (see Figure 37.2) and $X_t := Y_{N_t}$. Then $\{X_t\}_{t\ge 0}$ is the Markov process starting at $x$ with Markov semi-group, $P(t) = e^{tQ}$.
Fig. 37.2. Defining $N_t$.

In a manner somewhat similar to the proof of Example 35.8 one shows the description in Corollary 37.3 defines a Markov process with the correct semi-group, $\{P(t)\}$. For the details the reader is referred to Norris [43, see Theorems 2.8.2 and 2.8.4] and Section 37.3 below.
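The jump and hold rules translate directly into a simulation. The sketch below (using the three-site generator from the end of Chapter 36 as an assumed example) holds at $x$ for an $\exp(Q_x)$ time and then jumps according to $\tilde P$, so the fraction of runs still in state 1 at time $t$ estimates $P_{11}(t) = e^{-3t}$:

```python
import math
import random

def simulate_jump_hold(Q, x0, t_end, rng):
    """Run the jump-and-hold chain with generator Q (list of rows) started
    at x0 and return the state occupied at time t_end."""
    t, x = 0.0, x0
    while True:
        qx = -Q[x][x]
        if qx == 0.0:                # absorbing state
            return x
        t += rng.expovariate(qx)     # holding time ~ exp(Q_x)
        if t > t_end:
            return x
        ys = [y for y in range(len(Q)) if y != x]
        x = rng.choices(ys, [Q[x][y] / qx for y in ys])[0]

Q = [[-3.0, 1.0, 2.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]]
rng = random.Random(0)
t, n = 0.5, 20000
hits = sum(simulate_jump_hold(Q, 0, t, rng) == 0 for _ in range(n))
print(abs(hits / n - math.exp(-3.0 * t)))  # small Monte Carlo error
```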
37.1 Hitting and Expected Return times and Probabilities
Let $\{X(t)\}_{t\ge 0}$ be a continuous time Markov chain described by its infinitesimal generator, $Q = (Q_{ij})_{i,j\in S}$, where $S$ is the state space. Further let
\[
S_1 = \inf\{t>0 : X(t)\ne X(0)\}
\]
be the first jump time of the chain and $q_j := -Q_{jj}$ for all $j\in S$. Recall $P(S_1>t\mid X(0)=j) = e^{-q_j t}$ for all $t>0$ and $E[S_1\mid X(0)=j] = 1/q_j$. Given a subset, $A$, of the state space, $S$, let
¹ A concrete way to choose the $\{S_j\}_{j=1}^\infty$ is as follows. Given a sequence, $\{T_j\}_{j=1}^\infty$, of i.i.d. $\exp(1)$ random variables which are independent of $\{Y\}$, define $S_j := T_j/Q_{Y_{j-1}}$.
Fig. 37.3. A rate diagram for a four state Markov chain.
\[
T_A := \inf\{t\ge 0 : X(t)\in A\}
\]
be the first time the process, $X(t)$, hits $A$. By convention, $T_A = \infty$ if $X(t)\notin A$ for all $t$, i.e. if $X(t)$ does not hit $A$.
Example 37.4. Let $S = \{1,2,3,4\}$ and $X(t)$ be the continuous time Markov chain determined by the rate diagram in Figure 37.3. Further let $A = \{3,4\}$. We would like to compute, $h_i = P_i(X(t)\text{ hits }A)$ for $i=1,2$. If $\{Y_n\}_{n=0}^\infty$ is the embedded discrete time chain, this is the same as computing, $h_i = P_i(Y_n\text{ hits }A)$, which we know how to do. We now carry out the details. First off the infinitesimal generator, $Q$, is given by
\[
Q = \begin{pmatrix}
-4 & 1 & 3 & 0\\
2 & -3 & 0 & 1\\
1 & 0 & -1 & 0\\
0 & 2 & 0 & -2
\end{pmatrix}
\]
(rows and columns indexed by the states $1,2,3,4$) and hence the Markov matrix for $\{Y_n\}$ is given by,
\[
\tilde P := \begin{pmatrix}
0 & 1/4 & 3/4 & 0\\
2/3 & 0 & 0 & 1/3\\
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0
\end{pmatrix}.
\]
The first step analysis for the hitting probabilities then implies,
\[
h_1 = P_1(X(t)\text{ hits }A\mid X_{S_1}=3)\,P_1(X_{S_1}=3)
 + P_1(X(t)\text{ hits }A\mid X_{S_1}=2)\,P_1(X_{S_1}=2)
 = \frac34 + h_2\cdot\frac14
\]
and
\[
h_2 = P_2(X(t)\text{ hits }A\mid X_{S_1}=1)\,P_2(X_{S_1}=1)
 + P_2(X(t)\text{ hits }A\mid X_{S_1}=4)\,P_2(X_{S_1}=4)
 = \frac23 h_1 + \frac13,
\]
which have solutions, $h_1 = h_2 = 1$, as we know should be the case since this is an irreducible Markov chain.
Example 37.5. Continuing the set up in Example 37.4, we are going to compute $w_i = E_i T_A$ for $i=1,2$. Again by a first step analysis we have,
\[
\begin{aligned}
w_1 &= E_1(T_A\mid X_{S_1}=3)\,P_1(X_{S_1}=3) + E_1(T_A\mid X_{S_1}=2)\,P_1(X_{S_1}=2)\\
&= \frac14\cdot\frac34 + \left(\frac14+w_2\right)\frac14 = \frac14+\frac14 w_2
\end{aligned}
\]
and
\[
\begin{aligned}
w_2 &= E_2(T_A\mid X_{S_1}=1)\,P_2(X_{S_1}=1) + E_2(T_A\mid X_{S_1}=4)\,P_2(X_{S_1}=4)\\
&= \left(\frac13+w_1\right)\frac23 + \frac13\cdot\frac13 = \frac13+\frac23 w_1,
\end{aligned}
\]
where $\frac14 = E_1(S_1)$ and $\frac13 = E_2(S_1)$. The solutions to these equations are:
\[
E_1(T_A) = w_1 = \frac25 \ \text{ and } \ E_2(T_A) = w_2 = \frac35.
\]
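The two first step systems above are small enough to solve by hand, but they are also easy to check numerically; the sketch below just transcribes them in the form $(I-M)v = b$:

```python
import numpy as np

# Coupling matrix shared by both systems: h1 depends on h2 with weight 1/4,
# and h2 depends on h1 with weight 2/3 (likewise for w1, w2).
M = np.array([[0.0, 0.25],
              [2.0 / 3.0, 0.0]])

# Hitting probabilities: h1 = 3/4 + h2/4 and h2 = 1/3 + 2 h1 / 3.
h = np.linalg.solve(np.eye(2) - M, np.array([0.75, 1.0 / 3.0]))

# Expected hitting times: w1 = 1/4 + w2/4 and w2 = 1/3 + 2 w1 / 3.
w = np.linalg.solve(np.eye(2) - M, np.array([0.25, 1.0 / 3.0]))

assert np.allclose(h, [1.0, 1.0])
assert np.allclose(w, [0.4, 0.6])
```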
With this example as background, let us now work out the general formula
for these hitting times.
Proposition 37.6. Let $Q$ be the infinitesimal generator of a continuous time Markov chain, $\{X(t)\}_{t\ge 0}$, with state space, $S$. Suppose that $A\subset S$ and $T_A := \inf\{t\ge 0 : X(t)\in A\}$. If we let $w_i := E_i T_A$ for all $i\notin A$, then $\{w_i\}_{i\in A^c}$ satisfy the system of linear equations,
\[
w_i = \frac{1}{q_i} + \sum_{j\notin A}\tilde P_{ij}w_j
 = \frac{1}{q_i} + \sum_{j\notin A}\frac{Q_{ij}}{q_i}w_j
\]
where as usual, $q_i = -Q_{ii} = \sum_{j\ne i}Q_{ij}$.
Proof. By the first step analysis we have, for $i\notin A$,
\[
w_i = \sum_{j\ne i}E_i[T_A\mid X_{S_1}=j]\,P_i(X_{S_1}=j)
 = \sum_{j\ne i}\tilde P_{ij}\,E_i[T_A\mid X_{S_1}=j].
\]
By the strong Markov property,
\[
E_i[T_A\mid X_{S_1}=j] = E_i S_1 + E_j T_A = \frac{1}{q_i}+w_j
\]
where $w_j := E_j T_A = 0$ if $j\in A$. Therefore we have,
\[
w_i = \sum_{j\ne i}\tilde P_{ij}\left(\frac{1}{q_i}+w_j\right)
 = \frac{1}{q_i}+\sum_{j\ne i}\tilde P_{ij}w_j
 = \frac{1}{q_i}+\sum_{j\notin A}\tilde P_{ij}w_j
\]
as claimed.
Notation 37.7 Now let
\[
R_j := \inf\{t>S_1 : X_t = j\}
\]
be the first return time to $j$.

Our next goal is to find a formula for $E_i R_j$ for all $i,j\in S$. Before going to the general case, let us work out an example.
Example 37.8. Let us do an example of a two state Markov chain. Say
\[
0\ \underset{\mu}{\overset{\lambda}{\rightleftarrows}}\ 1.
\]
Let $m_0 = E_0 R_0$ and $m_1 = E_1 R_0$; then
\[
m_0 = E_0[R_0\mid X_{S_1}=1]\,P(X_{S_1}=1) = \frac1\lambda+m_1,
\qquad
m_1 = E_1[R_0\mid X_{S_1}=0]\,P(X_{S_1}=0) = \frac1\mu,
\]
and therefore, $m_0 = \frac1\lambda+\frac1\mu$, which is clearly the correct answer in this case. The long run fraction of the time we are in state 0 is therefore
\[
\frac{1/\lambda}{m_0} = \frac{\mu}{\lambda+\mu}.
\]
This is the same as computing $\lim_{t\to\infty}P(X(t)=0) = \pi_0$. Indeed for this case,
\[
Q = \begin{pmatrix}-\lambda & \lambda\\ \mu & -\mu\end{pmatrix}
\]
has invariant distribution, $\pi = (\mu,\lambda)/(\lambda+\mu)$. Therefore,
\[
\pi_0 = \frac{\mu}{\lambda+\mu} \ \text{ and } \ \pi_1 = \frac{\lambda}{\lambda+\mu} \tag{37.2}
\]
as argued above.
Proposition 37.9 (Expected return times). If $m_{ij} := E_i R_j$ for all $i,j\in S$, then
\[
m_{ij} = \frac{1}{q_i}+\sum_{k\notin\{i,j\}}\frac{Q_{ik}}{q_i}m_{kj}. \tag{37.3}
\]
Proof. By a first step analysis we have,
\[
m_{ij} = E_i R_j = \sum_{k\ne i}E_i[R_j\mid X_{S_1}=k]\,P(X_{S_1}=k)
 = \sum_{k\ne i}E_i[R_j\mid X_{S_1}=k]\,\frac{Q_{ik}}{q_i}.
\]
Since
\[
E_i[R_j\mid X_{S_1}=k]
= \begin{cases} E_i S_1 + E_k R_j & \text{if } k\ne j\\ E_i S_1 & \text{if } k=j\end{cases}
= \begin{cases} \frac{1}{q_i}+m_{kj} & \text{if } k\ne j\\ \frac{1}{q_i} & \text{if } k=j,\end{cases}
\]
we arrive at
\[
\begin{aligned}
m_{ij} &= \sum_{k\ne i}E_i[R_j\mid X_{S_1}=k]\,\frac{Q_{ik}}{q_i}
 = \frac{1}{q_i^2}Q_{ij}+\sum_{k\ne i \text{ and } k\ne j}\left(\frac{1}{q_i}+m_{kj}\right)\frac{Q_{ik}}{q_i}\\
&= \sum_{k\ne i}\frac{Q_{ik}}{q_i^2}+\sum_{k\ne i \text{ and } k\ne j}m_{kj}\frac{Q_{ik}}{q_i}
 = \frac{1}{q_i}+\sum_k 1_{k\ne i,\,k\ne j}\,m_{kj}\frac{Q_{ik}}{q_i}.
\end{aligned}
\]
Corollary 37.10. Let $\{X(t)\}_{t\ge 0}$ be a finite state irreducible Markov chain with generator, $Q = (Q_{ij})_{i,j\in S}$. If $\pi = (\pi_i)$ is an invariant distribution, then
\[
\pi_i = \frac{1}{q_i m_{ii}} = \frac{1}{q_i E_i[R_i]}. \tag{37.4}
\]
Proof. Suppose that $\{\pi_j\}$ is an invariant distribution for the chain, so that $\sum_i\pi_i Q_{ik} = 0$, or equivalently,
\[
\sum_{i\ne k}\pi_i Q_{ik} = -\pi_k Q_{kk} = \pi_k q_k.
\]
It follows from Eq. (37.3) that
\[
\begin{aligned}
\sum_i\pi_i q_i m_{ij}
&= \sum_i\pi_i q_i\frac{1}{q_i}+\sum_i\pi_i q_i\sum_k 1_{k\ne i,\,k\ne j}\frac{Q_{ik}}{q_i}m_{kj}\\
&= 1+\sum_{i,k}\pi_i 1_{i\ne k,\,k\ne j}Q_{ik}m_{kj}
 = 1+\sum_k 1_{k\ne j}\,\pi_k q_k m_{kj}\\
&= 1+\sum_i 1_{i\ne j}\,\pi_i q_i m_{ij}.
\end{aligned}
\]
Hence it follows that $\pi_j q_j m_{jj} = 1$, which proves Eq. (37.4).
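Eq. (37.4) is easy to test numerically on the four state chain of Example 37.4: solve $\pi Q = 0$ for the invariant distribution, solve the linear system of Eq. (37.3) for the $m_{ij}$, and compare (a sketch; the generator below is the one from Example 37.4):

```python
import numpy as np

Q = np.array([[-4.0, 1.0, 3.0, 0.0],
              [2.0, -3.0, 0.0, 1.0],
              [1.0, 0.0, -1.0, 0.0],
              [0.0, 2.0, 0.0, -2.0]])
n = Q.shape[0]
q = -np.diag(Q)

# Invariant distribution: pi Q = 0 with the entries of pi summing to one.
A = np.vstack([Q.T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# Mean return times from Eq. (37.3), solved one column j at a time:
# m_ij - sum_{k not in {i, j}} (Q_ik / q_i) m_kj = 1 / q_i.
m = np.zeros((n, n))
for j in range(n):
    B = np.eye(n)
    for i in range(n):
        for k in range(n):
            if k != i and k != j:
                B[i, k] -= Q[i, k] / q[i]
    m[:, j] = np.linalg.solve(B, 1.0 / q)

# Corollary 37.10: pi_i = 1 / (q_i m_ii).
assert np.allclose(pi, 1.0 / (q * np.diag(m)))
```

For this chain the invariant distribution works out to $\pi = (4,2,12,1)/19$.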
37.2 Long time behavior
In this section, suppose that $\{X(t)\}_{t\ge 0}$ is a continuous time Markov chain with infinitesimal generator, $Q$, so that
\[
P(X(t+h)=j\mid X(t)=i) = \delta_{ij}+Q_{ij}h+o(h).
\]
We further assume that $Q$ completely determines the chain.
Definition 37.11. The chain, $X(t)$, is irreducible iff the underlying discrete time jump chain, $\{Y_n\}$, determined by the Markov matrix, $\tilde P_{ij} := \frac{Q_{ij}}{q_i}1_{i\ne j}$, is irreducible, where
\[
q_i := -Q_{ii} = \sum_{j\ne i}Q_{ij}.
\]
Remark 37.12. Using the sojourn time description of $X(t)$ it is easy to see that $P_{ij}(t) = (e^{tQ})_{ij}>0$ for all $t>0$ and $i,j\in S$ if $X(t)$ is irreducible. Moreover, if for all $i,j\in S$, $P_{ij}(t)>0$ for some $t>0$, then, for the chain $\{Y_n\}$, $i\to j$ and hence $X(t)$ is irreducible. In short the following are equivalent:

1. $X(t)$ is irreducible,
2. for all $i,j\in S$, $P_{ij}(t)>0$ for some $t>0$, and
3. $P_{ij}(t)>0$ for all $t>0$ and $i,j\in S$.

In particular, in continuous time all chains are aperiodic.
The next theorem gives the basic limiting behavior of irreducible Markov
chains. Before stating the theorem we need to introduce a little more notation.
Notation 37.13 Let $S_1$ be the time of the first jump of $X(t)$, let
\[
R_i := \min\{t\ge S_1 : X(t)=i\}
\]
be the first time of hitting the site $i$ after the first jump, and set
\[
\pi_i = \frac{1}{q_i E_i R_i} \quad\text{where } q_i := -Q_{ii}.
\]
Theorem 37.14 (Limiting behavior). Let $X(t)$ be an irreducible Markov chain. Then

1. for all initial starting distributions, $\nu(j) := P(X(0)=j)$ for all $j\in S$, and all $j\in S$,
\[
P_\nu\left(\lim_{T\to\infty}\frac1T\int_0^T 1_{X(t)=j}\,dt = \pi_j\right) = 1. \tag{37.5}
\]
2. $\lim_{t\to\infty}P_{ij}(t) = \pi_j$ independent of $i$.
3. $\pi = (\pi_j)_{j\in S}$ is stationary, i.e. $0 = \pi Q$, i.e. $\sum_{i\in S}\pi_i Q_{ij} = 0$ for all $j\in S$, which is equivalent to $\pi P(t) = \pi$ for all $t$ and to $P_\pi(X(t)=j) = \pi(j)$ for all $t>0$ and $j\in S$.
4. If $\pi_i>0$ for some $i\in S$, then $\pi_i>0$ for all $i\in S$ and $\sum_{i\in S}\pi_i = 1$.
5. The $\pi_i$ are all positive iff there exists a solution, $\nu_i\ge 0$, to
\[
\sum_{i\in S}\nu_i Q_{ij} = 0 \text{ for all } j\in S \ \text{ with } \ \sum_{i\in S}\nu_i = 1.
\]
If such a solution exists it is unique and $\nu = \pi$.
Proof. We refer the reader to [43, Theorem 3.8.1] for the full proof. Let us make a few comments on the proof, taking for granted that $\lim_{t\to\infty}P_{ij}(t) =: \pi_j$ exists.

1. Suppose that $\nu$ is a stationary distribution, i.e. $\nu P(t) = \nu$; then (by the dominated convergence theorem),
\[
\nu_j = \lim_{t\to\infty}\sum_i\nu_i P_{ij}(t)
 = \sum_i\nu_i\lim_{t\to\infty}P_{ij}(t)
 = \left(\sum_i\nu_i\right)\pi_j = \pi_j.
\]
Thus $\nu_j = \pi_j$. If $\pi_j = 0$ for all $j$ we must conclude there is no stationary distribution.
2. If we are in the finite state setting, the following computation is justified:
\[
\sum_{j\in S}\pi_j P_{jk}(s)
= \sum_{j\in S}\lim_{t\to\infty}P_{ij}(t)P_{jk}(s)
= \lim_{t\to\infty}\sum_{j\in S}P_{ij}(t)P_{jk}(s)
= \lim_{t\to\infty}P_{ik}(t+s) = \pi_k.
\]
This shows that $\pi P(s) = \pi$ for all $s$, and differentiating this equation at $s=0$ then shows, $\pi Q = 0$.
3. Let us now explain why
\[
\frac1T\int_0^T 1_{X(t)=j}\,dt \to \frac{1}{q_j E_j R_j}.
\]
The idea is that, because the chain is irreducible, no matter how we start the chain we will eventually hit the site $j$. Once we hit $j$, the (strong) Markov property implies the chain forgets how it got there and behaves as if it started at $j$. Since what happens during the initial time interval before hitting $j$ does not affect the limit $\lim_{T\to\infty}\frac1T\int_0^T 1_{X(t)=j}\,dt$, we may as well have started our chain at $j$ in the first place.

Now consider one typical cycle in the chain starting at $j$, jumping away at time $S_1$ and then returning to $j$ at time $R_j$. The average first jump time is $ES_1 = 1/q_j$ while the average length of such a cycle is $ER_j$. As the chain repeats this procedure over and over again with the same statistics, we expect (by a law of large numbers) that the average time spent at site $j$ is given by
\[
\frac{ES_1}{ER_j} = \frac{1/q_j}{E_j R_j} = \frac{1}{q_j E_j R_j}.
\]
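This renewal heuristic is easy to watch in a simulation. The sketch below runs the two state chain of Example 37.8 with assumed rates $\lambda = 2$ and $\mu = 3$ and measures the fraction of time spent in state 0, which should approach $\pi_0 = \mu/(\lambda+\mu) = 0.6$:

```python
import random

lam, mu, T = 2.0, 3.0, 5000.0
rng = random.Random(3)
t, state, time_in_0 = 0.0, 0, 0.0
while t < T:
    hold = rng.expovariate(lam if state == 0 else mu)
    if state == 0:
        time_in_0 += min(hold, T - t)   # clip the last sojourn at T
    t += hold
    state = 1 - state                   # two states: just flip
print(time_in_0 / T)  # close to mu / (lam + mu) = 0.6
```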
37.3 Formal Proofs
Let $S(\omega) = \sigma(\omega) = \inf\{t>0 : \omega(t)\ne\omega(0)\}$. Notice that $\sigma$ is an optional time by Proposition 27.11 or by first principles. Indeed, $\sigma(\omega)<t$ iff $\omega(s)\ne\omega(0)$ for some $s\in(0,t)$, i.e.
\[
\{\sigma<t\} = \bigcup_{s\in\mathbb Q\cap(0,t)}\{X_s\ne X_0\}\in\mathcal B_t.
\]
Lemma 37.15. Suppose that $f:\mathbb R_+\to(0,1]$ is a decreasing function such that $f(s+t) = f(s)f(t)$ for all $s,t>0$; then $f(t) = e^{-ct}$ for some $c\ge 0$.

Proof. Let $g(t) := \ln f(t)$; then $g(t)$ is decreasing and $g(t+s) = g(t)+g(s)$ for all $s,t>0$. So for $n\in\mathbb N$ it follows that
\[
g(1) = g\left(n\cdot\frac1n\right) = g\left(\frac1n+\dots+\frac1n\right) = n\,g\left(\frac1n\right)
\]
and therefore $g(1/n) = \frac1n g(1)$. Similarly it follows that $g(k/n) = \frac kn g(1)$ for all $k,n\in\mathbb N$.

Let $-c := g(1) = \ln f(1)\le 0$. If $t\in\mathbb R_+$ and $a,b\in\mathbb Q_+$ with $a<t<b$ we have,
\[
-bc = g(b)\le g(t)\le g(a) = -ac.
\]
Letting $a\uparrow t$ and $b\downarrow t$ in these inequalities then shows that $g(t) = -ct$ and therefore that $f(t) = e^{-ct}$ as claimed.
Let us begin with a better proof of Theorem 37.1 following Kallenberg, [28, Lemma 12.16]. For the reader's convenience we will restate the theorem here.

Theorem 37.16 (Theorem 37.1 restated). Let $S_1 = \sigma = \inf\{t>0 : X_t\ne X_0\}$ and suppose that $x\in S$ is a non-absorbing state, i.e. $P_x(S_1=\infty)<1$. (That is, there is a chance that we will leave the site $x$.) Then $\operatorname{Law}_{P_x}(S_1) \overset{d}{=} \exp(c(x))$ for some $c(x)>0$, and $\theta_{S_1}^{-1}(\mathcal B)$ and $S_1$ are independent.
Proof. We then have, for $s,t\in\mathbb R_+$, that
\[
\begin{aligned}
P_x(S_1>s+t) &= P_x(S_1>s\ \&\ S_1-s>t) = P_x(S_1>s\ \&\ S_1\circ\theta_s>t)\\
&= E_x\left(1_{S_1>s}\,E_x\left(1_{S_1\circ\theta_s>t}\mid\mathcal B_s\right)\right)
 = E_x\left(1_{S_1>s}\,E_{X_s}[1_{S_1>t}]\right)\\
&= E_x\left(1_{S_1>s}\,E_x[1_{S_1>t}]\right)
 = P_x(S_1>s)\,P_x(S_1>t),
\end{aligned}
\]
where we used that $X_s = x$ on $\{S_1>s\}$. Since $P_x(S_1>t)>0$ for $t$ small, as our paths are right continuous, we may use Lemma 37.15 to conclude that $P_x(S_1>t) = e^{-ct}$ for some $c = c(x)\ge 0$. Since $P_x(S_1=\infty)<1$ we must conclude that $c>0$. In particular, $P_x(S_1=\infty) = 0$.
Next suppose that $f:\Omega\to\mathbb R$ is a bounded, $\mathcal B$-measurable function. Then
\[
E_x\left[f\circ\theta_{S_1}\,1_{S_1>t}\right]
= E_x\left[1_{S_1>t}\,E_x\left(f\circ\theta_{S_1}\mid\mathcal B_t\right)\right]
= E_x\left[1_{S_1>t}\,E_{X_t}\left[f\circ\theta_{S_1}\right]\right]
= E_x\left[1_{S_1>t}\right]E_x\left[f\circ\theta_{S_1}\right],
\]
where for the second equality we used the Markov property at time $t$ (on $\{S_1>t\}$ we have $\theta_{S_1} = \theta_{S_1}\circ\theta_t$) and for the last that $X_t = x$ on $\{S_1>t\}$. Since $f$ and $t\ge 0$ were arbitrary, it follows that $\theta_{S_1}^{-1}(\mathcal B)$ and $S_1$ are independent.
Corollary 37.17. Now suppose that all of the states are non-absorbing. Let $S_n$ be the $n$-th jump time of the chain. We now define $\alpha(x,y) := P_x(X_{S_1}=y)$, let $Y_n := X_{S_n}$ and $\tau_n := S_n-S_{n-1}$ for all $n$, and let $\zeta := \lim_{n\to\infty}S_n$. Then the joint distributions of the jumps and holds for this Markov chain are determined by
\[
P_x\left(Y_1=y_1,\dots,Y_n=y_n,\ \tau_1>t_1,\dots,\tau_n>t_n\right)
= \alpha(x,y_1)\,\alpha(y_1,y_2)\cdots\alpha(y_{n-1},y_n)\,e^{-c(x)t_1}e^{-c(y_1)t_2}\cdots e^{-c(y_{n-1})t_n}. \tag{37.6}
\]
Proof. By the strong Markov property we know that, given $Y_1 = X_{S_1} = y_1$, $\{X_{S_1+t}\}_{t\ge 0}$ is independent of $\mathcal B_{S_1}$ and has the distribution of $\{X_t\}_{t\ge 0}$ under $P_{y_1}$. Therefore $S_2-S_1$ is independent of $\mathcal B_{S_1}$, and given $Y_1 = X_{S_1} = y_1$ and $Y_2 = X_{S_2} = y_2$, $\{X_{S_2+t}\}_{t\ge 0}$ is independent of $\mathcal B_{S_2}$ and has law given by $P_{y_2}$. Continuing this way inductively leads to Eq. (37.6).
One of our next goals is to identify $\alpha(x,y)$ and $c(x)$ for all $x,y\in S$ in terms of the infinitesimal generator of the process. It is convenient to define $q:S\times S\to\mathbb R$ by
\[
q(x,y) = \begin{cases} c(x)\alpha(x,y) & \text{if } x\ne y\\ -c(x) & \text{if } x=y\end{cases} \tag{37.7}
\]
and
\[
T_t f(x) := E_x f(X_t) \tag{37.8}
\]
for all $f:S\to\mathbb R$ bounded. Using the Markov property we find,
\[
T_{t+s}f(x) = E_x f(X_{t+s}) = E_x f(X_t\circ\theta_s)
= E_x E_{X_s}f(X_t) = E_x(T_t f)(X_s) = (T_s T_t f)(x),
\]
which is to say that $\{T_t\}$ is a semi-group of operators. The semi-group property is a strong indication that $T_t f$ satisfies a differential equation, as we now show to be the case.
Theorem 37.18 (Backwards Kolmogorov equation). The Markov transition operators $\{T_t\}_{t\ge 0}$ satisfy the backwards Kolmogorov equation;
\[
\frac{d}{dt}T_t f(x) = \sum_{y\in S}q(x,y)\,T_t f(y) \tag{37.9}
\]
for all $f:S\to\mathbb R$ bounded. Alternatively put, the transition functions, $p_t(x,y) := P_x(X_t=y)$, satisfy
\[
\frac{d}{dt}p_t(x,z) = \sum_{y\in S}q(x,y)\,T_t\delta_z(y) = \sum_{y\in S}q(x,y)\,p_t(y,z) \tag{37.10}
\]
with initial conditions, $p_0(x,y) = \delta_{x,y}$.
Proof. First note that Eq. (37.10) follows from Eq. (37.9) by taking $f(x) = \delta_z(x)$, in which case we have
\[
T_t\delta_z(x) = E_x\delta_z(X_t) = P_x(X_t=z) = p_t(x,z).
\]
Our proof of Eq. (37.9) will follow [28, Theorem 12.22, p. 242].
Let $\sigma = S_1$; then
\[
\begin{aligned}
T_t f(x) = E_x f(X_t)
&= E_x[f(X_t):\sigma>t] + E_x[f(X_t):\sigma\le t]\\
&= E_x[f(x):\sigma>t] + E_x[f(X_t):\sigma\le t]
 = f(x)\,P_x(\sigma>t) + E_x[f(X_t):\sigma\le t].
\end{aligned}
\]
In order to evaluate $E_x[f(X_t):\sigma\le t]$, let us set $X_t = X_0$ if $t<0$. We then have, for $t\ge\sigma$, that
\[
f(X_t) = \left[f(X_{t-s})\circ\theta_\sigma\right]_{s=\sigma}.
\]
Note well that it is not true that $f(X_t) = f(X_{t-\sigma})\circ\theta_\sigma$ since,
\[
\left(f(X_{t-\sigma})\circ\theta_\sigma\right)(\omega)
= f(X_{t-\sigma})\left(\omega(\sigma(\omega)+\cdot)\right)
= f\left(\omega\left(\sigma(\omega)+t-\sigma\left(\omega(\sigma(\omega)+\cdot)\right)\right)\right)
\]
while on the other hand,
\[
\left(\left[f(X_{t-s})\circ\theta_\sigma\right]_{s=\sigma}\right)(\omega)
= \left[f(X_{t-s})\left(\theta_\sigma\omega\right)\right]_{s=\sigma(\omega)}
= \left[f\left(\omega(\sigma(\omega)+t-s)\right)\right]_{s=\sigma(\omega)}
= f(\omega(t)) = f(X_t)(\omega)
\]
as desired. See Lemma 27.6 which shows that $X$ is progressively measurable, which we will use below.
which we will use below.
With the above comments working informally at rst,
E
x
[f (X
t
) : t] = E
x
[[f (X
ts
)

]
s=
: t]
= E
x
_
E
x
_
[f (X
ts
)

]
s=
[B
+

_
: t

= E
x
_
E
x
_
[f (X
ts
)

] [B
+

_
[
s=
: t

= E
x
[[E
X
f (X
ts
)] [
s=
: t]
= E
x
[[E
Y1
f (X
ts
)] [
s=
: t]
=

y,=x
(x, y) E
x
([E
y
f (X
ts
)] [
s=
: t)
=

y,=x
(x, y)
_
t
0
c (x) e
c(x)u
[E
y
f (X
ts
)] [
s=u
du
=

y,=x
(x, y)
_
t
0
c (x) e
c(x)u
(T
tu
f) (y) du. (37.11)
We will now fill in the missing details in the above argument. (The general idea going on here is the notion of regular conditional expectations which we have available to us because of the Markov property.)

Let $F(s,\omega) := f(X_{t-s})(\omega) = f(\omega(t-s))$, which is jointly measurable by Lemma 27.6. To finish the proof of Eq. (37.11) it suffices to show
\[
E_x\left[\left[F(s,\cdot)\circ\theta_\sigma\right]_{s=\sigma}:\sigma\le t\right]
= \sum_{y\ne x}\alpha(x,y)\int_0^t\left[E_y F(u,\cdot)\right]c(x)e^{-c(x)u}\,du
\]
for all bounded measurable, $F:\mathbb R_+\times\Omega\to\mathbb R$. This is done using the multiplicative system theorem in the usual way. The point is that if $F(s,\omega) = h(s)f(\omega)$, then
\[
\begin{aligned}
E_x\left[\left[F(s,\cdot)\circ\theta_\sigma\right]_{s=\sigma}:\sigma\le t\right]
&= E_x\left[h(\sigma)\,1_{\sigma\le t}\,f\circ\theta_\sigma\right]
 = E_x\left[h(\sigma)\,1_{\sigma\le t}\,E_{X_\sigma}f\right]\\
&= \sum_{y\ne x}\alpha(x,y)\,E_y f\,E_x\left[h(\sigma)\,1_{\sigma\le t}\right]\\
&= \sum_{y\ne x}\alpha(x,y)\,E_y f\int_0^t c(x)e^{-c(x)u}h(u)\,du\\
&= \sum_{y\ne x}\alpha(x,y)\int_0^t\left[E_y F(u,\cdot)\right]c(x)e^{-c(x)u}\,du.
\end{aligned}
\]
Thus we have shown,
\[
T_t f(x) = f(x)e^{-c(x)t} + c(x)\int_0^t\Big(\sum_{y\ne x}\alpha(x,y)(T_{t-u}f)(y)\Big)e^{-c(x)u}\,du,
\]
and therefore $T_t f(x)$ is differentiable in $t$ and
\[
\begin{aligned}
\frac{d}{dt}\left[e^{c(x)t}T_t f(x)\right]
&= \frac{d}{dt}\,c(x)\int_0^t\Big(\sum_{y\ne x}\alpha(x,y)(T_{t-u}f)(y)\Big)e^{c(x)(t-u)}\,du\\
&= \frac{d}{dt}\,c(x)\int_0^t\Big(\sum_{y\ne x}\alpha(x,y)(T_s f)(y)\Big)e^{c(x)s}\,ds\\
&= c(x)\Big(\sum_{y\ne x}\alpha(x,y)(T_t f)(y)\Big)e^{c(x)t},
\end{aligned}
\]
and hence,
\[
\frac{d}{dt}T_t f(x) + c(x)T_t f(x) = c(x)\sum_{y\ne x}\alpha(x,y)(T_t f)(y)
\]
or equivalently that
\[
\frac{d}{dt}T_t f(x)
= c(x)\sum_{y\ne x}\alpha(x,y)(T_t f)(y) - c(x)T_t f(x)
= \sum_{y\in S}q(x,y)\,T_t f(y)
\]
where $q(x,y) = c(x)\alpha(x,y)$ for $y\ne x$ and $q(x,x) = -c(x)$ for all $x\in S$.
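For a finite state space, Eq. (37.10) is the matrix identity $\frac{d}{dt}e^{tQ} = Qe^{tQ}$, which one can confirm by a finite difference check (a sketch, reusing the three state generator of Chapter 36 as an assumed example):

```python
import numpy as np

Q = np.array([[-3.0, 1.0, 2.0],
              [0.0, -1.0, 1.0],
              [0.0, 0.0, 0.0]])

def expm(A, terms=40):
    """Matrix exponential via a truncated power series."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

t, h = 0.9, 1e-6
p_t = expm(t * Q)
dp_dt = (expm((t + h) * Q) - p_t) / h    # finite-difference derivative
assert np.allclose(dp_dt, Q @ p_t, atol=1e-4)
```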
Our next goal is to show directly that we can use the jump and hold description of the Markov chain in order to give a full definition of the process. The reader should compare the next theorem with Theorem 17.26.
Theorem 37.19. Let $\{Y_n\}_{n=0}^\infty$ be a discrete time Markov chain in some countable state space, $S$. Also let $\{\tau_n\}_{n=1}^\infty$ be an independent collection of i.i.d. exponential random variables with parameter 1 and suppose that $c:S\to(0,\infty)$ is a given function. As usual we let $S_n := \tau_n/c(Y_{n-1})$ and, following the notation in [43], $J_n = S_1+\dots+S_n$. Let $\zeta = \lim_{n\to\infty}J_n$, which will be the life-time of the process that we now describe. Let
\[
N_t := \#\{n : J_n\le t\} = \sum_{n=1}^\infty 1_{(0,t]}(J_n) \ \text{ for } t<\zeta
\]
and define
\[
X_t = \begin{cases} Y_{N_t} & \text{if } t<\zeta\\ \Delta & \text{if } t\ge\zeta\end{cases}
\]
where $\Delta$ is a new point we adjoin to $S$ which we refer to as the cemetery. (Our jumping particle whose position is described by $X_t$ is sent to the cemetery at the first time it has made an infinite number of jumps, which is the same a.s. as the first time it goes off to infinity².) Let $\Omega$ denote those paths from $\mathbb R_+\to S\cup\{\Delta\}$ which are right continuous with left limits such that $\omega(t) = \Delta$ for $t>\zeta(\omega)$, the first time that $\omega$ has made an infinite number of jumps. Thus $X\in\Omega$. The process $\{X_t\}_{t\ge 0}$ is a Markov process, i.e. for all $T\ge 0$ and $f:\Omega\to\mathbb R$ bounded and measurable we have,
\[
E[f(\theta_T X)\mid\mathcal B_T] = E_{X_T}f(X) \ \text{ on } \{T<\zeta\}. \tag{37.12}
\]
Proof. Let $g\in b\mathcal B_T$ and $f = f(X)$. Since $X$ is a function of $(Y_0,Y_1,\dots;S_1,S_2,\dots)$, say $X = \psi(Y_0,Y_1,\dots;S_1,S_2,\dots)$ as described in the statement of the theorem, it follows that there is a function $F$ such that
\[
f(X) = f\left(\psi(Y_0,Y_1,\dots;S_1,S_2,\dots)\right) =: F(Y_0,Y_1,\dots;S_1,S_2,\dots).
\]

² This is because if $X_t$ keeps revisiting a compact (i.e. finite) subset $K$ of $S$, then the accumulated jump times associated with jumps starting in $K$ will be infinite.
We have
\[
\theta_T X = \psi\left(Y_m,Y_{m+1},\dots;S_{m+1}-(T-J_m),S_{m+2},\dots\right) \ \text{ on } \{J_m\le T<J_{m+1}\}
\]
and therefore
\[
f(\theta_T X) = F\left(Y_m,Y_{m+1},\dots;S_{m+1}-(T-J_m),S_{m+2},\dots\right) \ \text{ on } \{J_m\le T<J_{m+1}\}.
\]
Similarly
\[
g = G(Y_0,Y_1,\dots,Y_m;S_1,S_2,\dots,S_m) \ \text{ on } \{T<J_{m+1}\}
\]
and we define
\[
g_m := G(Y_0,Y_1,\dots,Y_m;S_1,S_2,\dots,S_m).
\]
(We need some more precision here to assert there exists a globally defined function, $G$, such that the above equation holds; see Norris [43, Lemma 6.5.3].)
Using this notation we find,
\[
\begin{aligned}
&E[f(\theta_T X)\,g : J_m\le T<J_{m+1}]\\
&\quad= E[F(Y_m,Y_{m+1},\dots;S_{m+1}-(T-J_m),S_{m+2},\dots)\,g_m : J_m\le T<J_{m+1}]\\
&\quad= E[F(Y_m,Y_{m+1},\dots;S_{m+1}-(T-J_m),S_{m+2},\dots)\,g_m : J_m\le T<J_m+S_{m+1}]\\
&\quad= E\left[F\left(Y_m,Y_{m+1},\dots;\frac{\tau_{m+1}}{c(Y_m)}-(T-J_m),S_{m+2},\dots\right)g_m : J_m\le T<J_m+\frac{\tau_{m+1}}{c(Y_m)}\right]\\
&\quad= E\left[\int_{T-J_m}^\infty d\tau\,c(Y_m)e^{-c(Y_m)\tau}\,F(Y_m,Y_{m+1},\dots;\tau-(T-J_m),S_{m+2},\dots)\,g_m : J_m\le T\right]\\
&\quad= E\left[e^{-c(Y_m)(T-J_m)}\int_0^\infty d\tau\,F(Y_m,Y_{m+1},\dots;\tau,S_{m+2},\dots)\,g_m\,c(Y_m)e^{-c(Y_m)\tau} : J_m\le T\right]\\
&\quad= E\left[e^{-c(Y_m)(T-J_m)}\int_0^\infty d\tau\,E_{Y_m}[F(Y_0,Y_1,\dots;\tau,S_2,\dots)]\,g_m\,c(Y_m)e^{-c(Y_m)\tau} : J_m\le T\right]\\
&\quad= E[E_{Y_m}[F(Y_0,Y_1,\dots;S_1,S_2,\dots)]\,g_m : J_m\le T<J_{m+1}]\\
&\quad= E[E_{X_T}[F(Y_0,Y_1,\dots;S_1,S_2,\dots)]\,g : J_m\le T<J_{m+1}]\\
&\quad= E[E_{X_T}[f]\,g : J_m\le T<J_{m+1}].
\end{aligned}
\]
Summing this last equation on $m$ then shows,
\[
E[f(\theta_T X)\,g : T<\zeta] = E[E_{X_T}[f]\,g : T<\zeta]
\]
which is equivalent to Eq. (37.12).
Observe that, for all $s<t$,
\[
E[f(X_t)\mid\mathcal B_s] = E[f(X_{t-s}\circ\theta_s)\mid\mathcal B_s] = E_{X_s}f(X_{t-s}) = (T_{t-s}f)(X_s),
\]
which shows, in light of Theorem 37.18, that the chains we have been constructing are time homogeneous with the correct Markovian semi-group.
38 Continuous Time M.C. Examples

38.1 Birth and Death Process basics

A birth and death process is a continuous time Markov chain with state space being $S = \{0,1,2,\dots\}$ and transition rates of the form;
\[
0\ \underset{\mu_1}{\overset{\lambda_0}{\rightleftarrows}}\ 1\ \underset{\mu_2}{\overset{\lambda_1}{\rightleftarrows}}\ 2\ \underset{\mu_3}{\overset{\lambda_2}{\rightleftarrows}}\ 3\ \cdots\ (n-1)\ \underset{\mu_n}{\overset{\lambda_{n-1}}{\rightleftarrows}}\ n\ \underset{\mu_{n+1}}{\overset{\lambda_n}{\rightleftarrows}}\ (n+1)\ \cdots
\]
The associated $Q$ matrix for this chain is given by
\[
Q = \begin{pmatrix}
-\lambda_0 & \lambda_0 & & & \\
\mu_1 & -(\lambda_1+\mu_1) & \lambda_1 & & \\
 & \mu_2 & -(\lambda_2+\mu_2) & \lambda_2 & \\
 & & \mu_3 & -(\lambda_3+\mu_3) & \lambda_3\\
 & & & \ddots & \ddots
\end{pmatrix}
\]
with rows and columns indexed by $0,1,2,3,\dots$.
If $\pi_n(t) = P(X(t)=n)$, then $\pi(t) = (\pi_n(t))_{n\ge 0}$ satisfies $\dot\pi(t) = \pi(t)Q$, which written out in components is the system of differential equations;
\[
\begin{aligned}
\dot\pi_0(t) &= -\lambda_0\pi_0(t)+\mu_1\pi_1(t)\\
\dot\pi_1(t) &= \lambda_0\pi_0(t)-(\lambda_1+\mu_1)\pi_1(t)+\mu_2\pi_2(t)\\
&\ \ \vdots\\
\dot\pi_n(t) &= \lambda_{n-1}\pi_{n-1}(t)-(\lambda_n+\mu_n)\pi_n(t)+\mu_{n+1}\pi_{n+1}(t)\\
&\ \ \vdots
\end{aligned}
\]
The associated discrete time chain is described by the jump diagram in which $0\to 1$ with probability 1, and for $n\ge 1$,
\[
n\to n+1 \text{ with probability } \frac{\lambda_n}{\lambda_n+\mu_n}
\quad\text{and}\quad
n\to n-1 \text{ with probability } \frac{\mu_n}{\lambda_n+\mu_n}.
\]
In the jump and hold description, a particle follows this discrete time chain. When it arrives at a site, say $n\ge 1$, it stays there for an $\exp(\lambda_n+\mu_n)$ time and then jumps to either $n+1$ or $n-1$ with probability $\frac{\lambda_n}{\lambda_n+\mu_n}$ or $\frac{\mu_n}{\lambda_n+\mu_n}$, respectively. Given your homework problem we may also describe these transitions by assuming at each site we have a death clock, $D_n \overset{d}{=} \exp(\mu_n)$, and a birth clock, $B_n \overset{d}{=} \exp(\lambda_n)$, with $B_n$ and $D_n$ being independent. We then stay at site $n$ until either $B_n$ or $D_n$ rings, i.e. for $\min(B_n,D_n) \overset{d}{=} \exp(\lambda_n+\mu_n)$ amount of time. If $B_n$ rings first we go to $n+1$, while if $D_n$ rings first we go to $n-1$. When we are at 0 we go to 1 after waiting an $\exp(\lambda_0)$ amount of time.
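For numerics it is convenient to assemble such a generator directly from the rates; a minimal sketch (which truncates the state space at a level $N$, an approximation not made in the text):

```python
import numpy as np

def birth_death_Q(N, lam, mu):
    """Generator of a birth-death chain truncated to states 0..N, where
    lam(n) is the birth rate at n and mu(n) is the death rate at n."""
    Q = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n < N:
            Q[n, n + 1] = lam(n)   # birth: n -> n + 1
        if n > 0:
            Q[n, n - 1] = mu(n)    # death: n -> n - 1
    Q -= np.diag(Q.sum(axis=1))    # diagonal makes each row sum to zero
    return Q

# e.g. constant rates lambda_n = 2, mu_n = 3:
Q = birth_death_Q(5, lambda n: 2.0, lambda n: 3.0)
assert np.allclose(Q.sum(axis=1), 0.0)
assert Q[2, 3] == 2.0 and Q[2, 1] == 3.0 and Q[2, 2] == -5.0
```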
38.2 Pure Birth Process:

The infinitesimal generator for a pure birth process is described by the following rate diagram
\[
0\ \overset{\lambda_0}{\longrightarrow}\ 1\ \overset{\lambda_1}{\longrightarrow}\ 2\ \overset{\lambda_2}{\longrightarrow}\ \cdots\ \longrightarrow\ (n-1)\ \overset{\lambda_{n-1}}{\longrightarrow}\ n\ \overset{\lambda_n}{\longrightarrow}\ \cdots.
\]
For simplicity we are going to assume that we start at state 0. We will examine this model in both the sojourn description and the infinitesimal description. The typical sample path is shown in Figure 38.1.

Fig. 38.1. A typical sample path for a pure birth process.
38.2.1 Infinitesimal description
The matrix $Q$ in this case is given by
\[
Q_{i,i+1} = \lambda_i \ \text{ and } \ Q_{ii} = -\lambda_i \ \text{ for all } i = 0,1,2,\dots
\]
with all other entries being zero. Thus we have
\[
Q = \begin{pmatrix}
-\lambda_0 & \lambda_0 & & & \\
 & -\lambda_1 & \lambda_1 & & \\
 & & -\lambda_2 & \lambda_2 & \\
 & & & -\lambda_3 & \lambda_3\\
 & & & & \ddots
\end{pmatrix}
\]
(rows and columns indexed by $0,1,2,3,\dots$).
If we now let
\[
\pi_j(t) = P_0(X(t)=j) = \left(\pi(0)e^{tQ}\right)_j,
\]
then $\pi_j(t)$ satisfies the system of differential equations;
\[
\begin{aligned}
\dot\pi_0(t) &= -\lambda_0\pi_0(t)\\
\dot\pi_1(t) &= \lambda_0\pi_0(t)-\lambda_1\pi_1(t)\\
&\ \ \vdots\\
\dot\pi_n(t) &= \lambda_{n-1}\pi_{n-1}(t)-\lambda_n\pi_n(t)\\
&\ \ \vdots
\end{aligned}
\]
The solution to the first equation is given by
\[
\pi_0(t) = e^{-\lambda_0 t}\pi_0(0) = e^{-\lambda_0 t}
\]
and the remaining may now be obtained inductively, see the ODE Lemma 36.14, using
\[
\pi_n(t) = \lambda_{n-1}e^{-\lambda_n t}\int_0^t e^{\lambda_n\tau}\pi_{n-1}(\tau)\,d\tau. \tag{38.1}
\]
So for example
\[
\begin{aligned}
\pi_1(t) &= \lambda_0 e^{-\lambda_1 t}\int_0^t e^{\lambda_1\tau}\pi_0(\tau)\,d\tau
 = \lambda_0 e^{-\lambda_1 t}\int_0^t e^{\lambda_1\tau}e^{-\lambda_0\tau}\,d\tau\\
&= \frac{\lambda_0}{\lambda_1-\lambda_0}e^{-\lambda_1 t}\,e^{(\lambda_1-\lambda_0)\tau}\Big|_{\tau=0}^{\tau=t}
 = \frac{\lambda_0}{\lambda_1-\lambda_0}\left(e^{-\lambda_1 t}e^{(\lambda_1-\lambda_0)t}-e^{-\lambda_1 t}\right)
 = \frac{\lambda_0}{\lambda_1-\lambda_0}\left(e^{-\lambda_0 t}-e^{-\lambda_1 t}\right).
\end{aligned}
\]
If $\lambda_1 = \lambda_0$, this becomes, $\pi_1(t) = (\lambda_0 t)e^{-\lambda_0 t}$ instead. In principle one can compute all of these integrals (you have already done the case where $\lambda_j = \lambda$ for all $j$) to find all of the $\pi_n(t)$. The formula for the solution is given as
\[
\pi_n(t) = P(X(t)=n\mid X(0)=0) = \lambda_0\cdots\lambda_{n-1}\left(\sum_{k=0}^n B_{k,n}e^{-\lambda_k t}\right)
\]
where the $B_{k,n}$ are given on p. 338 of the book.
To see that this form of the answer is reasonable, if we look at the equations for $n = 0,1,2,3$, we have
\[
\begin{aligned}
\dot\pi_0(t) &= -\lambda_0\pi_0(t)\\
\dot\pi_1(t) &= \lambda_0\pi_0(t)-\lambda_1\pi_1(t)\\
\dot\pi_2(t) &= \lambda_1\pi_1(t)-\lambda_2\pi_2(t)\\
\dot\pi_3(t) &= \lambda_2\pi_2(t)-\lambda_3\pi_3(t)
\end{aligned}
\]
and the matrix associated to this system is the truncated generator
\[
\hat Q = \begin{pmatrix}
-\lambda_0 & \lambda_0 & & \\
 & -\lambda_1 & \lambda_1 & \\
 & & -\lambda_2 & \lambda_2\\
 & & & -\lambda_3
\end{pmatrix}
\]
so that $(\pi_0(t),\dots,\pi_3(t)) = (1,0,0,0)\,e^{t\hat Q}$. If all of the $\lambda_j$ are distinct, then $\hat Q$ has $\{-\lambda_j\}_{j=0}^3$ as its distinct eigenvalues and hence is diagonalizable. Therefore we will have
\[
(\pi_0(t),\dots,\pi_3(t)) = (1,0,0,0)\,S
\begin{pmatrix}
e^{-t\lambda_0} & & & \\
 & e^{-t\lambda_1} & & \\
 & & e^{-t\lambda_2} & \\
 & & & e^{-t\lambda_3}
\end{pmatrix}S^{-1}
\]
for some invertible matrix $S$. In particular it follows that $\pi_3(t)$ must be a linear combination of $\left\{e^{-t\lambda_j}\right\}_{j=0}^3$. Generalizing this argument shows that there must be constants, $\{C_{k,n}\}_{k=0}^n$, such that
\[
\pi_n(t) = \sum_{k=0}^n C_{kn}e^{-t\lambda_k}.
\]
We may now plug these expressions into the differential equations,
\[
\dot\pi_n(t) = \lambda_{n-1}\pi_{n-1}(t)-\lambda_n\pi_n(t),
\]
to learn
\[
-\sum_{k=0}^n\lambda_k C_{kn}e^{-t\lambda_k}
= \lambda_{n-1}\sum_{k=0}^{n-1}C_{k,n-1}e^{-t\lambda_k}
 - \lambda_n\sum_{k=0}^n C_{kn}e^{-t\lambda_k}.
\]
Since one may show $\left\{e^{-t\lambda_k}\right\}_{k=0}^n$ are linearly independent, we conclude that
\[
-\lambda_k C_{kn} = \lambda_{n-1}C_{k,n-1}1_{k\le n-1}-\lambda_n C_{kn}
\ \text{ for } k = 0,1,2,\dots,n.
\]
This equation gives no information for $k = n$, but for $k<n$ it implies,
\[
C_{k,n} = \frac{\lambda_{n-1}}{\lambda_n-\lambda_k}C_{k,n-1} \ \text{ for } k\le n-1.
\]
To discover the value of $C_{n,n}$ we use the fact that $\sum_{k=0}^n C_{kn} = \pi_n(0) = 0$ for $n\ge 1$ to learn,
\[
C_{n,n} = -\sum_{k=0}^{n-1}C_{k,n} = -\sum_{k=0}^{n-1}\frac{\lambda_{n-1}}{\lambda_n-\lambda_k}C_{k,n-1}.
\]
One may determine all of the coefficients from these equations. For example, we know that $C_{00} = 1$ and therefore,
\[
C_{0,1} = \frac{\lambda_0}{\lambda_1-\lambda_0}
\ \text{ and } \
C_{1,1} = -C_{0,1} = -\frac{\lambda_0}{\lambda_1-\lambda_0}.
\]
Thus we learn that
\[
\pi_1(t) = \frac{\lambda_0}{\lambda_1-\lambda_0}\left(e^{-\lambda_0 t}-e^{-\lambda_1 t}\right)
\]
as we have seen from above.
Remark 38.1. It is interesting to observe that
\[
\frac{d}{dt}(\pi_0(t),\dots,\pi_3(t))\begin{pmatrix}1\\1\\1\\1\end{pmatrix}
= \frac{d}{dt}(1,0,0,0)\,e^{t\hat Q}\begin{pmatrix}1\\1\\1\\1\end{pmatrix}
= (1,0,0,0)\,e^{t\hat Q}\hat Q\begin{pmatrix}1\\1\\1\\1\end{pmatrix}
\]
where
\[
\hat Q\begin{pmatrix}1\\1\\1\\1\end{pmatrix}
= \begin{pmatrix}
-\lambda_0 & \lambda_0 & & \\
 & -\lambda_1 & \lambda_1 & \\
 & & -\lambda_2 & \lambda_2\\
 & & & -\lambda_3
\end{pmatrix}\begin{pmatrix}1\\1\\1\\1\end{pmatrix}
= \begin{pmatrix}0\\0\\0\\-\lambda_3\end{pmatrix}
\]
and therefore,
\[
\frac{d}{dt}(\pi_0(t),\dots,\pi_3(t))\begin{pmatrix}1\\1\\1\\1\end{pmatrix}
= -\lambda_3\pi_3(t)\le 0.
\]
This shows that $\sum_{j=0}^3\pi_j(t)\le\sum_{j=0}^3\pi_j(0) = 1$. Similarly one shows that $\sum_{j=0}^n\pi_j(t)\le 1$ for all $t\ge 0$ and $n$. Letting $n\to\infty$ in this estimate then implies $\sum_{j=0}^\infty\pi_j(t)\le 1$. It is possible that we have a strict inequality here! We will discuss this below.
It is possible that we have a strict inequality here! We will discuss this below.
Remark 38.2. We may iterate Eq. (38.1) to nd,

1
(t) =
0
e
1t
_
t
0
e
1

0
() d =
0
e
1t
_
t
0
e
1
e
0
d

2
(t) =
1
e
2t
_
t
0
e
2

1
() d
=
1
e
2t
_
t
0
e
2
_

0
e
1
_

0
e
1
e
0
d
_
d
=
0

1
e
2t
_
t
0
de
(21)
_

0
e
(10)
d
=
0

1
e
2t
_
0t
e
(21)+(10)
dd
and continuing on this way we nd,

n
(t) =
0

1
. . .
n1
e
nt
_
0s1s2snt
e

n
j=1
(jj1)sj
ds
1
. . . ds
n
.
(38.2)
In the special case where
j
= for all j, this gives, by Lemma 38.3 below with
f (s) = 1,

n
(t) =
n
e
t
_
0s1s2snt
ds
1
. . . ds
n
=
(t)
n
n!
e
t
. (38.3)
Another special case of interest is when
j
= (j + 1) for all j 0. This will
be the Yule process discussed below. In this case,
\[
\begin{aligned}
\pi_n(t) &= n!\,\lambda^n e^{-(n+1)\lambda t}\int_{0\le s_1\le s_2\le\cdots\le s_n\le t}e^{\lambda\sum_{j=1}^n s_j}\,ds_1\cdots ds_n\\
&= n!\,\lambda^n e^{-(n+1)\lambda t}\,\frac{1}{n!}\left(\int_0^t e^{\lambda s}\,ds\right)^n
 = \lambda^n e^{-(n+1)\lambda t}\left(\frac{e^{\lambda t}-1}{\lambda}\right)^n
 = e^{-\lambda t}\left(1-e^{-\lambda t}\right)^n, \tag{38.4}
\end{aligned}
\]
wherein we have used Lemma 38.3 below for the second equality.
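The constant-rate case (38.3) can be checked by integrating the forward equations numerically; the sketch below uses a simple Euler scheme on a truncated system and compares with the Poisson formula:

```python
import math

def birth_ode_probs(rates, t, steps=20000):
    """Euler-integrate pi_n' = lam_{n-1} pi_{n-1} - lam_n pi_n on states
    0..len(rates)-1, starting from pi = delta_0 (mass above is truncated)."""
    N = len(rates)
    pi = [1.0] + [0.0] * (N - 1)
    dt = t / steps
    for _ in range(steps):
        new = pi[:]
        for n in range(N):
            flow = rates[n] * pi[n] * dt
            new[n] -= flow
            if n + 1 < N:
                new[n + 1] += flow
        pi = new
    return pi

t, lam = 0.8, 2.0
pi = birth_ode_probs([lam] * 12, t)
for n in range(5):
    poisson = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    assert abs(pi[n] - poisson) < 1e-3
```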
Lemma 38.3. Let $f(t)$ be a continuous function; then for all $n\in\mathbb N$ we have
\[
\int_{0\le s_1\le s_2\le\cdots\le s_n\le t}f(s_1)\cdots f(s_n)\,ds_1\cdots ds_n
= \frac{1}{n!}\left(\int_0^t f(s)\,ds\right)^n.
\]

Proof. Let $F(t) := \int_0^t f(s)\,ds$. The proof goes by induction on $n$. The statement is clearly true when $n = 1$, and if it holds at level $n$, then
\[
\begin{aligned}
&\int_{0\le s_1\le s_2\le\cdots\le s_n\le s_{n+1}\le t}f(s_1)\cdots f(s_n)f(s_{n+1})\,ds_1\cdots ds_n\,ds_{n+1}\\
&\quad= \int_0^t\left(\int_{0\le s_1\le s_2\le\cdots\le s_n\le s_{n+1}}f(s_1)\cdots f(s_n)\,ds_1\cdots ds_n\right)f(s_{n+1})\,ds_{n+1}\\
&\quad= \int_0^t\frac{1}{n!}\left(F(s_{n+1})\right)^n F'(s_{n+1})\,ds_{n+1}
 = \int_0^{F(t)}\frac{1}{n!}u^n\,du = \frac{F(t)^{n+1}}{(n+1)!}
\end{aligned}
\]
as required.
38.2.2 Yule Process

Suppose that each member of a population gives birth independently to one offspring at an exponential time with rate $\lambda$. If there are $k$ members of the population with birth times, $T_1,\dots,T_k$, then the time of the next birth for this population is $\min(T_1,\dots,T_k) = S_k$, where $S_k$ is now an exponential random variable with parameter $k\lambda$. This description gives rise to a pure birth process with parameters $\lambda_k = k\lambda$. In this case we start with initial distribution, $\pi_j(0) = \delta_{j,1}$. We have already solved for $\pi_k(t)$ in this case. Indeed, from Eq. (38.4) after a shift of the index by 1, we find,
\[
\pi_n(t) = e^{-\lambda t}\left(1-e^{-\lambda t}\right)^{n-1} \ \text{ for } n\ge 1.
\]
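In other words, the population size at time $t$ is geometric with success parameter $e^{-\lambda t}$, hence has mean $e^{\lambda t}$. A small simulation sketch of the Yule mechanism:

```python
import math
import random

def yule_population(lam, t, rng):
    """Population size at time t starting from one individual: with k
    individuals alive, the next split arrives after an exp(k*lam) time."""
    k, s = 1, 0.0
    while True:
        s += rng.expovariate(k * lam)
        if s > t:
            return k
        k += 1

rng = random.Random(1)
lam, t, n = 1.0, 0.7, 20000
mean = sum(yule_population(lam, t, rng) for _ in range(n)) / n
print(abs(mean - math.exp(lam * t)))  # small Monte Carlo error
```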
38.2.3 Sojourn description

Let $\{S_n\}_{n=0}^\infty$ be independent exponential random variables with $P(S_n>t) = e^{-\lambda_n t}$ for all $n$ and let
\[
W_k := S_0+\dots+S_{k-1}
\]
be the time of the $k$-th birth; see Figure 38.1 where the graph of $X(t)$ is shown as determined by the sequence $\{S_n\}_{n=0}^\infty$. With this notation we have
\[
\begin{aligned}
P(X(t)=0) &= P(S_0>t) = e^{-\lambda_0 t}\\
P(X(t)=1) &= P(S_0\le t<S_0+S_1) = P(W_1\le t<W_2)\\
P(X(t)=2) &= P(W_2\le t<W_3)\\
&\ \ \vdots\\
P(X(t)=j) &= P(W_j\le t<W_{j+1})
\end{aligned}
\]
where $\{W_j\le t<W_{j+1}\}$ represents the event where the $j$-th birth has occurred by time $t$ but the $(j+1)$-st birth has not. Consider,
P (W
1
t < W
2
) =
0

1
_
0x0t<x0+x1
e
0x0
e
1x1
dx
0
dx
1
.
Doing the $x_1$-integral first gives
\[
\begin{aligned}
P(X(t) = 1) = P(W_1 \le t < W_2)
&= \lambda_0\int_{0\le x_0\le t} e^{-\lambda_0 x_0}\left.e^{-\lambda_1 x_1}\right|_{x_1 = t - x_0}\,dx_0 \\
&= \lambda_0\int_{0\le x_0\le t} e^{-\lambda_0 x_0} e^{-\lambda_1(t - x_0)}\,dx_0 \\
&= \lambda_0 e^{-\lambda_1 t}\int_{0\le x_0\le t} e^{-(\lambda_0 - \lambda_1)x_0}\,dx_0 \\
&= \frac{\lambda_0}{\lambda_1 - \lambda_0}\,e^{-\lambda_1 t}\left(e^{(\lambda_1 - \lambda_0)t} - 1\right)
= \frac{\lambda_0}{\lambda_1 - \lambda_0}\left(e^{-\lambda_0 t} - e^{-\lambda_1 t}\right).
\end{aligned}
\]
There is one point which we have not yet addressed in this model, namely whether it makes sense without further information. In terms of the sojourn description this comes down to the issue as to whether $P\left(\sum_{j=1}^\infty S_j = \infty\right) = 1.$ Indeed, if this is not the case, we will only have $X(t)$ defined for $t < \sum_{j=1}^\infty S_j,$ which may be less than infinity. The next theorem tells us precisely when this phenomenon can happen.
Theorem 38.4. Let $\{S_j\}_{j=1}^\infty$ be independent random variables such that $S_j \overset{d}{=} \exp(\lambda_j)$ with $0 < \lambda_j < \infty$ for all $j.$ Then:

1. If $\sum_{n=1}^\infty \lambda_n^{-1} < \infty$ then $P\left(\sum_{n=1}^\infty S_n < \infty\right) = 1.$
2. If $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$ then $P\left(\sum_{n=1}^\infty S_n = \infty\right) = 1.$
Proof. 1. Since
\[
E\left[\sum_{n=1}^\infty S_n\right] = \sum_{n=1}^\infty E[S_n] = \sum_{n=1}^\infty \lambda_n^{-1} < \infty,
\]
it follows that $\sum_{n=1}^\infty S_n < \infty$ a.s.

2. By the DCT, independence, and Eq. (??),
\[
\begin{aligned}
E\left[e^{-\sum_{n=1}^\infty S_n}\right]
&= \lim_{N\to\infty} E\left[e^{-\sum_{n=1}^N S_n}\right]
= \lim_{N\to\infty}\prod_{n=1}^N E\left[e^{-S_n}\right] \\
&= \lim_{N\to\infty}\prod_{n=1}^N \frac{1}{1 + \lambda_n^{-1}}
= \lim_{N\to\infty}\exp\left(-\sum_{n=1}^N \ln\left(1 + \lambda_n^{-1}\right)\right) \\
&= \exp\left(-\sum_{n=1}^\infty \ln\left(1 + \lambda_n^{-1}\right)\right).
\end{aligned}
\]
If $\lambda_n$ does not go to infinity, then the latter sum is infinite, while if $\lambda_n \to \infty$ and $\sum_{n=1}^\infty \lambda_n^{-1} = \infty,$ then $\sum_{n=1}^\infty \ln\left(1 + \lambda_n^{-1}\right) = \infty$ as $\ln\left(1 + \lambda_n^{-1}\right) \cong \lambda_n^{-1}$ for large $n.$ In any case we have shown that $E\left[e^{-\sum_{n=1}^\infty S_n}\right] = 0,$ which can happen iff $e^{-\sum_{n=1}^\infty S_n} = 0$ a.s., or equivalently $\sum_{n=1}^\infty S_n = \infty$ a.s.
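The dichotomy in Theorem 38.4 can be illustrated numerically. In the sketch below (rates and sample sizes are illustrative choices) the rates $\lambda_n = n^2$ give $\sum \lambda_n^{-1} = \pi^2/6 < \infty,$ so the total lifetime is almost surely finite — the chain explodes — with mean close to $\pi^2/6$; while for $\lambda_n = n$ the partial sums of $E[S_n] = 1/n$ diverge like $\log N$:

```python
import math
import random

rng = random.Random(1)

def total_lifetime(rates, rng):
    # Sum of independent exponential sojourn times with the given rates.
    return sum(rng.expovariate(r) for r in rates)

# Case 1 (explosion): lambda_n = n^2, sum 1/lambda_n = pi^2/6 < infinity.
rates_sq = [n * n for n in range(1, 1001)]
samples = [total_lifetime(rates_sq, rng) for _ in range(1000)]
mean_lifetime = sum(samples) / len(samples)
print(mean_lifetime, math.pi ** 2 / 6)  # the two should be close

# Case 2 (no explosion): lambda_n = n, so E[sum S_n] diverges like log N.
partials = [sum(1.0 / n for n in range(1, N + 1)) for N in (10, 100, 1000)]
print(partials)
```

The truncation at $n = 1000$ only changes the explosion-time mean by roughly $10^{-3},$ which is far below the Monte Carlo error.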
Remark 38.5. If $\sum_{k=1}^\infty 1/\lambda_k < \infty,$ so that $P\left(\sum_{n=1}^\infty S_n < \infty\right) = 1,$ one may define $X(t) = \infty$ on $\left\{t \ge \sum_{n=1}^\infty S_n\right\}.$ With this definition, $\{X(t)\}_{t\ge 0}$ is again a Markov process. However, most of the examples we study will satisfy $\sum_{k=1}^\infty 1/\lambda_k = \infty.$
38.3 Pure Death Process

A pure death process is described by the following rate diagram,
\[
0 \xleftarrow{\mu_1} 1 \xleftarrow{\mu_2} 2 \xleftarrow{\mu_3} 3 \quad\dots\quad \xleftarrow{\mu_{N-1}} (N-1) \xleftarrow{\mu_N} N.
\]
If $\pi_j(t) = P(X(t) = j),$ we have that
\[
\begin{aligned}
\dot\pi_N(t) &= -\mu_N\pi_N(t) \\
\dot\pi_{N-1}(t) &= \mu_N\pi_N(t) - \mu_{N-1}\pi_{N-1}(t) \\
&\ \,\vdots \\
\dot\pi_n(t) &= \mu_{n+1}\pi_{n+1}(t) - \mu_n\pi_n(t) \\
&\ \,\vdots \\
\dot\pi_1(t) &= \mu_2\pi_2(t) - \mu_1\pi_1(t) \\
\dot\pi_0(t) &= \mu_1\pi_1(t).
\end{aligned}
\]
Let us now suppose that $\pi_j(t) = P(X(t) = j \mid X(0) = N).$ A little thought shows that we may find $\pi_j(t)$ for $j = 1,2,\dots,N$ by using the solutions for the pure birth process with $\lambda_0 \to \mu_N,$ $\lambda_1 \to \mu_{N-1},$ $\lambda_2 \to \mu_{N-2},\dots,$ and $\lambda_{N-1} \to \mu_1.$ We may then compute
\[
\pi_0(t) := 1 - \sum_{j=1}^N \pi_j(t).
\]
The explicit formula for these solutions may be found in the book on p. 346 in the special case where all of the death parameters are distinct.
38.3.1 Cable Failure Model

Suppose that a cable is made up of $N$ individual strands, with the life time of each strand being an $\exp(K(l))$ random variable, where $K(l) > 0$ is some function of the load $l$ on the strand. We suppose that the cable starts with $N$ fibers and is put under a total load of $NL,$ so that $L$ is the load applied per fiber when all $N$ fibers are unbroken. If there are $k$ fibers intact, the load per fiber is $NL/k$ and the exponential life time parameter of each fiber is now $K(NL/k).$ Thus when $k$ fibers are intact, the time to the next fiber breaking is $\exp(kK(NL/k))$ distributed. So if $\{S_j\}_{j=N}^{1}$ are the sojourn times at state $j,$ the time to failure of the cable is $T = \sum_{j=1}^N S_j$ and the expected time to failure is
\[
ET = \sum_{j=1}^N ES_j = \sum_{k=1}^N \frac{1}{kK(NL/k)} = \frac{1}{N}\sum_{k=1}^N \frac{1}{\frac{k}{N}\,K\!\left(\frac{N}{k}L\right)} \cong \int_0^1 \frac{1}{x\,K(L/x)}\,dx
\]
if $K$ is a nice enough function and $N$ is large. For example, if $K(l) = l^\beta/A$ for some $\beta > 0$ and $A > 0,$ we find
\[
ET = \int_0^1 \frac{A}{x\,(L/x)^\beta}\,dx = \frac{A}{L^\beta}\int_0^1 x^{\beta-1}\,dx = \frac{A}{\beta L^\beta}.
\]
Whereas the expected life, at the start, of any one strand is $1/K(L) = A/L^\beta.$ Thus the cable lasts only $1/\beta$ times the average strand life. It is actually better to let $L_0$ be the total load applied, so that $L = L_0/N$; then the above formula becomes
\[
ET = \frac{A}{L_0^\beta}\,N^\beta.
\]
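The quality of the large-$N$ approximation $ET \cong A/(\beta L^\beta)$ can be checked directly by evaluating the finite sum. In this sketch the parameter values ($\beta = 2,$ $A = 1,$ $L = 1,$ $N = 10{,}000$) are illustrative choices, not from the text:

```python
# Illustrative parameters: K(l) = l**beta / A with beta = 2, A = 1, and
# per-fiber load L = 1 when the cable is fully intact.
beta, A_const, L, N = 2.0, 1.0, 1.0, 10_000

def K(load):
    return load ** beta / A_const

# Exact expected failure time for N fibers (the finite sum)...
ET_sum = sum(1.0 / (k * K(N * L / k)) for k in range(1, N + 1))
# ...versus the large-N integral approximation A / (beta * L**beta).
ET_limit = A_const / (beta * L ** beta)
print(ET_sum, ET_limit)
```

For $\beta = 2$ the sum collapses to $(N+1)/(2N),$ so the discrepancy from the limiting value $1/2$ is of order $1/N.$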
38.3.2 Linear Death Process basics

Similar to the Yule process, suppose that each individual in a population has a life expectancy $T \overset{d}{=} \exp(\mu).$ Thus if there are $k$ members in the population at time $t,$ using the memoryless property of the exponential distribution, the time of the next death has distribution $\exp(k\mu).$ Thus $\mu_k = k\mu$ in this case. Using the formula in the book on p. 346, we then learn that if we start with a population of size $N,$ then
\[
\pi_n(t) = P(X(t) = n \mid X(0) = N) = \binom{N}{n} e^{-n\mu t}\left(1 - e^{-\mu t}\right)^{N-n} \text{ for } n = 0,1,2,\dots,N. \tag{38.5}
\]
So $\{\pi_n(t)\}_{n=0}^N$ is the binomial distribution with parameter $e^{-\mu t}.$ This may be understood as follows. We have $X(t) = n$ iff there are exactly $n$ members out of the original $N$ still alive. Let $\tau_j$ be the life time of the $j^{\text{th}}$ member of the population, so that $\{\tau_j\}_{j=1}^N$ are i.i.d. $\exp(\mu)$ distributed random variables. We then have that the probability that a particular choice, $A \subset \{1,2,\dots,N\},$ of $n$ members are alive with the others being dead is given by
\[
P\left(\left(\cap_{j\in A}\{\tau_j > t\}\right)\cap\left(\cap_{j\notin A}\{\tau_j \le t\}\right)\right) = \left(e^{-\mu t}\right)^n\left(1 - e^{-\mu t}\right)^{N-n}.
\]
As there are $\binom{N}{n}$ ways to choose such subsets, $A \subset \{1,2,\dots,N\},$ with $n$ members, we arrive at Eq. (38.5).
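The binomial identification of $\pi_n(t)$ in Eq. (38.5) can be verified by simulating the i.i.d. lifetimes directly. In this sketch the values $N = 10,$ $\mu = 0.5,$ $t = 1$ are illustrative choices:

```python
import math
import random

rng = random.Random(2)
N, mu, t, trials = 10, 0.5, 1.0, 20_000  # illustrative parameters

counts = [0] * (N + 1)
for _ in range(trials):
    # X(t) = number of the N i.i.d. exp(mu) lifetimes exceeding t.
    alive = sum(1 for _ in range(N) if rng.expovariate(mu) > t)
    counts[alive] += 1

p = math.exp(-mu * t)  # single-member survival probability
for n in range(N + 1):
    exact = math.comb(N, n) * p ** n * (1 - p) ** (N - n)
    print(n, counts[n] / trials, round(exact, 4))
```

The empirical frequencies should match the Binomial$(N, e^{-\mu t})$ probabilities to within sampling error.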
38.3.3 Linear death process in more detail

(You may safely skip this subsection.) In this subsection, we suppose that we start with a population of size $N$ with $\tau_j$ being the life time of the $j^{\text{th}}$ member of the population. We assume that $\{\tau_j\}_{j=1}^N$ are i.i.d. $\exp(\mu)$ distributed random variables and let $X(t)$ denote the number of people alive at time $t,$ i.e.
\[
X(t) = \#\{j : \tau_j > t\}.
\]
Theorem 38.6. The process $\{X(t)\}_{t\ge 0}$ is the linear death Markov process with parameter $\mu.$

We will begin with the following lemma.

Lemma 38.7. Suppose that $B$ and $\{A_j\}_{j=1}^n$ are events such that: 1) $\{A_j\}_{j=1}^n$ are pairwise disjoint, 2) $P(A_j) = P(A_1)$ for all $j,$ and 3) $P(B \cap A_j) = P(B \cap A_1)$ for all $j.$ Then
\[
P\left(B \mid \cup_{j=1}^n A_j\right) = P(B \mid A_1). \tag{38.6}
\]
We also use the identity
\[
P(B \mid A \cap C) = P(B \mid A) \tag{38.7}
\]
whenever $C$ is independent of $\{A, B\}.$
Proof. The proof is the following simple computation:
\[
\begin{aligned}
P\left(B \mid \cup_{j=1}^n A_j\right)
&= \frac{P\left(B \cap \left(\cup_{j=1}^n A_j\right)\right)}{P\left(\cup_{j=1}^n A_j\right)}
= \frac{P\left(\cup_{j=1}^n (B \cap A_j)\right)}{P\left(\cup_{j=1}^n A_j\right)} \\
&= \frac{\sum_{j=1}^n P(B \cap A_j)}{\sum_{j=1}^n P(A_j)}
= \frac{nP(B \cap A_1)}{nP(A_1)} = P(B \mid A_1).
\end{aligned}
\]
For the second assertion, we have
\[
P(B \mid A \cap C) = \frac{P(B \cap A \cap C)}{P(A \cap C)} = \frac{P(B \cap A)\,P(C)}{P(A)\,P(C)} = \frac{P(B \cap A)}{P(A)} = P(B \mid A).
\]
Proof. Sketch of the proof of Theorem 38.6. Let $0 < u < v < t$ and $k \ge l \ge m$ as in Figure 38.2.

Fig. 38.2. The arrangement of times.

Given $V \subset U \subset \{1,2,\dots,N\}$ with $\#V = l$ and $\#U = k,$ let
\[
A_{U,V} = \left(\cap_{j\in U}\{\tau_j > u\}\right)\cap\left(\cap_{j\notin U}\{\tau_j \le u\}\right)\cap\left(\cap_{j\in V}\{\tau_j > v\}\right)\cap\left(\cap_{j\notin V}\{\tau_j \le v\}\right),
\]
so that $\{X_u = k,\ X_v = l\}$ is the disjoint union of the $A_{U,V}$ over all such choices of $V \subset U$ as above. Notice that $P(A_{U,V})$ is independent of the choice of $U \supset V,$ as is $P(\{X_t = m\} \cap A_{U,V}).$ Therefore by Lemma 38.7 we have, with $V = \{1,2,\dots,l\} \subset U = \{1,2,\dots,k\},$ that
\[
\begin{aligned}
P(X_t = m \mid X_u = k,\ X_v = l) &= P(X_t = m \mid A_{U,V}) \\
&= P(\text{exactly } m \text{ of } \tau_1,\dots,\tau_l > t \mid \tau_1 > v,\dots,\tau_l > v,\ \tau_{l+1} > u,\dots,\tau_k > u) \\
&= P(\text{exactly } m \text{ of } \tau_1,\dots,\tau_l > t \mid \tau_1 > v,\dots,\tau_l > v) \\
&= \binom{l}{m} P(\tau_1 > t,\dots,\tau_m > t,\ \tau_{m+1} \le t,\dots,\tau_l \le t \mid \tau_1 > v,\dots,\tau_l > v) \\
&= \binom{l}{m}\frac{P(\tau_1 > t)^m\,P(v < \tau_1 \le t)^{l-m}}{P(v < \tau_1)^l} \\
&= \binom{l}{m}\frac{\left(e^{-\mu t}\right)^m\left(e^{-\mu v} - e^{-\mu t}\right)^{l-m}}{e^{-\mu v l}} \\
&= \binom{l}{m} e^{-\mu m(t-v)}\left(1 - e^{-\mu(t-v)}\right)^{l-m}.
\end{aligned}
\]
Similar considerations show that $X_t$ has the Markov property, and we have just found the transition matrix for this process to be
\[
P(X_t = m \mid X_v = l) = 1_{l\ge m}\binom{l}{m} e^{-\mu m(t-v)}\left(1 - e^{-\mu(t-v)}\right)^{l-m}.
\]
So
\[
P_{lm}(t) := P(X_t = m \mid X_0 = l) = 1_{l\ge m}\binom{l}{m} e^{-\mu m t}\left(1 - e^{-\mu t}\right)^{l-m}.
\]
Differentiating this equation at $t = 0$ implies that $\frac{d}{dt}\big|_{0+} P_{lm}(t) = 0$ unless $m = l$ or $m = l-1,$ and that
\[
\frac{d}{dt}\Big|_{0+} P_{ll}(t) = -\mu l
\quad\text{and}\quad
\frac{d}{dt}\Big|_{0+} P_{l,l-1}(t) = \mu\binom{l}{l-1} = \mu l.
\]
These are precisely the transition rates of the linear death process with parameter $\mu.$
Let us now also work out the sojourn description in this model.

Theorem 38.8. Suppose that $\{\tau_j\}_{j=1}^N$ are independent exponential random variables with parameter $\mu$ as in the above model for the life times of a population. Let $W_1 < W_2 < \dots < W_N$ be the order statistics of $\{\tau_j\}_{j=1}^N,$ i.e. $\{W_1 < W_2 < \dots < W_N\} = \{\tau_j\}_{j=1}^N$ as sets. Hence $W_j$ is the time of the $j^{\text{th}}$ death. Further let $S_1 = W_1,$ $S_2 = W_2 - W_1,\dots,$ $S_N = W_N - W_{N-1}$ be the times between successive deaths. Then $\{S_j\}_{j=1}^N$ are independent exponential random variables with $S_j \overset{d}{=} \exp((N - j + 1)\mu).$
Proof. Since $W_1 = S_1 = \min(\tau_1,\dots,\tau_N),$ by a homework problem, $S_1 \overset{d}{=} \exp(N\mu).$ Let
\[
A_j := \left\{\tau_j < \min(\tau_k)_{k\ne j}\right\}\cap\{\tau_j = t\}.
\]
We then have $\{W_1 = t\} = \cup_{j=1}^N A_j$ and
\[
A_j \cap \{W_2 > s + t\} = \left\{s + t < \min(\tau_k)_{k\ne j}\right\}\cap\{\tau_j = t\}.
\]
By symmetry we have (this is the informal part)
\[
P(A_j) = P(A_1) \quad\text{and}\quad P(A_j \cap \{W_2 > s + t\}) = P(A_1 \cap \{W_2 > s + t\}),
\]
and hence by Lemma 38.7,
\[
\begin{aligned}
P(W_2 > s + t \mid W_1 = t) &= P(W_2 > s + t \mid A_1) \\
&= P\left(\{\tau_1 = t\}\cap\left\{\min(\tau_k)_{k\ne 1} > s + t\right\}\ \Big|\ \min(\tau_k)_{k\ne 1} > \tau_1 = t\right) \\
&= \frac{P\left(\min(\tau_k)_{k\ne 1} > s + t,\ \tau_1 = t\right)}{P\left(\min(\tau_k)_{k\ne 1} > t,\ \tau_1 = t\right)} \\
&= \frac{P\left(\min(\tau_k)_{k\ne 1} > s + t\right)}{P\left(\min(\tau_k)_{k\ne 1} > t\right)} = e^{-(N-1)\mu s},
\end{aligned}
\]
since $\min(\tau_k)_{k\ne 1} \overset{d}{=} \exp((N-1)\mu)$ and by the memoryless property of exponential random variables. This shows that $S_2 := W_2 - W_1 \overset{d}{=} \exp((N-1)\mu).$
Let us consider the next case, namely $P(W_3 - W_2 > t \mid W_1 = a,\ W_2 = a + b).$ In this case we argue as above that
\[
\begin{aligned}
&P(W_3 - W_2 > t \mid W_1 = a,\ W_2 = a + b) \\
&\quad= P\left(\min(\tau_3,\dots,\tau_N) - \tau_2 > t \mid \tau_1 = a,\ \tau_2 = a + b,\ \min(\tau_3,\dots,\tau_N) > \tau_2\right) \\
&\quad= \frac{P\left(\min(\tau_3,\dots,\tau_N) > t + a + b,\ \tau_1 = a,\ \tau_2 = a + b,\ \min(\tau_3,\dots,\tau_N) > \tau_2\right)}{P\left(\tau_1 = a,\ \tau_2 = a + b,\ \min(\tau_3,\dots,\tau_N) > a + b\right)} \\
&\quad= \frac{P\left(\min(\tau_3,\dots,\tau_N) > t + a + b\right)}{P\left(\min(\tau_3,\dots,\tau_N) > a + b\right)} = e^{-(N-2)\mu t}.
\end{aligned}
\]
We continue on in this way to get the result. This proof is not rigorous, since $P(\tau_j = t) = 0,$ but the spirit is correct.
Rigorous Proof. (Probably should be skipped.) In this proof, let $g$ be a bounded function and $T_k := \min(\tau_l : l \ne k).$ We then have that $T_k$ and $\tau_k$ are independent, $T_k \overset{d}{=} \exp((N-1)\mu),$ and hence
\[
\begin{aligned}
E[1_{W_2 - W_1 > t}\,g(W_1)] &= \sum_k E[1_{W_2 - W_1 > t}\,g(W_1) : \tau_k < T_k] \\
&= \sum_k E[1_{T_k - \tau_k > t}\,g(\tau_k) : \tau_k < T_k] \\
&= \sum_k E[1_{T_k - \tau_k > t}\,g(\tau_k)] \\
&= \sum_k E[\exp(-(N-1)\mu(t + \tau_k))\,g(\tau_k)] \\
&= \exp(-(N-1)\mu t)\sum_k E[\exp(-(N-1)\mu\tau_k)\,g(\tau_k)] \\
&= \exp(-(N-1)\mu t)\sum_k E[1_{T_k - \tau_k > 0}\,g(\tau_k)] \\
&= \exp(-(N-1)\mu t)\sum_k E[1_{T_k - \tau_k > 0}\,g(W_1)] \\
&= \exp(-(N-1)\mu t)\,E[g(W_1)].
\end{aligned}
\]
It follows from this calculation that $W_2 - W_1$ and $W_1$ are independent with $W_2 - W_1 \overset{d}{=} \exp((N-1)\mu).$
The general case may be done similarly. To see how this goes, let us show that $W_3 - W_2 \overset{d}{=} \exp((N-2)\mu)$ and is independent of $W_1$ and $W_2.$ To this end, let $T_{jk} := \min\{\tau_l : l \ne j \text{ and } l \ne k\}$ for $j \ne k,$ in which case $T_{jk} \overset{d}{=} \exp((N-2)\mu)$ and is independent of $(\tau_j, \tau_k).$ We then have
\[
\begin{aligned}
E[1_{W_3 - W_2 > t}\,g(W_1, W_2)] &= \sum_{j\ne k} E[1_{W_3 - W_2 > t}\,g(W_1, W_2) : \tau_j < \tau_k < T_{jk}] \\
&= \sum_{j\ne k} E\left[1_{T_{jk} - \tau_k > t}\,g(\tau_j, \tau_k) : \tau_j < \tau_k < T_{jk}\right] \\
&= \sum_{j\ne k} E\left[1_{T_{jk} - \tau_k > t}\,g(\tau_j, \tau_k) : \tau_j < \tau_k\right] \\
&= \sum_{j\ne k} E[\exp(-(N-2)\mu(t + \tau_k))\,g(\tau_j, \tau_k) : \tau_j < \tau_k] \\
&= \exp(-(N-2)\mu t)\sum_{j\ne k} E[\exp(-(N-2)\mu\tau_k)\,g(\tau_j, \tau_k) : \tau_j < \tau_k] \\
&= \exp(-(N-2)\mu t)\sum_{j\ne k} E\left[1_{T_{jk} - \tau_k > 0}\,g(\tau_j, \tau_k) : \tau_j < \tau_k\right] \\
&= \exp(-(N-2)\mu t)\sum_{j\ne k} E[g(W_1, W_2) : \tau_j < \tau_k < T_{jk}] \\
&= \exp(-(N-2)\mu t)\,E[g(W_1, W_2)].
\end{aligned}
\]
This again shows that $W_3 - W_2$ is independent of $(W_1, W_2)$ and $W_3 - W_2 \overset{d}{=} \exp((N-2)\mu).$ We leave the general argument to the reader.
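Theorem 38.8 is easy to test by simulation: sort $N$ i.i.d. $\exp(\mu)$ samples and average the successive gaps. In this sketch ($N = 5,$ $\mu = 1$ are illustrative choices) each empirical mean gap should be close to $1/((N - j + 1)\mu)$:

```python
import random

rng = random.Random(3)
N, mu, trials = 5, 1.0, 20_000  # illustrative parameters

gap_totals = [0.0] * N
for _ in range(trials):
    w = sorted(rng.expovariate(mu) for _ in range(N))  # order statistics W_1..W_N
    prev = 0.0
    for j, wj in enumerate(w):
        gap_totals[j] += wj - prev  # S_{j+1} = W_{j+1} - W_j, with W_0 = 0
        prev = wj

for j in range(1, N + 1):
    print(j, gap_totals[j - 1] / trials, 1.0 / ((N - j + 1) * mu))
```

A fuller check would also compare higher moments or empirical CDFs, but the means already distinguish the rates $(N-j+1)\mu$ clearly.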
38.4 Birth and Death Processes

We have already discussed the basics of birth and death processes. The existence of the process requires some restrictions on the birth and death parameters, which are discussed on p. 359 of the book. In general we are not able to solve for the transition semi-group, $e^{tQ},$ in this case. We will therefore have to ask more limited questions about more limited models. This is what we will consider in the rest of this section. We will also consider some interesting situations which one might model by a birth and death process.

Recall that the functions $\pi_j(t) = P(X(t) = j)$ satisfy the differential equations
\[
\begin{aligned}
\dot\pi_0(t) &= -\lambda_0\pi_0(t) + \mu_1\pi_1(t) \\
\dot\pi_1(t) &= \lambda_0\pi_0(t) - (\lambda_1 + \mu_1)\pi_1(t) + \mu_2\pi_2(t) \\
\dot\pi_2(t) &= \lambda_1\pi_1(t) - (\lambda_2 + \mu_2)\pi_2(t) + \mu_3\pi_3(t) \\
&\ \,\vdots \\
\dot\pi_n(t) &= \lambda_{n-1}\pi_{n-1}(t) - (\lambda_n + \mu_n)\pi_n(t) + \mu_{n+1}\pi_{n+1}(t) \\
&\ \,\vdots
\end{aligned}
\]
Hence if we are going to look for a stationary distribution, we must set $\dot\pi_j(t) = 0$ for all $j$ and solve the system of algebraic equations:
\[
\begin{aligned}
0 &= -\lambda_0\pi_0 + \mu_1\pi_1 \\
0 &= \lambda_0\pi_0 - (\lambda_1 + \mu_1)\pi_1 + \mu_2\pi_2 \\
0 &= \lambda_1\pi_1 - (\lambda_2 + \mu_2)\pi_2 + \mu_3\pi_3 \\
&\ \,\vdots \\
0 &= \lambda_{n-1}\pi_{n-1} - (\lambda_n + \mu_n)\pi_n + \mu_{n+1}\pi_{n+1} \\
&\ \,\vdots
\end{aligned}
\]
We solve these equations in order to find
\[
\begin{aligned}
\pi_1 &= \frac{\lambda_0}{\mu_1}\pi_0, \\
\pi_2 &= \frac{(\lambda_1 + \mu_1)\pi_1 - \lambda_0\pi_0}{\mu_2} = \frac{\lambda_1}{\mu_2}\pi_1 = \frac{\lambda_0\lambda_1}{\mu_1\mu_2}\pi_0, \\
\pi_3 &= \frac{(\lambda_2 + \mu_2)\pi_2 - \lambda_1\pi_1}{\mu_3} = \frac{\lambda_2}{\mu_3}\pi_2 = \frac{\lambda_0\lambda_1\lambda_2}{\mu_1\mu_2\mu_3}\pi_0, \\
&\ \,\vdots \\
\pi_n &= \frac{\lambda_0\lambda_1\dots\lambda_{n-1}}{\mu_1\mu_2\dots\mu_n}\pi_0.
\end{aligned}
\]
This leads to the following proposition.

Proposition 38.9. Let $\alpha_n := \frac{\lambda_0\lambda_1\lambda_2\dots\lambda_{n-1}}{\mu_1\mu_2\mu_3\dots\mu_n}$ for $n = 1,2,\dots$ and $\alpha_0 := 1.$ Then the birth and death process, $X(t),$ with birth rates $\{\lambda_j\}_{j=0}^\infty$ and death rates $\{\mu_j\}_{j=1}^\infty$ has a stationary distribution, $\pi,$ iff $Z := \sum_{n=0}^\infty \alpha_n < \infty,$ in which case
\[
\pi_n = \frac{\alpha_n}{Z} \text{ for all } n.
\]
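Proposition 38.9 translates directly into a small routine: accumulate the products $\alpha_n = \lambda_0\cdots\lambda_{n-1}/(\mu_1\cdots\mu_n)$ and normalize. As a sanity check, the sketch below uses constant rates $\lambda_n = \lambda,$ $\mu_n = \mu$ with $\rho = \lambda/\mu < 1$ (illustrative values, truncating the infinite state space at a large $K$), for which the stationary distribution is the geometric $\pi_n = (1-\rho)\rho^n$:

```python
def stationary_distribution(lam, mu):
    """Stationary distribution of a (truncated) birth-death chain.

    lam[n] is the birth rate in state n (n = 0..K-1) and mu[n] is the death
    rate in state n (n = 1..K; mu[0] is unused).  Returns pi_n proportional
    to alpha_n = (lam_0...lam_{n-1}) / (mu_1...mu_n), with alpha_0 = 1.
    """
    alphas = [1.0]
    for n in range(1, len(mu)):
        alphas.append(alphas[-1] * lam[n - 1] / mu[n])
    Z = sum(alphas)
    return [a / Z for a in alphas]

# Constant rates with rho = lam/mu = 1/2: pi_n = (1 - rho) * rho**n.
K, rho = 60, 0.5
pi = stationary_distribution([rho] * K, [0.0] + [1.0] * K)
for n in range(4):
    print(n, pi[n], (1 - rho) * rho ** n)
```

Truncating at $K = 60$ changes the normalization by less than $2^{-60},$ so the geometric values are reproduced essentially exactly.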
Lemma 38.10 (Detailed balance). In general, if we can find a distribution, $\pi,$ satisfying the detailed balance equation,
\[
\pi_i Q_{ij} = \pi_j Q_{ji} \text{ for all } i \ne j, \tag{38.8}
\]
then $\pi$ is a stationary distribution, i.e. $\pi Q = 0.$

Proof. First proof. Intuitively, Eq. (38.8) states that sites $i$ and $j$ are always exchanging sand back and forth at equal rates. Hence if all sites are doing this, the sizes of the piles of sand at each site must remain unchanged.

Second proof. Summing Eq. (38.8) on $i,$ making use of the fact that $\sum_i Q_{ji} = 0$ for all $j,$ implies
\[
\sum_i \pi_i Q_{ij} = 0.
\]
We could have used this result on our birth and death processes to find the stationary distribution as well. Indeed, looking at the rate diagram,
\[
0 \underset{\mu_1}{\overset{\lambda_0}{\rightleftarrows}} 1 \underset{\mu_2}{\overset{\lambda_1}{\rightleftarrows}} 2 \underset{\mu_3}{\overset{\lambda_2}{\rightleftarrows}} 3 \quad\dots\quad (n-1) \underset{\mu_n}{\overset{\lambda_{n-1}}{\rightleftarrows}} n \underset{\mu_{n+1}}{\overset{\lambda_n}{\rightleftarrows}} (n+1) \,\dots,
\]
we see that the condition for detailed balance between $n$ and $n+1$ is
\[
\lambda_n\pi_n = \mu_{n+1}\pi_{n+1},
\]
which implies $\pi_{n+1} = \frac{\lambda_n}{\mu_{n+1}}\pi_n.$ Therefore it follows that
\[
\pi_1 = \frac{\lambda_0}{\mu_1}\pi_0, \quad
\pi_2 = \frac{\lambda_1}{\mu_2}\pi_1 = \frac{\lambda_0\lambda_1}{\mu_1\mu_2}\pi_0, \quad\dots,\quad
\pi_n = \frac{\lambda_{n-1}}{\mu_n}\pi_{n-1} = \frac{\lambda_0\lambda_1\dots\lambda_{n-1}}{\mu_1\mu_2\dots\mu_n}\pi_0
\]
as before.
Lemma 38.11. For $|x| < 1$ and $\alpha \ge 1$ we have
\[
(1 - x)^{-\alpha} = \sum_{k=0}^\infty \frac{\alpha(\alpha+1)\dots(\alpha+k-1)}{k!}\,x^k, \tag{38.9}
\]
where $\frac{\alpha(\alpha+1)\dots(\alpha+k-1)}{k!} := 1$ when $k = 0.$

Proof. This is a consequence of Taylor's theorem with integral remainder. The main point is to observe that
\[
\begin{aligned}
\frac{d}{dx}(1-x)^{-\alpha} &= \alpha(1-x)^{-(\alpha+1)}, \\
\left(\frac{d}{dx}\right)^2(1-x)^{-\alpha} &= \alpha(\alpha+1)(1-x)^{-(\alpha+2)}, \\
&\ \,\vdots \\
\left(\frac{d}{dx}\right)^k(1-x)^{-\alpha} &= \alpha(\alpha+1)\dots(\alpha+k-1)(1-x)^{-(\alpha+k)},
\end{aligned}
\]
and hence
\[
\left(\frac{d}{dx}\right)^k(1-x)^{-\alpha}\Big|_{x=0} = \alpha(\alpha+1)\dots(\alpha+k-1). \tag{38.10}
\]
Therefore by Taylor's theorem,
\[
(1-x)^{-\alpha} = \sum_{k=0}^\infty \frac{1}{k!}\left(\frac{d}{dx}\right)^k(1-x)^{-\alpha}\Big|_{x=0}\,x^k,
\]
which combined with Eq. (38.10) gives Eq. (38.9).
Example 38.12 (Exercise 4.5 on p. 377). Suppose that $\lambda_n = \theta < 1$ and $\mu_n = \frac{n}{n+1}.$ In this case,
\[
\alpha_n = \frac{\theta^n}{\frac{1}{2}\cdot\frac{2}{3}\cdots\frac{n}{n+1}} = (n+1)\theta^n
\]
and we must have
\[
\pi_n = \frac{(n+1)\theta^n}{\sum_{n=0}^\infty (n+1)\theta^n}.
\]
We can simplify this answer a bit by noticing that
\[
\sum_{n=0}^\infty (n+1)\theta^n = \frac{d}{d\theta}\sum_{n=0}^\infty \theta^{n+1} = \frac{d}{d\theta}\frac{\theta}{1-\theta} = \frac{(1-\theta) + \theta}{(1-\theta)^2} = \frac{1}{(1-\theta)^2}.
\]
(Alternatively, apply Lemma 38.11 with $\alpha = 2$ and $x = \theta.$) Thus we have
\[
\pi_n = (1-\theta)^2(n+1)\theta^n.
\]
Example 38.13 (Exercise 4.4 on p. 377). Two machines operate with failure rate $\mu$ and there is a repair facility which can repair one machine at a time with rate $\lambda.$ Let $X(t)$ be the number of operational machines at time $t.$ The state space is thus $\{0, 1, 2\}$ with the transition diagram
\[
0 \underset{\mu_1}{\overset{\lambda_0}{\rightleftarrows}} 1 \underset{\mu_2}{\overset{\lambda_1}{\rightleftarrows}} 2,
\]
where $\lambda_0 = \lambda,$ $\lambda_1 = \lambda,$ $\mu_2 = 2\mu$ and $\mu_1 = \mu.$ Thus we find
\[
\pi_1 = \frac{\lambda_0}{\mu_1}\pi_0 = \frac{\lambda}{\mu}\pi_0
\quad\text{and}\quad
\pi_2 = \frac{\lambda_0\lambda_1}{\mu_1\mu_2}\pi_0 = \frac{1}{2}\frac{\lambda^2}{\mu^2}\pi_0,
\]
so that
\[
1 = \pi_0 + \pi_1 + \pi_2 = \left(1 + \frac{\lambda}{\mu} + \frac{1}{2}\frac{\lambda^2}{\mu^2}\right)\pi_0.
\]
So the long run probability that all machines are broken is given by
\[
\pi_0 = \left(1 + \frac{\lambda}{\mu} + \frac{1}{2}\frac{\lambda^2}{\mu^2}\right)^{-1}.
\]
If we now suppose that only one machine can be in operation at a time (perhaps there is only one plug), the new rates become $\lambda_0 = \lambda,$ $\lambda_1 = \lambda,$ $\mu_2 = \mu$ and $\mu_1 = \mu,$ and working as above we have
\[
\pi_1 = \frac{\lambda}{\mu}\pi_0
\quad\text{and}\quad
\pi_2 = \frac{\lambda}{\mu}\pi_1 = \frac{\lambda^2}{\mu^2}\pi_0,
\]
so that
\[
1 = \pi_0 + \pi_1 + \pi_2 = \left(1 + \frac{\lambda}{\mu} + \frac{\lambda^2}{\mu^2}\right)\pi_0.
\]
So the long run probability that all machines are broken is given by
\[
\pi_0 = \left(1 + \frac{\lambda}{\mu} + \frac{\lambda^2}{\mu^2}\right)^{-1}.
\]
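For concrete numbers, the two formulas for $\pi_0$ are one-liners. In this sketch the rates $\lambda = 2,$ $\mu = 1$ are illustrative choices, not from the text:

```python
# Illustrative rates: repair rate lam = 2.0, per-machine failure rate mu = 1.0.
lam, mu = 2.0, 1.0
r = lam / mu

# Both machines may operate (mu_1 = mu, mu_2 = 2*mu):
pi0_both = 1.0 / (1 + r + 0.5 * r ** 2)
# Only one machine may operate at a time (mu_1 = mu_2 = mu):
pi0_single = 1.0 / (1 + r + r ** 2)
print(pi0_both, pi0_single)
```

With $\lambda/\mu = 2$ this gives $\pi_0 = 1/5$ in the first model and $\pi_0 = 1/7$ in the second: forcing the machines to share one "plug" makes the all-broken state less likely here only because fewer machines are exposed to failure at once.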
Example 38.14 (Problem VI.4.7, p. 379). A system consists of 3 machines and 2 repairmen. At most 2 machines can operate at any time. The amount of time that an operating machine works before breaking down is exponentially distributed with mean 5 hours. The amount of time that it takes a single repairman to fix a machine is exponentially distributed with mean 4 hours. Only one repairman can work on a failed machine at any given time. Let $X(t)$ be the number of machines in operating condition at time $t.$

a) Calculate the long run probability distribution of $X(t).$
b) If an operating machine produces 100 units of output per hour, what is the long run output per hour from the factory?

Solution to Exercise (Problem VI.4.7, p. 379). The state space of operating machines is $S = \{0, 1, 2, 3\}$ and the system is modeled by a birth and death process with rate diagram
\[
0 \underset{1/5}{\overset{2/4}{\rightleftarrows}} 1 \underset{2/5}{\overset{2/4}{\rightleftarrows}} 2 \underset{2/5}{\overset{1/4}{\rightleftarrows}} 3.
\]
a) We then have $\alpha_0 = 1,$
\[
\alpha_1 = \frac{1/2}{1/5} = \frac{5}{2}, \quad
\alpha_2 = \frac{1/2}{1/5}\cdot\frac{1/2}{2/5} = \frac{5^2}{2^3}, \quad
\alpha_3 = \frac{1/2}{1/5}\cdot\frac{1/2}{2/5}\cdot\frac{1/4}{2/5} = \frac{5^3}{2^4}\cdot\frac{1}{4} = \frac{5^3}{2^6},
\]
and
\[
Z = \sum_{j=0}^3 \alpha_j = 1 + \frac{5}{2} + \frac{5^2}{2^3} + \frac{5^3}{2^6} = \frac{549}{64}.
\]
Therefore $\pi_i = \alpha_i/Z$ gives
\[
(\pi_0, \pi_1, \pi_2, \pi_3) = \frac{64}{549}\left(1, \frac{5}{2}, \frac{5^2}{2^3}, \frac{5^3}{2^6}\right) = \left(\frac{64}{549}, \frac{160}{549}, \frac{200}{549}, \frac{125}{549}\right) \cong (0.11658,\ 0.29144,\ 0.36430,\ 0.22769).
\]
b) Since an operating machine produces 100 units of output per hour, the long run output per hour is
\[
100\,\pi_1 + 200\,(\pi_2 + \pi_3) = 100 \cdot 0.29144 + 200\,(0.36430 + 0.22769) \cong 147.54 \text{ units/hour}.
\]
Solution to Exercise (Problem VI.4.7, p. 379, but with only one repairman). Here is the same problem with only one repairman. The state space of operating machines is $S = \{0, 1, 2, 3\}$ and the system is modeled by a birth and death process with rate diagram
\[
0 \underset{\mu_1}{\overset{\lambda_0}{\rightleftarrows}} 1 \underset{\mu_2}{\overset{\lambda_1}{\rightleftarrows}} 2 \underset{\mu_3}{\overset{\lambda_2}{\rightleftarrows}} 3,
\]
where $\lambda_0 = \lambda_1 = \lambda_2 = 1/4$ and $\mu_1 = 1/5,$ $\mu_2 = \mu_3 = 2/5,$ so the rate diagram is
\[
0 \underset{1/5}{\overset{1/4}{\rightleftarrows}} 1 \underset{2/5}{\overset{1/4}{\rightleftarrows}} 2 \underset{2/5}{\overset{1/4}{\rightleftarrows}} 3.
\]
a) We then have $\alpha_0 = 1,$
\[
\alpha_1 = \frac{1/4}{1/5} = \frac{5}{4}, \quad
\alpha_2 = \frac{1/4}{1/5}\cdot\frac{1/4}{2/5} = \frac{5^2}{2}\cdot\frac{1}{4^2} = \frac{25}{32}, \quad
\alpha_3 = \frac{1/4}{1/5}\cdot\frac{1/4}{2/5}\cdot\frac{1/4}{2/5} = \frac{5^3}{2^2}\cdot\frac{1}{4^3} = \frac{125}{256},
\]
and
\[
Z = \sum_{j=0}^3 \alpha_j = 1 + \frac{5}{4} + \frac{25}{32} + \frac{125}{256} = \frac{901}{256}.
\]
Therefore $\pi_i = \alpha_i/Z$ gives
\[
(\pi_0, \pi_1, \pi_2, \pi_3) = \frac{256}{901}\left(1, \frac{5}{4}, \frac{25}{32}, \frac{125}{256}\right) = \left(\frac{256}{901}, \frac{320}{901}, \frac{200}{901}, \frac{125}{901}\right) \cong (0.284,\ 0.355,\ 0.222,\ 0.139).
\]
b) If the operating machines can produce 100 units per hour, the long run output per hour is
\[
100\,\pi_1 + 200\,(\pi_2 + \pi_3) = 100 \cdot 0.355 + 200\,(0.222 + 0.139) \cong 108 \text{ units/hour}.
\]
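Both variants of this machine-repair example can be checked with exact rational arithmetic. The sketch below reproduces the normalization constants $549/64$ and $901/256$ and the long run outputs of roughly 147.5 and 108 units/hour computed above:

```python
from fractions import Fraction as F

def bd_pi(lams, mus):
    # lams[n] = birth rate out of state n; mus[n] = death rate out of state n+1.
    alphas = [F(1)]
    for l, m in zip(lams, mus):
        alphas.append(alphas[-1] * l / m)
    Z = sum(alphas)
    return [a / Z for a in alphas], Z

# Two repairmen (rates from the first diagram above).
pi2, Z2 = bd_pi([F(2, 4), F(2, 4), F(1, 4)], [F(1, 5), F(2, 5), F(2, 5)])
out2 = 100 * pi2[1] + 200 * (pi2[2] + pi2[3])
print(Z2, pi2, float(out2))

# One repairman.
pi1, Z1 = bd_pi([F(1, 4), F(1, 4), F(1, 4)], [F(1, 5), F(2, 5), F(2, 5)])
out1 = 100 * pi1[1] + 200 * (pi1[2] + pi1[3])
print(Z1, pi1, float(out1))
```

Using `Fraction` avoids any rounding, so the stationary probabilities come out exactly as $64/549,\dots$ and $256/901,\dots$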
Example 38.15 (Telephone Exchange). Consider a telephone exchange consisting of $K$ outgoing lines. The mean call time is $1/\mu$ and new call requests arrive at the exchange at rate $\lambda.$ If all lines are occupied, the call is lost. Let $X(t)$ be the number of outgoing lines which are in service at time $t$; see Figure 38.3.

Fig. 38.3. Schematic of a telephone exchange.

Fig. 38.4. Rate diagram for the telephone exchange.

We model this as a birth death process with state space $\{0, 1, 2, \dots, K\},$ birth parameters $\lambda_k = \lambda$ for $k = 0, 1, 2, \dots, K-1,$ and death rates $\mu_k = k\mu$ for $k = 1, 2, \dots, K$; see Figure 38.4. In this case,
\[
\alpha_0 = 1, \quad \alpha_1 = \frac{\lambda}{\mu}, \quad \alpha_2 = \frac{\lambda^2}{2\mu^2}, \quad \alpha_3 = \frac{\lambda^3}{3!\,\mu^3}, \quad\dots,\quad \alpha_K = \frac{\lambda^K}{K!\,\mu^K},
\]
so that
\[
Z := \sum_{k=0}^K \frac{1}{k!}\left(\frac{\lambda}{\mu}\right)^k \cong e^{\lambda/\mu} \text{ for large } K,
\]
and hence
\[
\pi_k = Z^{-1}\,\frac{1}{k!}\left(\frac{\lambda}{\mu}\right)^k \cong \frac{1}{k!}\left(\frac{\lambda}{\mu}\right)^k e^{-\lambda/\mu}.
\]
For example, suppose $\lambda = 100$ calls/hour and the average duration of a connected call is $1/4$ of an hour, i.e. $\mu = 4.$ Then $\lambda/\mu = 25$ and
\[
\pi_{25} = \frac{\frac{1}{25!}(25)^{25}}{\sum_{k=0}^{25}\frac{1}{k!}(25)^k} \cong 0.144,
\]
so the exchange is busy 14.4% of the time. On the other hand, if there are 30 or even 35 lines, then we have
\[
\pi_{30} = \frac{\frac{1}{30!}(25)^{30}}{\sum_{k=0}^{30}\frac{1}{k!}(25)^k} \cong 0.053
\quad\text{and}\quad
\pi_{35} = \frac{\frac{1}{35!}(25)^{35}}{\sum_{k=0}^{35}\frac{1}{k!}(25)^k} \cong 0.012,
\]
and hence the exchange is busy 5.3% and 1.2% of the time, respectively.
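The busy fraction $\pi_K$ above is the classical Erlang B (blocking) formula with offered load $a = \lambda/\mu.$ A direct evaluation reproduces the three numbers just computed:

```python
import math

def busy_fraction(lam, mu, K):
    # pi_K for the K-line exchange: long-run fraction of time all lines are
    # busy (the Erlang B formula with offered load a = lam/mu).
    a = lam / mu
    weights = [a ** k / math.factorial(k) for k in range(K + 1)]
    return weights[K] / sum(weights)

lam, mu = 100.0, 4.0  # 100 calls/hour, mean call duration 1/4 hour
for K in (25, 30, 35):
    print(K, busy_fraction(lam, mu, K))
```

For large loads one would compute this with the standard Erlang B recursion instead of raw factorials to avoid overflow, but at $a = 25$ the direct sum is fine in double precision.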
38.4.1 Linear birth and death process with immigration

Suppose now that $\lambda_n = n\lambda + a$ and $\mu_n = n\mu$ for some $\lambda, \mu > 0,$ where $\lambda$ and $\mu$ represent the birth and death rates of each individual in the population and $a$ represents the rate of immigration into the population. In this case,
\[
\alpha_n = \frac{a(a+\lambda)(a+2\lambda)\dots(a+(n-1)\lambda)}{n!\,\mu^n}
= \left(\frac{\lambda}{\mu}\right)^n \frac{\frac{a}{\lambda}\left(\frac{a}{\lambda}+1\right)\left(\frac{a}{\lambda}+2\right)\dots\left(\frac{a}{\lambda}+(n-1)\right)}{n!}.
\]
Using Lemma 38.11 with $\alpha = a/\lambda$ and $x = \lambda/\mu,$ which we need to assume is less than 1, we find
\[
Z := \sum_{n=0}^\infty \alpha_n = \left(1 - \frac{\lambda}{\mu}\right)^{-a/\lambda}
\]
and therefore
\[
\pi_n = \left(1 - \frac{\lambda}{\mu}\right)^{a/\lambda}\frac{\frac{a}{\lambda}\left(\frac{a}{\lambda}+1\right)\left(\frac{a}{\lambda}+2\right)\dots\left(\frac{a}{\lambda}+(n-1)\right)}{n!}\left(\frac{\lambda}{\mu}\right)^n.
\]
In this case there is an invariant distribution iff $\lambda < \mu$ and $a > 0.$ Notice that if $a = 0,$ then 0 is an absorbing state, so when $\lambda < \mu$ the process actually dies out.

(STOP: Bruce, think about this; perhaps we need to start at $\pi_1$ instead of $\pi_0.$ Or taking $\pi_0 = 1$ would give an invariant distribution when $a = 0.$ The class $\{1, 2, 3, \dots\}$ is now not closed and hence transient! When $a = 0$ we should compute the hitting probability of site 0!)
Now that we have found the stationary distribution in this case, let us try to compute the expected population of this model at time $t.$

Theorem 38.16. Let
\[
M(t) := E[X(t)] = \sum_{n=1}^\infty n P(X(t) = n) = \sum_{n=1}^\infty n\,\pi_n(t)
\]
be the expected population size for our linear birth and death process with immigration. Then
\[
M(t) = \frac{a}{\lambda - \mu}\left(e^{t(\lambda - \mu)} - 1\right) + M(0)\,e^{t(\lambda - \mu)},
\]
which when $\lambda = \mu$ should be interpreted as
\[
M(t) = at + M(0).
\]
Proof. In this proof we take for granted the fact that it is permissible to interchange the time derivative with the infinite sum. Assuming this fact we find
\[
\begin{aligned}
\dot M(t) &= \sum_{n=1}^\infty n\,\dot\pi_n(t) \\
&= \sum_{n=1}^\infty n\left[(a + \lambda(n-1))\pi_{n-1}(t) - (a + \lambda n + \mu n)\pi_n(t) + \mu(n+1)\pi_{n+1}(t)\right] \\
&= \sum_{n=0}^\infty (n+1)(a + \lambda n)\pi_n(t) - \sum_{n=1}^\infty n(a + \lambda n + \mu n)\pi_n(t) + \sum_{n=2}^\infty (n-1)\mu n\,\pi_n(t) \\
&= a\pi_0(t) + \left[2(a + \lambda) - (a + \lambda + \mu)\right]\pi_1(t) \\
&\qquad + \sum_{n=2}^\infty \left[(n+1)(a + \lambda n) + \mu(n-1)n - n(a + \lambda n + \mu n)\right]\pi_n(t) \\
&= a\pi_0(t) + \left[a + \lambda - \mu\right]\pi_1(t) + \sum_{n=2}^\infty \left[(a + \lambda n) - \mu n\right]\pi_n(t) \\
&= a\pi_0(t) + \sum_{n=1}^\infty \left[a + \lambda n - \mu n\right]\pi_n(t) \\
&= \sum_{n=0}^\infty \left[a + \lambda n - \mu n\right]\pi_n(t) = a + (\lambda - \mu)M(t).
\end{aligned}
\]
Thus we have shown that
\[
\dot M(t) = a + (\lambda - \mu)M(t) \text{ with } M(0) = \sum_{n=1}^\infty n\,\pi_n(0),
\]
where $M(0)$ is the mean size of the initial population. Solving this simple differential equation gives the result.
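Theorem 38.16 can be tested by Monte Carlo: simulate the jump chain directly (in state $n$ the next event occurs at total rate $(a + \lambda n) + \mu n$ and is a birth with probability proportional to $a + \lambda n$) and average $X(t)$ over many paths. All the parameter values in this sketch are illustrative choices:

```python
import math
import random

rng = random.Random(4)

def simulate_population(lam, mu, a, x0, t, rng):
    # In state x, the birth rate is a + lam*x and the death rate is mu*x;
    # return X(t) for one simulated path started at X(0) = x0.
    x, clock = x0, 0.0
    while True:
        birth, death = a + lam * x, mu * x
        clock += rng.expovariate(birth + death)
        if clock > t:
            return x
        x += 1 if rng.random() * (birth + death) < birth else -1

lam, mu, a, x0, t = 0.5, 1.0, 0.3, 5, 2.0  # illustrative parameters
trials = 5000
mc_mean = sum(simulate_population(lam, mu, a, x0, t, rng) for _ in range(trials)) / trials
exact = (a / (lam - mu)) * (math.exp(t * (lam - mu)) - 1) + x0 * math.exp(t * (lam - mu))
print(mc_mean, exact)
```

Since $a > 0,$ the total rate never vanishes, so the loop always terminates; the empirical mean should agree with the closed-form $M(t)$ up to Monte Carlo error.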
Part VIII
Appendices

39 Basic Metric Space Facts
Definition 39.1. A function $d : X \times X \to [0, \infty)$ is called a metric if

1. (Symmetry) $d(x, y) = d(y, x)$ for all $x, y \in X.$
2. (Non-degeneracy) $d(x, y) = 0$ if and only if $x = y.$
3. (Triangle inequality) $d(x, z) \le d(x, y) + d(y, z)$ for all $x, y, z \in X.$

As primary examples, any normed space $(X, \|\cdot\|)$ (see Definition ??) is a metric space with $d(x, y) := \|x - y\|.$ Thus the space $\ell^p(\mu)$ (as in Theorem ??) is a metric space for all $p \in [1, \infty].$ Also any subset of a metric space is a metric space. For example a surface $\Sigma$ in $\mathbb{R}^3$ is a metric space with the distance between two points on $\Sigma$ being the usual distance in $\mathbb{R}^3.$
Definition 39.2. Let $(X, d)$ be a metric space. The open ball $B(x, \delta) \subset X$ centered at $x \in X$ with radius $\delta > 0$ is the set
\[
B(x, \delta) := \{y \in X : d(x, y) < \delta\}.
\]
We will often also write $B(x, \delta)$ as $B_x(\delta).$ We also define the closed ball centered at $x \in X$ with radius $\delta > 0$ as the set $C_x(\delta) := \{y \in X : d(x, y) \le \delta\}.$

Definition 39.3. A sequence $\{x_n\}_{n=1}^\infty$ in a metric space $(X, d)$ is said to be convergent if there exists a point $x \in X$ such that $\lim_{n\to\infty} d(x, x_n) = 0.$ In this case we write $\lim_{n\to\infty} x_n = x$ or $x_n \to x$ as $n \to \infty.$

Exercise 39.1. Show that $x$ in Definition 39.3 is necessarily unique.
Definition 39.4. A set $E \subset X$ is bounded if $E \subset B(x, R)$ for some $x \in X$ and $R < \infty.$ A set $F \subset X$ is closed iff every convergent sequence $\{x_n\}_{n=1}^\infty$ which is contained in $F$ has its limit back in $F.$ A set $V \subset X$ is open iff $V^c$ is closed. We will write $F \sqsubset X$ to indicate $F$ is a closed subset of $X$ and $V \subset_o X$ to indicate that $V$ is an open subset of $X.$ We also let $\tau_d$ denote the collection of open subsets of $X$ relative to the metric $d.$

Definition 39.5. A subset $A \subset X$ is a neighborhood of $x$ if there exists an open set $V \subset_o X$ such that $x \in V \subset A.$ We will say that $A \subset X$ is an open neighborhood of $x$ if $A$ is open and $x \in A.$

Exercise 39.2. Let $\mathcal{F}$ be a collection of closed subsets of $X$; show $\cap\mathcal{F} := \cap_{F\in\mathcal{F}} F$ is closed. Also show that finite unions of closed sets are closed, i.e. if $\{F_k\}_{k=1}^n$ are closed sets then $\cup_{k=1}^n F_k$ is closed. (By taking complements, this shows that the collection of open sets, $\tau_d,$ is closed under finite intersections and arbitrary unions.)
The following "continuity" facts about the metric $d$ will be used frequently in the remainder of this book.

Lemma 39.6. For any non-empty subset $A \subset X,$ let $d_A(x) := \inf\{d(x, a) \mid a \in A\}.$ Then
\[
|d_A(x) - d_A(y)| \le d(x, y) \quad \forall\, x, y \in X, \tag{39.1}
\]
and in particular if $x_n \to x$ in $X$ then $d_A(x_n) \to d_A(x)$ as $n \to \infty.$ Moreover the set $F_\varepsilon := \{x \in X \mid d_A(x) \ge \varepsilon\}$ is closed in $X.$

Proof. Let $a \in A$ and $x, y \in X$; then
\[
d_A(x) \le d(x, a) \le d(x, y) + d(y, a).
\]
Taking the infimum over $a \in A$ in the above inequality shows that
\[
d_A(x) \le d(x, y) + d_A(y) \quad \forall\, x, y \in X.
\]
Therefore $d_A(x) - d_A(y) \le d(x, y),$ and by interchanging $x$ and $y$ we also have that $d_A(y) - d_A(x) \le d(x, y),$ which implies Eq. (39.1). If $x_n \to x \in X,$ then by Eq. (39.1),
\[
|d_A(x) - d_A(x_n)| \le d(x, x_n) \to 0 \text{ as } n \to \infty,
\]
so that $\lim_{n\to\infty} d_A(x_n) = d_A(x).$ Now suppose that $\{x_n\}_{n=1}^\infty \subset F_\varepsilon$ and $x_n \to x$ in $X$; then
\[
d_A(x) = \lim_{n\to\infty} d_A(x_n) \ge \varepsilon
\]
since $d_A(x_n) \ge \varepsilon$ for all $n.$ This shows that $x \in F_\varepsilon$ and hence $F_\varepsilon$ is closed.
Corollary 39.7. The function $d$ satisfies
\[
|d(x, y) - d(x', y')| \le d(y, y') + d(x, x').
\]
In particular $d : X \times X \to [0, \infty)$ is "continuous" in the sense that $d(x, y)$ is close to $d(x', y')$ if $x$ is close to $x'$ and $y$ is close to $y'.$ (The notion of continuity will be developed shortly.)

Proof. By Lemma 39.6 applied to single point sets and by the triangle inequality for the absolute value of real numbers,
\[
|d(x, y) - d(x', y')| \le |d(x, y) - d(x, y')| + |d(x, y') - d(x', y')| \le d(y, y') + d(x, x').
\]
Example 39.8. Let $x \in X$ and $\delta > 0$; then $C_x(\delta)$ and $B_x(\delta)^c$ are closed subsets of $X.$ For example if $\{y_n\}_{n=1}^\infty \subset C_x(\delta)$ and $y_n \to y \in X,$ then $d(y_n, x) \le \delta$ for all $n,$ and using Corollary 39.7 it follows that $d(y, x) \le \delta,$ i.e. $y \in C_x(\delta).$ A similar argument shows $B_x(\delta)^c$ is closed, i.e. $B_x(\delta)$ is open; see Exercise 39.3.

Exercise 39.3. Show that $V \subset X$ is open iff for every $x \in V$ there is a $\delta > 0$ such that $B_x(\delta) \subset V.$ In particular show $B_x(\delta)$ is open for all $x \in X$ and $\delta > 0.$ Hint: by definition $V$ is not open iff $V^c$ is not closed.
Lemma 39.9 (Approximating open sets from the inside by closed sets). Let $A$ be a closed subset of $X$ and $F_\varepsilon := \{x \in X \mid d_A(x) \ge \varepsilon\} \sqsubset X$ be as in Lemma 39.6. Then $F_\varepsilon \uparrow A^c$ as $\varepsilon \downarrow 0.$

Proof. It is clear that $d_A(x) = 0$ for $x \in A,$ so that $F_\varepsilon \subset A^c$ for each $\varepsilon > 0$ and hence $\cup_{\varepsilon>0} F_\varepsilon \subset A^c.$ Now suppose that $x \in A^c \subset_o X.$ By Exercise 39.3 there exists an $\varepsilon > 0$ such that $B_x(\varepsilon) \subset A^c,$ i.e. $d(x, y) \ge \varepsilon$ for all $y \in A.$ Hence $x \in F_\varepsilon$ and we have shown that $A^c \subset \cup_{\varepsilon>0} F_\varepsilon.$ Finally it is clear that $F_\varepsilon \subset F_{\varepsilon'}$ whenever $\varepsilon' \le \varepsilon.$
Definition 39.10. Given a set $A$ contained in a metric space $X,$ let $\bar A \subset X$ be the closure of $A$ defined by
\[
\bar A := \left\{x \in X : \exists\,\{x_n\} \subset A \text{ such that } x = \lim_{n\to\infty} x_n\right\}.
\]
That is to say $\bar A$ contains all limit points of $A.$ We say $A$ is dense in $X$ if $\bar A = X,$ i.e. every element $x \in X$ is a limit of a sequence of elements from $A.$

Exercise 39.4. Given $A \subset X,$ show $\bar A$ is a closed set and in fact
\[
\bar A = \cap\{F : A \subset F \subset X \text{ with } F \text{ closed}\}. \tag{39.2}
\]
That is to say $\bar A$ is the smallest closed set containing $A.$

Definition 39.11. A subset $D$ of $X$ is dense if $\bar D = X,$ and we say that $X$ is separable if it contains a countable dense subset.
39.1 Continuity

Suppose that $(X, \rho)$ and $(Y, d)$ are two metric spaces and $f : X \to Y$ is a function.

Definition 39.12. A function $f : X \to Y$ is continuous at $x \in X$ if for all $\varepsilon > 0$ there is a $\delta > 0$ such that
\[
d(f(x), f(x')) < \varepsilon \text{ provided that } \rho(x, x') < \delta. \tag{39.3}
\]
The function $f$ is said to be continuous if $f$ is continuous at all points $x \in X.$

The following lemma gives two other characterizations of continuity of a function at a point.

Lemma 39.13 (Local Continuity Lemma). Suppose that $(X, \rho)$ and $(Y, d)$ are two metric spaces and $f : X \to Y$ is a function defined in a neighborhood of a point $x \in X.$ Then the following are equivalent:

1. $f$ is continuous at $x \in X.$
2. For all neighborhoods $A \subset Y$ of $f(x),$ $f^{-1}(A)$ is a neighborhood of $x \in X.$
3. For all sequences $\{x_n\}_{n=1}^\infty \subset X$ such that $x = \lim_{n\to\infty} x_n,$ $\{f(x_n)\}$ is convergent in $Y$ and
\[
\lim_{n\to\infty} f(x_n) = f\left(\lim_{n\to\infty} x_n\right).
\]
Proof. 1. $\Rightarrow$ 2. If $A \subset Y$ is a neighborhood of $f(x),$ there exists $\varepsilon > 0$ such that $B_{f(x)}(\varepsilon) \subset A,$ and because $f$ is continuous there exists a $\delta > 0$ such that Eq. (39.3) holds. Therefore
\[
B_x(\delta) \subset f^{-1}\left(B_{f(x)}(\varepsilon)\right) \subset f^{-1}(A),
\]
showing $f^{-1}(A)$ is a neighborhood of $x.$

2. $\Rightarrow$ 3. Suppose that $\{x_n\}_{n=1}^\infty \subset X$ and $x = \lim_{n\to\infty} x_n.$ Then for any $\varepsilon > 0,$ $B_{f(x)}(\varepsilon)$ is a neighborhood of $f(x)$ and so $f^{-1}\left(B_{f(x)}(\varepsilon)\right)$ is a neighborhood of $x$ which must contain $B_x(\delta)$ for some $\delta > 0.$ Because $x_n \to x,$ it follows that $x_n \in B_x(\delta) \subset f^{-1}\left(B_{f(x)}(\varepsilon)\right)$ for a.a. $n,$ and this implies $f(x_n) \in B_{f(x)}(\varepsilon)$ for a.a. $n,$ i.e. $d(f(x), f(x_n)) < \varepsilon$ for a.a. $n.$ Since $\varepsilon > 0$ is arbitrary, it follows that $\lim_{n\to\infty} f(x_n) = f(x).$

3. $\Rightarrow$ 1. We will show not 1. $\Rightarrow$ not 3. If $f$ is not continuous at $x,$ there exists an $\varepsilon > 0$ such that for all $n \in \mathbb{N}$ there exists a point $x_n \in X$ with $\rho(x_n, x) < \frac{1}{n}$ yet $d(f(x_n), f(x)) \ge \varepsilon.$ Hence $x_n \to x$ as $n \to \infty$ yet $f(x_n)$ does not converge to $f(x).$

Here is a global version of the previous lemma.
Lemma 39.14 (Global Continuity Lemma). Suppose that $(X, \rho)$ and $(Y, d)$ are two metric spaces and $f : X \to Y$ is a function defined on all of $X.$ Then the following are equivalent:

1. $f$ is continuous.
2. $f^{-1}(V) \in \tau_\rho$ for all $V \in \tau_d,$ i.e. $f^{-1}(V)$ is open in $X$ if $V$ is open in $Y.$
3. $f^{-1}(C)$ is closed in $X$ if $C$ is closed in $Y.$
4. For all convergent sequences $\{x_n\} \subset X,$ $\{f(x_n)\}$ is convergent in $Y$ and
\[
\lim_{n\to\infty} f(x_n) = f\left(\lim_{n\to\infty} x_n\right).
\]
Proof. Since $f^{-1}(A^c) = \left(f^{-1}(A)\right)^c,$ it is easily seen that 2. and 3. are equivalent. So because of Lemma 39.13 it only remains to show 1. and 2. are equivalent. If $f$ is continuous and $V \subset Y$ is open, then for every $x \in f^{-1}(V),$ $V$ is a neighborhood of $f(x)$ and so $f^{-1}(V)$ is a neighborhood of $x.$ Hence $f^{-1}(V)$ is a neighborhood of all of its points, and from this and Exercise 39.3 it follows that $f^{-1}(V)$ is open. Conversely, if $x \in X$ and $A \subset Y$ is a neighborhood of $f(x),$ then there exists $V \subset_o Y$ such that $f(x) \in V \subset A.$ Hence $x \in f^{-1}(V) \subset f^{-1}(A),$ and by assumption $f^{-1}(V)$ is open, showing $f^{-1}(A)$ is a neighborhood of $x.$ Therefore $f$ is continuous at $x,$ and since $x \in X$ was arbitrary, $f$ is continuous.

Example 39.15. The function $d_A$ defined in Lemma 39.6 is continuous for each $A \subset X.$ In particular, if $A = \{x\},$ it follows that $y \in X \mapsto d(y, x)$ is continuous for each $x \in X.$

Exercise 39.5. Use Example 39.15 and Lemma 39.14 to recover the results of Example 39.8.
The next result shows that there are lots of continuous functions on a metric space $(X, d).$

Lemma 39.16 (Urysohn's Lemma for Metric Spaces). Let $(X, d)$ be a metric space and suppose that $A$ and $B$ are two disjoint closed subsets of $X.$ Then
\[
f(x) = \frac{d_B(x)}{d_A(x) + d_B(x)} \text{ for } x \in X \tag{39.4}
\]
defines a continuous function, $f : X \to [0, 1],$ such that $f(x) = 1$ for $x \in A$ and $f(x) = 0$ if $x \in B.$

Proof. By Lemma 39.6, $d_A$ and $d_B$ are continuous functions on $X.$ Since $A$ and $B$ are closed, $d_A(x) > 0$ if $x \notin A$ and $d_B(x) > 0$ if $x \notin B.$ Since $A \cap B = \emptyset,$ $d_A(x) + d_B(x) > 0$ for all $x,$ and $(d_A + d_B)^{-1}$ is continuous as well. The remaining assertions about $f$ are all easy to verify.

Sometimes Urysohn's lemma will be used in the following form. Suppose $F \subset V \subset X$ with $F$ being closed and $V$ being open; then there exists $f \in C(X, [0, 1])$ such that $f = 1$ on $F$ while $f = 0$ on $V^c.$ This of course follows from Lemma 39.16 by taking $A = F$ and $B = V^c.$
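The Urysohn function of Eq. (39.4) is completely concrete. As a toy illustration (the sets here are arbitrary finite subsets of $\mathbb{R},$ which are closed), the sketch below implements $d_S(x) = \inf_{s\in S} d(x, s)$ and the quotient $d_B/(d_A + d_B)$:

```python
def dist(x, S):
    # d_S(x): distance from the point x to the finite set S of reals
    # (finite sets are closed, so they serve as a simple stand-in).
    return min(abs(x - s) for s in S)

def urysohn(x, A, B):
    # f(x) = d_B(x) / (d_A(x) + d_B(x)), as in Eq. (39.4).
    return dist(x, B) / (dist(x, A) + dist(x, B))

A, B = [0.0], [2.0, 3.0]   # two disjoint closed subsets of the real line
print(urysohn(0.0, A, B))  # 1.0 on A
print(urysohn(2.0, A, B))  # 0.0 on B
print(urysohn(1.0, A, B))  # strictly between 0 and 1 away from both sets
```

Disjointness of $A$ and $B$ guarantees the denominator never vanishes, exactly as in the proof of Lemma 39.16.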
39.2 Completeness in Metric Spaces

Definition 39.17 (Cauchy sequences). A sequence $\{x_n\}_{n=1}^\infty$ in a metric space $(X, d)$ is Cauchy provided that
\[
\lim_{m,n\to\infty} d(x_n, x_m) = 0.
\]
Exercise 39.6. Show that convergent sequences are always Cauchy sequences. The converse is not always true. For example, let $X = \mathbb{Q}$ be the set of rational numbers and $d(x, y) = |x - y|.$ Choose a sequence $\{x_n\}_{n=1}^\infty \subset \mathbb{Q}$ which converges to $\sqrt{2} \in \mathbb{R}$; then $\{x_n\}_{n=1}^\infty$ is $(\mathbb{Q}, d)$-Cauchy but not $(\mathbb{Q}, d)$-convergent. The sequence does converge in $\mathbb{R},$ however.

Definition 39.18. A metric space $(X, d)$ is complete if all Cauchy sequences are convergent sequences.

Exercise 39.7. Let $(X, d)$ be a complete metric space. Let $A \subset X$ be a subset of $X$ viewed as a metric space using $d|_{A\times A}.$ Show that $(A, d|_{A\times A})$ is complete iff $A$ is a closed subset of $X.$

Example 39.19. Examples 2.–4. of complete metric spaces will be verified in Chapter ?? below.

1. $X = \mathbb{R}$ and $d(x, y) = |x - y|$; see Theorem ?? above.
2. $X = \mathbb{R}^n$ and $d(x, y) = \|x - y\|_2 = \left(\sum_{i=1}^n (x_i - y_i)^2\right)^{1/2}.$
3. $X = \ell^p(\mu)$ for $p \in [1, \infty]$ and any weight function $\mu : X \to (0, \infty).$
4. $X = C([0, 1], \mathbb{R}),$ the space of continuous functions from $[0, 1]$ to $\mathbb{R},$ with
\[
d(f, g) := \max_{t\in[0,1]} |f(t) - g(t)|.
\]
This is a special case of Lemma ?? below.
5. Let $X = C([0, 1], \mathbb{R})$ and
\[
d(f, g) := \int_0^1 |f(t) - g(t)|\,dt.
\]
You are asked in Exercise ?? to verify that $(X, d)$ is a metric space which is not complete.
Page: 529 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
530 39 Basic Metric Space Facts
Exercise 39.8 (Completions of Metric Spaces). Suppose that $(X,d)$ is a (not necessarily complete) metric space. Using the following outline, show there exists a complete metric space $(\bar X, \bar d)$ and an isometric map $i : X \to \bar X$ such that $i(X)$ is dense in $\bar X$, see Definition 39.10.
1. Let $\mathcal{C}$ denote the collection of Cauchy sequences $a = \{a_n\}_{n=1}^\infty \subset X$. Given two elements $a, b \in \mathcal{C}$, show $d_c(a,b) := \lim_{n\to\infty} d(a_n, b_n)$ exists, $d_c(a,b) \ge 0$ for all $a, b \in \mathcal{C}$, and $d_c$ satisfies the triangle inequality,
$$d_c(a,c) \le d_c(a,b) + d_c(b,c) \quad \text{for all } a, b, c \in \mathcal{C}.$$
Thus $(\mathcal{C}, d_c)$ would be a metric space if it were true that $d_c(a,b) = 0$ iff $a = b$. This however is false; for example if $a_n = b_n$ for all $n \ge 100$, then $d_c(a,b) = 0$ while $a$ need not equal $b$.
2. Define two elements $a, b \in \mathcal{C}$ to be equivalent (write $a \sim b$) whenever $d_c(a,b) = 0$. Show "$\sim$" is an equivalence relation on $\mathcal{C}$ and that $d_c(a', b') = d_c(a,b)$ if $a \sim a'$ and $b \sim b'$. (Hint: see Corollary 39.7.)
3. Given $a \in \mathcal{C}$, let $\bar a := \{b \in \mathcal{C} : b \sim a\}$ denote the equivalence class containing $a$ and let $\bar X := \{\bar a : a \in \mathcal{C}\}$ denote the collection of such equivalence classes. Show that $\bar d(\bar a, \bar b) := d_c(a,b)$ is well defined on $\bar X \times \bar X$ and verify $(\bar X, \bar d)$ is a metric space.
4. For $x \in X$ let $i(x) = \bar a$ where $a$ is the constant sequence, $a_n = x$ for all $n$. Verify that $i : X \to \bar X$ is an isometric map and that $i(X)$ is dense in $\bar X$.
5. Verify $(\bar X, \bar d)$ is complete. Hint: if $\{a(m)\}_{m=1}^\infty$ is a Cauchy sequence in $\bar X$, choose $b_m \in X$ such that $\bar d(i(b_m), a(m)) \le 1/m$. Then show $a(m) \to \bar b$ where $b = \{b_m\}_{m=1}^\infty$.
39.3 Compactness
Definition 39.20. The subset $A$ of a topological space $(X,\tau)$ is said to be compact if every open cover (Definition ??) of $A$ has a finite sub-cover, i.e. if $\mathcal{U}$ is an open cover of $A$ there exists $\mathcal{U}_0 \subset\subset \mathcal{U}$ such that $\mathcal{U}_0$ is a cover of $A$. (We will write $A \sqsubset X$ to denote that $A \subset X$ and $A$ is compact.) A subset $A \subset X$ is precompact if $\bar A$ is compact.
Proposition 39.21. Suppose that $K \subset X$ is a compact set and $F \subset K$ is a closed subset. Then $F$ is compact. If $\{K_i\}_{i=1}^n$ is a finite collection of compact subsets of $X$, then $K = \cup_{i=1}^n K_i$ is also a compact subset of $X$.
Proof. Let $\mathcal{U}$ be an open cover of $F$; then $\mathcal{U} \cup \{F^c\}$ is an open cover of $K$. The cover $\mathcal{U} \cup \{F^c\}$ of $K$ has a finite subcover which we denote by $\mathcal{U}_0 \cup \{F^c\}$ where $\mathcal{U}_0 \subset\subset \mathcal{U}$. Since $F \cap F^c = \emptyset$, it follows that $\mathcal{U}_0$ is the desired subcover of $F$. For the second assertion suppose $\mathcal{U}$ is an open cover of $K$. Then $\mathcal{U}$ covers each compact set $K_i$ and therefore there exists a finite subset $\mathcal{U}_i \subset\subset \mathcal{U}$ for each $i$ such that $K_i \subset \cup\,\mathcal{U}_i$. Then $\mathcal{U}_0 := \cup_{i=1}^n \mathcal{U}_i$ is a finite cover of $K$.
Exercise 39.9 (Suggested by Michael Gurvich). Show by example that the intersection of two compact sets need not be compact. (This pathology disappears if one assumes the topology is Hausdorff, see Definition ?? below.)
Exercise 39.10. Suppose $f : X \to Y$ is continuous and $K \sqsubset X$ is compact; then $f(K)$ is a compact subset of $Y$. Give an example of a continuous map, $f : X \to Y$, and a compact subset $K$ of $Y$ such that $f^{-1}(K)$ is not compact.
Exercise 39.11 (Dini's Theorem). Let $X$ be a compact topological space and $f_n : X \to [0,\infty)$ be a sequence of continuous functions such that $f_n(x) \downarrow 0$ as $n \to \infty$ for each $x \in X$. Show that in fact $f_n \downarrow 0$ uniformly in $x$, i.e. $\sup_{x\in X} f_n(x) \downarrow 0$ as $n \to \infty$. Hint: Given $\varepsilon > 0$, consider the open sets $V_n := \{x \in X : f_n(x) < \varepsilon\}$.
Definition 39.22. A collection $\mathcal{F}$ of closed subsets of a topological space $(X,\tau)$ has the finite intersection property if $\cap\mathcal{F}_0 \neq \emptyset$ for all $\mathcal{F}_0 \subset\subset \mathcal{F}$.
The notion of compactness may be expressed in terms of closed sets as
follows.
Proposition 39.23. A topological space $X$ is compact iff every family of closed sets $\mathcal{F} \subset 2^X$ having the finite intersection property satisfies $\cap\mathcal{F} \neq \emptyset$.
Proof. ($\Rightarrow$) Suppose that $X$ is compact and $\mathcal{F} \subset 2^X$ is a collection of closed sets such that $\cap\mathcal{F} = \emptyset$. Let
$$\mathcal{U} = \mathcal{F}^c := \{C^c : C \in \mathcal{F}\};$$
then $\mathcal{U}$ is a cover of $X$ and hence has a finite subcover, $\mathcal{U}_0$. Let $\mathcal{F}_0 = \mathcal{U}_0^c \subset\subset \mathcal{F}$; then $\cap\mathcal{F}_0 = \emptyset$, so that $\mathcal{F}$ does not have the finite intersection property. ($\Leftarrow$) If $X$ is not compact, there exists an open cover $\mathcal{U}$ of $X$ with no finite subcover. Let
$$\mathcal{F} = \mathcal{U}^c := \{U^c : U \in \mathcal{U}\};$$
then $\mathcal{F}$ is a collection of closed sets with the finite intersection property while $\cap\mathcal{F} = \emptyset$.
Exercise 39.12. Let $(X,\tau)$ be a topological space. Show that $A \subset X$ is compact iff $(A, \tau_A)$ is a compact topological space.
Metric Space Compactness Criteria
Let $(X,d)$ be a metric space, and for $x \in X$ and $\varepsilon > 0$ let
$$B'_x(\varepsilon) := B_x(\varepsilon) \setminus \{x\}$$
be the ball centered at $x$ of radius $\varepsilon > 0$ with $x$ deleted. Recall from Definition ?? that a point $x \in X$ is an accumulation point of a subset $E \subset X$ if $\emptyset \neq E \cap (V \setminus \{x\})$ for all open neighborhoods, $V$, of $x$. The proof of the following elementary lemma is left to the reader.
Lemma 39.24. Let $E \subset X$ be a subset of a metric space $(X,d)$. Then the following are equivalent:
1. $x \in X$ is an accumulation point of $E$.
2. $B'_x(\varepsilon) \cap E \neq \emptyset$ for all $\varepsilon > 0$.
3. $B_x(\varepsilon) \cap E$ is an infinite set for all $\varepsilon > 0$.
4. There exists $\{x_n\}_{n=1}^\infty \subset E \setminus \{x\}$ with $\lim_{n\to\infty} x_n = x$.
Definition 39.25. A metric space $(X,d)$ is $\varepsilon$-bounded ($\varepsilon > 0$) if there exists a finite cover of $X$ by balls of radius $\varepsilon$, and it is totally bounded if it is $\varepsilon$-bounded for all $\varepsilon > 0$.
Theorem 39.26. Let $(X,d)$ be a metric space. The following are equivalent.
(a) $X$ is compact.
(b) Every infinite subset of $X$ has an accumulation point.
(c) Every sequence $\{x_n\}_{n=1}^\infty \subset X$ has a convergent subsequence.
(d) $X$ is totally bounded and complete.
Proof. The proof will consist of showing that $a \Rightarrow b \Rightarrow c \Rightarrow d \Rightarrow a$.
($a \Rightarrow b$) We will show that not $b$ implies not $a$. Suppose there exists an infinite subset $E \subset X$ which has no accumulation points. Then for all $x \in X$ there exists $\delta_x > 0$ such that $V_x := B_x(\delta_x)$ satisfies $(V_x \setminus \{x\}) \cap E = \emptyset$. Clearly $\mathcal{V} = \{V_x\}_{x\in X}$ is a cover of $X$, yet $\mathcal{V}$ has no finite sub-cover. Indeed, for each $x \in X$, $V_x \cap E \subset \{x\}$, and hence if $\Lambda \subset\subset X$, $\cup_{x\in\Lambda} V_x$ can only contain a finite number of points from $E$ (namely $\Lambda \cap E$). Thus for any $\Lambda \subset\subset X$, $E \not\subset \cup_{x\in\Lambda} V_x$ and in particular $X \neq \cup_{x\in\Lambda} V_x$. (See Figure 39.1.)
Fig. 39.1. The construction of an open cover with no finite sub-cover.
($b \Rightarrow c$) Let $\{x_n\}_{n=1}^\infty \subset X$ be a sequence and $E := \{x_n : n \in \mathbb{N}\}$. If $\#(E) < \infty$, then $\{x_n\}_{n=1}^\infty$ has a subsequence $\{x_{n_k}\}_{k=1}^\infty$ which is constant and hence convergent. On the other hand, if $\#(E) = \infty$, then by assumption $E$ has an accumulation point and hence, by Lemma 39.24, $\{x_n\}_{n=1}^\infty$ has a convergent subsequence.
($c \Rightarrow d$) Suppose $\{x_n\}_{n=1}^\infty \subset X$ is a Cauchy sequence. By assumption there exists a subsequence $\{x_{n_k}\}_{k=1}^\infty$ which is convergent to some point $x \in X$. Since $\{x_n\}_{n=1}^\infty$ is Cauchy, it follows that $x_n \to x$ as $n \to \infty$, showing $X$ is complete. We now show that $X$ is totally bounded. Let $\varepsilon > 0$ be given and choose an arbitrary point $x_1 \in X$. If possible choose $x_2 \in X$ such that $d(x_2, x_1) \ge \varepsilon$, then if possible choose $x_3 \in X$ such that $d_{\{x_1, x_2\}}(x_3) \ge \varepsilon$, and continue inductively choosing points $\{x_j\}_{j=1}^n \subset X$ such that $d_{\{x_1,\dots,x_{n-1}\}}(x_n) \ge \varepsilon$. This process must terminate, for otherwise we would produce a sequence $\{x_n\}_{n=1}^\infty \subset X$ which can have no convergent subsequences. Indeed, the $x_n$ have been chosen so that $d(x_n, x_m) \ge \varepsilon > 0$ for every $m \neq n$, and hence no subsequence of $\{x_n\}_{n=1}^\infty$ can be Cauchy.
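The inductive choice in the proof of $(c \Rightarrow d)$ is a greedy packing algorithm, and it can be run directly on a finite point set. The following Python sketch is my own illustration (the grid in $[0,1]^2$ and the function names are not from the notes):

```python
import math

# Greedy eps-net, mirroring the inductive choice in the proof of (c => d):
# keep a point as a new center only if it is at distance >= eps from all
# previously chosen centers.

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def greedy_net(points, eps):
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

pts = [(i / 10, j / 10) for i in range(11) for j in range(11)]  # grid in [0,1]^2
net = greedy_net(pts, 0.25)

# by construction, every point is within eps of some chosen center
covered = all(any(dist(p, c) < 0.25 for c in net) for p in pts)
```

The centers are pairwise at distance at least `eps` (the "no Cauchy subsequence" configuration), and because the ambient set is bounded the process terminates with a finite $\varepsilon$-net.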
($d \Rightarrow a$) For sake of contradiction, assume there exists an open cover $\mathcal{V} = \{V_\alpha\}_{\alpha\in A}$ of $X$ with no finite subcover. Since $X$ is totally bounded, for each $n \in \mathbb{N}$ there exists $\Lambda_n \subset\subset X$ such that
$$X = \bigcup_{x\in\Lambda_n} B_x(1/n) \subset \bigcup_{x\in\Lambda_n} C_x(1/n).$$
Choose $x_1 \in \Lambda_1$ such that no finite subset of $\mathcal{V}$ covers $K_1 := C_{x_1}(1)$. Since $K_1 = \cup_{x\in\Lambda_2} (K_1 \cap C_x(1/2))$, there exists $x_2 \in \Lambda_2$ such that $K_2 := K_1 \cap C_{x_2}(1/2)$ can not be covered by a finite subset of $\mathcal{V}$, see Figure 39.2. Continuing this way inductively, we construct sets $K_n = K_{n-1} \cap C_{x_n}(1/n)$ with $x_n \in \Lambda_n$ such that no $K_n$ can be covered by a finite subset of $\mathcal{V}$. Now choose $y_n \in K_n$ for each $n$. Since $\{K_n\}_{n=1}^\infty$ is a decreasing sequence of closed sets such that $\operatorname{diam}(K_n) \le 2/n$, it follows that $\{y_n\}$ is Cauchy and hence convergent with
$$y = \lim_{n\to\infty} y_n \in \bigcap_{m=1}^\infty K_m.$$
Since $\mathcal{V}$ is a cover of $X$, there exists $V \in \mathcal{V}$ such that $y \in V$. Since $K_n \downarrow \{y\}$ and $\operatorname{diam}(K_n) \to 0$, it now follows that $K_n \subset V$ for some $n$ large. But this violates the assertion that $K_n$ can not be covered by a finite subset of $\mathcal{V}$.
Corollary 39.27. Any compact metric space $(X,d)$ is second countable and hence also separable by Exercise ??. (See Example ?? below for an example of a compact topological space which is not separable.)
Proof. To each integer $n$, there exists $\Lambda_n \subset\subset X$ such that $X = \cup_{x\in\Lambda_n} B(x, 1/n)$. The collection of open balls,
$$\mathcal{V} := \bigcup_{n\in\mathbb{N}} \bigcup_{x\in\Lambda_n} \{B(x, 1/n)\},$$
forms a countable basis for the metric topology on $X$. To check this, suppose that $x_0 \in X$ and $\varepsilon > 0$ are given and choose $n \in \mathbb{N}$ such that $1/n < \varepsilon/2$ and $x \in \Lambda_n$ such that $d(x_0, x) < 1/n$. Then $B(x, 1/n) \subset B(x_0, \varepsilon)$ because for $y \in B(x, 1/n)$,
Fig. 39.2. Nested Sequence of cubes.
$$d(y, x_0) \le d(y, x) + d(x, x_0) < 2/n < \varepsilon.$$
Corollary 39.28. The compact subsets of $\mathbb{R}^n$ are the closed and bounded sets.
Proof. This is a consequence of Theorem ?? and Theorem 39.26. Here is another proof. If $K$ is closed and bounded, then $K$ is complete (being a closed subset of a complete space) and $K$ is contained in $[-M, M]^n$ for some positive integer $M$. For $\delta > 0$, let
$$\Lambda_\delta = \delta\mathbb{Z}^n \cap [-M,M]^n := \{\delta x : x \in \mathbb{Z}^n \text{ and } \delta|x_i| \le M \text{ for } i = 1, 2, \dots, n\}.$$
We will show, by choosing $\delta > 0$ sufficiently small, that
$$K \subset [-M,M]^n \subset \bigcup_{x\in\Lambda_\delta} B(x, \varepsilon) \tag{39.5}$$
which shows that $K$ is totally bounded. Hence by Theorem 39.26, $K$ is compact. Suppose that $y \in [-M,M]^n$; then there exists $x \in \Lambda_\delta$ such that $|y_i - x_i| \le \delta$ for $i = 1, 2, \dots, n$. Hence
$$d_2(x,y)^2 = \sum_{i=1}^n (y_i - x_i)^2 \le n\delta^2,$$
which shows that $d_2(x,y) \le \delta\sqrt{n}$. Hence if we choose $\delta < \varepsilon/\sqrt{n}$, we have $d_2(x,y) < \varepsilon$, i.e. Eq. (39.5) holds.
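The $\delta$-grid from the second proof can be checked numerically. The sketch below is my own illustration with $n = 2$ and $M = 1$ (not from the notes):

```python
import math

# Sketch of the delta-grid Lambda_delta = (delta Z^2) in [-1, 1]^2 from the
# second proof of Corollary 39.28, with n = 2, M = 1, eps = 0.5 and
# delta < eps / sqrt(n): every point of the square is within eps of the grid.

M, eps = 1.0, 0.5
delta = eps / (2 * math.sqrt(2))           # delta < eps / sqrt(2)
steps = int(M / delta)
grid = [(i * delta, j * delta)
        for i in range(-steps, steps + 1)
        for j in range(-steps, steps + 1)]

def nearest_dist(y):
    return min(math.hypot(y[0] - x[0], y[1] - x[1]) for x in grid)

samples = [(0.3, -0.7), (0.99, 0.99), (-1.0, 0.123)]
ok = all(nearest_dist(y) < eps for y in samples)
```

The bound behind the check is exactly the one in the proof: each coordinate is within $\delta$ of a grid coordinate, so the Euclidean distance is at most $\delta\sqrt{2} < \varepsilon$.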
Example 39.29. Let $X = \ell^p(\mathbb{N})$ with $p \in [1,\infty)$ and $\rho \in \ell^p(\mathbb{N})$ such that $\rho(k) \ge 0$ for all $k \in \mathbb{N}$. The set
$$K := \{x \in X : |x(k)| \le \rho(k) \text{ for all } k \in \mathbb{N}\}$$
is compact. To prove this, let $\{x_n\}_{n=1}^\infty \subset K$ be a sequence. By compactness of closed bounded sets in $\mathbb{C}$, for each $k \in \mathbb{N}$ there is a subsequence of $\{x_n(k)\}_{n=1}^\infty \subset \mathbb{C}$ which is convergent. By Cantor's diagonalization trick, we may choose a subsequence $\{y_n\}_{n=1}^\infty$ of $\{x_n\}_{n=1}^\infty$ such that $y(k) := \lim_{n\to\infty} y_n(k)$ exists for all $k \in \mathbb{N}$.¹ Since $|y_n(k)| \le \rho(k)$ for all $n$, it follows that $|y(k)| \le \rho(k)$, i.e. $y \in K$. Finally,
$$\lim_{n\to\infty} \|y - y_n\|_p^p = \lim_{n\to\infty} \sum_{k=1}^\infty |y(k) - y_n(k)|^p = \sum_{k=1}^\infty \lim_{n\to\infty} |y(k) - y_n(k)|^p = 0,$$
wherein we have used the dominated convergence theorem. (Note $|y(k) - y_n(k)|^p \le 2^p \rho^p(k)$ and $\rho^p$ is summable.) Therefore $y_n \to y$ and we are done.
Alternatively, we can prove $K$ is compact by showing that $K$ is closed and totally bounded. It is simple to show $K$ is closed, for if $\{x_n\}_{n=1}^\infty \subset K$ is a convergent sequence in $X$, $x := \lim_{n\to\infty} x_n$, then
$$|x(k)| \le \lim_{n\to\infty} |x_n(k)| \le \rho(k) \quad \forall\, k \in \mathbb{N}.$$
This shows that $x \in K$ and hence $K$ is closed. To see that $K$ is totally bounded, let $\varepsilon > 0$ and choose $N$ such that $\left(\sum_{k=N+1}^\infty |\rho(k)|^p\right)^{1/p} < \varepsilon$. Since $\prod_{k=1}^N C_{\rho(k)}(0) \subset \mathbb{C}^N$ is closed and bounded, it is compact. Therefore there exists a finite subset $\Lambda \subset \prod_{k=1}^N C_{\rho(k)}(0)$ such that
$$\prod_{k=1}^N C_{\rho(k)}(0) \subset \bigcup_{z\in\Lambda} B_z^N(\varepsilon)$$
where $B_z^N(\varepsilon)$ is the open ball centered at $z \in \mathbb{C}^N$ relative to the $\ell^p(\{1, 2, 3, \dots, N\})$-norm. For each $z \in \Lambda$, let $\tilde z \in X$ be defined by $\tilde z(k) = z(k)$ if $k \le N$ and $\tilde z(k) = 0$ for $k \ge N + 1$. I now claim that
¹ The argument is as follows. Let $\{n_j^1\}_{j=1}^\infty$ be a subsequence of $\mathbb{N} = \{n\}_{n=1}^\infty$ such that $\lim_{j\to\infty} x_{n_j^1}(1)$ exists. Now choose a subsequence $\{n_j^2\}_{j=1}^\infty$ of $\{n_j^1\}_{j=1}^\infty$ such that $\lim_{j\to\infty} x_{n_j^2}(2)$ exists, and similarly $\{n_j^3\}_{j=1}^\infty$ of $\{n_j^2\}_{j=1}^\infty$ such that $\lim_{j\to\infty} x_{n_j^3}(3)$ exists. Continue on this way inductively to get
$$\{n\}_{n=1}^\infty \supset \{n_j^1\}_{j=1}^\infty \supset \{n_j^2\}_{j=1}^\infty \supset \{n_j^3\}_{j=1}^\infty \supset \dots$$
such that $\lim_{j\to\infty} x_{n_j^k}(k)$ exists for all $k \in \mathbb{N}$. Let $m_j := n_j^j$, so that eventually $\{m_j\}_{j=1}^\infty$ is a subsequence of $\{n_j^k\}_{j=1}^\infty$ for all $k$. Therefore, we may take $y_j := x_{m_j}$.
$$K \subset \bigcup_{z\in\Lambda} B_{\tilde z}(2\varepsilon) \tag{39.6}$$
which, when verified, shows $K$ is totally bounded. To verify Eq. (39.6), let $x \in K$ and write $x = u + v$ where $u(k) = x(k)$ for $k \le N$ and $u(k) = 0$ for $k \ge N + 1$. Then by construction $u \in B_{\tilde z}(\varepsilon)$ for some $z \in \Lambda$, and
$$\|v\|_p \le \left(\sum_{k=N+1}^\infty |\rho(k)|^p\right)^{1/p} < \varepsilon.$$
So we have
$$\|x - \tilde z\|_p = \|u + v - \tilde z\|_p \le \|u - \tilde z\|_p + \|v\|_p < 2\varepsilon.$$
Exercise 39.13 (Extreme value theorem). Let $(X,\tau)$ be a compact topological space and $f : X \to \mathbb{R}$ be a continuous function. Show $-\infty < \inf f \le \sup f < \infty$ and there exist $a, b \in X$ such that $f(a) = \inf f$ and $f(b) = \sup f$.² Hint: use Exercise 39.10 and Corollary 39.28.
Exercise 39.14 (Uniform Continuity). Let $(X,d)$ be a compact metric space, $(Y,\rho)$ be a metric space and $f : X \to Y$ be a continuous function. Show that $f$ is uniformly continuous, i.e. if $\varepsilon > 0$ there exists $\delta > 0$ such that $\rho(f(y), f(x)) < \varepsilon$ if $x, y \in X$ with $d(x,y) < \delta$. Hint: you could follow the argument in the proof of Theorem ??.
Definition 39.30. Let $L$ be a vector space. We say that two norms, $|\cdot|$ and $\|\cdot\|$, on $L$ are equivalent if there exist constants $\alpha, \beta \in (0,\infty)$ such that
$$\|f\| \le \alpha|f| \text{ and } |f| \le \beta\|f\| \text{ for all } f \in L.$$
Theorem 39.31. Let $L$ be a finite dimensional vector space. Then any two norms $|\cdot|$ and $\|\cdot\|$ on $L$ are equivalent. (This is typically not true for norms on infinite dimensional spaces, see for example Exercise ??.)
² Here is a proof if $X$ is a metric space. Let $\{x_n\}_{n=1}^\infty \subset X$ be a sequence such that $f(x_n) \to \sup f$. By compactness of $X$ we may assume, by passing to a subsequence if necessary, that $x_n \to b \in X$ as $n \to \infty$. By continuity of $f$, $f(b) = \sup f$.
Proof. Let $\{f_i\}_{i=1}^n$ be a basis for $L$ and define a new norm on $L$ by
$$\Big\|\sum_{i=1}^n a_i f_i\Big\|_2 := \sqrt{\sum_{i=1}^n |a_i|^2} \quad \text{for } a_i \in \mathbb{F}.$$
By the triangle inequality for the norm $|\cdot|$, we find
$$\Big|\sum_{i=1}^n a_i f_i\Big| \le \sum_{i=1}^n |a_i|\,|f_i| \le \sqrt{\sum_{i=1}^n |f_i|^2}\,\sqrt{\sum_{i=1}^n |a_i|^2} = M\,\Big\|\sum_{i=1}^n a_i f_i\Big\|_2,$$
where $M = \sqrt{\sum_{i=1}^n |f_i|^2}$. Thus we have
$$|f| \le M\|f\|_2$$
for all $f \in L$, and this inequality shows that $|\cdot|$ is continuous relative to $\|\cdot\|_2$. Since the normed space $(L, \|\cdot\|_2)$ is homeomorphic and isomorphic to $\mathbb{F}^n$ with the standard Euclidean norm, the closed bounded set, $S := \{f \in L : \|f\|_2 = 1\} \subset L$, is a compact subset of $L$ relative to $\|\cdot\|_2$. Therefore by Exercise 39.13 there exists $f_0 \in S$ such that
$$m = \inf\{|f| : f \in S\} = |f_0| > 0.$$
Hence given $0 \neq f \in L$, then $\frac{f}{\|f\|_2} \in S$, so that
$$m \le \Big|\frac{f}{\|f\|_2}\Big| = |f|\,\frac{1}{\|f\|_2},$$
or equivalently
$$\|f\|_2 \le \frac{1}{m}|f|.$$
This shows that $|\cdot|$ and $\|\cdot\|_2$ are equivalent norms. Similarly one shows that $\|\cdot\|$ and $\|\cdot\|_2$ are equivalent and hence so are $|\cdot|$ and $\|\cdot\|$.
Corollary 39.32. If $(L, \|\cdot\|)$ is a finite dimensional normed space, then $A \subset L$ is compact iff $A$ is closed and bounded relative to the given norm, $\|\cdot\|$.
Corollary 39.33. Every finite dimensional normed vector space $(L, \|\cdot\|)$ is complete. In particular, any finite dimensional subspace of a normed vector space is automatically closed.
Proof. If $\{f_n\}_{n=1}^\infty \subset L$ is a Cauchy sequence, then $\{f_n\}_{n=1}^\infty$ is bounded and hence has a convergent subsequence, $g_k = f_{n_k}$, by Corollary 39.32. It is now routine to show $\lim_{n\to\infty} f_n = f := \lim_{k\to\infty} g_k$.
Theorem 39.34. Suppose that $(X, \|\cdot\|)$ is a normed vector space in which the unit ball, $V := B_0(1)$, is precompact. Then $\dim X < \infty$.
Proof. Since $\bar V$ is compact, we may choose $\Lambda \subset\subset X$ such that
$$\bar V \subset \bigcup_{x\in\Lambda}\Big(x + \tfrac{1}{2}V\Big) \tag{39.7}$$
where, for any $\alpha > 0$,
$$\alpha V := \{\alpha x : x \in V\} = B_0(\alpha).$$
Let $Y := \operatorname{span}(\Lambda)$; then Eq. (39.7) implies,
$$V \subset \bar V \subset Y + \tfrac{1}{2}V.$$
Multiplying this inclusion by $\tfrac{1}{2}$ then shows
$$\tfrac{1}{2}V \subset \tfrac{1}{2}Y + \tfrac{1}{4}V = Y + \tfrac{1}{4}V$$
and hence
$$V \subset Y + \tfrac{1}{2}V \subset Y + Y + \tfrac{1}{4}V = Y + \tfrac{1}{4}V.$$
Continuing this way inductively then shows that
$$V \subset Y + \tfrac{1}{2^n}V \quad \text{for all } n \in \mathbb{N}. \tag{39.8}$$
Indeed, if Eq. (39.8) holds, then
$$V \subset Y + \tfrac{1}{2}V \subset Y + \tfrac{1}{2}\Big(Y + \tfrac{1}{2^n}V\Big) = Y + \tfrac{1}{2^{n+1}}V.$$
Hence if $x \in V$, there exist $y_n \in Y$ and $z_n \in B_0(2^{-n})$ such that $y_n + z_n \to x$. Since $\lim_{n\to\infty} z_n = 0$, it follows that $x = \lim_{n\to\infty} y_n \in \bar Y$. Since $\dim Y \le \#(\Lambda) < \infty$, Corollary 39.33 implies $Y = \bar Y$, and so we have shown that $V \subset Y$. Since for any $x \in X$, $\frac{1}{2\|x\|}x \in V \subset Y$, we have $x \in Y$ for all $x \in X$, i.e. $X = Y$.
Exercise 39.15. Suppose $(Y, \|\cdot\|_Y)$ is a normed space and $(X, \|\cdot\|_X)$ is a finite dimensional normed space. Show every linear transformation $T : X \to Y$ is necessarily bounded.
39.4 Function Space Compactness Criteria
In this section, let $(X,\tau)$ be a topological space.
Definition 39.35. Let $\mathcal{F} \subset C(X)$.
1. $\mathcal{F}$ is equicontinuous at $x \in X$ iff for all $\varepsilon > 0$ there exists $U \in \tau_x$ such that $|f(y) - f(x)| < \varepsilon$ for all $y \in U$ and $f \in \mathcal{F}$.
2. $\mathcal{F}$ is equicontinuous if $\mathcal{F}$ is equicontinuous at all points $x \in X$.
3. $\mathcal{F}$ is pointwise bounded if $\sup\{|f(x)| : f \in \mathcal{F}\} < \infty$ for all $x \in X$.
Theorem 39.36 (Ascoli-Arzela Theorem). Let $(X,\tau)$ be a compact topological space and $\mathcal{F} \subset C(X)$. Then $\mathcal{F}$ is precompact in $C(X)$ iff $\mathcal{F}$ is equicontinuous and pointwise bounded.
Proof. ($\Leftarrow$) Since $C(X) \subset \ell^\infty(X)$ is a complete metric space, we must show $\mathcal{F}$ is totally bounded. Let $\varepsilon > 0$ be given. By equicontinuity, for all $x \in X$, there exists $V_x \in \tau_x$ such that $|f(y) - f(x)| < \varepsilon/2$ if $y \in V_x$ and $f \in \mathcal{F}$. Since $X$ is compact we may choose $\Lambda \subset\subset X$ such that $X = \cup_{x\in\Lambda} V_x$. We have now decomposed $X$ into "blocks" $\{V_x\}_{x\in\Lambda}$ such that each $f \in \mathcal{F}$ is constant to within $\varepsilon$ on $V_x$. Since $\sup\{|f(x)| : x \in \Lambda \text{ and } f \in \mathcal{F}\} < \infty$, it is now evident that
$$M = \sup\{|f(x)| : x \in X \text{ and } f \in \mathcal{F}\} \le \sup\{|f(x)| : x \in \Lambda \text{ and } f \in \mathcal{F}\} + \varepsilon < \infty.$$
Let $\mathbb{A} := \{k\varepsilon/2 : k \in \mathbb{Z}\} \cap [-M, M]$. If $f \in \mathcal{F}$ and $\varphi \in \mathbb{A}^\Lambda$ (i.e. $\varphi : \Lambda \to \mathbb{A}$ is a function) is chosen so that $|\varphi(x) - f(x)| \le \varepsilon/2$ for all $x \in \Lambda$, then
$$|f(y) - \varphi(x)| \le |f(y) - f(x)| + |f(x) - \varphi(x)| < \varepsilon \quad \forall\, x \in \Lambda \text{ and } y \in V_x.$$
From this it follows that $\mathcal{F} = \cup\{\mathcal{F}_\varphi : \varphi \in \mathbb{A}^\Lambda\}$ where, for $\varphi \in \mathbb{A}^\Lambda$,
$$\mathcal{F}_\varphi := \{f \in \mathcal{F} : |f(y) - \varphi(x)| < \varepsilon \text{ for } y \in V_x \text{ and } x \in \Lambda\}.$$
Let $\Gamma := \{\varphi \in \mathbb{A}^\Lambda : \mathcal{F}_\varphi \neq \emptyset\}$ and for each $\varphi \in \Gamma$ choose $f_\varphi \in \mathcal{F}_\varphi$. For $f \in \mathcal{F}_\varphi$, $x \in \Lambda$ and $y \in V_x$ we have
$$|f(y) - f_\varphi(y)| \le |f(y) - \varphi(x)| + |\varphi(x) - f_\varphi(y)| < 2\varepsilon.$$
So $\|f - f_\varphi\|_\infty < 2\varepsilon$ for all $f \in \mathcal{F}_\varphi$, showing that $\mathcal{F}_\varphi \subset B_{f_\varphi}(2\varepsilon)$. Therefore,
$$\mathcal{F} = \bigcup_{\varphi\in\Gamma} \mathcal{F}_\varphi \subset \bigcup_{\varphi\in\Gamma} B_{f_\varphi}(2\varepsilon),$$
and because $\varepsilon > 0$ was arbitrary we have shown that $\mathcal{F}$ is totally bounded.
($\Rightarrow$) (*The rest of this proof may safely be skipped.*) Since $\|\cdot\|_\infty : C(X) \to [0,\infty)$ is a continuous function on $C(X)$, it is bounded on any compact subset $\mathcal{F} \subset C(X)$. This shows that $\sup\{\|f\|_\infty : f \in \mathcal{F}\} < \infty$, which clearly implies that $\mathcal{F}$ is pointwise bounded.³
³ One could also prove that $\mathcal{F}$ is pointwise bounded by considering the continuous evaluation maps $e_x : C(X) \to \mathbb{R}$ given by $e_x(f) = f(x)$ for all $x \in X$.
Suppose $\mathcal{F}$ were not equicontinuous at some point
$x \in X$; that is to say there exists $\varepsilon > 0$ such that for all $V \in \tau_x$, $\sup_{y\in V}\sup_{f\in\mathcal{F}} |f(y) - f(x)| > \varepsilon$.⁴ Equivalently said, to each $V \in \tau_x$ we may choose
$$f_V \in \mathcal{F} \text{ and } x_V \in V \ni |f_V(x) - f_V(x_V)| \ge \varepsilon. \tag{39.9}$$
Set $\mathcal{C}_V = \overline{\{f_W : W \in \tau_x \text{ and } W \subset V\}}^{\,\|\cdot\|_\infty} \subset \mathcal{F}$ and notice for any $\mathcal{V} \subset\subset \tau_x$ that
$$\bigcap_{V\in\mathcal{V}} \mathcal{C}_V \supset \mathcal{C}_{\cap\mathcal{V}} \neq \emptyset,$$
so that $\{\mathcal{C}_V\}_{V\in\tau_x} \subset \mathcal{F}$ has the finite intersection property.⁵ Since $\mathcal{F}$ is compact, it follows that there exists some
$$f \in \bigcap_{V\in\tau_x} \mathcal{C}_V \neq \emptyset.$$
Since $f$ is continuous, there exists $V \in \tau_x$ such that $|f(x) - f(y)| < \varepsilon/3$ for all $y \in V$. Because $f \in \mathcal{C}_V$, there exists $W \subset V$ such that $\|f - f_W\|_\infty < \varepsilon/3$. We now arrive at a contradiction;
$$\varepsilon \le |f_W(x) - f_W(x_W)| \le |f_W(x) - f(x)| + |f(x) - f(x_W)| + |f(x_W) - f_W(x_W)| < \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.$$
Alternate proof. For $\varepsilon > 0$ let $\Lambda_\varepsilon \subset\subset X$ and $\{V_x^\varepsilon\}_{x\in\Lambda_\varepsilon} \subset \tau$ be a finite open cover of $X$ with the property that for all $x \in \Lambda_\varepsilon$ we have
⁴ If $X$ is first countable we could finish the proof with the following argument. Let $\{V_n\}_{n=1}^\infty$ be a neighborhood base at $x$ such that $V_1 \supset V_2 \supset V_3 \supset \dots$. By the assumption that $\mathcal{F}$ is not equicontinuous at $x$, there exist $f_n \in \mathcal{F}$ and $x_n \in V_n$ such that $|f_n(x) - f_n(x_n)| \ge \varepsilon$ for all $n$. Since $\mathcal{F}$ is a compact metric space, by passing to a subsequence if necessary we may assume that $f_n$ converges uniformly to some $f \in \mathcal{F}$. Because $x_n \to x$ as $n \to \infty$, we learn that
$$\varepsilon \le |f_n(x) - f_n(x_n)| \le |f_n(x) - f(x)| + |f(x) - f(x_n)| + |f(x_n) - f_n(x_n)| \le 2\|f_n - f\|_\infty + |f(x) - f(x_n)| \to 0 \text{ as } n \to \infty,$$
which is a contradiction.
⁵ If we are willing to use nets, described in Appendix ?? below, we could finish the proof as follows. Since $\mathcal{F}$ is compact, the net $\{f_V\}_{V\in\tau_x} \subset \mathcal{F}$ has a cluster point $f \in \mathcal{F} \subset C(X)$. Choose a subnet $\{g_\alpha\}_{\alpha\in A}$ of $\{f_V\}_{V\in\tau_x}$ such that $g_\alpha \to f$ uniformly. Then, since $x_V \to x$ implies $x_{V_\alpha} \to x$, we may conclude from Eq. (39.9) that
$$\varepsilon \le \lim_\alpha |g_\alpha(x) - g_\alpha(x_{V_\alpha})| = |f(x) - f(x)| = 0,$$
which is a contradiction.
$$|f(y) - f(x)| < \varepsilon \quad \text{for all } y \in V_x^\varepsilon \text{ and } f \in \mathcal{F}.$$
Let $D := \cup_{m=1}^\infty \Lambda_{1/m}$, a countable set, and suppose that $\{f_n\} \subset \mathcal{F}$ is a given sequence. Since $\{f_n(x)\}_{n=1}^\infty$ is bounded in $\mathbb{R}$ for all $x \in D$, by Cantor's diagonalization argument we may choose a subsequence, $g_k := f_{n_k}$, such that $g_0(x) := \lim_{k\to\infty} g_k(x)$ exists for all $x \in D$. To finish the proof we need only show $\{g_k\}$ is uniformly Cauchy. To this end, observe that for $y \in X$ and $m \in \mathbb{N}$ we may choose an $x \in \Lambda_{1/m}$ such that $y \in V_x^{1/m}$, and therefore,
$$|g_k(y) - g_l(y)| \le |g_k(y) - g_k(x)| + |g_k(x) - g_l(x)| + |g_l(x) - g_l(y)| \le 2/m + |g_k(x) - g_l(x)|,$$
and therefore,
$$\|g_k - g_l\|_u \le 2/m + \max_{x\in\Lambda_{1/m}} |g_k(x) - g_l(x)|.$$
Passing to the limit as $k, l \to \infty$ then shows
$$\limsup_{k,l\to\infty} \|g_k - g_l\|_u \le 2/m \to 0 \text{ as } m \to \infty.$$
Remark 39.37. The above theorem may be easily generalized to cover $C(X, S)$ where $(S, \rho)$ is a complete metric space with the Heine-Borel property, namely closed and bounded sets are compact. For example, if $\mathcal{F} \subset C(X, S)$ is pointwise bounded and equicontinuous then it is sequentially compact. This is proved by first showing $C(X, S)$ is a complete metric space relative to the metric,
$$\rho_u(f, g) := \sup_{x\in X} \rho(f(x), g(x)).$$
Now follow the alternate proof in the previous theorem replacing $\mathbb{R}$ by $S$, absolute values by the metric $\rho$, and $\|\cdot\|_u$ by $\rho_u$ everywhere. For example the proof should start as: for $\varepsilon > 0$ let $\Lambda_\varepsilon \subset\subset X$ and $\{V_x^\varepsilon\}_{x\in\Lambda_\varepsilon} \subset \tau$ be a finite open cover of $X$ with the property that for all $x \in \Lambda_\varepsilon$ we have
$$\rho(f(y), f(x)) < \varepsilon \quad \text{for all } y \in V_x^\varepsilon \text{ and } f \in \mathcal{F}.$$
Exercise 39.16. Give an alternative proof of the implication, ($\Leftarrow$), in Theorem 39.36 by showing every sequence $\{f_n : n \in \mathbb{N}\} \subset \mathcal{F}$ has a convergent subsequence.
Exercise 39.17. Suppose $k \in C([0,1]^2, \mathbb{R})$ and for $f \in C([0,1], \mathbb{R})$, let
$$Kf(x) := \int_0^1 k(x,y)\,f(y)\,dy \text{ for all } x \in [0,1].$$
Show $K$ is a compact operator on $(C([0,1], \mathbb{R}), \|\cdot\|_\infty)$.
The following result is a corollary of Lemma ?? and Theorem 39.36.
Corollary 39.38 (Locally Compact Ascoli-Arzela Theorem). Let $(X,\tau)$ be a locally compact and $\sigma$-compact topological space and $\{f_m\} \subset C(X)$ be a pointwise bounded sequence of functions such that $\{f_m|_K\}$ is equicontinuous for any compact subset $K \subset X$. Then there exists a subsequence $\{m_n\} \subset \{m\}$ such that $\{g_n := f_{m_n}\}_{n=1}^\infty \subset C(X)$ is a sequence which is uniformly convergent on compact subsets of $X$.
Proof. Let $\{K_n\}_{n=1}^\infty$ be the compact subsets of $X$ constructed in Lemma ??. We may now apply Theorem 39.36 repeatedly to find a nested family of subsequences
$$\{f_m\} \supset \{g_m^1\} \supset \{g_m^2\} \supset \{g_m^3\} \supset \dots$$
such that the sequence $\{g_m^n\}_{m=1}^\infty \subset C(X)$ is uniformly convergent on $K_n$. Using Cantor's trick, define the subsequence $\{h_n\}$ of $\{f_m\}$ by $h_n := g_n^n$. Then $\{h_n\}$ is uniformly convergent on $K_l$ for each $l \in \mathbb{N}$. Now if $K \subset X$ is an arbitrary compact set, there exists $l < \infty$ such that $K \subset K_l^o \subset K_l$ and therefore $\{h_n\}$ is uniformly convergent on $K$ as well.
Proposition 39.39. Let $\Omega$ be an open subset of $\mathbb{R}^d$ such that $\bar\Omega$ is compact and $0 \le \beta < \alpha \le 1$. Then the inclusion map $i : C^\alpha(\Omega) \to C^\beta(\Omega)$ is a compact operator. See Chapter ?? and Lemma ?? for the notation being used here.
Proof. Let $\{u_n\}_{n=1}^\infty \subset C^\alpha(\Omega)$ be such that $\|u_n\|_{C^\alpha} \le 1$, i.e. $\|u_n\|_\infty \le 1$ and
$$|u_n(x) - u_n(y)| \le |x - y|^\alpha \text{ for all } x, y \in \Omega.$$
By the Arzela-Ascoli Theorem 39.36, there exists a subsequence $\{\tilde u_n\}_{n=1}^\infty$ of $\{u_n\}_{n=1}^\infty$ and $u \in C^o(\bar\Omega)$ such that $\tilde u_n \to u$ in $C^0$. Since
$$|u(x) - u(y)| = \lim_{n\to\infty} |\tilde u_n(x) - \tilde u_n(y)| \le |x - y|^\alpha,$$
$u \in C^\alpha$ as well. Define $g_n := u - \tilde u_n \in C^\alpha$; then
$$[g_n]_\alpha + \|g_n\|_{C^0} = \|g_n\|_{C^\alpha} \le 2$$
and $g_n \to 0$ in $C^0$. To finish the proof we must show that $g_n \to 0$ in $C^\beta$. Given $\delta > 0$,
$$[g_n]_\beta = \sup_{x\neq y} \frac{|g_n(x) - g_n(y)|}{|x - y|^\beta} \le A_n + B_n$$
where
$$A_n = \sup\Big\{\frac{|g_n(x) - g_n(y)|}{|x - y|^\beta} : x \neq y \text{ and } |x - y| \le \delta\Big\} = \sup\Big\{\frac{|g_n(x) - g_n(y)|}{|x - y|^\alpha}\,|x - y|^{\alpha-\beta} : x \neq y \text{ and } |x - y| \le \delta\Big\} \le \delta^{\alpha-\beta}\,[g_n]_\alpha \le 2\,\delta^{\alpha-\beta}$$
and
$$B_n = \sup\Big\{\frac{|g_n(x) - g_n(y)|}{|x - y|^\beta} : |x - y| > \delta\Big\} \le 2\,\delta^{-\beta}\,\|g_n\|_{C^0} \to 0 \text{ as } n \to \infty.$$
Therefore,
$$\limsup_{n\to\infty}\,[g_n]_\beta \le \limsup_{n\to\infty} A_n + \limsup_{n\to\infty} B_n \le 2\,\delta^{\alpha-\beta} + 0 \to 0 \text{ as } \delta \downarrow 0.$$
This proposition generalizes to the following theorem, which the reader is asked to prove in Exercise ?? below.
Theorem 39.40. Let $\Omega$ be a precompact open subset of $\mathbb{R}^d$, $\alpha, \beta \in [0,1]$ and $k, j \in \mathbb{N}_0$. If $j + \beta > k + \alpha$, then $C^{j,\beta}(\bar\Omega)$ is compactly contained in $C^{k,\alpha}(\bar\Omega)$.
39.5 Supplementary Remarks
39.5.1 Word of Caution
Example 39.41. Let $(X,d)$ be a metric space. It is always true that $\overline{B_x(\varepsilon)} \subset C_x(\varepsilon)$ since $C_x(\varepsilon)$ is a closed set containing $B_x(\varepsilon)$. However, it is not always true that $\overline{B_x(\varepsilon)} = C_x(\varepsilon)$. For example let $X = \{1, 2\}$ and $d(1,2) = 1$; then $B_1(1) = \{1\}$, $\overline{B_1(1)} = \{1\}$ while $C_1(1) = X$. For another counterexample, take
$$X = \{(x,y) \in \mathbb{R}^2 : x = 0 \text{ or } x = 1\}$$
with the usual Euclidean metric coming from the plane. Then
$$B_{(0,0)}(1) = \{(0,y) \in \mathbb{R}^2 : |y| < 1\},$$
$$\overline{B_{(0,0)}(1)} = \{(0,y) \in \mathbb{R}^2 : |y| \le 1\}, \text{ while}$$
$$C_{(0,0)}(1) = \overline{B_{(0,0)}(1)} \cup \{(1,0)\}.$$
In spite of the above examples, Lemmas 39.42 and 39.43 below show that for certain metric spaces of interest it is true that $\overline{B_x(\varepsilon)} = C_x(\varepsilon)$.
Lemma 39.42. Suppose that $(X, |\cdot|)$ is a normed vector space and $d$ is the metric on $X$ defined by $d(x,y) = |x - y|$. Then
$$\overline{B_x(\varepsilon)} = C_x(\varepsilon) \text{ and } \operatorname{bd}(B_x(\varepsilon)) = \{y \in X : d(x,y) = \varepsilon\},$$
where the boundary operation, $\operatorname{bd}(\cdot)$, is defined in Definition ?? (BRUCE: Forward Reference.) below.
Proof. We must show that $C := C_x(\varepsilon) \subset \overline{B_x(\varepsilon)} =: \bar B$. For $y \in C$, let $v = y - x$; then
$$|v| = |y - x| = d(x,y) \le \varepsilon.$$
Let $\alpha_n = 1 - 1/n$ so that $\alpha_n \uparrow 1$ as $n \to \infty$. Let $y_n = x + \alpha_n v$; then $d(x, y_n) = \alpha_n d(x,y) < \varepsilon$, so that $y_n \in B_x(\varepsilon)$ and $d(y, y_n) = (1 - \alpha_n)|v| \to 0$ as $n \to \infty$. This shows that $y_n \to y$ as $n \to \infty$ and hence that $y \in \bar B$.
39.5.2 Riemannian Metrics
This subsection is not completely self contained and may safely be skipped.
Lemma 39.43. Suppose that $X$ is a Riemannian (or sub-Riemannian) manifold and $d$ is the metric on $X$ defined by
$$d(x,y) = \inf\{\ell(\sigma) : \sigma(0) = x \text{ and } \sigma(1) = y\}$$
where $\ell(\sigma)$ is the length of the curve $\sigma$. We define $\ell(\sigma) = \infty$ if $\sigma$ is not piecewise smooth. Then
$$\overline{B_x(\varepsilon)} = C_x(\varepsilon) \text{ and } \operatorname{bd}(B_x(\varepsilon)) = \{y \in X : d(x,y) = \varepsilon\}$$
where the boundary operation, $\operatorname{bd}(\cdot)$, is defined in Definition ?? below.
Proof. Let $C := C_x(\varepsilon) \subset \overline{B_x(\varepsilon)} =: \bar B$. We will show that $C \subset \bar B$ by showing $\bar B^c \subset C^c$. Suppose that $y \in \bar B^c$ and choose $\delta > 0$ such that $B_y(\delta) \cap \bar B = \emptyset$. In particular this implies that
$$B_y(\delta) \cap B_x(\varepsilon) = \emptyset.$$
We will finish the proof by showing that $d(x,y) \ge \delta + \varepsilon > \varepsilon$ and hence that $y \in C^c$. This will be accomplished by showing: if $d(x,y) < \delta + \varepsilon$ then $B_y(\delta) \cap B_x(\varepsilon) \neq \emptyset$. If $d(x,y) < \max(\delta, \varepsilon)$, then either $x \in B_y(\delta)$ or $y \in B_x(\varepsilon)$. In either
Fig. 39.3. An almost length minimizing curve joining x to y.
case $B_y(\delta) \cap B_x(\varepsilon) \neq \emptyset$. Hence we may assume that $\max(\delta, \varepsilon) \le d(x,y) < \delta + \varepsilon$. Let $\alpha > 0$ be a number such that
$$\max(\delta, \varepsilon) \le d(x,y) < \alpha < \delta + \varepsilon$$
and choose a curve $\sigma$ from $x$ to $y$ such that $\ell(\sigma) < \alpha$. Also choose $0 < \delta' < \delta$ such that $0 < \alpha - \delta' < \varepsilon$, which can be done since $\alpha - \delta < \varepsilon$. Let $k(t) = d(y, \sigma(t))$, a continuous function on $[0,1]$, and therefore $k([0,1]) \subset \mathbb{R}$ is a connected set which contains $0$ and $d(x,y)$. Therefore there exists $t_0 \in [0,1]$ such that $d(y, \sigma(t_0)) = k(t_0) = \delta'$. Let $z = \sigma(t_0) \in B_y(\delta)$; then
$$d(x,z) \le \ell(\sigma|_{[0,t_0]}) = \ell(\sigma) - \ell(\sigma|_{[t_0,1]}) < \alpha - d(z,y) = \alpha - \delta' < \varepsilon$$
and therefore $z \in B_x(\varepsilon) \cap B_y(\delta) \neq \emptyset$.
Remark 39.44. Suppose again that $X$ is a Riemannian (or sub-Riemannian) manifold and
$$d(x,y) = \inf\{\ell(\sigma) : \sigma(0) = x \text{ and } \sigma(1) = y\}.$$
Let $\sigma$ be a curve from $x$ to $y$ and let $\varepsilon = \ell(\sigma) - d(x,y)$. Then for all $0 \le u < v \le 1$,
$$d(x,y) + \varepsilon = \ell(\sigma) = \ell(\sigma|_{[0,u]}) + \ell(\sigma|_{[u,v]}) + \ell(\sigma|_{[v,1]}) \ge d(x, \sigma(u)) + \ell(\sigma|_{[u,v]}) + d(\sigma(v), y)$$
and therefore, using the triangle inequality,
$$\ell(\sigma|_{[u,v]}) \le d(x,y) + \varepsilon - d(x, \sigma(u)) - d(\sigma(v), y) \le d(\sigma(u), \sigma(v)) + \varepsilon.$$
This leads to the following conclusions. If $\sigma$ is within $\varepsilon$ of a length minimizing curve from $x$ to $y$, then $\sigma|_{[u,v]}$ is within $\varepsilon$ of a length minimizing curve from $\sigma(u)$ to $\sigma(v)$. In particular, if $\sigma$ is a length minimizing curve from $x$ to $y$, then $\sigma|_{[u,v]}$ is a length minimizing curve from $\sigma(u)$ to $\sigma(v)$.
39.6 Exercises
Exercise 39.18. Let $(X,d)$ be a metric space. Suppose that $\{x_n\}_{n=1}^\infty \subset X$ is a sequence and set $\varepsilon_n := d(x_n, x_{n+1})$. Show that for $m > n$,
$$d(x_n, x_m) \le \sum_{k=n}^{m-1} \varepsilon_k \le \sum_{k=n}^\infty \varepsilon_k.$$
Conclude from this that if
$$\sum_{k=1}^\infty \varepsilon_k = \sum_{n=1}^\infty d(x_n, x_{n+1}) < \infty$$
then $\{x_n\}_{n=1}^\infty$ is Cauchy. Moreover, show that if $\{x_n\}_{n=1}^\infty$ is a convergent sequence and $x = \lim_{n\to\infty} x_n$, then
$$d(x, x_n) \le \sum_{k=n}^\infty \varepsilon_k.$$
Exercise 39.19. Show that $(X,d)$ is a complete metric space iff every sequence $\{x_n\}_{n=1}^\infty \subset X$ such that $\sum_{n=1}^\infty d(x_n, x_{n+1}) < \infty$ is a convergent sequence in $X$. You may find it useful to prove the following statements in the course of the proof.
1. If $\{x_n\}$ is a Cauchy sequence, then there is a subsequence $y_j := x_{n_j}$ such that $\sum_{j=1}^\infty d(y_{j+1}, y_j) < \infty$.
2. If $\{x_n\}_{n=1}^\infty$ is Cauchy and there exists a subsequence $y_j := x_{n_j}$ of $\{x_n\}$ such that $x = \lim_{j\to\infty} y_j$ exists, then $\lim_{n\to\infty} x_n$ also exists and is equal to $x$.
Exercise 39.20. Suppose that $f : [0,\infty) \to [0,\infty)$ is a $C^2$ function such that $f(0) = 0$, $f' > 0$ and $f'' \le 0$, and $(X, \rho)$ is a metric space. Show that $d(x,y) = f(\rho(x,y))$ is a metric on $X$. In particular show that
$$d(x,y) := \frac{\rho(x,y)}{1 + \rho(x,y)}$$
is a metric on $X$. (Hint: use calculus to verify that $f(a + b) \le f(a) + f(b)$ for all $a, b \in [0,\infty)$.)
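The particular metric $d = \rho/(1+\rho)$ can be spot-checked numerically. The Python sketch below is my own illustration, taking $\rho$ to be the usual distance on $\mathbb{R}$ and testing the triangle inequality on random triples:

```python
import random

# Spot-check of Exercise 39.20: d = rho/(1 + rho) satisfies the triangle
# inequality whenever rho does; here rho(a, b) = |a - b| on R.

def d(a, b):
    r = abs(a - b)
    return r / (1 + r)

random.seed(0)
triples = [tuple(random.uniform(-10, 10) for _ in range(3))
           for _ in range(1000)]
ok = all(d(a, c) <= d(a, b) + d(b, c) + 1e-12 for a, b, c in triples)
```

A random check is of course no proof, but it is a quick sanity test of the subadditivity $f(a+b) \le f(a) + f(b)$ the hint asks for, with $f(t) = t/(1+t)$.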
Exercise 39.21. Let $\{(X_n, d_n)\}_{n=1}^\infty$ be a sequence of metric spaces, $X := \prod_{n=1}^\infty X_n$, and for $x = (x(n))_{n=1}^\infty$ and $y = (y(n))_{n=1}^\infty$ in $X$ let
$$d(x,y) = \sum_{n=1}^\infty 2^{-n}\,\frac{d_n(x(n), y(n))}{1 + d_n(x(n), y(n))}.$$
Show:
1. $(X,d)$ is a metric space,
2. a sequence $\{x_k\}_{k=1}^\infty \subset X$ converges to $x \in X$ iff $x_k(n) \to x(n) \in X_n$ as $k \to \infty$ for each $n \in \mathbb{N}$, and
3. $X$ is complete if $X_n$ is complete for all $n$.
Exercise 39.22. Suppose $(X, \rho)$ and $(Y, d)$ are metric spaces and $A$ is a dense subset of $X$.
1. Show that if $F : X \to Y$ and $G : X \to Y$ are two continuous functions such that $F = G$ on $A$, then $F = G$ on $X$. Hint: consider the set $C := \{x \in X : F(x) = G(x)\}$.
2. Suppose $f : A \to Y$ is a function which is uniformly continuous, i.e. for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$d(f(a), f(b)) < \varepsilon \text{ for all } a, b \in A \text{ with } \rho(a, b) < \delta.$$
Show there is a unique continuous function $F : X \to Y$ such that $F = f$ on $A$. Hint: each point $x \in X$ is a limit of a sequence consisting of elements from $A$.
3. Let $X = \mathbb{R} = Y$ and $A = \mathbb{Q} \subset X$; find a function $f : \mathbb{Q} \to \mathbb{R}$ which is continuous on $\mathbb{Q}$ but does not extend to a continuous function on $\mathbb{R}$.
Page: 538 job: prob macro: svmonob.cls date/time: 10-Jun-2010/16:32
References
1. David Applebaum, Levy processes and stochastic calculus, second ed., Cambridge
Studies in Advanced Mathematics, vol. 116, Cambridge University Press, Cam-
bridge, 2009. MR MR2512800
2. Richard F. Bass, Probabilistic techniques in analysis, Probability and its Applica-
tions (New York), Springer-Verlag, New York, 1995. MR MR1329542 (96e:60001)
3. , The Doob-Meyer decomposition revisited, Canad. Math. Bull. 39 (1996),
no. 2, 138150. MR MR1390349 (97b:60075)
4. , Diusions and elliptic operators, Probability and its Applications (New
York), Springer-Verlag, New York, 1998. MR MR1483890 (99h:60136)
5. Patrick Billingsley, Probability and measure, third ed., Wiley Series in Probability
and Mathematical Statistics, John Wiley & Sons Inc., New York, 1995, A Wiley-
Interscience Publication. MR MR1324786 (95k:60001)
6. , Convergence of probability measures, second ed., Wiley Series in Prob-
ability and Statistics: Probability and Statistics, John Wiley & Sons Inc., New
York, 1999, A Wiley-Interscience Publication. MR MR1700749 (2000e:60008)
7. Otto J. Bj ornsson, A note on the characterization of standard Borel spaces, Math.
Scand. 47 (1980), no. 1, 135136. MR MR600083 (82a:54070)
8. R. M. Blumenthal and R. K. Getoor, Markov processes and potential theory,
Pure and Applied Mathematics, Vol. 29, Academic Press, New York, 1968. MR
MR0264757 (41 #9348)
9. Leo Breiman, Probability, Addison-Wesley Publishing Company, Reading, Mass.,
1968. MR MR0229267 (37 #4841)
10. K. L. Chung and R. J. Williams, Introduction to stochastic integration, second
ed., Probability and its Applications, Birkh auser Boston Inc., Boston, MA, 1990.
MR MR1102676 (92d:60057)
11. Claude Dellacherie, Capacites et processus stochastiques, Springer-Verlag, Berlin,
1972, Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 67. MR
MR0448504 (56 #6810)
12. J. Dieudonne, Foundations of modern analysis, Academic Press, New York, 1969,
Enlarged and corrected printing, Pure and Applied Mathematics, Vol. 10-I. MR
MR0349288 (50 #1782)
13. J. L. Doob, Stochastic processes, Wiley Classics Library, John Wiley & Sons Inc.,
New York, 1990, Reprint of the 1953 original, A Wiley-Interscience Publication.
MR MR1038526 (91d:60002)
14. Nelson Dunford and Jacob T. Schwartz, Linear operators. Part I, Wiley Classics
Library, John Wiley & Sons Inc., New York, 1988, General theory, With the
assistance of William G. Bade and Robert G. Bartle, Reprint of the 1958 original,
A Wiley-Interscience Publication. MR MR1009162 (90g:47001a)
15. Richard Durrett, Probability: theory and examples, second ed., Duxbury Press,
Belmont, CA, 1996. MR MR1609153 (98m:60001)
16. , Stochastic calculus, Probability and Stochastics Series, CRC Press, Boca
Raton, FL, 1996, A practical introduction. MR MR1398879 (97k:60148)
17. Evgenii B. Dynkin and Aleksandr A. Yushkevich, Markov processes: Theorems
and problems, Translated from the Russian by James S. Wood, Plenum Press,
New York, 1969. MR MR0242252 (39 #3585a)
18. Stewart N. Ethier and Thomas G. Kurtz, Markov processes, Wiley Series in
Probability and Mathematical Statistics: Probability and Mathematical Statis-
tics, John Wiley & Sons Inc., New York, 1986, Characterization and convergence.
MR MR838085 (88a:60130)
19. William Feller, An introduction to probability theory and its applications. Vol. II.,
Second edition, John Wiley & Sons Inc., New York, 1971. MR MR0270403 (42
#5292)
20. Masatoshi Fukushima, Yōichi Ōshima, and Masayoshi Takeda, Dirichlet forms
and symmetric Markov processes, de Gruyter Studies in Mathematics, vol. 19,
Walter de Gruyter & Co., Berlin, 1994. MR MR1303354 (96f:60126)
21. Robert D. Gordon, Values of Mills' ratio of area to bounding ordinate and of the
normal probability integral for large values of the argument, Ann. Math. Statistics
12 (1941), 364–366. MR MR0005558 (3,171e)
22. Paul R. Halmos, Lectures on ergodic theory, (1960), vii+101. MR MR0111817 (22
#2677)
23. Nobuyuki Ikeda and Shinzo Watanabe, Stochastic differential equations and
diffusion processes, second ed., North-Holland Mathematical Library, vol. 24, North-
Holland Publishing Co., Amsterdam, 1989. MR MR1011252 (90m:60069)
24. Jean Jacod and Albert N. Shiryaev, Limit theorems for stochastic processes,
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of
Mathematical Sciences], vol. 288, Springer-Verlag, Berlin, 1987. MR MR959133
(89k:60044)
25. , Limit theorems for stochastic processes, second ed., Grundlehren der
Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sci-
ences], vol. 288, Springer-Verlag, Berlin, 2003. MR MR1943877 (2003j:60001)
26. Svante Janson, Gaussian Hilbert spaces, Cambridge Tracts in Mathematics, vol.
129, Cambridge University Press, Cambridge, 1997. MR MR1474726 (99f:60082)
27. Shizuo Kakutani, On equivalence of infinite product measures, Ann. of Math. (2)
49 (1948), 214–224. MR MR0023331 (9,340e)
28. Olav Kallenberg, Foundations of modern probability, second ed., Probability and
its Applications (New York), Springer-Verlag, New York, 2002. MR MR1876169
(2002m:60002)
29. Ioannis Karatzas and Steven E. Shreve, Brownian motion and stochastic calculus,
second ed., Graduate Texts in Mathematics, vol. 113, Springer-Verlag, New York,
1991. MR MR1121940 (92h:60127)
30. Oliver Knill, Probability and stochastic processes with applications, Harvard Web-
Based, 1994.
31. Leonid B. Koralov and Yakov G. Sinai, Theory of probability and random
processes, second ed., Universitext, Springer, Berlin, 2007. MR MR2343262
(2008k:60002)
32. Hiroshi Kunita, Stochastic flows and stochastic partial differential equations,
Proceedings of the International Congress of Mathematicians, Vol. 1, 2 (Berkeley,
Calif., 1986) (Providence, RI), Amer. Math. Soc., 1987, pp. 1021–1031. MR
MR934304 (89g:60194)
33. , Stochastic flows and stochastic differential equations, Cambridge Studies
in Advanced Mathematics, vol. 24, Cambridge University Press, Cambridge, 1990.
MR MR1070361 (91m:60107)
34. Hui Hsiung Kuo, Gaussian measures in Banach spaces, Springer-Verlag, Berlin,
1975, Lecture Notes in Mathematics, Vol. 463. MR 57 #1628
35. Shigeo Kusuoka and Daniel W. Stroock, Precise asymptotics of certain Wiener
functionals, J. Funct. Anal. 99 (1991), no. 1, 1–74. MR MR1120913 (93a:60085)
36. Gregory F. Lawler, Introduction to stochastic processes, Chapman & Hall Proba-
bility Series, Chapman & Hall, New York, 1995. MR MR1372946 (97a:60001)
37. Terry Lyons and Zhongmin Qian, System control and rough paths, Oxford Mathe-
matical Monographs, Oxford University Press, Oxford, 2002, Oxford Science Pub-
lications. MR MR2036784 (2005f:93001)
38. Zhi Ming Ma and Michael Röckner, Introduction to the theory of (nonsymmetric)
Dirichlet forms, Universitext, Springer-Verlag, Berlin, 1992. MR MR1214375
(94d:60119)
39. Henry P. McKean, Stochastic integrals, AMS Chelsea Publishing, Providence, RI,
2005, Reprint of the 1969 edition, with errata. MR MR2169626 (2006d:60003)
40. Michel Métivier, Semimartingales, de Gruyter Studies in Mathematics, vol. 2,
Walter de Gruyter & Co., Berlin, 1982, A course on stochastic processes. MR
MR688144 (84i:60002)
41. Edward Nelson, An existence theorem for second order parabolic equations, Trans.
Amer. Math. Soc. 88 (1958), 414–429. MR MR0095341 (20 #1844)
42. , Feynman integrals and the Schrödinger equation, J. Mathematical Phys.
5 (1964), 332–343. MR MR0161189 (28 #4397)
43. J. R. Norris, Markov chains, Cambridge Series in Statistical and Probabilistic
Mathematics, vol. 2, Cambridge University Press, Cambridge, 1998, Reprint of
1997 original. MR MR1600720 (99c:60144)
44. , Probability and measure, Tech. report, Mathematics Department, Uni-
versity of Cambridge, 2009.
45. James Norris, Simplified Malliavin calculus, Séminaire de Probabilités, XX,
1984/85, Lecture Notes in Math., vol. 1204, Springer, Berlin, 1986, pp. 101–130.
46. K. R. Parthasarathy, Probability measures on metric spaces, AMS Chelsea Pub-
lishing, Providence, RI, 2005, Reprint of the 1967 original. MR MR2169627
(2006d:60004)
47. Yuval Peres, An invitation to sample paths of Brownian motion, stat-
www.berkeley.edu/~peres/bmall.pdf (2001), 1–68.
48. , Brownian motion, http://www.stat.berkeley.edu/users/peres/bmbook.pdf
(2006), 1–272.
49. Philip Protter, Stochastic integration and differential equations, Applications of
Mathematics (New York), vol. 21, Springer-Verlag, Berlin, 1990, A new approach.
MR MR1037262 (91i:60148)
50. Philip E. Protter, Stochastic integration and differential equations, second ed.,
Applications of Mathematics (New York), vol. 21, Springer-Verlag, Berlin, 2004,
Stochastic Modelling and Applied Probability. MR MR2020294 (2005k:60008)
51. , Stochastic integration and differential equations, Stochastic Modelling
and Applied Probability, vol. 21, Springer-Verlag, Berlin, 2005, Second edition.
Version 2.1, Corrected third printing. MR MR2273672
52. Michael Reed and Barry Simon, Methods of modern mathematical physics. II.
Fourier analysis, self-adjointness, Academic Press [Harcourt Brace Jovanovich
Publishers], New York, 1975. MR 58 #12429b
53. , Methods of modern mathematical physics. I, second ed., Academic Press
Inc. [Harcourt Brace Jovanovich Publishers], New York, 1980, Functional analysis.
MR 85e:46002
54. Daniel Revuz and Marc Yor, Continuous martingales and Brownian motion, third
ed., Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of
Mathematical Sciences], vol. 293, Springer-Verlag, Berlin, 1999. MR MR1725357
(2000h:60050)
55. L. C. G. Rogers and David Williams, Diffusions, Markov processes, and
martingales. Vol. 1, Cambridge Mathematical Library, Cambridge University Press,
Cambridge, 2000, Foundations, Reprint of the second (1994) edition. MR
2001g:60188
56. , Diffusions, Markov processes, and martingales. Vol. 2, Cambridge
Mathematical Library, Cambridge University Press, Cambridge, 2000, Itô calculus,
Reprint of the second (1994) edition. MR MR1780932 (2001g:60189)
57. Sheldon M. Ross, Stochastic processes, Wiley Series in Probability and Mathe-
matical Statistics: Probability and Mathematical Statistics, John Wiley & Sons
Inc., New York, 1983, Lectures in Mathematics, 14. MR MR683455 (84m:60001)
58. H. L. Royden, Real analysis, third ed., Macmillan Publishing Company, New York,
1988. MR MR1013117 (90g:00004)
59. Gennady Samorodnitsky and Murad S. Taqqu, Stable non-Gaussian random pro-
cesses, Stochastic Modeling, Chapman & Hall, New York, 1994, Stochastic models
with infinite variance. MR MR1280932 (95f:60024)
60. Robert Schatten, Norm ideals of completely continuous operators, Second printing.
Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 27, Springer-Verlag,
Berlin, 1970. MR 41 #2449
61. Michael Sharpe, General theory of Markov processes, Pure and Applied Math-
ematics, vol. 133, Academic Press Inc., Boston, MA, 1988. MR MR958914
(89m:60169)
62. Barry Simon, Trace ideals and their applications, London Mathematical Society
Lecture Note Series, vol. 35, Cambridge University Press, Cambridge, 1979. MR
80k:47048
63. , Functional integration and quantum physics, second ed., AMS Chelsea
Publishing, Providence, RI, 2005. MR MR2105995 (2005f:81003)
64. Daniel W. Stroock, Probability theory, an analytic view, Cambridge University
Press, Cambridge, 1993. MR MR1267569 (95f:60003)
65. S. R. S. Varadhan, Probability theory, Courant Lecture Notes in Mathematics,
vol. 7, New York University Courant Institute of Mathematical Sciences, New
York, 2001. MR MR1852999 (2003a:60001)
66. John B. Walsh, An introduction to stochastic partial differential equations, École
d'été de probabilités de Saint-Flour, XIV–1984, Lecture Notes in Math., vol.
1180, Springer, Berlin, 1986, pp. 265–439. MR MR876085 (88a:60114)