Mscfe XXX (Course Name) - Module X: Collaborative Review Task

MScFE xxx [Course Name] - Module X: Collaborative Review Task
Revised: 06/06/2019
© 2019 - WorldQuant University – All rights reserved.
1
MScFE 620 Discrete-time Stochastic Processes – Table of Contents
Module 1: Probability Theory ....................................................................... 3
Unit 1: Sigma-Algebras and Measures............................................................................. 4
Unit 2: Random Variables......................................................................................................9
Part I .............................................................................................................................9
Part II ......................................................................................................................... 15
Unit 3: Expectation ................................................................................................................. 18
Unit 4: The Radon-Nikodym Theorem .......................................................................... 27
Problem Set .............................................................................................................................. 30
Collaborative Review Task ................................................................................................ 34

2
MScFE 620 Discrete-time Stochastic Processes - Module 1: Summary
Module 1: Probability Theory
Module 1 introduces the measure-theoretic foundations of probability. It begins with an

introduction to sigma-algebras and measures, then proceeds to introduce random variables and
measurable functions before concluding with the Lebesgue integration and the Radon-Nikodym
theorem.

3
MScFE 620 Discrete-time Stochastic Processes - Module 1: Unit 1
Unit 1: Sigma-Algebras and Measures
The study of probability theory begins with what is called a random experiment: an experiment
whose outcome cannot be pre-determined with certainty. An example of such a random
experiment is the tossing of three fair coins, with the (uncertain) outcome being the face shown by
each of them.
We associate with every random experiment a set Ω of all possible outcomes of the random
experiment. An outcome 𝜔 is simply an element of Ω.
We define an event to be a subset of Ω and we say that an event 𝐴 occurs if the outcome 𝜔 belongs
to 𝐴, i.e. 𝜔 ∈ 𝐴.
In our example of tossing three coins, the sample space can be chosen to be:
Ω = {HHH, HHT, … , TTT} = {𝜔 = 𝜔1 𝜔2 𝜔3 : 𝜔𝑖 = H or T for 𝑖 = 1,2,3},
where H means “heads" and T means “tails". For instance, the event 𝐴 that exactly two heads come
up corresponds to the set
A = {HHT, HTH, THH}.
For complex models, it turns out that we have to restrict the class of events to a sub-collection
ℱ of subsets of Ω that satisfies the following properties:
𝒜1. ∅ ∈ ℱ;
𝒜2. For every 𝐴 ⊆ Ω, 𝐴 ∈ ℱ ⟹ 𝐴𝑐 ∈ ℱ;
𝒜3. For every 𝐴, 𝐵 ⊆ Ω, 𝐴, 𝐵 ∈ ℱ ⟹ 𝐴 ∪ 𝐵 ∈ ℱ;
𝜎𝒜4. If {𝐴𝑛 : 𝑛 = 1,2,3, … } is a collection of subsets of Ω with 𝐴𝑛 ∈ 𝐹 for each 𝑛 ≥ 1, then
⋃ 𝐴𝑛 ∈ ℱ.
𝑛=1

4
A collection of subsets of 𝛺 that satisfies 𝒜1, 𝒜2 and 𝒜3 is called an algebra in 𝛺, and an algebra
that also satisfies 𝜎𝒜4 is called a 𝜎 − 𝑎𝑙𝑔𝑒𝑏𝑟𝑎1. Thus, the collection of admissible events ℱ should
be a 𝜎-𝑎𝑙𝑔𝑒𝑏𝑟𝑎.1
The pair (Ω, ℱ) is called a measurable space.
Once we have the collection of events ℱ, we want to assign probabilities to each one of them. We
call a set function ℙ: ℱ → ℝ
̅ = ℝ ∪ {∞, −∞} a probability measure if ℙ satisfies the following:
ℙ1. ℙ(∅) = 0;
ℙ2. ℙ(𝐴) ≥ 0 for every 𝐴 ∈ ℱ;
ℙ3. If {𝐴𝑛 : 𝑛 = 1,2,3, … } is a countable collection of pairwise disjoint events, then
∞ ∞
ℙ (⋃ 𝐴𝑛 ) = ∑ ℙ(𝐴𝑛 ).
𝑛=1 𝑛=1
ℙ4. ℙ(Ω) = 1.
The triple (Ω, ℱ, ℙ) is called a probability space.
Upon careful observation, we see that there are several other set functions that behave like a
probability. Here are some examples of set functions that satisfy conditions ℙ1 − ℙ3:
o Length of subsets of the real line. This is the function 𝜆1 that assigns to certain subsets of
ℝ the “length” of that subset. If 𝐴 = (𝑎, 𝑏) is a bounded interval, then 𝜆1 (𝐴) = 𝑏 − 𝑎.
However, 𝜆1 is defined on a larger collection than intervals (the intervals by themselves
do not form a 𝜎-algebra). This measure, as we will refer to it, satisfies conditions ℙ1 −
ℙ3, but not ℙ4. Also, 𝜆1 (𝛺) = 𝜆1 (ℝ) = ∞, hence 𝜆1 takes values on the extended real
numbers ℝ̅ ≔ ℝ ∪ {−∞, ∞}. This measure is called the Lebesgue measure, after the
French mathematician Henri Lebesgue, who was instrumental in laying the foundations of
the field of Measure Theory.
o The above example can be extended to arbitrary dimensions to construct the 𝑁-

dimensional Lebesgue measure 𝜆𝑁 on subsets of ℝ𝑁 , where 𝑁 is a positive integer. For
𝑁 = 2 this corresponds to area in the plane, while λ3 measures volume in ℝ3 .
1
Note that for a 𝜎-algebra, condition 𝒜3 can be removed since it is implied by condition 𝜎𝒜4

5
o Mass for quantities of matter.
o Cardinality of sets also satisfies the same properties.
Given an arbitrary set Ω and a 𝜎-algebra ℱ of subsets of Ω, we say that a set function 𝜇: ℱ → ℝ
̅ is a
positive measure (or measure) on (Ω, ℱ) if it satisfies ℙ1, ℙ2, and ℙ3 (with 𝜇 replacing ℙ, of course).
If, in addition, the measure 𝜇 also satisfies ℙ4, then 𝜇 is a probability measure. This means that the
study of measure theory contains probability theory as a special case! The pair (Ω, ℱ) is called a
measurable space and the triple (Ω, ℱ, 𝜇) is called a measure space. If 𝜇 is a probability measure,
then (Ω, ℱ, 𝜇) is called a probability space. Thus, on our road to understanding probability theory
we are going to take a detour through abstract measure theory. We will prove many important
results that hold in general measure spaces and later apply them to probability spaces.
Here are some examples of algebras and 𝜎 -algebras. Check that these are indeed algebras or 𝜎-
algebras.
1 The “smallest” 𝜎-algebra on any set Ω is 𝒜 ≔ {∅, Ω}. Any other algebra (or 𝜎-algebra) must
necessarily contain it.
2 On the other hand, the “largest” 𝜎-algebra on a set Ω is the power set of Ω, 2Ω . (A power set
is a set of all subsets of a set.) Every algebra (or 𝜎-algebra) is contained in 2Ω .
3 Let 𝐴 ⊆ Ω, then ℱ𝐴 = {𝐴, 𝐴𝑐 , ∅, Ω} is a 𝜎-algebra (and, therefore, an algebra) on Ω. We call

this the 𝜎-algebra generated by {𝐴}.
4 For a concrete example, let Ω = {1,2,3,4} and
ℱ = {{1}, {2,3}, {4}, {2,3,4}, {1,2,3}, {1,4}, ∅, Ω}.
One can easily check that ℱ is a 𝜎-algebra on Ω.
Notice that in all the examples above the number of elements in each algebra is a power of 2. In
general, we say that a 𝜎-algebra ℱ is generated by blocks if there exists a countable partition
{ℬ𝓃 : 𝑛 = 0,1,2, … } of Ω such that
ℱ = {⋃ ℬ𝓃 : 𝐽 ⊆ ℕ}.
𝑛∈𝐽

6
Now we consider more complicated examples.
1 Let Ω = ℝ and choose ℱ = {𝐴 ⊆ ℝ: for every 𝑥 ∈ ℝ, 𝑥 ∈ 𝐴 ⟺ −𝑥 ∈ 𝐴} to be the collection

of all symmetric subsets of ℝ. This is a 𝜎-algebra on ℝ.
2 Let Ω = ℝ again and define
𝒜 = {𝐴 ⊆ ℝ: 𝐴 =∪𝑛𝑘=1 𝐼𝑘 for some intervals 𝐼𝑘 ⊆ ℝ, 𝑘 = 1,2, … , 𝑛, 𝑛 ∈ ℕ+ }
to be subsets of ℝ that are finite unions of (open, closed, bounded, empty, etc.) intervals in
ℝ. Then 𝒜 is an algebra but not a 𝜎-algebra, since (for instance) 𝐴𝑛 : = {𝑛} = [𝑛, 𝑛] ∈ 𝐴 for
each 𝑛 ∈ ℕ, yet ∪𝑛∈ℕ 𝐴𝑛 = 𝑁 ∉ 𝒜.
3 Let Ω be an infinite set. Define:
𝒜 ≔ {𝐴 ⊆ Ω: 𝐴 is finite or 𝐴𝑐 is finite}.
Then 𝒜 is an algebra but not a 𝜎-algebra.
4 Let Ω be an infinite set. Define:
ℱ ≔ {𝐴 ⊆ Ω: 𝐴 is countable or 𝐴𝑐 is countable}.
Then ℱ is a 𝜎-algebra.
Here are some examples of measures.
1 The Dirac Measure. Let Ω be any non-empty set and pick 𝑎 ∉ Ω. If we let ℱ = 2Ω and define
𝛿𝑎 : ℱ → ℝ by
1 if 𝑎 ∈ 𝐴
δ𝑎 (𝐴) ≔ {
0 otherwise,
then δ𝑎 is a probability measure on (Ω, ℱ), called the Dirac measure.
2 The Counting Measure. Let Ω be a set and define #: 2Ω → ℝ

̅ by
|𝐴| if 𝐴 is finite
#(𝐴) ≔ {
∞ otherwise,

7
where |𝐴| represents the cardinality (number of elements) of 𝐴. Then this is a measure on
(Ω, 2Ω ), called the counting measure.
3 The Lebesgue Measure on ℝ. This measure 𝜆1 is defined on a 𝜎-algebra that includes all
the intervals and it extends the notion of length of an interval to such sets. If 𝐼 = (𝑎, 𝑏) is an
interval, then
λ1 (𝐼) = 𝑏 − 𝑎.
The N-dimensional analog of Lebesgue measure is a measure λ𝑁 on some subsets of ℝ𝑁

that extends area for 𝑁 = 2 and volume for 𝑁 = 3.
A property 𝑃 is said to be true almost everywhere with respect to a measure 𝜇 if the set of points
where 𝑃 is false is contained in a set of measure zero. We write 𝑃 − 𝜇 a.e., and we say 𝜇 − a.s. if 𝜇 is
a probability measure.
Let 𝒜 ⊆ 2Ω be a collection of subsets of Ω. In general, 𝒜 will not be a 𝜎-algebra. What we want is

to find the smallest 𝜎-algebra containing 𝒜 and we denote it by 𝜎(𝒜).
A special 𝜎-algebra is the Borel 𝜎-algebra on ℝ denoted by ℬ(ℝ). It is the 𝜎-algebra generated by
intervals in ℝ, i.e.,
ℬ(ℝ) = 𝜎 ({𝐼 ∶ 𝐼 ⊆ ℝ is an interval}).
The elements of ℬ(ℝ) are called Borel sets. We also define the Borel 𝜎-algebra of a subset 𝐴 ⊆ ℝ
as the restriction of ℬ(ℝ) to 𝐴; i.e.,
ℬ(𝐴) ≔ {𝐵 ∩ 𝐴: 𝐵 ∈ ℬ(ℝ)}.

8
Unit 2: Random Variables
Part I
Intuitively, a random variable is a numerical variable whose value is determined by the outcome of
a random experiment. This description suggests that we define a random variable 𝑋 as a function
𝑋: Ω → ℝ. Unfortunately, defining a random variable this way is not sufficient in calculating
probabilities about 𝑋. Indeed, to calculate ℙ({𝑋 ≤ 2}) for instance, we need to make sure that the
set 𝑋 −1 ((−∞, 2]) is in the 𝜎-algebra ℱ.
We define a random variable as a function 𝑋: Ω → ℝ such that:
𝑋 −1 (𝐵) ∈ ℱ for every 𝐵 ∈ ℱ.
This definition of a random variable does not require a measure to be defined on (Ω, ℱ); all we
need is Ω and the 𝜎-algebra ℱ . We now extend this definition to general measurable spaces.
Let (Ω, ℱ) and (𝐸, ℰ) be measurable spaces and 𝑓: Ω → 𝐸 be a function. (We will write
𝑓: (Ω, ℱ) → (𝐸, ℰ).) We say that 𝑓 is ℱ/ℰ -measurable if 𝑓 −1 (𝐵) ∈ ℱ for every 𝐵 ∈ ℰ.
1 If (𝐸, ℰ) = (ℝ, ℬ(ℝ)), we say that 𝑓 is Borel measurable, or, simply, measurable.
2 If (Ω, ℱ, ℙ) is a probability space and (𝐸, ℰ) = (ℝ, ℬ(ℝ)), then 𝑓 is called a random variable
and we usually use the letter 𝑋 instead of 𝑓.
3 If (Ω, ℱ, ℙ) is a probability space and (𝐸, ℰ) = (ℝ𝑛 , ℬ(ℝ𝑛 )), then 𝑓 is called a random vector
and we also use the letter 𝑋 instead of 𝑓.
Whenever we say 𝑓 is measurable without mentioning (𝐸, ℰ), we will assume that we mean Borel
measurable – i.e. (𝐸, ℰ) = (ℝ, ℬ(ℝ)).

9
Here are some examples:
1 Let (Ω, ℱ) be any measurable space and 𝐴 ⊆ 𝛺. We define the indicator function 𝐼𝐴 : 𝛺 → ℝ
of 𝐴 as
1 𝜔∈𝐴
𝐼𝐴 (𝜔) = { .
0 𝜔∉𝐴
Then 𝐼𝐴 is measurable (Borel measurable) if and only if 𝐴 ∈ ℱ. To see this, note that for any
𝐵 ∈ ℬ(ℝ),
∅ 0,1 ∉ 𝐵
𝐴 1 ∈ 𝐵, 0 ∉ 𝐵
𝐼𝐴−1 𝐵 = { 𝑐
𝐴 0 ∈ 𝐵, 1 ∉ 𝐵
𝛺 0,1 ∈ 𝐵,
which is always in ℱ if and only if 𝐴 ∈ ℱ.
2 On (𝛺, 2𝛺 ), every function 𝑓: 𝛺 → ℝ is measurable, since 𝑓 −1 (𝐵) ⊆ 𝛺 for each 𝐵 ∈ ℬ(ℝ).
3 On (𝛺, {∅, 𝛺}), the only (Borel) measurable functions 𝑓: 𝛺 → ℝ are constants.
4 On (𝛺, {∅, 𝛺, 𝐴, 𝐴𝑐 })𝐴 ⊆ 𝛺 , the only measurable functions 𝑓: 𝛺 → ℝ are constant on 𝐴 and
constant on 𝐴𝑐 . That is, 𝑓: 𝛺 → ℝ is measurable if and only if 𝑓 = 𝛼𝐼𝐴 + 𝛽𝐼𝐴𝑐 for some
constants 𝛼, 𝛽 ∈ ℝ.
Let us now look at a concrete example. Consider the example where two fair coins are tossed and
only the outcome of the first coin is recorded. Let 𝐴 be the event that the outcome of the first coin
is a head and 𝐵 be the event that the outcome of the first coin is a tail. Then 𝛺 =
{𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇} and
ℱ ≔ 𝜎({𝐴, 𝐵}) = 𝜎({{𝐻𝐻, 𝐻𝑇}, {𝑇𝐻, 𝑇𝑇}}) = {{𝐻𝐻, 𝐻𝑇}, {𝑇𝐻, 𝑇𝑇}, ∅, 𝛺}.
Notice that since 𝐴𝑐 = 𝐵 , then ℬ = {𝐴, 𝐵} are the blocks that generate ℱ . Both 𝐼𝐴 and 𝐼𝐵 are
random variables. In fact, the only random variables are linear combinations of the two. In other
words, 𝑋 is a random variable if it is constant on both 𝐴 and 𝐵. This simply means that 𝑋 is only
dependent on the outcome of the first toss.

10
The preceding example suggests that if ℱ is generated by (countably many) blocks, then the only
ℱ -measurable functions are the ones that are constant on the blocks. The following result makes
this more precise.
Theorem 1
Let (𝛺, ℱ) be a measurable space, where ℱ is generated by countably many blocks ℬ =

{𝐵1 , 𝐵2 , … }.Then 𝑓: 𝛺 → ℝ is ℱ -measurable if and only if 𝑓 is constant on each block 𝐵𝑛 (𝑛 ≥ 1).
In this case, 𝑓 can be written as
𝑓(𝜔) = ∑ 𝑎𝑛 𝐼𝐵𝑛 (𝜔) , 𝜔 ∈ 𝛺,

𝑛=1
where 𝑎𝑛 is the value of 𝑓 on 𝐵𝑛 – i.e. 𝐵𝑛 ⊆ 𝑓 −1 ({𝑎𝑛 }).
Let 𝛺 = ℝ and choose ℱ = 𝜎({{𝜔}: 𝜔 ∈ ℤ}). Note that ℱ is generated by the following
(countably many) blocks:
ℬ = {{𝜔}: 𝜔 ∈ ℤ} ∪ {ℝ\ℤ} = {… {−2}, {−1}, {0}, {1}, {2}, … , ℝ\ℤ}.
Therefore, a function 𝑓: 𝛺 → ℝ is measurable if it is constant on the non-integers ℝ\ℤ. That is,

𝑓: 𝛺 → ℝ is measurable if and only if there exist a function 𝑔: ℤ → ℝ and a constant 𝑐 ∈ ℝ such
that
𝑔(𝜔) 𝜔∈ℤ
𝑓(𝜔) = { .
𝑐 𝜔 ∈ ℝ\ℤ
If the 𝜎-algebra ℱ is not generated by countably many blocks, it is generally difficult to prove that
a function 𝑓 is measurable. The difficulty is caused by the fact that we don't have a general form of
an arbitrary Borel set 𝐵. (𝐵 could be an interval, a countably infinite set, or even the irrational
numbers.) Thus, checking the pre-image condition for arbitrary Borel sets might be challenging.
The next theorem tells us that we do not need to check this condition for every Borel set, but it is
enough to check the condition for a sub-collection that generates ℬ(ℝ).

11
Theorem 2
Let (𝛺, ℱ) and (𝐸, ℰ) be measurable spaces and 𝒜 ⊆ ℰ be a collection of subsets of 𝐸 such that
𝜎(𝒜) = ℰ . A function 𝑓: 𝛺, ℱ → (𝐸, ℰ) is ℱ/ℰ -measurable if and only if for each 𝐵 ∈ 𝒜,
𝑓 −1 (𝐵) ∈ ℱ .
As a corollary, the following are equivalent:
1 𝑓 is Borel measurable
2 {𝑓 < 𝑐} ≔ 𝑓 −1 ((−∞, 𝑐)) ∈ ℱ for every 𝑐 ∈ ℝ
3 {𝑓 ≤ 𝑐} ≔ 𝑓 −1 ((−∞, 𝑐]) ∈ ℱ for every 𝑐 ∈ ℝ
4 {𝑓 > 𝑐} ≔ 𝑓 −1 ((𝑐, ∞)) ∈ ℱ for every 𝑐 ∈ ℝ
5 {𝑓 ≥ 𝑐} ≔ 𝑓 −1 ([𝑐, ∞)) ∈ ℱ for every 𝑐 ∈ ℝ
6 {𝑎 < 𝑓 < 𝑏} ≔ 𝑓 −1 ((𝑎, 𝑏)) ∈ ℱ for every 𝑎, 𝑏 ∈ ℝ
7 {𝑎 ≤ 𝑓 < 𝑏} ≔ 𝑓 −1 ([𝑎, 𝑏)) ∈ ℱ for every 𝑎, 𝑏 ∈ ℝ
8 {𝑎 < 𝑓 ≤ 𝑏} ≔ 𝑓 −1 ((𝑎, 𝑏]) ∈ ℱ for every 𝑎, 𝑏 ∈ ℝ
9 {𝑎 ≤ 𝑓 ≤ 𝑏} ≔ 𝑓 −1 ([𝑎, 𝑏]) ∈ ℱ for every 𝑎, 𝑏 ∈ ℝ
Let 𝛺 = [0,1], ℱ = ℬ([0,1]) ≔ 𝜎({(𝑎, 𝑏): 0 ≤ 𝑎 < 𝑏 ≤ 1}) = Borel subsets of [0,1] and ℙ(𝐴) = 𝜆1 (𝐴)
for each 𝐴 ∈ ℱ. Define 𝑋: 𝛺 → ℝ as
1 1
0≤𝜔<
𝑋(𝜔) ≔ 4 2.
1
𝜔 ≤𝜔≤1
2
Then 𝑋 is a random variable. For each 𝑐 ∈ ℝ, we have
𝛺 𝑐≥1
1
[0, 𝑐] ≤𝑐<1
{𝑋 ≤ 𝑐} = 2
1 1,
[0, ] 0≤𝑐<
2 2
{ ∅ 𝑐<0
which is always a Borel subset of [0,1].
We denote by 𝑚ℱ the set of all measurable functions 𝑓: (𝛺, ℱ) → (ℝ, ℬ(ℝ)).

12
Let (𝛺, 𝐹) be a measurable space and 𝑓, 𝑔, 𝑓𝑛 ∈ 𝑚ℱ be real-valued Borel measurable functions.

Then:
1 𝛼𝑓, 𝑓 + 𝑔, 𝑓 2 , 𝑓𝑔, 𝑓/𝑔 ∈ 𝑚ℱ where 𝛼 ∈ ℝ
2 𝑠𝑢𝑝𝑛 𝑓𝑛 , 𝑖𝑛𝑓𝑛 𝑓𝑛 ∈ 𝑚ℱ (See footnote.)2
3 𝑙𝑖𝑚 𝑠𝑢𝑝 𝑓𝑛 , 𝑙𝑖𝑚 𝑖𝑛𝑓 𝑓𝑛 ∈ 𝑚ℱ

𝑛→∞ 𝑛→∞
4 If 𝑙𝑖𝑚 𝑓𝑛 exists, then 𝑙𝑖𝑚 𝑓𝑛 ∈ 𝑚ℱ .

𝑛→∞ 𝑛→∞
Let (𝛺, ℱ), (𝐸, ℰ) and (𝐺, 𝒢) be measurable spaces and 𝑓: (𝛺, ℱ) → (𝐸, ℰ), 𝑔: (𝐸, ℰ) → (𝐺, 𝒢) be
measurable functions. Then ℎ ≔ 𝑔 ∘ 𝑓: (𝛺, ℱ) → (𝐺, 𝒢) is ℱ/𝒢-measurable.
Let 𝛺 be a set and 𝑋: 𝛺 → ℝ be a function. We define the 𝜎-algebra generated by 𝑋, 𝜎(𝑋), to be

the smallest 𝜎-algebra with respect to which 𝑋 is a random variable.
Consider the random experiment of tossing two fair coins.
The sample space is 𝛺 = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇}.
Define 𝑋: 𝛺 → ℝ by 𝑋(𝜔) ≔ number of heads in 𝜔. We have 𝑋(𝐻𝐻) = 2, 𝑋(𝐻𝑇) = 𝑋(𝑇𝐻) = 1

and 𝑋(𝑇𝑇) = 0. If 𝐵 ∈ ℬ(ℝ), then 𝑋 −1 (𝐵) be one of {𝐻𝐻}, {𝐻𝑇, 𝑇𝐻}, {𝑇𝑇} or a union of these
sets, depending on which of the numbers 0, 1, and 2 the set 𝐵 contains. Thus:
𝜎(𝑋) = 𝜎({{𝐻𝐻}, {𝐻𝑇, 𝑇𝐻}, {𝑇𝑇}}).
That is, 𝜎(𝑋) is generated by the blocks ℬ = {𝑋 −1 ({2}), 𝑋 −1 ({1}), 𝑋 −1 ({0})}.
Let 𝛺 = [0,1] and define 𝑋: 𝛺 → ℝ by

1
2 0≤𝜔<
𝑋(𝜔) ≔ { 2.
1
𝜔 ≤𝜔≤1
2
For each 𝑐 ∈ ℝ we have
2
Strictly speaking, these are extended random variables. See the next set
of notes.

13
𝛺 𝑐≥2
1
[ , 1] 1≤𝑐<2
2
{𝑋 ≤ 𝑐} = 1 1
[ , √𝑐] ≤𝑐≤1
2 4
1
{ ∅ 𝑐< .
4
Thus
1 1 1
𝜎(𝑋) = 𝜎({{𝑋 ≤ 𝑐} : 𝑐 ∈ ℝ}) = 𝜎 ({𝛺, [ , 1] , ∅} ∪ {[ , 𝑏] : ≤ 𝑏 ≤ 1})
2 2 2
1 1
= 𝜎 {{[ , 𝑏] : ≤ 𝑏 ≤ 1}}
2 2
1 1 1
= {𝐵 ⊆ [0,1]: 𝐵 ∈ ℬ ([ , 1]) or 𝐵 = [0, ) ∪ 𝐵′ where 𝐵′ ∈ ℬ ([ , 1])}
2 2 2
1
The set [0, ) is a block of 𝜎(𝑋) since 𝑋 does not distinguish between elements of this set, implying
2
1
that if a function 𝑌: 𝛺 → ℝ is 𝜎(𝑋)-measurable, then 𝑌 must also be constant on [0, ).
2
We end with a powerful result:
Theorem 3:
Doob-Dynkin Lemma
Let 𝛺 be a set and 𝑋, 𝑌: 𝛺 → ℝ be functions. Then 𝑌 is 𝜎(𝑋)-measurable if and only if there exists a
Borel measurable function 𝑔: (ℝ, ℬ(ℝ)) → (ℝ, ℬ(ℝ)) such that 𝑌 = 𝑔(𝑋).

14
Part II
The Law of a Random Variable
Let (Ω, ℱ, ℙ) be a probability space, 𝑋: (Ω, ℱ) → (ℝ, ℬ(ℝ)) be a random variable (i.e. 𝑋 ∈ 𝑚ℱ) and
𝐵 ∈ ℬ(ℝ). To calculate the probability that 𝑋 ∈ 𝐵 , we first find 𝑋 −1 (𝐵) and then evaluate ℙ
(𝑋 −1 (𝐵)). This means that we can define a set function ℙ𝑋 : ℬ(ℝ) → [0,1] by:
ℙ𝑋 (𝐵) = 𝑃(𝑋 −1 (𝐵)), 𝐵 ∈ ℬ(ℝ).
The function ℙ𝑋 is a probability measure on ℬ(ℝ). Indeed,
1 ℙ𝑋 (∅) = ℙ(𝑋 −1 (∅)) = 0.

2 ℙ𝑋 (𝐵) = ℙ(𝑋 −1 (𝐵)) ≥ 0, 𝐵 ∈ ℬ(ℝ).
3 If 𝐵1 , 𝐵2 , … ∈ ℬ(ℝ) are pairwise disjoint elements of ℬ(ℝ), then 𝑋 −1 (𝐵1 ), 𝑋 −1 (𝐵2 ), … are
pairwise disjoint elements of ℱ and
∞ ∞ ∞ ∞ ∞
−1
ℙ𝑥 (⋃ 𝐵𝑛 ) = ℙ (𝑋 (⋃ 𝐵𝑛 )) = ℙ (⋃ 𝑋 −1 (𝐵𝑛 )) = ∑ ℙ(𝑋 −1 (𝐵𝑛 )) = ∑ ℙ𝑋 (𝐵𝑛 ).
𝑛=1 𝑛=1 𝑛=1 𝑛=1 𝑛=1
4 ℙ𝑋 (ℝ) = ℙ(𝑋 −1 (ℝ)) = ℙ(Ω) = 1.
Thus, each random variable induces a measure on ℬ(ℝ). We will call ℙ𝑋 the probability
distribution or Law of 𝑋.
Consider the random experiment of tossing 2 fair coins and let 𝑋 count the number of heads.
Let ℱ = 2Ω and define the probability measure ℙ on (Ω, ℱ) by:
1
ℙ= ∑ 𝛿 .
4 𝜔
𝜔∈Ω
Then 𝑋 is automatically a random variable. It can be shown that:
1 1 1
ℙ𝑋 = δ0 + δ1 + δ2 .
4 2 4
We will later see that if 𝑋 is a “discrete” random variable, then ℙ𝑋 is a linear combination of Dirac
measures.

15
Let Ω = (0,1], ℱ = ℬ((0,1]) and ℙ = 𝜆1.Then 𝑋(𝜔) = 4𝜔 ∀𝜔 ∈ Ω is a random variable.
We now calculate ℙ𝑋 .
First, if 𝐵 = (−∞, 𝑥], then:
ℙ𝑋 ((−∞, 𝑥]) = ℙ({𝑋 ≤ 𝑥}) = ℙ({ω ∈ (0,1]: 4ω ≤ 𝑥}) = ℙ((0,1] ∩ (−∞, 𝑥/4])
0 𝑥 ≤0
𝑥
={ 0<𝑥<4
4
1 𝑥 ≥4.
Now, since Range(𝑋) = (0,4], ℙ𝑋 is concentrated on (0,4] and
λ1 ((0,4] ∩ 𝐵)
ℙ𝑋 (𝐵) = , 𝐵 ∈ ℬ(ℝ).
4
Let 𝑋: (Ω, ℱ, ℙ) → ℝ be a random variable. The Cumulative Distribution Function (CDF) of 𝑋 is the
function 𝐹𝑋 : ℝ → ℝ defined by:
𝐹𝑋 (𝑥) ≔ ℙ𝑋 ((−∞, 𝑥]) = ℙ({𝑋 ≤ 𝑥}), 𝑥 ∈ ℝ.
The CDF characterizes the distribution of a random variable. That is, if 𝑋, 𝑌: (Ω, ℱ) → (ℝ, ℬ(ℝ))
are random variables, then:
ℙ𝑋 = ℙ𝑌 ⟺ 𝐹𝑋 = 𝐹𝑌 .
The CDF of any random variable 𝑋 satisfies the following properties:
1 𝐹𝑋 is increasing, i.e. if 𝑥 ≤ then 𝐹𝑋 (𝑥) ≤ 𝐹𝑋 (𝑦).
2 𝐹𝑋 is right continuous, i.e. FX (𝑥 + ) ≔ lim+ 𝐹𝑋 (𝑦) = 𝐹(𝑥).

𝑦→𝑥
3 lim 𝐹𝑋 (𝑥) = 0 and lim 𝐹𝑋 (𝑥) = 1.

𝑥→−∞ 𝑥→∞
The converse of the above is also true. That is, if you start with an arbitrary function that satisfies
the above properties, then that function is a CDF of some random variable.
Let (Ω, ℱ, ℙ) be a probability space and 𝑋 ∈ 𝑚ℱ be a random variable.
1 We say that 𝑋 is discrete iff ℙ𝑋 is concentrated on a countable set {𝑥𝑛 : 𝑛 = 1,2,3, … }.

16
2 We say that 𝑋 is continuous iff 𝐹𝑋 is a continuous function.

3 We say that 𝑋 is absolutely continuous iff 𝐹𝑋 is an absolutely continuous function, in the
sense that for every ϵ > 0 there exists δ > 0 such that if (𝑎1 , 𝑏1 ), … , (𝑎𝑛 , 𝑏𝑛 ) is a finite
collection of disjoint intervals in ℝ with
∑|𝑏𝑖 − 𝑎𝑖 | < 𝛿,
𝑖=1
Then
∑|𝐹(𝑏𝑖 ) − 𝐹(𝑎𝑖 )| < ϵ.

𝑖=1
If 𝑋 is a discrete random variable with ℙ𝑋 concentrated on {𝑥𝑛 : 𝑛 = 1,2,3}, then
ℙ𝑋 = ∑ 𝑝𝑋 (𝑥𝑛 )δ𝑥𝑛 ,
𝑛=1
where the function 𝑝𝑋 : ℝ → ℝ, called the probability mass function (PMF) of 𝑋, is defined by
𝑝𝑋 (𝑥) = ℙ𝑋 ({𝑥}), 𝑥 ∈ ℝ.
The coin toss random variable counting the number of heads is obviously discrete. In fact, every
random variable with a countable range is discrete.

17
Unit 3: Expectation
In this section, we develop the Lebesgue integral of a measurable function 𝑓 with respect to a
measure 𝜇 . In the special case when 𝜇 = ℙ is a probability measure and 𝑓 = 𝑋 is a random
variable, this integral is the expected value of 𝑋.
Our intuition about integration is that it represents “aggregation” or “summing up”. We will see
that by an appropriate choice of 𝜇 and Ω, the integral will cover the many forms of aggregation we
are used to in mathematics, including summation, expectation, and calculating area/volume using
the Riemann integral. The advantage now is that we can look at all these forms of aggregation as
essentially doing the same thing: integrating with respect to a measure. This point of view will help
us prove one theorem that applies to all these special cases. In applications to probability theory, it
will enable us to treat discrete, continuous, and other random variables the same way. There will
be no need, for instance, to prove a theorem for discrete random variables using summation and
then prove the same theorem again for absolutely continuous random variables using integrals.
Let (Ω, ℱ, 𝜇) be a measure space and 𝑓: Ω → ℝ be a measurable function. In this section, we want
to define the integral of 𝑓 with respect to the measure 𝜇 , denoted by
∫ 𝑓 𝑑𝜇.
𝛺
When 𝜇 = ℙ is a probability measure and 𝑓 = 𝑋 is a random variable, we will define the

expectation of 𝑋 as
𝔼(𝑋) ≔ ∫ 𝑋 𝑑ℙ.
Ω
For reasons we will see later, it will be useful to define the integral for extended measurable
̅ ≔ ℝ ∪ {∞, −∞}, we define the Borel 𝜎-algebra
function. First, on the extended real numbers ℝ
ℬ(ℝ̅ ) to be
̅ ) ≔ {𝐵 ⊆ ℝ
ℬ(ℝ ̅ : 𝐵 ∩ ℝ ∈ ℬ(ℝ)}
= {𝐵 ⊆ ℝ ̅ ) or 𝐵 = 𝐵′ ∪ {∞} or 𝐵 = 𝐵′ ∪ {−∞} or 𝐵 = 𝐵′ ∪ {∞, −∞}, 𝐵′ ∈ ℬ(ℝ

̅ : 𝐵 ∈ ℬ(ℝ ̅ )}.
̅ , ℬ(ℝ
A function 𝑓: (𝛺, ℱ) → (ℝ ̅ )) that is ℱ/ℬ(ℝ
̅ )-measurable will be called an extended
measurable function. If 𝑓 = 𝑋 is a random variable, we will call 𝑋 an extended random variable.

18
Throughout this chapter we will assume that all measurable functions are extended measurable
functions.
We will carry out the definition in four steps:
1 We first define the integral for the case when 𝑓 = 𝐼𝐴 is an indicator function of a
measurable set 𝐴 ∈ ℱ .
2 We will then extend the definition to when 𝑓 is positive3 and is a (finite) linear combination
of indicators. Such functions are called (positive) simple functions.
3 We then extend the definition to all positive measurable functions and prove useful results
that aid with the calculation of the integral in many concrete examples.
4 Finally, we extend the definition to general measurable (sign-changing) functions 𝑓 and
define the notion of integrability. This will be covered in the next section.
We begin with indicators. Throughout the rest of the chapter, (𝛺, ℱ, 𝜇) is a measure space and
̅ , ℬ(ℝ
𝑓: (𝛺, ℱ) → (ℝ ̅ )) is ℱ/ℬ(ℝ
̅ )-measurable.
If 𝑓 = 𝐼𝐴 for some 𝐴 ∈ ℱ , then we define the integral of 𝑓 with respect to the measure 𝜇 to be
∫ 𝑓 𝑑𝜇 = ∫ 𝐼𝐴 𝑑𝜇 ≔ 𝜇(𝐴).

𝛺 𝛺
In the case of a probability space, we have
𝔼(𝐼𝐴 ) = ∫ 𝐼𝐴 𝑑ℙ ≔ ℙ(𝐴).

𝛺
There is nothing surprising about this definition; it is consistent with our intuition of aggregation.
We move on to simple functions.
̅ is called simple if it has a finite range. We will denote by 𝒮 the set

A measurable function 𝑠: 𝛺 → ℝ
of all (measurable) simple functions. We will also write 𝒮 + for the class of positive simple functions
and 𝑚ℱ + for the class of positive measurable functions. Note that if 𝑠 ∈ 𝒮 with Range(𝑠) =
{𝑠1 , … , 𝑠𝑛 }, then 𝑠 can be written as
𝑛
𝑠 = ∑ 𝑠𝑘 𝐼𝐴𝑘 ,
𝑘=1
3
We should really be saying “non-negative”.

19
where 𝐴𝑘 = 𝑠 −1 {(𝑠𝑘 )} ∈ ℱ for 𝑘 = 1,2, … , 𝑛 are pairwise disjoint. We will call this the canonical
representation of a simple function and assume this representation when we talk about a simple
function.
Let 𝑓 = 𝑠, where 𝑠 ∈ 𝒮 + is a positive simple function with canonical representation
𝑠 = ∑ 𝑠𝑘 𝐼𝐴𝑘 .
𝑘=1
We define the integral of 𝑓 with respect to the measure 𝜇 as
∫ 𝑓 𝑑𝜇 = ∫ 𝑠 𝑑𝜇 ≔ ∑ 𝑠𝑘 𝜇(𝐴𝑘 ).

𝛺 𝛺 𝑘=1
Note that the value of this integral could be ∞, even if 𝑓 is finite. The probability space equivalent
of this definition is that if 𝑋 is a simple random variable with
𝑋 = ∑ 𝑠𝑘 𝐼𝐴𝑘 ,
𝑘=1
then
𝔼(𝑋) = ∫ 𝑋 𝑑ℙ ≔ ∑ 𝑠𝑘 ℙ(𝐴𝑘 ).

𝛺 𝑘=1
Note that if 𝑠 ∈ 𝒮 + is a positive simple function with canonical representation
𝑠 = ∑ 𝑠𝑘 𝐼𝐴𝑘 ,
𝑘=1
then
∫ 𝑠 𝑑𝜇 = ∑ 𝑠𝑘 ∫ 𝐼𝐴𝑘 𝑑𝜇.

𝛺 𝑘=1 𝛺
We extend this definition to all positive measurable functions. Let 𝑓 ∈ 𝑚ℱ + be a positive

measurable function. We define the integral of 𝑓 with respect to 𝜇 as

20
∫ 𝑓 𝑑𝜇 ≔ 𝑠𝑢𝑝 {∫ 𝑠 𝑑𝜇: 𝑠 ∈ 𝒮 + , 0 ≤ 𝑠 ≤ 𝑓}.

𝛺 𝛺
When 𝜇 = ℙ is a probability measure and 𝑓 = 𝑋 is a random variable, then this definition means
𝔼(𝑋) = ∫ 𝑋 𝑑ℙ ≔ 𝑠𝑢𝑝{𝔼(𝑠): 𝑠 ∈ 𝒮 + , 0 ≤ 𝑠 ≤ 𝑋}.

𝛺
If 𝑓 ∈ 𝒮 + , then this definition gives the same result as the previous definition of the integral for
positive simple functions. That is
∫ 𝑓 𝑑𝜇 ≔ 𝑠𝑢𝑝 {∫ 𝑠 𝑑𝜇: 𝑠 ∈ 𝒮 + , 0 ≤ 𝑠 ≤ 𝑓},

𝛺 𝛺
even when 𝑓 ∈ 𝒮 + .
If 𝑓, 𝑔 ∈ 𝑚ℱ + are such that 𝑓(𝜔) ≤ 𝑔(𝜔) for every 𝜔 ∈ 𝛺 , then
∫ 𝑓 𝑑𝜇 ≤ ∫ 𝑔 𝑑μ.
𝛺 𝛺
This definition is not user-friendly. We would like to avoid having to use it to calculate integrals in
concrete cases. The following two results will help us do just that. First, we need some definitions.
Let (𝑓𝑛 ) be a sequence of measurable functions. We say that (𝑓𝑛 ) is increasing if for each 𝜔 ∈ 𝛺
and 𝑛 ≥ 1, 𝑓𝑛 (𝜔) ≤ 𝑓𝑛+1 (𝜔). If 𝑓 ∈ 𝑚ℱ , we say that (𝑓𝑛 ) converges point-wise to 𝑓 if for each
𝜔 ∈ 𝛺, 𝑙𝑖𝑚 𝑓𝑛 (𝜔) = 𝑓(𝜔). We will simply write 𝑓𝑛 → 𝑓 to denote point-wise convergence of (𝑓𝑛 )
𝑛→∞
to 𝑓 .
Here's the big theorem:

21
Theorem 4:
Monotone Convergence Theorem (MON), Beppo-Levi
Let 𝑚ℱ + and (𝑓𝑛 ) be an increasing sequence of positive measurable functions that converges to 𝑓
point-wise. Then
∫ 𝑓 𝑑𝜇 = 𝑙𝑖𝑚 ∫ 𝑓𝑛 𝑑𝜇.

𝛺 𝑛→∞ 𝛺
That is,
𝑙𝑖𝑚 ∫ 𝑓𝑛 𝑑𝜇 = ∫ 𝑙𝑖𝑚 𝑓𝑛 𝑑𝜇.

𝑛→∞ 𝛺 𝛺 𝑛→∞
The probability space equivalent statement is that if (𝑋𝑛 ) is an increasing sequence of non-
negative random variables that converges point-wise to a non-negative random variable 𝑋, then
𝑙𝑖𝑚 𝔼(𝑋𝑛 ) = 𝔼 ( 𝑙𝑖𝑚 𝑋𝑛 ) = 𝔼(𝑋).

𝑛→∞ 𝑛→∞
This result says that we can interchange between taking limits and evaluating integrals in the case
when the sequence of functions is increasing and positive. We will see other theorems of this
nature in the next section. The result will be helpful in calculating the integral of 𝑓 ∈ 𝑚ℱ + if we
already know how to calculate the integrals of the 𝑓𝑛 's. In other words, we need to be able to find a
sequence (𝑓𝑛 ) such that
1 (𝑓𝑛 ) is increasing,
2 𝑓𝑛 ∈ 𝑚ℱ + for each 𝑛 ≥ 1,
3 (𝑓𝑛 ) converges point-wise to 𝑓, and
4 ∫𝛺 𝑓𝑛 𝑑 is easy to calculate.
Here is one case where all these requirements are met.
Theorem 5:
Simple Function Approximation
If 𝑓 ∈ 𝑚ℱ + , then there exists an increasing sequence of positive simple functions (𝑠𝑛 ) such that
(𝑠𝑛 ) converges point-wise to 𝑓 .

22
It then follows that
∫ 𝑓 𝑑𝜇 = 𝑙𝑖𝑚 ∫ 𝑠𝑛 𝑑𝜇,

𝛺 𝑛→∞ 𝛺
by (MON).
Let us see this theorem in action. Consider the measure space ([0,1], ℬ([0,1]), 𝜆1 ) and 𝑓: [0,1] →
̅ defined by 𝑓(𝑥) = 𝑥 . We have
ℝ
𝑛2𝑛 −1
𝑘
𝑠𝑛 = ∑ 𝐼 𝑘 𝑘+1 + 𝑛𝐼{𝑓≥𝑛} ,
2𝑛 {2𝑛 ≤𝑓< 2𝑛 }
𝑘=0
where
𝑘 𝑘+1 𝑘 𝑘+1 𝑘 𝑘+1
{ 𝑛
≤ 𝑓 < 𝑛 } = {𝑥 ∈ [0,1]: 𝑛 ≤ 𝑓(𝑥) < 𝑛 } = [0,1] ∩ [ 𝑛 , 𝑛 ),
2 2 2 2 2 2
which is a null set when 𝑘 ≥ 2𝑛 . Hence,
2𝑛 −1 2𝑛 −1
𝑘 𝑘
𝑠𝑛 = ∑ 𝑛 𝐼 𝑘 𝑘+1 = ∑ 𝑛 𝐼 𝑘 𝑘+1
2 {2𝑛≤𝑓< 2𝑛 } 2 [2𝑛, 2𝑛 )
𝑘=0 𝑘=0
2𝑛 −1 2𝑛 −1
𝑘 𝑘 𝑘+1 𝑘 1 1
∫ 𝑠𝑛 𝑑𝜆1 = ∑ 𝑛 𝜆1 ([ 𝑛 , 𝑛 )) = ∑ 2𝑛 = (1 − 𝑛 ).
[0,1] 2 2 2 2 2 2
𝑘=0 𝑘=0
Hence, (MON) gives
1 1 1
∫ 𝑓 𝑑𝜇 = 𝑙𝑖𝑚 ∫ 𝑠𝑛 𝑑𝜇 = 𝑙𝑖𝑚 (1 − 𝑛 ) = .
[0,1] 𝑛→∞ [0,1] 𝑛→∞ 2 2 2
We will see later that in general, if 𝑓: [𝑎, 𝑏] → ℝ is Riemann-integrable, then
𝑏
∫ 𝑓 𝑑𝜆1 = ∫ 𝑓(𝑥)𝑑𝑥 .
[𝑎,𝑏] 𝑎
That is, integrating with respect to the Lebesgue measure is the same as ordinary Riemann
integration when 𝑓 is Riemann integrable on a bounded interval.

23
Also note that ([0,1], ℬ([0,1]), 𝜆1 ) is actually a probability space, therefore 𝑓 is random variable.
It is easy to show that 𝑓 ∼ 𝑈(0,1), which is consistent with the fact the integral represents the
expected value of 𝑓 .
We now extend the definition of the integral to functions in 𝑚ℱ that may change sign. If 𝑓 ∈ 𝑚ℱ ,
we define the positive part of 𝑓 as the function 𝑓 + ≔ 𝑚𝑎𝑥{𝑓, 0}, and the negative part of 𝑓 as
𝑓 − ≔ 𝑚𝑎𝑥{−𝑓, 0}. It is easy to show that both 𝑓 + , 𝑓 − ∈ 𝑚ℱ + , 𝑓 = 𝑓 + 𝑓 − and |𝑓| = 𝑓 + + 𝑓 − .
Let 𝑓 ∈ 𝑚ℱ . We define the integral of 𝑓 with respect to 𝜇 as
∫ 𝑓 𝑑𝜇 ≔ ∫ 𝑓 + 𝑑𝜇 − ∫ 𝑓 − 𝑑𝜇,

𝛺 𝛺 𝛺
provided at least one of the integrals on the right is finite. In that case, we say that the integral
exists, otherwise, we say that the integral does not exist. We will say that 𝑓 is integrable if both
integrals on the right are finite, i.e., if
∫ |𝑓| 𝑑𝜇 < ∞.

𝛺
We will denote by ℒ 1 ≔ ℒ 1 (𝛺, ℱ, 𝜇) the set of all integrable functions. In general, if 𝑝 ∈ [1, ∞),
we define
ℒ 𝑝 ≔ ℒ 𝑝 (𝛺, ℱ, 𝜇) ≔ {𝑓 ∈ 𝑚ℱ: ∫ |𝑓|𝑝 𝑑𝜇 < ∞}.

𝛺
We will also be interested in integrating functions over subsets of 𝛺 . If 𝐴 ∈ ℱ , we define the

integral of 𝑓 over 𝐴 as
∫ 𝑓 𝑑𝜇 ≔ ∫ 𝑓𝐼𝐴 𝑑𝜇,

𝐴 𝛺
provided the integral exists. If 𝜇 = ℙ is a probability space and 𝑓 = 𝑋 is a random variable, we will
write this as
𝔼(𝑋; 𝐴) ≔ 𝔼(𝑋𝐼𝐴 ).
Here is a powerful theorem that allows us to interchange limits and integrals.

24
Theorem 6:
Lebesgue Dominated Convergence Theorem (DCT)
Let (𝑓𝑛 ) be a sequence of measurable functions such that there exists 𝑔 ∈ ℒ 1 ∩ 𝑚ℱ + such that
|𝑓𝑛 (𝜔)| ≤ 𝑔(𝜔) for every 𝜔 ∈ 𝛺 and 𝑛 ≥ 1. If 𝑓 ∈ 𝑚ℱ and 𝑙𝑖𝑚 𝑓𝑛 (𝜔) = 𝑓(𝜔) for every 𝜔 ∈ 𝛺,
𝑛→∞
then
1 𝑓 ∈ ℒ1,
2 𝑙𝑖𝑚 ∫ |𝑓𝑛 − 𝑓| 𝑑𝜇 = 0, and

𝑛→∞ 𝛺
3 ∫𝛺 𝑓 𝑑𝜇 = 𝑙𝑖𝑚 ∫𝛺 𝑓𝑛 .
𝑛→∞
In the language of probability, the theorem says that if (𝑋𝑛 ) is a sequence of random variables
such that there exists a random variable 𝑌 ∈ ℒ 1 ∩ 𝑚ℱ + such that |𝑋𝑛 (𝜔)| ≤ 𝑌(𝜔) for every 𝜔 ∈
𝛺 , 𝑛 ≥ 1, and 𝑋 ∈ 𝑚ℱ is such that 𝑙𝑖𝑚 𝑋𝑛 (𝜔) = 𝑋(𝜔) for every 𝜔 ∈ 𝛺, then
𝑛→∞
1 𝑋 ∈ ℒ1,
2 𝑙𝑖𝑚 𝔼 (|𝑋𝑛 − 𝑋|) = 0, and

𝑛→∞
3 𝑙𝑖𝑚 𝔼 (𝑋𝑛 ) = 𝔼 ( 𝑙𝑖𝑚 𝑋𝑛 ) = 𝔼(𝑋).

𝑛→∞ 𝑛→∞
Here is a very important special case of how to evaluate Lebesgue integrals:
Theorem 7
Let 𝑓: [𝑎, 𝑏] → ℝ be bounded.
1 𝑓 is Riemann integrable if and only if 𝑓 is continuous 𝜆1 -almost everywhere.

2 If 𝑓 is Riemann integrable, then it is Lebesgue integrable and
𝑏
∫ 𝑓(𝑥) 𝑑𝑥 = ∫ 𝑓 𝑑𝜆1 .
𝑎 [𝑎,𝑏]
On (ℕ, 2ℕ , #), integration reduces to summation:

25
∫ 𝑓 𝑑# = ∑ 𝑓(𝑛),
ℕ 𝑛∈ℕ
provided the series converges.
For any set 𝛺 and 𝑎 ∈ 𝛺 , we have
∫ 𝑓 𝑑𝛿𝑎 = 𝑓(𝑎).
𝛺
Finally, we mention that if 𝜇 = ∑𝑘 𝑎𝑘 𝜇𝑘 , where 𝑎𝑘 ≥ 0, we have
∫ 𝑓 𝑑𝜇 = ∑ 𝑎𝑘 ∫ 𝑓 𝑑𝜇𝑘 ,
𝛺 𝑘 𝛺
provided 𝑓 is 𝜇 -integrable.

26
Unit 4: The Radon-Nikodym Theorem

μ
Let 𝑓 ∈ 𝑚ℱ + and define ν𝑓 : ℱ → ℝ by
μ
ν𝑓 (𝐴) = ∫ 𝑓 𝑑μ, 𝐴 ∈ ℱ.
𝐴
𝝁 𝝁
Then 𝝂𝒇 is a positive measure (exercise). Also, if 𝝁(𝑨) = 𝟎 then 𝝂𝒇 (𝑨) = 0. Likewise, if (Ω, ℱ, ℙ) is
a probability space and 𝑓 ∈ 𝑚ℱ + is a non-negative random variable with 𝔼(𝑋) = 1, then the
function Q : ℱ → [0,1] defined by
𝑄(𝐴) = 𝔼(𝑋; 𝐴), 𝐴∈ℱ
is a probability measure such that for every 𝐴 ∈ ℱ, ℙ(𝐴) = 0 ⟹ 𝑄(𝐴) = 0.
Let μ, ν be positive measures on (Ω, ℱ).
1 We say that 𝜈 is absolutely continuous with respect to 𝜇 (written 𝑣 ≪ 𝜇) if for each 𝐴 ∈

ℱ, 𝜇(𝐴) = 0 ⟹ 𝑣(𝐴) = 0.
2 We say that 𝜇 and 𝜈 are equivalent (written μ ≡ 𝑣) if 𝑣 ≪ 𝜇 and 𝜇 ≪ 𝑣.
3 We say that 𝜈 and 𝜇 are mutually singular (written ν ⊥ μ) if there exists 𝐴 ∈ ℱ such that
ν(𝐴) = μ(𝐴𝑐 ) = 0.
1 On (ℝ, 𝐿(ℝ)), 𝛿0 ⊥ 𝜆1 since 𝜆1 ({0}) = 𝛿0 (ℝ\{0}) = 0.

2 If (Ω, ℱ, 𝜇) is a measurable space, then μ ≪ #, since #(𝐴) = 0 ⟹ 𝐴 = ∅ ⟹ μ(𝐴) = 0.
3 If (Ω, ℱ, 𝜇) is a measure space, then for any constants 𝑎, 𝑏 > 0, 𝑎μ ≡ 𝑏μ.
4 Note that in general, if μ = ∑∞
𝑘=1 𝑎𝑘 μ𝑘 where 𝑎𝑘 > 0 for each 𝑘 ≥ 1, then μ𝑘 ≪ 𝜇 for every
≥ 1.
𝜇
We have seen above that 𝜈𝑓 ≪ 𝜇. The Radon-Nikodym Theorem tells us that all measures that are
absolutely continuous with respect to 𝜇 arise in this way (at least in the 𝜎 −finite case).

27
Theorem 8:
Radon-Nikodym
Let (𝛺, ℱ) be a measurable space, and 𝜇, 𝜈 be two 𝜎 −finite measures on (𝛺, ℱ) such that 𝑣 ≪ 𝜇.
Then there exists 𝑓 ∈ 𝑚ℱ + such that 𝜈 = 𝜈𝑓𝜇 , in the sense that
ν(𝐴) = ∫ 𝑓 𝑑μ, 𝐴 ∈ ℱ.
𝐴
μ
Furthermore, if 𝑔 ∈ 𝑚ℱ + is such that ν = ν𝑔 , then 𝑓 = 𝑔 𝜇 − 𝑎. 𝑒.. Thus, 𝑓 is unique up to a null set.
𝜇
The function 𝒇 such that ν = 𝜈𝑓 is called the Radon-Nikodym derivative or density of 𝜈 with
respect to 𝜇 and is denoted by
𝑑ν
𝑓 =: .
𝑑μ
1 If (Ω, ℱ, 𝜇) is a measure space, then define (for any constant 𝑎 > 0), 𝜈: = 𝑎𝜇. We have that
𝑣 ≪ 𝜇 and
𝑑𝜈
= 𝑎, 𝜇 − 𝑎 .𝑒
𝑑𝜇
To see this, note that for each 𝐴 ∈ ℱ,
𝜈(𝐴) = 𝑎 𝜇(𝐴) = 𝑎 ∫ 1 𝑑𝜇 = ∫ 𝑎 𝑑𝜇.

𝐴 𝐴
2 If (Ω, ℱ, 𝜇) is a measure space and 𝐹 ∈ ℱ, define 𝜈(𝐴) : = 𝜇(𝐴 ∩ 𝐹) for every 𝐴 ∈ ℱ. Then 𝜈
is also a measure on (Ω, ℱ) and 𝑣 ≪ 𝜇 with
𝑑𝜈
= 𝐼𝐹 , 𝜇 − 𝑎. 𝑒..
𝑑𝜇

28
Theorem 9
Let (𝛺, ℱ) be a measurable space and 𝜈, 𝜇 be measures with 𝑣 ≪ 𝜇. If 𝑔 ∈ ℒ 1 (𝛺, ℱ, 𝑣), then
𝑔𝑑𝜈
𝑑𝜇
∈ ℒ1 (𝛺, ℱ, 𝜇) and
𝑑𝜈
∫ 𝑔 𝑑𝜈 = ∫ 𝑔 𝑑𝜇 , for every 𝑨 ∈ 𝓕.
𝐴 𝐴 𝑑𝜇
Let (Ω, ℱ) be a measurable space and 𝜇, 𝜈 and 𝜆 be measures.
1 If 𝑣 ≪ 𝜇 and 𝜆 ≪ 𝜇, then 𝑎𝑣 + 𝑏𝜆 ≪ 𝜇 for every 𝑎, 𝑏 ≥ 0. Furthermore,
𝑑(𝑎𝑣 + 𝑏𝜆) 𝑑𝑣 𝑑𝜆
=𝑎 +𝑏 , 𝜇 − 𝑎. 𝑒.
𝑑𝜇 𝑑𝜇 𝑑𝜇
2 If 𝜆 ≪ 𝑣 and 𝑣 ≪ 𝜇, then 𝜆 ≪ 𝜇 and
𝑑𝜆 𝑑𝜆 𝑑𝑣
= .
𝑑𝜇 𝑑𝑣 𝑑𝜇
𝑑𝑣 𝑑𝜇
3 If 𝑣 ≡ 𝜇, then > 0 , 𝜇 − 𝑎. 𝑒. , > 0 𝑣 − 𝑎. 𝑒. , and
𝑑𝜇 𝑑𝑣
𝑑𝜇 1
= 𝑣 − 𝑎. 𝑒.
𝑑𝑣 𝑑𝑣
𝑑𝜇
A random variable 𝑋 is absolutely continuous if and only if ℙ𝑋 ≪ 𝜆1 .

29
MScFE 620 Discrete-time Stochastic Processes - Module 1: Problem Set
Problem Set
Problem 1
Consider the function 𝑓: ℝ → ℝ defined by
𝑓(𝑥) = (𝑥 + 2)3 .
What is the set 𝑓 −1 ((−1,1))?
Solution:
We just have to compute the inverse of the following function:
𝑦 = 𝑓(𝑥) = (𝑥 + 2)3 .
First, we need to isolate 𝑥 ,
𝑥 = 𝑦1/3 − 2.
Now, we get the value for the extremes of the given set, 𝑓 −1 ((−1,1)). For −1, we have 𝑦 1/3 −
2 = −1 − 2 = −3 and for 1 we get 𝑦1/3 − 2 = 1 − 2 = −1. Thus, the solution is the interval
(−3, −1).
Problem 2
Consider the measure space ([0,1], ℬ([0,1]), 𝜇 ≔ 𝜆1 + 2𝛿0 ) and the measurable function
𝑓(𝜔) = 2𝜔. Find
∫ 𝑓 𝑑𝜇.
[0,1]
Solution:
We will need to apply Theorem 7,
Let 𝑓: [𝑎, 𝑏] → ℝ be bounded.

1 𝑓 is Riemann integrable if and only if 𝑓 is continuous 𝜆1 -almost everywhere.
2 If 𝑓 is Riemann integrable, then it is Lebesgue integrable and

30
𝑏
∫ 𝑓(𝑥) 𝑑𝑥 = ∫ 𝑓 𝑑𝜆1 .
𝑎 [𝑎,𝑏]
We will need to apply this together with the following property: For any set 𝛺 and 𝑎 ∈ 𝛺 , we have
∫ 𝑓 𝑑𝛿𝑎 = 𝑓(𝑎).
𝛺
Thus,
∫ 𝑓 𝑑𝜇 = ∫ 2𝑥 (𝑑𝜆1 + 2𝑑𝛿0 ) = ∫ 2𝑥 𝑑𝜆1 + ∫ 4𝑥 𝑑𝛿0 .

[0,1] [0,1] [0,1] [0,1]
We can solve the first integral applying Theorem 7 (see above):
1
∫ 2𝑥 𝑑𝜆1 = ∫ 2𝑥 𝑑𝑥 = 1.
[0,1] 0
On the other hand, the second integral will be solved as follows,
∫ 4𝑥 𝑑𝛿0 = 𝑓(0) = 4 ∗ 0 = 0.
[0,1]
Thus, the final solution is,
∫ 𝑓 𝑑𝜇 = ∫ 2𝑥 (𝑑𝜆1 + 2𝑑𝛿0 ) = ∫ 2𝑥 𝑑𝜆1 + ∫ 4𝑥 𝑑𝛿0 = 1 + 0 = 1.

[0,1] [0,1] [0,1] [0,1]
Problem 3
Consider the measurable space (ℝ, ℬ(ℝ)) and the random variable 𝑋(𝜔) = |𝜔|. Define the
measures ℙ and ℙ∗ as
𝑑 ℙ∗
ℙ ≔ 0.3𝛿0 + 0.4𝛿1 + 0.3𝛿2 , = 𝑋.
𝑑ℙ
Then what is ℙ∗ ([0,1])?
Solution:
First of all we should compute ℙ∗ as follows:

31
ℙ∗ = ℙ ∗ 𝑋 = 0.3𝛿0 𝑋(0) + 0.4𝛿1 𝑋(1) + 0.3𝛿2 𝑋(2).
Thus, as we are interested in the interval ℙ∗ ([0,1]), we need to find the probability of 𝜔 = 0 and
𝜔 = 1.
ℙ∗ ([0,1]) = 0.3𝛿0 𝑋(0) + 0.4𝛿1 𝑋(1) = 0.3 ∗ 0 + 0.4 ∗ |1| = 0.4.
The solution is 0.4.
Problem 4
Consider the measurable space (ℝ, ℬ(ℝ)) and the random variables 𝑋 and 𝑌 by 𝑋(𝜔) =
2𝐼(−∞,0] (𝜔) and 𝑌(𝜔) = 𝜔2 . Define the measures ℙ and ℙ∗ as
𝑑ℙ∗
ℙ ≔ 0.3𝛿0 + 0.5𝛿1 + 0.2𝛿−1 , ≔ 𝑋.
𝑑ℙ
Then, compute 𝔼∗ (𝑌).
Solution:
First, we need to define the expected value of 𝑌,
𝔼∗ (𝑌) = ∑ 𝑌(𝜔) ∗ 𝑃∗ (𝜔).

𝜔
Thus, as we already know 𝑌(𝜔) (since it is given), we just need to compute 𝑃∗ (see Problem 3) as
follows:
ℙ∗ = ℙ ∗ 𝑋 = 2 ∗ 0.3𝛿0 + 2 ∗ 0.2𝛿−1 = 0.6𝛿0 + 0.4𝛿−1 .
Finally, we compute 𝔼∗ (𝑌),
𝔼∗ (𝑌) = ∑ 𝑌(𝜔) ∗ 𝑃∗ (𝜔) = 𝑌(0) ∗ 0.6 + 𝑌(−1) ∗ 0.4 = 0.4.

𝜔
The solution is 0.4.

32
Problem 5
Consider the measure space (ℕ, 2ℕ , 𝜇 = #) and the measurable function 𝑓(𝜔) = 𝑒 −𝜔 . Find
∫ 𝑓 𝑑𝜇.
ℕ
Solution:
From the lecture notes we know that on (ℕ, 2ℕ , #), integration reduces to summation:
∫ 𝑓 𝑑# = ∑ 𝑓(𝑛),
ℕ 𝑛∈ℕ
provided the series converges. Thus, our problem could be re-written as follows,
−𝑛
1 𝑛
∫ 𝑓𝑑# = ∑ 𝑓(𝑛) = ∑ 𝑒 = ∑( ) .
ℕ 𝑒
𝑛∈ℕ 𝑛∈ℕ 𝑛∈ℕ
The last expression above is just a geometric series with 𝑟 = 1/𝑒 and, thus the infinity sum is
computed as,
−𝑛
1 𝑛 1
∑𝑒 = ∑( ) = ,
𝑒 1 − 𝑒 −1
𝑛∈ℕ 𝑛∈ℕ
which is the solution of the proposed problem.

33
MScFE 620 Discrete-time Stochastic Processes - Module 1: Collaborative Review Task
Collaborative Review Task
In this module, you are required to complete a collaborative review task, which is designed to test
your ability to apply and analyze the knowledge you have learned during the week.
Brief
A random variable 𝑋 is said to have a standard normal distribution if 𝑋 is absolutely continuous

with density given by
𝑑ℙ𝑋 1 1 2
(𝑥) = 𝑒 −2 𝑥 , 𝑥 ∈ ℝ.
d𝜆1 √2𝜋
Construct (i.e., give an example of) a probability space (Ω, ℱ, ℙ) and a random variable 𝑋: Ω → ℝ on
(Ω, ℱ, ℙ) such that 𝑋 has a standard normal distribution. In your example, be sure to verify that 𝑋
does indeed have a standard normal distribution.

34

Mscfe XXX (Course Name) - Module X: Collaborative Review Task

Caricato da

Informazioni sul documento

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Mscfe XXX (Course Name) - Module X: Collaborative Review Task

Caricato da

Copyright:

Formati disponibili

MScFE xxx [Course Name] - Module X: Collaborative Review Task

Module 1: Probability Theory ....................................................................... 3

Unit 1: Sigma-Algebras and Measures............................................................................. 4

Unit 2: Random Variables......................................................................................................9

Unit 3: Expectation ................................................................................................................. 18

Unit 4: The Radon-Nikodym Theorem .......................................................................... 27

Problem Set .............................................................................................................................. 30

Collaborative Review Task ................................................................................................ 34

© 2019 - WorldQuant University – All rights reserved.

Module 1: Probability Theory

Module 1 introduces the measure-theoretic foundations of probability. It begins with an

© 2019 - WorldQuant University – All rights reserved.

Unit 1: Sigma-Algebras and Measures

Ω = {HHH, HHT, … , TTT} = {𝜔 = 𝜔1 𝜔2 𝜔3 : 𝜔𝑖 = H or T for 𝑖 = 1,2,3},

A = {HHT, HTH, THH}.

𝒜2. For every 𝐴 ⊆ Ω, 𝐴 ∈ ℱ ⟹ 𝐴𝑐 ∈ ℱ;

𝒜3. For every 𝐴, 𝐵 ⊆ Ω, 𝐴, 𝐵 ∈ ℱ ⟹ 𝐴 ∪ 𝐵 ∈ ℱ;

𝜎𝒜4. If {𝐴𝑛 : 𝑛 = 1,2,3, … } is a collection of subsets of Ω with 𝐴𝑛 ∈ 𝐹 for each 𝑛 ≥ 1, then

© 2019 - WorldQuant University – All rights reserved.

The pair (Ω, ℱ) is called a measurable space.

ℙ2. ℙ(𝐴) ≥ 0 for every 𝐴 ∈ ℱ;

ℙ3. If {𝐴𝑛 : 𝑛 = 1,2,3, … } is a countable collection of pairwise disjoint events, then

The triple (Ω, ℱ, ℙ) is called a probability space.

o The above example can be extended to arbitrary dimensions to construct the 𝑁-

© 2019 - WorldQuant University – All rights reserved.

o Mass for quantities of matter.

o Cardinality of sets also satisfies the same properties.

3 Let 𝐴 ⊆ Ω, then ℱ𝐴 = {𝐴, 𝐴𝑐 , ∅, Ω} is a 𝜎-algebra (and, therefore, an algebra) on Ω. We call

4 For a concrete example, let Ω = {1,2,3,4} and

ℱ = {{1}, {2,3}, {4}, {2,3,4}, {1,2,3}, {1,4}, ∅, Ω}.

One can easily check that ℱ is a 𝜎-algebra on Ω.

© 2019 - WorldQuant University – All rights reserved.

Now we consider more complicated examples.

1 Let Ω = ℝ and choose ℱ = {𝐴 ⊆ ℝ: for every 𝑥 ∈ ℝ, 𝑥 ∈ 𝐴 ⟺ −𝑥 ∈ 𝐴} to be the collection

2 Let Ω = ℝ again and define

𝒜 = {𝐴 ⊆ ℝ: 𝐴 =∪𝑛𝑘=1 𝐼𝑘 for some intervals 𝐼𝑘 ⊆ ℝ, 𝑘 = 1,2, … , 𝑛, 𝑛 ∈ ℕ+ }

3 Let Ω be an infinite set. Define:

𝒜 ≔ {𝐴 ⊆ Ω: 𝐴 is finite or 𝐴𝑐 is finite}.

Then 𝒜 is an algebra but not a 𝜎-algebra.

4 Let Ω be an infinite set. Define:

ℱ ≔ {𝐴 ⊆ Ω: 𝐴 is countable or 𝐴𝑐 is countable}.

Here are some examples of measures.

then δ𝑎 is a probability measure on (Ω, ℱ), called the Dirac measure.

2 The Counting Measure. Let Ω be a set and define #: 2Ω → ℝ

© 2019 - WorldQuant University – All rights reserved.

The N-dimensional analog of Lebesgue measure is a measure λ𝑁 on some subsets of ℝ𝑁

Let 𝒜 ⊆ 2Ω be a collection of subsets of Ω. In general, 𝒜 will not be a 𝜎-algebra. What we want is

ℬ(ℝ) = 𝜎 ({𝐼 ∶ 𝐼 ⊆ ℝ is an interval}).

© 2019 - WorldQuant University – All rights reserved.

Unit 2: Random Variables

We define a random variable as a function 𝑋: Ω → ℝ such that:

𝑋 −1 (𝐵) ∈ ℱ for every 𝐵 ∈ ℱ.

© 2019 - WorldQuant University – All rights reserved.

Here are some examples:

which is always in ℱ if and only if 𝐴 ∈ ℱ.

2 On (𝛺, 2𝛺 ), every function 𝑓: 𝛺 → ℝ is measurable, since 𝑓 −1 (𝐵) ⊆ 𝛺 for each 𝐵 ∈ ℬ(ℝ).

© 2019 - WorldQuant University – All rights reserved.

Let (𝛺, ℱ) be a measurable space, where ℱ is generated by countably many blocks ℬ =

𝑓(𝜔) = ∑ 𝑎𝑛 𝐼𝐵𝑛 (𝜔) , 𝜔 ∈ 𝛺,

where 𝑎𝑛 is the value of 𝑓 on 𝐵𝑛 – i.e. 𝐵𝑛 ⊆ 𝑓 −1 ({𝑎𝑛 }).

ℬ = {{𝜔}: 𝜔 ∈ ℤ} ∪ {ℝ\ℤ} = {… {−2}, {−1}, {0}, {1}, {2}, … , ℝ\ℤ}.

Therefore, a function 𝑓: 𝛺 → ℝ is measurable if it is constant on the non-integers ℝ\ℤ. That is,

© 2019 - WorldQuant University – All rights reserved.