
©2019 - WorldQuant University - All rights reserved.

Revised: 06/06/2019
Table of Contents

Module 2: Stochastic Processes

Unit 1: What is a Stochastic Process?
Unit 1: Notes
Unit 1: Video Transcript

Unit 2: Filtrations
Unit 2: Notes
Unit 2: Video Transcript

Unit 3: Stopping Times and Stopped Processes
Unit 3: Notes
Unit 3: Video Transcript

Unit 4: Conditional Expectation
Unit 4: Notes
Unit 4: Video Transcript

Unit 5: Martingales and Markov Processes
Unit 5: Notes
Unit 5: Problem Set

Bibliography

Collaborative Review Task


Module 2: Stochastic Processes

Module 2 covers the basics of stochastic processes from a probability theory perspective. It
begins with a thorough definition of stochastic processes and their adaptation to filtrations, and
then continues with a discussion on conditional expectations of random variables. The module
concludes with the definition of martingales and the Markov process.


Unit 1: What is a Stochastic Process?

Unit 1: Notes
A random variable 𝑋 is a real-valued function of the outcome of a random experiment. Each
outcome 𝜔 ∈ 𝛺 leads to exactly one value 𝑋(𝜔) ∈ ℝ. We now want to take this a step further
and consider the case when each outcome of the random experiment determines the values of
a collection of random variables {𝑋𝑡 : 𝑡 ∈ 𝕀}. We will call such a collection of random variables a
stochastic process and the set 𝕀 the index set. We only consider 𝕀 ⊆ [0, ∞) and think of each 𝑡 ∈
𝕀 as a time point.

Let (𝛺, ℱ, ℙ) be a probability space and ∅ ≠ 𝕀 ⊆ [0, ∞) be an index set. A stochastic process 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} is a collection of random variables. We call 𝕀 the index set. If 𝕀 ⊆ ℕ, we call 𝑋 a discrete-time stochastic process, while if 𝕀 = [0, ∞) or 𝕀 = [0, 𝑇] for some 𝑇 > 0, we call 𝑋 a continuous-time stochastic process. This course deals mainly with discrete-time stochastic processes.

Here are some real-world examples.


1 Let 𝑋𝑡 be the value of the S&P 500 index at time 𝑡. Here, the index set could be taken to be
continuous, or discrete if the prices are observed at discrete time intervals – e.g. daily.
Each price 𝑋𝑡 is a random variable, and one outcome 𝜔 determines the values, or prices,
of 𝑋𝑡 for all 𝑡 ∈ 𝕀.

2 In gambling, if 𝑋𝑛 is your wealth (total earnings) after you have played the 𝑛𝑡ℎ game, then

𝑋 = {𝑋𝑛 : 𝑛 = 0,1,2,3, … },

is a discrete-time stochastic process.

The important thing to note here is that one outcome 𝜔 ∈ 𝛺 must determine the values of all
the random variables {𝑋𝑡 : 𝑡 ∈ 𝕀}. For a fixed outcome 𝜔 ∈ 𝛺, we call the function 𝑡 ↦ 𝑋𝑡 (𝜔) a
sample path of 𝑋. Thus, a stochastic process is a “sample path-valued” random variable.


That is, we can think of a stochastic process as a function 𝑋: Ω → {sample paths}, that outputs a
path (a function from 𝕀 to ℝ) for each outcome, rather than one real number (as is the case for
random variables).

Here is an example, the so-called renewal process. Let (Ω, ℱ, ℙ) be a probability space and
𝑇1 , 𝑇2 , 𝑇3 , … be a sequence of positive independent and identically distributed random variables in
ℒ 1 . Think of 𝑇𝑛 as the time between the occurrence of the (𝑛 − 1)𝑡ℎ and 𝑛𝑡ℎ events (the inter-arrival
time). Define the arrival times 𝑊1 , 𝑊2 , 𝑊3 , … as

𝑊𝑛 ≔ ∑_{𝑘=1}^{𝑛} 𝑇𝑘 ,   𝑛 ≥ 1.

The stochastic process 𝑋 = {𝑋𝑡 : 𝑡 ≥ 0}, defined by

𝑋𝑡 ≔ number of 𝑊𝑛 's less than or equal to 𝑡 = ∑_{𝑛=1}^{∞} 𝐼{𝑊𝑛 ≤ 𝑡} ,   𝑡 ≥ 0,

is called a renewal process. A Poisson process is a special case when each 𝑇𝑛 has an
exponential distribution with common parameter 𝜆 > 0.
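As a quick illustration, here is a minimal Python sketch (an illustrative aside, not part of the original notes; it assumes NumPy is available) that simulates the renewal count 𝑋𝑡 with exponential inter-arrival times, i.e. the Poisson special case, and checks that the average count over many sample paths is close to 𝜆𝑡.

import numpy as np

def renewal_count(t, interarrival_sampler, max_events=10_000):
    # Return X_t = number of arrival times W_n <= t along one sample path.
    total, count = 0.0, 0
    for _ in range(max_events):
        total += interarrival_sampler()      # T_n, the n-th inter-arrival time
        if total > t:                        # W_n = T_1 + ... + T_n has passed t
            break
        count += 1
    return count

rng = np.random.default_rng(0)
lam, t = 2.0, 5.0
counts = [renewal_count(t, lambda: rng.exponential(1 / lam)) for _ in range(10_000)]
print(sum(counts) / len(counts))   # close to lam * t = 10 in the Poisson case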

Consider a gambling game where you start with an initial wealth of 5. A coin is tossed twice,
and, for each toss, you either win 1 if the outcome is a head or lose 1 otherwise. Pick

𝛺 = {HH, HT, TH, TT},   ℱ = 2^𝛺   and   ℙ = (1/4) ∑_{𝜔∈𝛺} 𝛿𝜔 .

Define 𝑋0 (𝜔) = 5 for all 𝜔 ∈ 𝛺 and

𝑋1 (HH) = 𝑋1 (HT) = 5 + 1 = 6, 𝑋1 (TH) = 𝑋1 (TT) = 5 − 1 = 4,


𝑋2 (HH) = 5 + 1 + 1 = 7,  𝑋2 (HT) = 5 + 1 − 1 = 5,
𝑋2 (TH) = 5 − 1 + 1 = 5,  𝑋2 (TT) = 5 − 1 − 1 = 3.


Then 𝑋 = {𝑋0 , 𝑋1 , 𝑋2 } is a stochastic process, where 𝑋𝑡 represents your wealth at times 𝑡 = 0, 1, 2. The sample path corresponding to 𝜔 = HT, for instance, is

𝑋(HT) = {𝑋0 (HT), 𝑋1 (HT), 𝑋2 (HT)} = {5,6,5}.

Here is a table summarizing the sample paths of 𝑋:

ω      𝑋0 (ω)   𝑋1 (ω)   𝑋2 (ω)
HH     5        6        7
HT     5        6        5
TH     5        4        5
TT     5        4        3
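The table can be reproduced with a few lines of Python (an illustrative sketch, not part of the original notes): the function below builds the sample path of the wealth process for each outcome, adding 1 for a head and subtracting 1 for a tail.

# Wealth process for two coin tosses, starting from 5.
outcomes = ["HH", "HT", "TH", "TT"]

def sample_path(omega, start=5):
    path = [start]
    for toss in omega:
        path.append(path[-1] + (1 if toss == "H" else -1))
    return path   # [X_0(omega), X_1(omega), X_2(omega)]

for omega in outcomes:
    print(omega, sample_path(omega))
# HH [5, 6, 7], HT [5, 6, 5], TH [5, 4, 5], TT [5, 4, 3]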

If the index set is 𝕀 = [0, ∞), we say that a stochastic process 𝑋 is continuous (resp. right-continuous, left-continuous) if each sample path is continuous (resp. right-continuous, left-continuous).


Unit 1: Video Transcript


Hi, in this video we introduce stochastic processes.

Remember that if we are given a probability space (Ω, ℱ, ℙ), a function 𝑋: Ω → ℝ is called a
random variable if 𝑋 is measurable in the sense that the pre-image of every Borel set belongs to
ℱ.

We're now going to take this a step further and consider a collection of random variables.

𝑋 is now a collection of random variables that are indexed by an index set 𝕀. This index set can either be the non-negative real line, which is the set [0, ∞), in which case we will call it a continuous-time stochastic process, or it could be a discrete set, like the natural numbers {0, 1, 2, … }, in which case we would call it a discrete-time stochastic process.

A stochastic process is therefore a collection of random variables that are indexed either by a set of the form [0, ∞) or by an index set of the form {0, 1, 2, … }, in which case it is discrete.

We may sometimes consider a case whereby this index set is restricted between [0, 𝑇] where 𝑇
is some positive real number (𝑇 > 0). Let's look at some examples.

Firstly, if we let 𝑋𝑡 be the value of the S&P500 index at time 𝑡 then 𝑋𝑡 can be thought of as a
collection of random variables because at each time point we have no idea what the value of the
S&P500 index will be. 𝑋𝑡 is therefore a stochastic process. We can think of it as a continuous-
time stochastic process if we observe the index at every time point between 0 and 𝑡 or we can
consider it as a discrete-time stochastic process if we observe it at certain points in time.

As a second example, let's consider a gambling game. If we let 𝑋𝑛 be your wealth after the 𝑛𝑡ℎ
game, then this sequence {𝑋𝑛 : 𝑛 = 0,1,2 … } can be thought of as a stochastic process if we don't
know the outcome of each game.


The important thing to note about a stochastic process is that the outcome 𝜔 must determine
the values of all the random variables 𝑋𝑡 (𝜔) whereas in a random variable the outcome
determines the value of just one random variable. We will call this collection for each 𝜔 a
"sample path", so it is a sample path of the stochastic process 𝑋. One way to think about it is
that if, for instance, the index set is continuous like this, a stochastic process consists of
different sample paths. Each outcome, for example 𝜔1 , gives you a sample path. Another outcome, 𝜔2 , gives another whole sample path, and so on. Another way of thinking about a stochastic process is that it is a sample-path-valued random variable.

As an example, consider an experiment where two coins are tossed. This would be the sample
space: {HH,HT,TH,TT}. You are playing a gambling game whereby your initial wealth is $5. At

time 1, if the outcome of the first toss is a head you get $1 and if the outcome is a tail you lose

$1. 𝑋1 (ω) will either be 𝑋0 (ω) + 1 if ω belongs to {HH,HT}, which represents the outcome of the first toss
being a head, or it will be 𝑋0 (ω) − 1 if ω belongs to {TH,TT}, which represents the outcome of
the first toss being a tail. 𝑋2 represents the outcome of the second toss, which follows the same
pattern. So, 𝑋2 will be equal to 𝑋1 (ω) + 1 if the outcome of the second toss is a head and ω
belongs to {HH,TH} and it will be 𝑋1 (ω) − 1 if the outcome of the second toss is a tail and ω
belongs to {HT,TT}. We can represent this stochastic process in a table that looks like this:

As an example of a sample path, if you pick ω to be HT, then the sample path corresponding to it is simply {5, 6, 5}. That's the sample path – the values of all the random variables when ω is HT.

Now that we've introduced stochastic processes, in the next video we are going to move on to
filtrations.


Unit 2: Filtrations

Unit 2: Notes
Even though all the values of 𝑋 are determined by one outcome, some of them can be
determined without knowledge of the full outcome and are therefore known prior to others. In
the example in the previous section, we do not need to know the full outcome of the experiment
to determine 𝑋1 , since 𝑋1 depends only on the outcome of the first toss. That is, 𝑋1 is
measurable with respect to a 𝜎-algebra that is strictly smaller than ℱ .

Let (Ω, ℱ, ℙ) be a probability space and 𝕀 ⊆ [0, ∞) be an index set. A filtration 𝔽 = {ℱ𝑡 : 𝑡 ∈ 𝕀} is
an increasing collection of sub-𝜎-algebras of ℱ – i.e. ℱ𝑠 ⊆ ℱ𝑡 ⊆ ℱ for 𝑠 ≤ 𝑡,  𝑠, 𝑡 ∈ 𝕀. A stochastic
process 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} is adapted to a filtration 𝔽 if for every 𝑡 ∈ 𝕀, 𝑋𝑡 is ℱ𝑡 -measurable.

Intuitively, a filtration contains the information about 𝜔 that is available to us at each time

point. At time 𝑡 ∈ 𝕀, ℱ𝑡 consists of all the events that can be decided (whether or not they have

occurred) by time 𝑡. The fact that 𝔽 is increasing implies that “we do not forget”. An adapted
stochastic process is one whose values are known at the corresponding time point. We will call
(𝛺, ℱ, 𝔽, ℙ) a filtered probability space.

Associated with a stochastic process 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} is a filtration 𝔽 𝑋 = {ℱ𝑡𝑋 : 𝑡 ∈ 𝕀}, called the
natural filtration of 𝑋, defined by

ℱ𝑡𝑋 ≔ 𝜎({𝑋𝑠 : 𝑠 ∈ 𝕀, 𝑠 ≤ 𝑡}),   𝑡 ∈ 𝕀.

This is the smallest filtration with respect to which 𝑋 is adapted.

We continue with the example of two coin-tosses. Define 𝔽 = {ℱ0 , ℱ1 , ℱ2 } by

ℱ0 = {∅, Ω}, ℱ1 = 𝜎({{HH,HT}, {TH,TT}}), ℱ2 = ℱ = 2Ω .


Then 𝔽 is a filtration. The 𝜎-algebra ℱ0 contains no information, ℱ1 contains information about


the outcome of the first toss, and ℱ2 contains information on both tosses. The natural filtration
𝔽 𝑋 also coincides with 𝔽 in this case. Indeed,

ℱ0𝑋 = 𝜎(𝑋0 ) = {∅, 𝛺},

ℱ1𝑋 = 𝜎({𝑋0 , 𝑋1 }) = 𝜎(𝑋1 ) = 𝜎({𝑋1⁻¹({4}), 𝑋1⁻¹({6})}) = 𝜎({{TH,TT}, {HH,HT}}), and

ℱ2𝑋 = 𝜎({𝑋0 , 𝑋1 , 𝑋2 }) = 𝜎({𝑋1⁻¹({4}), 𝑋1⁻¹({6}), 𝑋2⁻¹({3}), 𝑋2⁻¹({5}), 𝑋2⁻¹({7})})
    = 𝜎({{TH,TT}, {HH,HT}, {TT}, {HT,TH}, {HH}})
    = 𝜎({{HH}, {HT}, {TH}, {TT}}) = 2^Ω .

This can be summarized in the following table:

𝑡     Blocks generating ℱ𝑡𝑋
0     {HH, HT, TH, TT}
1     {HH, HT}, {TH, TT}
2     {HH}, {HT}, {TH}, {TT}
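Because Ω is finite, each ℱ𝑡𝑋 is generated by the partition of Ω into blocks on which (𝑋0 , … , 𝑋𝑡 ) is constant. The short Python sketch below (an illustration under the coin-toss setup of Unit 1, not part of the original notes) computes those blocks directly.

# Group outcomes into blocks on which (X_0, ..., X_t) takes the same values.
X = {"HH": [5, 6, 7], "HT": [5, 6, 5], "TH": [5, 4, 5], "TT": [5, 4, 3]}

def blocks(t):
    groups = {}
    for omega, path in X.items():
        groups.setdefault(tuple(path[: t + 1]), []).append(omega)
    return list(groups.values())

for t in range(3):
    print(t, blocks(t))
# 0 [['HH', 'HT', 'TH', 'TT']]
# 1 [['HH', 'HT'], ['TH', 'TT']]
# 2 [['HH'], ['HT'], ['TH'], ['TT']]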


Unit 2: Video Transcript


Hi, in this video we introduce filtrations.

So, let (Ω, ℱ, ℙ) be a probability space and consider 𝑋, a stochastic process – meaning that it is an indexed collection of random variables – whose values are all determined by one outcome 𝜔 ∈ 𝛺. If you recall, the sample paths of 𝑋 look like this: one outcome determines all the values of 𝑋. Some of them, however, are known prior to others. For instance, if we take 𝑋𝑡 to be the value of the S&P 500 at time 𝑡, then the value of 𝑋5 is known prior to the value of 𝑋10 . What we need, therefore, is a way of collecting the information that is available at each point in time. To do so, we are going to introduce the notion of a filtration.

A filtration 𝔽 is an increasing collection of 𝜎-algebras. It can be written like this: 𝔽 = {ℱ𝑡 : 𝑡 ∈ 𝕀}. The idea behind this is that ℱ𝑡 represents the information available to us at time 𝑡, and we call (𝛺, ℱ, 𝔽, ℙ) a filtered probability space.

Now consider a filtered probability space (𝛺, ℱ, 𝔽, ℙ) and let 𝑋 be a stochastic process that is indexed by 𝕀. We say that 𝑋 is adapted to the filtration 𝔽 if 𝑋𝑡 is ℱ𝑡 -measurable for each 𝑡. Intuitively, what that means is that the value of 𝑋𝑡 is known at each time point if the information available to us is given by the filtration 𝔽.

We define the natural filtration of 𝑋 as the smallest filtration that makes 𝑋 adapted. It is defined as follows: 𝔽 𝑋 is a collection of sub-𝜎-algebras ℱ𝑡𝑋 , where ℱ𝑡𝑋 is the 𝜎-algebra generated by the 𝑋𝑠 with 𝑠 less than or equal to 𝑡. It is the smallest 𝜎-algebra that makes all of the 𝑋𝑠 measurable for 𝑠 less than or equal to 𝑡. We call that the natural filtration of 𝑋.

Let us look at an example:


Consider this experiment where we have HH, HT, TH, and TT – we are tossing two coins and we

define the following stochastic process 𝑋0 (ω) as always equal to 5, 𝑋1 (ω) as 6, 6, 4, 4 and 𝑋2 (ω)
as 7, 5, 5, 3. This is a stochastic process in discrete time and the index set is {0,1,2}. We want to
find a natural filtration of this stochastic process.

ℱ0𝑋 is the 𝜎-algebra generated by 𝑋0 only and, since 𝑋0 is constant, it does not distinguish
between any of these elements. This will equate to the trivial 𝜎-algebra.

ℱ1𝑋 is the 𝜎-algebra generated by {𝑋0 , 𝑋1 } and, in this case, 𝑋1 does not distinguish between HH and HT and it also does not distinguish between TH and TT. 𝑋0 , however, does not distinguish between any of the elements at all. Simply put, this 𝜎-algebra is generated by HH and HT as one block and TH and TT as another block.

Finally, we have ℱ2𝑋 , which is the 𝜎-algebra generated by {𝑋0 , 𝑋1 , 𝑋2 }. While 𝑋1 puts, for example, HH and HT together as a block, 𝑋2 separates them, meaning that HH and HT would now be in separate blocks. The same goes for TH and TT. As a result, this 𝜎-algebra is equal to the power set of Ω, which is the set of all possible subsets.

The natural filtration of 𝑋, therefore, consists of these three 𝜎-algebras, which are all sub-𝜎-algebras of ℱ , the powerset of Ω.


Unit 3: Stopping Times and Stopped Processes

Unit 3: Notes
Suppose you invest your money in an asset whose price is a stochastic process 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀}.
Let 𝜏 be a random variable that denotes the time when you decide to sell the asset (with 𝜏 = ∞ if
you never sell). So 𝜏 is a random time. If 𝔽 = {ℱ𝑡 : 𝑡 ∈ 𝕀} represents the information currently
available at each time point, then for every 𝑡 ∈ 𝕀, the decision of whether or not to sell should
only depend on the information available at time 𝑡, ℱ𝑡 . That is, the event that you have stopped
by time 𝑡, {𝜏 ≤ 𝑡}, should belong to ℱ𝑡 .

Let (Ω, ℱ, 𝔽, ℙ) be a filtered probability space. A non-negative extended random variable 𝜏: 𝛺 → 𝕀 ∪ {∞} =: 𝕀̅ is called a stopping time if {𝜏 ≤ 𝑡} ∈ ℱ𝑡 for every 𝑡 ∈ 𝕀. If 𝜏 is a stopping time and 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} is a stochastic process, we define the stopped process 𝑋 𝜏 = {𝑋𝑡𝜏 : 𝑡 ∈ 𝕀} as 𝑋𝑡𝜏 ≔ 𝑋𝜏∧𝑡 – i.e.,

𝑋𝑡𝜏 (𝜔) ≔ { 𝑋𝑡 (𝜔)         if 𝑡 < 𝜏(𝜔)
          { 𝑋𝜏(𝜔) (𝜔)      if 𝑡 ≥ 𝜏(𝜔),        𝜔 ∈ 𝛺.

We also define the random variable 𝑋𝜏 by

𝑋𝜏 (𝜔) ≔ { 𝑋𝜏(𝜔) (𝜔)      if 𝜏(𝜔) < ∞
         { 0               if 𝜏(𝜔) = ∞.

In the coin toss example, say we decide to stop as soon as our wealth reaches 6. Then 𝜏(ω) ≔
inf { 𝑡 ∈ 𝕀: 𝑋𝑡 (𝜔) = 6}, where we agree that inf ∅ = ∞. Then 𝜏(HH) = 𝜏(HT) = 1 and 𝜏(TH) =
𝜏(TT) = ∞. The stopped process 𝑋 𝜏 represents our wealth process if we employ this stopping
strategy. We have 𝑋0𝜏 = 𝑋𝜏∧0 = 𝑋0 = 5, 𝑋1𝜏 = 𝑋𝜏∧1 = 𝑋1 since 𝜏 ≥ 1, and 𝑋2𝜏 is given by

𝑋2𝜏 (HH) = 𝑋𝜏(HH)∧2 (HH) = 𝑋1∧2 (HH) = 𝑋1 (HH) = 6.

A similar argument gives


𝑋2𝜏 (HT) = 𝑋1 (HT) = 6, 𝑋2𝜏 (TH) = X2 (TH) = 5, 𝑋2𝜏 (TT) = 𝑋2 (TT) = 3.

The random variable 𝑋𝜏 is given by

𝑋𝜏 (HH) = 𝑋1 (HH) = 6, 𝑋𝜏 (HT) = 𝑋1 (HT) = 6, 𝑋𝜏 (TH) = 0 = 𝑋𝜏 (TT).

A table summary:

ω      𝜏(ω)   𝑋0𝜏 (ω)   𝑋1𝜏 (ω)   𝑋2𝜏 (ω)   𝑋𝜏 (ω)
HH     1      5         6         6         6
HT     1      5         6         6         6
TH     ∞      5         4         5         0
TT     ∞      5         4         3         0
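The same table can be checked with a short Python sketch (an illustration of the definitions above, not part of the original notes): it computes 𝜏, the stopped process 𝑋 𝜏 and the random variable 𝑋𝜏 for each outcome of the coin-toss example.

import math

X = {"HH": [5, 6, 7], "HT": [5, 6, 5], "TH": [5, 4, 5], "TT": [5, 4, 3]}

def tau(omega, level=6):
    # First time the wealth equals `level`; infinity if it never does.
    for t, value in enumerate(X[omega]):
        if value == level:
            return t
    return math.inf

for omega, path in X.items():
    stop = tau(omega)
    stopped = [path[min(t, int(stop))] if stop < math.inf else path[t]
               for t in range(len(path))]
    x_tau = path[int(stop)] if stop < math.inf else 0
    print(omega, stop, stopped, x_tau)
# HH 1 [5, 6, 6] 6 | HT 1 [5, 6, 6] 6 | TH inf [5, 4, 5] 0 | TT inf [5, 4, 3] 0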

In most instances, stopping times can be generated as follows. Let 𝐵 ∈ ℬ(ℝ) be a Borel set and
𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} be a stochastic process. Define 𝜏𝐵 : Ω → 𝕀̅ by

𝜏𝐵 (ω) ≔ inf{𝑡 ∈ 𝕀: 𝑋𝑡 (ω) ∈ 𝐵}.

We call 𝜏𝐵 the hitting time of 𝐵.

Let (Ω, ℱ, 𝔽, ℙ) be a filtered probability space, 𝕀 ⊆ ℕ be a discrete index set and 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} be
a stochastic process. If B ∈ ℬ(ℝ) and 𝑋 is adapted to 𝔽, then 𝜏𝐵 is an 𝔽-stopping time.

If 𝕀 ⊆ ℕ is discrete, then 𝜏: Ω → 𝕀̅ is a stopping time if and only if {𝜏 = 𝑛} ∈ ℱ𝑛 for each 𝑛 ∈ 𝕀.


Unit 3: Video Transcript


Exploring Stopping Times and Stopped Processes
Hi, in this video we're going to introduce stopping times and stopped processes. We are going to
first define them and then look at an example.

Suppose you have invested in a share whose price is given by the stochastic process 𝑋 =
{𝑋𝑡 : 𝑡 ∈ 𝕀}. Let 𝜏 be the random time you decide to sell the share. So 𝜏 is the time that you sell
the share 𝑋. What do I mean by random time? If you look at a sample path of the stochastic
process 𝑋(𝜔) corresponding to the sample path 𝜔, 𝜏 might be somewhere here. 𝜏(𝜔) represents
the time you decide to sell the share.

As an example, you could define 𝜏 to be the first time that the share price reaches 150. This is
random because it depends on the sample path under consideration. We're going to call 𝜏 a
stopping time if it satisfies the following condition: {𝜏 ≤ 𝑡} ∈ ℱ𝑡 . This set here ({ω ∈ Ω: 𝜏(ω) ≤ 𝑡}) is the set of all outcomes for which 𝜏 ≤ 𝑡. Intuitively, what this means is that the decision of whether or not to sell by time 𝑡 is known by time 𝑡, because it belongs to ℱ𝑡 , the 𝜎-algebra in the filtration that contains the information available to us at time 𝑡.

Similarly, we define a stopped process as simply the original stochastic process, but then, once we stop, it becomes constant and that constant is equal to whatever the value is at the stopping time. This will be denoted by 𝑋 𝜏 with the superscript, and it will be the collection of random variables 𝑋 𝜏 = {𝑋𝑡𝜏 : 𝑡 ∈ 𝕀} defined as 𝑋𝑡𝜏 (𝜔) ≔ 𝑋𝑡∧𝜏(𝜔) (𝜔), where this is a function of 𝜔 and ∧ denotes the minimum of the two. So, 𝑥 ∧ 𝑦 is simply the minimum of 𝑥 and 𝑦.

We also define the random variable 𝑋𝜏 . This is just the value of the stochastic process at the stopping time itself. We are going to define it to be 𝑋𝜏(𝜔) (𝜔) if the stopping time is finite, and zero otherwise. We need this condition since the way we define stopping times makes it possible for 𝜏 to take the value infinity. If the stock price never reaches the level 150, for instance, then 𝜏(ω) will be infinite for that sample path.


Let's look at an example now. Consider the experiment of tossing two coins where the
outcomes are {HH,  HT,  TH, and TT}. We will define the same stochastic process that we looked
at in previous videos.

Define 𝜏(ω) = inf{𝑡 ∈ {0,1,2}: 𝑋𝑡 (ω) = 6}. If we go back to the table, we can fill it in with the
values of 𝜏: 𝜏(HH) = 𝜏(HT) = 1 and 𝜏(TH) = 𝜏(TT) = ∞.

Now the question is: is 𝜏 a stopping time with respect to the natural filtration 𝔽 𝑋 of 𝑋? We found 𝔽 𝑋 in a previous video as the collection 𝔽 𝑋 = {ℱ0𝑋 , ℱ1𝑋 , ℱ2𝑋 }, where ℱ0𝑋 is the trivial 𝜎-algebra, ℱ1𝑋 is the 𝜎-algebra generated by {HH,HT} as a block and {TH,TT} as a block, and ℱ2𝑋 is equal to the powerset of Ω.

To decide whether or not 𝜏 is a stopping time, we have to calculate the event {𝜏 ≤ 𝑡}, and this should be in ℱ𝑡𝑋 for every 𝑡. To start, when 𝑡 = 0, the event {𝜏 ≤ 0} is empty because 𝜏 is never equal to 0. This belongs to ℱ0𝑋 . When 𝑡 = 1, the event {𝜏 ≤ 1} is equal to {HH,HT}, which belongs to ℱ1𝑋 because ℱ1𝑋 is generated by this set as a block. Finally, when 𝑡 = 2,


the event {𝜏 ≤ 2} is exactly the same set as before, {HH,HT}, which obviously belongs to ℱ2𝑋 because ℱ2𝑋 is the powerset of Ω.


Unit 4: Conditional Expectation

Unit 4: Notes
In this unit we put another fundamental concept of probability on a rigorous foundation: conditional expectation.

Let (Ω, ℱ, ℙ) be a probability space and 𝑋 ∈ 𝑚ℱ be a random variable. If 𝒢 ⊆ ℱ is a sub-𝜎-


algebra of ℱ , then any random variable 𝑍 that satisfies the following:
1 𝑍 is 𝒢-measurable;
2 𝑍 satisfies the partial averaging property:

∫𝐴 𝑍 𝑑ℙ = ∫𝐴 𝑋 𝑑ℙ   for every 𝐴 ∈ 𝒢,

is called a version of the conditional expectation of 𝑋 given 𝒢.

We should really be saying the conditional expectation, but we can only do so after the
following theorem.

Theorem 1
Let (Ω, ℱ, ℙ) be a probability space and 𝑋 ∈ ℒ 1 be a random variable. If 𝒢 ⊆ ℱ is a sub-𝜎-algebra
of ℱ , then a version of the conditional expectation of 𝑋 given 𝒢 exists and is unique ℙ-almost
surely. We will denote it by 𝔼(𝑋|𝒢), and simply call it the (a.s. defined) conditional expectation
of 𝑋 given 𝒢.

If (Ω, ℱ, ℙ) is a probability space and 𝑋, 𝑌 ∈ 𝑚ℱ are random variables, then we define

𝔼(𝑋|𝑌) ≔ 𝔼(𝑋|𝜎(𝑌)),

if it exists.


Here is the interpretation. You have conducted a random experiment whose random outcome
𝜔 ∈ Ω is not known. The 𝜎-algebra ℱ consists of all the events for which we can decide whether
or not they have occurred once the experiment is conducted. (Note that this information might

still not enable us to uniquely determine 𝜔.) The random variable 𝑋 is ℱ -measurable, which

means that its value can be determined only from the knowledge of the occurrence or non-
occurrence of the events in ℱ (without knowing 𝜔). Suppose that at the current stage we are
only given partial information contained in the 𝜎-algebra 𝒢 ⊆ ℱ . Then 𝔼(𝑋|𝒢) is the 𝒢-
measurable random variable that best approximates 𝑋 (in some sense). That is, the best
estimator for 𝑋 that can be determined from the information in 𝒢.

Suppose a fair coin is tossed twice and 𝑋 represents the number of heads in the two tosses.
Choose Ω = {HH,HT,TH,TT}, ℱ = 2^𝛺 and ℙ = (1/4) ∑_{𝜔∈Ω} 𝛿𝜔 . Now let 𝒢 ⊆ ℱ be the 𝜎-algebra that contains information about the outcome of the first toss. Then 𝒢 = 𝜎({{HH,HT}, {TH,TT}}). We want to find 𝔼(𝑋|𝒢). Since this random variable must be 𝒢-measurable, it must be constant on the blocks that generate 𝒢; that is,

𝔼(𝑋|𝒢) = 𝑎𝐼{HH,HT} + 𝑏𝐼{TH,TT}

for some 𝑎, 𝑏 ∈ ℝ. To find 𝑎 and 𝑏, we use the partial averaging property. If 𝐴 = {HH,HT}, then

∫𝐴 𝔼(𝑋|𝒢) 𝑑ℙ = 𝑎 ℙ({HH,HT}) = ∫𝐴 𝑋 𝑑ℙ = (1/4)(𝑋(HH) + 𝑋(HT)) = 3/4.

Hence 𝑎 = 3/2. A similar calculation on 𝐴 = {TH,TT} gives 𝑏 = 1/2.

Here is a more intuitive approach to finding 𝔼(𝑋|𝒢). The (constant) value of the random variable
𝔼(𝑋|𝒢) on the block {HH,HT} is given by the weighted average of 𝑋 over that block:

𝔼(𝑋|𝒢)(𝜔) = 𝑋(HH) × ℙ({HH})/ℙ({HH,HT}) + 𝑋(HT) × ℙ({HT})/ℙ({HH,HT}) = 3/2,    ω ∈ {HH,HT}.


Similarly,

𝔼(𝑋|𝒢)(𝜔) = 𝑋(TH) × ℙ({TH})/ℙ({TH,TT}) + 𝑋(TT) × ℙ({TT})/ℙ({TH,TT}) = 1/2,    ω ∈ {TH,TT}.

So, the random variable 𝔼(𝑋|𝒢) can be determined from the outcome of the first toss alone. It is equal to 3/2 if the first toss is a head and 1/2 if the first toss is a tail. In contrast, 𝑋 cannot be determined from this information: if, for instance, the first toss results in a head, the value of 𝑋 could be 1 or 2 with equal probability. The value of 𝔼(𝑋|𝒢) in this scenario is simply the weighted average of these possibilities.
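The block-by-block weighted average above is straightforward to mirror in code. The following Python sketch (an illustration, assuming the uniform two-coin probability space of this example) computes 𝔼(𝑋|𝒢) on each generating block.

# X = number of heads in two fair tosses; G is generated by the first toss.
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
X = {omega: omega.count("H") for omega in P}
blocks = [["HH", "HT"], ["TH", "TT"]]            # partition generating G

cond_exp = {}
for block in blocks:
    mass = sum(P[omega] for omega in block)
    value = sum(X[omega] * P[omega] for omega in block) / mass
    for omega in block:
        cond_exp[omega] = value

print(cond_exp)   # 1.5 on {HH, HT} and 0.5 on {TH, TT}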

Consider the probability space ([0,1], ℬ([0,1]), ℙ = 𝜆1 ). Define 𝑋, 𝑌: [0,1] → ℝ by

𝑋(𝜔) ≔ 𝜔2 , 𝑌(𝜔) = (𝜔 + 1)𝐼[0,1/2] (𝜔), 𝜔 ∈ [0,1].

We want 𝔼(𝑋|𝑌) = 𝔼(𝑋|𝜎(𝑌)). First, we have

𝜎(𝑌) = 𝜎({{𝑌 ≤ 𝑐}: 𝑐 ∈ ℝ}) = 𝜎({[0, 𝑐]: 0 ≤ 𝑐 ≤ 1/2})
     = {𝐵 ⊆ [0,1]: 𝐵 ∈ ℬ([0, 1/2]) or 𝐵 = (1/2, 1] ∪ 𝐵′, 𝐵′ ∈ ℬ([0, 1/2])}.

Thus, a necessary condition for a function 𝑍 to be 𝜎(𝑌)-measurable is that it must be constant on (1/2, 1]. Hence 𝔼(𝑋|𝑌)(𝜔) = 𝑎 for 𝜔 ∈ (1/2, 1]. To find 𝑎, we use partial averaging:

∫(1/2,1] 𝔼(𝑋|𝑌) 𝑑ℙ = 𝑎 ℙ((1/2, 1]) = ∫(1/2,1] 𝜔² 𝑑𝜆1 = 7/24

⟹ 𝑎 = (7/24) / ℙ((1/2, 1]) = 7/12.


For 𝜔 ∈ [0, 1/2], 𝑋(𝜔) = 𝜔² = (𝑌(𝜔) − 1)², hence 𝑋 is 𝜎(𝑌)-measurable on that interval. So, we have

𝔼(𝑋|𝑌)(𝜔) = { 𝑋(𝜔)      if 𝜔 ∈ [0, 1/2]
            { 7/12       if 𝜔 ∈ (1/2, 1].
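A quick numerical sanity check of the value 7/12 is possible by Monte Carlo. The sketch below (illustrative Python using NumPy; the simulation itself is an assumption of this aside, not part of the notes) averages 𝑋(𝜔) = 𝜔² over outcomes that fall in (1/2, 1].

import numpy as np

rng = np.random.default_rng(1)
omega = rng.uniform(0.0, 1.0, size=1_000_000)   # P is Lebesgue measure on [0, 1]
mask = omega > 0.5                              # the block (1/2, 1], where Y = 0
print(np.mean(omega[mask] ** 2), 7 / 12)        # the estimate should be close to 0.5833...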

Here are some useful properties of the conditional expectation. We say that two 𝜎-algebras ℱ
and 𝒢 are independent if for any 𝐴 ∈ ℱ and 𝐵 ∈ 𝒢 , ℙ(𝐴 ∩ 𝐵) = ℙ(𝐴)ℙ(𝐵).

Let (Ω, ℱ, ℙ) be a probability space and 𝑋, 𝑌 ∈ ℒ 1 (Ω, ℱ, ℙ). If 𝒢 ⊆ ℱ is a sub-𝜎-algebra of ℱ , then:

1 (Total Expectation) 𝔼(𝔼(𝑋|𝒢)) = 𝔼(𝑋); see the short numerical check after this list.

2 If 𝑋 is 𝒢-measurable, then 𝔼(𝑋|𝒢) = 𝑋 a.s..


3 If 𝑋 ≥ 0, then 𝔼(𝑋|𝒢) ≥ 0 a.s..
4 𝔼(𝑎𝑋 + 𝑏𝑌|𝒢) = 𝑎𝔼(𝑋|𝒢) + 𝑏𝔼(𝑌|𝒢) a.s., 𝑎, 𝑏 ∈ ℝ.
5 If 𝑌 is 𝒢-measurable, then 𝔼(𝑌𝑋|𝒢) = 𝑌𝔼(𝑋|𝒢) a.s..
6 (Tower Property) If ℋ ⊆ 𝒢, then 𝔼(𝔼(𝑋|𝒢)|ℋ) = 𝔼(𝑋|ℋ) a.s..
7 If 𝜎(𝑋) and 𝒢 are independent, then 𝔼(𝑋|𝒢) = 𝔼(𝑋).
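As a small numerical check of property 1 (and of the tower property with ℋ = {∅, Ω}), the sketch below (illustrative Python, reusing the two-coin example above) verifies that averaging 𝔼(𝑋|𝒢) over Ω recovers 𝔼(𝑋).

# Total expectation: E(E(X|G)) should equal E(X).
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
X = {omega: omega.count("H") for omega in P}
cond = {"HH": 1.5, "HT": 1.5, "TH": 0.5, "TT": 0.5}   # E(X|G) from the earlier example

lhs = sum(cond[omega] * P[omega] for omega in P)      # E(E(X|G))
rhs = sum(X[omega] * P[omega] for omega in P)         # E(X)
print(lhs, rhs)                                       # both equal 1.0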


Unit 4: Video Transcript


Hi, in this video we define the conditional expectation of a random variable with respect to a 𝜎-
algebra.

Let (Ω, ℱ, ℙ) be a probability space and 𝒢 be a sub-𝜎-algebra of ℱ . What that means is that 𝒢 in
itself is a 𝜎-algebra and all the elements of 𝒢 are also elements of ℱ .

If 𝑋 is a random variable, then what a conditional expectation of 𝑋 given 𝒢 represents is the

best 𝒢-measurable estimator of 𝑋. In other words, it is the random variable that best

approximates X and is also 𝒢-measurable because 𝑋 itself need not be 𝒢-measurable.

Let’s look at a number line. Consider ℱ the 𝜎-algebra. The random variable 𝑋 is ℱ -measurable,
meaning that if we know the information or the events that are in ℱ then we know the value of
𝑋. However, it may not be 𝒢-measurable. The conditional expectation of 𝑋 given 𝒢 provides the
best estimator that is 𝒢-measurable and approximates 𝑋 in the most efficient way. We will see
what that means later on.

The intuition about conditional expectation is captured by the following definition: a random
variable 𝑍 is a version of the conditional expectation of 𝑋 given 𝒢 if it satisfies the following two
conditions:
1 The first one is that 𝑍 must be 𝒢-measurable itself, and
2 The second condition, which is what captures the intuition of approximation, is called a
partial averaging property, and says that the integral over any 𝐴 of 𝑍 is the same as the

integral over 𝐴 of 𝑋 for every 𝐴 in 𝒢.

Now, 𝑍 exists if 𝐸(|𝑋|) < ∞. So, if the expected value of 𝑋 is finite, one can show, using the
Radon-Nikodym theorem, that the version of the conditional expectation of 𝑋 given 𝒢 does
exist. In that case 𝑍 is also unique and we denote it by writing 𝑍 = 𝐸(𝑋|𝒢), which is a random
variable that is defined almost surely.


Let’s look at an example. Consider the experiment of tossing two coins, where the
corresponding sample space is Ω = {HH,  HT,  TH,  TT} and let 𝑋 be the number of heads of the
two tosses. We will take as the 𝜎-algebra the powerset of Ω and as a probability measure, we’ll
take a weighted sum of Dirac measures where each sample point has equal weight. All this is
saying is that each sample point has probability 1/4. Let 𝒢 be the 𝜎-algebra that represents the
information we have by observing the first toss. 𝒢 is generated by the following two blocks:
{{HH,HT}, {TH,TT}}. {HH,HT} corresponds to the event that the first toss is a head and {TH,TT}
corresponds to the event that the first toss is a tail.

The question is: what is the conditional expectation of 𝑋 given 𝒢? That's what we want to calculate. Now, first note that in order for this to be 𝒢-measurable it has to be constant on these two blocks. This can be written as 𝑎 times the indicator of {HH,HT} plus 𝑏 times the indicator of {TH,TT}, and all we need to do is try and find what 𝑎 and 𝑏 are. For that we will use the partial averaging property. So, we take a set in 𝒢 and we have to evaluate the integral of the conditional expectation with respect to the measure ℙ, and do the same evaluating the integral of 𝑋 with respect to the measure ℙ. So, if we evaluate the integral over {HH,HT} of the conditional expectation with respect to ℙ, we will get 𝑎 times the integral over {HH,HT} of the indicator of {HH,HT} with respect to ℙ, which is equal to 𝑎 times the probability of {HH,HT}, and that is 𝑎/2.
2

Written in full:

∫{𝐻𝐻,𝐻𝑇} 𝐸(𝑋|𝒢) 𝑑ℙ = 𝑎 ∫{𝐻𝐻,𝐻𝑇} 𝐼{𝐻𝐻,𝐻𝑇} 𝑑ℙ = 𝑎ℙ({𝐻𝐻, 𝐻𝑇}) = 𝑎/2.

Similarly, if we evaluate the integral over {TH,TT} of the conditional expectation of 𝑋 given 𝒢 with respect to ℙ, that will give us 𝑏/2. Written in full:

∫{𝑇𝐻,𝑇𝑇} 𝐸(𝑋|𝒢) 𝑑ℙ = 𝑏/2.

Those are the partial averages with respect to the conditional expectation.


Now, we do the same for 𝑋. Evaluate the integral of 𝑋 over {HH, HT} and you'll see that the answer in that case is 3/4. When you calculate the integral of 𝑋 over {TH, TT} with respect to ℙ, the answer is 1/4. We then equate 3/4 to 𝑎 times 1/2, and that gives us 𝑎 equal to 3/2. And, by solving the same equation for the other block, we get that 𝑏 is equal to 1/2.

Those are the values of 𝑎 and 𝑏, which implies that the conditional expectation of 𝑋 given 𝒢 is simply equal to 3/2 times the indicator of {HH,HT} plus 1/2 times the indicator of {TH,TT}. Written in full:

𝐸(𝑋|𝒢) = (3/2) 𝐼{𝐻𝐻,𝐻𝑇} + (1/2) 𝐼{𝑇𝐻,𝑇𝑇} .

We can sketch the conditional expectation on a graph together with 𝑋. 𝑋 of (HH) is 2, (HT) is 1, (TH) is 1, and (TT) is 0. We can then do the same for the conditional expectation: it's 3/2 for both (HH) and (HT), and 1/2 for both (TH) and (TT). So, what the conditional expectation does is group (HH) and (HT) together and give them one value, which is the average of the two. It also groups (TH) and (TT) together and gives them one value, which is the average of the two.


Unit 5: Martingales and Markov Processes

Unit 5: Notes
Let (Ω, ℱ, ℙ) be a probability space and (𝐸, ℰ) be a measurable space. A collection of random elements 𝑋 = {𝑋𝑡 : Ω → 𝐸: 𝑡 ∈ 𝕀} is called an 𝐸 -valued stochastic process. The special case (𝐸, ℰ) = (ℝ, ℬ(ℝ)) gives the usual stochastic process discussed previously.

An 𝐸 -valued stochastic process 𝑋 is called a Markov process with state space 𝐸 if for every bounded measurable function 𝑓: (𝐸, ℰ) → (ℝ, ℬ(ℝ)) and every 𝑠 ≤ 𝑡 there is a measurable function 𝑔𝑡 : (𝐸, ℰ) → (ℝ, ℬ(ℝ)) such that

𝔼(𝑓(𝑋𝑡 )|ℱ𝑠𝑋 ) = 𝑔𝑡 (𝑋𝑠 ).

In simple terms, this means that the conditional distribution of 𝑋𝑡 given all information about 𝑋
up to and including time 𝑠 ≤ 𝑡 depends only on 𝑋𝑠 .

An example of a Markov process is the Poisson process discussed previously.

We say that a process 𝑋 has independent increments if for every 𝑠 < 𝑡, 𝑋𝑡 − 𝑋𝑠 is independent
of ℱ𝑠𝑋 . It is easy to see that a process with independent increments is a Markov process.

Let (Ω, ℱ, 𝔽, ℙ) be a filtered probability space and 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} be a stochastic process. We call
𝑋 a martingale (or an (𝔽, ℙ)-martingale) if
1 𝑋 is adapted to 𝔽;
2 𝔼(|𝑋𝑡 |) < ∞ for all 𝑡 ∈ 𝕀;
3 𝔼(𝑋𝑡 |ℱ𝑠 ) = 𝑋𝑠 ℙ-a.s. for 𝑠 ≤ 𝑡.

We call 𝑋 a sub-martingale (resp. super-martingale) if it satisfies 1, 2, and

3′ 𝔼(𝑋𝑡 |ℱ𝑠 ) ≥ 𝑋𝑠 (resp. 𝔼(𝑋𝑡 |ℱ𝑠 ) ≤ 𝑋𝑠 ) ℙ-a.s. for 𝑠 ≤ 𝑡.


A martingale is a driftless process — the best estimator of any future value is the current one. A
submartingale drifts upwards, while a supermartingale drifts downwards. Furthermore, 𝑋 is a
martingale if and only if it is both a submartingale and a supermartingale.

Let (Ω, ℱ, ℙ) be a probability space and 𝑁 = {𝑁𝑡 : 𝑡 ≥ 0} be a Poisson process with rate 𝜆 > 0. Let

𝔽 = 𝔽𝑁 be the natural filtration of 𝑁. Then 𝑁 is a submartingale. Indeed, 𝑁 is adapted to 𝔽𝑁 , and


𝔼(|𝑁𝑡 |) = 𝔼(𝑁𝑡 ) = 𝜆𝑡 < ∞ for each 𝑡 ≥ 0. We also have (for 𝑠 < 𝑡)

𝔼(𝑁𝑡 |ℱ𝑠𝑁 ) = 𝔼(𝑁𝑠 + 𝑁𝑡 − 𝑁𝑠 |ℱ𝑠𝑁 ) = 𝑁𝑠 + 𝜆(𝑡 − 𝑠) ≥ 𝑁𝑠 .

On the other hand, the compensated Poisson process 𝑀 = {𝑁𝑡 − λ𝑡: 𝑡 ≥ 0} is a martingale.
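The driftless behaviour of the compensated process can be seen in a quick simulation. The Python sketch below (an illustration with NumPy, not part of the original notes) checks that the sample mean of 𝑀𝑡 = 𝑁𝑡 − 𝜆𝑡 stays near 0 while 𝔼(𝑁𝑡 ) grows like 𝜆𝑡.

import numpy as np

rng = np.random.default_rng(2)
lam, n_paths = 1.5, 100_000

for t in (1.0, 2.0, 5.0):
    N_t = rng.poisson(lam * t, size=n_paths)    # N_t has a Poisson(lam * t) distribution
    print(t, N_t.mean(), (N_t - lam * t).mean())
# The first average is close to lam * t, the compensated average is close to 0.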

Let (Ω, ℱ, ℙ) be a probability space and 𝑋 = (𝑋𝑛 )_{𝑛=1}^{∞} be a sequence of i.i.d. random variables in ℒ 1 with 𝔼(𝑋𝑛 ) = 𝜇 for all 𝑛 ∈ ℕ+ . Fix 𝑥 ∈ ℝ and define the random walk 𝑆 = {𝑆𝑛 : 𝑛 ∈ ℕ} by

𝑆0 ≔ 𝑥,   𝑆𝑛 ≔ 𝑥 + ∑_{𝑘=1}^{𝑛} 𝑋𝑘 ,   𝑛 ≥ 1.

We pick 𝔽 = {ℱ𝑛 : n ∈ ℕ} as follows:

ℱ0 ≔ {∅, Ω}, ℱ𝑛 = 𝜎({𝑋1 , … , 𝑋𝑛 }), 𝑛 ≥ 1.

It is easy to see that conditions 1 and 2 are both satisfied. We also have for 𝑚 ≤ 𝑛,

𝔼(𝑆𝑛 |ℱ𝑚 ) = 𝔼(𝑆𝑚 + ∑_{𝑘=𝑚+1}^{𝑛} 𝑋𝑘 | ℱ𝑚 ) = 𝑆𝑚 + (𝑛 − 𝑚)𝜇.

Thus, 𝑆 is a submartingale if 𝜇 ≥ 0, a supermartingale if 𝜇 ≤ 0 and a martingale if 𝜇 = 0.
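The three cases can be illustrated numerically. The sketch below (illustrative Python with NumPy; the normal step distribution is an assumption of this aside, any i.i.d. ℒ¹ steps would do) simulates random walks with positive, zero and negative mean steps and compares the sample mean of 𝑆𝑛 with 𝑥 + 𝑛𝜇.

import numpy as np

rng = np.random.default_rng(3)
x, n, n_paths = 0.0, 50, 50_000

for mu in (0.2, 0.0, -0.2):                  # sub-martingale, martingale, super-martingale
    steps = rng.normal(loc=mu, scale=1.0, size=(n_paths, n))
    S_n = x + steps.sum(axis=1)
    print(mu, S_n.mean(), x + n * mu)        # sample mean vs. theoretical x + n * mu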

In a similar fashion to the previous example, if 𝑋𝑘 ≥ 0 with 𝔼(𝑋𝑘 ) = 𝜇, then 𝑆 = {𝑆𝑛 : 𝑛 ∈ ℕ} defined by 𝑆0 ≔ 1, 𝑆𝑛 ≔ ∏_{𝑘=1}^{𝑛} 𝑋𝑘 for 𝑛 ≥ 1 is a martingale if and only if 𝜇 = 1. It is a submartingale if 𝜇 ≥ 1 and a supermartingale if 𝜇 ≤ 1.


Let (Ω, ℱ, 𝔽, ℙ) be a filtered space and 𝑋 = {𝑋𝑛 : 𝑛 ∈ ℕ} be a stochastic process. Then 𝑋 is a


martingale if and only if 𝑋 is adapted to 𝔽, 𝑋𝑛 ∈ ℒ 1 for each 𝑛 ∈ ℕ and

𝐸(𝑋𝑛+1 |ℱ𝑛 ) = 𝑋𝑛 ℙ − a.s..

Let (Ω, ℱ, 𝔽, ℙ) be a filtered space and 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} be a stochastic process.

1 If 𝑋 is a martingale, then 𝔼(𝑋𝑡 ) = 𝔼(𝑋𝑠 ) = 𝔼(𝑋0 ) for every 𝑠, 𝑡 ∈ 𝕀. That is, 𝑋 has a
constant mean.

2 If 𝑋 is a submartingale, then 𝔼(𝑋𝑡 ) ≥ 𝔼(𝑋𝑠 ) ≥ 𝔼(𝑋0 ) for every 𝑠 ≤ 𝑡 ∈ 𝕀. That is, 𝑋 has an
increasing mean.

3 If 𝑋 is a supermartingale, then 𝔼(𝑋𝑡 ) ≤ 𝔼(𝑋𝑠 ) ≤ 𝔼(𝑋0 ) for every 𝑠 ≤ 𝑡 ∈ 𝕀. That is, 𝑋 has a
decreasing mean.

A function φ: ℝ → ℝ is convex if for any 𝑥, 𝑦 ∈ ℝ and λ ∈ [0,1],

𝜑(𝜆𝑥 + (1 − 𝜆)𝑦) ≤ 𝜆𝜑(𝑥) + (1 − 𝜆)𝜑(𝑦).

Examples of convex functions include 𝜑(𝑥) = 𝑥 2 , 𝑒 𝑥 , − ln 𝑥 , |𝑥|.

Jensen's inequality: Let (Ω, ℱ, ℙ) be a probability space and 𝑋 ∈ ℒ 1 . If 𝜑: ℝ → ℝ is a convex


function such that 𝜑(𝑋) ∈ ℒ 1 , and 𝒢 ⊆ ℱ is a sub-𝜎-algebra of ℱ , then

𝔼(𝜑(𝑋)|𝒢) ≥ 𝜑(𝔼(𝑋|𝒢)) ℙ − 𝑎. 𝑠..

In particular,

𝔼(𝜑(𝑋)) ≥ 𝜑(𝔼(𝑋)).
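A quick numerical illustration of the unconditional form with 𝜑(𝑥) = 𝑥² (an illustrative sketch, not part of the original notes):

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(loc=0.5, scale=2.0, size=1_000_000)

print(np.mean(X ** 2))   # E(phi(X)), roughly 0.25 + 4 = 4.25
print(np.mean(X) ** 2)   # phi(E(X)), roughly 0.25 -- always the smaller of the two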


Let (Ω, ℱ, 𝔽, ℙ) be a filtered probability space and 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} be a martingale. If 𝜑 is a convex


function such that 𝜑(𝑋𝑡 ) ∈ ℒ 1 for every 𝑡 ∈ 𝕀, then {𝜑(𝑋𝑡 ): 𝑡 ∈ 𝕀} is a submartingale.

Let (Ω, ℱ, 𝔽, ℙ) be a filtered probability space and 𝐻 ∈ ℒ 1 . For any index set 𝕀 ⊆ [0, ∞),
define 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} by 𝑋𝑡 ≔ 𝔼(𝐻|ℱ𝑡 ) for each 𝑡 ∈ 𝕀. Then 𝑋 is an (𝔽, ℙ)-martingale.


Unit 5: Problem Set


Problem 1
Let Ω = {𝑎, 𝑏, 𝑐, 𝑑}, ℱ = 2Ω , ℱ0 = {∅, Ω}, ℱ1 = 𝜎({{𝑎, 𝑏}, {𝑐, 𝑑}}), ℱ2 = ℱ and 𝑋 = {𝑋0 , 𝑋1 , 𝑋2 } be
defined as

Let 𝜏: Ω → {0,1,2, ∞} be 𝜏(ω) ≔ inf{𝑛 ∈ {0,1,2}: 𝑋𝑛 (ω) ≥ 3}. Then what is (𝜏(𝑎), 𝜏(𝑏), 𝜏(𝑐), 𝜏(𝑑))?

Solution:
In this example we have four different possible realizations of ω, that is Ω = {𝑎, 𝑏, 𝑐, 𝑑}. We can
interpret each realization {𝑎, 𝑏, 𝑐, 𝑑} as a potential path that the discrete stochastic process can
follow. In this case, 𝑋 = {𝑋0 , 𝑋1 , 𝑋2 }, is only allowed to follow four different paths, as you can see
in the table:


For instance, in the path 𝑎, the realization of 𝑋(𝑎) = {𝑋0 , 𝑋1 , 𝑋2 } is 𝑋(𝑎) = {1,3,4}. As the
problem defined the stopping time as:

𝜏(ω) ≔ inf{𝑛 ∈ {0,1,2}: 𝑋𝑛 (𝜔) ≥ 3},

we just have to find the value of 𝑡 (take into account that we have only three times in this
problem {0,1,2}) such that 𝑋𝑡 (ω) ≥ 3, and repeat the process for each path, {𝑎, 𝑏, 𝑐, 𝑑}, in order to

compute (𝜏(𝑎), 𝜏(𝑏), 𝜏(𝑐), 𝜏(𝑑)).

Let's focus on the first path, 𝑎. The realization for this path (from the table) is 𝑋(𝑎) = {1,3,4}. It is easy to see that 𝑋0 (𝑎) = 1 does not satisfy the condition 𝑋𝑡 (𝑎) ≥ 3. On the other hand, 𝑋1 (𝑎) = 3 does satisfy the condition, thus 𝑡 = 1 is the minimum time (inf) such that 𝑋𝑡 (𝑎) ≥ 3. Thus, we can conclude that 𝜏(𝑎) = 1.

For paths 𝑏 and 𝑐 , we will get (following the same procedure) 𝜏(𝑏) = 1 and 𝜏(𝑐) = 2.

Finally, considering path 𝑑 , we observe that there is no time 𝑡 such that 𝑋𝑡 (𝑑) ≥ 3. As the problem tells us that 𝜏: Ω → {0,1,2, ∞}, we can conclude that 𝜏(𝑑) = ∞.

To sum up, the solution is equal to: (𝜏(𝑎), 𝜏(𝑏), 𝜏(𝑐), 𝜏(𝑑)) = (1,1,2, ∞).
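The computation of 𝜏 is easy to automate. The helper below is an illustrative Python sketch; only the path 𝑋(𝑎) = {1, 3, 4} is written out explicitly in this solution, so the remaining paths would be read off the problem's table in the same way.

import math

def first_hit(path, level=3):
    # inf{n : X_n >= level}, with math.inf when the level is never reached.
    for n, value in enumerate(path):
        if value >= level:
            return n
    return math.inf

print(first_hit([1, 3, 4]))   # path a: tau(a) = 1
# Applying the same helper to paths b, c and d from the table reproduces
# (tau(a), tau(b), tau(c), tau(d)) = (1, 1, 2, inf).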


Problem 2
Let Ω = {𝑎, 𝑏, 𝑐, 𝑑}, ℱ = 2Ω , ℱ0 = {∅, Ω}, ℱ1 = 𝜎({{𝑎, 𝑐}, {𝑏, 𝑑}}), ℱ2 = ℱ and 𝑋 = {𝑋0 , 𝑋1 , 𝑋2 }, 𝜏 be

defined as

Then what is 𝑋𝜏 (𝑑)?

Solution:
First of all, let’s summarize the theory (from the lecture notes) that we are going to need to solve
this problem. We know that if 𝜏 is a stopping time and 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} is a stochastic process, we
can define the stopped process 𝑋 𝜏 = {𝑋𝑡𝜏 : 𝑡 ∈ 𝕀} as 𝑋𝑡𝜏 ≔ 𝑋𝜏∧𝑡 – i.e.,

𝑋𝑡𝜏 (ω) ≔ { 𝑋𝑡 (ω)         if 𝑡 < 𝜏(ω)
          { 𝑋𝜏(ω) (ω)      if 𝑡 ≥ 𝜏(ω),        ω ∈ Ω.

We also define the random variable 𝑋𝜏 by

𝑋𝜏 (ω) ≔ { 𝑋𝜏(𝜔) (𝜔)      if 𝜏(𝜔) < ∞
         { 0               if 𝜏(𝜔) = ∞.

As the problem only asks about path 𝑑 , we can focus only on ω = 𝑑 . As 𝜏(𝑑) = ∞ we can
already conclude (see the definition of the random variable 𝑋𝜏 above) that 𝑋𝜏 (𝑑) = 0.


Problem 3
Let Ω = {𝑎, 𝑏, 𝑐, 𝑑}, ℱ = 2Ω , ℱ0 = {∅, Ω}, ℱ1 = 𝜎({{𝑎, 𝑏}, {𝑐, 𝑑}}), ℱ2 = ℱ and 𝑋 = {𝑋0 , 𝑋1 , 𝑋2 } be a

martingale defined as

Then what is α?

Solution:
We know the definition of a martingale from the theory.

Let (Ω, ℱ, 𝔽, ℙ) be a filtered space and 𝑋 = {𝑋𝑛 : 𝑛 ∈ ℕ} be a stochastic process. Then 𝑋 is a


martingale if and only if 𝑋 is adapted to 𝔽, 𝑋𝑛 ∈ ℒ 1 for each 𝑛 ∈ ℕ and

𝔼(𝑋𝑛+1|ℱ𝓃 ) = 𝑋𝑛 ℙ − 𝑎. 𝑠..

Moreover, the condition above implies the following, more practical, proposition, which gives a necessary condition that we can exploit here.

Let (Ω, ℱ, 𝔽, ℙ) be a filtered space and 𝑋 = {𝑋𝑡 : 𝑡 ∈ 𝕀} be a stochastic process. If 𝑋 is a martingale, then 𝔼(𝑋𝑡 ) = 𝔼(𝑋𝑠 ) = 𝔼(𝑋0 ) for every 𝑠, 𝑡 ∈ 𝕀. That is, 𝑋 has a constant mean.

Following the above proposition, we can solve the problem as follows. First, we compute 𝔼(𝑋0 ), which does not depend on α:


𝔼(𝑋0 ) = (1/4)·2 + (1/4)·2 + (1/4)·2 + (1/4)·2 = 2.

Now we have to find α such that 𝔼(𝑋1 ) = 2. It is easy to see that α = 2 by solving the following equation:

(1/4)·3 + (1/4)·α + (1/4)·2 + (1/4)·1 = 2.

Thus, the solution is α = 2.
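The same computation can be written as one line of code. The sketch below (an illustrative Python check; it simply mirrors the equation above) solves (1/4)(3 + α + 2 + 1) = 2 for α.

# Constant-mean condition: (3 + alpha + 2 + 1) / 4 = E(X_0) = 2.
target_mean = 2.0
known_values = [3.0, 2.0, 1.0]                # the three known values of X_1
alpha = 4 * target_mean - sum(known_values)   # solve for the missing value
print(alpha)                                  # 2.0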


Bibliography

Shiryaev, A.N., 1996. Probability. Graduate Texts in Mathematics, R.P. Boas, Ed. Springer.

Rogers, L.C.G. & Williams, D., 1987. Diffusions, Markov processes and martingales. Cambridge
University Press.

Williams, D., 1991. Probability with martingales. Cambridge Mathematical Textbooks. Cambridge
University Press.


Collaborative Review Task

In this module, you are required to complete a collaborative review task, which is designed to
test your ability to apply and analyze the knowledge you have learned during the week.

Brief
Let Ω = {𝑎, 𝑏, 𝑐, 𝑑} and

ω     ℙ({ω})   𝑋0 (ω)   𝑋1 (ω)   𝑋2 (ω)   𝑌(ω)
𝑎     1/9      12       18       36       12
𝑏     2/9      12       18       9        15
𝑐     1/6      12       9        12       16
𝑑     1/2      12       9        8        8

Let 𝕀 = {0,1,2} and define 𝜏: Ω → 𝕀 ∪ {∞} by

𝜏(𝜔) = 𝑖𝑛𝑓{𝑛 ∈ 𝕀: 𝑋𝑛 (𝜔) < 10}.

1 Find 𝔽 𝑋 = {ℱ𝑛𝑋 : 𝑛 ∈ 𝐼} = {ℱ0𝑋 , ℱ1𝑋 , ℱ2𝑋 }, the natural filtration of 𝑋.


2 Calculate 𝜏(𝜔) for each 𝜔 ∈ Ω.

3 Show that 𝜏 is a stopping time with respect to 𝔽 𝑋 .

4 Is 𝑋 a martingale with respect to 𝔽 𝑋 ?

5 Define 𝑍 = {𝑍0 , 𝑍1 , 𝑍2 } by 𝑍𝑛 = 𝐸(𝑌|ℱ𝑛𝑋 ) for 𝑛 ∈ 𝕀. Show that 𝑍 is an 𝔽 𝑋 -martingale.


6 Find the stopped process 𝑍 𝜏 .
