Applications
Example – Flight cancellation concerns: Daily Flight Cancellations Data

State   Description
0       no cancellations
1       one cancellation
2       two cancellations
3       more than two cancellations
• For a recurrent state: $F_{ii} = \sum_{n=1}^{\infty} f_{ii}^{(n)} = 1$, where $f_{ii}^{(n)}$ is the probability of first return to state i after exactly n steps
• For a transient state: $F_{ii} = \sum_{n=1}^{\infty} f_{ii}^{(n)} < 1$
• Mean recurrence time: $\mu_{ii} = \sum_{n=1}^{\infty} n \times f_{ii}^{(n)}$
• Positive recurrent state: $\mu_{ii} < \infty$ (i.e. finite mean recurrence time)
• Null recurrent state: $\mu_{ii} = \infty$ (i.e. infinite mean recurrence time)
Periodic State
• A periodic state is a special case of a recurrent state
• Let $d(i)$ be the greatest common divisor of all n such that $P_{ii}^{(n)} > 0$
• Aperiodic state: $d(i) = 1$
• Periodic state: $d(i) \geq 2$
Example: consider the three-state chain with transition matrix

        1  2  3
P =  1  0  1  0
     2  0  0  1
     3  1  0  0

$P_{11}^{(2)} = 0$, but for n = multiples of 3, $P_{11}^{(n)} = 1 > 0$, so $d(1) = 3$ and state 1 is periodic.
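As a quick numerical check (a minimal NumPy sketch, not from the slides), raising P to successive powers confirms that $P_{11}^{(n)} > 0$ only when n is a multiple of 3:

```python
import numpy as np

# Transition matrix of the three-state cyclic chain above
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# P_11^(n) is the (0, 0) entry of P^n; it is positive only when n % 3 == 0
for n in range(1, 10):
    Pn = np.linalg.matrix_power(P, n)
    print(f"n={n}: P_11^({n}) = {Pn[0, 0]:.0f}")
```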
Ergodic Markov Chain
• A state i of a Markov chain is ergodic when it is positive recurrent and aperiodic
• A Markov chain in which all states are ergodic is an ergodic Markov chain
• An ergodic Markov chain has a stationary distribution $\pi$ that satisfies $\pi P = \pi$ and $\sum_{k=1}^{m} \pi_k = 1$
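One common way to compute $\pi$ numerically (a sketch, not from the slides) is to solve $\pi P = \pi$ together with the normalization constraint, replacing one redundant equation of the singular system:

```python
import numpy as np

def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Solve pi P = pi with sum(pi) = 1 for an ergodic chain."""
    m = P.shape[0]
    # The transposed system (P^T - I) pi = 0 has rank m - 1 for an
    # ergodic chain; replace the last equation with sum(pi) = 1.
    A = P.T - np.eye(m)
    A[-1, :] = 1.0
    b = np.zeros(m)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Illustrative two-state ergodic chain (placeholder numbers)
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
print(stationary_distribution(P))  # -> [0.8, 0.2]
```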
Limiting Probability
• The limiting probability is $\lim_{n \to \infty} P_{ij}^{(n)}$
• The limiting probability may depend on the initial state and is not necessarily unique
• The stationary distribution is unique and does not depend on the initial state
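The contrast can be checked numerically (an illustrative sketch): for an ergodic chain, $P^n$ converges to a matrix with identical rows, so the limit does not depend on the starting state, while for the periodic chain above, $P^n$ keeps cycling and has no limit:

```python
import numpy as np

ergodic = np.array([[0.9, 0.1],
                    [0.4, 0.6]])
periodic = np.array([[0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [1.0, 0.0, 0.0]])

# Rows of the ergodic chain's P^n become identical for large n
print(np.linalg.matrix_power(ergodic, 100))

# The periodic chain's P^n never settles: it cycles with period 3
for n in (99, 100, 101):
    print(np.linalg.matrix_power(periodic, n))
```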
Markov Chains with Absorbing States
• Absorbing state: $P_{ii} = 1$
• An absorbing state Markov chain is a Markov chain in which there is at least one state k such that $P_{kk} = 1$
• Absorbing state Markov chains are not ergodic, since the other states will be transient (i.e. $\sum_{n=1}^{\infty} P_{ii}^{(n)} < \infty$ for each transient state i)
• The transition matrix corresponding to an absorbing state Markov chain is not a regular matrix, and thus the chain does not have a stationary distribution
• i.e. $\pi_0 P^n$ may not converge to a unique value and depends on the initial distribution $\pi_0$
• The long-run probability of finding the system in a transient state is zero
Canonical Form of the Transition Matrix of an Absorbing State Markov Chain
• I = identity matrix (corresponds to transitions between absorbing states)
• 0 = matrix in which all elements are zero (i.e. no transitions from an absorbing state to a transient state)
• R = matrix whose elements represent the probability of absorption from a transient state into an absorbing state
• Q = matrix whose elements represent transitions between transient states
With rows and columns ordered as (A, T), i.e. absorbing states first and transient states second:

$$P = \begin{pmatrix} I & 0 \\ R & Q \end{pmatrix} \qquad P^n = \begin{pmatrix} I & 0 \\ \sum_{k=0}^{n-1} Q^k R & Q^n \end{pmatrix}$$
Fundamental Matrix
• For large values of n, $\sum_{k=0}^{n-1} Q^k R$ converges to $(I - Q)^{-1} R = NR$, which gives the probability of eventual absorption into each absorbing state from each transient state
• $N = (I - Q)^{-1}$ is called the fundamental matrix; its row sums give the expected number of steps before absorption from each transient state (with $E_{jj} = 0$ when j is itself an absorbing state)
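As a numerical sketch (with an illustrative chain having one absorbing and two transient states; the numbers are placeholders, not from the slides):

```python
import numpy as np

# Canonical-form blocks of a small absorbing chain (illustrative values)
Q = np.array([[0.5, 0.3],   # transient -> transient
              [0.2, 0.4]])
R = np.array([[0.2],        # transient -> absorbing
              [0.4]])

N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
print("Absorption probabilities N R:")
print(N @ R)                       # each row sums to 1 here (one absorbing state)
print("Expected steps to absorption N 1:")
print(N @ np.ones(2))
```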
Example – How long does it take for the NPA (non-performing assets) problem to become worse?
State 1: NPA is less than 1%
State 2: NPA is between 1% and 2%
State 3: NPA is between 2% and 3%
State 4: NPA is between 3% and 4%
State 5: NPA is between 4% and 5%
State 6: NPA is between 5% and 6%
State 7: NPA is greater than 6%
Transition probability matrix (based on monthly data)
States State1 State2 State3 State4 State5 State6 State7
1 0.95 0.05 0 0 0 0 0
2 0.1 0.85 0.05 0 0 0 0
3 0 0.1 0.8 0.1 0 0 0
4 0 0 0.15 0.7 0.15 0 0
5 0 0 0 0.15 0.65 0.2 0
6 0 0 0 0 0.2 0.6 0.2
7 0 0 0 0 0 0.1 0.9
Question
• Calculate the expected duration (in months) for the process to reach state 7 from state 4
$$E_{i7} = 1 + \sum_{k} P_{ik} E_{k7} \quad \forall i \neq 7, \qquad E_{77} = 0$$
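This system can be solved directly: treating state 7 as absorbing, take Q as the block of the matrix above over states 1–6 and solve $(I - Q)E = \mathbf{1}$ (a NumPy sketch):

```python
import numpy as np

# Rows/columns for states 1..6 of the NPA transition matrix
Q = np.array([
    [0.95, 0.05, 0.00, 0.00, 0.00, 0.00],
    [0.10, 0.85, 0.05, 0.00, 0.00, 0.00],
    [0.00, 0.10, 0.80, 0.10, 0.00, 0.00],
    [0.00, 0.00, 0.15, 0.70, 0.15, 0.00],
    [0.00, 0.00, 0.00, 0.15, 0.65, 0.20],
    [0.00, 0.00, 0.00, 0.00, 0.20, 0.60],
])

# E_i7 = 1 + sum_k P_ik E_k7 with E_77 = 0  <=>  (I - Q) E = 1
E = np.linalg.solve(np.eye(6) - Q, np.ones(6))
print(f"Expected months from state 4 to state 7: {E[3]:.1f}")
```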
Markov Reward Processes
Markov Reward Process
NPV of Rewards for each state
Use of NPV
State Value Function
Bellman Equations - Obtaining the
expected value of rewards
Bellman Equations in Matrix Form
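In matrix form, the Bellman evaluation equation for a Markov reward process is $v = R + \gamma P v$, which has the closed-form solution $v = (I - \gamma P)^{-1} R$ for $\gamma < 1$; a minimal sketch (the rewards and transition matrix below are illustrative placeholders):

```python
import numpy as np

def value_function(P: np.ndarray, R: np.ndarray, gamma: float) -> np.ndarray:
    """Solve the Bellman equation v = R + gamma * P v in matrix form."""
    m = P.shape[0]
    return np.linalg.solve(np.eye(m) - gamma * P, R)

# Illustrative two-state Markov reward process (placeholder numbers)
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
R = np.array([10.0, -5.0])   # expected immediate reward in each state
print(value_function(P, R, gamma=0.75))
```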
Exercise
• Obtain the value functions for each state
• 𝛾𝛾 = 0
• 𝛾𝛾 = 1
• 𝛾𝛾 = 0.75
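Note that $\gamma = 0$ reduces the solution to $v = R$, and $\gamma = 0.75$ can be solved directly with the matrix form sketched above; for $\gamma = 1$, however, $I - \gamma P$ is singular (the rows of P sum to 1), so the undiscounted value function must be handled differently, e.g. over a finite horizon or with an average-reward criterion.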
Markov Decision Processes
• Used for analyzing sequential decision making over
a planning horizon
• Decisions are made in every state of the system, leading to outcomes over a period of time along with state changes. What are the best decisions?
• Player substitutions during a 90-minute football match
• Whether to promote a product or not
• When to buy and sell shares
• Movement of robots in a given context
• When to stop or change a television serial with the objective of maximizing television ratings
Two algorithms
• The objective is to find the optimal sequence of actions $\{a_0, a_1, \ldots\}$ that maximizes total rewards
• Policy iteration algorithm
• Value iteration algorithm
Example – Evaluating policies for maintenance of mining equipment
States: 1 (excellent condition), 2, 3, 4 (bad condition)
Discount factor 0.95
State 1 2 3 4
Revenue 20000 16000 12000 5000
Actions
1 Do nothing
2 Carry out preventive maintenance. This is applicable when in state 3
or state 4. Converts either state to state 2. Preventive maintenance
costs Rs. 2000
3 Replace the equipment. Applicable to states 2, 3 and 4. Converts
either state to state 1. Cost of replacement is Rs. 10000
Transition probability matrix
States 1 2 3 4
1 0.8 0.1 0.1 0
2 0 0.7 0.2 0.1
3 0 0 0.7 0.3
4 0 0 0 1
Find the policy values for the policies {1,1,2,2} and {1,1,2,3} (a worked sketch follows below).
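The slide gives the transition matrix only for the do-nothing action, so this sketch assumes that preventive maintenance moves the equipment to state 2 (and replacement to state 1) before the month's transition, and that the reward is the current state's revenue minus the action cost; these modelling choices are assumptions, not from the slides:

```python
import numpy as np

# Transition matrix under action 1 (do nothing), from the slide
P1 = np.array([[0.8, 0.1, 0.1, 0.0],
               [0.0, 0.7, 0.2, 0.1],
               [0.0, 0.0, 0.7, 0.3],
               [0.0, 0.0, 0.0, 1.0]])

revenue = np.array([20000.0, 16000.0, 12000.0, 5000.0])
cost = {1: 0.0, 2: 2000.0, 3: 10000.0}   # action costs in Rs.
gamma = 0.95

def policy_value(policy):
    """Evaluate a stationary policy (one action per state 1..4).

    Assumed model: action 2 restores the equipment to state 2 and
    action 3 replaces it (state 1) before the month's transition, so
    the corresponding row of P1 applies; the reward is the current
    state's revenue minus the action cost.
    """
    m = len(policy)
    P_pi = np.zeros((m, m))
    r_pi = np.zeros(m)
    for s, a in enumerate(policy):
        if a == 1:
            P_pi[s] = P1[s]      # stay in state s; its own row applies
        elif a == 2:
            P_pi[s] = P1[1]      # restored to state 2; row 2 applies
        else:
            P_pi[s] = P1[0]      # replaced; row 1 applies
        r_pi[s] = revenue[s] - cost[a]
    # Solve v = r + gamma * P_pi v
    return np.linalg.solve(np.eye(m) - gamma * P_pi, r_pi)

print(policy_value([1, 1, 2, 2]))
print(policy_value([1, 1, 2, 3]))
```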
Policy Iteration algorithm
• For any policy $\pi = \{a_0, a_1, \ldots\}$
• Use an LPP (linear programming problem)
• Represent the Bellman equations for the value of each state under each policy as inequalities, assuming that policy is optimal (the LP constraints)
• Minimize the total optimal value function to identify the
optimal policy
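A sketch of that LP with scipy.optimize.linprog, using a small illustrative two-state, two-action MDP (placeholder numbers): minimize $\sum_s v_s$ subject to $v_s \geq r(s,a) + \gamma \sum_{s'} P(s'|s,a)\, v_{s'}$ for every state-action pair:

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.95
# Illustrative model: P[(s, a)] = transition row, r[(s, a)] = reward
P = {
    (0, 0): np.array([0.9, 0.1]), (0, 1): np.array([0.5, 0.5]),
    (1, 0): np.array([0.2, 0.8]), (1, 1): np.array([0.6, 0.4]),
}
r = {(0, 0): 5.0, (0, 1): 8.0, (1, 0): 1.0, (1, 1): -2.0}
m = 2

# v_s >= r(s,a) + gamma * P(s,a) . v  rearranged into <= form:
# (gamma * P(s,a) - e_s) . v <= -r(s,a)
A_ub, b_ub = [], []
for (s, a), row in P.items():
    lhs = gamma * row
    lhs[s] -= 1.0
    A_ub.append(lhs)
    b_ub.append(-r[(s, a)])

# Minimize the total value; values may be negative, so remove the
# default nonnegativity bounds
res = linprog(c=np.ones(m), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * m)
print("Optimal state values:", res.x)
```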
Value Iteration Algorithm
• Based on a finite planning horizon
• Dynamic Programming Algorithm
• Identify the time period of planning
• Start from the last time period, n
• Obtain the best action for each state for that time period
based on immediate rewards generated
• Move backwards to the previous time period, n-1
• Obtain the best action for that time period for each state
based on rewards and expected value
• Repeat these steps till you reach the first time period
• The total optimal value obtained for each state corresponds to the optimal actions taken at each stage (i.e. the action profile over the planning horizon can also be mapped)
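A minimal backward-induction sketch of these steps (the two-state, two-action MDP below is an illustrative placeholder):

```python
import numpy as np

gamma = 0.95
horizon = 12   # assumed number of time periods in the planning horizon

# Illustrative model: P[s, a] = transition row, r[s, a] = immediate reward
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.6, 0.4]]])
r = np.array([[5.0, 8.0],
              [1.0, -2.0]])

m, n_actions = r.shape
v = np.zeros(m)                              # value after the last period
policy = np.zeros((horizon, m), dtype=int)   # best action per period and state

# Start from the last period and move backwards to the first
for t in reversed(range(horizon)):
    q = r + gamma * P @ v    # q[s, a] = immediate reward + expected future value
    policy[t] = np.argmax(q, axis=1)
    v = np.max(q, axis=1)

print("Optimal first-period action per state:", policy[0])
print("Total optimal value per state:", v)
```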