
ORIGIN OF MARKOV CHAINS

By Ritwik Vashistha
Shivam Pandey
WEAK LAW OF LARGE NUMBERS
The weak law of large numbers (also called Khinchin's law) states that for i.i.d.
random variables, the sample average converges in probability towards the expected value.

Interpreting this result, the weak law states that for any nonzero margin specified, no matter
how small, with a sufficiently large sample there will be a very high probability that the
average of the observations will be close to the expected value; that is, within the margin.
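Formally, writing X̄n for the average of the first n observations X1, ..., Xn and μ for their common expected value:

X̄n = (X1 + X2 + ... + Xn)/n → μ in probability, i.e. for every ε > 0, P(|X̄n − μ| > ε) → 0 as n → ∞.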
IMPLICATIONS OF WEAK LAW OF LARGE
NUMBERS
Bernoulli said, "If observations of all events be continued for the entire infinity, it will be noticed that everything in the world is governed by precise ratios and a constant law of change."

The average height of people, for instance, can be determined by taking a sufficiently large sample.
Jacob Bernoulli
TWO CUPS EXAMPLE
30,000 white pebbles and 20,000 black pebbles

Can we determine the ratio of white to black by experiment?

The true ratio is 3/2 (30,000 : 20,000).

As the number of draws increases, the observed ratio of white to black observations converges to this actual ratio; this convergence of the sample ratio to the true ratio is precisely the Weak Law of Large Numbers.
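A quick simulation makes this concrete (a minimal Python sketch; the cup contents come from the example above, and the sample sizes are illustrative):

import random

# The cup from the example: 30,000 white and 20,000 black pebbles.
cup = ["white"] * 30000 + ["black"] * 20000

for n in (100, 1_000, 10_000, 100_000):
    draws = random.choices(cup, k=n)        # draw n pebbles with replacement
    ratio = draws.count("white") / draws.count("black")
    print(n, round(ratio, 3))               # observed white:black ratio approaches 1.5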
ARGUMENT AGAINST WLLN

• Pavel Nekrasov didn't like the idea of us having a pre-determined Statistical Fate.

• He claimed that independence is a necessary condition for the Weak Law of Large Numbers.

• Markov argued that the convergence seen in the Weak Law of Large Numbers applies to dependent events as well.

Pavel Nekrasov Andrey Markov


A SIMPLE WEATHER MODEL
Consider a scenario where you live at a place where they never have two nice days
in a row. If they have a nice day, they are just as likely to have snow or rain the next
day. If they have snow or rain, they have an even chance of having the same the next
day. If there is change from snow or rain, only half of the time is this a change to a
nice day.

Now, you might be interested in finding the probability that it will be a nice day 15 days from now, given that it rained today, or given that today was a nice day.

WLLN is not applicable here, so how can we find a ‘pre-determined Statistical Fate’ ?
FINDING CONDITIONAL PROBABILITIES

P(R2|R1)=? P(N2|R1)=? P(S2|R1)=?

P(R2|N1)=? P(N2|N1)=? P(S2|N1)=?

P(R2|S1)=? P(N2|S1)=? P(S2|S1)=?


FINDING CONDITIONAL PROBABILITIES

P(Ri+1|Ri)=0.50, P(Ni+1|Ri)=0.25, P(Si+1|Ri)=0.25

P(Ri+1|Ni)=0.50, P(Ni+1|Ni)=0.00, P(Si+1|Ni)=0.50

P(Ri+1|Si)=0.25, P(Ni+1|Si)=0.25, P(Si+1|Si)=0.50


CAN WE REPRESENT THE PROBABILITIES IN
A MATRIX?
With rows indexing today's weather (t) and columns tomorrow's weather (t+1), in the order R, N, S:

       R      N      S
R    0.50   0.25   0.25
N    0.50   0.00   0.50
S    0.25   0.25   0.50

Now consider the question of determining the probability that, given that it is rainy today, it will be a nice day two days from now.

We denote this probability by PRN(2).

We see that if it is rainy today then the event that it is nice two days from now is the disjoint
union of the following three events:
1) it is rainy tomorrow and nice the next day,
2) it is nice tomorrow and nice the next day, and
3) it is snowy tomorrow and nice the next day.

The probability of the first of these events is the product of the conditional probability that it
is rainy tomorrow, given that it is rainy today, and the conditional probability that it is nice two
days from now, given that it is rainy tomorrow.
We can write this product as PRR*PRN.
Thus, we have PRN(2) = PRR*PRN + PRN*PNN + PRS*PSN
PRN(2) = 0.5*0.25 + 0.25*0 + 0.25*0.25 = 0.1875 ≈ 0.188
This equation should remind us of the dot product of two vectors: we are dotting the first row of P with the second column of P.
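This is exactly matrix multiplication: the (R, N) entry of P² is PRN(2). A minimal numpy sketch (P is the transition matrix built above; the only assumption is the state ordering R, N, S):

import numpy as np

# Transition matrix of the weather model; rows = today, columns = tomorrow, order R, N, S.
P = np.array([
    [0.50, 0.25, 0.25],   # from Rainy
    [0.50, 0.00, 0.50],   # from Nice
    [0.25, 0.25, 0.50],   # from Snowy
])

P2 = P @ P
print(P2[0, 1])                          # PRN(2) = 0.1875

# The earlier question: a nice day 15 days from now.
P15 = np.linalg.matrix_power(P, 15)
print(P15[0, 1])                         # starting from a rainy day
print(P15[1, 1])                         # starting from a nice day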

But why is this notion important?


IMPLICATIONS OF THE EXPERIMENT

The probabilities are not independent: each day's weather depends on the previous day's.

No matter where you start, once you begin the experiment, the chance of a nice day in the long run converges to some specific ratio.

The sample ratio converges to the population ratio.
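We can check this long-run behaviour numerically. A minimal sketch (an illustrative simulation reusing the transition matrix P from above; the seed and number of steps are arbitrary):

import numpy as np

P = np.array([[0.50, 0.25, 0.25],
              [0.50, 0.00, 0.50],
              [0.25, 0.25, 0.50]])

rng = np.random.default_rng(0)
state = 0                                   # start from Rainy (any start gives the same limit)
counts = np.zeros(3)
for _ in range(100_000):
    state = rng.choice(3, p=P[state])       # tomorrow depends only on today
    counts[state] += 1

print(counts / counts.sum())                # empirical long-run fractions of R, N, S
print(np.linalg.matrix_power(P, 50)[0])     # every row of P^n approaches the same distribution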


MARKOV CHAIN

A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

An alternative definition: a Markov chain is a stochastic process that satisfies the Markov property.
MARKOV PROPERTY
A stochastic process has the Markov property if
the conditional probability distribution of future
states of the process (conditional on both past
and present states) depends only upon the
present state, not on the sequence of events that
preceded it.
REVISITING OUR EXAMPLE

Here our random variable is Xt, which can take the values Rainy Day, Nice Day, or Snowy Day at time period t.
The range (possible values) of the random variables in a stochastic process is called the state space of the process.
So our state space is {Rainy, Nice, Snowy}, or {R, N, S}.
MARKOV DECISION PROCESS
Basis for sequential decision making: at each time step the agent observes a state S(t), takes an action A(t), and receives a reward.

R(t+1) = f(S(t), A(t))
RETURN
Sequence:
S(0), A(0), R(1), S(1), A(1), R(2), S(2), A(2), R(3), ...

Return:
G(t) = R(t+1) + R(t+2) + R(t+3) + R(t+4) + ... + R(T)

Discounted Return:
G(t) = R(t+1) + γ·R(t+2) + γ²·R(t+3) + ...
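As a small illustration of the discounted return (the reward values below are made up; γ is the discount factor from the formula above):

gamma = 0.9                                   # discount factor
rewards = [1.0, 0.0, 2.0, 5.0]                # R(t+1), R(t+2), R(t+3), R(t+4), illustrative

# G(t) = R(t+1) + gamma*R(t+2) + gamma^2*R(t+3) + ...
G = sum(gamma**k * r for k, r in enumerate(rewards))
print(G)                                      # 1.0 + 0.0 + 0.81*2.0 + 0.729*5.0 = 6.265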
Q-VALUE AND Q-LEARNING
• Q-value is simply the expected return of taking an action in any given state.

• Q-learning is the process of learning the optimal policy by learning the Q-value for each state-action pair.

• It uses an iterative approach to update the Q-value every time an action is taken.
[Q-table: possible states as rows, possible actions as columns]
EXPLORATION V/S EXPLOITATION

Exploration tries actions at random to discover their value; exploitation picks the action with the highest known Q-value. What is better? How to choose? A common answer is the ε-greedy rule, sketched below.
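A minimal ε-greedy sketch of that choice (the Q-values and ε below are illustrative assumptions, not taken from the slides):

import random

def choose_action(q_row, epsilon=0.1):
    """Pick an action given the Q-values of the current state (one row of the Q-table)."""
    if random.random() < epsilon:
        # Explore: try a random action to gather new information.
        return random.randrange(len(q_row))
    # Exploit: take the action with the highest known Q-value.
    return max(range(len(q_row)), key=lambda a: q_row[a])

print(choose_action([0.2, 0.8, 0.5]))         # usually action 1, occasionally a random action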


HOW MUCH CAN I LEARN?

Learning rate = α [ 0 <= α <= 1 ]

Updated q-value = (1- α) * Old q-value + α * New Return Received
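A minimal sketch of this update rule (the Q-table size, α, and the received return are illustrative; the slide's simple "new return received" target is used, whereas full Q-learning would bootstrap with reward + γ·max Q over the next state's actions):

n_states, n_actions = 3, 2                    # illustrative sizes
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha = 0.5                                   # learning rate, 0 <= alpha <= 1

def update(state, action, new_return):
    """Blend the old Q-value with the newly received return."""
    Q[state][action] = (1 - alpha) * Q[state][action] + alpha * new_return

update(0, 1, 10.0)
print(Q[0][1])                                # 0.5*0.0 + 0.5*10.0 = 5.0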


For fixed value of parameters,
THANK YOU
BIBLIOGRAPHY
Khan Academy
Wikipedia
NPTEL
Research Papers
