
Markov Chain Monte Carlo

Chapter 14

Sampling
Why sampling?
Looking at samples from a model can help to understand the model's properties

Sampling methods are built on random number generation, so we will look at how random numbers are generated and how to draw them from different distributions.

Random Numbers
A random number is a number generated by a process whose outcome is unpredictable and which cannot subsequently be reliably reproduced. Truly random numbers do not exist in software, but there are plenty of algorithms that produce pseudo-random numbers.

Random Numbers (contd..)


Linear congruential generator:
The simplest algorithm for generating pseudo-random numbers. It is defined by the recurrence relation x_{n+1} = (a*x_n + c) mod m, where the multiplier a, increment c, and modulus m are fixed constants and x_0 is the seed.
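Below is a minimal sketch of a linear congruential generator. The constants a, c, and m are illustrative choices (the widely quoted "Numerical Recipes" parameters), not the values used by any particular library.

```python
# Minimal linear congruential generator (LCG) sketch.
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield pseudo-random integers x_{n+1} = (a*x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=42)
# Dividing by m gives floats that are approximately uniform on [0, 1).
uniforms = [next(gen) / 2**32 for _ in range(5)]
print(uniforms)
```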

Random Numbers (contd..)


The industry-standard algorithm for generating random samples is the Mersenne Twister (based on Mersenne prime numbers). The Mersenne Twister produces uniformly distributed random numbers. For other distributions there are other schemes, e.g. the Box-Muller scheme for Gaussian-distributed random numbers.
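As a point of reference, NumPy's legacy RandomState generator is documented as being based on the MT19937 Mersenne Twister; a short usage sketch:

```python
import numpy as np

# RandomState is the MT19937-based legacy generator in NumPy.
rng = np.random.RandomState(seed=0)
u = rng.uniform(size=5)   # uniform samples on [0, 1)
g = rng.normal(size=5)    # Gaussian samples (the internal transform is hidden by the API)
print(u, g)
```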

The Box-Muller Scheme


1. Pick two uniformly distributed random numbers between -1 and 1, (x1, x2)
2. If x1^2 + x2^2 > 1, discard them and pick two more
3. Compute w = x1^2 + x2^2
4. Compute y1 = x1 * sqrt(-2 ln(w) / w) and y2 = x2 * sqrt(-2 ln(w) / w)
5. These yi have probability density

p(y1, y2) = (1 / (2*pi)) * exp(-(y1^2 + y2^2) / 2)

which describes two independent variables with zero-mean, unit-variance Gaussian distributions.
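A minimal sketch of these steps in code; the loop implements the rejection in step 2:

```python
import numpy as np

def box_muller_pair(rng=np.random):
    """Return two independent N(0, 1) samples via the polar Box-Muller scheme."""
    while True:
        x1, x2 = rng.uniform(-1, 1, size=2)   # step 1: uniform on [-1, 1]
        w = x1**2 + x2**2                      # step 3
        if 0 < w <= 1:                         # step 2: reject points outside the unit circle
            break
    factor = np.sqrt(-2.0 * np.log(w) / w)     # step 4
    return x1 * factor, x2 * factor            # step 5: y1, y2 ~ N(0, 1)

samples = [y for _ in range(500) for y in box_muller_pair()]
```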

The Box-Muller Scheme


This introduces the idea of rejection: when the original samples were not inside the unit circle, they were rejected and another pair was computed to replace them.

Monte Carlo
If you take independent and identically distributed samples from an unknown high-dimensional distribution p(x), then as the number of samples gets larger the sample distribution will converge to the true distribution. Mathematically,

p_N(x) = (1/N) * sum_{i=1..N} delta(x - x^(i))  ->  p(x)   as N -> infinity

Monte Carlo (contd..)


We can also use these samples to compute expectations:

E[f] ≈ (1/N) * sum_{i=1..N} f(x^(i))

And even use them to find a maximum:

x_max = argmax over the samples x^(i) of p(x^(i))
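A small illustrative sketch: a Gaussian stands in for p(x), f(x) = x^2, and scipy's density is used only to pick the highest-probability sample; all of these choices are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=10_000)   # x^(i) ~ p(x)

f = lambda x: x**2
expectation = np.mean(f(samples))                        # E[f] ~ (1/N) sum f(x^(i))
print(expectation)                                       # close to 2^2 + 1 = 5

density = norm(loc=2.0, scale=1.0).pdf(samples)
print(samples[np.argmax(density)])                       # close to the mode at x = 2
```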

Rejection Sampling
More generally, we would like to sample from p(x), but it is easier to sample from a proposal distribution q(x), where q(x) satisfies p(x) <= M q(x) for some M < infinity.

Rejection Sampling: Algorithm


1. Sample x* from q(x)
2. Sample u from uniform(0, 1)
3. If u < p(x*) / (M q(x*)):
Add x* to the set of samples
4. Else:
Reject x* and pick another sample
(a code sketch follows below)
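A minimal rejection-sampling sketch. The Gaussian target p(x), the uniform proposal on [-3, 3], and the bound M = 3 are all illustrative assumptions; M only has to satisfy p(x) <= M q(x) over the support.

```python
import numpy as np
from scipy.stats import norm, uniform

rng = np.random.default_rng(0)
p = lambda x: norm.pdf(x)             # target density
q = lambda x: uniform(-3, 6).pdf(x)   # proposal: uniform on [-3, 3], pdf = 1/6
M = 3.0                               # here max p/q ~ 0.399/0.167 ~ 2.4 < M

samples = []
while len(samples) < 1000:
    x_star = rng.uniform(-3, 3)                # 1. sample from q(x)
    u = rng.uniform(0, 1)                      # 2. sample u ~ uniform(0, 1)
    if u < p(x_star) / (M * q(x_star)):        # 3. accept with probability p/(M q)
        samples.append(x_star)                 # 4. otherwise reject and try again
```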

Rejection Sampling
If you don't choose M properly, you will have to reject a lot of samples. The curse of dimensionality makes the problem even worse. To avoid these problems we need to:
I. develop some more sophisticated methods of understanding the space that we are sampling from
II. try to ensure that samples are taken from areas of the space that have high probability

Importance sampling addresses this because it attaches a weight to each sample that says how important it is.

Importance Sampling
Suppose we want to compute the expectation E[f] of a continuous random variable x distributed according to a distribution p(x) that is difficult to sample from directly. Drawing samples x^(i) from a proposal q(x) instead,

E[f] = integral of f(x) [p(x)/q(x)] q(x) dx  ≈  (1/N) * sum_{i=1..N} f(x^(i)) p(x^(i))/q(x^(i))

where the ratio p(x^(i))/q(x^(i)) is the importance weight. Using the importance weights we can also resample the data (the Sampling-Importance-Resampling algorithm). A code sketch of the weighted estimate follows.
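An illustrative sketch of the importance-weighted estimate; the Gaussian target, the wider Gaussian proposal, and f(x) = x^2 are assumptions made only for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = norm(0.0, 1.0)        # target p(x)
q = norm(0.0, 2.0)        # proposal q(x), easy to sample from
f = lambda x: x**2

x = q.rvs(size=10_000, random_state=rng)
w = p.pdf(x) / q.pdf(x)                   # importance weights p(x^(i))/q(x^(i))
print(np.mean(w * f(x)))                  # E[f] estimate; close to Var(x) = 1 under p
```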

Sampling-Importance-Resampling Algorithm
1. Produce N samples x^(i), i = 1, ..., N, from q(x)
2. Compute normalized importance weights

w_i = [p(x^(i))/q(x^(i))] / sum_{j=1..N} [p(x^(j))/q(x^(j))]

3. Resample from the set {x^(i)} with probabilities given by the weights w_i (see the sketch below)
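A minimal sketch of the three steps, reusing the same illustrative p and q as above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p, q = norm(0.0, 1.0), norm(0.0, 2.0)

x = q.rvs(size=10_000, random_state=rng)   # 1. N samples from q(x)

w = p.pdf(x) / q.pdf(x)                    # 2. normalized importance weights
w /= w.sum()

resampled = rng.choice(x, size=10_000, replace=True, p=w)   # 3. resample by weight
# The resampled points are approximately distributed according to p(x).
```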

Markov chain Monte Carlo


Recall again the set X and the distribution p(x) we wish to sample from. Suppose that it is hard to sample from p(x) directly, but that it is possible to walk around in X using only local state transitions. Insight: we can use a random walk to help us draw random samples from p(x).

Markov Chain
A Markov chain on a space X with transitions T is a random process (an infinite sequence of random variables)

(x^(0), x^(1), ..., x^(t), ...) in X that satisfies

p(x^(t) | x^(t-1), ..., x^(1)) = T(x^(t-1), x^(t))

That is, the probability of being in a particular state at time t given the state history depends only on the state at time t-1
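A toy illustration of this property: a three-state chain driven by a transition matrix whose rows are the conditional distributions T(x^(t-1), .); the matrix entries are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[0.9, 0.1, 0.0],     # rows sum to 1: T[s] = p(next state | current = s)
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

state = 0
trajectory = [state]
for _ in range(1000):
    state = rng.choice(3, p=T[state])   # p(x^(t) | x^(t-1)) = T(x^(t-1), x^(t))
    trajectory.append(state)

print(np.bincount(trajectory) / len(trajectory))   # empirical state frequencies
```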

Markov Chain (contd..)


The random walk is set up as a Markov chain that reflects the distribution we wish to sample from: we want the distribution of states P(x^(i)) to converge to the actual distribution p(x) no matter which state we start from.

Irreducible: since we may start from any state, every state must be reachable from every other state.

Ergodic: the chain is both aperiodic and irreducible.

Markov Chain (contd..)


Invariant: we want the distribution p(x) to be invariant under the Markov chain, meaning that the transition probabilities do not change the distribution.

Reversible: we can move backwards and forwards along the chain with equal probability (the detailed balance condition).
The probability of being in an unlikely state s but heading for a likely state s' should be the same as being in the likely state s' and heading for the unlikely state s, so that

p(s) T(s, s') = p(s') T(s', s)

Markov chain Monte Carlo


If we construct a Markov chain with detailed balance, then we can sample from it in order to sample from our distribution; this is known as Markov Chain Monte Carlo (MCMC). Metropolis-Hastings is the most popular algorithm used for MCMC.

Metropolis-Hastings
Assume that we have a proposal distribution of the form q(x^(i) | x^(i-1)) that we can sample from. The idea of Metropolis-Hastings is similar to that of rejection sampling: we take a sample x* and choose whether or not to keep it. Except that, unlike rejection sampling, rather than picking another sample if we reject the current one, we instead add another copy of the previously accepted sample. The probability of keeping the sample is

u(x* | x^(i-1)) = min(1, [p(x*) q(x^(i-1) | x*)] / [p(x^(i-1)) q(x* | x^(i-1))])

Metropolis-Hastings Algorithm
1. Given an initial value x^(0)
2. Repeat (until you have enough samples):
Sample x* from q(x | x^(i))
Sample u from the uniform distribution on [0, 1]
If u < u(x* | x^(i)) (defined above):
Set x^(i+1) = x*
Otherwise:
Set x^(i+1) = x^(i)
(a code sketch follows below)
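A minimal Metropolis-Hastings sketch with a symmetric Gaussian random-walk proposal (so the q terms cancel in the acceptance ratio). The two-component Gaussian mixture target and the proposal width 0.5 are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = lambda x: 0.3 * norm.pdf(x, -2, 1) + 0.7 * norm.pdf(x, 3, 1)   # target density

n_samples = 5000
x = np.zeros(n_samples)
x[0] = 0.0                                      # initial value x^(0)
for i in range(1, n_samples):
    x_star = rng.normal(x[i - 1], 0.5)          # sample x* from q(. | x^(i-1))
    u = rng.uniform(0, 1)
    accept = min(1.0, p(x_star) / p(x[i - 1]))  # acceptance probability u(x*|x^(i-1))
    x[i] = x_star if u < accept else x[i - 1]   # keep x* or copy the previous sample
```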

Simulated Annealing
There are lots of times when we might just want to find the maximum of a distribution rather than approximate the distribution itself; for that we use simulated annealing. This method changes the Markov chain so that its invariant distribution is not p(x), but rather p^(1/T_i)(x), where T_i -> 0 as i -> infinity.

We need an annealing schedule that cools the system down over time, so that we become progressively less likely to accept solutions that are worse than the current one.

Simulated Annealing (contd..)


There are only two modifications needed to the Metropolis-Hastings algorithm, and both are trivial:
extend the acceptance criterion to include the temperature, and add a line inside the loop to apply the annealing schedule (sketched below).
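A sketch of those two modifications applied to the Metropolis-Hastings example above; the geometric schedule T *= 0.999 and the mixture target are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = lambda x: 0.3 * norm.pdf(x, -2, 1) + 0.7 * norm.pdf(x, 3, 1)

x, best, T = 0.0, 0.0, 1.0
for _ in range(10_000):
    x_star = rng.normal(x, 0.5)
    accept = min(1.0, (p(x_star) / p(x)) ** (1.0 / T))   # temperature in the criterion
    if rng.uniform() < accept:
        x = x_star
    if p(x) > p(best):
        best = x
    T *= 0.999                                           # annealing schedule
print(best)   # ends up near the larger mode at x = 3
```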

Gibbs Sampling
Gibbs sampling is a Metropolis-Hastings algorithm whose proposals are always accepted. At each step, the value of one variable is replaced by a sample from its distribution conditioned on the remaining variables. It is perfect for Bayesian networks.

Gibbs Sampling
More formally, the proposal distribution is

q(x* | x^(t)) = p(x_j* | x_-j^(t))   if x_-j* = x_-j^(t)
              = 0                    otherwise

The importance ratio is

r = p(x*) q(x^(t) | x*) / [p(x^(t)) q(x* | x^(t))]
  = p(x*) p(x_j^(t) | x_-j^(t)) / [p(x^(t)) p(x_j* | x_-j^(t))]                       (dfn of proposal distribution)
  = p(x*) p(x_j^(t), x_-j^(t)) p(x_-j*) / [p(x^(t)) p(x_j*, x_-j^(t)) p(x_-j^(t))]    (dfn of conditional probability)
  = p(x_-j*) / p(x_-j^(t)) = 1                                                        (b/c we didn't change the other vars)

So we always accept!

Gibbs Sampler
1. For each variable x_j:
Initialize x_j^(0)

2. Repeat (until you have enough samples):
For each variable x_j in turn:
Sample x_1^(i+1) from p(x_1 | x_2^(i), ..., x_n^(i))
Sample x_2^(i+1) from p(x_2 | x_1^(i+1), x_3^(i), ..., x_n^(i))
...
Sample x_n^(i+1) from p(x_n | x_1^(i+1), ..., x_(n-1)^(i+1))
(a code sketch follows below)
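A minimal Gibbs sampler sketch for a 2D Gaussian with correlation rho, where each full conditional p(x_j | x_-j) is itself a Gaussian that is easy to sample from; the value rho = 0.8 is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_samples = 0.8, 5000

x = np.zeros((n_samples, 2))
for i in range(1, n_samples):
    # Sample x1^(i) from p(x1 | x2^(i-1)) = N(rho * x2, 1 - rho^2)
    x[i, 0] = rng.normal(rho * x[i - 1, 1], np.sqrt(1 - rho**2))
    # Sample x2^(i) from p(x2 | x1^(i)), using the value just drawn
    x[i, 1] = rng.normal(rho * x[i, 0], np.sqrt(1 - rho**2))

print(np.corrcoef(x[1000:].T))   # sample correlation should be close to rho
```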
