
Markov Chain Monte Carlo

Chapter 14

Sampling
Why sampling?
Looking at samples from a model can help to understand the model's properties

Sampling methods are built on random number generation, so we will look at how random numbers are generated and how to draw them from different distributions.

Random Numbers
A random number is a number generated by a process whose outcome is unpredictable and which cannot subsequently be reliably reproduced. Truly random numbers do not exist in software, but there are plenty of algorithms that produce pseudo-random numbers.

Random Numbers (contd..)


Linear congruential generator:
The simplest algorithm for generating pseudo-random numbers. It is defined by the recurrence relation x_{n+1} = (a*x_n + c) mod m, where the multiplier a, increment c, and modulus m are fixed constants and x_0 is the seed.
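Below is a minimal sketch of a linear congruential generator. The constants a, c, and m are illustrative choices (the widely quoted "Numerical Recipes" parameters), not the values used by any particular library.

```python
# Minimal linear congruential generator (LCG) sketch.
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield pseudo-random integers x_{n+1} = (a*x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=42)
# Dividing by m gives floats that are approximately uniform on [0, 1).
uniforms = [next(gen) / 2**32 for _ in range(5)]
print(uniforms)
```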

Random Numbers (contd..)


The industry-standard algorithm for generating random samples is the Mersenne Twister (based on Mersenne prime numbers). The Mersenne Twister produces uniformly distributed random numbers. For other distributions there are other schemes, e.g. the Box-Muller scheme for Gaussian-distributed random numbers.
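As a point of reference, NumPy's legacy RandomState generator is documented as being based on the MT19937 Mersenne Twister; a short usage sketch:

```python
import numpy as np

# RandomState is the MT19937-based legacy generator in NumPy.
rng = np.random.RandomState(seed=0)
u = rng.uniform(size=5)   # uniform samples on [0, 1)
g = rng.normal(size=5)    # Gaussian samples (the internal transform is hidden by the API)
print(u, g)
```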

The Box-Muller Scheme


1. Pick two uniformly distributed random numbers between -1 and 1, (x1, x2)
2. If x1^2 + x2^2 > 1, discard them and pick two more
3. Compute w = x1^2 + x2^2
4. Compute y1 = x1 * sqrt(-2 ln(w) / w) and y2 = x2 * sqrt(-2 ln(w) / w)
5. These yi have probability density

p(y1, y2) = (1 / (2*pi)) * exp(-(y1^2 + y2^2) / 2)

which describes two independent variables with zero-mean, unit-variance Gaussian distributions.
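A minimal sketch of these steps in code; the loop implements the rejection in step 2:

```python
import numpy as np

def box_muller_pair(rng=np.random):
    """Return two independent N(0, 1) samples via the polar Box-Muller scheme."""
    while True:
        x1, x2 = rng.uniform(-1, 1, size=2)   # step 1: uniform on [-1, 1]
        w = x1**2 + x2**2                      # step 3
        if 0 < w <= 1:                         # step 2: reject points outside the unit circle
            break
    factor = np.sqrt(-2.0 * np.log(w) / w)     # step 4
    return x1 * factor, x2 * factor            # step 5: y1, y2 ~ N(0, 1)

samples = [y for _ in range(500) for y in box_muller_pair()]
```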

The Box-Muller Scheme


This introduces the idea of rejection: when the original samples were not inside the unit circle, they were rejected and another pair was computed to replace them.

Monte Carlo
If you take independent and identically distributed samples from an unknown high-dimensional distribution p(x), then as the number of samples gets larger the sample distribution will converge to the true distribution. Mathematically,

p_N(x) = (1/N) * sum_{i=1..N} delta(x - x^(i))  ->  p(x)   as N -> infinity

Monte Carlo (contd..)


We can also use these samples to compute expectations:

E[f] ≈ (1/N) * sum_{i=1..N} f(x^(i))

And even use them to find a maximum:

x_max = argmax over the samples x^(i) of p(x^(i))
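A small illustrative sketch: a Gaussian stands in for p(x), f(x) = x^2, and scipy's density is used only to pick the highest-probability sample; all of these choices are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=10_000)   # x^(i) ~ p(x)

f = lambda x: x**2
expectation = np.mean(f(samples))                        # E[f] ~ (1/N) sum f(x^(i))
print(expectation)                                       # close to 2^2 + 1 = 5

density = norm(loc=2.0, scale=1.0).pdf(samples)
print(samples[np.argmax(density)])                       # close to the mode at x = 2
```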

Rejection Sampling
More generally, we would like to sample from p(x), but it is easier to sample from a proposal distribution q(x), where q(x) satisfies p(x) <= M q(x) for some M < infinity.

Rejection Sampling: Algorithm


1. Sample x* from q(x)
2. Sample u from uniform(0, 1)
3. If u < p(x*) / (M q(x*)):
Add x* to the set of samples
4. Else:
Reject x* and pick another sample
(a code sketch follows below)
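A minimal rejection-sampling sketch. The Gaussian target p(x), the uniform proposal on [-3, 3], and the bound M = 3 are all illustrative assumptions; M only has to satisfy p(x) <= M q(x) over the support.

```python
import numpy as np
from scipy.stats import norm, uniform

rng = np.random.default_rng(0)
p = lambda x: norm.pdf(x)             # target density
q = lambda x: uniform(-3, 6).pdf(x)   # proposal: uniform on [-3, 3], pdf = 1/6
M = 3.0                               # here max p/q ~ 0.399/0.167 ~ 2.4 < M

samples = []
while len(samples) < 1000:
    x_star = rng.uniform(-3, 3)                # 1. sample from q(x)
    u = rng.uniform(0, 1)                      # 2. sample u ~ uniform(0, 1)
    if u < p(x_star) / (M * q(x_star)):        # 3. accept with probability p/(M q)
        samples.append(x_star)                 # 4. otherwise reject and try again
```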

Rejection Sampling
If you don't choose M properly, you will have to reject a lot of samples. The curse of dimensionality makes the problem even worse. To avoid these problems we need to:
I. develop some more sophisticated methods of understanding the space that we are sampling from
II. try to ensure that samples are taken from areas of the space that have high probability

Importance sampling addresses this because it attaches a weight to each sample that says how important it is.

Importance Sampling
Suppose we want to compute the expectation E[f] of a continuous random variable x distributed according to a distribution p(x) that is difficult to sample from directly. Drawing samples x^(i) from a proposal q(x) instead,

E[f] = integral of f(x) [p(x)/q(x)] q(x) dx  ≈  (1/N) * sum_{i=1..N} f(x^(i)) p(x^(i))/q(x^(i))

where the ratio p(x^(i))/q(x^(i)) is the importance weight. Using the importance weights we can also resample the data (the Sampling-Importance-Resampling algorithm). A code sketch of the weighted estimate follows.
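An illustrative sketch of the importance-weighted estimate; the Gaussian target, the wider Gaussian proposal, and f(x) = x^2 are assumptions made only for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = norm(0.0, 1.0)        # target p(x)
q = norm(0.0, 2.0)        # proposal q(x), easy to sample from
f = lambda x: x**2

x = q.rvs(size=10_000, random_state=rng)
w = p.pdf(x) / q.pdf(x)                   # importance weights p(x^(i))/q(x^(i))
print(np.mean(w * f(x)))                  # E[f] estimate; close to Var(x) = 1 under p
```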

Sampling-Importance-Resampling Algorithm
1. Produce N samples x^(i), i = 1, ..., N, from q(x)
2. Compute normalized importance weights

w_i = [p(x^(i))/q(x^(i))] / sum_{j=1..N} [p(x^(j))/q(x^(j))]

3. Resample from the set {x^(i)} with probabilities given by the weights w_i (see the sketch below)
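A minimal sketch of the three steps, reusing the same illustrative p and q as above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p, q = norm(0.0, 1.0), norm(0.0, 2.0)

x = q.rvs(size=10_000, random_state=rng)   # 1. N samples from q(x)

w = p.pdf(x) / q.pdf(x)                    # 2. normalized importance weights
w /= w.sum()

resampled = rng.choice(x, size=10_000, replace=True, p=w)   # 3. resample by weight
# The resampled points are approximately distributed according to p(x).
```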

Markov chain Monte Carlo


Recall again the set X and the distribution p(x) we wish to sample from. Suppose that it is hard to sample from p(x) directly, but that it is possible to walk around in X using only local state transitions. Insight: we can use a random walk to help us draw random samples from p(x).

Markov Chain
A Markov chain on a space X with transitions T is a random process (an infinite sequence of random variables)

(x^(0), x^(1), ..., x^(t), ...) in X that satisfies

p(x^(t) | x^(t-1), ..., x^(1)) = T(x^(t-1), x^(t))

That is, the probability of being in a particular state at time t given the state history depends only on the state at time t-1
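A toy illustration of this property: a three-state chain driven by a transition matrix whose rows are the conditional distributions T(x^(t-1), .); the matrix entries are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[0.9, 0.1, 0.0],     # rows sum to 1: T[s] = p(next state | current = s)
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

state = 0
trajectory = [state]
for _ in range(1000):
    state = rng.choice(3, p=T[state])   # p(x^(t) | x^(t-1)) = T(x^(t-1), x^(t))
    trajectory.append(state)

print(np.bincount(trajectory) / len(trajectory))   # empirical state frequencies
```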

Markov Chain (contd..)


The random walk is set up as a Markov chain that reflects the distribution we wish to sample from: we want the distribution of states P(x^(i)) to converge to the actual distribution p(x) no matter which state we start from.

Irreducible: since we may start from any state, every state must be reachable from every other state.

Ergodic: the chain is both aperiodic and irreducible.

Markov Chain (contd..)


Invariant: we want the distribution p(x) to be invariant under the Markov chain, meaning that the transition probabilities do not change the distribution.

Reversible: we can move backwards and forwards along the chain with equal probability (the detailed balance condition).
The probability of being in an unlikely state s but heading for a likely state s' should be the same as being in the likely state s' and heading for the unlikely state s, so that

p(s) T(s, s') = p(s') T(s', s)

Markov chain Monte Carlo


If we construct a Markov chain with detailed balance, then we can sample from it in order to sample from our distribution; this is known as Markov Chain Monte Carlo (MCMC). Metropolis-Hastings is the most popular algorithm used for MCMC.

Metropolis-Hastings
Assume that we have a proposal distribution of the form q(x^(i) | x^(i-1)) that we can sample from. The idea of Metropolis-Hastings is similar to that of rejection sampling: we take a sample x* and choose whether or not to keep it. Except that, unlike rejection sampling, rather than picking another sample if we reject the current one, we instead add another copy of the previously accepted sample. The probability of keeping the sample is

u(x* | x^(i-1)) = min(1, [p(x*) q(x^(i-1) | x*)] / [p(x^(i-1)) q(x* | x^(i-1))])

Metropolis-Hastings Algorithm
1. Given an initial value x^(0)
2. Repeat (until you have enough samples):
Sample x* from q(x | x^(i))
Sample u from the uniform distribution on [0, 1]
If u < u(x* | x^(i)) (defined above):
Set x^(i+1) = x*
Otherwise:
Set x^(i+1) = x^(i)
(a code sketch follows below)
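A minimal Metropolis-Hastings sketch with a symmetric Gaussian random-walk proposal (so the q terms cancel in the acceptance ratio). The two-component Gaussian mixture target and the proposal width 0.5 are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = lambda x: 0.3 * norm.pdf(x, -2, 1) + 0.7 * norm.pdf(x, 3, 1)   # target density

n_samples = 5000
x = np.zeros(n_samples)
x[0] = 0.0                                      # initial value x^(0)
for i in range(1, n_samples):
    x_star = rng.normal(x[i - 1], 0.5)          # sample x* from q(. | x^(i-1))
    u = rng.uniform(0, 1)
    accept = min(1.0, p(x_star) / p(x[i - 1]))  # acceptance probability u(x*|x^(i-1))
    x[i] = x_star if u < accept else x[i - 1]   # keep x* or copy the previous sample
```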

Simulated Annealing
There are lots of times when we might just want to find the maximum of a distribution rather than approximate the distribution itself; for that we use simulated annealing. This method changes the Markov chain so that its invariant distribution is not p(x), but rather p^(1/T_i)(x), where T_i -> 0 as i -> infinity.

We need an annealing schedule that cools the system down over time, so that we become progressively less likely to accept solutions that are worse than the current one.

Simulated Annealing (contd..)


There are only two modifications needed to the Metropolis-Hastings algorithm, and both are trivial:
extend the acceptance criterion to include the temperature, and add a line inside the loop to apply the annealing schedule (sketched below).
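A sketch of those two modifications applied to the Metropolis-Hastings example above; the geometric schedule T *= 0.999 and the mixture target are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = lambda x: 0.3 * norm.pdf(x, -2, 1) + 0.7 * norm.pdf(x, 3, 1)

x, best, T = 0.0, 0.0, 1.0
for _ in range(10_000):
    x_star = rng.normal(x, 0.5)
    accept = min(1.0, (p(x_star) / p(x)) ** (1.0 / T))   # temperature in the criterion
    if rng.uniform() < accept:
        x = x_star
    if p(x) > p(best):
        best = x
    T *= 0.999                                           # annealing schedule
print(best)   # ends up near the larger mode at x = 3
```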

Gibbs Sampling
Gibbs sampling is a Metropolis-Hastings algorithm whose proposals are always accepted. At each step, the value of one variable is replaced by a sample from its distribution conditioned on the remaining variables. It is perfect for Bayesian networks.

Gibbs Sampling
More formally, the proposal distribution is

q(x* | x^(t)) = p(x_j* | x_-j^(t))   if x_-j* = x_-j^(t)
              = 0                    otherwise

The importance ratio is

r = p(x*) q(x^(t) | x*) / [p(x^(t)) q(x* | x^(t))]
  = p(x*) p(x_j^(t) | x_-j^(t)) / [p(x^(t)) p(x_j* | x_-j^(t))]                       (dfn of proposal distribution)
  = p(x*) p(x_j^(t), x_-j^(t)) p(x_-j*) / [p(x^(t)) p(x_j*, x_-j^(t)) p(x_-j^(t))]    (dfn of conditional probability)
  = p(x_-j*) / p(x_-j^(t)) = 1                                                        (b/c we didn't change the other vars)

So we always accept!

Gibbs Sampler
1. For each variable x_j:
Initialize x_j^(0)

2. Repeat (until you have enough samples):
For each variable x_j in turn:
Sample x_1^(i+1) from p(x_1 | x_2^(i), ..., x_n^(i))
Sample x_2^(i+1) from p(x_2 | x_1^(i+1), x_3^(i), ..., x_n^(i))
...
Sample x_n^(i+1) from p(x_n | x_1^(i+1), ..., x_(n-1)^(i+1))
(a code sketch follows below)
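A minimal Gibbs sampler sketch for a 2D Gaussian with correlation rho, where each full conditional p(x_j | x_-j) is itself a Gaussian that is easy to sample from; the value rho = 0.8 is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_samples = 0.8, 5000

x = np.zeros((n_samples, 2))
for i in range(1, n_samples):
    # Sample x1^(i) from p(x1 | x2^(i-1)) = N(rho * x2, 1 - rho^2)
    x[i, 0] = rng.normal(rho * x[i - 1, 1], np.sqrt(1 - rho**2))
    # Sample x2^(i) from p(x2 | x1^(i)), using the value just drawn
    x[i, 1] = rng.normal(rho * x[i, 0], np.sqrt(1 - rho**2))

print(np.corrcoef(x[1000:].T))   # sample correlation should be close to rho
```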
