Announcements
• Please hand in homework 1 to the front
Probability and Inference
[Figure: the data generating process produces observed data (probability); inference reasons backward from observed data to the data generating process.]
Figure based on one by Larry Wasserman, "All of Statistics"
Monte Carlo Inference
• Bayesian inference:
– Computing/approximating the posterior
– Answering queries based on the posterior
(MAP, marginals, posterior predictive,…)
– Constructing a data structure for answering queries
Markov chain Monte Carlo
• Goal: approximate/summarize a distribution,
e.g. the posterior, with a set of samples
Example: statistical mechanics
Markov chain Monte Carlo
• “The tour de force [of Metropolis et al. (1953)] was their realization that
they did not need to simulate the exact dynamics; they only needed to
simulate some Markov chain having the same equilibrium distribution.”
– Charles Geyer, Handbook of Markov Chain Monte Carlo
History of MCMC
• 1953: Metropolis algorithm invented at Los Alamos
National Labs, home of the Manhattan Project
Markov chain
X1 → X2 → X3 → X4 → X5
• Transition matrix: T_ij = P(x^(t+1) = j | x^(t) = i)
• Transition operator: p^(t+1) = p^(t) T
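As a concrete illustration (a hypothetical 3-state chain, not from the slides), repeatedly applying the transition operator evolves any starting distribution toward a fixed point:

```python
import numpy as np

# Hypothetical 3-state transition matrix; T[i, j] = P(next = j | current = i).
# Each row is a probability distribution and sums to 1.
T = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Start from a point mass on state 0 and apply the transition operator repeatedly.
p = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    p = p @ T   # p^(t+1) = p^(t) T

print(p)        # approaches [1/3, 1/3, 1/3], the uniform distribution
```

Because this particular T is doubly stochastic, its stationary distribution is uniform, matching the behavior in the example slides that follow.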
Markov chain: example
[Figure sequence: the distribution over states after each successive application of the transition operator.]
Stationary distribution
• The distribution over states converges to a particular distribution, in this case the uniform distribution
• When this happens, we say the Markov chain has reached equilibrium
Stationary distribution
• A distribution π is said to be a stationary distribution (a.k.a. invariant distribution) of a Markov chain with transition operator T if π T = π, i.e. π(x′) = Σ_x π(x) T(x′ | x)
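The stationarity condition can be checked numerically. A minimal sketch with a hypothetical two-state chain, recovering π as the left eigenvector of T with eigenvalue 1:

```python
import numpy as np

# Hypothetical two-state transition matrix.
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# A stationary distribution satisfies pi T = pi, i.e. pi is a left
# eigenvector of T with eigenvalue 1; eig on T.T yields left eigenvectors.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()              # normalize to a probability distribution

print(pi)                       # [5/6, 1/6] for this chain
print(np.allclose(pi @ T, pi))  # True: pi is invariant under the transition operator
```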
Markov chain Monte Carlo
• Select a Markov chain whose unique stationary
distribution is the target distribution P(x)
Metropolis-Hastings
• Need:
– An unnormalized target density P*(x)
– A proposal distribution Q(x′; x^(t)) that depends on the current value x^(t)
Metropolis-Hastings
• In each iteration t:
– Draw x′ from the proposal distribution Q(x′; x^(t))
– Compute the acceptance ratio a = [P*(x′) Q(x^(t); x′)] / [P*(x^(t)) Q(x′; x^(t))]
– If a ≥ 1, accept the proposal: x^(t+1) = x′
– Else, accept with probability a; on rejection, keep x^(t+1) = x^(t)
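The iteration can be sketched in Python. The standard-Gaussian target, proposal width, and sample count below are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):
    # Unnormalized target density: standard Gaussian (illustrative choice).
    return np.exp(-0.5 * x ** 2)

def metropolis(n_samples, sigma=1.0, x0=0.0):
    # Random-walk Metropolis with a symmetric Gaussian proposal, so the
    # acceptance ratio reduces to a = P*(x') / P*(x).
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_samples):
        x_prop = x + sigma * rng.standard_normal()   # draw from the proposal
        a = p_star(x_prop) / p_star(x)
        if rng.uniform() < a:                        # accept with prob. min(1, a)
            x = x_prop                               # otherwise keep current state
        samples[t] = x
    return samples

samples = metropolis(20000)
print(samples.mean(), samples.std())   # both near the target's 0 and 1
```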
Metropolis-Hastings acceptance decision
• With a symmetric proposal, Q(x′; x) = Q(x; x′), the ratio reduces to a = P*(x′) / P*(x^(t))
• With an asymmetric proposal, the Hastings correction must be included: a = [P*(x′) Q(x^(t); x′)] / [P*(x^(t)) Q(x′; x^(t))]
• Independence Metropolis-Hastings:
– The proposal is a fixed Gaussian (or other distribution) which does not depend on x^(t)
(can be useful for a unimodal distribution where we can find the mode, e.g. logistic regression)
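For the asymmetric case, here is a sketch using a hypothetical log-normal random-walk proposal against an Exponential(1) target; for this proposal the Hastings correction Q(x; x′)/Q(x′; x) simplifies to x′/x:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_star(x):
    # Unnormalized target on x > 0: Exponential(1) (illustrative choice).
    return np.exp(-x)

def mh_asymmetric(n, s=0.5, x0=1.0):
    # Log-normal random walk: x' = x * exp(s * z), z ~ N(0, 1). Its density
    # is N(ln x'; ln x, s^2) / x', so Q(x; x') / Q(x'; x) = x' / x.
    x = x0
    out = np.empty(n)
    for t in range(n):
        x_prop = x * np.exp(s * rng.standard_normal())   # stays positive
        a = (p_star(x_prop) / p_star(x)) * (x_prop / x)  # target ratio * correction
        if rng.uniform() < a:
            x = x_prop
        out[t] = x
    return out

samples = mh_asymmetric(30000)
print(samples.mean())   # near 1, the mean of Exponential(1)
```

Omitting the x′/x correction here would make the chain converge to the wrong distribution.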
Selecting the variance of the proposal
• The variance of the proposal (step size) can have a big impact on performance, but it may be difficult to know the best value ahead of time
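One diagnostic is the acceptance rate at different step sizes. The Gaussian target and the specific σ values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def p_star(x):
    # Standard Gaussian target (illustrative).
    return np.exp(-0.5 * x ** 2)

def acceptance_rate(sigma, n=20000):
    # Fraction of random-walk Metropolis proposals that get accepted.
    x, accepted = 0.0, 0
    for _ in range(n):
        x_prop = x + sigma * rng.standard_normal()
        if rng.uniform() < p_star(x_prop) / p_star(x):
            x, accepted = x_prop, accepted + 1
    return accepted / n

for sigma in [0.1, 1.0, 10.0]:
    print(sigma, acceptance_rate(sigma))
# Tiny steps are almost always accepted but explore slowly; huge steps are
# mostly rejected, so the chain stalls. Both extremes mix poorly.
```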
Example: mixture of two 1-D Gaussians
[Figure: sampler traces over steps for different proposal step sizes on a mixture of two 1-D Gaussians.]
Detailed balance
• Detailed balance is a sufficient (but not necessary) condition for stationarity: π(x) T(x′ | x) = π(x′) T(x | x′) for all x, x′
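Detailed balance can be verified directly for a small discrete Metropolis kernel; the 3-state target and uniform proposal below are hypothetical:

```python
import numpy as np

# Hypothetical unnormalized 3-state target.
p = np.array([1.0, 2.0, 3.0])
pi = p / p.sum()
n = len(p)

# Metropolis kernel with a uniform proposal over states: move i -> j with
# probability (1/n) * min(1, p[j] / p[i]); rejected mass stays on the diagonal.
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = (1.0 / n) * min(1.0, p[j] / p[i])
    T[i, i] = 1.0 - T[i].sum()

flow = pi[:, None] * T               # flow[i, j] = pi_i * T[i, j]
print(np.allclose(flow, flow.T))     # True: detailed balance holds
print(np.allclose(pi @ T, pi))       # True: hence pi is stationary
```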
Stationarity of Metropolis-Hastings
• Want to show detailed balance holds. For x′ ≠ x:
P*(x) T(x′ | x) = P*(x) Q(x′; x) min(1, [P*(x′) Q(x; x′)] / [P*(x) Q(x′; x)])
= min(P*(x) Q(x′; x), P*(x′) Q(x; x′))
• This expression is symmetric in x and x′, so it also equals P*(x′) T(x | x′), and detailed balance holds
Gibbs sampling
• Sampling from a complicated joint distribution P(x) is hard
• Graphical models: the graph structure gives us each variable's Markov blanket, so its conditional distribution depends only on its neighbors
Gibbs sampling
• Update variables one at a time by drawing from their conditional distributions given all the others: x_i ~ P(x_i | x_−i)
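A minimal sketch of this scheme for a bivariate Gaussian target with correlation ρ (an assumed example; its full conditionals are available in closed form):

```python
import numpy as np

rng = np.random.default_rng(3)

# Target: zero-mean bivariate Gaussian, unit variances, correlation rho.
# The full conditionals are x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically.
rho = 0.8
n = 50000
x1, x2 = 0.0, 0.0
samples = np.empty((n, 2))
for t in range(n):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))  # resample x1 | x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))  # resample x2 | x1
    samples[t] = x1, x2

print(np.corrcoef(samples.T)[0, 1])   # near rho = 0.8
```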
Gibbs sampling
[Figure sequence: successive Gibbs updates, resampling one variable at a time.]
Block Gibbs sampling
• Perform Gibbs updates on groups of variables at once
[Figure: hidden Markov model with latent states Z1 … Z5, observations Y1 … Y5, and parameters θ_k, k = 1 … K; the latent states can be resampled jointly given the parameters, then the parameters given the states.]
Convergence of Gibbs sampling
Requirements for MCMC convergence to the target distribution
• The chain must be irreducible (every state reachable from every other) and aperiodic
Reasons that a Markov chain might not converge to a limiting distribution
[Figure: example chains on states 1–4 that fail to converge.]
Theorem
• An irreducible, aperiodic Markov chain on a finite state space has a unique stationary distribution, and the distribution over states converges to it from any initial distribution
[Figure: example chains on states 1–4.]
MCMC convergence in practice
Burn in
• The initial samples are not from the stationary distribution until the chain has converged
– Use a burn-in period in which the initial samples are discarded
Monitoring convergence
[Figure: trace plot of log-likelihood or log posterior probability against the number of iterations.]