Announcements
• Please hand in homework 1 to the front
Probability and Inference
[Figure: the data generating process produces observed data (probability); inference reasons backward from observed data to the data generating process.]
Figure based on one by Larry Wasserman, "All of Statistics"
Monte Carlo Inference
• Bayesian inference:
– Computing/approximating the posterior
– Answering queries based on the posterior
(MAP, marginals, posterior predictive,…)
– Constructing a data structure for answering queries
Markov chain Monte Carlo
• Goal: approximate/summarize a distribution,
e.g. the posterior, with a set of samples
Example: statistical mechanics
Markov chain Monte Carlo
• “The tour de force [of Metropolis et al. (1953)] was their realization that
they did not need to simulate the exact dynamics; they only needed to
simulate some Markov chain having the same equilibrium distribution.”
– Charles Geyer, Handbook of Markov Chain Monte Carlo
History of MCMC
• 1953: Metropolis algorithm invented at Los Alamos
National Labs, home of the Manhattan Project
Markov chain
X1 → X2 → X3 → X4 → X5
• Transition matrix: T_ij = P(x^(t+1) = j | x^(t) = i)
• Transition operator: p^(t+1) = p^(t) T
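As a concrete illustration (a hypothetical 3-state chain, not from the slides), repeatedly applying the transition operator evolves any starting distribution toward a fixed point:

```python
import numpy as np

# Hypothetical 3-state transition matrix; T[i, j] = P(next = j | current = i).
# Each row is a probability distribution and sums to 1.
T = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Start from a point mass on state 0 and apply the transition operator repeatedly.
p = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    p = p @ T   # p^(t+1) = p^(t) T

print(p)        # approaches [1/3, 1/3, 1/3], the uniform distribution
```

Because this particular T is doubly stochastic, its stationary distribution is uniform, matching the behavior in the example slides that follow.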
Markov chain: example
[Figure sequence: the distribution over states after each successive application of the transition operator.]
Stationary distribution
• The distribution over states converges to a particular distribution, in this case the uniform distribution
• When this happens, we say the Markov chain has reached equilibrium
Stationary distribution
• A distribution π is said to be a stationary distribution (a.k.a. invariant distribution) of a Markov chain with transition operator T if π T = π, i.e. π(x′) = Σ_x π(x) T(x′ | x)
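The stationarity condition can be checked numerically. A minimal sketch with a hypothetical two-state chain, recovering π as the left eigenvector of T with eigenvalue 1:

```python
import numpy as np

# Hypothetical two-state transition matrix.
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# A stationary distribution satisfies pi T = pi, i.e. pi is a left
# eigenvector of T with eigenvalue 1; eig on T.T yields left eigenvectors.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()              # normalize to a probability distribution

print(pi)                       # [5/6, 1/6] for this chain
print(np.allclose(pi @ T, pi))  # True: pi is invariant under the transition operator
```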
Markov chain Monte Carlo
• Select a Markov chain whose unique stationary
distribution is the target distribution P(x)
Metropolis-Hastings
• Need:
– An unnormalized target density P*(x)
– A proposal distribution Q(x′; x^(t)) that depends on the current value x^(t)
Metropolis-Hastings
• In each iteration t:
– Draw x′ from the proposal distribution Q(x′; x^(t))
– Compute the acceptance ratio a = [P*(x′) Q(x^(t); x′)] / [P*(x^(t)) Q(x′; x^(t))]
– If a ≥ 1, accept the proposal: x^(t+1) = x′
– Else, accept with probability a; on rejection, keep x^(t+1) = x^(t)
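The iteration can be sketched in Python. The standard-Gaussian target, proposal width, and sample count below are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):
    # Unnormalized target density: standard Gaussian (illustrative choice).
    return np.exp(-0.5 * x ** 2)

def metropolis(n_samples, sigma=1.0, x0=0.0):
    # Random-walk Metropolis with a symmetric Gaussian proposal, so the
    # acceptance ratio reduces to a = P*(x') / P*(x).
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_samples):
        x_prop = x + sigma * rng.standard_normal()   # draw from the proposal
        a = p_star(x_prop) / p_star(x)
        if rng.uniform() < a:                        # accept with prob. min(1, a)
            x = x_prop                               # otherwise keep current state
        samples[t] = x
    return samples

samples = metropolis(20000)
print(samples.mean(), samples.std())   # both near the target's 0 and 1
```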
Metropolis-Hastings acceptance decision
• With a symmetric proposal, Q(x′; x) = Q(x; x′), the ratio reduces to a = P*(x′) / P*(x^(t))
• With an asymmetric proposal, the Hastings correction must be included: a = [P*(x′) Q(x^(t); x′)] / [P*(x^(t)) Q(x′; x^(t))]
• Independence Metropolis-Hastings:
– The proposal is a fixed Gaussian (or other distribution) which does not depend on x^(t)
(can be useful for a unimodal distribution where we can find the mode, e.g. logistic regression)
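For the asymmetric case, here is a sketch using a hypothetical log-normal random-walk proposal against an Exponential(1) target; for this proposal the Hastings correction Q(x; x′)/Q(x′; x) simplifies to x′/x:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_star(x):
    # Unnormalized target on x > 0: Exponential(1) (illustrative choice).
    return np.exp(-x)

def mh_asymmetric(n, s=0.5, x0=1.0):
    # Log-normal random walk: x' = x * exp(s * z), z ~ N(0, 1). Its density
    # is N(ln x'; ln x, s^2) / x', so Q(x; x') / Q(x'; x) = x' / x.
    x = x0
    out = np.empty(n)
    for t in range(n):
        x_prop = x * np.exp(s * rng.standard_normal())   # stays positive
        a = (p_star(x_prop) / p_star(x)) * (x_prop / x)  # target ratio * correction
        if rng.uniform() < a:
            x = x_prop
        out[t] = x
    return out

samples = mh_asymmetric(30000)
print(samples.mean())   # near 1, the mean of Exponential(1)
```

Omitting the x′/x correction here would make the chain converge to the wrong distribution.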
Selecting the variance of the proposal
• The variance of the proposal (step size) can have a big impact on performance, but it may be difficult to know the best value ahead of time
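One diagnostic is the acceptance rate at different step sizes. The Gaussian target and the specific σ values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def p_star(x):
    # Standard Gaussian target (illustrative).
    return np.exp(-0.5 * x ** 2)

def acceptance_rate(sigma, n=20000):
    # Fraction of random-walk Metropolis proposals that get accepted.
    x, accepted = 0.0, 0
    for _ in range(n):
        x_prop = x + sigma * rng.standard_normal()
        if rng.uniform() < p_star(x_prop) / p_star(x):
            x, accepted = x_prop, accepted + 1
    return accepted / n

for sigma in [0.1, 1.0, 10.0]:
    print(sigma, acceptance_rate(sigma))
# Tiny steps are almost always accepted but explore slowly; huge steps are
# mostly rejected, so the chain stalls. Both extremes mix poorly.
```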
Example: mixture of two 1-D Gaussians
[Figure: sampler traces over steps for different proposal step sizes on a mixture of two 1-D Gaussians.]
Detailed balance
• Detailed balance is a sufficient (but not necessary) condition for stationarity: π(x) T(x′ | x) = π(x′) T(x | x′) for all x, x′
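Detailed balance can be verified directly for a small discrete Metropolis kernel; the 3-state target and uniform proposal below are hypothetical:

```python
import numpy as np

# Hypothetical unnormalized 3-state target.
p = np.array([1.0, 2.0, 3.0])
pi = p / p.sum()
n = len(p)

# Metropolis kernel with a uniform proposal over states: move i -> j with
# probability (1/n) * min(1, p[j] / p[i]); rejected mass stays on the diagonal.
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = (1.0 / n) * min(1.0, p[j] / p[i])
    T[i, i] = 1.0 - T[i].sum()

flow = pi[:, None] * T               # flow[i, j] = pi_i * T[i, j]
print(np.allclose(flow, flow.T))     # True: detailed balance holds
print(np.allclose(pi @ T, pi))       # True: hence pi is stationary
```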
Stationarity of Metropolis-Hastings
• Want to show detailed balance holds. For x′ ≠ x:
P*(x) T(x′ | x) = P*(x) Q(x′; x) min(1, [P*(x′) Q(x; x′)] / [P*(x) Q(x′; x)])
= min(P*(x) Q(x′; x), P*(x′) Q(x; x′))
• This expression is symmetric in x and x′, so it also equals P*(x′) T(x | x′), and detailed balance holds
Gibbs sampling
• Sampling from a complicated joint distribution P(x) is hard
• Graphical models: the graph structure gives us each variable's Markov blanket, so its conditional distribution depends only on its neighbors
Gibbs sampling
• Update variables one at a time by drawing from their conditional distributions given all the others: x_i ~ P(x_i | x_−i)
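A minimal sketch of this scheme for a bivariate Gaussian target with correlation ρ (an assumed example; its full conditionals are available in closed form):

```python
import numpy as np

rng = np.random.default_rng(3)

# Target: zero-mean bivariate Gaussian, unit variances, correlation rho.
# The full conditionals are x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically.
rho = 0.8
n = 50000
x1, x2 = 0.0, 0.0
samples = np.empty((n, 2))
for t in range(n):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))  # resample x1 | x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))  # resample x2 | x1
    samples[t] = x1, x2

print(np.corrcoef(samples.T)[0, 1])   # near rho = 0.8
```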
Gibbs sampling
[Figure sequence: successive Gibbs updates, resampling one variable at a time.]
Block Gibbs sampling
• Perform Gibbs updates on groups of variables at once
[Figure: hidden Markov model with latent states Z1 … Z5, observations Y1 … Y5, and parameters θ_k, k = 1 … K; the latent states can be resampled jointly given the parameters, then the parameters given the states.]
Convergence of Gibbs sampling
Requirements for MCMC convergence to the target distribution
• The chain must be irreducible (every state reachable from every other) and aperiodic
Reasons that a Markov chain might not converge to a limiting distribution
[Figure: example chains on states 1–4 that fail to converge.]
Theorem
• An irreducible, aperiodic Markov chain on a finite state space has a unique stationary distribution, and the distribution over states converges to it from any initial distribution
[Figure: example chains on states 1–4.]
MCMC convergence in practice
Burn in
• The initial samples are not from the stationary distribution until the chain has converged
– Use a burn-in period in which the initial samples are discarded
Monitoring convergence
[Figure: trace plot of log-likelihood or log posterior probability against the number of iterations.]