
CSE291D Lecture 5

Monte Carlo Methods 1:


Importance Sampling,
Rejection Sampling,
Particle Filters
Project
• Project details have been uploaded to Piazza,
and are in the handout

• Reminder: Project proposals due 4/19, via email

• Start getting a group together and planning your project.
You can use Piazza to search for teammates
Probability and Inference
[Diagram: "Probability" maps the data generating process to the observed data; "Inference" maps the observed data back to the data generating process.]
Figure based on one by Larry Wasserman, "All of Statistics"
Approximate Inference
• In principle, Bayesian inference is a simple
application of Bayes’ rule. This has been easy to do
for most of the simple models we’ve studied so far.

• However, in general, Bayesian inference is intractable, motivating approximation techniques
Approximate Inference
• Optimization approaches

– Cast inference as optimizing an objective function; maximize or find a fixed point

• EM
• Variational inference
– Variational Bayes, mean field
– Message passing: loopy BP, TRW, expectation propagation
• Laplace approximation

Approximate Inference
• Simulation approaches
(Monte Carlo methods)

– Approximate a distribution by drawing samples

• Importance sampling, rejection sampling


• Particle filtering
• Markov chain Monte Carlo
– Gibbs sampling, Metropolis-Hastings, Hamiltonian Monte
Carlo…

Monte Carlo Methods
• Suppose we want to approximately compute an expectation

      E[f(x)] = ∫ f(x) P(x) dx

• From the law of large numbers, for sufficiently large S,

      E[f(x)] ≈ (1/S) Σs f(x^(s)),    where x^(s) ~ P(x), s = 1, …, S
Monte Carlo Methods

• This suggests the procedure:
– Draw S samples from P(x)
– Compute f(x) for each of the samples
– Approximate E[f(x)] by the sample average
Monte Carlo Methods: Example

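A minimal Python sketch of this basic procedure (the choice of P(x) as a standard normal and f(x) = x² is illustrative, not the example from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_expectation(f, sampler, S=10_000):
    """Estimate E[f(x)] by averaging f over S samples drawn from P(x)."""
    x = sampler(S)            # draw S samples x^(s) ~ P(x)
    return np.mean(f(x))      # the sample average approximates the expectation

# Illustrative choice: P(x) = N(0, 1), f(x) = x^2, so the true value is E[f(x)] = 1.
estimate = monte_carlo_expectation(lambda x: x**2, lambda S: rng.normal(size=S))
print(estimate)   # close to 1 for large S
```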
Monte Carlo Methods

• In practice, we typically cannot sample from P(x), and need to resort to approximate algorithms

• That's what we'll be talking about in the next two lessons
Learning outcomes
By the end of the lesson, you should be able to:

• Apply simple Monte Carlo methods to approximate expectations under distributions, including importance sampling and rejection sampling.

• Distinguish between scenarios where these methods might be expected to perform well or not.
Bayesian Inference:
One Computer Scientist’s Perspective
• In theory, the posterior is simply given by Bayes’ rule.

• Bayesian inference, then, involves computing likelihood times prior, and normalizing, for every single possible value

• But even if we could do this, except for very simple cases we typically couldn't even store the result of this computation (at least naïvely).
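In symbols (θ for the parameters and D for the data; this notation is assumed here, not taken from the slide):

      p(θ | D) = p(D | θ) p(θ) / p(D),    where  p(D) = ∫ p(D | θ) p(θ) dθ

It is the normalizer p(D), a sum or integral over every possible θ, that typically makes exact computation (and naïve storage of the full posterior) infeasible.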
Bayesian Inference:
One Computer Scientist’s Perspective
• So, what do we actually mean when we say
we are doing Bayesian inference?
– Answering specific queries with respect to the
distribution?
(MAP, marginals, posterior predictive,…)

– Computing a data structure which allows us to answer such queries?
  • Posterior samples could be understood as a convenient data structure summarizing the posterior distribution
Sampling: An analogy

• Draw a water sample so that it is equally likely to come from anywhere in the lake
Exhaustive approach

• Visit every point in the lake. Pour a copy of the whole lake into equally-sized jars. Pick one at random

• As the number of dimensions increases, the size of the "surface of the lake" increases exponentially
Sampling: Challenges

• We don’t know how deep the lake could be

• It is too expensive to explore the whole lake

• There could be deep, narrow canyons. How do you make sure you don't miss them?
Uniform sampling

• Pick S uniform samples, weight according to their relative probability
Uniform sampling

• If you miss a "canyon," the result will be very bad.
• In higher dimensions, it's more likely you'll miss the "canyons"
Uniform sampling
• E.g. suppose you have a very good model for documents:
The quick brown fox jumps over the sly lazy dog
[5 6 37 1 4 30 5 22 570 12]

• The chance of uniformly picking a coherent document gets exponentially smaller, the longer the document is.
Importance sampling
• Same idea, but pick from a better “proposal”
distribution than uniform.
• Reweight samples to correct for sampling from the
wrong distribution.

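In equations (a sketch of the standard importance sampling estimator, written in the notation of the earlier slides): for a proposal Q(x) that is nonzero wherever P(x) is,

      E[f(x)] = ∫ f(x) P(x) dx = ∫ f(x) [P(x) / Q(x)] Q(x) dx ≈ (1/S) Σs w(x^(s)) f(x^(s)),

      where x^(s) ~ Q(x) and w(x) = P(x) / Q(x) are the importance weights.

A minimal Python sketch (the Gaussian target and Student-t proposal are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(0)

def importance_sampling(f, log_p, log_q, q_sampler, S=10_000):
    """Estimate E_P[f(x)] from samples drawn from a proposal Q (normalized densities known)."""
    x = q_sampler(S)                       # x^(s) ~ Q(x)
    w = np.exp(log_p(x) - log_q(x))        # importance weights w = P(x) / Q(x)
    return np.mean(w * f(x))

# Illustrative choice: target P = N(0, 1), proposal Q = Student-t with 3 degrees of
# freedom (heavier tails than the target, which keeps the weights bounded).
estimate = importance_sampling(
    f=lambda x: x**2,
    log_p=norm(0, 1).logpdf,
    log_q=student_t(df=3).logpdf,
    q_sampler=lambda S: student_t(df=3).rvs(size=S, random_state=rng),
)
print(estimate)   # close to E_P[x^2] = 1
```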
Importance sampling
without normalization constants

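When P and Q are only known up to normalization constants, P*(x) ∝ P(x) and Q*(x) ∝ Q(x), the standard fix sketched here is to normalize the weights by their sum:

      w^(s) = P*(x^(s)) / Q*(x^(s)),    w̃^(s) = w^(s) / Σs' w^(s'),    E[f(x)] ≈ Σs w̃^(s) f(x^(s))

A minimal Python sketch of this self-normalized estimator (the price of self-normalization is a small bias that vanishes as S grows):

```python
import numpy as np

def self_normalized_is(f, log_p_star, log_q_star, q_sampler, S=10_000):
    """Importance sampling when only unnormalized log-densities P* and Q* are available."""
    x = q_sampler(S)
    log_w = log_p_star(x) - log_q_star(x)     # log of the unnormalized weights
    log_w -= log_w.max()                      # subtract the max for numerical stability
    w = np.exp(log_w)
    w_tilde = w / w.sum()                     # normalized weights sum to one
    return np.sum(w_tilde * f(x))
```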
Importance sampling
• Can be used to estimate the ratio of
partition functions between p(x) and q(x)

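In symbols (a sketch in the notation above, with Zp and Zq the unknown normalizers of P* and Q*): since E_Q[P*(x) / Q*(x)] = Zp / Zq, the plain average of the unnormalized weights,

      (1/S) Σs P*(x^(s)) / Q*(x^(s)),    x^(s) ~ Q(x),

is an unbiased estimate of the ratio Zp / Zq.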
Heavy tails
• If q(x) goes towards zero faster than p(x), importance
weights of rare events will become extremely large

[Figure: importance weights under a Gaussian proposal vs. a Cauchy proposal]
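A small numerical illustration of this failure mode (the particular target and proposals are assumptions for illustration, not the distributions in the figure): a heavy-tailed Student-t target with a Gaussian proposal produces occasional enormous weights, while a Cauchy proposal, whose tails are at least as heavy as the target's, keeps them bounded.

```python
import numpy as np
from scipy.stats import norm, cauchy, t as student_t

rng = np.random.default_rng(0)
S = 100_000
target = student_t(df=2)                                 # heavy-tailed target P(x)

for name, proposal in [("Gaussian", norm(0, 1)), ("Cauchy", cauchy(0, 1))]:
    x = proposal.rvs(size=S, random_state=rng)
    w = np.exp(target.logpdf(x) - proposal.logpdf(x))    # importance weights P(x) / Q(x)
    print(f"{name:8s} proposal: max weight {w.max():.1f}, weight variance {w.var():.1f}")
# The Gaussian proposal's tails fall off faster than the target's, so the few samples
# it does produce far out in the tails receive extremely large weights.
```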


Importance sampling
in high dimensions
• As the dimensionality of the space increases, it becomes
harder to reliably construct a good proposal distribution

Spherical Gaussian example:
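A sketch of how this plays out for a spherical Gaussian (the target N(0, I), the proposal N(0, σ²I) with σ = 1.2, and the use of effective sample size are assumptions for illustration): as the dimension N grows, the normalized weights concentrate on a handful of samples.

```python
import numpy as np

rng = np.random.default_rng(0)
S, sigma = 10_000, 1.2

for N in [1, 10, 100]:
    x = rng.normal(scale=sigma, size=(S, N))     # samples from the proposal N(0, sigma^2 I)
    sq = np.sum(x**2, axis=1)
    # log weight = log N(x; 0, I) - log N(x; 0, sigma^2 I)
    log_w = -0.5 * sq + 0.5 * sq / sigma**2 + N * np.log(sigma)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)                     # effective sample size
    print(f"N = {N:4d}: effective sample size ≈ {ess:,.1f} out of {S:,}")
```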
Sampling Importance Resampling
• We can convert a set of importance-weighted
samples to a set of unweighted samples

– Draw S importance samples

– Resample S’ samples from the set of samples, with replacement, proportional to their importance weights
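A minimal sketch of this resampling step (the function name and arguments are mine; `w_tilde` is the vector of normalized importance weights from above):

```python
import numpy as np

def sampling_importance_resampling(x, w_tilde, S_prime, rng):
    """Turn S weighted samples (x, w_tilde) into S' unweighted samples."""
    idx = rng.choice(len(x), size=S_prime, replace=True, p=w_tilde)  # resample with replacement
    return x[idx]   # approximately unweighted draws from the target
```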
Rejection Sampling

• Unnormalized proposal distribution cQ*(x) that upper bounds P*(x)
• Sample uniformly under the curve cQ*(x) (with auxiliary “height” u)
• Reject samples that do not fall under the curve of P*(x)
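A minimal Python sketch of this procedure (the unnormalized target P*(x) = exp(−x²/2) and the Cauchy-shaped envelope Q*(x) = 1/(1 + x²) are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(S, rng):
    """Draw S samples from P*(x) = exp(-x^2 / 2) using the envelope c * Q*(x), Q*(x) = 1 / (1 + x^2)."""
    p_star = lambda x: np.exp(-x**2 / 2)
    q_star = lambda x: 1.0 / (1.0 + x**2)
    c = 2.0 / np.exp(0.5)        # smallest c with c * Q*(x) >= P*(x); the ratio P*/Q* peaks at |x| = 1
    samples = []
    while len(samples) < S:
        x = rng.standard_cauchy()             # propose x from the (normalized) Cauchy, proportional to Q*
        u = rng.uniform(0.0, c * q_star(x))   # uniform "height" under the envelope c * Q*(x)
        if u <= p_star(x):                    # keep the point only if it falls under P*(x)
            samples.append(x)
    return np.array(samples)

x = rejection_sample(1_000, rng)
print(x.mean(), x.std())   # approximately 0 and 1 (P* is an unnormalized standard normal)
```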
Rejection sampling
in high dimensions
• As the dimensionality of the space increases, the constant c
gets exponentially larger in general

Spherical Gaussian example: if the envelope requires a factor of c in one dimension, it requires c^N in N dimensions.
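For instance (numbers assumed purely for illustration): with a per-dimension envelope constant of c = 1.2, the overall constant at N = 100 is 1.2^100 ≈ 8 × 10^7, so only about one proposed sample in 80 million would be accepted.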
Particle Filters
• Dynamical systems (cf. Kalman filters)
Latent states:   Z1  Z2  Z3  Z4  Z5
Observations:    Y1  Y2  Y3  Y4  Y5

• Radar tracking, robot localization, weather forecasting, …
Particle Filters
• Dynamical systems (cf. Kalman filters)
Latent states:   Z1  Z2  Z3  Z4  Z5
Observations:    Y1  Y2  Y3  Y4  Y5

Zt = ?

• Filtering: keeping a running prediction on the current state zt
Particle Filters
• Particle filters, a.k.a. sequential Monte Carlo,
a.k.a. sequential importance sampling

• Basic idea:
– Perform importance sampling to estimate z
– At each timestep t, extend each importance sample to include zt and update the weights recursively
Updating importance weights

• Suppose the proposal is the prior (the dynamics model):

      q(zt | z1:t-1, y1:t) = p(zt | zt-1)

• Then the update simplifies: each weight is multiplied by the likelihood of the new observation,

      wt^(s) ∝ wt-1^(s) · p(yt | zt^(s))
Degeneracy
• As we add more timesteps, the z vector
becomes higher dimensional
– Importance weights select only a few samples

• Solution: Sampling importance resampling!
– When “effective sample size” is low, resample new particles proportional to the weights
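One common definition (stated here for concreteness) is the effective sample size ESS = 1 / Σs (w̃^(s))², which falls from S (uniform weights) toward 1 (a single particle carries all the weight). Below is a minimal bootstrap particle filter sketch putting the pieces together; the 1D linear-Gaussian model, its parameter values, and the ESS < S/2 resampling threshold are illustrative assumptions, not the lecture's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1D linear-Gaussian state-space model:
#   z_t = 0.9 z_{t-1} + N(0, 1),   y_t = z_t + N(0, 0.5^2)
def simulate(T):
    z = np.zeros(T)
    y = np.zeros(T)
    for t in range(T):
        z[t] = 0.9 * (z[t - 1] if t > 0 else 0.0) + rng.normal()
        y[t] = z[t] + 0.5 * rng.normal()
    return z, y

def bootstrap_particle_filter(y, S=500):
    particles = rng.normal(size=S)             # initial particles z_0^(s)
    w = np.full(S, 1.0 / S)                    # uniform initial weights
    means = []
    for t in range(len(y)):
        particles = 0.9 * particles + rng.normal(size=S)     # propose from the prior (dynamics model)
        log_lik = -0.5 * ((y[t] - particles) / 0.5) ** 2      # log p(y_t | z_t), up to a constant
        w = w * np.exp(log_lik - log_lik.max())               # recursive weight update
        w /= w.sum()
        means.append(np.sum(w * particles))                   # filtered estimate of E[z_t | y_1:t]
        if 1.0 / np.sum(w**2) < S / 2:                        # resample when the ESS drops below S/2
            idx = rng.choice(S, size=S, replace=True, p=w)
            particles, w = particles[idx], np.full(S, 1.0 / S)
    return np.array(means)

z_true, y_obs = simulate(50)
z_hat = bootstrap_particle_filter(y_obs)
print(np.mean((z_hat - z_true) ** 2))   # small mean squared filtering error
```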
Illustration of particle filtering

Application: visual object tracking
• Goal: track an object (in this case, a remote
controlled helicopter) in a video sequence

• Linear dynamics model
• Likelihood based on color histogram features
• Proposal distribution: sample from the prior (dynamics model)
• S = 250 samples
Think-pair-share: helicopter tracker

• You are an engineer for the RC helicopter company.

• Your company plans to deploy the helicopter tracking system as part of a mobile phone app in 3 months, but needs it to be more reliable.

• How would you change the system to improve its performance?
