
CSE291D Lecture 20

Nonparametric Bayesian models

Announcements

• HW 4 is back and available at the front of the room

• Solutions for HW 3 and 4 are also available at the front

• Submit HW5 to my office (under the door if I'm not there), or by email to me and the TA, by midnight 06/09.

• Submit the project report to me by the same time, preferably by email.

• Good luck on the exam!
  Tuesday 06/07, 7pm-10pm in this room (PETER 103).
How many clusters should we use
in our mixture model?

Choosing the dimensionality of your
latent space
• How many
  – clusters should we use in our mixture model?
  – dimensions in our factor analysis model?
  – topics in our topic model?

• With more latent variables, we can fit the training data better (e.g. a cluster for every data point!)

• However, we may "overfit" and generalize less well to new data. We also lose parsimony and interpretability.
Traditional model selection
• Marginal likelihood: p(D | M) = ∫ p(D | θ, M) p(θ | M) dθ

• Bayes factor: ratio of marginal likelihoods between two models, p(D | M1) / p(D | M2)

• Can also put a prior on models and pick the one with the highest posterior probability (or average over all possible models, a.k.a. model averaging)

• Pro: automatically penalizes complicated models, since it integrates over all parameter values, including bad ones

• Con: intractable to do this exactly
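To make the marginal-likelihood idea concrete, here is a minimal sketch in a toy Beta-Bernoulli setting where the integral is analytic. The two models and the data are made up for illustration, not from the lecture:

```python
# Model M1: fair coin, theta fixed at 0.5 (no free parameters).
# Model M2: theta ~ Beta(1, 1), integrated out analytically.
import numpy as np
from scipy.special import betaln

x = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical coin flips
n, k = len(x), int(x.sum())

# log p(D | M1): nothing to integrate.
log_ml_m1 = n * np.log(0.5)

# log p(D | M2) = log [ B(k + 1, n - k + 1) / B(1, 1) ],
# integrating theta over its Beta(1, 1) prior.
log_ml_m2 = betaln(k + 1, n - k + 1) - betaln(1, 1)

# Bayes factor of M2 vs M1; log BF > 0 favors the flexible model.
print("log Bayes factor:", log_ml_m2 - log_ml_m1)
```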
Approximate heuristic methods:
Bayesian information criterion (BIC)

• Score(model) = model complexity − model fit

• Pick the model with the best (lowest) score

• BIC = k log n − 2 log p(D | θ̂)
  where n = # data points and k = # (free) parameters
  (approximates −2 × the log marginal likelihood for large n, exponential families)
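As a concrete illustration, a sketch of BIC-based selection of the number of mixture components, assuming scikit-learn's GaussianMixture and placeholder random data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.randn(200, 2)  # placeholder data, shape (n, d)

scores = {}
for K in range(1, 8):
    gm = GaussianMixture(n_components=K, random_state=0).fit(X)
    scores[K] = gm.bic(X)  # k*log(n) - 2*log-likelihood; lower is better

best_K = min(scores, key=scores.get)
print("BIC picks K =", best_K)
```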
Nonparametric models
• So-called "nonparametric" models typically do have parameters, however:

• Num parameters (model complexity) is not fixed, but grows with the amount of data

• E.g.
  – K-nearest neighbors classifier
  – Kernel density estimation
  – Decision trees

[Figure: model complexity growing with the number of data points N]
Bayesian nonparametric models
• Bayesian models whose complexity increases with the amount of data

• Typically, a prior over an infinite # of latent variables
  – The number that are actually used is finite, and depends on the data

[Figure: model complexity growing with the number of data points N]
Learning outcomes
By the end of the lesson, you should be able to:

• Simulate the Chinese restaurant process

• Perform nonparametric data modeling with CRP mixture models and Indian Buffet Process models
Chinese restaurant process

• A distribution over partitions (groupings) of objects, e.g. data points

• The number of groups is not specified in advance

• Useful as a prior for cluster assignments in a nonparametric Bayesian mixture model
Chinese restaurant process
• Overall metaphor:
  – Imagine a restaurant with an infinite number of tables, each serving a different dish
    • Tables = clusters
  – Customers enter the restaurant one at a time, and sit at a table
    • Customers = data points
Chinese restaurant process
• Overall metaphor:
  – Some dishes are more popular than others
    • Customers sit at an occupied table with probability proportional to the number of customers already at that table
    • Or at a new table with probability proportional to the concentration parameter α

• Basically a Pólya urn process!
  – we also have balls in the urn for the "new table" option
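A minimal simulation of the seating process described above (a sketch; the function name crp and the defaults are mine, not from the lecture):

```python
# alpha is the CRP concentration parameter; larger alpha -> more tables.
import numpy as np

def crp(n_customers, alpha, seed=0):
    rng = np.random.default_rng(seed)
    assignments = []   # table index chosen by each customer
    counts = []        # current number of customers at each table
    for i in range(n_customers):
        # Existing table k with prob counts[k] / (i + alpha),
        # a new table with prob alpha / (i + alpha).
        probs = np.array(counts + [alpha]) / (i + alpha)
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)       # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

assignments, counts = crp(100, alpha=1.0)
print(len(counts), "tables with sizes", counts)
```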
Chinese restaurant process

[Slides 16–33: step-by-step animation of the seating process, with customers entering one at a time and the partition growing table by table]
CRP is exchangeable
• Joint distribution:

  P(c1, ..., cN) = ∏i P(ci | c1, ..., ci−1)

• Terms for customers in group k: the first customer at the table contributes α to the numerator, and each later one contributes the table's count at the time, so group k contributes α · (Nk − 1)! overall
  (Ik,: = their indices, Nk = num customers in k)
CRP is exchangeable

• Each index occurs in exactly one group. Simplify:

  P(c1, ..., cN) = α^K ∏k (Nk − 1)! / ∏i=1..N (α + i − 1)

• Depends on the num groups K and the group sizes Nk, but not on the ordering, so the CRP is exchangeable!
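A quick numeric check of this fact (a sketch, not lecture code): two arrival orders with the same group sizes get the same joint probability.

```python
import math

def crp_joint_prob(assignments, alpha):
    counts, log_p = {}, 0.0
    for i, c in enumerate(assignments):
        if c in counts:
            log_p += math.log(counts[c] / (i + alpha))  # join existing group
            counts[c] += 1
        else:
            log_p += math.log(alpha / (i + alpha))      # start a new group
            counts[c] = 1
    return math.exp(log_p)

# Two arrival orders with the same group sizes {3, 2}:
print(crp_joint_prob([0, 0, 1, 0, 1], alpha=1.0))  # 1/60
print(crp_joint_prob([0, 1, 1, 0, 0], alpha=1.0))  # 1/60 again
```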
CRP mixture models
(a.k.a. Dirichlet process mixture models)
• Generate cluster assignments (partition of data points) via CRP

• Draw parameters for each cluster from the prior

• Draw each data point from its cluster's distribution
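A sketch of this generative process for 2-D Gaussian clusters, reusing the crp() function sketched above; the Gaussian prior on cluster means and the fixed noise scale are my assumptions:

```python
import numpy as np

def draw_crp_mixture(n, alpha=1.0, prior_scale=5.0, noise=0.5, seed=0):
    rng = np.random.default_rng(seed)
    assignments, _ = crp(n, alpha, seed=seed)     # partition via CRP
    n_clusters = max(assignments) + 1
    means = rng.normal(0, prior_scale, size=(n_clusters, 2))  # cluster params
    X = np.array([rng.normal(means[c], noise) for c in assignments])
    return X, assignments

X, z = draw_crp_mixture(500)
print(X.shape, "points in", len(set(z)), "clusters")
```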
Draw from CRP mixture model, N = 50
[Figure: sample draw]

Draw from CRP mixture model, N = 500
[Figure: sample draw]

Draw from CRP mixture model, N = 1000
[Figure: sample draw]
Inference via collapsed Gibbs sampling
• For a collapsed Gibbs update, compute:

  p(ci = k | c−i, x) ∝ p(ci = k | c−i) · p(xi | {xj : cj = k, j ≠ i})

  The second factor is the posterior predictive.
  If we use a conjugate prior, we can compute this in closed form.

• Use exchangeability! Make ci the last customer:
  p(ci = k | c−i) = N−i,k / (N − 1 + α) for an existing cluster,
  α / (N − 1 + α) for a new cluster
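A sketch of one collapsed Gibbs sweep for a 1-D Gaussian CRP mixture with known noise variance; the conjugate Normal prior on cluster means gives a closed-form Gaussian posterior predictive. All modeling choices here (prior_var, noise_var, the test data) are illustrative assumptions, not the lecture's:

```python
import numpy as np
from scipy.stats import norm

def gibbs_scan(x, z, alpha=1.0, prior_var=10.0, noise_var=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    for i in range(len(x)):
        zi = z.copy()
        zi[i] = -1                                   # hold out point i
        labels = sorted(k for k in set(zi) if k >= 0)
        options, log_w = [], []
        for k in labels + [-2]:                      # -2 marks "new cluster"
            if k == -2:
                weight = alpha                       # CRP term for a new cluster
                mu, var = 0.0, prior_var + noise_var
            else:
                members = x[zi == k]
                weight = len(members)                # CRP term: cluster size
                # Normal-Normal conjugacy: posterior over the cluster mean...
                post_var = 1.0 / (1.0 / prior_var + len(members) / noise_var)
                post_mu = post_var * members.sum() / noise_var
                # ...and the Gaussian posterior predictive for x[i].
                mu, var = post_mu, post_var + noise_var
            options.append(k)
            log_w.append(np.log(weight) + norm.logpdf(x[i], mu, np.sqrt(var)))
        w = np.exp(np.array(log_w) - max(log_w))
        pick = options[rng.choice(len(options), p=w / w.sum())]
        z[i] = (max(labels) + 1 if labels else 0) if pick == -2 else pick
    return z

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-4, 1, 50), rng.normal(4, 1, 50)])
z = np.zeros(len(x), dtype=int)
for _ in range(25):
    z = gibbs_scan(x, z, rng=rng)
print("clusters used:", sorted(set(z)))
```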
Using a CRP mixture model to find the "true" num clusters is dangerous!

- Gershman and Blei (2012)


Alternative derivation:
Infinite limit of finite mixture model
• Consider a mixture model with a Dirichlet concentration parameter that does not depend on the num clusters K:

  π ~ Dirichlet(α/K, ..., α/K),  ci ~ Categorical(π)

• Marginalize out π:

  P(c1, ..., cN) = Γ(α)/Γ(N + α) · ∏k Γ(Nk + α/K)/Γ(α/K)
Alternative derivation:
Infinite limit of finite mixture model

• The probability of any particular labeled assignment vector goes to 0 as K goes to infinity. Instead, count equivalence classes of partitions: each partition with K+ occupied clusters corresponds to K!/(K − K+)! labeled assignment vectors
Alternative derivation:
Infinite limit of finite mixture model
• Take the limit as K goes to infinity:

  P(partition) → α^(K+) ∏k (Nk − 1)! / ∏i=1..N (α + i − 1)

  i.e. exactly the CRP partition distribution from before
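A worked version of that limit (a sketch of the standard argument, with K+ denoting the number of occupied clusters):

```latex
% Multiply the marginalized finite-mixture probability by the number of
% labeled vectors per partition, K!/(K - K_+)!, then let K -> infinity.
\begin{align*}
P(\text{partition})
  &= \frac{K!}{(K - K_+)!}
     \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)}
     \prod_{k=1}^{K_+} \frac{\Gamma(N_k + \alpha/K)}{\Gamma(\alpha/K)} \\
  &\xrightarrow{K \to \infty}
     \alpha^{K_+} \frac{\Gamma(\alpha)}{\Gamma(N + \alpha)}
     \prod_{k=1}^{K_+} (N_k - 1)!
\end{align*}
```

This uses Γ(Nk + α/K)/Γ(α/K) → (α/K)(Nk − 1)! and K!/(K − K+)! ≈ K^(K+); the factor Γ(α)/Γ(N + α) equals 1/∏i=1..N (α + i − 1), matching the CRP formula above.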
Indian Buffet Process
• Distribution over binary matrices with an infinite number of columns (latent features)

[Figure: binary matrix Z; rows = data points (customers), columns = features (dishes)]
Indian Buffet Process
• Start with a finite Beta-Bernoulli model (each column/feature/dish has a coin-flip parameter)

• Define equivalence classes of matrices

• Take the infinite limit as K goes to infinity


Indian Buffet Process
• Each "customer" i eats each previously-sampled "dish" k with probability mk / i (mk = number of earlier customers who took dish k), then samples Poisson(α / i) new dishes
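A minimal sketch of simulating the IBP as just described (the function name and defaults are mine):

```python
import numpy as np

def ibp(n_customers, alpha, seed=0):
    rng = np.random.default_rng(seed)
    dish_counts = []          # m_k: how many previous customers took dish k
    rows = []
    for i in range(1, n_customers + 1):
        # Take existing dish k with probability m_k / i ...
        taken = [rng.random() < m / i for m in dish_counts]
        dish_counts = [m + t for m, t in zip(dish_counts, taken)]
        # ... then sample Poisson(alpha / i) brand-new dishes.
        n_new = rng.poisson(alpha / i)
        dish_counts.extend([1] * n_new)
        rows.append(taken + [True] * n_new)
    # Pad rows on the right to form the binary matrix Z.
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = ibp(10, alpha=2.0)
print(Z.shape, "matrix; dish popularity:", Z.sum(axis=0))
```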
Extending to non-binary case
• Elementwise multiply Z with a random real-valued or integer matrix

• Can use as a prior for factor analysis, etc.
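A sketch of this construction, reusing the ibp() draw above with an assumed Gaussian weight matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = ibp(10, alpha=2.0)          # binary "who owns which feature"
V = rng.normal(size=Z.shape)    # real-valued feature weights (assumed Gaussian)
A = Z * V                       # sparse real-valued loadings for factor analysis
print(A.round(2))
```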
Exam study guide

• The learning outcomes in each lecture are your main guide on what to study

• Useful to study slides, homeworks, peer instruction questions. Readings are lower priority, but may be useful

• Format will be similar to homeworks, but will also include a multiple-choice component
  – Need to be comfortable with models, but don't need to memorize pdfs of distributions, proofs done in class, etc.

• Bring pens, scratch paper, pocket calculators (no phones!)
Think-pair-share

• Design a nonparametric Bayesian latent variable model for a social network, represented as a binary adjacency matrix Y

  – How will you specify a prior?
  – How will you specify a likelihood?
  – Does your model encode any sociological principles?
Evaluations

• Please be sure to submit evaluations for both your instructor and TA, if you have not done so already.

• This will help us a lot!

• (Thanks, if you have already done this)

• I understand that you have been emailed a link to do this.
