Announcements
How many clusters should we use in our mixture model?
Choosing the dimensionality of your latent space
• How many
– clusters should we use in our mixture model?
– dimensions in our factor analysis model?
– topics in our topic model?
Traditional model selection
• Marginal likelihood:
• Can also put a prior on models and pick the one with the highest posterior probability (or average over all possible models, a.k.a. model averaging)
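The marginal likelihood equation did not survive extraction; a standard reconstruction, with $\mathcal{D}$ the data, $M$ a model, and $\theta$ its parameters:

$$p(\mathcal{D} \mid M) = \int p(\mathcal{D} \mid \theta, M)\, p(\theta \mid M)\, d\theta$$

Model selection then picks $\arg\max_M p(M \mid \mathcal{D}) \propto p(\mathcal{D} \mid M)\, p(M)$.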
Approximate heuristic methods:
Bayesian information criterion (BIC)
(for large n, exponential families)
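The BIC formula itself is missing from the extracted slide; a standard statement, where $\hat\theta$ is the maximum-likelihood estimate, $d$ the number of free parameters, and $n$ the number of data points:

$$\log p(\mathcal{D} \mid M) \approx \log p(\mathcal{D} \mid \hat\theta, M) - \frac{d}{2} \log n$$

(Some texts instead define $\mathrm{BIC} = -2\log p(\mathcal{D}\mid\hat\theta) + d\log n$, to be minimized; the two conventions agree up to sign and scale.)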
Nonparametric models
• So-called “nonparametric” models typically do have parameters, however:
• E.g.
– K-nearest neighbors classifier
– Kernel density estimation
– Decision trees
[Figure: model complexity vs. number of data points N]
Bayesian nonparametric models
• Bayesian models whose complexity increases with the amount of data
[Figure: model complexity growing with the number of data points N]
Learning outcomes
By the end of the lesson, you should be able to:
Chinese restaurant process
• Overall metaphor:
– Imagine a restaurant with an infinite number of tables, each serving a different dish
• Tables = clusters
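The seating-rule equations on these slides were lost in extraction; the standard CRP rule, with $N_k$ the number of customers already at table $k$, $n$ customers seated so far, and concentration parameter $\alpha$:

$$P(\text{customer } n{+}1 \text{ joins table } k) = \frac{N_k}{n + \alpha}, \qquad P(\text{customer } n{+}1 \text{ starts a new table}) = \frac{\alpha}{n + \alpha}$$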
CRP is exchangeable
• Joint distribution
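The joint-distribution equation is missing from the extraction; the standard form, for a seating arrangement with $K$ occupied tables of sizes $N_1, \dots, N_K$ (with $N = \sum_k N_k$):

$$p(z_1, \dots, z_N) = \frac{\alpha^K\, \Gamma(\alpha)}{\Gamma(N + \alpha)} \prod_{k=1}^{K} (N_k - 1)!$$

This depends only on the table sizes, not on the order in which customers arrived, which is exactly what exchangeability means.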
CRP mixture models
(a.k.a. Dirichlet process mixture models)
• Generate cluster assignments (partition of data points) via CRP
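A minimal sketch of drawing from a CRP mixture. The choice of 2-D Gaussian clusters and the hyperparameters below are illustrative assumptions; the lecture's actual likelihood and base measure are not shown in the extracted slides.

```python
import numpy as np

def sample_crp_mixture(n, alpha=1.0, sigma_mu=5.0, sigma_x=1.0, rng=None):
    """Draw n points from a CRP mixture of 2-D Gaussians.

    Illustrative base measure: cluster means drawn from N(0, sigma_mu^2 I),
    observations from N(mu_k, sigma_x^2 I).
    """
    rng = np.random.default_rng(rng)
    assignments = []   # table (cluster) of each customer
    counts = []        # customers per table
    means = []         # dish (Gaussian mean) served at each table
    data = np.empty((n, 2))
    for i in range(n):
        # Seating probabilities: existing table k w.p. N_k/(i+alpha),
        # new table w.p. alpha/(i+alpha).
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):  # new table: draw a fresh cluster mean
            counts.append(0)
            means.append(rng.normal(0.0, sigma_mu, size=2))
        counts[k] += 1
        assignments.append(k)
        data[i] = rng.normal(means[k], sigma_x)
    return data, np.array(assignments)
```

Running this for increasing n illustrates the slides that follow: the number of occupied tables keeps growing (roughly like alpha * log n) as more data arrive.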
Draws from CRP mixture model, N = 50, N = 500, N = 1000
[Figures: scatter plots of the draws not shown]
Inference via collapsed Gibbs sampling
• For a collapsed Gibbs update, compute the posterior predictive.
• If we use a conjugate prior, we can compute this in closed form.
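The update equation was lost in extraction; the standard collapsed Gibbs conditional for a CRP mixture, where $z_{-i}$ are all other assignments and $N_{k,-i}$ is the size of cluster $k$ excluding point $i$:

$$p(z_i = k \mid z_{-i}, x) \;\propto\; \underbrace{\frac{N_{k,-i}}{N - 1 + \alpha}}_{\text{CRP prior}} \;\underbrace{p\big(x_i \mid \{x_j : z_j = k,\, j \neq i\}\big)}_{\text{posterior predictive}}$$

with $\frac{\alpha}{N - 1 + \alpha}\, p(x_i)$ for starting a new cluster. The posterior predictive is available in closed form when the likelihood has a conjugate prior.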
Using a CRP mixture model to find the “true” number of clusters is dangerous!
• Marginalize out :
Alternative derivation:
Infinite limit of finite mixture model
• Take the limit as K goes to infinity
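The limiting argument's equations did not survive extraction; the standard derivation starts from a finite mixture with a symmetric Dirichlet prior, $\pi \sim \mathrm{Dirichlet}(\alpha/K, \dots, \alpha/K)$ and $z_i \mid \pi \sim \mathrm{Categorical}(\pi)$. Marginalizing out $\pi$:

$$p(z_i = k \mid z_{-i}) = \frac{N_{k,-i} + \alpha/K}{N - 1 + \alpha}$$

As $K \to \infty$, this tends to $\frac{N_{k,-i}}{N-1+\alpha}$ for occupied clusters, while the total mass on the (infinitely many) empty clusters tends to $\frac{\alpha}{N-1+\alpha}$, recovering the CRP.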
Indian Buffet Process
• Distribution over binary matrices with an infinite number of columns (latent features)
[Figure: binary matrix with rows = data points (customers) and columns = features (dishes), extending indefinitely to the right]
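A minimal sketch of sampling such a matrix via the buffet metaphor. The generative rule below (existing dish taken with probability m_k / i, then Poisson(alpha / i) new dishes) is the standard IBP construction; the function name and defaults are illustrative.

```python
import numpy as np

def sample_ibp(n, alpha=2.0, rng=None):
    """Draw a binary feature matrix Z from the Indian Buffet Process.

    Customer i samples each existing dish k with probability m_k / i
    (m_k = number of previous customers who took dish k), then tries
    Poisson(alpha / i) brand-new dishes.
    """
    rng = np.random.default_rng(rng)
    dishes = []  # popularity count m_k for each dish sampled so far
    rows = []
    for i in range(1, n + 1):
        # Revisit existing dishes in proportion to their popularity.
        row = [int(rng.random() < m / i) for m in dishes]
        # Try some brand-new dishes.
        new = rng.poisson(alpha / i)
        row.extend([1] * new)
        # Update dish popularity counts and register the new columns.
        dishes = [m + z for m, z in zip(dishes, row[:len(dishes)])]
        dishes.extend([1] * new)
        rows.append(row)
    # Pad rows to a common width (later customers may have more columns).
    K = len(dishes)
    Z = np.zeros((n, K), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z
```

As with the CRP, the number of columns (features) keeps growing with n, rather than being fixed in advance.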
Indian Buffet Process
• Start with a finite Beta-Bernoulli model (each column/feature/dish has a coin-flip param)
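The model equations are missing from the extracted slide; the standard finite Beta-Bernoulli construction with $K$ features:

$$\pi_k \sim \mathrm{Beta}\!\left(\frac{\alpha}{K},\, 1\right), \qquad z_{nk} \mid \pi_k \sim \mathrm{Bernoulli}(\pi_k)$$

Marginalizing out the $\pi_k$ and taking $K \to \infty$ (up to reordering of columns) yields the Indian Buffet Process.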
Extending to non-binary case
• Elementwise multiply Z with a random real-valued or integer matrix
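As one concrete instance of this construction (the lecture's specific choice is not shown), take

$$A = Z \odot V, \qquad V_{nk} \sim \mathcal{N}(0, \sigma_V^2)$$

so each entry of $A$ is zero where $Z$ is zero and real-valued where $Z$ is one; drawing $V_{nk}$ from, say, a Poisson instead gives the integer-valued case.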
Exam study guide
• The learning outcomes in each lecture are your main guide on what to study
Think-pair-share
Evaluations