Automatic Differentiation
Variational Inference
Philip Schulz and Wilker Aziz
https://github.com/philschulz/VITutorial
1 / 37
ADVI
Example

Multivariate calculus recap

Jacobian
The inverse function theorem relates the Jacobian of T^{-1} to that of T:
J_{T^{-1}}(y) = (J_T(x))^{-1}
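In one dimension the Jacobians are ordinary derivatives, so the identity can be sanity-checked numerically. A minimal sketch, with T = tanh chosen purely for illustration (our choice, not from the slides):

```python
import math

def numeric_derivative(f, v, h=1e-6):
    # central finite difference approximation of f'(v)
    return (f(v + h) - f(v - h)) / (2.0 * h)

# 1-D instance: T = tanh, so T^{-1} = atanh
x = 0.7
y = math.tanh(x)

J_T = numeric_derivative(math.tanh, x)        # J_T(x)
J_T_inv = numeric_derivative(math.atanh, y)   # J_{T^{-1}}(y)

# inverse function theorem: J_{T^{-1}}(y) = (J_T(x))^{-1}
gap = abs(J_T_inv - 1.0 / J_T)
```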
Integration by substitution
We can integrate a function g(x) by substituting x = T^{-1}(y):
∫ g(x) dx = ∫ g(T^{-1}(y)) |det J_{T^{-1}}(y)| dy
Here g(T^{-1}(y)) plays the role of g(x), and |det J_{T^{-1}}(y)| dy plays the role of dx.
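In one dimension the determinant reduces to the absolute derivative, so the rule can be checked numerically. A small sketch, integrating g(x) = x² over [1, 2] directly and again after substituting x = T^{-1}(y) = e^y (our own choice of transformation):

```python
import math

def midpoint_integral(f, a, b, n=20000):
    # midpoint Riemann sum approximation of the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

g = lambda x: x ** 2

# direct integral of g over [1, 2]; exact value is 7/3
direct = midpoint_integral(g, 1.0, 2.0)

# substitute x = T^{-1}(y) = e^y, so dx = |det J_{T^{-1}}(y)| dy = e^y dy,
# and the limits become [log 1, log 2]
substituted = midpoint_integral(lambda y: g(math.exp(y)) * math.exp(y),
                                math.log(1.0), math.log(2.0))
```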
Change of density
Let X take on values in R^K with density p_X(x), and recall that y = T(x) and x = T^{-1}(y). Then
p_Y(y) = p_X(T^{-1}(y)) |det J_{T^{-1}}(y)|
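A quick numerical check of the formula, with K = 1, X ∼ N(0, 1) and y = T(x) = eˣ (our own toy choice): the transformed density p_Y should still integrate to one over the new support (0, ∞).

```python
import math

def p_X(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def p_Y(y):
    # change of density with T(x) = exp(x): T^{-1}(y) = log(y)
    # and |det J_{T^{-1}}(y)| = 1 / y
    return p_X(math.log(y)) * (1.0 / y)

# p_Y is a valid density on (0, inf): it integrates to ~1
n, a, b = 200000, 1e-6, 50.0
h = (b - a) / n
total = sum(p_Y(a + (i + 0.5) * h) for i in range(n)) * h
```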
Reparameterised gradients revisited
Reparameterised expectations
If we are interested in
E_{q(z|λ)}[g(z)] = ∫ q(z|λ) g(z) dz
= ∫ π(S_λ(z)) |det J_{S_λ}(z)| g(z) dz        (change of density)
= ∫ π(ε) |det J_{S_λ^{-1}}(ε)|^{-1} g(S_λ^{-1}(ε)) |det J_{S_λ^{-1}}(ε)| dε        (inverse function theorem; z = S_λ^{-1}(ε); change of variable)
= ∫ π(ε) g(S_λ^{-1}(ε)) dε = E_{π(ε)}[g(S_λ^{-1}(ε))]
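One concrete instance of the identity: take q(z|λ) = N(z|μ, σ²) with S_λ the standardisation z ↦ (z − μ)/σ, so S_λ^{-1}(ε) = μ + σε and π = N(0, 1). For g(z) = z² the exact expectation is μ² + σ², which a Monte Carlo estimate through the reparameterised form should recover. A sketch (our own toy choice of q and g):

```python
import math
import random

random.seed(0)

mu, sigma = 0.5, 1.5          # variational parameters lambda = (mu, sigma)
g = lambda z: z ** 2          # integrand; exact expectation is mu^2 + sigma^2

# E_{q(z|lambda)}[g(z)] estimated as E_{pi(eps)}[g(S_lambda^{-1}(eps))]
# with eps ~ N(0, 1) and S_lambda^{-1}(eps) = mu + sigma * eps
M = 200000
estimate = sum(g(mu + sigma * random.gauss(0.0, 1.0)) for _ in range(M)) / M

exact = mu ** 2 + sigma ** 2
```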
Reparameterised gradients
For optimisation, we need tractable gradients:
∂/∂λ E_{q(z|λ)}[g(z)] = ∂/∂λ E_{π(ε)}[g(S_λ^{-1}(ε))]
Since the density π(ε) does not depend on λ, we can push the derivative inside the expectation and obtain a gradient estimate:
∂/∂λ E_{q(z|λ)}[g(z)] = E_{π(ε)}[∂/∂λ g(S_λ^{-1}(ε))]
≈ (1/M) Σ_{i=1}^{M} ∂/∂λ g(S_λ^{-1}(ε_i)),   ε_i ∼ π(ε)
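In the Gaussian toy setting above (z = μ + σε with ε ∼ N(0, 1), g(z) = z², both our own choices) the per-sample derivative is ∂g(μ + σε)/∂μ = 2(μ + σε), and the MC average should be close to the exact gradient ∂(μ² + σ²)/∂μ = 2μ:

```python
import random

random.seed(1)

mu, sigma = 0.5, 1.5
M = 200000

# reparameterised MC gradient of E_q[z^2] w.r.t. mu:
# d/dmu g(S_lambda^{-1}(eps)) = d/dmu (mu + sigma*eps)^2 = 2*(mu + sigma*eps)
grad_estimate = sum(2.0 * (mu + sigma * random.gauss(0.0, 1.0))
                    for _ in range(M)) / M

exact_grad = 2.0 * mu  # d/dmu (mu^2 + sigma^2)
```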
Beyond
Many interesting densities cannot easily be reparameterised:
▶ Beta
▶ Gamma
▶ Weibull
▶ Dirichlet
▶ von Mises-Fisher
ADVI

Automatic Differentiation VI
Motivation
▶ many models have intractable posteriors: their normalising constants (evidence) lack analytic solutions
▶ but many models are differentiable: that is the main constraint for using NNs
Reparameterised gradients are a step towards automatising VI for differentiable models
▶ but not every model of interest employs rvs for which a reparameterisation is known
X|z ∼ Poisson(z),   z ∈ R_{>0}

ELBO
E_{q(z|λ)}[log p(x, z|r, k)] + H(q(z))
Can we make q(z|λ) Gaussian?
No! supp(N(z|μ, σ²)) = R
Strategy
Build a change of variable into the model

ELBO
E_{q(ζ|λ)}[. . .] + H(q(ζ))
Can we use a Gaussian approximate posterior? Yes! The transformed variable ζ is unconstrained, so supp(N(ζ|μ, σ²)) = R matches.
Differentiable models
We focus on differentiable probability models
p(x, z) = p(x|z)p(z)
for which we need gradients of expectations such as
∂/∂λ E_{q(z;λ)}[log p(x, z)]
VI optimisation problem
Let's focus on the design and optimisation of the variational approximation
arg min_{q(z) ∈ Q} KL(q(z) || p(z|x))
where
Q = {q(z) : supp(q(z)) ⊆ supp(p(z|x))}
But what is the support of p(z|x)?
▶ typically the same as the support of p(z), as long as p(x, z) > 0 whenever p(z) > 0
Parametric family
So let's constrain q(z) to a family Q whose support is included in the support of the prior.
ADVI
A gradient-based black-box VI procedure
1. Custom parameter space
   ▶ Appropriate transformations of unconstrained parameters!
2. Custom supp(p(z))
   ▶ Express z ∈ supp(p(z)) ⊆ R^K as a transformation of some unconstrained ζ ∈ R^K
   ▶ Pick a variational family over the entire real coordinate space
   ▶ basically, pick a Gaussian!
3. Intractable expectations
   ▶ Reparameterised gradients!
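Step 2 can be sketched for the common case supp(p(z)) = R_{>0}: map unconstrained ζ ∈ R to z = exp(ζ), tracking log|det J| for the density correction (the helper name is ours, purely illustrative):

```python
import math

def to_positive(zeta):
    # T^{-1}: R -> R_{>0}, z = exp(zeta);
    # log|det J_{T^{-1}}(zeta)| = log(exp(zeta)) = zeta
    z = math.exp(zeta)
    log_det_jacobian = zeta
    return z, log_det_jacobian

# any real zeta yields a valid z in the support of the prior
z, ldj = to_positive(-3.2)

# numeric check of the log-Jacobian: d exp(zeta)/d zeta = exp(zeta)
h = 1e-6
numeric = (math.exp(-3.2 + h) - math.exp(-3.2 - h)) / (2.0 * h)
ldj_numeric = math.log(numeric)
```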
Reparameterised ELBO
ELBO(λ) = E_{q(ζ;λ)}[log p(x|T^{-1}(ζ)) + log p(T^{-1}(ζ)) + log |det J_{T^{-1}}(ζ)|] + H(q(ζ; λ))
Gradient estimate
For ε_i ∼ N(0, I),
∂/∂λ ELBO(λ) ≈ (1/M) Σ_{i=1}^{M} [ ∂/∂λ log p(x|T^{-1}(S_λ^{-1}(ε_i)))      (likelihood)
+ ∂/∂λ log p(T^{-1}(S_λ^{-1}(ε_i)))      (prior)
+ ∂/∂λ log |det J_{T^{-1}}(S_λ^{-1}(ε_i))| ]      (change of volume)
+ ∂/∂λ H(q(ζ; λ))      (analytic)
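The per-sample terms can be spelled out for a toy model of our own choosing (not from the slides): prior z ∼ Exponential(1), likelihood x|z ∼ Poisson(z), transformation z = T^{-1}(ζ) = exp(ζ), Gaussian q with S_λ^{-1}(ε) = μ + σε. Differentiating each term with respect to μ by hand and checking against a finite difference of the sampled objective (common random numbers make the two comparable):

```python
import math
import random

random.seed(2)

x = 4                      # one observed count
mu, sigma = 0.0, 0.5       # variational parameters (sigma held fixed here)
M = 1000
eps = [random.gauss(0.0, 1.0) for _ in range(M)]  # shared noise draws

def sampled_objective(m):
    # MC estimate of the ELBO terms that depend on mu
    # (log x! and the entropy H(q) are constant in mu and omitted)
    total = 0.0
    for e in eps:
        zeta = m + sigma * e           # S_lambda^{-1}(eps)
        z = math.exp(zeta)             # T^{-1}(zeta)
        log_lik = x * zeta - z         # log Poisson(x|z), up to a constant
        log_prior = -z                 # log Exponential(1) density at z
        log_det = zeta                 # log |det J_{T^{-1}}(zeta)|
        total += log_lik + log_prior + log_det
    return total / M

# hand-derived gradient of the same estimate w.r.t. mu:
# d log_lik/d mu = x - z,  d log_prior/d mu = -z,  d log_det/d mu = 1
grad = sum((x - math.exp(mu + sigma * e)) - math.exp(mu + sigma * e) + 1.0
           for e in eps) / M

# finite-difference check on the identical samples
h = 1e-5
fd = (sampled_objective(mu + h) - sampled_objective(mu - h)) / (2.0 * h)
```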
Example

Weibull-Poisson model
Build a change of variable into the model:
p(x, z|r, k) = p(z|r, k) p(x|z)
= Weibull(z|r, k) Poisson(x|z)
= Weibull(log^{-1}(ζ)|r, k) Poisson(x|log^{-1}(ζ)) |det J_{log^{-1}}(ζ)|      (z = log^{-1}(ζ))
= p(x, z = log^{-1}(ζ)|r, k) |det J_{log^{-1}}(ζ)|

ELBO
E_{q(ζ|λ)}[log( p(x, z = log^{-1}(ζ)|r, k) |det J_{log^{-1}}(ζ)| )] + H(q(ζ))
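The change of variable above can be checked numerically: Weibull(exp(ζ)|r, k)·|det J_{log^{-1}}(ζ)| is itself a proper density over ζ ∈ R, which is exactly why a Gaussian q(ζ) now has matching support. A sketch with r = 1, k = 2 (our own parameter choice):

```python
import math

r, k = 1.0, 2.0

def weibull_pdf(z):
    # Weibull(z | r, k) density on R_{>0}
    return (k / r) * (z / r) ** (k - 1) * math.exp(-((z / r) ** k))

def transformed_prior(zeta):
    # density of zeta = log z when z ~ Weibull(r, k):
    # Weibull(exp(zeta) | r, k) * |det J_{log^{-1}}(zeta)|,
    # with J_{log^{-1}}(zeta) = exp(zeta)
    z = math.exp(zeta)
    return weibull_pdf(z) * z

# the pushforward is a valid density on all of R: it integrates to ~1
n, a, b = 40000, -9.0, 3.0
h = (b - a) / n
total = sum(transformed_prior(a + (i + 0.5) * h) for i in range(n)) * h
```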
Visualisation
Summary
ADVI is a big step towards blackbox VI
▶ we knew how to map parameters to the unconstrained real coordinate space
▶ now we also know how to map latent variables to the unconstrained real coordinate space
▶ it takes a change of variable built into the model
Think of ADVI as reparameterised gradients and autodiff expanded to many more models!
What's left? Our posteriors are still rather simple, aren't they?