
Tools for Probabilistic Data Analysis in Python *

Dan Foreman-Mackey | #pyastro16

* in 15 minutes
What have I done?
Physics

mean model (physical parameters → predicted data)
inference (parameter estimation)

Data

noise (stochastic; instrument, systematics, etc.)
A few examples

1 linear regression

2 maximum likelihood

3 uncertainty quantification
Linear regression

if you have:
a linear mean model and
known Gaussian uncertainties

and you want:


"best" parameters and uncertainties
Linear (mean) models

y = mx + b

y = a₂ x² + a₁ x + a₀

y = a sin(x + w)

+ known Gaussian uncertainties


Linear regression

A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{pmatrix}, \quad w = \begin{pmatrix} m \\ b \end{pmatrix}

# x, y, yerr are numpy arrays of the same shape

import numpy as np

A = np.vander(x, 2)
ATA = np.dot(A.T, A / yerr[:, None]**2)
sigma_w = np.linalg.inv(ATA)
mean_w = np.linalg.solve(ATA, np.dot(A.T, y / yerr**2))
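To read off the fit, a small follow-up sketch (not on the original slide): the best-fit slope and intercept live in mean_w, and their 1σ uncertainties are the square roots of the diagonal of the covariance sigma_w.

# best-fit parameters w = (m, b) and their 1-sigma uncertainties
m, b = mean_w
m_err, b_err = np.sqrt(np.diag(sigma_w))
print("m = {0:.3f} +/- {1:.3f}".format(m, m_err))
print("b = {0:.3f} +/- {1:.3f}".format(b, b_err))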
That's it!
(in other words: "Don't use MCMC for linear regression!")
Maximum likelihood

if you have:
a non-linear mean model and/or
non-Gaussian/unknown noise

and you want:


"best" parameters
Likelihoods

p(data | physics)
"probability of the data given physics"

parameterized by some parameters


Example likelihood function

log-likelihood (left-hand side), mean model f_\theta (inside the sum), and the "\chi^2" term:

\ln p(\{y_n\} \,|\, \theta) = -\frac{1}{2} \sum_{n=1}^{N} \frac{[y_n - f_\theta(x_n)]^2}{\sigma_n^2} + \mathrm{constant}
Likelihoods

SciPy

# x, y, yerr are numpy arrays of the same shape

import numpy as np
from scipy.optimize import minimize

# mean model: f_\theta(x_n) = a / (1 + e^{-b (x_n - c)})
def model(theta, x):
    a, b, c = theta
    return a / (1 + np.exp(-b * (x - c)))

# negative log-likelihood, -\ln p(\{y_n\} \,|\, \theta), up to a constant
def neg_log_like(theta):
    return 0.5 * np.sum(((model(theta, x) - y) / yerr)**2)

r = minimize(neg_log_like, [1.0, 10.0, 1.5])
print(r)
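The snippets in this deck assume that x, y, and yerr already exist. As a minimal sketch (not from the talk), here is one way to generate synthetic data from the sigmoid mean model so the example above runs end-to-end; the "true" parameters, x range, and noise level are illustrative assumptions, and this would need to run before the fit.

import numpy as np

np.random.seed(42)                        # for reproducibility
a_true, b_true, c_true = 1.0, 10.0, 1.5   # illustrative "true" parameters
x = np.sort(np.random.uniform(0, 3, 50))
yerr = 0.05 * np.ones_like(x)             # assumed known Gaussian uncertainties
y = a_true / (1 + np.exp(-b_true * (x - c_true))) + yerr * np.random.randn(len(x))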
"But it doesn't work…"
— everyone
1 initialization

2 bounds

3 convergence

4 gradients
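For items 1 (initialization) and 2 (bounds) above, a minimal sketch of what this might look like with scipy.optimize.minimize, assuming the same x, y, yerr, model, and neg_log_like as before; the starting point and box bounds are illustrative assumptions, not values from the talk. Gradients (item 4) are covered next.

theta0 = [y.max(), 10.0, np.median(x)]                    # rough data-driven initial guess
bounds = [(0.0, None), (0.0, None), (x.min(), x.max())]   # a >= 0, b >= 0, c within the data range
r = minimize(neg_log_like, theta0, bounds=bounds, method="L-BFGS-B")
print(r.x)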
Gradients

\frac{\mathrm{d}}{\mathrm{d}\theta} \ln p(\{y_n\} \,|\, \theta)
seriously?
AutoDiff to the rescue!

"The most criminally underused tool


in the [PyAstro] toolkit"
— adapted from
justindomke.wordpress.com
AutoDiff

"Compile" time exact gradients


AutoDiff

"Compile" time chain rule


AutoDiff

GradType sin(GradType x):
    return GradType(
        sin(x.value),
        x.grad * cos(x.value)
    )
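The same idea as a runnable Python sketch, using a toy dual-number class (hypothetical names; this illustrates the concept behind autodiff, not how Theano or autograd are actually implemented):

import math

class GradType:
    # a toy "dual number": a value together with its derivative
    def __init__(self, value, grad=0.0):
        self.value = value
        self.grad = grad

def grad_sin(x):
    # chain rule: d/dt sin(x(t)) = cos(x(t)) * x'(t)
    return GradType(math.sin(x.value), x.grad * math.cos(x.value))

x = GradType(1.0, grad=1.0)   # seed derivative dx/dx = 1
y = grad_sin(x)
print(y.value, y.grad)        # sin(1.0), cos(1.0)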

AutoDiff in Python

1 Theano: deeplearning.net/software/theano

2 HIPS/autograd: github.com/HIPS/autograd
HIPS/autograd just works

import autograd.numpy as np
from autograd import elementwise_grad

def f(x):
    y = np.exp(-x)
    return (1.0 - y) / (1.0 + y)

df = elementwise_grad(f)
ddf = elementwise_grad(df)
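A quick way to evaluate these on a grid and reproduce the curves in the figure below; the grid and plotting details are assumptions, not part of the original slide.

import matplotlib.pyplot as plt

xs = np.linspace(-4, 4, 500)           # separate name so the data array x is not clobbered
plt.plot(xs, f(xs), label="f(x)")
plt.plot(xs, df(xs), label="f'(x)")
plt.plot(xs, ddf(xs), label="f''(x)")
plt.legend()
plt.show()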
[Figure: f(x) and its first and second derivatives f'(x), f''(x), computed with elementwise_grad, plotted over x in (-4, 4)]
before autograd

# x, y, yerr are numpy arrays of the same shape

import numpy as np
from scipy.optimize import minimize

def model(theta, x):
    a, b, c = theta
    return a / (1 + np.exp(-b * (x - c)))

def neg_log_like(theta):
    r = (y - model(theta, x)) / yerr
    return 0.5 * np.sum(r*r)

r = minimize(neg_log_like, [1.0, 10.0, 1.5])

print(r)
after autograd

# x, y, yerr are numpy arrays of the same shape

from autograd import grad
import autograd.numpy as np
from scipy.optimize import minimize

def model(theta, x):
    a, b, c = theta
    return a / (1 + np.exp(-b * (x - c)))

def neg_log_like(theta):
    r = (y - model(theta, x)) / yerr
    return 0.5 * np.sum(r*r)

r = minimize(neg_log_like, [1.0, 10.0, 1.5],
             jac=grad(neg_log_like))
print(r)

(115 calls vs. 66 calls)
HIPS/autograd just works

but… HIPS/autograd is not super fast

you might need to drop down to a compiled language

or...
Use Julia?
Uncertainty quantification

if you have:
a non-linear mean model and/or
non-Gaussian/unknown noise

and you want:


parameter uncertainties
Uncertainty

p(physics | data) ∝ p(data | physics) p(physics)

posterior: the distribution of physical parameters consistent with the data
likelihood: p(data | physics)
prior: p(physics)
You're going to have to

SAMPLE

(CC-licensed photo by Flickr user Franz Jachim)
MCMC sampling

it's hammer time!

emcee
The MCMC Hammer
MCMC sampling with emcee

dfm.io/emcee; github.com/dfm/emcee
MCMC sampling with emcee

# x, y, yerr are numpy arrays of the same shape

import emcee
import numpy as np

def model(theta, x):
    a, b, c = theta
    return a / (1 + np.exp(-b * (x - c)))

def log_prob(theta):
    log_prior = 0.0
    r = (y - model(theta, x)) / yerr
    return -0.5 * np.sum(r*r) + log_prior

ndim, nwalkers = 3, 32
p0 = np.array([1.0, 10.0, 1.5])
p0 = p0 + 0.01*np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 1000)
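A sketch (not on the original slide) of one way to pull out the samples afterwards and make the corner plot on the next slide; this uses the get_chain API from recent emcee versions, and the burn-in length and labels are assumptions.

import corner

# discard some burn-in steps and flatten the walkers into one chain
samples = sampler.get_chain(discard=100, flat=True)
fig = corner.corner(samples, labels=["a", "b", "c"])
fig.savefig("corner.png")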
MCMC sampling with emcee

[Corner plot of the posterior samples for a, b, and c of the mean model f_\theta(x_n) = a / (1 + e^{-b (x_n - c)}), made using github.com/dfm/corner.py]
1 initialization

2 bounds

3 convergence

4 gradients
1 initialization

2 priors

3 convergence

4 gradients?
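For item 2, hard bounds usually become a prior in the MCMC setting. A minimal sketch, assuming the same model, x, y, yerr, and np as above, with a flat prior inside an illustrative box (the specific bounds are assumptions, not values from the talk):

def log_prior(theta):
    a, b, c = theta
    # flat prior inside the box, zero probability outside
    if 0.0 < a < 10.0 and 0.0 < b < 100.0 and x.min() < c < x.max():
        return 0.0
    return -np.inf

def log_prob(theta):
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    r = (y - model(theta, x)) / yerr
    return lp - 0.5 * np.sum(r*r)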
Other MCMC samplers in Python

1 pymc-devs/pymc3 (hierarchical inference)

2 stan-dev/pystan (hierarchical inference)

3 JohannesBuchner/PyMultiNest (nested sampling)

4 eggplantbren/DNest4 (nested sampling)
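For a sense of what these look like in practice, here is a rough sketch of the same sigmoid fit in PyMC3; the priors are illustrative assumptions and the API details may differ between PyMC3 versions.

import pymc3 as pm

with pm.Model():
    a = pm.Uniform("a", 0.0, 10.0)
    b = pm.Uniform("b", 0.0, 100.0)
    c = pm.Uniform("c", x.min(), x.max())
    mu = a / (1.0 + pm.math.exp(-b * (x - c)))   # same sigmoid mean model as before
    pm.Normal("obs", mu=mu, sd=yerr, observed=y)
    trace = pm.sample(1000)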
in summary…
If your data analysis problem looks like this… *

Physics

mean model (physical parameters → predicted data)
inference (parameter estimation)

Data

noise (stochastic; instrument, systematics, etc.)

* it probably does
… now you know how to solve it! *

https://speakerdeck.com/dfm/pyastro16

* in theory
