
INTRODUCTION TO BAYESIAN ANALYSIS

Arto Luoma
University of Tampere, Finland

Autumn 2014



Who was Thomas Bayes?

Thomas Bayes (1701–1761) was an English philosopher and Presbyterian minister. In his later years he took a deep interest in probability and suggested a solution to a problem of inverse probability: what do we know about the probability of success if the number of successes is recorded in a binomial experiment? Richard Price discovered Bayes' essay and published it posthumously. He believed that Bayes' theorem helped prove the existence of God.


Bayesian paradigm

    posterior information = prior information + data information

More formally:

    p(θ|y) ∝ p(θ)p(y|θ),

where ∝ is a symbol for proportionality, θ is an unknown parameter, y is the data, and p(θ), p(θ|y) and p(y|θ) are the density functions of the prior, posterior and sampling distributions, respectively.

In Bayesian inference, the unknown parameter θ is considered stochastic, unlike in classical inference. The distributions p(θ) and p(θ|y) express uncertainty about the exact value of θ. The density of the data, p(y|θ), provides the information from the data. It is called the likelihood function when considered as a function of θ.


Software for Bayesian Statistics

In this course we use the R and BUGS programming languages. BUGS stands for Bayesian inference Using Gibbs Sampling. Gibbs sampling was the computational technique first adopted for Bayesian analysis. The goal of the BUGS project is to separate the "knowledge base" from the "inference machine" used to draw conclusions. The BUGS language can describe complex models using a very limited syntax.

There are three widely used BUGS implementations: WinBUGS, OpenBUGS and JAGS. Both WinBUGS and OpenBUGS have a Windows GUI. Further, each engine can be controlled from R. In this course we introduce rjags, the R interface to JAGS.


Contents of the course

Basic concepts
Single-parameter models
Hypothesis testing
Simple multiparameter models
Markov chains
MCMC methods
Model checking and comparison
Hierarchical and regression models
Categorical data


Basic concepts

Topics: Bayes' theorem, prior and posterior distributions, decision theory, Bayes estimators, conjugate priors, noninformative priors, intervals, prediction.
Bayes' theorem

Let A_1, A_2, ..., A_k be events that partition the sample space Ω (i.e. Ω = A_1 ∪ A_2 ∪ ... ∪ A_k and A_i ∩ A_j = ∅ when i ≠ j), and let B be an event on that space for which Pr(B) > 0. Then Bayes' theorem is

    Pr(A_j|B) = Pr(A_j) Pr(B|A_j) / Σ_{i=1}^k Pr(A_i) Pr(B|A_i).

This formula can be used to reverse conditional probabilities: if one knows the probabilities of the events A_j and the conditional probabilities Pr(B|A_j), j = 1, ..., k, the formula can be used to compute the conditional probabilities Pr(A_j|B).
Example (Diagnostic tests)

A disease occurs with prevalence γ in the population, and θ indicates whether an individual has the disease. Hence Pr(θ = 1) = γ, Pr(θ = 0) = 1 − γ. A diagnostic test gives a result Y, whose distribution function is F_1(y) for a diseased individual and F_0(y) otherwise. The most common type of test declares that a person is diseased if Y > y_0, where y_0 is fixed on the basis of past data. The probability that a person is diseased, given a positive test result, is

    Pr(θ = 1 | Y > y_0) = γ[1 − F_1(y_0)] / {γ[1 − F_1(y_0)] + (1 − γ)[1 − F_0(y_0)]}.

This is sometimes called the positive predictive value of the test. Its sensitivity and specificity are 1 − F_1(y_0) and F_0(y_0).

(Example from Davison, 2003.)
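As a quick illustration, here is a minimal R sketch of this calculation; the normal test-result distributions, the prevalence and the cutoff are hypothetical choices, not from the example.

ppv <- function(gamma, y0,
                F1 = function(y) pnorm(y, mean = 1, sd = 1),   # diseased: hypothetical N(1, 1)
                F0 = function(y) pnorm(y, mean = 0, sd = 1)) { # healthy: hypothetical N(0, 1)
  sens <- 1 - F1(y0)                 # sensitivity, Pr(Y > y0 | diseased)
  fpr  <- 1 - F0(y0)                 # 1 - specificity, Pr(Y > y0 | healthy)
  gamma * sens / (gamma * sens + (1 - gamma) * fpr)
}
ppv(gamma = 0.01, y0 = 1.5)   # with a rare disease the PPV stays low despite a sensible cutoff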
Prior and posterior distributions

In a more general case, θ can take a finite number of values, labelled 1, ..., k. We can assign to these values probabilities p_1, ..., p_k which express our beliefs about θ before we have access to the data. The data y are assumed to be the observed value of a (multidimensional) random variable Y, and p(y|θ) is the density of y given θ (the likelihood function). Then the conditional probabilities

    Pr(θ = j | Y = y) = p_j p(y|θ = j) / Σ_{i=1}^k p_i p(y|θ = i),   j = 1, ..., k,

summarize our beliefs about θ after we have observed Y.

The unconditional probabilities p_1, ..., p_k are called prior probabilities, and Pr(θ = 1|Y = y), ..., Pr(θ = k|Y = y) are called posterior probabilities of θ.
Prior and posterior distributions (2)

When θ takes values continuously on some interval, we can express our beliefs about it with a prior density p(θ). After we have obtained the data y, our beliefs about θ are contained in the conditional density

    p(θ|y) = p(θ)p(y|θ) / ∫ p(θ)p(y|θ) dθ,    (1)

called the posterior density.

Since θ is integrated out in the denominator, the denominator can be considered a constant with respect to θ. Therefore, Bayes' formula (1) is often written as

    p(θ|y) ∝ p(θ)p(y|θ),    (2)

which denotes that p(θ|y) is proportional to p(θ)p(y|θ).
Example 1 (Introducing a New Drug in the Market)

A drug company would like to introduce a drug to reduce acid indigestion. It is desirable to estimate θ, the proportion of the market share that this drug will capture. The company interviews n people, and Y of them say that they will buy the drug. In the non-Bayesian analysis θ ∈ [0, 1] and Y ∼ Bin(n, θ).

We know that θ̂ = Y/n is a very good estimator of θ. It is unbiased, consistent and minimum variance unbiased. Moreover, it is also the maximum likelihood estimator (MLE), and thus asymptotically normal.

A Bayesian may look at the past performance of new drugs of this type. If in the past new drugs tended to capture a proportion between, say, .05 and .15 of the market, and if all values in between are assumed equally likely, then θ ∼ Unif(.05, .15).

(Example from Rohatgi, 2003.)
Example 1 (continued)

Thus, the prior distribution is given by

    p(θ) = 1/(0.15 − 0.05) = 10 for 0.05 ≤ θ ≤ 0.15, and 0 otherwise,

and the likelihood function by

    p(y|θ) = (n choose y) θ^y (1 − θ)^{n−y}.

The posterior distribution is

    p(θ|y) = p(θ)p(y|θ) / ∫ p(θ)p(y|θ) dθ
           = θ^y (1 − θ)^{n−y} / ∫_{0.05}^{0.15} θ^y (1 − θ)^{n−y} dθ for 0.05 ≤ θ ≤ 0.15, and 0 otherwise

(the constant factors 10 and (n choose y) cancel).
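Since this truncated posterior has no standard form, its normalizing integral can be evaluated numerically; a minimal R sketch, using the sample size n = 100 and y = 20 introduced on the next slide:

n <- 100; y <- 20
kernel <- function(theta) theta^y * (1 - theta)^(n - y)
Z <- integrate(kernel, 0.05, 0.15)$value        # denominator of p(theta|y)
posterior <- function(theta)
  ifelse(theta >= 0.05 & theta <= 0.15, kernel(theta) / Z, 0)
integrate(posterior, 0.05, 0.15)$value          # sanity check: integrates to 1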
Example 1 (continued)

Suppose that the sample size is n = 100 and y = 20 say that they will use the drug. Then the following BUGS code can be used to simulate the posterior distribution.

model{
    theta ~ dunif(0.05,0.15)
    y ~ dbin(theta,n)
}

Suppose that this is the contents of the file Acid1.txt in the home directory. Then JAGS can be called from R as follows:

library(rjags)                      # loads the JAGS interface used below
acid <- list(n=100, y=20)
acid.jag <- jags.model("Acid1.txt", acid)
acid.coda <- coda.samples(acid.jag, "theta", 10000)
hist(acid.coda[[1]][,"theta"], main="", xlab=expression(theta))
Example 1 (continued)

[Figure 1: Market share of a new drug: simulations from the posterior distribution of θ. Histogram of the posterior draws; x-axis θ from 0.08 to 0.14, y-axis frequency from 0 to 2500.]
Example 2 (Diseased White Pine Trees)

White pine is one of the best known species of pines in the northeastern United States and Canada. White pine is susceptible to blister rust, which develops cankers on the bark. These cankers swell, resulting in death of twigs and small trees. A forester wishes to estimate the average number of diseased pine trees per acre in a forest.

The number of diseased trees per acre can be modeled by a Poisson distribution with mean θ. Since θ changes from area to area, the forester believes that θ ∼ Exp(λ). Thus,

    p(θ) = (1/λ)e^{−θ/λ} if θ > 0, and 0 elsewhere.

The forester takes a random sample of size n from n different one-acre plots.

(Example from Rohatgi, 2003.)
Example 2 (continued)

The likelihood function is

    p(y|θ) = ∏_{i=1}^n (θ^{y_i}/y_i!) e^{−θ} = (θ^{Σ_i y_i} / ∏_i y_i!) e^{−nθ}.

Consequently, the posterior distribution is

    p(θ|y) = θ^{Σ_i y_i} e^{−θ(n+1/λ)} / ∫_0^∞ θ^{Σ_i y_i} e^{−θ(n+1/λ)} dθ.

We see that this is a Gamma distribution with parameters α = Σ_{i=1}^n y_i + 1 and β = n + 1/λ. Thus,

    p(θ|y) = [(n + 1/λ)^{Σ_i y_i + 1} / Γ(Σ_i y_i + 1)] θ^{Σ_i y_i} e^{−θ(n+1/λ)}.
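A minimal R sketch of this result; the counts y and the prior mean λ are invented for illustration:

y <- c(3, 5, 2, 4, 6); n <- length(y)
lambda <- 4                       # hypothetical prior mean for theta
alpha <- sum(y) + 1               # Gamma shape parameter
beta  <- n + 1/lambda             # Gamma rate parameter
alpha / beta                      # posterior mean of theta
curve(dgamma(x, shape = alpha, rate = beta), 0, 10,
      xlab = expression(theta), ylab = "posterior density")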
Statistical decision theory

The outcome of a Bayesian analysis is the posterior distribution, which combines the prior information and the information from the data. However, sometimes we may want to summarize the posterior information with a scalar, for example the mean, median or mode of the posterior distribution. In the following, we show how the use of a scalar estimator can be justified using statistical decision theory.

Let L(θ, θ̂) denote the loss function which gives the cost of using θ̂ = θ̂(y) as an estimate of θ. We define that θ̂ is a Bayes estimate of θ if it minimizes the posterior expected loss

    E[L(θ, θ̂)|y] = ∫ L(θ, θ̂) p(θ|y) dθ.
Statistical decision theory (continued)

On the other hand, the expectation of the loss function over the sampling distribution of y is called the risk function:

    R_θ̂(θ) = E[L(θ, θ̂)|θ] = ∫ L(θ, θ̂) p(y|θ) dy.

Further, the expectation of the risk function over the prior distribution of θ,

    E[R_θ̂(θ)] = ∫ R_θ̂(θ) p(θ) dθ,

is called the Bayes risk.
Statistical decision theory (continued)

By changing the order of integration, and using p(θ)p(y|θ) = p(y)p(θ|y), one can see that the Bayes risk

    ∫ R_θ̂(θ) p(θ) dθ = ∫ p(θ) ∫ L(θ, θ̂) p(y|θ) dy dθ
                      = ∫ p(y) ∫ L(θ, θ̂) p(θ|y) dθ dy    (3)

is minimized when the inner integral in (3) is minimized for each y, that is, when a Bayes estimator is used.

In the following, we introduce the Bayes estimators for three simple loss functions.
Bayes estimators: zero-one loss function

Zero-one loss:

    L(θ, θ̂) = 0 when |θ̂ − θ| < a, and 1 when |θ̂ − θ| ≥ a.

We should minimize

    ∫_{−∞}^{∞} L(θ, θ̂) p(θ|y) dθ = ∫_{−∞}^{θ̂−a} p(θ|y) dθ + ∫_{θ̂+a}^{∞} p(θ|y) dθ
                                  = 1 − ∫_{θ̂−a}^{θ̂+a} p(θ|y) dθ,

or, equivalently, maximize

    ∫_{θ̂−a}^{θ̂+a} p(θ|y) dθ.
Bayes estimators: absolute error loss and quadratic loss function

If p(θ|y) is unimodal, the maximization is achieved by choosing θ̂ to be the midpoint of the interval of length 2a for which p(θ|y) has the same value at both ends. If we let a → 0, then θ̂ tends to the mode of the posterior distribution. This equals the MLE if p(θ) is 'flat'.

Absolute error loss: L(θ, θ̂) = |θ̂ − θ|. In general, if X is a random variable, then the expectation E(|X − d|) is minimized by choosing d to be the median of the distribution of X. Thus, the Bayes estimate of θ is the posterior median.

Quadratic loss: L(θ, θ̂) = (θ̂ − θ)². In general, if X is a random variable, then the expectation E[(X − d)²] is minimized by choosing d to be the mean of the distribution of X. Thus, the Bayes estimate of θ is the posterior mean.
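In practice all three estimates can be read off the posterior draws; a minimal R sketch, using a hypothetical vector of posterior samples:

draws <- rbeta(10000, 21, 81)      # hypothetical posterior draws for illustration
mean(draws)                        # quadratic loss -> posterior mean
median(draws)                      # absolute error loss -> posterior median
d <- density(draws)                # kernel density estimate of p(theta|y)
d$x[which.max(d$y)]                # zero-one loss (a -> 0) -> approximate mode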
Bayes estimators: Example 1 (cont)

We continue our example of the market share of a new drug. Using R, we can compute the posterior mean and median estimates, and various posterior intervals:

summary(acid.coda)

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

        Mean        SD  Naive SE Time-series SE
   0.1357622 0.0121584 0.0001216      0.0002253

2. Quantiles for each variable:

   2.5%    25%    50%    75%  97.5%
 0.1050 0.1294 0.1390 0.1453 0.1496
Bayes estimators: Example 1 (cont)

From Figure 1 we see that the posterior mode is 0.15.

If we use Beta(α, β), whose density is

    p(θ) = [1/B(α, β)] θ^{α−1} (1 − θ)^{β−1}, when 0 < θ < 1,

as a prior, then the posterior is

    p(θ|y) ∝ p(θ)p(y|θ) ∝ θ^{α+y−1} (1 − θ)^{β+n−y−1}.

We see immediately that the posterior distribution is Beta(α + y, β + n − y).

The posterior mean (the Bayes estimator under quadratic loss) is (α + y)/(α + β + n). The mode (the Bayes estimator under zero-one loss when a → 0) is (α + y − 1)/(α + β + n − 2), provided that the distribution is unimodal.
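A minimal R sketch of these formulas, assuming (hypothetically) a uniform Beta(1, 1) prior together with the data n = 100, y = 20:

alpha <- 1; beta <- 1; n <- 100; y <- 20
a.post <- alpha + y; b.post <- beta + n - y   # posterior is Beta(a.post, b.post)
a.post / (a.post + b.post)                    # posterior mean
(a.post - 1) / (a.post + b.post - 2)          # posterior mode
qbeta(c(0.025, 0.5, 0.975), a.post, b.post)   # median and 95% equi-tailed interval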
Bayes estimators: Example 2 (cont)

We now continue our example of estimating the average number of diseased trees. We derived that the posterior distribution is Gamma(Σ_{i=1}^n y_i + 1, n + 1/λ). Thus, the Bayes estimator with a quadratic loss function is the mean of this distribution, (Σ_{i=1}^n y_i + 1)/(n + 1/λ). However, the median of a gamma distribution does not exist in closed form.

Note that the classical estimate of θ is the sample mean ȳ.
Conjugate prior distribution

Computations can often be facilitated by using conjugate prior distributions. We say that a prior is conjugate for the likelihood if the prior and posterior distributions belong to the same family. Conjugate distributions exist for the exponential family of sampling distributions.

A conjugate prior can be formed with the following simple steps:

1. Write the likelihood function.
2. Remove the factors that do not depend on θ.
3. Replace the expressions which depend on the data with parameters. The sample size n should also be replaced.
4. Now you have the kernel of the conjugate prior. You can complement it with the normalizing constant.
5. In order to obtain the standard parametrization it may be necessary to reparametrize.
Example: Poisson likelihood

Let y = (y_1, ..., y_n) be a sample from Poi(θ). Then the likelihood is

    p(y|θ) = ∏_{i=1}^n θ^{y_i} e^{−θ}/y_i! ∝ θ^{Σ y_i} e^{−nθ}.

By replacing Σ y_i and n, which depend on the data, with the parameters α_1 and α_2, we obtain the conjugate prior

    p(θ) ∝ θ^{α_1} e^{−α_2 θ},

which is the Gamma(α_1 + 1, α_2) distribution. If we reparametrize this distribution so that α = α_1 + 1 and β = α_2, we obtain the prior Gamma(α, β).
Example: Uniform likelihood

Assume that y = (y_1, ..., y_n) is a random sample from Unif(0, θ). Then the density of a single observation y_i is

    p(y_i|θ) = 1/θ for 0 ≤ y_i ≤ θ, and 0 otherwise,

and the likelihood of θ is

    p(y|θ) = 1/θ^n for 0 ≤ y_(1) ≤ ... ≤ y_(n) ≤ θ, and 0 otherwise
           = (1/θ^n) I_{y_(n) ≤ θ}(y) I_{y_(1) ≥ 0}(y),

where I_A(y) denotes an indicator function taking the value 1 when y ∈ A and 0 otherwise.
Example: Uniform likelihood (cont)

Now, by removing the factor I_{y_(1) ≥ 0}(y), which does not depend on θ, and replacing n and y_(n) with parameters, we obtain

    p(θ) ∝ (1/θ^α) I_{θ ≥ β}(θ) = 1/θ^α when θ ≥ β, and 0 otherwise.

This is the kernel of the Pareto distribution. The posterior distribution

    p(θ|y) ∝ p(θ)p(y|θ) = 1/θ^{n+α} when θ ≥ max(β, y_(n)), and 0 otherwise,

is also a Pareto distribution.
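Because the Pareto posterior has an explicit CDF, it is easy to sample by inversion; a minimal R sketch with invented prior parameters and data:

a <- 3; b <- 1                      # hypothetical prior kernel 1/theta^a for theta >= b
y <- c(0.8, 2.1, 1.7, 2.9)          # invented observations
n <- length(y)
m <- max(b, max(y))                 # posterior lower bound max(beta, y_(n))
k <- n + a - 1                      # posterior shape: density ~ theta^-(k+1)
theta <- m * runif(10000)^(-1/k)    # inverse-CDF draws from the Pareto posterior
quantile(theta, c(0.5, 0.975))      # posterior median and an upper quantile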
Noninformative prior distribution

When there is no prior information available on the estimated parameters, noninformative priors can be used. They can also be used to find out how an informative prior affects the outcome of the inference.

The uniform distribution p(θ) ∝ 1 is often used as a noninformative prior. However, this is not fully unproblematic. If the uniform distribution is restricted to an interval, it is not, in fact, noninformative. For example, the prior Unif(0, 1) contains the information that θ is in the interval [0.2, 0.4] with probability 0.2. This information content becomes obvious when a parametric transformation is made: the distribution of the transformed parameter is no longer uniform.
Noninformative prior distribution (cont)

Another problem arises if the parameter can take values in an infinite interval. In such a case there is no proper uniform distribution. However, one can use an improper uniform prior distribution; then the posterior is proportional to the likelihood.

Some parameters, for example scale parameters and variances, can take only positive values. Such parameters are often given the improper prior p(θ) ∝ 1/θ, which implies that log(θ) has a uniform prior.

Jeffreys suggested giving a uniform prior to a transformation of θ whose Fisher information is constant. Jeffreys' prior is defined as p(θ) ∝ I(θ)^{1/2}, where I(θ) is the Fisher information of θ. That this definition is invariant to parametrization can be seen as follows:
Noninformative prior distribution (cont)

Let φ = h(θ) be a regular, monotonic transformation of θ, with inverse transformation θ = h^{-1}(φ). Then the Fisher information of φ is

    I(φ) = E[ (d log p(y|φ)/dφ)² | φ ]
         = E[ (d log p(y|θ = h^{-1}(φ))/dθ)² | φ ] (dθ/dφ)²
         = I(θ) (dθ/dφ)².

Thus, I(φ)^{1/2} = I(θ)^{1/2} |dθ/dφ|.

On the other hand, the change-of-variables rule gives p(φ) = p(θ) |dθ/dφ| ∝ I(θ)^{1/2} |dθ/dφ|, as required.
Jeffreys' prior: Examples

Binomial distribution
The Fisher information of the binomial distribution parameter θ is I(θ) = n/[θ(1 − θ)]. Thus, the Jeffreys prior is p(θ) ∝ [θ(1 − θ)]^{−1/2}, which is the Beta(1/2, 1/2) distribution.

The mean of the normal distribution
The Fisher information for the mean θ of the normal distribution is I(θ) = n/σ². This is independent of θ, so Jeffreys' prior is constant, p(θ) ∝ 1.

The variance of the normal distribution
Assume that the variance θ of the normal distribution N(µ, θ) is unknown. Then its Fisher information is I(θ) = n/(2θ²), and Jeffreys' prior is p(θ) ∝ 1/θ.
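As a check on the binomial case, the quoted Fisher information follows from a standard calculation (sketched here; not part of the original slides). For Y ∼ Bin(n, θ),

    log p(y|θ) = y log θ + (n − y) log(1 − θ) + const,
    d² log p(y|θ)/dθ² = −y/θ² − (n − y)/(1 − θ)²,

and taking −E[·|θ] with E(y|θ) = nθ gives

    I(θ) = nθ/θ² + (n − nθ)/(1 − θ)² = n/θ + n/(1 − θ) = n/[θ(1 − θ)].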
Posterior intervals

We have seen that it is possible to summarize posterior information using point estimators. However, posterior regions and intervals are usually more useful. We define that a set C is a posterior region of level 1 − α for θ if the posterior probability of θ belonging to C is 1 − α:

Pr(θ ∈ C|y) = ∫_C p(θ|y) dθ = 1 − α.

In the case of scalar parameters one can use posterior intervals (credible intervals). An equi-tailed posterior interval is defined using quantiles of the posterior. Thus, (θL, θU) is a 100(1 − α)% interval if Pr(θ < θL|y) = Pr(θ > θU|y) = α/2. An advantage of this type of interval is that it is invariant under one-to-one parameter transformations. Further, it is easy to compute.

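The quantile definition translates directly into R. A minimal sketch, with an assumed Beta(2, 9) posterior used purely for illustration:

# Equi-tailed 95% posterior interval for an assumed Beta(2, 9) posterior
alpha <- 0.05
a <- 2; b <- 9                          # assumed posterior parameters
qbeta(c(alpha/2, 1 - alpha/2), a, b)    # lower and upper posterior quantiles
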
Posterior intervals (cont)

A posterior region is said to be a highest posterior density region (HPD region) if the posterior density is larger at all points of the region than at any point outside the region. This type of region has the smallest possible volume. In the scalar case, an HPD interval has the smallest length. On the other hand, the bounds of the interval are not invariant with respect to parameter transformations, and it is not always easy to determine them.

Example. Cardiac surgery data. Table 1 shows mortality rates for cardiac surgery on babies at 12 hospitals. If one wishes to estimate the mortality rate in hospital A, denoted as θA, the simplest approach is to assume that the number of deaths y is binomially distributed with parameters n and θA, where n is the number of operations in A. Then the MLE is θ̂A = 0, which sounds too optimistic.

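Determining an HPD interval usually requires numerical search. For a unimodal posterior, one can scan all intervals carrying 95% posterior mass and keep the shortest; a minimal R sketch (our own helper, again with an assumed Beta(2, 9) posterior):

# Shortest (HPD) 95% interval for a unimodal Beta posterior, by grid search
hpd_beta <- function(a, b, level = 0.95, grid = 1e4) {
  p <- seq(0, 1 - level, length.out = grid)   # candidate lower tail probabilities
  lower <- qbeta(p, a, b)
  upper <- qbeta(p + level, a, b)
  i <- which.min(upper - lower)               # the shortest interval is the HPD one
  c(lower[i], upper[i])
}
hpd_beta(2, 9)   # assumed example posterior
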
Posterior intervals (cont)

If we give a uniform prior for θA, then the posterior distribution is Beta(1,48), with posterior mean 1/49. The 95% HPD interval is (0, 6.05)% and the equi-tailed interval (0.05, 7.30)%. Figure 2 shows the posterior density. Another approach would use the total numbers of deaths and operations in all hospitals.

Table 1: Mortality rates y/n from cardiac surgery in 12 hospitals (Spiegelhalter et al., BUGS 0.5 Examples Volume 1, Cambridge: MRC Biostatistics Unit, 1996). The numbers of deaths y out of n operations.

A 0/47     B 18/148    C 8/119     D 46/810
E 8/211    F 13/196    G 9/148     H 31/215
I 14/207   J 8/97      K 29/256    L 24/360

Posterior intervals (cont)

[Plot of the posterior density p(θ|y) of θA against θ ∈ (0.00, 0.20); the density is highest at θ = 0 and declines monotonically.]

Figure 2: Posterior density of θA when the prior is uniform. The 95% HPD interval is indicated with vertical lines and the 95% equi-tailed interval with red colour.

Posterior intervals (cont)

The following BUGS and R codes can be used to compute the equi-tailed and HPD intervals:

# BUGS model, saved in the file "Hospital.txt"
model{
  theta ~ dbeta(1,1)
  y ~ dbin(theta,n)
}

# R session (requires the rjags package, which also loads coda)
library(rjags)
hospital <- list(n=47, y=0)
hospital.jag <- jags.model("Hospital.txt", hospital)
hospital.coda <- coda.samples(hospital.jag, "theta", 10000)
summary(hospital.coda)
HPDinterval(hospital.coda)

# Compare with exact upper limit of HPD interval:
qbeta(0.95, 1, 48)
[1] 0.06050341

Posterior predictive distribution

If we wish to predict a new observation ỹ on the basis of the sample y = (y1, ..., yn), we may use its posterior predictive distribution. This is defined to be the conditional distribution of ỹ given y:

p(ỹ|y) = ∫ p(ỹ, θ|y) dθ = ∫ p(ỹ|y, θ) p(θ|y) dθ,

where p(ỹ|y, θ) is the density of the predictive distribution.

It is easy to simulate the posterior predictive distribution. First, draw simulations θ1, ..., θL from the posterior p(θ|y); then, for each i, draw ỹi from the predictive distribution p(ỹ|y, θi).

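In R this two-step recipe takes only a few lines. The Beta(4.5, 6.5) posterior below is an assumed example (it anticipates the coin example on the next slide, where α = β = 0.5, n = 10 and y = 4):

# Simulate the posterior predictive; here theta|y ~ Beta(4.5, 6.5) (assumed)
L <- 10000
theta <- rbeta(L, 4.5, 6.5)        # step 1: draws from the posterior
ynew  <- rbinom(L, 1, theta)       # step 2: one predictive draw per theta
mean(ynew)                         # estimates Pr(ynew = 1 | y)
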
Posterior predictive distribution: Example

Assume that we have a coin with unknown probability θ of a head. If there occur y heads among the first n tosses, what is the probability of a head on the next throw?

Let ỹ = 1 (ỹ = 0) indicate the event that the next throw is a head (tail). If the prior of θ is Beta(α, β), then

p(ỹ|y) = ∫_0^1 p(ỹ|y, θ) p(θ|y) dθ
       = ∫_0^1 θ^ỹ (1 − θ)^(1−ỹ) · θ^(α+y−1) (1 − θ)^(β+n−y−1) / B(α + y, β + n − y) dθ
       = B(α + y + ỹ, β + n − y − ỹ + 1) / B(α + y, β + n − y)
       = (α + y)^ỹ (β + n − y)^(1−ỹ) / (α + β + n).

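The closed form can be checked numerically in R; the values α = β = 0.5, n = 10, y = 4 and ỹ = 1 below are an assumed example:

# Numeric check of the predictive probability
a <- 0.5; b <- 0.5; n <- 10; y <- 4
f <- function(th) th * dbeta(th, a + y, b + n - y)   # integrand for ynew = 1
integrate(f, 0, 1)$value                             # integral form
(a + y) / (a + b + n)                                # closed form: both ~0.409
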
Posterior predictive distribution: Example (cont)

Thus, Pr(ỹ = 1|y) = (α + y)/(α + β + n). This tends to the sample proportion y/n as n → ∞, so that the role of the prior information vanishes. If n = 10 and y = 4, and the prior parameters are α = β = 0.5 (Jeffreys’ prior), the posterior predictive distribution can be simulated with BUGS as follows:

# BUGS model, saved in the file "Coin.txt"
model{
  theta ~ dbeta(alpha,beta)
  y ~ dbin(theta,n)
  ynew ~ dbern(theta)
}

# R session
coin <- list(n=10, y=4, alpha=0.5, beta=0.5)
coin.jag <- jags.model("Coin.txt", coin)
coin.coda <- coda.samples(coin.jag, c("theta","ynew"), 10000)
summary(coin.coda)

Single-parameter models

Normal distribution with known variance

Next we will consider some simple single-parameter models. Let us first assume that y = (y1, ..., yn) is a sample from a normal distribution with unknown mean θ and known variance σ². The likelihood is then

p(y|θ) = ∏_{i=1}^{n} (1/√(2πσ²)) exp(−(yi − θ)²/(2σ²))
       ∝ exp(−Σ_{i=1}^{n} (yi − θ)²/(2σ²))
       ∝ exp(−n(θ − ȳ)²/(2σ²)).

By replacing σ²/n with τ0², and ȳ with µ0, we find a conjugate prior

p(θ) ∝ exp(−(θ − µ0)²/(2τ0²)),

which is N(µ0, τ0²).

Normal distribution with known variance (cont)

With this prior the posterior becomes

p(θ|y) ∝ p(θ) p(y|θ)
       ∝ exp(−(θ − µ0)²/(2τ0²)) exp(−n(θ − ȳ)²/(2σ²))
       ∝ exp{ −(1/2) (1/τ0² + n/σ²) [ θ² − 2 ((µ0/τ0² + nȳ/σ²)/(1/τ0² + n/σ²)) θ ] }
       ∝ exp(−(θ − µn)²/(2τn²)),

where

µn = (µ0/τ0² + nȳ/σ²) / (1/τ0² + n/σ²)  and  τn² = (1/τ0² + n/σ²)^(−1).

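A small R helper (our own naming, not from the course code) that evaluates these updating formulas; the numbers in the call are assumed for illustration:

# Posterior mean and variance for the normal model with known variance
post_normal <- function(mu0, tau0sq, ybar, sigmasq, n) {
  prec <- 1/tau0sq + n/sigmasq                  # posterior precision
  mun  <- (mu0/tau0sq + n*ybar/sigmasq) / prec  # precision-weighted average
  c(mean = mun, var = 1/prec)
}
post_normal(mu0 = 0, tau0sq = 100, ybar = 1.2, sigmasq = 4, n = 25)
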


Normal distribution with known variance (cont)

Thus, the posterior distribution is N(µn, τn²).

The inverse of variance is called precision. We see that

posterior precision = prior precision + data precision,

where the prior precision is 1/τ0² and the data precision n/σ² (the inverse of the variance of the sample mean).

The posterior mean is a weighted average of the prior mean µ0 and the sample mean ȳ, where the weights are the corresponding precisions. When n → ∞ (or when τ0² → ∞), the role of the prior information vanishes. Thus, for large values of n, approximately θ|y ∼ N(ȳ, σ²/n).

Normal distribution with known variance (cont)

Next, we determine the posterior predictive distribution of a new observation ỹ. The joint posterior distribution of θ and ỹ is

p(θ, ỹ|y) = p(θ|y) p(ỹ|y, θ)
          ∝ exp(−(θ − µn)²/(2τn²) − (ỹ − θ)²/(2σ²)).

Since the exponent is a quadratic function of θ and ỹ, their joint distribution is bivariate normal. Consequently, the marginal distribution p(ỹ|y) is univariate normal, and it is sufficient to determine its mean and variance.

Normal distribution with known variance (cont)

Using the rules of iterated mean and variance, we obtain

E(ỹ|y) = E[E(ỹ|y, θ)|y] = E[θ|y] = µn,

and

Var(ỹ|y) = E[Var(ỹ|y, θ)|y] + Var[E(ỹ|y, θ)|y]
         = E[σ²|y] + Var[θ|y]
         = σ² + τn².

Thus, the posterior predictive distribution is

p(ỹ|y) = N(ỹ|µn, σ² + τn²).

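A quick simulation check of this result, reusing the post_normal helper sketched above with the same illustrative numbers:

# Simulate the posterior predictive and compare with the closed form
pp <- post_normal(mu0 = 0, tau0sq = 100, ybar = 1.2, sigmasq = 4, n = 25)
L <- 1e5
theta  <- rnorm(L, pp["mean"], sqrt(pp["var"]))  # theta | y
ytilde <- rnorm(L, theta, sqrt(4))               # ytilde | y, theta
c(sd(ytilde), sqrt(4 + pp["var"]))               # should agree closely
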


Poisson distribution

The Poisson distribution is often used to model rare incidents, such as traffic accidents or rare diseases. For a vector y = (y1, ..., yn) of iid observations, the likelihood is

p(y|θ) = ∏_{i=1}^{n} (θ^(yi)/yi!) e^(−θ) ∝ θ^(Σyi) e^(−nθ).

Given that the prior distribution is Gamma(α, β), the posterior

p(θ|y) ∝ p(θ) p(y|θ)
       ∝ θ^(α−1) e^(−βθ) θ^(Σyi) e^(−nθ)
       ∝ θ^(α+Σyi−1) e^(−(β+n)θ)

is Gamma(α + Σyi, β + n).

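The conjugate update is a one-liner in R; the counts and the vague Gamma(0.01, 0.01) prior below are assumed for illustration:

# Gamma-Poisson conjugate update with assumed data and a vague prior
y <- c(2, 0, 3, 1, 1)                          # assumed counts
alpha <- 0.01; beta <- 0.01                    # assumed vague prior
a_post <- alpha + sum(y); b_post <- beta + length(y)
qgamma(c(0.025, 0.5, 0.975), a_post, b_post)   # posterior quantiles of theta
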


Poisson distribution (cont)

The negative binomial distribution. When the prior and posterior distributions can be written in closed form, the marginal likelihood p(y) can be computed using the formula

p(y) = p(y|θ) p(θ) / p(θ|y).

For example, if y is a single observation from Poi(θ), then

p(y) = [ (θ^y/y!) e^(−θ) · (β^α/Γ(α)) θ^(α−1) e^(−βθ) ] / [ ((β+1)^(α+y)/Γ(α+y)) θ^(α+y−1) e^(−(β+1)θ) ]
     = C(α + y − 1, y) (β/(β+1))^α (1/(β+1))^y,

which is Neg-Bin(α, β), the negative binomial distribution.

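This identity can be verified in R, where dnbinom with size = α and prob = β/(β + 1) gives exactly this density; α = 2 and β = 3 below are assumed values:

# Check the Poisson-Gamma marginal against dnbinom
alpha <- 2; beta <- 3; y <- 0:5
closed <- choose(alpha + y - 1, y) * (beta/(beta+1))^alpha * (1/(beta+1))^y
cbind(closed, dnbinom(y, size = alpha, prob = beta/(beta+1)))  # columns match
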


Poisson distribution (cont)

On the other hand,

p(y) = ∫ p(y|θ) p(θ) dθ = ∫ Poi(y|θ) Gamma(θ|α, β) dθ,

implying that the negative binomial distribution is a compound distribution where the Poisson distribution is compounded using the Gamma distribution as a weight distribution.

In many applications, the data are distributed as

yi ∼ Poi(xi θ),

where the xi are known values of an explanatory variable. In epidemiology, xi is called the exposure of the ith unit. With prior distribution Gamma(α, β), the posterior becomes Gamma(α + Σyi, β + Σxi).

Poisson distribution: Example

Year   Fatal accidents   Passenger deaths   Death rate
1976        24                 734             0.19
1977        25                 516             0.12
1978        31                 754             0.15
1979        31                 877             0.16
1980        22                 814             0.14
1981        21                 362             0.06
1982        26                 764             0.13
1983        20                 809             0.13
1984        16                 223             0.03
1985        22                1066             0.15

Table 2: Worldwide airline fatalities 1976-85. Death rate is passenger deaths per 100 million passenger miles. Source: Statistical Abstract of the United States.

Poisson distribution: Example (cont)

In Table 2, the death rate is di = yi/xi, where yi is the number of passenger deaths and xi the ’exposure’ given in 100 million passenger miles. Thus xi = yi/di. Assuming the model yi ∼ Poi(θxi), the rate θ can be estimated using BUGS as follows:

model{
  theta ~ dgamma(alpha,beta)
  for(i in 1:n){
    y[i] ~ dpois(theta*x[i])
  }
}

air <- list(n=10, y=deaths, x=deaths/rate, alpha=0.01, beta=0.01)
...
  2.5%    25%    50%    75%  97.5%
0.1182 0.1201 0.1210 0.1220 0.1239

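Since the model is conjugate, the simulated summary can be cross-checked exactly; the deaths and rate vectors below are transcribed from Table 2:

# Exact conjugate posterior for the airline data: theta | y ~ Gamma
deaths <- c(734, 516, 754, 877, 814, 362, 764, 809, 223, 1066)
rate   <- c(0.19, 0.12, 0.15, 0.16, 0.14, 0.06, 0.13, 0.13, 0.03, 0.15)
x <- deaths / rate                       # exposure in 1e8 passenger miles
qgamma(c(0.025, 0.25, 0.5, 0.75, 0.975),
       shape = 0.01 + sum(deaths), rate = 0.01 + sum(x))
# the quantiles agree with the simulated summary above
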


Exponential distribution

In a Poisson process having intensity θ, the number of events in a time interval of length τ follows the Poisson distribution with parameter τθ. Further, the waiting time between two Poisson events follows the exponential distribution Exp(θ), and the waiting time until the nth event is Gamma(n, θ).

The exponential distribution can also be used to model the lifetimes of objects that do not wear out, since in this model the expected remaining lifetime is independent of the time the object has already survived. If Y ∼ Exp(θ), then

Pr(Y ≤ y|Y > y0) = Pr(y0 < Y ≤ y)/Pr(Y > y0) = [Pr(Y ≤ y) − Pr(Y ≤ y0)]/Pr(Y > y0)
                 = [(1 − e^(−θy)) − (1 − e^(−θy0))]/e^(−θy0) = 1 − e^(−θ(y−y0)),

which is the exponential distribution function starting at y0.

Exponential distribution (cont)

Bayesian analysis. Let y = (y1, ..., yn) be a random sample from Exp(θ) and let Gamma(α, β) be the prior. Then the posterior is

p(θ|y) ∝ p(θ) p(y|θ) ∝ θ^(α−1) e^(−βθ) ∏_{i=1}^{n} θ e^(−θyi)
       ∝ θ^(α+n−1) e^(−θ(β+Σyi)),

which is the Gamma(α + n, β + Σyi) distribution.

Censored observations. Assume that the observations y1, ..., ym are known only to be larger than U, while the exact values of ym+1, ..., yn are known. Then the values y1, ..., ym are called right-censored. On the other hand, if some observations are known to be less than or equal to some threshold L, they are called left-censored.

Exponential distribution (cont)

In the exponential case of right-censoring, the likelihood is

p(y|θ) = ∏_{i=1}^{m} Pr(Yi > U|θ) ∏_{i=m+1}^{n} p(yi|θ)
       = ∏_{i=1}^{m} e^(−θU) ∏_{i=m+1}^{n} θ e^(−θyi) = θ^(n−m) e^(−θ(mU + Σyi)).

Thus, with prior Gamma(α, β), the posterior is Gamma(α + n − m, β + mU + Σyi).

In the case of left-censoring, the likelihood is

p(y|θ) = (1 − e^(−θL))^m θ^(n−m) e^(−θΣyi),

so that the posterior distribution is nonstandard.

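Under right-censoring the posterior stays in the Gamma family, so it can be tabulated directly in R; the data below (m = 3 lifetimes censored at U = 5) are assumed for illustration:

# Gamma posterior with right-censored exponential data (assumed numbers)
U <- 5; y_obs <- c(1.2, 0.7, 3.1, 2.2)   # exact lifetimes (n - m of them)
m <- 3; n <- m + length(y_obs)           # m lifetimes only known to exceed U
alpha <- 0.01; beta <- 0.01              # vague Gamma prior
a_post <- alpha + n - m
b_post <- beta + m*U + sum(y_obs)
qgamma(c(0.025, 0.5, 0.975), a_post, b_post)   # posterior quantiles of theta
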


Exponential distribution: Example

Let us assume that the lifetime of an electronic component is exponentially distributed. After 2 years it is observed that 3 out of 10 components have broken, and the lifetimes of the remaining components are 2.7, 3.7, 4.0, 4.7, 5.9, 6.6, 12.1.

The JAGS code (in this case different from OpenBUGS or WinBUGS) and the related R code:

model{
  theta ~ dgamma(alpha,beta)
  for(i in 1:n){
    x[i] ~ dinterval(y[i],L)
    y[i] ~ dexp(theta)
  }
}

comp <- list(n=10, L=2, y=c(NA,NA,NA,2.7,3.7,4.0,4.7,5.9,6.6,12.1),
             x=c(0,0,0,1,1,1,1,1,1,1), alpha=0.01, beta=0.01)

Hypothesis testing

Bayesian hypothesis testing

The frequentist approach to hypothesis testing would compare a null hypothesis H0 with an alternative H1 through a test statistic T, which typically obtains a larger value when H1 is true than when H0 is true. The null hypothesis is rejected at level α if the observed value of the test statistic, tobs, is larger than the critical value tC, where Pr(T > tC|H0) = α. The so-called p-value, p = Pr(T ≥ tobs|H0), is a related concept.

In frequentist statistics, we do not assign probabilities to hypotheses. In particular, the p-value cannot be interpreted as p(H0). On the contrary, in the Bayesian approach, we may assign the prior probabilities p(H0) and p(H1), and, using Bayes’ theorem, compute the posterior probabilities

p(Hi|y) = p(Hi) p(y|Hi) / [p(H0) p(y|H0) + p(H1) p(y|H1)],  i = 0, 1.

Bayesian hypothesis testing (cont)

In the frequentist approach it is not absolutely necessary to specify an alternative hypothesis. Further, if an alternative is specified, the p-value is independent of it. In the Bayesian approach, both hypotheses must be fully specified.

One usually computes the posterior odds

p(H1|y) / p(H0|y) = [p(y|H1)/p(y|H0)] × [p(H1)/p(H0)],

which depends on the data y only through the Bayes factor B10 = p(y|H1)/p(y|H0).

In the case that a hypothesis is composite (not simple), the unknown parameters should first be integrated out:

p(y|Hi) = ∫ p(y|θi, Hi) p(θi|Hi) dθi,  i = 0, 1.

Bayesian hypothesis testing (cont)

Table 3: Interpretation of Bayes factor B10 in favor of H1 over H0. From Robert E. Kass and Adrian E. Raftery (1995). ”Bayes Factors”. JASA 90 (430): 791.

B10       2 log B10   Evidence against H0
1-3       0-2         Hardly worth a mention
3-20      2-6         Positive
20-150    6-10        Strong
>150      >10         Very strong

Rough interpretations for B10, and, equivalently, for 2 log B10, are provided in Table 3. The quantity 2 log B10 corresponds to the likelihood ratio statistic in likelihood inference.

Bayesian hypothesis testing: Example

Table 4: The log Bayes factors 2 log Bτ0 for HUS data.

                1970  1971  1972  1973  1974  1975  1976
y                  1     5     3     2     2     1     0
α=β=1            4.9  -0.5   0.6   3.9   7.5    13    24
α=β=0.01        -1.3  -5.9  -4.5  -1.0   3.0   9.7    20
α=β=0.0001       -10   -15   -14   -10  -6.1   0.6    11

                1977  1978  1979  1980  1981  1982  1983
y                  0     2     1     1     7    11     4
α=β=1             35    41    51    63    55    38    42
α=β=0.01          32    39    51    64    57    40    47
α=β=0.0001        23    30    42    55    48    31    38

                1984  1985  1986  1987  1988  1989
y                  7    10    16    16     9    15
α=β=1             40    31    11  -2.9  -5.3     0
α=β=0.01          46    38    18   1.8   1.2     0
α=β=0.0001        37    29   8.8  -7.1  -7.7     0

Bayesian hypothesis testing: Example

Table 4 shows the numbers of cases of haemolytic uraemic syndrome (HUS) treated at a clinic in Birmingham from 1970 to 1989. There seems to be a rise in 1981. We assume that the annual counts y1, ..., yn are independent and Poisson-distributed with means E(Yj) = λ1 for j = 1, ..., τ, and E(Yj) = λ2 for j = τ + 1, ..., n. The changepoint τ can take values 1, ..., n − 1.

Our baseline model H0 is that there is no change, λ1 = λ2 = λ, and the alternative Hτ that there is a change after τ years. Under Hτ we assume that λ1 and λ2 have independent priors with parameters α and β. Then p(y|Hτ) equals

∫_0^∞ [∏_{j=1}^{τ} λ1^(yj) e^(−λ1)/yj!] (β^α λ1^(α−1) e^(−βλ1)/Γ(α)) dλ1
  × ∫_0^∞ [∏_{j=τ+1}^{n} λ2^(yj) e^(−λ2)/yj!] (β^α λ2^(α−1) e^(−βλ2)/Γ(α)) dλ2,

Bayesian hypothesis testing: Example

which can be simplified as

(β^(2α) / [Γ(α)² ∏_{j=1}^{n} yj!]) × Γ(α + sτ) Γ(α + sn − sτ) / [(β + τ)^(α+sτ) (β + n − τ)^(α+sn−sτ)],

where sτ = y1 + ... + yτ and sn = y1 + ... + yn.

Under H0 we also assume that λ ∼ Gamma(α, β). Then the Bayes factor for a changepoint in year τ is

Bτ0 = Γ(α + sτ) Γ(α + sn − sτ) β^α (β + n)^(α+sn) / [Γ(α) Γ(α + sn) (β + τ)^(α+sτ) (β + n − τ)^(α+sn−sτ)],  τ = 1, ..., n − 1.

From Table 4 we see that there is very strong evidence for a change in 1976-1985 for all priors.

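On the log scale the Bayes factor is easy to evaluate with lgamma; the following sketch (our own code) should reproduce the 1970-1988 entries of Table 4 (1989, i.e. τ = n, corresponds to H0 and gives 0):

# 2 * log Bayes factor for a changepoint after year tau (HUS data)
hus <- c(1,5,3,2,2,1,0,0,2,1,1,7,11,4,7,10,16,16,9,15)
logB <- function(tau, y, a, b) {
  n <- length(y); st <- sum(y[1:tau]); sn <- sum(y)
  lgamma(a + st) + lgamma(a + sn - st) + a*log(b) + (a + sn)*log(b + n) -
    lgamma(a) - lgamma(a + sn) - (a + st)*log(b + tau) -
    (a + sn - st)*log(b + n - tau)
}
round(2 * sapply(1:19, logB, y = hus, a = 1, b = 1), 1)  # first row of Table 4
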


Computing marginal likelihoods

Bayes factors can be presented in closed form only in simple conjugate situations, but various simulation-based methods have been suggested. One simple example is the harmonic mean method, which is based on the result

(1/T) Σ_{t=1}^{T} 1/p(y|θ^(t)) → 1/p(y) in probability, as T → ∞,

where θ^(t), t = 1, ..., T, are independent simulations from p(θ|y). The result follows from the law of large numbers.

This estimator is somewhat unstable, since occasional values of θ^(t) with small likelihood have a large effect on it. Therefore, several modifications of the method have been developed. More advanced methods, such as path sampling, are effective, but usually require problem-specific tuning.

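A minimal sketch of the estimator in a beta-binomial setting, where p(y) is known in closed form so the harmonic mean estimate can be compared with the truth (all values assumed):

# Harmonic mean estimate of the marginal likelihood, beta-binomial example
set.seed(1)
a <- 1; b <- 1; n <- 10; y <- 4                 # assumed prior and data
theta <- rbeta(1e5, a + y, b + n - y)           # draws from the posterior
hm <- 1 / mean(1 / dbinom(y, n, theta))         # harmonic mean estimator
exact <- choose(n, y) * beta(a + y, b + n - y) / beta(a, b)
c(harmonic_mean = hm, exact = exact)            # close, but hm is noisy
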


Model choice: Example (cont)

Another approach is to consider the model choice as a discrete parameter. This is generally a more reliable method to obtain posterior model probabilities with BUGS (BUGS book, 2013).

In the following, we present the code used to estimate the model probabilities in the HUS example. We give an equal prior probability, 1/n, to each of the models Hτ, τ = 1, ..., n. Here, Hn corresponds to H0.

Figure 3 shows the posterior model probabilities. The values 11 and 12 are the most probable change points: Pr(τ = 11|y) ≈ 0.97 and Pr(τ = 12|y) ≈ 0.03.

Model choice: Example (cont)

model{
  for(i in 1:n){
    q[i] <- 1/n
  }
  tau ~ dcat(q[])
  for(i in 1:2){
    lambda[i] ~ dgamma(alpha,beta)
  }
  for(i in 1:n){
    mu[i] <- lambda[1]+
      step(i-tau-0.1)*(lambda[2]-lambda[1])
    y[i] ~ dpois(mu[i])
  }
}

HUS <- list(n=20, y=c(1,5,3,2,2,1,0,0,2,1,1,7,11,4,7,10,16,16,9,15),
            alpha=0.01, beta=0.01)

Model choice: Example (cont)

[Bar plot of posterior proportion (0.0-1.0) against the changepoint value; nearly all of the mass is on τ = 11, with a little on τ = 12.]

Figure 3: Posterior model probabilities in the HUS example.

Simple multiparameter models

Normal distribution with unknown mean and variance

Next we consider simple models having more than one parameter. Let us assume that y = (y1, ..., yn) is a random sample from N(µ, σ²) where both µ and σ² are unknown. If the joint prior is p(µ, σ²) ∝ 1/σ², or equivalently p(µ, log(σ²)) ∝ 1, the posterior is

p(µ, σ²|y) ∝ (1/σ²) × (σ²)^(−n/2) exp(−Σ_{i=1}^{n} (yi − µ)²/(2σ²))
           = (σ²)^(−(n/2+1)) exp(−[Σ_{i=1}^{n} (yi − ȳ)² + n(ȳ − µ)²]/(2σ²))
           = (σ²)^(−(n/2+1)) exp(−[(n − 1)s² + n(ȳ − µ)²]/(2σ²)),

where s² = (1/(n−1)) Σ_{i=1}^{n} (yi − ȳ)² is the sample variance.

Normal distribution (cont)

The marginal posterior of σ² is obtained by integrating µ out:

p(σ²|y) ∝ ∫_{−∞}^{∞} (σ²)^(−(n/2+1)) exp(−[(n − 1)s² + n(ȳ − µ)²]/(2σ²)) dµ.

The integral of the factor exp(−n(ȳ − µ)²/(2σ²)) is a simple normal integral, so

p(σ²|y) ∝ (σ²)^(−(n/2+1)) exp(−(n − 1)s²/(2σ²)) √(2πσ²/n)
        ∝ (σ²)^(−(n+1)/2) exp(−(n − 1)s²/(2σ²)).

This is a scaled inverse-χ²-density:

σ²|y ∼ Inv-χ²(n − 1, s²).

Normal distribution (cont)

Thus, {(n − 1)s²/σ² | y} ∼ χ²_{n−1}. This is analogous with the corresponding sampling theory result. However, in sampling theory, s² is considered random, while here σ² is random.

By making the substitution

z = A/(2σ²), where A = (n − 1)s² + n(ȳ − µ)²,

we obtain the marginal density of µ:

p(µ|y) ∝ ∫_0^∞ (σ²)^(−(n/2+1)) exp(−[(n − 1)s² + n(ȳ − µ)²]/(2σ²)) dσ²
       ∝ A^(−n/2) ∫_0^∞ z^(n/2−1) exp(−z) dz
       ∝ [1 + n(µ − ȳ)²/((n − 1)s²)]^(−n/2).

Normal distribution: Speed of light (example)

This is the t_{n−1}(ȳ, s²/n) density. Thus, {(µ − ȳ)/(s/√n) | y} ∼ t_{n−1}. This is again analogous to the sampling theory result. It can also be shown (exercise) that the density of a new observation ỹ is t_{n−1}(ȳ, s²(1 + 1/n)). The posterior can be simulated using p(σ²|y) and p(µ|σ², y) = N(µ|ȳ, σ²/n).

Example. Estimating the speed of light. Simon Newcomb made an experiment in 1882 to measure the speed of light. He measured the time light travels 7442 meters. Figure 4 shows that there are two outliers, so the normal distribution as such is not a very good model. However, for the sake of illustration, we assume that the observations are independent and from N(µ, σ²). With the noninformative prior p(µ, σ²) ∝ 1/σ², the 95% posterior interval is (ȳ ± t_{(n−1);0.025} s/√n) = (23.6, 28.9), where n = 66, ȳ = 26.2 and s = 10.8. Further, the prediction interval is (ȳ ± t_{(n−1);0.025} s√(1 + 1/n)) = (4.6, 47.8).

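The two-step simulation is a few lines of R; with the summary statistics quoted above it reproduces the posterior interval (a sketch using the fact that (n − 1)s²/σ² | y ∼ χ²_{n−1}):

# Simulate p(sigma^2 | y) and p(mu | sigma^2, y) for the speed-of-light data
n <- 66; ybar <- 26.2; s <- 10.8
L <- 1e5
sigma2 <- (n - 1) * s^2 / rchisq(L, n - 1)    # scaled inverse-chi^2 draws
mu     <- rnorm(L, ybar, sqrt(sigma2 / n))    # mu | sigma^2, y
quantile(mu, c(0.025, 0.975))                 # close to (23.6, 28.9)
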


Example: Speed of light (cont)

[Histogram of Newcomb’s measurements: frequency against measured value, range about −40 to 40, with two clear low outliers.]

Figure 4: Newcomb’s measurements for speed of light.

Multinomial distribution

Basic concepts
If y = (y1 , ...yk ) is multinomially distributed with parameters n
Single-parameter
models and θ = (θ1 , ..., θk ) (denoted as Multin(n; θ)) then the likelihood
Hypothesis
testing
is
Simple p(y|θ) ∝ θ1^{y1} θ2^{y2} · · · θk^{yk}
multiparameter
models
Normal where θi ≥ 0 for all i = 1, ..., k and ∑_{i=1}^{k} θi = 1.
distribution
Example
Multinomial
It is easy to see that the conjugate prior is the Dirichlet
distribution distribution (denoted as Dirichlet(α1 , ...αk )):
Example

Markov chains
p(θ) ∝ θ1^{α1−1} θ2^{α2−1} · · · θk^{αk−1},
MCMC methods
Model checking
and comparison where θi ≥ 0 and αi > 0 for all i = 1, ..., k, and ∑_{i=1}^{k} θi = 1.
Hierarchical and
regression
models
The posterior distribution is Dirichlet(α1 + y1 , ...αk + yk ):
Categorical data
p(θ|y) ∝ θ1^{α1+y1−1} θ2^{α2+y2−1} · · · θk^{αk+yk−1}.
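Since a Dirichlet vector can be generated from independent Gamma variates, the posterior is easy to simulate in R; a minimal sketch, where rdirichlet is a small helper written here for illustration (several packages provide an equivalent function):

rdirichlet <- function(nsim, alpha) {
  k <- length(alpha)
  g <- matrix(rgamma(nsim * k, shape = alpha), nrow = nsim, ncol = k, byrow = TRUE)
  g / rowSums(g)                  # each row, normalized to sum to 1, is one draw
}
theta <- rdirichlet(10000, alpha + y)  # draws from Dirichlet(alpha1 + y1, ..., alphak + yk)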

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 73 / 130


Multinomial distribution: Presidential elections
(example)

Basic concepts
In January 2006, Taloustutkimus (Economic Survey in Finland)
Single-parameter
models interviewed 1582 adults about their preferences in the
Hypothesis
testing
forthcoming presidential election. Out of those who expressed
Simple their opinion, 52% supported Halonen, 20% Niinistö, 18%
multiparameter
models Vanhanen, and 10% other candidates. The proportion of
Normal
distribution uncertain respondents was 29%.
Example
Multinomial If we assume simple random sampling (which is not exactly
distribution
Example true), the numbers of the supporters in the sample follow a
Markov chains multinomial distribution where n ≈ 0.71 · 1582 ≈ 1123, and
MCMC methods θ1 , θ2 , θ3 , θ4 are the true proportions of the supporters of
Model checking
and comparison Halonen, Niinistö, Vanhanen, and other candidates, in the
Hierarchical and population of those expressing their opinion. With a uniform
regression
models prior, the posterior is Dirichlet(0.52 · 1123 + 1, 0.20 · 1123 + 1,
Categorical data 0.18 · 1123 + 1, 0.1 · 1123 + 1).

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 74 / 130


Example: Presidential elections (cont)

Basic concepts
There were two interesting questions: 1) Will Halonen have
Single-parameter
models more than 50% of the votes in the first round? 2) Will Niinistö
Hypothesis
testing
beat Vanhanen? By posterior simulation we find that
Simple Pr(θ1 > 0.5|y) = 0.90 and Pr(θ2 − θ3 > 0|y) = 0.86. Further,
multiparameter
models the 95% posterior interval for Halonen’s support is (49,55)%.
Normal
distribution Below are the related JAGS code and the data, given in R:
Example
Multinomial
distribution
Example model{
Markov chains y ~ dmulti(theta,n)           # counts of supporters
MCMC methods theta ~ ddirch(alpha)          # Dirichlet prior on the proportions
Model checking
and comparison
p1 <- step(theta[1]-0.5)                    # indicator of theta[1] > 0.5
Hierarchical and
p2 <- step(theta[2]-theta[3])               # indicator of theta[2] > theta[3]
regression }
models

Categorical data
el <- list(n=1123,y=round(c(0.52,0.2,0.18,0.1)*1123),
alpha=c(1,1,1,1))
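A hedged sketch of how the model could be run from R with the rjags package; the model file name "elections.txt" is an assumption:

library(rjags)
el.jag  <- jags.model("elections.txt", data = el)              # compile model with data el
el.coda <- coda.samples(el.jag, c("theta", "p1", "p2"), 10000) # 10000 posterior draws
summary(el.coda)  # posterior means of p1 and p2 estimate Pr(theta1 > 0.5 | y) and
                  # Pr(theta2 > theta3 | y); theta[1] quantiles give the 95% interval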

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 75 / 130


Basic concepts
Single-parameter
models
Hypothesis
testing
Simple
multiparameter
models

Markov chains
Estimation
Markov chains
Example

MCMC methods
Model checking
and comparison
Hierarchical and
regression
models

Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 76 / 130


Estimation of Markov Chains

Basic concepts
Assume that we have observations y0 , ..., yT from a time
Single-parameter
models homogenous Markov chain measured at time points
Hypothesis
testing
t = 0, 1, 2, ..., T . Then the likelihood can be written as
Simple
multiparameter
models
Markov chains
Estimation
Example
MCMC methods
Model checking
and comparison
Hierarchical and
regression
models
Categorical data

Pr(Y0 = y0, ..., YT = yT)
  = Pr(Y0 = y0) ∏_{t=1}^{T} Pr(Yt = yt | Yt−1 = yt−1)
  = Pr(Y0 = y0) ∏_{t=1}^{T} p_{yt−1, yt}
  = Pr(Y0 = y0) ∏_{r=1}^{S} ∏_{s=1}^{S} p_{rs}^{n_{rs}},

where n_{rs} denotes the number of transitions from state r to state s.

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 77 / 130


Estimation of Markov Chains (cont)

Basic concepts
If we ignore the information of the first observation, Y0, we
Single-parameter
models can write the log-likelihood as
Hypothesis
testing
Simple
multiparameter
models
l(p) = ∑_{r=1}^{S} ∑_{s=1}^{S} n_rs log(p_rs),    (4)
Markov chains
Estimation
Example
and the S × S matrix of transition counts nrs is a sufficient
MCMC methods
statistic. Conditioning on the row sums nr. , the numbers of
Model checking transitions starting from state r are multinomially distributed,
and comparison
(nr1 , ..., nrS ) ∼ Multin(nr. ; (pr1 , ..., prS )) for all r = 1, ..., S.
Hierarchical and
regression
models Further, the rows of this matrix are independent. From results
Categorical data concerning the multinomial distribution it follows that the ML
estimate is p̂rs = nrs /nr. , for s = 1, ..., S and r = 1, ..., S.
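In R, with n the S × S matrix of transition counts, the ML estimate is a one-line row normalization (a sketch, not from the slides):

p.hat <- prop.table(n, margin = 1)   # p.hat[r, s] = n_rs / n_r.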

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 78 / 130


Estimation of Markov Chains (cont)

Basic concepts
In a simpler model where the states Yt are independent, prs
Single-parameter
models can be replaced with ps in equation (4). The ML estimates are
Hypothesis
testing
now p̂s = n.s /n.. where n.s is the sth column sum and n.. the
Simple number of all transitions.
multiparameter
models The likelihood ratio statistic for testing the independence
Markov chains
Estimation
hypothesis is given by
Example
MCMC methods
Model checking
and comparison
W = 2 ∑_{r,s} n_rs log( p̂_rs / p̂_s ) = 2 ∑_{r,s} n_rs log( n_rs n_·· / (n_r· n_·s) ).
Hierarchical and
regression
models
Under independence, there are S − 1 free parameters, while in
Categorical data the general case, S(S − 1) parameters. Thus, under
independence, the test statistic is approximately χ2 -distributed
with S(S − 1) − (S − 1) = (S − 1)2 degrees of freedom. W
approximately equals the Pearson statistic for independence.

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 79 / 130


Estimation of Markov Chains: Example

Basic concepts
Single-parameter
models Table 5: Observed frequencies of one-step transitions in a DNA
Hypothesis
testing chain
Simple
multiparameter
Observed frequency
models

Markov chains
First base A C G T Sum
Estimation A 185 74 86 171 516
Example C 101 41 6 115 263
MCMC methods G 69 45 34 78 226
Model checking
and comparison T 161 103 100 202 566
Hierarchical and Sum 516 263 226 566 1571
regression
models

Categorical data
Let us test independence of bases in a DNA chain. Under
independence, we obtain estimates p̂A = 516/1571 = 0.328,
p̂C = 263/1571 = 0.167 etc. In the Markovian case, we obtain
p̂AA = 185/516 = 0.359, p̂AC = 74/516 = 0.143 etc.
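As a sketch (not from the slides), the estimates and the test statistics discussed on the next page can be computed in R directly from Table 5:

n <- matrix(c(185,  74,  86, 171,
              101,  41,   6, 115,
               69,  45,  34,  78,
              161, 103, 100, 202),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A","C","G","T"), c("A","C","G","T")))
p.indep  <- colSums(n) / sum(n)              # 0.328, 0.167, ... under independence
p.markov <- prop.table(n, 1)                 # p.hat_AA = 0.359, p.hat_AC = 0.143, ...
E <- outer(rowSums(n), colSums(n)) / sum(n)  # expected counts under independence
P <- sum((n - E)^2 / E)                      # Pearson statistic
W <- 2 * sum(n * log(n / E))                 # likelihood ratio statistic
c(P = P, W = W, df = (nrow(n) - 1)^2)        # compare with the chi^2_9 distribution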

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 80 / 130


Estimation of Markov Chains: Example (cont)

Basic concepts
If the independence hypothesis were correct, the test statistics P
Single-parameter
models and W would have approximate χ²_9 distributions. Now their
Hypothesis
testing
observed values are 64.45 and 50.3, which makes this hypothesis
Simple highly implausible.
multiparameter
models The fit of the independence assumption can also be studied
Markov chains
Estimation
graphically. If this assumption were correct, the normalized
Example deviations Z_rs = (O_rs − E_rs)/E_rs^{1/2}, where O_rs = n_rs denotes
MCMC methods
the observed and E_rs = n_{r·} n_{·s} / n_{··} the expected frequency,
Model checking
and comparison would be approximately distributed as N (0, 1). Figure 5 shows
Hierarchical and
regression
the normal probability plot. One observed frequency clearly
models deviates from the expected one (Z_rs is less than −5). This value
Categorical data belongs to the CG cell.
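Continuing the sketch, the normalized deviations and a plot like Figure 5 (with n and E as in the previous snippet):

Z <- (n - E) / sqrt(E)     # Z_rs = (O_rs - E_rs)/E_rs^{1/2}
qqnorm(as.vector(Z))       # normal probability plot, cf. Figure 5
qqline(as.vector(Z))       # the CG cell produces the value below -5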

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 81 / 130


Estimation of Markov Chains: Example (cont)

[Figure 5: Normal probability plot of normalized deviations; y-axis: (O−E)/sqrt(E), x-axis: Quantiles of Standard Normal]

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 82 / 130


Basic concepts
Single-parameter
models
Hypothesis
testing
Simple
multiparameter
models

Markov chains
MCMC methods
MCMC methods
Gibbs sampler
Metropolis
algorithm
Example
Metropolis-
Hastings
Convergence
Model checking
and comparison
Hierarchical and
regression
models

Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 83 / 130


MCMC methods

Basic concepts
In the following, we will introduce computationally intensive
Single-parameter
models methods based on Markov chains which can be used in the
Hypothesis
testing
simulation of multivariate distributions. These are called
Simple Markov Chain Monte Carlo (MCMC) methods, and they are
multiparameter
models
especially useful in the computations of Bayesian statistics. The
Markov chains general idea is to generate a time-reversible Markov chain with
MCMC methods a desired stationary distribution.
Gibbs sampler
Metropolis
algorithm
We will assume that the target distribution is discrete, so that we
Example can apply the theory of discrete state-space Markov chains.
Metropolis-
Hastings However, MCMC methods are often applied to continuous
Convergence
distributions, so that their proper treatment would require the
Model checking
and comparison theory of general state-space Markov chains. But since
Hierarchical and
regression
continuous distributions can be approximated by discrete ones
models with arbitrary accuracy, we can content ourselves with the
Categorical data theory presented so far.

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 84 / 130


Gibbs sampler

Basic concepts
The Gibbs sampler can be used to simulate a multivariate
Single-parameter
models distribution with probability function p(x). The Gibbs sampler
Hypothesis
testing
can be implemented if it is possible to generate random
Simple numbers from all of the full conditional distributions, denoted
multiparameter
models
as pi (xi |x−i ), i = 1, ..., d, where x−i = (x1 , ..., xi−1 , xi+1 , ..., xd ).
Markov chains

MCMC methods The algorithm is implemented so that one first chooses the
Gibbs sampler
Metropolis initial value vector x0 = (x01 , ..., x0d ). After generating the
algorithm
Example random vectors x1 , ..., xt , the vector xt+1 is generated
Metropolis-
Hastings
componentwise as follows:
Convergence
Model checking
and comparison
Hierarchical and
regression
models
Categorical data
✔ Generate x1^{t+1} from p1(x1 | x2^t, ..., xd^t)
✔ Generate x2^{t+1} from p2(x2 | x1^{t+1}, x3^t, ..., xd^t)
✔ Generate x3^{t+1} from p3(x3 | x1^{t+1}, x2^{t+1}, x4^t, ..., xd^t)
  ...
✔ Generate xd^{t+1} from pd(xd | x1^{t+1}, x2^{t+1}, ..., x_{d−1}^{t+1})
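A minimal R sketch of the algorithm (not from the slides), for a toy target where all full conditionals are known: a bivariate normal with zero means, unit variances and correlation rho, so that x1 | x2 ∼ N(rho·x2, 1 − rho²) and symmetrically for x2.

rho <- 0.8; nsim <- 10000
x <- matrix(NA, nsim, 2)
x1 <- x2 <- 0                                  # initial values
for (t in 1:nsim) {
  x1 <- rnorm(1, rho * x2, sqrt(1 - rho^2))    # draw from p1(x1 | x2)
  x2 <- rnorm(1, rho * x1, sqrt(1 - rho^2))    # draw from p2(x2 | x1)
  x[t, ] <- c(x1, x2)
}
cor(x)   # off-diagonal close to rho once the chain has converged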

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 85 / 130


Gibbs sampler (cont)

Basic concepts
The algorithm produces a Markov chain, since the distribution
Single-parameter
models of x(t+1) is independent of x0 , ..., x(t−1) given xt . It is time
Hypothesis
testing
homogenous, since the transition probabilities are based on the
Simple distributions pj (xj |x−j ) all the time. The chain is not
multiparameter
models
necessarily irreducible, but it is so if the set {x : p(x) > 0} is
Markov chains ’sufficiently’ connected enabling the process to move to all
MCMC methods points of the state space.
Gibbs sampler
Metropolis
algorithm
We show next that p(x) fulfils the detailed balance condition
Example
Metropolis-
Hastings
p(x) Pr(Xt+1 = x∗ |X t = x) = p(x∗ ) Pr(Xt+1 = x|Xt = x∗ ),
Convergence
Model checking where x = (x1 , ..., xj , ..., xd ) and x∗ = (x1 , ..., x∗j , ..., xd ). For the
and comparison
Hierarchical and
moment we consider that one time step corresponds to
regression
models
changing only one component of x.
Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 86 / 130


Gibbs sampler (cont)

Basic concepts
We obtain that
Single-parameter
models
Hypothesis
testing
Simple
multiparameter
models
Markov chains
MCMC methods
Gibbs sampler

p(x) Pr(X^{t+1} = x∗ | X^t = x) = p(x) pj(x∗_j | x_{−j}) = p(x) · p(x∗)/p(x_{−j})
  = p(x∗) · p(x)/p(x_{−j}) = p(x∗) pj(x_j | x_{−j})
  = p(x∗) Pr(X^{t+1} = x | X^t = x∗);
Metropolis
algorithm
Example thus p(x) is a stationary distribution.
Metropolis-
Hastings Irreducibility implies the uniqueness of the stationary
Convergence
Model checking
distribution. The chain is also positively recurrent, since
and comparison transient and null recurrent chains do not possess a stationary
Hierarchical and
regression distribution. Further, it is aperiodic, since the new value can be
models
the same as the old. It follows from these properties that the
Categorical data
chain is ergodic.

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 87 / 130


Metropolis algorithm

Basic concepts
The Metropolis algorithm is different from Gibbs sampling in
Single-parameter
models that it does not require ability to generate random variates
Hypothesis
testing
from conditional distributions. It is sufficient to know the
Simple probability function (or density) of the target distribution up to a
multiparameter
models
constant of proportionality.
Markov chains
Assume that we want to simulate a distribution with
MCMC methods
Gibbs sampler
probability function p(x) where x may be scalar or vector. We
Metropolis
algorithm
need to define a jumping distribution (or proposal distribution)
Example J(y|x) from which a proposal y may be generated when the
Metropolis-
Hastings current value is x. In the Metropolis algorithm it is assumed
Convergence
that J(y|x) = J(x|y).
Model checking
and comparison
Hierarchical and
regression
models

Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 88 / 130


Metropolis algorithm (cont)

Basic concepts
An initial value x0 is first generated. After generating x0 , ..., xt ,
Single-parameter
models the new value xt+1 is obtained as follows: 1) A new proposal y
Hypothesis
testing
is generated from J(y|x). The new value y is accepted with
Simple probability
multiparameter
models
Markov chains
min{ 1, p(y)/p(x^t) }.
MCMC methods
Gibbs sampler
2) If the new value is accepted, we set xt+1 = y, otherwise the
Metropolis old value is kept, so that xt+1 = xt .
algorithm
Example
Metropolis- The Metropolis algorithm produces a Markov chain, since the
Hastings
Convergence
distribution of the new value xt+1 only depends on the current
Model checking value xt . The chain is also time-homogenous, since the
and comparison
transition probabilities are based on the jumping distribution
Hierarchical and
regression J(y|x), which is not changed during the simulation. Further, it
models
is irreducible if J(y|x) is so chosen that the chain may reach all
Categorical data
points of the state space.
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 89 / 130
Metropolis algorithm (cont)

Basic concepts
Next we show that p(x) fulfils the detailed balance condition.
Single-parameter
models Let x and x∗ be two points in the state space such that
Hypothesis
testing
p(x∗ ) ≤ p(x). Then
Simple
multiparameter
models
Markov chains
MCMC methods
Gibbs sampler
Metropolis

p(x) Pr(X^{t+1} = x∗ | X^t = x) = p(x) J(x∗|x) · p(x∗)/p(x)
  = p(x∗) J(x|x∗)
  = p(x∗) Pr(X^{t+1} = x | X^t = x∗).
algorithm
Example
Metropolis-
Hastings
Convergence Thus, p(x) is the stationary distribution and the chain is
Model checking
and comparison
positively recurrent. Further, since it is also aperiodic, it is
Hierarchical and ergodic.
regression
models

Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 90 / 130


Metropolis algorithm (cont)

Basic concepts
It is said that a Markov chain mixes slowly if it moves slowly
Single-parameter
models around the support of p(x). Then there is strong
Hypothesis
testing
autocorrelation between the consecutive observations, and the
Simple mean converges slowly to the theoretical mean of the stationary
multiparameter
models
distribution.
Markov chains
There are two possible reasons for this problem. First, if the
MCMC methods
Gibbs sampler
deviation of the jumping distribution is too small for some
Metropolis
algorithm
component, the chain moves slowly with respect to that
Example component. On the other hand, if the deviation is too large,
Metropolis-
Hastings new proposals are rarely accepted and the chain remains long in
Convergence
the same position.
Model checking
and comparison
It is possible to optimize the jumping distribution. If the
Hierarchical and
regression jumping distribution is a d-dimensional normal distribution,

models
then its optimal covariance matrix is c²Σ, where c ≈ 2.4/√d and
Categorical data
Σ is the covariance matrix of the target distribution.
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 91 / 130
Metropolis algorithm: Example

Basic concepts
Let us consider a two-parameter Weibull distribution with the
Single-parameter
models density
Hypothesis
testing
Simple
multiparameter
models
f(x; β, δ) = (δ/β^δ) x^{δ−1} exp{ −(x/β)^δ },   x, β, δ > 0.
Markov chains

MCMC methods With a random sample y1 , ..., yn the likelihood is


Gibbs sampler
Metropolis
algorithm
Example
Metropolis-
Hastings
p(y|θ) = (δ^n / β^{nδ}) (∏_i y_i)^{δ−1} exp{ −∑_i (y_i/β)^δ }.
Convergence
Model checking By choosing p(β, δ) ∝ 1/(βδ) as the prior, the posterior becomes
and comparison
Hierarchical and
regression
models
Categorical data
p(β, δ|y) ∝ (δ^{n−1} / β^{nδ+1}) (∏_i y_i)^{δ−1} exp{ −∑_i (y_i/β)^δ }.

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 92 / 130


Metropolis algorithm: Example (cont)

Basic concepts
It would be possible to derive the full conditional posterior
Single-parameter
models distributions and simulate the posterior distribution using
Hypothesis
testing
Gibbs sampling. We could generate random numbers from the
Simple conditional distributions using adaptive rejection sampling.
multiparameter
models
However, it is here simpler to apply the Metropolis algorithm.
Markov chains
To illustrate the estimation, we generate an artificial data set of
MCMC methods
Gibbs sampler
100 observations from the Weibull(0.3,10) distribution. Figure 6
Metropolis
algorithm
shows a simulated Markov chain with 10000 iterations, starting
Example from the initial values δ = β = 1. As a jumping distribution we
Metropolis-
Hastings use the bivariate normal distribution, the mean vector being
Convergence
the ’old’ vector and the covariance matrix diag(0.01, 10).
Model checking
and comparison
The figure shows that the chain converges to its stationary
Hierarchical and
regression distribution rapidly but the chain for β seems to mix poorly.
models

Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 93 / 130
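One way to implement this run in R (a hedged sketch, not necessarily the code behind Figures 6–8):

log.post <- function(theta, y) {   # theta = c(delta, beta); log of p(beta, delta | y)
  delta <- theta[1]; beta <- theta[2]; n <- length(y)
  if (delta <= 0 || beta <= 0) return(-Inf)    # zero posterior outside the support
  (n - 1) * log(delta) - (n * delta + 1) * log(beta) +
    (delta - 1) * sum(log(y)) - sum((y / beta)^delta)
}
y <- rweibull(100, shape = 0.3, scale = 10)    # artificial Weibull(0.3, 10) data
nsim <- 10000
chain <- matrix(NA, nsim, 2, dimnames = list(NULL, c("delta", "beta")))
theta <- c(1, 1)                               # initial values delta = beta = 1
for (t in 1:nsim) {
  prop <- rnorm(2, mean = theta, sd = sqrt(c(0.01, 10)))  # jumps with cov diag(0.01, 10)
  if (log(runif(1)) < log.post(prop, y) - log.post(theta, y)) theta <- prop
  chain[t, ] <- theta                          # old value kept when proposal rejected
}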


Metropolis algorithm: Example (cont)

Basic concepts
Single-parameter
models

[Figure: trace plots of delta (top, ~0.2–1.0) and beta (bottom, ~0–20) over 10000 iterations; x-axis: Time]
Model checking
and comparison
Hierarchical and
regression
models Figure 6: Estimating the parameters of the Weibull distribution:
Categorical data 10000 iterations of the Metropolis algorithm

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 94 / 130


Metropolis algorithm: Example (cont)

Basic concepts
Next we simulate 10000 new observations using the optimal
Single-parameter
models covariance matrix 2.42 Σ/2 where Σ is the covariance matrix of
Hypothesis
testing
the target distribution, estimated using the most recent
Simple simulations of the original chain. As an initial value we use the
multiparameter
models
last simulated vector of the first chain. On the basis of Figure 7
Markov chains the mixing is more rapid now. Figure 8 shows the graphs of the
MCMC methods 2.5%, 50% and 97.5% cumulative quantiles.
Gibbs sampler
Metropolis
algorithm
Example
Metropolis-
Hastings
Convergence
Model checking
and comparison
Hierarchical and
regression
models

Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 95 / 130
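Continuing the sketch, the adaptation step could look as follows; MASS::mvrnorm is one standard way to draw the correlated normal proposals:

Sigma.hat <- cov(chain[5001:10000, ])   # Sigma estimated from recent simulations
C <- 2.4^2 * Sigma.hat / 2              # c^2 Sigma with c = 2.4/sqrt(d), d = 2
# inside the Metropolis loop:
# prop <- as.vector(theta + MASS::mvrnorm(1, mu = c(0, 0), Sigma = C))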


Metropolis algorithm: Example (cont)

Basic concepts
Single-parameter
models
Hypothesis

[Figure: trace plots of delta (top, ~0.25–0.35) and beta (bottom, ~5–25) for the second run; x-axis: Time]
Model checking
and comparison
Hierarchical and
regression
models Figure 7: Estimating the parameters of the Weibull distribution:
Categorical data 10000 further iterations of the Metropolis algorithm

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 96 / 130


Metropolis algorithm: Example (cont)

Basic concepts
[Figure: cumulative 2.5%, 50% and 97.5% quantile plots for delta (left) and beta (right); x-axis: Iterations]
and comparison
Hierarchical and
regression
models Figure 8: 2.5%, 50% and 97.5% cumulative quantiles of 10000
Categorical data posterior simulations

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 97 / 130


Metropolis-Hastings algorithm

Basic concepts
The Metropolis-Hastings algorithm is similar to the Metropolis
Single-parameter
models algorithm except that it is not assumed that the jumping
Hypothesis
testing
distribution J(y|x) is symmetric with respect to the ’old’ value
Simple x. The acceptance probability of a proposal is now
multiparameter
models
Markov chains
MCMC methods
min{ 1, [p(y)/J(y|x^t)] / [p(x^t)/J(x^t|y)] }.
Gibbs sampler
Metropolis
algorithm
It can be shown (exercise) that the algorithm produces a
Example Markov chain with stationary distribution p(x).
Metropolis-
Hastings
Convergence
Model checking
and comparison
Hierarchical and
regression
models

Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 98 / 130


Detecting convergence

Basic concepts
Markov Chain simulation should be continued until reaching
Single-parameter
models the stationary distribution, and after this until reliable
Hypothesis
testing
estimates for the summary statistics of the stationary
Simple distribution have been obtained. The iterations before the
multiparameter
models
convergence are usually disregarded as a burn-in phase.
Markov chains
In practice, convergence to stationary distribution can be
MCMC methods
Gibbs sampler
detected by studying various time series plots, such as trace
Metropolis
algorithm
plots, and plots of cumulative summary statistics and
Example autocorrelation functions.
Metropolis-
Hastings
Convergence
However, it is usually more reliable to also use convergence
Model checking diagnostics. Geweke’s diagnostic is based on comparing the
and comparison
means of the beginning and last parts of the chain. In the
Hierarchical and
regression following, we will introduce Gelman and Rubin’s diagnostic,
models

Categorical data
which is based on comparing several simulated chains.

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 99 / 130


Gelman and Rubin’s diagnostic

Basic concepts
Suppose we have simulated m chains of n iterations (after
Single-parameter
models removing the burn-in phase). We denote the simulations by
Hypothesis
testing
ψij (i = 1, ..., n; j = 1, ..., m), and compute B and W, the
Simple between- and within-sequence variances:
multiparameter
models
Markov chains
MCMC methods
Gibbs sampler
B = (n/(m−1)) ∑_{j=1}^{m} (ψ̄.j − ψ̄..)²,
Metropolis
algorithm
Example
where ψ̄.j = (1/n) ∑_{i=1}^{n} ψij,  ψ̄.. = (1/m) ∑_{j=1}^{m} ψ̄.j,  and
Metropolis-
Hastings
Convergence
Model checking
and comparison
W = (1/m) ∑_{j=1}^{m} s_j²,   where s_j² = (1/(n−1)) ∑_{i=1}^{n} (ψij − ψ̄.j)².
Hierarchical and
regression
models We can estimate the posterior variance Var(ψ|y) by the
Categorical data
weighted average  Var^+(ψ|y) = ((n−1)/n) W + (1/n) B.

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 100 / 130


Gelman and Rubin’s diagnostic (cont)

Basic concepts
The quantity Var^+(ψ|y) overestimates the posterior variance if
Single-parameter
models
the starting values are overdispersed, but is unbiased under
Hypothesis
testing stationarity. On the other hand, W underestimates posterior
Simple
multiparameter
variance for any finite n because the individual sequences have
models not had time to range over all of the target distribution.
Markov chains

MCMC methods We may monitor convergence using the potential scale reduction factor
Gibbs sampler
Metropolis
algorithm
Example
Metropolis-
Hastings
R̂ = √( Var^+(ψ|y) / W ),
Convergence
Model checking which tells by which factor the posterior deviation estimate can
and comparison
be decreased if simulation is continued. Simulation should be
Hierarchical and
regression continued until R̂ is close to 1 for each parameter ψ. In most
models
practical cases, values below 1.1 would be acceptable.
Categorical data

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 101 / 130
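For reference, a minimal sketch of computing the diagnostic by hand, assuming psi is an n × m matrix with one chain per column and the burn-in already removed:

n <- nrow(psi); m <- ncol(psi)
psi.bar <- colMeans(psi)             # chain means
B <- n * var(psi.bar)                # between-sequence variance
W <- mean(apply(psi, 2, var))        # within-sequence variance
var.plus <- (n - 1) / n * W + B / n  # Var^+(psi | y)
R.hat <- sqrt(var.plus / W)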


Gelman and Rubin’s diagnostic: Example

Basic concepts
To illustrate the use of the diagnostic, we continue our example
Single-parameter
models
on the Weibull distribution. We generate 5 chains of length
Hypothesis 1000 using random initial values. After removing the first 500
testing
simulations from each chain, we obtain the following
Simple
multiparameter diagnostics. Also a multivariate version of the diagnostic is
models computed. Here, gelman.diag is a function in R package coda
Markov chains and SIMS is an mcmc object containing the chains.
MCMC methods
Gibbs sampler
Metropolis
algorithm 1> gelman.diag(SIMS)
Example Potential scale reduction factors:
Metropolis-
Hastings
Convergence
Point est. Upper C.I.
Model checking
and comparison delta 1.00 1.01
Hierarchical and beta 1.01 1.02
regression
models

Categorical data
Multivariate psrf
1.01

Introduction to Bayesian analysis, autumn 2013 University of Tampere – 102 / 130


Gelman and Rubin’s diagnostic: Example (cont)

Basic concepts
[Figure: gelman.plot curves (median and 97.5% quantile of the shrink factor) for delta (left) and beta (right); y-axis ~1.0–1.6, x-axis: last iteration in chain]
Convergence
Model checking
and comparison
Figure 9: The Gelman-Rubin shrink factor might be close to 1 by
Hierarchical and
regression chance. Therefore, a graph (gelman.plot) showing its convergence
models
is useful. Here, the curves show the diagnostic and its 97.5%
Categorical data
quantile for the observation intervals 25:50, 30:60, ..., 500:1000.
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 103 / 130
Basic concepts
Single-parameter
models
Hypothesis
testing
Simple
multiparameter
models

Markov chains
Model checking and comparison
MCMC methods
Model checking
and comparison
Residuals
Example
Predictive checks
p-values
Example 1
Example 2
Deviance
DIC
Example
Hierarchical and
regression
models

Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 104 / 130
Model checking and comparison

Basic concepts
The conclusions of a Bayesian analysis are conditional on the
Single-parameter
models chosen probability model. Therefore, it is essential to check
Hypothesis
testing
that the model is a reasonable approximation to reality. Model
Simple checking can be done with respect to outliers, sampling
multiparameter
models
distribution, prior distribution, link function, covariates and so
Markov chains on.
MCMC methods
Model checking
We can distinguish three aspects of modelling:
and comparison
Residuals ✔ Criticism: exploratory checking of a single model
Example
Predictive checks
✔ Extension: embedding a model in a larger model
p-values ✔ Comparison: comparing candidate models in terms of their
Example 1
Example 2
fit and predictive power
Deviance
DIC
Example
Hierarchical and
regression
models

Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 105 / 130
Residuals

Basic concepts
A widely used and useful technique for model checking is
Single-parameter
models plotting residuals. They help, for example, detect outliers,
Hypothesis
testing
autocorrelation and problems in distributional assumptions.
Simple They measure the deviation between observations and
multiparameter
models
estimated expected values.
Markov chains
A Pearson residual is defined as
MCMC methods
Model checking
and comparison
Residuals
ri(θ) = (yi − E(yi|θ)) / √Var(yi|θ).
Example
Predictive checks
p-values In classical analysis, θ is replaced by its fitted value, while in
Example 1
Example 2
Bayesian analysis the residuals have a posterior distribution.
Deviance
DIC Example. We consider the child heart surgery data in Table 1.
Example
Figure 10 shows the box plot of Pearson residuals assuming
Hierarchical and
regression that yi ∼ Bin(θ, mj ). Hospital H appears to be an outlier.
models

Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 106 / 130
Residuals (cont)

Basic concepts
Single-parameter
models

[Figure: box plots of the posterior distributions of the Pearson residuals, hospitals ordered E A D G L F I C J B K H; y-axis: −4 to 6]
Deviance
DIC
Example
Hierarchical and
regression
models
Figure 10: Box plot of Pearson residuals for heart surgery data
Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 107 / 130
Residuals: code for making the box plot

Basic concepts
Single-parameter
model{
models theta ~ dbeta(1,1)
Hypothesis
testing
for(j in 1:J){
Simple
y[j] ~ dbin(theta,m[j])
multiparameter res[j] <- (y[j]-m[j]*theta)/sqrt(m[j]*theta*(1-theta))
models
}}
Markov chains

MCMC methods
Model checking
hospital <- list(J=J,m=m,y=y)
and comparison hospital.jag <- jags.model("Hospital2.txt",hospital)
Residuals
Example
hospital.coda <- coda.samples(hospital.jag,c("theta","res"),10000) # n.iter truncated in the source; 10000 is an assumed value
Predictive checks
p-values med <- apply(hospital.coda[[1]][,-13],2,median)
Example 1
Example 2 ind <- order(med)
Deviance Res <- as.list(1:J)
DIC
Example
for(j in 1:J) Res[[j]] <-
Hierarchical and
c(hospital.coda[[1]][,paste("res[",ind[j],"]",sep="")])
regression boxplot(Res,names=names(y)[ind])
models

Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 108 / 130
Predictive checks and Bayesian p-values

Basic concepts
Residuals are examples of statistics which measure the
Single-parameter
models discrepancy between the data and the assumed model. These
Hypothesis
testing
statistics are usually easy to calculate, but we need a method to
Simple determine if the observed discrepancy is significant. Here, we
multiparameter
models
may use so-called Bayesian p-values obtained by simulating the
Markov chains posterior predictive distribution of the test statistic.
MCMC methods
Model checking
Ideally, models should be checked by comparing the predictions
and comparison of a model to new data. Suppose that the data y is divided into
Residuals
Example two parts: yf for fitting the model, and yc for model criticism.
Predictive checks
p-values
Then the comparisons are based on the predictive distribution,
Example 1
Example 2
Deviance p(yc^pred | yf) = ∫ p(yc^pred | θ) p(θ | yf) dθ,
DIC
Example
Hierarchical and
regression
simulated by drawing θ from p(θ|yf ) and ycpred from p(ycpred |θ).
models

Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 109 / 130
Predictive checks and Bayesian p-values (cont)

Basic concepts
A function T (yc ) is called a test statistic (Gelman et al., 2004)
Single-parameter
models if it takes an extreme value when the data yc conflict with the
Hypothesis
testing
assumed model. By choosing T (yc ) = yci one can check for
Simple individual outliers.
multiparameter
models
One can check whether T (yc ) is extreme graphically or by
Markov chains
computing the Bayesian p-value
MCMC methods
Model checking
and comparison p = Pr(T (ycpred ) ≤ T (yc )|yf ).
Residuals
Example
Predictive checks This can be obtained by drawing simulations ycpred from the
p-values
Example 1
posterior predictive distribution, and by calculating the
Example 2 proportion of cases where T (ycpred ) ≤ T (yc ).
Deviance
DIC
Example
In practice, the same data set is often used for fitting and
Hierarchical and checking (yc = yf = y). In this case the diagnostics are likely to
regression
models be conservative.
Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 110 / 130
Predictive checks and Bayesian p-values: Example 1

Basic concepts
In the previous example of cardiac surgery death rates, the value
Single-parameter
models
for hospital H appeared to be an outlier. We may compute
Hypothesis its predictive p-value (using the mid p-value
Pr(yi^pred > yi | y−i) + (1/2) Pr(yi^pred = yi | y−i) for discrete data):
testing
Simple
multiparameter
models

Markov chains model{


MCMC methods theta ~ dbeta(1,1)
Model checking
and comparison
for(j in 1:7){y[j] ~ dbin(theta,m[j])}
Residuals for(j in 9:J){y[j] ~ dbin(theta,m[j])}
Example
#predicted number of deaths in the 8th hospital
Predictive checks
p-values y8.pred ~ dbin(theta,m[8])
Example 1 P <- step(y8.pred-y[8]-0.001)+0.5*equals(y8.pred,y[8])
Example 2
Deviance
}
DIC Mean SD Naive SE Time-series SE
Example
P 0.00035 0.01803 0.0001803 0.0001803
Hierarchical and
regression
y8.pred 14.68810 3.86418 0.0386418 0.0386418
models

Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 111 / 130
Predictive checks and Bayesian p-values: Example 2

Basic concepts
We continue our study of the Newcomb data, and use the
Single-parameter
models
statistics T1 = min(y) and T2 = (y(1) − y(n/2) )/(y(n/4) − y(n/2) ),
Hypothesis where y(j) is the jth lowest value of y.
testing
Simple
multiparameter
models for(i in 1:n){
Markov chains y[i] ~ dnorm(mu,tau)
MCMC methods yrep[i] ~ dnorm(mu,tau)
Model checking }
and comparison
Residuals n.50 <- round(n/2)
Example n.25 <- round(n/4)
Predictive checks
p-values
yrep.sort <- sort(yrep[])
Example 1 T1.rep <- yrep.sort[1]
Example 2 yrep.50 <- yrep.sort[n.50]
Deviance
DIC yrep.25 <- yrep.sort[n.25]
Example T2.rep <- (T1.rep-yrep.50)/(yrep.25-yrep.50)
Hierarchical and P1 <- step(T1.rep-T1.obs)
regression
models P2 <- step(T2.rep-T2.obs)
Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 112 / 130
Predictive checks and Bayesian p-values: Example 2

Basic concepts
Single-parameter
models
Hypothesis

[Figure: histogram of posterior predictive draws of T2; y-axis: Frequency, x-axis: 0–25]
Predictive checks
p-values
Example 1
Example 2
Deviance Figure 11: The figure shows the posterior predictive distribution
DIC
Example of T2. We see that T2^obs, indicated by a vertical line, would be
Hierarchical and implausibly large if the model were correct.
regression
models

Categorical data
Introduction to Bayesian analysis, autumn 2013 University of Tampere – 113 / 130
Model comparison using deviances

Model fit can be summarized with the deviance, defined as

D(θ) = −2 log p(y|θ),

where p(y|θ) is the likelihood function. To obtain a summary that depends on y only, θ can be replaced with a point estimate θ̂, such as the posterior mean. We obtain

D(θ̂) = −2 log p(y|θ̂).

This may give an over-optimistic picture of the model fit. A natural Bayesian alternative is the posterior mean deviance

D̄ = E(D(θ)|y).
Model comparison using deviances (cont)

It is easy to estimate D̄ using posterior simulations θ^l:

\hat{\bar{D}} = \frac{1}{L} \sum_{l=1}^{L} D(\theta^l).

The difference between the posterior mean deviance and the deviance at θ̂ represents the effect of model fitting and is called the effective number of parameters:

pD = D̄ − D(θ̂).

In nonhierarchical models, if the number of observations is large or the prior information is weak, pD is usually approximately equal to the actual number of parameters.
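For concreteness, a minimal R sketch estimating D̄, D(θ̂) and pD in a single-parameter binomial model with a conjugate Beta(1,1) posterior; the data values are hypothetical:

y <- 7; m <- 20                                #hypothetical data
theta <- rbeta(10000, 1 + y, 1 + m - y)        #posterior draws
D <- -2*dbinom(y, m, theta, log = TRUE)        #deviance at each draw
Dbar <- mean(D)                                #estimated posterior mean deviance
theta.hat <- mean(theta)                       #posterior mean as point estimate
Dhat <- -2*dbinom(y, m, theta.hat, log = TRUE) #deviance at the point estimate
pD <- Dbar - Dhat                              #should be close to 1 here
c(Dbar = Dbar, Dhat = Dhat, pD = pD)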
Deviance information criterion, DIC

When the goal is to choose an optimal model for prediction, the expected predictive deviance,

E[−2 log p(y^rep | θ̂(y))],

has been suggested as a criterion of model fit. Here the expectation is taken over the unknown true distribution of y^rep.

This can be approximated by the deviance information criterion (DIC):

DIC = D(θ̂) + 2pD = D̄ + pD.

This can usually be computed easily using posterior simulation. When the prior information is weak or the sample size is large, p ≈ pD, implying that DIC ≈ AIC.
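Continuing the sketch above, DIC follows directly from the quantities already computed. Note that the rjags function dic.samples(), used in the example below, prints the corresponding quantities as "Mean deviance" (D̄), "penalty" (pD) and "Penalized deviance" (DIC):

DIC <- Dhat + 2*pD   #equivalently Dbar + pD
DIC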
Deviance information criterion (example)

We regress the incidence of pine processionary caterpillars on 8 potential explanatory variables. (The data set is caterpillar in the R package bayess.) The response variable y is the log transform of the average number of nests per tree.

The explanatory variables:
x1 altitude (in meters)
x2 slope (in degrees)
x3 number of pine trees in the area
x4 height (in meters) of the tree sampled at the center of the area
x5 orientation of the area (from 1 if southbound to 2 otherwise)
x6 height (in meters) of the dominant tree
x7 number of vegetation strata
x8 mix settlement index (from 1 if not mixed to 2 if mixed)
DIC example (cont)

#JAGS code
model{
  for(i in 1:n){
    y[i] ~ dnorm(mu[i],tau)
    mu[i] <- b0 + b[1]*X[i,1]+b[2]*X[i,2]+b[3]*X[i,3]+
                  b[4]*X[i,4]+b[5]*X[i,5]+b[6]*X[i,6]+
                  b[7]*X[i,7]+b[8]*X[i,8]
  }
  b0 ~ dnorm(0,0.001)
  tau ~ dgamma(0.001,0.001)
  for(j in 1:8){
    b[j] ~ dnorm(0,0.001)
  }
}

#R code
cp.jag <- jags.model("caterpillar.txt",data,n.chains=2)
cp.coda <- coda.samples(cp.jag,c("b0","b","tau"),10000)
summary(cp.coda)
dic.samples(cp.jag,n.iter=100000,type="pD")
DIC example (cont)

According to the results, only β1, β2 and β7 are 'significant' in the sense that 0 is not included in their 95% posterior intervals.

         2.5%     25%      50%      75%    97.5%
b[1]  -0.6285 -0.4439 -0.35067 -0.25715 -0.06983
b[2]  -0.4792 -0.3324 -0.25874 -0.18520 -0.03537
b[3]  -0.1268  0.2279  0.39554  0.55959  0.89309
b[4]  -0.4393 -0.1638 -0.02962  0.10728  0.36899
b[5]  -0.3364 -0.1891 -0.11807 -0.04442  0.10419
b[6]  -0.6400 -0.2028  0.03315  0.26520  0.72730
b[7]  -1.2519 -0.8400 -0.63957 -0.44101 -0.04077
b[8]  -0.2799 -0.1311 -0.05702  0.01804  0.17237
b0     0.6114  0.7450  0.81175  0.87794  1.01362
tau    1.6534  2.5258  3.10635  3.76617  5.25581
DIC example (cont)

Now, the model selection criteria are estimated as D̄ = 56.7, pD = 10.95 and DIC = 67.65. When the non-significant variables are removed, both D̄ and DIC become smaller, indicating a better model.

#Original model
Mean deviance: 56.7
penalty 10.95
Penalized deviance: 67.65

#Restricted model
Mean deviance: 55.52
penalty 5.369
Penalized deviance: 60.89
Hierarchical and regression models
Categorical data
Generalized linear model

In linear models it is assumed that the response variable is normally distributed and its expected value is a linear combination of the explanatory variables. Generalized linear models extend the idea of linear modelling to cases where either of these assumptions may not be appropriate.

A generalized linear model is specified in three stages (a small simulation sketch follows below):
✔ The linear predictor ηi = β0 + ∑_{j=1}^{p} βj xij
✔ The link function g(·), which relates the linear predictor to the mean of the response variable: g(µi) = ηi, where µi = E(yi)
✔ The distribution of yi given its mean µi. In general, this distribution may also depend on a dispersion parameter φ.
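To make the three stages concrete, here is a small R simulation sketch of a logistic GLM with one explanatory variable; all numerical values are hypothetical:

set.seed(2)
n <- 100
x <- rnorm(n)                 #one explanatory variable
eta <- -0.5 + 1.2*x           #stage 1: linear predictor
mu <- 1/(1 + exp(-eta))       #stage 2: inverse of the logit link g
y <- rbinom(n, 1, mu)         #stage 3: Bernoulli response with mean mu
glm(y ~ x, family = binomial) #a classical fit roughly recovers -0.5 and 1.2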


Binomial regression

Binomial regression is perhaps the most popular application of the generalized linear model. Suppose that yi ∼ Bin(ni, µi), where ni is known. Then one usually specifies a model for µi, the mean of yi/ni. Choosing the logistic transformation g(µi) = log(µi/(1 − µi)) leads to logistic regression. The likelihood in this case is

p(y|β) = \prod_{i=1}^{n} \binom{n_i}{y_i} \left(\frac{e^{\eta_i}}{1+e^{\eta_i}}\right)^{y_i} \left(\frac{1}{1+e^{\eta_i}}\right)^{n_i - y_i}.

Another popular choice for a link function is the probit link g(µ) = Φ^{−1}(µ), where Φ(·) is the distribution function of a standard normal variable. The likelihood becomes

p(y|β) = \prod_{i=1}^{n} \binom{n_i}{y_i} \Phi(\eta_i)^{y_i} (1 - \Phi(\eta_i))^{n_i - y_i}.
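Both likelihoods are easy to write directly in R. A hedged sketch, where beta, the design matrix X, and the counts y and n are assumed to be defined; dbinom() supplies the binomial coefficients:

loglik.logit <- function(beta, X, y, n){
  eta <- drop(X %*% beta)                     #linear predictor
  sum(dbinom(y, n, plogis(eta), log = TRUE))  #plogis(eta) = e^eta/(1+e^eta)
}
loglik.probit <- function(beta, X, y, n){
  eta <- drop(X %*% beta)
  sum(dbinom(y, n, pnorm(eta), log = TRUE))   #pnorm = Phi
}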


Binomial regression: Example

Table 6: Bioassay data from Racine et al. (1986).

Dose, xi      Number of      Number of
(log g/ml)    animals, ni    deaths, yi
 -0.86             5              0
 -0.30             5              1
 -0.05             5              3
  0.73             5              5

As an example we consider 4 batches of 5 animals, each of which is given a different dose of a drug. We are interested in determining the toxicity of the drug. Table 6 reports the numbers of deaths for the different dose levels.
Binomial regression: Example (cont)

We assume that the numbers of deaths are binomially distributed,

yi|θi ∼ Bin(ni, θi),

and that there is a simple linear relationship between the logit of the mortality θi and the dose level xi:

logit(θi) = α + βxi,

where logit(θi) = log(θi/(1 − θi)). Now the posterior of (α, β) is

p(α, β|y) ∝ p(α, β) \prod_{i=1}^{k} \binom{n_i}{y_i} \theta_i^{y_i} (1 - \theta_i)^{n_i - y_i},

where p(α, β) is the prior, θi = e^{ηi}/(1 + e^{ηi}) and ηi = α + βxi.


Binomial regression: Example (cont)

There are several ways to specify the prior information. Here we assume that α ∼ N(0, 1000) and β ∼ TN(0, 1000; 0, ∞) (a truncated normal distribution). We truncate the prior of β from below at 0, since we believe that the dose is harmful, so that β ≥ 0.

We also wish to determine the LD50, the dose level at which the probability of death is 50%. Thus, we determine x so that

logit(0.5) = α + βx.

Since logit(0.5) = 0, solving this gives the LD50 as x = −α/β.

Figure 12 shows the results of the analysis.
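Before fitting the model with JAGS, the posterior can also be approximated on a grid in plain R. A minimal sketch using the data of Table 6 and the priors above (variance 1000); the grid ranges are ad hoc choices:

x <- c(-0.86, -0.30, -0.05, 0.73)
n <- rep(5, 4); y <- c(0, 1, 3, 5)
grid <- expand.grid(alpha = seq(-5, 10, length = 200),
                    beta  = seq(0, 40, length = 200)) #beta >= 0 by truncation
logpost <- apply(grid, 1, function(p){
  theta <- plogis(p[1] + p[2]*x)               #inverse logit
  sum(dbinom(y, n, theta, log = TRUE)) +
    dnorm(p[1], 0, sqrt(1000), log = TRUE) +   #prior of alpha
    dnorm(p[2], 0, sqrt(1000), log = TRUE)     #prior of beta (truncation only
})                                             #rescales by a constant)
post <- exp(logpost - max(logpost))
draw <- grid[sample(nrow(grid), 5000, replace = TRUE, prob = post), ]
LD50 <- -draw$alpha/draw$beta                  #posterior draws of LD50
quantile(LD50, c(0.025, 0.5, 0.975))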


Binomial regression: Example (cont)

[Figure 12: two plots omitted]

Figure 12: Results of the bioassay experiment. Left: Probability of death as a function of dose, with 95% posterior interval and 'observed values'. Right: The posterior distribution of LD50.


Binomial regression: Example (cont)

#JAGS code
alpha ~ dnorm(0,0.001)
beta ~ dnorm(0,0.001)T(0,) #Truncated distribution
for(i in 1:k){
  logit(theta[i]) <- alpha+beta*x[i]
  y[i] ~ dbinom(theta[i],n[i])
}
LD50 <- -alpha/beta
for(i in 1:K){
  logit(theta.pred[i]) <- alpha+beta*xpred[i]
}

#R code:
bioassay.coda <- coda.samples(bioassay.jag,c("alpha","beta",
                                             "LD50","theta.pred"),10000)
a <- summary(bioassay.coda)
med <- a$quantiles[-(1:3),3]
plot(xpred,med,type="l",ylim=c(0,1),xlab="dose",ylab="probability of death")
points(x,y/n,pch=19)
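A sketch of how the data could be assembled and the model compiled for the code above, mirroring the caterpillar example; the file name bioassay.txt and the prediction grid xpred are assumptions:

library(rjags)
x <- c(-0.86, -0.30, -0.05, 0.73)
n <- rep(5, 4); y <- c(0, 1, 3, 5)
xpred <- seq(-1, 1, length = 50)      #hypothetical dose grid for prediction
data <- list(x = x, n = n, y = y, k = length(x),
             xpred = xpred, K = length(xpred))
bioassay.jag <- jags.model("bioassay.txt", data, n.chains = 2)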


Literature

✔ Congdon, Peter, Bayesian Statistical Modelling, Wiley, 2001
✔ Davison, A. C., Statistical Models, Cambridge University Press, 2003
✔ Gelman et al., Bayesian Data Analysis, Chapman & Hall/CRC, 2nd edition, 2004
✔ Kruschke, John K., Doing Bayesian Data Analysis: A Tutorial with R and BUGS, Elsevier, 2011
✔ Lee, Peter M., Bayesian Statistics: An Introduction, Wiley, 4th edition, 2012
✔ Lunn et al., The BUGS Book: A Practical Introduction to Bayesian Analysis, CRC Press, 2013
✔ Rohatgi, Vijay K., Statistical Inference, Dover Publications, 2003
✔ Ross, Sheldon M., Introduction to Probability Models, Academic Press, 6th (or newer) edition, 1997
