Topic5 State Space Models 2019

Bayesian State Space Models
() State Space Models 1 / 40

Introduction
State space methods are used for a wide variety of time series
problems
They are important in and of themselves in economics (e.g.
trend-cycle decompositions, structural time series models, dealing
with missing observations, etc.)
Also time-varying parameter VARs (TVP-VARs) and stochastic
volatility are state space models
DSGE models are state space models (DYNARE is a popular Bayesian
code for estimating them)
Advantage of state space models: well-developed set of MCMC
algorithms for doing Bayesian inference

Remember: our general notation for a VAR was:
$y_t = Z_t \beta + \varepsilon_t$
In many macroeconomic applications, constant $\beta$ is unrealistic

This leads to the TVP-VAR:
$y_t = Z_t \beta_t + \varepsilon_t$
where
$\beta_{t+1} = \beta_t + u_t$
This is a state space model.
In the VAR we assume $\varepsilon_t$ to be i.i.d. $N(0, \Sigma)$
In empirical macroeconomics, this is often unrealistic.
Want to have $\mathrm{var}(\varepsilon_t) = \Sigma_t$
This also leads to state space models.
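A minimal simulation sketch of the time-varying parameter idea, reduced to a scalar regression (M = 1, one regressor) so that it needs only the Python standard library. The function name, the starting value and all parameter values are illustrative assumptions, not taken from the lecture.

```python
import random

def simulate_tvp_regression(T, sigma_eps, sigma_u, beta1=0.5, seed=0):
    """Simulate y_t = z_t * beta_t + eps_t with beta_{t+1} = beta_t + u_t."""
    rng = random.Random(seed)
    beta, betas, zs, ys = beta1, [], [], []
    for _ in range(T):
        z = rng.gauss(0.0, 1.0)                    # scalar stand-in for Z_t
        y = z * beta + rng.gauss(0.0, sigma_eps)   # measurement equation
        betas.append(beta)
        zs.append(z)
        ys.append(y)
        beta = beta + rng.gauss(0.0, sigma_u)      # state equation (random walk)
    return ys, zs, betas

ys, zs, betas = simulate_tvp_regression(T=200, sigma_eps=0.5, sigma_u=0.1)
print(betas[0], betas[-1])  # the coefficient drifts over the sample
```

Setting `sigma_u=0.0` switches the drift off and gives back the constant-coefficient case.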

The Normal Linear State Space Model
Fairly general version of Normal linear state space model:
Measurement equation:
$y_t = W_t \delta + Z_t \beta_t + \varepsilon_t$
State equation:
$\beta_{t+1} = T_t \beta_t + u_t$
$y_t$ and $\varepsilon_t$ defined as for VAR
$W_t$ is known $M \times p_0$ matrix (e.g. lagged dependent variables or
explanatory variables with constant coefficients)
$Z_t$ is known $M \times k$ matrix (e.g. lagged dependent variables or
explanatory variables with time varying coefficients)
$\beta_t$ is $k \times 1$ vector of states (e.g. VAR coefficients)
$\varepsilon_t$ independent $N(0, \Sigma_t)$
$u_t$ independent $N(0, Q_t)$.
$\varepsilon_t$ and $u_s$ are independent for all $s$ and $t$.
$T_t$ is a $k \times k$ matrix (usually fixed, but sometimes not).
Key idea: for given values for $\delta$, $T_t$, $\Sigma_t$ and $Q_t$ (called “system
matrices”) posterior simulators for $\beta_t$ for $t = 1, .., T$ exist.
E.g. Carter and Kohn (1994, Btka), Frühwirth-Schnatter (1994,
JTSA), DeJong and Shephard (1995, Btka) and Durbin and
Koopman (2002, Btka).
Precision based sampler of Joshua Chan (http://joshuachan.org/)
I will not present details of these (standard) algorithms
These algorithms involve use of methods called Kalman filtering and
smoothing
Filtering = estimating a state at time $t$ using data up to time $t$
Smoothing = estimating a state at time $t$ using data up to time $T$
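To make the filtering versus smoothing distinction concrete, here is a hedged sketch for the simplest scalar case: a local level model with $y_t = \beta_t + \varepsilon_t$ and $\beta_{t+1} = \beta_t + u_t$. It computes filtered and smoothed means and variances only. The simulation smoothers cited above go one step further and draw the states, which this sketch does not attempt. The function name, the prior on the first state (`a1`, `p1`) and the data are assumptions made for illustration.

```python
def kalman_filter_smoother(y, s2, q, a1=0.0, p1=10.0):
    """Scalar local level model: s2 = var(eps_t), q = var(u_t).

    Returns filtered means/variances (data up to t) and
    smoothed means/variances (data up to T).
    """
    T = len(y)
    a_f, p_f = [0.0] * T, [0.0] * T
    a_pred, p_pred = a1, p1                    # prior for the first state
    for t in range(T):
        gain = p_pred / (p_pred + s2)          # Kalman gain
        a_f[t] = a_pred + gain * (y[t] - a_pred)
        p_f[t] = (1.0 - gain) * p_pred
        a_pred, p_pred = a_f[t], p_f[t] + q    # one-step-ahead prediction
    a_s, p_s = a_f[:], p_f[:]                  # at t = T the two coincide
    for t in range(T - 2, -1, -1):             # backward smoothing recursion
        p_next = p_f[t] + q                    # predicted variance for t + 1
        j = p_f[t] / p_next
        a_s[t] = a_f[t] + j * (a_s[t + 1] - a_f[t])
        p_s[t] = p_f[t] + j * j * (p_s[t + 1] - p_next)
    return a_f, p_f, a_s, p_s

y = [1.0, 1.2, 0.8, 1.5, 1.1]
a_f, p_f, a_s, p_s = kalman_filter_smoother(y, s2=0.25, q=0.05)
print(p_f[0], p_s[0])  # smoothing uses more data, so its variance is no larger
```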

Notation: $\beta^t = (\beta_1', .., \beta_t')'$ stacks all the states up to time $t$ (and
similar superscript $t$ convention for other things)
Gibbs sampler: $p(\beta^T | y^T, \delta, T^T, \Sigma^T, Q^T)$ is drawn using such an
algorithm
$p(\delta | y^T, \beta^T, T^T, \Sigma^T, Q^T)$, $p(T^T | y^T, \beta^T, \delta, \Sigma^T, Q^T)$,
$p(\Sigma^T | y^T, \beta^T, \delta, T^T, Q^T)$ and $p(Q^T | y^T, \beta^T, \delta, T^T, \Sigma^T)$ depend
on precise form of model (typically simple since, conditional on $\beta^T$,
we have a Normal linear model)
Typically restricted versions of this general model are used
TVP-VAR of Primiceri (2005, ReStud) has $\delta = 0$, $T_t = I$ and $Q_t = Q$

Example of an MCMC Algorithm
Special case $\delta = 0$, $T_t = I$, $\Sigma_t = \Sigma$ and $Q_t = Q$

Homoskedastic TVP-VAR of Cogley and Sargent (2001, NBER)
Need prior for all parameters
But state equation implies hierarchical prior for $\beta^T$:
$\beta_{t+1} | \beta_t, Q \sim N(\beta_t, Q)$
Formally:
$p(\beta^T | Q) = \prod_{t=1}^{T} p(\beta_t | \beta_{t-1}, Q)$
Hierarchical: since it depends on $Q$ which, in turn, requires its own prior.

Note $\beta_0$ enters prior for $\beta_1$.
Need prior for $\beta_0$
Standard treatments exist.
E.g. assume $\beta_0 = 0$, then:
$\beta_1 | Q \sim N(0, Q)$
Or Carter and Kohn (1994) simply assume $\beta_0$ has some prior that the
researcher chooses

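Convenient to use Wishart priors for $\Sigma^{-1}$ and $Q^{-1}$ (underbars denote prior hyperparameters)
$\Sigma^{-1} \sim W(\underline{S}^{-1}, \underline{\nu})$
$Q^{-1} \sim W(\underline{Q}^{-1}, \underline{\nu}_Q)$

Want MCMC algorithm which sequentially draws from
$p(\Sigma^{-1} | y^T, \beta^T, Q)$, $p(Q^{-1} | y^T, \Sigma, \beta^T)$ and $p(\beta^T | y^T, \Sigma, Q)$.
For $p(\beta^T | y^T, \Sigma, Q)$ use standard algorithm for state space models
(e.g. Carter and Kohn, 1994)
Can derive $p(\Sigma^{-1} | y^T, \beta^T, Q)$ and $p(Q^{-1} | y^T, \Sigma, \beta^T)$ using
methods similar to those used in section on VAR independent
Normal-Wishart model.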

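Conditional on $\beta^T$, measurement equation is like a VAR with known
coefficients.
This leads to:
$\Sigma^{-1} | y^T, \beta^T \sim W(\bar{S}^{-1}, \bar{\nu})$
where
$\bar{\nu} = T + \underline{\nu}$
$\bar{S} = \underline{S} + \sum_{t=1}^{T} (y_t - W_t \delta - Z_t \beta_t)(y_t - W_t \delta - Z_t \beta_t)'$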

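Conditional on $\beta^T$, state equation is also like a VAR with known
coefficients.
This leads to:
$Q^{-1} | y^T, \beta^T \sim W(\bar{Q}^{-1}, \bar{\nu}_Q)$
where
$\bar{\nu}_Q = T + \underline{\nu}_Q$
$\bar{Q} = \underline{Q} + \sum_{t=1}^{T} (\beta_{t+1} - \beta_t)(\beta_{t+1} - \beta_t)'$.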
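The two Wishart updates can be sketched as follows, assuming NumPy is available. This is only an illustration of the formulas above, with the residuals $y_t - W_t \delta - Z_t \beta_t$ and the states supplied as arrays. The function names are hypothetical, the Wishart draw uses the sum-of-outer-products construction (valid for integer degrees of freedom no smaller than the dimension), and the state array is assumed to hold $T + 1$ rows so that there are $T$ increments.

```python
import numpy as np

def wishart_draw(df, scale, rng):
    """Draw from W(scale, df), integer df >= dim, as a sum of df outer products."""
    dim = scale.shape[0]
    chol = np.linalg.cholesky(scale)
    x = rng.standard_normal((df, dim)) @ chol.T   # rows are N(0, scale)
    return x.T @ x

def draw_sigma_inv(resid, S_prior, nu_prior, rng):
    """resid: T x M array of measurement equation residuals."""
    nu_post = resid.shape[0] + nu_prior           # nu_bar = T + nu
    S_post = S_prior + resid.T @ resid            # S_bar = S + sum of outer products
    return wishart_draw(nu_post, np.linalg.inv(S_post), rng), S_post, nu_post

def draw_q_inv(beta, Q_prior, nuq_prior, rng):
    """beta: (T + 1) x k array of states, giving T increments."""
    d = np.diff(beta, axis=0)                     # beta_{t+1} - beta_t
    nuq_post = d.shape[0] + nuq_prior
    Q_post = Q_prior + d.T @ d
    return wishart_draw(nuq_post, np.linalg.inv(Q_post), rng), Q_post, nuq_post

rng = np.random.default_rng(0)
T, M, k = 100, 3, 4                               # illustrative sizes
resid = rng.standard_normal((T, M))
beta = np.cumsum(0.1 * rng.standard_normal((T + 1, k)), axis=0)
Sigma_inv, S_post, nu_post = draw_sigma_inv(resid, np.eye(M), M + 1, rng)
Q_inv, Q_post, nuq_post = draw_q_inv(beta, 0.01 * np.eye(k), k + 1, rng)
```

In a Gibbs sampler these two draws would alternate with a draw of the states from a state space simulation smoother.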

Nonlinear State Space Models
Normal linear state space model useful for empirical macroeconomists

E.g. trend-cycle decompositions, TVP-VARs, linearized DSGE
models, etc.
Some models have yt being a nonlinear function of the states (e.g.
DSGE models which have not been linearized)
Increasing number of Bayesian tools for nonlinear state space models
(e.g. the particle filter)
Here we will focus on stochastic volatility

Univariate Stochastic Volatility
Begin with $y_t$ being a scalar (common in finance)

Stochastic volatility model:
$y_t = \exp(h_t / 2) \varepsilon_t$
$h_{t+1} = \mu + \phi (h_t - \mu) + \eta_t$
$\varepsilon_t$ is i.i.d. $N(0, 1)$ and $\eta_t$ is i.i.d. $N(0, \sigma_\eta^2)$. $\varepsilon_t$ and $\eta_s$ are
independent.
This is a state space model with states being $h_t$, but the measurement
equation is not a linear function of $h_t$

$h_t$ is log of the variance of $y_t$ (log volatility)
Since variances must be positive, common to work with log-variances
Note $\mu$ is the unconditional mean of $h_t$.
Initial conditions: if $|\phi| < 1$ (stationary) then:
$h_0 \sim N(\mu, \sigma_\eta^2 / (1 - \phi^2))$
If $\phi = 1$, $\mu$ drops out of the model and the stationary distribution above
no longer exists, so need a prior such as $h_0 \sim N(\underline{h}, \underline{V}_h)$
e.g. Primiceri (2005) chooses $\underline{V}_h$ using training sample
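A short simulation sketch of the univariate stochastic volatility model in the stationary case, using only the standard library. The function name and parameter values are illustrative assumptions. The initial log-volatility is drawn from the stationary distribution, so the sketch deliberately rejects $|\phi| \geq 1$.

```python
import math
import random

def simulate_sv(T, mu, phi, sigma_eta, seed=0):
    """Simulate y_t = exp(h_t / 2) eps_t, h_{t+1} = mu + phi (h_t - mu) + eta_t."""
    if not abs(phi) < 1:
        raise ValueError("stationary initial condition needs |phi| < 1")
    rng = random.Random(seed)
    # h_0 ~ N(mu, sigma_eta^2 / (1 - phi^2))
    h = rng.gauss(mu, sigma_eta / math.sqrt(1.0 - phi ** 2))
    hs, ys = [], []
    for _ in range(T):
        h = mu + phi * (h - mu) + rng.gauss(0.0, sigma_eta)  # state equation
        hs.append(h)
        ys.append(math.exp(h / 2.0) * rng.gauss(0.0, 1.0))   # measurement equation
    return ys, hs

ys, hs = simulate_sv(T=500, mu=-1.0, phi=0.95, sigma_eta=0.2)
```

With `phi` close to one the simulated series shows the volatility clustering that motivates the model.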

MCMC Algorithm for Stochastic Volatility Model
MCMC algorithm involves sequentially drawing from
$p(h^T | y^T, \mu, \phi, \sigma_\eta^2)$, $p(\phi | y^T, \mu, \sigma_\eta^2, h^T)$, $p(\mu | y^T, \phi, \sigma_\eta^2, h^T)$ and
$p(\sigma_\eta^2 | y^T, \mu, \phi, h^T)$
The last three have standard forms based on results from the Normal linear
regression model and will not be presented here.
Several algorithms exist for $p(h^T | y^T, \mu, \phi, \sigma_\eta^2)$
Here we describe a popular one from Kim, Shephard and Chib (1998,
ReStud)
For complete details, see their paper. Here we outline ideas.

Square and log the measurement equation:
$y_t^* = h_t + \varepsilon_t^*$
where $y_t^* = \ln(y_t^2)$ and $\varepsilon_t^* = \ln(\varepsilon_t^2)$.

Now the measurement equation is linear so maybe we can use
algorithm for Normal linear state space model?
No, since error is no longer Normal (i.e. $\varepsilon_t^* = \ln(\varepsilon_t^2)$)
Idea: use mixture of different Normal distributions to approximate
distribution of $\varepsilon_t^*$.
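A quick Monte Carlo check of why the transformed error is not Normal, using only the standard library. The log of a squared standard Normal draw is the log of a chi-squared variable with one degree of freedom, whose mean is $-(\gamma + \ln 2) \approx -1.27$ and whose variance is $\pi^2 / 2 \approx 4.93$. The sketch compares simulated moments with these values and also reports the skewness. The function name and sample size are assumptions.

```python
import math
import random

def log_eps_squared_draws(n, seed=0):
    """Draws of ln(eps^2) with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    return [math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n)]

draws = log_eps_squared_draws(200_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
skew = sum((d - mean) ** 3 for d in draws) / len(draws) / var ** 1.5

# Known moments of the log of a chi-squared(1) variable
exact_mean = -(0.5772156649015329 + math.log(2.0))   # Euler's constant plus ln 2
exact_var = math.pi ** 2 / 2.0

print(mean, exact_mean)
print(var, exact_var)
print(skew)   # clearly negative: a long left tail, unlike a Normal
```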

Mixtures of Normal distributions are very flexible and have been used
widely in many fields to approximate unknown or inconvenient
distributions.
$p(\varepsilon_t^*) \approx \sum_{i=1}^{7} q_i f_N(\varepsilon_t^* | m_i, v_i^2)$
where $f_N(\varepsilon_t^* | m_i, v_i^2)$ is the p.d.f. of a $N(m_i, v_i^2)$

Since $\varepsilon_t$ is $N(0, 1)$, $\varepsilon_t^*$ involves no unknown parameters
Thus, $q_i, m_i, v_i^2$ for $i = 1, .., 7$ are not parameters, but numbers (see
Table 4 of Kim, Shephard and Chib, 1998).

Mixture of Normals can also be written in terms of component
indicator variables, $s_t \in \{1, 2, .., 7\}$
$\varepsilon_t^* | s_t = i \sim N(m_i, v_i^2)$
$\Pr(s_t = i) = q_i$
MCMC algorithm does not draw from $p(h^T | y^T, \mu, \phi, \sigma_\eta^2)$, but from
$p(h^T | y^T, \mu, \phi, \sigma_\eta^2, s^T)$.
But, conditional on $s^T$, we know which of the Normals each $\varepsilon_t^*$ comes from.
Result is a Normal linear state space model and familiar algorithm can
be used.
Finally, need $p(s^T | y^T, \mu, \phi, \sigma_\eta^2, h^T)$ but this has simple form (see
Kim, Shephard and Chib, 1998)
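The indicator step can be sketched as below. Conditional on $h_t$, the residual $y_t^* - h_t$ equals $\varepsilon_t^*$, so by Bayes' rule $\Pr(s_t = i | y_t^*, h_t) \propto q_i f_N(y_t^* - h_t | m_i, v_i^2)$, independently across $t$. The three-component weights, means and variances in the code are placeholders chosen for illustration. They are not the seven-component values from Table 4 of Kim, Shephard and Chib (1998), which would be substituted in practice. Function names are hypothetical and indicators are zero-based.

```python
import math
import random

# Placeholder mixture (NOT the Kim-Shephard-Chib Table 4 values)
WEIGHTS = [0.2, 0.5, 0.3]      # q_i
MEANS = [-4.0, -1.0, 0.5]      # m_i
VARIANCES = [4.0, 1.5, 0.5]    # v_i^2

def indicator_probs(y_star, h):
    """Pr(s_t = i | y*_t, h_t), computed in logs for numerical stability."""
    e = y_star - h
    logw = [math.log(q) - 0.5 * math.log(2.0 * math.pi * v2)
            - 0.5 * (e - m) ** 2 / v2
            for q, m, v2 in zip(WEIGHTS, MEANS, VARIANCES)]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    return [wi / total for wi in w]

def draw_indicators(y_star, h, rng):
    """Draw s_t for every t; the draws are independent across t given h^T."""
    return [rng.choices(range(len(WEIGHTS)), weights=indicator_probs(ys, hs))[0]
            for ys, hs in zip(y_star, h)]

rng = random.Random(0)
s = draw_indicators([-6.0, 0.4, -1.2], [0.0, 0.0, 0.0], rng)
```

Given the drawn indicators, each period has a Normal measurement error with known mean $m_{s_t}$ and variance $v_{s_t}^2$, which is what lets a linear state space sampler draw $h^T$.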

Multivariate Stochastic Volatility
$y_t$ is now $M \times 1$ vector and $\varepsilon_t$ is i.i.d. $N(0, \Sigma_t)$.

Many ways of allowing $\Sigma_t$ to be time-varying
But must worry about overparameterization problems
$\Sigma_t$ for $t = 1, .., T$ contains $\frac{T M (M+1)}{2}$ unknown parameters
Here we discuss three particular approaches popular in
macroeconomics
To focus on multivariate stochastic volatility, use model:
$y_t = \varepsilon_t$

Multivariate Stochastic Volatility Model 1
$\Sigma_t = D_t$
where $D_t$ is a diagonal matrix with diagonal elements $d_{it}$
$d_{it}$ has standard univariate stochastic volatility specification
$d_{it} = \exp(h_{it})$ and
$h_{i,t+1} = \mu_i + \phi_i (h_{it} - \mu_i) + \eta_{it}$
If $\eta_{it}$ are independent (across both $i$ and $t$) then Kim, Shephard and
Chib (1998) MCMC algorithm can be used one equation at a time.
But many interesting macroeconomic features (e.g. impulse
responses) depend on error covariances so assuming $\Sigma_t$ to be diagonal
often will be a bad idea.

Cogley and Sargent (2005, RED)
$\Sigma_t = L^{-1} D_t L^{-1\prime}$
$D_t$ is as in Model 1 (diagonal matrix with diagonal elements being
variances)
$L$ is a lower triangular matrix with ones on the diagonal.
E.g. $M = 3$
$L = \begin{bmatrix} 1 & 0 & 0 \\ L_{21} & 1 & 0 \\ L_{31} & L_{32} & 1 \end{bmatrix}$

We can transform model as:
$L y_t = L \varepsilon_t$
$\varepsilon_t^* = L \varepsilon_t$ will now have a diagonal covariance matrix, so we can use
algorithm for Model 1.
MCMC algorithm: $p(h^T | y^T, L)$ can use Kim, Shephard and Chib
(1998) algorithm one equation at a time.
$p(L | y^T, h^T)$ results similar to those from a series of $M$ regression
equations with independent Normal errors.
See Cogley and Sargent (2005) for details.

Cogley-Sargent model allows the covariance between errors to change
over time, but in restricted fashion.
E.g. $M = 2$ then $\mathrm{cov}(\varepsilon_{1t}, \varepsilon_{2t}) = -d_{1t} L_{21}$ which varies proportionally
with the error variance of the first equation.
Impulse response analysis: a shock to $i$th variable has an effect on $j$th
variable which is constant over time
In many macroeconomic applications this is too restrictive.
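A small numerical check of the Cogley-Sargent structure, assuming NumPy and illustrative values for $L_{21}$ and the variances (none of these numbers come from the lecture). It builds $\Sigma_t$ from $L$ and $D_t$ as defined above, confirms that $L \Sigma_t L'$ recovers the diagonal $D_t$, and shows that the off-diagonal element of $\Sigma_t$ is a fixed multiple of $d_{1t}$. With $\Sigma_t$ defined through $L^{-1}$, that multiple works out to $-L_{21}$.

```python
import numpy as np

def sigma_from_L_D(L, d):
    """Sigma_t = L^{-1} D_t L^{-1}' with D_t = diag(d)."""
    L_inv = np.linalg.inv(L)
    return L_inv @ np.diag(d) @ L_inv.T

L = np.array([[1.0, 0.0],
              [0.4, 1.0]])                  # L21 = 0.4 (illustrative)
for d in ([1.0, 2.0], [3.0, 2.0]):          # two dates with different d_1t
    Sigma = sigma_from_L_D(L, d)
    transformed = L @ Sigma @ L.T           # covariance of L eps_t, equals D_t
    ratio = Sigma[0, 1] / d[0]              # same at both dates: -L21
    print(np.round(transformed, 10), ratio)
```

Because the ratio does not move with $t$, only the variances drive changes in the covariance, which is the restriction described above.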

Primiceri (2005, ReStud):
$\Sigma_t = L_t^{-1} D_t L_t^{-1\prime}$
$L_t$ is same as Cogley-Sargent's $L$ but is now time varying.

Does not restrict $\Sigma_t$ in any way.
MCMC algorithm same as for Cogley-Sargent except for $L_t$

How does $L_t$ evolve?
Stack unrestricted elements by rows into a $\frac{M(M-1)}{2}$ vector as
$l_t = (L_{21,t}, L_{31,t}, L_{32,t}, .., L_{M(M-1),t})'$.
$l_{t+1} = l_t + \zeta_t$
$\zeta_t$ is i.i.d. $N(0, D_\zeta)$ and $D_\zeta$ is a diagonal matrix.
Can transform model so that algorithm for Normal linear state space
model can draw $l_t$
See Primiceri (2005) for details
Note: if $D_\zeta$ is not diagonal have to be careful (no longer Normal state
space model)
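The stacking convention and the random walk for $l_t$ can be sketched as follows, assuming NumPy. The function names, the starting value $l_1 = 0$ and the choice to hold the log-variances fixed are simplifying assumptions made so the example isolates the role of $L_t$. They are not part of Primiceri's specification.

```python
import numpy as np

def L_from_l(l, M):
    """Rebuild the unit lower triangular L_t from l_t stacked by rows."""
    L = np.eye(M)
    L[np.tril_indices(M, k=-1)] = l          # order: L21, L31, L32, ...
    return L

def simulate_sigma_path(T, M, d_zeta, h, seed=0):
    """Sigma_t = L_t^{-1} D L_t^{-1}' with l_{t+1} = l_t + zeta_t.

    d_zeta: diagonal of D_zeta (length M(M-1)/2); h: fixed log-variances.
    """
    rng = np.random.default_rng(seed)
    n = M * (M - 1) // 2
    l = np.zeros(n)                          # assumed starting value
    sigmas = []
    for _ in range(T):
        L_inv = np.linalg.inv(L_from_l(l, M))
        sigmas.append(L_inv @ np.diag(np.exp(h)) @ L_inv.T)
        l = l + np.sqrt(d_zeta) * rng.standard_normal(n)   # random walk step
    return np.array(sigmas)

sigmas = simulate_sigma_path(T=50, M=3, d_zeta=np.full(3, 0.01), h=np.zeros(3))
print(sigmas[0])     # identity at the start, given l_1 = 0 and h = 0
print(sigmas[-1])    # off-diagonal elements have drifted
```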

More Uses of State Space Models
MCMC algorithms such as the Gibbs sampler are modular in nature
(sequentially draw from blocks)
By combining simple blocks together you can end up with very
flexible models
This is the strategy pursued here.
For state space models there is a standard set of algorithms which
can be combined together in various ways to produce quite
sophisticated models
Our MCMC algorithms for complicated models all combine simpler
algorithms.
Now see how this works with TVP-VARs

TVP-VARs
Why TVP-VARs?
Example: U.S. monetary policy
Were the high inflation and slow growth of the 1970s due to bad
policy or bad luck?
Some have argued that the way the Fed reacted to inflation has
changed over time
After 1980, Fed became more aggressive in fighting inflation pressures
than before
This is the “bad policy” story (change in the monetary policy
transmission mechanism)
This story depends on having VAR coefficients different in the 1970s
than subsequently.

Others think that variance of the exogenous shocks hitting economy
has changed over time
Perhaps this may explain apparent changes in monetary policy.
This is the “bad luck” story (i.e. 1970s volatility was high, adverse
shocks hit economy, whereas later policymakers had the good fortune
of the Great Moderation of the business cycle – at least until 2008)
This motivates the need to add multivariate stochastic volatility to VAR
models
Cannot check whether volatility has been changing with a
homoskedastic model

Most macroeconomic applications of interest involve several variables
(so need multivariate model like VAR)
Also need VAR coefficients changing
Also need multivariate stochastic volatility
TVP-VARs are most popular models with such features
But others exist (Markov-switching VARs, Vector Floor and Ceiling
Model, etc.)

Homoskedastic TVP-VARs
Begin by assuming $\Sigma_t = \Sigma$
Remember VAR notation: $y_t$ is $M \times 1$ vector, $Z_t$ is $M \times k$ matrix
(defined so as to allow for a VAR with different lagged dependent and
exogenous variables in each equation).
TVP-VAR:
$y_t = Z_t \beta_t + \varepsilon_t$
$\beta_{t+1} = \beta_t + u_t$
$\varepsilon_t$ is i.i.d. $N(0, \Sigma)$ and $u_t$ is i.i.d. $N(0, Q)$.
$\varepsilon_t$ and $u_s$ are independent of one another for all $s$ and $t$.
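One common way to build $Z_t$ is sketched below, assuming NumPy and assuming every equation contains the same regressors (an intercept plus $p$ lags of all variables), so that $Z_t = I_M \otimes x_t'$. The more general case with different regressors per equation is not covered. The function name and the random data are illustrative.

```python
import numpy as np

def build_Z(y, p):
    """Return an array of shape (T - p, M, k) with Z_t = I_M kron x_t'.

    x_t = (1, y_{t-1}', .., y_{t-p}')', so k = M * (1 + M * p).
    """
    T, M = y.shape
    Z = []
    for t in range(p, T):
        x = np.concatenate([[1.0]] + [y[t - j] for j in range(1, p + 1)])
        Z.append(np.kron(np.eye(M), x[None, :]))   # row i holds x_t' in block i
    return np.array(Z)

rng = np.random.default_rng(0)
y = rng.standard_normal((10, 3))    # M = 3 variables
Z = build_Z(y, p=2)
print(Z.shape)                      # (8, 3, 21): k = 3 * (1 + 3 * 2) = 21
```

With this layout the state vector $\beta_t$ stacks the coefficients equation by equation.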

Bayesian inference in this model?
Already done: this is just the Normal linear state space model of the
last lecture.
MCMC algorithm of standard form (e.g. Carter and Kohn, 1994).
But let us see how it works in practice in our empirical application
Follow Primiceri (2005)

Illustration of Bayesian TVP-VAR Methods
Same quarterly US data set from 1953Q1 to 2006Q3 as was used to
illustrate VAR methods
Three variables: inflation rate $\Delta \pi_t$, the unemployment rate $u_t$ and
the interest rate $r_t$
VAR lag length is 2.
Training sample prior: prior hyperparameters are set to OLS quantities
calculated using an initial part of the data
Our training sample contains 40 observations.
Data through 1962Q4 used to choose prior hyperparameter values,
then Bayesian estimation uses data beginning in 1963Q1.

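$\beta_{OLS}$ is OLS estimate of VAR coefficients in constant-coefficient VAR
using training sample
$V(\beta_{OLS})$ is estimated covariance of $\beta_{OLS}$.
Prior for $\beta_0$:
$\beta_0 \sim N(\beta_{OLS}, 4 \cdot V(\beta_{OLS}))$
Prior for $\Sigma^{-1}$: Wishart prior with $\underline{\nu} = M + 1$, $\underline{S} = I$
Prior for $Q^{-1}$: Wishart prior with $\underline{\nu}_Q = 40$, $\underline{Q} = 0.0001 \cdot 40 \cdot V(\beta_{OLS})$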
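A hedged sketch of how the training sample hyperparameters could be computed, assuming NumPy, the same regressors in every equation, and the usual OLS covariance $\hat{\Sigma} \otimes (X'X)^{-1}$ for the coefficients stacked equation by equation. The function name and dictionary keys are hypothetical, and simulated data stand in for the US data set, which is not reproduced here.

```python
import numpy as np

def training_sample_prior(y, p, n_train):
    """OLS on the first n_train rows of y (T x M); returns prior hyperparameters."""
    M = y.shape[1]
    Y = y[p:n_train]
    X = np.column_stack([np.ones(n_train - p)] +
                        [y[p - j:n_train - j] for j in range(1, p + 1)])
    B = np.linalg.lstsq(X, Y, rcond=None)[0]           # (1 + M p) x M
    resid = Y - X @ B
    Sigma_hat = resid.T @ resid / (X.shape[0] - X.shape[1])
    V = np.kron(Sigma_hat, np.linalg.inv(X.T @ X))     # cov of stacked coefficients
    return {
        "beta0_mean": B.flatten(order="F"),            # equation-by-equation stacking
        "beta0_cov": 4.0 * V,                          # 4 * V(beta_OLS)
        "Sigma_df": M + 1, "Sigma_scale": np.eye(M),   # nu = M + 1, S = I
        "Q_df": 40, "Q_scale": 0.0001 * 40 * V,        # nu_Q = 40
    }

rng = np.random.default_rng(0)
y = rng.standard_normal((60, 3))                       # placeholder data
prior = training_sample_prior(y, p=2, n_train=40)
print(prior["beta0_mean"].shape, prior["Q_scale"].shape)
```

The small scale on $\underline{Q}$ encodes a prior belief that coefficients change only gradually between periods.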

With TVP-VAR we have different set of VAR coefficients in every
time period
So different impulse responses in every time period.
Figure 1 presents impulse responses to a monetary policy shock in
three time periods: 1975Q1, 1981Q3 and 1996Q1.
Impulse responses defined in same way as we did for VAR
Posterior median is solid line and dotted lines are 10th and 90th
percentiles.

Figure 1: Impulse responses at different times
TVP-VARs with Stochastic Volatility
In empirical work, you will usually want to add multivariate stochastic
volatility to the TVP-VAR
But this can be dealt with quickly, since the appropriate algorithms
were described in the lecture on State Space Modelling
Remember, in particular, the approaches of Cogley and Sargent
(2005) and Primiceri (2005).
MCMC: need only add another block to our algorithm to draw $\Sigma_t$ for
$t = 1, .., T$.
Homoskedastic TVP-VAR MCMC: $p(Q^{-1} | y^T, \beta^T)$,
$p(\beta^T | y^T, \Sigma, Q)$ and $p(\Sigma^{-1} | y^T, \beta^T)$
Heteroskedastic TVP-VAR MCMC: $p(Q^{-1} | y^T, \beta^T)$,
$p(\beta^T | y^T, \Sigma_1, .., \Sigma_T, Q)$ and $p(\Sigma_1^{-1}, .., \Sigma_T^{-1} | y^T, \beta^T)$

Empirical Illustration of Bayesian Inference in TVP-VARs
with Stochastic Volatility
Continue same illustration as before.

All details as for homoskedastic TVP–VAR
Plus allow for multivariate stochastic volatility as in Primiceri (2005).
Priors as in Primiceri
Can present empirical features of interest such as impulse responses
But (for brevity) just present volatility information
Figure 2: time-varying standard deviations of the errors in the three
equations (i.e. the posterior means of the square roots of the diagonal
elements of $\Sigma_t$)

Figure 2: Volatilities in the 3 Equations
Summary of TVP-VARs
TVP-VARs are useful for empirical macroeconomists since they:

are multivariate
allow for VAR coefficients to change
allow for error variances to change
They are state space models so Bayesian inference can use familiar
MCMC algorithms developed for state space models.
Much recent work on shrinkage priors for TVP-VARs to avoid
over-parameterization concerns
