
Time Series Analysis

Andrea Beccarini
Center for Quantitative Economics

Winter 2013/2014


Introduction
Objectives

Time series are ubiquitous in economics, and very important in macroeconomics and financial economics:
GDP, inflation rates, unemployment, interest rates, stock prices
You will learn . . .
the formal mathematical treatment of time series and stochastic
processes
what the most important standard models in economics are
how to fit models to real world time series


Introduction
Prerequisites

Descriptive Statistics
Probability Theory
Statistical Inference


Introduction
Class and material

Class
Class teacher: Sarah Meyer
Time: Tu., 12:00-14:00
Location: CAWM 3
Start: 22 October 2013
Material
Course page on Blackboard
Slides and class material are (or will be) downloadable


Introduction
Literature

Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3. Aufl., Teubner, Wiesbaden (available online in the RUB-Netz).
Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton.
Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York.
Schlittgen, Rainer und Streitberg, Bernd (1997), Zeitreihenanalyse, 7. Aufl., Oldenbourg, München.


Basics
Definition

Definition: Time series


A sequence of observations ordered by time is called a time series
Time series can be univariate or multivariate
Time can be discrete or continuous
The states can be discrete or continuous


Basics
Definition

Typical notations
x_1, x_2, …, x_T
or x(1), x(2), …, x(T)
or x_t, t = 1, …, T
or (x_t)_{t≥0}
This course is about . . .
univariate time series
in discrete time
with continuous states


Basics
Examples

Quarterly GDP Germany, 1991 I to 2012 II

(Figure: GDP in current billion Euro plotted against time, 1995–2010)


Basics
Examples

DAX index and log(DAX), 31.12.1964 to 6.4.2009

(Figure: two panels against time, 1970–2010; top: the DAX index, bottom: the logarithm of the DAX)


Basics
Definition

Definition: Stochastic process

A sequence (X_t)_{t∈T} of random variables, all defined on the same probability space (Ω, A, P), is called a stochastic process with discrete time parameter (usually T = ℕ or T = ℤ)
Short version: A stochastic process is a sequence of random variables
A stochastic process depends on both chance and time


Basics
Definition

Distinguish four cases: both time t and chance ω can be fixed or variable

                ω fixed                              ω variable
t fixed         X_t(ω) is a real number              X_t(ω) is a random variable
t variable      X_t(ω) is a sequence of real         X_t(ω) is a stochastic process
                numbers (path, realization,
                trajectory)

process.R


Basics
Examples

Example 1: White noise
ε_t ~ NID(0, σ²)

Example 2: Random walk
X_t = X_{t−1} + ε_t with X_0 = 0 and ε_t ~ NID(0, σ²)

Example 3: A random constant
X_t = Z with Z ~ N(0, σ²)

Basics
Moment functions

Definition: Moment functions

The following functions of time are called moment functions:
μ(t) = E(X_t)  (expectation function)
σ²(t) = Var(X_t)  (variance function)
γ(s, t) = Cov(X_s, X_t)  (covariance function)
Correlation function (autocorrelation function):
ρ(s, t) = γ(s, t) / √(σ²(s) σ²(t))
moments.R


Basics
Estimation of moment functions

Usually, the moment functions are unknown and have to be estimated
Problem: Only a single path (realization) can be observed

X_1^(1)   X_1^(2)   …   X_1^(n)
X_2^(1)   X_2^(2)   …   X_2^(n)
   ⋮         ⋮               ⋮
X_T^(1)   X_T^(2)   …   X_T^(n)

Can we still estimate the expectation function μ(t) and the autocovariance function γ(s, t)? Under which conditions?


Basics
Estimation of moment functions

Usually, the expectation function μ(t) should be estimated by averaging over realizations,

μ̂(t) = (1/n) Σ_{i=1}^{n} X_t^(i)


Basics
Estimation of moment functions

Under certain conditions, μ(t) can be estimated by averaging over time,

μ̂ = (1/T) Σ_{t=1}^{T} X_t^(1)


Basics
Estimation of moment functions

Usually, the autocovariance γ(t, t + h) should be estimated by averaging over realizations,

γ̂(t, t + h) = (1/n) Σ_{i=1}^{n} (X_t^(i) − μ̂(t))(X_{t+h}^(i) − μ̂(t + h))


Basics
Estimation of moment functions

Under certain conditions, γ(t, t + h) can be estimated by averaging over time,

γ̂(t, t + h) = (1/T) Σ_{t=1}^{T−h} (X_t^(1) − μ̂)(X_{t+h}^(1) − μ̂)


Basics
Definition

Moment functions cannot be estimated without additional assumptions since only one path is observed
There are restrictions which allow one to estimate the moment functions
Restriction of the time heterogeneity:
The distribution of (X_t(ω))_{t∈T} must not be completely different for each t ∈ T
Restriction of the memory:
If the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution


Basics
Restriction of time heterogeneity: Stationarity

Definition: Strong stationarity

Let (X_t)_{t∈T} be a stochastic process, and let t_1, …, t_n ∈ T be an arbitrary number n ∈ ℕ of arbitrary time points.
(X_t)_{t∈T} is called strongly stationary if for arbitrary h ∈ ℤ
P(X_{t_1} ≤ x_1, …, X_{t_n} ≤ x_n) = P(X_{t_1+h} ≤ x_1, …, X_{t_n+h} ≤ x_n)
Implication: all univariate marginal distributions are identical


Basics
Restriction of time heterogeneity: Stationarity

Definition: Weak stationarity

(X_t)_{t∈T} is called weakly stationary if
1. the expectation exists and is constant: E(X_t) = μ < ∞ for all t ∈ T
2. the variance exists and is constant: Var(X_t) = σ² < ∞ for all t ∈ T
3. for all t, s, r ∈ ℤ (in the admissible range)
   γ(t, s) = γ(t + r, s + r)

Simplified notation for covariance and correlation functions:
γ(h) = γ(t, t + h)
ρ(h) = ρ(t, t + h)


Basics
Restriction of time heterogeneity: Stationarity

Strong stationarity implies weak stationarity (but only if the first two moments exist)
A stochastic process is called Gaussian if the joint distribution of X_{t_1}, …, X_{t_n} is multivariate normal
For Gaussian processes, weak and strong stationarity coincide
Intuition: An observed time series can be regarded as a realization of a stationary process if a gliding window of appropriate width always displays qualitatively the same picture

stationary.R
Examples


Basics
Restriction of memory: Ergodicity

Definition: Ergodicity (I)

Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation μ and autocovariance γ(h); define

μ̂_T = (1/T) Σ_{t=1}^{T} X_t

(X_t)_{t∈T} is called (expectation) ergodic if
lim_{T→∞} E[(μ̂_T − μ)²] = 0


Basics
Restriction of memory: Ergodicity

Definition: Ergodicity (II)

Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation μ and autocovariance γ(h); define

γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − μ)(X_{t+h} − μ)

(X_t)_{t∈T} is called (covariance) ergodic if for all h ∈ ℤ
lim_{T→∞} E[(γ̂(h) − γ(h))²] = 0


Basics
Restriction of memory: Ergodicity

Ergodicity is consistency (in quadratic mean) of the estimators μ̂ of μ and γ̂(h) of γ(h) for dependent observations
The process (X_t)_{t∈T} is expectation ergodic if (γ(h))_{h∈ℤ} is absolutely summable, i.e.

Σ_{h=−∞}^{∞} |γ(h)| < ∞

The dependence between far-away observations must be sufficiently small


Basics
Restriction of memory: Ergodicity

Ergodicity condition (for autocovariance): A stationary Gaussian process (X_t)_{t∈T} with absolutely summable autocovariance function γ(h) is (autocovariance) ergodic
Under ergodicity, the law of large numbers holds even if the observations are dependent
If the dependence γ(h) does not diminish fast enough, the estimators are no longer consistent
Examples


Basics
Estimation of moment functions

Summary of estimators (electricity.R)

μ̂ = X̄_T = (1/T) Σ_{t=1}^{T} X_t

γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − μ̂)(X_{t+h} − μ̂)

ρ̂(h) = γ̂(h) / γ̂(0)

Sometimes, γ̂(h) is defined with the factor 1/(T − h) instead
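A minimal R sketch of these estimators (x is assumed to be any observed numeric series; this is not necessarily the content of moments.R or electricity.R):

# Moment estimators with divisor T, as defined above
T <- length(x)
mu_hat <- mean(x)
gamma_hat <- function(h) sum((x[1:(T - h)] - mu_hat) * (x[(1 + h):T] - mu_hat)) / T
rho_hat <- function(h) gamma_hat(h) / gamma_hat(0)
# Base R's acf() uses the same divisor-T convention:
acf(x, lag.max = 20, type = "covariance")   # gamma-hat(h)
acf(x, lag.max = 20)                        # rho-hat(h)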


Basics
Estimation of moment functions

A closer look at the expectation estimator

The estimator μ̂ is unbiased, i.e. E(μ̂) = μ
The variance of μ̂ is

Var(μ̂) = γ(0)/T + (2/T) Σ_{h=1}^{T−1} (1 − h/T) γ(h)

Under ergodicity, for T → ∞,

T · Var(μ̂) → γ(0) + 2 Σ_{h=1}^{∞} γ(h) = Σ_{h=−∞}^{∞} γ(h)

Basics
Estimation of moment functions

For Gaussian processes, μ̂ is normally distributed,
μ̂ ~ N(μ, Var(μ̂)),
and asymptotically

√T (μ̂ − μ) → Z ~ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))

For non-Gaussian processes, μ̂ is (often) asymptotically normal,

√T (μ̂ − μ) → Z ~ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))


Basics
Estimation of moment functions

A closer look at the autocovariance estimators γ̂(h)

For Gaussian processes with absolutely summable covariance function, the random vector
(√T(γ̂(0) − γ(0)), …, √T(γ̂(K) − γ(K)))′
is multivariate normal with expectation vector (0, …, 0)′ and

T · Cov(γ̂(h₁), γ̂(h₂)) → Σ_{r=−∞}^{∞} (γ(r) γ(r + h₁ + h₂) + γ(r − h₂) γ(r + h₁))


Basics
Estimation of moment functions

A closer look at the autocorrelation estimators ρ̂(h)

For Gaussian processes with absolutely summable covariance function, the random vector
(√T(ρ̂(0) − ρ(0)), …, √T(ρ̂(K) − ρ(K)))′
is multivariate normal with expectation vector (0, …, 0)′ and a complicated covariance matrix
Be careful: For small to medium sample sizes the autocovariance and autocorrelation estimators are biased!
autocorr.R


Basics
Estimation of moment functions

An important special case for autocorrelation estimators:
Let (ε_t) be a white-noise process with Var(ε_t) = σ² < ∞; then
E(ρ̂(h)) = −T⁻¹ + O(T⁻²)
Cov(ρ̂(h₁), ρ̂(h₂)) = T⁻¹ + O(T⁻²) for h₁ = h₂, and O(T⁻²) else
For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation −T⁻¹ and variance T⁻¹


Mathematical digression (I)


Complex numbers

Some quadratic equations do not have real solutions, e.g.
x² + 1 = 0
Still it is possible (and sensible) to define solutions to such equations
The definition in common notation is
i = √(−1)
where i is the number which, when squared, equals −1
The number i is called imaginary (i.e. not real)


Mathematical digression (I)


Complex numbers

Other imaginary numbers follow from this definition, e.g.
√(−16) = √16 · √(−1) = 4i
√(−5) = √5 · √(−1) = √5 i
Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. 5 − 8i or a + bi
Such numbers are called complex and the set of complex numbers is denoted as ℂ
The pair a + bi and a − bi is called conjugate complex


Mathematical digression (I)


Complex numbers

Geometric interpretation:
(Figure: the complex plane; a + bi has real part a on the real axis and imaginary part b on the imaginary axis; the length of the arrow from the origin to a + bi is the absolute value)


Mathematical digression (I)


Complex numbers

Polar coordinates and Cartesian coordinates:

z = a + bi = r(cos φ + i sin φ) = r e^{iφ}

a = r cos φ
b = r sin φ
r = √(a² + b²)
φ = arctan(b/a)


Mathematical digression (I)


Complex numbers

Rules of calculus:
Addition:
(a + bi) + (c + di) = (a + c) + (b + d)i
Multiplication (Cartesian coordinates):
(a + bi)(c + di) = (ac − bd) + (ad + bc)i
Multiplication (polar coordinates):
r₁e^{iφ₁} · r₂e^{iφ₂} = r₁r₂ e^{i(φ₁+φ₂)}


Mathematical digression (I)

Complex numbers

Addition:
(Figure: in the complex plane, a + bi and c + di add like vectors; the sum (a + c) + (b + d)i is the diagonal of the parallelogram)


Mathematical digression (I)

Complex numbers

Multiplication:
(Figure: in the complex plane, multiplying two complex numbers with absolute values r₁, r₂ and angles φ₁, φ₂ gives absolute value r = r₁r₂ and angle φ = φ₁ + φ₂)


Mathematical digression (I)


Complex numbers

The quadratic equation
x² + px + q = 0
has the solutions
x = −p/2 ± √(p²/4 − q)
If p²/4 − q < 0 the solutions are complex (and conjugate)


Mathematical digression (I)


Complex numbers

Example: The solutions of
x² − 2x + 5 = 0
are
x = −(−2)/2 + √((−2)²/4 − 5) = 1 + 2i
and
x = −(−2)/2 − √((−2)²/4 − 5) = 1 − 2i


Mathematical digression (II)


Linear difference equations

First order difference equation with initial value x₀:
x_t = c + φ₁ x_{t−1}
p-th order difference equation with initial value x₀:
x_t = c + φ₁ x_{t−1} + … + φ_p x_{t−p}
A sequence (x_t)_{t=0,1,…} that satisfies the difference equation is called a solution of the difference equation
Examples (diffequation.R)
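Solutions can be inspected numerically by iterating the recursion; a minimal sketch under illustrative coefficients and initial values (not necessarily what diffequation.R contains):

# Iterate the homogeneous difference equation x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p}
iterate <- function(phi, x0, n) {
  p <- length(phi)
  x <- c(x0, numeric(n))
  for (t in (p + 1):length(x))
    x[t] <- sum(phi * x[(t - 1):(t - p)])
  x
}
# Converges to zero: both roots of 1 - 0.5z - 0.3z^2 lie outside the unit circle
plot(iterate(c(0.5, 0.3), x0 = c(1, 1), n = 50), type = "b", ylab = "x_t")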


Mathematical digression (II)


Linear difference equations

We only consider the homogeneous case, i.e. c = 0
The general solution of the first-order difference equation
x_t = φ₁ x_{t−1}
is
x_t = A φ₁^t
with arbitrary constant A, since x_t = Aφ₁^t = φ₁ · Aφ₁^{t−1} = φ₁ x_{t−1}
The constant is definitized by the initial condition, A = x₀
The sequence x_t = Aφ₁^t is convergent if and only if |φ₁| < 1


Mathematical digression (II)


Linear difference equations

Solution of the p-th order difference equation
x_t = φ₁x_{t−1} + … + φ_p x_{t−p}
Let x_t = A z^{−t}; then
A z^{−t} = φ₁ A z^{−(t−1)} + … + φ_p A z^{−(t−p)}
z^{−t} = φ₁ z^{−(t−1)} + … + φ_p z^{−(t−p)}
and thus
1 − φ₁z¹ − … − φ_p z^p = 0
→ Characteristic polynomial, characteristic equation


Mathematical digression (II)


Linear difference equations

There are p (possibly complex, possibly nondistinct) solutions of the characteristic equation
Denote the solutions (called roots) by z₁, …, z_p
If all roots are real and distinct, then
x_t = A₁ z₁^{−t} + … + A_p z_p^{−t}
is a solution of the homogeneous difference equation
If there are complex roots, the solution is oscillating
The constants A₁, …, A_p can be definitized with p initial conditions (x₀, x₁, …, x_{p−1})


Mathematical digression (II)


Linear difference equations

Stability condition: The linear difference equation
x_t = φ₁x_{t−1} + … + φ_p x_{t−p}
is stable (i.e. convergent) if and only if all roots of the characteristic polynomial
1 − φ₁z − … − φ_p z^p = 0
are outside the unit circle, i.e. |z_i| > 1 for all i = 1, …, p
In R, the stability condition can be checked easily using the
commands polyroot (base package) or ArmaRoots (fArma package)
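For instance, for a hypothetical AR(2)-type recursion with φ₁ = 0.5 and φ₂ = 0.3 (coefficients chosen for illustration):

phi <- c(0.5, 0.3)
z <- polyroot(c(1, -phi))   # roots of 1 - 0.5z - 0.3z^2 (coefficients in increasing order)
Mod(z)                      # stable iff all moduli are larger than 1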


ARMA models
Definition

Definition: ARMA process


Let (ε_t)_{t∈T} be a white noise process; the stochastic process
X_t = φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
with φ_p, θ_q ≠ 0 is called an ARMA(p, q) process
→ AutoRegressive Moving Average process
ARMA processes are important since every stationary process can be approximated by an ARMA process


ARMA models
Lag operator and lag polynomial

The lag operator is a convenient notational tool
The lag operator L shifts the time index of a stochastic process
L (X_t)_{t∈T} = (X_{t−1})_{t∈T}
L X_t = X_{t−1}
Rules:
L² X_t = L(L X_t) = X_{t−2}
Lⁿ X_t = X_{t−n}
L⁻¹ X_t = X_{t+1}
L⁰ X_t = X_t

ARMA models
Lag operator and lag polynomial

Lag polynomial
A(L) = a₀ + a₁L + a₂L² + … + a_p L^p
Example: Let A(L) = 1 − 0.5L and B(L) = 1 + 4L²; then
C(L) = A(L)B(L) = (1 − 0.5L)(1 + 4L²) = 1 − 0.5L + 4L² − 2L³
Lag polynomials can be treated in the same way as ordinary polynomials


ARMA models
Lag operator and lag polynomial

Define the lag polynomials
φ(L) = 1 − φ₁L − … − φ_p L^p
θ(L) = 1 + θ₁L + … + θ_q L^q
The ARMA(p, q) process can be written compactly as
φ(L)X_t = θ(L)ε_t
Important special cases:
MA(q) process: X_t = ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
AR(1) process: X_t = φ₁X_{t−1} + ε_t
AR(p) process: X_t = φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t


ARMA models
MA(q) process

The MA(q) process is
X_t = θ(L)ε_t
X_t = ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
with ε_t ~ NID(0, σ²)
Expectation function:
E(X_t) = E(ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q})
       = E(ε_t) + θ₁E(ε_{t−1}) + … + θ_q E(ε_{t−q})
       = 0


ARMA models
MA(q) process

Autocovariance function:
γ(s, t) = E[(ε_s + θ₁ε_{s−1} + … + θ_q ε_{s−q})(ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q})]
= E[ε_s ε_t + θ₁ε_s ε_{t−1} + θ₂ε_s ε_{t−2} + … + θ_q ε_s ε_{t−q}
  + θ₁ε_{s−1}ε_t + θ₁²ε_{s−1}ε_{t−1} + θ₁θ₂ε_{s−1}ε_{t−2} + … + θ₁θ_q ε_{s−1}ε_{t−q}
  + …
  + θ_q ε_{s−q}ε_t + θ₁θ_q ε_{s−q}ε_{t−1} + θ₂θ_q ε_{s−q}ε_{t−2} + … + θ_q² ε_{s−q}ε_{t−q}]
The expectations of the cross products are
E(ε_s ε_t) = 0 for s ≠ t and σ² for s = t

ARMA models
MA(q) process

Define θ₀ = 1; then
γ(t, t) = σ² Σ_{i=0}^{q} θᵢ²
γ(t−1, t) = σ² Σ_{i=0}^{q−1} θᵢθ_{i+1}
γ(t−2, t) = σ² Σ_{i=0}^{q−2} θᵢθ_{i+2}
⋮
γ(t−q, t) = σ² θ₀θ_q = σ² θ_q
γ(s, t) = 0 for s < t − q
Hence, MA(q) processes are always stationary
Simulation of MA(q) processes (maqsim.R)
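A minimal simulation sketch with illustrative MA(2) coefficients (not necessarily those used in maqsim.R):

set.seed(42)
theta <- c(0.7, -0.3)
x <- arima.sim(model = list(ma = theta), n = 1000)   # simulate an MA(2) process
acf(x)   # empirical autocorrelations should cut off after lag q = 2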


ARMA models
AR(1) process

The AR(1) process is
φ(L)X_t = ε_t
(1 − φ₁L)X_t = ε_t
X_t = φ₁X_{t−1} + ε_t
with ε_t ~ NID(0, σ²)
Expectation and variance function
Stability condition: AR(1) processes are stable if |φ₁| < 1


ARMA models
AR(1) process

Stationarity: Stable AR(1) processes are weakly stationary if
E(X₀) = 0
Var(X₀) = σ² / (1 − φ₁²)
Nonstationary stable processes converge towards stationarity
It is common parlance to call stable processes stationary
Covariance function of the stationary AR(1) process: γ(h) = φ₁^{|h|} σ² / (1 − φ₁²)

ARMA models
AR(p) process

The AR(p) process is
φ(L)X_t = ε_t
X_t = φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t
with ε_t ~ NID(0, σ²)
Assumption: ε_t is independent from X_{t−1}, X_{t−2}, … (innovations)
Expectation function
The covariance function is complicated (ar2autocov.R)


ARMA models
AR(p) process

AR(p) processes are stable if all roots of the characteristic equation
φ(z) = 0
are larger than 1 in absolute value, |z_i| > 1 for i = 1, …, p
An AR(p) process is weakly stationary if the joint distribution of the p initial values (X₀, X₋₁, …, X₋₍p₋₁₎) is appropriate
Stable AR(p) processes converge towards stationarity; they are often called stationary
Simulation of AR(p) processes (arpsim.R); a sketch follows below
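A minimal simulation sketch with illustrative stable AR(2) coefficients (not necessarily those used in arpsim.R):

set.seed(42)
phi <- c(0.5, 0.3)
x <- arima.sim(model = list(ar = phi), n = 1000)   # arima.sim() refuses non-stationary ar coefficients
plot(x)
acf(x)   # geometric-type decay instead of a cut-off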


ARMA models
Invertibility

AR and MA processes can be inverted (into each other)
Example: Consider the stable AR(1) process with |φ₁| < 1:
X_t = φ₁X_{t−1} + ε_t
    = φ₁(φ₁X_{t−2} + ε_{t−1}) + ε_t
    = φ₁²X_{t−2} + φ₁ε_{t−1} + ε_t
    ⋮
    = φ₁ⁿX_{t−n} + φ₁^{n−1}ε_{t−(n−1)} + … + φ₁²ε_{t−2} + φ₁ε_{t−1} + ε_t


ARMA models
Invertibility

Since |φ₁| < 1,
X_t = Σ_{i=0}^{∞} φ₁^i ε_{t−i}
    = ε_t + ψ₁ε_{t−1} + ψ₂ε_{t−2} + …
with ψ_i = φ₁^i
A stable AR(1) process can be written as an MA(∞) process (the same is true for stable AR(p) processes)


ARMA models
Invertibility

Using lag polynomials this can be written as
(1 − φ₁L)X_t = ε_t
X_t = (1 − φ₁L)⁻¹ ε_t
X_t = Σ_{i=0}^{∞} (φ₁L)^i ε_t
General compact and elegant notation:
φ(L)X_t = ε_t
X_t = (φ(L))⁻¹ ε_t = ψ(L)ε_t


ARMA models
Invertibility

MA(q) can be written as AR(∞) if all roots of θ(z) = 0 are larger than 1 in absolute value (invertibility condition)
Example: MA(1) with |θ₁| < 1; from
X_t = ε_t + θ₁ε_{t−1}
θ₁X_{t−1} = θ₁ε_{t−1} + θ₁²ε_{t−2}
we find X_t = θ₁X_{t−1} + ε_t − θ₁²ε_{t−2}
Repeated substitution of the ε_{t−i} terms yields
X_t = Σ_{i=1}^{∞} π_i X_{t−i} + ε_t with π_i = (−1)^{i+1} θ₁^i


ARMA models
Invertibility

Summary
ARMA(p, q) processes are stable if all roots of
φ(z) = 0
are larger than 1 in absolute value
ARMA(p, q) processes are invertible if all roots of
θ(z) = 0
are larger than 1 in absolute value


ARMA models
Invertibility

Sometimes (e.g. for proofs), it is useful to write an ARMA(p, q) process either as AR(∞) or as MA(∞)
ARMA(p, q) can be written as AR(∞) or MA(∞):
φ(L)X_t = θ(L)ε_t
X_t = (φ(L))⁻¹ θ(L) ε_t
(θ(L))⁻¹ φ(L) X_t = ε_t

Time Series Analysis

Winter 2013/2014

61 / 143

ARMA models
Deterministic components

Until now we only considered processes with zero expectation
Many processes have both a zero-expectation stochastic component (Y_t) and a non-zero deterministic component (D_t)
Examples:
linear trend D_t = a + bt
exponential trend D_t = ab^t
seasonal patterns
Let (X_t)_{t∈ℤ} be a stochastic process with deterministic component D_t and define Y_t = X_t − D_t


ARMA models
Deterministic components

Then E(Y_t) = 0 and
Cov(Y_t, Y_s) = E[(Y_t − E(Y_t))(Y_s − E(Y_s))]
= E[(X_t − D_t − E(X_t − D_t))(X_s − D_s − E(X_s − D_s))]
= E[(X_t − E(X_t))(X_s − E(X_s))]
= Cov(X_t, X_s)
The covariance function does not depend on the deterministic component
To derive the covariance function of a stochastic process, simply drop the deterministic component


ARMA models
Deterministic components

Special case: D_t = μ_t = μ
ARMA(p, q) process with constant (non-zero) expectation:
X_t − μ = φ₁(X_{t−1} − μ) + … + φ_p(X_{t−p} − μ) + ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
The process can also be written as
X_t = c + φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
where c = μ(1 − φ₁ − … − φ_p)


ARMA models
Deterministic components

Wold's representation theorem: Every stationary stochastic process (X_t)_{t∈T} can be represented as

X_t = Σ_{h=0}^{∞} ψ_h ε_{t−h} + D_t

with ψ₀ = 1, Σ_{h=0}^{∞} ψ_h² < ∞, and ε_t white noise with variance σ² > 0
Stationary stochastic processes can be written as a sum of a deterministic process and an MA(∞) process
Often, low order ARMA(p, q) processes can approximate MA(∞) processes well


ARMA models
Linear processes and filter

Definition: Linear process

Let (ε_t)_{t∈ℤ} be a white noise process; a stochastic process (X_t)_{t∈ℤ} is called linear if it can be written as

X_t = Σ_{h=−∞}^{∞} ψ_h ε_{t−h} = ψ(L)ε_t

where the coefficients are absolutely summable, i.e. Σ_{h=−∞}^{∞} |ψ_h| < ∞.
The lag polynomial ψ(L) is called a (linear) filter


ARMA models
Linear processes and filter

Some special filters
Change from previous period (difference filter):
ψ(L) = 1 − L
Change from last year (for quarterly or monthly data):
ψ(L) = 1 − L⁴
ψ(L) = 1 − L¹²
Elimination of seasonal influences (quarterly data):
ψ(L) = (1 + L + L² + L³)/4
ψ(L) = 0.125L⁻² + 0.25L⁻¹ + 0.25 + 0.25L + 0.125L²

ARMA models
Linear processes and filter

Hodrick-Prescott filter (important tool in empirical macroeconomics)
Decompose a time series (X_t) into a long-term growth component (G_t) and a short-term cyclical component (C_t):
X_t = G_t + C_t
Trade-off between goodness-of-fit and smoothness of G_t
Minimize the criterion function

Σ_{t=1}^{T} (X_t − G_t)² + λ Σ_{t=2}^{T−1} [(G_{t+1} − G_t) − (G_t − G_{t−1})]²

with respect to G_t for given smoothness parameter λ


ARMA models
Linear processes and filter

The FOCs of the minimization problem are
(G₁, …, G_T)′ = A (X₁, …, X_T)′
where A = (I + λK′K)⁻¹ with the (T−2) × T second-difference matrix

K = ( 1 −2  1  0 …  0  0  0 )
    ( 0  1 −2  1 …  0  0  0 )
    ( ⋮              ⋱     ⋮ )
    ( 0  0  0  0 …  1 −2  1 )

ARMA models
Linear processes and filter

The HP filter is a linear filter
Typical values for the smoothing parameter λ:
λ = 10       annual data
λ = 1600     quarterly data
λ = 14400    monthly data
Implementation in R (code by Olaf Posch)
Empirical examples (hpfilter.R); a minimal sketch follows below
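A minimal HP-filter sketch that implements the first order conditions above directly (for long series a sparse implementation would be preferable; the reference implementation is the one in hpfilter.R):

hp_filter <- function(x, lambda = 1600) {
  T <- length(x)
  K <- matrix(0, T - 2, T)                  # (T-2) x T second-difference matrix
  for (i in 1:(T - 2)) K[i, i:(i + 2)] <- c(1, -2, 1)
  growth <- solve(diag(T) + lambda * crossprod(K), x)   # G = (I + lambda K'K)^{-1} X
  list(growth = growth, cycle = x - growth)
}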


Estimation of ARMA models


The estimation problem

Problem: The parameters φ₁, …, φ_p, θ₁, …, θ_q, σ² of an ARMA(p, q) process are usually unknown
They have to be estimated from an observed time series X₁, …, X_T
Standard estimation methods:
Least squares (OLS)
Maximum likelihood (ML)
Assumption: the lag orders p and q are known


Estimation of ARMA models


Least squares estimation of AR(p) models

The AR(p) model with non-zero constant expectation
X_t = c + φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t
can be written in matrix notation,

( X_{p+1} )   ( 1  X_p      X_{p−1}  …  X_1     ) ( c   )   ( ε_{p+1} )
( X_{p+2} ) = ( 1  X_{p+1}  X_p      …  X_2     ) ( φ₁  ) + ( ε_{p+2} )
(    ⋮    )   ( ⋮     ⋮        ⋮     ⋱     ⋮    ) (  ⋮  )   (    ⋮    )
( X_T     )   ( 1  X_{T−1}  X_{T−2}  …  X_{T−p} ) ( φ_p )   ( ε_T     )

Compact notation: y = Xβ + u


Estimation of ARMA models


Least squares estimation of AR(p) models

The standard least squares estimator is
β̂ = (X′X)⁻¹ X′y
The matrix of exogenous variables X is stochastic
→ usual results for OLS regression do not hold
But: There is no contemporaneous correlation between the error term and the exogenous variables
Hence, the OLS estimators are consistent and asymptotically efficient


Estimation of ARMA models


Least squares estimation of ARMA models

Solve the ARMA equation
X_t = c + φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
for ε_t:
ε_t = X_t − c − φ₁X_{t−1} − … − φ_p X_{t−p} − θ₁ε_{t−1} − … − θ_q ε_{t−q}
Define the residuals as functions of the unknown parameters:
ε̂_t(d, f₁, …, f_p, g₁, …, g_q) = X_t − d − f₁X_{t−1} − … − f_p X_{t−p} − g₁ε̂_{t−1} − … − g_q ε̂_{t−q}


Estimation of ARMA models


Least squares estimation of ARMA models

Define the sum of squared residuals

S(d, f₁, …, f_p, g₁, …, g_q) = Σ_{t=1}^{T} (ε̂_t(d, f₁, …, f_p, g₁, …, g_q))²

The least squares estimators are
(ĉ, φ̂₁, …, φ̂_p, θ̂₁, …, θ̂_q) = argmin S(d, f₁, …, f_p, g₁, …, g_q)
Since the residuals are defined recursively, one needs starting values ε₀, …, ε_{−q+1} and X₀, …, X_{−p+1} to calculate ε̂₁
Easiest way: Set all starting values to zero (→ conditional estimation)


Estimation of ARMA models


Least squares estimation of ARMA models

The first order conditions are a nonlinear equation system which cannot be solved easily
Minimization by standard numerical methods (implemented in all usual statistical packages)
Either solve the nonlinear first-order-condition equation system or minimize S directly
Simple special case: ARMA(1, 1); see the sketch below
arma11.R
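A minimal sketch of conditional least squares for the ARMA(1, 1) case (x is assumed to be the observed series; not necessarily the content of arma11.R):

css_arma11 <- function(x) {
  rss <- function(par) {
    d <- par[1]; f1 <- par[2]; g1 <- par[3]
    T <- length(x); e <- numeric(T)
    e[1] <- x[1] - d                       # starting values X_0 = eps_0 = 0
    for (t in 2:T)
      e[t] <- x[t] - d - f1 * x[t - 1] - g1 * e[t - 1]
    sum(e^2)                               # S(d, f1, g1)
  }
  optim(c(0, 0.1, 0.1), rss)$par           # (c-hat, phi1-hat, theta1-hat)
}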


Estimation of ARMA models


Maximum likelihood estimation

Additional assumption: The innovations ε_t are normally distributed
Implication: ARMA processes are Gaussian
The joint distribution of X₁, …, X_T is multivariate normal,
X = (X₁, …, X_T)′ ~ N(μ, Γ)


Estimation of ARMA models


Maximum likelihood estimation

Expectation vector

μ = E(X₁, …, X_T)′ = (c/(1 − φ₁ − … − φ_p), …, c/(1 − φ₁ − … − φ_p))′

Covariance matrix

Γ = Cov(X₁, …, X_T)′ = ( γ(0)     γ(1)     …  γ(T−1) )
                       ( γ(1)     γ(0)     …  γ(T−2) )
                       (   ⋮        ⋮      ⋱    ⋮    )
                       ( γ(T−1)   γ(T−2)   …  γ(0)   )

Estimation of ARMA models


Maximum likelihood estimation

The expectation vector and the covariance matrix contain all unknown parameters λ = (φ₁, …, φ_p, θ₁, …, θ_q, c, σ²)
The likelihood function is

L(λ; X) = (2π)^{−T/2} (det Γ)^{−1/2} exp(−½ (X − μ)′ Γ⁻¹ (X − μ))

and the loglikelihood function is

ln L(λ; X) = −(T/2) ln(2π) − ½ ln(det Γ) − ½ (X − μ)′ Γ⁻¹ (X − μ)

The ML estimators are λ̂ = argmax ln L(λ; X)


Estimation of ARMA models


Maximum likelihood estimation

The loglikelihood function has to be maximized by numerical methods
Standard properties of ML estimators:
1. consistency
2. asymptotic efficiency
3. asymptotically jointly normally distributed
4. the covariance matrix of the estimators can be consistently estimated

Example: ML estimation of an ARMA(3, 3) model for the interest rate spread (arma33.R)


Estimation of ARMA models


Hypothesis tests

Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses:
H₀: g(λ) = 0
H₁: not H₀
where g(λ) is an m-valued function of the parameters
Example: If H₀: φ₁ = 0 then m = 1 and g(λ) = φ₁


Estimation of ARMA models


Hypothesis tests

Likelihood ratio test statistic
LR = 2(ln L(λ̂_ML) − ln L(λ̂_R))
where λ̂_ML and λ̂_R are the unrestricted and restricted estimators
Under the null hypothesis,
LR → U ~ χ²_m
and H₀ is rejected at significance level α if LR > χ²_{m;1−α}
Disadvantage: Two models must be estimated


Estimation of ARMA models


Hypothesis tests

For the Wald test we only consider g(λ) = λ − λ₀, i.e.
H₀: λ = λ₀
H₁: not H₀
Test statistic
W = (λ̂ − λ₀)′ [Côv(λ̂)]⁻¹ (λ̂ − λ₀)
If the null hypothesis is true then W → U ~ χ²_m
The asymptotic covariance matrix can be estimated consistently as Côv(λ̂) = H⁻¹, where H is the Hessian matrix returned by the maximization procedure


Estimation of ARMA models


Hypothesis tests

Test example 1:
H₀: φ₁ = 0
H₁: φ₁ ≠ 0
Test example 2:
H₀: λ = λ₀
H₁: not H₀
Illustration (arma33.R)


Estimation of ARMA models


Model selection

Usually, the lag orders p and q of an ARMA model are unknown
Trade-off: Goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation:

AIC = ln σ̂² + 2(p + q + 1)/T

where ln σ̂² measures goodness-of-fit and 2(p + q + 1)/T is the penalty
Choose the model with the smallest AIC


Estimation of ARMA models


Model selection

Bayesian information criterion BIC (Schwarz information criterion):
BIC = ln σ̂² + (p + q + 1) ln T / T
Hannan-Quinn information criterion:
HQ = ln σ̂² + 2(p + q + 1) ln(ln T) / T
Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R); a selection sketch follows below
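A minimal order-selection sketch using arima() (x is assumed to be the observed series; the grid bounds are illustrative):

grid <- expand.grid(p = 0:3, q = 0:3)
grid$aic <- grid$bic <- NA
for (i in seq_len(nrow(grid))) {
  fit <- try(arima(x, order = c(grid$p[i], 0, grid$q[i])), silent = TRUE)
  if (!inherits(fit, "try-error")) {
    grid$aic[i] <- AIC(fit)
    grid$bic[i] <- AIC(fit, k = log(length(x)))   # BIC penalty
  }
}
grid[which.min(grid$aic), ]   # order chosen by AIC
grid[which.min(grid$bic), ]   # order chosen by BIC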


Estimation of ARMA models


Model selection

Another illustration: The true model is ARMA(2, 1) with
X_t = 0.5X_{t−1} + 0.3X_{t−2} + ε_t + 0.7ε_{t−1}; 1000 samples of size n = 500 were generated; the tables show the model orders p and q as selected by AIC and BIC

Orders selected by AIC:
         q = 0   q = 1   q = 2   q = 3   q = 4   q = 5
p = 0        0       0       0       0       0       0
p = 1        0      18      64      23      14       6
p = 2        0     171      21      16       5       7
p = 3        0       7      35      58      80      45
p = 4        9       2      12     139      37      44
p = 5       11       6      12      56      46      56

Orders selected by BIC:
         q = 0   q = 1   q = 2   q = 3   q = 4   q = 5
p = 0        0       0       0       0       0       0
p = 1        0     310     167       4       0       0
p = 2        0     503       3       1       0       0
p = 3        1       0       2       1       0       0
p = 4        6       1       0       0       0       0
p = 5        1       0       0       0       0       0

Integrated processes
Difference operator

Define the difference operator
Δ = 1 − L;
then
ΔX_t = X_t − X_{t−1}
Second order differences:
Δ² = Δ(Δ) = (1 − L)² = 1 − 2L + L²
Higher orders Δⁿ are defined in the same way; note that Δⁿ ≠ 1 − Lⁿ


Integrated processes
Definition

Definition: Integrated process

A stochastic process is called integrated of order 1 if
ΔX_t = μ + ψ(L)ε_t
where ε_t is white noise, ψ(1) ≠ 0, and Σ_{j=0}^{∞} j|ψ_j| < ∞
Common notation: X_t ~ I(1)
I(1) processes are also called difference stationary or unit root processes
→ Stochastic and deterministic trends
Trend stationary processes are not I(1) (since ψ(1) = 0)


Integrated processes
Definition

Stationary processes are sometimes called I(0)
Higher order integrations are possible, e.g.
X_t ~ I(2)  ⇔  Δ²X_t ~ I(0)
In general, X_t ~ I(d) means that Δ^d X_t ~ I(0)
Most economic time series are either I(0) or I(1)
Some economic time series may be I(2)


Integrated processes
Definition

Example 1: The random walk with drift, X_t = b + X_{t−1} + ε_t, is I(1) because
ΔX_t = X_t − X_{t−1} = b + ε_t = b + ψ(L)ε_t
where ψ₀ = 1 and ψ_j = 0 for j ≠ 0


Integrated processes
Definition

Example 2: The trend stationary process, X_t = a + bt + ε_t, is not I(1) because
ΔX_t = b + ε_t − ε_{t−1} = b + ψ(L)ε_t
with ψ₀ = 1, ψ₁ = −1 and ψ_j = 0 for all other j (hence ψ(1) = 0)


Integrated processes
Definition

Example 3: The AR(2) process
X_t = b + (1 + φ)X_{t−1} − φX_{t−2} + ε_t
(1 − L)(1 − φL)X_t = b + ε_t
is I(1) if |φ| < 1 because ΔX_t = ψ(L)(b + ε_t) with
ψ(L) = (1 − φL)⁻¹ = 1 + φL + φ²L² + φ³L³ + φ⁴L⁴ + …
and thus ψ(1) = Σ_{i=0}^{∞} φ^i = 1/(1 − φ) ≠ 0. The roots of the characteristic equation are z = 1 and z = 1/φ


Integrated processes
Definition

Example 4: The process
X_t = 0.5X_{t−1} − 0.4X_{t−2} + ε_t
is a stationary (stable) zero expectation AR(2) process; the process
Y_t = a + bt + X_t
is trend stationary and I(0) since
ΔY_t = b + ΔX_t
with ΔX_t = ψ(L)ε_t = (1 − L)(1 − 0.5L + 0.4L²)⁻¹ ε_t
and therefore ψ(1) = 0 (i0andi1.R)

Integrated processes
Definition

Definition: ARIMA process

Let (ε_t)_{t∈T} be a white noise process; the stochastic process (X_t)_{t∈ℤ} is called an integrated autoregressive moving-average process of orders p, d and q, or ARIMA(p, d, q), if Δ^d X_t is an ARMA(p, q) process:
φ(L)Δ^d X_t = θ(L)ε_t
For d > 0 the process is nonstationary (I(d)) even if all roots of φ(z) = 0 are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)


Integrated processes
Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?


Reason 1: Long-term forecasts and forecasting errors
Deterministic trend: The forecasting error variance is bounded
Stochastic trend: The forecasting error variance is unbounded
Illustrations
i0andi1.R


Integrated processes
Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?


Reason 2: Spurious regression
OLS regressions will show spurious relationships between
time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends,
but it does not help if the series are integrated
Illustrations
spurious1.R


Integrated processes
Integrated processes and parameter estimation

OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case
X_t = φ₁X_{t−1} + ε_t
with ε_t ~ NID(0, 1) and X₀ = 0
The AR(1) process is stationary if |φ₁| < 1 and has a unit root if |φ₁| = 1


Integrated processes
Integrated processes and parameter estimation

The usual OLS estimator of φ₁ is

φ̂₁ = Σ_{t=1}^{T} X_t X_{t−1} / Σ_{t=1}^{T} X²_{t−1}

What does the distribution of φ̂₁ look like?
Influence of φ₁ and T
Consistency?
Asymptotic normality?
Illustration (phihat.R); a simulation sketch follows below
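A minimal sketch of what such an illustration might look like for the unit root case φ₁ = 1 (the sample size and replication count are illustrative, not necessarily those of phihat.R):

set.seed(1)
T <- 100; R <- 2000
phihat <- replicate(R, {
  x <- cumsum(rnorm(T))                   # random walk: phi1 = 1, X_0 = 0
  sum(x[-1] * x[-T]) / sum(x[-T]^2)       # OLS estimator of phi1
})
hist(phihat, breaks = 50)                 # left-skewed, clearly not normal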


Integrated processes
Integrated processes and parameter estimation

Consistency and asymptotic normality for I(0) processes (|φ₁| < 1):
plim φ̂₁ = φ₁
√T (φ̂₁ − φ₁) → Z ~ N(0, 1 − φ₁²)
Consistency and asymptotic normality for I(1) processes (φ₁ = 1):
plim φ̂₁ = 1
T (φ̂₁ − 1) → V
where V is a nondegenerate, nonnormal random variable
→ Root-T-consistency and superconsistency

Integrated processes
Unit root tests

Importance to distinguish between trend stationarity and difference stationarity
Test of the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: Linear regression
X_t = deterministics + φX_{t−1} + ε_t
ΔX_t = deterministics + (φ − 1)X_{t−1} + ε_t, with ρ := φ − 1


Integrated processes
Unit root tests

Null and alternative hypothesis:
H₀: φ = 1  (unit root)
H₁: |φ| < 1  (no unit root)
or, equivalently,
H₀: ρ = 0  (unit root)
H₁: ρ < 0  (no unit root)
Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root

Integrated processes
DF test and ADF test

Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests
Possible regressions:
X_t = φX_{t−1} + ε_t             or  ΔX_t = ρX_{t−1} + ε_t
X_t = a + φX_{t−1} + ε_t         or  ΔX_t = a + ρX_{t−1} + ε_t
X_t = a + bt + φX_{t−1} + ε_t    or  ΔX_t = a + bt + ρX_{t−1} + ε_t
Assumption for the Dickey-Fuller test: no autocorrelation in ε_t
If there is autocorrelation in ε_t, use the augmented DF test


Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 1: no constant, no trend
ΔX_t = ρX_{t−1} + ε_t
Null and alternative hypotheses:
H₀: ρ = 0
H₁: ρ < 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around zero


Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 2: constant, no trend
ΔX_t = a + ρX_{t−1} + ε_t
Null and alternative hypotheses:
H₀: ρ = 0   or   H₀: ρ = 0, a = 0
H₁: ρ < 0   or   H₁: ρ < 0, a ≠ 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around a constant


Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 3: constant and trend
ΔX_t = a + bt + ρX_{t−1} + ε_t
Null and alternative hypotheses:
H₀: ρ = 0   or   H₀: ρ = 0, b = 0
H₁: ρ < 0   or   H₁: ρ < 0, b ≠ 0
Null hypothesis: stochastic trend with drift
Alternative hypothesis: trend stationary process


Integrated processes
DF test and ADF test

Dickey-Fuller test statistics for single hypotheses:
ρ-test: T ρ̂
τ-test: ρ̂ / σ̂_ρ̂
The τ-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.6)


Integrated processes
DF test and ADF test

The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.7)
Illustrations (dftest.R)


Integrated processes
DF test and ADF test

If there is autocorrelation in ε_t, the DF test does not work (dftest.R)
Augmented Dickey-Fuller test (ADF test) regressions:
ΔX_t = c₁ΔX_{t−1} + … + c_pΔX_{t−p} + ρX_{t−1} + ε_t
ΔX_t = a + c₁ΔX_{t−1} + … + c_pΔX_{t−p} + ρX_{t−1} + ε_t
ΔX_t = a + bt + c₁ΔX_{t−1} + … + c_pΔX_{t−p} + ρX_{t−1} + ε_t
The added lagged differences capture the autocorrelation
The number of lags p must be large enough to make ε_t white noise
The critical values remain the same as in the no-correlation case
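In R, ADF tests are available in add-on packages; a hedged sketch (x is assumed to be the series under study, and the lag order is illustrative):

library(tseries)
adf.test(x, alternative = "stationary", k = 4)
# with an explicit choice of deterministics ("none", "drift", "trend"):
library(urca)
summary(ur.df(x, type = "drift", lags = 4))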


Integrated processes
DF test and ADF test

Further interesting topics (but we skip these)


Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity
H0 : Xt I (0)
H1 : Xt I (1)


Integrated processes
Regression with integrated processes

Spurious regression: If X_t and Y_t are independent but both I(1), then the regression
Y_t = α + βX_t + u_t
will result in an estimated coefficient β̂ that is significantly different from 0 with probability 1 as T → ∞
BUT: The regression
Y_t = α + βX_t + u_t
may be sensible even though X_t and Y_t are I(1)
→ Cointegration


Integrated processes
Regression with integrated processes

Definition: Cointegration
Two stochastic processes (X_t)_{t∈T} and (Y_t)_{t∈T} are cointegrated if both processes are I(1) and there is a constant β such that the process (Y_t − βX_t) is I(0)
If β is known, cointegration can be tested using a standard unit root test on the process (Y_t − βX_t)
If β is unknown, it can be estimated from the linear regression
Y_t = α + βX_t + u_t
and cointegration is tested using a modified unit root test on the residual process (û_t)_{t=1,…,T}; see the sketch below
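A minimal sketch of this two-step (Engle-Granger) procedure, assuming x and y are observed I(1) series:

step1 <- lm(y ~ x)              # estimate beta by OLS
uhat <- residuals(step1)
library(tseries)
adf.test(uhat)                  # caution: the appropriate critical values are the
                                # Engle-Granger ones, not the standard DF tables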

GARCH models
Conditional expectation

Let (X, Y) be a bivariate random variable with a joint density function; then

E(X | Y = y) = ∫ x f_{X|Y=y}(x) dx

is the conditional expectation of X given Y = y
E(X|Y) denotes a random variable with realization E(X|Y = y) if the random variable Y realizes as y
Both E(X|Y) and E(X|Y = y) are called conditional expectation


GARCH models
Conditional variance

Let (X, Y) be a bivariate random variable with a joint density function; then

Var(X | Y = y) = ∫ (x − E(X|Y = y))² f_{X|Y=y}(x) dx

is the conditional variance of X given Y = y
Var(X|Y) denotes a random variable with realization Var(X|Y = y) if the random variable Y realizes as y
Both Var(X|Y = y) and Var(X|Y) are called conditional variance


GARCH models
Rules for conditional expectations

Law of iterated expectations: E(E(X|Y)) = E(X)
If X and Y are independent, then E(X|Y) = E(X)
The condition can be treated like a constant,
E(XY|Y) = Y · E(X|Y)
The conditional expectation is a linear operator. For a₁, …, a_n ∈ ℝ,

E(Σ_{i=1}^{n} a_i X_i | Y) = Σ_{i=1}^{n} a_i E(X_i | Y)

GARCH models
Basics

Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, …
Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: Stationary AR(1) process, X_t = φX_{t−1} + ε_t with |φ| < 1; then
Var(X_t) = σ²_X = σ² / (1 − φ²),
and the conditional variance is
Var(X_t | X_{t−1}) = σ²


GARCH models
Basics

In the following, we will focus on stock returns


Empirical fact: squared (or absolute) returns are positively
autocorrelated
Implication: Returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?


GARCH models
ARCH(1)-process

Definition: ARCH(1)-process
The stochastic process (X_t)_{t∈ℤ} is called an ARCH(1)-process if
E(X_t | X_{t−1}) = 0
Var(X_t | X_{t−1}) = σ²_t = α₀ + α₁X²_{t−1}
for all t ∈ ℤ, with α₀, α₁ > 0
Often, an additional assumption is
X_t | (X_{t−1} = x_{t−1}) ~ N(0, α₀ + α₁x²_{t−1})


GARCH models
ARCH(1)-process

The unconditional distribution of X_t is a non-normal distribution
Leptokurtosis: The tails are heavier than the tails of the normal distribution
Example of an ARCH(1)-process:
X_t = σ_t ε_t
where (ε_t)_{t∈ℤ} is white noise with σ²_ε = 1 and
σ_t = √(α₀ + α₁X²_{t−1})


GARCH models
ARCH(1)-process

One can show that
E(X_t | X_{t−1}) = 0
E(X_t) = 0
Var(X_t | X_{t−1}) = α₀ + α₁X²_{t−1}
Var(X_t) = α₀ / (1 − α₁)
Cov(X_t, X_{t−i}) = 0 for i > 0
Stationarity condition: 0 < α₁ < 1
The unconditional kurtosis is 3(1 − α₁²)/(1 − 3α₁²) if ε_t ~ N(0, 1).
If α₁ > √(1/3) ≈ 0.57735, the kurtosis does not exist.

GARCH models
ARCH(1)-process

Squared returns follow
X²_t = α₀ + α₁X²_{t−1} + v_t
with v_t = σ²_t(ε²_t − 1)
Thus, squared returns of an ARCH(1) process are AR(1)
The process (v_t)_{t∈ℤ} is white noise:
E(v_t) = 0
Var(v_t) = E(v²_t) = const.
Cov(v_t, v_{t−i}) = 0 (i = 1, 2, …)

GARCH models
ARCH(1)-process

Simulation of an ARCH(1)-process for t = 1, …, 2500
Parameters: α₀ = 0.05, α₁ = 0.95, start value X₀ = 0
Conditional distribution: ε_t ~ N(0, 1)
archsim.R; a minimal sketch follows below
Check whether the simulated time series shows the typical stylized facts of return distributions
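A minimal sketch of this simulation (the parameter values are the ones stated above; not necessarily the exact content of archsim.R):

set.seed(1)
n <- 2500; a0 <- 0.05; a1 <- 0.95
x <- numeric(n); xlag <- 0               # start value X_0 = 0
eps <- rnorm(n)                          # eps_t ~ N(0, 1)
for (t in 1:n) {
  x[t] <- sqrt(a0 + a1 * xlag^2) * eps[t]
  xlag <- x[t]
}
plot(x, type = "l")                      # volatility clusters
acf(x^2)                                 # squared returns are autocorrelated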


GARCH models
Estimation of an ARCH(1)-process

Of course, we do not know the true values of the model parameters α₀ and α₁
How can we estimate the unknown parameters α₀ and α₁?
Observations X₁, …, X_T
Because of
X²_t = α₀ + α₁X²_{t−1} + v_t
a possible estimation method is OLS


GARCH models
Estimation of an ARCH(1)-process

OLS estimator of α₁:

α̂₁ = Σ_{t=2}^{T} (X²_t − X̄²)(X²_{t−1} − X̄²) / Σ_{t=2}^{T} (X²_{t−1} − X̄²)²

where X̄² denotes the sample mean of the squared observations
Careful: These estimators are only consistent if the kurtosis exists (i.e. if α₁ < √(1/3))
Test of ARCH-effects:
H₀: α₁ = 0
H₁: α₁ > 0


GARCH models
Estimation of an ARCH(1)-process

For T large, under H₀,
√T α̂₁ ~ N(0, 1)
Reject H₀ if √T α̂₁ > Φ⁻¹(1 − α)
Second version of this test: Consider the R² of the regression
X²_t = α₀ + α₁X²_{t−1} + v_t;
then under H₀, approximately,
T α̂₁² ≈ TR² ~ χ²₁
Reject H₀ if TR² > F⁻¹_{χ²₁}(1 − α)


GARCH models
ARCH(p)-process

Definition: ARCH(p)-process
The stochastic process (X_t)_{t∈ℤ} is called an ARCH(p)-process if
E(X_t | X_{t−1}, …, X_{t−p}) = 0
Var(X_t | X_{t−1}, …, X_{t−p}) = σ²_t = α₀ + α₁X²_{t−1} + … + α_p X²_{t−p}
for t ∈ ℤ, where αᵢ ≥ 0 for i = 0, 1, …, p − 1 and α_p > 0
Often, an additional assumption is that
X_t | (X_{t−1} = x_{t−1}, …, X_{t−p} = x_{t−p}) ~ N(0, σ²_t)


GARCH models
ARCH(p)-process

Example of an ARCH(p)-process:
X_t = σ_t ε_t
where (ε_t)_{t∈ℤ} is white noise with σ²_ε = 1 and
σ_t = √(α₀ + α₁X²_{t−1} + … + α_p X²_{t−p})
An ARCH(p) process is weakly stationary if all roots of
1 − α₁z − α₂z² − … − α_p z^p = 0 are outside the unit circle
Then, for all t ∈ ℤ, E(X_t) = 0 and
Var(X_t) = α₀ / (1 − Σ_{i=1}^{p} αᵢ)

GARCH models
ARCH(p)-process

If (X_t)_{t∈ℤ} is a stationary ARCH(p) process, then (X²_t)_{t∈ℤ} is a stationary AR(p) process:
X²_t = α₀ + α₁X²_{t−1} + … + α_p X²_{t−p} + v_t
As to the error term,
E(v_t) = 0
Var(v_t) = const.
Cov(v_t, v_{t−i}) = 0 for i = 1, 2, …
Simulating an ARCH(p) process is easy


GARCH models
Estimation of ARCH(p) models

OLS estimation of
X²_t = α₀ + α₁X²_{t−1} + … + α_p X²_{t−p} + v_t
Test of ARCH-effects:
H₀: α₁ = α₂ = … = α_p = 0  vs  H₁: not H₀
Let R² denote the coefficient of determination of the regression
Under H₀, the test statistic TR² ~ χ²_p;
thus reject H₀ if TR² > F⁻¹_{χ²_p}(1 − α)


GARCH models
Maximum likelihood estimation

Basic idea of the maximum likelihood estimation method:
Choose parameters such that the joint density of the observations
f_{X₁,…,X_T}(x₁, …, x_T)
is maximized
Let X₁, …, X_T denote a random sample from X
The density f_X(x; λ) depends on R unknown parameters λ = (λ₁, …, λ_R)


GARCH models
Maximum likelihood estimation

ML estimation of λ: Maximize the (log)likelihood function

L(λ) = f_{X₁,…,X_T}(x₁, …, x_T; λ) = Π_{t=1}^{T} f_X(x_t; λ)

ln L(λ) = Σ_{t=1}^{T} ln f_X(x_t; λ)

ML estimate:
λ̂ = argmax ln L(λ)


GARCH models
Maximum likelihood estimation

Since observations are independent in random samples,

f_{X₁,…,X_T}(x₁, …, x_T) = Π_{t=1}^{T} f_{X_t}(x_t)

or

ln f_{X₁,…,X_T}(x₁, …, x_T) = Σ_{t=1}^{T} ln f_{X_t}(x_t) = Σ_{t=1}^{T} ln f_X(x_t)

But: ARCH-returns are not independent!



GARCH models
Maximum likelihood estimation

Factorization with dependent observations:

f_{X₁,…,X_T}(x₁, …, x_T) = Π_{t=1}^{T} f_{X_t | X_{t−1},…,X₁}(x_t | x_{t−1}, …, x₁)

or

ln f_{X₁,…,X_T}(x₁, …, x_T) = Σ_{t=1}^{T} ln f_{X_t | X_{t−1},…,X₁}(x_t | x_{t−1}, …, x₁)

Hence, for an ARCH(1)-process,

f_{X₁,…,X_T}(x₁, …, x_T) = f_{X₁}(x₁) Π_{t=2}^{T} (1/√(2πσ²_t)) exp(−½ (x_t/σ_t)²)

GARCH models
Maximum likelihood estimation

The marginal density of X₁ is complicated but becomes negligible for large T and, therefore, will be dropped from now on
Log-likelihood function (without initial marginal density):

ln L(α₀, α₁ | x₁, …, x_T) = −((T−1)/2) ln(2π) − ½ Σ_{t=2}^{T} ln σ²_t − ½ Σ_{t=2}^{T} (x_t/σ_t)²

where σ²_t = α₀ + α₁x²_{t−1}
ML estimation of α₀ and α₁ by numerical maximization of ln L(α₀, α₁) with respect to α₀ and α₁; a sketch follows below
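A minimal sketch of this conditional ML estimation via optim() (x is assumed to be the observed return series):

loglik_arch1 <- function(par, x) {
  a0 <- par[1]; a1 <- par[2]
  if (a0 <= 0 || a1 < 0) return(-1e10)   # keep the conditional variance positive
  T <- length(x)
  s2 <- a0 + a1 * x[1:(T - 1)]^2         # sigma_t^2 for t = 2, ..., T
  sum(dnorm(x[2:T], sd = sqrt(s2), log = TRUE))
}
fit <- optim(c(0.1, 0.5), loglik_arch1, x = x,
             control = list(fnscale = -1))   # fnscale = -1: maximize
fit$par                                      # (alpha0-hat, alpha1-hat)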


GARCH models
GARCH(p,q)-process

Definition: GARCH(p,q)-process
The stochastic process (X_t)_{t∈ℤ} is called a GARCH(p, q)-process if
E(X_t | X_{t−1}, X_{t−2}, …) = 0
Var(X_t | X_{t−1}, X_{t−2}, …) = σ²_t
  = α₀ + α₁X²_{t−1} + … + α_p X²_{t−p} + β₁σ²_{t−1} + … + β_q σ²_{t−q}
for t ∈ ℤ with αᵢ, βᵢ ≥ 0
Often, an additional assumption is that
(X_t | X_{t−1} = x_{t−1}, X_{t−2} = x_{t−2}, …) ~ N(0, σ²_t)

GARCH models
GARCH(p,q)-process

Conditional variance of GARCH(1, 1):
Var(X_t | X_{t−1}, X_{t−2}, …) = σ²_t
  = α₀ + α₁X²_{t−1} + β₁σ²_{t−1}
  = α₀/(1 − β₁) + α₁ Σ_{i=1}^{∞} β₁^{i−1} X²_{t−i}
Unconditional variance:

Var(X_t) = α₀ / (1 − Σ_{i=1}^{p} αᵢ − Σ_{j=1}^{q} βⱼ)

GARCH models
GARCH(p,q)-process

Necessary condition for weak stationarity:

Σ_{i=1}^{p} αᵢ + Σ_{j=1}^{q} βⱼ < 1

(X_t)_{t∈ℤ} has no autocorrelation
GARCH-processes can be written as ARMA(max(p, q), q)-processes in the squared returns
Example: GARCH(1, 1)-process with X_t = σ_t ε_t and
σ²_t = α₀ + α₁X²_{t−1} + β₁σ²_{t−1}


GARCH models
Estimation of GARCH(p,q)-processes

Estimation of the ARMA(max(p, q), q)-process in the squared returns
Alternative (and better) method: Maximum likelihood
For a GARCH(1, 1)-process,

f_{X₁,…,X_T}(x₁, …, x_T) = f_{X₁}(x₁) Π_{t=2}^{T} (1/√(2πσ²_t)) exp(−½ (x_t/σ_t)²)

GARCH models
Estimation of GARCH(p,q)-processes

Again, the density of X₁ can be neglected
Log-likelihood function:

ln L(α₀, α₁, β₁ | x₁, …, x_T) = −((T−1)/2) ln(2π) − ½ Σ_{t=2}^{T} ln σ²_t − ½ Σ_{t=2}^{T} (x_t/σ_t)²

with σ²_t = α₀ + α₁x²_{t−1} + β₁σ²_{t−1} and σ²₁ = 0
ML estimation of α₀, α₁ and β₁ by numerical maximization


GARCH models
Estimation of GARCH(p,q)-processes

Conditional h-step forecast of the volatility σ²_{t+h} in a GARCH(1, 1) model:

E(σ²_{t+h} | X_t, X_{t−1}, …) = (α₁ + β₁)^h (σ²_t − α₀/(1 − α₁ − β₁)) + α₀/(1 − α₁ − β₁)

If the process is stationary,

lim_{h→∞} E(σ²_{t+h} | X_t, X_{t−1}, …) = α₀ / (1 − α₁ − β₁)

Simulation of GARCH-processes is easy; the estimation can be computer intensive

GARCH models
Residuals of an estimated GARCH(1,1) model

Careful: Residuals are slightly different from what you know from OLS regressions
Estimates: α̂₀, α̂₁, β̂₁, μ̂
From σ̂²_t = α̂₀ + α̂₁X²_{t−1} + β̂₁σ̂²_{t−1} and X_t = μ + σ_t ε_t we calculate the standardized residuals

ε̂_t = (X_t − μ̂)/σ̂_t = (X_t − μ̂) / √(α̂₀ + α̂₁X²_{t−1} + β̂₁σ̂²_{t−1})

Histogram of the standardized residuals


GARCH models
AR(p)-ARCH(q)-models

Definition: (X_t)_{t∈ℤ} is called an AR(p)-ARCH(q)-process if
X_t = ν + φ₁X_{t−1} + ε_t
σ²_t = α₀ + α₁ε²_{t−1}
where ε_t ~ N(0, σ²_t)
→ mean equation / variance equation
Maximum likelihood estimation


GARCH models
Extensions of the GARCH model

There are a number of possible extensions to the GARCH model:
Empirical fact: Negative shocks have a larger impact on volatility than positive shocks (leverage effect)
→ News impact curve
Nonnormal innovations, e.g. ε_t ~ t-distribution

