
Time Series Analysis

Andrea Beccarini
Center for Quantitative Economics

Winter 2013/2014


Introduction
Objectives

Time series are ubiquitous in economics, and very important in macroeconomics and financial economics:
GDP, inflation rates, unemployment, interest rates, stock prices
You will learn . . .
the formal mathematical treatment of time series and stochastic
processes
what the most important standard models in economics are
how to fit models to real world time series


Introduction
Prerequisites

Descriptive Statistics
Probability Theory
Statistical Inference


Introduction
Class and material

Class
Class teacher: Sarah Meyer
Time: Tu., 12:00-14:00
Location: CAWM 3
Start: 22 October 2013
Material
Course page on Blackboard
Slides and class material are (or will be) downloadable


Introduction
Literature

Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3. Aufl., Teubner, Wiesbaden (available online in the RUB-Netz).
Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton.
Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York.
Schlittgen, Rainer und Streitberg, Bernd (1997), Zeitreihenanalyse, 7. Aufl., Oldenbourg, München.


Basics
Definition

Definition: Time series


A sequence of observations ordered by time is called a time series
Time series can be univariate or multivariate
Time can be discrete or continuous
The states can be discrete or continuous


Basics
Definition

Typical notations
x_1, x_2, …, x_T
or x(1), x(2), …, x(T)
or x_t, t = 1, …, T
or (x_t)_{t≥0}
This course is about . . .
univariate time series
in discrete time
with continuous states


Basics
Examples

Quarterly GDP Germany, 1991 I to 2012 II

(Figure: GDP in current billion Euro plotted against time, 1995–2010)


Basics
Examples

DAX index and log(DAX), 31.12.1964 to 6.4.2009

(Figure: two panels against time, 1970–2010; top: the DAX index, bottom: the logarithm of the DAX)


Basics
Definition

Definition: Stochastic process

A sequence (X_t)_{t∈T} of random variables, all defined on the same probability space (Ω, A, P), is called a stochastic process with discrete time parameter (usually T = ℕ or T = ℤ)
Short version: A stochastic process is a sequence of random variables
A stochastic process depends on both chance and time


Basics
Definition

Distinguish four cases: both time t and chance ω can be fixed or variable

                ω fixed                              ω variable
t fixed         X_t(ω) is a real number              X_t(ω) is a random variable
t variable      X_t(ω) is a sequence of real         X_t(ω) is a stochastic process
                numbers (path, realization,
                trajectory)

process.R


Basics
Examples

Example 1: White noise
ε_t ~ NID(0, σ²)

Example 2: Random walk
X_t = X_{t−1} + ε_t with X_0 = 0 and ε_t ~ NID(0, σ²)

Example 3: A random constant
X_t = Z with Z ~ N(0, σ²)

Basics
Moment functions

Definition: Moment functions

The following functions of time are called moment functions:
μ(t) = E(X_t)  (expectation function)
σ²(t) = Var(X_t)  (variance function)
γ(s, t) = Cov(X_s, X_t)  (covariance function)
Correlation function (autocorrelation function):
ρ(s, t) = γ(s, t) / √(σ²(s) σ²(t))
moments.R


Basics
Estimation of moment functions

Usually, the moment functions are unknown and have to be estimated
Problem: Only a single path (realization) can be observed

X_1^(1)   X_1^(2)   …   X_1^(n)
X_2^(1)   X_2^(2)   …   X_2^(n)
   ⋮         ⋮               ⋮
X_T^(1)   X_T^(2)   …   X_T^(n)

Can we still estimate the expectation function μ(t) and the autocovariance function γ(s, t)? Under which conditions?


Basics
Estimation of moment functions

Usually, the expectation function μ(t) should be estimated by averaging over realizations,

μ̂(t) = (1/n) Σ_{i=1}^{n} X_t^(i)


Basics
Estimation of moment functions

Under certain conditions, μ(t) can be estimated by averaging over time,

μ̂ = (1/T) Σ_{t=1}^{T} X_t^(1)


Basics
Estimation of moment functions

Usually, the autocovariance γ(t, t + h) should be estimated by averaging over realizations,

γ̂(t, t + h) = (1/n) Σ_{i=1}^{n} (X_t^(i) − μ̂(t))(X_{t+h}^(i) − μ̂(t + h))


Basics
Estimation of moment functions

Under certain conditions, γ(t, t + h) can be estimated by averaging over time,

γ̂(t, t + h) = (1/T) Σ_{t=1}^{T−h} (X_t^(1) − μ̂)(X_{t+h}^(1) − μ̂)


Basics
Definition

Moment functions cannot be estimated without additional assumptions since only one path is observed
There are restrictions which allow one to estimate the moment functions
Restriction of the time heterogeneity:
The distribution of (X_t(ω))_{t∈T} must not be completely different for each t ∈ T
Restriction of the memory:
If the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution


Basics
Restriction of time heterogeneity: Stationarity

Definition: Strong stationarity

Let (X_t)_{t∈T} be a stochastic process, and let t_1, …, t_n ∈ T be an arbitrary number n ∈ ℕ of arbitrary time points.
(X_t)_{t∈T} is called strongly stationary if for arbitrary h ∈ ℤ
P(X_{t_1} ≤ x_1, …, X_{t_n} ≤ x_n) = P(X_{t_1+h} ≤ x_1, …, X_{t_n+h} ≤ x_n)
Implication: all univariate marginal distributions are identical


Basics
Restriction of time heterogeneity: Stationarity

Definition: Weak stationarity

(X_t)_{t∈T} is called weakly stationary if
1. the expectation exists and is constant: E(X_t) = μ < ∞ for all t ∈ T
2. the variance exists and is constant: Var(X_t) = σ² < ∞ for all t ∈ T
3. for all t, s, r ∈ ℤ (in the admissible range)
   γ(t, s) = γ(t + r, s + r)

Simplified notation for covariance and correlation functions:
γ(h) = γ(t, t + h)
ρ(h) = ρ(t, t + h)


Basics
Restriction of time heterogeneity: Stationarity

Strong stationarity implies weak stationarity (but only if the first two moments exist)
A stochastic process is called Gaussian if the joint distribution of X_{t_1}, …, X_{t_n} is multivariate normal
For Gaussian processes, weak and strong stationarity coincide
Intuition: An observed time series can be regarded as a realization of a stationary process if a gliding window of appropriate width always displays qualitatively the same picture

stationary.R
Examples


Basics
Restriction of memory: Ergodicity

Definition: Ergodicity (I)

Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation μ and autocovariance γ(h); define

μ̂_T = (1/T) Σ_{t=1}^{T} X_t

(X_t)_{t∈T} is called (expectation) ergodic if
lim_{T→∞} E[(μ̂_T − μ)²] = 0


Basics
Restriction of memory: Ergodicity

Definition: Ergodicity (II)

Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation μ and autocovariance γ(h); define

γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − μ)(X_{t+h} − μ)

(X_t)_{t∈T} is called (covariance) ergodic if for all h ∈ ℤ
lim_{T→∞} E[(γ̂(h) − γ(h))²] = 0


Basics
Restriction of memory: Ergodicity

Ergodicity is consistency (in quadratic mean) of the estimators μ̂ of μ and γ̂(h) of γ(h) for dependent observations
The process (X_t)_{t∈T} is expectation ergodic if (γ(h))_{h∈ℤ} is absolutely summable, i.e.

Σ_{h=−∞}^{∞} |γ(h)| < ∞

The dependence between far-away observations must be sufficiently small


Basics
Restriction of memory: Ergodicity

Ergodicity condition (for autocovariance): A stationary Gaussian process (X_t)_{t∈T} with absolutely summable autocovariance function γ(h) is (autocovariance) ergodic
Under ergodicity, the law of large numbers holds even if the observations are dependent
If the dependence γ(h) does not diminish fast enough, the estimators are no longer consistent
Examples


Basics
Estimation of moment functions

Summary of estimators (electricity.R)

μ̂ = X̄_T = (1/T) Σ_{t=1}^{T} X_t

γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − μ̂)(X_{t+h} − μ̂)

ρ̂(h) = γ̂(h) / γ̂(0)

Sometimes, γ̂(h) is defined with the factor 1/(T − h) instead
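A minimal R sketch of these estimators (x is assumed to be any observed numeric series; this is not necessarily the content of moments.R or electricity.R):

# Moment estimators with divisor T, as defined above
T <- length(x)
mu_hat <- mean(x)
gamma_hat <- function(h) sum((x[1:(T - h)] - mu_hat) * (x[(1 + h):T] - mu_hat)) / T
rho_hat <- function(h) gamma_hat(h) / gamma_hat(0)
# Base R's acf() uses the same divisor-T convention:
acf(x, lag.max = 20, type = "covariance")   # gamma-hat(h)
acf(x, lag.max = 20)                        # rho-hat(h)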


Basics
Estimation of moment functions

A closer look at the expectation estimator

The estimator μ̂ is unbiased, i.e. E(μ̂) = μ
The variance of μ̂ is

Var(μ̂) = γ(0)/T + (2/T) Σ_{h=1}^{T−1} (1 − h/T) γ(h)

Under ergodicity, for T → ∞,

T · Var(μ̂) → γ(0) + 2 Σ_{h=1}^{∞} γ(h) = Σ_{h=−∞}^{∞} γ(h)

Basics
Estimation of moment functions

For Gaussian processes, μ̂ is normally distributed,
μ̂ ~ N(μ, Var(μ̂)),
and asymptotically

√T (μ̂ − μ) → Z ~ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))

For non-Gaussian processes, μ̂ is (often) asymptotically normal,

√T (μ̂ − μ) → Z ~ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))


Basics
Estimation of moment functions

A closer look at the autocovariance estimators γ̂(h)

For Gaussian processes with absolutely summable covariance function, the random vector
(√T(γ̂(0) − γ(0)), …, √T(γ̂(K) − γ(K)))′
is multivariate normal with expectation vector (0, …, 0)′ and

T · Cov(γ̂(h₁), γ̂(h₂)) → Σ_{r=−∞}^{∞} (γ(r) γ(r + h₁ + h₂) + γ(r − h₂) γ(r + h₁))


Basics
Estimation of moment functions

A closer look at the autocorrelation estimators ρ̂(h)

For Gaussian processes with absolutely summable covariance function, the random vector
(√T(ρ̂(0) − ρ(0)), …, √T(ρ̂(K) − ρ(K)))′
is multivariate normal with expectation vector (0, …, 0)′ and a complicated covariance matrix
Be careful: For small to medium sample sizes the autocovariance and autocorrelation estimators are biased!
autocorr.R


Basics
Estimation of moment functions

An important special case for autocorrelation estimators:
Let (ε_t) be a white-noise process with Var(ε_t) = σ² < ∞; then
E(ρ̂(h)) = −T⁻¹ + O(T⁻²)
Cov(ρ̂(h₁), ρ̂(h₂)) = T⁻¹ + O(T⁻²) for h₁ = h₂, and O(T⁻²) else
For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation −T⁻¹ and variance T⁻¹


Mathematical digression (I)


Complex numbers

Some quadratic equations do not have real solutions, e.g.
x² + 1 = 0
Still it is possible (and sensible) to define solutions to such equations
The definition in common notation is
i = √(−1)
where i is the number which, when squared, equals −1
The number i is called imaginary (i.e. not real)


Mathematical digression (I)


Complex numbers

Other imaginary numbers follow from this definition, e.g.
√(−16) = √16 · √(−1) = 4i
√(−5) = √5 · √(−1) = √5 i
Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. 5 − 8i or a + bi
Such numbers are called complex and the set of complex numbers is denoted as ℂ
The pair a + bi and a − bi is called conjugate complex


Mathematical digression (I)


Complex numbers

Geometric interpretation:
(Figure: the complex plane; a + bi has real part a on the real axis and imaginary part b on the imaginary axis; the length of the arrow from the origin to a + bi is the absolute value)


Mathematical digression (I)


Complex numbers

Polar coordinates and Cartesian coordinates:

z = a + bi = r(cos φ + i sin φ) = r e^{iφ}

a = r cos φ
b = r sin φ
r = √(a² + b²)
φ = arctan(b/a)


Mathematical digression (I)


Complex numbers

Rules of calculus:
Addition:
(a + bi) + (c + di) = (a + c) + (b + d)i
Multiplication (Cartesian coordinates):
(a + bi)(c + di) = (ac − bd) + (ad + bc)i
Multiplication (polar coordinates):
r₁e^{iφ₁} · r₂e^{iφ₂} = r₁r₂ e^{i(φ₁+φ₂)}


Mathematical digression (I)

Complex numbers

Addition:
(Figure: in the complex plane, a + bi and c + di add like vectors; the sum (a + c) + (b + d)i is the diagonal of the parallelogram)


Mathematical digression (I)

Complex numbers

Multiplication:
(Figure: in the complex plane, multiplying two complex numbers with absolute values r₁, r₂ and angles φ₁, φ₂ gives absolute value r = r₁r₂ and angle φ = φ₁ + φ₂)


Mathematical digression (I)


Complex numbers

The quadratic equation
x² + px + q = 0
has the solutions
x = −p/2 ± √(p²/4 − q)
If p²/4 − q < 0 the solutions are complex (and conjugate)


Mathematical digression (I)


Complex numbers

Example: The solutions of
x² − 2x + 5 = 0
are
x = −(−2)/2 + √((−2)²/4 − 5) = 1 + 2i
and
x = −(−2)/2 − √((−2)²/4 − 5) = 1 − 2i


Mathematical digression (II)


Linear difference equations

First order difference equation with initial value x₀:
x_t = c + φ₁ x_{t−1}
p-th order difference equation with initial value x₀:
x_t = c + φ₁ x_{t−1} + … + φ_p x_{t−p}
A sequence (x_t)_{t=0,1,…} that satisfies the difference equation is called a solution of the difference equation
Examples (diffequation.R)
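Solutions can be inspected numerically by iterating the recursion; a minimal sketch under illustrative coefficients and initial values (not necessarily what diffequation.R contains):

# Iterate the homogeneous difference equation x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p}
iterate <- function(phi, x0, n) {
  p <- length(phi)
  x <- c(x0, numeric(n))
  for (t in (p + 1):length(x))
    x[t] <- sum(phi * x[(t - 1):(t - p)])
  x
}
# Converges to zero: both roots of 1 - 0.5z - 0.3z^2 lie outside the unit circle
plot(iterate(c(0.5, 0.3), x0 = c(1, 1), n = 50), type = "b", ylab = "x_t")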


Mathematical digression (II)


Linear difference equations

We only consider the homogeneous case, i.e. c = 0
The general solution of the first-order difference equation
x_t = φ₁ x_{t−1}
is
x_t = A φ₁^t
with arbitrary constant A, since x_t = Aφ₁^t = φ₁ · Aφ₁^{t−1} = φ₁ x_{t−1}
The constant is definitized by the initial condition, A = x₀
The sequence x_t = Aφ₁^t is convergent if and only if |φ₁| < 1


Mathematical digression (II)


Linear difference equations

Solution of the p-th order difference equation
x_t = φ₁x_{t−1} + … + φ_p x_{t−p}
Let x_t = A z^{−t}; then
A z^{−t} = φ₁ A z^{−(t−1)} + … + φ_p A z^{−(t−p)}
z^{−t} = φ₁ z^{−(t−1)} + … + φ_p z^{−(t−p)}
and thus
1 − φ₁z¹ − … − φ_p z^p = 0
→ Characteristic polynomial, characteristic equation


Mathematical digression (II)


Linear difference equations

There are p (possibly complex, possibly nondistinct) solutions of the characteristic equation
Denote the solutions (called roots) by z₁, …, z_p
If all roots are real and distinct, then
x_t = A₁ z₁^{−t} + … + A_p z_p^{−t}
is a solution of the homogeneous difference equation
If there are complex roots, the solution is oscillating
The constants A₁, …, A_p can be definitized with p initial conditions (x₀, x₁, …, x_{p−1})


Mathematical digression (II)


Linear difference equations

Stability condition: The linear difference equation
x_t = φ₁x_{t−1} + … + φ_p x_{t−p}
is stable (i.e. convergent) if and only if all roots of the characteristic polynomial
1 − φ₁z − … − φ_p z^p = 0
are outside the unit circle, i.e. |z_i| > 1 for all i = 1, …, p
In R, the stability condition can be checked easily using the
commands polyroot (base package) or ArmaRoots (fArma package)
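For instance, for a hypothetical AR(2)-type recursion with φ₁ = 0.5 and φ₂ = 0.3 (coefficients chosen for illustration):

phi <- c(0.5, 0.3)
z <- polyroot(c(1, -phi))   # roots of 1 - 0.5z - 0.3z^2 (coefficients in increasing order)
Mod(z)                      # stable iff all moduli are larger than 1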


ARMA models
Definition

Definition: ARMA process


Let (ε_t)_{t∈T} be a white noise process; the stochastic process
X_t = φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
with φ_p, θ_q ≠ 0 is called an ARMA(p, q) process
→ AutoRegressive Moving Average process
ARMA processes are important since every stationary process can be approximated by an ARMA process


ARMA models
Lag operator and lag polynomial

The lag operator is a convenient notational tool
The lag operator L shifts the time index of a stochastic process
L (X_t)_{t∈T} = (X_{t−1})_{t∈T}
L X_t = X_{t−1}
Rules:
L² X_t = L(L X_t) = X_{t−2}
Lⁿ X_t = X_{t−n}
L⁻¹ X_t = X_{t+1}
L⁰ X_t = X_t

ARMA models
Lag operator and lag polynomial

Lag polynomial
A(L) = a₀ + a₁L + a₂L² + … + a_p L^p
Example: Let A(L) = 1 − 0.5L and B(L) = 1 + 4L²; then
C(L) = A(L)B(L) = (1 − 0.5L)(1 + 4L²) = 1 − 0.5L + 4L² − 2L³
Lag polynomials can be treated in the same way as ordinary polynomials


ARMA models
Lag operator and lag polynomial

Define the lag polynomials
φ(L) = 1 − φ₁L − … − φ_p L^p
θ(L) = 1 + θ₁L + … + θ_q L^q
The ARMA(p, q) process can be written compactly as
φ(L)X_t = θ(L)ε_t
Important special cases:
MA(q) process: X_t = ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
AR(1) process: X_t = φ₁X_{t−1} + ε_t
AR(p) process: X_t = φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t


ARMA models
MA(q) process

The MA(q) process is
X_t = θ(L)ε_t
X_t = ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
with ε_t ~ NID(0, σ²)
Expectation function:
E(X_t) = E(ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q})
       = E(ε_t) + θ₁E(ε_{t−1}) + … + θ_q E(ε_{t−q})
       = 0


ARMA models
MA(q) process

Autocovariance function:
γ(s, t) = E[(ε_s + θ₁ε_{s−1} + … + θ_q ε_{s−q})(ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q})]
= E[ε_s ε_t + θ₁ε_s ε_{t−1} + θ₂ε_s ε_{t−2} + … + θ_q ε_s ε_{t−q}
  + θ₁ε_{s−1}ε_t + θ₁²ε_{s−1}ε_{t−1} + θ₁θ₂ε_{s−1}ε_{t−2} + … + θ₁θ_q ε_{s−1}ε_{t−q}
  + …
  + θ_q ε_{s−q}ε_t + θ₁θ_q ε_{s−q}ε_{t−1} + θ₂θ_q ε_{s−q}ε_{t−2} + … + θ_q² ε_{s−q}ε_{t−q}]
The expectations of the cross products are
E(ε_s ε_t) = 0 for s ≠ t and σ² for s = t

ARMA models
MA(q) process

Define θ₀ = 1; then
γ(t, t) = σ² Σ_{i=0}^{q} θᵢ²
γ(t−1, t) = σ² Σ_{i=0}^{q−1} θᵢθ_{i+1}
γ(t−2, t) = σ² Σ_{i=0}^{q−2} θᵢθ_{i+2}
⋮
γ(t−q, t) = σ² θ₀θ_q = σ² θ_q
γ(s, t) = 0 for s < t − q
Hence, MA(q) processes are always stationary
Simulation of MA(q) processes (maqsim.R)
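A minimal simulation sketch with illustrative MA(2) coefficients (not necessarily those used in maqsim.R):

set.seed(42)
theta <- c(0.7, -0.3)
x <- arima.sim(model = list(ma = theta), n = 1000)   # simulate an MA(2) process
acf(x)   # empirical autocorrelations should cut off after lag q = 2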


ARMA models
AR(1) process

The AR(1) process is
φ(L)X_t = ε_t
(1 − φ₁L)X_t = ε_t
X_t = φ₁X_{t−1} + ε_t
with ε_t ~ NID(0, σ²)
Expectation and variance function
Stability condition: AR(1) processes are stable if |φ₁| < 1


ARMA models
AR(1) process

Stationarity: Stable AR(1) processes are weakly stationary if
E(X₀) = 0
Var(X₀) = σ² / (1 − φ₁²)
Nonstationary stable processes converge towards stationarity
It is common parlance to call stable processes stationary
Covariance function of the stationary AR(1) process: γ(h) = φ₁^{|h|} σ² / (1 − φ₁²)

ARMA models
AR(p) process

The AR(p) process is
φ(L)X_t = ε_t
X_t = φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t
with ε_t ~ NID(0, σ²)
Assumption: ε_t is independent from X_{t−1}, X_{t−2}, … (innovations)
Expectation function
The covariance function is complicated (ar2autocov.R)


ARMA models
AR(p) process

AR(p) processes are stable if all roots of the characteristic equation
φ(z) = 0
are larger than 1 in absolute value, |z_i| > 1 for i = 1, …, p
An AR(p) process is weakly stationary if the joint distribution of the p initial values (X₀, X₋₁, …, X₋₍p₋₁₎) is appropriate
Stable AR(p) processes converge towards stationarity; they are often called stationary
Simulation of AR(p) processes (arpsim.R); a sketch follows below
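A minimal simulation sketch with illustrative stable AR(2) coefficients (not necessarily those used in arpsim.R):

set.seed(42)
phi <- c(0.5, 0.3)
x <- arima.sim(model = list(ar = phi), n = 1000)   # arima.sim() refuses non-stationary ar coefficients
plot(x)
acf(x)   # geometric-type decay instead of a cut-off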


ARMA models
Invertibility

AR and MA processes can be inverted (into each other)
Example: Consider the stable AR(1) process with |φ₁| < 1:
X_t = φ₁X_{t−1} + ε_t
    = φ₁(φ₁X_{t−2} + ε_{t−1}) + ε_t
    = φ₁²X_{t−2} + φ₁ε_{t−1} + ε_t
    ⋮
    = φ₁ⁿX_{t−n} + φ₁^{n−1}ε_{t−(n−1)} + … + φ₁²ε_{t−2} + φ₁ε_{t−1} + ε_t


ARMA models
Invertibility

Since |φ₁| < 1,
X_t = Σ_{i=0}^{∞} φ₁^i ε_{t−i}
    = ε_t + ψ₁ε_{t−1} + ψ₂ε_{t−2} + …
with ψ_i = φ₁^i
A stable AR(1) process can be written as an MA(∞) process (the same is true for stable AR(p) processes)


ARMA models
Invertibility

Using lag polynomials this can be written as
(1 − φ₁L)X_t = ε_t
X_t = (1 − φ₁L)⁻¹ ε_t
X_t = Σ_{i=0}^{∞} (φ₁L)^i ε_t
General compact and elegant notation:
φ(L)X_t = ε_t
X_t = (φ(L))⁻¹ ε_t = ψ(L)ε_t


ARMA models
Invertibility

MA(q) can be written as AR(∞) if all roots of θ(z) = 0 are larger than 1 in absolute value (invertibility condition)
Example: MA(1) with |θ₁| < 1; from
X_t = ε_t + θ₁ε_{t−1}
θ₁X_{t−1} = θ₁ε_{t−1} + θ₁²ε_{t−2}
we find X_t = θ₁X_{t−1} + ε_t − θ₁²ε_{t−2}
Repeated substitution of the ε_{t−i} terms yields
X_t = Σ_{i=1}^{∞} π_i X_{t−i} + ε_t with π_i = (−1)^{i+1} θ₁^i


ARMA models
Invertibility

Summary
ARMA(p, q) processes are stable if all roots of
φ(z) = 0
are larger than 1 in absolute value
ARMA(p, q) processes are invertible if all roots of
θ(z) = 0
are larger than 1 in absolute value


ARMA models
Invertibility

Sometimes (e.g. for proofs), it is useful to write an ARMA(p, q) process either as AR(∞) or as MA(∞)
ARMA(p, q) can be written as AR(∞) or MA(∞):
φ(L)X_t = θ(L)ε_t
X_t = (φ(L))⁻¹ θ(L) ε_t
(θ(L))⁻¹ φ(L) X_t = ε_t

Time Series Analysis

Winter 2013/2014

61 / 143

ARMA models
Deterministic components

Until now we only considered processes with zero expectation
Many processes have both a zero-expectation stochastic component (Y_t) and a non-zero deterministic component (D_t)
Examples:
linear trend D_t = a + bt
exponential trend D_t = ab^t
seasonal patterns
Let (X_t)_{t∈ℤ} be a stochastic process with deterministic component D_t and define Y_t = X_t − D_t


ARMA models
Deterministic components

Then E(Y_t) = 0 and
Cov(Y_t, Y_s) = E[(Y_t − E(Y_t))(Y_s − E(Y_s))]
= E[(X_t − D_t − E(X_t − D_t))(X_s − D_s − E(X_s − D_s))]
= E[(X_t − E(X_t))(X_s − E(X_s))]
= Cov(X_t, X_s)
The covariance function does not depend on the deterministic component
To derive the covariance function of a stochastic process, simply drop the deterministic component


ARMA models
Deterministic components

Special case: D_t = μ_t = μ
ARMA(p, q) process with constant (non-zero) expectation:
X_t − μ = φ₁(X_{t−1} − μ) + … + φ_p(X_{t−p} − μ) + ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
The process can also be written as
X_t = c + φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
where c = μ(1 − φ₁ − … − φ_p)


ARMA models
Deterministic components

Wold's representation theorem: Every stationary stochastic process (X_t)_{t∈T} can be represented as

X_t = Σ_{h=0}^{∞} ψ_h ε_{t−h} + D_t

with ψ₀ = 1, Σ_{h=0}^{∞} ψ_h² < ∞, and ε_t white noise with variance σ² > 0
Stationary stochastic processes can be written as a sum of a deterministic process and an MA(∞) process
Often, low order ARMA(p, q) processes can approximate MA(∞) processes well


ARMA models
Linear processes and filter

Definition: Linear process

Let (ε_t)_{t∈ℤ} be a white noise process; a stochastic process (X_t)_{t∈ℤ} is called linear if it can be written as

X_t = Σ_{h=−∞}^{∞} ψ_h ε_{t−h} = ψ(L)ε_t

where the coefficients are absolutely summable, i.e. Σ_{h=−∞}^{∞} |ψ_h| < ∞.
The lag polynomial ψ(L) is called a (linear) filter


ARMA models
Linear processes and filter

Some special filters
Change from previous period (difference filter):
ψ(L) = 1 − L
Change from last year (for quarterly or monthly data):
ψ(L) = 1 − L⁴
ψ(L) = 1 − L¹²
Elimination of seasonal influences (quarterly data):
ψ(L) = (1 + L + L² + L³)/4
ψ(L) = 0.125L⁻² + 0.25L⁻¹ + 0.25 + 0.25L + 0.125L²

ARMA models
Linear processes and filter

Hodrick-Prescott filter (important tool in empirical macroeconomics)
Decompose a time series (X_t) into a long-term growth component (G_t) and a short-term cyclical component (C_t):
X_t = G_t + C_t
Trade-off between goodness-of-fit and smoothness of G_t
Minimize the criterion function

Σ_{t=1}^{T} (X_t − G_t)² + λ Σ_{t=2}^{T−1} [(G_{t+1} − G_t) − (G_t − G_{t−1})]²

with respect to G_t for given smoothness parameter λ


ARMA models
Linear processes and filter

The FOCs of the minimization problem are
(G₁, …, G_T)′ = A (X₁, …, X_T)′
where A = (I + λK′K)⁻¹ with the (T−2) × T second-difference matrix

K = ( 1 −2  1  0 …  0  0  0 )
    ( 0  1 −2  1 …  0  0  0 )
    ( ⋮              ⋱     ⋮ )
    ( 0  0  0  0 …  1 −2  1 )

ARMA models
Linear processes and filter

The HP filter is a linear filter
Typical values for the smoothing parameter λ:
λ = 10       annual data
λ = 1600     quarterly data
λ = 14400    monthly data
Implementation in R (code by Olaf Posch)
Empirical examples (hpfilter.R); a minimal sketch follows below
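A minimal HP-filter sketch that implements the first order conditions above directly (for long series a sparse implementation would be preferable; the reference implementation is the one in hpfilter.R):

hp_filter <- function(x, lambda = 1600) {
  T <- length(x)
  K <- matrix(0, T - 2, T)                  # (T-2) x T second-difference matrix
  for (i in 1:(T - 2)) K[i, i:(i + 2)] <- c(1, -2, 1)
  growth <- solve(diag(T) + lambda * crossprod(K), x)   # G = (I + lambda K'K)^{-1} X
  list(growth = growth, cycle = x - growth)
}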


Estimation of ARMA models


The estimation problem

Problem: The parameters φ₁, …, φ_p, θ₁, …, θ_q, σ² of an ARMA(p, q) process are usually unknown
They have to be estimated from an observed time series X₁, …, X_T
Standard estimation methods:
Least squares (OLS)
Maximum likelihood (ML)
Assumption: the lag orders p and q are known


Estimation of ARMA models


Least squares estimation of AR(p) models

The AR(p) model with non-zero constant expectation
X_t = c + φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t
can be written in matrix notation,

( X_{p+1} )   ( 1  X_p      X_{p−1}  …  X_1     ) ( c   )   ( ε_{p+1} )
( X_{p+2} ) = ( 1  X_{p+1}  X_p      …  X_2     ) ( φ₁  ) + ( ε_{p+2} )
(    ⋮    )   ( ⋮     ⋮        ⋮     ⋱     ⋮    ) (  ⋮  )   (    ⋮    )
( X_T     )   ( 1  X_{T−1}  X_{T−2}  …  X_{T−p} ) ( φ_p )   ( ε_T     )

Compact notation: y = Xβ + u


Estimation of ARMA models


Least squares estimation of AR(p) models

The standard least squares estimator is
β̂ = (X′X)⁻¹ X′y
The matrix of exogenous variables X is stochastic
→ usual results for OLS regression do not hold
But: There is no contemporaneous correlation between the error term and the exogenous variables
Hence, the OLS estimators are consistent and asymptotically efficient


Estimation of ARMA models


Least squares estimation of ARMA models

Solve the ARMA equation
X_t = c + φ₁X_{t−1} + … + φ_p X_{t−p} + ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q}
for ε_t:
ε_t = X_t − c − φ₁X_{t−1} − … − φ_p X_{t−p} − θ₁ε_{t−1} − … − θ_q ε_{t−q}
Define the residuals as functions of the unknown parameters:
ε̂_t(d, f₁, …, f_p, g₁, …, g_q) = X_t − d − f₁X_{t−1} − … − f_p X_{t−p} − g₁ε̂_{t−1} − … − g_q ε̂_{t−q}


Estimation of ARMA models


Least squares estimation of ARMA models

Define the sum of squared residuals

S(d, f₁, …, f_p, g₁, …, g_q) = Σ_{t=1}^{T} (ε̂_t(d, f₁, …, f_p, g₁, …, g_q))²

The least squares estimators are
(ĉ, φ̂₁, …, φ̂_p, θ̂₁, …, θ̂_q) = argmin S(d, f₁, …, f_p, g₁, …, g_q)
Since the residuals are defined recursively, one needs starting values ε₀, …, ε_{−q+1} and X₀, …, X_{−p+1} to calculate ε̂₁
Easiest way: Set all starting values to zero (→ conditional estimation)


Estimation of ARMA models


Least squares estimation of ARMA models

The first order conditions are a nonlinear equation system which cannot be solved easily
Minimization by standard numerical methods (implemented in all usual statistical packages)
Either solve the nonlinear first-order-condition equation system or minimize S directly
Simple special case: ARMA(1, 1); see the sketch below
arma11.R
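A minimal sketch of conditional least squares for the ARMA(1, 1) case (x is assumed to be the observed series; not necessarily the content of arma11.R):

css_arma11 <- function(x) {
  rss <- function(par) {
    d <- par[1]; f1 <- par[2]; g1 <- par[3]
    T <- length(x); e <- numeric(T)
    e[1] <- x[1] - d                       # starting values X_0 = eps_0 = 0
    for (t in 2:T)
      e[t] <- x[t] - d - f1 * x[t - 1] - g1 * e[t - 1]
    sum(e^2)                               # S(d, f1, g1)
  }
  optim(c(0, 0.1, 0.1), rss)$par           # (c-hat, phi1-hat, theta1-hat)
}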


Estimation of ARMA models


Maximum likelihood estimation

Additional assumption: The innovations ε_t are normally distributed
Implication: ARMA processes are Gaussian
The joint distribution of X₁, …, X_T is multivariate normal,
X = (X₁, …, X_T)′ ~ N(μ, Γ)


Estimation of ARMA models


Maximum likelihood estimation

Expectation vector

μ = E(X₁, …, X_T)′ = (c/(1 − φ₁ − … − φ_p), …, c/(1 − φ₁ − … − φ_p))′

Covariance matrix

Γ = Cov(X₁, …, X_T)′ = ( γ(0)     γ(1)     …  γ(T−1) )
                       ( γ(1)     γ(0)     …  γ(T−2) )
                       (   ⋮        ⋮      ⋱    ⋮    )
                       ( γ(T−1)   γ(T−2)   …  γ(0)   )

Estimation of ARMA models


Maximum likelihood estimation

The expectation vector and the covariance matrix contain all unknown parameters λ = (φ₁, …, φ_p, θ₁, …, θ_q, c, σ²)
The likelihood function is

L(λ; X) = (2π)^{−T/2} (det Γ)^{−1/2} exp(−½ (X − μ)′ Γ⁻¹ (X − μ))

and the loglikelihood function is

ln L(λ; X) = −(T/2) ln(2π) − ½ ln(det Γ) − ½ (X − μ)′ Γ⁻¹ (X − μ)

The ML estimators are λ̂ = argmax ln L(λ; X)


Estimation of ARMA models


Maximum likelihood estimation

The loglikelihood function has to be maximized by numerical methods
Standard properties of ML estimators:
1. consistency
2. asymptotic efficiency
3. asymptotically jointly normally distributed
4. the covariance matrix of the estimators can be consistently estimated

Example: ML estimation of an ARMA(3, 3) model for the interest rate spread (arma33.R)


Estimation of ARMA models


Hypothesis tests

Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses:
H₀: g(λ) = 0
H₁: not H₀
where g(λ) is an m-valued function of the parameters
Example: If H₀: φ₁ = 0 then m = 1 and g(λ) = φ₁


Estimation of ARMA models


Hypothesis tests

Likelihood ratio test statistic
LR = 2(ln L(λ̂_ML) − ln L(λ̂_R))
where λ̂_ML and λ̂_R are the unrestricted and restricted estimators
Under the null hypothesis,
LR → U ~ χ²_m
and H₀ is rejected at significance level α if LR > χ²_{m;1−α}
Disadvantage: Two models must be estimated


Estimation of ARMA models


Hypothesis tests

For the Wald test we only consider g(λ) = λ − λ₀, i.e.
H₀: λ = λ₀
H₁: not H₀
Test statistic
W = (λ̂ − λ₀)′ [Côv(λ̂)]⁻¹ (λ̂ − λ₀)
If the null hypothesis is true then W → U ~ χ²_m
The asymptotic covariance matrix can be estimated consistently as Côv(λ̂) = H⁻¹, where H is the Hessian matrix returned by the maximization procedure


Estimation of ARMA models


Hypothesis tests

Test example 1:
H₀: φ₁ = 0
H₁: φ₁ ≠ 0
Test example 2:
H₀: λ = λ₀
H₁: not H₀
Illustration (arma33.R)


Estimation of ARMA models


Model selection

Usually, the lag orders p and q of an ARMA model are unknown
Trade-off: Goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation:

AIC = ln σ̂² + 2(p + q + 1)/T

where ln σ̂² measures goodness-of-fit and 2(p + q + 1)/T is the penalty
Choose the model with the smallest AIC


Estimation of ARMA models


Model selection

Bayesian information criterion BIC (Schwarz information criterion):
BIC = ln σ̂² + (p + q + 1) ln T / T
Hannan-Quinn information criterion:
HQ = ln σ̂² + 2(p + q + 1) ln(ln T) / T
Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R); a selection sketch follows below
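A minimal order-selection sketch using arima() (x is assumed to be the observed series; the grid bounds are illustrative):

grid <- expand.grid(p = 0:3, q = 0:3)
grid$aic <- grid$bic <- NA
for (i in seq_len(nrow(grid))) {
  fit <- try(arima(x, order = c(grid$p[i], 0, grid$q[i])), silent = TRUE)
  if (!inherits(fit, "try-error")) {
    grid$aic[i] <- AIC(fit)
    grid$bic[i] <- AIC(fit, k = log(length(x)))   # BIC penalty
  }
}
grid[which.min(grid$aic), ]   # order chosen by AIC
grid[which.min(grid$bic), ]   # order chosen by BIC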


Estimation of ARMA models


Model selection

Another illustration: The true model is ARMA(2, 1) with
X_t = 0.5X_{t−1} + 0.3X_{t−2} + ε_t + 0.7ε_{t−1}; 1000 samples of size n = 500 were generated; the tables show the model orders p and q as selected by AIC and BIC

Orders selected by AIC:
         q = 0   q = 1   q = 2   q = 3   q = 4   q = 5
p = 0        0       0       0       0       0       0
p = 1        0      18      64      23      14       6
p = 2        0     171      21      16       5       7
p = 3        0       7      35      58      80      45
p = 4        9       2      12     139      37      44
p = 5       11       6      12      56      46      56

Orders selected by BIC:
         q = 0   q = 1   q = 2   q = 3   q = 4   q = 5
p = 0        0       0       0       0       0       0
p = 1        0     310     167       4       0       0
p = 2        0     503       3       1       0       0
p = 3        1       0       2       1       0       0
p = 4        6       1       0       0       0       0
p = 5        1       0       0       0       0       0

Integrated processes
Difference operator

Define the difference operator
Δ = 1 − L;
then
ΔX_t = X_t − X_{t−1}
Second order differences:
Δ² = Δ(Δ) = (1 − L)² = 1 − 2L + L²
Higher orders Δⁿ are defined in the same way; note that Δⁿ ≠ 1 − Lⁿ


Integrated processes
Definition

Definition: Integrated process

A stochastic process is called integrated of order 1 if
ΔX_t = μ + ψ(L)ε_t
where ε_t is white noise, ψ(1) ≠ 0, and Σ_{j=0}^{∞} j|ψ_j| < ∞
Common notation: X_t ~ I(1)
I(1) processes are also called difference stationary or unit root processes
→ Stochastic and deterministic trends
Trend stationary processes are not I(1) (since ψ(1) = 0)


Integrated processes
Definition

Stationary processes are sometimes called I(0)
Higher order integrations are possible, e.g.
X_t ~ I(2)  ⇔  Δ²X_t ~ I(0)
In general, X_t ~ I(d) means that Δ^d X_t ~ I(0)
Most economic time series are either I(0) or I(1)
Some economic time series may be I(2)


Integrated processes
Definition

Example 1: The random walk with drift, X_t = b + X_{t−1} + ε_t, is I(1) because
ΔX_t = X_t − X_{t−1} = b + ε_t = b + ψ(L)ε_t
where ψ₀ = 1 and ψ_j = 0 for j ≠ 0


Integrated processes
Definition

Example 2: The trend stationary process, X_t = a + bt + ε_t, is not I(1) because
ΔX_t = b + ε_t − ε_{t−1} = b + ψ(L)ε_t
with ψ₀ = 1, ψ₁ = −1 and ψ_j = 0 for all other j (hence ψ(1) = 0)


Integrated processes
Definition

Example 3: The AR(2) process
X_t = b + (1 + φ)X_{t−1} − φX_{t−2} + ε_t
(1 − L)(1 − φL)X_t = b + ε_t
is I(1) if |φ| < 1 because ΔX_t = ψ(L)(b + ε_t) with
ψ(L) = (1 − φL)⁻¹ = 1 + φL + φ²L² + φ³L³ + φ⁴L⁴ + …
and thus ψ(1) = Σ_{i=0}^{∞} φ^i = 1/(1 − φ) ≠ 0. The roots of the characteristic equation are z = 1 and z = 1/φ


Integrated processes
Definition

Example 4: The process
X_t = 0.5X_{t−1} − 0.4X_{t−2} + ε_t
is a stationary (stable) zero expectation AR(2) process; the process
Y_t = a + bt + X_t
is trend stationary and I(0) since
ΔY_t = b + ΔX_t
with ΔX_t = ψ(L)ε_t = (1 − L)(1 − 0.5L + 0.4L²)⁻¹ ε_t
and therefore ψ(1) = 0 (i0andi1.R)

Integrated processes
Definition

Definition: ARIMA process

Let (ε_t)_{t∈T} be a white noise process; the stochastic process (X_t)_{t∈ℤ} is called an integrated autoregressive moving-average process of orders p, d and q, or ARIMA(p, d, q), if Δ^d X_t is an ARMA(p, q) process:
φ(L)Δ^d X_t = θ(L)ε_t
For d > 0 the process is nonstationary (I(d)) even if all roots of φ(z) = 0 are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)


Integrated processes
Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?


Reason 1: Long-term forecasts and forecasting errors
Deterministic trend: The forecasting error variance is bounded
Stochastic trend: The forecasting error variance is unbounded
Illustrations
i0andi1.R


Integrated processes
Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?


Reason 2: Spurious regression
OLS regressions will show spurious relationships between
time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends,
but it does not help if the series are integrated
Illustrations
spurious1.R


Integrated processes
Integrated processes and parameter estimation

OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case
X_t = φ₁X_{t−1} + ε_t
with ε_t ~ NID(0, 1) and X₀ = 0
The AR(1) process is stationary if |φ₁| < 1 and has a unit root if |φ₁| = 1


Integrated processes
Integrated processes and parameter estimation

The usual OLS estimator of φ₁ is

φ̂₁ = Σ_{t=1}^{T} X_t X_{t−1} / Σ_{t=1}^{T} X²_{t−1}

What does the distribution of φ̂₁ look like?
Influence of φ₁ and T
Consistency?
Asymptotic normality?
Illustration (phihat.R); a simulation sketch follows below
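A minimal sketch of what such an illustration might look like for the unit root case φ₁ = 1 (the sample size and replication count are illustrative, not necessarily those of phihat.R):

set.seed(1)
T <- 100; R <- 2000
phihat <- replicate(R, {
  x <- cumsum(rnorm(T))                   # random walk: phi1 = 1, X_0 = 0
  sum(x[-1] * x[-T]) / sum(x[-T]^2)       # OLS estimator of phi1
})
hist(phihat, breaks = 50)                 # left-skewed, clearly not normal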


Integrated processes
Integrated processes and parameter estimation

Consistency and asymptotic normality for I(0) processes (|φ₁| < 1):
plim φ̂₁ = φ₁
√T (φ̂₁ − φ₁) → Z ~ N(0, 1 − φ₁²)
Consistency and asymptotic normality for I(1) processes (φ₁ = 1):
plim φ̂₁ = 1
T (φ̂₁ − 1) → V
where V is a nondegenerate, nonnormal random variable
→ Root-T-consistency and superconsistency

Integrated processes
Unit root tests

Importance to distinguish between trend stationarity and difference stationarity
Test of the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: Linear regression
X_t = deterministics + φX_{t−1} + ε_t
ΔX_t = deterministics + (φ − 1)X_{t−1} + ε_t, with ρ := φ − 1


Integrated processes
Unit root tests

Null and alternative hypothesis:
H₀: φ = 1  (unit root)
H₁: |φ| < 1  (no unit root)
or, equivalently,
H₀: ρ = 0  (unit root)
H₁: ρ < 0  (no unit root)
Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root

Integrated processes
DF test and ADF test

Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests
Possible regressions:
X_t = φX_{t−1} + ε_t             or  ΔX_t = ρX_{t−1} + ε_t
X_t = a + φX_{t−1} + ε_t         or  ΔX_t = a + ρX_{t−1} + ε_t
X_t = a + bt + φX_{t−1} + ε_t    or  ΔX_t = a + bt + ρX_{t−1} + ε_t
Assumption for the Dickey-Fuller test: no autocorrelation in ε_t
If there is autocorrelation in ε_t, use the augmented DF test


Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 1: no constant, no trend
ΔX_t = ρX_{t−1} + ε_t
Null and alternative hypotheses:
H₀: ρ = 0
H₁: ρ < 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around zero


Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 2: constant, no trend
ΔX_t = a + ρX_{t−1} + ε_t
Null and alternative hypotheses:
H₀: ρ = 0   or   H₀: ρ = 0, a = 0
H₁: ρ < 0   or   H₁: ρ < 0, a ≠ 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around a constant


Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 3: constant and trend
ΔX_t = a + bt + ρX_{t−1} + ε_t
Null and alternative hypotheses:
H₀: ρ = 0   or   H₀: ρ = 0, b = 0
H₁: ρ < 0   or   H₁: ρ < 0, b ≠ 0
Null hypothesis: stochastic trend with drift
Alternative hypothesis: trend stationary process


Integrated processes
DF test and ADF test

Dickey-Fuller test statistics for single hypotheses:
ρ-test: T ρ̂
τ-test: ρ̂ / σ̂_ρ̂
The τ-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.6)


Integrated processes
DF test and ADF test

The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.7)
Illustrations (dftest.R)


Integrated processes
DF test and ADF test

If there is autocorrelation in ε_t, the DF test does not work (dftest.R)
Augmented Dickey-Fuller test (ADF test) regressions:
ΔX_t = c₁ΔX_{t−1} + … + c_pΔX_{t−p} + ρX_{t−1} + ε_t
ΔX_t = a + c₁ΔX_{t−1} + … + c_pΔX_{t−p} + ρX_{t−1} + ε_t
ΔX_t = a + bt + c₁ΔX_{t−1} + … + c_pΔX_{t−p} + ρX_{t−1} + ε_t
The added lagged differences capture the autocorrelation
The number of lags p must be large enough to make ε_t white noise
The critical values remain the same as in the no-correlation case
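In R, ADF tests are available in add-on packages; a hedged sketch (x is assumed to be the series under study, and the lag order is illustrative):

library(tseries)
adf.test(x, alternative = "stationary", k = 4)
# with an explicit choice of deterministics ("none", "drift", "trend"):
library(urca)
summary(ur.df(x, type = "drift", lags = 4))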


Integrated processes
DF test and ADF test

Further interesting topics (but we skip these)


Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity
H0 : Xt I (0)
H1 : Xt I (1)


Integrated processes
Regression with integrated processes

Spurious regression: If X_t and Y_t are independent but both I(1), then the regression
Y_t = α + βX_t + u_t
will result in an estimated coefficient β̂ that is significantly different from 0 with probability 1 as T → ∞
BUT: The regression
Y_t = α + βX_t + u_t
may be sensible even though X_t and Y_t are I(1)
→ Cointegration


Integrated processes
Regression with integrated processes

Definition: Cointegration
Two stochastic processes (X_t)_{t∈T} and (Y_t)_{t∈T} are cointegrated if both processes are I(1) and there is a constant β such that the process (Y_t − βX_t) is I(0)
If β is known, cointegration can be tested using a standard unit root test on the process (Y_t − βX_t)
If β is unknown, it can be estimated from the linear regression
Y_t = α + βX_t + u_t
and cointegration is tested using a modified unit root test on the residual process (û_t)_{t=1,…,T}; see the sketch below
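A minimal sketch of this two-step (Engle-Granger) procedure, assuming x and y are observed I(1) series:

step1 <- lm(y ~ x)              # estimate beta by OLS
uhat <- residuals(step1)
library(tseries)
adf.test(uhat)                  # caution: the appropriate critical values are the
                                # Engle-Granger ones, not the standard DF tables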

GARCH models
Conditional expectation

Let (X, Y) be a bivariate random variable with a joint density function; then

E(X | Y = y) = ∫ x f_{X|Y=y}(x) dx

is the conditional expectation of X given Y = y
E(X|Y) denotes a random variable with realization E(X|Y = y) if the random variable Y realizes as y
Both E(X|Y) and E(X|Y = y) are called conditional expectation


GARCH models
Conditional variance

Let (X, Y) be a bivariate random variable with a joint density function; then

Var(X | Y = y) = ∫ (x − E(X|Y = y))² f_{X|Y=y}(x) dx

is the conditional variance of X given Y = y
Var(X|Y) denotes a random variable with realization Var(X|Y = y) if the random variable Y realizes as y
Both Var(X|Y = y) and Var(X|Y) are called conditional variance


GARCH models
Rules for conditional expectations

Law of iterated expectations: E(E(X|Y)) = E(X)
If X and Y are independent, then E(X|Y) = E(X)
The condition can be treated like a constant,
E(XY|Y) = Y · E(X|Y)
The conditional expectation is a linear operator. For a₁, …, a_n ∈ ℝ,

E(Σ_{i=1}^{n} a_i X_i | Y) = Σ_{i=1}^{n} a_i E(X_i | Y)

GARCH models
Basics

Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, …
Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: Stationary AR(1) process, X_t = φX_{t−1} + ε_t with |φ| < 1; then
Var(X_t) = σ²_X = σ² / (1 − φ²),
and the conditional variance is
Var(X_t | X_{t−1}) = σ²


GARCH models
Basics

In the following, we will focus on stock returns


Empirical fact: squared (or absolute) returns are positively
autocorrelated
Implication: Returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?


GARCH models
ARCH(1)-process

Definition: ARCH(1)-process
The stochastic process (X_t)_{t∈ℤ} is called an ARCH(1)-process if
E(X_t | X_{t−1}) = 0
Var(X_t | X_{t−1}) = σ²_t = α₀ + α₁X²_{t−1}
for all t ∈ ℤ, with α₀, α₁ > 0
Often, an additional assumption is
X_t | (X_{t−1} = x_{t−1}) ~ N(0, α₀ + α₁x²_{t−1})


GARCH models
ARCH(1)-process

The unconditional distribution of X_t is a non-normal distribution
Leptokurtosis: The tails are heavier than the tails of the normal distribution
Example of an ARCH(1)-process:
X_t = σ_t ε_t
where (ε_t)_{t∈ℤ} is white noise with σ²_ε = 1 and
σ_t = √(α₀ + α₁X²_{t−1})


GARCH models
ARCH(1)-process

One can show that
E(X_t | X_{t−1}) = 0
E(X_t) = 0
Var(X_t | X_{t−1}) = α₀ + α₁X²_{t−1}
Var(X_t) = α₀ / (1 − α₁)
Cov(X_t, X_{t−i}) = 0 for i > 0
Stationarity condition: 0 < α₁ < 1
The unconditional kurtosis is 3(1 − α₁²)/(1 − 3α₁²) if ε_t ~ N(0, 1).
If α₁ > √(1/3) ≈ 0.57735, the kurtosis does not exist.

GARCH models
ARCH(1)-process

Squared returns follow
X²_t = α₀ + α₁X²_{t−1} + v_t
with v_t = σ²_t(ε²_t − 1)
Thus, squared returns of an ARCH(1) process are AR(1)
The process (v_t)_{t∈ℤ} is white noise:
E(v_t) = 0
Var(v_t) = E(v²_t) = const.
Cov(v_t, v_{t−i}) = 0 (i = 1, 2, …)

GARCH models
ARCH(1)-process

Simulation of an ARCH(1)-process for t = 1, …, 2500
Parameters: α₀ = 0.05, α₁ = 0.95, start value X₀ = 0
Conditional distribution: ε_t ~ N(0, 1)
archsim.R; a minimal sketch follows below
Check whether the simulated time series shows the typical stylized facts of return distributions
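A minimal sketch of this simulation (the parameter values are the ones stated above; not necessarily the exact content of archsim.R):

set.seed(1)
n <- 2500; a0 <- 0.05; a1 <- 0.95
x <- numeric(n); xlag <- 0               # start value X_0 = 0
eps <- rnorm(n)                          # eps_t ~ N(0, 1)
for (t in 1:n) {
  x[t] <- sqrt(a0 + a1 * xlag^2) * eps[t]
  xlag <- x[t]
}
plot(x, type = "l")                      # volatility clusters
acf(x^2)                                 # squared returns are autocorrelated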


GARCH models
Estimation of an ARCH(1)-process

Of course, we do not know the true values of the model parameters α₀ and α₁
How can we estimate the unknown parameters α₀ and α₁?
Observations X₁, …, X_T
Because of
X²_t = α₀ + α₁X²_{t−1} + v_t
a possible estimation method is OLS


GARCH models
Estimation of an ARCH(1)-process

OLS estimator of α₁:

α̂₁ = Σ_{t=2}^{T} (X²_t − X̄²)(X²_{t−1} − X̄²) / Σ_{t=2}^{T} (X²_{t−1} − X̄²)²

where X̄² denotes the sample mean of the squared observations
Careful: These estimators are only consistent if the kurtosis exists (i.e. if α₁ < √(1/3))
Test of ARCH-effects:
H₀: α₁ = 0
H₁: α₁ > 0


GARCH models
Estimation of an ARCH(1)-process

For T large, under H₀,
√T α̂₁ ~ N(0, 1)
Reject H₀ if √T α̂₁ > Φ⁻¹(1 − α)
Second version of this test: Consider the R² of the regression
X²_t = α₀ + α₁X²_{t−1} + v_t;
then under H₀, approximately,
T α̂₁² ≈ TR² ~ χ²₁
Reject H₀ if TR² > F⁻¹_{χ²₁}(1 − α)


GARCH models
ARCH(p)-process

Definition: ARCH(p)-process
The stochastic process (X_t)_{t∈ℤ} is called an ARCH(p)-process if
E(X_t | X_{t−1}, …, X_{t−p}) = 0
Var(X_t | X_{t−1}, …, X_{t−p}) = σ²_t = α₀ + α₁X²_{t−1} + … + α_p X²_{t−p}
for t ∈ ℤ, where αᵢ ≥ 0 for i = 0, 1, …, p − 1 and α_p > 0
Often, an additional assumption is that
X_t | (X_{t−1} = x_{t−1}, …, X_{t−p} = x_{t−p}) ~ N(0, σ²_t)


GARCH models
ARCH(p)-process

Example of an ARCH(p)-process:
X_t = σ_t ε_t
where (ε_t)_{t∈ℤ} is white noise with σ²_ε = 1 and
σ_t = √(α₀ + α₁X²_{t−1} + … + α_p X²_{t−p})
An ARCH(p) process is weakly stationary if all roots of
1 − α₁z − α₂z² − … − α_p z^p = 0 are outside the unit circle
Then, for all t ∈ ℤ, E(X_t) = 0 and
Var(X_t) = α₀ / (1 − Σ_{i=1}^{p} αᵢ)

GARCH models
ARCH(p)-process

If (X_t)_{t∈ℤ} is a stationary ARCH(p) process, then (X²_t)_{t∈ℤ} is a stationary AR(p) process:
X²_t = α₀ + α₁X²_{t−1} + … + α_p X²_{t−p} + v_t
As to the error term,
E(v_t) = 0
Var(v_t) = const.
Cov(v_t, v_{t−i}) = 0 for i = 1, 2, …
Simulating an ARCH(p) process is easy


GARCH models
Estimation of ARCH(p) models

OLS estimation of
X²_t = α₀ + α₁X²_{t−1} + … + α_p X²_{t−p} + v_t
Test of ARCH-effects:
H₀: α₁ = α₂ = … = α_p = 0  vs  H₁: not H₀
Let R² denote the coefficient of determination of the regression
Under H₀, the test statistic TR² ~ χ²_p;
thus reject H₀ if TR² > F⁻¹_{χ²_p}(1 − α)


GARCH models
Maximum likelihood estimation

Basic idea of the maximum likelihood estimation method:
Choose parameters such that the joint density of the observations
f_{X₁,…,X_T}(x₁, …, x_T)
is maximized
Let X₁, …, X_T denote a random sample from X
The density f_X(x; λ) depends on R unknown parameters λ = (λ₁, …, λ_R)


GARCH models
Maximum likelihood estimation

ML estimation of λ: Maximize the (log)likelihood function

L(λ) = f_{X₁,…,X_T}(x₁, …, x_T; λ) = Π_{t=1}^{T} f_X(x_t; λ)

ln L(λ) = Σ_{t=1}^{T} ln f_X(x_t; λ)

ML estimate:
λ̂ = argmax ln L(λ)


GARCH models
Maximum likelihood estimation

Since observations are independent in random samples,

f_{X₁,…,X_T}(x₁, …, x_T) = Π_{t=1}^{T} f_{X_t}(x_t)

or

ln f_{X₁,…,X_T}(x₁, …, x_T) = Σ_{t=1}^{T} ln f_{X_t}(x_t) = Σ_{t=1}^{T} ln f_X(x_t)

But: ARCH-returns are not independent!



GARCH models
Maximum likelihood estimation

Factorization with dependent observations:

f_{X₁,…,X_T}(x₁, …, x_T) = Π_{t=1}^{T} f_{X_t | X_{t−1},…,X₁}(x_t | x_{t−1}, …, x₁)

or

ln f_{X₁,…,X_T}(x₁, …, x_T) = Σ_{t=1}^{T} ln f_{X_t | X_{t−1},…,X₁}(x_t | x_{t−1}, …, x₁)

Hence, for an ARCH(1)-process,

f_{X₁,…,X_T}(x₁, …, x_T) = f_{X₁}(x₁) Π_{t=2}^{T} (1/√(2πσ²_t)) exp(−½ (x_t/σ_t)²)

GARCH models
Maximum likelihood estimation

The marginal density of X₁ is complicated but becomes negligible for large T and, therefore, will be dropped from now on
Log-likelihood function (without initial marginal density):

ln L(α₀, α₁ | x₁, …, x_T) = −((T−1)/2) ln(2π) − ½ Σ_{t=2}^{T} ln σ²_t − ½ Σ_{t=2}^{T} (x_t/σ_t)²

where σ²_t = α₀ + α₁x²_{t−1}
ML estimation of α₀ and α₁ by numerical maximization of ln L(α₀, α₁) with respect to α₀ and α₁; a sketch follows below
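A minimal sketch of this conditional ML estimation via optim() (x is assumed to be the observed return series):

loglik_arch1 <- function(par, x) {
  a0 <- par[1]; a1 <- par[2]
  if (a0 <= 0 || a1 < 0) return(-1e10)   # keep the conditional variance positive
  T <- length(x)
  s2 <- a0 + a1 * x[1:(T - 1)]^2         # sigma_t^2 for t = 2, ..., T
  sum(dnorm(x[2:T], sd = sqrt(s2), log = TRUE))
}
fit <- optim(c(0.1, 0.5), loglik_arch1, x = x,
             control = list(fnscale = -1))   # fnscale = -1: maximize
fit$par                                      # (alpha0-hat, alpha1-hat)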


GARCH models
GARCH(p,q)-process

Definition: GARCH(p,q)-process
The stochastic process (X_t)_{t∈ℤ} is called a GARCH(p, q)-process if
E(X_t | X_{t−1}, X_{t−2}, …) = 0
Var(X_t | X_{t−1}, X_{t−2}, …) = σ²_t
  = α₀ + α₁X²_{t−1} + … + α_p X²_{t−p} + β₁σ²_{t−1} + … + β_q σ²_{t−q}
for t ∈ ℤ with αᵢ, βᵢ ≥ 0
Often, an additional assumption is that
(X_t | X_{t−1} = x_{t−1}, X_{t−2} = x_{t−2}, …) ~ N(0, σ²_t)

GARCH models
GARCH(p,q)-process

Conditional variance of GARCH(1, 1):
Var(X_t | X_{t−1}, X_{t−2}, …) = σ²_t
  = α₀ + α₁X²_{t−1} + β₁σ²_{t−1}
  = α₀/(1 − β₁) + α₁ Σ_{i=1}^{∞} β₁^{i−1} X²_{t−i}
Unconditional variance:

Var(X_t) = α₀ / (1 − Σ_{i=1}^{p} αᵢ − Σ_{j=1}^{q} βⱼ)

GARCH models
GARCH(p,q)-process

Necessary condition for weak stationarity:

Σ_{i=1}^{p} αᵢ + Σ_{j=1}^{q} βⱼ < 1

(X_t)_{t∈ℤ} has no autocorrelation
GARCH-processes can be written as ARMA(max(p, q), q)-processes in the squared returns
Example: GARCH(1, 1)-process with X_t = σ_t ε_t and
σ²_t = α₀ + α₁X²_{t−1} + β₁σ²_{t−1}


GARCH models
Estimation of GARCH(p,q)-processes

Estimation of the ARMA(max(p, q), q)-process in the squared returns
Alternative (and better) method: Maximum likelihood
For a GARCH(1, 1)-process,

f_{X₁,…,X_T}(x₁, …, x_T) = f_{X₁}(x₁) Π_{t=2}^{T} (1/√(2πσ²_t)) exp(−½ (x_t/σ_t)²)

GARCH models
Estimation of GARCH(p,q)-processes

Again, the density of X₁ can be neglected
Log-likelihood function:

ln L(α₀, α₁, β₁ | x₁, …, x_T) = −((T−1)/2) ln(2π) − ½ Σ_{t=2}^{T} ln σ²_t − ½ Σ_{t=2}^{T} (x_t/σ_t)²

with σ²_t = α₀ + α₁x²_{t−1} + β₁σ²_{t−1} and σ²₁ = 0
ML estimation of α₀, α₁ and β₁ by numerical maximization


GARCH models
Estimation of GARCH(p,q)-processes

Conditional h-step forecast of the volatility σ²_{t+h} in a GARCH(1, 1) model:

E(σ²_{t+h} | X_t, X_{t−1}, …) = (α₁ + β₁)^h (σ²_t − α₀/(1 − α₁ − β₁)) + α₀/(1 − α₁ − β₁)

If the process is stationary,

lim_{h→∞} E(σ²_{t+h} | X_t, X_{t−1}, …) = α₀ / (1 − α₁ − β₁)

Simulation of GARCH-processes is easy; the estimation can be computer intensive

GARCH models
Residuals of an estimated GARCH(1,1) model

Careful: Residuals are slightly different from what you know from OLS regressions
Estimates: α̂₀, α̂₁, β̂₁, μ̂
From σ̂²_t = α̂₀ + α̂₁X²_{t−1} + β̂₁σ̂²_{t−1} and X_t = μ + σ_t ε_t we calculate the standardized residuals

ε̂_t = (X_t − μ̂)/σ̂_t = (X_t − μ̂) / √(α̂₀ + α̂₁X²_{t−1} + β̂₁σ̂²_{t−1})

Histogram of the standardized residuals


GARCH models
AR(p)-ARCH(q)-models

Definition: (X_t)_{t∈ℤ} is called an AR(p)-ARCH(q)-process if
X_t = ν + φ₁X_{t−1} + ε_t
σ²_t = α₀ + α₁ε²_{t−1}
where ε_t ~ N(0, σ²_t)
→ mean equation / variance equation
Maximum likelihood estimation


GARCH models
Extensions of the GARCH model

There are a number of possible extensions to the GARCH model:
Empirical fact: Negative shocks have a larger impact on volatility than positive shocks (leverage effect)
→ News impact curve
Nonnormal innovations, e.g. ε_t ~ t-distribution

