
Introduction to Time Series Analysis. Lecture 11.

Peter Bartlett
Last lecture: Forecasting.
1. The innovations representation.
2. Recursive method: Innovations algorithm.
1
Introduction to Time Series Analysis. Lecture 11.
1. Review: Forecasting.
2. Example: Innovations algorithm for forecasting an MA(1)
3. Linear prediction based on the infinite past
4. The truncated predictor
2
Review: One-step-ahead linear prediction
$$X^n_{n+1} = \phi_{n1} X_n + \phi_{n2} X_{n-1} + \cdots + \phi_{nn} X_1,$$
$$\Gamma_n \phi_n = \gamma_n, \qquad P^n_{n+1} = E\left(X_{n+1} - X^n_{n+1}\right)^2 = \gamma(0) - \gamma_n' \Gamma_n^{-1} \gamma_n,$$
$$\Gamma_n = \begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(n-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(n-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \cdots & \gamma(0) \end{pmatrix},$$
$$\phi_n = (\phi_{n1}, \phi_{n2}, \ldots, \phi_{nn})', \qquad \gamma_n = (\gamma(1), \gamma(2), \ldots, \gamma(n))'.$$
3
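As an aside (not part of the original slides), the prediction equations above are easy to solve numerically. Below is a minimal sketch in Python, assuming the autocovariances $\gamma(0), \ldots, \gamma(n)$ are given as a list; the function name `one_step_predictor` is made up for this example.

```python
import numpy as np

def one_step_predictor(gamma, x):
    """Best linear one-step-ahead predictor X^n_{n+1} from X_1, ..., X_n.

    gamma : autocovariances [gamma(0), gamma(1), ..., gamma(n)]
    x     : observed values [X_1, ..., X_n]
    Returns the forecast X^n_{n+1} and its mean squared error P^n_{n+1}.
    """
    n = len(x)
    # Gamma_n is the n x n Toeplitz matrix with (i, j) entry gamma(|i - j|).
    Gamma_n = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
    gamma_n = np.asarray(gamma[1:n + 1])        # (gamma(1), ..., gamma(n))'
    phi_n = np.linalg.solve(Gamma_n, gamma_n)   # Gamma_n phi_n = gamma_n
    forecast = phi_n @ np.asarray(x)[::-1]      # phi_{n1} X_n + ... + phi_{nn} X_1
    mse = gamma[0] - gamma_n @ phi_n            # gamma(0) - gamma_n' Gamma_n^{-1} gamma_n
    return forecast, mse
```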
Review: The innovations representation
Write the best linear predictor as
$$X^n_{n+1} = \theta_{n1}\underbrace{\left(X_n - X^{n-1}_n\right)}_{\text{innovation}} + \theta_{n2}\left(X_{n-1} - X^{n-2}_{n-1}\right) + \cdots + \theta_{nn}\left(X_1 - X^0_1\right).$$
The innovations are uncorrelated:
$$\mathrm{Cov}\left(X_j - X^{j-1}_j,\; X_i - X^{i-1}_i\right) = 0 \quad \text{for } i \neq j.$$
We'll see that this is useful for estimation.
4
Introduction to Time Series Analysis. Lecture 11.
1. Review: Forecasting.
2. Example: Innovations algorithm for forecasting an MA(1)
3. Linear prediction based on the infinite past
4. The truncated predictor
5
Example: Innovations algorithm for forecasting an MA(1)
Suppose that we have an MA(1) process $\{X_t\}$ satisfying
$$X_t = W_t + \theta_1 W_{t-1}.$$
Given $X_1, X_2, \ldots, X_n$, we wish to compute the best linear forecast of $X_{n+1}$, using the innovations representation,
$$X^0_1 = 0, \qquad X^n_{n+1} = \sum_{i=1}^{n} \theta_{ni}\left(X_{n+1-i} - X^{n-i}_{n+1-i}\right).$$
6
Example: Innovations algorithm for forecasting an MA(1)
An aside: The linear predictions are in the form
$$X^n_{n+1} = \sum_{i=1}^{n} \theta_{ni} Z_{n+1-i}$$
for uncorrelated, zero mean random variables $Z_i$. In particular,
$$X_{n+1} = Z_{n+1} + \sum_{i=1}^{n} \theta_{ni} Z_{n+1-i},$$
where $Z_{n+1} = X_{n+1} - X^n_{n+1}$ (and all the $Z_i$ are uncorrelated).
This is suggestive of an MA representation. Why isn't it an MA?
7
Example: Innovations algorithm for forecasting an MA(1)
$$\theta_{n,n-i} = \frac{1}{P^i_{i+1}}\left(\gamma(n-i) - \sum_{j=0}^{i-1} \theta_{i,i-j}\,\theta_{n,n-j}\,P^j_{j+1}\right),$$
$$P^0_1 = \gamma(0), \qquad P^n_{n+1} = \gamma(0) - \sum_{i=0}^{n-1} \theta^2_{n,n-i}\,P^i_{i+1}.$$
The algorithm computes $P^0_1 = \gamma(0)$, $\theta_{1,1}$ (in terms of $\gamma(1)$); $P^1_2$, $\theta_{2,2}$ (in terms of $\gamma(2)$), $\theta_{2,1}$; $P^2_3$, $\theta_{3,3}$ (in terms of $\gamma(3)$), etc.
8
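As an illustration (not in the original slides), the recursions above translate directly into code. A minimal sketch, assuming the autocovariance function $\gamma(\cdot)$ is available as a Python callable; the function name `innovations` is made up here.

```python
import numpy as np

def innovations(gamma, n):
    """Innovations algorithm: theta[k][j] holds theta_{k,j} (j = 1, ..., k)
    and P[k] holds P^k_{k+1}, for k = 0, ..., n, given gamma(h) as a callable."""
    theta = [dict() for _ in range(n + 1)]
    P = np.zeros(n + 1)
    P[0] = gamma(0)                              # P^0_1 = gamma(0)
    for k in range(1, n + 1):
        for i in range(k):                       # theta_{k,k-i}, i = 0, ..., k-1
            s = sum(theta[i].get(i - j, 0.0) * theta[k].get(k - j, 0.0) * P[j]
                    for j in range(i))
            theta[k][k - i] = (gamma(k - i) - s) / P[i]
        # P^k_{k+1} = gamma(0) - sum_i theta_{k,k-i}^2 P^i_{i+1}
        P[k] = gamma(0) - sum(theta[k][k - i] ** 2 * P[i] for i in range(k))
    return theta, P
```

Since $\gamma(k-i)$ vanishes for an MA(q) unless $k - i \leq q$, at most $q$ of the $\theta_{k,j}$ are nonzero at each level, as the next slides show for $q = 1$.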
Example: Innovations algorithm for forecasting an MA(1)
$$\theta_{n,n-i} = \frac{1}{P^i_{i+1}}\left(\gamma(n-i) - \sum_{j=0}^{i-1} \theta_{i,i-j}\,\theta_{n,n-j}\,P^j_{j+1}\right).$$
For an MA(1), $\gamma(0) = \sigma^2(1 + \theta_1^2)$, $\gamma(1) = \theta_1\sigma^2$.
Thus: $\theta_{1,1} = \gamma(1)/P^0_1$;
$\theta_{2,2} = 0$, $\theta_{2,1} = \gamma(1)/P^1_2$;
$\theta_{3,3} = \theta_{3,2} = 0$, $\theta_{3,1} = \gamma(1)/P^2_3$, etc.
Because $\gamma(n-i) \neq 0$ only for $i = n-1$, only $\theta_{n,1} \neq 0$.
9
Example: Innovations algorithm for forecasting an MA(1)
For the MA(1) process $\{X_t\}$ satisfying
$$X_t = W_t + \theta_1 W_{t-1},$$
the innovations representation of the best linear forecast is
$$X^0_1 = 0, \qquad X^n_{n+1} = \theta_{n1}\left(X_n - X^{n-1}_n\right).$$
More generally, for an MA(q) process, we have $\theta_{ni} = 0$ for $i > q$.
10
Example: Innovations algorithm for forecasting an MA(1)
For the MA(1) process $\{X_t\}$,
$$X^0_1 = 0, \qquad X^n_{n+1} = \theta_{n1}\left(X_n - X^{n-1}_n\right).$$
This is consistent with the observation that
$$X_{n+1} = Z_{n+1} + \sum_{i=1}^{n} \theta_{ni} Z_{n+1-i},$$
where the uncorrelated $Z_i$ are defined by $Z_t = X_t - X^{t-1}_t$ for $t = 1, \ldots, n+1$.
Indeed, as $n$ increases, $P^n_{n+1} \to \mathrm{Var}(W_t)$ (recall the recursion for $P^n_{n+1}$), and $\theta_{n1} = \gamma(1)/P^{n-1}_n \to \theta_1$.
11
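A quick numerical check of these limits (added for illustration), using the hypothetical `innovations` sketch from earlier with the MA(1) autocovariances $\gamma(0) = \sigma^2(1+\theta_1^2)$, $\gamma(1) = \theta_1\sigma^2$, $\gamma(h) = 0$ for $h > 1$; the parameter values are arbitrary.

```python
theta1, sigma2 = 0.6, 1.0

def gamma_ma1(h):
    # Autocovariance of X_t = W_t + theta1 W_{t-1} with Var(W_t) = sigma2.
    if h == 0:
        return sigma2 * (1 + theta1 ** 2)
    if h == 1:
        return theta1 * sigma2
    return 0.0

theta, P = innovations(gamma_ma1, 50)
# Only theta_{n,1} is nonzero; theta_{n,1} -> theta1 and P^n_{n+1} -> Var(W_t):
print(theta[50][1], max(abs(v) for j, v in theta[50].items() if j > 1))
print(P[50])   # close to sigma2
```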
Recall: Forecasting an AR(p)
For the AR(p) process $\{X_t\}$ satisfying
$$X_t = \sum_{i=1}^{p} \phi_i X_{t-i} + W_t,$$
$$X^0_1 = 0, \qquad X^n_{n+1} = \sum_{i=1}^{p} \phi_i X_{n+1-i}$$
for $n \geq p$. Then
$$X_{n+1} = \sum_{i=1}^{p} \phi_i X_{n+1-i} + Z_{n+1},$$
where $Z_{n+1} = X_{n+1} - X^n_{n+1}$.
The Durbin-Levinson algorithm is convenient for AR(p) processes.
The innovations algorithm is convenient for MA(q) processes.
12
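For completeness (not in the original slides), the AR(p) one-step forecast above is a single dot product once $n \geq p$; a minimal sketch:

```python
import numpy as np

def ar_one_step(phi, x):
    """One-step forecast X^n_{n+1} = sum_i phi_i X_{n+1-i} for an AR(p),
    assuming n = len(x) >= p = len(phi)."""
    phi = np.asarray(phi)
    p = len(phi)
    return phi @ np.asarray(x)[-1:-p - 1:-1]   # phi_1 X_n + ... + phi_p X_{n+1-p}
```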
Introduction to Time Series Analysis. Lecture 11.
1. Review: Forecasting.
2. Example: Innovations algorithm for forecasting an MA(1)
3. An aside: Innovations algorithm for forecasting an ARMA(p,q)
4. Linear prediction based on the infinite past
5. The truncated predictor
13
An aside: Forecasting an ARMA(p,q)
There is a related representation for an ARMA(p,q) process, based on the
innovations algorithm. Suppose that $\{X_t\}$ is an ARMA(p,q) process:
$$X_t = \sum_{j=1}^{p} \phi_j X_{t-j} + W_t + \sum_{j=1}^{q} \theta_j W_{t-j}.$$
Consider the transformed process (C. F. Ansley, Biometrika 66: 59–65, 1979)
$$Z_t = \begin{cases} X_t/\sigma & \text{if } t = 1, \ldots, m, \\ \phi(B)X_t/\sigma & \text{if } t > m, \end{cases}$$
where $m = \max(p, q)$.
If p > 0, this is not stationary. However, there is a more general version of
the innovations algorithm, which is applicable to nonstationary processes.
14
An aside: Forecasting an ARMA(p,q)
Let $\theta_{n,j}$ be the coefficients obtained from the application of the innovations
algorithm to this process $Z_t$. This gives the representation
$$X^n_{n+1} = \begin{cases} \displaystyle\sum_{j=1}^{n} \theta_{nj}\left(X_{n+1-j} - X^{n-j}_{n+1-j}\right) & n < m, \\[2ex] \displaystyle\sum_{j=1}^{p} \phi_j X_{n+1-j} + \sum_{j=1}^{q} \theta_{nj}\left(X_{n+1-j} - X^{n-j}_{n+1-j}\right) & n \geq m. \end{cases}$$
For a causal, invertible $\{X_t\}$:
$$E\left(X_n - X^{n-1}_n - W_n\right)^2 \to 0, \qquad \theta_{nj} \to \theta_j, \qquad P^n_{n+1} \to \sigma^2.$$
Notice that this illustrates one way to simulate an ARMA(p,q) process
exactly. Why?
15
Introduction to Time Series Analysis. Lecture 11.
1. Review: Forecasting.
2. Example: Innovations algorithm for forecasting an MA(1)
3. An aside: Innovations algorithm for forecasting an ARMA(p,q)
4. Linear prediction based on the infinite past
5. The truncated predictor
16
Linear prediction based on the infinite past
So far, we have considered linear predictors based on n observed values of
the time series:
$$X^n_{n+m} = P(X_{n+m} \mid X_n, X_{n-1}, \ldots, X_1).$$
What if we have access to all previous values, $X_n, X_{n-1}, X_{n-2}, \ldots$?
Write
$$\tilde{X}_{n+m} = P(X_{n+m} \mid X_n, X_{n-1}, \ldots) = \sum_{i=1}^{\infty} \alpha_i X_{n+1-i}.$$
17
Linear prediction based on the infinite past
$$\tilde{X}_{n+m} = P(X_{n+m} \mid X_n, X_{n-1}, \ldots) = \sum_{i=1}^{\infty} \alpha_i X_{n+1-i}.$$
The orthogonality property of the optimal linear predictor implies
$$E\left[\left(\tilde{X}_{n+m} - X_{n+m}\right)X_{n+1-i}\right] = 0, \qquad i = 1, 2, \ldots$$
Thus, if $\{X_t\}$ is a zero-mean stationary time series, we have
$$\sum_{j=1}^{\infty} \alpha_j \gamma(i-j) = \gamma(m-1+i), \qquad i = 1, 2, \ldots$$
18
Linear prediction based on the infinite past
If $\{X_t\}$ is a causal, invertible, linear process, we can write
$$X_{n+m} = \sum_{j=1}^{\infty} \psi_j W_{n+m-j} + W_{n+m}, \qquad W_{n+m} = \sum_{j=1}^{\infty} \pi_j X_{n+m-j} + X_{n+m}.$$
In this case,
$$\begin{aligned} \tilde{X}_{n+m} &= P(X_{n+m} \mid X_n, X_{n-1}, \ldots) \\ &= P(W_{n+m} \mid X_n, \ldots) - \sum_{j=1}^{\infty} \pi_j P(X_{n+m-j} \mid X_n, \ldots) \\ &= -\sum_{j=1}^{m-1} \pi_j P(X_{n+m-j} \mid X_n, \ldots) - \sum_{j=m}^{\infty} \pi_j X_{n+m-j}. \end{aligned}$$
19
Linear prediction based on the infinite past
$$\tilde{X}_{n+m} = -\sum_{j=1}^{m-1} \pi_j P(X_{n+m-j} \mid X_n, \ldots) - \sum_{j=m}^{\infty} \pi_j X_{n+m-j}.$$
That is,
$$\tilde{X}_{n+1} = -\sum_{j=1}^{\infty} \pi_j X_{n+1-j},$$
$$\tilde{X}_{n+2} = -\pi_1 \tilde{X}_{n+1} - \sum_{j=2}^{\infty} \pi_j X_{n+2-j},$$
$$\tilde{X}_{n+3} = -\pi_1 \tilde{X}_{n+2} - \pi_2 \tilde{X}_{n+1} - \sum_{j=3}^{\infty} \pi_j X_{n+3-j}.$$
The invertible (AR($\infty$)) representation gives the forecasts $\tilde{X}_{n+m}$.
20
Linear prediction based on the infinite past
To compute the mean squared error, we notice that
$$\begin{aligned} \tilde{X}_{n+m} = P(X_{n+m} \mid X_n, X_{n-1}, \ldots) &= \sum_{j=1}^{\infty} \psi_j P(W_{n+m-j} \mid X_n, X_{n-1}, \ldots) + P(W_{n+m} \mid X_n, X_{n-1}, \ldots) \\ &= \sum_{j=m}^{\infty} \psi_j W_{n+m-j}. \end{aligned}$$
$$E\left(X_{n+m} - P(X_{n+m} \mid X_n, X_{n-1}, \ldots)\right)^2 = E\left(\sum_{j=0}^{m-1} \psi_j W_{n+m-j}\right)^2 = \sigma^2_w \sum_{j=0}^{m-1} \psi_j^2.$$
21
Linear prediction based on the infinite past
That is, the mean squared error of the forecast based on the infinite history
is given by the initial terms of the causal (MA($\infty$)) representation:
$$E\left(X_{n+m} - \tilde{X}_{n+m}\right)^2 = \sigma^2_w \sum_{j=0}^{m-1} \psi_j^2.$$
In particular, for $m = 1$, the mean squared error is $\sigma^2_w$.
22
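As a concrete instance (added for illustration), take the MA(1) process from earlier, for which $\psi_0 = 1$, $\psi_1 = \theta_1$, and $\psi_j = 0$ for $j \geq 2$:
$$E\left(X_{n+1} - \tilde{X}_{n+1}\right)^2 = \sigma^2_w, \qquad E\left(X_{n+m} - \tilde{X}_{n+m}\right)^2 = \sigma^2_w\left(1 + \theta_1^2\right) = \gamma(0) \quad \text{for } m \geq 2.$$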
The truncated forecast
For large n, truncating the infinite-past forecasts gives a good
approximation:
$$\tilde{X}_{n+m} = -\sum_{j=1}^{m-1} \pi_j \tilde{X}_{n+m-j} - \sum_{j=m}^{\infty} \pi_j X_{n+m-j},$$
$$\tilde{X}^n_{n+m} = -\sum_{j=1}^{m-1} \pi_j \tilde{X}^n_{n+m-j} - \sum_{j=m}^{n+m-1} \pi_j X_{n+m-j}.$$
The approximation is exact for AR(p) when $n \geq p$, since $\pi_j = 0$ for $j > p$.
In general, it is a good approximation if the $\pi_j$ converge quickly to 0.
23
Example: Forecasting an ARMA(p,q) model
Consider an ARMA(p,q) model:
$$X_t - \sum_{i=1}^{p} \phi_i X_{t-i} = W_t + \sum_{i=1}^{q} \theta_i W_{t-i}.$$
Suppose we have $X_1, X_2, \ldots, X_n$, and we wish to forecast $X_{n+m}$.
We could use the best linear prediction, $X^n_{n+m}$.
For an AR(p) model (that is, $q = 0$), we can write down the coefficients $\phi_n$.
Otherwise, we must solve a linear system of size n.
If n is large, the truncated forecasts $\tilde{X}^n_{n+m}$ give a good approximation. To
compute them, we could compute $\pi_i$ and truncate.
There is also a recursive method, which takes time $O((n+m)(p+q))$...
24
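To make the "compute $\pi_i$ and truncate" route concrete (a sketch not taken from the lecture): for a causal, invertible ARMA(p,q), matching coefficients in $\pi(B)\theta(B) = \phi(B)$ gives $\pi_j = -\phi_j - \sum_{k=1}^{\min(j,q)} \theta_k \pi_{j-k}$, with $\pi_0 = 1$ and $\phi_j = 0$ for $j > p$. The function name `pi_weights` is made up here.

```python
import numpy as np

def pi_weights(phi, theta, n_terms):
    """AR(infinity) coefficients pi_1, ..., pi_{n_terms} of an ARMA(p, q)
    written as X_t - sum_i phi_i X_{t-i} = W_t + sum_i theta_i W_{t-i}."""
    phi, theta = list(phi), list(theta)
    pi = [1.0]                                   # pi_0 = 1
    for j in range(1, n_terms + 1):
        phi_j = phi[j - 1] if j <= len(phi) else 0.0
        s = sum(theta[k - 1] * pi[j - k] for k in range(1, min(j, len(theta)) + 1))
        pi.append(-phi_j - s)
    return np.array(pi[1:])                      # pi_1, ..., pi_{n_terms}
```

The truncated forecast $\tilde{X}^n_{n+m}$ then follows from the formula on the previous slide, with these $\pi_j$ and the infinite sum cut off at $j = n+m-1$.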
Recursive truncated forecasts for an ARMA(p,q) model
$$\tilde{W}^n_t = 0 \quad \text{for } t \leq 0.$$
$$\tilde{X}^n_t = \begin{cases} 0 & \text{for } t \leq 0, \\ X_t & \text{for } 1 \leq t \leq n. \end{cases}$$
$$\tilde{W}^n_t = \tilde{X}^n_t - \phi_1 \tilde{X}^n_{t-1} - \cdots - \phi_p \tilde{X}^n_{t-p} - \theta_1 \tilde{W}^n_{t-1} - \cdots - \theta_q \tilde{W}^n_{t-q} \quad \text{for } t = 1, \ldots, n.$$
$$\tilde{W}^n_t = 0 \quad \text{for } t > n.$$
$$\tilde{X}^n_t = \phi_1 \tilde{X}^n_{t-1} + \cdots + \phi_p \tilde{X}^n_{t-p} + \theta_1 \tilde{W}^n_{t-1} + \cdots + \theta_q \tilde{W}^n_{t-q} \quad \text{for } t = n+1, \ldots, n+m.$$
25
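A direct translation of these recursions into Python (a sketch; the function name `truncated_forecast` and the zero-padding scheme are choices made for this example, not part of the lecture):

```python
import numpy as np

def truncated_forecast(phi, theta, x, m):
    """Recursive truncated forecasts X~^n_{n+1}, ..., X~^n_{n+m} for an ARMA(p, q)
    X_t = sum_i phi_i X_{t-i} + W_t + sum_i theta_i W_{t-i}, given x = (X_1, ..., X_n)."""
    p, q, n = len(phi), len(theta), len(x)
    pad = max(p, q)
    # Position pad + t - 1 holds time t; positions for t <= 0 stay zero.
    X = np.zeros(pad + n + m)
    W = np.zeros(pad + n + m)
    X[pad:pad + n] = x                           # X~^n_t = X_t for 1 <= t <= n
    for t in range(1, n + 1):                    # W~^n_t for t = 1, ..., n
        k = pad + t - 1
        W[k] = (X[k]
                - sum(phi[i] * X[k - 1 - i] for i in range(p))
                - sum(theta[i] * W[k - 1 - i] for i in range(q)))
    for t in range(n + 1, n + m + 1):            # X~^n_t for t = n+1, ..., n+m
        k = pad + t - 1
        X[k] = (sum(phi[i] * X[k - 1 - i] for i in range(p))
                + sum(theta[i] * W[k - 1 - i] for i in range(q)))
        # W~^n_t = 0 for t > n, so no W update is needed here.
    return X[pad + n:pad + n + m]                # the m forecasts
```

For an AR(p) (q = 0) this reproduces the exact best linear forecast once $n \geq p$, as noted earlier.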
Example: Forecasting an AR(2) model
Consider the following AR(2) model.
$$X_t + \frac{1}{1.21}\,X_{t-2} = W_t.$$
The zeros of the characteristic polynomial $z^2 + 1.21$ are at $\pm 1.1i$. We can
solve the linear difference equations $\psi_0 = 1$, $\phi(B)\psi_t = 0$ to compute the
MA($\infty$) representation:
$$\psi_t = \frac{1}{1.1^t}\cos\!\left(\frac{\pi t}{2}\right).$$
Thus, the m-step-ahead estimates have mean squared error
$$E\left(X_{n+m} - \tilde{X}_{n+m}\right)^2 = \sigma^2_w \sum_{j=0}^{m-1} \psi_j^2.$$
26
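The plots on the next slides can be reproduced with a short script. A rough sketch, assuming $\sigma^2_w = 1$ and reusing the hypothetical `truncated_forecast` helper from the previous slide:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.array([0.0, -1.0 / 1.21])               # X_t = -X_{t-2}/1.21 + W_t

# MA(infinity) weights via psi_0 = 1, psi_t = phi_1 psi_{t-1} + phi_2 psi_{t-2};
# psi[t] agrees with 1.1**(-t) * cos(pi * t / 2).
psi = np.zeros(31)
psi[0] = 1.0
for t in range(1, 31):
    psi[t] = phi[0] * psi[t - 1] + (phi[1] * psi[t - 2] if t >= 2 else 0.0)

# Simulate a sample path (with a burn-in) and forecast m steps ahead.
n, m, burn = 20, 10, 100
w = rng.standard_normal(n + burn)
x = np.zeros(n + burn)
for t in range(2, n + burn):
    x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + w[t]
x = x[burn:]                                     # X_1, ..., X_n

forecasts = truncated_forecast(phi, [], x, m)
mse = np.cumsum(psi[:m] ** 2)                    # sigma_w^2 * sum_{j<m} psi_j^2
bands = 1.96 * np.sqrt(mse)                      # forecasts +/- bands: 95% intervals
```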
Example: Forecasting an AR(2) model
[Figure: the MA($\infty$) coefficients $\psi_i$ plotted against $i$ for the AR(2) model $X_t + 0.8264\,X_{t-2} = W_t$.]
27
Example: Forecasting an AR(2) model
[Figure: a sample path $X_t$ of the AR(2) model $X_t + 0.8264\,X_{t-2} = W_t$, plotted against $t$.]
28
Example: Forecasting an AR(2) model
[Figure: $X_t$ with one-step predictions and 95% prediction intervals for the AR(2) model $X_t + 0.8264\,X_{t-2} = W_t$.]
29
Example: Forecasting an AR(2) model
[Figure: $X_t$ with predictions and 95% prediction intervals for the AR(2) model $X_t + 0.8264\,X_{t-2} = W_t$.]
30
Introduction to Time Series Analysis. Lecture 11.
1. Review: Forecasting.
2. Example: Innovations algorithm for forecasting an MA(1)
3. Linear prediction based on the infinite past
4. The truncated predictor
31