
Maximum Likelihood Estimation of an ARMA(p,q) Model

Constantino Hevia
The World Bank. DECRG.
October 2008
This note describes the Matlab function arma_mle.m that computes the maximum likelihood
estimates of a stationary ARMA(p,q) model.
Problem: To fit an ARMA(p,q) model to a vector of time series $\{y_1, y_2, \ldots, y_T\}$ with zero unconditional mean. An ARMA(p,q) process is given by

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},$$

where $\varepsilon_t$ is an i.i.d. shock normally distributed with mean zero and variance $\sigma^2$. If the original series does not have zero mean, we first construct $\tilde y_t = y_t - \frac{1}{T}\sum_{s=1}^{T} y_s$,¹ and then fit the ARMA model to $\tilde y_t$.
Usage: results = arma_mle(y,p,q,[info])

Arguments:
y = vector of observed time series with mean zero.
p = order of the autoregressive (AR) part of the ARMA model (integer).
q = order of the moving average (MA) part of the ARMA model (integer).
info = [optional] If info is not zero, the program prints information about the convergence of the optimization algorithm. The default value is zero.

Output: A structure with the following elements:
results.ar = $(\hat\phi_1, \hat\phi_2, \ldots, \hat\phi_p)$: estimated coefficients of the AR part.
results.ma = $(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_q)$: estimated coefficients of the MA part.
results.sigma = $\hat\sigma$: estimated standard deviation of $\varepsilon_t$.
¹ The file test_arma_mle.m performs a Monte Carlo experiment using the function arma_mle.m. The user inputs a theoretical ARMA model. The program runs a large number of simulations and then estimates the parameters for each simulation. Finally, histograms of the estimates are shown.
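As a quick usage sketch (my example, not part of the original note: it assumes arma_mle.m is on the Matlab path, and the ARMA(1,1) parameter values are arbitrary):

    % Simulate T observations of an ARMA(1,1) with phi = 0.7, theta = 0.4,
    % sigma = 1, then try to recover the parameters with arma_mle.
    T = 1000;
    e = randn(T,1);                     % i.i.d. N(0,1) shocks
    y = filter([1 0.4], [1 -0.7], e);   % y_t = 0.7*y_{t-1} + e_t + 0.4*e_{t-1}
    y = y - mean(y);                    % remove the sample mean, as required

    results = arma_mle(y, 1, 1);        % fit an ARMA(1,1); info defaults to 0
    disp(results.ar)                    % estimate of phi (close to 0.7)
    disp(results.ma)                    % estimate of theta (close to 0.4)
    disp(results.sigma)                 % estimate of sigma (close to 1)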
Algorithm
In this section I describe the algorithm used to compute the maximum likelihood estimates of the ARMA(p,q) process. Suppose that we want to fit the (mean zero) time series $\{y_t\}_{t=0}^{T}$ to the following ARMA(p,q) model

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \tag{1}$$

where $\varepsilon_t$ is an i.i.d. shock normally distributed with mean zero and variance $\sigma^2$. Let $r = \max(p, q+1)$ and rewrite the model as

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_r y_{t-r} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_{r-1} \varepsilon_{t-r+1}, \tag{2}$$

where we interpret $\phi_j = 0$ for $j > p$ and $\theta_j = 0$ for $j > q$.
The estimation procedure is based on the Kalman filter (see Hamilton (1994) for the derivation of the filter). To use the Kalman filter we need to write the model in the following (state-space) form

$$x_{t+1} = A x_t + R \varepsilon_{t+1} \tag{3}$$

$$y_t = Z' x_t \tag{4}$$

where $x_t$ is an $r \times 1$ state vector, $A$ is an $r \times r$ matrix, and $R$ and $Z$ are $r \times 1$ vectors. These matrices and vectors are defined as follows:

$$A = \begin{bmatrix} \phi_1 & 1 & 0 & \cdots & 0 \\ \phi_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ \phi_{r-1} & 0 & 0 & \cdots & 1 \\ \phi_r & 0 & 0 & \cdots & 0 \end{bmatrix}; \qquad R = \begin{bmatrix} 1 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{r-1} \end{bmatrix}; \qquad Z = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$
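For concreteness, here is a minimal Matlab sketch of how $A$, $R$, and $Z$ can be assembled from coefficient vectors phi (length p) and theta (length q); the variable names are mine and need not match those used inside arma_mle.m:

    % Build the state-space matrices for given AR and MA coefficients.
    r     = max(length(phi), length(theta) + 1);
    phi_r = [phi(:);   zeros(r - length(phi), 1)];        % phi_j = 0 for j > p
    th_r  = [theta(:); zeros(r - 1 - length(theta), 1)];  % theta_j = 0 for j > q

    A = [phi_r, [eye(r-1); zeros(1, r-1)]];  % phi in column 1, shifted identity
    R = [1; th_r];                           % (1, theta_1, ..., theta_{r-1})'
    Z = [1; zeros(r-1, 1)];                  % selects the first element of x_t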
To see that the system (3)-(4) is equivalent to (2), write the last row of (3) as

$$x_{r,t+1} = \phi_r x_{1,t} + \theta_{r-1}\varepsilon_{t+1}.$$

Lagging this equation $r-1$ periods we find

$$x_{r,t-r+2} = \phi_r L^{r-1} x_{1,t} + \theta_{r-1} L^{r-1}\varepsilon_{t+1}, \tag{5}$$

where we define $L^{s} x_t = x_{t-s}$ as the $s$-period lag operator for any integer $s \geq 0$. The second-to-last row implies

$$x_{r-1,t+1} = \phi_{r-1} x_{1,t} + x_{r,t} + \theta_{r-2}\varepsilon_{t+1}.$$

Lagging $r-2$ periods we obtain

$$x_{r-1,t-r+3} = \phi_{r-1} L^{r-2} x_{1,t} + x_{r,t-r+2} + \theta_{r-2} L^{r-2}\varepsilon_{t+1}.$$
Introducing (5) into the previous equation we find

$$x_{r-1,t-r+3} = \phi_{r-1} L^{r-2} x_{1,t} + \phi_r L^{r-1} x_{1,t} + \theta_{r-1} L^{r-1}\varepsilon_{t+1} + \theta_{r-2} L^{r-2}\varepsilon_{t+1},$$

or

$$x_{r-1,t-r+3} = \left(\phi_{r-1} L^{r-2} + \phi_r L^{r-1}\right) x_{1,t} + \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2}\right)\varepsilon_{t+1}. \tag{6}$$
Take now row $r-2$,

$$x_{r-2,t+1} = \phi_{r-2} x_{1,t} + x_{r-1,t} + \theta_{r-3}\varepsilon_{t+1}.$$

Lagging $r-3$ periods we find

$$x_{r-2,t-r+4} = \phi_{r-2} L^{r-3} x_{1,t} + x_{r-1,t-r+3} + \theta_{r-3} L^{r-3}\varepsilon_{t+1}.$$

Plugging (6) into the previous equation we obtain

$$x_{r-2,t-r+4} = \left(\phi_{r-2} L^{r-3} + \phi_{r-1} L^{r-2} + \phi_r L^{r-1}\right) x_{1,t} + \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2} + \theta_{r-3} L^{r-3}\right)\varepsilon_{t+1}.$$
Following this iterative procedure until the first row we find

$$x_{1,t+1} = \left(\phi_1 + \cdots + \phi_{r-2} L^{r-3} + \phi_{r-1} L^{r-2} + \phi_r L^{r-1}\right) x_{1,t} + \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2} + \theta_{r-3} L^{r-3} + \cdots + 1\right)\varepsilon_{t+1},$$

or

$$\left(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_r L^r\right) x_{1,t+1} = \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2} + \theta_{r-3} L^{r-3} + \cdots + 1\right)\varepsilon_{t+1}. \tag{7}$$
Now, the observation equation (4) and the definition of $Z$ imply $y_t = x_{1,t}$. Using (7) evaluated at $t$ we arrive at the ARMA representation (2),

$$\left(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_r L^r\right) y_t = \left(\theta_{r-1} L^{r-1} + \theta_{r-2} L^{r-2} + \theta_{r-3} L^{r-3} + \cdots + 1\right)\varepsilon_t,$$

which proves that the system (3)-(4) is equivalent to (2).
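The equivalence is also easy to verify numerically. The following sketch (my check, with arbitrary ARMA(2,1) parameter values) feeds the same shocks through the recursion (2) and through the state-space system (3)-(4), starting both from zero initial conditions:

    % Check that (3)-(4) reproduces the ARMA recursion (2).
    % ARMA(2,1): phi = (0.5, -0.3), theta = 0.4, so r = max(2, 2) = 2.
    A = [0.5 1; -0.3 0];   R = [1; 0.4];   Z = [1; 0];
    T = 200;   e = randn(T,1);

    x = zeros(2,1);   y_ss = zeros(T,1);
    for t = 1:T
        x       = A*x + R*e(t);     % state transition (3)
        y_ss(t) = Z'*x;             % observation equation (4)
    end

    y_arma = zeros(T,1);
    for t = 1:T
        y1 = 0; y2 = 0; e1 = 0;     % zero pre-sample values
        if t > 1, y1 = y_arma(t-1); e1 = e(t-1); end
        if t > 2, y2 = y_arma(t-2); end
        y_arma(t) = 0.5*y1 - 0.3*y2 + e(t) + 0.4*e1;   % recursion (2)
    end

    disp(max(abs(y_ss - y_arma)))   % zero up to rounding error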
Denote by $\hat x_{t+1|t} = E[x_{t+1} \mid y_0, \ldots, y_t; x_0]$ the expected value of $x_{t+1}$ conditional on the history of observations $(y_0, \ldots, y_t)$. The Kalman filter provides an algorithm for computing $\hat x_{t+1|t}$ recursively given an initial value $\hat x_{1|0} = 0$. (Note that $0$ is the unconditional mean of $x_t$.) Associated with each of these forecasts is a mean squared error matrix, defined as

$$P_{t+1|t} = E\left[\left(x_{t+1} - \hat x_{t+1|t}\right)\left(x_{t+1} - \hat x_{t+1|t}\right)'\right].$$
Given the estimate $\hat x_{t|t-1}$, we use (4) to compute the innovations

$$e_t = y_t - E[y_t \mid y_0, \ldots, y_{t-1}; x_0] = y_t - Z'\hat x_{t|t-1}.$$

The innovation variance, denoted by $\omega_t$, satisfies

$$\omega_t = E\left[\left(y_t - Z'\hat x_{t|t-1}\right)\left(y_t - Z'\hat x_{t|t-1}\right)'\right] = E\left[\left(Z' x_t - Z'\hat x_{t|t-1}\right)\left(Z' x_t - Z'\hat x_{t|t-1}\right)'\right] = Z' P_{t|t-1} Z. \tag{8}$$
In addition to the estimates $\hat x_{t+1|t}$, the Kalman filter equations imply the following evolution of the matrices $P_{t+1|t}$:

$$P_{t+1|t} = A\left(P_{t|t-1} - P_{t|t-1} Z Z' P_{t|t-1}/\omega_t\right)A' + R R'\sigma^2. \tag{9}$$
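In Matlab, one pass of the filter might look as follows. This is a sketch; the update of $\hat x_{t+1|t}$ uses the standard Kalman gain for this system (see Hamilton (1994)), which the note leaves implicit:

    % One Kalman filter iteration, given xhat = xhat_{t|t-1}, P = P_{t|t-1},
    % the current observation yt, and the parameter sigma2 = sigma^2.
    et   = yt - Z'*xhat;            % innovation e_t
    wt   = Z'*P*Z;                  % innovation variance, equation (8)
    K    = A*P*Z / wt;              % Kalman gain
    xhat = A*xhat + K*et;           % updated forecast xhat_{t+1|t}
    P    = A*(P - P*(Z*Z')*P/wt)*A' + R*R'*sigma2;   % equation (9)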
Given an initial matrix $P_{1|0} = E(x_t x_t')$ and the initial value $\hat x_{1|0} = 0$, the likelihood function of the observation vector $\{y_0, y_1, \ldots, y_T\}$ is given by

$$\mathcal{L} = \prod_{t=1}^{T} (2\pi\omega_t)^{-1/2} \exp\left(-\frac{e_t^2}{2\omega_t}\right).$$

Taking logarithms, dropping the constant $2\pi$, and multiplying by $-2$ we obtain

$$\ell = \sum_{t=1}^{T}\left[\ln(\omega_t) + e_t^2/\omega_t\right]. \tag{10}$$
In principle, to find the maximum likelihood estimates we minimize (10) with respect to the parameters $\theta_j$, $\phi_j$, and $\sigma^2$. However, the following trick allows us to concentrate out the term $\sigma^2$ and optimize only with respect to the parameters $\theta_j$, $\phi_j$. Suppose we initialize the filter with the scaled matrix $\tilde P_{1|0} = \sigma^{-2} P_{1|0}$; that is, we run the filter as if $\sigma^2 = 1$. From (9) it follows that each $P_{t+1|t}$ is proportional to $\sigma^2$, and from (8) it follows that the innovation variance is also proportional to $\sigma^2$, so we can write $\omega_t = \sigma^2 \tilde\omega_t$, where $\tilde\omega_t$ is the innovation variance produced by the scaled filter. This implies that we can optimize first with respect to $\sigma^2$ by hand, replace the result into the objective function, and then optimize the resulting expression (called the concentrated log-likelihood) with respect to the parameters $\theta_j$, $\phi_j$. To see this, note that (10) becomes

$$\ell = \sum_{t=1}^{T}\left[\ln\left(\sigma^2 \tilde\omega_t\right) + \frac{e_t^2}{\tilde\omega_t \sigma^2}\right] \tag{11}$$

and $\sigma^2$ cancels out of the evolution equations of $\tilde P_{t+1|t}$ and of the projections $\hat x_{t+1|t}$. So we can directly optimize (11) with respect to $\sigma^2$ to obtain

$$\hat\sigma^2 = \frac{1}{T}\sum_{t=1}^{T} \frac{e_t^2}{\tilde\omega_t}.$$
Replacing this result into (11) we obtain the concentrated log-likelihood function

$$\ell = \sum_{t=1}^{T}\left[\ln\hat\sigma^2 + \ln\tilde\omega_t + \frac{e_t^2}{\tilde\omega_t \hat\sigma^2}\right] = T\ln\left(\frac{1}{T}\sum_{t=1}^{T}\frac{e_t^2}{\tilde\omega_t}\right) + \sum_{t=1}^{T}\ln\tilde\omega_t + \frac{\sum_{t=1}^{T} e_t^2/\tilde\omega_t}{\frac{1}{T}\sum_{t=1}^{T} e_t^2/\tilde\omega_t} = T\ln\frac{1}{T} + T + T\ln\sum_{t=1}^{T}\frac{e_t^2}{\tilde\omega_t} + \sum_{t=1}^{T}\ln\tilde\omega_t,$$

or, dropping irrelevant constants,

$$\ell = T\ln\sum_{t=1}^{T}\frac{e_t^2}{\tilde\omega_t} + \sum_{t=1}^{T}\ln\tilde\omega_t. \tag{12}$$
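As an illustration, (12) can be evaluated for a candidate parameter vector by running the filter with $\sigma^2$ set to 1. The function below is my sketch of this computation; the name conc_loglik and all internals are illustrative, not the actual contents of arma_mle.m:

    function ell = conc_loglik(phi, theta, y)
    % Concentrated log-likelihood (12) for a mean-zero series y,
    % with sigma^2 concentrated out (the filter runs with sigma^2 = 1).
    r     = max(length(phi), length(theta) + 1);
    phi_r = [phi(:);   zeros(r - length(phi), 1)];
    th_r  = [theta(:); zeros(r - 1 - length(theta), 1)];
    A = [phi_r, [eye(r-1); zeros(1, r-1)]];
    R = [1; th_r];    Z = [1; zeros(r-1, 1)];

    T    = length(y);
    xhat = zeros(r, 1);                           % xhat_{1|0} = 0
    P    = reshape((eye(r^2) - kron(A, A)) \ ...
                   reshape(R*R', r^2, 1), r, r);  % unconditional variance of x_t
                                                  % (requires a stationary AR part)
    sum_e2w = 0;   sum_lnw = 0;
    for t = 1:T
        et      = y(t) - Z'*xhat;                 % innovation
        wt      = Z'*P*Z;                         % scaled innovation variance
        sum_e2w = sum_e2w + et^2/wt;
        sum_lnw = sum_lnw + log(wt);
        xhat    = A*xhat + (A*P*Z/wt)*et;         % forecast update
        P       = A*(P - P*(Z*Z')*P/wt)*A' + R*R';  % recursion (9), sigma^2 = 1
    end
    ell = T*log(sum_e2w) + sum_lnw;               % equation (12)
    end

The implied scale estimate is then sum_e2w/T, which is $\hat\sigma^2$ from the formula above.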
Because the innovations $e_t$ and the variances $\tilde\omega_t$ are nonlinear functions of the parameters $(\theta, \phi)$, we use numerical methods to minimize (12). The Matlab function arma_mle.m performs this task using the optimization routine fminunc.m from the Matlab Optimization Toolbox. The initial conditions for the parameters are based on the two-step regression procedure described in Hannan and McDougall (1988). The first step consists in running a (relatively) long autoregression and computing the fitted residuals. The second step computes an OLS regression of $y_t$ on its $p$ lagged values and on lagged values of the fitted residuals obtained in the first step.
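Under my reading of that procedure, the two steps can be sketched as follows (the first-stage lag length nlong is an arbitrary illustrative choice, and conc_loglik refers to the sketch above):

    % Two-step regression starting values (Hannan and McDougall, 1988).
    T = length(y);

    % Step 1: long AR(nlong) by OLS; keep the fitted residuals.
    nlong = 20;                                  % illustrative lag length
    Y1 = y(nlong+1:T);
    X1 = zeros(T - nlong, nlong);
    for j = 1:nlong, X1(:,j) = y(nlong+1-j:T-j); end
    ehat = Y1 - X1*(X1\Y1);                      % fitted residuals

    % Step 2: OLS of y_t on p lags of y and q lags of the residuals.
    m  = max(p, q);
    Y2 = Y1(m+1:end);
    X2 = zeros(length(Y2), p + q);
    for j = 1:p, X2(:,j)   = Y1(m+1-j:end-j);   end
    for j = 1:q, X2(:,p+j) = ehat(m+1-j:end-j); end
    b0 = X2\Y2;                                  % starting values [phi0; theta0]

    % These starting values can then be passed to the optimizer, e.g.
    % bhat = fminunc(@(b) conc_loglik(b(1:p), b(p+1:p+q), y), b0);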
REFERENCES

[1] Hamilton, James D. Time Series Analysis. Princeton University Press, 1994.

[2] Hannan, E. J., and A. J. McDougall. "Regression Procedures for ARMA Estimation." Journal of the American Statistical Association, Vol. 83, No. 402, June 1988.