Statistical Methods: Autocorrelation
Dr. Orlaith Burke
Contents
1 Model Identification
  1.1 The ACF
    1.1.1 Using the ACF for Model Identification
  1.2 The PACF
    1.2.1 AR(1) Process
    1.2.2 AR(p) Process
    1.2.3 MA(1) Process
    1.2.4 MA(q) Process
  1.3 The Sample PACF

2 Estimating the Model Parameters
  2.1 Estimating the Mean
    2.1.1 Estimating Drift
    2.1.2 Estimating Trend
  2.2 Method of Moments
    2.2.1 AR Models
    2.2.2 MA Models
    2.2.3 ARMA(1,1) Model
  2.3 Alternative Estimation Procedures
    2.3.1 Least Squares Estimation
    2.3.2 Maximum Likelihood Estimation
    2.3.3 Unconditional Least Squares
1 Model Identification
We have been introduced to ARIMA models and have examined in detail the
behaviour of various models. But all of our investigations so far have looked at
theoretical models for time series. Our next task is to consider how we might fit
these models to real time series data.
To begin, in this chapter we examine how to establish appropriate values for p, d and q when fitting an ARIMA(p, d, q) model to a set of data.
1.1 The ACF
The first technique we use to determine values for p, d and q is to compute the
sample ACF from the data and compare this to known properties of the ACF for
ARIMA models.
The sample autocorrelation function is defined as
$$r_k = \frac{\sum_{t=1}^{n-k} (Y_t - \bar{Y})(Y_{t+k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}. \tag{1.1}$$
The sample autocorrelation $r_k$ provides us with an estimate of the true autocorrelation $\rho_k$. In order to make proper statistical determinations about $\rho_k$ we need to know the sampling distribution of $r_k$. It can be shown that, for large $n$, the vector

$$\sqrt{n}\,\bigl(r_1 - \rho_1,\; r_2 - \rho_2,\; \dots,\; r_m - \rho_m\bigr) \tag{1.2}$$

approaches a joint normal distribution with zero means and with variance-covariance matrix $(c_{ij})$, where

$$c_{ij} = \sum_{k=-\infty}^{\infty} \bigl( \rho_{k+i}\rho_{k+j} + \rho_{k-i}\rho_{k+j} - 2\rho_i\rho_k\rho_{k+j} - 2\rho_j\rho_k\rho_{k+i} + 2\rho_i\rho_j\rho_k^2 \bigr). \tag{1.3}$$
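To make (1.3) concrete: for a white noise series every $\rho_k$ with $k \neq 0$ is zero, so $c_{kk} = 1$ and $\mathrm{Var}(r_k) \approx 1/n$. The following Monte Carlo check is a minimal sketch in Python with numpy; the helper name `sample_acf` is ours.

```python
import numpy as np

def sample_acf(y, k):
    """Sample autocorrelation r_k as defined in equation (1.1)."""
    d = y - y.mean()
    return np.sum(d[:-k] * d[k:]) / np.sum(d**2)

rng = np.random.default_rng(42)
n, reps = 200, 2000

# For white noise, (1.3) gives c_kk = 1, so Var(r_k) should be about 1/n.
r1 = np.array([sample_acf(rng.standard_normal(n), 1) for _ in range(reps)])
print(f"empirical Var(r_1) = {r1.var():.5f}; theory 1/n = {1/n:.5f}")
```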
[Figure: simulated AR(1) series $Y_t = 0.9\,Y_{t-1} + e_t$ of length 1000, with its sample ACF decaying slowly towards zero.]
[Figure: simulated AR(1) series $Y_t = 0.4\,Y_{t-1} + e_t$ of length 1000, with its sample ACF decaying rapidly towards zero.]
[Figure: simulated AR(1) series $Y_t = -0.7\,Y_{t-1} + e_t$ of length 1000, with its sample ACF alternating in sign as it decays.]
[Figure: simulated random walk $Y_t = Y_{t-1} + e_t$ of length 1000, with its sample ACF staying close to 1 and failing to decay.]
[Figure: simulated MA(1) series $Y_t = e_t - e_{t-1}$ of length 1000, with its sample ACF showing a single large negative spike at lag 1.]
[Figure: simulated MA(1) series $Y_t = e_t - 0.4\,e_{t-1}$ of length 1000, with its sample ACF showing a single non-zero spike at lag 1.]
1.1.1 Using the ACF for Model Identification
We can now begin to see how the ACF can be used to determine which model
best fits a set of data.
Firstly, we can see from the theoretical ACF and from the previous graphs that the ACF is very useful for identifying MA processes. This is because the first $q$ terms in the ACF of an MA(q) process are non-zero and the remaining terms are all zero.

Secondly, we have seen that the ACF of a non-stationary series behaves quite differently from that of a stationary series: its terms do not decay to zero as they do in the stationary case (cf. the plots above).

We have also seen that the ACF of a stationary AR process decays exponentially to zero.
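To see this identification logic on simulated data, here is a minimal sketch in Python (numpy and statsmodels assumed available; `statsmodels.tsa.stattools.acf` implements the sample ACF of equation (1.1)). The AR(1) autocorrelations should tail off gradually while the MA(1) autocorrelations should cut off after lag 1:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
n = 1000
e = rng.standard_normal(n + 1)

# AR(1): Y_t = 0.9 Y_{t-1} + e_t -- its ACF should decay exponentially.
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + e[t]

# MA(1): Y_t = e_t - 0.9 e_{t-1} -- its ACF should cut off after lag 1.
ma = e[1:] - 0.9 * e[:-1]

print("AR(1) sample ACF:", np.round(acf(ar, nlags=5), 2))
print("MA(1) sample ACF:", np.round(acf(ma, nlags=5), 2))
```

Sampling variability means the MA(1) autocorrelations beyond lag 1 will be small rather than exactly zero, which is why the sampling distribution of $r_k$ above matters.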
1.2 The PACF
In the previous section we saw how the ACF can be used to identify MA processes, clearly indicating the order $q$ of the process by the number of non-zero terms in the ACF. It would be very useful if we could similarly identify the order of an AR process. There is another correlation function which allows us to do precisely that: the Partial Autocorrelation Function, or PACF.

The PACF computes the correlation between two variables $y_t$ and $y_{t-k}$ after removing the effect of all intervening variables $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$. We can think of the PACF as a conditional correlation:
$$\phi_{kk} = \mathrm{Corr}(y_t,\, y_{t-k} \mid y_{t-1}, y_{t-2}, \dots, y_{t-k+1}). \tag{1.4}$$
Another way to think of the PACF is to consider regressing $y_t$ on the intervening variables $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$, giving the following least squares estimate:

$$\hat{y}_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \dots + \beta_{k-1} y_{t-k+1}, \tag{1.5}$$

with prediction error

$$y_t - \hat{y}_t. \tag{1.6}$$

Consider also regressing $y_{t-k}$ on $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$. It can be shown (though we will not prove this result) that the least squares estimate of $y_{t-k}$ is

$$\hat{y}_{t-k} = \beta_1 y_{t-k+1} + \beta_2 y_{t-k+2} + \dots + \beta_{k-1} y_{t-1}, \tag{1.7}$$

with prediction error

$$y_{t-k} - \hat{y}_{t-k}. \tag{1.8}$$

The lag-$k$ partial autocorrelation is then the correlation between these two prediction errors:

$$\phi_{kk} = \mathrm{Corr}(y_t - \hat{y}_t,\; y_{t-k} - \hat{y}_{t-k}). \tag{1.9}$$

For $k = 1$ there are no intervening variables, so $\phi_{11} = \rho_1$. For $k = 2$ this definition gives

$$\phi_{22} = \mathrm{Corr}(y_t - \rho_1 y_{t-1},\; y_{t-2} - \rho_1 y_{t-1}), \tag{1.10}$$

which simplifies to

$$\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}. \tag{1.11}$$
1.2.1 AR(1) Process

For an AR(1) process we know that $\rho_k = \phi^k$, so

$$\phi_{11} = \rho_1 = \phi \tag{1.12}$$

and hence

$$\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = \frac{\phi^2 - \phi^2}{1 - \phi^2} = 0. \tag{1.13}$$

The same argument shows that $\phi_{kk} = 0$ for all $k > 1$.
1.2.2 AR(p) Process

More generally, the PACF of an AR(p) process cuts off after lag $p$:

$$\phi_{kk} = 0 \quad \text{for } k > p. \tag{1.14}$$

The PACF therefore identifies the order of an AR process in just the way the ACF identifies the order of an MA process.
1.2.3 MA(1) Process

For an MA(1) process $\phi_{11} = \rho_1 = -\theta/(1 + \theta^2)$, and it can be shown that

$$\phi_{22} = \frac{-\theta^2}{1 + \theta^2 + \theta^4} \tag{1.15}$$

and, in general,

$$\phi_{kk} = -\frac{\theta^k (1 - \theta^2)}{1 - \theta^{2(k+1)}} \quad \text{for } k \geq 1. \tag{1.16}$$

It is clear from these equations that the PACF for an MA(1) process decays exponentially to zero.
1.2.4 MA(q) Process

For a general MA(q) process we can use the following Yule-Walker equations to solve for the PACF $\phi_{kk}$:

$$\rho_j = \phi_{k1} \rho_{j-1} + \phi_{k2} \rho_{j-2} + \dots + \phi_{kk} \rho_{j-k}, \quad j = 1, 2, \dots, k. \tag{1.17}$$

Note that here the autocorrelations $\rho_j$ are known and we are solving for the coefficients $\phi_{k1}, \phi_{k2}, \dots, \phi_{kk}$, of which only $\phi_{kk}$ is of interest.

We should note that these Yule-Walker equations are slightly different from those we have seen before: they allow us to solve for $\phi_{kk}$ for any stationary process, not just an AR(p) process. If, however, we are dealing with an AR(p) process then these equations match exactly those that we have already seen, with $\phi_{pp} = \phi_p$ and $\phi_{kk} = 0$ for $k > p$.
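Equation (1.17) translates directly into a small PACF calculator: for each $k$, build the $k \times k$ matrix of autocorrelations, solve the linear system for $\phi_{k1}, \dots, \phi_{kk}$, and keep only $\phi_{kk}$. A minimal numpy sketch (the function name `pacf_from_acf` is ours):

```python
import numpy as np

def pacf_from_acf(rho):
    """Solve the Yule-Walker system (1.17) for phi_kk, k = 1..len(rho).

    rho[j-1] holds the autocorrelation rho_j (true or sample)."""
    out = []
    for k in range(1, len(rho) + 1):
        # R has entries rho_{|i-j|}, with rho_0 = 1 on the diagonal.
        R = np.array([[1.0 if i == j else rho[abs(i - j) - 1]
                       for j in range(k)] for i in range(k)])
        out.append(np.linalg.solve(R, rho[:k])[-1])  # keep only phi_kk
    return np.array(out)

# AR(1) with phi = 0.6: rho_k = 0.6**k, so the PACF should be (0.6, 0, 0, ...).
print(np.round(pacf_from_acf(0.6 ** np.arange(1, 6)), 4))
```

Feeding in sample autocorrelations instead of true ones gives the sample PACF discussed in the next section.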
1.3 The Sample PACF
When presented with actual data we can use equation (1.17) to compute the sample PACF, replacing the true autocorrelations ($\rho$) by the sample autocorrelations ($r$). We will not go into any more detail here; as with the sample ACF, the behaviour of the sample PACF will be similar to that of the true PACF. We should not, however, expect complete agreement between the two, and we will therefore not be surprised to see some ripples or trends in the sample PACF which would not be present in the true PACF.
[Figure: sample PACFs of two simulated AR(1) series $Y_t = \phi Y_{t-1} + e_t$: a single significant spike at lag 1, with the remaining partial autocorrelations near zero.]
[Figure: sample PACF of a simulated MA(1) series: the partial autocorrelations decay gradually towards zero.]
2 Estimating the Model Parameters

In the last chapter we examined how to determine values for p, d and q, and hence how to identify an appropriate ARIMA(p, d, q) model for a set of data. In this chapter we consider how to estimate the parameters of the chosen model.
2.1 Estimating the Mean

When we first encountered the Box-Jenkins ARIMA model, the model contained a mean term $\mu$. Throughout this course so far we have set $\mu = 0$. Let us now allow the possibility that $\mu \neq 0$ and consider how we might estimate this non-zero mean.
Write the observed series as

$$X_t = \mu + Y_t \tag{2.1}$$

where $E(Y_t) = 0$, and we wish to estimate $\mu$ using the observed sample time series $x_1, x_2, \dots, x_n$.
2.1.1 Estimating Drift

Suppose that $\mu$ is a constant. Then the obvious way to estimate $\mu$ is to use the sample mean:

$$\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t. \tag{2.2}$$
The precision of this estimate depends on the autocorrelation structure of the series. It can be shown that

$$\mathrm{Var}(\bar{x}) = \frac{\gamma_0}{n} \sum_{k=-n+1}^{n-1} \left(1 - \frac{|k|}{n}\right) \rho_k \tag{2.3}$$

$$= \frac{\gamma_0}{n} \left[ 1 + 2 \sum_{k=1}^{n-1} \left(1 - \frac{k}{n}\right) \rho_k \right]. \tag{2.4}$$
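As a quick numerical check of (2.4), the sketch below (Python with numpy; all names are ours) compares the formula with a Monte Carlo estimate of $\mathrm{Var}(\bar{x})$ for a stationary AR(1) series with $\phi = 0.7$, for which $\gamma_0 = 1/(1 - \phi^2)$ and $\rho_k = \phi^k$:

```python
import numpy as np

rng = np.random.default_rng(7)
phi, n, reps = 0.7, 100, 4000

gamma0 = 1.0 / (1.0 - phi**2)        # AR(1): gamma_0 = 1/(1 - phi^2)
k = np.arange(1, n)
var_formula = gamma0 / n * (1 + 2 * np.sum((1 - k / n) * phi**k))

# Monte Carlo: spread of the sample mean over many simulated series.
means = []
for _ in range(reps):
    y = np.empty(n)
    y[0] = np.sqrt(gamma0) * rng.standard_normal()   # stationary start
    for t in range(1, n):
        y[t] = phi * y[t - 1] + rng.standard_normal()
    means.append(y.mean())

print(f"formula (2.4): {var_formula:.4f}; Monte Carlo: {np.var(means):.4f}")
```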
For example, if the deviations $Y_t$ follow a random walk, so that $\mathrm{Cov}(Y_t, Y_s) = \min(t, s)\,\sigma^2$, then

$$\mathrm{Var}(\bar{x}) = \frac{1}{n^2} \left[ \sum_{t=1}^{n} t\,\sigma^2 + 2 \sum_{s=2}^{n} \sum_{t=1}^{s-1} t\,\sigma^2 \right] \tag{2.5}$$

$$= \sigma^2 (2n+1)\,\frac{n+1}{6n}. \tag{2.6}$$

Here the variance grows with the sample size $n$, so for a non-stationary series the sample mean is a very poor estimator of $\mu$.

2.1.2 Estimating Trend
Suppose now that the mean is no longer a constant but instead takes the form

$$\mu_t = \beta_0 + \beta_1 t. \tag{2.7}$$

We want to estimate the two parameters $\beta_0$ and $\beta_1$. One method is to use the ordinary least squares technique from regression. Here we minimise

$$Q(\beta_0, \beta_1) = \sum_{t=1}^{n} (x_t - \beta_0 - \beta_1 t)^2. \tag{2.8}$$
Minimising $Q$ gives the usual least squares estimators

$$\hat{\beta}_1 = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(t - \bar{t})}{\sum_{t=1}^{n} (t - \bar{t})^2} \tag{2.9}$$

$$\hat{\beta}_0 = \bar{x} - \hat{\beta}_1\,\bar{t} \tag{2.10}$$

where

$$\bar{t} = \frac{1}{n} \sum_{t=1}^{n} t = \frac{n+1}{2}. \tag{2.11}$$
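To illustrate (2.8)-(2.10), the sketch below fits a linear trend to a simulated series with AR(1) noise; `np.polyfit` performs the same minimisation and is used as a cross-check (all other names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
t = np.arange(1, n + 1)

# Series with trend mu_t = 2 + 0.05 t plus AR(1) noise.
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.5 * noise[i - 1] + rng.standard_normal()
x = 2.0 + 0.05 * t + noise

# Closed-form estimators (2.9) and (2.10).
tbar, xbar = t.mean(), x.mean()
b1 = np.sum((x - xbar) * (t - tbar)) / np.sum((t - tbar) ** 2)
b0 = xbar - b1 * tbar

print(f"beta0_hat = {b0:.3f}, beta1_hat = {b1:.4f}")
print("polyfit agrees:", np.round(np.polyfit(t, x, 1)[::-1], 4))
```

Note that ordinary least squares ignores the autocorrelation in the noise; the point estimates are still sensible here, but their standard errors would need care.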
2.2 Method of Moments

The first estimation technique we consider is the method of moments: we equate theoretical moments to their sample counterparts and solve the resulting equations for the model parameters.
2.2.1 AR Models

Consider first the AR(1) model, for which $\rho_1 = \phi$. Replacing $\rho_1$ by $r_1$ gives the method of moments estimator

$$\hat{\phi} = r_1. \tag{2.12}$$

For the AR(2) model the Yule-Walker equations are

$$\rho_1 = \phi_1 + \rho_1 \phi_2 \tag{2.13}$$

$$\rho_2 = \rho_1 \phi_1 + \phi_2. \tag{2.14}$$

Replacing the true autocorrelations by their sample counterparts gives

$$r_1 = \phi_1 + r_1 \phi_2 \tag{2.15}$$

$$r_2 = r_1 \phi_1 + \phi_2, \tag{2.16}$$

and solving this pair of equations yields

$$\hat{\phi}_1 = \frac{r_1 (1 - r_2)}{1 - r_1^2} \tag{2.17}$$

$$\hat{\phi}_2 = \frac{r_2 - r_1^2}{1 - r_1^2}. \tag{2.18}$$

The exact same technique gives us the method of moments estimators of the $p$ parameters in an AR(p) model: we replace each of the $p$ autocorrelations $\rho_k$ by the corresponding sample autocorrelation $r_k$ in the Yule-Walker equations and then solve the resulting system of $p$ equations for $\hat{\phi}_1, \dots, \hat{\phi}_p$.
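A minimal sketch of the AR(2) case, equations (2.15)-(2.18), in Python (numpy and statsmodels assumed; `acf` supplies $r_1$ and $r_2$):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(11)
n = 2000
y = np.zeros(n)
for t in range(2, n):        # simulate AR(2) with phi1 = 0.5, phi2 = 0.3
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()

r1, r2 = acf(y, nlags=2)[1:]
phi1_hat = r1 * (1 - r2) / (1 - r1**2)   # equation (2.17)
phi2_hat = (r2 - r1**2) / (1 - r1**2)    # equation (2.18)
print(f"phi1_hat = {phi1_hat:.3f}, phi2_hat = {phi2_hat:.3f}")
```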
2.2.2 MA Models

For the MA(1) model the method of moments begins from the relation

$$\rho_1 = \frac{-\theta_1}{1 + \theta_1^2}. \tag{2.19}$$

Now we replace $\rho_1$ by $r_1$ and then we solve this equation, which is quadratic in $\theta_1$, to get two roots:

$$\hat{\theta}_1 = \frac{-1 + \sqrt{1 - 4 r_1^2}}{2 r_1} \tag{2.20}$$

$$\hat{\theta}_1 = \frac{-1 - \sqrt{1 - 4 r_1^2}}{2 r_1}. \tag{2.21}$$

The roots are real only when $|r_1| \leq 0.5$, and only one of them corresponds to an invertible model.
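Since the two roots of the quadratic multiply to 1, the invertible solution is always the one smaller in magnitude, which makes the selection mechanical. A sketch (the function name `ma1_method_of_moments` is ours):

```python
import numpy as np

def ma1_method_of_moments(r1):
    """Solve r1 = -theta/(1 + theta^2); return the invertible root.

    Assumes r1 != 0; real roots require |r1| <= 0.5."""
    if abs(r1) > 0.5:
        raise ValueError("no real solution: |r1| > 0.5")
    disc = np.sqrt(1 - 4 * r1**2)
    roots = [(-1 + disc) / (2 * r1), (-1 - disc) / (2 * r1)]
    return min(roots, key=abs)   # the invertible root has |theta| <= 1

print(ma1_method_of_moments(-0.4))   # theta_hat = 0.5 when r1 = -0.4
```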
2.2.3 ARMA(1,1) Model

For the ARMA(1,1) model the autocorrelation function is

$$\rho_k = \frac{(1 - \theta_1 \phi_1)(\phi_1 - \theta_1)}{1 - 2 \theta_1 \phi_1 + \theta_1^2}\, \phi_1^{k-1}. \tag{2.22}$$

Note that the ratio of consecutive autocorrelations isolates $\phi_1$:

$$\frac{\rho_2}{\rho_1} = \phi_1, \tag{2.23}$$

so replacing the true autocorrelations by their sample versions gives

$$\hat{\phi}_1 = \frac{r_2}{r_1}. \tag{2.24}$$

Substituting $\hat{\phi}_1$ into the expression for $\rho_1$ then gives

$$r_1 = \frac{(1 - \theta_1 \hat{\phi}_1)(\hat{\phi}_1 - \theta_1)}{1 - 2 \theta_1 \hat{\phi}_1 + \theta_1^2}. \tag{2.25}$$

We can now solve this quadratic equation for $\theta_1$, remembering that of the two solutions thus obtained we only keep the invertible solution.
2.3 Alternative Estimation Procedures

We have just seen that the method of moments can fail to estimate the parameters in a Moving Average model. We now note that there are several alternative estimation procedures that can be used to estimate the parameters in an ARIMA model. The first is based on the principle of least squares, the second uses maximum likelihood methods, and the third, unconditional least squares, is a compromise between these two. We will not go into the detail of any of these estimation procedures here, but in the following we provide a brief introduction to the basic ideas.
2.3.1 Least Squares Estimation

The AR(1) model

$$Y_t - \mu = \phi (Y_{t-1} - \mu) + e_t \tag{2.26}$$

can be viewed as a regression with predictor variable $Y_{t-1}$ and response variable $Y_t$. The least squares estimators of the parameters are then obtained by minimising the conditional sum of squares

$$S_*(\phi, \mu) = \sum_{t=2}^{n} \bigl[ (Y_t - \mu) - \phi (Y_{t-1} - \mu) \bigr]^2. \tag{2.27}$$

Setting $\partial S_*/\partial \mu = 0$ and solving gives

$$\hat{\mu} = \frac{\sum_{t=2}^{n} Y_t - \phi \sum_{t=2}^{n} Y_{t-1}}{(n-1)(1 - \phi)}. \tag{2.28}$$

For large $n$ both sums in the numerator are approximately $(n-1)\bar{Y}$, so

$$\hat{\mu} \approx \frac{\bar{Y} - \phi \bar{Y}}{1 - \phi} = \bar{Y}. \tag{2.29}$$

And substituting $\bar{Y}$ for $\mu$ we find that when dealing with large samples the estimator for $\phi$ is given by

$$\hat{\phi} = \frac{\sum_{t=2}^{n} (Y_t - \bar{Y})(Y_{t-1} - \bar{Y})}{\sum_{t=2}^{n} (Y_{t-1} - \bar{Y})^2}, \tag{2.30}$$

which is approximately the sample autocorrelation $r_1$.

In the AR(p) model we again find that for large sample sizes

$$\hat{\mu} = \bar{Y}. \tag{2.31}$$
For example, in the AR(2) model the large-sample least squares procedure again leads to the equations

$$r_1 = \phi_1 + r_1 \phi_2 \tag{2.32}$$

$$r_2 = r_1 \phi_1 + \phi_2 \tag{2.33}$$

which we solve to obtain $\hat{\phi}_1$ and $\hat{\phi}_2$, exactly as in the method of moments.
Suppose now we are trying to estimate the parameter in the MA(1) model:

$$Y_t = e_t - \theta_1 e_{t-1}. \tag{2.34}$$

Provided the model is invertible we can rewrite this as

$$e_t = Y_t + \theta_1 e_{t-1}, \tag{2.35}$$

so that, conditional on the starting value $e_0 = 0$, the errors $e_t$ can be computed recursively from the data and the conditional sum of squares

$$S_*(\theta_1) = \sum_{t=1}^{n} e_t^2 \tag{2.36}$$

can be minimised numerically with respect to $\theta_1$.
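A sketch of this recursion in Python, with a crude grid search standing in for a proper optimiser (all names are ours):

```python
import numpy as np

def css_ma1(y, theta):
    """Conditional sum of squares (2.36): e_t = y_t + theta*e_{t-1}, e_0 = 0."""
    e, s = 0.0, 0.0
    for yt in y:
        e = yt + theta * e
        s += e * e
    return s

rng = np.random.default_rng(5)
eps = rng.standard_normal(1001)
y = eps[1:] - 0.6 * eps[:-1]         # MA(1) with theta_1 = 0.6

grid = np.linspace(-0.99, 0.99, 399)
theta_hat = grid[np.argmin([css_ma1(y, th) for th in grid])]
print(f"theta_hat = {theta_hat:.3f}")   # should be close to 0.6
```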
2.3.2 Maximum Likelihood Estimation

Returning to the AR(1) model

$$Y_t - \mu = \phi_1 (Y_{t-1} - \mu) + e_t, \tag{2.37}$$

if we assume that the error terms $e_t$ are iid normal white noise then we can show that the likelihood function is given by

$$L(\phi_1, \mu, \sigma^2) = (2 \pi \sigma^2)^{-n/2} (1 - \phi_1^2)^{1/2} \exp\left[ -\frac{S(\phi_1, \mu)}{2 \sigma^2} \right] \tag{2.38}$$
where

$$S(\phi_1, \mu) = \sum_{t=2}^{n} \bigl[ (Y_t - \mu) - \phi_1 (Y_{t-1} - \mu) \bigr]^2 + (1 - \phi_1^2)(Y_1 - \mu)^2. \tag{2.39}$$

The term $S(\phi_1, \mu)$ is called the unconditional sum of squares function and should be compared with the previous conditional sum of squares. Clearly they are related as follows:

$$S(\phi_1, \mu) = S_*(\phi_1, \mu) + (1 - \phi_1^2)(Y_1 - \mu)^2. \tag{2.40}$$
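To make these formulas concrete, the sketch below evaluates the log of the likelihood (2.38) over a grid of $\phi_1$ values, with $\mu$ fixed at $\bar{Y}$ and $\sigma^2$ fixed at its true value of 1 purely for simplicity; a full implementation would estimate $\sigma^2$ as well (all names are ours):

```python
import numpy as np

def uncond_ss(y, phi, mu):
    """Unconditional sum of squares, via the relation (2.40)."""
    d = y - mu
    s_star = np.sum((d[1:] - phi * d[:-1]) ** 2)   # conditional SS (2.27)
    return s_star + (1 - phi**2) * d[0] ** 2

def ar1_loglik(y, phi, mu, sigma2=1.0):
    """Log of the AR(1) likelihood (2.38)."""
    n = len(y)
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            + 0.5 * np.log(1 - phi**2)
            - uncond_ss(y, phi, mu) / (2 * sigma2))

rng = np.random.default_rng(9)
n, phi_true = 500, 0.6
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.standard_normal()

grid = np.linspace(-0.95, 0.95, 381)
phi_hat = grid[np.argmax([ar1_loglik(y, p, y.mean()) for p in grid])]
print(f"phi_hat = {phi_hat:.3f}")   # should be close to 0.6
```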
2.3.3 Unconditional Least Squares

In previous sections we have seen that when estimating parameters by least squares we minimise the conditional sum of squares $S_*(\phi_1, \mu)$. We have also just seen that the maximum likelihood estimators are obtained by maximising $L(\phi_1, \mu, \sigma^2)$, and that $L(\phi_1, \mu, \sigma^2)$ is a function of the unconditional sum of squares $S(\phi_1, \mu)$. An alternative estimation procedure is, instead of maximising the full likelihood function $L(\phi_1, \mu, \sigma^2)$, simply to minimise the unconditional sum of squares $S(\phi_1, \mu)$. This procedure is then a compromise between the conditional least squares estimates and the full maximum likelihood procedure.
We will finish our discussion here and will not investigate any of these estimation
procedures further.