
Some stochastic modelling techniques and their applications

N. K. Sinha
Group on Simulation, Optimization and Control, McMaster University, Hamilton, Canada

T. Prasad
Solid Mechanics Division, University of Waterloo, Waterloo, Canada

(Received March 1978)

Several different types of stochastic models for estimation and forecasting
have been described. The presentation is fairly general so as to be useful
to applied scientists from different fields. The methods used for validation
and diagnostic checks of the models have also been discussed. The formulae
can even be implemented on a programmable pocket calculator. As an
example of the procedure, twelve different models are obtained for
predicting the annual flow of the Nile river at Aswan Dam, based on data
obtained for the period 1903-1944.

Introduction
Mathematical modelling is now recognized as being vital for many areas in
applied sciences. For example, modelling techniques have been applied to the
understanding of biological processes, socioeconomic systems, chemical
processes and environmental systems. Several books [1-4] have been written on
the subject. Most of the presentations, however, have remained inaccessible to
applied scientists who are not experts in statistics and advanced mathematics.
There is, thus, a great need for a simple and straightforward presentation of
the methods for obtaining stochastic models from observed data.
To be useful, a model must be conceptually simple in addition to being
consistent with the observations. A model is said to be parsimonious if the
number of parameters is minimized, consistent with a specified accuracy. Such
a model can, at best, be an approximation to the system. To be acceptable, the
responses predicted using the model must be sufficiently close to those of the
process being modelled. Several tests for validation of models have been
proposed in the literature, and will be briefly discussed.
Since the data obtained for modelling of most systems are usually contaminated
with observation and systemic noise, the types of models normally used are
called stochastic models. Several methods for determining such models from
observed data have been proposed in the literature.
The aim of this paper is to present a brief and simple-minded survey of these
methods. This will be followed by their application to a particular problem in
order to make a critical comparison: the problem considered is the modelling
of the annual flow of a river from past records. Such models have also been
considered by others [5, 6]. Here the data for the annual flow of the Nile
river at Aswan Dam over the period 1 October 1903 to 1 October 1945 have been
utilized.
The methods considered for modelling are: (1) linear regression; (2)
exponential regression; (3) polynomial curve fitting; (4) autoregressive and
moving-average models of different orders, assuming stationarity, following
Box and Jenkins [1]; and (5) nonstationary time series models of the
autoregressive integrated moving average (ARIMA) type. Twelve different models
are obtained using these methods. A comparison between these models is made by
predicting the annual flow for successive years and calculating the mean
square value of the prediction error for each case. Diagnostic checks are then
applied to the models for which the mean square error is small, to determine
which of these may be acceptable.

Appl. Math. Modelling, 1979, Vol 3, February

Survey of modelling techniques


In order to fully appreciate the various methods of modelling, a brief survey will be presented for each method.
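0307-904X/79/010002-05 $02.00
© 1979 IPC Business Press

Linear regression

This method is based on the simplifying assumption that the observed data may
be approximated by the linear relationship:

$$y_k = a + bk + w_k, \qquad k = 1, 2, \ldots, N \tag{1}$$

where:

$$y_k = \text{value of the } k\text{th observation} \tag{2}$$

and:

$$w_k = \text{random error in the model} \tag{3}$$

Assuming that $\{w_k\}$ is a zero-mean white noise sequence, the model for
estimation is given by:

$$\hat{y}_k = a + bk \tag{4}$$

where $\hat{y}_k$ is the estimated value of the kth observation, and the mean
square error:

$$J = \frac{1}{N}\sum_{k=1}^{N}(y_k - \hat{y}_k)^2 \tag{5}$$

is minimized if:

$$b = \frac{\sum_{k=1}^{N} k y_k - \bar{k}\sum_{k=1}^{N} y_k}{\sum_{k=1}^{N} k^2 - N\bar{k}^2} \tag{6}$$

and

$$a = \bar{y} - b\bar{k} \tag{7}$$

where:

$$\bar{y} = \frac{1}{N}\sum_{k=1}^{N} y_k \tag{8}$$

and:

$$\bar{k} = \frac{1}{N}\sum_{k=1}^{N} k \tag{9}$$

With these values of a and b, the minimum mean square error is given by:

$$J_{\min} = \frac{1}{N}\sum_{k=1}^{N} y_k^2 - a^2 - (N+1)ab - \frac{1}{6}(N+1)(2N+1)b^2 \tag{10}$$

Exponential regression

In this case, the estimation model is given by:

$$\hat{y}_k = \alpha e^{\beta k} \tag{11}$$

where the constants $\alpha$ and $\beta$ are obtained from the following
equations:

$$\beta = \frac{\sum_{k=1}^{N} k \ln y_k - \bar{k}\sum_{k=1}^{N} \ln y_k}{\sum_{k=1}^{N} k^2 - N\bar{k}^2} \tag{12}$$

and:

$$\alpha = \exp\left(\frac{1}{N}\sum_{k=1}^{N} \ln y_k - \beta\bar{k}\right) \tag{13}$$

With these values of $\alpha$ and $\beta$, the mean square value of the error
is given by:

$$J_{\min} = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - \alpha e^{\beta k}\right)^2 \tag{14}$$

Polynomial models

The polynomial model is assumed to have the form:

$$\hat{y}_k = a_0 + a_1 k + a_2 k^2 + \cdots + a_m k^m \tag{15}$$

It is evident that linear regression is a special case, with m = 1. For a
given value of m, the coefficients $a_0, a_1, \ldots, a_m$ of the model may be
obtained by solving the following set of simultaneous equations:

$$\sum_{j=0}^{m} S_{i+j}\, a_j = T_i, \qquad i = 0, 1, \ldots, m \tag{16}$$

where:

$$S_i = \sum_{k=1}^{N} k^i \tag{17}$$

and:

$$T_i = \sum_{k=1}^{N} k^i y_k \tag{18}$$

With these values of $a_i$, the mean square error is obtained as:

$$J_{\min} = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - \sum_{i=0}^{m} a_i k^i\right)^2 \tag{19}$$

Stationary time series models

Following Box and Jenkins, a stationary time-series model can be expressed by
the following equation:

$$\hat{y}_k = \phi_1 y_{k-1} + \phi_2 y_{k-2} + \cdots + \phi_m y_{k-m} - \theta_1 w_{k-1} - \theta_2 w_{k-2} - \cdots - \theta_n w_{k-n} + (1 - \phi_1 - \phi_2 - \cdots - \phi_m)\bar{y} \tag{20}$$

where:

$$w_i = y_i - \hat{y}_i \tag{21}$$

The model represented by equation (20) is called a mixed autoregressive
moving-average model of order (m, n), denoted by ARMA(m, n). The parameters
$\phi_i$ are called the autoregressive parameters, whereas $\theta_i$ are
called the moving-average parameters. The calculation of these parameters
requires knowledge of the autocorrelation coefficients of the sequence
$\{y_k\}$, defined as:

$$\rho_i = \frac{\gamma_i}{\gamma_0} \tag{22}$$

where:

$$\gamma_i = \lim_{N \to \infty} \frac{1}{N-i}\sum_{k=1}^{N-i} y_k y_{k+i} \tag{23}$$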
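The closed-form estimates of equations (6)-(13) and the autocorrelation coefficients of equations (22)-(23) lend themselves to a few lines of code. The following Python sketch is one possible implementation, not the authors' program. The function names are hypothetical. Two assumptions are made: the exponential fit requires strictly positive data, and the autocorrelation routine works with deviations from the sample mean and a finite N, which the limit in equation (23) leaves implicit.

```python
import math


def linear_fit(y):
    """Least-squares line y_hat = a + b*k for k = 1..N, equations (6)-(10)."""
    n = len(y)
    ks = range(1, n + 1)
    k_bar = sum(ks) / n
    y_bar = sum(y) / n
    # Equation (6): slope; equation (7): intercept.
    numerator = sum(k * v for k, v in zip(ks, y)) - k_bar * sum(y)
    denominator = sum(k * k for k in ks) - n * k_bar ** 2
    b = numerator / denominator
    a = y_bar - b * k_bar
    # Equation (10): minimum mean square error without recomputing residuals.
    j_min = (sum(v * v for v in y) / n - a * a - (n + 1) * a * b
             - (n + 1) * (2 * n + 1) * b * b / 6)
    return a, b, j_min


def exponential_fit(y):
    """Fit y_hat = alpha*exp(beta*k) by regressing ln(y) on k, equations (12)-(13).

    Assumes every observation is strictly positive.
    """
    ln_alpha, beta, _ = linear_fit([math.log(v) for v in y])
    return math.exp(ln_alpha), beta


def autocorrelation(y, max_lag):
    """Finite-sample version of equations (22)-(23) for lags 1..max_lag.

    Assumption: the series is centred on its sample mean before the lagged
    products are averaged.
    """
    n = len(y)
    y_bar = sum(y) / n
    d = [v - y_bar for v in y]
    gamma0 = sum(v * v for v in d) / n
    rhos = []
    for i in range(1, max_lag + 1):
        gamma_i = sum(d[k] * d[k + i] for k in range(n - i)) / (n - i)
        rhos.append(gamma_i / gamma0)
    return rhos
```

Reusing `linear_fit` on the logarithms works because equation (12) is equation (6) applied to ln y, and equation (13) is the exponential of the corresponding intercept.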

Explicit formulae for calculating the parameters for some special cases are
given below.

First-order autoregressive model (AR1): This model is of the form:

$$\hat{y}_k = \phi_1 y_{k-1} + (1 - \phi_1)\bar{y} \tag{24}$$

where:

$$\phi_1 = \rho_1 \tag{25}$$

In this case, the expected mean-square error is given by:

$$\sigma_w^2 = \sigma_y^2(1 - \phi_1^2) \tag{26}$$

Second-order autoregressive model (AR2): This model is of the form:

$$\hat{y}_k = \phi_1 y_{k-1} + \phi_2 y_{k-2} + (1 - \phi_1 - \phi_2)\bar{y} \tag{27}$$

where:

$$\phi_1 = \frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}, \qquad \phi_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} \tag{28}$$

Also:

$$\sigma_w^2 = (1 - \rho_1\phi_1 - \rho_2\phi_2)\sigma_y^2 \tag{29}$$

nth order autoregressive model (ARn): In this case:

$$\hat{y}_k = \sum_{i=1}^{n}\phi_i y_{k-i} + (1 - \phi_1 - \phi_2 - \cdots - \phi_n)\bar{y} \tag{30}$$

where the autoregressive parameters are obtained from the matrix equation
(also called the Yule-Walker equations):

$$\begin{bmatrix}\phi_1\\ \phi_2\\ \phi_3\\ \vdots\\ \phi_n\end{bmatrix} = \begin{bmatrix}1 & \rho_1 & \rho_2 & \cdots & \rho_{n-1}\\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{n-2}\\ \rho_2 & \rho_1 & 1 & \cdots & \rho_{n-3}\\ \vdots & \vdots & \vdots & & \vdots\\ \rho_{n-1} & \rho_{n-2} & \rho_{n-3} & \cdots & 1\end{bmatrix}^{-1}\begin{bmatrix}\rho_1\\ \rho_2\\ \rho_3\\ \vdots\\ \rho_n\end{bmatrix} \tag{31}$$

and the variance of the smoothing error is given by:

$$\sigma_w^2 = (1 - \rho_1\phi_1 - \rho_2\phi_2 - \cdots - \rho_n\phi_n)\sigma_y^2 \tag{32}$$

First-order moving-average model (MA1): This model is of the form:

$$\hat{y}_k = \bar{y} - \theta_1(y_{k-1} - \hat{y}_{k-1}) \tag{33}$$

where $\theta_1$ is a root of the quadratic equation:

$$\theta_1^2 + \frac{1}{\rho_1}\theta_1 + 1 = 0 \tag{34}$$

and the variance of the smoothing error is given by:

$$\sigma_w^2 = \frac{\sigma_y^2}{1 + \theta_1^2} \tag{35}$$

It may be noted that equation (34) will have real roots only if
$-0.5 < \rho_1 < 0.5$. If this condition is satisfied, the real root of
magnitude less than unity must be used in order to satisfy the invertibility
condition [1].

Second-order moving average model (MA2): This model is of the form:

$$\hat{y}_k = \bar{y} - \theta_1(y_{k-1} - \hat{y}_{k-1}) - \theta_2(y_{k-2} - \hat{y}_{k-2}) \tag{36}$$

where the parameters $\theta_1$ and $\theta_2$ are obtained from the
equations:

$$\theta_2^4 + \left(\frac{1}{\rho_2} - 2\right)\theta_2^3 + \left(2 - \frac{2}{\rho_2} + \frac{\rho_1^2}{\rho_2^2}\right)\theta_2^2 + \left(\frac{1}{\rho_2} - 2\right)\theta_2 + 1 = 0 \tag{37}$$

$$\theta_1 = \frac{\rho_1\theta_2}{\rho_2(1 - \theta_2)} \tag{38}$$

and the variance of the smoothing error is given by:

$$\sigma_w^2 = \frac{\sigma_y^2}{1 + \theta_1^2 + \theta_2^2} \tag{39}$$

From equation (37) it will be seen that certain conditions must be satisfied
in order that a real root may exist with the magnitude of $\theta_2$ less than
unity. It can be shown that the solutions will exist if and only if the
correlation coefficients $\rho_1$ and $\rho_2$ lie within the area bounded by
segments of the curves:

$$\rho_2 + \rho_1 = -0.5 \tag{40}$$

$$\rho_2 - \rho_1 = -0.5 \tag{41}$$

and

$$\rho_1^2 = 4\rho_2(1 - 2\rho_2) \tag{42}$$

nth order moving-average model (MAn): In this case:

$$\hat{y}_k = \bar{y} - \theta_1(y_{k-1} - \hat{y}_{k-1}) - \theta_2(y_{k-2} - \hat{y}_{k-2}) - \cdots - \theta_n(y_{k-n} - \hat{y}_{k-n}) \tag{43}$$

The parameters of the model can be obtained by solving the set of n nonlinear
equations:

$$\rho_i = \frac{-\theta_i + \theta_1\theta_{i+1} + \cdots + \theta_{n-i}\theta_n}{1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_n^2}, \qquad i = 1, 2, \ldots, n \tag{44}$$

and the variance of the smoothing error is given by:

$$\sigma_w^2 = \frac{\sigma_y^2}{1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_n^2} \tag{45}$$

Due to the nonlinear nature of equation (44), it is not possible to obtain an
explicit solution for $\theta_i$.

First-order autoregressive-first-order moving average model (ARMA(1,1)): This
model is of the form:

$$\hat{y}_k = \phi_1 y_{k-1} + (1 - \phi_1)\bar{y} - \theta_1(y_{k-1} - \hat{y}_{k-1}) \tag{46}$$

where:

$$\phi_1 = \frac{\rho_2}{\rho_1} \tag{47}$$

and $\theta_1$ is obtained as the real root, with magnitude less than unity,
of the quadratic:

$$\theta_1^2 + \frac{1 - 2\rho_2 + \phi_1^2}{\rho_1 - \phi_1}\theta_1 + 1 = 0 \tag{48}$$

The variance of the smoothing error is given by:

$$\sigma_w^2 = \frac{\sigma_y^2(1 - \phi_1^2)}{1 - 2\phi_1\theta_1 + \theta_1^2} \tag{49}$$

In order that real roots may exist for equation (48), the correlation
coefficients $\rho_1$ and $\rho_2$ must lie in the region defined by the
following equations:

$$|\rho_2| < |\rho_1| \tag{50}$$

$$\rho_2 > \rho_1(2\rho_1 + 1) \quad \text{for } \rho_1 < 0 \tag{51}$$

$$\rho_2 > \rho_1(2\rho_1 - 1) \quad \text{for } \rho_1 > 0 \tag{52}$$

It may be noted that equation (46) is exactly of the same form as the discrete
Kalman filter of the first order, provided that the mean value of y is zero
or, alternatively, one is considering only the deviations from the mean in the
model.

Non-stationary time series models

Many time series do not have stationary means. For such cases it may be
assumed that some suitable difference of the series is stationary. The
resulting model is called an autoregressive integrated moving average (ARIMA)
model. For example, the ARIMA(1,1,1) model is given by:

$$\hat{y}_k - y_{k-1} = \phi_1(y_{k-1} - y_{k-2}) - \theta_1(y_{k-1} - \hat{y}_{k-1}) \tag{53}$$

which may be rearranged as:

$$\hat{y}_k = (1 + \phi_1 - \theta_1)y_{k-1} - \phi_1 y_{k-2} + \theta_1\hat{y}_{k-1} \tag{54}$$

The calculation of the parameters proceeds as for the stationary time series
models after the differencing process.

Example: models of stream flow

From the data for the annual flows of the Nile river at Aswan Dam, given in
Appendix I, twelve different models were obtained, using the methods discussed
in the previous section. A summary of the results is given in Table 1.
A number of other models were also tried, but are not included since they gave
worse fits. It may be added that the ARMA(1,1) model did not exist, since the
conditions specified in equations (40)-(42) and (50)-(52) were not satisfied.

Diagnostic tests for the models

Since a number of different models have been obtained, one would like to know
which of these are suitable. In particular, one would like to determine if by
increasing the order, a significant improvement will be obtained. These
questions can be answered by making suitable diagnostic tests of the models
representing good fits.
As pointed out by Box and Jenkins, a model is good if the sequence of the
residuals $w_k = y_k - \hat{y}_k$ is a zero-mean white-noise sequence. The
procedure for testing the validity of a model, therefore, is to first
determine the correlation coefficients of the residuals and test these for
whiteness. Since only a finite sample of N terms is used, the correlation
coefficients with lag one or more will not be exactly zero, but must be
sufficiently small. A common test is to require that the first 10 correlation
coefficients be less than $1.98/\sqrt{N}$ for 95% confidence limits. A more
suitable approach is to calculate:

$$N\sum_{i=1}^{10}\rho_i^2$$

and use the Chi-square table for the Portmanteau test, as suggested by Box and
Jenkins.
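The two whiteness checks just described can be sketched as follows. This Python fragment is illustrative only, and its function and constant names are hypothetical. It uses the ten residual correlation coefficients that the paper reports for the AR4 model. Three things are assumptions rather than statements from the paper: N = 38 (42 observations less the 4 start-up values), 6 degrees of freedom for the Chi-square comparison (10 lags less 4 fitted parameters, the usual Box and Jenkins convention), and the tabulated 95% value 12.59 for that case. Because the coefficients are rounded to three decimals, the computed figures may differ slightly from those quoted in the text.

```python
import math

# Residual correlation coefficients rho_1..rho_10 quoted for the AR4 model.
RHOS = [0.138, -0.034, -0.005, 0.012, -0.130,
        0.015, -0.272, 0.112, -0.171, -0.032]
N = 38                # assumption: 42 observations minus 4 start-up values
CHI2_95_DOF6 = 12.59  # assumption: 95% Chi-square point for 6 degrees of freedom


def whiteness_bound(n, factor=1.98):
    """Bound on individual residual correlations, factor / sqrt(N)."""
    return factor / math.sqrt(n)


def portmanteau(rhos, n):
    """Portmanteau statistic N * sum(rho_i^2) over the supplied lags."""
    return n * sum(r * r for r in rhos)


bound = whiteness_bound(N)
all_small = all(abs(r) < bound for r in RHOS)
q = portmanteau(RHOS, N)

print(f"bound = {bound:.3f}, every |rho_i| below bound: {all_small}")
print(f"portmanteau statistic = {q:.3f}, below table value: {q < CHI2_95_DOF6}")
```

Passing the critical value in as a constant keeps the sketch free of any statistics library; a different number of lags or fitted parameters needs a different table entry.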
Table 1. Models of stream flow

No.  Type of model                        Mean error  Mean square error  Standard deviation of error  Equation
1    Linear regression                    -0.004      132 251.286        363.668                      $\hat{y}_k = 2735.142 - 4.649k$
2    Exponential regression               80.949      219 799.647        461.783                      $\hat{y}_k = 2701.331\,e^{-0.00161k}$
3    Second-order polynomial              0.000       130 224.348        360.866                      $\hat{y}_k = 2627.023 + 10.09478k - 0.3428801k^2$
4    Third-order polynomial               0.730       126 750.158        356.019                      $\hat{y}_k = 2809.831 - 38.11421k + 2.427237k^2 - 0.04298375k^3$
5    First-order autoregressive (AR1)     -12.187     110 611.425        336.488                      $\hat{y}_k = 0.433786y_{k-1} + 0.566215\bar{y}$
6    Second-order autoregressive (AR2)    5.135       94 200.518         306.873                      $\hat{y}_k = 0.57337y_{k-1} - 0.321785y_{k-2} + 0.748415\bar{y}$
7    Third-order autoregressive (AR3)     1.007       90 056.561         300.093                      $\hat{y}_k = 0.563926y_{k-1} - 0.331229y_{k-2} - 0.185325y_{k-3} + 0.952689\bar{y}$
8    Fourth-order autoregressive (AR4)    1.051       84 424.504         290.557                      $\hat{y}_k = 0.423770y_{k-1} - 0.217247y_{k-2} - 0.134357y_{k-3} - 0.255087y_{k-4} + 1.182921\bar{y}$
9    First-order moving average (MA1)     -8.364      99 826.986         315.843                      $\hat{y}_k = \bar{y} - 0.579416(y_{k-1} - \hat{y}_{k-1})$
10   ARMA(2,1)                            11.083      128 442.506        358.217                      $\hat{y}_k = 1.29583y_{k-1} - 0.635178y_{k-2} - 0.1641014(y_{k-1} - \hat{y}_{k-1}) + 1.471076\bar{y}$
11   ARIMA(1,1,0)                         -1.215      141 050.701        375.565                      $\hat{y}_k = 0.9745y_{k-1} + 0.0255y_{k-2}$
12   ARIMA(2,1,0)                         -13.118     137 526.837        370.614                      $\hat{y}_k = 0.968721y_{k-1} + 0.224415y_{k-2} - 0.193698y_{k-3}$

For the fourth-order autoregressive (AR4) model, the correlation coefficients
of the sequence of the residuals were calculated, and are given below:

$\rho_1 = 0.138$, $\rho_2 = -0.034$, $\rho_3 = -0.005$, $\rho_4 = 0.012$, $\rho_5 = -0.130$,
$\rho_6 = 0.015$, $\rho_7 = -0.272$, $\rho_8 = 0.112$, $\rho_9 = -0.171$, $\rho_{10} = -0.032$

Also:

$$N\sum_{i=1}^{10}\rho_i^2 = 5.868$$

Since $1.98/\sqrt{N} = 0.316$, and from the Chi-square table
$0.75 > \epsilon > 0.5$, it is evident that the model is satisfactory within
95% confidence limits.

Conclusions
Most natural, environmental, as well as man-made processes are dynamic in
nature, in addition to having certain aspects of uncertainty and variability.
These elements of unpredictability make it necessary to construct dynamic
stochastic models for these processes. Although many generalized mathematical
formulations exist in the literature, they have found limited practical
application because the presentations usually require a more thorough
background in statistics and advanced mathematics than normally available to
most engineers and applied scientists.
An attempt has been made in this paper to present various techniques for
modelling of discrete-time observations in a simple and concise manner. The
primary objective of such models is the prediction and control of the
processes concerned. As an illustrative example, these techniques have been
applied to the modelling of annual stream flows of the Nile river at Aswan Dam
based on observed data for the period 1903-1944. Although these values differ
considerably from year to year, a fourth-order autoregressive model gives a
good fit, with the standard deviation of the one-step prediction error less
than 10% of the mean flow. Diagnostic tests indicate that this model may be
used within 95% confidence limits. It is envisaged that the modelling
techniques described in this paper will find further utilization in other
fields.

Acknowledgements
The support of this research by the National Research Council of Canada is
gratefully acknowledged.

References

1 Box, G. E. P. and Jenkins, G. M. Time Series Analysis, Forecasting and Control, Holden Day, San Francisco, 1970
2 Kendall, M. G. Time Series, Hafner, New York, 1973
3 Eykhoff, Pieter. System Identification: Parameter and State Estimation, John Wiley and Sons, London, 1974
4 Kashyap, R. L. and Rao, A. Ramachandra. Dynamic Stochastic Models from Empirical Data, Academic Press, London, 1976
5 Duong, N. et al. IEEE Trans., 1975, SMC-5, 46
6 Ikeda, S. et al. IEEE Trans., 1976, SMC-6, 473

Appendix

Comparison of actual and predicted flows of the Nile river.

Year   Actual flow   Predicted flow using AR(4)
1903   2950.904
1904   2247.875
1905   2628.277
1906   2491.124
1907   2792.827      2547.145
1908   3324.487      2833.017
1909   3058.081      2914.167
1910   2889.850      2680.220
1911   2495.270      2518.412
1912   1848.821      2287.923
1913   1981.981      2190.258
1914   2411.070      2483.055
1915   3035.200      2823.468
1916   3558.132      3141.747
1917   3281.957      3136.141
1918   2377.891      2712.190
1919   2394.982      2159.605
1920   2499.997      2269.967
1921   2610.239      2499.672
1922   2743.831      2751.894
1923   2744.114      2766.087
1924   2338.835      2695.585
1925   2494.981      2477.708
1926   2474.437      2597.808
1927   2448.371      2609.560
1928   2983.055      2685.379
1929   2732.250      2880.554
1930   2205.147      2666.855
1931   2881.808      2432.781
1932   2580.533      2731.349
1933   2954.375      2591.472
1934   3025.941      2858.888
1935   2902.774      2675.871
1936   2642.455      2634.752
1937   2880.239      2446.217
1938   2885.499      2601.829
1939   2308.904      2618.794
1940   1848.087      2407.764
1941   2569.538      2276.384
1942   2503.951      2758.353
1943   2438.750      2782.822
1944   2211.127      2790.056
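The predicted column can be regenerated from the actual flows with the one-step autoregressive predictor of equation (30). The Python sketch below is a check under stated assumptions, not the authors' program. The flows are transcribed from the table above. The AR(4) coefficients and the coefficient of the mean are read from the AR4 row of Table 1, where some digits are hard to make out in the source, so they should be treated as an assumed transcription. The standard deviation is taken as the square root of (mean square error minus squared mean error), a definition inferred from several rows of Table 1 rather than stated in the paper. All names are hypothetical.

```python
import math

# Annual flows for 1903-1944, transcribed from the appendix table.
ACTUAL = [
    2950.904, 2247.875, 2628.277, 2491.124, 2792.827, 3324.487, 3058.081,
    2889.850, 2495.270, 1848.821, 1981.981, 2411.070, 3035.200, 3558.132,
    3281.957, 2377.891, 2394.982, 2499.997, 2610.239, 2743.831, 2744.114,
    2338.835, 2494.981, 2474.437, 2448.371, 2983.055, 2732.250, 2205.147,
    2881.808, 2580.533, 2954.375, 3025.941, 2902.774, 2642.455, 2880.239,
    2885.499, 2308.904, 1848.087, 2569.538, 2503.951, 2438.750, 2211.127,
]

# Assumed transcription of the AR4 row of Table 1.
PHI = (0.423770, -0.217247, -0.134357, -0.255087)
MEAN_COEFF = 1.182921  # coefficient of the mean flow, i.e. 1 - sum(PHI)


def ar_one_step(y, phi, mean_coeff):
    """One-step predictions for every year that has len(phi) predecessors."""
    y_bar = sum(y) / len(y)
    n = len(phi)
    preds = []
    for k in range(n, len(y)):
        ar_part = sum(phi[i] * y[k - 1 - i] for i in range(n))
        preds.append(ar_part + mean_coeff * y_bar)
    return preds


def error_summary(actual, predicted):
    """Mean error, mean square error and standard deviation of the error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mean_err = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    sd = math.sqrt(mse - mean_err ** 2)
    return mean_err, mse, sd


preds = ar_one_step(ACTUAL, PHI, MEAN_COEFF)  # predictions for 1907-1944
mean_err, mse, sd = error_summary(ACTUAL[len(PHI):], preds)
print(f"first prediction (1907): {preds[0]:.3f}")
print(f"mean error {mean_err:.3f}, mean square error {mse:.3f}, sd {sd:.3f}")
```

The three summary figures can be compared with the AR4 row of Table 1. Any discrepancy would point to a transcription problem in either the coefficients or the flows.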
