
Time series

[Figure: Sales figures, Jan 1998 - Dec 2001]

[Figure: Tot-P ug/l, Rn, Helsingborg 1980-2001]

Characteristics
Non-independent observations (correlation structure)
Systematic variation within a year (seasonal effects)
Long-term increasing or decreasing level (trend)
Irregular variation of small magnitude (noise)

Where can time series be found?

Economic indicators: sales figures, employment statistics, stock market indices, ...
Meteorological data: precipitation, temperature, ...
Environmental monitoring: concentrations of nutrients and pollutants in air masses, rivers, marine basins, ...

Time series analysis

Purpose: Estimate the different parts of a time series in order to
understand the historical pattern
assess the current status
make forecasts of future development

Methodologies:

Method                                       This course?
Time series regression                       Yes
Classical decomposition                      Yes
Exponential smoothing                        Yes
ARIMA modelling (Box-Jenkins)                Yes
Non-parametric tests                         No
Transfer function and intervention models    No
State space modelling                        No
Spectral domain analysis                     No

Time series regression?


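Let
$y_t$ = (observed) value of the time series at time point $t$
and assume a year is divided into $L$ seasons.
Regression model (with linear trend):

$$y_t = \beta_0 + \beta_1 t + \sum_{j=1}^{L-1} \beta_{s,j}\, x_{j,t} + \varepsilon_t$$

where $x_{j,t} = 1$ if $y_t$ belongs to season $j$ and 0 otherwise, $j = 1, \ldots, L-1$,
and $\{\varepsilon_t\}$ are assumed to have zero mean and constant variance ($\sigma^2$).

The parameters $\beta_0, \beta_1, \beta_{s,1}, \ldots, \beta_{s,L-1}$ are estimated by the Ordinary Least Squares method:

$$(b_0, b_1, b_{s,1}, \ldots, b_{s,L-1}) = \arg\min \sum_t \Bigl( y_t - \bigl(\beta_0 + \beta_1 t + \sum_{j=1}^{L-1} \beta_{s,j}\, x_{j,t}\bigr) \Bigr)^2$$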
Advantages:
Simple and robust method
Easily interpreted components
Normal inference (conf.intervals, hypothesis testing) directly
applicable
Drawbacks:
Fixed components in model (mathematical trend function and
constant seasonal components)
Correlation between observations is not taken into account

Example: Sales figures

Month    1998    1999    2000    2001
Jan     20.33   23.58   26.09   28.43
Feb     20.96   24.61   26.66   29.92
Mar     23.06   27.28   29.61   33.44
Apr     24.48   27.69   32.12   34.56
May     25.47   29.99   34.01   34.22
Jun     28.81   30.87   32.98   38.91
Jul     30.32   32.09   36.38   41.31
Aug     29.56   34.53   35.90   38.89
Sep     30.01   30.85   36.42   40.90
Oct     26.78   30.24   34.04   38.27
Nov     23.75   27.86   31.29   32.02
Dec     24.06   24.67   28.50   29.78

Construct seasonal indicators: x1, x2, ..., x12

January (1998-2001):  x1 = 1, x2 = 0, x3 = 0, ..., x12 = 0
February (1998-2001): x1 = 0, x2 = 1, x3 = 0, ..., x12 = 0
etc.
December (1998-2001): x1 = 0, x2 = 0, x3 = 0, ..., x12 = 1

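sales  time  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11  x12
20.33     1   1   0   0   0   0   0   0   0   0    0    0    0
20.96     2   0   1   0   0   0   0   0   0   0    0    0    0
23.06     3   0   0   1   0   0   0   0   0   0    0    0    0
24.48     4   0   0   0   1   0   0   0   0   0    0    0    0
...
32.02    47   0   0   0   0   0   0   0   0   0    0    1    0
29.78    48   0   0   0   0   0   0   0   0   0    0    0    1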

Use 11 indicators, e.g. x1 - x11 in the regression model
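The model with a linear trend and 11 monthly indicators can be fitted with ordinary least squares in a few lines. The following is a minimal sketch, assuming NumPy is available and taking December (x12) as the omitted reference month; the variable names are illustrative and not part of the original material. On the sales figures listed above it should give coefficients close to those in the Minitab output.

```python
import numpy as np

# Monthly sales, Jan 1998 - Dec 2001, copied from the example data.
sales = np.array([
    20.33, 20.96, 23.06, 24.48, 25.47, 28.81, 30.32, 29.56, 30.01, 26.78, 23.75, 24.06,
    23.58, 24.61, 27.28, 27.69, 29.99, 30.87, 32.09, 34.53, 30.85, 30.24, 27.86, 24.67,
    26.09, 26.66, 29.61, 32.12, 34.01, 32.98, 36.38, 35.90, 36.42, 34.04, 31.29, 28.50,
    28.43, 29.92, 33.44, 34.56, 34.22, 38.91, 41.31, 38.89, 40.90, 38.27, 32.02, 29.78,
])
n, L = len(sales), 12
time = np.arange(1, n + 1)

# Seasonal indicators x1..x11; December is the reference season (all zeros).
month = (time - 1) % L
x_season = (month[:, None] == np.arange(L - 1)[None, :]).astype(float)

# Design matrix: intercept, time, x1..x11.
X = np.column_stack([np.ones(n), time, x_season])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

fitted = X @ coef
residuals = sales - fitted

# Forecast for time 49 (January 2002): intercept, time = 49, x1 = 1.
x_new = np.zeros(L + 1)
x_new[0], x_new[1], x_new[2] = 1.0, 49.0, 1.0
forecast = x_new @ coef

print("intercept, slope:", coef[0], coef[1])
print("forecast for time 49:", forecast)
```

Using only 11 indicators avoids exact collinearity with the intercept column, which is why one season must be left out.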

Regression Analysis: sales versus time, x1, ...


The regression equation is
sales = 18.9 + 0.263 time + 0.750 x1 + 1.42 x2 + 3.96 x3 + 5.07 x4 + 6.01 x5
+ 7.72 x6 + 9.59 x7 + 9.02 x8 + 8.58 x9 + 6.11 x10 + 2.24 x11

Predictor      Coef   SE Coef       T      P
Constant    18.8583    0.6467   29.16  0.000
time        0.26314   0.01169   22.51  0.000
x1           0.7495    0.7791    0.96  0.343
x2           1.4164    0.7772    1.82  0.077
x3           3.9632    0.7756    5.11  0.000
x4           5.0651    0.7741    6.54  0.000
x5           6.0120    0.7728    7.78  0.000
x6           7.7188    0.7716   10.00  0.000
x7           9.5882    0.7706   12.44  0.000
x8           9.0201    0.7698   11.72  0.000
x9           8.5819    0.7692   11.16  0.000
x10          6.1063    0.7688    7.94  0.000
x11          2.2406    0.7685    2.92  0.006

S = 1.087   R-Sq = 96.6%   R-Sq(adj) = 95.5%

Analysis of Variance

Source           DF         SS      MS      F      P
Regression       12   1179.818  98.318  83.26  0.000
Residual Error   35     41.331   1.181
Total            47   1221.150

Source  DF   Seq SS
time     1  683.542
x1       1   79.515
x2       1   72.040
x3       1   16.541
x4       1    4.873
x5       1    0.204
x6       1   10.320
x7       1   63.284
x8       1   72.664
x9       1  100.570
x10      1   66.226
x11      1   10.039

Unusual Observations

Obs  time   sales     Fit  SE Fit  Residual  St Resid
 12  12.0  24.060  22.016   0.583     2.044     2.23R
 21  21.0  30.850  32.966   0.548    -2.116    -2.25R

R denotes an observation with a large standardized residual

Predicted Values for New Observations

New Obs     Fit  SE Fit          95.0% CI            95.0% PI
1        32.502   0.647  (31.189, 33.815)   (29.934, 35.069)

Values of Predictors for New Observations

New Obs  time    x1        x2        x3        x4        x5        x6
1        49.0  1.00  0.000000  0.000000  0.000000  0.000000  0.000000

New Obs        x7        x8        x9       x10       x11
1        0.000000  0.000000  0.000000  0.000000  0.000000
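As a check on how the predicted value arises, the fit for the new observation (time = 49, January, so x1 = 1 and all other indicators 0) can be reproduced by hand from the estimated coefficients reported in the regression output. This is only a worked restatement of the output, rounded to three decimals:

$$\hat{y}_{49} = 18.8583 + 0.26314 \cdot 49 + 0.7495 \cdot 1 \approx 32.502$$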

What about serial correlation in data?

Positive serial correlation:


Values follow a smooth pattern

Negative serial correlation:


Values show a thorny pattern

How to obtain it?


Use the residuals.
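$$e_t = y_t - \hat{y}_t = y_t - \Bigl(\hat{\beta}_0 + \hat{\beta}_1 t + \sum_{j=1}^{11} \hat{\beta}_{s,j}\, x_{j,t}\Bigr); \quad t = 1, \ldots, 48$$

where the hats denote the least squares estimates.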

Residual plot from the regression analysis: smooth or thorny?

[Figure: residuals plotted against month number (from Jan 1998)]

Durbin-Watson test on residuals:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

Rule of thumb:
If d < 1 or d > 3, conclude that the residuals (and the original data) are serially correlated.
Use the shape of the residual plot (smooth or thorny) to decide whether the correlation is positive or negative.
(More thorough rules for comparisons and decisions about positive or negative correlation exist.)

Durbin-Watson statistic = 2.05

(Included in the regression output.)

The value lies between 1 and 3: no significant serial correlation in the residuals!
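The Durbin-Watson statistic is simple to compute directly from a residual vector. The sketch below assumes NumPy; the function name `durbin_watson` and the two toy residual series are illustrative only. A slowly varying (smooth) series gives a value well below 1, and an alternating (thorny) series gives a value above 3.

```python
import numpy as np

def durbin_watson(e):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Toy residuals (made up for illustration, not taken from the sales example).
smooth = np.array([1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, 0], dtype=float)
thorny = np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1], dtype=float)

d_smooth = durbin_watson(smooth)   # positive serial correlation: d < 1
d_thorny = durbin_watson(thorny)   # negative serial correlation: d > 3
print(d_smooth, d_thorny)
```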

Decompose: analyse the observed time series in terms of its different components:

Trend part (TR)
Seasonal part (SN)
Cyclical part (CL)
Irregular part (IR)

Cyclical part: state of the market in economic time series.
In environmental series, it is usually treated together with TR.

Multiplicative model:
$y_t = TR_t \cdot SN_t \cdot CL_t \cdot IR_t$
Suitable for economic indicators
Level is present in $TR_t$ or in $TC_t = (TR \cdot CL)_t$
$SN_t$, $IR_t$ (and $CL_t$) work as indices
Seasonal variation increases with the level of $y_t$

Additive model:
$y_t = TR_t + SN_t + CL_t + IR_t$
More suitable for environmental data
Requires constant seasonal variation
$SN_t$, $IR_t$ (and $CL_t$) vary around 0
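The difference between the two models can be seen on synthetic data. The sketch below assumes NumPy and uses a made-up linear trend and sinusoidal seasonal pattern (no cyclical or irregular part); it measures the seasonal swing around the trend within each year. In the additive series the swing stays constant, while in the multiplicative series it grows with the level.

```python
import numpy as np

L = 12
t = np.arange(4 * L)                    # four years of monthly data
trend = 10 + 0.5 * t                    # hypothetical trend TR_t
wave = np.sin(2 * np.pi * t / L)        # hypothetical seasonal shape

y_add = trend + 3.0 * wave              # additive: SN_t varies around 0
y_mul = trend * (1.0 + 0.3 * wave)      # multiplicative: SN_t is an index around 1

def yearly_swing(y, trend, L=12):
    """Range (max - min) of the deviation from trend within each year."""
    dev = (y - trend).reshape(-1, L)
    return dev.max(axis=1) - dev.min(axis=1)

swing_add = yearly_swing(y_add, trend)
swing_mul = yearly_swing(y_mul, trend)
print(swing_add)   # roughly constant from year to year
print(swing_mul)   # increases as the level increases
```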

Example 1: Sales data

[Figure: Sales figures Jan 98 - Dec 01, observed (blue) and deseasonalised (magenta)]
[Figure: Observed (blue) and theoretical trend (magenta)]
[Figure: Observed (blue) with estimated trendline (black)]

Example 2:

Estimation of components, working scheme

1. Seasonal adjustment/Deseasonalisation:

$SN_t$ usually has the largest amount of variation among the components.
The time series is deseasonalised by calculating centred and weighted Moving Averages:

$$M_t^{(L)} = \frac{y_{t-L/2} + 2 y_{t-(L/2-1)} + \ldots + 2 y_t + \ldots + 2 y_{t+(L/2-1)} + y_{t+L/2}}{2L}$$

where $L$ = number of seasons within a year ($L$ = 2 for half-year data, 4 for quarterly data and 12 for monthly data).

$M_t$ becomes a rough estimate of $(TR \cdot CL)_t$.

Rough seasonal components are obtained by
$y_t / M_t$ in a multiplicative model
$y_t - M_t$ in an additive model

Mean values of the rough seasonal components are calculated for each season separately, giving $L$ means.
The $L$ means are adjusted to
have an exact average of 1 (i.e. their sum equals $L$) in a multiplicative model
have an exact average of 0 (i.e. their sum equals zero) in an additive model.

Final estimates of the seasonal components are set to these adjusted means and are denoted:

$sn_1, \ldots, sn_L$

The time series is now deseasonalised by

$y_t^* = y_t / sn_t$ in a multiplicative model
$y_t^* = y_t - sn_t$ in an additive model

where $sn_t$ is one of $sn_1, \ldots, sn_L$ depending on which of the seasons $t$ represents.
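Step 1 of the working scheme can be sketched in code as follows. This is a minimal illustration assuming NumPy, an even number of seasons L, and a series whose first observation belongs to season 0; the function names are hypothetical and this is not a reproduction of Minitab's routine.

```python
import numpy as np

def centred_ma(y, L):
    """Centred weighted moving average: end weights 1, inner weights 2, divisor 2L."""
    y = np.asarray(y, dtype=float)
    h = L // 2
    w = np.full(L + 1, 2.0)
    w[0] = w[-1] = 1.0
    w /= 2 * L
    m = np.full(len(y), np.nan)          # undefined for the first and last L/2 points
    m[h:len(y) - h] = np.convolve(y, w, mode="valid")
    return m

def seasonal_components(y, L, multiplicative=False):
    """Adjusted seasonal means sn_1..sn_L and the season index of each observation."""
    y = np.asarray(y, dtype=float)
    m = centred_ma(y, L)
    rough = y / m if multiplicative else y - m
    season = np.arange(len(y)) % L       # assumption: series starts in season 0
    means = np.array([np.nanmean(rough[season == s]) for s in range(L)])
    sn = means / means.mean() if multiplicative else means - means.mean()
    return sn, season

def deseasonalise(y, L, multiplicative=False):
    """Return the deseasonalised series y* and the seasonal components."""
    y = np.asarray(y, dtype=float)
    sn, season = seasonal_components(y, L, multiplicative)
    y_star = y / sn[season] if multiplicative else y - sn[season]
    return y_star, sn

# Usage on a synthetic quarterly series: linear trend plus a fixed seasonal pattern.
t = np.arange(16)
pattern = np.array([1.0, -2.0, 3.0, -2.0])
y = 10 + 0.5 * t + pattern[t % 4]
y_star, sn = deseasonalise(y, L=4)
print(sn)       # recovers the seasonal pattern
print(y_star)   # recovers the linear trend
```

With an exactly linear trend the moving average reproduces the trend, so the seasonal pattern is recovered exactly; on real data the estimates are only approximate.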

2. Seasonally adjusted values are used to estimate the trend component and occasionally the cyclical component.

If no cyclical component is present:

Apply simple linear regression to the seasonally adjusted values.
This gives estimates $tr_t$ of a linear or quadratic trend component.
The residuals from the regression fit constitute estimates $ir_t$ of the irregular component.

If a cyclical component is present:

Estimate the trend and cyclical components as a whole (do not split them) by

$$tc_t = \frac{y^*_{t-m} + y^*_{t-(m-1)} + \ldots + y^*_t + y^*_{t+1} + \ldots + y^*_{t+m}}{2m+1}$$

i.e. a non-weighted centred Moving Average of length $2m+1$ calculated over the seasonally adjusted values.

Common values for $2m+1$: 3, 5, 7, 9, 11, 13

The choice of $m$ is based on properties of the final estimate of $IR_t$, which is calculated as

$ir_t = y^*_t / tc_t$ in a multiplicative model
$ir_t = y^*_t - tc_t$ in an additive model

$m$ is chosen so as to minimise the serial correlation and the variance of $ir_t$.
$2m+1$ is called the (number of) points of the Moving Average.
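The choice of m can be explored by computing the irregular component for several window lengths and comparing its mean squared deviation and lag-1 correlation. The sketch below assumes NumPy and an additive model; the function names are illustrative, and the mean squared deviation is taken here simply as the mean of the squared residuals.

```python
import numpy as np

def simple_centred_ma(y_star, m):
    """Non-weighted centred moving average with 2m+1 points (NaN at the ends)."""
    y_star = np.asarray(y_star, dtype=float)
    width = 2 * m + 1
    tc = np.full(len(y_star), np.nan)
    tc[m:len(y_star) - m] = np.convolve(y_star, np.ones(width) / width, mode="valid")
    return tc

def irregular_summary(y_star, m):
    """MSD and lag-1 correlation of ir_t = y*_t - tc_t (additive model)."""
    y_star = np.asarray(y_star, dtype=float)
    ir = y_star - simple_centred_ma(y_star, m)
    ir = ir[~np.isnan(ir)]
    msd = np.mean(ir ** 2)
    # Lag-1 correlation; undefined (NaN) if the residuals are constant.
    r1 = np.corrcoef(ir[1:], ir[:-1])[0, 1]
    return msd, r1

# Usage on a toy alternating series (made up for illustration).
toy = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
msd, r1 = irregular_summary(toy, m=1)
print(msd, r1)
```

In practice one would loop over m = 1, ..., 6 on the deseasonalised data and tabulate the two quantities for each window length.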

Example, cont: Home sales data

Minitab can be used for decomposition via
Stat > Time series > Decomposition

Choice of model type: there is an option to choose between two models.

Time Series Decomposition

Data      Sold
Length    47,0000
NMissing

Trend Line Equation

Yt = 5,77613 + 4,30E-02*t

Seasonal Indices

Period      Index
 1       -4,09028
 2       -4,13194
 3        0,909722
 4       -1,09028
 5        3,70139
 6        0,618056
 7        4,70139
 8        4,70139
 9       -1,96528
10        0,118056
11       -1,29861

Accuracy of Model
MAPE:  16,4122
MAD:    0,9025
MSD:    1,6902

Deseasonalised data have been stored in a column headed DESE1.

Moving averages on this column can be calculated via
Stat > Time series > Moving average

Choice of 2m+1

[Figure: TC component with 2m+1 = 3 (blue)]

MSD should be kept as small as possible

By saving the residuals from the moving averages we can calculate the MSD and the serial correlation for each choice of 2m+1.

2m+1    MSD     Corr(e_t, e_{t-1})
 3      1.817   -0.444
 5      1.577   -0.473
 7      1.564   -0.424
 9      1.602   -0.396
11      1.542   -0.431
13      1.612   -0.405

A 7-point or 9-point moving average seems most reasonable.

Serial correlations are calculated via
Stat > Time series > Lag
followed by
Stat > Basic statistics > Correlation

Or manually in the Session window:

MTB > lag RESI4 c50
MTB > corr RESI4 c50

Analysis with multiplicative model:

Time Series Decomposition

Data      Sold
Length    47,0000
NMissing

Trend Line Equation

Yt = 5,77613 + 4,30E-02*t

Seasonal Indices

Period     Index
 1      0,425997
 2      0,425278
 3      1,14238
 4      0,856404
 5      1,52471
 6      1,10138
 7      1,65646
 8      1,65053
 9      0,670985
10      1,02048
11      0,825072
12      0,700325

Accuracy of Model
MAPE:  16,8643
MAD:    0,9057
MSD:    1,6388

For comparison, the additive model gave MAPE 16,4122, MAD 0,9025 and MSD 1,6902.
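In a multiplicative model the adjusted seasonal indices should average exactly 1, i.e. sum to L = 12 for monthly data. The short check below, assuming NumPy, applies this to the twelve index values in the multiplicative output (decimal commas rewritten as decimal points); the ordering of the values by period is inferred from the output.

```python
import numpy as np

# Seasonal indices from the multiplicative decomposition, periods 1..12.
indices = np.array([
    0.425997, 0.425278, 1.14238, 0.856404, 1.52471, 1.10138,
    1.65646, 1.65053, 0.670985, 1.02048, 0.825072, 0.700325,
])

index_sum = indices.sum()
index_mean = indices.mean()
print(index_sum, index_mean)   # expected to be close to 12 and 1 respectively
```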

Classical decomposition, summary

Multiplicative model:

$y_t = TR_t \cdot SN_t \cdot CL_t \cdot IR_t$

Additive model:

$y_t = TR_t + SN_t + CL_t + IR_t$

Deseasonalisation

Estimate the trend+cyclical component by a centred moving average:

$$CMA_t = \frac{y_{t-L/2} + 2 y_{t-(L/2-1)} + \ldots + 2 y_t + \ldots + 2 y_{t+(L/2-1)} + y_{t+L/2}}{2L}$$

where $L$ is the number of seasons (e.g. 12, 4, 2)

Filter out the seasonal and error (irregular) components:

Multiplicative model: $sn_t \cdot ir_t = y_t / CMA_t$

Additive model: $sn_t + ir_t = y_t - CMA_t$

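Calculate monthly averages

Multiplicative model:

$$\overline{sn}_m = \frac{1}{n_m} \sum_{l=1}^{n_m} (sn_l \cdot ir_l)$$

Additive model:

$$\overline{sn}_m = \frac{1}{n_m} \sum_{l=1}^{n_m} (sn_l + ir_l)$$

for seasons $m = 1, \ldots, L$, where the sum runs over the $n_m$ observations belonging to season $m$.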

Normalise the monthly means

Multiplicative model:

$$sn_m = \frac{\overline{sn}_m}{\frac{1}{L} \sum_{l=1}^{L} \overline{sn}_l}$$

Additive model:

$$sn_m = \overline{sn}_m - \frac{1}{L} \sum_{l=1}^{L} \overline{sn}_l$$

Deseasonalise

Multiplicative model: $d_t = y_t / sn_t$

Additive model: $d_t = y_t - sn_t$

where $sn_t = sn_m$ for current month $m$

Fit trend function, detrend (deseasonalised) data

$tr_t = f(t)$

Multiplicative model: $cl_t \cdot ir_t = d_t / tr_t$

Additive model: $cl_t + ir_t = d_t - tr_t$

Estimate cyclical component and separate from error component

Multiplicative model:

$$cl_t = \frac{(cl \cdot ir)_{t-k} + (cl \cdot ir)_{t-(k-1)} + \ldots + (cl \cdot ir)_t + \ldots + (cl \cdot ir)_{t+k}}{2k+1}, \qquad ir_t = \frac{(cl \cdot ir)_t}{cl_t}$$

Additive model:

$$cl_t = \frac{(cl + ir)_{t-k} + (cl + ir)_{t-(k-1)} + \ldots + (cl + ir)_t + \ldots + (cl + ir)_{t+k}}{2k+1}, \qquad ir_t = (cl + ir)_t - cl_t$$
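The last two steps of the summary (fit a trend to the deseasonalised data, then split what remains into a cyclical and an irregular part) can be sketched for the additive model as follows. This assumes NumPy, takes f(t) to be a polynomial fitted with `np.polyfit` (one possible choice of trend function), and uses hypothetical function and variable names.

```python
import numpy as np

def trend_cycle_irregular(d, k, degree=1):
    """Additive split of deseasonalised data d_t into tr_t, cl_t and ir_t."""
    d = np.asarray(d, dtype=float)
    t = np.arange(1, len(d) + 1)
    # Trend function tr_t = f(t): here a polynomial of the given degree.
    tr = np.polyval(np.polyfit(t, d, degree), t)
    cl_ir = d - tr                                   # cl_t + ir_t
    # Cyclical part: non-weighted centred moving average with 2k+1 points.
    width = 2 * k + 1
    cl = np.full(len(d), np.nan)
    cl[k:len(d) - k] = np.convolve(cl_ir, np.ones(width) / width, mode="valid")
    ir = cl_ir - cl                                  # irregular part
    return tr, cl, ir

# Usage on synthetic deseasonalised data: linear trend plus a slow cycle.
t = np.arange(1, 61)
d = 5 + 0.2 * t + 2 * np.sin(2 * np.pi * t / 20)
tr, cl, ir = trend_cycle_irregular(d, k=2)
defined = ~np.isnan(cl)
print(np.allclose((tr + cl + ir)[defined], d[defined]))
```

The three parts add back to the deseasonalised series wherever the moving average is defined; the first and last k values of the cyclical and irregular parts are left undefined.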
