
Time Series Analysis

Session 0: Course outline


Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
Motivation for this course

Course outline
Course outline: Session 1
Kinds of time series
Continuous time series sampling
Data models
Descriptive analysis: time plots and data preprocessing
Distributional properties: statistical distribution, stationarity and autocorrelation
Outlier detection and rejection

[Figure: a continuous signal with regular and irregular sampling]

x[n] = trend[n] + periodic[n] + random[n]
Course outline: Session 2
Trend analysis:
Linear and non-linear regression
Polynomial fitting
Cubic spline fitting
Seasonal component analysis:
Spectral representation of stationary processes
Spectral signal processing:
Detrending and filtering
Non-stationary signal processing
x[n] = trend[n] + periodic[n] + random[n]
Course outline: Session 3
Model definition:
Moving Average processes (MA)
Autoregressive processes (AR)
Autoregressive, Moving Average (ARMA)
Autoregressive, Integrated, Moving Average (ARIMA, FARIMA)
Seasonal, Autoregressive, Integrated, Moving Average (SARIMA)
Known external inputs: System identification
A family of models
Nonlinear models
Parameter estimation
Order selection
Model checking
Self-similarity, fractal dimension and chaos theory
x[n] = trend[n] + periodic[n] + random[n]
Course outline: Session 4
Forecasting
Univariate forecasting
Intervention modelling
State-space modelling
Time series data mining
Time series representation
Distance measure
Anomaly/Novelty detection
Classification/Clustering
Indexing
Motif discovery
Rule extraction
Segmentation
Summarization
[Figure: Winding dataset (the angular speed of reel 2), with segments A, B and C]
Course Outline: Session 5
Your name here

Bring your own data if possible!
Suggested readings
It is suggested to read (before coming):
Geo4990: Time series analysis
Adler1998: Analysing stable time series
Leonard: Mining Transactional and Time Series Data
Chatterjee2006: Simple Linear Regression
Resources
Data sets
http://www.york.ac.uk/depts/maths/data/ts
http://www-personal.buseco.monash.edu.au/~hyndman/TSDL
Competition
http://www.neural-forecasting-competition.com
Links to organizations, events, software, datasets
http://www.buseco.monash.edu.au/units/forecasting/links.php
http://www.secondmoment.org/time_series.php
Lecture notes
http://www.econphd.net/notes.htm#Econometrics
Bibliography
D. Peña, G. C. Tiao, R. S. Tsay. A course in time series analysis. John Wiley and Sons, Inc., 2001.
C. Chatfield. The analysis of time series: an introduction. Chapman & Hall/CRC, 1996.
C. Chatfield. Time-series forecasting. Chapman & Hall/CRC, 2000.
D.S.G. Pollock. A handbook of time-series analysis, signal processing and dynamics. Academic Press, 1999.
J. D. Hamilton. Time series analysis. Princeton Univ. Press, 1994.
C. Pérez. Econometría de las series temporales. Prentice Hall, 2006.
A. V. Oppenheim, R. W. Schafer, J. R. Buck. Discrete-time signal processing, 2nd edition. Prentice Hall, 1999.
A. Papoulis, S. U. Pillai. Probability, random variables and stochastic processes, 4th edition. McGraw Hill, 2002.
Time Series Analysis
Session I: Introduction
Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
Session outline
1. Features and objectives of the time series
2. Sampling
3. Components of time series: data models
4. Descriptive analysis
5. Distributional properties
6. Detection and removal of outliers
7. Time series methods
1. Features and objectives of the time series
Goal: Explain history
Features:
1. Samples (discrete): x[n]
2. Bounded: |x[n]| ≤ l < ∞ for all n
3. Finite support: x[n], n ∈ {n_0, n_0 + 1, ..., n_F}

[Figure: a yearly series with a trend and a periodic component of period N ≈ 7 years, x[n + N] = x[n]]
1. Features and objectives of the time series
Goal: Forecast demand
Features:
1. Seasonal
2. Non-stationary
3. Non-independent samples

[Figure: a seasonal demand series whose local mean and variance change over time]
1. Features and objectives of the time series
Pendulum angle with different controllers
Goal: Control = Forecast (+ correct)
Features:
1. Continuous signal
Regular sampling: x[n] = x(nT_s), where T_s is the sampling period
1. Features and objectives of the time series
Other kinds of time series are not covered in this course.
2. Sampling
[Figure: a continuous signal with regular and irregular sampling]

Regular sampling: x[n] = x(nT_s)
Irregular sampling is also called non-uniform sampling.
2. Sampling
[Figure: a continuous signal, oversampled (T_s = T_min/5), critically sampled (T_s = T_min/2) and undersampled (T_s = T_min)]

Nyquist/Shannon criterion: the sampling period must not exceed T_min/2, where T_min is the period of the fastest component; here T_min = 10 for
x(t) = 2 sin(2π(1/100)t) + sin(2π(3/100)t + π/4) + 0.2 sin(2π(10/100)t + π/8)
2. Sampling
2. Sampling: Signal reconstruction
[Figure: continuous signal, sampled signal, sinc(t), sinc superposition and reconstructed signal]

Reconstruction formula: x_r(t) = Σ_n x[n] h_r(t − nT_s)
Reconstruction kernel for bandlimited signals: h_r(t) = sinc(t / T_s)
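The reconstruction formula can be sketched numerically. This is a minimal illustration in plain Python (the test signal, the sampling period `Ts` and the evaluation instant are assumptions, and the ideally infinite sinc sum is truncated to the available samples):

```python
import math

def sinc(t):
    # Normalized sinc: sin(pi*t)/(pi*t), with sinc(0) = 1.
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)

def reconstruct(x_samples, Ts, t):
    # x_r(t) = sum_n x[n] * sinc((t - n*Ts)/Ts), truncated to the samples we have.
    return sum(x_samples[n] * sinc((t - n * Ts) / Ts)
               for n in range(len(x_samples)))

# Sample a slow sine well above its Nyquist rate, then rebuild it between samples.
Ts = 0.5
x_samples = [math.sin(2 * math.pi * 0.2 * n * Ts) for n in range(200)]
t = 37.25                       # an instant between two sampling instants
x_hat = reconstruct(x_samples, Ts, t)
x_true = math.sin(2 * math.pi * 0.2 * t)
```

Because the sum is truncated, the reconstruction is only approximate near the ends of the record; in the middle it matches the continuous signal closely.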
2. Sampling: Aliasing
[Figure: obvious and not-so-clear aliasing effects]

Sampling with T_s = 1.1 T_min > T_min/2:
x(t) = sin(2π(10/100)t) shows an obvious aliasing effect, while
x(t) = 2 sin(2π(1/100)t) + sin(2π(3/100)t + π/4) + 0.2 sin(2π(10/100)t + π/8) shows a not-so-clear aliasing effect.
13
13
2. Aliasing: Conclusions
From the discussion above, two main consequences must be kept in mind:
1. Any continuous time series can be safely treated as a discrete time
series as long as the Nyquist criterion is satisfied.
2. Once our discrete analysis is finished, we can always return to the
continuous world by reconstructing our output sequence.
Although not discussed, once the continuous signal is discretized, one can
arbitrarily change the sampling period without having to go back to the
continuous signal with two operations called upsampling (going for a finer
discretization) and downsampling (coarser discretization).
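A small numerical illustration of why the Nyquist criterion matters (the frequencies chosen are hypothetical): a 1.1 Hz sine sampled at 1 Hz violates the criterion and produces exactly the same samples as a 0.1 Hz sine, so once aliased the two are indistinguishable.

```python
import math

fs = 1.0                      # sampling frequency (Hz), i.e. Ts = 1 s
f_high = 1.1                  # above the Nyquist frequency fs/2 = 0.5 Hz
f_alias = f_high - fs         # 0.1 Hz: the frequency the samples actually show

n = range(20)
x_high = [math.sin(2 * math.pi * f_high * k / fs) for k in n]
x_low = [math.sin(2 * math.pi * f_alias * k / fs) for k in n]
max_diff = max(abs(a - b) for a, b in zip(x_high, x_low))  # essentially zero
```

The two sample sequences coincide to machine precision, which is exactly the ambiguity the Nyquist criterion rules out.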
3. Components of a time series: data models
x[n] = trend[n] + periodic[n] + random[n]

Periodic: seasonal component.
Ex: Unemployment is low in summer.
Ex: Temperature is high in the middle of the day.
Trend: long-term change (what is long-term?).
Random: explained by statistical models (AR, MA, ...).
3. Components of a time series: data models

[Figure: trend, seasonal and noise components combined into additive, seasonal multiplicative and fully multiplicative models]

Additive model: x[n] = m[n] + S[n] + ε[n]
Seasonal multiplicative model: x[n] = m[n]·S[n] + ε[n]
Fully multiplicative model: x[n] = m[n]·S[n]·ε[n]

Taking logs of the fully multiplicative model yields an additive model:
log x[n] = log m[n] + log S[n] + log ε[n]
3. Components of a time series: data models

[Figure: two signals, an interferent signal, and the resulting observed signal]

Is the noise dependent on the signal? Observed model: x̃[n] = x[n] + ε[n]
4. Descriptive analysis
Time plot
Take care of appearance (scale, point shape, line shape, etc.)
Take care of labelling axes specifying units (be careful with the
scientific notation, e.g. from 0.5e+03 to 0.1e+04)
Data Preprocessing
Consider transforming the data to enhance a certain feature, although
this is a controversial topic:
Logarithm: stabilizes variance in the fully multiplicative model: y[n] = log(x[n])
Box-Cox: the transformed data tend to be normally distributed:
y[n] = (x[n]^λ − 1) / λ   if λ > 0
y[n] = log(x[n])          if λ = 0
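The Box-Cox transform, as defined on the slide, can be sketched as follows (the test series is an assumption chosen so that its variance grows with its level):

```python
import math

def box_cox(x, lam):
    # Box-Cox transform: (x^lam - 1)/lam for lam > 0, log(x) for lam = 0.
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

# A multiplicative-style series: the fluctuations grow with the level.
x = [math.exp(0.05 * n) * (1.0 + 0.2 * math.sin(n)) for n in range(100)]
y = box_cox(x, 0.0)   # the log turns the multiplicative model into an additive one
```

Note that as λ → 0 the λ > 0 branch tends to the logarithm, so the family is continuous in λ.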
4. Descriptive analysis
Data preprocessing:
(Removal of trend) Detrending: if the trend clearly follows a known curve (line, polynomial, logistic, Gaussian, Gompertz, etc.), fit a model to the time series and detrend (either by subtracting or dividing).
(Removal of trend) Differencing:
y_l[n] = ∇_l x[n] = (x[n] − x[n−1]) / T_s
y_r[n] = ∇_r x[n] = (x[n+1] − x[n]) / T_s
y_lr[n] = ∇_l ∇_r x[n] = (x[n+1] − 2x[n] + x[n−1]) / T_s²
y_ll[n] = ∇_l ∇_l x[n] = (x[n] − 2x[n−1] + x[n−2]) / T_s²

[Figure: x[n] = x(nT_s) with x(t) = 10 + 1.5t + sin(2πt); y_l[n] fluctuates around the trend slope 1.5 and y_lr[n] around 0]
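The differencing operators above (with T_s = 1) can be sketched as:

```python
def diff(x, lag=1):
    # First difference with a given lag: y[n] = x[n] - x[n-lag].
    return [x[n] - x[n - lag] for n in range(lag, len(x))]

# A linear trend x[n] = 10 + 1.5n: one difference leaves the constant slope,
# a second difference leaves zeros.
x = [10 + 1.5 * n for n in range(20)]
d1 = diff(x)       # all values equal 1.5
d2 = diff(d1)      # all values equal 0.0
```

This is the mechanism behind "differencing removes polynomial trends": each difference lowers the polynomial degree by one.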
4. Descriptive analysis
Data preprocessing:
(Removal of season) Deseasoning: average over the seasonal period to remove its effect. For monthly data with a yearly season:
m_est[n] = (1/12) [ (1/2)x[n−6] + x[n−5] + ... + x[n+5] + (1/2)x[n+6] ]
(Estimate of season) Season estimation: subtract the deseasoned time series from a local estimate of the current sample. Under the model x[n] = m[n] + S[n] + e[n]:
S_est[n] = (1/8)x[n−2] + (1/4)x[n−1] + (1/2)x[n] + (1/4)x[n+1] + (1/8)x[n+2] − m_est[n]

[Figure: x[n], the trend m[n], its estimate m_est[n] and the season estimate S_est[n]]
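The centered moving average m_est[n] above can be sketched in plain Python; the season estimate is simplified here to x[n] − m_est[n] (the slide's S_est adds an extra local smoothing), and the test series is an assumption:

```python
import math

def trend_estimate(x, n, period=12):
    # Centered moving average over one (even) seasonal period:
    # m_est[n] = (1/12)( x[n-6]/2 + x[n-5] + ... + x[n+5] + x[n+6]/2 )
    half = period // 2
    s = 0.5 * x[n - half] + 0.5 * x[n + half]
    s += sum(x[n + k] for k in range(-half + 1, half))
    return s / period

# Monthly-style data: a linear trend plus a period-12 season.
x = [0.3 * n + 5.0 * math.sin(2 * math.pi * n / 12) for n in range(60)]
m_est = [trend_estimate(x, n) for n in range(6, 54)]
s_est = [x[n] - m_est[n - 6] for n in range(6, 54)]   # crude season estimate
```

The half-weights at the two endpoints make the window symmetric around n even though the period is even, so a full seasonal cycle averages out exactly and a linear trend passes through unchanged.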
4. Descriptive analysis
Data preprocessing:
(Removal or isolation of trend, season or noise) Filtering: filtering aims at removing any of the components; for example, the moving average is a common filtering operation in stock analysis to remove noise.
Causal: y[n] = (1/20) Σ_{k=1}^{20} x[n−k]
Causal + anticausal: y[n] = (1/21) Σ_{k=−10}^{10} x[n−k]

[Figure: original series together with its causal and causal+anticausal moving averages]
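The two moving averages above can be sketched directly (the alternating test input is an assumption chosen so the smoothed output is easy to predict):

```python
def causal_ma(x, L=20):
    # y[n] = (1/L) * sum_{k=1}^{L} x[n-k]: uses only the past (adds delay).
    return [sum(x[n - k] for k in range(1, L + 1)) / L
            for n in range(L, len(x))]

def centered_ma(x, half=10):
    # y[n] = (1/(2*half+1)) * sum_{k=-half}^{half} x[n-k]: no delay, non-causal.
    width = 2 * half + 1
    return [sum(x[n - k] for k in range(-half, half + 1)) / width
            for n in range(half, len(x) - half)]

x = [float(n % 2) for n in range(100)]   # fast "noise" alternating 0, 1
y1 = causal_ma(x)      # flattens to the mean 0.5, at the cost of a delay
y2 = centered_ma(x)    # hovers around 0.5 with no delay
```

The causal filter is usable in real time but lags the data; the centered filter has zero phase but needs future samples, matching the causal vs. causal+anticausal distinction in the figure.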
5. Distributional properties
[Figure: an irregularly sampled realization; the sample at n = 31 is a random variable X = x[31]]

All variables identically distributed → stationarity.
A variable can be normally distributed, Poisson distributed, etc.; it is important to characterize its distribution.
All variables independent → white process.
Independence is measured through the autocorrelation function.
5. Distributional properties
[Figure: normal and Poisson distribution functions and probability density functions]

F_X(x) = Pr{X ≤ x} = ∫_{−∞}^{x} f(t) dt = Σ_{x_i ≤ x} Pr{X = x_i}

E{X^r} = ∫ x^r dF_X(x) = ∫_{−∞}^{∞} x^r f(x) dx = Σ_{x_i} x_i^r Pr{X = x_i}

Characteristic function: Φ_X(t) = E{e^{itX}} = 1 + Σ_{k=1}^{∞} (i^k E{X^k} / k!) t^k

F_X(b) − F_X(a) = (1/2π) lim_{τ→∞} ∫_{−τ}^{τ} ( (e^{−ita} − e^{−itb}) / (it) ) Φ_X(t) dt

Joint distribution: F_{X,Y}(x, y) = Pr{X ≤ x, Y ≤ y}
5. Distributional properties
[Figure: examples of nonstationary variables]
5. Distributional properties
Strictly stationary:
F_{X_{n_1},...,X_{n_k}}(x_1, ..., x_k) = F_{X_{n_1+N},...,X_{n_k+N}}(x_1, ..., x_k)   ∀ k, n_1, ..., n_k, N
Consequences:
E{X_n} = μ   ∀ n
Autocovariance function: C[n_0] = Cov(X_n, X_{n+n_0}) = E{(X_n − μ)(X_{n+n_0} − μ)*}   ∀ n, n_0
Autocorrelation function (ACF): Corr(X_n, X_{n+n_0}) = E{X_n X*_{n+n_0}} = Γ[n_0]   ∀ n, n_0
Wide-sense stationary: E{X_n} = μ and Corr(X_n, X_{n+n_0}) = Γ[n_0]   ∀ n, n_0 (n_0 is the lag)
Properties: Γ*[n_0] = Γ[−n_0],  Γ[0] ≥ |Γ[n_0]|
Correlation coefficient: r[n_0] = C[n_0] / C[0]
Example (white process): Γ[n_0] = σ_X² δ[n_0], i.e. σ_X² for n_0 = 0 and 0 for n_0 ≠ 0
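The sample counterparts of C[n_0] and r[n_0] can be sketched as follows (the white-noise input is an assumption used to check that the ACF of a white process is a spike at lag 0):

```python
import random

def sample_acf(x, max_lag):
    # Sample autocovariance C[k] and correlation coefficient r[k] = C[k]/C[0].
    N = len(x)
    mean = sum(x) / N
    def C(k):
        return sum((x[n] - mean) * (x[n + k] - mean) for n in range(N - k)) / N
    c0 = C(0)
    return [C(k) / c0 for k in range(max_lag + 1)]

random.seed(0)
white = [random.gauss(0.0, 1.0) for _ in range(2000)]
r = sample_acf(white, 5)
# r[0] is 1 by construction; for a white process the other lags stay near 0.
```

With N samples, the sample correlations of a white process scatter around 0 with standard error roughly 1/√N, which is what the significance test on the next slide formalizes.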
5. Distributional properties
Correlation coefficients (zero-order correlations): r[n_0] = C[n_0] / C[0]
Under the null hypothesis H_0: r[n_0] = 0, the statistic
t = r[n_0] √( (N − 2) / (1 − r²[n_0]) )
is distributed as a Student's t with N − 2 degrees of freedom, where N is the number of samples upon which r[n_0] is estimated. The probability of observing r[n_0] if the null hypothesis is true is
Pr{ |t| ≥ |r[n_0]| √( (N − 2) / (1 − r²[n_0]) ) }
5. Distributional properties
Ergodicity

[Figure: four realizations of the same process]

Ensemble average: (1/4) Σ_{i=1}^{4} x_i[20]
Strong law of large numbers: lim_{N→∞} (1/N) Σ_{i=1}^{N} x_i[20] = μ_20
Time average: (1/25) Σ_{n=1}^{25} x[n]
The two are equal if the process is stationary and ergodic for the mean.
5. Distributional properties

[Figure: realizations of the same process with different random offsets]

Example: x[n] = m + e[n], with m ~ N(0, σ_m²) drawn once per realization and e[n] ~ N(0, σ_e²).
E{X_n} = 0,  Cov(X_n, X_{n+n_0}) = σ_m² + σ_e² δ[n_0]
The time average of one realization converges to m, not to the ensemble average 0: stationarity does not imply ergodicity.
5. Distributional properties
How to detect non-stationarity in the mean?
There is a variation in the local mean.
Solutions:
Polynomial trends: difference p times.
x^(1)[n] = x[n] − x[n−1]   (removal of a constant, or piecewise-constant, trend)
x^(2)[n] = x^(1)[n] − x^(1)[n−1]   (removal of a linear trend)
x^(3)[n] = x^(2)[n] − x^(2)[n−1]   (removal of a quadratic trend)
x^(p)[n] = x^(p−1)[n] − x^(p−1)[n−1]   (removal of a p-th order polynomial trend)
5. Distributional properties
How to detect non-stationarity in the mean?
Exponential trends: take logs, e.g. for financial time series
y[n] = log( x[n] / x[n−1] )
5. Distributional properties
How to detect non-stationarity due to seasonality?
There is a periodic variation in the local mean.
Solutions: difference p times with that seasonality.
x^(1)[n] = x[n] − x[n−12]   (monthly data: removal of a yearly seasonality)
x^(1)[n] = x[n] − x[n−7]   (daily data: removal of a weekly seasonality)
How to detect non-stationarity in the variance?
There is a variation in the local variance.
Solution: the Box-Cox transformation tends to stabilize the variance.
How to detect non-stationarity in general?
Solution: unit root tests.
6. Outlier detection and rejection
Common procedures:
1. Visual inspection
2. Mean ± k · Standard Deviation
3. Median ± k · Median Absolute Deviation
4. Robust detection
5. Robust model fitting
Common actions:
1. Remove the observation
2. Substitute it by an estimate
Do
  Estimate mean and Std. Dev.
  Remove samples outside a given interval
Until (no sample is removed)
[Figure: a sensor blackout producing a run of outliers]
7. Time series methods
Time-domain methods:
Based on classical theory of correlation (autocovariance)
Include parametric methods (AR, MA, ARMA, ARIMA, etc.)
Include regression (linear, nonlinear)
Frequency-domain (spectral) methods:
Based on Fourier analysis
Include harmonic methods
Neural networks and fuzzy neural networks
Other approaches: fractal models, wavelet models, Bayesian networks, etc.
Bibliography
C. Chatfield. The analysis of time series: an introduction. Chapman & Hall/CRC, 1996.
D.S.G. Pollock. A handbook of time-series analysis, signal processing and dynamics. Academic Press, 1999.
J. D. Hamilton. Time series analysis. Princeton Univ. Press, 1994.
A. V. Oppenheim, R. W. Schafer, J. R. Buck. Discrete-time signal processing, 2nd edition. Prentice Hall, 1999.
A. Papoulis, S. U. Pillai. Probability, random variables and stochastic processes, 4th edition. McGraw Hill, 2002.
Time Series Analysis
Session II: Regression and Harmonic Analysis
Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
Session outline
1. Goal
2. Linear and non-linear regression
3. Polynomial fitting
4. Cubic spline fitting
5. A short introduction to system analysis
6. Spectral representation of stationary processes
7. Detrending and filtering
8. Non-stationary processes
1. Goal

x[n] = trend[n] + periodic[n] + random[n]
trend and periodic: Session 2; random: Session 3
2. Linear and non linear regression
Linear regression: x[n] = trend[n] + random[n], with trend[n] = β_0 + β_1 n

Year (n)   Price (x[n])
1500       17
1501       19
1502       20
1503       15
1504       13
1505       14
1506       14
1507       14
...

17 = β_0 + 1500 β_1
19 = β_0 + 1501 β_1
20 = β_0 + 1502 β_1
15 = β_0 + 1503 β_1
...

In matrix form x = Xβ, with x = (17, 19, 20, 15, ...)ᵀ and X the matrix with rows (1, n).
2. Linear and non linear regression
Linear regression: x[n] = trend[n] + random[n], i.e. x = Xβ + ε
Let's assume E{ε[n]} = 0 and Γ_ε[n_0] = σ_ε² δ[n_0] (homoscedasticity).
Least Squares Estimate: β̂ = argmin_β ||x − Xβ||² = (XᵀX)⁻¹ Xᵀ x = X⁺ x
Properties:
E{β̂} = β
Cov{β̂} = σ_ε² (XᵀX)⁻¹
σ̂_ε² = (x − Xβ̂)ᵀ (x − Xβ̂) / (N − k)
Degree of fit: R² = 1 − ||x − Xβ̂||² / ||x − x̄·1||²,  adjusted R² = 1 − (1 − R²)(N − 1)/(N − k)
Linear regression with constraints (Rβ = r):
β̂_R = argmin_{β: Rβ = r} ||x − Xβ||² = β̂ + (XᵀX)⁻¹ Rᵀ ( R (XᵀX)⁻¹ Rᵀ )⁻¹ ( r − Rβ̂ )
Example: any set of linear constraints on the coefficients can be written in the form Rβ = r.
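The least squares estimate can be sketched for the two-parameter trend β_0 + β_1 n by solving the normal equations (XᵀX)β = Xᵀx in closed form; the year/price numbers follow the slide's table:

```python
def fit_line(ns, xs):
    # Least squares for trend[n] = b0 + b1*n: the 2x2 normal equations
    # (X^T X) beta = X^T x solved in closed form.
    N = len(ns)
    sn = sum(ns); sx = sum(xs)
    snn = sum(n * n for n in ns); snx = sum(n * x for n, x in zip(ns, xs))
    det = N * snn - sn * sn
    b1 = (N * snx - sn * sx) / det
    b0 = (snn * sx - sn * snx) / det
    return b0, b1

years = [1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507]
price = [17, 19, 20, 15, 13, 14, 14, 14]
b0, b1 = fit_line(years, price)
trend = [b0 + b1 * n for n in years]
residual = [x - t for x, t in zip(price, trend)]
```

The residuals sum to zero by construction (the intercept absorbs the mean), and the negative slope reflects the downward drift of these eight prices.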
2. Linear and non linear regression
Quadratic trend: x[n] = trend[n] + random[n] with trend[n] = β_0 + β_1 n + β_2 n², i.e. x = Xβ with rows (1, n, n²).
Let's assume ε[n] ~ N(0, σ_ε²) with Γ_ε[n_0] = σ_ε² δ[n_0].
First test (are all coefficients zero?): H_0: β = 0, H_1: β ≠ 0.
F = ( β̂ᵀ XᵀX β̂ / k ) / σ̂_ε² follows F(k, N − k) if H_0 is true; if F > F_P, reject H_0.
Second test (is a subset β_2 of the coefficients zero?): H_0: β_2 = 0, H_1: β_2 ≠ 0, partitioning X = (X_1 X_2) and defining P_1 = X_1 (X_1ᵀ X_1)⁻¹ X_1ᵀ.
F = ( β̂_2ᵀ X_2ᵀ (I − P_1) X_2 β̂_2 / k_2 ) / σ̂_ε² follows F(k_2, N − k) if H_0 is true.
Individual coefficients: H_0: β_i = β_i⁰, H_1: β_i ≠ β_i⁰.
t = ( β̂_i − β_i⁰ ) / ( σ̂_ε √e_ii ), with e_ii = [ (XᵀX)⁻¹ ]_ii, follows t(N − k) if H_0 is true.
2. Linear and non linear regression
Confidence intervals for the coefficients
Example: Y = [40, 45] + [0.05, 0.45] X — we got a certain regression line, but the true regression line lies within this region with 95% confidence.
Unbiased variance of the j-th regression coefficient: σ̂²_{β_j} = σ̂_ε² [ (XᵀX)⁻¹ ]_jj
Confidence interval for the j-th regression coefficient: β_j ∈ β̂_j ± t_{1−α/2, N−k} σ̂_{β_j}
2. Linear and non linear regression
Durbin-Watson test: tests whether the residuals are truly uncorrelated.
H_0: Γ_ε[n_0] = 0
H_1: Γ_ε[1] > 0
d = Σ_{n=2}^{N} ( ε[n] − ε[n−1] )² / Σ_{n=1}^{N} ε²[n]
Reject H_0 if d < d_L; do not reject H_0 if d > d_U (inconclusive in between).
Other tests: Durbin's h, Wallis' D_4, Von Neumann's ratio, Breusch-Godfrey.
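The d statistic above can be sketched directly; the white and AR(1)-correlated residual series below are assumptions used to show the two regimes (d near 2 for uncorrelated residuals, d pushed toward 0 by positive correlation):

```python
import random

def durbin_watson(res):
    # d = sum_{n=2}^{N} (e[n] - e[n-1])^2 / sum_{n=1}^{N} e[n]^2
    num = sum((res[n] - res[n - 1]) ** 2 for n in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    return num / den

random.seed(1)
white = [random.gauss(0.0, 1.0) for _ in range(5000)]
d_white = durbin_watson(white)          # near 2 for uncorrelated residuals

ar = [0.0]                              # strongly positively correlated residuals
for _ in range(4999):
    ar.append(0.9 * ar[-1] + random.gauss(0.0, 1.0))
d_ar = durbin_watson(ar)                # well below 2
```

For residuals with lag-1 correlation ρ, d ≈ 2(1 − ρ), which is why values far below 2 signal positive autocorrelation.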
2. Linear and non linear regression
Cochrane-Orcutt method of regression with correlated residuals:
1. Estimate a first model: ŷ^(0)[n] = β̂^(0)ᵀ x[n]
2. Estimate the (correlated) residuals: ε^(0)[n] = y[n] − ŷ^(0)[n]
3. Estimate the correlation ρ̂^(0) of the residuals
4. i = 1
5. Estimate uncorrelated residuals: ε^(i)[n] = ε^(i−1)[n] − ρ̂^(i−1) ε^(i−1)[n−1]
6. Estimate the uncorrelated output: y^(i)[n] = y[n] − ρ̂^(i−1) y[n−1]
7. Estimate the uncorrelated input: x^(i)[n] = x[n] − ρ̂^(i−1) x[n−1]
8. Re-estimate the model: ŷ^(i)[n] = β̂^(i)ᵀ x^(i)[n]
9. Estimate the residuals: ε^(i)[n] = y^(i)[n] − ŷ^(i)[n]
10. i = i + 1; repeat steps 5-9 until convergence of ρ̂^(i)
2. Linear and non linear regression
Assumptions of regression
The sample is representative of the population.
The dependent variable is noisy, but the predictors are not. Solution: Total Least Squares.
Predictors are linearly independent (i.e., no predictor can be expressed as a linear combination of the rest), although they can be correlated. Otherwise, this is called multicollinearity. Solutions: add more samples, remove the dependent predictor, PCA.
The errors are homoscedastic. Solution: Weighted Least Squares.
The errors are uncorrelated with the predictors and with themselves. Solution: Generalized Least Squares.
The errors follow a normal distribution. Solution: Generalized Linear Models.
2. Linear and non linear regression
More linear regression:
x[n] = β_0 + β_1 n + w[n]
x[n] = β_0 + β_1 n + β_2 x[n−1] + w[n]
x[n] = β_0 + β_1 n + β_2 ( x[n−1] + x[n−2] ) + w[n]
x[n] = β_0 + β_1 n + β_2 x[n−1] + β_3 x[n−2] + w[n]
x[n] = β_0 + β_1 n + α_0 M_0[n] + α_1 M_1[n] + ... + α_6 M_6[n] + w[n]
with seasonal dummies of period 7 years: M_0[n] = 1 if n = 7k and 0 otherwise, ..., M_6[n] = 1 if n = 7k + 6 and 0 otherwise.
Non linear regression:
x[n] = β_0 + β_1 n + β_2 sin( α_0 + α_1 n ) + w[n]
x[n] = β_0 n^{β_1} w[n]  →  log x[n] = log β_0 + β_1 log n + log w[n]  →  x'[n] = β'_0 + β'_1 n' + w'[n]
3. Polynomial fitting
Polynomial trends: x[n] = trend[n] + random[n] with trend[n] = β_0 + β_1 n + β_2 n², i.e. x = Xβ with rows (1, n, n²).
Orthogonal polynomials: rewrite trend[n] = β_0 + β_1 n + ... + β_q n^q as
trend[n] = α_0 φ_0(n) + α_1 φ_1(n) + ... + α_q φ_q(n)
where the φ_i satisfy ∫ φ_i(t) φ_j(t) dt = 0 for i ≠ j.
In matrix form: x = Xβ = (XR⁻¹)(Rβ) = Φα, with Φ = XR⁻¹ and α = Rβ.
Polynomial trends — orthogonal (Legendre) polynomials on [−1, 1]: ∫_{−1}^{1} φ_i(t) φ_j(t) dt = 0 for i ≠ j
φ_0(t) = 1
φ_1(t) = t
φ_2(t) = (3/2) t² − 1/2
φ_3(t) = (5/2) t³ − (3/2) t
Recurrence: φ_{i+1}(t) = ( (2i + 1)/(i + 1) ) t φ_i(t) − ( i/(i + 1) ) φ_{i−1}(t)
Grafted polynomials: S(t) is a cubic spline on an interval if the interval can be decomposed into subintervals such that S(t) is a polynomial of degree at most 3 on each subinterval, and the first and second derivatives of S(t) are continuous. With knots t_1, t_2, t_3, ..., in the truncated power basis:
S(t) = Σ_j d_j ( t − t_j )₊³
4. Cubic spline fitting
Cubic B-splines:
B³(t) = 2/3 − t² + |t|³/2   for |t| ≤ 1
B³(t) = (2 − |t|)³ / 6      for 1 < |t| ≤ 2
B³(t) = 0                   otherwise
S(t) = Σ_j d_j B³( t − t_j )

[Figure: the cubic B-spline basis function]

The B-spline coefficients d_j are related to the truncated power coefficients by a linear system obtained from the conditions that B³(t) and its derivatives up to order 3 vanish outside the support, together with the normalization ∫ B³(t) dt = 1.
4. Cubic spline fitting
S(t) = Σ_j d_j B³( t − t_j ). With knots on a regular grid t_j = j T_F and samples at t = nT:
S[n] = Σ_j d_j B³( nT/T_F − j ),  i.e.  S = Bd in matrix form.

[Figure: a set of shifted cubic B-splines covering the fitting interval]
4. Cubic spline fitting
Overfitting and forecasting

[Figure: a flexible fit that follows the data closely inside the observation interval may extrapolate poorly outside it]

4. Polynomial fitting with discontinuities

[Figure: piecewise fits with discontinuities at the breakpoints]
5. A short introduction to system analysis
A system T transforms an input series x[n] into an output y[n] = T(x).
Examples:
y[n] = 3 x[n]   (amplifier)
y[n] = x[n − 3]   (delay)
y[n] = x²[n]   (instant power)
y[n] = ( x[n+1] + x[n] + x[n−1] ) / 3   (smoother)
Box-Cox transformation (a family of systems):
y[n] = T(x) = (x[n]^λ − 1)/λ if λ > 0;  log(x[n]) if λ = 0
Season estimation:
m_est[n] = (1/12) [ (1/2)x[n−6] + x[n−5] + ... + x[n+5] + (1/2)x[n+6] ]
S_est[n] = (1/8)x[n−2] + (1/4)x[n−1] + (1/2)x[n] + (1/4)x[n+1] + (1/8)x[n+2] − m_est[n]

[Figure: x[n] and the delayed x[n−3]]
Basic system properties:
Systems with memory: y[n] = T( x[n], x[n−1], x[n−2], ... )   (present and past)
Memoryless systems: y[n] = T( x[n] )
Invertible systems: y[n] = T(x[n]) for which there exists T⁻¹ with x[n] = T⁻¹(y[n]) = T⁻¹(T(x[n]))
Causal systems: y[n] = T( x[n], x[n−1], x[n−2], ... )
Anticausal systems: y[n] = T( x[n], x[n+1], x[n+2], ... )   (present and future)
Stable systems: for all x with |x[n]| < B_x ∀n, there exists B_T such that |T(x)[n]| < B_T ∀n
Time-invariant systems: y[n] = T(x[n]) ⇒ y[n − n_0] = T(x[n − n_0])
Linear systems: T( a x_1[n] + b x_2[n] ) = a T(x_1[n]) + b T(x_2[n])
LTI: linear and time-invariant
5. A short introduction to system analysis
LTI systems

[Figure: for an LTI system, x_2[n] = x_1[n−3] produces y_2[n] = y_1[n−3]; x_3[n] = 2x_1[n] produces y_3[n] = 2y_1[n]; x_4[n] = x_1[n] + x_2[n] produces y_4[n] = y_1[n] + y_2[n]]
5. A short introduction to system analysis
Examples:
y[n] = 3 x[n]: LTI, memoryless, invertible, causal, stable
y[n] = x[n − 10]: LTI, with memory, invertible, causal, stable
y[n] = x²[n]: non-linear, time-invariant, memoryless, non-invertible, causal, stable
y[n] = ( x[n+1] + x[n] + x[n−1] ) / 3: LTI, with memory, invertible, non-causal, stable
Box-Cox y[n] = T(x): non-linear, memoryless, invertible, causal, unstable
Season estimators m_est[n] and S_est[n]: LTI, with memory, invertible, non-causal, stable
5. A short introduction to system analysis
Impulse response of an LTI system: the response h[n] = T(δ[n]) to the unit impulse δ[n].
For any input: y[n] = x[n] * h[n] = Σ_k x[k] h[n − k]   (convolution)
FIR: Finite Impulse Response. IIR: Infinite Impulse Response.
Example: the smoother y[n] = ( x[n+1] + x[n] + x[n−1] ) / 3 has
h[n] = ( δ[n+1] + δ[n] + δ[n−1] ) / 3

[Figure: δ[n] and the impulse responses of the smoother and of the season estimator]
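The convolution sum can be sketched in a few lines (the smoother's taps are written causally here purely for list indexing, an assumption of the sketch):

```python
def convolve(x, h):
    # y[n] = sum_k x[k] h[n-k]; output length len(x)+len(h)-1 (full convolution).
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y

# Taps of the 3-point smoother (shifted to start at index 0 for list indexing).
h = [1 / 3, 1 / 3, 1 / 3]
delta = [0.0] * 9
delta[4] = 1.0
resp = convolve(delta, h)   # the impulse comes out as a copy of h, shifted by 4
```

Feeding in an impulse returns the filter's own taps, which is exactly the defining property of the impulse response.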
5. A short introduction to system analysis
Difference equation: Σ_{k=0}^{N} a_k y[n − k] = Σ_{k=0}^{M} b_k x[n − k]
Taking the Z-transform (x[n − n_0] ↔ z^{−n_0} X(z)):
Σ_{k=0}^{N} a_k Y(z) z^{−k} = Σ_{k=0}^{M} b_k X(z) z^{−k}
Transfer function: H(z) = Y(z)/X(z) = ( Σ_{k=0}^{M} b_k z^{−k} ) / ( Σ_{k=0}^{N} a_k z^{−k} ),  z ∈ C
Impulse response of an LTI system: Y(z) = H(z) X(z)
Example: y[n] − y[n−1] = x[n] + 0.5 x[n−1]
Y(z) − Y(z) z^{−1} = X(z) + 0.5 X(z) z^{−1}
H(z) = Y(z)/X(z) = ( 1 + 0.5 z^{−1} ) / ( 1 − z^{−1} )
6. Spectral representation of stationary processes
[Figure: a yearly price series (1820-1870), its linear trend, and the detrended series]

x[n] = trend[n] + seasonal[n] + random[n]
trend[n] = −2777 + 1.6 n
y[n] = x[n] − trend[n] = seasonal[n] + random[n]
     = 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π ) + random[n]
with periods of 8 years and 12 years.
6. Spectral representation of stationary processes
Harmonic components: x[n] = A cos( ω n + φ ) + random[n]
A: amplitude;  ω: frequency (rad/sample);  φ: phase (rad)
In Hertz: x[n] = A cos( 2π f T_s n + φ ),  with f in Hz
A seasonal component of period N samples, seasonal[n] = seasonal[n + N], requires ω = 2π f T_s = 2π k / N,  k = 1, 2, 3, ...
Example: seasonal[n] = 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π )
with periods of 8 and 12 years; overall period = lcm(N_1, N_2) = 24 years.
6. Spectral representation of stationary processes
Harmonic components: x[n] = A cos( ω n + φ )

[Figure: x[n] = cos( (2π/15) n ) and x[n] = cos( (28π/15) n ) yield exactly the same samples]

cos( (28π/15) n ) = cos( (28π/15 − 2π) n ) = cos( −(2π/15) n ) = cos( (2π/15) n )
Discrete frequencies are only defined modulo 2π.
6. Spectral representation of stationary processes
Harmonic representation:
x[n] = 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π )
Finite sum of harmonics: x[n] = Σ_{k=1}^{K} A_k cos( ω_k n + φ_k )
Continuum of harmonics: x[n] = ∫_0^π A(ω) cos( ω n + φ(ω) ) dω
Fourier transform pair:
x[n] = (1/2π) ∫_{−π}^{π} X(ω) e^{jωn} dω,   X(ω) = Σ_n x[n] e^{−jωn}
x[n] = FT⁻¹{ X(ω) },   X(ω) = FT{ x[n] }
Since A cos( ωn + φ ) = (A/2)( e^{jφ} e^{jωn} + e^{−jφ} e^{−jωn} ), each harmonic contributes
X(ω) = π A e^{jφ} δ(ω − ω_0) + π A e^{−jφ} δ(ω + ω_0), and X(ω) = X*(−ω) for real signals.
Example:
Y(ω) = FT{ 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π ) }
     = 26.5 π e^{j2π/3} δ(ω − 2π/8) + 26.5 π e^{−j2π/3} δ(ω + 2π/8) + 22.5 π e^{jπ} δ(ω − 2π/12) + 22.5 π e^{−jπ} δ(ω + 2π/12)

[Figure: |Y(ω)| showing the average at ω = 0, low-speed variation near it, and medium- and high-speed components further out]
6. Spectral representation of stationary processes
Power spectral density (PSD):
Deterministic signals: S_X(ω) = |X(ω)|²
Stochastic processes: S_X(ω) = FT{ Γ_X[n_0] }
Example (white process): Γ_W[n_0] = σ_W² δ[n_0]  →  S_W(ω) = σ_W²
Example: y[n] = 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π ) + w[n]
S_Y(ω) = (26.5π)² δ(ω − 2π/8) + (26.5π)² δ(ω + 2π/8) + (22.5π)² δ(ω − 2π/12) + (22.5π)² δ(ω + 2π/12)

[Figure: the detrended price series and its periodogram S_Y(ω)]
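A periodogram of the deterministic kind, S_X(ω) = |X(ω)|²/N, can be sketched by evaluating the DFT sum directly (the test harmonic and frequency grid are assumptions; for long series an FFT would replace the naive sum):

```python
import cmath, math

def periodogram(x, omegas):
    # S_X(omega) = |X(omega)|^2 / N with X(omega) = sum_n x[n] e^{-j omega n}.
    N = len(x)
    out = []
    for w in omegas:
        X = sum(x[n] * cmath.exp(-1j * w * n) for n in range(N))
        out.append(abs(X) ** 2 / N)
    return out

# A period-8 harmonic: the periodogram should peak at omega = 2*pi/8.
N = 256
x = [math.cos(2 * math.pi * n / 8) for n in range(N)]
omegas = [2 * math.pi * k / N for k in range(N // 2)]
S = periodogram(x, omegas)
peak = omegas[S.index(max(S))]   # 2*pi/8
```

Because the record length is a multiple of the period, the harmonic falls exactly on a frequency bin and all the power concentrates in a single peak.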
7. Detrending and filtering
Filtering w[n] with a filter H(z) gives x[n] with S_X(ω) = |H(ω)|² S_W(ω).
Highpass filtering: detrending
Bandpass filtering: detrending + denoising; seasonal estimation
Lowpass filtering: denoising; trend estimation
Identity: leaves the series unchanged

[Figure: |Y(ω)| split into low-, medium- and high-speed regions; x[n] and y[n] = x[n] − trend[n] together with their spectra S_X(ω) and S_Y(ω)]
7. Detrending and filtering
7. Detrending and filtering
Example (smoother):
y[n] = ( x[n−1] + x[n] + x[n+1] ) / 3
H(z) = (1/3)( z + 1 + z⁻¹ ),   H(ω) = H(z)|_{z = e^{jω}} = (1/3)( e^{jω} + 1 + e^{−jω} )
The complementary detrender:
y[n] = x[n] − ( x[n−1] + x[n] + x[n+1] ) / 3
H(z) = 1 − (1/3)( z + 1 + z⁻¹ ),   H(ω) = 1 − (1/3)( e^{jω} + 1 + e^{−jω} )

[Figure: |H(ω)|² and angle(H(ω)) for both filters]
7. Detrending and filtering
Example (smoothers):
H_1(z) = (1/3)( z + 1 + z⁻¹ )  →  y[n] = ( x[n−1] + x[n] + x[n+1] ) / 3
H_2(z) = (1/3)( 1 + z⁻¹ + z⁻² )  →  y[n] = ( x[n] + x[n−1] + x[n−2] ) / 3
H_3(z) = 1/2 + (1/4) z⁻¹ + (1/4) z⁻²  →  y[n] = (1/2) x[n] + (1/4) x[n−1] + (1/4) x[n−2]

[Figure: |H_i(ω)|² and angle(H_i(ω)) for the three smoothers]
7. Detrending and filtering
Example (yearly season estimation): with
m_est[n] = (1/12) [ (1/2)x[n−6] + x[n−5] + ... + x[n+5] + (1/2)x[n+6] ]
S_est[n] = (1/8)x[n−2] + (1/4)x[n−1] + (1/2)x[n] + (1/4)x[n+1] + (1/8)x[n+2] − m_est[n]
the trend estimator has transfer function
H_1(z) = M_est(z)/X(z) = (1/12)( (1/2)z⁶ + z⁵ + z⁴ + z³ + z² + z + 1 + z⁻¹ + z⁻² + z⁻³ + z⁻⁴ + z⁻⁵ + (1/2)z⁻⁶ )
the local estimator is
H_2(z) = Y'(z)/X(z) = (1/8)z² + (1/4)z + 1/2 + (1/4)z⁻¹ + (1/8)z⁻²
and the season estimator is H(z) = H_2(z) − H_1(z).

[Figure: |H_1(ω)|², |H_2(ω)|² and |H(ω)|²]
7. Detrending and filtering
-3 -2 -1 0 1 2 3
0
0.5
1
e
|H
1
(
e
)|
2
-3 -2 -1 0 1 2 3
0
0.5
1
e
|H
2
(
e
)|
2
6]) x[n 6] - 0.0611(x[n - 7]) x[n 7] - 0.0313(x[n - 8]) x[n 8] - n -0.0101(x[ ] [
1
+ + + + + + = n y
2]) x[n 2] - 0.0934(x[n 4]) x[n 4] - 0.0634(x[n - 5]) x[n 5] - 0.0802(x[n - + + + + + + +
0.2108x[n] 1]) x[n 1] - 0.1772(x[n + + + +
4] - 0.00056x[n 6]) - x[n 1] - n 0.00037(x[ - 8]) - x[n 93(x[n] 0000 . 0 ] [
2
+ + + = n y
8] - 58y[n . 0 7] - 4.3y[n - 6] - 14.6y[n 5] - 29.7y[n - 4] - 39y[n 3] - 34y[n - 2] - 19.3y[n 1] - 6.5y[n - + + + +
f c=2*pi / 12;
ef =2*pi / 60;
b1=f i r 1( 18, [ f c- ef f c+ef ] / pi ) ;
f c=2*pi / 12;
ef =2*pi / 60;
[ b1, a1] =but t er ( 4, [ f c- ef f c+ef ] / pi ) ;
34
7. Detrending and filtering

Example: Holt's linear exponential smoothing
1) Level:  S[n] = a x[n] + (1-a)( S[n-1] + y[n-1] )
2) Slope:  y[n] = b( S[n] - S[n-1] ) + (1-b) y[n-1]
3) N-step-ahead forecast:  S_N[n] = S[n-1] + N y[n-1]
In the z domain:
S(z) = a X(z) + (1-a)( S(z) + Y(z) ) z^-1
Y(z) = b( S(z) - S(z) z^-1 ) + (1-b) Y(z) z^-1
S_N(z) = S(z) z^-1 + N Y(z) z^-1
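Holt's recursions above can be sketched directly in Python. This is a minimal implementation; the initialisation (level = first sample, slope = first difference) is our choice, not the slide's:

```python
import numpy as np

def holt(x, a=0.5, b=0.3):
    """Holt's linear exponential smoothing: level S and slope updates."""
    S, slope = x[0], x[1] - x[0]          # simple initialisation (assumption)
    for xn in x[1:]:
        S_prev = S
        S = a * xn + (1 - a) * (S + slope)            # level update
        slope = b * (S - S_prev) + (1 - b) * slope    # slope update
    return S, slope

def forecast(S, slope, N):
    # N-step-ahead linear extrapolation of level + trend
    return S + N * slope

x = np.arange(10.0)                # a perfect linear trend x[n] = n
S, slope = holt(x)
print(forecast(S, slope, 3))       # extrapolates the trend: 12.0
```

On a noiseless linear trend the recursion locks onto the true slope, so the 3-step forecast after observing 0..9 is exactly 12.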
8. Non-stationary processes
35
Short-time Fourier Transform:
X[m, ω) = Σ_n x[n+m] w[n] e^{-jωn}
8. Non-stationary processes
36-38
Wavelet transform (figures).
8. Non-stationary processes
39
Empirical Mode Decomposition (figure).
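The STFT definition above can be sketched with NumPy alone. This illustration (window length, hop and the Hann window are our choices, not the slide's) shows how the dominant frequency of a non-stationary signal is tracked frame by frame:

```python
import numpy as np

def stft(x, win_len=64, hop=16):
    # X[m, k] = FFT over n of x[n+m] * w[n], one row per window position m
    w = np.hanning(win_len)
    frames = [x[m:m + win_len] * w for m in range(0, len(x) - win_len + 1, hop)]
    return np.array([np.fft.fft(f) for f in frames])

n = np.arange(512)
# frequency jumps from 0.05 to 0.25 cycles/sample at n = 256: non-stationary
x = np.where(n < 256, np.sin(2 * np.pi * 0.05 * n), np.sin(2 * np.pi * 0.25 * n))
X = stft(x)
peaks = np.abs(X[:, :32]).argmax(axis=1)   # dominant positive-frequency bin per frame
print(peaks[0], peaks[-1])                 # low bin first, high bin at the end
```

A plain Fourier transform of the whole record would show both frequencies at once; the STFT localises when each one is active, which is the point of the slide.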
40
Session outline
1. Goal
2. Linear and non-linear regression
3. Polynomial fitting
4. Cubic spline fitting
5. A short introduction to system analysis
6. Spectral representation of stationary processes
7. Detrending and filtering
8. Non-stationary processes
41
Bibliography
C. Chatfield. The analysis of time series: an introduction. Chapman &
Hall, CRC, 1996.
D.S.G. Pollock. A handbook of time-series analysis, signal processing and dynamics. Academic Press, 1999.
J. D. Hamilton. Time series analysis. Princeton Univ. Press, 1994.
1
Time Series Analysis
Session III: Probability models for time series
Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
2
2
Session outline
1. Goal
2. A short introduction to system analysis
3. Moving Average processes (MA)
4. Autoregressive processes (AR)
5. Autoregressive, Moving Average (ARMA)
6. Autoregressive, Integrated, Moving Average (ARIMA, FARIMA)
7. Seasonal, Autoregressive, Integrated, Moving Average (SARIMA)
8. Known external inputs: System identification
9. A family of models
10. Nonlinear models
11. Parameter estimation
12. Order selection
13. Model checking
14. Self-similarity, Fractal dimension, and Chaos theory
3
1. Goal
x[n] = trend[n] + periodic[n] + random[n]
Explained by statistical models (AR, MA, ...)
(Figure: realisations of the three components.)
random[n] = 0.9 random[n-1] + completelyRandom[n]                      (autoregressive example)
random[n] = completelyRandom[n] + 0.9 completelyRandom[n-1]            (moving-average example)
4
1. Goal
(Figure.)
5
2. A short introduction to system analysis
Difference equation:
Σ_{k=0..N} a_k y[n-k] = Σ_{k=0..M} b_k x[n-k]
Example: y[n] - y[n-1] = x[n] - 0.5 x[n-1]
Taking z-transforms (recall the shift property x[n+n0] <-> z^{n0} X(z)):
Σ_{k=0..N} a_k Y(z) z^-k = Σ_{k=0..M} b_k X(z) z^-k
Transfer function:
H(z) = Y(z)/X(z) = ( Σ_{k=0..M} b_k z^-k ) / ( Σ_{k=0..N} a_k z^-k )
Example: Y(z) - z^-1 Y(z) = X(z) - 0.5 z^-1 X(z)  ->  H(z) = (1 - 0.5 z^-1)/(1 - z^-1)
A system T maps the input sequence x[n] to the output y[n] = T(x).
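The difference equation above can be simulated directly. This is a small sketch (not from the slides) that computes y[n] recursively for arbitrary coefficient lists a and b, and checks the slide's example on a unit impulse:

```python
def difference_eq(x, a, b):
    """Solve sum_k a[k] y[n-k] = sum_k b[k] x[n-k] for y[n], assuming a[0] = 1:
    y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
        acc -= sum(ak * y[n - k] for k, ak in enumerate(a) if k >= 1 and n - k >= 0)
        y.append(acc)
    return y

# the slide's example: y[n] - y[n-1] = x[n] - 0.5 x[n-1]
x = [1.0, 0.0, 0.0, 0.0]                       # unit impulse
y = difference_eq(x, a=[1.0, -1.0], b=[1.0, -0.5])
print(y)   # impulse response of H(z) = (1 - 0.5 z^-1)/(1 - z^-1)
```

The impulse response 1, 0.5, 0.5, ... matches the series expansion of H(z) = (1 - 0.5 z^-1)/(1 - z^-1) = 1 + 0.5 z^-1 + 0.5 z^-2 + ...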
6
2. A short introduction to system analysis
Poles/Zeros
z0 is a pole of H(z) iff H(z0) = ∞.
z0 is a zero of H(z) iff H(z0) = 0.
Stability of LTI systems
A causal system is stable iff all its poles are inside the unit circle (|z| < 1).
Example: y[n] = ( x[n+1] + x[n] + x[n-1] ) / 3,  H(z) = (1/3) z + 1/3 + (1/3) z^-1
Poles: z = 0, ∞.  Zeros: z = -1/2 ± j √3/2, on the unit circle |z| = 1.
Invertibility of LTI systems
The transfer function of the inverse system of an LTI system whose transfer function is H(z) is H^-1(z). Therefore, the zeros of one system are the poles of its inverse, and vice versa.
7
2. A short introduction to system analysis
Downsampling: x_d[n] = x[nM]
Upsampling: x_e[n] = x[n/L] for n = 0, ±L, ±2L, ...; 0 otherwise  =  Σ_k x[k] δ[n-kL]
8
3. Moving average processes: MA(q)
Definition
x[n] = b_0 w[n] + b_1 w[n-1] + ... + b_q w[n-q]
H(z) = b_0 + b_1 z^-1 + b_2 z^-2 + ... + b_q z^-q = B(z)
LTI, with memory, invertible, causal, stable.
(Figure: white noise w[n], the outputs x_1[n] and x_20[n] of an MA(1) and an MA(20), and their ACFs: the longer the filter, the smoother the output and the wider the ACF.)
Γ_X[n0] = TF^-1{ |H(ω)|² S_W(ω) }
9
3. Moving average processes: MA(q)
Statistical properties
w[n] ~ N(0, σ_W²),  Γ_W[n0] = σ_W² δ[n0]
x[n] = Σ_{k=0..q} b_k w[n-k]  ->  x[n] ~ N(0, σ_W² Σ_{k=0..q} b_k²)
Γ_X[n0] = σ_W² Σ_{k=0}^{q-n0} b_k b_{k+n0}   for 0 ≤ n0 ≤ q
Γ_X[n0] = 0                                  for n0 > q
It has limited support!!
Proof
Γ_X[n0] = E{ x[n] x[n+n0] } = E{ ( Σ_{k=0..q} b_k w[n-k] ) ( Σ_{k'=0..q} b_{k'} w[n+n0-k'] ) }
        = Σ_{k=0..q} Σ_{k'=0..q} b_k b_{k'} E{ w[n-k] w[n+n0-k'] }
        = Σ_{k=0..q} Σ_{k'=0..q} b_k b_{k'} σ_W² δ[n0 - (k'-k)]
10
3. Moving average processes: MA(q)
Proof (cont'd.)
δ[n0 - (k'-k)] = 1 if k' - k = n0, and 0 otherwise, so in the (k, k') grid only the diagonal k' = k + n0 contributes:
Γ_X[n0] = σ_W² Σ_{k=0}^{q-n0} b_k b_{k+n0}  for 0 ≤ n0 ≤ q;  Γ_X[n0] = 0 for n0 > q.
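The limited support of the MA(q) autocovariance can be checked numerically. This short sketch (our own illustration, with an arbitrary coefficient choice) evaluates the formula Γ_X[n0] = σ_W² Σ b_k b_{k+n0} and confirms that it vanishes beyond lag q:

```python
import numpy as np

def ma_autocov(b, sigma_w2=1.0):
    """Theoretical autocovariance of an MA(q) process with coefficients b."""
    b = np.asarray(b, float)
    q = len(b) - 1
    # Gamma_X[n0] = sigma_W^2 * sum_k b_k b_{k+n0}; empty overlap gives 0
    return np.array([sigma_w2 * np.dot(b[:len(b) - n0], b[n0:])
                     for n0 in range(q + 2)])

g = ma_autocov([1.0, 0.5, 0.25])   # MA(2) with b = (1, 0.5, 0.25)
print(g)                           # nonzero up to lag 2, zero at lag 3
```

The last entry (lag q+1 = 3) is exactly zero, which is the "limited support" property that distinguishes MA from AR processes.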
11
3. Moving average processes: MA(q)
(Brute force) determination of the MA parameters
Given the estimated autocovariances, solve the nonlinear system
Γ_X[n0] = σ_W² Σ_{k=0}^{q-n0} b_k b_{k+n0} for 0 ≤ n0 ≤ q;  Γ_X[n0] = 0 for n0 > q.
Example (q = 2):
r_X[0] = σ_W² ( b_0² + b_1² + b_2² )
r_X[1] = σ_W² ( b_0 b_1 + b_1 b_2 )
r_X[2] = σ_W² b_0 b_2
12
3. Moving average processes: MA(q)
Invertibility
MA(q): x[n] = Σ_{k=0..q} b_k w[n-k],  H(z) = Σ_{k=0..q} b_k z^-k
Inverse MA^-1(q): H_inv(z) = 1/H(z) = 1 / ( Σ_{k=0..q} b_k z^-k )
w[n] = (1/b_0) ( x[n] - Σ_{k=1..q} b_k w[n-k] )
Example: y[n] = ( x[n] + x[n-1] + x[n-2] ) / 3 does not have a stable, causal inverse (its zeros lie on the unit circle |z| = 1).
y[n] = x[n] - 0.9 x[n-1] has a stable, causal inverse.
H(z) acts as a colouring filter; its inverse acts as a whitening filter.
Applications: channel equalization, channel estimation, compression, forecasting.
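The whitening recursion w[n] = (1/b_0)( x[n] - Σ b_k w[n-k] ) can be verified on a toy MA(1). This sketch (our own example, with b_1 = -0.9 as in the slide's invertible case) colours a known innovation sequence and then recovers it exactly:

```python
def invert_ma1(x, b1, b0=1.0):
    """Recover w[n] from x[n] = b0 w[n] + b1 w[n-1]
    via w[n] = (x[n] - b1 w[n-1]) / b0 (stable only if |b1/b0| < 1)."""
    w, w_prev = [], 0.0                 # w[-1] = 0 initialisation
    for xn in x:
        wn = (xn - b1 * w_prev) / b0
        w.append(wn)
        w_prev = wn
    return w

w_true = [1.0, -2.0, 0.5, 3.0]
# colouring filter: x[n] = w[n] - 0.9 w[n-1]
x = [w_true[0]] + [w_true[n] - 0.9 * w_true[n - 1] for n in range(1, 4)]
w_rec = invert_ma1(x, b1=-0.9)
print(w_rec)    # recovers w_true
```

Had the zero been on the unit circle (e.g. the 3-point average), the same recursion would not converge, which is why invertibility matters for whitening.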
13
3. Moving average processes: generalizations
1) Model not restricted to be causal:
x[n] = b_{-qF} w[n+qF] + ... + b_{-1} w[n+1] + b_0 w[n] + b_1 w[n-1] + ... + b_q w[n-q]
(anticausal component + causal component)
2) Model not restricted to be linear (Volterra kernels):
x[n] = Σ_k b_k w[n-k] + Σ_k Σ_{k'} b_{k,k'} w[n-k] w[n-k'] + Σ_k Σ_{k'} Σ_{k''} b_{k,k',k''} w[n-k] w[n-k'] w[n-k''] + ...
(linear + quadratic + cubic components)
or, with a static nonlinearity f:
x[n] = Σ_{k=0..q} b_k f( w[n-k] )
14
4. Autoregressive processes: AR(p)
Definition
x[n] = w[n] + a_1 x[n-1] + ... + a_p x[n-p]
H(z) = 1 / ( 1 - a_1 z^-1 - a_2 z^-2 - ... - a_p z^-p ) = 1/A(z)
LTI, with memory, invertible, causal, stable.
(Figure: realisations and a slowly decaying ACF(n0).)
15
4. Autoregressive processes: AR(p)
Relationship to MA processes (Laurent series):
H(z) = 1 / ( 1 - a_1 z^-1 - ... - a_p z^-p ) = 1 + Σ_{k=1..∞} b_k z^-k
i.e., an AR(p) process is an MA(∞) process.
Statistical properties
w[n] ~ N(0, σ_W²)  ->  x[n] ~ N(0, Γ_X[0])
Yule-Walker equations:
Γ_X[n0] = σ_W² δ[n0] + Σ_{k=1..p} a_k Γ_X[n0-k]
whose solution is
Γ_X[n0] = Σ_{k=1..p} A_k z_k^{n0},   where the z_k are the poles of H(z).
Poles of ) (z H
16
16
4. Autoregressive processes: AR(p)
Determination of the constants

=
= I
p
k
n
k k X
z A n
1
0
0
] [
Example:
] 2 [ ] 1 [ ] [ ] [
2 1
+ + = n x a n x a n w n x
2
2
1
1
1
1
) (


=
z a z a
z H
Poles:
k
A

=
=
p
k
n
k k X
z A n r
1
'
0
0
] [
2
0 0 0
1
[ ] [ ] [ ]
p
X W k X
k
n n a n k o o
=
I = + I

=
=
p
k
X k X
k n r a n r
1
0 0
] [ ] [ 0
0
> n
2
4
,
2
2
1 1
2 1
a a a
z z
+
= 1 , 1 , 1 1
2 1 2 1 2
> < + > < a a a a a z
i
0 0
2
'
2 1
'
1 0
] [
n n
X
z A z A n r + =
0 4
2
2
1
> + e a a R z
i
1 ] 0 [
'
2
'
1
= + = A A r
X
] 1 [ ] 0 [ ] 1 [
2 1 2
'
1 1
'
1
+ = + =
X X X
r a r a z A z A r
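The Yule-Walker recursion for the AR(2) example can be turned into a few lines of code. This sketch (our own, with arbitrary stationary coefficients) builds the normalised ACF from r[0] = 1 and r[1] = a_1/(1 - a_2), then propagates r[n0] = a_1 r[n0-1] + a_2 r[n0-2]:

```python
import numpy as np

def ar2_acf(a1, a2, nlags=5):
    """Normalised ACF of a stationary AR(2) from the Yule-Walker recursion."""
    r = [1.0, a1 / (1 - a2)]            # r[0] and r[1] from the slide
    for n in range(2, nlags + 1):
        r.append(a1 * r[n - 1] + a2 * r[n - 2])
    return np.array(r)

r = ar2_acf(0.5, 0.2)
print(r[:3])   # 1, a1/(1-a2) = 0.625, then the recursion takes over
```

Unlike the MA case, the ACF never cuts off: it decays geometrically with the poles z_k, which is the fingerprint used later for order selection.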
17
5. Autoregressive, Moving average: ARMA(p,q)
Definition
x[n] = Σ_{k=0..q} b_k w[n-k] + Σ_{k=1..p} a_k x[n-k]
H(z) = H_MA(q)(z) H_AR(p)(z) = B(z)/A(z) = ( b_0 + b_1 z^-1 + b_2 z^-2 + ... + b_q z^-q ) / ( 1 - a_1 z^-1 - a_2 z^-2 - ... - a_p z^-p )
Statistical properties
w[n] ~ N(0, σ_W²)  ->  x[n] ~ N(0, Γ_X[0])
Γ_X[n0] = σ_W² Σ_{k=0..q} b_k h[k-n0] + Σ_{k=1..p} a_k Γ_X[n0-k]
18
6. Autoregressive, Integrated, Moving Average: ARIMA(p,d,q)
Definition: w[n] -> ARMA(p,q) -> x_d[n] -> Int(d) -> x[n]
∇^d x[n] = ( δ[n] - δ[n-1] ) * ... * ( δ[n] - δ[n-1] ) * x[n]   (d convolutions)
X_d(z) = (1 - z^-1)^d X(z)   ->   H_Int(d)(z) = X(z)/X_d(z) = 1 / (1 - z^-1)^d
Poles: z = 1 with multiplicity d (the unit root).
H_ARIMA(p,d,q)(z) = X(z)/W(z) = ( X_d(z)/W(z) ) ( X(z)/X_d(z) ) = H_ARMA(p,q)(z) H_Int(d)(z)
Example for d = 1:
x[n] - x[n-1] = Σ_{k=0..q} b_k w[n-k] + Σ_{k=1..p} a_k ( x[n-k] - x[n-1-k] )
If d ∈ Q (fractional differencing): FARIMA or ARFIMA.
19
7. Seasonal ARIMA: SARIMA(p,d,q)x(P,D,Q)_s (Box-Jenkins model)
Definition: w[n] -> ARIMA(P,D,Q)_s (seasonal, period s) -> x_s[n] -> ARIMA(p,d,q) -> x[n]
H_ARIMAs(P,D,Q)(z) = X_s(z)/W(z),   H_ARIMA(p,d,q)(z) = X(z)/X_s(z)
H_SARIMA(p,d,q)x(P,D,Q)(z) = X(z)/W(z) = H_ARIMAs(P,D,Q)(z) H_ARIMA(p,d,q)(z)
For D = 1: x_s[n] - x_s[n-s] = Σ_{k=0..Q} B_k w[n-ks] + Σ_{k=1..P} A_k ( x_s[n-ks] - x_s[n-(k+1)s] )
For d = 1: x[n] - x[n-1] = Σ_{k=0..q} b_k x_s[n-k] + Σ_{k=1..p} a_k ( x[n-k] - x[n-1-k] )
20
7. Seasonal ARIMA: SARIMA(p,d,q)x(P,D,Q)_s
Example: SARIMA(1,0,0)x(0,1,1)_12
(p,d,q) = (1,0,0):  x[n] = a_1 x[n-1] + x_s[n]   ->   x_s[n] = x[n] - a_1 x[n-1]
(P,D,Q) = (0,1,1):  x_s[n] - x_s[n-12] = B_0 w[n] + B_1 w[n-12]
Substituting:
( x[n] - a_1 x[n-1] ) - ( x[n-12] - a_1 x[n-13] ) = B_0 w[n] + B_1 w[n-12]
x[n] = x[n-12] + B_0 w[n] + B_1 w[n-12] + a_1 ( x[n-1] - x[n-13] )
21
8. Known external inputs: System identification
ARX:
x[n] = Σ_{k=1..p} a_k x[n-k] + Σ_{k=0..q} b_k u[n-k] + w[n]
X(z) = ( B(z)/A(z) ) U(z) + ( 1/A(z) ) W(z)
ARMAX:
x[n] = Σ_{k=1..p} a_k x[n-k] + Σ_{k=0..q} b_k u[n-k] + Σ_{k=0..q'} c_k w[n-k]
X(z) = ( B(z)/A(z) ) U(z) + ( C(z)/A(z) ) W(z)
22
9. A family of models
General model:
X(z) = ( B(z) / (F(z) A(z)) ) U(z) + ( C(z) / (D(z) A(z)) ) W(z)
Polynomials used -> Name of the model:
A -> AR
C -> MA
AC -> ARMA
ACD -> ARIMA
AB -> ARX
ABC -> ARMAX
ABD -> ARARX
ABCD -> ARARMAX
BFCD -> Box-Jenkins
23
10. Nonlinear models
Nonlinear AR (neural networks, chaos):  x[n] = f( x[n-1], x[n-2], ..., x[n-p] ) + w[n]
Time-varying AR:  x[n] = Σ_{k=1..p} a_k[n] x[n-k] + w[n]
Random coefficient AR:  x[n] = Σ_{k=1..p} ( a_k + ε_k[n] ) x[n-k] + w[n]
Bilinear models:  x[n] = Σ_{k=1..p} a_k x[n-k] + Σ_{k=1..q} b_k x[n-k] w[n-M-k] + w[n]
Smooth transition:  x[n] = ( Σ_{k=1..p} a_k x[n-k] ) p[n] + ( 1 - p[n] ) ( Σ_{k=1..p} a'_k x[n-k] ) + w[n]
24
10. Nonlinear models
Threshold AR (TAR):
x[n] = Σ_{k=1..p} a_k^(1) x[n-k] + w[n]   if x[n-d] ≤ t
x[n] = Σ_{k=1..p} a_k^(2) x[n-k] + w[n]   if x[n-d] > t
Smooth TAR (STAR):
x[n] = Σ_{k=1..p} a_k^(1) x[n-k] + S( x[n-d] ) ( Σ_{k=1..p} a_k^(2) x[n-k] ) + w[n]
Heteroscedastic model: x[n] = σ[n] w[n],  with w[n] ~ N(0, 1)
ARCH:   σ²[n] = σ_0² + Σ_{k=1..p} a_k x²[n-k]
GARCH:  σ²[n] = σ_0² + Σ_{k=1..p} a_k x²[n-k] + Σ_{k=1..q} b_k σ²[n-k]
25
10. Nonlinear models
GARCH: σ²[n] = σ_0² + Σ_{k=1..p} a_k x²[n-k] + Σ_{k=1..q} b_k σ²[n-k]
Properties
The model is unique and stationary if Σ_{k=1..p} a_k + Σ_{k=1..q} b_k < 1.
Zero mean: E{ x[n] } = 0.
Lack of correlation:
Γ_x[n0] = ( σ_0² / ( 1 - Σ_{k=1..max(p,q)} (a_k + b_k) ) ) δ[n0]
26
10. Nonlinear models
GARCH: σ²[n] = σ_0² + Σ_{k=1..p} a_k x²[n-k] + Σ_{k=1..q} b_k σ²[n-k]
Estimation through Maximum Likelihood.
Forecasting: with z[n] = x²[n] - σ²[n] (observed up to time n; ẑ[n+h] = 0 for h ≥ 1),
x̂²[n+h] = σ_0² + Σ_{k=1..max(p,q)} ( a_k + b_k ) x̂²[n+h-k] + Σ_{k=1..q} b_k z[n+h-k]
using observed values of x²[·] and z[·] where available.
GARCH(1,1):
x̂²[n+1] = σ_0² + a_1 x²[n] + b_1 σ²[n]
x̂²[n+h] = σ_0² Σ_{k=0}^{h-1} ( a_1 + b_1 )^k + ( a_1 + b_1 )^{h-1} ( a_1 x²[n] + b_1 σ²[n] )
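The GARCH(1,1) forecast recursion and the closed form above (as reconstructed here) should agree for any horizon; this sketch with arbitrary illustrative parameters checks that:

```python
def garch11_forecast(sigma0_2, a1, b1, x2_n, sigma2_n, h):
    """h-step-ahead conditional-variance forecast, by recursion:
    seed  E{x^2[n+1]} = sigma0^2 + a1 x^2[n] + b1 sigma^2[n],
    then  E{x^2[n+h]} = sigma0^2 + (a1+b1) E{x^2[n+h-1]}  for h >= 2."""
    v = sigma0_2 + a1 * x2_n + b1 * sigma2_n
    for _ in range(h - 1):
        v = sigma0_2 + (a1 + b1) * v
    return v

def garch11_closed_form(sigma0_2, a1, b1, x2_n, sigma2_n, h):
    """Closed form: sigma0^2 sum_{k=0}^{h-1} s^k + s^{h-1}(a1 x^2[n] + b1 sigma^2[n])."""
    s = a1 + b1
    return sigma0_2 * sum(s ** k for k in range(h)) + s ** (h - 1) * (a1 * x2_n + b1 * sigma2_n)

v_rec = garch11_forecast(0.1, 0.2, 0.7, 1.5, 0.8, h=5)
v_cf = garch11_closed_form(0.1, 0.2, 0.7, 1.5, 0.8, h=5)
print(v_rec, v_cf)   # identical up to rounding
```

Since a_1 + b_1 = 0.9 < 1 the forecast converges, as h grows, to the unconditional variance σ_0²/(1 - a_1 - b_1), consistent with the stationarity condition on the previous slide.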
27
10. Nonlinear models
Extensions of GARCH
Exponential GARCH (EGARCH):
log σ²[n] = log σ_0² + Σ_{k=1..p} a_k log x²[n-k] + Σ_{k=1..q} b_k log σ²[n-k]
Integrated GARCH (IGARCH):
GARCH:   Σ_{k=1..p} a_k + Σ_{k=1..q} b_k < 1
IGARCH:  Σ_{k=1..p} a_k + Σ_{k=1..q} b_k = 1
28
11. Parameter estimation
Maximum Likelihood Estimates (MLE)
AR(1): x[n] = a_1 x[n-1] + w[n],  with w[n] ~ N(0, σ_W²), Γ_W[n0] = σ_W² δ[n0].
Assume that we observe ( x[1], x[2], ..., x[N] );  θ = { a_1, σ_W² }.
X_n = a_1 X_{n-1} + W_n = a_1 ( a_1 X_{n-2} + W_{n-1} ) + W_n = ... = W_n + a_1 W_{n-1} + a_1² W_{n-2} + a_1³ W_{n-3} + ...
E{ X_n } = 0
E{ X_n² } = E{ ( W_n + a_1 W_{n-1} + a_1² W_{n-2} + ... )² } = σ_W² ( 1 + a_1² + a_1⁴ + a_1⁶ + ... ) = σ_W² / ( 1 - a_1² )
->  X_1 | θ ~ N( 0, σ_W²/(1 - a_1²) )
X_2 = a_1 X_1 + W_2,  so W_2 = X_2 - a_1 X_1 ~ N(0, σ_W²)
->  X_2 | X_1, θ ~ N( a_1 x_1, σ_W² )
29
11. Parameter estimation
Maximum Likelihood Estimates (MLE)
X_1 | θ ~ N( 0, σ_W²/(1 - a_1²) ),  X_2 | X_1, θ ~ N( a_1 x_1, σ_W² ),  X_3 | X_2, θ ~ N( a_1 x_2, σ_W² ), ...
f_{X1 X2 ... XN | θ}( x_1, x_2, ..., x_N ) = f_{X1|θ}(x_1) f_{X2|X1,θ}(x_2) f_{X3|X2,θ}(x_3) ... f_{XN|XN-1,θ}(x_N)
L(θ) = log f_{X1...XN|θ}( x_1, ..., x_N )
     = -(1/2) log( 2π σ_W²/(1 - a_1²) ) - x_1² (1 - a_1²) / (2 σ_W²) - ((N-1)/2) log( 2π σ_W² ) - Σ_{n=2..N} ( x_n - a_1 x_{n-1} )² / (2 σ_W²)
( a_1, σ_W² ) = argmax L(θ):   ∂L(θ)/∂a_1 = 0,   ∂L(θ)/∂σ_W² = 0
Numerical, iterative solution.
Confidence intervals.
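A common simplification of the likelihood above is the conditional MLE, which drops the stationary density of x[1]; maximising the remaining Gaussian terms reduces to least squares and gives a closed form. A sketch (our own, on simulated data):

```python
import numpy as np

def ar1_conditional_mle(x):
    """Conditional (on x[1]) ML estimates for x[n] = a1 x[n-1] + w[n].
    Maximising the Gaussian conditional likelihood is least squares in a1."""
    x = np.asarray(x, float)
    a1 = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    resid = x[1:] - a1 * x[:-1]
    sigma_w2 = np.mean(resid ** 2)
    return a1, sigma_w2

rng = np.random.default_rng(0)
N = 5000
w = rng.standard_normal(N)
x = np.zeros(N)
for n in range(1, N):                 # simulate AR(1) with a1 = 0.7
    x[n] = 0.7 * x[n - 1] + w[n]
a1_hat, s2_hat = ar1_conditional_mle(x)
print(a1_hat, s2_hat)                 # close to 0.7 and 1.0
```

For long series the conditional and exact MLE are practically identical; the exact version on the slide only adds the first-sample term, which requires the numerical optimisation mentioned there.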
30
11. Parameter estimation
Least Squares Estimates (LSE)
AR(1): x[n] = a_1 x[n-1] + w[n];  predictor x̂[n] = a_1 x[n-1];  residual w[n] = x[n] - x̂[n], with E{ w[n] } = 0.
σ_W² = E{ w²[n] } = Γ_X[0] - 2 a_1 Γ_X[1] + a_1² Γ_X[0]
∂σ_W²/∂a_1 = 0 = -2 Γ_X[1] + 2 a_1 Γ_X[0]   ->   a_1 = Γ_X[1] / Γ_X[0]
σ_W² = Γ_X[0] - Γ_X[1]² / Γ_X[0]
31
12. Order selection
If I have to fit a model ARMA(p,q), what are the p and q values I have to supply?
ACF/PACF analysis
Akaike Information Criterion:   AIC(p,q) = log σ̂_W² + 2(p+q)/N
Bayesian Information Criterion: BIC(p,q) = log σ̂_W² + (p+q) log N / N
Final Prediction Error:         FPE(p) = σ̂_W² (N+p)/(N-p)
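The three criteria above are one-liners; this sketch (with hypothetical residual variances, chosen only for illustration) picks the order that minimises the BIC:

```python
import numpy as np

def aic(sigma_w2, p, q, N):
    return np.log(sigma_w2) + 2 * (p + q) / N

def bic(sigma_w2, p, q, N):
    return np.log(sigma_w2) + (p + q) * np.log(N) / N

def fpe(sigma_w2, p, N):
    return sigma_w2 * (N + p) / (N - p)

# hypothetical residual variances of fitted AR(p) models, p = 1..4:
sigmas = {1: 1.30, 2: 1.05, 3: 1.04, 4: 1.035}
N = 200
best_p = min(sigmas, key=lambda p: bic(sigmas[p], p, 0, N))
print(best_p)   # the big drop happens at p = 2; beyond that the penalty wins
```

Note how the fit improves sharply from p = 1 to p = 2 but only marginally afterwards, so the log N penalty makes BIC stop at 2: exactly the over-fitting guard these criteria are for.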
32
12. Order selection
Partial correlation coefficients (PACF)
For each lag n0, fit the best linear predictor of order n0:
x[n] = φ_{n0,1} x[n-1] + φ_{n0,2} x[n-2] + ... + φ_{n0,n0} x[n-n0] + w[n]
The PACF at lag n0 is the last coefficient, φ_{n0,n0}. The coefficients solve the Yule-Walker equations
r[n0'] = Σ_{k=1..n0} φ_{n0,k} r[n0'-k],   n0' = 1, ..., n0
i.e., in matrix form, a Toeplitz system:
[ r[0]     r[1]     r[2]   ...  r[n0-1] ] [ φ_{n0,1}  ]   [ r[1]  ]
[ r[1]     r[0]     r[1]   ...  r[n0-2] ] [ φ_{n0,2}  ] = [ r[2]  ]
[ ...      ...      ...    ...  ...     ] [ ...       ]   [ ...   ]
[ r[n0-1]  r[n0-2]  ...    r[1] r[0]    ] [ φ_{n0,n0} ]   [ r[n0] ]
33
12. Order selection
Thumb rule
ARMA(1,0): ACF: exponential decrease; PACF: one peak
ARMA(2,0): ACF: exponential decrease or waves; PACF: two peaks
ARMA(p,0): ACF: unlimited, decaying; PACF: limited
ARMA(0,1): ACF: one peak; PACF: exponential decrease
ARMA(0,2): ACF: two peaks; PACF: exponential decrease or waves
ARMA(0,q): ACF: limited; PACF: unlimited decaying
ARMA(1,1): ACF&PACF: exponential decrease
ARMA(p,q): ACF: unlimited; PACF: unlimited
34
13. Model checking
Residual Analysis
Example: ARMA(1,1): x[n] = a x[n-1] + b_0 w[n] + b_1 w[n-1]
->  w[n] = (1/b_0) ( x[n] - a x[n-1] - b_1 w[n-1] ),  initialised with w[0] = 0.
Assumptions
1. Gaussianity:
   1. The input random signal w[n] is univariate normal with zero mean.
   2. The output signal x[n] (the time series being studied) is multivariate normal, and its covariance structure is fully determined by the model structure and parameters.
2. Stationarity: x[n] is stationary once the necessary operations to produce a stationary signal have been carried out.
3. Residual independency: the input random signal w[n] is independent of all previous samples.
35
13. Model checking
Structural changes
Fit y[n] = f(x, y) + ε over the whole series, y[n] = f_1(x, y) + ε_1 over the first period and y[n] = f_2(x, y) + ε_2 over the second period, and compute the sums of squares of the residuals
S_i = Σ_n ( y_i[n] - ŷ_i[n] )²   (S for the full fit, S_1 and S_2 for the two periods).
Chow test:
H_0: f_1 = f_2
H_1: f_1 ≠ f_2
F = [ ( S - (S_1 + S_2) ) / k ] / [ (S_1 + S_2) / (N_1 + N_2 - 2k) ] ~ F( k, N_1 + N_2 - 2k )
where N_1, N_2 are the number of samples in each period and k the number of parameters in the model.
Assumption: the variance is the same in both regions. Solution: robust standard errors.
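The Chow statistic above is a simple ratio once the three residual sums of squares are available. A sketch with hypothetical values (the S's here are illustrative numbers, not real fits):

```python
def chow_statistic(S, S1, S2, k, N1, N2):
    """Chow test statistic F = [(S - (S1+S2))/k] / [(S1+S2)/(N1+N2-2k)],
    to be compared against an F(k, N1+N2-2k) distribution."""
    num = (S - (S1 + S2)) / k
    den = (S1 + S2) / (N1 + N2 - 2 * k)
    return num / den

# hypothetical residual sums of squares from the full fit and the two sub-period fits
F = chow_statistic(S=120.0, S1=40.0, S2=50.0, k=2, N1=50, N2=50)
print(F)
```

A large F (here 16, far beyond typical F(2, 96) critical values) means the pooled fit is much worse than the two separate fits, i.e., evidence of a structural change.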
36
36
13. Model checking
Diagnostic checking
1. Compute and plot the residual error
2. Check that its mean is approximately zero
3. Check for the randomness of the residual, i.e., there are no time intervals where the
mean is significantly different from zero (intervals where the residual is
systematically positive or negative).
4. Check that the residual autocorrelation is not significantly different from zero for all
lags
5. Check that the residual is normally distributed.
6. Check if there are residual outliers.
7. Check the ability of the model to predict future samples
37
14. Self-similarity, Fractal dimension, Chaos theory
Intuitively, a fractal is a curve that is self-similar at all scales.
Koch curve (construction steps k = 0, 1, 2, 3, 4, 5).
Power-law spectrum: S_X(ω) ∝ ω^-α
38
14. Self-similarity, Fractal dimension, Chaos theory
Box counting: cover the object with boxes of size ε and count how many, N, are needed.
A segment of length L = 1 m:
ε = 1 m -> N = 1;  ε = 0.5 m -> N = 2;  ε = 0.25 m -> N = 4   ->   N = ε^-1  (D = 1)
A square of area S = 1 m²:
ε² = 1 m² -> N = 1;  ε² = 0.25 m² -> N = 4;  ε² = 0.0625 m² -> N = 16   ->   N = ε^-2  (D = 2)
39
14. Self-similarity, Fractal dimension, Chaos theory
In general N = ε^-D, which defines the fractal dimension D.
Koch curve: at step k, ε_k = (1/3)^k m and N_k = 4^k, so
4^k = ( (1/3)^k )^-D = 3^{kD}   ->   D = log 4 / log 3 ≈ 1.26
40
14. Self-similarity, Fractal dimension, Chaos theory
For a self-similar process, S_X(ω) ∝ ω^-α with α = 5 - 2D, and the Hurst exponent is H = 2 - D.
H ∈ (0, 0.5) (e.g. 0.2): long-range dependent, alternating signs (anti-persistent)
H = 0.5: uncorrelated time series
H ∈ (0.5, 1) (e.g. 0.8): long-range dependent, same signs (persistent)
Aggregating the series over windows of length N_max, x_MA[n] = (1/N_max) Σ_{n'=n}^{n+N_max-1} x[n'], the standard deviation scales as σ_MA ∝ N_max^{H-1}.
41
14. Self-similarity, Fractal dimension, Chaos theory
42
Chaotic systems
Logistic equation:  x[n] = 4 x[n-1] ( 1 - x[n-1] ),  x[0] = 0.1
(Figure: the orbit x_{0.1}[n], and the difference x_{0.1}[n] - x_{0.100001}[n], which grows to order 1 after a few tens of samples: sensitivity to initial conditions.)
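The sensitivity to initial conditions shown in the figure can be reproduced in a few lines (our own sketch of the slide's experiment):

```python
def logistic_orbit(x0, n):
    """Iterate the logistic map x[n] = 4 x[n-1] (1 - x[n-1])."""
    xs = [x0]
    for _ in range(n):
        xs.append(4 * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.1, 60)
b = logistic_orbit(0.100001, 60)          # initial condition perturbed by 1e-6
diff = [abs(u - v) for u, v in zip(a, b)]
print(diff[0], max(diff))                 # 1e-6 initially, order 1 eventually
```

A perturbation of one part in a hundred thousand in x[0] roughly doubles every iteration (the Lyapunov exponent of this map is log 2), so after a few dozen steps the two orbits are completely decorrelated: chaos without any randomness in the model.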
43
14. Self-similarity, Fractal dimension, Chaos theory
Phase space
h-history:  x_h[n] = ( x[n], x[n-1], ..., x[n-h+1] ) ∈ R^h
2-history:  x_2[n] = ( x[n], x[n-1] ) ∈ R²
3-history:  x_3[n] = ( x[n], x[n-1], x[n-2] ) ∈ R³
2-phase space: the points x_2[1] = ( x[1], x[0] ), x_2[2] = ( x[2], x[1] ), x_2[3] = ( x[3], x[2] ), ...
Attractor (fixed point).
44
14. Self-similarity, Fractal dimension, Chaos theory
Recurrence plots
R_{h,ε}(n, n') = 1 if ||x_h[n] - x_h[n']|| < ε;  0 if ||x_h[n] - x_h[n']|| ≥ ε
(Figure: R_{2,ε}(n, n') for white noise, a sinusoid with trend, a chaotic system and an AR model.)
45
14. Self-similarity, Fractal dimension, Chaos theory
Recurrence plots (further examples).
46
14. Self-similarity, Fractal dimension, Chaos theory
Correlation dimension (Grassberger-Procaccia plots)
Correlation integral:
C_h(ε) = Pr( ||x_h[n] - x_h[n']|| < ε ) = lim_{N->∞} ( 2/(N(N-1)) ) Σ_{n<n'} H( ε - ||x_h[n] - x_h[n']|| )
with the Heaviside function H(x) = 0 for x ≤ 0, 1 for x > 0.
Correlation dimension:
D_h = lim_{ε->0} log C_h(ε) / log ε
If D_h = h for every embedding dimension h: random. If D_h saturates below h: maybe chaotic.
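The correlation integral can be computed by brute force on the h-histories; this small sketch (our own, on a toy series, using the max-norm as a convenient choice of distance) counts the fraction of close pairs:

```python
import numpy as np

def histories(x, h):
    """h-histories x_h[n] = (x[n], x[n-1], ..., x[n-h+1])."""
    return np.array([x[n - h + 1:n + 1][::-1] for n in range(h - 1, len(x))])

def correlation_integral(x, h, eps):
    """C_h(eps): fraction of pairs of h-histories closer than eps (max-norm)."""
    X = histories(np.asarray(x, float), h)
    N = len(X)
    count = 0
    for i in range(N):
        for j in range(i + 1, N):
            if np.max(np.abs(X[i] - X[j])) < eps:
                count += 1
    return 2.0 * count / (N * (N - 1))

C = correlation_integral([0.0, 0.1, 0.0, 0.1, 0.0], h=2, eps=0.05)
print(C)   # 2 of the 6 history pairs coincide
```

In practice one evaluates C_h(ε) over a range of ε and h and reads the slope of log C_h(ε) versus log ε, which is the Grassberger-Procaccia estimate of D_h.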
47
14. Self-similarity, Fractal dimension, Chaos theory
Brock-Dechert-Scheinkman (BDS) Test
V_h(ε) = sqrt(N) ( C_h(ε) - C_1(ε)^h ) / σ_h(ε) ~ N(0, 1)   under H_0: the samples are iid.
Lyapunov exponent
Consider two time points such that δ_0 = ||x_h[n] - x_h[n']|| ≪ 1, and the distance m samples later, δ_m = ||x_h[n+m] - x_h[n'+m]||.
The Lyapunov exponent λ relates these two distances: δ_m = δ_0 e^{λm}.
λ > 0: histories diverge (chaos, cannot be predicted).
λ < 0: histories converge (can be predicted).
Maximal Lyapunov exponent:  λ = lim_{m->∞} (1/m) log( δ_m / δ_0 )
48
Bibliography
C. Chatfield. The analysis of time series: an introduction. Chapman &
Hall, CRC, 1996.
D.S.G. Pollock. A handbook of time-series analysis, signal processing and dynamics. Academic Press, 1999.
J. D. Hamilton. Time series analysis. Princeton Univ. Press, 1994.
Time Series Analysis
Session IV: Forecasting and Data Mining
Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
2
Session outline
1. Forecasting
2. Univariate forecasting
3. Intervention modelling
4. State-space modelling
5. Time series data mining
1. Time series representation
2. Distance measure
3. Anomaly/Novelty detection
4. Classification/Clustering
5. Indexing
6. Motif discovery
7. Rule extraction
8. Segmentation
9. Summarization
3
1. Forecasting
Goal: predict x[n+h].
x̂_n[h] = g[h] = argmin_g E{ L( x[n+h], g[h] ) }
Symmetric loss functions:
Quadratic loss function:  L( x[n+h], x̂[n+h] ) = ( x[n+h] - x̂[n+h] )²
Other loss functions:     L( x[n+h], x̂[n+h] ) = | x[n+h] - x̂[n+h] |
Asymmetric loss functions:
L( x[n+h], x̂[n+h] ) = a ( x[n+h] - x̂[n+h] )²  if x̂[n+h] ≤ x[n+h]
L( x[n+h], x̂[n+h] ) = b ( x[n+h] - x̂[n+h] )²  if x̂[n+h] > x[n+h]
Solution (quadratic loss):  x̂_n[h] = E{ x[n+h] | x[1], x[2], ..., x[n] }
Univariate Forecasting: use only samples of the time series to be predicted.
Multivariate Forecasting: use samples of the time series to be predicted and other companion time series.
4
2. Univariate Forecasting
Trend and seasonal component extrapolation:
x[n] = trend[n] + seasonal[n] + random[n]
trend[n] = 2777 + 1.6 n
seasonal[n] = 26.5 cos( 2πn/8 + π ) + 22.5 cos( 2πn/12 + 2π/3 )
(Figure: fitted trend and seasonal components, extrapolated beyond the data.)
5
2. Univariate Forecasting
Exponential smoothing:
x̂[n+1] = α x[n] + (1-α) x̂[n]
H(z) = X̂(z)/X(z) = α / ( z - (1-α) )
Equivalently, in error-correction form: x̂[n+1] = x̂[n] + α ( x[n] - x̂[n] ) = x̂[n] + α e[n], initialised with x̂[1] = x[1].
(Figure: price series 1820-1870, smoothed series, detrended series and residual; |H(ω)|² and angle(H(ω)) show the lowpass behaviour of the smoother.)
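The exponential-smoothing recursion above takes only a few lines. A minimal sketch (initialisation as on the slide, x̂ = first observation):

```python
def exp_smooth_forecast(x, alpha):
    """One-step-ahead forecasts x_hat[n+1] = alpha x[n] + (1-alpha) x_hat[n]."""
    xhat = [x[0]]                        # initialise with the first observation
    for xn in x:
        xhat.append(alpha * xn + (1 - alpha) * xhat[-1])
    return xhat[1:]                      # element k forecasts x[k]

f = exp_smooth_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5)
print(f)   # each forecast is a geometric average of all past samples
```

Larger α tracks the series faster but smooths less; α = 1 reduces to the naive forecast x̂[n+1] = x[n].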
6
2. Univariate forecasting
Model based forecasting (Box-Jenkins procedure)
1. Model identification: examine the data to select an appropriate model structure (AR, ARMA, ARIMA, SARIMA, etc.)
2. Model estimation: estimate the model parameters.
3. Diagnostic checking: examine the residuals of the model to check if it is valid.
4. Consider alternative models if necessary: if the residual analysis reveals that the selected model is not appropriate.
5. Use the model difference equation to predict: as shown in the next two slides.
7
2. Univariate forecasting
Model based forecasting
Example: ARMA(1,1):  x[n] = a x[n-1] + w[n] + b w[n-1]
x[n+1] = a x[n] + w[n+1] + b w[n]
H_W(z) = X(z)/W(z) = ( 1 + b z^-1 ) / ( 1 - a z^-1 )
One-step predictor (E{ w[n+1] } = 0): x̂[n+1] = a x[n] + b w[n]; expressing w[n] in terms of the data and past predictions,
x̂[n+1] = (a + b) x[n] - b x̂[n]
h-step (h > 1): x̂[n+h] = a x̂[n+h-1] = a² x̂[n+h-2] = ... = a^{h-1} x̂[n+1] = a^{h-1} ( (a + b) x[n] - b x̂[n] )
8
2. Univariate forecasting
Model based forecasting
Example: SARIMA(1,0,0)x(0,1,1)_12:
x[n] = x[n-12] + α ( x[n-1] - x[n-13] ) + w[n] + β w[n-12]
x[n+1] = x[n-11] + α ( x[n] - x[n-12] ) + w[n+1] + β w[n-11]
H_W(z) = X(z)/W(z) = ( 1 + β z^-12 ) / ( 1 - α z^-1 - z^-12 + α z^-13 )
Eliminating w[n]:
w[n] = x[n] - α x[n-1] - x[n-12] + α x[n-13] - β w[n-12]
Substituting into the one-step equation and setting E{ w[n+1] } = 0 yields x̂[n+1] as a function of past samples and past predictions.
9
2. Univariate Forecasting
10
2. Univariate forecasting
Predictive power test: fit y[n] = f_1(x, y) + ε_1 on the first N_1 samples and y[n] = f(x, y) + ε on all N_1 + N_2 samples.
Chow test:
H_0: f_1 = f
H_1: f_1 ≠ f
F = [ ( S - S_1 ) / N_2 ] / [ S_1 / ( N_1 - k ) ] ~ F( N_2, N_1 - k )
Assumption: the variance is the same in both regions. Solution: robust standard errors.
2. Univariate forecasting
11
Artificial Neural Network (ANN):
y_t = f( y_{t-1}, y_{t-2}, ..., y_{t-p}; w ) + ε_t = ŷ_t + ε_t
2. Univariate forecasting
12
Fuzzy Artificial Neural Network (ANN): membership functions μ_a(x): R^M -> [0, 1]
2. Univariate forecasting
13-15
Fuzzy Artificial Neural Network (ANN) (figures).
16
3. Intervention modelling
What happens in the time series of a price if there is a tax raise of 3% in 1990?
price[n] = α price[n-1] + w[n] + 0.03 u[n-1990],   with u[n] = 1 for n ≥ 0 and 0 for n < 0
price(z) = α price(z) z^-1 + w(z) + 0.03 z^-1990 / ( 1 - z^-1 )
(Figure: the step input.)
What happens in the time series of a price if there is a tax raise of 3% between 1990 and 1992?
price[n] = α price[n-1] + w[n] + 0.03 ( u[n-1990] - u[n-1993] )
price(z) = α price(z) z^-1 + w(z) + 0.03 ( z^-1990 - z^-1993 ) / ( 1 - z^-1 )
(Figure: the pulse input.)
17
3. Intervention modelling
What happens in the time series of a price if in 1990 there was an earthquake?
price[n] = α price[n-1] + w[n] + β δ[n-1990],   with δ[n] = 1 for n = 0 and 0 otherwise
price(z) = α price(z) z^-1 + w(z) + β z^-1990
What happens in the time series of a price if there is a steady tax raise of 3% since 1990?
price[n] = α price[n-1] + w[n] + 0.03 ( n - 1989 ) u[n-1990]
price(z) = α price(z) z^-1 + w(z) + 0.03 z^-1990 / ( 1 - z^-1 )²
(Figure: impulse and ramp inputs.)
18
3. Intervention modelling
In general:
x[n] = f( x[n-1], ..., x[n-q], w[n], w[n-1], ..., w[n-p] ) + f( i[n] )
X(z) = ( B(z)/A(z) ) W(z) + ( C(z)/D(z) ) I(z)
We are back to the system identification problem with external inputs.
19
3. Intervention modelling: outliers revisited
Additive outliers:      X(z) = ( B(z)/A(z) ) W(z) + I(z)
Innovational outliers:  X(z) = ( B(z)/A(z) ) W(z) + ( C(z)/D(z) ) I(z)
Level outliers:         X(z) = ( B(z)/A(z) ) W(z) + ( 1/(1 - z^-1) ) I(z)
Time change outliers:   X(z) = ( B(z)/A(z) ) W(z) + ( 1/D(z) ) I(z)
(Figures: pulse- and step-like effects on the series.)
20
4. State-space modelling
A state-transition system drives the (hidden) state x[n] ∈ R^k; an observation system produces the observed time series y[n] ∈ R^l; noise enters both.
Linear, Gaussian system:
x[n] = F x[n-1] + ε_1[n]      (state-transition model)
y[n] = H x[n] + ε_2[n]        (observation model)
x[0] ~ N( E_0, Σ_0 ),  ε_1[n] ~ N( 0, Σ_1 ),  ε_2[n] ~ N( 0, Σ_2 )
Kalman filter: given y[n], H, F, E_0, Σ_0, Σ_1, Σ_2, estimate x[n].
Time series modelling (System Identification): given y[n], estimate x[n], H, F, Σ_0, Σ_1, Σ_2.
21
4. State-space modelling
22
4. State-space modelling
General system:
x[n] = f( x[n-1], ε_1[n] )
y[n] = h( x[n], ε_2[n] )
Extended/Unscented Kalman filter: f, h differentiable; given y[n], h, f and the noise statistics, estimate x[n], Σ_1, Σ_2.
Particle filter: given y[n], h, f and the probability density functions of ε_1, ε_2, estimate x[n] and the free parameters of the noise signals.
23
5. Time series data mining
(Source: http://www.kdnuggets.com/polls/2004/time_series_data_mining.htm)
24
5. Time series data mining
Time series representation: a taxonomy
Data adaptive: Sorted Coefficients; Singular Value Decomposition; Piecewise Polynomial (Piecewise Linear Approximation: interpolation, regression; Adaptive Piecewise Constant Approximation); Symbolic (Natural Language; Strings; Trees)
Non data adaptive: Spectral (Discrete Fourier Transform; Discrete Cosine Transform); Wavelets (Orthonormal: Haar, Daubechies dbn with n > 1; Bi-Orthonormal; Coiflets; Symlets); Piecewise Aggregate Approximation; Random Mappings
(Figure: the same series represented with DFT, DWT, SVD, APCA, PAA, PLA and symbolically, SYM: UUCUCUCD.)
25
5. Time series data mining
Time series representation
(Figure: a series discretised into the symbol string baabccbc.)
26
5. Time series data mining
Time series representation
(Source: http://www.cs.cmu.edu/~bobski/pubs/tr01108-onesided.pdf)
27
5. Time series data mining
Time series distance measure
28
5. Time series data mining
(Source: http://www.cs.ucr.edu/~wli/SSDBM05/)
Anomaly/Novelty detection
A very complex and noisy ECG, but according to a cardiologist there is only one abnormal heartbeat. The algorithm easily finds it.
29
5. Time series data mining
Classification/Clustering
30
5. Time series data mining
Indexing
31
5. Time series data mining
Motif discovery
(Figure: Winding Dataset, the angular speed of reel 2; three recurring instances of motifs A, B and C.)
5. Time series data mining
32
Motif discovery
33
5. Time series data mining
Rule extraction
5. Time series data mining
34
Rule extraction
Step 1: Convert the time series into a symbol sequence, e.g. by quantising the relative change
y[n] = ( x[n+1] - x[n] ) / x[n]
5. Time series data mining
35
Rule extraction
Step 2: Identify frequent itemsets
x[n]=abbcaabbcabababcaabcabbcabcbcabbaabcbcabbcbc
2-item sets (Min. Support = 5): aa 3, ab 11, ac 1, ba 2, bb 5, bc 11, ca 7, cb 3, cc 0
3-item sets: aba 2, abb 5, abc 4, bba 1, bbb 0, bbc 4, bca 7, bcb 3, bcc 0, caa 2, cab 5, cac 0
4-item sets: abba 1, abbb 0, abbc 4, bcaa 2, bcab 5, bcac 0, caba 1, cabb 3, cabc 1
5-item sets: bcaba 1, bcabb 3, bcabc 1
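The item-set counting step can be sketched in a few lines; this counts contiguous k-symbol substrings of the sequence above and keeps the frequent ones (a simplification of general frequent-itemset mining, which would also grow candidates level by level):

```python
from collections import Counter

def frequent_itemsets(s, length, min_support):
    """Count contiguous k-symbol substrings and keep those meeting min_support."""
    counts = Counter(s[i:i + length] for i in range(len(s) - length + 1))
    return {k: v for k, v in counts.items() if v >= min_support}

x = "abbcaabbcabababcaabcabbcabcbcabbaabcbcabbcbc"
print(frequent_itemsets(x, 2, 5))   # the frequent 2-item sets of the slide's sequence
```

Only ab, bb, bc and ca clear the support threshold of 5; these survivors seed the search for longer frequent patterns and, eventually, for rules such as "ab is usually followed by b or c".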
36
5. Time series data mining
Segmentation (Change Point Detection)
37
5. Time series data mining
Summarization
38
Session outline
1. Forecasting
2. Univariate forecasting
3. Intervention modelling
4. State-space modelling
5. Time series data mining
1. Time series representation
2. Distance measure
3. Anomaly/Novelty detection
4. Classification/Clustering
5. Indexing
6. Motif discovery
7. Rule extraction
8. Segmentation
9. Summarization
39
Conclusions
x[n] = trend[n] + periodic[n] + random[n]
Preprocessing (heteroskedasticity, gaussianity, outliers, ...)?
trend[n]: Regression, Curve fitting
periodic[n]: Harmonic analysis, Filtering
random[n]: Explained by statistical models (AR, MA, ARMA); (F)ARIMA, SARIMA
40
Bibliography
C. Chatfield. The analysis of time series: an introduction. Chapman &
Hall, CRC, 1996.
C. Chatfield. Time-series forecasting. Chapman & Hall, CRC, 2000.
D.S.G. Pollock. A handbook of time-series analysis, signal processing
and dynamics. Academics Press, 1999.
D. Peña, G. C. Tiao, R. S. Tsay. A course in time series analysis. John Wiley and Sons, Inc., 2001.