
Time Series Analysis

Session 0: Course outline


Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
Motivation for this course

Course outline
Course outline: Session 1
Kinds of time series
Continuous time series sampling
Data models
Descriptive analysis: time plots and data preprocessing
Distributional properties: statistical distribution, stationarity and autocorrelation
Outlier detection and rejection

[Figure: a continuous signal with regular and irregular sampling]

x[n] = trend[n] + periodic[n] + random[n]
Course outline: Session 2
Trend analysis:
Linear and non-linear regression
Polynomial fitting
Cubic spline fitting
Seasonal component analysis:
Spectral representation of stationary processes
Spectral signal processing:
Detrending and filtering
Non-stationary signal processing
x[n] = trend[n] + periodic[n] + random[n]
Course outline: Session 3
Model definition:
Moving Average processes (MA)
Autoregressive processes (AR)
Autoregressive, Moving Average (ARMA)
Autoregressive, Integrated, Moving Average (ARIMA, FARIMA)
Seasonal, Autoregressive, Integrated, Moving Average (SARIMA)
Known external inputs: System identification
A family of models
Nonlinear models
Parameter estimation
Order selection
Model checking
Self-similarity, fractal dimension and chaos theory
x[n] = trend[n] + periodic[n] + random[n]
Course outline: Session 4
Forecasting
Univariate forecasting
Intervention modelling
State-space modelling
Time series data mining
Time series representation
Distance measure
Anomaly/Novelty detection
Classification/Clustering
Indexing
Motif discovery
Rule extraction
Segmentation
Summarization
[Figure: Winding dataset (the angular speed of reel 2), with segments A, B and C]
Course Outline: Session 5
Your name here

Bring your own data if possible!
Suggested readings
It is suggested to read (before coming):
Geo4990: Time series analysis
Adler1998: Analysing stable time series
Leonard: Mining Transactional and Time Series Data
Chatterjee2006: Simple Linear Regression
Resources
Data sets
http://www.york.ac.uk/depts/maths/data/ts
http://www-personal.buseco.monash.edu.au/~hyndman/TSDL
Competition
http://www.neural-forecasting-competition.com
Links to organizations, events, software, datasets
http://www.buseco.monash.edu.au/units/forecasting/links.php
http://www.secondmoment.org/time_series.php
Lecture notes
http://www.econphd.net/notes.htm#Econometrics
Bibliography
D. Peña, G. C. Tiao, R. S. Tsay. A course in time series analysis. John Wiley and Sons, Inc., 2001.
C. Chatfield. The analysis of time series: an introduction. Chapman & Hall/CRC, 1996.
C. Chatfield. Time-series forecasting. Chapman & Hall/CRC, 2000.
D.S.G. Pollock. A handbook of time-series analysis, signal processing and dynamics. Academic Press, 1999.
J. D. Hamilton. Time series analysis. Princeton Univ. Press, 1994.
C. Pérez. Econometría de las series temporales. Prentice Hall, 2006.
A. V. Oppenheim, R. W. Schafer, J. R. Buck. Discrete-time signal processing, 2nd edition. Prentice Hall, 1999.
A. Papoulis, S. U. Pillai. Probability, random variables and stochastic processes, 4th edition. McGraw Hill, 2002.
Time Series Analysis
Session I: Introduction
Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
Session outline
1. Features and objectives of the time series
2. Sampling
3. Components of time series: data models
4. Descriptive analysis
5. Distributional properties
6. Detection and removal of outliers
7. Time series methods
1. Features and objectives of the time series
Goal: Explain history
Features:
1. Samples (discrete): x[n]
2. Bounded: |x[n]| ≤ l < ∞ for all n
3. Finite support: x[n], n ∈ {n_0, n_0 + 1, ..., n_F}

[Figure: a yearly series with a trend and a periodic component of period N ≈ 7 years, x[n + N] = x[n]]
1. Features and objectives of the time series
Goal: Forecast demand
Features:
1. Seasonal
2. Non-stationary
3. Non-independent samples

[Figure: a seasonal demand series whose local mean and variance change over time]
1. Features and objectives of the time series
Pendulum angle with different controllers
Goal: Control = Forecast (+ correct)
Features:
1. Continuous signal
Regular sampling: x[n] = x(nT_s), where T_s is the sampling period
1. Features and objectives of the time series
Other kinds of time series are not covered in this course.
2. Sampling
[Figure: a continuous signal with regular and irregular sampling]

Regular sampling: x[n] = x(nT_s)
Irregular sampling is also called non-uniform sampling.
2. Sampling
[Figure: a continuous signal, oversampled (T_s = T_min/5), critically sampled (T_s = T_min/2) and undersampled (T_s = T_min)]

Nyquist/Shannon criterion: the sampling period must not exceed T_min/2, where T_min is the period of the fastest component; here T_min = 10 for
x(t) = 2 sin(2π(1/100)t) + sin(2π(3/100)t + π/4) + 0.2 sin(2π(10/100)t + π/8)
2. Sampling
2. Sampling: Signal reconstruction
[Figure: continuous signal, sampled signal, sinc(t), sinc superposition and reconstructed signal]

Reconstruction formula: x_r(t) = Σ_n x[n] h_r(t − nT_s)
Reconstruction kernel for bandlimited signals: h_r(t) = sinc(t / T_s)
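The reconstruction formula can be sketched numerically. This is a minimal illustration in plain Python (the test signal, the sampling period `Ts` and the evaluation instant are assumptions, and the ideally infinite sinc sum is truncated to the available samples):

```python
import math

def sinc(t):
    # Normalized sinc: sin(pi*t)/(pi*t), with sinc(0) = 1.
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)

def reconstruct(x_samples, Ts, t):
    # x_r(t) = sum_n x[n] * sinc((t - n*Ts)/Ts), truncated to the samples we have.
    return sum(x_samples[n] * sinc((t - n * Ts) / Ts)
               for n in range(len(x_samples)))

# Sample a slow sine well above its Nyquist rate, then rebuild it between samples.
Ts = 0.5
x_samples = [math.sin(2 * math.pi * 0.2 * n * Ts) for n in range(200)]
t = 37.25                       # an instant between two sampling instants
x_hat = reconstruct(x_samples, Ts, t)
x_true = math.sin(2 * math.pi * 0.2 * t)
```

Because the sum is truncated, the reconstruction is only approximate near the ends of the record; in the middle it matches the continuous signal closely.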
2. Sampling: Aliasing
[Figure: obvious and not-so-clear aliasing effects]

Sampling with T_s = 1.1 T_min > T_min/2:
x(t) = sin(2π(10/100)t) shows an obvious aliasing effect, while
x(t) = 2 sin(2π(1/100)t) + sin(2π(3/100)t + π/4) + 0.2 sin(2π(10/100)t + π/8) shows a not-so-clear aliasing effect.
13
13
2. Aliasing: Conclusions
From the discussion above, two main consequences must be kept in mind:
1. Any continuous time series can be safely treated as a discrete time
series as long as the Nyquist criterion is satisfied.
2. Once our discrete analysis is finished, we can always return to the
continuous world by reconstructing our output sequence.
Although not discussed, once the continuous signal is discretized, one can
arbitrarily change the sampling period without having to go back to the
continuous signal with two operations called upsampling (going for a finer
discretization) and downsampling (coarser discretization).
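A small numerical illustration of why the Nyquist criterion matters (the frequencies chosen are hypothetical): a 1.1 Hz sine sampled at 1 Hz violates the criterion and produces exactly the same samples as a 0.1 Hz sine, so once aliased the two are indistinguishable.

```python
import math

fs = 1.0                      # sampling frequency (Hz), i.e. Ts = 1 s
f_high = 1.1                  # above the Nyquist frequency fs/2 = 0.5 Hz
f_alias = f_high - fs         # 0.1 Hz: the frequency the samples actually show

n = range(20)
x_high = [math.sin(2 * math.pi * f_high * k / fs) for k in n]
x_low = [math.sin(2 * math.pi * f_alias * k / fs) for k in n]
max_diff = max(abs(a - b) for a, b in zip(x_high, x_low))  # essentially zero
```

The two sample sequences coincide to machine precision, which is exactly the ambiguity the Nyquist criterion rules out.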
3. Components of a time series: data models
x[n] = trend[n] + periodic[n] + random[n]

Periodic: seasonal component.
Ex: Unemployment is low in summer.
Ex: Temperature is high in the middle of the day.
Trend: long-term change (what is long-term?).
Random: explained by statistical models (AR, MA, ...).
3. Components of a time series: data models

[Figure: trend, seasonal and noise components combined into additive, seasonal multiplicative and fully multiplicative models]

Additive model: x[n] = m[n] + S[n] + ε[n]
Seasonal multiplicative model: x[n] = m[n]·S[n] + ε[n]
Fully multiplicative model: x[n] = m[n]·S[n]·ε[n]

Taking logs of the fully multiplicative model yields an additive model:
log x[n] = log m[n] + log S[n] + log ε[n]
3. Components of a time series: data models

[Figure: two signals, an interferent signal, and the resulting observed signal]

Is the noise dependent on the signal? Observed model: x̃[n] = x[n] + ε[n]
4. Descriptive analysis
Time plot
Take care of appearance (scale, point shape, line shape, etc.)
Take care of labelling axes specifying units (be careful with the
scientific notation, e.g. from 0.5e+03 to 0.1e+04)
Data Preprocessing
Consider transforming the data to enhance a certain feature, although
this is a controversial topic:
Logarithm: stabilizes variance in the fully multiplicative model: y[n] = log(x[n])
Box-Cox: the transformed data tend to be normally distributed:
y[n] = (x[n]^λ − 1) / λ   if λ > 0
y[n] = log(x[n])          if λ = 0
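The Box-Cox transform, as defined on the slide, can be sketched as follows (the test series is an assumption chosen so that its variance grows with its level):

```python
import math

def box_cox(x, lam):
    # Box-Cox transform: (x^lam - 1)/lam for lam > 0, log(x) for lam = 0.
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

# A multiplicative-style series: the fluctuations grow with the level.
x = [math.exp(0.05 * n) * (1.0 + 0.2 * math.sin(n)) for n in range(100)]
y = box_cox(x, 0.0)   # the log turns the multiplicative model into an additive one
```

Note that as λ → 0 the λ > 0 branch tends to the logarithm, so the family is continuous in λ.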
4. Descriptive analysis
Data preprocessing:
(Removal of trend) Detrending: if the trend clearly follows a known curve (line, polynomial, logistic, Gaussian, Gompertz, etc.), fit a model to the time series and detrend (either by subtracting or dividing).
(Removal of trend) Differencing:
y_l[n] = ∇_l x[n] = (x[n] − x[n−1]) / T_s
y_r[n] = ∇_r x[n] = (x[n+1] − x[n]) / T_s
y_lr[n] = ∇_l ∇_r x[n] = (x[n+1] − 2x[n] + x[n−1]) / T_s²
y_ll[n] = ∇_l ∇_l x[n] = (x[n] − 2x[n−1] + x[n−2]) / T_s²

[Figure: x[n] = x(nT_s) with x(t) = 10 + 1.5t + sin(2πt); y_l[n] fluctuates around the trend slope 1.5 and y_lr[n] around 0]
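The differencing operators above (with T_s = 1) can be sketched as:

```python
def diff(x, lag=1):
    # First difference with a given lag: y[n] = x[n] - x[n-lag].
    return [x[n] - x[n - lag] for n in range(lag, len(x))]

# A linear trend x[n] = 10 + 1.5n: one difference leaves the constant slope,
# a second difference leaves zeros.
x = [10 + 1.5 * n for n in range(20)]
d1 = diff(x)       # all values equal 1.5
d2 = diff(d1)      # all values equal 0.0
```

This is the mechanism behind "differencing removes polynomial trends": each difference lowers the polynomial degree by one.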
4. Descriptive analysis
Data preprocessing:
(Removal of season) Deseasoning: average over the seasonal period to remove its effect. For monthly data with a yearly season:
m_est[n] = (1/12) [ (1/2)x[n−6] + x[n−5] + ... + x[n+5] + (1/2)x[n+6] ]
(Estimate of season) Season estimation: subtract the deseasoned time series from a local estimate of the current sample. Under the model x[n] = m[n] + S[n] + e[n]:
S_est[n] = (1/8)x[n−2] + (1/4)x[n−1] + (1/2)x[n] + (1/4)x[n+1] + (1/8)x[n+2] − m_est[n]

[Figure: x[n], the trend m[n], its estimate m_est[n] and the season estimate S_est[n]]
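The centered moving average m_est[n] above can be sketched in plain Python; the season estimate is simplified here to x[n] − m_est[n] (the slide's S_est adds an extra local smoothing), and the test series is an assumption:

```python
import math

def trend_estimate(x, n, period=12):
    # Centered moving average over one (even) seasonal period:
    # m_est[n] = (1/12)( x[n-6]/2 + x[n-5] + ... + x[n+5] + x[n+6]/2 )
    half = period // 2
    s = 0.5 * x[n - half] + 0.5 * x[n + half]
    s += sum(x[n + k] for k in range(-half + 1, half))
    return s / period

# Monthly-style data: a linear trend plus a period-12 season.
x = [0.3 * n + 5.0 * math.sin(2 * math.pi * n / 12) for n in range(60)]
m_est = [trend_estimate(x, n) for n in range(6, 54)]
s_est = [x[n] - m_est[n - 6] for n in range(6, 54)]   # crude season estimate
```

The half-weights at the two endpoints make the window symmetric around n even though the period is even, so a full seasonal cycle averages out exactly and a linear trend passes through unchanged.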
4. Descriptive analysis
Data preprocessing:
(Removal or isolation of trend, season or noise) Filtering: filtering aims at removing any of the components; for example, the moving average is a common filtering operation in stock analysis to remove noise.
Causal: y[n] = (1/20) Σ_{k=1}^{20} x[n−k]
Causal + anticausal: y[n] = (1/21) Σ_{k=−10}^{10} x[n−k]

[Figure: original series together with its causal and causal+anticausal moving averages]
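The two moving averages above can be sketched directly (the alternating test input is an assumption chosen so the smoothed output is easy to predict):

```python
def causal_ma(x, L=20):
    # y[n] = (1/L) * sum_{k=1}^{L} x[n-k]: uses only the past (adds delay).
    return [sum(x[n - k] for k in range(1, L + 1)) / L
            for n in range(L, len(x))]

def centered_ma(x, half=10):
    # y[n] = (1/(2*half+1)) * sum_{k=-half}^{half} x[n-k]: no delay, non-causal.
    width = 2 * half + 1
    return [sum(x[n - k] for k in range(-half, half + 1)) / width
            for n in range(half, len(x) - half)]

x = [float(n % 2) for n in range(100)]   # fast "noise" alternating 0, 1
y1 = causal_ma(x)      # flattens to the mean 0.5, at the cost of a delay
y2 = centered_ma(x)    # hovers around 0.5 with no delay
```

The causal filter is usable in real time but lags the data; the centered filter has zero phase but needs future samples, matching the causal vs. causal+anticausal distinction in the figure.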
5. Distributional properties
[Figure: an irregularly sampled realization; the sample at n = 31 is a random variable X = x[31]]

All variables identically distributed → stationarity.
A variable can be normally distributed, Poisson distributed, etc.; it is important to characterize its distribution.
All variables independent → white process.
Independence is measured through the autocorrelation function.
5. Distributional properties
[Figure: normal and Poisson distribution functions and probability density functions]

F_X(x) = Pr{X ≤ x} = ∫_{−∞}^{x} f(t) dt = Σ_{x_i ≤ x} Pr{X = x_i}

E{X^r} = ∫ x^r dF_X(x) = ∫_{−∞}^{∞} x^r f(x) dx = Σ_{x_i} x_i^r Pr{X = x_i}

Characteristic function: Φ_X(t) = E{e^{itX}} = 1 + Σ_{k=1}^{∞} (i^k E{X^k} / k!) t^k

F_X(b) − F_X(a) = (1/2π) lim_{τ→∞} ∫_{−τ}^{τ} ( (e^{−ita} − e^{−itb}) / (it) ) Φ_X(t) dt

Joint distribution: F_{X,Y}(x, y) = Pr{X ≤ x, Y ≤ y}
5. Distributional properties
[Figure: examples of nonstationary variables]
5. Distributional properties
Strictly stationary:
F_{X_{n_1},...,X_{n_k}}(x_1, ..., x_k) = F_{X_{n_1+N},...,X_{n_k+N}}(x_1, ..., x_k)   ∀ k, n_1, ..., n_k, N
Consequences:
E{X_n} = μ   ∀ n
Autocovariance function: C[n_0] = Cov(X_n, X_{n+n_0}) = E{(X_n − μ)(X_{n+n_0} − μ)*}   ∀ n, n_0
Autocorrelation function (ACF): Corr(X_n, X_{n+n_0}) = E{X_n X*_{n+n_0}} = Γ[n_0]   ∀ n, n_0
Wide-sense stationary: E{X_n} = μ and Corr(X_n, X_{n+n_0}) = Γ[n_0]   ∀ n, n_0 (n_0 is the lag)
Properties: Γ*[n_0] = Γ[−n_0],  Γ[0] ≥ |Γ[n_0]|
Correlation coefficient: r[n_0] = C[n_0] / C[0]
Example (white process): Γ[n_0] = σ_X² δ[n_0], i.e. σ_X² for n_0 = 0 and 0 for n_0 ≠ 0
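The sample counterparts of C[n_0] and r[n_0] can be sketched as follows (the white-noise input is an assumption used to check that the ACF of a white process is a spike at lag 0):

```python
import random

def sample_acf(x, max_lag):
    # Sample autocovariance C[k] and correlation coefficient r[k] = C[k]/C[0].
    N = len(x)
    mean = sum(x) / N
    def C(k):
        return sum((x[n] - mean) * (x[n + k] - mean) for n in range(N - k)) / N
    c0 = C(0)
    return [C(k) / c0 for k in range(max_lag + 1)]

random.seed(0)
white = [random.gauss(0.0, 1.0) for _ in range(2000)]
r = sample_acf(white, 5)
# r[0] is 1 by construction; for a white process the other lags stay near 0.
```

With N samples, the sample correlations of a white process scatter around 0 with standard error roughly 1/√N, which is what the significance test on the next slide formalizes.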
5. Distributional properties
Correlation coefficients (zero-order correlations): r[n_0] = C[n_0] / C[0]
Under the null hypothesis H_0: r[n_0] = 0, the statistic
t = r[n_0] √( (N − 2) / (1 − r²[n_0]) )
is distributed as a Student's t with N − 2 degrees of freedom, where N is the number of samples upon which r[n_0] is estimated. The probability of observing r[n_0] if the null hypothesis is true is
Pr{ |t| ≥ |r[n_0]| √( (N − 2) / (1 − r²[n_0]) ) }
5. Distributional properties
Ergodicity

[Figure: four realizations of the same process]

Ensemble average: (1/4) Σ_{i=1}^{4} x_i[20]
Strong law of large numbers: lim_{N→∞} (1/N) Σ_{i=1}^{N} x_i[20] = μ_20
Time average: (1/25) Σ_{n=1}^{25} x[n]
The two are equal if the process is stationary and ergodic for the mean.
5. Distributional properties

[Figure: realizations of the same process with different random offsets]

Example: x[n] = m + e[n], with m ~ N(0, σ_m²) drawn once per realization and e[n] ~ N(0, σ_e²).
E{X_n} = 0,  Cov(X_n, X_{n+n_0}) = σ_m² + σ_e² δ[n_0]
The time average of one realization converges to m, not to the ensemble average 0: stationarity does not imply ergodicity.
5. Distributional properties
How to detect non-stationarity in the mean?
There is a variation in the local mean.
Solutions:
Polynomial trends: difference p times.
x^(1)[n] = x[n] − x[n−1]   (removal of a constant, or piecewise-constant, trend)
x^(2)[n] = x^(1)[n] − x^(1)[n−1]   (removal of a linear trend)
x^(3)[n] = x^(2)[n] − x^(2)[n−1]   (removal of a quadratic trend)
x^(p)[n] = x^(p−1)[n] − x^(p−1)[n−1]   (removal of a p-th order polynomial trend)
5. Distributional properties
How to detect non-stationarity in the mean?
Exponential trends: take logs, e.g. for financial time series
y[n] = log( x[n] / x[n−1] )
5. Distributional properties
How to detect non-stationarity due to seasonality?
There is a periodic variation in the local mean.
Solutions: difference p times with that seasonality.
x^(1)[n] = x[n] − x[n−12]   (monthly data: removal of a yearly seasonality)
x^(1)[n] = x[n] − x[n−7]   (daily data: removal of a weekly seasonality)
How to detect non-stationarity in the variance?
There is a variation in the local variance.
Solution: the Box-Cox transformation tends to stabilize the variance.
How to detect non-stationarity in general?
Solution: unit root tests.
6. Outlier detection and rejection
Common procedures:
1. Visual inspection
2. Mean ± k · Standard Deviation
3. Median ± k · Median Absolute Deviation
4. Robust detection
5. Robust model fitting
Common actions:
1. Remove the observation
2. Substitute it by an estimate
Do
  Estimate mean and Std. Dev.
  Remove samples outside a given interval
Until (no sample is removed)
[Figure: a sensor blackout producing a run of outliers]
7. Time series methods
Time-domain methods:
Based on classical theory of correlation (autocovariance)
Include parametric methods (AR, MA, ARMA, ARIMA, etc.)
Include regression (linear, nonlinear)
Frequency-domain (spectral) methods:
Based on Fourier analysis
Include harmonic methods
Neural networks and fuzzy neural networks
Other approaches: fractal models, wavelet models, Bayesian networks, etc.
Bibliography
C. Chatfield. The analysis of time series: an introduction. Chapman & Hall/CRC, 1996.
D.S.G. Pollock. A handbook of time-series analysis, signal processing and dynamics. Academic Press, 1999.
J. D. Hamilton. Time series analysis. Princeton Univ. Press, 1994.
A. V. Oppenheim, R. W. Schafer, J. R. Buck. Discrete-time signal processing, 2nd edition. Prentice Hall, 1999.
A. Papoulis, S. U. Pillai. Probability, random variables and stochastic processes, 4th edition. McGraw Hill, 2002.
Time Series Analysis
Session II: Regression and Harmonic Analysis
Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
Session outline
1. Goal
2. Linear and non-linear regression
3. Polynomial fitting
4. Cubic spline fitting
5. A short introduction to system analysis
6. Spectral representation of stationary processes
7. Detrending and filtering
8. Non-stationary processes
1. Goal

x[n] = trend[n] + periodic[n] + random[n]
trend and periodic: Session 2; random: Session 3
2. Linear and non linear regression
Linear regression: x[n] = trend[n] + random[n], with trend[n] = β_0 + β_1 n

Year (n)   Price (x[n])
1500       17
1501       19
1502       20
1503       15
1504       13
1505       14
1506       14
1507       14
...

17 = β_0 + 1500 β_1
19 = β_0 + 1501 β_1
20 = β_0 + 1502 β_1
15 = β_0 + 1503 β_1
...

In matrix form x = Xβ, with x = (17, 19, 20, 15, ...)ᵀ and X the matrix with rows (1, n).
2. Linear and non linear regression
Linear regression: x[n] = trend[n] + random[n], i.e. x = Xβ + ε
Let's assume E{ε[n]} = 0 and Γ_ε[n_0] = σ_ε² δ[n_0] (homoscedasticity).
Least Squares Estimate: β̂ = argmin_β ||x − Xβ||² = (XᵀX)⁻¹ Xᵀ x = X⁺ x
Properties:
E{β̂} = β
Cov{β̂} = σ_ε² (XᵀX)⁻¹
σ̂_ε² = (x − Xβ̂)ᵀ (x − Xβ̂) / (N − k)
Degree of fit: R² = 1 − ||x − Xβ̂||² / ||x − x̄·1||²,  adjusted R² = 1 − (1 − R²)(N − 1)/(N − k)
Linear regression with constraints (Rβ = r):
β̂_R = argmin_{β: Rβ = r} ||x − Xβ||² = β̂ + (XᵀX)⁻¹ Rᵀ ( R (XᵀX)⁻¹ Rᵀ )⁻¹ ( r − Rβ̂ )
Example: any set of linear constraints on the coefficients can be written in the form Rβ = r.
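The least squares estimate can be sketched for the two-parameter trend β_0 + β_1 n by solving the normal equations (XᵀX)β = Xᵀx in closed form; the year/price numbers follow the slide's table:

```python
def fit_line(ns, xs):
    # Least squares for trend[n] = b0 + b1*n: the 2x2 normal equations
    # (X^T X) beta = X^T x solved in closed form.
    N = len(ns)
    sn = sum(ns); sx = sum(xs)
    snn = sum(n * n for n in ns); snx = sum(n * x for n, x in zip(ns, xs))
    det = N * snn - sn * sn
    b1 = (N * snx - sn * sx) / det
    b0 = (snn * sx - sn * snx) / det
    return b0, b1

years = [1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507]
price = [17, 19, 20, 15, 13, 14, 14, 14]
b0, b1 = fit_line(years, price)
trend = [b0 + b1 * n for n in years]
residual = [x - t for x, t in zip(price, trend)]
```

The residuals sum to zero by construction (the intercept absorbs the mean), and the negative slope reflects the downward drift of these eight prices.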
2. Linear and non linear regression
Quadratic trend: x[n] = trend[n] + random[n] with trend[n] = β_0 + β_1 n + β_2 n², i.e. x = Xβ with rows (1, n, n²).
Let's assume ε[n] ~ N(0, σ_ε²) with Γ_ε[n_0] = σ_ε² δ[n_0].
First test (are all coefficients zero?): H_0: β = 0, H_1: β ≠ 0.
F = ( β̂ᵀ XᵀX β̂ / k ) / σ̂_ε² follows F(k, N − k) if H_0 is true; if F > F_P, reject H_0.
Second test (is a subset β_2 of the coefficients zero?): H_0: β_2 = 0, H_1: β_2 ≠ 0, partitioning X = (X_1 X_2) and defining P_1 = X_1 (X_1ᵀ X_1)⁻¹ X_1ᵀ.
F = ( β̂_2ᵀ X_2ᵀ (I − P_1) X_2 β̂_2 / k_2 ) / σ̂_ε² follows F(k_2, N − k) if H_0 is true.
Individual coefficients: H_0: β_i = β_i⁰, H_1: β_i ≠ β_i⁰.
t = ( β̂_i − β_i⁰ ) / ( σ̂_ε √e_ii ), with e_ii = [ (XᵀX)⁻¹ ]_ii, follows t(N − k) if H_0 is true.
2. Linear and non linear regression
Confidence intervals for the coefficients
Example: Y = [40, 45] + [0.05, 0.45] X — we got a certain regression line, but the true regression line lies within this region with 95% confidence.
Unbiased variance of the j-th regression coefficient: σ̂²_{β_j} = σ̂_ε² [ (XᵀX)⁻¹ ]_jj
Confidence interval for the j-th regression coefficient: β_j ∈ β̂_j ± t_{1−α/2, N−k} σ̂_{β_j}
2. Linear and non linear regression
Durbin-Watson test: tests whether the residuals are truly uncorrelated.
H_0: Γ_ε[n_0] = 0
H_1: Γ_ε[1] > 0
d = Σ_{n=2}^{N} ( ε[n] − ε[n−1] )² / Σ_{n=1}^{N} ε²[n]
Reject H_0 if d < d_L; do not reject H_0 if d > d_U (inconclusive in between).
Other tests: Durbin's h, Wallis' D_4, Von Neumann's ratio, Breusch-Godfrey.
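The d statistic above can be sketched directly; the white and AR(1)-correlated residual series below are assumptions used to show the two regimes (d near 2 for uncorrelated residuals, d pushed toward 0 by positive correlation):

```python
import random

def durbin_watson(res):
    # d = sum_{n=2}^{N} (e[n] - e[n-1])^2 / sum_{n=1}^{N} e[n]^2
    num = sum((res[n] - res[n - 1]) ** 2 for n in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    return num / den

random.seed(1)
white = [random.gauss(0.0, 1.0) for _ in range(5000)]
d_white = durbin_watson(white)          # near 2 for uncorrelated residuals

ar = [0.0]                              # strongly positively correlated residuals
for _ in range(4999):
    ar.append(0.9 * ar[-1] + random.gauss(0.0, 1.0))
d_ar = durbin_watson(ar)                # well below 2
```

For residuals with lag-1 correlation ρ, d ≈ 2(1 − ρ), which is why values far below 2 signal positive autocorrelation.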
2. Linear and non linear regression
Cochrane-Orcutt method of regression with correlated residuals:
1. Estimate a first model: ŷ^(0)[n] = β̂^(0)ᵀ x[n]
2. Estimate the (correlated) residuals: ε^(0)[n] = y[n] − ŷ^(0)[n]
3. Estimate the correlation ρ̂^(0) of the residuals
4. i = 1
5. Estimate uncorrelated residuals: ε^(i)[n] = ε^(i−1)[n] − ρ̂^(i−1) ε^(i−1)[n−1]
6. Estimate the uncorrelated output: y^(i)[n] = y[n] − ρ̂^(i−1) y[n−1]
7. Estimate the uncorrelated input: x^(i)[n] = x[n] − ρ̂^(i−1) x[n−1]
8. Re-estimate the model: ŷ^(i)[n] = β̂^(i)ᵀ x^(i)[n]
9. Estimate the residuals: ε^(i)[n] = y^(i)[n] − ŷ^(i)[n]
10. i = i + 1; repeat steps 5-9 until convergence of ρ̂^(i)
2. Linear and non linear regression
Assumptions of regression
The sample is representative of the population.
The dependent variable is noisy, but the predictors are not. Solution: Total Least Squares.
Predictors are linearly independent (i.e., no predictor can be expressed as a linear combination of the rest), although they can be correlated. Otherwise, this is called multicollinearity. Solutions: add more samples, remove the dependent predictor, PCA.
The errors are homoscedastic. Solution: Weighted Least Squares.
The errors are uncorrelated with the predictors and with themselves. Solution: Generalized Least Squares.
The errors follow a normal distribution. Solution: Generalized Linear Models.
2. Linear and non linear regression
More linear regression:
x[n] = β_0 + β_1 n + w[n]
x[n] = β_0 + β_1 n + β_2 x[n−1] + w[n]
x[n] = β_0 + β_1 n + β_2 ( x[n−1] + x[n−2] ) + w[n]
x[n] = β_0 + β_1 n + β_2 x[n−1] + β_3 x[n−2] + w[n]
x[n] = β_0 + β_1 n + α_0 M_0[n] + α_1 M_1[n] + ... + α_6 M_6[n] + w[n]
with seasonal dummies of period 7 years: M_0[n] = 1 if n = 7k and 0 otherwise, ..., M_6[n] = 1 if n = 7k + 6 and 0 otherwise.
Non linear regression:
x[n] = β_0 + β_1 n + β_2 sin( α_0 + α_1 n ) + w[n]
x[n] = β_0 n^{β_1} w[n]  →  log x[n] = log β_0 + β_1 log n + log w[n]  →  x'[n] = β'_0 + β'_1 n' + w'[n]
3. Polynomial fitting
Polynomial trends: x[n] = trend[n] + random[n] with trend[n] = β_0 + β_1 n + β_2 n², i.e. x = Xβ with rows (1, n, n²).
Orthogonal polynomials: rewrite trend[n] = β_0 + β_1 n + ... + β_q n^q as
trend[n] = α_0 φ_0(n) + α_1 φ_1(n) + ... + α_q φ_q(n)
where the φ_i satisfy ∫ φ_i(t) φ_j(t) dt = 0 for i ≠ j.
In matrix form: x = Xβ = (XR⁻¹)(Rβ) = Φα, with Φ = XR⁻¹ and α = Rβ.
Polynomial trends — orthogonal (Legendre) polynomials on [−1, 1]: ∫_{−1}^{1} φ_i(t) φ_j(t) dt = 0 for i ≠ j
φ_0(t) = 1
φ_1(t) = t
φ_2(t) = (3/2) t² − 1/2
φ_3(t) = (5/2) t³ − (3/2) t
Recurrence: φ_{i+1}(t) = ( (2i + 1)/(i + 1) ) t φ_i(t) − ( i/(i + 1) ) φ_{i−1}(t)
Grafted polynomials: S(t) is a cubic spline on an interval if the interval can be decomposed into subintervals such that S(t) is a polynomial of degree at most 3 on each subinterval, and the first and second derivatives of S(t) are continuous. With knots t_1, t_2, t_3, ..., in the truncated power basis:
S(t) = Σ_j d_j ( t − t_j )₊³
4. Cubic spline fitting
Cubic B-splines:
B³(t) = 2/3 − t² + |t|³/2   for |t| ≤ 1
B³(t) = (2 − |t|)³ / 6      for 1 < |t| ≤ 2
B³(t) = 0                   otherwise
S(t) = Σ_j d_j B³( t − t_j )

[Figure: the cubic B-spline basis function]

The B-spline coefficients d_j are related to the truncated power coefficients by a linear system obtained from the conditions that B³(t) and its derivatives up to order 3 vanish outside the support, together with the normalization ∫ B³(t) dt = 1.
4. Cubic spline fitting
S(t) = Σ_j d_j B³( t − t_j ). With knots on a regular grid t_j = j T_F and samples at t = nT:
S[n] = Σ_j d_j B³( nT/T_F − j ),  i.e.  S = Bd in matrix form.

[Figure: a set of shifted cubic B-splines covering the fitting interval]
4. Cubic spline fitting
Overfitting and forecasting

[Figure: a flexible fit that follows the data closely inside the observation interval may extrapolate poorly outside it]

4. Polynomial fitting with discontinuities

[Figure: piecewise fits with discontinuities at the breakpoints]
5. A short introduction to system analysis
A system T transforms an input series x[n] into an output y[n] = T(x).
Examples:
y[n] = 3 x[n]   (amplifier)
y[n] = x[n − 3]   (delay)
y[n] = x²[n]   (instant power)
y[n] = ( x[n+1] + x[n] + x[n−1] ) / 3   (smoother)
Box-Cox transformation (a family of systems):
y[n] = T(x) = (x[n]^λ − 1)/λ if λ > 0;  log(x[n]) if λ = 0
Season estimation:
m_est[n] = (1/12) [ (1/2)x[n−6] + x[n−5] + ... + x[n+5] + (1/2)x[n+6] ]
S_est[n] = (1/8)x[n−2] + (1/4)x[n−1] + (1/2)x[n] + (1/4)x[n+1] + (1/8)x[n+2] − m_est[n]

[Figure: x[n] and the delayed x[n−3]]
Basic system properties:
Systems with memory: y[n] = T( x[n], x[n−1], x[n−2], ... )   (present and past)
Memoryless systems: y[n] = T( x[n] )
Invertible systems: y[n] = T(x[n]) for which there exists T⁻¹ with x[n] = T⁻¹(y[n]) = T⁻¹(T(x[n]))
Causal systems: y[n] = T( x[n], x[n−1], x[n−2], ... )
Anticausal systems: y[n] = T( x[n], x[n+1], x[n+2], ... )   (present and future)
Stable systems: for all x with |x[n]| < B_x ∀n, there exists B_T such that |T(x)[n]| < B_T ∀n
Time-invariant systems: y[n] = T(x[n]) ⇒ y[n − n_0] = T(x[n − n_0])
Linear systems: T( a x_1[n] + b x_2[n] ) = a T(x_1[n]) + b T(x_2[n])
LTI: linear and time-invariant
5. A short introduction to system analysis
LTI systems

[Figure: for an LTI system, x_2[n] = x_1[n−3] produces y_2[n] = y_1[n−3]; x_3[n] = 2x_1[n] produces y_3[n] = 2y_1[n]; x_4[n] = x_1[n] + x_2[n] produces y_4[n] = y_1[n] + y_2[n]]
5. A short introduction to system analysis
Examples:
y[n] = 3 x[n]: LTI, memoryless, invertible, causal, stable
y[n] = x[n − 10]: LTI, with memory, invertible, causal, stable
y[n] = x²[n]: non-linear, time-invariant, memoryless, non-invertible, causal, stable
y[n] = ( x[n+1] + x[n] + x[n−1] ) / 3: LTI, with memory, invertible, non-causal, stable
Box-Cox y[n] = T(x): non-linear, memoryless, invertible, causal, unstable
Season estimators m_est[n] and S_est[n]: LTI, with memory, invertible, non-causal, stable
5. A short introduction to system analysis
Impulse response of an LTI system: the response h[n] = T(δ[n]) to the unit impulse δ[n].
For any input: y[n] = x[n] * h[n] = Σ_k x[k] h[n − k]   (convolution)
FIR: Finite Impulse Response. IIR: Infinite Impulse Response.
Example: the smoother y[n] = ( x[n+1] + x[n] + x[n−1] ) / 3 has
h[n] = ( δ[n+1] + δ[n] + δ[n−1] ) / 3

[Figure: δ[n] and the impulse responses of the smoother and of the season estimator]
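The convolution sum can be sketched in a few lines (the smoother's taps are written causally here purely for list indexing, an assumption of the sketch):

```python
def convolve(x, h):
    # y[n] = sum_k x[k] h[n-k]; output length len(x)+len(h)-1 (full convolution).
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y

# Taps of the 3-point smoother (shifted to start at index 0 for list indexing).
h = [1 / 3, 1 / 3, 1 / 3]
delta = [0.0] * 9
delta[4] = 1.0
resp = convolve(delta, h)   # the impulse comes out as a copy of h, shifted by 4
```

Feeding in an impulse returns the filter's own taps, which is exactly the defining property of the impulse response.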
5. A short introduction to system analysis
Difference equation: Σ_{k=0}^{N} a_k y[n − k] = Σ_{k=0}^{M} b_k x[n − k]
Taking the Z-transform (x[n − n_0] ↔ z^{−n_0} X(z)):
Σ_{k=0}^{N} a_k Y(z) z^{−k} = Σ_{k=0}^{M} b_k X(z) z^{−k}
Transfer function: H(z) = Y(z)/X(z) = ( Σ_{k=0}^{M} b_k z^{−k} ) / ( Σ_{k=0}^{N} a_k z^{−k} ),  z ∈ C
Impulse response of an LTI system: Y(z) = H(z) X(z)
Example: y[n] − y[n−1] = x[n] + 0.5 x[n−1]
Y(z) − Y(z) z^{−1} = X(z) + 0.5 X(z) z^{−1}
H(z) = Y(z)/X(z) = ( 1 + 0.5 z^{−1} ) / ( 1 − z^{−1} )
6. Spectral representation of stationary processes
[Figure: a yearly price series (1820-1870), its linear trend, and the detrended series]

x[n] = trend[n] + seasonal[n] + random[n]
trend[n] = −2777 + 1.6 n
y[n] = x[n] − trend[n] = seasonal[n] + random[n]
     = 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π ) + random[n]
with periods of 8 years and 12 years.
6. Spectral representation of stationary processes
Harmonic components: x[n] = A cos( ω n + φ ) + random[n]
A: amplitude;  ω: frequency (rad/sample);  φ: phase (rad)
In Hertz: x[n] = A cos( 2π f T_s n + φ ),  with f in Hz
A seasonal component of period N samples, seasonal[n] = seasonal[n + N], requires ω = 2π f T_s = 2π k / N,  k = 1, 2, 3, ...
Example: seasonal[n] = 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π )
with periods of 8 and 12 years; overall period = lcm(N_1, N_2) = 24 years.
6. Spectral representation of stationary processes
Harmonic components: x[n] = A cos( ω n + φ )

[Figure: x[n] = cos( (2π/15) n ) and x[n] = cos( (28π/15) n ) yield exactly the same samples]

cos( (28π/15) n ) = cos( (28π/15 − 2π) n ) = cos( −(2π/15) n ) = cos( (2π/15) n )
Discrete frequencies are only defined modulo 2π.
6. Spectral representation of stationary processes
Harmonic representation:
x[n] = 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π )
Finite sum of harmonics: x[n] = Σ_{k=1}^{K} A_k cos( ω_k n + φ_k )
Continuum of harmonics: x[n] = ∫_0^π A(ω) cos( ω n + φ(ω) ) dω
Fourier transform pair:
x[n] = (1/2π) ∫_{−π}^{π} X(ω) e^{jωn} dω,   X(ω) = Σ_n x[n] e^{−jωn}
x[n] = FT⁻¹{ X(ω) },   X(ω) = FT{ x[n] }
Since A cos( ωn + φ ) = (A/2)( e^{jφ} e^{jωn} + e^{−jφ} e^{−jωn} ), each harmonic contributes
X(ω) = π A e^{jφ} δ(ω − ω_0) + π A e^{−jφ} δ(ω + ω_0), and X(ω) = X*(−ω) for real signals.
Example:
Y(ω) = FT{ 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π ) }
     = 26.5 π e^{j2π/3} δ(ω − 2π/8) + 26.5 π e^{−j2π/3} δ(ω + 2π/8) + 22.5 π e^{jπ} δ(ω − 2π/12) + 22.5 π e^{−jπ} δ(ω + 2π/12)

[Figure: |Y(ω)| showing the average at ω = 0, low-speed variation near it, and medium- and high-speed components further out]
6. Spectral representation of stationary processes
Power spectral density (PSD):
Deterministic signals: S_X(ω) = |X(ω)|²
Stochastic processes: S_X(ω) = FT{ Γ_X[n_0] }
Example (white process): Γ_W[n_0] = σ_W² δ[n_0]  →  S_W(ω) = σ_W²
Example: y[n] = 26.5 cos( (2π/8) n + 2π/3 ) + 22.5 cos( (2π/12) n + π ) + w[n]
S_Y(ω) = (26.5π)² δ(ω − 2π/8) + (26.5π)² δ(ω + 2π/8) + (22.5π)² δ(ω − 2π/12) + (22.5π)² δ(ω + 2π/12)

[Figure: the detrended price series and its periodogram S_Y(ω)]
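A periodogram of the deterministic kind, S_X(ω) = |X(ω)|²/N, can be sketched by evaluating the DFT sum directly (the test harmonic and frequency grid are assumptions; for long series an FFT would replace the naive sum):

```python
import cmath, math

def periodogram(x, omegas):
    # S_X(omega) = |X(omega)|^2 / N with X(omega) = sum_n x[n] e^{-j omega n}.
    N = len(x)
    out = []
    for w in omegas:
        X = sum(x[n] * cmath.exp(-1j * w * n) for n in range(N))
        out.append(abs(X) ** 2 / N)
    return out

# A period-8 harmonic: the periodogram should peak at omega = 2*pi/8.
N = 256
x = [math.cos(2 * math.pi * n / 8) for n in range(N)]
omegas = [2 * math.pi * k / N for k in range(N // 2)]
S = periodogram(x, omegas)
peak = omegas[S.index(max(S))]   # 2*pi/8
```

Because the record length is a multiple of the period, the harmonic falls exactly on a frequency bin and all the power concentrates in a single peak.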
7. Detrending and filtering
Filtering w[n] with a filter H(z) gives x[n] with S_X(ω) = |H(ω)|² S_W(ω).
Highpass filtering: detrending
Bandpass filtering: detrending + denoising; seasonal estimation
Lowpass filtering: denoising; trend estimation
Identity: leaves the series unchanged

[Figure: |Y(ω)| split into low-, medium- and high-speed regions; x[n] and y[n] = x[n] − trend[n] together with their spectra S_X(ω) and S_Y(ω)]
7. Detrending and filtering
7. Detrending and filtering
Example (smoother):
y[n] = ( x[n−1] + x[n] + x[n+1] ) / 3
H(z) = (1/3)( z + 1 + z⁻¹ ),   H(ω) = H(z)|_{z = e^{jω}} = (1/3)( e^{jω} + 1 + e^{−jω} )
The complementary detrender:
y[n] = x[n] − ( x[n−1] + x[n] + x[n+1] ) / 3
H(z) = 1 − (1/3)( z + 1 + z⁻¹ ),   H(ω) = 1 − (1/3)( e^{jω} + 1 + e^{−jω} )

[Figure: |H(ω)|² and angle(H(ω)) for both filters]
7. Detrending and filtering
Example (smoothers):
H_1(z) = (1/3)( z + 1 + z⁻¹ )  →  y[n] = ( x[n−1] + x[n] + x[n+1] ) / 3
H_2(z) = (1/3)( 1 + z⁻¹ + z⁻² )  →  y[n] = ( x[n] + x[n−1] + x[n−2] ) / 3
H_3(z) = 1/2 + (1/4) z⁻¹ + (1/4) z⁻²  →  y[n] = (1/2) x[n] + (1/4) x[n−1] + (1/4) x[n−2]

[Figure: |H_i(ω)|² and angle(H_i(ω)) for the three smoothers]
7. Detrending and filtering
Example (yearly season estimation): with
m_est[n] = (1/12) [ (1/2)x[n−6] + x[n−5] + ... + x[n+5] + (1/2)x[n+6] ]
S_est[n] = (1/8)x[n−2] + (1/4)x[n−1] + (1/2)x[n] + (1/4)x[n+1] + (1/8)x[n+2] − m_est[n]
the trend estimator has transfer function
H_1(z) = M_est(z)/X(z) = (1/12)( (1/2)z⁶ + z⁵ + z⁴ + z³ + z² + z + 1 + z⁻¹ + z⁻² + z⁻³ + z⁻⁴ + z⁻⁵ + (1/2)z⁻⁶ )
the local estimator is
H_2(z) = Y'(z)/X(z) = (1/8)z² + (1/4)z + 1/2 + (1/4)z⁻¹ + (1/8)z⁻²
and the season estimator is H(z) = H_2(z) − H_1(z).

[Figure: |H_1(ω)|², |H_2(ω)|² and |H(ω)|²]
7. Detrending and filtering
-3 -2 -1 0 1 2 3
0
0.5
1
e
|H
1
(
e
)|
2
-3 -2 -1 0 1 2 3
0
0.5
1
e
|H
2
(
e
)|
2
6]) x[n 6] - 0.0611(x[n - 7]) x[n 7] - 0.0313(x[n - 8]) x[n 8] - n -0.0101(x[ ] [
1
+ + + + + + = n y
2]) x[n 2] - 0.0934(x[n 4]) x[n 4] - 0.0634(x[n - 5]) x[n 5] - 0.0802(x[n - + + + + + + +
0.2108x[n] 1]) x[n 1] - 0.1772(x[n + + + +
4] - 0.00056x[n 6]) - x[n 1] - n 0.00037(x[ - 8]) - x[n 93(x[n] 0000 . 0 ] [
2
+ + + = n y
8] - 58y[n . 0 7] - 4.3y[n - 6] - 14.6y[n 5] - 29.7y[n - 4] - 39y[n 3] - 34y[n - 2] - 19.3y[n 1] - 6.5y[n - + + + +
f c=2*pi / 12;
ef =2*pi / 60;
b1=f i r 1( 18, [ f c- ef f c+ef ] / pi ) ;
f c=2*pi / 12;
ef =2*pi / 60;
[ b1, a1] =but t er ( 4, [ f c- ef f c+ef ] / pi ) ;
34
7. Detrending and filtering

Example: Holt's linear exponential smoothing
1) Level:  S[n] = a x[n] + (1-a)( S[n-1] + y[n-1] )
2) Slope:  y[n] = b( S[n] - S[n-1] ) + (1-b) y[n-1]
3) N-step-ahead forecast:  S_N[n] = S[n-1] + N y[n-1]
In the z domain:
S(z) = a X(z) + (1-a)( S(z) + Y(z) ) z^-1
Y(z) = b( S(z) - S(z) z^-1 ) + (1-b) Y(z) z^-1
S_N(z) = S(z) z^-1 + N Y(z) z^-1
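Holt's recursions above can be sketched directly in Python. This is a minimal implementation; the initialisation (level = first sample, slope = first difference) is our choice, not the slide's:

```python
import numpy as np

def holt(x, a=0.5, b=0.3):
    """Holt's linear exponential smoothing: level S and slope updates."""
    S, slope = x[0], x[1] - x[0]          # simple initialisation (assumption)
    for xn in x[1:]:
        S_prev = S
        S = a * xn + (1 - a) * (S + slope)            # level update
        slope = b * (S - S_prev) + (1 - b) * slope    # slope update
    return S, slope

def forecast(S, slope, N):
    # N-step-ahead linear extrapolation of level + trend
    return S + N * slope

x = np.arange(10.0)                # a perfect linear trend x[n] = n
S, slope = holt(x)
print(forecast(S, slope, 3))       # extrapolates the trend: 12.0
```

On a noiseless linear trend the recursion locks onto the true slope, so the 3-step forecast after observing 0..9 is exactly 12.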
8. Non-stationary processes
35
Short-time Fourier Transform:
X[m, ω) = Σ_n x[n+m] w[n] e^{-jωn}
8. Non-stationary processes
36-38
Wavelet transform (figures).
8. Non-stationary processes
39
Empirical Mode Decomposition (figure).
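The STFT definition above can be sketched with NumPy alone. This illustration (window length, hop and the Hann window are our choices, not the slide's) shows how the dominant frequency of a non-stationary signal is tracked frame by frame:

```python
import numpy as np

def stft(x, win_len=64, hop=16):
    # X[m, k] = FFT over n of x[n+m] * w[n], one row per window position m
    w = np.hanning(win_len)
    frames = [x[m:m + win_len] * w for m in range(0, len(x) - win_len + 1, hop)]
    return np.array([np.fft.fft(f) for f in frames])

n = np.arange(512)
# frequency jumps from 0.05 to 0.25 cycles/sample at n = 256: non-stationary
x = np.where(n < 256, np.sin(2 * np.pi * 0.05 * n), np.sin(2 * np.pi * 0.25 * n))
X = stft(x)
peaks = np.abs(X[:, :32]).argmax(axis=1)   # dominant positive-frequency bin per frame
print(peaks[0], peaks[-1])                 # low bin first, high bin at the end
```

A plain Fourier transform of the whole record would show both frequencies at once; the STFT localises when each one is active, which is the point of the slide.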
40
Session outline
1. Goal
2. Linear and non-linear regression
3. Polynomial fitting
4. Cubic spline fitting
5. A short introduction to system analysis
6. Spectral representation of stationary processes
7. Detrending and filtering
8. Non-stationary processes
41
Bibliography
C. Chatfield. The analysis of time series: an introduction. Chapman &
Hall, CRC, 1996.
D.S.G. Pollock. A handbook of time-series analysis, signal processing and dynamics. Academic Press, 1999.
J. D. Hamilton. Time series analysis. Princeton Univ. Press, 1994.
1
Time Series Analysis
Session III: Probability models for time series
Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
2
2
Session outline
1. Goal
2. A short introduction to system analysis
3. Moving Average processes (MA)
4. Autoregressive processes (AR)
5. Autoregressive, Moving Average (ARMA)
6. Autoregressive, Integrated, Moving Average (ARIMA, FARIMA)
7. Seasonal, Autoregressive, Integrated, Moving Average (SARIMA)
8. Known external inputs: System identification
9. A family of models
10. Nonlinear models
11. Parameter estimation
12. Order selection
13. Model checking
14. Self-similarity, Fractal dimension, and Chaos theory
3
1. Goal
x[n] = trend[n] + periodic[n] + random[n]
Explained by statistical models (AR, MA, ...)
(Figure: realisations of the three components.)
random[n] = 0.9 random[n-1] + completelyRandom[n]                      (autoregressive example)
random[n] = completelyRandom[n] + 0.9 completelyRandom[n-1]            (moving-average example)
4
1. Goal
(Figure.)
5
2. A short introduction to system analysis
Difference equation:
Σ_{k=0..N} a_k y[n-k] = Σ_{k=0..M} b_k x[n-k]
Example: y[n] - y[n-1] = x[n] - 0.5 x[n-1]
Taking z-transforms (recall the shift property x[n+n0] <-> z^{n0} X(z)):
Σ_{k=0..N} a_k Y(z) z^-k = Σ_{k=0..M} b_k X(z) z^-k
Transfer function:
H(z) = Y(z)/X(z) = ( Σ_{k=0..M} b_k z^-k ) / ( Σ_{k=0..N} a_k z^-k )
Example: Y(z) - z^-1 Y(z) = X(z) - 0.5 z^-1 X(z)  ->  H(z) = (1 - 0.5 z^-1)/(1 - z^-1)
A system T maps the input sequence x[n] to the output y[n] = T(x).
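The difference equation above can be simulated directly. This is a small sketch (not from the slides) that computes y[n] recursively for arbitrary coefficient lists a and b, and checks the slide's example on a unit impulse:

```python
def difference_eq(x, a, b):
    """Solve sum_k a[k] y[n-k] = sum_k b[k] x[n-k] for y[n], assuming a[0] = 1:
    y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
        acc -= sum(ak * y[n - k] for k, ak in enumerate(a) if k >= 1 and n - k >= 0)
        y.append(acc)
    return y

# the slide's example: y[n] - y[n-1] = x[n] - 0.5 x[n-1]
x = [1.0, 0.0, 0.0, 0.0]                       # unit impulse
y = difference_eq(x, a=[1.0, -1.0], b=[1.0, -0.5])
print(y)   # impulse response of H(z) = (1 - 0.5 z^-1)/(1 - z^-1)
```

The impulse response 1, 0.5, 0.5, ... matches the series expansion of H(z) = (1 - 0.5 z^-1)/(1 - z^-1) = 1 + 0.5 z^-1 + 0.5 z^-2 + ...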
6
2. A short introduction to system analysis
Poles/Zeros
z0 is a pole of H(z) iff H(z0) = ∞.
z0 is a zero of H(z) iff H(z0) = 0.
Stability of LTI systems
A causal system is stable iff all its poles are inside the unit circle (|z| < 1).
Example: y[n] = ( x[n+1] + x[n] + x[n-1] ) / 3,  H(z) = (1/3) z + 1/3 + (1/3) z^-1
Poles: z = 0, ∞.  Zeros: z = -1/2 ± j √3/2, on the unit circle |z| = 1.
Invertibility of LTI systems
The transfer function of the inverse system of an LTI system whose transfer function is H(z) is H^-1(z). Therefore, the zeros of one system are the poles of its inverse, and vice versa.
7
2. A short introduction to system analysis
Downsampling: x_d[n] = x[nM]
Upsampling: x_e[n] = x[n/L] for n = 0, ±L, ±2L, ...; 0 otherwise  =  Σ_k x[k] δ[n-kL]
8
3. Moving average processes: MA(q)
Definition
x[n] = b_0 w[n] + b_1 w[n-1] + ... + b_q w[n-q]
H(z) = b_0 + b_1 z^-1 + b_2 z^-2 + ... + b_q z^-q = B(z)
LTI, with memory, invertible, causal, stable.
(Figure: white noise w[n], the outputs x_1[n] and x_20[n] of an MA(1) and an MA(20), and their ACFs: the longer the filter, the smoother the output and the wider the ACF.)
Γ_X[n0] = TF^-1{ |H(ω)|² S_W(ω) }
9
3. Moving average processes: MA(q)
Statistical properties
w[n] ~ N(0, σ_W²),  Γ_W[n0] = σ_W² δ[n0]
x[n] = Σ_{k=0..q} b_k w[n-k]  ->  x[n] ~ N(0, σ_W² Σ_{k=0..q} b_k²)
Γ_X[n0] = σ_W² Σ_{k=0}^{q-n0} b_k b_{k+n0}   for 0 ≤ n0 ≤ q
Γ_X[n0] = 0                                  for n0 > q
It has limited support!!
Proof
Γ_X[n0] = E{ x[n] x[n+n0] } = E{ ( Σ_{k=0..q} b_k w[n-k] ) ( Σ_{k'=0..q} b_{k'} w[n+n0-k'] ) }
        = Σ_{k=0..q} Σ_{k'=0..q} b_k b_{k'} E{ w[n-k] w[n+n0-k'] }
        = Σ_{k=0..q} Σ_{k'=0..q} b_k b_{k'} σ_W² δ[n0 - (k'-k)]
10
3. Moving average processes: MA(q)
Proof (cont'd.)
δ[n0 - (k'-k)] = 1 if k' - k = n0, and 0 otherwise, so in the (k, k') grid only the diagonal k' = k + n0 contributes:
Γ_X[n0] = σ_W² Σ_{k=0}^{q-n0} b_k b_{k+n0}  for 0 ≤ n0 ≤ q;  Γ_X[n0] = 0 for n0 > q.
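The limited support of the MA(q) autocovariance can be checked numerically. This short sketch (our own illustration, with an arbitrary coefficient choice) evaluates the formula Γ_X[n0] = σ_W² Σ b_k b_{k+n0} and confirms that it vanishes beyond lag q:

```python
import numpy as np

def ma_autocov(b, sigma_w2=1.0):
    """Theoretical autocovariance of an MA(q) process with coefficients b."""
    b = np.asarray(b, float)
    q = len(b) - 1
    # Gamma_X[n0] = sigma_W^2 * sum_k b_k b_{k+n0}; empty overlap gives 0
    return np.array([sigma_w2 * np.dot(b[:len(b) - n0], b[n0:])
                     for n0 in range(q + 2)])

g = ma_autocov([1.0, 0.5, 0.25])   # MA(2) with b = (1, 0.5, 0.25)
print(g)                           # nonzero up to lag 2, zero at lag 3
```

The last entry (lag q+1 = 3) is exactly zero, which is the "limited support" property that distinguishes MA from AR processes.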
11
3. Moving average processes: MA(q)
(Brute force) determination of the MA parameters
Given the estimated autocovariances, solve the nonlinear system
Γ_X[n0] = σ_W² Σ_{k=0}^{q-n0} b_k b_{k+n0} for 0 ≤ n0 ≤ q;  Γ_X[n0] = 0 for n0 > q.
Example (q = 2):
r_X[0] = σ_W² ( b_0² + b_1² + b_2² )
r_X[1] = σ_W² ( b_0 b_1 + b_1 b_2 )
r_X[2] = σ_W² b_0 b_2
12
3. Moving average processes: MA(q)
Invertibility
MA(q): x[n] = Σ_{k=0..q} b_k w[n-k],  H(z) = Σ_{k=0..q} b_k z^-k
Inverse MA^-1(q): H_inv(z) = 1/H(z) = 1 / ( Σ_{k=0..q} b_k z^-k )
w[n] = (1/b_0) ( x[n] - Σ_{k=1..q} b_k w[n-k] )
Example: y[n] = ( x[n] + x[n-1] + x[n-2] ) / 3 does not have a stable, causal inverse (its zeros lie on the unit circle |z| = 1).
y[n] = x[n] - 0.9 x[n-1] has a stable, causal inverse.
H(z) acts as a colouring filter; its inverse acts as a whitening filter.
Applications: channel equalization, channel estimation, compression, forecasting.
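The whitening recursion w[n] = (1/b_0)( x[n] - Σ b_k w[n-k] ) can be verified on a toy MA(1). This sketch (our own example, with b_1 = -0.9 as in the slide's invertible case) colours a known innovation sequence and then recovers it exactly:

```python
def invert_ma1(x, b1, b0=1.0):
    """Recover w[n] from x[n] = b0 w[n] + b1 w[n-1]
    via w[n] = (x[n] - b1 w[n-1]) / b0 (stable only if |b1/b0| < 1)."""
    w, w_prev = [], 0.0                 # w[-1] = 0 initialisation
    for xn in x:
        wn = (xn - b1 * w_prev) / b0
        w.append(wn)
        w_prev = wn
    return w

w_true = [1.0, -2.0, 0.5, 3.0]
# colouring filter: x[n] = w[n] - 0.9 w[n-1]
x = [w_true[0]] + [w_true[n] - 0.9 * w_true[n - 1] for n in range(1, 4)]
w_rec = invert_ma1(x, b1=-0.9)
print(w_rec)    # recovers w_true
```

Had the zero been on the unit circle (e.g. the 3-point average), the same recursion would not converge, which is why invertibility matters for whitening.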
13
3. Moving average processes: generalizations
1) Model not restricted to be causal:
x[n] = b_{-qF} w[n+qF] + ... + b_{-1} w[n+1] + b_0 w[n] + b_1 w[n-1] + ... + b_q w[n-q]
(anticausal component + causal component)
2) Model not restricted to be linear (Volterra kernels):
x[n] = Σ_k b_k w[n-k] + Σ_k Σ_{k'} b_{k,k'} w[n-k] w[n-k'] + Σ_k Σ_{k'} Σ_{k''} b_{k,k',k''} w[n-k] w[n-k'] w[n-k''] + ...
(linear + quadratic + cubic components)
or, with a static nonlinearity f:
x[n] = Σ_{k=0..q} b_k f( w[n-k] )
14
4. Autoregressive processes: AR(p)
Definition
x[n] = w[n] + a_1 x[n-1] + ... + a_p x[n-p]
H(z) = 1 / ( 1 - a_1 z^-1 - a_2 z^-2 - ... - a_p z^-p ) = 1/A(z)
LTI, with memory, invertible, causal, stable.
(Figure: realisations and a slowly decaying ACF(n0).)
15
4. Autoregressive processes: AR(p)
Relationship to MA processes (Laurent series):
H(z) = 1 / ( 1 - a_1 z^-1 - ... - a_p z^-p ) = 1 + Σ_{k=1..∞} b_k z^-k
i.e., an AR(p) process is an MA(∞) process.
Statistical properties
w[n] ~ N(0, σ_W²)  ->  x[n] ~ N(0, Γ_X[0])
Yule-Walker equations:
Γ_X[n0] = σ_W² δ[n0] + Σ_{k=1..p} a_k Γ_X[n0-k]
whose solution is
Γ_X[n0] = Σ_{k=1..p} A_k z_k^{n0},   where the z_k are the poles of H(z).
Poles of ) (z H
16
16
4. Autoregressive processes: AR(p)
Determination of the constants

=
= I
p
k
n
k k X
z A n
1
0
0
] [
Example:
] 2 [ ] 1 [ ] [ ] [
2 1
+ + = n x a n x a n w n x
2
2
1
1
1
1
) (


=
z a z a
z H
Poles:
k
A

=
=
p
k
n
k k X
z A n r
1
'
0
0
] [
2
0 0 0
1
[ ] [ ] [ ]
p
X W k X
k
n n a n k o o
=
I = + I

=
=
p
k
X k X
k n r a n r
1
0 0
] [ ] [ 0
0
> n
2
4
,
2
2
1 1
2 1
a a a
z z
+
= 1 , 1 , 1 1
2 1 2 1 2
> < + > < a a a a a z
i
0 0
2
'
2 1
'
1 0
] [
n n
X
z A z A n r + =
0 4
2
2
1
> + e a a R z
i
1 ] 0 [
'
2
'
1
= + = A A r
X
] 1 [ ] 0 [ ] 1 [
2 1 2
'
1 1
'
1
+ = + =
X X X
r a r a z A z A r
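The Yule-Walker recursion for the AR(2) example can be turned into a few lines of code. This sketch (our own, with arbitrary stationary coefficients) builds the normalised ACF from r[0] = 1 and r[1] = a_1/(1 - a_2), then propagates r[n0] = a_1 r[n0-1] + a_2 r[n0-2]:

```python
import numpy as np

def ar2_acf(a1, a2, nlags=5):
    """Normalised ACF of a stationary AR(2) from the Yule-Walker recursion."""
    r = [1.0, a1 / (1 - a2)]            # r[0] and r[1] from the slide
    for n in range(2, nlags + 1):
        r.append(a1 * r[n - 1] + a2 * r[n - 2])
    return np.array(r)

r = ar2_acf(0.5, 0.2)
print(r[:3])   # 1, a1/(1-a2) = 0.625, then the recursion takes over
```

Unlike the MA case, the ACF never cuts off: it decays geometrically with the poles z_k, which is the fingerprint used later for order selection.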
17
5. Autoregressive, Moving average: ARMA(p,q)
Definition
x[n] = Σ_{k=0..q} b_k w[n-k] + Σ_{k=1..p} a_k x[n-k]
H(z) = H_MA(q)(z) H_AR(p)(z) = B(z)/A(z) = ( b_0 + b_1 z^-1 + b_2 z^-2 + ... + b_q z^-q ) / ( 1 - a_1 z^-1 - a_2 z^-2 - ... - a_p z^-p )
Statistical properties
w[n] ~ N(0, σ_W²)  ->  x[n] ~ N(0, Γ_X[0])
Γ_X[n0] = σ_W² Σ_{k=0..q} b_k h[k-n0] + Σ_{k=1..p} a_k Γ_X[n0-k]
18
6. Autoregressive, Integrated, Moving Average: ARIMA(p,d,q)
Definition: w[n] -> ARMA(p,q) -> x_d[n] -> Int(d) -> x[n]
∇^d x[n] = ( δ[n] - δ[n-1] ) * ... * ( δ[n] - δ[n-1] ) * x[n]   (d convolutions)
X_d(z) = (1 - z^-1)^d X(z)   ->   H_Int(d)(z) = X(z)/X_d(z) = 1 / (1 - z^-1)^d
Poles: z = 1 with multiplicity d (the unit root).
H_ARIMA(p,d,q)(z) = X(z)/W(z) = ( X_d(z)/W(z) ) ( X(z)/X_d(z) ) = H_ARMA(p,q)(z) H_Int(d)(z)
Example for d = 1:
x[n] - x[n-1] = Σ_{k=0..q} b_k w[n-k] + Σ_{k=1..p} a_k ( x[n-k] - x[n-1-k] )
If d ∈ Q (fractional differencing): FARIMA or ARFIMA.
19
7. Seasonal ARIMA: SARIMA(p,d,q)x(P,D,Q)_s (Box-Jenkins model)
Definition: w[n] -> ARIMA(P,D,Q)_s (seasonal, period s) -> x_s[n] -> ARIMA(p,d,q) -> x[n]
H_ARIMAs(P,D,Q)(z) = X_s(z)/W(z),   H_ARIMA(p,d,q)(z) = X(z)/X_s(z)
H_SARIMA(p,d,q)x(P,D,Q)(z) = X(z)/W(z) = H_ARIMAs(P,D,Q)(z) H_ARIMA(p,d,q)(z)
For D = 1: x_s[n] - x_s[n-s] = Σ_{k=0..Q} B_k w[n-ks] + Σ_{k=1..P} A_k ( x_s[n-ks] - x_s[n-(k+1)s] )
For d = 1: x[n] - x[n-1] = Σ_{k=0..q} b_k x_s[n-k] + Σ_{k=1..p} a_k ( x[n-k] - x[n-1-k] )
20
7. Seasonal ARIMA: SARIMA(p,d,q)x(P,D,Q)_s
Example: SARIMA(1,0,0)x(0,1,1)_12
(p,d,q) = (1,0,0):  x[n] = a_1 x[n-1] + x_s[n]   ->   x_s[n] = x[n] - a_1 x[n-1]
(P,D,Q) = (0,1,1):  x_s[n] - x_s[n-12] = B_0 w[n] + B_1 w[n-12]
Substituting:
( x[n] - a_1 x[n-1] ) - ( x[n-12] - a_1 x[n-13] ) = B_0 w[n] + B_1 w[n-12]
x[n] = x[n-12] + B_0 w[n] + B_1 w[n-12] + a_1 ( x[n-1] - x[n-13] )
21
8. Known external inputs: System identification
ARX:
x[n] = Σ_{k=1..p} a_k x[n-k] + Σ_{k=0..q} b_k u[n-k] + w[n]
X(z) = ( B(z)/A(z) ) U(z) + ( 1/A(z) ) W(z)
ARMAX:
x[n] = Σ_{k=1..p} a_k x[n-k] + Σ_{k=0..q} b_k u[n-k] + Σ_{k=0..q'} c_k w[n-k]
X(z) = ( B(z)/A(z) ) U(z) + ( C(z)/A(z) ) W(z)
22
9. A family of models
General model:
X(z) = ( B(z) / (F(z) A(z)) ) U(z) + ( C(z) / (D(z) A(z)) ) W(z)
Polynomials used -> Name of the model:
A -> AR
C -> MA
AC -> ARMA
ACD -> ARIMA
AB -> ARX
ABC -> ARMAX
ABD -> ARARX
ABCD -> ARARMAX
BFCD -> Box-Jenkins
23
10. Nonlinear models
Nonlinear AR (neural networks, chaos):  x[n] = f( x[n-1], x[n-2], ..., x[n-p] ) + w[n]
Time-varying AR:  x[n] = Σ_{k=1..p} a_k[n] x[n-k] + w[n]
Random coefficient AR:  x[n] = Σ_{k=1..p} ( a_k + ε_k[n] ) x[n-k] + w[n]
Bilinear models:  x[n] = Σ_{k=1..p} a_k x[n-k] + Σ_{k=1..q} b_k x[n-k] w[n-M-k] + w[n]
Smooth transition:  x[n] = ( Σ_{k=1..p} a_k x[n-k] ) p[n] + ( 1 - p[n] ) ( Σ_{k=1..p} a'_k x[n-k] ) + w[n]
24
10. Nonlinear models
Threshold AR (TAR):
x[n] = Σ_{k=1..p} a_k^(1) x[n-k] + w[n]   if x[n-d] ≤ t
x[n] = Σ_{k=1..p} a_k^(2) x[n-k] + w[n]   if x[n-d] > t
Smooth TAR (STAR):
x[n] = Σ_{k=1..p} a_k^(1) x[n-k] + S( x[n-d] ) ( Σ_{k=1..p} a_k^(2) x[n-k] ) + w[n]
Heteroscedastic model: x[n] = σ[n] w[n],  with w[n] ~ N(0, 1)
ARCH:   σ²[n] = σ_0² + Σ_{k=1..p} a_k x²[n-k]
GARCH:  σ²[n] = σ_0² + Σ_{k=1..p} a_k x²[n-k] + Σ_{k=1..q} b_k σ²[n-k]
25
10. Nonlinear models
GARCH: σ²[n] = σ_0² + Σ_{k=1..p} a_k x²[n-k] + Σ_{k=1..q} b_k σ²[n-k]
Properties
The model is unique and stationary if Σ_{k=1..p} a_k + Σ_{k=1..q} b_k < 1.
Zero mean: E{ x[n] } = 0.
Lack of correlation:
Γ_x[n0] = ( σ_0² / ( 1 - Σ_{k=1..max(p,q)} (a_k + b_k) ) ) δ[n0]
26
10. Nonlinear models
GARCH: σ²[n] = σ_0² + Σ_{k=1..p} a_k x²[n-k] + Σ_{k=1..q} b_k σ²[n-k]
Estimation through Maximum Likelihood.
Forecasting: with z[n] = x²[n] - σ²[n] (observed up to time n; ẑ[n+h] = 0 for h ≥ 1),
x̂²[n+h] = σ_0² + Σ_{k=1..max(p,q)} ( a_k + b_k ) x̂²[n+h-k] + Σ_{k=1..q} b_k z[n+h-k]
using observed values of x²[·] and z[·] where available.
GARCH(1,1):
x̂²[n+1] = σ_0² + a_1 x²[n] + b_1 σ²[n]
x̂²[n+h] = σ_0² Σ_{k=0}^{h-1} ( a_1 + b_1 )^k + ( a_1 + b_1 )^{h-1} ( a_1 x²[n] + b_1 σ²[n] )
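The GARCH(1,1) forecast recursion and the closed form above (as reconstructed here) should agree for any horizon; this sketch with arbitrary illustrative parameters checks that:

```python
def garch11_forecast(sigma0_2, a1, b1, x2_n, sigma2_n, h):
    """h-step-ahead conditional-variance forecast, by recursion:
    seed  E{x^2[n+1]} = sigma0^2 + a1 x^2[n] + b1 sigma^2[n],
    then  E{x^2[n+h]} = sigma0^2 + (a1+b1) E{x^2[n+h-1]}  for h >= 2."""
    v = sigma0_2 + a1 * x2_n + b1 * sigma2_n
    for _ in range(h - 1):
        v = sigma0_2 + (a1 + b1) * v
    return v

def garch11_closed_form(sigma0_2, a1, b1, x2_n, sigma2_n, h):
    """Closed form: sigma0^2 sum_{k=0}^{h-1} s^k + s^{h-1}(a1 x^2[n] + b1 sigma^2[n])."""
    s = a1 + b1
    return sigma0_2 * sum(s ** k for k in range(h)) + s ** (h - 1) * (a1 * x2_n + b1 * sigma2_n)

v_rec = garch11_forecast(0.1, 0.2, 0.7, 1.5, 0.8, h=5)
v_cf = garch11_closed_form(0.1, 0.2, 0.7, 1.5, 0.8, h=5)
print(v_rec, v_cf)   # identical up to rounding
```

Since a_1 + b_1 = 0.9 < 1 the forecast converges, as h grows, to the unconditional variance σ_0²/(1 - a_1 - b_1), consistent with the stationarity condition on the previous slide.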
27
10. Nonlinear models
Extensions of GARCH
Exponential GARCH (EGARCH):
log σ²[n] = log σ_0² + Σ_{k=1..p} a_k log x²[n-k] + Σ_{k=1..q} b_k log σ²[n-k]
Integrated GARCH (IGARCH):
GARCH:   Σ_{k=1..p} a_k + Σ_{k=1..q} b_k < 1
IGARCH:  Σ_{k=1..p} a_k + Σ_{k=1..q} b_k = 1
28
11. Parameter estimation
Maximum Likelihood Estimates (MLE)
AR(1): x[n] = a_1 x[n-1] + w[n],  with w[n] ~ N(0, σ_W²), Γ_W[n0] = σ_W² δ[n0].
Assume that we observe ( x[1], x[2], ..., x[N] );  θ = { a_1, σ_W² }.
X_n = a_1 X_{n-1} + W_n = a_1 ( a_1 X_{n-2} + W_{n-1} ) + W_n = ... = W_n + a_1 W_{n-1} + a_1² W_{n-2} + a_1³ W_{n-3} + ...
E{ X_n } = 0
E{ X_n² } = E{ ( W_n + a_1 W_{n-1} + a_1² W_{n-2} + ... )² } = σ_W² ( 1 + a_1² + a_1⁴ + a_1⁶ + ... ) = σ_W² / ( 1 - a_1² )
->  X_1 | θ ~ N( 0, σ_W²/(1 - a_1²) )
X_2 = a_1 X_1 + W_2,  so W_2 = X_2 - a_1 X_1 ~ N(0, σ_W²)
->  X_2 | X_1, θ ~ N( a_1 x_1, σ_W² )
29
11. Parameter estimation
Maximum Likelihood Estimates (MLE)
X_1 | θ ~ N( 0, σ_W²/(1 - a_1²) ),  X_2 | X_1, θ ~ N( a_1 x_1, σ_W² ),  X_3 | X_2, θ ~ N( a_1 x_2, σ_W² ), ...
f_{X1 X2 ... XN | θ}( x_1, x_2, ..., x_N ) = f_{X1|θ}(x_1) f_{X2|X1,θ}(x_2) f_{X3|X2,θ}(x_3) ... f_{XN|XN-1,θ}(x_N)
L(θ) = log f_{X1...XN|θ}( x_1, ..., x_N )
     = -(1/2) log( 2π σ_W²/(1 - a_1²) ) - x_1² (1 - a_1²) / (2 σ_W²) - ((N-1)/2) log( 2π σ_W² ) - Σ_{n=2..N} ( x_n - a_1 x_{n-1} )² / (2 σ_W²)
( a_1, σ_W² ) = argmax L(θ):   ∂L(θ)/∂a_1 = 0,   ∂L(θ)/∂σ_W² = 0
Numerical, iterative solution.
Confidence intervals.
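A common simplification of the likelihood above is the conditional MLE, which drops the stationary density of x[1]; maximising the remaining Gaussian terms reduces to least squares and gives a closed form. A sketch (our own, on simulated data):

```python
import numpy as np

def ar1_conditional_mle(x):
    """Conditional (on x[1]) ML estimates for x[n] = a1 x[n-1] + w[n].
    Maximising the Gaussian conditional likelihood is least squares in a1."""
    x = np.asarray(x, float)
    a1 = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    resid = x[1:] - a1 * x[:-1]
    sigma_w2 = np.mean(resid ** 2)
    return a1, sigma_w2

rng = np.random.default_rng(0)
N = 5000
w = rng.standard_normal(N)
x = np.zeros(N)
for n in range(1, N):                 # simulate AR(1) with a1 = 0.7
    x[n] = 0.7 * x[n - 1] + w[n]
a1_hat, s2_hat = ar1_conditional_mle(x)
print(a1_hat, s2_hat)                 # close to 0.7 and 1.0
```

For long series the conditional and exact MLE are practically identical; the exact version on the slide only adds the first-sample term, which requires the numerical optimisation mentioned there.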
30
11. Parameter estimation
Least Squares Estimates (LSE)
AR(1): x[n] = a_1 x[n-1] + w[n];  predictor x̂[n] = a_1 x[n-1];  residual w[n] = x[n] - x̂[n], with E{ w[n] } = 0.
σ_W² = E{ w²[n] } = Γ_X[0] - 2 a_1 Γ_X[1] + a_1² Γ_X[0]
∂σ_W²/∂a_1 = 0 = -2 Γ_X[1] + 2 a_1 Γ_X[0]   ->   a_1 = Γ_X[1] / Γ_X[0]
σ_W² = Γ_X[0] - Γ_X[1]² / Γ_X[0]
31
12. Order selection
If I have to fit a model ARMA(p,q), what are the p and q values I have to supply?
ACF/PACF analysis
Akaike Information Criterion:   AIC(p,q) = log σ̂_W² + 2(p+q)/N
Bayesian Information Criterion: BIC(p,q) = log σ̂_W² + (p+q) log N / N
Final Prediction Error:         FPE(p) = σ̂_W² (N+p)/(N-p)
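The three criteria above are one-liners; this sketch (with hypothetical residual variances, chosen only for illustration) picks the order that minimises the BIC:

```python
import numpy as np

def aic(sigma_w2, p, q, N):
    return np.log(sigma_w2) + 2 * (p + q) / N

def bic(sigma_w2, p, q, N):
    return np.log(sigma_w2) + (p + q) * np.log(N) / N

def fpe(sigma_w2, p, N):
    return sigma_w2 * (N + p) / (N - p)

# hypothetical residual variances of fitted AR(p) models, p = 1..4:
sigmas = {1: 1.30, 2: 1.05, 3: 1.04, 4: 1.035}
N = 200
best_p = min(sigmas, key=lambda p: bic(sigmas[p], p, 0, N))
print(best_p)   # the big drop happens at p = 2; beyond that the penalty wins
```

Note how the fit improves sharply from p = 1 to p = 2 but only marginally afterwards, so the log N penalty makes BIC stop at 2: exactly the over-fitting guard these criteria are for.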
32
12. Order selection
Partial correlation coefficients (PACF)
For each lag n0, fit the best linear predictor of order n0:
x[n] = φ_{n0,1} x[n-1] + φ_{n0,2} x[n-2] + ... + φ_{n0,n0} x[n-n0] + w[n]
The PACF at lag n0 is the last coefficient, φ_{n0,n0}. The coefficients solve the Yule-Walker equations
r[n0'] = Σ_{k=1..n0} φ_{n0,k} r[n0'-k],   n0' = 1, ..., n0
i.e., in matrix form, a Toeplitz system:
[ r[0]     r[1]     r[2]   ...  r[n0-1] ] [ φ_{n0,1}  ]   [ r[1]  ]
[ r[1]     r[0]     r[1]   ...  r[n0-2] ] [ φ_{n0,2}  ] = [ r[2]  ]
[ ...      ...      ...    ...  ...     ] [ ...       ]   [ ...   ]
[ r[n0-1]  r[n0-2]  ...    r[1] r[0]    ] [ φ_{n0,n0} ]   [ r[n0] ]
33
12. Order selection
Thumb rule
ARMA(1,0): ACF: exponential decrease; PACF: one peak
ARMA(2,0): ACF: exponential decrease or waves; PACF: two peaks
ARMA(p,0): ACF: unlimited, decaying; PACF: limited
ARMA(0,1): ACF: one peak; PACF: exponential decrease
ARMA(0,2): ACF: two peaks; PACF: exponential decrease or waves
ARMA(0,q): ACF: limited; PACF: unlimited decaying
ARMA(1,1): ACF&PACF: exponential decrease
ARMA(p,q): ACF: unlimited; PACF: unlimited
34
13. Model checking
Residual Analysis
Example: ARMA(1,1): x[n] = a x[n-1] + b_0 w[n] + b_1 w[n-1]
->  w[n] = (1/b_0) ( x[n] - a x[n-1] - b_1 w[n-1] ),  initialised with w[0] = 0.
Assumptions
1. Gaussianity:
   1. The input random signal w[n] is univariate normal with zero mean.
   2. The output signal x[n] (the time series being studied) is multivariate normal, and its covariance structure is fully determined by the model structure and parameters.
2. Stationarity: x[n] is stationary once the necessary operations to produce a stationary signal have been carried out.
3. Residual independency: the input random signal w[n] is independent of all previous samples.
35
13. Model checking
Structural changes
Fit y[n] = f(x, y) + ε over the whole series, y[n] = f_1(x, y) + ε_1 over the first period and y[n] = f_2(x, y) + ε_2 over the second period, and compute the sums of squares of the residuals
S_i = Σ_n ( y_i[n] - ŷ_i[n] )²   (S for the full fit, S_1 and S_2 for the two periods).
Chow test:
H_0: f_1 = f_2
H_1: f_1 ≠ f_2
F = [ ( S - (S_1 + S_2) ) / k ] / [ (S_1 + S_2) / (N_1 + N_2 - 2k) ] ~ F( k, N_1 + N_2 - 2k )
where N_1, N_2 are the number of samples in each period and k the number of parameters in the model.
Assumption: the variance is the same in both regions. Solution: robust standard errors.
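The Chow statistic above is a simple ratio once the three residual sums of squares are available. A sketch with hypothetical values (the S's here are illustrative numbers, not real fits):

```python
def chow_statistic(S, S1, S2, k, N1, N2):
    """Chow test statistic F = [(S - (S1+S2))/k] / [(S1+S2)/(N1+N2-2k)],
    to be compared against an F(k, N1+N2-2k) distribution."""
    num = (S - (S1 + S2)) / k
    den = (S1 + S2) / (N1 + N2 - 2 * k)
    return num / den

# hypothetical residual sums of squares from the full fit and the two sub-period fits
F = chow_statistic(S=120.0, S1=40.0, S2=50.0, k=2, N1=50, N2=50)
print(F)
```

A large F (here 16, far beyond typical F(2, 96) critical values) means the pooled fit is much worse than the two separate fits, i.e., evidence of a structural change.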
36
36
13. Model checking
Diagnostic checking
1. Compute and plot the residual error
2. Check that its mean is approximately zero
3. Check for the randomness of the residual, i.e., there are no time intervals where the
mean is significantly different from zero (intervals where the residual is
systematically positive or negative).
4. Check that the residual autocorrelation is not significantly different from zero for all
lags
5. Check that the residual is normally distributed.
6. Check if there are residual outliers.
7. Check the ability of the model to predict future samples
37
14. Self-similarity, Fractal dimension, Chaos theory
Intuitively, a fractal is a curve that is self-similar at all scales.
Koch curve (construction steps k = 0, 1, 2, 3, 4, 5).
Power-law spectrum: S_X(ω) ∝ ω^-α
38
14. Self-similarity, Fractal dimension, Chaos theory
Box counting: cover the object with boxes of size ε and count how many, N, are needed.
A segment of length L = 1 m:
ε = 1 m -> N = 1;  ε = 0.5 m -> N = 2;  ε = 0.25 m -> N = 4   ->   N = ε^-1  (D = 1)
A square of area S = 1 m²:
ε² = 1 m² -> N = 1;  ε² = 0.25 m² -> N = 4;  ε² = 0.0625 m² -> N = 16   ->   N = ε^-2  (D = 2)
39
14. Self-similarity, Fractal dimension, Chaos theory
In general N = ε^-D, which defines the fractal dimension D.
Koch curve: at step k, ε_k = (1/3)^k m and N_k = 4^k, so
4^k = ( (1/3)^k )^-D = 3^{kD}   ->   D = log 4 / log 3 ≈ 1.26
40
14. Self-similarity, Fractal dimension, Chaos theory
For a self-similar process, S_X(ω) ∝ ω^-α with α = 5 - 2D, and the Hurst exponent is H = 2 - D.
H ∈ (0, 0.5) (e.g. 0.2): long-range dependent, alternating signs (anti-persistent)
H = 0.5: uncorrelated time series
H ∈ (0.5, 1) (e.g. 0.8): long-range dependent, same signs (persistent)
Aggregating the series over windows of length N_max, x_MA[n] = (1/N_max) Σ_{n'=n}^{n+N_max-1} x[n'], the standard deviation scales as σ_MA ∝ N_max^{H-1}.
41
14. Self-similarity, Fractal dimension, Chaos theory
42
Chaotic systems
Logistic equation:  x[n] = 4 x[n-1] ( 1 - x[n-1] ),  x[0] = 0.1
(Figure: the orbit x_{0.1}[n], and the difference x_{0.1}[n] - x_{0.100001}[n], which grows to order 1 after a few tens of samples: sensitivity to initial conditions.)
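The sensitivity to initial conditions shown in the figure can be reproduced in a few lines (our own sketch of the slide's experiment):

```python
def logistic_orbit(x0, n):
    """Iterate the logistic map x[n] = 4 x[n-1] (1 - x[n-1])."""
    xs = [x0]
    for _ in range(n):
        xs.append(4 * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.1, 60)
b = logistic_orbit(0.100001, 60)          # initial condition perturbed by 1e-6
diff = [abs(u - v) for u, v in zip(a, b)]
print(diff[0], max(diff))                 # 1e-6 initially, order 1 eventually
```

A perturbation of one part in a hundred thousand in x[0] roughly doubles every iteration (the Lyapunov exponent of this map is log 2), so after a few dozen steps the two orbits are completely decorrelated: chaos without any randomness in the model.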
43
14. Self-similarity, Fractal dimension, Chaos theory
Phase space
h-history:  x_h[n] = ( x[n], x[n-1], ..., x[n-h+1] ) ∈ R^h
2-history:  x_2[n] = ( x[n], x[n-1] ) ∈ R²
3-history:  x_3[n] = ( x[n], x[n-1], x[n-2] ) ∈ R³
2-phase space: the points x_2[1] = ( x[1], x[0] ), x_2[2] = ( x[2], x[1] ), x_2[3] = ( x[3], x[2] ), ...
Attractor (fixed point).
44
14. Self-similarity, Fractal dimension, Chaos theory
Recurrence plots
R_{h,ε}(n, n') = 1 if ||x_h[n] - x_h[n']|| < ε;  0 if ||x_h[n] - x_h[n']|| ≥ ε
(Figure: R_{2,ε}(n, n') for white noise, a sinusoid with trend, a chaotic system and an AR model.)
45
14. Self-similarity, Fractal dimension, Chaos theory
Recurrence plots (further examples).
46
14. Self-similarity, Fractal dimension, Chaos theory
Correlation dimension (Grassberger-Procaccia plots)
Correlation integral:
C_h(ε) = Pr( ||x_h[n] - x_h[n']|| < ε ) = lim_{N->∞} ( 2/(N(N-1)) ) Σ_{n<n'} H( ε - ||x_h[n] - x_h[n']|| )
with the Heaviside function H(x) = 0 for x ≤ 0, 1 for x > 0.
Correlation dimension:
D_h = lim_{ε->0} log C_h(ε) / log ε
If D_h = h for every embedding dimension h: random. If D_h saturates below h: maybe chaotic.
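The correlation integral can be computed by brute force on the h-histories; this small sketch (our own, on a toy series, using the max-norm as a convenient choice of distance) counts the fraction of close pairs:

```python
import numpy as np

def histories(x, h):
    """h-histories x_h[n] = (x[n], x[n-1], ..., x[n-h+1])."""
    return np.array([x[n - h + 1:n + 1][::-1] for n in range(h - 1, len(x))])

def correlation_integral(x, h, eps):
    """C_h(eps): fraction of pairs of h-histories closer than eps (max-norm)."""
    X = histories(np.asarray(x, float), h)
    N = len(X)
    count = 0
    for i in range(N):
        for j in range(i + 1, N):
            if np.max(np.abs(X[i] - X[j])) < eps:
                count += 1
    return 2.0 * count / (N * (N - 1))

C = correlation_integral([0.0, 0.1, 0.0, 0.1, 0.0], h=2, eps=0.05)
print(C)   # 2 of the 6 history pairs coincide
```

In practice one evaluates C_h(ε) over a range of ε and h and reads the slope of log C_h(ε) versus log ε, which is the Grassberger-Procaccia estimate of D_h.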
47
14. Self-similarity, Fractal dimension, Chaos theory
Brock-Dechert-Scheinkman (BDS) Test
V_h(ε) = sqrt(N) ( C_h(ε) - C_1(ε)^h ) / σ_h(ε) ~ N(0, 1)   under H_0: the samples are iid.
Lyapunov exponent
Consider two time points such that δ_0 = ||x_h[n] - x_h[n']|| ≪ 1, and the distance m samples later, δ_m = ||x_h[n+m] - x_h[n'+m]||.
The Lyapunov exponent λ relates these two distances: δ_m = δ_0 e^{λm}.
λ > 0: histories diverge (chaos, cannot be predicted).
λ < 0: histories converge (can be predicted).
Maximal Lyapunov exponent:  λ = lim_{m->∞} (1/m) log( δ_m / δ_0 )
48
Bibliography
C. Chatfield. The analysis of time series: an introduction. Chapman &
Hall, CRC, 1996.
D.S.G. Pollock. A handbook of time-series analysis, signal processing and dynamics. Academic Press, 1999.
J. D. Hamilton. Time series analysis. Princeton Univ. Press, 1994.
Time Series Analysis
Session IV: Forecasting and Data Mining
Carlos Óscar Sánchez Sorzano, Ph.D.
Madrid
2
Session outline
1. Forecasting
2. Univariate forecasting
3. Intervention modelling
4. State-space modelling
5. Time series data mining
1. Time series representation
2. Distance measure
3. Anomaly/Novelty detection
4. Classification/Clustering
5. Indexing
6. Motif discovery
7. Rule extraction
8. Segmentation
9. Summarization
3
1. Forecasting
Goal: predict x[n+h].
x̂_n[h] = g[h] = argmin_g E{ L( x[n+h], g[h] ) }
Symmetric loss functions:
Quadratic loss function:  L( x[n+h], x̂[n+h] ) = ( x[n+h] - x̂[n+h] )²
Other loss functions:     L( x[n+h], x̂[n+h] ) = | x[n+h] - x̂[n+h] |
Asymmetric loss functions:
L( x[n+h], x̂[n+h] ) = a ( x[n+h] - x̂[n+h] )²  if x̂[n+h] ≤ x[n+h]
L( x[n+h], x̂[n+h] ) = b ( x[n+h] - x̂[n+h] )²  if x̂[n+h] > x[n+h]
Solution (quadratic loss):  x̂_n[h] = E{ x[n+h] | x[1], x[2], ..., x[n] }
Univariate Forecasting: use only samples of the time series to be predicted.
Multivariate Forecasting: use samples of the time series to be predicted and other companion time series.
4
2. Univariate Forecasting
Trend and seasonal component extrapolation:
x[n] = trend[n] + seasonal[n] + random[n]
trend[n] = 2777 + 1.6 n
seasonal[n] = 26.5 cos( 2πn/8 + π ) + 22.5 cos( 2πn/12 + 2π/3 )
(Figure: fitted trend and seasonal components, extrapolated beyond the data.)
5
2. Univariate Forecasting
Exponential smoothing:
x̂[n+1] = α x[n] + (1-α) x̂[n]
H(z) = X̂(z)/X(z) = α / ( z - (1-α) )
Equivalently, in error-correction form: x̂[n+1] = x̂[n] + α ( x[n] - x̂[n] ) = x̂[n] + α e[n], initialised with x̂[1] = x[1].
(Figure: price series 1820-1870, smoothed series, detrended series and residual; |H(ω)|² and angle(H(ω)) show the lowpass behaviour of the smoother.)
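The exponential-smoothing recursion above takes only a few lines. A minimal sketch (initialisation as on the slide, x̂ = first observation):

```python
def exp_smooth_forecast(x, alpha):
    """One-step-ahead forecasts x_hat[n+1] = alpha x[n] + (1-alpha) x_hat[n]."""
    xhat = [x[0]]                        # initialise with the first observation
    for xn in x:
        xhat.append(alpha * xn + (1 - alpha) * xhat[-1])
    return xhat[1:]                      # element k forecasts x[k]

f = exp_smooth_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5)
print(f)   # each forecast is a geometric average of all past samples
```

Larger α tracks the series faster but smooths less; α = 1 reduces to the naive forecast x̂[n+1] = x[n].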
6
2. Univariate forecasting
Model based forecasting (Box-Jenkins procedure)
1. Model identification: examine the data to select an appropriate model structure (AR, ARMA, ARIMA, SARIMA, etc.)
2. Model estimation: estimate the model parameters.
3. Diagnostic checking: examine the residuals of the model to check if it is valid.
4. Consider alternative models if necessary: if the residual analysis reveals that the selected model is not appropriate.
5. Use the model difference equation to predict: as shown in the next two slides.
7
2. Univariate forecasting
Model based forecasting
Example: ARMA(1,1):  x[n] = a x[n-1] + w[n] + b w[n-1]
x[n+1] = a x[n] + w[n+1] + b w[n]
H_W(z) = X(z)/W(z) = ( 1 + b z^-1 ) / ( 1 - a z^-1 )
One-step predictor (E{ w[n+1] } = 0): x̂[n+1] = a x[n] + b w[n]; expressing w[n] in terms of the data and past predictions,
x̂[n+1] = (a + b) x[n] - b x̂[n]
h-step (h > 1): x̂[n+h] = a x̂[n+h-1] = a² x̂[n+h-2] = ... = a^{h-1} x̂[n+1] = a^{h-1} ( (a + b) x[n] - b x̂[n] )
8
2. Univariate forecasting
Model based forecasting
Example: SARIMA(1,0,0)x(0,1,1)_12:
x[n] = x[n-12] + α ( x[n-1] - x[n-13] ) + w[n] + β w[n-12]
x[n+1] = x[n-11] + α ( x[n] - x[n-12] ) + w[n+1] + β w[n-11]
H_W(z) = X(z)/W(z) = ( 1 + β z^-12 ) / ( 1 - α z^-1 - z^-12 + α z^-13 )
Eliminating w[n]:
w[n] = x[n] - α x[n-1] - x[n-12] + α x[n-13] - β w[n-12]
Substituting into the one-step equation and setting E{ w[n+1] } = 0 yields x̂[n+1] as a function of past samples and past predictions.
9
2. Univariate Forecasting
10
2. Univariate forecasting
Predictive power test: fit y[n] = f_1(x, y) + ε_1 on the first N_1 samples and y[n] = f(x, y) + ε on all N_1 + N_2 samples.
Chow test:
H_0: f_1 = f
H_1: f_1 ≠ f
F = [ ( S - S_1 ) / N_2 ] / [ S_1 / ( N_1 - k ) ] ~ F( N_2, N_1 - k )
Assumption: the variance is the same in both regions. Solution: robust standard errors.
2. Univariate forecasting
11
Artificial Neural Network (ANN):
y_t = f( y_{t-1}, y_{t-2}, ..., y_{t-p}; w ) + ε_t = ŷ_t + ε_t
2. Univariate forecasting
12
Fuzzy Artificial Neural Network (ANN): membership functions μ_a(x): R^M -> [0, 1]
2. Univariate forecasting
13-15
Fuzzy Artificial Neural Network (ANN) (figures).
16
3. Intervention modelling
What happens in the time series of a price if there is a tax raise of 3% in 1990?
price[n] = α price[n-1] + w[n] + 0.03 u[n-1990],   with u[n] = 1 for n ≥ 0 and 0 for n < 0
price(z) = α price(z) z^-1 + w(z) + 0.03 z^-1990 / ( 1 - z^-1 )
(Figure: the step input.)
What happens in the time series of a price if there is a tax raise of 3% between 1990 and 1992?
price[n] = α price[n-1] + w[n] + 0.03 ( u[n-1990] - u[n-1993] )
price(z) = α price(z) z^-1 + w(z) + 0.03 ( z^-1990 - z^-1993 ) / ( 1 - z^-1 )
(Figure: the pulse input.)
17
3. Intervention modelling
What happens in the time series of a price if in 1990 there was an earthquake?
price[n] = α price[n-1] + w[n] + β δ[n-1990],   with δ[n] = 1 for n = 0 and 0 otherwise
price(z) = α price(z) z^-1 + w(z) + β z^-1990
What happens in the time series of a price if there is a steady tax raise of 3% since 1990?
price[n] = α price[n-1] + w[n] + 0.03 ( n - 1989 ) u[n-1990]
price(z) = α price(z) z^-1 + w(z) + 0.03 z^-1990 / ( 1 - z^-1 )²
(Figure: impulse and ramp inputs.)
18
3. Intervention modelling
In general:
x[n] = f( x[n-1], ..., x[n-q], w[n], w[n-1], ..., w[n-p] ) + f( i[n] )
X(z) = ( B(z)/A(z) ) W(z) + ( C(z)/D(z) ) I(z)
We are back to the system identification problem with external inputs.
19
3. Intervention modelling: outliers revisited
Additive outliers:      X(z) = ( B(z)/A(z) ) W(z) + I(z)
Innovational outliers:  X(z) = ( B(z)/A(z) ) W(z) + ( C(z)/D(z) ) I(z)
Level outliers:         X(z) = ( B(z)/A(z) ) W(z) + ( 1/(1 - z^-1) ) I(z)
Time change outliers:   X(z) = ( B(z)/A(z) ) W(z) + ( 1/D(z) ) I(z)
(Figures: pulse- and step-like effects on the series.)
20
4. State-space modelling
A state-transition system drives the (hidden) state x[n] ∈ R^k; an observation system produces the observed time series y[n] ∈ R^l; noise enters both.
Linear, Gaussian system:
x[n] = F x[n-1] + ε_1[n]      (state-transition model)
y[n] = H x[n] + ε_2[n]        (observation model)
x[0] ~ N( E_0, Σ_0 ),  ε_1[n] ~ N( 0, Σ_1 ),  ε_2[n] ~ N( 0, Σ_2 )
Kalman filter: given y[n], H, F, E_0, Σ_0, Σ_1, Σ_2, estimate x[n].
Time series modelling (System Identification): given y[n], estimate x[n], H, F, Σ_0, Σ_1, Σ_2.
21
4. State-space modelling
22
4. State-space modelling
General system:
x[n] = f( x[n-1], ε_1[n] )
y[n] = h( x[n], ε_2[n] )
Extended/Unscented Kalman filter: f, h differentiable; given y[n], h, f and the noise statistics, estimate x[n], Σ_1, Σ_2.
Particle filter: given y[n], h, f and the probability density functions of ε_1, ε_2, estimate x[n] and the free parameters of the noise signals.
23
5. Time series data mining
(Source: http://www.kdnuggets.com/polls/2004/time_series_data_mining.htm)
24
5. Time series data mining
Time series representation: a taxonomy
Data adaptive: Sorted Coefficients; Singular Value Decomposition; Piecewise Polynomial (Piecewise Linear Approximation: interpolation, regression; Adaptive Piecewise Constant Approximation); Symbolic (Natural Language; Strings; Trees)
Non data adaptive: Spectral (Discrete Fourier Transform; Discrete Cosine Transform); Wavelets (Orthonormal: Haar, Daubechies dbn with n > 1; Bi-Orthonormal; Coiflets; Symlets); Piecewise Aggregate Approximation; Random Mappings
(Figure: the same series represented with DFT, DWT, SVD, APCA, PAA, PLA and symbolically, SYM: UUCUCUCD.)
25
5. Time series data mining
Time series representation
(Figure: a series discretised into the symbol string baabccbc.)
26
5. Time series data mining
Time series representation
(Source: http://www.cs.cmu.edu/~bobski/pubs/tr01108-onesided.pdf)
27
5. Time series data mining
Time series distance measure
28
5. Time series data mining
(Source: http://www.cs.ucr.edu/~wli/SSDBM05/)
Anomaly/Novelty detection
A very complex and noisy ECG, but according to a cardiologist there is only one abnormal heartbeat. The algorithm easily finds it.
29
5. Time series data mining
Classification/Clustering
30
5. Time series data mining
Indexing
31
5. Time series data mining
Motif discovery
(Figure: Winding Dataset, the angular speed of reel 2; three recurring instances of motifs A, B and C.)
5. Time series data mining
32
Motif discovery
33
5. Time series data mining
Rule extraction
5. Time series data mining
34
Rule extraction
Step 1: Convert the time series into a symbol sequence, e.g. by quantising the relative change
y[n] = ( x[n+1] - x[n] ) / x[n]
5. Time series data mining
35
Rule extraction
Step 2: Identify frequent itemsets
x[n]=abbcaabbcabababcaabcabbcabcbcabbaabcbcabbcbc
2-item sets (Min. Support = 5): aa 3, ab 11, ac 1, ba 2, bb 5, bc 11, ca 7, cb 3, cc 0
3-item sets: aba 2, abb 5, abc 4, bba 1, bbb 0, bbc 4, bca 7, bcb 3, bcc 0, caa 2, cab 5, cac 0
4-item sets: abba 1, abbb 0, abbc 4, bcaa 2, bcab 5, bcac 0, caba 1, cabb 3, cabc 1
5-item sets: bcaba 1, bcabb 3, bcabc 1
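The item-set counting step can be sketched in a few lines; this counts contiguous k-symbol substrings of the sequence above and keeps the frequent ones (a simplification of general frequent-itemset mining, which would also grow candidates level by level):

```python
from collections import Counter

def frequent_itemsets(s, length, min_support):
    """Count contiguous k-symbol substrings and keep those meeting min_support."""
    counts = Counter(s[i:i + length] for i in range(len(s) - length + 1))
    return {k: v for k, v in counts.items() if v >= min_support}

x = "abbcaabbcabababcaabcabbcabcbcabbaabcbcabbcbc"
print(frequent_itemsets(x, 2, 5))   # the frequent 2-item sets of the slide's sequence
```

Only ab, bb, bc and ca clear the support threshold of 5; these survivors seed the search for longer frequent patterns and, eventually, for rules such as "ab is usually followed by b or c".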
36
5. Time series data mining
Segmentation (Change Point Detection)
37
5. Time series data mining
Summarization
38
Session outline
1. Forecasting
2. Univariate forecasting
3. Intervention modelling
4. State-space modelling
5. Time series data mining
1. Time series representation
2. Distance measure
3. Anomaly/Novelty detection
4. Classification/Clustering
5. Indexing
6. Motif discovery
7. Rule extraction
8. Segmentation
9. Summarization
39
Conclusions
x[n] = trend[n] + periodic[n] + random[n]
Preprocessing (heteroskedasticity, gaussianity, outliers, ...)?
trend[n]: Regression, Curve fitting
periodic[n]: Harmonic analysis, Filtering
random[n]: Explained by statistical models (AR, MA, ARMA); (F)ARIMA, SARIMA
40
Bibliography
C. Chatfield. The analysis of time series: an introduction. Chapman &
Hall, CRC, 1996.
C. Chatfield. Time-series forecasting. Chapman & Hall, CRC, 2000.
D.S.G. Pollock. A handbook of time-series analysis, signal processing
and dynamics. Academics Press, 1999.
D. Peña, G. C. Tiao, R. S. Tsay. A course in time series analysis. John Wiley and Sons, Inc., 2001.