
System Identification

and
Parameter Estimation
Course numbers: 191131700 (5 EC for MSc ME/S&C)
201600356 (6 EC for PDEng)

Lecturer: Ronald Aarts


Structural Dynamics, Acoustics & Control (SDAC)
Dept. of Mechanics of Solids, Surfaces & Systems (MS3)
Room: Horstring N 123
Phone: (053) 489 2557
Email: R.G.K.M.Aarts@utwente.nl

Canvas: http://canvas.utwente.nl/

Ronald Aarts PeSi/0/1


Today

• Introduction of System identification and Parameter estimation


– Course topics and course material
– Software: Matlab
– Assessment / examination

• Systems and signals


• Continuous and discrete time systems
• Time and frequency domain

Ronald Aarts PeSi/1/1


Introduction part 1: System Identification

• What is system identification?


A set of methods to obtain a mathematical model of a dynamic system from
experimental data.
• What is the goal of system identification?
The obtained model should on the one hand be compact and on the other
hand be adequate for the intended purpose. Such purpose may be the
design of a controller for that system.
• How is system identification being carried out?
In dedicated experiments measurements are carried out on a system. The
obtained system response is analysed, e.g. with MATLAB's system
identification toolbox systemIdentification.

Ronald Aarts PeSi/1/2


Example: Data from a mechanical system driven by a piezo actuator (basically
a mass-spring system), input piezo current, output position, Ts = 33 µs.

[Figure: input u and output y of the piezo mechanism versus time t; left panel the full record (0–1 s), right panel a zoom (0.500–0.505 s).]

This data will be used more often during the lectures and is available for download.

Ronald Aarts PeSi/1/3


• How is this toolbox being used?
Often with a GUI:

Click-and-play??
• And what is the use of these lectures?
An overview of the applied methods, the background of the algorithms, the
pitfalls and suggestions for use.

Ronald Aarts PeSi/1/4


Introduction part 2: Parameter Estimation

• What is parameter estimation (in this lecture)?


A set of methods to obtain a physical model of a dynamic system from
experimental data.
• What is the goal of parameter estimation?
The obtained parameters should represent certain physical properties of
the system being examined.
• How is parameter estimation being carried out?
The parameters of the model are fitted to obtain the “best” representation
of the experimental data, e.g. with MATLAB's optimization toolbox.

Ronald Aarts PeSi/1/5


Physical models vs. Mathematical models
• Microscopic, “first principles”: conservation laws, physical properties → parameter estimation. PDE's: ∞-dimensional; non-linear; no general-purpose technique.
• Macroscopic, experimental: planned experiments, look for relations in data → system identification. ODE's: finite dimension; LTI; “standard” techniques.

LTI: Linear Time Invariant system, e.g. described by means of a transfer function

Y(s)/U(s) = G(s)   or   y(t)/u(t) = G(s)

Where possible, non-linearities are removed from the data by preprocessing


prior to system identification

Ronald Aarts PeSi/1/6


Approach for (1) system identification and (2) parameter estimation:

• Input/output selection
• Experiment design
• Collection of data
• Choice of the model structure (set of possible solutions)
• Estimation of the parameters
• Validation of the obtained model (preferably with separate
“validation” data).

1. The models used for system identification are “mathematical”:
The underlying physical behaviour is unknown or incompletely known.
To the extent possible, physical insight is used, e.g. to select the model
structure (representation, order).
2. The parameters in the models for parameter estimation have, to a greater or
lesser extent, a physical meaning.

Ronald Aarts PeSi/1/7


Overview topics: 1. System identification (first part of the lectures):

• Introduction, discrete systems, basic signal theory


• Open-loop LTI SISO systems, time domain, frequency domain
• Non-parametric identification: correlations and spectral analysis
• Subspace identification
• Identification with “Prediction Error”-methods: prediction, model structure,
approximate models, order selection, validation
• Experiment design

Ronald Aarts PeSi/1/8


Overview topics: 2. Parameter estimation (first and second part of the lectures):

• Non-linear model equations


• Linearity in the parameters
• Identifiability of parameters
• Error propagation
• Experiment design

Ronald Aarts PeSi/1/9


Overview topics: 3. Miscellaneous (second part of the lectures):

• MIMO-systems
• Identification in the frequency domain
• Identification of closed loop systems
• Non-linear optimisation
• Cases.

Ronald Aarts PeSi/1/10


Software:
MATLAB (any recent version) + identification toolbox + optimization toolbox.

Course material:
Lecture notes / slides in PDF-format from Canvas site.
On-line MATLAB documentation of the (selected) toolboxes.

Examination (5 EC):
Part 1: Individual open book notebook exam at the end of the block. Questions
are similar to exercises 1–5 about the topics in chapters 1–6. The answers to
these exercises will be addressed during the lectures, see Activity plan.
Grade for this exam is 50% of the final grade and must be a pass (≥ 5.5).
Part 2: “Standard” assignments (with/without lab assignment) or your own plan.
An oral exam may be scheduled to finalise the grade.
Planning: Can be completed at the end of block 2A or spread out over the two
blocks 2A and 2B; see Canvas for the grading schedule.

Questions?
Lecture breaks, reserved office hours (see Canvas), ...

Ronald Aarts PeSi/1/11


u → [ System T ] → y

Systems and signals

A description of the system T should specify how the output signal(s) y depend
on the input signal(s) u.

The signals depend on time: continuous or at discrete time instants.

Several system descriptions are available:

• Continuous versus discrete time


• Time domain versus frequency domain

Examples for LTI systems:

• Frequency Response Function (FRF) or Bode plot


• Impulse response or step response
• State space model
• Transfer function

Ronald Aarts PeSi/2/1


The Bode plot of a system G(iω) is the amplitude and phase plot of the
complex function G depending on the real (angular) frequency ω.

It specifies for each frequency ω the input-output relation for harmonic signals
(after the transient behaviour vanished):

• The amplification equals the absolute value |G(iω)|.


• The phase shift is found from the phase angle ∠G(iω).

The output y(t) for an arbitrary input signal u(t) can be found by considering
all frequencies in the input signal: Fourier transform.

Procedure (in principle): Transform the input into the frequency domain, multiply
with the Bode plot or FRF, transform the result back to the time domain.

Ronald Aarts PeSi/2/2


The impulse response g(t) of an LTI system can also be used to compute the
output y(t) for an arbitrary input signal u(t):

The input signal can be considered as a sequence of impulses with some (time
dependent) amplitude u(t). The outputs due to all impulses are added:

Continuous time signals and systems: Convolution integral

y(t) = (g ∗ u)(t) = ∫₀^∞ g(τ) u(t − τ) dτ

Discrete time signals and systems: Summation

y(k) = (g ∗ u)(k) = Σ_{l=0}^{∞} g(l) u(k − l)   (k = 0, 1, 2, ...)

Note that the discrete time g(l) approximately equals the continuous time g(l·Ts)·Ts.
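The discrete-time summation can be tried out directly. A small NumPy sketch (Python as an illustrative stand-in for the course's MATLAB; the impulse response g(l) = 0.5^l and the step input are assumed example data):

```python
import numpy as np

# Discrete convolution y(k) = sum_l g(l) u(k-l) for a truncated impulse
# response g(l) = 0.5^l (an assumed example system) and a step input.
g = 0.5 ** np.arange(10)     # impulse response, truncated after 10 samples
u = np.ones(20)              # step input u(k) = 1 for k >= 0

y = np.convolve(g, u)[:len(u)]   # keep the first len(u) output samples

# The step response converges to the DC gain sum(g) = 2 - 2**-9.
print(y[0], y[-1])
```

For a step input the output simply accumulates the impulse response, which is a quick sanity check of the summation formula.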

Ronald Aarts PeSi/2/3


Discrete time dynamic systems

Signals are sampled at discrete time instants t_k with sample time Ts:

u_{k−1} = u(t_{k−1}) = u(t_k − Ts)
u_k = u(t_k)
u_{k+1} = u(t_{k+1}) = u(t_k + Ts)
u_{k+2} = u(t_{k+2}) = u(t_k + 2Ts)

Transfer function: y = G(z)u + H(z)e (in the z-domain)

Continuous time vs. Discrete time:

differential operator                      shift operator
s u ⇒ du/dt                                (1 − q) u_k ⇒ u_k − u_{k+1}
s⁻¹ u ⇒ ∫ u dt                             (1 − q⁻¹) u_k ⇒ u_k − u_{k−1}

State space equations:

ẋ(t) = A x(t) + B u(t) + K v(t)            x(t_{k+1}) = A x(t_k) + B u(t_k) + K v(t_k)
y(t) = C x(t) + D u(t) + v(t)              y(t_k) = C x(t_k) + D u(t_k) + v(t_k)

Ronald Aarts PeSi/2/4


Discrete time dynamic systems: stability

Continuous time vs. Discrete time:

Laplace transform                          z transform

Transfer function:
G(s) = (n₁s + n₀) / (d₂s² + d₁s + d₀)      G(z) = (b₁z⁻¹ + b₂z⁻²) / (1 + a₁z⁻¹ + a₂z⁻²)

Relation between s and z: z = e^{sTs} (Franklin (8.19))

Stability:
Poles in the LHP: Re(s) < 0                Poles inside the unit circle: |z| < 1

Undamped poles:
Imaginary axis: s = iω                     Unit circle: z = e^{iωTs}

Ronald Aarts PeSi/2/5


Discrete systems and phase lag

Zero order hold (ZOH) discretisation introduces a phase lag:

[Figure: ZOH reconstruction of a sine with Ts = 0.1 s (left) and phase lag versus ω (right).]

The phase lag depends on the frequency ω and the sample time Ts: −ω·Ts/2 (in rad).

So the phase lag is −90° at the Nyquist frequency ω_N = ω_s/2 = π/Ts.
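The −ω·Ts/2 phase lag can be checked numerically from the frequency response of the zero-order hold, (1 − e^{−iωTs})/(iωTs). A NumPy sketch (Python used for illustration; Ts and the frequency grid are assumed values):

```python
import numpy as np

# Phase of the zero-order hold frequency response (1 - exp(-iw*Ts)) / (iw*Ts):
# it equals -w*Ts/2 rad, i.e. half a sample of pure delay.
Ts = 0.1                                 # sample time [s], as in the figure
w = np.linspace(0.1, 10.0, 50)           # frequencies below pi/Ts

G_zoh = (1 - np.exp(-1j * w * Ts)) / (1j * w * Ts)
phase = np.angle(G_zoh)

print(np.max(np.abs(phase + w * Ts / 2)))   # ~0: phase is exactly -w*Ts/2
```

At ω = π/Ts the same expression gives a phase of −π/2, matching the −90° at the Nyquist frequency stated above.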

Ronald Aarts PeSi/2/6


Signal characterisation

MATLAB's identification toolbox ident works with time domain data. Even then,
the frequency domain will appear to be very important. Furthermore,
identification can also be applied in the frequency domain.

• Frequency content: Fourier transform


• Energy
• Power

Deterministic or stochastic signals?

Ronald Aarts PeSi/2/7


Fourier transforms

Continuous-time deterministic signals u(t): Fourier integral:

U(ω) = ∫_{−∞}^{∞} u(t) e^{−iωt} dt        u(t) = (1/2π) ∫_{−∞}^{∞} U(ω) e^{iωt} dω

For a finite number (N) of discrete-time samples u_d(t_k): Fourier summation:

U_N(ω_l) = Σ_{k=0}^{N−1} u_d(t_k) e^{−iω_l t_k}        u_d(t_k) = (1/N) Σ_{l=0}^{N−1} U_N(ω_l) e^{iω_l t_k}

U_N(ω_l) with ω_l = (l/N)·ω_s = (l/N)·(2π/Ts), l = 0, ..., N − 1, is the discrete Fourier
transform (DFT) of the signal u_d(t_k) with t_k = kTs, k = 0, ..., N − 1.
For N equal to a power of 2, the Fast Fourier Transform (FFT) algorithm can be
applied.
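The DFT grid ω_l = (l/N)·2π/Ts can be made concrete with a short NumPy sketch (Python as an illustrative stand-in for MATLAB's fft; Ts, N and the test cosine are assumed values). A real cosine placed exactly on bin l0 appears in bins l0 and N − l0:

```python
import numpy as np

# DFT of N samples on the frequency grid w_l = (l/N) * 2*pi/Ts.
Ts = 0.01        # sample time [s] (assumed illustrative value)
N = 64           # number of samples, a power of 2 so the FFT applies
k = np.arange(N)

l0 = 4
u = np.cos(2 * np.pi * l0 * k / N)   # cosine at frequency w_{l0} = (l0/N)*2*pi/Ts

U = np.fft.fft(u)                    # U_N(w_l), l = 0, ..., N-1
print(abs(U[l0]), abs(U[N - l0]))    # both N/2; all other bins ~0
```

The mirrored bin N − l0 is the negative-frequency image that always accompanies a real-valued signal.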

Ronald Aarts PeSi/2/8


Example: 32768 data samples of the piezo mechanism (left).

[Figure: output y versus time t (left) and DFT magnitude Y_N versus frequency f (right).]

DFT (right) in 16384 frequencies computed with MATLAB's fft command.

Horizontal axis in steps of 1/(total measurement time) = 0.925 Hz.

Ronald Aarts PeSi/2/9


Energy and power (continuous time signals)

Energy spectrum: Ψ_u(ω) = |U(ω)|²

Energy: E_u = ∫_{−∞}^{∞} u(t)² dt = (1/2π) ∫_{−∞}^{∞} Ψ_u(ω) dω

Power spectrum: Φ_u(ω) = lim_{T→∞} (1/T) |U_T(ω)|²

Power: P_u = lim_{T→∞} (1/T) ∫_0^T u(t)² dt = (1/2π) ∫_{−∞}^{∞} Φ_u(ω) dω

[ U_T(ω) is the Fourier transform of a continuous time signal with a finite duration ]

Ronald Aarts PeSi/2/10


Signal types

• Deterministic with finite energy:
E_u = ∫ Ψ_u(ω) dω bounded. Ψ_u(ω) = |U(ω)|² limited.
• Deterministic with finite power:
P_u = ∫ Φ_u(ω) dω finite. Φ_u(ω) = lim_{T→∞} (1/T)|U_T(ω)|² unbounded.
• Stochastic with finite power:
P_u = ∫ Φ_u(ω) dω finite. Φ_u(ω) = lim_{T→∞} (1/T)|U_T(ω)|² bounded.

[Figure: example time signals of the three types.]

Ronald Aarts PeSi/2/11


Energy and power (discrete time signals)

Energy spectrum: Ψ_u(ω) = |U(ω)|² (from the DFT)

Energy: E_u = Σ_{k=−∞}^{∞} u_d(k)² = ∫_{ω_s} Ψ_u(ω) dω

Power spectrum: Φ_u(ω) = lim_{N→∞} (1/N) |U_N(ω)|² (“periodogram”)

Power: P_u = lim_{N→∞} (1/N) Σ_{k=0}^{N−1} u_d(k)² = ∫_{ω_s} Φ_u(ω) dω
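The periodogram and the power relation can be verified numerically. A NumPy sketch (Python for illustration; the white-noise test signal is an assumption) checks Parseval's identity: the mean of the periodogram equals the time-domain power:

```python
import numpy as np

# Periodogram Phi_u(w_l) = (1/N) |U_N(w_l)|^2 of a sampled signal, and a
# Parseval check: the mean of the periodogram equals the signal power.
rng = np.random.default_rng(0)
N = 4096
u = rng.standard_normal(N)          # unit-variance white noise (assumed data)

U = np.fft.fft(u)
Phi = np.abs(U) ** 2 / N            # periodogram

power_time = np.mean(u ** 2)        # (1/N) sum u(k)^2
power_freq = np.mean(Phi)           # average of the periodogram over w_l
print(power_time, power_freq)       # equal, by Parseval's identity
```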

Piezo data: [Figure: output y versus time t and the periodogram of the output.]
Ronald Aarts PeSi/2/12


Convolutions

Continuous time: y(t) = (g ∗ u)(t) = ∫₀^∞ g(τ) u(t − τ) dτ

After Fourier transform: Y(ω) = G(ω) · U(ω)

Discrete time: y(k) = (g ∗ u)(k) = Σ_{l=0}^{∞} g(l) u(k − l)   (k = 0, 1, 2, ...)

After Fourier transform: Y(ω) = G(ω) · U(ω)

Example: u is the input and g(k), k = 0, 1, 2, ... is the impulse response of the
system, that is, the response for an input signal that equals 1 for t = 0 and
equals 0 elsewhere.
Then with the expressions above, y(k) is the output of the system.

Ronald Aarts PeSi/2/13


Stochastic signals

A realisation of a signal x(t) is not only a function of time t, but also depends on
the ensemble behaviour.

An important property is the expectation: E f(x(t))

Examples: Mean E x(t)
Power E (x(t) − E x(t))²

Cross-covariance: R_xy(τ) = E[x(t) − E x(t)][y(t − τ) − E y(t − τ)]

Auto-covariance: R_x(τ) = E[x(t) − E x(t)][x(t − τ) − E x(t − τ)]

White noise: e(t) is not correlated with e(t − τ) for any τ ≠ 0.
Consequence: R_e(τ) = 0 for τ ≠ 0.

Ronald Aarts PeSi/2/14


Power density or Power Spectral Density:

Φ_x(ω) = ∫_{−∞}^{∞} R_x(τ) e^{−iωτ} dτ        Φ_{x_d}(ω) = Σ_{k=−∞}^{∞} R_{x_d}(k) e^{−iωkT}

With (inverse) Fourier transform:

R_x(τ) = (1/2π) ∫_{−∞}^{∞} Φ_x(ω) e^{iωτ} dω        R_{x_d}(k) = (T/2π) ∫_{ω_s} Φ_{x_d}(ω) e^{iωkT} dω

Power:

E (x(t) − E x(t))² = R_x(0) = (1/2π) ∫_{−∞}^{∞} Φ_x(ω) dω

E (x_d(t) − E x_d(t))² = R_{x_d}(0) = (T/2π) ∫_{ω_s} Φ_{x_d}(ω) dω

White noise: R_e(τ) = 0 for τ ≠ 0.
Consequence: Φ_e(ω) = constant for all ω.

[Figure: flat periodogram of a white input signal.]

Ronald Aarts PeSi/2/15


Systems and models

A system is defined by a number of external variables (signals) and the
relations that exist between those variables (causal behaviour).

Block diagram: u → [ G ] → y, with the unmeasurable disturbance v added at the output.

Signals: • measurable input signal(s) u
• measurable output signal(s) y
• unmeasurable disturbances v (noise, non-linearities, ...)

Ronald Aarts PeSi/2/16


Estimators

Suppose we would like to determine a vector θ with n real coefficients, of which
the unknown (true) values equal θ₀.
An estimator θ̂_N has been computed from N measurements.

This estimator is

• unbiased if E θ̂_N = θ₀.
• consistent if for N → ∞ the distribution of the estimator θ̂_N resembles a
δ-function, or in other words the certainty of the estimator improves with
increasing N.

The estimator is consistent if it is unbiased and the asymptotic covariance
vanishes:

lim_{N→∞} cov θ̂_N = lim_{N→∞} E(θ̂_N − E θ̂_N)(θ̂_N − E θ̂_N)ᵀ = 0
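These two notions can be made concrete with the sample mean, an unbiased and consistent estimator of the mean of a distribution: its variance shrinks like 1/N. A NumPy sketch (Python for illustration; the true value θ₀ = 2, the noise distribution and the numbers of experiments are assumptions):

```python
import numpy as np

# The sample mean is an unbiased estimator of the true mean theta0, and it is
# consistent: its variance shrinks like 1/N as the number of samples N grows.
rng = np.random.default_rng(1)
theta0 = 2.0      # assumed true value
M = 2000          # number of repeated experiments per N

var_per_N = {}
for N in (10, 100, 1000):
    estimates = rng.normal(theta0, 1.0, size=(M, N)).mean(axis=1)
    var_per_N[N] = estimates.var()

print(var_per_N)   # variances roughly 1/10, 1/100, 1/1000
```

The shrinking spread of the estimates with growing N is exactly the δ-function behaviour described above.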

Ronald Aarts PeSi/2/17


Non-parametric (system) identification
• t-domain: Impulse or step response — IDENT: cra ∗
• f-domain: Bode plot — IDENT: spa & etfe

• These give models with “many” numbers, so we do not obtain models with a “small”
number of parameters.
• The results are not “simple” mathematical relations.
• The results are often used to check the “simple” mathematical relations that
are found with (subsequent) parametric identification.
• Non-parametric identification is often the first step.

∗ The IDENT commands impulse & step use a different approach that is related to the
parametric identification to be discussed later.

Ronald Aarts PeSi/3/1


Correlation analysis — Ident manual: Tutorial pages 3-9,10,15;
Function reference cra (4-42,43).

y(t) = G₀(z) u(t) + v(t)

Using the impulse response g₀(k), k = 0, 1, 2, ... of system G₀(z):

y(t) = Σ_{k=0}^{∞} g₀(k) u(t − k) + v(t)   (t = 0, 1, 2, ...)

So the transfer function can be written as: G₀(z) = Σ_{k=0}^{∞} g₀(k) z^{−k}

⇒ Impulse response of infinite length.
⇒ Assumption that the “real” system is linear and v is a disturbance (noise, not
related to input u).

Ronald Aarts PeSi/3/2


The Finite Impulse Response (FIR) ĝ(k), k = 0, 1, 2, ..., M, is a model
estimator for system G₀(z) for sufficiently high order M:

y(t) ≈ Σ_{k=0}^{M} ĝ(k) u(t − k)   (t = 0, 1, 2, ...)

Note: In an analysis the lower limit of the summation can be taken less than 0 (e.g. −m) to
verify the (non-)existence of a non-causal relation between u(t) and y(t).

How do we compute the estimator ĝ(k)?

• u(t) and v(t) are uncorrelated (e.g. no feedback from y to u!!!).
• Multiply the expression y(t) = Σ_k g₀(k) u(t − k) + v(t) with u(t − τ)
and compute the expectation.
• This leads to the Wiener-Hopf equation:

R_yu(τ) = Σ_{k=0}^{∞} g₀(k) R_u(τ − k)

Ronald Aarts PeSi/3/3



Wiener-Hopf: R_yu(τ) = Σ_{k=0}^{∞} g₀(k) R_u(τ − k)

If u(t) is a white noise signal, then R_u(τ) = σ_u² δ(τ), so

g₀(τ) = R_yu(τ) / σ_u²   and   ĝ(τ) = R̂_yu(τ) / σ_u²

How do we compute the estimator for the cross covariance R̂_yu(τ) from N
measurements?

Sample covariance function:

R̂_yu^N(τ) = (1/N) Σ_{t=τ}^{N} y(t) u(t − τ)

is asymptotically unbiased (so for N → ∞).
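For a white input, ĝ(τ) = R̂_yu(τ)/σ_u² can be tried out directly. A NumPy sketch (Python for illustration; the short impulse response and the noise-free output are assumed example data):

```python
import numpy as np

# Correlation analysis with a white input: ghat(tau) = Ryu_hat(tau) / sigma_u^2
rng = np.random.default_rng(2)
N = 200000
sigma_u = 1.0
u = rng.normal(0.0, sigma_u, N)             # white noise input
g0 = np.array([1.0, 0.8, 0.4, 0.2, 0.1])    # assumed true impulse response

# Noise-free system output y(t) = sum_k g0(k) u(t-k)
y = np.convolve(u, g0)[:N]

# Sample cross-covariance Ryu_hat(tau) = (1/N) sum_t y(t) u(t-tau)
ghat = np.array([np.dot(y[tau:], u[:N - tau]) / N
                 for tau in range(len(g0))]) / sigma_u**2

print(np.round(ghat, 2))   # close to g0
```

With N large the sample covariance converges, so ĝ(τ) reproduces the true impulse response up to a small estimation error.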

Ronald Aarts PeSi/3/4



Wiener-Hopf: R_yu(τ) = Σ_{k=0}^{∞} g₀(k) R_u(τ − k)

And what if u(t) is not a white noise signal?

Option 1: Estimate the auto-covariance, e.g. with

R̂_u^N(τ) = (1/N) Σ_{t=τ}^{N} u(t) u(t − τ)

and solve the linear set of M equations for ĝ(k):

R̂_yu^N(τ) = Σ_{k=1}^{M} ĝ(k) R̂_u^N(τ − k)

Not so favorable ....

Ronald Aarts PeSi/3/5



Wiener-Hopf: R_yu(τ) = Σ_{k=0}^{∞} g₀(k) R_u(τ − k)

And what if u(t) is not a white noise signal (continued)?

Option 2: Filter input and output with a pre-whitening filter L(z):

Suppose we know a filter L(z), such that u_F(k) = L(z)u(k) is a white
noise signal.
Apply this filter to the output as well (y_F(k) = L(z)y(k)), then

y_F(t) = G₀(z) u_F(t) + L(z)v(t)

so the impulse response ĝ(k) can also be estimated from u_F(k) and
y_F(k).

OK, but how do we find such a pre-whitening filter L(z)?

Ronald Aarts PeSi/3/6


Finding a pre-whitening filter L(z) for uF (k) = L(z)u(k):

Try to use a linear model of order n (default in ident: n = 10)

L(z) = 1 + a₁z⁻¹ + a₂z⁻² + ... + a_n z⁻ⁿ

and look for a “best fit” of the n parameters such that e.g.
(1/N) Σ_{k=1}^{N} u_F²(k) is minimised.

⇒ Exercise 3.
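The fit of the filter coefficients is an ordinary least squares problem. A NumPy sketch (Python for illustration; the AR(1) test input with coefficient 0.9 and the filter order n = 4 are assumptions):

```python
import numpy as np

# Fit a pre-whitening filter L(z) = 1 + a1 z^-1 + ... + an z^-n by least
# squares, so that uF(k) = L(z) u(k) is (approximately) white.
rng = np.random.default_rng(3)
N, n = 50000, 4

# Coloured test input: u(k) = 0.9 u(k-1) + e(k), an AR(1) process
e = rng.standard_normal(N)
u = np.empty(N)
u[0] = e[0]
for k in range(1, N):
    u[k] = 0.9 * u[k - 1] + e[k]

# Least squares: minimise sum (u(k) + a1 u(k-1) + ... + an u(k-n))^2,
# i.e. Y = u(k) and the columns of Phi are -u(k-i)
Y = u[n:]
Phi = -np.column_stack([u[n - i:N - i] for i in range(1, n + 1)])
a = np.linalg.lstsq(Phi, Y, rcond=None)[0]

print(np.round(a, 2))   # a1 close to -0.9, the higher coefficients close to 0
```

For this AR(1) input the whitening filter is L(z) = 1 − 0.9z⁻¹, and the least squares fit recovers exactly that.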

Ronald Aarts PeSi/3/7


Example 1: Known system

Ident manual page 2-15:

yk − 1.5yk−1 + 0.7yk−2 = uk−1 + 0.5uk−2 + ek − ek−1 + 0.2ek−2.

So G₀(q) = (q⁻¹ + 0.5q⁻²) / (1 − 1.5q⁻¹ + 0.7q⁻²)   or   G₀(z) = (z + 0.5) / (z² − 1.5z + 0.7)

and H₀(q) = (1 − q⁻¹ + 0.2q⁻²) / (1 − 1.5q⁻¹ + 0.7q⁻²)   or   H₀(z) = (z² − z + 0.2) / (z² − 1.5z + 0.7)

Simulation: N = 4096
T = 1 s
fs = 1 Hz
u(t) binary signal in frequency band 0..0.3 fs
e(t) “white” noise (random signal) with variance 1

Ronald Aarts PeSi/3/8


Impulse response of known system

[Figure: impulse response estimates versus time, comparing G₀ (exact), cra with n = 10, and cra with n = 1 (not enough pre-whitening).]

Ronald Aarts PeSi/3/9


Example 2: The piezo mechanism.

Warning: All equations starting from y(t) = G0(z) u(t) + v(t) do not account
for offsets due to non-zero means in input and/or output. So detrend!

[Figure: detrended input and output signals of the piezo mechanism versus time.]

load piezo;
Ts = 33e-6;
u = piezo(:,2);
y = piezo(:,1);
piezo = iddata(y,u,Ts);
piezod = detrend(piezo);
plot(piezod);

Ronald Aarts PeSi/3/10


Impulse response of the piezo mechanism.

[ir,R,cl]=cra(piezod,200,10,2);

[Figure: cra output — covariance of the filtered y, covariance of the prewhitened u, correlation from u to y (prewhitened), and the impulse response estimate.]

Upper right: u is indeed whitened.

Lower right: The impulse response is causal.

The horizontal axes count the time samples, so the values should be scaled with
T = 33 µs.

Ronald Aarts PeSi/3/11


Spectral analysis — Ident manual: Tutorial pages 3-10,15,16;
Function reference etfe (4-53,54), spa (4-193–195).

y(t) = G₀(z) u(t) + v(t)

Fourier transform (without v): Y(ω) = G₀(e^{iωT}) U(ω), so

G₀(e^{iωT}) = Y(ω) / U(ω).

Estimator for G₀(e^{iωT}) using N measurements: Ĝ_N(e^{iωT}) = Y_N(ω) / U_N(ω).

Effect of v: Ĝ_N(e^{iωT}) = G₀(e^{iωT}) + V_N(ω) / U_N(ω).
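The estimator Ĝ_N = Y_N/U_N is easy to reproduce. A NumPy sketch (Python for illustration; the FIR system G₀(z) = 1 + 0.5z⁻¹ and the periodic treatment of the input are assumptions — applying the shift circularly avoids transient effects, so the noise-free ETFE is exact):

```python
import numpy as np

# Empirical transfer function estimate: Ghat(w_l) = Y_N(w_l) / U_N(w_l).
rng = np.random.default_rng(4)
N = 4096
u = rng.standard_normal(N)

# Known system y(t) = u(t) + 0.5 u(t-1), i.e. G0(z) = 1 + 0.5 z^-1,
# applied circularly (u is treated as one period of a periodic signal)
y = u + 0.5 * np.roll(u, 1)

G_hat = np.fft.fft(y) / np.fft.fft(u)

w = 2 * np.pi * np.arange(N) / N      # normalised frequencies (Ts = 1)
G0 = 1 + 0.5 * np.exp(-1j * w)
print(np.max(np.abs(G_hat - G0)))     # ~0 for this noise-free periodic case
```

With output noise added, the ratio picks up the V_N(ω)/U_N(ω) term from the slide, which is what makes smoothing or averaging necessary in practice.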

Ronald Aarts PeSi/3/12


The estimator Ĝ_N
(a) is unbiased.
(b) has an asymptotic variance Φ_v(ω) / ((1/N)|U_N(ω)|²), unequal to 0!
(c) is asymptotically uncorrelated for different frequencies ω.

Difficulty: For N → ∞ there is more data, but there are also estimators at more
(= N/2) frequencies, all with a finite variance.

Solutions:
1. Define a fixed period N₀ and consider an increasing number of
measurements N = rN₀ with r → ∞. Carry out the spectral analysis for each
period and compute the average to obtain a “good” estimator at N₀/2
frequencies.
2. Smoothen the spectrum in the f-domain.

Ronald Aarts PeSi/3/13


An example: 32768 data samples of the output signal (position) of the piezo
mechanism.

Split the DFT in parts:

[Figure: DFT in 16384 frequencies, and averaged spectra in 2048 and 256 frequencies.]

Ronald Aarts PeSi/3/14


An example: 32768 data samples of the output signal (position) of the piezo
mechanism.

Filter the DFT with a Hamming∗ filter:

[Figure: original DFT and with Hamming∗ filters of width 8, 64 and 512.]

∗ Signal Processing Toolbox: filters neighbouring frequencies with a cosine function.

Ronald Aarts PeSi/3/15


With IDENT (1): periodogram (“data spectra”) of input and output signals
[Figure: periodograms of the output and input signals versus frequency.]

The output signal shows a low-frequency slope of −2, which is caused by the
existence of a pure integrator 1/s in the system (slope −1 in a Bode plot,
squared in the periodogram).

The input signal is “reasonably” white.

Ronald Aarts PeSi/3/16


With IDENT (2): identification

ETFE (Empirical Transfer Function Estimate, manual page 4-53):

Estimate the transfer function G with Fourier transforms, Ĝ_N(e^{iωT}) = Y_N(ω) / U_N(ω),
in 128 (default) frequencies.

Smoothing is applied and depends on a parameter M, which equals the width
of the Hamming window of a filter that is applied to input and output (small M
means more smoothing).

SPA (SPectral Analysis, manual page 4-193):

Estimate the transfer function G with Fourier transforms of the covariances,
Ĝ_N(e^{iωT}) = Φ̂_yu(ω) / Φ̂_u(ω), in 128 (default) frequencies.

Smoothing is also applied and again depends on a parameter M, which sets the
width of the Hamming window of the applied filter.

Ronald Aarts PeSi/3/17


Choice between ETFE and SPA?

• Not always straightforward to predict which method will perform best. Why
not try both?
• ETFE is preferred for systems with clear peaks in the spectrum.
• SPA also estimates the noise spectrum v(t) = y(t) − G₀(z)u(t)
according to

Φ̂_v(ω) = Φ̂_y(ω) − |Φ̂_yu(ω)|² / Φ̂_u(ω)

• Measure of the signal to noise ratio with the coherence spectrum

κ̂_yu(ω) = √( |Φ̂_yu(ω)|² / (Φ̂_y(ω) Φ̂_u(ω)) )

Ronald Aarts PeSi/3/18


Example 1: Spectral analysis of the known system

Ident manual page 2-15:

yk − 1.5yk−1 + 0.7yk−2 = uk−1 + 0.5uk−2 + ek − ek−1 + 0.2ek−2.

So G₀(q) = (q⁻¹ + 0.5q⁻²) / (1 − 1.5q⁻¹ + 0.7q⁻²)   or   G₀(z) = (z + 0.5) / (z² − 1.5z + 0.7)

and H₀(q) = (1 − q⁻¹ + 0.2q⁻²) / (1 − 1.5q⁻¹ + 0.7q⁻²)   or   H₀(z) = (z² − z + 0.2) / (z² − 1.5z + 0.7)

Simulation: N = 4096
T = 1 s
fs = 1 Hz
u(t) binary signal in frequency band 0..0.3 fs
e(t) “white” noise (random signal) with variance 1

Ronald Aarts PeSi/3/19


Spectra:
[Figure: frequency response (amplitude and phase) of G₀ compared with the ETFE (M = 30) and SPA (M = 30) estimates.]

Ronald Aarts PeSi/3/20


Smoothen the ETFE:

[Figure: top row — ETFE with a periodic input, r = 1, 4, 16 periods; bottom row — ETFE filtered with a Hamming window of width w = 1, 2, 4.]

Ronald Aarts PeSi/3/21


Example 2: ETFE of the piezo example
[Figure: ETFE frequency response (amplitude and phase) for window parameter M = 15, 30, 60 (*), 90, 120.]

What is the “real” width of the peak near 2 kHz?

Ronald Aarts PeSi/3/22


SPA of the piezo example
[Figure: SPA frequency response (amplitude and phase) for window parameter M = 15, 30, 60, 90 (*), 120.]

Ronald Aarts PeSi/3/23


Going from “many” to “just a few” parameters: a first step

Idea: Try to recognise “features” in the data.

• Immediately in u(k) and y(k) ???
• In the spectral models: Are there “features”, e.g. peaks as are
expected in the Bode plots / FRF (eigenfrequency, ...) of a system with a
complex pole pair?
• In the impulse response (measured or identified):
(1) Recognise “features” (settling time, overshoot, ...).
(2) Realisation algorithms → to be discussed next.

Ronald Aarts PeSi/3/24


Intermezzo: Linear regression and Least squares estimate

Regression:

• Prediction of variable y on the basis of information provided by other
measured variables ϕ₁, ..., ϕ_d.
• Collect ϕ = [ϕ₁, ..., ϕ_d]ᵀ.
• Problem: find a function of the regressors g(ϕ) that minimises the difference
y − g(ϕ) in some sense.
So ŷ = g(ϕ) should be a good prediction of y.
• Example in a stochastic framework: minimise E[y − g(ϕ)]².

Ronald Aarts PeSi/4/1


Linear regression:

• Regression function g(ϕ) is parameterised. It depends on a set of
parameters θ = [θ₁, ..., θ_d]ᵀ.
• Special case: regression function g(ϕ) is linear in the parameters θ.
Note that this does not imply any linearity with respect to the variables
from ϕ.
• Special case: g(ϕ) = θ₁ϕ₁ + θ₂ϕ₂ + ... + θ_dϕ_d.
So g(ϕ) = ϕᵀθ.

Ronald Aarts PeSi/4/2


Linear regression — Examples:
• Linear fit y = ax + b. [Figure: data points with linear fit.]
Then g(ϕ) = ϕᵀθ with input vector ϕ = [x, 1]ᵀ and parameter vector θ = [a, b]ᵀ.
So: g(ϕ) = [x 1] [a; b].

• Quadratic function y = c₂x² + c₁x + c₀. [Figure: data points with quadratic fit.]
Then g(ϕ) = ϕᵀθ with input vector ϕ = [x², x, 1]ᵀ and parameter vector θ = [c₂, c₁, c₀]ᵀ.
So: g(ϕ) = [x² x 1] [c₂; c₁; c₀].

Ronald Aarts PeSi/4/3


Least-squares estimate (LSE):

• N measurements y(t), ϕ(t), t = 1, ..., N.

• Minimise V_N(θ) = (1/N) Σ_{t=1}^{N} [y(t) − g(ϕ(t))]².

• So a suitable θ is θ̂_N = arg min_θ V_N(θ).

• Linear case: V_N(θ) = (1/N) Σ_{t=1}^{N} [y(t) − ϕᵀ(t)θ]².

Ronald Aarts PeSi/4/4


Linear least-squares estimate (1):

• In the linear case the “cost” function V_N(θ) = (1/N) Σ_{t=1}^{N} [y(t) − ϕᵀ(t)θ]²
is a quadratic function of θ.

• It can be minimised analytically: all partial derivatives ∂V_N(θ)/∂θ have to be
zero in the minimum:

(1/N) Σ_{t=1}^{N} 2ϕ(t) [y(t) − ϕᵀ(t)θ] = 0

The solution of this set of equations is the parameter estimate θ̂_N.

Ronald Aarts PeSi/4/5


Linear least-squares estimate (2):

• A global minimum is found for θ̂_N that satisfies a set of linear equations, the
normal equations:

[ (1/N) Σ_{t=1}^{N} ϕ(t)ϕᵀ(t) ] θ̂_N = (1/N) Σ_{t=1}^{N} ϕ(t) y(t).

• If the matrix on the left is invertible, the LSE is

θ̂_N = [ (1/N) Σ_{t=1}^{N} ϕ(t)ϕᵀ(t) ]⁻¹ (1/N) Σ_{t=1}^{N} ϕ(t) y(t).

Ronald Aarts PeSi/4/6


Linear least-squares estimate — Matrix formulation:

• Collect the output measurements in the vector Y_N = [y(1), ..., y(N)]ᵀ,
and the inputs in the N × d regression matrix Φ_N = [ϕᵀ(1); ...; ϕᵀ(N)].

• Normal equations: Φ_Nᵀ Φ_N θ̂_N = Φ_Nᵀ Y_N.

• Estimate θ̂_N = Φ_N† Y_N with the (Moore-Penrose) pseudoinverse of Φ_N:
Φ_N† = [Φ_Nᵀ Φ_N]⁻¹ Φ_Nᵀ.

Note: Φ_N† Φ_N = I.

Ronald Aarts PeSi/4/7


Linear least-squares estimate in Matlab:

Solution x of an overdetermined system Ax = b with a rectangular matrix A,
so more equations than unknowns, or
more rows than columns, or
A is m-by-n with m > n and full rank n.

Then the least squares solution is x̂ = A†b.

In Matlab:

x = A\b; % Preferred
x = pinv(A)*b;
x = inv(A'*A)*A'*b;

Ronald Aarts PeSi/4/8


Example 1: The “well-known” linear fit y = ax + b

Measurements xᵢ and yᵢ for i = 1, ..., N.

Cost function V_N = (1/N) Σ (yᵢ − axᵢ − b)².

1) “Manual” solution: ∂V_N/∂a = 0 and ∂V_N/∂b = 0, so

Σ −xᵢ(yᵢ − axᵢ − b) = 0
Σ −(yᵢ − axᵢ − b) = 0
⇔
[ Σxᵢ²  Σxᵢ ] [ a ]   [ Σxᵢyᵢ ]
[ Σxᵢ   Σ1  ] [ b ] = [ Σyᵢ   ]

Parameter estimate:

[ â ]   [ Σxᵢ²  Σxᵢ ]⁻¹ [ Σxᵢyᵢ ]
[ b̂ ] = [ Σxᵢ   Σ1  ]   [ Σyᵢ   ]

Ronald Aarts PeSi/4/9


Example 1: The “well-known” linear fit y = ax + b

Measurements xᵢ and yᵢ for i = 1, ..., N.

Cost function V_N = (1/N) Σ (yᵢ − axᵢ − b)².

2) Matrix solution: Y_N = [y(1); ...; y(N)], Φ_N = [x(1) 1; ...; x(N) 1] and θ = [a; b].

Cost function (in vector form): V_N = (1/N) ||Y_N − Φ_N θ||₂².

Estimate θ̂_N = Φ_N† Y_N = [Φ_Nᵀ Φ_N]⁻¹ Φ_Nᵀ Y_N.

In Matlab: theta = Phi\Y;
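The same fit can be sketched in NumPy (Python as an illustrative stand-in for the MATLAB backslash; the data points are assumed example values):

```python
import numpy as np

# Linear fit y = a*x + b as Y = Phi * theta, solved by least squares
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

Phi = np.column_stack([x, np.ones_like(x)])     # rows [x(i) 1]
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
a, b = theta
print(a, b)    # a = 1.96, b = 1.10 for this data
```

np.linalg.lstsq plays the role of Phi\Y: it solves the overdetermined system in the least squares sense without forming the normal equations explicitly.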

Ronald Aarts PeSi/4/10


Example 2: Pre-whitening filter L(z) for correlation analysis

Find a linear filter of order n (e.g. n = 10)

L(z) = 1 + a₁z⁻¹ + a₂z⁻² + ... + a_n z⁻ⁿ

that filters a given signal u(t) with u_F(k) = L(z)u(k), such that
V_N = (1/N) Σ_{k=1}^{N} u_F²(k) is minimised.

1) “Manual” solution: ∂V_N/∂aᵢ = 0 for all i = 1, 2, ..., n.

2) Matrix solution: Write the cost function in vector form
V_N = (1/N) ||Y_N − Φ_N θ||₂² with best fit θ̂_N = Φ_N† Y_N.
Ronald Aarts PeSi/4/11


For both solutions substitute
u_F(k) = u(k) + a₁u(k − 1) + a₂u(k − 2) + ... + a_n u(k − n)
in V_N = (1/N) Σ_{k=1}^{N} u_F²(k).

1) Partial derivatives ∂V_N/∂aᵢ can be computed, etc.

2) Vector form:

V_N = (1/N) || [ u(n+1) + a₁u(n)   + a₂u(n−1) + ... + a_n u(1)
                 u(n+2) + a₁u(n+1) + a₂u(n)   + ... + a_n u(2)
                 ...
                 u(N)   + a₁u(N−1) + a₂u(N−2) + ... + a_n u(N−n) ] ||₂²

Parameter vector θ = [a₁ a₂ ... a_n]ᵀ.

Recognise Y_N and Φ_N, then compute the best fit θ̂_N = Φ_N† Y_N.

Ronald Aarts PeSi/4/12


Example 3: An ARX fit y_k = (B(z)/A(z)) u_k + (1/A(z)) e_k.
(ARX models will be discussed in one of the next chapters.)

Measurements u_k and y_k for k = 1, ..., N.

Consider A(z) = 1 + a₁z⁻¹ and B(z) = b₁z⁻¹ + b₂z⁻²,

so (1 + a₁z⁻¹) y_k = (b₁z⁻¹ + b₂z⁻²) u_k + e_k.

Cost function V_N = Σ (y_k + a₁y_{k−1} − b₁u_{k−1} − b₂u_{k−2})².

Matrix solution: Collect θ = [a₁; b₁; b₂], Y_N = [y(3); ...; y(N)] and

Φ_N = [ −y(2)     u(2)     u(1)
        ...
        −y(N−1)   u(N−1)   u(N−2) ]

Estimate θ̂_N = [Φ_Nᵀ Φ_N]⁻¹ Φ_Nᵀ Y_N or PHIN\YN.
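This ARX least-squares fit can be tried on simulated data. A NumPy sketch (the parameter values are assumptions; the simulation is noise-free so the estimate recovers them exactly):

```python
import numpy as np

# LSE of the ARX model (1 + a1 z^-1) y_k = (b1 z^-1 + b2 z^-2) u_k + e_k
rng = np.random.default_rng(5)
N = 5000
a1, b1, b2 = -0.7, 1.0, 0.5          # assumed true parameters

u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):                # simulate, noise-free for clarity
    y[k] = -a1 * y[k - 1] + b1 * u[k - 1] + b2 * u[k - 2]

# Regression: Y = Phi * theta with theta = [a1 b1 b2]^T
Y = y[2:]
Phi = np.column_stack([-y[1:-1], u[1:-1], u[:-2]])
theta = np.linalg.lstsq(Phi, Y, rcond=None)[0]
print(np.round(theta, 6))   # recovers [-0.7, 1.0, 0.5]
```

The rows of Phi are exactly the [−y(k−1), u(k−1), u(k−2)] regressors from the slide, stacked for k = 3, ..., N.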

Ronald Aarts PeSi/4/13


Least-squares estimate (LSE) from normal equations — Summary:

• Write the cost function as V_N = (1/N) ||Y_N − Φ_N θ||₂².

• Compute the LSE from the normal equations

[Φ_Nᵀ Φ_N] θ̂_N = Φ_Nᵀ Y_N

• Solution for the best fit θ̂_N = Φ_N† Y_N,
with pseudo-inverse Φ_N† = [Φ_Nᵀ Φ_N]⁻¹ Φ_Nᵀ.

Is that all? Often yes, but ...

Ronald Aarts PeSi/4/14


Least-squares estimate (LSE) from normal equations — Pitfalls:

• Analytical difficulty: What if [Φ_Nᵀ Φ_N] is singular?
• Numerical difficulty: What if [Φ_Nᵀ Φ_N] is near-singular?
How to solve the normal equations anyhow?
• How about the accuracy of the estimate θ̂_N?

Specific answers can be given for “easy” problems like y = ax + b.

For a more generic treatment we will look at the
Singular Value Decomposition (svd).

Ronald Aarts PeSi/4/15


Singular Value Decomposition (svd):

Φ = U Σ Vᵀ   with   Σ = [ diag(σ₁, ..., σ_d) ; 0 ]   (the zero block fills the remaining N − d rows)

• Φ is an N × d matrix with N (rows) > d (columns).
• U is an orthogonal N × N matrix.
• V is an orthogonal d × d matrix.
• Σ is an N × d matrix with the singular values Σᵢᵢ = σᵢ ≥ 0 (i = 1, 2, ..., d)
and 0 elsewhere.

Usually the singular values are sorted: σ₁ ≥ σ₂ ≥ ... ≥ σ_d ≥ 0.

Ronald Aarts PeSi/4/16


Svd and (near) singularity:

The condition number of matrix Φ is the ratio of the largest and smallest singular
values, σ₁/σ_d.

• If all σᵢ > 0 (i = 1, 2, ..., d), then the condition number is finite and
[ΦᵀΦ] is invertible. Matrix Φ has full (column) rank.

• If σᵢ = 0 for i = r + 1, r + 2, ..., d, the condition number is infinite and
[ΦᵀΦ] is singular; r is the rank of matrix Φ.

• “Large” condition numbers indicate near-singularity of [Φ_Nᵀ Φ_N] and will
lead to numerical inaccuracies.
Ronald Aarts PeSi/4/17


Svd and condition number (1):

Consider the quadratic function y = c2x² + c1x + c0.

Regression matrix ΦN = [ x²(1) x(1) 1 ; ... ; x²(N) x(N) 1 ] depends only on x(i).

[Two plots: quadratic fits through noisy data, for x in 0...10 (left) and x in 0...20 (right).]

Left (x in 0...10):               Right (x in 0...20):
svd(phi) = [160.3; 5.2; 1.2]      svd(phi) = [842.5; 9.1; 0.1]
cond(phi) = 131                   cond(phi) = 6364
phi\y = [0.43; −2.33; −2.25]      phi\y = [0.43; −10.9; 63.8]
Exact: [0.40; −2.00; −3.00]       Exact: [0.40; −10.0; 57.0]

Ronald Aarts PeSi/4/18
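The effect above can be reproduced with the Python/NumPy sketch below; the data ranges are made up, so the numbers differ from the slide, but the trend (a much larger condition number when the x-range moves away from the origin) is the same.

```python
import numpy as np

# Regression matrix for the quadratic fit y = c2*x^2 + c1*x + c0
def quad_regressor(x):
    return np.column_stack([x**2, x, np.ones_like(x)])

x_near = np.linspace(0.0, 10.0, 20)    # data range including the origin
x_far = np.linspace(10.0, 20.0, 20)    # same span, shifted away from 0

c_near = np.linalg.cond(quad_regressor(x_near))
c_far = np.linalg.cond(quad_regressor(x_far))

print(c_near, c_far)   # the shifted range is far worse conditioned
```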


Svd and condition number (2):

Consider the linear functions y = ax + b and y = a1x + a2(10 − x) + b.

Regression matrices

ΦN = [ x(1) 1 ; ... ; x(N) 1 ]   and   ΦN = [ x(1) 10−x(1) 1 ; ... ; x(N) 10−x(N) 1 ]

[Plot: straight-line fit through noisy data for x in 0...10.]

Two parameters:               Three parameters:
svd(phi) = [19.8; 1.75]       svd(phi) = [23.7; 14.8; 0.0]
cond(phi) = 11.3              cond(phi) = 5.7 · 10¹⁶
phi\y = [1.97; −2.76]         phi\y = [1.70; −0.28; 0]
                              Warning: Rank deficient, rank = 2 tol = 4.8e-14.
Exact: [2.00; −3.00]          Exact: [2.00; 0.00; −3.00]

Ronald Aarts PeSi/4/19


Svd and (near) singularity (2):

Step 1: Split Σ into Σ1 with the non-zero singular values and Σ2 ≈ 0:

Φ = U Σ Vᵀ = [ U1  U2 ] [ Σ1 0 ; 0 Σ2 ; 0 0 ] [ V1  V2 ]ᵀ

• Φ is a N × d matrix with N (rows) > d (columns).
• U is an orthogonal N × N matrix. U1 are the first r columns.
• V is an orthogonal d × d matrix. V1 are the first r columns.
• Σ1 is a r × r diagonal matrix with the nonzero singular values
  σ1 ≥ σ2 ≥ ... ≥ σr > 0. The rest of the matrix Σ is zero.

Ronald Aarts PeSi/4/20


Step 1 (continued): Then only U1, Σ1 and V1 contribute to Φ, so

Φ = U1 Σ1 V1ᵀ

Step 2: Compute the (pseudo) inverse by only considering this part:

Φ† = V1 Σ1⁻¹ U1ᵀ   with   Σ1⁻¹ = diag(1/σ1, 1/σ2, ..., 1/σr).

Then the LSE is θ̂N = V1 Σ1⁻¹ U1ᵀ YN.

This is the output of the MATLAB command pinv(phi)*yn.

Note: If the rank r of Φ is less than d, the solution is not unique.

Ronald Aarts PeSi/4/21
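A Python/NumPy sketch of this truncated-SVD pseudo-inverse, applied to the rank-deficient regressor of y = a1·x + a2·(10−x) + b from the next slide (the data are made up; note column 1 + column 2 = 10 × column 3):

```python
import numpy as np

# Made-up data from y = 2*x - 3, written with the redundant regressor
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 * x - 3.0 + 0.05 * rng.standard_normal(x.size)
Phi = np.column_stack([x, 10.0 - x, np.ones_like(x)])   # rank 2, not 3

U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))        # numerical rank
U1, s1, V1t = U[:, :r], s[:r], Vt[:r, :]

# theta = V1 Sigma1^-1 U1^T y  (minimum-norm least-squares solution)
theta = V1t.T @ ((U1.T @ y) / s1)
print(r, theta)
print(np.allclose(theta, np.linalg.pinv(Phi, rcond=1e-10) @ y))
```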


SVD and LSE — example

Consider again the linear function y = a1x + a2(10 − x) + b.

Regression matrix ΦN = [ x(1) 10−x(1) 1 ; ... ; x(N) 10−x(N) 1 ]

[Plot: straight-line fit through noisy data for x in 0...10.]

svd(phi) = [23.7; 14.8; 0.0]
cond(phi) = 5.7 · 10¹⁶
phi\y = [1.70; −0.28; 0]
Warning: Rank deficient, rank = 2 tol = 4.8e-14.
Exact: [2.00; 0.00; −3.00]

[u,s,v]=svd(phi);
r=1:2; % rank
u1=u(:,r); s1=s(r,r); v1=v(:,r);
% Compute pinv(phi)*y:
v1*diag(1./diag(s1))*u1' * y = [ 1.68; -0.29; 0.14 ]
v2=v(:,3) = [ 0.099; 0.099; -0.99 ]

The parameters can be modified with any multiple of V2, e.g. into [1.97; 0; −2.76].

Ronald Aarts PeSi/4/22


SVD and LSE — summary part 1

• Singular Value Decomposition (svd) can provide the numerical tools to


compute the linear least squares estimate.
• It can detect a (near) singular regression matrix. The pseudo inverse of the
regression matrix ΦN can always be computed reliably.
• r parameters or linear combinations of parameters are obtained. In case of
  non-uniqueness all solutions can be found by taking into account matrix V2.

What is left?

• The accuracy of the estimate θ̂N .

This will be discussed after some system identification topics .....

Ronald Aarts PeSi/4/23


Approximate realisations (not part of ident)

Given: impulse response g(k) for k = 0, 1, ..., ∞ (Markov parameters) of a


finite-dimensional linear time invariant system.

Wanted: state space model of minimal order, like


x(k + 1) = Ax(k) + Bu(k)
y(k) = Cx(k) + Du(k)

Observations:

• G(z) = Σ_{k=0}^∞ g(k) z^{−k}
• G(z) = D + C(zI − A)⁻¹B
• g(k) = CA^{k−1}B for k ≥ 1, and g(0) = D

Ronald Aarts PeSi/5/1
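The relation g(0) = D, g(k) = CA^{k−1}B can be checked numerically. The sketch below (Python/NumPy instead of MATLAB) uses a controllable-canonical realisation of the example system G0(z) = (z + 0.5)/(z² − 1.5z + 0.7) that appears later in these slides.

```python
import numpy as np

# Controllable canonical form of G0(z) = (z + 0.5)/(z^2 - 1.5 z + 0.7)
A = np.array([[1.5, -0.7],
              [1.0,  0.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 0.5]])
D = 0.0

def markov(A, B, C, D, n):
    """Markov parameters g(0), ..., g(n-1): g(0) = D, g(k) = C A^(k-1) B."""
    g = [D]
    Ak = np.eye(A.shape[0])        # A^(k-1), starting at A^0
    for _ in range(1, n):
        g.append((C @ Ak @ B).item())
        Ak = A @ Ak
    return np.array(g)

g = markov(A, B, C, D, 5)
print(g)      # [0.0, 1.0, 2.0, 2.3, 2.05]
```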


Algorithm by Ho & Kalman (1966) to construct an exact model (A, B, C, D)
from the impulse response:
Compose the Hankel matrix H:

H_{nr,nc} = [ g(1)    g(2)     g(3)     ···  g(nc)       ;
              g(2)    g(3)     g(4)     ···  g(nc+1)     ;
              g(3)    g(4)     g(5)     ···  g(nc+2)     ;
               ...     ...      ...           ...        ;
              g(nr)   g(nr+1)  g(nr+2)  ···  g(nr+nc−1) ]

Then H_{nr,nc} = [ C ; CA ; ... ; CA^{nr−1} ] · [ B  AB  ···  A^{nc−1}B ]

so rank(H) is the minimal order n (provided nr and nc are sufficiently large).

Ronald Aarts PeSi/5/2


To determine the rank n and to compute the decomposition, “singular value
decomposition” (SVD) will be applied:

H = UnΣnVnT

with Un and Vn unitary matrices (UnT Un = VnT Vn = In)

and Σn a diagonal matrix with the positive “singular values”


σ1 ≥ σ2 ≥ . . . ≥ σn > 0 on the diagonal.

Note that zero singular values are removed in the singular value decomposition.

Ronald Aarts PeSi/5/3


Then also H = Un Σn^{1/2} Σn^{1/2} Vnᵀ and compute

B from the first column of H2 = Σn^{1/2} Vnᵀ and
C from the first row of H1 = Un Σn^{1/2}.

A is Σn^{−1/2} Unᵀ H← Vn Σn^{−1/2}, using the shifted Hankel matrix

H←_{nr,nc} = [ g(2)     g(3)     g(4)     ···  g(nc+1)   ;
               g(3)     g(4)     g(5)     ···  g(nc+2)   ;
               g(4)     g(5)     g(6)     ···  g(nc+3)   ;
                ...      ...      ...           ...      ;
               g(nr+1)  g(nr+2)  g(nr+3)  ···  g(nr+nc) ]

Ronald Aarts PeSi/5/4
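As a check, the whole algorithm can be run on the exact impulse response of the example system G0(z) = (z + 0.5)/(z² − 1.5z + 0.7) used below. This Python/NumPy sketch recovers a minimal order n = 2 realisation that reproduces g(k); the chosen dimensions nr = nc = 5 are arbitrary but sufficiently large.

```python
import numpy as np

# Exact Markov parameters g(1), g(2), ... of G0(z) = (z + 0.5)/(z^2 - 1.5z + 0.7)
A0 = np.array([[1.5, -0.7], [1.0, 0.0]])
B0 = np.array([[1.0], [0.0]])
C0 = np.array([[1.0, 0.5]])
g = [(C0 @ np.linalg.matrix_power(A0, k - 1) @ B0).item() for k in range(1, 12)]

# Hankel matrix and shifted Hankel matrix (n_r = n_c = 5)
nr = nc = 5
H = np.array([[g[i + j] for j in range(nc)] for i in range(nr)])
Hs = np.array([[g[i + j + 1] for j in range(nc)] for i in range(nr)])

U, s, Vt = np.linalg.svd(H)
n = int(np.sum(s > 1e-8 * s[0]))                 # minimal order from the SVD
Un, sn, Vnt = U[:, :n], s[:n], Vt[:n, :]

C = (Un * np.sqrt(sn))[:1, :]                    # first row of U_n Sigma_n^(1/2)
B = (np.sqrt(sn)[:, None] * Vnt)[:, :1]          # first col of Sigma_n^(1/2) V_n^T
A = np.diag(1 / np.sqrt(sn)) @ Un.T @ Hs @ Vnt.T @ np.diag(1 / np.sqrt(sn))

# The realisation reproduces the impulse response: g(k) = C A^(k-1) B
g_hat = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(6)]
print(n, np.round(g_hat, 6))
```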


What if there is noise and g(k) is only a finite series for k = 0, 1, ..., N ?

• Construct a Hankel matrix with nc + nr = N .


• Apply SVD (H = U Σ Vᵀ) and determine the number n of singular values that
  are significantly nonzero.
• Next construct an approximating Hankel matrix of rank n according
Hn = UnΣnVnT , in which Un and Vn are the first n columns of U and V .
Σn is the diagonal matrix with the first n (nonzero) singular values.
• Apply the original Ho&Kalman algorithm to Hn to compute B and C. For A


the shifted Hankel matrix H with the original series g(k) is used.

Ronald Aarts PeSi/5/5


Example: The estimated impulse response (CRA) of the simulated data from

G0(z) = (z + 0.5) / (z² − 1.5z + 0.7)

[Plot: singular values of the Hankel matrix of the estimated impulse response.]

SVD’s: 7.410, 3.602, 0.033, 0.033, 0.023, 0.023, 0.016, 0.015, 0.009.

Conclusion: order n = 2.

Ronald Aarts PeSi/5/6


Identified system (with the Ho&Kalman algorithm):

A = [ 0.829  −0.379 ;  0.379  0.673 ]    B = [ 1.392 ; −0.968 ]

C = [ 1.3915  0.9681 ]                   D = [ 0 ]

As a transfer function: Ĝ(z) = (0.999z + 0.496) / (z² − 1.502z + 0.701)

[Pole-zero plot: poles (x) and zeros (o) of G0 and Ĝ.]

Ronald Aarts PeSi/5/7


Subspace identification (E.g. n4sid in ident)

A more recently developed realisation method (early 90’s, KU Leuven) that
directly uses the input and output data (so without an explicit impulse response).

+ No explicit model equations needed (specify only the order n, very well
suited for MIMO).
+ Mathematically elegant (robust, reliable) and efficient (optimisation with
linear equations).
− Not all mathematical issues of the optimisation are settled.
− The obtained solution is “sub-optimal”,
+ but is well suited as an initial guess to obtain better PE-models (these will
be discussed afterwards and may need non-linear iterative optimisation
algorithms).

Ronald Aarts PeSi/5/8


Subspace algorithm (van Overschee and de Moor (1994,1996); ident manual
pages 3-13,17 & 4-140)

Typical state space model structure:

x_{k+1} = A x_k + B u_k          (measured input and output u_k and y_k)
y_k = C x_k + D u_k + v_k        (noise source v_k)

Characterisation of the noise source according to the innovations form

x_{k+1} = A x_k + B u_k + K e_k
y_k = C x_k + D u_k + e_k        (white noise source e_k)

Equivalent to y_k = G(z)u_k + H(z)e_k with

G(z) = C(zI − A)⁻¹B + D
H(z) = C(zI − A)⁻¹K + I

Ronald Aarts PeSi/5/9


xk+1 = A xk + B uk + K ek
yk = C xk + D uk + ek

Solution approach:

• If the states xk , input uk and output yk were known, then :


Compute C and D with linear regression,
compute ek ,
compute A, B and K also with linear regression.
• How do we find the states xk ?
• Determine the order n of the system:
→ From a Singular Value Decomposition of the “correct” matrix.
Then find an estimator for the states.
Compute the state matrices as above.
This overview is rather “black box”-like. So-called PEM models will be discussed in much more
detail later.

Ronald Aarts PeSi/5/10


Example subspace identification of simulated data from

G0(z) = (z + 0.5) / (z² − 1.5z + 0.7)

Order estimation: [Plot: model singular values vs order; red: default choice.]

Pole-zero plot: [Poles (x) and zeros (o) of G0 and Ĝn4s2.]

Ronald Aarts PeSi/5/11


Example subspace identification of the piezo data

Order estimation: [Plot: model singular values vs order; red: default choice.]
Matlab 7.0.4: 4th order? (May be different for other versions.)

Pole-zero plot: [Poles (x) and zeros (o) of Ĝn4s4 and Ĝn4s5.]

The poles of Ĝn4s4 are quite unlikely as the main resonance peak is “overlooked”.

Ronald Aarts PeSi/5/12


Frequency response

Spectral analysis: [Bode plot, amplitude and phase over 10²...10⁴ Hz:
Ĝn4s4 and Ĝn4s5 compared with a non-parametric ETFE100-model.]

Poles (using MATLAB’s pole and damp commands after exporting the model from ident’s
GUI to the workspace and transforming into a tf model):
• 0.9988: “almost” pure integrator
• 0.96 ± 0.27i: frequency 1.3 kHz and relative damping 0.031
• 0.82 ± 0.56i: frequency 2.9 kHz and relative damping 0.013

Ronald Aarts PeSi/5/13


Prediction Error identification Methods: Introduction

A model with the essential dynamics of a system with a finite (and limited)
number of parameters is wanted for

• Simulation of the process.


• Prediction of future behaviour.
• Controller design.
• ...

The applicability of non-parametric models is limited.

Subspace models may be “suboptimal”.

Ronald Aarts PeSi/6/1


PEM: system description and disturbances

LTIFD (Linear Time Invariant Finite Dimensional) system

y(t) = G0(z) u(t) + v(t)

[Block diagram: u → G0 → summation → y, with disturbance v = H0 e added at the
summation; e is white noise.]

with measured input u(t) and output y(t).

The non-measurable disturbance v(t) includes:

• measurement noise,          • effects of non-measured inputs,
• process disturbances,       • non-linearities.

Noise model: v(t) with power spectrum Φv(ω) that can be written as

v(t) = H0(z) e(t)

The white noise e(t) has a variance σe² and
the transfer function H0(z) is stable, monic (H0(∞) = 1) and minimum phase
(so it has a stable inverse).

Ronald Aarts PeSi/6/2


PEM: prediction and prediction error (ident manual page 3-16)

Assume the dynamic system is given as y(t) = G(z)u(t) + H(z)e(t) with


observations Yt−1 = {y(s), s ≤ t − 1} and Ut−1 = {u(s), s ≤ t − 1}.
What is the best prediction for y(t) ?

Situation I: u = 0, so y(t) = v(t) = H(z)e(t) and V_{t−1} is known.

Then v(t) = e(t) + m(t − 1), with m(t − 1) = Σ_{k=1}^∞ h(k) e(t − k).

As H(z) has a stable inverse, also E_{t−1} is known, and then also
m(t − 1) = [1 − H⁻¹(z)] v(t).

The expectation for v(t) is:

v̂(t|t − 1) := E{v(t)|V_{t−1}} = m(t − 1).

Ronald Aarts PeSi/6/3


Situation II: u ≠ 0, so y(t) = G(z)u(t) + v(t) and U_{t−1} and Y_{t−1} are known.
Then V_{t−1} is also known.

So E{y(t)|U_{t−1}, Y_{t−1}} = G(z)u(t) + E{v(t)|V_{t−1}},

and ŷ(t|t − 1) = G(z)u(t) + [1 − H⁻¹(z)] v(t).

With v(t) = y(t) − G(z)u(t) we find

the predictor ŷ(t|t − 1) = H⁻¹(z)G(z)u(t) + [1 − H⁻¹(z)] y(t),

and the prediction error

ε(t) := y(t) − ŷ(t|t − 1) = H⁻¹(z)[y(t) − G(z)u(t)].

Predictor models: “tuning” of the transfer functions G(z) and H(z). When they
both are exactly equal to the “real” G0(z) and H0(z), the prediction error ε(t)
is a white noise signal.
In practice: “tune” the estimates to minimise the error ε(t) with a least squares
fit.

Ronald Aarts PeSi/6/4


Model structures

Predictor model: Two rational functions (G(z), H(z)).

Model candidates from the model set M = {(G(z, θ), H(z, θ)) | θ ∈ Θ ⊂ Rd}.

Parameterisation: models are represented by real valued parameters θ from a


parameter set Θ.

Examples of parameterisations:

• Coefficients of an impulse response series → FIR models
• Coefficients in state space matrices → subspace identification
• Coefficients in fractions of polynomials → PE methods

Ronald Aarts PeSi/6/5


ARX Model structure (ident manual page 3-11)

G(z, θ) = B(z⁻¹, θ) / A(z⁻¹, θ),   H(z, θ) = 1 / A(z⁻¹, θ),

with B(z⁻¹, θ) = b1 + b2 z⁻¹ + ... + b_{nb} z^{−(nb−1)}
     A(z⁻¹, θ) = 1 + a1 z⁻¹ + ... + a_{na} z^{−na}

and θ = [a1 a2 ... a_{na} b1 b2 ... b_{nb}]ᵀ ∈ R^{na+nb}.

y(t) = B(z⁻¹, θ)/A(z⁻¹, θ) u(t) + 1/A(z⁻¹, θ) e(t)

A(z⁻¹, θ) y(t) = B(z⁻¹, θ) u(t) + e(t)

(AutoRegressive eXogenous)

Predictor: ŷ(t|t − 1; θ) = B(z⁻¹, θ) u(t) + [1 − A(z⁻¹, θ)] y(t)

is linear in the parameters θ.

Ronald Aarts PeSi/6/6
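Because the ARX predictor is linear in θ, the parameters follow from a single linear least-squares fit. A small Python/NumPy sketch with a made-up first-order system (a1 = −0.8, b1 = 0.5, delay nk = 1):

```python
import numpy as np

# Simulate y(t) + a1*y(t-1) = b1*u(t-1) + e(t) with a1 = -0.8, b1 = 0.5
rng = np.random.default_rng(3)
N = 2000
u = np.sign(rng.standard_normal(N))          # binary input
e = 0.05 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1] + e[t]

# Predictor y_hat(t) = [-y(t-1), u(t-1)] @ [a1, b1]: one linear regression
Phi = np.column_stack([-y[:-1], u[:-1]])
theta = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
print(theta)     # close to [-0.8, 0.5]
```

Since the system is in the model set and e(t) is white, this estimate is consistent, which is exactly the ARX case of the consistency results discussed later.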


Model structures

Name    equation                                            G(z, θ)             H(z, θ)

ARX     A(z⁻¹) y(t) = B(z⁻¹) u(t) + e(t)                   B(z⁻¹,θ)/A(z⁻¹,θ)   1/A(z⁻¹,θ)
ARMAX   A(z⁻¹) y(t) = B(z⁻¹) u(t) + C(z⁻¹) e(t)            B(z⁻¹,θ)/A(z⁻¹,θ)   C(z⁻¹,θ)/A(z⁻¹,θ)
OE      y(t) = B(z⁻¹)/F(z⁻¹) u(t) + e(t)                   B(z⁻¹,θ)/F(z⁻¹,θ)   1
FIR     y(t) = B(z⁻¹) u(t) + e(t)                          B(z⁻¹,θ)            1
BJ      y(t) = B(z⁻¹)/F(z⁻¹) u(t) + C(z⁻¹)/D(z⁻¹) e(t)     B(z⁻¹,θ)/F(z⁻¹,θ)   C(z⁻¹,θ)/D(z⁻¹,θ)

Ronald Aarts PeSi/6/7


Properties of the model structures

Linearity-in-the-parameters: if H −1(z)G(z) and [1 − H −1(z)] are


polynomials, then the predictor ŷ is a linear function in θ.
Result: least squares criterium for ε(t) is quadratic in θ and there is an
analytical solution for the best fit θ̂ (fast algorithm, suitable as order estimator).
OK: ARX, FIR. Not: ARMAX, OE, BJ.

Independent parameterisation of G(z, θ) and H(z, θ): no common parameters


in G and H.
Result: independent identification of G and H.
OK: OE, FIR, BJ. Not: ARX, ARMAX.

The structure and the order define the model set M.

Ronald Aarts PeSi/6/8


Identification criterium

Ingredients for the identification:

• the experimental data {(y(t), u(t)), t = 1, ..., N }.
• the model set M.
• the identification criterium.

PE-methods: consider the prediction error ε(t, θ) = y(t) − ŷ(t|t − 1, θ) for all t
as a function of θ.

• Least squares criterium: minimisation of the scalar function of ε(t, θ):

  VN = (1/N) Σ_{t=1}^N ε(t, θ)²

  or a filtered error: VN = (1/N) Σ_{t=1}^N (L(z) ε(t, θ))².

• Instrumental variable (IV) techniques: try to obtain an uncorrelated signal ε(t, θ).

Ronald Aarts PeSi/6/9


Consistency properties of the least squares criterium

If
• the system (G0, H0) is in the chosen model set and
• Φu(ω) 6= 0 in sufficient frequencies (sufficiently exciting)
then G(z, θN ) and H(z, θN ) are consistent estimators.

If
• the system G0 is in the chosen model set,
• G and H are parameterised independently (FIR, OE, BJ) and
• Φu(ω) 6= 0 in sufficient frequencies (sufficiently exciting)
then G(z, θN ) is a consistent estimator.

Note: The IV-method combined with an ARX model set can also provide a
consistent estimator for G0 even when the noise model is not in the chosen
model set.

Ronald Aarts PeSi/6/10


Example of the consistency properties

Model system from the ident manual page 2-14:

yk − 1.5yk−1 + 0.7yk−2 = uk−1 + 0.5uk−2 + ek − ek−1 + 0.2ek−2.

Or G0(z) = (z⁻¹ + 0.5z⁻²) / (1.0 − 1.5z⁻¹ + 0.7z⁻²)
and H0(z) = (1.0 − 1.0z⁻¹ + 0.2z⁻²) / (1.0 − 1.5z⁻¹ + 0.7z⁻²)

This is an ARMAX structure with A(z⁻¹) = 1.0 − 1.5z⁻¹ + 0.7z⁻²
                               B(z⁻¹) = z⁻¹ + 0.5z⁻²
                               C(z⁻¹) = 1.0 − 1.0z⁻¹ + 0.2z⁻²

Simulation (as before): N = 4096
T = 1 s
fs = 1 Hz
u(t) binary signal in frequency band 0..0.3fs
e(t) white noise (random signal) with variance 1

Ronald Aarts PeSi/6/11


Parametric models:

model       orders                             G0, H0
armax2221   na = 2, nb = 2, nc = 2, nk = 1     G0 and H0 ∈ M.
arx221      na = 2, nb = 2, nk = 1             only G0 ∈ M.
oe221       nb = 2, nf = 2, nk = 1             H = 1.
arx871      na = 8, nb = 7, nk = 1             High order guess.

Spectral analysis: [Bode plot of the estimated frequency responses.]
Pole-zero plot: [Poles (x) and zeros (o) of the estimated models.]

Ronald Aarts PeSi/6/12


Parametric models:

model       orders                             G0, H0
armax2221   na = 2, nb = 2, nc = 2, nk = 1     G0 and H0 ∈ M.
arx221      na = 2, nb = 2, nk = 1             only G0 ∈ M.
oe221       nb = 2, nf = 2, nk = 1             H = 1.
arx871      na = 8, nb = 7, nk = 1             High order guess.

Noise spectrum (|H|): [Power spectrum plot.]
Step response: [Step response plot.]

Ronald Aarts PeSi/6/13


Asymptotic variance

For systems that fit in the model set, it can be proven for n → ∞, N → ∞ and
n/N → 0 (n ≪ N) that

var(ĜN(e^{iω})) ∼ (n/N) · Φv(ω)/Φu(ω),

var(ĤN(e^{iω})) ∼ (n/N) · Φv(ω)/σe² = (n/N) · |H0(e^{iω})|².

Ronald Aarts PeSi/6/14


Approximate modelling:
What happens if the model set does not fit?

Error: ε(t, θ) = H⁻¹(z, θ)[y(t) − G(z, θ)u(t)]
             = H⁻¹(z, θ)[(G0(z) − G(z, θ))u(t) + v(t)]

Power spectrum:

Φε(ω, θ) = ( |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) ) / |H(e^{iω}, θ)|²

with Φv(ω) = σe² |H0(e^{iω})|².

Cost function: VN = (1/N) Σ_{t=1}^N ε(t, θ)².

Limit for N → ∞: VN → V̄(θ) = Ē ε(t, θ)².

With Parseval: V̄(θ) = (1/2π) ∫_{−∞}^{∞} Φε(ω, θ) dω

shows the limit θ* to which the estimator θN converges.

Ronald Aarts PeSi/6/15


Minimising (1/2π) ∫_{−∞}^{∞} ( |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) ) / |H(e^{iω}, θ)|² dω

Two mechanisms:

• Minimising |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) / |H(e^{iω}, θ)|².
• Fitting of nominator and denominator.

Observation: The way G0 is approximated by G(z, θ̂N) depends on the noise
model H(z, θ) (so in fact filtering occurs).

Ronald Aarts PeSi/6/16


Example (Ljung 1999, Example 8.5, page 268)

y(t) = G0(z)u(t), with white noise input (Φu(ω) ≈ 1) and 4th order G0

G0(z) = 0.001z⁻²(10 + 7.4z⁻¹ + 0.924z⁻² + 0.1764z⁻³) / (1 − 2.14z⁻¹ + 1.553z⁻² − 0.4387z⁻³ + 0.042z⁻⁴)

Approximate models:

• 2nd order OE (oe221): y(t) = (b1 z⁻¹ + b2 z⁻²)/(1 + f1 z⁻¹ + f2 z⁻²) u(t) + e(t)
• 2nd order ARX (arx221): (1 + a1 z⁻¹ + a2 z⁻²) y(t) = (b1 z⁻¹ + b2 z⁻²) u(t) + e(t)

Ronald Aarts PeSi/6/17


Amplitude Bode plot of G0, 2nd order OE (oe221) and 2nd order ARX (arx221) (left).

[Left: amplitude Bode plot of G0 and the two estimates. Right: amplitude plot of |A|.]

Background: In the ARX estimation there is an extra filtering with the (a priori
unknown) function (right):

1/|H(e^{iω}, θ)|² = |A(e^{iω}, θ)|²

Ronald Aarts PeSi/6/18


Approximate modelling: using a fixed noise model

That means: H(e^{iω}, θ) = H*(z)

With Parseval we obtain the limit θ* to which the estimator θN converges now
by minimising

(1/2π) ∫_{−∞}^{∞} ( |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) ) / |H*(e^{iω})|² dω, or

(1/2π) ∫_{−∞}^{∞} |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω)/|H*(e^{iω})|² dω

Mechanism:
Find the least squares estimate G(e^{iω}, θ) of G0(e^{iω}) by applying a frequency
domain weighting function Φu(ω)/|H*(e^{iω})|².

Example: For the OE-model H*(z) = 1.

Ronald Aarts PeSi/6/19


Approximate modelling: using a prefilter L(z)

New cost function: VN = (1/N) Σ_{t=1}^N εF(t, θ)² = (1/N) Σ_{t=1}^N (L(z) ε(t, θ))².

Limit for N → ∞: VN → V̄F(θ) = Ē εF(t, θ)².

Again with Parseval the limit θ* to which the estimator θN converges:

V̄F(θ) = (1/2π) ∫_{−∞}^{∞} ( |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) ) · |L(e^{iω})|²/|H(e^{iω}, θ)|² dω

Observation: The way G0 is approximated by G(z, θ̂N) depends on the
modified noise model L⁻¹(z)H(z, θ), of which the prefilter L(z) can be tuned
by the user.

Ronald Aarts PeSi/6/20


Example (extended with prefilter for identification)

y(t) = G0(z)u(t), with PRBS input (Φu(ω) ≈ 1) and 4th order G0

G0(z) = 0.001z⁻²(10 + 7.4z⁻¹ + 0.924z⁻² + 0.1764z⁻³) / (1 − 2.14z⁻¹ + 1.553z⁻² − 0.4387z⁻³ + 0.042z⁻⁴)

Bode amplitude plots of 2nd order (approximate) estimates:

[Plot: frequency responses of G0; OE with L(z) = 1; ARX with L(z) = 1;
OE with L(z) = L1(z); ARX with L(z) = L2(z).]

Applied 5th order Butterworth prefilters:

L1(z): high-pass, cut-off 0.5 rad/s.
L2(z): low-pass, cut-off 0.1 rad/s.

Ronald Aarts PeSi/6/21


Remarks:

• Both the 2nd order OE model without prefilter and the 2nd order ARX
  model with prefilter L2(z) give a good model estimate at low frequencies.
  For the obtained ARX-model the Bode amplitude plots of the prefilter
  |L2(e^{iω})| (dashed) and the ultimate weighting function |L2(e^{iω})A*(e^{iω})|
  (solid) are:

  [Plot: frequency responses of the prefilter and the ultimate weighting function.]

Note that ARX models (with and without a prefilter) can be computed
uniquely and quick from a linear optimisation problem, whereas the
identification of OE models involves an iterative algorithm for a nonlinear
optimisation.

Ronald Aarts PeSi/6/22


• In ident’s GUI a Butterworth filter of a specified order can be applied to
  input and output data.
  At MATLAB’s command prompt any prefilter L(z) can be applied by computing
  uF(t) = L(z) u(t) and yF(t) = L(z) y(t)
  and next using the filtered data for the identification.
  See also e.g. help idfilt.
• The estimated model depends of course on the chosen model structure. In
  addition the result can be tuned by
  • the chosen input spectrum Φu(ω),
  • the chosen prefilter L(z) and
  • (when possible) the chosen noise model H(z) (actually complementary to L(z)).

V̄F(θ) = (1/2π) ∫_{−∞}^{∞} ( |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) ) · |L(e^{iω})|²/|H(e^{iω}, θ)|² dω

Ronald Aarts PeSi/6/23


Approximate modelling: time domain analysis

The asymptotic estimator θ* of an ARX model
(order denominator = na, order numerator = nb − 1, delay nk = 0)
with an input signal that satisfies Ru(τ) = 0 for τ ≠ 0 is always:

• a stable G(z, θ*),
• an exact result for the first nb samples of the impulse response g0(k),
  k = 0, ..., nb − 1.

On the contrary, in an OE model the (total) squared difference between the
impulse responses of system and model is minimised.

Ronald Aarts PeSi/6/24


Example 4th order G0:

G0(z) = 0.001z⁻²(10 + 7.4z⁻¹ + 0.924z⁻² + 0.1764z⁻³) / (1 − 2.14z⁻¹ + 1.553z⁻² − 0.4387z⁻³ + 0.042z⁻⁴)

Impulse responses of: G0, arx120, arx230, arx340, oe210 and oe320:

[Two plots: impulse responses for the first samples (left, t = 0...4) and over a
longer horizon (right, t = 0...100).]

Note: nb ≤ 2 does not give meaningful results.

Ronald Aarts PeSi/6/25


Choice of the model set

• The chosen model parameterisation, so writing [G(z, θ) and H(z, θ)] as a


polynomial fraction.
• The choice of the model structure in the transfer functions
[G(z, θ), H(z, θ)]: ARX / ARMAX / OE / BJ:
Linearity in the parameters (ARX, FIR) or not.
Independent parameterisation of G(z, θ) and H(z, θ) (OE, FIR, BJ) or not.
• The choice of the model complexity, so the number of parameters or the
  order of the polynomials.

What are the criteria to make these choices in order to obtain a good model for
a fair price?

Ronald Aarts PeSi/6/26


What is a good model?

Depends on the goal: Is it e.g. good enough to design a controller or does it
simulate the most important dynamic behaviour?

More general: small bias and small variance.

Conflict:
• For a small bias a large model set is necessary (high order, flexible
  model structure).
• A small variance is obtained easier with a small number of parameters.

Large model set:                        Small model set:
Good fit with data (small residue)      Bad fit with data (large residue)
Large variance                          Small variance
Model depends on the specific noise     Better distinction between stochastic
realisation                             and structured effects
Ronald Aarts PeSi/6/27


What makes up the “price” of the model?

• “Price” of the identification: Complexity of the algorithm, amount of work.


• “Price” of model usage: Order in the case of controller design or
simulations.

Methodology for the choice of the model set

• A priori considerations
• Analysis of the data
• A posteriori comparison of the models
• Validation of the models

Ronald Aarts PeSi/6/28


A priori considerations

Physical insight regarding the (minimal) order of the model.

Physical insight regarding the nature of the noise disturbance.

Relation between the number of data points N and the number of parameters
to be estimated Nθ : General: N ≫ Nθ
Rule of thumb: N > 10Nθ
Note that the required number of points depends strongly on the signal to noise
ratio.

Ronald Aarts PeSi/6/29


Analysis of the data

Information regarding the model order from

• Non-parametric identification: (anti)resonance peaks, phase behaviour.


• Approximate realisation methods by doing an SVD-analysis of the
Hankel-matrix or for subspace identification.
• Evaluation of the rank of the Toeplitz matrix:
  R̂(n) = (1/N) Σ_{t=1}^N φn(t) φn(t)ᵀ
  with φn(t) = [−y(t − 1) ... − y(t − n) u(t) ... u(t − n)]ᵀ.
  For noiseless data matrix R̂(n) becomes singular when the order n is taken
  “too large” (compare with the ARX order estimator).

Ronald Aarts PeSi/6/30


A posteriori comparison of the identified models

For a chosen model structure, compare the obtained results for the criterium
VN(θ̂N, Z^N) as a function of the parameter sets θ̂N of different model orders.

ARX-example (system with na = 2, nb = 3) in the left graph: VN decreases for an
increasing number of parameters (i.e. model order), so the best fit is obtained
with the most complex model!

[Two plots of the loss function vs the number of parameters: estimation data set
(left) and validation data set (right).]

Ronald Aarts PeSi/6/31


Solution to avoid this overfit: cross-validation:

Split the data into two parts: Z^N = Z^(1) Z^(2).

Identify θ̂N from Z^(1) and evaluate the criterium for Z^(2) (right graph): there is
a minimum of VN near 5 parameters, but it is not very distinct.

[Two plots of the loss function vs the number of parameters: estimation data set
(left) and validation data set (right).]

Ronald Aarts PeSi/6/32


ARX order selection

For ARX models automatic (and reliable) selection criteria exist (considering
Ē VN(θ̂N, Z^N)) (ident manual page 3-70 ff, 4-183 ff):

• Akaike’s Information Criterion (AIC):
  AIC = log((1 + 2n/N) V)
• Akaike’s Final Prediction Error Criterion (FPE):
  FPE = (1 + n/N)/(1 − n/N) V
• Rissanen’s minimum description length (MDL):
  MDL = (1 + log(N) · n/N) V

Ronald Aarts PeSi/6/33
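The three criteria are simple functions of the loss V, the number of parameters n and the number of data N. Below is a Python/NumPy sketch with made-up loss values whose improvement stops at n = 4, so all three criteria pick that order.

```python
import numpy as np

# Hypothetical loss function values for numbers of parameters n = 1..10:
# the fit stops improving beyond n = 4
N = 1000
n = np.arange(1, 11)
V = 1.0 / np.minimum(n, 4)

aic = np.log((1 + 2 * n / N) * V)               # Akaike's Information Criterion
fpe = (1 + n / N) / (1 - n / N) * V             # Akaike's Final Prediction Error
mdl = (1 + np.log(N) * n / N) * V               # Rissanen's MDL

picks = (n[np.argmin(aic)], n[np.argmin(fpe)], n[np.argmin(mdl)])
print(picks)     # (4, 4, 4)
```

All three penalise the number of parameters, so extra parameters that do not lower V are rejected, which is exactly the overfit protection discussed above.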


ARX order estimation: 4th order ARX model system with noise

[Two plots of the model fit (% unexplained output variance) vs the number of
parameters: with one data set (left) and with validation data (right).
Blue = AIC and MDL criterium, red = best fit.]

Ronald Aarts PeSi/6/34


Comparison between arx432 and 4th order G0

Garx432(z) = 0.001z⁻²(10 + 7.3z⁻¹ + 0.8z⁻²) / (1 − 2.147z⁻¹ + 1.557z⁻² − 0.4308z⁻³ + 0.0371z⁻⁴)

G0(z) = 0.001z⁻²(10 + 7.4z⁻¹ + 0.924z⁻² + 0.1764z⁻³) / (1 − 2.14z⁻¹ + 1.553z⁻² − 0.4387z⁻³ + 0.042z⁻⁴)

Bode: [Frequency response plot.]   Pole/zero: [Poles (x) and zeros (o) plot.]

Ronald Aarts PeSi/6/35


ARX order estimation: Piezo data

[Bode plot: amplitude and phase over 10²...10⁴ Hz.]

Problem: The suggested arx 8 10 4 puts too much emphasis on high frequencies.
The right graph shows a comparison with the non-parametric etfe 60.
Ronald Aarts PeSi/6/36


Suggestion: filter with a low-pass filter in the frequency range 0...3788 Hz and
resample with a factor 4.

Comparison between arx542 and a (new) etfe200:

Bode: [Frequency response plot.]   Pole/zero: [Poles (x) and zeros (o) plot.]

First eigenfrequency ≈ 1300 Hz (ζ ≈ 0.03).

Clearly more emphasis on the low-frequency behaviour (sufficient?)

Ronald Aarts PeSi/6/37


Validation of the models

The ultimate answer to the question whether the identified model is “good
enough”, so also for the choice of the model structure and order.

Techniques:

• Comparison of model G(z, θ̂N ) with previous results:


Non-parametric estimates (spectra, impulse response).
• Model reduction: pole-zero-cancellation (within the relevant confidence
interval) may indicate too high model order.
• Large confidence intervals for one or more parameters may indicate too
high model order (or too few data).

Ronald Aarts PeSi/6/38


• Simulation of the model: Consider the simulation error
esim(t) = y(t) − ysim(t) = y(t) − G(z, θ̂N ) u(t).
“Optimal” case: esim (t) = v(t) = H0(z)e(t).
In order to avoid overfit, cross-validation is preferable.
• Residual test: For a consistent model the prediction error ε(t) (the residue)
converges to a white noise signal. Two tests:
1. Is ε(t, θ̂N ) a realisation of white noise?
Evaluate autocovariance RεN (τ ) → δ(τ ).
(Test for both Ĝ and Ĥ)
2. Is ε(t, θ̂N) a realisation of a stochastic process?
   Evaluate cross-covariance Rεu^N(τ) → 0.
   (Test for Ĝ)

Ronald Aarts PeSi/6/39


Method 1: Autocovariance Rε^N(τ) = (1/N) Σ_{t=1}^{N−τ} ε(t + τ)ε(t) for τ ≥ 0.

Requirement: Rε^N(τ) small for τ > 0 and N large:

|Rε^N(τ)| / Rε^N(0) ≤ Nα/√N

with Nα the confidence level for the confidence interval with probability α (e.g.
N95% = 1.96, N99% ≈ 3).

Method 2: Cross-covariance Rεu^N(τ) = (1/N) Σ_{t=1}^{N−τ} ε(t + τ)u(t) for τ ≥ 0.

Requirement: |Rεu^N(τ)| ≤ Nα √P/√N

with estimator for P: P̂^N = Σ_{k=−∞}^{∞} R̂ε^N(k) R̂u^N(k)

Ronald Aarts PeSi/6/40
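Method 1 can be sketched in a few lines of Python/NumPy; here ε is replaced by an actual white noise sequence (made up for illustration), so nearly all normalised covariances should stay below the 99% bound 3/√N.

```python
import numpy as np

# Stand-in for the residue epsilon(t, theta_N): white noise
rng = np.random.default_rng(4)
N = 5000
eps = rng.standard_normal(N)

def autocov(x, tau):
    """Sample autocovariance R(tau) = (1/N) sum_t x(t+tau) x(t)."""
    return np.dot(x[tau:], x[:len(x) - tau]) / len(x)

R0 = autocov(eps, 0)
ratios = np.array([abs(autocov(eps, tau)) / R0 for tau in range(1, 21)])
bound = 3.0 / np.sqrt(N)                 # 99% confidence level, N_alpha ~ 3

n_outside = int(np.sum(ratios > bound))
print(R0, n_outside)                     # R0 close to 1, few or no exceedances
```

For a real identified model, many ratios outside the bound indicate that the noise model Ĥ does not explain the residue.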


Example: 4th order G0 with noise (arx442-model).

System model Noise model


arx432 OK OK
oe342 OK not OK (autocovariance does not fit in the plot)
arx222 not OK not OK
Autocorrelation of residuals for output 1
0.2

0.1

−0.1

−0.2
−20 −15 −10 −5 0 5 10 15 20

Cross corr for input 1and output 1 resids


0.1

0.05

−0.05

−0.1
−20 −15 −10 −5 0 5 10 15 20
Samples

Ronald Aarts PeSi/6/41


Identification design: Introduction

The user can tune the identified model by choosing

• the model set (model structure and order),
• a prefilter L(q),
• the experimental conditions like the number of data samples N, the sample
  time Ts, the input spectrum Φu(ω), etc.

Which requirements should be satisfied by the model?

• Consistency? No problem with validation and a sufficiently exciting input
  signal. However, is the requirement for a correct model set realistic?
• Adequate for the intended purpose: simulation, prediction, diagnostics,
  controller design.

Ronald Aarts PeSi/7/1


Design variables for the estimation of the transfer functions

The asymptotic estimator of the parameters θ* minimises

V̄F(θ) = (1/2π) ∫_{−∞}^{∞} ( |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) ) · |L(e^{iω})|²/|H(e^{iω}, θ)|² dω

in which a compromise between two mechanisms plays a role:

• Minimisation of (1/2π) ∫_{−∞}^{∞} |G0(e^{iω}) − G(e^{iω}, θ)|² Q(ω, θ*) dω, with
  Q(ω, θ) = Φu(ω) |L(e^{iω})|² / |H(e^{iω}, θ)|².
• Fitting of |H(e^{iω}, θ)|² with the error spectrum.

The limit θ* depends on the model set, the prefilter L(q) and the input
spectrum Φu(ω).

Ronald Aarts PeSi/7/2


Fitting of G0(e^{iω}): How can we get a fit in terms of the usual Bode plots?

• The vertical axis of the Bode amplitude plot shows log |G|, which is
  approximated better with a small relative error. We are minimising
  (1/2π) ∫_{−∞}^{∞} |(G0(e^{iω}) − G(e^{iω}, θ))/G0(e^{iω})|² |G0(e^{iω})|² Q(ω, θ*) dω,
  so Q(ω, θ*) should be large when G0(e^{iω}) becomes small.
• The horizontal axis shows log ω, which means that high frequencies
  dominate. This can be compensated by looking at
  |G0(e^{iω})|² Q(ω, θ*) dω = ω|G0(e^{iω})|² Q(ω, θ*) d(log ω). So the fit at
  low frequencies improves if |G0(e^{iω})|² Q(ω, θ*) is larger than ω in that
  frequency region.

The factor Q can be manipulated by modification of the design variables: input
spectrum Φu(ω), noise model set {H(e^{iω}, θ)}, prefilter L(q).

Ronald Aarts PeSi/7/3


Experiment design

First analysis of the process (preparation):

• What are the possible input(s) and output(s) of the system?


• How are these signals being measured?
• At which frequencies do we expect essential dynamic behaviour?
• What are usual and extreme amplitudes of the signals?

Preliminary experiments (not specifically aiming at identification):

• Measurement of output signal (noise) for constant input signal.


• Measurement of a step response:
  - Linearity      - Relevant frequencies
  - Static gain    - Selection of the input signal
• Non-parametric identification (correlation and frequency analysis) with
  harmonic or broadband input signal:
  - Relevant frequencies    - Duration of the experiment
  - Sample frequency        - Selection of the input signal

Ronald Aarts PeSi/7/4


Input signals

Quite often a broadband input signal is desirable: sufficiently exciting and the
data contains information of the system in a large range of frequencies.

Matlab-tool: u = idinput(N,type,band,levels)

Possibilities for type:

• ’RGS’: Random, Gaussian Signal: discrete white noise with a flat spectrum.
• ’RBS’: Random, Binary Signal (default).
• ’PRBS’: Pseudo-Random, Binary Signal.
• ’SINE’: sum of harmonic signals (sine functions).
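The flat-spectrum property of such broadband signals is easy to check numerically. A minimal sketch (in Python/NumPy rather than MATLAB, purely for illustration): a white Gaussian sequence, as produced by the ’RGS’ option, has a sample autocovariance that is (nearly) zero at all nonzero lags:

```python
import numpy as np

# white Gaussian sequence ('RGS'-like): a flat spectrum corresponds to a
# pulse-shaped autocovariance
rng = np.random.default_rng(0)
u = rng.standard_normal(10000)

R = [np.dot(u[:u.size - k], u[k:]) / u.size for k in (0, 1, 2, 5)]
print(R)   # R[0] near 1, the others near 0
```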

Ronald Aarts PeSi/7/5


Pseudo-Random, Binary Signal

• Signal that can be generated easily by a computer algorithm using n shift


registers and modulo-2 addition ⊕:
s(t) = xn(t)
xi(t + 1) = xi−1(t) 2≤i≤n
x1(t + 1) = a1x1(t) ⊕ a2x2(t) ⊕ ... ⊕ anxn(t)

0 ⊕ 0 = 1 ⊕ 1 = 0
0 ⊕ 1 = 1 ⊕ 0 = 1

[Figure: clocked shift register with states x1, ..., xn, output s = xn, and
feedback coefficients a1, ..., an combined by modulo-2 addition.]

Given: initial condition and binary coefficients a1, ..., an.


• Output: Deterministic periodic signal. “Pseudo-random”?
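The recursion above can be sketched in a few lines of Python (for illustration; n = 4 with a1 = a4 = 1 is one of the maximum-length choices, giving period M = 2⁴ − 1 = 15):

```python
import numpy as np

def prbs(n, taps, x0, length):
    """n-stage shift register with modulo-2 feedback; taps holds the
    1-based indices i with a_i = 1."""
    x = list(x0)                      # register state x_1 ... x_n
    s = []
    for _ in range(length):
        s.append(x[-1])               # output s(t) = x_n(t)
        fb = 0
        for i in taps:                # x_1(t+1) = a_1 x_1(t) XOR ... XOR a_n x_n(t)
            fb ^= x[i - 1]
        x = [fb] + x[:-1]             # shift: x_i(t+1) = x_{i-1}(t)
    return np.array(s)

s = prbs(4, [1, 4], [1, 0, 0, 0], 45)
assert np.array_equal(s[:15], s[15:30])   # deterministic, period M = 15
```

Being deterministic and periodic, the sequence only resembles a random one, hence "pseudo-random".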

Ronald Aarts PeSi/7/6


• Maximum length PRBS: The period of the PRBS is as large as possible and
equals M = 2n − 1 if the coefficients are chosen correctly, for example:
n ai = 1 n ai = 1 n ai = 1 n ai = 1
3 1,3 7 1,7 11 9,11 15 14,15
4 1,4 8 1,2,7,8 12 6,8,11,12 16 4,13,15,16
5 2,5 9 4,9 13 9,10,12,13 17 14,17
6 1,6 10 3,10 14 4,8,13,14 18 11,18

• The binary signal s(t) can be transformed into a signal u(t) with amplitude
c and mean m with u(t) = m + c(−1 + 2 s(t)):

Ē u(t) = m + c/M

Ru(0) = (1 − 1/M²) c²

Ru(τ ) = −(c²/M)(1 + 1/M) ,   τ = 1, ..., M − 1

Ronald Aarts PeSi/7/7


• For M → ∞ the autocovariance Ru(τ ) converges to a white noise
autocovariance.
[Figure: Ru(τ ): peaks of height c²(1 − 1/M²) at τ = 0 (mod M) and a constant
level −(c²/M)(1 + 1/M) at all other lags.]

The power spectrum can be modified by filtering of the signal with a (linear)
filter, but then the binary character is no longer guaranteed.
• The binary character is advantageous in the case of (presumed)
non-linearities as it has a maximum power for a limited amplitude.

Ronald Aarts PeSi/7/8


[Figure: PRBS waveforms with amplitude ±c for clock periods Nc = 1 and Nc = 2.]

• The actual clock period can also be taken equal to a multiple Nc of the
sample time: uNc (t) = u(ent(t/Nc)). The signal is stretched.
The spectrum is no longer flat and even has some frequencies with Φu(ω) = 0.

[Figure: spectra Φu(ω)/2π for Nc = 1, 3, 5 and 10.]

Ronald Aarts PeSi/7/9


Random Binary Signal

This is also a binary signal, but it is a stochastic signal (instead of
deterministic like the PRBS).
Two classes:
1. Generated from u(t) = c sign[R(q)u(t − 1) + w(t)], in which
   w(t) is a stochastic white noise process, and
   R(q) is a stable linear filter. The power spectral density Φu(ω) depends on
   R(q). For R(q) ≡ 0 the output spectrum is flat.
2. A random binary signal u(t) that equals ±c, according to
   Pr(u(t) = u(t − 1)) = p and
   Pr(u(t) = −u(t − 1)) = 1 − p, in which
   p is the probability that the signal does not change the next sample time
   (0 < p < 1).
   The power spectral density depends on p: For p = 1/2 the spectrum is flat.

[Figure: spectra Φu(ω)/2π for p = 0.50, 0.75 and 0.90.]
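Class 2 is straightforward to generate. A hedged Python sketch of the switching rule stated above (the values p = 0.75 and c = 1 are arbitrary illustrations):

```python
import numpy as np

def rbs(N, p, c=1.0, seed=0):
    """Random binary signal: u(t) = u(t-1) with probability p,
    u(t) = -u(t-1) with probability 1 - p."""
    rng = np.random.default_rng(seed)
    u = np.empty(N)
    u[0] = c
    for t in range(1, N):
        u[t] = u[t - 1] if rng.random() < p else -u[t - 1]
    return u

u = rbs(20000, p=0.75)
print(np.mean(u[1:] != u[:-1]))   # sign changes in roughly 25% of the samples
```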

Ronald Aarts PeSi/7/10


Periodic sum of harmonic signals (sine functions)

Compute

u(t) = Σ_{k=1}^{r} αk sin(ωk t + ϕk )

with a user defined set of excitation frequencies {ωk }k=1,...,r and associated
amplitudes {αk }k=1,...,r .

Selection of the phases {ϕk }k=1,...,r :

• For a minimal amplitude in the time domain: Schroeder-phased sines


• Randomly chosen.

Information of the system is obtained in a limited number of frequencies.
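The effect of the phase choice can be illustrated with a short Python sketch (an arbitrary choice of r = 10 unit-amplitude harmonics of one base frequency): both signals carry the same power, but the Schroeder-phased version has a much smaller peak amplitude, i.e. a lower crest factor.

```python
import numpy as np

N, r = 512, 10
t = np.arange(N) / N                 # one period of the base frequency
k = np.arange(1, r + 1)

def multisine(phases):
    # sum of r unit-amplitude cosines at harmonics k of the base frequency
    return np.sum(np.cos(2*np.pi*np.outer(t, k) + phases), axis=1)

u_zero = multisine(np.zeros(r))          # all phases equal: peak r at t = 0
u_schr = multisine(-np.pi*k*(k - 1)/r)   # Schroeder phases

assert np.isclose(np.mean(u_zero**2), np.mean(u_schr**2))       # same power
assert np.max(np.abs(u_schr)) < 0.6 * np.max(np.abs(u_zero))    # smaller peak
```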

Ronald Aarts PeSi/7/11


Sample frequency ωs = 2π/Ts

Two different situations:


• data acquisition.
• identification & application of the model.

Advice for data acquisition: collect as many data as possible.


For a fixed total measurement time TN = N · Ts and unlimited N the sample
time Ts is as small as possible.
For a fixed (or limited) N the sample time Ts should not be too small in order to
capture also the slowest responses of the system.

Rule of thumb:
• Upper limit Ts, lower limit ωs:
Nyquist frequency ωN = ωs/2 > highest relevant frequency.
For a first order system with bandwidth ωb: ωs ≥ 10ωb.
• Lower limit TN > 5–10 times the largest relevant time constant.

Make sure to apply (analog) anti-aliasing filters when needed.

Ronald Aarts PeSi/7/12


The advice for parametric identification & application of the model is:

Numerical aspects define an upper limit for ωs:

A continuous system with state space matrix Ac and ZOH discretisation has a
discrete state space matrix Ad = eAcTs .
For Ts → 0 it appears that Ad → I: all poles cluster near z = 1.
The physical length of the range of the difference equation becomes smaller.

PE-methods also emphasise high frequencies if Ts is too small:

The prediction horizon (one step ahead) becomes small.


For smaller Ts / larger ωs also higher frequencies are included in the
minimisation of ∫ |G0(eiω ) − G(eiω , θ)|² Q(ω, θ) dω.
In e.g. an ARX model it is very likely that the weighting of (too) high frequencies
is too large (Q(ω, θ) = Φu (ω) |A(eiω , θ)|²).
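The pole clustering can be verified numerically. A small Python sketch with an assumed second-order system (ω0 = 10 rad/s, ζ = 0.1 are invented values):

```python
import numpy as np
from scipy.linalg import expm

w0, zeta = 10.0, 0.1
Ac = np.array([[0.0, 1.0], [-w0**2, -2*zeta*w0]])

for Ts in (1e-4, 1e-3, 1e-2):
    Ad = expm(Ac * Ts)               # ZOH discretisation: Ad = e^(Ac Ts)
    dist = np.abs(np.linalg.eigvals(Ad) - 1.0)
    print(Ts, dist)                  # distance to z = 1 shrinks with Ts
```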

Ronald Aarts PeSi/7/13


Rule of thumb for the upper limit for a first order system : ωs ≤ 30ωb .

Combined (for a first order system): 10ωb ≤ ωs ≤ 30ωb.

Strategy:

• Data acquisition and preliminary experiments at a high ωs.


• Where needed reduction of ωs for parametric identification by applying a
digital filter.

Ronald Aarts PeSi/7/14


Data processing:

• If necessary an anti-aliasing and/or noise filter should be used during the


measurement: analog low-pass filters.
• Outliers / spikes: visual inspection and manual removal.
• Means and drift: remove or model explicitly.
• Scaling of signals: very important for MIMO systems.
• Delays: compensate or model explicitly.
• Adaptation of the sample frequency for the identification: after digital
anti-aliasing filtering (explicit or automatic in a Matlab command).
Take notice of this step when the input data is generated: Do not put
energy in a frequency band that is later removed (e.g. for a PRBS: set Nc).

Ronald Aarts PeSi/7/15


Parameter identification: Model parameters have physical meaning.

Example: double pendulum with actuators (motors) driving the rotations φ1
and φ2 with torques T1 and T2.

[Figure: double pendulum with link lengths l1, l2, masses m1, m2, m3,
inertias J1, J2, joint angles φ1, φ2 and gravity g.]

Non-linear equations of motion
(see e.g. Dynamics of Machines (191131730)):
" # " #
φ̈1 h
m3 l1 l2 (φ̇22 + 2φ̇1 φ̇2 ) sin φ2 + (m2 + m3 )l1 g sin φ1 + m3 l2 g sin(φ1 + φ2 )
i T1
M̄ = −m3 l1 l2 φ̇21 sin φ2 + m3 l2 g sin(φ1 + φ2 )
+
φ̈2 T2

with (reduced) mass matrix


 
m2 l12 + m3 (l12 + l22 + 2l1l2 cos φ2) + J1 + J2 m3 (l22 + l1l2 cos φ2 ) + J2
M̄ = m3(l22 + l1l2 cos φ2) + J2 m3 l22 + J2

Ronald Aarts PeSi/8/1


Parameter-linear form

Parameters:
• Link lengths (l1 and l2) are known.
• Gravity g is known
• Masses and inertias (m1, J1, m2, J2 and m3) are to be estimated.
• Measurements of the torques (T1 and T2).
• Angles known as function of time (φ1(t) and φ2(t)).

Can the equations of motion be expressed in a parameter-linear form?

τ = Φ(q , q̇ , q̈ )p

with
• measurements vector τ = [T1, T2]T ,
• parameter vector p including m1, J1, m2, J2 and m3,
• regression matrix Φ(q , q̇ , q̈ ) depending only on known kinematic quantities
with q = [φ1, φ2]T .

Ronald Aarts PeSi/8/2


Parameter-linear form (2)
" # " #
φ̈1 h
m3 l1 l2 (φ̇22 + 2φ̇1 φ̇2 ) sin φ2 + (m2 + m3 )l1 g sin φ1 + m3 l2 g sin(φ1 + φ2 )
i T1
M̄ = −m3 l1 l2 φ̇21 sin φ2 + m3 l2 g sin(φ1 + φ2 )
+
φ̈2 T2

 
m2 l12 + m3 (l12 + l22 + 2l1l2 cos φ2 ) + J1 + J2 m3 (l22 + l1l2 cos φ2 ) + J2
with M̄ = m3 (l22 + l1l2 cos φ2 ) + J2 m3 l22 + J2

 
m1
 J1 
" #  
T1  
⇔ = Φ(q , q̇ , q̈ )  m2 
 ???
T2  J 

 2 
m3

• Mass m1 does not show up in the equations of motion, so it can not be
identified!

Ronald Aarts PeSi/8/3


Parameter-linear form (3)

• The other parameters are collected in p = [J1, J2, m2, m3]T . Then the
elements of Φ can be written as

Φ11 = φ̈1
Φ12 = φ̈1 + φ̈2
Φ13 = −l1 g sin φ1 + l1² φ̈1
Φ14 = l1 l2 ((2φ̈1 + φ̈2) cos φ2 − (φ̇2² + 2 φ̇1 φ̇2) sin φ2)
      − l1 g sin φ1 − l2 g sin(φ1 + φ2) + l1² φ̈1 + l2² (φ̈1 + φ̈2)
Φ21 = 0
Φ22 = φ̈1 + φ̈2
Φ23 = 0
Φ24 = l1 l2 (φ̇1² sin φ2 + φ̈1 cos φ2) − l2 g sin(φ1 + φ2) + l2² (φ̈1 + φ̈2)

• So matrix Φ indeed depends only on known quantities and not on any of the
parameters in p: A parameter linear form can be obtained:

τ = Φp.
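The parameter-linear form can be checked numerically by evaluating both sides for an arbitrary state. A Python sketch with invented values for the lengths, the parameters and the state (any values should satisfy T = Φp exactly):

```python
import numpy as np

l1, l2, g = 0.5, 0.3, 9.81                 # known lengths and gravity
J1, J2, m2, m3 = 0.1, 0.05, 2.0, 1.5       # parameters to be estimated
p = np.array([J1, J2, m2, m3])

ph1, ph2 = 0.3, -0.7                       # angles phi1, phi2
ph1d, ph2d = 1.2, -0.4                     # angular velocities
ph1dd, ph2dd = 0.8, 2.5                    # angular accelerations

# torques from the equations of motion: T = Mbar [ph1dd; ph2dd] - h
Mbar = np.array([
    [m2*l1**2 + m3*(l1**2 + l2**2 + 2*l1*l2*np.cos(ph2)) + J1 + J2,
     m3*(l2**2 + l1*l2*np.cos(ph2)) + J2],
    [m3*(l2**2 + l1*l2*np.cos(ph2)) + J2,
     m3*l2**2 + J2]])
h = np.array([
    m3*l1*l2*(ph2d**2 + 2*ph1d*ph2d)*np.sin(ph2)
    + (m2 + m3)*l1*g*np.sin(ph1) + m3*l2*g*np.sin(ph1 + ph2),
    -m3*l1*l2*ph1d**2*np.sin(ph2) + m3*l2*g*np.sin(ph1 + ph2)])
T = Mbar @ np.array([ph1dd, ph2dd]) - h

# the same torques in parameter-linear form: T = Phi p
Phi = np.array([
    [ph1dd,
     ph1dd + ph2dd,
     -l1*g*np.sin(ph1) + l1**2*ph1dd,
     l1*l2*((2*ph1dd + ph2dd)*np.cos(ph2)
            - (ph2d**2 + 2*ph1d*ph2d)*np.sin(ph2))
     - l1*g*np.sin(ph1) - l2*g*np.sin(ph1 + ph2)
     + l1**2*ph1dd + l2**2*(ph1dd + ph2dd)],
    [0.0,
     ph1dd + ph2dd,
     0.0,
     l1*l2*(ph1d**2*np.sin(ph2) + ph1dd*np.cos(ph2))
     - l2*g*np.sin(ph1 + ph2) + l2**2*(ph1dd + ph2dd)]])

assert np.allclose(T, Phi @ p)
```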

Ronald Aarts PeSi/8/4


Experimental approach

• Measure the torques τ i = [T1, T2]T at N time instances ti (i = 1..N ) with


known angles, velocities and accelerations q i = [φ1, φ2]T , q̇ i = [φ̇1, φ̇2]T and
q̈ i = [φ̈1, φ̈2]T .
   
τ1 Φ1
 τ2   Φ 
• Collect b =  and A =  . 2  in b = Ap.
   
 ..   . 
τN ΦN

• With the pseudoinverse the LS solution is p = A†b.

Ronald Aarts PeSi/8/5


Experimental approach (2)

• In order to compute the solution p = A†b, matrix A has to be known with


sufficient accuracy, so all angles, velocities and accelerations have to be known
with sufficient accuracy.

• In order to compute the solution p = A†b, the pseudoinverse of matrix A has


to exist. In other words: AT A may not be singular or all columns of A should be
linearly independent (full column rank). This is the case when:
• All parameters are independent.
• The input angles, velocities and accelerations q , q̇ , q̈ are sufficiently exciting.

→ The rank of matrix A can be checked with an SVD analysis:
All singular values should be (sufficiently) unequal to zero.

Ronald Aarts PeSi/8/6


Experimental approach (3)

• An upper bound for the relative error of the estimated parameter vector is

||p − pLS || / ||p|| ≤ cond(A) ||en|| / ||b|| ,

where en is the measurement noise and the condition number is

cond(A) = σmax(A) / σmin (A) ,

with the largest and smallest singular values σmax(A) and σmin(A),
respectively.

→ The input angles, velocities and accelerations q , q̇ , q̈ can be optimised to


minimise the condition number.
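A quick numerical illustration of the error amplification (Python sketch with an artificial regression matrix whose third column is nearly dependent on the first):

```python
import numpy as np

rng = np.random.default_rng(0)
A_good = rng.standard_normal((100, 3))
A_bad = A_good.copy()
A_bad[:, 2] = A_bad[:, 0] + 1e-6 * rng.standard_normal(100)

print(np.linalg.cond(A_good))   # modest: measurement noise is hardly amplified
print(np.linalg.cond(A_bad))    # huge: tiny noise can ruin the estimate
```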

Ronald Aarts PeSi/8/7


Experimental approach (4)

• How can such an optimisation be carried out?


• How can velocities and accelerations be computed accurately when only
position sensor data is available?

Possible solution: Use finite Fourier series:


φi(t)  = Σ_{l=1}^{Ni} [ a^i_l sin(ωf l t) + b^i_l cos(ωf l t) ] + φi(0)

φ̇i(t)  = Σ_{l=1}^{Ni} [ a^i_l ωf l cos(ωf l t) − b^i_l ωf l sin(ωf l t) ]

φ̈i(t)  = Σ_{l=1}^{Ni} [ −a^i_l ωf² l² sin(ωf l t) − b^i_l ωf² l² cos(ωf l t) ]

The fundamental pulsation of the Fourier series ωf should match the total
measurement time.
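A small Python check of such a series and its analytic derivative (arbitrary coefficients; the analytic derivative should match a finite difference):

```python
import numpy as np

wf = 2*np.pi/10.0                                  # fundamental for a 10 s measurement
a = np.array([0.4, -0.2, 0.1])                     # invented a^i_l coefficients
b = np.array([0.1, 0.3, -0.05])                    # invented b^i_l coefficients
l = np.arange(1, 4)

def phi(t):   # angle
    return np.sum(a*np.sin(wf*l*t) + b*np.cos(wf*l*t)) + 0.5

def dphi(t):  # analytic angular velocity
    return np.sum(a*wf*l*np.cos(wf*l*t) - b*wf*l*np.sin(wf*l*t))

t, h = 1.234, 1e-6
assert abs((phi(t + h) - phi(t - h)) / (2*h) - dphi(t)) < 1e-6
```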

Ronald Aarts PeSi/8/8


Experimental approach (5)

• For each φi there are 2 × Ni + 1 parameters (a^i_l, b^i_l for l = 1..Ni and φi(0))
that can be optimised for optimal excitation while e.g. motion constraints are
satisfied, e.g. with the MATLAB command fmincon.

• The use of harmonic functions makes it possible to compute velocities and
accelerations analytically.
From experimental data the time derivatives can be computed by considering
only the relevant frequencies after Fourier transform.

• Due to periodicity several periods can be averaged for improved signal to


noise ratio.

• The input signal contains only the selected frequencies.

Ronald Aarts PeSi/8/9


Identifiability

• Column rank deficiency of matrix A is not always easy to see beforehand as a
linear dependency of parameters can be hidden well.

Consider e.g. a horizontal pendulum, so with no gravity. Then with parameter
vector p = [J1, J2, m2, m3]T the elements of Φ can be written as

Φ11 = φ̈1
Φ12 = φ̈1 + φ̈2
Φ13 = l1² φ̈1
Φ14 = l1 l2 ((2φ̈1 + φ̈2) cos φ2 − (φ̇2² + 2 φ̇1 φ̇2) sin φ2)
      + l1² φ̈1 + l2² (φ̈1 + φ̈2)
Φ21 = 0
Φ22 = φ̈1 + φ̈2
Φ23 = 0
Φ24 = l1 l2 (φ̇1² sin φ2 + φ̈1 cos φ2) + l2² (φ̈1 + φ̈2)

Ronald Aarts PeSi/8/10


Identifiability (2)
[Figure: the same double pendulum, now without gravity.]

Now the first and third column of Φ are dependent as

[Φ13; Φ23] = l1² [Φ11; Φ21]

It implies that p1 = J1 and p3 = m2 can not be estimated separately. Instead
only the linear combination p1 + l1² p3 = J1 + l1² m2 can be estimated.

By defining a new parameter vector p∗ = [J1 + l1² m2, J2, m3]T a matrix Φ
with full rank can be constructed and the pseudoinverse exists (provided
sufficient excitation).

Such a parameter vector is denoted a base parameter vector.
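The hidden dependency shows up directly in an SVD of a regression matrix built from the gravity-free Φ above (Python sketch with invented lengths and random states):

```python
import numpy as np

l1, l2 = 0.5, 0.3
rng = np.random.default_rng(1)

rows = []
for _ in range(20):                        # 20 random states, no gravity
    ph2 = rng.uniform(-np.pi, np.pi)
    ph1d, ph2d, ph1dd, ph2dd = rng.standard_normal(4)
    rows.append([ph1dd, ph1dd + ph2dd, l1**2*ph1dd,
                 l1*l2*((2*ph1dd + ph2dd)*np.cos(ph2)
                        - (ph2d**2 + 2*ph1d*ph2d)*np.sin(ph2))
                 + l1**2*ph1dd + l2**2*(ph1dd + ph2dd)])
    rows.append([0.0, ph1dd + ph2dd, 0.0,
                 l1*l2*(ph1d**2*np.sin(ph2) + ph1dd*np.cos(ph2))
                 + l2**2*(ph1dd + ph2dd)])
A = np.array(rows)

s = np.linalg.svd(A, compute_uv=False)
print(s)                     # the smallest singular value is (numerically) zero
assert s[-1] < 1e-10 * s[0]  # rank r = 3 < 4 columns
```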

Ronald Aarts PeSi/8/11


Measurement errors

Suppose parameter set p∗ is obtained for measurement set τ ∗ and parameter
set p∗ + δp is obtained for measurement set τ ∗ + δτ .
How are δp and δτ related?

For cost function VN (p, τ ) and the estimated parameter sets:

∂VN /∂p (p∗, τ ∗) = 0   and   ∂VN /∂p (p∗ + δp, τ ∗ + δτ ) = 0

Taylor series for the second expression

∂²VN /∂p² (p∗, τ ∗) δp + ∂²VN /∂p∂τ (p∗, τ ∗) δτ = 0
so
δp = − [ ∂²VN /∂p² (p∗, τ ∗) ]⁻¹ ∂²VN /∂p∂τ (p∗, τ ∗) δτ

Ronald Aarts PeSi/8/12


Measurement errors (2)

For least squares criterion VN (p, τ ) = ½ ε(p, τ )T ε(p, τ ):

First derivative: ∂VN /∂p = ∂/∂p [ ½ ε(p, τ )T ε(p, τ ) ] = ε(p, τ )T S(p, τ ) with
sensitivity matrix S(p, τ ) = ∂ε(p, τ )/∂p.

Second derivative: ∂²VN /∂p² = ε(p, τ )T ∂S(p, τ )/∂p + S(p, τ )T S(p, τ )

Small residue, so ignore first term and

δp = −(S(p∗, τ ∗)T S(p∗, τ ∗))⁻¹ S(p∗, τ ∗)T δτ = −S(p∗, τ ∗)† δτ

Ronald Aarts PeSi/8/13


Non-linear parameter optimisation (as not the whole world is linear)

→ Numerical minimisation of function V by iterative search methods

θ̂ (i+1) = θ̂ (i) + αf (i)

with search direction f (i) and (positive) step size factor α.

1. using function values only, or


2. using function values and the gradient, or
3. using function values, the gradient and the Hessian (second derivative
matrix).

Typical member of group 3: Newton algorithms with search direction

f (i) = − [ V ′′(θ̂ (i)) ]⁻¹ V ′(θ̂ (i))

Subclass of group 2 are quasi-Newton methods: estimate the Hessian.

Ronald Aarts PeSi/8/14


Non-linear least-squares problem

Quadratic criterion: VN (θ) = ε(θ)T ε(θ)

Gradient: VN′ (θ) = S(θ)T ε(θ) with sensitivity matrix S.

Hessian: VN′′ (θ) = S(θ)T S(θ) + ε(θ)T S ′(θ)


or VN′′ (θ) ≈ S(θ)T S(θ)

Gauss-Newton (µN(i) = 1, λ = 0) / damped Gauss-Newton (λ = 0) /
Levenberg-Marquardt (λ > 0):

θ̂N(i+1) = θ̂N(i) − µN(i) [ S(θ̂N(i))T S(θ̂N(i)) + λI ]⁻¹ S(θ̂N(i))T ε(θ̂N(i))

see e.g. MATLAB’s “Optimization Toolbox”.
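A compact Python sketch of this update on a toy one-parameter problem (fitting θ in y = e^{−θt}; all values invented):

```python
import numpy as np

t = np.linspace(0.0, 2.0, 20)
theta_true = 1.3
y = np.exp(-theta_true * t)           # noise-free data

theta, mu, lam = 0.2, 1.0, 1e-3       # initial guess, step size, LM term
for _ in range(50):
    eps = np.exp(-theta * t) - y                    # residual vector
    S = (-t * np.exp(-theta * t))[:, None]          # sensitivity d eps / d theta
    step = -np.linalg.solve(S.T @ S + lam*np.eye(1), S.T @ eps)
    theta += mu * step.item()

assert abs(theta - theta_true) < 1e-6
```

With µ = 1 and λ = 0 this reduces to plain Gauss-Newton; λ > 0 regularises the step when S(θ)ᵀS(θ) is nearly singular.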

Ronald Aarts PeSi/8/15


Parameter identification: More complex model.

[Figure 1. Stäubli RX90B six-axis industrial robot. Courtesy of Stäubli,
Faverges, France.]
[Figure 2. Finite element model of the Stäubli RX90B: beam and hinge
elements for links 1–6 and joints 1–6, a slider truss, and the base frame.]

Six axes industrial robot

Ronald Aarts PeSi/8/16


Outline:

• Modelling: Which effects are included in the (dynamic) model?


• Masses: acceleration and gravity: Yes.
• Friction: Yes.
• Gravity compensation spring: Yes.
• Driving system: Yes.
• Elasticity of the mechanism: No, so only “low” frequencies.

• Modelling: Complete the equations of motion and indicate the parameters.

• Experiment: How is the data collected?

• Identification: Linear in the parameters or not?

Ronald Aarts PeSi/8/17


Modelling (1): Masses and spring

From Dynamics of Machines (191131730):

M̄ q̈ + DF (x)T M (D2F (x) q̇ ) q̇ − f + DF (e,c)T σ (c) = T ,   (3)

• q are the six degrees of freedom, i.e. the joint angles.


• First two terms are the inertia effects:
• Reduced mass matrix M̄ .
• Accelerations q̈ .
• First order geometric transfer functions of Jacobians DF .
• Force f includes gravity.
• Total stress in the gravity compensating spring σ (c).

• Joint torques T

Ronald Aarts PeSi/8/18


Modelling (2): Masses and spring parameters

[Figure: a link element with lumped mass m(k), rotational inertia J (k) and
centre of gravity vector s(k) (Toon Hardeman, 2005).]

• Lumped masses: Each link is described by a symmetric rotational inertia
matrix J (k), a mass m(k) and a vector s(k) defining the center of gravity
with respect to the corresponding element node at which the body is lumped.
For each link element a lumped parameter vector p(l,k) is defined as

p(l,k) = (m, msx′ , msy′ , msz ′ , Jx′x′ , Jx′y′ , Jx′z ′ , Jy′y′ , Jy′z ′ , Jz ′z ′ )(k)

containing the components of m(k), s(k) and J (k), respectively.

• Total stress in pre-stressed spring: σ (c) = σ (c,0) + ks e(c) , (4)

depends on elongation e(c) and parameters: pre-stress σ (c,0) and stiffness ks .

Ronald Aarts PeSi/8/19


Modelling (3): Driving system

[Figure 3. Layout of the driving system. (a) Joints 1 to 4. (b) Joints 5 and 6.]

• Generated torques depend on known motor constants kj(m) and currents ij :
  Tj(m) = kj(m) ij .
• Part of the torque is needed for the acceleration of the motor inertia:
  Jj(m) φ̈j .
• All friction losses are combined into a single term Tj(f ) .

• The input-output relation of the complete driving system

T (m) − T T J (m) T q̈ − T (f ) = T ,   (6)

with gear ratios T and all motor inertias collected in J (m).

Ronald Aarts PeSi/8/20


Modelling (4): Friction – first try

Friction torques have been measured in single joint constant velocity
experiments. Commonly used model equation:

Tj(f ) = Tj(C,0) + Tj(s,0) exp[ −(q̇j /q̇j(s))^δj(a) ] + Tj(v,0) q̇j
         (Coulomb)  (Stribeck)                         (viscous)

[Figure: friction torque T (f )/T (max) versus angular velocity q̇ for (a) the full
velocity range and (b) the low velocity range.
Dots (•): experiments.
Dashed (- - -): estimated in the full velocity range.
Solid (—): estimated in the range from 0 to 0.5 rad/s.]

Ronald Aarts PeSi/8/21


Modelling (5): Friction – improved model

After looking carefully at the experimental data and with physical insight:

Tj(f ) = Tj(a,0) exp[ −(q̇j /q̇j(s))^δj(a) ] + Tj(v,0) q̇j^(1−δj(v)) .   (7)

[Figure: friction torque T (f )/T (max) versus angular velocity q̇ for (a) the full
velocity range and (b) the low velocity range.]

This friction model gives an accurate fit in the full velocity range with a minimal
parameter set and physically sound model structure (Rob Waiboer, 2005).

Ronald Aarts PeSi/8/22


Modelling (6): Friction – conclusion

Tj(f ) = Tj(a,0) exp[ −(q̇j /q̇j(s))^δj(a) ] + Tj(v,0) q̇j^(1−δj(v)) .   (7)

• For each friction torque there are five unknown parameters:
  • the static friction torque Tj(a,0),
  • the Stribeck velocity q̇j(s),
  • the Stribeck velocity power δj(a),
  • the viscous friction coefficient Tj(v,0) and
  • the viscous friction power δj(v).

• All parameters can be obtained with a non-linear parameter estimation,


provided a reasonable initial guess is available.
• Parameters Tj(a,0) and Tj(v,0) depend strongly on temperature and have to
be identified again in each experiment.
Fortunately, these parameters appear linearly in the model!
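As an illustration of the non-linear estimation, a Python sketch using scipy.optimize.curve_fit on synthetic, noise-free data (all parameter values and the initial guess are invented):

```python
import numpy as np
from scipy.optimize import curve_fit

def Tf(qd, Ta0, qds, da, Tv0, dv):
    # friction model (7): Stribeck term plus velocity-power viscous term
    return Ta0 * np.exp(-(qd/qds)**da) + Tv0 * qd**(1.0 - dv)

qd = np.linspace(0.01, 5.0, 200)
p_true = (0.25, 0.05, 0.6, 0.04, 0.3)
T = Tf(qd, *p_true)

# a reasonable initial guess is needed for the non-linear estimation
p_hat, _ = curve_fit(Tf, qd, T, p0=(0.24, 0.06, 0.55, 0.045, 0.25))
print(p_hat)   # close to p_true
```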

Ronald Aarts PeSi/8/23


Modelling (7): Putting it all together

All submodels can be combined in a single equation:

M̄ (m) q̈ + DF (x)T M (D2F (x) q̇ ) q̇ − f + DF (e,c)T σ (c) + T (f ) = T (m) .   (8)

• The motor inertias J (m) are included in the new reduced mass matrix M̄ (m).
• The joint driving torques T (m) are computed from the measured motor
currents.
• This acceleration linear form is well suited for simulations.

• It can be proven that a parameter linear form can be obtained, provided:


• the proposed model structure is used to model the mass
and inertia properties (Toon Hardeman, 2004) and
• the non-linear parameters in the friction model are taken
constant.

Ronald Aarts PeSi/8/24


Modelling (8): Parameter linear form

Φ(q̈ , q̇ , q )p = T (m) (9)

The parameter linear form is well suited for a linear least squares fit of the
model parameters p:

• p(l): 60 parameters of the lumped inertia parameters of the six links.

• p(s): 2 parameters of the spring (pre-stress σ (c,0) and stiffness ks ).

• p(m) : 6 parameters with the motor inertias.

• p(f ) : 14 parameters representing the linear part of seven equations for


friction torques.

So in total 82 parameters!

Ronald Aarts PeSi/8/25


Identification experiment

• Move the robot along a trajectory for the joint angles q (t).

• Record the joint driving torques T (m) (t) at n time instances.

• Compute all matrices Φi (q̈ i, q̇ i, q i) from the (actual) trajectory q (t).

• Collect all data in the regression matrix A and measurement vector b:

A = [ Φ1(q̈ 1, q̇ 1, q 1) ; ... ; Φn(q̈ n, q̇ n, q n) ]   and   b = [ T 1(m) ; ... ; T n(m) ]   (15)

• Identification experiment: Multi-sine (well-defined input in frequency domain).

Ronald Aarts PeSi/8/26


Parameter identification: LS solution

• Estimate parameter vector p by minimising the residual ρ in the linear system


of equations

Ap = b + ρ. (14)

• The least-squares solution

p̂ = arg minp ||ρ||2² ,   (16)

can be obtained using the pseudo-inverse A†

p̂ = A†b = (AT A)−1AT b, (19)

provided the inverse (AT A)−1 exists, so A has full rank.

Ronald Aarts PeSi/8/27


Parameter identification (2): SVD

• Unfortunately, A is rank deficient

rank(A) = r < m, (20)

where m is the length of the parameter vector p (m = 82).

• Apply Singular Value Decomposition

A = U ΣV T , (21)

with orthogonal 6n × 6n matrix U , m × m matrix V .


" #
S
Σ= (22)
0

contains the nonnegative singular values S = diag(σ1, σ2, ..., σm) in


descending magnitude.

Ronald Aarts PeSi/8/28


Parameter identification (3): SVD result

[Figure: singular values σ of the regression matrix versus singular value
number (1 to 82); beyond number 55 they drop by many orders of magnitude.]

• For two different trajectories the rank of the regression matrix appears to
be r = 55.
• Does that mean that 55 parameters can be estimated meaningfully?

Ronald Aarts PeSi/8/29


Parameter identification (4): essential parameters and null space

• Split S in nonzero part S 1 (r × r) and zero part S 2

S = [ S 1  0 ;  0  S 2 ] ,   (23)

and accordingly

U = [ U 1  U 2 ]   and   V = [ V 1  V 2 ]   (24)

• Transform and partition parameter vector p with the right singular matrix V :

p = V 1α(E) + V 2α(N ) (25)

α(E) is the essential parameter vector that contains the parameter


combinations that can be estimated independently,
α(N ) is associated with the null space: It can have any value!
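A minimal truncated-SVD solver in Python (toy 3-parameter example whose third column equals the first, so only r = 2 essential parameters exist):

```python
import numpy as np

def tsvd_solve(A, b, r):
    """LS solution keeping only the r largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    alpha = (U[:, :r].T @ b) / s[:r]   # essential parameters g_i / sigma_i
    return Vt[:r].T @ alpha

# rank-2 regression matrix in 3 parameters (third column = first column)
A = np.array([[1., 0., 1.], [0., 1., 0.], [1., 1., 1.], [2., 1., 2.]])
b = A @ np.array([1.0, 2.0, 3.0])      # exact data from a "true" p

p_hat = tsvd_solve(A, b, 2)            # only r = 2 essential parameters
assert np.allclose(A @ p_hat, b)       # the residual is zero...
assert not np.allclose(p_hat, [1.0, 2.0, 3.0])   # ...but p is not unique
```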

Ronald Aarts PeSi/8/30


Parameter identification (5): parameter solution and ambiguity

• Transform measurement vector b with the left singular matrix U

g = U T b, (30)

then the original LS problem can be replaced by


" #
S
α̂ = arg min || α − g ||22, (31)
0
α
with solution for the essential parameters
(E) gi
α̂i = , for i = 1, 2, ...r. (32)
σi

• An infinite number of parameters is found

p̂ = V 1α̂(E) + V 2α̂(N ) (34)

as α̂(N ) can take any value.

Ronald Aarts PeSi/8/31


Parameter identification (6): residual error and parameter accuracy

• An estimate of the variance of the residual is

s² = var(ρ) = 1/(6n − r) Σ_{i=1}^{6n} ρi²   (38)

• Now the variance of the LS estimate α̂i equals

var(α̂i) = s² / σi² .   (39)

So for small singular values, the accompanying α̂i can not be estimated
accurately.

• Such inaccurate estimates will ruin the parameters in p!!!

Ronald Aarts PeSi/8/32


Parameter identification (7): truncated SVD

• To avoid degradation of the parameter accuracy: Do not take into account too
small singular values.

• In other words: Take only r singular values into account where r is smaller
than the rank of the regression matrix A.
This is the truncated or partial SVD method.

• How large should r be?


• Large enough to avoid model errors and to obtain a
small residue.
• Small enough to avoid degradation of the accuracy of the
parameter estimate and overfit.

Ronald Aarts PeSi/8/33


Parameter identification (8): Residual error

• Consider the residual error as a function of the number of essential
parameters r.

[Figure: residual ||ρ||2² versus the number of singular values r (1 to 55).]

• About 20 parameters seem to be sufficient.
• How can we determine r more exactly?
It depends on the error in the parameter estimate that is accepted.

Ronald Aarts PeSi/8/34


Intermezzo: Scaling

• Scaling is important for any MISO, SIMO or MIMO identification!

• The absolute errors in all torques are weighted equally.

E.g. scaling with the maximum torques for each joint.

• The resulting errors in the parameters are also absolute errors.

E.g. scaling with some nominal parameter set effectively changes this into
relative errors of all parameters.

Ronald Aarts PeSi/8/35


Parameter identification (9): Applying truncated SVD

[Figure: σ, |g| and the 10% threshold versus singular value number (1 to 55).
Dots (•): singular values σi. Solid (—): magnitude of gi.]

Parameter scaling: gi should be near σi, which is true for about the first 30
parameters.

From an estimate of the variance s² and a maximum parameter error of 10%,
the value of the smallest acceptable singular value is computed: the dashed
line (- - -).

So only 22 or 23 parameters can be estimated sufficiently accurately.

Ronald Aarts PeSi/8/36


Simulation results

Measured and simulated joint torques (22 essential parameters).

[Figure 8. The simulated and measured joint torques along the trajectory as a
function of time, for (a) joint 1, (b) joint 3 and (c) joint 6. The simulation has
been carried out with model M1. With: the measured torque, the simulated
torque and the residual torque.]

Remaining residual error appears to be mainly unmodelled high-frequency
dynamics.

Ronald Aarts PeSi/8/37


Closed loop identification
Identification of a system G0 being controlled by a control system C can be
desirable / necessary for:
• instability of the system,
• safety requirements,
• impossibility to remove the control system,
• controller relevant identification.

[Block diagram: the controller C generates the system input u from the
feedforward r1 and the tracking error r2 − y; noise e filtered by H0 adds to
the output y of the system G0.]

Situation:
• Prerequisites: y(t) = G0(z) u(t) + H0(z) e(t)
  u(t) = C(z) [r2(t) − y(t)] + r1(t)
  The control system C(z) is known or not.
• Wanted: estimate Ĝ of system G0 from measurements of system output
  y(t) and furthermore system input u(t) and/or feedforward r1(t) and/or
  tracking reference r2(t).
• Problem: input u(t) is correlated with the disturbance v(t) = H0(z) e(t)
  because of the controller C(z).

Ronald Aarts PeSi/9/1


Notations:

By defining signal r(t) = r1(t) + C(z)r2(t) the feedback is written as:


u(t) = r(t) − C(z)y(t).

With the input sensitivity S0(z) = (I + C(z)G0(z))−1


and the output sensitivity W0(z) = (I + G0(z)C(z))−1

(SISO: S0 = W0)

it follows u(t) = S0(z)r(t) − C(z)W0(z)H0(z)e(t)


y(t) = G0(z)S0(z)r(t) + W0(z)H0(z)e(t)

Ronald Aarts PeSi/9/2


Closed loop identification problems:

When is it possible
(a) to identify a model [G(z, θ̂N ), H(z, θ̂N )] consistently.
(b) to identify a system model G(z, θ̂N ) consistently.
(c) to formulate an explicit approximation criterion that characterises the
asymptotic estimator G(z, θ ∗) independent of Φv (ω)
(d) to set a specific order for G0 in the model set.
(e) to identify an unstable process G0.
(h) to guarantee that the identified model for G0 will be stabilised by the
applied control system C.

Ronald Aarts PeSi/9/3


Solution strategy depends also on starting point:

• Measurements of the system output y(t).


• Measurements of the system input u(t).
• Knowledge regarding presence and properties of signals r1(t) and
r2(t).
• Measurements of r1(t) and/or r2(t).
• Knowledge of controller C(z).

For most solution methods it is required that system G0 (and its model) have at
least one delay.

What can go wrong?

Ronald Aarts PeSi/9/4


Example of spectral analysis: SISO closed loop system

Measurements of system input u(t) and output y(t).

Background information: r2(t) ≡ 0, r1(t) and e(t) uncorrelated.

Define: z(t) = C(z)H0(z)e(t).

Then: y(t) = S0(z) [G0(z)r1(t) + (1/C(z)) z(t)]
      u(t) = S0(z) [r1(t) − z(t)]

So: Φu(ω) = |S0(eiω )|² [Φr1 (ω) + Φz (ω)]
    Φyu (ω) = |S0(eiω )|² [G0(eiω )Φr1 (ω) − (1/C(eiω )) Φz (ω)]

Ronald Aarts PeSi/9/5


Suppose that accurate estimators for Φu(ω) and Φyu(ω) exist (N → ∞), then
an estimator for G0 is given by:

Ĝ(eiω ) = Φyu (ω) / Φu(ω) = [ G0(eiω )Φr1 (ω) − Φz (ω)/C(eiω ) ] / [ Φr1 (ω) + Φz (ω) ]

Situation 1: No noise, so z(t) ≡ 0 and Φz (ω) = 0, then:

Ĝ(eiω ) = G0(eiω )

Situation 2: No external signal, so r1(t) ≡ 0 and Φr1 (ω) = 0, then:

Ĝ(eiω ) = −1 / C(eiω )

in other words the inverse controller is estimated!

In general: weighted combination of both situations.
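Situation 2 is easy to reproduce in simulation. A Python sketch (an assumed first-order system with proportional feedback u = −f y and no external excitation; the spectral estimate returns −1/f , not G0):

```python
import numpy as np
from scipy import signal

f = 2.0                               # proportional controller C = f
rng = np.random.default_rng(0)
e = rng.standard_normal(4096)

# G0: y(t) = 0.8 y(t-1) + 0.5 u(t-1) + e(t), in closed loop u(t) = -f y(t)
y = np.zeros(4096); u = np.zeros(4096)
for t in range(1, 4096):
    y[t] = 0.8*y[t-1] + 0.5*u[t-1] + e[t]
    u[t] = -f*y[t]

w, Pyu = signal.csd(y, u, nperseg=256)    # cross spectrum estimate
w, Puu = signal.welch(u, nperseg=256)     # input spectrum estimate
G_hat = Pyu / Puu
assert np.allclose(G_hat, -1.0/f)         # the inverse controller, not G0!
```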

Ronald Aarts PeSi/9/6


Example of parametric identification: SISO system with proportional feedback

Background information: r1(t) ≡ r2(t) ≡ 0, and assume a system with a


proportional controller: u(t) = f y(t) with f ∈ R.

Consider a first order ARX model with two parameters θ = (a, b)T :

ε(t, θ) = y(t) + a y(t − 1) − b u(t − 1)

With feedback, so: ε(t, θ) = y(t) + (a − bf ) y(t − 1)

In other words all models with the same â − b̂f predict an identical error ε and
can not be distinguished.

Ronald Aarts PeSi/9/7


Overview techniques (• = treated in this course):

Methods: • Direct, • Indirect, • Two-steps, • Joint input/output,
IV, Coprime factorisation, Tailor-made, Dual Youla.

Consistency (Ĝ, Ĥ):      + + + +   + + +
Consistency Ĝ:           + - + -   + + +
Tunable bias:            - - + - + + + +
Fixed model order:       + + - - + +   -
Unstable system:         +   + + + - + +
(G(z, θ̂N ), C) stable:   - -   - + - - /+
C known (j/n):           n n j n j n n j

Ronald Aarts PeSi/9/8


Direct closed-loop identification

Idea: Neglect feedback and estimate model for G0 and/or H0 with common
open-loop techniques from measured {u(t), y(t)}.

Consistent estimators for G0 and/or H0 are possible if

a. The system (G0, H0) is in the model set.


b. There is at least one sample delay in CG0.
c. There is a sufficiently exciting (measurable or unmeasurable) signal r(t)
and/or controller C is sufficiently complex (e.g. nonlinear or order equal
to or larger than G0).

Conclusion: measurements of {u(t), y(t)} seem sufficient.

However: Requirement (a) is not practical.

Ronald Aarts PeSi/9/9


Open-loop asymptotic identification criterion

(1/2π) ∫_{−π}^{π} { |G0(eiω ) − G(eiω , θ)|² Φu(ω) + |H0(eiω )|² Φe(ω) } / |H(eiω , θ)|² dω

is for the closed-loop situation

(1/2π) ∫_{−π}^{π} { |G0(eiω ) − G(eiω , θ)|² (|S0(eiω )|² / |H(eiω , θ)|²) Φr (ω)
                  + (|H0(eiω )|² |S0(eiω )|² / (|H(eiω , θ)|² |S(eiω , θ)|²)) Φe(ω) } dω

with sensitivity S(eiω , θ) = (1 + C(eiω )G(eiω , θ))−1.

Independent parameterisation is no longer advantageous, e.g. OE (H = 1):


G(eiω , θ) = G0(eiω ) minimises the first term, but G is also in S in the second
term.
Requirements for a small bias: Accurate (and fixed) noise model H0 and/or a
large signal-to-noise ratio at the input u.

Ronald Aarts PeSi/9/10


Indirect closed-loop identification

Idea: (1) identify closed-loop system with common open-loop techniques and
measurements of {r(t), y(t)} and (2) next compute the open-loop system with
knowledge of the controller.

System: y(t) = S0(z)[G0(z) r(t) + H0(z) e(t)].

Model set: y(t) = [Gc(z, θ) r(t) + Hc(z, θ) e(t)].

Step 1 identifies in fact S0G0 and S0H0.

Step 2: solve for the open-loop model:

  G(z, ρ̂)/(1 + G(z, ρ̂)C(z)) = Gc(z, θ̂N )  ⇒  G(z, ρ̂) = Gc(z, θ̂N )/(1 − C(z)Gc(z, θ̂N ))
  H(z, ρ̂)/(1 + G(z, ρ̂)C(z)) = Hc(z, θ̂N )  ⇒  H(z, ρ̂) = Hc(z, θ̂N )/(1 − C(z)Gc(z, θ̂N ))

with parameter vector ρ for the open-loop system.
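Step 2 is a purely algebraic operation. A small numeric check (hypothetical first-order plant and proportional controller, values invented), evaluated on a frequency grid:

```python
import numpy as np

# Numeric check of step 2 on a frequency grid (hypothetical example).
w = np.linspace(0.01, np.pi, 50)
z = np.exp(1j*w)                 # evaluate transfer functions on the unit circle
G0 = 0.5/(z - 0.8)               # open-loop plant
C = 1.2                          # known controller
Gc = G0/(1 + C*G0)               # closed-loop transfer r -> y, as found in step 1
G_rec = Gc/(1 - C*Gc)            # step 2: recover the open-loop plant
print(np.max(np.abs(G_rec - G0)))   # ~ 0 up to rounding
```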

Ronald Aarts PeSi/9/11


Properties indirect method:

• Consistency is comparable with the direct method.
• The order of the model cannot be chosen beforehand: the order of Ĝ is
  usually the sum of the orders of Ĝc and C.
• A consistent estimate of only G0 is possible using independent
  parameterisation.

Ronald Aarts PeSi/9/12


Two-stage method

Idea:
(1) identify closed-loop system sensitivity S0(z) with common open-loop
techniques and measurements of {r(t), u(t)}
(2) use this estimate Ŝ(z) to create a noise-free estimate û(t) of u(t) and
estimate the system G0(z) with an open-loop technique from this estimate û(t)
and the measured y(t).

Note that

y(t) = G0(z)û(t) + S0(z)H0(z)e(t) + G0(z)[S0(z) − Ŝ(z)]r(t)

is not an open-loop system due to the term with r(t), but this contribution
vanishes if the estimate from the first step is sufficiently accurate.
For this reason usually a high order estimate for S0(z) is considered.
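The two stages can be sketched with ordinary least squares (a hypothetical FIR plant and proportional controller; the plant, noise levels and the FIR order M for the sensitivity are all invented for illustration):

```python
import numpy as np

# Two-stage sketch (hypothetical values): FIR plant
# y(t) = b1*u(t-1) + b2*u(t-2) + e(t), controller u(t) = r(t) - f*y(t).
rng = np.random.default_rng(2)
b1, b2, f, N, M = 1.0, 0.5, 0.5, 20000, 30
r = rng.standard_normal(N)
e = 0.2*rng.standard_normal(N)
y, u = np.zeros(N), np.zeros(N)
for t in range(2, N):
    y[t] = b1*u[t-1] + b2*u[t-2] + e[t]
    u[t] = r[t] - f*y[t]

# Stage 1: high-order FIR fit of S0 from {r, u}, then noise-free u_hat
Phi1 = np.column_stack([np.r_[np.zeros(k), r[:N-k]] for k in range(M+1)])
s, *_ = np.linalg.lstsq(Phi1, u, rcond=None)
u_hat = Phi1 @ s

# Stage 2: estimate G0 from {u_hat, y} with an open-loop least-squares fit
Phi2 = np.column_stack([np.r_[0, u_hat[:-1]], np.r_[0, 0, u_hat[:-2]]])
b, *_ = np.linalg.lstsq(Phi2, y, rcond=None)
print(b)   # close to (b1, b2) = (1.0, 0.5)
```

Because û(t) is constructed from r(t) only, it is uncorrelated with the noise, which is what makes the second stage an open-loop problem.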

Ronald Aarts PeSi/9/13


Joint Input-Output method

Idea: Estimate Gy (z) = G0(z)S0(z) and Gu(z) = S0(z) with common
open-loop techniques and measurements of {r(t), y(t)} and {r(t), u(t)},
respectively:

y(t) = G0(z)S0(z)r(t) + W0(z)H0(z)e(t)
u(t) = S0(z)r(t) − C(z)W0(z)H0(z)e(t)

This can be a SIMO identification using {r(t), [y(t), u(t)]}.

Next Ĝ(z) = Ĝy (z) Ĝu(z)−1.

Estimate Ĝ(z) is consistent, if both Ĝy (z) and Ĝu(z) are consistent estimates.

Note that this method can be used for unstable G0(z).
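The ratio construction can be checked numerically on a frequency grid (a hypothetical unstable plant, all values invented):

```python
import numpy as np

# Frequency-grid check of G = Gy/Gu (hypothetical scalar example with an
# unstable open-loop plant, stabilised by a proportional controller C).
w = np.linspace(0.01, np.pi, 50)
z = np.exp(1j*w)
G0 = 1.0/(z - 1.5)            # open-loop pole at z = 1.5 (unstable)
C = 2.0                       # moves the closed-loop pole to z = -0.5 (stable)
S0 = 1/(1 + C*G0)
Gy, Gu = G0*S0, S0            # what the two open-loop identifications deliver
print(np.max(np.abs(Gy/Gu - G0)))   # ~ 0: the ratio recovers the unstable plant
```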

Ronald Aarts PeSi/9/14


Frequency domain identification

When a frequency response is available, e.g. from

• a f -domain measurement, e.g. with a spectrum analyser: often applied
  for mechanical systems.
• a t-domain measurement with a multi-sine excitation (e.g. from
  idinput), processed with FFT.
• an estimate from a non-parametric identification.

Advantages of identification in the frequency domain:

• data reduction
• simplicity of processing: filtering is a multiplication.
• evaluation of models in f -domain: aiming at application.
• choice of discrete-time or continuous-time models.

Ronald Aarts PeSi/10/1


The data:

Ğ(ωj ), ωj ∈ Ω            Data is complex (amplitude/phase)
Ω = {ω1, ω2, ..., ωN }    Frequency grid can be chosen (design)

Suppose: Ğ(ωj ) = G0(eiωj ) + V (ωj ), with
the real response G0(eiωj ) and
a stochastic noise process V (ωj ).

Identification problem: Find θ̂N by minimising

  Σ_{k=1}^{N} |Ğ(ωk ) − G(eiωk , θ)|² |W (ωk )|²

with G(eiωk , θ) = B(eiωk , θ)/A(eiωk , θ) and W (ωk ) a chosen weighting
function.

→ Resembles the time domain OE-problem (non-linear-in-the-parameters).

Ronald Aarts PeSi/10/2


In practice:
• Accurate initial estimate with linear regression techniques.
• Subsequent non-linear optimisation (e.g. Sanathanan-Koerner
algorithm).
• Weighting function is important: whether or not to emphasise high-frequency
  behaviour.

“Tricks”:
• An ARX-like problem (linear-in-the-parameters) can be obtained by
rewriting the optimisation problem (implicit weighting with |A(eiωk , θ)|)
• An iterative linear least-squares solution can be found by weighting with
the |A(eiωk , θ)| found in the previous iteration.
→ Exercise
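The iterative weighting can be sketched as follows (one possible implementation, not the course exercise solution: a first-order model fitted to noise-free FRF data of an invented system, with W(ωk) = 1):

```python
import numpy as np

# Sanathanan-Koerner style iteration for the first-order model
# G(e^iw, th) = (b0 + b1*e^-iw)/(1 + a1*e^-iw), hypothetical data.
w = np.linspace(0.05, np.pi, 100)
x = np.exp(-1j*w)                       # z^-1 on the unit circle
G_data = 0.5*x/(1 - 0.8*x)              # "measured" frequency response

A_prev = np.ones_like(x)                # weighting |A| from the previous pass
for it in range(10):
    # linear-in-the-parameters residual (A(th)*G_data - B(th)) / A_prev
    Phi = np.column_stack([-G_data*x, np.ones_like(x), x]) / A_prev[:, None]
    rhs = G_data / A_prev
    # stack real and imaginary parts to get a real least-squares problem
    th, *_ = np.linalg.lstsq(np.vstack([Phi.real, Phi.imag]),
                             np.r_[rhs.real, rhs.imag], rcond=None)
    a1, b0, b1 = th
    A_prev = 1 + a1*x
print(a1, b0, b1)   # converges to (-0.8, 0.0, 0.5)
```

Each pass is an ARX-like linear least-squares problem; dividing by the previous A removes the implicit |A(eiωk , θ)| weighting as the iteration converges.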

Ronald Aarts PeSi/10/3


Other implementation examples: the freqid toolbox
• Commercial fdident toolbox from ELEC, Vrije Universiteit Brussel.
• Freely downloadable freqid toolbox developed by R.A. de Callafon,
initially at the TU Delft (old Matlab versions).

Starting point is frequency domain data:
• Measured FRF.
• Time domain data + FFT.

• The GUI has some similarities with the ident GUI.

• Models are fitted using a non-linear weighted least squares fit, e.g. using the
accompanying lsfits toolbox.

Ronald Aarts PeSi/10/4


Example: the freqid toolbox (2)

Typical weighting functions:
• None.
• Inverse of data.
• User defined, e.g. frequency range.
• Supplied weighting function.

• Identification can be carried out using both discrete-time and continuous-time
models.

Ronald Aarts PeSi/10/5


Example: the freqid toolbox (3)

• Often used weighting function: the coherence spectrum, see M ATLAB’s help
cohere (mscohere in current releases).

  Cyu(ω) = 1 for a perfect correlation between input u and output y
  Cyu(ω) = 0 for no correlation between input u and output y

Estimate:

  Ĉyu(ω) = √( |Φ̂N yu(ω)|² / ( Φ̂N y (ω) Φ̂N u (ω) ) )

• Identification is possible for both open-loop and closed-loop systems, but in
the latter case the coherence spectrum should be used with care.
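An analogous estimate is available in Python as scipy.signal.coherence, which returns the squared coherence, i.e. the square of the estimate on this slide (simulated first-order system, all values invented):

```python
import numpy as np
from scipy.signal import coherence, lfilter

# Simulated example: output = first-order filter of u plus white measurement
# noise; the coherence drops where the noise dominates the output.
rng = np.random.default_rng(3)
N = 16384
u = rng.standard_normal(N)
y = lfilter([0.5], [1, -0.8], u) + 0.3*rng.standard_normal(N)

f, C2 = coherence(y, u, nperseg=256)    # Welch-based squared coherence
print(C2.min(), C2.max())               # values in [0, 1]
```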

Ronald Aarts PeSi/10/6


Aspects of MIMO systems

• Polynomials become polynomial matrices, e.g. ARX:

A(q −1)y(t) = B(q −1)u(t) + e(t)

with ny × ny matrix A(q −1) = Iny + A1q −1 + ... + Anaq −na, or

            ⎡ a11(q −1)   a12(q −1)   ···  a1ny (q −1)  ⎤
  A(q −1) = ⎢ a21(q −1)   a22(q −1)   ···  a2ny (q −1)  ⎥
            ⎢     ...         ...     ...      ...      ⎥
            ⎣ any1(q −1)  any2(q −1)  ···  any ny (q −1) ⎦

with elements a_kj(q −1) = δ_kj + a_kj^1 q −1 + ... + a_kj^{na_kj} q^{−na_kj}.

Ronald Aarts PeSi/10/7


And the ny × nu matrix B(q −1) = B0 + B1q −1 + ... + Bnbq −nb, or

            ⎡ b11(q −1)   b12(q −1)   ···  b1nu(q −1)  ⎤
  B(q −1) = ⎢ b21(q −1)   b22(q −1)   ···  b2nu(q −1)  ⎥
            ⎢     ...         ...     ...      ...     ⎥
            ⎣ bny1(q −1)  bny2(q −1)  ···  bny nu(q −1) ⎦

with elements b_kj(q −1) = b_kj^1 q^{−nk_kj} + ... + b_kj^{nb_kj} q^{−nk_kj − nb_kj + 1}.

Ronald Aarts PeSi/10/8


Usually: choice of a specific model structure, e.g. a common denominator
polynomial a(z). E.g. the transfer function G(z) = A(z)−1 B(z) of a 2 × 2
system

         ⎡ b11(z)/a(z)  b21(z)/a(z) ⎤                 ⎡ b11(z)  b21(z) ⎤
  G(z) = ⎢                          ⎥ = (a(z)I2×2)−1  ⎢                ⎥
         ⎣ b12(z)/a(z)  b22(z)/a(z) ⎦                 ⎣ b12(z)  b22(z) ⎦

Note: this expression has a large number of parameters, especially when the
poles differ for each of the elements in the matrix. Then the denominator has to
contain all poles and in each element some of them need to be compensated
by zeros of bij (z).

• Simpler representation by means of state space models: MIMO is a
  straightforward extension of SISO.
• Example: Subspace techniques.

Ronald Aarts PeSi/10/9


• Available in the ident toolbox:

  • Models of polynomials with ARMAX-, OE- and BJ-structure for
    MISO-systems.
  • Models of polynomials with ARX-structure for MIMO-systems.
  • Arbitrary state space models for MIMO-systems with subspace
    identification.
  • Structured state space models for MIMO-systems with PEM-
    techniques: OE- and ARMAX-structure.

  Minimisation of (1/N) Σ_{t=1}^{N} Σ_{i=1}^{p} wi εi²(t, θ).

• Extra requirements regarding fixed parameters, available orders and
  structures may be necessary, e.g. for the physical model / identifiability /
  uniqueness.

Ronald Aarts PeSi/10/10


• Closed-loop MIMO identification: For the direct, indirect and two-steps
methods it depends on the capabilities of the applied open-loop identification.

• Frequency domain MIMO identification: The algorithm is also applicable for m
inputs and p outputs: measure all m × p frequency responses Ğjl (ωk ) and
minimise

  Σ_{j=1}^{p} Σ_{l=1}^{m} Σ_{k=1}^{N} |Ğjl (ωk ) − Gjl (eiωk , θ)|² |Wjl (ωk )|²

Also in this case there is a large variation of available parameterisations for
MIMO systems, e.g. left MFD (for the number of outputs p ≤ number of inputs
m):

G(q, θ) = A(q, θ)−1 B(q, θ) with
A(q, θ) = I + A1 q −1 + ... + Ana q −na
B(q, θ) = B0 + B1 q −1 + ... + Bnb q −nb

Ronald Aarts PeSi/10/11
