
Lecture 7: Least Squares Adaptive Filtering
N. Tangsangiumvisai, 2102876 Adaptive Signal Processing

Contents
- Revision on the adaptive filtering algorithm
- The Least Squares Method
- The RLS Algorithm
- Computational Complexity
- Performance Analysis
- Comparison between LMS and RLS
- The Fast RLS (FRLS) Algorithms
- Summary

Revision on the adaptive filtering algorithm

Fig.1: Generic form of an adaptive filter. The adaptive filter w processes the input signal x(n) to form an estimate \hat{d}(n) of the desired signal d(n); the error signal is e(n).

The error signal is
e(n) = d(n) - \hat{d}(n | X_n) = d(n) - w^H(n) x(n)

- The adaptive filtering algorithm is employed to control the filter coefficients so as to minimize some cost function.
- The usual form of the update equation is
w(n+1) = w(n) + \Delta w(n)    ... (1)

Revision (II)
If x(n) and d(n) are zero-mean and WSS random processes, the cost function (MSE) at time n is
J(n) = \sigma_d^2 - w^H(n) p - p^H w(n) + w^H(n) R w(n)    ... (2)
At the minimum point of the error-performance surface, w(n) = w_opt; hence the minimum MSE is
J_min = \sigma_d^2 - p^H w_opt    ... (3)
since
w_opt = R^{-1} p    ... (4)
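As a quick numerical companion to Eqs. (2)-(4), the sketch below (my own illustration with real-valued signals, not part of the slides) estimates R and p from synthetic data and computes the Wiener solution and the corresponding minimum MSE; the filter length, noise level and signal model are arbitrary assumptions.

```python
import numpy as np

# Illustrative check of w_opt = R^{-1} p (Eq. 4) on synthetic, real-valued data.
rng = np.random.default_rng(0)
M, N = 4, 20000
w_true = rng.standard_normal(M)                 # assumed "true" system (for the example only)

x = rng.standard_normal(N)                      # white input signal
# Rows of X are tap-input vectors [x(n), x(n-1), ..., x(n-M+1)]
X = np.column_stack([x[M - 1 - k : N - k] for k in range(M)])
d = X @ w_true + 0.01 * rng.standard_normal(len(X))   # desired signal with measurement noise

R = (X.T @ X) / len(d)       # sample estimate of the autocorrelation matrix
p = (X.T @ d) / len(d)       # sample estimate of the cross-correlation vector

w_opt = np.linalg.solve(R, p)        # Wiener solution, Eq. (4)
J_min = np.var(d) - p @ w_opt        # minimum MSE, Eq. (3)
print(w_opt, w_true, J_min)
```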

Revision (III)
Method of steepest descent:
- a recursive method to find the minimum-MSE coefficients (the Wiener solution) at the global minimum
- uses an exact measurement of the gradient vector

The update equation:
w(n+1) = w(n) + \frac{1}{2} \mu ( -\nabla_w J(n) )    ... (5)
where \mu is a positive constant. Since
\nabla_w J(n) = -2 p + 2 R w(n)    ... (6)
the update becomes
w(n+1) = w(n) + \mu ( p - R w(n) ),   n = 0, 1, 2, ...    ... (7)

Revision (IV)
Steepest Descent algorithm: uses the true gradient.
Least Mean Square algorithm: uses a stochastic gradient. The gradient vector is estimated from the available data, i.e. instantaneous estimates of R and p are used,
\hat{R}(n) = x(n) x^H(n),   \hat{p}(n) = x(n) d^*(n)
giving
\hat{\nabla} J(n) = -2 x(n) d^*(n) + 2 x(n) x^H(n) w(n)
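To make the recursion in Eq. (7) concrete, here is a minimal steepest-descent sketch (not from the slides). It assumes R and p are known exactly, which is precisely what distinguishes it from LMS; the example values of R, p and the step size are illustrative.

```python
import numpy as np

def steepest_descent(R, p, mu, n_iter=200):
    """Iterate w(n+1) = w(n) + mu*(p - R w(n)) toward w_opt = R^{-1} p (Eq. 7)."""
    w = np.zeros(len(p))
    for _ in range(n_iter):
        w = w + mu * (p - R @ w)    # exact-gradient update
    return w

# Convergence requires 0 < mu < 2 / lambda_max(R); pick mu accordingly.
R = np.array([[1.0, 0.5], [0.5, 1.0]])
p = np.array([0.7, 0.3])
mu = 1.0 / np.max(np.linalg.eigvalsh(R))
print(steepest_descent(R, p, mu), np.linalg.solve(R, p))   # both approach w_opt
```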

Revision (V)
The update equation of LMS:
w(n+1) = w(n) + \mu x(n) [ d^*(n) - x^H(n) w(n) ] = w(n) + \mu x(n) e^*(n)    ... (8)
The update equation of NLMS:
w(n+1) = w(n) + \frac{\tilde{\mu}}{\delta + \|x(n)\|^2} x(n) e^*(n)    ... (9)
where \delta is a small positive constant.

LMS / NLMS use an instantaneous estimate of the gradient vector, which gives rise to excess MSE (misadjustment).
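The following sketch shows one possible NumPy rendering of the LMS and NLMS updates in Eqs. (8)-(9) for real-valued signals. The function names, the step sizes and the regularization constant delta are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def lms_update(w, x_vec, d, mu):
    """One LMS iteration, Eq. (8), for real-valued data."""
    e = d - w @ x_vec                      # a priori error e(n)
    return w + mu * x_vec * e, e

def nlms_update(w, x_vec, d, mu_bar, delta=1e-6):
    """One NLMS iteration, Eq. (9); delta guards against division by zero."""
    e = d - w @ x_vec
    return w + (mu_bar / (delta + x_vec @ x_vec)) * x_vec * e, e

# Usage sketch: identify an (assumed) unknown 8-tap FIR system from noisy data.
rng = np.random.default_rng(1)
M, N = 8, 5000
w_true = rng.standard_normal(M)
w = np.zeros(M)
x = rng.standard_normal(N)
for n in range(M - 1, N):
    x_vec = x[n - M + 1 : n + 1][::-1]     # [x(n), x(n-1), ..., x(n-M+1)]
    d = w_true @ x_vec + 0.01 * rng.standard_normal()
    w, e = nlms_update(w, x_vec, d, mu_bar=0.5)
print(np.max(np.abs(w - w_true)))          # small residual after convergence
```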


Least Squares Method
Wiener filter theory:
- obtained from ensemble averages
- based on assumptions about the statistics of the input applied to the adaptive filter
Method of Least Squares:
- a model-dependent procedure
- a best fit obtained by minimizing the sum of squares of the differences between real-valued measurements and the points of a curve constructed to fit those measurements
- a deterministic approach: it involves time averages, and so depends on the number of samples used in the computation

Least Squares Method (II)
Two main families of adaptive filtering algorithms:
LMS family
- Gradient Descent method
- approximate minimization of the mean-squared error
- uses ensemble averages of data (instantaneous values)
RLS family
- Least Squares technique
- exact minimization of the sum of squared errors
- uses time averages of data

Least Squares Method (III)
Consider two sets of variables:
input signal:  x(n), x(n-1), ..., x(n-M+1)
desired signal:  d(n)
The desired signal is modeled as
d(n) = \sum_{k=0}^{M-1} w_k x(n-k) + e_m(n)    ... (10)
where e_m(n) is the measurement error and the unknown system is described by
w(n) = [ w_0  w_1  ...  w_{M-1} ]^T    ... (11)

Least Squares Method (IV)
Fig.2: Linear transversal filter model of the unknown system. The inputs x(n), x(n-1), ..., x(n-M+1) are weighted by w_0, w_1, ..., w_{M-1} and summed to give y(n), which is added to the measurement error e_m(n) to produce d(n).

Least Squares Method (V)
The measurement error e_m(n) is an unobservable random variable. It is assumed to be white with zero mean and variance \sigma_m^2, i.e.
E\{ e_m(n) \} = 0,  \forall n    ... (12)
and
E\{ e_m(n) e_m^*(k) \} = \sigma_m^2  for n = k,  and  0  for n \neq k    ... (13)

Least Squares Method (VI)
Fig.3: System identification. The input x(n) drives both the unknown system w(n), which produces d(n), and the adaptive filter \hat{w}(n), which produces \hat{d}(n); the error signal is e(n) = d(n) - \hat{d}(n).
To estimate the unknown parameter w(n) from the tap weights \hat{w}(n) = [ \hat{w}_0  \hat{w}_1  ...  \hat{w}_{M-1} ]^T, the estimated desired signal is
\hat{d}(n) = \sum_{k=0}^{M-1} \hat{w}_k x(n-k)    ... (14)

Least Squares Method (VII)
The adaptive filter employs the linear transversal filter model.
Fig.4: Linear transversal filter model. A tapped delay line (unit delays z^{-1}) produces x(n), x(n-1), ..., x(n-M+1); these are weighted by \hat{w}_0, \hat{w}_1, ..., \hat{w}_{M-1} and summed to form the filter output, which is compared with d(n).

Least Squares Method (VIII)
The error signal is therefore obtained as
e(n) = d(n) - \hat{d}(n)    ... (15)
     = d(n) - \sum_{k=0}^{M-1} \hat{w}_k x(n-k)    ... (16)
Cost function (based on the pre-windowing method): the error energy
J = \sum_{n=t_1}^{t_2} e^2(n)    ... (17)
where the limits t_1 and t_2 depend on the method of data windowing, described next.

Methods for data windowing

Covariance Method
- makes no assumption about the data outside the interval [1, N]
- t_1 = M,  t_2 = N
- the input data matrix is
  [ x(M)    x(M+1)  ...  x(N)
    x(M-1)  x(M)    ...  x(N-1)
    ...
    x(1)    x(2)    ...  x(N-M+1) ]
  (columns correspond to n = M, M+1, ..., N)

Methods for data windowing (II)

Autocorrelation Method
- assumes that the data prior to time n = 1 and after n = N are all zero
- t_1 = 1,  t_2 = N+M-1
- the input data matrix is
  [ x(1)  x(2)  ...  x(M)    ...  x(N)      0         ...  0
    0     x(1)  ...  x(M-1)  ...  x(N-1)    x(N)      ...  0
    ...
    0     0     ...  x(1)    ...  x(N-M+1)  x(N-M+2)  ...  x(N) ]
  (columns correspond to n = 1, 2, ..., N+M-1)

Methods for data windowing (III)

Pre-windowing Method
- assumes that the data prior to time n = 1 are zero
- t_1 = 1,  t_2 = N
- the input data matrix is
  [ x(1)  x(2)  ...  x(M)    ...  x(N)
    0     x(1)  ...  x(M-1)  ...  x(N-1)
    ...
    0     0     ...  x(1)    ...  x(N-M+1) ]
  (columns correspond to n = 1, 2, ..., N)

Methods for data windowing (IV)

Post-windowing Method
- makes no assumption about the data prior to time n = 1, but assumes that the data after n = N are zero
- t_1 = M,  t_2 = N+M-1
- the input data matrix is
  [ x(M)    x(M+1)  ...  x(N)      0         ...  0
    x(M-1)  x(M)    ...  x(N-1)    x(N)      ...  0
    ...
    x(1)    x(2)    ...  x(N-M+1)  x(N-M+2)  ...  x(N) ]
  (columns correspond to n = M, M+1, ..., N+M-1)
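As a cross-check of the windowing conventions above, the sketch below builds the pre-windowed input data matrix with NumPy; the column-per-time-index layout mirrors the slides, and the helper name is my own.

```python
import numpy as np

def prewindowed_data_matrix(x, M):
    """Build the M x N pre-windowed data matrix: column n holds
    [x(n), x(n-1), ..., x(n-M+1)]^T with x(k) = 0 for k < 1."""
    x = np.asarray(x, dtype=float)
    N = len(x)                      # samples x(1) ... x(N), stored at indices 0 ... N-1
    A = np.zeros((M, N))
    for n in range(1, N + 1):       # columns correspond to n = 1, ..., N
        for k in range(M):          # row k holds x(n - k)
            if n - k >= 1:
                A[k, n - 1] = x[n - k - 1]
    return A

print(prewindowed_data_matrix([1, 2, 3, 4, 5], M=3))
# Expected:
# [[1. 2. 3. 4. 5.]
#  [0. 1. 2. 3. 4.]
#  [0. 0. 1. 2. 3.]]
```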

Least Squares Method (IX)
The cost function of the LS method is
J(\hat{w}_0, \hat{w}_1, ..., \hat{w}_{M-1}) = \sum_{n=t_1}^{t_2} e^2(n)    ... (18)
The k-th component of the gradient vector is
\nabla_k J = -2 \sum_{n=t_1}^{t_2} x(n-k) e(n)    ... (19)
Since \nabla_k J = 0, k = 0, 1, ..., M-1 is required at the minimum, the principle of orthogonality follows:
\sum_{n=t_1}^{t_2} x(n-k) e(n) = 0,   k = 0, 1, ..., M-1    ... (20)

Least Squares Method (X)
By substituting Eq.(16) into Eq.(20), the normal equations of a linear least-squares filter are obtained as
\sum_{k=0}^{M-1} \hat{w}_k \sum_{n=t_1}^{t_2} x(n-i) x(n-k) = \sum_{n=t_1}^{t_2} x(n-i) d(n),   i = 0, 1, ..., M-1
Defining the time-averaged autocorrelation function \phi(k, i) = \sum_{n=t_1}^{t_2} x(n-i) x(n-k) and the time-averaged cross-correlation function z(i) = \sum_{n=t_1}^{t_2} x(n-i) d(n), this becomes
\sum_{k=0}^{M-1} \hat{w}_k \phi(k, i) = z(i),   i = 0, 1, ..., M-1    ... (21)
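A direct transcription of the normal equations (21) is sketched below: the time-averaged correlations \phi(k, i) and z(i) are accumulated by explicit sums over n, and the resulting M x M system is solved. The covariance-method limits t_1 = M, t_2 = N and the toy FIR system are assumptions for this illustration.

```python
import numpy as np

def ls_normal_equations(x, d, M):
    """Solve Eq. (21): sum_k w_k * phi(k, i) = z(i), i = 0..M-1,
    using covariance-method limits (1-based n = M..N on the slides,
    0-based n = M-1..N-1 here)."""
    N = len(x)
    phi = np.zeros((M, M))                 # time-averaged autocorrelation phi(k, i)
    z = np.zeros(M)                        # time-averaged cross-correlation z(i)
    for n in range(M - 1, N):
        for i in range(M):
            z[i] += x[n - i] * d[n]
            for k in range(M):
                phi[k, i] += x[n - i] * x[n - k]
    return np.linalg.solve(phi.T, z)       # weights w_0 ... w_{M-1}

# Tiny usage check against a known FIR system (no measurement noise).
rng = np.random.default_rng(6)
w_true = np.array([0.5, -0.3, 0.1])
x = rng.standard_normal(500)
d = np.convolve(x, w_true)[: len(x)]
print(np.round(ls_normal_equations(x, d, M=3), 6))   # approximately [0.5, -0.3, 0.1]
```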

Least Squares Method (XI)
Vector notation:
the desired response vector (n x 1)
d(n) = [ d(n)  d(n-1)  ...  d(1) ]^T    ... (22)
the data matrix (n x M)
X(n) = [ x^T(n)
         x^T(n-1)
         ...
         x^T(1) ]
     = [ x(n)    x(n-1)  ...  x(n-M+1)
         x(n-1)  x(n-2)  ...  x(n-M)
         ...
         x(1)    x(0)    ...  x(2-M)  ]    ... (23)
(under the pre-windowing assumption, x(k) = 0 for k < 1)

Least Squares Method (XII)
the parameter vector of the filter (M x 1)
w(n) = [ w_1(n)  w_2(n)  ...  w_M(n) ]^T    ... (24)
the error vector (n x 1)
e(n) = [ e(n)  e(n-1)  ...  e(1) ]^T = d(n) - X(n) w(n)    ... (25)

Least Squares Method (XIII)
There are three possibilities for this set of equations:
(1) under-determined, when n < M
(2) exactly determined, when n = M
(3) over-determined, when n > M
Assume that n > M. If w(n) is chosen so that e(n) is zero, then
e(n) = d(n) - X(n) w(n) = 0,   i.e.   d(n) = X(n) w(n)
Pre-multiplying both sides by X^T(n) gives
X^T(n) d(n) = X^T(n) X(n) w(n)

Least Squares Method (XIV)
Then,
( X^T(n) X(n) ) w(n) = X^T(n) d(n)
w(n) = ( X^T(n) X(n) )^{-1} X^T(n) d(n) = \Phi^{-1}(n) z(n)    ... (26)
where \Phi(n) = X^T(n) X(n) is the time-averaged autocorrelation matrix and z(n) = X^T(n) d(n) is the time-averaged cross-correlation vector. Although X(n) is not a square matrix, its pseudo-inverse ( X^T(n) X(n) )^{-1} X^T(n) exists, provided X^T(n) X(n) is nonsingular.

Least Squares Method (XV)
The orthogonality condition of least-squares estimation states that, at the optimal value of w(n), the error is orthogonal to the data on which the prediction is made:
e^T(n) X(n) = 0^T    ... (27)
Therefore
( d(n) - X(n) w(n) )^T X(n) = 0^T
d^T(n) X(n) = w^T(n) X^T(n) X(n)
and applying the transpose operator on both sides yields
X^T(n) d(n) = X^T(n) X(n) w(n)
w(n) = ( X^T(n) X(n) )^{-1} X^T(n) d(n) = \Phi^{-1}(n) z(n)    ... (28)
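A compact numerical sketch of Eqs. (26)-(28) is shown below: it forms \Phi(n) and z(n) from a data matrix and compares the normal-equation solution with NumPy's least-squares routine. The data are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, M = 100, 4                  # over-determined case: n > M
X = rng.standard_normal((n_samples, M))          # data matrix X(n), n x M
w_true = rng.standard_normal(M)
d = X @ w_true + 0.05 * rng.standard_normal(n_samples)   # desired response vector

Phi = X.T @ X                          # time-averaged autocorrelation matrix
z = X.T @ d                            # time-averaged cross-correlation vector
w_ls = np.linalg.solve(Phi, z)         # normal-equation solution, Eq. (28)

w_lstsq, *_ = np.linalg.lstsq(X, d, rcond=None)  # same solution via the pseudo-inverse
print(np.allclose(w_ls, w_lstsq))      # True (up to numerical precision)
```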

Recursive Least Squares (RLS)
Recursively compute the updated estimate of the tap-weight vector upon the arrival of new data. The RLS cost function requires no statistical information about x(n) or d(n), and is given by
J(n) = \sum_{i=1}^{n} e^2(i)    ... (29)
Note that the length of the data record increases with n.

RLS algorithm (II)
By partitioning the data matrix into the new data vector and the old data matrix,
X(n+1) = [ x^T(n+1) ;  X(n) ]    ((n+1) x M)    ... (30)
the sample autocorrelation matrix of the input (M x M) can now be written as
\Phi(n+1) = X^T(n+1) X(n+1)    ... (31)
          = \sum_{i=1}^{n+1} x(i) x^T(i) = \sum_{i=1}^{n} x(i) x^T(i) + x(n+1) x^T(n+1) = \Phi(n) + x(n+1) x^T(n+1)    ... (32)

RLS algorithm (III)
Matrix Inversion Lemma: if
A = B + C D C^T    ... (33)
then
A^{-1} = B^{-1} - B^{-1} C ( C^T B^{-1} C + D^{-1} )^{-1} C^T B^{-1}    ... (34)
If we select
A = \Phi(n+1),   B = \Phi(n),   C = x(n+1),   D = 1
then
\Phi^{-1}(n+1) = \Phi^{-1}(n) - \frac{ \Phi^{-1}(n) x(n+1) x^T(n+1) \Phi^{-1}(n) }{ x^T(n+1) \Phi^{-1}(n) x(n+1) + 1 }    ... (35)

RLS algorithm (IV)
Define the Kalman gain vector
k(n) = \Phi^{-1}(n) x(n)    ... (36)
and the likelihood variable
\alpha(n+1) = x^T(n+1) \Phi^{-1}(n) x(n+1)    ... (37)
From Eq.(35), we have
\Phi^{-1}(n+1) ( \alpha(n+1) + 1 ) = \Phi^{-1}(n) ( \alpha(n+1) + 1 ) - \Phi^{-1}(n) x(n+1) x^T(n+1) \Phi^{-1}(n)
Post-multiplying both sides by x(n+1) gives
k(n+1) ( \alpha(n+1) + 1 ) = \Phi^{-1}(n) ( \alpha(n+1) + 1 ) x(n+1) - \Phi^{-1}(n) x(n+1) \alpha(n+1)
so that
k(n+1) = \frac{ \Phi^{-1}(n) x(n+1) }{ \alpha(n+1) + 1 }    ... (38)
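As a quick sanity check of Eqs. (33)-(35) (my own illustration, not part of the slides), the snippet below verifies the rank-one form of the matrix inversion lemma numerically for a random \Phi(n) and x(n+1).

```python
import numpy as np

rng = np.random.default_rng(3)
M = 5
B = rng.standard_normal((M, M))
Phi_n = B @ B.T + M * np.eye(M)        # a symmetric positive-definite Phi(n)
x_new = rng.standard_normal(M)         # the new input vector x(n+1)

Phi_next = Phi_n + np.outer(x_new, x_new)          # rank-one update, Eq. (32)

P = np.linalg.inv(Phi_n)                           # Phi^{-1}(n)
alpha = x_new @ P @ x_new                          # likelihood variable, Eq. (37)
P_next = P - np.outer(P @ x_new, x_new @ P) / (alpha + 1.0)   # Eq. (35)

print(np.allclose(P_next, np.linalg.inv(Phi_next)))   # True: the lemma holds
```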

RLS algorithm (V)
Since
\alpha(n+1) = x^T(n+1) \Phi^{-1}(n) x(n+1)   and   k(n+1) = \frac{ \Phi^{-1}(n) x(n+1) }{ \alpha(n+1) + 1 }
Eq.(35) can be rewritten as
\Phi^{-1}(n+1) = \Phi^{-1}(n) - k(n+1) x^T(n+1) \Phi^{-1}(n)    ... (39)

RLS algorithm (VI)
The time-averaged cross-correlation vector is updated as
z(n+1) = \sum_{i=1}^{n+1} d(i) x(i) = z(n) + d(n+1) x(n+1)    ... (40)
To solve
w(n+1) = \Phi^{-1}(n+1) z(n+1)    ... (41)
substituting Eq.(39) and Eq.(40) into Eq.(41) yields
w(n+1) = w(n) + k(n+1) \xi(n+1)    ... (42)
where the a priori estimation error is given by
\xi(n+1) = d(n+1) - x^T(n+1) w(n)    ... (43)
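One possible NumPy transcription of the recursion in Eqs. (37)-(39) and (42)-(43) is sketched below for the growing-window case; the variable names follow the slide notation, and the wrapper function itself is my own.

```python
import numpy as np

def rls_step(w, P, x_new, d_new):
    """One RLS iteration (growing-window case).
    w     : current weight vector w(n)
    P     : current inverse correlation matrix Phi^{-1}(n)
    x_new : new input vector x(n+1)
    d_new : new desired sample d(n+1)
    """
    alpha = x_new @ P @ x_new                 # likelihood variable, Eq. (37)
    k = (P @ x_new) / (alpha + 1.0)           # Kalman gain vector, Eq. (38)
    xi = d_new - x_new @ w                    # a priori error, Eq. (43)
    w = w + k * xi                            # weight update, Eq. (42)
    P = P - np.outer(k, x_new @ P)            # inverse update, Eq. (39)
    return w, P, xi

# Usage: initialise with P = (1/delta) * I (see the RLS initialisation slide)
# and call rls_step(w, P, x_vec, d) once for each new sample.
```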

RLS Initialisation
Assumption on the input data: the pre-windowing method is used, i.e. the data prior to n = 0 are zero.
Re-define
\Phi(n) = \sum_{i=1}^{n} x(i) x^T(i) + \delta I    ... (44)
where \delta is a small positive constant; hence
\Phi^{-1}(0) = \delta^{-1} I    ... (45)

Computational Complexity
The per-iteration cost of RLS is counted, equation by equation, in numbers of multiplications (x) and additions/subtractions (+/-):
\alpha(n+1) = x^T(n+1) \Phi^{-1}(n) x(n+1)
k(n+1) = \Phi^{-1}(n) x(n+1) / ( \alpha(n+1) + 1 )
\xi(n+1) = d(n+1) - x^T(n+1) w(n)
w(n+1) = w(n) + k(n+1) \xi(n+1)
\Phi^{-1}(n+1) = \Phi^{-1}(n) - k(n+1) x^T(n+1) \Phi^{-1}(n)
The matrix-vector products in the first and last lines each require on the order of M^2 multiplications and additions, so the total complexity is O(M^2) operations per iteration.

Non-stationary environment
Previously, a statistically stationary environment was assumed for the RLS algorithm. In a non-stationary environment, the cost function is modified to
J(n) = \sum_{i=1}^{n} \lambda^{n-i} e^2(i)    ... (46)
where \lambda \in (0, 1] is a forgetting factor.

RLS algorithm (VII)
With the exponentially weighted cost function of Eq.(46), the recursion becomes
\alpha(n+1) = x^T(n+1) \Phi^{-1}(n) x(n+1)
k(n+1) = \frac{ \lambda^{-1} \Phi^{-1}(n) x(n+1) }{ 1 + \lambda^{-1} \alpha(n+1) }
\xi(n+1) = d(n+1) - x^T(n+1) w(n)
w(n+1) = w(n) + k(n+1) \xi(n+1)
\Phi^{-1}(n+1) = \lambda^{-1} \Phi^{-1}(n) - \lambda^{-1} k(n+1) x^T(n+1) \Phi^{-1}(n)
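The sketch below assembles the exponentially weighted recursion into a complete filter loop, including the \delta I initialisation of Eq. (45). The function name, the values of lam and delta, and the toy identification scenario are illustrative choices, not values from the lecture.

```python
import numpy as np

def rls_filter(x, d, M, lam=0.99, delta=1e-2):
    """Exponentially weighted RLS with forgetting factor lam and
    regularised initialisation Phi^{-1}(0) = (1/delta) I, Eq. (45)."""
    N = len(d)
    w = np.zeros(M)
    P = np.eye(M) / delta
    xi_hist = np.zeros(N)
    for n in range(N):
        # tap-input vector [x(n), x(n-1), ..., x(n-M+1)], pre-windowed (zeros before n = 0)
        x_vec = np.zeros(M)
        m = min(M, n + 1)
        x_vec[:m] = x[n::-1][:m]
        Px = P @ x_vec
        alpha = x_vec @ Px
        k = (Px / lam) / (1.0 + alpha / lam)          # gain vector
        xi = d[n] - x_vec @ w                         # a priori error, Eq. (43)
        w = w + k * xi                                # weight update, Eq. (42)
        P = (P - np.outer(k, Px)) / lam               # inverse-correlation update
        xi_hist[n] = xi
    return w, xi_hist

# Toy usage: identify an unknown 4-tap FIR system.
rng = np.random.default_rng(4)
w_true = np.array([0.8, -0.4, 0.2, 0.1])
x = rng.standard_normal(3000)
d = np.convolve(x, w_true)[: len(x)] + 0.01 * rng.standard_normal(len(x))
w_hat, _ = rls_filter(x, d, M=4)
print(np.round(w_hat, 3))   # should be close to w_true
```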

Performance Analysis
By writing the desired signal as
d(n) = w_o^T x(n) + e_m(n)    ... (47)
where w_o denotes the true (unknown) system and e_m(n) is the measurement error, with zero mean and variance \sigma_m^2 and independent of the input signal, and by defining the weight-error vector as
\epsilon(n) = w(n) - w_o    ... (48)
the a priori estimation error can be expressed as
\xi(n) = d(n) - w^T(n-1) x(n)
       = e_m(n) - ( w(n-1) - w_o )^T x(n) = e_m(n) - \epsilon^T(n-1) x(n)    ... (49)

Performance Analysis (II)
For convenience, the MSE of RLS is evaluated in terms of the a priori estimation error,
J'(n) = E\{ \xi^2(n) \}    ... (50)
Hence, the MSE of RLS is equal to
J'(n) = \sigma_m^2 + tr\{ R K(n-1) \}    ... (51)
where R = E\{ x(n) x^T(n) \} is the autocorrelation matrix of the input and K(n) = E\{ \epsilon(n) \epsilon^T(n) \} is the weight-error correlation matrix.
Since RLS is based on a deterministic cost function, the steady-state MSE approaches the variance of the measurement error, \sigma_m^2, as n -> \infty (in a stationary environment).
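To illustrate the steady-state behaviour described by Eq. (51), here is an informal numerical check (not part of the slides): with lam = 1 in a stationary environment, the averaged squared a priori error settles near the measurement-noise variance \sigma_m^2. The system, noise level and run length are arbitrary assumptions.

```python
import numpy as np

# Informal check of Eq. (51): the a priori MSE of RLS settles at sigma_m^2.
rng = np.random.default_rng(5)
M, N, sigma_m = 4, 20000, 0.05
w_true = np.array([0.8, -0.4, 0.2, 0.1])
x = rng.standard_normal(N)
d = np.convolve(x, w_true)[:N] + sigma_m * rng.standard_normal(N)

w, P = np.zeros(M), np.eye(M) / 1e-2             # Phi^{-1}(0) = (1/delta) I
xi_sq = np.zeros(N)
for n in range(N):
    x_vec = np.zeros(M)
    m = min(M, n + 1)
    x_vec[:m] = x[n::-1][:m]                     # pre-windowed tap-input vector
    Px = P @ x_vec
    k = Px / (1.0 + x_vec @ Px)                  # gain, Eq. (38)
    xi = d[n] - x_vec @ w                        # a priori error, Eq. (43)
    w, P = w + k * xi, P - np.outer(k, Px)       # Eqs. (42) and (39)
    xi_sq[n] = xi**2

print(np.mean(xi_sq[-5000:]), sigma_m**2)        # both approximately 0.0025
```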

LMS vs. RLS
Convergence rate:
- LMS: on the order of 20M iterations
- RLS: on the order of 2M iterations
- The convergence rate of RLS does not depend on the condition number of the input data, as it does in the case of LMS.
Computational complexity:
- LMS: O(2M)
- RLS: O(M^2)

LMS vs. RLS (II)
Robustness:
- LMS: leakage can be applied, and LMS can operate in fixed-point implementations.
- RLS: numerically sensitive to rounding errors, particularly when \lambda < 1.
Tracking performance:
- LMS: controlled by the step size \mu.
- RLS: controlled by the forgetting factor, through (1 - \lambda).

Fast RLS (FRLS) algorithms
- The aim is to reduce the computational complexity of RLS to a cost linear in the filter length, O(\alpha L) with \alpha > 2, by the use of linear forward and backward prediction.
- Examples:
  - Fast Transversal Filters (FTF)
  - Fast Newton Transversal Filters (FNTF)
  - Fast Quasi-Newton (FQN)
  - Fast LMS/Newton
  - Fast Least Squares (FLS)
- Problem: numerical instability, particularly when \lambda < 1.

Summary
- The cost function of RLS requires no statistical information about the input or desired signals.
- RLS achieves improved performance compared with LMS, but at a higher computational cost.
Next lecture, Lecture 8: Frequency-domain algorithms.
