
Learning feature characteristics

Simon J. Hickinbotham, Edwin R. Hancock and James Austin


Department of Computer Science
University of York, York YO1 5DD, UK.

Abstract

This paper describes a statistical framework for the unsupervised learning of linear filter combinations for feature characterisation. The learning strategy is two-step. In the first instance, the EM algorithm is used to learn the foreground probability distribution. This is an abductive process, since we have a detailed model of the background process based on the known noise-response characteristics of the filter-bank. The EM algorithm is therefore used to learn the parameters of a radial-basis expansion which describes the residual probability distribution when the background is subtracted. The second phase uses the a posteriori foreground and background probabilities to compute a weighted between-class covariance matrix. We use principal components analysis to locate the linear filter combinations that maximise the between-class covariance matrix. The new feature characterisation method is illustrated for the problem of extracting linear features from complex millimetre radar images. Here the method proves to be effective in learning a mixture of sine and cosine phase Gabor functions necessary to capture shadowed line structures.

1 Introduction

Multichannel filter banks have recently been demonstrated to offer exciting possibilities for complex feature characterisation and iconic object recognition. The basic idea is to represent quite intricate grey-scale appearance using the response pattern of a high-dimensional filter bank. In practice, the filter bank is usually constructed from some well defined basis functions. Examples include Gabor wavelets, Hermite polynomials or derivatives of Gaussians. These filter banks are designed so that the channels are distributed over representative scales, orientations and feature symmetries.

Concrete examples in the literature include the recent contributions of Rao and Ballard [1], von der Malsburg [2, 3], and Bregler and Malik [4]. Rao and Ballard's contribution was to show how the filter bank response pattern could be encoded into the Kanerva memory and used to recognise 3D objects under different poses. Von der Malsburg and co-workers [2, 3] have used a relatively simple multi-channel representation to model the appearance of key facial features. Rather than providing overall recognition themselves, these so-called jets are used to control the fitting of an active net to faces in different 3D poses. De Bonet and Viola use a multiscale filter bank to extract 48,000 features. The features are used to perform content-based image indexing [5]. Mel's channel model is perhaps one of the most ambitious [6]. Here there are over 100 different channels specialised not only to local features, but also to colour, spatial or linear contiguity (blobs and lines) and local curvature (corners).

One of the key issues that arises when such a multi-channel feature or object representation is used is that of how to learn the pattern of filter responses. The literature here is relatively sparse. Most of the work which exploits neural network architectures adopts the working model that the recognition process should be trained from a few examples and that the generalisation properties of the network should be exploited to accommodate variable object appearance [1]. An example of a more principled approach is Bregler and Malik's [4] use of the expectation-maximisation algorithm [7] to learn the channel mixing proportions. This procedure has been demonstrated to work effectively on relatively noise-free and uncluttered imagery.

In this paper we are interested in the challenging feature recognition problems posed by noisy radar data. Here we wish to identify filters capable of characterising complex radar reflection patterns due to elevated features in the landscape. In particular we are interested in learning filters for enhancing linear features such as roads. We commence from a similar starting point to that of Bregler and Malik [4], by adopting the EM algorithm as a learning engine. However, our methodology differs in a number of important respects.

In the first instance we commence by performing channel balancing so as to ensure that each of the components of the filter bank has an equal noise throughput. In order to model the unknown foreground feature distribution for the channel responses, we adopt a radial-basis distribution. The basic idea is to fit a series of Gaussian basis functions to the residue of the probability distribution when the background process is subtracted. The parameterised distribution is used to compute a posteriori feature probabilities in the expectation step of the learning process. Finally, once the a posteriori feature probabilities are to hand, they may be used to project out the optimal set of channel combinations for the foreground or target features. Here we use the between-class covariance matrix as a foreground-background separation measure. We project out linear filter combinations by applying principal components analysis to the between-class covariance matrix.

2 Channel Model

The overall aim in this paper is to describe a statistical methodology for learning combinations of channel filters for feature characterisation. The learning procedure is non-linear and is based on the EM algorithm of Dempster, Laird and Rubin [7]. In this section we outline the statistical model that underpins the learning algorithm.

2.1 Filter Bank

We are interested in identifying an orientational filter basis consisting of odd and even symmetry kernels that can be used to characterise mixed-symmetry variable-width hedge-features. Although there are many alternatives available in the literature, here we make use of the Gabor filter. If x and y denote the spatial co-ordinates, then the so-called Gabor functions with horizontal orientation, spatial width w and frequency \lambda are as follows:

L_{w,0}(x, y) = \exp\left[ -\frac{x^2 + y^2}{2 w^2} \right] \cos[2 \pi \lambda x]    (1)

E_{w,0}(x, y) = \exp\left[ -\frac{x^2 + y^2}{2 w^2} \right] \sin[2 \pi \lambda x]    (2)

Since it is of even symmetry, the cosine-phase Gabor kernel L_{w,0}(x, y) operates as a line-enhancement operator. The sine-phase kernel, on the other hand, is appropriate to edge-detection. The filter pair described above is appropriate to the detection of intensity features aligned along the x-axis of the image plane. Kernels appropriate to the detection of features oriented along the vertical are obtained by rotating the horizontal kernels by \pi/2. To make this angular dependence explicit, we let L_{w,\theta}(x, y) and E_{w,\theta}(x, y) denote the even and odd kernels for orientation state \theta in the image.

We are interested in constructing feature-vectors from the responses of a battery of Gabor filters oriented along the horizontal and vertical axes of the image lattice. The filter-bank is composed of filters of two scales, w and 2w. The filter responses are stacked to form the image feature-vector x_{ij}. For shorthand convenience we write the vector of filter responses as x_{ij} = W \otimes I, where W represents the vector of filter kernels and \otimes is the convolution operation with the image I.
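To make the construction concrete, the following is a minimal sketch of the eight-channel bank described above (even and odd phase, horizontal and vertical orientations, scales w and 2w). The kernel size and the default values of w and lambda are illustrative assumptions, not parameters quoted in the paper.

```python
import numpy as np

def gabor_pair(w, lam, theta, size=21):
    """Even (cosine-phase, line-enhancing) and odd (sine-phase,
    edge-enhancing) Gabor kernels of eqs (1)-(2), rotated to theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated carrier co-ordinate
    env = np.exp(-(x ** 2 + y ** 2) / (2.0 * w ** 2))   # isotropic Gaussian envelope
    return env * np.cos(2 * np.pi * lam * xr), env * np.sin(2 * np.pi * lam * xr)

def build_bank(w=3.0, lam=0.1):
    """Stack the eight kernels: {even, odd} x {0, pi/2} x {w, 2w}."""
    kernels = []
    for scale in (w, 2.0 * w):
        for theta in (0.0, np.pi / 2.0):
            L, E = gabor_pair(scale, lam, theta)
            kernels += [L, E]
    return np.stack(kernels)   # shape (8, size, size)
```

Convolving the image with each kernel and stacking the eight responses then yields the channel vector x_{ij} at every pixel.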

2.2 Background

Our statistical modelling of the filter-bank output commences by considering the image background. Here we assume that the filters are being applied to locally uniform image regions containing no significant features or structure. We further assume that the uniform structureless regions are subject to additive Gaussian noise with zero mean and variance \sigma^2. Since the filter responses are obtained in a linear fashion from the noisy image data, the channel response vector x_{ij} = W \otimes I follows a multivariate Gaussian distribution with zero mean. In other words, the probability density function for the background distribution of channel-vectors is

p(x_{i,j} | \beta) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} x_{i,j}^T \Sigma^{-1} x_{i,j} \right]    (3)

The elements of the channel-vector covariance matrix \Sigma = E(x_{i,j} x_{i,j}^T) are proportional to the autocorrelations and cross-correlations of the individual filter kernels. This noise distribution will be used to model the background contributions in our training data.
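Because the channels are linear, the covariance \Sigma of eq. (3) follows directly from the kernels: for white noise of variance sigma^2, E[x_a x_b] = sigma^2 * sum_p W_a(p) W_b(p). A minimal sketch, reusing the hypothetical build_bank helper above; the unit-normalisation line is one plausible realisation of the channel balancing mentioned in the introduction, not the paper's stated procedure.

```python
import numpy as np

def background_covariance(bank, sigma=1.0, balance=True):
    """Channel covariance under zero-mean white noise of std sigma:
    Sigma[a, b] = sigma^2 * sum_p W_a(p) * W_b(p)."""
    flat = bank.reshape(len(bank), -1).copy()
    if balance:
        # channel balancing: unit L2 norm gives every filter
        # the same noise throughput
        flat /= np.linalg.norm(flat, axis=1, keepdims=True)
    return sigma ** 2 * flat @ flat.T

def mahalanobis_length(x, Sigma):
    """Length used throughout Section 3: sqrt(x^T Sigma^{-1} x)."""
    return float(np.sqrt(x @ np.linalg.solve(Sigma, x)))
```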

2.3 Foreground

The statistical modelling of the foreground distribution of Mahalanobis length for feature detection has proved to be an extremely elusive task. For instance, attempts at modelling the distribution of edge-gradient for automatic control of the Canny hysteresis thresholds have confined their attention to the background or noise process [8, 9]. In this paper our aim is to use the background model to assist in learning the channel structure of the foreground. Specifically, we augment the Gaussian background model with a radial-basis expansion which we use to parameterise foreground structure. We distinguish between the different basis kernels by assigning them a label \omega. The complete set of foreground kernels is denoted by the set \Omega. The kernel indexed \omega has mean Mahalanobis length \mu_\omega and mixing proportion \alpha_\omega. The basis functions are assumed to have a radial structure. In other words, the channel covariance matrix is diagonal with identical elements and is of the form \Sigma_\omega = \sigma_\omega^2 I, where I is the identity matrix. With these ingredients, the basis kernel indexed \omega is

p(x_{i,j} | \omega, \Theta) = \frac{1}{(2\pi)^{n/2} \sigma_\omega^n} \exp\left[ -\frac{1}{2} (x_{i,j} - \mu_\omega)^T \Sigma_\omega^{-1} (x_{i,j} - \mu_\omega) \right]    (4)

where \Theta_\omega represents the set of basis parameters for the kernel indexed \omega:

\Theta_\omega = (\alpha_\omega, \mu_\omega, \sigma_\omega)^T    (5)

Finally, the radial-basis approximation of the foreground is

p(x_{i,j} | \Theta) = \sum_{\omega \in \Omega} \alpha_\omega \, p(x_{i,j} | \omega, \Theta)    (6)

The complete model of the filter outputs can then be expressed as follows:

p(x_{i,j}) = \alpha_\beta \, p(x_{i,j} | \beta) + \sum_{\omega \in \Omega} \alpha_\omega \, p(x_{i,j} | \omega, \Theta)    (7)

3 Learning

In this section we outline our learning algorithm. This is based on the EM algorithm. The idea is to iterate between expectation and maximisation steps to learn the parameters of the foreground radial-basis expansion.

3.1 Objective Measure:

Stated succinctly, we use the foreground radial-basis expansion to model the distribution of channel vectors which remains unexplained by the background process. In order to capture this process in a statistical framework we measure the Kullback divergence between the component basis kernels and the complement of the background distribution. The quantity of interest is

K(\Theta^{(n+1)} | \Theta^{(n)}) = \sum_{x \in H} \sum_{\omega \in \Omega} P(\omega | x, \Theta^{(n)}) \ln \frac{p(x | \Theta^{(n+1)})}{1 - p(x | \beta)}    (8)

where H is the set of Mahalanobis lengths of the data points making up the image. The basic aim is to minimise K(\Theta^{(n+1)} | \Theta^{(n)}) with respect to the mixing proportions, mean channel-vectors and radial covariance parameters for the set of basis functions. The solution to this problem is well known and is furnished by the EM algorithm [10].

3.2 Maximisation:

In the maximisation step we aim to recover maximum likelihood basis-parameters which satisfy the condition

\Theta^{(n+1)} = \arg\max_{\Theta} K(\Theta | \Theta^{(n)})    (9)

At iteration n of the algorithm, the position of the basis-function indexed \omega is given by

\mu_\omega^{(n+1)} = \frac{\sum_{x \in H} (1 - P(x | \beta)) \, P(\omega | x, \Theta^{(n)})}{\sum_{x \in H} P(\omega | x, \Theta^{(n)})}    (10)

The corresponding basis-function width is equal to

(\sigma_\omega^{(n+1)})^2 = \frac{\sum_{x \in H} \left( (1 - P(x | \beta)) - \mu_\omega^{(n)} \right)^2 P(\omega | x, \Theta^{(n)})}{\sum_{x \in H} P(\omega | x, \Theta^{(n)})}    (11)

3.3 Expectation:

In the expectation-step of the algorithm, the a posteriori probabilities of the basis-components are updated. The updated probabilities are related to the current estimates of the mixing proportions and the new density estimates using the Bayes formula in the following manner

P(\omega | x, \Theta^{(n+1)}) = \frac{\alpha_\omega^{(n)} \, p(x | \omega, \Theta^{(n)})}{\sum_{\omega' \in \Omega} \alpha_{\omega'}^{(n)} \, p(x | \omega', \Theta^{(n)})}    (12)

New mixing proportions are computed by averaging the updated a posteriori probabilities over the training data

\alpha_\omega^{(n+1)} = \frac{1}{|H|} \sum_{x \in H} P(\omega | x, \Theta^{(n+1)})    (13)
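Read together, eqs (10)-(13) give one complete EM pass. The sketch below is our paraphrase of a single iteration over the set H of Mahalanobis lengths, with each basis kernel treated as a one-dimensional Gaussian over length; the array names and the supplied background probabilities p_bg are assumptions about how the quantities would be stored, not notation from the paper.

```python
import numpy as np

def em_step(h, p_bg, alpha, mu, sigma):
    """One EM iteration for the foreground radial-basis expansion.
    h     : (N,) Mahalanobis lengths (the set H)
    p_bg  : (N,) background probabilities P(x | beta)
    alpha, mu, sigma : (K,) mixing proportions, positions, widths."""
    target = 1.0 - p_bg                       # mass unexplained by the background
    # expectation, eq (12): a posteriori basis probabilities
    dens = (np.exp(-0.5 * ((h[:, None] - mu) / sigma) ** 2)
            / (np.sqrt(2.0 * np.pi) * sigma))
    resp = alpha * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # maximisation, eqs (10)-(11): re-estimate positions and widths
    total = resp.sum(axis=0)
    mu_new = (target[:, None] * resp).sum(axis=0) / total
    var_new = (((target[:, None] - mu) ** 2) * resp).sum(axis=0) / total
    # eq (13): new mixing proportions from averaged responsibilities
    alpha_new = total / len(h)
    return alpha_new, mu_new, np.sqrt(var_new)
```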

4 Optimal class separation filters

The unsupervised learning strategy outlined in the previous section delivers a radial-basis expansion of the foreground Mahalanobis length distribution for the output of the filter-bank. In this section, we aim to exploit this probability distribution to modify the filter-bank so that it gives optimal foreground-background separation. Specifically, we aim to find linear combinations of the Gabor functions that maximise the between-class covariance matrix \Sigma_B. The optimal filter combinations are located using principal components analysis. Suppose that the weighted mean-vectors for the foreground, background and population are denoted by z_f, z_b and z_p respectively; then the between-class covariance matrix is equal to

\Sigma_B = q_b (z_b - z_p)(z_b - z_p)^T + q_f (z_f - z_p)(z_f - z_p)^T    (14)

where q_b and q_f are the total background and foreground weights.
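Eq. (14) transcribes directly into code. In this sketch the per-pixel foreground weights are taken to be the a posteriori foreground probabilities from Section 3, with background weights one minus these; that weighting scheme is our reading of the text rather than an explicit formula in the paper.

```python
import numpy as np

def between_class_covariance(X, w_fg):
    """Sigma_B of eq. (14).
    X    : (N, 8) channel vectors x_ij
    w_fg : (N,) a posteriori foreground probabilities."""
    w_bg = 1.0 - w_fg
    qf, qb = w_fg.sum(), w_bg.sum()             # total class weights
    zf = (w_fg[:, None] * X).sum(axis=0) / qf   # weighted foreground mean
    zb = (w_bg[:, None] * X).sum(axis=0) / qb   # weighted background mean
    zp = X.mean(axis=0)                         # population mean
    df, db = zf - zp, zb - zp
    return qb * np.outer(db, db) + qf * np.outer(df, df)
```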


Our basic aim in performing principal components analysis on the output of the multichannel filter bank is to identify the linear transformation of the individual filter responses that has maximum between-class variance with respect to the foreground and background distributions. The principal components transformation is obtained by solving the eigenvalue equation |\Sigma_B - \lambda I| = 0, where I is the 8 x 8 identity matrix. The eigenvalues \lambda_1, ..., \lambda_8 are the variances of the components of the transformed feature-vectors. Associated with each of these eight eigenvalues is an eigenvector V_\lambda whose components satisfy the system of linear equations \Sigma_B V_\lambda = \lambda V_\lambda. These eigenvectors are the axes of a new orthonormal co-ordinate system. Specifically, the eight eigenvectors are the columns of the transformation matrix between the original feature-vectors and the principal components representation. If we denote the transformation matrix by \Phi = (V_1, ..., V_8), then the transformed feature-vector is given by y_{ij} = \Phi^T \cdot W \otimes I_{i,j}. The individual principal components are the inner products of the eigenvectors and the vector of filter responses, i.e. y_{ij}^\lambda = V_\lambda^T W \otimes I. As a result, we can re-write the principal components transformation to make the role of the filter-bank clearer. Substituting for the vector of filter responses, i.e. x_{i,j} = W \otimes I, we find that y_{i,j} = \Phi^T W \otimes I. In other words, the transformed vector of filter responses can be obtained equivalently by convolution with the transformed filter-bank \tilde{W} = \Phi^T W. Each element of the vector \tilde{W} therefore represents a new filter that is obtained by linearly combining the filter kernels of the original Gabor basis.
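The eigen-analysis and the recombination \tilde{W} = \Phi^T W then amount to a few lines; Sigma_B and bank here refer to the hypothetical helpers sketched earlier, so this is an illustrative assembly rather than the authors' own code.

```python
import numpy as np

# eigen-decomposition of the symmetric 8 x 8 between-class covariance
evals, evecs = np.linalg.eigh(Sigma_B)
Phi = evecs[:, np.argsort(evals)[::-1]]        # columns V_1..V_8, decreasing variance

# transformed filter-bank: each slice linearly combines the Gabor kernels
W_tilde = np.tensordot(Phi.T, bank, axes=1)    # shape (8, size, size)

K_hedge = W_tilde[0]    # leading combination V_1^T W: the Figure 2 kernel

# transformed channel vector at a pixel: y = Phi.T @ x; the residual modulus
# used in Section 5 is np.sqrt((y[1:] ** 2).sum())
```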

5 Experiments

Figure 1: (a) original radar image; (b) modulus of filter-vectors; (c) principal component of maximum between-class variance; (d) modulus of remaining components.

The experimental evaluation of our feature characterisation algorithm revolves around Millimetric Doppler Beam Sharpened (MDBS) radar images. These images differ from their SAR counterparts in a number of important respects. Firstly, the frequency of the radar is of the order of 100 GHz rather than the 10 GHz which is typical of SAR. This means that structures whose size is of the order of a few millimetres appear rough to the radar. The shorter wavelengths employed in the MDBS imagery are a consequence of physical constraints imposed upon the dimensions of the resonating cavities in airborne military radars. The second difficulty stems from the imaging geometry. Since the radar is used to sense objects in the line of flight from a low-flying aircraft, the images are subject to small-angle systematics.

Figure 2: Composite filter suggested by the principal eigenvector of the between-class covariance matrix.
The scenes under study are of rural areas where the principal man-made features available for cartographic matching are linear road and hedge structures. Since they are elevated, these features produce ridge artefacts in the radar images. The image background is typically grassland, which appears rough to the radar and results in specularities.

Figure 1 shows a sample image and the intermediate stages of the analysis. Panel (a) shows the example image and panel (b) shows the modulus of the filter-vectors. Panel (c) shows the principal component of maximum between-class variance, i.e. y_{ij}^1 = V_1^T x_{ij}. The contrast between hedge and background features is clearly enhanced. Panel (d) shows the modulus of the transformed vector obtained by dropping the leading principal component (the quantity of interest is y_{ij}^R = \sqrt{\sum_{k=2}^{8} (V_k^T x_{ij})^2}). Here the contrast between the elevated structures and the background is very poor. In other words, the filter combination obtained by principal components analysis extracts most of the salient raised structure from the filtered intensity image.

Finally, Figure 2 shows the filter obtained by combining the raw Gabor kernels in the proportions suggested by the leading principal component. In other words, the filter-kernel for hedge-enhancement is K(x, y) = V_1^T W(x, y). The combined kernel is predominantly of even-symmetry structure. It is this even-symmetry component which enhances line-structure. However, there is also an odd-symmetry component which allows for an admixture of edge-like structure in the detected features. It is the shadowing of the elevated features which is responsible for the small edge-component.

6 Conclusions

The main contribution of this paper has been to present a new methodology for learning filters for complex intensity feature characterisation. The learning strategy is an unsupervised one. We commence with a suitably selected filter-basis that can capture the structure of a wide variety of feature types due to variations in scale, orientation and local symmetry (phase). Next, we model the noise output of the filter-bank under the assumption of additive Gaussian intensity errors. The distribution of filter responses which remains unexplained by the background noise-model is assumed to have originated from genuine feature structure. By fitting a radial-basis model to the complement of the noise-distribution, we are able to compute foreground and background probabilities for the feature structure using the EM algorithm.

The second step is to use the a posteriori feature probabilities to compute a between-class covariance matrix for the foreground and background structure in the image. We seek linear combinations of the original basis filters that result in maximum between-class variance. Since we are dealing with a two-class problem, maximising the between-class variance will automatically result in minimum within-class variance. As a result, the linear filter combinations can be identified by applying principal components analysis to the between-class covariance matrix.

References

[1] R. P. N. Rao and D. H. Ballard. Dynamic model of visual recognition predicts neural response properties in the visual cortex. Neural Computation, 9:721-763, 1997.

[2] T. Maurer and C. von der Malsburg. Learning feature transformations to recognise faces rotated in depth. In Proceedings of the International Conference on Artificial Neural Networks, pages 353-358. EC2 et Compagnie, 1995.

[3] L. Wiskott, J. M. Fellous, N. Kruger, and C. von der Malsburg. Face recognition by elastic bunch graph matching. IEEE PAMI, 19:775-779, 1997.

[4] C. Bregler and J. Malik. Learning appearance-based models: mixtures of second moment experts. In Advances in Neural Information Processing Systems, pages 845-851. MIT Press, 1996.

[5] J. S. De Bonet, C. L. Isbell, and P. Viola. MIMIC: Finding optima by estimating probability densities. In Advances in Neural Information Processing Systems 9, pages 424-430. MIT Press, 1997.

[6] B. W. Mel. SEEMORE: Combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition. Neural Computation, 9:777-804, 1997.

[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39:1-38, 1977.

[8] H. Voorhees and T. Poggio. Detecting textons and texture boundaries in natural images. In Proceedings of the First International Conference on Computer Vision, pages 250-258. Computer Society Press, 1987.

[9] E. R. Hancock and J. Kittler. Edge labelling using dictionary-based relaxation. IEEE PAMI, 12:165-181, 1990.

[10] M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6:181-214, 1994.

