
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 11, NOVEMBER 1998


Are Multilayer Perceptrons Adequate for
Pattern Recognition and Verification?
Marco Gori, Senior Member, IEEE, and Franco Scarselli
Abstract—This paper discusses the ability of multilayer perceptrons (MLPs) to model the probability distribution of data in typical pattern recognition and verification problems. It is proven that multilayer perceptrons with sigmoidal units and a number of hidden units less than or equal to the number of inputs are unable to model patterns distributed in typical clusters, since these networks draw open separation surfaces in the pattern space. When using more hidden units than inputs, the separation surfaces can be closed but, unfortunately, it is proven that deciding whether or not an MLP draws closed separation surfaces in the pattern space is NP-hard. The major conclusion of this paper is somewhat opposite to what is believed and reported in many application papers: MLPs are definitely not adequate for applications of pattern recognition requiring a reliable rejection and, especially, they are not adequate for pattern verification tasks.

Index Terms—Multilayer perceptrons, pattern recognition, pattern verification, function approximation, closed hemisphere problem.

1 INTRODUCTION
In the last 10 years, multilayer perceptrons (MLPs) have
been massively used in the area of pattern recognition.
The experimental results have been impressive in some
applications where we know in advance that the patterns
belong to a small number of classes. In those cases, be-
cause of their strong discrimination capabilities, MLPs
exhibit excellent performance (see, e.g., [1], [2]). In most
practical applications, however, one needs to perform the
classification into a fixed number of classes, but also needs
to carry out a reliable pattern rejection. Basically, patterns
have a different degree of membership, and it is reason-
able to reject them whenever their degree of membership
does not reach a given threshold fixed in advance. As a matter of fact, an important desideratum of any classifier is that of performing a reliable pattern rejection. When
dealing with problems of pattern verification, an accurate
evaluation of the degree of membership of any given pat-
tern becomes the fundamental requirement that we need
to fulfill. This is clearly pointed out by Gish and Schmidt
[3] when discussing the problems of speaker identification
and verification: In the first case, the test requires a closed set of speakers, whereas in the second one, the set of speakers is open and, therefore, the system must be protected against any potential impostor, whose identity is not known in advance. Similar problems arise in many
different domains like the recognition and verification of
faces, banknotes, fingerprints, targets from radar images,
signatures, etc.
A very simple verification criterion (thresholding criterion)
that has been massively used consists of checking whether
the MLP outputs are close enough to the code adopted for
the targets. The closeness is checked properly by thresholds
and patterns are rejected whenever the MLP outputs depart
from the target code beyond the threshold limit.
In this paper, we give theoretical arguments which allow
us to claim that, somewhat opposite to what is reported in
many papers, unfortunately, MLPs cannot act as adequate
classifiers whenever the patterns do not surely belong to
the classes defined in advance, and the classifier is required
to perform a reliable rejection. Likewise, we claim that
MLPs with the thresholding criterion on the outputs are not
adequate for pattern verification, and that the good performance of MLPs on these tasks is likely to be due to the special nature of the data and preprocessing.
These conclusions are based on the reasonable assump-
tion that in order to face effectively the proposed tasks, the
separation surfaces drawn by the classifier in the pattern
space must be closed so as to model properly most common
pattern probability distributions found in practice. We
analyze the separation surfaces regardless of the special sigmoidal-like function adopted for the neurons, thus providing conclusions which only involve the MLP architecture. We prove that, in the case in which the number of hidden units is less than or equal to the number of inputs, the separation surfaces created by multilayer perceptrons are open. This condition is often met in practice since, in
many cases, neural networks adopted for pattern recogni-
tion have a pyramidal architecture with a large number of
inputs [4].1 When using more hidden units than inputs, we prove that MLPs can indeed draw either closed or open separation surfaces but, unfortunately, we also prove that checking whether the surfaces are open or closed is NP-hard. Basically, the separation surfaces are open for
1. Neural classifiers with nonpyramidal architectures, however, have also been used in pattern recognition, especially in those cases in which the patterns have a low-dimensional representation (see, e.g., [5]).
0162-8828/98/$10.00 © 1998 IEEE

M. Gori is with the Dipartimento d'Ingegneria dell'Informazione, Università di Siena, via Roma, 56, Italy. E-mail: marco@ing.unisi.it.
F. Scarselli is with the Dipartimento di Ingegneria dei Sistemi e Informatica, Università di Firenze, via S. Marta, 3, 50139, Firenze, Italy. E-mail: franco@dsi.ing.unifi.it.
Manuscript received 14 Jan. 1998; revised 16 July 1998. Recommended for acceptance by A. Webb.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 107164.
pyramidal networks, and it is intractable to establish
whether the separation surfaces are closed with more hid-
den units than inputs.
2 PATTERN VERIFICATION AND SEPARATION SURFACES
Most of the analyses carried out in the paper are based on MLPs with only one hidden layer, where the hidden units adopt the activation function σ(·). If we denote by f(·) the function that an MLP realizes, then

$$f(x) = \sum_{i=1}^{n} w_i\,\sigma\!\left(v_i^\top x + b_i\right) + c, \qquad (1)$$

where v_i ∈ ℝ^m, w_i, b_i, c ∈ ℝ, 1 ≤ i ≤ n, are the parameters of the network, and x ∈ ℝ^m is a vector representing the input pattern. Let us consider the case in which each class is modeled by a multilayer network with a single output. In that case, given any real L > 0, we can define

$$C(L) = \{x \in \mathbb{R}^m : f(x) \ge L\}, \qquad (2)$$

where f: ℝ^m → ℝ is realized by an MLP network. A possible pattern verification criterion is that of establishing whether or not x ∈ C(L). Throughout this paper, it will be referred to as the thresholding criterion.2
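To make the criterion concrete, here is a minimal Python sketch of the network of (1) and the thresholding test of (2); the weights are arbitrary toy values, not taken from the paper.

```python
import math

def mlp_output(x, V, b, w, c):
    """One-hidden-layer MLP of (1): f(x) = sum_i w_i * sigma(v_i . x + b_i) + c."""
    sigma = lambda t: 1.0 / (1.0 + math.exp(-t))  # logistic sigmoid
    return sum(wi * sigma(sum(vij * xj for vij, xj in zip(vi, x)) + bi)
               for vi, bi, wi in zip(V, b, w)) + c

def verify(x, V, b, w, c, L):
    """Thresholding criterion of (2): accept x iff f(x) >= L, i.e., x in C(L)."""
    return mlp_output(x, V, b, w, c) >= L

# Toy network: m = 2 inputs, n = 2 hidden units (arbitrary illustrative weights)
V = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
w = [1.0, 1.0]
c = 0.0
L = 1.5
print(verify([5.0, 5.0], V, b, w, c, L))    # True: both units near saturation, f ~ 2
print(verify([-5.0, -5.0], V, b, w, c, L))  # False: f ~ 0, pattern rejected
```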
The pattern verification process according to the thresholding criterion is depicted in Fig. 1, where one can easily conclude that a fundamental requirement for the design of a pattern verification system is that C(L) be bounded in the pattern space created by the preprocessing module. If C(L) is not bounded, an impostor represented by a remarkably different pattern can be erroneously verified by the system. Note that, depending on the sensors and on the preprocessing module, an MLP which draws open separation surfaces can also perform successfully. The sensors and the preprocessing module can in fact themselves contribute to pattern rejection; for instance, this holds if any pattern associated with impostors is mapped to square patterns (see impostor1, Fig. 1), that is, to points that would be rejected also by an open separation surface. In some cases, the sensors and the preprocessing module are chosen using many heuristic rules derived from experience, thus leading to very good experimental results even when using MLPs with the thresholding or a related criterion. Successful results have in fact been reported in applications to paper currency recognition and verification [6], signature verification [7], [8], quality control of padlock manufacturing [9], face recognition [10], and automatic target recognition [11], [12].

2. Without limitation of generality, we assume that the output neuron is linear. In the case in which σ(·) also transforms the activation of the output neuron, the verification criterion is still the same, apart from the threshold, which changes from L to σ⁻¹(L).
Our own practical experience with the thresholding criterion indicates that, unfortunately, depending on the sensors, on the preprocessing module, and on the particular learning experiment, the reliability of pattern verification systems based on the thresholding criterion can be very poor. Likewise, pattern classification also becomes very critical when increasing the number of classes, especially if we want to reject patterns with a low degree of membership. The failure of experiments of pattern verification and pattern recognition based on the thresholding criterion is likely to be due to the fact that MLPs, under certain experimental conditions, develop open separation surfaces. This motivates the theoretical analysis reported in this paper, which is aimed at a deep understanding of the geometry of the domain C(L) when changing either the architecture or the kind of nonlinearity (the function σ(·)).
Fig. 1. Verification using MLPs and the thresholding criterion. The pattern is properly preprocessed and then applied at the input of the MLP. The
verification is based on the comparison of the MLP output with a proper threshold L. As indicated in the picture which depicts the pattern space, a
fundamental requirement is that C(L) be a bounded domain.
3 ON THE GEOMETRY OF THE SEPARATION SURFACES
To the best of our knowledge, the first analysis of the geometry of the separation surfaces was carried out by Lippmann in [13]. In that paper, however, no special attention was paid to whether the separation surfaces are closed and, most importantly, only networks with threshold functions were considered. Although in practical experiments with sigmoidal functions an MLP can have many saturated neurons which exhibit a behavior very close to hard-limiting neurons, others are likely not to be saturated and, consequently, the case of sigmoidal units cannot be trivially understood simply by invoking Lippmann's results.

The following theorem gives some insights into the geometry of the domain C(L) in the case in which the number of input units is greater than or equal to the number of hidden units. This is commonly verified in many applications to pattern recognition [4].
THEOREM 1. For every threshold L > 0, if C(L) is nonempty, then it is unbounded in both of the following cases:

1) the number of hidden neurons is less than the number of inputs (n < m);
2) the number of hidden neurons equals the number of inputs (n = m) and the activation function σ(·) is monotone.

PROOF. See the Appendix. □
This theorem suggests that the number of hidden neurons plays a crucial role in the geometry of the domains C(L). If the number of hidden neurons is less than the number of inputs, then the network cannot properly model bounded classes, no matter what activation function is chosen. The choice of the activation function influences the MLP behavior only when the number of hidden units equals the number of inputs. In that case, however, if the activation function is monotone,3 then the network cannot model bounded sets, whereas when breaking this condition bounded sets can be modeled. Finally, if the number of hidden neurons is greater than the number of inputs, then bounded sets can be obtained, but this depends on the weights.

3. Note that there is no need to make any assumption on the boundedness of σ(·).
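The unboundedness in the case n = m with a monotone sigmoid can be checked numerically. The sketch below (arbitrary toy weights, not from the paper) builds an MLP with n = m = 2, picks a direction x̄ orthogonal to v_1, and verifies that C(L) contains points arbitrarily far from the origin, as in the proof of Theorem 1.

```python
import math
sigma = lambda t: 1.0 / (1.0 + math.exp(-t))  # a monotone sigmoid (logistic)

# n = m = 2; illustrative weights, not taken from the paper
v1, v2 = (1.0, 0.0), (0.0, 1.0)
w1, w2, c = 1.0, 1.0, 0.0
f = lambda x: w1*sigma(v1[0]*x[0] + v1[1]*x[1]) + w2*sigma(v2[0]*x[0] + v2[1]*x[1]) + c

L = 1.0
x0 = (3.0, 3.0)
assert f(x0) >= L  # x0 belongs to C(L)

# xbar is orthogonal to v1, i.e., to all hidden weight vectors except the last;
# since sigma is monotone and w2 > 0, moving along +xbar never decreases f,
# so arbitrarily distant points stay inside C(L): C(L) is unbounded
xbar = (0.0, 1.0)
for a in (1.0, 10.0, 100.0, 1000.0):
    x = (x0[0] + a*xbar[0], x0[1] + a*xbar[1])
    print(a, f(x) >= L)
```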
EXAMPLE 1. An example of the separation surfaces drawn when n > m is given in Fig. 2 and Fig. 3. A set of random patterns in ℝ² with modulus 0.5 and 2 was used for positive and negative examples, respectively. The positive patterns are represented by stars, whereas the negative examples are depicted by circles. The network was trained so as to force the output to one on positive examples and to zero on negative examples. The pattern membership was established by choosing L = 0.8 and L = 0.2 for the two classes, and the solid and the dotted lines represent the associated contours. In the experiments of Fig. 2 and Fig. 3, the network was based on the classic logistic sigmoidal activation function and had five hidden neurons. Although the number of hidden units was greater than the number of inputs, the training of the MLP produced an open separation surface (see Fig. 2). Unlike what happens in Fig. 2, in Fig. 3 the training of the MLP in different conditions (there was a different number of negative examples) yielded a closed separation surface.
EXAMPLE 2. In the case m = n, Theorem 1 states that MLPs draw open separation surfaces under the additional condition that σ(·) is monotone. When violating this condition, this is no longer true. For example, depending on the weights, the separation surfaces drawn by an MLP with two hidden units are shown in Fig. 4 and Fig. 5. As in the case n > m with monotone activation functions, the domain C can be either open or closed.
Theorem 1 is based on the thresholding criterion and
MLPs with one hidden layer and one output neuron only.
Fig. 2. Separation surfaces produced by backpropagation when using an MLP with five hidden units and two input units (n = 5, m = 2). Although n > m, the separation surfaces are open. The surfaces are drawn for L = 0.8 (solid line) and L = 0.2 (dotted line).

Fig. 3. Separation surfaces (L = 0.8 and L = 0.2) drawn when learning the same concept with more examples. In this case, the separation surfaces produced by backpropagation are closed.
One may wonder what happens when using more hidden layers, multiclass problems, and different rejection criteria. The following theorem gives a negative answer to the possibility of developing bounded sets also under these hypotheses.

THEOREM 2. Let us consider an MLP with at least one hidden layer and multiple outputs, and let f(·) be the function it realizes. If the number of hidden neurons n in the first hidden layer is less than the number of inputs m, then there is a linear subspace S ⊂ ℝ^m of dimension m − n such that

$$f(x) = f(x + v) \qquad (3)$$

for all v ∈ S.

PROOF. See the Appendix. □
The fundamental consequence of this theorem is that no rejection criterion based only on the outputs of the network can discriminate between patterns x and x + v. Since v ∈ S, the pattern x + v can be arbitrarily large and, therefore, no bounded class can be modeled. Notice that the theorem also holds for multiple classes and for rejection criteria based on all the outputs of the MLP, like those based on softmax.
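A quick numerical check of Theorem 2 (arbitrary toy weights, m = 3 inputs and n = 2 hidden units): shifting the input along the subspace S leaves the output exactly unchanged.

```python
import math
sigma = lambda t: 1.0 / (1.0 + math.exp(-t))

# m = 3 inputs, n = 2 hidden units (n < m); arbitrary toy weights
V = [(1.0, 0.0, 0.0), (0.5, -1.0, 0.0)]
b = [0.1, -0.2]
w = [2.0, -1.5]
c = 0.3

def f(x):
    return sum(wi * sigma(sum(vij*xj for vij, xj in zip(vi, x)) + bi)
               for vi, bi, wi in zip(V, b, w)) + c

# s spans the subspace S of Theorem 2: v_i . s = 0 for every hidden unit i
s = (0.0, 0.0, 1.0)
x = (0.7, -1.2, 0.4)
for t in (1.0, 50.0, -1e6):
    shifted = tuple(xj + t*sj for xj, sj in zip(x, s))
    print(f(shifted) == f(x))  # True: the MLP cannot distinguish x from x + t*s
```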
Finally, notice that the first hidden layer of a generic MLP plays the role of the only hidden layer in two-layered MLPs. In this sense, the only difference with respect to Theorem 1 arises when n = m. In this case, an MLP with more layers, or more outputs, or different rejection criteria can realize closed surfaces regardless of σ(·), whereas with a two-layered MLP and the thresholding criterion a nonmonotone function σ(·) is needed.
Theorem 1 and Theorem 2 do not completely clarify the role played by the activation function. For instance, in the case n > m, it is not clear to what extent the boundedness of C(L) can be affected by the activation function. The following theorem answers this question completely for MLPs with sigmoidal functions, where by sigmoidal function we mean any nondecreasing function σ(·) such that lim_{x→−∞} σ(x) = 0 and lim_{x→+∞} σ(x) = 1. Basically, two networks N_h and N_s with different sigmoidal functions are compared with respect to the domains C_h(L) and C_s(L).
THEOREM 3. Let N_s, N_h be two MLPs with one hidden layer having the same number of units and the same weights, but different sigmoidal activation functions σ_s(·) and σ_h(·). Let us assume that, for all thresholds L > 0:

1) ∀U ⊆ {1, …, n}: Σ_{i∈U} w_i + c ≠ L;
2) every subset of size m of the set {v_1, …, v_n} is linearly independent.

Then C_s(L) is bounded iff C_h(L) is bounded.

PROOF. See the Appendix. □
Basically, this theorem states that the property of producing a closed separation surface is independent of the particular sigmoidal function σ(·), whereas it is strongly related to the network architecture and weight values. The domain C(L) cannot be changed from bounded to unbounded, or vice versa, simply by replacing the activation function in the hidden units, regardless of the chosen threshold L > 0. As a consequence, if we are interested in a pattern verification system based on an MLP with the thresholding criterion, this theorem states that no design choice, based on the sigmoidal function, exists which can turn open separation surfaces of N_h into closed separation surfaces in N_s.
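The invariance stated by Theorem 3 can be illustrated numerically with a hand-built network (not from the paper): four hidden units whose weight vectors point inward on a square, slightly tilted so that every pair of weight vectors is linearly independent as condition 2) requires (n = 4 > m = 2), yield a bounded C(L) under the Heaviside activation, and C(L) stays bounded when the activation is swapped for a steep logistic sigmoid with the same weights.

```python
import math

def heaviside(t):
    return 1.0 if t >= 0 else 0.0

def logistic(t):
    if t < -700.0:          # guard: exp(-t) would overflow for very negative t
        return 0.0
    return 1.0 / (1.0 + math.exp(-t))

# Hand-built network: tilted inward-pointing weight vectors around the unit square
V = [(1.0, 0.1), (-1.0, 0.1), (0.1, 1.0), (0.1, -1.0)]
b = [1.0, 1.0, 1.0, 1.0]
w = [1.0, 1.0, 1.0, 1.0]
c, L = 0.0, 3.5
GAIN = 10.0  # the steep logistic t -> logistic(GAIN * t) is still sigmoidal

def f(x, act, gain=1.0):
    return sum(wi * act(gain * (vi[0]*x[0] + vi[1]*x[1] + bi))
               for vi, bi, wi in zip(V, b, w)) + c

def accepted_far_away(act, gain, radius=50.0, samples=360):
    """Does C(L) contain any point on a circle of large radius?"""
    return any(f((radius * math.cos(2*math.pi*k/samples),
                  radius * math.sin(2*math.pi*k/samples)), act, gain) >= L
               for k in range(samples))

# The origin is accepted under both activations; no faraway point is under either,
# so swapping the sigmoid leaves C(L) bounded, as Theorem 3 predicts.
print(f((0.0, 0.0), heaviside) >= L, f((0.0, 0.0), logistic, GAIN) >= L)
print(accepted_far_away(heaviside, 1.0), accepted_far_away(logistic, GAIN))
```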
4 COMPUTATIONAL COMPLEXITY ISSUES

In the previous section, we have given conditions under which the separation surfaces can be closed. As pointed out in Section 2, this is a very desirable geometrical condition, especially in applications of pattern verification. When using MLPs with n > m, one can get closed separation surfaces. In some applications, however, in which the preprocessing module usually produces patterns with a large number of inputs, the condition n > m might lead to huge architectures, which require large amounts of training data and expensive computational resources.
Fig. 4. Case m = n and nonmonotone function σ(·). Open separation surfaces (L = 0.8 and L = 0.2) can be created for an MLP with two hidden neurons and a Gaussian activation function.

Fig. 5. Separation surface (L = 0.8 and L = 0.2) for the same problem: Depending on the weights, the domain C(L) is bounded.
Note that, in order to guarantee that an MLP with n > m yields closed separation surfaces and, consequently, that our verification system works properly, we still need to check the learned configuration explicitly. Unfortunately, we prove that checking the boundedness of C(L) is intractable. Let us formally define the following two problems.

DEFINITION 1. Bounded Set Problem for MLPs with generic activation function. Let us consider an MLP with activation function σ(·) and let C(L) be the domain associated with a given threshold L. The Bounded Set Problem (BSP) with activation function σ(·) consists of deciding whether or not C(L) is bounded.
Special forms of BSP arise when considering special kinds of activation functions. An interesting form of BSP arises for the Heaviside function. In such a case, f_h(·), which is the function realized by the network, can be written as

$$f_h(x) = \sum_{i:\, v_i^\top x + b_i \ge 0} w_i + c,$$

and it is constant on each set

$$P_U = \{x : v_i^\top x + b_i \ge 0 \text{ for } i \in U \text{ and } v_i^\top x + b_i < 0 \text{ for } i \notin U\}, \qquad (4)$$

where U ⊆ {1, …, n}. BSP reduces to looking for a nonempty, unbounded polytope P_U on which f_h(x) ≥ L. Hence, the following definition arises.
DEFINITION 2. BSP for MLPs with Heaviside activation function. Given L > 0, the Bounded Set Problem with Heaviside function consists of deciding whether there exists a nonempty and unbounded P_U such that

$$\sum_{i \in U} w_i + c \ge L \qquad (5)$$

holds.
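For intuition (toy weights, not from the paper), the value of f_h on each polytope P_U depends only on the activation pattern U, so a brute-force scan of all 2^n patterns finds every candidate region where (5) holds; deciding whether any candidate P_U is actually nonempty and unbounded is the intractable part of BSP.

```python
from itertools import combinations

# Illustrative Heaviside MLP: n = 4 hidden units, c = 0 (toy weights)
w = [1.0, -2.0, 0.5, 1.5]
c = 0.0
L = 1.0
n = len(w)

# On each polytope P_U, f_h is constant and equals sum(w_i for i in U) + c,
# so scanning all 2**n activation patterns U finds every candidate region
# where (5) holds; checking nonemptiness and unboundedness of a candidate
# P_U is the hard part of BSP.
candidates = [U for r in range(n + 1) for U in combinations(range(n), r)
              if sum(w[i] for i in U) + c >= L]
print(len(candidates), "of", 2 ** n, "activation patterns satisfy (5)")
```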
In principle, the complexity of BSP may depend on the activation function. However, the following theorem states that even BSP for MLPs with the Heaviside activation function is NP-complete.

THEOREM 4. BSP for MLPs with Heaviside activation function is NP-complete.

PROOF. See the Appendix. □

A straightforward consequence of Theorem 4 is that, in the cases in which σ(·) ranges in a set that includes the Heaviside function, BSP is NP-hard. This strongly limits the actual possibility of solving BSP in many practical problems of pattern verification.
5 CONCLUSIONS

In this paper, we have discussed the ability of multilayer perceptrons to create bounded domains in the pattern space and, in particular, we have related this analysis to applications of pattern verification. We have proven that, regardless of the function used in the processing units, architectures with fewer units in the first hidden layer than inputs cannot yield closed separation surfaces. When using more hidden units than inputs, we have also proven that an MLP can create either open or closed surfaces. Moreover, no choice of the sigmoidal function in the neurons can transform open separation surfaces into closed separation surfaces, and deciding whether or not they are open is NP-hard.
These theoretical results concerning the geometry of the separation surfaces have strong negative consequences for the application of MLPs with the thresholding criterion to pattern verification systems. Successful results reported in the literature are likely to be due to the special sensors and preprocessing modules used for generating the inputs of the MLP, but the thresholding criterion can hardly be regarded as a general criterion to be adopted for pattern verification. In many applications of pattern recognition, MLPs with the thresholding criterion can give rise to the same problem pointed out for the case of pattern verification, unless one knows in advance that all the patterns have an acceptable degree of membership with respect to the given classes. Basically, in applications of pattern recognition, MLPs with the thresholding criterion can exhibit excellent discrimination performance, but may fail in the task of pattern rejection.
There are alternative approaches to pattern verification using neural networks which do not suffer from the problems pointed out in this paper. For instance, MLPs used as autoassociators, where the weights are adjusted so as to copy the inputs to the outputs, can profitably be used for designing pattern verification systems. For each pattern, the verification criterion is based on the input/output Euclidean distance; that is, given a threshold d, pattern x is accepted if and only if ‖f(x) − x‖ ≤ d. The basic idea is that only the patterns of the class used for training the autoassociator are likely to be reproduced with enough accuracy at the output. It has been shown that in this case the separation surfaces are always closed (see, e.g., [14], [15]) and, therefore, the problem pointed out in this paper concerning pattern rejection is inherently solved. It can easily be proven that neural networks based on radial basis functions can also provide closed separation surfaces and, consequently, appear more adequate than MLPs based on sigmoidal functions (see, e.g., their application in the field of speech verification [16]).
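A minimal sketch of the autoassociator criterion follows (hand-set weights for illustration, not a trained network as in [14], [15]): because the network output stays bounded while ‖x‖ grows, the reconstruction error eventually exceeds any fixed d, so the accepted set is bounded.

```python
import math

# Minimal autoassociator sketch (weights chosen by hand, not trained):
# one tanh unit per input component copies the input near the origin.
def autoassoc(x):
    return [math.tanh(xi) for xi in x]  # f(x) ~ x only for small |x|

def verify(x, d):
    """Accept x iff the reconstruction error ||f(x) - x|| <= d."""
    err = math.sqrt(sum((fi - xi) ** 2 for fi, xi in zip(autoassoc(x), x)))
    return err <= d

d = 0.1
print(verify([0.2, -0.3], d))   # True: well reconstructed -> accepted
print(verify([5.0, 5.0], d))    # False: tanh saturates, large error -> rejected
# since |f(x)| stays bounded while |x| grows, the accepted set is bounded
```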
APPENDIX

A.1 Proofs of Theorems 1 and 2

PROOF OF THEOREM 1. Let us consider the case n = m with σ(·) monotone and the case n < m separately.

Case n = m and σ(·) monotone. In this case, since the vectors v_1, …, v_{n−1} are not a basis of ℝ^m, there exists x̄ ∈ ℝ^m such that v_i^⊤ x̄ = 0, 1 ≤ i ≤ n − 1. Given4 L > 0, consider any x_0 such that f(x_0) ≥ L, that is, x_0 ∈ C. Moreover, assume a ∈ ℝ and consider x = x_0 + a x̄. Since σ(·) is monotone, the inequality

$$w_n\,\sigma(v_n^\top x + b_n) = w_n\,\sigma(v_n^\top x_0 + a\,v_n^\top \bar{x} + b_n) \ge w_n\,\sigma(v_n^\top x_0 + b_n) \qquad (6)$$

holds for all x̄ whenever the sign of a is chosen properly. Hence, without loss of generality, we can assume a ∈ (−∞, 0] or a ∈ [0, +∞), and we have

$$\begin{aligned} f(x) &= \sum_{i=1}^{n} w_i\,\sigma(v_i^\top x_0 + a\,v_i^\top \bar{x} + b_i) + c \\ &= \sum_{i=1}^{n-1} w_i\,\sigma(v_i^\top x_0 + b_i) + w_n\,\sigma(v_n^\top x_0 + a\,v_n^\top \bar{x} + b_n) + c \\ &\ge \sum_{i=1}^{n-1} w_i\,\sigma(v_i^\top x_0 + b_i) + w_n\,\sigma(v_n^\top x_0 + b_n) + c = f(x_0) \ge L. \end{aligned}$$

Then x ∈ C and, since ‖x‖ can be made arbitrarily large by changing a, we conclude that C is not bounded.

Case n < m. Any network with n < m is in fact equivalent to a network with m = n, simply assuming that v_{n+1} = 0, …, v_m = 0 and w_{n+1} = 0, …, w_m = 0. Hence, the analysis for the previous case extends straightforwardly to the case n < m, the only difference being that, since v_m = 0, inequality (6) holds regardless of the monotonicity of σ(·). □

4. For the sake of simplicity, the dependence on L of the domain C(L) will be dropped in the remainder of the paper.
PROOF OF THEOREM 2. Notice that, because of the multilayered assumption, the behavior of the first hidden layer is not affected by additional layers and/or additional outputs. Hence, we can extend the reasoning of the proof of Theorem 1 to the outputs of the first hidden layer in an MLP with many hidden layers and many outputs.

Now, the vectors v_1, …, v_n are not a basis of ℝ^m, because n < m. So, there exist other vectors x̄_1, …, x̄_{m−n} that, together with the previous ones, constitute a basis for ℝ^m and can be chosen so that v_i^⊤ x̄_j = 0 holds for each 1 ≤ i ≤ n, 1 ≤ j ≤ m − n (e.g., a basis of the orthogonal complement of span{v_1, …, v_n}). We claim that the linear space S spanned by x̄_1, …, x̄_{m−n} is the linear space of the thesis. In fact, the activation of the neurons of the first hidden layer is the same either when the MLP is fed with a point x or when it is fed with x + v, v ∈ S, since v = Σ_{j=1}^{m−n} a_j x̄_j for some reals a_1, …, a_{m−n}, and

$$v_i^\top (x + v) + b_i = v_i^\top x + \sum_{j=1}^{m-n} a_j\, v_i^\top \bar{x}_j + b_i = v_i^\top x + b_i$$

holds for each i. As a consequence, the output of the MLP is the same for x and for x + v. □
A.2 Proof of Theorem 3

Let us introduce the notation adopted in the proof. H_1, …, H_n represent the hyperplanes defined by H_i = {x : v_i^⊤ x + b_i = 0}, for 1 ≤ i ≤ n. That is, hyperplane H_i can be regarded as the set of all the inputs for which the activation of the ith hidden neuron is null. For 1 ≤ i ≤ n, Str_{i,d} is the strip that contains the inputs for which the activation level of the ith hidden node is smaller in modulus than d, i.e.,

$$Str_{i,d} = \{x : |v_i^\top x + b_i| < d\}.$$

Moreover, B_t is the ball of radius t centered at 0, and ‖·‖ denotes the Euclidean norm.

Notice that it is sufficient to prove a simpler version of Theorem 3, where σ_h(·) is the Heaviside function H(·). This can promptly be seen as follows. Suppose that Theorem 3 holds for σ_h(·) = H(·), and let σ_s(·) and σ′_s(·) be two sigmoidal activation functions. The simpler version of the theorem can be applied twice, to the pair σ_s(·), H(·) and to the pair σ′_s(·), H(·). It follows that C_s is bounded iff C_h is bounded, and C′_s is bounded iff C_h is bounded. Hence, C_s is bounded iff C′_s is bounded. Thus, without loss of generality, we assume σ_h(·) = H(·).

Moreover, we also assume n ≥ m (the number of hidden units is larger than or equal to the number of inputs), since this assumption simplifies our proof. However, this is not a limitation: The case n < m is immediately reduced to the case n ≥ m, provided one supposes that the MLP contains a sufficient number of hidden units with null weights.

In order to prove the theorem, some lemmas are needed. The behavior of the hidden units that are saturated by a given input is nearly the same for N_s and N_h, since in that case the sigmoidal and H(·) functions have a similar behavior. Hence, the differences between f_s(·) and f_h(·) are mostly due to the unsaturated units. The first lemma formalizes this claim.
LEMMA 1. Given ε > 0, there exists D > 0 such that f_s(·) can be approximated, with maximal error ε, by the function

$$g_s(x) = f_h(x) + \sum_{i:\, x \in Str_{i,D}} w_i\left[\sigma_s(v_i^\top x + b_i) - H(v_i^\top x + b_i)\right]. \qquad (7)$$

PROOF. From the definitions of g_s(·), f_s(·), and f_h(·), we have

$$\begin{aligned} |f_s(x) - g_s(x)| &= \Bigl| f_s(x) - f_h(x) - \sum_{i:\, x \in Str_{i,D}} w_i\bigl[\sigma_s(v_i^\top x + b_i) - H(v_i^\top x + b_i)\bigr] \Bigr| \\ &= \Bigl| \sum_{i=1}^{n} w_i\,\sigma_s(v_i^\top x + b_i) - \sum_{i=1}^{n} w_i\,H(v_i^\top x + b_i) - \sum_{i:\, x \in Str_{i,D}} w_i\bigl[\sigma_s(v_i^\top x + b_i) - H(v_i^\top x + b_i)\bigr] \Bigr| \\ &= \Bigl| \sum_{i:\, x \notin Str_{i,D}} w_i\bigl[\sigma_s(v_i^\top x + b_i) - H(v_i^\top x + b_i)\bigr] \Bigr| \\ &\le \sum_{i:\, x \notin Str_{i,D}} |w_i|\,\bigl|\sigma_s(v_i^\top x + b_i) - H(v_i^\top x + b_i)\bigr|. \end{aligned}$$

Now, let D be large enough so that5

$$\max_{|d| \ge D} \bigl|\sigma_s(d) - H(d)\bigr| \le \frac{\varepsilon}{n \max_{i \in \{1,\ldots,n\}} |w_i|}. \qquad (8)$$

Hence, we get

$$|f_s(x) - g_s(x)| \le n \Bigl(\max_{i \in \{1,\ldots,n\}} |w_i|\Bigr) \max_{|d| \ge D} \bigl|\sigma_s(d) - H(d)\bigr| \le \varepsilon. \qquad \square$$
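Lemma 1 and the choice of D in (8) are easy to check numerically on a small network with arbitrary weights; the sketch below assumes the logistic sigmoid, for which sup_{|d|≥D} |σ_s(d) − H(d)| = σ_s(−D).

```python
import math

def logistic(t):
    if t < -700.0:          # guard against overflow of exp(-t)
        return 0.0
    return 1.0 / (1.0 + math.exp(-t))

def heaviside(t):
    return 1.0 if t >= 0 else 0.0

# Illustrative one-hidden-layer network (m = 2 inputs, n = 3 hidden units)
V = [(1.0, 2.0), (-0.5, 1.0), (2.0, -1.0)]
b = [0.3, -0.7, 0.1]
w = [1.5, -2.0, 0.8]
c = 0.4

def pre(x):   # hidden-unit activations v_i . x + b_i
    return [vi[0]*x[0] + vi[1]*x[1] + bi for vi, bi in zip(V, b)]

def f(x, act):
    return sum(wi * act(a) for wi, a in zip(w, pre(x))) + c

eps = 1e-3
# Choose D as in (8): for the logistic, sup_{|d|>=D}|sigma(d) - H(d)| = logistic(-D),
# so D = log(n * max|w_i| / eps) + 1 gives logistic(-D) <= eps / (n * max|w_i|)
D = math.log(len(w) * max(abs(wi) for wi in w) / eps) + 1.0

def g(x):     # the approximation (7): f_h corrected only inside the strips Str_{i,D}
    return f(x, heaviside) + sum(wi * (logistic(a) - heaviside(a))
                                 for wi, a in zip(w, pre(x)) if abs(a) < D)

for x in [(0.0, 0.0), (3.0, -4.0), (-50.0, 20.0), (1000.0, 1000.0)]:
    print(abs(f(x, logistic) - g(x)) <= eps)   # Lemma 1's bound holds at each point
```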
f_s(·) and f_h(·) are similar when all the hidden units are saturated. On the contrary, f_s(·) and f_h(·) may differ significantly when some hidden units are not saturated. In the following, we provide more insight into the latter case. It will be shown how to build, from a given input x, another input y that satisfies f_s(x) ≤ f_s(y) (this implies x ∈ C_s ⇒ y ∈ C_s) and that saturates all hidden units, including those that are not saturated by x (this implies y ∈ C_s ⇒ y ∈ C_h). Thus, x ∈ C_s ⇒ y ∈ C_h, which is the key property that will be employed to prove the theorem.

The following lemma proves part of our claim.
LEMMA 2. Let condition 2) of Theorem 3 hold and let x be a vector that belongs to the strips Str_{j_1,D}, …, Str_{j_r,D}, where 0 ≤ r ≤ m. Moreover, let {j_{r+1}, …, j_m} be a possibly empty set of indexes that does not overlap {j_1, …, j_r}. Then, there exists a vector y such that

a) v_i^⊤ x + b_i = v_i^⊤ y + b_i for i ∈ {j_{r+1}, …, j_m};
b) |v_i^⊤ y + b_i| = D and w_i σ_s(v_i^⊤ x + b_i) ≤ w_i σ_s(v_i^⊤ y + b_i) for i ∈ {j_1, …, j_r};
c) ‖y − x‖ ≤ 2rD/μ(V),

where μ(V) is the smallest modulus of the eigenvalues of the matrix V = [v_{j_1}, …, v_{j_m}]^⊤.
PROOF. Given x, which belongs to the strips Str_{j_1,D}, …, Str_{j_r,D}, let us recursively define r + 1 vectors y_0, …, y_r as follows:

1) y_0 = x;
2) ∀k = 1, …, r:

$$y_k = y_{k-1} + \left( \alpha_k D - v_{j_k}^\top y_{k-1} - b_{j_k} \right) z_k,$$

where z_k is a vector that fulfills v_{j_k}^⊤ z_k = 1 and v_{j_i}^⊤ z_k = 0 for all i ≠ k, 1 ≤ i ≤ m,6 and α_k is an integer in {−1, 1}, whose value will be fixed in the proof.

For each k, we will prove the following propositions:

a_k) v_i^⊤ x + b_i = v_i^⊤ y_k + b_i for i ∈ {j_{k+1}, …, j_m};
b_k) |v_i^⊤ y_k + b_i| = D and w_i σ_s(v_i^⊤ x + b_i) ≤ w_i σ_s(v_i^⊤ y_k + b_i) for i ∈ {j_1, …, j_k};
c_k) ‖y_k − x‖ ≤ 2kD/μ(V).

In fact, propositions a_r, b_r, and c_r yield the thesis for k = r and y = y_r.7 Let us prove a_k, b_k, and c_k, 0 ≤ k ≤ r, by induction on k.

Basis: Trivial.

Induction step: Assume by induction that a_{k−1}, b_{k−1}, and c_{k−1} hold.

5. Here, it is implicitly assumed that w_i ≠ 0 holds for some i. On the other hand, when w_i = 0 for all i, the thesis holds trivially since f_s(x) = g_s(x) for each x.
Proposition a_k. By the definition of y_k and simple algebraic calculations, ∀i ∈ {j_1, …, j_{k−1}, j_{k+1}, …, j_m}, we have

$$v_i^\top y_k + b_i = v_i^\top y_{k-1} + b_i. \qquad (9)$$

Thus, a_k follows immediately from the induction hypothesis a_{k−1} and (9).

Proposition b_k. Assumption b_{k−1} and (9) imply that both

$$w_i\,\sigma_s(v_i^\top x + b_i) \le w_i\,\sigma_s(v_i^\top y_k + b_i) \qquad (10)$$

and

$$|v_i^\top y_k + b_i| = D \qquad (11)$$

hold ∀i ∈ {j_1, …, j_{k−1}}. In the following, we prove that, choosing α_k properly, (10) and (11) also hold for i = j_k. We have in fact

$$v_{j_k}^\top y_k + b_{j_k} = v_{j_k}^\top y_{k-1} + \left( \alpha_k D - v_{j_k}^\top y_{k-1} - b_{j_k} \right) v_{j_k}^\top z_k + b_{j_k} = \alpha_k D,$$

so that (11) is fulfilled provided that α_k ∈ {−1, +1}. Further, since x ∈ Str_{j_k,D},

$$-D < v_{j_k}^\top x + b_{j_k} < D$$

holds. Thus, notice that if we assign 1 to α_k, then v_{j_k}^⊤ y_k + b_{j_k} is larger than v_{j_k}^⊤ x + b_{j_k}; otherwise, if we assign −1 to α_k, then v_{j_k}^⊤ y_k + b_{j_k} is smaller than v_{j_k}^⊤ x + b_{j_k}. Since σ_s(·) is monotone, one of the two choices is sufficient to make b_k hold.

6. Note that this is well-defined, since v_{j_1}, …, v_{j_m} are linearly independent by hypothesis.

7. For each k, input y_k saturates hidden units {j_1, …, j_k} and does not saturate units {j_{k+1}, …, j_m} (see propositions a_k, b_k, and c_k). Vector y_{k+1} is recursively computed from y_k such that y_{k+1} saturates a further unit (that is, unit j_{k+1}). Moreover, while it leaves the output of the other hidden units constant, it makes j_{k+1} provide a larger contribution to f_s(·) (see proposition b_k). At the end, for k = r, the hidden units j_1, …, j_r are saturated, and the lemma is proved.
Proposition $\mathcal{C}_k$: Notice that

$$\|z_k\| \le \frac{1}{\mu(V)} \quad (12)$$

holds. In fact, $V z_k$, by definition of $V$ and $z_k$, is equal to $[v_{j_1} \cdot z_k, \ldots, v_{j_m} \cdot z_k]'$, which is a unitary vector whose components are all null except for the $k$th one, which is one. Inequality (12) follows immediately, because $1 = \|V z_k\| \ge \mu(V)\|z_k\|$. Moreover,

$$|v_{j_k} \cdot y_{k-1} + b_{j_k}| \le \Delta \quad (13)$$

holds, since $x \in \mathrm{Str}_{j_k,\Delta}$ by hypothesis and $v_{j_k} \cdot x + b_{j_k} = v_{j_k} \cdot y_{k-1} + b_{j_k}$ by the recursive assumption $\mathcal{A}_{k-1}$. Finally, using the recursive assumption $\mathcal{C}_{k-1}$ and inequalities (12) and (13), we get

$$\|x - y_k\| \le \|x - y_{k-1}\| + \|y_{k-1} - y_k\| \le \frac{2(k-1)\Delta}{\mu(V)} + \left|\alpha_k \Delta - v_{j_k} \cdot y_{k-1} - b_{j_k}\right| \|z_k\|$$
$$\le \frac{2(k-1)\Delta}{\mu(V)} + \frac{\Delta + |v_{j_k} \cdot y_{k-1} + b_{j_k}|}{\mu(V)} \le \frac{2(k-1)\Delta}{\mu(V)} + \frac{2\Delta}{\mu(V)} = \frac{2k\Delta}{\mu(V)},$$

which is $\mathcal{C}_k$. $\square$
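The recursion used in the proof of Lemma 2 lends itself to a direct numerical check. The sketch below is ours, not part of the paper: it assumes NumPy, verifies propositions $\mathcal{A}$ and $\mathcal{B}$ (saturation at $\pm\Delta$) on a random instance, and checks the distance bound using the smallest singular value of $V$, which guarantees $\|Vz\| \ge \sigma_{\min}\|z\|$ (the lemma states the bound through $\mu(V)$).

```python
import numpy as np

# Numerical sketch (ours) of the recursion in the proof of Lemma 2:
#   y_k = y_{k-1} + (alpha_k*Delta - v_{j_k}.y_{k-1} - b_{j_k}) * z_k,
# where V z_k = e_k, so each step saturates unit j_k at +-Delta while
# leaving the activations of all the other selected units unchanged.
rng = np.random.default_rng(0)
m, r, Delta = 4, 2, 0.5
V = rng.normal(size=(m, m))     # rows v_{j_1}, ..., v_{j_m} (independent)
b = rng.normal(size=m)
w = rng.normal(size=m)          # output weights, used to choose alpha_k
Z = np.linalg.inv(V)            # z_k is the kth column of V^{-1}

# place x inside the first r strips: |v_i.x + b_i| < Delta for i < r
targets = np.concatenate([0.3 * Delta * np.ones(r),
                          rng.normal(size=m - r) + 3.0])
x = np.linalg.solve(V, targets - b)

y = x.copy()
for k in range(r):
    alpha = 1.0 if w[k] >= 0 else -1.0   # sign that increases w_k*sigma(.)
    y = y + (alpha * Delta - V[k] @ y - b[k]) * Z[:, k]

act = V @ y + b
assert np.allclose(np.abs(act[:r]), Delta)        # B: units j_1..j_r saturated
assert np.allclose(act[r:], V[r:] @ x + b[r:])    # A: the others are unchanged
smin = np.linalg.svd(V, compute_uv=False)[-1]     # smallest singular value
assert np.linalg.norm(y - x) <= 2 * r * Delta / smin + 1e-9   # C-style bound
print("Lemma 2 propositions hold on this instance")
```

Each pass through the loop changes only one activation because $V[i] \cdot z_k = 0$ for $i \ne k$, which is exactly the mechanism behind equation (9).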
Lemma 2 shows how to derive from $x$ a $y$ that saturates the given hidden units. The following lemma shows that, if $\|x\|$ is large enough, at most $m - 1$ units are left unsaturated.
LEMMA 3. Let us assume that property (2) of Theorem 3 holds. Then, $\forall a > 0$, $\exists T > 0$ such that $\forall x: \|x\| \ge T$, the number of the indexes $i$ for which

$$d_i = |v_i \cdot x + b_i| \le a \quad (14)$$

holds is less than or equal to $m - 1$.

PROOF. Let us order the indexes $1, \ldots, n$ in a sequence $i_1, \ldots, i_n$ so that indexes appearing earlier correspond to the nonlarger values of $d_i$, i.e., $r \le s$ implies $d_{i_r} \le d_{i_s}$. In the following, we prove that $\forall a > 0$, $\exists T > 0$ such that, given any vector $x$, $\|x\| \ge T$, we have $d_{i_m} > a$. This is in fact equivalent to the thesis, since $d_{i_m} > a$ implies that (14) holds for no more than $m - 1$ indexes. Let us define the matrix $V$ and the vectors $b$, $y$ as follows:

$$V = [v_{i_1}', \ldots, v_{i_m}']', \quad b = [b_{i_1}, \ldots, b_{i_m}]', \quad y = [d_{i_1}, \ldots, d_{i_m}]'.$$

When using these definitions, we have

$$d_{i_m} = \|y\|_\infty = \|Vx + b\|_\infty \quad (15)$$

for any $x \in \mathbb{R}^m$, where $\|\cdot\|_\infty$ denotes the infinity norm. According to hypothesis (2), $V$ is a full-rank matrix and, therefore, the following inequalities follow:

$$d_{i_m} = \|Vx + b\|_\infty \ge \frac{1}{\sqrt{m}}\,\|Vx + b\| \ge \frac{\mu(V)\|x\| - \|b\|}{\sqrt{m}},$$

where $\mu(V)$ is the smallest of the moduli of the eigenvalues of $V$. Finally, the thesis of the lemma follows straightforwardly by choosing $T$ as

$$T > \max_U \frac{\sqrt{m}\,(a + \|b\|)}{\mu(U)},$$

where $U$ ranges over all the square matrices that can be created by selecting $m$ rows in $\{v_1, \ldots, v_n\}$. We have in fact

$$d_{i_m} \ge \frac{\mu(V)\|x\| - \|b\|}{\sqrt{m}} \ge \frac{\mu(V)T - \|b\|}{\sqrt{m}} > \frac{\sqrt{m}\,(a + \|b\|) - \|b\|}{\sqrt{m}} \ge a. \quad \square$$
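Lemma 3 can also be observed empirically. The following sketch (ours, assuming NumPy) samples vectors of large norm and counts how many activations $|v_i \cdot x + b_i|$ fall below the threshold $a$; the count never reaches $m$.

```python
import numpy as np

# Empirical sketch (ours) of Lemma 3: for v_1..v_n in R^m with every m of
# them linearly independent (true almost surely for Gaussian rows), once
# ||x|| is large, |v_i.x + b_i| <= a can hold for at most m - 1 indexes i.
rng = np.random.default_rng(1)
m, n, a = 3, 8, 5.0
V = rng.normal(size=(n, m))
b = rng.normal(size=n)

worst = 0
for _ in range(2000):
    x = rng.normal(size=m)
    x *= 1e4 / np.linalg.norm(x)       # ||x|| = 10^4, far beyond T
    d = np.abs(V @ x + b)
    worst = max(worst, int(np.sum(d <= a)))

assert worst <= m - 1                  # at most m - 1 unsaturated units
print("worst case:", worst, "unsaturated units out of", n)
```

Intuitively, keeping $m$ activations small would pin $x$ near the intersection of $m$ hyperplanes, which is a bounded region; a vector of norm $10^4$ cannot reach it.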
Lemma 2 shows how to build, from $x$, a vector $y$ that saturates given hidden units. Now, we join the conclusions of Lemma 2 and Lemma 3 to prove that, provided $\|x\|$ is large enough, $y$ actually saturates all the hidden units.
LEMMA 4. Let $\Delta > 0$ be a given positive real and define $S(x) = \{i \mid x \in \mathrm{Str}_{i,\Delta}\}$ as the set of the indexes of the strips where $x$ is contained, $r$ as $|S(x)|$, and $M(x)$ as a set of $m$ indexes where $|v_i \cdot x + b_i|$ is minimal.(8) Moreover, suppose that at least a weight $w_i$ is nonnull and that $\epsilon$ is any given small real that fulfills $\epsilon \le \max_{i \in \{1,\ldots,n\}}(|w_i|)$.

Then, there exists a positive real $T$ such that, $\forall x: \|x\| \ge T$, $S(x)$ is a subset of $M(x)$, and there is a vector $y$ with the following properties:(9)

$\mathcal{A}$) $v_i \cdot x + b_i = v_i \cdot y + b_i$ for $i \in M(x) \setminus S(x)$;

$\mathcal{B}$) $|v_i \cdot y + b_i| = \Delta$ and $w_i \sigma_s(v_i \cdot x + b_i) \le w_i \sigma_s(v_i \cdot y + b_i)$ for $i \in S(x)$;

$\mathcal{C}$) $|v_i \cdot y + b_i| > \Delta$, the sign of $v_i \cdot x + b_i$ is equal to the sign of $v_i \cdot y + b_i$, and $|w_i \sigma_s(v_i \cdot x + b_i) - w_i \sigma_s(v_i \cdot y + b_i)| \le \epsilon$ for $i \notin M(x)$;

$\mathcal{D}$) $\|y - x\| \le 2r\Delta/\mu(V)$.
PROOF. Let $\beta > 0$ be a real that fulfills both

$$\sigma_s(\beta) \ge 1 - \frac{\epsilon}{\max_{i \in \{1,\ldots,n\}}(|w_i|)} \quad (16)$$

and

$$\sigma_s(-\beta) \le \frac{\epsilon}{\max_{i \in \{1,\ldots,n\}}(|w_i|)}. \quad (17)$$

Such a $\beta$ exists, because $\lim_{x \to -\infty} \sigma_s(x) = 0$ and $\lim_{x \to \infty} \sigma_s(x) = 1$. Moreover, let $T$ be a real such that the inequality $|v_i \cdot x + b_i| \le a$ holds for no more than $m - 1$ indexes $i$, when both

$$a = \beta + \Delta + \frac{2r\Delta}{\mu(V)} \max_{i \in \{1,\ldots,n\}} \|v_i\|$$

and $\|x\| \ge T$ are satisfied. In this case, the existence of $T$ is demonstrated by Lemma 3. We claim that such a $T$ fulfills the thesis of the lemma.
Propositions $\mathcal{A}$ and $\mathcal{B}$: In fact, let us assume $\|x\| \ge T$. Notice that, since $|v_i \cdot x + b_i| \le a$ holds for no more than $m - 1$ indexes and $a$ is larger than $\Delta$, the cardinality of $S(x)$ is smaller than $m$, and $M(x)$, which contains just $m$ elements, is a superset of $S(x)$. Thus, let $S(x) = \{j_1, \ldots, j_r\}$ and $M(x) = \{j_1, \ldots, j_r, \ldots, j_m\}$. If we apply Lemma 2 to the strips $\mathrm{Str}_{j_1,\Delta}, \ldots, \mathrm{Str}_{j_r,\Delta}$ and to the set of indexes $\{j_{r+1}, \ldots, j_m\}$, we immediately get that there is a vector $y$ that fulfills $\mathcal{A}$ and $\mathcal{B}$.
Proposition $\mathcal{C}$: To prove proposition $\mathcal{C}$, we have to discuss the expressions $v_i \cdot x + b_i$ and $v_i \cdot y + b_i$ when $i \notin M(x)$. Notice that $M(x)$ contains all the indexes for which $|v_i \cdot x + b_i| \le a$, so $i \notin M(x)$ immediately implies $|v_i \cdot x + b_i| > a > \beta$. On the other hand,

$$|v_i \cdot y + b_i| \ge |v_i \cdot x + b_i| - \|v_i\|\,\|y - x\| \ge a - \frac{2r\Delta}{\mu(V)}\|v_i\| = \beta + \Delta + \frac{2r\Delta}{\mu(V)} \max_{i \in \{1,\ldots,n\}} \|v_i\| - \frac{2r\Delta}{\mu(V)}\|v_i\| \ge \Delta + \beta > \beta, \quad (18)$$

where, in the above calculations, the inequality $\|x - y\| \le 2r\Delta/\mu(V)$ is employed, which follows from Proposition $\mathcal{C}$ of Lemma 2.

8. $M(x)$ is a subset of $\{1, \ldots, n\}$ that contains exactly $m$ elements and, for any $k$, $h$, fulfills: $k \in M(x)$ and $h \notin M(x)$ implies $|v_k \cdot x + b_k| \le |v_h \cdot x + b_h|$. When more than one subset satisfies the above proposition, any one is good for our purposes.

9. Note that $y$ depends on $x$.
Moreover, notice that the signs of the expressions $v_i \cdot y + b_i$ and $v_i \cdot x + b_i$ are equal. Otherwise, the difference of the two expressions would be larger, in modulus, than the expressions themselves; on the contrary, we have

$$|(v_i \cdot x + b_i) - (v_i \cdot y + b_i)| \le \|v_i\|\,\|y - x\| \le \frac{2r\Delta}{\mu(V)}\|v_i\| \le a < |v_i \cdot x + b_i|.$$

Summing up, either

$$v_i \cdot x + b_i \le -\beta \quad \text{and} \quad v_i \cdot y + b_i \le -\beta$$

hold, or

$$v_i \cdot x + b_i \ge \beta \quad \text{and} \quad v_i \cdot y + b_i \ge \beta$$

hold. In the former case, we have

$$|w_i \sigma_s(v_i \cdot x + b_i) - w_i \sigma_s(v_i \cdot y + b_i)| \le \max_{i \in \{1,\ldots,n\}}(|w_i|)\,|\sigma_s(-\beta) - 0| \le \epsilon,$$

where the inequality

$$|\sigma_s(v_i \cdot x + b_i) - \sigma_s(v_i \cdot y + b_i)| \le |\sigma_s(-\beta) - 0|$$

is employed, which follows from (17) and the equality $|a - b| = |\max(|a|,|b|) - \min(|a|,|b|)|$. Similarly, in the latter case, we have

$$|w_i \sigma_s(v_i \cdot x + b_i) - w_i \sigma_s(v_i \cdot y + b_i)| \le \max_{i \in \{1,\ldots,n\}}(|w_i|)\,|1 - \sigma_s(\beta)| \le \epsilon.$$
Thus, the latter inequality of Proposition $\mathcal{C}$ is proven. On the other hand, $|v_i \cdot y + b_i| > \Delta$ follows immediately from (18); the statement on the signs of $v_i \cdot x + b_i$ and $v_i \cdot y + b_i$ has been proven above and, consequently, $\mathcal{C}$ holds, too.

Proposition $\mathcal{D}$: Finally, Proposition $\mathcal{D}$ is a straightforward consequence of the inequality $\|x - y\| \le 2r\Delta/\mu(V)$ that follows from Proposition $\mathcal{C}$ of Lemma 2. $\square$
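The choice of $\beta$ in (16) and (17) is easy to make explicit for the logistic sigmoid. The snippet below is our illustration, not part of the proof: with $\sigma_s(t) = 1/(1 + e^{-t})$, taking $\beta = \log(\max_i|w_i|/\epsilon)$ satisfies both inequalities, so a unit whose activation stays beyond $\pm\beta$ contributes to $f_s$ within $\epsilon$ of its Heaviside value.

```python
import numpy as np

# Illustration (ours) of the beta used in (16)-(17) for the logistic sigmoid.
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

w = np.array([0.7, -2.0, 1.3])       # example output weights
eps = 1e-3
wmax = np.max(np.abs(w))
beta = np.log(wmax / eps)            # then sigma(-beta) = eps/(wmax + eps)

assert sigma(beta) >= 1 - eps / wmax     # inequality (16)
assert sigma(-beta) <= eps / wmax        # inequality (17)

# two activations on the same side of beta: weighted outputs differ by <= eps
t1, t2 = beta + 0.5, beta + 7.0
assert abs(wmax * sigma(t1) - wmax * sigma(t2)) <= eps
print("beta =", round(float(beta), 3))
```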
Now, we have all the information needed to prove
Theorem 3.
PROOF OF THEOREM 3. The polytopes $P_U$ of (4) realize a partition of $\mathbb{R}^m$. Since $\sigma_h = H$, $f_h$ is a set function on such a partition: It assumes a constant value within each polytope $P_U$. Thus, $C_h$ consists of the union of those polytopes where $f_h$ is equal to or larger than $L$.
Let $b$ be the maximal value that $f_h$ assumes on the unbounded polytopes, that is,

$$b = \max_{P_U\ \text{unbounded}} f_h(x_{P_U}),$$

where $x_{P_U}$ denotes any vector in $P_U$. The set $C_h$ is unbounded or bounded according to whether there is an unbounded $P_U$ where $f_h$ is larger than $L$ or there is not. In other words, $C_h$ is bounded if $b < L$ holds, and it is unbounded if $b > L$ holds.

In order to carry out the proof, we must demonstrate that $C_h$ is bounded if and only if $C_s$ is bounded. According to the previous discussion, the "if" implication and the "only if" implication can be rewritten as follows:

1) The "if" implication: If $b < L$ holds ($C_h$ is bounded), then there is a $T$ such that $f_s(x) < L$ for all $x$ such that $\|x\| \ge T$ ($C_s$ is bounded).

2) The "only if" implication: If $b > L$ holds ($C_h$ is unbounded), then there is an unbounded subset of $\mathbb{R}^m$ where $f_s(x) > L$ ($C_s$ is unbounded).

Notice that we did not consider the case $b = L$: In fact, it is impossible by property (1) of the hypothesis. The expression $\sum_{i \in U} w_i + c$ in property (1) represents just the output of the network $N_h$ for some input $x$ that belongs to $P_U$. Thus, property (1) excludes that $f_h(x) = L$ holds for any $P_U$.
The "if" implication

Suppose that $b < L$ holds. By using Lemma 1 with $\epsilon = (L - b)/4$, there is a $\Delta$ such that

$$|f_s(x) - g_s(x)| \le \frac{L - b}{4} \quad (19)$$

holds. Let us discuss separately the case when $x$ does not belong to any one of the strips $\mathrm{Str}_{1,\Delta}, \ldots, \mathrm{Str}_{n,\Delta}$ and the case when $x$ belongs to some strips.

First of all, assume that $x \notin \mathrm{Str}_{i,\Delta}$ for each $i$. By inequality (19), the definition of $g_s$, and the fact that $x$ does not belong to any strip, it follows that

$$|f_s(x) - f_h(x)| = \left|f_s(x) - f_h(x) - \sum_{i:\, x \in \mathrm{Str}_{i,\Delta}} w_i\left(\sigma_s(v_i \cdot x + b_i) - H(v_i \cdot x + b_i)\right)\right| = |f_s(x) - g_s(x)| \le \frac{L - b}{4},$$

so that

$$f_s(x) \le f_h(x) + \frac{L - b}{4} \le b + \frac{L - b}{4} < L \quad (20)$$

holds. Thus, the vectors $x$ outside all the strips are not in $C_s$.
Then, let us assume that $x$ belongs to $\mathrm{Str}_{j_1,\Delta}, \ldots, \mathrm{Str}_{j_r,\Delta}$. Applying Lemma 4 to $x$, $\Delta$, and $\epsilon = \min\left((L - b)/(4(n - m)),\ \max_{i \in \{1,\ldots,n\}}(|w_i|)\right)$,(10) it follows that there is a $T$ such that, if $\|x\| \ge T$, there is a $y$ that fulfills propositions $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, and $\mathcal{D}$. The inequalities in those propositions state that $y$ is not in any strip and, as a consequence, we can get

$$f_s(y) \le b + \frac{L - b}{4} \quad (21)$$

by the same reasoning adopted to obtain (20). Moreover, by (21) and the inequalities in $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$, we get

$$f_s(x) = \sum_{i=1}^n w_i \sigma_s(v_i \cdot x + b_i) + c = \sum_{i \in S(x)} w_i \sigma_s(v_i \cdot x + b_i) + \sum_{i \in M(x)\setminus S(x)} w_i \sigma_s(v_i \cdot x + b_i) + \sum_{i \notin M(x)} w_i \sigma_s(v_i \cdot x + b_i) + c$$
$$\le \sum_{i \in S(x)} w_i \sigma_s(v_i \cdot y + b_i) + \sum_{i \in M(x)\setminus S(x)} w_i \sigma_s(v_i \cdot y + b_i) + \sum_{i \notin M(x)} w_i \sigma_s(v_i \cdot y + b_i) + (n - m)\frac{L - b}{4(n - m)} + c$$
$$= f_s(y) + \frac{L - b}{4} \le b + \frac{L - b}{2} < L,$$

which proves that also the vectors that belong to the strips are not in $C_s$.
The "only if" implication

Assume that $b > L$ holds. Using Lemma 1 with $\epsilon = (b - L)/4$, there is a real $\Delta$ such that

$$|f_s(x) - g_s(x)| \le \frac{b - L}{4} \quad (22)$$

holds for every $x$.
10. Here, we implicitly assume that $w_i \ne 0$ for at least an index $i$. However, this is not a limitation since, when all the $w_i$ are null, the theorem is trivial.
Now, let us apply Lemma 4 to $\Delta$: It follows that there is a real $T$ such that, for all $x$, $\|x\| \ge T$, there is a vector $y$ that fulfills properties $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$. Thus, let $x$ be any vector, with $\|x\| \ge T$, belonging to an unbounded polytope where $f_h$ attains the value $b$, and consider the corresponding $y$ defined by the lemma. Notice that properties $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ of Lemma 4 imply that $y$ is not contained in any strip among $\mathrm{Str}_{1,\Delta}, \ldots, \mathrm{Str}_{n,\Delta}$. Thus, using the inequality that follows from (22) by instantiating $x$ with $y$, and the definition of $g_s$, we get

$$|f_s(y) - f_h(y)| = \left|f_s(y) - f_h(y) - \sum_{i:\, y \in \mathrm{Str}_{i,\Delta}} w_i\left(\sigma_s(v_i \cdot y + b_i) - H(v_i \cdot y + b_i)\right)\right| = |f_s(y) - g_s(y)| \le \frac{b - L}{2}$$

and

$$f_s(y) \ge f_h(y) - \frac{b - L}{2} = b - \frac{b - L}{2} > L,$$

so that $y \in C_s$ holds and $\|y\|$ can be arbitrarily large, since $\|y - x\| \le 2r\Delta/\mu(V)$ holds from property $\mathcal{D}$ of Lemma 4. $\square$
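The "if" implication can be visualized on a toy network. The example below is ours (network, weights, and constants are our own choices, assuming NumPy): four steep sigmoidal units in $\mathbb{R}^2$ fence a neighborhood of the origin, the Heaviside counterpart outputs 3 there and at most $b = 2$ on every unbounded polytope, and with $L = 2.5$ the level set $C_s$ is indeed observed to be bounded.

```python
import numpy as np

# Toy check (ours) of the "if" implication of Theorem 3: f_h <= b = 2 < L
# on every unbounded polytope, so C_s = {x : f_s(x) >= L} must be bounded.
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

# four steep units fencing the origin: all four active only near the origin
V = 10.0 * np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = 10.0 * np.ones(4)
w = np.ones(4)
c = -1.0
L = 2.5

f_s = lambda X: sigma(X @ V.T + b) @ w + c

assert f_s(np.zeros((1, 2)))[0] > L            # the origin lies in C_s

rng = np.random.default_rng(2)
far = rng.normal(size=(5000, 2))
far *= (50.0 / np.linalg.norm(far, axis=1))[:, None]   # points with ||x|| = 50
assert np.all(f_s(far) < L)                    # no far point lies in C_s
print("C_s is bounded on this instance, as Theorem 3 predicts")
```

Every point of norm 50 has one coordinate beyond $\pm 35$, so one unit of the corresponding pair is driven to zero and the output stays near $3 \cdot 1 - 1 = 2 < L$.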
A.3 Proof of Theorem 4

To carry out the proofs, we will largely employ some results in [17], where it is proven that the closed hemisphere problem (CHP) is NP-complete. Given vectors $v_1, \ldots, v_n$ in $\mathbb{R}^m$ and an integer $L$, such a problem is defined as follows.(11)

DEFINITION 3. Closed hemisphere problem. Decide whether there is an $x \in \mathbb{R}^m$ such that $x \ne 0$ and the cardinality of the set $U(x) = \{i \mid v_i \cdot x \ge 0\}$ is larger than or equal to $L$.
PROOF OF THEOREM 4. To complete the proof of Theorem 4, we show that:

1) the BSP for MLPs with Heaviside activation function is nondeterministically computable in polynomial time, and

2) the CHP can be reduced to the BSP.

The proof of point (1) is almost immediate. In fact, a solution to the problem can be found by generating all the subsets $U$ of $\{1, \ldots, n\}$, selecting the nonempty sets $P_U$, and evaluating inequality (5) over them. The reader can easily verify that the generation of all $U$ can be performed in $O(n)$ steps by a nondeterministic algorithm. On the other hand, the selections can be performed in a polynomial number of operations by linear programming.

11. More precisely, Johnson and Preparata call this problem "the statement of CHP as a feasibility question." Here, for the sake of simplicity, we change their terminology. In fact, in their formulation, CHP is an optimization problem, that is, the problem of finding the $x$ that maximizes the cardinality of the considered sets, while, in our formulation, CHP is a feasibility question.
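A deterministic (exponential-time) analogue of this procedure can be sketched as follows. The code is ours and assumes NumPy and SciPy; it handles only the homogeneous case $b_i = 0$ used in the reduction below, where the strict inequalities $v_i \cdot x < 0$ can be normalized to $v_i \cdot x \le -1$, so that the nonemptiness of each region becomes a plain linear program.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog   # the LP step of the "selection" phase

# Sketch (ours) of point (1), made deterministic: enumerate every sign
# pattern U (the nondeterministic algorithm guesses one), test feasibility
# of the associated homogeneous region by LP, and score sum_{i in U} w_i + c.
rng = np.random.default_rng(3)
m, n = 2, 4
V = rng.normal(size=(n, m))
w, c = np.ones(n), 0.0

best = -np.inf
for pattern in product([True, False], repeat=n):
    if all(pattern):
        continue   # the all-nonnegative cone would need a separate x != 0 test
    # feasibility of: v_i.x >= 0 for i in U,  v_i.x <= -1 for i not in U
    A = np.vstack([-V[i] if in_u else V[i] for i, in_u in enumerate(pattern)])
    ub = np.array([0.0 if in_u else -1.0 for in_u in pattern])
    res = linprog(np.zeros(m), A_ub=A, b_ub=ub, bounds=[(None, None)] * m)
    if res.success:
        best = max(best, w[np.array(pattern)].sum() + c)

# some direction keeps at least half of the v_i on its nonnegative side
assert best >= n / 2
print("largest achievable sum over a feasible pattern:", best)
```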
In order to prove point (2), given any instance of the CHP, let us consider an associate BSP, where the vectors $v_1, \ldots, v_n$ and the threshold $L$ are the same as the CHP's, whereas the other parameters are chosen as follows:

$$b_i = 0,\quad c = 0,\quad \text{and}\quad w_i = 1,\quad 1 \le i \le n. \quad (23)$$

We claim that the given CHP instance reduces to the associate BSP. In fact, by (23) and simple calculations, we have that:

1) $|U(x_o)| = \sum_{i \in U(x_o)} w_i + c$ holds for any $x_o \in \mathbb{R}^m$;

2) the set

$$P_{U(x_o)} = \{x \mid v_i \cdot x \ge 0\ \text{for}\ i \in U(x_o)\ \text{and}\ v_i \cdot x < 0\ \text{for}\ i \notin U(x_o)\}$$

is unbounded if $x_o \ne 0$, and it is the set $\{0\}$ if $x_o = 0$.

Thus, let $x_o$ be a solution of the CHP; we directly see that $P_{U(x_o)}$ is a solution of the corresponding BSP. In fact, $P_{U(x_o)}$ is unbounded by property (2), and inequality (5) follows from

$$\sum_{i \in U(x_o)} w_i + c = |U(x_o)| \ge L.$$

Moreover, if $P_U$ is a solution for the instance of the BSP, then any nonnull $x_o \in P_U$ is a solution of the corresponding CHP, because we have

$$|U(x_o)| = \sum_{i \in U(x_o)} w_i + c \ge L$$

by property (1). $\square$
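The correspondence established by (23) is easy to verify numerically; the sketch below is ours and assumes NumPy. With $b_i = 0$, $c = 0$, and $w_i = 1$, the Heaviside network's output at $x_o$ equals $|U(x_o)|$, and the polytope of $x_o$ is a cone, hence unbounded for $x_o \ne 0$.

```python
import numpy as np

# Sketch (ours) of the reduction (23): same v_i and L as the CHP instance,
# with b_i = 0, c = 0, w_i = 1, so the network output counts U(x_o).
rng = np.random.default_rng(4)
m, n = 3, 7
V = rng.normal(size=(n, m))
w, c = np.ones(n), 0.0                  # the choice (23)

x_o = rng.normal(size=m)                # a nonnull CHP candidate
U = V @ x_o >= 0                        # U(x_o) = {i | v_i . x_o >= 0}

f_h = w @ (V @ x_o >= 0) + c            # Heaviside-network output at x_o
assert f_h == U.sum()                   # equals |U(x_o)|, as in property (1)

# P_{U(x_o)} is a cone: it contains t*x_o for every t > 0, so it is
# unbounded whenever x_o != 0, as in property (2)
for t in (1.0, 10.0, 1e6):
    assert np.array_equal(V @ (t * x_o) >= 0, U)
print("|U(x_o)| =", int(U.sum()))
```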
ACKNOWLEDGMENTS

This work has been supported by the Italian National Research Council. We thank A.C. Tsoi for very fruitful discussions, especially concerning the extension of our results to MLPs with more than one hidden layer.
REFERENCES

[1] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme Recognition Using Time-Delay Neural Networks," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328-339, 1989.
[2] Y. le Cun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel, "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, vol. 1, pp. 541-551, 1989.
[3] H. Gish and M. Schmidt, "Text-Independent Speaker Identification," IEEE Signal Processing Magazine, vol. 11, pp. 18-32, Oct. 1994.
[4] M. Gori and A. Tesi, "On the Problem of Local Minima in Backpropagation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, no. 1, pp. 76-86, Jan. 1992.
[5] R. Lippmann, "Review of Neural Networks for Speech Recognition," Neural Computation, vol. 1, pp. 1-38, 1989.
[6] F. Takeda and S. Omatu, "High Speed Paper Currency Recognition by Neural Networks," IEEE Trans. Neural Networks, vol. 6, pp. 73-77, Jan. 1995.
[7] R. Bajaj and S. Chaudhury, "Signature Verification Using Multiple Neural Classifiers," Pattern Recognition, vol. 30, no. 1, pp. 1-7, 1997.
[8] J.-P. Drouhard, R. Sabourin, and M. Godbout, "A Neural Network Approach to Off-Line Signature Verification Using Directional PDF," Pattern Recognition, vol. 29, no. 3, pp. 415-424, 1996.
[9] A. Hakulinen and J. Hakkarainen, "A Neural Network Approach to Quality Control of Padlock Manufacturing," Pattern Recognition Letters, vol. 17, pp. 357-362, 1996.
[10] N. Intrator, D. Reisfeld, and Y. Yeshurun, "Face Recognition Using a Hybrid Supervised/Unsupervised Neural Network," Pattern Recognition Letters, vol. 17, pp. 67-76, Jan. 1996.
[11] Y. Zhou and R. Hecht-Nielsen, "Target Recognition Using Multiple Sensors," Neural Networks for Signal Processing, vol. 3, pp. 411-420, New York, 1993.
[12] S.P.C. Spence, S. Hsu, and J. Pearson, "Integrating Neural Networks With Image Pyramids to Learn Target Context," Neural Networks, vol. 8, no. 7-8, pp. 1,143-1,152, 1995.
[13] R. Lippmann, "An Introduction to Computing With Neural Nets," IEEE ASSP Magazine, pp. 4-22, Apr. 1987.
[14] A. Frosini, M. Gori, and P. Priami, "A Neural Network-Based Model for Paper Currency Recognition and Verification," IEEE Trans. Neural Networks, vol. 7, pp. 1,482-1,490, Nov. 1996.
[15] M. Gori, L. Lastrucci, and G. Soda, "Autoassociator-Based Models for Speaker Verification," Pattern Recognition Letters, vol. 17, pp. 241-250, 1996.
[16] J. Oglesby and J. Mason, "Radial Basis Functions for Speaker Recognition," Proc. ICASSP '91, pp. 393-396, 1991.
[17] D.S. Johnson and F.P. Preparata, "The Densest Hemisphere Problem," Theoretical Computer Science, vol. 6, pp. 93-107, 1978.
Marco Gori received the Laurea in electronic engineering from the Università di Firenze, Italy, in 1984, and the PhD in 1990 from the Università di Bologna, Italy. From October 1988 to June 1989, he was a visiting student at the School of Computer Science, McGill University, Montreal. In 1992, he became an associate professor of computer science at the Università di Firenze and, in November 1995, he joined the University of Siena. His main research interests are in pattern recognition (especially document processing) and neural networks. Dr. Gori was the general chairman of the Second Workshop of Neural Networks for Speech Processing held in Firenze in 1992, organized the NIPS96 post-conference workshop on "Artificial Neural Networks and Continuous Optimization: Local Minima and Computational Complexity," and co-organized the Caianiello Summer School on Adaptive Processing of Sequences held in Salerno in September 1997. He coedited the volume Topics in Artificial Intelligence (Springer-Verlag, 1995), which collects the contributions of the 1995 Italian Congress of Artificial Intelligence.

Dr. Gori serves as a program committee member of several workshops and conferences, mainly in the area of neural networks, and acted as guest coeditor of the Neurocomputing journal for the special issue on recurrent neural networks (July 1997). He is an associate editor of the IEEE Transactions on Neural Networks, Neurocomputing, and Neural Computing Surveys. He is the Italian chairman of the IEEE Neural Network Council (R.I.G.) and is a member of the IAPR, SIREN, and AI*IA societies. He is also a senior member of the IEEE.
Franco Scarselli received the Laurea in computer science from the Università di Pisa, Italy, in 1989, and the PhD degree in 1995 from the Università di Firenze, Italy. His main interests are in the field of neural networks and pattern recognition. He is currently a postdoc with the Dipartimento di Sistemi e Informatica, Università di Firenze, where he is working on the application of neural networks to problems of document processing.