
The answer to this question is embodied in the universal approximation theorem for a nonlinear input–output mapping, which may be stated as follows:


Let φ(·) be a nonconstant, bounded, and monotone-increasing continuous function. Let Im0 denote the m0-dimensional unit hypercube [0, 1]^m0. The space of continuous functions on Im0 is denoted by C(Im0). Then, given any function f ∈ C(Im0) and ε > 0, there exist an integer m1 and sets of real constants αi, bi, and wij, where i = 1, ..., m1 and j = 1, ..., m0, such that we may define

F(x_1, \ldots, x_{m_0}) = \sum_{i=1}^{m_1} \alpha_i \, \varphi\!\left( \sum_{j=1}^{m_0} w_{ij} x_j + b_i \right)   (4.88)

as an approximate realization of the function f(·); that is,

\lvert F(x_1, \ldots, x_{m_0}) - f(x_1, \ldots, x_{m_0}) \rvert < \varepsilon

for all x1, x2, ..., xm0 that lie in the input space.
The universal approximation theorem is directly applicable to multilayer perceptrons. We first note, for example, that the hyperbolic tangent function used as the nonlinearity in a neural model for the construction of a multilayer perceptron is indeed a nonconstant, bounded, and monotone-increasing function; it therefore satisfies the conditions imposed on the function φ(·). Next, we note that Eq. (4.88) represents the output of a multilayer perceptron described as follows:
1. The network has m0 input nodes and a single hidden layer consisting of m1 neurons; the inputs are denoted by x1, x2, ..., xm0.
2. Hidden neuron i has synaptic weights wi1, ..., wim0 and bias bi.
3. The network output is a linear combination of the outputs of the hidden neurons, with α1, ..., αm1 defining the synaptic weights of the output layer (see the sketch following this list).
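As an illustration of the functional form in Eq. (4.88), the following sketch builds a single-hidden-layer network with tanh hidden units and a linear output, draws the hidden weights wij and biases bi at random, and fits only the output weights αi by least squares. This is a hypothetical illustration of the structure just described, not the back-propagation training discussed later in the text; the target function and all variable names are assumptions made for the example.

# Sketch: a single-hidden-layer approximator of the form of Eq. (4.88),
#   F(x) = sum_i alpha_i * phi(w_i^T x + b_i),  with phi = tanh.
# Assumption: hidden parameters (w, b) are fixed at random and only the
# output weights alpha are fitted by least squares -- an illustration of
# the functional form, not the training procedure used in the text.
import numpy as np

rng = np.random.default_rng(0)
m0, m1 = 1, 50                       # input dimension, number of hidden neurons

# Target function to approximate on the unit interval (hypothetical example).
f = lambda x: np.sin(2 * np.pi * x)

# Random hidden-layer parameters w_ij and b_i.
W = rng.normal(scale=5.0, size=(m1, m0))
b = rng.normal(scale=5.0, size=m1)

def hidden(X):
    """Hidden-layer outputs phi(W x + b) for a batch of inputs X (N x m0)."""
    return np.tanh(X @ W.T + b)

# Fit the output weights alpha_i by least squares on sampled training points.
X_train = rng.uniform(0.0, 1.0, size=(200, m0))
alpha, *_ = np.linalg.lstsq(hidden(X_train), f(X_train[:, 0]), rcond=None)

def F(X):
    """Approximate realization F(x1, ..., xm0) in the sense of Eq. (4.88)."""
    return hidden(X) @ alpha

# Check the approximation quality on a dense grid of the input space.
X_test = np.linspace(0.0, 1.0, 1000).reshape(-1, 1)
print("max |F - f| on [0, 1]:", np.max(np.abs(F(X_test) - f(X_test[:, 0]))))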
The universal approximation theorem is an existence theorem in the sense that it provides the mathematical justification for the approximation of an arbitrary continuous function as opposed to exact representation. Equation (4.88), which is the backbone of the theorem, merely generalizes approximations by finite Fourier series. In effect, the theorem states that a single hidden layer is sufficient for a multilayer perceptron to compute a uniform ε approximation to a given training set represented by the set of inputs x1, ..., xm0 and a desired (target) output f(x1, ..., xm0). However, the theorem
does not say that a single hidden layer is optimum in the sense of learning time, ease of
implementation, or (more importantly) generalization.
Bounds on Approximation Errors
Barron (1993) has established the approximation properties of a multilayer perceptron,
assuming that the network has a single layer of hidden neurons using sigmoid functions
and a linear output neuron. The network is trained using the back-propagation algorithm and then tested with new
data. During training, the network learns specific points
of a target function f in accordance with the training data and thereby produces the
approximating function F defined in Eq. (4.88). When the network is exposed to test
data that have not been seen before, the network function F acts as an “estimator” of new
points of the target function; that is, F = f̂.
A smoothness property of the target function f is expressed in terms of its Fourier representation. In particular, the average of the norm of the frequency vector weighted by the Fourier magnitude distribution is used as a measure for the extent to which the function f oscillates. Let f̃(ω) denote the multidimensional Fourier transform of the function f(x); the m0-by-1 vector ω is the frequency vector. The function f(x) is defined in terms of its Fourier transform f̃(ω) by the inverse formula

f(\mathbf{x}) = \int_{\mathbb{R}^{m_0}} \tilde{f}(\boldsymbol{\omega}) \exp(j \boldsymbol{\omega}^{T} \mathbf{x}) \, d\boldsymbol{\omega}   (4.89)

where j = √(-1). For the complex-valued function f̃(ω) for which ω f̃(ω) is integrable,
we define the first absolute moment of the Fourier magnitude distribution of the function f as

C_f = \int_{\mathbb{R}^{m_0}} \lVert \boldsymbol{\omega} \rVert \, \lvert \tilde{f}(\boldsymbol{\omega}) \rvert \, d\boldsymbol{\omega}   (4.90)

where ‖ω‖ is the Euclidean norm of ω and |f̃(ω)| is the absolute value of f̃(ω). The first absolute moment Cf quantifies the smoothness of the function f.
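As a hypothetical worked example of Eq. (4.90) in one dimension (m0 = 1), the sketch below evaluates Cf for the Gaussian f(x) = exp(-x²/2) by numerical quadrature. Under the transform convention of Eq. (4.89) (no 1/(2π) factor in the inverse formula), its Fourier transform is f̃(ω) = (1/√(2π)) exp(-ω²/2) and the integral has the closed form √(2/π); both the convention and the example function are assumptions made purely for illustration.

# Sketch: numerical evaluation of the first absolute moment C_f of Eq. (4.90)
# for a 1-D Gaussian target f(x) = exp(-x^2 / 2).
# Assumption: with the inverse formula of Eq. (4.89), f(x) = int f~(w) e^{jwx} dw,
# the forward transform carries the 1/(2*pi) factor, so
#   f~(w) = (1 / sqrt(2*pi)) * exp(-w^2 / 2).
import numpy as np

omega = np.linspace(-40.0, 40.0, 400001)   # frequency grid, wide enough for the tails
d_omega = omega[1] - omega[0]

f_tilde = np.exp(-omega**2 / 2.0) / np.sqrt(2.0 * np.pi)

# C_f = integral of ||omega|| * |f~(omega)| d omega  (Riemann sum on the grid).
C_f = np.sum(np.abs(omega) * np.abs(f_tilde)) * d_omega

print("numerical C_f :", C_f)                   # approximately 0.7979
print("closed form   :", np.sqrt(2.0 / np.pi))  # sqrt(2/pi)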
The first absolute moment Cf provides the basis for a bound on the error that results from the use of a multilayer perceptron represented by the input–output mapping function F(x) of Eq. (4.88) to approximate f(x). The approximation error is measured by the integrated squared error with respect to an arbitrary probability measure on the ball Br = {x : ‖x‖ ≤ r} of radius r > 0. On this basis, we may state the following proposition for a bound on the approximation error given by Barron (1993):

For every continuous function f(x) with finite first moment Cf and every m1 ≥ 1, there exists a linear combination of sigmoid-based functions F(x) of the form defined in Eq. (4.88) such that when the function f(x) is observed at a set of values of the input vector x denoted by {xi}, i = 1, ..., N, that are restricted to lie inside the prescribed ball of radius r, the result provides the following bound on the empirical risk:

\varepsilon_{\mathrm{av}}(N) = \frac{1}{N} \sum_{i=1}^{N} \big( f(\mathbf{x}_i) - F(\mathbf{x}_i) \big)^2 \le \frac{C'_f}{m_1}   (4.91)

where C'_f = (2rCf)^2.
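To make the behavior of the bound in Eq. (4.91) concrete, the short sketch below evaluates C'f/m1 = (2rCf)²/m1 for a few hidden-layer sizes, reusing the Gaussian value Cf = √(2/π) from the previous example with an assumed radius r = 1; the numbers are purely illustrative and only exhibit the 1/m1 decay of the bound.

# Sketch: the bound of Eq. (4.91), C'_f / m_1 with C'_f = (2 r C_f)^2.
# Assumptions: r = 1 and C_f = sqrt(2/pi) (the 1-D Gaussian example above);
# the values are illustrative only.
import numpy as np

r = 1.0
C_f = np.sqrt(2.0 / np.pi)
C_f_prime = (2.0 * r * C_f) ** 2        # C'_f = (2 r C_f)^2

for m1 in (1, 10, 100, 1000):
    print(f"m1 = {m1:4d}  ->  bound C'_f / m1 = {C_f_prime / m1:.6f}")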
