
Artificial Neural Networks

The biological network


The eye senses light around us and transforms it into electrical impulses that are carried to
the brain by nerve fibres. At the back of each eye, nerve fibres from the retina bundle together to
form optic nerves. The two optic bundles meet at a region termed the optic chiasm. At this region
the two tracts form one bundle, which then splits into two optic tracts going to the left and right sides
of the brain, respectively. Each tract carries signals from both eyes, and the brain integrates the
visual messages. The region of the brain responsible for vision is dubbed the visual cortex (Fig.1).
Since each side of the brain receives two images of the scene, one from each eye at a slightly
different angle of view, it interprets the result to produce three dimensional or stereoscopic vision.
In the brain, a massive network of neurons carries out information processing.
Fig.2 is a simplified schematic of a neuron. This consists of a cell body (soma) with input
nerves (dendrites) and output nerves (axons). The dendrites receive signals which could be either
excitatory or inhibitory. Excitatory signals promote the neuron to fire a signal and inhibitory signals
retard firing. The axon carries the signals to another cell, and the information is transferred through
synaptic end bulbs, and received by the dendrite through receptor regions. The synaptic end bulb
and the receptor regions are separated by a gap about one billionth of an inch wide, and transmission
of signals through that gap is carried by an electrochemical process (Fig. 3). The end bulb and the
receptor regions are termed the synapse. The signals flow in the dendrites and axons as electric
current. There are several types of neurons in the brain with various cell body shapes and functions.
Some are inhibitors to prevent the spread of impulses that might overload the sensory circuits. Some
have the purpose of bringing additional information to the brain's surface, and others act as receivers
to incoming signals.
Figure 1 Visual pathway to the brain.
The synaptic bulb contains tiny globules called synaptic vesicles (Fig. 3), each carrying
thousands of molecules termed neurotransmitters. When the nerve impulse reaches the synaptic
bulb, the vesicles fuse with the membrane, spilling their contents into the synaptic gap. The
neurotransmitters bind with receptors on the target cell; this opens receptor channels and allows
sodium ions to rush into the target cell, and potassium ions to leave. The flow of ions excites an
area of the target-cell membrane and generates an electrical impulse in the target cell.
Figure 2 Simplified Diagram of a neuron
Figure 3 The synapse
The human brain contains approximately 10^11 neurons, and is estimated to be capable of
performing over 100 billion operations/sec. The Cray X-MP is capable of 0.8 billion
operations/sec. That makes the human brain over 100 times faster, with the added advantages that
it is much smaller in size and requires far less energy. One characteristic worth pointing out is
that the human brain is organized in three dimensions. Integrated circuits, on the other hand, are
usually two-dimensional, and recent advances in three-dimensional integrated circuit design do
not achieve, or even come close to, the type of 3-D integration of the human brain.
The perceptron
Fig. 4 shows what is believed to be a mathematical model for a single neuron. The node
sums N weighted inputs, adds a threshold value θ, and passes the result through a
nonlinearity. The output can be either 1, representing the firing of a neuron, or 0 (-1 can also be
used). The nonlinearity functions commonly used are the sigmoid function and the hard limiter
(threshold logic).
The structure in Fig. 4 is known as the perceptron, and is basically a linear classifier
that is capable of separating two disjoint classes as shown in Fig. 5. The output of the
perceptron can be written as:

y = f(net)    (1)

where

net = w_0 x_0 + w_1 x_1 + ... + w_{N-1} x_{N-1} + θ    (2)

Figure 4 Computational element of a neural network.

For the hard limiter the nonlinearity is

f(net) = 1 if net >= 0, and 0 otherwise    (3)

and for the sigmoid it is

f(net) = 1/(1 + e^{-net})    (4)
A scheme that determines the weights {w_0, w_1, ..., w_{N-1}} such that f() separates the two classes
A and B is, not surprisingly, called a "learning" scheme. θ is called the threshold, and is
usually a fixed number between 0 and 1.
To derive a learning scheme we shall consider for now a perceptron with just two
inputs: x_0 could represent the color feature 'x' of the chromaticity diagram and x_1 the color feature 'y'.
If we wish the perceptron to separate two colors A and B, then we should expect an output of,
say, 1 if (x_0, x_1) belonged to color A, and 0 if the inputs belonged to color B. Alternatively, one
can write:

d_p = 1 if (x_0, x_1)_p belongs to A
d_p = 0 if (x_0, x_1)_p belongs to B

where the subscript p denotes a pattern reading for (x_0, x_1) and d_p denotes the desired output
for that pattern. If (w_0, w_1) are known then the actual output y can be calculated from
Eq.(1). The error for that pattern reading can be given by:

E_p = (d_p - y)^2 / 2
Figure 5 A simple decision function of two classes
The problem then becomes the minimization of E_p w.r.t. w_0 and w_1 for all pattern inputs
(x_0, x_1)_p such that Eq.(1) provides the correct separation between the two classes as seen in
Fig. 5. E_p is a nonlinear function of the variables w_0 and w_1, and hence nonlinear schemes
need to be used to minimize it.
If y is given by the sigmoid function:

y = 1/(1 + e^{-(w_0 x_0 + w_1 x_1 + θ)})    (5)

then by differentiating E_p w.r.t. w_0 we get:

∂E_p/∂w_0 = -(d_p - y) y (1 - y) x_0

and w.r.t. w_1 we get:

∂E_p/∂w_1 = -(d_p - y) y (1 - y) x_1
The steepest descent algorithm can be used to obtain the values of the weights as
follows:
1. Set the weights (w_0, w_1) and θ to small random values.
At iteration k:
2. Present an input value (x_0, x_1) and specify the desired output d; 1 if it belongs to one
class and 0 if it belongs to the other.
3. Calculate the actual output y.
4. Calculate:

δ = (d - y) y (1 - y)

5. Calculate the gradients:

∂E_p/∂w_i = -δ x_i,  i = 0, 1

6. Adjust the weights using the recursive equation:

w_i(k+1) = w_i(k) + η δ x_i

where η is a positive fraction less than 1.
7. Present new values of input, or if the data has all been read, recycle the same set of data.
Go to step 2 and repeat until the weights stabilize, i.e.,

|w_i(k+1) - w_i(k)| < ε

Convergence is sometimes faster if a momentum term is added and the weights are
smoothed by:

w_i(k+1) = w_i(k) + η δ_k x_i + α (δ_k x_i - δ_{k-1} x_i)

where 0 < α < 1.
The above algorithm is known as the "delta" rule [1], and has been used extensively
in the literature. Although the algorithm can produce weights for the classifier, it requires a
large number of iterations to converge. The choice of the two parameters η and α seems to be
rather arbitrary. To allow you to examine the performance of the delta rule algorithm, we
present next a C program designed for a perceptron with two inputs.
----------------------------------------------------------------
Program 1 "PERECEPT.C". Perceptron learning using the delta rule.
----------------------------------------------------------------
/*******************************
* Developed by M.A.Sid-Ahmed. *
* ver. 1.0, 1992. *
********************************/
/* Teaching a single perceptron using
the delta rule. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <conio.h> /* Turbo C: clrscr(), gotoxy(), getch(), kbhit() */
#include <io.h>
#include <custom.h>
#define eta 0.8
#define alpha 0.2
void main()
{
int d[200]; /* desired outputs; read with %d below. */
unsigned int N,ind,iter,i;
float w[2],x[2],x1[200],x2[200],net,E;
float dEp[2],sum,y,theta,dEp_old[2],delta;
FILE *fptr;
char file_name[14];
clrscr();
N=0;
iter=0;
gotoxy(1,1);
printf("Enter file name containing data --->");
scanf("%s", file_name);
fptr=fopen(file_name,"r");
if(fptr==NULL)
{
printf("file %s does not exist.",file_name);
exit(1);
}
while((fscanf(fptr,"%f %f %d ",&x1[N], &x2[N],&d[N])!=EOF))
N++;
fclose(fptr);
srand(1);
w[0]=(float)rand()/32768.00;
srand(2);
w[1]=(float)rand()/32768.00;
theta=0.1;
i=0;
sum=0.0;
ind=1;
gotoxy(1,10);
printf("Press ESC to exit before convergence.");
while(ind)
{
x[0]=x1[i];
x[1]=x2[i];
gotoxy(1,3);
printf("Iteration # %5d ",iter);
net=w[0]*x[0]+w[1]*x[1]+theta;
if(net>=20) E=0.0;
else E=exp(-(double)net);
y=1.0/(1.0+E);
delta=(d[i]-y)*y*(1.0-y);
dEp[0]=x[0]*delta;
dEp[1]=x[1]*delta;
if(i==0)
{
w[0]+=eta*dEp[0];
w[1]+=eta*dEp[1];
dEp_old[0]=dEp[0];
dEp_old[1]=dEp[1];
}
else
{
w[0]+=eta*dEp[0]+alpha*(dEp[0]-dEp_old[0]);
w[1]+=eta*dEp[1]+alpha*(dEp[1]-dEp_old[1]);
dEp_old[0]=dEp[0];
dEp_old[1]=dEp[1];
}
sum+=fabs((double)(d[i]-y));
if(d[i]==1)
gotoxy(1,4);
else
gotoxy(1,5);
printf("%d %f", d[i],y);
i++;
if(i>=N)
{
gotoxy(1,6);
printf(" Total error= %f",sum);
if(sum<=1.0e-1) /* converged: print weights and stop. */
{
gotoxy(1,7);
printf("\n w[0]=%f w[1]=%f",w[0],w[1]);
exit(0);
}
i=0; sum=0;
iter++;
}
if(kbhit()!=0)
{
gotoxy(1,7);
if(getch()==27)
{
printf("\n w[0]=%f w[1]=%f",w[0],w[1]);
exit(1);
}
}
}
}
----------------------------------------------------------------
Teaching by minimizing the total error function.
The delta rule is a slow iterative scheme that may converge to a solution. The selection
of the two parameters η and α governs the speed of convergence, and a poor choice might even
lead to divergence from a solution. Convergence is very slow, requiring hundreds and, for most
problems, thousands of iterations. The simple perceptron with two inputs, and hence two
adjustable weights, which by itself can separate only two disjoint classes, required a fair amount
of computation to arrive at a satisfactory solution for the separation of the two classes. If such
effort is required for such a relatively simple problem, you can imagine
what will happen if we were to use the same approach to teach a multi-layer neural structure.
What is obviously needed, then, is a scheme that is guaranteed to converge at a faster rate.
The problem in essence is a minimization problem, and what is required to be minimized
is the total error function:

E = (1/2) Σ_i (d_i - y_i)^2    (6)

where d_i is the desired output for a given pattern X_i = [x_0, x_1, ..., x_{N-1}], and y_i is the actual output
for that same input pattern. Since y_i is a nonlinear function of the adjustable weights
W = [w_0, w_1, ..., w_{M-1}], the problem becomes a nonlinear minimization one. Hence, instead of trying to re-invent
the wheel, we should seek methods from the area of nonlinear optimization that have
already been proven to be effective in solving similar problems.
Most non-linear optimization schemes for minimizing a function of several variables
require as part of the process the minimization of a single variable function. Therefore, the first
step is to examine univariate search methods, develop and test C routines for them, and then
proceed to examine effective multivariate methods. Once these approaches are understood and
tested the next step is to apply them to the perceptron problem and compare the approach to the
delta rule.
Univariate Search Methods
The most common method for obtaining the minimum of a function of a single variable is
the Golden Section method [2]. This method is based on a region elimination scheme, and
assumes that the function has only one minimum within a pre-defined region; that is, the function
is said to be unimodal. Region elimination schemes in general can be explained by the
diagram shown in Fig. 6.
Referring to Fig. 6, if two points a1 and a2 are selected in the region between w1 and w2, and if
y2 > y1, then obviously the minimum must lie between w1 and a2. Therefore, the region [a2, w2]
can be eliminated from the search. If two points are now selected in the smaller region and the
process is repeated, the search region will shrink further. Eventually, the minimum will be
bracketed by a very small region. The obvious question is: what systematic way do we have of
selecting these internal points? A rather more in-depth question would be: is there an optimum
scheme for locating these points sequentially such that the minimum is bracketed within a very
small region ε in a predetermined number of steps? The answer to this question was found in
1953 by Kiefer [3]. This optimum sequential search has come to be known as the Fibonacci
search. The search is based on the Fibonacci integers developed by Fibonacci in the 13th century.
Later this was modified to a sub-optimum sequential search that does not require the generation
of the Fibonacci integers, and this came to be known as the Golden Section search.
We will assume for the development of the Fibonacci search that the region of search is
normalized to [0,1]. The Fibonacci integer series is defined by the recursive equations:

F_0 = F_1 = 1
F_k = F_{k-1} + F_{k-2},  k = 2, 3, ...

Figure 6 Region elimination scheme

If N function evaluations are to be performed to provide a resolution ε, then if we were to start
backwards from the solution and move towards the starting region while expanding the region
in each step, as shown in the diagram of Fig. 7, we arrive at the following equations:

L_i = F_{N+2-i} L_N,  i = 1, 2, ..., N    (7)

where L_i is the interval length at the ith iteration, and F_k, k=2,3,...,N+1, are the Fibonacci numbers.
If the first interval is [0,1] then the last equation of (7) can be written as:

1 = L_1 = F_{N+1} L_N

or

L_N = 1/F_{N+1}

Figure 7 The Fibonacci search.

i.e. if ε is given then N can be determined from

F_{N+1} ≥ 1/ε

Similarly, if N is given, then a resolution finer than

ε = 1/F_{N+1}

cannot be expected.
The implementation of the Fibonacci search is quite simple. Two initial function evaluations
are performed at a_2 = L_2 = F_N/F_{N+1} and a_1 = 1 - L_2. Depending on the outcome, the interval [0, a_1]
or [a_2, 1] is eliminated and a new point is located in the remaining interval symmetrically w.r.t.
the interior point. The process is repeated until all N function evaluations have been performed.
The Golden Section search is derived from the Fibonacci search as follows.
Since

L_2 = F_N / F_{N+1}

and

F_{N+1} = F_N + F_{N-1}

then

L_2 = 1 / (1 + F_{N-1}/F_N)    (8)

Suppose L_2 is selected on the premise that a very large number of function evaluations
will be made (even though we may have no intention of performing them). Let the
approximation for L_2 so obtained be L'_2. Therefore;

L'_2 = lim_{N→∞} 1 / (1 + F_{N-1}/F_N)

For large N the ratio F_{N-1}/F_N approaches 0.618, so

L'_2 = 1/1.618 = 0.618

That is, for an interval [0,1] the initial points of search are: a_1 = 1 - 0.618 = 0.382, and a_2 = 0.618,
irrespective of N or ε. The solution algorithm proceeds in exactly the same manner as previously
discussed.
The ratio 0.618 was known to ancient architects and mathematicians as the Golden
Section. It divides a line segment into two parts such that the ratio of the larger to the original
segment equals the ratio of the smaller to the larger segment. For this reason this elimination
technique is referred to as the Golden Section search.
The algorithm for the Golden Section search can now be formalized as follows:
1. Determine two locations w_1 and w_2 that bracket the minimum (w_2 > w_1).
2. Compute L = w_2 - w_1; a_2 = 0.618 L + w_1 and a_1 = w_1 + w_2 - a_2 (refer to Fig. 7).
3. Compute tol = w_2 - w_1.
4. If tol < ε, stop.
5. Compute y_1 = f(a_1) and y_2 = f(a_2).
6. If y_1 < y_2 and a_1 > a_2 then eliminate the region [w_1, a_2], i.e. w_1 = a_2 and a_2 = w_1 + w_2 - a_1.
If y_1 < y_2 and a_1 < a_2 then eliminate the region [a_2, w_2], i.e. w_2 = a_2 and a_2 = w_1 + w_2 - a_1.
If y_1 > y_2 and a_1 > a_2 then eliminate the region [a_1, w_2], i.e. w_2 = a_1 and a_1 = w_2 + w_1 - a_2.
If y_1 > y_2 and a_1 < a_2 then eliminate the region [w_1, a_1], i.e. w_1 = a_1 and a_1 = w_1 + w_2 - a_2.
7. Go to step 3.
----------------------------------------------------------------
Exercise 1.
Develop a C program for the Golden Section search. Test the program on the following
functions:
(i) f(x) = 6.0 - 11x + 6x^2 - x^3. (Answer = 1.42256, for the interval [0,2]).
(ii) f(x) = (100 - x)^2. (Answer = 100).
(iii) f(x) = e^x - 3x^2 - 2e^{-2x}. (Answer = 2.8316).
----------------------------------------------------------------
A scheme that requires fewer function evaluations than the Golden Section method was
developed by Powell and is based on successive quadratic estimation. The quadratic estimation
scheme assumes that within a bounded interval the function can be approximated by a quadratic
function. The minimum of the quadratic function is used as a first estimate for the minimum of
the function. This minimum, along with two more points, is used to obtain a better approximation,
and so on. Eventually the minimum of the quadratic function will approach the actual minimum
within some error bound. The quadratic approximation method can be derived as follows.
Given three consecutive points x_1, x_2, x_3 and their corresponding function values f_1, f_2, f_3,
we seek to determine three constants a_0, a_1 and a_2 such that the quadratic function

q(x) = a_0 + a_1 (x - x_1) + a_2 (x - x_1)(x - x_2)    (9)

agrees with f(x) at these three points.
Since q(x_1) = f_1, we have

a_0 = f_1    (10)

Since q(x_2) = f_2 = a_0 + a_1 (x_2 - x_1), we have

a_1 = (f_2 - f_1)/(x_2 - x_1)    (11)

Finally, at x = x_3, solving for a_2, we obtain

a_2 = [ (f_3 - f_1)/(x_3 - x_1) - (f_2 - f_1)/(x_2 - x_1) ] / (x_3 - x_2)    (12)

To obtain an estimate for the minimum (or maximum) we differentiate q(x) w.r.t. x and
equate the result to zero, i.e.

a_1 + a_2 (2x - x_1 - x_2) = 0    (13)

By solving the above equation for x we obtain the estimate

x' = (x_1 + x_2)/2 - a_1/(2 a_2)    (14)

Note that a minimum is obtained if a_2 > 0.
The Powell algorithm for successive quadratic estimation is provided in [4]. It can, however, lead
to problems if a_2 works out to be zero or if the function simply cannot be approximated by a
quadratic.
A method that combines the Golden Section search and successive quadratic
approximation is due to Brent [5]. Brent's method utilizes the quadratic approximation when
the function is cooperative and switches to the Golden Section method when the function is not.
Brent's method will be utilized in the multivariate routines provided next. We will skip
the derivation of the algorithm and refer the interested reader to the original work of Brent [5].
The program, however, will be provided when we cover the multi-layer neural network.
Bracketing the minimum.
None of the above schemes will work without first bracketing the minimum. A scheme
developed by Swann [6] is my choice. The method is best explained by the following algorithm.
1. Select some initial value x0 and a small step-length dx.
2. Compute
x1=x0+dx
y0=f(x0)
y1=f(x1)
3. If y1>=y0 then set dx=-dx and compute
x1=x0+dx
y1=f(x1)
4. Compute
dx=2.0*dx
x2=x1+dx
y2=f(x2)
5. Repeat the following steps until y2>y1
dx=2.0*dx
x0=x1
y0=y1
x1=x2
y1=y2
x2=x1+dx
y2=f(x2)
6. The bracket is (x0,x2).
Multivariate minimization methods.
There are a number of effective techniques for minimizing a function of several variables.
The approach that is usually taken is determining a direction of search in the multi-dimensional
space. This is followed by a univariate approach to determine the minimum along that direction.
Once a minimum is obtained, a new direction is found and the search begins again. This
continues until convergence is reached. The delta rule described previously relates to a technique
called steepest descent. However, no effort was made to carry out a univariate search along
the steepest descent direction to obtain the minimum. The other difference is that the samples were
provided one at a time, followed by adjustments to the weights. In optimization schemes this is
never done; all the samples have to be provided to the routine in order to calculate an
optimum search direction. Since there is really nothing to prevent us from doing this, we
should follow proper optimization procedures, which are founded on solid mathematical
reasoning.
Since this is not a book on optimization I will not provide you with a gamut of
multivariate optimization schemes, but rather two schemes that I have found to be very effective. The
first method, developed by Fletcher and Reeves, is a conjugate gradient
method. Conjugate gradients are basically directions designed so as to minimize a true quadratic
function in N variables in exactly N steps. The other scheme, which I found useful in a variety
of applications, is the Davidon-Fletcher-Powell method. This was found to be particularly
useful for minimizing objective functions with a large number of variables. You will not need to
understand the derivations of either of these two schemes in order to implement them. However,
if you are curious I suggest you look them up in recent textbooks on optimization (e.g. [2],[7]), or
better yet read the original work [8],[9],[10]. I will provide you next with the two algorithms.
We shall assume that the function of N variables to be minimized is given by f(X), where
X = [x_1, x_2, ..., x_N]^T, and that its gradient vector is G = ∇f(X).
The Fletcher-Reeves Conjugate Gradient method
1. Select starting values for X, and set the iteration counter to
zero i.e., iter=0. Select the maximum number of iterations allowed.
2. Calculate G = ∇f(X).
3. Set S = -G.
4. Compute test = ||G||.
If test < ε convergence is obtained; return X and exit from routine.
5. Set G_old = G.
6. Minimize, using a univariate method, the following function w.r.t. λ:

f(X + λS)

7. Adjust X = X + λS.
8. Calculate G = ∇f(X).
9. Update S using:

S = -G + (G^T G / G_old^T G_old) S

10. If S is not a descent direction, i.e. S^T G > 0, set S = -G, i.e. use steepest descent.
11. Calculate iter = iter + 1.
12. If (iter < maximum number of iterations provided) go to step 4, else return X and exit
from routine.
The Davidon-Fletcher-Powell Method
We shall assume again that the function of N variables to be minimized is given by f(X). The algorithm is as follows:
1. Select starting values for X, and set the iteration counter to zero
i.e., iter=0. Select the maximum number of iterations allowed.
2. Calculate G = ∇f(X).
3. Set G_old = G.
4. Form an N × N identity matrix H, i.e. H = I.
5. Compute test = ||G||.
If test < ε convergence is obtained; return X and exit from routine.
6. Set S = -HG.
7. Minimize, using a univariate method, the following function w.r.t. λ:

f(X + λS)

8. Adjust X = X + λS.
9. Calculate G = ∇f(X).
10. Calculate ΔX = λS and ΔG = G - G_old.
11. Update H using the following equations:

H = H + (ΔX ΔX^T)/(ΔX^T ΔG) - (H ΔG ΔG^T H)/(ΔG^T H ΔG)

12. Calculate G_old = G.
13. If the update is not well defined, i.e. ΔX^T ΔG ≤ 0, set H = I and S = -G, i.e. use steepest descent.
14. Calculate iter = iter + 1.
15. If (iter < maximum number of iterations provided) go to step 5, else return X and exit
from routine.

Figure 8 Enclosing a planar region.
For either of the above two methods, λ can be obtained by the line search technique of
Brent. In order to utilize these schemes the gradients of the objective function have to be
derived. For the perceptron problem these are given by:

∂E/∂w_0 = -Σ_p (d_p - y_p) y_p (1 - y_p) x_0p    (15)

and

∂E/∂w_1 = -Σ_p (d_p - y_p) y_p (1 - y_p) x_1p    (16)

Instead of providing you with a program listing at this stage for just the perceptron, we
will wait until we have developed the equations for the multi-layer perceptron network. I will
then provide you with a general program that can be applied to any number of layers.
A Multi-Layer Perceptron.
A single perceptron with two inputs can be represented graphically by a straight line
separating two classes, Fig. 5.
Obviously to enclose a region in a two-input system,
(2-D system) requires at least 3 lines, Fig.8. This
naturally leads us to the two layer perceptron shown
in Fig.9. This network consists of an input layer, a
hidden layer, and an output layer. It should not be too
difficult to figure out that for a 3-input system (3-D)
at least 4 planes are needed to enclose a region, i.e.
four perceptrons or nodes would be needed in the
hidden layer. In general, for an N-D hyperspace
(N+1) hyperplanes would be needed to enclose a
given region, i.e. N+1 perceptrons or nodes are
needed in the hidden layer. However, a two-layer
perceptron is limited to isolating convex regions. A
convex region is one in which any line joining points on the border of the region goes only
through points within that region. A three-layer perceptron on the other hand can isolate arbitrary
decision regions and can separate meshed classes as shown in Fig.10. Figure 11 depicts a three
layer perceptron network. The interconnections are left out so as not to clutter the diagram.
We will derive the derivatives with respect to the weights for the three layer case. Then
we will generate the equations for the general case, which we will apply also to the two layer
perceptron network. Following this we will provide you with the source file for a general program
that can train any size network. Of course, an example will be provided. The following notations
are used in Figs. 9, 11 and in the following analysis.
Figure 9 A three layer perceptron.
Figure 10 Different types of decision regions.
The weights are labelled as w_ij^R, where:
i = node position in layer R-1;
j = node position in layer R.
The sum of inputs into any node is labelled net_i^R, where
i = node position in layer R+1.
The output from any node in the network is labelled y_i^R, where i = node position in layer
R+1.
From the diagram of Fig. 11 we can write the error function:

E = (1/2) Σ_k Σ_i (d_i(k) - y_i^3(k))^2    (17)

Figure 11 A general three layer network.

To obtain the derivatives of E with respect to the weights we will start with the weights
feeding the output layer. Therefore

∂E/∂w_ij^3 = -Σ_k (d_j - y_j^3) ∂y_j^3/∂w_ij^3    (18)

Note that the k has been dropped inside the sum so as not to over-clutter the equations.
We can write

y_j^3 = f(net_j^3),  net_j^3 = Σ_i w_ij^3 y_i^2 + θ

Therefore

∂E/∂w_ij^3 = -Σ_k (d_j - y_j^3) f'(net_j^3) y_i^2    (19)

By defining:

δ_j^3 = -(d_j - y_j^3) f'(net_j^3)    (20)

we can write:

∂E/∂w_ij^3 = Σ_k δ_j^3 y_i^2    (21)

The derivatives of E with respect to the weights feeding the second hidden layer can be
derived as follows. Since

net_j^2 = Σ_i w_ij^2 y_i^1 + θ  and  y_j^2 = f(net_j^2)

a change in w_ij^2 affects every output node through y_j^2. Applying the chain rule results in

∂E/∂w_ij^2 = Σ_k [ Σ_m δ_m^3 w_jm^3 ] f'(net_j^2) y_i^1

By defining

δ_j^2 = f'(net_j^2) Σ_m δ_m^3 w_jm^3    (22)

we can express

∂E/∂w_ij^2 = Σ_k δ_j^2 y_i^1

By following the previous procedure we can derive the derivatives of E w.r.t. the weights
feeding the first hidden layer. These are given through the equations:

δ_j^1 = f'(net_j^1) Σ_m δ_m^2 w_jm^2
∂E/∂w_ij^1 = Σ_k δ_j^1 x_i    (23)

In general, for a multi-layer network we can deduce from the above equations the
following set.
If the last layer is L then:

δ_j^L = -(d_j - y_j^L) f'(net_j^L)

and, for the hidden layers R = L-1, L-2, ..., 1:

δ_j^R = f'(net_j^R) Σ_m δ_m^{R+1} w_jm^{R+1}

and

∂E/∂w_ij^R = Σ_k δ_j^R y_i^{R-1}

where y_i^0 = x_i, the inputs to the network. Note that if the function f(net) is a sigmoid we can replace:

f'(net) = y (1 - y)
The last set of equations can be used in the development of a C program for "training" a
multi-layer network. Since the derivatives are calculated by first considering the output layer, and
then working backwards to the input layer, this method of computing the derivatives came to be
known as the back propagation method. Back propagation also indicates that the error in the
output propagates back to the input. The following program uses the conjugate gradient method
along with Brent's method for line search to train a neural network with any number of hidden
layers and nodes. The bracketing algorithm, described previously, is also used.
----------------------------------------------------------------
Program 2 "PERNCONJG.C". Training a multi-layer neural network
using the conjugate gradient method.
----------------------------------------------------------------
/*******************************
* Developed by M.A.Sid-Ahmed. *
* ver. 1.0, 1992. *
********************************/
/* Program for training a multi-layer perceptron
using the conjugate gradient method. */
void conj_grad(float (*)(float *), void (*)(float *, float *,
int), float *, int, float, float, int);
float fun(float *);
void dfun(float *, float *, int);
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <conio.h>
#include <time.h>
#include <io.h>
int M, *NL, *NS, L;
int *d;
float *xp, *y, *net, *delta, theta;
void main()
{
float *w, q, xt;
int i, j, N, xd, ind, Nt;
char file_name[14], file_name2[14], ch;
FILE *fptr, *fptr2;
clrscr();
printf("\nDo you wish to use previously trained weights? (y or n)-->");
while(((ch=getch())!='y')&&(ch!='n'));
putch(ch);
switch(ch)
{
case 'y':
printf("\nEnter file name -->");
scanf("%s", file_name);
fptr=fopen(file_name,"r");
if(fptr==NULL)
{
printf("No such file exists.");
exit(1);
}
fscanf(fptr,"%d ",&L);
NL=(int *)malloc(L*sizeof(int));
NS=(int *)malloc((L-2)*sizeof(int));
for(i=0;i<L;i++)
fscanf(fptr,"%d ",&NL[i]);
NS[0]=NL[0]*NL[1];
for(i=1;i<(L-2);i++)
NS[i]=NS[i-1]+NL[i]*NL[i+1];
N=NS[L-3]+NL[L-2]*NL[L-1]; /* Total # of weights. */
/* Assigning memory for weights. */
w=(float *)malloc(N*sizeof(float));
for(i=0;i<N;i++)
fscanf(fptr,"%f ",&w[i]);
fscanf(fptr,"%f ",&theta);
fclose(fptr);
break;
case 'n':
/* Entering number of layers. */
printf("\nEnter number of hidden layers-->");
scanf("%d",&L);
L+=2; /* adding input and output layers. */
NL=(int *)malloc(L*sizeof(int));
NS=(int *)malloc((L-2)*sizeof(int));
printf("Enter number of nodes in input layer-->");
scanf("%d",&NL[0]);
for(i=1;i<=(L-2);i++)
{
printf("Enter number of nodes in hidden layer %d-->",i);
scanf("%d",&NL[i]);
}
printf("Enter number of nodes in output layer-->");
scanf("%d",&NL[L-1]);
NS[0]=NL[0]*NL[1];
for(i=1;i<(L-2);i++)
NS[i]=NS[i-1]+NL[i]*NL[i+1];
N=NS[L-3]+NL[L-2]*NL[L-1]; /* Total # of weights. */
/* Assigning memory for weights. */
w=(float *)malloc(N*sizeof(float));
randomize(); /* Turbo C random-number seeding. */
for(i=0;i<N;i++)
w[i]=(float)random(N)/(float)N;
theta=0.1;
}
Nt=0;
for(i=1;i<L;i++)
Nt+=NL[i]; /* Total number of neurons. */
gotoxy(1,10);
printf("Enter file name for storing trained weights--> ");
scanf("%s",file_name);
ind=access(file_name,0);
while(!ind)
{
gotoxy(1,12);
printf("File exists. Wish to overwrite? (y or n)-->");
while(((ch=getch())!='y')&&(ch!='n'));
putch(ch);
switch(ch)
{
case 'y':
ind=1;
break;
case 'n':
gotoxy(1,7);
printf("                                        ");
gotoxy(1,10);
printf("                                        ");
gotoxy(1,10);
printf("Enter file name -->");
scanf("%s",file_name);
ind=access(file_name,0);
}
}
fptr=fopen(file_name,"w");
/* Assigning memory to *net, *y, *delta. */
net=(float *)malloc(Nt*sizeof(float));
y=(float *)malloc(Nt*sizeof(float));
delta=(float *)malloc(Nt*sizeof(float));
printf("\nEnter file_name containing training data -->");
scanf("%s",file_name2);
fptr2=fopen(file_name2,"r");
if(fptr2==NULL)
{
printf("file %s does not exist.",file_name2);
exit(1);
}
/* Determining the size of the data. */
M=0; ind=1;
while(1)
{
for(i=0;i<NL[0];i++)
{
if((fscanf(fptr2,"%f ",&xt))==EOF) /* input data. */
{ ind=0;
break;
}
}
if(ind==0)
break;
for(i=0;i<NL[L-1];i++) /* desired output. */
fscanf(fptr2,"%d ",&xd);
M++;
}
printf("\n# of data points=%d",M);
rewind(fptr2);
/* Assigning memory to *xp, *d */
xp=(float *)malloc((M*NL[0])*sizeof(float));
d=(int *)malloc((M*NL[L-1])*sizeof(int));
/* Reading in the data. */
for(i=0;i<M;i++)
{
for(j=0;j<NL[0];j++)
fscanf(fptr2,"%f ",&xp[j*M+i]);
for(j=0;j<NL[L-1];j++)
fscanf(fptr2,"%d ",&d[j*M+i]);
}
fclose(fptr2);
/* Call the Fletcher-Reeves conj. grad. algorithm. */
clrscr();
gotoxy(1,1);
printf("Press ESC to exit and save latest update for weights.");
conj_grad(fun,dfun,w,N,1.e-3,1.e-3,10000);
fprintf(fptr,"%d ",L);
for(i=0;i<L;i++)
fprintf(fptr,"%d ",NL[i]);
for(i=0;i<N;i++)
fprintf(fptr,"%f ",w[i]);
fprintf(fptr,"%f ",theta);
fclose(fptr);
q=fun(w);
printf("\nError=%f",q);
printf("\n File name used to store weights is %s",file_name);
printf("\n File name for the training data is %s",file_name2);
}
extern float *net, *w, *delta, *y;
extern int *d;
extern int *NS, *NL;
/* Generating the function. */
float fun(float *w)
{
int i, j, k, m, n, Nt1, Nt2;
float q, error, E;
q=0.0;
for(k=0;k<M;k++)
{
for(i=0;i<NL[1];i++) /* From input layer to first */
{ /* hidden layer. */
net[i]=0.0;
for(j=0;j<NL[0];j++)
net[i]+=w[i+j*NL[1]]*xp[j*M+k];
net[i]+=theta;
E=(float)exp(-(double)net[i]);
y[i]=1.0/(1.0+E);
}
Nt1=NL[1]; Nt2=0;
for(n=2;n<L;n++) /* From layer n-1 to layer n. */
{
for(i=0;i<NL[n];i++)
{
m=Nt1+i;
net[m]=0.0;
for(j=0;j<NL[n-1];j++)
net[m]+=w[NS[n-2]+i+j*NL[n]]*y[j+Nt2];
net[m]+=theta;
E=(float)exp(-(double)net[m]);
y[m]=1.0/(1.0+E);
}
Nt1+=NL[n];
Nt2+=NL[n-1];
}
for(i=0;i<NL[L-1];i++) /* Calculating the error. */
{
error=d[k+i*M]-y[Nt2+i];
q+=error*error;
}
} /* k-loop */
q/=2;
return q;
}
extern float *df, *w, *net;
extern int *NS, *NL;
#define fd(i) y[i]*(1.0-y[i])   /* Define derivative. */
void dfun(float *w, float *df, int N)
{
    int i, j, k, m, n, Nt1, Nt2, Nt3, ii;
    float E, error, sum;
    /* Initialize derivative vector. */
    for (i = 0; i < N; i++)
        df[i] = 0.0;
    /* Start. */
    for (k = 0; k < M; k++)
    {
        /* Forward propagation. */
        for (i = 0; i < NL[1]; i++)   /* From input layer to first */
        {                             /* hidden layer. */
            net[i] = 0.0;
            for (j = 0; j < NL[0]; j++)
                net[i] += w[i+j*NL[1]]*xp[j*M+k];
            net[i] += theta;
            E = (float)exp(-(double)net[i]);
            y[i] = 1.0/(1.0+E);
        }
        Nt1 = NL[1]; Nt2 = 0;
        for (n = 2; n < L; n++)   /* From layer n-1 to layer n. */
        {
            for (i = 0; i < NL[n]; i++)
            {
                m = Nt1+i;
                net[m] = 0.0;
                for (j = 0; j < NL[n-1]; j++)
                    net[m] += w[NS[n-2]+i+j*NL[n]]*y[j+Nt2];
                net[m] += theta;
                E = (float)exp(-(double)net[m]);
                y[m] = 1.0/(1.0+E);
            }
            Nt1 += NL[n];
            Nt2 += NL[n-1];
        }
        Nt1 = 0;
        for (i = 1; i < (L-1); i++)
            Nt1 += NL[i];
        for (i = 0; i < NL[L-1]; i++)   /* delta's for output layer. */
        {
            ii = Nt1+i;
            error = d[k+i*M]-y[ii];
            delta[ii] = -error*fd(ii);
        }
        for (m = 0; m < (L-2); m++)   /* delta's by back propagation. */
        {
            Nt2 = Nt1-NL[L-2-m];
            for (i = 0; i < NL[L-2-m]; i++)
            {
                ii = Nt2+i;
                sum = 0.0;
                for (j = 0; j < NL[L-1-m]; j++)
                    sum += delta[Nt1+j]*w[NS[L-3-m]+j+i*NL[L-1-m]];
                delta[ii] = fd(ii)*sum;
            }
            Nt1 = Nt2;
        }
        for (i = 0; i < NL[1]; i++)
            for (j = 0; j < NL[0]; j++)
                df[i+j*NL[1]] += delta[i]*xp[k+j*M];
        Nt1 = NS[0]; Nt2 = 0;
        Nt3 = NL[1];
        for (m = 1; m < (L-1); m++)
        {
            for (i = 0; i < NL[m+1]; i++)
                for (j = 0; j < NL[m]; j++)
                    df[Nt1+i+j*NL[m+1]] += delta[Nt3+i]*y[Nt2+j];
            Nt1 = NS[m];
            Nt2 += NL[m];
            Nt3 += NL[m+1];
        }
    }   /* k-loop */
}
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <conio.h>
void conj_grad(float (*)(float *), void (*)(float *, float *,
        int), float *, int, float, float, int);
float f(float, float (*)(float *), float *, float *,
        float *, int);
float fun(float *);
void dfun(float *, float *, int);
void bracket(float, float,
        float *, float *, float (*)(float *),
        float *, float *, float *, int);
float Brent(float, float, float (*)(float *), float,
        float *, float *, float *, int);
/* Conjugate gradient method.
   fun:     a subprogram that returns the value of the
            function to be minimized. The arguments are:
            vector of variables, number of variables.
   dfun:    a subprogram that provides the gradients. Arguments:
            variables, gradients, number of variables.
   x[]:     contains the variables. An initial value needs to be
            supplied.
   N:       number of variables.
   eps1:    overall convergence criterion.
   eps2:    line-search convergence criterion.
   no_iter: maximum number of iterations. */
#define ESC 0x1B
float EPS;   /* square root of machine epsilon. */
void conj_grad(float (*fun)(float *), void (*dfun)(float *, float *,
        int), float *x, int N, float eps1, float eps2, int no_iter)
{
    float *df, *dfp, *xt, *S, q, astar, sum, test, sum1, sum2;
    int i, j, iter;
    float a, b, tol1;
    EPS = 1.0;
    do
    {
        EPS /= 2.0;
        tol1 = 1.0+EPS;
    } while (tol1 > 1.0);
    EPS = (float)sqrt((double)EPS);
    df = (float *)malloc(N*sizeof(float));
    dfp = (float *)malloc(N*sizeof(float));
    S = (float *)malloc(N*sizeof(float));
    xt = (float *)malloc(N*sizeof(float));
    dfun(x, df, N);
    for (i = 0; i < N; i++)
        S[i] = df[i];
    gotoxy(1,6);
    q = fun(x);
    printf("Initial value of error function=%f", q);
    iter = 0;
    while (iter < no_iter)
    {
        if (kbhit() != 0)
        {
            if (getch() == ESC)
                break;
        }
        iter++;
        /* Test convergence. */
        test = 0.0;
        for (i = 0; i < N; i++)
            test += (float)fabs((double)df[i]);
        if (test < eps1)
        {
            printf("\nConvergence by gradient test.");
            break;
        }
        /* If df*S<0.0 restart. */
        test = 1.0;
        for (i = 0; i < N; i++)
        {
            if (df[i]*S[i] > 0.0) {
                test = -1.0;
                break;
            }
        }
        if (test < 0.0)
        {
            for (i = 0; i < N; i++)
                S[i] = df[i];
        }
        /* Save previous gradient vector. */
        for (i = 0; i < N; i++)
            dfp[i] = df[i];
        /* Line search. */
        bracket(0.01, 0.001, &a, &b, fun, x, xt, S, N);
        astar = Brent(a, b, fun, eps2, x, xt, S, N);
        /* Adjust variables. */
        for (i = 0; i < N; i++)
            x[i] -= astar*S[i];
        dfun(x, df, N);
        sum1 = sum2 = 0.0;
        for (i = 0; i < N; i++)
        {
            sum1 += dfp[i]*dfp[i];
            sum2 += df[i]*df[i];
        }
        sum = sum2/sum1;
        for (i = 0; i < N; i++)
            S[i] = sum*S[i]+df[i];
        q = fun(x);
        gotoxy(1,7);
        printf("Error function=%f at iteration # %-5d", q, iter);
    }
    printf("\nNumber of iterations = %d \n", iter);
    free(df);
    free(dfp);
    free(S);
    free(xt);
}
/* Function evaluation for line search. */
float f(float alpha, float (*fun)(float *), float *x, float *xt,
        float *S, int N)
{
    int i;
    float q;
    for (i = 0; i < N; i++)
        xt[i] = x[i]-alpha*S[i];
    q = fun(xt);
    return q;
}
/* Function to bracket the minimum of a single-
   variable function. */
void bracket(float ax, float dx,
        float *a, float *b, float (*fun)(float *),
        float *x, float *xt, float *S, int N)
{
    float y1, x1, x0, y0, x2, y2;
    int iter;
    x0 = ax;
    x1 = x0+dx;
    y0 = f(x0, fun, x, xt, S, N);
    y1 = f(x1, fun, x, xt, S, N);
    if (y1 >= y0)
    {
        dx = -dx;
        x1 = x0+dx;
        y1 = f(x1, fun, x, xt, S, N);
    }
    dx = 2.0*dx;
    x2 = x1+dx;
    y2 = f(x2, fun, x, xt, S, N);
    iter = 0;
    while (y2 < y1)
    {
        iter++;
        dx = 2.0*dx;
        x0 = x1;
        y0 = y1;
        x1 = x2;
        y1 = y2;
        x2 = x1+dx;
        y2 = f(x2, fun, x, xt, S, N);
    }
    *a = x0;
    *b = x2;
}
/* Brent's algorithm for obtaining the minimum
   of a single-variable function. */
#define CGOLD 0.381966
float Brent(float ax, float bx, float (*fun)(float *), float TOL,
        float *x, float *xt, float *S, int N)
{
    float a, b, u, v, w, xx, e, fx, fv, fu, fw, xm, tol1, tol2, c, r, q, p;
    int iter;
    a = ax;
    b = bx;
    v = a+CGOLD*(b-a);
    w = v;
    xx = v;
    e = 0.0;
    fx = f(xx, fun, x, xt, S, N);
    fv = fx;
    fw = fx;
    c = 0.0;
    iter = 0;
    while (iter < 100)
    {
        iter++;
        xm = 0.5*(a+b);
        tol1 = EPS*(float)fabs((double)xx)+TOL/3.0;
        tol2 = 2.0*tol1;
        if ((float)fabs((double)(xx-xm)) <= (tol2-0.5*(b-a)))
        {
            return xx;
        }
        if ((float)fabs((double)e) > tol1)
        {
            r = (xx-w)*(fx-fv);
            q = (xx-v)*(fx-fw);
            p = (xx-v)*q-(xx-w)*r;
            q = 2.0*(q-r);
            if (q > 0.0) p = -p;
            q = (float)fabs((double)q);
            r = e;
            e = c;
            /* Is the parabola acceptable? */
            if (((float)fabs((double)p) < (float)fabs((double)(0.5*q*r))) ||
                (p > q*(a-xx)) ||
                (p < q*(b-xx)))
            {   /* Fit parabola. */
                if (q == 0.0) q = 1.e-10;
                c = p/q;
                u = xx+c;
                /* f must not be evaluated too close to a or b. */
                if ((((u-a) < tol2)) || ((b-u) < tol2))
                    c = ((xm-xx) > 0.0) ? tol1 : -tol1;
                goto l2;
            }
            else goto l1;
        }
        else
        {   /* A golden-section step. */
l1:         if (xx >= xm) e = a-xx;
            else e = b-xx;
            c = CGOLD*e;
        }
        /* Update a, b, v, w, and x. */
l2:     if (fabs((double)c) >= tol1) u = xx+c;
        else u = xx+((c > 0.0) ? tol1 : -tol1);
        fu = f(u, fun, x, xt, S, N);
        if (fu <= fx)
        {
            if (u >= xx) a = xx;
            else b = xx;
            v = w;
            fv = fw;
            w = xx;
            fw = fx;
            xx = u;
            fx = fu;
            continue;
        }
        else
        {
            if (u < xx) a = u;
            else b = u;
        }
        if ((fu <= fw) || (w == xx))
        {
            v = w;
            fv = fw;
            w = u;
            fw = fu;
            continue;
        }
        if ((fu <= fv) || (v == xx) || (v == w))
        {
            v = u;
            fv = fu;
        }
    }
    return xx;   /* Best estimate if 100 iterations are exhausted. */
}
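To see the Fletcher-Reeves iteration in isolation, the following self-contained sketch minimizes a two-variable quadratic. It keeps the same update rule as conj_grad above (beta = |grad_new|^2/|grad_old|^2, S = beta*S + grad, step x <- x - alpha*S, restart when S is not a descent direction), but replaces the bracket()/Brent() line search with a crude backtracking search and omits the screen I/O. The names toy_fun, toy_dfun, and cg_sketch are illustrative, not part of the listings.

```c
#include <math.h>

/* Toy objective f(x) = (x0-1)^2 + 10(x1+2)^2, minimum at (1, -2). */
static float toy_fun(float *x)
{
    return (x[0]-1.0f)*(x[0]-1.0f) + 10.0f*(x[1]+2.0f)*(x[1]+2.0f);
}

static void toy_dfun(float *x, float *df, int N)
{
    (void)N;
    df[0] = 2.0f*(x[0]-1.0f);
    df[1] = 20.0f*(x[1]+2.0f);
}

/* Fletcher-Reeves conjugate gradient (N <= 16 here), with a
   backtracking line search standing in for bracket()/Brent(). */
static void cg_sketch(float (*fun)(float *), void (*dfun)(float *, float *, int),
                      float *x, int N, float eps, int max_iter)
{
    float df[16], dfp[16], S[16], xt[16];
    int i, k, iter;
    dfun(x, df, N);
    for (i = 0; i < N; i++) S[i] = df[i];
    for (iter = 0; iter < max_iter; iter++) {
        float test = 0.0f, dot = 0.0f, alpha = 1.0f;
        float q0 = fun(x), sum1 = 0.0f, sum2 = 0.0f, beta;
        for (i = 0; i < N; i++) test += (float)fabs((double)df[i]);
        if (test < eps) break;                /* gradient convergence test */
        for (i = 0; i < N; i++) dot += df[i]*S[i];
        if (dot <= 0.0f)                      /* restart: S must be a descent */
            for (i = 0; i < N; i++) S[i] = df[i];   /* direction for x -= a*S */
        for (k = 0; k < 40; k++) {            /* backtracking line search */
            for (i = 0; i < N; i++) xt[i] = x[i] - alpha*S[i];
            if (fun(xt) < q0) break;
            alpha *= 0.5f;
        }
        if (k == 40) break;                   /* no decreasing step found */
        for (i = 0; i < N; i++) { dfp[i] = df[i]; x[i] -= alpha*S[i]; }
        dfun(x, df, N);
        for (i = 0; i < N; i++) { sum1 += dfp[i]*dfp[i]; sum2 += df[i]*df[i]; }
        beta = sum2/sum1;                     /* Fletcher-Reeves ratio */
        for (i = 0; i < N; i++) S[i] = beta*S[i] + df[i];
    }
}
```

Calling cg_sketch(toy_fun, toy_dfun, x, 2, 1e-3f, 5000) from a starting point such as (0, 0) drives x to the minimizer, just as conj_grad drives the weight vector to a minimum of the error function.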
----------------------------------------------------------------
We now need to develop a program to test that the network actually works. The listing
for such a program is given next.
----------------------------------------------------------------
Program 3 "TESNLYE.C". Testing a multi-layer network.
----------------------------------------------------------------
/********************************
* Developed by M. A. Sid-Ahmed. *
* ver. 1.0, 1992.               *
********************************/
/* Program for testing a multi-layer perceptron. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <conio.h>
void fun(float *);
int M, *NL, *NS, L;
int *d;
float *xp, *y, *net, *delta, theta;
void main()
{
    float *w, q, xt;
    int i, j, N, xd, ind, Nt;
    char file_name[14], file_name2[14], ch;
    FILE *fptr, *fptr2;
    clrscr();
    printf("\nEnter file_name for weights-->");
    scanf("%s", file_name);
    fptr = fopen(file_name, "r");
    if (fptr == NULL)
    {
        printf("file %s does not exist.", file_name);
        exit(1);
    }
    fscanf(fptr, "%d ", &L);
    NL = (int *)malloc(L*sizeof(int));
    NS = (int *)malloc((L-2)*sizeof(int));
    for (i = 0; i < L; i++)
        fscanf(fptr, "%d ", &NL[i]);
    NS[0] = NL[0]*NL[1];
    for (i = 1; i < (L-2); i++)
        NS[i] = NS[i-1]+NL[i]*NL[i+1];
    N = NS[L-3]+NL[L-2]*NL[L-1];   /* Total # of weights. */
    /* Assigning memory for weights. */
    w = (float *)malloc(N*sizeof(float));
    for (i = 0; i < N; i++)
        fscanf(fptr, "%f ", &w[i]);
    fscanf(fptr, "%f ", &theta);
    fclose(fptr);
    Nt = 0;
    for (i = 1; i < L; i++)
        Nt += NL[i];   /* Total number of neurons. */
    /* Assigning memory to *net, *y, *delta. */
    net = (float *)malloc(Nt*sizeof(float));
    y = (float *)malloc(Nt*sizeof(float));
    delta = (float *)malloc(Nt*sizeof(float));
    printf("\nEnter file_name containing training data -->");
    scanf("%s", file_name2);
    fptr2 = fopen(file_name2, "r");
    if (fptr2 == NULL)
    {
        printf("file %s does not exist.", file_name2);
        exit(1);
    }
    /* Determining the size of the data. */
    M = 0; ind = 1;
    while (1)
    {
        for (i = 0; i < NL[0]; i++)
        {
            if ((fscanf(fptr2, "%f ", &xt)) == EOF)   /* input data. */
            {
                ind = 0;
                break;
            }
        }
        if (ind == 0)
            break;
        for (i = 0; i < NL[L-1]; i++)   /* desired output. */
            fscanf(fptr2, "%d ", &xd);
        M++;
    }
    printf("\n# of data points=%d", M);
    rewind(fptr2);
    /* Assigning memory to *xp, *d. */
    xp = (float *)malloc((M*NL[0])*sizeof(float));
    d = (int *)malloc((M*NL[L-1])*sizeof(int));
    /* Reading in the data. */
    for (i = 0; i < M; i++)
    {
        for (j = 0; j < NL[0]; j++)
            fscanf(fptr2, "%f ", &xp[j*M+i]);
        for (j = 0; j < NL[L-1]; j++)
            fscanf(fptr2, "%d ", &d[j*M+i]);
    }
    fclose(fptr2);
    gotoxy(1,7);
    printf("Press any key to see network response for next input.");
    fun(w);
}
extern float *net, *w, *delta, *y;
extern int *d;
extern int *NS, *NL;
/* Generating the function. */
void fun(float *w)
{
    int i, j, k, m, n, Nt1, Nt2;
    float error, E;
    for (k = 0; k < M; k++)
    {
        for (i = 0; i < NL[1]; i++)   /* From input layer to first */
        {                             /* hidden layer. */
            net[i] = 0.0;
            for (j = 0; j < NL[0]; j++)
                net[i] += w[i+j*NL[1]]*xp[j*M+k];
            net[i] += theta;
            E = (float)exp(-(double)net[i]);
            y[i] = 1.0/(1.0+E);
        }
        Nt1 = NL[1]; Nt2 = 0;
        for (n = 2; n < L; n++)   /* From layer n-1 to layer n. */
        {
            for (i = 0; i < NL[n]; i++)
            {
                m = Nt1+i;
                net[m] = 0.0;
                for (j = 0; j < NL[n-1]; j++)
                    net[m] += w[NS[n-2]+i+j*NL[n]]*y[j+Nt2];
                net[m] += theta;
                E = (float)exp(-(double)net[m]);
                y[m] = 1.0/(1.0+E);
            }
            Nt1 += NL[n];
            Nt2 += NL[n-1];
        }
        gotoxy(1,10);
        for (i = 0; i < NL[L-1]; i++)   /* Calculating the error. */
        {
            error = d[k+i*M]-y[Nt2+i];
            printf("response to data # %d", k+1);
            if (NL[L-1] != 1) printf("\noutput # %d", i);
            printf("\ndesired=%d actual=%f error=%f",
                d[k+i*M], y[Nt2+i], error);
        }
        getch();
    }   /* k-loop */
}
----------------------------------------------------------------
Exercise 2
1. Develop a C program that uses the multi-layer network to selectively adjust a specific
color.
2. Redevelop Program PERNCONJG.C to use the Davidon-Fletcher-Powell algorithm
instead. Save the program as PERNDFP.C.
----------------------------------------------------------------
Mathematical Descriptors
Mathematical descriptors can be used in discriminating between different shapes.
Discrimination is carried out by training a neural classifier, such as the one described in this
chapter.
Given a two-dimensional discrete function f(x,y), we define the moment of order (p+q) by
the relation

    m_pq = Σ_x Σ_y x^p y^q f(x,y)                                (24)
The central moments are expressed by

    μ_pq = Σ_x Σ_y (x - x̄)^p (y - ȳ)^q f(x,y)                    (25)

where

    x̄ = m_10/m_00,   ȳ = m_01/m_00
The normalized central moments, denoted by η_pq, are defined as

    η_pq = μ_pq / μ_00^γ

where

    γ = (p+q)/2 + 1   for p+q = 2, 3, ...
For the second- and third-order moments, a set of seven invariant moments can be derived [15].
They are given by:

    φ1 = η_20 + η_02

    φ2 = (η_20 - η_02)² + 4η_11²

    φ3 = (η_30 - 3η_12)² + (3η_21 - η_03)²

    φ4 = (η_30 + η_12)² + (η_21 + η_03)²

    φ5 = (η_30 - 3η_12)(η_30 + η_12)[(η_30 + η_12)² - 3(η_21 + η_03)²]
         + (3η_21 - η_03)(η_21 + η_03)[3(η_30 + η_12)² - (η_21 + η_03)²]

    φ6 = (η_20 - η_02)[(η_30 + η_12)² - (η_21 + η_03)²]
         + 4η_11(η_30 + η_12)(η_21 + η_03)

    φ7 = (3η_21 - η_03)(η_30 + η_12)[(η_30 + η_12)² - 3(η_21 + η_03)²]
         - (η_30 - 3η_12)(η_21 + η_03)[3(η_30 + η_12)² - (η_21 + η_03)²]

This set of moments has been shown to be invariant to translation, rotation, and scale [15].
Statistical texture features
Statistical features are usually calculated from a (2n+1) × (2n+1) window centred
around a pixel p(x, y). This window can be placed by the user at different locations
in the area of the image in which texture/color is to be classified, for data collection. The
collected data is then used to train a neural classifier to recognize a particular
texture/color in images of a similar nature. These features are:
! The color primaries R, G, B of the center pixel, or the non-physical primaries (X, Y, Z),
or just the luminance Y, or the luminance along with two of the normalized color
primaries, or another combination depending on the problem you are trying to solve.
! Mean of the pixel values

    μ_ij = [1/(2n+1)²] Σ_{x=i-n}^{i+n} Σ_{y=j-n}^{j+n} p(x,y)

p(x,y) could be the luminance value or the color primaries, in which case you will have
three means, one for each primary.
! Standard deviation

    σ_ij = sqrt{ [1/(2n+1)²] Σ_{x=i-n}^{i+n} Σ_{y=j-n}^{j+n} [p(x,y) - μ_ij]² }
! Skewness

    S_ij = [1/(2n+1)²] Σ_{x=i-n}^{i+n} Σ_{y=j-n}^{j+n} [(p(x,y) - μ_ij)/σ_ij]³
! Kurtosis

    K_ij = [1/(2n+1)²] Σ_{x=i-n}^{i+n} Σ_{y=j-n}^{j+n} [(p(x,y) - μ_ij)/σ_ij]⁴ - 3
! Entropy

    h_ij = -[1/(2 log(2n+1))] Σ_{k=0}^{L-1} h(k) log(h(k))

where h(k) is the histogram of the pixels in the (2n+1) × (2n+1) window.
Application Ideas
1. Shape recognition of manufactured parts (invariant moments + neural network).
2. OCR applications (invariant moments + neural network)
3. Face recognition (invariant moments + texture/color + neural network).
4. Thresholding or binarization of gray-scale or color images to isolate the foreground, such
as text, from a background that may carry watermarks, as in passports, personal IDs,
checks, etc.
5. Color recognition.
6. Recognition of items through color/texture.
7. Color correction.
...etc.
References.
1. D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Parallel Distributed Processing,
Vol. 1, MIT Press (1988).
2. G. V. Reklaitis, A. Ravindran, and K. M. Ragsdell, Engineering Optimization: Methods and
Applications, John Wiley and Sons (1983).
3. J. Kiefer, "Optimum sequential search and approximation methods under minimum regularity
assumptions," J. Soc. Ind. Appl. Math., 5(3), pp. 105-125 (1957).
4. M. J. D. Powell, "An efficient method for finding the minimum of a function of several variables
without calculating derivatives," Computer J., vol. 7, pp. 155-162 (1964).
5. R. P. Brent, Algorithms for Minimization Without Derivatives, Prentice-Hall, Englewood Cliffs,
NJ (1973).
6. W. H. Swann, "Report on the development of a direct search method of optimization," ICI Ltd.,
Central Instr. Res. Lab. Res. Note 64/3, London (1964).
7. R. L. Zahradnik, Theory and Techniques of Optimization for Practicing Engineers, Barnes and
Noble, New York (1971).
8. R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients," Computer J.,
vol. 7, pp. 149-154 (1964).
9. W. C. Davidon, "Variable metric method for minimization," AEC Res. Develop. Rep. ANL-5990
(1959).
10. R. Fletcher and M. J. D. Powell, "A rapidly convergent descent method for minimization,"
Computer J., vol. 6, p. 163 (1963).
11. A. Abou-Nasr and M. A. Sid-Ahmed, "LBAQ: A pattern recognition neural network that learns
by asking questions," Proceedings of the 35th Midwest Symposium, Washington (1992).
12. J. A. Hartigan, Clustering Algorithms, John Wiley and Sons, New York (1975).
13. G. A. Carpenter and S. Grossberg, "Neural dynamics of category learning and recognition:
attention, memory consolidation, and amnesia," in J. Davis, R. Newburgh, and E. Wegman (Eds.),
Brain Structure, Learning, and Memory, AAAS Symposium Series (1986).
14. John Gosch, "Color-recognition system distinguishes more colors, more accurately and faster,"
Electronic Design, vol. 39, no. 22, pp. 30-32, November 21, 1991.
15. M. K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Info. Theory,
vol. IT-8, pp. 179-187 (1962).