
Back-Propagation Algorithm of CHBPN code

A.K. Mahendra Machine Dynamics Division, B.A.R.C.

Introduction
Back-propagation (BP) is a technique for modeling complex nonlinear functions, even when the equation describing the function is unknown. BP is useful for tasks that require classification, pattern recognition, or mapping continuous input values to continuous output values. Due to its versatility and relative ease of use, BP is currently the most popular neural network algorithm. Some applications that have used BP are classification, process control, signal processing, medical diagnosis, and financial forecasting. This page discusses the following topics:

* Back Propagation Network
* BP Algorithm Overview
* When to Use BP

Back Propagation Network


A neuron is the network component that processes the data fed to it, and a back-propagation network is an interconnected group of neurons arranged in layers. The network topology consists of at least three layers, as shown in Figure 1: the bottom or input layer, the middle or hidden layer, and the top or output layer. In practice, the number of hidden layers may be more than one. Since the input layer is merely a passive conduit for the data, its elements perform no computational work; these are called non-processing neurons. In Figure 1, the input neurons (I1, I2, I3) are shown as squares, while the

Figure 1. Back Propagation Network


processing neurons are indicated as circles. Artificial neural network models are made from a large number of interconnected neurons, similar to the large numbers of interconnected neurons present in biological brains (cf. the human brain has the most complicated network, of around 10^13 neurons).

As Figure 2 suggests, a processing neuron has many inputs but only one output, which is weighted before being passed to the next layer. Collectively, the set of inputs is the neuron's input vector, labeled I1 to In. Each input component has a corresponding adjustable value called a weight, and the set of such weights associated with a neuron is called its weight vector, labeled W1 to Wn. Each neuron also has an additional weighted input, called the threshold input. Unlike the other inputs, the value applied to the threshold input is fixed at unity, so that the threshold acts as a reference level or bias. Further, the number of neurons in the input layer equals the number of input variables involved.

[Figure: a single processing neuron with inputs I1 ... In, weights W1 ... Wn, a threshold input, and output O]

Figure 2. A single neuron with n inputs

After an input vector is broadcast to a neuron, the neuron multiplies the input components by their weights, sums the products, then adds the threshold weight to the sum. The effective input (Sj) is therefore the dot (inner) product of the weight vector and the input vector, plus the threshold weight. This sum serves as the input to a function that produces the neuron's result. In a back-propagation neuron, this function--called the transfer function--is typically an "S"-shaped curve. Figure 3 shows a sigmoidal transfer function. After passing through the transfer function, the neuron's result (Oj) is then sent to other neurons or to a file. As Figure 3 shows, a sigmoidal transfer function is monotonic and semi-linear, meaning that it always increases and that it is roughly linear in the center but nonlinear at both extremes. The transfer function therefore acts like a broadly tuned center-pass filter, moderating raw sums (Sj) with very small or very large values but passing the midrange neutrally. The sigmoid approaches '0' and '1' asymptotically, so it guarantees that all outputs fall between '0' and '1'.
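The computation just described can be sketched in a few lines of Python (an illustration only; the function and variable names are ours, not part of CHBPN):

```python
import math

def neuron_output(inputs, weights, threshold_weight):
    # Effective input S = weighted sum of the inputs plus the threshold weight
    # (the threshold input itself is fixed at unity).
    s = sum(w * x for w, x in zip(weights, inputs)) + threshold_weight * 1.0
    # The logistic sigmoid transfer function maps S into (0, 1).
    return 1.0 / (1.0 + math.exp(-s))

# Example: a neuron with three inputs and arbitrary weights
print(neuron_output([0.5, -1.0, 0.25], [0.8, 0.2, -0.5], 0.1))
```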

[Figure: logistic sigmoid curve; INPUT from -10.0 to 10.0 on the horizontal axis, OUTPUT from 0.0 to 1.0 on the vertical axis]

Figure 3. Sigmoidal Transfer Function


It is possible to construct a BP network using any one of several transfer functions. CHBPN software uses a logistic sigmoidal transfer function, which has the following form:

Output, Oj = 1/{1 + exp(-Sj)}        (1)

where Sj is the weighted sum of the inputs of neuron j. The benefit of using a logistic transfer function is that its derivative is easy to calculate, as required later in Equations (8) to (11).

The hidden layer, which has no direct connection to input or output, is the first computational layer in the network. As the figure suggests, each hidden-layer neuron is connected to all of the input nodes. Each hidden neuron calculates the weighted sum of its inputs (Sj), applies the transfer function to the sum to generate a result (Oj), then passes the result to the output layer. The number of neurons in the hidden layer influences the network's behavior, often significantly. Networks with too many hidden neurons tend to memorize the training data; those with too few cannot learn the problem.

The output layer generates the network's output. Each output-layer neuron is connected to all of the hidden neurons. Like the hidden neurons, each output neuron calculates the weighted sum of its inputs and applies the transfer function to produce a result. The output layer then transmits all of the individual results as the network's output vector. The number of neurons in the output layer usually follows from the number of categories you want the network to recognize or from the function you want it to emulate.

A BP network always passes data forward through the hidden layer to the output layer. Applying a vector to the input side produces a corresponding vector on the output side. The network thus acts as a function, mapping input patterns to output patterns. The network learns to associate specific input patterns with specific output patterns by adjusting its weights during training, described in the next section.
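The layered forward pass described above can be sketched as follows (illustrative only; the layer sizes, weights, and names here are arbitrary assumptions, not values from CHBPN):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weights, thresholds):
    # Each row of `weights` feeds one neuron; `thresholds` are the bias
    # weights applied to the fixed unity input.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + t)
            for row, t in zip(weights, thresholds)]

def forward(inputs, w_kh, t_h, w_hi, t_i):
    hidden = layer_forward(inputs, w_kh, t_h)   # input layer -> hidden layer
    return layer_forward(hidden, w_hi, t_i)     # hidden layer -> output layer

# Three inputs, two hidden neurons, one output neuron
out = forward([0.2, 0.7, 0.1],
              [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6]], [0.0, 0.1],
              [[0.7, -0.2]], [0.05])
print(out)  # a single output value in (0, 1)
```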

BP Algorithm Overview
A BP network learns by example, repeatedly processing a training file that contains a series of input vectors and the correct (or target) output vector for each. Each pass through the training file is one epoch. During each epoch, the network compares the target result with the actual result, calculates the error, and modifies the network's weights to minimize the error. Through this process--called supervised training--the network learns to associate input patterns with the correct output patterns. The accumulated change in the weights represents what the network learned, and saving the trained weights preserves the learned solution.

This section provides a brief overview of the calculations BP performs during training. As an example, the steps involved for a 3-layer network using the generalized delta rule with a momentum factor can be summarized as follows. Assign initial values to the primary weight matrices Wkh (input to hidden layer) and Whi (hidden to output layer). Then calculate the output vector for each pattern of the input vector (Inp)k, k = 1..km:
Ih = Σ(k=1 to km) Wkh (Inp)k        (2)

Oh = 1/{1 + exp(-Ih)}        (3)

Ii = Σ(h=1 to hm) Whi Oh        (4)

Oi = 1/{1 + exp(-Ii)}        (5)

where subscript h denotes a hidden neuron and hm is the total number of hidden neurons. Next, calculate the error for the output layer (if the output so obtained does not match the target coefficient vector Ai, i = 1..n):

δi = Oi - Ai        (6)

Calculate the sum of the squares of the errors over all n output neurons, and accumulate the total error E over all T patterns:

E = Σ(t=1 to T) Σ(i=1 to n) (δi)^2        (7)
After calculating the error for a neuron, the algorithm adjusts the neuron's weights using the Least Mean Squared (LMS) learning rule, also called the Delta rule. The following equation updates the weights between the output and hidden layers, i.e. Wih (the gradient step carries a minus sign because the error is defined as δi = Oi - Ai):

Wih^new = Wih^old + [ -η δi (∂Oi/∂Ii) Oh + α ΔWih^old ]        (8)

where α ∈ (0,1) is called the momentum factor and η ∈ (0,1) is called the learning rate. As Equation (8) suggests, the vector term added to the current weight vector to adjust it during training is often called the delta vector. This learning rule is designed to adjust the network's weights to find the least mean squared error for the network as a whole. This minimization has an intuitive geometrical meaning. It can be shown that the mean squared error is a quadratic function of the weight vector. As a result, plotting the mean squared error against the weight vector components produces a paraboloidal (bowl-shaped) surface. Figure 4 shows an idealized error surface, assuming two-dimensional weight vectors for simplicity.

[Figure: bowl-shaped error surface plotted over axes Weight x and Weight y; the delta vector points from the old weight vector to the new weight vector, descending toward the ideal weight vector at the minimum of the aggregate error]

Figure 4. Gradient Descent


As Figure 4 suggests, the geometrical effect of the LMS learning rule is to move the weight vectors toward values that produce the minimum mean squared error, represented by the bottom point of the bowl-shaped error surface. In reality, the error surface typically has complex ravine-like features and many local minima. The delta vector follows the locally steepest path, a little like a ball rolling downhill, so this process is sometimes called gradient descent. So far, this discussion ignores the network's layered architecture. A typical network has at least two levels of weighted connections, one for the hidden neurons and the other for the output neurons. At first glance, it is difficult to determine how the hidden neurons contribute to the output-layer error because their target output is unknown. The key to the BP algorithm lies in assigning credit for the actual results back to the hidden layer as required.
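The descent down the error surface can be illustrated on a toy quadratic bowl (a sketch of the geometry only, not the CHBPN update rule itself; the function E and the starting point are made up for illustration):

```python
# Minimize E(w) = (wx - 1)^2 + (wy + 2)^2 by stepping along the
# negative gradient -- each step is the "delta vector" of Figure 4.
def grad(w):
    return (2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0))

w = [5.0, 5.0]   # old weight vector
lr = 0.1         # learning rate
for _ in range(200):
    g = grad(w)
    w = [w[0] - lr * g[0], w[1] - lr * g[1]]

print(w)  # converges toward the ideal weight vector (1, -2)
```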

First, finding the error for the output-layer neurons is straightforward. As shown in Equation (6), the error is simply the difference between the actual result and the target result. To find the error in the hidden layer, the error value for each output-layer neuron is sent back to the hidden layer using the same weighted connections. This backward propagation of errors gives the algorithm its name. Because the error is transmitted backward using the original weighted connections, BP in effect assumes that a hidden neuron contributes to the output-layer error in proportion to the weighted sum of the back-propagated error values. This sum is calculated using the hidden-to-output weights before the output layer updates them. The error for the hidden layer is:
δh = (∂Oh/∂Ih) Σ(i=1 to n) Wih δi        (9)

Update the weights between the hidden and input layers, i.e. Whk:

Whk^new = Whk^old + [ -η δh (Inp)k + α ΔWhk^old ]        (10)

It may be noted that for a sigmoid function the derivative is ∂Olayer/∂Ilayer = Olayer (1 - Olayer). The derivative term serves to moderate the sum using the value of the hidden neuron's result. This moderation is necessary partly because a strong connection sometimes transmits a weak result. If a hidden neuron's result approaches zero, for instance, then that neuron probably did not contribute much to the output error, even if the neuron is strongly weighted. Allowing for the hidden-layer output when computing the hidden-layer error reduces the risk of blaming a neuron unfairly. The derivative term also contributes to the network's stability. If there is more than one hidden layer, the equation for the weight change is given by
Wh(L)h(L-1)^new = Wh(L)h(L-1)^old
    + [ -η (∂Oh(L)/∂Ih(L)) Oh(L-1) Σ(h(L+1)=1 to hm(L+1)) Wh(L+1)h(L)^new δh(L+1) + α ΔWh(L)h(L-1)^old ]        (11)

where subscript L is the layer number, and L+1 is the layer nearest to the output layer. Go on to the next pattern of the input vector (Inp)k, k = 1..km, and repeat the process for all T patterns. As indicated earlier, sets of such patterns can be created from data generated either experimentally or by numerical simulation. Training ends when an error criterion based on the total error E is met. After training is over (i.e., adaptation of the weight matrices is frozen), the artificial neural network is validated using fresh experimental or simulated input vectors (Inp)k, k = 1..km. The weight matrices of the validated neural network are preserved for use in the final application.
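A single application of the update rule (Equations (8) and (10) share the same shape) can be sketched as follows; the function and argument names are our illustration, and the sign convention follows δi = Oi - Ai:

```python
def delta_update(w_old, prev_change, delta, deriv, activity, lr=0.5, momentum=0.9):
    # One LMS / delta-rule step with momentum:
    #   delta    - error term of the receiving neuron
    #   deriv    - dO/dI = O * (1 - O) for the logistic sigmoid
    #   activity - output of the sending neuron (or input component)
    change = -lr * delta * deriv * activity + momentum * prev_change
    return w_old + change, change

# Example: one output-layer weight, no previous change yet
w_new, dw = delta_update(w_old=0.5, prev_change=0.0,
                         delta=0.2, deriv=0.6 * (1 - 0.6), activity=0.8)
```

On the next step, `dw` would be passed back in as `prev_change`, which is how the momentum term smooths successive weight changes.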

Because the transfer function is an "S"-shaped curve, its derivative is a bell-shaped curve: large in the middle range and small toward both extremes. This shape ensures that large weight changes do not occur when the sum of the back-propagated errors is very large or very small. Each hidden neuron calculates its own error, then adjusts the input-to-hidden layer connection weights using the LMS learning rule described in Equation (10). The network is then ready for the next input pattern. To summarize, the algorithm follows this sequence of steps during training:

1. Receives an input pattern at the unweighted input layer.
2. Calculates the hidden-layer weighted sums and applies the transfer function to the sums, producing the hidden-layer result.
3. Transmits the hidden-layer result to the output layer.
4. Calculates the output-layer weighted sums and applies the transfer function to the sums, producing the output-layer result.
5. Compares the actual output-layer result with the target result, calculating an output-layer error for each neuron.
6. Transmits the output-layer errors to the hidden layer (back-propagation).
7. Calculates the hidden-layer error using the weighted sum of the back-propagated error vector, moderated by the hidden-layer result.
8. Updates the output-layer and hidden-layer weights using the LMS learning rule.
9. Receives the next pattern.
For example, a typical pseudo-code description is as follows:

set maximum acceptable error   /* user-specified; often that the worst-case element
                                  in any pattern is within 10% of the desired value */
repeat {
    total error = 0
    for each pattern in the training set do {
        /* forward activity flow */
        get next pattern
        for each NEURON in the middle layer do {
            compute net input = weighted sum of input pattern elements
            apply transfer function f(I) = 1/(1 + exp(-I))
            save output   /* needed for the backward error pass and the
                             derivative computation */
        }   /* end for each middle-layer NEURON */
        for each NEURON in the output layer do {
            compute net input = weighted sum of middle-layer output elements
            apply transfer function f(I) = 1/(1 + exp(-I))
            display output
        }   /* end for each output-layer NEURON */
        /* backward error pass; omit this pass if not actively training the
           network. Note: this is not the most efficient procedure, but it
           clarifies the order of the steps. */
        for each NEURON in the output layer do {
            /* compute the error for each output-layer NEURON */
            compute error = func(desired output - actual output)
            total error = total error + error
            /* func can be simple LMS or ABS or any other error function */
        }   /* end for each output-layer NEURON */
        for each NEURON in the middle layer do {
            /* back-propagate the output-layer NEURONs' errors */
            compute incoming error = weighted sum of the output-layer errors
            compute final error = incoming error * output * (1 - output)
            /* because of the choice of transfer function, df/dI = f(I)*(1 - f(I)),
               where f(I) is this NEURON's output saved in the forward pass */
        }   /* end for each middle-layer NEURON */
        for each NEURON in the output layer do {
            /* adjust the weights between the middle layer and the output layer */
            for each weight from a middle-layer NEURON do {
                /* these are the incoming weights to the output layer */
                compute weight change = lr * E * I
                /* I is the incoming activity along this connection;
                   E is this NEURON's error; lr is the learning constant */
                update weight
            }   /* end for each weight */
        }   /* end for each output-layer NEURON */
        for each NEURON in the middle layer do {
            /* adjust the weights between the input layer and the middle layer */
            for each weight from an input-layer NEURON do {
                /* these are the incoming weights to the middle layer */
                compute weight change = lr * E * I
                update weight
            }   /* end for each weight */
        }   /* end for each middle-layer NEURON */
    }   /* end for each pattern in the training set */
} until (total error < maximum acceptable error)

During training, you typically test the network to evaluate its ability to process data it has not seen before. Testing involves presenting a file that, like the training input file, contains known input and target output patterns. During a test, the network passes a series of input vectors through the network to generate a series of output vectors. Comparing the actual and target results lets you measure the network's accuracy and decide whether to continue training, possibly after adjusting a constant or otherwise changing the network. No learning occurs during testing. After training and testing end, you typically validate the network with fresh data to confirm that it behaves as expected. Validation is recommended because testing influences training, for instance by ending it. After testing and validating the network, you save its weights to preserve what it learned during training. You then load the trained weights before running the network as an application. When running the trained network, the network processes a file that contains "real" data. Unlike the training, testing, and validation input files, the application input file does not supply a target result. Instead, it supplies only input vectors. The network reads a series of vectors from the file, passes them from layer to layer, then writes the corresponding series of output vectors to a file. The output vectors might represent categories or some other response, depending on the task the network has been trained to perform.
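The pseudo-code above can be condensed into a minimal runnable sketch (plain Python, one hidden layer, learning rate only, no momentum term; the XOR data set and all names are our illustration, not part of CHBPN):

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, w_h, w_o):
    # The last weight in each row is the threshold (bias) weight.
    hidden = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
              for row in w_h]
    output = [sigmoid(sum(w * v for w, v in zip(row[:-1], hidden)) + row[-1])
              for row in w_o]
    return hidden, output

def train(patterns, n_hidden=3, lr=0.5, epochs=5000, seed=0):
    rnd = random.Random(seed)
    n_in = len(patterns[0][0])
    w_h = [[rnd.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[rnd.uniform(-1, 1) for _ in range(n_hidden + 1)]]
    for _ in range(epochs):
        for x, target in patterns:
            hidden, output = forward(x, w_h, w_o)
            # Output-layer error terms: delta = (O - A) * O * (1 - O)
            d_o = [(o - a) * o * (1 - o) for o, a in zip(output, target)]
            # Back-propagate through the old hidden-to-output weights
            d_h = [h * (1 - h) * sum(d * w_o[i][j] for i, d in enumerate(d_o))
                   for j, h in enumerate(hidden)]
            # Gradient-descent weight updates (output layer, then hidden layer)
            for i, d in enumerate(d_o):
                for j, h in enumerate(hidden):
                    w_o[i][j] -= lr * d * h
                w_o[i][-1] -= lr * d
            for j, d in enumerate(d_h):
                for k, v in enumerate(x):
                    w_h[j][k] -= lr * d * v
                w_h[j][-1] -= lr * d
    return w_h, w_o

xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
w_h, w_o = train(xor)
for x, t in xor:
    print(x, round(forward(x, w_h, w_o)[1][0], 3))
```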

When to Use BP
You can use BP for a broad range of applications that directly or indirectly depend on modeling a function. The underlying function can be simple or complex, linear or nonlinear, and continuous or discontinuous. Often, you use BP when the equation describing the function is unknown and traditional methods do not provide an adequate solution. The following applications take advantage of the algorithm's properties:

* Pattern classification
* Process control
* Signal processing

* Medical diagnosis
* Noise filtering
* Optical character recognition
* Converting speech or text to phonemes
* Encoding and compressing data
* Financial forecasting

Many other applications that require function mapping are also possible. Because BP is frequently used as a classifier, contrasting it with another classifier such as Learning Vector Quantization (LVQ) may suggest a basis of choice for applications that could use either algorithm. The differences include the following:

* Mathematically, BP minimizes the sum-squared error, while LVQ minimizes the number of misclassifications. (In fact, LVQ often increases the sum-squared error by moving its initial decision boundaries away from the "centers" of the categories to improve classification accuracy.)
* BP often trains more slowly than LVQ because it is computationally more complex. On the other hand, trained BP networks that perform the same task are often smaller and may be faster.
* BP learns from all input vectors, no matter how far they fall from the center of the class. LVQ, in contrast, can ignore outliers during training. And, since you can change the size of the check window, LVQ lets you tune the degree of rejection.
* BP supplies a vector as its result, while LVQ supplies a token called a class ID. Because the vector components are variable values, BP sometimes interpolates "new" results in between the original target output vectors. LVQ, on the other hand, draws its class IDs from a fixed population and never interpolates in this sense.

Which differences are benefits depends on the application at hand. For applications that can use either algorithm, usually the best way to decide between them is to try both. BP has several drawbacks that may affect its suitability for some applications. First, BP is not guaranteed to solve the problem, because the LMS algorithm cannot always find an acceptable local minimum.
Second, BP requires training data labeled with a target result for each training pattern, but labeled data is sometimes difficult or impossible to obtain. Finally, BP is computationally complex, requiring many passes through the training data. As a result, BP using conventional computers has been for the most part limited to small problems that can be solved off-line.
