
Artificial Neural Networks
Human Brain

 The brain is the main organ of the central nervous system. It is responsible for all
bodily activities such as metabolism, thought processes, reasoning etc.
 The human brain has four prominent lobes – the frontal lobe, the parietal lobe, the
occipital lobe and the temporal lobe. “Lobes” are prominent bulge shapes on the
brain surface.
 The main parts of the brain are the cerebrum (big brain), cerebellum (small brain)
and the medulla oblongata.
 There are crests and troughs all over the brain surface. The crests or rises are called
gyri (singular gyrus) and the troughs or falls are called sulci (singular sulcus).
 Other important parts of the human brain are: the hippocampus, the amygdala, the
thalamus, the hypothalamus, the pituitary gland and the pons. There are several deep brain
structures too which are responsible for signal routing, such as the basal ganglia.
 The main functional unit of the brain is the neuron. Each neuron has a cell body
and fibers called dendrons; the smaller fibers that branch off a dendron are called
dendrites.
 Neurons are interconnected and occur over the entire brain – over the outer cortex
as well as the inner regions. On average a single neuron is surrounded by about 10,000
neighboring neurons. It is through this network of neurons that signals pass in and
out of the brain. The male human brain has about 100 billion neurons on average; the
female human brain has a few hundred million fewer.
 There are supportive cells of various types in the brain – all of which are
collectively termed glial cells. These cells outnumber the neurons by about four to one.
 The neuron is set to fire (or get activated) when the sum total of all input voltages
from the neighboring neurons exceeds its firing potential (action potential).
 The central nervous system consists of a network of nerves which carry signals from
the receptor cells to the brain (called sensory nerves) and nerves which carry signals
from the brain to other parts of the body (called motor nerves). Sensory nerves
typically originate at a muscle or a sense organ (eye, ear, nose, tongue, skin).
 The arrangement of the brain is “contra-lateral” – this means that the right half of
the brain controls the left half of the body and the left half of the brain controls the
right half.
 There are sub-regions on the brain’s cortex (outer surface) called “cortical areas”;
the neurons in each cortical area perform a specific function. For example, the visual
centers (or visual cortical areas) are located on the occipital lobe, the auditory
centers on the temporal lobe, and so on. Scientists have identified at least 44
cortical areas and charted their specific body functions.
 The signals from the sense organs reach the corresponding cortical region and are then
routed to other cortical regions before reaching the motor nerves.
Artificial Neural Networks (ANN)

 An Artificial Neural Network or ANN is a graph with nodes and interconnections, aping the way
the biological neurons are connected in the brain. The goal of the ANN is to mimic the learning
and reproducibility functions of the biological human brain.
 ANNs are trained using certain algorithms called “training algorithms”. Just as the
biological neuron receives voltage inputs from neighboring neurons, the ANN has “weights”
taking the role of voltages. Weights are real numbers which are updated as the algorithm works
on the graph until the learning is complete.
 There are two types of learning methods – supervised and unsupervised.
 Learning versus Training: Learning refers to the fundamental weight update rule (weight update
step), whereas “training” refers to the full sequence of steps of weight updating plus other
calculations over the full network. So a training algorithm contains one or more learning steps
within it.
 McCulloch and Pitts first proposed a model based upon the biological neuron.
Their model has just a threshold function as the activation function.

The McCulloch-Pitts model of the neuron is shown in Figure 2.3a. The inputs xi, for i = 1, 2, . .
. , n, are 0 or 1, depending on the absence or presence of the input impulse at instant k.

 The neuron's output signal is denoted as o. The firing rule for this model is defined
as follows:

o^(k+1) = 1 if sum_{i=1..n} w_i * x_i^k >= T, and o^(k+1) = 0 otherwise,

where the superscript k = 0, 1, 2, . . . denotes the discrete-time instant, and wi is the multiplicative weight
connecting the i'th input with the neuron's membrane. In further discussion, we will assume that a unity
delay elapses between the instants k and k + 1.
 Note that wi = + 1 for excitatory synapses, wi = - 1 for inhibitory synapses for this model, and T is the
neuron's threshold value, which needs to be exceeded by the weighted sum of signals for the neuron to fire.
Although this neuron model is very simplistic, it has substantial computing potential. It can perform the
basic logic operations NOT, OR, and AND, provided its weights and thresholds are appropriately selected.
As we know, any multivariable combinational function can be implemented using either the NOT
and OR, or alternatively the NOT and AND, Boolean operations.
 Examples of three-input NOR and NAND gates using the McCulloch-Pitts neuron model are shown in Figure
2.3(b) and (c). The reader can easily inspect the implemented functions by compiling a truth table for each
of the logic gates shown in the figure.
First note that a single neuron with a single input x, and with weight and threshold values both of
unity, computes o^(k+1) = x^k.
Such a simple network thus behaves as a single register cell, able to retain the input for the one
period elapsing between two instants. As a consequence, once a feedback loop is closed around the
neuron as shown in Figure 2.3(d), we obtain a memory cell.
An excitatory input of 1 initializes firing in this memory cell, and an inhibitory input of 1 initializes a
non-firing state. In the absence of inputs, the output value is then sustained indefinitely.
 This is because an output of 0 fed back to the input does not cause firing at the next instant, while an
output of 1 does.
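Since the figure's weight and threshold choices are not reproduced here, the following sketch assumes one valid selection for the three-input NOR and NAND gates (inhibitory weights of -1, with thresholds 0 and -2 respectively); the function names are illustrative:

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fires (outputs 1) iff the weighted
    sum of the binary inputs reaches the threshold T, else 0."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

# Three-input NOR: inhibitory weights (-1), threshold 0 --
# the neuron fires only when every input is 0.
def nor3(x1, x2, x3):
    return mp_neuron([x1, x2, x3], [-1, -1, -1], 0)

# Three-input NAND: inhibitory weights (-1), threshold -2 --
# the neuron fails to fire only when all three inputs are 1.
def nand3(x1, x2, x3):
    return mp_neuron([x1, x2, x3], [-1, -1, -1], -2)
```

Closing a feedback loop around a single neuron with unity weight and threshold, as in Figure 2.3(d), then yields the memory cell described above.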
Supervised and Unsupervised Learning

 In unsupervised learning the weight update is independent of the targets corresponding to the input
data vectors. A good example of unsupervised learning is Hebbian learning.

 Supervised learning requires the target data corresponding to the given inputs. The target data
plays a role in computing the learning step (delta_w) that each pattern (input–output pair)
has caused. This is unlike the unsupervised learning method.
Single-layer perceptron

The simplest kind of neural network is a single-layer perceptron network, which consists of a single
layer of output nodes; the inputs are fed directly to the outputs via a series of weights.

In this way it can be considered the simplest kind of feed-forward network.

The sum of the products of the weights and the inputs is calculated in each node, and if the value is
above some threshold (typically 0) the neuron fires and takes the activated value (typically 1);
otherwise it takes the deactivated value (typically -1).

Neurons with this kind of activation function are also called artificial neurons or linear threshold
units.

In the literature the term perceptron often refers to networks consisting of just one of these units. A
similar neuron was described by Warren McCulloch and Walter Pitts in the 1940s.
Figure.Single-layer perceptron
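A minimal sketch of this firing rule in code (the threshold of 0 and the +1/-1 output values follow the description above; the function name is illustrative):

```python
def perceptron_output(x, w):
    """Linear threshold unit: computes the sum of the products of
    weights and inputs, fires +1 if the sum exceeds the threshold
    (here 0), and takes the deactivated value -1 otherwise."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > 0 else -1
```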
Multi-layer perceptron

This class of networks consists of multiple layers of computational units, usually interconnected in a feed-
forward way.
Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many
applications the units of these networks apply a sigmoid function as an activation function.

The universal approximation theorem for neural networks states that every continuous function that maps
intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a
multi-layer perceptron with just one hidden layer.
This result holds for a wide range of activation functions, e.g. for the sigmoidal functions.

Multi-layer networks use a variety of learning techniques, the most popular being back-propagation.
Here, the output values are compared with the correct answer to compute the value of some predefined
error-function.
By various techniques, the error is then fed back through the network.

Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of
the error function by some small amount.
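The compare-and-adjust cycle just described can be sketched for the simplest case: a single linear unit trained by gradient descent on a squared-error function. The learning rate, epoch count and toy data below are illustrative assumptions; back-propagation through multiple layers is covered later.

```python
def train_linear_unit(data, eta=0.1, epochs=100):
    """Gradient descent on E = 0.5*(d - o)^2 for a single linear
    unit o = w.x: after each pattern the weights move a small step
    against the error gradient, reducing the error function."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, d in data:
            o = sum(wi * xi for wi, xi in zip(w, x))
            err = d - o
            # dE/dw_i = -(d - o) * x_i, so step in the opposite direction
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w

# Toy patterns generated by d = 2*x1 - x2
data = [([1, 0], 2), ([0, 1], -1), ([1, 1], 1), ([2, 1], 3)]
w = train_linear_unit(data)
```

After training, w approaches [2, -1], the weights that make the error zero on every pattern.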
Role of the BIAS Node
 In a multilayer feedforward network a single extra node is appended to the input layer and
to each hidden layer, but NOT to the output layer (whose number of neurons is fixed by the
problem). This additional node is called the “bias node”. The bias node connects to every node
of the immediately following layer.
 In some feedforward network models, the bias in certain layers may be removed, so some
layers have a bias node and some do not.
 The role of the bias node is to let each neuron shift its activation threshold, which helps
the network capture the nonlinearity of the data; the more bias nodes, the greater this flexibility.
 The bias node is sometimes referred to as the dummy node.
Activation Function

 In a feedforward network, each node was initially taken to compute the threshold
function (also called the hard limiting function). Such nodes (or neurons) are called
perceptrons – they just perceive whether the net input falling into them exceeds or
falls below the threshold.
 In later developments, the hard limiting function was replaced by the sigmoid
function: 1/(1+exp(-x)).
 There are two types of sigmoid – the monopolar and the bipolar sigmoid. The monopolar has
the formula 1/(1+exp(-x)), and the bipolar has the formula 2/(1+exp(-x)) - 1.
 As the names imply, the monopolar sigmoid is always positive and the bipolar assumes both signs.
 The derivative of the monopolar sigmoid may be shown to be s*(1-s), where
s = 1/(1+exp(-x)).
 The derivative of the bipolar sigmoid is 0.5*(1-w^2), where w = 2/(1+exp(-x)) - 1.
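Both sigmoids and their derivative identities can be checked numerically; the sketch below compares each closed-form derivative against a finite-difference estimate (function names are illustrative):

```python
import math

def sigmoid(x):
    """Monopolar (unipolar) sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):
    """Bipolar sigmoid: output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def sigmoid_deriv(x):
    """Derivative via the identity s' = s*(1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def bipolar_deriv(x):
    """Derivative via the identity f' = 0.5*(1 - f^2)."""
    f = bipolar_sigmoid(x)
    return 0.5 * (1.0 - f * f)
```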
History of Artificial Neural Networks
1943 = McCulloch and Pitts model of the neuron

1949 = Donald Hebb developed the Hebbian learning rule

1954 = Minsky developed early “neurocomputers”

1958 = Frank Rosenblatt developed the perceptron.

1960 = ADALINE and MADALINE appeared

1972, 1977 = Amari developed the mathematical theory of neural networks

1982 = Kohonen in Finland developed the self-organizing map.

1982 = Hopfield developed Hopfield networks based on dynamical systems

1974, 1982 = Grossberg developed Adaptive Resonance Theory (ART)

1974 = Paul Werbos developed backpropagation algorithm


Learning Rules:

 The various learning rules are: the Hebbian learning rule, perceptron
learning rule, delta learning rule, Widrow-Hoff learning rule, correlation
learning rule, winner-take-all learning rule, and outstar learning rule.
 The Hebbian learning rule is based upon Hebb’s observation: “When the
axon of cell A is close enough to excite cell B and it repeatedly and
persistently participates in firing it, then some kind of growth process or
metabolic change occurs in A or B or both, so that A’s efficiency as one of
the neurons firing B is increased.”
 The outstar rule is also referred to as the fan-out rule.
 The winner-take-all learning rule is often called the fan-in learning rule.
 The Hebbian learning rule uses the correlation of the output with the input as a measure for
deciding whether the signal is excitatory (the correlation, and hence delta_w, is positive) or
inhibitory (the correlation is negative and so is delta_w).
 A cost function is used in delta learning rule for computing the learning (or delta_w value).
 The delta learning rule may be extended (from single-ADALINE learning) to a single perceptron
layer and also to the multilayer perceptron.
 When the delta learning rule is applied for multilayer feedforward network with sigmoid
activation function (the delta learning rule for this case is called the generalized delta
learning rule), we get the “error backpropagation training algorithm” or backpropagation
algorithm for short. It is called so because data goes in the forward direction but weight
update happens in the reverse direction (back propagation).
Figure: summary of learning rules
Learning Rules in Detail:
 Let us study the learning of the weight vector Wi, or its components Wij connecting the jth
input with the i'th neuron.
 The trained network is shown in Figure 2.21 and uses the neuron symbol from Figure 2.21.
In general, the j'th input can be an output of another neuron or it can be an external input.
 Our discussion in this section will cover single-neuron and single-layer network supervised
learning and simple cases of unsupervised learning.
 Under different learning rules, the form of the neuron's activation function may be
different.
 Note that the threshold parameter may be included in learning as one of the weights. This
would require fixing one of the inputs, say xn.
 We will assume here that xn, if fixed, takes the value of -1.
 “... metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is
increased.” (Hebb 1949.)

 The rule states that if the product of output and input, the correlation term oixj, is positive, this
results in an increase of the weight wij; otherwise the weight decreases.

 It can be seen that the output is strengthened in turn for each input presented.

 Therefore, frequent input patterns will have the most influence on the neuron's weight vector and will
eventually produce the largest output.
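A single Hebbian update, delta_w = c * o * x, can be sketched as follows; the bipolar hard-limiting activation and the learning constant c are illustrative assumptions:

```python
def hebbian_step(w, x, c=1.0):
    """One Hebbian update: delta_w = c * o * x, where the output
    o = sign(w.x) plays the role of the neuron's activation.
    Positive correlation of output and input strengthens the weight;
    negative correlation weakens it."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = 1 if net >= 0 else -1          # bipolar hard-limiting activation
    return [wi + c * o * xi for wi, xi in zip(w, x)]
```

Note that no target value appears anywhere: the update depends only on the input and the neuron's own output, which is what makes the rule unsupervised.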

EXAMPLE

 This example illustrates Hebbian learning with binary and continuous activation
functions of a very simple network. Assume the network shown in
Figure 2.22 with the initial weight vector
EXAMPLE
been assumed to be arbitrary constants. The weights are initialized at any values
for this method of training
EXAMPLE
 This criterion corresponds to finding the weight vector that is closest to the input x.
 The rule (2.46) then reduces to incrementing wi by a fraction of x - wi. Note that only
the winning neuron's fan-in weight vector is adjusted.
 After the adjustment, its fan-in weights tend to better estimate the input pattern in
question.
 In this method, the winning neighborhood is sometimes extended beyond the single
neuron winner so that it includes the neighboring neurons.
 Weights are typically initialized at random values and their lengths are normalized
during learning in this method.
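One winner-take-all step can be sketched as below; the closest-weight-vector winner criterion and the update delta_w = alpha*(x - w) follow the description above, while alpha and the function name are illustrative:

```python
def winner_take_all_step(weights, x, alpha=0.5):
    """One winner-take-all update: the neuron whose weight vector is
    closest to the input x wins, and only its fan-in weights move a
    fraction alpha of the way toward x: delta_w = alpha*(x - w)."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    winner = min(range(len(weights)), key=lambda i: dist2(weights[i]))
    weights = [list(w) for w in weights]   # copy; losers stay unchanged
    weights[winner] = [wi + alpha * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner, weights
```

Extending the update to the winner's neighboring neurons, as mentioned above, is the idea behind self-organizing maps.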
Error back propagation training algorithm

Figure 4.8(a) illustrates the flowchart of the error back-propagation training algorithm for a basic
two-layer network as in Figure 4.7.

The learning begins with the feedforward recall phase (Step 2). After a single pattern vector z is
submitted at the input, the layers' responses y and o are computed in this phase.

 Then, the error signal computation phase (Step 4) follows.

 Note that the error signal vector must be determined in the output layer first, and then it is
propagated toward the network input nodes.

The K x J weights are subsequently adjusted in Step 6.

Note that the cumulative cycle error of the input-to-output mapping is computed in Step 3 as a sum
over all continuous output errors in the entire training set.
The final error value for the entire training cycle is calculated after each completed pass through
the training set {z1, z2, . . . , zp}.
The learning procedure stops when the final error value falls below the upper bound Emax, as shown
in Step 8.
 Figure 4.8(b) depicts the block diagram of the error back-propagation trained
network operation and explains both the flow of signal, and the flow of error
within the network.
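The step sequence above (feedforward recall, output-layer error signals, back-propagated hidden-layer error signals, weight adjustment) can be sketched for a basic two-layer network with unipolar sigmoid units. The network size, learning rate, epoch count and seed are illustrative assumptions, and XOR serves only as a small test mapping:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_backprop(patterns, n_hidden=4, eta=0.5, epochs=10000, seed=1):
    """Error back-propagation for a two-layer (one hidden, one output)
    network with unipolar sigmoid units and a bias input fixed at -1."""
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    # hidden-layer weights V and output-layer weights W, small random init
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
         for _ in range(n_hidden)]
    W = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for z, d in patterns:
            # Step 2: feedforward recall -- layer responses y and o
            zb = list(z) + [-1.0]
            y = [sigmoid(sum(v * a for v, a in zip(row, zb))) for row in V]
            yb = y + [-1.0]
            o = sigmoid(sum(w * a for w, a in zip(W, yb)))
            # Step 4: error signal of the output layer first...
            delta_o = (d - o) * o * (1.0 - o)
            # ...then propagated back toward the input nodes
            delta_y = [delta_o * W[j] * y[j] * (1.0 - y[j])
                       for j in range(n_hidden)]
            # Step 6: adjust the weights
            W = [w + eta * delta_o * a for w, a in zip(W, yb)]
            V = [[v + eta * dy * a for v, a in zip(row, zb)]
                 for row, dy in zip(V, delta_y)]

    def predict(z):
        zb = list(z) + [-1.0]
        yb = [sigmoid(sum(v * a for v, a in zip(row, zb)))
              for row in V] + [-1.0]
        return sigmoid(sum(w * a for w, a in zip(W, yb)))
    return predict

# XOR as a toy training set (not linearly separable)
xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
predict = train_backprop(xor)
```

Data flows forward through the network while the error signals and weight updates flow backward, which is exactly why the algorithm is called back-propagation.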
Data Normalization

 While training the ANN, the data must always be normalized to
make sure that the values fall within the “good slope region” of the
sigmoid function.
 The normalization function frequently used is
Xnew = 0.8*(X - Xmin)/(Xmax - Xmin) + 0.1
 The above formula scales the data to within [0.1, 0.9].
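A direct implementation of this scaling; the generalized target range [lo, hi], defaulting to [0.1, 0.9], is an assumption for convenience:

```python
def normalize(xs, lo=0.1, hi=0.9):
    """Scale data linearly into [lo, hi] so the values fall in the
    high-slope region of the sigmoid:
    x_new = (hi - lo)*(x - xmin)/(xmax - xmin) + lo."""
    xmin, xmax = min(xs), max(xs)
    return [(hi - lo) * (x - xmin) / (xmax - xmin) + lo for x in xs]
```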
Integer Programming and Sigmoid Scaling

 The k in the sigmoid formula A/(1+exp(-kx)) is called the scaling factor.
A is another constant.
 By choosing A and k suitably for the entire feedforward network, it is
possible to have all the weights assume integer values.
 Effectively this means that, given any data, an ANN may be trained to
capture the data such that the weights of the ANN are integers alone
(and not arbitrary real numbers). This may be done for arbitrary accuracy of the
training.
Stopping Criterion for Training Algorithm

 ANN training may be stopped by either of two criteria: error or
number of training epochs.
 The training algorithm can stop when the required target error is reached or when the
set number of epochs (training passes of the algorithm) is reached – whichever
occurs sooner.
 Each element of the data set is called a pattern. ANN training can occur
through pattern-wise training or epoch-wise training.
 In pattern-wise training, an input vector passes through the network, the learning
it causes is computed, and the weights are updated immediately.
 In epoch-wise training, the entire training data passes through, and the delta_w for each
pattern is computed and stored aside (the weights are not immediately updated). After
a full pass of the training data, the stored delta_w's are averaged and
then the weights are updated.
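The two schemes differ only in when the stored delta_w's are applied; the sketch below makes that concrete using a caller-supplied gradient function (the function names and the single-linear-unit delta-rule gradient are illustrative):

```python
def pattern_wise_pass(w, data, eta, grad):
    """One pass of pattern-wise training: each pattern's delta_w is
    applied immediately, before the next pattern is presented."""
    for x, d in data:
        w = [wi + eta * g for wi, g in zip(w, grad(w, x, d))]
    return w

def epoch_wise_pass(w, data, eta, grad):
    """One pass of epoch-wise training: every pattern's delta_w is
    computed against the same weights, stored aside, averaged, and
    applied once at the end of the pass."""
    deltas = [grad(w, x, d) for x, d in data]
    avg = [sum(col) / len(data) for col in zip(*deltas)]
    return [wi + eta * g for wi, g in zip(w, avg)]

# Delta-rule gradient for a single linear unit (illustrative)
def delta_grad(w, x, d):
    err = d - sum(wi * xi for wi, xi in zip(w, x))
    return [err * xi for xi in x]

data = [([1, 0], 1), ([0, 1], 2)]
w_pat = pattern_wise_pass([0.0, 0.0], data, 0.5, delta_grad)
w_epo = epoch_wise_pass([0.0, 0.0], data, 0.5, delta_grad)
```

The two passes give different weights on the same data because pattern-wise updates let each pattern see the effect of the previous one, while epoch-wise updates evaluate every pattern against the starting weights.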
 In order to test how good a training algorithm is, and how well the network
learned to map the data, we set aside some training data for “testing”. The
testing data is not used for training. After the training is complete, the trained
network is used for testing the goodness of the training.
Single Layer Perceptron Classifiers

 A single layer perceptron can be used for classification problems. In
particular this is useful for realizing, using the ANN, logic gates such as AND,
OR, NOT and XOR.
 A hyperplane is, in two-dimensional space, a line which separates the outputs (0
and 1) of the AND, OR and NOT gates. This hyperplane can be learned by a
single layer perceptron, so that such ANNs can implement the AND, OR and
NOT gates.
 Such hyperplane is called a decision boundary.
 For the XOR gate no linear boundary exists which separates the outputs
into two regions. Therefore the XOR is a nonlinear gate and a multilayer
perceptron is necessitated to solve the XOR problem.
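The separability claim can be illustrated in code: one threshold unit realizes AND and OR, while a brute-force search over a grid of weights and thresholds finds none that reproduces XOR. The grid is only illustrative; the non-existence in fact holds for all real-valued weights.

```python
def threshold_gate(x, w, T):
    """Single-layer perceptron unit: fires 1 iff w.x >= T."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= T else 0

# AND and OR are linearly separable: one hyperplane suffices for each.
AND = lambda x: threshold_gate(x, [1, 1], 2)
OR  = lambda x: threshold_gate(x, [1, 1], 1)

# XOR: exhaustive search over a small weight/threshold grid finds no
# single hyperplane reproducing its truth table.
def xor_separable(grid=range(-3, 4)):
    target = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
    for w1 in grid:
        for w2 in grid:
            for T in grid:
                if all(threshold_gate(x, [w1, w2], T) == t
                       for x, t in target.items()):
                    return True
    return False
```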
Type of Problems

 The ANN can solve a variety of problems including but not limited to: curve-
fitting, classification, parameter estimation and various other types of
problems.
 In each of those problems, data is available from the system and the ANN
is required to learn to behave like the data. Curve-fitting type problems are
referred to as regression problems. Learning the AND gate, for example, is a
classification problem because the network learns to map inputs to either side of a
hyperplane (the so-called “decision boundary”). Parameter estimation
problems typically involve estimating the parameters which make a given
function fit the data in the best possible way. The ANN can solve each of these
problems well.
Feedforward and Feedback Networks

 In feedforward networks, the layered structure prevails. These networks are
composed of many ADALINEs. No links are fed back in any manner.
 In feedback networks, there is a feedback connection from the output to one
of the previous layers, so that the feedback introduces a recursive relationship
among the states of the network. Such networks are called recurrent
networks because, upon being unrolled, they give rise to recurrence relations.
They are also examples of discrete-time dynamical systems. Their
generalizations lead to continuous-time dynamical systems and their
transition states (example: Hopfield networks).
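Unrolling such a feedback network indeed yields a recurrence relation, x_{k+1} = f(W x_k). A minimal Hopfield-style sketch follows, where the weight matrix and the sign activation are illustrative assumptions:

```python
def recurrent_steps(x0, W, f, n):
    """Iterate the recurrence x_{k+1} = f(W x_k) of a discrete-time
    feedback (recurrent) network, returning the state sequence."""
    states = [list(x0)]
    for _ in range(n):
        x = states[-1]
        net = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
        states.append([f(v) for v in net])
    return states

# Hopfield-style sign activation and a symmetric weight matrix;
# [1, 1] and [-1, -1] are fixed points (stored states) of this W.
sign = lambda v: 1 if v >= 0 else -1
W = [[0, 1], [1, 0]]
```

Starting from a stored state, the recurrence simply reproduces it at every instant; a Hopfield network exploits such fixed points as memories.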
