
Artificial Neural Networks

Human Brain

1. The brain is the main organ of the central nervous system. It is responsible for
bodily activities such as metabolism, as well as thought processes and reasoning.
2. The human brain has four prominent lobes – the frontal lobe, the parietal lobe, the
occipital lobe and the temporal lobe. “Lobes” are prominent bulge shapes on the
brain surface.
3. The main parts of the brain are the cerebrum (big brain), cerebellum (small brain)
and the medulla oblongata.
4. There are crests and troughs all over the brain surface. The crests or rises are
called gyri (singular gyrus) and the troughs or grooves are called sulci (singular sulcus).
5. The important parts of the human brain are: the hippocampus, the amygdala, the
thalamus, the hypothalamus, the pituitary gland and the pons. There are also several
deep brain structures responsible for signal routing, such as the basal ganglia.
6. The main functional unit of the brain is the neuron. Each neuron has a cell body,
an axon that carries its output signal, and fibres called dendrons whose smaller
extensions are called dendrites.
7. Neurons are interconnected and occur over the entire brain, over the outer cortex
as well as the inner regions. On average, a single neuron is surrounded by about
10,000 neighboring neurons. It is through this network of neurons that signals
pass in and out of the brain. The male human brain has, on average, about 100 billion
neurons; the female human brain has a few hundred million fewer than the male
brain.
8. There are supportive cells of various types in the brain, all of which are
collectively termed glial cells. These cells outnumber neurons by about four to one.
9. A neuron fires (gets activated) when the sum total of all input voltages
from the neighboring neurons exceeds its firing threshold, triggering an action potential.
10. The central nervous system consists of a network of nerves which carry signals
from the receptor cells to the brain (called sensory nerves) and nerves which carry
signals from the brain to other parts of the body (called motor nerves). Sensory
nerves typically originate at the sense organs (eye, ear, nose, tongue, skin),
while motor nerves terminate at the muscles.
11. The arrangement of the brain is “contra-lateral” – this means that the right half of
the brain controls the left half of the body and the left half of the brain controls
the right half.
12. There are sub-regions on the brain’s cortex (outer surface) called “cortical areas”;
the neurons in each cortical area perform a specific function. For example, the visual
centers (or visual cortical areas) are located on the occipital lobe, the auditory
centers on the temporal lobe, and so on. Scientists have identified at least 44
cortical areas and charted their specific body functions.
13. The signals from the sense organs reach the corresponding cortical region and are
then routed to other cortical regions before reaching the motor nerves.
Artificial Neural Networks (ANN)

1. An Artificial Neural Network or ANN is a graph with nodes and interconnections,
aping the way biological neurons are connected in the brain. The goal of the
ANN is to mimic the learning and recall functions of the biological human brain.
2. ANNs are trained using certain algorithms called “training algorithms”. Just
as in the biological system, where a neuron receives voltage inputs from its
neighboring neurons, the ANN has “weights” taking the role of voltages. Weights are
real numbers which get updated as the algorithm works over the graph until the
learning is complete.
3. There are two types of learning methods – supervised and unsupervised.
4. Learning versus training: learning refers to the fundamental weight-update rule
(the weight-update step), whereas “training” refers to the full sequence of steps of
weight updating plus other calculations over the full network. So, a training
algorithm contains one or more learning steps within it.
5. McCulloch and Pitts first proposed a model based upon the biological neuron.
Their model uses a simple threshold function as the activation function.
6. Beginning with the McCulloch-Pitts model, the ADALINE (ADAptive LINear Element,
built around an adaptive linear combiner) evolved. An ADALINE has weights on the
links entering the node and yields as output the weighted sum ∑j wj*xj.
7. It is possible to put ADALINEs together into a single layer and form a single-layer
neural network. Such extensions of the ADALINE are called Many ADALINEs,
or MADALINE.
8. Several MADALINE layers may be interconnected to form a feedforward
network.
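The McCulloch-Pitts threshold unit and the ADALINE's weighted-sum output can be sketched in a few lines of Python; the weights, inputs and threshold below are illustrative values, not taken from the text:

```python
def adaline_output(weights, inputs):
    """Adaptive linear combiner: the weighted sum sum_j w_j * x_j."""
    return sum(w * x for w, x in zip(weights, inputs))

def mcculloch_pitts(weights, inputs, threshold):
    """McCulloch-Pitts neuron: fires (1) when the weighted sum reaches the threshold."""
    return 1 if adaline_output(weights, inputs) >= threshold else 0

# Illustrative values: three inputs with hand-picked weights.
print(adaline_output([0.5, -0.25, 1.0], [2.0, 4.0, 1.0]))        # 0.5*2 - 0.25*4 + 1*1 = 1.0
print(mcculloch_pitts([0.5, -0.25, 1.0], [2.0, 4.0, 1.0], 0.5))  # 1.0 >= 0.5, so 1
```

The only difference between the two units is the hard threshold applied on top of the same linear combiner.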

Supervised and Unsupervised Learning

1. In unsupervised learning, the weight update is independent of any targets
corresponding to the input data vectors. A good example of unsupervised learning
is Hebbian learning.
2. Supervised learning requires the target data corresponding to the given inputs.
The target plays a role in computing the learning (delta_w) that each pattern
(input-output pair) has caused. This is unlike the unsupervised learning method.
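The contrast can be made concrete with two weight-update sketches in Python; the 0.1 learning rate and the simple linear output are illustrative assumptions, not prescribed by the text:

```python
def hebbian_update(w, x, y, lr=0.1):
    # Unsupervised: delta_w depends only on the input x and the neuron's own
    # output y, never on a target value (learning rate lr is an illustrative choice).
    return [wi + lr * y * xi for wi, xi in zip(w, x)]

def delta_update(w, x, target, lr=0.1):
    # Supervised: delta_w is driven by the error between the target and the output.
    output = sum(wi * xi for wi, xi in zip(w, x))
    error = target - output
    return [wi + lr * error * xi for wi, xi in zip(w, x)]
```

Note that if the output already matches the target, `delta_update` leaves the weights unchanged, whereas the Hebbian update keeps reinforcing correlated activity.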

Role of the BIAS Node

1. In a multilayer feedforward network, a single extra node is appended to each of the
input and hidden layers, but NOT to the output layer (whose neurons are fixed by the
problem). This additional node is called the “bias node”. The bias node connects
to every node of the immediately following layer.
2. In some feedforward network models, the bias node may be removed from certain
layers, so some layers have a bias and others do not.
3. The role of the bias node is to help capture the nonlinearity of the data: it shifts
each neuron’s activation, so decision boundaries need not pass through the origin.
The more bias nodes, the greater the network’s ability to capture nonlinearities.
4. Bias node is sometimes referred to as the dummy node.
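In practice the bias node is often implemented by appending a constant input of 1 to a layer's output, so the bias weight is learned like any other weight. A minimal Python sketch (the weight values are illustrative):

```python
def augment_with_bias(x):
    """Append the bias node's constant output (1.0) to a layer's output vector."""
    return x + [1.0]

def next_layer_input(weight_rows, x):
    # One row of weights per neuron in the following layer;
    # the last entry of each row is that neuron's bias weight.
    x_aug = augment_with_bias(x)
    return [sum(w * v for w, v in zip(row, x_aug)) for row in weight_rows]

print(next_layer_input([[1.0, 2.0, 0.5]], [1.0, 1.0]))  # [1 + 2 + 0.5] = [3.5]
```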

Activation Function

1. In a feedforward network, each node’s activation was initially taken to be the
threshold function (also called the hard-limiting function). Such nodes (or neurons)
are called perceptrons: they simply perceive whether the net input falling into them
exceeds or falls below the threshold.
2. In later developments, the hard-limiting function was replaced by the sigmoid
function: 1/(1+exp(-x)).
3. There are two types of sigmoid: monopolar and bipolar. The monopolar sigmoid
has the formula 1/(1+exp(-x)), and the bipolar sigmoid has the formula 2/(1+exp(-x)) – 1.
4. As the names imply monopolar is always positive and bipolar assumes both signs.
5. The derivative of the monopolar sigmoid may be shown to be s*(1-s), where
s = 1/(1+exp(-x)).
6. The derivative of the bipolar sigmoid is (1/2)*(1-w^2), where w = 2/(1+exp(-x)) – 1.
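Both sigmoids and their derivatives translate directly into code; a small Python sketch:

```python
import math

def monopolar(x):
    """Monopolar sigmoid: always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def bipolar(x):
    """Bipolar sigmoid: in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def monopolar_deriv(x):
    s = monopolar(x)
    return s * (1.0 - s)        # s*(1-s)

def bipolar_deriv(x):
    w = bipolar(x)
    return 0.5 * (1.0 - w * w)  # (1/2)*(1-w^2)
```

Note that the bipolar sigmoid equals tanh(x/2), which is why its derivative carries the factor of 1/2.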
History of Artificial Neural Networks

1943 = McCulloch and Pitts model of the neuron


1949 = Donald Hebb developed the Hebbian learning rule
1954 = Minsky and Papert developed “neurocomputers”
1958 = Frank Rosenblatt developed the perceptron.
1960 = ADALINE and MADALINE appeared
1972, 1977 = Amari developed the mathematical theory of neural networks
1977 = Kohonen in Finland developed the self-organizing map.
1982 = Hopfield developed Hopfield networks based on dynamical systems
1974, 1982 = Grossberg developed Adaptive Resonance Theory (ART)
1974 = Paul Werbos developed backpropagation algorithm

Learning Rules:

1. The various learning rules are: the Hebbian learning rule, the perceptron learning
rule, the delta learning rule, the Widrow-Hoff learning rule, the correlation learning
rule, the winner-take-all learning rule and the outstar learning rule.
2. The Hebbian learning rule is based upon Hebb’s observation: “When the axon of cell
A is close enough to excite cell B and it repeatedly and persistently participates in
firing it, then some kind of growth process or metabolic change occurs in A or B or
both, so that A’s efficiency as one of the neurons firing B is increased.”
3. The outstar rule is also referred to as the fan-out rule.
4. The winner-take-all learning rule is often called the fan-in learning rule.
5. The Hebbian learning rule uses the correlation of the input data as the measure for
deciding whether a signal is excitatory (the correlation, and hence delta_w, is
positive) or inhibitory (the correlation is negative, and so is delta_w).
6. A cost function is used in the delta learning rule for computing the learning
(delta_w) value.
7. The delta learning rule may be extended (from single-ADALINE learning) to a single
perceptron layer and also to the multilayer perceptron.
8. When the delta learning rule is applied to a multilayer feedforward network with
sigmoid activation functions (the rule for this case is called the generalized delta
learning rule), we get the “error backpropagation training algorithm”, or
backpropagation algorithm for short. It is so called because data flows in the
forward direction while the weight updates propagate in the reverse direction
(back-propagation).
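As an illustration of the delta rule for a single ADALINE (a hypothetical example: the target function y = 2*x1 - x2, the learning rate and the epoch count are all made-up choices, not from the text):

```python
def train_delta(patterns, lr=0.1, epochs=2000):
    """Pattern-wise delta rule for a single linear neuron (ADALINE)."""
    w = [0.0] * len(patterns[0][0])
    for _ in range(epochs):
        for x, t in patterns:
            y = sum(wi * xi for wi, xi in zip(w, x))      # linear output
            err = t - y                                   # error drives delta_w
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

# Four patterns sampled from y = 2*x1 - x2; the last input is a fixed bias of 1.
data = [([1.0, 0.0, 1.0], 2.0), ([0.0, 1.0, 1.0], -1.0),
        ([1.0, 1.0, 1.0], 1.0), ([2.0, 1.0, 1.0], 3.0)]
```

Running `train_delta(data)` drives the weights toward [2, -1, 0], recovering the target function.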

Figure: summary of learning rules


Data Normalization

1. While training an ANN, the data must always be normalized to make sure
that the values fall within the “good slope region” of the sigmoid function.
2. The normalization function frequently used is

Xnew = 0.8*(X - Xmin)/(Xmax - Xmin) + 0.1

3. The above formula scales the data into the interval [0.1, 0.9].
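The formula translates directly into code; a short Python sketch:

```python
def normalize(values):
    """Scale data into [0.1, 0.9] so it sits in the sigmoid's good slope region."""
    xmin, xmax = min(values), max(values)
    return [0.8 * (x - xmin) / (xmax - xmin) + 0.1 for x in values]

print(normalize([0.0, 5.0, 10.0]))  # approximately [0.1, 0.5, 0.9]
```

A constant column (Xmax = Xmin) would need special handling to avoid division by zero.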

Integer Programming and Sigmoid Scaling

1. The k in the sigmoid formula A/(1+exp(-kx)) is called the scaling factor; A is
another constant.
2. By choosing A and k suitably for the entire feedforward network, it is possible to
have all the weights assume integer values.
3. Effectively this means that, given any data, an ANN may be trained to capture the
data with weights that are integers alone (and not arbitrary real numbers). This may
be done to any desired training accuracy.
Stopping Criterion for Training Algorithm

1. ANN training may be stopped by either of two criteria: error or
number of training epochs.
2. Training can stop when the required target error is reached or when the set number
of epochs (training passes of the algorithm) is completed, whichever occurs sooner.
3. Each element of the data set is called a pattern. ANN training can occur through
pattern-wise training or epoch-wise training.
4. In pattern-wise training, an input vector passes through the network, the learning
caused by it is computed, and the weights are updated immediately.
5. In epoch-wise training, the entire training data set passes through and the delta_w
for each pattern is computed and stored aside (the weights are not immediately
updated). After a full pass over the training data, the stored delta_w’s are averaged
and the weights are then updated.
6. In order to test how good a training algorithm is, and how well the network has
learned to map the data, we set aside some of the data for “testing”. The testing
data is not used for training. After training is complete, the trained network is
run on the testing data to gauge the goodness of the training.
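A pattern-wise training loop with both stopping criteria can be sketched as follows (the linear neuron, the 0.2 learning rate and the thresholds are illustrative assumptions):

```python
def train(patterns, w, lr=0.2, max_epochs=100, target_error=1e-3):
    """Pattern-wise training that stops at the target error or the epoch limit."""
    for epoch in range(1, max_epochs + 1):
        total_error = 0.0
        for x, t in patterns:
            y = sum(wi * xi for wi, xi in zip(w, x))          # linear output
            err = t - y
            total_error += err ** 2
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]  # update after every pattern
        if total_error <= target_error:                       # error criterion met first
            return w, epoch
    return w, max_epochs                                      # epoch criterion met first
```

Epoch-wise training would instead accumulate the delta_w's inside the inner loop and apply their average once per epoch.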

Single Layer Perceptron Classifiers

1. A single-layer perceptron can be used for classification problems. This is
particularly useful for realizing logic gates such as AND, OR and NOT using an ANN
(the XOR gate, as discussed below, needs more than one layer).
2. A hyperplane in two-dimensional space is a line; for the AND, OR and NOT gates,
such a line separates the outputs (0 and 1). This hyperplane can be learned by a
single-layer perceptron, so such ANNs can implement the AND, OR and NOT gates.
3. Such a hyperplane is called a decision boundary.
4. For the XOR gate, no linear boundary exists that separates the outputs into
two regions. Therefore XOR is a nonlinear gate, and a multilayer perceptron is
necessitated to solve the XOR problem.
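A sketch of the perceptron learning rule finding such a decision boundary for the AND gate (the 0.1 learning rate and epoch limit are arbitrary choices):

```python
def step(net):
    """Hard-limiting activation: fire when the net input reaches the threshold 0."""
    return 1 if net >= 0 else 0

def train_perceptron(gate, lr=0.1, epochs=50):
    """Perceptron learning rule; the third weight is the bias (its input fixed at 1)."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (a, b), t in gate:
            y = step(w[0] * a + w[1] * b + w[2])
            err = t - y
            w = [w[0] + lr * err * a, w[1] + lr * err * b, w[2] + lr * err]
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

The learned weights define the line w0*a + w1*b + w2 = 0 separating the two output classes. Running the same loop on XOR targets never converges, since no such line exists.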

Type of Problems

1. The ANN can solve a variety of problems, including but not limited to curve-fitting,
classification and parameter estimation.
2. In each of these problems, data is available from the system and the ANN is
required to learn to behave like the data. Curve-fitting problems are referred to as
regression problems. Learning the AND gate, for example, is a classification problem
because the network learns to map either side of the hyperplane (the so-called “decision
boundary”). Parameter estimation problems typically involve estimating the parameters
that make a given function fit the data in the best possible way. The ANN can solve
each of these problems well.
Feedforward and Feedback Networks

1. In feedforward networks, the layered structure prevails. These networks are
composed of MADALINEs, and no links are fed back in any manner.
2. In feedback networks, there is a feedback connection from the output to one of
the previous layers, so that the feedback introduces a recursive relationship
among the states of the network. Such networks are called recurrent networks
because, upon being opened up, they give rise to recurrence relations. They are also
examples of discrete-time dynamical systems. Their generalizations lead to
continuous-time dynamical systems and their equilibrium states (example:
Hopfield networks).
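Opening up the feedback loop gives the recurrence s[t+1] = f(w_fb*s[t] + w_in*x[t]); a one-unit sketch in Python (the tanh activation and the weight values are illustrative choices):

```python
import math

def recurrent_step(state, x, w_fb, w_in):
    """One step of a single-unit feedback network: the previous output is fed back in."""
    return math.tanh(w_fb * state + w_in * x)

# Iterating the step over an input sequence realizes the recurrence relation.
s = 0.0
for x in [1.0, 0.5, -1.0]:
    s = recurrent_step(s, x, w_fb=0.5, w_in=1.0)
```

Each iteration makes the next state depend on the previous one, which is exactly the discrete-time dynamical system described above.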
