
Neural Networks

Unit I & II (of a Total of VIII units)


K. Raghu Nathan, Retd. Dy. Controller (R&D)

Topics covered in this Unit


Biological Neural Networks
Computers & Biological Neural Networks
Models of Neuron [Artificial Neurons]
ANN Terminology
Artificial Neural Networks
Historical Development of NN Principles

Topics covered [contd]


ANN Topologies
ANN Functional Usage
Pattern Recognition Tasks
Learning in ANNs
Basic Learning Laws

Biological Neural Networks


Nervous System
Complex system of interconnected nerves
Made up of nerve cells called Neurons

Neurons
Receive & transmit information between various parts/organs of the body
Types: Sensory (Receptor) Neuron, Motor Neuron, Inter-Neuron
Transmission of a signal is a complex electro-chemical process

The Biological Neuron

Biological Neuron [contd]

Biological Neuron
Cell Body (Soma)
Has a Nucleus

Dendrites
Fiber-like; large in number; branched structure
Receive signals from other neurons

Axon
One per neuron; longer & thicker than the dendrites; branched at its end
Transmits signals to other neurons
Contains vesicles, which hold chemical substances called neurotransmitters

Biological Neuron [contd]


Synapse [also called Synaptic Cleft or Synaptic Gap]
Junction of axon & dendrites

Pre-synaptic neuron
Transmitting neuron

Post-synaptic neuron
Receiving neuron

The Synapse

Neuron Signals
Complex electro-chemical process
Incoming signals raise or lower the electrical potential inside the neuron
If the potential crosses a threshold, a short electrical pulse is produced
We say the neuron fires [is triggered or activated]

The pulse is sent down the axon; this is the electrical activity inside the neuron
Chemical activity occurs at the synapses
Vesicles in the axon release chemical substances, called neurotransmitters
These are collected by the dendrites of the receiving neuron
This raises/lowers the electric potential in the receiving neuron

Neuron Signals
Each neuron receives many input signals through its dendrites, from many other neurons
It sends an output signal through its axon, to many other neurons
The output depends on all the inputs
The cell body acts like a summing & processing device
The processing depends on the type of neuron

Characteristics of Biological NN
Robustness & Fault Tolerance
Decay of nerve cells does not seem to affect performance significantly

Flexibility
Automatically adjusts to new environment

Ability to deal with wide variety of situations


Uncertain, Vague, Inconsistent, Noisy

Collective Computation
Massively Parallel Distributed

Computers vs Biological Neural Networks

Aspect              Computer                                        Biological NN
Speed               Numeric: faster; Patterns: slower               Numeric: slower; Patterns: faster
Processing          Sequential                                      Massively parallel
Size & Complexity   Less complex                                    Very complex
Storage             In memory locations; addressable; fixed         In the strengths of the interconnections;
                    capacity; new info overwrites old info          adaptable in size, new info can be added
Fault Tolerance     No                                              Yes
Control Mechanism   Centralized                                     Distributed; no centralized control

Artificial Neuron - Neuron Models
Mathematical Models of the Neuron:

McCulloch & Pitts (M&P) model
Perceptron
Adaline
Madaline
Neocognitron

McCulloch & Pitts Model


[Figure: the M&P neuron]
Inputs a1, a2, ..., ai, ..., an arrive with weights w1, w2, ..., wi, ..., wn; b = bias
Summing part computes the activation value: x = Σ ai wi + b
Output part applies the output function to give the output signal: s = f(x)

Output Function
Binary: s = 1 if x >= t, else s = 0 (t = threshold)
Linear: s = kx (k = constant slope)
Ramp: linear between a lower & an upper limit, constant (saturated) outside them
Sigmoid: s = 1/(1 + e^-x)
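As a minimal sketch, these four output functions can be written in Python as follows (the function names and default parameter values are illustrative, not taken from the slides):

```python
import numpy as np

def binary(x, t=0.0):
    """Binary (threshold) output: 1 if x >= t, else 0."""
    return np.where(x >= t, 1.0, 0.0)

def linear(x, k=1.0):
    """Linear output: s = k * x."""
    return k * x

def ramp(x, lower=0.0, upper=1.0):
    """Ramp output: linear, clipped to [lower, upper]."""
    return np.clip(x, lower, upper)

def sigmoid(x):
    """Sigmoid output: s = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))
```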

NOR gate, using the M&P model

Two inputs a1 & a2, each with weight -1; binary output function with threshold t = 0
Activation: x = -a1 - a2

a1  a2    x    s
0   0     0    1
0   1    -1    0
1   0    -1    0
1   1    -2    0

The output s is 1 only when both inputs are 0, which is the NOR function.
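A small sketch of the M&P neuron applied to the NOR example above (the helper name mp_neuron and the use of NumPy are assumptions made for illustration):

```python
import numpy as np

def mp_neuron(inputs, weights, bias=0.0, threshold=0.0):
    """McCulloch & Pitts neuron: weighted sum plus bias, binary threshold output."""
    x = np.dot(inputs, weights) + bias          # activation value
    return 1 if x >= threshold else 0           # output signal

# NOR gate: both weights -1, threshold 0
weights = np.array([-1.0, -1.0])
for a1 in (0, 1):
    for a2 in (0, 1):
        s = mp_neuron(np.array([a1, a2]), weights)
        print(a1, a2, "->", s)                  # prints 1 only for (0, 0)
```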

Perceptron
Inputs are first processed by Association Units
Weights are adjustable, to enable Learning
The actual output is compared with the desired output; the difference is the Error
The Error is used to adjust the weights, so as to obtain the desired output

Perceptron (contd)
[Figure: the Perceptron. Sensory units a1, a2, a3 feed the Association units; their outputs, with weights w1, w2, w3, go to a Summing unit computing x = Σ ai wi + b; the Output function f(x) gives the output s.]

Perceptron (contd)
Desired output = b; actual output = s
Error: δ = b - s
Weight change: Δwi = η δ ai
η is the Learning Rate parameter

Perceptron Learning
Perceptron Learning Rule
Procedure for adjusting the weights

If the weight adjustments lead to zero error, we say the learning converges
Whether the error reduces to zero depends on the nature of the desired input-output pairs of data
Perceptron Convergence Theorem
If the desired input-output pairs are representable [achievable] by the perceptron, the learning rule converges in a finite number of steps
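A minimal sketch of the perceptron learning rule in Python (the data set, learning rate, and epoch limit are illustrative assumptions):

```python
import numpy as np

def train_perceptron(inputs, targets, eta=0.1, epochs=100):
    """Perceptron learning: delta_w = eta * (b - s) * a, with a binary threshold output."""
    w = np.zeros(inputs.shape[1])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for a, b in zip(inputs, targets):
            s = 1 if np.dot(w, a) + bias >= 0 else 0   # actual output
            delta = b - s                              # error
            w += eta * delta * a                       # weight update
            bias += eta * delta
            errors += abs(delta)
        if errors == 0:                                # converged: zero error on all pairs
            break
    return w, bias

# Example: learning the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, bias = train_perceptron(X, y)
```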

Adaline
Adaline = Adaptive Linear Element
Similar to the Perceptron; the difference is:
It employs a Linear Output Function (s = x)

The weight update rule minimises the mean squared error, averaged over all inputs
Hence it is known as the LMS (Least Mean Squared) Error Learning Rule
Also known as the Gradient Descent Algorithm
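A minimal sketch of the Adaline/LMS weight update (the learning rate, epoch count, and function name are illustrative assumptions):

```python
import numpy as np

def train_adaline(inputs, targets, eta=0.01, epochs=50):
    """LMS (Widrow-Hoff) rule: delta_w = eta * (b - w.a) * a, using the linear output s = x."""
    w = np.zeros(inputs.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for a, b in zip(inputs, targets):
            s = np.dot(w, a) + bias        # linear output
            err = b - s
            w += eta * err * a             # gradient-descent step on the squared error
            bias += eta * err
    return w, bias
```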

Terminology
Processing Unit
Summing part, output part
Inputs, weights, bias, activation value
Output function, output signal

Interconnections
Various Topologies

Operations
Activation Dynamics, Learning Laws

Update
Synchronous, Asynchronous

Artificial Neural Networks


It is possible to create models of the biological neurons as processing units
and link them to form closely interconnected networks

Models may be electronic / software Such networks are called Artificial Neural Networks [ANN]

ANN
ANNs exhibit abilities surprisingly similar to Biological NNs
They can Learn, Recognize, Remember, Match & Retrieve Patterns of Information
Hardware implementations of ANNs are also available nowadays
Costly, but faster than software implementations

Historical Development of ANN


1943 - McCulloch & Pitts: Model of Neuron
1949 - Hebbian Learning Law
1958 - Rosenblatt's Model: the Perceptron
1960 - Widrow & Hoff: Adaptive Linear Element [Adaline] & Least Mean Squared [LMS] Error Learning Law
1969 - Minsky & Papert: Multilayer Perceptron
1971 - Kohonen: Associative Memory
1971 - Willshaw: Self-Organization
1974 - Werbos: Error Backpropagation

Historical Development of ANN [contd]


1976 - Grossberg: Adaptive Resonance Theory [ART]
1980 - Fukushima: Neocognitron
1982 - Hopfield: Energy Analysis
1985 - Sejnowski: Boltzmann Machine
1987 - Hecht-Nielsen: Counterpropagation [CPN]
1988 - Kosko: Bidirectional Associative Memory [BAM]
1988 - Broomhead: Radial Basis Function [RBF]

Topology
Topology is the physical organisation of the ANN: the arrangement of the processing units, interconnections, and pattern input & output
An ANN is made up of Layers of Neurons
All Neurons within one layer have the same activation dynamics & output function
In addition to interlayer connections, intralayer connections may also be made
Connections across the layers may be in a feedforward or feedback manner

Topology (contd)
One Input layer, one Output layer, and zero or more intermediate layers (usually referred to as hidden layers)
No limit on the number of layers
There can be any number of neurons in any layer; all layers need not have the same number of neurons
If there is no hidden layer, the ANN is called a single-layer network
If one or more hidden layers are present, it is called a multi-layer network

Topology (contd)
Feedforward Networks
The units are connected such that data flows only in the forward direction, i.e. from the input layer to the output layer, via successive hidden layers, if any

Feedback Networks
Data flows in the forward direction, as above
In addition, the connections allow data flow from the output layer towards the input layer also
The reverse flow (feedback) is used for error correction, i.e. for adjusting the weights suitably to get the desired output; this is an essential feature of the mechanism for NN Learning
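A minimal sketch of a forward pass through a multilayer feedforward network (the layer sizes, random weights, and the sigmoid output function are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, layers):
    """Propagate an input vector through successive (weights, bias) layers."""
    a = inputs
    for W, b in layers:
        a = sigmoid(W @ a + b)     # each layer: weighted sum, bias, output function
    return a

# Example: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
print(forward(np.array([1.0, 0.5, -0.2]), layers))
```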

Single Layer FF Network

[Figure: Input layer fully connected to Output layer]

Multilayer Feedforward Network

[Figure: Input layer, one or more Hidden layers, Output layer]

Feedback Network

Neuronal Dynamics
Operation of a NN is governed by Neuronal Dynamics:
Dynamics of the activation state
Dynamics of the synaptic weights

Short Term Memory (STM) is modelled by the activation state of the NN
Long Term Memory (LTM) corresponds to the encoded pattern of information in the synaptic weights

Applications of Artificial Neural Networks


Artificial Intellect with Neural Networks: major application areas
Advanced Robotics
Intelligent Control
Technical Diagnostics
Intelligent Data Analysis and Signal Processing
Machine Vision
Image & Pattern Recognition
Intelligent Medicine Devices
Intelligent Security Systems
Intelligent Expert Systems

Major Areas of Usage


Pattern Recognition Tasks
These tasks necessarily involve:
Learning
Memory
Information Retrieval

Patterns
Computers deal with Data; Humans deal with Patterns
Objects/images, voices/sounds, even actions [walking etc.] have patterns
Different images, sounds & actions have different patterns
Patterns enable us to recognise, classify & identify objects & to take decisions based on such identification

Pattern Recognition Tasks


Pattern Association
Pattern Classification
Pattern Mapping
Pattern Clustering (aka Pattern Grouping)
Feature Mapping

Pattern Association
Every input pattern is associated with an output pattern, to form a pair of input-output patterns
There will be many such pairs of input-output patterns
A well-designed ANN can be trained to learn (remember) many such pairs of patterns
Whenever a pattern is input, the ANN should retrieve (output) the corresponding output pattern
Supervised Learning has to be employed [the network is being taught]
This is purely a memory function & is called the auto-association task

Pattern Association (contd)


Desirable: even if the input pattern is incomplete or noisy [i.e. contains some errors], we should get the correct output pattern
Among the various input patterns in its memory, the ANN should select the one pattern which is closest to the test input, & the corresponding output pattern should be output by the ANN
This needs content-addressable memory & the process is called accretive behaviour
Example of a Pattern Association task: OCR of printed characters

Pattern Classification
Objects belonging to the same class have many common features/patterns
This fact enables us to classify objects into classes & to identify new classes
Supervised Learning: the patterns for each class have to be taught to the system
Pattern classification tasks must exhibit accretive behaviour, i.e. an incomplete or noisy input should produce the output corresponding to its closest known input pattern
Examples of Pattern Classification tasks: Voice Recognition, Handwriting Recognition

Pattern Mapping
Capturing the relation between the input pattern & its corresponding output pattern
This is a generalisation task, not mere memorising
This is called interpolative behaviour
Example of a Pattern Mapping task: Speech Recognition

Pattern Clustering
Identifying subsets of patterns having similar distinctive features & grouping them together
Sounds similar to Pattern Classification, but is not the same
Has to employ Unsupervised Learning

Classification
Patterns for each class are input separately
That is, the system is trained to learn the patterns of one class first
Then it is taught the patterns of another class

Clustering
Patterns belonging to several groups are mixed in the set of inputs
The system has to resolve them into different groups

Feature Mapping
In several patterns, the features may not be unambiguous
They may vary over a time period
Therefore, it is difficult to cluster them
In this case, the system learns a feature map, rather than clustering or classifying
Has to employ unsupervised learning
Example: you see a new object for the first time (you have never seen it before), & it has some distinct features, as well as some features common to many known classes or groups

Pattern Recognition Problem


In any pattern recognition task, we have a set of input patterns & a set of desired output patterns
Depending on the nature of the desired output patterns & the nature of the task environment, the problem will be one of the following three types:
Pattern Association Problem
Pattern Classification Problem
Pattern Mapping Problem

Pattern Association Problem


Problem: to design an ANN
The input-output pairs are (a1, b1), (a2, b2), (a3, b3), ..., (aL, bL)
al = (al1, al2, ..., alM) & bl = (bl1, bl2, ..., blN) are vectors of dimensions M & N respectively
The ANN should associate the input patterns with the corresponding output patterns

Pattern Association Problem (contd)


If al & bl are distinct, the problem is hetero-associative
If al = bl, it is auto-associative; al = bl implies M = N, and the input & output patterns both refer to the same point in an N-dimensional space
Storing the association of the pairs of input & output patterns amounts to deciding the weights in the network, by applying the operations of the network on the input patterns

Pattern Association Problem (contd)


If a given input pattern is the same as one used for training the network, the output pattern is the same as that obtained during training
If the input pattern is slightly different (incomplete or noisy), the output may also be different
If the actual input is a = al + ε [ε = noise vector]:
If the output is bl [as desired], the NW is showing accretive behaviour
If the output is b = bl + δ, where δ → 0 as ε → 0, the NW is showing interpolative behaviour

Basic Functional Units


The basic functional unit is the simplest form of each of the 3 types of NN, viz. FF, FB & Combination NWs
The simplest FF NN is a single-layer NW

The simplest FB NN has N units, each connected to all the others & to itself

The simplest Combination of FF & FB NW [aka Competitive Learning (CL) NW] is a single-layer NW in which the units in the output layer have feedback connections among themselves

Types of ANN & their suitable tasks


FF NN
Pattern Association, Classification & Mapping

FB NN
Auto-Association, Pattern Storage (LTM), Pattern Environment Storage (LTM)

FF & FB (CL) NN
Pattern Storage (STM), Clustering & Feature Mapping

FF NN Pattern Association
[Figure: input patterns a1 to a6 mapped to output patterns b1 to b4]

For input pattern ai, the corresponding output pattern is bi
a5 & a6 are noisy versions of a3
In a5 the noise is less, so it is nearest to a3; the NW outputs b3 [as desired]; this is accretive behaviour
In a6 the noise is more, so it is nearer to a4 than to a3; the NW may output b4

Real-Life Example

Inputs are 8x8 grids of pixels with binary values. The input pattern space is a binary 64-dimensional space.

[Figure: bit-mapped images of the characters A and B]

A -> 1000001
B -> 1000010
...
Z -> 1011010

Outputs are 7-bit binary numbers (the 7-bit ASCII codes of the characters). The output pattern space is a binary 7-dimensional space.

Noisy versions of the input patterns can occur when the values of some pixels get changed, due to noise in the transmission channel or dust/stain spots on the document being scanned.

FF NN Pattern Classification
Some of the output patterns may be identical
So, a set of input patterns may correspond to the same output pattern
Each distinct output pattern acts as a class label
The input patterns corresponding to each class are samples of that class
In such cases, the NN has to classify the input patterns
That is: for each input pattern, the NN should identify the class [output pattern] to which it belongs

Real-Life Example

A 1000001

B 1000010

CL NN Pattern Classification

Accretive behaviour

FF NN Pattern Mapping
The NN is trained with some pairs of input-output patterns, not all possible pairs
When a new input pattern is given, the NN is made to find the corresponding output pattern [though the NN was not trained with this pair]
Suppose the NN has been trained with the i/o pair an & bn
If the new input pattern a is close to the known input pattern an, the NN tries to find an output pattern b which is close to bn
This is interpolative behaviour

Pattern Mapping Action


[Figure: input patterns a1 to a6 and output patterns b1 to b6]

The NN is trained with (a1,b1) to (a5,b5) only; it is not trained with the (a6,b6) pair.
a6 is closer to a3; so the NN maps it on to b6, which is closer to b3.

FB NN Pattern Association
If the input patterns are identical to the output patterns, the input & output spaces are identical
The problem reduces to auto-association
This is trivial; the NW merely stores the input patterns

If a noisy pattern arrives at the input, the NW outputs the same noisy pattern as output
This is an absence of accretive behaviour

FB NN Pattern Association (contd)

[Figure: each input pattern a1 to a5 is mapped back to itself (auto-association)]

FB NN Pattern Storage (LTM)


Auto-association with accretive behaviour
Input patterns are stored; stored patterns can be retrieved even by a noisy/approximate input pattern
Very useful in practice
Two possibilities:
Stored patterns are the same as the input patterns; the input pattern space is continuous; the output pattern space is a fixed, finite set of stored patterns
Stored patterns are some transformed versions of the input patterns; the output space has the same dimensions as the input space

FB NN Pattern Storage (contd)

FB NN Pattern Environment Storage


Pattern Environment = a set of patterns + the probabilities of their occurrence
The NW is designed to recall the patterns with the lowest probability of error
More about this in Unit VII

CL NN Pattern Storage (STM)


STM = short-term memory = temporary storage
The given input [as it is, or a transformed version] is stored
As long as the same pattern is input, the stored pattern is recalled
When a new pattern is input, the stored pattern is lost & the new pattern is stored
Such a NW is of academic interest only; it is not of practical use

CL NN Pattern Clustering
Patterns are grouped, based on similarities
Input is an individual pattern; output is the pattern of the group to which the input belongs
That is: a group of approximately similar patterns is identified with one & the same cluster label & will produce the same output pattern
Two types are possible, when a new input pattern does not belong to any known group:
It is forced into one of the existing groups (Accretive behaviour)
It is treated as belonging to a new group; if the input is close to some known input pattern x, the new group is close to x's group (Interpolative behaviour)

CL NN Pattern Clustering (contd)

Interpolative behaviour

CL NN Feature Mapping
Similar to clustering; the difference is:
Similar inputs produce similar outputs [not the same output]
The similarities of the inputs are retained at the output
No accretive behaviour; only interpolative
The output patterns are much larger [than for clustering]

Types of Learning
Supervised Learning
The desired response [output] of the system is already known/decided
The network is made [tuned/trained] to give the desired output
It is as if the network is "being taught or trained by a teacher"

Unsupervised Learning
The output is not known
The network is allowed to settle into a stable state by itself
It is as if the network is discovering special features and patterns from the available data without using any external help

Types of Learning (contd)


Reinforcement Learning
Bridges the gap between supervised & unsupervised methods
The desired output is not known
The system receives feedback from the environment:
Reward for correctness
Punishment for error
The system adapts its parameters based on this feedback

Learning Equation
Implementation of Synaptic Dynamics
An expression for the updating of the weights
It expresses the weight vector of the ith processing unit at time instant t+1, in terms of that weight vector at time instant t:
wi(t+1) = wi(t) + Δwi(t)
Δwi(t) is the change in the weight vector
Different researchers have proposed different expressions for calculating Δwi(t); these are called Learning Laws

Learning Laws
Hebb's Law [Hebbian Learning Law]
Perceptron Learning Law
Delta Learning Law
LMS Learning Law
Correlation Learning Law
Instar [Winner-Take-All] Learning Law
Outstar Learning Law

Boltzmann Learning
A Stochastic Learning Algorithm
A network designed to apply the Boltzmann Learning Rule is called a Boltzmann Machine
The neurons constitute a recurrent structure & give binary output [+1 or -1], corresponding to whether the neuron is on or off

Memory-based Learning
Past experiences = patterns which the NN has been trained to recognise/classify
Each experience is a pair of input & output patterns
All or most of the past experiences are stored in a large memory
Any new input pattern can be compared with the patterns stored in memory & the corresponding output pattern can be output

Memory-based Learning (contd)


Memory-based learning algorithms involve 2 essential ingredients:
The criterion applied to define the local neighbourhood [patterns which are similar]
The learning rule applied for training the NN
Algorithms differ based on how these 2 ingredients are defined

Summary of Learning Laws


See table 1.2 on page 35 of Yegnanarayana's book

Learning Law              Weight Update Δwij (for j = 1, 2, ..., M)   Initial Weights          Type of Learning   Remarks
Hebbian                   η si aj                                     Near zero                Unsupervised
Perceptron                η (bi - si) aj                              Random                   Supervised         Bipolar output functions
Delta                     η (bi - si) f'(xi) aj                       Random                   Supervised
Widrow-Hoff (LMS)         η (bi - wiT a) aj                           Random                   Supervised
Correlation               η bi aj                                     Near zero                Supervised
Winner-Take-All (Instar)  η (aj - wkj)                                Random, but normalised   Unsupervised       Competitive Learning; k is the winning unit
Outstar                   η (bj - wjk)                                Zero                     Supervised         Grossberg Learning
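A minimal sketch of a few of these weight-update rules as Python functions (the vectorised form, function names, and default learning rate eta are illustrative assumptions):

```python
import numpy as np

def hebbian_update(w, a, s, eta=0.1):
    """Hebbian: delta_w = eta * s * a (unsupervised)."""
    return w + eta * s * a

def perceptron_update(w, a, b, s, eta=0.1):
    """Perceptron: delta_w = eta * (b - s) * a (supervised)."""
    return w + eta * (b - s) * a

def lms_update(w, a, b, eta=0.1):
    """Widrow-Hoff (LMS): delta_w = eta * (b - w.a) * a (supervised, linear output)."""
    return w + eta * (b - np.dot(w, a)) * a

def instar_update(w_k, a, eta=0.1):
    """Winner-Take-All (Instar): move the winning unit's weights towards the input."""
    return w_k + eta * (a - w_k)
```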

End of Units I & II
