
Neurocomputing

NN-Models

Neurocomputing
Prof. Dr.-Ing. Andreas König
Institute of Integrated Sensor Systems ISE

Dept. of Electrical Engineering and Information Technology


Technische Universität Kaiserslautern

Fall Semester 2006

© Andreas König Slide 2-1

Neurocomputing
NN-Models

Course Contents:
1. Introduction
2. Rehearsal of Artificial Neural Network models relevant for
implementation and analysis of the required computational steps
3. Analysis of typical ANN-applications with regard to computational
requirements
4. Aspects of simulation of ANNs and systems
5. Efficient VLSI-implementation by simplification of the original
algorithms
6. Derivation of a taxonomy of neural hardware
7. Digital neural network hardware
8. Analog and mixed-signal neural network hardware
9. Principles of optical neural network hardware implementation
10. Evolvable hardware overview
11. Summary and Outlook

© Andreas König Slide 2-2

Neurocomputing
Chapter Contents NN-Models

2. Rehearsal of Artificial Neural Network models relevant for


implementation and analysis of the required computational steps
2.1 Discussion and analysis of the ADALINE (Perceptron) – recall and
learning requirements
2.2 Relevant neural networks and related statistical methods for
classification purposes
2.2.1 Parametric Classifiers (Normal distribution, Mahalanobis,
Euclidean Distance)
2.2.2 Nonparametric classifiers (Parzen Window, Nearest neighbor)
2.2.3 Multi-Layer-Perceptron with Backpropagation learning
2.2.4 Learning-Vector-Quantization
2.2.5 Dynamic-Nearest-Neighbor Classifiers
2.2.6 Restricted-Coulomb-Energy-Network

© Andreas König Slide 2-3

Neurocomputing
Chapter Contents NN-Models

2.2.7 Radial-Basis-Function Networks


2.2.8 Probabilistic Neural Networks
2.2.9 Hopfield Network
2.2.10 Boltzmann Machine
2.2.11 Self-Organizing Feature Map
2.2.12 Cellular Neural Network
2.2.13 Summary

© Andreas König Slide 2-4

Neurocomputing
ADALINE/Perceptron NN-Models

¾ Early neuron models, such as the famous Perceptron or ADaptive Linear


Element (ADALINE) neuron were used for (simple) classification
¾ Example of the Perceptron application for letter recognition:

[Figure: Perceptron for letter recognition — input retina with fixed connections, variable weights, one output neuron]

¾ Special learning algorithm for the Perceptron
¾ Systematic approach for the ADALINE
© Andreas König Slide 2-5

Neurocomputing
ADALINE/Perceptron NN-Models

¾ In the general case, the neuron possesses a nonlinear activation function f(net), which is the identity function in the ADALINE:

o = f\!\left( \sum_{i=1}^{m} x_i w_i \right)   (2.1)

f(net) = \frac{1}{1 + e^{-net}}   (2.2)

[Figure: dot-product neuron — stimuli x_1 … x_m, weights w_1 … w_m, cell body (activation) f(net), output o; plot of the sigmoid f(net)]

¾ In classification (recall mode), a step function is employed
© Andreas König Slide 2-6

Neurocomputing
ADALINE/Perceptron NN-Models

¾ The deviation of the actual neuron output from the desired or prescribed output in such a supervised approach can be assessed for all pattern pairs
¾ The resulting error can be displayed as an error (hyper)surface with the neuron weights as parameters, here w1 and w2:

[Figure: error surface over the weight plane (w1, w2)]

¾ Good performance corresponds to a valley location, which has to be reached !


© Andreas König Slide 2-7

Neurocomputing
ADALINE/Perceptron NN-Models

¾ Adaptation of a neuron weight is commonly achieved by gradient descent


based on an error function:
E = \frac{1}{2} \sum_{k=1}^{N} \left( y^k - f\!\left( \sum_{i=1}^{m} x_i^k w_i \right) \right)^2   (2.3)

¾ Every weight is adapted after (random) initialization according to:

\Delta w_i = -\eta \frac{\partial E}{\partial w_i}   (2.4)

¾ The gradient is computed as:

\frac{\partial E}{\partial w_i} = -\sum_{k=1}^{N} \left( y^k - f\!\left( \sum_{i=1}^{m} x_i^k w_i \right) \right) \cdot f'\!\left( \sum_{i=1}^{m} x_i^k w_i \right) \cdot x_i^k   (2.5)

¾ Inserting (2.5) into (2.4) yields the batch learning rule; reducing the batch size to one yields the on-line learning rule with immediate weight adaptation
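To make the recall and learning requirements concrete, a minimal Python/NumPy sketch of the on-line delta rule for a single linear neuron (ADALINE, f = identity) is given below; data shapes and parameter values are illustrative only, not part of the original slides:

import numpy as np

def adaline_online(X, y, eta=0.01, epochs=50):
    """On-line delta rule for a single linear neuron (f = identity).
    X: (N, m) patterns, y: (N,) targets."""
    w = np.random.uniform(-0.1, 0.1, X.shape[1])   # random initialization
    for _ in range(epochs):
        for x_k, y_k in zip(X, y):
            o = np.dot(x_k, w)                     # forward phase, eq. (2.1)
            w += eta * (y_k - o) * x_k             # on-line update, eqs. (2.4)/(2.5) with f' = 1
    return w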

© Andreas König Slide 2-8

Neurocomputing
ADALINE/Perceptron NN-Models

¾ Additional implementation requirement for implementing the Delta-Rule


(ADALINE):
[Figure: data path of the Delta rule — δ_j is formed from the difference between target y_j^k and output y_j, scaled by η and x_i, accumulated to Δw_ij, and added to w_ij^old to give w_ij^new]

¾ For batch learning, the need for accumulation and intermediate storage of the individual batch pattern contributions must be regarded here

¾ Learning requires availability of the previous forward-phase results ( y_j^k )

¾ An arbitrary nonlinearity, e.g., (2.2), requires the availability of its derivative and an additional scaling (multiplication) step !
© Andreas König Slide 2-9

Neurocomputing
ADALINE/Perceptron NN-Models

¾ Simple dot product neurons can serve as linear classifiers:

o = f\!\left( \sum_{i=1}^{m} x_i w_i + 1 \cdot w_0 \right) = f\!\left( \sum_{i=0}^{m} x_i w_i \right)

[Figure: dot-product neuron with an additional threshold input +1 weighted by w_0; separating line in the (x1, x2) plane between Class 1 (o = -1) and Class 2 (o = +1)]

¾ A single neuron can separate a linearly separable problem with a separating line (plane, hyperplane)
¾ Logical combinations of a layer allow tackling non-linear problems !
© Andreas König Slide 2-10

Neurocomputing
ADALINE/Perceptron NN-Models

¾ Simple distance neurons can also serve as classifiers with spherical regions:

o = f\!\left( \sum_{i=1}^{m} (x_i - w_i)^2 - R \right)

[Figure: distance neuron with radius parameter R; in the (x1, x2) plane a circle of radius R around the weight vector (w1, w2) separates the two classes]

¾ A single neuron can separate a region by a radius-limited (hyper)sphere
¾ The norm of the distance vector is computed and compared with the sphere radius; inside gives +1, outside returns -1
© Andreas König Slide 2-11

Neurocomputing
Relevant ANN for Classification NN-Models

¾ System context of neural network application:


[Figure: recognition system chain — receiver & segmentation → preprocessing → feature extraction → classification (feature space, class labels) → knowledge-based interpretation; example: address reading on mail pieces; sensor front-ends: vision (CCD, CMOS), IR, UV, SAR, US, THz, sonic, olfaction, degustation, ...]

© Andreas König Slide 2-12

Neurocomputing
Relevant ANN for Classification NN-Models

¾ In the following, artificial neural networks will be regarded in their role as classifiers, where they are most commonly applied

¾ They will be compared with established techniques of statistical pattern


recognition

¾ Both performance and implementation requirement/cost will be regarded

¾ The objective is the determination of basic building blocks and operations


common to most regarded algorithms

¾ Later, implementation options for these basic building blocks will be


considered

¾ ANN serve in different places of recognition systems and in very different


applications !

© Andreas König Slide 2-13

Neurocomputing
Relevant ANN for Classification NN-Models

¾ Numerous options to define estimation functions


¾ Taxonomy of important classification methods:

classification methods
   statistical approaches
      parametric: BAC/MLC, MAC, EAC, CBC
      non-parametric: Parzen window (→ PNN), hyper sphere (→ RCE), kNN (→ LVQ), decision tree
   function approximation: polynomial classifier (→ RBF/BP/CasCor)

© Andreas König Slide 2-14

Neurocomputing
Parametric Methods NN-Models

¾ The Bayes-Normal distribution Classifier (BAC) assumes Gaussian


distributions for the classes
¾ The decision functions are determined as

\hat{d}_i(\vec{x}) = \frac{P(\omega_i)}{\sqrt{(2\pi)^N \det K_i}} \; e^{-\frac{1}{2}(\vec{x}-\vec{\mu}_i)^T K_i^{-1}(\vec{x}-\vec{\mu}_i)}   (2.6)

© Andreas König Slide 2-15

Neurocomputing
Parametric Methods NN-Models

¾ The intersections of two class region probability distribution functions


(pdf) define the class borders as lines of equiprobability

sketch of
class boundary

¾ Parabolic class boundaries result in the two-dimensional example


¾ In the case of more than two classes, class regions are defined by
intersections of resulting parabolic functions

© Andreas König Slide 2-16

Neurocomputing
Parametric Methods NN-Models

¾ Assuming equal a priori values for all classes returns the Maximum-Likelihood-Classifier (MLC):

\hat{d}_i(\vec{x}) = -\frac{1}{2}\ln(\det K_i) - \frac{1}{2}(\vec{x}-\vec{\mu}_i)^T K_i^{-1}(\vec{x}-\vec{\mu}_i)   (2.7)

¾ Assuming further equal covariance for all classes returns the Mahalanobis-Classifier (MAC):

\hat{d}_i(\vec{x}) = -\frac{1}{2}(\vec{x}-\vec{\mu}_i)^T K^{-1}(\vec{x}-\vec{\mu}_i)   (2.8)

[Figure: sketch of the resulting class boundary in the (x1, x2) plane]
© Andreas König Slide 2-17

Neurocomputing
Parametric Methods NN-Models

¾ Assuming further the covariance matrix to be the identity matrix returns the Centroid or Euclidean Distance Classifier (EAC):

\hat{d}_i(\vec{x}) = \left\| \vec{x} - \vec{\mu}_i \right\|^2 = \sum_{j=0}^{N-1} (x_j - \mu_j)^2   (2.9)

[Figure: sketch of the resulting piecewise-linear class boundary in the (x1, x2) plane]

¾ Simplifying the metric returns the City-Block-Classifier (CBC):

\hat{d}_i(\vec{x}) = \sum_{j=0}^{N-1} \left| x_j - \mu_j \right|   (2.10)
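As an illustration of how little computation the parametric decision functions (2.7)–(2.10) require per class, a minimal NumPy sketch is given below; the class means and covariance matrices are assumed to be estimated beforehand (function names are hypothetical):

import numpy as np

def d_mlc(x, mu_i, K_i):
    """Maximum-Likelihood discriminant, eq. (2.7); pick the class with the largest value."""
    diff = x - mu_i
    return -0.5 * np.log(np.linalg.det(K_i)) - 0.5 * diff @ np.linalg.inv(K_i) @ diff

def d_mac(x, mu_i, K):
    """Mahalanobis discriminant with shared covariance K, eq. (2.8)."""
    diff = x - mu_i
    return -0.5 * diff @ np.linalg.inv(K) @ diff

def d_eac(x, mu_i):
    """Squared Euclidean distance, eq. (2.9); classification picks the minimum."""
    return np.sum((x - mu_i) ** 2)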
© Andreas König Slide 2-18

Neurocomputing
Nonparametric Methods NN-Models

¾ Parametric methods condense all the sample set information into few
model parameters and perform a global pdf estimation
¾ In contrast, nonparametric methods perform a local pdf estimation:

p(\vec{x}) = \frac{k(\vec{x})}{N \cdot \nu}   (2.11)

¾ Assumption of either a fixed volume ν or a fixed number of patterns k

¾ The Parzen-window classifier is based on the first alternative:

p_{Parzen}(\vec{x}) = \frac{1}{N \cdot h_N^M} \sum_{i=1}^{N} \kappa\!\left( \frac{\vec{x}-\vec{x}_i}{h_N} \right)   (2.12)

¾ In the simplest case, the kernels used in (2.12) could be Gaussian functions:

[Figure: class-wise summation of the kernel contributions in a window of width h_N; the class with the maximum summed contribution wins]
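A minimal Python sketch of the Parzen estimate (2.12) with Gaussian kernels; the kernel's normalization constant is omitted here, since it is common to all classes and cancels in the Max-of-L decision (an assumption of this sketch, not stated on the slide):

import numpy as np

def parzen_pdf(x, samples, h):
    """Parzen window density estimate, eq. (2.12), with unnormalized Gaussian kernel."""
    N, M = samples.shape
    d2 = np.sum((samples - x) ** 2, axis=1)      # squared distances to all stored samples
    k = np.exp(-d2 / (2.0 * h ** 2))             # Gaussian kernel contributions
    return k.sum() / (N * h ** M)

def parzen_classify(x, class_samples, h):
    """Assign x to the class with the maximum estimated density (Max-of-L)."""
    return int(np.argmax([parzen_pdf(x, s, h) for s in class_samples]))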
© Andreas König Slide 2-19

Neurocomputing
Nonparametric Methods NN-Models

¾ For the given choice of Gaussian kernels, the implementation could look like:

[Figure: per kernel j — subtract w_ij from x_i, square, accumulate, divide by 2σ² (replaceable by multiplication with a precomputed factor for fixed σ), and exponentiate to obtain o_j; class-wise summation of the kernel outputs followed by a Max-of-L decision delivering the class index and pdf value(s)]
© Andreas König Slide 2-20

Neurocomputing
Nonparametric Methods NN-Models

¾ The kernel width must be adapted to data density


¾ k-nearest-neighbor classifiers (kNN), under control of the parameter k, estimate the density and determine the class affiliation of a new pattern by evaluating the k nearest neighbors:

p_{kNN}(\vec{x}) = \frac{k-1}{N \cdot \nu(\vec{x})}   (2.13)

[Figure: query point in the (x1, x2) plane with the volume spanned by its k=5 nearest neighbors; vote counts Class 1: 0/5 = 0.0, Class 2: 3/5 = 0.6, Class 3: 2/5 = 0.4 → maximum for Class 2]
¾ kNN training means storage of all patterns !
¾ Two basic variations: voting and volumetric kNN
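A voting kNN classifier in the spirit of the example above can be sketched as follows (all training patterns are stored as-is, reflecting the memory demand noted above; NumPy arrays assumed):

import numpy as np

def knn_classify(x, X_train, y_train, k=5):
    """Voting k-nearest-neighbor classification with Euclidean metric."""
    d2 = np.sum((X_train - x) ** 2, axis=1)      # distances to all stored patterns
    nn = np.argsort(d2)[:k]                      # indices of the k nearest neighbors
    labels, votes = np.unique(y_train[nn], return_counts=True)
    return labels[np.argmax(votes)]              # majority vote (Max-of-L)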
© Andreas König Slide 2-21

Neurocomputing
Nonparametric Methods NN-Models

¾ Sketch of the class boundary for 1-NN classification (k=1):

[Figure: 1-NN class boundary in the (x1, x2) plane for three classes]

¾ The class-specific Voronoi tessellation defines the class borders in this case

© Andreas König Slide 2-22

Neurocomputing
Nonparametric Methods NN-Models

¾ Implementation requirements of kNN (special case k=1):

[Figure: per reference vector — subtract w_ij from x_i, square, accumulate, then a minimum search over all distances yields o_j]

¾ Parallel implementation of the individual neurons advocates parallelisation of the minimum (maximum) search:

[Figure: array of K parallel distance-computing processing elements feeding a parallel minimum search that delivers the class index, neighbor indices, and pdf value(s)]
© Andreas König Slide 2-23

Neurocomputing
Nonparametric Methods NN-Models

¾ Edited-Nearest-Neighbor-classification (ENN, Devijver and Kittler, 1980)


¾ For k=1, resubstitution is guaranteed; however, generalization can be affected by this over-specialization in the following situations:

[Figure: outlier patterns of one class lying inside the regions of other classes (k=5 neighborhood shown)]

¾ Automatic determination of the outliers in the data set:

G_{ENN} = \frac{kNN_{\omega = \omega_j}}{kNN_{\omega \neq \omega_j}}   (2.14)

¾ The k neighbors are determined and investigated for class affiliation
¾ Elimination if the ratio of same to different class affiliations is below a threshold:

G_{ENN} < \Theta   (2.15)

¾ Edited sample set is reduced, resubstitution no longer assured,


generalization generally improved by outlier elimination (rf. qo)
© Andreas König Slide 2-24

Neurocomputing
Nonparametric Methods NN-Models

¾ Condensed-Nearest-Neighbor-classification (CoNN, Hart, 1968)


¾ The complete storage in kNN requires large memory & long computation
¾ Reduction by limiting storage to vectors defining class boundary

[Figure: sample set in the (x1, x2) plane with the reference vectors stored by CoNN]

Pseudo-code of the CoNN algorithm:
1. Initially empty classifier
2. Get first (next) pattern x_i
3. 1-NN-classify the sample set pattern
   If correct_classification goto 4
   Else insert pattern x_i; goto 2
4. If all_patterns_corr_class break;
   Else goto 2

¾ Algorithm reduces effort but depends on sample set presentation order and
leaves redundancy in the CoNN; recall by 1-NN mandatory !
© Andreas König Slide 2-25

Neurocomputing
Nonparametric Methods NN-Models

¾ Condensed-Nearest-Neighbor-classification (CoNN, Hart, 1968)


¾ Step-wise demonstration:

[Figure: the first pattern is inserted as the first reference vector; pseudo-code of the CoNN algorithm as on the previous slide]

¾ All class 3 patterns in the following will be classified correctly

© Andreas König Slide 2-26

Neurocomputing
Nonparametric Methods NN-Models

¾ Condensed-Nearest-Neighbor-classification (CoNN, Hart, 1968)


¾ Sketch of potential final solution:

[Figure: sketch of a potential final reference vector set containing a potentially redundant reference vector; pseudo-code of the CoNN algorithm as on the previous slide]

¾ Removal of existing redundancy by following step

© Andreas König Slide 2-27

Neurocomputing
Nonparametric Methods NN-Models

¾ Reduced-Nearest-Neighbor-classification (RNN, Gates, 1972)


¾ Addition of a removal step to clean reference vector set from redundant
instances, i.e., vectors not required for perfect resubstitution

[Figure: the potentially redundant reference vector left by CoNN is removed]

Pseudo-code of the RNN algorithm:
1. Run the CoNN algorithm
2. Tentatively remove the first (next) reference vector r_i
3. 1-NN-classify all sample set patterns
   If correct_classification permanently remove r_i; goto 4
   Else restore r_i; goto 4
4. If last_ref_vec break;
   Else goto 2
¾ CoNN redundancy is eliminated, but a strong dependence on the presentation order remains
¾ Alternatively, CoNN and RNN steps can be interleaved in a modified, potentially faster algorithm; however, limit cycles can occur (rf. qs)
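A compact Python sketch of Hart's condensing step is given below (the RNN pruning pass would follow the same pattern, tentatively removing one stored vector at a time); this is an illustration under the stated pseudo-code, not the exact course implementation:

import numpy as np

def one_nn(x, refs, ref_labels):
    d2 = np.sum((refs - x) ** 2, axis=1)
    return ref_labels[int(np.argmin(d2))]

def conn_condense(X, y, max_passes=100):
    """Hart's CoNN: keep only patterns needed for correct 1-NN resubstitution."""
    refs, labels = [X[0]], [y[0]]                      # start with the first pattern
    for _ in range(max_passes):
        inserted = False
        for x_i, y_i in zip(X, y):
            if one_nn(x_i, np.array(refs), np.array(labels)) != y_i:
                refs.append(x_i); labels.append(y_i)   # insert misclassified pattern
                inserted = True
        if not inserted:                               # all patterns correctly classified
            break
    return np.array(refs), np.array(labels)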
© Andreas König Slide 2-28

Neurocomputing
Nonparametric Methods NN-Models

¾ QuickCog example for Iris data:

[Figure: QuickCog 2D projection of the Iris data (mirrored due to the projection) with the numbered sample patterns and the selected reference vectors]

¾ With the given settings, 10 reference vectors are chosen (qs=0.906666)

© Andreas König Slide 2-29

Neurocomputing
Nonparametric Methods NN-Models

¾ A traditional method in OCR used for nonparametric classification, based on the function approximation approach, is the polynomial classifier:

\hat{d}_i(\vec{x}) = a_{0,i} + a_{1,i} x_1 + a_{2,i} x_2 + \ldots + a_{N,i} x_N + a_{N+1,i} x_1^2 + a_{N+2,i} x_1 x_2 + \ldots   (2.16)

¾ Expressing (2.16) by new coordinates results in

\vec{v} = (v_1, v_2, \ldots, v_p) = (1, x_1, x_2, \ldots, x_N, x_1^2, x_1 x_2, \ldots)   (2.17)

\hat{\vec{d}} = A^T \cdot \vec{v}(\vec{x})   (2.18)

¾ LMS-optimization to determine the polynomial coefficients of A
¾ Similarity to a dot product neuron with nonlinear synapses !
¾ Excessive growth of the number of variables in (2.17) for increasing order G of the polynomial:

p = \binom{N+G}{G} = \frac{(N+G)!}{N! \cdot G!}   (2.19)

¾ Data compression/term selection, typically G = 1…3 (Schürmann, Kressel)
© Andreas König Slide 2-30

Neurocomputing
Neural Networks NN-Models
¾ A multilayer perceptron with the backpropagation algorithm serves as a classifier for non-linearly separable problems:

[Figure: MLP structure — input layer x_k, hidden layer h_j with weights w_jk, output layer o_i with weights w_ij, and a Max-of-L decision for the class affiliation]
¾ Proven to be a universal function approximator with one nonlinear HL
¾ A learning rule is required for this network, in particular for the hidden layer(s)
¾ Choice of the hidden layer size and the learning parameters can be difficult
¾ Resubstitution is not guaranteed; generalization can be surprising
© Andreas König Slide 2-31

Neurocomputing
Neural Networks NN-Models

¾ Introducing the following abbreviations using the notation given with the network structure:

o_i = f\!\left( \sum_j h_j w_{ij} \right); \qquad h_j = f\!\left( \sum_k x_k w_{jk} \right)   (2.20)

¾ With (2.20) the error can be expressed as:

E = \frac{1}{2} \sum_{\mu=1}^{N} \sum_{i=1}^{L} \left( y_i^\mu - f\!\left( \sum_j h_j^\mu w_{ij} \right) \right)^2   (2.21)

¾ Every weight is adapted after (random) initialization according to:

\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}   (2.22)

¾ The gradient for the output layer weights is computed as:

\frac{\partial E}{\partial w_{ij}} = -\sum_{\mu=1}^{N} \left( y_i^\mu - f\!\left( \sum_j h_j^\mu w_{ij} \right) \right) \cdot f'\!\left( \sum_j h_j^\mu w_{ij} \right) \cdot h_j^\mu   (2.23)
© Andreas König Slide 2-32

Neurocomputing
Neural Networks NN-Models

¾ This can be expressed employing the abbreviations of (2.20) as

\frac{\partial E}{\partial w_{ij}} = -\sum_{\mu=1}^{N} \left( y_i^\mu - o_i^\mu \right) \cdot o_i'^\mu \cdot h_j^\mu   (2.24)

¾ Inserting in (2.22) gives the output layer batch adaptation rule

\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} = \eta \sum_{\mu=1}^{N} \left( y_i^\mu - o_i^\mu \right) \cdot o_i'^\mu \cdot h_j^\mu   (2.25)

¾ For the hidden layer adaptation rule, the error function must be expanded:

E = \frac{1}{2} \sum_{\mu=1}^{N} \sum_{i=1}^{L} \left( y_i^\mu - f\!\left( \sum_j w_{ij} f\!\left( \sum_k x_k^\mu w_{jk} \right) \right) \right)^2   (2.26)

(chain of intermediate quantities: net_j → h_j → net_i → o_i → e_i)

© Andreas König Slide 2-33

Neurocomputing
Neural Networks NN-Models

¾ Every hidden weight is adapted after (random) initialization according to:

\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}}   (2.27)

¾ The gradient for the hidden layer weights is computed by application of the chain rule:

\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial e_i} \cdot \frac{\partial e_i}{\partial o_i} \cdot \frac{\partial o_i}{\partial net_i} \cdot \frac{\partial net_i}{\partial h_j} \cdot \frac{\partial h_j}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{jk}}   (2.28)

\frac{\partial E}{\partial w_{jk}} = -\sum_{\mu=1}^{N} \sum_{i=1}^{L} \left( y_i^\mu - f\!\left( \sum_j h_j^\mu w_{ij} \right) \right) \cdot f'\!\left( \sum_j h_j^\mu w_{ij} \right) \cdot w_{ij} \cdot f'\!\left( \sum_k x_k^\mu w_{jk} \right) \cdot x_k^\mu   (2.29)

¾ This can again be expressed employing the abbreviations of (2.20) as

\frac{\partial E}{\partial w_{jk}} = -\sum_{\mu=1}^{N} \sum_{i=1}^{L} \left( y_i^\mu - o_i^\mu \right) \cdot o_i'^\mu \cdot w_{ij} \cdot h_j'^\mu \cdot x_k^\mu   (2.30)

© Andreas König Slide 2-34

Neurocomputing
Neural Networks NN-Models

¾ Introduction of error or δ-terms with

\delta_i^\mu = \left( y_i^\mu - o_i^\mu \right) \cdot o_i'^\mu   (2.31)

\delta_j^\mu = h_j'^\mu \cdot \sum_{i=1}^{L} w_{ij}\, \delta_i^\mu   (2.32)

¾ ... allows a compact representation of the adaptation rules:

\Delta w_{ij} = \eta \sum_{\mu=1}^{N} \delta_i^\mu \cdot h_j^\mu   (2.33)

\Delta w_{jk} = \eta \sum_{\mu=1}^{N} \delta_j^\mu \cdot x_k^\mu   (2.34)

¾ This learning rule is denoted as error-backpropagation learning rule


¾ Numerous variants of this vanilla approach are in existence to improve
learning behavior, e.g., introduction of a momentum term or adaptive η
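The compact rules (2.31)–(2.34) translate directly into an on-line training step for a one-hidden-layer MLP; the following NumPy sketch uses sigmoid activations and omits the bias/threshold weights for brevity (layer shapes are illustrative assumptions):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, y, W_jk, W_ij, eta=0.1):
    """One on-line backpropagation step for a 1-hidden-layer MLP.
    W_jk: hidden weights (n_hidden, n_in), W_ij: output weights (n_out, n_hidden)."""
    h = sigmoid(W_jk @ x)                         # hidden activations, eq. (2.20)
    o = sigmoid(W_ij @ h)                         # outputs, eq. (2.20)
    delta_i = (y - o) * o * (1.0 - o)             # output deltas, eq. (2.31)
    delta_j = h * (1.0 - h) * (W_ij.T @ delta_i)  # hidden deltas, eq. (2.32)
    W_ij += eta * np.outer(delta_i, h)            # output layer update, eq. (2.33)
    W_jk += eta * np.outer(delta_j, x)            # hidden layer update, eq. (2.34)
    return W_jk, W_ij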
© Andreas König Slide 2-35

Neurocomputing
Neural Networks NN-Models

¾ Learning rule requirements for output layer:


[Figure: data path of the output layer learning rule — δ_i is formed from (y_i^μ − o_i^μ) and the derivative o_i'^μ of the nonlinearity (NL), scaled by η and h_j^μ, accumulated to Δw_ij, and added to w_ij^old to give w_ij^new; plots of the sigmoid and of its derivative, which is close to zero in the saturation regions]

For batch learning, the need for accumulation and intermediate storage of the individual batch pattern contributions must be regarded here

© Andreas König Slide 2-36

Neurocomputing
Neural Networks NN-Models

¾ Data flow in forward and backward propagation:

[Figure: the forward pass produces the outputs o_i; the error terms δ_i = (y_i^μ − o_i^μ) · o_i'^μ are propagated backwards through the weights w_ij (∑_i w_ij δ_i → δ_j) and further through w_jk (∑_j w_jk δ_j → δ_k)]

¾ Implies transposed access to the weight memory
¾ Learning of the lower layers recursively requires data from the previous layer (∑_i w_ij δ_i)

¾ Each neuron has an additional weight connected to constant +/-1 (threshold)


© Andreas König Slide 2-37

Neurocomputing
Neural Networks NN-Models

¾ Commonly this „vanilla“ backpropagation learning algorithm is implemented in NN-HW; it is nice and regular !
¾ Improved, faster variants tend to lose that advantage
¾ Momentum term extension:

\Delta w_{ij}(t) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha \cdot \Delta w_{ij}(t-1)   (2.35)

¾ Potentially helps to avoid sluggish behavior or oscillations
¾ Globally adaptive learning rate rule:

\Delta \eta = \begin{cases} +\kappa & \text{for } \Delta E < 0 \\ -\varphi \cdot \eta & \text{for } \Delta E > 0 \\ 0 & \text{else} \end{cases}   (2.36)

© Andreas König Slide 2-38

Neurocomputing
Neural Networks NN-Models

¾ Locally adaptive learning rate rule:

\Delta \eta_{ij} = \begin{cases} +\kappa & \text{for } \bar{\delta}(t-1)\,\delta(t) > 0 \\ -\varphi \cdot \eta_{ij} & \text{for } \bar{\delta}(t-1)\,\delta(t) < 0 \\ 0 & \text{else} \end{cases}   (2.37)

with \delta(t) = \frac{\partial E}{\partial w_{ij}} and \bar{\delta}(t) = (1-\Theta)\,\delta(t) + \Theta\,\bar{\delta}(t-1)

¾ Denoted as the Delta-Bar-Delta rule
¾ More information on the error surface is provided by the 2nd order derivatives (Hessian matrix)
¾ Requires storage in the order of the square of the number of weights
¾ Restricted effort by using only the diagonal elements of the Hessian matrix:

\Delta w_{ij}(t) = -\eta \, \frac{\partial E / \partial w_{ij}}{\partial^2 E / \partial w_{ij}^2}   (2.38)

(in practice extended to avoid division by zero)
© Andreas König Slide 2-39

Neurocomputing
Neural Networks NN-Models

¾ Model-based acceleration technique, denoted as Quickprop, by Fahlman:


[Figure: parabolic model of the error cross-section over w_ij — from the two known gradient values at w_ij(t−1) and w_ij(t) the minimum location w_ij(t+1) is estimated]

¾ Fitting a parabola y = ax² + bx + c through the two known gradients y_1' = 2ax_1 + b and y_2' = 2ax_2 + b gives y_2' − y_1' = 2a(x_2 − x_1); setting 0 = 2ax_3 + b yields the minimum estimate x_3 = x_2 + \frac{y_2'}{y_1' - y_2'}\,(x_2 - x_1), leading to the weight update

w_{ij}(t+1) = w_{ij}(t) + \frac{\frac{\partial E}{\partial w_{ij}}(t)}{\frac{\partial E}{\partial w_{ij}}(t-1) - \frac{\partial E}{\partial w_{ij}}(t)} \left( w_{ij}(t) - w_{ij}(t-1) \right)   (2.39)
¾ Weight increment limited by maximum step size given by µ !
¾ Epoch or batch learning rule !
© Andreas König Slide 2-40

Neurocomputing
Neural Networks - LVQ NN-Models

¾ A learning vector quantization network (LVQ) employs a winner-take-all mechanism that leaves only the strongest response active:

[Figure: LVQ structure — input layer x_k, WTA layer with weights w_ij, output (classification) layer assigning Class 1 … Class L]

¾ Adjustment of a fixed number of reference vectors, commonly initialized randomly or by SOM training
¾ Recall via 1-NN classification
© Andreas König Slide 2-41

Neurocomputing
Neural Networks - LVQ NN-Models

¾ Basic LVQ-1 learning method [Kohonen 1989]
¾ Iterative presentation of training data and WTA computation finding w_c(t):

\vec{w}_c(t+1) = \vec{w}_c(t) + \alpha(t)\,[\vec{x}(t) - \vec{w}_c(t)] \quad \text{if } \omega_c = \omega_x
\vec{w}_c(t+1) = \vec{w}_c(t) - \alpha(t)\,[\vec{x}(t) - \vec{w}_c(t)] \quad \text{if } \omega_c \neq \omega_x
\vec{w}_i(t+1) = \vec{w}_i(t) \quad \forall\, i \neq c
(2.40)

[Figure: the winning reference vector w_c(t) is moved along x(t) − w_c(t) to w_c(t+1) in the (w1/x1, w2/x2) plane; plot of the temporal decay of the learning rate α(t)]
¾ Dead reference vectors wi can occur, sufficient no. per class not assured
¾ Different initialization, e.g., RNN, could be suitable
¾ Basic methods extended by improved versions LVQ2/2.1 & LVQ3
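An LVQ-1 training sweep following (2.40) might look like the minimal sketch below; the reference vectors and their class labels are assumed to be initialized beforehand (e.g., randomly or by RNN), and the learning rate decay is handled by the caller:

import numpy as np

def lvq1_epoch(X, y, W, w_labels, alpha):
    """One LVQ-1 epoch, eq. (2.40): move the winner towards (same class)
    or away from (different class) the presented pattern."""
    for x_t, y_t in zip(X, y):
        c = int(np.argmin(np.sum((W - x_t) ** 2, axis=1)))   # WTA: closest reference vector
        sign = 1.0 if w_labels[c] == y_t else -1.0
        W[c] += sign * alpha * (x_t - W[c])
    return W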
© Andreas König Slide 2-42

Neurocomputing
Neural Networks - LVQ NN-Models

¾ LVQ-2 learning method [Kohonen 1989]
¾ Now two weight vectors are determined for a new pattern:

\vec{w}_i(t+1) = \vec{w}_i(t) + \alpha(t)\,[\vec{x}(t) - \vec{w}_i(t)] \quad \text{if } \omega_i = \omega_x
\vec{w}_l(t+1) = \vec{w}_l(t) - \alpha(t)\,[\vec{x}(t) - \vec{w}_l(t)] \quad \text{if } \omega_l \neq \omega_x
\vec{w}_k(t+1) = \vec{w}_k(t) \quad \forall\, k \neq i, l
(2.41)

¾ Window computation:

\min\!\left( \frac{d_{xi}}{d_{xl}}, \frac{d_{xl}}{d_{xi}} \right) > \hat{\eta} = \frac{1-w}{1+w}   (2.42)

[Figure: the reference vectors w_i(t) and w_l(t) on either side of the class border are moved towards/away from x(t), shifting the old class border to the new one]

¾ Monotonic decrease of d_{li}
¾ Remedy: accept vectors on both sides of the window (LVQ 2.1)
© Andreas König Slide 2-43

Neurocomputing
Neural Networks - LVQ NN-Models

¾ Computational requirements in forward phase as for 1-NN methods


¾ Requirements in learning (LVQ-1):

[Figure: learning data path — the difference x_i − w_ci is scaled by the decaying learning rate α(t) and added to (right class) or subtracted from (wrong class) w_ij^old to give w_ij^new]

¾ Main computational effort in forward phase (all neurons compute)


¾ Adaptation requires only very few neurons to update their weights
¾ This leads to inefficient use of potentially parallel hardware
¾ Window computation and decaying learn rate computation impose
additional demands on computing resources

© Andreas König Slide 2-44

Neurocomputing
Neural Networks - RCE NN-Models

¾ A special class of ANN consists of a kernel layer and an output layer:

[Figure: network structure — input layer x_k, kernel layer computing ||w_j − x_k|| with radii R_j, output (classification) layer summing the activated kernels per class, and resolving of the class affiliation, e.g., by Max-of-L]
¾ The Restricted-Coulomb-Energy network (RCE) employs step functions
in the kernel layer and or-gate or summation of activated kernel neurons
¾ The network is generated from scratch by (patented) dynamic training
© Andreas König Slide 2-45

Neurocomputing
Neural Networks - RCE NN-Models

¾ RCE-training is part of the Nestor-Learning-System (NLS)


¾ Dynamic placement and scaling of hyperspheres:

[Figure: hyperspheres with radii between R_min and R_max placed in the (x1, x2) plane]

Pseudo-code of the RCE algorithm:
1. Initially empty classifier
2. Get first (next) pattern x_i
3. RCE-classify pattern
   If correct_classification goto 4
   Else if unknown insert x_i; goto 2
   Else if ambiguous reduce radii of r_i with ω_i ≠ ω_j; goto 2
   Else (misclassification) reduce radii of r_i with ω_i ≠ ω_j; insert x_i; goto 2
4. If all_patterns_corr_class break;
   Else goto 2
¾ Result strongly depends on presentation order (Pro-RCE extension)
¾ Insertion only does not remove redundant neurons
¾ Classification resolving by voting or kernel based pdf estimation (PRCE)
¾ Additional attributes unknown and uncertain/ambiguous
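A strongly simplified, single-pass Python illustration of the dynamic RCE placement and radius shrinking (a sketch of the pseudo-code above, not the full, patented NLS training) could look like this:

import numpy as np

def rce_train_pass(X, y, R_max, R_min):
    """One presentation pass of simplified RCE training: insert uncovered or
    misclassified patterns as new hyperspheres with radius R_max and shrink
    wrong-class spheres that cover the pattern (not below R_min)."""
    centers, radii, labels = [], [], []
    for x_i, y_i in zip(X, y):
        d = np.array([np.linalg.norm(x_i - c) for c in centers])
        active = [k for k in range(len(centers)) if d[k] <= radii[k]]
        if not any(labels[k] == y_i for k in active):        # unknown or misclassified
            centers.append(x_i.copy()); radii.append(R_max); labels.append(y_i)
        for k in active:                                     # shrink conflicting spheres
            if labels[k] != y_i:
                radii[k] = max(R_min, min(radii[k], 0.999 * d[k]))
    return centers, radii, labels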
© Andreas König Slide 2-46

Neurocomputing
Neural Networks - RCE NN-Models

¾ RCE-training is part of the Nestor-Learning-System (NLS):

[Figure: RCE hypersphere placement for mechatronic data and for the Iris data (classes 1 and 2)]

© Andreas König Slide 2-47

Neurocomputing
Neural Networks - RCE NN-Models

¾ Computational requirements of RCE forward phase:

[Figure: per kernel neuron — subtract w_ij from x_i, square, accumulate, subtract R_j, and threshold (> 0) to signal an activated hypersphere (o_j > R_j vs. o_j ≤ R_j) to the global class assignment]
¾ Evaluation of activated hyperspheres and result determination (class
assignment) according to global rules
¾ Not a really regular structure, not easily amenable to parallel implementation
¾ Computational requirements of RCE learning phase:
• Pattern storage requires fast transfer to memory
• Initial setting of radii to Rmax
• Radius reduction requires distance computation between two weight
vectors and comparison to Rmin
• Rather irregular process, not amenable to parallelization
• Overlap of forward and training phase hard due to resource conflicts
© Andreas König Slide 2-48

Neurocomputing
Neural Networks - RCE NN-Models

¾ Rough sketch of a potential RCE architecture:

[Figure: rough sketch of a potential RCE architecture — a processing element (PE) operates time-multiplexed on reference vector memories holding the first, second, third, ... stored patterns and the associated radii R_1 … R_N and R_{N+1} … R_{2N}]

¾ One more memory: Storage of reference vectors class affiliations !


© Andreas König Slide 2-49

Neurocomputing
Neural Networks - RBF NN-Models

¾ The Radial-Basis-Function network (RBF) employs (commonly Gaussian) kernel functions in the kernel layer and dot product output layer neurons:

[Figure: network structure — input layer x_k, kernel layer computing ||w_j − x_k|| with widths σ_j, output (classification) layer, and resolving of the class affiliation, e.g., by Max-of-L; heuristic width choice σ_j = d_{ker\_max} / \sqrt{2M}]

¾ RBF networks are also universal function approximators !
¾ The number of kernels is fixed by choice, with random or SOM-based initialization
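The RBF forward phase (Gaussian kernel layer followed by linear dot-product output neurons) can be sketched compactly; the centers, widths, and output weights are assumed to be given, and the function name is hypothetical:

import numpy as np

def rbf_forward(x, centers, sigmas, W_out):
    """RBF forward phase: Gaussian kernel activations, then linear output layer."""
    d2 = np.sum((centers - x) ** 2, axis=1)      # squared distances to all kernel centers
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))      # Gaussian kernel outputs
    return W_out @ phi                           # dot-product output neurons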
© Andreas König Slide 2-50

Neurocomputing
Neural Networks - RBF NN-Models
¾ RBF-training on a fixed hidden layer might not be efficient
¾ Dynamic training algorithms for function approximation and classification
¾ The first one is Platt's Resource-Allocation-Network (RAN)

[Figure: sketch of the kernel widths σ_j placed in the (x1, x2) plane]

Pseudo-code of the principal RAN algorithm:
1. Initially empty kernel layer
2. Get first (next) pattern x_i
3. Compute output value
   If Error > ε and d > δ
      insert x_i as new kernel; σ_j = min(κ·δ, κ·d)
   Else adapt output layer weights ...
4. If sum_err < ε2 break;
   Else δ = δ·e^(-1/τ); goto 2
¾ Fast (evolving) training, insertion only does not remove redundant neurons
¾ Classification by determination of maximum pdf; background applicable !
¾ Training for all RBF-parameters can be achieved by gradient descent !
¾ Smooth and well-generalizing behavior of RBF-networks
© Andreas König Slide 2-51

Neurocomputing
Neural Networks - RBF NN-Models

¾ Computational requirements of RBF forward phase:

• Output layer neurons correspond to MLP output neurons


• Hidden layer neurons correspond to distance neuron computation,
where the computed distance is subject to a Gaussian non-linearity
with potentially global or local σ

¾ Computational requirements of RBF learning phase:

• Requirements vary with chosen learning approach


• Output layer neuron adaptation corresponds to MLP
• Complete gradient descent weight and σ adaptation can take place
with similar data propagation and approach as in MLP
• Usually, a simple scheme with mild requirements chosen in context
of HW-implementation

¾ An extension of RBF is to include oriented kernels in hyperbasis function


networks, i.e., introducing the effort of a BAC in each hidden neuron
© Andreas König Slide 2-52

Neurocomputing
Neural Networks-PNN NN-Models

¾ Probabilistic-Neural-Networks of Specht resemble Parzen-Window:


[Figure: PNN structure — input layer x_k, kernel layer computing ||w_j − x_k|| with fixed global width σ, class-wise pdf summation nodes in the output (classification) layer, and resolving of the class affiliation, e.g., by Max-of-L]
¾ Each training data vector is stored as Gaussian kernel with fixed global σ
¾ According to the class labels, the kernels are wired to pdf summation nodes
¾ Explicit cost or a priori weighting can be employed before pdf max-of-L
© Andreas König Slide 2-53

Neurocomputing
Neural Networks-PNN NN-Models

¾ pdf computation:

p_i(\vec{x}) = \frac{1}{(2\pi)^{n/2}\,\sigma^n} \cdot \frac{1}{m_i} \sum_{j=1}^{m_i} \exp\!\left( -\frac{(\vec{x}-\vec{\mu}_j)^T(\vec{x}-\vec{\mu}_j)}{2\sigma^2} \right)   (2.43)

¾ Explicit weighting and cost function inclusion as well as rejection generation are easily feasible
¾ Computational simplification assuming normalized vectors of unit length:

g_j = \exp\!\left( -\frac{(\vec{x}-\vec{\mu}_j)^T(\vec{x}-\vec{\mu}_j)}{2\sigma^2} \right) = \exp\!\left( \frac{-\vec{x}^T\vec{x} + 2\,\vec{x}^T\vec{\mu}_j - \vec{\mu}_j^T\vec{\mu}_j}{2\sigma^2} \right) = \exp\!\left( \frac{-1 + 2\,\vec{x}^T\vec{\mu}_j - 1}{2\sigma^2} \right) = \exp\!\left( \frac{net_j - 1}{\sigma^2} \right)

¾ Reduction of Gaussian to exponential activation function
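The PNN recall of (2.43) amounts to summing one Gaussian kernel per stored training vector within each class; a minimal sketch follows (the normalization constant is shared by all classes and therefore omitted for the Max-of-L decision — an assumption of this sketch):

import numpy as np

def pnn_classify(x, class_patterns, sigma):
    """PNN recall: per-class Gaussian pdf estimate in the spirit of eq. (2.43), then Max-of-L."""
    pdfs = []
    for P in class_patterns:                     # P: (m_i, n) stored patterns of one class
        d2 = np.sum((P - x) ** 2, axis=1)
        pdfs.append(np.mean(np.exp(-d2 / (2.0 * sigma ** 2))))
    return int(np.argmax(pdfs)), pdfs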


© Andreas König Slide 2-54

Neurocomputing
Neural Networks-PNN NN-Models

¾ Computational requirements of the PNN forward phase:

[Figure: per kernel — subtract w_ij from x_i, square, accumulate, and apply the nonlinearity (NL); class-specific pdf accumulation followed by a Max-of-L decision]

¾ Metric computation can be simplified and parallelized
¾ pdf computation is rather complex
¾ Learning is a process of storing training patterns
¾ Additionally, σ, rejection thresholds, or cost factor specification is required
¾ Relation with the RCE variant denoted as P-RCE
¾ In the uncertain case, pdf computation for the activated hyperspheres takes place as in the general PNN to resolve the class
© Andreas König Slide 2-55

Neurocomputing
Neural Networks-AM NN-Models

¾ Additional application fields of neural networks

© Andreas König Slide 2-56

Neurocomputing
Neural Networks-AM NN-Models

¾ Associative memories serve to establish mappings between


incomplete/distorted versions of the pattern itself (auto association) or
between entirely different patterns (hetero association)
¾ The conditioning of animals (dogs) to show a flow of saliva at the presentation of a ringing bell (Pavlovian reflex) is a prominent example
¾ Steinbuch's Lernmatrix was the first hardware implementation
© Andreas König Slide 2-57

Neurocomputing
Neural Networks-AM NN-Models

¾ Technical systems employ associative memory on a large scale, e.g., in


memory management systems (cache, page-based memory)
¾ The search is for a certain bit pattern, tolerance in the search can be added
by masking bits of the pattern, i.e., excluding these from the search
¾ Here, metrics serve for pattern comparison and similarity measure
¾ First case: Linear Associative Memory

Y =W ⋅ X (2.44)

¾ The pattern association relates to standard linear algebra


¾ If Y=X then the case of auto association is met
¾ Determination of the association matrix W:
  • Pseudo-inverse computation:   W = Y X^T \left( X X^T \right)^{-1}   (2.45)
  • Gradient descent:   \Delta W = \eta \sum_{\mu=1}^{N} \left( \vec{y}^\mu - W \vec{x}^\mu \right) \left( \vec{x}^\mu \right)^T   (2.46)
  • Correlation matrix:   W = \frac{1}{N} \sum_{\mu=1}^{N} \vec{y}^\mu \left( \vec{x}^\mu \right)^T   (2.47)

© Andreas König Slide 2-58

Neurocomputing
Neural Networks-AM NN-Models

¾ Inherent problem: limited storage capacity, which leads to crosstalk between patterns during the association process:

\vec{y} = \left\| \vec{x}^\nu \right\|^2 \left( \vec{y}^\nu + \sum_{\mu \neq \nu} \vec{y}^\mu \frac{\vec{x}^\nu \vec{x}^\mu}{\left\| \vec{x}^\nu \right\|^2} \right)   (2.48)

¾ Remedy: pairwise orthogonal patterns

\vec{y} = \left\| \vec{x}^\nu \right\|^2 \left( \vec{y}^\nu + \sum_{\mu \neq \nu} \vec{y}^\mu \frac{0}{\left\| \vec{x}^\nu \right\|^2} \right) = \left\| \vec{x}^\nu \right\|^2 \vec{y}^\nu   (2.49)

¾ Remedy: pairwise orthonormal patterns

\vec{y} = \left\| \vec{x}^\nu \right\|^2 \vec{y}^\nu = 1 \cdot \vec{y}^\nu   (2.50)

¾ Storage limitation due to the orthogonalization requirements
¾ Further activities: sparse coded or nonlinear associative memories
¾ Representatives: Kanerva's or Palm's associative memory and the Hopfield network

© Andreas König Slide 2-59

Neurocomputing
Neural Networks-Hopfield NN-Models

¾ The Hopfield network (Hopfield 82) is a recurrent neural network with binary or real-valued neurons and weights, applied for pattern restoration and optimization

¾ An energy function is associated with the Hopfield network:

E(t) = -\frac{1}{2} \sum_i \sum_j w_{ij}\, o_i(t)\, o_j(t) - \sum_j in_j\, o_j(t) + \sum_j \Theta_j\, o_j(t)   (2.51)

[Figure: energy landscape — the initial state rolls into an attractor (final state); spurious attractors can also exist]
© Andreas König Slide 2-60

Neurocomputing
Neural Networks-Hopfield NN-Models

¾ Stored patterns correspond to attractors in energy landscape


¾ Excessive number of patterns leads to spurious attractors
¾ Storage capacity has been analyzed to be
p \approx 0.146\, N

¾ Binary Hopfield neuron computation in the forward phase:

net_j(t+1) = \sum_i w_{ij}\, o_i(t) + in_j   (2.52)

o_j(t+1) = f\!\left(net_j(t+1)\right) = \begin{cases} 1 & \text{if } net_j(t+1) > \Theta_j \\ 0 & \text{if } net_j(t+1) < \Theta_j \\ o_j(t) & \text{else} \end{cases}   (2.53)

¾ Asynchronous (one-at-a-time) or synchronous change of the neuron states
¾ Learning can take place, e.g., by correlation learning:

w_{ij} = \frac{1}{N} \sum_{\nu=1}^{P} x_i^\nu \cdot x_j^\nu   (2.54)

¾ Weights can be computed externally, e.g., for optimization tasks
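A minimal Python sketch of correlation learning (2.54) and asynchronous binary recall following (2.52)/(2.53); bipolar states ±1, zero thresholds, and ties broken towards −1 are simplifying assumptions of this sketch:

import numpy as np

def hopfield_train(patterns):
    """Correlation (Hebbian) learning, eq. (2.54); patterns are bipolar (+/-1) row vectors."""
    N = patterns.shape[1]
    W = (patterns.T @ patterns) / N
    np.fill_diagonal(W, 0.0)                      # self-connections commonly set to zero
    return W

def hopfield_recall(W, x, steps=10):
    """Asynchronous recall, eqs. (2.52)/(2.53), with thresholds Theta_j = 0 and in_j = 0."""
    o = x.copy()
    for _ in range(steps):
        for j in np.random.permutation(len(o)):   # one-at-a-time state updates
            o[j] = 1 if W[j] @ o > 0 else -1
    return o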


© Andreas König Slide 2-61

Neurocomputing
Neural Networks-Hopfield NN-Models

¾ Application examples of the Hopfield network for pattern restoration:

[Figure: training patterns and distorted test patterns restored by the network]

© Andreas König Slide 2-62

Neurocomputing
Neural Networks-Hopfield NN-Models

¾ Optimization with Hopfield network applied to the Traveling-Sales-


Person-problem (TSP):
[Figure: TSP solution matrix — rows: cities, columns: order of traversal]

¾ Energy function adapted to the problem with (soft) constraints:

(2.55)  [energy function with weighted penalty terms A–D]

¾ The penalty terms of the soft constraints disappear if
   A: only one "1" per row, i.e., every city is visited only once
   B: only one "1" per column, i.e., only one city is visited at a time
   C: every city is visited once (excludes the trivial solution)
   D: tour length (main constraint)
© Andreas König Slide 2-63

Neurocomputing
Neural Networks-Hopfield NN-Models

¾ Due to the network growth with the number of cities n (n2 neurons, n4
weights) problem size was commonly limited to several hundred cities

¾ This and a class of related optimization problems of significant


economical impact, e.g., air-crew-scheduling-problem etc., kindled the
interest in fast parallel hardware implementation

¾ For this kind of task real-valued neurons were used in the Hopfield
network, employing the nonlinearity of (2.2)

¾ Practical drawback: Hopfield network can only reach a local optimum


during “roll-off” in the energy landscape, thus can get stuck in a shallow
dent representing a potentially sub-optimal or bad solution

¾ Advocates other techniques employing random mechanisms, such as


Simulated Annealing or Boltzmann Machines

© Andreas König Slide 2-64

Neurocomputing
Neural Networks-SA NN-Models

¾ Principle of Simulated Annealing, escaping a local minimum:

[Figure: four snapshots (1-4) of a state escaping a local minimum of the energy landscape E while the system temperature T decays over time t]

¾ The decaying system temperature T defines the system's ability to accept a temporary decrease of the solution quality in order to leave a local optimum
© Andreas König Slide 2-65

Neurocomputing
Neural Networks-BM NN-Models

¾ A Boltzmann Machine (BM) is a neural network exploiting a stochastic transition mechanism inspired by SA (Korst & Aarts, 1990)
¾ Basically, the network has input, hidden, and output neurons connected by bi-directional weighted connections
¾ For optimization, only hidden units are used
¾ Neurons are binary, i.e., "on" or "off"; their states are denoted as the BM configuration
¾ In every configuration, the sum of all connection weights incident with two "on" neurons is accumulated as the consensus function

C(k) = \sum_{\{u,v\} \in U} w_{uv}\, k(u)\, k(v)   (2.56)

¾ Here, k(u) gives the state of neuron u in the current configuration
¾ A sequential BM generates a state transition of one unit at a time
¾ A parallel BM generates state transitions for several up to all units at a time
¾ A state transition is generated with a probability of:

G(u) = \frac{1}{|U|}   (2.57)
© Andreas König Slide 2-66

Neurocomputing
Neural Networks-BM NN-Models

¾ A state change will lead to a consensus difference between configurations k and k_u:

\Delta C_k(u) = C(k_u) - C(k)   (2.58)

¾ A state transition will be accepted with a probability A_k(u,c) controlled by the implied consensus difference and the system temperature:

A_k(u,c) = \frac{1}{1 + \exp\!\left( -\Delta C_k(u) / c \right)}   (2.59)

© Andreas König Slide 2-67

Neurocomputing
Neural Networks-BM NN-Models

¾ Computation for the acceptance of a neuron's proposed state transition:

o_k(t+1) = \begin{cases} \bar{o}_k(t) & \text{if } random < A_k(u,c) \\ o_k(t) & \text{else} \end{cases}   (2.60)

¾ The temperature is reduced gradually during the process (Markov chain length), reaching intermediate equilibria
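The stochastic acceptance of a proposed single-unit state change according to (2.58)–(2.60) can be sketched as follows; a symmetric weight matrix with zero diagonal is assumed, and the function name is hypothetical:

import numpy as np

def bm_try_flip(W, k, u, c):
    # Propose flipping binary unit u; accept with probability
    # A_k(u, c) = 1 / (1 + exp(-dC / c)), eq. (2.59).
    k_new = k.copy()
    k_new[u] = 1 - k_new[u]                       # state transition of unit u
    # consensus difference, eq. (2.58); factor 0.5 since the pair sum counts
    # every unordered pair once (W symmetric, zero diagonal)
    dC = 0.5 * (k_new @ W @ k_new - k @ W @ k)
    if np.random.rand() < 1.0 / (1.0 + np.exp(-dC / c)):
        return k_new                              # transition accepted
    return k                                      # transition rejected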

¾ For optimization, the problem must be reformulated to a binary variable representation
¾ The binary variables are assigned to the BM neurons and problem-specific weights are determined to meet the constraints (order preserving, feasible)
¾ Minimization and maximization problems must be mapped in an appropriate way to the consensus function, which ensures valid solution finding
¾ Example: general Cut problem:

f(X) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} \left( (1-x_i)\,x_j + (1-x_j)\,x_i \right)   (2.61)

© Andreas König Slide 2-68

Neurocomputing
Neural Networks-BM NN-Models

¾ The BM has the clear advantage to find better, perhaps global minima
¾ This property is also attractive with regard to learning general mappings:

¾ Supervised learning takes place in two phases:


¾ Clamped Phase: the environmental units are clamped to the prescribed values of the aspired mapping and the remaining hidden units equilibrate. The probability z'_{u,v} of each weight having pre- and postsynaptic active neurons is calculated
¾ Free Phase: all units equilibrate (special case: input units clamped) and the probability z_{u,v} is calculated
© Andreas König Slide 2-69

Neurocomputing
Neural Networks-BM NN-Models

¾ Formulation of the BM learning algorithm:

[Figure: BM learning algorithm — the weights w_ij are adapted based on the clamped-phase and free-phase co-activation probabilities z'_{u,v} and z_{u,v}]

¾ Learning can be (dreadfully) slow; however, it can reach a good optimum
¾ A convergence proof exists: the global optimum is reached for an infinitesimally slow cooling process
¾ Applied in optimization and pattern recognition, rarely implemented
¾ Demands for random generators and cooling schedule implementation

© Andreas König Slide 2-70

Neurocomputing
Neural Networks-SOM NN-Models

¾ The Self-Organizing feature Map (SOM), introduced by Teuvo Kohonen, is probably the most well-known and most widely applied neural network
¾ The SOM was derived from physiological evidence observed in the somatosensory cortex, e.g. [Kohonen 89]

[Figure: SOM grid with winning neuron N_c and neighborhood of radius r(t); neighborhood function (Gaussian, pyramid, box), learning rate α(t), weight vectors w_j, component planes; the input vector v_i is compared to all weight vectors by WTA]

d_c = \min_{j=1}^{N_{SOM}} \sum_{i=1}^{M} (v_i - w_{ij})^2   (2.62)

© Andreas König Slide 2-71

Neurocomputing
Neural Networks-SOM NN-Models

¾ The SOM features the properties of data quantization, probability density approximation, and a topology-preserving, dimensionality-reducing mapping
¾ Typically, 1D or 2D SOM neuron grids are employed (3D in robotics)
¾ SOM learning in its common technical implementation:

1. Random initialization of the neuron weight vectors w_j
2. Iterative presentation of stimuli vectors v_i and computation of the winner neuron N_c:
   d_c = \min_{j=1}^{N_{SOM}} \sum_{i=1}^{M} (v_i - w_{ij})^2
3. Adaptation of the winning neuron and its neighbors:
   w_{ij}(t+1) = \begin{cases} w_{ij}(t) + \alpha(t)\, N_c(r(t))\, (v_{ik} - w_{ij}(t)) & \text{for } j \in N_c(r(t)) \\ w_{ij}(t) & \text{for } j \notin N_c(r(t)) \end{cases}   (2.63)
4. Reduce α(t) and r(t); terminate learning by a maximum number of steps or an error criterion
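One SOM adaptation step following (2.62)/(2.63) might be sketched as below, here with a Gaussian neighborhood function (one of the options named on the previous slide) on a 2D grid; the grid coordinates and the decay schedules for α and r are assumed to be handled by the caller:

import numpy as np

def som_step(v, W, grid, alpha, r):
    """One SOM update: WTA winner search, eq. (2.62), and neighborhood
    adaptation in the spirit of eq. (2.63) with a Gaussian neighborhood.
    W: (n_neurons, dim) weight vectors, grid: (n_neurons, 2) neuron grid coordinates."""
    c = int(np.argmin(np.sum((W - v) ** 2, axis=1)))          # winner neuron N_c
    g2 = np.sum((grid - grid[c]) ** 2, axis=1)                # squared grid distances to winner
    h = np.exp(-g2 / (2.0 * r ** 2))                          # Gaussian neighborhood factor
    W += alpha * h[:, None] * (v - W)                         # move neurons towards the stimulus
    return W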

© Andreas König Slide 2-72

Neurocomputing
Neural Networks-SOM NN-Models

¾ For the special case of a two-dimensional SOM, two-dimensional weights, and stimuli with a uniform probability distribution, the network unfolding during training can be observed:

[Figure: SOM grid at initialization and after 20, 100, 300, 1000, and 10000 training steps]

© Andreas König Slide 2-73

Neurocomputing
Neural Networks-SOM NN-Models

¾ During the training process, the SOM unfolds in the multivariate pattern
space and creates a topology preserving mapping to the 2D neuron grid
¾ Example of SOM visualization for Cube-data:

© Andreas König Slide 2-74

Neurocomputing
Neural Networks-SOM NN-Models

¾ SOM component planes for Cube-data:

[Figure: SOM component planes 1-3 for the Cube data]

¾ More discussed in sensor signal processing lecture ....


© Andreas König Slide 2-75

Neurocomputing
Neural Networks-SOM NN-Models

¾ Computational requirements of the SOM in the forward phase are the


common distance computation, e.g., Euclidean distance, followed by the
search for the closest stimulus, i.e., the minimum distance
¾ This compares to 1-NN computation and can be subject to parallelization
¾ In particular the minimum search is an attractive candidate as sequential
search can be a bottleneck in a parallel array
¾ Remedy: Comparator tree or efficient bitwise parallel comparison schemes

¾ In learning, however, only a subset (of decreasing size) of neurons take part
¾ Both spatial and temporal adaptations for the learn rate have to be computed
and communicated to adapting neurons:
[Figure: SOM learning data path — the difference x_i − w_ij is scaled by the decaying learning rate α(t) and the neighborhood factor N_c(r(t)) and added to w_ij^old to give w_ij^new]
© Andreas König Slide 2-76

Neurocomputing
Neural Networks-CNN NN-Models

¾ Cellular-Neural-Networks (CNN), Chua and Yang, IEEE TCAS (1988), p. 1257:

[Figure: CNN network topology — a regular grid of neurons with inputs and a restricted local neighborhood of radius r = 1]

DTCNN:

s_j(t) = \sum_i w_{ij}\, o_i(t) + \sum_k w_{kj}\, x_k + \Theta_j   (2.64)

o_j(t+1) = \begin{cases} 1 & \text{for } s_j(t) > 0 \\ -1 & \text{for } s_j(t) \leq 0 \end{cases}   (2.65)
¾ Neural networks for low-level processing: Retina/vision chips, cochlea
chips, cellular-neural-networks with restricted local neighborhood conn.
¾ Implementation of linear and non-linear image processing operations based
on appropriate (heuristically determined) cloning templates
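A DTCNN iteration according to (2.64)/(2.65) with 3×3 cloning templates can be sketched as below; SciPy is assumed to be available, the feedback and control templates are named A and B here, and for the symmetric templates of typical examples the convolution/correlation distinction does not matter:

import numpy as np
from scipy.signal import convolve2d

def dtcnn_step(o, x, A, B, theta):
    """One synchronous DTCNN iteration, eqs. (2.64)/(2.65).
    o: current bipolar outputs, x: constant inputs, A/B: 3x3 cloning templates."""
    s = convolve2d(o, A, mode="same") + convolve2d(x, B, mode="same") + theta
    return np.where(s > 0, 1, -1)                 # hard bipolar output nonlinearity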
© Andreas König Slide 2-77

Neurocomputing
Neural Networks-CNN NN-Models

¾ Image processing capabilities of Discrete-Time-Cellular-Neural-Networks


(DTCNN) for various cloning templates:
¾ Skeletonization, edge extraction, connected-component detection, hole filling, concentric-contour detection, dilation/erosion, noise elimination

© Andreas König Slide 2-78

Neurocomputing
Neural Networks-CNN NN-Models

¾ Application to feature computation for OCR:

¾ Preferred application in analog or mixed-signal implementation


¾ Time-continuous CNN with differential equation for modelling
¾ Grey value output/processing and hexagonal neighborhood as options
¾ No explicit learning rule available !
¾ Forward phase requirements: Twice the dot product computation,
accumulation to state and non-linearity (thresholding) for output
© Andreas König Slide 2-79

Neurocomputing
Conclusions NN-Models

¾ Identified Basic Algorithmic Building Blocks:

• Vector Subtraction, Addition, and Scaling


• Matrix-Vector Multiplication
• Distance Metric (Dot Product, Euclidean Distance, …)
• Non-Linearity (Sigmoid, Gaussian, …, and their derivatives)
• Winner-Takes-All-Mechanism (WTA), corresponds to efficient
Max/Min-Search
• Convolution/Correlation Support
• Random Generators
• Dynamic Network Topology Support

¾ Operations supported by dedicated neural network hardware


¾ Often complemented for conventional signal processing needs

© Andreas König Slide 2-80

Neurocomputing
Conclusions NN-Models

¾ Unifying the regarded algorithms' requirements for a common forward phase:

[Figure: unified forward data path — subtract w_ij from x_i, multiply/square, accumulate, optionally subtract R_j, apply a nonlinearity (NL), and perform a minimum search to obtain o_j]

¾ A data path for a processing element will follow these conceptual guidelines, obeying additional constraints
¾ The learning implementation is considerably more inhomogeneous
¾ Additional non-linearity required (derivative)
¾ Commonly, implementations including on-chip learning tend to be more
specialized and support only single or few algorithms

© Andreas König Slide 2-81

Neurocomputing
Summary NN-Models

¾ The chapter briefly introduced to (revisited) important and commonly


applied artificial neural network algorithms

¾ These are MLPs with backpropagation learning, RBFs, RCE, PNN,


LVQ/SOM, Hopfield/Boltzmann and Cellular neural networks

¾ The focus of the presentation was on the computational requirements of the ANN algorithms and their potential for parallel implementation

¾ Common requirements for the forward phase were identified for


potential multi-model implementation (not so much for learning)

¾ Spiking neural network algorithms not yet included

In the next step, typical applications will be investigated with


regard to the justification of the underlying effort of an actual
dedicated massively parallel system implementation.

© Andreas König Slide 2-82

