Neurocomputing
NN-Models

Prof. Dr.-Ing. Andreas König
Institute of Integrated Sensor Systems ISE
Course Contents:
1. Introduction
2. Rehearsal of artificial neural network models relevant for implementation, and analysis of the required computational steps
3. Analysis of typical ANN applications with regard to computational requirements
4. Aspects of simulation of ANNs and systems
5. Efficient VLSI implementation by simplification of the original algorithms
6. Derivation of a taxonomy of neural hardware
7. Digital neural network hardware
8. Analog and mixed-signal neural network hardware
9. Principles of optical neural network hardware implementation
10. Evolvable hardware overview
11. Summary and Outlook
ADALINE/Perceptron
[Figure: perceptron structure with output neuron, variable weights, and fixed connections]
➢ In the general case the neuron possesses a nonlinear activation function f(net), which is the identity function in the ADALINE:

$$o = f\!\left(\sum_{i=1}^{m} x_i w_i\right) \qquad (2.1)$$

$$f(\mathrm{net}) = \frac{1}{1 + e^{-\mathrm{net}}} \qquad (2.2)$$

[Figure: single neuron with inputs x1 ... xm, weights w1 ... wm, activation f(net), and output o]
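As a minimal illustration (a sketch, not part of the course material; the names are chosen freely), the forward pass of (2.1) with the identity or the logistic activation of (2.2):

```python
import numpy as np

def neuron_output(x, w, activation="identity"):
    """Single-neuron forward pass o = f(sum_i x_i * w_i), Eq. (2.1)."""
    net = np.dot(x, w)                     # weighted input sum
    if activation == "identity":           # ADALINE case: f(net) = net
        return net
    return 1.0 / (1.0 + np.exp(-net))      # logistic function, Eq. (2.2)

# Example: o = neuron_output(np.array([1.0, -0.5]), np.array([0.3, 0.8]), "sigmoid")
```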
➢ The deviation of the actual neuron behavior or output from the desired or prescribed output in such a supervised approach can be assessed for all pattern pairs
➢ The resulting error can be displayed as an error (hyper)surface with the neuron weights as parameters, here w1 and w2:

[Figure: error surface over the weight plane (w1, w2)]
$$\frac{\partial E}{\partial w_i} = -\sum_{k=1}^{N} \left( y^k - f\!\left(\sum_{i=1}^{m} x_i^k w_i\right) \right) \cdot f'\!\left(\sum_{i=1}^{m} x_i^k w_i\right) \cdot x_i^k \qquad (2.5)$$
➢ Inserting (2.5) in (2.4) returns the batch learning rule; reducing the batch size to one returns the on-line learning rule with immediate weight adaptation
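A minimal sketch of both variants, assuming the identity activation of the ADALINE (so f' = 1); eta is the learning rate:

```python
import numpy as np

def delta_rule_epoch(X, y, w, eta, batch=True):
    """One epoch of the delta rule for a linear neuron.

    batch=True accumulates the gradient of Eq. (2.5) over all N patterns
    before adapting; batch=False adapts immediately after each pattern.
    """
    if batch:
        o = X @ w                                # outputs for all patterns
        w = w + eta * X.T @ (y - o)              # summed (batch) update
    else:
        for xk, yk in zip(X, y):
            w = w + eta * (yk - xk @ w) * xk     # on-line update
    return w
```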
[Figure: data path of the delta-rule update: the error δj = yj^k − yj is scaled by η and the input xi and accumulated onto wij_old to give wij_new]
[Figure: neuron as linear discriminant: stimuli, weights, cell body (activation), output; a separating line divides Class 1 (o = -1) from Class 2 (o = +1) in the input plane]
➢ A single neuron can separate a linearly separable problem with a separating line (plane, hyperplane)
➢ Logical combinations of a layer of such neurons allow tackling non-linear problems!
➢ Simple distance neurons can also serve as classifiers with spheric regions:

$$o = f\!\left(\sum_{i=1}^{m} (x_i - w_i)^2 - R\right)$$

[Figure: distance neuron with stimuli, weights, cell body (activation), and output; the weight vector (w1, w2) marks the center of a sphere of radius R separating Class 1 (o = +1, inside) from Class 2 (o = -1, outside) in the (x1, x2) plane]

➢ A single neuron can separate a region by a radius-limited (hyper)sphere
➢ The norm of the distance vector is computed and compared with the sphere radius; inside gives +1, outside returns -1
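A sketch of this behavior (assuming, as in the formula above, that R bounds the squared distance and that the inside of the sphere maps to +1):

```python
import numpy as np

def distance_neuron(x, w, R):
    """Spheric-region classifier: +1 inside the (hyper)sphere with
    radius parameter R around the stored center w, -1 outside."""
    return 1 if np.sum((x - w) ** 2) <= R else -1
```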
Relevant ANN for Classification
[Figure: application example of a recognition system for handwritten addresses: preprocessing, segmentation, feature extraction, and knowledge-based classification stages]
[Figure: overview taxonomy of classification methods]
Parametric Methods
➢ The statistical (Bayes) classifier rates each class ωi by its a-priori-weighted Gaussian density:

$$\hat{d}_i(\vec{x}) = \frac{P(\omega_i)}{\sqrt{(2\pi)^N \det K_i}} \; e^{-\frac{1}{2} (\vec{x}-\vec{\mu}_i)^T K_i^{-1} (\vec{x}-\vec{\mu}_i)} \qquad (2.6)$$
[Figure: sketch of the class boundary of the Bayes classifier]
➢ Assuming equal a priori values for all classes returns the Maximum-Likelihood Classifier (MLC):

$$\hat{d}_i(\vec{x}) = -\frac{1}{2}\ln(\det K_i) - \frac{1}{2}(\vec{x}-\vec{\mu}_i)^T K_i^{-1}(\vec{x}-\vec{\mu}_i) \qquad (2.7)$$

➢ Assuming further equal covariance for all classes returns the Mahalanobis Classifier (MAC):

$$\hat{d}_i(\vec{x}) = -\frac{1}{2}(\vec{x}-\vec{\mu}_i)^T K^{-1}(\vec{x}-\vec{\mu}_i) \qquad (2.8)$$
[Figure: sketch of the class boundary of the MAC in the (x1, x2) plane]
➢ Assuming further the covariance matrix to be the identity matrix returns the Centroid or Euclidean Distance Classifier (EAC):

$$\hat{d}_i(\vec{x}) = \left\| \vec{x} - \vec{\mu}_i \right\|^2 = \sum_{j=0}^{N-1} \left( x_j - \mu_j \right)^2 \qquad (2.9)$$
[Figure: sketch of the class boundary of the EAC in the (x1, x2) plane]
➢ Simplifying the metric returns the City-Block Classifier (CBC):

$$\hat{d}_i(\vec{x}) = \sum_{j=0}^{N-1} \left| x_j - \mu_j \right| \qquad (2.10)$$
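For illustration, a sketch of the four discriminants (2.7)-(2.10); the class decision is an argmax over (2.7)/(2.8) and an argmin over the distances (2.9)/(2.10):

```python
import numpy as np

def mlc(x, mu, K):
    """Maximum-Likelihood discriminant, Eq. (2.7)."""
    d = x - mu
    return -0.5 * np.log(np.linalg.det(K)) - 0.5 * d @ np.linalg.inv(K) @ d

def mac(x, mu, K_inv):
    """Mahalanobis discriminant with shared inverse covariance, Eq. (2.8)."""
    d = x - mu
    return -0.5 * d @ K_inv @ d

def eac(x, mu):
    """Squared Euclidean (centroid) distance, Eq. (2.9)."""
    return np.sum((x - mu) ** 2)

def cbc(x, mu):
    """City-block distance, Eq. (2.10)."""
    return np.sum(np.abs(x - mu))
```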
Nonparametric Methods
➢ Parametric methods condense all the sample set information into few model parameters and perform a global pdf estimation
➢ In contrast, nonparametric methods perform a local pdf estimation:

$$p(\vec{x}) = \frac{k(\vec{x})}{N \cdot \nu} \qquad (2.11)$$

➢ Assumption of either a fixed volume ν or a fixed number of patterns k
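A sketch of the fixed-k variant of (2.11), taking ν as the volume of the smallest hypersphere around x that contains the k nearest stored patterns (an assumption for illustration):

```python
import numpy as np
from math import gamma, pi

def knn_density(x, samples, k):
    """Local pdf estimate p(x) = k / (N * v), Eq. (2.11), with fixed k."""
    N, n = samples.shape
    r = np.sort(np.linalg.norm(samples - x, axis=1))[k - 1]  # k-th NN distance
    v = pi ** (n / 2) / gamma(n / 2 + 1) * r ** n            # hypersphere volume
    return k / (N * v)
```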
➢ For the given choice of Gaussian kernels, an implementation could look like:

[Figure: data path of a Gaussian-kernel classifier: per kernel, the differences (xi − wij) are squared, summed, divided by 2σ², and passed through exp to give oj; class-wise summation yields the pdf value(s), and a Max-of-L stage returns the class index]
[Figure: kNN decision regions in the (x1, x2) plane]

➢ kNN training means storage of all patterns!
➢ Two basic variations: voting and volumetric kNN
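A sketch of the voting variant ("training" is pattern storage only):

```python
import numpy as np
from collections import Counter

def knn_classify(x, samples, labels, k=3):
    """Voting kNN: the class occurring most often among the k nearest
    stored patterns wins."""
    idx = np.argsort(np.linalg.norm(samples - x, axis=1))[:k]
    return Counter(labels[i] for i in idx).most_common(1)[0][0]
```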
[Figure: kNN decision regions of a three-class example (Class 1, Class 2, Class 3) in the (x1, x2) plane]
[Figure: data path of a kNN implementation: K parallel units compute the distances between the input xi and the stored reference vectors wij, followed by a minimum search over the unit outputs oj]
Pseudo-code of the CoNN (Condensed Nearest Neighbor) algorithm:

1. Initially empty classifier
2. Get first (next) pattern xi
3. 1-NN-classify sample set pattern
   If correct_classification goto 4
   Else insert pattern xi; goto 2
4. If all_patterns_corr_class
   break;
   Else goto 2

[Figure: CoNN reference vectors in the (x1, x2) plane]

➢ The algorithm reduces effort, but it depends on the sample set presentation order and leaves redundancy in the CoNN; recall by 1-NN is mandatory!
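A runnable sketch of the condensing loop (a free interpretation of the pseudo-code above; function and variable names are chosen freely):

```python
import numpy as np

def conn_condense(X, y):
    """CoNN condensing: insert every pattern the current reference set
    misclassifies by 1-NN; repeat until a full pass is error-free.
    The result depends on the presentation order, as noted above."""
    refs, labels = [X[0]], [y[0]]                 # seed with the first pattern
    changed = True
    while changed:
        changed = False
        for xi, yi in zip(X, y):
            d = np.linalg.norm(np.array(refs) - xi, axis=1)
            if labels[int(np.argmin(d))] != yi:   # 1-NN misclassifies xi
                refs.append(xi)
                labels.append(yi)
                changed = True
    return np.array(refs), np.array(labels)
```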
[Figure: CoNN construction: the first pattern is inserted as the first reference vector]
[Figure: CoNN construction: a potentially redundant reference vector remains in the condensed classifier]
[Figure: numbered sample patterns of an example data set in two dimensions]
Neural Networks
➢ A multilayer perceptron with the backpropagation algorithm serves as a classifier for non-linearly separable problems:

[Figure: three-layer MLP: input layer xk, hidden layer hj connected via weights wjk, output layer oi connected via weights wij; a Max-of-L stage yields the class affiliation]

➢ Proven to be a universal function approximator with one nonlinear hidden layer
➢ A learning rule is required for this network, in particular for the hidden layer(s)
➢ The choice of hidden layer size and learning parameters can be difficult
➢ Resubstitution is not guaranteed, and generalization can be surprising
➢ Introducing the following abbreviations using the notation given with the network structure:

$$o_i = f\!\left(\sum_j h_j w_{ij}\right); \quad h_j = f\!\left(\sum_k x_k w_{jk}\right) \qquad (2.20)$$

➢ With (2.20) the error can be expressed as:

$$E = \frac{1}{2} \sum_{\mu=1}^{N} \sum_{i=1}^{L} \left( y_i^\mu - f\!\left(\sum_j h_j^\mu w_{ij}\right) \right)^2 \qquad (2.21)$$

➢ Every weight is adapted after (random) initialization according to:

$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} \qquad (2.22)$$

➢ The gradient for the output layer weights is computed as:

$$\frac{\partial E}{\partial w_{ij}} = -\sum_{\mu=1}^{N} \left( y_i^\mu - f\!\left(\sum_j h_j^\mu w_{ij}\right) \right) \cdot f'\!\left(\sum_j h_j^\mu w_{ij}\right) \cdot h_j^\mu \qquad (2.23)$$
$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} = \eta \sum_{\mu=1}^{N} \left( y_i^\mu - o_i^\mu \right) \cdot o_i'^\mu \cdot h_j^\mu \qquad (2.25)$$

➢ For the hidden layer adaptation rule, the error function must be expanded:

$$E = \frac{1}{2} \sum_{\mu=1}^{N} \sum_{i=1}^{L} \left( y_i^\mu - f\!\left(\sum_j w_{ij} f\!\left(\sum_k x_k^\mu w_{jk}\right)\right) \right)^2 \qquad (2.26)$$

[Figure: chain of dependencies for the hidden layer gradient: xk → netj → hj → neti → oi → ei]
$$\delta_j^\mu = h_j'^\mu \cdot \sum_{i=1}^{L} w_{ij}\, \delta_i^\mu \qquad (2.32)$$

$$\Delta w_{jk} = \eta \sum_{\mu=1}^{N} \delta_j^\mu \cdot x_k^\mu \qquad (2.34)$$
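Pulling (2.20)-(2.34) together, a compact batch-epoch sketch for one hidden layer (logistic activation, no bias terms; all names chosen freely):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_epoch(X, Y, W_jk, W_ij, eta):
    """One batch epoch of backpropagation for a 1-hidden-layer MLP."""
    H = sigmoid(X @ W_jk.T)                    # hidden activations, Eq. (2.20)
    O = sigmoid(H @ W_ij.T)                    # outputs, Eq. (2.20)
    delta_i = (Y - O) * O * (1 - O)            # output deltas, Eq. (2.25)
    delta_j = (delta_i @ W_ij) * H * (1 - H)   # backpropagated deltas, Eq. (2.32)
    W_ij += eta * delta_i.T @ H                # output layer update, Eq. (2.25)
    W_jk += eta * delta_j.T @ X                # hidden layer update, Eq. (2.34)
    return W_jk, W_ij
```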
[Figure: data path of the output layer update: the error (yiµ − oiµ) is multiplied by the derivative of the nonlinearity NL, by hjµ, and by η, and accumulated onto wij_old; plots of the sigmoid and its derivative show the derivative approaching ~0 in saturation]

➢ For batch learning, the need of accumulation and intermediate storage of individual batch pattern contributions must be regarded here
[Figure: backward propagation of the deltas: δj = ∑ wij δi through the output weights, δk = ∑ wjk δj through the hidden weights]

➢ Implies transposed access to the weight memory
➢ Learning of the lower layers recursively requires data from the previous layer (∑ wij δi)
➢ Convergence can be improved by a momentum term (2.35) and an adaptive learning rate (2.36):

$$\Delta w_{ij}(t) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha \cdot \Delta w_{ij}(t-1) \qquad (2.35)$$

$$\Delta\eta = \begin{cases} +\kappa & \text{for } \Delta E < 0 \\ -\varphi \cdot \eta & \text{for } \Delta E > 0 \\ 0 & \text{else} \end{cases} \qquad (2.36)$$
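A sketch of both heuristics in one update step; the defaults for κ, φ, and α are illustrative, not values prescribed by the course:

```python
def adapt_step(grad, dw_prev, eta, dE, kappa=0.05, phi=0.5, alpha=0.9):
    """Momentum-smoothed weight step, Eq. (2.35), plus error-driven
    learning-rate adaptation, Eq. (2.36)."""
    dw = -eta * grad + alpha * dw_prev   # momentum term
    if dE < 0:                           # error decreased: increase eta
        eta += kappa
    elif dE > 0:                         # error increased: decrease eta
        eta -= phi * eta
    return dw, eta
```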
[Figure: estimated cross-section of the gradient surface: the minimum location is extrapolated from two known points of ∂E/∂wij]

➢ Weight increment limited by a maximum step size given by µ!
➢ Epoch or batch learning rule!
Neural Networks - LVQ
[Figure: LVQ network: input layer xk fully connected via weights wij to a winner-take-all (WTA) layer]
[Figure: LVQ adaptation: the winner wc(t) is moved along x(t) − wc(t), with a learning rate α(t) decreasing over time t]

➢ Dead reference vectors wi can occur; a sufficient number per class is not assured
➢ A different initialization, e.g., by an RNN, could be suitable
➢ The basic method is extended by the improved versions LVQ2/2.1 and LVQ3
[Figure: LVQ2 adaptation near the class border: the reference vectors wi(t) and wl(t) on both sides are updated to wi(t+1) and wl(t+1), moving the old class border toward the new one around x(t)]

➢ Window computation:

$$\min\!\left(\frac{d_i}{d_l}, \frac{d_l}{d_i}\right) > \hat{\eta} = \frac{1-w}{1+w} \qquad (2.42)$$

➢ Monotonic decrease of dli
➢ Remedy: accept vectors on both sides of the window (LVQ 2.1)
[Figure: data path of the LVQ update: the difference (xi − wci) is scaled by the decaying learning rate α(t) and added to or subtracted from wij_old, depending on right/wrong class, to give wij_new]
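A sketch of the basic LVQ1 step behind this data path (names chosen freely):

```python
import numpy as np

def lvq1_step(x, x_label, W, W_labels, alpha):
    """One LVQ1 step: move the winning reference vector toward the
    input for the right class, away from it for a wrong class."""
    c = np.argmin(np.linalg.norm(W - x, axis=1))     # WTA winner search
    sign = 1.0 if W_labels[c] == x_label else -1.0   # right/wrong class
    W[c] += sign * alpha * (x - W[c])
    return W
```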
Neural Networks - RCE
[Figure: RCE network: input layer xk, kernel layer computing the distances ||wj − x|| against the radii Rj, and an output (classification) layer of summation nodes]

➢ The Restricted-Coulomb-Energy network (RCE) employs step functions in the kernel layer and OR-gating or summation of the activated kernel neurons
➢ The network is generated from scratch by (patented) dynamic training
[Figure: RCE decision regions (Class 1, Class 2) for mechatronic data and for the Iris data]
[Figure: data path of an RCE kernel neuron: the accumulated distance is compared with the radius Rj (oj ≤ Rj activates the hypersphere, oj > Rj does not), and the activations pass to the global class assignment]

➢ Evaluation of the activated hyperspheres and result determination (class assignment) according to global rules
➢ Not a really regular structure, not easily amenable to parallel implementation
➢ Computational requirements of the RCE learning phase:
  • Pattern storage requires fast transfer to memory
  • Initial setting of the radii to Rmax
  • Radius reduction requires distance computation between two weight vectors and comparison to Rmin
  • Rather irregular process, not amenable to parallelization
  • Overlap of forward and training phase hard due to resource conflicts
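A sketch of one dynamic training step along these lines (a free interpretation for illustration; the patented procedure may differ in its details):

```python
import numpy as np

def rce_train_pattern(x, label, centers, radii, labels, R_max, R_min):
    """One RCE training step: shrink wrong-class kernels that cover x,
    and commit a new kernel with initial radius R_max if no kernel of
    the correct class fires."""
    covered_by_own_class = False
    for j, c in enumerate(centers):
        d = np.linalg.norm(x - c)
        if d <= radii[j]:                         # kernel j fires on x
            if labels[j] != label:
                radii[j] = max(R_min, 0.999 * d)  # radius reduction excludes x
            else:
                covered_by_own_class = True
    if not covered_by_own_class:                  # commit a new hypersphere
        centers.append(x.copy())
        radii.append(R_max)
        labels.append(label)
    return centers, radii, labels
```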
[Figure: implementation alternatives for the kernel layer: a time-multiplexed processing element (PE) with a reference vector memory versus fully parallel units with radii R1 ... RN]
Neural Networks - RBF
[Figure: RBF network: input layer xk, Gaussian kernel layer computing ||wj − x|| with widths σj, and an output (classification) layer of summation nodes]

➢ The kernel widths can be set from the maximum kernel distance, e.g.:

$$\sigma_j = \frac{d_{\mathrm{ker\_max}}}{\sqrt{2M}}$$

➢ RBF networks are also universal function approximators!
➢ The number of kernels is fixed by choice, with random or SOM-based initialization
➢ RBF training on a fixed hidden layer might not be efficient
➢ Dynamic training algorithms exist for function approximation and classification
➢ The first one is Platt's Resource-Allocation Network (RAN)

Pseudo-code of the principal RAN algorithm:
1. Initially empty kernel layer
2. Get first (next) pattern xi
3. Compute output value
   If Error > ε & d > δ
     Insert xi as new kernel;
     σj = min(κ·δ, κ·d)
   Else adapt output layer weights ...

[Figure: sketch of the allocated kernels and their widths σj in the (x1, x2) plane]

➢ Fast (evolving) training, but insertion alone does not remove redundant neurons
➢ Classification by determination of the maximum pdf; a background class is applicable!
➢ Training for all RBF parameters can be achieved by gradient descent!
➢ Smooth and well-generalizing behavior of RBF networks
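A sketch of one RAN step for a scalar output, following the pseudo-code above (the κ and η defaults are illustrative):

```python
import numpy as np

def ran_step(x, y, centers, sigmas, weights, eps, delta, kappa=0.9, eta=0.1):
    """One RAN step: allocate a new Gaussian kernel when both the output
    error and the distance to the nearest center are large; otherwise
    adapt the output weights by LMS."""
    phi = np.array([np.exp(-np.sum((x - c) ** 2) / (2 * s ** 2))
                    for c, s in zip(centers, sigmas)])
    err = y - (weights @ phi if len(phi) else 0.0)
    d = min((np.linalg.norm(x - c) for c in centers), default=np.inf)
    if abs(err) > eps and d > delta:
        centers.append(x.copy())
        sigmas.append(kappa * min(delta, d))   # width rule of the pseudo-code
        weights = np.append(weights, err)      # new unit cancels the error
    else:
        weights = weights + eta * err * phi    # LMS weight adaptation
    return centers, sigmas, weights
```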
Neural Networks - PNN
[Figure: PNN network: input layer xk, Gaussian kernel layer computing ||wj − x||, and class-wise pdf summation nodes in the output (classification) layer]

➢ Each training data vector is stored as a Gaussian kernel with a fixed global σ
➢ According to the class labels, the kernels are wired to pdf summation nodes
➢ Explicit cost or a priori weighting can be employed before the pdf Max-of-L
➢ pdf computation:

$$p_i = \frac{1}{(2\pi)^{n/2} \sigma^n} \frac{1}{m} \sum_{j=1}^{m} \exp\!\left( -\frac{(\vec{x} - \vec{\mu}_j)^T (\vec{x} - \vec{\mu}_j)}{2\sigma^2} \right) \qquad (2.43)$$
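A sketch of (2.43) with the class-wise maximum decision, grouping the stored kernels per class (names chosen freely):

```python
import numpy as np

def pnn_classify(x, kernels_by_class, sigma):
    """PNN recall, Eq. (2.43): class-wise averaged Gaussian kernel
    densities over the stored training vectors, then Max-of-L."""
    n = len(x)
    norm = (2 * np.pi) ** (n / 2) * sigma ** n
    pdfs = [np.mean([np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2))
                     for mu in mus]) / norm
            for mus in kernels_by_class]
    return int(np.argmax(pdfs)), pdfs
```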
[Figure: data path of a PNN kernel unit: distance computation, nonlinearity NL (exp), class-specific pdf accumulation, and a final Max-of-N decision]

➢ The metric computation can be simplified and parallelized
➢ The pdf computation is rather complex
Neural Networks - AM
➢ Recall of the (linear) associative memory:

$$Y = W \cdot X \qquad (2.44)$$

• Gradient descent:

$$\Delta W = \eta \sum_{\mu=1}^{N} \left( \vec{y}^\mu - W\vec{x}^\mu \right) \left( \vec{x}^\mu \right)^T \qquad (2.46)$$

• Correlation matrix:

$$W = \frac{1}{N} \sum_{\mu=1}^{N} \vec{y}^\mu \left( \vec{x}^\mu \right)^T \qquad (2.47)$$
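A sketch of the correlation-matrix construction (2.47) and the recall (2.44):

```python
import numpy as np

def build_correlation_memory(X, Y):
    """Correlation-matrix associative memory, Eq. (2.47): W is the
    averaged sum of the outer products y_mu x_mu^T."""
    return (Y.T @ X) / X.shape[0]

# Recall, Eq. (2.44): y_hat = W @ x
# (exact for orthonormal stored input patterns).
```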
Neural Networks - Hopfield
[Figure: energy landscape E of the Hopfield network: the state evolves from an initial state into an attractor (final state); spurious attractors can also exist]
$$o_j(t+1) = f(net_j(t)) = \begin{cases} 1 & \text{if } f(net_j(t)) > \Theta_j \\ 0 & \text{if } f(net_j(t)) < \Theta_j \\ o_j(t) & \text{else} \end{cases} \qquad (2.53)$$

➢ Asynchronous (one-at-a-time) or synchronous change of the neuron states
➢ Learning can take place, e.g., by correlation learning:

$$w_{ij} = \frac{1}{N} \sum_{\nu=1}^{P} x_i^\nu \cdot x_j^\nu \qquad (2.54)$$
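A sketch with the common bipolar (±1) state convention and Θj = 0 (the 0/1 form of (2.53) translates directly):

```python
import numpy as np

def hopfield_train(patterns):
    """Correlation (Hebbian) learning, Eq. (2.54), for bipolar patterns."""
    W = patterns.T @ patterns / patterns.shape[1]
    np.fill_diagonal(W, 0.0)            # no self-coupling
    return W

def hopfield_recall(W, state, steps=1000, seed=0):
    """Asynchronous recall with the threshold rule of Eq. (2.53)."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        j = rng.integers(len(state))    # one-at-a-time update
        net = W[j] @ state
        if net != 0:                    # net == 0: keep previous state
            state[j] = 1 if net > 0 else -1
    return state
```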
[Figure: training patterns stored in the Hopfield network]
➢ Due to the network growth with the number of cities n (n² neurons, n⁴ weights), the problem size was commonly limited to several hundred cities
➢ For this kind of task, real-valued neurons employing the nonlinearity of (2.2) were used in the Hopfield network
Neural Networks - SA
[Figure: energy landscapes E illustrating the escape from local minima]
Neural Networks - BM
➢ The Boltzmann machine rates a configuration k by the consensus over all connected unit pairs:

$$C(k) = \sum_{\{u,v\} \in U} w_{uv}\, k(u)\, k(v) \qquad (2.56)$$
$$f(X) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} \left( (1 - x_i)\, x_j + (1 - x_j)\, x_i \right) \qquad (2.61)$$
➢ The BM has the clear advantage of finding better, perhaps global minima
➢ This property is also attractive with regard to learning general mappings:
[Figure: Boltzmann machine structure for mapping learning; the connections denote the weights wij]
Neural Networks - SOM
[Figure: SOM: the input vector v is compared with the weight vectors wj of all neurons in the map grid; the winner is found by WTA, and the neurons within the neighborhood radius r(t) adapt with the rate α(t)]

$$d_c = \min_{j=1}^{N_{SOM}} \sum_i \left( v_i - w_{ij} \right)^2 \qquad (2.62)$$
➢ During the training process, the SOM unfolds in the multivariate pattern space and creates a topology-preserving mapping onto the 2D neuron grid
➢ Example of SOM visualization for the Cube data:
[Figure: SOM visualization of the Cube data, component planes 1-3]
➢ In learning, however, only a subset (of decreasing size) of the neurons takes part
➢ Both spatial and temporal adaptations of the learning rate have to be computed and communicated to the adapting neurons:

[Figure: data path of the SOM update: the difference (xi − wij) is scaled by the decaying learning rate α(t) and the neighborhood function NC(r(t)) and accumulated onto wij_old to give wij_new]
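A sketch of one adaptation step with exponentially decaying rate and radius (the α0, r0, τ defaults are illustrative choices):

```python
import numpy as np

def som_step(v, W, grid, t, alpha0=0.5, r0=3.0, tau=1000.0):
    """One SOM step: winner search by Eq. (2.62), then adaptation of all
    neurons weighted by a shrinking Gaussian neighborhood. W holds one
    weight vector per grid node; grid holds the node coordinates."""
    c = np.argmin(np.sum((W - v) ** 2, axis=1))      # WTA winner, Eq. (2.62)
    alpha = alpha0 * np.exp(-t / tau)                # temporal decay alpha(t)
    r = r0 * np.exp(-t / tau)                        # shrinking radius r(t)
    dist = np.linalg.norm(grid - grid[c], axis=1)    # grid distance to winner
    h = np.exp(-dist ** 2 / (2 * r ** 2))            # spatial neighborhood N_C
    W += alpha * h[:, None] * (v - W)                # weight adaptation
    return W
```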
Neural Networks - CNN
➢ State computation of the discrete-time cellular neural network (DTCNN) from the inputs and the neighbor outputs:

$$s_j(t) = \sum_i w_{ij}\, o_i(t) + \sum_k w_{kj}\, x_k + \Theta_j \qquad (2.64)$$
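A sketch of one synchronous DTCNN update on a 2D grid, reading (2.64) as template convolutions (A over the outputs, B over the inputs) with a hard output nonlinearity; the template form and the sign function are common DTCNN choices assumed here, not fixed by the formula above:

```python
import numpy as np
from scipy.signal import convolve2d

def dtcnn_step(O, X, A, B, theta):
    """One synchronous DTCNN update following Eq. (2.64)."""
    S = (convolve2d(O, A, mode="same", boundary="fill")    # feedback term
         + convolve2d(X, B, mode="same", boundary="fill")  # control term
         + theta)                                          # threshold
    return np.where(S >= 0, 1.0, -1.0)                     # hard nonlinearity
```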
Conclusions
[Figure: generic data path covering the discussed models: distance computation over (xi − wij), optional nonlinearity NL and radius offset −Rj, followed by a minimum search over the outputs oj]

➢ A data path for a processing element will follow these conceptual guidelines, obeying additional constraints
➢ The learning implementation is considerably more inhomogeneous
➢ An additional non-linearity is required (the derivative)
➢ Commonly, implementations including on-chip learning tend to be more specialized and support only one or a few algorithms
Summary