8 Neural Networks
Hantao Zhang
http://www.cs.uiowa.edu/~hzhang/c145
Motivation:
Reasonable size: 10^11 neurons
Reliable
Graceful degradation

[Table: Computer vs. Human Brain. Human brain: 10^11 neurons
(computational units); 10^11 neurons, 10^14 synapses (storage);
cycle time about 10^(-3) sec; bandwidth about 10^14 bits/sec;
about 10^14 memory updates/sec.]
Biological System

[Figure: a biological neuron: nucleus, cell body, dendrites, axon,
axonal arborization, and synapses connecting to axons from other cells.]
A neuron does nothing until the collective influence of all its inputs
reaches a threshold level.
At that point, the neuron produces a full-strength output in the form of
a narrow pulse that proceeds from the cell body, down the axon, and
into the axon's branches.
It fires! Since it either fires or does nothing, it is considered an
all-or-nothing device.
[Figure: a unit. Input links deliver activations a_j from other cells,
weighted by W_{j,i}; the input function computes in_i; the activation
function g produces the activation a_i, which is sent along the output
links.]

in_i = Σ_{j=1}^{n_i} W_{j,i} a_j

a_i = g(in_i) = g( Σ_{j=1}^{n_i} W_{j,i} a_j )
[Plots: three common choices for the activation function g, each mapping
in_i to a_i with maximum value +1: step, sign, and sigmoid.]
step_t(x) = 1 if x ≥ t,  0 if x < t

sign(x) = +1 if x ≥ 0,  -1 if x < 0

sig(x) = 1 / (1 + e^(-x))
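As a quick illustration, here is a minimal Python sketch of these three
activation functions (the function names are my own):

import math

def step(x, t=0.0):
    # step_t(x): fires (1) once x reaches the threshold t
    return 1.0 if x >= t else 0.0

def sign(x):
    # sign(x): +1 for non-negative input, -1 otherwise
    return 1.0 if x >= 0 else -1.0

def sig(x):
    # sig(x): smooth, differentiable squashing of x into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))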
[Figure: units computing Boolean functions with step_t activation.
AND: W = 1, W = 1, t = 1.5.  OR: W = 1, W = 1, t = 0.5.
NOT: W = -1, t = -0.5.]
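A minimal sketch of these gate units in Python, assuming the usual NOT
construction (weight -1, threshold -0.5):

def unit(inputs, weights, t):
    # A threshold unit: output 1 iff the weighted input sum reaches t.
    return 1 if sum(w * a for w, a in zip(weights, inputs)) >= t else 0

def AND(a, b): return unit([a, b], [1, 1], t=1.5)
def OR(a, b):  return unit([a, b], [1, 1], t=0.5)
def NOT(a):    return unit([a], [-1], t=-0.5)   # assumed: W = -1, t = -0.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "NOT a:", NOT(a))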
[Figure: a two-layer feed-forward network. Input units I_k feed hidden
units a_j through weights W_{k,j}; hidden units feed output units O_i
through weights W_{j,i}.]
Notes:
The roots of the graph are at the bottom and the (only) leaf at the top.
The layer of input units is generally not counted (which is why this is
a two-layer net).
Example

[Figure: input units I1, I2; hidden units H3, H4; output unit O5;
weights w13, w14, w23, w24 into the hidden layer and w35, w45 into the
output unit.]

a_5 = g_5(W_{3,5} a_3 + W_{4,5} a_4)
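A sketch of the forward pass through this example network, assuming
every g_i is the sigmoid (the figure does not fix the activation
functions):

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))    # assumed sigmoid activation

def forward(i1, i2, w13, w14, w23, w24, w35, w45):
    a3 = g(w13 * i1 + w23 * i2)     # hidden unit H3
    a4 = g(w14 * i1 + w24 * i2)     # hidden unit H4
    return g(w35 * a3 + w45 * a4)   # output unit O5: a5 = g5(W35*a3 + W45*a4)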
Perceptrons
Single-layer, feed-forward networks whose units use a step
function as activation function.
[Figure: left, a perceptron network: input units I_j connected by
weights W_{j,i} to output units O_i. Right, a single perceptron: input
units I_j connected by weights W_j to one output unit O.]
Perceptrons
Perceptrons caused a great stir when they were invented
because it was shown that
If a function is representable by a perceptron, then it
is learnable with 100% accuracy, given enough
training examples.
The problem is that perceptrons can only represent
linearly-separable functions.
[Figure: linear separability in two dimensions. (a) I1 AND I2 and
(b) I1 OR I2 can each be separated from the 0s by a single line;
(c) I1 XOR I2 cannot, which is why no perceptron represents it.]
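The claim can be checked by brute force: the sketch below searches a
coarse grid of weights and thresholds for a step perceptron computing
each function, and finds one for AND and OR but none for XOR (the grid
and its range are arbitrary choices of mine):

import itertools

def representable(f):
    # Try every (w1, w2, t) on a coarse grid for a unit w1*I1 + w2*I2 >= t.
    grid = [x / 2 for x in range(-8, 9)]   # -4.0 .. 4.0 in steps of 0.5
    for w1, w2, t in itertools.product(grid, repeat=3):
        if all((w1*i1 + w2*i2 >= t) == f(i1, i2)
               for i1 in (0, 1) for i2 in (0, 1)):
            return True
    return False

print(representable(lambda a, b: bool(a and b)))  # AND -> True
print(representable(lambda a, b: bool(a or b)))   # OR  -> True
print(representable(lambda a, b: bool(a != b)))   # XOR -> False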
[Figure repeated: the two-layer network with inputs I1, I2, hidden units
H3, H4, output unit O5, and a_5 = g_5(W_{3,5} a_3 + W_{4,5} a_4).]
[Figure: a perceptron over inputs I1, I2, I3, each with weight W = 1 and
threshold t = 1.5 (it outputs 1 when at least two inputs are 1), with
(a) its separating plane.]
Learning = Training in NN
Neural networks are trained using data referred to as a training set.
The process is one of computing outputs, comparing the outputs with the
desired answers, adjusting the weights, and repeating.
The information of a neural network is in its structure, activation
functions, and weights.
Learning to use different structures and activation functions is very
difficult.
The weights express the relative strength of an input value or of a
connection from a unit in another layer. It is by adjusting these
weights that a neural network learns.
For a single perceptron with inputs I_j, weights W_j, and threshold t:

O = step_t( Σ_{j=1}^{n} W_j I_j ) = step_0( Σ_{j=0}^{n} W_j I_j )

where W_0 = t and I_0 = -1.

Therefore, we can always assume that the unit's threshold is 0 if we
include the actual threshold as the weight of an extra link with a fixed
input value.
This allows thresholds to be learned like any other weight.
Then, we can even allow output values in [0, 1] by replacing step_0 by
the sigmoid function:

O = sig( Σ_{j=0}^{n} W_j I_j )
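A small sketch of this "bias trick" in Python, checking that folding the
threshold into W_0 = t with the fixed input I_0 = -1 gives the same
outputs (the AND weights reuse the earlier example):

def with_threshold(W, I, t):
    # Original form: compare the weighted sum against an explicit threshold t.
    return 1 if sum(w * x for w, x in zip(W, I)) >= t else 0

def with_bias(W, I):
    # Bias-trick form: W[0] holds t, I[0] is fixed at -1, threshold is 0.
    return 1 if sum(w * x for w, x in zip(W, I)) >= 0 else 0

W, t = [1.0, 1.0], 1.5                        # the AND unit from earlier
for I in ([0, 0], [0, 1], [1, 0], [1, 1]):
    assert with_threshold(W, I, t) == with_bias([t] + W, [-1] + I)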
Theoretic Background
Learn by adjusting weights to reduce error on training set
The squared error for an example with input x and true
output y is
E = (1/2) Err^2 = (1/2) (y - h_W(x))^2
Perform optimization search by gradient descent:
∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j ( y - g( Σ_{k=0}^{n} W_k x_k ) )
        = -Err × g'(in) × x_j

W_j ← W_j - α × ∂E/∂W_j = W_j + α × Err × g'(in) × x_j

(α is the learning rate)
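A minimal sketch of this update rule for one sigmoid unit, using the
bias trick above (train_unit, alpha, and epochs are my own choices):

import math, random

def g(x):  return 1.0 / (1.0 + math.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))        # g'(x), derived below

def train_unit(examples, n, alpha=0.5, epochs=2000):
    # examples: list of (x, y) where x[0] == -1 carries the threshold weight.
    W = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, y in examples:
            in_ = sum(Wj * xj for Wj, xj in zip(W, x))
            err = y - g(in_)                 # Err = y - h_W(x)
            for j in range(n + 1):
                W[j] += alpha * err * gp(in_) * x[j]   # gradient step
    return W

# Learning OR: each input vector starts with the fixed bias input -1.
data = [([-1, 0, 0], 0), ([-1, 0, 1], 1), ([-1, 1, 0], 1), ([-1, 1, 1], 1)]
W = train_unit(data, n=2)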
[Figure: a single perceptron with inputs I1..I5 and weights W1..W5,
applied to training examples e1..e8; for each example the weighted Sum,
the Out(put), and the Err(or) are shown.]
[Figure: a multilayer feed-forward network. Input units a_k feed hidden
units a_j through weights W_{k,j}; hidden units feed output units a_i
through weights W_{j,i}.]
[Plots: two surfaces h_W(x1, x2) over x1, x2 in [-4, 4], with output
values between 0 and 1, produced by networks of soft-threshold (sigmoid)
units.]
∂E/∂W_j = -Err × g'(in) × x_j,   W_j ← W_j + α × Err × g'(in) × x_j

Assuming g(x) = 1 / (1 + e^(-x)):

g'(x) = e^(-x) / (1 + e^(-x))^2 = g(x) (1 - g(x))
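A quick numeric check of this identity against a finite-difference
estimate (the test points are arbitrary):

import math

def g(x): return 1.0 / (1.0 + math.exp(-x))

h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric  = (g(x + h) - g(x - h)) / (2 * h)   # finite difference
    analytic = g(x) * (1 - g(x))                 # g(x)(1 - g(x))
    print(x, round(numeric, 6), round(analytic, 6))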
Back-propagation Learning
1. Phase 1: Propagation
(a) Forward propagation of a training example's input to get the
output O.
(b) Backward propagation of the output error to generate the deltas
of all neural nodes (neurons).
2. Phase 2: Weight update
(a) Multiply a weight's output delta and input activation to get the
gradient of the weight.
(b) Bring the weight in the opposite direction of the gradient by
subtracting a ratio of it from the weight.
(Most neuroscientists deny that back-propagation occurs in the brain.)
Back-propagation Learning
Output layer: similar to the single-layer perceptron, let
Δ_i = Err_i × g'(in_i) (called E_O in (Eq 8.6)). Then update
W_{j,i} ← W_{j,i} + α × a_j × Δ_i
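A sketch of one full back-propagation step for a one-hidden-layer
network with a single sigmoid output; the hidden-layer deltas use the
standard rule Δ_j = g'(in_j) × W_{j,i} × Δ_i, and the layer sizes and
α below are illustrative:

import math

def g(x): return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, y, W_in, W_out, alpha=0.5):
    # Phase 1(a): forward propagation.
    in_h = [sum(W_in[j][k] * x[k] for k in range(len(x)))
            for j in range(len(W_in))]
    a = [g(s) for s in in_h]                    # hidden activations a_j
    out = g(sum(W_out[j] * a[j] for j in range(len(a))))
    # Phase 1(b): backward propagation of the error.
    delta_o = (y - out) * out * (1 - out)       # Delta_i = Err_i * g'(in_i)
    delta_h = [a[j] * (1 - a[j]) * W_out[j] * delta_o
               for j in range(len(a))]          # hidden-layer deltas
    # Phase 2: weight update, W_ji <- W_ji + alpha * a_j * Delta_i.
    for j in range(len(a)):
        W_out[j] += alpha * a[j] * delta_o
        for k in range(len(x)):
            W_in[j][k] += alpha * x[k] * delta_h[j]
    return out

# Example: a 2-2-1 network on one training pair.
W_in  = [[0.1, -0.2], [0.3, 0.4]]   # weights into the two hidden units
W_out = [0.2, -0.1]                 # weights into the output unit
backprop_step([1.0, 0.0], 1.0, W_in, W_out)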
Back-propagation Learning

[Figure: a small network for tracing back-propagation: weights u1, u2,
v1, v2, w1, w2; summing units S1, S2, S3 with activations a1, a2;
Out and Err; training examples e1..e4 with target outputs 1 and 0.]
Handwriting Digits
The Neural Network has 35 input nodes for such an image (one per pixel).
The NN is trained on these perfect examples many times.
Handwriting Digits
The Neural Network has 10 hidden nodes, fully connected to all the
input nodes (350 edges) and fully connected to all 10 output nodes
(100 edges). Each output node represents one digit.
The final result is decided by the maximum output node (winner takes
all).
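A structural sketch of this 35-10-10 network in Python (random initial
weights; sigmoid units are assumed, and training by back-propagation as
above is omitted):

import math, random

def g(x): return 1.0 / (1.0 + math.exp(-x))

class DigitNet:
    def __init__(self):
        rnd = lambda: random.uniform(-0.5, 0.5)
        # 10 hidden nodes, each connected to all 35 inputs: 350 edges.
        self.W_hid = [[rnd() for _ in range(35)] for _ in range(10)]
        # 10 output nodes, each connected to all 10 hidden nodes: 100 edges.
        self.W_out = [[rnd() for _ in range(10)] for _ in range(10)]

    def classify(self, pixels):               # pixels: 35 values in {0, 1}
        hid = [g(sum(w * p for w, p in zip(row, pixels))) for row in self.W_hid]
        out = [g(sum(w * h for w, h in zip(row, hid))) for row in self.W_out]
        return out.index(max(out))            # winner takes all

net = DigitNet()
print(net.classify([0] * 35))   # an untrained net picks an arbitrary digit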
Summary
Learning needed for unknown environments, lazy designers
Learning method depends on type of performance element,
available feedback, type of component to be improved, and its
representation
For supervised learning, the aim is to find a simple hypothesis
approximately consistent with training examples
Learning performance = prediction accuracy measured on test
set
Many applications: speech, driving, handwriting, credit cards,
etc.