
Ch. 8 Neural Networks

Hantao Zhang
http://www.cs.uiowa.edu/~hzhang/c145

The University of Iowa
Department of Computer Science

Brains as Computational Devices


Motivation: Algorithms developed over centuries do not fit the complexity of
real-world problems. The human brain is the most sophisticated computer,
suitable for solving extremely complex problems.

Reasonable size: 10^11 neurons (neural cells), and only a small portion of
these cells are used.
Simple building blocks: no cell contains too much information; information is
saved mainly in the connections among neurons.
Massively parallel: each region of the brain controls specialized tasks.
Fault-tolerant: reliable, with graceful degradation.

Comparing Brains with Computers


                      Computer                          Human Brain
Computational units   1 CPU, 10^5 gates                 10^11 neurons
Storage units         10^9 bits RAM, 10^10 bits disk    10^11 neurons, 10^14 synapses
Cycle time            10^-8 sec                         10^-3 sec
Bandwidth             10^9 bits/sec                     10^14 bits/sec
Neuron updates/sec    10^5                              10^14

Even if a computer is one million times faster than a brain in raw speed, the
brain ends up being one billion times faster than a computer at what it does.
Example: recognizing a face
  Brain: < 1 s (a few hundred cycles)
  Computer: billions of cycles

Biological System
[Figure: a biological neuron: cell body (soma) with nucleus, dendrites, an axon with its axonal arborization, and synapses onto dendrites of other cells]

A neuron does nothing until the collective influence of all its inputs
reaches a threshold level.
At that point, the neuron produces a full-strength output in the form of a
narrow pulse that proceeds from the cell body, down the axon, and into the
axon's branches.
It fires! Since it either fires or does nothing, it is considered an
"all or nothing" device.
A synapse increases or decreases the strength of the connection and causes
excitation or inhibition of the subsequent neuron.

Analogy from Biology


[Figure: an artificial unit: input links carry activations a_j through weights W_j,i into the input function in_i; the activation function g produces the output a_i = g(in_i), which is sent along the output links]

Artificial neurons are viewed as nodes connected to other nodes via links
that correspond to neural connections.
Each link is associated with a weight.
The weight determines the nature (+/-) and strength of one node's influence
on another.
If the combined influence of all the links is strong enough, the node is
activated (similar to the firing of a neuron).

A Neural Network Unit


Artificial Neural Network


A neural network is a graph of nodes (or units) connected by links.
Each link has an associated weight, a real number.
Typically, each node i has several incoming links and several outgoing links.
Each incoming link provides a real number as input to the node, and the node
sends one real number through every outgoing link.
The output of a node is a function of the weighted sum of the node's inputs.

The Input Function


Each incoming link of a unit i feeds it an input value, or activation value,
a_j coming from another unit.
The input function in_i of a unit is simply the weighted sum of the unit's
inputs:

    in_i(a_1, ..., a_{n_i}) = Σ_{j=1}^{n_i} W_{j,i} a_j

The unit applies the activation function g_i to the result of in_i to produce
an output:

    out_i = g_i(in_i) = g_i( Σ_{j=1}^{n_i} W_{j,i} a_j )
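As a small illustration (my own sketch, not from the slides), the C fragment
below computes a single unit's weighted input and its output, using the
sigmoid of the next slide as the activation function; the weights and
activations are made-up values.

#include <math.h>
#include <stdio.h>

/* Activation function g: here the sigmoid 1 / (1 + e^-x). */
double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* in_i = sum_j w[j] * a[j]; out_i = g(in_i). */
double unit_output(const double *w, const double *a, int n) {
    double in = 0.0;
    for (int j = 0; j < n; j++)
        in += w[j] * a[j];
    return sigmoid(in);
}

int main(void) {
    double w[3] = {0.5, -1.0, 2.0};   /* hypothetical link weights W_{j,i} */
    double a[3] = {1.0, 0.0, 0.5};    /* hypothetical incoming activations a_j */
    printf("out_i = %f\n", unit_output(w, a, 3));
    return 0;
}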

Typical Activation Functions


[Figure: graphs of a_i against in_i for (a) the step function, (b) the sign function, and (c) the sigmoid function, each saturating at +1]

(a) Step function:     step_t(x) = 1 if x ≥ t, 0 if x < t
(b) Sign function:     sign(x) = +1 if x ≥ 0, -1 if x < 0
(c) Sigmoid function:  sig(x) = 1 / (1 + e^{-x})

Typical Activation Functions 2


Hard limiter:

    f(x) = 1 if x > θ;  0 if -θ ≤ x ≤ θ;  -1 if x < -θ

Binary sigmoid (exponential sigmoid):

    sig(x) = 1 / (1 + e^{-cx})

where c controls the saturation of the curve. When c → ∞, the hard limiter is
achieved.

Bipolar sigmoid (atan):

    f(x) = tan^{-1}(x)
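These functions are easy to code directly. The C sketch below is one possible
rendering (the function names are mine); the slope parameter c shows how the
binary sigmoid approaches the hard limiter as c grows.

#include <math.h>

/* Step function with threshold t: 1 if x >= t, else 0. */
double step_t(double x, double t) { return (x >= t) ? 1.0 : 0.0; }

/* Sign function: +1 if x >= 0, -1 otherwise. */
double sign_fn(double x) { return (x >= 0.0) ? 1.0 : -1.0; }

/* Binary sigmoid with slope c: approaches the hard limiter as c grows. */
double binary_sigmoid(double x, double c) { return 1.0 / (1.0 + exp(-c * x)); }

/* Bipolar sigmoid based on atan, as on the slide. */
double bipolar_sigmoid(double x) { return atan(x); }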

Units as Logic Gates


AND: two inputs, each with weight W = 1; threshold t = 1.5
OR:  two inputs, each with weight W = 1; threshold t = 0.5
NOT: one input with weight W = -1; threshold t = -0.5

Activation function: step_t

Since units can implement the AND, OR, and NOT boolean operators, neural nets
are Turing-complete: they can implement any computable function.
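To check the gate weights above, the following sketch (code I added for
illustration) evaluates the AND, OR, and NOT units with the step_t activation
over all boolean inputs.

#include <stdio.h>

/* step_t activation: fires (1) when the weighted sum reaches threshold t. */
static int step(double sum, double t) { return sum >= t ? 1 : 0; }

/* Two-input units with weights 1, 1. */
static int and_unit(int a, int b) { return step(1.0*a + 1.0*b, 1.5); }
static int or_unit (int a, int b) { return step(1.0*a + 1.0*b, 0.5); }
/* One-input unit with weight -1 and threshold -0.5. */
static int not_unit(int a)        { return step(-1.0*a, -0.5); }

int main(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            printf("%d %d: AND=%d OR=%d NOT(a)=%d\n",
                   a, b, and_unit(a, b), or_unit(a, b), not_unit(a));
    return 0;
}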

Structures of Neural Networks


Directed:
    Acyclic:
        Feed-forward:
            Multi-layer: nodes are grouped into layers and all links go from
            one layer to the next layer.
            Single layer: each node sends its output out of the network.
        Tree: ...
        Arbitrary feed: ...
    Cyclic: ...
Undirected: ...

Multilayer, Feed-forward Networks


A kind of neural network in which
links are directional and form no cycles (the net is a
directed acyclic graph);
the root nodes of the graph are input units, their
activation value is determined by the environment;
the leaf nodes are output units;
the remaining nodes are hidden units;
units can be divided into layers: a unit in a layer is
connected only to units in the next layer.


A Two-layer, Feed-forward Network


[Figure: a two-layer feed-forward network: input units I_k, weights W_{k,j}, hidden units a_j, weights W_{j,i}, output units O_i]

Notes:

The roots of the graph are at the bottom and the (only) leaf at the top.
The layer of input units is generally not counted (which is why this is
a two-layer net).


Example
[Figure: a two-layer network with input units I1, I2, hidden units H3, H4, output unit O5, and weights w13, w14, w23, w24, w35, w45]

    a5 = g5(W3,5 a3 + W4,5 a4)
       = g5(W3,5 g3(W1,3 a1 + W2,3 a2) + W4,5 g4(W1,4 a1 + W2,4 a2))

where ai is the output and gi is the activation function of node i.

Multilayer, Feed-forward Networks


A powerful computational device:
with just one hidden layer, they can approximate any
continuous function;
with just two hidden layers, they can approximate any
computable function.
However, the number of units needed per layer may grow exponentially with the
number of input units.

Perceptrons
Single-layer, feed-forward networks whose units use a step
function as activation function.

[Figure: (left) a perceptron network: input units I_j connected by weights W_j,i to output units O_i; (right) a single perceptron: input units I_j connected by weights W_j to a single output unit O]

Perceptrons
Perceptrons caused a great stir when they were invented
because it was shown that
If a function is representable by a perceptron, then it
is learnable with 100% accuracy, given enough
training examples.
The problem is that perceptrons can only represent
linearly-separable functions.


Linearly Separable Functions


On a 2-dimensional space:

[Figure: plots of I1 against I2 for (a) I1 and I2, (b) I1 or I2, (c) I1 xor I2; a separating line exists in (a) and (b) but not in (c)]

A black dot corresponds to an output value of 1. An empty dot corresponds to
an output value of 0.

How to Represent XOR function by NN


[Figure: the two-layer network from before: input units I1, I2, hidden units H3, H4, output unit O5]

    a5 = g5(W3,5 a3 + W4,5 a4)
       = g5(W3,5 g3(W1,3 a1 + W2,3 a2) + W4,5 g4(W1,4 a1 + W2,4 a2))

where ai is the output of node i, gi = step_0.5 is the activation function of
node i, and

    W1,3 = W2,4 = W3,5 = W4,5 = 1,   W1,4 = W2,3 = -1.
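One way to convince yourself that these weights compute XOR is to evaluate
the network on all four inputs; the sketch below (added for illustration)
applies step_0.5 at every node.

#include <stdio.h>

static int step05(double x) { return x >= 0.5 ? 1 : 0; }

/* 2-2-1 network with W1,3 = W2,4 = W3,5 = W4,5 = 1 and W1,4 = W2,3 = -1. */
static int xor_net(int a1, int a2) {
    int a3 = step05( 1.0*a1 - 1.0*a2);   /* hidden unit H3 */
    int a4 = step05(-1.0*a1 + 1.0*a2);   /* hidden unit H4 */
    return step05(1.0*a3 + 1.0*a4);      /* output unit O5 */
}

int main(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            printf("%d xor %d = %d\n", a, b, xor_net(a, b));
    return 0;
}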

A Linearly Separable Function


On a 3-dimensional space:
The minority function: return 1 if the input vector contains fewer ones than
zeros; return 0 otherwise.

Inputs I1, I2, I3, each with weight W = -1; threshold t = -1.5.

[Figure: (a) the separating plane in (I1, I2, I3) space; (b) the weights and threshold]
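The same style of check works for the 3-input minority unit; the sketch below
(again my own) uses weights of -1 and threshold -1.5 and prints the output
for all eight input vectors.

#include <stdio.h>

/* Minority unit: weights -1, -1, -1, threshold -1.5. */
static int minority(int i1, int i2, int i3) {
    double sum = -1.0*i1 - 1.0*i2 - 1.0*i3;
    return sum >= -1.5 ? 1 : 0;
}

int main(void) {
    for (int i1 = 0; i1 <= 1; i1++)
        for (int i2 = 0; i2 <= 1; i2++)
            for (int i3 = 0; i3 <= 1; i3++)
                printf("%d%d%d -> %d\n", i1, i2, i3, minority(i1, i2, i3));
    return 0;
}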


Computing with a 2-layer NN


#define NUM_INPUTS 4
#define NUM_HIDDEN_NEURONS 4
#define NUM_OUTPUT_NEURONS 3
typedef struct mlp_s {
    /* Inputs to the MLP (+1 for bias) */
    double inputs[NUM_INPUTS+1];
    /* Weights from Hidden to Input Layer (+1 for bias) */
    double w_h_i[NUM_HIDDEN_NEURONS+1][NUM_INPUTS+1];
    /* Hidden layer (+1 for bias) */
    double hidden[NUM_HIDDEN_NEURONS+1];
    /* Weights from Output to Hidden Layer (+1 for bias) */
    double w_o_h[NUM_OUTPUT_NEURONS][NUM_HIDDEN_NEURONS+1];
    /* Outputs of the MLP */
    double outputs[NUM_OUTPUT_NEURONS];
} mlp_t;

double step(double x) {
    if (x > 0.0) return 1.0; else return 0.0;
}

Computing with a 2-layer NN


void feed_forward( mlp_t *mlp ) {
    int i, h, out;

    /* Feed the inputs to the hidden layer through
       the hidden-to-input weights. */
    for ( h = 0 ; h < NUM_HIDDEN_NEURONS ; h++ ) {
        mlp->hidden[h] = 0.0;
        for ( i = 0 ; i < NUM_INPUTS+1 ; i++ ) {
            mlp->hidden[h] += ( mlp->inputs[i] * mlp->w_h_i[h][i] );
        }
        mlp->hidden[h] = step( mlp->hidden[h] );
    }
    mlp->hidden[NUM_HIDDEN_NEURONS] = 1.0;   /* fixed bias for the output layer */

    /* Feed the hidden layer activations to the output layer
       through the output-to-hidden weights. */
    for ( out = 0 ; out < NUM_OUTPUT_NEURONS ; out++ ) {
        mlp->outputs[out] = 0.0;
        for ( h = 0 ; h < NUM_HIDDEN_NEURONS+1 ; h++ ) {
            mlp->outputs[out] += ( mlp->hidden[h] * mlp->w_o_h[out][h] );
        }
        mlp->outputs[out] = step( mlp->outputs[out] );
    }
}
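A possible way to exercise feed_forward is sketched below (my own example,
with made-up input values; in practice the weights w_h_i and w_o_h would come
from training). The caller fills in the input pattern, sets the bias input to
1, and reads the outputs.

#include <stdio.h>   /* in addition to the definitions above */

int main(void) {
    mlp_t mlp = {0};                  /* all weights zero here (hypothetical) */

    /* Example input pattern plus the fixed bias input. */
    mlp.inputs[0] = 1.0;
    mlp.inputs[1] = 0.0;
    mlp.inputs[2] = 1.0;
    mlp.inputs[3] = 0.0;
    mlp.inputs[NUM_INPUTS] = 1.0;     /* bias */

    feed_forward(&mlp);

    for (int out = 0; out < NUM_OUTPUT_NEURONS; out++)
        printf("output[%d] = %f\n", out, mlp.outputs[out]);
    return 0;
}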

Applications of Neural Networks


Signal and Image Processing
Signal prediction (e.g., weather prediction)
Adaptive noise cancellation
Satellite image analysis
Multimedia processing
Bioinformatics
Functional classification of protein and genes
Clustering of genes based on DNA microarray data

Applications of Neural Networks


Astronomy
Classification of objects (stars and galaxies)
Compression of astronomical data
Finance and Marketing
Stock market prediction
Fraud detection
Loan approval
Product bundling
Strategic planning


Computing with NNs


Different functions are implemented by different network
topologies and unit weights.
The lure of NNs is that a network need not be explicitly
programmed to compute a certain function f .
Given enough nodes and links, a NN can learn the
function by itself.
It does so by looking at a training set of input/output
pairs for f and modifying its topology and weights so
that its own input/output behavior agrees with the
training pairs.
In other words, NNs learn by induction, too.


Learning = Training in NN
Neural networks are trained using data referred to as a
training set.
The process is one of computing outputs, comparing the outputs with the
desired answers, adjusting the weights, and repeating.
The information of a neural network is in its structure, activation
functions, and weights.
Learning to use different structures and activation functions is very
difficult.
The weights express the relative strength of an input value or of a value
coming from a connecting unit (i.e., in another layer). It is by adjusting
these weights that a neural network learns.

Process for Developing NN


1. Collect data: ensure that the application is amenable to a NN approach,
   and pick the data randomly.
2. Separate the data into a training set and a test set.
3. Define a network structure: are perceptrons sufficient?
4. Select a learning algorithm: decided by the available tools.
5. Set parameter values: they will affect the length of the training period.
6. Training: determine and revise the weights.
7. Test: if not acceptable, go back to steps 1, 2, ..., or 5.
8. Delivery of the product.

The Perceptron Learning Method


Weight updating in perceptrons is very simple because
each output node is independent of the other output nodes.

[Figure: (left) a perceptron network with input units I_j, weights W_j,i, and output units O_i; (right) a single perceptron with weights W_j and output unit O]

With no loss of generality then, we can consider a perceptron with a single
output node.

Normalizing Unit Thresholds


Notice that, if t is the threshold value of the output unit, then

    step_t( Σ_{j=1}^{n} W_j I_j ) = step_0( Σ_{j=0}^{n} W_j I_j )

where W_0 = t and I_0 = -1.
Therefore, we can always assume that the unit's threshold is 0 if we include
the actual threshold as the weight of an extra link with a fixed input value.
This allows thresholds to be learned like any other weight.
Then, we can even allow output values in [0, 1] by replacing step_0 by the
sigmoid function.

The Perceptron Learning Method


If O is the value returned by the output unit for a given example and T is
the expected output, then the unit's error is

    Err = T - O

If the error Err is positive, we need to increase O; otherwise, we need to
decrease O.

The Perceptron Learning Method


Since O = g( Σ_{j=0}^{n} W_j I_j ), we can change O by changing each W_j.

Assuming g is monotonic, to increase O we should increase W_j if I_j is
positive, and decrease W_j if I_j is negative.
Similarly, to decrease O we should decrease W_j if I_j is positive, and
increase W_j if I_j is negative.
This is done by updating each W_j as follows:

    W_j ← W_j + α I_j Err

where α is a positive constant, the learning rate, and Err = T - O.

Theoretic Background
Learn by adjusting weights to reduce the error on the training set.
The squared error for an example with input x and true output y is

    E = (1/2) Err^2 = (1/2) (y - h_W(x))^2

Perform optimization search by gradient descent:

    ∂E/∂W_j = Err · ∂Err/∂W_j = Err · ∂/∂W_j ( y - g( Σ_{j=0}^{n} W_j x_j ) )
            = -Err · g'(in) · x_j

Weight update rule


    W_j ← W_j - α ∂E/∂W_j = W_j + α Err g'(in) x_j

E.g., a positive error means we should increase the network output, which
means increasing the weights on positive inputs and decreasing them on
negative inputs.
Simple weight update rule (treating g'(in) as a constant):

    W_j ← W_j + α Err x_j
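Putting the pieces together, one training epoch of the perceptron rule might
look like the following C sketch (my own code; the sizes and the x[e][0] = -1
bias convention from the threshold-normalization slide are assumptions).

#define N_INPUTS   5        /* hypothetical sizes for illustration */
#define N_EXAMPLES 8

/* step_0 activation: threshold 0, since w[0] with x[e][0] = -1 encodes the
   actual threshold. */
static double step0(double x) { return x >= 0.0 ? 1.0 : 0.0; }

/* One pass over the training set with the perceptron rule
   W_j <- W_j + alpha * I_j * Err. */
void perceptron_epoch(double w[N_INPUTS + 1],
                      const double x[N_EXAMPLES][N_INPUTS + 1],
                      const double t[N_EXAMPLES],
                      double alpha) {
    for (int e = 0; e < N_EXAMPLES; e++) {
        double in = 0.0;
        for (int j = 0; j <= N_INPUTS; j++)
            in += w[j] * x[e][j];           /* weighted sum, x[e][0] == -1 */
        double out = step0(in);
        double err = t[e] - out;            /* Err = T - O */
        for (int j = 0; j <= N_INPUTS; j++)
            w[j] += alpha * x[e][j] * err;  /* W_j <- W_j + alpha * I_j * Err */
    }
}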

A 5-place Minority Function


First, collect the data (see below); then choose a structure (a perceptron
with five inputs and one output) and the activation function (i.e., step_{-3}).
Finally, set up the parameters (i.e., W_i = 0) and start to learn.

Assuming α = 1, we have Sum = Σ_{i=1}^{5} W_i I_i, Out = step_{-3}(Sum),
Err = T - Out, and W_j ← W_j + α I_j Err.

[Table: training trace with columns I1..I5, W1..W5, Sum, Out, Err, one row per example e1..e8]

A 5-place Minority Function


The same as the last example, except that α = 0.5 instead of α = 1, and the
initial W_i are different.

Sum = Σ_{i=1}^{5} W_i I_i, Out = step_{-3}(Sum), Err = T - Out, and
W_j ← W_j + α I_j Err.

[Table: training trace with columns I1..I5, W1..W5, Sum, Out, Err, one row per example e1..e8]
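Such a training trace can be produced mechanically. The sketch below (my own
code, with hypothetical training examples, learning rate α = 1 as on the
first of these two slides, and the threshold taken to be -3 since the slides'
minus signs were lost in extraction) applies the stated update rule once to
each example and prints one row of the table per example.

#include <stdio.h>

#define N 5          /* five inputs */
#define M 8          /* eight training examples e1..e8 */

/* step_{-3}: output 1 when the weighted sum is at least -3 (assumed threshold). */
static double out_fn(double sum) { return sum >= -3.0 ? 1.0 : 0.0; }

int main(void) {
    /* Hypothetical training examples (the slides' own examples are not shown). */
    double I[M][N] = {
        {0,0,0,0,0}, {1,0,0,0,0}, {1,1,0,0,0}, {1,1,1,0,0},
        {1,1,1,1,0}, {1,1,1,1,1}, {0,1,0,1,0}, {0,0,1,0,0}
    };
    double W[N] = {0, 0, 0, 0, 0};     /* initial weights, as on the slide */
    double alpha = 1.0;                /* learning rate */

    for (int e = 0; e < M; e++) {
        int ones = 0;
        for (int j = 0; j < N; j++) ones += (int)I[e][j];
        double T = (ones < N - ones) ? 1.0 : 0.0;   /* minority-function target */

        double sum = 0.0;
        for (int j = 0; j < N; j++) sum += W[j] * I[e][j];
        double Out = out_fn(sum);
        double Err = T - Out;
        for (int j = 0; j < N; j++) W[j] += alpha * I[e][j] * Err;

        printf("e%d: Sum=%.1f Out=%.0f Err=%+.0f  W = %.1f %.1f %.1f %.1f %.1f\n",
               e + 1, sum, Out, Err, W[0], W[1], W[2], W[3], W[4]);
    }
    return 0;
}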

Multilayer Perceptrons (MLP)


Layers are usually fully connected; the numbers of hidden units are typically
chosen by hand.

[Figure: a multilayer network with input units a_k, weights W_{k,j}, hidden units a_j, weights W_{j,i}, and output units a_i]

All continuous functions can be represented with 2 layers, all functions with
3 layers.

Sigmoid Function in MLP


    g(x) = 1 / (1 + e^{-cx})

As c becomes larger, the curve becomes sharper:

[Figure: sigmoid curves for increasing values of c]

Sigmoid Function in Perceptron


    g(x) = 1 / (1 + e^{-cx})

Only linearly-separable functions:

[Figure: output of a single sigmoid unit over the input space]

Sigmoid Function in Two Layer MLP


    g(x) = 1 / (1 + e^{-cx})

Two hidden nodes produce ridge-like functions:

[Figure: 3D plot of h_W(x1, x2) for x1, x2 in [-4, 4], showing a ridge]

Sigmoid Function in Two Layer MLP


    g(x) = 1 / (1 + e^{-cx})

Four hidden nodes produce bump-like functions:

[Figure: 3D plot of h_W(x1, x2) for x1, x2 in [-4, 4], showing a bump]

Errors with Sigmoid Functions


    ∂E/∂W_j = -Err · g'(in) · x_j

    W_j ← W_j - α ∂E/∂W_j = W_j + α Err g'(in) x_j

Assuming g(x) = 1 / (1 + e^{-x}), we get

    g'(x) = e^{-x} / (1 + e^{-x})^2 = g(x) (1 - g(x))

Eq. (8.8) on page 267 of the textbook, g'(u) = u(1 - u), is a typo.

Back-propagation Learning
1. Phase 1: Propagation
   (a) Forward propagation of a training example's input to get the output O.
   (b) Backward propagation of the output error to generate the deltas of all
       neural nodes (neurons).
2. Phase 2: Weight update
   (a) Multiply a node's output delta and its input activation to get the
       gradient of the weight.
   (b) Move the weight in the opposite direction of the gradient by
       subtracting a ratio of it from the weight.

(Most neuroscientists deny that back-propagation occurs in the brain.)

Back-propagation Learning
Output layer: similar to the single-layer perceptron, let
Δ_i = Err_i · g'(in_i) (called EO in Eq. 8.6). Then

    W_j,i ← W_j,i + α a_j Δ_i

Hidden layer: back-propagate the error from the output layer,

    Δ_j = ( Σ_i W_j,i Δ_i ) · g'(in_j)

(called Eh in Eq. 8.7, which has typos).
The update rule is identical:

    W_k,j ← W_k,j + α a_k Δ_j

Back-propagation Learning

Initialize the weights in the network (often randomly)
while (stopping criterion not met)
    For each example e in the training set
        O = neural-net-output(network, e)     \\ forward phase
        T = teacher output for e
        Calculate err = (T - O) at the output units
        \\ backward phase
        Compute delta_wh for all weights from the hidden layer
            to the output layer
        Compute delta_wi for all weights from the input layer
            to the hidden layer
        Update the weights in the network
Return the network
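One concrete reading of this pseudocode for a single-hidden-layer network
with sigmoid units is sketched below (array sizes and names are assumptions,
not the textbook's code); it follows the two delta formulas from the previous
slide.

#include <math.h>

#define NI 4   /* input units  (assumed sizes for illustration) */
#define NH 4   /* hidden units */
#define NO 3   /* output units */

static double g(double x)        { return 1.0 / (1.0 + exp(-x)); }
static double g_prime(double in) { double y = g(in); return y * (1.0 - y); }

/* One training example: forward pass, then backward pass and weight update. */
void backprop_step(double w_kj[NH][NI], double w_ji[NO][NH],
                   const double a_k[NI], const double t[NO], double alpha) {
    double in_j[NH], a_j[NH], in_i[NO], a_i[NO];
    double delta_i[NO], delta_j[NH];

    /* Forward phase. */
    for (int j = 0; j < NH; j++) {
        in_j[j] = 0.0;
        for (int k = 0; k < NI; k++) in_j[j] += w_kj[j][k] * a_k[k];
        a_j[j] = g(in_j[j]);
    }
    for (int i = 0; i < NO; i++) {
        in_i[i] = 0.0;
        for (int j = 0; j < NH; j++) in_i[i] += w_ji[i][j] * a_j[j];
        a_i[i] = g(in_i[i]);
    }

    /* Backward phase: output deltas, then back-propagated hidden deltas. */
    for (int i = 0; i < NO; i++)
        delta_i[i] = (t[i] - a_i[i]) * g_prime(in_i[i]);
    for (int j = 0; j < NH; j++) {
        double sum = 0.0;
        for (int i = 0; i < NO; i++) sum += w_ji[i][j] * delta_i[i];
        delta_j[j] = sum * g_prime(in_j[j]);
    }

    /* Weight updates: W <- W + alpha * activation * delta. */
    for (int i = 0; i < NO; i++)
        for (int j = 0; j < NH; j++) w_ji[i][j] += alpha * a_j[j] * delta_i[i];
    for (int j = 0; j < NH; j++)
        for (int k = 0; k < NI; k++) w_kj[j][k] += alpha * a_k[k] * delta_j[j];
}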

Learning XOR Function


Assuming α = 0.5 and g' = 1:
S1 = u·u1 + v·v1, a1 = step_0.5(S1),
S2 = u·u2 + v·v2, a2 = step_0.5(S2),
S3 = a1·w1 + a2·w2, Out = step_0.5(S3),
Err = T - Out, w_j ← w_j + α a_j Err for j = 1, 2,
u_i ← u_i + α u w_i Err and v_i ← v_i + α v w_i Err, for i = 1, 2.

Learning XOR Function


Assuming α = 0.5 and g' = 1:
S1 = u·u1 + v·v1, a1 = step_0.5(S1),
S2 = u·u2 + v·v2, a2 = step_0.5(S2),
S3 = a1·w1 + a2·w2, Out = step_0.5(S3),
Err = T - Out, w_j ← w_j + α a_j Err for j = 1, 2,
u_i ← u_i + α u w_i Err and v_i ← v_i + α v w_i Err, for i = 1, 2.

[Table: training trace with columns u1, u2, v1, v2, w1, w2, S1, a1, S2, a2, S3, Out, Err, one row per example e1..e4]

Handwritten Digits

[Figure: pixel images of digits, each made of 35 pixels]

The neural network has 35 input nodes, one for each pixel of such an image.
The NN is trained on these perfect examples many times.

Handwritten Digits
The neural network has 10 hidden nodes, fully connected to all the input
nodes (350 edges), and fully connected to all 10 output nodes (100 edges).
Each output node represents one digit.
The final result is decided by the maximum output node (winner takes all).
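"Winner takes all" just means returning the index of the largest output; a
minimal C sketch (using an output array like the one in the earlier MLP
code):

/* Return the index of the largest output: the recognized digit. */
int winner_takes_all(const double outputs[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (outputs[i] > outputs[best]) best = i;
    return best;
}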

Neural Network Summary


Advantages
Easy to adapt to unknown situations
Robustness: fault tolerance due to network redundancy
Autonomous learning and generalization
Disadvantages
Poor accuracy
Large complexity of the network structure
Over-fitting to training examples (cannot generalize well)
The solution is a black box (no insights into the problem)


Nearest Neighbor Classification


Compute the distances from the input to all the examples;
choose the k examples that are the nearest neighbors of the input;
decide by the majority class of these k neighbors.
Question: can this classification technique be regarded as a neural network?

Probabilistic Neural Network (PNN)


Compute the distances from the input to all the examples;
accumulate the normalized distances for each class of examples;
decide by the class whose accumulated normalized distance is largest (winner
takes all).

Probabilistic Neural Network (PNN)


Compute the distances from the input to all the examples:

    h_i = E_i · F = Σ_{j=1}^{n} e_{ij} a_j   (Eq. 8.13)

where E_i = (e_{i1}, e_{i2}, ..., e_{in}) is an example and
F = (a_1, a_2, ..., a_n) is the input. Some people also use the Euclidean
distance: h_i = sqrt( Σ_{j=1}^{n} (e_{ij} - a_j)^2 ).

Accumulate the normalized distances for each class of examples: initially
c_j := 0; then c_j += e^{(h_i - 1)/σ^2} / N_j for each h_i whose example
belongs to class j, where c_j is the accumulated normalized distance for
class j, N_j is the number of examples of class j, and σ is the smoothing
factor. This normalized distance is called a normalized Radial-Basis Function
(RBF) in probability theory (a kind of probability density). Other normalized
distances are also used in practice.

Decide by the class whose accumulated normalized distance is largest (winner
takes all).
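Read as code, the PNN decision could be sketched as follows (my own rendering
of Eq. 8.13 and the accumulation step; the sizes and variable names are
assumptions).

#include <math.h>

#define N_FEATURES 4     /* assumed sizes for illustration */
#define N_EXAMPLES 8
#define N_CLASSES  3

int pnn_classify(const double F[N_FEATURES],
                 const double E[N_EXAMPLES][N_FEATURES],
                 const int cls[N_EXAMPLES],      /* class of each example */
                 const int Nj[N_CLASSES],        /* examples per class */
                 double sigma) {
    double c[N_CLASSES] = {0.0};

    for (int i = 0; i < N_EXAMPLES; i++) {
        /* h_i = E_i . F (Eq. 8.13): dot product of example and input. */
        double h = 0.0;
        for (int j = 0; j < N_FEATURES; j++) h += E[i][j] * F[j];
        /* Accumulate the normalized (RBF-style) distance for the example's class. */
        c[cls[i]] += exp((h - 1.0) / (sigma * sigma)) / Nj[cls[i]];
    }

    /* Winner takes all over the accumulated class scores. */
    int best = 0;
    for (int k = 1; k < N_CLASSES; k++)
        if (c[k] > c[best]) best = k;
    return best;
}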

Summary
Learning needed for unknown environments, lazy designers
Learning method depends on type of performance element,
available feedback, type of component to be improved, and its
representation
For supervised learning, the aim is to find a simple hypothesis
approximately consistent with training examples
Learning performance = prediction accuracy measured on test
set
Many applications: speech, driving, handwriting, credit cards,
etc.

