
Lecture Slides for
INTRODUCTION TO Machine Learning, 2nd Edition

CHAPTER 11: Multilayer Perceptrons

ETHEM ALPAYDIN
The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

Edited and expanded for CS 4641 by Chris Simpkins
Overview
Neural networks, brains, and computers
Perceptrons
Training
Classification and regression
Linear separability
Multilayer perceptrons
Universal approximation
Backpropagation

Neural Networks
Networks of processing units (neurons) with connections
(synapses) between them
Large number of neurons: 10^10
Large connectivity: 10^5
Parallel processing
Distributed computation/memory
Robust to noise, failures

Understanding the Brain
Levels of analysis (Marr, 1982)
1. Computational theory
2. Representation and algorithm
3. Hardware implementation
Reverse engineering: From hardware to theory
Parallel processing: SIMD vs MIMD
Neural net: SIMD with modifiable local memory
Learning: Update by training/experience

Perceptron

(Rosenblatt, 1962)

What a Perceptron Does
Regression: y = wx + w0        Classification: y = 1(wx + w0 > 0)

[Figure: a perceptron with input x, weight w, and bias unit x0 = +1; the left panel shows the linear fit used for regression, the right panels the threshold and sigmoid outputs used for linear discrimination]
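A minimal sketch of what the perceptron computes, assuming NumPy; the function name and the classify flag are illustrative, not from the slides:

    import numpy as np

    def perceptron_output(w, w0, x, classify=False):
        """Weighted sum of the inputs plus the bias w0 (the x0 = +1 unit).
        For regression return the sum directly; for classification threshold it."""
        a = np.dot(w, x) + w0
        return float(a > 0) if classify else float(a)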
K Outputs
Regression:
Classification:

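A sketch of the K-output case, assuming NumPy: for regression the K linear outputs are used directly, while for classification they can be passed through softmax and the largest posterior picked (names here are illustrative):

    import numpy as np

    def k_outputs(W, w0, x):
        """K linear outputs, one per row of W: y = Wx + w0."""
        return W @ x + w0

    def softmax(o):
        """Convert K linear outputs into class posteriors that sum to 1."""
        e = np.exp(o - o.max())          # subtract max for numerical stability
        return e / e.sum()

    # Classification: choose the class with the largest posterior.
    # y = softmax(k_outputs(W, w0, x)); chosen = y.argmax()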
Training
Online (instances seen one by one) vs. batch (whole sample) learning
Reasons to prefer online learning:
No need to store the whole sample
The problem may change in time
Wear and degradation in system components
Stochastic gradient descent: update after a single pattern
Generic update rule (LMS rule): Δw_j = η (r^t − y^t) x_j^t
(update = learning factor × (desired output − actual output) × input)

Training a Perceptron: Regression
Regression (linear output): the error on instance t is E^t(w | x^t, r^t) = (1/2)(r^t − y^t)^2, and gradient descent gives the update Δw_j^t = η (r^t − y^t) x_j^t

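A small sketch of stochastic gradient descent for the regression case, assuming NumPy; the arrays X, r and the learning rate eta are placeholder names:

    import numpy as np

    def train_perceptron_regression(X, r, eta=0.01, epochs=50):
        """Online training with the LMS rule: after each instance, move the
        weights by eta * (desired - actual) * input."""
        n, d = X.shape
        w, w0 = np.zeros(d), 0.0
        for _ in range(epochs):
            for x_t, r_t in zip(X, r):
                y_t = np.dot(w, x_t) + w0      # linear output
                w  += eta * (r_t - y_t) * x_t  # LMS update
                w0 += eta * (r_t - y_t)
        return w, w0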
Classification
Single sigmoid output
K>2: softmax outputs
Same as for linear discriminants from Chapter 10, except we update after each instance
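For the single sigmoid output, the cross-entropy gradient has the same (desired − actual) × input form as the LMS rule, so one online step looks like this (a sketch assuming NumPy; names are illustrative):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def update_two_class(w, w0, x_t, r_t, eta=0.1):
        """One online step for a single sigmoid output trained with cross-entropy."""
        y_t = sigmoid(np.dot(w, x_t) + w0)
        w  = w + eta * (r_t - y_t) * x_t
        w0 = w0 + eta * (r_t - y_t)
        return w, w0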
Learning Boolean AND

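One weight setting that solves AND (a sketch; these particular weights are illustrative, since any line separating (1,1) from the other three inputs works):

    import numpy as np

    # Output fires only when both inputs are 1: 1 + 1 - 1.5 > 0,
    # while 0 - 1.5 and 1 - 1.5 are both negative.
    w, w0 = np.array([1.0, 1.0]), -1.5

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        y = int(np.dot(w, x) + w0 > 0)
        print(x, "->", y)   # prints 0, 0, 0, 1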
XOR

No w0, w1, w2 satisfy all four constraints:
w0 ≤ 0 (for x = (0,0), output 0)
w2 + w0 > 0 (for x = (0,1), output 1)
w1 + w0 > 0 (for x = (1,0), output 1)
w1 + w2 + w0 ≤ 0 (for x = (1,1), output 0)
Adding the two strict inequalities gives w1 + w2 + 2w0 > 0, which together with w0 ≤ 0 contradicts the last constraint.
(Minsky and Papert, 1969)

Multilayer Perceptrons

(Rumelhart et al., 1986)


MLP as Universal Approximator

x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
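A sketch of this decomposition as a two-layer network of threshold units, assuming NumPy; the specific weights are illustrative, one of many settings that implement the two AND terms and the final OR:

    import numpy as np

    def step(a):
        return (a > 0).astype(float)

    W_h = np.array([[ 1.0, -1.0],    # hidden unit 1: x1 AND NOT x2
                    [-1.0,  1.0]])   # hidden unit 2: NOT x1 AND x2
    b_h = np.array([-0.5, -0.5])
    v, v0 = np.array([1.0, 1.0]), -0.5   # output unit: OR of the two hidden units

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        z = step(W_h @ np.array(x, float) + b_h)
        print(x, "->", int(v @ z + v0 > 0))   # prints 0, 1, 1, 0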
Backpropagation

Regression

[Figure: the forward pass computes the hidden activations and the output from the input x; the backward pass propagates the error from the output back through the network to update the weights]
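A compact sketch of one forward/backward pass for a single-hidden-layer MLP with a linear output, assuming NumPy; the shapes and names (W, w0 for the hidden layer, v, v0 for the output) are illustrative:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(W, w0, v, v0, x, r, eta=0.01):
        """One stochastic gradient step for regression with H hidden units."""
        # Forward pass
        z = sigmoid(W @ x + w0)           # hidden activations, shape (H,)
        y = v @ z + v0                    # linear output
        # Backward pass
        err = r - y
        dz = err * v * z * (1 - z)        # error pushed back through the sigmoid
        v  += eta * err * z               # output-layer weights
        v0 += eta * err
        W  += eta * np.outer(dz, x)       # hidden-layer weights
        w0 += eta * dz
        return W, w0, v, v0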
Regression with MulOple Outputs
[Figure: two-layer network with inputs x_j, hidden units z_h connected by weights w_hj, and outputs y_i connected by weights v_ih]
[Figure: for a one-dimensional regression example, the hidden-unit inputs w_h x + w_0, the hidden activations z_h, and their weighted contributions v_h z_h to the output]
Two-Class Discrimination
One sigmoid output: y^t estimates P(C1 | x^t), and P(C2 | x^t) = 1 − y^t

K>2 Classes

Multiple Hidden Layers
An MLP with one hidden layer is a universal approximator
(Hornik et al., 1989), but using multiple layers may lead to
simpler networks

Improving Convergence
Momentum: Δw_i^t = −η ∂E^t/∂w_i + α Δw_i^{t−1}

Adaptive learning rate: increase η while the error keeps decreasing, decrease it when the error goes up

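A sketch of both ideas; the constants alpha, a, and b are illustrative defaults, not values from the slides:

    def momentum_update(w, grad, prev_delta, eta=0.01, alpha=0.9):
        """Add a fraction of the previous update so steps accumulate along
        directions where the gradient is consistent and oscillations cancel."""
        delta = -eta * grad + alpha * prev_delta
        return w + delta, delta

    def adapt_learning_rate(eta, err, prev_err, a=0.01, b=0.5):
        """Increase eta additively while the error keeps decreasing,
        cut it back multiplicatively when the error goes up."""
        return eta + a if err < prev_err else eta * (1 - b)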
Overfitting/Overtraining
Number of weights: H(d + 1) + (H + 1)K

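A quick check of the weight-count formula (the values of d, H, and K below are just an example):

    def mlp_num_weights(d, H, K):
        """Each of the H hidden units has d inputs plus a bias; each of the
        K outputs has H hidden inputs plus a bias."""
        return H * (d + 1) + (H + 1) * K

    print(mlp_num_weights(d=8, H=10, K=3))   # 10*9 + 11*3 = 123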
Conclusion
Perceptrons handle linearly separable problems
Multilayer perceptrons can approximate arbitrary functions, so they are not limited to linearly separable problems
Logistic (sigmoid) discrimination functions enable gradient descent-based backpropagation
Backpropagation solves the structural credit assignment problem
Susceptible to local optima
Susceptible to overfitting

Structured MLP

(Le Cun et al., 1989)


Weight Sharing

Hints (Abu-Mostafa, 1995)
Invariance to translation, rotation, size

Virtual examples
Augmented error: E' = E + λ_h E_h
If x and x' are known to have the same output: E_h = [g(x | θ) − g(x' | θ)]^2
Approximation hint:

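A sketch of the augmented error for an invariance hint; g, theta, x_virtual, and lam are placeholder names for the network, its weights, a transformed (virtual) copy of x, and the hint weight λ_h:

    def augmented_error(g, theta, x, r, x_virtual, lam=0.1):
        """Data term plus a penalty that pushes the network to give the same
        output for x and its transformed (virtual) copy."""
        e_data = (r - g(x, theta)) ** 2
        e_hint = (g(x, theta) - g(x_virtual, theta)) ** 2
        return e_data + lam * e_hint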
Tuning the Network Size
Destructive: weight decay
Constructive: growing networks (Ash, 1989; Fahlman and Lebiere, 1989)

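The destructive (weight decay) side as a one-line update, a sketch with illustrative constants:

    def weight_decay_update(w, grad, eta=0.01, lam=0.001):
        """Gradient step plus a pull toward zero: weights that do not help
        reduce the error decay away, effectively pruning the network."""
        return w - eta * grad - lam * w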
Bayesian Learning
Consider the weights w_i as random variables with a prior p(w_i)

Weight decay, ridge regression, regularization

cost = data misfit + complexity
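Written out (a standard derivation consistent with the slide, not a formula quoted from it): with a Gaussian prior on the weights, maximizing the log posterior is the same as minimizing the usual error plus a quadratic complexity penalty, which is exactly weight decay / ridge regression:

    \hat{w}_{MAP} = \arg\max_w \left[ \log p(X \mid w) + \log p(w) \right]
                  = \arg\min_w \left[ E(w \mid X) + \lambda \sum_i w_i^2 \right]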
More about Bayesian methods in Chapter 14
Dimensionality Reduction

Learning Time
Applications:
Sequence recognition: speech recognition
Sequence reproduction: time-series prediction
Sequence association
Network architectures:
Time-delay networks (Waibel et al., 1989)
Recurrent networks (Rumelhart et al., 1986)

Time-Delay Neural Networks

Recurrent Networks

Unfolding in Time

