


Machine Learning
2nd Edition

Multilayer Perceptrons
The MIT Press, 2010
Edited and expanded for CS 4641 by Chris Simpkins
Neural networks, brains, and computers
Classification and regression
Linear separability
Multilayer perceptrons
Universal approximation

Lecture Notes for E. Alpaydın 2004, Introduction to Machine Learning, The MIT Press (V1.1)
Neural Networks
Networks of processing units (neurons) with connections
(synapses) between them
Large number of neurons: 10^10
Large connectivity: 10^5
Parallel processing
Distributed computation/memory
Robust to noise, failures

Understanding the Brain
Levels of analysis (Marr, 1982)
1. Computational theory
2. Representation and algorithm
3. Hardware implementation
Reverse engineering: From hardware to theory
Parallel processing: SIMD vs MIMD
Neural net: SIMD with modifiable local memory
Learning: Update by training/experience


(Rosenblatt, 1962)

What a Perceptron Does
Regression: y = wx + w0
Classification: y = 1(wx + w0 > 0)

Figure panels: linear fit (left), linear discrimination (right)
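The two outputs above can be sketched in a few lines of Python. This is only an illustration: the function names and the example weights are made up for this note and do not come from the slides.

```python
def weighted_sum(x, w, w0):
    """Compute w . x + w0 for input list x, weight list w, and bias w0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0

def perceptron_regress(x, w, w0):
    """Regression: the output is the weighted sum itself."""
    return weighted_sum(x, w, w0)

def perceptron_classify(x, w, w0):
    """Classification: output 1 if the weighted sum is positive, else 0."""
    return 1 if weighted_sum(x, w, w0) > 0 else 0

# Hypothetical weights chosen only for illustration.
w, w0 = [2.0], -1.0
y_reg = perceptron_regress([1.5], w, w0)   # 2 * 1.5 - 1
y_cls = perceptron_classify([1.5], w, w0)  # weighted sum is positive
```

The same weighted sum serves both tasks; only the output function differs.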

K Outputs


Online (instances seen one by one) vs batch (whole sample) learning:
No need to store the whole sample
Problem may change in time
Wear and degradation in system components
Stochastic gradient descent: Update after a single pattern
Generic update rule (LMS rule):
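The LMS rule is commonly stated as: update = learning factor x (desired output - actual output) x input. A minimal sketch for a single linear unit follows. The names `eta` (learning factor), `r` (desired output) and the toy target r = 2x + 1 are assumptions made for this example.

```python
def lms_update(w, w0, x, r, eta=0.1):
    """One stochastic gradient step on a single pattern (x, r)."""
    y = sum(wj * xj for wj, xj in zip(w, x)) + w0   # actual output
    err = r - y                                     # desired - actual
    w = [wj + eta * err * xj for wj, xj in zip(w, x)]
    w0 = w0 + eta * err                             # bias input is fixed at 1
    return w, w0

# Online learning: patterns are seen one by one and never stored.
w, w0 = [0.0], 0.0
for _ in range(500):
    for x1 in (0.0, 1.0, 2.0, 3.0):
        w, w0 = lms_update(w, w0, [x1], 2.0 * x1 + 1.0, eta=0.05)
```

On this noise-free toy problem the weights approach w = 2 and w0 = 1. With noisy data or a larger `eta` the behaviour may differ.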

Training a Perceptron: Regression
Regression (Linear output):

Single sigmoid output

K>2 softmax outputs

Same as for linear discriminants from chapter 10, except we update after each instance
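A sketch of the K>2 case, assuming the usual softmax output and an update of the same form as the LMS rule, eta x (r_i - y_i) x x_j, applied after each instance. The function names and the weight layout (bias in column 0) are assumptions for this example.

```python
import math

def softmax(o):
    """Turn K raw outputs into K values that are positive and sum to 1."""
    m = max(o)                                 # subtract max for numerical stability
    exps = [math.exp(oi - m) for oi in o]
    total = sum(exps)
    return [e / total for e in exps]

def online_softmax_update(W, x, r, eta=0.1):
    """Update all K weight rows after one instance.

    W: K rows of [w_i0, w_i1, ..., w_id]; x: d inputs; r: one-hot target.
    """
    xb = [1.0] + list(x)                       # prepend the bias input
    o = [sum(wij * xj for wij, xj in zip(wi, xb)) for wi in W]
    y = softmax(o)
    return [[wij + eta * (ri - yi) * xj for wij, xj in zip(wi, xb)]
            for wi, ri, yi in zip(W, r, y)]

# With all-zero weights every class gets probability 1/3.
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
W_new = online_softmax_update(W, [1.0], [1, 0, 0], eta=0.3)
```

After this single update the weights of the correct class move up and those of the other two classes move down.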
Learning Boolean AND
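AND is linearly separable, so a single threshold unit can learn it. The sketch below uses the classic perceptron update (weights change only on a misclassified pattern). The learning factor `eta=1.0`, the zero initial weights and the function names are choices made for this example, not taken from the slides.

```python
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(w, w0, x):
    """Threshold unit: 1 if w . x + w0 > 0, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + w0 > 0 else 0

def train_perceptron(data, eta=1.0, max_epochs=100):
    """Cycle through the data until an epoch passes with no errors."""
    w, w0 = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, r in data:
            err = r - predict(w, w0, x)
            if err != 0:
                errors += 1
                w = [w[0] + eta * err * x[0], w[1] + eta * err * x[1]]
                w0 += eta * err
        if errors == 0:
            break
    return w, w0

w, w0 = train_perceptron(AND_DATA)
predictions = [predict(w, w0, x) for x, _ in AND_DATA]
```

The learned weights are one of many valid solutions; any line that separates (1,1) from the other three corners works.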


No w0, w1, w2 satisfy:

(Minsky and Papert, 1969)
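For XOR, a threshold unit would need w0 <= 0, w2 + w0 > 0, w1 + w0 > 0 and w1 + w2 + w0 <= 0. Adding the two strict inequalities gives w1 + w2 + 2 w0 > 0, while adding the other two gives w1 + w2 + 2 w0 <= 0, a contradiction. The sketch below is a coarse grid search, not a proof. It only illustrates that the search comes up empty for XOR while the same search finds solutions for AND. The grid range and step are arbitrary choices.

```python
from itertools import product

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def solves(w0, w1, w2, data):
    """True if the threshold unit classifies every pattern correctly."""
    return all((1 if w1 * x1 + w2 * x2 + w0 > 0 else 0) == r
               for (x1, x2), r in data)

def grid_solutions(data, lo=-2.0, hi=2.0, step=0.5):
    """Try every (w0, w1, w2) on a regular grid and keep the ones that work."""
    n = int(round((hi - lo) / step)) + 1
    values = [lo + i * step for i in range(n)]
    return [ws for ws in product(values, repeat=3) if solves(*ws, data)]

xor_solutions = grid_solutions(XOR)   # empty: no single unit computes XOR
and_solutions = grid_solutions(AND)   # non-empty: AND is linearly separable
```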

Multilayer Perceptrons

(Rumelhart et al., 1986)

MLP as Universal Approximator

x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
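The decomposition of XOR into two ANDs and an OR maps directly onto a network with two hidden threshold units and one output unit. The weights below are hand-picked for illustration (one of many possible choices) and are not learned.

```python
def step(a):
    """Threshold activation: 1 if a > 0, else 0."""
    return 1 if a > 0 else 0

def xor_mlp(x1, x2):
    """Two-layer network computing XOR with hand-set weights."""
    z1 = step(x1 - x2 - 0.5)     # hidden unit 1: x1 AND ~x2
    z2 = step(-x1 + x2 - 0.5)    # hidden unit 2: ~x1 AND x2
    return step(z1 + z2 - 0.5)   # output unit: z1 OR z2

outputs = [xor_mlp(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Each hidden unit solves a linearly separable subproblem, and the output unit combines them.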

Regression with Multiple Outputs




Two-Class Discrimination
One sigmoid output y^t for P(C1|x^t), with P(C2|x^t) = 1 - y^t
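A small sketch of the two-class output, assuming the standard logistic sigmoid and the cross-entropy error that is usually paired with it. The function names are chosen for this example.

```python
import math

def sigmoid(a):
    """Logistic function: squashes any real a into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def two_class_posteriors(a):
    """Return (P(C1|x), P(C2|x)) from the single pre-activation a."""
    y = sigmoid(a)
    return y, 1.0 - y

def cross_entropy(r, y):
    """Error for one pattern with label r in {0, 1} and output y in (0, 1)."""
    return -(r * math.log(y) + (1 - r) * math.log(1 - y))

p1, p2 = two_class_posteriors(0.0)   # on the decision boundary both are 0.5
```

One output is enough because the two posteriors must sum to 1.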

K>2 Classes

Multiple Hidden Layers
MLP with one hidden layer is a universal approximator
(Hornik et al., 1989), but using multiple layers may lead to
simpler networks

Improving Convergence

Adaptive learning rate
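One common heuristic for an adaptive learning rate is to increase it by a constant while the error keeps falling and to shrink it geometrically otherwise. The sketch below assumes that heuristic. The constants `a` and `b` and the function name are hypothetical and would need tuning in practice.

```python
def adapt_learning_rate(eta, prev_error, curr_error, a=0.01, b=0.1):
    """Return the next learning rate given the last two error values."""
    if curr_error < prev_error:
        return eta + a          # error went down: grow additively
    return eta - b * eta        # error went up or stalled: shrink by a fraction

eta_up = adapt_learning_rate(0.1, prev_error=1.0, curr_error=0.5)
eta_down = adapt_learning_rate(0.1, prev_error=0.5, curr_error=1.0)
```

Growing additively and shrinking multiplicatively makes the rate cautious after a bad step.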

Number of weights: H (d+1)+(H+1)K
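The count follows from d inputs plus a bias feeding each of H hidden units, and H hidden units plus a bias feeding each of K outputs. A one-line helper (the name is chosen for this note) makes the growth easy to check:

```python
def mlp_weight_count(d, H, K):
    """Weights in a one-hidden-layer MLP: H*(d+1) + (H+1)*K."""
    return H * (d + 1) + (H + 1) * K

small = mlp_weight_count(d=2, H=2, K=1)       # e.g. the XOR-sized network
larger = mlp_weight_count(d=256, H=10, K=10)  # hypothetical image-sized input
```

The number of free parameters grows with H, which is one reason larger hidden layers are more prone to overfitting.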

Perceptrons handle linearly separable problems
Multilayer perceptrons can also represent problems that are not linearly separable (universal approximation)
Logistic discrimination functions enable gradient descent-based backpropagation
Solves the structural credit assignment problem
Susceptible to local optima
Susceptible to overfitting

Structured MLP

(Le Cun et al., 1989)

Weight Sharing

Hints (Abu-Mostafa, 1995)
Invariance to translation, rotation, size

Virtual examples
Augmented error: E' = E + λ_h E_h
If x' and x are the "same": E_h = [g(x|θ) - g(x'|θ)]^2
Approximation hint:

Tuning the Network Size
Destructive: weight decay
Constructive: growing networks (Ash, 1989) (Fahlman and Lebiere, 1989)

Bayesian Learning
Consider weights w_i as random variables with prior p(w_i)

Weight decay, ridge regression, regularization

cost = data-misfit + λ · complexity
More about Bayesian methods in chapter 14
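Weight decay can be read as gradient descent on a penalized cost. The sketch below assumes the penalty (lam / 2) x sum of w_i^2, so the complexity term contributes lam x w_i to each gradient component. The names `lam`, `eta` and the function name are assumptions for this example.

```python
def weight_decay_step(w, grad, eta=0.1, lam=0.01):
    """One gradient step on E' = E + (lam / 2) * sum(w_i ** 2).

    grad holds dE/dw_i for the data-misfit term only.
    """
    return [wi - eta * (gi + lam * wi) for wi, gi in zip(w, grad)]

# With a zero data gradient, every weight simply shrinks toward 0.
shrunk = weight_decay_step([1.0, -2.0], [0.0, 0.0], eta=0.5, lam=0.2)
```

Weights that the data does not support decay away, which is how the penalty limits effective network complexity.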
Dimensionality Reduction

Learning Time
Sequence recognition: Speech recognition
Sequence reproduction: Time-series prediction
Sequence association
Network architectures
Time-delay networks (Waibel et al., 1989)
Recurrent networks (Rumelhart et al., 1986)

Time-Delay Neural Networks

Recurrent Networks

Unfolding in Time

