
Deep Learning Models

2012-05-03
Byoung-Hee Kim
Biointelligence Lab, CSE,
Seoul National University
NOTE: most slides are from talks of Geoffrey Hinton, Andrew Ng, and Yoshua Bengio.

(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/


[Figure: a network maps an input to an output, which is compared with a target; example output: "Two!"]


Artificial Neural Networks

Historical background:
First generation neural networks

Perceptrons (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.
- There was a neat learning algorithm for adjusting the weights.
- But perceptrons are fundamentally limited in what they can learn to do.

Sketch of a typical perceptron from the 1960s:
- output units, e.g. class labels ("Bomb" vs. "Toy")
- non-adaptive hand-coded features
- input units, e.g. pixels

Second generation neural networks (~1985)

Compare the outputs with the correct answer to get an error signal, then back-propagate the error signal through the network to get derivatives for learning.

Architecture: input vector → hidden layers → outputs
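The back-propagation procedure on this slide can be sketched in a few lines of NumPy. This is an illustrative toy (random data, arbitrary layer sizes, plain squared error), not the networks from the original talks:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy network: 4 inputs -> 8 hidden units -> 1 output. Sizes are arbitrary.
W1 = rng.normal(0.0, 0.5, (4, 8))
W2 = rng.normal(0.0, 0.5, (8, 1))
lr = 0.5

x = rng.random((16, 4))                                   # input vectors
t = (x.sum(axis=1, keepdims=True) > 2.0).astype(float)    # correct answers

losses = []
for _ in range(500):
    h = sigmoid(x @ W1)                  # forward pass: hidden layer
    y = sigmoid(h @ W2)                  # forward pass: outputs
    losses.append(float(np.mean((y - t) ** 2)))
    dy = (y - t) * y * (1 - y)           # error signal at the outputs
    dh = (dy @ W2.T) * h * (1 - h)       # error back-propagated to hidden units
    W2 -= lr * h.T @ dy / len(x)         # derivatives -> gradient steps
    W1 -= lr * x.T @ dh / len(x)
```

The two lines computing `dy` and `dh` are the whole idea of the slide: the output error is pushed backwards through the weights to give every layer a learning signal.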

But training models with deep architectures was largely unsuccessful until 2006.

http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html

Agenda
- Computer perception
- Unsupervised feature learning
- Various deep learning models
- Application cases of deep learning models
  - Handwritten digit recognition/generation (MNIST dataset)
  - Image classification
  - Audio recognition
  - Language modeling
  - Motion generation
- References
- Appendix

Brain-like Cognitive Computing & Deep Learning
- It is well known that the brain has a hierarchical structure.
- Researchers try to build models that simulate and/or act like the brain.
- Learning deep structures from data, or deep learning, is a new frontier in artificial intelligence research.
- Researchers try to find analogies between the characteristics of the brain and their deep models.

Feature Learning

[Diagram: raw pixel values (pixel 1, pixel 2) fed directly to a learning algorithm; in the input (pixel) space, motorbikes and non-motorbikes are hard to separate.]

Feature Learning

[Diagram: a feature extractor maps the input image to higher-level features (handle, wheel) before the learning algorithm; in this feature space, motorbikes and non-motorbikes separate cleanly.]

How is computer perception done?

- Object detection: image → low-level vision features → recognition
- Audio classification: audio → low-level audio features → speaker identification
- Helicopter control: helicopter state → low-level state features → action

Learning representations

Sensor → feature representation → learning algorithm

Computer vision features: SIFT, HoG, Textons, Spin image, RIFT, GLOH

Audio features: MFCC, Spectrogram, Flux, ZCR, Rolloff

Problems of hand-tuned features
- Require expert knowledge
- Sub-optimal
- Time-consuming and expensive
- Do not generalize to other domains

Can we automatically learn good feature representations?

Sensor representation in the brain
- Seeing with your tongue
- Human echolocation (sonar)
- The auditory cortex learns to see.
[BrainPort; Martinez et al.; Roe et al.]

Unsupervised Feature Learning

Find a better way to represent images than pixels



The goal of Unsupervised Feature Learning

Unlabeled images → learning algorithm → feature representation


Stochastic binary units (Bernoulli variables)

- These have a state of 1 or 0.
- The probability of turning on is determined by the weighted input from the other units (plus a bias):

p(s_i = 1) = 1 / (1 + exp(-(b_i + Σ_j s_j w_ji)))

As the total input b_i + Σ_j s_j w_ji grows, p(s_i = 1) rises sigmoidally from 0 to 1.
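The sampling rule for a stochastic binary unit can be written directly in NumPy. A minimal sketch; the weights and states below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_on(s, W, b):
    """p(s_i = 1) = 1 / (1 + exp(-(b_i + sum_j s_j * w_ji)))"""
    return 1.0 / (1.0 + np.exp(-(b + s @ W)))

def sample(s, W, b):
    """Draw each unit's binary state from its Bernoulli probability."""
    p = p_on(s, W, b)
    return (rng.random(p.shape) < p).astype(float)

s = np.array([1.0, 0.0, 1.0])     # states of the other units
W = rng.normal(size=(3, 2))       # weights w_ji into two target units
b = np.zeros(2)                   # biases
probs = p_on(s, W, b)
states = sample(s, W, b)          # each entry is 0.0 or 1.0
```

A strongly positive bias drives the probability toward 1, a strongly negative one toward 0, matching the sigmoid curve on the slide.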


A model of digit recognition

- The top two layers form an associative memory whose energy landscape models the low-dimensional manifolds of the digits. The energy valleys have names.
- The model learns to generate combinations of labels and images.
- To perform recognition we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.

Architecture: 28 x 28 pixel image → 500 neurons → 500 neurons → 2000 top-level neurons, with 10 label neurons attached to the top.
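The recognition up-pass can be sketched as a deterministic forward sweep through the layer sizes on this slide. The weights below are random placeholders (a real DBN learns them greedily, one RBM at a time), and the top-level associative-memory iterations are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes from the slide: 28 x 28 = 784 pixels -> 500 -> 500,
# feeding the 2000-unit top-level associative memory (not modelled here).
sizes = [784, 500, 500]
weights = [rng.normal(0.0, 0.01, (m, n)) for m, n in zip(sizes, sizes[1:])]

def up_pass(image):
    """Propagate activation probabilities from the image toward the top."""
    h = image
    for W in weights:
        h = sigmoid(h @ W)
    return h

top_input = up_pass(rng.random(784))   # would drive the associative memory
```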

Generation & Recognition of Digits by DBN

A deep belief network that learns to generate handwritten digits:
http://www.cs.toronto.edu/~hinton/digits.html

First stage of visual processing in brain: V1

The first stage of visual processing in the brain (V1) does edge detection.

[Figure: schematic of a simple cell, an actual simple cell, and Gabor functions. Images from DeAngelis, Ohzawa & Freeman, 1995]

Sparse coding illustration

From natural images, sparse coding learns a set of bases (f1, ..., f64) that look like edges.

Test example: x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

Feature representation: [0, 0, ..., 0, 0.8, 0, ..., 0, 0.3, 0, ..., 0, 0.5, ...] = [a1, ..., a64]
→ compact & easily interpretable
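The decomposition above is just a sparse weighted sum of bases. A toy sketch with a random stand-in dictionary (the real bases would be learned from natural image patches):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dictionary: 64 bases, each a flattened 14 x 14 patch.
bases = rng.normal(size=(64, 196))

# Sparse code with only three active coefficients, as in the slide:
# x ~= 0.8 * f36 + 0.3 * f42 + 0.5 * f63 (0-indexed below).
a = np.zeros(64)
a[35], a[41], a[62] = 0.8, 0.3, 0.5

x = a @ bases   # reconstruction: a weighted sum of a few bases
```

The 64-dimensional vector `a`, mostly zeros, is the compact feature representation the slide refers to.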

Supervised learning

[Diagram: labeled training images of cars and motorcycles; testing asks "What is this?"]

Semi-supervised learning

[Diagram: unlabeled images (all cars/motorcycles) plus labeled car/motorcycle examples; testing asks "What is this?"]

Self-taught learning

[Diagram: unlabeled images (random internet images) plus labeled car/motorcycle examples; testing asks "What is this?"]

Self-taught learning

1. Learn bases f1, f2, ..., fk from unlabeled data (sparse coding, LCC, etc.).
2. Use the learned f1, f2, ..., fk to represent the training/test sets as activations a1, a2, ..., ak.
3. Train the classifier (car vs. motorcycle) on these activations.
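The "represent the training/test sets with the learned bases" step might look like this. A least-squares projection stands in for real sparse-coding inference (which would add an L1 penalty on the activations), and the dictionary here is random for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

k, d = 16, 64                      # k bases, d-dimensional inputs
F = rng.normal(size=(k, d))        # hypothetical learned bases f1..fk

def encode(x, F):
    """Activations a1..ak that best reconstruct x from the bases.

    Real sparse coding would minimize ||x - F.T @ a||^2 + lam * ||a||_1;
    plain least squares is used here to keep the sketch dependency-free.
    """
    a, *_ = np.linalg.lstsq(F.T, x, rcond=None)
    return a

x = rng.normal(size=d)
a = encode(x, F)   # feature vector fed to the downstream classifier
```

Any off-the-shelf classifier can then be trained on these `a` vectors instead of raw pixels.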

Convolutional DBN for Images

A convolutional DBN on face images learns a feature hierarchy:
pixels → edges → object parts (combinations of edges) → object models

Learning of object parts

Examples of learned object parts from object categories: faces, cars, elephants, chairs.

Training on multiple objects
- Trained on 4 classes (cars, faces, motorbikes, airplanes).
- Second layer: shared features and object-specific features.
- Third layer: more specific features.

[Plot of H(class | neuron active)]

Hierarchical probabilistic inference

Generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.

[Figure: input images; samples from feedforward inference (control); samples from full posterior inference]

An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)

- Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.
- Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis.
- We only represent changes in yaw, because physics doesn't care about its absolute value and we want to avoid circular variables.

Video lecture: http://videolectures.net/gesturerecognition2011_taylor_tutorial/

Motion Generation by Conditional RBM


Hinton's talk at Google:
http://www.youtube.com/watch?v=VdIURAu1-aU

Andrew Ng's talk at the Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning
http://www.youtube.com/watch?v=ZmNOAtZIgIk&feature=relmfu

References

General info on deep learning
- http://deeplearning.net/

Review
- Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
- I. Arel, D. C. Rose, and T. P. Karnowski, "Deep machine learning: a new frontier in artificial intelligence research," Computational Intelligence Magazine, 14:12-18, 2010.

References

Tutorials & workshops
- Deep Learning and Unsupervised Feature Learning workshop, NIPS 2010: http://deeplearningworkshopnips2010.wordpress.com/schedule/acceptedpapers/
- Workshop on Learning Feature Hierarchies, ICML 2009: http://www.cs.toronto.edu/~rsalakhu/deeplearning/index.html
