
Deep Learning Models

2012-05-03
Byoung-Hee Kim
Biointelligence Lab, CSE,
Seoul National University

NOTE: most slides are from talks of Geoffrey Hinton, Andrew Ng, and Yoshua Bengio.
(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/ 2
[Figure: example of input, output, and target; the network answers "Two!"]
Artificial Neural Networks



Historical background:
First generation neural networks

Perceptrons (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.
There was a neat learning algorithm for adjusting the weights.
But perceptrons are fundamentally limited in what they can learn to do.

[Figure: sketch of a typical perceptron from the 1960s, classifying "Bomb" vs. "Toy": output units (e.g. class labels), a layer of non-adaptive hand-coded features, and input units (e.g. pixels).]
Second generation neural networks (~1985)

Compare outputs with the correct answer to get an error signal.
Back-propagate the error signal to get derivatives for learning.

[Figure: feed-forward network with an input vector, hidden layers, and outputs.]
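The two arrows on this slide (a forward pass, then a back-propagated error signal) can be sketched in NumPy. The layer sizes, logistic units, learning rate, and squared-error loss below are illustrative assumptions, not details from the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, W1, W2, lr=0.1):
    # Forward pass: input vector -> hidden layer -> outputs.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Compare outputs with the correct answer to get the error signal.
    err = y - target
    # Back-propagate the error signal to get derivatives for learning.
    delta_out = err * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # Gradient-descent update of the weights (in place).
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return 0.5 * np.sum(err ** 2)   # squared error before the update
```

Repeated calls on the same example drive the error down, which is all the back-propagation recipe above promises.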
But training models with deep architectures was not successful until 2006.
http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html
Agenda

Computer Perception
Unsupervised feature learning
Various deep learning models
Application cases of deep learning models
  Handwritten digit recognition/generation (MNIST dataset)
  Image classification
  Audio recognition
  Language modeling
  Motion generation
References
Appendix


Brain-like Cognitive Computing & Deep Learning

It is well known that the brain has a hierarchical structure.
Researchers try to build models that simulate and/or act like the brain.
Learning deep structures from data, or deep learning, is a new frontier in Artificial Intelligence research.
Researchers try to find analogies between the characteristics of the brain and their deep models.
Feature Learning

[Figure: images represented directly by raw pixels (pixel 1 vs. pixel 2) and fed to a learning algorithm; Motorbikes and Non-Motorbikes plotted in the input (pixel) space.]
Feature Learning

[Figure: a feature extractor maps images to a feature space (handle, wheel) before the learning algorithm; Motorbikes and Non-Motorbikes shown both in pixel space and in feature space.]
How is computer perception done?

Object detection: Image → Low-level vision features → Recognition
Audio classification: Audio → Low-level audio features → Speaker identification
Helicopter control: Helicopter → Low-level state features → Action
Learning representations

Sensor → Representation → Learning algorithm


Computer vision features

SIFT Spin image

HoG RIFT

Textons
(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/ GLOH 24
Audio features

Spectrogram, MFCC, Flux, ZCR, Rolloff


Problems of hand-tuned features

Needs expert knowledge
Sub-optimal
Time-consuming and expensive
Does not generalize to other domains

Can we automatically learn good feature representations?


Sensor representation in the brain

Seeing with your tongue
Human echolocation (sonar)
The auditory cortex learns to see.

[BrainPort; Martinez et al.; Roe et al.]
Unsupervised Feature Learning

Find a better way to represent images than pixels



The goal of Unsupervised Feature Learning

Unlabeled images → Learning algorithm → Feature representation


Stochastic binary units (Bernoulli variables)

These have a state of 1 or 0. The probability of turning on is determined by the weighted input from other units (plus a bias):

p(s_i = 1) = 1 / (1 + exp(-b_i - Σ_j s_j w_ji))

[Figure: the logistic curve; p(s_i = 1) rises from 0 toward 1 as b_i + Σ_j s_j w_ji increases.]
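The turn-on rule for a stochastic binary unit translates directly into code; the function names below are mine, not from the slides.

```python
import numpy as np

def turn_on_probability(b_i, s, w_i):
    # p(s_i = 1) = 1 / (1 + exp(-b_i - sum_j s_j * w_ji)):
    # a logistic function of the bias plus the weighted input from other units.
    return 1.0 / (1.0 + np.exp(-(b_i + np.dot(s, w_i))))

def sample_unit(b_i, s, w_i, rng):
    # The unit turns on (state 1) with that probability, else stays at 0.
    return int(rng.random() < turn_on_probability(b_i, s, w_i))
```

With zero total input the unit is on exactly half the time; strongly positive input drives the probability toward 1.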
Binary Stochastic Neuron
A model of digit recognition

The top two layers form an associative memory whose energy landscape models the low-dimensional manifolds of the digits. The energy valleys have names.
The model learns to generate combinations of labels and images.
To perform recognition we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.

[Figure: the network; a 28 x 28 pixel image feeds 500 neurons, then 500 neurons; the top level has 2000 neurons connected to 10 label neurons.]
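The recognition procedure just described (an up-pass, then settling the top-level associative memory) can be sketched as follows. The weight matrices here are hypothetical and untrained, biases are dropped, and a deterministic mean-field update stands in for the alternating Gibbs sampling the real model uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recognize(image, W1, W2, W_top, W_lab, n_iters=5):
    """Sketch of recognition: up-pass from the image through the two
    500-unit layers, then a few iterations of the 2000-unit top-level
    associative memory jointly with the 10 label units."""
    h1 = sigmoid(W1 @ image)          # 28x28 = 784 pixels -> 500 neurons
    h2 = sigmoid(W2 @ h1)             # 500 -> 500
    labels = np.full(10, 0.1)         # neutral state of the label units
    for _ in range(n_iters):
        top = sigmoid(W_top @ h2 + W_lab @ labels)  # bottom-up to 2000 units
        labels = sigmoid(W_lab.T @ top)             # top-down to the labels
    return int(np.argmax(labels))     # most active label unit wins
```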
Generation & Recognition of Digits by DBN

A deep belief network that learns to generate handwritten digits:
http://www.cs.toronto.edu/~hinton/digits.html


First stage of visual processing in brain: V1
The first stage of visual processing in the brain (V1) does
edge detection.

[Figure: schematic of a simple cell and an actual simple cell; the receptive fields resemble Gabor functions. Images from DeAngelis, Ohzawa & Freeman, 1995.]
Sparse coding illustration

[Figure: natural images and the learned bases (f1, ..., f64), which look like edges.]

Test example:

x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

[0, 0, ..., 0, 0.8, 0, ..., 0, 0.3, 0, ..., 0, 0.5, ...] = [a1, ..., a64] (feature representation)

Compact & easily interpretable
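The test example above can be written out directly. The dictionary below is random rather than learned, so it only illustrates the sparse-combination mechanics, not the edge-like bases; the patch size is an arbitrary choice.

```python
import numpy as np

def reconstruct(bases, coeffs):
    # x_hat = sum_i a_i * f_i, where the bases f_i are the columns of `bases`.
    return bases @ coeffs

# Hypothetical stand-in dictionary: 64 bases over 14x14 = 196 pixels.
# (A learned dictionary would contain edge-like filters.)
rng = np.random.default_rng(0)
bases = rng.normal(size=(196, 64))

# The sparse code from the slide: a36 = 0.8, a42 = 0.3, a63 = 0.5, rest zero.
a = np.zeros(64)
a[35], a[41], a[62] = 0.8, 0.3, 0.5   # 0-based indices for f36, f42, f63

x = reconstruct(bases, a)             # x = 0.8*f36 + 0.3*f42 + 0.5*f63
```

Only 3 of the 64 coefficients are nonzero, which is what makes the representation compact and easily interpretable.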
Supervised learning

[Figure: labeled training images of Cars and Motorcycles.]

Testing: What is this?
Semi-supervised learning

[Figure: many unlabeled images (all cars/motorcycles), plus a few labeled examples: Car, Motorcycle.]

Testing: What is this?
Self-taught learning

[Figure: many unlabeled images (random internet images), plus a few labeled examples: Car, Motorcycle.]

Testing: What is this?
Self-taught learning

Learn bases f1, f2, ..., fk from the unlabeled data (sparse coding, LCC, etc.).
Use the learned f1, f2, ..., fk to represent the training/test sets: each image becomes features a1, a2, ..., ak.
Train and test the classifier (Car vs. Motorcycle) on these features.
Convolutional DBN for Images
Convolutional DBN on face images

[Figure: the learned feature hierarchy, from pixels to edges to object parts (combinations of edges) to object models.]
Learning of object parts

Examples of learned object parts from object categories: Faces, Cars, Elephants, Chairs
Training on multiple objects

Trained on 4 classes (cars, faces, motorbikes, airplanes).
Second layer: shared features and object-specific features.
Third layer: more specific features.

[Figure: plot of H(class | neuron active).]
Hierarchical probabilistic inference

Generating posterior samples from faces by filling-in experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.

[Figure: input images; samples from feedforward inference (control); samples from full posterior inference.]
An application to modeling motion capture data
(Taylor, Roweis & Hinton, 2007)

Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.
Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis.
We only represent changes in yaw because physics doesn't care about its value and we want to avoid circular variables.

Video lecture: http://videolectures.net/gesturerecognition2011_taylor_tutorial/
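The "changes in yaw" representation can be sketched as follows; wrapping the per-frame difference into (-pi, pi] is my assumption about how one would keep the delta well-behaved while avoiding the circular absolute-angle variable.

```python
import math

def yaw_delta(prev_yaw, cur_yaw):
    """Change in pelvis yaw between consecutive frames, wrapped to
    (-pi, pi]. Representing the delta (not the absolute yaw) avoids
    a circular variable in the model's input."""
    d = cur_yaw - prev_yaw
    # atan2(sin d, cos d) wraps any angle difference into (-pi, pi].
    return math.atan2(math.sin(d), math.cos(d))
```

For example, going from a yaw of 3.1 rad to -3.1 rad is a small positive turn of about 0.083 rad, not a jump of -6.2 rad.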
Motion Generation by Conditional RBM



Hinton's Talk at Google:
http://www.youtube.com/watch?v=VdIURAu1-aU

Andrew Ng's Talk at the Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning
http://www.youtube.com/watch?v=ZmNOAtZIgIk&feature=relmfu


References

General Info on Deep Learning
http://deeplearning.net/

Review
Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
I. Arel, D.C. Rose, and T.P. Karnowski, "Deep machine learning - a new frontier in Artificial Intelligence research," Computational Intelligence Magazine, 14:12-18, 2010.


References

Tutorials & Workshops
Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2010:
http://deeplearningworkshopnips2010.wordpress.com/schedule/acceptedpapers/
Workshop on Learning Feature Hierarchies, ICML 2009:
http://www.cs.toronto.edu/~rsalakhu/deeplearning/index.html
