
Deep Learning Models

2012-05-03
Byoung-Hee Kim
Biointelligence Lab, CSE,
Seoul National University

NOTE: most slides are from talks of Geoffrey Hinton, Andrew Ng, and Yoshua Bengio.
(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/ 2
[Figure: example of input, output, and target; the network answers "Two!"]
Artificial Neural Networks



Historical background:
First generation neural networks

Perceptrons (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.
There was a neat learning algorithm for adjusting the weights.
But perceptrons are fundamentally limited in what they can learn to do.

[Figure: sketch of a typical perceptron from the 1960s, classifying "Bomb" vs. "Toy": output units (e.g. class labels), a layer of non-adaptive hand-coded features, and input units (e.g. pixels).]
Second generation neural networks (~1985)

Compare outputs with the correct answer to get an error signal.
Back-propagate the error signal to get derivatives for learning.

[Figure: feed-forward network with an input vector, hidden layers, and outputs.]
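The two arrows on this slide (a forward pass, then a back-propagated error signal) can be sketched in NumPy. The layer sizes, logistic units, learning rate, and squared-error loss below are illustrative assumptions, not details from the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, W1, W2, lr=0.1):
    # Forward pass: input vector -> hidden layer -> outputs.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Compare outputs with the correct answer to get the error signal.
    err = y - target
    # Back-propagate the error signal to get derivatives for learning.
    delta_out = err * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # Gradient-descent update of the weights (in place).
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return 0.5 * np.sum(err ** 2)   # squared error before the update
```

Repeated calls on the same example drive the error down, which is all the back-propagation recipe above promises.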
But training models with deep architectures was not successful until 2006.
http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html
Agenda

Computer Perception
Unsupervised feature learning
Various deep learning models
Application cases of deep learning models
  Handwritten digit recognition/generation (MNIST dataset)
  Image classification
  Audio recognition
  Language modeling
  Motion generation
References
Appendix


Brain-like Cognitive Computing & Deep Learning

It is well known that the brain has a hierarchical structure.
Researchers try to build models that simulate and/or act like the brain.
Learning deep structures from data, or deep learning, is a new frontier in Artificial Intelligence research.
Researchers try to find analogies between the characteristics of the brain and their deep models.
Feature Learning

[Figure: images represented directly by raw pixels (pixel 1 vs. pixel 2) and fed to a learning algorithm; Motorbikes and Non-Motorbikes plotted in the input (pixel) space.]
Feature Learning

[Figure: a feature extractor maps images to a feature space (handle, wheel) before the learning algorithm; Motorbikes and Non-Motorbikes shown both in pixel space and in feature space.]
How is computer perception done?

Object detection: Image → Low-level vision features → Recognition
Audio classification: Audio → Low-level audio features → Speaker identification
Helicopter control: Helicopter → Low-level state features → Action
Learning representations

Sensor → Representation → Learning algorithm


Computer vision features

SIFT Spin image

HoG RIFT

Textons
(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/ GLOH 24
Audio features

Spectrogram, MFCC, Flux, ZCR, Rolloff


Problems of hand-tuned features

Needs expert knowledge
Sub-optimal
Time-consuming and expensive
Does not generalize to other domains

Can we automatically learn good feature representations?


Sensor representation in the brain

Seeing with your tongue
Human echolocation (sonar)
The auditory cortex learns to see.

[BrainPort; Martinez et al.; Roe et al.]
Unsupervised Feature Learning

Find a better way to represent images than pixels



The goal of Unsupervised Feature Learning

Unlabeled images → Learning algorithm → Feature representation


Stochastic binary units (Bernoulli variables)

These have a state of 1 or 0. The probability of turning on is determined by the weighted input from other units (plus a bias):

p(s_i = 1) = 1 / (1 + exp(-b_i - Σ_j s_j w_ji))

[Figure: the logistic curve; p(s_i = 1) rises from 0 toward 1 as b_i + Σ_j s_j w_ji increases.]
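The turn-on rule for a stochastic binary unit translates directly into code; the function names below are mine, not from the slides.

```python
import numpy as np

def turn_on_probability(b_i, s, w_i):
    # p(s_i = 1) = 1 / (1 + exp(-b_i - sum_j s_j * w_ji)):
    # a logistic function of the bias plus the weighted input from other units.
    return 1.0 / (1.0 + np.exp(-(b_i + np.dot(s, w_i))))

def sample_unit(b_i, s, w_i, rng):
    # The unit turns on (state 1) with that probability, else stays at 0.
    return int(rng.random() < turn_on_probability(b_i, s, w_i))
```

With zero total input the unit is on exactly half the time; strongly positive input drives the probability toward 1.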
Binary Stochastic Neuron
A model of digit recognition

The top two layers form an associative memory whose energy landscape models the low-dimensional manifolds of the digits. The energy valleys have names.
The model learns to generate combinations of labels and images.
To perform recognition we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.

[Figure: the network; a 28 x 28 pixel image feeds 500 neurons, then 500 neurons; the top level has 2000 neurons connected to 10 label neurons.]
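The recognition procedure just described (an up-pass, then settling the top-level associative memory) can be sketched as follows. The weight matrices here are hypothetical and untrained, biases are dropped, and a deterministic mean-field update stands in for the alternating Gibbs sampling the real model uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recognize(image, W1, W2, W_top, W_lab, n_iters=5):
    """Sketch of recognition: up-pass from the image through the two
    500-unit layers, then a few iterations of the 2000-unit top-level
    associative memory jointly with the 10 label units."""
    h1 = sigmoid(W1 @ image)          # 28x28 = 784 pixels -> 500 neurons
    h2 = sigmoid(W2 @ h1)             # 500 -> 500
    labels = np.full(10, 0.1)         # neutral state of the label units
    for _ in range(n_iters):
        top = sigmoid(W_top @ h2 + W_lab @ labels)  # bottom-up to 2000 units
        labels = sigmoid(W_lab.T @ top)             # top-down to the labels
    return int(np.argmax(labels))     # most active label unit wins
```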
Generation & Recognition of Digits by DBN

A deep belief network that learns to generate handwritten digits:
http://www.cs.toronto.edu/~hinton/digits.html


First stage of visual processing in brain: V1
The first stage of visual processing in the brain (V1) does
edge detection.

[Figure: schematic of a simple cell and an actual simple cell; the receptive fields resemble Gabor functions. Images from DeAngelis, Ohzawa & Freeman, 1995.]
Sparse coding illustration

[Figure: natural images and the learned bases (f1, ..., f64), which look like edges.]

Test example:

x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

[0, 0, ..., 0, 0.8, 0, ..., 0, 0.3, 0, ..., 0, 0.5, ...] = [a1, ..., a64] (feature representation)

Compact & easily interpretable
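The test example above can be written out directly. The dictionary below is random rather than learned, so it only illustrates the sparse-combination mechanics, not the edge-like bases; the patch size is an arbitrary choice.

```python
import numpy as np

def reconstruct(bases, coeffs):
    # x_hat = sum_i a_i * f_i, where the bases f_i are the columns of `bases`.
    return bases @ coeffs

# Hypothetical stand-in dictionary: 64 bases over 14x14 = 196 pixels.
# (A learned dictionary would contain edge-like filters.)
rng = np.random.default_rng(0)
bases = rng.normal(size=(196, 64))

# The sparse code from the slide: a36 = 0.8, a42 = 0.3, a63 = 0.5, rest zero.
a = np.zeros(64)
a[35], a[41], a[62] = 0.8, 0.3, 0.5   # 0-based indices for f36, f42, f63

x = reconstruct(bases, a)             # x = 0.8*f36 + 0.3*f42 + 0.5*f63
```

Only 3 of the 64 coefficients are nonzero, which is what makes the representation compact and easily interpretable.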
Supervised learning

[Figure: labeled training images of Cars and Motorcycles.]

Testing: What is this?
Semi-supervised learning

[Figure: many unlabeled images (all cars/motorcycles), plus a few labeled examples: Car, Motorcycle.]

Testing: What is this?
Self-taught learning

[Figure: many unlabeled images (random internet images), plus a few labeled examples: Car, Motorcycle.]

Testing: What is this?
Self-taught learning

Learn bases f1, f2, ..., fk from the unlabeled data (sparse coding, LCC, etc.).
Use the learned f1, f2, ..., fk to represent the training/test sets: each image becomes features a1, a2, ..., ak.
Train and test the classifier (Car vs. Motorcycle) on these features.
Convolutional DBN for Images
Convolutional DBN on face images

[Figure: the learned feature hierarchy, from pixels to edges to object parts (combinations of edges) to object models.]
Learning of object parts

Examples of learned object parts from object categories: Faces, Cars, Elephants, Chairs
Training on multiple objects

Trained on 4 classes (cars, faces, motorbikes, airplanes).
Second layer: shared features and object-specific features.
Third layer: more specific features.

[Figure: plot of H(class | neuron active).]
Hierarchical probabilistic inference

Generating posterior samples from faces by filling-in experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.

[Figure: input images; samples from feedforward inference (control); samples from full posterior inference.]
An application to modeling motion capture data
(Taylor, Roweis & Hinton, 2007)

Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.
Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis.
We only represent changes in yaw because physics doesn't care about its value and we want to avoid circular variables.

Video lecture: http://videolectures.net/gesturerecognition2011_taylor_tutorial/
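The "changes in yaw" representation can be sketched as follows; wrapping the per-frame difference into (-pi, pi] is my assumption about how one would keep the delta well-behaved while avoiding the circular absolute-angle variable.

```python
import math

def yaw_delta(prev_yaw, cur_yaw):
    """Change in pelvis yaw between consecutive frames, wrapped to
    (-pi, pi]. Representing the delta (not the absolute yaw) avoids
    a circular variable in the model's input."""
    d = cur_yaw - prev_yaw
    # atan2(sin d, cos d) wraps any angle difference into (-pi, pi].
    return math.atan2(math.sin(d), math.cos(d))
```

For example, going from a yaw of 3.1 rad to -3.1 rad is a small positive turn of about 0.083 rad, not a jump of -6.2 rad.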
Motion Generation by Conditional RBM



Hinton's Talk at Google:
http://www.youtube.com/watch?v=VdIURAu1-aU

Andrew Ng's Talk at the Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning
http://www.youtube.com/watch?v=ZmNOAtZIgIk&feature=relmfu


References

General Info on Deep Learning
http://deeplearning.net/

Review
Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
I. Arel, D.C. Rose, and T.P. Karnowski, "Deep machine learning - a new frontier in Artificial Intelligence research," Computational Intelligence Magazine, 14:12-18, 2010.


References

Tutorials & Workshops
Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2010:
http://deeplearningworkshopnips2010.wordpress.com/schedule/acceptedpapers/
Workshop on Learning Feature Hierarchies, ICML 2009:
http://www.cs.toronto.edu/~rsalakhu/deeplearning/index.html
