
Deep Learning Models

2012-05-03
Byoung-Hee Kim
Biointelligence Lab, CSE,
Seoul National University
NOTE: most slides are from talks of Geoffrey Hinton, Andrew Ng, and Yoshua Bengio.

(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/


[Figure: a network maps an input to an output, which is compared with a target; example output: "Two!"]


Artificial Neural Networks

Historical background:
First generation neural networks

Perceptrons (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.
- There was a neat learning algorithm for adjusting the weights.
- But perceptrons are fundamentally limited in what they can learn to do.

Sketch of a typical perceptron from the 1960s:
- output units, e.g. class labels ("Bomb" vs. "Toy")
- non-adaptive hand-coded features
- input units, e.g. pixels

Second generation neural networks (~1985)

Compare the outputs with the correct answer to get an error signal, then back-propagate the error signal through the network to get derivatives for learning.

Architecture: input vector → hidden layers → outputs
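The back-propagation procedure on this slide can be sketched in a few lines of NumPy. This is an illustrative toy (random data, arbitrary layer sizes, plain squared error), not the networks from the original talks:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy network: 4 inputs -> 8 hidden units -> 1 output. Sizes are arbitrary.
W1 = rng.normal(0.0, 0.5, (4, 8))
W2 = rng.normal(0.0, 0.5, (8, 1))
lr = 0.5

x = rng.random((16, 4))                                   # input vectors
t = (x.sum(axis=1, keepdims=True) > 2.0).astype(float)    # correct answers

losses = []
for _ in range(500):
    h = sigmoid(x @ W1)                  # forward pass: hidden layer
    y = sigmoid(h @ W2)                  # forward pass: outputs
    losses.append(float(np.mean((y - t) ** 2)))
    dy = (y - t) * y * (1 - y)           # error signal at the outputs
    dh = (dy @ W2.T) * h * (1 - h)       # error back-propagated to hidden units
    W2 -= lr * h.T @ dy / len(x)         # derivatives -> gradient steps
    W1 -= lr * x.T @ dh / len(x)
```

The two lines computing `dy` and `dh` are the whole idea of the slide: the output error is pushed backwards through the weights to give every layer a learning signal.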

But training models with deep architectures was largely unsuccessful until 2006.

http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html

Agenda
- Computer perception
- Unsupervised feature learning
- Various deep learning models
- Application cases of deep learning models
  - Handwritten digit recognition/generation (MNIST dataset)
  - Image classification
  - Audio recognition
  - Language modeling
  - Motion generation
- References
- Appendix

Brain-like Cognitive Computing & Deep Learning
- It is well known that the brain has a hierarchical structure.
- Researchers try to build models that simulate and/or act like the brain.
- Learning deep structures from data, or deep learning, is a new frontier in artificial intelligence research.
- Researchers try to find analogies between the characteristics of the brain and their deep models.

Feature Learning

[Diagram: raw pixel values (pixel 1, pixel 2) fed directly to a learning algorithm; in the input (pixel) space, motorbikes and non-motorbikes are hard to separate.]

Feature Learning

[Diagram: a feature extractor maps the input image to higher-level features (handle, wheel) before the learning algorithm; in this feature space, motorbikes and non-motorbikes separate cleanly.]

How is computer perception done?

- Object detection: image → low-level vision features → recognition
- Audio classification: audio → low-level audio features → speaker identification
- Helicopter control: helicopter state → low-level state features → action

Learning representations

Sensor → feature representation → learning algorithm

Computer vision features: SIFT, HoG, Textons, Spin image, RIFT, GLOH

Audio features: MFCC, Spectrogram, Flux, ZCR, Rolloff

Problems of hand-tuned features
- Require expert knowledge
- Sub-optimal
- Time-consuming and expensive
- Do not generalize to other domains

Can we automatically learn good feature representations?

Sensor representation in the brain
- Seeing with your tongue
- Human echolocation (sonar)
- The auditory cortex learns to see.
[BrainPort; Martinez et al.; Roe et al.]

Unsupervised Feature Learning

Find a better way to represent images than pixels



The goal of Unsupervised Feature Learning

Unlabeled images → learning algorithm → feature representation


Stochastic binary units (Bernoulli variables)

- These have a state of 1 or 0.
- The probability of turning on is determined by the weighted input from the other units (plus a bias):

p(s_i = 1) = 1 / (1 + exp(-(b_i + Σ_j s_j w_ji)))

As the total input b_i + Σ_j s_j w_ji grows, p(s_i = 1) rises sigmoidally from 0 to 1.
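The sampling rule for a stochastic binary unit can be written directly in NumPy. A minimal sketch; the weights and states below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_on(s, W, b):
    """p(s_i = 1) = 1 / (1 + exp(-(b_i + sum_j s_j * w_ji)))"""
    return 1.0 / (1.0 + np.exp(-(b + s @ W)))

def sample(s, W, b):
    """Draw each unit's binary state from its Bernoulli probability."""
    p = p_on(s, W, b)
    return (rng.random(p.shape) < p).astype(float)

s = np.array([1.0, 0.0, 1.0])     # states of the other units
W = rng.normal(size=(3, 2))       # weights w_ji into two target units
b = np.zeros(2)                   # biases
probs = p_on(s, W, b)
states = sample(s, W, b)          # each entry is 0.0 or 1.0
```

A strongly positive bias drives the probability toward 1, a strongly negative one toward 0, matching the sigmoid curve on the slide.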


A model of digit recognition

- The top two layers form an associative memory whose energy landscape models the low-dimensional manifolds of the digits. The energy valleys have names.
- The model learns to generate combinations of labels and images.
- To perform recognition we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.

Architecture: 28 x 28 pixel image → 500 neurons → 500 neurons → 2000 top-level neurons, with 10 label neurons attached to the top.
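The recognition up-pass can be sketched as a deterministic forward sweep through the layer sizes on this slide. The weights below are random placeholders (a real DBN learns them greedily, one RBM at a time), and the top-level associative-memory iterations are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes from the slide: 28 x 28 = 784 pixels -> 500 -> 500,
# feeding the 2000-unit top-level associative memory (not modelled here).
sizes = [784, 500, 500]
weights = [rng.normal(0.0, 0.01, (m, n)) for m, n in zip(sizes, sizes[1:])]

def up_pass(image):
    """Propagate activation probabilities from the image toward the top."""
    h = image
    for W in weights:
        h = sigmoid(h @ W)
    return h

top_input = up_pass(rng.random(784))   # would drive the associative memory
```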

Generation & Recognition of Digits by DBN

A deep belief network that learns to generate handwritten digits:
http://www.cs.toronto.edu/~hinton/digits.html

First stage of visual processing in brain: V1

The first stage of visual processing in the brain (V1) does edge detection.

[Figure: schematic of a simple cell, an actual simple cell, and Gabor functions. Images from DeAngelis, Ohzawa & Freeman, 1995]

Sparse coding illustration

From natural images, sparse coding learns a set of bases (f1, ..., f64) that look like edges.

Test example: x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

Feature representation: [0, 0, ..., 0, 0.8, 0, ..., 0, 0.3, 0, ..., 0, 0.5, ...] = [a1, ..., a64]
→ compact & easily interpretable
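The decomposition above is just a sparse weighted sum of bases. A toy sketch with a random stand-in dictionary (the real bases would be learned from natural image patches):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dictionary: 64 bases, each a flattened 14 x 14 patch.
bases = rng.normal(size=(64, 196))

# Sparse code with only three active coefficients, as in the slide:
# x ~= 0.8 * f36 + 0.3 * f42 + 0.5 * f63 (0-indexed below).
a = np.zeros(64)
a[35], a[41], a[62] = 0.8, 0.3, 0.5

x = a @ bases   # reconstruction: a weighted sum of a few bases
```

The 64-dimensional vector `a`, mostly zeros, is the compact feature representation the slide refers to.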

Supervised learning

[Diagram: labeled training images of cars and motorcycles; testing asks "What is this?"]

Semi-supervised learning

[Diagram: unlabeled images (all cars/motorcycles) plus labeled car/motorcycle examples; testing asks "What is this?"]

Self-taught learning

[Diagram: unlabeled images (random internet images) plus labeled car/motorcycle examples; testing asks "What is this?"]

Self-taught learning

1. Learn bases f1, f2, ..., fk from unlabeled data (sparse coding, LCC, etc.).
2. Use the learned f1, f2, ..., fk to represent the training/test sets as activations a1, a2, ..., ak.
3. Train the classifier (car vs. motorcycle) on these activations.
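The "represent the training/test sets with the learned bases" step might look like this. A least-squares projection stands in for real sparse-coding inference (which would add an L1 penalty on the activations), and the dictionary here is random for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

k, d = 16, 64                      # k bases, d-dimensional inputs
F = rng.normal(size=(k, d))        # hypothetical learned bases f1..fk

def encode(x, F):
    """Activations a1..ak that best reconstruct x from the bases.

    Real sparse coding would minimize ||x - F.T @ a||^2 + lam * ||a||_1;
    plain least squares is used here to keep the sketch dependency-free.
    """
    a, *_ = np.linalg.lstsq(F.T, x, rcond=None)
    return a

x = rng.normal(size=d)
a = encode(x, F)   # feature vector fed to the downstream classifier
```

Any off-the-shelf classifier can then be trained on these `a` vectors instead of raw pixels.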

Convolutional DBN for Images

A convolutional DBN on face images learns a feature hierarchy:
pixels → edges → object parts (combinations of edges) → object models

Learning of object parts

Examples of learned object parts from object categories: faces, cars, elephants, chairs.

Training on multiple objects
- Trained on 4 classes (cars, faces, motorbikes, airplanes).
- Second layer: shared features and object-specific features.
- Third layer: more specific features.

[Plot of H(class | neuron active)]

Hierarchical probabilistic inference

Generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.

[Figure: input images; samples from feedforward inference (control); samples from full posterior inference]

An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)

- Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.
- Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis.
- We only represent changes in yaw, because physics doesn't care about its absolute value and we want to avoid circular variables.

Video lecture: http://videolectures.net/gesturerecognition2011_taylor_tutorial/

Motion Generation by Conditional RBM


Hinton's talk at Google:
http://www.youtube.com/watch?v=VdIURAu1-aU

Andrew Ng's talk at the Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning
http://www.youtube.com/watch?v=ZmNOAtZIgIk&feature=relmfu

References

General info on deep learning
- http://deeplearning.net/

Review
- Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
- I. Arel, D. C. Rose, and T. P. Karnowski, "Deep machine learning: a new frontier in artificial intelligence research," Computational Intelligence Magazine, 14:12-18, 2010.

References

Tutorials & workshops
- Deep Learning and Unsupervised Feature Learning workshop, NIPS 2010: http://deeplearningworkshopnips2010.wordpress.com/schedule/acceptedpapers/
- Workshop on Learning Feature Hierarchies, ICML 2009: http://www.cs.toronto.edu/~rsalakhu/deeplearning/index.html
