
A

Short History of
and Introduction to
Deep Learning
John Kaufhold
Deep Learning Analytics

This talk
Historical machine learning context
Historical limitations of neural networks
Deep Learning technological developments
since 2006
Deep Learning now

Machine Learning in AI
[Diagram: AI agent. The World → Sensor → Digital Form → (x, y) → ML → Features → Classifiers / Predictors → Analytics → Actuator → back into The World]


Machine Learning to detect cats


[Diagram: The World → Sensor + Digital Form → x (photos) and y (human labels), both expensive; Features (human engineering) → ML → Classifiers / Predictors → Cat / Not Cat]

The way it was always done


Engineered by hand

MFCC
y (LDC transcriptions)

Speech
Recognition
Kaufhold, Energy Formulations of Medical Image Segmentations, 2001

The way it was always done


Engineered by hand

y (3D labels drawn by hand)

MRI Image
Analysis
Kaufhold, Energy Formulations of Medical Image Segmentations, 2001

The way it was always done


Yoshua Bengio

http://www.youtube.com/watch?v=4xsVFLnHC_0

The way it was always done

http://www.youtube.com/watch?v=vShMxxqtDDs

The way it was always done



https://www.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf

Tiny Datasets
Shallow Neural Nets
ML: {BDT, RF, NB, SVMs, ANNs, LR, BAG-DT}

ML Algorithm x Performance x Problem Domain

Ensemble Methods Win


Boosted Decision Trees
Random Forests
Bagged Decision Trees

Neural Nets 10th
Threshold Metrics

Ranking Metrics

Probability
Metrics

Still Small Datasets


Shallow Neural Nets
ML: {BDT, RF, NB, SVMs, ANNs, LR, BAG-DT}

ML Algorithm x Performance x Problem Domain

Score

Random Forests Win


2nd/3rd Boosted Decision Trees
2nd/3rd Neural Nets

Random Forests

Feature dimensionality

This talk
Historical machine learning context

Historical limitations of neural


networks
Deep Learning technological developments
since 2006
Deep Learning now

Forbes, August 2013

http://www.forbes.com/sites/netapp/2013/08/19/what-is-deep-learning/

The Deep Learning Triumv[ei]rate


LeCun: "You have to realize that deep learning … is really a conspiracy between Geoff
Hinton and myself and Yoshua Bengio."

Geoff Hinton

Yann LeCun

Yoshua Bengio

Latin: Trium - vir - ate


English: Three - men - official
http://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/

trending

[Google Trends chart, 2004-2013]

trends.google.com

Neural Networks 101

1980s technology

(label) y

Supervised learning
Given x and y, learn p(y|x)
Is this photo, x, a cat, y?

x =
x (input data)

https://www.ipam.ucla.edu/publications/gss2012/gss2012_10740.pdf

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets

Cons

Doesn't learn p(x)


Trouble with > ~3 layers
Overfits
Slow
1980s technology

https://www.ipam.ucla.edu/publications/gss2012/gss2012_10740.pdf

Neural Networks 101


Backpropagation

Vanishing Gradient in Backpropagation
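A tiny numeric sketch of the vanishing gradient: backpropagating through a stack of sigmoid layers multiplies the error signal by sigmoid'(z) <= 0.25 at every layer, so its norm shrinks geometrically with depth. The width, depth, and weight scale below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_layers, width = 10, 50
x = rng.randn(width)
weights = [rng.randn(width, width) * 0.1 for _ in range(n_layers)]

# Forward pass, remembering activations for the backward pass.
activations = [x]
for W in weights:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass: start from a unit error signal at the output.
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1 - a))   # sigmoid'(z) = a * (1 - a), never more than 0.25
    print(f"gradient norm: {np.linalg.norm(grad):.2e}")
```

Each printed norm is smaller than the last: by the time the signal reaches the early layers it is too weak to train them, which is why deep sigmoid nets were hard to train.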

This talk
Historical machine learning context
Historical limitations of neural networks

Deep Learning technological


developments since 2006
Deep Learning now

The 2006 Breakthrough


B.R. (before RBMs)
1. Mainstream: Shallow nets (3 or fewer layers) on small data
2. Slow training on CPUs
3. Near universal sigmoid neuron nonlinearities
4. Parameters initialized with random weights
5. Could only learn discriminative p(y|x), not generative p(x)
6. Neural Nets: yet another machine learning algorithm (yamla)
7. Some convolutional networks (LeCun, et al.)

Hinton et al.'s RBMs


A.R. (after RBMs)
1. Mainstream: Deep nets (6+ layers) on Big Data
2. Fast training on GPUs
3. Autoencoders/RBMs to learn generative models for p(x)
4. Initialize discriminative p(y|x) parameters with the generative model
5. Rise of the ReLU nonlinearity
6. Dropout prevents overfitting
7. Deep nets outcompete the best SOTA in the world:
   1. Image Recognition (ImageNet)
   2. Speech Recognition (TIMIT)
8. Deep learning moves out of academia to Google and Facebook

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets
Cons
Doesn't learn p(x)
Trouble with > ~3 layers
Overfits
Slow

The 2006 Breakthrough: RBMs


h: Hidden layer

v: Data

Key insights:
Learn p(x), not just p(y|x)
Address explaining away by
imposing conditional independence
Features in the generative model are
hierarchical

1. Symmetric weights between v and h


2. Contrastive divergence in a nutshell:
1. Sample v, sample h via p(h|v)
2. g+ = h v^T
3. Construct a sample v from h
4. Sample h via p(h|v)
5. g- = h v^T
6. Update W based on (g+ - g-)

Also see Denoising Autoencoders, Sparse Autoencoders, etc.
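A minimal numpy sketch of the contrastive divergence recipe above (CD-1) for a binary RBM; biases are omitted, and the layer sizes, learning rate, and random minibatch are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
n_visible, n_hidden, lr = 784, 256, 0.01
W = 0.01 * rng.randn(n_visible, n_hidden)    # symmetric weights between v and h

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v_data):
    # 1. Sample h from p(h|v) on the data.
    h_prob = sigmoid(v_data @ W)
    h_samp = (rng.rand(*h_prob.shape) < h_prob).astype(float)
    g_pos = v_data.T @ h_prob                 # g+ : <v h^T> under the data
    # 2. Reconstruct v from h, then recompute h on the reconstruction.
    v_recon = sigmoid(h_samp @ W.T)
    h_recon = sigmoid(v_recon @ W)
    g_neg = v_recon.T @ h_recon               # g- : <v h^T> under the reconstruction
    # 3. Move W in the direction of (g+ - g-).
    return lr * (g_pos - g_neg) / v_data.shape[0]

v_batch = (rng.rand(32, n_visible) < 0.5).astype(float)   # stand-in for a data minibatch
W += cd1_update(v_batch)
```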

The 2006 Breakthrough: RBMs

Greedy Stacking RBMs


(approximately) improves a
variational lower bound
on training data
likelihood.

http://www.slideshare.net/zukun/p04-restricted-boltzmann-machines-cvpr2012-deep-learning-methods-for-vision
http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS09_SalakhutdinovH.pdf
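A sketch of greedy layer-wise stacking using the same toy CD-1 recipe: each new RBM is trained on the hidden activations of the one below it. The layer sizes, epoch count, and inner training loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.01, epochs=5):
    """Toy CD-1 training; returns the weight matrix of one RBM layer."""
    W = 0.01 * rng.randn(data.shape[1], n_hidden)
    for _ in range(epochs):
        h = sigmoid(data @ W)
        h_samp = (rng.rand(*h.shape) < h).astype(float)
        v_recon = sigmoid(h_samp @ W.T)
        h_recon = sigmoid(v_recon @ W)
        W += lr * (data.T @ h - v_recon.T @ h_recon) / len(data)
    return W

# Greedy stacking: each RBM learns features of the features below it.
data = (rng.rand(256, 784) < 0.5).astype(float)   # stand-in for binarized images
layer_sizes, weights, layer_input = [512, 256, 64], [], data
for n_hidden in layer_sizes:
    W = train_rbm(layer_input, n_hidden)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)        # feed activations upward to the next RBM
# `weights` now initializes a deep net that can be fine-tuned discriminatively.
```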

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets
Cons
Doesn't learn p(x)  →  Unsupervised feature learning: RBMs, DAEs, etc.
Trouble with > ~3 layers
Overfits
Slow

Overfitting and Dropout


Randomly turn off neurons every time you backpropagate
a training example
Forces neurons in hidden layers to rely on broader populations of
inputs rather than becoming dependent on a specific individual input

At test time, halve all the weights


A regularizer ~ injecting noise into training and bagging
(improves conditioning)
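A minimal sketch of the scheme just described for one hidden layer: each neuron's output is zeroed with probability 0.5 during training, and activations are scaled by 0.5 at test time (equivalent to halving the weights of this layer). The shapes and the ReLU nonlinearity are illustrative choices.

```python
import numpy as np

rng = np.random.RandomState(0)

def hidden_layer(x, W, train=True, p_drop=0.5):
    h = np.maximum(0.0, x @ W)                 # hidden activations (ReLU here)
    if train:
        mask = (rng.rand(*h.shape) >= p_drop)  # randomly turn off neurons on every pass
        return h * mask
    return h * (1.0 - p_drop)                  # test time: scale (= halve when p_drop = 0.5)

x = rng.randn(4, 100)          # a minibatch of 4 examples
W = 0.1 * rng.randn(100, 200)
h_train = hidden_layer(x, W, train=True)
h_test  = hidden_layer(x, W, train=False)
```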

[Figures: train error and train cross entropy vs. epoch on the training data, comparing finetuning with dropout vs. finetuning without dropout]

http://www.youtube.com/watch?v=DleXA5ADG78
http://arxiv.org/pdf/1207.0580.pdf

[Figures: test error and test cross entropy vs. epoch on the test data, comparing finetuning with dropout vs. finetuning without dropout]

http://www.youtube.com/watch?v=DleXA5ADG78
http://arxiv.org/pdf/1207.0580.pdf

Regularization of Neural Networks using DropConnect


Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus
Dept. of Computer Science, Courant Institute of Mathematical Sciences, New York University

[Figure: results for Vanilla, Dropout, and DropConnect networks; the "better" direction is indicated on the axis]

Drop* randomly turns off neurons every


time a training sample is seen (simulating a
different architecture)
When testing, approximate by appropriate
scaling on neurons
Dropout masks neuron outputs
Dropconnect masks neuron inputs
Both effectively bag classifiers
Tighter theoretical bounds on DropConnect

http://cs.nyu.edu/~wanli/dropc/
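A small sketch contrasting the two masking schemes described above: dropout zeroes a neuron's outputs, DropConnect zeroes individual weights (the connections into a neuron). The shapes and the use of a single mask per minibatch are simplifying assumptions; the paper redraws masks per example.

```python
import numpy as np

rng = np.random.RandomState(0)
p = 0.5
x = rng.randn(4, 100)          # minibatch of 4 examples
W = 0.1 * rng.randn(100, 200)  # weights of one fully connected layer

# Dropout: mask the neuron *outputs* after the nonlinearity.
h = np.maximum(0.0, x @ W)
h_dropout = h * (rng.rand(*h.shape) >= p)

# DropConnect: mask the *weights* (the neuron inputs) before the product.
M = (rng.rand(*W.shape) >= p)
h_dropconnect = np.maximum(0.0, x @ (W * M))
```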

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets
Cons
Doesn't learn p(x)
Trouble with > ~3 layers
Overfits  →  Drop*out, Maxout, Fast Dropout, Stochastic Pooling
Slow

A new kind of cat detector

Contrary to what appears to be a widely-held intuition, our


experimental results reveal that it is possible to train a face detector
without having to label images as containing a face or not.

http://research.google.com/archive/unsupervised_icml2012.html

A new kind of cat detector


Non-faces
Faces

"In terms of scale, our network is perhaps one of the largest known
networks to date. It has 1 billion trainable parameters … our network is still
tiny compared to the human visual cortex, which is 10^6 times larger in terms
of the number of neurons and synapses (Pakkenberg et al., 2003)."

Face Detector Results

Non-cats

Breakthrough:
NO LABELS (y) to learn detectors!

Cats

Cat Detector Results


http://research.google.com/archive/unsupervised_icml2012.html

A new kind of cat detector

http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all&_r=0

A new kind of cat detector


Automatic 9-layer sparse autoencoder with 10^9 connections. The training dataset has 10
million 200 x 200 pixel images. Trained with simple asynchronous SGD!

BUT

How to make it
scalable for the
masses?

http://www.youtube.com/watch?v=wZfVBwOO0-k

GPUs Faster, cheaper, scalabler


GPU speedup over CPU

Scaling options
Scale out more machines in cluster
Scale up more compute/machine (GPUs)
Both
Cluster setup
16 servers x 2 quad-core processors,
4 NVIDIA GTX 680 GPUs/server (1TFLOPS)
16 x 4 x 1536 = 98,304 CUDA cores
Infiniband connectivity
http://www.youtube.com/watch?v=wZfVBwOO0-k

Results
Replicated 1 billion parameter
network results with 3 machines,
200 images/s, 14TB in 3hrs
Scaled to 11 billion parameter
network in a few days with 2% as
many machines

GPUs Faster, cheaper, scalabler

http://www.nvidia.com/content/HelpMeChoose/fx2/HelpMeChoose.asp?lang=en-us

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets
Cons
Doesn't learn p(x)
Trouble with > ~3 layers
Overfits
Slow  →  GPUs / CUDA

Deeper Intuition

Depth 4

Depth 3

Depth of architecture: the longest path from an input node to


an output node
Neural net depth = number of layers after the input layer
Decision trees have effectively 2 layers
Boosting usually adds one layer to the base learner via voting
An architecture with n-1 layers may require exponentially more
units than a depth-n architecture to model the same function
Reinterpret the 2004/2008 Caruana results with this insight

Deeper Intuition

http://www.cs.toronto.edu/~rsalakhu/ISBI1_pdf_version.pdf

2009

1st level filters

2nd level
generic filters

2009

2nd
level

3rd
level

2009

2nd
level

3rd
level

This talk
Historical machine learning context
Historical limitations of neural networks
Deep Learning technological developments
since 2006

Deep Learning now

It's Learning Cats and Dogs


Deep Learning
99% accurate
Top 10 results all
Deep Learning

https://www.kaggle.com/c/dogs-vs-cats/leaderboard
http://www.npr.org/blogs/alltechconsidered/2014/02/20/280232074/deep-learning-teaching-computers-to-tell-things-apart

Learn, don't engineer


feature representations

http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/

Reuse features
across tasks

http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/

Reuse features
across tasks

http://blog.kaggle.com/2013/05/06/qa-with-job-salary-prediction-first-prize-winner-vlad-mnih/

Deep Learning Wins

1. MICCAI 2013 Grand Challenge on Mitosis Detection
2. ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images
3. ISBI 2012 Brain Image Segmentation Challenge (with superhuman pixel error rate)
4. IJCNN 2011 Traffic Sign Recognition Competition
   (only method to achieve superhuman results)
5. ICDAR 2011 offline Chinese Handwriting Competition
6. Online German Traffic Sign Recognition Contest
7. ICDAR 2009 Arabic Connected Handwriting Competition
8. ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
9. ICDAR 2009 French Connected Handwriting Competition
http://www.idsia.ch/~juergen/deeplearning.html

Deep Learning Wins

http://www.cs.toronto.edu/~hinton/absps/speechDBN_jrnl.pdf

Deep Learning Wins


2013

http://research.microsoft.com/apps/pubs/?id=188864

Deep Learning Wins


Segmentation of neuronal structures in EM stacks challenge - ISBI 2012

Raw Data

Human Labels

Challenge: How well can your computer algorithm match the human labels?
Performance metrics: Pixel error, Rand (cluster) error, Warping (topology) error
http://fiji.sc/wiki/index.php/Segmentation_of_neuronal_structures_in_EM_stacks_challenge_-_ISBI_2012

Deep Learning Wins


The final rankings of the challenge at the ISBI 2012 (May 2nd, 2012) were as follows:

Rank  Name / Group                              Submission #  Rand Error   Warping Error  Pixel Error
1     Alessandro Giusti & Dan Ciresan / IDSIA   11            0.048314096  0.000434367    0.060298549
2     Dmitry Laptev / MLL-ETH                   2             0.064500546  0.000555801    0.083264179
3     Sarvesh Dwivedi / MLL-ETH                 4             0.069819563  0.000524902    0.079264809
4     Uygar Sumbul / MIT                        3             0.075537461  0.000645574    0.065254812
5     Ting Liu / SCI                            14            0.083700043  0.001601664    0.134148235
6     Verena Kaynig / Harvard                   4             0.084447767  0.001124446    0.157146646
7     Mojtaba Seyedhosseini / SCI               3             0.089458158  0.001134237    0.077758118
8     Lee Kamentsky / CellProfiler              2             0.09035629   0.001512273    0.100000994
9     Daniel Manson / UCL                       3             0.100188599  0.002199173    0.132557284
10    Lewis Griffin / UCL                       4             0.104156509  0.001475143    0.095671823
11    Radim Burget / IMMI                       1             0.13903844   0.002641296    0.102285508
12    Xiao Tan / TSC+PP                         1             0.153145065  0.000684865    0.087875907
13    Erhan Bas / CLP                           4             0.162303187  0.001613108    0.109391938
14    Margret Keuper / Munich                   2             0.163112555  0.003023656    0.097570061
15    Tolga Tasdizen / SCI                      1             0.175128169  0.002140045    0.092647919
16    Saadia Iftikhar / NIST                    2             0.230241267  0.01615626     0.149973922
17    Thorsten Schmidt / Munich                 1             0.418431548  0.001403046    0.097255861

Legend (row shading on the slide): Deep learning / Other

Deep Learning Wins

agaric

http://www.image-net.org/challenges/LSVRC/2012/results.html

http://videolectures.net/machine_krizhevsky_imagenet_classification/

Deep Learning Wins


2012 ImageNet Leaderboard

SuperVision Team:
Alex Krizhevsky,
Ilya Sutskever,
Geoff Hinton

DEEP LEARNING

COMPUTER
VISION

http://www.image-net.org/challenges/LSVRC/2012/results.html

http://www.image-net.org/explore

The ImageNet Breakthrough

Learned Filters (1st layer):

Convolutional Neural Network


& fully connected layers
1.2M train, 150k test
60M parameters

The ImageNet Breakthrough


Data Augmentation

Augmentation 1: Sampled overlapping patches with LR reflections


Increases # of training examples; averages the 10 patch predictions in the output

[Figure: 256x256 training images and sampled overlapping 224x224 patches]

Sample 224x224 images + LR reflections
Testing samples: 4 corners + center images + LR reflections

Augmentation 2: RGB PCA perturbations


Add noise in the direction of PCA on RGB pixel values

alpha ~ N(0, 0.1^2),

one per image

Reduces error ~1%
http://www.image-net.org/explore
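A sketch of the two augmentations above, assuming 256x256 RGB training images stored as HxWx3 numpy arrays. The crop size (224), left-right reflections, and the alpha ~ N(0, 0.1^2) draw follow the slide; everything else (flip probability, value range) is an illustrative assumption.

```python
import numpy as np

rng = np.random.RandomState(0)

def random_crop_and_flip(img, crop=224):
    """Augmentation 1: sample an overlapping patch and maybe mirror it left-right."""
    h, w, _ = img.shape
    top  = rng.randint(0, h - crop + 1)
    left = rng.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    return patch[:, ::-1] if rng.rand() < 0.5 else patch

def pca_color_perturbation(img, scale=0.1):
    """Augmentation 2: add noise along the principal components of the RGB pixel values."""
    pixels = img.reshape(-1, 3).astype(float)
    pixels -= pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)            # 3x3 RGB covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    alpha = rng.normal(0.0, scale, size=3)        # one draw per image
    shift = eigvecs @ (alpha * eigvals)           # a single RGB offset
    return img + shift                            # broadcast the offset to every pixel

img = rng.rand(256, 256, 3)                       # stand-in for one training image
augmented = pca_color_perturbation(random_crop_and_flip(img))
```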


The ImageNet Breakthrough


ReLU vs. Logistic:
ReLU reaches the logistic net's training error in far fewer epochs

[Figure: training error vs. epoch for Sigmoid, Tan^-1(x), and ReLU(x) = max(0, x)]

Mitigates vanishing gradient
Good results without pre-training

http://eprints.pascal-network.org/archive/00008596/01/glorot11a.pdf
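A tiny illustration of why the ReLU mitigates the vanishing gradient: its derivative is exactly 1 for any positive input, while the sigmoid's derivative never exceeds 0.25 and collapses toward 0 as the unit saturates. The sample inputs are arbitrary.

```python
import numpy as np

z = np.array([-4.0, -1.0, 0.5, 2.0, 6.0])
sigmoid = 1.0 / (1.0 + np.exp(-z))
d_sigmoid = sigmoid * (1.0 - sigmoid)      # <= 0.25 everywhere, ~0 when saturated
d_relu = (z > 0).astype(float)             # exactly 1 for every positive input
print(d_sigmoid, d_relu)
```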

The ImageNet Breakthrough


Local Response Normalization (LRN)
Neuron activities are normalized at individual locations across feature maps
4-parameter transformation
Reduces error rate by 1.2% - 1.4%
Overlapping Max-Pooling
Every other pixel is replaced by the max value in a 3x3 window around that pixel
Max pooling is per channel
Overlapping pooling seems to resist overfitting better than non-overlapping
Reduces error rate by 0.3% - 0.4%

See also:
Sample pooling outputs to effectively model averaging over convolutional layers
http://arxiv.org/abs/1301.3557
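A sketch of the overlapping max-pooling described above (3x3 windows moved by a stride of 2, per channel); LRN is omitted here, and the feature-map shape is just an example.

```python
import numpy as np

def overlapping_max_pool(fmap, size=3, stride=2):
    """Max over a size x size window, moving by `stride` (< size, so windows overlap)."""
    h, w, c = fmap.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w, c))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max(axis=(0, 1))    # pooling is per channel
    return out

fmap = np.random.rand(55, 55, 96)     # e.g. the output of a first convolutional layer
pooled = overlapping_max_pool(fmap)   # -> shape (27, 27, 96)
```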

The ImageNet Breakthrough


Training parameters:
Backpropagation with stochastic gradient descent (no pretraining)
Used dropout to prevent overfitting
Minibatch size = 128 examples
Momentum = 0.9
Weight decay = 0.0005

Update rule: (see the sketch below)

Code released and is BSD licensed (without dropout or multi-GPU features)


http://code.google.com/p/cuda-convnet/
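A sketch of a stochastic gradient descent update consistent with the hyperparameters listed above (momentum 0.9, weight decay 0.0005, minibatches of 128); the learning rate value and parameter shapes are assumptions.

```python
import numpy as np

momentum, weight_decay, lr = 0.9, 0.0005, 0.01   # lr value is an assumption

def sgd_momentum_step(w, v, grad):
    """One step of momentum SGD with weight decay:
    v <- momentum * v - weight_decay * lr * w - lr * grad;  w <- w + v."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

w = np.random.randn(1000)       # some parameter vector
v = np.zeros_like(w)            # momentum buffer
grad = np.random.randn(1000)    # gradient averaged over a 128-example minibatch
w, v = sgd_momentum_step(w, v, grad)
```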

Whose team got 3rd on ImageNet?

Led by outstanding Oxford


Professor with one of the leading
object recognition labs in the world

Whose team got 3rd on ImageNet?

Whose team got 3rd on ImageNet?

On page 10, there are


still object recognition
contributions with
>100 cites!

How did the computer vision community take it?

+5

+22

Computer Vision vs. Neural Networks


https://plus.google.com/104362980539466846301/posts/JBBFfv2XgWM

How did the computer vision community take it?

http://www.theonion.com/articles/local-idiot-to-post-comment-on-internet,2500/

How did the computer vision community take it?


Alexei A Efros
11k cites

Jitendra Malik
77k cites
Geoff Hinton
71k cites
Yann Lecun
19k cites

Andrew Zisserman
63k cites
Computer Vision vs. Neural Networks
https://plus.google.com/104362980539466846301/posts/JBBFfv2XgWM

How did the computer vision community take it?


"The [computer] vision community has been
skeptical about deep learning and feature
learning, but there was the same kind of
skepticism from the ML community until 4 or 5
years ago … Thankfully, results on standard
benchmarks have a way of quieting down
theological arguments."
[82] H. Bourlard, H. Hermansky, and N. Morgan. Towards
increasing speech recognition error rates. Speech
Communication, 18(3):205-231, May 1996.
[83] J.R. Pierce. Whither speech recognition? Journal of
the Acoustical Society of America, 46:1049-1051, 1969.

https://plus.google.com/104362980539466846301/posts/JBBFfv2XgWM

The ImageNet Breakthrough


We trained the network for roughly
90 cycles through the training set,
which took ~6 days on 2 NVIDIA GTX
580 3GB GPUs.

2 GTX 580s: ~$600


A new computer: ~$1000
A week of electricity: <$50
BSD code: free
http://videolectures.net/machine_krizhevsky_imagenet_classification/

Neural Networks 101 (ca. 2013)


Pros
Simple, fast, cheap to learn p(y|x) & p(x)
Results good → win for shallow → deep nets
Feature learning faster, less expensive,
better than feature engineering
Open source resources and community
Theoretically defensible
More scalable
Cons
Doesn't learn p(x)
Trouble with > ~3 layers
Overfits

Slow training

http://www.technologyreview.com/featuredstory/513696/deep-learning/

Google is the new academia

http://www.wired.com/wiredenterprise/2013/03/google_hinton/

Facebook is the new academia


LeCun: "the purpose and the goal
of the new organization [is] to do
two things. One is to really make
progress from a scientific point of
view, from the side of technology.
This will involve participating in
the research community and
publishing papers. The other part
will be [integrating these
technologies] at Facebook."

http://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/

Netflix: Deep Personalization

http://venturebeat.com/2014/02/10/netflix-moves-into-deep-learning-research-to-improve-personalization/

The Deep Learning Draft

Deep Mind sold for


$400M-$500M

"Microsoft, Facebook and Google find themselves in a
battle for deep learning talent … Last year, the cost of a
top, world-class deep learning expert was about the same
as a top NFL quarterback prospect."
http://www.businessweek.com/articles/2014-01-27/the-race-to-buy-the-human-brains-behind-deep-learning-machines

Does AI have 9 lives?


Deep Learning cured cancer!
Cautious optimism in results,
without reckless assertions
about the future.

Deep Learning is yamla.

http://www.newyorker.com/online/blogs/elements/2014/01/the-new-york-times-artificial-intelligence-hype-machine.html

Ng on Deep Learning Criticisms

http://www.youtube.com/watch?v=ZmNOAtZIgIk

Deep Learning Facts


Deep Learning
Learns multiple levels of conceptual abstraction in data; often these are
understandable, reusable hierarchical feature representations
Provides breakthrough or state-of-the-art performance across many
problem domains, and is becoming a practical resource through data
availability, algorithms, culture & hardware
Huge datasets available now, some labeled
Convolutional neural networks, Dropout, ReLU, Max-pooling, RBMs,
Autoencoders, data augmentation and curriculum learning
Many open source, permissively licensed codebases
Pylearn2, CUDA convnet, theano, Torch, etc.

GPUs and large scale HPC networks

Is disrupting established technologies (like CSR)


Mitigates the expensive hand labeling of objects and hand-engineering
of features
Can be used to generate sample data
Scales to terabytes and exabytes of data (i.e. Big Data)
Has a huge learning capacity that resists overfitting with Drop*

Past → Present

Past: p(y|x), yamla; Labels + Data; Tan^-1, logistic nonlinearities; Backprop, feature engineering; C, Matlab; Caruana-like academic evaluations

Present: p(y|x) at SOTA, plus p(x); much more Data; ReLU, maxout; Backprop, dropout, SGD, aSGD, autoencoders, wake-sleep, curriculum learning, ?...; NVIDIA; Stampede, UTexas; CUDA, Torch, Cudamat, Theano, pylearn2

Deeply Generative and


Discriminative Modeling
p(x)


[rotated figure label, only partially legible: "… features … for gene…"]

Now needs much


less labeled data
p(y|x)

SVD(DeepLearningFuture|Bengio)
2013

Limitations and Looking Forward


1. Scaling computations
2. Reducing the difficulties in optimizing parameters
3. Designing (or avoiding) expensive inference and sampling
4. Improved learning of representations that better disentangle the
unknown underlying factors of variation.
http://arxiv.org/pdf/1305.0445v2.pdf

What did I omit?


The dark art of training
Sparsity

L1 penalties on activations and ReLU (see the sketch after this list)


Biologically, 1-4% of neurons on
Reduces capacity of the network (but ReLUs increase capacity)
Leads to lower dimensional and more stable representations

With a lot of labeled data, you don't really need the


unsupervised pre-training, per se, but it's a logical way to tell
the recent history of deep learning
Need to get the scales of the weights correct and backpropagate

Recurrent networks
I do not understand these, except at a very high level
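Referenced from the sparsity bullet above: a sketch of an L1 penalty on ReLU activations, which pushes most hidden units to exactly zero. The penalty weight and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(32, 100)                 # a minibatch
W = 0.1 * rng.randn(100, 200)
lam = 1e-3                             # L1 penalty weight (illustrative)

h = np.maximum(0.0, x @ W)             # ReLU activations
sparsity_penalty = lam * np.abs(h).sum()
# The penalty's gradient w.r.t. h is lam * sign(h); added to the usual backprop signal,
# it nudges small positive activations toward exactly zero, sparsifying the representation.
grad_h_penalty = lam * np.sign(h)
```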

Other motivating resources


http://deeplearning.net
Courses

Hinton's NNs: https://www.coursera.org/course/neuralnets


Ng's ML: https://www.coursera.org/course/ml

Libraries

Pylearn2: http://deeplearning.net/software/pylearn2/
Theano: http://deeplearning.net/software/theano/
cuda-convnet: https://code.google.com/p/cuda-convnet/
Torch7: http://torch.ch

Videos

My favorite 2 minutes on youtube (note the polite applause at the end):


http://videolectures.net/machine_krizhevsky_imagenet_classification/
A better, more informed version of this talk, with Hinton's visionary perspective:
http://www.youtube.com/watch?v=vShMxxqtDDs
Ng: http://www.youtube.com/watch?v=n1ViNeWhC24
Coates: http://www.youtube.com/watch?v=wZfVBwOO0-k
A panel discussion: http://www.youtube.com/watch?v=b4zr9Zx5WiE
H. Lee:
http://www.slideshare.net/zukun/p04-restricted-boltzmann-machines-
cvpr2012-deep-learning-methods-for-vision
Not a video: Bengio on the future http://arxiv.org/pdf/1305.0445v2.pdf

The Deep Learning Triumv[ei]rate


LeCun: "You have to realize that deep learning … is really a conspiracy between Geoff
Hinton and myself and Yoshua Bengio."

Geoff Hinton

Yann LeCun

Yoshua Bengio

Latin: Trium - {ver, vir} - ate


English: Three - {truth, men} - official
http://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/

Questions?

Deep programming

Shallow programming

Deep Learning Wins

http://clopinet.com/isabelle/Projects/ICML2011/
