
A

Short History of
and Introduction to
Deep Learning
John Kaufhold
Deep Learning Analytics

This talk
Historical machine learning context
Historical limitations of neural networks
Deep Learning technological developments
since 2006
Deep Learning now

Machine Learning in AI
[Diagram: AI agent. The World → Sensor → Digital Form → (x, y) → ML → Features → Classifiers / Predictors → Analytics → Actuator → back into The World]


Machine Learning to detect cats


[Diagram: The World → Sensor + Digital Form → x (photos) and y (human labels), both expensive; Features (human engineering) → ML → Classifiers / Predictors → Cat / Not Cat]

The way it was always done


Engineered by hand

MFCC
y (LDC transcriptions)

Speech
Recognition
Kaufhold, Energy Formulations of Medical Image Segmentations, 2001

The way it was always done


Engineered by hand

y (3D labels drawn by hand)

MRI Image
Analysis
Kaufhold, Energy Formulations of Medical Image Segmentations, 2001

The way it was always done


Yoshua Bengio

http://www.youtube.com/watch?v=4xsVFLnHC_0

The way it was always done

http://www.youtube.com/watch?v=vShMxxqtDDs

The way it was always done



https://www.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf

Tiny Datasets
Shallow Neural Nets
ML: {BDT, RF, NB, SVMs, ANNs, LR, BAG-DT}

ML Algorithm x Performance x Problem Domain

Ensemble Methods Win


Boosted Decision Trees
Random Forests
Bagged Decision Trees

Neural Nets 10th
Threshold Metrics

Ranking Metrics

Probability
Metrics

Still Small Datasets


Shallow Neural Nets
ML: {BDT, RF, NB, SVMs, ANNs, LR, BAG-DT}

ML Algorithm x Performance x Problem Domain

Score

Random Forests Win


2nd/3rd Boosted Decision Trees
2nd/3rd Neural Nets

Random Forests

Feature dimensionality

This talk
Historical machine learning context

Historical limitations of neural


networks
Deep Learning technological developments
since 2006
Deep Learning now

Forbes, August 2013

http://www.forbes.com/sites/netapp/2013/08/19/what-is-deep-learning/

The Deep Learning Triumv[ei]rate


LeCun: "You have to realize that deep learning … is really a conspiracy between Geoff
Hinton and myself and Yoshua Bengio."

Geoff Hinton

Yann LeCun

Yoshua Bengio

Latin: Trium - vir - ate


English: Three - men - official
http://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/

trending

[Google Trends chart, 2004-2013]

trends.google.com

Neural Networks 101

1980s technology

(label) y

Supervised learning
Given x and y, learn p(y|x)
Is this photo, x, a cat, y?

x =
x (input data)

https://www.ipam.ucla.edu/publications/gss2012/gss2012_10740.pdf

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets

Cons

Doesn't learn p(x)


Trouble with > ~3 layers
Overfits
Slow
1980s technology

https://www.ipam.ucla.edu/publications/gss2012/gss2012_10740.pdf

Neural Networks 101


Backpropagation

Vanishing Gradient in Backpropagation
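A tiny numeric sketch of the vanishing gradient: backpropagating through a stack of sigmoid layers multiplies the error signal by sigmoid'(z) <= 0.25 at every layer, so its norm shrinks geometrically with depth. The width, depth, and weight scale below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_layers, width = 10, 50
x = rng.randn(width)
weights = [rng.randn(width, width) * 0.1 for _ in range(n_layers)]

# Forward pass, remembering activations for the backward pass.
activations = [x]
for W in weights:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass: start from a unit error signal at the output.
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1 - a))   # sigmoid'(z) = a * (1 - a), never more than 0.25
    print(f"gradient norm: {np.linalg.norm(grad):.2e}")
```

Each printed norm is smaller than the last: by the time the signal reaches the early layers it is too weak to train them, which is why deep sigmoid nets were hard to train.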

This talk
Historical machine learning context
Historical limitations of neural networks

Deep Learning technological


developments since 2006
Deep Learning now

The 2006 Breakthrough


B.R. (before RBMs)
1. Mainstream: Shallow nets (3 or fewer layers) on small data
2. Slow training on CPUs
3. Near universal sigmoid neuron nonlinearities
4. Parameters initialized with random weights
5. Could only learn discriminative p(y|x), not generative p(x)
6. Neural Nets: yet another machine learning algorithm (yamla)
7. Some convolutional networks (LeCun, et al.)

Hinton et al.'s RBMs


A.R. (after RBMs)
1. Mainstream: Deep nets (6+ layers) on Big Data
2. Fast training on GPUs
3. Autoencoders/RBMs to learn generative models for p(x)
4. Initialize discriminative p(y|x) parameters with the generative model
5. Rise of the ReLU nonlinearity
6. Dropout prevents overfitting
7. Deep nets outcompete the best SOTA in the world:
   1. Image Recognition (ImageNet)
   2. Speech Recognition (TIMIT)
8. Deep learning moves out of academia to Google and Facebook

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets
Cons
Doesn't learn p(x)
Trouble with > ~3 layers
Overfits
Slow

The 2006 Breakthrough: RBMs


h: Hidden layer

v: Data

Key insights:
Learn p(x), not just p(y|x)
Address explaining away by
imposing conditional independence
Features in the generative model are
hierarchical

1. Symmetric weights between v and h


2. Contrastive divergence in a nutshell:
1. Sample v, sample h via p(h|v)
2. g+ = h v^T
3. Construct a sample v from h
4. Sample h via p(h|v)
5. g- = h v^T
6. Update W based on (g+ - g-)

Also see Denoising Autoencoders, Sparse Autoencoders, etc.
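A minimal numpy sketch of the contrastive divergence recipe above (CD-1) for a binary RBM; biases are omitted, and the layer sizes, learning rate, and random minibatch are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
n_visible, n_hidden, lr = 784, 256, 0.01
W = 0.01 * rng.randn(n_visible, n_hidden)    # symmetric weights between v and h

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v_data):
    # 1. Sample h from p(h|v) on the data.
    h_prob = sigmoid(v_data @ W)
    h_samp = (rng.rand(*h_prob.shape) < h_prob).astype(float)
    g_pos = v_data.T @ h_prob                 # g+ : <v h^T> under the data
    # 2. Reconstruct v from h, then recompute h on the reconstruction.
    v_recon = sigmoid(h_samp @ W.T)
    h_recon = sigmoid(v_recon @ W)
    g_neg = v_recon.T @ h_recon               # g- : <v h^T> under the reconstruction
    # 3. Move W in the direction of (g+ - g-).
    return lr * (g_pos - g_neg) / v_data.shape[0]

v_batch = (rng.rand(32, n_visible) < 0.5).astype(float)   # stand-in for a data minibatch
W += cd1_update(v_batch)
```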

The 2006 Breakthrough: RBMs

Greedy Stacking RBMs


(approximately) improves a
variational lower bound
on training data
likelihood.

http://www.slideshare.net/zukun/p04-restricted-boltzmann-machines-cvpr2012-deep-learning-methods-for-vision
http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS09_SalakhutdinovH.pdf
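A sketch of greedy layer-wise stacking using the same toy CD-1 recipe: each new RBM is trained on the hidden activations of the one below it. The layer sizes, epoch count, and inner training loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.01, epochs=5):
    """Toy CD-1 training; returns the weight matrix of one RBM layer."""
    W = 0.01 * rng.randn(data.shape[1], n_hidden)
    for _ in range(epochs):
        h = sigmoid(data @ W)
        h_samp = (rng.rand(*h.shape) < h).astype(float)
        v_recon = sigmoid(h_samp @ W.T)
        h_recon = sigmoid(v_recon @ W)
        W += lr * (data.T @ h - v_recon.T @ h_recon) / len(data)
    return W

# Greedy stacking: each RBM learns features of the features below it.
data = (rng.rand(256, 784) < 0.5).astype(float)   # stand-in for binarized images
layer_sizes, weights, layer_input = [512, 256, 64], [], data
for n_hidden in layer_sizes:
    W = train_rbm(layer_input, n_hidden)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)        # feed activations upward to the next RBM
# `weights` now initializes a deep net that can be fine-tuned discriminatively.
```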

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets
Cons
Doesn't learn p(x)  →  Unsupervised feature learning: RBMs, DAEs, etc.
Trouble with > ~3 layers
Overfits
Slow

Overfitting and Dropout


Randomly turn off neurons every time you backpropagate
a training example
Forces neurons in hidden layers to rely on broader populations of
inputs rather than becoming dependent on a specific individual input

At test time, halve all the weights


A regularizer ~ injecting noise into training and bagging
(improves conditioning)
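A minimal sketch of the scheme just described for one hidden layer: each neuron's output is zeroed with probability 0.5 during training, and activations are scaled by 0.5 at test time (equivalent to halving the weights of this layer). The shapes and the ReLU nonlinearity are illustrative choices.

```python
import numpy as np

rng = np.random.RandomState(0)

def hidden_layer(x, W, train=True, p_drop=0.5):
    h = np.maximum(0.0, x @ W)                 # hidden activations (ReLU here)
    if train:
        mask = (rng.rand(*h.shape) >= p_drop)  # randomly turn off neurons on every pass
        return h * mask
    return h * (1.0 - p_drop)                  # test time: scale (= halve when p_drop = 0.5)

x = rng.randn(4, 100)          # a minibatch of 4 examples
W = 0.1 * rng.randn(100, 200)
h_train = hidden_layer(x, W, train=True)
h_test  = hidden_layer(x, W, train=False)
```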

[Figures: train error and train cross entropy vs. epoch on the training data, comparing finetuning with dropout vs. finetuning without dropout]

http://www.youtube.com/watch?v=DleXA5ADG78
http://arxiv.org/pdf/1207.0580.pdf

[Figures: test error and test cross entropy vs. epoch on the test data, comparing finetuning with dropout vs. finetuning without dropout]

http://www.youtube.com/watch?v=DleXA5ADG78
http://arxiv.org/pdf/1207.0580.pdf

Regularization of Neural Networks using DropConnect


Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus
Dept. of Computer Science, Courant Institute of Mathematical Sciences, New York University

[Figure: results for Vanilla, Dropout, and DropConnect networks; the "better" direction is indicated on the axis]

Drop* randomly turns off neurons every


time a training sample is seen (simulating a
different architecture)
When testing, approximate by appropriate
scaling on neurons
Dropout masks neuron outputs
Dropconnect masks neuron inputs
Both effectively bag classifiers
Tighter theoretical bounds on DropConnect

http://cs.nyu.edu/~wanli/dropc/
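A small sketch contrasting the two masking schemes described above: dropout zeroes a neuron's outputs, DropConnect zeroes individual weights (the connections into a neuron). The shapes and the use of a single mask per minibatch are simplifying assumptions; the paper redraws masks per example.

```python
import numpy as np

rng = np.random.RandomState(0)
p = 0.5
x = rng.randn(4, 100)          # minibatch of 4 examples
W = 0.1 * rng.randn(100, 200)  # weights of one fully connected layer

# Dropout: mask the neuron *outputs* after the nonlinearity.
h = np.maximum(0.0, x @ W)
h_dropout = h * (rng.rand(*h.shape) >= p)

# DropConnect: mask the *weights* (the neuron inputs) before the product.
M = (rng.rand(*W.shape) >= p)
h_dropconnect = np.maximum(0.0, x @ (W * M))
```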

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets
Cons
Doesn't learn p(x)
Trouble with > ~3 layers
Overfits  →  Drop*out, Maxout, Fast Dropout, Stochastic Pooling
Slow

A new kind of cat detector

Contrary to what appears to be a widely-held intuition, our


experimental results reveal that it is possible to train a face detector
without having to label images as containing a face or not.

http://research.google.com/archive/unsupervised_icml2012.html

A new kind of cat detector


Non-faces
Faces

"In terms of scale, our network is perhaps one of the largest known
networks to date. It has 1 billion trainable parameters … our network is still
tiny compared to the human visual cortex, which is 10^6 times larger in terms
of the number of neurons and synapses (Pakkenberg et al., 2003)."

Face Detector Results

Non-cats

Breakthrough:
NO LABELS (y) to learn detectors!

Cats

Cat Detector Results


http://research.google.com/archive/unsupervised_icml2012.html

A new kind of cat detector

http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all&_r=0

A new kind of cat detector


Automatic 9-layer sparse autoencoder with 10^9 connections. The training dataset has 10
million 200 x 200 pixel images. Trained with simple asynchronous SGD!

BUT

How to make it
scalable for the
masses?

http://www.youtube.com/watch?v=wZfVBwOO0-k

GPUs Faster, cheaper, scalabler


GPU speedup over CPU

Scaling options
Scale out more machines in cluster
Scale up more compute/machine (GPUs)
Both
Cluster setup
16 servers x 2 quad-core processors,
4 NVIDIA GTX 680 GPUs/server (1TFLOPS)
16 x 4 x 1536 = 98,304 CUDA cores
Infiniband connectivity
http://www.youtube.com/watch?v=wZfVBwOO0-k

Results
Replicated 1 billion parameter
network results with 3 machines,
200 images/s, 14TB in 3hrs
Scaled to 11 billion parameter
network in a few days with 2% as
many machines

GPUs Faster, cheaper, scalabler

http://www.nvidia.com/content/HelpMeChoose/fx2/HelpMeChoose.asp?lang=en-us

Neural Networks 101


Pros
Simple to learn p(y|x)
Results good for shallow nets
Cons
Doesn't learn p(x)
Trouble with > ~3 layers
Overfits
Slow  →  GPUs / CUDA

Deeper Intuition

Depth 4

Depth 3

Depth of architecture: the longest path from an input node to


an output node
Neural net depth = number of layers after the input layer
Decision trees have effectively 2 layers
Boosting usually adds one layer to the base learner via voting
An architecture with n-1 layers may require exponentially more
units than a depth-n architecture to model the same function
Reinterpret the 2004/2008 Caruana results with this insight

Deeper Intuition

http://www.cs.toronto.edu/~rsalakhu/ISBI1_pdf_version.pdf

2009

1st level filters

2nd level
generic filters

2009

2nd
level

3rd
level

2009

2nd
level

3rd
level

This talk
Historical machine learning context
Historical limitations of neural networks
Deep Learning technological developments
since 2006

Deep Learning now

It's Learning Cats and Dogs


Deep Learning
99% accurate
Top 10 results all
Deep Learning

https://www.kaggle.com/c/dogs-vs-cats/leaderboard
http://www.npr.org/blogs/alltechconsidered/2014/02/20/280232074/deep-learning-teaching-computers-to-tell-things-apart

Learn, don't engineer


feature representations

http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/

Reuse features
across tasks

http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/

Reuse features
across tasks

http://blog.kaggle.com/2013/05/06/qa-with-job-salary-prediction-first-prize-winner-vlad-mnih/

Deep Learning Wins

1. MICCAI 2013 Grand Challenge on Mitosis Detection
2. ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images
3. ISBI 2012 Brain Image Segmentation Challenge (with superhuman pixel error rate)
4. IJCNN 2011 Traffic Sign Recognition Competition
   (only method to achieve superhuman results)
5. ICDAR 2011 offline Chinese Handwriting Competition
6. Online German Traffic Sign Recognition Contest
7. ICDAR 2009 Arabic Connected Handwriting Competition
8. ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
9. ICDAR 2009 French Connected Handwriting Competition
http://www.idsia.ch/~juergen/deeplearning.html

Deep Learning Wins

http://www.cs.toronto.edu/~hinton/absps/speechDBN_jrnl.pdf

Deep Learning Wins


2013

http://research.microsoft.com/apps/pubs/?id=188864

Deep Learning Wins


Segmentation of neuronal structures in EM stacks challenge - ISBI 2012

Raw Data

Human Labels

Challenge: How well can your computer algorithm match the human labels?
Performance metrics: Pixel error, Rand (cluster) error, Warping (topology) error
http://fiji.sc/wiki/index.php/Segmentation_of_neuronal_structures_in_EM_stacks_challenge_-_ISBI_2012

Deep Learning Wins


The final rankings of the challenge at the ISBI 2012 (May 2nd, 2012) were as follows:

Rank  Name / Group                              Submission #  Rand Error   Warping Error  Pixel Error
1     Alessandro Giusti & Dan Ciresan / IDSIA   11            0.048314096  0.000434367    0.060298549
2     Dmitry Laptev / MLL-ETH                   2             0.064500546  0.000555801    0.083264179
3     Sarvesh Dwivedi / MLL-ETH                 4             0.069819563  0.000524902    0.079264809
4     Uygar Sumbul / MIT                        3             0.075537461  0.000645574    0.065254812
5     Ting Liu / SCI                            14            0.083700043  0.001601664    0.134148235
6     Verena Kaynig / Harvard                   4             0.084447767  0.001124446    0.157146646
7     Mojtaba Seyedhosseini / SCI               3             0.089458158  0.001134237    0.077758118
8     Lee Kamentsky / CellProfiler              2             0.09035629   0.001512273    0.100000994
9     Daniel Manson / UCL                       3             0.100188599  0.002199173    0.132557284
10    Lewis Griffin / UCL                       4             0.104156509  0.001475143    0.095671823
11    Radim Burget / IMMI                       1             0.13903844   0.002641296    0.102285508
12    Xiao Tan / TSC+PP                         1             0.153145065  0.000684865    0.087875907
13    Erhan Bas / CLP                           4             0.162303187  0.001613108    0.109391938
14    Margret Keuper / Munich                   2             0.163112555  0.003023656    0.097570061
15    Tolga Tasdizen / SCI                      1             0.175128169  0.002140045    0.092647919
16    Saadia Iftikhar / NIST                    2             0.230241267  0.01615626     0.149973922
17    Thorsten Schmidt / Munich                 1             0.418431548  0.001403046    0.097255861

Legend (row shading on the slide): Deep learning / Other

Deep Learning Wins

agaric

http://www.image-net.org/challenges/LSVRC/2012/results.html

http://videolectures.net/machine_krizhevsky_imagenet_classification/

Deep Learning Wins


2012 ImageNet Leaderboard

SuperVision Team:
Alex Krizhevsky,
Ilya Sutskever,
Geoff Hinton

DEEP LEARNING

COMPUTER
VISION

http://www.image-net.org/challenges/LSVRC/2012/results.html

http://www.image-net.org/explore

The ImageNet Breakthrough

Learned Filters (1st layer):

Convolutional Neural Network


& fully connected layers
1.2M train, 150k test
60M parameters

The ImageNet Breakthrough


Data Augmentation

Augmentation 1: Sampled overlapping patches with LR reflections


Increases # of training examples; averages the 10 patch predictions in the output

[Figure: 256x256 training images and sampled overlapping 224x224 patches]

Sample 224x224 images + LR reflections
Testing samples: 4 corners + center images + LR reflections

Augmentation 2: RGB PCA perturbations


Add noise in the direction of PCA on RGB pixel values

alpha ~ N(0, 0.1^2),

one per image

Reduces error ~1%
http://www.image-net.org/explore
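A sketch of the two augmentations above, assuming 256x256 RGB training images stored as HxWx3 numpy arrays. The crop size (224), left-right reflections, and the alpha ~ N(0, 0.1^2) draw follow the slide; everything else (flip probability, value range) is an illustrative assumption.

```python
import numpy as np

rng = np.random.RandomState(0)

def random_crop_and_flip(img, crop=224):
    """Augmentation 1: sample an overlapping patch and maybe mirror it left-right."""
    h, w, _ = img.shape
    top  = rng.randint(0, h - crop + 1)
    left = rng.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    return patch[:, ::-1] if rng.rand() < 0.5 else patch

def pca_color_perturbation(img, scale=0.1):
    """Augmentation 2: add noise along the principal components of the RGB pixel values."""
    pixels = img.reshape(-1, 3).astype(float)
    pixels -= pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)            # 3x3 RGB covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    alpha = rng.normal(0.0, scale, size=3)        # one draw per image
    shift = eigvecs @ (alpha * eigvals)           # a single RGB offset
    return img + shift                            # broadcast the offset to every pixel

img = rng.rand(256, 256, 3)                       # stand-in for one training image
augmented = pca_color_perturbation(random_crop_and_flip(img))
```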


The ImageNet Breakthrough


ReLU vs. Logistic:
ReLU reaches the logistic net's training error in far fewer epochs

[Figure: training error vs. epoch for Sigmoid, Tan^-1(x), and ReLU(x) = max(0, x)]

Mitigates vanishing gradient
Good results without pre-training

http://eprints.pascal-network.org/archive/00008596/01/glorot11a.pdf
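A tiny illustration of why the ReLU mitigates the vanishing gradient: its derivative is exactly 1 for any positive input, while the sigmoid's derivative never exceeds 0.25 and collapses toward 0 as the unit saturates. The sample inputs are arbitrary.

```python
import numpy as np

z = np.array([-4.0, -1.0, 0.5, 2.0, 6.0])
sigmoid = 1.0 / (1.0 + np.exp(-z))
d_sigmoid = sigmoid * (1.0 - sigmoid)      # <= 0.25 everywhere, ~0 when saturated
d_relu = (z > 0).astype(float)             # exactly 1 for every positive input
print(d_sigmoid, d_relu)
```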

The ImageNet Breakthrough


Local Response Normalization (LRN)
Neuron activities are normalized at individual locations across feature maps
4-parameter transformation
Reduces error rate by 1.2% - 1.4%
Overlapping Max-Pooling
Every other pixel is replaced by the max value in a 3x3 window around that pixel
Max pooling is per channel
Overlapping pooling seems to resist overfitting better than non-overlapping
Reduces error rate by 0.3% - 0.4%

See also:
Sample pooling outputs to effectively model averaging over convolutional layers
http://arxiv.org/abs/1301.3557
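A sketch of the overlapping max-pooling described above (3x3 windows moved by a stride of 2, per channel); LRN is omitted here, and the feature-map shape is just an example.

```python
import numpy as np

def overlapping_max_pool(fmap, size=3, stride=2):
    """Max over a size x size window, moving by `stride` (< size, so windows overlap)."""
    h, w, c = fmap.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w, c))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max(axis=(0, 1))    # pooling is per channel
    return out

fmap = np.random.rand(55, 55, 96)     # e.g. the output of a first convolutional layer
pooled = overlapping_max_pool(fmap)   # -> shape (27, 27, 96)
```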

The ImageNet Breakthrough


Training parameters:
Backpropagation with stochastic gradient descent (no pretraining)
Used dropout to prevent overfitting
Minibatch size = 128 examples
Momentum = 0.9
Weight decay = 0.0005

Update rule: (see the sketch below)

Code released and is BSD licensed (without dropout or multi-GPU features)


http://code.google.com/p/cuda-convnet/
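A sketch of a stochastic gradient descent update consistent with the hyperparameters listed above (momentum 0.9, weight decay 0.0005, minibatches of 128); the learning rate value and parameter shapes are assumptions.

```python
import numpy as np

momentum, weight_decay, lr = 0.9, 0.0005, 0.01   # lr value is an assumption

def sgd_momentum_step(w, v, grad):
    """One step of momentum SGD with weight decay:
    v <- momentum * v - weight_decay * lr * w - lr * grad;  w <- w + v."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

w = np.random.randn(1000)       # some parameter vector
v = np.zeros_like(w)            # momentum buffer
grad = np.random.randn(1000)    # gradient averaged over a 128-example minibatch
w, v = sgd_momentum_step(w, v, grad)
```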

Whose team got 3rd on ImageNet?

Led by outstanding Oxford


Professor with one of the leading
object recognition labs in the world

Whose team got 3rd on ImageNet?

Whose team got 3rd on ImageNet?

On page 10, there are


still object recognition
contributions with
>100 cites!

How did the computer vision community take it?

+5

+22

Computer Vision vs. Neural Networks


https://plus.google.com/104362980539466846301/posts/JBBFfv2XgWM

How did the computer vision community take it?

http://www.theonion.com/articles/local-idiot-to-post-comment-on-internet,2500/

How did the computer vision community take it?


Alexei A Efros
11k cites

Jitendra Malik
77k cites
Geoff Hinton
71k cites
Yann Lecun
19k cites

Andrew Zisserman
63k cites
Computer Vision vs. Neural Networks
https://plus.google.com/104362980539466846301/posts/JBBFfv2XgWM

How did the computer vision community take it?


"The [computer] vision community has been
skeptical about deep learning and feature
learning, but there was the same kind of
skepticism from the ML community until 4 or 5
years ago … Thankfully, results on standard
benchmarks have a way of quieting down
theological arguments."
[82] H. Bourlard, H. Hermansky, and N. Morgan. Towards
increasing speech recognition error rates. Speech
Communication, 18(3):205-231, May 1996.
[83] J.R. Pierce. Whither speech recognition? Journal of
the Acoustical Society of America, 46:1049-1051, 1969.

https://plus.google.com/104362980539466846301/posts/JBBFfv2XgWM

The ImageNet Breakthrough


We trained the network for roughly
90 cycles through the training set,
which took ~6 days on 2 NVIDIA GTX
580 3GB GPUs.

2 GTX 580s: ~$600


A new computer: ~$1000
A week of electricity: <$50
BSD code: free
http://videolectures.net/machine_krizhevsky_imagenet_classification/

Neural Networks 101 (ca. 2013)


Pros
Simple, fast, cheap to learn p(y|x) & p(x)
Results good → win for shallow → deep nets
Feature learning faster, less expensive,
better than feature engineering
Open source resources and community
Theoretically defensible
More scalable
Cons
Doesn't learn p(x)
Trouble with > ~3 layers
Overfits

Slow training

http://www.technologyreview.com/featuredstory/513696/deep-learning/

Google is the new academia

http://www.wired.com/wiredenterprise/2013/03/google_hinton/

Facebook is the new academia


LeCun: "the purpose and the goal
of the new organization [is] to do
two things. One is to really make
progress from a scientific point of
view, from the side of technology.
This will involve participating in
the research community and
publishing papers. The other part
will be [integrating these
technologies] at Facebook."

http://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/

Netflix: Deep Personalization

http://venturebeat.com/2014/02/10/netflix-moves-into-deep-learning-research-to-improve-personalization/

The Deep Learning Draft

Deep Mind sold for


$400M-$500M

"Microsoft, Facebook and Google find themselves in a
battle for deep learning talent … Last year, the cost of a
top, world-class deep learning expert was about the same
as a top NFL quarterback prospect."
http://www.businessweek.com/articles/2014-01-27/the-race-to-buy-the-human-brains-behind-deep-learning-machines

Does AI have 9 lives?


Deep Learning cured cancer!
Cautious optimism in results,
without reckless assertions
about the future.

Deep Learning is yamla.

http://www.newyorker.com/online/blogs/elements/2014/01/the-new-york-times-artificial-intelligence-hype-machine.html

Ng on Deep Learning Criticisms

http://www.youtube.com/watch?v=ZmNOAtZIgIk

Deep Learning Facts


Deep Learning
Learns multiple levels of conceptual abstraction in data; often these are
understandable, reusable hierarchical feature representations
Provides breakthrough or state-of-the-art performance across many
problem domains, and is becoming a practical resource through data
availability, algorithms, culture & hardware
Huge datasets available now, some labeled
Convolutional neural networks, Dropout, ReLU, Max-pooling, RBMs,
Autoencoders, data augmentation and curriculum learning
Many open source, permissively licensed codebases
Pylearn2, CUDA convnet, theano, Torch, etc.

GPUs and large scale HPC networks

Is disrupting established technologies (like CSR)


Mitigates the expensive hand labeling of objects and hand-engineering
of features
Can be used to generate sample data
Scales to terabytes and exabytes of data (i.e. Big Data)
Has a huge learning capacity that resists overfitting with Drop*

Past → Present

Past: p(y|x), yamla; Labels + Data; Tan^-1, logistic nonlinearities; Backprop, feature engineering; C, Matlab; Caruana-like academic evaluations

Present: p(y|x) at SOTA, plus p(x); much more Data; ReLU, maxout; Backprop, dropout, SGD, aSGD, autoencoders, wake-sleep, curriculum learning, ?...; NVIDIA; Stampede, UTexas; CUDA, Torch, Cudamat, Theano, pylearn2

Deeply Generative and


Discriminative Modeling
p(x)


[rotated figure label, only partially legible: "… features … for gene…"]

Now needs much


less labeled data
p(y|x)

SVD(DeepLearningFuture|Bengio)
2013

Limitations and Looking Forward


1. Scaling computations
2. Reducing the difficulties in optimizing parameters
3. Designing (or avoiding) expensive inference and sampling
4. Improved learning of representations that better disentangle the
unknown underlying factors of variation.
http://arxiv.org/pdf/1305.0445v2.pdf

What did I omit?


The dark art of training
Sparsity

L1 penalties on activations and ReLU (see the sketch after this list)


Biologically, 1-4% of neurons on
Reduces capacity of the network (but ReLUs increase capacity)
Leads to lower dimensional and more stable representations

With a lot of labeled data, you don't really need the


unsupervised pre-training, per se, but it's a logical way to tell
the recent history of deep learning
Need to get the scales of the weights correct and backpropagate

Recurrent networks
I do not understand these, except at a very high level
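Referenced from the sparsity bullet above: a sketch of an L1 penalty on ReLU activations, which pushes most hidden units to exactly zero. The penalty weight and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(32, 100)                 # a minibatch
W = 0.1 * rng.randn(100, 200)
lam = 1e-3                             # L1 penalty weight (illustrative)

h = np.maximum(0.0, x @ W)             # ReLU activations
sparsity_penalty = lam * np.abs(h).sum()
# The penalty's gradient w.r.t. h is lam * sign(h); added to the usual backprop signal,
# it nudges small positive activations toward exactly zero, sparsifying the representation.
grad_h_penalty = lam * np.sign(h)
```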

Other motivating resources


http://deeplearning.net
Courses

Hinton's NNs: https://www.coursera.org/course/neuralnets


Ng's ML: https://www.coursera.org/course/ml

Libraries

Pylearn2: http://deeplearning.net/software/pylearn2/
Theano: http://deeplearning.net/software/theano/
cuda-convnet: https://code.google.com/p/cuda-convnet/
Torch7: http://torch.ch

Videos

My favorite 2 minutes on youtube (note the polite applause at the end):


http://videolectures.net/machine_krizhevsky_imagenet_classification/
A better, more informed version of this talk, with Hinton's visionary perspective:
http://www.youtube.com/watch?v=vShMxxqtDDs
Ng: http://www.youtube.com/watch?v=n1ViNeWhC24
Coates: http://www.youtube.com/watch?v=wZfVBwOO0-k
A panel discussion: http://www.youtube.com/watch?v=b4zr9Zx5WiE
H. Lee:
http://www.slideshare.net/zukun/p04-restricted-boltzmann-machines-
cvpr2012-deep-learning-methods-for-vision
Not a video: Bengio on the future http://arxiv.org/pdf/1305.0445v2.pdf

The Deep Learning Triumv[ei]rate


LeCun: "You have to realize that deep learning … is really a conspiracy between Geoff
Hinton and myself and Yoshua Bengio."

Geoff Hinton

Yann LeCun

Yoshua Bengio

Latin: Trium - {ver, vir} - ate


English: Three - {truth, men} - official
http://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/

Questions?

Deep programming

Shallow programming

Deep Learning Wins

http://clopinet.com/isabelle/Projects/ICML2011/
