Short History of and Introduction to Deep Learning
John Kaufhold
Deep Learning Analytics
This talk
Historical machine learning context
Historical limitations of neural networks
Deep Learning technological developments since 2006
Deep Learning now
Machine Learning in AI
[Figure: AI agent diagram. Labels: Sensor; ML; The World; Features; Actuator; Digital Form x,y; Classifiers / Predictors; Analytics]
[Figure: ML pipeline. Labels: y (human labels); x,y; ML; (human engineering); Classifiers / Predictors; Features; Cats; Not Cats; The World; Cat; Not Cat]
Speech Recognition
[Figure labels: MFCC; y (LDC transcriptions)]
Kaufhold, Energy Formulations of Medical Image Segmentations, 2001
MRI Image Analysis
http://www.youtube.com/watch?v=4xsVFLnHC_0
http://www.youtube.com/watch?v=vShMxxqtDDs
https://www.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf
Tiny Datasets
Shallow Neural Nets
ML: {BDT, RF, NB, SVMs, ANNs, LR, BAG-DT}
[Figure labels: Ranking Metrics; Probability Metrics; Score; Random Forests; Feature dimensionality]
This talk
Historical machine learning context
http://www.forbes.com/sites/netapp/2013/08/19/what-is-deep-learning/
Geoff Hinton
Yann LeCun
Yoshua Bengio
[Figure: trending, 2004 to 2013, trends.google.com]
1980s technology
Supervised learning
Given x and y, learn p(y|x)
Is this photo, x, a cat, y?
x (input data)
y (label)
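A minimal sketch of "given x and y, learn p(y|x)", assuming a logistic regression model trained by stochastic gradient descent. The two-number feature vectors and the "cat" / "not cat" toy labels are hypothetical and chosen only for illustration; the slides do not specify a model.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=200, seed=0):
    """Fit p(y=1|x) = sigmoid(w.x + b) by SGD on the log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
            err = p - ys[i]  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xs[i])]
            b -= lr * err
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: x is a 2-feature vector, y = 1 means "cat".
xs = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
ys = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(xs, ys)
p_cat = predict_proba(w, b, [0.85, 0.9])   # estimate of p(y=1|x) for a cat-like x
p_not = predict_proba(w, b, [0.1, 0.15])   # estimate for a non-cat-like x
```

The model only ever estimates p(y|x); it says nothing about how likely an input x is on its own.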
https://www.ipam.ucla.edu/publications/gss2012/gss2012_10740.pdf
Cons
This talk
Historical machine learning context
Historical limitations of neural networks
(before RBMs)
[Numbered list, items 1 to 7: text not recovered]
(after RBMs)
[Numbered list, items 1 to 7: text not recovered]
v: Data
Key insights:
Learn p(x), not just p(y|x)
Address explaining away by imposing conditional independence
Features in generative model are hierarchical
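A rough sketch of how a restricted Boltzmann machine (RBM) imposes conditional independence: given the visible data v, each hidden unit's probability is computed on its own, so there is no explaining away to resolve among hidden units. The one-step contrastive divergence (CD-1) update, the layer sizes, the learning rate, and the toy binary patterns are assumptions for illustration, not details taken from the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, W, c):
    # p(h_j = 1 | v) factorizes: each hidden unit depends only on v.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # p(v_i = 1 | h) factorizes in the same way.
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_step(v0, W, b, c, rng, lr=0.1):
    """One CD-1 update: data statistics minus one-step reconstruction statistics."""
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])

rng = random.Random(0)
n_visible, n_hidden = 4, 2  # hypothetical sizes
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # unlabeled toy patterns: no y anywhere

for step in range(500):
    cd1_step(data[step % 2], W, b, c, rng)

recon = visible_probs(hidden_probs(data[0], W, c), W, b)
```

Training uses only v, which is the sense in which the model learns about p(x) rather than p(y|x).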
http://www.slideshare.net/zukun/p04-restricted-boltzmann-machines-cvpr2012-deep-learning-methods-for-vision
http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS09_SalakhutdinovH.pdf
[Plot: Train Error on Train data]
[Plot: Test Error on Test data]
http://www.youtube.com/watch?v=DleXA5ADG78
http://arxiv.org/pdf/1207.0580.pdf
Dropout
[Figure labels: Vanilla; Dropout; Better; Dropconnect]
Dropconnect
http://cs.nyu.edu/~wanli/dropc/
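A minimal sketch contrasting the two regularizers named above: dropout zeroes whole activations at random during training, while Dropconnect zeroes individual weights. This sketch assumes the "inverted" scaling convention (rescale by 1/keep during training), which is a common simplification and not necessarily what the linked papers do at test time; the function names are hypothetical.

```python
import random

def dropout(activations, p_drop, rng, train=True):
    """Zero each activation with probability p_drop (training only)."""
    if not train or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    # Rescale survivors so the expected value matches the test-time value.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def dropconnect_layer(x, W, p_drop, rng, train=True):
    """Linear layer where each weight W[j][i] is dropped independently."""
    keep = 1.0 - p_drop
    out = []
    for row in W:
        if train:
            s = sum(w * xi for w, xi in zip(row, x) if rng.random() < keep) / keep
        else:
            # Simplification: use the full weights at test time.
            s = sum(w * xi for w, xi in zip(row, x))
        out.append(s)
    return out

rng = random.Random(0)
train_out = dropout([1.0] * 1000, 0.5, rng)
test_out = dropout([1.0, 2.0], 0.5, rng, train=False)
dc_test = dropconnect_layer([1.0, 2.0], [[0.5, 0.25]], 0.5, rng, train=False)
```

In both cases a different random sub-network is trained on each example, which is one way to read the gap between the train-error and test-error plots.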
http://research.google.com/archive/unsupervised_icml2012.html
Breakthrough: NO LABELS (y) to learn detectors!
[Figure labels: Cats; Non-cats]
http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all&_r=0
BUT
How to make it scalable for the masses?
http://www.youtube.com/watch?v=wZfVBwOO0-k
Scaling options
Scale out: more machines in cluster
Scale up: more compute/machine (GPUs)
Both
Cluster setup
16 servers x 2 quad-core processors, 4 NVIDIA GTX 680 GPUs/server (1TFLOPS)
16 x 4 x 1536 = 98,304 CUDA cores
Infiniband connectivity
http://www.youtube.com/watch?v=wZfVBwOO0-k
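The core count on the cluster slide can be checked with a few lines of arithmetic. The variable names are hypothetical; the numbers are the ones given above.

```python
servers = 16
gpus_per_server = 4
cuda_cores_per_gpu = 1536  # per GTX 680, as stated on the slide

total_gpus = servers * gpus_per_server
total_cuda_cores = total_gpus * cuda_cores_per_gpu

print(total_gpus, total_cuda_cores)  # 64 98304
```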
Results
Replicated 1 billion parameter network results with 3 machines, 200 images/s, 14TB in 3hrs
Scaled to 11 billion parameter network in a few days with 2% as many machines
http://www.nvidia.com/content/HelpMeChoose/fx2/HelpMeChoose.asp?lang=en-us
Deeper Intuition
[Figure labels: Depth 4; Depth 3]
http://www.cs.toronto.edu/~rsalakhu/ISBI1_pdf_version.pdf
[Figures, 2009. Labels: 2nd level generic filters; 2nd level; 3rd level]
This talk
Historical machine learning context
Historical limitations of neural networks
Deep Learning technological developments since 2006
https://www.kaggle.com/c/dogs-vs-cats/leaderboard
http://www.npr.org/blogs/alltechconsidered/2014/02/20/280232074/deep-learning-teaching-computers-to-tell-things-apart
http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/
Reuse features across tasks
http://blog.kaggle.com/2013/05/06/qa-with-job-salary-prediction-first-prize-winner-vlad-mnih/
http://www.cs.toronto.edu/~hinton/absps/speechDBN_jrnl.pdf
http://research.microsoft.com/apps/pubs/?id=188864
Raw Data
Human Labels
Challenge: How well can your computer algorithm match the human labels?
Performance metrics: Pixel error, Rand (cluster) error, Warping (topology) error
http://fiji.sc/wiki/index.php/Segmentation_of_neuronal_structures_in_EM_stacks_challenge_-_ISBI_2012
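A small sketch of two of the three metrics, under simplified definitions that are assumptions here: pixel error as the fraction of mismatched binary pixels, and Rand error as one minus the Rand index over all pixel pairs. The challenge's exact definitions may differ, and warping error is omitted because it needs a topology-preserving warp. Images are flattened to plain lists for brevity.

```python
from itertools import combinations

def pixel_error(pred, truth):
    """Fraction of pixels whose binary label differs from the human label."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)

def rand_error(seg_a, seg_b):
    """1 - Rand index: fraction of pixel pairs the two segmentations disagree on.

    A pair counts as agreement when both segmentations put the two pixels
    in the same segment, or both put them in different segments.
    """
    pairs = list(combinations(range(len(seg_a)), 2))
    agree = sum((seg_a[i] == seg_a[j]) == (seg_b[i] == seg_b[j]) for i, j in pairs)
    return 1.0 - agree / len(pairs)

truth = [0, 0, 1, 1]   # hypothetical human labels for 4 pixels
pred = [0, 0, 1, 0]    # hypothetical algorithm output

pe = pixel_error(pred, truth)                      # 1 of 4 pixels differs
re = rand_error([1, 1, 2, 2], [1, 1, 2, 1])        # 3 of 6 pairs disagree
same = rand_error([1, 1, 2, 2], [5, 5, 9, 9])      # segment ids are arbitrary
```

Rand error depends only on which pixels are grouped together, not on the segment ids, which is why it is described as a cluster error.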
Name / Group | (unlabeled) | Rand Error | Warping Error | Pixel Error
Alessandro Giusti & Dan Ciresan / IDSIA | 11 | 0.048314096 | 0.000434367 | 0.060298549
Dmitry Laptev / MLL-ETH | 2 | 0.064500546 | 0.000555801 | 0.083264179
Sarvesh Dwivedi / MLL-ETH | 4 | 0.069819563 | 0.000524902 | 0.079264809
Uygar Sumbul / MIT | 3 | 0.075537461 | 0.000645574 | 0.065254812
Ting Liu / SCI | 14 | 0.083700043 | 0.001601664 | 0.134148235
Verena Kaynig / Harvard | 4 | 0.084447767 | 0.001124446 | 0.157146646
Mojtaba Seyedhosseini / SCI | 3 | 0.089458158 | 0.001134237 | 0.077758118
Lee Kamentsky / CellProfiler | 2 | 0.09035629 | 0.001512273 | 0.100000994
Daniel Manson / UCL | 3 | 0.100188599 | 0.002199173 | 0.132557284
Lewis Griffin / UCL | 4 | 0.104156509 | 0.001475143 | 0.095671823
Radim Burget / IMMI | 1 | 0.13903844 | 0.002641296 | 0.102285508
Xiao Tan / TSC+PP | 1 | 0.153145065 | 0.000684865 | 0.087875907
Erhan Bas / CLP | 4 | 0.162303187 | 0.001613108 | 0.109391938
Margret Keuper / Munich | 2 | 0.163112555 | 0.003023656 | 0.097570061
Tolga Tasdizen / SCI | 1 | 0.175128169 | 0.002140045 | 0.092647919
Saadia Iftikhar / NIST | 2 | 0.230241267 | 0.01615626 | 0.149973922
Thorsten Schmidt / Munich | 1 | 0.418431548 | 0.001403046 | 0.097255861
[Chart legend: Deep learning; Other]
[Image label: agaric]
http://www.image-net.org/challenges/LSVRC/2012/results.html
http://videolectures.net/machine_krizhevsky_imagenet_classification/
SuperVision Team: Alex Krizhevsky, Ilya Sutskever, Geoff Hinton
[Chart legend: DEEP LEARNING; COMPUTER VISION]
http://www.image-net.org/explore
Sample 224x224 images + left-right (LR) reflections
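A minimal sketch of the augmentation named above: take a random 224x224 sample from a larger image and reflect it left-to-right half of the time. The 256x256 source size and the 50% flip probability are assumptions, and the image is a plain nested list so the example stays dependency-free.

```python
import random

def random_crop_and_flip(image, crop, rng):
    """Return a random crop x crop patch, mirrored left-right with probability 0.5."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # left-right reflection
    return patch

rng = random.Random(0)
# Hypothetical 256x256 "image" whose pixels record their own (row, col).
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_crop_and_flip(image, 224, rng)
```

Each call yields a slightly different training example from the same labeled image, which enlarges the effective training set without new labels.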
[Plot labels: Sigmoid; Tan^-1(x); ReLU(x) = max(0, x); axes: Training Error vs. Epoch]
Mitigates vanishing gradient
Good results without pre-training
http://eprints.pascal-network.org/archive/00008596/01/glorot11a.pdf
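A toy illustration of why ReLU(x) = max(0, x) mitigates the vanishing gradient. Backpropagation multiplies one activation derivative per layer; the sigmoid's derivative is at most 0.25, so the product shrinks geometrically with depth, while the ReLU derivative is exactly 1 for positive inputs. As a simplifying assumption, all weights are taken to be 1 and every layer sees the same pre-activation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # never larger than 0.25

def relu(z):
    return max(0.0, z)

def d_relu(z):
    return 1.0 if z > 0 else 0.0   # exactly 1 on the active side

def chain_gradient(derivative, pre_activations):
    """Product of per-layer activation derivatives (weights assumed to be 1)."""
    g = 1.0
    for z in pre_activations:
        g *= derivative(z)
    return g

pre_activations = [0.5] * 10       # hypothetical 10-layer stack
g_sigmoid = chain_gradient(d_sigmoid, pre_activations)
g_relu = chain_gradient(d_relu, pre_activations)
```

With ten layers the sigmoid chain is already below one millionth, while the ReLU chain is still 1.0.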
See also: Sample pooling outputs to effectively model average over convolutional layers
http://arxiv.org/abs/1301.3557
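A sketch of sampling pooling outputs: during training, one activation per pooling region is drawn with probability proportional to its value; at test time, the probability-weighted average is used instead. It assumes non-negative activations (for example, after a ReLU), and the function names are hypothetical rather than taken from the linked paper.

```python
import random

def stochastic_pool(region, rng):
    """Training-time pooling: sample one activation with p_i = a_i / sum(a)."""
    total = sum(region)
    if total == 0:
        return 0.0
    r = rng.random() * total
    running = 0.0
    for a in region:
        running += a
        if r < running:
            return a
    return region[-1]  # guard against floating-point round-off

def weighted_pool(region):
    """Test-time pooling: the expected value of the sampled output."""
    total = sum(region)
    if total == 0:
        return 0.0
    return sum(a * a for a in region) / total

rng = random.Random(0)
region = [1.0, 3.0, 0.0, 0.0]      # one hypothetical 2x2 pooling region, flattened
samples = [stochastic_pool(region, rng) for _ in range(1000)]
expected = weighted_pool(region)   # (1*1 + 3*3) / 4
```

Because each training pass samples a different pooled value, the network behaves like an average over many pooling configurations, which is the model-averaging effect the slide refers to.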
Update rule:
+5
+22
http://www.theonion.com/articles/local-idiot-to-post-comment-on-internet,2500/
Jitendra Malik: 77k cites
Geoff Hinton: 71k cites
Yann LeCun: 19k cites
Andrew Zisserman: 63k cites
Computer Vision vs. Neural Networks
https://plus.google.com/104362980539466846301/posts/JBBFfv2XgWM
Slow training
http://www.technologyreview.com/featuredstory/513696/deep-learning/
http://www.wired.com/wiredenterprise/2013/03/google_hinton/
http://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/
http://venturebeat.com/2014/02/10/netflix-moves-into-deep-learning-research-to-improve-personalization/
http://www.newyorker.com/online/blogs/elements/2014/01/the-new-york-times-artificial-intelligence-hype-machine.html
http://www.youtube.com/watch?v=ZmNOAtZIgIk
Past / Present
p(y|x), yamla
Labels
Tan-1, logistic
Data
Labels
Backprop, feature engineering
C, Matlab
p(y|x), SOTA
Caruana-like academic evaluations
ReLU, maxout
NVIDIA
Backprop, dropout, SGD, aSGD, autoencoders, wake-sleep, curriculum learning, ?...
Stampede, UTexas
p(x)
DataVVVV
[Rotated slide annotation: text not recoverable]
SVD(DeepLearningFuture|Bengio), 2013
Recurrent networks
I do not understand these, except at a very high level
Libraries
Pylearn2: http://deeplearning.net/software/pylearn2/
Theano: http://deeplearning.net/software/theano/
cuda-convnet: https://code.google.com/p/cuda-convnet/
Torch7: http://torch.ch
Videos
Geoff Hinton
Yann LeCun
Yoshua Bengio
Questions?
Deep programming
Shallow programming
http://clopinet.com/isabelle/Projects/ICML2011/