
What is Deep Learning (DL)?

A machine learning subfield focused on learning representations of data. Exceptionally effective at learning patterns.
Deep learning algorithms attempt to learn (multiple levels of) representation by using a hierarchy of multiple layers.
If you provide the system with tons of information, it begins to understand it and can respond in useful ways.

[Image: machine learning vs. deep learning. Source: https://www.xenonstack.com/blog/static/public/uploads/media/machine-learning-vs-deep-learning.png]
Neural Network Intro

$h = \sigma(W_1 x + b_1)$

$y = \sigma(W_2 h + b_2)$

$W_1, W_2$: weights; $b_1, b_2$: biases; $\sigma$: activation function

How do we train?

Counting parameters for a network with $x \in \mathbb{R}^3$, $h \in \mathbb{R}^4$, $y \in \mathbb{R}^2$:

4 + 2 = 6 neurons (not counting inputs)
[3 × 4] + [4 × 2] = 20 weights
4 + 2 = 6 biases
→ 26 learnable parameters
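As a check, a minimal numpy sketch of this forward pass. The 3 → 4 → 2 shapes come from the slide; sigmoid for $\sigma$ and the random initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 12 weights + 4 biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 8 weights + 2 biases

x = rng.normal(size=3)       # input
h = sigmoid(W1 @ x + b1)     # h = σ(W1 x + b1)
y = sigmoid(W2 @ h + b2)     # y = σ(W2 h + b2)

n_params = W1.size + b1.size + W2.size + b2.size
print(y.shape, n_params)     # (2,) 26
```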

Demo
Element of Neural Network

Neuron: $f : \mathbb{R}^K \to \mathbb{R}$

$z = a_1 w_1 + a_2 w_2 + \cdots + a_K w_K + b$

$a = \sigma(z)$

$w_1, \dots, w_K$: weights; $b$: bias; $\sigma$: activation function
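A one-line numpy version of this neuron (sigmoid for $\sigma$ is an assumption; any activation could be plugged in):

```python
import numpy as np

def neuron(a, w, b):
    z = np.dot(w, a) + b             # z = a1*w1 + ... + aK*wK + b
    return 1.0 / (1.0 + np.exp(-z))  # a = σ(z), here σ = sigmoid
```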
Neural Network
[Figure: a fully connected network. Inputs x1 … xN feed Layer 1, Layer 2, …, Layer L (the hidden layers) and produce outputs y1 … yM; every node is a neuron.]

Deep means many hidden layers
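The same forward pass generalizes to any depth by stacking layers in a loop; a sketch (the layer sizes and sigmoid below are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # layers is a list of (W, b) pairs; "deep" = many entries in the list
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# e.g. N = 8 inputs, three hidden layers of 16 units, M = 4 outputs
rng = np.random.default_rng(1)
sizes = [8, 16, 16, 16, 4]
layers = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(rng.normal(size=8), layers)
```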


Softmax
• Softmax layer as the output layer

Ordinary Layer

$y_1 = \sigma(z_1)$
$y_2 = \sigma(z_2)$
$y_3 = \sigma(z_3)$

In general, the output of the network can be any value, which may not be easy to interpret.
Softmax
• Softmax layer as the output layer → outputs can be read as probabilities

Softmax Layer

$y_i = \dfrac{e^{z_i}}{\sum_{j=1}^{3} e^{z_j}}$

Example: $z = (3,\, 1,\, -3)$ gives $e^{z} \approx (20,\, 2.7,\, 0.05)$, so $y \approx (0.88,\, 0.12,\, {\approx}0)$.
Why Deep?
Universality Theorem

Any continuous function $f : \mathbb{R}^N \to \mathbb{R}^M$ can be realized by a network with one hidden layer (given enough hidden neurons).

Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html
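A toy illustration of the construction behind the linked chapter: two steep sigmoid neurons in one hidden layer form a "bump", and sums of such bumps can approximate any continuous 1-D function. The interval, height, and steepness here are arbitrary assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bump(x, lo=0.4, hi=0.6, height=1.0, steep=500.0):
    # two hidden neurons with output weights +height and -height
    return height * (sigmoid(steep * (x - lo)) - sigmoid(steep * (x - hi)))

x = np.linspace(0.0, 1.0, 11)
print(np.round(bump(x), 2))  # ≈ 1 inside (0.4, 0.6), ≈ 0 well outside
```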

Why a “Deep” neural network, and not a “Fat” neural network?


Fat + Short vs. Thin + Tall

With the same number of parameters, which one is better?

[Figure: a shallow, fat network and a deep, thin network over the same inputs x1 … xN, labeled “Shallow” and “Deep”]
Why can Deep be trained with little data?
• Deep → Modularization

[Figure: an image first passes through basic classifiers (“Boy or girl?”, “Long or short hair?”), which are shared as modules by four downstream classifiers: 1) girls with long hair, 2) boys with long hair (little data), 3) girls with short hair, 4) boys with short hair.]
Why Deep?
• Deep → Modularization → Less training data?

Deep Learning also works on small data sets like TIMIT. The modularization is automatically learned from the data.

[Figure: the most basic classifiers form the 1st layer; the 1st layer is used as a module to build the 2nd-layer classifiers, the 2nd layer as a module for the next, and so on.]
SVM: hand-crafted kernel function → apply simple classifier

Source of image: http://www.gipsa-lab.grenoble-inp.fr/transfert/seminaire/455_Kadri2013Gipsa-lab.pdf
Deep Learning: learnable kernel $\phi(x)$ → simple classifier

[Figure: inputs x1 … xN pass through hidden layers that implement the learnable feature map φ(x); a simple classifier on top produces outputs y1 … yM.]
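A numpy sketch of this view: the hidden stack is the learnable feature map φ(x), and the output layer is the simple classifier sitting on top (ReLU hidden units and the softmax readout are illustrative assumptions):

```python
import numpy as np

def phi(x, hidden):
    # learnable feature map: pass x through the hidden layers only
    a = x
    for W, b in hidden:
        a = np.maximum(0.0, W @ a + b)   # ReLU activation (assumption)
    return a

def classify(x, hidden, W_out, b_out):
    z = W_out @ phi(x, hidden) + b_out   # simple linear classifier on φ(x)
    e = np.exp(z - z.max())
    return e / e.sum()                   # softmax probabilities
```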
Supervised Learning vs Unsupervised Learning

What happens when our labels are noisy?
● Missing values.
● Labeled incorrectly.

What happens when we don't have labels for training at all?
Traditional Autoencoder

 Unlike PCA, we can now use activation functions to achieve non-linearity.
 It has been shown that an AE without activation functions matches the capacity of PCA: a linear autoencoder learns the same subspace as PCA.
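A minimal PyTorch sketch of such an autoencoder (the dimensions are illustrative assumptions): with act = nn.Identity() it is the linear AE that matches PCA; swapping in nn.Sigmoid() or nn.ReLU() gives the non-linear version:

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, d_in=784, d_code=32, act=nn.Sigmoid()):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_code), act)
        self.decoder = nn.Linear(d_code, d_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimizes reconstruction error between x and
# Autoencoder()(x), e.g. with nn.MSELoss().
```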
