Samy Bengio
bengio@idiap.ch
y = g(s)
s = f(x;w)
[Figure: a single unit with inputs x1, x2 and weights w1, w2]
Example of integration function: $s = w_0 + \sum_i x_i w_i$
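In code, such a unit could look like this (a minimal sketch; the function name and the numbers are illustrative, not from the slides):

```python
import math

# A single unit: integration s = w0 + sum_i x_i * w_i, then a transfer y = g(s).
def unit(x, w, w0, g=math.tanh):
    s = w0 + sum(xi * wi for xi, wi in zip(x, w))  # integration
    return g(s)                                    # transfer

y = unit([1.0, 2.0], [0.5, -0.25], 0.1)  # s = 0.1 + 0.5 - 0.5 = 0.1, y = tanh(0.1)
```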
[Figure: multi-layer perceptron — inputs enter the input layer, pass through hidden layers of units with their parameters, and leave the output layer as outputs]
From now on, let $x_i(p)$ be the $i$-th value of the $p$-th example, represented by the vector $x(p)$ (and, when possible, let us drop $p$).
Each layer $l$ ($1 \le l \le M$) is fully connected to the previous layer.
Integration: $s_i^l = b_i^l + \sum_j y_j^{l-1} w_{i,j}^l$; transfer: $y_i^l = \tanh(s_i^l)$ for hidden layers.
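A forward pass through such fully connected layers can be sketched as follows (a sketch only: layer sizes and weights are made up, and every unit here uses the tanh transfer):

```python
import math

# Forward pass: s_i^l = b_i^l + sum_j y_j^{l-1} * w_ij^l, then y_i^l = tanh(s_i^l).
def forward(x, layers):
    y = x
    for W, b in layers:                        # layer l: weights W[i][j], biases b[i]
        s = [bi + sum(wij * yj for wij, yj in zip(row, y))
             for row, bi in zip(W, b)]         # integration
        y = [math.tanh(si) for si in s]        # transfer
    return y

# Two inputs -> two hidden units -> one output unit (all numbers made up)
layers = [([[0.5, -0.3], [0.2, 0.1]], [0.0, 0.1]),
          ([[1.0, -1.0]], [0.05])]
out = forward([1.0, 2.0], layers)
```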
[Figure: functions realized by compositions of layers — Linear; Linear+Sigmoid; Linear+Sigmoid+Linear; Linear+Sigmoid+Linear+Sigmoid+Linear]
where $y(p) = \mathrm{MLP}(x(p); \theta)$.

$\theta_{s+1} = \theta_s - \eta \, \frac{\partial C(D_n, \theta_s)}{\partial \theta_s}$

where $\eta$ is the learning rate.
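One step of this update rule, sketched on a made-up one-parameter quadratic cost (the names and numbers are illustrative):

```python
# Gradient descent: theta_{s+1} = theta_s - eta * dC/dtheta_s, eta = learning rate.
# Example cost C(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def descend(theta, eta, grad, steps):
    for _ in range(steps):
        theta = theta - eta * grad(theta)  # move against the gradient
    return theta

theta = descend(0.0, 0.1, lambda t: 2 * (t - 3), 100)  # converges toward 3
```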
[Figure: error surface, showing the gradient (slope), a local minimum and the global minimum]
Chain rule: if $a = f(b)$ and $b = g(c)$, then
$\frac{\partial a}{\partial c} = \frac{\partial a}{\partial b} \frac{\partial b}{\partial c} = f'(b) \, g'(c)$
Sum rule: if $a = f(b, c)$ and $b = g(d)$ and $c = h(d)$, then
$\frac{\partial a}{\partial d} = \frac{\partial a}{\partial b} \frac{\partial b}{\partial d} + \frac{\partial a}{\partial c} \frac{\partial c}{\partial d} = \frac{\partial f(b, c)}{\partial b} \, g'(d) + \frac{\partial f(b, c)}{\partial c} \, h'(d)$
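Both rules can be checked numerically with central finite differences; the functions f, g, h below are arbitrary illustrative choices, not from the slides:

```python
import math

# Illustrative functions and their derivatives.
f, fp = math.sin, math.cos
g, gp = (lambda u: u * u), (lambda u: 2 * u)
h, hp = math.sin, math.cos

def num_deriv(F, x, eps=1e-6):
    # central finite difference approximation of dF/dx
    return (F(x + eps) - F(x - eps)) / (2 * eps)

# Chain rule: a = f(g(c))  =>  da/dc = f'(g(c)) * g'(c)
c = 0.7
analytic = fp(g(c)) * gp(c)
numeric = num_deriv(lambda t: f(g(t)), c)

# Sum rule with f(b, c) = b * c, b = g(d), c = h(d):
# da/dd = c * g'(d) + b * h'(d)
d = 0.3
analytic2 = h(d) * gp(d) + g(d) * hp(d)
numeric2 = num_deriv(lambda t: g(t) * h(t), d)
```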
[Figure: backpropagation of the error — the criterion compares outputs with targets, and the error flows back through every parameter to tune, down to the inputs]
Hence the derivative with respect to $w_{i,j}^M$ is:
$\frac{\partial y_i}{\partial w_{i,j}^M} = \frac{\partial y_i}{\partial s_i^M} \frac{\partial s_i^M}{\partial w_{i,j}^M} = (1 - (y_i)^2) \, y_j^{M-1}$
and with respect to the bias:
$\frac{\partial y_i}{\partial b_i^M} = \frac{\partial y_i}{\partial s_i^M} \frac{\partial s_i^M}{\partial b_i^M} = (1 - (y_i)^2) \cdot 1$
where
$\frac{\partial y_k^{l+1}}{\partial y_j^l} = \frac{\partial y_k^{l+1}}{\partial s_k^{l+1}} \frac{\partial s_k^{l+1}}{\partial y_j^l} = (1 - (y_k^{l+1})^2) \, w_{k,j}^{l+1}$
and
$\frac{\partial y_i}{\partial y_i^M} = 1 \quad\text{and}\quad \frac{\partial y_i}{\partial y_{k \neq i}^M} = 0$
For a hidden-layer weight (with the tanh derivative of unit $j$ in layer $l$):
$\frac{\partial y_i}{\partial w_{j,k}^l} = \frac{\partial y_i}{\partial y_j^l} \frac{\partial y_j^l}{\partial w_{j,k}^l} = \frac{\partial y_i}{\partial y_j^l} \, (1 - (y_j^l)^2) \, y_k^{l-1}$
and
$\frac{\partial y_i}{\partial b_j^l} = \frac{\partial y_i}{\partial y_j^l} \frac{\partial y_j^l}{\partial b_j^l} = \frac{\partial y_i}{\partial y_j^l} \, (1 - (y_j^l)^2) \cdot 1$
Finally, the gradient of the criterion with respect to each parameter follows by the chain rule, summing over the examples $p$:
$\frac{\partial C}{\partial w_{j,k}^l} = \sum_p \frac{\partial C}{\partial L(p)} \frac{\partial L(p)}{\partial y(p)} \frac{\partial y(p)}{\partial y_j^l} \frac{\partial y_j^l}{\partial w_{j,k}^l}$
(c) Update the parameters: $\theta_i^{s+1} = \theta_i^s - \eta \, \frac{\partial C}{\partial \theta_i^s}$
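The whole procedure can be sketched for one hidden tanh layer with a linear output unit and a squared loss; every name and number below is made up for illustration, and the analytic gradients follow the derivatives above:

```python
import math

def forward(x, W1, b1, w2, b2):
    s1 = [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(W1, b1)]
    y1 = [math.tanh(s) for s in s1]               # hidden layer, tanh transfer
    y = b2 + sum(w * h for w, h in zip(w2, y1))   # linear output unit
    return y1, y

def grads(x, target, W1, b1, w2, b2):
    y1, y = forward(x, W1, b1, w2, b2)
    dL_dy = 2 * (y - target)                      # squared loss L = (y - target)^2
    gw2 = [dL_dy * h for h in y1]                 # dL/dw2_j = dL/dy * y_j
    gb2 = dL_dy * 1
    # hidden layer: dL/dy_j = dL/dy * w2_j, then multiply by tanh' = 1 - y_j^2
    gW1 = [[dL_dy * w2[j] * (1 - y1[j] ** 2) * xk for xk in x]
           for j in range(len(y1))]
    gb1 = [dL_dy * w2[j] * (1 - y1[j] ** 2) for j in range(len(y1))]
    return gW1, gb1, gw2, gb2

x, target = [1.0, 2.0], 0.3
W1, b1, w2, b2 = [[0.5, -0.3], [0.2, 0.1]], [0.0, 0.1], [1.0, -1.0], 0.05
gW1, gb1, gw2, gb2 = grads(x, target, W1, b1, w2, b2)
```

A finite-difference check on any single weight is a good way to validate such hand-written gradients.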
[Figure: worked numerical example — one forward pass through a small MLP with two tanh hidden units and a linear output, the backpropagated gradients (dw = 0.87 and dw = 1.44), and the weights after one update]
[Figure: learning-rate decay schedule $\eta(t) = 1/(1 + 0.1\,t)$ over iterations 0 to 100]
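Assuming the curve above is a learning-rate decay schedule, it can be written as a small function (the initial rate and decay constant are the ones from the plot):

```python
# Learning-rate decay: eta(t) = eta0 / (1 + decay * t), shrinking over iterations.
def eta(t, eta0=1.0, decay=0.1):
    return eta0 / (1 + decay * t)
```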
An RBF (radial basis function) layer uses the transfer function $y_i^l = \exp(s_i^l)$ with a Gaussian integration $s_i^l = -\sum_j \frac{(y_j^{l-1} - \mu_{i,j}^l)^2}{2 (\sigma_{i,j}^l)^2}$.
The parameters of such a layer $l$ are $\theta^l = \{\mu_{i,j}^l, \sigma_{i,j}^l : i, j\}$
These layers are useful to extract local features (whereas tanh
layers extract global features)
Initialization: use K-Means for instance
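A Gaussian RBF layer of this form might be sketched as follows (the centres and widths below are made up; in practice the centres mu would come from K-Means, as suggested above):

```python
import math

# RBF layer: y_i = exp(-sum_j (x_j - mu_ij)^2 / (2 * sigma_ij^2)).
def rbf_layer(x, mu, sigma):
    out = []
    for mu_i, sg_i in zip(mu, sigma):
        s = -sum((xj - m) ** 2 / (2 * sg ** 2)
                 for xj, m, sg in zip(x, mu_i, sg_i))
        out.append(math.exp(s))   # response is 1 at the centre, near 0 far away
    return out

# Two units: one centred on the input, one far from it
y = rbf_layer([0.0, 0.0], mu=[[0.0, 0.0], [2.0, 2.0]], sigma=[[1.0, 1.0], [1.0, 1.0]])
```

The local response is what makes these units feature detectors for a region of the input space.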
[Figure: how RBFs and MLPs map the inputs into their internal space]
Artificial Neural Networks
Mixture of Experts
[Figure: mixture of experts — Expert 1, Expert 2, …, Expert N, whose outputs are combined by a Gater]
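A minimal sketch of this architecture, assuming the gater produces convex weights over the N expert outputs via a softmax (the experts and gater below are arbitrary linear functions, made up for illustration):

```python
import math

def softmax(z):
    # numerically stable softmax: weights are positive and sum to 1
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    t = sum(e)
    return [ei / t for ei in e]

def mixture(x, experts, gater):
    outputs = [f(x) for f in experts]          # each expert proposes an output
    weights = softmax([g(x) for g in gater])   # gater scores -> convex combination
    return sum(w * o for w, o in zip(weights, outputs))

experts = [lambda x: 2 * x, lambda x: -x]
gater = [lambda x: x, lambda x: -x]            # favours expert 1 when x > 0
y = mixture(1.0, experts, gater)
```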