Digression
DERIVATIVES AND GRADIENT
Derivatives
Some derivation rules:
(w^2)' = 2w
(f(w)^2)' = 2 f(w) f'(w)
(c)' = 0
(w^a)' = a w^(a-1)
(f(w)^a)' = a f(w)^(a-1) f'(w)
(cw)' = c
(f(w) + g(w))' = f'(w) + g'(w)
(e^w)' = e^w
(e^f(w))' = e^f(w) f'(w)
(ln w)' = 1/w
(ln f(w))' = f'(w) / f(w)
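As a quick sanity check, the chain rule for e^f(w) can be verified numerically with a central finite difference; the choice f(w) = w^2 and the evaluation point w = 0.5 are illustrative assumptions, not part of the slides:

```python
import math

# g(w) = e^{f(w)} with the illustrative choice f(w) = w^2
def g(w):
    return math.exp(w ** 2)

# Chain rule: (e^{f(w)})' = e^{f(w)} * f'(w), here f'(w) = 2w
def analytic(w):
    return math.exp(w ** 2) * 2 * w

# Central finite difference approximation of g'(w)
def numeric(w, h=1e-6):
    return (g(w + h) - g(w - h)) / (2 * h)

w = 0.5
print(analytic(w), numeric(w))  # the two values agree closely
```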
The partial derivative of f with respect to w2 is denoted by ∂f/∂w2, also written f_w2.
The partial derivative of f with respect to w3 is denoted by ∂f/∂w3, also written f_w3.
Gradient
The gradient is the vector of partial derivatives.
For f(w) = w1^2 w2^2 w3^2, the gradient is ∇f(w) = (2 w1 w2^2 w3^2, 2 w1^2 w2 w3^2, 2 w1^2 w2^2 w3)
Now suppose we want to compute the gradient at a point, say
w=(1,2,3), i.e. w1=1, w2=2, w3=3
∇f([1,2,3]) = (2·1·2^2·3^2, 2·1^2·2·3^2, 2·1^2·2^2·3) = (72, 36, 24)
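The evaluation above can be sketched in Python; grad_f is a hypothetical helper that hard-codes the three partial derivatives:

```python
# Gradient of f(w) = w1^2 * w2^2 * w3^2, with each partial
# derivative written out explicitly.
def grad_f(w1, w2, w3):
    return (2 * w1 * w2**2 * w3**2,
            2 * w1**2 * w2 * w3**2,
            2 * w1**2 * w2**2 * w3)

print(grad_f(1, 2, 3))  # (72, 36, 24)
```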
OPTIMIZATION
Minimization Problem
min_w f(w)
Iterative Method
Start at some w0 and repeatedly take a step along the direction of steepest slope.
Fixed step size η:
w ← w + η v
The gradient ∇f(w) points in the direction of fastest increase of f, so to minimize we step in the opposite (unit-norm) direction, and we get
v = −∇f(w) / ||∇f(w)||
Gradient Descent Algorithm
Initialize w = 0.
For t = 0, 1, 2, ... do:
  Compute the gradient ∇f(w).
  Update w ← w − η ∇f(w).
Iterate with the next step until w doesn't change too much
(or for a fixed number of iterations).
Return the final w.
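A minimal sketch of the algorithm in Python, on an illustrative quadratic objective with minimum at (3, -1); the objective, step size eta = 0.1, and iteration count are assumptions for the example, not part of the slides:

```python
# Gradient of the illustrative objective f(w) = (w1 - 3)^2 + (w2 + 1)^2
def grad(w):
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

def gradient_descent(eta=0.1, iters=100):
    w = [0.0, 0.0]                 # initialize w = 0
    for t in range(iters):         # fixed number of iterations
        g = grad(w)
        w = [w[i] - eta * g[i] for i in range(2)]  # w <- w - eta * grad f(w)
    return w

w = gradient_descent()
print(w)  # close to the minimizer [3, -1]
```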
HOW DO WE DETERMINE THE STEP SIZE η?
[Figure: plot of f(w) during the iterations, showing the value of f for the w obtained after 100 iterations.]
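The sensitivity to η can be illustrated on the one-dimensional objective f(w) = w^2 (an illustrative choice): a small step size converges toward the minimum at 0, while a too-large one makes the iterates diverge:

```python
# 1-D gradient descent on f(w) = w^2, whose gradient is 2w.
def run(eta, iters=20, w=1.0):
    for _ in range(iters):
        w = w - eta * 2 * w   # w <- w - eta * f'(w)
    return w

print(run(0.1))   # converges toward 0
print(run(1.1))   # diverges: |w| grows at every step
```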