Backpropagation and Neural Networks, Part 1
Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 4, 13 Jan 2016
Administrative
A1 is due Jan 20 (Wednesday). ~150 hours left
Warning: Jan 18 (Monday) is Holiday (no class/office hours)
Also note:
Lectures are non-exhaustive.
Read course notes for completeness.
I'll hold make-up office hours on Wed Jan 20, 5pm @ Gates 259
Where we are...
scores function: s = f(x; W) = W x
SVM loss: L = (1/N) Σ_i Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1) + λ R(W)
want: ∇_W L (the gradient of the loss with respect to the weights)
Optimization
[Animation: optimization trajectories over a loss surface; image credits to Alec Radford]
Gradient Descent
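The loop itself is short; here is a runnable toy sketch (the quadratic loss and all names here are illustrative, not from the course):

```python
import numpy as np

def evaluate_gradient(w):
    # analytic gradient of the toy loss sum((w - 3)^2)
    return 2.0 * (w - 3.0)

w = np.random.randn(5)      # initial weights
step_size = 0.1
for _ in range(100):
    w += -step_size * evaluate_gradient(w)  # vanilla gradient descent update
```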
Computational Graph
[Figure: graph with inputs x and W feeding a * node that produces the scores s; s flows into a hinge-loss node, whose output is combined by a + node with the regularization term R to give the total loss L.]
Convolutional Network (AlexNet)
[Figure: the AlexNet computational graph, in which an input image and weights flow through many layers to the loss.]
Neural Turing Machine
[Figure: the Neural Turing Machine computational graph, from input tape to loss; unrolled over time steps it becomes a very large graph.]
A simple example: f(x, y, z) = (x + y) z
e.g. x = -2, y = 5, z = -4

Forward pass: introduce the intermediate q = x + y, so q = 3 and f = q · z = -12.

Want: ∂f/∂x, ∂f/∂y, ∂f/∂z.

The local derivatives are ∂f/∂q = z, ∂f/∂z = q, and ∂q/∂x = ∂q/∂y = 1. Starting from the output (where ∂f/∂f = 1) and working backward:
- ∂f/∂z = q = 3
- ∂f/∂q = z = -4
- Chain rule: ∂f/∂x = (∂f/∂q)(∂q/∂x) = -4 · 1 = -4
- Chain rule: ∂f/∂y = (∂f/∂q)(∂q/∂y) = -4 · 1 = -4
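A minimal sketch of this forward/backward pass in plain Python (variable names like dfdx are ours):

```python
# forward pass
x, y, z = -2, 5, -4
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass, in reverse order through the graph
dfdz = q             # df/dz = q = 3
dfdq = z             # df/dq = z = -4
dfdx = dfdq * 1.0    # chain rule: df/dx = (df/dq)(dq/dx) = -4
dfdy = dfdq * 1.0    # chain rule: df/dy = (df/dq)(dq/dy) = -4
```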
The local picture at every gate: during the forward pass, a gate computes its output z = f(x, y) from its input activations, and it can compute its local gradients ∂z/∂x and ∂z/∂y right away. During the backward pass, the gradient on its output, ∂L/∂z, arrives from above; the gate multiplies it by the local gradients to obtain the gradients on its inputs:
∂L/∂x = (∂L/∂z)(∂z/∂x) and ∂L/∂y = (∂L/∂z)(∂z/∂y)
Another example:
f(w, x) = 1 / (1 + e^{-(w0·x0 + w1·x1 + w2)})

Break f into primitive gates (multiply, add, negate, exp, add 1, reciprocal), forward the values through the graph, then backpropagate gate by gate, multiplying the upstream gradient by each local derivative. The local derivatives needed along the way:
d(e^x)/dx = e^x,  d(1/x)/dx = -1/x²,  d(c + x)/dx = 1,  d(a·x)/dx = a
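Staged as code, this looks as follows (a minimal sketch; the numeric values are illustrative):

```python
import math

w = [2, -3, -3]   # illustrative weights
x = [-1, -2]      # illustrative inputs

# forward pass, staged
dot = w[0]*x[0] + w[1]*x[1] + w[2]    # = 1.0
f = 1.0 / (1 + math.exp(-dot))        # sigmoid output, ~0.73

# backward pass
ddot = (1 - f) * f                    # gradient on dot (sigmoid local gradient), ~0.2
dx = [w[0]*ddot, w[1]*ddot]           # backprop into x
dw = [x[0]*ddot, x[1]*ddot, 1.0*ddot] # backprop into w
```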
sigmoid function: σ(x) = 1 / (1 + e^{-x})
sigmoid gate: the entire chain of gates above computes σ of the weighted sum, so it can be collapsed into a single gate, because the sigmoid's local gradient takes a simple form:
dσ(x)/dx = (1 − σ(x)) σ(x)
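A sketch of the collapsed gate (class and method names are ours):

```python
import numpy as np

class SigmoidGate(object):
    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))  # cache sigma(x) for backward
        return self.out

    def backward(self, dout):
        # the local gradient is (1 - sigma(x)) * sigma(x), chained with dout
        return (1.0 - self.out) * self.out * dout
```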
Patterns in backward flow
- add gate: gradient distributor (passes the upstream gradient unchanged to all of its inputs)
- max gate: gradient router (routes the full upstream gradient to the input that achieved the max, zero to the others)
- mul gate: gradient switcher (each input's gradient is the upstream gradient scaled by the other input's value)
- Gradients add at branches: when a value fans out to several consumers, the gradients flowing back along the branches are summed
Implementation: forward/backward API
Graph (or Net) object. (Rough pseudocode below.)
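A sketch of what such an object might look like (method and attribute names are illustrative, not a specific library's API):

```python
class ComputationalGraph(object):
    def forward(self, inputs):
        # pass inputs to the input gates, then run every gate in
        # topological order so each gate's inputs are ready before use
        for gate in self.graph.nodes_topologically_sorted():
            gate.forward()
        return self.loss  # the final gate in the graph outputs the loss

    def backward(self):
        # visit gates in reverse topological order; each one applies
        # the chain rule to its cached values and its upstream gradient
        for gate in reversed(self.graph.nodes_topologically_sorted()):
            gate.backward()
        return self.inputs_gradients
```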
Implementation: forward/backward API
Example: a single * gate with inputs x, y and output z = x * y. forward() must cache its inputs, because backward() needs them: dx = dz · y and dy = dz · x.
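A sketch of such a gate (names are ours):

```python
class MultiplyGate(object):
    def forward(self, x, y):
        z = x * y
        self.x = x  # cache the inputs: the chain rule in
        self.y = y  # backward() needs them
        return z

    def backward(self, dz):
        # dz is dL/dz from above; local gradients are dz/dx = y, dz/dy = x
        dx = self.y * dz
        dy = self.x * dz
        return [dx, dy]
```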
Example: Torch Layers
Torch's nn package is essentially a large collection of layer modules, each one implementing exactly this forward/backward contract.
Example: Torch MulConstant
This tiny layer is configured with a scalar at initialization. Its forward() multiplies the input tensor by that constant; its backward() multiplies the incoming gradient by the same constant.
Example: Caffe Layers
Caffe layers follow the same pattern in C++. The Caffe Sigmoid layer, for instance, applies σ elementwise in its forward pass and multiplies the top gradient by σ(x)(1 − σ(x)) in its backward pass.
Gradients for vectorized code
(x, y, z are now vectors.) The local gradient of a gate f is now a Jacobian matrix: the derivative of each element of z with respect to each element of x. As before, the gate multiplies the upstream gradients by its local gradient to produce the gradients on its inputs.
Vectorized operations
Consider f(x) = max(0, x) applied elementwise to a 4096-d input vector, producing a 4096-d output vector.
Q: what is the size of the Jacobian matrix? [4096 x 4096!]
Q2: what does it look like? Each output element depends only on the corresponding input element, so the Jacobian is diagonal; you would never form it explicitly.
In practice we process an entire minibatch (e.g. 100 examples) at one time, so the Jacobian would technically be even bigger (409,600 x 409,600), all the more reason never to write it down.
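For the elementwise ReLU, multiplying by that diagonal Jacobian reduces to masking (a minimal numpy sketch):

```python
import numpy as np

x = np.random.randn(4096)
out = np.maximum(0, x)         # forward: elementwise ReLU

dout = np.random.randn(4096)   # upstream gradient dL/dout
# multiplying by the diagonal Jacobian == zeroing entries where x <= 0
dx = dout * (x > 0)            # dL/dx, with no 4096 x 4096 matrix in sight
```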
Assignment: Writing SVM/Softmax
Stage your forward/backward computation! E.g. for the SVM: first compute the matrix of margins from the scores, then the loss from the margins; in the backward pass, backprop from the loss into the margins, and from the margins into the scores.
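A sketch of what the staging might look like in numpy (an illustration of the idea, not the assignment's reference solution):

```python
import numpy as np

def svm_loss_staged(scores, y):
    # scores: (N, C) class scores, y: (N,) correct class indices
    N = scores.shape[0]
    correct = scores[np.arange(N), y][:, None]       # (N, 1)
    margins = np.maximum(0, scores - correct + 1.0)  # stage 1: margins
    margins[np.arange(N), y] = 0                     # no margin for the correct class
    loss = margins.sum() / N                         # stage 2: loss

    # backward pass, stage by stage
    dmargins = np.ones_like(margins) / N             # dloss/dmargins
    dscores = (margins > 0) * dmargins               # through the max(0, .)
    dscores[np.arange(N), y] -= dscores.sum(axis=1)  # through the 'correct' term
    return loss, dscores
```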
Summary so far
- neural nets will be very large: no hope of writing down gradient formula by
hand for all parameters
- backpropagation = recursive application of the chain rule along a
computational graph to compute the gradients of all
inputs/parameters/intermediates
- implementations maintain a graph structure, where the nodes implement
the forward() / backward() API.
- forward: compute result of an operation and save any intermediates
needed for gradient computation in memory
- backward: apply the chain rule to compute the gradient of the loss
function with respect to the inputs.
Neural Network: without the brain stuff
(Before) Linear score function: f = W x
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)
or 3-layer: f = W3 max(0, W2 max(0, W1 x))

x (3072-d) → W1 → h (100-d) → W2 → s (10-d)

e.g. for CIFAR-10, x is the 3072-d image, h is a 100-d hidden layer, and s holds the 10 class scores.
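As code, the 2-layer scores take one line per layer (shapes as above; names are ours):

```python
import numpy as np

x = np.random.randn(3072)         # a CIFAR-10 image, flattened
W1 = np.random.randn(100, 3072)   # first-layer weights
W2 = np.random.randn(10, 100)     # second-layer weights
h = np.maximum(0, W1.dot(x))      # 100-d hidden layer (ReLU)
s = W2.dot(h)                     # 10 class scores
```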
Full implementation of training a 2-layer Neural Network needs ~11 lines:
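A sketch in that spirit (a tiny two-layer sigmoid network trained on toy data with full-batch gradient descent, after Andrew Trask's well-known 11-line example):

```python
import numpy as np

X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])  # toy inputs (4 examples)
y = np.array([[0],[1],[1],[0]])                  # toy targets
syn0 = 2*np.random.random((3,4)) - 1             # first-layer weights
syn1 = 2*np.random.random((4,1)) - 1             # second-layer weights
for _ in range(10000):
    l1 = 1/(1 + np.exp(-X.dot(syn0)))            # hidden layer (sigmoid)
    l2 = 1/(1 + np.exp(-l1.dot(syn1)))           # output layer (sigmoid)
    l2_delta = (y - l2) * (l2*(1 - l2))          # backprop through the output
    l1_delta = l2_delta.dot(syn1.T) * (l1*(1 - l1))  # backprop through the hidden layer
    syn1 += l1.T.dot(l2_delta)                   # updates ((y - l2) carries the sign)
    syn0 += X.T.dot(l1_delta)
```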
Assignment: Writing a 2-layer Net
Stage your forward/backward computation!
[Figure: a biological neuron next to its mathematical model: inputs are weighted and summed, a bias is added, and the result passes through a sigmoid activation function σ(x) = 1/(1 + e^{-x}).]
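A single neuron as code (a minimal sketch; class and method names are ours):

```python
import numpy as np

class Neuron(object):
    def __init__(self, n_inputs):
        self.weights = np.random.randn(n_inputs)
        self.bias = 0.0

    def neuron_tick(self, inputs):
        # weighted sum of the inputs plus a bias, squashed by a sigmoid
        cell_body_sum = np.sum(inputs * self.weights) + self.bias
        firing_rate = 1.0 / (1.0 + np.exp(-cell_body_sum))
        return firing_rate
```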
Be very careful with your brain analogies.
Biological neurons:
- Many different types
- Dendrites can perform complex non-linear computations
- Synapses are not a single weight but a complex non-linear dynamical system
- Rate code may not be adequate
Activation Functions
- Sigmoid: σ(x) = 1/(1 + e^{-x})
- tanh: tanh(x)
- ReLU: max(0, x)
- Leaky ReLU: max(0.1x, x)
- Maxout
- ELU
Neural Networks: Architectures
[Figure: fully-connected architectures: a 2-layer neural net (one hidden layer) and a 3-layer neural net (two hidden layers).]
Example feed-forward computation of a neural network
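A sketch of the full forward pass for a 3-layer network (layer sizes and names are assumptions for illustration):

```python
import numpy as np

f = lambda x: 1.0/(1.0 + np.exp(-x))   # activation function (sigmoid)
x = np.random.randn(3, 1)              # random input vector (3x1)
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)
h1 = f(np.dot(W1, x) + b1)             # first hidden layer (4x1)
h2 = f(np.dot(W2, h1) + b2)            # second hidden layer (4x1)
out = np.dot(W3, h2) + b3              # output neuron (1x1)
```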
Setting the number of layers and their sizes
More hidden neurons = more capacity (more complex decision boundaries). Do not use the size of the network as a regularizer; control overfitting with stronger regularization instead.
Summary
- we arrange neurons into fully-connected layers
- the abstraction of a layer has the nice property that it
allows us to use efficient vectorized code (e.g. matrix
multiplies)
- neural networks are not really neural
- neural networks: bigger = better (but might have to
regularize more strongly)
Next Lecture:
inputs x → complex graph → outputs y
- reverse-mode differentiation: use it when you want the effect of many things on one thing (this is backprop; run it for many different x)
- forward-mode differentiation: use it when you want the effect of one thing on many things