What is this excel notebook about?

This notebook explores different flavors/variations of Stochastic Gradient Descent, the key optimization method used for NNs. To keep things simple, instead of working with neural networks here, we will use a simple linear model (y=ax+b).
NB: Click File->Make a Copy before using this spreadsheet.
Goal: We want to create a model so that we can predict y when we're given x.
To build the model, we are given training data x, y and we want to find an optimal a, b to create a model.

Key idea of SGD:
1. Randomly choose some weights to start.
2. Use those weights to calculate a prediction, calculate the error/loss, and then update those weights.
3. Repeat step 2 lots of times. Eventually we end up with some decent weights.

How does this relate to NNs? To make an analogy with CNNs: x would be our images, y would be the dog/cat label. We are trying to find the optimal weights (a, b). We use the same optimization methods to find weights for our NNs.
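To make those three steps concrete, here is a minimal Python sketch of the same loop on the linear model. The data range, iteration count, and variable names are illustrative assumptions, not values taken from the spreadsheet (except the learning rate, which matches the basic SGD sheet):

```python
import random

# Toy training data built with the same recipe as the data sheet: y = 2x + 30.
data = [(x, 2 * x + 30) for x in [random.randint(0, 100) for _ in range(100)]]

a, b = 1.0, 1.0   # step 1: arbitrarily chosen starting weights
lr = 0.0001       # same learning rate the basic SGD sheet uses

for step in range(10_000):        # step 3: repeat lots of times
    x, y = random.choice(data)    # a one-row "mini-batch"
    err = (a * x + b) - y         # step 2: predict and measure the error
    a -= lr * 2 * x * err         # de/da = 2x(ax + b - y)
    b -= lr * 2 * err             # de/db = 2(ax + b - y)

print(a, b)   # a closes in on 2 quickly; b creeps toward 30 much more slowly
```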
Some comments have been added to the other sheets in this notebook.
Comments are indicated by a small red triangle in the corner of a cell. Hover your mouse over that cell to see the comment.
Overview of the sheets in this notebook:

data: Here we randomly generate some training data x and y that we will use on the other sheets. We create our data by arbitrarily choosing a and b (a=2, b=30), randomly generating x, and then calculating y. In a real problem the data would come from another source; in our CNN analogy, x is the image data and y is the category (dog or cat). Since we know the true values of a and b, we can see if our optimization algorithms help us find them.

basic SGD: This is the most basic, vanilla form of stochastic gradient descent. If we were doing full gradient descent, we would evaluate the error on all our data and then update the weights a, b. What makes it stochastic is that we're evaluating the error on just a subset of our data (in this case, a single "mini-batch") and then updating the weights. We can get a sense of how good our answers really are, since we know what the "true" a and b are.

momentum: Accelerates SGD in the relevant direction and dampens side-to-side oscillations. It does this by combining a decaying average of previous gradients with the current gradient.

adagrad: Adapts the learning rate by dividing it by the sqrt of the average of the previous squared gradients.

rmsprop: Adapts the learning rate by dividing it by the sqrt of an exponentially decaying average of squared gradients.

adam: rmsprop + momentum.

adagrad_ann: adagrad with learning rate annealing.

adam_ann: adam with learning rate annealing.
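The update rules behind these sheets can be sketched in a few lines of Python. This is a simplified single-weight version; the hyperparameters lr, beta, beta2, eps, and decay are illustrative defaults, not the spreadsheet's settings:

```python
import math

lr, beta, eps = 0.01, 0.9, 1e-8   # illustrative defaults

def sgd(w, g):
    # basic SGD: step against the raw gradient
    return w - lr * g

def momentum(w, g, v):
    # decaying average of past gradients combined with the current one
    v = beta * v + g
    return w - lr * v, v

def adagrad(w, g, G):
    # accumulate squared gradients; divide the step by their sqrt
    G += g * g
    return w - lr * g / (math.sqrt(G) + eps), G

def rmsprop(w, g, s):
    # like adagrad, but the squared-gradient average decays exponentially
    s = beta * s + (1 - beta) * g * g
    return w - lr * g / (math.sqrt(s) + eps), s

def adam(w, g, v, s, t, beta2=0.999):
    # rmsprop + momentum, with bias correction for the first few steps
    v = beta * v + (1 - beta) * g
    s = beta2 * s + (1 - beta2) * g * g
    step = (v / (1 - beta ** t)) / (math.sqrt(s / (1 - beta2 ** t)) + eps)
    return w - lr * step, v, s

def annealed(t, decay=0.9999):
    # the "_ann" sheets: shrink the learning rate as training progresses
    return lr * decay ** t
```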
data sheet (a=2, b=30):
x   y = a*x + b
36 102
55 140
23 76
91 212
84 198
61 152
95 220
91 212
84 198
6 42
30 90
46 122
36 102
44 118
26 82
97 224
61 152
35 100
64 158
74 178
78 186
64 158
14 58
4 38
23 76
78 186
63 156
46 122
82 194
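A sketch of how the data sheet's recipe could be reproduced in Python; drawing x as integers in 0..100 is an assumption read off the values above:

```python
import random

a_true, b_true = 2, 30                              # the arbitrarily chosen "true" weights
xs = [random.randint(0, 100) for _ in range(29)]    # random x values
data = [(x, a_true * x + b_true) for x in xs]       # y = a*x + b, e.g. x=36 -> y=102
```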
basic SGD sheet:
intercept (initial) = 1    slope (initial) = 1    learn (learning rate) = 0.0001
e = (ax+b-y)^2    de/db = 2(ax+b-y)    de/da = 2x(ax+b-y)

x  y  intercept  slope  y_pred  err^2  errb1 (b+0.01)  est de/db  erra1 (a+0.01)  est de/da  de/db (analytic)
14 58 1 1 15 1,849 1,848.14 -85.99 1,836.98 -1,202.04 -86.00
86 202 1.01 1.12 97.36 10,949 10,946.81 -209.26 10,769.67 -17,923.60 -209.27
28 86 1.03 2.92 82.79 10 10.22 -6.40 8.56 -171.70 -6.41
51 132 1.03 2.94 150.87 356 356.60 37.76 375.73 1,951.14 37.75
28 86 1.03 2.75 77.90 66 65.40 -16.18 61.10 -445.58 -16.19
29 88 1.03 2.79 81.97 36 36.30 -12.06 33.00 -341.60 -12.07
72 174 1.03 2.83 204.50 930 930.68 61.00 974.50 4,443.41 60.99
62 154 1.02 2.39 149.00 25 24.86 -9.98 19.15 -581.09 -9.99
84 198 1.02 2.45 206.72 76 76.18 17.45 91.36 1,535.20 17.44
15 60 1.02 2.30 35.56 597 597.00 -48.88 590.17 -731.06 -48.89
42 114 1.03 2.38 100.80 174 173.91 -26.38 163.26 -1,090.94 -26.39
62 154 1.03 2.49 155.19 1 1.44 2.39 3.28 186.07 2.38
47 124 1.03 2.47 117.20 46 46.11 -13.59 40.07 -617.15 -13.60
35 100 1.03 2.54 89.78 104 104.29 -20.43 97.46 -703.30 -20.44
9 48 1.03 2.61 24.50 552 551.89 -46.99 548.14 -422.23 -47.00
38 106 1.04 2.65 101.72 18 18.25 -8.55 15.22 -310.98 -8.56
44 118 1.04 2.68 119.05 1 1.12 2.11 2.21 111.56 2.10
99 228 1.04 2.67 265.65 1,417 1,417.98 75.30 1,492.75 7,551.94 75.29
13 56 1.03 1.93 26.09 895 894.17 -59.82 887.01 -776.04 -59.83
21 72 1.04 2.01 43.15 833 831.99 -57.70 820.49 -1,207.47 -57.71
28 86 1.04 2.13 60.58 646 645.61 -50.83 631.97 -1,415.62 -50.84
20 70 1.05 2.27 46.42 556 555.45 -47.15 546.53 -939.12 -47.16
8 46 1.05 2.36 19.96 678 677.73 -52.08 674.09 -416.05 -52.09
64 158 1.06 2.40 154.96 9 9.19 -6.07 5.77 -348.36 -6.08
99 228 1.06 2.44 242.98 224 224.63 29.97 254.97 3,063.62 29.96
70 170 1.06 2.15 151.35 348 347.44 -37.29 322.19 -2,561.96 -37.30
27 84 1.06 2.41 66.08 321 320.79 -35.83 311.54 -960.42 -35.84
17 64 1.06 2.50 43.65 414 413.86 -40.70 407.37 -689.13 -40.71
8 46 1.07 2.57 21.66 592 591.96 -48.67 588.56 -388.80 -48.68
rmse 151 301.51
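For reference, one row of this table can be reproduced as below; the 0.01 nudge used for the errb1/erra1 columns is an assumption, but it matches the first row's numbers exactly:

```python
def sgd_row(x, y, a, b, h=0.01, lr=0.0001):
    err2 = (a * x + b - y) ** 2            # err^2
    errb1 = (a * x + (b + h) - y) ** 2     # error with the intercept nudged up by h
    erra1 = ((a + h) * x + b - y) ** 2     # error with the slope nudged up by h
    est_dedb = (errb1 - err2) / h          # finite-difference estimate of de/db
    est_deda = (erra1 - err2) / h          # finite-difference estimate of de/da
    dedb = 2 * (a * x + b - y)             # analytic de/db
    deda = 2 * x * (a * x + b - y)         # analytic de/da
    # the next row's intercept and slope come from the SGD update
    return err2, est_dedb, est_deda, dedb, b - lr * dedb, a - lr * deda

print(sgd_row(14, 58, 1, 1))
# -> (1849, -85.99, -1202.04, -86, 1.0086, 1.1204): matches the first row above,
#    and 1.0086/1.1204 round to the 1.01/1.12 shown on the second row.
```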