Linear vs. non-linear classifiers
• Logistic regression: linear classifier
  – The decision boundary can be expressed as: W^T X = 0
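This boundary is linear because the sigmoid output crosses the 0.5 decision threshold exactly where its argument is zero (a one-line derivation, for reference):

  sigmoid(W^T X) = \frac{1}{1 + e^{-W^T X}} = 0.5  \iff  W^T X = 0

which is the equation of a hyperplane in the input space.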
Neural networks
• Hidden layer
  – One or more
• From left to right: each node in one layer is connected to every node in the next layer
• Left-most layer = Input
• Right-most layer = Output
• Neural Network
  – A graph with nodes and edges
Neural Networks: Training & Prediction
• Training: Backpropagation
  – Used to train the network
  – All the (selected) observations are considered for training
  – Results in optimum values of the weights w_i and v_i
• Prediction: Feed forward
  – Used to predict / test
  – The x_i are fed to the network, which produces the predicted y
Neural Network (Case: output = binary)
Feed forward calculations (Case: output = binary)
• x1 = 0, x2 = 1
• All the 'w' weights = 1
• All the 'v' weights = 1
• Biases: b = 0; c = 0
• Challenge: how to choose the weights and biases so that the neural network predicts accurately (see the worked check below)
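A quick check of these numbers in NumPy; since the slide's network diagram is not reproduced here, two hidden nodes are assumed:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Values from the slide: x1 = 0, x2 = 1, all weights = 1, b = c = 0.
# The hidden layer size comes from the slide's figure; M = 2 is an assumption.
x = np.array([0.0, 1.0])
W = np.ones((2, 2))            # 'w' weights, one column per hidden node
v = np.ones(2)                 # 'v' weights
b, c = 0.0, 0.0

z = sigmoid(x @ W + b)         # each hidden node: sigmoid(0*1 + 1*1) = 0.731
y = sigmoid(z @ v + c)         # output: sigmoid(0.731 + 0.731) = 0.812
print(z, y)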
Calculations using matrix notation
• For one hidden layer and binary classification:
  Z = sigmoid(XW + b)
  Y = sigmoid(Zv + c)
• X => N by D matrix
– N = sample size (number of points)
– D = number of input features
• Z => N by M matrix
– M = Number of nodes in the hidden layer
• p(Y|X) => N by K matrix
– K = Number of distinct output classifications = 1 for binary
• W => D by M;
• b => M by 1; c => scalar
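A minimal NumPy sketch of these two equations with the shapes above (the N, D, M values are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N, D, M = 100, 3, 5                 # sample size, input features, hidden nodes
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))         # N by D input matrix
W = rng.normal(size=(D, M))         # D by M first-layer weights
b = np.zeros(M)                     # hidden-layer bias (length M)
v = rng.normal(size=(M, 1))         # M by 1 second-layer weights
c = 0.0                             # scalar output bias

Z = sigmoid(X @ W + b)              # N by M hidden activations
Y = sigmoid(Z @ v + c)              # N by 1 predicted probabilities p(Y|X)
print(Z.shape, Y.shape)             # (100, 5) (100, 1)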
Binary vs. K-class classification
• Binary classification example
– Inputs: age, exercise frequency, smoking status
– Output: Will / Will not suffer from disease
• Multi-class classification
– Faces
– Vehicle models
– Weather: sunny, cloudy, heavy rain, moderate rain, etc.
– Number recognition
– Character recognition
• Sigmoid function
  – A single sigmoid output supports only binary classification
The multi-class situation
Softmax function
• For binary classification
  – y = sigmoid(z)
  – Only one probability (p) needs to be calculated
  – Since the other is automatically (1 - p)
• For multi-class classification, this idea is extended using the softmax function
  – P(Y = k|X) = exp(a_k) / Denom
  – Where Denom = exp(a_1) + exp(a_2) + ... + exp(a_K)
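A minimal NumPy version of this formula; subtracting the maximum before exponentiating is a standard numerical-stability trick, not part of the slide:

import numpy as np

def softmax(a):
    # exp(a_k - max) / sum is mathematically identical to exp(a_k) / Denom
    # but avoids overflow for large a_k.
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([1.0, 2.0, 3.0])
p = softmax(a)
print(p, p.sum())   # [0.090 0.245 0.665] 1.0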
The multi-class situation
Backpropagation
• Backpropagation goal:
  – Obtain the weights w and v
  – Based on minimizing an error function (as in the case of Logistic Regression)
  – The error gets "propagated" backwards from the right (output) side
  – Weights in the intermediate layers are adjusted based on this back-propagated error
Likelihood Recap: Logistic Regression
• Recap: Logistic Regression
– The SIGMOID function is used to generate the output
  – Outcomes have only two possible states: 0 or 1
  – The likelihood function is expressed as follows:
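For reference, the standard Bernoulli likelihood for this setting is:

  L(w) = \prod_{i=1}^{N} p_i^{y_i} (1 - p_i)^{1 - y_i},  where p_i = sigmoid(w^T x_i)

Maximizing L(w) (equivalently, minimizing the negative log-likelihood, i.e., the cross-entropy error) gives the optimum weights.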
Maximum Likelihood: Generalized Case
Likelihood: Generalized Case
• In the generalized case of Neural Networks
  – The SOFTMAX function is used to generate the output
  – Outcomes may have multiple states
  – The likelihood function is expressed as follows:
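For reference, with one-hot targets y_{nk} (1 if observation n belongs to class k, else 0) and softmax outputs p_{nk}, the standard categorical likelihood is:

  L = \prod_{n=1}^{N} \prod_{k=1}^{K} p_{nk}^{y_{nk}}

which reduces to the Bernoulli likelihood above when K = 2.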
Expressions for gradients: Neural Network
• In this case:
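For the softmax output layer trained with the negative log-likelihood above, the standard result is:

  \partial J / \partial a_k = p_k - y_k

i.e., the output-layer gradient is simply prediction minus target; the gradients with respect to the weights v and w then follow by the chain rule.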
Derivative of the SOFTMAX function
https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
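The result derived at that page: each softmax output depends on every input a_j, and the Jacobian is

  \partial p_i / \partial a_j = p_i (\delta_{ij} - p_j)

where \delta_{ij} is the Kronecker delta (1 if i = j, else 0).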
Expressions for gradients: Deep Networks
• The following expressions are for a single observation; the same can be extended to N observations
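In generic notation (an illustrative form; the slides may use different symbols), the per-observation gradients chain backwards through the layers:

  \delta^{(l)} = (W^{(l+1)T} \delta^{(l+1)}) \odot \sigma'(z^{(l)}),    \partial J / \partial W^{(l)} = \delta^{(l)} (a^{(l-1)})^T

where z^{(l)} are the pre-activations of layer l, a^{(l-1)} the previous layer's outputs, and \odot the elementwise product; the recursion starts from the output-layer gradient p - y above.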
Procedure: Calculating weights of Neural Networks
1. Randomly initialize the weight vectors w
2. Multiply the input with the weight vectors to reach the final output (feed forward)
3. Calculate the error: compare the result of the forward step to the expected output
4. Calculate the gradient of the error and change the weights in the direction opposite to the gradient
   – This is the Gradient Descent method
5. Back-propagate this calculation and change the weights right up to the first hidden layer
6. Repeat steps 2-5 until the stopping criterion is reached (see the sketch below)
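A compact NumPy sketch of steps 1-6 for the one-hidden-layer binary network used earlier (the data, learning rate, and iteration count are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: two features, binary target (assumed, not from the slides)
rng = np.random.default_rng(0)
N, D, M = 200, 2, 5
X = rng.normal(size=(N, D))
t = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(N, 1)

# Step 1: randomly initialize the weights
W, b = rng.normal(size=(D, M)), np.zeros(M)
v, c = rng.normal(size=(M, 1)), 0.0
lr = 0.5                                 # learning rate (assumed)

for step in range(2000):                 # step 6: repeat until stopping criterion
    Z = sigmoid(X @ W + b)               # step 2: feed forward
    Y = sigmoid(Z @ v + c)
    dY = (Y - t) / N                     # steps 3-4: error gradient at the output
    dv, dc = Z.T @ dY, dY.sum()          # gradients for the output layer
    dZ = (dY @ v.T) * Z * (1 - Z)        # step 5: back-propagate to the hidden layer
    dW, db = X.T @ dZ, dZ.sum(axis=0)
    v -= lr * dv; c -= lr * dc           # move AGAINST the gradient (descent)
    W -= lr * dW; b -= lr * db

print("training accuracy:", ((Y > 0.5) == t).mean())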
How to interpret the weights and outputs
• The value(s) of Y resulting from these weights is a probability
  – If the probability is greater than 50% => YES
  – If the probability is less than 50% => NO
• In the case of neural networks, beyond the first layer the weights cannot be said to have any individual meaning!
  – The stacking of non-linear layers that causes this is also what makes neural networks non-linear and powerful
Example: Neural Network – Training Data (figure)
Example: Neural Network (figure)
Example: Neural Network: Misclassified Obs. (figure)
ANN: ROC Plots Example (figure)
ROC Plots: ANN Based Solutions (figures; hidden layer nodes 2 to 151)
Various ANN Examples
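Plots like the ones that follow can be recreated with scikit-learn; this sketch (the dataset, grid, and styling are assumptions, not the originals) fits MLPClassifier with varying hidden layer sizes and draws the 50% probability contour:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Two-feature toy data (an assumption; not the slides' dataset)
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# 0 hidden nodes (a purely linear boundary) corresponds to plain logistic
# regression; MLPClassifier needs at least 1 node, so start from 1.
for m in [1, 2, 5, 151]:
    clf = MLPClassifier(hidden_layer_sizes=(m,), activation='logistic',
                        max_iter=5000, random_state=0).fit(X, y)
    xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
    p = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)
    plt.contour(xx, yy, p, levels=[0.5])   # the 50% probability boundary
    plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
    plt.title(f"Boundary with {m} nodes in hidden layer")
    plt.show()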
(Figures: decision boundary plots for 0, 1, 2, 5, 7, 11, 21, 31, 51, 71, 91, and 151 nodes in the hidden layer, one per slide)
ANN: Example 2
(Figures: fits for hidden layer nodes = 1, 2, 3, and 9)
ANN: Example 3
(Figures: fits for hidden layer nodes = 1, 2, 9, and 31)
ANN: Example 4
(Figures: fits for hidden layer nodes = 1, 2, 3, 9, and 31)
ANN: Example 5
(Figures: fits for hidden layer nodes = 1, 2, and 3)