Linear vs. non-linear classifiers
• Logistic regression: linear classifier
  – The decision boundary can be expressed as: W^T X = 0
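This boundary is linear because the sigmoid output crosses the 0.5 decision threshold exactly where its argument is zero (a one-line derivation, for reference):

  sigmoid(W^T X) = \frac{1}{1 + e^{-W^T X}} = 0.5  \iff  W^T X = 0

which is the equation of a hyperplane in the input space.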
Neural networks
• Hidden layer
  – One or more
• From left to right: each node in one layer is connected to every node in the next layer
• Left-most layer = Input
• Right-most layer = Output
• Neural Network
  – A graph with nodes and edges
Neural Networks: Training & Prediction
• Training: Backpropagation
  – Used to train the network
  – All the (selected) observations are considered for training
  – Results in optimum values of the weights w_i and v_i
• Prediction: Feed forward
  – Used to predict / test
  – The x_i are fed to the network, which produces the predicted y
Neural Network (Case: output = binary)
Feed forward calculations (Case: output = binary)
• x1 = 0, x2 = 1
• All the 'w' weights = 1
• All the 'v' weights = 1
• Biases: b = 0; c = 0
• Challenge: how to choose the weights and biases so that the neural network predicts accurately (see the worked check below)
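A quick check of these numbers in NumPy; since the slide's network diagram is not reproduced here, two hidden nodes are assumed:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Values from the slide: x1 = 0, x2 = 1, all weights = 1, b = c = 0.
# The hidden layer size comes from the slide's figure; M = 2 is an assumption.
x = np.array([0.0, 1.0])
W = np.ones((2, 2))            # 'w' weights, one column per hidden node
v = np.ones(2)                 # 'v' weights
b, c = 0.0, 0.0

z = sigmoid(x @ W + b)         # each hidden node: sigmoid(0*1 + 1*1) = 0.731
y = sigmoid(z @ v + c)         # output: sigmoid(0.731 + 0.731) = 0.812
print(z, y)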
Calculations using matrix notation
• For one hidden layer and binary classification:
  Z = sigmoid(XW + b)
  Y = sigmoid(Zv + c)
• X => N by D matrix
– N = sample size (number of points)
– D = number of input features
• Z => N by M matrix
– M = Number of nodes in the hidden layer
• p(Y|X) => N by K matrix
– K = Number of distinct output classifications = 1 for binary
• W => D by M;
• b => M by 1; c => scalar
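A minimal NumPy sketch of these two equations with the shapes above (the N, D, M values are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N, D, M = 100, 3, 5                 # sample size, input features, hidden nodes
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))         # N by D input matrix
W = rng.normal(size=(D, M))         # D by M first-layer weights
b = np.zeros(M)                     # hidden-layer bias (length M)
v = rng.normal(size=(M, 1))         # M by 1 second-layer weights
c = 0.0                             # scalar output bias

Z = sigmoid(X @ W + b)              # N by M hidden activations
Y = sigmoid(Z @ v + c)              # N by 1 predicted probabilities p(Y|X)
print(Z.shape, Y.shape)             # (100, 5) (100, 1)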
Binary vs. K-class classification
• Binary classification example
– Inputs: age, exercise frequency, smoking status
– Output: Will / Will not suffer from disease
• Multi-class classification
– Faces
– Vehicle models
– Weather: sunny, cloudy, heavy rain, moderate rain, etc.
– Number recognition
– Character recognition
• Sigmoid function
  – A single sigmoid output supports only binary classification
The multi-class situation
Softmax function
• For binary classification
  – y = sigmoid(z)
  – Only one probability (p) needs to be calculated
  – Since the other is automatically (1 - p)
• For multi-class classification, this idea is extended using the softmax function
  – P(Y = k|X) = exp(a_k) / Denom
  – Where Denom = exp(a_1) + exp(a_2) + ... + exp(a_K)
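A minimal NumPy version of this formula; subtracting the maximum before exponentiating is a standard numerical-stability trick, not part of the slide:

import numpy as np

def softmax(a):
    # exp(a_k - max) / sum is mathematically identical to exp(a_k) / Denom
    # but avoids overflow for large a_k.
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([1.0, 2.0, 3.0])
p = softmax(a)
print(p, p.sum())   # [0.090 0.245 0.665] 1.0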
The multi-class situation
Backpropagation
• Backpropagation goal:
  – Obtain the weights w and v
  – Based on minimizing an error function (as in the case of Logistic Regression)
  – The error gets "propagated" backwards from the right (output) side
  – Weights in the intermediate layers are adjusted based on this back-propagated error
Likelihood Recap: Logistic Regression
• Recap: Logistic Regression
– The SIGMOID function is used to generate the output
  – Outcomes have only two possible states: 0 or 1
  – The likelihood function is expressed as follows:
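For reference, the standard Bernoulli likelihood for this setting is:

  L(w) = \prod_{i=1}^{N} p_i^{y_i} (1 - p_i)^{1 - y_i},  where p_i = sigmoid(w^T x_i)

Maximizing L(w) (equivalently, minimizing the negative log-likelihood, i.e., the cross-entropy error) gives the optimum weights.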
Maximum Likelihood: Generalized Case
Likelihood: Generalized Case
• In the generalized case of Neural Networks
  – The SOFTMAX function is used to generate the output
  – Outcomes may have multiple states
  – The likelihood function is expressed as follows:
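For reference, with one-hot targets y_{nk} (1 if observation n belongs to class k, else 0) and softmax outputs p_{nk}, the standard categorical likelihood is:

  L = \prod_{n=1}^{N} \prod_{k=1}^{K} p_{nk}^{y_{nk}}

which reduces to the Bernoulli likelihood above when K = 2.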
Expressions for gradients: Neural Network
• In this case:
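For the softmax output layer trained with the negative log-likelihood above, the standard result is:

  \partial J / \partial a_k = p_k - y_k

i.e., the output-layer gradient is simply prediction minus target; the gradients with respect to the weights v and w then follow by the chain rule.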
Derivative of the SOFTMAX function
https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
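The result derived at that page: each softmax output depends on every input a_j, and the Jacobian is

  \partial p_i / \partial a_j = p_i (\delta_{ij} - p_j)

where \delta_{ij} is the Kronecker delta (1 if i = j, else 0).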
Expressions for gradients: Deep Networks
• The following expressions are for a single observation; the same can be extended to N observations
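In generic notation (an illustrative form; the slides may use different symbols), the per-observation gradients chain backwards through the layers:

  \delta^{(l)} = (W^{(l+1)T} \delta^{(l+1)}) \odot \sigma'(z^{(l)}),    \partial J / \partial W^{(l)} = \delta^{(l)} (a^{(l-1)})^T

where z^{(l)} are the pre-activations of layer l, a^{(l-1)} the previous layer's outputs, and \odot the elementwise product; the recursion starts from the output-layer gradient p - y above.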
Procedure: Calculating weights of Neural Networks
1. Randomly initialize the weight vectors w
2. Multiply the input with the weight vectors to reach the final output (feed forward)
3. Calculate the error: compare the result of the forward step to the expected output
4. Calculate the gradient of the error and change the weights in the direction opposite to the gradient
   – This is the Gradient Descent method
5. Back-propagate this calculation and change the weights right up to the first hidden layer
6. Repeat steps 2-5 until the stopping criterion is reached (see the sketch below)
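A compact NumPy sketch of steps 1-6 for the one-hidden-layer binary network used earlier (the data, learning rate, and iteration count are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: two features, binary target (assumed, not from the slides)
rng = np.random.default_rng(0)
N, D, M = 200, 2, 5
X = rng.normal(size=(N, D))
t = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(N, 1)

# Step 1: randomly initialize the weights
W, b = rng.normal(size=(D, M)), np.zeros(M)
v, c = rng.normal(size=(M, 1)), 0.0
lr = 0.5                                 # learning rate (assumed)

for step in range(2000):                 # step 6: repeat until stopping criterion
    Z = sigmoid(X @ W + b)               # step 2: feed forward
    Y = sigmoid(Z @ v + c)
    dY = (Y - t) / N                     # steps 3-4: error gradient at the output
    dv, dc = Z.T @ dY, dY.sum()          # gradients for the output layer
    dZ = (dY @ v.T) * Z * (1 - Z)        # step 5: back-propagate to the hidden layer
    dW, db = X.T @ dZ, dZ.sum(axis=0)
    v -= lr * dv; c -= lr * dc           # move AGAINST the gradient (descent)
    W -= lr * dW; b -= lr * db

print("training accuracy:", ((Y > 0.5) == t).mean())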
How to interpret the weights and outputs
• The value(s) of Y resulting from these weights is a probability
  – If the probability is greater than 50% => YES
  – If the probability is less than 50% => NO
• In the case of neural networks, beyond the first layer the weights cannot be said to have any individual meaning!
  – The stacking of non-linear layers that causes this is also what makes neural networks non-linear and powerful
Example: Neural Network – Training Data (figure)
Example: Neural Network (figure)
Example: Neural Network: Misclassified Obs. (figure)
ANN: ROC Plots Example (figure)
ROC Plots: ANN Based Solutions (figures; hidden layer nodes 2 to 151)
Various ANN Examples
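Plots like the ones that follow can be recreated with scikit-learn; this sketch (the dataset, grid, and styling are assumptions, not the originals) fits MLPClassifier with varying hidden layer sizes and draws the 50% probability contour:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Two-feature toy data (an assumption; not the slides' dataset)
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# 0 hidden nodes (a purely linear boundary) corresponds to plain logistic
# regression; MLPClassifier needs at least 1 node, so start from 1.
for m in [1, 2, 5, 151]:
    clf = MLPClassifier(hidden_layer_sizes=(m,), activation='logistic',
                        max_iter=5000, random_state=0).fit(X, y)
    xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
    p = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)
    plt.contour(xx, yy, p, levels=[0.5])   # the 50% probability boundary
    plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
    plt.title(f"Boundary with {m} nodes in hidden layer")
    plt.show()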
(Figures: decision boundary plots for 0, 1, 2, 5, 7, 11, 21, 31, 51, 71, 91, and 151 nodes in the hidden layer, one per slide)
ANN: Example 2
(Figures: fits for hidden layer nodes = 1, 2, 3, and 9)
ANN: Example 3
(Figures: fits for hidden layer nodes = 1, 2, 9, and 31)
ANN: Example 4
(Figures: fits for hidden layer nodes = 1, 2, 3, 9, and 31)
ANN: Example 5
(Figures: fits for hidden layer nodes = 1, 2, and 3)