
Artificial Neural Networks

Recap: Logistic Unit


• If the combination of inputs
– {x1, x2, x3, …, xn}
• Results in a response that is a “categorical variable”
– With two possible states: 0 and 1
• Then we have a unit that is known as the
– Logistic Unit
• And we need a function that will
– Trigger 0 or 1 as an output, based on the inputs
– Such a function is known as an Activation Function (a minimal sketch follows this slide)

2
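A minimal sketch of such a logistic unit in Python, using the sigmoid as the activation function (the input values, weights, and bias below are made-up illustrative numbers, not from the slides):

import numpy as np

def sigmoid(a):
    # Squashes any real-valued input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-a))

def logistic_unit(x, w, b, threshold=0.5):
    # Weighted sum of the inputs, passed through the sigmoid activation,
    # then thresholded to trigger 0 or 1 as the output.
    p = sigmoid(np.dot(w, x) + b)
    return 1 if p > threshold else 0

x = np.array([0.5, 2.0])            # inputs x1, x2
w = np.array([1.0, -1.5])           # weights
print(logistic_unit(x, w, b=0.2))   # -> 0 for these particular values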
Linear vs. non-linear classifiers
• Logistic regression: linear classifier
– The decision boundary can be expressed as: W^T X
• Neural networks: non-linear classifier
– The boundary cannot be expressed as a single W^T X
– But it can be achieved by a combination of many logistic units, each of which can be expressed as W^T X (see the XOR sketch after this slide)
3
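As an illustration of this claim, here is a small Python sketch (not from the slides) in which no single linear boundary W^T X can separate the XOR pattern, but a combination of two logistic units feeding one output unit can. The weight values are hand-picked purely for illustration:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# XOR inputs and targets: not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])

# Hand-picked weights: hidden unit 1 behaves like OR, hidden unit 2 like AND;
# the output unit computes "OR and not AND", which is exactly XOR.
W = np.array([[20.0, 20.0],
              [20.0, 20.0]])     # D by M (2 inputs, 2 hidden units)
b = np.array([-10.0, -30.0])     # hidden biases
v = np.array([20.0, -20.0])      # hidden-to-output weights
c = -10.0                        # output bias

Z = sigmoid(X @ W + b)           # hidden-layer activations
y = sigmoid(Z @ v + c)           # output probabilities
print(np.round(y, 3))            # approximately [0, 1, 1, 0] -> matches XOR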
Neural networks
• Neural network:
– multiple layers of logistic regression units

4
Neural networks
• Hidden layer
– One or more
• From left to right:
– a node in one layer is connected to every node in the next layer
• Left-most layer = Input
• Right-most layer = Output

• Neural Network
– A graph with nodes and edges
5
Neural Networks: Training & Prediction
• Training: Backpropagation
– Used to train the network
– All the (selected) observations are considered for training
– Results in the calculation of optimum values of the weights w_i and v_i
• Prediction: Feed forward
– Used to predict / test
– The x_i are fed to the network, which results in y getting predicted

6
Neural Network (Case: output = binary)

7
Feed forward calculations (Case: output = binary)
• x1 = 0, x2 = 1
• All the ‘w’ weights = 1
• All the ‘v’ weights = 1
• Biases: b = 0; c = 0

• z(1) = sigmoid(0*1 + 1*1) = 0.731
• z(2) = sigmoid(0*1 + 1*1) = 0.731
• p(y|x) = sigmoid(0.731*1 + 0.731*1) = 0.812
(a quick code check of these numbers follows this slide)

• Challenge:
– How to choose the weights and biases so that the neural network predicts accurately

8
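A quick numerical check of the feed-forward calculation above (a minimal sketch; the variable names are mine, not from the slides):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([0.0, 1.0])     # x1 = 0, x2 = 1
w = np.ones((2, 2))          # all 'w' weights = 1
v = np.ones(2)               # all 'v' weights = 1
b, c = 0.0, 0.0              # biases

z = sigmoid(x @ w + b)       # hidden-layer activations z(1), z(2)
p = sigmoid(z @ v + c)       # output probability p(y|x)

print(np.round(z, 3))        # -> [0.731 0.731]
print(round(float(p), 3))    # -> 0.812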
Calculations using matrix notations

For one hidden layer and binary classification:
Z = sigmoid(XW + b)
Y = sigmoid(Zv + c)
• X => N by D matrix
– N = sample size (number of points)
– D = number of input features
• Z => N by M matrix
– M = Number of nodes in the hidden layer
• p(Y|X) => N by K matrix
– K = Number of distinct output classifications = 1 for binary
• W => D by M;
• b => M by 1; c => scalar

9
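A vectorized sketch of these calculations with the stated shapes (the sizes N, D, M and the random weights below are arbitrary illustrative values, since training has not yet been covered):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

N, D, M = 5, 3, 4                # sample size, input features, hidden nodes
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))      # N by D input matrix
W = rng.normal(size=(D, M))      # D by M weight matrix
b = np.zeros(M)                  # hidden-layer biases
v = rng.normal(size=M)           # hidden-to-output weights
c = 0.0                          # scalar output bias

Z = sigmoid(X @ W + b)           # N by M hidden activations
Y = sigmoid(Z @ v + c)           # N output probabilities (binary case, K = 1)

print(Z.shape, Y.shape)          # -> (5, 4) (5,)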
Binary vs. K-class classification
• Binary classification example
– Inputs: age, exercise frequency, smoking status
– Output: Will / Will not suffer from disease
• Multi-class classification
– Faces
– Vehicle models
– Weather: sunny, cloudy, heavy rain, moderate rain, etc.
– Number recognition
– Character recognition
• Sigmoid function
– Can only support binary classification
10
The multi-class situation

11
Softmax function
• For binary classification
– y = sigmoid(z)
– Only one probability (p) is required to be calculated
– Since the other automatically becomes (1-p)
• For multi-class classification, this idea can be extended using the softmax function
– P(Y = k|X) = exp(a_k) / Denom
– Where Denom = exp(a_1) + exp(a_2) + exp(a_3) + …

12
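A minimal sketch of the softmax function described above (subtracting the maximum is a standard numerical-stability step, not something the slide mentions; the activation values are made up):

import numpy as np

def softmax(a):
    # Subtracting the max does not change the result but avoids overflow.
    e = np.exp(a - np.max(a))
    return e / e.sum()           # exp(a_k) / (exp(a_1) + exp(a_2) + ...)

a = np.array([2.0, 1.0, 0.1])    # illustrative activations for 3 classes
p = softmax(a)
print(np.round(p, 3))            # -> [0.659 0.242 0.099], which sums to 1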
The multi-class situation

13
The multi-class situation

14
Backpropagation
• Backpropagation goal:
– Obtain the weights w and v
– Based on minimizing an error function (as in the case of Logistic Regression)
– Error gets “propagated” backwards from the right
– Weights in the intermediate layers get adjusted based on this back-propagated error

15
Likelihood Recap: Logistic Regression
• Recap: Logistic Regression
– SIGMOID function is used to generate the output
– Outcomes have only two possible states => 0,1
– The Likelihood function is expressed as follows

– Error function: Negative of the Log Likelihood


– Gradient of the error function w.r.t. w is:

16
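The three expressions on the slide above were figures that did not survive extraction. Their standard forms, assuming targets t_n ∈ {0, 1} and predictions ŷ_n = sigmoid(w^T x_n), are:

L(w) = \prod_{n=1}^{N} \hat{y}_n^{\,t_n} (1 - \hat{y}_n)^{1 - t_n}

J(w) = -\log L(w) = -\sum_{n=1}^{N} \left[ t_n \log \hat{y}_n + (1 - t_n) \log (1 - \hat{y}_n) \right]

\frac{\partial J}{\partial w} = \sum_{n=1}^{N} (\hat{y}_n - t_n)\, x_n = X^{T} (\hat{y} - t)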
Maximum Likelihood: Generalized Case

17
Maximum Likelihood: Generalized Case

18
Likelihood : Generalized Case
• In the generalized case of Neural Networks
– SOFTMAX function is used to generate the output
– Outcomes may have multiple states
– The Likelihood function is expressed as follows:

19
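The likelihood expression on the slide above was also a figure. A standard form, assuming one-hot targets t_{nk} and softmax outputs ŷ_{nk} = P(Y = k | x_n), is:

L = \prod_{n=1}^{N} \prod_{k=1}^{K} \hat{y}_{nk}^{\,t_{nk}},
\qquad
J = -\log L = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \log \hat{y}_{nk}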
Expressions for gradients: Neural Network
• In this case:

• Minimizing J => Maximizing the Likelihood

• We differentiate J w.r.t. the weights w and v
– This will help us calculate w and v using Gradient Descent and Backpropagation
• It can be shown that (see next slide for ref.):

20
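The result referred to above was a figure. For one hidden layer with sigmoid activations Z and softmax outputs Ŷ (notation as on slide 9, with V an M by K matrix in the multi-class case), the gradients can be reconstructed in their standard form as:

\frac{\partial J}{\partial V} = Z^{T} (\hat{Y} - T)
\qquad
\frac{\partial J}{\partial W} = X^{T} \left[ (\hat{Y} - T) V^{T} \odot Z \odot (1 - Z) \right]

with the bias gradients obtained by summing the corresponding error terms over the observations.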
Derivative of the SOFTMAX function

https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/

21
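The result derived at the linked reference, with p_i = softmax(a)_i and δ_{ij} the Kronecker delta, is:

\frac{\partial p_i}{\partial a_j} = p_i \, (\delta_{ij} - p_j)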
Expressions for gradients: Deep Networks
• The following expressions are for a single observation; the same can be extended to N observations

22
Procedure: Calculating weights of Neural Networks
1. Randomly initialize the weight vectors w
2. Multiply the input with the weight vectors to reach the final output (feed forward)
3. Calculate the error: compare the result of the forward step to the expected output
4. Calculate the gradient of the error and change the weights in the direction opposite to the gradient
– This is the Gradient Descent method
5. Back-propagate this calculation and change the weights right up to the first hidden layer
6. Repeat steps 2-5 till the stopping criterion is reached (a training-loop sketch follows this slide)

23
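A compact sketch of this procedure for a single hidden layer and a binary output (a toy illustration, not the code behind the slides; the data, learning rate, and iteration count are made up):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy data: 100 observations, 2 features, binary targets with a non-linear pattern.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
t = (X[:, 0] * X[:, 1] > 0).astype(float)

M, lr = 4, 0.5                                # hidden nodes, learning rate

# Step 1: randomly initialize the weights.
W = rng.normal(scale=0.5, size=(2, M)); b = np.zeros(M)
v = rng.normal(scale=0.5, size=M);      c = 0.0

for step in range(2000):
    # Step 2: feed forward.
    Z = sigmoid(X @ W + b)
    y = sigmoid(Z @ v + c)

    # Step 3: the error signal at the output layer (prediction minus target).
    delta = y - t

    # Steps 4-5: back-propagate to get the gradients, then take a
    # gradient-descent step (move opposite to the gradient).
    grad_v = Z.T @ delta
    grad_c = delta.sum()
    delta_hidden = np.outer(delta, v) * Z * (1 - Z)
    grad_W = X.T @ delta_hidden
    grad_b = delta_hidden.sum(axis=0)

    v -= lr / len(X) * grad_v;  c -= lr / len(X) * grad_c
    W -= lr / len(X) * grad_W;  b -= lr / len(X) * grad_b

# Step 6: here the stopping criterion is simply a fixed number of iterations.
print("training accuracy:", ((y > 0.5) == t).mean())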
How to interpret the weights and outputs
• The value(s) of Y resulting from these weights is a probability
– If the probability is greater than 50% => YES
– If the probability is less than 50% => NO
• In the case of neural networks, beyond the first layer the weights cannot be said to have any meaning!
– This is what makes neural networks non-linear and powerful

24
Example : Neural Network – Training Data

25
Example: Neural Network

26
Example: Neural Network: Misclassified Obs.

27
ANN: ROC Plots Example

28
ROC Plots: ANN Based Solutions

Hidden layer nodes: 2 to 151

29
ROC Plots: ANN Based Solutions

Hidden layer nodes: 2 to 151

30
Various ANN Examples

31
Boundary with 0 nodes in hidden layer

32
Boundary with 1 node in hidden layer

33
Boundary with 2 nodes in hidden layer

34
Boundary with 5 nodes in hidden layer

35
Boundary with 7 nodes in hidden layer

36
Boundary with 11 nodes in hidden layer

37
Boundary with 21 nodes in hidden layer

38
Boundary with 31 nodes in hidden layer

39
Boundary with 51 nodes in hidden layer

40
Boundary with 71 nodes in hidden layer

41
Boundary with 91 nodes in hidden layer

42
Boundary with 151 nodes in hidden layer

43
ANN: Example 2

44
ANN: Hidden layer nodes = 1

45
ANN: Hidden layer nodes = 1

46
ANN: Hidden layer nodes = 2

47
ANN: Hidden layer nodes = 2

48
ANN: Hidden layer nodes = 3

49
ANN: Hidden layer nodes = 3

50
ANN: Hidden layer nodes = 9

51
ANN: Hidden layer nodes = 9

52
ANN: Hidden layer nodes = 1

53
ANN: Hidden layer nodes = 2

54
ANN: Hidden layer nodes = 3

55
ANN: Hidden layer nodes = 9

56
ANN: Example 3

57
ANN: Hidden layer nodes = 1

58
ANN: Hidden layer nodes = 1

59
ANN: Hidden layer nodes = 2

60
ANN: Hidden layer nodes = 2

61
ANN: Hidden layer nodes = 9

62
ANN: Hidden layer nodes = 31

63
ANN: Hidden layer nodes = 31

64
ANN: Hidden layer nodes = 2

65
ANN: Hidden layer nodes = 9

66
ANN: Hidden layer nodes = 31

67
ANN : Example 4

68
ANN: Hidden layer nodes = 1

69
ANN: Hidden layer nodes = 2

70
ANN: Hidden layer nodes = 3

71
ANN: Hidden layer nodes = 9

72
ANN: Hidden layer nodes = 31

73
ANN: Example 5

74
ANN: Hidden layer nodes = 1

75
ANN: Hidden layer nodes = 2

76
ANN: Hidden layer nodes = 3

77
