
Introduction to Neural Networks

Who has more processing power: a supercomputer or the brain of a fly? Which is more intelligent? And how can we add intelligence to computers?

There are two broad approaches:

- Mimic natural biological evolution and/or the social behavior of species (Evolutionary Algorithms, Genetic Algorithms)
- Simulate the human brain in its structure and way of processing information: Artificial Neural Networks (ANN)

An ANN is a topology (architecture) that has the ability to learn (be trained); once trained, it can provide outputs or predictions.

Introduction
Artificial Neural Networks (ANN), also named Connectionist Models or Parallel Distributed Processing (PDP) Models, consist of a pool of simple processing units called neurons, nodes, or cells, which communicate over a large number of weighted connections by sending signals to each other.

Introduction
ANNs behave like a human brain: they demonstrate the ability to learn, recall, and generalize from training patterns or data. The processing element in an ANN is the neuron. A human brain consists of about 10 billion neurons, and each biological neuron is connected to several thousands of other neurons, similar to the connectivity in an ANN.

Introduction

Biological and Artificial Neurons

Dendrites receive activation from other neurons. The neuron's cell body (soma) processes the incoming activations and converts them into output activations. Axons act as transmission lines that send activation to other neurons.

Introduction

Biological and Artificial Neurons

[Figure: artificial neuron with inputs, weights, a processing unit, and outputs]

- A set of connections brings in activations from other neurons.
- A processing unit sums the inputs and then applies an activation function.
- An output line transmits the result to other neurons.

Introduction

General Structure of ANN
[Figure: feed-forward network with an input layer (x1, x2, ..., xn), two hidden layers, and an output layer (y1, y2); connections carry weights such as w11 and w21]

Three aspects are important for classifying an ANN:

- Topology (Architecture)
- Activation (transfer) Function
- Learning Paradigm

Introduction

Classification of ANN: Architecture

The topology or architecture defines how information flows from input to output and the number of neurons.


Single-Layer Feed-forward (the Perceptron, introduced in 1957 by Frank Rosenblatt)

[Figure: single neuron with inputs, weights, a processing unit, and an output]


One output node, many input nodes, and no hidden layers.

[Figure: inputs x1, x2, ..., xn connected directly through weights (w11, w21, ...) to the output layer]


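To make this concrete, here is a minimal sketch of a perceptron forward pass (Python; the weights, inputs, and zero threshold are illustrative assumptions, not values from the slides):

```python
def perceptron_output(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs followed by a step (threshold) activation."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Example: three inputs with arbitrary weights.
print(perceptron_output([1, 0, 1], [0.5, -0.4, 0.3]))  # weighted sum = 0.8 > 0, so output 1
```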

Multi-Layer Feed-forward


[Figure: multi-layer feed-forward network with an input layer (x1, x2, ..., xn), two hidden layers, and an output layer (y1, y2)]


Recurrent Network: unlike feed-forward networks, a recurrent network also contains feedback connections, so signals can flow in loops.


Introduction

Classification of ANN: Activation Function

[Figure: neuron computing an output O from inputs x1, ..., xn with weights w1, ..., wn]

Introduction

Classification of ANN: Activation Function

Common choices of activation function:

- Continuous, e.g., Y = tanh(x)
- Segments (piecewise linear), e.g., Y = 0 if x < -0.5; Y = x + 0.5 if -0.5 <= x <= 0.5; Y = 1 if x > 0.5
- Threshold (step), e.g., Y = 1 if x > 0, otherwise Y = 0
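A short sketch of these three activation types (Python; the segment breakpoints at +/-0.5 are inferred from the slide's Y = x + 0.5 segment, and the threshold at 0 is an assumption):

```python
import math

def threshold(x):
    """Step function: 1 above the threshold, else 0."""
    return 1.0 if x > 0 else 0.0

def segments(x):
    """Piecewise linear: 0 below -0.5, x + 0.5 in between, 1 above 0.5."""
    if x < -0.5:
        return 0.0
    if x > 0.5:
        return 1.0
    return x + 0.5

def continuous(x):
    """Smooth activation, e.g., the hyperbolic tangent."""
    return math.tanh(x)
```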

Introduction

Classification of ANN: Learning Algorithm

Learning what? The goal is to predict the output given the inputs.


Learning

What is learned? The connection weights, which are adjusted so that the network provides accurate outputs.

[Figure: feed-forward network with inputs x1, ..., xn, two hidden layers, and outputs y1, y2; learning adjusts the weights w11, w21, ...]

The building blocks of an ANN: nodes, layers, weights, and activation functions.



Introduction

Classification of ANN: Learning Algorithm

Learning algorithm (to define the weights):

Supervised learning (learning by examples or real cases): Both inputs and desired outputs are provided. The network processes the inputs and produces an output (the NN output). The ANN compares its resulting outputs against the desired outputs, and errors are calculated to adjust the weights.

Unsupervised learning (clustering and classification): The network is provided with inputs but not with outputs. The system itself must decide what features it will use to group the input data. This is often referred to as self-organization.

Supervised Learning

Example: an Automated Employment System.

[Figure: network with inputs gender, age, education, and computer (skills), and output decision]

Supervised Learning

gender | age   | education | computer | decision
male   | 25-30 | H         | H        | rejected
female | 25-30 | H         | H        | accepted



The two cases differ only in gender, so learning should strengthen the weight for gender.

Supervised Learning

Learn by examples: real cases with known inputs and outputs.


Supervised Learning

Process:
1. Divide the examples into two sets: a training set and a testing set. E.g., for 20 cases, use 15 for training (learning) and 5 for testing.
2. Create the network, select the activation function, and initialize the weights with arbitrary values.
3. Subject the ANN to the training cases one after the other.


Supervised Learning

Process (continued):
4. Consider the first training case: process the inputs using the initial weights and predict the output (the ANN output). With arbitrary initial weights, the output provided by the ANN will not match the actual output known for this case, i.e., there is an error. The process uses the error to adjust the weights; this is the back-propagation algorithm.


Supervised Learning

Process (continued):
5. Subject the ANN to the next training case (using the modified weights): calculate the ANN output, compare it with the actual output, calculate the error, and adjust the weights. Keep doing this until all training cases are finished, then accumulate the errors to determine the Total Error (for all training cases).


Supervised Learning

Process (continued):
6. If the total error is tolerable, STOP the training; otherwise, repeat the training over all training cases, using the last set of weights obtained from the previous pass.
7. Once the training process is finished, the trained ANN is tested using the testing set of cases. The testing passes when the total error for the testing set (between actual and ANN outputs) is tolerable.

Unsupervised Learning

Features: classification and clustering.

Example: a Fruits Classification System.


Example

A single neuron with three inputs (x, y, z) is trained with the update rule

Wnew = Wold + Error * Input, where Input = x, y, or z.

[Figure: data set with rows marked 'tr' (training cases)]

ANN

Training pass 1, case 1: inputs (x, y, z) = (0, 1, 1), weights (Wx, Wy, Wz) = (-1, 3, -5).
ANN output = 0*(-1) + 1*3 + 1*(-5) = -2; desired output = 1; error = 1 - (-2) = 3.
Applying Wnew = Wold + Error * Input:
Wnew x = -1 + 3*0 = -1; Wnew y = 3 + 3*1 = 6; Wnew z = -5 + 3*1 = -2.

ANN

Case 2: inputs (1, 0, 0), weights (-1, 6, -2).
ANN output = 1*(-1) + 0*6 + 0*(-2) = -1; desired output = 0; error = 0 - (-1) = 1.
Wnew x = -1 + 1*1 = 0; Wnew y = 6 + 1*0 = 6; Wnew z = -2 + 1*0 = -2.


ANN

Case 3: inputs (0, 1, 0), weights (0, 6, -2).
ANN output = 0*0 + 1*6 + 0*(-2) = 6; desired output = 0; error = 0 - 6 = -6.
Wnew x = 0 + (-6)*0 = 0; Wnew y = 6 + (-6)*1 = 0; Wnew z = -2 + (-6)*0 = -2.


ANN

Case 4: inputs (0, 0, 1), weights (0, 0, -2).
ANN output = 0*0 + 0*0 + 1*(-2) = -2; desired output = 1; error = 1 - (-2) = 3.
Wnew x = 0; Wnew y = 0; Wnew z = -2 + 3*1 = 1.
Total Error = 11 for this pass, which is not tolerable, so training is repeated with the updated weights (0, 0, 1).


ANN

Training pass 2, case 1: inputs (0, 1, 1), weights (0, 0, 1).
ANN output = 0*0 + 1*0 + 1*1 = 1; desired output = 1; error = 0, so the weights are unchanged.


ANN

Case 2: inputs (1, 0, 0), weights (0, 0, 1).
ANN output = 1*0 + 0*0 + 0*1 = 0; desired output = 0; error = 0, so the weights are unchanged.


ANN

Case 3: inputs (0, 1, 0), weights (0, 0, 1).
ANN output = 0*0 + 1*0 + 0*1 = 0; desired output = 0; error = 0, so the weights are unchanged.


ANN

Case 4: inputs (0, 0, 1), weights (0, 0, 1).
ANN output = 0*0 + 0*0 + 1*1 = 1; desired output = 1; error = 0.
Total Error = 0: Training Done.

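The whole worked example fits in a few lines of code. Here is a sketch (Python; the data and initial weights are taken from the slides, while summing absolute per-case errors into the total is an assumption about how the Total Error is accumulated):

```python
# Training cases from the example: inputs (x, y, z) and the desired output.
training_cases = [
    ((0, 1, 1), 1),
    ((1, 0, 0), 0),
    ((0, 1, 0), 0),
    ((0, 0, 1), 1),
]
weights = [-1.0, 3.0, -5.0]  # arbitrary initial weights, as in the slides

total_error = None
while total_error != 0:  # repeat passes until the total error is tolerable
    total_error = 0
    for inputs, desired in training_cases:
        output = sum(w * x for w, x in zip(weights, inputs))  # linear neuron
        error = desired - output
        total_error += abs(error)
        # Update rule from the slides: Wnew = Wold + Error * Input
        weights = [w + error * x for w, x in zip(weights, inputs)]

print(weights)  # converges to [0.0, 0.0, 1.0], matching the slides
```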

Testing

[Figure: the trained network applied to a test case with inputs x, y, z]

Example

[Figure: data set with rows marked 'tr' (training cases) and 'ts' (testing cases)]


Testing

[Figure: the testing cases ('ts') processed by the trained network; the resulting errors are 0]

ANN: An Optimization Problem

Objective function: minimize the total error for the training set.
Variables: the weights (between the input layer and the hidden layer, and between the hidden layer and the output layer).
Constraints: ?

Computation flow: I -> WIH -> H -> WHO -> O

The input array I is combined with the input-to-hidden weight matrix WIH; each hidden unit H computes f(sumproduct). The hidden values are then combined with the hidden-to-output weight matrix WHO; each output unit O computes f(sumproduct), and the error is measured at the outputs.



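A compact sketch of this matrix-based forward pass (Python; the names I, WIH, H, WHO, and O mirror the slide, and tanh is an assumed choice of f):

```python
import math

def forward(I, WIH, WHO, f=math.tanh):
    """Forward pass: H = f(I . WIH), then O = f(H . WHO)."""
    # Each column of WIH holds the weights into one hidden unit.
    H = [f(sum(i * w for i, w in zip(I, col))) for col in zip(*WIH)]
    # Each column of WHO holds the weights into one output unit.
    O = [f(sum(h * w for h, w in zip(H, col))) for col in zip(*WHO)]
    return O

# Example: 2 inputs -> 2 hidden units -> 1 output (illustrative weights).
I = [0.5, -1.0]
WIH = [[0.1, 0.4], [-0.2, 0.3]]  # 2x2: rows are inputs, columns are hidden units
WHO = [[0.7], [-0.5]]            # 2x1: rows are hidden units, one output column
print(forward(I, WIH, WHO))
```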

Example


A data set of ten cases, each assigned a role and a class:

Role:  Training, Testing, Training, Training, Training, Training, Training, Testing, Testing, Testing
Class: 1, 1, 2, 1, 2, 1, 2, 1, 2, 2



Practical Considerations

Training Data Pre-processing

We could just feed any raw data to our networks; however, it is usually necessary to carry out some preprocessing of the training data first. We should make sure that the training data is representative: it should not contain too many examples of one type. On the other hand, if one class of pattern is easy to learn, having large numbers of patterns from that class in the training set will only slow down the learning process.

Practical Considerations

Training Data Pre-processing

If the training data is continuous, it is a good idea to rescale the input values. Simply shifting the zero of the scale so that the mean value of each input is near zero, and normalizing so that the standard deviations of the values for each input are roughly the same, can make a big difference.

Practical Considerations

Scaling the Input Data

Data is scaled to the range -1 to 1:

scaled value = 2 * (unscaled value - min value) / (max value - min value) - 1

For example, an input for year of construction ranges from 1991 to 1998. For a given value of 1995, its scaled value = [2 * (1995 - 1991) / (1998 - 1991)] - 1 = 0.14.
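A one-function sketch of this scaling (Python; the function name is illustrative):

```python
def scale_input(value, min_value, max_value):
    """Linearly rescale a value from [min_value, max_value] to [-1, 1]."""
    return 2 * (value - min_value) / (max_value - min_value) - 1

print(scale_input(1995, 1991, 1998))  # ~0.14, matching the slide's example
```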

Practical Considerations

Choosing the Initial Weights

Do not start all weights with the same value: all the hidden units would end up doing the same thing, and the network would never learn properly. For that reason, we generally start off all the weights with small random values around zero. We usually train the network from a number of different random initial weight sets. In networks with hidden layers, we can expect different final sets of weights to emerge from the learning process for different choices of random initial weights.
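A sketch of such an initialization (Python; the +/-0.1 range is an illustrative choice, not a value from the slides):

```python
import random

def init_weights(n_inputs, n_units, scale=0.1):
    """Small random values around zero; identical values would make all hidden units redundant."""
    return [[random.uniform(-scale, scale) for _ in range(n_units)]
            for _ in range(n_inputs)]
```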

Practical Considerations

Choosing the Learning Rate

Choosing the learning rate η is constrained by two opposing facts: if η is too small, it will take too long to get anywhere near the minimum of the error function; if η is too large, the weight updates will over-shoot the error minimum and the weights will oscillate, or even diverge. The optimal value is network dependent, so one cannot formulate a general rule. Generally, one should try a range of different values (e.g., η = 0.1, 0.01, 1.0, 0.0001) and use the results as a guide. There is no necessity to keep the learning rate fixed throughout the learning process.

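The learning rate simply scales the update rule used in the worked example. A sketch (Python; the lr parameter is an assumption, since the earlier example effectively used η = 1):

```python
def update_weights(weights, inputs, error, lr=0.1):
    """Scaled update rule: Wnew = Wold + lr * error * input."""
    return [w + lr * error * x for w, x in zip(weights, inputs)]
```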

Practical Considerations

Choosing the Transfer Function

In terms of computational efficiency, the standard sigmoid is better than the step function of the Simple Perceptron. A convenient alternative to the logistic function is the hyperbolic tangent: f(x) = tanh(x). When the outputs are required to be non-binary, i.e., continuous real values, sigmoidal transfer functions no longer make sense; in these cases, a simple linear transfer function f(x) = x is appropriate.

Practical Considerations

Batch Training vs Online Training

When we add up the weight changes for all the training patterns and apply them in one go, it is called Batch Training. A natural alternative is to update all the weights immediately after processing each training pattern; this is called On-line Training (or Sequential Training). For on-line learning, a much lower learning rate is normally necessary than for batch learning; however, the learning is often much quicker.
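A side-by-side sketch of the two regimes (Python; reusing the linear neuron and update rule from the worked example, with illustrative function names):

```python
def online_epoch(weights, cases, lr):
    """On-line (sequential) training: update the weights after each pattern."""
    for inputs, desired in cases:
        error = desired - sum(w * x for w, x in zip(weights, inputs))
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights

def batch_epoch(weights, cases, lr):
    """Batch training: accumulate the changes for all patterns, then apply them in one go."""
    deltas = [0.0] * len(weights)
    for inputs, desired in cases:
        error = desired - sum(w * x for w, x in zip(weights, inputs))
        deltas = [d + lr * error * x for d, x in zip(deltas, inputs)]
    return [w + d for w, d in zip(weights, deltas)]
```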
