
Introduction to Neural Networks

Who has more processing power: a supercomputer or the brain of a fly? Which is more intelligent? And how can we add intelligence to computers?

There are two broad approaches:

- Mimic natural biological evolution and/or the social behavior of species (Evolutionary Algorithms, Genetic Algorithms)
- Simulate the human brain in its structure and way of processing information: Artificial Neural Networks (ANN)

An ANN is a topology (architecture) that has the ability to learn (be trained); once trained, it can provide outputs or predictions.

Introduction
Artificial Neural Networks (ANN), also named Connectionist Models or Parallel Distributed Processing (PDP) Models, consist of a pool of simple processing units called neurons, nodes, or cells, which communicate over a large number of weighted connections by sending signals to each other.

Introduction
ANNs behave like a human brain: they demonstrate the ability to learn, recall, and generalize from training patterns or data. The processing element in an ANN is the neuron. A human brain consists of about 10 billion neurons, and each biological neuron is connected to several thousands of other neurons, similar to the connectivity in an ANN.

Introduction

Biological and Artificial Neurons

Dendrites receive activation from other neurons. The neuron's cell body (soma) processes the incoming activations and converts them into output activations. Axons act as transmission lines that send activation to other neurons.

Introduction

Biological and Artificial Neurons

[Figure: artificial neuron with inputs, weights, a processing unit, and outputs]

- A set of connections brings in activations from other neurons.
- A processing unit sums the inputs and then applies an activation function.
- An output line transmits the result to other neurons.

Introduction

General Structure of ANN
[Figure: feed-forward network with an input layer (x1, x2, ..., xn), two hidden layers, and an output layer (y1, y2); connections carry weights such as w11 and w21]

Three aspects are important for classifying an ANN:

- Topology (Architecture)
- Activation (transfer) Function
- Learning Paradigm

Introduction

Classification of ANN: Architecture

The topology or architecture defines how information flows from input to output and the number of neurons.


Single-Layer Feed-forward (the Perceptron, introduced in 1957 by Frank Rosenblatt)

[Figure: single neuron with inputs, weights, a processing unit, and an output]


One output node, many input nodes, and no hidden layers.

[Figure: inputs x1, x2, ..., xn connected directly through weights (w11, w21, ...) to the output layer]


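To make this concrete, here is a minimal sketch of a perceptron forward pass (Python; the weights, inputs, and zero threshold are illustrative assumptions, not values from the slides):

```python
def perceptron_output(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs followed by a step (threshold) activation."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Example: three inputs with arbitrary weights.
print(perceptron_output([1, 0, 1], [0.5, -0.4, 0.3]))  # weighted sum = 0.8 > 0, so output 1
```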

Multi-Layer Feed-forward


[Figure: multi-layer feed-forward network with an input layer (x1, x2, ..., xn), two hidden layers, and an output layer (y1, y2)]


Recurrent Network: unlike feed-forward networks, a recurrent network also contains feedback connections, so signals can flow in loops.


Introduction

Classification of ANN: Activation Function

[Figure: neuron computing an output O from inputs x1, ..., xn with weights w1, ..., wn]

Introduction

Classification of ANN: Activation Function

Common choices of activation function:

- Continuous, e.g., Y = tanh(x)
- Segments (piecewise linear), e.g., Y = 0 if x < -0.5; Y = x + 0.5 if -0.5 <= x <= 0.5; Y = 1 if x > 0.5
- Threshold (step), e.g., Y = 1 if x > 0, otherwise Y = 0
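A short sketch of these three activation types (Python; the segment breakpoints at +/-0.5 are inferred from the slide's Y = x + 0.5 segment, and the threshold at 0 is an assumption):

```python
import math

def threshold(x):
    """Step function: 1 above the threshold, else 0."""
    return 1.0 if x > 0 else 0.0

def segments(x):
    """Piecewise linear: 0 below -0.5, x + 0.5 in between, 1 above 0.5."""
    if x < -0.5:
        return 0.0
    if x > 0.5:
        return 1.0
    return x + 0.5

def continuous(x):
    """Smooth activation, e.g., the hyperbolic tangent."""
    return math.tanh(x)
```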

Introduction

Classification of ANN: Learning Algorithm

Learning what? The goal is to predict the output given the inputs.


Learning

What is learned? The connection weights, which are adjusted so that the network provides accurate outputs.

[Figure: feed-forward network with inputs x1, ..., xn, two hidden layers, and outputs y1, y2; learning adjusts the weights w11, w21, ...]

The building blocks of an ANN: nodes, layers, weights, and activation functions.



Introduction

Classification of ANN: Learning Algorithm

Learning algorithm (to define the weights):

Supervised learning (learning by examples or real cases): Both inputs and desired outputs are provided. The network processes the inputs and produces an output (the NN output). The ANN compares its resulting outputs against the desired outputs, and errors are calculated to adjust the weights.

Unsupervised learning (clustering and classification): The network is provided with inputs but not with outputs. The system itself must decide what features it will use to group the input data. This is often referred to as self-organization.

Supervised Learning

Example: an Automated Employment System.

[Figure: network with inputs gender, age, education, and computer (skills), and output decision]

Supervised Learning

gender | age   | education | computer | decision
male   | 25-30 | H         | H        | rejected
female | 25-30 | H         | H        | accepted



The two cases differ only in gender, so learning should strengthen the weight for gender.

Supervised Learning

Learn by examples: real cases with known inputs and outputs.


Supervised Learning

Process:
1. Divide the examples into two sets: a training set and a testing set. E.g., for 20 cases, use 15 for training (learning) and 5 for testing.
2. Create the network, select the activation function, and initialize the weights with arbitrary values.
3. Subject the ANN to the training cases one after the other.


Supervised Learning

Process (continued):
4. Consider the first training case: process the inputs using the initial weights and predict the output (the ANN output). With arbitrary initial weights, the output provided by the ANN will not match the actual output known for this case, i.e., there is an error. The process uses the error to adjust the weights; this is the back-propagation algorithm.


Supervised Learning

Process (continued):
5. Subject the ANN to the next training case (using the modified weights): calculate the ANN output, compare it with the actual output, calculate the error, and adjust the weights. Keep doing this until all training cases are finished, then accumulate the errors to determine the Total Error (for all training cases).


Supervised Learning

Process (continued):
6. If the total error is tolerable, STOP the training; otherwise, repeat the training over all training cases, using the last set of weights obtained from the previous pass.
7. Once the training process is finished, the trained ANN is tested using the testing set of cases. The testing passes when the total error for the testing set (between actual and ANN outputs) is tolerable.

Unsupervised Learning

Features: classification and clustering.

Example: a Fruits Classification System.


Example

A single neuron with three inputs (x, y, z) is trained with the update rule

Wnew = Wold + Error * Input, where Input = x, y, or z.

[Figure: data set with rows marked 'tr' (training cases)]

ANN

Training pass 1, case 1: inputs (x, y, z) = (0, 1, 1), weights (Wx, Wy, Wz) = (-1, 3, -5).
ANN output = 0*(-1) + 1*3 + 1*(-5) = -2; desired output = 1; error = 1 - (-2) = 3.
Applying Wnew = Wold + Error * Input:
Wnew x = -1 + 3*0 = -1; Wnew y = 3 + 3*1 = 6; Wnew z = -5 + 3*1 = -2.

ANN

Case 2: inputs (1, 0, 0), weights (-1, 6, -2).
ANN output = 1*(-1) + 0*6 + 0*(-2) = -1; desired output = 0; error = 0 - (-1) = 1.
Wnew x = -1 + 1*1 = 0; Wnew y = 6 + 1*0 = 6; Wnew z = -2 + 1*0 = -2.


ANN

Case 3: inputs (0, 1, 0), weights (0, 6, -2).
ANN output = 0*0 + 1*6 + 0*(-2) = 6; desired output = 0; error = 0 - 6 = -6.
Wnew x = 0 + (-6)*0 = 0; Wnew y = 6 + (-6)*1 = 0; Wnew z = -2 + (-6)*0 = -2.


ANN

Case 4: inputs (0, 0, 1), weights (0, 0, -2).
ANN output = 0*0 + 0*0 + 1*(-2) = -2; desired output = 1; error = 1 - (-2) = 3.
Wnew x = 0; Wnew y = 0; Wnew z = -2 + 3*1 = 1.
Total Error = 11 for this pass, which is not tolerable, so training is repeated with the updated weights (0, 0, 1).


ANN

Training pass 2, case 1: inputs (0, 1, 1), weights (0, 0, 1).
ANN output = 0*0 + 1*0 + 1*1 = 1; desired output = 1; error = 0, so the weights are unchanged.


ANN

Case 2: inputs (1, 0, 0), weights (0, 0, 1).
ANN output = 1*0 + 0*0 + 0*1 = 0; desired output = 0; error = 0, so the weights are unchanged.


ANN

Case 3: inputs (0, 1, 0), weights (0, 0, 1).
ANN output = 0*0 + 1*0 + 0*1 = 0; desired output = 0; error = 0, so the weights are unchanged.


ANN

Case 4: inputs (0, 0, 1), weights (0, 0, 1).
ANN output = 0*0 + 0*0 + 1*1 = 1; desired output = 1; error = 0.
Total Error = 0: Training Done.

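The whole worked example fits in a few lines of code. Here is a sketch (Python; the data and initial weights are taken from the slides, while summing absolute per-case errors into the total is an assumption about how the Total Error is accumulated):

```python
# Training cases from the example: inputs (x, y, z) and the desired output.
training_cases = [
    ((0, 1, 1), 1),
    ((1, 0, 0), 0),
    ((0, 1, 0), 0),
    ((0, 0, 1), 1),
]
weights = [-1.0, 3.0, -5.0]  # arbitrary initial weights, as in the slides

total_error = None
while total_error != 0:  # repeat passes until the total error is tolerable
    total_error = 0
    for inputs, desired in training_cases:
        output = sum(w * x for w, x in zip(weights, inputs))  # linear neuron
        error = desired - output
        total_error += abs(error)
        # Update rule from the slides: Wnew = Wold + Error * Input
        weights = [w + error * x for w, x in zip(weights, inputs)]

print(weights)  # converges to [0.0, 0.0, 1.0], matching the slides
```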

Testing

[Figure: the trained network applied to a test case with inputs x, y, z]

Example

[Figure: data set with rows marked 'tr' (training cases) and 'ts' (testing cases)]


Testing

[Figure: the testing cases ('ts') processed by the trained network; the resulting errors are 0]

ANN: An Optimization Problem

Objective function: minimize the total error for the training set.
Variables: the weights (between the input layer and the hidden layer, and between the hidden layer and the output layer).
Constraints: ?

Computation flow: I -> WIH -> H -> WHO -> O

The input array I is combined with the input-to-hidden weight matrix WIH; each hidden unit H computes f(sumproduct). The hidden values are then combined with the hidden-to-output weight matrix WHO; each output unit O computes f(sumproduct), and the error is measured at the outputs.



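A compact sketch of this matrix-based forward pass (Python; the names I, WIH, H, WHO, and O mirror the slide, and tanh is an assumed choice of f):

```python
import math

def forward(I, WIH, WHO, f=math.tanh):
    """Forward pass: H = f(I . WIH), then O = f(H . WHO)."""
    # Each column of WIH holds the weights into one hidden unit.
    H = [f(sum(i * w for i, w in zip(I, col))) for col in zip(*WIH)]
    # Each column of WHO holds the weights into one output unit.
    O = [f(sum(h * w for h, w in zip(H, col))) for col in zip(*WHO)]
    return O

# Example: 2 inputs -> 2 hidden units -> 1 output (illustrative weights).
I = [0.5, -1.0]
WIH = [[0.1, 0.4], [-0.2, 0.3]]  # 2x2: rows are inputs, columns are hidden units
WHO = [[0.7], [-0.5]]            # 2x1: rows are hidden units, one output column
print(forward(I, WIH, WHO))
```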

Example


A data set of ten cases, each assigned a role and a class:

Role:  Training, Testing, Training, Training, Training, Training, Training, Testing, Testing, Testing
Class: 1, 1, 2, 1, 2, 1, 2, 1, 2, 2



Practical Considerations

Training Data Pre-processing

We could just feed any raw data to our networks; however, it is usually necessary to carry out some preprocessing of the training data first. We should make sure that the training data is representative: it should not contain too many examples of one type. On the other hand, if one class of pattern is easy to learn, having large numbers of patterns from that class in the training set will only slow down the learning process.

Practical Considerations

Training Data Pre-processing

If the training data is continuous, it is a good idea to rescale the input values. Simply shifting the zero of the scale so that the mean value of each input is near zero, and normalizing so that the standard deviations of the values for each input are roughly the same, can make a big difference.

Practical Considerations

Scaling the Input Data

Data is scaled to the range -1 to 1:

scaled value = 2 * (unscaled value - min value) / (max value - min value) - 1

For example, an input for year of construction ranges from 1991 to 1998. For a given value of 1995, its scaled value = [2 * (1995 - 1991) / (1998 - 1991)] - 1 = 0.14.
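A one-function sketch of this scaling (Python; the function name is illustrative):

```python
def scale_input(value, min_value, max_value):
    """Linearly rescale a value from [min_value, max_value] to [-1, 1]."""
    return 2 * (value - min_value) / (max_value - min_value) - 1

print(scale_input(1995, 1991, 1998))  # ~0.14, matching the slide's example
```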

Practical Considerations

Choosing the Initial Weights

Do not start all weights with the same value: all the hidden units would end up doing the same thing, and the network would never learn properly. For that reason, we generally start off all the weights with small random values around zero. We usually train the network from a number of different random initial weight sets. In networks with hidden layers, we can expect different final sets of weights to emerge from the learning process for different choices of random initial weights.
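A sketch of such an initialization (Python; the +/-0.1 range is an illustrative choice, not a value from the slides):

```python
import random

def init_weights(n_inputs, n_units, scale=0.1):
    """Small random values around zero; identical values would make all hidden units redundant."""
    return [[random.uniform(-scale, scale) for _ in range(n_units)]
            for _ in range(n_inputs)]
```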

Practical Considerations

Choosing the Learning Rate

Choosing the learning rate η is constrained by two opposing facts: if η is too small, it will take too long to get anywhere near the minimum of the error function; if η is too large, the weight updates will over-shoot the error minimum and the weights will oscillate, or even diverge. The optimal value is network dependent, so one cannot formulate a general rule. Generally, one should try a range of different values (e.g., η = 0.1, 0.01, 1.0, 0.0001) and use the results as a guide. There is no necessity to keep the learning rate fixed throughout the learning process.

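The learning rate simply scales the update rule used in the worked example. A sketch (Python; the lr parameter is an assumption, since the earlier example effectively used η = 1):

```python
def update_weights(weights, inputs, error, lr=0.1):
    """Scaled update rule: Wnew = Wold + lr * error * input."""
    return [w + lr * error * x for w, x in zip(weights, inputs)]
```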

Practical Considerations

Choosing the Transfer Function

In terms of computational efficiency, the standard sigmoid is better than the step function of the Simple Perceptron. A convenient alternative to the logistic function is the hyperbolic tangent: f(x) = tanh(x). When the outputs are required to be non-binary, i.e., continuous real values, sigmoidal transfer functions no longer make sense; in these cases, a simple linear transfer function f(x) = x is appropriate.

Practical Considerations

Batch Training vs Online Training

When we add up the weight changes for all the training patterns and apply them in one go, it is called Batch Training. A natural alternative is to update all the weights immediately after processing each training pattern; this is called On-line Training (or Sequential Training). For on-line learning, a much lower learning rate is normally necessary than for batch learning; however, the learning is often much quicker.
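A side-by-side sketch of the two regimes (Python; reusing the linear neuron and update rule from the worked example, with illustrative function names):

```python
def online_epoch(weights, cases, lr):
    """On-line (sequential) training: update the weights after each pattern."""
    for inputs, desired in cases:
        error = desired - sum(w * x for w, x in zip(weights, inputs))
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights

def batch_epoch(weights, cases, lr):
    """Batch training: accumulate the changes for all patterns, then apply them in one go."""
    deltas = [0.0] * len(weights)
    for inputs, desired in cases:
        error = desired - sum(w * x for w, x in zip(weights, inputs))
        deltas = [d + lr * error * x for d, x in zip(deltas, inputs)]
    return [w + d for w, d in zip(weights, deltas)]
```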
