J.M. Zurada, 2001
ECE613-Lecture 4: Multilayer Feedforward
Neural Networks-Applications
OVERVIEW OF THE LECTURE
Applications of MLFNN for
- function approximation
- prediction
- classification
- expert systems
- control actions
Training: Error analysis and data split into three sets
Useful demonstrations
Data conditioning and network initialization
We will refer to two-layer perceptron neural networks as
Multilayer Feedforward Neural Networks (MLFNN)
Application of MLFNN for function approximation
Approximation of h(x) with H(w, x), a staircase function; each stair, called a boxterm, can be implemented by a single-input neuron (TLU) with bias and scaled by h(x_i):
H(w, x) = Σ_i h(x_i) · boxterm(x_i)
[Figure: one boxterm(x_i) realized by a pair of single-input TLUs with activations sgn(net_1) and sgn(net_2), whose thresholds straddle x_i at roughly x_i - Δ/2 and x_i + Δ/2; weighting their outputs by 0.5 and -0.5 and summing yields boxterm(x_i), a box of unit height and width Δ centered at x_i.]
To implement all P stairs:
1. The 2nd and 3rd inputs can be merged into a single bias input; the new inputs are then x and 1, and the weights need to be re-computed.
2. The output, here of unity value, needs to be scaled according to the h(x_i) values.
3. P pairs of TLUs like the one shown must be connected between x and the output (see the sketch below).
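As a concrete illustration, here is a minimal numerical sketch (Python/NumPy; the stair centers x_i, the width delta, and the target function h are arbitrary choices for this example, not values from the lecture) of how P scaled boxterms add up to the staircase approximation H(x):

import numpy as np

def boxterm(x, xi, delta):
    # Unit-height box of width delta centered at xi, built from a pair of sgn TLUs:
    # 0.5*sgn(x - (xi - delta/2)) - 0.5*sgn(x - (xi + delta/2))
    return 0.5 * np.sign(x - (xi - delta / 2)) - 0.5 * np.sign(x - (xi + delta / 2))

def staircase_H(x, h, centers, delta):
    # H(x) = sum_i h(x_i) * boxterm(x_i): each box is scaled by the sampled value h(x_i)
    return sum(h(xi) * boxterm(x, xi, delta) for xi in centers)

h = np.sin                               # example target function (assumption)
delta = 0.5                              # stair width
centers = np.arange(0.25, 6.0, delta)    # the P stair centers x_i
x = np.linspace(0.0, 6.0, 601)
H = staircase_H(x, h, centers, delta)    # staircase approximation of h(x)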
Application of MLFNN for function approximation (ctd)
Case 1: Single function of a single variable, x = x_1, n = 1, I = 2, K = 1
[Block diagram: inputs x and 1 (bias) feed the MLFNN with weight vector w = [w_1 w_2 w_3 ... w_m]^t; the single output approximates h(x).]
Case 2: Single function of n variables, x = [x_1 x_2 x_3 ... x_n]^t, I = n+1, K = 1
[Block diagram: inputs x_1, ..., x_n and 1 (bias) feed the MLFNN with weights w = [w_1 w_2 w_3 ... w_m]^t; the single output approximates h(x).]
Case 3: Many functions of n variables, x = [x_1 x_2 x_3 ... x_n]^t, I = n+1, K > 1
[Block diagram: the same inputs feed the MLFNN; its K outputs approximate h(x), now a vector of K functions.]
Properties of MLFNN for function approximation:
MLFNN is able to approximate any multivariable relationship with arbitrary accuracy only if the number of neurons in the hidden layer is allowed to be infinite (Stone-Weierstrass Theorem).
In practice we need to follow these guidelines:
1. We select a common-sense size of the hidden layer.
2. When approximating functions whose values lie in (-1, 1), we use bipolar continuous AFs in the output layer (the desired, or target, output values are all in this range), or
3. When approximating functions whose values lie outside (-1, 1), we use the linear AF in the output layer only. The only modification of the EBP Algorithm is then that f'(net) = 1 for all output-layer neurons in Step 4 (see the sketch below).
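A minimal training sketch (Python/NumPy; the hidden-layer size, learning rate, number of cycles, and target function are arbitrary illustrative choices) of guideline 3: tanh (bipolar continuous) hidden units and a linear output unit, so f'(net) = 1 in the output layer of the EBP update:

import numpy as np

rng = np.random.default_rng(0)
n_hidden, eta = 10, 0.01                          # common-sense hidden size, learning rate
X = np.linspace(-2, 2, 200).reshape(-1, 1)
d = X ** 2                                        # target values outside (-1, 1): linear output AF

Xb = np.hstack([X, np.ones((len(X), 1))])         # augment inputs with bias (I = n + 1)
V = rng.uniform(-1, 1, (n_hidden, Xb.shape[1]))   # hidden-layer weights
W = rng.uniform(-1, 1, (1, n_hidden + 1))         # output-layer weights (incl. bias)

for cycle in range(500):
    for x, t in zip(Xb, d):
        z = np.tanh(V @ x)                        # bipolar continuous AF in the hidden layer
        zb = np.append(z, 1.0)                    # hidden outputs plus bias
        y = W @ zb                                # linear AF in the output layer
        err = t - y
        W += eta * np.outer(err, zb)              # output delta = err * f'(net), with f'(net) = 1
        delta_h = (W[:, :-1].T @ err) * (1 - z ** 2)
        V += eta * np.outer(delta_h, x)           # hidden-layer EBP update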
For a demo, run MATLAB's toolbox/nnet/nndemos script nnd11gn. Another, similar demo is nnd11fa.
Other useful demo:
http://neuron.eng.wayne.edu/bpFunctionApprox/bpFunctionApprox.html
Application of MLFNN for prediction
[Figure: a signal f(t) sampled at times t_1, t_2, t_3, ..., t_n; the samples f(t_1), ..., f(t_n) form the predictor input.]
y is the vector of m future outputs:
y = [ f(t_{n+1}) f(t_{n+2}) ... f(t_{n+m}) ]^t
x = [ f(t_1) f(t_2) ... f(t_n) ]^t
This is the so-called Time-Delay NN.
[Block diagram: x feeds the MLFNN Predictor, which outputs y.]
This is an m-step-ahead predictor. For m = 1, we have a single-step predictor,
y = f(t_{n+1}).
With the single-step predictor designed, computing f*(t_{n+1}), multi-step predictors can be built as follows: to compute f*(t_{n+2}), we shift the input sequence by one time delay, and the inputs become:
f*(t_{n+1}) replaces f(t_n) at the n-th input,
f(t_n) replaces f(t_{n-1}) at the (n-1)th input, etc.;
input f(t_1) is now unused (a sketch of this recursion follows below).
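A minimal sketch (Python; single_step_predict stands for any trained single-step MLFNN predictor and is an assumed placeholder, not something defined in the lecture) of this recursive use of a single-step predictor:

def multi_step_forecast(history, single_step_predict, m):
    # history: the n most recent samples [f(t_1), ..., f(t_n)], oldest first
    # single_step_predict: trained mapping from n samples to the next value
    window = list(history)
    forecasts = []
    for _ in range(m):
        f_next = single_step_predict(window)   # f*(t_{n+1}) for the current window
        forecasts.append(f_next)
        window = window[1:] + [f_next]         # shift: drop f(t_1), append the prediction
    return forecasts                           # [f*(t_{n+1}), ..., f*(t_{n+m})]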
Application of MLFNN for prediction (ctd)
[Figure: an MLFNN Predictor with n = 4 delayed inputs f(t_1), ..., f(t_n) and output y; y is the vector of m future outputs, here y = f(t_{n+1}), i.e., m = 1.]
Guidelines 1-3 for approximators (earlier slides) have to be followed for any m. In addition, predictor training involves:
1. Choice of the sampling period T.
2. Choice of the window width n, usually much shorter than the observation period, which in the figure below spans n+s samples (n+s > n, or better n+s >> n).
3. Choice of the training horizon hT, where h > n.
4. During training, the window slides h-n times to the right to complete the training cycle, and then returns to the origin; the sequence of patterns matters! Then the window starts sliding again, etc. (a windowing sketch follows the figure below).
[Figure: f(t) versus t; a sliding window of width nT moves across the training horizon hT; the samples beyond hT, up to (n+s)T, are used only for testing.]
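A minimal sketch (Python; the function and variable names are illustrative assumptions) of building single-step training pairs by sliding the window over the training horizon and reserving the remaining samples for testing:

def make_windows(samples, n, h):
    # samples: the observed sequence f(t_1), ..., f(t_{n+s})
    # n: window width; h: training horizon index (h > n)
    train_pairs = []
    for start in range(h - n):                 # the window slides h-n times to the right
        x = samples[start : start + n]         # input window of n samples
        d = samples[start + n]                 # desired next value (single-step target)
        train_pairs.append((x, d))
    test_samples = samples[h:]                 # data beyond the training horizon: testing only
    return train_pairs, test_samples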
Application of MLFNN for prediction (ctd)
Examples of MLFNN Predictors
- currency exchange rates (T = 1 hr, or 24 hrs); I/O identical;
  260 working days in 1988/89 = 200 training + 60 testing days
- crude oil price change, 5 days ahead (single-step; only the change needed to be computed);
  17 input attributes such as 15/32/55/150-day MVA, high/low close indices, and % change in standard deviation compared with 7 days ago; an approximator that works as a predictor, with I/O different (the more general case)
- weather trends
- economic indicators, financial returns, stock/futures analysis
Note: the ML community uses the term prediction for time-invariant modeling such as classification, regression, and expert systems.
More on non-elementary predictors in later courses on recurrent networks.
Application of MLFNN for classification
Note: (1) definition of features; (2) an MLFNN can be a classifier alone or a feature extractor and classifier in one; (3) outputs must be binary.
The classifier maps the set {x} into {y}, where the x_i are binary-valued or real-valued, and the y_i are always binary-valued vectors, i.e., their components are in {-1, 1}.
[Block diagram: input x feeds the MLFNN Classifier, which outputs y.]
Class Coding for M Classes:
1. 1-of-M coding
- requires M outputs; the index of the one output that differs from the rest is the class number
2. Binary encoding
- requires only INT(log2 M) + 1 outputs, hence a smaller NN and easier learning; preferred, since the NN is not expected to decode the classes (see the coding sketch below)
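A minimal sketch (Python; bipolar -1/+1 output levels as in the example on the next slide; the particular bit assignment in binary_code is a free choice, and the table on the next slide uses a different but equally valid one) of the two coding schemes for a class index c in 1..M:

import math

def one_of_m_code(c, M):
    # 1-of-M: M outputs, the c-th output is +1 and all others are -1
    return [1 if k == c else -1 for k in range(1, M + 1)]

def binary_code(c, M):
    # Binary encoding: INT(log2 M) + 1 outputs; class index c-1 written as bipolar bits
    n_bits = int(math.log2(M)) + 1
    bits = [(c - 1) >> k & 1 for k in reversed(range(n_bits))]
    return [1 if b else -1 for b in bits]

print(one_of_m_code(2, 3))   # [-1, 1, -1]
print(binary_code(2, 3))     # [-1, 1]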
Application of MLFNN for classification (ctd)
EXAMPLE OF CLASSIFICATION
For planar images, the size of the input vector can be made
equal to the total number of pixels in the evaluated image.
X_1 = [ 1 1 1 1 -1 -1 1 1 ]^t
X_2 = [ -1 1 -1 -1 1 -1 -1 1 -1 ]^t
X_3 = [ 1 1 1 -1 1 -1 -1 1 -1 ]^t
Class 1        Class 2        Class 3
Desired (target) vectors d are equal to:
            1-of-3           Binary Encoding (example)
Class 1     [ 1 -1 -1]^t     [-1 -1]^t
Class 2     [-1  1 -1]^t     [ 1 -1]^t
Class 3     [-1 -1  1]^t     [ 1  1]^t
NOTE: Avoid 0 (zero) as active learning inputs! Why?
Application of MLFNN for classification (ctd)
Important aspects of interpreting classification results (incl. Expert Systems output computation):
Desired values are typically binary, but the outputs computed by the MLFNN in the classification phase are NOT binary; therefore
- output thresholding with, say, 0.9 should be applied: outputs > 0.9 are taken as ~1, outputs < -0.9 as ~-1
- any output between -0.9 and 0.9 should be rejected as below the accepted confidence level, i.e., as having no crisp class indication (example: a ZIP code digit reader); see the sketch below
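A minimal sketch (Python; the 0.9 confidence level is the value quoted above) of this output interpretation:

def interpret_outputs(y, level=0.9):
    # Map each computed output to +1 / -1, or reject it if it falls in the
    # low-confidence band (-level, level).
    decisions = []
    for yk in y:
        if yk > level:
            decisions.append(1)
        elif yk < -level:
            decisions.append(-1)
        else:
            decisions.append(None)       # rejected: no crisp class indication
    return decisions

print(interpret_outputs([0.97, -0.95, 0.4]))   # [1, -1, None]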
Application of MLFNN for expert systems
Block diagram of an MLFNN as an Expert System
Why does this figure show input neurons (often called a layer)?
Application of MLFNN for expert systems (ctd)
MLFNN as Expert System
[Block diagram: x, the vector of symptoms (real- or binary-valued attributes, including do-not-knows coded as 0s), feeds the MLFNN Expert System; its output y is the set of diagnoses, typically binary-valued.]
Equivalent to a classifier!
Only unmistakable diagnoses should be used for training.
Expert systems like this are widely used as decision aids in finance (bankruptcy, credit worthiness), investment analysis (failure or success), quality control (water, wine), and medicine (heart infarct, skin diseases, back pain diagnosis).
Application of MLFNN for control actions
F(t) = a·x(t) + b·dx(t)/dt + c·theta(t) + d·dtheta(t)/dt
F(t) ~ sgn[a·x(t) + b·dx(t)/dt + c·theta(t) + d·dtheta(t)/dt],
or, assuming the approximation dg/dt ≈ g(t) - g(t-1),
F(t) ~ sgn[(a+b)·x(t) - b·x(t-1) + (c+d)·theta(t) - d·theta(t-1)]
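A quick check (Python; the coefficient and state values are arbitrary) that the two forms agree once dx/dt and dtheta/dt are replaced by backward differences:

a, b, c, d = 1.0, 0.5, 2.0, 0.3        # arbitrary controller coefficients
x_t, x_p = 0.2, 0.1                    # x(t), x(t-1)
th_t, th_p = -0.05, -0.02              # theta(t), theta(t-1)

# continuous form with dg/dt approximated by g(t) - g(t-1)
lhs = a * x_t + b * (x_t - x_p) + c * th_t + d * (th_t - th_p)
# regrouped discrete form from the slide
rhs = (a + b) * x_t - b * x_p + (c + d) * th_t - d * th_p
print(abs(lhs - rhs) < 1e-12)          # True: the two expressions are identical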
Application of MLFNN for control actions (ctd)
[Figure: a single perceptron with inputs x_1, x_2, ..., x_111, weights w_1, w_2, ..., w_111, and activation f(net) = sgn(net).]
Features of this single-perceptron controller:
- inputs 1-55: crude present image
- inputs 56-110: crude most recent image; input 111 is the bias
- no need to write differential/difference equations
- no need to know physical properties of the system, such as mass or moments
- raw and crude visual information is inserted into learning
- learning commands are issued by an intelligent observer (teacher)
- the teacher knows no physics, control, or neural networks
- merging visual information with experiential learning yields a successful controller (see the sketch below)
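A minimal sketch (Python/NumPy; the 55-pixel frames and the bias as input 111 follow the list above, while interpreting inputs 56-110 as the preceding frame and using random placeholder weights are assumptions for illustration) of how such a controller computes its command:

import numpy as np

def control_action(present_image, previous_image, w):
    # present_image, previous_image: crude 55-pixel images (bipolar pixel values)
    # w: 111 weights for inputs 1-55, 56-110, and the bias input 111
    x = np.concatenate([present_image, previous_image, [1.0]])   # input 111 is the bias
    return np.sign(w @ x)                                        # F ~ sgn(net): push direction

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, 111)                   # placeholder weights; in practice taught by the observer
present = rng.choice([-1.0, 1.0], 55)
previous = rng.choice([-1.0, 1.0], 55)
print(control_action(present, previous, w))   # +1.0 or -1.0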
Application of MLFNN for control actions (ctd)
What are these positive weights telling us?
At these locations, at t and t-1, F is definitely positive: the cart requires a push to the right!
What are the weights close to 0 telling us?
Training: Error analysis and training data split
To develop an MLFNN, all data pairs available for supervised training are split into Training, Validation, and Test sets (60/20/20%).
The Cycle Error E (see Step 3 of the EBP Algorithm) is computed cumulatively within each cycle over all P training steps.
[Figure: Cycle Error E versus Cycle No. for the Training set (60% of data) and the Validation set (20% of data), with the level E_max marked; training should stop where the validation-set error starts to rise, to avoid overfitting.]
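A minimal sketch (Python; the 60/20/20 split comes from the slide, while train_one_cycle and cycle_error are assumed helper routines standing in for one EBP training cycle and the cycle-error computation) of the data split and the early-stopping rule implied by the figure:

def split_data(pairs):
    # 60% training, 20% validation, 20% test
    n = len(pairs)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:]

def train_with_early_stopping(net, train_set, val_set, train_one_cycle, cycle_error,
                              max_cycles=1000):
    best_val, best_net = float("inf"), net
    for cycle in range(max_cycles):
        net = train_one_cycle(net, train_set)    # one EBP pass over all P training steps
        e_val = cycle_error(net, val_set)        # cumulative cycle error on the validation set
        if e_val < best_val:
            best_val, best_net = e_val, net      # validation error still falling: keep going
        else:
            break                                # validation error rises: stop to avoid overfitting
    return best_net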
Data conditioning and network initialization
Step 1
Cleansing of data (rejection of mistakes, outliers, unreliable
data, do-not-knows), but noisy data are fine! (ALVINN)
Step 2
Normalization of each input vector component to the interval (-1, 1) by scaling and shifting appropriately (for real numbers). Use 0 for missing data and do-not-knows.
Normalization can be done by
- dividing inputs by their variances so that their pdfs fit, or
- dividing by the min-max data spread (|max - min|).
Reason for normalization: avoiding training on the part of the AF where f'(net) ≈ 0 (a normalization sketch follows below).
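A minimal sketch (Python/NumPy; shifting to the midpoint of each component's range is one reading of "shifting appropriately") of min-max normalization of each input component into [-1, 1]:

import numpy as np

def normalize_columns(X):
    # Scale and shift each input component (column) using its min-max spread |max - min|.
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    spread = np.abs(x_max - x_min)
    mid = (x_max + x_min) / 2.0
    return 2.0 * (X - mid) / spread      # midpoint maps to 0, extremes to -1 and +1

X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 800.0]])
print(normalize_columns(X))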
Step 3
Assigning the numerical values -1 and 1 to unconditional logic statements FALSE and TRUE, and suitable values in between in proportion to the degree of truth (0 = no info, 0.7 for likely TRUE, 0.3 for more TRUE than FALSE, etc.)
Step 4
Initialize the weights by assigning each weight a random value drawn from the uniform distribution in the range (-w_m, w_m), where w_m = 1/sqrt(n) and n is the neuron's fan-in (an initialization sketch follows below).
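A minimal sketch (Python/NumPy; the layer sizes in the usage line are arbitrary) of Step 4 for one layer of neurons with fan-in n:

import numpy as np

def init_layer_weights(n_neurons, fan_in, rng=np.random.default_rng()):
    # w_m = 1 / sqrt(n), n = fan-in; each weight drawn uniformly from (-w_m, w_m)
    w_m = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-w_m, w_m, size=(n_neurons, fan_in))

V = init_layer_weights(n_neurons=10, fan_in=3)   # e.g. a hidden layer whose neurons have 3 inputs each
print(V.shape, abs(V).max() < 1 / np.sqrt(3))    # (10, 3) True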
Step 5
Change the architecture of the network (K), or apply TWO hidden layers, if the chosen MLFNN does not train to below E_max.