
S10:

Genetic Algorithms and Neural Networks


Shawndra Hill Spring 2013 TR 1:30-3pm and 3-4:30



DSS Course Outline


Introduction to Modeling & Data Mining: fundamental concepts and terminology
Data Mining methods: classification (decision trees), association rules, clustering and segmentation, collaborative filtering, genetic algorithms, etc.
Inner workings
Strengths and weaknesses
Evaluation: how to evaluate the results of a data mining solution
Applications: real-world business problems DM can be applied to

YouTube
http://www.youtube.com/watch?v=b1rHS3R0llU&annotation_id=annotation_882813&feature=iv

http://brainz.org/15-real-world-applications-genetic-algorithms/
http://geneticalgorithms.ai-depot.com/Programs.html
http://math.hws.edu/xJava/GA/

Business Applications
Complex scheduling problems
Resource optimization in large factories
Optimizing weights of neural networks (we will learn about neural networks in this or the next session)
Making predictions in finance (we'll get to this!)

Darwin's Insight
Evolution is a series of single steps.
Each single step is simple relative to its predecessor, arising randomly...
...but a sequence of small steps, cumulative selection, is not a random process!
This explains how order and complexity can happen and thrive.

Key Observation
As long as you can estimate the quality of a proposed solution, search does a lot of work for you!
Define the performance variables
Define the features

REMEMBER: OPIM 621 and OPIM 101? Example: A Linear Discontinuous Terrain of Possible Solutions
[Figure: feasible solution region bounded by the constraints X < 50, Y < 75, and AX + BY < 5000]

The Genetic Algorithm

Directed search algorithms based on the mechanics of biological evolution
Developed by John Holland, University of Michigan (1970s):
to understand the adaptive processes of natural systems
to design artificial systems software that retains the robustness of natural systems

Components of a GA
A problem to solve, and ...
Encoding technique (gene, chromosome)
Initialization procedure (creation)
Evaluation function (environment)
Selection of parents (reproduction)
Genetic operators (mutation, recombination)
Parameter settings (practice and art)

The GA Cycle of Reproduction
[Cycle diagram: population → parents → (reproduction) → children → (modification) → modified children → (evaluation) → evaluated children → back into the population; deleted members → (discard)]

Population

Chromosomes could be:
Bit strings (0101 ... 1100)
Real numbers (43.2 -33.1 ... 0.0 89.2)
Permutations of elements (E11 E3 E7 ... E1 E15)
Lists of rules (R1 R2 R3 ... R22 R23)
Program elements (genetic programming)
... any data structure ...

Chromosomes Can Represent Arbitrary Data Structures
[Figure: example chromosomes as arrays (1 0 0), trees (+ - a b c), and Boolean expressions (e.g. X1 < 0.8 combined with AND, i.e. X1 between 0 and 0.8)]

Reproduction
[Diagram: population → parents → (reproduction) → children]
Parents are selected at random, with selection chances biased in relation to chromosome evaluations.

Chromosome Modification
[Diagram: children → (modification) → modified children]
Modifications are stochastically triggered
Operator types are: mutation, crossover (recombination)

A Simple Example
The Traveling Salesman Problem:
Find a tour of a given set of cities so that
each city is visited only once
the total distance traveled is minimized

Representation
Representation is an ordered list of city numbers, known as an order-based GA.

1) London 2) Venice 3) Dunedin 4) Singapore 5) Beijing 6) Phoenix 7) Tokyo 8) Victoria

CityList1 (3 5 7 2 1 6 4 8)
CityList2 (2 5 7 6 8 1 3 4)
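A minimal sketch of this order-based encoding in Python. The city coordinates below are made-up values for illustration only; the tour length is the quantity the GA would try to minimize.

```python
import math

# Hypothetical coordinates for the 8 cities named above (illustrative values only).
CITIES = {1: (0, 0), 2: (10, 35), 3: (25, 5), 4: (60, 20),
          5: (40, 55), 6: (80, 10), 7: (70, 70), 8: (15, 80)}

def tour_length(chromosome):
    """Total distance of the closed tour encoded as an ordered list of city numbers."""
    total = 0.0
    for i, city in enumerate(chromosome):
        nxt = chromosome[(i + 1) % len(chromosome)]   # wrap around to the start
        (x1, y1), (x2, y2) = CITIES[city], CITIES[nxt]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

city_list_1 = [3, 5, 7, 2, 1, 6, 4, 8]
city_list_2 = [2, 5, 7, 6, 8, 1, 3, 4]
print(tour_length(city_list_1), tour_length(city_list_2))
```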

TSP Example: 30 Cities
[Scatter plot: the 30 city locations, x and y from 0 to 100]

Solution i (Distance = 941)
[Plot: TSP30 tour, performance = 941]

Solution j (Distance = 800)
[Plot: TSP30 tour, performance = 800]

Solution k (Distance = 652)
[Plot: TSP30 tour, performance = 652]

Best Solution (Distance = 420)
[Plot: TSP30 tour, performance = 420]

Overview of Performance
[Plot: TSP30 best, worst, and average tour distance versus generations (×1000)]

A Nonlinear Continuous Terrain of Possible Solutions
[Figure: surface Z with one global maximum and several local maxima]

Genetic Algorithms: Directed Random Search


The landscape represents the quality of solutions; the higher, the better
The objective is to find the highest spots and report the description of those spots
The search proceeds as shown below, where each dot represents a solution
[Figure: solutions scattered on the landscape after 20 and 50 generations]

From Seven Methods for Transforming Corporate Data Into Business Intelligence, by Vasant Dhar and Roger Stein, Prentice-Hall, 1997.

30,000 Foot View of Genetic Search
[Diagram: at each generation, the better part of the population breeds and the worse part drops out]
[Plot: best, average, and worst solution quality versus generations]

Basic Concept of a Genetic Algorithm

Breed solutions to a problem in parallel by:
selecting the better solutions from a population using some evaluation criteria
exchanging information between selected individuals (mating)
occasionally tweaking a few solutions randomly (mutating)

Stop when acceptable solutions are reached, solutions are not improving, or after the algorithm has run for a specified amount of time or for a specified number of generations (a minimal sketch of this loop follows).
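A rough sketch of that loop, assuming the problem-specific pieces (initialization, evaluation, parent selection, crossover, mutation) are supplied as functions; the elitism and default parameter values are illustrative choices, not part of the slides.

```python
import random

def genetic_algorithm(init_population, evaluate, select_parents, crossover, mutate,
                      generations=100, mutation_rate=0.05):
    """Generic GA loop: evaluate, select, mate, occasionally mutate, repeat."""
    population = init_population()
    for _ in range(generations):
        scored = [(evaluate(chrom), chrom) for chrom in population]
        scored.sort(key=lambda pair: pair[0], reverse=True)        # better solutions first
        next_generation = [chrom for _, chrom in scored[:2]]       # elitism: keep the best two
        while len(next_generation) < len(population):
            mom, dad = select_parents(scored)                      # biased toward fitter chromosomes
            child = crossover(mom, dad)
            if random.random() < mutation_rate:                    # occasional random tweak
                child = mutate(child)
            next_generation.append(child)
        population = next_generation
    return max(population, key=evaluate)
```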

How Selection Works: Roulette


[Pie chart: a roulette wheel with one slice per chromosome (Krome 1 through Krome 5); the chromosome with the largest slice is most likely to be chosen, the one with the smallest slice is least likely to be chosen]
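A sketch of fitness-proportionate (roulette-wheel) selection; the fitness values assigned to the Kromes below are made up for illustration.

```python
import random

def roulette_select(population, fitnesses):
    """Pick one chromosome with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0, total)          # where the 'ball' lands on the wheel
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= spin:
            return chrom
    return population[-1]                    # guard against floating-point round-off

# Illustrative fitness values: the fittest Krome gets the biggest slice of the wheel.
kromes = ["Krome 1", "Krome 2", "Krome 3", "Krome 4", "Krome 5"]
fitness = [0.9, 0.6, 0.4, 0.2, 0.1]
picks = [roulette_select(kromes, fitness) for _ in range(1000)]
print({k: picks.count(k) for k in kromes})
```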

Process in Finance
Establish the model basis (e.g., fundamentals, anomalies? momentum? asset-specific versus general?)
Feature engineering (constructing good technical and fundamental indicators)
Search: selection, crossover, mutation
Theory construction
Rollout and continuous evaluation

How Crossover Works


[Figure: Before and After chromosomes with genes for Frequency (e.g. 0.3 0.9), Product Type (e.g. 0 0 1), and Price (e.g. 0.7 0.9); the gene segments beyond the crossover point are exchanged between the two parents, producing two children]
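A sketch of one-point crossover for fixed-length chromosomes, in the spirit of the Frequency / Product Type / Price example above with the genes flattened into a list; the example values are illustrative.

```python
import random

def one_point_crossover(parent_a, parent_b):
    """Exchange the gene segments beyond a randomly chosen crossover point."""
    point = random.randint(1, len(parent_a) - 1)         # never cut at the very ends
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Illustrative chromosomes: [freq_low, freq_high, type bits..., price_low, price_high]
mom = [0.3, 0.9, 0, 0, 1, 0.7, 0.9]
dad = [0.1, 0.8, 0, 1, 0, 0.6, 0.9]
print(one_point_crossover(mom, dad))
```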

How Mutation Works

Before: Frequency 0.3 0.9 0 | Product Type 0 0 1 | Price 0.7 0.9
After: Frequency 0.3 0.9 0 | Product Type 0 0 1 | Price 0.5 0.9
(Mutation randomly perturbs a single gene; here the lower Price bound changes from 0.7 to 0.5.)
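A matching sketch of mutation: pick one gene at random and nudge it slightly, which is the kind of change that moves the Price bound from 0.7 to 0.5 above. The scale and clamping choices are illustrative.

```python
import random

def mutate(chromosome, scale=0.2):
    """Return a copy of the chromosome with one randomly chosen gene perturbed."""
    child = list(chromosome)
    i = random.randrange(len(child))
    if isinstance(child[i], float):                      # nudge a real-valued gene, keep it in [0, 1]
        child[i] = min(1.0, max(0.0, child[i] + random.uniform(-scale, scale)))
    else:
        child[i] = 1 - child[i]                          # flip a bit gene
    return child

print(mutate([0.3, 0.9, 0, 0, 1, 0.7, 0.9]))
```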

SQL-Like (Boolean) Chromosome Semantics for Patterns

Concept → Representation:
Variable | Constant | Operator → Gene (example: 30-day moving average of price, MA30)
Univariate predicate (single "conjunct") → Set of genes (examples: MA30 > 10; MA30 < 10 OR MA30 > 90)
Multivariate predicate (conjunctive pattern) → Chromosome (example: MA30 > 10 AND MA10 < 5)
Multiple patterns → Population
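One hedged way to hold these semantics in code (the class and field names are invented for illustration): a gene is a single predicate on one variable, and a chromosome is the conjunction of its genes.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    variable: str      # e.g. "MA30", a 30-day moving average of price
    op: str            # comparison operator: "<" or ">"
    constant: float

    def matches(self, row):
        value = row[self.variable]
        return value > self.constant if self.op == ">" else value < self.constant

@dataclass
class Chromosome:
    genes: list        # conjunctive pattern: every predicate must hold

    def matches(self, row):
        return all(g.matches(row) for g in self.genes)

# Example: MA30 > 10 AND MA10 < 5
rule = Chromosome([Gene("MA30", ">", 10), Gene("MA10", "<", 5)])
print(rule.matches({"MA30": 12.0, "MA10": 3.0}))   # True
```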

Example 1: Searching for Trading Rules With a Genetic Learning Algorithm


A Population of Patterns
Fields: Asset Trend | Asset Type | Relative Strength | Action | Fitness
Example chromosome: Asset Trend 0.5-0.8, Asset Type {2, 4}, Relative Strength 0.6-0.9, Action = Long, Fitness = 0.84
Pattern: GO LONG WHEN Asset Trend is between 0.5 and 0.8, Asset Type is of type 2 or type 4, and Relative Strength is between 0.6 and 0.9

Fitness is based on multiple objectives, e.g., consistency of returns over time, consistency of returns across assets, etc.

Example Used to Evaluate Alternative Methods: Earnings Surprise Prediction
[Plot: cumulative abnormal returns (roughly -0.08 to +0.10) versus days around the announcement day, for the best-surprise and worst-surprise groups]

The Problem: How Do You Figure Out the Outcome X Days in Advance?

Variables Considered (30)

Estimate revision index (Expectation)
Number of up/down revisions (Expectation)
Cash-flow based ROI (CFROI) and its momentum (Fundamental)
Industry trend (Technical)

Dependent variable: degree of surprise, which can be positive, negative, or none

A Comparison of Rule Learning Algorithms

Several algorithms were tested on the earnings surprise prediction problem
Every case was categorized into one of three categories: positive, negative, neutral (priors: 0.12, 0.75, 0.13)
S&P 500, January 1990 to September 1998
Prediction task: predict the category 20 days prior to announcement
See www.stern.nyu.edu/~vdhar for details and a copy of the paper, or the Journal of Data Mining and Knowledge Discovery, October 2000

The Data
In-sample training: 14,490 records
In-sample testing: 9,650 records (split randomly into multiple test sets)
Out of sample: 12,164 records (contiguous periods chosen manually)

The Objective
Balance between generality and accuracy
Balance confidence and support (a sketch of one way to combine these follows)
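A hedged sketch of combining confidence and support into one fitness number; the paper's actual weighting is not given here, so the 0.7/0.3 split and the tiny dataset below are purely illustrative.

```python
def rule_fitness(matches, prediction, data, w_conf=0.7, w_supp=0.3):
    """Combine confidence and support into one score so a GA can trade them off.

    matches: function row -> bool (does the rule cover this row?)
    prediction: the class the rule predicts, e.g. "positive"
    data: list of dicts, each with a "label" key
    """
    covered = [row for row in data if matches(row)]
    if not covered:
        return 0.0
    support = len(covered) / len(data)                               # share of the data the rule covers
    confidence = sum(row["label"] == prediction for row in covered) / len(covered)
    return w_conf * confidence + w_supp * support

# Tiny illustrative dataset
data = [{"MA30": 12, "label": "positive"}, {"MA30": 4, "label": "neutral"},
        {"MA30": 15, "label": "positive"}, {"MA30": 20, "label": "neutral"}]
print(rule_fitness(lambda r: r["MA30"] > 10, "positive", data))
```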

Dependent Variable Distribution

[Bar chart: earnings surprise priors (%) across the Negative, Neutral, and Positive categories; values shown are 13.35, 12.71, and 73.94]

Goal: Predict negative and positive surprises only

Comparison of Tree Induction, Rule Learning, and Genetic Algorithms


Negative Surprise Rules, mean (sigma), %:
TI:conf 64.39 (5.41)   TI:supp 5.93 (1.00)
RL:conf 63.49 (3.34)   RL:supp 7.39 (3.70)
GA:conf 65.53 (1.39)   GA:supp 13.34 (1.60)

Positive Surprise Rules, mean (sigma), %:
TI:conf 64.24 (5.39)   TI:supp 1.70 (0.82)
RL:conf 65.02 (4.38)   RL:supp 7.86 (2.60)
GA:conf 71.48 (2.09)   GA:supp 10.71 (1.04)

Conclusion: Search Does Matter!

Tradeoffs Among Methods

[Figure: tree induction algorithms, neural networks, and genetic learning algorithms compared on tolerance for problem complexity, speed, and attribution/explainability]

Conclusion: Nothing is perfect!

Other Strengths? Weaknesses?

Strengths
Flexible
Multiple objectives can be encoded into one fitness function
Can model relationships between attributes
Searches a large space of possible solutions

Weaknesses

You have to know your objective/fitness function
Time (search can be slow)
Heuristic: doesn't find the optimal solution

How Do GAs Compare with Tree Induction?

A chromosome can be a whole path or even a tree. The GA can therefore generate and test in parallel.

Why is the GA Useful?

It is good at finding multiple good patterns
It is good at scouring many areas of the search space
It is good at finding nonlinearities and interactions
It allows a high degree of flexibility in defining the evaluation function
It allows us to define explicitly the desired tradeoff between fitness and coverage
But overfitting needs to be avoided

What Did We Learn?

How to construct genetic learners that identify explicit rules in data
The outputs of genetic search can help us generate interesting hypotheses

But, does it work?!

Issues for GA Practitioners

Choosing basic implementation issues:
representation
population size, mutation rate, ...
selection, deletion policies
crossover, mutation operators

Termination criteria
Performance, scalability
The solution is only as good as the evaluation function (often the hardest part)

Benefits of Genetic Algorithms

Concept is easy to understand
Modular, separate from application
Supports multi-objective optimization
Good for noisy environments
Always an answer; the answer gets better with time
Inherently parallel; easily distributed

Benefits of Genetic Algorithms (cont.)

Many ways to speed up and improve a GA-based application as knowledge about the problem domain is gained
Easy to exploit previous or alternate solutions
Flexible building blocks for hybrid applications
Substantial history and range of use

Considering the GA Technology


"Almost eight years ago ... people at Microsoft wrote a program [that] uses some genetic things for finding short code sequences. Windows 2.0 and 3.2, NT, and almost all Microsoft applications products have shipped with pieces of code created by that system."

- Nathan Myhrvold, Microsoft Advanced Technology Group, Wired, September 1995

When to Use a GA
Alternate solutions are too slow or overly complicated
Need an exploratory tool to examine new approaches
Problem is similar to one that has already been successfully solved by using a GA
Want to hybridize with an existing solution
Benefits of the GA technology meet key problem requirements

Questions?

Neural Networks
http://www.youtube.com/watch?v=FZ3401XVYww&feature=related
http://www.youtube.com/watch?v=AjxJabpjDGo

Artificial Neural Networks

Who is stronger and why?

Introduction to Neural Networks

Artificial Intellect (NEUROINFORMATICS): a modern theory about principles and new mathematical models of information processing, based on the biological prototypes and mechanisms of human brain activity

Applied Problems:
Image, sound, and pattern recognition
Decision making
Knowledge discovery
Context-dependent analysis

How does our brain manipulate patterns?

Principles of Brain Processing

A process of pattern recognition and pattern manipulation is based on:

Massive parallelism
The brain, as an information or signal processing system, is composed of a large number of simple processing elements called neurons. These neurons are interconnected by numerous direct links, called connections, and cooperate with each other to perform parallel distributed processing (PDP) in order to solve the desired computational tasks.

Connectionism
The brain is a highly interconnected system of neurons, such that the state of one neuron affects the potential of a large number of other neurons to which it is connected according to weights or strengths. The key idea of this principle is that the functional capacity of biological neural nets is determined mostly not by a single neuron but by its connections.

Associative distributed memory
Storage of information in the brain is supposed to be concentrated in the synaptic connections of the brain's neural network, or more precisely, in the pattern of these connections and the strengths (weights) of the synaptic connections.

Brain Computer: What is it?

The human brain contains a massively interconnected net of 10^10-10^11 (around 10 billion) neurons (cortical cells)

Biological neuron: the simple arithmetic computing element

Biological Neurons

1. Soma or cell body: a large, round central body in which almost all the logical functions of the neuron are realized.
2. The axon (output): a nerve fibre attached to the soma which can serve as a final output channel of the neuron. An axon is usually highly branched.
3. The dendrites (inputs): represent a highly branching tree of fibres. These long, irregularly shaped nerve fibres (processes) are attached to the soma.
4. Synapses: specialized contacts on a neuron which are the termination points for the axons from other neurons.

[Figure: the schematic model of a biological neuron, showing the soma, axon, dendrites, synapses, and the axon and dendrite from other neurons]

Neural networks to the rescue

Neural network: an information processing paradigm inspired by biological nervous systems, such as our brain
Structure: a large number of highly interconnected processing elements (neurons) working together
Like people, they learn from experience (by example)

Inspiration from Biology

Information processing inspired by biological nervous systems
Structure of the nervous system:
A large number of neurons (information processing units) connected together
A neuron's response depends on the states of the other neurons it is connected to and on the strength of those connections. The strengths are assigned based on experience.

Biological Learning

A neuron is a many-inputs / one-output unit. The output can be excited or not excited based on signals from incoming neurons. Resistance in the synapses changes the output.

Hebb's Rule:
If an input of a neuron is repeatedly and persistently causing the neuron to fire, a metabolic change happens in the synapse of that particular input to reduce its resistance

From Real to Artificial


Neural Networks: The Model

The model has two components:
A particular architecture
Number of hidden layers
Number of nodes in the input, output, and hidden layers
Specification of the activation function(s)
The associated set of weights

The weights are learned from the data

Nodes: A Closer Look

[Figure: input values x1, x2, ..., xm with weights w1, w2, ..., wm feed a summing function; a bias b is added and an activation function φ(·) produces the output y]

Nodes: A Closer Look

A node (neuron) is the basic information processing unit of a neural net. It has:
A set of inputs with weights w1, w2, ..., wm, along with a default input called the bias
An adder function (linear combiner) that computes the weighted sum of the inputs:
u = Σ_{j=1..m} w_j x_j
An activation function (squashing function) that limits the amplitude of the neuron output:
y = φ(u + b)

A Simple Node: A Perceptron

A simple activation function: a signing threshold
φ(v) = +1 if v ≥ 0, -1 if v < 0

[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn and a bias b are summed to v; the output is y = φ(v)]
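A minimal sketch of that perceptron node: weighted sum plus bias, passed through the sign threshold above. The input and weight values are illustrative.

```python
def perceptron(inputs, weights, bias):
    """Signing-threshold node: +1 if the weighted sum (plus bias) is >= 0, else -1."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if v >= 0 else -1

# Example: two inputs with hand-picked weights (illustrative values only); prints 1 here.
print(perceptron([0.5, -1.0], weights=[0.8, 0.4], bias=0.1))
```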

Common Activation Function

Sigmoid (logistic) function

[Plot: the squashing function; output (0 to 1) versus weighted sum]

The s-shape adds non-linearity

Neural Network: Architecture

[Figure: data flows from the input layer through the hidden layer(s) to the output layer]

A multi-layer feed-forward network
Each connection has a weight

A Typical Neural Net Architecture

[Figure: input layer (data) → hidden layer (internal processing) → output layer (guesses)]

CONFUSION!!!

Individually, Each Neuron Does a Simple Calculation...

[Figure: a neuron sums up the weighted inputs from the lower layer to produce its output]

...In Order to Decide How to Respond to Inputs Coming Into It...

[Figure: three response curves of output versus sum]

Large positive sums (inputs) result in high output from nodes
Large negative sums result in low output from nodes
Sums near zero result in mediocre output from nodes

...Where the Input Levels Themselves Are Determined by Weighted Connections From Other Neurons!

[Figure: with negative weights, four input-layer values of 1 sum to -1-1-1-1 = -4 and the output node gives 0; with positive weights they sum to 1+1+1+1 = 4 and the output node gives 1]

Changing the weights of a neural network changes the output.

The Net as a Whole Tries to Find Connection Weights Such That Its Error Is Minimized

[Figure: error surface over weights w1 and w2; training moves from the initial weights, past weight values that result in higher errors, to the final weights after training and testing are complete]

Hidden layers allow neural networks to approximate different functions

[Plots: output curves over inputs from -3 to 3 produced by small networks with different hidden layers and weights: a 2-node hidden layer with w = (1.5, -0.75; 1, 1), a 2-node hidden layer with w = (-2, 0.5; 1, 1), a 3-node hidden layer with w = (-0.5, 1, -0.5; 1, 1, 1), and a 4-node hidden layer with w = (3, -0.5, 1, -0.5; 0.25, 1, 1, 1)]
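A hedged sketch of the idea behind those plots: summing a few sigmoid hidden units with different weights bends the output curve into different shapes. The reading of the w = (hidden weights; output weights) notation is an assumption, and the printed curve is only illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def tiny_net(x, hidden_weights, output_weights):
    """One input, a small layer of sigmoid hidden nodes, and a linear output unit."""
    hidden = [sigmoid(w * x) for w in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))

# Two hidden nodes: changing the weights changes the shape of the output curve.
for x in [-3, -1.5, 0, 1.5, 3]:
    print(x, round(tiny_net(x, hidden_weights=[1.5, -0.75], output_weights=[1, 1]), 3))
```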

Properties of the Neural Model

Can be very accurate in approximating complex functions and relationships
The relationship between inputs and outputs is hidden in the connections
The relationship is difficult to interpret, although dithering the inputs can to some extent reveal what the network has learned

Image Recognition
Given noisy and incomplete input data about an object (e.g., a submarine), identify the type of the object
[Figure: inputs feeding a network that produces outputs]

How Does the Recognition System Work?

The system must be TRAINED: we must give it lots of examples (e.g., with different values of temperatures, densities, etc., along with the actual object being recognized); the examples will include noise in the data (e.g., distorted waveforms, incorrect sensor readings, etc., corresponding to field conditions)
The system must be TESTED on data it has never seen before
Consistent accuracy rates in the two situations should give us comfort

Objective of the Recognition System

High classification accuracy
High tolerance for noisy inputs (i.e., graceful degradation in performance)
High response speed

Nuts and Bolts Issues

What transfer function(s) should be used in a neuron?
Linear
Logistic
Other?

Why are weights typically set around zero prior to training?
Why are values normalized between zero and 1 prior to training?

Feature Engineering: Example of Predicting Debt Default

Conceptual basis (risk) → input variables → outputs:
Profitability (financial information): Return on Assets, Return on Net Worth
Liquidity: Working Capital / Total Assets
Capital Structure: Leverage Ratio
Market Presence: Firm Size (log(Total Assets))
Equity Market (market information): Stock Volatility and Return, Market Value of Equity
Outputs: Distance to Default, Default Probability

Correct formulation of the input variables is key

A Linear Prediction Model

[Diagram: input variables Y1, Y2, ..., Yn are grouped into risk components (Profitability, Liquidity, Capital Structure, Market Presence, Equity Market), each producing a sub-model score; the sub-model scores are added to give the financial score]
A Nonlinear Predic)on Model, a la Neural Network


Input Variables Y1 Y2 . . .
Yn

Risk Components:

Internal Components:

Protability Liquidity Capital Structure Market Presence


. . . . . . . . . . . .

Y1 Y2 Yn

Y1 Y2 Yn

Financial Score

Equity Market

How Does a Net Train with Backprop?


Present the inputs and desired outputs
Compare the actual with the desired output for each output node
Starting at the output nodes, modify the weights as follows: if j is the output node and i is the input node feeding into it, with the existing weight between i and j being Wij(old), then
Wij(new) = Wij(old) + Dj*Xi
where Dj is the error term and Xi is the output of the node feeding into node j
If Y is the output of an output node, Dj = Y*(1-Y)*(desired - actual)
For a hidden node, (desired - actual) is replaced by the summation of Dk*Wjk over the k nodes that the hidden node feeds into
(A sketch of these updates in code follows)
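A sketch of those update formulas in code, following the slide's notation; the learning rate argument is an added hedge (practical implementations scale the step), and the slide's formula corresponds to rate = 1.

```python
def output_delta(y, desired):
    """Error term for an output node: Dj = Y * (1 - Y) * (desired - actual)."""
    return y * (1.0 - y) * (desired - y)

def hidden_delta(y, downstream_deltas, downstream_weights):
    """For a hidden node, (desired - actual) is replaced by the sum of Dk * Wjk."""
    return y * (1.0 - y) * sum(d * w for d, w in zip(downstream_deltas, downstream_weights))

def update_weight(w_old, delta_j, x_i, rate=1.0):
    """Wij(new) = Wij(old) + rate * Dj * Xi."""
    return w_old + rate * delta_j * x_i

# One output node producing 0.8 when the desired value was 1.0, fed by Xi = 0.6:
d = output_delta(0.8, 1.0)
print(update_weight(0.5, d, 0.6))
```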

[Figure 5.10: How a neural network finds good weight settings; the error surface for different neural network weights, showing the actual weight value and weight values that result in higher errors]

[Figure 5.11: Too big a step in the learning rate parameter can prevent a neural network from finding good weight settings; with the step size too big, the weight bounces around the error surface but can't settle down to find the minimum error]

[Figure 5.3: A simple neural network; input layer (data) → hidden layer (internal processing) → output layer (guesses)]

[Figure 5.4: The inside of a neurode, which sums up the weighted inputs from the lower layer; the thickness of each input is proportional to its weight, with a solid line denoting a positive weight and an unfilled one a negative weight]

[Figure 5.5: How the sum of inputs affects the output of a neurode; large positive sums result in high output, large negative sums in low output, and sums near zero in mediocre output. Notice how the rate of change of the output is nonlinear across the input range]

[Figure 5.7: A very simple neural network with no hidden layer; the numbers state that if the inputs to each neurode are 1, the output should be 1]

[Figure: with negative weights, inputs of 1 sum to -1-1-1-1 = -4 and the output node gives 0; with positive weights they sum to 1+1+1+1 = 4 and the output node gives 1. Changing the weights of a neural network changes the output.]

[Figure 5.9: A more involved example of setting weight values to get the desired output; with one set of weights the inputs sum to 0+0+1+1 = 2 and the output is 1, while with another they sum to -1-1+0+0 = -2 and the output is 0]

Neural Nets: Process


Identify the input and output features
Transform the inputs and output so that they are in a small range (-1 to 1)
Set up a network with an appropriate topology
Train the network on a representative set of training examples
Use the validation set to choose the set of weights that minimizes the error
Evaluate the network using the test set to see how well it performs
Apply the model generated by the network to predict outcomes for unknown inputs
(A sketch of this process in code follows)
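A hedged sketch of that process with scikit-learn, assuming it is installed; the arrays X and y are made up, and the topology and other settings are illustrative choices rather than recommendations.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier

# Made-up data: 300 examples, 4 input features, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)   # transform inputs into a small range

net = MLPClassifier(hidden_layer_sizes=(5,), activation="logistic",
                    early_stopping=True, max_iter=2000, random_state=0)
net.fit(scaler.transform(X_train), y_train)                  # early stopping uses an internal validation split
print("test accuracy:", net.score(scaler.transform(X_test), y_test))
```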


Neural Nets: Strengths

Can model very complex functions very accurately: non-linearity is built into the model
Handles noisy data quite well
Provides fast predictions (just calculate a mathematical function)
Training time is reasonable, but not low
Flexible: can adapt to different problem definitions


Neural Nets: Weaknesses

A black box: hard to explain or gain intuition from
For complex problems, training time can be quite high
Highly prone to overfitting

S10: Genetic Algorithms and Neural Networks

Shawndra Hill Spring 2013 TR 1:30-3pm and 3-4:30
