
S10:

Genetic Algorithms and Neural Networks


Shawndra Hill Spring 2013 TR 1:30-3pm and 3-4:30



DSS Course Outline


Introduction to Modeling & Data Mining: fundamental concepts and terminology
Data Mining methods: classification (decision trees), association rules, clustering and segmentation, collaborative filtering, genetic algorithms, etc.
Inner workings
Strengths and weaknesses
Evaluation: how to evaluate the results of a data mining solution
Applications: real-world business problems DM can be applied to

YouTube
http://www.youtube.com/watch?v=b1rHS3R0llU&annotation_id=annotation_882813&feature=iv

http://brainz.org/15-real-world-applications-genetic-algorithms/
http://geneticalgorithms.ai-depot.com/Programs.html
http://math.hws.edu/xJava/GA/

Business Applications
Complex scheduling problems
Resource optimization in large factories
Optimizing weights of neural networks (we will learn about neural networks in this or the next session)
Making predictions in finance (we'll get to this!)

Darwin's Insight
Evolution is a series of single steps.
Each single step is simple relative to its predecessor, arising randomly...
...but a sequence of small steps, cumulative selection, is not a random process!
This explains how order and complexity can happen and thrive.

Key Observation
As long as you can estimate the quality of a proposed solution, search does a lot of work for you!
Define the performance variables
Define the features

REMEMBER: OPIM 621 and OPIM 101? Example: A Linear Discontinuous Terrain of Possible Solutions
[Figure: feasible solution region bounded by the constraints X < 50, Y < 75, and AX + BY < 5000]

The Genetic Algorithm

Directed search algorithms based on the mechanics of biological evolution
Developed by John Holland, University of Michigan (1970s):
to understand the adaptive processes of natural systems
to design artificial systems software that retains the robustness of natural systems

Components of a GA
A problem to solve, and ...
Encoding technique (gene, chromosome)
Initialization procedure (creation)
Evaluation function (environment)
Selection of parents (reproduction)
Genetic operators (mutation, recombination)
Parameter settings (practice and art)

The GA Cycle of Reproduction
[Cycle diagram: population → parents → (reproduction) → children → (modification) → modified children → (evaluation) → evaluated children → back into the population; deleted members → (discard)]

Population

Chromosomes could be:
Bit strings (0101 ... 1100)
Real numbers (43.2 -33.1 ... 0.0 89.2)
Permutations of elements (E11 E3 E7 ... E1 E15)
Lists of rules (R1 R2 R3 ... R22 R23)
Program elements (genetic programming)
... any data structure ...

Chromosomes Can Represent Arbitrary Data Structures
[Figure: example chromosomes as arrays (1 0 0), trees (+ - a b c), and Boolean expressions (e.g. X1 < 0.8 combined with AND, i.e. X1 between 0 and 0.8)]

Reproduction
[Diagram: population → parents → (reproduction) → children]
Parents are selected at random, with selection chances biased in relation to chromosome evaluations.

Chromosome Modification
[Diagram: children → (modification) → modified children]
Modifications are stochastically triggered
Operator types are: mutation, crossover (recombination)

A Simple Example
The Traveling Salesman Problem:
Find a tour of a given set of cities so that
each city is visited only once
the total distance traveled is minimized

Representation
Representation is an ordered list of city numbers, known as an order-based GA.

1) London 2) Venice 3) Dunedin 4) Singapore 5) Beijing 6) Phoenix 7) Tokyo 8) Victoria

CityList1 (3 5 7 2 1 6 4 8)
CityList2 (2 5 7 6 8 1 3 4)
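A minimal sketch of this order-based encoding in Python. The city coordinates below are made-up values for illustration only; the tour length is the quantity the GA would try to minimize.

```python
import math

# Hypothetical coordinates for the 8 cities named above (illustrative values only).
CITIES = {1: (0, 0), 2: (10, 35), 3: (25, 5), 4: (60, 20),
          5: (40, 55), 6: (80, 10), 7: (70, 70), 8: (15, 80)}

def tour_length(chromosome):
    """Total distance of the closed tour encoded as an ordered list of city numbers."""
    total = 0.0
    for i, city in enumerate(chromosome):
        nxt = chromosome[(i + 1) % len(chromosome)]   # wrap around to the start
        (x1, y1), (x2, y2) = CITIES[city], CITIES[nxt]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

city_list_1 = [3, 5, 7, 2, 1, 6, 4, 8]
city_list_2 = [2, 5, 7, 6, 8, 1, 3, 4]
print(tour_length(city_list_1), tour_length(city_list_2))
```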

TSP Example: 30 Cities
[Scatter plot: the 30 city locations, x and y from 0 to 100]

Solution i (Distance = 941)
[Plot: TSP30 tour, performance = 941]

Solution j (Distance = 800)
[Plot: TSP30 tour, performance = 800]

Solution k (Distance = 652)
[Plot: TSP30 tour, performance = 652]

Best Solution (Distance = 420)
[Plot: TSP30 tour, performance = 420]

Overview of Performance
[Plot: TSP30 best, worst, and average tour distance versus generations (×1000)]

A Nonlinear Continuous Terrain of Possible Solutions
[Figure: surface Z with one global maximum and several local maxima]

Genetic Algorithms: Directed Random Search


The landscape represents the quality of solutions; the higher, the better
The objective is to find the highest spots and report the description of those spots
The search proceeds as shown below, where each dot represents a solution
[Figure: solutions scattered on the landscape after 20 and 50 generations]

From Seven Methods for Transforming Corporate Data Into Business Intelligence, by Vasant Dhar and Roger Stein, Prentice-Hall, 1997.

30,000 Foot View of Genetic Search
[Diagram: at each generation, the better part of the population breeds and the worse part drops out]
[Plot: best, average, and worst solution quality versus generations]

Basic Concept of a Genetic Algorithm

Breed solutions to a problem in parallel by:
selecting the better solutions from a population using some evaluation criteria
exchanging information between selected individuals (mating)
occasionally tweaking a few solutions randomly (mutating)

Stop when acceptable solutions are reached, solutions are not improving, or after the algorithm has run for a specified amount of time or for a specified number of generations (a minimal sketch of this loop follows).
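A rough sketch of that loop, assuming the problem-specific pieces (initialization, evaluation, parent selection, crossover, mutation) are supplied as functions; the elitism and default parameter values are illustrative choices, not part of the slides.

```python
import random

def genetic_algorithm(init_population, evaluate, select_parents, crossover, mutate,
                      generations=100, mutation_rate=0.05):
    """Generic GA loop: evaluate, select, mate, occasionally mutate, repeat."""
    population = init_population()
    for _ in range(generations):
        scored = [(evaluate(chrom), chrom) for chrom in population]
        scored.sort(key=lambda pair: pair[0], reverse=True)        # better solutions first
        next_generation = [chrom for _, chrom in scored[:2]]       # elitism: keep the best two
        while len(next_generation) < len(population):
            mom, dad = select_parents(scored)                      # biased toward fitter chromosomes
            child = crossover(mom, dad)
            if random.random() < mutation_rate:                    # occasional random tweak
                child = mutate(child)
            next_generation.append(child)
        population = next_generation
    return max(population, key=evaluate)
```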

How Selection Works: Roulette


[Pie chart: a roulette wheel with one slice per chromosome (Krome 1 through Krome 5); the chromosome with the largest slice is most likely to be chosen, the one with the smallest slice is least likely to be chosen]
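A sketch of fitness-proportionate (roulette-wheel) selection; the fitness values assigned to the Kromes below are made up for illustration.

```python
import random

def roulette_select(population, fitnesses):
    """Pick one chromosome with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0, total)          # where the 'ball' lands on the wheel
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= spin:
            return chrom
    return population[-1]                    # guard against floating-point round-off

# Illustrative fitness values: the fittest Krome gets the biggest slice of the wheel.
kromes = ["Krome 1", "Krome 2", "Krome 3", "Krome 4", "Krome 5"]
fitness = [0.9, 0.6, 0.4, 0.2, 0.1]
picks = [roulette_select(kromes, fitness) for _ in range(1000)]
print({k: picks.count(k) for k in kromes})
```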

Process in Finance
Establish the model basis (e.g., fundamentals, anomalies? momentum? asset-specific versus general?)
Feature engineering (constructing good technical and fundamental indicators)
Search: selection, crossover, mutation
Theory construction
Rollout and continuous evaluation

How Crossover Works


[Figure: Before and After chromosomes with genes for Frequency (e.g. 0.3 0.9), Product Type (e.g. 0 0 1), and Price (e.g. 0.7 0.9); the gene segments beyond the crossover point are exchanged between the two parents, producing two children]
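A sketch of one-point crossover for fixed-length chromosomes, in the spirit of the Frequency / Product Type / Price example above with the genes flattened into a list; the example values are illustrative.

```python
import random

def one_point_crossover(parent_a, parent_b):
    """Exchange the gene segments beyond a randomly chosen crossover point."""
    point = random.randint(1, len(parent_a) - 1)         # never cut at the very ends
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Illustrative chromosomes: [freq_low, freq_high, type bits..., price_low, price_high]
mom = [0.3, 0.9, 0, 0, 1, 0.7, 0.9]
dad = [0.1, 0.8, 0, 1, 0, 0.6, 0.9]
print(one_point_crossover(mom, dad))
```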

How Mutation Works

Before: Frequency 0.3 0.9 0 | Product Type 0 0 1 | Price 0.7 0.9
After: Frequency 0.3 0.9 0 | Product Type 0 0 1 | Price 0.5 0.9
(Mutation randomly perturbs a single gene; here the lower Price bound changes from 0.7 to 0.5.)
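A matching sketch of mutation: pick one gene at random and nudge it slightly, which is the kind of change that moves the Price bound from 0.7 to 0.5 above. The scale and clamping choices are illustrative.

```python
import random

def mutate(chromosome, scale=0.2):
    """Return a copy of the chromosome with one randomly chosen gene perturbed."""
    child = list(chromosome)
    i = random.randrange(len(child))
    if isinstance(child[i], float):                      # nudge a real-valued gene, keep it in [0, 1]
        child[i] = min(1.0, max(0.0, child[i] + random.uniform(-scale, scale)))
    else:
        child[i] = 1 - child[i]                          # flip a bit gene
    return child

print(mutate([0.3, 0.9, 0, 0, 1, 0.7, 0.9]))
```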

SQL-Like (Boolean) Chromosome Semantics for Patterns

Concept → Representation:
Variable | Constant | Operator → Gene (example: 30-day moving average of price, MA30)
Univariate predicate (single "conjunct") → Set of genes (examples: MA30 > 10; MA30 < 10 OR MA30 > 90)
Multivariate predicate (conjunctive pattern) → Chromosome (example: MA30 > 10 AND MA10 < 5)
Multiple patterns → Population
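One hedged way to hold these semantics in code (the class and field names are invented for illustration): a gene is a single predicate on one variable, and a chromosome is the conjunction of its genes.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    variable: str      # e.g. "MA30", a 30-day moving average of price
    op: str            # comparison operator: "<" or ">"
    constant: float

    def matches(self, row):
        value = row[self.variable]
        return value > self.constant if self.op == ">" else value < self.constant

@dataclass
class Chromosome:
    genes: list        # conjunctive pattern: every predicate must hold

    def matches(self, row):
        return all(g.matches(row) for g in self.genes)

# Example: MA30 > 10 AND MA10 < 5
rule = Chromosome([Gene("MA30", ">", 10), Gene("MA10", "<", 5)])
print(rule.matches({"MA30": 12.0, "MA10": 3.0}))   # True
```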

Example 1: Searching for Trading Rules With a Genetic Learning Algorithm


A Population of Patterns
Fields: Asset Trend | Asset Type | Relative Strength | Action | Fitness
Example chromosome: Asset Trend 0.5-0.8, Asset Type {2, 4}, Relative Strength 0.6-0.9, Action = Long, Fitness = 0.84
Pattern: GO LONG WHEN Asset Trend is between 0.5 and 0.8, Asset Type is of type 2 or type 4, and Relative Strength is between 0.6 and 0.9

Fitness is based on multiple objectives, e.g., consistency of returns over time, consistency of returns across assets, etc.

Example Used to Evaluate Alternative Methods: Earnings Surprise Prediction
[Plot: cumulative abnormal returns (roughly -0.08 to +0.10) versus days around the announcement day, for the best-surprise and worst-surprise groups]

The Problem: How Do You Figure Out the Outcome X Days in Advance?

Variables Considered (30)

Estimate revision index (Expectation)
Number of up/down revisions (Expectation)
Cash-flow based ROI (CFROI) and its momentum (Fundamental)
Industry trend (Technical)

Dependent variable: degree of surprise, which can be positive, negative, or none

A Comparison of Rule Learning Algorithms

Several algorithms were tested on the earnings surprise prediction problem
Every case was categorized into one of three categories: positive, negative, neutral (priors: 0.12, 0.75, 0.13)
S&P 500, January 1990 to September 1998
Prediction task: predict the category 20 days prior to announcement
See www.stern.nyu.edu/~vdhar for details and a copy of the paper, or the Journal of Data Mining and Knowledge Discovery, October 2000

The Data
In-sample training: 14,490 records
In-sample testing: 9,650 records (split randomly into multiple test sets)
Out of sample: 12,164 records (contiguous periods chosen manually)

The Objective
Balance between generality and accuracy
Balance confidence and support (a sketch of one way to combine these follows)
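A hedged sketch of combining confidence and support into one fitness number; the paper's actual weighting is not given here, so the 0.7/0.3 split and the tiny dataset below are purely illustrative.

```python
def rule_fitness(matches, prediction, data, w_conf=0.7, w_supp=0.3):
    """Combine confidence and support into one score so a GA can trade them off.

    matches: function row -> bool (does the rule cover this row?)
    prediction: the class the rule predicts, e.g. "positive"
    data: list of dicts, each with a "label" key
    """
    covered = [row for row in data if matches(row)]
    if not covered:
        return 0.0
    support = len(covered) / len(data)                               # share of the data the rule covers
    confidence = sum(row["label"] == prediction for row in covered) / len(covered)
    return w_conf * confidence + w_supp * support

# Tiny illustrative dataset
data = [{"MA30": 12, "label": "positive"}, {"MA30": 4, "label": "neutral"},
        {"MA30": 15, "label": "positive"}, {"MA30": 20, "label": "neutral"}]
print(rule_fitness(lambda r: r["MA30"] > 10, "positive", data))
```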

Dependent Variable Distribution

[Bar chart: earnings surprise priors (%) across the Negative, Neutral, and Positive categories; values shown are 13.35, 12.71, and 73.94]

Goal: Predict negative and positive surprises only

Comparison of Tree Induction, Rule Learning, and Genetic Algorithms


Negative Surprise Rules, mean (sigma), %:
TI:conf 64.39 (5.41)   TI:supp 5.93 (1.00)
RL:conf 63.49 (3.34)   RL:supp 7.39 (3.70)
GA:conf 65.53 (1.39)   GA:supp 13.34 (1.60)

Positive Surprise Rules, mean (sigma), %:
TI:conf 64.24 (5.39)   TI:supp 1.70 (0.82)
RL:conf 65.02 (4.38)   RL:supp 7.86 (2.60)
GA:conf 71.48 (2.09)   GA:supp 10.71 (1.04)

Conclusion: Search Does Matter!

Tradeoffs Among Methods

[Figure: tree induction algorithms, neural networks, and genetic learning algorithms compared on tolerance for problem complexity, speed, and attribution/explainability]

Conclusion: Nothing is perfect!

Other Strengths? Weaknesses?

Strengths
Flexible
Multiple objectives can be encoded into one fitness function
Can model relationships between attributes
Searches a large space of possible solutions

Weaknesses

You have to know your objective/fitness function
Time (search can be slow)
Heuristic: doesn't find the optimal solution

How Do GAs Compare with Tree Induction?

A chromosome can be a whole path or even a tree. The GA can therefore generate and test in parallel.

Why is the GA Useful?

It is good at finding multiple good patterns
It is good at scouring many areas of the search space
It is good at finding nonlinearities and interactions
It allows a high degree of flexibility in defining the evaluation function
It allows us to define explicitly the desired tradeoff between fitness and coverage
But overfitting needs to be avoided

What Did We Learn?

How to construct genetic learners that identify explicit rules in data
The outputs of genetic search can help us generate interesting hypotheses

But, does it work?!

Issues for GA Practitioners

Choosing basic implementation issues:
representation
population size, mutation rate, ...
selection, deletion policies
crossover, mutation operators

Termination criteria
Performance, scalability
The solution is only as good as the evaluation function (often the hardest part)

Benefits of Genetic Algorithms

Concept is easy to understand
Modular, separate from application
Supports multi-objective optimization
Good for noisy environments
Always an answer; the answer gets better with time
Inherently parallel; easily distributed

Benefits of Genetic Algorithms (cont.)

Many ways to speed up and improve a GA-based application as knowledge about the problem domain is gained
Easy to exploit previous or alternate solutions
Flexible building blocks for hybrid applications
Substantial history and range of use

Considering the GA Technology


"Almost eight years ago ... people at Microsoft wrote a program [that] uses some genetic things for finding short code sequences. Windows 2.0 and 3.2, NT, and almost all Microsoft applications products have shipped with pieces of code created by that system."

- Nathan Myhrvold, Microsoft Advanced Technology Group, Wired, September 1995

When to Use a GA
Alternate solutions are too slow or overly complicated
Need an exploratory tool to examine new approaches
Problem is similar to one that has already been successfully solved by using a GA
Want to hybridize with an existing solution
Benefits of the GA technology meet key problem requirements

Questions?

Neural Networks
http://www.youtube.com/watch?v=FZ3401XVYww&feature=related
http://www.youtube.com/watch?v=AjxJabpjDGo

Artificial Neural Networks

Who is stronger and why?

Introduction to Neural Networks

Artificial Intellect (NEUROINFORMATICS): a modern theory about principles and new mathematical models of information processing, based on the biological prototypes and mechanisms of human brain activity

Applied Problems:
Image, sound, and pattern recognition
Decision making
Knowledge discovery
Context-dependent analysis

How does our brain manipulate patterns?

Principles of Brain Processing

A process of pattern recognition and pattern manipulation is based on:

Massive parallelism
The brain, as an information or signal processing system, is composed of a large number of simple processing elements called neurons. These neurons are interconnected by numerous direct links, called connections, and cooperate with each other to perform parallel distributed processing (PDP) in order to solve the desired computational tasks.

Connectionism
The brain is a highly interconnected system of neurons, such that the state of one neuron affects the potential of a large number of other neurons to which it is connected according to weights or strengths. The key idea of this principle is that the functional capacity of biological neural nets is determined mostly not by a single neuron but by its connections.

Associative distributed memory
Storage of information in the brain is supposed to be concentrated in the synaptic connections of the brain's neural network, or more precisely, in the pattern of these connections and the strengths (weights) of the synaptic connections.

Brain Computer: What is it?

The human brain contains a massively interconnected net of 10^10-10^11 (around 10 billion) neurons (cortical cells)

Biological neuron: the simple arithmetic computing element

Biological Neurons

1. Soma or cell body: a large, round central body in which almost all the logical functions of the neuron are realized.
2. The axon (output): a nerve fibre attached to the soma which can serve as a final output channel of the neuron. An axon is usually highly branched.
3. The dendrites (inputs): represent a highly branching tree of fibres. These long, irregularly shaped nerve fibres (processes) are attached to the soma.
4. Synapses: specialized contacts on a neuron which are the termination points for the axons from other neurons.

[Figure: the schematic model of a biological neuron, showing the soma, axon, dendrites, synapses, and the axon and dendrite from other neurons]

Neural networks to the rescue

Neural network: an information processing paradigm inspired by biological nervous systems, such as our brain
Structure: a large number of highly interconnected processing elements (neurons) working together
Like people, they learn from experience (by example)

Inspiration from Biology

Information processing inspired by biological nervous systems
Structure of the nervous system:
A large number of neurons (information processing units) connected together
A neuron's response depends on the states of the other neurons it is connected to and on the strength of those connections. The strengths are assigned based on experience.

Biological Learning

A neuron is a many-inputs / one-output unit. The output can be excited or not excited based on signals from incoming neurons. Resistance in the synapses changes the output.

Hebb's Rule:
If an input of a neuron is repeatedly and persistently causing the neuron to fire, a metabolic change happens in the synapse of that particular input to reduce its resistance

From Real to Artificial


Neural Networks: The Model

The model has two components:
A particular architecture
Number of hidden layers
Number of nodes in the input, output, and hidden layers
Specification of the activation function(s)
The associated set of weights

The weights are learned from the data

Nodes: A Closer Look

[Figure: input values x1, x2, ..., xm with weights w1, w2, ..., wm feed a summing function; a bias b is added and an activation function φ(·) produces the output y]

Nodes: A Closer Look

A node (neuron) is the basic information processing unit of a neural net. It has:
A set of inputs with weights w1, w2, ..., wm, along with a default input called the bias
An adder function (linear combiner) that computes the weighted sum of the inputs:
u = Σ_{j=1..m} w_j x_j
An activation function (squashing function) that limits the amplitude of the neuron output:
y = φ(u + b)

A Simple Node: A Perceptron

A simple activation function: a signing threshold
φ(v) = +1 if v ≥ 0, -1 if v < 0

[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn and a bias b are summed to v; the output is y = φ(v)]
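A minimal sketch of that perceptron node: weighted sum plus bias, passed through the sign threshold above. The input and weight values are illustrative.

```python
def perceptron(inputs, weights, bias):
    """Signing-threshold node: +1 if the weighted sum (plus bias) is >= 0, else -1."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if v >= 0 else -1

# Example: two inputs with hand-picked weights (illustrative values only); prints 1 here.
print(perceptron([0.5, -1.0], weights=[0.8, 0.4], bias=0.1))
```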

Common Activation Function

Sigmoid (logistic) function

[Plot: the squashing function; output (0 to 1) versus weighted sum]

The s-shape adds non-linearity

Neural Network: Architecture

[Figure: data flows from the input layer through the hidden layer(s) to the output layer]

A multi-layer feed-forward network
Each connection has a weight

A Typical Neural Net Architecture

[Figure: input layer (data) → hidden layer (internal processing) → output layer (guesses)]

CONFUSION!!!

Individually, Each Neuron Does a Simple Calculation...

[Figure: a neuron sums up the weighted inputs from the lower layer to produce its output]

...In Order to Decide How to Respond to Inputs Coming Into It...

[Figure: three response curves of output versus sum]

Large positive sums (inputs) result in high output from nodes
Large negative sums result in low output from nodes
Sums near zero result in mediocre output from nodes

...Where the Input Levels Themselves Are Determined by Weighted Connections From Other Neurons!

[Figure: with negative weights, four input-layer values of 1 sum to -1-1-1-1 = -4 and the output node gives 0; with positive weights they sum to 1+1+1+1 = 4 and the output node gives 1]

Changing the weights of a neural network changes the output.

The Net as a Whole Tries to Find Connection Weights Such That Its Error Is Minimized

[Figure: error surface over weights w1 and w2; training moves from the initial weights, past weight values that result in higher errors, to the final weights after training and testing are complete]

Hidden layers allow neural networks to approximate different functions

[Plots: output curves over inputs from -3 to 3 produced by small networks with different hidden layers and weights: a 2-node hidden layer with w = (1.5, -0.75; 1, 1), a 2-node hidden layer with w = (-2, 0.5; 1, 1), a 3-node hidden layer with w = (-0.5, 1, -0.5; 1, 1, 1), and a 4-node hidden layer with w = (3, -0.5, 1, -0.5; 0.25, 1, 1, 1)]
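A hedged sketch of the idea behind those plots: summing a few sigmoid hidden units with different weights bends the output curve into different shapes. The reading of the w = (hidden weights; output weights) notation is an assumption, and the printed curve is only illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def tiny_net(x, hidden_weights, output_weights):
    """One input, a small layer of sigmoid hidden nodes, and a linear output unit."""
    hidden = [sigmoid(w * x) for w in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))

# Two hidden nodes: changing the weights changes the shape of the output curve.
for x in [-3, -1.5, 0, 1.5, 3]:
    print(x, round(tiny_net(x, hidden_weights=[1.5, -0.75], output_weights=[1, 1]), 3))
```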

Properties of the Neural Model

Can be very accurate in approximating complex functions and relationships
The relationship between inputs and outputs is hidden in the connections
The relationship is difficult to interpret, although dithering the inputs can to some extent reveal what the network has learned

Image Recognition
Given noisy and incomplete input data about an object (e.g., a submarine), identify the type of the object
[Figure: inputs feeding a network that produces outputs]

How Does the Recognition System Work?

The system must be TRAINED: we must give it lots of examples (e.g., with different values of temperatures, densities, etc., along with the actual object being recognized); the examples will include noise in the data (e.g., distorted waveforms, incorrect sensor readings, etc., corresponding to field conditions)
The system must be TESTED on data it has never seen before
Consistent accuracy rates in the two situations should give us comfort

Objective of the Recognition System

High classification accuracy
High tolerance for noisy inputs (i.e., graceful degradation in performance)
High response speed

Nuts and Bolts Issues

What transfer function(s) should be used in a neuron?
Linear
Logistic
Other?

Why are weights typically set around zero prior to training?
Why are values normalized between zero and 1 prior to training?

Feature Engineering: Example of Predicting Debt Default

Conceptual basis (risk) → input variables → outputs:
Profitability (financial information): Return on Assets, Return on Net Worth
Liquidity: Working Capital / Total Assets
Capital Structure: Leverage Ratio
Market Presence: Firm Size (log(Total Assets))
Equity Market (market information): Stock Volatility and Return, Market Value of Equity
Outputs: Distance to Default, Default Probability

Correct formulation of the input variables is key

A Linear Prediction Model

[Diagram: input variables Y1, Y2, ..., Yn are grouped into risk components (Profitability, Liquidity, Capital Structure, Market Presence, Equity Market), each producing a sub-model score; the sub-model scores are added to give the financial score]
A Nonlinear Predic)on Model, a la Neural Network


Input Variables Y1 Y2 . . .
Yn

Risk Components:

Internal Components:

Protability Liquidity Capital Structure Market Presence


. . . . . . . . . . . .

Y1 Y2 Yn

Y1 Y2 Yn

Financial Score

Equity Market

How Does a Net Train with Backprop?


Present the inputs and desired outputs
Compare the actual with the desired output for each output node
Starting at the output nodes, modify the weights as follows: if j is the output node and i is the input node feeding into it, with the existing weight between i and j being Wij(old), then
Wij(new) = Wij(old) + Dj*Xi
where Dj is the error term and Xi is the output of the node feeding into node j
If Y is the output of an output node, Dj = Y*(1-Y)*(desired - actual)
For a hidden node, (desired - actual) is replaced by the summation of Dk*Wjk over the k nodes that the hidden node feeds into
(A sketch of these updates in code follows)
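A sketch of those update formulas in code, following the slide's notation; the learning rate argument is an added hedge (practical implementations scale the step), and the slide's formula corresponds to rate = 1.

```python
def output_delta(y, desired):
    """Error term for an output node: Dj = Y * (1 - Y) * (desired - actual)."""
    return y * (1.0 - y) * (desired - y)

def hidden_delta(y, downstream_deltas, downstream_weights):
    """For a hidden node, (desired - actual) is replaced by the sum of Dk * Wjk."""
    return y * (1.0 - y) * sum(d * w for d, w in zip(downstream_deltas, downstream_weights))

def update_weight(w_old, delta_j, x_i, rate=1.0):
    """Wij(new) = Wij(old) + rate * Dj * Xi."""
    return w_old + rate * delta_j * x_i

# One output node producing 0.8 when the desired value was 1.0, fed by Xi = 0.6:
d = output_delta(0.8, 1.0)
print(update_weight(0.5, d, 0.6))
```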

[Figure 5.10: How a neural network finds good weight settings; the error surface for different neural network weights, showing the actual weight value and weight values that result in higher errors]

[Figure 5.11: Too big a step in the learning rate parameter can prevent a neural network from finding good weight settings; with the step size too big, the weight bounces around the error surface but can't settle down to find the minimum error]

[Figure 5.3: A simple neural network; input layer (data) → hidden layer (internal processing) → output layer (guesses)]

[Figure 5.4: The inside of a neurode, which sums up the weighted inputs from the lower layer; the thickness of each input is proportional to its weight, with a solid line denoting a positive weight and an unfilled one a negative weight]

[Figure 5.5: How the sum of inputs affects the output of a neurode; large positive sums result in high output, large negative sums in low output, and sums near zero in mediocre output. Notice how the rate of change of the output is nonlinear across the input range]

[Figure 5.7: A very simple neural network with no hidden layer; the numbers state that if the inputs to each neurode are 1, the output should be 1]

[Figure: with negative weights, inputs of 1 sum to -1-1-1-1 = -4 and the output node gives 0; with positive weights they sum to 1+1+1+1 = 4 and the output node gives 1. Changing the weights of a neural network changes the output.]

[Figure 5.9: A more involved example of setting weight values to get the desired output; with one set of weights the inputs sum to 0+0+1+1 = 2 and the output is 1, while with another they sum to -1-1+0+0 = -2 and the output is 0]

Neural Nets: Process


Identify the input and output features
Transform the inputs and output so that they are in a small range (-1 to 1)
Set up a network with an appropriate topology
Train the network on a representative set of training examples
Use the validation set to choose the set of weights that minimizes the error
Evaluate the network using the test set to see how well it performs
Apply the model generated by the network to predict outcomes for unknown inputs
(A sketch of this process in code follows)
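A hedged sketch of that process with scikit-learn, assuming it is installed; the arrays X and y are made up, and the topology and other settings are illustrative choices rather than recommendations.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier

# Made-up data: 300 examples, 4 input features, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)   # transform inputs into a small range

net = MLPClassifier(hidden_layer_sizes=(5,), activation="logistic",
                    early_stopping=True, max_iter=2000, random_state=0)
net.fit(scaler.transform(X_train), y_train)                  # early stopping uses an internal validation split
print("test accuracy:", net.score(scaler.transform(X_test), y_test))
```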


Neural Nets: Strengths

Can model very complex functions very accurately: non-linearity is built into the model
Handles noisy data quite well
Provides fast predictions (just calculate a mathematical function)
Training time is reasonable, but not low
Flexible: can adapt to different problem definitions


Neural Nets: Weaknesses

A black box: hard to explain or gain intuition from
For complex problems, training time can be quite high
Highly prone to overfitting

S10: Genetic Algorithms and Neural Networks

Shawndra Hill Spring 2013 TR 1:30-3pm and 3-4:30
