
The Differential Evolution Algorithm

Presentation based on the book: Differential Evolution

Chiriac Igor

Summary
The Motivation for Differential Evolution
Introduction to Parameter Optimization
Local Versus Global Optimization

The Differential Evolution Algorithm

Overview
Parameter Optimization
Initialization
Base Vector Selection
Differential Mutation

Overview
Optimization - the attempt to maximize a system's desirable
properties while simultaneously minimizing its undesirable
characteristics
Function of the tuning knob:
f(x) = noise power / signal power

Objective function - its most extreme value represents the
optimization goal
Cost function - when the minimum is sought
Error function - the minimum being sought is zero
Fitness function - describes properties to be maximized

Objective function attributes

Parameter quantization
Parameter dependence
Dimensionality
Modality
Time dependence
Noise
Constraints
Differentiability


Classifying optimizers

Single-point
  Derivative-based: steepest descent, conjugate gradient, quasi-Newton
  Derivative-free: random walk, Hooke-Jeeves

Multi-point
  Derivative-based: multi-start and clustering techniques
  Derivative-free: Nelder-Mead, evolutionary algorithms, differential evolution

Single-point, derivative-based optimization

Classical derivative-based optimization can be effective as long as
the objective function fulfills two requirements:
1. The objective function must be twice differentiable.
2. The objective function must be uni-modal.


Example
A simple example of a differentiable and uni-modal objective function is:

f(x1, x2) = 10^(-(x1^2 + 3*x2^2))

Fig1: Representation of the function

Fig2: The method of steepest descent first computes the negative gradient,
then takes a step in the direction indicated.
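Below is a minimal Python sketch of a steepest-descent step; because the exact
example function above had to be reconstructed, the code uses an assumed
stand-in bowl g(x1, x2) = x1^2 + 3*x2^2 with an arbitrary step size:

import numpy as np

# Minimal sketch of steepest descent on an assumed differentiable,
# uni-modal stand-in g(x1, x2) = x1^2 + 3*x2^2; step size is arbitrary.
def grad_g(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([2.0, 1.0])       # arbitrary starting point (assumption)
alpha = 0.1                    # fixed step size (assumption)
for _ in range(50):
    x = x - alpha * grad_g(x)  # step along the negative gradient
print(x)                       # approaches the minimum at (0, 0)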

Differentiability of an objective function

Constraining the objective function may create regions that are not
differentiable
If the objective function is a computer program, conditional
branches make it non-differentiable
If the objective function is subjective, an analytic formula is not
possible
In some cases the objective function is not available in explicit form


Brute Force Search


Known as enumeration, the brute force method visits all grid points in a
bounded region while storing the current best point in memory
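A minimal sketch of brute-force enumeration on a grid; the objective, bounds
and grid resolution below are illustrative assumptions:

import numpy as np

# Minimal sketch of enumeration: visit every grid point in a bounded
# region while keeping the current best point in memory.
def f(x1, x2):
    return x1**2 + 3 * x2**2   # illustrative objective (assumption)

best_x, best_f = None, np.inf
for x1 in np.linspace(-2.0, 2.0, 41):        # 41 grid points per axis
    for x2 in np.linspace(-2.0, 2.0, 41):
        value = f(x1, x2)
        if value < best_f:                   # remember the best point so far
            best_x, best_f = (x1, x2), value
print(best_x, best_f)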


Hooke and Jeeves algorithm


Known as a direct search or pattern search
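A simplified sketch of the pattern-search idea: probe each coordinate
direction around the current point and shrink the step when nothing improves.
The objective, start point and step sizes are assumptions, and the
pattern-move acceleration of the full Hooke-Jeeves method is omitted:

import numpy as np

# Simplified pattern-search sketch: exploratory probes along each axis,
# halving the step size when no probe improves the current point.
def f(x):
    return x[0]**2 + 3 * x[1]**2      # illustrative objective (assumption)

x = np.array([2.0, 1.0])
step = 0.5
while step > 1e-6:
    improved = False
    for j in range(len(x)):
        for delta in (step, -step):
            trial = x.copy()
            trial[j] += delta
            if f(trial) < f(x):       # accept the first improving probe
                x, improved = trial, True
                break
    if not improved:
        step *= 0.5                   # no improvement: reduce the step size
print(x)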


Multi-modal functions

pose a starting point problem

there is a finite probability that the random walk will eventually
generate a new and better point in a basin of attraction other than
the one containing the current base point


Simulated Annealing
modifies the greedy criterion to accept some uphill moves while
continuing to accept all downhill moves
one drawback is that special effort may be required to find an
annealing schedule that lowers the temperature T at the right rate
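A minimal sketch of the modified greedy (acceptance) criterion; the 1-D
objective, perturbation width and cooling rate are illustrative assumptions:

import math
import random

# Minimal simulated-annealing sketch: always accept downhill moves, accept
# uphill moves with probability exp(-delta/T), and lower T geometrically.
def f(x):
    return x**2 + 2.0 * math.sin(5.0 * x)    # small multi-modal example (assumption)

x, T = 3.0, 1.0
for _ in range(1000):
    trial = x + random.gauss(0.0, 0.5)        # random perturbation (assumption)
    delta = f(trial) - f(x)
    if delta <= 0 or random.random() < math.exp(-delta / T):
        x = trial                             # accept downhill, or uphill by chance
    T *= 0.995                                # cooling schedule (rate is an assumption)
print(x, f(x))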


Multi-Point, Derivative-Based Methods

Multi-start techniques - restart the optimization process from
different initial points
Clustering methods - apply a clustering algorithm to identify those
sample points that belong to the same basin of attraction


Multi-Point, Derivative-Free Methods

Because they mimic Darwinian evolution, ESs, GAs and DE are often
referred to as evolutionary algorithms

ES is an effective continuous optimizer because it encodes
parameters as floating-point numbers and manipulates them with
arithmetic operations

GAs are often better suited for combinatorial optimization because
they encode parameters as bit strings and modify them with logical
operators


Evolution Strategies Algorithm


Nelder and Mead

tries to solve the step size problem by allowing the step size to
expand or contract as needed

the algorithm begins by forming a D-dimensional polyhedron, or
simplex, of D + 1 points, xi, i = 0, 1, ..., D, that are randomly
distributed throughout the problem space

to obtain a new trial point, xr, the worst point, xD, is reflected
through the opposite face of the polyhedron using a weighting
factor, F1: xr = xD + F1 * (xm - xD)

the vector, xm, is the centroid of the face opposite xD:
xm = (1/D) * sum(xi for i = 0, ..., D-1)
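A minimal sketch of the reflection step described above; the objective and
the choice F1 = 2 (plain reflection through the centroid) are assumptions:

import numpy as np

# Sketch of the Nelder-Mead reflection step: compute the centroid x_m of
# the D vertices opposite the worst vertex, then reflect the worst vertex.
def f(x):
    return x[0]**2 + 3 * x[1]**2            # illustrative objective (assumption)

D = 2
simplex = np.random.rand(D + 1, D)          # D+1 random points in D dimensions
simplex = simplex[np.argsort([f(p) for p in simplex])]  # best first, worst last
x_worst = simplex[-1]
x_m = simplex[:-1].mean(axis=0)             # centroid of the opposite face
F1 = 2.0                                    # weighting factor (assumption)
x_r = x_worst + F1 * (x_m - x_worst)        # reflected trial point
if f(x_r) < f(x_worst):
    simplex[-1] = x_r                       # replace the worst vertex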


Nelder and Mead

One advantage of the Nelder-Mead method is that the simplex can
shrink as well as expand to adapt to the current objective function
surface

The Nelder-Mead algorithm restricts the number of sample points to
D + 1


Differential Evolution

Developed by Price and Storn

DE has proven itself in competitions like the IEEE's International
Contest on Evolutionary Optimization (ICEO) in 1996 and 1997, and in
the real world on a broad variety of applications

DE is a population-based optimizer

DE perturbs vectors with the scaled difference of two randomly
selected population vectors
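A minimal sketch of one classic DE variant (DE/rand/1/bin), anticipating the
initialization, mutation, crossover and selection slides that follow; the
sphere objective and the settings Np, F, Cr and the bounds are illustrative
assumptions:

import numpy as np

# Minimal DE/rand/1/bin sketch: perturb a randomly chosen base vector with
# the scaled difference of two other random vectors, cross over with the
# target, and keep the trial only if it is at least as good.
def f(x):
    return np.sum(x**2)                      # sphere function (assumption)

rng = np.random.default_rng(0)
Np, D, F, Cr = 20, 5, 0.8, 0.9               # illustrative settings
lower, upper = -5.0, 5.0
pop = rng.uniform(lower, upper, size=(Np, D))
cost = np.array([f(x) for x in pop])

for generation in range(200):
    for i in range(Np):
        # choose r0, r1, r2 distinct from each other and from the target i
        r0, r1, r2 = rng.choice([j for j in range(Np) if j != i], size=3, replace=False)
        mutant = pop[r0] + F * (pop[r1] - pop[r2])   # differential mutation
        cross = rng.random(D) < Cr
        cross[rng.integers(D)] = True                # copy at least one mutant parameter
        trial = np.where(cross, mutant, pop[i])      # binomial (uniform) crossover
        trial_cost = f(trial)
        if trial_cost <= cost[i]:                    # one-to-one survivor selection
            pop[i], cost[i] = trial, trial_cost

print(pop[np.argmin(cost)], cost.min())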


DE INITIALIZATION

Fig5: Initializing the DE population



DE Mutation

Fig8: Mutation

Crossover
crossover builds trial vectors out of parameter values that have
been copied from two different vectors
DE crosses each vector with a mutant vector
the crossover probability, Cr ∈ [0, 1], is a user-defined value


Fig10: Crossover


DE Selection

Fig.9: Selection

Visualizing DE

Parameter Representation
DE encodes all parameters as floating-point numbers.
Advantages over the traditional GA bit-flipping approach:

ease of use
efficient memory utilization
lower computational complexity - scales better on large problems
lower computational effort - faster convergence
greater freedom in designing a mutation distribution

Bit Strings Representation

GAs encode a continuous parameter x as an integer string of q bits

even on uni-modal objective functions, the computational effort to
optimize a parameter is a function of l that depends on the
parameter's value

the standard GA coding scheme imposes multi-modality on even
uni-modal objective functions


Floating-Point

unlike the standard GA representation, the floating-point format
retains only a limited number of significant digits

floating-point formats span a vast dynamic range with minimal
resources

floating-point can be handled efficiently because most modern
programming languages support common floating-point formats

the encoding process is transparent to the user


Initialization

DE requires a predefined probability distribution function, or PDF,
to seed the initial population

when parameters exhibit no obvious limits, their upper and lower
bounds, bj,U and bj,L, respectively, should be set so that the
initial bounding box they define encompasses the optimum
Fig12: Far initialization shrinks the initial bounding box so that it no longer
contains the optimum, x*
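A minimal sketch of the seeding described above: each parameter is drawn
uniformly inside the bounding box defined by bj,L and bj,U. The concrete
bounds, Np and D are assumptions:

import numpy as np

# Minimal DE initialization sketch: x_j = b_L[j] + rand(0,1) * (b_U[j] - b_L[j]),
# so the initial bounding box should be chosen to contain the optimum.
rng = np.random.default_rng()
Np, D = 20, 3                                  # illustrative sizes (assumption)
b_L = np.array([-10.0, -10.0, -10.0])          # lower bounds bj,L (assumption)
b_U = np.array([10.0, 10.0, 10.0])             # upper bounds bj,U (assumption)
pop = b_L + rng.random((Np, D)) * (b_U - b_L)  # uniform initial population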


The effects of far-initializing DE

Results are 100-trial averages obtained using classic DE with
F = Cr = 0.9 and r0 ≠ r1 ≠ r2 ≠ i


Initial Distribution
Uniform distributions - preferred because they best reflect the lack
of knowledge about the optimum's location
Gaussian distribution - may prove faster if the optimum's location
is well known, although it may increase the probability that the
population will converge prematurely

Uniform Distribution
distributing initial points with random uniformity is not mandatory,
but experience has shown randj(0,1) to be very effective in this
regard

Halton point sets are based on prime numbers; these pseudo-random
distributions are both uniform and irregular
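A minimal sketch of generating a two-dimensional Halton point set by radical
inversion in the prime bases 2 and 3; the number of points is an assumption:

# Minimal Halton sketch: coordinate j of point i is the radical inverse
# of i in the j-th prime base (2 and 3 for two dimensions).
def radical_inverse(i, base):
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f     # append the next base-`base` digit after the point
        i //= base
        f /= base
    return inv

halton_2d = [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, 201)]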


Uniform Distribution

Fig14: Two hundred points distributed with random uniformity (left) and according to a two-dimensional Halton point set (right)



Gaussian Distribution

a population far-initialized with a Gaussian distribution is less
likely to be successful on multi-modal functions than a uniformly
distributed one

when the objective function is multi-modal, it is important to
disperse the initial population widely enough to contain the optimum

in every case where uniform distributions failed, the
Gaussian-distributed population failed more often


Base Vector Selection


DE equation: v_i = x_r0 + F * (x_r1 - x_r2)
the target index i - specifies the vector with which the mutant is
recombined and against which the resulting trial vector competes
r0, r1 and r2 - determine which vectors combine to create the mutant

Choosing the Base Vector Index, r0

Randomly selecting the base vector without restrictions is known in
EA parlance as roulette wheel selection

Roulette wheel selection chooses Np vectors by conducting Np
separate random trials


One-to-One Base Vector Selections


it is a way to stochastically assign each target vector a unique
base vector
Each of the Np possible values for rg defines a one-to-one mapping
between target and base vectors
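A minimal sketch of such a one-to-one assignment via a single random offset
rg (a random permutation, shown as an alternative, also works); Np is an
assumption:

import numpy as np

# One-to-one base vector selection sketch: a single random offset rg maps
# each target index i to a unique base index r0 = (i + rg) mod Np, so every
# vector serves as a base exactly once per generation.
rng = np.random.default_rng()
Np = 10                                        # illustrative size (assumption)
rg = rng.integers(1, Np)                       # random offset in 1..Np-1, avoids r0 == i
r0 = [(i + rg) % Np for i in range(Np)]        # one-to-one target-to-base mapping
# Alternative: a random permutation also gives each target a unique base
r0_perm = rng.permutation(Np)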


Random Base Index Vectors Comparison


Degenerate Vector Combinations

r1 = r2: No Mutation
when indices are chosen without restrictions, r1 will equal r2 on
average once per generation, i.e., with probability 1/Np
the probability that all three indices will be equal is (1/Np)²

r1 = r0 or r2 = r0: Arithmetic Recombination
each coincidence occurs on average once per generation
DE's three-vector mutation formula reduces to a linear relation
between the base vector and a single difference vector


Implementing Mutually Exclusive Indices
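A minimal sketch of one way to enforce mutually exclusive indices by
redrawing until r0, r1, r2 and the target i are all distinct; Np and i are
assumptions:

import numpy as np

# Mutually exclusive indices sketch: redraw each index until it differs
# from the target i and from the indices drawn before it, avoiding the
# degenerate combinations discussed above.
rng = np.random.default_rng()
Np, i = 10, 0                       # illustrative values (assumption)
r0 = rng.integers(Np)
while r0 == i:
    r0 = rng.integers(Np)
r1 = rng.integers(Np)
while r1 in (i, r0):
    r1 = rng.integers(Np)
r2 = rng.integers(Np)
while r2 in (i, r0, r1):
    r2 = rng.integers(Np)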


Effects of Degenerate Combinations


Differential Mutation
DE uses a uniform probability distribution function to randomly
sample vector differences:
In a population of Np distinct vectors, there will be Np(Np - 1)
non-zero vector differences


The Mutation Scale Factor: F


the stated range for F is (0, 1)
to avoid premature convergence, it is crucial that F be of
sufficient magnitude to counteract selection pressure
Zaharie empirically examined three test functions using Np = 50,
Cr = 0.2 and found that F ≈ 0.3 was the smallest reliable scale
factor


Randomizing the Scale Factor

keeping F constant has proven effective; nevertheless, randomizing F
offers potential benefits
randomizing the scale factor is a way to increase the pool of
potential trial vectors and minimize the risk of stagnation without
increasing the population size


PDF Sampling Frequency: Dither and Jitter


jitter - the practice of generating a new value of F for every
parameter
dither - choosing F anew for each vector
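A minimal sketch contrasting the two sampling frequencies when forming a
mutant v = x_r0 + F * (x_r1 - x_r2); the sampling range for F is an
assumption:

import numpy as np

# Dither vs. jitter sketch: dither draws one F per trial vector, jitter
# draws a new F for every parameter of the difference vector.
rng = np.random.default_rng()
D = 5
x_r0, x_r1, x_r2 = rng.random(D), rng.random(D), rng.random(D)

F_dither = rng.uniform(0.5, 1.0)               # dither: one F per trial vector
v_dither = x_r0 + F_dither * (x_r1 - x_r2)

F_jitter = rng.uniform(0.5, 1.0, size=D)       # jitter: a new F for every parameter
v_jitter = x_r0 + F_jitter * (x_r1 - x_r2)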


PDF Sampling Frequency: Dither and Jitter


As expected, both jitter and dither exhibit the same number of
function evaluations and the same optimal population size (Np = 7)
when Cr = 0.
At Cr = 0.2 (Zaharie's choice), all three methods require about the
same number of function evaluations, with both jitter and dither
also having the same optimal population size (Np = 8).
Over the range of Cr, jitter was the fastest technique and the
optimal population size was virtually constant at Np = 8.
In terms of the number of function evaluations, F = constant and
dither perform similarly, but dither requires a larger population.

PDF Sampling Frequency: Dither and Jitter

The data in the table cast suspicion on Zaharie's contention that
multiplying each component of a differential by a normally
distributed variable does not affect DE's performance

Although jitter is effective on separable functions, its poor
performance on non-separable, multi-modal functions makes it a
questionable strategy for nonlinear global optimization with DE


Next presentation summary

DE Selection
Termination Criteria
Benchmarking Differential Evolution


QUESTIONS ?

