
The Differential Evolution Algorithm

Presentation based on the book: Differential Evolution

Chiriac Igor

Summary
The Motivation for Differential Evolution
Introduction to Parameter Optimization
Local Versus Global Optimization

The Differential Evolution Algorithm

Overview
Parameter Optimization
Initialization
Base Vector Selection
Differential Mutation

Overview
Optimization - the attempt to maximize a system's desirable
properties while simultaneously minimizing its undesirable
characteristics
Function of the tuning knob:
f(x) = noise power / signal power

Objective function - its most extreme value represents the
optimization goal
Cost function - when the minimum is sought
Error function - the minimum being sought is zero
Fitness function - describes properties to be maximized

Objective function attributes

Parameter quantization
Parameter dependence
Dimensionality
Modality
Time dependence
Noise
Constraints
Differentiability


Classifying optimizers

Single-point
  Derivative-based: steepest descent, conjugate gradient, quasi-Newton
  Derivative-free: random walk, Hooke-Jeeves

Multi-point
  Derivative-based: multi-start and clustering techniques
  Derivative-free: Nelder-Mead, evolutionary algorithms, differential evolution

Single-point, derivative-based optimization

Classical derivative-based optimization can be effective as long as
the objective function fulfills two requirements:
1. The objective function must be twice differentiable.
2. The objective function must be uni-modal.


Example
A simple example of a differentiable and uni-modal objective function is:

f(x1, x2) = 10^(-(x1^2 + 3*x2^2))

Fig1: Representation of the function

Fig2: The method of steepest descent first computes the negative gradient,
then takes a step in the direction indicated.
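Below is a minimal Python sketch of a steepest-descent step; because the exact
example function above had to be reconstructed, the code uses an assumed
stand-in bowl g(x1, x2) = x1^2 + 3*x2^2 with an arbitrary step size:

import numpy as np

# Minimal sketch of steepest descent on an assumed differentiable,
# uni-modal stand-in g(x1, x2) = x1^2 + 3*x2^2; step size is arbitrary.
def grad_g(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([2.0, 1.0])       # arbitrary starting point (assumption)
alpha = 0.1                    # fixed step size (assumption)
for _ in range(50):
    x = x - alpha * grad_g(x)  # step along the negative gradient
print(x)                       # approaches the minimum at (0, 0)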

Differentiability of an objective function

Constraining the objective function may create regions that are not
differentiable
If the objective function is a computer program, conditional
branches make it non-differentiable
If the objective function is subjective, an analytic formula is not
possible
In some cases the objective function is not available in explicit form


Brute Force Search


Known as enumeration, the brute force method visits all grid points in a
bounded region while storing the current best point in memory
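A minimal sketch of brute-force enumeration on a grid; the objective, bounds
and grid resolution below are illustrative assumptions:

import numpy as np

# Minimal sketch of enumeration: visit every grid point in a bounded
# region while keeping the current best point in memory.
def f(x1, x2):
    return x1**2 + 3 * x2**2   # illustrative objective (assumption)

best_x, best_f = None, np.inf
for x1 in np.linspace(-2.0, 2.0, 41):        # 41 grid points per axis
    for x2 in np.linspace(-2.0, 2.0, 41):
        value = f(x1, x2)
        if value < best_f:                   # remember the best point so far
            best_x, best_f = (x1, x2), value
print(best_x, best_f)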


Hooke and Jeeves algorithm


Known as a direct search or pattern search
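A simplified sketch of the pattern-search idea: probe each coordinate
direction around the current point and shrink the step when nothing improves.
The objective, start point and step sizes are assumptions, and the
pattern-move acceleration of the full Hooke-Jeeves method is omitted:

import numpy as np

# Simplified pattern-search sketch: exploratory probes along each axis,
# halving the step size when no probe improves the current point.
def f(x):
    return x[0]**2 + 3 * x[1]**2      # illustrative objective (assumption)

x = np.array([2.0, 1.0])
step = 0.5
while step > 1e-6:
    improved = False
    for j in range(len(x)):
        for delta in (step, -step):
            trial = x.copy()
            trial[j] += delta
            if f(trial) < f(x):       # accept the first improving probe
                x, improved = trial, True
                break
    if not improved:
        step *= 0.5                   # no improvement: reduce the step size
print(x)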


Multi-modal functions

pose a starting point problem

there is a finite probability that the random walk will eventually
generate a new and better point in a basin of attraction other than
the one containing the current base point


Simulated Annealing
modifies the greedy criterion to accept some uphill moves while
continuing to accept all downhill moves
one drawback is that special effort may be required to find an
annealing schedule that lowers the temperature T at the right rate
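A minimal sketch of the modified greedy (acceptance) criterion; the 1-D
objective, perturbation width and cooling rate are illustrative assumptions:

import math
import random

# Minimal simulated-annealing sketch: always accept downhill moves, accept
# uphill moves with probability exp(-delta/T), and lower T geometrically.
def f(x):
    return x**2 + 2.0 * math.sin(5.0 * x)    # small multi-modal example (assumption)

x, T = 3.0, 1.0
for _ in range(1000):
    trial = x + random.gauss(0.0, 0.5)        # random perturbation (assumption)
    delta = f(trial) - f(x)
    if delta <= 0 or random.random() < math.exp(-delta / T):
        x = trial                             # accept downhill, or uphill by chance
    T *= 0.995                                # cooling schedule (rate is an assumption)
print(x, f(x))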


Multi-Point, Derivative-Based Methods

Multi-start techniques - restart the optimization process from
different initial points
Clustering methods - apply a clustering algorithm to identify those
sample points that belong to the same basin of attraction


Multi-Point, Derivative-Free Methods

Because they mimic Darwinian evolution, ESs, GAs and DE are often
referred to as evolutionary algorithms

ES is an effective continuous optimizer because it encodes
parameters as floating-point numbers and manipulates them with
arithmetic operations

GAs are often better suited for combinatorial optimization because
they encode parameters as bit strings and modify them with logical
operators


Evolution Strategies Algorithm


Nelder and Mead

tries to solve the step size problem by allowing the step size to
expand or contract as needed

the algorithm begins by forming a D-dimensional polyhedron, or
simplex, of D + 1 points, xi, i = 0, 1, ..., D, that are randomly
distributed throughout the problem space

to obtain a new trial point, xr, the worst point, xD, is reflected
through the opposite face of the polyhedron using a weighting
factor, F1: xr = xD + F1 * (xm - xD)

the vector, xm, is the centroid of the face opposite xD:
xm = (1/D) * sum(xi for i = 0, ..., D-1)
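A minimal sketch of the reflection step described above; the objective and
the choice F1 = 2 (plain reflection through the centroid) are assumptions:

import numpy as np

# Sketch of the Nelder-Mead reflection step: compute the centroid x_m of
# the D vertices opposite the worst vertex, then reflect the worst vertex.
def f(x):
    return x[0]**2 + 3 * x[1]**2            # illustrative objective (assumption)

D = 2
simplex = np.random.rand(D + 1, D)          # D+1 random points in D dimensions
simplex = simplex[np.argsort([f(p) for p in simplex])]  # best first, worst last
x_worst = simplex[-1]
x_m = simplex[:-1].mean(axis=0)             # centroid of the opposite face
F1 = 2.0                                    # weighting factor (assumption)
x_r = x_worst + F1 * (x_m - x_worst)        # reflected trial point
if f(x_r) < f(x_worst):
    simplex[-1] = x_r                       # replace the worst vertex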


Nelder and Mead

One advantage of the Nelder-Mead method is that the simplex can
shrink as well as expand to adapt to the current objective function
surface

The Nelder-Mead algorithm restricts the number of sample points to
D + 1


Differential Evolution

Developed by Price and Storn

DE has proven itself in competitions like the IEEE's International
Contest on Evolutionary Optimization (ICEO) in 1996 and 1997, and in
the real world on a broad variety of applications

DE is a population-based optimizer

DE perturbs vectors with the scaled difference of two randomly
selected population vectors
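A minimal sketch of one classic DE variant (DE/rand/1/bin), anticipating the
initialization, mutation, crossover and selection slides that follow; the
sphere objective and the settings Np, F, Cr and the bounds are illustrative
assumptions:

import numpy as np

# Minimal DE/rand/1/bin sketch: perturb a randomly chosen base vector with
# the scaled difference of two other random vectors, cross over with the
# target, and keep the trial only if it is at least as good.
def f(x):
    return np.sum(x**2)                      # sphere function (assumption)

rng = np.random.default_rng(0)
Np, D, F, Cr = 20, 5, 0.8, 0.9               # illustrative settings
lower, upper = -5.0, 5.0
pop = rng.uniform(lower, upper, size=(Np, D))
cost = np.array([f(x) for x in pop])

for generation in range(200):
    for i in range(Np):
        # choose r0, r1, r2 distinct from each other and from the target i
        r0, r1, r2 = rng.choice([j for j in range(Np) if j != i], size=3, replace=False)
        mutant = pop[r0] + F * (pop[r1] - pop[r2])   # differential mutation
        cross = rng.random(D) < Cr
        cross[rng.integers(D)] = True                # copy at least one mutant parameter
        trial = np.where(cross, mutant, pop[i])      # binomial (uniform) crossover
        trial_cost = f(trial)
        if trial_cost <= cost[i]:                    # one-to-one survivor selection
            pop[i], cost[i] = trial, trial_cost

print(pop[np.argmin(cost)], cost.min())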


DE INITIALIZATION

Fig5: Initializing the DE population



DE Mutation

Fig8: Mutation

Crossover
crossover builds trial vectors out of parameter values that have
been copied from two different vectors
DE crosses each vector with a mutant vector
the crossover probability, Cr ∈ [0, 1], is a user-defined value


Fig10: Crossover


DE Selection

Fig.9: Selection

Visualizing DE

Parameter Representation
DE encodes all parameters as floating-point numbers.
Advantages over the traditional GA bit-flipping approach:

ease of use
efficient memory utilization
lower computational complexity - scales better on large problems
lower computational effort - faster convergence
greater freedom in designing a mutation distribution

Bit Strings Representation

GAs encode a continuous parameter x as an integer string of q bits

even on uni-modal objective functions, the computational effort to
optimize a parameter is a function of l that depends on the
parameter's value

the standard GA coding scheme imposes multi-modality on even
uni-modal objective functions


Floating-Point

unlike the standard GA representation, the floating-point format
retains only a limited number of significant digits

floating-point formats span a vast dynamic range with minimal
resources

floating-point can be handled efficiently because most modern
programming languages support common floating-point formats

the encoding process is transparent to the user


Initialization

DE requires a predefined probability distribution function, or PDF,
to seed the initial population

when parameters exhibit no obvious limits, their upper and lower
bounds, bj,U and bj,L, respectively, should be set so that the
initial bounding box they define encompasses the optimum
Fig12: Far initialization shrinks the initial bounding box so that it no longer
contains the optimum, x*
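A minimal sketch of the seeding described above: each parameter is drawn
uniformly inside the bounding box defined by bj,L and bj,U. The concrete
bounds, Np and D are assumptions:

import numpy as np

# Minimal DE initialization sketch: x_j = b_L[j] + rand(0,1) * (b_U[j] - b_L[j]),
# so the initial bounding box should be chosen to contain the optimum.
rng = np.random.default_rng()
Np, D = 20, 3                                  # illustrative sizes (assumption)
b_L = np.array([-10.0, -10.0, -10.0])          # lower bounds bj,L (assumption)
b_U = np.array([10.0, 10.0, 10.0])             # upper bounds bj,U (assumption)
pop = b_L + rng.random((Np, D)) * (b_U - b_L)  # uniform initial population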


The effects of far-initializing DE

Results are 100-trial averages obtained using classic DE with
F = Cr = 0.9 and r0 ≠ r1 ≠ r2 ≠ i


Initial Distribution
Uniform distributions - preferred because they best reflect the lack
of knowledge about the optimum's location
Gaussian distribution - may prove faster if the optimum's location
is well known, although it may increase the probability that the
population will converge prematurely

Uniform Distribution
distributing initial points with random uniformity is not mandatory,
but experience has shown randj(0,1) to be very effective in this
regard

Halton point sets are based on prime numbers; these pseudo-random
distributions are both uniform and irregular
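A minimal sketch of generating a two-dimensional Halton point set by radical
inversion in the prime bases 2 and 3; the number of points is an assumption:

# Minimal Halton sketch: coordinate j of point i is the radical inverse
# of i in the j-th prime base (2 and 3 for two dimensions).
def radical_inverse(i, base):
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f     # append the next base-`base` digit after the point
        i //= base
        f /= base
    return inv

halton_2d = [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, 201)]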


Uniform Distribution

Fig14: Two hundred points distributed with random uniformity (left) and according to a two-dimensional Halton point set (right)



Gaussian Distribution

a population far-initialized with a Gaussian distribution is less
likely to be successful on multi-modal functions than a uniformly
distributed one

when the objective function is multi-modal, it is important to
disperse the initial population widely enough to contain the optimum

in every case where uniform distributions failed, the
Gaussian-distributed population failed more often


Base Vector Selection


DE equation: v_i = x_r0 + F * (x_r1 - x_r2)
the target index i - specifies the vector with which the mutant is
recombined and against which the resulting trial vector competes
r0, r1 and r2 - determine which vectors combine to create the mutant

Choosing the Base Vector Index, r0

Randomly selecting the base vector without restrictions is known in
EA parlance as roulette wheel selection

Roulette wheel selection chooses Np vectors by conducting Np
separate random trials


One-to-One Base Vector Selections


it is a way to stochastically assign each target vector a unique
base vector
Each of the Np possible values for rg defines a one-to-one mapping
between target and base vectors
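A minimal sketch of such a one-to-one assignment via a single random offset
rg (a random permutation, shown as an alternative, also works); Np is an
assumption:

import numpy as np

# One-to-one base vector selection sketch: a single random offset rg maps
# each target index i to a unique base index r0 = (i + rg) mod Np, so every
# vector serves as a base exactly once per generation.
rng = np.random.default_rng()
Np = 10                                        # illustrative size (assumption)
rg = rng.integers(1, Np)                       # random offset in 1..Np-1, avoids r0 == i
r0 = [(i + rg) % Np for i in range(Np)]        # one-to-one target-to-base mapping
# Alternative: a random permutation also gives each target a unique base
r0_perm = rng.permutation(Np)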


Random Base Index Vectors Comparison


Degenerate Vector Combinations

r1 = r2: No Mutation
when indices are chosen without restrictions, r1 will equal r2 on
average once per generation, i.e., with probability 1/Np
the probability that all three indices will be equal is (1/Np)²

r1 = r0 or r2 = r0: Arithmetic Recombination
each coincidence occurs on average once per generation
DE's three-vector mutation formula reduces to a linear relation
between the base vector and a single difference vector


Implementing Mutually Exclusive Indices
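A minimal sketch of one way to enforce mutually exclusive indices by
redrawing until r0, r1, r2 and the target i are all distinct; Np and i are
assumptions:

import numpy as np

# Mutually exclusive indices sketch: redraw each index until it differs
# from the target i and from the indices drawn before it, avoiding the
# degenerate combinations discussed above.
rng = np.random.default_rng()
Np, i = 10, 0                       # illustrative values (assumption)
r0 = rng.integers(Np)
while r0 == i:
    r0 = rng.integers(Np)
r1 = rng.integers(Np)
while r1 in (i, r0):
    r1 = rng.integers(Np)
r2 = rng.integers(Np)
while r2 in (i, r0, r1):
    r2 = rng.integers(Np)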


Effects of Degenerate Combinations


Differential Mutation
DE uses a uniform probability distribution function to randomly
sample vector differences:
In a population of Np distinct vectors, there will be Np(Np - 1)
non-zero vector differences


The Mutation Scale Factor: F


the stated range for F is (0, 1)
to avoid premature convergence, it is crucial that F be of
sufficient magnitude to counteract selection pressure
Zaharie empirically examined three test functions using Np = 50,
Cr = 0.2 and found that F ≈ 0.3 was the smallest reliable scale
factor


Randomizing the Scale Factor

keeping F constant has proven effective; nevertheless, randomizing F
offers potential benefits
randomizing the scale factor is a way to increase the pool of
potential trial vectors and minimize the risk of stagnation without
increasing the population size


PDF Sampling Frequency: Dither and Jitter


jitter - the practice of generating a new value of F for every
parameter
dither - choosing F anew for each vector
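A minimal sketch contrasting the two sampling frequencies when forming a
mutant v = x_r0 + F * (x_r1 - x_r2); the sampling range for F is an
assumption:

import numpy as np

# Dither vs. jitter sketch: dither draws one F per trial vector, jitter
# draws a new F for every parameter of the difference vector.
rng = np.random.default_rng()
D = 5
x_r0, x_r1, x_r2 = rng.random(D), rng.random(D), rng.random(D)

F_dither = rng.uniform(0.5, 1.0)               # dither: one F per trial vector
v_dither = x_r0 + F_dither * (x_r1 - x_r2)

F_jitter = rng.uniform(0.5, 1.0, size=D)       # jitter: a new F for every parameter
v_jitter = x_r0 + F_jitter * (x_r1 - x_r2)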


PDF Sampling Frequency: Dither and Jitter


As expected, both jitter and dither exhibit the same number of
function evaluations and the same optimal population size (Np = 7)
when Cr = 0.
At Cr = 0.2 (Zaharie's choice), all three methods require about the
same number of function evaluations, with both jitter and dither
also having the same optimal population size (Np = 8).
Over the range of Cr, jitter was the fastest technique and the
optimal population size was virtually constant at Np = 8.
In terms of the number of function evaluations, F = constant and
dither perform similarly, but dither requires a larger population.

PDF Sampling Frequency: Dither and Jitter

The data in the table cast suspicion on Zaharie's contention that
multiplying each component of a differential by a normally
distributed variable does not affect DE's performance

Although jitter is effective on separable functions, its poor
performance on non-separable, multi-modal functions makes it a
questionable strategy for nonlinear global optimization with DE


Next presentation summary

DE Selection
Termination Criteria
Benchmarking Differential Evolution


QUESTIONS ?

