
Dr. Omari Mohammed
Maître de Conférences, Class A
University of Adrar (Université d'Adrar)
E-mail: omarinmt@gmail.com



LECTURE NOTES (SUPPORT DE COURS)

Course: Genetic Algorithms
Level: 2nd Year Master in Computer Science
Option: Networks and Intelligent Systems




Genetic Algorithms
Textbook: S. N. Sivanandam and S. N. Deepa, Introduction to Genetic Algorithms (Springer, 2008).
Other references:
Franz Rothlauf, Representations for Genetic and Evolutionary Algorithms (Springer, 2006).
Melanie Mitchell, An Introduction to Genetic Algorithms (MIT Press, 1998).
Thomas Weise, Global Optimization Algorithms: Theory and Application (2009).

Program:
1- Basic Definitions
a. Introduction
b. Biological Background
c. Search and Evolution
d. Conventional Optimization and Search Techniques
e. A Simple Genetic Algorithm
f. Comparison of Genetic Algorithm with Other Optimization Techniques
g. Advantages and Limitations of Genetic Algorithm
h. Applications of Genetic Algorithm

2- Terminologies and Operators of GA
a. Key Elements
b. Individuals
c. Genes
d. Fitness
e. Populations
f. Data Structures
g. Search Strategies
h. Encoding
i. Breeding
j. Search Termination (Convergence Criteria)
k. Why do Genetic Algorithms Work
l. Solution Evaluation
m. Search Refinement
n. Constraints
o. Fitness Scaling
p. Example Problems

3- Advanced Operators and Techniques in Genetic Algorithm
a. Diploidy, Dominance and Abeyance
b. Multiploid
c. Inversion and Reordering
d. Niche and Speciation
e. Few Micro-operators
f. Non-binary Representation
g. Multi-Objective Optimization
h. Combinatorial Optimizations
i. Knowledge Based Techniques

4- Classification of Genetic Algorithm
a. Simple Genetic Algorithm (SGA)
b. Parallel and Distributed Genetic Algorithms (PGA and DGA)
c. Hybrid Genetic Algorithm (HGA)
d. Adaptive Genetic Algorithm (AGA)
e. Fast Messy Genetic Algorithm (FMGA)
f. Independent Sampling Genetic Algorithm (ISGA)

5- Genetic Algorithm Optimization Problems
a. Fuzzy Optimization Problems
b. Multi-objective Reliability Design Problem
c. Combinatorial Optimization Problem
d. Scheduling Problems
e. Transportation Problems
f. Network Design and Routing Problems



Introduction: Evolutionary Computation

Let's try to solve the quadratic equation x² − 3x + 2 = 0, x ∈ ℤ.
Let's put f(x) = x² − 3x + 2.
Let's generate a random set of possible solutions: {−8, −4, 0, 6, 8, 10}.
These solutions are not optimal, since f(xᵢ) ≠ 0 for every xᵢ ∈ {−8, −4, 0, 6, 8, 10}.
Step 1: calculate the second generation of solutions as follows: for each consecutive pair xᵢ and xⱼ, calculate (xᵢ + xⱼ)/2 and (xᵢ − xⱼ)/2. The newly generated set is {−6, −2, 3, −3, 9, −1}.
After evaluation (calculating f(xᵢ)), we find that none of these solutions is optimal (f(xᵢ) ≠ 0), so we keep repeating Step 1 until we find a solution.
Third generation: {−4, −2, 0, 3, 4, 5}. No optimal solutions.
Fourth generation: {−3, −1, 1, −1, 4, 0} (rounding half-values to stay in ℤ): we find the first optimal solution x = 1, since f(1) = 0.
Fifth generation: {−2, −1, 0, 1, 2, 2}: we find the second optimal solution x = 2, since f(2) = 0.
Discuss!
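The toy search above can be sketched in a few lines of Python. The update rule is taken from the worked example; the helper names (`f`, `next_generation`) are illustrative, and Python's `round` may pick slightly different half-values than the hand-rounded sets above, but the search still converges.

```python
# Sketch of the toy evolutionary search: each consecutive pair (xi, xj)
# is replaced by (xi + xj)/2 and (xi - xj)/2, rounded to stay in Z,
# and every generation is checked for roots of f.
def f(x):
    return x * x - 3 * x + 2

def next_generation(population):
    new = []
    for xi, xj in zip(population[::2], population[1::2]):
        new.append(round((xi + xj) / 2))
        new.append(round((xi - xj) / 2))
    return new

population = [-8, -4, 0, 6, 8, 10]
solutions = set()
for _ in range(10):              # a handful of generations suffices here
    solutions |= {x for x in population if f(x) == 0}
    population = next_generation(population)
```

Running this collects both integer roots, x = 1 and x = 2.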
I- Introduction:
The origin of evolutionary algorithms was an attempt to mimic some of the processes taking place
in natural evolution.
Natural evolution is a process operating over chromosomes rather than over organisms.
The chromosomes are organic tools encoding the structure of a living being. When a set of
chromosomes is decoded, the corresponding creature is built.
Natural selection is the mechanism that relates chromosomes to the efficiency of the entity they represent, thus allowing efficient organisms, which are well adapted to the environment, to reproduce more often than those which are not.
There exist a large number of reproductive mechanisms in Nature.
The most common type of reproduction is recombination, which combines the chromosomes of the parents to produce the offspring.
Mutation (a rarer one) is another form of reproduction, which causes the chromosomes (more precisely, the genes) of an offspring to differ from those of its parents.
An Evolutionary Algorithm (EA) is an iterative and stochastic process that operates on a set of
individuals (population). Each individual represents a potential solution to the problem being
solved.

Initially, the population is randomly generated. Every individual in the population is assigned, by
means of a fitness function, a measure of its goodness with respect to the problem under
consideration. This value is the quantitative information the algorithm uses to guide the search.
Among the evolutionary techniques, the genetic algorithms (GAs) are the most widespread group of methods representing the application of evolutionary tools. They rely on the use of selection, crossover, and mutation operators.
Replacement is usually by generations of new individuals.
A GA proceeds by creating successive generations of better and better individuals by applying
very simple operations.
The search is guided only by the fitness value associated with every individual in the population. This value is used to rank individuals according to their relative suitability for the problem being solved.
The hardest part in GAs is to find the BEST fitness function.

Here is the situation of genetic algorithms among other well-known search procedures:

II- Evolutionary Computation:
Heuristic methods refer to experience-based techniques for problem solving, learning, and
discovery. Where an exhaustive search is impractical, heuristic methods are used to speed up the
process of finding a satisfactory solution. Examples of this method include using a "rule of
thumb", an educated guess, an intuitive judgment, or common sense.
Heuristic algorithms are often employed because they may be seen to "work" without having been
mathematically proven to meet a given set of requirements. Statistical analysis should be
conducted when employing heuristics to estimate the probability of incorrect outcomes.
The evolutionary concept can be applied to problems where heuristic solutions are not available or lead to unsatisfactory results.
The theory of natural selection (Darwin, 1859) proposes that the plants and animals that exist
today are the result of millions of years of adaptation to the demands of the environment.
[Taxonomy of search techniques, reconstructed from the original diagram:]
Search Techniques
  Calculus Based
    Direct: Fibonacci, Newton
    Indirect
  Enumerative
    Guided: Dynamic Programming, Branch & Bound, Backtracking
    Unguided
  Random
    Unguided: Las Vegas
    Guided: Tabu Search, Simulated Annealing, Neural Networks (Kohonen Map, Hopfield, Multi-Layer Perceptron), EVOLUTIONARY ALGORITHMS
      Evolutionary Algorithms: Genetic Programming, Evolutionary Strategies, Evolutionary Programming, GENETIC ALGORITHMS
        Genetic Algorithms:
          Parallel GA: Automatic Parallelism; One Population, Parallel Evolution; Coarse Grain Parallel GAs (Homogeneous, Heterogeneous); Fine Grain Parallel GAs (Homogeneous, Heterogeneous)
          Sequential GA: Generational, Steady-State, Messy

The organisms that are most capable of acquiring resources and successfully procreating are the
ones whose descendants will tend to be numerous in the future. Organisms that are less capable,
for whatever reason, will tend to have few or no descendants in the future.
Over time, the entire population of the ecosystem is said to evolve to contain organisms that are
more fit than those of previous generations of the population.
Evolutionary computation (EC) techniques abstract these evolutionary principles into algorithms
that may be used to search for optimal solutions to a problem.
For a search space with only a small number of possible solutions, all the solutions can be
examined in a reasonable amount of time and the optimal one found. This exhaustive search,
however, quickly becomes impractical as the search space grows in size.
Traditional search algorithms randomly sample (e.g., random walk) or heuristically sample (e.g.,
gradient descent) the search space one solution at a time in the hopes of finding the optimal
solution.
The key aspect distinguishing an evolutionary search algorithm from such traditional algorithms is
that it is population-based.
Through the adaptation of successive generations of a large number of individuals, an evolutionary
algorithm performs an efficient directed search.
Evolutionary search is generally better than random search and is not susceptible to the hill-
climbing behaviors of gradient-based search.

1- Historical Development
In the case of evolutionary computation, there are four historical paradigms that have served as the
basis for much of the activity of the field:
Genetic algorithms (Holland, 1975).
Genetic programming (Koza, 1992, 1994).
Evolution strategies (Rechenberg, 1973).
Evolutionary programming (Fogel et al., 1966).
The basic differences between the paradigms lie in the nature of the representation schemes, the
reproduction operators and selection methods.

2- Advantages of Evolutionary Computation
Conceptual Simplicity: A key advantage of evolutionary computation is that it is conceptually simple. The evolutionary algorithm consists of initialization, iterative variation and selection in light of a performance index. No prior knowledge of the problem and no gradient-based steps are required.

Broad Applicability: Evolutionary algorithms can be applied to any problem that can be formulated as a function optimization problem. Solving such a problem requires only a data structure to represent solutions and a way to generate and evaluate new solutions from old ones.
Hybridization with Other Methods: Evolutionary algorithms can be combined with more
traditional optimization techniques. They can be used to optimize the performance of neural
networks, fuzzy systems, production systems, wireless systems and other program structures.
Parallelism: Evolution is a highly parallel process. The individual solutions are evaluated
independently of the evaluations assigned to competing solutions. The evaluation of each solution
can be handled in parallel. However, the selection requires some serial operation.
Robust to Dynamic Changes: Traditional methods of optimization are not robust to dynamic changes in the environment; in fact, they usually require a complete restart. On the contrary, evolutionary computation can be used to adapt solutions to changing circumstances. The generated population of evolved solutions provides a basis for further improvement and, in many cases, it is not necessary to reinitialize the population at random.
Solves Problems that Have no Known Solutions: Another advantage of evolutionary algorithms is their ability to address problems for which there is no human expertise. Artificial intelligence may be applied to several difficult problems requiring high computational speed, but it cannot compete with human intelligence. Fogel (1995) remarked of artificial intelligence methods: "They solve problems, but they do not solve the problem of how to solve problems." (You need to build the neural network layers and weight matrices in order to train and solve the problem.) In contrast, evolutionary computation provides a method for solving the problem of how to solve problems.


Introduction: Genetic Algorithms

I- Introduction:
Flowchart of an Evolutionary Algorithm:

In nature, the individuals in a population compete with each other for vital resources like food, shelter and so on.
Also, within the same species, individuals compete to attract mates for reproduction.
So, the most adapted or fit individuals produce a relatively large number of offspring.
During reproduction, a recombination of the good characteristics of each ancestor can produce a better-fit offspring whose fitness is greater than that of either parent.
In 1975, Holland developed this idea in his book "Adaptation in Natural and Artificial Systems".
He described how to apply the principles of natural evolution to optimization problems and built
the first Genetic Algorithms.
Genetic algorithms are based on the principle of genetics and evolution.
By simulating evolution, one can solve optimization problems.


II- Biological Background:
1- The Cell:
An animal or human cell is a complex of many small factories that work together. The center of all this is the cell nucleus; the genetic information is contained in the cell nucleus.
2- Chromosomes:
All the genetic information is stored in the chromosomes.
Each chromosome is built of DeoxyriboNucleic Acid (DNA).
In humans, chromosomes exist in pairs (23 pairs).
The chromosomes are divided into several parts called genes.
Genes code the properties of the species, i.e., the characteristics of an individual.
The possible values of a gene for one property are called alleles, and a gene can take different alleles.
Gene = Eye color
Allele = Black
Gene Pool = {Black, Brown, Blue, Green}
The set of all possible alleles present in a particular population forms a gene pool.
This gene pool determines all the possible variations for the future generations.
The set of all the genes of a specific species is called the genome.
Genome = {Eye color gene, nose shape gene, ...}
Each gene has a unique position in the genome called its locus.
Most living organisms store their genome on several chromosomes, but in Genetic Algorithms (GAs) all the genes are usually stored on the same chromosome; thus "chromosome" and "genome" are used as synonyms in GAs.
3- Genotype and Phenotype:
For a particular individual, the entire combination of genes is called the genotype.
The phenotype is the physical expression obtained by decoding the genotype.
The selection process always operates on the phenotype, whereas the reproduction process consists of recombining genotypes.
Chromosomes may contain two sets of genes (diploidy). In this case, the dominant gene will determine the phenotype, whereas the other one, called recessive, will still be present and can be passed on to the offspring.
In a haploid representation, only one set of each gene is stored, so the process of determining which allele should be dominant and which recessive is avoided.
Most GAs concentrate on haploid chromosomes because they are much simpler to construct.
4- Reproduction: Mitosis and Meiosis:
In mitosis, the same genetic information is copied from a parent to its new offspring; there is no exchange of information (e.g., cell multiplication).





In Meiosis, genetic information is shared between the parents in order to create new offspring (e.g.,
sexual reproduction).

III- Genetic Algorithms World
1- Advantages
Genetic Algorithms are stochastic algorithms: randomness plays an essential role in both the selection and reproduction phases.
Genetic algorithms always consider a population of solutions. A population-based algorithm is also very amenable to parallelization.
There is no particular requirement on the problem before using genetic algorithms, so they can be applied to solve any (optimization) problem.
GAs are a young field, and parts of the theory have still to be properly established; we can find almost as many opinions on GAs as there are researchers in the field.
2- Limitations
GAs are not guaranteed to find the global optimum solution to a problem;




they are satisfied with finding an acceptably good solution to the problem.
GAs are an extremely general tool, with no specific way of solving particular problems.
GAs are usually used when everything else has failed or when we don't have enough knowledge of the search space.
Even when specialized techniques exist, it is often interesting to hybridize them with a GA in order to possibly gain some improvement.
IV- A Basic Genetic Algorithm:

[Flowchart of a basic GA: Start → create an initial random population → evaluate the fitness of each member → store the best individuals → create a mating pool (selection) → create the next generation using crossover → perform mutation → if an optimal or good solution is found, stop; otherwise return to fitness evaluation.]

[START] Generate a random population of n chromosomes (suitable solutions for the problem).
[FITNESS] Evaluate the fitness f(x) of each chromosome x in the population.
[NEW POPULATION] Create a new population by repeating the following steps until the new population is complete:
a. [Selection] Select two parent chromosomes from the population according to their fitness.
b. [Crossover] Cross over the parents to form new offspring (children).
c. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).
d. [Accepting] Place the new offspring in the new population.
[REPLACE] Use the newly generated population for a further run of the algorithm.
[TEST] If the end condition is satisfied, stop and return the best solution in the current population.
[LOOP] Go to step 2 for fitness evaluation.
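The numbered steps can be condensed into a minimal generational GA sketch. The OneMax fitness (count the 1-bits) and all parameter values below are illustrative assumptions, not part of the course text.

```python
import random

# Minimal generational GA following the steps above: [START], [FITNESS],
# [NEW POPULATION] (selection, crossover, mutation), [REPLACE], [LOOP].
GENOME_LEN, POP_SIZE, P_MUT, GENERATIONS = 20, 30, 0.01, 100

def fitness(chrom):
    return sum(chrom)                      # OneMax: count the 1-bits

def select(pop):                           # fitness-proportionate selection
    total = sum(fitness(c) for c in pop)
    r = random.uniform(0, total)
    acc = 0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

def crossover(p1, p2):                     # single point crossover
    point = random.randint(1, GENOME_LEN - 1)
    return p1[:point] + p2[point:]

def mutate(chrom):                         # per-locus bit flip
    return [1 - g if random.random() < P_MUT else g for g in chrom]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
       for _ in range(POP_SIZE)]           # [START]
for _ in range(GENERATIONS):               # [LOOP]
    pop = [mutate(crossover(select(pop), select(pop)))
           for _ in range(POP_SIZE)]       # [NEW POPULATION] / [REPLACE]
best = max(pop, key=fitness)
```

With these settings the best individual typically approaches the all-ones string.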

V- Comparison with other Optimization Techniques:
GAs operate on coded versions of the problem parameters: the coding of the solution set, not the solutions themselves.
Almost all conventional optimization techniques search from a single point (centralized), whereas GAs always operate on a whole population of points (distributed). This improves the chance of reaching the global optimum and helps avoid local stationary points.
GAs use a fitness function for evaluation rather than derivatives. As a result, they can be applied to any kind of continuous or discrete optimization problem.
GAs use probabilistic transition operators, while conventional methods for continuous optimization apply deterministic transition operators, i.e., GAs do not use deterministic rules.
VI- Summary of Advantages:
Parallelism.
Easy to discover the global optimum.
They are resistant to becoming trapped in local optima.
They can handle problems with multi-objective functions.
Only use function evaluations.
Easily modified for different problems.
Handle large, poorly understood search spaces easily.
Good for multi-modal problems (a suite of solutions).
Discontinuities of the response surface have little effect on overall optimization performance.
VII- Summary of Limits:
The problem of identifying the fitness function.
Definition of a representation for the problem.
Premature convergence can occur.
The problem of choosing the various parameters, such as the population size, mutation rate, crossover rate, the selection method and its strength.
Cannot use gradients.
Cannot easily incorporate problem-specific information.
Not good at identifying local optima.
Have trouble finding the exact global optimum.
No effective terminator.
Not effective for unimodal functions.
Need to be coupled with a local search technique.
Require a large number of response (fitness) function evaluations.


Terminologies of Genetic Algorithms
I- Introduction
Genetic Algorithm is used to solve optimization problems inside an environment, and feasible or
optimal solutions are individuals that exist in that environment.
In genetic algorithms, individuals are usually encoded into binary digits.
1- Individuals
An individual is a single solution.
An individual is a combination of two forms of solutions: the genotype (the chromosome) and the
phenotype (the physical appearance).

A chromosome is subdivided into genes.
A gene is the GAs representation of a single factor.
Each factor in the solution set corresponds to a gene in the chromosome.
The morphogenesis function associates each genotype with its phenotype.
Each chromosome must define one unique solution, but this does not mean that each solution is encoded by exactly one chromosome.
The morphogenesis function may not be bijective, but it should at least be surjective: a candidate solution of the problem must correspond to at least one possible chromosome.
When the morphogenesis function is not injective, i.e., when different chromosomes can encode the same solution, the representation (encoding) is said to be degenerate.
Chromosomes are encoded by bit strings.
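As a sketch of a morphogenesis (decoding) function, here is one possible mapping from a 5-bit chromosome to an integer phenotype; the two's-complement-style mapping and the range [−16, 15] are assumptions chosen for illustration.

```python
# Hypothetical morphogenesis function: a 5-bit chromosome string is
# mapped to one unique integer phenotype in the range [-16, 15].
def decode(chromosome):
    value = int(chromosome, 2)            # interpret the bit string
    return value - 32 if chromosome[0] == '1' else value

print(decode('00011'))   # 3
print(decode('10000'))   # -16
```

Since this mapping is bijective, the encoding is not degenerate: no two chromosomes decode to the same solution.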

2- Genes
Genes are the basic instructions for building a Genetic Algorithm.
[Figure: the morphogenesis function maps an encoded chromosome (genotype) to an individual (phenotype).]

A chromosome is a sequence of genes. Genes may describe a property, a characteristic, or a part of the possible solution to a problem.
A gene is a bit string of arbitrary length.
This bit string is a binary representation of a number of intervals.
A gene is the GA's representation of a single factor value for a control factor, where each control factor must have an upper bound and a lower bound.
A bit string of length n can represent 1 interval of n bits, 2 intervals of n/2 bits, ..., or n intervals of 1 bit.
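The interval idea can be sketched as a decoder from an n-bit gene to a bounded control-factor value; the bounds and the linear mapping over the 2**n bit patterns are assumptions for illustration.

```python
# Sketch: decode an n-bit gene into a control-factor value between an
# assumed lower and upper bound, mapping the 2**n bit patterns onto
# equally spaced points of [lower, upper].
def gene_to_value(bits, lower, upper):
    n = len(bits)
    step = (upper - lower) / (2 ** n - 1)   # resolution of the encoding
    return lower + int(bits, 2) * step

low = gene_to_value('0000', 0.0, 10.0)    # decodes to the lower bound
high = gene_to_value('1111', 0.0, 10.0)   # decodes to the upper bound
```

A longer bit string gives a finer resolution between the same two bounds.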
3- Fitness
The fitness of an individual in a genetic algorithm is the value of an objective function for its
phenotype.
The fitness is applied to the solution, not the chromosome: to calculate fitness, the chromosome first has to be decoded and the objective function evaluated.
The fitness not only indicates how good the solution is, but also corresponds to how close the chromosome is to the optimal solution.
In the case of multicriterion optimization, the fitness function is definitely more difficult to determine. There is often a dilemma as to how to determine whether one solution is better than another.
So the trouble comes more from the definition of a "better" solution than from the GA itself.
4- Populations
A population is a collection of individuals.
A population consists of a number of individuals being tested, the phenotype parameters defining the individuals, and some information about the search space.

The two important aspects of population used in Genetic Algorithms are:
1. The initial population generation.
2. The population size.



The population size depends on the complexity of the problem.
The initial population is often generated randomly. In the case of a binary coded chromosome,
each bit is initialized to a random zero or one.
Some techniques use some known good solutions to guide the random initialization of the
population. Sometimes a kind of heuristic method can be used to seed the initial population. In this
case, the mean fitness of the population is already high and it may help the genetic algorithm to
find good solutions faster.
The gene pool should be large enough; otherwise, if the population badly lacks diversity, the
algorithm will just explore a small part of the search space and never find global optimal solutions.
The time required by a GA to converge is O(n lg n) function evaluations, where n is the population size.
We say that the population has converged when all the individuals are very much alike and no further improvements are possible, except by mutation.
In practice, a population size of around 100 individuals is quite frequent.
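Initialization as described above can be sketched as follows; the idea of seeding with known good solutions is from the text, while the function signature is an assumption.

```python
import random

# Sketch: generate an initial binary-coded population, optionally seeded
# with a few known good chromosomes to raise the initial mean fitness.
def init_population(pop_size, chrom_len, seeds=()):
    population = [list(s) for s in seeds][:pop_size]
    while len(population) < pop_size:       # fill the rest with random bits
        population.append([random.randint(0, 1) for _ in range(chrom_len)])
    return population

pop = init_population(100, 8, seeds=[[1, 1, 1, 1, 0, 0, 0, 0]])
```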
5- Data Structures
The main data structures in GA are chromosomes, phenotypes, objective function values and
fitness values.
An entire chromosome population can be stored in a single array given the number of individuals
and the length of their genotype representation.
Similarly, the design variables, or phenotypes can be stored in a single array.
The objective function values can be scalar or vectorial.
Fitness values are derived from the objective function using a scaling or ranking function, and can be stored as vectors.
6- Search Strategies
The search process consists of initializing the population and then breeding (generating) new
individuals until the termination condition is met.
There can be several goals for the search process:
a- the global optimum: there is always a possibility that the next iteration in the search will produce a better solution, but there is no guarantee. There is also a chance of converging on a local optimum.
b- fast convergence: when the objective function is expensive to run, fast convergence is desirable, though it also depends on the hardware.
c- diversity: when the solution space contains several distinct optima (similar in fitness), it is useful to be able to choose between them.
7- Encoding

Encoding is a process of representing individual genes.
The process can be performed using bits, numbers, trees, arrays, lists or any other objects; it depends mainly on the problem being solved.
The most common way of encoding is a binary string.

Each bit in the string can represent some characteristics of the solution.
Every bit string therefore is a solution but not necessarily the best solution.
Another possibility is that the whole string can represent a number or an object.
Octal Encoding (using octal digits 0–7) and Hexadecimal Encoding (using hexadecimal digits 0–9, A–F) can also be used instead of binary encoding.
In Permutation Encoding, every chromosome is a string of numbers representing a permutation vector. (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), and (3,2,1) are examples of permutation vectors of size 3.
The permutation encoding represents a rank, an order, or a sequence number.
Permutation encoding is only useful for ordering problems.
After the crossover and mutation phases, corrections must be made to keep the chromosome consistent (no duplicates in the sequence).
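One possible repair step (an illustrative scheme, not the only one used in practice): keep the first occurrence of each element and overwrite duplicate slots with the missing elements, in order.

```python
# Sketch of a permutation repair after crossover: scan the chromosome,
# keep the first occurrence of each element, and replace duplicates with
# the elements of the universe that are missing from it.
def repair(perm, universe):
    missing = [x for x in universe if x not in perm]
    seen, fixed = set(), []
    for x in perm:
        if x in seen:
            fixed.append(missing.pop(0))   # replace a duplicate slot
        else:
            seen.add(x)
            fixed.append(x)
    return fixed

print(repair([1, 3, 3, 4], [1, 2, 3, 4]))   # → [1, 3, 2, 4]
```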
In Value Encoding, every chromosome is a string of values. The values can be anything connected
to the problem.
This encoding produces best results for some special problems.
It is often necessary to develop new genetic operators (crossover, mutation) specific to the
problem.
Direct value encoding can be used in problems where the use of binary encoding would be very difficult.





II- Breeding
The breeding process is the heart of the genetic algorithm; it is here that we create new, and hopefully fitter, individuals.
The breeding cycle consists of three steps:
a. Selecting parents.
b. Crossing the parents to create new individuals (offspring or children).
c. Replacing old individuals in the population with the new ones.

III- Selection
Selection is the process of choosing (two) parents from the population for crossing.
After deciding on an encoding, the next step is to decide how to perform selection of individuals in
the population that will create offspring for the next generation and how many offspring will be
created.
The purpose of selection is to emphasize fitter individuals in the population in the hope that their offspring will have higher fitness.
Selection is a method that randomly picks chromosomes out of the population according to their evaluation function: the higher the fitness, the more chance an individual has of being selected.
Selection guides the GA to improve the population fitness over successive generations.
The convergence rate of the GA is largely determined by the strength of the selection pressure.


Genetic Algorithms should be able to identify optimal or nearly optimal solutions.
If the selection pressure is too low, the convergence rate will be slow: the GA will take longer to find the optimal solution.
If the selection pressure is too high, there is an increased chance of the GA prematurely converging to an incorrect or sub-optimal solution.
Selection schemes should also preserve population diversity, as this helps to avoid premature
convergence.
Selection schemes can be probabilistic where chances of being selected are proportional to fitness,
yet it is possible for less fit individuals to be selected. They can also be greedy, where only the
fittest solutions are selected.
1- Roulette Wheel Selection
Roulette selection is one of the traditional GA selection techniques.
The principle of roulette selection is a linear search through a roulette wheel with the slots weighted in proportion to the individuals' fitness values.
A target value is set, which is a random proportion of the sum of the fitnesses in the population.
The population is scanned one by one until the target value is reached.
Fit individuals are not guaranteed to be selected, but have a somewhat greater chance of being selected.
It is essential that the population not be sorted by fitness, since this would dramatically bias the
selection.

The expected value of an individual is its fitness divided by the average fitness of the population.
Each individual is assigned a slice of the roulette wheel.
The wheel is spun N times, where N is the number of individuals in the population.
On each spin, the individual under the wheel's marker is selected to be in the pool of parents for the next generation.
There are many implementations of the roulette methods.


[Figure: a roulette wheel divided into slices, one per chromosome, each slice sized in proportion to that chromosome's fitness; a fixed marker indicates the chromosome selected after each spin.]

1. Let T be the sum of the expected values of the individuals in the population.
2. Repeat N times:
i. Choose a random integer r between 0 and T.
ii. Loop through the individuals in the population, summing their expected values, until the sum is greater than or equal to r. Pick the individual at which you stopped, and repeat from step (i) for the next selection.

Example:
Objective function: maximize x² − 3x + 2.
Initial population = {4, 6, 10, 12}.

X     Expected value
4          6
6         20
10        72
12       110
Total    208

We select a random number between 0 and 208: 30.
At 4, the running sum of expected values is 6; at 6 it is 26; at 10 it is 98 ≥ 30, so 10 is selected.
We select another random number between 0 and 208: 115.
Continuing the scan, at 12 the running sum is 110; at 4 it is 116 ≥ 115, so 4 is selected.
We select another random number between 0 and 208: 101.
Continuing, at 6 the running sum is 20; at 10 it is 92; at 12 it is 202 ≥ 101, so 12 is selected.
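The worked example can be sketched as follows. This variant restarts the cumulative sum from the beginning of the population on each draw (slightly simpler than the circular scan used above, but still a valid roulette wheel).

```python
import random

# Roulette-wheel selection matching the worked example: expected values
# are f(x) = x**2 - 3*x + 2 for the population {4, 6, 10, 12}.
def roulette_select(population, expected):
    total = sum(expected)
    r = random.uniform(0, total)            # the target value
    acc = 0
    for ind, ev in zip(population, expected):
        acc += ev                           # scan, summing expected values
        if acc >= r:
            return ind
    return population[-1]

population = [4, 6, 10, 12]
expected = [x * x - 3 * x + 2 for x in population]   # [6, 20, 72, 110]
mating_pool = [roulette_select(population, expected) for _ in range(4)]
```

Fitter individuals (10 and 12 here) dominate the mating pool on average, but any individual can be drawn.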

2- Random Selection
This technique randomly selects parents from the population.
Random selection does not take the ranking of parents into consideration.
It does not follow the rule that good characteristics are passed on to offspring.
3- Rank Selection
Roulette wheel selection has a problem when the fitness values differ very much: if one individual holds 90% of the total fitness, it occupies most of the surface of the roulette wheel, so the other individuals have little chance of being selected.
Rank Selection ranks the population as follows: for N individuals, the worst individual gets rank (fitness) 1 and the best gets rank N.
-22-

Rank selection is characterized by slower convergence compared to roulette wheel selection; it also prevents too-quick convergence.
It keeps up the selection pressure when the fitness variance is low, and in case of high variance it preserves diversity, which leads to a successful search.
There are many ways to implement rank selection.
Technique 1:
r is a parameter to be chosen (0 < r < 1).
Select a pair of individuals at random (Ind1, Ind2).
Generate a random number R between 0 and 1.
If R < r, use the first individual as a parent (P1 = Ind1).
If R >= r, use the second individual as the parent (P1 = Ind2).
Repeat, generating a new R, to select the second parent (P2).
Technique 2:
Select two individuals at random (Ind1, Ind2).
The individual with the highest evaluation becomes the parent (P1).
Repeat to find a second parent (P2).

3- Tournament Selection
An ideal selection strategy should be such that it is able to adjust its selective pressure and
population diversity.
Unlike, the Roulette wheel selection, the tournament selection strategy provides selective pressure
by holding a tournament competition among k individuals.
Selection pressure is directly proportional to the number k of participants.
The best individual from the tournament is the one with the highest fitness, which is the winner of
k.
The winners of the tournaments are then inserted into the mating pool.
The tournament competition is repeated until the mating pool for generating new offspring is
filled.
Randomly select k individuals for a tournament.
Extract the best individual and insert it in the mating pool
Repeat step 1 until mating pool is filled.
4- Stochastic Universal Sampling
Stochastic universal sampling provides zero bias, yet individuals of better fitness have a better
chance of being selected.

The individuals are mapped to contiguous segments of a line or circle, such that each individual's
segment is equal in size to its fitness, exactly as in Roulette wheel selection.

Here equally spaced pointers are placed over the line, as many as there are individuals to be
selected.
Implementation:
Consider N the number of individuals to be selected; then the distance between the pointers is
1/N.
The position of the first pointer is given by a randomly generated number in the range [0, 1/N].
For 6 individuals to be selected, the distance between the pointers is 1/6 ≈ 0.167.
Sample of 1 random number in the range [0, 0.167]: 0.1.
After selection the mating population consists of the individuals 1, 2, 3, 4, 6, 8.
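A minimal sketch of SUS, assuming fitness segments normalized to [0, 1] and a single random starting point in [0, 1/N); names are illustrative:

```java
import java.util.Random;

// Sketch of Stochastic Universal Sampling: N equally spaced pointers,
// spacing 1/N, one random start in [0, 1/N).
public class SUS {
    static int[] select(double[] fitness, int n, double start) {
        double total = 0;
        for (double f : fitness) total += f;
        int[] chosen = new int[n];
        double pointer = start;                     // must lie in [0, 1/n)
        int idx = 0;
        double cumulative = fitness[0] / total;     // end of first segment
        for (int i = 0; i < n; i++) {
            // Advance to the segment containing the current pointer.
            while (idx < fitness.length - 1 && pointer > cumulative) {
                idx++;
                cumulative += fitness[idx] / total;
            }
            chosen[i] = idx;
            pointer += 1.0 / n;                     // next equally spaced pointer
        }
        return chosen;
    }

    public static void main(String[] args) {
        double[] fitness = {144, 625, 25, 361};
        double start = new Random().nextDouble() / 4;   // [0, 1/N)
        for (int idx : select(fitness, 4, start))
            System.out.println("selected string " + (idx + 1));
    }
}
```

Because all pointers come from a single spin, the number of copies of each individual can differ from its expected count by at most one, which is the source of the zero bias.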

5- Elitism
The first best chromosomes (2, 3, or more) are copied directly to the new population.
The rest of the new population is generated through one of the previous techniques.
Because the classical selection techniques and crossover do not guarantee better offspring, good
individuals (better solutions) can be lost if they are not selected.
Elitism significantly improves the GA's performance.





IV-Crossover (Recombination)
Crossover is the process of taking two parent solutions (after the selection process) and producing
from them a child, or offspring.
Reproduction makes clones of good strings but does not create new ones.
The crossover operator is applied to the mating pool with the hope that it creates better offspring.
Crossover is a recombination operator that proceeds in three steps:
1. Select at random a pair of individual strings for mating.
2. Select a cross site at random along the string length.
3. Swap the position values between the two strings following the cross site.
There are various techniques for crossover.
1- Single Point Crossover
The traditional genetic algorithm uses single point crossover, where the two mating chromosomes
are cut once at corresponding points and the sections after the cuts are swapped.

Here, a cross-site or crossover point is selected randomly along the length of the mated strings and
bits next to the cross-sites are exchanged.
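Single point crossover on binary strings can be sketched as follows, using the same pair of strings as the worked example later in this chapter; the class name is an illustrative assumption:

```java
// Sketch of single point crossover: bits after the cross site are swapped.
public class SinglePointCrossover {
    static String[] crossover(String p1, String p2, int cutPoint) {
        String c1 = p1.substring(0, cutPoint) + p2.substring(cutPoint);
        String c2 = p2.substring(0, cutPoint) + p1.substring(cutPoint);
        return new String[]{c1, c2};
    }

    public static void main(String[] args) {
        // 0110|0 and 1100|1 cut after bit 4 give 01101 and 11000.
        String[] children = crossover("01100", "11001", 4);
        System.out.println(children[0] + " " + children[1]);
    }
}
```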
2- Two Point Crossover
Apart from single point crossover, many different crossover algorithms have been devised, often
involving more than one cut point.
Adding further crossover points can reduce the performance of the GA, since good building blocks
are more likely to be disrupted.
An advantage of having more crossover points is that diverse offsprings are created, i.e., searching
is more diverse.
In two-point crossover, two crossover points are chosen and the contents between these points are
exchanged between two mated parents.


Originally, GAs were using one-point crossover which cuts two chromosomes in one point and
splices the two halves to create new ones.
But with this one-point crossover, the head and the tail of one chromosome cannot be passed
together to the offspring.
So, if both the head and the tail of a chromosome contain good genetic information, none of the
offspring obtained directly with one-point crossover will share the two good features.
Using 2-point crossover is generally considered better than 1-point crossover.
3- Multi-Point Crossover (N-Point crossover)
In multipoint crossover, N random cross-sites are chosen.
There are two ways in this crossover: One is even number of cross-sites and the other odd number
of cross-sites.
In the case of even number of cross-sites, cross-sites are selected randomly.
In the case of odd number of cross-sites, a different cross-point is always assumed at the string
beginning.
4- Uniform Crossover
Each gene in the offspring is created by copying the corresponding gene from one or the other
parent chosen according to a random generated binary crossover mask of the same length as the
chromosomes.
Where there is a 1 in the crossover mask, the gene is copied from the first parent, and where there
is a 0 in the mask the gene is copied from the second parent.

A new crossover mask is randomly generated for each pair of parents.
The Uniform Crossover uses a fixed mixing ratio between two parents.

Unlike one-point and two-point crossover, uniform crossover enables the parent chromosomes
to contribute at the gene level rather than the segment level.
If the mixing ratio is 0.5, the offspring has approximately half of the genes from the first parent and
the other half from the second parent, although the crossover points can be randomly chosen.
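A sketch of uniform crossover with an explicit mask: a 1 copies the gene from the first parent, a 0 from the second. Class and method names are illustrative:

```java
import java.util.Random;

// Sketch of uniform crossover driven by a random binary mask.
public class UniformCrossover {
    static String crossover(String p1, String p2, String mask) {
        StringBuilder child = new StringBuilder();
        for (int i = 0; i < p1.length(); i++)
            child.append(mask.charAt(i) == '1' ? p1.charAt(i) : p2.charAt(i));
        return child.toString();
    }

    // A fresh mask is generated for each pair of parents; the mixing
    // ratio is the probability of a 1 at each position.
    static String randomMask(int length, double mixingRatio, Random rng) {
        StringBuilder mask = new StringBuilder();
        for (int i = 0; i < length; i++)
            mask.append(rng.nextDouble() < mixingRatio ? '1' : '0');
        return mask.toString();
    }

    public static void main(String[] args) {
        String mask = randomMask(5, 0.5, new Random());
        System.out.println("mask " + mask + " -> "
                           + crossover("01100", "11001", mask));
    }
}
```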
5- Three Parent Crossover
In this crossover technique, three parents are randomly chosen.
Each bit of the first parent is compared with the bit of the second parent.
If both are the same, the bit is taken for the offspring; otherwise, the bit from the third parent is
taken for the offspring.


6- Shuffle Crossover
Shuffle crossover is related to uniform crossover.
A single (or multiple) crossover position is selected. But before the variables are exchanged, they
are randomly shuffled in both parents.
This removes positional bias as the variables are randomly reassigned each time crossover is
performed.

7- Precedence Preservative Crossover (PPX)
PPX was independently developed for vehicle routing problems.
For instance, let's consider six operations A-F:
Parent1 = (A, B, C, D, E, F), and Parent2 = (C, A, B, F, D, E)
The operator works as follows:
A vector representing the number of operations is randomly filled with elements of the set {1, 2}:
1 for Parent1 and 2 for Parent2.

In this strategy, parents and offspring are permutation lists: no redundancy.
First, we start by initializing an empty offspring.
We then select operations (alleles) from the parents following the randomly generated vector.
Each selected operation is appended to the offspring.
After an operation is selected, it is deleted from both parents.
This step is repeated until both parents are empty and the offspring contains all the operations
involved.
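The PPX steps above can be sketched as follows; the random vector is fixed here for reproducibility, and the class name is an illustrative assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of Precedence Preservative Crossover on permutation lists.
public class PPX {
    static List<String> crossover(List<String> p1, List<String> p2, int[] vector) {
        List<String> parent1 = new ArrayList<>(p1);
        List<String> parent2 = new ArrayList<>(p2);
        List<String> child = new ArrayList<>();
        for (int v : vector) {
            // Take the leftmost remaining operation of the chosen parent...
            String op = (v == 1) ? parent1.get(0) : parent2.get(0);
            child.add(op);
            // ...then delete it from both parents.
            parent1.remove(op);
            parent2.remove(op);
        }
        return child;
    }

    public static void main(String[] args) {
        List<String> p1 = Arrays.asList("A", "B", "C", "D", "E", "F");
        List<String> p2 = Arrays.asList("C", "A", "B", "F", "D", "E");
        int[] vector = {1, 1, 2, 2, 2, 1};   // illustrative random vector
        System.out.println(crossover(p1, p2, vector));
    }
}
```

With the vector (1, 1, 2, 2, 2, 1), the child built from the two example parents is (A, B, C, F, D, E): a valid permutation that preserves the precedence relations of both parents.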


8- Ordered Crossover
Given two parent chromosomes, two random crossover points are selected partitioning them into a
left, middle and right portion.
Offspring 1 inherits its left and right section from parent 1.
The middle section is determined by the genes in the middle section of parent 1 in the order in
which the values appear in parent 2.
A similar process is applied to determine child 2.

9- Partially-Matched Crossover (PMX)
In Partially Matched Crossover, two strings are aligned, and two crossover points are selected
uniformly at random along the length of the strings.
The two crossover points give a matching selection.
The crossover is performed as position-by-position exchange operations.

Consider the above example, two crossover points were selected at random: positions 4 and 6.
Therefore, the genes are exchanged: the 3 and the 2, the 6 and the 7, the 5 and the 9 exchange
places. So, child A looks like:

Child B also is formed as by exchanging the same genes in parent B:

Therefore, each offspring contains ordering information partially determined by each of its parents.
PMX can be applied to problems with permutation representation.
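A sketch of PMX following the position-by-position exchange described above, on a classic 10-gene example (the strings here are illustrative, not those of the figure):

```java
import java.util.Arrays;

// Sketch of Partially Matched Crossover: for each position in the
// matching section, the paired values exchange places inside the child.
public class PMX {
    static int[] crossover(int[] self, int[] other, int cut1, int cut2) {
        int[] child = self.clone();
        for (int i = cut1; i < cut2; i++) {
            int a = self[i];      // gene of this parent at position i
            int b = other[i];     // matching gene of the other parent
            // Swap the values a and b wherever they occur in the child,
            // so the result stays a valid permutation.
            for (int j = 0; j < child.length; j++) {
                if (child[j] == a) child[j] = b;
                else if (child[j] == b) child[j] = a;
            }
        }
        return child;
    }

    public static void main(String[] args) {
        int[] parentA = {9, 8, 4, 5, 6, 7, 1, 3, 2, 10};
        int[] parentB = {8, 7, 1, 2, 3, 10, 9, 5, 4, 6};
        // Matching section: positions 3 to 5 (0-based), three genes wide.
        System.out.println(Arrays.toString(crossover(parentA, parentB, 3, 6)));
        System.out.println(Arrays.toString(crossover(parentB, parentA, 3, 6)));
    }
}
```

Each child keeps ordering information from both parents while remaining a permutation, which is why PMX suits problems such as the TSP.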

10- Crossover Probability (Pc)
Crossover probability is a parameter to describe how often crossover will be performed.
If the crossover probability is 100%, then all offspring are made by crossover.
If it is 0%, the whole new generation is made from exact copies of chromosomes from the old population.
Even in this case, the new generation is not always identical to the old one: mutation can still play
a role in modifying it.
Usually, it is good to carry some of the old population over to the next generation. This can be done
by selecting a crossover probability less than 100%.


V- Mutation
After crossover, the strings are subjected to mutation.
If crossover is supposed to exploit the current solution to find better ones, mutation is supposed to
help for the exploration of the whole search space.
Mutation helps to maintain genetic diversity in the population.
Mutation introduces new genetic structures in the population.
Mutation prevents the algorithm from being trapped in a local minimum.
Mutation ensures ergodicity: A search space is said to be ergodic if there is a non-zero probability
of generating an optimal solution from any population state.
There are many different forms of mutation for the different kinds of representation.
1- Flipping
Flipping of a bit involves changing 0 to 1, and 1 to 0 based on a mutation chromosome generated.

For a 1 in mutation chromosome, the corresponding bit in parent chromosome is flipped (0 to 1
and 1 to 0) and child chromosome is produced.
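Flipping driven by a mutation chromosome can be sketched as follows, using the strings from the worked example later in this chapter; the class name is an illustrative assumption:

```java
// Sketch of flipping mutation: wherever the mutation chromosome holds
// a 1, the corresponding bit of the parent is inverted.
public class FlipMutation {
    static String mutate(String parent, String mutationChromosome) {
        StringBuilder child = new StringBuilder();
        for (int i = 0; i < parent.length(); i++) {
            if (mutationChromosome.charAt(i) == '1')
                child.append(parent.charAt(i) == '0' ? '1' : '0');  // flip
            else
                child.append(parent.charAt(i));                     // copy
        }
        return child.toString();
    }

    public static void main(String[] args) {
        // 01101 with mutation chromosome 10000 yields 11101.
        System.out.println(mutate("01101", "10000"));
    }
}
```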

2- Interchanging
Two random positions of the string are chosen and the bits corresponding to those positions are
interchanged.

3- Reversing
A random position is chosen and the bits next to that position are reversed.


4- Mutation Probability (Pm)
The mutation probability decides how often parts of a chromosome will be mutated.
If the mutation probability is 100%, the whole chromosome is changed; if it is 0%, nothing is changed.
Mutation should not occur very often, because then the GA would in fact turn into a random search.
The mutation probability concerns the number of individuals that can be mutated, not the
number of genes inside one individual.

VI- Replacement
Replacement is the last stage of the breeding cycle.
After selecting two parents from a fixed-size population (N), they usually produce two offspring
through crossover and mutation. Now, with 4 individuals (2 old and 2 new), only 2 of them should be
kept for the next GA iteration.
GA convergence is greatly affected by the technique used to decide which individuals stay and
which are replaced.
Basically, there are two kinds of methods for maintaining the population: generational updates and
steady-state updates.
The basic generational update scheme consists in producing N children from a population of size N
to form the population at the next time step (generation).
So, the new population of children completely replaces the parents.
Clearly this kind of update implies that an individual can only reproduce or crossover with
individuals from the same generation.
In a steady state update, new individuals are inserted in the population (2N) as soon as they are
created.
The insertion of a new individual usually necessitates the replacement of another population
member.
The individual to be deleted can be chosen as the worst member of the population, or as the oldest
member of the population.
This will lead to a very strong selection pressure.
Another alternative is to replace the most similar member in the existing population.
1- Random Replacement
The children replace two randomly chosen individuals in the population.


This can be useful for continuing the search in small populations.
2- Weak Parent Replacement
In weak parent replacement, a weaker parent is replaced by a strong child.
With the 4 individuals only the fittest two, parent or child, return to population.
This process improves the overall fitness of the population when paired with a selection technique
that selects both fit and weak parents for crossing; otherwise, a weak individual will never be
replaced.
3- Both Parents
The children replace both parents.
In this case, each individual only gets to breed once.
This leads to a problem when combined with a selection technique that strongly favors fit parents:
those parents (the fittest) are disposed of.

VII- Search Termination (Convergence Criteria)
The various stopping conditions are listed as follows:
Maximum generations: the genetic algorithm stops when the specified number of generations has
evolved (e.g., 100 generations).
Elapsed time: The genetic process will end when a specified time has elapsed (e.g., 60 minutes).
No change in fitness: the genetic process will end if there is no change to the population's best
fitness for a specified number of generations (e.g., the best fitness stays the same for 100 generations).
Stall generations: The algorithm stops if there is no improvement in the objective function for a
sequence of consecutive generations (e.g., nearly fixed fitness for 100 generations).
Stall time limit: The algorithm stops if there is no improvement in the objective function during an
interval of time (e.g., nearly fixed fitness for 60 minutes).
Convergence value: the algorithm stops if a specific individual (or individuals) has its fitness
less than a convergence value (e.g., convergence value = 10). The following methods use the
convergence value:
1- Best Individual
A best individual convergence criterion stops the search once the maximum fitness in the
population drops below the convergence value.
This brings the search to a faster conclusion guaranteeing at least one good solution.
2- Worst individual
Worst individual terminates the search when the least fit individuals in the population have fitness
less than the convergence value.
This guarantees the entire population to be of minimum standard.

3- Sum of Fitness
The search is stopped when the sum of the fitness in the entire population is less than or equal to
the convergence value in the population record.
This guarantees that virtually all individuals in the population will be within a particular fitness
range.
The population size has to be considered while setting the convergence value.
4- Median Fitness
Here at least half of the individuals will be better than or equal to the convergence value.
This should give a good range of solutions to choose from.


Example Problems
I- Maximizing a Function:
Consider the problem of maximizing the function
f(x) = x^2, where x = 0, 1, 2, ..., 31.
1- Encoding
For using genetic algorithms approach, one must first code the decision variable x into a finite
length string.
Using a five-bit unsigned integer, numbers between 0 (00000) and 31 (11111) can be represented.
The objective function here is f(x) = x^2, which is to be maximized.
2- Initial Population
An initial population of size 4 is randomly chosen: {12, 25, 5, 19}
Then, we should obtain the decoded x values for the initial population generated:
String # X value Binary Code
1 12 01100
2 25 11001
3 5 00101
4 19 10011

3- Objective function
Calculate the fitness or objective function for each individual.
This is obtained by simply squaring the x value, since the given function is f(x) = x^2.
String # X value Binary code Fitness value
1 12 01100 144
2 25 11001 625
3 5 00101 25
4 19 10011 361

4- Probability of selection
Compute the probability of selection as follows:
P_i = f(x_i) / Σ f(x_j)
So, for string 1, fitness f(x_1) = 144, and Σ f(x_i) = 1155.
The probability that string 1 is selected is given by:
P_1 = 144 / 1155 = 0.1247 = 12.47%
String #   X value   Binary code   Fitness value   Probability
1          12        01100         144             12.47 %
2          25        11001         625             54.11 %
3          5         00101         25              2.16 %
4          19        10011         361             31.26 %
Sum                                1155            100 %

5- Roulette selection: Expected count
Roulette selection can be implemented in many ways.
The expected and actual count method is proposed for roulette selection.
The next step is to calculate the expected count as follows:
Expected count_i = f(x_i) / Average, where Average = (Σ f(x_i)) / N, and N = 4.
For string 1: Expected count = f(x_1) / Average = 144 / (1155/4) = 144 / 288.75 = 0.4987
String #   X value   Binary code   Fitness value   Prob.     Expected count
1          12        01100         144             12.47%    0.4987
2          25        11001         625             54.11%    2.1645
3          5         00101         25              2.16%     0.0866
4          19        10011         361             31.26%    1.2502
Sum                                1155            100%      4
Average                            288.75          25%       1
Maximum                            625             54.11%    2.1645

The expected count gives an idea of which population can be selected for further processing in the
mating pool.
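The probabilities and expected counts above can be recomputed programmatically. A minimal sketch; class and method names are illustrative:

```java
// Recomputing the selection statistics of the worked example:
// fitness f(x) = x^2, probability f_i / sum, expected count f_i / average.
public class SelectionStats {
    static int fitness(int x) { return x * x; }

    static double[] probabilities(int[] xs) {
        double total = 0;
        for (int x : xs) total += fitness(x);
        double[] p = new double[xs.length];
        for (int i = 0; i < xs.length; i++) p[i] = fitness(xs[i]) / total;
        return p;
    }

    static double[] expectedCounts(int[] xs) {
        double[] p = probabilities(xs);
        double[] e = new double[xs.length];
        // Expected count = probability * N = fitness / average fitness.
        for (int i = 0; i < xs.length; i++) e[i] = p[i] * xs.length;
        return e;
    }

    public static void main(String[] args) {
        int[] population = {12, 25, 5, 19};
        double[] p = probabilities(population);
        double[] e = expectedCounts(population);
        for (int i = 0; i < population.length; i++)
            System.out.printf("string %d: p = %.4f, expected count = %.4f%n",
                              i + 1, p[i], e[i]);
    }
}
```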
6- Roulette selection: Actual count
The actual count is to be obtained to select the individuals, which would participate in the
crossover cycle using Roulette wheel selection.
The Roulette wheel is formed as follows:
[Figure: Roulette wheel with slices proportional to fitness - string 1 (12): 12.47%, string 2 (25): 54.11%, string 3 (5): 2.16%, string 4 (19): 31.26%]
The wheel may be spun and the number of occurrences of population is noted to get actual count.
String #   X value   Binary code   Fitness value   Prob.     Expected count   Actual count
1          12        01100         144             12.47%    0.4987           1
2          25        11001         625             54.11%    2.1645           2
3          5         00101         25              2.16%     0.0866           0
4          19        10011         361             31.26%    1.2502           1
Sum                                1155            100%      4                4
Average                            288.75          25%       1                1
Maximum                            625             54.11%    2.1645           2

String 1 occupies 12.47%, so there is a chance for it to occur at least once. Hence its actual count
may be 1.
With string 2 occupying 54.11% of the Roulette wheel, it has a fair chance of being selected more
than once. Thus its actual count can be considered as 2.
On the other hand, string 3 has the lowest probability, 2.16%, so its chance of occurring in the
next cycle is very poor. As a result, its actual count is 0.
String 4 with 31.26% has at least one chance for occurring while Roulette wheel is spun, thus its
actual count is 1.
7- Mating pool:
The actual count of string no 1 is 1, hence it occurs once in the mating pool.
The actual count of string no 2 is 2, hence it occurs twice in the mating pool.
Since the actual count of string no 3 is 0, it does not occur in the mating pool.
Similarly, the actual count of string no 4 being 1, it occurs once in the mating pool.
Based on this, the mating pool is formed as follows:
String #   X value   Mating pool
1          12        01100
2          25        11001
2          25        11001
4          19        10011
8- Crossover:
The crossover operation is performed to produce new offspring (children).
The crossover probability is assumed to be 1.0.
The crossover point is chosen randomly, and based on it, single point crossover is performed and
new offspring are produced.
String #   X value   Mating pool   Cross point   Offspring code   Offspring x value
1          12        0110|0        4             0110|1           13
2          25        1100|1        4             1100|0           24
2          25        11|001        2             11|011           27
4          19        10|011        2             10|001           17

9- Mutation:
The mutation operation is performed to produce new offspring after the crossover operation.
We select the flipping mutation operation; new offspring are then produced.
The mutation probability is assumed to be 0.001.
String #   Offspring x value   Code before mutation   Mutation chromosome   Code after mutation   x value after mutation
1          13                  01101                  10000                 11101                 29
2          24                  11000                  00000                 11000                 24
3          27                  11011                  00000                 11011                 27
4          21                  10001                  00100                 10101                 21

10- Evaluation:
Once selection, crossover and mutation are performed, the new population is now ready to be
tested.
String #   X value   Binary code   Fitness value   Prob.     Expected count
1          29        11101         841             32.51%    1.30
2          24        11000         576             22.27%    0.89
3          27        11011         729             28.18%    1.13
4          21        10101         441             17.05%    0.68
Sum                                2587            100%      4
Average                            646.75          25%       1
Maximum                            841             32.51%    1.30


Note how the maximal and average performance have improved in the new population.
The population's average fitness has improved from 288.75 to 646.75 in one generation.
The maximum fitness has increased from 625 to 841 over the same period.
This example has shown one generation of a simple genetic algorithm; running many more
generations can yield solutions closer to the optimum.

II- The Travelling Salesman Problem (TSP)


A traveling salesman has to travel through some cities, in such a way that the expenses on
traveling are minimized.
Formal definition: we are given a complete undirected graph G that has a nonnegative integer cost
(weight) associated with each edge, and we must find a Hamiltonian cycle (a tour that passes
through all the vertices) of G with minimum cost.
The Traveling Salesman Problem is a permutation problem in which the goal is to find the shortest
path between N different cities that the salesman takes in a tour.
The problem deals with finding a route covering all the cities so that the total distance traveled is
minimal.
1- Encoding
All the cities are sequentially numbered starting from one, e.g., (1, 2, 3, 4)
The route between the cities is described with an array with each element of the array representing
the number of the city.
The array represents the sequence in which the cities are traversed to make up a tour.


Each chromosome must contain each and every city exactly once.

For instance, this chromosome represents the tour starting from city 1 to city 7 and so on and back
to city 1.
2- Fitness function
The fitness function should be chosen so that the route with the lowest cost has the highest
fitness value (e.g., fitness = -cost).
3- Selection
Parents' selection can be based on the highest fitness or roulette scheme, or any other appropriate
technique.
4- Crossover
We can use any crossover technique that guarantees that the offspring are valid
permutations.
5- Mutation
For this problem, mutation refers to a randomized exchange of cities in the chromosomes.
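The encoding, fitness, and mutation steps can be sketched together as follows. The distance matrix is an illustrative assumption, and this sketch is not the Java implementation referenced at the end of the chapter:

```java
import java.util.Random;

// Sketch of TSP chromosome utilities: tour cost under a hypothetical
// distance matrix, and mutation as a random exchange of two cities.
public class TSPOps {
    // Total length of a closed tour (returns to the starting city).
    static int tourLength(int[] tour, int[][] dist) {
        int total = 0;
        for (int i = 0; i < tour.length; i++) {
            int from = tour[i] - 1;                    // cities numbered from 1
            int to = tour[(i + 1) % tour.length] - 1;  // wrap back to the start
            total += dist[from][to];
        }
        return total;
    }

    // Mutation: swap two randomly chosen positions; the chromosome
    // remains a valid permutation of the cities.
    static void swapMutation(int[] tour, Random rng) {
        int i = rng.nextInt(tour.length);
        int j = rng.nextInt(tour.length);
        int tmp = tour[i]; tour[i] = tour[j]; tour[j] = tmp;
    }

    public static void main(String[] args) {
        int[][] dist = {            // illustrative symmetric distances, 4 cities
            {0, 2, 9, 10},
            {2, 0, 6, 4},
            {9, 6, 0, 8},
            {10, 4, 8, 0}
        };
        int[] tour = {1, 2, 3, 4};
        System.out.println("length = " + tourLength(tour, dist));
        swapMutation(tour, new Random());
        System.out.println("mutated length = " + tourLength(tour, dist));
    }
}
```

A GA for the TSP would minimize tourLength (or equivalently maximize its negation as the fitness) while applying swapMutation and a permutation-preserving crossover such as PMX or PPX.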


6- Selection Method
Using a steady-state selection mechanism, two chromosomes from the population are selected for
the crossover and mutation operations.
The offspring so obtained replace the least fit chromosomes in the existing population.



7- Implementation
See the Java implementation of TSP:
