
2006 IEEE Congress on Evolutionary Computation

Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada


July 16-21, 2006

Minimum Number of Generations Required for Convergence of Genetic Algorithms
Matthew S. Gibbs, Student Member, IEEE, Holger R. Maier, Graeme C. Dandy, and John B. Nixon

Abstract: Genetic Algorithms (GAs) have been applied to a wide range of optimization problems; however, a great deal of time and effort is required to calibrate the GA parameters to ensure that the best possible solutions are located. It is proposed that there exists a minimum number of GA generations before the members of a population will converge to a solution for a given optimization problem. This property would be useful in the calibration of a GA: if there is a constant number of generations required to solve the problem, the best population size can be determined by dividing the desired number of function evaluations by the minimum number of generations. The hypothesis is tested for two versions of a test function: a commonly used separable test function, and a version of the function with epistatic interactions introduced between decision variables. Different problem sizes and convergence criteria are also considered. Two different relationships are identified. For the case where epistatic interactions are introduced into the test function, the hypothesis is validated, as a constant number of generations before convergence is identified, and this number increases with the size of the problem. However, for the case with no interactions between decision variables, the smallest population size produced the best results, regardless of problem size or convergence criteria.

I. INTRODUCTION
In recent years, Evolutionary Algorithms, in particular
Genetic Algorithms (GAs), have been applied to many different optimization problems, generally with great success
([1], [2]). The fact that these are guided search methods that
carry out optimization in conjunction with any simulation
model has allowed them to be adopted in a wide range
of disciplines. However, the successful application of a GA
requires expert knowledge of the algorithm, and how it solves
a problem.
One potential cause of poor performance of a GA is the
incorrect calibration of the GA parameters. Common GA
parameters include population size, probability of crossover,
and probability of mutation. The values used for these
parameters must be calibrated to each specific problem that is
tackled. Both the absolute value used for these parameters, as
well as their relative values, will determine how a GA finds
new solutions and, ultimately, the quality of the final solution
found. The values adopted for the GA parameters will
produce a search behavior between two possible extremes:
exploitation, where the current best solution is used as a basis for finding better solutions; and exploration, where solutions are combined to explore the entire search space [2].

M.S. Gibbs, H.R. Maier and G.C. Dandy are at the Centre for Applied Modelling in Water Engineering, School of Civil and Environmental Engineering, University of Adelaide, Adelaide SA 5005, Australia (e-mail: mgibbs;hmaier;gdandy@civeng.adelaide.edu.au).
J.B. Nixon is with United Water International Pty Ltd, GPO Box 1875, Adelaide SA 5001, Australia (e-mail: john.nixon@uwi.com.au).

0-7803-9487-9/06/$20.00 ©2006 IEEE
As the GA parameter values must be calibrated for each
problem, it would be expected that the best GA parameters would be related to characteristics of the problem [3].
A number of problem characteristics have been identified
that affect the difficulty of a problem in the context of
GAs, including [4]: isolation, where the global optimum is
not located near the better local optima; multimodality, or
the number of local optima present in the fitness function
to potentially trap the GA at sub-optimal solutions; and
deception, or the degree of interaction between decision
variables, requiring accurate processing of combinations of
values in the population. The last characteristic is also known
as epistasis, where a highly epistatic problem has many
interactions between a number of decision variables.
Suitable GA parameters are typically found using a trial
and error approach, as commonly the characteristics of
the problem are largely unknown. However, this approach
requires a great deal of time and computational effort.
Consequently, a number of methods have been suggested in
an attempt to provide an insight into how to set these values.
These methods can be divided into three classes: empirical
studies; parameter adaptation; and theoretical modeling.

A. Empirical Studies
The simplest method for determining useful relationships
considering all aspects of a GA is to perform empirical
analyses on a range of test functions. However, there is
the potential that the results will be specific to the cases
that have been considered in the analyses, and will not hold
for the general case. [5] provide an example of this: many of the test functions that have previously been used in empirical experiments are separable functions, in which there are no nonlinear interactions between decision variables. The functions
may be nonlinear in the contribution from each decision
variable to the fitness function, but without the nonlinear
interactions between the decision variables, each decision
variable can be optimized independently of the others. These
separable functions are often readily solved by local search
methods [5], and are most likely unsuitable representations
of more realistic problems that would be tackled in GA
applications. At best, empirical analyses can provide some
useful results about the classes of functions that have been
considered in the analyses, but it is unlikely that the results
will extend to more general applications of GAs.


B. Parameter Adaptation
A review of parameter adaptation methods is provided
in [6]. Three classes of parameter adaptation are defined:
deterministic parameter control, where the values change
in accordance with a predetermined rule; adaptive parameter control, where the values are changed based on the
performance of the algorithm; and self-adaptive parameter
control, where the parameters are built into the optimization
problem for the GA to solve along with the optimization
problem itself. While a number of these methods provide
some guidance on applying a GA to an optimization problem,
a complete calibration methodology based on parameter
adaptation is yet to be proposed. Also, only the adaptive
methods (and possibly, implicitly, the self-adaptive methods)
adjust the parameters based on the performance of the GA
on a given problem, and none of the methods analyze the
problem to determine the most appropriate GA parameters
for each case.
C. Theoretical Modeling
Numerous theoretical modeling studies have been performed to provide an insight into the inner workings of GAs.
As the interaction between GA operators is very complex,
generally the models consider the effect of only one or two
aspects of a GA, enabling meaningful results to be drawn.
Some of these studies include considering mutation alone
[7], selection and crossover [8], convergence due to selection
[9], convergence due to genetic drift [10], and using the
initial supply of building blocks and their correct selection
to determine the population size [11].
[9] and [12] have used simple models of GA convergence to relate the number of generations before convergence occurs to the length of the solution string. The models are based on the fact that, for selection schemes with a constant selection intensity (e.g., tournament selection), the increase in population mean fitness from one generation to the next, f̄(g+1) − f̄(g), is equal to the product of the selection intensity, I, and the standard deviation of the population fitness, σ_f(g) [12]:

$\bar{f}(g+1) - \bar{f}(g) = \sigma_f(g)\, I.$

This relationship is independent of the algorithm coding or other operators (such as crossover or mutation), as it is only dependent on the distribution of the fitness function under a selection operator with constant selection intensity. [9] considers normally distributed fitness functions to derive that the number of generations before convergence, gconv, is of the order O(√l), where l is the string length. As a specific case, the Onemax problem is considered, where expressions can be derived for f̄ and σ_f as a function of the proportion of the genes that have converged to the optimal value, 1. The relationship gconv = 2√l is derived for this problem with a tournament size of two, and validated experimentally with tournament selection and uniform crossover.
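For illustration, the Onemax convergence model can be iterated numerically. The sketch below is not taken from [9]; it assumes the Onemax relations f̄ = l·p and σ_f = √(l·p(1−p)), and a selection intensity of about 0.5642 for a tournament of size two. Running it shows that the predicted number of generations to convergence grows roughly with √l, with an approximately constant ratio gconv/√l across string lengths.

```python
import math

def onemax_convergence_generations(l, selection_intensity=0.5642, p0=0.5):
    """Iterate the Onemax convergence model p(g+1) = p(g) + I*sqrt(p(1-p)/l).

    Follows from f_bar(g+1) - f_bar(g) = sigma_f(g) * I with
    f_bar = l*p and sigma_f = sqrt(l*p*(1-p)).
    Returns the number of generations until p effectively reaches 1.
    """
    p = p0
    generations = 0
    while p < 1.0 - 1e-6:
        p = min(1.0, p + selection_intensity * math.sqrt(p * (1.0 - p) / l))
        generations += 1
    return generations

if __name__ == "__main__":
    for l in (16, 64, 256, 1024):
        g = onemax_convergence_generations(l)
        print(f"l = {l:5d}  g_conv = {g:4d}  g_conv / sqrt(l) = {g / math.sqrt(l):.2f}")
```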
[12] provides a similar analysis for arbitrarily distributed fitness functions, where some building blocks have a higher marginal fitness contribution than others. As an extreme case, exponentially scaled fitness functions are considered, and for this case it is determined that the number of generations before convergence, gconv, is of the order O(l). The BinInt problem is considered as a specific case, as relationships can be derived for f̄ and σ_f, and the relationship gconv = 1.76l is derived for this problem. The model is experimentally validated; however, an upper limit to gconv must be taken into consideration, which occurs when the population prematurely converges due to genetic drift.
Possibly the most comprehensive GA calibration methodology based on theoretical modeling results, proposed by
[13], makes use of the above relationship. The methodology
determines the initial GA population size using the gconv =
1.76l relationship outlined above (as gconv = 2l), as well
as a modeling result regarding the number of generations
before the population will arbitrarily converge due to genetic drift, gdrift ≈ 1.4n [10], where n is the population
size. Relationships obtained from modeling disruption during
crossover [14] are used to determine the crossover probability, and empirical rules are used to determine the probability
of mutation. After the GA has converged, the population
size is increased by a series of GA restarts, similar to that
proposed by [15], where for each restart the population size
is doubled, and the best solution from the previous run is
inserted into the initial population. The process continues
until the best solution found does not improve from one GA
run to the next.
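As a concrete illustration of the restart scheme just described, the following sketch sizes the initial population so that drift convergence (gdrift ≈ 1.4n) is not expected before selection convergence (gconv ≈ 2l), then doubles the population on each restart and seeds it with the best solution found so far. Combining the two relationships by requiring gdrift ≥ gconv is an assumption made here, not a rule quoted from [13], and run_ga is a hypothetical placeholder for a complete GA run.

```python
import math

def calibrate_by_restarts(run_ga, string_length, tol=0.0):
    """Sketch of a restart-based calibration: size the initial population from the
    g_conv ~ 2l and g_drift ~ 1.4n modeling results, then repeatedly double the
    population, seeding each restart with the best solution found so far,
    until the best fitness stops improving between runs (minimization assumed).

    `run_ga(population_size, seed_solution)` is assumed to return
    (best_solution, best_fitness).
    """
    # Require g_drift >= g_conv: 1.4 * n >= 2 * l.
    n = max(2, math.ceil(2.0 * string_length / 1.4))
    best_solution, best_fitness = run_ga(n, None)
    while True:
        n *= 2  # double the population size for the next restart
        solution, fitness = run_ga(n, best_solution)
        if fitness >= best_fitness - tol:  # no improvement between runs: stop
            return best_solution, best_fitness, n
        best_solution, best_fitness = solution, fitness
```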
While this approach is the most practical, comprehensive,
and theoretically sound calibration method available, a number of assumptions were made to determine the relationships
for setting the parameter values, and therefore may not be
valid in practice. For example, the relationship between gconv
and l is determined for functions with exponentially scaled
fitness contributions from each bit, and therefore is limited
to fitness functions where this is the case. Similarly, the
genetic drift model does not include a mutation operator
to reintroduce values into the population, and the constant
value of 1.4 is an expected value with a large variance [12].
The systematic increase in population size may provide a
mechanism to compensate for assumptions in the models
used to determine the initial population size, allowing the
GA to eventually find near-optimal solutions if the initial
population size is unsuitable. The methodology suggests
that the only characteristic of an optimization problem that
will affect the most suitable GA parameters is the string
length, and other characteristics of optimization problems
that contribute to problem difficulty, such as multimodality
and deception [4], are ignored.
This highlights a serious concern with the theoretical
results that have been obtained previously, where simple
models were used to enable conclusions to be drawn; however, beneficial attributes included in the GA were ignored.
Also, the vast majority of GA modeling has been conducted
using binary coding, as it is much simpler to perform the
analyses when each bit in the solution string can only be in


one state, 1, or the other, 0. For problems where the decision


variables are real values, [2] suggests that real coding will
allow the algorithm operators to perform more efficiently
and exploit the graduality of the function. In this case, each
decision variable can take any value over a specified range,
making the modeling of these algorithms much more complex. Similarly, the inclusion of all GA operators, for example
both crossover and mutation, as well as other operators that
have shown potential, such as elitism, further complicates the
modeling, and it very quickly becomes extremely difficult,
if not impossible, to draw any useful conclusions.
The modeling results provided by [9] and [12] suggest that
there is a constant number of generations before a population
will converge to a solution for a given problem, provided
that the population size is large enough to prevent genetic
drift occurring. A constant number of generations before convergence, gconv, would be an extremely beneficial property to assist GA calibration, as the most appropriate population size (n) could then be determined using the number of function evaluations that are available (FE), as n = FE/gconv.
The exact relationships developed between gconv , and the
string length, l, for the Onemax problem [9] and BinInt
problem [12] are specific to these functions using a GA with
a binary coding, tournament selection, and uniform crossover.
However, the basis for these relationships is only dependent
on the distribution of the fitness function under constant
selection intensity, so the dimensional relationship between
gconv and l may be expected to extend to any encoding and
GA parameters, provided a selection operator with constant
selection intensity is used.
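If such a constant gconv exists, the sizing rule itself is trivial to apply. The sketch below assumes a fixed evaluation budget and an externally supplied estimate of gconv; the scaling constants in the example are placeholders for illustration, not values prescribed by the analysis.

```python
import math

def population_size_from_budget(function_evaluations, g_conv):
    """Population sizing rule n = FE / g_conv, rounded to at least two individuals."""
    return max(2, round(function_evaluations / g_conv))

# Illustrative only: if g_conv is assumed to scale as c * sqrt(l) or c * l
# (c is a placeholder constant), the rule gives:
l, budget, c = 30, 100_000, 10.0
print(population_size_from_budget(budget, c * math.sqrt(l)))  # sqrt(l) scaling
print(population_size_from_budget(budget, c * l))             # linear scaling
```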
The aim of this paper is to test this hypothesis empirically.
To provide a distinct contrast to previous studies ([9], [12]), the hypothesis is experimentally tested using a
real-coded GA. In addition, a change in the problem characteristics is considered, in which one form of the test function
has interactions between decision variables, and one does not.
Different problem sizes are also considered to investigate if
there is a relationship between gconv and the problem size, l.
The remainder of the paper is structured as follows: Section II
outlines the methodology adopted, including details of the
GA and the variations of the test function used. The results
of the parametric study are presented in Section III, before
a discussion of the results and concluding remarks are made
in Sections IV and V, respectively.
II. METHODOLOGY
An empirical approach has been adopted to test if there
exists a constant number of generations before convergence,
gconv , for the given test problems. This approach allows
for realistic results to be obtained and, by considering the
characteristics of the test functions, the results obtained
are likely to extend to other problems with similar characteristics. To determine gconv , a GA has been calibrated
by means of a large-scale parametric study. The number
of generations before convergence has been determined by
identifying the best performing population size in a given
number of function evaluations, as it is assumed that the best

solutions are identified by the GA converging to them, as


opposed to randomly locating a good solution by chance. To
support this assumption, each set of GA parameter values that has been tested (Section II-C) was run with 30 different
sequences of random numbers, with the average of these
runs taken as the solution found. To investigate the effect
of epistatic interactions on gconv , two variations of the test
function have been considered, one with interactions between
adjacent decision variables, and one without. The GA has
been applied to each test function for a number of different
problem sizes, or dimensions, to investigate if there is a
relationship between gconv and the dimensionality of the
problem, l. The best performing GA parameters from the
large-scale parameter study were stored as the algorithm was
solving each function, to allow evaluation of the parameters
under different stopping criteria.
A. Test Functions
Table I lists the two versions of the benchmark optimization problem adopted for this research. Both functions have
been minimized over the range [−5.12, 5.12] for problem
sizes of l = 5, 10, 20, 30. F1 is the Rastrigin function as used
by [16], a common benchmark optimization problem. To
investigate the effect of epistatic interactions on this problem,
F2 has been constructed using the guidelines proposed by [5].
The guidelines allow a non-linear, non-separable, scalable
function to be developed, which provides a realistic challenge
to any optimization algorithm. The Rastrigin function has
been used as a basis, and to produce the non-separable
component of the function, which introduces interactions
between the decision variables, the xi terms have been
substituted with (xi + xi+1)/2. The final function developed can be seen in Table I as F2, where the sub-function in variable l has been included to ensure that there is a unique optimal solution to the problem.

TABLE I
THE TEST FUNCTIONS USED FOR THE PARAMETRIC STUDY.

F1:  $f_1(x) = \sum_{i=1}^{l} \left[ x_i^2 - 10\cos(2\pi x_i) \right] + 10l$

F2:  $f_2(x) = \sum_{i=1}^{l-1} \left[ \left(\frac{x_i + x_{i+1}}{2}\right)^2 - 10\cos\!\left(2\pi\,\frac{x_i + x_{i+1}}{2}\right) \right] + x_l^2 - 10\cos(2\pi x_l) + 10l$
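The two test functions in Table I translate directly into code. The following sketch (not the authors' implementation) evaluates F1 and F2, both of which take their global minimum of 0 at x = 0 within the range [−5.12, 5.12].

```python
import math
from typing import Sequence

def f1(x: Sequence[float]) -> float:
    """F1: the separable Rastrigin function."""
    l = len(x)
    return sum(xi ** 2 - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x) + 10.0 * l

def f2(x: Sequence[float]) -> float:
    """F2: Rastrigin with epistatic interactions between adjacent variables."""
    l = len(x)
    total = 0.0
    for i in range(l - 1):
        y = (x[i] + x[i + 1]) / 2.0  # coupling of adjacent decision variables
        total += y ** 2 - 10.0 * math.cos(2.0 * math.pi * y)
    # extra term in the last variable ensures a unique optimal solution
    total += x[-1] ** 2 - 10.0 * math.cos(2.0 * math.pi * x[-1])
    return total + 10.0 * l

if __name__ == "__main__":
    print(f1([0.0] * 5), f2([0.0] * 5))  # both print 0.0 at the optimum
```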
B. Genetic Algorithm
As the test functions have real-valued decision variables, a
real-coded GA has been used for the analyses. A tournament
size of two has been adopted, and there is the possibility of
none, one, or two elite solutions replacing the worst solutions
during each generation. It should be noted that when two
elite solutions are used, the best two solutions are inserted,
as opposed to two copies of the best solution.
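The selection and elitism scheme described above can be sketched as follows, assuming a minimization problem and a population stored as (solution, fitness) pairs; this is an illustrative outline rather than the authors' code.

```python
import random

def binary_tournament(population):
    """Return the fitter of two randomly chosen individuals (tournament size 2)."""
    a, b = random.sample(population, 2)
    return a if a[1] <= b[1] else b  # (solution, fitness), minimization

def apply_elitism(old_population, new_population, n_elite):
    """Replace the worst new individuals with the best n_elite old individuals
    (the best distinct solutions, not copies of the single best one)."""
    if n_elite == 0:
        return new_population
    elites = sorted(old_population, key=lambda ind: ind[1])[:n_elite]
    survivors = sorted(new_population, key=lambda ind: ind[1])[:len(new_population) - n_elite]
    return survivors + elites
```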
A one-point distributed crossover operator has been used,
as neighborhood-based crossover operators such as this have



been found to exploit the numerical nature of real-coded GAs [17]. When crossover is applied as part of this operator, a random crossover point is generated, and a new value of each decision variable is generated from a normal distribution centered around the first parent's solution values, p1, before the crossover point, or centered around the second parent's solution values, p2, after the crossover point. The distribution used for one decision variable can be seen in Figure 1. The crossover operator used for this work is similar to the Simulated Binary Crossover operator [18] and the Fuzzy Recombination crossover operator [19].

Fig. 1. The crossover distribution used for the GA for one decision variable. A normal distribution with standard deviation σ is used around p1 before the crossover point, then around p2 after the crossover point. The distribution shown uses σ = s/6.
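A sketch of the one-point distributed crossover described above is given below; it assumes each parent is a list of at least two real decision variables and that the standard deviation is specified as a fraction of the distance s between the parent values, as in Table II. It is an illustrative version, not the authors' code.

```python
import random

def distributed_one_point_crossover(parent1, parent2, sigma_fraction=1.0 / 6.0):
    """Generate one child: before a random crossover point, each gene is drawn
    from a normal distribution centred on parent1; after the point, on parent2.
    The standard deviation is a fraction of the distance between parent values."""
    l = len(parent1)
    point = random.randint(1, l - 1)  # crossover point between genes (l >= 2 assumed)
    child = []
    for i, (p1, p2) in enumerate(zip(parent1, parent2)):
        centre = p1 if i < point else p2
        s = abs(p1 - p2)  # distance between the two parent values
        child.append(random.gauss(centre, sigma_fraction * s))
    return child
```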
To provide an alternative search mechanism to the crossover operator used, uniformly distributed random mutation has been applied. A Gaussian mutation operator is commonly used for a real-coded GA; however, this would provide a very similar search mechanism to the crossover operator adopted for this work. A range of mutation probabilities has been considered, so that if uniformly distributed mutation is proving to be disruptive to the search, a small or zero probability of mutation will be the most effective. The probability of mutation, pm, is the probability of one decision variable in a solution string being subject to mutation; therefore, a value of pm = 1/l, giving an average of one mutated decision variable per string, corresponds to the empirical mutation rule for a binary-coded GA under bitwise mutation, pm = 1/l [7].
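The uniformly distributed mutation operator, applied independently to each decision variable with probability pm, can be sketched as follows (the bounds correspond to the search range of the test functions).

```python
import random

def uniform_mutation(solution, pm, lower=-5.12, upper=5.12):
    """With probability pm, replace each decision variable by a uniform random
    value from the full search range; otherwise leave it unchanged."""
    return [random.uniform(lower, upper) if random.random() < pm else x
            for x in solution]
```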
The GA parameters that must be selected are the population size, probability of crossover, probability of mutation, and the number of elite solutions per generation. The standard deviation, σ, of the distribution used for the crossover operator must also be selected, which was taken as a fraction of the distance between the two parent values, s. A smaller fraction of s will produce a tighter distribution around the parent values, and therefore greater exploitation of current solutions. Figure 1 shows a crossover distribution using the parameter σ = s/6.

C. Parametric Study

To determine the best parameters for the GA, a large-scale parametric study has been undertaken for each test function. While the results for the GA parameters other than the population size have not been included in the analyses, they must be included in the calibration of the GA to ensure that they do not bias the results toward a certain population size. The parameter values tested are given in Table II. Each combination of parameter values was tested with 30 different random number sequences to evaluate its performance. This resulted in a total of 59 400 GA runs for each function.

TABLE II
THE GA PARAMETER VALUES USED FOR THE PARAMETRIC STUDY.

Parameter                               Values
Population Size, n                      6, 10, 25, 50, 75, 100, 150, 200, 400, 800
Probability of Mutation, pm             0, 0.1, 0.2, 0.3, ..., 1.0
Probability of Crossover, pc            0.7, 0.85, 1
Standard Deviation of Crossover, σ      s/18, s/6
Elite Solutions per Generation, e       0, 1, 2
A GA with each set of parameter values was run until a solution within 10⁻⁶ of the actual optimum (F(x) = 0) was found, or until a maximum of 500 000 function evaluations was reached. The average and standard deviation of the best solution found over the 30 different GA runs for each GA parameter set were recorded every 1 000 function evaluations. A 2-tailed Student's t-test with a 95% confidence interval was used to compare parameter sets and identify those that were statistically the best as the GA solved each function. Of the best performing population sizes found by the Student's t-test, the median was used as the optimal population size, n. The hypothesis of this paper is that as FE is increased, n will increase proportionally, producing a constant gconv.
The results from these analyses, for the two variations of
the Rastrigin function, each with four problem sizes, are
presented in the following section.
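The comparison used to identify the statistically best parameter sets at each checkpoint can be sketched as below. Welch's two-sample t-test from SciPy is used here as an assumption, since the exact test variant beyond a 2-tailed Student's t-test at the 95% level is not specified; the median population size among the returned sets would then be taken as the optimal n.

```python
import numpy as np
from scipy import stats

def statistically_best_sets(results, alpha=0.05):
    """results: dict mapping a parameter-set id to the 30 best-fitness values
    recorded at one function-evaluation checkpoint (minimization).
    Returns the ids whose mean is not significantly worse than the best mean."""
    means = {k: np.mean(v) for k, v in results.items()}
    best_id = min(means, key=means.get)
    best = np.asarray(results[best_id], dtype=float)
    keep = []
    for k, v in results.items():
        # two-tailed t-test of each parameter set against the best-performing set
        _, p_value = stats.ttest_ind(best, np.asarray(v, dtype=float), equal_var=False)
        if k == best_id or p_value > alpha:
            keep.append(k)
    return keep
```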
III. RESULTS
The results for the optimal population sizes for F1 and
F2 are shown in Figures 2 and 3, respectively. The graphs
show the best performing population sizes plotted against the
number of function evaluations made. All of the parameter
combinations that find the current best solution (not necessarily the optimal solution), as determined from the Student's t-tests, are plotted as box plots against the number of function
evaluations made. The median of the box plots has been taken
as the optimal population size. The variations in population
size are produced by different combinations of the other
GA parameters (probability of crossover or probability of
mutation, for example) producing statistically similar results.
For F1, the analyses were stopped when the optimum
solution was reached. For F2, the analyses were stopped once
the optimal population size reached the maximum considered
in the parametric study, n = 800, as it would be expected
that beyond this, larger population sizes would outperform
those considered.
Two very different relationships are observed for the two
functions considered. For F1, a constant population size is
the most effective, and irrespective of the number of function
evaluations, the best performance was obtained with the
smaller population sizes. Figure 2 shows the optimal population sizes for F1 in l = 5 dimensions, and similar results
are found for F1 in l = 10, 20 and 30 dimensions, with

the only difference being that convergence was reached in 12 000, 28 000, and 42 000 function evaluations, respectively.

Fig. 2. Optimal Population Sizes (n) for F1, the original Rastrigin function, in l = 5 dimensions for increasing number of function evaluations (FE).

For F2, there is a clear relationship between the optimal population size and the number of function evaluations: as the number of function evaluations is increased, the optimal population size also increases. This relationship suggests that for F2, there exists a constant number of generations to most efficiently solve the problem. A graphical illustration of the

process used to determine the optimal number of generations for each problem size of F2 is shown in Figure 3. The inverse of the slope of the straight line fitted between the optimal population sizes (the median of the box plots) and the number of function evaluations produces gconv, as gconv = FE/n.

Fig. 3. Optimal Population Sizes (n) for F2, the Rastrigin function with interactions between decision variables introduced, in l = 5, 10, 20, and 30 dimensions for increasing number of function evaluations (FE). The inverse of the slope of the straight line fitted between the optimal population sizes and the number of function evaluations indicates the number of generations before convergence for each problem. The box plots for the given number of function evaluations represent different combinations of GA parameters that produce statistically similar results.
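Estimating gconv from Figure 3 amounts to fitting a straight line through the origin to the (FE, n) pairs and inverting its slope; a minimal sketch is given below.

```python
import numpy as np

def estimate_g_conv(function_evaluations, optimal_population_sizes):
    """Least-squares fit of n = slope * FE through the origin; g_conv = 1 / slope."""
    fe = np.asarray(function_evaluations, dtype=float)
    n = np.asarray(optimal_population_sizes, dtype=float)
    slope = np.sum(fe * n) / np.sum(fe * fe)  # closed-form through-origin fit
    return 1.0 / slope

# Usage: pass the checkpointed FE values and the median optimal population
# size observed at each checkpoint, e.g. estimate_g_conv(fe_list, n_list).
```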
Table III shows the optimal number of generations, g, for each problem size, l, considered. The effect of introducing epistatic interactions into the fitness function on the optimal GA population size is discussed in the following section (III-A), before Section III-B explains the problem size effects observed in Table III. Section IV provides a discussion of the wider implications of these results.

TABLE III
OPTIMAL GENERATION RESULTS FOR F2.

l      g      g/l      g/√l
5      70     14.00    31.30
10     100    10.00    31.62
20     202    10.10    45.17
30     302    10.07    55.14


A. Epistasis Effects
The effect of the epistatic interactions introduced into F2
observed above can be explained by the efficient use of function evaluations. For F2, where there are interactions between
decision variables, combinations of decision variable values
must be stored and evaluated to solve the problem. For a GA,
this is done using a large population size, and better solutions
are found using a larger population size, provided there is
time for the GA to process the information in the population
and converge to a solution. Hence, as the number of function
evaluations increases, so does the optimal population size.
However, these larger population sizes do not find the best
results for fewer function evaluations, as there is insufficient
time to process all the combinations of decision variable
values in the population and thus to converge on a solution.
For F1, which does not have any interactions between
decision variables, each decision variable can be optimized
independently of the other decision variable values. Therefore, it is not necessary to store combinations of the decision
variable values, and function evaluations are wasted on poor
quality solutions stored in a large population. As can be seen
from the results in Figure 2, it is more efficient to have
a smaller population size updated more frequently through
more generations for the completely separable function.
B. Problem Size Effects
The results indicate that for F1, the size of the problem
does not affect the optimal population size, which is always
the smallest population size that has been considered. Similar
to the case described above for the epistasis effects, as F1
is a separable function, each decision variable can be optimized separately and therefore the smallest population size is
always the most efficient, and the only way to locate better
solutions is through more function evaluations. Therefore,
in this case, and possibly for all separable functions, better
results are obtained by increasing the number of generations,
as opposed to increasing the population size.
Figure 3 indicates that for F2 the problem size does have
an effect on the optimal population size, where for the same
number of function evaluations a smaller population size
performs better for the larger problem sizes compared to that
for the smaller problem sizes. The straight line fitted between
the optimal population sizes and the number of function
evaluations, seen in Figure 3, suggests that for F2 there is
an optimal number of generations to solve the problem, and
the values of the optimal number of generations for each
problem size are given in Table III. Table III also shows

the proportionality constants for the g ∝ √l and g ∝ l relationships, as proposed by [9] and [12], respectively.
results are somewhat consistent with the modeling results obtained by [9] and [12], even though binary-coded GAs were
used in those studies, and a real-coded GA was used here.

For the two smaller problem sizes (l = 5, 10), the g ∝ √l relationship found by [9] was observed, as seen in Table III,
with a proportionality constant of approximately 31.5. This
result would be expected for F2, as each decision variable

has about the same contribution to the fitness function value


(although there is possibly a slightly higher contribution from
the lth decision variable).
For the problem sizes of F2 with l ≥ 10, the relationship between g and l changes to the relationship proposed by [12] for fitness functions with exponentially scaled fitness contributions from each decision variable, g ∝ l. This relationship
can be seen in Table III with a proportionality constant of
approximately 10. While the equation for F2 suggests that
each decision variable has the same contribution to the fitness
function value, for the larger problem sizes, there are more
decision variables, and due to the random nature of the GA, it
is more likely that they will converge at different rates. When
this is the case, the decision variables, or small combinations
of decision variable values, will have different contributions
to the fitness function value. Hence, the decision variables
that have values that are most distant from the optimum, and
therefore have the biggest contribution to the fitness function
value, must be improved first before the contributions from
other decision variables are significant again. Therefore, for
the larger problem sizes, it is more likely that there will be
variations in the rate of convergence of the different decision
variables, and the g ∝ l relationship is observed.
IV. DISCUSSION
By considering a controlled experiment on two versions
of the same function, one with epistatic interactions between
the decision variables and one without, the effect of these
interactions on gconv has been investigated in a realistic
environment. The results for F2 presented in Section III
support the initial hypothesis that, for this function, there
is a constant number of GA generations to best solve the
problem. Interestingly, this was only valid if the problem has
a characteristic that benefits from a larger population size,
in this case epistatic interactions between decision variables.
This can be seen in the results for F1, where for a completely
separable function, a small population size was the most
efficient, irrespective of problem size or convergence criteria.
The result could be used to assist in the calibration of
subsequent runs of a GA. If the solution found using an initial
population size is unsuitable, the results suggest that the best
results will be found by keeping the number of generations
constant, and therefore the most suitable population size can
be calculated based on the maximum number of function
evaluations available before a solution is required.
The relationships identified are useful in the case where
the equation of the fitness function is known, and therefore
so is the degree of interaction between the decision variables.
However, often this is not the case, for example, if the
fitness function is constructed from the results of a simulation
model. This highlights the potential usefulness of epistasis
measures that have been developed previously to quantify
problem difficulty, as if there are interactions between decision variables, there exists a gconv to most efficiently solve
the problem. This approach to using information provided
by problem difficulty measures is different to that which
has previously been adopted. In this case, the measures


would be used to determine the most suitable parameters for


the algorithm, as opposed to attempting to predict problem
difficulty or convergence behavior.
The results relating problem size to the optimal number
of generations for F2 correlated well with the relationships
proposed by [9] and [12]. There are a number of differences
between the modeling assumptions made by [9] and [12]
and the experimental conditions used for this study. Firstly,
the modeling results of [9] and [12] were obtained using a
binary solution string, where the optimal value for each bit
was assumed to be present in 50% of the initial random
population. In contrast, for the real-coded solution string
used in this work, it is highly unlikely that any of the
decision variables in the initial population have their optimal
value. Secondly, the convergence models in the previous
studies were developed based on the OneMax function [9]
and the BinInt function [12] assuming a uniform crossover
operator and no mutation operator, which is considerably
different to the real-valued, highly multi-modal test functions
optimized with a real-coded GA including a one point
crossover operator, mutation operator, and elitism presented
here. Despite these significant differences, the modeling
relationships proposed by [9] and [12] were observed in the
results obtained in this study. This is not surprising, as the
basis for these results is only dependent on the distribution
of the fitness function under constant selection intensity.
Interestingly, the results indicate that even for a problem with
an equal fitness contribution from each decision variable,
the decision variables, or combinations of decision variables,
converge at different rates for larger problem sizes.
The constant number of generations before convergence
is promising for the variation of the Rastrigin function
considered. However, it must be stressed that this is only
one fitness function and thus further work is required to
investigate whether these relationships hold for other functions, including other forms of interactions between decision
variables, before generalizations can be made about the
most suitable gconv for a particular problem. However, the
dimensional relationships between gconv and l have been
theoretically developed by [9] and [12], so they may be
expected to hold for the general case. For example, in this
work the relationships held for test functions and GAs that
were completely different from those used in the original
studies.

V. CONCLUSIONS AND FUTURE WORK

This paper has investigated the hypothesis that, under tournament selection, there exists a constant number of GA generations to most efficiently solve a given optimization problem. Two variations of a common benchmark test function, the Rastrigin function, have been considered, each for a range of problem sizes. For the cases considered, it was concluded that, provided the fitness function has a characteristic that benefits from being solved with a larger population size (such as interactions between decision variables), a constant number of generations exists that will locate the best solutions possible for the problem. When this is the case, the results suggest that the number of generations before convergence is related to problem size. However, if
there is not a problem characteristic that benefits from being
solved with larger population sizes (such as a separable
function with no interactions between decision variables), the
smallest population size was found to be the most effective,
irrespective of problem size or convergence criteria.
A number of areas for future research have been identified
from these results. First, a wider range of fitness functions
should be considered, to investigate if the constant number of
generations observed in this work is valid for more general
cases. By considering more examples of fitness functions,
the relationship between each problem's characteristics
and gconv can also be investigated. Second, the effect of
epistatic interactions on gconv has been highlighted; however,
this is only useful if it is known that epistatic interactions are
present in the fitness function. Further work will investigate
if problem difficulty measures, such as the epistatic variance,
can provide this information. This is a different approach to
previous uses of problem difficulty measures, which have
generally been used in attempts to predict problem difficulty
or convergence behavior. It may be that the degree of
interaction between decision variables is related to gconv .
Therefore, if the degree of interaction can be determined
from an epistasis measure, it can be used along with the
problem size to obtain an estimate for gconv. This will be
the focus of future work.
ACKNOWLEDGMENT
This work was supported in part by an Australian Postgraduate Award from the Commonwealth Department of
Education, Science and Training, and in part by the Cooperative Research Centre for Water Quality and Treatment,
Project 2.5.0.3.
REFERENCES
[1] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Boston, MA, USA: Addison-Wesley Publishing Company, Inc., 1989.
[2] F. Herrera, M. Lozano, and J. L. Verdegay, "Tackling real-coded genetic algorithms: Operators and tools for behavioural analysis," Artif. Intell. Rev., vol. 12, no. 4, pp. 265-319, 1998.
[3] M. Gibbs, H. R. Maier, and G. C. Dandy, "Applying fitness landscape measures to water distribution optimisation problems," in Proc. 6th Int'l Conf. on Hydroinformatics, S.-Y. Liong, K.-K. Phoon, and V. Babovic, Eds., vol. 1. Singapore: World Scientific Publishing, 2004, pp. 795-802.
[4] B. Naudts and L. Kallel, "A comparison of predictive measures of problem difficulty in evolutionary algorithms," IEEE Trans. Evol. Comput., vol. 4, no. 1, pp. 1-15, 2000.
[5] D. Whitley, K. Mathias, S. Rana, and J. Dzubera, "Building better test functions," in Proc. of the 6th Int'l Conf. on Genetic Algorithms, L. Eshelman, Ed. San Francisco, CA: Morgan Kaufmann, 1995, pp. 239-246.
[6] A. E. Eiben, R. Hinterding, and Z. Michalewicz, "Parameter control in evolutionary algorithms," IEEE Trans. Evol. Comput., vol. 3, no. 2, pp. 124-141, 1999.
[7] H. Mühlenbein, "How genetic algorithms really work, Part I: Mutation and hillclimbing," in Proc. 2nd Conf. on Parallel Problem Solving From Nature, R. Männer and B. Manderick, Eds., Amsterdam: North-Holland, 1992, pp. 15-25.



[8] D. Thierens, "Dimensional analysis of allele-wise mixing revisited," in Parallel Problem Solving from Nature - PPSN IV, H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, Eds. Berlin: Springer, 1996, pp. 255-265.
[9] D. Thierens and D. E. Goldberg, "Convergence models of genetic algorithm selection schemes," in Parallel Problem Solving from Nature - PPSN III, ser. Lect. Notes Comput. Sc., Y. Davidor, H. Schwefel, and R. Männer, Eds. London, UK: Springer-Verlag, 1994, vol. 866, pp. 119-129.
[10] H. Asoh and H. Mühlenbein, "On the mean convergence time of evolutionary algorithms without selection and mutation," in Parallel Problem Solving from Nature - PPSN III, ser. Lect. Notes Comput. Sc., Y. Davidor, H. Schwefel, and R. Männer, Eds. London, UK: Springer-Verlag, 1994, vol. 866, pp. 88-97.
[11] G. Harik, E. Cantú-Paz, D. E. Goldberg, and B. L. Miller, "The gambler's ruin problem, genetic algorithms, and the sizing of populations," in Proc. IEEE Int'l Conf. Evol. Comput., T. Bäck, Ed. New York: IEEE Press, 1997, pp. 7-12.
[12] D. Thierens, D. E. Goldberg, and A. Pereira, "Domino convergence, drift and the temporal-salience structure of problems," in Proc. IEEE Int'l Conf. Evol. Comput. New York, NY: IEEE Press, 1998, pp. 535-540.
[13] B. Minsker, "Genetic algorithms," in Hydroinformatics: Data Integrative Approaches in Computation, Analysis and Modeling, P. Kumar, J. Alameda, P. Bajcsy, M. Folk, and M. Markus, Eds. Florida, USA: CRC Press, 2005, pp. 439-456.
[14] D. Thierens, "Analysis and design of genetic algorithms," Ph.D. dissertation, Katholieke Universiteit, 1995.
[15] G. R. Harik and F. G. Lobo, "A parameter-less genetic algorithm," in Proc. GECCO, W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, Eds., vol. 1. Orlando, Florida, USA: Morgan Kaufmann, 1999, pp. 258-265.
[16] H. Mühlenbein and D. Schlierkamp-Voosen, "Predictive models for the breeder genetic algorithm: I. Continuous parameter optimization," Evol. Comput., vol. 1, no. 1, pp. 25-49, 1993.
[17] F. Herrera, M. Lozano, and A. M. Sánchez, "A taxonomy for the crossover operator for real-coded genetic algorithms: An experimental study," Int. J. Intell. Syst., vol. 18, no. 3, pp. 309-338, 2003.
[18] K. Deb and R. Agrawal, "Simulated binary crossover for continuous search space," Complex Syst., vol. 9, pp. 115-148, 1995.
[19] H.-M. Voigt, H. Mühlenbein, and D. Cvetković, "Fuzzy recombination for the breeder genetic algorithm," in Proc. of the 6th Int'l Conf. on Genetic Algorithms, L. Eshelman, Ed. San Francisco, CA: Morgan Kaufmann, 1995, pp. 104-113.

