Contents
1 Introduction 7
1.1 Design as a creative process . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Formulation of an optimization model . . . . . . . . . . . . . . . . . . . 10
4 Application of design optimization in structural engineering 33
4.1 Trusses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.1 Statically determinate trusses . . . . . . . . . . . . . . . . . . . . 33
4.1.2 Statically indeterminate trusses . . . . . . . . . . . . . . . . . . . 34
4.2 Several trusses in different situations . . . . . . . . . . . . . . . . . . . . 36
4.3 Optimization of a statically indeterminate truss . . . . . . . . . . . . . . 39
4.4 Beams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 Framed structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.6 Panel structures with in-plane forces . . . . . . . . . . . . . . . . . . . . 45
4.7 Shells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
7.5.4 Interpolation methods . . . . . . . . . . . . . . . . . . . . . . . 93
7.5.5 One point pattern (Newton-Raphson iteration) . . . . . . . . . . . 93
7.5.6 Two point pattern . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.5.7 Three point pattern . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6 Handling of constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6.1 Transformation methods . . . . . . . . . . . . . . . . . . . . . . 96
7.6.2 Sequential unconstrained minimization . . . . . . . . . . . . . . 96
7.6.3 Proof of concept of SUMT . . . . . . . . . . . . . . . . . . . . . 97
7.6.4 Multiplier methods . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.7 Sequential approximation methods . . . . . . . . . . . . . . . . . . . . . 100
7.7.1 SLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.7.2 SQP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.7.3 SCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.8 On the usage of termination criteria . . . . . . . . . . . . . . . . . . . . 101
7.9 Evolution strategies for design optimization . . . . . . . . . . . . . . . . 102
Chapter 1
Introduction
Before we begin with the details of design optimization, let us first discuss some fundamental issues concerning the background and history of this subject. In the past decades (since the 1980s), design optimization has emerged as a new and substantial discipline in engineering, gaining more and more significance, in particular in the fields of
• mechanical engineering,
• automotive engineering,
• aerospace engineering,
• shipbuilding.
The goal of design optimization is to design technical systems in an optimal fashion w.r.t.
costs, quality, mechanical behavior, efficiency and other objectives. When focusing on
structural systems, we customarily use the term “structural optimization”. The relevance
of optimization in engineering is obvious because of
• shrinking resources,
• expensive energy,
• faster developments of new products with shorter periods available for testing.
We have to realize that the modern process of design optimization relies on efficient computers; powerful computer systems are therefore mandatory for solving real-world problems. Design optimization is also an interdisciplinary methodology in which a wide variety of engineering facets have to be incorporated. It collects the numerically oriented aspects of a design process, requires a formal (abstract) model, and is represented through an optimization problem.
Design and structural optimization is closely linked with modern structural analysis (finite
elements, boundary elements, etc.) because displacements, stresses, vibrations, etc. are
fundamental for the design of engineering structures. Also, as noted above, optimization, analysis and user-guided navigation of the design process require powerful computer systems (including advanced hard- and software) due to the inherent complexity. These facts explain why the use of design (or structural) optimization in engineering has been delayed so much compared to computational engineering in general. In fact, design optimization can be considered to have begun only around the 1960s:
• Clough (Berkeley) introduced the finite element method for the first time.
The finite element method, as a rationale for design, required approximately 30 years to mature into a real tool. It is therefore no surprise that design optimization also needs some time to become a mainstay tool. Since optimization problems are highly non-linear and multi-dimensional, appropriate software was lacking for a long time. Finally, in advanced design optimization, knowledge-based issues are also relevant.
We can summarize this as follows (see Fig. 1.1): design optimization combines CAD/CAE and FEM, numerical/mathematical optimization and modern software concepts, at the interface of engineering and computer science.
Figure 1.1: The position of design optimization within engineering and computer science fields.
Figure 1.3: The conventional design process: system identification (needs), preliminary design (concepts), detailed design, testing, fabrication, result; changes are based on experience.
It is instructive to compare the conventional design process, guided by engineering intuition, with the design optimization process, in which heavy use is made of computers and numerical models. The basic steps of conventional design are shown in Fig. 1.3. An advantage of conventional design is that the designer's experience can go into making conceptual changes. A disadvantage, however, is that changes are time consuming, so that more than one or two changes are avoided in the design process.
The optimum design process is much more efficient, in particular, in detailed design, as
can be seen in Fig. 1.4.
Advantages of optimum design are:
• time is no longer such a serious matter because the computer, and not the engineer,
is doing the job (however, this doesn’t mean that computer time is not crucial for
very large structures),
• a large set of constraints is possible and the objective criteria are checked automat-
ically.
h(x) = 0;
g(x) ≤ 0.
According to the basic terminology and notation used in design optimization the con-
straints of the inequality form are declared as of the ≤-type. This is to say, if a formulation
of the ≥-type occurs, it should be transformed to the ≤-type, for example by multiply-
ing with −1. The reason for that is that most software packages assume the ≤-type of
constraint equations.
Furthermore, the optimization is defined as a minimization problem1. Thus, we have the general format

min { f(x) | h(x) = 0, g(x) ≤ 0 }.
 x

1 that is the normal case in engineering
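As a sketch of how this general format is handled numerically (the functions f, h and g below are hypothetical examples, not taken from these notes), SciPy's SLSQP solver can be used. Note that SciPy expects inequality constraints in ≥-form, so a ≤-type constraint g(x) ≤ 0 is passed as −g(x) ≥ 0:

```python
from scipy.optimize import minimize

# hypothetical objective and constraints in the standard format:
# min f(x)  subject to  h(x) = 0  and  g(x) <= 0
f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2
h = lambda x: x[0] + x[1] - 2.0          # equality constraint h(x) = 0
g = lambda x: x[0] - 1.5                 # <=-type constraint g(x) <= 0

res = minimize(f, x0=[0.0, 0.0], method="SLSQP",
               constraints=[{"type": "eq", "fun": h},
                            {"type": "ineq", "fun": lambda x: -g(x)}])
print(res.x)   # optimum on the line x1 + x2 = 2
```

For this convex example the minimum lies at x = (0.5, 1.5), where the equality constraint is active and the inequality constraint is satisfied.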
Chapter 2
min {E(x)}.
x
Considering linear equation systems, however, there are much more efficient techniques
available for solving equations of the form Ax = r (for example, Gauß, Crout or Cholesky
solvers). By contrast, in the non-linear case the optimization approach is not such a bad
idea, because solving a general non-linear system of equations is still a challenge.
A simple example of a non-linear system of equations is

x1² + 3 cos x2 = 2,
cos x1 + 2 x1 x2 = 4.

In a more general approach we can apply the least p-th error condition,

E(x) = ∑_{i=1}^{n} wi [ei(x)]^p → min,
where p is an even integer and wi is a weighting factor. This represents an effective and
generic solution technique.
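As a sketch (assuming p = 2 and unit weights wi), the system above can be attacked by minimizing the error measure E(x) numerically; SciPy's derivative-free Nelder-Mead method is used here for illustration:

```python
import math
from scipy.optimize import minimize

# least p-th error for the example system, with p = 2 and w_i = 1
def E(x):
    e1 = x[0]**2 + 3.0 * math.cos(x[1]) - 2.0   # residual of equation 1
    e2 = math.cos(x[0]) + 2.0 * x[0] * x[1] - 4.0  # residual of equation 2
    return e1**2 + e2**2

x0 = [1.0, 1.0]                        # arbitrary starting point
res = minimize(E, x0, method="Nelder-Mead")
print(res.x, res.fun)
```

Whether the residual reaches zero depends on whether the system actually has a solution; in general the minimizer only returns the best least-p-th approximation from the given start.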
To understand the mathematical background of optimization it is a good idea to “change”
the equation Ax = r into an inequality equation system Ax ≤ r.
Example:
Fig. 2.2 illustrates the situation. As demonstrated, the solution domain S in the x1 /x2 -
space is bordered by the linear constraints, defining an infinite number of solutions (al-
ternatives). The best solution can be found by evaluating an objective function. Here, the
(linear) function Q(x) = x1 + x2 is used. The optimum x∗ can be found at the intersection
of the lines 3x1 + 5x2 = 15 and x1 = 4.
Figure 2.2: The feasible region S, bounded by the lines x1 + x2 = −3, x2 = 2, x2 = −3, 3x1 + 5x2 = 15 and x1 = 4, and the optimal solution of the system Ax ≤ r
According to the standard notation defined above the graphical solution of the optimiza-
tion problem
min{x1 + x2 | Ax ≤ r}
x
has been discussed which is a simple linear optimization problem. Real world problems,
on the other hand, are usually non-linear (see Fig. 2.3), which makes life much more
complicated.
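This graphical solution can be reproduced with a linear programming solver; a sketch, with the constraint set read off Fig. 2.2. The optimum quoted in the text, the intersection of 3x1 + 5x2 = 15 and x1 = 4, is obtained when Q = x1 + x2 is maximized over this region (linprog minimizes, so the cost vector is negated):

```python
from scipy.optimize import linprog

# constraint rows (<=-form): 3x1+5x2<=15, x1<=4, x2<=2, -x2<=3, -x1-x2<=3
A_ub = [[3.0, 5.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [-1.0, -1.0]]
b_ub = [15.0, 4.0, 2.0, 3.0, 3.0]

# maximize Q = x1 + x2  ->  minimize -x1 - x2; allow negative variables
res = linprog(c=[-1.0, -1.0], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None)], method="highs")
print(res.x)   # vertex at x1 = 4, x2 = 0.6
```

Note that linprog's default variable bounds are x ≥ 0, which must be relaxed explicitly here because the region of Fig. 2.2 extends into negative coordinates.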
To give an insight into the complexity of non-linear optimization problems, some basic cases for 2-dimensional problems are depicted in Fig. 2.4.
With respect to the objective function, complicated cases can also occur, which may cause trouble. Again, in Fig. 2.5 some typical cases for 2 design variables are shown.
• design variables,
• decision variables,
Figure 2.3: A 2-dimensional optimization problem with non-linear constraints and a non-linear
objective function
Figures 2.4 and 2.5 (schematic): typical cases in two design variables, e.g. one peak only (well posed!), two peaks, an infinite number of optima (a whole curve is optimal), and multiple optima.
These variables are regarded as explicitly free because the designer (engineer) can assign
any value to them. If the specified values do not satisfy all constraints of the design prob-
lem (represented as an optimization problem), the design is infeasible. If the values do
satisfy all constraints, the design is called feasible (or workable or usable).
An important first step in the proper formulation and modeling of the problem is to identify the appropriate design variables. Sometimes it is desirable to designate more design variables than may be apparent from the statement of the problem, because this gives flexibility in design. Later on, it is then possible to fix some of the variables in order to keep the dimension of the optimization problem low.
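The feasible/infeasible classification above can be sketched in a few lines of code (the constraint functions here are hypothetical examples):

```python
# A design x is feasible if it satisfies every constraint g_j(x) <= 0
# (a small tolerance absorbs round-off at the boundary).
def is_feasible(x, constraints, tol=1e-9):
    """Return True if all inequality constraints g_j(x) <= 0 hold."""
    return all(g(x) <= tol for g in constraints)

# example constraints: g1(x) = x1 + x2 - 3 <= 0 and g2(x) = -x1 <= 0
constraints = [lambda x: x[0] + x[1] - 3.0, lambda x: -x[0]]
print(is_feasible([1.0, 1.0], constraints))  # feasible design
print(is_feasible([4.0, 0.0], constraints))  # infeasible design
```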
It is to be emphasized that the design model, as described above, represents a synthesis
problem in contrast to the conventional design making use of an analysis problem.
To clarify the definition of design variables, some practical examples will be considered. In doing so, we focus on examples from structural engineering, as this is the prime field of this course!
Cross-section of a plate girder:
x1 = web thickness,
x2 = web height,
x3 = flange height,
x4 = flange width.

Frame with bars (or shafts/columns) 1 and 3 of height H and bar 2 of span 2B:
x1 = moment of inertia for frame bars 1 and 3, i.e. E I1 = E I3 = E x1,
x2 = moment of inertia for bar 2, i.e. E I2 = E x2,
where E Ii is the flexural rigidity and E is Young's modulus (a.k.a. modulus of elasticity).
2.2.3 Dimension variables
x1 = span of section 1, x2 = span of section 2, with the total span l = const. (the remaining section has length l − x1 − x2);
J = jacking force;
xk = number of cables in a prestressed concrete box girder bridge.
IPB     h      b      s      t      r      S
        mm     mm     mm     mm     mm     cm²
100     100    100    6      10     12     26
120     120    120    6.5    11     12     34
140     140    140    7      12     12     43
160     160    160    8      13     15     54.3
180     180    180    8.5    14     15     65.3
200     200    200    9      15     18     78.1
220     220    220    9.5    16     18     91
...     ...    ...    ...    ...    ...    ...
Figure 2.6: The homogenization method for topology optimization
2.3.3 Shells

Revolving hyperboloid (cooling tower) with radius r(z) and wall thickness t (either t = const. or variable). The unknowns can be defined as functions, e.g.

z = z(y) = x1 = x1(y),
t = t(y) = x2 = x2(y).
Originally, these are problems with unknown functions, for example, r = r(z) or z = z(y). In other words, the design variables r and z are functions of further parameters. This leads to so-called variational problems, where functionals are evaluated over a design space of functions. Thus, given a basis of functions, variational problems can be transformed into parameter optimization problems, using, for example, a transformation equation such as

z(y) = ∑_{i=1}^{k} xi · y^{i−1}.
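As a small sketch (the values of the design variables are made up), the transformation equation simply evaluates a polynomial in y with the design variables as coefficients:

```python
# Evaluate z(y) = sum_{i=1}^{k} x_i * y**(i-1): the unknown shape
# function is parameterized by the design vector x = (x_1, ..., x_k).
def z_of_y(x, y):
    return sum(xi * y**i for i, xi in enumerate(x))

x = [1.0, 2.0, 0.5]      # k = 3 design variables (hypothetical values)
print(z_of_y(x, 2.0))    # 1 + 2*2 + 0.5*2**2 = 7.0
```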
2.4.2 Moment of inertia expressed in terms of area
Given a rectangular cross-section of width b and height h, we can express the moment of inertia Iy in terms of a design variable, say xk = A:

Iy = b h³/12 = A · h²/12 = xk · β,  with β = h²/12.
Since areas are often defined as sizing variables, it is no surprise that cross-sectional values of various categories of beams, columns, bars, etc. are introduced as dependent design or behavior variables.
Iy = A · h²/4 = A β1,  β1 = h²/4,

Wy = A · h/2 = A β2,  β2 = h/2.
For a thin-walled I-section (flange width b, flange thickness t, web height h, web thickness s, with κ = s/t),

Iy = t (6bh² + κh³)/12,

so we can express Iy by means of the area A = t (2b + κh) as

Iy = β A = β xk

with

β = (6bh² + κh³)/(24b + 12κh) = h² (6bt + sh)/(12 (2tb + sh)).
Figure 2.7: The relative values of Iy , Wy and As with respect to an IPBl-500 profile.
Normalizing with respect to the values of the IPBl-500 profile (A/A500 etc.), the non-linear relationships

Iy = 3.78 A²  (curve ×),
Wy = 1.58 A√A,
As = 0.20 A^1.15,

are obtained as simplified models (see Fig. 2.7).
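Such power laws can be fitted from tabulated profile values by least squares in log-log space; a sketch (the data below are synthetic, generated around Iy = 3.78 A², not taken from a profile table):

```python
import numpy as np

# synthetic (A, Iy) pairs around Iy = 3.78*A**2 with ~1% deterministic noise
A = np.array([26.0, 34.0, 43.0, 54.3, 65.3, 78.1, 91.0])   # cm^2
Iy = 3.78 * A**2 * (1.0 + 0.01 * np.cos(A))

# fit log Iy = log c + m * log A by linear least squares
m, logc = np.polyfit(np.log(A), np.log(Iy), 1)
print(m, np.exp(logc))   # exponent near 2, coefficient near 3.78
```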
tural mechanics) are circumvented to the greatest possible extent. The rudimentary trans-
formation, as in the example
x1 + x2 = 3 → 3 − ε ≤ x1 + x2 ≤ 3 + ε
where ε ≪ 1 and ε > 0, is in most cases, however, not appropriate due to numerical
problems.
In certain (simple) cases, similar to the example
x1 + x2 = 3,
x3 ≤ 4,
x4 + x5 ≤ 10,
x1 + x2² ≤ 5,
it is not a bad idea to resolve the equation system by eliminating one of the variables in
an equality equation. Thus, a solution of x1 + x2 = 3 with respect to, for example x1 , gives
x1 = 3 − x2 . Replacing the variable x1 in the last equation reduces the total number of
constraints and design variables. We have
x3 ≤ 4,
x4 + x5 ≤ 10,
(3 − x2) + x2² ≤ 5, or x2² − x2 ≤ 2.
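The equivalence of the original and the reduced constraint can be checked numerically; a small sketch (the sample grid is arbitrary):

```python
# After eliminating x1 = 3 - x2, the constraint x1 + x2**2 <= 5
# must agree with the reduced form x2**2 - x2 <= 2 for every x2.
def original(x2):
    x1 = 3.0 - x2
    return x1 + x2**2 <= 5.0

def reduced(x2):
    return x2**2 - x2 <= 2.0

xs = [k * 0.5 for k in range(-10, 11)]   # exact binary fractions
print(all(original(x) == reduced(x) for x in xs))
```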
There are some cases where it might be useful to eliminate specific constraints completely. This may hold particularly for side constraints of the type xi ≥ 0 or 1 ≥ xi ≥ 0. It can easily be shown that the transformations xi → Xi² and xi → sin² Xi, respectively, can replace the side constraints given above using the new design variables Xi. In the same fashion, the general side constraint αi ≤ xi ≤ βi can be eliminated by means of the transformation xi → αi + (βi − αi) sin² Xi.
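A short sketch of the last transformation (the interval bounds a, b are arbitrary sample values): since sin² X always lies in [0, 1], any real X maps into [a, b], so the side constraint can indeed be dropped:

```python
import math

# x = a + (b - a) * sin(X)**2 maps the unconstrained variable X
# into the interval [a, b] for every real X.
def to_x(X, a, b):
    return a + (b - a) * math.sin(X)**2

a, b = 1.0, 4.0
samples = [to_x(X, a, b) for X in (-10.0, -1.0, 0.0, 0.7, 3.0, 100.0)]
print(all(a <= x <= b for x in samples))
```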
It should be emphasized explicitly that the original linear nature of the constraints gets lost, of course: a non-linearity is introduced into the given objective function. In practical cases, the engineer who is modeling an optimization problem occasionally does not want to be restricted by mathematical requirements which demand special categories of functions in the formulation of constraints. Instead, one wishes to formulate certain constraints in an "algorithmic fashion". An example of this is a constraint used in the shape optimization of hypars (hyperbolic paraboloids) in order to assure flat shells. Here
we define (see Fig. 2.8)
gl = max { |x4 − x5|/x1 , |x4 − x3|/x1 , |x3/x1| , |x5/lv| } ≤ 0.24.
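Such an algorithmic constraint is literally a small program; a sketch (the sample values of x and lv are made up, symbols as in Fig. 2.8):

```python
# Flatness constraint for a hypar: feasible if g_flat(x, lv) <= 0.
def g_flat(x, lv):
    x1, x3, x4, x5 = x[0], x[2], x[3], x[4]
    ratios = [abs((x4 - x5) / x1), abs((x4 - x3) / x1),
              abs(x3 / x1), abs(x5 / lv)]
    return max(ratios) - 0.24

x = [10.0, 0.0, 1.0, 2.0, 1.5]   # x1..x5 (x2 unused in this constraint)
print(g_flat(x, lv=20.0))        # negative, i.e. the design is feasible
```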
Figure 2.8: A hypar element
From the engineering point of view, constraints, to a large extent, derive from check-
ing procedures defined in numerous standards. Characteristic demands due to standards
include:
• the limit state of serviceability with respect to the required limits has to be complied with reliably.
Obviously, the equality constraints stemming from (1) the equations of equilibrium, (2) the equations of kinematics and (3) the constitutive equations according to the material used also play an important role. We will see, however, that these equations are regularly incorporated into the implicit constraints mentioned above.
Figure 2.9: Bending moment distribution along the shell
Typical optimization criteria (objective functions) are:
• weight,
• cost,
• energy expenditure,
• failure probabilities.
Again, as outlined in the discussion of the constraints, in certain cases it makes sense for
an engineer to formulate a general optimization criterion that is not a “function” in the
strict mathematical sense, but rather a general algorithm. The purpose of the algorithm
is to describe in detail how a numerical value, used for the quantification of a candidate
design x, is computed.
By way of example, such an algorithmic criterion was used in the optimization of beam-like cylindrical shells (carried out by the author of these lecture notes). To capture the manufacturing cost in terms of a structure-oriented quantity, the following algorithm was established:
4. minimize it.
Thus we get
Chapter 3
To get an insight into what the practical modeling of a structural optimization problem looks like, a simple two bar truss is to be considered. As will be shown, even this simple structure, associated with a very elementary structural analysis, does not result in a trivial optimization problem.
B = half width,
H = height,
L = bar length,
α = inclination angle,
E = modulus of elasticity.

The cross section of both bars is a tube with the characteristic parameters t (thickness of the thin-walled tube) and dm (median diameter). From the drawing of the structural system, it can be seen that nodes 1 and 3 are fixed while node 2 is free. The load at the vertex of the truss is given as 2P. In fact, it would be difficult to find a simpler structural system!

Figure 3.1: A two bar truss
dimensional optimization problem for the sake of clarity (the problem can be visualized
geometrically).
Defining x1 = H and x2 = dm, we get the optimization vector x = (x1, x2)ᵀ = (H, dm)ᵀ, while keeping all further quantities B, E, P and t constant.
As we know, each optimization problem requires a definite optimization criterion. Although a cost criterion would be desirable, as in most cases, here an engineering-fashioned objective is taken into account. In this example, we define the volume V of the total structure as the objective function, V = 2AL, where A is the cross section area of a single bar and L is the bar length. We have

L = L(x1) = √(H² + B²) = √(x1² + B²)

and

A = A(x2) = π dm t = π t x2,

because dm = 2r + t (r being the inner radius). As a consequence, the volume V can be written as V = 2πt · x2 √(x1² + B²).
With respect to the stability of a solution (numerical or geometrical), it is very often reasonable to introduce normalized variables. In our case, we choose

x̃1 = x1/B = H/B,

or x1 = B x̃1, and (to allow computational simplifications)

x̃2 = π x2/B,

or x2 = B x̃2/π. Note also that the new variables x̃1 and x̃2 are dimensionless.
Thus, we have

V = 2πt · (B x̃2/π) · √(x̃1² B² + B²) = 2B²t · x̃2 √(x̃1² + 1),

and hence

V/(2B²t) = f(x̃) = x̃2 √(x̃1² + 1).

Again, the function f(x̃1, x̃2) is dimensionless because the term 2B²t has the same dimension as the volume V.
Figure 3.2: A 3D-plot (left) and a contour plot (right) of the objective function x̃2 √(x̃1² + 1)
Obviously, the function f that we take as our objective function is non-linear1 . The visu-
alization using Mathematica is demonstrated in Fig. 3.2.
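The same surface can be tabulated with NumPy instead of Mathematica; a sketch:

```python
import numpy as np

# dimensionless objective f = x2~ * sqrt(x1~**2 + 1) on a grid,
# as used for the 3D and contour plots of Fig. 3.2
def f(x1t, x2t):
    return x2t * np.sqrt(x1t**2 + 1.0)

x1t, x2t = np.meshgrid(np.linspace(-2, 2, 101), np.linspace(0.1, 2, 101))
F = f(x1t, x2t)
print(F.shape, float(f(0.0, 1.0)))   # the sqrt-term is smallest at x1~ = 0
```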
3.4 Modeling the constraints
Modeling the set of constraints is extensive, even in this simple problem. However, as
we have seen from the discussion of the possible solution domains in an optimization
problem the definition of constraints is crucial (to avoid pathological situations for the
computer2 ).
A very first step is to look at the lower and upper bounds of the optimization variables x1
and x2 or x̃1 and x̃2 . Immediately we can identify the following four cases (see Fig. 3.3):
1. x2 = dm > 0: zero or negative values make no sense,
2. x1 = H = 0: the truss is unstable,
3. x1 = H > 0: the two bars are subjected to compression,
4. x1 = H < 0: a truss with tension forces in both bars.
The discussion of the possible ranges for the design variables demonstrates that the stresses
in the structure are prominent quantities in the formulation of constraints.
According to the equilibrium conditions at the vertex we have (see Fig. 3.4)

sin α = P/S1 = P/S2 = P/S,
1 Note that taking L as a design variable also leads to a non-linear objective function.
2 Note that computers are pretty “stupid” with respect to the recognition of problems!
Figure 3.3: The three ranges of x1: x1 = H = 0 (the truss is unstable), x1 = H > 0 (bars subjected to compression) and x1 = H < 0 (bars subjected to tension)
sin α = x1/L = H/L

and

L = √(x1² + B²).

Thus

S1 = S2 = S = P/sin α = P √(x1² + B²)/x1 = S(x1)!
It is to be emphasized that the equation for the axial force S represents a set of equal-
ity constraints within the structural optimization problem. Usually we have equilibrium
equations of the form
∑ F⃗ = 0, or ∑ Fx = 0, ∑ Fy = 0.
Also, we can see that

lim_{x1→0} S = lim_{x1→0} P √(x1² + B²)/x1 → ∞,

which leads to a singularity in the model that, of course, has to be eliminated. We therefore have to ensure that x1 ≠ 0 and x̃1 ≠ 0.
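A quick numerical sketch of this singularity (with arbitrary sample values P = B = 1):

```python
import math

# axial force S(x1) = P * sqrt(x1**2 + B**2) / x1 blows up as x1 -> 0
P, B = 1.0, 1.0
S = lambda x1: P * math.sqrt(x1**2 + B**2) / x1

print([round(S(x), 1) for x in (1.0, 0.1, 0.01)])  # grows without bound
```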
The forces S1 and S2 yield stresses σ (simple stress concept assumed) that are given by

σ = σ(x1, x2) = S/A = S(x1)/A(x2) = (P/πt) · √(x1² + B²)/(x1 x2).

For the solution, we still have to replace the design variables x1 and x2 by the dimensionless new variables x̃1 and x̃2; this will be carried out subsequently.
3.5 Stability problem

In the case x1 > 0, the axial compressive forces S may induce buckling which, of course, has to be prevented. From the fundamentals of structural mechanics we know that the critical buckling load Scrit corresponding to the end conditions of the two bar truss (Euler case II) is

Scrit = π² EI/L²,
where EI is the flexural rigidity of each bar. It is required that ν · Sexist ≤ Scrit , introducing
ν as a safety factor.
An essential comment: Of course, more detailed checking concepts may be applied for
buckling (EC 3); nevertheless, simple concepts may serve as an appropriate “operational
basis”.
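The check ν · Sexist ≤ Scrit can be sketched directly (the material and section values below are illustrative, not data from this example):

```python
import math

# Euler-II buckling check: nu * S <= S_crit = pi**2 * E * I / L**2
def buckling_ok(S, E, I, L, nu=2.0):
    S_crit = math.pi**2 * E * I / L**2
    return nu * S <= S_crit

# steel bar: E = 210e9 Pa, I = 1e-6 m^4, L = 2 m, axial force 100 kN
print(buckling_ok(100e3, 210e9, 1e-6, 2.0))
```

Here Scrit ≈ 518 kN, so a 100 kN force passes the check with ν = 2 while a 300 kN force would not.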
The moment of inertia becomes a behavior variable. For the cross section of the tube we get

I = π (da⁴ − di⁴)/64.

Replacing da = dm + t = x2 + t and di = dm − t = x2 − t, after some intermediate computations we get

I = π (dm³ t + dm t³)/8.
If

t/da = t/(dm + t) ≤ 1/10,

the term dm t³ can be neglected, so that I ≈ π dm³ t/8. This approximation is associated with the constraint for its validity limit

t/da ≤ 1/10,  or  t/(t + x2) ≤ 1/10,

which can be rewritten as 10t ≤ t + x2, or 9t − x2 ≤ 0.
Additional constraints are defined to ensure a closed solution domain; geometrically unacceptable configurations of the structure should also be prevented. This is accomplished by the side constraints x2 ≤ 50t, where 50t is more or less arbitrary (tubes according to DIN 2448 are ≤ 16t), and 0.2B ≤ x1 ≤ 2.0B, where, again, the chosen factors 0.2 and 2.0 are arbitrary, but "reasonable".

Table 3.1: The set of constraints for the two bar truss

g1(x1, x2) = νP √(x1² + B²)/x1 − (Etπ³/8) · x2³/(x1² + B²) ≤ 0,  x1 > 0,
g2(x1, x2) = (P/πt) · √(x1² + B²)/(x1 x2) − σCadm ≤ 0,  x1 > 0,
g3(x1, x2) = −(P/πt) · √(x1² + B²)/(x1 x2) − σTadm ≤ 0,  x1 < 0,
g4(x2) = x2 − 50t ≤ 0,
g5(x2) = −x2 + 9t ≤ 0,
g6(x1) = x1 − 2B ≤ 0,  x1 > 0,
g7(x1) = −x1 − 2B ≤ 0,  x1 < 0,
g8(x1) = −x1 + 0.2B ≤ 0,  x1 > 0,
g9(x1) = x1 + 0.2B ≤ 0,  x1 < 0,
Summarizing the individual constraints in a systematic manner, we get the set of equations
in Table 3.1, or in a more abstract notation,
min { V(x) | gj(x) ≤ 0; j = 1, 2, ..., 9 } .
 x
To improve the computational efficiency3 , in harmony with the reformulation of the ob-
jective function, the normalized design variables (x̃1 and x̃2 ) are applied.
The transformation process is to be demonstrated only for the stability constraint (x1 > 0),

νP √(x1² + B²)/x1 − (Etπ³/8) · x2³/(x1² + B²) ≤ 0,

where, again, we substitute

x1 = x̃1 B  and  x2 = x̃2 B/π.

Thus,

νP √(x̃1² + 1)/x̃1 − (EtB/8) · x̃2³/(x̃1² + 1) ≤ 0.

Multiplying the above inequality by 8/(EtB) leads to

(8νP/(EtB)) · √(x̃1² + 1)/x̃1 − x̃2³/(x̃1² + 1) ≤ 0,
3
Note that optimization is always a numerically intensive job!
Table 3.2: The set of constraints using normalized variables

g̃1(x̃) = Ω1 √((x̃1² + 1)³) − x̃1 x̃2³ ≤ 0,  x̃1 > 0,
g̃2(x̃) = Ω2 √(x̃1² + 1) − x̃1 x̃2 ≤ 0,  x̃1 > 0,
g̃3(x̃) = Ω3 √(x̃1² + 1) + x̃1 x̃2 ≤ 0,  x̃1 < 0,
g̃4(x̃) = x̃2 − Ω4 ≤ 0,
g̃5(x̃) = −x̃2 + Ω5 ≤ 0,
g̃6(x̃) = x̃1 − Ω6 ≤ 0,  x̃1 > 0,
g̃7(x̃) = −x̃1 − Ω7 ≤ 0,  x̃1 < 0,
g̃8(x̃) = −x̃1 + Ω8 ≤ 0,  x̃1 > 0,
g̃9(x̃) = x̃1 + Ω9 ≤ 0,  x̃1 < 0,

Table 3.3: Definitions of the dimensionless factors

Ω1 = 8νP/(EBt),  Ω2 = P/(Bt σCadm),  Ω3 = P/(Bt σTadm),
Ω4 = 157.1 t/B,  Ω5 = 28.3 t/B,  Ω6 = Ω7 = 2.0,  Ω8 = Ω9 = 0.2
or

Ω1 √(x̃1² + 1)/x̃1 − x̃2³/(x̃1² + 1) ≤ 0,

where Ω1 = 8νP/(EtB) is a dimensionless factor. Similarly, we obtain the other constraints g̃j using normalized design variables (see Table 3.2) and the definitions given in Table 3.3.
In the x̃1, x̃2-plane the contour lines of the objective function are to be drawn. This can be achieved by substituting the values −1.0, −0.5, 0, 0.5, 1.0, etc. for Ṽ in the objective function. In this way, an array of curves is implicitly determined via the relationship

x̃2 = x̃2(x̃1) = Ṽ/√(x̃1² + 1).
An example for the series 0.1, 0.2, 0.25, etc. has been visualized by Mathematica (see Fig.
3.5 (left)). Similarly the constraints can also be pictured in the x̃1 , x̃2 -plane. The approach
is to be exemplified for the stability constraint only.
Resolving

Ω1 √((x̃1² + 1)³) − x̃1 x̃2³ = 0
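for x̃2 gives the boundary curve of the stability constraint; a sketch using the value Ω1 = 0.0019 from Fig. 3.5:

```python
import math

# boundary of the stability constraint solved for x2~:
# x2~ = sqrt(x1~**2 + 1) * (Omega1 / x1~)**(1/3)
def x2_boundary(x1t, O1=0.0019):
    return math.sqrt(x1t**2 + 1.0) * (O1 / x1t)**(1.0 / 3.0)

x2t = x2_boundary(1.0)
residual = 0.0019 * math.sqrt((1.0**2 + 1.0)**3) - 1.0 * x2t**3
print(x2t, residual)   # the residual of g1~ vanishes on the boundary
```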
Figure 3.5: A contour plot of Ṽ (left); a plot of x̃2 = √(x̃1² + 1) · (0.0019/x̃1)^{1/3} (right)
Figure 3.6: A plot of the contour lines of the objective function and the constraints
Chapter 4
4.1 Trusses
Historically, truss optimization was carried out very early, long before the computer was even invented! Two names of famous engineers have to be mentioned, Maxwell (1869) and Michell (1904), who set up the following theorem:

Theorem: A truss is optimal with respect to a specific load case if the bars run parallel to the directions of the principal strains.

This led to the so-called Michell structures, which were also verified by the homogenization method based on computer models by Bendsøe and Kikuchi (1988, see Fig. 4.1).
There are some further theorems in truss optimization that should be known because they
represent an effective method of testing numerical optimization strategies; these theorems
can be used to validate an “optimizer”.
Figure 4.1: (a) Michell structure, (b) Similar truss structure, simplified, (c) A schematic view of a
Michell structure derived from a plate where “unneeded” material has been removed.
This is an example of topology optimization.
Figure 4.2: For both load cases (top and bottom), there is a unique optimal subsystem.
or

Ai = Si/σadm,

for the forces Si and cross sectional areas Ai of each bar i. Note: the design variables Ai are degenerate; therefore, only shape optimization makes sense in this case.
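A minimal sketch of this fully stressed sizing rule, where each area follows from the member force and the admissible stress (the numbers below are made up):

```python
# fully stressed design: required area A_i = |S_i| / sigma_adm per bar
sigma_adm = 160.0                      # admissible stress in N/mm^2
S = [120e3, -80e3, 45e3]               # member forces in N (tension/compression)
A = [abs(s) / sigma_adm for s in S]    # required areas in mm^2
print(A)
```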
initial geometry and loading
variables are the nodes on the upper chord
Figure 4.3: Shape optimization: constraints affecting the optimization result (Pedersen)
4.2 Several trusses in different situations
Some comments in advance: starting from an intuitive design, the optimum weight is sought. The design variables are specified nodal coordinates of the trusses. Sizing variables can be eliminated (degenerate design variables) because of the static determinacy of the trusses. Therefore, the cross sections are computed from stress constraints (stability according to DIN 18800) such that they are fully stressed. As a consequence, shape optimization is again of interest (see Figs. 4.4, 4.5).
Cantilever
material: steel; loads are at the upper chord;
optimization variables are the coordinates of points 1, 2 and 3;
initial weight: 2.13 t; final weight 1.91 t; improvement: 10.1%
Cantilever
material: steel; loads are at the upper chord;
optimization variables are the vertical coordinates of nodes at the lower chord;
initial weight 5.43 t; final weight 4.33 t; improvement: 20.3%
optimal geometry
initial geometry Notice the thrust-line shape of the upper chord
Bridge girder
material: steel; loads are on the lower chord;
optimization variables are the coordinates of nodes at the upper chord;
initial weight 7.48 t; final weight 3.42 t; improvement: 54.4%
Bridge girder
material: steel; the concentrated load is in the center of the lower chord;
optimization variables are the coordinates of nodes at the upper chord;
initial weight 12.51 t; final weight 7.70 t; improvement: 38.4%
4.3 Optimization of a statically indeterminate truss
In the following, a large-scale truss, being statically indeterminate, is optimized with re-
spect to two separate objective functions (see Fig. 4.6). Only cross sections are introduced
as design variables. Hence, no shape optimization is considered. Furthermore, two options
of the structural system are evaluated:
initial structure and loads; sizing optimization, version 2, objective function is cost
(dashed lines indicate additional bars in version 2) volume: 0.108
Note: although there are more bars in this version, the
total volume is less than in version 1
iteration history for the cost optimization problem iteration history for the weight optimization problem
Figure 4.6: Cross section optimization of a truss (top); iteration histories of the cost optimization (bottom left) and the weight optimization (bottom right) problems
constant variable
F cross section cross section
h = const
h
g = 100% g = 74 %
1.0 F 1.21 F 1.61 F
Figure 4.7: Critical buckling loads depend on the cross section type (left); Theoretical alternatives
for the shape of an I-beam (right)
4.4 Beams
Similar to truss optimization, there are a few forerunners in the optimization of beams stressed through bending moments. Very early, Galilei created some optimal solutions for a beam subjected to a centrically induced compression load. Assuming that the volume remains constant, the critical buckling load of the system shown in Fig. 4.7 (left) can be improved by changing the cross sections as shown.
Further rudimentary examples are shown in Fig. 4.7 (right). A continuously increased flange (the top as well as the bottom flange) reduces the weight from 100% to 74%. (Of course, the solution is debatable with respect to engineering requirements!) The fish-bellied beam shown in Fig. 4.8 is another historical example, here for a clamped beam with a uniform load and fixed supports.
A simple example for structural optimization of beams using the finite element method is
shown in Fig. 4.9. Material distribution is to be optimized. The loads are a uniform load
(left side) and a concentrated load with a varying number of sections.
definition and loading of a continuous beam
4.5 Framed structures

A highly sophisticated example is the structural optimization of a frame with three storeys and two bays. The assumed loading is an earthquake, represented in terms of a horizontal perturbation (earth tremor) at the base of the structure. To compute the relevant constraints, the finite element analysis in this case requires the solution of the Newtonian equations of motion,

M r̈ + D ṙ + K r = P(t),

where
M, D, K are system matrices,
r, ṙ, r̈ are kinematic quantities and
P(t) is the time-variant loading.
Design variables are the cross sections; in total, six distinct cross section groups are defined. The material used is St37 along with HE-A-shaped beams (standard I-beams). Consequently, six design variables are introduced. The constraints comprise
• stress constraints,
• constructive constraints,
• validity constraints.
The finite element analysis is based upon 24 beam elements lumped together at 18 nodes (so-called lumped mass concept). The initial frame and the optimization results are shown in Figs. 4.10 and 4.11.
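As a sketch of the analysis step behind these constraints (reduced to a single degree of freedom with illustrative numbers; the real problem uses the system matrices M, D, K), the equations of motion can be integrated by an explicit central-difference scheme:

```python
# central-difference integration of M*r'' + D*r' + K*r = P(t), 1 DOF
M, D, K = 1.0, 0.1, 100.0            # mass, damping, stiffness (illustrative)
P = lambda t: 0.0                     # free vibration after an initial offset
dt, n = 0.001, 5000                   # time step well below 2/omega
r_prev, r = 0.01, 0.01                # initial displacement, zero velocity
for i in range(n):
    acc = (P(i * dt) - D * (r - r_prev) / dt - K * r) / M
    r_prev, r = r, 2.0 * r - r_prev + acc * dt * dt
print(abs(r) < 0.01)                  # the damped response has decayed
```

Displacement histories computed this way are what constraints such as "displacement at the top ≤ 0.0327 m" are evaluated against.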
Figure 4.10: Structural design and loading of the initial frame. Optimization variables are cross sectional areas; the displacement at the top has to be ≤ 0.0327 m. The design objectives are weight and cost. Nodes 1, 2 and 3 are fixed.
Figure 4.11: Comparison of optimal moment distributions (left) and cross sectional areas (right). In the right figure, the results of cost optimization and weight optimization are shown with a dotted pattern and hatching, respectively. The final volume after cost optimization is 0.9123 m³; after weight optimization it is 0.8645 m³.
Figure 4.12: In-plane weight optimization of a screw wrench for a given dominant load case. The
CAD model (left); the initial FE mesh (center); the final FE mesh and boundary
conditions (right).
Figure 4.13: Lug optimization: The state variables are bearing stress (≤ 36000 psi) and maximum
σe (≤ 21000 psi). The design variables are l, h, d1 , d2 , r and t. The loads are P1 =
2600 lbs. and P2 = 15000 lbs., both uniformly distributed.
Figure 4.14: Initial and final lug geometries showing line and keypoint numbers. In the initial
design (top), the volume is 25.4 in3 , the thickness is 1.0 in. The design is infeasible.
In the final design (bottom), the volume is 16.4 in3 , the thickness is 1.2 in. It took 23
loops to converge.
Initial design, weight = 106.5 kN
Optimum design, weight = 87.5 kN (σν = 200 N/mm²)
Optimum design, weight = 112.0 kN (σν = 150 N/mm²)
Figure 4.15: Optimization of a Vierendeel truss (a frame with a top beam containing individual shear walls). The question to be answered is: What are the optimal shapes of the notches used for installation purposes? Shown are the initial design (top) and optimal solutions for σν = 200 N/mm² (middle) and σν = 150 N/mm² (bottom).
Figure 4.16: System and optimal form: (a) crown load, supports movable
Figure 4.17: System and optimal form: (b) crown load, supports fixed
4.7 Shells
The structural efficiency of shells and the wide variety of possible shapes, in terms of 3-dimensional surfaces, make the structural optimization of shells extremely interesting. However, this also poses a real challenge to the structural designer.
Obviously, powerful finite element methods have to be provided in structural optimization models to obtain the desired optimal design. The influence of the loading as well as of the support conditions is demonstrated by the optimization of a spherical calotte belonging to the shell-of-revolution category (line load at the crown, dead load, snow load, wind load; see Figs. 4.16–4.20).
Figure 4.18: System and optimal form: (c) dead weight, supports movable
Figure 4.19: System and optimal form: (d) snow load, supports movable
Figure 4.20: System and optimal form: (e) wind load, supports movable
Chapter 5
If more than two design variables are defined, a mathematical or numerical solution technique is needed. Descriptive graphical solutions, as used for 2-dimensional problems, are no longer applicable.
As an overview of the material that we have to compile, a very broad classification of the
optimization techniques available is given in Fig. 5.1.
A thorough knowledge of optimality conditions is important to understand the performance of the various numerical methods used in design practice. In particular, to discuss optimal design concepts, some fundamentals of vector and matrix algebra are needed. In this context, the differentiation notation for functions of several variables will be introduced, or at least recalled (see the corresponding lectures in mathematics).
The gradient vector of a function of several variables plays a crucial role in classical as well as modern numerical approaches. In the same fashion, the so-called Hessian matrix, which collects the second partial derivatives of a function, is important in the mathematical solution of optimization problems.
Figure 5.1: Broad classification of the available optimization methods.
Consider a function f(x) of n variables x1, x2, x3, ..., xn. The partial derivative of the function with respect to x1 at a given point x∗ is defined as ∂f(x∗)/∂x1, with respect to x2 as ∂f(x∗)/∂x2, and so on.
For convenience and compactness of notation, the individual partial derivatives are arranged into a column vector called the gradient vector that, in the mathematical literature, is represented by any of the following symbols:

∇f,  ∂f/∂x,  grad f.
Thus, we have

∇f(x∗) = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]ᵀ |x∗ ,
where the superscript T denotes the transpose of the row vector. Note that all partial
derivatives are taken at the given point x∗ .
Geometrically, the gradient vector is normal to the tangent plane at the point x∗ as shown
in Fig. 5.2 for a function of three variables x1 , x2 and x3 . Also, it points in the direction
of maximum increase in the function. In our case, the gradient will be used in developing
optimality conditions or in calculating appropriate search directions.
Figure 5.2: The gradient vector ∇f(x∗) at the point x∗ (coordinate axes x1, x2, x3).
Arranging all second partial derivatives of f(x) in a matrix gives

∇²f(x∗) = [ ∂²f/∂xi∂xj ] |x∗ ,  i, j = 1, 2, ..., n,

where all derivatives are computed at the given point x∗. This matrix of the type n × n is usually denoted as the Hessian H or ∇²f. It is important to emphasize that each element of the Hessian H is itself a function which is evaluated at the given point x∗. Also, since f(x) is assumed to be twice continuously differentiable, the cross partial derivatives are equal, that is

∂²f/∂xi∂xj = ∂²f/∂xj∂xi ,  i = 1, 2, ..., n,  j = 1, 2, ..., n,  i ≠ j.
Thus, the Hessian is always a symmetric matrix. This plays a prominent role in the sufficiency conditions for optimality discussed later on. We therefore define the Hessian matrix as

H = [ ∂²f/∂xi∂xj ],  i = 1, 2, ..., n,  j = 1, 2, ..., n.
This idea materializes in the Taylor series expansion. Considering first a simple function of a single variable x, the Taylor expansion about the point x∗ (in our case, this point is
assumed to be the optimum solution) is given by
f(x) = f(x∗) + (df(x∗)/dx)(x − x∗) + (1/2)(d²f(x∗)/dx²)(x − x∗)² + ... + (1/n!)(dⁿf(x∗)/dxⁿ)(x − x∗)ⁿ.
If we collect the terms with derivatives of order higher than two in a remainder term R, because they are smaller in magnitude than the preceding terms of first and second order (provided x is sufficiently close to x∗), we obtain more briefly, using the prime notation for d/dx:

f(x) = f(x∗) + f′(x∗)(x − x∗) + (1/2) f″(x∗)(x − x∗)² + R.
Let the difference (x − x∗) = Δx be a small change about the point x∗. Then the Taylor expansion becomes

f(x∗ + Δx) = f(x∗) + f′(x∗)Δx + (1/2) f″(x∗)Δx² + R.
Accordingly, for a function of two variables f(x1, x2), we can write the Taylor expansion at the point x∗ = [x1∗, x2∗]ᵀ as

f(x1, x2) = f(x1∗, x2∗) + (∂f/∂x1)|x∗ (x1 − x1∗) + (∂f/∂x2)|x∗ (x2 − x2∗)
  + (1/2)[ (∂²f/∂x1²)|x∗ (x1 − x1∗)² + 2(∂²f/∂x1∂x2)|x∗ (x1 − x1∗)(x2 − x2∗) + (∂²f/∂x2²)|x∗ (x2 − x2∗)² ] + R,
where all partial derivatives are taken at the given point x∗ = [x1∗, x2∗]ᵀ. Using the summation notation, the Taylor expansion can also be written as

f(x) = f(x∗) + Σi (∂f/∂xi)|x∗ (xi − xi∗) + (1/2) Σi Σj (∂²f/∂xi∂xj)|x∗ (xi − xi∗)(xj − xj∗) + R.
Here, the quantities ∂ f /∂xi as well as ∂2 f /∂xi ∂x j are components of the gradient of the
function f (x1 , x2 ) and the Hessian matrix ∇2 f (x1 , x2 ), respectively, always evaluated at
the given point x∗ .
It is, therefore, not surprising that the Taylor series in matrix notation has the general form

f(x) = f(x∗) + ∇fᵀ|x∗ (x − x∗) + (1/2)(x − x∗)ᵀ H|x∗ (x − x∗) + R.
This notation also holds when the function f(x) has n rather than only two variables; then x, x∗ and ∇f are n-dimensional vectors and the matrix H is of the type n × n.
Defining Δx = x − x∗ we obtain

f(x∗ + Δx) = f(x∗) + ∇fᵀΔx + (1/2)ΔxᵀHΔx + R.
The change of the function f when moving from x∗ to a neighboring point x∗ + Δx is

f(x∗ + Δx) − f(x∗) = ∇fᵀΔx + (1/2)ΔxᵀHΔx

if the term R can be neglected (second-order change).
A first-order change is consequently defined by

Δf = ∇fᵀΔx ≡ δf,

where in all the cases above Δx represents a small change about x∗.
Example: Obtain a second-order Taylor expansion of the function f(x) = 3x1³x2 at the point x∗ = [1, 1]ᵀ.
Solution:

∇f|x∗ = [∂f/∂x1, ∂f/∂x2]ᵀ|x∗ = [9x1²x2, 3x1³]ᵀ|x∗ = [9, 3]ᵀ,

H(x∗) = ∇²f|x∗ = [ 18x1x2  9x1² ; 9x1²  0 ]|x∗ = [ 18  9 ; 9  0 ].

Approximating gives

f̃(x) = 3 + [9, 3]·[x1 − 1, x2 − 1]ᵀ + (1/2)·[x1 − 1, x2 − 1]·[ 18  9 ; 9  0 ]·[x1 − 1, x2 − 1]ᵀ
     = 9x1² + 9x1x2 − 18x1 − 6x2 + 9.
The accuracy of the approximate solution compared to the exact solution is shown in the following table:
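Such an accuracy comparison is easy to reproduce numerically. A short Python sketch of the example above (the helper names are ours):

```python
# Exact function f(x) = 3*x1**3*x2 versus its second-order Taylor
# approximation about x* = (1, 1) derived above.
def f(x1, x2):
    return 3 * x1 ** 3 * x2

def f_approx(x1, x2):
    return 9 * x1 ** 2 + 9 * x1 * x2 - 18 * x1 - 6 * x2 + 9

for x1 in (1.0, 1.1, 1.5, 2.0):
    print(x1, f(x1, x1), f_approx(x1, x1))
# the approximation is exact at x* and degrades with the distance from x*
```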
1. The conditions that must be satisfied at the optimum point are called necessary conditions. Conversely, if any point does not satisfy the necessary conditions, it cannot be an optimum.
Note, however, that the satisfaction of the necessary conditions does not guarantee an optimum point; that is, there may be non-optimum points that also satisfy the same conditions! Points satisfying the necessary conditions are called candidate optimum points or stationary points (maxima, minima, saddle points), such that further tests for distinguishing between optimum and non-optimum points are required.
2. The sufficient conditions provide these tests to decide on the actual optimality of
candidate optimum points. That is to say, if a candidate optimum point satisfies the
sufficient conditions, then it is indeed an optimum. In this case, no further tests are
needed.
Note, however, that if the sufficient conditions are not satisfied or cannot be used, no conclusion can be drawn on whether the candidate design (point) is an optimum. It should be emphasized that the above discussion applies to both unconstrained and constrained optimization problems. To elucidate the optimality conditions, however, we will start with the unconstrained optimum design problem.
If x∗ is a local minimum, the change Δf = f(x∗ + Δx) − f(x∗) must be non-negative for all changes Δx. From the introduction of the Taylor series we also know that

Δf = ∇f(x∗)ᵀΔx + (1/2)ΔxᵀH(x∗)Δx + R ≥ 0
is defined through first- and second-order terms, plus a remainder R. Since the change vector Δx is small, the first-order term ∇f(x∗)ᵀΔx dominates the higher-order terms. Focusing on this term, we can conclude that the requirement Δf ≥ 0 can be satisfied for all possible changes Δx if and only if ∇f(x∗) = 0. That is, the gradient of the function at the assumed minimum point x∗ must vanish. This condition is called the necessary condition for a stationary point x∗. In component form the necessary condition becomes

∂f(x∗)/∂xi = 0,  i = 1, 2, 3, ..., n.
More precisely, both conditions are of first order. Consider now the second term in the Taylor series for Δf at the minimum point x∗, i.e. the quadratic form

F(Δx) = (1/2)ΔxᵀH(x∗)Δx.
In general, a quadratic form F may be either positive, negative or zero for any fixed Δx. It may also have the property of being always positive, except for Δx = 0. Such a form is called positive definite. In harmony with the positive definiteness of the quadratic form, the matrix associated with that quadratic form is also denoted as positive definite. Using x instead of Δx in our formulation, we can define a general matrix A to be positive definite if xᵀAx > 0, ∀x ≠ 0. Accordingly, we can complete the classification of A as shown in Table 5.1.
It is to be pointed out that, besides a trial-and-error method by which arbitrary x vectors are evaluated for a given matrix A, the following two methods for checking the definiteness of a matrix A, the eigenvalue check and the check of the principal minors, are also applicable for use in numerical problems:
Table 5.1: Classification of a general matrix A

Quadratic form                                      | matrix A
xᵀAx > 0, ∀x ≠ 0                                    | positive definite
xᵀAx ≥ 0, ∀x ≠ 0                                    | positive semidefinite
xᵀAx < 0, ∀x ≠ 0                                    | negative definite
xᵀAx ≤ 0, ∀x ≠ 0                                    | negative semidefinite
xᵀAx < 0 for some x and xᵀAx > 0 for some other x   | indefinite
1. F(x) is positive definite if and only if all eigenvalues of A are strictly positive, i.e.
λi > 0, i = 1, . . ., n.
2. F(x) is positive semidefinite if and only if all eigenvalues of A are non-negative, i.e.
λi ≥ 0, i = 1, . . ., n (note that at least one eigenvalue must be zero for it to be called
positive semidefinite).
3. F(x) is negative definite if and only if all eigenvalues of A are strictly negative, i.e.
λi < 0, i = 1, . . ., n.
4. F(x) is negative semidefinite if and only if all eigenvalues of A are non-positive, i.e.
λi ≤ 0, i = 1, . . ., n (note that at least one eigenvalue must be zero for it to be called
negative semidefinite).
1. A is positive definite if and only if all the principal minors are positive, i.e. Mk > 0,
k = 1, . . ., n.
3. A is negative definite if and only if Mk < 0 for odd k and Mk > 0 for even k.
5. A is indefinite if it does not satisfy any of the preceding criteria.
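For small matrices the principal-minor criterion can be checked mechanically. A plain-Python sketch (the helper names `det` and `classify` are ours; semidefinite cases are deliberately left inconclusive, since leading minors alone do not decide them):

```python
# Definiteness check of a symmetric matrix via its leading principal minors
# (Sylvester's criterion); helper names are our own, for small matrices only.
def det(A):
    # Laplace expansion along the first row.
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def classify(A):
    n = len(A)
    minors = [det([row[:k] for row in A[:k]]) for k in range(1, n + 1)]
    if all(m > 0 for m in minors):
        return "positive definite"
    # negative definite: M_k < 0 for odd k, M_k > 0 for even k
    if all((m < 0) if k % 2 == 1 else (m > 0)
           for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "indefinite or semidefinite (criterion inconclusive)"

print(classify([[2, 0], [0, 3]]))    # positive definite
print(classify([[18, 9], [9, 0]]))   # minors 18, -81: not definite
```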
Finally, the sufficiency condition is derived from the discussion of the Taylor series. Again, consider the second term in the series for Δf, where

Δf = ∇f(x∗)ᵀΔx + (1/2)ΔxᵀH(x∗)Δx + R

is evaluated at the stationary point determined by the condition

∇f|x∗ = ∇f(x∗) = 0,

so that the requirement Δf > 0 for all Δx ≠ 0 reduces to ΔxᵀH(x∗)Δx > 0. This will be true if the Hessian H(x∗) is a positive definite matrix, which is then the sufficient condition for a local minimum of f(x) at x∗.
Summarizing the last development gives the second order sufficiency conditions. If the
matrix H(x∗ ) is positive definite at the stationary point x∗ , then x∗ is a local minimum
point for the function f (x).
Some last comments at the end of the derivation of the optimality conditions: Note that
the entire conditions involve derivatives and not “absolute” values of the functions. As
a consequence, adding a constant to f (x) or multiplying f (x) by any positive constant
does not change the minimization problem. Multiplying f (x) by a negative constant (e.g.
−1) changes the minimum of f to a maximum. This property allows us to convert a
minimization problem directly to a maximization problem by multiplying f (x) by −1.
A quick example can demonstrate the capabilities of the optimality conditions in action.
Example
Discuss the function f(x) with respect to optimum solutions, where

f(x) = 2x1² + 4x1x2³ − 10x1x2 + x2².

The necessary conditions ∇f = 0 read

∂f/∂x1 = 4x1 + 4x2³ − 10x2 = 0,
∂f/∂x2 = 12x1x2² − 10x1 + 2x2 = 0.

Replacing

x1 = (5/2)x2 − x2³

from the first equation in the second equation yields

12((5/2)x2 − x2³)·x2² − 10((5/2)x2 − x2³) + 2x2 = 0
or

(−12x2⁴ + 40x2² − 23)·x2 = 0.

A possible solution point is x2 = 0 and x1 = 0. Thus,

x∗ = [0, 0]ᵀ

is a stationary point.
Checking the Hessian matrix at this point requires the computation of H(x),

H(x) = [ ∂²f/∂x1²  ∂²f/∂x1∂x2 ; ∂²f/∂x2∂x1  ∂²f/∂x2² ] = [ 4  12x2² − 10 ; 12x2² − 10  24x1x2 + 2 ].

At x∗ = [0, 0]ᵀ this gives H(x∗) = [ 4  −10 ; −10  2 ], whose leading principal minors are M1 = 4 > 0 and M2 = 8 − 100 = −92 < 0. The Hessian is therefore indefinite, and the stationary point x∗ = [0, 0]ᵀ is a saddle point, not a minimum.
Figures: Examples of functions f(x1) of one variable, including a case where f′(x1) is not continuous.
Chapter 6
Based on the discussion of unconstrained optimization problems, one might conclude that
only the nature of the objective function f (x) will determine the location of the minimum
point. This, however, is not true.
The constraint functions, as we already know, can play a prominent role. The following
examples are given to illustrate possible situations.
Case #1
Solve the minimization problem

min_x { f(x) = (x1 − 1.5)² + (x2 − 1.5)² | h(x) = x1 + x2 − 2 = 0 }.
Solution. The potential solution point must lie on the line x2 = 2 − x1 (see Fig. 6.1, left). The minimum can be determined by dropping the perpendicular from the center of the contours of f(x) (which are of course concentric circles) onto the line x2 = 2 − x1. Consequently, due to the theorem of intersecting lines, the solution is

x∗ = [x1∗, x2∗]ᵀ = [1.0, 1.0]ᵀ.

That means that the solution is a specific point on the given line x2 = 2 − x1 where h(x∗) = 0 and f∗ = f(x∗) = (x1∗ − 1.5)² + (x2∗ − 1.5)² = 0.5.
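The same result can be verified numerically by eliminating x2 through the equality constraint and sampling the remaining one-variable function (a crude sketch; the grid resolution is an arbitrary choice of ours):

```python
# Case #1 revisited numerically: substitute x2 = 2 - x1 from the equality
# constraint and minimize the remaining one-variable function by sampling.
def f(x1, x2):
    return (x1 - 1.5) ** 2 + (x2 - 1.5) ** 2

best_f, best_x1 = min((f(x1, 2.0 - x1), x1)
                      for x1 in (i / 1000.0 for i in range(2001)))
print(best_f, best_x1)   # f* = 0.5 at x1 = 1.0, hence x2 = 1.0
```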
Case #2
Solve the minimization problem

min_x { f(x) = (x1 − 1.5)² + (x2 − 1.5)² | x1 + x2 − 2 ≤ 0, −x1 ≤ 0, −x2 ≤ 0 }.
Solution. In this (simple) case, instead of a single inequality constraint, three inequality
constraints are given. Although the graphical representation of the problem looks very
similar, the nature of the problem is completely different from the previous case (see
Figure 6.1: Graphical solution of case #1 (left) and case #2 (right); the minimum M lies at x1∗ = x2∗ = 1.
Fig. 6.1, right). The solution is determined by the feasible domain spanned by the three inequality constraints. Nevertheless, the same solution point occurs. The minimum value of f(x) in this case corresponds to the circle with the smallest possible radius of all circles that just touches the feasible region. This, with respect to its value, is the point

x∗ = [x1∗, x2∗]ᵀ = [1.0, 1.0]ᵀ,

where f(x∗) = 0.5, but the solution logic has changed. The location of the minimum is found by considering the feasible domain! Again, the location is governed by the constraints or boundaries of the feasible domain.
Case #3
Solve the minimization problem

min_x { f(x) = (x1 − 0.5)² + (x2 − 0.5)² | x1 + x2 − 2 ≤ 0, −x1 ≤ 0, −x2 ≤ 0 }.
Solution. Now the solution is independent of the constraints. As we can see from the graphical representation (see Fig. 6.2, left), the minimum lies in the interior of the feasible domain because the center of the contours (again, circles) is located at the point (1/2, 1/2), where

f(x∗) = 0  and  x∗ = [1/2, 1/2]ᵀ.
Case #4
Solve the minimization problem

min_x { f(x) = (x1 − 2)² + (x2 − 2)² | x1 + x2 − 2 ≤ 0, −x1 + x2 + 3 ≤ 0, −x1 ≤ 0, −x2 ≤ 0 }.
Figure 6.2: Graphical solution of case #3 (left), with the minimum M at (1/2, 1/2), and of the over-constrained case #4 (right).
In this case, the objective function is modified again. Also, a further inequality constraint (−x1 + x2 + 3 ≤ 0) is added to the model.
Solution. The plot of the constraints for case #4 shows that the problem is over-constrained (see Fig. 6.2, right). That is to say, there are conflicting requirements which cannot be satisfied. As a consequence, a solution does not exist. In such a case we must re-examine the problem formulation and relax the constraints.
To discuss the above example in more detail we have to realize that

if P1 = [1/2, 1/2]ᵀ, then g2(x) = −x1 + x2 + 3 > 0,
if P2 = [2, 2]ᵀ, then g1(x) = x1 + x2 − 2 > 0 and g2(x) = −x1 + x2 + 3 > 0,
if P3 = [5, 1]ᵀ, then g1(x) = x1 + x2 − 2 > 0.

All constraints in the constraint set cannot be satisfied simultaneously. In other words, no solution point exists that can fulfill all the requirements.
From the discussion of the four examples it follows that, in the general constrained optimum design problem, based on the formulation

min_x { f(x) | h(x) = 0, g(x) ≤ 0 },

the solution may lie on the boundary of the feasible domain, in its interior, or may not exist at all.
This approach is often applied in mathematics (and engineering). Thus, we transform the original problem

min_x { f(x) | h(x) = 0, g(x) ≤ 0 }

into a problem having only equality constraints instead of both inequality and equality constraints.
Furthermore, we transform the problem with equality constraints into an unconstrained problem

min_x { f(x) }.

To illustrate the procedure, consider a problem with two variables and a single equality constraint,

min_x { f(x1, x2) | h(x1, x2) = 0 }.

As can be seen, the objective function has two variables x1 and x2 and the minimization over x is subject to only one single equality constraint h(x1, x2) = 0.
To derive the necessary conditions which we are interested in, we assume that the equality constraint can be used to solve for one variable in terms of the other (at least symbolically), that is, we assume that we can write

x2 = Φ(x1),

where Φ is an appropriate function of x1. (In some very elementary cases it may be possible to find an explicit representation for Φ(x1); for example, if h(x1, x2) = x1 + x2 − 2 = 0, we could write x2 = Φ(x1) = 2 − x1. In general, however, such explicit expressions cannot be found.)
Taking the implicit representation x2 = Φ(x1), we are able to establish a procedure by which the Lagrange multipliers (in our case, however, there is only one multiplier for the single equality constraint) get defined naturally in the process. Since the (symbolic) elimination of the second variable results in a function of only one variable, we can easily write the necessary condition for the newly obtained problem

min_{x1} { f(x1, Φ(x1)) }

as

df/dx1 = 0.
Using the chain rule of differentiation for f(x1, Φ(x1)) = f(x1, x2), we get

df/dx1 = ∂f/∂x1 + (∂f/∂x2)(dΦ/dx1) = 0.

Since this is the necessary condition for a stationary point x∗ = (x1∗, x2∗), we re-write it as

∂f(x1∗, x2∗)/∂x1 + (∂f(x1∗, x2∗)/∂x2)(dΦ/dx1) = 0.

This step makes sense because we also have to involve the equality condition in our considerations: along the constraint, h(x1, Φ(x1)) = 0 and hence

dh/dx1 = ∂h/∂x1 + (∂h/∂x2)(dΦ/dx1) = 0.

Solving for dΦ/dx1, we immediately obtain

dΦ/dx1 = −(∂h(x1∗, x2∗)/∂x1)/(∂h(x1∗, x2∗)/∂x2).

Now we are capable of substituting dΦ/dx1 from the above equation into the equation for the necessary condition and we obtain

∂f(x1∗, x2∗)/∂x1 − (∂f(x1∗, x2∗)/∂x2)·(∂h(x1∗, x2∗)/∂x1)/(∂h(x1∗, x2∗)/∂x2) = 0.

Reordering gives
(∂f(x1∗, x2∗)/∂x1)/(∂h(x1∗, x2∗)/∂x1) = (∂f(x1∗, x2∗)/∂x2)/(∂h(x1∗, x2∗)/∂x2) ≡ −λ.

Note that in this derivation we force both terms

∂f(x1∗, x2∗)/∂x1 / ∂h(x1∗, x2∗)/∂x1   and   ∂f(x1∗, x2∗)/∂x2 / ∂h(x1∗, x2∗)/∂x2

to be equal to the same quantity −λ, although they contain distinct partial derivatives in their numerators as well as denominators. This demands that, at the stationary point x = x∗, we force the equality of these two terms, or in other words

(∂f/∂x1)(∂h/∂x2) − (∂f/∂x2)(∂h/∂x1) = 0.
This expression is, in the sense of functional analysis in mathematics, equivalent to the requirement that the determinant of the Jacobian matrix of the two functions f(x) and h(x), also called the functional determinant, vanishes. Thus,

| ∂f/∂x1  ∂f/∂x2 |
| ∂h/∂x1  ∂h/∂x2 | evaluated at x∗  = 0.
From the vanishing determinant it follows that

−(∂f/∂x1)/(∂f/∂x2) = −(∂h/∂x1)/(∂h/∂x2),

which means that, at the point x∗, the slopes ∂x2/∂x1 of both curves f(x) = f(x∗) and h(x) = 0 coincide (see Fig. 6.3).
The two equations

∂f/∂x1 |x∗ + λ ∂h/∂x1 |x∗ = 0

and

∂f/∂x2 |x∗ + λ ∂h/∂x2 |x∗ = 0
Figure 6.3: At the solution point x∗, the slope tan α = dx2/dx1 of the contours of f(x) coincides with that of the constraint curve h(x) = 0.
h(x)|x∗ = 0
are the necessary conditions of optimality for an optimum design problem associated with
the equality constraint h(x) = 0.
Any point x that violates these conditions cannot be a minimum point for the problem.
Here, the quantity λ is designated as the Lagrangian multiplier for the equality constraint
h(x) = 0. If the minimum point is known the value of the multiplier can be calculated. For
the above case #1 we get x∗1 = x∗2 = 1 and λ∗ = 1.
It is customary (and convenient) to use what is known as the Lagrange function in writing the necessary conditions. The Lagrange function is denoted as L and becomes

L(x1, x2, λ) = f(x1, x2) + λ h(x1, x2),

such that the necessary conditions can be obtained from the following derivatives:

∂L/∂x1 = ∂f/∂x1 |x∗ + λ ∂h/∂x1 |x∗ = 0,
∂L/∂x2 = ∂f/∂x2 |x∗ + λ ∂h/∂x2 |x∗ = 0,
∂L/∂λ = h(x1, x2)|x∗ = 0.

Consequently, the vanishing gradient of the Lagrange function, ∇x,λ L(x, λ)|x∗ = ∇x,λ (f(x) + λh(x))|x∗ = 0, yields the same statements as above.
Thus, the two problems

min_x { f(x) | h(x) = 0 }   and   min_{x,λ} { L(x, λ) }

lead to the same necessary conditions.
At the optimum, the gradients of the objective function and the constraint function have to be along the same line and proportional to each other. Therefore, the Lagrange multiplier is a proportionality constant. It can also be interpreted as a force required to impose the constraint.
The idea of a Lagrange multiplier for an equality constraint can, of course, be generalized to many, say p, equality constraints. This leads to the Lagrange Multiplier Theorem. Before stating this general theorem, one important amendment is needed. It must be assured that the candidate minimum point that we wish to examine with respect to the necessary optimality conditions is a so-called regular point. This has to do with the fact that the derived necessary optimality conditions are only valid if the point of interest is a regular point. If it is not, no conclusion based upon the optimality conditions is possible. (This does not mean that the point is not a minimum!)
Consider the general problem

min_x { f(x) | hk(x) = 0, k = 1, 2, 3, ..., p },

where x = [x1, x2, ..., xn]ᵀ. Let x∗ be a regular point that is a local minimum for the above problem. Then there exist Lagrange multipliers λk∗, k = 1, 2, 3, ..., p, such that ∇L(x, λ)|x∗,λ∗ = 0, or

∂L/∂xi |x∗,λ∗ = 0,  i = 1, 2, 3, ..., n,

and

∂L/∂λk |x∗,λ∗ = 0,  k = 1, 2, 3, ..., p,

where L(x, λ) = f(x) + λᵀh(x) is the Lagrange function.
Example
Consider the problem

min_x { f(x) = x1² + x2² | h(x) = x1 + x2 − 10 = 0 },

transformed into the unconstrained problem

min_{x,λ} { L(x, λ) },
Figure 6.4: The gradients of f and h at the point x∗ = (5, 5). We have ∇f|x∗ = −λ∇h|x∗ with ∇f|x∗ = [2x1, 2x2]ᵀ = [10, 10]ᵀ, ∇h|x∗ = [1, 1]ᵀ and λ = −10.
where L(x, λ) = f(x) + λh(x) = x1² + x2² + λ(x1 + x2 − 10). The necessary optimality conditions give

∂L/∂x1 = 2x1∗ + λ∗ = 0 → λ∗ = −2x1∗,
∂L/∂x2 = 2x2∗ + λ∗ = 0 → λ∗ = −2x2∗,
∂L/∂λ = x1∗ + x2∗ − 10 = 0.

Since x1∗ = x2∗ = −λ∗/2, we obtain x1∗ + x2∗ − 10 = −λ∗/2 − λ∗/2 − 10 = 0, or λ∗ = −10 and x1∗ = x2∗ = −λ∗/2 = 5. The geometrical interpretation is shown in Fig. 6.4.
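Since the necessary conditions of this example are linear in (x1, x2, λ), they can also be solved directly. A small self-contained Python sketch (the `solve` helper is our own):

```python
# The necessary conditions form a linear system in (x1, x2, lambda):
#   2*x1         + lam = 0
#          2*x2  + lam = 0
#   x1  +  x2          = 10
# solved here with a small Gauss-Jordan elimination (no external libraries).
def solve(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))  # partial pivoting
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i:
                fac = M[r][i] / M[i][i]
                M[r] = [a - fac * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]

x1, x2, lam = solve([[2, 0, 1], [0, 2, 1], [1, 1, 0]], [0, 0, 10])
print(x1, x2, lam)   # 5.0 5.0 -10.0
```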
The transformation of the original problem into an unconstrained problem using the Lagrange function leads to a so-called “saddle point problem.” In numerical optimization, some methods directly utilize the saddle point nature of the problem as the solution philosophy (dual methods). These methods search for the deepest point in the “basin” created by the ridges of the saddle surface. For this (see Fig. 6.5), first a maximization with respect to the multipliers λ is carried out, followed by a minimization with respect to the variables xi.
Figure 6.5: Transformation of a constrained problem into an unconstrained problem using the Lagrange function leads to a “saddle point problem”.
We now turn to problems with both equality and inequality constraints. That means that we are interested in transforming the original optimization problem

min_x { f(x) | h(x) = 0, g(x) ≤ 0 }

into a problem where only equality constraints appear.
We can transform the inequality constraints of the form g(x) ≤ 0, or gj(x) ≤ 0, j = 1, 2, 3, ..., m, by adding new variables to the constraints which are called slack variables. Using, for example, a specific constraint gj(x) ≤ 0, it can immediately be seen that the slack variable sj for this constraint always has to be non-negative (i.e. positive or zero) to make the inequality an equality.
To give an example, if g j (x) = −100, then s j = +100 gives exactly
g j (x) + s j = 0.
In the case that g j (x) is active, i.e. g j (x) = 0, then s j is zero because again
g j (x) + s j = 0.
Consequently, an inequality constraint g j (x) ≤ 0 is equivalent to the equality constraint
g j (x) + s j = 0, where s j ≥ 0.
The variables s j are treated as new unknowns of the design optimization problem, along
with the original design variables xi , i = 1, 2, 3, ..., n. Their values are determined as part
of the solution.
When a slack variable sj has zero value, the corresponding inequality constraint is satisfied at equality. Such an inequality is called an active constraint; in other words, there is no “slack” in the constraint. For any sj > 0, the corresponding constraint is a strict inequality and called an inactive constraint. Note that the usage of slack variables introduces additional “design” variables and an additional constraint for each sj of the type sj ≥ 0. This, of course, increases the dimension of the design problem, in a similar fashion as the Lagrange multipliers enlarge the problem dimensions. In the case of the Lagrange
multipliers, the dimensionality increases from n to n + p if p equality constraints are given. The constraints of the type sj ≥ 0, j = 1, 2, 3, ..., m, can, however, be avoided if we use sj² instead of sj as the slack variable. Therefore, we have the improved version

gj(x) + sj² = 0,

where sj can be any real number. This new definition of quadratic slack variables allows the evaluation of constraints as follows: sj² = −gj(x) > 0 indicates an inactive constraint, sj² = 0 an active constraint, and sj² < 0 a violated constraint (no real sj exists).
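The bookkeeping behind quadratic slack variables can be sketched in a few lines of Python (the function name is our own):

```python
# Reading off the status of an inequality constraint g_j(x) <= 0 from its
# quadratic slack variable: g_j(x) + s_j**2 = 0, hence s_j**2 = -g_j(x).
def constraint_status(g_value):
    s_squared = -g_value
    if s_squared > 0:
        return "inactive"
    if s_squared == 0:
        return "active"
    return "violated (no real slack variable exists)"

print(constraint_status(-100))  # inactive
print(constraint_status(0))     # active
print(constraint_status(3))     # violated (no real slack variable exists)
```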
According to the preceding discussion we can now apply the well-known Lagrange Multiplier Theorem to treat the inequality constraints that we could not handle so far. The following transformation is obvious:

min_x { f(x) | h(x) = 0, g(x) ≤ 0 }
is transformed to

min_x { f(x) | h(x) = 0, g(x) + s² = 0 }

with the Lagrange function

L(x, λ, ω, s) = f(x) + λᵀh(x) + ωᵀ(g(x) + s²),

where the ωj, j = 1, 2, 3, ..., m, are new Lagrange multipliers specifically introduced for those equality constraints that have been converted from inequality to equality constraints by means of the slack variables s².
Applying the Lagrange Multiplier Theorem to this enhanced Lagrange function yields the problem

min_{x,λ,ω,s} { L(x, λ, ω, s) }.
As can be seen, there are no constraints in the minimization problem min{L}, but we now have an (n + p + 2m)-dimensional optimization problem. (That is the price we have to pay for getting an unconstrained problem.) There is an additional condition for the Lagrange multipliers of the original ≤-type constraints, i.e. the ωj, j = 1, 2, 3, ..., m, multipliers. Since the gj(x) are required to be less than or equal to zero, the ωj-values are not allowed to be negative. Thus

ωj ≥ 0,  j = 1, 2, 3, ..., m

is an absolute must, which creates m additional, although very elementary, constraints (sign constraints).
According to the Lagrange approach, the necessary conditions for equality and inequality constraints can be derived; they are commonly known as the Kuhn-Tucker necessary conditions (KTC).
6.2.1 The Kuhn-Tucker Theorem
Let x∗ be a regular point of the problem

min_x { f(x) | h(x) = 0, g(x) ≤ 0 }.

Define the Lagrange function

L(x, λ, ω, s) = f(x) + λᵀh(x) + ωᵀ(g(x) + s²)

or

L = f(x) + Σ_{k=1}^{p} λk hk(x) + Σ_{j=1}^{m} ωj (gj(x) + sj²).

Then there exist Lagrange multipliers λk∗ (or λ∗) and ωj∗ (or ω∗) such that the Lagrangian is stationary with respect to xi, λk, ωj and sj. Hence, we obtain the necessary KTC as follows:

∇x L = 0 = ∇x f|x∗ + λ∗ᵀ·∇x h|x∗ + ω∗ᵀ·∇x g|x∗,
∇λ L = 0 = h|x∗,
∇ω L = 0 = g|x∗ + s∗²,
∇s L = 0 = ω∗ᵀ s∗,  with ω∗ ≥ 0.
These conditions must be satisfied simultaneously. Furthermore, the equations ωj∗ sj∗ = 0, called the switching conditions, give rise to the multiplicity of solutions.
The gradient conditions also have a geometrical meaning. They can be interpreted such that, at x∗, the negative gradient −∇f|x∗ of the objective function is a positive linear combination of the gradients of the constraints, with the Lagrange multipliers as the parameters of the linear combination. A 2D example is shown in Fig. 6.6.
Figure 6.6: A 2D example of positive and negative linear combinations of constraint gradient vectors ∇hi|x∗. At the point xa∗, −∇f|x∗ is a positive linear combination of the constraint gradients; at the point xb∗, −∇f|x∗ can only be represented by a negative linear combination, so xb∗ does not qualify for the Kuhn-Tucker conditions.
Example: Consider the problem

min_x { f(x) = (x1 − 1.5)² + (x2 − 1.5)² | g1(x) ≤ 0 },

where

g1(x) = x1 + x2 − 2 ≤ 0.

The definition of the Lagrange function L(x, ω, s) = f(x) + ω(g(x) + s²) is in this case

L = (x1 − 1.5)² + (x2 − 1.5)² + ω(x1 + x2 − 2 + s²).
The KTC for x = x∗ require

∂L/∂x1 = 2(x1∗ − 1.5) + ω∗ = 0,
∂L/∂x2 = 2(x2∗ − 1.5) + ω∗ = 0,
∂L/∂ω = x1∗ + x2∗ − 2 + s∗² = 0,
∂L/∂s = 2ω∗s∗ = 0.
The solution starts by using the switching conditions (in general we have 2^m cases for m inequality constraints). By that, the following two cases are possible:
case #1 First assume s∗ = 0 in the switching condition, i.e. the constraint is active. The remaining equations give 2(x1∗ − 1.5) + ω∗ = 0, 2(x2∗ − 1.5) + ω∗ = 0 and x1∗ + x2∗ − 2 = 0, hence x1∗ = x2∗ = 1 and ω∗ = 1 ≥ 0. Thus x∗ = [1, 1]ᵀ is a valid Kuhn-Tucker point.
case #2 Now we assume that ω∗ = 0 in the switching condition. We obtain the equations 2(x1∗ − 1.5) = 0, 2(x2∗ − 1.5) = 0 and x1∗ + x2∗ − 2 + s∗² = 0. Solving these equations shows that x1∗ = x2∗ = 1.5, but s∗² = 2 − x1∗ − x2∗ = −1, which means that there is no real solution for s. Thus, this case yields no solution of the given problem.
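The same case enumeration can be written out programmatically; a sketch of this particular example with its 2^m = 2 cases (the helper name is our own):

```python
# The two switching cases of the example (m = 1 inequality constraint),
# mirroring the hand calculation above.
def check_cases():
    results = []
    # case #1: s* = 0, constraint active -> x1 + x2 = 2; by symmetry
    # x1 = x2 = 1 and omega = -2*(x1 - 1.5) = 1 >= 0: valid KT point.
    x1 = x2 = 1.0
    omega = -2 * (x1 - 1.5)
    results.append(("active", (x1, x2), omega >= 0))
    # case #2: omega* = 0, constraint inactive -> x1 = x2 = 1.5, but
    # s**2 = 2 - x1 - x2 = -1 < 0: no real slack, case discarded.
    x1 = x2 = 1.5
    s_squared = 2 - x1 - x2
    results.append(("inactive", (x1, x2), s_squared >= 0))
    return results

print(check_cases())
# [('active', (1.0, 1.0), True), ('inactive', (1.5, 1.5), False)]
```

For m inequality constraints this enumeration grows as 2^m, which is why numerical methods use active-set heuristics instead of exhaustive switching.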
2. Any point that does not satisfy the KTC cannot be a local minimum point unless it is an irregular point. Points satisfying the KTC are called Kuhn-Tucker points.
4. If there are equality constraints and no inequalities are active (i.e. their Lagrange multipliers are zero), then the Kuhn-Tucker points can again only be stationary (minimum, maximum or inflection) points.
5. If some inequality constraints are active and their multipliers are positive, then the
Kuhn-Tucker points cannot be local maxima. They may not be local minima either;
this will depend on the second-order necessary and sufficient conditions.
6. The KTC can be used to check whether or not a given point is a candidate minimum
point. (This fact is utilized in some popular numerical optimization methods for
termination.)
1. the constraints define a convex set S of points forming a collection that has the
following property: If P1 and P2 are any points in S, then the entire line segment
Figure 6.7: An example of a convex 2-dimensional set S, with the line segment x = αx⁽²⁾ + (1 − α)x⁽¹⁾ between the points x⁽¹⁾ and x⁽²⁾ (left); a 2D example of a convex function f(x) lying below its secant (right).
P1 P2 is also in S. In the n-dimensional space the line segment can be written as (see
Fig. 6.7, left)
x = αx(2) + (1 − α)x(1) , 0 ≤ α ≤ 1.
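The defining inequality of a convex function along such a line segment can be checked numerically by sampling. A Python sketch (the function names and the tolerance are our own choices; sampling gives evidence, not a proof):

```python
# Sampling check of the convexity inequality along a line segment:
# f(a*x2 + (1-a)*x1) <= a*f(x2) + (1-a)*f(x1) for all a in [0, 1].
def convex_on_segment(f, xa, xb, samples=101):
    fa, fb = f(xa), f(xb)
    for i in range(samples):
        a = i / (samples - 1)
        x = [a * u + (1 - a) * v for u, v in zip(xb, xa)]
        if f(x) > a * fb + (1 - a) * fa + 1e-12:
            return False          # a sampled point lies above the secant
    return True

convex = lambda x: x[0] ** 2 + x[1] ** 2   # convex everywhere
bumpy = lambda x: -(x[0] ** 2) + x[1]      # concave along x1
print(convex_on_segment(convex, [0.0, 0.0], [2.0, 2.0]))  # True
print(convex_on_segment(bumpy, [-1.0, 0.0], [1.0, 0.0]))  # False
```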
Figure 6.8: The constraint gradients are normal to the constraint tangent plane
∇²L = ∇²f + (λ∗)ᵀ·∇²h + (ω∗)ᵀ·∇²g.

Let there be nonzero feasible directions Δx ≠ 0 satisfying the following linear systems at the point x∗: ∇hᵀ·Δx = 0 and ∇gᵀ·Δx = 0 for all active inequalities. Then, if x∗ is a local minimum point for the constrained minimization problem, it must be true that

Q = Δxᵀ·∇²L|x∗·Δx ≥ 0.
3. The Kuhn-Tucker conditions allow a symbolic and analytical solution for some very specific design problems (e.g. weight minimization of trusses with displacement constraints).
5. The Kuhn-Tucker conditions are applicable solely at regular points. This is a qualification that can make “life more difficult”.
6. The multipliers and slack variables can be used for navigation and for gaining insights into a problem. In particular, ωj = 0 implies an inactive constraint, sj² = 0 implies an active constraint, sj² > 0 implies an inactive constraint and sj² < 0 implies a violation of the Kuhn-Tucker conditions.
Chapter 7
7.1 Introduction
Numerical methods for nonlinear optimization problems are needed because the indirect
methods, based upon necessary and sufficient conditions, are either too cumbersome or
not applicable at all. Nevertheless, the theoretical background of the indirect solution
techniques is extraordinarily important with respect to an understanding of the numerical
optimization methods. As repeatedly outlined, many concepts discussed in the indirect
solution of optimization problems are often reused in numerical optimization.
It must be pointed out at the beginning that presenting and explaining, and therefore
teaching, numerical optimization methods is not an easy task. In contrast to other
numerical methods, such as linear or nonlinear equation solvers, integration methods,
eigenvalue solvers, etc., there is an enormous plenitude of distinct solution methods in
numerical optimization. This fact makes it difficult to select the most suitable method for
a given problem. In fact, one has to realize that
[Classification diagram of the numerical methods for the design optimization problem. Concepts: barrier function methods, penalty function methods and multiplier methods on the transformation side; methods solving the original constrained problem, methods solving approximations of the constrained problem, and methods for special cases on the primal side. Strategies: hill-climbing methods for unconstrained problems, hill-climbing methods for constrained problems, sequential approximation (SLP, sequential quadratic solvers) and special solvers, with substrategies/optimizers beneath.]
1. Numerical methods for optimum design are conceptually different from indirect
(analytical) methods described in the previous chapters: They work in a direct fash-
ion using an iterative process, no matter if an unconstrained or a constrained ap-
proach is applied.
3. The right side of the classification (primal methods) provides methods that incorpo-
rate the constraints directly. Hereby, different subconcepts can be identified with re-
spect to the constraints. The first subconcept leaves the original constraint unaltered.
The second subconcept—in analogy to the transformation methods—transfers the
problem with constraints into a sequence of subproblems using specifically approx-
imated constraints, such as linearized constraints, quadratically approximated con-
straints or convex constraints. The third subconcept is limited to very specific non-
linear and constrained problems, for example pure quadratic or so-called geometric
optimization problems where polynomials (sums of products) form the objective
function and the constraints. The latter problems are, however, more of interest for
mathematicians and are of minor practical relevance in engineering.
[Flowchart: select an initial solution x(k) with k = 0; iterate, checking “improved?” and “satisfied?” at each step; output results on termination.]
Figure 7.2: Graphical illustration of the basic optimization process used in all numerical opti-
mization methods
[Figure 7.3: The design change from x(k) to x(k+1) along the search direction s(k) with step size α(k) , shown on the contours of the objective function together with the constraints.]
f (x(k) + α(k) s(k) ) ≤ f (x(k) ), where s(k) represents the desirable direction of design change, also called the
direction of descent, and the quantity α(k) is a positive scalar called the step size.
The process of computing the change in design, ∆x(k) = α(k) s(k) , is therefore composed
of two parts (see Fig. 7.3).
In general, there are three distinct methodologies that can be used to find solutions for
the direction finding subproblem. The individual methodology depends on the order of
information with respect to the objective function and constraints, incorporated into the
computation of the s(k) direction.
Figure 7.4: Search vector based on a linear combination of the unit vectors
Figure 7.5: An example of the Rosenbrock method using rotated search directions
where α∗1 and α∗2 are the optimal step sizes in the directions of the unit vectors e1
and e2 . Fig. 7.4 shows the solution process schematically for an unconstrained problem.
As we can see, α∗1 and α∗2 in the step from (k) to (k + 1) are determined such that the
minimum in each direction ek , k = 1, 2 is found (carried out by a line search method).
The Rosenbrock method is an improvement of the successive variation method. Successes
and failures are identified and weighted by means of factors. This creates a rotation of
the search directions as illustrated in Fig. 7.5, resulting in an acceleration of the search
process.
In the simplex method (not to be confused with the Simplex algorithm in linear optimiza-
tion) a domain-oriented solution is found, while the above two methods are path-oriented
methods. In path-oriented methods, according to their designation, a single path creates
the solution points. By contrast, in a domain-oriented method, due to specified rules, a
subregion (in the feasible domain) of the optimization space is defined. Fig. 7.6 shows the
schema which is applied in the simplex method. A triangle spans the subregion where
the worst point is eliminated in the consecutive steps through reflection. It must be men-
tioned, however, that the solution points can be driven out of the feasible domain (see Fig. 7.7).
Figure 7.7: Solution points can be driven out of the feasible domain in the Simplex method
[Figure 7.8: Iteration history of the steepest descent method for a 2D example, from the starting point to the minimum.]
The derivation of gradient methods can be demonstrated very rapidly if we make use
of the Taylor series expansion that we have already considered thoroughly. Using our
fundamental inequality f (x(k) + α(k) s(k) ) ≤ f (x(k) ) we can approximate the left-hand side
of the above “minimization condition” for numerical methods by the linear Taylor series
expansion about the point x(k) such that
f (x(k) ) + ∆x(k) T · ∇ f |x(k) + · · · ≤ f (x(k) ),
where ∇ f |x(k) is the gradient of f (x) at the point x(k) and the small terms of higher order
have been neglected. Also, ∆x(k) = α(k) s(k) . Subtracting f (x(k) ) from both sides of the
inequality gives
∆x(k) T · ∇ f |x(k) = α(k) s(k) T · ∇ f |x(k) ≤ 0.
Since α(k) > 0, it may be dropped from the inequality. Also, since ∇ f |x(k) is a known
quantity (the gradient of the objective function at x(k) ), the search direction must be com-
puted to satisfy the inequality such that
s(k) T · ∇ f |x(k) = s(k) T grad f |x(k) ≤ 0.
Geometrically, the inequality shows that the angle between the vectors s(k) and ∇ f |x(k)
must lie between 90◦ and 270◦ . In other words, any small movement in such a direction
must decrease the objective function. Furthermore, we can postulate that s(k) is propor-
tional to the negative gradient −∇ f |x(k) . The descent direction according to this gradient-proportional choice
is also called the downhill direction, which we should travel along to find our minimum
solution. As one can imagine, due to the wide range of the “downhill” angles (90◦ ...270◦ )
there are several methods available which compute the downhill direction differently.
The steepest descent method is the simplest, the oldest and probably the best known nu-
merical method for unconstrained optimization. (Cauchy, 1847, already introduced it even
before computers were invented.) In this method, exactly the negative gradient represents
the downhill direction, i.e. s(k) = −∇ f |x(k) . The graphical representation in Fig. 7.8 shows
the iteration history for a 2D example. Note that a large number of iterations may be re-
quired.
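A minimal sketch of the steepest descent iteration (the quadratic test function f(x) = x1^2 + 10 x2^2 with its elongated contours is an assumption chosen to provoke the typical zig-zagging; for a quadratic, the exact line search step can be written in closed form):

```python
import numpy as np

# Illustrative quadratic f(x) = x1^2 + 10*x2^2 with elongated contours.
Q = np.diag([2.0, 20.0])      # constant Hessian, so that grad f(x) = Q x

def grad(x):
    return Q @ x

x = np.array([5.0, 1.0])
n_iter = 0
while np.linalg.norm(grad(x)) > 1e-8:
    g = grad(x)
    s = -g                               # downhill direction: negative gradient
    alpha = (g @ g) / (s @ Q @ s)        # exact line search step for a quadratic
    x = x + alpha * s                    # x(k+1) = x(k) + alpha(k) s(k)
    n_iter += 1
```

The consecutive directions are mutually orthogonal, which produces the zig-zag path of Fig. 7.8 and a correspondingly large iteration count.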
The conjugate gradient method, due to Fletcher and Reeves (1964), is a very
simple and effective modification of the steepest descent method. In the steepest descent
method two consecutive steps within one iteration cycle, in the 2D example, are always
orthogonal to each other. This tends to slow down the process of the steepest descent
method, although the method converges. In contrast, the conjugate gradient directions
are not orthogonal to each other. Rather, these directions tend to cut diagonally through
the steepest descent directions. Without derivation, in the conjugate gradient method the
search direction is computed as s(k) = −∇ f |x(k) + β(k) s(k−1) , where the Fletcher-Reeves
factor is the ratio of the squared gradient norms,
β(k) = ‖∇ f |x(k) ‖2 / ‖∇ f |x(k−1) ‖2 .
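A minimal sketch of the conjugate gradient method with the Fletcher-Reeves factor, applied to the same kind of assumed quadratic model problem; with exact line searches it terminates after at most n = 2 steps, in contrast to the many steps of steepest descent:

```python
import numpy as np

Q = np.diag([2.0, 20.0])      # Hessian of the illustrative f(x) = x1^2 + 10*x2^2
x = np.array([5.0, 1.0])
g = Q @ x
s = -g                        # first direction: steepest descent
for k in range(2):            # n = 2 variables -> at most 2 conjugate steps
    alpha = -(g @ s) / (s @ Q @ s)         # exact line search along s
    x = x + alpha * s
    g_new = Q @ x
    beta = (g_new @ g_new) / (g @ g)       # Fletcher-Reeves factor
    s = -g_new + beta * s                  # s(k) = -grad f + beta * s(k-1)
    g = g_new
```

After the second step the gradient vanishes (up to rounding): the conjugate directions cut diagonally through the zig-zag of steepest descent.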
Again, in the case that constrained problems are to be solved using gradient methods
specific enhancements are required. To give an example, the so-called gradient projec-
tion method by Rosen consists of a projecting mechanism by which the gradient of the
objective function is projected on the hyperplane tangent to the active set of constraints.
The vector so obtained is then the search direction. This indicates a close affinity to the
Kuhn-Tucker optimality conditions where we have also discussed the demands on feasi-
ble directions for active constraints (see second-order necessary and sufficient conditions,
in particular the dot products between ∇hT · ∆x and ∇gT · ∆x).
∂ f (x + ∆x)/∂∆x = 0 + ∇ f (x) + H · ∆x = 0,
or
∆x = −H−1 · ∇ f (x),
where ∆x is a small change in design and H is the Hessian of the function f (x) at the
current point. Of course, it must be assumed that the Hessian matrix is nonsingular to be
able to invert H.
Using this value for ∆x, the new estimate for the design is given according to our al-
ready known equation x(k+1) = x(k) + ∆x(k) . Each iteration requires the computation of
the Hessian, which is a symmetric matrix. Nevertheless, determining H needs n(n + 1)/2
second derivatives, which means considerable computational effort. To improve efficiency,
quasi-Newton methods have been developed which require less computational effort. In this
context, the Davidon-Fletcher-Powell method and the BFGS (Broyden, Fletcher, Gold-
farb and Shanno) update method are to be mentioned, which use approximations of the
Hessian based on first-order derivatives. A “nice” property of the Newton methods is that
they show a so-called Q1-behavior, i.e. they find the minimum of a quadratic function in
one single step. Again, if constraints come into play additional remedies are necessary to
remain in the feasible domain during optimization.
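The Q1-behavior can be demonstrated directly: on a quadratic function the Newton step ∆x = −H−1 · ∇f(x) lands exactly at the minimum. The matrix H and vector c below are illustrative assumptions:

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T H x + c^T x with constant Hessian H.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
c = np.array([-1.0, -2.0])

def grad(x):
    return H @ x + c

x = np.array([10.0, -7.0])                # arbitrary starting point
dx = -np.linalg.solve(H, grad(x))         # dx = -H^{-1} grad f, without forming H^{-1}
x_new = x + dx                            # a single Newton step
```

The gradient at x_new vanishes up to rounding, i.e. the minimum of the quadratic is found in one step.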
minα { Φ(α) } .
Since this line search problem is a fundamental part of almost all optimization problems, a
numerical example will be considered in which the one-dimensional minimization aspect
is presented in detail. The consideration of a concrete example may also ease the access
to the numerous other numerical line searches available in numerical optimization.
7.5.1 Example
Let a direction of change for the function f (x) = 3x21 + 2x1 x2 + 2x22 + 7 at the point (1, 2)
be given as (−1, −1). Therefore,
x(k) = (1, 2)T and s(k) = (−1, −1)T .
To compute the step size such that minα {Φ(α)} and minx { f (x)} hold, we first check to
see if s(k) is a direction of descent. Hence
∇ f (x)|x(k) = ∇ f |(1,2) = (6x1 + 2x2 , 2x1 + 4x2 )T |(1,2) = (10, 10)T .
[Figure 7.9: The Armijo test: a step size α passes the test if Φ(α) ≤ Φ(0) + ε Φ′(0) α, with 0 < ε < 1 (recommended: ε ≈ 0.2 ... 0.5), i.e. if the function value lies below the secant through Φ(0) with the reduced slope ε Φ′(0); otherwise the test fails.]
Hence s(k) T · ∇ f |x(k) = (−1, −1) · (10, 10)T = −20 < 0 and s(k) = (−1, −1)T is indeed
a direction of descent. Then, the new point x(k+1) = x(k) + αs(k) is given as
x(k+1) = (1, 2)T + α (−1, −1)T ,
or, in component form, x1(k+1) = 1 − α and x2(k+1) = 2 − α. Substituting these expressions
into the objective function at the point x(k+1) we have
f (x(k+1) ) = 3(1 − α)2 + 2(1 − α)(2 − α) + 2(2 − α)2 + 7 = 7α2 − 20α + 22 = Φ(α).
According to the necessary conditions we demand that dΦ/dα = 0, or 14α∗ − 20 = 0 and
α∗ = 10/7. (Note that d2 Φ/dα2 = 14 > 0.) As a result, we obtain
x(k+1) = (1, 2)T + (10/7) · (−1, −1)T = (−3/7, 4/7)T .
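As a cross-check of the example, Φ(α) can also be minimized numerically by sampling f(x(k) + α s(k)) on a fine grid (the grid resolution is an arbitrary choice):

```python
import numpy as np

def f(x):
    return 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7

x_k = np.array([1.0, 2.0])
s_k = np.array([-1.0, -1.0])

# Sample Phi(alpha) = f(x_k + alpha*s_k) on a fine grid and pick the minimum.
alphas = np.linspace(0.0, 3.0, 300001)
X = x_k + np.outer(alphas, s_k)
phi = 3*X[:, 0]**2 + 2*X[:, 0]*X[:, 1] + 2*X[:, 1]**2 + 7

alpha_star = alphas[np.argmin(phi)]       # close to 10/7
x_next = x_k + alpha_star * s_k           # close to (-3/7, 4/7)
```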
In the above example, it was possible to obtain an explicit form of the function f (x(k+1) ) =
Φ(α) and to use the conventional necessary (and sufficient) conditions for computing the
desired step size α∗ . For many problems, however, such an explicit representation for
Φ(α) is not available. Moreover, even if the function Φ(α) were known, it may be
too complicated to get an analytical solution. Therefore, a numerical method must be used
to find an α∗ value that minimizes the function f (x) in the direction s(k) . This method
is designated as the numerical line search process, being iterative in itself.
For engineers solving structural optimization problems it is extremely important to know
that the accuracy of a line search crucially governs the convergence of all optimization
processes based upon the search and gradient methodologies. An exception is the opti-
mization using the Newton methodology, where a roughly estimated line search is sufficient;
in this case, only the Armijo test needs to be checked, which means that the total process
is convergent when the function values remain below the downward directed secant at the
beginning of an interval (see Fig. 7.9).
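A minimal backtracking sketch of the Armijo test (the starting step α0 = 1 and the shrink factor 0.5 are assumptions; the test function is the one from the example in Section 7.5.1):

```python
import numpy as np

def armijo_step(f, grad_f, x, s, eps=0.3, alpha0=1.0, shrink=0.5):
    """Backtrack until Phi(alpha) <= Phi(0) + eps*Phi'(0)*alpha holds."""
    phi0 = f(x)
    slope0 = grad_f(x) @ s          # Phi'(0); negative for a descent direction
    alpha = alpha0
    while f(x + alpha * s) > phi0 + eps * slope0 * alpha:
        alpha *= shrink
    return alpha

f = lambda x: 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
grad_f = lambda x: np.array([6*x[0] + 2*x[1], 2*x[0] + 4*x[1]])
alpha = armijo_step(f, grad_f, np.array([1.0, 2.0]), np.array([-1.0, -1.0]))
```

Here the full step α = 1 already satisfies the test, so no backtracking is needed.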
Focusing on the line search techniques for the search and gradient methodologies, it is not
astonishing that there are, again, numerous alternative line searches at work in the vari-
ous software packages available at the individual computer centers. Therefore, as at the
beginning of studying numerical optimization methods, an overview of the competing
line searches is very important. The chart in Fig. 7.10 shows the main concepts.
In all cases, we must make some basic assumptions on the form of the line search function
to compute the step size numerically. For example, it has to be taken for granted that a
[Figure 7.10: Overview chart of the line search techniques; the quadratic patterns use the points a, b, the cubic patterns the points a, b, c, d.]
minimum exists and it is unique in the interval of interest. A function with this property
is called a unimodal function. The two examples in Fig. 7.11 demonstrate this situation.
Most line searches work for only unimodal functions. This may appear to be a severe
restriction, however, it is not. For functions that are not unimodal, we can think of locating
only a local minimum point that is closest to the starting point. The search problem then
is to find an interval at which the function f (α) has a global minimum value. This may be
carried out only in a numerical sense by determining the interval in which the minimum
may lie, i.e. by determining some lower and upper bound limits αl and αu for α∗ . The
interval [αl , αu ] is called the interval of uncertainty. This interval is reduced iteratively
until it is less than a specified small positive number ε. This is the desired accuracy for
locating the minimum.
According to this consideration, two phases can be identified. In phase one, the location
of the minimum point is bracketed and the initial interval of uncertainty is established. In
the second phase, the interval of uncertainty is refined by eliminating regions that cannot
contain the minimum.
[Figure 7.11: Two examples of unimodal functions f (α) over the interval of interest.]
[Figure 7.13: Golden section search: the interval L = c − a is partitioned by the interior points b and d at 0.382 L and 0.618 L.]
is used to compute the minimum value more precisely (see Fig. 7.12).
Since only zero order information is used, the equal interval search is a “search method”
according to our general classification of numerical optimization methods.
Elimination methods vary the increment of each step, which may be more efficient in
many cases. Bracketing the relevant minimum is very quick if a unimodal function can
be assumed. To give an example for the elimination approach the Golden Section Search
is considered here. In this method, the interval of interest is partitioned using the golden
ratio (see Fig. 7.13). Assume the three points a, b and c have been given. Then the next
step is to position a fourth point d in the larger of the two intervals (a, b) or (b, c). It can be
shown that if the larger interval is divided by d into two segments that have the ratio of the
golden section, then this will give the fastest convergence if no other assumptions about
f (α) are made. This approach is carried out repeatedly until a convergence criterion is
reached.
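A sketch of the golden section search, applied to the line search function Φ(α) = 7α² − 20α + 22 from the example in Section 7.5.1 (the interval [0, 3] and the tolerance are arbitrary choices; for clarity, this version recomputes one interior point per step that the classical scheme would reuse):

```python
import math

def golden_section(phi, a, c, tol=1e-8):
    """Shrink the interval of uncertainty [a, c] of a unimodal phi."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0     # 0.618...
    while (c - a) > tol:
        b = c - inv_phi * (c - a)              # interior point at 0.382 L
        d = a + inv_phi * (c - a)              # interior point at 0.618 L
        if phi(b) < phi(d):
            c = d                              # minimum lies in [a, d]
        else:
            a = b                              # minimum lies in [b, c]
    return 0.5 * (a + c)

phi = lambda alpha: 7*alpha**2 - 20*alpha + 22
alpha_star = golden_section(phi, 0.0, 3.0)
```

Each step shrinks the interval of uncertainty by the factor 0.618, independently of the function values, which is what makes the method so robust.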
[Figure: quadratic approximation q(α) fitted to a given function f (α).]
2. cubic interpolation
• two point pattern using the function values at two points along with their
derivative values.
Due to the limitations in time, only the quadratic curve interpolation (fitting) using a one
point pattern is to be considered in more detail.
[Figure: quadratic approximation q(α) of the given f (α) in the vicinity of the points α1 and α2 .]
Suppose that at a point α1 (basic point) we can evaluate f (α1 ), f ′ (α1 ) and f ′′ (α1 ). It is
then possible to construct a quadratic function q(α) which agrees at α1 with the given
function f (α) up to the second derivative, according to a Taylor series expansion,
q(α) = f (α1 ) + f ′ (α1 )(α − α1 ) + (1/2) · f ′′ (α1 )(α − α1 )2 .
An estimate ᾱ∗ of the true minimum α∗ can now be calculated by finding the vanishing
point of the derivative of q(α) with respect to α. Thus, setting 0 = q′ (ᾱ∗ ) = f ′ (α1 ) +
f ′′ (α1 )(ᾱ∗ − α1 ) we obtain for the estimate
ᾱ∗ = α1 − f ′ (α1 )/ f ′′ (α1 ).
This process can be repeated at ᾱ∗ . It is apparent that the new point in the iteration of
the line search does not depend on the value f (α1 ). Therefore, the method can be viewed
as iteratively solving the equation f ′ (α) = 0. In fact, this is the well known method of
Newton-Raphson.
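A sketch of this one point (Newton-Raphson) pattern; the line search function Φ(α) = e^α − 2α, with its minimum at α∗ = ln 2, is an assumption chosen so that the iteration is genuinely needed:

```python
import math

# Assumed line search function with minimum at alpha* = ln 2.
phi   = lambda a: math.exp(a) - 2.0*a
dphi  = lambda a: math.exp(a) - 2.0     # Phi'
ddphi = lambda a: math.exp(a)           # Phi''

alpha = 0.0                             # basic point alpha_1
for _ in range(20):
    alpha = alpha - dphi(alpha) / ddphi(alpha)   # one point pattern update
```

Since the update only solves Φ′(α) = 0, the function value itself never enters, exactly as noted above.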
q(α) = f (α1 ) + f ′ (α1 )(α − α1 ) + (1/2) · (( f ′ (α2 ) − f ′ (α1 ))/(α2 − α1 )) · (α − α1 )2 ,
where the second derivative of the one point pattern is replaced by the difference quotient
of the first derivatives at the two points.
Consequently, setting q′ (ᾱ∗ ) = 0, we obtain
ᾱ∗ = α1 − ((α2 − α1 )/( f ′ (α2 ) − f ′ (α1 ))) · f ′ (α1 ).
This formula assumes that the interval [α1 , α2 ] includes the unknown minimum α∗ .
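The two point pattern amounts to a secant iteration on f′(α) = 0. A sketch using the same assumed test function Φ(α) = e^α − 2α (minimum at ln 2), with guards against a vanishing denominator near convergence:

```python
import math

dphi = lambda a: math.exp(a) - 2.0      # Phi'(alpha); the minimum of Phi lies at ln 2

a1, a2 = 0.0, 2.0                       # interval [alpha_1, alpha_2] with the minimum inside
for _ in range(100):
    d1, d2 = dphi(a1), dphi(a2)
    if abs(d2 - d1) < 1e-14 or abs(d1) < 1e-14:
        break                           # converged; avoid a vanishing denominator
    a_new = a1 - (a2 - a1) / (d2 - d1) * d1     # two point (secant) estimate
    a1, a2 = a2, a_new
```

Only first derivatives are needed here, which is the practical advantage over the one point pattern.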
[Figure 7.16: Three point pattern: quadratic approximation q(α) of the given f (α) through the points α1 , α2 , α3 .]
where a_ij = α_i − α_j and b_ij = α_i^2 − α_j^2 (see Fig. 7.16). For example, a_23 = α_2 − α_3 and
b_31 = α_3^2 − α_1^2 .
For our purposes in this course, it is sufficient not to delve into the repair mechanisms of
hill-climbing methods, nor is there any reason to consider special non-linear constrained
optimization methods (which may be of interest only to mathematicians). As a consequence,
we only have to look at sequential approximation methods and transformation methods,
the latter being the fourth concept for handling constraints in numerical optimization.
We start our investigation by describing the solution concepts used in the transformation
methodology. (Sequential approximation is treated later on.)
7.6.1 Transformation methods
Transformation methods try to circumvent the disturbing impact of the constraints by
reformulating the original constrained problem. As we know, the idea of reformulating
a given problem into a substitute problem is not new. The Lagrange multiplier or the
Kuhn-Tucker approach are good examples for that.
The idea, common to all individual transformation methods, is easily and rapidly ex-
plained. The original problem
minx f (x) subject to hk (x) = 0, k = 1, 2, 3, ..., p, and g j (x) ≤ 0, j = 1, 2, 3, ..., m,
is replaced by a sequence of unconstrained subproblems minx { Φ(x, r(κ) ) }, κ = 1, 2, 3, ...,
where (r(κ) ) = r(1) , r(2) , r(3) , ... is a monotonic decreasing sequence with limκ→∞ r(κ) → 0
and Φ(x, r(κ) ) = f (x) + P(h(x), g(x), x, r(κ)) is the transformation function, also called
the composite or auxiliary function. In Φ(x, r(κ) ) the original objective function f (x) is
augmented in terms of a real valued function P whose action is controlled by the con-
straints h(x) as well as g(x), and the controlling parameters r(κ) . The form of the function
P depends on the transformation used.
The basic procedure is to choose an initial estimate x(0) and define the function Φ(x, r(κ)).
The controlling parameters r(κ) are also initially selected. Then, the function Φ(x, r(κ))
is minimized for x, keeping r(κ) fixed. The parameters r(κ) are adjusted between two
iterations and the procedure described above is repeated until no further improvement
is possible.
[Figure content: the barrier functions Φ(x, r(κ) ) for r(0) = 1.0, r(1) = 0.5, r(2) = 0.25 with their minima Φ∗(0) , Φ∗(1) , Φ∗(2) approaching the minimum of f (x) on the inequality constraint between the feasible and the infeasible region.]
Figure 7.17: Using the SUMT method to minimize the objective function f (x) = x2 with the con-
straint g(x) = −x + 2 ≤ 0.
Both functions are called barrier functions because a large barrier is constructed around
the feasible region. In fact, both functions P become infinite if any of the inequalities
is active. Thus, when the iteration is started from a feasible point, it cannot go into the
infeasible region because the barrier cannot be crossed. In both cases, the iteration is
intended to converge toward the unknown minimum x∗ as r(κ) → 0, with x∗ (r(κ) ) = x∗(κ) → x∗ for
κ → ∞. Correspondingly, limκ→∞ Φ(κ) → f ∗ = f (x∗ ) with g(x∗ ) ≤ 0. As an example,
see Fig. 7.17 for the minimization of the objective function f (x) = x2 with the constraint
g(x) = −x + 2 ≤ 0.
Since the solution is approached through feasible points only, the BFM is also named the
interior point BFM. In those cases in which it is not easy to create a feasible initial point,
it is a good idea to create one with the aid of other numerical methods (e.g. interval
search or evolution strategies).
For a fixed r(κ) we omit the κ and can write Φ(κ) = Φ(x) = x21 + x22 − r ln(−(1 − x1 − x2 )).
The minimum of Φ(κ) is given by ∇Φ = 0 or
∂Φ/∂x1 = 0 = 2x∗1 + 0 − r · 1/(x∗1 + x∗2 − 1) ,
∂Φ/∂x2 = 0 = 0 + 2x∗2 − r · 1/(x∗1 + x∗2 − 1) .
Since ∂Φ/∂x1 − ∂Φ/∂x2 = 2x∗1 − 2x∗2 = 0, it follows that x∗1 = x∗2 . We have
2x∗1 − r/(x∗1 + x∗2 − 1) = 2x∗2 − r/(x∗1 + x∗2 − 1) = 0
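With x∗1 = x∗2 = t, this stationarity condition becomes 2t − r/(2t − 1) = 0, i.e. the quadratic 4t² − 2t − r = 0 with the relevant root t > 1/2. A sketch verifying that the barrier minima approach x∗ = (1/2, 1/2) as r(κ) → 0 (the particular r-sequence is an arbitrary choice):

```python
import math

# Root t > 1/2 of the quadratic 4 t^2 - 2 t - r = 0 (stationarity on x1 = x2 = t).
def t_star(r):
    return (1.0 + math.sqrt(1.0 + 4.0*r)) / 4.0

# As r decreases, the unconstrained barrier minima move toward x* = (1/2, 1/2).
path = [(t_star(r), t_star(r)) for r in (1.0, 0.5, 0.25, 1e-4, 1e-8)]
```

Each pair satisfies the stationarity condition 2t − r/(2t − 1) = 0, and the sequence approaches the constrained minimum from the feasible side (x1 + x2 > 1), as an interior point method must.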
[Figure 7.18: The barrier example in the (x1 , x2 )-plane; the iterates approach the minimum x∗ = [1/2, 1/2]T .]
where g+j (x) = max(0, g j (x)). In contrast to the BFM, the penalty function is defined so
as to prescribe a high cost for violation of the constraints h(x) as well as g(x). Two facts
can be identified:
1. the monotonic decreasing sequence of control parameters r(κ) enlarges the value of P
and imposes a penalty,
2. the iteration proceeds in the infeasible region because max(0, g j (x)) defines P only
with respect to the infeasible domain.
Therefore, the PFM is an exterior point SUMT. Note that the PFM allows the existence
of both equality and inequality constraints, which are checked for a violation by means
of queries, i.e. it is examined whether hk (x) is not equal to zero, or whether g j (x) is positive.
As an illustration, Fig. 7.19 shows the penalty functions for the example minx { f (x) =
2x2 | g(x) = 1 − 2x ≤ 0}.
It should not be concealed that for optimum design it may be disadvantageous to ap-
proach the unknown minimum from the infeasible domain. If the iterative process stops
prematurely, an infeasible design is obtained.
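For the example of Fig. 7.19 the penalty function reads Φ(x, r) = 2x² + (1/r)(g⁺(x))². The same stationarity argument as in the barrier example gives the minimizer x∗(r) = 1/(r + 2), which lies on the infeasible side; the sketch below checks this by a plain grid search (grid and r-values are arbitrary choices):

```python
import numpy as np

def x_star(r):
    """Grid minimizer of Phi(x, r) = 2 x^2 + (1/r)*(g+(x))^2 (sketch only)."""
    xs = np.linspace(-1.0, 1.0, 200001)
    g_plus = np.maximum(0.0, 1.0 - 2.0*xs)     # g+ = max(0, g) for g(x) = 1 - 2x
    phi = 2.0*xs**2 + g_plus**2 / r            # the penalty grows as r decreases
    return xs[np.argmin(phi)]

rs = (1.0, 0.1, 0.01)
x_stars = [x_star(r) for r in rs]              # approaches 1/2 from the infeasible side
```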
[Figure content: the penalty functions Φ(x, r(κ) ) for r(0) = 1.0, r(1) = 0.5, r(2) = 0.05 approaching f (x) and its constrained minimum from the infeasible region.]
Figure 7.19: Penalty functions for the example minx { f (x) = 2x2 | g(x) = 1 − 2x ≤ 0}.
Through the integration of the Lagrange multipliers λ and ω the “destructive effect” of
the parameter r(κ) can be compensated in ill-conditioned cases. It is to be mentioned
that the multiplier methods solve a saddle point problem by two nested subproblems
(minx Φ(κ) with λ fixed, and maxλ lr (λ)). They are, therefore, called dual methods.
A short description of the above three methods is sufficient for our purposes.
7.7.1 SLP
In this case, the objective function and the constraints, assuming only inequality con-
straints, are repeatedly linearized at the current point x(k) . Hence, the following type of
subproblem has to be considered.
7.7.2 SQP
Here, the objective function is approximated quadratically, i.e.
f (x) 7→ f (x(k) ) + ∆x(k) T · ∇ f |x(k) + (1/2) · ∆x(k) T · ∇2 f |x(k) · ∆x(k) ,
while the constraints are linearized in the same fashion as in SLP.
7.7.3 SCP
In this method, the advantages of convex optimization are used. As we already know, the
property of convexity allows us to find global optimal solutions. Consequently, it seems to
be a brilliant idea to construct the individual subproblems in the sequential solution process
such that each subproblem is convex. It turns out that a specific Taylor series expansion
creates the desired convexity (Fleury has influenced SCP tremendously).
The series expansion of the objective function as well as the constraints contains only first
order partial derivatives, but they are introduced depending on their algebraic sign.
Accordingly, for the objective function we get
f (x) 7→ f (x(k) ) + ∑ (∂ f /∂xi )+ |x(k) · (xi − xi(k) ) − ∑ (∂ f /∂xi )− |x(k) · (xi(k) )2 · (1/xi − 1/xi(k) ),
where the first sum runs over the n1 variables with a positive derivative and the second
over the n2 variables with a negative derivative. In the above formula, the + and −
superscripts indicate where the respective gradient is positive (+) or negative (−). If the
gradient is negative, the first derivative is carried out with respect to the reciprocal 1/xi
of the optimization variable xi (that is the trick by which convexity can be created). In a
similar fashion, the constraints are established, i.e.
g j (x) 7→ g j (x(k) ) + ∑ (∂g j /∂xi )+ |x(k) · (xi − xi(k) ) − ∑ (∂g j /∂xi )− |x(k) · (xi(k) )2 · (1/xi − 1/xi(k) ),
with the sums running over the m1 variables with positive and the m2 variables with
negative derivatives, respectively.
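The construction can be checked on a small assumed example: for f(x) = x1² − x2 at x(k) = (1, 2) the derivative w.r.t. x1 is positive (direct term) and the one w.r.t. x2 is negative (reciprocal term); the resulting approximation must match f in value and gradient at x(k):

```python
import numpy as np

x_k = np.array([1.0, 2.0])

def f(x):                              # assumed example function
    return x[0]**2 - x[1]

df = np.array([2.0*x_k[0], -1.0])      # gradient at x_k: (+2, -1)

def f_tilde(x):
    # direct term for the positive derivative (w.r.t. x1)
    direct = df[0] * (x[0] - x_k[0])
    # reciprocal term for the negative derivative (w.r.t. 1/x2)
    recip = -df[1] * x_k[1]**2 * (1.0/x[1] - 1.0/x_k[1])
    return f(x_k) + direct + recip
```

f_tilde matches f in value and gradient at x(k) and is convex for x2 > 0, since its only nonlinear term is proportional to 1/x2.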
Note that all sequential approximation methods are available in the optimization library
offered by the Institute of Computational Engineering.
Figure 7.20: A “plateau” during the optimization process can hinder the search greatly (left); a
minimum can lie in a “dell” (right).
Note that it may be advisable, sometimes, to compare solution vectors using a certain lag,
e.g. x(k+r) and x(k) , where r may be a positive integer (5, 10, or so). By this retardation,
there may be a chance to find minima that could lie in a “dell” of a plateau (see Fig.
7.20, right).
The most “brutal” method of termination is to restrict the elapsed computer time or
the maximum number of iterations. It is not a secret that this is often the only way to
avoid serious problems, in particular to prevent endless loops due to an unintentionally
ill-conditioned optimization model.
Finally, in cases where the objective function and the constraint functions show a C2 -
continuity (i.e., if they are smooth functions in a mathematical sense and twice differen-
tiable), then also the Kuhn-Tucker conditions can be used as termination criteria. Indeed,
modern software solutions often make use of them.
• multi-modal problems, i.e. problems that have more than one local optimum,
• problems where the objective function or the constraints or parts of the constraints
are not differentiable. Examples are beam-like cylindrical shells (see Fig. 7.21) or
point-wise defined functions (see Fig. 7.22),
• migrating optimum points, e.g. due to temporal changes in the optimization model
(then the parameter t comes into play as well) or due to stochastic behavior of
the optimization model induced by stochastic processes (damages or deterioration
phenomena are some examples),
Figure 7.21: For a structure with beam-like cylindrical shells, the moment distribution my is not
differentiable.
Figure 7.23: A discontinuous solution space can have many local optima.
[Figure 7.24: The classification diagram for the general optimization problem with the evolution strategies added to both main branches, i.e. among the hill-climbing strategies for the transformed problems and among the hill-climbing strategies w.r.t. constraints.]
In all the cases mentioned above the evolution strategies seem to be an appropriate choice.
As a consequence, the robustness and, in this context, the general applicability of the
evolution strategies distinguish them from potential competitors.
From the viewpoint of categorizing, the evolution strategies are a member of the hill-
climbing methods, more appropriately, down-hill-climbing, since we are considering min-
imization as the standard formulation of an optimization process. According to our previ-
ous classification schema, therefore, the evolution strategies appear in both main branches
of our classification diagram, i.e. they can be included in the transformation concept as
well as the primal (constraint-oriented) concept. To clarify this consideration, the classifi-
cation diagram is shown again in Fig. 7.24, where the evolution strategies are included in
the schema.
As demonstrated, the evolution strategies belong to the search methods. The reason for
this is because no (first or second order) derivatives are needed, i.e. only zero order infor-
mation is taken.
The basic approach of the evolution strategies will be briefly explained. Evolution strate-
gies are assigned to the field of “evolutionary computation” (also denoted as “evolutionary
algorithms” as used since 1990), which encompasses four subdomains: evolutionary
programming (EP), genetic programming (GP), genetic algorithms (GA) and evolution
strategies (ES).
Some very short comments on EP (famous scientists are L. J. Fogel and D. B. Fogel) and
GP (J. R. Koza): Both directions started from the manipulation of finite state automata and
computer programming, respectively, making use of the solution principles and mecha-
nisms of the biological evolution processes, where numerous optimization mechanisms
are embedded. In general, structured quantities can be considered, such as data trees,
graphs, structural systems, etc. There are a lot of similarities to GA and ES, such that
a further discussion is not necessary because the main solution aspects are dealt with
when ES are explained.
The GA and the ES were developed independently from each other, but both were in-
vented to solve optimization problems by applying the continuous adaptation and improve-
ment found in biological evolution as a paradigm for mathematical or technical optimization.
Characteristic examples of biological evolution are
Around 1975, John Holland (Ann Arbor, USA), together with Kenneth A. De Jong, in-
troduced the genetic algorithms. Some years previously, Ingo Rechenberg created the
first evolution strategy, the so-called (1 + 1)-ES, in Berlin in 1972, accompanied by Hans-Paul
Schwefel, who essentially intensified the further developments. (Note: for the first time
in engineering, the (1 + 1)-ES was evaluated in the author’s Ph. D. thesis on the shape
optimization of cylindrical shells in the period 1970-1974.) Both approaches bear many
analogies. In the following, particularly the evolution strategies are to be introduced, as
mentioned before. The genetic algorithms are only described to get to know the main
differences between genetic algorithms and evolution strategies.
The most palpable difference between the GA and the ES consists in how the optimization
variables (from a biological point of view the “individuals of a population”) are modeled
and internally represented. In the case of genetic algorithms the individuals are repre-
sented in terms of “chains of bits”, which attempts to simulate the natural model, i.e. the
genetic code implemented in the DNA (deoxyribonucleic acid) structure of the double
helix, in the best possible fashion.
In contrast to that, the evolution strategies use real-valued vectors to represent the pheno-
type of the individuals (optimization variables). The genotype is therefore not considered.
As a consequence, the genetic operators which incorporate optimization mechanisms,
such as the mutation or recombination operator, are different. The GA have to code the
phenotype information into bits using mostly the floating point representation known from
computer science. Since bits are applied in GA directly, the combinatorial problems (e.g.
traveling salesman problem) are very appropriate for GA. On the other hand, the ES are
better suited for “parameter optimization” because no bit coding is necessary.
In the following, only the dyadic (1 + 1)-evolution strategy is examined. This is sufficient
for a first introduction into the philosophy of evolution strategies. The denotation (1 + 1)
means that there are two competitive individuals that are subjected to a selection process
in accordance with the principle of the survival of the fittest (Darwin’s principle). Hereby,
the first “1” accounts for a parent, the second “1” for a child, created from the parent by
a mutation.
Figure 7.25: The Gaussian distribution with mean mi and standard deviation σi .
It should be mentioned that, in the last two decades, further strategies have been created,
such as the multi-membered (µ + λ)-ES and (µ, λ)-ES:
Here, the “+” and “,” symbols stand for the way in which the selection process is car-
ried out; µ and λ represent the number of individuals in the parent and child generation,
respectively. “+” means that the members of both generations (parents and children) are
used in the selection process, “,” means that the parent generation is not considered in
the selection process for determining the fittest member. In this context, the parameter κ
constitutes the duration of a possible lifespan by which the lethality principle of biology
is materialized.
Now to some details of the (1 + 1)-ES. The core of the corresponding algorithm again
is the governing vector equation $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{s}^{(k)}$ known from our structure chart
that has been used for describing the iterative optimization process in general. However,
this vector equation is modified in the ES, and thus also in the (1 + 1)-strategy. Now we have
$$\mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{s}^{(k)} \;\mapsto\; \mathbf{x}^{(k)} + \mathbf{z}^{(k)},$$
where the vector $\mathbf{z}^{(k)}$ contains stochastic components $z_i^{(k)}$,
$i = 1, 2, \ldots, n$, which follow a Gaussian distribution (see Fig. 7.25). Thus,
$$\mathrm{pdf}(z_i) = \frac{1}{\sigma_i \sqrt{2\pi}}\, e^{-\frac{(z_i - m_i)^2}{2\sigma_i^2}}.$$
Hence, the individual components zi of the stochastic vector z are created around the
position vector x(k) following a Gaussian distribution for each zi component. For this, the
mean value mi (arrow head of vector x(k) is the origin) becomes mi = 0 so that
$$\mathrm{pdf}(z_i) = \frac{1}{\sigma_i \sqrt{2\pi}}\, e^{-\frac{z_i^2}{2\sigma_i^2}}$$
can be simplified. According to this $(0, \sigma_i)$-pdf, stochastic steps into the solution space are
created, which can be understood as a mutation mechanism. The geometric interpretation
of this mutation is as follows:
Figure 7.26: Some model functions used in the probabilistic theory of the (1 + 1)-ES.
• The search directions cover the full search space Rn , if n optimization variables are
defined,
• the step length of the Gaussian step into the space remains small in general; large
step lengths are rare, due to the Gaussian distribution.
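The mutation mechanism described by these two points can be sketched in a few lines of Python (a minimal illustration; the function name `mutate` is chosen here, not taken from the text):

```python
import random

def mutate(x, sigma):
    """Create a child x + z, where each z_i is drawn from a
    zero-mean Gaussian with standard deviation sigma_i."""
    return [xi + random.gauss(0.0, si) for xi, si in zip(x, sigma)]

random.seed(0)
parent = [1.0, -2.0, 0.5]
sigma = [0.1, 0.1, 0.1]   # standard deviations of the mutation steps
child = mutate(parent, sigma)
# most steps stay small; large steps are rare under the Gaussian distribution
```

Since every component is perturbed, the resulting search directions indeed cover the full space R^n.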
Figure 7.27: The success rate for a small (a) and a large (b) circle of probability.
Probabilistic investigations of these model functions lead to almost the same rules for the adaptation
of the σi quantities, although the two model functions represent diametrically opposed
scenarios. The model “d” (corridor model) reflects the circumstances far away from the
optimum, while the model “b” (spherical model) displays the situation in the vicinity of a local
or global optimum.
Furthermore, it can be substantiated that the adaptation of the σi -values has to ensure
that the values for σi are neither too small nor too large. This conclusion can be drawn immedi-
ately if the following two cases are considered (see Fig. 7.27 for the 2D case).
Assuming σi = σ = const, for all i, the geometric location of points having the same
“strike probability” is a circle of, say, radius a. Now we can state that if the radius a is too
small, the “gain of progress” is also small, although there are many successes within this
iteration step (see Fig. 7.27(a)). Conversely, if the radius a is too large (see Fig. 7.27(b)),
the gain is likewise small because the number of successes is small, although the step size
is large! As a consequence, the appropriate step size, represented
in terms of the radius a, must lie between these two extreme cases. The proba-
bilistic investigations carried out by the Rechenberg group showed that the
convergence speed is optimal
1. in the corridor model if the ratio successes/mutations, called the success probability
pdfs , is about 1/3.2,
2. in the spherical model if the success probability pdfs is about 1/5.
In fact, the two numbers 1/3.2 and 1/5 do not differ very much, albeit the two models
represent two diametrically opposed endpoints of a wide scale of possible models. In other
words, the optimal convergence speed lies in a very narrow window of possibilities
(like the escape velocity of missiles in aeronautics). If this window is left, a “deadly
reaction” occurs. Due to the small difference between 1/3.2 and 1/5, it is suggested to
take the 1/5 value as the governing value for formulating a success rule for the (1 + 1)-ES
algorithm: if the success ratio is less than 1/5, the σi values have to be reduced;
if it is larger than 1/5, the σi values have to be increased.
The reduction and enlargement factor can also be specified more precisely. For the
spherical model, which represents the most disadvantageous case with regard to the σi
changes, the theoretical examination leads to the factor
$$\lim_{n \to \infty} \left( 1 - \frac{0.202}{n} \right)^{n} = 0.817 = \frac{1}{1.224}.$$
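The limit can be checked numerically, since $(1 - 0.202/n)^n$ approaches $e^{-0.202} \approx 0.817$:

```python
import math

# (1 - 0.202/n)^n approaches e^(-0.202) ~ 0.817 as n grows
for n in (10, 100, 10**6):
    print(n, (1 - 0.202 / n) ** n)

print(math.exp(-0.202))  # limit value, approximately 0.817
print(1 / 1.224)         # the same value written as a reciprocal
```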
This gives us the following adaptation rule that is incorporated into the (1 + 1)-algorithm:
1. Check after n (i.e. the number of optimization variables) mutations how many suc-
cesses have been achieved during the last 10n mutations.
2. If the number of successes is less than 2n, then multiply the values σi by the factor
0.817 (reduction!).
3. If the number of successes is larger than 2n, then divide the values σi by the factor
0.817 (enlargement!).
4. Otherwise, the values of σi are optimal and there is no need to change them.
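Putting the pieces together, a minimal (1 + 1)-ES with this 1/5 success rule might look as follows. This is a sketch, not Rechenberg's exact bookkeeping: a single common σ is used instead of individual σi values, the success history is kept in a plain list, and the sphere function serves as an example objective.

```python
import random

def one_plus_one_es(f, x, sigma, iterations=2000, c=0.817):
    """Minimal (1 + 1)-ES with the 1/5 success rule (a sketch)."""
    n = len(x)
    fx = f(x)
    history = []  # 1 = successful mutation, 0 = failure
    for k in range(1, iterations + 1):
        # mutation: add a zero-mean Gaussian step to the parent
        child = [xi + random.gauss(0.0, sigma) for xi in x]
        fc = f(child)
        success = fc < fx
        history.append(1 if success else 0)
        if success:                      # (1 + 1)-selection: the fitter survives
            x, fx = child, fc
        # check every n mutations, using the last 10n mutations
        if k % n == 0 and len(history) >= 10 * n:
            successes = sum(history[-10 * n:])
            if successes < 2 * n:        # success rate below 1/5
                sigma *= c               # reduce the step size
            elif successes > 2 * n:      # success rate above 1/5
                sigma /= c               # enlarge the step size
    return x, fx

random.seed(1)
sphere = lambda x: sum(v * v for v in x)
best, fbest = one_plus_one_es(sphere, [5.0, -3.0], sigma=1.0)
```

The threshold 2n corresponds to the 1/5 rule because 2n successes out of 10n mutations is exactly a success rate of 1/5.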
In addition, the σi values must not fall below lower bounds determined by εa and εb , the
absolute and relative accuracies of the computer processor used in the optimization.
By that, a minimum of variance is achieved.
Finally, it should be mentioned that the (1 + 1)-strategy is also able to create feasible
solutions if the starting point is infeasible. In this case, the primary objective criterion is
temporarily replaced by a substitute function $\hat{f}$ representing the sum of all violated
constraints as follows,
$$\hat{f} = \sum_{j=1}^{m} g_j(\mathbf{x})\, \delta_j,$$
where
$$\delta_j = \begin{cases} -1, & \text{if } g_j(\mathbf{x}) < 0 \text{ (infeasible)}, \\ 0, & \text{otherwise}. \end{cases}$$
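The substitute function can be sketched as follows (the two constraints are hypothetical examples, with feasibility defined by $g_j(\mathbf{x}) \ge 0$):

```python
def substitute_objective(x, constraints):
    """f_hat = sum_j g_j(x) * delta_j with delta_j = -1 where g_j(x) < 0.

    Minimizing f_hat (down to 0) drives an infeasible point toward
    the feasible region."""
    return sum(-g(x) for g in constraints if g(x) < 0)

# hypothetical example constraints, feasibility meaning g_j(x) >= 0
constraints = [
    lambda x: x[0] - 1.0,   # x0 >= 1
    lambda x: 4.0 - x[1],   # x1 <= 4
]

f_hat_infeasible = substitute_objective([0.0, 5.0], constraints)  # 2.0
f_hat_feasible = substitute_objective([2.0, 3.0], constraints)    # 0
```

Since $g_j(\mathbf{x}) < 0$ for a violated constraint and $\delta_j = -1$, each violated term contributes a positive amount, so $\hat{f}$ vanishes exactly when all constraints are satisfied.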
The iteration is stopped as soon as suitable termination criteria are fulfilled, for example
1. the relative and absolute termination criteria
$$\frac{|f^{(k)} - f^{(k+\Delta)}|}{|f^{(k)}|} \leq \varepsilon_1, \qquad |f^{(k)} - f^{(k+\Delta)}| \leq \varepsilon_2,$$
and, finally,