Lecture Notes in Mathematics
Edited by A. Dold and B. Eckmann
1133
Krzysztof C. Kiwiel
Methods of Descent for Nondifferentiable Optimization
Springer-Verlag
Berlin Heidelberg New York Tokyo
Author
Krzysztof C. Kiwiel
Systems Research Institute, Polish Academy of Sciences
ul. Newelska 6, 01-447 Warsaw, Poland
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting,
reproduction by photocopying machine or similar means, and storage in data banks. Under
§ 54 of the German Copyright Law where copies are made for other than private use, a fee is
payable to "Verwertungsgesellschaft Wort", Munich.
© by Springer-Verlag Berlin Heidelberg 1985
Printed in Germany
Printing and binding: Beltz Offsetdruck, Hemsbach/Bergstr.
2146/3140-543210
CONTENTS
Chapter 1. Fundamentals
1.1. Introduction
1.2. Basic Results of Nondifferentiable Optimization Theory
1.3. A Review of Existing Algorithms and Original Contributions of This Work
Chapter 2. Aggregate Subgradient Methods for Unconstrained Convex Minimization
2.1. Introduction
2.2. Derivation of the Algorithm Class
2.3. The Basic Algorithm
2.4. Convergence of the Basic Algorithm
2.5. The Method with Subgradient Selection
2.6. Finite Convergence for Piecewise Linear Functions
2.7. Line Search Modifications
Chapter 6. … Constrained Problems
6.1. Introduction
6.2. Derivation of the Methods
6.3. The Algorithm with Subgradient Aggregation
6.4. Convergence
6.5. The Algorithm with Subgradient Selection
6.6. Modifications of the Methods
6.7. Methods with Subgradient Deletion Rules
6.8. Methods That Neglect Linearization Errors
6.9. Phase I - Phase II Methods
CHAPTER 1
Fundamentals
1. Introduction
A set S ⊂ R^N is called convex if [x,y] ⊂ S for all x and y belonging to S. A linear combination Σ_{j=1}^k λ_j x^j is called a convex combination of points x^1,...,x^k in R^N if each λ_j ≥ 0 and Σ_{j=1}^k λ_j = 1. The convex hull of a set S ⊂ R^N, denoted conv S, is the set of all convex combinations of points in S. conv S is the smallest convex set containing S, and S is convex if and only if S = conv S. An important property of convex hulls is described in the following lemma, in which the sum of two sets is defined by
S_1 + S_2 = {z_1 + z_2 : z_1 ∈ S_1, z_2 ∈ S_2}.
A function f: R^N → R is called convex if
f(λx^1 + (1−λ)x^2) ≤ λf(x^1) + (1−λ)f(x^2) for all λ ∈ [0,1] and x^1, x^2 ∈ R^N.
This is equivalent to the convexity of the epigraph of f,
epi f = {(x,β) ∈ R^{N+1} : β ≥ f(x)}.
A function f: R^N → R is strictly convex if f(λx^1 + (1−λ)x^2) < λf(x^1) + (1−λ)f(x^2) for all λ ∈ (0,1) and x^1 ≠ x^2. For instance, the function |·|² is strictly convex.
A function f: R^N → R is said to be locally Lipschitzian if for each bounded subset B of R^N there exists a Lipschitz constant L = L(B) < ∞ such that
|f(x) − f(y)| ≤ L|x − y| for all x, y ∈ B.
Then in particular f is continuous. Examples of locally Lipschitzian functions include continuously differentiable functions, convex functions, concave functions and any linear combination or pointwise maximum of a finite collection of such functions, cf. (2.1).
Following (Rockafellar, 1978), we shall now describe differentiability properties of locally Lipschitzian functions. Henceforth let f denote a function satisfying (2.2) and let x be an interior point of B, i.e. x ∈ int B.
The Clarke generalized directional derivative of f at x in a direction d, denoted f°(x;d), satisfies
f(x+td) ≤ f(x) + t f°(x;d) + o(t). (2.5)
By Rademacher's theorem, f is differentiable almost everywhere; let
dom ∇f = {y ∈ R^N : f is differentiable at y}.
where the index set U is a compact topological space (e.g. a finite set in the discrete topology), each f_u is locally Lipschitzian, uniformly for u in U, and the mappings f_u(x) and ∂f_u(x) are upper semicontinuous in (x,u) (e.g. each f_u is a differentiable function such that f_u(x) and ∇f_u(x) depend continuously on (x,u)).
Among the calculus rules for such functions, the quotient rule reads
∂(f_1/f_2)(x) ⊂ [f_2(x)∂f_1(x) − f_1(x)∂f_2(x)] / (f_2(x))². (2.25c)
H_C = {z ∈ R^N : ⟨∇f(x), z−x⟩ = 0}
is tangent at x to the contour of f at x,
where o(t)/t → 0 as t → 0. Moreover, the graph of the linearization equals H_∇f, while its contour at x is equal to H_C. We conclude that linearizations based on ∂f(·) = {∇f(·)} provide convenient differential approximations to f when f is smooth.
Next suppose that f is convex. Then f is locally Lipschitzian and ∂f is the subdifferential in the sense of convex analysis:
∂f(x) = {g_f ∈ R^N : f(z) ≥ f(x) + ⟨g_f, z−x⟩ for all z}. (2.32)
Each g_f ∈ ∂f(x) defines the linearization f_{g_f}(z) = f(x) + ⟨g_f, z−x⟩, which is a lower approximation to f at x with
f_{g_f}(x) = f(x). (2.34a)
Its graph is the hyperplane
H_{g_f} = {(z,β) ∈ R^{N+1} : β = f_{g_f}(z)}, (2.35)
and
H_1 = {z ∈ R^N : ⟨g_f, z−x⟩ = 0}.
Observe that the "max" above is attained, because ~f(x) is a compact set
by Lemma 2.2. By (2.34), ~ is a lower approximation to f at x
A
f(x) = f(x), (2.38a)
where
Observe that the convexity of ~ follows directly from (2.37) even when
f is nonconvex, since
I maX{fgf(zl): g f e ~f(x)}+(l-~max{fgf(Z~:gfe~f(x)}
= ~ ~(zl)+<l-~)~I~ 2)
Proof. The convexity of f̂ was shown above. For each g_f ∈ ∂f(x), f_{g_f} is affine; hence the compactness of ∂f(x), (2.37), (2.34a) and Lemma 2.5 imply that f̂ is subdifferentially regular and satisfies (2.43), and
∂f̂(x) = conv{g_f : g_f ∈ ∂f(x)}.
The last relation and the convexity of ∂f(x) yield (2.42a). Then (2.42b) follows from (2.43) and (2.26). In view of (2.5), (2.7), (2.42a) and (2.43), for each d ∈ R^N we have
f(x+td) ≤ f(x) + t f'(x;d) + o(t).
The above lemma will be used below in two schemes for finding descent directions. Relation (2.46) means that the set ∂f(x) can be separated from the origin by a hyperplane. Since ∂f(x) is a convex compact set, this is possible if and only if 0 ∉ ∂f(x). Therefore we shall first state two auxiliary results. For a nonempty closed convex set G ⊂ R^N, let
Nr G = argmin{|g| : g ∈ G}
denote the point of G nearest to the origin.
The following lemma shows how one may find descent directions for nonsmooth functions.
Then
(i) d̂ exists, is uniquely determined and satisfies
−d̂ = p̂ = Nr ∂f(x), (2.50)
H(x;x) = 0, (2.56)
M(x) = ∂f(x) if F(x) < 0, conv{∂f(x) ∪ ∂F(x)} if F(x) = 0, ∂F(x) if F(x) > 0, (2.58)
M̂(x) = ∂f(x) if F(x) < 0, conv{∂f(x) ∪ ∂̂F(x)} if F(x) = 0, ∂̂F(x) if F(x) > 0. (2.59)
By Lemma 2.5, F(·) and H(·;x) are locally Lipschitzian and the above mappings satisfy
∂F(x) ⊂ ∂̂F(x), (2.60)
∂H(x;x) ⊂ M(x), (2.61)
M(x) ⊂ M̂(x), (2.62)
where ∂H(·;x) denotes the subdifferential of H(·;x) for fixed x.
We have the following necessary condition of optimality:
0 ∈ ∂H(x;x), (2.63)
0 ∈ M̂(x). (2.64)
In particular, there exist numbers u_i, i=0,...,m, satisfying
0 ∈ u_0 ∂f(x) + Σ_{i=1}^m u_i ∂F_i(x),
u_i ≥ 0, i=0,...,m, Σ_{i=0}^m u_i = 1. (2.65)
Proof. Since x must minimize H(·;x) locally, from Lemma 2.14 we obtain (2.63), which in turn implies (2.64) by (2.61) and (2.62). To see that (2.65) follows from (2.64), (2.59) and (2.57), note that F(x) ≤ 0 and that one may set u_i = 0 if F_i(x) < 0. □
This follows from the fact that in the convex case we have ∂̂F = ∂F, see Corollary 2.8, and the condition 0 ∈ ∂F(x) is equivalent to F(x) ≤ F(y) for all y, see Lemma 2.11.
Relation (2.65) is known as the F. John necessary condition of optimality. It becomes the Kuhn-Tucker condition
0 ∈ ∂f(x) + Σ_{i=1}^m λ_i ∂F_i(x), (2.68a)
when u_0 ≠ 0, since one may take λ_i = u_i/u_0. When problem (2.53) is convex, i.e. f and each F_i are convex functions, then the Kuhn-Tucker condition and the Slater constraint qualification yield the following sufficient condition for optimality.
Lemma 2.16. Suppose that problem (2.53) is convex and satisfies the Slater constraint qualification (2.67). Then the following are equivalent:
(i) x̄ solves problem (2.53);
(ii) x̄ satisfies
(iii) x̄ satisfies
0 ∈ ∂H(x̄;x̄). (2.70)
Proof. (a) As noted above, (i) implies (ii). Suppose that (2.69) holds, but f(x) < f(x̄) for some x satisfying F(x) ≤ 0. Then
and observe that Y(x) is empty when F(x) < 0. Relation (2.78) is equivalent to the following.
Lemma 2.18. Consider a locally Lipschitzian problem (2.53) and its convex approximation P(x) at x ∈ S defined via (2.72) and (2.73). Let
p̂ = Nr M̂(x). (2.81)
Then
(i) d̂ exists, is unique and satisfies
−d̂ = p̂ ∈ M̂(x), (2.83)
Ĥ(x+d̂) = Ĥ(x) − |p̂|². (2.84a)
∂_ε f(x) = {g_f ∈ R^N : f(z) ≥ f(x) + ⟨g_f, z−x⟩ − ε for all z}. (2.86)
S = {x ∈ R^N : F(x) ≤ 0}, with the convention that F ≡ 0 if m = 0.
value f(x^k + t_k d^k) < f(x^k) and the next feasible point x^{k+1} = x^k + t_k d^k ∈ S. Such a stepsize can be found if d^k is a descent direction.
Knowing f and one subgradient at x^k, however, may not suffice for assessing the behavior of the problem functions around x^k. For instance, consider an unconstrained problem and an analogue of the steepest descent direction
d^k = −g_f(x^k).
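To see numerically why a single arbitrary subgradient can mislead such a scheme, consider the following small check (our own illustration, not from the book): for f(x) = |x_1| + 2|x_2|, the vector g = (1,2) is a valid subgradient at x = (1,0), yet f increases along −g.

```python
import numpy as np

# Minimal sketch (our example): a negative subgradient need not be a
# descent direction for nonsmooth f.
f = lambda x: abs(x[0]) + 2.0 * abs(x[1])

x = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])      # g lies in the subdifferential of f at x

for t in [0.0, 0.1, 0.2, 0.3]:
    print(t, f(x - t * g))    # f increases along -g: 1.0, 1.3, 1.6, 1.9
```

This is why the methods reviewed below combine several subgradients instead of trusting a single one.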
The method is based on the following crucial observation. For any fixed y ∈ R^N, define the linearizations
f̄(x;y) = f(y) + ⟨g_f(y), x−y⟩, F̄(x;y) = F(y) + ⟨g_F(y), x−y⟩;
since g_f(y) ∈ ∂f(y) and g_F(y) ∈ ∂F(y), for each x ∈ R^N we have
f(x) ≥ f̄(x;y),
F(x) ≥ F̄(x;y). (3.10)
minimize f̂^k(x^k+d) subject to F̂^k(x^k+d) ≤ 0. (3.14)
minimize u over (d,u) ∈ R^{N+1},
subject to f̄(x^k+d; y^j) ≤ u, F̄(x^k+d; y^j) ≤ 0 (3.15)
for all j=1,...,k. Setting x^{k+1} = y^{k+1} = x^k + d^k completes the k-th iteration.
An interesting feature of the cutting plane method is its use of linearizations provided by each newly generated point for improving the polyhedral approximations to the problem functions. In other words, the next search direction finding subproblem (3.15) is modified by appending the constraints generated by the latest linearizations. This idea is used in many algorithms for nonsmooth optimization.
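A compact sketch of the unconstrained cutting plane iteration may clarify the bookkeeping. This is our own illustrative code, not the book's: it restricts x to a box so that the linear program stays bounded, which anticipates exactly the drawback discussed next.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(f, subgrad, x0, box=10.0, iters=20):
    """Illustrative cutting plane method (Cheney-Goldstein / Kelley):
    minimize u subject to f(y_j) + <g_j, x - y_j> <= u, with x kept in
    a box (an assumption we add so the LP stays solvable)."""
    x = np.asarray(x0, float)
    n = x.size
    ys, gs = [x.copy()], [np.asarray(subgrad(x), float)]
    for _ in range(iters):
        # variables z = (x, u); minimize u
        c = np.r_[np.zeros(n), 1.0]
        A = np.array([np.r_[g, -1.0] for g in gs])   # <g_j,x> - u <= <g_j,y_j> - f(y_j)
        b = np.array([np.dot(g, y) - f(y) for g, y in zip(gs, ys)])
        bounds = [(-box, box)] * n + [(None, None)]
        res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
        x = res.x[:n]
        ys.append(x.copy())
        gs.append(np.asarray(subgrad(x), float))
    return x

# example: f(x) = max(|x1|, |x2|) with a one-sided subgradient choice
f = lambda x: max(abs(x[0]), abs(x[1]))
sg = lambda x: (np.sign(x[0]), 0.0) if abs(x[0]) >= abs(x[1]) else (0.0, np.sign(x[1]))
print(cutting_plane(f, sg, [3.0, -2.0]))
```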
Convergence of the cutting plane algorithm can be very slow (Wolfe, 1975). This is mainly due to the fact that |d^k| may be so large that the point x^k + d^k is far from the points y^j, j=1,...,k. Then x^k + d^k is in the region where f̂^k and F̂^k poorly approximate f and F. Also subproblems (3.14) and (3.15) may have no solutions. These drawbacks can be eliminated by adding to the objective functions of (3.14) and (3.15) a penalizing term ½|d|², which will prevent large values of |d^k|. Thus we obtain the following regularized modification of subproblem (3.14):
minimize ½|d|² + u. (3.18a)
f(x^{k+1}) ≤ f(x^k) for all k, is much easier to attain in practice. The main idea consists in taking the trial point y^{k+1} = x^k + d^k as x^{k+1} only if this leads to an improvement in the objective function value, i.e. f(y^{k+1}) < f(x^k). This is called a serious step. Otherwise a null step is taken by setting x^{k+1} = x^k.
To analyze Lemarechal's (1978) line search rules in more detail, let (d^k, u^k) denote the solution of (3.18a)-(3.18b) and let
v^k = u^k − f(x^k). (3.21)
minimize ½|d|² + v,
subject to f^j − f(x^k) + ⟨g_f^j, d⟩ ≤ v, j=1,...,k. (3.22)
This shows that (d,v) = (0,0) is feasible for (3.22). Hence the optimal value of (3.22) satisfies ½|d^k|² + v^k ≤ ½|0|² + 0 = 0. Therefore
v^k ≤ −½|d^k|² ≤ 0. (3.24)
u^k = max{f^j + ⟨g_f^j, d^k⟩ : j=1,...,k}, (3.25)
u^k = f̂^k(x^k + d^k), (3.26a)
v^k = f̂^k(x^k + d^k) − f(x^k). (3.26b)
If
f(y^{k+1}) − f(x^k) ≤ m v^k, (3.27)
where m ∈ (0,1) is a fixed line search parameter, then the trial point y^{k+1} is accepted as the next iterate x^{k+1} = y^{k+1}. Otherwise the algorithm stays at x^{k+1} = x^k. In both cases f(x^{k+1}) ≤ f(x^k), since m > 0 and v^k ≤ 0.
The following remarks on the above line search rules will be useful in what follows. The condition for a serious step of the form (3.27), instead of the simpler test f(x^{k+1}) < f(x^k), prevents the algorithm from taking infinitely many serious steps without significantly reducing the objective value, which could impair convergence. On the other hand, at a null step we have x^{k+1} = x^k and
f̄(x^{k+1}+d^k; y^{k+1}) > u^k, (3.28a)
f̂^k(x^{k+1}+d^k) = u^k. (3.28b)
From (3.28) we conclude that after a null step the linearization obtained at the trial point y^{k+1} leads to significant modifications of both the next polyhedral approximation and the next search direction finding subproblem. Therefore eventually a serious step must be taken if the current point is not a solution.
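In code, the serious/null step rule (3.27) is a one-line test; the following fragment (our sketch, with f, x, y_trial and the predicted descent v supplied by the surrounding method) shows the bookkeeping:

```python
# Hedged sketch of the serious/null step rule (3.27); m is the line
# search parameter in (0, 1) and v < 0 is the predicted descent.
m = 0.1

def next_iterate(f, x, y_trial, v):
    """Return the next iterate: y_trial after a serious step,
    x unchanged after a null step (the new subgradient is kept either way)."""
    if f(y_trial) - f(x) <= m * v:   # sufficient decrease => serious step
        return y_trial
    return x                          # null step: only the model is enriched
```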
The above algorithm of Lemarechal (1978) was extended by Mifflin (1982) to constrained convex problems as follows. The set
S = {x ∈ R^N : F(x) ≤ 0}
is approximated by the polyhedral set S^k determined by the constraint linearizations, that is
S ⊂ S^k. (3.30)
This follows easily from (3.10). If the auxiliary points y^1,...,y^k are near to x^k, then S^k is a close approximation to S in some neighborhood of x^k. However, the solution (d^k, u^k) of (3.18) would usually give x^k + d^k lying at some "corner" of S^k which is outside S. Therefore almost every trial point y^{k+1} = x^k + d^k would be infeasible. For this reason, Mifflin (1982) obtains d^k from the solution (d^k, v̂^k) to the problem
minimize ½|d|² + v,
in which the constraint linearizations are also bounded by v, so that
max{F^j + ⟨g_F^j, d^k⟩ : j=1,...,k} ≤ v̂^k (3.32)
from (3.32), (3.16) and (3.8). Combining this with (3.29), we see that the trial point y^{k+1} = x^k + d^k lies in the interior of S^k. Therefore we shall have y^{k+1} ∈ S whenever S^k is sufficiently close to S around x^k.
The above described line search rules of Lemarechal (1978) need only a simple modification in the presence of the constraints. If the trial point y^{k+1} is feasible, then one may use the test (3.27) and proceed as above. If y^{k+1} ∉ S then a null step x^{k+1} = x^k is declared. Thus f(x^{k+1}) ≤ f(x^k) and x^k ∈ S for all k.
Whenever a null step results from
F(y^{k+1}) > 0,
then
F̄(x^{k+1}+d^k; y^{k+1}) = F(y^{k+1}) > v̂^k and v̂^{k+1} ≥ F̄(x^{k+1}+d^{k+1}; y^{k+1})
from (3.33) and (3.11). The above inequalities imply that (d^{k+1}, v̂^{k+1}) ≠ (d^k, v̂^k). We conclude that a null step due to infeasibility provides a significant modification of the polyhedral approximation to the constraint function. This explains why a feasible trial point is generated after finitely many null steps.
The following remark on Lemarechal's (1978) search direction finding subproblem (3.22) (see also (3.17) and (3.18a)-(3.18b)) will be useful in what follows. Observe that at the k-th iteration the j-th linearization
f̄(x;y^j) = f(y^j) + ⟨g_f(y^j), x−y^j⟩ for all x
can be written as
f̄(x;y^j) = f(x^k) − α_f(x^k,y^j) + ⟨g_f(y^j), x−x^k⟩,
where α_f(x^k,y^j) = f(x^k) − f̄(x^k;y^j) is the linearization error, and the numbers λ_j^k, j=1,...,k, are the Lagrange multipliers of (3.22) which solve the following dual of (3.22). Hence
g_f(y^j) ∈ ∂_ε f(x^k) for ε = α_f(x^k,y^j) ≥ 0, (3.39)
which means that the value of α_f(x^k,y^j) indicates how far g_f(y^j) is from ∂f(x^k).
for some set J_f^k ⊂ {1,...,k}, and choose the k-th direction d^k accordingly. Here
f̂(x) = max{f(x^k) + ⟨g_f, x−x^k⟩ : g_f ∈ ∂f(x^k)}. (3.42)
Then f̂^k_{LW} and subproblem (3.41) may be regarded as approximate versions of the "theoretical" constructions (3.42) and (3.43). Moreover, from Lemma 2.13 we deduce that
−d^k = p_f^k = Nr conv{g_f(y^j) : j ∈ J_f^k}, (3.44a)
max{⟨g_f(y^j), d^k⟩ : j ∈ J_f^k} ≤ −|p_f^k|². (3.45b)
Thus p_f^k = −d^k is found by projecting the origin onto the set conv{g_f(y^j) : j ∈ J_f^k}, which approximates ∂f(x^k). Moreover, as in (2.52), we have
f̂^k_{LW}(x^k + t d^k) ≤ f(x^k) − t|p_f^k|² for all t ∈ [0,1], (3.46)
hence the value of −|p_f^k|² = −|d^k|² may be thought of as an approximate derivative of f at x^k in the direction d^k.
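The projection defining p_f^k is a small quadratic program over the unit simplex. A minimal sketch (our own, using scipy's general SLSQP solver rather than a specialized QP code):

```python
import numpy as np
from scipy.optimize import minimize

def nr_conv(G):
    """Nearest point of conv{rows of G} to the origin:
    minimize |sum_j lam_j g_j|^2 over the unit simplex (illustrative)."""
    G = np.asarray(G, float)
    k = G.shape[0]
    obj = lambda lam: np.dot(lam @ G, lam @ G)
    cons = [{"type": "eq", "fun": lambda lam: np.sum(lam) - 1.0}]
    res = minimize(obj, np.full(k, 1.0 / k), bounds=[(0.0, 1.0)] * k,
                   constraints=cons, method="SLSQP")
    lam = res.x
    return lam @ G, lam

# the two subgradients from the earlier example: projection is (1, 0),
# so d = -p is a genuine descent direction for f(x) = |x1| + 2|x2|.
p, lam = nr_conv([[1.0, 2.0], [1.0, -2.0]])
print(p, lam)
```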
We may add that search direction finding subproblems of the form (3.41) are also used in the algorithms of Mifflin (1977b) and Polak, Mayne and Wardi (1983). A quadratic programming formulation of (3.41) is to find (d^k, v^k) to
minimize ½|d|² + v, (3.47)
subject to ⟨g_f(y^j), d⟩ ≤ v, j ∈ J_f^k,
where
J_f^k = {j : |y^j − x^k| ≤ a^k}. (3.50)
−d^k = p_f^k = Nr conv[{p_f^{k-1}} ∪ {g_f(y^j) : j ∈ J_f^k}], (3.51)
where
p_f^{k-1} ∈ conv{g_f(y^j) : j=1,...,k−1}
carries over from the previous iteration the relevant past subgradient information. In this case J_f^k may be selected subject only to the requirement
k ∈ J_f^k,
e.g. one may set J_f^k = {k}. The use of (3.51) corresponds to setting
f̂^k_{LW}(x) = max{f(x^k) + ⟨p_f^{k-1}, x−x^k⟩, f(x^k) + ⟨g_f(y^j), x−x^k⟩ : j ∈ J_f^k}
and to appending to (3.47) the constraint
⟨p_f^{k-1}, d⟩ ≤ v.
We now pass to the line search rules used in (Wolfe, 1975) and (Mifflin, 1977b). To this end recall that the Lemarechal (1978) algorithm described above generates sequences related by
x^{k+1} = x^k + t_L^k d^k,
y^{k+1} = x^k + t_R^k d^k, (3.52)
with t_L^k = 1 at serious steps, t_L^k = 0 at null steps, and t_R^k = 1 for all k. Moreover, at each step we have
f(y^{k+1}) − f(x^{k+1}) ≥ m v^k. (3.54)
The above relations follow from the criterion (3.27) and the fact that t_L^k = 1 at a serious step, while a null step occurs with t_L^k = 0 and x^{k+1} = x^k. At a null step we also have y^{k+1} = x^k + d^k = x^{k+1} + d^k, hence y^{k+1} − x^{k+1} = d^k and
We have shown above that the direction finding subproblems in the Wolfe (1975) algorithm can essentially be obtained from those in (Lemarechal, 1978) by neglecting the linearization errors. Now, if we assume that α_f(x^{k+1}, y^{k+1}) = 0 in (3.55), then we obtain
|y^{k+1} − x^{k+1}| ≤ a^k, (3.57a)
f(x^{k+1}) ≤ f(x^k) + m_L t_L^k v^k, (3.57b)
⟨g_f(y^{k+1}), d^k⟩ ≥ m_R v^k.
Combining this with (3.56c) and the fact that m_R v^k > v^k, since v^k = −|p_f^k|², we get
⟨g_f(y^{k+1}), p_f^k⟩ ≤ m_R |p_f^k|²,
which means that g_f(y^{k+1}) lies in the open halfspace containing the origin. It follows that the next separating hyperplane, corresponding to J_f^{k+1} = J_f^k ∪ {k+1}, must be closer to the null vector, i.e. |p_f^{k+1}| < |p_f^k|. Thus eventually the direction degenerates (one can have d^k = 0), which provides another motivation for resetting strategies.
To sum up, the second class of algorithms discussed above (Lemarechal, 1975; Mifflin, 1977b; Polak, Mayne and Wardi, 1983; Wolfe, 1975), which neglect the linearization errors at search direction finding, need rules for discarding obsolete subgradients. This is in contrast with the first class (Lemarechal, 1978; Mifflin, 1982), in which the linearization errors automatically weigh the past subgradients, cf. (3.38) and (3.48).
We shall now review the third class of methods, which is intermediate between the two classes discussed above. It contains so-called bundle methods (Lemarechal, 1976; Lemarechal, Strodiot and Bihain, 1981; Strodiot, Nguyen and Heukemes, 1983). At the k-th iteration of the algorithms based on relation (3.45), the set G^k(ε^k) is employed. Observe that
f(x) ≥ Σ_{j=1}^k λ_j f̄(x;y^j) = f(x^k) + ⟨Σ_{j=1}^k λ_j g_f(y^j), x−x^k⟩ − Σ_{j=1}^k λ_j α_f(x^k,y^j),
with
⟨g, p_f^k⟩ ≥ |p_f^k|² for all g ∈ G^k(ε^k). (3.61)
The search direction is obtained by solving
minimize ½|Σ_{j=1}^k λ_j g_f(y^j)|²,
subject to λ_j ≥ 0, j=1,...,k, Σ_{j=1}^k λ_j = 1, (3.62)
Σ_{j=1}^k λ_j α_f(x^k,y^j) ≤ ε^k,
and setting
d^k = −p_f^k = −Σ_{j=1}^k λ_j^k g_f(y^j). (3.63)
i.e. we must find a hyperplane separating G^k(ε^k) from the origin. The best such hyperplane is defined by p_f^k, cf. (3.60) and (3.61). Note also that if p_f^k = 0 then f(x) ≥ f(x^k) − ε^k for all x, which means that the value of ε^k should be decreased.
Observe that if α_f(x^k,y^j) ≤ ε^k for all j, then subproblem (3.62) is equivalent to (3.48). For smaller values of ε^k the last constraint of (3.62) tends to make the subgradients with larger linearization errors contribute less to d^k, since the corresponding multipliers must be smaller, cf. (3.63). Thus the weighting of the past subgradients depends on the value of ε^k. Since it is difficult to design convergent rules for automatic choice of the value of ε^k (Lemarechal, 1980), this is the main drawback of bundle methods in comparison with the first class of methods based on polyhedral approximations.
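The dual subproblem (3.62) differs from the plain projection above only by the constraint on the weighted linearization errors. A hedged sketch (the names and the SLSQP solver choice are ours; SLSQP tolerates the possibly infeasible uniform starting point):

```python
import numpy as np
from scipy.optimize import minimize

def bundle_direction(G, alphas, eps):
    """Sketch of subproblem (3.62): minimize (1/2)|sum_j lam_j g_j|^2 over
    the unit simplex, subject to sum_j lam_j alpha_j <= eps, where
    alpha_j = alpha_f(x^k, y^j) are the linearization errors.
    Returns d = -p and the multipliers."""
    G, a = np.asarray(G, float), np.asarray(alphas, float)
    k = G.shape[0]
    obj = lambda lam: 0.5 * np.dot(lam @ G, lam @ G)
    cons = [{"type": "eq",   "fun": lambda lam: lam.sum() - 1.0},
            {"type": "ineq", "fun": lambda lam: eps - lam @ a}]
    res = minimize(obj, np.full(k, 1.0 / k), bounds=[(0.0, 1.0)] * k,
                   constraints=cons, method="SLSQP")
    lam = res.x
    return -(lam @ G), lam
```

Shrinking eps forces the weight onto subgradients with small linearization errors, which is exactly the weighting effect described above.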
Lemarechal, Strodiot and Bihain (1981) have proposed a bundle method that requires storing only a finite number, say M_g ≥ 1, of the past subgradients. Suppose that at the k-th iteration we have the (k−1)-st aggregate subgradient (p_f^{k-1}, f_p^k) ∈ R^N × R, satisfying
(p_f^{k-1}, f_p^k) ∈ conv{(g_f(y^j), f̄(x^k;y^j)) : j=1,...,k−1}.
Then, since
f(x) ≥ f̄(x;y^j) for all x and j,
we have
f(x) ≥ f_p^k + ⟨p_f^{k-1}, x−x^k⟩ for all x,
hence
p_f^{k-1} ∈ ∂_ε f(x^k) for ε = α_p^k,
where
α_p^k = f(x^k) − f_p^k.
Accordingly, (3.62) may be replaced by a subproblem posed in terms of p_f^{k-1} and the stored subgradients, and (3.63) by the corresponding aggregate combination.
The second class of methods requires bounded storage and uses simple quadratic programming subproblems, but seems to converge slowly in practice (Lemarechal, 1982). As for convergence, Polak, Mayne and Wardi (1983) have modified the line search rules of the earlier versions so as to obtain global convergence in the sense that each of the algorithm's accumulation points is stationary.
Related recent research includes Strodiot (1985), Lemarechal and Zowe (1983), and Mifflin (1983 and 1984). This research is not discussed here, for our purpose is to establish some general convergence theory in the higher dimensional and constrained case.
CHAPTER 2
Aggregate Subgradient Methods for Unconstrained Convex Minimization
1. Introduction
2. Derivation of the Algorithm Class
minimize ½|d|² + u, (2.2)
subject to f_j(x^k) + ⟨∇f_j(x^k), d⟩ ≤ u, j ∈ J,
and we have
u_p^k = f̂_p^k(x^k + d_p^k). (2.5)
f'(x^k; d_p^k) = max{⟨∇f_j(x^k), d_p^k⟩ : f_j(x^k) = f(x^k)}
≤ max{f_j(x^k) − f(x^k) + ⟨∇f_j(x^k), d_p^k⟩ : j ∈ J}
= u_p^k − f(x^k) = v_p^k,
if m ∈ (0,1) and v_p^k < 0, which shows that (2.6) must hold if t_k is sufficiently small. On the other hand, if v_p^k ≥ 0 then the method of linearizations stops, because x^k is stationary for f.
In fact, Pshenichny defined v_p^k in (2.6) as −|d_p^k|², which is slightly larger than v_p^k given by (2.7), and assumed that the gradients of f_j are Lipschitz continuous. However, it is easy to prove that the above version of the method of linearizations is globally convergent when each f_j is continuously differentiable, and that the rate of convergence is at least linear under standard second order sufficiency conditions even when f is nonconvex, see (Kiwiel, 1981a). Moreover, if all the functions f_j are affine, then the method finds a solution in a finite number of iterations. Therefore it seems worthwhile to extend this method to more general nondifferentiable problems.
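For concreteness, here is a minimal sketch of one direction finding step of the method of linearizations for f = max_j f_j, under the assumption that all f_j and their gradients are available; the stepsize rule (2.6) is omitted and SLSQP is our solver choice:

```python
import numpy as np
from scipy.optimize import minimize

def linearization_step(fs, grads, x):
    """One search direction of the method of linearizations (sketch):
    minimize (1/2)|d|^2 + u  s.t.  f_j(x) + <grad f_j(x), d> <= u."""
    x = np.asarray(x, float)
    n = x.size
    fx = np.array([fj(x) for fj in fs])
    Gx = np.array([gj(x) for gj in grads])
    obj = lambda z: 0.5 * np.dot(z[:n], z[:n]) + z[n]
    cons = [{"type": "ineq", "fun": lambda z, i=i: z[n] - fx[i] - Gx[i] @ z[:n]}
            for i in range(len(fs))]
    res = minimize(obj, np.zeros(n + 1), constraints=cons, method="SLSQP")
    d, u = res.x[:n], res.x[n]
    return d, u - max(fx)          # v = u - f(x), cf. (2.7)

# example: f(x) = max(x1 + x2, x1 - x2, -x1)
fs = [lambda x: x[0] + x[1], lambda x: x[0] - x[1], lambda x: -x[0]]
gs = [lambda x: np.array([1.0, 1.0]), lambda x: np.array([1.0, -1.0]),
      lambda x: np.array([-1.0, 0.0])]
print(linearization_step(fs, gs, np.array([1.0, 0.5])))
```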
Although our methods will not require the special form (2.1) of the objective function, they are in fact based on a similar, but implicit, representation:
minimize ½|d|² + u, (2.11)
subject to f_j^k + ⟨g^j, d⟩ ≤ u, j ∈ J^k,
where
v^k = u^k − f(x^k), (2.13)
u^k = f̂^k(x^k + d^k). (2.14)
f̂^k(x^k + t d^k) ≤ (1−t) f̂^k(x^k) + t f̂^k(x^k + d^k)
and t_L^k ≥ t̄. This involves a finite number of function evaluations, because t̄ > 0 (only one if t̄ = 1). If such a number t > 0 exists, we shall set x^{k+1} = x^k + t_L^k d^k and y^{k+1} = x^{k+1} (a serious step). Otherwise we have to accept a null step by setting x^{k+1} = x^k. In this case we also know a number t_R^k ∈ [t̄, 1] satisfying
Therefore at a null step we shall set y^{k+1} = x^k + t_R^k d^k, because this new trial point will define a linearization f̄_{k+1} by (2.9) that satisfies
see Section 7. Comparing (2.10), (2.14), (2.15) and (2.19), and using the fact that x^{k+1} = x^k, v^k < 0 and m ∈ (0,1), we deduce that after a null step we have
f̄_{k+1}(x^{k+1} + d^k) > u^k,
f̂^{k+1}(x^{k+1} + d^{k+1}) = u^{k+1},
provided that
k+1 ∈ J^{k+1}.
Thus after a null step the linearization from the trial point y^{k+1} will modify both the next polyhedral approximation and the next search direction finding subproblem.
We shall now show how to choose the next subgradient index set J^{k+1}. As noted above, we should have k+1 ∈ J^{k+1}, which is satisfied if
J^{k+1} = Ĵ^k ∪ {k+1}
for some set Ĵ^k ⊂ J^k. The obvious choice Ĵ^k = J^k, suggested by the cutting plane methods (Cheney and Goldstein, 1959; Kelley, 1960), would result in storing an ever-growing number of subgradients.
Lemma 2.1. (i) The unique solution (d^k, u^k) of subproblem (2.11) always exists.
(ii) (d^k, u^k) solves (2.11) if and only if there exist Lagrange multipliers λ_j^k, j ∈ J^k, and a vector p^k ∈ R^N satisfying
λ_j^k ≥ 0, j ∈ J^k, Σ_{j∈J^k} λ_j^k = 1, (2.21a)
(2.21b)
p^k = Σ_{j∈J^k} λ_j^k g^j, (2.21c)
d^k = −p^k, (2.21d)
u^k = −{|p^k|² − Σ_{j∈J^k} λ_j^k f_j^k}, (2.21e)
satisfies
|Ĵ^k| ≤ N+1. (2.23b)
minimize −Σ_{j∈J^k} λ_j f_j^k,
subject to Σ_{j∈J^k} λ_j = 1,
Σ_{j∈J^k} λ_j g^j = p^k, (2.24)
λ_j ≥ 0, j ∈ J^k,
minimize ½|d|² + u, (2.25)
subject to f_j^k + ⟨g^j, d⟩ ≤ u, j ∈ Ĵ^k,
minimize φ(d) = ½|d|² + u_s(d) over all d,
satisfying (2.23). Since these multipliers also solve (2.22), parts (ii)-(iii) of the lemma imply (2.21). Therefore one may use (2.23a) and part (ii) of the lemma to complete the proof. □
(p̃^k, f̃_p^k) = Σ_{j∈Ĵ^k} λ̃_j^k (g^j, f_j^k), (2.26)
minimize ½|d|² + u,
subject to f̃_p^k + ⟨p̃^k, d⟩ ≤ u, f_j^k + ⟨g^j, d⟩ ≤ u, j ∈ Ĵ^k, (2.28)
[f_j^k + ⟨g^j, d^k⟩ − u^k] λ_j^k = 0, j ∈ Ĵ^k,
p^k = λ_p^k p̃^k + Σ_{j∈Ĵ^k} λ_j^k g^j,
d^k = −p^k,
u^k = −{|p^k|² − λ_p^k f̃_p^k − Σ_{j∈Ĵ^k} λ_j^k f_j^k},
f̃_p^k + ⟨p̃^k, d^k⟩ ≤ u^k, f_j^k + ⟨g^j, d^k⟩ ≤ u^k, j ∈ Ĵ^k.
Subproblem (2.28) is of the form (2.11). Therefore the above relations and Lemma 2.1(ii) imply that (d^k, u^k) solves (2.28). Hence Lemma 2.1(i) yields that (2.11) and (2.28) have the same unique solution (d^k, u^k). □
f̂_s^k(x) = max{f_j^k + ⟨g^j, x−x^k⟩ : j ∈ Ĵ^k} for all x. (2.33)
Therefore one may use the aggregate linearization (2.35) for search direction finding at the next point x^{k+1}, where
for each x. The above relations, (2.21a) and (2.29) yield (2.39).
minimize ½|d|² + u, (2.41)
subject to f̃_p^k + ⟨p^{k-1}, d⟩ ≤ u, f_j^k + ⟨g^j, d⟩ ≤ u, j ∈ J^k.
Let λ_j^k, j ∈ J^k, and λ_p^k denote any Lagrange multipliers of (2.41). Since subproblem (2.41) is of the form (2.11), Lemma 2.1 implies that these multipliers satisfy
d^k = −p^k. (2.43e)
Remark 2.4. Convergence of the method which uses the aggregate subproblems (2.41) with J^k = {k} can be slow, since only two linearizations may provide insufficient approximation to the nondifferentiable objective function. Using more subgradients for search direction finding promotes faster convergence, but at the cost of increased storage and work per iteration. To strike a balance, one may use the following strategy. Let M_g ≥ 2 denote a user-supplied bound on the number of subgradients (including the aggregate subgradient) that the algorithm may use for each search direction finding. Then one may choose the set J^{k+1} on the basis of the k-th Lagrange multipliers, subject to the requirements that k+1 ∈ J^{k+1} and |J^{k+1}| ≤ M_g; a sketch of this selection is given below.
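A sketch of such a multiplier-based selection combined with aggregation (our own illustration of the strategy, not the book's pseudocode):

```python
import numpy as np

def select_bundle(J, lam, lam_p, p_prev, G, M_g):
    """Sketch: aggregate the current bundle (cf. (2.26)-(2.27)) and keep
    at most M_g - 1 subgradient indices with the largest multipliers.
    J: list of indices, lam: their multipliers (array aligned with J),
    lam_p: multiplier of the previous aggregate p_prev,
    G: mapping from index to subgradient vector (e.g. a dict of arrays)."""
    p = lam_p * p_prev + sum(l * G[j] for j, l in zip(J, lam))
    order = np.argsort(lam)[::-1]                   # largest multipliers first
    keep = [J[i] for i in order[:M_g - 1] if lam[i] > 0.0]
    return p, keep                                   # next J = keep + {k+1}
```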
f(x) = max{f_j(x) : j ∈ J}
3. The Basic Algorithm
We now state the basic method in detail; it is analyzed in the subsequent sections.
Algorithm 3.1.
Step 0 (Initialization). Select the starting point x^1 ∈ R^N and set y^1 = x^1. Choose a final accuracy tolerance ε_s ≥ 0 and a line search parameter m ∈ (0,1). Set p^0 = g^1 = g_f(y^1), f̃_p^1 = f_1^1 = f(y^1) and J^1 = {1}. Set the counters k=1, l=0 and k(0)=1.
Step 1 (Direction finding). Find multipliers λ_j^k, j ∈ J^k, and λ_p^k that solve the following k-th dual subproblem:
w^k = ½|p^k|² + f(x^k) − f̃_p^k. (3.3)
Remark 3.2. It follows from Lemma 2.1 that in Algorithm 3.1 (d^k, u^k) solves the primal subproblem (2.41), where u^k is given by (2.43f), and that λ_j^k, j ∈ J^k, and λ_p^k are the associated Lagrange multipliers. Thus one may equivalently solve subproblem (2.41) in Step 1 of the above algorithm.
Lemma 4.1. Suppose that Algorithm 3.1 did not stop before the n-th iteration, n > 1. Then for each k=1,...,n there exist multipliers λ̃_j^k, j=1,...,k, satisfying
(p^k, f̃_p^k) = Σ_{j=1}^k λ̃_j^k (g^j, f_j^k), (4.1a)
λ̃_j^k ≥ 0, j=1,...,k, Σ_{j=1}^k λ̃_j^k = 1, (4.1b)
λ̃_j^{k-1} ≥ 0, j=1,...,k−1, Σ_{j=1}^{k-1} λ̃_j^{k-1} = 1. (4.2b)
λ̃_1^1 = 1. (4.3a)
If k=1 then (4.1) follows from (4.3a) and (2.45), because (p^1, f̃_p^1) = (g^1, f_1^1) = (p^0, f̃_p^1). Therefore (2.32) and (2.38) yield (4.2) for k=2. Suppose that (4.2) holds for some k=n ≥ 2. Then, since
(p^k, f̃_p^k) = Σ_{j=1}^k λ_j^k (g^j, f_j^k) + λ_p^k (p^{k-1}, f̃_p^{k-1}),
λ_j^k ≥ 0, j=1,...,k, λ_p^k ≥ 0, Σ_{j=1}^k λ_j^k + λ_p^k = 1,
we obtain
(p^k, f̃_p^k) = λ_k^k (g^k, f_k^k) + Σ_{j=1}^{k-1} (λ_j^k + λ_p^k λ̃_j^{k-1})(g^j, f_j^k) = Σ_{j=1}^k λ̃_j^k (g^j, f_j^k),
where
Σ_{j=1}^k λ̃_j^k = Σ_{j=1}^k λ_j^k + λ_p^k Σ_{j=1}^{k-1} λ̃_j^{k-1} = Σ_{j=1}^k λ_j^k + λ_p^k = 1,
and
f̃_p^{k+1} = Σ_{j=1}^k λ̃_j^k [f_j^k + ⟨g^j, x^{k+1} − x^k⟩] = Σ_{j=1}^k λ̃_j^k f_j^{k+1}
from (2.38), (4.1a) and (2.32). Therefore (4.2) holds for k=n+1, and the induction step is complete. □
p^{k-1} ∈ ∂_ε f(x^k) for ε = α_p^k. (4.6b)
Remark 4.3. In view of (4.5), the values of α_j^k, α_p^k and α̃_p^k indicate the distance from g^j, p^{k-1} and p^k to the subdifferential of f at x^k, respectively. For instance, the value of α̃_p^k > 0 indicates how much p^k differs from being a member of ∂f(x^k); if α̃_p^k = 0 we have p^k ∈ ∂f(x^k).
w^k = ½|p^k|² + α̃_p^k, (4.7a)
v^k = −{|p^k|² + α̃_p^k}, (4.7b)
v^k ≤ −w^k ≤ 0. (4.7c)
Proof. This follows immediately from (3.2), (3.3), (4.4c) and (4.5d). □
Remark 4.5. The variable w^k may be termed a stationarity measure of the current point x^k, for each k, because ½|p^k|² indicates how much p^k differs from the null vector and α̃_p^k measures the distance from p^k to ∂f(x^k) (x^k is stationary if 0 ∈ ∂f(x^k)). The estimates (3.6), which follow from (4.5c) and (4.7a), show that x^k is approximately optimal when the value of w^k is small.
Lemma 4.5. If Algorithm 3.1 terminates at the k-th iteration, then x^k is a minimum point of f.
With m > 0 and t_L^k ≥ 0, the fact that v^k ≤ −w^k < 0 (see (4.7c)) yields that the sequence {f(x^k)} is nonincreasing.
The next result states a fundamental property of the stationarity measures {w^k}.
Lemma 4.6. Suppose that there exist an infinite set K ⊂ {1,2,...} and a point x̄ ∈ R^N satisfying x^k →_K x̄ and w^k →_K 0. Then x̄ minimizes f.
f(x^1) − f(x^k) ≥ m Σ_{i=1}^{k-1} t_L^i (−v^i).
Lemma 4.8. Suppose that there exist an infinite set L ⊂ {1,2,...} and a point x̄ ∈ R^N such that x^{k(l)} → x̄ as l→∞, l ∈ L. Then x̄ is a minimum point of f.
Proof. Let K = {k(l+1)−1 : l ∈ L}. Observe that the line search rules imply t_L^k = 1 for all k ∈ K, while (4.10) yields
x^k →_K x̄, (4.11a)
w^k →_K 0. (4.11b)
Σ_{j∈J^k} λ_j α_j^k + λ_p α̃_p^k = f(x^k) − {Σ_{j∈J^k} λ_j f_j^k + λ_p f̃_p^k}
from (4.4a) and (4.4b), which proves the equivalence of (3.1) and (4.12). Since λ_j^k, j ∈ J^k, and λ_p^k solve (3.1), the optimal value of (4.12) is
½|p^k|² + f(x^k) − f̃_p^k = w^k.
w̄ = ½|p̄|² + ᾱ_p. (4.13b)
Then
w̄ ≤ φ_C(w), (4.15)
where
From (4.13a,c,d),
= f(y^k) − f(x^k).
This follows from the fact that α̃_p^k = f(x^k) − f̃_p^k = f(x^{k-1}) − f̃_p^{k-1} if x^k = x^{k-1}, see (4.4) and (2.38). Since for each ν ∈ [0,1] the multipliers (4.23) are feasible for (4.12), we deduce from (4.24) that w^k, the optimal value of (4.12), cannot exceed the optimal value of the following problem:
Proof. Suppose that t_L^k = 0 for all k ≥ k̄ and some fixed k̄. From Lemma 4.11, we have
In particular ½|p^k|² + α̃_p^k = w^k ≤ w^{k̄} for all k ≥ k̄, hence there exists a constant C_1 < ∞ satisfying
max{|p^{k-1}|, |g^k|, α̃_p^{k-1}, 1} ≤ C_1 for all k ≥ k̄,
where C = max{C_1, C_2}. Thus (4.22) holds for c^k = C and all k ≥ k̄.
Combining Lemma 4.8 with Lemma 4.12 and using (4.10), we obtain
Proof. From (4.5c), 0 ≥ f(x̂) − f(x^k) ≥ ⟨p^k, x̂ − x^k⟩ − α̃_p^k, hence
Since we always have x^{k+1} − x^k = t_L^k d^k = −t_L^k p^k and t_L^k ≥ 0, (4.28) implies
−⟨x̂ − x^k, x^{k+1} − x^k⟩ ≤ t_L^k α̃_p^k. Therefore
Σ_{k=1}^∞ {|x^{k+1} − x^k|² + 2 t_L^k α̃_p^k} < ∞.
Proof. If x̂ ∈ X then f(x̂) ≤ f(x^k) for all k, hence Lemma 4.14 implies the boundedness of {x^k}. By Theorem 4.13, {x^k} has an accumulation point x̄ ∈ X. It remains to show that x^k → x̄. Take any δ > 0. Since f(x̄) ≤ f(x^k) for all k, Lemma 4.14 implies that there exists a number n_1 such that
Σ_{j∈J^k} λ_j(0) α_j^k + λ_p(0) α̃_p^k = α̃_p^k,
see (4.23) and (4.24), and that
Corollary 4.18. Suppose that inf{f(x) : x ∈ R^N} > −∞. Then Algorithm 3.1 terminates if its final accuracy tolerance ε_s is positive.
Step 1' (Direction finding). Find multipliers λ_j^k, j ∈ J^k, that solve the k-th dual search direction finding subproblem (2.22), and a set
Ĵ^k = {j ∈ J^k : λ_j^k > 0} satisfying |Ĵ^k| ≤ N+1.
Calculate the aggregate subgradient (p^k, f̃_p^k) by (2.27). Set d^k = −p^k and v^k = −{|p^k|² + f(x^k) − f̃_p^k}.
Lemma 5.4. At the k-th iteration of Algorithm 5.1, k ≥ 1, w^k is the optimal value of the following problem:
minimize ½|Σ_{j∈J^k} λ_j g^j|² + Σ_{j∈J^k} λ_j α_j^k, (5.2)
subject to λ_j ≥ 0, j ∈ J^k, Σ_{j∈J^k} λ_j = 1,
for each ν ∈ [0,1], and note that (2.21a), (2.23a), (2.27), (2.32) and (2.38) imply
f̃_p^k = f̃_p^{k-1}, (5.4a)
p^k = p^{k-1}. (5.4b)
Remark 5.5. It should be clear by now that the above approach to convergence analysis can be applied to methods that use more subgradients for search direction finding, cf. Remark 2.5 and Remark 5.3. For instance, if the sets Ĵ^k are chosen subject to the requirement (5.1), then one may replace (5.3) with the following definition
and define the reduced polyhedral approximation to f at x^k:
d^k = argmin_d {f̂^k(x^k+d) + ½|d|²}, (5.10a)
i.e. y^{k+1} is any point satisfying
see (5.8a). We shall now show that under certain conditions also
then
y^{k+1} = Pr_{X_s^k} x^k = Pr_{X_r^k} x^k, (5.15a)
d^k = Pr_{D^k} 0, (5.15b)
argmin{u^k + ½|y − x^k|² : y ∈ X_s^k} = argmin{|y − x^k| : y ∈ X_s^k} = Pr_{X_s^k} x^k.
Since
we similarly deduce that y^{k+1} ∈ X_r^k and y^{k+1} = Pr_{X_r^k} x^k. Then (5.15b) follows from (5.15a), (5.8) and (5.9c). □
To interpret the above result, consider the following condition:
x^k → x̄ as k→∞, (6.6)
hence α_j^k = f(x^k) − f_j(x^k) can be expressed as
and we have
and note that each I^k has at most N+1 elements, since so has Ĵ^k. Asymptotic properties of {I^k} are described in
Proof. Let i ∈ Ī be fixed and let K_i = {k : i ∈ I^k}. Then (6.2), (6.10), (5.13b) and (5.13d) imply
f̂_s^{k+1}(x) = max{f_i(x) : i ∈ I^k ∪ {i(k+1)}}, (6.17b)
for any k. This follows from (5.7), (6.5), (6.13) and the fact that we always have J^{k+1} = Ĵ^k ∪ {k+1}. Combining Lemma 6.2 with Lemma 5.6, we get
Corollary 6.3. Relations (5.15) hold for all k ≥ n̄, i.e. for sufficiently large k the point y^{k+1} = x^k + d^k is the nearest point to x^k that minimizes the functions f̂_s^k and f̂_r^k given by (6.17). Moreover,
f̂_s^k(y^{k+1}) = f̂_r^k(y^{k+1}) = f(x^k) + v^k for all k ≥ n̄, (6.18)
f̂_r^k(y^{k+1}) ≤ f̂_r^k(x) ≤ f(x) for all x and k ≥ n̄. (6.19)
Proof. If x^k = x̄ for all k large enough, then Lemma 6.1 implies that I^k ⊂ I(x̄) for such k, hence (6.10), (6.11) and (6.13) yield
w^{k+1} ≤ min{½|Σ_{j∈Ĵ^k} λ_j g^j|² : λ_j ≥ 0, j ∈ Ĵ^k, Σ_{j∈Ĵ^k} λ_j = 1}
for large k. In view of Lemma 6.2, the right side of the above inequality is equal to zero for large k, hence the algorithm must stop owing to w^k = 0 for some k. □
i(k+1) ∈ I^k. (6.23)
Proof. Suppose that f(y^{k+1}) ≤ f(x^k) + v^k for some k ≥ n̄. By (6.20), we have f(y^{k+1}) = min f. On the other hand, the line search rules yield x^{k+1} = y^{k+1}, hence f(x^{k+1}) = min f. The next serious step must decrease the objective value, which contradicts f(x^{k+1}) = min f. Therefore we have f(y^{k+1}) > f(x^k) + v^k for all k ≥ n̄, and (6.18) yields
x̄ ∈ Argmin f̂_s^k ⊂ Argmin f̂_r^k. (6.24)
Proof. By Lemma 6.1 and Lemma 6.2, we have I^k ⊂ I(x̄) and μ̄(I^k) = 0. Now I^k ⊂ I(x̄) and (6.17a) imply f̂_r^k(x̄) = f̄_i(x̄) = f(x̄) for all i ∈ I^k. Thus we have f̂_r^k(x̄) = f̄_i(x̄) for all i ∈ I^k and μ̄(I^k) = 0. Therefore x̄ ∈ Argmin f̂_r^k, cf. (6.16) and (6.17a), and f̂_r^k(x̄) = min{f̂_r^k(x) : x ∈ R^N} = f(x̄) = min f. Since f̄_i ≤ f for all i and f̂_r^k ≤ f, we obtain f(x̄) = f̂_r^k(x̄) ≤ f̂_s^k(x̄) ≤ f(x̄) and f̂_s^k(y^{k+1}) ≥ f̂_r^k(y^{k+1}) ≥ f̂_r^k(x̄). Combining this with (6.19), we obtain (6.24) and (6.25). If we had f(y^{k+1}) = min f for some k ≥ n̄, this would contradict (6.22) and (6.25). □
implies that Assumption 6.8 is satisfied. This follows from the fact that (6.27) and μ̄(I) = 0 yield rank{a_i : i ∈ I} = N.
Lemma 6.9. Suppose that Assumption 6.8(ii) is satisfied. Then for all k ≥ n̄ one has
y^{k+1} → x̄, (6.28b)
Proof. Let k ≥ n̄ be fixed, so that I^k ⊂ I(x̄) and μ̄(I^k) = 0. Let b̄_i = b_i + min f for all i ∈ I. From Lemma 6.7 and Corollary 6.3 we deduce that
Thus we may use Assumption 6.8 with I = I^k. We have two cases:
(i) If Assumption 6.8(i) is satisfied then (6.19) yields y^{k+1} ∈ Argmin f.
(ii) If Assumption 6.8(ii) holds then Argmin f̂_r^k is given by (6.29a).
On the other hand, (6.17), Corollary 6.3 and Lemma 6.7 imply
Remark 6.11. Powell's polyhedral example (6.1) satisfies the Haar condition and has a unique solution x̄ = (0,0)^T. Algorithm 5.1 finds x̄ in a finite number of iterations and terminates from any starting point, using at most 3 subgradients for each search direction finding. Note that |I(x̄)| = 5.
7. Line Search Modifications
x^{k+1} = x^k + t_L^k d^k and y^{k+1} = x^k + t_R^k d^k for k=1,2,...
from the starting point x^1 = y^1. At the k-th iteration the objective function was evaluated only at y^{k+1} = x^k + d^k, and a serious step was taken if f(x^k + d^k) ≤ f(x^k) + m v^k. The requirement t_L^k = 1 for a serious step may result in too many null steps. Therefore, following Lemarechal (1978), we
Step 3' (Line search). Select an auxiliary stepsize t_R^k ∈ [t̄, 1] and set y^{k+1} = x^k + t_R^k d^k. If (7.1) holds, then set t_L^k = t_R^k (a serious step), set k(l+1) = k+1 and increase l by 1; otherwise, i.e. if (7.1) is violated, set t_L^k = 0 (a null step).
Observe that if t̄ = 1 then Step 3' reduces to Step 3. Also one may use t_R^k = 1 as before. When t̄ < 1, the search for a suitable value of t_R^k ∈ [t̄, 1] may use geometric contraction, as described in Section 2, or interpolation based upon the values of f(x^k + t d^k) and ⟨g_f(x^k + t d^k), d^k⟩ for trial values of t > 0, and f(x^k) and the approximate derivative v^k of f at x^k, corresponding to t=0. Many efficient procedures for executing Step 3' can be designed, see (Lemarechal, 1978 and 1981; Wierzbicki, 1978b; Wolfe, 1975 and 1978). The results of Section 6 indicate that an efficient line search procedure should try a unit stepsize in the neighborhood of a solution.
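As one concrete possibility (ours, not the book's procedure), a geometric-contraction implementation of Step 3' might look as follows; x and d are numpy arrays and v < 0 is the predicted descent:

```python
def two_point_search(f, x, d, v, m=0.1, t_min=0.01, shrink=0.5):
    """Sketch of Step 3' by geometric contraction: seek t_L >= t_min with
    f(x + t_L d) <= f(x) + m t_L v (serious step); otherwise return
    t_L = 0 and an auxiliary t_R in [t_min, 1] for the null step."""
    t = 1.0
    fx = f(x)
    while t >= t_min:
        if f(x + t * d) <= fx + m * t * v:
            return t, t                   # serious step: t_L = t_R = t
        t *= shrink
    return 0.0, max(t / shrink, t_min)    # null step: t_R = last trial
```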
We shall now indicate the modifications necessary for the results of Section 4 and Section 5 to hold also for the algorithms with Step 3'. In the proof of Lemma 4.8 observe that t_L^k ≥ t̄ > 0 for all k ∈ K. Part (i) of the proof of Lemma 4.11 may be substituted by the following result, which is due to Lemarechal (1978).
Lemma 7.1. Suppose that a point y = x^k + t d^k satisfies f(y) > f(x^k) + m t v^k for some t ∈ (0,1]. Let g = g_f(y) ∈ ∂f(y) and α = f(x^k) − {f(y) + ⟨g, x^k − y⟩}. Then
−α + ⟨g, d^k⟩ > m v^k.
t_L^k = 0 if t_L^k < t̄. (7.2b)
CHAPTER 3
… Functions
1. Introduction
2. Derivation of the Methods
x^{k+1} = x^k + t_L^k d^k for k=1,2,...,
y^{k+1} = x^k + t_R^k d^k for k=1,2,..., y^1 = x^1,
where the auxiliary stepsizes t_R^k > 0 satisfy t_R^k ≥ t_L^k for all k. The two-point line search will detect discontinuities in the gradient of f. The algorithms evaluate subgradients g_f(y^{k+1}) and linearization errors of the form
f(x) − f̄(x;y),
because we have
f(x) − f̄(x;y) ≥ 0 for all x and y when f is convex,
cf. Lemma 2.4.2. Thus one may have g_f(y) ∈ ∂f(x) even when y is far from x, provided that the linearization error vanishes. This is no longer true when f is nonconvex; in particular, we may have f(x) − f̄(x;y) < 0. For this reason, Mifflin (1982) introduced measures of the form
α(x,y) = max{f(x) − f̄(x;y), γ|x−y|²}, (2.3)
with the convention that γ = 0 if f is convex, and γ > 0 in the nonconvex case. Of course, (2.3) and (2.4) are equivalent in the convex case, since then the linearization error is nonnegative, while our definition (2.4) puts more stress on the value of the linearization error for nonconvex f. This will allow for choosing a small value of the distance measure parameter γ.
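In code, the locality measure (2.4) is a one-liner; a sketch with our own argument names:

```python
def locality_measure(fx, f_lin, dist, gamma):
    """Sketch of the subgradient locality measure (2.4):
    alpha = max{ |f(x) - fbar(x;y)|, gamma * |x - y|^2 },
    with gamma > 0 for nonconvex f and gamma = 0 in the convex case
    (then alpha reduces to the linearization error)."""
    return max(abs(fx - f_lin), gamma * dist ** 2)
```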
The algorithm of Mifflin (1982) uses for search direction finding at the k-th iteration the following polyhedral approximation to f at x^k:
minimize f̂_M^k(x^k+d) + ½|d|² over all d ∈ R^N, (2.6)
and
v̂^k = f̂_M^k(x^k + d^k) − f(x^k) < 0. (2.10)
One can relate the rules (2.9) to the line search criteria (2.7.2) discussed in Section 2.7. We shall return to this subject in the next section and in Section 6.
The Mifflin algorithm requires the storage of points y^j for calculating the distances |x^k − y^j| involved in α_M(x^k, y^j). This can be avoided by using the following upper estimates of |x^k − y^j|:
s_j^k = |x^j − y^j| + Σ_{i=j}^{k-1} |x^{i+1} − x^i| for j < k, s_k^k = |x^k − y^k|. (2.11)
f_j^k = f̄_j(x^k) = f̄(x^k; y^j), (2.13)
(g^j, f_j^k, s_j^k) ∈ R^N × R × R,
minimize ½|d|² + u over (d,u) ∈ R^N × R, (2.15)
subject to f(x^k) − α_j^k + ⟨g^j, d⟩ ≤ u, j ∈ J^k,
since
minimize f̂^k(x^k+d) + ½|d|² over all d ∈ R^N, (2.16)
f̂^k(x^k + d^k) = u^k. (2.18)
Moreover, letting
we see that (d^k, v̂^k) is a solution to the following problem:
minimize ½|d|² + v̂ over (d,v̂) ∈ R^{N+1}, (2.20)
subject to −α_j^k + ⟨g^j, d⟩ ≤ v̂, j ∈ J^k.
f(x) ≥ f̃^{k-1}(x),
f̂^k(x) = max{f̃^{k-1}(x), f̄_j(x) : j ∈ J^k} for all x (2.23)
is a lower approximation to f,
(p^{k-1}, f_p^k, s_p^k) = Σ_{j=1}^{k-1} λ̃_j^{k-1} (g^j, f_j^k, s_j^k), (2.26a)
λ̃_j^{k-1} ≥ 0 for j=1,...,k−1, Σ_{j=1}^{k-1} λ̃_j^{k-1} = 1. (2.26b)
The value of α_p^k indicates how far p^{k-1} is from ∂f(x^k). Indeed, for convex f we have (γ=0)
α_p^k = f(x^k) − f_p^k
by (2.25), while relation (2.26) implies
s_p^k = Σ_{j=1}^{k-1} λ̃_j^{k-1} s_j^k.
If s_p^k is small, then (2.26b) and the fact that s_j^k ≥ |x^k − y^j| imply that the value of λ̃_j^{k-1} must be small if y^j is far from x^k, i.e. only local subgradients g^j = g_f(y^j) with small values of s_j^k contribute significantly to p^{k-1}. Therefore, in this case p^{k-1} is close to ∂f(x^k) by the local upper semicontinuity of ∂f.
and use it for finding the k-th search direction d^k that solves the problem
minimize ½|d|² + v̂ over (d,v̂) ∈ R^{N+1}, (2.30)
subject to −α_j^k + ⟨g^j, d⟩ ≤ v̂, j ∈ J^k,
−α_p^k + ⟨p^{k-1}, d⟩ ≤ v̂.
v̂^k = f̂^k(x^k + d^k) − f(x^k), (2.31)
(p^k, f̃_p^k, s̃_p^k) = Σ_{j∈J^k} λ_j^k (g^j, f_j^k, s_j^k) + λ_p^k (p^{k-1}, f_p^k, s_p^k). (2.32)
Since
(p^k, f̃_p^k, s̃_p^k) ∈ conv{(g^j, f_j^k, s_j^k) : j=1,...,k}, (2.33)
we shall set
s_p^{k+1} = s̃_p^k + |x^{k+1} − x^k|.
It is easy to check (see Lemma 4.1) that the above updating formulae and relation (2.33) yield
Comparing the above relation with (2.26), we conclude that we have completed the recursion without using the subgradients (g^j, f_j^k, s_j^k) for j ∈ {1,...,k} \ J^k. Consequently, these subgradients need not be stored.
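The recursion is cheap to implement; a sketch with our own naming, where each stored s_j upper-bounds |x^k − y^j| and s_p is the aggregate estimate:

```python
import numpy as np

def update_distances(s, x_new, x_old, s_p):
    """Sketch of the recursion (2.11)-(2.12): every distance estimate and
    the aggregate estimate grow by the length of the latest step, so the
    trial points themselves need not be stored."""
    step = np.linalg.norm(x_new - x_old)
    return [sj + step for sj in s], s_p + step
```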
In order to be able to use the above notation for k=1, we shall initialize the method by setting
|J^{k+1}| ≤ M_g − 1,
e.g.
J^{k+1} = {k+1} ∪ Ĵ^k, |Ĵ^k| ≤ M_g − 1,
Ĵ^k = {j ∈ J^k : λ_j^k ≠ 0}, |Ĵ^k| ≤ N+1,
where λ_j^k, j ∈ J^k, are Lagrange multipliers of (2.20), which can be computed as shown in Lemma 2.2.1 (see also Remark 2.5.2). Then the aggregate subgradient (p^k, f̃_p^k, s̃_p^k) is calculated according to (2.32), but with λ_p^k = 0. The next subgradient index set J^{k+1} is of the form
J^{k+1} = Ĵ^k ∪ {k+1},
so that only local subgradients indexed by j ∈ Ĵ^k are used for the k-th search direction finding.
Remark 2.1. As noted above, the use of distance measures s_j^k for estimating |x^k − y^j| enables us to dispense with storing the trial points y^j. Still, for theoretical reasons, one may consider the following version of the method with subgradient selection. At the k-th iteration, let (d^k, v̂^k) denote the solution to the following quadratic programming problem:
minimize ½|d|² + v̂ over (d,v̂) ∈ R^{N+1}, (2.34)
subject to −α(x^k, y^j) + ⟨g^j, d⟩ ≤ v̂, j ∈ J^k.
Then this version will additionally need to store at most N+2 points {y^j}_{j∈J^k} for calculating the locality measures α(x^k, y^j) for all k. In this case the locality radius a^{k+1} can be computed directly by setting
and the set J^{k+1} should be reduced, if necessary, so that a^{k+1} ≤ ā. The subsequent convergence results remain valid for this version of the method. However, we do not think that this version should be more efficient.
Algorithm 3.1.
Step 0 (Initialization). Select the starting point x^1 ∈ R^N and a final accuracy parameter ε_s ≥ 0. Choose fixed positive line search parameters m_L, m_R, ā and t̄ with t̄ ≤ 1 and 0 < m_L < m_R < 1, and a distance measure parameter γ > 0 (γ = 0 if f is convex). Set
y^1 = x^1, p^0 = g^1 = g_f(y^1), f_p^1 = f_1^1 = f(y^1), s_p^1 = s_1^1 = 0, J^1 = {1}.
Set a^1 = 0 and the reset indicator r_a^1 = 1. Set the counter k=1.
Step 1 (Direction finding). Find the solution (d^k, v̂^k) to the following k-th quadratic programming problem:
minimize ½|d|² + v̂ over (d,v̂) ∈ R^{N+1},
subject to −α_j^k + ⟨g^j, d⟩ ≤ v̂, j ∈ J^k, (3.1)
where
α_p^k = max{|f(x^k) − f_p^k|, γ(s_p^k)²}. (3.3)
Compute Lagrange multipliers λ_j^k, j ∈ J^k, and λ_p^k of (3.1), setting λ_p^k = 0 if r_a^k = 1.
Set
(p^k, f̃_p^k, s̃_p^k) = Σ_{j∈J^k} λ_j^k (g^j, f_j^k, s_j^k) + λ_p^k (p^{k-1}, f_p^k, s_p^k), (3.4)
v^k = −{|p^k|² + α̃_p^k}. (3.6)
If λ_p^k = 0 set
w^k = ½|p^k|² + α̃_p^k. (3.8)
If w^k ≤ ε_s then terminate. Otherwise, go to Step 3.
Step 3 (Line search). By a line search procedure as given below, find two stepsizes t_L^k and t_R^k such that 0 ≤ t_L^k ≤ t_R^k and such that the two corresponding points defined by
x^{k+1} = x^k + t_L^k d^k and y^{k+1} = x^k + t_R^k d^k
satisfy
f(x^{k+1}) ≤ f(x^k) + m_L t_L^k v^k, (3.9)
t_R^k = t_L^k if t_L^k ≥ t̄, (3.10a)
−α(x^{k+1}, y^{k+1}) + ⟨g_f(y^{k+1}), d^k⟩ ≥ m_R v^k if t_L^k < t̄. (3.10b)
λ_p^k = 0 if r_a^k = 1,
d^k = −p^k, (3.18)
v̂^k = −{|p^k|² + Σ_{j∈J^k} λ_j^k α_j^k + λ_p^k α_p^k}, (3.19)
v̂^k = f̂^k(x^k + d^k) − f(x^k) ≤ v^k < 0. (3.22)
g^{k+1} = g_f(y^{k+1}), with y^{k+1} and x^k lying on opposite sides of a discontinuity of the gradient of f, will force a significant modification of the next search direction finding subproblem. The criterion (3.11), which is related to the distance resetting test, prevents the algorithm from collecting irrelevant subgradient information.
Clearly, the line search rules (3.9)-(3.11) are so general that one can devise many procedures for implementing Step 3, see (Lemarechal, 1981; Mifflin, 1977b and 1982; Wierzbicki, 1982; Wolfe, 1978). For completeness, we give below a procedure for finding stepsizes t_L = t_L^k and t_R = t_R^k, which is based on the ideas of Mifflin (1977b and 1982). In this procedure ζ is a fixed parameter satisfying ζ ∈ (0, 0.5), x = x^k, d = d^k and v = v^k.
Lemma 3.3. If f has the property (3.23) then Line Search Procedure 3.2 terminates with t_L^k = t_L and t_R^k = t_R satisfying (3.9)-(3.11).
Proof. Assume, for contradiction purposes, that the search does not terminate. We recall that the line search is entered with 0 < m_L < m_R < 1. Since the trial stepsizes converge, with t_L^i ↑ t̂ and t_u^i ↓ t̂ for some t̂, there exists an infinite set I ⊂ {1,2,...} such that t^i > t̂ for i ∈ I and
By (3.24), we have
hence
where ḡ^i = g_f(x + t^i d) for all i. But
ζ(t^i − t_L^i)²|d|² → 0,
because t_L^i → t̂, t^i → t̂, f is continuous and the subgradient mapping g_f is locally bounded (see Lemma 1.2.2). Therefore
f(x + t^i d) − t^i ⟨ḡ^i, d⟩ converges, and hence
Remark 3.5. One may choose trial stepsizes t in step (v) of Line Search Procedure 3.2 as follows. If on entering step (v) of the procedure we have t_L = 0, which means that t = t_u > 0, then one may take
t = max{ζ t_u, θ}, (3.28a)
where θ > 0 is a fixed threshold,
t ≤ max{ζ, 0.5/(1−m_L)} t_u,
or use bisection
t = (t_L + t_u)/2,
or geometric bisection
t = (t_L t_u)^{1/2}.
To see this, note that J^1 = {1} and that in Step 4 the index k+1 is the largest in J^{k+1} and s_{k+1}^{k+1} = |y^{k+1} − x^{k+1}| ≤ ā/2 owing to (3.14b) and (3.11). Therefore k+1 cannot be deleted from J^{k+1} in Step 6.
4. Convergence
J_r^k = J^{k_r(k)} ∪ {j : k_r(k) < j ≤ k} (4.1b)
for all k.
The following lemma shows that the aggregate subgradient is a convex combination of the subgradients retained at the latest reset and the subgradients calculated after the latest reset.
Lemma 4.1. Suppose k ≥ 1 is such that Algorithm 3.1 did not stop before the k-th iteration. Then there exist numbers λ̂_j^k, j ∈ Ĵ_p^k, satisfying
(p^k, f̃_p^k, s̃_p^k) = Σ_{j∈Ĵ_p^k} λ̂_j^k (g^j, f_j^k, s_j^k), (4.3a)
λ̂_j^k ≥ 0, j ∈ Ĵ_p^k, Σ_{j∈Ĵ_p^k} λ̂_j^k = 1. (4.3b)
Moreover,
a^k = max{s_j^k : j ∈ Ĵ_p^k}. (4.4)
Proof. (i) It follows from (4.2) and the rules of the algorithm that we always have J^{k+1} ⊂ J^k ∪ {k+1} and J^k ⊂ Ĵ_p^k. Therefore, in view of (3.4) and (3.20), we can define additional multipliers
λ_j^k = 0 for j ∈ Ĵ_p^k \ J^k,
so that
(p^k, f̃_p^k, s̃_p^k) = Σ_{j∈Ĵ_p^k} λ_j^k (g^j, f_j^k, s_j^k) + λ_p^k (p^{k-1}, f_p^k, s_p^k) (4.6a)
for any k. Suppose that λ_p^k = 0 for some k ≥ 1. Then Ĵ_p^k = J^k by (4.2), and (4.3) follows from (4.6) if one sets λ̂_j^k = λ_j^k for all j ∈ Ĵ_p^k = J^k. Also (4.4) is implied by (3.7) if λ_p^k = 0. Observe that λ_p^1 = 0, since r_a^1 = 1. Hence, to prove that relations (4.3)-(4.4) are valid for any k, it suffices to show that if they hold for some fixed k and λ_p^{k+1} > 0, then they are true also for k increased by 1. Therefore, suppose that (4.3) and (4.4) are satisfied for some k=n and λ_p^{k+1} > 0. By (4.2), we have Ĵ_p^{k+1} = Ĵ_p^k ∪ {k+1}. Let
λ̂_{k+1}^{k+1} = λ_{k+1}^{k+1}, λ̂_j^{k+1} = λ_j^{k+1} + λ_p^{k+1} λ̂_j^k for j ∈ Ĵ_p^k.
Then
Σ_{j∈Ĵ_p^{k+1}} λ̂_j^{k+1} = Σ_{j∈J^{k+1}} λ_j^{k+1} + λ_p^{k+1} = 1,
which yields (4.3b) for k increased by 1. From (4.6), (4.7), (4.3) and the fact that Ĵ_p^{k+1} = Ĵ_p^k ∪ {k+1} we obtain
p^{k+1} = Σ_{j∈J^{k+1}} λ_j^{k+1} g^j + λ_p^{k+1} p^k = λ_{k+1}^{k+1} g^{k+1} + Σ_{j∈Ĵ_p^k} (λ_j^{k+1} + λ_p^{k+1} λ̂_j^k) g^j = Σ_{j∈Ĵ_p^{k+1}} λ̂_j^{k+1} g^j,
and similarly for f_p^{k+1} and s_p^{k+1} from (3.14). This yields (4.3a) for k=n+1. Next, since λ_p^{k+1} > 0 by assumption, the rules of the algorithm imply that r_a^{k+1} = 0, and so a^{k+1} is computed by (3.15). Combining this with (4.4) and the fact that s_j^{k+1} = s_j^k + |x^{k+1} − x^k| for all j ∈ Ĵ_p^k and that Ĵ_p^{k+1} = Ĵ_p^k ∪ {k+1}, we obtain (4.4) for k increased by 1.
Lemma 4.2. Suppose that k ≥ 1 is such that Algorithm 3.1 did not stop before the k-th iteration, and that f is convex. Then
Proof. As in the proof of Lemma 2.4.2, use (4.3) and the fact that α_p^k = |f(x^k) − f_p^k| if f is convex, since γ = 0 in the convex case.
Our next result states that in fact the aggregate subgradient can be expressed as a convex combination of N+3 (not necessarily different) past subgradients calculated at points whose distances from the current point do not exceed the threshold value ā.
Lemma 4.3. Suppose k ≥ 1 is such that Algorithm 3.1 did not stop before the k-th iteration, and let M = N+3. Then there exist numbers λ̂_i^k and vectors (y^{k,i}, f^{k,i}, s^{k,i}) ∈ R^N × R × R, i=1,...,M, satisfying
(p^k, f̃_p^k, s̃_p^k) = Σ_{i=1}^M λ̂_i^k (g_f(y^{k,i}), f^{k,i}, s^{k,i}), (4.9a)
λ̂_i^k ≥ 0, i=1,...,M, Σ_{i=1}^M λ̂_i^k = 1, (4.9b)
(g_f(y^{k,i}), f^{k,i}, s^{k,i}) ∈ {(g_f(y^j), f_j^k, s_j^k) : j ∈ Ĵ_p^k}, i=1,...,M. (4.9c)
Lemma 4.4. Let x̄ ∈ R^N be given and suppose that the following hypothesis is fulfilled:
there exist N-vectors p̄, ȳ^i, ḡ^i for i=1,...,M, M=N+3, and numbers f̄_p, s̄_p, λ̄_i, f̄^i, s̄^i, satisfying
(p̄, f̄_p, s̄_p) = Σ_{i=1}^M λ̄_i (ḡ^i, f̄^i, s̄^i), (4.10a)
λ̄_i ≥ 0, i=1,...,M, Σ_{i=1}^M λ̄_i = 1, (4.10b)
ḡ^i ∈ ∂f(ȳ^i), i=1,...,M, (4.10c)
γ s̄_p = 0. (4.10g)
Proof. (i) First, suppose that γ > 0. Let I = {i : λ̄_i ≠ 0}. By (4.10g), s̄_p = 0, hence (4.10a,b) and (4.10e) imply ȳ^i = x̄ for all i ∈ I, so (4.10c) yields ḡ^i ∈ ∂f(x̄) for all i ∈ I. Thus we have p̄ = Σ_{i∈I} λ̄_i ḡ^i, λ̄_i > 0 for i ∈ I, Σ_{i∈I} λ̄_i = 1 and ḡ^i ∈ ∂f(x̄), i ∈ I, so p̄ ∈ ∂f(x̄) by the convexity of ∂f(x̄).
(ii) Next, suppose that γ = 0. Then f is convex and (4.10c) and (4.10d) give
Lemma 4.5. If Algorithm 3.1 terminates at the k-th iteration, k ≥ 1, then the point x̄ = x^k is stationary for f.
Lemma 4.6. Suppose that there exist a point x̄ ∈ R^N and an infinite set K ⊂ {1,2,...} satisfying x^k →_K x̄. Then there exists an infinite set K̄ ⊂ K such that the hypothesis (4.10a)-(4.10e) is fulfilled at x̄ and
(p^k, f̃_p^k, s̃_p^k) →_{K̄} (p̄, f̄_p, s̄_p). (4.11)
Proof. (i) From (4.9d,e), the fact that M < ∞ and the assumption that x^k →_K x̄, we deduce the existence of points ȳ^i, i=1,...,M, and an infinite set K^1 ⊂ K, satisfying
y^{k,i} →_{K^1} ȳ^i for i=1,...,M, (4.12a)
g_f(y^{k,i}) →_{K^2} ḡ^i ∈ ∂f(ȳ^i) for i=1,...,M, (4.12b)
f^{k,i} →_{K^2} f̄^i = f(ȳ^i) + ⟨ḡ^i, x̄ − ȳ^i⟩ for i=1,...,M, (4.12c)
λ̂_i^k →_{K̄} λ̄_i for i=1,...,M, (4.12d)
or equivalently
Then 0 ∈ ∂f(x̄).
Proof. The equivalence of (4.13) and (4.14) follows from the fact that w^k is nonnegative for all k, since we always have w^k = ½|p^k|² + α̃_p^k and α̃_p^k ≥ 0. Thus (4.14) implies p^k →_K 0 and α̃_p^k →_K 0, so Lemma 4.6 yields the desired conclusion.
ŵ^k = ½|p^k|² + α̂_p^k, (4.15a)
where
α̂_p^k = Σ_{j∈J^k} λ_j^k α_j^k + λ_p^k α_p^k. (4.15b)
0 ≤ α̃_p^k ≤ α̂_p^k, (4.16)
v^k ≤ −w^k ≤ 0, (4.18)
v̂^k ≤ v^k. (4.19)
|f(x^k) − f̃_p^k| ≤ Σ_{j∈J^k} λ_j^k |f(x^k) − f_j^k| + λ_p^k |f(x^k) − f_p^k|, (4.20a)
γ(s̃_p^k)² ≤ Σ_{j∈J^k} λ_j^k γ(s_j^k)² + λ_p^k γ(s_p^k)², (4.20b)
for all k. Since the Lagrange multipliers λ_j^k and λ_p^k are nonnegative, we obtain from (4.20)
Σ_{j∈J^k} λ_j^k α_j^k + λ_p^k α_p^k ≥ α̃_p^k.
We conclude from the above lemma that in the convex case the variables involved in line searches and the search direction finding subproblems satisfy relations analogous to those developed for the algorithms in Chapter 2.
Returning to relation (3.22), we see that (3.22) follows from (4.18), (4.19) and the fact that w^k is always positive at line searches. Note that for nonconvex f our estimate v^k of the derivative of f at x^k in the direction d^k can be less optimistic than the primal estimate v̂^k, since v̂^k ≤ v^k for all k. Thus v^k is always negative, hence the criterion (3.9) with m_L > 0 and t_L^k ≥ 0 ensures that the sequence {f(x^k)} is nonincreasing and f(x^{k+1}) < f(x^k) if x^{k+1} ≠ x^k.
Consider the following condition for some fixed point x̄ ∈ R^N:
t_L^k v^k → 0 as k→∞. (4.23)
(ii) If (4.21) is fulfilled and there exist a number t̂ > 0 and an infinite set K̄ ⊂ K such that t_L^k ≥ t̂ for all k ∈ K̄, then (4.14) holds.
0 ≤ −t_L^k v^k ≤ [f(x^k) − f(x^{k+1})]/m_L for all k,
Observe that for a null step (t_L^k = 0) the above lemma reduces to Lemma 2.4.11, since then α_p^k = α̃_p^{k-1}. In this case w^k is a fraction of w^{k-1}. For short serious steps the rate of decrease of w^k depends on the value of |α_p^k − α̃_p^{k-1}| and the following properties of the function φ_C. Note that in fact C depends on the value of m_R ∈ (0,1), which is fixed in our analysis.
Lemma 4.12. For any ε_w > 0 and C > 0 there exist numbers ε_a = ε_a(ε_w, C) > 0 and N̄ = N̄(ε_w, C) ≥ 1 such that for any sequence of numbers {t^i} satisfying (4.29) one has t^i < ε_w for all i ≥ N̄.
Proof. For any ε > 0 define the number t(ε) by t(ε) = φ_C(t(ε)) + ε and observe that φ_C(t) + ε < t for any t > t(ε). Then it is easy to show that limsup_{i→∞} t^i ≤ t(ε) for any sequence {t^i} satisfying (4.29), because the function φ_C(·) + ε is continuous. Define the sequence t̄^1 = 4C², t̄^{i+1} = φ_C(t̄^i) + ε for i ≥ 1. Clearly, limsup t̄^i ≤ t(ε) and for any sequence
We conclude from Lemma 4.11 and Lemma 4.12 that w^k will become arbitrarily small, i.e. w^k < ε_w for any fixed ε_w > 0, provided that for sufficiently many N̄ = N̄(ε_w, C) consecutive iterations a local bound of the form (4.27) is valid, we have sufficiently small |α_p^k − α̃_p^{k-1}| ≤ ε_a and t_L^{k-1} < t̄, and no reset occurs. These properties will be established by the following four lemmas.
A locally uniform bound of the form (4.27) will result from the following lemma, which gives the reason for the line search rule (3.11).
max{|p^k|, α̃_p^k} ≤ max{(|g^k|² + 2α_k^k)^{1/2}, ½|g^k|² + α_k^k}. (4.30)
In Chapter 2 this case was equivalent to having t_L^k = 0 for all sufficiently large k, and was analyzed by showing that the optimal value of the dual search direction finding subproblem decreases after a null step owing to line search requirements of the form (3.10b). Proceeding along similar lines, we shall now show that the stationarity measure w^k decreases whenever the algorithm cannot obtain a significant improvement in the objective value, i.e. after a null step or a short serious step.
Lemma 4.11. Suppose that t_L^{k-1} < t̄ and r_a^k = 0 for some k > 1. Then
w^k ≤ φ_C(w^{k-1}) + |α_p^k − α̃_p^{k-1}|, (4.25)
where the function φ_C is defined (for the fixed value of the line search parameter m_R ∈ (0,1)) by
w^k ≤ min{½|(1−ν)p^{k-1} + ν g^k|² + (1−ν)α_p^k + ν α_k^k : ν ∈ [0,1]}
≤ min{½|(1−ν)p^{k-1} + ν g^k|² + (1−ν)α̃_p^{k-1} + ν α_k^k : ν ∈ [0,1]} + |α_p^k − α̃_p^{k-1}|.
Then one may use (4.28) and the various definitions of the algorithm to obtain the desired conclusion by invoking Lemma 2.4.10 as in the proof of Lemma 2.4.11.
Since k ∈ J^k, the above multipliers are feasible for the k-th dual subproblem (3.17). Therefore the optimal value ŵ^k of (3.17) satisfies
ŵ^k ≤ ½|g^k|² + α_k^k,
hence ½|p^k|² + α̃_p^k = w^k ≤ ½|g^k|² + α_k^k, and (4.30) follows.
(ii) We deduce from the local boundedness of ∂f (Lemma 1.2.1) that the mappings g_f(·) and α(·,·) are bounded on the bounded sets B and B × B, respectively. Therefore, the constants defined by (4.31) are finite. If |x̄ − x^k| ≤ ā then |x̄ − y^k| ≤ |x̄ − x^k| + |x^k − y^k| ≤ 2ā, because |y^k − x^k| ≤ ā by (3.11). Thus we have x^k ∈ B, y^k ∈ B, g^k = g_f(y^k) and α_k^k = α(x^k, y^k), hence (4.32) follows from (4.30) and (4.31).
Our next result will provide bounds on the term |α_p^k − α̃_p^{k-1}| involved in (4.25).
||f(x^{k+1}) − f_p^{k+1}| − |f(x^k) − f̃_p^k|| ≤ |f(x^{k+1}) − f_p^{k+1} − f(x^k) + f̃_p^k|
≤ |f(x^{k+1}) − f(x^k)| + |f_p^{k+1} − f̃_p^k|
≤ |f(x^{k+1}) − f(x^k)| + |⟨p^k, x^{k+1} − x^k⟩| (4.34a)
from (3.13c). Next, since t_L^k ≥ 0 and x^{k+1} = x^k + t_L^k d^k = x^k − t_L^k p^k (see (3.18)) for all k, we obtain from (3.6)
−t_L^k v^k ≥ t_L^k |p^k|² = ⟨x^k − x^{k+1}, p^k⟩ ≥ 0. (4.34b)
consecutive iterations.
Lemma 4.15. Suppose (4.21) and (4.24) hold. Then for any fixed integer m ≥ 0 there exists a number n̄_m such that for any integer n ∈ [0,m]:
x^{k+n} → x̄ as k→∞, k ∈ K̄, (4.35a)
t_L^{k+n} → 0 as k→∞, k ∈ K̄. (4.35c)
Moreover, for any numbers k̄, N̄ and ε > 0 there exists a number k̂ ≥ k̄, k̂ ∈ K̄, such that
Proof. (i) We shall first establish (4.35). For m=n=0, (4.35a) follows from our assumption (4.21). Suppose that (4.35a) holds for some fixed m=n ≥ 0. From (4.35a) and (4.24) we deduce the existence of a number n̄ such that
(4.35c) follows from (4.35a), (4.24) and Corollary 4.10. Using (4.35a) and Lemma 4.13, we deduce that |p^{k+n}| ≤ C for all k ∈ K̄. Then
||f(x^{k+1}) − f_p^{k+1}| − |f(x^k) − f̃_p^k|| ≤ ε/2 for all k ≥ k_1, (4.38a)
hence
γ(s̃_p^k)² ≤ γ(s_p^{k+1})² ≤ γ(s̃_p^k)² + |x^{k+1} − x^k|(2(γC)^{1/2} + γ|x^{k+1} − x^k|) (4.38b)
for any k ≥ k_1 such that |x̄ − x^k| ≤ ā. Using (4.32) and (4.39), we deduce from the first part of the lemma the existence of k̂ ≥ k_1 satisfying (4.36). □
Then (4.36b)-(4.36d), (4.41b), Lemma 4.11, Lemma 4.12 and our choice of ε_a and N̄ imply w^k < ε_w = ε̄/2 for some k ∈ [k̂, k̂+N̄], which contradicts (4.36a) and (4.41a). Consequently, we have shown that for any number k̄ satisfying (4.41a) we have
r_a^k = 1 for some k ∈ [k̄, k̄+N̄].
(iv) Letting k̄ = k̂, we obtain from part (iii) of the proof that r_a^{k_1} = 1 for some k_1 ∈ [k̂, k̂+N̄], and the rules of Step 5 and Step 6 yield
a^{k_1} ≤ ā/2. (4.42)
(vi) Since k̄ = k_1 + 1 satisfies (4.41a), from part (iii) of the proof we deduce a contradiction with (4.44). Therefore, (4.13) must hold.
Proof. One can check that Lemma 2.4.7 and Lemma 2.4.14 hold also for Algorithm 3.1 in the convex case. Then Theorem 4.17 and the proofs of Theorem 2.4.15 and Theorem 2.4.16 yield the desired result.

Proof. If the assertion were false, then the infinite sequence {x^k} ⊂ S would have an accumulation point, say x̄. Then Lemma 4.16 would yield (4.14), and the algorithm would stop owing to w^k ≤ ε_s for large k.
In this section we state in detail and analyze the method for nonconvex minimization that uses subgradient selection in the way described in Section 2.
Algorithm 5.1.
where

α_j^k = max{|f(x^k) − f_j^k|, γ(s_j^k)²} for j ∈ J^k,   (5.2)

Ĵ^k = {j ∈ J^k : λ_j^k ≠ 0},   (5.3a)

α̂_p^k = Σ_{j∈Ĵ^k} λ_j^k α_j^k,   (5.4)

t_R^k = t_L^k if t_L^k ≥ t̃,   (5.6b)

J^{k+1} = Ĵ^k ∪ {k+1}.   (5.7)

Set g^{k+1} = g_f(y^{k+1}) and

a^{k+1} = max{s_j^{k+1} : j ∈ J^{k+1}}.   (5.9)

If a^{k+1} ≤ ā then set r_a^{k+1} = 0 and go to Step 7. Otherwise, set r_a^{k+1} = 1 and go to Step 6.

d^k = −p^k,   (5.12)

v̂^k = −{|p^k|² + Σ_{j∈Ĵ^k} λ_j^k α_j^k},   (5.13)

where

p^k = Σ_{j∈Ĵ^k} λ_j^k g^j.   (5.14)

Moreover, any Lagrange multipliers of (5.1) also solve (5.11). In particular, we have

w^k = ½|p^k|² + α̃_p^k,   (5.19)

ŵ^k = ½|p^k|² + α̂_p^k,   (5.20)

hence one can use the proof of Lemma 4.8 and the convention (5.16) to show that

α̃_p^k ≤ α̂_p^k,   (5.21)

w^k ≤ ŵ^k,   (5.22)

v̂^k = −{|p^k|² + α̂_p^k},   (5.24)

v̂^k ≤ v^k.   (5.25)

Also v̂^k ≤ −ŵ^k from (5.19) and (5.24), so the line search is always entered with negative v̂^k. As observed in Section 2,

v̂^k = f̂^k(x^k + d^k) − f(x^k)

is an approximation to the derivative of f at x^k in the direction d^k. Thus, except for the difference in the values of v^k and v̂^k, the line search criteria (5.6) may be interpreted essentially as in Section 3. We may add that for implementing Step 3 of Algorithm 5.1 one can use Line Search Procedure 3.2 with v^k replaced by v̂^k.
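For readers who wish to experiment with the direction finding step, the following sketch (ours, in Python; all identifiers are illustrative and not from the text) solves the dual subproblem behind (5.12)-(5.14): minimize ½|Σ_j λ_j g^j|² + Σ_j λ_j α_j over the unit simplex, then recover p^k, d^k = −p^k and v̂^k. A production code would use a dedicated QP solver; here a simple projected gradient iteration suffices for small bundles.

```python
# Minimal sketch of the dual search direction subproblem (assumptions ours).
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {lam >= 0, sum(lam) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def direction_finding(G, alpha, iters=500):
    """G: rows are stored subgradients g^j; alpha: locality measures alpha_j^k."""
    m = G.shape[0]
    lam = np.full(m, 1.0 / m)
    step = 1.0 / (np.linalg.norm(G, 2) ** 2 + 1.0)   # safe step for the smooth QP
    for _ in range(iters):
        p = G.T @ lam
        lam = project_simplex(lam - step * (G @ p + alpha))
    p = G.T @ lam
    v_hat = -(p @ p + lam @ alpha)                   # cf. (5.13)-(5.14)
    return -p, v_hat, lam                            # d^k, v-hat^k, multipliers

if __name__ == "__main__":
    G = np.array([[1.0, 0.0], [-0.5, 1.0], [0.2, -1.0]])
    alpha = np.array([0.0, 0.1, 0.05])
    d, v_hat, lam = direction_finding(G, alpha)
    print(d, v_hat, lam)   # v_hat < 0 unless 0 is a convex combination of the g^j
```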
In Algorithm 5.1 the locality radius a^{k+1} is calculated directly via (5.9) and (5.10), instead of using the recursive formulae (3.7).

Ĵ_p^k = Ĵ^k for all k,

t_L^k v̂^k → 0 as k→∞,

while in the proof one can refer to (5.6a) instead of (3.9), replace v^k by v̂^k and use (5.25). Of course, Corollary 4.10 remains valid, even if one replaces w^k by ŵ^k in (4.13), (4.14) and (4.24).
and

p^k = Σ_{j∈Ĵ^k} λ̂_j^k g^j and α̂_p^k = Σ_{j∈Ĵ^k} λ̂_j^k α_j^k.   (5.29)

Proof. (i) If k > 1 and t_L^{k-1} < t̃ then the line search rule (5.6c) yields −α(x^k, y^k) + ⟨g_f(y^k), d^{k-1}⟩ ≥ m_R v̂^{k-1}, so

λ_k(ν) = ν, λ_j(ν) = (1−ν)λ_j^{k-1} for j ∈ Ĵ^{k-1},   (5.31)

p^{k-1} = Σ_{j∈Ĵ^{k-1}} λ_j^{k-1} g^j.   (5.32a)

Since r_a^k = 0 by assumption, we have J^k = Ĵ^{k-1} ∪ {k}, so (5.32b) implies that the multipliers defined by (5.31) satisfy the constraints of the k-th dual subproblem (5.11) for each ν ∈ [0,1]. Noting that ŵ^k is the optimal value of (5.11) (see (5.14), (5.4) and (5.20)), we deduce from (5.31) and (5.32a) that

ŵ^k ≤ ½|(1−ν)p^{k-1} + νg^k|² + (1−ν) Σ_{j∈Ĵ^{k-1}} λ_j^{k-1} α_j^k + να_k^k,

ŵ^k ≤ ½|(1−ν)p^{k-1} + νg^k|² + (1−ν)α̂_p^{k-1} + να_k^k   (5.33)

for all ν ∈ [0,1]. Using Lemma 2.4.10 and relations (5.12), (5.20), (5.24), (5.30) and (5.28), we obtain

min{½|(1−ν)p^{k-1} + νg^k|² + (1−ν)α̂_p^{k-1} + να_k^k : ν ∈ [0,1]} ≤ φ_C(ŵ^{k-1}).   (5.34)
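The one-dimensional minimization in (5.34) is easy to carry out explicitly, since the objective is a convex quadratic in ν. The short sketch below (ours, in Python; names are illustrative) evaluates this minimum exactly by clipping the stationary point to [0,1]; it is the quantity that Lemma 2.4.10 compares with the previous stationarity measure.

```python
# Sketch (assumptions ours) of the bound behind (5.33)-(5.34):
#   phi(nu) = 0.5*|(1-nu)*p_prev + nu*g|^2 + (1-nu)*alpha_prev + nu*alpha_new.
import numpy as np

def min_over_nu(p_prev, g, alpha_prev, alpha_new):
    dgp = g - p_prev
    a = dgp @ dgp                                   # quadratic coefficient (>= 0)
    b = p_prev @ dgp + alpha_new - alpha_prev       # linear coefficient
    nu = 0.0 if a == 0.0 else min(max(-b / a, 0.0), 1.0)
    q = (1.0 - nu) * p_prev + nu * g
    return 0.5 * q @ q + (1.0 - nu) * alpha_prev + nu * alpha_new, nu

if __name__ == "__main__":
    p_prev, g = np.array([1.0, 0.0]), np.array([-1.0, 0.5])
    val, nu = min_over_nu(p_prev, g, 0.2, 0.05)
    w_prev = 0.5 * p_prev @ p_prev + 0.2            # w^{k-1} = 0.5|p|^2 + alpha
    print(val, nu, w_prev)                          # val < w_prev: the measure shrinks
```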
We conclude from the above lemma that after a null step (or a short serious step) of Algorithm 5.1 one can expect a significant decrease of the stationarity measure ŵ^k, while in Algorithm 3.1 the same observation applies to the stationarity measure w^k, cf. (5.27) and (4.25). The rate of decrease of ŵ^k is established by Lemma 4.12 if one replaces w^k there by ŵ^k. As far as Lemma 4.13 is concerned, substitute α̃_p^k by α̂_p^k in relations (4.30) and (4.32), and use the fact that ŵ^k is the optimal value of the dual subproblem (5.11).

Recall that Lemma 4.14 was instrumental only in the proof of Lemma 4.15. Therefore it suffices now to consider the following substitute of Lemma 4.15.
Lemma 5.4. Suppose that (4.21) and (4.24) hold. Then the assertions of
Lemma 4.15 are valid for Algorithm 5.1 if one replaces (4.36b)-(4.36c)
by
max{|p^{k-1}|, |g^k|, α̂_p^{k-1}, 1} ≤ C for k̄ ≤ k ≤ k̄+N,   (5.35a)

| |f(x^{k+1}) − f_j^{k+1}| − |f(x^k) − f_j^k| | ≤ |f(x^{k+1}) − f_j^{k+1} − f(x^k) + f_j^k| ≤

≤ |f(x^{k+1}) − f(x^k)| + |g_f(y^j)| |x^{k+1} − x^k|,   (5.37a)

γ(s_j^{k+1})² = γ(s_j^k + |x^{k+1} − x^k|)² ≤ γ(s_j^k)² + 2γ s_j^k |x^{k+1} − x^k| + γ|x^{k+1} − x^k|² ≤

≤ γ(s_j^k)² + |x^{k+1} − x^k| (2γC + γ|x^{k+1} − x^k|),   (5.37b)

max{a^k + |x^{k+1} − x^k|, ā/2},

which follows from the fact that max{s_j^k : j ∈ Ĵ^k} ≤ max{s_j^k : j ∈ J^k} = a^k.

(4.14) in which ŵ^k replaces w^k.

To sum up, we have extended all the convergence results of Section 4 to Algorithm 5.1.
6. Modifications of the Methods

Algorithm 6.1.
x^{k+1} = x^k + t_L^k d^k and y^{k+1} = x^k + t_R^k d^k

satisfy

f(x^{k+1}) ≤ f(x^k) + m_L t_L^k v^k,   (6.1a)

|y^{k+1} − x^{k+1}| ≤ ā,   (6.1c)

t_R^k ≤ t̄.   (6.1d)

To sum up, we have shown that one may employ the general line search criteria (6.1) in the methods for convex minimization from Chapter 2 without impairing the global convergence results.
Next, consider the version of the subgradient selection method for nonconvex minimization that uses the measures α(x^k, y^j) instead of α_j^k, see Remark 2.1. The method results from replacing in Algorithm 5.1 the variables α_j^k by α(x^k, y^j), and calculating a^{k+1} by

a^k = max{|x^k − y^j| : j = 1,…,k}

instead of (5.9) and (5.10). For verifying that the global convergence results of Section 5 cover this version of the method, modify (5.17) as follows

(f̂_p^k, s̃_p^k) = Σ_{j∈Ĵ^k} λ̂_j^k (f_j^k, |x^k − y^j|),   (6.3)

J^{k+1} = J^k ∪ {k+1},

5.5, Theorem 4.18 and Corollary 4.19 under the additional assumption (6.5).
We shall now present a simple modification of the methods of this chapter that we have found useful in calculations. This modification, which amounts to calculating and using more subgradients for search direction finding, can be motivated as follows. The methods described so far evaluate subgradients g^j = g_f(y^j) at trial points y^j which can, in general, be different from the points x^j. This is due to the use of the two-point line search for detecting discontinuities in the gradient of f. However, the lack of subgradient information associated with the points x^j may unnecessarily slow down convergence. For instance, consider the k-th iteration of Algorithm 3.1. Recall that for line searches the variable v^k is regarded as an approximation to the directional derivative of f at x^k in the direction d^k. For this interpretation to be valid, we should try to achieve the relation which would yield f'(x^k; d^k) = ⟨∇f(x^k), d^k⟩ ≤ v^k. But if y^k ≠ x^k and g_f(x^k) is not evaluated then we cannot even verify (6.7), let alone ensure that it holds. A simple way out of this difficulty is to calculate g_f(x^k) and use it for the k-th search direction finding by appending the following additional constraint, which will yield (6.7), since v̂^k ≤ v^k. Noting that (6.8) can be formulated as
notation, let
f_j(x) = f(y^j) + ⟨g^j, x − y^j⟩ = f_j^k + ⟨g^j, x − x^k⟩ for all x,

and the following upper estimates of |x^k − y^j| (see the bookkeeping sketch below):

s_j^k = |x^{|j|} − y^{|j|}| + Σ_{i=|j|}^{k-1} |x^{i+1} − x^i| for |j| < k, s_k^k = |x^k − y^k|,   (6.13)

f_{-(k+1)}^{k+1} = f(x^{k+1}),   (6.16a)

s_{-(k+1)}^{k+1} = 0,   (6.16b)

etc.
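The distance estimates in (6.13) and (6.16) are the only quantities that must be carried along to avoid storing the trial points y^j. A small sketch (ours, in Python; the function and parameter names are hypothetical) of this bookkeeping:

```python
# Sketch (assumptions ours) of the distance-measure updates (6.13), (6.16):
# each serious step adds |x^{k+1}-x^k| to every stored estimate, and a
# subgradient evaluated at the new point itself enters with distance zero.
import numpy as np

def update_distances(s, x_old, x_new, dist_new=0.0):
    """s: dict j -> s_j.  Returns the shifted dict with a new index appended."""
    shift = np.linalg.norm(np.asarray(x_new) - np.asarray(x_old))
    s = {j: sj + shift for j, sj in s.items()}   # s_j^{k+1} = s_j^k + |x^{k+1}-x^k|
    k_new = max(s) + 1 if s else 1
    s[k_new] = dist_new                          # 0 when g is taken at x^{k+1} itself
    return s

if __name__ == "__main__":
    s = {1: 0.0, 2: 0.3}
    print(update_distances(s, np.zeros(2), np.array([0.1, -0.2])))
```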
To sum up, the above described modification has no impact on the theory of our methods, but may increase their computational efficiency by decreasing the number of null steps (or short serious steps), thus leading to faster convergence. Of course, this advantage should be weighed against the work involved in evaluating additional subgradients. We may add that in many applications it is relatively easy to calculate g_f(x^{k+1}) once f has been evaluated at x^{k+1}, since then the line search rule (2.9c) yields y^{k+1} = x^{k+1} = x^k and g^{k+1} = g_f(y^{k+1}) = g_f(x^{k+1}). The test (6.17) is related to the desirable relation (6.9), since if (6.17) holds then the fact that m_R ∈ (0,1) and v̂^k < 0 at line searches yields

⟨g_f(x^k), d^k⟩ ≥ m_R v̂^k > v̂^k,

since (d^{k+1}, v^{k+1}) solves the (k+1)-st subproblem of the form (2.34) with k+1 ∈ J^{k+1} = {1,…,k+1}. Thus relation (6.9) is satisfied for k increased by 1. We may add that the line search rule (2.9c) plays a
α_j^k = max{f(x^k) − f_j^k, γ(s_j^k)²},   (6.18a)

α_p^k = max{f(x^k) − f_p^k, γ(s_p^k)²},   (6.18b)

α̂_p^k = max{f(x^k) − f̂_p^k, γ(s̃_p^k)²},   (6.18c)

α_j^k = γ(s_j^k)², α_p^k = γ(s_p^k)², α̂_p^k = γ(s̃_p^k)²,   (6.19)
where this time γ is fixed and positive even if f is convex. One can easily verify that global convergence results of the form of Theorem 4.17 remain valid for definitions (6.19). However, these results can no longer be strengthened in the convex case to the form of Theorem 4.18, because the subgradient locality measures (6.19) neglect linearization errors. For this reason, definitions (6.19) are inferior to the two definitions discussed above.
CHAPTER 4
1. Introduction

2. Derivation of the Methods
y^{k+1} = x^k + t_R^k d^k for k = 1,2,…, y^1 = x^1,

and evaluate subgradients

where f_j^k = f_j(x^k) for all j ≤ k. In the convex case we found it convenient to use the following polyhedral approximation to f, since

f(x) ≥ f̂^k(x) for all x,   (2.4a)

α_j^k = f(x^k) − f_j^k,   (2.7)

f̂_s^k(x) = max{f(x^k) − α_j^k + ⟨g^j, x − x^k⟩ : j ∈ J^k} for all x,   (2.8)

−d^k = Σ_{j∈J^k} λ_j^k g^j,   (2.11)

α_j^k = max{|f(x^k) − f_j^k|, γ(s_j^k)²},   (2.13)

s_j^k = |x^j − y^j| + Σ_{i=j}^{k-1} |x^{i+1} − x^i|

were used for estimating |x^k − y^j| without storing {y^j} (although one could use

α_j^k = max{|f(x^k) − f_j^k|, γ|x^k − y^j|²},   (2.14)

since then each α_j^k is positive (s_j^k ≥ |x^k − y^j| > 0), and so

is less than f(x^k + d) for small |d| because of the Lipschitz continuity of f. We also have (2.4a) for f̂_s^k defined by (2.8) and (2.13) if f is convex and γ = 0. Thus (2.15) may be regarded as a local version of (2.4a) in the nonconvex case.

Relation (2.13) defines subgradient locality measures, since in the convex case (γ = 0) (2.13) reduces to (2.7) and we have (2.12), while in the nonconvex case we may use the following definition of the Goldstein (1977) ε-subdifferential

∂f(x; ε) = conv{∂f(y) : |y − x| ≤ ε}   (2.16)

to deduce that

g^j ∈ ∂f(x^k; ε) for ε = (α_j^k/γ)^{1/2},   (2.17)

since

|y^j − x^k| ≤ s_j^k ≤ (α_j^k/γ)^{1/2}.
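The chain (2.13)-(2.17) is straightforward to implement. A tiny sketch (ours, in Python; names hypothetical) computing a locality measure and the resulting Goldstein radius:

```python
# Sketch (assumptions ours) of (2.13) and (2.17): since alpha_j >= gamma*s_j^2,
# the radius (alpha_j/gamma)**0.5 dominates s_j >= |x^k - y^j|, so g^j lies in
# the Goldstein subdifferential df(x^k; eps) with eps = (alpha_j/gamma)**0.5.
def locality_measure(f_xk, f_j_k, s_j_k, gamma):
    return max(abs(f_xk - f_j_k), gamma * s_j_k ** 2)

def goldstein_radius(alpha_j_k, gamma):
    return (alpha_j_k / gamma) ** 0.5

if __name__ == "__main__":
    alpha = locality_measure(f_xk=1.0, f_j_k=0.98, s_j_k=0.5, gamma=0.1)
    print(alpha, goldstein_radius(alpha, 0.1))   # radius >= s_j_k >= |x^k - y^j|
```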
Ĵ^k = {j ∈ J^k : λ_j^k ≠ 0}

and set J^{k+1} = Ĵ^k ∪ {k+1} as in Chapter 3. Next, we shall use suitable rules for deciding whether J^{k+1} should be reduced by deleting indices j corresponding to large values of the distance measures s_j^k. Since specific deletion rules are applicable also to the method with subgradient aggregation described below, we postpone their discussion till the end of this section.
Let us now pass to the method with subgradient aggregation. As in Chapter 3, for search direction finding at the k-th iteration the aggregate subgradient (p^{k-1}, f_p^k), satisfying (2.18), may replace the past subgradients (g^j, f_j^k), j = 1,…,k−1. Since (2.18) corresponds to the following linearization error

α_p^k = |f(x^k) − f_p^k|   (2.21)
associated with the (k−1)-st aggregate linearization,

minimize ½|d|² + v̂,   (2.24a)

(d, v̂) ∈ R^{N+1},

−α_p^k + ⟨p^{k-1}, d⟩ ≤ v̂,   (2.24c)

such that

d^k = −p^k,

(p^k, f̃_p^k) ∈ conv{(g^j, f_j^k) : j ∈ Ĵ_p^{k-1} ∪ J^k}.   (2.27)
Therefore, by setting

â_J^k = max{|x^k − y^j| : j ∈ J^k},

â_p^k = max{|x^k − y^j| : j ∈ Ĵ_p^{k-1}},   (2.28)

â^k = max{â_J^k, â_p^k},

the only way of reducing â_p^k, if this is necessary, is to reset Ĵ_p^{k-1} to an empty set, so that â_p^k vanishes. This is equivalent to discarding the (k−1)-st aggregate subgradient in the definition (2.22) of f̂_a^k, and to dropping the last constraint of (2.24) at the search direction finding.
We shall now describe a simple strategy for localizing the past subgradient information used for search direction finding. It is based on an idea due to Wolfe (1975) in the convex case. The concept of the Goldstein subdifferential (2.16) is useful in the nonconvex case, because it is defined directly in terms of neighborhoods of a given point. Moreover, ∂f(x; 0) = ∂f(x) and, owing to the definition (1.2.14) of ∂f, ∂f(x; ε) is a close approximation to ∂f(x) if the value of ε is small. Our aim is to obtain at some iteration p^k ∈ ∂f(x^k; ε) for some small values of ε and |p^k|; then x^k will be approximately stationary.

p^k = Σ_{j∈Ĵ_p^k} λ_j g^j,   (2.29a)

λ_j ≥ 0, j ∈ Ĵ_p^k, Σ_{j∈Ĵ_p^k} λ_j = 1,   (2.29b)

max{|y^j − x^k| : j ∈ Ĵ_p^k} ≤ a^k,   (2.29c)

then we would always have a^{k+1} ≥ a^k. Therefore (2.31) would hold only at later iterations for large values of a^k.

a^k = max{s_j^k : j ∈ J^k}.   (2.32)

Next, if (2.31) holds once again, the resetting procedure should be repeated:

|p^k| ≤ m_a â^k,

where

â^k = max{|y^j − x^k| : j ∈ Ĵ^k}.
Algorithm 3.1.

minimize ½|d|² + v over (d, v) ∈ R^{N+1},

subject to −α_j^k + ⟨g^j, d⟩ ≤ v, j ∈ J^k,   (3.1)

−α_p^k + ⟨p^{k-1}, d⟩ ≤ v if r_a^k = 0,

where

α_j^k = |f(x^k) − f_j^k| for j ∈ J^k,   (3.2a)

p^k = Σ_{j∈J^k} λ_j^k g^j + λ_p^k p^{k-1}.   (3.3)

(ii) If |J^k| > 1 then delete the smallest number from J^k and go to Step 1.

(iii) Set y^k = x^k, g^k = g_f(y^k), f_k^k = f(y^k), s_k^k = 0, J^k = {k} and go to Step 1.

t_R^k = t_L^k if t_L^k ≥ t̃,   (3.8a)

Ĵ^k ⊂ J^k,   (3.12a)

and set

J^{k+1} = Ĵ^k ∪ {k+1}.   (3.12c)

s_j^{k+1} = s_j^k + |x^{k+1} − x^k| for j ∈ Ĵ^k.   (3.14b)

Calculate

a^{k+1} = max{a^k + |x^{k+1} − x^k|, s_{k+1}^{k+1}}.   (3.15)

d^k = −p^k,   (3.17)

v̂^k = −{|p^k|² + Σ_{j∈J^k} λ_j^k α_j^k + λ_p^k α_p^k},   (3.18)

where p^k is given by (3.3). Moreover, any Lagrange multipliers of (3.1) also solve (3.16). In particular, we always have

Thus one may equivalently solve the k-th dual search direction finding subproblem (3.16) in Step 1 of the algorithm.
The algorithm stops at Step 2 when

p^k ∈ ∂f(x^k; ε_s/m_a) and |p^k| ≤ ε_s,

i.e. when x^k is approximately stationary for f. This follows from the fact that p^k ∈ ∂f(x^k; a^k) at Step 2, as will be shown below.
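The stopping and resetting logic of Step 2 through Step 4 amounts to a simple test, sketched here (ours, in Python; parameter values are illustrative): the method stops when max{|p^k|, m_a a^k} ≤ ε_s, and a reset is triggered when the locality radius a^k is too large relative to |p^k|.

```python
# Sketch (assumptions ours) of the stopping/resetting test around Step 2.
import numpy as np

def stationarity_test(p, a_k, m_a=0.5, eps_s=1e-6):
    if max(np.linalg.norm(p), m_a * a_k) <= eps_s:
        return "stop"          # p^k in df(x^k; eps_s/m_a) and |p^k| <= eps_s
    if np.linalg.norm(p) <= m_a * a_k:
        return "reset"         # shrink J^k / locality radius, then re-solve
    return "line_search"

if __name__ == "__main__":
    print(stationarity_test(np.array([1e-8, 0.0]), a_k=1e-8))   # -> "stop"
    print(stationarity_test(np.array([0.1, 0.0]), a_k=5.0))     # -> "reset"
```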
Thus v^k < 0 is an estimate of the directional derivative of f at x^k in the direction d^k, and the criteria (3.7)-(3.9) may be interpreted similarly to the line search criteria of Section 3.3. It will be seen that the additional requirement (3.10) ensures that the subgradient g^{k+1} = g_f(y^{k+1}) is calculated sufficiently close to the next point x^{k+1}. We show below that the variable θ^k decreases and s^k stays constant whenever the algorithm executes a series of null steps.
Proof. Use the proof of Lemma 3.3.3, setting ε = 1 − θ̃ and observing that we must have either t_L^i = 0 for all i and t^i|d^k| ≤ θ^k s^k for large i, or t^i − t_L^i bounded away from zero for large i.
that localize, via the line search requirement (3.10a), the new subgradient information at null steps. Upon completion of Step 6, the current value of l is equal to the number of serious steps taken so far. In general, we have k(l) < k(l+1) and

x^k = x^{k(l)} if k(l) ≤ k < k(l+1),   (3.21a)

t_L^k > 0 if k = k(l+1) − 1.   (3.21c)
4. Convergence
J̄^k = J^{k_r(k)} ∪ {j : k_r(k) < j ≤ k}   (4.1b)

for all k.

(p^k, f̃_p^k) ∈ conv{(g^j, f_j^k) : j ∈ Ĵ_p^k},   (4.4)

â_p^k = max{s_j^k : j ∈ Ĵ_p^k},   (4.5)

|y^j − x^k| ≤ â_p^k for all j ∈ Ĵ_p^k.   (4.6)

(4.1) and (4.2) imply that k_p(k) ≥ k_r(k) and Ĵ_p^k ⊂ J̄^k for all k. Hence it suffices to prove the inclusion for any k. If r_a^k = 1 for some k, then k_r(k) = k by (4.1a) and Ĵ_p^k ⊂ {j : k − M_g + 2 ≤ j} by the rules of the algorithm.

p^k ∈ ∂f(x^k; a^k),   (4.8)

p^k ∈ ∂_ε f(x^k) for ε = f(x^k) − f̃_p^k = α̃_p^k ≥ 0.

Proof. As in the proof of Lemma 2.4.2, use (4.4) and the fact that α̃_p^k = |f(x^k) − f̃_p^k|.

First, we consider the case when the algorithm terminates.

Proof. If max{|p^k|, m_a a^k} ≤ ε_s = 0 then (4.8) and the fact that m_a > 0 yield 0 ∈ ∂f(x^k).
Since d^k = −p^k by (3.17), v^k = −{|p^k|² + α̃_p^k} by (3.5), and α̃_p^k ≥ 0 by (3.4), we obtain from (4.9) that the line search is always entered with

d^k ≠ 0,   (4.10)

v^k < 0.   (4.11)

This establishes (3.20). Moreover, the criterion (3.7) with m_L > 0 and t_L^k ≥ 0 ensures that the sequence {f(x^k)} is nonincreasing and

Our next result states that the aggregate subgradient can be expressed as a convex combination of N+2 (not necessarily different) past subgradients.
Lemma 4.4. At the k-th iteration of Algorithm 3.1 there exist numbers λ_i^k and vectors (y^{k,i}, f^{k,i}) ∈ R^N × R, i = 1,…,M, M = N+2, satisfying

λ_i^k ≥ 0, i = 1,…,M, Σ_{i=1}^M λ_i^k = 1,   (4.12)

(y^{k,i}, f^{k,i}) ∈ {(y^j, f_j^k) : j ∈ Ĵ_p^k}, i = 1,…,M.

Comparing Lemma 4.4 with Lemma 3.4.3 we see that the only difference stems from the fact that we are now considering pairs (p^k, f̃_p^k) instead of triples (p^k, f̃_p^k, s̃_p^k).
Lemma 4.5. Let x̄ ∈ R^N be given and suppose that the following hypothesis is fulfilled:

there exist N-vectors p̄, ȳ^i, ḡ^i for i = 1,…,M = N+2, and numbers

(p̄, f̄_p) = Σ_{i=1}^M λ̄_i (ḡ^i, f̄^i),   (4.13a)

λ̄_i ≥ 0, i = 1,…,M, Σ_{i=1}^M λ̄_i = 1,   (4.13b)

, i = 1,…,M,   (4.13e)

max{|ȳ^i − x̄| : λ̄_i ≠ 0} = 0.   (4.13f)

Then one may argue as in the proof of Lemma 3.4.4.

(p^k, α̃_p^k) →_K (p̄, ᾱ_p).

If additionally a^k →_K 0 then p̄ ∈ ∂f(x̄) and ᾱ_p = 0.

Proof. Using Lemma 4.4 and Lemma 4.5, let s^{k,i} = |y^{k,i} − x^k| for i = 1,…,M and k ∈ K, and argue as in the proof of Lemma 3.4.6.
w^k = ½|p^k|² + α̃_p^k,   (4.14)

or equivalently

x^k →_K x̄ and w^k →_K 0.   (4.16)

Then 0 ∈ ∂f(x̄).
Proof. (i) The equivalence of (4.15) and (4.16) follows from the nonnegativity established in Lemma 4.6. Also max{|p^k|, |x̄ − x^k|} →_K 0, hence we have shown that (4.16) implies (4.17).

(ii) It remains to show that (4.17) implies (4.16). Suppose that (4.17) holds, and set

ŵ^k = ½|p^k|² + α̂_p^k,   (4.18a)

where

α̂_p^k = Σ_{j∈J^k} λ_j^k α_j^k + λ_p^k α_p^k.   (4.18b)

0 ≤ α̃_p^k ≤ α̂_p^k,   (4.19a)

0 ≤ w^k ≤ ŵ^k,   (4.19b)

v^k ≤ −w^k ≤ 0,   (4.19c)

v̂^k ≤ v^k.   (4.19d)

Conversely, (4.21) implies (4.20), so in fact (4.20) and (4.21) are equivalent.
Our aim is to show that (4.20) implies that w^k →_K' 0 for some K' ⊂ K. To this end we shall first analyze the case of "long" serious steps in the following lemma, which can be established similarly to Lemma 3.4.9 if one uses (3.7), (4.19c) and (3.21a).

t_L^k v^k → 0 as k→∞.

(ii) If (4.21a) is fulfilled and there exist a number t̄ > 0 and an infinite set L̂ ⊂ L such that t_L^k ≥ t̄ for all k = k(l+1) − 1 and l ∈ L̂, then (4.16) holds.

Corollary 4.10. Suppose (4.20) is satisfied, but (4.15) does not hold, i.e.

max{t_L^k : k(l) ≤ k < k(l+1)} → 0 as l→∞, l ∈ L.   (4.23)

Proof. The assertion follows from Lemma 4.9(ii), (3.21b), and the equivalence of (4.15) and (4.16) on the one hand, and of (4.20) and (4.21) on the other.
w^k ≤ ŵ^k ≤ φ_C(w^{k-1}) + |α_p^k − α_p^{k-1}|,   (4.24)

φ_C(t) = t − (1 − m_R)² t²/(8C²),

max{|p^{k-1}|, |g^k|, α_p^{k-1}, 1} ≤ C,   (4.25)

max{|p^k|, |g^k|, α_p^k, 1} ≤ C if |x^k − x̄| ≤ ā.   (4.26)

Our next result demonstrates that the term |α_p^k − α_p^{k-1}| involved in (4.24) vanishes in the limit:

|α_p^{k+1} − α_p^k| → 0 as k→∞.
One may use Lemma 4.9(i) to obtain the desired conclusion as in the proof of Lemma 3.4.14.

(i) For any fixed integer m ≥ 0 there exists a number t̂_m such that

(ii) For any numbers k̄, N and ε > 0 there exists a number k̂ ≥ k̄ such that k̂ ∈ K = {k(l+1) − 1 : l ∈ L} and

w^k ≥ w̄/2 for k = k̂,…,k̂+N,

max{|p^{k-1}|, |g^k|, α_p^{k-1}, 1} ≤ C for k = k̂,…,k̂+N,   (4.29)

|α_p^k − α_p^{k-1}| ≤ ε for k = k̂,…,k̂+N,

t_L^{k-1} < t̃ for k = k̂,…,k̂+N.

Proof. (4.27) can be proved by using Corollary 4.10 and Lemma 4.12 as in the proof of Lemma 3.4.12. Then (4.28) follows from (4.27), (3.21), (3.22) and the fact that θ̃ ∈ (0,1). Part (ii) of the lemma follows from part (i), (4.26) and Lemma 4.13.

Since the above lemma dealt with the case of an infinite number of serious steps, we now have to analyze the remaining case.

Lemma 4.15. Suppose that (4.21b) holds. Then t_L^k = 0 for each k ≥ k(l̄) and y^k → x̄ as k→∞. If additionally (4.22) holds then the second assertion of Lemma 4.14 is true.
Proof. Suppose that (4.21b) holds. Then, since we always have x^{k+1} = x^k + t_L^k d^k and d^k ≠ 0 (see (4.10)), we deduce from the line search rule (3.10) that t_L^k = 0 and |y^{k+1} − x^{k+1}| = |y^{k+1} − x^k| ≤ θ^k s^k for all k ≥ k(l̄). But t_L^k = 0 always yields θ^{k+1} = θ̃θ^k and s^{k+1} = s^k, hence θ^k → 0 (0 < θ̃ < 1).

Lemma 4.16. For any k ≥ 1, let b^k denote the minimum value taken on by max{|p^k|, m_a a^k} for the successive values of p^k and a^k calculated at each execution of Step 1 of Algorithm 3.1 at the k-th iteration. (The variable b^k is well-defined, because there can be only finitely many returns to Step 1 at any iteration.) Suppose that
Proof. For any k ≥ 1, let a^k and p^k have the values calculated at

Remark 4.17. One may check that, except for Lemma 4.7, all the preceding results of this section hold also for variables generated by each execution of Step 1 at any iteration. Lemma 4.7 assumes that |p^k| > m_a a^k, hence it deals only with the variables calculated by the last execution of Step 1 at any iteration.
and

liminf_{k→∞} max{b^k, |x̄ − x^k|} ≥ ᾱ_p.   (4.31)

(i) Let ε_w = w̄/2 > 0 and choose ε = ε(ε_w, C) and N = N(ε_w, C) < +∞, N ≥ 1, such that k̄ satisfies (4.33) and

r_a^k = 0 for all k ∈ [k̄, k̄+N].   (4.34)

Then (4.29), (4.34), Lemma 4.11, Lemma 3.4.12 and our choice of ε and N imply that w^k ≤ ε_w = w̄/2 for some k ∈ [k̄, k̄+N], which contradicts (4.33) and (4.29). Consequently, we have shown that for any number k̄ satisfying (4.33) we have r_a^k = 1 for some k ∈ [k̄, k̄+N].
(iv) Let k_1 = k̄ + 2M_g. Then part (iii) of the proof implies that r_a^{k_2−1} = 1 for some k_2 ∈ [k_1, k_1+N]. Since k_2 − M_g > k̄ and r_a^{k_2−1} = 1, we obtain from

(v) Since k̄ = k_2 satisfies (4.33), from part (iv) of the proof we deduce a contradiction with (4.38). Therefore, either (4.15) or (4.30) must hold.

Combining Lemma 4.18 with Lemma 4.7 and Lemma 4.16, we obtain

Proof. If the assertion were false, then the infinite sequence {x^k} ⊂ S would have an accumulation point, say x̄. Then Lemma 4.18, Lemma 4.7 and Lemma 4.16 would yield liminf_{k→∞} max{|p^k|, m_a a^k} = 0, so the algorithm should stop owing to max{|p^k|, m_a a^k} ≤ ε_s for large k.
Algorithm 5.1.

Step 0 (Initialization). Select x^1 ∈ R^N and ε_s ≥ 0. Choose positive parameters m_a, m_L, m_R, ā, t̃, θ̃, M_g and s^1 with m_L < m_R < 1, θ̃ < 1 and M_g ≥ N+2. Set θ^1 = 1, r_a^1 = 0, J^1 = {1}, y^1 = x^1, g^1 = g_f(y^1), f_1^1 = f(y^1) and s_1^1 = 0.

minimize ½|d|² + v̂ over (d, v̂) ∈ R^{N+1},   (5.1)

subject to −α_j^k + ⟨g^j, d⟩ ≤ v̂, j ∈ J^k,

where

α_j^k = |f(x^k) − f_j^k| for j ∈ J^k.   (5.2)

Set

a^k = max{s_j^k : j ∈ J^k}.   (5.4)

r_a^k = 1.

(ii) If |J^k| > 1 then delete the smallest number from J^k and go to Step 1.

(iii) Set y^k = x^k, g^k = g_f(y^k), f_k^k = f(y^k), s_k^k = 0, J^k = {k} and go to Step 1.
Step 5 (Line search). Find two nonnegative stepsizes t_L^k ≤ t_R^k and the

t_R^k = t_L^k ≤ 1 if t_L^k ≥ t̃.   (5.5b)

Step 8. Set r_a^{k+1} = 0, increase k by 1 and go to Step 1.

then recover (d^k, v̂^k) via (3.5.12)-(3.5.14).

As in Section 3.5, we may derive useful relations between variables generated by Algorithm 5.1 by setting

λ_p^k = 0 for all k,

α̃_p^k = |f(x^k) − f̃_p^k|,

α̂_p^k = Σ_{j∈J^k} λ_j^k α_j^k,   (5.6)

w^k = ½|p^k|² + α̃_p^k,

ŵ^k = ½|p^k|² + α̂_p^k,

v^k = −{|p^k|² + α̃_p^k},

d^k = −p^k,   (5.7a)

α̃_p^k ≤ α̂_p^k,   (5.7b)

w^k ≤ ŵ^k,   (5.7c)

v̂^k ≤ v^k,   (5.7d)

cf. (3.5.12)-(3.5.25).
6. Modified Resetting Strategies

of past subgradients are used for search direction finding. This drawback is eliminated to a certain extent in the following modification of Algorithm 3.1.

Before stating the modified method, let us briefly recall the basic tasks of subgradient deletion rules. In this chapter we concentrated on rules for localizing the accumulated subgradient information. Such rules ensure that polyhedral functions of the form (2.8) and (2.22) are close approximations to f in a neighborhood of x^k. On the other hand, in Chapter 3 we used resetting tests of the form a^k ≤ ā to ensure locally uniform boundedness of the subgradients that were aggregated at any iteration. Observe that in the methods of this chapter there was no need for distance resets through a^k ≤ ā, since we had a^k ≤ |p^k|/m_a and |p^k| was locally bounded (cf. Lemma 4.12). However, if we replace the resetting test |p^k| ≤ m_a a^k by some other test, then we shall no longer have estimates of the form a^k ≤ |p^k|/m_a. Therefore, we shall additionally use a distance resetting test of the form a^k ≤ ā. For a sufficiently large value of ā, say ā = 10³, a reduction of the past subgradient information due to a^k being larger than ā will occur infrequently.
To derive a new resetting strategy, suppose that in Step 1 of Algorithm 3.1 we calculate the aggregate distance measure s̃_p^k by setting

(p^k, f̃_p^k, s̃_p^k) = Σ_{j∈J^k} λ_j^k (g^j, f_j^k, s_j^k) + λ_p^k (p^{k-1}, f_p^k, s_p^k),   (6.1)

s_p^{k+1} = s̃_p^k + |x^{k+1} − x^k|,   (6.2)

(p^k, f̃_p^k, s̃_p^k) = Σ_{j∈J̃_p^k} λ̃_j^k (g^j, f_j^k, s_j^k),   (6.3a)

λ̃_j^k ≥ 0, j ∈ J̃_p^k, Σ_{j∈J̃_p^k} λ̃_j^k = 1,   (6.3b)

where

J̃_p^k = J^{k_p(k)} ∪ {j : k_p(k) < j ≤ k}.

Thus the aggregate distance measure s̃_p^k, being a convex combination of the distance measures s_j^k, is no greater than the locality radius a^k, which is the maximum of the corresponding s_j^k. The value of s̃_p^k, in contrast with that of a^k, can decrease even if no reset occurs. Also the value of s̃_p^k can be small even if the value of s_j^k is large for some j ∈ J̃_p^k, provided that the value of λ̃_j^k is small. This means that if s̃_p^k has a small value then only local past subgradients g^j (with relatively small s_j^k ≥ |y^j − x^k|) contribute significantly to the current direction d^k = −p^k (have large λ̃_j^k in (6.3a)). For this reason we shall reset the method only if the test below fails. Set r_a^{k+1} = 1.
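A small sketch (ours, in Python; names are illustrative) of the aggregate distance bookkeeping (6.1)-(6.2): the measure is carried along with the aggregate subgradient as the same convex combination of the individual s_j^k, so it never exceeds a^k = max_j s_j^k, and it is shifted by |x^{k+1} − x^k| after a serious step.

```python
# Sketch (assumptions ours) of the aggregate distance measure (6.1)-(6.2).
import numpy as np

def aggregate(lams, lam_p, G, S, p_prev, s_p_prev):
    """Convex combination of subgradients (rows of G) and distance measures S."""
    p = G.T @ lams + lam_p * p_prev
    s_p = S @ lams + lam_p * s_p_prev
    return p, s_p

if __name__ == "__main__":
    G = np.array([[1.0, 0.0], [0.0, 1.0]]); S = np.array([0.2, 0.6])
    p, s_p = aggregate(np.array([0.5, 0.3]), 0.2, G, S, np.array([0.1, 0.1]), 0.4)
    s_p_next = s_p + np.linalg.norm(np.array([0.05, 0.0]))   # (6.2) after a step
    print(p, s_p, s_p_next)
```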
Proof. Since a formal proof of the theorem would involve lengthy repetitions of the results of Section 4 and Section 3.4, we give only an outline, hoping that the reader can fill in the necessary details. First, we observe that we always have |p^k| > m_a s̃_p^k and a^k ≤ ā at Step 5 of the method. Using this, one may replace Lemma 4.1 through Lemma 4.6 by Lemma 3.4.1 through Lemma 3.4.6, with the condition "â_p^k →_K 0" in Lemma 3.4.6 being replaced by the condition "s̃_p^k →_K 0". In the proof of Lemma 4.7, use the fact that s̃_p^k ≤ |p^k|/m_a at Step 5. In the formulation and the proof of Lemma 4.16, replace a^k by s̃_p^k. Finally, while proving Lemma 4.18 observe that s̃_p^k ≤ a^k from (6.3), replace (4.36) accordingly, assume with no loss of generality that ᾱ_p/(2m_a) < ā/2, and deduce that in part (iv) of the proof we have a^k ≤ ā and |p^k| > m_a s̃_p^k for k = k_2,…,k_2+N, so that (4.38) holds, as required.
when max{|p^k|, m_a a^k} ≤ δ^k, which means that the algorithm has found a significantly better approximation to a stationary point. When δ^k decreases, the line search requirement (6.7) ensures that the algorithm collects progressively more local subgradient information.

We may add that in the original version of the Wolfe (1975) strategy, one would use only part (iii) of Step 4 of Algorithm 3.1, i.e. each reset would involve restarting the method from the current point x^k and discarding all the past subgradients. As observed by Mifflin (1977b), such a strategy is inefficient, because it leads to many null steps accumulating subgradient information to compensate for total resets.

The following result describes convergence properties of this modification of Algorithm 3.1.
t_L^k |p^k|² = |p^k| t_L^k |d^k| = |p^k| |x^{k+1} − x^k| ≥ δ̄ |x^{k+1} − x^k|,

i.e.

t_L^k |p^k|² ≥ δ̄ |x^{k+1} − x^k| for all k ≥ k_δ.   (6.8)

|x^{k+n} − x^k| ≤ Σ_{i=k}^{k+n-1} |x^{i+1} − x^i| → 0 as k, n → ∞,

x^k → x̄ as k→∞.   (6.10)

m_a a^k > δ^k at Step 4' for all k ≥ k_δ,   (6.11)

s_j^k < δ̄/m_a for all large j ≤ k.   (6.14)

Then one may set ᾱ_p = 2δ̄ and proceed as in the proof of Lemma 4.18, deleting (4.30a) and using (6.14) to replace (4.37) by

a^k = max{s_j^k : j ∈ J^k} < δ̄/m_a for k = k_2,…,k_2+N;   (6.15)

if m_a s̃_p^k ≤ δ^k then replace δ^k by δ^k/2.
Using the preceding results, one may check that convergence properties of the above modification of Algorithm 6.1 can be expressed in the form of Theorem 6.3.

We shall now describe a resetting strategy based on the ideas of Mifflin (1977b). Thus consider the following modification of Algorithm 3.1. In Step 0 we choose a positive δ^1. Step 3 and Step 4 are replaced by the following

|y^{k+1} − x^{k+1}| ≤ δ^k/m_a.   (6.16)

{j ∈ J^{k+1} : s_j^{k+1} ≤ δ^{k+1}/m_a}, so that the reset value of a^{k+1} satisfies

where m̃_a = θ̃ m_a > 0, and

At the same time, one can have a reset at Step 8'' due to m_a a^{k+1} > δ^k even if the value of |p^k| ≥ δ^k is large. In this case a reset occurs even though |p^k| > m_a a^k, in contrast with the rules of Algorithm 3.1.
We note that on each entrance to Step 4'' one has

max{|p^k|, m_a a^k} ≤ δ^k,   (6.21)

Σ_{k=1}^∞ |x^{k+1} − x^k| < +∞,   (6.22)

x^k → x̄ as k→∞.   (6.23)

w^k ≥ ε̄ = ½δ̄² > 0 for all k ≥ k_δ.   (6.24)

By (6.16), |y^{k+1} − x^{k+1}| ≤ δ^k/m_a ≤ δ^1/m_a, so Lemma 4.12 remains valid.

showing that no reset due to m_a a^{k+1} > δ^{k+1} can occur at Step 8'' for any k ≥ k_2. Thus (4.38) holds and we obtain a contradiction. Therefore, δ̄ = 0.

(ii) Note that for each k ∈ K we have |p^k| ≤ δ^k and m_a a^k ≤ δ^k from (6.18), so

b^k = max{|p^k|, m_a a^k} ≤ δ^k for all k ∈ K.

Set r_a^k = 1 and go to Step 1.
|y^{k+1} − x^{k+1}| ≤ θ̃ δ^{k+1}/m_a.   (6.29)

Finally, Step 8 is substituted by

Step 8'' (Distance resetting). If δ^{k+1} = δ^k then go to Step 9. Otherwise, replace J^{k+1} by the set {j ∈ J^{k+1} : s_j^{k+1} ≤ δ^{k+1}/m_a}. If a^{k+1} > δ^{k+1}/m_a then set r_a^{k+1} = 1 and
(ii) We now claim that Lemma 4.18 and its proof remain valid if one replaces in Algorithm 3.1 Step 4 and the line search requirements (3.9)-(3.10) by Step 4''' and (6.29), respectively, provided that δ^k → 0. To justify this claim, use (6.29) and the assumption that δ^k → 0 for showing that Lemma 4.12, Lemma 4.14 with (4.28a) deleted, and Lemma 4.15 are true.

(iii) The theorem will be proved if we show how to modify the proof of Lemma 4.18. Thus suppose that (4.20), (4.22) and (4.31) hold. Let K_δ = {k : δ^{k+1} < δ^k}. From part (i) above and the assumption (4.20) we know that K_δ is infinite and δ^k → 0. Therefore, in view of part (ii) above, in the proof of Lemma 4.18 we need only consider additional resets occurring at Step 8'' for k ∈ K_δ. To this end, suppose that k̄ is so large that δ^k̄ < ᾱ_p/2, where ᾱ_p > 0 is the constant involved in (4.36). Then (4.36) and the rules of Step 3'' yield δ^{k+1} = δ^k for k = k_2,…,k_2+N, so that k ∉ K_δ for k = k_2,…,k_2+N. Thus no resets occur at Step 8'' for k = k_2,…,k_2+N, and part (iv) of the proof of Lemma 4.18 remains valid. Thus Lemma 4.18, and hence also Theorem 4.19, Theorem 4.20 and Corollary 4.21, are true.

Remark 6.6. We conclude from the above proof that the global convergence results established in Section 4 for Algorithm 3.1 are not impaired if one replaces Step 4 by Step 4''' above, and the line search requirements (3.9)-(3.10) by (6.29), provided that the rules for choosing {δ^k} are such that {δ^k} is bounded and δ^k → 0 whenever {x^k} has at least one accumulation point. This observation may be used in designing rules different from the ones of Algorithm 3.1 and its above-described modification.
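The δ-driven resetting strategy just described admits a compact control-flow sketch (ours, in Python; parameter names are illustrative, not the author's): δ^k is halved when the optimality estimate max{|p^k|, m_a a^k} drops below δ^k, and stored subgradients whose distance measures exceed δ^{k+1}/m_a are discarded, in the spirit of Step 8''.

```python
# Sketch (assumptions ours) of delta-driven resetting.
import numpy as np

def maybe_reset(p, a_k, s, delta, m_a=0.5):
    """s: dict j -> s_j.  Returns (kept indices, new delta, reset flag)."""
    if max(np.linalg.norm(p), m_a * a_k) <= delta:
        delta = delta / 2.0                          # better stationarity: localize
    kept = {j for j, sj in s.items() if sj <= delta / m_a}
    reset = len(kept) < len(s)
    return kept, delta, reset

if __name__ == "__main__":
    s = {1: 0.1, 2: 2.5, 3: 0.4}
    print(maybe_reset(np.array([0.05, 0.0]), 0.2, s, delta=0.5))
    # -> delta halves to 0.25; index 2 (too distant) is dropped, triggering a reset
```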
7. Simplified Versions That Neglect Linearization Errors
i.e. we use (7.1) instead of (3.2). Then the corresponding dual search direction finding subproblem (3.16) is of the form

minimize ½|Σ_{j∈J^k} λ_j g^j + λ_p p^{k-1}|² over λ, λ_p,

λ_p = 0 if r_a^k = 1,

v^k = −|p^k|²,   (7.3)

instead of using (3.5) and (3.11). Then v^k < 0, so that Line Search Procedure 3.2 can be used as before, with Lemma 3.3 remaining valid.

As far as convergence of the above algorithm is concerned, one may reason as if the values of α_j^k were zero for all k, since in this case (7.3) is equivalent to the previously employed relation (3.5). Then it is easy to check that the modification defined by (7.1)-(7.4) does not impair Theorem 4.19 and Corollary 4.21; in fact, the relevant proofs of Section 4 are simplified. We conclude that in the nonconvex case the above-described version of Algorithm 3.1 has the same global convergence properties as the original method.
Σ_{k=1}^∞ t_L^k {|p^k|² + α̃_p^k} < +∞,   (7.5)

t_L^k (−v^k) ≤ [f(x^k) − f(x^{k+1})]/m_L   (7.6)

with −v^k = |p^k|² + α̃_p^k, for all k. But now we have −v^k = |p^k|² in the simplified version of the method, so that (7.6) yields

Σ_{k=1}^∞ t_L^k |p^k|² < +∞.

Step 3' (Resetting test). If |p^k| ≤ m_a a^k or |p^k|² ≤ m̃ α̂_p^k then go to Step 4; otherwise, go to Step 5.
Reasoning as in Section 3, one may check that only finitely many resets can occur at any iteration of the modified method.

We shall now show that the modified method retains all the convergence

α̂_p^k < ᾱ_p²/(4m_a) for all k ∈ [k_2, k_2+N].   (7.9)

Then (4.36), (4.37) and (7.9) yield |p^k| > m_a a^k and |p^k|² > m̃ α̂_p^k for

α̃_p^k = |f(x^k) − f̃_p^k| =

B = {y : |x̄ − y| ≤ 2ā}. From (4.22), (4.31), Lemma 4.14, Lemma 4.15 and (4.20) we deduce the existence of k̂ satisfying (4.29), (4.32) and

from (7.11), since {y^{k,i} : i = 1,…,M} ⊂ {y^j : j ∈ J̃_p^k} and |x^k − y^j| ≤ s_j^k.
1. Introduction

where the functions f : R^N → R and F : R^N → R are convex, but not necessarily differentiable. We assume that the Slater constraint qualification is fulfilled, i.e. there exists x̄ ∈ R^N satisfying F(x̄) < 0. This implies that the feasible set

S = {x ∈ R^N : F(x) ≤ 0}

2. Derivation of the Algorithm Class

H(y; x) = max{f(y) − f(x), F(y)},   (2.1)

∂H(y; x) ⊃ ∂F(y) if f(y) − f(x) < F(y),

which in turn is equivalent to

0 ∈ ∂H(x̄; x̄).
Step 2. If
where f_j, j ∈ J_f, and F_j, j ∈ J_F, are convex functions with continuous gradients ∇f_j and ∇F_j, respectively, and the sets J_f and J_F are finite. To calculate a point x̄ satisfying (2.2), the method of centers (Kiwiel, 1981a) proceeds as follows. Given the k-th approximation x^k ∈ S to a solution, a search direction d^k is found from the solution (d^k, v^k) ∈ R^N × R to the following problem

minimize ½|d|² + v,

and we have

v^k = Ĥ^k(x^k + d^k),   (2.8)

H(x^k + t^k d^k; x^k) ≤ H(x^k; x^k) + m t^k v^k.   (2.9)

This follows from the nonpositivity of m t^k v^k and the fact that H(x^k; x^k) = 0 owing to x^k ∈ S.

It is known (Kiwiel, 1981a) that the above method of centers is globally convergent (to stationary points of (1.1) if the problem functions are nonconvex), and that the rate of convergence is at least linear under standard second order sufficiency conditions of optimality. This justifies our efforts to extend the method to more general nondifferentiable problems.
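For concreteness, a brief sketch (ours, in Python; the function names and toy data are illustrative, not from the text) of the improvement function (2.1) and the descent test (2.9): a candidate step t along d is accepted when it decreases H(·; x^k) by at least the fraction m of the predicted decrease t v^k.

```python
# Sketch (assumptions ours) of the improvement function and descent test.
def H(y, x, f, F):
    return max(f(y) - f(x), F(y))                    # improvement function (2.1)

def accept_step(x, d, t, v, f, F, m=0.1):
    y = [xi + t * di for xi, di in zip(x, d)]
    return H(y, x, f, F) <= H(x, x, f, F) + m * t * v   # test (2.9); H(x;x)=0 on S

if __name__ == "__main__":
    f = lambda z: abs(z[0]) + z[1] ** 2               # toy convex data
    F = lambda z: z[0] + z[1] - 1.0                   # feasible means F <= 0
    x, d, v = [0.5, 0.0], [-1.0, 0.0], -0.5
    print(accept_step(x, d, 1.0, v, f, F), accept_step(x, d, 0.25, v, f, F))
    # -> False True: the full step is rejected, the shorter step is accepted
```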
Although the methods given below will not require the special form (2.4) of the problem functions, they are based on similar representations.
Suppose that at the k-th iteration we have the current point x^k ∈ S and some auxiliary points y^j, j ∈ J_f^k ∪ J_F^k, and subgradients g_f^j = g_f(y^j), j ∈ J_f^k, and g_F^j = g_F(y^j), j ∈ J_F^k, where J_f^k and J_F^k are some subsets of {1,…,k}. Define the linearizations

f_j(x) = f(y^j) + ⟨g_f^j, x − y^j⟩, j ∈ J_f^k,   (2.10)

F_j(x) = F(y^j) + ⟨g_F^j, x − y^j⟩, j ∈ J_F^k.

minimize ½|d|² + v,

minimize Ĥ_s^k(x^k + d) + ½|d|² over all d,   (2.14)

Ĥ_s^k(x^k + d^k) = v^k.   (2.15)
In the next section we prove that v^k ≤ 0 and that v^k = 0 only if x^k ∈ X̄. Therefore we may now suppose that v^k is negative. The line search rule of the above-described method of centers must be modified here, because v^k need no longer be an upper estimate of the directional derivative of H(·; x^k) at x^k in the direction d^k. This is due to the fact that Ĥ_s^k(·) may poorly approximate H(·; x^k). However, we still have

v^k = Ĥ_s^k(x^k + d^k) − H(x^k; x^k),   (2.17)

H(x^k + d^k; x^k) − H(x^k; x^k) ≤ m[Ĥ_s^k(x^k + d^k) − H(x^k; x^k)].   (2.19)

If a stepsize t_L^k ≥ t̃ satisfying (2.18) is found, then the method can execute a serious step by setting x^{k+1} = x^k + t_L^k d^k and y^{k+1} = x^{k+1}. Otherwise a null step is taken by setting x^{k+1} = x^k. In this case an auxiliary stepsize t_R^k ≥ t̃ is known from the search for t_L^k. Then the trial point y^{k+1} = x^k + t_R^k d^k

for some sets Ĵ_f^k ⊂ J_f^k and Ĵ_F^k ⊂ J_F^k. From Chapter 1 and Chapter 2 we know that at least three approaches to the selection of Ĵ_f^k and Ĵ_F^k are possible. First, one may use subgradient accumulation by choosing J_f^{k+1} = J_f^k ∪ {k+1} and J_F^{k+1} = J_F^k ∪ {k+1}, which results in
λ_j ≥ 0, j ∈ J_f^k, μ_j ≥ 0, j ∈ J_F^k, Σ_{j∈J_f^k} λ_j + Σ_{j∈J_F^k} μ_j = 1,   (2.21a)

[F_j^k + ⟨g_F^j, d^k⟩ − v^k] μ_j^k = 0, j ∈ J_F^k,   (2.21c)

p^k = Σ_{j∈J_f^k} λ_j^k g_f^j + Σ_{j∈J_F^k} μ_j^k g_F^j,   (2.21d)

d^k = −p^k,   (2.21e)

v^k = −{|p^k|² + Σ_{j∈J_f^k} λ_j^k [f(x^k) − f_j^k] − Σ_{j∈J_F^k} μ_j^k F_j^k},   (2.21f)

f_j^k − f(x^k) + ⟨g_f^j, d^k⟩ ≤ v^k, j ∈ J_f^k,   (2.21g)

F_j^k + ⟨g_F^j, d^k⟩ ≤ v^k, j ∈ J_F^k.   (2.21h)
(iii) Multipliers λ_j^k, j ∈ J_f^k, and μ_j^k, j ∈ J_F^k, satisfy (2.21) if and only if they solve the following dual of (2.12)

Ĵ_f^k = {j ∈ J_f^k : λ_j^k > 0} and Ĵ_F^k = {j ∈ J_F^k : μ_j^k > 0}   (2.23a)

satisfy   (2.23b)

subject to Σ_{j∈J_f^k} λ_j + Σ_{j∈J_F^k} μ_j = 1,   (2.24)

Σ_{j∈J_f^k} λ_j g_f^j + Σ_{j∈J_F^k} μ_j g_F^j = p^k,

λ_j ≥ 0, j ∈ J_f^k, μ_j ≥ 0, j ∈ J_F^k,

minimize ½|d|² + v over (d, v) ∈ R^{N+1},

subject to f_j^k − f(x^k) + ⟨g_f^j, d⟩ ≤ v, j ∈ Ĵ_f^k.   (2.25)
Lemma 2.3(iv) and the generalized cutting plane idea from Section 2.2 lead us to the following subgradient selection strategy. Subproblem (2.25) is a reduced, equivalent version of subproblem (2.12). Therefore the choice of J_f^{k+1} and J_F^{k+1} specified by (2.20) and (2.23) conforms with the generalized cutting plane concept, because it consists in appending to a reduced subproblem linear constraints generated by the latest subgradients. Thus only those past subgradients that contribute to the current search direction are retained, see (2.21a), (2.21d), (2.21e) and (2.23). Subgradient selection results in implementable algorithms that require storage of at most N+1 past subgradients.

In the subgradient aggregation strategy we shall construct an auxiliary reduced subproblem by forming surrogate constraints with the help of Lagrange multipliers of (2.12). As expounded in Chapter 2, subgradient aggregation consists in forming convex combinations of the past subgradients of a given function on the basis of the corresponding Lagrange multipliers. Here a slight complication arises from the fact that the Lagrange multipliers associated with each of the problem functions (λ^k with f, and μ^k with F) do not form separate convex combinations, see (2.21a). Yet the subgradients of f should be aggregated separately from those of F, since otherwise the mixing of subgradients would spoil crucial properties of subgradient aggregation. For separate subgradient aggregation we shall use scaled versions of Lagrange multipliers of (2.12). A suitable scaling procedure, which yields separate convex combinations, is given below.
Let (λ^k, μ^k) denote any vectors of Lagrange multipliers of (2.12), which do not necessarily satisfy (2.23), and let the numbers ν_f^k, λ̃_j^k, j ∈ J_f^k, ν_F^k, μ̃_j^k, j ∈ J_F^k, satisfy

ν_f^k = Σ_{j∈J_f^k} λ_j^k, λ_j^k = ν_f^k λ̃_j^k, j ∈ J_f^k,   (2.26a)

ν_F^k = Σ_{j∈J_F^k} μ_j^k, μ_j^k = ν_F^k μ̃_j^k, j ∈ J_F^k,   (2.26b)

λ̃_j^k ≥ 0, j ∈ J_f^k, Σ_{j∈J_f^k} λ̃_j^k = 1,   (2.26c)

μ̃_j^k ≥ 0, j ∈ J_F^k, Σ_{j∈J_F^k} μ̃_j^k = 1,   (2.26d)

ν_f^k ≥ 0, ν_F^k ≥ 0, ν_f^k + ν_F^k = 1.   (2.27)

satisfy (2.26a) and (2.26d) in view of (2.21a). If ν_f^k = 0 then λ_j^k = 0 for all j ∈ J_f^k by (2.21a), hence (2.26a) is trivially fulfilled by any numbers λ̃_j^k satisfying (2.26c). Similarly one may choose μ̃_j^k.

The above scaled Lagrange multipliers (λ̃^k, μ̃^k) will be used for subgradient aggregation as follows.

(p_f^k, f̃_p^k) = Σ_{j∈J_f^k} λ̃_j^k (g_f^j, f_j^k) and (p_F^k, F̃_p^k) = Σ_{j∈J_F^k} μ̃_j^k (g_F^j, F_j^k).   (2.28)

Then

p^k = ν_f^k p_f^k + ν_F^k p_F^k,   (2.29)

v^k = −{|p^k|² + ν_f^k [f(x^k) − f̃_p^k] − ν_F^k F̃_p^k}.   (2.30)

minimize ½|d|² + v over (d, v) ∈ R^{N+1},

Proof. (2.29) and (2.30) follow easily from (2.21) and (2.26). The equivalence of (2.31) and (2.12) can be shown as in the proof of Lemma 2.2.2.
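The scaling procedure (2.26)-(2.27) and the aggregation (2.28)-(2.29) are mechanical; a small sketch (ours, in Python; all names illustrative) makes the splitting explicit and checks the consistency relation p^k = ν_f^k p_f^k + ν_F^k p_F^k:

```python
# Sketch (assumptions ours) of multiplier scaling (2.26)-(2.27) and aggregation (2.28)-(2.29).
import numpy as np

def scale_multipliers(lam, mu):
    nu_f, nu_F = lam.sum(), mu.sum()                   # nu_f + nu_F = 1 by (2.21a)
    lam_t = lam / nu_f if nu_f > 0 else np.full_like(lam, 1.0 / max(len(lam), 1))
    mu_t = mu / nu_F if nu_F > 0 else np.full_like(mu, 1.0 / max(len(mu), 1))
    return nu_f, lam_t, nu_F, mu_t                     # lam_t, mu_t each sum to 1

if __name__ == "__main__":
    lam, mu = np.array([0.3, 0.1]), np.array([0.6])
    nu_f, lam_t, nu_F, mu_t = scale_multipliers(lam, mu)
    Gf, GF = np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[-1.0, 1.0]])
    p = nu_f * (Gf.T @ lam_t) + nu_F * (GF.T @ mu_t)   # p^k via (2.28)-(2.29)
    print(nu_f, lam_t, nu_F, mu_t, p)                  # equals Gf.T@lam + GF.T@mu
```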
f̃^k(x) = f̃_p^k + ⟨p_f^k, x − x^k⟩ and F̃^k(x) = F̃_p^k + ⟨p_F^k, x − x^k⟩.   (2.32)

The rules for updating the linearizations can be taken from Chapter 2:

f̃^k(x) = f_p^{k+1} + ⟨p_f^k, x − x^{k+1}⟩ and F̃^k(x) = F_p^{k+1} + ⟨p_F^k, x − x^{k+1}⟩,   (2.38)

where

H̃^k(x) = max{f̃^k(x) − f(x^k), F̃^k(x)} for all x.   (2.40)

minimize ½|d|² + v over (d, v) ∈ R^{N+1},

subject to f_j^k − f(x^k) + ⟨g_f^j, d⟩ ≤ v, j ∈ J_f^k,

p_f^0 = g_f^1 = g_f(x^1), f_p^1 = f_1^1 = f(x^1), J_f^1 = {1},   (2.42)

p_F^0 = g_F^1 = g_F(x^1), F_p^1 = F_1^1 = F(x^1), J_F^1 = {1}.
λ_j^k ≥ 0, j ∈ J_f^k, λ_p^k ≥ 0, μ_j^k ≥ 0, j ∈ J_F^k, μ_p^k ≥ 0,   (2.43a)

p^k = Σ_{j∈J_f^k} λ_j^k g_f^j + λ_p^k p_f^{k-1} + Σ_{j∈J_F^k} μ_j^k g_F^j + μ_p^k p_F^{k-1},   (2.43c)

d^k = −p^k,   (2.43d)

ν_F^k = Σ_{j∈J_F^k} μ_j^k + μ_p^k, μ_j^k = ν_F^k μ̃_j^k, j ∈ J_F^k, μ_p^k = ν_F^k μ̃_p^k,   (2.44b)

λ̃_j^k ≥ 0, j ∈ J_f^k, λ̃_p^k ≥ 0, Σ_{j∈J_f^k} λ̃_j^k + λ̃_p^k = 1,   (2.44c)

μ̃_j^k ≥ 0, j ∈ J_F^k, μ̃_p^k ≥ 0, Σ_{j∈J_F^k} μ̃_j^k + μ̃_p^k = 1,   (2.44d)

ν_f^k ≥ 0, ν_F^k ≥ 0, ν_f^k + ν_F^k = 1,   (2.44e)

(p_f^k, f̃_p^k) = Σ_{j∈J_f^k} λ̃_j^k (g_f^j, f_j^k) + λ̃_p^k (p_f^{k-1}, f_p^k),   (2.45)

(p_F^k, F̃_p^k) = Σ_{j∈J_F^k} μ̃_j^k (g_F^j, F_j^k) + μ̃_p^k (p_F^{k-1}, F_p^k).

Moreover, relations (2.29) and (2.30) also hold for the method with subgradient aggregation based on (2.43)-(2.45).
minimize Ĥ_a^k(x^k + d) + ½|d|² over all d,   (2.46)

and one can compute subgradients g_f^{k,j} ∈ ∂f_j(x^k), j ∈ J_f, and g_F^{k,j} ∈ ∂F_j(x^k), j ∈ J_F. Then one may append the constraints to the search direction subproblems (2.12) and (2.41), for all k. This speeds up convergence, but at the cost of more work per iteration. One may also replace the sets J_f and J_F in (2.49) with the sets for some ε ≥ 0. Such augmentations should be used especially if there are many constraints in the original formulation of problem (1.1), cf. Remark 2.2. It is straightforward to extend the subgradient selection and aggregation rules to the augmented subproblems. The subsequent results hold also for such modifications.
Algorithm 3.1

Step 0 (Initialization). Select a starting point x^1 ∈ S, a final accuracy tolerance ε_s ≥ 0 and a line search parameter m ∈ (0,1). Set y^1 = x^1 and initialize the algorithm according to (2.42). Set the counters k = 1, l = 0 and k(0) = 1.

Step 1 (Direction finding). Compute multipliers λ_j^k, j ∈ J_f^k, λ_p^k, μ_j^k, j ∈ J_F^k, and μ_p^k that solve the k-th dual search direction finding subproblem

minimize ½|Σ_{j∈J_f^k} λ_j g_f^j + λ_p p_f^{k-1} + Σ_{j∈J_F^k} μ_j g_F^j + μ_p p_F^{k-1}|² + ⋯

subject to Σ_{j∈J_f^k} λ_j + λ_p + Σ_{j∈J_F^k} μ_j + μ_p = 1.

w^k = ½|p^k|² + ν_f^k [f(x^k) − f̃_p^k] − ν_F^k F̃_p^k.   (3.2)

Step 4 (Linearization updating). Set x^{k+1} = x^k + t_L^k d^k. Choose some sets J_f^{k+1} = Ĵ_f^k ∪ {k+1} and J_F^{k+1} = Ĵ_F^k ∪ {k+1}. Increase k by 1 and go to Step 1.

Remark 3.2. It follows from Lemma 2.3 that in Algorithm 3.1 (d^k, v^k)

Remark 3.3. As noted in the previous section, the line search guarantees that x^{k+1} ∈ S if x^k ∈ S. Since x^1 ∈ S by assumption, it follows that Algorithm 3.1 is a feasible point method. In particular, H(x^k; x^k) = 0 for all k.
Remark 3.4. For convenience, the above version of the algorithm requires the subgradient mappings g_f and g_F to be defined everywhere. In fact, g_f need not be defined at x ∉ S. In this case, in Step 4 set g_f^{k+1} = g_f(x^{k+1}) and f_{k+1} = f(x^{k+1}) if y^{k+1} ∉ S. If g_F is not defined
4. Convergence
(p_f^k, f̃_p^k) ∈ conv{(g_f^j, f_j^k) : j = 1,…,k},   (4.1)

(p_F^k, F̃_p^k) ∈ conv{(g_F^j, F_j^k) : j = 1,…,k}.

If additionally k > 1, then

(p_f^{k-1}, f_p^k) ∈ conv{(g_f^j, f_j^k) : j = 1,…,k−1},   (4.2)

(p_F^{k-1}, F_p^k) ∈ conv{(g_F^j, F_j^k) : j = 1,…,k−1}.

α̃_{f,p}^k = f(x^k) − f̃_p^k, α̃_{F,p}^k = −F̃_p^k,   (4.3c)

α̃_p^k = ν_f^k [f(x^k) − f̃_p^k] − ν_F^k F̃_p^k,   (4.3d)

g_f^j ∈ ∂_ε f(x^k) for ε = α_{f,j}^k, j = 1,…,k,   (4.4a)

p_f^k ∈ ∂_ε f(x^k) for ε = α̃_{f,p}^k.   (4.4e)
Proof. Using (2.34), (2.36), Lemma 4.1 and the fact that we always have F(x)_+ ≥ F(x) and F(x^k)_+ = 0, one obtains (4.4a)-(4.4f) as in the proof of Lemma 2.4.2. In particular, similarly to (2.4.5c) we get

F(x) ≥ ⟨p_F^k, x − x^k⟩ + F̃_p^k,

which yields, for each x ∈ R^N, since ν_f^k ≥ 0 and ν_F^k ≥ 0 satisfy ν_f^k + ν_F^k = 1,

H(x; x^k) = max{f(x) − f(x^k), F(x)} ≥ ν_f^k [f(x) − f(x^k)] + ν_F^k F(x),

hence

H(x; x^k) ≥ H(x^k; x^k) + ⟨p^k, x − x^k⟩ − α̃_p^k for all x.
Remark 4.3. In view of (4.4), the linearization errors (4.3) may also be called subgradient locality measures, because they indicate the distance from subgradients to the corresponding subdifferentials at the current point x^k. For instance, the value of α̃_p^k ≥ 0 indicates how much p^k differs from being a member of ∂H(x^k; x^k); if α̃_p^k = 0 then p^k ∈ ∂H(x^k; x^k).

The following result is useful for justifying the stopping criterion of the algorithm.

w^k = ½|p^k|² + α̃_p^k,   (4.5)

v^k = −{|p^k|² + α̃_p^k},   (4.6)

v^k ≤ −w^k ≤ 0.   (4.7)

Proof. This follows easily from (3.2), (2.30), (4.3d) and the nonnegativity of α̃_p^k.

because |p^k| indicates how much p^k differs from the null vector.
From now on we suppose that the algorithm does not terminate, i.e. w^k > 0 for all k. Since the line search rules imply that we always have

f(x^{k+1}) − f(x^k) ≤ m t_L^k v^k   (4.9)

with m > 0 and t_L^k ≥ 0, the fact that v^k ≤ −w^k < 0 (see (4.7)) yields that the sequence {f(x^k)} is nonincreasing.

Lemma 4.8. Suppose that the sequence {f(x^k)} is bounded from below. Then

Σ_{k=1}^∞ {t_L^k |p^k|² + t_L^k α̃_p^k} < +∞,   (4.10)

x^k = x^{k(l)} for k = k(l), k(l)+1,…,k(l+1)−1,   (4.11)

t_L^{k(l)-1} > 0 for all l.
Lemma 4.9. Suppose that there exist an infinite set L ⊂ {1,2,…} and

minimize ½|Σ_{j∈J_f^k} λ_j g_f^j + λ_p p_f^{k-1} + Σ_{j∈J_F^k} μ_j g_F^j + μ_p p_F^{k-1}|² +

+ Σ_{j∈J_f^k} λ_j α_{f,j}^k + λ_p α_{f,p}^k + Σ_{j∈J_F^k} μ_j α_{F,j}^k + μ_p α_{F,p}^k,   (4.12)

subject to λ_j ≥ 0, j ∈ J_f^k, λ_p ≥ 0, μ_j ≥ 0, j ∈ J_F^k, μ_p ≥ 0,

Σ_{j∈J_f^k} λ_j + λ_p + Σ_{j∈J_F^k} μ_j + μ_p = 1.

Proof. As in the proof of Lemma 2.4.9, the assertion follows from (4.3), (4.5), (2.45), (2.43c) and the fact that the k-th Lagrange multipliers solve (3.1).

for all k > 1. They will be used in the following extension of Lemma 2.4.11.
Lemma 4.11. Suppose that t_L^{k-1} = 0 for some k > 1. Then

w^k ≤ φ_C(w^{k-1}),   (4.15)

C ≥ max{|p^{k-1}|, |g^k|, α̃_p^{k-1}, 1}.

Proof. (i) If t_L^{k-1} = 0 then the line search rules yield y^k = x^{k-1} + d^{k-1} and x^k = x^{k-1}, i.e. y^k = x^k + d^{k-1}, and

First, suppose that F(y^k) > m v^{k-1}. Then (4.13a), (4.3a), the rules of Step 4 yield

−α_{F,k}^k + ⟨g_F^k, d^{k-1}⟩ = F_k^k + ⟨g_F^k, y^k − x^k⟩ = F(y^k) > m v^{k-1}.   (4.17a)
Next, suppose that F(y^k) ≤ m v^{k-1}. Then (4.16) implies f(y^k) − f(x^k) > m v^{k-1}. Hence (4.13b), (4.3a) and the rules of Step 4 yield

λ_j(ν) = 0, j ∈ J_f^k, λ_p(ν) = (1−ν) ν_f^{k-1},   (4.18a)

μ_k(ν) = ν, μ_j(ν) = 0, j ∈ J_F^k \ {k}, μ_p(ν) = (1−ν) ν_F^{k-1},

λ_k(ν) = ν, λ_j(ν) = 0, j ∈ J_f^k \ {k}, λ_p(ν) = (1−ν) ν_f^{k-1},   (4.18b)

μ_j(ν) = 0, j ∈ J_F^k, μ_p(ν) = (1−ν) ν_F^{k-1}.

Observe that the multipliers (4.18) are feasible for subproblem (4.12) for each ν ∈ [0,1], because k ∈ J_f^k ∩ J_F^k and (2.44e) is satisfied. Moreover, for each ν,

Σ_{j∈J_f^k} λ_j(ν) g_f^j + λ_p(ν) p_f^{k-1} + Σ_{j∈J_F^k} μ_j(ν) g_F^j + μ_p(ν) p_F^{k-1} = (1−ν) p^{k-1} + ν g^k.   (4.19a)

This follows from (4.18), (2.29) and (4.13). Next, x^k = x^{k-1} implies f_p^k = f_p^{k-1} and F_p^k = F_p^{k-1} by (2.35), hence (4.18), (4.13) and (4.3) yield

Σ_{j∈J_f^k} λ_j(ν) α_{f,j}^k + λ_p(ν) α_{f,p}^k + Σ_{j∈J_F^k} μ_j(ν) α_{F,j}^k + μ_p(ν) α_{F,p}^k = (1−ν) α̃_p^{k-1} + ν α_k^k   (4.19b)

for each ν ∈ [0,1]. (4.19) and Lemma 4.10 imply that w^k is not larger than the optimal value of the problem

minimize ½|(1−ν) p^{k-1} + ν g^k|² + (1−ν) α̃_p^{k-1} + ν α_k^k   (4.20)

subject to ν ∈ [0,1].
Since we also have (4.14), one may complete the proof by using Lemma 2.4.10 for bounding the optimal value of (4.20), as in the proof of Lemma 2.4.11. []
Combining Lemma 4.7 with Lemma 4.9 and Lemma 4.12, and using (4.11), we obtain

Lemma 4.14. Suppose that a point x̄ ∈ S satisfies f(x̄) ≤ f(x^k) for all k. Then the sequence {x^k} is bounded and

Σ_{i=n}^∞ {|x^{i+1} − x^i|² + 2 t_L^i α̃_p^i} → 0 as n→∞.

Since the above inequality is of the form (2.4.28), one may complete the proof by using Lemma 4.8 similarly to the proof of Lemma 2.4.14.

Proof. Let x̂ ∈ X̄. Then x̂ ∈ S and f(x̂) ≤ f(x^k) for all k, hence Lemma 4.14 shows that {x^k} is bounded. By Theorem 4.15, {x^k} has an accumulation point x̄ ∈ X̄. For showing that x^k → x̄, use Lemma 4.14 and the proof of Theorem 2.4.15.

Lemma 4.17. Suppose that the sequence {f(x^k)} is bounded from below. Then w^k → 0.
Step 1 (Direction finding). Find multipliers λ_j^k, j ∈ J_f^k, and μ_j^k, j ∈ J_F^k, that solve the k-th dual subproblem (2.22), and sets Ĵ_f^k and Ĵ_F^k satisfying (2.23). Calculate scaled Lagrange multipliers satisfying (2.26). Compute (p_f^k, f̃_p^k) and (p_F^k, F̃_p^k) by (2.28), and use (2.29) and (2.30) for

λ_j(ν) = (1−ν) λ_j^{k-1}, j ∈ Ĵ_f^{k-1}, λ_k(ν) = 0,   (5.2a)

μ_j(ν) = (1−ν) μ_j^{k-1}, j ∈ Ĵ_F^{k-1}, μ_k(ν) = ν,

and (4.18b) by

λ_j(ν) = (1−ν) λ_j^{k-1}, j ∈ Ĵ_f^{k-1}, λ_k(ν) = ν,   (5.2b)

μ_j(ν) = (1−ν) μ_j^{k-1}, j ∈ Ĵ_F^{k-1}, μ_k(ν) = 0.

ν_f^{k-1}(p_f^{k-1}, f̃_p^{k-1}) = Σ_{j∈Ĵ_f^{k-1}} λ_j^{k-1} (g_f^j, f_j^k), ν_F^{k-1}(p_F^{k-1}, F̃_p^{k-1}) = Σ_{j∈Ĵ_F^{k-1}} μ_j^{k-1} (g_F^j, F_j^k),   (5.4)

for all ν ∈ [0,1], if t_L^{k-1} = 0. In view of Lemma 5.2, (5.4) suffices for completing the proof of Lemma 4.11 for Algorithm 5.1. The remaining proofs need not be modified.
6. Line Search Modifications

Step 3' (Line search). Select an auxiliary stepsize t_R^k ∈ [t̃, 1] and set y^{k+1} = x^k + t_R^k d^k. If

Lemma 6.1. Suppose that a point y = x^k + t d^k satisfies F(y) > m t v^k for some t ∈ (0,1], where F(x^k) ≤ 0. Let g = g_F(y) ∈ ∂F(y) and α = −[F(y) + ⟨g, x^k − y⟩]. Then −α + ⟨g, d^k⟩ > m v^k.

Proof. By assumption,

−α + ⟨g, d^k⟩ = F(y) − t⟨g, d^k⟩ + ⟨g, d^k⟩ > t m v^k + (1−t)⟨g, d^k⟩,

since t ∈ (0,1].

Step 3'' (Line search). Select an auxiliary stepsize t_R^k ∈ [t̃, 1] and set y^{k+1} = x^k + t_R^k d^k. If the descent tests hold then set t_L^k = t_R^k (a serious step); otherwise, i.e. if at least one of the inequalities (6.2) is violated, set t_L^k = 0 (a null step).
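A two-point line search in the spirit of Step 3'' can be sketched as follows (ours, in Python; the structure and parameter names are illustrative). A serious step takes t_L = t_R when the descent tests hold; otherwise t_L = 0 and the trial point y^{k+1} = x^k + t_R d^k still supplies a new subgradient that cuts into the next polyhedral model.

```python
# Sketch (assumptions ours) of a two-point serious/null step search.
import numpy as np

def two_point_search(x, d, v, f, F, m=0.1, t_min=0.01):
    t = 1.0
    while t >= t_min:
        y = x + t * d
        if f(y) <= f(x) + m * t * v and F(y) <= 0.0:
            return t, t, y              # serious step: t_L = t_R = t
        t *= 0.5
    t_R = 2.0 * t                       # smallest stepsize actually tried
    return 0.0, t_R, x + t_R * d        # null step: t_L = 0, y supplies g(y)

if __name__ == "__main__":
    f = lambda z: abs(z[0]) + abs(z[1])
    F = lambda z: z[0] - 1.0
    print(two_point_search(np.array([1.0, 1.0]), np.array([-1.0, -1.0]), -2.0, f, F))
```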
7. Phase I - Phase II Methods

minimize ½|d|² + v,

subject to f_j^k − f(x^k) − F(x^k)_+ + ⟨g_f^j, d⟩ ≤ v, j ∈ J_f^k,

F_j^k − F(x^k)_+ + ⟨g_F^j, d⟩ ≤ v, j ∈ J_F^k,
minimize ½|Σ_{j∈J_f^k} λ_j g_f^j + λ_p p_f^{k-1} + Σ_{j∈J_F^k} μ_j g_F^j + μ_p p_F^{k-1}|² + ⋯   (7.2)

subject to λ_j ≥ 0, j ∈ J_f^k, λ_p ≥ 0, μ_j ≥ 0, j ∈ J_F^k, μ_p ≥ 0,

Σ_{j∈J_f^k} λ_j + λ_p + Σ_{j∈J_F^k} μ_j + μ_p = 1,

with a solution denoted by λ_j^k, j ∈ J_f^k, λ_p^k, μ_j^k, j ∈ J_F^k, μ_p^k. If we denote by (d^k, u^k) the solution of subproblem (2.41) then

v^k = u^k − F(x^k)_+,   (7.3)

v^k = Ĥ^k(x^k + d^k) − H(x^k; x^k) ≤ 0,   (7.4)

which will be used at line searches.
It is easy to observe that if F(x^k) ≤ 0 then v^k = u^k and subproblems (7.1) and (7.2) reduce to subproblems (2.41) and (3.1), respectively. In fact, even for F(x^k) > 0 subproblems (7.1) and (2.41) are essentially equivalent in view of (7.3), and can be regarded as quadratic programming formulations of the following

minimize Ĥ_a^k(x^k + d) + ½|d|² over all d.
p^k = Σ_{j∈J_f^k} λ_j^k g_f^j + λ_p^k p_f^{k-1} + Σ_{j∈J_F^k} μ_j^k g_F^j + μ_p^k p_F^{k-1},

λ_j^k ≥ 0, j ∈ J_f^k, λ_p^k ≥ 0, μ_j^k ≥ 0, j ∈ J_F^k, μ_p^k ≥ 0,

Σ_{j∈J_f^k} λ_j^k + λ_p^k + Σ_{j∈J_F^k} μ_j^k + μ_p^k = 1,

α_{f,j}^k = f(x^k) − f_j^k, j = 1,…,k,

α_{f,p}^k = f(x^k) − f_p^k,   (7.5)

α_{F,j}^k = F(x^k)_+ − F_j^k, j = 1,…,k,

α_{F,p}^k = F(x^k)_+ − F_p^k,

Σ_{j∈J_f^k} λ_j + λ_p + Σ_{j∈J_F^k} μ_j + μ_p = 1.
Step 1'' (Direction finding). Find a solution λ_j^k, j ∈ J_f^k, λ_p^k, μ_j^k, j ∈ J_F^k, and μ_p^k to the k-th dual subproblem (7.6). Calculate multipliers ν_f^k, λ̃_j^k, j ∈ J_f^k, λ̃_p^k, ν_F^k, μ̃_j^k, j ∈ J_F^k, and μ̃_p^k satisfying (2.44). Compute (p_f^k, f̃_p^k) and (p_F^k, F̃_p^k) by (2.45) and use (2.29) for calculating p^k. Set d^k = −p^k and

w^k = ½|p^k|² + ν_f^k [f(x^k) − f̃_p^k + F(x^k)_+] + ν_F^k [F(x^k)_+ − F̃_p^k].   (7.7b)

If w^k ≤ ε_s terminate; otherwise, go to Step 3.
α̃_{f,p}^k = f(x^k) − f̃_p^k,

α̃_{F,p}^k = F(x^k)_+ − F̃_p^k.   (7.8)
Lemma 7.2. At the k-th iteration of Algorithm 7.1, one has (4.4) and

p_F^{k-1} ∈ ∂_ε H(x^k; x^k) for ε = α_{F,p}^k,   (7.9d)

p_f^k ∈ ∂_ε H(x^k; x^k) for ε = α̃_{f,p}^k + F(x^k)_+,   (7.9e)

p_F^k ∈ ∂_ε H(x^k; x^k) for ε = α̃_{F,p}^k.   (7.9f)

f(x) ≥ ⟨g_f^j, x − x^k⟩ + f_j^k,

hence

F(x) ≥ ⟨g_F^j, x − x^k⟩ + F_j^k

and

H(x; x^k) ≥ F(x) ≥ ⟨g_F^j, x − x^k⟩ + F_j^k,

which implies (7.9b). In view of Lemma 4.1, one may take convex combinations of (7.10) to obtain (7.9c)-(7.9f). In particular, we have

H(x; x^k) ≥ H(x^k; x^k) + ⟨p_f^k, x − x^k⟩ − [f(x^k) − f̃_p^k + F(x^k)_+],   (7.11a)

H(x; x^k) ≥ H(x^k; x^k) + ⟨p_F^k, x − x^k⟩ − [F(x^k)_+ − F̃_p^k],   (7.11b)

H(x; x^k) ≥ H(x^k; x^k) + ⟨ν_f^k p_f^k + ν_F^k p_F^k, x − x^k⟩ + ⋯

from (2.29). Setting x = x^k, we get α̃_p^k ≥ 0. This completes the proof of (7.9g). (4.4a)-(4.4f) can be established as in the proof of Lemma 4.2.

From (7.7) and (7.8) we deduce that Lemma 4.4 holds for Algorithm 7.1. Then relation (4.8) follows from (7.9g) and (4.5), so Lemma 4.5 remains valid for Algorithm 7.1.
As observed above, we may assume that F(x^k) > 0 for all k. Since the line search rules imply that we always have the descent relation below, we obtain

Lemma 7.3. At the k-th iteration of Algorithm 7.1, w^k is the optimal value of subproblem (7.6).

and (4.16) by

−α_{F,k}^k + ⟨g_F^k, d^{k-1}⟩ = −[F(x^k)_+ − F(y^k) − ⟨g_F^k, x^k − y^k⟩] + ⟨g_F^k, d^{k-1}⟩ = ⋯,

−α_{f,k}^k + ⟨g_f^k, d^{k-1}⟩ = −[f(x^k) − f(y^k) − ⟨g_f^k, x^k − y^k⟩ + F(x^k)_+] + ⟨g_f^k, d^{k-1}⟩ = ⋯,

+ Σ_{j∈J_F^k} μ_j(ν) α_{F,j}^k + μ_p(ν) α_{F,p}^k = (1−ν) α̃_p^{k-1} + ν α_k^k,   (7.16)

and use it together with (4.19a) to deduce from Lemma 7.3 that w^k is majorized by the optimal value of (4.20), as before.

Since Lemma 4.12 is valid for Algorithm 7.1, we obtain the following result.
(i) If F(x^k) > 0 for all k, i.e. the algorithm stays at phase I, then every accumulation point of {x^k} is a solution to problem (1.1).

(ii) If F(x^k) ≤ 0 for some k̄ ≥ 1, then F(x^k) ≤ 0 for all k ≥ k̄ and

F(x^k) > 0 and F(y^{k+1}) ≤ F(x^k) + m t_R^k v^k,   (7.17a)

or

then set t_L^k = t_R^k (a serious step); otherwise set t_L^k = 0 (a null step).

If Step 3''' is used then at phase I the algorithm will ignore the objective function values at line searches until a feasible point is

or

f(y^{k+1}) ≤ f(x^k) + m t_R^k v^k and F(y^{k+1}) ≤ 0,   (7.18b)

then set t_L^k = t_R^k; otherwise set t_L^k = 0.
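The phase I - phase II acceptance rules (7.17)-(7.18) amount to a simple branch, sketched here (ours, in Python; the toy problem data are illustrative): while infeasible (phase I) the constraint value must decrease sufficiently; once feasible (phase II) the objective must decrease and feasibility be kept.

```python
# Sketch (assumptions ours) of the step acceptance in (7.17)-(7.18).
def accept(x_k, y_next, t_R, v_k, f, F, m=0.1):
    if F(x_k) > 0.0:                                       # phase I: drive F down
        return F(y_next) <= F(x_k) + m * t_R * v_k
    return (f(y_next) <= f(x_k) + m * t_R * v_k            # phase II: descent on f
            and F(y_next) <= 0.0)                          # ... while staying in S

if __name__ == "__main__":
    f = lambda z: z[0] ** 2
    F = lambda z: z[0] - 1.0
    print(accept([2.0], [1.4], 0.5, -1.0, f, F))   # infeasible: F drops enough -> True
    print(accept([0.5], [0.3], 0.5, -1.0, f, F))   # feasible: f drops, stays in S -> True
```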
so that we have

from Lemma 2.6.1 and the fact that α_{F,k}^k = F(x^k)_+ − F_k^k = F(x^k) − F_k^k if F(x^k) > 0. Thus one can use (7.19) instead of (7.15) in the proof of Lemma 4.11.

We now pass to the phase I - phase II method with subgradient selection, which extends Algorithm 5.1 to the case of infeasible starting points.
Algorithm 7.5 is obtained from Algorithm 7.1 by replacing Step 1'' with

Step 1''' (Direction finding). Find multipliers λ_j^k, j ∈ J_f^k, and μ_j^k, j ∈ J_F^k, that solve the following k-th dual subproblem

minimize ½|Σ_{j∈J_f^k} λ_j g_f^j + Σ_{j∈J_F^k} μ_j g_F^j|² + Σ_{j∈J_f^k} λ_j [α_{f,j}^k + F(x^k)_+] + Σ_{j∈J_F^k} μ_j α_{F,j}^k,   (7.20)

and the corresponding sets Ĵ_f^k and Ĵ_F^k that satisfy (2.23). Calculate scaled multipliers satisfying (2.26), compute (p_f^k, f̃_p^k) and (p_F^k, F̃_p^k) by (2.28), and use (2.29) for calculating p^k. Set d^k = −p^k and

v^k = −{|p^k|² + ν_f^k [α̃_{f,p}^k + F(x^k)_+] + ν_F^k α̃_{F,p}^k}.   (7.21)

minimize ½|d|² + v over (d, v) ∈ R^{N+1},
1. Introduction

S = {x ∈ R^N : F(x) ≤ 0}

is nonempty.

x^k ∈ S and f(x^{k+1}) < f(x^k) if x^{k+1} ≠ x^k, for all k,

a feasible starting point by minimizing F.
We shall also present phase I - phase II methods that can be employed when the user has a good, but infeasible, initial approximation to a solution. Starting from this point, phase I of such methods tries to find a feasible point without unduly increasing the objective value. At phase II the methods reduce to feasible point algorithms.

The algorithms of this chapter may be viewed as extensions of the Pironneau and Polak (1972; 1973) method of centers and method of feasible directions to the nondifferentiable case. One of the algorithms can be derived by applying our subgradient selection and aggregation rules to the Mifflin (1982) method. Also our extensions of the Polak, Mayne and Trahan (1979) phase I - phase II algorithm differ from those of Polak, Mayne and Wardi (1983).

We shall prove that each of our feasible point methods is globally convergent in the sense that it generates an infinite sequence of points {x^k} such that every accumulation point of {x^k} is stationary for f on S. If problem (1.1) is convex and satisfies the Slater constraint qualification (i.e. F(x̄) < 0 for some x̄ in R^N), then {x^k} is a minimizing sequence for f on S, which converges to a solution of problem (1.1) whenever f attains its infimum on S. Similar convergence results hold for our phase I - phase II methods.

In Section 2 we derive the methods. The algorithm with subgradient aggregation is described in detail in Section 3, and its convergence is established in Section 4. Section 5 is devoted to the algorithm with subgradient selection. In Section 6 we study various modifications of the methods with subgradient locality measures. Several versions of methods with subgradient deletion rules are analyzed in Section 7. In Section 8 we discuss methods that neglect linearization errors. Phase I - phase II methods are described in Section 9.
2. Derivation of the Methods

The necessary condition of optimality is 0 ∈ M̂(x̄). For this reason, a point x̄ ∈ S such that 0 ∈ M̂(x̄) is called stationary for f on S. Defining

for all x. By (2.2) and (2.5), M̂(·) ⊂ M(·), so, although we may have M(x̄) ≠ M̂(x̄), if x̄ solves (1.1) locally then 0 ∈ M(x̄). Therefore, we shall also say that a point x̄ ∈ S is stationary for f on S if 0 ∈ M(x̄).

x^{k+1} = x^k + t_L^k d^k for k = 1,2,…,

y^{k+1} = x^k + t_R^k d^k,

such that the subgradients g_f(y^{k+1}) and g_F(y^{k+1}) modify significantly the next polyhedral approximations to f and F that will be used for finding the next search direction.
f_j^k = f_j(x^k), F_j^k = F_j(x^k).

These easily updated quantities enable us not to store the points y^j.

At the k-th iteration we want to find a descent direction for H(·; x^k). Therefore, we need some measures, say α_{f,j}^k ≥ 0 and α_{F,j}^k ≥ 0, that indicate how much the subgradients g_f^j = g_f(y^j) and g_F^j = g_F(y^j) differ from being elements of ∂H(x^k; x^k). To this end, we shall use the following subgradient locality measures

α_{f,j}^k = max{|f(x^k) − f_j^k|, γ_f (s_j^k)²},   (2.8a)
see Lemma 5.7.2. Next, suppose that F is nonconvex and the value of
In the convex case, the methods of Chapter 5 would use the following search direction finding subproblem
where
H^k(x) = max{f(x) - f(x^k), F(x)},
α_{f,j}^k = f(x^k) - f_j^k,
α_{F,j}^k = -F_j^k,
F_j(x) = F_j^k + <g_F^j, x - x^k> = -α_{F,j}^k + <g_F^j, x - x^k>,
so
Ĥ^k(x) = max[max{-α_{f,j}^k + <g_f^j, x - x^k> : j ∈ J_f^k},
minimize ½|d|^2 + v
over (d,v) ∈ R^{N+1}
subject to -α_{f,j}^k + <g_f^j, d> ≤ v, j ∈ J_f^k,  (2.14)
-α_{F,j}^k + <g_F^j, d> ≤ v, j ∈ J_F^k.
Also
v̂^k = Ĥ^k(x^k + d^k) = Ĥ^k(x^k + d^k) - Ĥ^k(x^k).
minimize ½|Σ_{j∈J_f^k} λ_j g_f^j + Σ_{j∈J_F^k} μ_j g_F^j|^2 + Σ_{j∈J_f^k} λ_j α_{f,j}^k + Σ_{j∈J_F^k} μ_j α_{F,j}^k  (2.15)
subject to λ_j ≥ 0, j ∈ J_f^k, μ_j ≥ 0, j ∈ J_F^k, Σ_{j∈J_f^k} λ_j + Σ_{j∈J_F^k} μ_j = 1,
-d^k = Σ_{j∈J_f^k} λ_j^k g_f^j + Σ_{j∈J_F^k} μ_j^k g_F^j,  (2.16a)
v̂^k = -{|d^k|^2 + Σ_{j∈J_f^k} λ_j^k α_{f,j}^k + Σ_{j∈J_F^k} μ_j^k α_{F,j}^k},  (2.16b)
and
λ_j^k ≥ 0, j ∈ J_f^k, μ_j^k ≥ 0, j ∈ J_F^k, Σ_{j∈J_f^k} λ_j^k + Σ_{j∈J_F^k} μ_j^k = 1.
the values of α_{f,j}^k and α_{F,j}^k are relatively small, i.e. g_f^j and g_F^j are approximate subgradients of Ĥ^k at x^k.
Up till now we have not specified how to choose the sets J_f^k and J_F^k involved in (2.13) and (2.15). Since subproblem (2.15) is of the form studied in Chapter 5 (see Lemma 5.2.3), we may use the subgradient selection rules developed in that chapter for choosing J_f^k and J_F^k recursively so that at most N+3 past subgradients are used for each direction finding. Thus at the k-th iteration one can find Lagrange multipliers λ_j^k and μ_j^k of (2.15) and sets Ĵ_f^k ⊂ J_f^k and Ĵ_F^k ⊂ J_F^k such that
Ĵ_f^k = {j ∈ J_f^k : λ_j^k ≠ 0} and Ĵ_F^k = {j ∈ J_F^k : μ_j^k ≠ 0},
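For illustration, a minimal sketch of this selection rule in Python, assuming the dual multipliers of (2.15) have already been computed by some QP routine (the names select_indices, J_f, lam, mu and tol are ours, not the author's):

    def select_indices(J_f, lam, J_F, mu, tol=1e-12):
        # Keep only the indices whose Lagrange multipliers are nonzero,
        # i.e. J_f_hat = {j : lambda_j != 0}, J_F_hat = {j : mu_j != 0}.
        Jf_hat = [j for j, l in zip(J_f, lam) if l > tol]
        JF_hat = [j for j, m in zip(J_F, mu) if m > tol]
        return Jf_hat, JF_hat

Since the multipliers solve a QP over a simplex in at most N+3 unknowns, the retained sets are automatically small.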
α_{f,p}^k = max{|f(x^k) - f_p^k|, γ_f(s_f^k)^2},
(2.18)
α_{F,p}^k = max{|F_p^k|, γ_F(s_F^k)^2}.
The value of α_{f,p}^k (α_{F,p}^k) indicates how far p_f^{k-1} (p_F^{k-1}) is from ∂Ĥ^k(x^k).
-α_{f,p}^k + <p_f^{k-1}, x - x^k>,
-α_{F,p}^k + <p_F^{k-1}, x - x^k>,  (2.20)
minimize ½|d|^2 + v
over (d,v) ∈ R^{N+1}
subject to -α_{f,j}^k + <g_f^j, d> ≤ v, j ∈ Ĵ_f^k,
-α_{f,p}^k + <p_f^{k-1}, d> ≤ v,  (2.21)
-α_{F,j}^k + <g_F^j, d> ≤ v, j ∈ Ĵ_F^k,
-α_{F,p}^k + <p_F^{k-1}, d> ≤ v.
minimize ½|d|^2 + v
over (d,v) ∈ R^{N+1}
subject to -α_{f,j}^k + <g_f^j, d> ≤ v, j ∈ J_f^k,
-α_{f,p}^k + <p_f^{k-1}, d> ≤ v if r_a^k = 0,
-α_{F,j}^k + <g_F^j, d> ≤ v, j ∈ J_F^k,  (2.22)
-α_{F,p}^k + <p_F^{k-1}, d> ≤ v if r_a^k = 0,
λ_j^k ≥ 0, j ∈ J_f^k, λ_p^k ≥ 0, μ_j^k ≥ 0, j ∈ J_F^k, μ_p^k ≥ 0,
Σ_{j∈J_f^k} λ_j^k + λ_p^k + Σ_{j∈J_F^k} μ_j^k + μ_p^k = 1,
and use them for computing the current aggregate subgradients (cf. (3.3.4))
ν_f^k ≥ 0, ν_F^k ≥ 0, ν_f^k + ν_F^k = 1,  (2.25)
and that
μ̃_j^k = μ_j^k / ν_F^k, j ∈ J_F^k, μ̃_p^k = μ_p^k / ν_F^k
if ν_F^k ≠ 0. If ν_f^k = 0 (ν_F^k = 0) then one may pick any numbers satisfying (2.23c) ((2.23d)). We also have
d^k = -p^k,
p^k = ν_f^k p_f^k + ν_F^k p_F^k.  (2.26)
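A minimal sketch of this aggregation step, assuming the dual multipliers and stored subgradients are held in NumPy arrays (all names are ours and purely illustrative):

    import numpy as np

    def aggregate_direction(G_f, lam, lam_p, p_f_prev, G_F, mu, mu_p, p_F_prev):
        # nu_f, nu_F as in (3.3); aggregate subgradients and p^k as in (2.26).
        nu_f = lam.sum() + lam_p
        nu_F = mu.sum() + mu_p
        p_f = (lam @ G_f + lam_p * p_f_prev) / nu_f if nu_f > 0 else p_f_prev
        p_F = (mu @ G_F + mu_p * p_F_prev) / nu_F if nu_F > 0 else p_F_prev
        p = nu_f * p_f + nu_F * p_F
        return p, -p        # p^k and the search direction d^k = -p^k

Here G_f and G_F are matrices whose rows are the stored subgradients g_f^j and g_F^j; the convex weights come from the dual subproblem.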
x^{k+1}, one can obtain (p_f^{k+1}, f_p^{k+1}, s_f^{k+1}) and (p_F^{k+1}, F_p^{k+1}, s_F^{k+1}) by the updating rules of Section 3.2. In particular, we may define the k-th aggregate linearizations
J_f^{k+1} = Ĵ_f^k ∪ {k+1}, Ĵ_f^k ⊂ J_f^k,  (2.28a)
J_F^{k+1} = Ĵ_F^k ∪ {k+1}, Ĵ_F^k ⊂ J_F^k,  (2.28b)
for all k ≥ 1, and that the methods are initialized by setting y^1 = x^1 and
J_F^1 = {1}, g_F^1 = g_F(y^1), F_1^1 = F(y^1).  (2.29b)
J_f^{k+1} = Ĵ_f^k ∪ {k+1} if y^{k+1} ∈ S,
J_f^{k+1} = Ĵ_f^k if y^{k+1} ∉ S,  (2.30a)
where Ĵ_f^k ⊂ J_f^k, for all k. Then there is no need for (2.27a) if y^{k+1} is infeasible. (Another possibility is to use (2.28a) with (2.27a).)
J_F^{k+1} = Ĵ_F^k if y^{k+1} ∈ S,
J_F^{k+1} = Ĵ_F^k ∪ {k+1} if y^{k+1} ∉ S,  (2.30b)
where Ĵ_F^k ⊂ J_F^k, for all k. Then (2.27b) need not be used if y^{k+1} is feasible. In this case the last constraint of (2.22) should be dropped for all k such that y^j ∈ S for j = 1,...,k-1, i.e. we do not use the constraint subgradients until the first infeasible trial point is found. It will be seen that all the subsequent proofs need only minor changes to cover the rules (2.30), while the rules (2.28) require simpler notation. Specific techniques for dealing with the rules (2.30) will be described in Section 7; a sketch of the update is given below.
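A sketch of the feasibility-dependent update (2.30); function and variable names are illustrative, not the author's:

    def update_index_sets(Jf_hat, JF_hat, k, y_feasible):
        # (2.30): a feasible trial point contributes an objective subgradient,
        # an infeasible one contributes a constraint subgradient.
        if y_feasible:
            return Jf_hat + [k + 1], list(JF_hat)
        return list(Jf_hat), JF_hat + [k + 1]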
which indicate how far g_f(y) and g_F(y) are from ∂H(x;x), respectively. Since
we see that (2.8) differs from (2.32) by using the distance measures s_j^k instead of |x^k - y^j|. This enables us not to store the points y^j.
In fact, one may use α_f(x^k, y^j) and α_F(x^k, y^j) instead of α_{f,j}^k and α_{F,j}^k in the search direction finding subproblems (2.14) and (2.22).
Then the method with subgradient selection has subproblems of the form
while the k-th iteration of the method with subgradient aggregation uses
the subproblem
minimize ½|d|^2 + v
over (d,v) ∈ R^{N+1}
-α_{F,p}^k + <p_F^{k-1}, d> ≤ v if r_a^k = 0,
(p_f^k, f̃_p^k, s̃_f^k) = Σ_{j∈J_f^k} λ̃_j^k (g_f^j, f_j^k, |x^k - y^j|) + λ̃_p^k (p_f^{k-1}, f_p^k, s_f^k),
(2.35)
(p_F^k, F̃_p^k, s̃_F^k) = Σ_{j∈J_F^k} μ̃_j^k (g_F^j, F_j^k, |x^k - y^j|) + μ̃_p^k (p_F^{k-1}, F_p^k, s_F^k).
α_{f,j}^k = max{f(x^k) - f_j^k, γ_f(s_j^k)^2},
α_{F,j}^k = max{-F_j^k, γ_F(s_j^k)^2},
(2.37)
α_{f,p}^k = max{f(x^k) - f_p^k, γ_f(s_f^k)^2},
α_{F,p}^k = max{-F_p^k, γ_F(s_F^k)^2}.
We note that in the convex case (γ_f = γ_F = 0) the values of the subgradient locality measures (2.32), (2.8) and (2.18) coincide with those given by (2.36)-(2.37), respectively.
To compare our method with subgradient selection with the Mifflin (1982) algorithm, we shall need the following notation. Define the subgradient mapping
g(x) = g_f(x) if x ∈ S,
g(x) = g_F(x) if x ∉ S,  (2.38)
Suppose that the rules for choosing J_f^k and J_F^k satisfy (2.30) for all k, with J_f^1 = {1} and J_F^1 = ∅. Then we have
J_F^k = ∅
and
g_f^j = g(y^j) and α_f(x^k, y^j) = α(x^k, y^j) if y^j ∈ S,
J^k = J_f^k ∪ J_F^k for all k,
minimize ½|d|^2 + v
over (d,v) ∈ R^{N+1}
(2.40)
subject to -α(x^k, y^j) + <g(y^j), d> ≤ v, j ∈ J^k,
for all k. Moreover, this choice of J_f^k and J_F^k combined with the subgradient locality measures (2.37) and the search direction finding subproblems (2.14) leads to a version of the Mifflin (1982) algorithm that does not need storing the points y^j.
To sum up, we shall now comment on relations of the above-described methods with other algorithms. If we neglect the variables corresponding to the constraint function F then the methods reduce to the algorithms for unconstrained minimization from Chapter 3. In the convex case we automatically obtain the search direction finding subproblems studied in Chapter 5. Thus the methods generalize the method of centers for inequality constrained minimax problems (Kiwiel, 1981a), which in turn extends the Pironneau and Polak method of centers and method of feasible directions for smooth problems.
Algorithm 3.1.
J_f^1 = {1}, g_f^1 = p_f^0 = g_f(y^1), f_1^1 = f(y^1),
J_F^1 = {1}, g_F^1 = p_F^0 = g_F(y^1), F_1^1 = F(y^1),
Step 1 (Direction finding). Find multipliers λ_j^k, j ∈ J_f^k, λ_p^k, μ_j^k, j ∈ J_F^k, and μ_p^k that solve the following k-th dual search direction finding subproblem
subject to λ_j ≥ 0, j ∈ J_f^k, λ_p ≥ 0, μ_j ≥ 0, j ∈ J_F^k, μ_p ≥ 0,  (3.1)
λ_p = μ_p = 0 if r_a^k = 1,
where
α_{f,j}^k = max{|f(x^k) - f_j^k|, γ_f(s_j^k)^2}, α_{F,j}^k = max{|F_j^k|, γ_F(s_j^k)^2},  (3.2a)
α_{f,p}^k = max{|f(x^k) - f_p^k|, γ_f(s_f^k)^2}, α_{F,p}^k = max{|F_p^k|, γ_F(s_F^k)^2}.  (3.2b)
Compute
ν_f^k = Σ_{j∈J_f^k} λ_j^k + λ_p^k and ν_F^k = Σ_{j∈J_F^k} μ_j^k + μ_p^k.  (3.3)
Set
λ̃_j^k = λ_j^k / ν_f^k for j ∈ J_f^k and λ̃_p^k = λ_p^k / ν_f^k if ν_f^k ≠ 0,
λ̃_k^k = 1, λ̃_j^k = 0, j ∈ J_f^k \ {k}, λ̃_p^k = 0 if ν_f^k = 0,  (3.4)
μ̃_j^k = μ_j^k / ν_F^k for j ∈ J_F^k and μ̃_p^k = μ_p^k / ν_F^k if ν_F^k ≠ 0,
μ̃_k^k = 1, μ̃_j^k = 0, j ∈ J_F^k \ {k}, μ̃_p^k = 0 if ν_F^k = 0.  (3.5)
p^k = ν_f^k p_f^k + ν_F^k p_F^k,  (3.6)
d^k = -p^k,  (3.7)
α_{f,p}^k = max{|f(x^k) - f̃_p^k|, γ_f(s̃_f^k)^2},  (3.8a)
v^k = -{|p^k|^2 + ν_f^k α_{f,p}^k + ν_F^k α_{F,p}^k},  (3.8c)
w^k = ½|p^k|^2 + ν_f^k α_{f,p}^k + ν_F^k α_{F,p}^k.  (3.10)
two stepsizes t_L^k and t_R^k such that 0 ≤ t_L^k ≤ t_R^k and such that the two
t_R^k = t_L^k if t_L^k ≥ t̄,  (3.11c)
where
(3.12)
J_f^{k+1} = Ĵ_f^k ∪ {k+1} and J_F^{k+1} = Ĵ_F^k ∪ {k+1}.  (3.13)
s_j^{k+1} = s_j^k + |x^{k+1} - x^k| for j ∈ Ĵ_f^k ∪ Ĵ_F^k,
s_f^{k+1} = s̃_f^k + |x^{k+1} - x^k|,
a^{k+1} = max{a^k + |x^{k+1} - x^k|, s_{k+1}^{k+1}}.
If a^{k+1} ≤ ā then set r_a^{k+1} = 0 and go to Step 7. Otherwise, set r_a^{k+1} = 1 and go to Step 6.
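These updates admit a compact sketch; a rough illustration under our own naming, where the exact reset value s_{k+1}^{k+1} of the newest index is approximated by the largest stored measure:

    import numpy as np

    def update_distances(s, s_f, a, x_new, x_old, a_bar):
        # Shift every distance measure by the step length, then test a^{k+1};
        # s is a nonempty dict mapping index j to s_j.
        step = float(np.linalg.norm(x_new - x_old))
        s = {j: sj + step for j, sj in s.items()}
        s_f = s_f + step
        a = max(a + step, max(s.values()))
        reset = a > a_bar          # corresponds to setting r_a^{k+1} = 1
        return s, s_f, a, reset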
v̂^k = -{|p^k|^2 + Σ_{j∈Ĵ_f^k} λ̃_j^k α_{f,j}^k + λ̃_p^k α_{f,p}^k + Σ_{j∈Ĵ_F^k} μ̃_j^k α_{F,j}^k + μ̃_p^k α_{F,p}^k}.  (3.15)
One may, of course, solve the k-th primal search direction finding sub-
problem (2.22) in Step 1 of the method.
p^k ∈ ∂_α H(x^k; x^k) for α = α̃_p^k,
w^k = ½|p^k|^2 + α̃_p^k,  (3.17)
L(x,ν) = ν_f f(x) + ν_F F(x),
v̂^k = Ĥ^k(x^k + d^k) - Ĥ^k(x^k) < 0.
or
-α_{F,k+1}^{k+1} + <g_F^{k+1}, d^k> ≥ m_R v^k if t_L^k < t̄ and y^{k+1} ∉ S,
ensures that at least one of the two new subgradients will significantly modify the next polyhedral approximation to Ĥ^{k+1} after a null step or a short serious step. This prevents the algorithm from jamming at nonstationary points. The criterion (3.11e) is connected with the distance resetting strategy discussed below.
The line search rules (3.11) are general enough to allow for constructing many efficient procedures for executing Step 3 (Mifflin, 1982 and 1983). For completeness, we give below a simple extension of Line Search Procedure 3.3.1 for finding stepsizes t_L = t_L^k and t_R = t_R^k. In this procedure ζ is a fixed parameter satisfying ζ ∈ (0,0.5), x = x^k, d = d^k and v = v^k < 0.
(ii) If f(x + td) ≤ f(x) + m_L t v and F(x + td) ≤ 0 set t_L = t; otherwise set t_U = t.
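For illustration, a bisection sketch of step (ii) above, assuming callables f and F and a descent estimate v < 0 (parameter names and the termination tolerance are ours):

    def line_search(f, F, x, d, v, m_L=0.1, tol=1e-10):
        # Largest t_L giving sufficient descent AND feasibility F(x + t d) <= 0.
        t_L, t_U, t = 0.0, 1.0, 1.0
        while t_U - t_L > tol:
            if f(x + t * d) <= f(x) + m_L * t * v and F(x + t * d) <= 0:
                t_L = t
            else:
                t_U = t
            t = 0.5 * (t_L + t_U)
        return t_L, t_U   # t_R is then chosen in [t_L, t_U] by the null-step tests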
Lemma 3.3. If f and F are semismooth in the sense of (3.3.23) and (3.18)
and observe that we now have {t_L^i} ⊂ T_L and t* ∈ T_L, because both f and F are continuous, so we have F(x + t*d) ≤ 0 in addition to (3.3.24a). Using (2.31), (3.12), the continuity of f and F, and the local boundedness of g_f and g_F, we obtain
α(x + t*d, x + t_i d) → 0,
F(x + t*d) = 0,  (3.20c)
The subgradient deletion rules of Step 5 and Step 6 are taken from Algorithm 3.3.1. Therefore, similarly to (3.3.29), we have
k ∈ J_f^k if y^k ∈ S, and k ∈ J_F^k if y^k ∉ S.  (3.21b)
4. Convergence
Lemma 4.1. Suppose k ≥ 1 is such that Algorithm 3.1 did not stop before the k-th iteration, and let M = N+3. Then there exist numbers λ̂_i^k and μ̂_i^k, and vectors (y_f^{k,i}, f^{k,i}, s_f^{k,i}) ∈ R^N × R × R and (y_F^{k,i}, F^{k,i}, s_F^{k,i}) ∈ R^N × R × R, i = 1,...,M, satisfying
λ̂_i^k ≥ 0, i = 1,...,M, Σ_{i=1}^M λ̂_i^k = 1,  (4.1b)
and
(p_F^k, F̃_p^k, s̃_F^k) = Σ_{i=1}^M μ̂_i^k (g_F(y_F^{k,i}), F^{k,i}, s_F^{k,i}),
μ̂_i^k ≥ 0, i = 1,...,M, Σ_{i=1}^M μ̂_i^k = 1,
(g_F(y_F^{k,i}), F^{k,i}, s_F^{k,i}) ∈ {(g_F(y^j), F_j^k, s_j^k) : j = 1,...,k}, i = 1,...,M,  (4.2)
Lemma 4.2. (i) Suppose that a point x̄ ∈ R^N, N-vectors p̄_f, ȳ_f^i and ḡ_f^i, and numbers f̄_p, s̄_f, λ̄_i and s̄_f^i, i = 1,...,M = N+3, satisfy
(p̄_f, f̄_p, s̄_f) = Σ_{i=1}^M λ̄_i (ḡ_f^i, f̄^i, s̄_f^i),
λ̄_i ≥ 0, i = 1,...,M, Σ_{i=1}^M λ̄_i = 1,
ḡ_f^i ∈ ∂f(ȳ_f^i), i = 1,...,M,
f̄^i = f(ȳ_f^i) + <ḡ_f^i, x̄ - ȳ_f^i>, i = 1,...,M,  (4.4)
f(x̄) = f̄_p,
γ_f s̄_f = 0.
Then p̄_f ∈ ∂f(x̄).
(ii) Suppose that a point x̄ ∈ R^N, N-vectors p̄_F, ȳ_F^i and ḡ_F^i, and numbers F̄_p, s̄_F, μ̄_i and s̄_F^i, i = 1,...,M, satisfy
(p̄_F, F̄_p, s̄_F) = Σ_{i=1}^M μ̄_i (ḡ_F^i, F̄^i, s̄_F^i),  (4.5a)
μ̄_i ≥ 0, i = 1,...,M, Σ_{i=1}^M μ̄_i = 1,  (4.5b)
F̄^i = F(ȳ_F^i) + <ḡ_F^i, x̄ - ȳ_F^i>, i = 1,...,M,  (4.5d)
|ȳ_F^i - x̄| ≤ s̄_F^i, i = 1,...,M,  (4.5e)
γ_F s̄_F = 0.  (4.5g)
Proof. We shall only prove part (ii) of the lemma, since part (i) follows from Lemma 3.4.4.
ȳ_F^i = x̄ if μ̄_i ≠ 0  (4.6)
0 = F(x̄)_+ - F̄_p = Σ_{i=1}^M μ̄_i [F(x̄)_+ - F̄^i] =
= Σ_{i=1}^M μ̄_i [F(x̄)_+ - F(ȳ_F^i) - <ḡ_F^i, x̄ - ȳ_F^i>]
= Σ_{μ̄_i ≠ 0} μ̄_i [F(x̄)_+ - F(x̄)] = F(x̄)_+ - F(x̄).
(b) Next, suppose that γ_F = 0. Then F is convex and (4.5c,d) yield
for i = 1,...,M.
Lemma 4.3. If Algorithm 3.1 terminates at the k-th iteration then the point x̄ = x^k is stationary for f on S.
w^k = ½|p^k|^2 + α̃_p^k,
p^k = ν_f^k p_f^k + ν_F^k p_F^k,
α̃_p^k = ν_f^k max{|f(x̄) - f̃_p^k|, γ_f(s̃_f^k)^2} + ν_F^k max{|F̃_p^k|, γ_F(s̃_F^k)^2},
ν_f^k ≥ 0, ν_F^k ≥ 0, ν_f^k + ν_F^k = 1  (4.7a)
from (3.17), (3.6), (3.16), (3.8) and (2.25). Therefore w^k = 0 and
ν_f^k p_f^k + ν_F^k p_F^k = 0,  (4.7b)
ν_f^k [f(x̄) - f̃_p^k] = 0, ν_f^k γ_f s̃_f^k = 0,  (4.8a)
ν_F^k [F(x̄)_+ - F̃_p^k] = 0, ν_F^k γ_F s̃_F^k = 0,  (4.8b)
where F(x̄)_+ = 0, because F(x̄) = F(x^k) ≤ 0. Suppose that ν_f^k ≠ 0. Then (4.8a), Lemma 4.1 and Lemma 4.2 imply p_f^k ∈ ∂f(x̄), i.e.
Next, if ν_F^k ≠ 0 then (4.8b), Lemma 4.1 and Lemma 4.2 yield p_F^k ∈ ∂F(x̄) and F(x̄) ≥ 0, so, because F(x̄) ≤ 0, we have
p_F^k ∈ ∂F(x̄) and F(x̄) = 0 if ν_F^k ≠ 0.  (4.9b)
Since F(x̄) ≤ 0, (4.7) and (4.9) imply 0 ∈ M̂(x̄) (see (2.2)) and x̄ ∈ S.
Lemma 4.4. Suppose that there exist a point x̄ ∈ R^N and an infinite set
p_f^k →_K p̄_f and p_F^k →_K p̄_F.
If additionally α̃_{f,p}^k →_K 0 then p̄_f ∈ ∂f(x̄), while if α̃_{F,p}^k →_K 0 then
or equivalently
|p^k| →_K 0 and
and Lemma 4.4 to deduce the existence of an infinite set K̄ ⊂ K, numbers ν̄_f and ν̄_F, and N-vectors p̄_f and p̄_F such that
ν_f^k →_K̄ ν̄_f, ν_F^k →_K̄ ν̄_F,
p_f^k →_K̄ p̄_f, p_F^k →_K̄ p̄_F,  (4.13b)
ν̄_f p̄_f + ν̄_F p̄_F = 0.
Suppose that ν̄_f ≠ 0. Then (4.12) yields α̃_{f,p}^k →_K̄ 0, so p̄_f ∈ ∂f(x̄) by Lemma 4.4. Thus
p̄_f ∈ ∂f(x̄) if ν̄_f ≠ 0.  (4.13c)
ŵ^k = ½|p^k|^2 + α̂_p^k,  (4.14a)
where
α̂_p^k = ν_f^k α̂_{f,p}^k + ν_F^k α̂_{F,p}^k,  (4.14b)
α̂_{f,p}^k = Σ_{j∈Ĵ_f^k} λ̃_j^k α_{f,j}^k + λ̃_p^k α_{f,p}^k,  (4.14c)
α̂_{F,p}^k = Σ_{j∈Ĵ_F^k} μ̃_j^k α_{F,j}^k + μ̃_p^k α_{F,p}^k,  (4.14d)
α̃_{f,p}^k ≤ α̂_{f,p}^k and α̃_{F,p}^k ≤ α̂_{F,p}^k,  (4.15)
w^k ≤ ŵ^k,  (4.16)
v̂^k = -{|p^k|^2 + α̂_p^k} ≤ -ŵ^k ≤ 0,  (4.17)
v̂^k ≤ v^k.  (4.18)
Moreover, if f and F are convex then α̂_p^k = α̃_p^k, w^k = ŵ^k and v^k = v̂^k.
p^k = Σ_{j∈Ĵ_f^k} λ̃_j^k g_f^j + λ̃_p^k p_f^{k-1} + Σ_{j∈Ĵ_F^k} μ̃_j^k g_F^j + μ̃_p^k p_F^{k-1},
while (4.14b,c,d) and (2.23a,b) yield
w^k = ½|p^k|^2 + α̃_p^k ≤ ½|p^k|^2 + α̂_p^k = ŵ^k,
v̂^k = -{|p^k|^2 + α̂_p^k} ≤ -{|p^k|^2 + α̃_p^k} = v^k
from (3.17), (4.14a), (3.9), (3.16), (3.15) and (4.19). This proves (4.16)-(4.18).
φ_C(t) = t - (1 - m_R)^2 t^2 / (8C^2),
and
α_p^k = ν_f^{k-1} α_{f,p}^k + ν_F^{k-1} α_{F,p}^k.  (4.23)
so we have
-α^k + <g^k, d^{k-1}> ≥ m_R v^{k-1}  (4.24)
if y^k ∈ S, and
(4.25b)
μ_k(ν) = ν, μ_j(ν) = 0 for j ∈ J_F^k \ {k}, μ_p(ν) = (1-ν) ν_F^{k-1},
Σ_{j∈J_f^k} λ_j(ν) g_f^j + λ_p(ν) p_f^{k-1} + Σ_{j∈J_F^k} μ_j(ν) g_F^j + μ_p(ν) p_F^{k-1} = (1-ν) p^{k-1} + ν g^k,  (4.26a)
Σ_{j∈J_f^k} λ_j(ν) α_{f,j}^k + λ_p(ν) α_{f,p}^k + Σ_{j∈J_F^k} μ_j(ν) α_{F,j}^k + μ_p(ν) α_{F,p}^k = (1-ν) α_p^k + ν α^k.  (4.26b)
By (2.25), ν_f^{k-1} ≥ 0, ν_F^{k-1} ≥ 0 and ν_f^{k-1} + ν_F^{k-1} = 1, hence (4.25) yields
Σ_{j∈J_f^k} λ_j(ν) + λ_p(ν) + Σ_{j∈J_F^k} μ_j(ν) + μ_p(ν) = ν + (1-ν) ν_f^{k-1} + (1-ν) ν_F^{k-1} = 1
for all ν ∈ [0,1]. Combining this with our assumption that r_a^k = 0 and with (3.21b), we deduce that the multipliers (4.25) are feasible for subproblem (3.1) for all ν ∈ [0,1]. Therefore ŵ^k (the optimal value of subproblem (3.1)) satisfies
ŵ^k ≤ min{½|(1-ν) p^{k-1} + ν g^k|^2 + (1-ν) α_p^{k-1} + ν α^k : ν ∈ [0,1]} + |α_p^k - α_p^{k-1}|.  (4.27)
Using Lemma 2.4.10 and relations (3.7), (3.17), (4.17), (4.24) and (4.22), we obtain
min{½|(1-ν) p^{k-1} + ν g^k|^2 + (1-ν) α_p^{k-1} + ν α^k : ν ∈ [0,1]} ≤ φ_C(w^{k-1}),
μ_j = 0 for j ∈ J_F^k, μ_p = 0
if y^k ∈ S, and
λ_j = 0 for j ∈ J_f^k, λ_p = 0,
μ_k = 1, μ_j = 0 for j ∈ J_F^k \ {k}, μ_p = 0
if y^k ∉ S. Since k ∈ J_f^k ∪ J_F^k by (3.21a), the above multipliers are feasible for the k-th dual subproblem (3.1). Therefore ŵ^k, the optimal value, satisfies
w^k ≤ ŵ^k ≤ ½|g^k|^2 + α^k
by (3.17) and (4.16). The above inequality and the fact that α_p^k ≥ 0 yield (4.28).
Lemma 4.9. Suppose that there exist a point x̄ ∈ R^N and an infinite set
|α_{f,p}^{k+1} - α̃_{f,p}^k| → 0 as k → ∞, k ∈ K,  (4.31a)
Proof. Suppose x^k →_K x̄ and let B = {y ∈ R^N : |x̄ - y| ≤ 2ā}. Since ā > 0 and x^k →_K x̄, we deduce from (4.1d,e) the existence of a number k̄ such that y_f^{k,i} ∈ B for i = 1,...,M and all k ≥ k̄, k ∈ K. Then (4.1a,b) and the boundedness of g_f on B imply that {p_f^k}_{k∈K} is bounded. In a similar
|f(x^{k+1}) - f(x^k)| + |p_f^k| |x^{k+1} - x^k| →_K 0,  (4.33a)
since x^k →_K x̄, |x^{k+1} - x^k| →_K 0, f is continuous and {p_f^k}_{k∈K} is bounded. A similar argument yields
|F̃_p^{k+1} - F̃_p^k| = |<p_F^k, x^{k+1} - x^k>| ≤ |p_F^k| |x^{k+1} - x^k| →_K 0.  (4.33b)
|γ_f(s_f^{k+1})^2 - γ_f(s̃_f^k)^2| ≤ |x^{k+1} - x^k| (2(γ_f C)^{1/2} + γ_f |x^{k+1} - x^k|).
Combining this with our assumption that |x^{k+1} - x^k| →_K 0 and with (4.33), we obtain (4.31) from (3.2b) and (3.8). By (4.23), (3.15) and (2.45),
max{|α_{f,p}^{k+1} - α̃_{f,p}^k|, |α_{F,p}^{k+1} - α̃_{F,p}^k|} →_K 0
from (4.31).
Using the above lemma, one can easily modify the proof of Lemma 3.4.15 for Algorithm 3.1. Then, since the proof of Lemma 3.4.16 requires no modifications, we obtain
Proof. One can use the proofs of Lemma 5.4.14, Theorem 5.4.15 and Theorem 5.4.16 to obtain the desired conclusion.
Remark 4.14. The results of this section hold for the case of many con-
straints considered in Remark 2.1 and Remark 2.2. This follows from the
fact that the mapping ~ has essentially the same properties as the
subdifferential ~F, i.e. is locally bounded and upper semicontinuous.
In this section we state in detail and analyze the method with sub-
gradient selection introduced in Section 2.
Algorithm 5.1.
J_f^1 = {1}, g_f^1 = g_f(y^1), f_1^1 = f(y^1),
minimize ½|Σ_{j∈J_f^k} λ_j g_f^j + Σ_{j∈J_F^k} μ_j g_F^j|^2 + Σ_{j∈J_f^k} λ_j α_{f,j}^k + Σ_{j∈J_F^k} μ_j α_{F,j}^k
(5.1)
where
and sets Ĵ_f^k and Ĵ_F^k satisfying
Ĵ_f^k = {j ∈ J_f^k : λ_j^k ≠ 0} and Ĵ_F^k = {j ∈ J_F^k : μ_j^k ≠ 0},  (5.3a)
Compute
p^k = Σ_{j∈J_f^k} λ_j^k g_f^j + Σ_{j∈J_F^k} μ_j^k g_F^j,  (5.4)
d^k = -p^k,  (5.5)
α̃_p^k = Σ_{j∈J_f^k} λ_j^k α_{f,j}^k + Σ_{j∈J_F^k} μ_j^k α_{F,j}^k,  (5.6)
ŵ^k = ½|p^k|^2 + α̃_p^k.  (5.8)
If ŵ^k ≤ ε_s then terminate; otherwise, go to Step 3.
x^{k+1} = x^k + t_L^k d^k and y^{k+1} = x^k + t_R^k d^k
satisfy t_L^k ≤ 1 and
t_R^k = t_L^k if t_L^k ≥ t̄,  (5.9c)
|y^{k+1} - x^{k+1}| ≤ ā/2.  (5.9e)
j kf + l= Jf u{k+l} and oF = JFU{k+l} . (5.10)
F k+l = F k + < g3F,xk+l-xk> for J a ^JF"
k
3 3
_ ^k ~k
Sk+ik+l= sk3 + [ xk+l xkl for j e Jf o JF"
We shall now comment on relations between the above method and Algorithm 3.1.
ν_f^k = Σ_{j∈J_f^k} λ_j^k and ν_F^k = Σ_{j∈J_F^k} μ_j^k,
define the scaled multipliers λ̃_j^k and μ̃_j^k satisfying
λ_j^k = ν_f^k λ̃_j^k for j ∈ J_f^k,
λ̃_j^k ≥ 0, j ∈ J_f^k, Σ_{j∈J_f^k} λ̃_j^k = 1,
μ_j^k = ν_F^k μ̃_j^k for j ∈ J_F^k,
μ̃_j^k ≥ 0, j ∈ J_F^k, Σ_{j∈J_F^k} μ̃_j^k = 1,
and let
(p_F^k, F̃_p^k, s̃_F^k) = Σ_{j∈J_F^k} μ̃_j^k (g_F^j, F_j^k, s_j^k),
α̃_{f,p}^k = max{|f(x^k) - f̃_p^k|, γ_f(s̃_f^k)^2},
α̃_{F,p}^k = max{|F̃_p^k|, γ_F(s̃_F^k)^2},
α̃_p^k = ν_f^k α̃_{f,p}^k + ν_F^k α̃_{F,p}^k,
w^k = ½|p^k|^2 + α̃_p^k,
v^k = -{|p^k|^2 + α̃_p^k}.
λ_p^k = μ_p^k = 0
in the relevant relations of the preceding sections to see that Lemma 4.6 holds for Algorithm 5.1. In particular, we have w^k ≤ ŵ^k. Thus both w^k and ŵ^k can be regarded as stationarity measures of x^k; see Section 3. The line search rules (5.9) differ from the rules (3.11) only in that we now use v̂^k instead of v^k for estimating the directional derivative of f at x^k in the direction d^k. Note that we always have v̂^k < 0 at Step 3, since v̂^k ≤ v^k < 0 by Lemma 4.6. Hence to implement Step 3 one can use
where φ_C is defined by
φ_C(t) = t - (1 - m_R)^2 t^2 / (8C^2),
λ_k(ν) = ν, λ_j(ν) = (1-ν) λ_j^{k-1}, j ∈ J_f^{k-1},
μ_k(ν) = 0, μ_j(ν) = (1-ν) μ_j^{k-1}, j ∈ J_F^{k-1},
and (2.25b) by
λ_k(ν) = 0, λ_j(ν) = (1-ν) λ_j^{k-1}, j ∈ J_f^{k-1},
Σ_{j∈J_f^k} λ_j(ν) g_f^j + Σ_{j∈J_F^k} μ_j(ν) g_F^j = (1-ν) p^{k-1} + ν g^k,
Σ_{j∈J_f^k} λ_j(ν) α_{f,j}^k + Σ_{j∈J_F^k} μ_j(ν) α_{F,j}^k = (1-ν) α̃_p^k + ν α^k,
λ_j(ν) ≥ 0, j ∈ J_f^k, μ_j(ν) ≥ 0, j ∈ J_F^k, Σ_{j∈J_f^k} λ_j(ν) + Σ_{j∈J_F^k} μ_j(ν) = 1
max{|p^k|, α̃_p^k} ≤ max{½|g^k|^2 + α^k, (|g^k|^2 + 2α^k)^{1/2}},
max{|p^k|, |v̂^k|, α̃_p^k, 1}
if |x^k - x̄| ≤ ā,
without influencing the proof, since ŵ^k = ½|p^k|^2 + α̃_p^k.
Lemma 5.3. Suppose that there exist a point x̄ ∈ R^N and an infinite set K ⊂ {1,2,...} such that x^k →_K x̄ and |x^{k+1} - x^k| →_K 0 as k → ∞, k ∈ K. Then
|α̃_p^{k+1} - α̃_p^k| → 0 as k → ∞, k ∈ K.
|α̃_p^{k+1} - α̃_p^k| ≤ max{max{|α_{f,j}^{k+1} - α_{f,j}^k| : j ∈ Ĵ_f^k}, max{|α_{F,j}^{k+1} - α_{F,j}^k| : j ∈ Ĵ_F^k}},
|F_j^{k+1} - F_j^k| = |<g_F^j, x^{k+1} - x^k>| ≤ |g_F^j| |x^{k+1} - x^k|,
x^{k+1} = x^k + t_L^k d^k and y^{k+1} = x^k + t_R^k d^k
satisfy
f(x^{k+1}) ≤ f(x^k) + m_L t_L^k v^k,  (6.1a)
α(x,y) = f(x) - f(y) - <g_f(y), x - y> if y ∈ S,
α(x,y) = -F(y) - <g_F(y), x - y> if y ∉ S.
Theorem 6.1. Suppose that problem (1.1) is convex and satisfies the Slater constraint qualification, i.e. f and F are convex and F(x) < 0 for some x ∈ R^N. Then Algorithm 5.3.1 with the modified line search rules (6.1) generates a sequence {x^k} ⊂ S satisfying f(x^k) ↓ inf{f(x) : x ∈ S}. Moreover, if f attains its infimum on S then {x^k} converges to a solution of problem (1.1).
by (5.4.3), so
α̃_p^{k+1} - α̃_p^k = ν_f^k [f(x^{k+1}) - f(x^k)] - ν_f^k (f̃_p^{k+1} - f̃_p^k) - ν_F^k (F̃_p^{k+1} - F̃_p^k) =
= ν_f^k [f(x^{k+1}) - f(x^k)] - ν_f^k <p_f^k, x^{k+1} - x^k> - ν_F^k <p_F^k, x^{k+1} - x^k>.
Hence
|α̃_p^{k+1} - α̃_p^k| ≤ |f(x^{k+1}) - f(x^k)| + |p^k| |x^{k+1} - x^k|,
since ν_f^k ∈ [0,1]. Then, since x^k →_K x̄, |x^{k+1} - x^k| →_K 0 and {p^k}_{k∈K}
bles us to establish Lemma 3.4.15 for the modified method, and then we may prove Lemma 4.10 by using parts (i)-(iii) of the proof of Lemma 3.4.16.
We conclude from the above proof and the results of Section 5 that Theorem 6.1 is valid for Algorithm 5.5.1 with the modified line search criteria (6.1).
To sum up, we have shown that one may use the general line search
a^k = max{|x^k - y^j| : j ∈ J_f^k ∪ J_F^k}
if λ_p^k = μ_p^k = 0, and by
|x^{k+1} - y^j| ≤ |x^k - y^j| + |x^{k+1} - x^k|,
one can verify that the convergence analysis of Section 4 covers this version of Algorithm 3.1.
Reasoning as above, we conclude that if we replace s_j^k by |x^k - y^j| everywhere in Algorithm 5.1 then the resulting method is globally convergent in the sense of Theorem 4.11, Theorem 4.12 and Corollary 4.13. Moreover, this method has (primal) search direction finding subproblems of the form (2.33), which, as was shown in Section 2, reduce to the Mifflin (1982) subproblem (2.40) if the rules for choosing J_f^k and J_F^k satisfy the requirements (2.30). Therefore, this method may be regarded as an implementable and globally convergent version of the Mifflin (1982) algorithm. Further comparisons with the algorithm of Mifflin (1982) are given below.
For the sake of completeness of the theory, let us now consider a method that uses all the past subgradients for search direction finding at each iteration. The method with subgradient accumulation is obtained from Algorithm 5.1 by deleting Step 5 and Step 6 and setting
J_f^k = J_F^k = {1,...,k},
since then {x^k} ⊂ S in view of the monotonicity of {f(x^k)} and the feasibility of {x^k}, so {x^k} is bounded, while |y^k - x^k| ≤ ā/2 for all k owing to the line search requirement (5.9e).
The results of Section 4 and Section 5 imply that the above-described method with subgradient accumulation is convergent in the sense of Theorem 4.11, Theorem 4.12 and Corollary 4.13 under the additional assumption (6.2). The same result holds if we replace s_j^k by |x^k - y^j| everywhere in this version of Algorithm 5.1.
As observed in Section 3.6, it may be efficient to calculate subgradients not only at {y^k} but also at {x^k}, and then use such additional subgradients for each search direction finding. This idea can be easily incorporated in all the methods discussed so far in this chapter. For instance, in Algorithm 3.1 we may let
s_j^k = |y^{|j|} - x^{|j|}| + Σ_{i=|j|}^{k-1} |x^{i+1} - x^i| if |j| < k,
s_j^k = |y^j - x^k| if j = k,
J_f^{k+1} = Ĵ_f^k ∪ {k+1, -(k+1)},  (6.5a)
J_F^{k+1} = Ĵ_F^k ∪ {k+1, -(k+1)}.  (6.5b)
J_f^{k+1} = Ĵ_f^k ∪ {k+1, -(k+1)} if y^{k+1} ∈ S,
J_f^{k+1} = Ĵ_f^k if y^{k+1} ∉ S,  (6.6a)
J_F^{k+1} = Ĵ_F^k if y^{k+1} ∈ S,
J_F^{k+1} = Ĵ_F^k ∪ {k+1, -(k+1)} if y^{k+1} ∉ S.  (6.6b)
α̃_{f,p}^k = max{f(x^k) - f̃_p^k, γ_f(s̃_f^k)^2},
α̃_{F,p}^k = max{-F̃_p^k, γ_F(s̃_F^k)^2},
and (3.12) by
|y^{k+1} - x^{k+1}| ≤ ā.
Algorithm 7.1.
J_F^1 = ∅, p_F^0 = 0 ∈ R^N, F_p^1 = 0.
+ Σ_{j∈J_f^k} λ_j α_{f,j}^k + λ_p α_{f,p}^k + Σ_{j∈J_F^k} μ_j α_{F,j}^k + μ_p α_{F,p}^k,
subject to λ_j ≥ 0, j ∈ J_f^k, λ_p ≥ 0, μ_j ≥ 0, j ∈ J_F^k, μ_p ≥ 0,  (7.1)
Σ_{j∈J_f^k} λ_j + λ_p + Σ_{j∈J_F^k} μ_j + μ_p = 1,
where
α_{f,j}^k = |f(x^k) - f_j^k|, α_{F,j}^k = |F_j^k|.  (7.2)
Compute ν_f^k and ν_F^k by (3.3), and λ̃_j^k, j ∈ J_f^k, λ̃_p^k, μ̃_j^k, j ∈ J_F^k, and μ̃_p^k by (3.4). Calculate
(p_f^k, f̃_p^k) = Σ_{j∈J_f^k} λ̃_j^k (g_f^j, f_j^k) + λ̃_p^k (p_f^{k-1}, f_p^k),  (7.4a)
(p_F^k, F̃_p^k) = Σ_{j∈J_F^k} μ̃_j^k (g_F^j, F_j^k) + μ̃_p^k (p_F^{k-1}, F_p^k),  (7.4b)
p^k = ν_f^k p_f^k + ν_F^k p_F^k,  (7.5)
d^k = -p^k.  (7.6)
α̃_{f,p}^k = |f(x^k) - f̃_p^k|,  (7.7a)
α̃_{F,p}^k = |F̃_p^k|.  (7.7b)
If r_f^k = r_F^k = 1 set
a^k = max{s_j^k : j ∈ J_f^k ∪ J_F^k}.  (7.9)
satisfy t_L^k ≤ 1 and
f(x^{k+1}) ≤ f(x^k) + m_L t_L^k v^k,  (7.10a)
|y^{k+1} - x^{k+1}| ≤ ā,  (7.10e)
where
g(y) = g_f(y) and α(x,y) = α_f(x,y) if y ∈ S,  (7.11a)
and set
J_f^{k+1} = Ĵ_f^k ∪ {k+1} and J_F^{k+1} = Ĵ_F^k if y^{k+1} ∈ S,  (7.14a)
J_f^{k+1} = Ĵ_f^k and J_F^{k+1} = Ĵ_F^k ∪ {k+1} if y^{k+1} ∉ S.  (7.14b)
Calculate
a^{k+1} = max{a^k + |x^{k+1} - x^k|, s_{k+1}^{k+1}}.  (7.15)
Set r_a^{k+1} = 0 and
r_f^{k+1} = 1 if r_f^k = 1 and ν_f^k = 0,
r_f^{k+1} = 0 if r_f^k = 0 or ν_f^k ≠ 0,  (7.16a)
r_F^{k+1} = 1 if r_F^k = 1 and ν_F^k = 0,
r_F^{k+1} = 0 if r_F^k = 0 or ν_F^k ≠ 0.  (7.16b)
minimize ½|d|^2 + v
over (d,v) ∈ R^{N+1}
subject to -α_{f,j}^k + <g_f^j, d> ≤ v, j ∈ J_f^k,
-α_{f,p}^k + <p_f^{k-1}, d> ≤ v if r_f^k = 0,  (7.17)
-α_{F,j}^k + <g_F^j, d> ≤ v, j ∈ J_F^k,
-α_{F,p}^k + <p_F^{k-1}, d> ≤ v if r_F^k = 0.
d^j = -(ν_f^j p_f^j + ν_F^j p_F^j) for j = k_r(k),...,k-1
in the sense that ν_f^j = 0 (ν_F^j = 0) for j = k_r(k),...,k-1 (cf. (7.16)). This can occur, for instance, if J_f^j = ∅ (J_F^j = ∅) for j = k_r(k),...,k-1 (cf. (7.13)-(7.14)). Then there is nothing to be aggregated at such iterations with J_f^j = ∅ and r_f^j = 1 (J_F^j = ∅ and r_F^j = 1), so we must ignore the (k-1)-st aggregate subgradient of f (of F) by setting r_f^k = 1 (r_F^k = 1).
ν_F^k = Σ_{j∈J_F^k} μ_j^k + μ_p^k and μ_p^k = 0 if r_F^k = 1
and r_F^1 = 1, we have ν_F^k = 0 and r_F^k = 1 for all k by the rules of Step 4 (i) and Step 7. (Note that we have implicitly assumed that Σ_{j∈J_F^k} μ_j^k = 0 if J_F^k = ∅. The same convention of ignoring summations over empty sets is employed in Algorithm 7.1.) Moreover, we have ν_f^k = 1 - ν_F^k = 1, so that r_f^k = 0 if r_a^k = 0, for all k. Then it is easy to deduce that Algorithm 7.1 reduces to Algorithm 4.3.1, except for the rule of updating a^k via (7.9) if r_a^k = 1, while Algorithm
α_{f,j}^k = f(x^k) - f_j^k and α_{f,p}^k = f(x^k) - f̃_p^k,
α_{F,j}^k = -F_j^k and α_{F,p}^k = -F̃_p^k,
corresponding to relations (5.4a,b). Therefore, if r_f^k = r_F^k = 0, in this case subproblem (7.1) coincides with subproblem (5.4.12), which is equivalent to the k-th search direction finding subproblem (5.3.1) of Algorithm 5.3.1 by Lemma 5.4.10.
To discuss the resetting strategy of the algorithm, we shall need
the following result on convex representations of aggregate subgradients,
which is an analogue of Lemma 4.4.1 and Lemma 4.1. Let
J^k = J_f^k ∪ J_F^k for all k,  (7.18)
J_f^k ∩ J_F^k = ∅ for all k.  (7.19)
Lemma 7.2. Suppose k ≥ 1 is such that Algorithm 7.1 did not stop before the k-th iteration, and let
a^k = max{s_j^k : j ∈ J^k},  (7.23)
Ĵ_p^k = Ĵ_r^k.  (7.24)
Ĵ_{f,r}^k ≠ ∅ if ν_f^k ≠ 0,  (7.25a)
Ĵ_{F,r}^k ≠ ∅ if ν_F^k ≠ 0.  (7.25b)
If Ĵ_{f,p}^k ≠ ∅ then there exist numbers λ̂_i^k and (N+2)-vectors (y_f^{k,i}, f^{k,i}, s_f^{k,i}), i = 1,...,M, satisfying
(p_f^k, f̃_p^k) = Σ_{i=1}^M λ̂_i^k (g(y_f^{k,i}), f^{k,i}),
λ̂_i^k ≥ 0, i = 1,...,M, Σ_{i=1}^M λ̂_i^k = 1,
(g(y_f^{k,i}), f^{k,i}, s_f^{k,i}) ∈ {(g^j, f_j^k, s_j^k) : j ∈ Ĵ_{f,p}^k}, i = 1,...,M,  (7.26)
|y_f^{k,i} - x^k| ≤ s_f^{k,i},
(p_F^k, F̃_p^k) = Σ_{i=1}^M μ̂_i^k (g(y_F^{k,i}), F^{k,i}),
μ̂_i^k ≥ 0, i = 1,...,M, Σ_{i=1}^M μ̂_i^k = 1,
|y_F^{k,i} - x^k| ≤ s_F^{k,i}, i = 1,...,M,
to the constrained case. Observe that at each iteration only one subgradient of the form
g^k = g(y^k) = g_f(y^k) ∈ ∂f(y^k) if y^k ∈ S,
g^k = g(y^k) = g_F(y^k) ∈ ∂F(y^k) if y^k ∉ S,  (7.29)
p^k = ν_f^k p_f^k + ν_F^k p_F^k,
ν_f^k ≥ 0, ν_F^k ≥ 0, ν_f^k + ν_F^k = 1,  (7.31a)
ν_f^k = 0 if Ĵ_{f,p}^k = ∅,  (7.31b)
ν_F^k = 0 if Ĵ_{F,p}^k = ∅.  (7.31c)
Lemma 7.4. If f and F are semismooth in the sense of (3.3.23) and (3.18) then Line Search Procedure 7.3 terminates with t_L = t_L^k and t_R = t_R^k satisfying (7.10).
Ĵ_f^k ⊂ J_f^k and Ĵ_F^k ⊂ J_F^k,  (7.34a)
|Ĵ^k| ≤ M_g - 2,  (7.34b)
where M_g ≥ 2 is a fixed, user-supplied upper bound on the number of stored subgradients. In view of (7.18) and (7.19), the simplest way of satisfying (7.34) is to delete some smallest numbers from Ĵ^k = Ĵ_f^k ∪ Ĵ_F^k so as to obtain |Ĵ^k| ≤ M_g - 2 with Ĵ_f^k ∪ Ĵ_F^k = Ĵ^k. In fact, as far as convergence is concerned, the requirement (7.34a) can be substituted by the following more general rule
Ĵ_f^k ⊂ Ĵ_{f,p}^k and Ĵ_F^k ⊂ Ĵ_{F,p}^k,
i.e. any subgradient used since the latest reset can be stored, cf. (7.17a,b,c).
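The simplest deletion rule described above may be sketched as follows (an illustrative helper of our own, not the only admissible rule):

    def enforce_storage_bound(Jf_hat, JF_hat, M_g):
        # Delete smallest (oldest) indices until |J_hat| <= M_g - 2, cf. (7.34).
        while len(Jf_hat) + len(JF_hat) > M_g - 2:
            if Jf_hat and (not JF_hat or min(Jf_hat) < min(JF_hat)):
                Jf_hat.remove(min(Jf_hat))
            else:
                JF_hat.remove(min(JF_hat))
        return Jf_hat, JF_hat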
Observe that (7.10e), (7.14) and the rules of Step 4 yield the
following analogue of (4.3.23) and (3.21b)
Thus the latest subgradient is always used for the current search direc-
tion finding.
We shall now establish convergence of the algorithm. To save space
we shall use suitable modifications of the results of Section 4.4 and
Section 4.
We suppose that the final accuracy tolerance ε_s is set to zero and that each execution of Line Search Procedure 7.3 is finite (see Lemma 7.4, Remark 3.3.4 and Remark 3.4).
First, we observe that Lemma 7.2 can serve as a substitute for Lemma 4.1, Lemma 4.4.1, Lemma 4.4.2 and Lemma 4.4.4. Secondly, since Lemma 4.3 holds in view of (7.33), the assumption that ε_s = 0 and the definition of stationary points, we may assume that the method generates an infinite sequence of points. Then (4.4.9)-(4.4.11) are easily verified, and we conclude that {f(x^k)} is nonincreasing. Thirdly, we note that part (i) of Lemma 4.2 can be replaced by Lemma 4.4.5, and part (ii) by the following result.
Lemma 7.5. Suppose that a point x̄ ∈ R^N, N-vectors p̄_F, ȳ_F^i and ḡ_F^i, and numbers F̄_p, μ̄_i and s̄_F^i, i = 1,...,M, satisfy
(p̄_F, F̄_p) = Σ_{i=1}^M μ̄_i (ḡ_F^i, F̄^i),
ḡ_F^i ∈ ∂F(ȳ_F^i), i = 1,...,M,
F̄^i = F(ȳ_F^i) + <ḡ_F^i, x̄ - ȳ_F^i>, i = 1,...,M,
max{s̄_F^i : μ̄_i ≠ 0} = 0.
Proof. Set s̄_F = Σ_{i=1}^M μ̄_i s̄_F^i and use part (i) of the proof of Lemma 4.2.
Lemma 7.6. Suppose that there exist a point x̄ ∈ R^N and an infinite set K ⊂ {1,2,...} such that x^k →_K x̄ and a^k →_K 0. Then there exist an infinite set K̄ ⊂ K, N-vectors p̄, p̄_f and p̄_F, and numbers ν̄_f and ν̄_F such that
p^k →_K̄ p̄,
p̄ = ν̄_f p̄_f + ν̄_F p̄_F,
p̄_f ∈ ∂f(x̄), p̄_F ∈ ∂F(x̄),
F(x̄) ≥ 0 if ν̄_F ≠ 0.
Moreover, p̄ ∈ M(x̄) and ν_f^k α̃_{f,p}^k + ν_F^k α̃_{F,p}^k →_K̄ 0.
Proof. By (7.31), Ĵ_{f,p}^k ∪ Ĵ_{F,p}^k ≠ ∅ for all k, so at least one of the following two sets
K_f = {k ∈ K : Ĵ_{f,p}^k ≠ ∅} and K_F = {k ∈ K : Ĵ_{F,p}^k ≠ ∅}
is infinite. Suppose that K_F is finite. Then we have ν_F^k = 0 and (7.26) for all large k ∈ K_f, hence we may use (7.31a), (7.6) and (7.7a) to deduce, as in the proof of Lemma 3.4.6, (7.36) with ν̄_f = 1, ν̄_F = 0 and ν_f^k α̃_{f,p}^k → 0. A similar argument based on (7.27) and Lemma 7.5 yields (7.36) with ν̄_f = 0, ν̄_F = 1 and ν_F^k α̃_{F,p}^k → 0 if K_f is finite. In view of the preceding two results, and the fact that K = K_f ∪ K_F, it remains to consider the case of an infinite set K̄ = K_f ∩ K_F. Then (7.26) and (7.27) hold for all k ∈ K̄, so the desired conclusion can be deduced from (7.31), (7.6)-(7.7), Lemma 4.4.5 and Lemma 7.5. (7.36) implies p̄ ∈ M(x̄) in view of (2.2).
Define the stationarity measure
w^k = ½|p^k|^2 + α̃_p^k,  (7.37a)
where
α̃_p^k = ν_f^k α̃_{f,p}^k + ν_F^k α̃_{F,p}^k,  (7.37b)
at the k-th iteration (at Step 5) of the algorithm, for all k. We have the following analogue of Lemma 4.4.7.
or equivalently
x^k →_K x̄ and w^k →_K 0.  (7.39)
liminf_{k→∞} max{|p^k|, |x̄ - x^k|} = 0.
Let ŵ^k denote the optimal value of the k-th dual search direc-
Algorithm 7.9.
r_a^k = 1.
(ii) If |Ĵ^k| > 1 then delete the smallest number from Ĵ_f^k or Ĵ_F^k. Set
Ĵ_{f,r}^k = Ĵ_f^k and Ĵ_{F,r}^k = Ĵ_F^k for all k,
Ĵ_p^k = Ĵ_r^k for all k,
and resetting tests of the form |p^k| ≤ m_a s̃_p^k, instead of |p^k| ≤ m_a a^k. An extension of this strategy to the constrained case is given in the following method.
Algorithm 7.11.
Step 0 (Initialization). Do Step 0 of Algorithm 7.1. Set s_f^1 = s_F^1 = 0.
(p_f^k, f̃_p^k, s̃_f^k) = Σ_{j∈J_f^k} λ̃_j^k (g_f^j, f_j^k, s_j^k) + λ̃_p^k (p_f^{k-1}, f_p^k, s_f^k),
(7.40)
(p_F^k, F̃_p^k, s̃_F^k) = Σ_{j∈J_F^k} μ̃_j^k (g_F^j, F_j^k, s_j^k) + μ̃_p^k (p_F^{k-1}, F_p^k, s_F^k),
s̃_p^k = ν_f^k s̃_f^k + ν_F^k s̃_F^k.  (7.41)
Step 9 (Distance resetting). Keep deleting from J_f^{k+1} and J_F^{k+1} indices with the smallest values until the reset value of a^{k+1} satisfies
respectively.
We conclude from (7.41), (7.31) and Lemma 7.12 that s̃_p^k is always a convex combination of the aggregate distance measures s̃_f^k and s̃_F^k, which in turn indicate how far p_f^k and p_F^k are from M̂(x^k). Thus
(p^k, s̃_p^k) = ν_f^k (p_f^k, s̃_f^k) + ν_F^k (p_F^k, s̃_F^k),
ν_f^k ≥ 0, ν_F^k ≥ 0, ν_f^k + ν_F^k = 1,
hence the value of s̃_p^k indicates how far p^k is from M̂(x^k). This justifies the stopping criterion of Step 2. In fact, by using Lemma 7.12, Lemma 4.2 and the proof of Lemma 4.3 one can show that if ε_s = 0 then the algorithm stops only at stationary points.
Supposing the method does not terminate, we have the following result.
s̃_p^k = s̃_f^k ≤ a^k if Ĵ_{F,p}^k = ∅,
s̃_p^k = s̃_F^k ≤ a^k if Ĵ_{f,p}^k = ∅,
(7.45)
s̃_p^k ≤ max{s̃_f^k, s̃_F^k} ≤ a^k if Ĵ_{f,p}^k ≠ ∅ and Ĵ_{F,p}^k ≠ ∅,
Ĵ_{f,p}^k ∪ Ĵ_{F,p}^k ≠ ∅,
We may add that one can modify Algorithm 7.9 in the spirit of Algorithm 7.11 without impairing the preceding global convergence results. Namely, in Algorithm 7.9 we may use the stopping criterion
max{|p^k|, m_a s̃_p^k} ≤ ε_s,
with s̃_p^k generated by (7.40) with λ̃_p^k = μ̃_p^k = 0, and replace Step 8 by Step 8, Step 9 and Step 10 of Algorithm 7.11. To establish Theorem 7.10 for the resulting method with subgradient selection, one may use the proof of Theorem 7.13.
The preceding algorithms of this section can be modified by using the resetting strategies of Wolfe and Mifflin described in Section 4.6.
Algorithm 7.14.
s̃_f^k = Σ_{j∈Ĵ_f^k} λ̃_j^k s_j^k + λ̃_p^k s_f^k and s̃_F^k = Σ_{j∈Ĵ_F^k} μ̃_j^k s_j^k + μ̃_p^k s_F^k,
s̃_p^k = ν_f^k s̃_f^k + ν_F^k s̃_F^k.
Step 4 (Resetting). Replace J_f^k by {j ∈ J_f^k : s_j^k < δ^{k+1}/m_a} and J_F^k by
|y^{k+1} - x^{k+1}| ≤ δ^{k+1}/m_a.  (7.46)
α_{f,j}^k = 0, j ∈ J_f^k, α_{f,p}^k = 0, α_{F,j}^k = 0, j ∈ J_F^k, α_{F,p}^k = 0,
v^k = -|p^k|^2,
α̃_p^k = ν_f^k α̃_{f,p}^k + ν_F^k α̃_{F,p}^k.
9. Phase I - Phase II Methods
Algorithm 9.1.
Step 0 (Initialization). Select a starting point x^1 ∈ R^N and initialize the method according to the rules of Step 0 of Algorithm 3.1.
Step 1 (Direction finding). Find multipliers λ_j^k, j ∈ J_f^k, λ_p^k, μ_j^k, j ∈ J_F^k, and μ_p^k that
minimize ½|Σ_{j∈J_f^k} λ_j g_f^j + λ_p p_f^{k-1} + Σ_{j∈J_F^k} μ_j g_F^j + μ_p p_F^{k-1}|^2 +
Σ_{j∈J_f^k} λ_j + λ_p + Σ_{j∈J_F^k} μ_j + μ_p = 1,
λ_p = μ_p = 0 if r_a^k = 1,
where
α_{f,j}^k = max{|f(x^k) - f_j^k|, γ_f(s_j^k)^2}, α_{F,j}^k = max{|F(x^k)_+ - F_j^k|, γ_F(s_j^k)^2},
(9.2)
α_{f,p}^k = max{|f(x^k) - f_p^k|, γ_f(s_f^k)^2}, α_{F,p}^k = max{|F(x^k)_+ - F_p^k|, γ_F(s_F^k)^2}.
Compute ν_f^k, ν_F^k, (p_f^k, f̃_p^k, s̃_f^k), (p_F^k, F̃_p^k, s̃_F^k), p^k and d^k as
α̃_{f,p}^k = max{|f(x^k) - f̃_p^k|, γ_f(s̃_f^k)^2}, α̃_{F,p}^k = max{|F(x^k)_+ - F̃_p^k|, γ_F(s̃_F^k)^2},  (9.3a)
v^k = -{|p^k|^2 + α̃_p^k}.  (9.4)
F(x^{k+1}) ≤ F(x^k) + m_L t_L^k v^k,  (9.6a)
t_R^k = t_L^k if t_L^k ≥ t̄,  (9.6b)
where
α_F(x,y) = max{|F(x)_+ - F̄(x;y)|, γ_F |x - y|^2}.  (9.7)
Lemma 9.2. If Algorithm 9.1 terminates at the k-th iteration then the point x̄ = x^k satisfies 0 ∈ M(x̄). If additionally F(x̄) ≤ 0 or
Proof. Use (9.3) in the proof of Lemma 4.3 for replacing (4.9) by
p_f^k ∈ ∂f(x̄) and F(x̄)_+ = 0 if ν_f^k ≠ 0,
p_F^k ∈ ∂F(x̄) and F(x̄) ≥ 0 if ν_F^k ≠ 0.
In view of the above result, we shall assume from now on that the
method calculates an infinite sequence {xk}. Of course, phase II of the
method is covered by the results of Section 4. Therefore we need only
consider the case when the method stays at phase I.
Proof. To save space, we shall only indicate how to modify the results
of Section 4 for Algorithm 9.1.
(i) Proceeding as in the proof of Lemma 9.2, use (9.3) in the proof of
Lemma 4.5 to obtain the desired conclusion if (4.11) holds.
α̂_p^k = ν_f^k [α̂_{f,p}^k + F(x^k)_+] + ν_F^k α̂_{F,p}^k,  (9.9)
α_p^k = ν_f^{k-1} [α_{f,p}^k + F(x^k)_+] + ν_F^{k-1} α_{F,p}^k.  (9.10)
Bundle Methods
1. Introduction
2. Derivation of the Methods
α(x,y) = f(x) - f̄(x;y)
where f_j^k = f̄(x^k; y^j) for j ∈ J^k. Let α_j^k = α(x^k, y^j) for all j ∈ J^k. By convexity, g^j ∈ ∂_{α_j^k} f(x^k), i.e.
f(x) ≥ f(x^k) + <g^j, x - x^k> - α_j^k for all x,
and hence for any ε ≥ 0 the convex polyhedron
Suppose that for some ε^k > 0 G^k(ε^k) is nonempty and we want to find a direction d ∈ R^N such that f(x^k + d) < f(x^k) - ε^k. Letting x = x^k + d and ε = ε^k, we see that d must satisfy
i.e. we must find a hyperplane separating G^k(ε) from the origin. One way of finding such a hyperplane is to compute the element p^k = Nr G^k(ε) of G^k(ε) that is nearest to the origin, since (see Lemma 1.2.12)
(We may add that, since <g, p^k/|p^k|> is the length of the projection of g on the direction of p^k and |p^k| is the distance of the hyperplane
H = {z ∈ R^N : <z, p^k> = |p^k|^2}
from the origin, among the hyperplanes separating G^k(ε) and the null vector H is the furthest one from the origin.) Of course, there is no separation if p^k = 0, but then 0 = p^k ∈ G^k(ε) ⊂ ∂_ε f(x^k) and so f(x) ≥ f(x^k) - ε for all x. In this case x^k minimizes f up to the accuracy ε. Otherwise one may decrease the value of ε, compute a new p^k = Nr G^k(ε), etc. This process will either drive ε to zero, indicating that x^k is optimal, or find a direction d^k = -p^k satisfying (2.2).
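A sketch of computing the nearest point of the convex hull of the stored subgradients by projected-gradient iterations; this is our own construction, and the linearization-error constraint that further restricts G^k(ε) is omitted for brevity (handling it would require a general quadratic programming routine):

    import numpy as np

    def project_simplex(v):
        # Euclidean projection onto the unit simplex (standard sort-based rule).
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        idx = np.arange(1, v.size + 1)
        rho = np.nonzero(u - css / idx > 0)[0][-1]
        return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

    def nearest_point(G, iters=500):
        # p = Nr conv{g^j}: minimize |G^T lam|^2 over the simplex; rows of G
        # are the stored subgradients.
        m = G.shape[0]
        lam = np.full(m, 1.0 / m)
        Q = G @ G.T
        L = np.linalg.norm(Q, 2) + 1e-12   # step 1/L; the gradient is Q lam
        for _ in range(iters):
            lam = project_simplex(lam - (Q @ lam) / L)
        return G.T @ lam

In practice a specialized QP method (e.g. that of Mifflin (1978) cited below) is preferable; the sketch only illustrates the geometry.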
We shall now give another motivation for the above construction. In Section 1.2 (see Lemma 1.2.13) we considered search direction finding subproblems of the form
Since the use of f̂^k would require the knowledge of the full subdifferential ∂f(x^k), in Chapter 2 we replaced f̂^k by the polyhedral approximation
used in the methods of Lemarechal (1975) and Wolfe (1975); see Section 4.7. Let us now consider the following approximation to f at x^k
f̂_{B,s}^k(x) = max{f(x^k) + <g, x - x^k> : g ∈ G^k(ε^k)} for all x.  (2.3)
Observe that f̂_{B,s}^k reduces to f̂_{LW}^k whenever ε^k is sufficiently large, i.e. ε^k ≥ max{α_j^k : j ∈ J^k}. On the other hand, if ε^k is small enough then we may hope that G^k(ε^k), being a subset of ∂_{ε^k} f(x^k), is a good approximation of ∂f(x^k). In this case f̂_{B,s}^k is close to the "conceptual" approximation f̂^k. It is natural, therefore, to consider the following search direction finding subproblem
Lemma 2.1. (i) Subproblem (2.4) has a unique solution d^k. (Recall that
minimize ½|Σ_{j∈J^k} λ_j g^j|^2
subject to λ_j ≥ 0, j ∈ J^k, Σ_{j∈J^k} λ_j = 1,  (2.5)
Σ_{j∈J^k} λ_j α_j^k ≤ ε^k,
and let
p^k = Σ_{j∈J^k} λ_j^k g^j.  (2.6)
|Ĵ^k| ≤ N + 1.  (2.7b)
subject to Σ_{j∈J^k} λ_j = 1,
(2.8)
Σ_{j∈J^k} λ_j g^j = p^k,
λ_j ≥ 0, j ∈ J^k.
3
minimize ^k
fB,r(xk+d) +1d] 2 (2.9)
where
~k
fB,r(X) =max{f(xk)+<g,x-xk > : g a ~k }
^k
lj _>0, j a J , J ~Z ~klj = i}.
304
(iv) There exists a Lagrange multiplier s^k ≥ 0 for the last constraint of (2.5) such that (2.5) is equivalent to the problem
p^k = Σ_{i=1}^{N+1} λ̂_i ĝ^i,  (2.11a)
λ̂_i ≥ 0, i = 1,...,N+1, Σ_{i=1}^{N+1} λ̂_i = 1.  (2.11b)
minimize ½|Σ_{i=1}^{N+1} λ_i ĝ^i|^2
subject to λ_i ≥ 0, i = 1,...,N+1, Σ_{i=1}^{N+1} λ_i = 1,
hence Lemma 2.2.1 yields
where d̂ = -p^k and v̂ = -|p^k|^2. From (2.11) and Lemma 1.2.5 we obtain
p^k ∈ ∂f̂_{B,s}^k(x^k + d̂),
hence 0 ∈ ∂f̂_{B,s}^k(x^k + d̂) + d̂ = ∂c^k(d̂) by Corollary 1.2.6. Thus d̂ minimizes c^k, so -p^k = d̂ = d^k from part (i) above.
(iii) The simplex method will find an optimal basic solution of (2.8) with no more than N+1 positive components (Dantzig, 1963) such that it solves (2.5), since ½|p^k|^2 is the optimal value of (2.5). Hence
Remark 2.2. The Mifflin (1978) quadratic programming algorithm will automatically find multipliers λ_j^k satisfying (2.7), and the multiplier s^k.
We conclude from the above lemma that subproblem (2.5) has many properties similar to those of the subproblems studied in Chapter 2, which were of the form (2.10) but with s^k = 1. In particular, (2.9) is its reduced version. Therefore, according to the generalized cutting plane idea of Section 2.2, we may construct the (k+1)-st approximation to f by choosing J^{k+1} such that
J^{k+1} = Ĵ^k ∪ {k+1},
where Ĵ^k satisfies (2.7). This will define the method with subgradient selection, which uses at most N+3 past subgradients for search direction finding at any iteration.
α_p^k = f(x^k) - f̃_p^k,
we have
f(x) ≥ f(x^k) + <p^{k-1}, x - x^k> - α_p^k for all x,
where f̂_{B,a}^k is the k-th aggregate approximation to f.
Algorithm 3.1.
solve the k-th dual subproblem (2.13). Calculate the aggregate subgradient (p^k, f̃_p^k) by (2.15). Set d^k = -p^k and
v^k = -|p^k|^2,
α̃_p^k = f(x^k) - f̃_p^k.  (3.1)
If max{|p^k|^2, α̃_p^k} ≤ ε_s, terminate; otherwise, continue.
k k+l k k (3.2b)
tL~ ~ or ~(xk,x ) > mee if t L > 0,
k(1) if k(1) < k+l, and set jk+l=jku {k+l}. Set gk+l=gf(yk+l) and
compute
fk+l = f(yk+l gk+l,xk+l_yk+l
k+l )+ < > '
fk+l = fk + < g 3 , x k + l _ x k > for j ~ J^k , (3.3)
3 3
fk+l = ~ k + < p k , x k + l _ x k >.
P P
ik > 0 , j ~ jk, ik lk
3 - p~0, Z jk i kj + = i. (3.4)
j~ P
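A sketch of the recursive updates (3.3) above, which shift every stored linearization value to the new center so that the trial points y^j need not be kept (names are ours; g_j and the vectors are assumed to be NumPy arrays):

    def update_linearizations(bundle, p, f_p, x_new, x_old):
        # bundle maps j -> (g_j, f_j); apply f_j^{k+1} = f_j^k + <g^j, x^{k+1}-x^k>.
        d = x_new - x_old
        for j, (g_j, f_j) in bundle.items():
            bundle[j] = (g_j, f_j + g_j @ d)
        return bundle, f_p + p @ d      # the aggregate value obeys the same rule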
This estimate justifies our stopping criterion and shows that x^k is optimal if ε_s = 0.
Our rules for updating the approximation tolerances ε^k stem from the following considerations. In view of (3.6), we aim at obtaining small values of both |p^k| and α̃_p^k at some iteration. This will occur if both |p^k|^2 ≤ α̃_p^k and the value of ε^k is small. Thus a mechanism is needed for decreasing the value of ε^k if |p^k|^2 ≤ α̃_p^k. Since
α̃_p^k ≤ ε^k.  (3.9)
Therefore, whenever |p^k|^2 ≤ α̃_p^k occurs the algorithm decreases ε^k (m_e < 1) and calculates new p^k and α̃_p^k. Thus the upper bound on α̃_p^k is decreased, while the new |p^k| cannot be smaller than the old one. Moreover, this reduction of ε^k increases the accuracy of our approximation f̂_{B,a}^k of f around x^k, which is based on the set G^k(ε^k) ⊂ ∂_{ε^k} f(x^k). We may add that, for simplicity, the algorithm uses the approximation tolerance ε^k = ε_a after each serious step. Other, more efficient rules for updating ε^k will be discussed in Section 6.
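The tolerance mechanism just described admits a compact sketch (a hypothetical helper of our own, assuming the quantities of Step 2 are available as a NumPy vector p and scalars):

    def update_tolerance(eps, p, alpha_p, eps_a, m_e, serious_step):
        # Reset to eps_a after a serious step; shrink by m_e < 1 when
        # |p^k|^2 <= alpha_p^k signals that eps dominates the accuracy.
        if serious_step:
            return eps_a
        if p @ p <= alpha_p:
            return m_e * eps
        return eps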
Our line search criteria ensure two basic prerequisites for convergence: a sufficient decrease of the objective value at a serious step, and a significant modification of the next approximation to f after a null step. We have, by (2.14) and (2.2),
= v^k.  (3.10)
This shows that when ε^k decreases during a series of null steps then the algorithm collects only local subgradient information, i.e. g^{k+1} is close to ∂f(x^{k+1}).
The following line search procedure may be used for executing
Step 4.
(a) Set t_L = 0 and t = t_U = 1. Choose m satisfying m_L < m < m_R, e.g. m = (9m_L + m_R)/10.
(b) If f(x^k + t d^k) ≤ f(x^k) + m t v^k set t_L = t; otherwise set t_U = t.
Proof. Assume, for contradiction purposes, that the search does not terminate. Denote by t^i, t_L^i and t_U^i the values of t, t_L and t_U after the i-th iteration.
Since t_U^i ↓ t̂, t_L^i ↑ t̂ and t^i → t̂, the set I = {i : t^i = t_U^i} is
f(x + t̂d) ≤ f(x) + m_L t̂ v,
α(x, x + t^i d) ≤ f(x) - f(x + t^i d) + t^i |g^i| |d|.
Making use of (3.12) and the fact that 0 < m < m_R, one may obtain the desired conclusion as in the proof of Lemma 3.3.3, since f, being a convex function, has the semismoothness property (3.3.23) (see Remark 3.3.4).
work per iteration; see Remark 2.2.4. We may add that one may use additional subgradients for search direction finding when the objective function is a max function, see Remark 2.2.5. We also note that the subgradient g_f(x^k) is always used for search direction finding at the k-th iteration, i.e. we have
k(l) ∈ J^k, g^{k(l)} = g_f(x^k) and α_{k(l)}^k = 0 if k(l) ≤ k < k(l+1).  (3.13)
4. Convergence
Lemma 4.1. If Algorithm 3.1 terminates at the k-th iteration, then x^k is a minimum point of f.
Lemma 4.2. Suppose that at the k-th iteration Algorithm 3.1 cycles infinitely between Steps 1 and 3. Then 0 ∈ ∂f(x̄) for x̄ = x^k.
or equivalently
Then 0 ∈ ∂f(x̄).
Proof. (4.1) and (4.2) are equivalent, since α̃_p^k ≥ 0 for all k. If (4.2) holds, we may let k ∈ K tend to infinity in (3.6) and use the continuity of f (f is locally Lipschitzian as a convex function on R^N) to obtain f(x) ≥ f(x̄) for all x, i.e. 0 ∈ ∂f(x̄).
Lemma 4.4. (i) Suppose that the sequence {f(x^k)} is bounded from below. Then
Σ_{k=1}^∞ {t_L^k |p^k|^2 + t_L^k α̃_p^k} < ∞.  (4.5)
f(x^1) - f(x^{k+1}) = f(x^1) - f(x^2) + ... + f(x^k) - f(x^{k+1}) ≥
≥ -m_L Σ_{i=1}^k t_L^i v^i = m_L Σ_{i=1}^k t_L^i |p^i|^2
(iii) If (4.3) holds then (4.6) follows from the continuity of f and the monotonicity of {f(x^k)}. Hence f(x^k) ≥ f(x̄) for all k, so (4.5) holds and we have (4.7), as desired.
We shall now show that the properties of the dual search direction finding subproblems ensure locally uniform reductions of |p^k| after null steps.
where the function φ_C is defined (for the fixed value of the line search parameter m_R ∈ (0,1)) by
(ii) Suppose that k(l) ≤ k < k(l+1), so that x^k = x^{k(l)}. Observe that t_R^{k-1} = t_L^{k-1}, and hence y^k = x^k = x^{k(l)} and g^k = g_f(x^{k(l)}) if k = k(l). Combining this with the fact that k(l) ∈ J^k by the rules of Steps 5 and 6, and that α_{k(l)}^k = α(x^k, x^{k(l)}) = α(x^{k(l)}, x^{k(l)}) = 0, we obtain
are feasible for the k-th subproblem (2.13), so its optimal value satisfies
½|p^k|^2 ≤ ½|g^{k(l)}|^2.
Since t_R^k |d^k| is bounded by the line search rules and g^{k+1} = g_f(y^{k+1}) if x^k ∈ B, we deduce from (4.11)
from (4.8). Since ε^k = ε^{k-1}, m_e < 1 and x^k = x^{k-1}, (3.9) and (3.2c) yield α̃_p^k = α̃_p^{k-1} ≤ ε^k and α^k = α(x^k, y^k) ≤ m_e ε^k ≤ ε^k, while k ∈ J^k by the rules of Step 6, so the multipliers
are feasible in (2.13) for each ν ∈ [0,1]. Therefore the optimal value of (2.13) satisfies
Since t_L^{k-1} = 0, (3.2d) yields
Using (4.12), (4.14) and the fact that m_R ∈ (0,1), d^{k-1} = -p^{k-1} and v^{k-1} = -|p^{k-1}|^2, we deduce from Lemma 2.4.10 that the right side of inequality (4.13) is no larger than φ_C(|p^{k-1}|^2/2), so (4.9) holds, as required.
Proof. Suppose that x^k = x̄ for all k ≥ k(l) and let K = {k : |p^k|^2 ≤ α̃_p^k}. We shall consider two cases.
It remains to consider the case of an infinite number of serious steps. To this end, let K_l = {k : k(l) ≤ k < k(l+1)} and let b_l denote the minimum value taken on by max{|p^k|, α̃_p^k} in Step 2 at iterations k ∈ K_l, for all l. Note that b_l is well-defined if l ∈ L, since then there can be only finitely many executions of Step 1 at any iteration.
ε̄ ≥ 0 such that on each entrance to Step 3 we have
By the algorithm's rules, for any l and k such that k ∈ K_l and k+1 ∈ K_l
we have ε^{k(l)} = ε_a, ε^{k+1} = m_e ε^k if |p^k|^2 ≤ α̃_p^k ≤ ε^k at Step 3, and ε^{k+1} = ε^k otherwise. Therefore, if ε^k approached zero for some k ∈ K_l and large l ∈ L, then so would |p^k|^2 and α̃_p^k, which would contradict (4.15). Thus ε^k ≥ ε_e > 0 for some ε_e and all k ∈ K_l and large l ∈ L. In particular,
ε^k ≥ ε̄ > 0 for all large k ∈ K,  (4.16)
t_L^k |p^k|^2 = |t_L^k d^k| |p^k| = |x^{k+1} - x^k| |p^k| for all k,
we obtain t_L^k →_K 0 and |x^{k+1} - x^k| →_K 0. But t_L^k > 0 for all k ∈ K, so we deduce from (3.2b), the fact that t̄ > 0 is fixed, and (4.16) that
Since x^k →_K x̄ and |x^{k+1} - x^k| →_K 0, we have
Proof. If the assertion were false then Lemma 2.4.14, which holds for Algorithm 3.1 owing to (3.5a) and Lemma 4.4(i), would imply that {x^k} is bounded and has some accumulation point x̄ if {x^k} is infinite, while the proof of Lemma 4.2 shows that the method must stop if {x^k} is finite and ε_s > 0. Then Lemmas 4.6 and 4.7 would yield that max{|p^k|, α̃_p^k} ≤ ε_s for some k, and hence the method would stop, a contradiction.
Algorithm 5.1.
k-th dual subproblem (2.5) and are such that the corresponding set Ĵ^k =
d^k = -p^k and v^k = -|p^k|^2.
λ_j^k ≥ 0, j ∈ Ĵ^k, Σ_{j∈Ĵ^k} λ_j^k = 1  (5.3)
from Lemma 2.1 and the construction of Ĵ^k. Hence, by the results of Section 2.5, Lemmas 2.4.1 and 2.4.2 are true for Algorithm 5.1 and we have (3.5). In view of (5.1) and (5.3), we may set λ_p^k = 0 in (3.7) to obtain
α̃_p^k ≤ ε^k
as in Algorithm 3.1.
λ_k(ν) = ν, λ_j(ν) = (1-ν) λ_j^{k-1} for j ∈ Ĵ^{k-1}, λ_j(ν) = 0 for j ∈ J^k \ (Ĵ^{k-1} ∪ {k}),
and use (2.5.4), which follows from (5.1)-(5.3), to deduce that (4.13) holds, as before.
To sum up, Algorithm 5.1 is a globally convergent method in the sense of Theorem 4.9 and Corollaries 4.10 and 4.11.
ε^{k+1} = max{ε_a, -t_L^k v^k}  (6.1a)
or
ε^{k+1} = max{ε_a, f(x^k) - f(x^{k+1})}  (6.1b)
at Step 5 of Algorithm 3.1 if t_L^k > 0. This will enable the method to use ε^k larger than ε_a at initial iterations. At the same time, one may easily verify that this modification does not impair the convergence results of Section 4.
The above modification does not eliminate the need for the fixed threshold ε_a > 0, hence it has the second drawback mentioned above. For this reason, consider the following modification of Algorithm 3.1.
Algorithm 6.1.
ε^1 = m_e δ^1.
Step 1 (Direction finding). Do Step 1 of Algorithm 3.1.
δ̂^k = max{|p^k|^2, α̃_p^k}.  (6.2)
δ̂^k = max{|p^k|^2, α̃_p^k},
δ̂^k = α̃_p^k,
δ̂^k ≤ ε^k = m_e δ^k
{x^k} such that for some x̄ ∈ R^N one has f(x̄) ≤ f(x^k) for all k. Then there exists x̄ ∈ R^N such that 0 ∈ ∂f(x̄) and x^k → x̄ as k → ∞. Moreover,
liminf_{k→∞} δ̂^k = 0.
Proof. Suppose that f(x^k) ≥ f(x̄) for all k. We may use Lemma 4.4(i), (3.9) and the fact that t_L^k ≤ t̄ for all k to deduce that Lemma 2.4.14
(i) Suppose that ε^k stays constant for all large k. Then the desired conclusion follows from Lemma 4.6 and the proof of Lemma 4.7, which is valid for constant ε^k.
(ii) Suppose that ε^k tends to zero. Then δ^k ↓ 0 and for infinitely many k we have max{|p^k|^2, α̃_p^k} ≤ δ^k, while x^k → x̄, so (4.2) holds and Lemma 4.3 yields the desired conclusion.
minimize ½|Σ_{j∈J^k} λ_j g^j + λ_p p^{k-1}|^2 + Σ_{j∈J^k} λ_j s^k α_j^k + λ_p s^k α̃_p^k
(6.3)
subject to λ_j ≥ 0, j ∈ J^k, λ_p ≥ 0, Σ_{j∈J^k} λ_j + λ_p = 1,
s^k (Σ_{j∈J^k} λ_j^k α_j^k + λ_p^k α̃_p^k - ε^k) = 0.
By (3.8),
α̃_p^k = Σ_{j∈J^k} λ_j^k α_j^k + λ_p^k α̃_p^k,
hence we have s^k α̃_p^k = s^k ε^k. Combining the preceding relations with (2.15) and invoking Lemma 2.2.1, we see that (6.3) is the dual to the problem
minimize ½|d|^2 + v
over (d,v) ∈ R^{N+1}
subject to -s^k α_j^k + <g^j, d> ≤ v, j ∈ J^k,  (6.4)
-s^k α̃_p^k + <p^{k-1}, d> ≤ v,
which has a unique solution (d^k, v^k) with d^k = -p^k (p^k is given by (2.15)) and
that x^k = x̄, t_L^k = 0 and ε^k = ε > 0 for some fixed x̄, ε and all large k. Then |p^k| → 0 as k → ∞.
α_{k+1}^{k+1} ≤ m_e ε and
for all k. Moreover, since (d^{k+1}, v^{k+1}) solves the (k+1)-st subproblem (6.4), while k+1 ∈ J^{k+1}, we have -s^{k+1} α_{k+1}^{k+1} + <g^{k+1}, d^{k+1}> ≤ v^{k+1}, so
since α_{k+1}^{k+1} ≤ m_e ε, for all k. Subtracting (6.6a) from (6.6b) and rearranging terms yields
s^{k+1} ≥ s^k m_R/(1 - m_e) - c^k/((1 - m_e)ε),
where
c^k = |p^{k+1}|^2 - m_R |p^k|^2 - <g^{k+1}, p^{k+1} - p^k>,  (6.7)
s^{k+1} ≥ s^k - c^k/((1 - m_e)ε) for all k.  (6.8)
(ii) From the proof of Lemma 4.5 we deduce that {g^k} is bounded and that p^{k+1} = Nr[g^{k+1}, p^k] for all k. Therefore {|p^k|} is monotonically nonincreasing, and (6.6b) and the positivity of (1 - m_e)ε imply the existence of a constant s̄ such that s^k ≤ s̄ for all k.
To this end, suppose that p^{k+1} →_K p̄ for some p̄ and an infinite set K. Since p^{k+1} = Nr[g^{k+1}, p^k], Lemma 1.2.12 implies <p^{k+1}, p^k> ≥ |p^{k+1}|^2. Passing to the limit with k ∈ K and using the monotonicity of {|p^k|} yields <p̄, p̄> ≥ |p̄| |p̄| ≥ 0, so p̄ ≠ 0 from elementary properties of the inner product. This shows that p^k and p^{k+1} have a common limit p̄ ≠ 0 as k → ∞, k ∈ K. By (6.7) and the boundedness of {g^k}, we have
But s^k ≤ s̄, so (6.8) yields s^{k+1} ≤ s̄ - k c̄/(2(1 - m_e)ε) < 0 for large k, contra-
7. Extension to Nonconvex Unconstrained Problems
α(x,y) = max{|f(x) - f̄(x;y)|, γ|x - y|^2}  (7.1)
member of ∂f(x), where f(x) - f̄(x;y) is the error with which the linearization of f calculated at y
is an overestimate of |x^k - y^j|.
Algorithm 7.1.
m_e < 1 and t̄ ≤ 1 ≤ t̂. Select a distance measure parameter γ > 0 (γ = 0 if f is convex). Set the reset indicator r_a^1 = 1.
Set the counters k = 1, l = 0 and k(0) = 1.
Set
(p^k, f̃_p^k, s̃_p^k) = Σ_{j∈J^k} λ_j^k (g^j, f_j^k, s_j^k) + λ_p^k (p^{k-1}, f̃_p^{k-1}, s̃_p^{k-1}),  (7.6)
α̃_p^k = max{|f(x^k) - f̃_p^k|, γ(s̃_p^k)^2}.  (7.7)
Step 4 (Line search). By a line search procedure (e.g. Line Search Procedure 3.2), find two stepsizes t_L^k and t_R^k such that 0 ≤ t_L^k ≤ t_R^k and t_R^k = t_L^k if t_L^k > 0, and such that the two corresponding points defined by x^{k+1} = x^k + t_L^k d^k and y^{k+1} = x^k + t_R^k d^k satisfy
Ĵ^k ∪ {k+1} contains k(l), i.e. k(l) ∈ Ĵ^k if k(l) < k+1. Set g^{k+1} =
r_a^{k+1} = 1 and go to Step 8.
w^k = ½|p^k|^2 + α̃_p^k
satisfies
The first group comprises s_j^k ≤ ā with j ≤ k(l), since the rules of Steps 7 and 8 ensure a^{k(l)} ≤ ā after a serious step, while the s_j^k stay constant between serious steps. Since the second group contains s_j^k = |y^j - x^j| = t_R^j |d^j| ≤ t̂|p^j| with x^j = x^k and k(l) < j ≤ k, while x^k →_K x̄, we deduce from the proof of Lemma 4.5 that such s_j^k are uniformly bounded for k ∈ K, because so are the corresponding p^j. Hence {a^k}_{k∈K} is bounded, as desired.
hence in the convex case Theorem 4.9 and Corollary 4.10 hold for Algorithm 7.1. We conclude that Algorithm 7.1 is a globally convergent method.
We may add that one may modify the line search criteria of the method by replacing v^k in (7.8) by ṽ^k defined by (6.5). This modification will not impair the preceding global convergence results, since the proof of Lemma 6.4 remains valid.
where the functions f : R^N → R and F : R^N → R are convex, but not necessarily differentiable. We assume that the Slater constraint qualification is fulfilled, i.e. F(x̄) < 0 for some x̄ ∈ R^N, so that the feasible set
S = {x ∈ R^N : F(x) ≤ 0}
H(y;x) = max{f(y) - f(x), F(y)}  (8.2)
0 ∈ ∂H(x̄;x̄).
H^k(x) = max{f(x) - f(x^k), F(x)}
α_{f,j}^k = f(x^k) - f_j^k and α_{F,j}^k = -F_j^k,  (8.6)
g_F^j ∈ ∂_α H^k(x^k) for α = α_{F,j}^k.
G^k(ε) = {g ∈ R^N : g = Σ_{j∈J_f^k} λ_j g_f^j + Σ_{j∈J_F^k} μ_j g_F^j, Σ_{j∈J_f^k} λ_j α_{f,j}^k + Σ_{j∈J_F^k} μ_j α_{F,j}^k ≤ ε,
λ_j ≥ 0, j ∈ J_f^k, μ_j ≥ 0, j ∈ J_F^k, Σ_{j∈J_f^k} λ_j + Σ_{j∈J_F^k} μ_j = 1},  (8.7)
λ_j ≥ 0, j ∈ J_f^k, λ_p ≥ 0, μ_j ≥ 0, j ∈ J_F^k, μ_p ≥ 0,
Moreover, v^k = -|p^k|^2 may be regarded as an approximate directional derivative of H^k at x^k in the direction d^k; cf. (3.10).
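The improvement function (8.2) is straightforward to evaluate; for a feasible x it vanishes at y = x, so a descent direction for it decreases f while maintaining feasibility (a trivial sketch, with f and F assumed to be callables):

    def improvement(f, F, y, x):
        # H(y;x) = max{f(y) - f(x), F(y)}; H(x;x) = max{0, F(x)} = 0 when
        # x is feasible, i.e. F(x) <= 0.
        return max(f(y) - f(x), F(y))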
Algorithm 8.1.
Step 1 (Direction finding). Find multipliers λ_j^k, j ∈ J_f^k, λ_p^k, μ_j^k, j ∈ J_F^k, and μ_p^k that solve the k-th dual subproblem
minimize ½|Σ_{j∈J_f^k} λ_j g_f^j + λ_p p_f^{k-1} + Σ_{j∈J_F^k} μ_j g_F^j + μ_p p_F^{k-1}|^2,
subject to λ_j ≥ 0, j ∈ J_f^k, λ_p ≥ 0, μ_j ≥ 0, j ∈ J_F^k, μ_p ≥ 0,
(8.10)
Σ_{j∈J_f^k} λ_j + λ_p + Σ_{j∈J_F^k} μ_j + μ_p = 1,
ν_f^k = Σ_{j∈J_f^k} λ_j^k + λ_p^k and ν_F^k = Σ_{j∈J_F^k} μ_j^k + μ_p^k,
λ̃_j^k = λ_j^k / ν_f^k for j ∈ J_f^k and λ̃_p^k = λ_p^k / ν_f^k if ν_f^k ≠ 0,
λ̃_k^k = 1, λ̃_j^k = 0 for j ∈ J_f^k \ {k} and λ̃_p^k = 0 if ν_f^k = 0,  (8.11)
μ̃_j^k = μ_j^k / ν_F^k for j ∈ J_F^k and μ̃_p^k = μ_p^k / ν_F^k if ν_F^k ≠ 0,
μ̃_k^k = 1, μ̃_j^k = 0 for j ∈ J_F^k \ {k} and μ̃_p^k = 0 if ν_F^k = 0.
(p_f^k, f̃_p^k) = Σ_{j∈J_f^k} λ̃_j^k (g_f^j, f_j^k) + λ̃_p^k (p_f^{k-1}, f̃_p^{k-1}),
(8.12)
(p_F^k, F̃_p^k) = Σ_{j∈J_F^k} μ̃_j^k (g_F^j, F_j^k) + μ̃_p^k (p_F^{k-1}, F̃_p^{k-1}),
p^k = ν_f^k p_f^k + ν_F^k p_F^k.  (8.13)
If max{|p^k|^2, α̃_p^k} ≤ ε_s, terminate; otherwise, continue.
f(x^{k+1}) ≤ f(x^k) + m_L t_L^k v^k,  (8.16a)
t_L^k ≥ t̄ or max{α(x^k, x^{k+1}), α(x^k, y^{k+1})} > m_e ε^k if t_L^k > 0,  (8.16c)
where
g(y) = g_f(y) and α(x,y) = f(x) - f(y) - <g_f(y), x - y> if y ∈ S,
(8.17)
g(y) = g_F(y) and α(x,y) = -F(y) - <g_F(y), x - y> if y ∉ S.
p^k ∈ ∂_{ε_s} H(x^k; x^k) and |p^k|^2 ≤ ε_s. This estimate justifies our stopping criterion and shows that x^k is stationary for f on S if ε_s = 0, since stationary points x̄ satisfy 0 ∈ ∂H(x̄;x̄) and are optimal for problem (8.1).
The method updates the approximation tolerances ε^k according to modified rules of Section 6. Note that we always have α̃_p^k ≤ ε^k, since the proof of Lemma 6.4.6 yields a suitable extension of relation (3.8). It is worth adding that the method may also use the efficient strategy of Algorithm 6.1 for regulating ε^k, as will be shown below.
The line search criteria (8.16) extend (3.2) to the constrained
for all k ≥ 1). The nontrivial aspect of this extension consists in allowing for a serious step when α(x^k, y^{k+1}) > m_e ε^k, which indicates, by the properties of the function α and the fact that m_e ε^k is positive,
(d) If α(x, x+td) ≤ m_e ε and <g(x+td), d> ≥ m_R v set t_R = t and t_L = t_B = 0, and return.
iterations with stepsizes t_L = t_L^k, t_R = t_R^k and t_B = t_B^k satisfying the requirements of Step 4 of Algorithm 8.1.
Proof. We shall use a combination of the proofs of Lemmas 3.3 and 6.3.3. Assume, for contradiction purposes, that the search does not terminate. Let
the existence of t̂ ≥ 0 such that t^i → t̂, t_U^i ↓ t̂ and t_L^i ↑ t̂, and that the set I is infinite. We shall consider the following two cases.
(i) Suppose that t̂ > 0. Then, since t_L^i ↑ t̂, we have
t_L^i > 0 for large i,
and, since t^i ∈ {t_L^i, t_U^i} for all i, the rules of step (c) imply that step (d) is entered with α(x, x + t^i d) ≤ m_e ε for large i. Therefore in step (d)
of t at step (e), and then we have t_U^{i+1} ≤ t_U^i ≤ 10 t_L^{i+1}. This shows that t_B ≤ 10 t_L at termination.
The rules for choosing J_f^{k+1} at Step 6 yield the following analogue of (3.13)
k(l) ∈ J_f^k, g_f^{k(l)} = g_f(x^k) and α_{f,k(l)}^k = 0 if k(l) ≤ k < k(l+1).  (8.23)
w^k = ½|p^k|^2 + α̃_p^k  (8.24a)
and observe that we always have
α(x,y) → 0 if x,y → x̄ ∈ S,
Corollary 8.7. If the level set {x ∈ S : f(x) ≤ f(x^1)} is bounded and the
Ĵ_f^k = {j ∈ J_f^k : λ_j^k ≠ 0} and Ĵ_F^k = {j ∈ J_F^k : μ_j^k ≠ 0}  (8.25a)
should satisfy
|Ĵ_f^k ∪ Ĵ_F^k| ≤ N + 1.  (8.25b)
The required multipliers λ_j^k and μ_j^k may be found by the Mifflin (1978) algorithm; see Remark 2.2.
One may easily verify that the method with subgradient selection is globally convergent in the sense of Theorem 8.5 and Corollaries 8.6-8.7. To this end, it suffices to modify the preceding convergence analysis of the method with subgradient aggregation by using (5.5.2)-(5.5.4) in the proof of an analogue of Lemma 4.5.
Algorithm 8.1 may be modified by incorporating the approximation tolerance updating strategy of Algorithm 6.1. This will not impair the preceding convergence results, since an analogue of Lemma 6.3 (in which
9. Extensions to Nonconvex Constrained Problems
α_f(x,y) = max{|f(x) - f̄(x;y)|, γ_f |x - y|^2},
α_F(x,y) = max{|F̄(x;y)|, γ_F |x - y|^2},
defined in terms of the linearizations
following method.
Al~orithm 9.1.
minimize over $(\lambda, \mu)$ the function

$\tfrac{1}{2} \Big| \sum_{j \in J_f^k} \lambda_j g_f^j + \lambda_p p_f^{k-1} + \sum_{j \in J_F^k} \mu_j g_F^j + \mu_p p_F^{k-1} \Big|^2$

subject to

$\lambda_j \ge 0$, $j \in J_f^k$, $\lambda_p \ge 0$, $\mu_j \ge 0$, $j \in J_F^k$, $\mu_p \ge 0$,

$\sum_{j \in J_f^k} \lambda_j + \lambda_p + \sum_{j \in J_F^k} \mu_j + \mu_p = 1$,  (9.4)

$\lambda_p = \mu_p = 0$ if $r_a^k = 1$,

$\sum_{j \in J_f^k} \lambda_j \alpha_{f,j}^k + \lambda_p \alpha_{f,p}^k + \sum_{j \in J_F^k} \mu_j \alpha_{F,j}^k + \mu_p \alpha_{F,p}^k \le \varepsilon_k$.
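Subproblem (9.4) is a small convex QP over the unit simplex with one additional linear inequality, so any standard QP solver applies. A hedged sketch using scipy, with illustrative names: the columns of G stack the subgradients $g_f^j$, $p_f^{k-1}$, $g_F^j$, $p_F^{k-1}$, alpha collects the matching linearization errors, and the case $r_a^k = 1$ is handled by simply omitting the p-columns.

    import numpy as np
    from scipy.optimize import minimize

    def solve_qp_94(G, alpha, eps_k):
        # minimize 0.5*|G @ lam|^2 over the simplex, with alpha @ lam <= eps_k
        m = G.shape[1]
        obj = lambda lam: 0.5 * np.dot(G @ lam, G @ lam)
        jac = lambda lam: G.T @ (G @ lam)
        cons = [{"type": "eq",   "fun": lambda lam: lam.sum() - 1.0},
                {"type": "ineq", "fun": lambda lam: eps_k - alpha @ lam}]
        res = minimize(obj, np.full(m, 1.0 / m), jac=jac,
                       bounds=[(0.0, 1.0)] * m, constraints=cons,
                       method="SLSQP")
        return res.x

A production code would exploit the special simplex structure of the constraints; the sketch above merely suffices for experiments.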
Calculate the scaled multipliers $\nu_f^k$, $\lambda_j^k$ ($j \in J_f^k$), $\lambda_p^k$, $\nu_F^k$, $\mu_j^k$ ($j \in J_F^k$) and $\mu_p^k$ by (8.11). Compute the aggregate subgradients

$(p_f^k, \tilde f_p^k, \tilde s_f^k) = \sum_{j \in J_f^k} \lambda_j^k (g_f^j, f_j^k, s_j^k) + \lambda_p^k (p_f^{k-1}, \tilde f_p^{k-1}, \tilde s_f^{k-1})$,

$(p_F^k, \tilde F_p^k, \tilde s_F^k) = \sum_{j \in J_F^k} \mu_j^k (g_F^j, F_j^k, s_j^k) + \mu_p^k (p_F^{k-1}, \tilde F_p^{k-1}, \tilde s_F^{k-1})$,  (9.6)
and $p^k = \nu_f^k p_f^k + \nu_F^k p_F^k$.
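In code, the aggregation (9.6) and the combined subgradient $p^k$ are plain convex combinations; a minimal numpy sketch with hypothetical names mirroring the text:

    import numpy as np

    def aggregate(lam, Gf, fv, sf, lam_p, prev_f,
                  mu, GF, Fv, sF, mu_p, prev_F, nu_f, nu_F):
        # Rows of Gf, GF are g_f^j, g_F^j; fv, Fv, sf, sF are the
        # linearization values and distance measures.
        # prev_f = (p_f^{k-1}, f_p^{k-1}, s_f^{k-1}); likewise prev_F.
        pf = lam @ Gf + lam_p * prev_f[0]       # aggregate subgradient of f
        fp = lam @ fv + lam_p * prev_f[1]       # aggregate linearization value
        sf_new = lam @ sf + lam_p * prev_f[2]   # aggregate distance measure
        pF = mu @ GF + mu_p * prev_F[0]
        Fp = mu @ Fv + mu_p * prev_F[1]
        sF_new = mu @ sF + mu_p * prev_F[2]
        p = nu_f * pf + nu_F * pF               # combined direction generator
        return (pf, fp, sf_new), (pF, Fp, sF_new), p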
Step 4 (Line search). By a line search procedure (e.g. Line Search Procedure 8.2), find three not necessarily different stepsizes $t_L^k$, $t_R^k$ and $t_B^k$ satisfying the requirements of Step 4 of Algorithm 8.1 with $g$ and $\alpha$ defined by (9.3).
Step 8 (Distance resetting). Delete from $J_f^{k+1}$ and $J_F^{k+1}$ all indices $j$ with $s_j^{k+1} > \bar a/2$, and set $a^{k+1} = \max\{s_j^{k+1} : j \in J_f^{k+1} \cup J_F^{k+1}\}$.
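A sketch of this resetting step, assuming the bundles are kept as index lists with an array s of distance measures (the threshold name a_bar is illustrative):

    def distance_reset(Jf, JF, s, a_bar):
        # Step 8: drop indices whose distance measure exceeds a_bar/2,
        # then recompute the reset parameter a^{k+1}.
        Jf_new = [j for j in Jf if s[j] <= a_bar / 2]
        JF_new = [j for j in JF if s[j] <= a_bar / 2]
        a_next = max((s[j] for j in Jf_new + JF_new), default=0.0)
        return Jf_new, JF_new, a_next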
$w^k = \tfrac{1}{2} |p^k|^2 + \tilde\alpha_p^k$  (9.8a)

satisfies

$w^k \le 2 \max\{|p^k|^2, \tilde\alpha_p^k\}$ and $\max\{|p^k|^2, \tilde\alpha_p^k\} \le 2 w^k$.  (9.8b)
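Both bounds in (9.8b) follow directly from (9.8a): $w^k \le |p^k|^2 + \tilde\alpha_p^k \le 2\max\{|p^k|^2, \tilde\alpha_p^k\}$, while $|p^k|^2 \le 2w^k$ and $\tilde\alpha_p^k \le w^k$. A purely illustrative numerical check:

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(1000):
        p2 = 10 * rng.random()      # stands in for |p^k|^2 >= 0
        a = 10 * rng.random()       # stands in for alpha_p^k >= 0
        w = 0.5 * p2 + a            # (9.8a)
        assert w <= 2 * max(p2, a) and max(p2, a) <= 2 * w   # (9.8b)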
Then:
(iii) If $f$ and $F$ are convex and $F(x) < 0$ for some $x$, then $\{x^k\}$ minimizes $f$ on $S$, i.e. $\{x^k\} \subset S$ and $f(x^k) \to \inf\{f(x) : x \in S\}$. Moreover, $\{x^k\}$ converges to a minimum point of $f$ on $S$ whenever $f$ attains its infimum on $S$.
Numerical Examples

1. Introduction

$|d^k| > m_a a^k$ and $w^k \le \varepsilon_s$

2. Numerical Results
$(h_i) = (1, 5, 10, 2, 4, 3, 1.7, 2.5, 6, 3.5)$,

          | 0  2  1  1  3  0  1  1  0  1 |
          | 0  1  2  4  2  2  1  0  0  1 |
(a_ij)^T =| 0  1  1  1  1  1  1  1  2  2 |,
          | 0  1  1  2  0  0  1  2  1  0 |
          | 0  3  2  2  1  1  1  1  0  0 |

$\bar x = (1.12434, 0.97945, 1.47770, 0.92023, 1.12429)$,
$f(\bar x) = 22.60016$,
$x^1 = (0, 0, 0, 0, 1)$, $f(x^1) = 80$,
$x^{49} = (1.12433, 0.97943, 1.47749, 0.92027, 1.12425)$.
Table 2.1

ε_s     k    f(x^k)     Lf
10^-4   34   22.60021   64
10^-5   41   22.60017   76
10^-6   49   22.60016   90
$A_{ii} = i \sin(i)/10 + \sum_{j \ne i} A_{ij}$,

$b_i = \exp(i/l)\,\sin(i-1)$,

$\bar x = (-0.1263, -0.0346, -0.0067, 0.2668, 0.0673, 0.2786, 0.0744, 0.1387, 0.0839, 0.0385)$,
$f(\bar x) = -0.8414$,

$x^{51} = (-0.1263, -0.0342, -0.0062, 0.0269, 0.0671, -0.2783, 0.0744, 0.1385, 0.0836, 0.0383)$.
where

$A_{ij} = 1/(i+j)$,  $b_i = \sum_{j=1}^{N} 1/(i+j)$,  $i,j = 1,\dots,N$, $N \ge 2$,

$c_i = -1/(i+1) - \sum_{j=1}^{N} 1/(i+j)$,

$\bar x = (1, 1, \dots, 1)$,
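The data above can be generated in a few lines; a numpy sketch (only the data construction is shown, since the objective built from A, b, c belongs to the problem statement given elsewhere):

    import numpy as np

    def hilbert_like_data(N):
        # A_ij = 1/(i+j), b_i = sum_j 1/(i+j), c_i = -1/(i+1) - b_i
        i = np.arange(1, N + 1)
        A = 1.0 / (i[:, None] + i[None, :])
        b = A.sum(axis=1)
        c = -1.0 / (i + 1) - b
        return A, b, c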
Table 2.2

N    f(x̄)       k    f(x^k)     Lf
5    -6.26865   14   -6.26865   31
10   -13.1351   23   -13.1351   47
15   -20.0420   32   -20.0420   67
Table 2.3

ε_s     k    f(x^k)     Lf
10^-4   16   -20.0411   26
10^-5   21   -20.0420   41
10^-6   25   -20.0420   51
10^-7   32   -20.0420   67
Table 2.4

N    ε_s     k    f(x^k)     Lf    LF
5    10^-4   13   -6.26610   30    43
5    10^-5   20   -6.26861   52    74
10   10^-4   46   -13.1344   100   154
10   10^-5   50   -13.1348   110   168
15   10^-3   20   -19.9912   56    83
15   10^-4   43   -20.0396   117   174
15   10^-5   51   -20.0406   131   193
2.4. CRESCENT

$f(x) = \max\{x_1^2 + (x_2-1)^2 + x_2 - 1,\ -x_1^2 - (x_2-1)^2 + x_2 + 1\}$,

$\bar x = (0, 0)$, $f(\bar x) = 0$.
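CRESCENT is easy to reproduce; the objective and one subgradient in Python (at a tie, either piece's gradient is a valid subgradient):

    import numpy as np

    def crescent(x):
        f1 = x[0]**2 + (x[1] - 1.0)**2 + x[1] - 1.0
        f2 = -x[0]**2 - (x[1] - 1.0)**2 + x[1] + 1.0
        return max(f1, f2)

    def crescent_subgrad(x):
        # gradient of an active piece of the max
        f1 = x[0]**2 + (x[1] - 1.0)**2 + x[1] - 1.0
        f2 = -x[0]**2 - (x[1] - 1.0)**2 + x[1] + 1.0
        if f1 >= f2:
            return np.array([2.0 * x[0], 2.0 * (x[1] - 1.0) + 1.0])
        return np.array([-2.0 * x[0], -2.0 * (x[1] - 1.0) + 1.0])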
Table 2.5

ε_s     k    f(x^k)    Lf
10^-6   16   8·10^-6   29
10^-9   21   7·10^-6   36
The problem is to

minimize $2 \sum_{j=1}^{5} d_j y_j^3 + \langle Cy, y \rangle - \langle b, z \rangle$ over all $(y, z) \in R^5 \times R^{10}$
matrix $A$ and vector $b$:

 -16     2    0    1    0       -40
   0    -2    0    4    2        -2
  -3.5   0    2    0    0        -0.25
   0    -2    0   -4   -1        -4
   0    -9   -2   -1   -2.8      -4
   2     0   -4    0    0        -1

symmetric matrix $C$, vectors $d$ and $e$
Table 2.6

k     f(x^k)   Lf
100   32.85    259
150   32.54    384
200   32.38    497
250   32.36    626
300   32.35    766
$x = (a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2, A) \in R^9$,

$H(x,g) = A \prod_{i=1}^{2} \left( \dfrac{1 + a_i^2 + b_i^2 + 2b_i(2\cos^2 g - 1) + 2a_i(1+b_i)\cos g}{1 + c_i^2 + d_i^2 + 2d_i(2\cos^2 g - 1) + 2c_i(1+d_i)\cos g} \right)^{1/2}$,

$g = \pi h$,

$S(h) = |1 - 2h|$,

$h_i = 0.63 + (i-25)\cdot 0.03$ for $i = 25, \dots, 35$,

$h_i = 0.95 + (i-36)\cdot 0.01$ for $i = 36, \dots, 41$,

$\bar x = (0, 0.980039, 0, -0.165771, 0, -0.735078, 0, -0.767228, 0.3679)$,
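For readers wishing to reproduce this example, H(x,g) transcribes directly to Python (note that 2cos²g − 1 = cos 2g):

    import numpy as np

    def H(x, g):
        # x packs (a1, b1, c1, d1, a2, b2, c2, d2, A)
        a1, b1, c1, d1, a2, b2, c2, d2, A = x
        val = A
        for a, b, c, d in ((a1, b1, c1, d1), (a2, b2, c2, d2)):
            cg = np.cos(g)
            num = 1 + a**2 + b**2 + 2*b*(2*cg**2 - 1) + 2*a*(1 + b)*cg
            den = 1 + c**2 + d**2 + 2*d*(2*cg**2 - 1) + 2*c*(1 + d)*cg
            val *= np.sqrt(num / den)
        return val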
Table 2.7

ε_s     k     f(x^k)       Lf
10^-3   23    17·10^-3     43
10^-4   54    14·10^-3     105
10^-5   201   6.45·10^-3   400
where

$Q(s,x) = \dfrac{1}{2s+4} \begin{pmatrix} \dfrac{3s^2+9s+8}{d_1(s,x)} & \dfrac{-3s^2-7s-4}{d_2(s,x)} \\ \dfrac{-2s-2}{d_1(s,x)} & \dfrac{s^2-8s+10}{d_2(s,x)} \end{pmatrix}$

gives an upper bound on the noise power per hertz in any channel at the plant input. The design requirement is expressed as $F(x) \le 0$.
Table 2.8

Problem   k    Lf   f(x^k)     x_1^k    x_2^k    F(x^k)_+
1         19   37   0.704194   0.5030   1.5088   0
2         15   45   1.032101   1.0321   1.0321   0
3         21   69   0.740497   0.7405   1.2405   5·10^-7
4         31   78   0.923553   1.4236   0.9236   0
References
INDEX

objective function 1
optimality condition 15, 16, 17