
Lecture Notes in

Mathematics
A collection of informal reports and seminars
Edited by A. Dold, Heidelberg and B. Eckmann, Zürich

257

Richard B. Holmes
Purdue University, Lafayette, IN/USA

A Course on
Optimization and
Best Approximation

Springer-Verlag
Berlin · Heidelberg · New York 1972

AMS Subject Classifications (1970): 41-02, 41A50, 41A65, 46B99, 46N05, 49-02, 49B30, 90C25

ISBN 3-540-05764-1 Springer-Verlag Berlin · Heidelberg · New York
ISBN 0-387-05764-1 Springer-Verlag New York · Heidelberg · Berlin

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned,
specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine
or similar means, and storage in data banks.
Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher,
the amount of the fee to be determined by agreement with the publisher.
© by Springer-Verlag Berlin · Heidelberg 1972. Library of Congress Catalog Card Number 70-189753. Printed in Germany.

Offsetdruck: Julius Beltz, Hemsbach/Bergstr.


PREFACE

The course for which these notes were originally prepared was a

one-semester graduate level course at Purdue University, dealing

with optimization in general and best approximation in particular.

The prerequisites were modest: a semester's worth of functional

analysis together with the usual background required for such a

course. A few prerequisite results of special importance have been

gathered together for ease of reference in Part I.

My general aim was to present an interesting field of application

of functional analysis. Although the tenor of the course is

consequently rather theoretical, I made some effort to include a

few fairly concrete examples, and to bring under consideration

problems of genuine practical interest. Examples of such problems

are convex programs (§'s 11-13), calculus of variations (§17),

minimum effort control (§21), quadrature formulas (§24), construction

of "good" approximations to functions (§'s 26 and 29), optimal

estimation from inadequate data (§33), solution of various ill-posed

linear systems (§'s 34-35). Indeed, the bulk of the notes is devoted

to a presentation of the theoretical background needed for the study

of such problems.

No attempt has been made to provide encyclopedic coverage of

the various topics. Rather I tried only to show some highlights,

techniques, and examples in each of the several areas studied.

Should a reader be stimulated to pursue a particular topic further,

he will hopefully find an adequate sample of the pertinent literature

included in the bibliographies. (Note that in addition to the main

bibliography between Parts IV and V, each section in Part V has its

own special set of references appended.)



The first three parts of these notes constitute a slightly

fleshed-out arrangement of the material actually covered in the Purdue

course. That course also involved the solution of numerous problems;

about 50 of those problems have been included here and Part IV

contains hints and/or complete solutions to most of them. Thus this

portion of the notes is reasonably self-contained, modulo the

indicated prerequisites (minor exceptions to this assertion occur on

pages 28, 81 and 89). Part V is a bit more loosely written; in

particular, it contains a few references without proof to rather

deep results. I feel that all the topics in Part V might have

legitimately been included in the course had time permitted. The

order of §'s 32 and 33 is somewhat arbitrary and could have been

reversed. §'s 34 and 35 provide some applications of metric

projections by illustrating their natural occurrence in attempts to

handle ill-posed linear equations.

It is my hope that the present notes can serve as the basis for

other courses besides the original; for example, a two-quarter

course covering essentially everything, a one-quarter course on best

approximation covering Part III, §'s 31, 32, and perhaps 19 and 35,

or a one-quarter course on convexity and optimization covering

Part II, §33 (note that 33b) contains a proof of Valadier's formula

for the subdifferential of a supremum of convex functions), and

perhaps some of the early material in Part III.

As format goes, sections are divided into sub-sections; each

sub-section contains at most one theorem, at most one definition,

etc. (the sole exception to this being 33e)). A reference to (sub-

section) 15b), say, is unambiguous; a reference to b), say, refers

to sub-section b) of the current section.


Because of typographical limitations, the symbol "φ" has been

used in two ways, which hopefully are distinguishable by context:

it denotes on occasion the empty set, and at other times, it

denotes a linear functional.

Some acknowledgments are now in order. Professor Frank Deutsch

generously made available to me a copy of his own lecture notes on

best approximation, and these proved quite useful in the arrangement

of some of the material in Part III. Mr. Philip Smith provided

several helpful comments about Chebyshev centers in §33. Professor

Paul Halmos kindly recommended the inclusion of the manuscript in

the Springer Lecture Notes Series. Finally, it is a pleasure to

thank Mrs. Nancy Eberle and Mrs. Judy Snider for their competent and

cheerful assistance in the preparation of the manuscript.

West Lafayette, Indiana


November, 1971
CONTENTS

Part I. Preliminaries . . . . . . . . . . . . . . . . . . . .   1

  §1. Notation . . . . . . . . . . . . . . . . . . . . . . .   1
  §2. The Hahn-Banach Theorem . . . . . . . . . . . . . . .   2
  §3. The Separation Theorems . . . . . . . . . . . . . . .   4
  §4. The Alaoglu-Bourbaki Theorem . . . . . . . . . . . . .   7
  §5. The Krein-Milman Theorem . . . . . . . . . . . . . . .   8

Part II. Theory of Optimization . . . . . . . . . . . . . . .  14

  §6. Convex Functions . . . . . . . . . . . . . . . . . . .  14
  §7. Directional Derivatives . . . . . . . . . . . . . . .  16
  §8. Subgradients . . . . . . . . . . . . . . . . . . . . .  20
  §9. Normal Cones . . . . . . . . . . . . . . . . . . . . .  23
  §10. Subdifferential Formulas . . . . . . . . . . . . . .  25
  §11. Convex Programs . . . . . . . . . . . . . . . . . . .  29
  §12. Kuhn-Tucker Theory . . . . . . . . . . . . . . . . .  32
  §13. Lagrange Multipliers . . . . . . . . . . . . . . . .  36
  §14. Conjugate Functions . . . . . . . . . . . . . . . . .  42
  §15. Polarity . . . . . . . . . . . . . . . . . . . . . .  48
  §16. Dubovitskii-Milyutin Theory . . . . . . . . . . . . .  51
  §17. An Application . . . . . . . . . . . . . . . . . . .  56
  §18. Conjugate Functions and Subdifferentials . . . . . .  58
  §19. Distance Functions . . . . . . . . . . . . . . . . .  61
  §20. The Fenchel Duality Theorem . . . . . . . . . . . . .  65
  §21. Some Applications . . . . . . . . . . . . . . . . . .  70

Part III. Theory of Best Approximation . . . . . . . . . . .  76

  §22. Characterization of Best Approximations . . . . . . .  76
  §23. Extremal Representations . . . . . . . . . . . . . .  81
  §24. Application to Gaussian Quadrature . . . . . . . . .  88
  §25. Haar Subspaces . . . . . . . . . . . . . . . . . . .  91
  §26. Chebyshev Polynomials . . . . . . . . . . . . . . . .  98
  §27. Rotundity . . . . . . . . . . . . . . . . . . . . . . 105
  §28. Chebyshev Subspaces . . . . . . . . . . . . . . . . . 109
  §29. Algorithms for Best Approximation . . . . . . . . . . 118
  §30. Proximinal Sets . . . . . . . . . . . . . . . . . . . 123

Part IV. Comments on the Problems . . . . . . . . . . . . . . 128

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . 138

Part V. Selected Special Topics . . . . . . . . . . . . . . . 145

  §31. E-spaces . . . . . . . . . . . . . . . . . . . . . . . 145
  §32. Metric Projections . . . . . . . . . . . . . . . . . . 157
  §33. Optimal Estimation . . . . . . . . . . . . . . . . . . 177
  §34. Quasi-Solutions . . . . . . . . . . . . . . . . . . . 203
  §35. Generalized Inverses . . . . . . . . . . . . . . . . . 214
Part I

Preliminaries

§1. Notation

Throughout these notes we will be dealing with linear spaces X, Y, ..., and various mappings defined on them. The underlying scalar field may be either the real or complex number field, unless one or the other is explicitly singled out. We list below some of the abbreviations and/or symbols to be employed throughout the text. Although not all used right away, it is convenient to have them collected together for ease of reference. Notation of less frequent usage will be introduced as the need arises.

We write:

ls - for linear space;

tls - for topological linear space;

lcs - for locally convex (Hausdorff) space;

nls - for normed linear space;

θ - for the zero vector in a ls;

Xᵣ - for the real restriction of a complex ls X;

X' - for the algebraic dual of a ls;

X* - for the continuous dual of a tls;

U(X) - for the unit ball {x ∈ X: ||x|| ≤ 1} of a nls X;

S(X) - for the unit sphere {x ∈ X: ||x|| = 1} of a nls X;

L(X,Y) - for the space of all continuous linear maps from a tls X into a tls Y;

Rⁿ - for real Euclidean n-space;

eᵢ - for the i-th standard unit vector in Rⁿ;

z̄ - for the conjugate of a complex number z;

sgn (z) - for the signum z̄/|z| of a non-zero complex number z (with sgn (0) = 0);

span (A) - for the linear hull of a set A;

co (A) - for the convex hull of a set A;

int (A) - for the interior of a set A;

rel-int (A) - for the relative interior of a set A;

cl (A) (or sometimes Ā) - for the closure of a set A;

f|A - for the restriction of a function f to a subset A of its domain;

wrt - for "with respect to";

nas - for "necessary and sufficient";

C(Ω) - for the space of continuous scalar-valued functions on a compact Hausdorff space Ω;

rca (Ω) - for the space of regular Borel measures on such a space Ω;

l^p(n), c₀, l^p, L^p(μ) - for the usual Banach spaces.

A subscript R attached to the symbol for a function space, as in C_R(Ω) or L_R^p(μ), means that the functions involved are real-valued; otherwise the scalars may be either real or complex.

Finally, the symbol "≡" is to be read "equals by definition".

§2. The Hahn-Banach Theorem

In this section we recall without proof some variants of the Hahn-Banach extension theorems. These results all assert the existence of linear functionals with certain properties. Together with their geometrical versions to be given in §3 below, they constitute the cornerstone of the existence and duality theory to be developed later in these notes.

a) Theorem. Let M be a linear subspace of a real ls X and f ∈ M'. Let p be a real-valued sublinear function on X such that f ≤ p|M. Then ∃ F ∈ X' satisfying F ≤ p and F|M = f.



Thus the linear functional f has a linear extension F to all

of X and this extension remains dominated (pointwise) by p. Using

a separation theorem (§3) Weston [77] has shown that the above result

remains true if p is replaced by a (finite) convex function on X.

b) Corollary. Let X be a complex ls and let f, M have the same meaning as in a). If p is a semi-norm on X such that |f(·)| ≤ p|M, then ∃ F ∈ X' such that |F(·)| ≤ p and F|M = f.

c) Corollary. Let X be a nls, M a linear subspace of X, and f ∈ M*. Then ∃ F ∈ X* such that ||F|| = ||f|| and F|M = f.

This result may be viewed in particular as asserting the existence of a continuous linear extension of f with minimal norm. Clearly f has "many" extensions F with ||F|| ≥ ||f||. It is less clear a priori whether or not an extension of minimal norm is unique. This question has some interesting connections with approximation and moment problems; the reader may consult [19, 26, 58, 73] for further details.

We note also that the proofs of b) and c) above establish some information about linear functionals on a complex nls. Namely, let X be such a space and f ∈ X*. Define (re f)(x) ≡ re f(x) and (im f)(x) ≡ im f(x). Then re f and im f belong to Xᵣ* (where Xᵣ denotes X regarded as a real ls), (im f)(x) = −(re f)(ix), and ||re f|| = ||f||. And conversely, if f ∈ Xᵣ* and F is defined by

F(x) = f(x) − i f(ix),  x ∈ X,

then F ∈ X* and ||F|| = ||f||.

d) Corollary. Let M be a linear subspace of the nls X and x₀ ∈ X \ cl (M). Then ∃ f ∈ S(X*) such that f(x) = 0 ∀ x ∈ M and f(x₀) = d(x₀,M).

Proofs of all the preceding results, along with further corollaries, can be found in [15, Ch. II].

§3. The Separation Theorems

The main results of this section, namely the Support Theorem 3f) and the Separation Theorem 3g), are actually equivalent, in their linear space formulations, to each other, and to the Hahn-Banach Theorem 2a). However, here they will simply be deduced as consequences of 2a).

a) Lemma. Let X and Y be real tls and T: X → Y an additive mapping. If T is continuous at θ, then T ∈ L(X,Y).

Proof. Exercise 1.

b) Lemma. Let X be a tls and f ∈ X', f ≠ 0. Then f is continuous if and only if re f is bounded (above or below) on some open set.

Proof. After a translation if necessary, we may assume that re f(U) ≤ c, where U is some θ-nbhd. Letting V ≡ U ∩ (−U) we obtain |re f(V)| ≤ c. Thus if ε > 0 and W ≡ (ε/c)V, then |re f(W)| ≤ ε, which proves that re f is continuous at θ. Hence re f is continuous by a), and therefore so is f.

c) Lemma. Let K be a convex subset of a tls. If x ∈ int (K) and y ∈ cl (K), then {tx + (1−t)y: 0 < t < 1} ⊂ int (K). Hence int (K) is convex, and cl (int (K)) = cl (K), provided int (K) ≠ ∅.

Proof. Exercise 2.

d) Let us recall that when K is a convex θ-nbhd in a tls X, there is defined on X a non-negative function p_K according to the formula

p_K(x) = inf {t > 0: x ∈ tK}.

p_K is called the Minkowski function of K. It is sublinear and we have int (K) = {x ∈ X: p_K(x) < 1} [15, p. 411], so in particular, p_K is continuous on X.
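For example, if X is a nls and K = U(X), then x ∈ tK exactly when ||x|| ≤ t, so p_K(x) = ||x||: the Minkowski function of the unit ball recovers the norm. More generally, if K = rU(X) with r > 0, then p_K(x) = ||x||/r.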

e) If X is a ls, a flat in X is any translate of a linear subspace of X. If X is a tls, a convex body in X is any convex set with non-empty interior.

Theorem. (Mazur, Bourgin) Let K be a convex body in a tls X, and V a flat in X such that V ∩ int (K) = ∅. Then ∃ a (real) closed hyperplane H satisfying V ⊂ H ⊂ X and H ∩ int (K) = ∅.

Proof. It will suffice to exhibit an f ∈ X* and a real c for which re f(V) = c > re f(int (K)). After a translation we may assume that θ ∈ int (K). Let M ≡ real span (V); then V is a hyperplane in M, hence ∃ f₀ ∈ M' such that V = {x ∈ M: f₀(x) = 1}. Now d) above implies f₀(x) = 1 ≤ p_K(x) ∀ x ∈ V. Hence f₀(tx) ≤ p_K(tx) if t > 0, while f₀(tx) ≤ 0 ≤ p_K(tx) if t ≤ 0. That is, f₀ ≤ p_K|M. By 2a) we can extend f₀ to f₁ ∈ Xᵣ' so that f₁ ≤ p_K. This inequality, together with a) and d) above, shows that f₁ is continuous. Let H = {x ∈ X: f₁(x) = 1}, take c = 1, and, if X is a complex space, define f ∈ X* (as in 2c)) so that re f = f₁, qed.

f) Corollary. (Support Theorem) If x is not an interior point of a convex body K in a tls, then ∃ a (real) closed hyperplane H containing x such that K lies on one side of H.

Proof. Let V = {x} in e).

In typical applications of the Support Theorem x is a boundary point of K. If x is also in K then it is a support point of K, and any hyperplane satisfying the conditions of f) is a supporting hyperplane to K at x.

g) Corollary. (Separation Theorem) Let K₁ be a convex body and K₂ a convex set in a tls X such that int (K₁) ∩ K₂ = ∅. Then ∃ a (real) closed hyperplane H separating K₁ and K₂.

Proof. We must produce a non-zero f ∈ X* such that sup re f(K₁) ≤ inf re f(K₂). Let K = K₁ − K₂ (vector difference). Then θ ∉ int (K) ≠ ∅, so by f) ∃ a non-zero f ∈ X* for which re f(K) ≤ re f(θ) = 0. If c is any number in the interval [sup re f(K₁), inf re f(K₂)], we may take H = {x ∈ X: re f(x) = c}, qed.

Remark. In case the space X is finite dimensional, a more precise version of the Separation Theorem is valid. It is based on the fact that every finite dimensional convex set has a non-void relative interior (i.e., interior relative to the flat generated by the set). Excluding the trivial case where both the convex sets lie in a common hyperplane, the Finite Dimensional Separation Theorem asserts that two convex sets can be separated by a (real) hyperplane if and only if their relative interiors are disjoint. The proof is similar to the one preceding, and may be found along with related results in [70, §11].

h) Lemma. Let A be closed and B compact in a tls X. Then A + B is closed.

Proof. Exercise 3.

Theorem. (Strong Separation Theorem) Let X be a lcs, and K₁, K₂ disjoint closed convex subsets with one of them compact. Then ∃ a (real) closed hyperplane strongly separating K₁ and K₂.

Proof. The assertion to be proved is that ∃ a non-zero f ∈ X* such that sup re f(K₁) < inf re f(K₂). We first observe that K ≡ K₂ − K₁ is closed by the lemma and that θ ∉ K. Then because X is locally convex, ∃ a convex θ-nbhd U such that U ∩ K = ∅. g) now implies the existence of a non-zero f ∈ X* such that sup re f(U) ≤ inf re f(K). Since f ≠ 0, ∃ x₀ ∈ X such that f(x₀) = 1, and then ∃ ε > 0 such that tx₀ ∈ U whenever |t| ≤ ε. Therefore, ε ≤ sup re f(U) since f(tx₀) = t. Thus ε ≤ inf re f(K₂ − K₁), and so sup re f(K₁) + ε ≤ inf re f(K₂), qed.

i) Corollary. A closed convex subset of a lcs is the intersection of all the closed (real) half-spaces which contain it.

We recall that a closed (real) half-space in a tls X is a set of the form {x ∈ X: re f(x) ≤ c} for some f ∈ X*. Thus any closed convex subset of a lcs can be defined by a family of (real) linear constraints. When X is a separable nls, the family can always be chosen to be countable.

Exercise 4. Prove this last assertion.

§4. The Alaoglu-Bourbaki Theorem

Let X be a tls with non-trivial dual X* (which is certainly the case if X is a lcs). We recall that the weak-star (w*-) topology on X* is the topology of pointwise convergence on X. The general basic w*-nbhd of f₀ ∈ X* is defined by {f ∈ X*: |f(x) − f₀(x)| < δ, x ∈ A}; here δ > 0 is arbitrary and A is an arbitrary finite subset of X. The w*-topology is locally convex and Hausdorff; it is the weakest topology on X* for which all the linear functionals f ↦ f(x) (x ∈ X) are continuous. The compactness theorem presented in this section and the earlier Hahn-Banach Theorem will justify our later interest in dual spaces.

a) Definition. Let A be a subset of a tls X. The polar of A is the set

A° ≡ {f ∈ X*: re f(A) ≤ 1}.

A° is evidently a non-empty (0 ∈ A° always) convex subset of X*, closed in the w*-topology.

Examples. 1) If M is a linear subspace of X, then M° = M⊥ ≡ {f ∈ X*: f(M) = 0} (the annihilator of M).

2) If X is a nls, then U(X)° = U(X*).

3) Let X be either real Euclidean space or the real Hilbert space l², and E the ellipsoid {x = (xₙ) ∈ X: Σ(xₙ/aₙ)² ≤ 1} for fixed aₙ > 0. Then E° = {x ∈ X: Σ aₙ²xₙ² ≤ 1}.

Exercise 5. Verify these examples.

b) It is clear from the relevant definitions that a subset B of X* is equicontinuous on X if and only if B is equicontinuous at θ, if and only if B ⊂ A° for some θ-nbhd A ⊂ X.

Theorem. (Alaoglu-Bourbaki) If B is an equicontinuous subset of the dual of a tls X, then B is relatively w*-compact.

The proof of this depends in part on the Tychonov compactness theorem for product spaces, and can be found in [71, p. 84]. The converse is generally false, but does hold when X is a Banach space (principle of uniform boundedness). The polarity concept will be examined in greater detail in §15 below.

§5. The Krein-Milman Theorem

In this final preliminary section we recall an extremely important device for describing a compact convex set K by means of smaller subsets. The most important applications of this procedure will be to the case where K is a (w*-closed) subset of U(X*), for some nls X. The notation c̄o (A) will mean cl (co (A)), this set being the same as the intersection of all closed convex sets containing A (with respect to some ambient tls). The main result gives two equivalent conditions on A ⊂ K, either of which in turn is equivalent to c̄o (A) = K.

a) Lemma. The convex hull of the union of finitely many compact convex subsets of a tls is compact.

Proof. Exercise 6.

b) Definition. Let X be a ls and E ⊂ K ⊂ X. E is (K-)extremal if kᵢ ∈ K, 0 < λ < 1 and λk₁ + (1−λ)k₂ ∈ E imply kᵢ ∈ E. If E is a singleton set, E = {k₀}, and meets the preceding condition, then k₀ is an extreme point of K; in this case we write k₀ ∈ ext (K).

Examples. 1) Let X = R³, K a cube or tetrahedron. Then the faces, sides, and vertices of K are the K-extremal subsets of K, while the extreme points of K are the vertices.

2) Let μ be a positive measure on some measure space and 1 < p < ∞. Let X = L^p(μ). Then ext U(X) = S(X).

3) Let X = C_R(Ω)* and take K = {μ ∈ X: μ(Ω) = 1 and μ ≥ 0}. That is, K consists of non-negative measures in rca (Ω) with total mass of unity (probability measures on Ω). The set K is called the positive face of U(X). The non-zero extreme points of K can then be described as either the set of delta measures (point masses) on Ω or, in functional terms, as the set of all real algebra homomorphisms of C_R(Ω).

4) Let X be either c₀ or L¹(μ), where μ is a positive non-atomic measure. Then ext U(X) is empty.

Exercise 7. Verify examples 1), 2), and 4).

Example 3) is more difficult to verify; it may be deduced from results in [1, 74]. Also, the delta measure characterization will be proved later in 15c).


c) Lemma. Let K be a subset of a ls X.

1) If {E_α} is a family of K-extremal sets, then ∪E_α and ∩E_α are also K-extremal sets.

2) Let E' ⊂ E ⊂ K. If E' is E-extremal and E is K-extremal, then E' is K-extremal.

3) If E is K-extremal, then ext (E) = ext (K) ∩ E.

Proof. Exercise 8.

We note that if E is an extremal subset of a convex set K, then K \ E is again convex. The converse statement is clearly false in general, but it is valid for singleton extremal sets. Thus we may state that the extreme points of a convex set K are exactly those points of K which may be deleted from K without destroying convexity. We also note that if x ∈ ext (K) ∩ co (J), where J is a finite subset of a convex set K, then x ∈ J.

d) Lemma. Let A be a compact subset of a lcs X. Then ext (A) ≠ ∅.

Proof. We order the non-empty compact A-extremal sets by inclusion and use Zorn's Lemma to obtain the existence of a minimal compact A-extremal set B. We wish to show that B is a singleton set. If not, ∃ distinct points p, q ∈ B. Then by 3h) ∃ f ∈ X* such that re f(p) ≠ re f(q). Let H be the hyperplane {x ∈ X: re f(x) = min re f(B)}. The set B ∩ H is then a proper compact B-extremal set. By c-2) it is also A-extremal, which contradicts the minimality of B, qed.

Corollary. Let A and X be as in the Lemma, and f a continuous convex function on X (e.g., f ∈ X*). Then f attains its maximum over A at an extreme point of A.

Proof. The subset of A where f attains its maximum is a non-empty compact A-extremal set. It therefore has an extreme point which must belong to ext (A) by c-3).

e) Theorem. (Extended Krein-Milman Theorem) Let K be a compact convex subset of a lcs X. The following statements about a subset A ⊂ K are equivalent:

1) c̄o (A) = K;

2) sup re f(A) = max re f(K), for any f ∈ X*;

3) ext (K) ⊂ cl (A).

Proof. The equivalence of 1) and 2) follows from 3h) and the fact that sup re f(A) = sup re f(c̄o (A)), for any f ∈ X* and A ⊂ X. The preceding corollary shows that 3) implies 2). Thus we need only check that 2) implies 3). Using 1) we must prove that ext (c̄o (A)) ⊂ cl (A). For this it is sufficient to prove that if V is any closed balanced convex θ-nbhd in X, and x ∈ ext (c̄o (A)), then (x+V) ∩ A ≠ ∅. Now since A is totally bounded, ∃ a finite set {x₁,···,xₙ} ⊂ A such that

A ⊂ (x₁+V) ∪ ··· ∪ (xₙ+V).

Now the sets Kᵢ ≡ c̄o ((xᵢ+V) ∩ A) are compact and convex, and so we have

c̄o (A) = c̄o (K₁ ∪ ··· ∪ Kₙ) = co (K₁ ∪ ··· ∪ Kₙ),

the last equality following from a). Hence we may write x as a convex combination of points in the Kᵢ, and since Kᵢ ⊂ c̄o (A), the comment immediately preceding d) implies that x belongs to some Kᵢ. Therefore, x = xᵢ + v for some v ∈ V, and so xᵢ = x − v is a point in (x+V) ∩ A, qed.


Corollary. If X is a nls, then ext U(X*) is total over X. That is, if x and y are distinct points in X, ∃ f ∈ ext U(X*) such that re f(x) ≠ re f(y).

Exercise 9. Let X be a reflexive Banach space. Then U(X) = c̄o (ext (U(X))), where the closure is taken here in the norm topology on X.

f) Examples. 1) Let X be a Banach space. Combining the Alaoglu-Bourbaki and Krein-Milman Theorems shows that U(X*) = c̄o (ext (U(X*))), with closure here in the w*-topology on X*. This observation leads to an easy proof that certain Banach spaces are not dual spaces. For if X is an infinite dimensional dual space then ext (U(X)) must be infinite. (If X is also reflexive, then the Krein-Milman Theorem can be used to show that actually ext U(X) is uncountable [40].) Thus if it is empty or finite, X cannot be a dual space. Hence from Example 4) in b) we see that L¹(μ) (μ non-atomic) and c₀ are not dual spaces. As a further illustration of this idea let X = C_R(Ω). Characterize ext (U(X)) (Exercise 10), and then show that if Ω has only a finite number of components, ext U(X) is finite. Hence, either Ω is finite, in which case X is finite dimensional, or else X is not a dual space.

2) An extreme point of a compact convex set K in a lcs is not necessarily a support point of K. An example of this phenomenon was given by Klee [36, p. 98]. Nevertheless, K is the closed convex hull of its extreme support points. Prove this as Exercise 11.

3) Some further idea of the power and scope of the extreme point concept and the Krein-Milman Theorem can be obtained by noting that they have been used as the basis for proofs of the Stone-Weierstrass Theorem [5], and the Lyapounov convexity theorem for vector-valued measures [35, 38]. They also underlie the extensive Choquet representation theory [10, 59]. Of course we will make our own particular applications of this material in our study of approximation theory below.

We conclude by remarking that a sharper version of the Krein-Milman Theorem is possible in the finite dimensional case. This will be developed when it is needed, namely in §23.


Part II

Theory of Optimization

In the next several sections we present an introduction to the theory of optimization in abstract spaces, together with selected applications of a more concrete nature. Parts III and V to follow provide many additional illustrations of the theory to be developed now. With the exception of an introduction to the Dubovitskii-Milyutin theory in §'s 16 and 17, which allows non-convex objective functions, we will only be concerned with convex optimization problems. Most generally, these are simply problems of minimizing a convex function over a convex set (or, what is the same, subject to linear or convex constraints) in linear spaces. We will adopt the device popularized by R. T. Rockafellar of redefining the function to be infinite outside the constraint set. The two basic tools for characterizing the solutions of our problems are the theory of subgradients and the theory of conjugate functions.

§6. Convex Functions

a) Definition. Let X be a ls and f: X → (−∞, +∞]. f is a proper convex function on X, written f ∈ Conv (X), if f is not identically +∞ and

f(tx + (1−t)y) ≤ tf(x) + (1−t)f(y),

whenever x, y ∈ X and 0 < t < 1. The effective domain of f is the set

dom (f) ≡ {x ∈ X: f(x) < +∞}.

Frequently an optimization problem involves a (finite) convex function defined only on some convex subset K ⊂ X. Such a function is obviously extendable to belong to Conv (X); we simply define its values at points in X \ K to be +∞.

Examples. 1) Evidently, any linear or sublinear function on X, hence any norm or semi-norm, belongs to Conv (X).

2) If K ⊂ X is a convex set containing θ, then the Minkowski function p_K is in Conv (X), and dom (p_K) is the (convex) cone generated by K.

3) A very important example of a convex (but not sublinear) function which repeatedly occurs in optimization problems is the following. Let X be a tls and Y a nls; let R ∈ L(X,Y) and y₀ ∈ Y. Then put f(x) = ||R(x) − y₀||.

4) If K is any subset of X, the indicator function of K, δ_K, is defined by

δ_K(x) = 0 if x ∈ K,
       = +∞ if x ∉ K.

Then δ_K ∈ Conv (X) exactly when K is convex. This seemingly innocuous function will play an important role in the analysis of constrained optimization problems.

5) Let X be a nls (in particular, consider X = Rⁿ), and K an open convex subset of X. Let f be a continuously differentiable real-valued function on K, and define the values of f to be +∞ off K. Define a function (the "excess function") E: K × K → R¹ by

E(x,y) = f(y) − f(x) − df(x)·(y−x),

for x, y ∈ K. Here df(x) is the (Frechet) differential of f at x, and the dot signifies its value at the vector (y−x). Then f ∈ Conv (X) if and only if E(x,y) ≥ 0 for all x, y ∈ K. (The reader should be sure to understand the geometric significance of E.)
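For instance, with X = R¹ = K and f(x) = x², we find E(x,y) = y² − x² − 2x(y−x) = (y−x)² ≥ 0, as convexity requires; in general, E(x,y) is the amount by which f(y) lies above the tangent hyperplane to the graph of f at x.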
Exercise 12. Verify this last assertion. Use the result to prove that if f is actually twice continuously differentiable on K, then f ∈ Conv (X) if and only if d²f(x) is positive semi-definite for every x ∈ K. Here d²f(x) is the second (Frechet) differential of f at x, and is a continuous symmetric bilinear function on X × X. This observation includes the familiar case where K = (a,b) ⊂ R¹ = X, and the convexity criterion is simply that f″(x) ≥ 0 for a < x < b.

In Example 4) above we associated a convex function to every convex set in a ls. We now give a converse association, by considering the region above the graph of a given convex function.

b) Definition. Let f ∈ Conv (X). The epigraph of f, epi (f), is the set defined by

epi (f) = {(x,t) ∈ X × R¹: f(x) ≤ t}.

Clearly every function f: X → (−∞, +∞] has an epigraph; the epigraph is convex exactly when f ∈ Conv (X). Note that for such f's, dom (f) is the projection of epi (f) into X. In general, the epigraph will be important to us because of the support and separation principles developed in §3.

§7. Directional Derivatives

a) Theorem. Let X be a ls and f ∈ Conv (X). Then if x₀ ∈ dom (f),

(1)  f'(x₀;x) ≡ lim_{t↓0} [f(x₀+tx) − f(x₀)]/t

exists in [−∞, +∞] for every x ∈ X.

Proof. Observe first that if g ∈ Conv (X) satisfies g(θ) = 0, then h(t) ≡ g(tx)/t is a non-decreasing function on (0, +∞). Because, if 0 < s < t, then

g(sx) ≤ (s/t) g(tx) + ((t−s)/t) g(θ),

whence g(sx)/s ≤ g(tx)/t. Apply this observation to the function g(y) ≡ f(x₀+y) − f(x₀) to conclude that the difference quotient in (1) is a non-decreasing function of t.

Remark. When x ∈ x₀ − dom (f), then −∞ < f'(x₀;x). We can see this by verifying that the difference quotient is bounded below for t > 0. In fact, in the inequality f(λu + (1−λ)v) ≤ λf(u) + (1−λ)f(v), let us replace u by x₀ + tx, v by x₀ − x, and λ by 1/(1+t):

f(x₀) = f((1/(1+t))(x₀+tx) + (t/(1+t))(x₀−x)) ≤ (1/(1+t)) f(x₀+tx) + (t/(1+t)) f(x₀−x),

whence

f(x₀) − f(x₀−x) ≤ [f(x₀+tx) − f(x₀)]/t.
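For example, let X = R¹ and f(x) = |x|. At x₀ = 0 the difference quotient is |tx|/t = |x| for every t > 0, so f'(0;x) = |x|: the one-sided limit (1) exists at a point where f fails to be differentiable.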

b) Theorem. Let f be a finite convex function on a ls X. Then f'(x₀;·) is a (finite) sublinear function on X, ∀ x₀ ∈ X.

Proof. The finiteness of f'(x₀;·) follows from a), and its homogeneity is an immediate consequence of (1). Let us prove its subadditivity. In the convexity inequality for f stated in the preceding Remark, we replace u by x₀ + 2tx, v by x₀ + 2ty, and set λ = 1/2. This yields

f(x₀ + t(x+y)) ≤ ½ (f(x₀+2tx) + f(x₀+2ty)),

and so

[f(x₀+t(x+y)) − f(x₀)]/t ≤ [f(x₀+2tx) − f(x₀)]/2t + [f(x₀+2ty) − f(x₀)]/2t.

Thus, in the limit as t ↓ 0, we obtain

f'(x₀;x+y) ≤ f'(x₀;x) + f'(x₀;y),

qed.

Corollary. Let f be as in the Theorem. Then

lim_{t↑0} [f(x₀+tx) − f(x₀)]/t = −f'(x₀;−x) ≤ f'(x₀;x),

for every x₀, x ∈ X.

Proof. Exercise 13.

c) The preceding Corollary has the following implication. Suppose that X is a real tls, f a finite convex function on X, and x₀ ∈ X. Suppose also that f'(x₀;·) ≡ φ belongs to X*. Then

(2)  φ(x) = lim_{t→0} [f(x₀+tx) − f(x₀)]/t,

that is, the two-sided limit exists ∀ x ∈ X.

Definition. When Equation (2) holds for a function f, a point x₀ ∈ X, all x ∈ X, and φ ∈ X*, then φ is called the gradient of f at x₀, and is written φ ≡ ∇f(x₀).

We will use this terminology even when f assumes infinite values or when f is not convex. It is certainly justified in particular when f is (Frechet) differentiable at x₀. When f has a gradient at all points of some open set, f will be said to be smooth on that set.

Exercise 14. Let X = L^p(μ), f(x) = ||x||, and θ ≠ x₀ ∈ X. If 1 < p < +∞, show that

∇f(x₀) = x₀(·)|x₀(·)|^{p−2} / ||x₀||^{p−1},

as an element of L^q(μ), 1/p + 1/q = 1. If p = 1, find necessary and sufficient conditions on x₀ in order that ∇f(x₀) exists, and identify it as an element of L^∞(μ) (assume μ to be σ-finite).

d) Remark. Let f be a smooth convex function on R¹. Then f' is non-decreasing. This follows from the excess function characterization of convexity given in 6a). Indeed, for x < y, the inequalities 0 ≤ E(x,y), 0 ≤ E(y,x) imply

f'(x) ≤ [f(y) − f(x)]/(y−x) ≤ f'(y).

(Of course a stronger statement is true, since even if f is not differentiable, both its one-sided derivatives are still non-decreasing functions.)

To generalize this monotonicity property of gradients of smooth convex functions on the line, let f be a smooth convex function on an open (convex) subset K of some tls X. The excess function characterization then implies

(3)  ∇f(x₀)·x ≡ φ(x) ≤ f(x₀+x) − f(x₀),

or

(4)  φ(x − x₀) ≤ f(x) − f(x₀),

for x₀, x ∈ K. If we now interchange x₀ and x, add the two inequalities, and perform a bit of algebra, we obtain the monotonicity inequality

0 ≤ (∇f(x) − ∇f(x₀))·(x − x₀),

which is valid ∀ x₀, x ∈ K. This is also described by saying that ∇f(·) is a monotone mapping of K into X*.
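For example, on X = R¹ with f(x) = x² we have ∇f(x) = 2x, and the monotonicity inequality reduces to (2x − 2x₀)(x − x₀) = 2(x − x₀)² ≥ 0.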


§8. Subgradients

We turn now to a satisfactory generalization of the notion of gradient in the case of convex functions which are not differentiable. The appropriate concept is a "subgradient" of a given convex function, the theory of which has been extensively developed in the last few years by Moreau [48, 49, 50], Rockafellar [7, 66, 68, 69], and others.

a) Definition. Let X be a real tls and f ∈ Conv (X). Any φ ∈ X* satisfying the equivalent equations (3) or (4) of §7 is called a subgradient of f at x₀. The set of all such φ is the subdifferential of f at x₀, denoted by ∂f(x₀). The mapping x ↦ ∂f(x) of X into (possibly empty) subsets of X* is the subderivative of f (sometimes called the subdifferential of f).

Remarks. 1) If X is a complex tls and f ∈ Conv (X), then ∂f(x₀) will consist by definition of those φ ∈ X* such that re φ is a subgradient of f at x₀ relative to Xᵣ.

2) If ∇f(x₀) exists then it obviously belongs to ∂f(x₀), and we will shortly see that in this case there can be no other subgradients at x₀.

3) If x₀ ∉ dom (f), then ∂f(x₀) is void by definition. On the other hand, if x₀ ∈ dom (f), then φ ∈ X* is a subgradient of f at x₀ if and only if φ(x) ≤ f'(x₀;x) ∀ x ∈ X (using 7a)).

4) If f is subdifferentiable at x₀, which means that ∂f(x₀) is not void, then f must be lower semi-continuous (lsc) at x₀.

Examples. 1) Let f be a continuous convex function on R¹. Then ∂f(x₀) is the compact interval whose left and right hand endpoints are respectively the left and right hand derivatives of f at x₀.

2) A lsc proper convex function need not be subdifferentiable throughout its effective domain. As an example, define f ∈ Conv (R¹) by

f(x) = −(1 − x²)^{1/2} if |x| ≤ 1,
     = +∞ otherwise.

Then ∂f(±1) = ∅.

3) Let f be the norm on some nls X. Then ∂f(θ) = U(X*).
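(Indeed, by (4) of §7, φ ∈ ∂f(θ) exactly when re φ(x) ≤ ||x|| ∀ x ∈ X, which holds precisely when ||φ|| ≤ 1.)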

Further examples of subgradients of particular convex functions

will be given in subsequent sections.

b) We consider next the geometrical interpretation of subdifferentiability. When X is a real tls, we recall that (X × R¹)* is (algebraically) isomorphic to X* × R¹ under the correspondence X* × R¹ ∋ (φ,s) ↔ Φ ∈ (X × R¹)*, where Φ((x,t)) ≡ Φ(x,t) = φ(x) + st. Indeed, this is a special case of the theorem asserting that a product of tls has the corresponding (direct) sum for its dual.

Theorem. Let X be a real tls and f ∈ Conv (X). Then φ ∈ ∂f(x₀) if and only if the graph of the affine mapping h(x) ≡ f(x₀) + φ(x − x₀) is a supporting hyperplane to epi (f) at the point (x₀, f(x₀)).

Proof. By definition φ ∈ ∂f(x₀) ⟺ h(·) ≤ f(·). In turn this happens exactly when Φ(epi (f)) ≥ f(x₀) − φ(x₀) ≡ c, where Φ(x,t) ≡ −φ(x) + t, and this definition of Φ entails Φ(x,h(x)) = c. (The graph of h is the hyperplane {(x,t): Φ(x,t) = c}.)

c) Lemma. Let f, X be as in b). Suppose that H ≡ {(x,t) ∈ X × R¹: Φ(x,t) = c} is a supporting hyperplane to epi (f) at (x₀, f(x₀)), say Φ(epi (f)) ≥ c. If Φ corresponds to (φ,s) ∈ X* × R¹, and H is non-vertical (that is, s ≠ 0), then s > 0 and −φ/s ∈ ∂f(x₀).

Proof. Exercise 15.

d) We are now ready to present the main existence theorem for subgradients, due originally to Minty [45]. Let us note in advance that the subdifferential of a given convex function at any point is always a convex and w*-closed subset of the dual space, although, as we have already seen, it may well be void.

Theorem. Let X be a lcs and f ∈ Conv (X). If f is continuous at x₀ ∈ dom (f), then ∂f(x₀) is a non-empty w*-compact convex subset of X*.

Proof. We first show that ∂f(x₀) ≠ ∅. Now epi (f) is a convex body in X × R¹ (for example, an open set in epi (f) is V × (b,+∞), where V is an open x₀-nbhd for which f(V) ≤ b). We may therefore apply the Support Theorem 3f) to obtain a hyperplane H ≡ {(x,t) ∈ X × R¹: Φ(x,t) = c} which supports epi (f) at (x₀, f(x₀)). Further, H is non-vertical since x₀ must belong to int (dom (f)). Hence the Lemma in c) establishes the existence of a subgradient at x₀. Now since ∂f(x₀) is always w*-closed, it will suffice to prove that it is, in the present case, relatively w*-compact, and this will complete the proof.

According to 4b), it will suffice in turn to find a θ-nbhd V ⊂ X so that ∂f(x₀) ⊂ V°. But we may take V = {x ∈ X: f(x₀+x) − f(x₀) ≤ 1}, qed.

e) Corollary. With the same hypotheses, except that now X is a nls, it follows that ∂f(x₀) is also a (norm-)bounded subset of X*.

Proof. A category argument can be used to establish that any w*-compact convex subset of the dual of a nls X is bounded (and convexity is essential should X not be complete); cf. [34, §18]. However, an ad hoc argument can be given in the present case. Suppose that ∂f(x₀) contains an unbounded sequence {φₙ}. Choose δ > 0 so that f(x) − f(x₀) ≤ 1 if ||x − x₀|| ≤ δ. Choose yₙ ∈ S(X) so that φₙ(yₙ) ≥ ||φₙ|| − 1/n. Let xₙ = δyₙ. Then we obtain the contradiction

1 ≥ f(x₀+xₙ) − f(x₀) ≥ φₙ(xₙ) = δφₙ(yₙ) ≥ δ(||φₙ|| − 1/n).

Example. Let X and Y be nls, R ∈ L(X,Y) and y₀ ∈ Y. For the (continuous) convex function f(x) = ||R(x) − y₀||, introduced in 6a), we see that all the subgradients of f at any point in X are contained in the ball ||R|| U(X*).

§9. Normal Cones

We recall that if x₀ belongs to a ls X, a cone with vertex x₀ (or simply a cone at x₀) is a union of rays emanating from x₀. That is, if K is a cone at x₀, then x ∈ K ⟹ x₀ + t(x − x₀) ∈ K, ∀ t > 0. Clearly a subspace M is a convex cone at each x₀ ∈ M, and a half-space H is a convex cone at each of its boundary points.

a) Definition. Let X be a lcs, K a convex subset of X, and x₀ ∈ K. The support cone to K at x₀ is the closed convex cone at x₀ generated by K; that is, it is the intersection of all closed convex cones at x₀ which contain K.

The support cone to K at x₀, denoted S(x₀,K), can also be described as the closure of the union of all rays emanating from x₀ and passing through a point in K. When x₀ is a support point of K, the dimension of S(x₀,K) is a measure of the smoothness of the boundary of K at x₀, since this in turn depends on the number of (linearly independent) supporting hyperplanes at x₀. In fact, S(x₀,K) is the intersection of the supporting half-spaces at x₀ (that is, the half-spaces containing K whose boundary hyperplanes support K at x₀). On the other hand, if x₀ is not a support point of K, then S(x₀,K) = X.

Exercise 16. Verify these last two assertions.

b) Let x₀ ∈ K, and X be as in a). We introduce another cone associated with x₀ and K, but lying in X*. In the special case where X = Rⁿ, this new cone arises as follows. For each supporting hyperplane H to K at x₀, there is a unique exterior ray emanating from x₀ which is normal (orthogonal) to H. The union of these rays taken over all such H forms a closed convex cone at x₀, called the "normal cone to K at x₀". Again, its dimension is a measure of the smoothness of the boundary of K at x₀. In particular, its dimension is unity exactly when x₀ is a "smooth point" of K, that is, a point through which passes a unique supporting hyperplane. The normal cone is usually replaced by the translated cone with vertex θ.

Definition. The normal cone to K at x₀ ∈ K is

(S(x₀,K) − x₀)°.

We are taking the polar here of a cone at θ, so we obtain a w*-closed convex cone at θ in X*, denoted N(x₀,K). We easily see from the definitions involved that this cone consists of those functionals whose real parts attain their maximum over K at x₀. That is,

N(x₀,K) = {φ ∈ X*: re φ(x₀) = max re φ(K)}.

In particular, when K is a subspace of X, N(x₀,K) is the subspace K⊥, provided x₀ ∈ K. We emphasize that when x₀ ∉ K, N(x₀,K) is empty by definition.
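For example, let X be a real Hilbert space, identified with X* in the usual way, and let K = U(X). If ||x₀|| = 1, then by the Cauchy-Schwarz inequality a vector z attains its maximum inner product over U(X) at x₀ exactly when z = λx₀ with λ ≥ 0; thus N(x₀,K) is the ray {λx₀: λ ≥ 0}.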
c) Example. Let K be a convex subset of a lcs X. Then for every x₀ ∈ X,

∂δ_K(x₀) = N(x₀,K).

(Indeed, the subgradient inequality φ(x − x₀) ≤ δ_K(x) − δ_K(x₀) ∀ x ∈ X says precisely that x₀ ∈ K and φ(x) ≤ φ(x₀) ∀ x ∈ K.) It is this formula which motivates our interest in normal cones to convex sets. We will have occasion to compute normal cones to specific convex sets (the constraint sets in optimization problems) in later sections.

§10. Subdifferential Formulas

a) Let X be a real lcs and x₀ ∈ X. The mapping f ↦ ∇f(x₀) is a linear mapping from the linear space of smooth functions on X into X*. Now the subdifferential mapping f ↦ ∂f(x₀) from the convex cone Conv (X) into the convex subsets of X* is supposed to play a role analogous to the gradient mapping. For this analogy to be viable it is important to know when the subdifferential mapping preserves the cone operations, i.e., positive linear combinations. Now it is immediate that

∂(λf)(x₀) = λ∂f(x₀),

∀ λ > 0, and this is also valid for λ = 0, provided ∂f(x₀) ≠ ∅. Of much greater import, however, is the following formula for sums of convex functions.

Lemma. (Moreau, Rockafellar) Let X be a real lcs and f, g ∈ Conv (X). Assume that f is continuous at some point in dom (f) ∩ dom (g) (≡ dom (f+g)). Then ∀ x₀ ∈ X,

∂(f+g)(x₀) = ∂f(x₀) + ∂g(x₀).

Proof. Directly from the definition of subgradient follows

∂(f+g)(x₀) ⊇ ∂f(x₀) + ∂g(x₀).

To prove the reverse inclusion, let φ ∈ ∂(f+g)(x₀). Replacing f and g if necessary by the convex functions

f₁(x) ≡ f(x₀+x) − f(x₀) − φ(x),
g₁(x) ≡ g(x₀+x) − g(x₀),

we can reduce the argument to the case where x₀ = θ, φ = θ, and f(θ) = g(θ) = 0. So now we have θ ∈ ∂(f+g)(θ). This implies that

min (f+g)(X) = (f+g)(θ) = 0.

It follows that the set

K = {(x,t) ∈ X × R¹: t ≤ −g(x)}

is disjoint from int (epi (f)), and so we may apply the Separation Theorem 3g) to separate K and epi (f). The resulting hyperplane is non-vertical since int (dom (f)) ∩ dom (g) ≠ ∅, and is the graph of a linear function (not merely affine) since (θ,0) ∈ K ∩ epi (f). Thus ∃ ψ ∈ X* such that

ψ(x) ≤ t  ∀ (x,t) ∈ epi (f),
t ≤ ψ(x)  ∀ (x,t) ∈ K.

These inequalities entail ψ ∈ ∂f(θ) and −ψ ∈ ∂g(θ), hence θ ∈ ∂f(θ) + ∂g(θ), qed.
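As a one-dimensional illustration, take X = R¹, f(x) = |x|, and g = δ_K with K = [0,∞); f is continuous at 0 ∈ dom (f) ∩ dom (g). Then ∂f(0) = [−1,1] and ∂g(0) = N(0,K) = (−∞,0], so the lemma yields ∂(f+g)(0) = [−1,1] + (−∞,0] = (−∞,1].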

We will give two immediate applications of this lemma; first, to obtain a relationship between the directional derivative at a point and the subdifferential at that point, and then to derive a basic optimality criterion for convex programs.

b) Theorem. (Moreau, Pshenichnii) Let X be a real lcs and f ∈ Conv (X). Assume that f is continuous at x₀. Then ∀ x ∈ X,

f'(x₀;x) = max {φ(x): φ ∈ ∂f(x₀)}.

Proof. Fix x ∈ X and let L be the line {x₀ + tx: t ∈ R¹}. Then

φ ∈ ∂(f+δ_L)(x₀),

φ(y − x₀) + f(x₀) ≤ f(y) ∀ y ∈ L,

tφ(x) + f(x₀) ≤ f(x₀+tx) ∀ t ∈ R¹,

and

−f'(x₀;−x) ≤ φ(x) ≤ f'(x₀;x)

are all equivalent statements about φ ∈ X*. Further, since X is locally convex, ∃ φ ∈ X* such that φ(x) = f'(x₀;x). Then, since the preceding lemma implies ∂(f+δ_L)(x₀) = ∂f(x₀) + ∂δ_L(x₀) = ∂f(x₀) + span(x)⊥ (using 9c)), we have

f'(x₀;x) = max {φ(x): φ = γ + η, γ ∈ ∂f(x₀) and η(x) = 0}
         = max {γ(x): γ ∈ ∂f(x₀)},

qed.

Exercise 17. Under the hypotheses of the Theorem, show that x ↦ f'(x₀;x) is a continuous function on X.

Corollary. With the same hypotheses,

−f'(x₀;−x) = min {φ(x): φ ∈ ∂f(x₀)}.

Hence the two-sided directional derivative

(1)  lim_{t→0} [f(x₀+tx) − f(x₀)]/t

exists and has the value λ if and only if the function φ ↦ φ(x) is constant (with value λ) on the set ∂f(x₀).

We note further that the set of x ∈ X for which the limit in (1) exists is a closed subspace of X on which the value of (1) defines a continuous linear functional [26, p. 97].

c) Corollary. With the same hypotheses, φ ≡ ∇f(x₀) exists if and only if ∂f(x₀) consists of a single element, namely φ.

Proof. Exercise 18.

d) Remarks. 1) An alternative proof of the theorem in b) will be given later, in §18, to illustrate a formula involving conjugate functions.

2) Since we will frequently encounter the hypothesis that a convex function f be continuous at a point, it is worthwhile to recall that this is not a very stringent requirement. Namely, if f is bounded above on some non-empty open set, in particular, if f is upper semi-continuous (usc) at any point, then f is necessarily continuous throughout int (dom (f)) [4, p. 92; 42, p. 193]. (A special case of this was proved in 3b).) As a special case it follows that if f ∈ Conv (X) where X is finite dimensional, then f is continuous throughout rel-int (dom (f)). (A direct proof of this last assertion is also available. Without loss of generality we may assume that K ≡ rel-int (dom (f)) is an open (convex) set in Rⁿ. Now if x₀ ∈ K and ||x|| ≡ ||(ξ₁,...,ξₙ)|| is sufficiently small, then

f(x₀+x) − f(x₀) = f(x₀ + Σξᵢeᵢ) − f(x₀)
= f((1/n)(x₀+nξ₁e₁) + ··· + (1/n)(x₀+nξₙeₙ)) − f(x₀)
≤ (1/n) Σᵢ₌₁ⁿ (f(x₀+nξᵢeᵢ) − f(x₀)),

which tends to 0 as x → θ, because a convex function on an interval must be continuous. This proves that f is usc at x₀. Now since we also have

f(x₀) ≤ ½ f(x₀+x) + ½ f(x₀−x),

then

f(x₀) − f(x₀+x) ≤ f(x₀−x) − f(x₀),

which as above tends to 0 as x → θ, qed.)

Exercise 19. Let X be a nls, and suppose that f ∈ Conv (X) is continuous at x₀. Then ∃ an x₀-nbhd V and k > 0 such that whenever x, y ∈ V, the Lipschitz inequality

|f(x) − f(y)| ≤ k||x − y||

holds.

§11. Convex Programs

a) Definition. A variational pair is an ordered pair (X,f), where X is a set and f: X → (−∞,+∞]; f is called the objective function. The associated variational problem, or abstract mathematical program, is to determine the number inf f(X), called the value of the program, and the points in X (if any) where the value is attained. All such points are then called solutions of the program. When X is a ls and f ∈ Conv (X), the associated variational problem is called an abstract convex program.

It is important to recognize at the outset that this definition encompasses the problem of minimizing a convex function over a convex set which is not a linear space. Indeed, the variational problem (K,f), where K is a convex set in a ls X, and f is a convex function defined on K, is for all purposes the same as the convex program (X, f̄), where f̄ agrees with f on K, and takes the value +∞ on X \ K. If f is a priori defined on all of X (i.e., f ∈ Conv (X)), then we will identify the programs (K, f|K) and (X, f+δ_K).

b) When X is a lcs and f a smooth convex function on X, then x₀ is a solution of the program (X, f) if and only if ∇f(x₀) = θ. More generally, for any f ∈ Conv (X), x₀ is a solution if and only if

(1)  θ ∈ ∂f(x₀).

Although trivial in itself, the optimality condition (1) is the basis for later more informative characterizations of solutions of convex programs.

Note that, by 8a) - Remark 3), (1) holds if and only if x₀ ∈ dom (f) and

(2)  f'(x₀;x) ≥ 0, ∀ x ∈ X.

This condition in turn depends only on the values of f in small nbhds of x₀. Thus if f has a local minimum at x₀, then (2) and hence (1) are satisfied, so f has a global minimum at x₀; that is, x₀ is a solution of the program (X, f).
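For example, with X = R¹ and f(x) = |x| we have ∂f(0) = [−1,1] ∋ θ, so x₀ = 0 is the solution of (X, f), even though f is not differentiable there.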

c) We remark that the set of all solutions of a convex program (X, f) is a convex set in X, which is closed whenever f is lsc on X.

d) We are now ready for the second application of the lemma in 10a).

Theorem. (Pshenichnii, Rockafellar) Let X be a lcs and f ∈ Conv (X). Suppose that K is a convex set in X such that either dom (f) ∩ int (K) ≠ ∅ or f is continuous at some point in K. Then x₀ ∈ K is a solution of the convex program (K, f|K) if and only if −N(x₀,K) ∩ ∂f(x₀) ≠ ∅.

Proof. Identifying the given program with (X, f+δ_K), and using successively b), 10a), and 9c), we see that x₀ ∈ K is a solution if and only if

θ ∈ ∂(f+δ_K)(x₀) = ∂f(x₀) + ∂δ_K(x₀) = ∂f(x₀) + N(x₀,K),

qed.

In other words, our optimality criterion states that a necessary and sufficient condition for x₀ ∈ K to be a solution to the convex program (K, f|K) is that there should exist φ ∈ ∂f(x₀) which attains its minimum over K at x₀. In particular, for smooth (convex) f, the condition is that ∇f(x₀) attains its minimum over K at x₀. If K happens to be a subspace of X, or more generally a flat (3e)) in X, and these are important cases in practice, then the functional φ above (or ∇f(x₀)) must belong to K⊥, respectively, to the annihilator of the subspace parallel to K.

Exercise 20. As a first illustration of the use of the optimality criterion above, we propose the solution of a simple quadratic variational problem in Hilbert space. Let X = H¹[0,T] ≡ {x ∈ C_R[0,T]: x is absolutely continuous with derivative ẋ ∈ L²[0,T]}. X is a Hilbert space under the inner product

⟨x,y⟩ ≡ x(0)y(0) + ∫₀ᵀ ẋ(t)ẏ(t) dt.

(Completeness of X follows readily from the completeness of L²[0,T].) Define f ∈ Conv (X) by

f(x) = ∫₀ᵀ (x(t)² + ẋ(t)²) dt,

and let K be the flat {x ∈ X: x(0) = 1}. Find the (unique) solution and the value of the program (K, f|K). (A simple approximate solution is x(t) = e⁻ᵗ, the approximation improving as T increases.)

§12. Kuhn-Tucker Theory

As a second illustration of the use of 11d), we consider a special class of convex programs ("ordinary convex programs") which, in the finite dimensional case, have been of great practical interest, and for which an elegant theory is available. The programs may be intuitively described as "minimizing a convex function subject to convex constraints".

a) Lemma. Let K₁,...,Kₙ be closed convex bodies in a lcs X whose interiors have a point in common. Let x₀ ∈ K ≡ ∩Kᵢ. Then

N(x₀,K) = Σ N(x₀,Kᵢ).

Proof. Apply 9c) and 10a) to δ_K = Σ δ_{Kᵢ}.

b) Lemma. Let f be a continuous convex function on a real lcs X which somewhere in X assumes a negative value. For the set K ≡ {x ∈ X: f(x) ≤ 0} we have

N(x₀,K) = ∅ if f(x₀) > 0,
        = {θ} if f(x₀) < 0,
        = S(θ,∂f(x₀)) if f(x₀) = 0.

Proof. Here we are using the notation of 9a) even though θ ∉ ∂f(x₀) when f(x₀) = 0; S(θ,∂f(x₀)) is simply the set of all non-negative multiples of elements of ∂f(x₀). Let us prove the indicated relation when f(x₀) = 0, the other two being quite straightforward. If φ ∈ ∂f(x₀), then φ(x − x₀) ≤ f(x) ∀ x ∈ X, and so φ(x) ≤ φ(x) − f(x) ≤ φ(x₀) ∀ x ∈ K. Therefore φ ∈ N(x₀,K), and so is λφ, ∀ λ ≥ 0. That is, S(θ,∂f(x₀)) ⊂ N(x₀,K). Conversely, let φ(≠ θ) ∈ N(x₀,K), so that φ(x₀) = max φ(K). Then f(x) < 0 ⟹ φ(x) < φ(x₀), hence φ(x) ≥ φ(x₀) ⟹ f(x) ≥ 0. Thus f(x₀) = min f(H), where H is the half-space {x ∈ X: φ(x) ≥ φ(x₀)}. By 10a) and 11b) this means that θ ∈ ∂f(x₀) + ∂δ_H(x₀). But this second subdifferential is just the ray of non-positive multiples of φ. Therefore, ∃ ψ ∈ ∂f(x₀) and λ ≥ 0 such that θ = ψ − λφ, and since f does not attain its minimum at x₀, λ cannot vanish. This proves that φ ∈ S(θ,∂f(x₀)), qed.

Corollary. Let f₁,...,fₙ be continuous convex functions on a real lcs X, let Kᵢ = {x ∈ X: fᵢ(x) ≤ 0}, and assume that all the fᵢ are simultaneously negative at some point in X. Then φ ∈ Σ N(x₀,Kᵢ) if and only if ∃ λᵢ ≥ 0 such that λᵢfᵢ(x₀) = 0 and φ ∈ Σ λᵢ∂fᵢ(x₀).

Proof. This is an immediate consequence of the results in a) and b). We are assuming here that x₀ ∈ ∩ Kᵢ.

c) Definition. An ordinary convex program is a convex program of the form (X, f + Σδ_{Kᵢ}), where X is a real lcs, f, f₁,...,fₙ ∈ Conv (X), f is finite, f₁,...,fₙ are continuous on X, and Kᵢ = {x ∈ X: fᵢ(x) ≤ 0}.

Thus we are in effect trying to minimize f subject to the simultaneous inequality constraints fᵢ(x) ≤ 0, i = 1,...,n. Such programs have been of wide-spread interest for the last two decades, and a considerable computational and duality theory has been developed. Some general references are [20, 30, 44, 55, 70]. Our only concern here is to characterize the solutions of these programs under the regularity assumption (or "constraint qualification") that there is a point in X where all the fᵢ's simultaneously assume a negative value. The existence question for solutions of (ordinary or abstract) convex programs is frequently easier to answer, and will not be discussed here. But see §30 below for some special cases, and [70, §27] for the general (finite dimensional) case.

d) Theorem. (Generalized Kuhn-Tucker Conditions) x₀ ∈ ∩ Kᵢ is a solution of the ordinary convex program defined in c) if and only if ∃ φᵢ ∈ ∂fᵢ(x₀) and λᵢ ≤ 0 such that λᵢfᵢ(x₀) = 0 and

λ₁φ₁ + ··· + λₙφₙ ∈ ∂f(x₀).

Proof. Immediate from the optimality criterion 11d) and the Corollary in b) above.

Corollary. (Classical Kuhn-Tucker Conditions) If all the functions f, f₁,...,fₙ are smooth, then the solvability condition becomes: ∃ λᵢ ≥ 0 such that λᵢfᵢ(x₀) = 0 and

∇f(x₀) + λ₁∇f₁(x₀) + ··· + λₙ∇fₙ(x₀) = θ.
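As a simple illustration, take X = R¹, f(x) = x², and the single constraint f₁(x) = 1 − x, so that K₁ = {x: x ≥ 1} and the regularity assumption holds (f₁(2) < 0). At x₀ = 1 the classical conditions are met with λ₁ = 2: λ₁f₁(x₀) = 0 and ∇f(x₀) + λ₁∇f₁(x₀) = 2 − 2 = 0. Hence x₀ = 1 solves the program; no x₀ > 1 can qualify, since then λ₁f₁(x₀) = 0 forces λ₁ = 0, while ∇f(x₀) = 2x₀ ≠ 0.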

Exercise 21. For x ≡ (x_1,x_2) ∈ R² define

f(x) = x_1³ + 2x_2² - 4x_1 - 6x_2.

Show that f is convex in the half space {(x_1,x_2): x_1 ≥ 0}. Use the conditions in d) to solve the ordinary convex program (R², f + δ_K) where K = {x: x_1 ≥ 0, x_2 ≤ 2, 3x_1 + 2x_2 ≤ 10}.
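A minimal numerical sketch of this exercise follows (not part of the original text), assuming the reconstruction of the partly garbled data above: objective f(x) = x_1³ + 2x_2² - 4x_1 - 6x_2 and constraint set K = {x_1 ≥ 0, x_2 ≤ 2, 3x_1 + 2x_2 ≤ 10}. With these assumed data the unconstrained stationary point (2/√3, 3/2) already lies in K, so every Kuhn-Tucker multiplier vanishes.

    # Sketch only; the objective and the constraint senses are assumptions
    # reconstructed from the garbled original.
    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: x[0]**3 + 2*x[1]**2 - 4*x[0] - 6*x[1]
    grad = lambda x: np.array([3*x[0]**2 - 4, 4*x[1] - 6])

    cons = [{'type': 'ineq', 'fun': lambda x: x[0]},                  # x1 >= 0
            {'type': 'ineq', 'fun': lambda x: 2 - x[1]},              # x2 <= 2
            {'type': 'ineq', 'fun': lambda x: 10 - 3*x[0] - 2*x[1]}]  # 3x1+2x2 <= 10

    sol = minimize(f, x0=[1.0, 1.0], constraints=cons)
    print(sol.x)        # approximately (2/sqrt(3), 3/2)
    print(grad(sol.x))  # approximately (0, 0): each lambda_i = 0 in d)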

e) Remark. It is possible to generalize the class of convex

programs for which conditions of Kuhn-Tucker type still characterize

the solutions. In one direction we may adjoin a finite number of

affine constraints (constraints of the form ℓ_j(x) = α_j, where ℓ_j is a continuous linear functional); the only change is that the corresponding additional "multipliers" λ_j are of unrestricted sign.
In another direction, we may replace the finite set of constraints

f_i(x) ≤ 0, i = 1,...,n

by an infinite family of smooth (convex) constraints,

f_α(x) ≤ 0, α ∈ A,

where A is a compact Hausdorff space and for each x, the function α ↦ f_α(x) is continuous on A. The appropriate regularity assumption is then that for some x,

sup {f_α(x): α ∈ A} < 0.

The conclusion is then that an x_0 satisfying all the constraints is a solution if and only if there exists a non-positive finite Borel measure λ on A such that

∫_A f_α(x_0) dλ(α) = 0

and

∫_A ∇f_α(x_0) dλ(α) ∈ ∂f(x_0).

These conditions are of course a precise generalization of those given in d) where A is a finite set. A proof of this result has been given by Rockafellar [69, p. 45].



§13. Lagrange Multipliers

We continue to study the ordinary convex program of 12c), the

regularity assumption of that paragraph being in force throughout the

present section. Classically the "method of Lagrange multipliers"

is a device for replacing an optimization problem with constraints

(such as the one under study) with a new unconstrained problem. In

addition to discussing this point of view, we develop an equally

interesting interpretation of Lagrange multipliers as measurements

of the rate of change of the value of the program when the con-

straints are slightly altered.

a) Definition. An n-tuple (λ_1,...,λ_n) of non-negative numbers is called a Lagrange multiplier vector (or, a Kuhn-Tucker vector) for the ordinary convex program 12c) if the value of the program (X, f + ∑λ_if_i) is finite and equal to the value of (X, f + ∑δ_{K_i}) (≡ the original program).

Thus if we can be assured of the existence of a Lagrange multiplier vector, then to solve the program we can try to minimize the function f + ∑λ_if_i over all of X (which computationally may be less difficult than minimizing f over some proper convex subset), and then examine these solutions to find those which satisfy the constraints f_i(x) ≤ 0.

b) Theorem. (Minimum Principle) If x_0 is a solution of the ordinary convex program 12c), then there exists a Lagrange multiplier vector (λ_1,...,λ_n) such that x_0 is a solution of the program (X, f + ∑λ_if_i).

Proof. The Kuhn-Tucker optimality condition of 12d) implies the existence of non-negative λ_1,...,λ_n such that θ ∈ ∂(f + ∑λ_if_i)(x_0). Consequently f + ∑λ_if_i attains its minimum over X at x_0. Further, the two minima are equal, since f(x_0) + ∑λ_if_i(x_0) = f(x_0) because λ_if_i(x_0) = 0 also, by 12d). Hence by definition the n-tuple (λ_1,...,λ_n) is a Lagrange multiplier vector.

c) We now work toward the establishment of a fundamental connection between Lagrange multiplier vectors and subgradients. This relationship leads both to existence criteria for Lagrange multiplier vectors and to their interpretation alluded to at the beginning of this section.

Definition. The perturbation function of the ordinary convex program 12c) is the mapping p: Rⁿ → [-∞,+∞] defined by

p(y) ≡ p(y_1,...,y_n) = inf {f(x): f_i(x) ≤ y_i, i = 1,...,n}.

Thus p is the value of a new ordinary convex program differing

from the original only in that the original constraint levels have

been perturbed. Intuitively we think of p(y) as the optimal payoff

for a given level y of resource expenditure. It is possible,

although in practice unlikely, that p may assume the value -∞. In any event, dom (p) ≡ {y ∈ Rⁿ: p(y) < +∞} = {y ∈ Rⁿ: f_i(x) ≤ y_i for some x ∈ X and all i}. We assume that θ ∈ dom (p), and direct our attention to the study of p(y) for small values of y.

The basic property of p is its convexity.

Lemma. The perturbation function p is a convex function on Rⁿ.

Proof. Since a priori p may assume the value -∞, we cannot directly verify the convexity inequality of 6a), because of the possibility of encountering an indeterminate expression such as -∞ + ∞. However, it is just as easy to establish the convexity of epi (p). To do this, choose points y, z ∈ dom (p), numbers α > p(y), β > p(z), and 0 < t < 1, and show, as Exercise 22, that

p(ty + (1-t)z) ≤ tα + (1-t)β.

Several facts about p follow directly from its convexity.

First, if p ever assumes the value -∞, then it has the value -∞ throughout the relative interior of its effective domain; hence if the value of the original program is finite (i.e., -∞ < p(θ)), this case cannot occur, so p must be continuous throughout rel-int (dom (p)) by 10d). If so, then p is subdifferentiable throughout this same relative interior by 8d). In particular, if θ ∈ int (dom (p)), then p is continuous and subdifferentiable on some θ-nbhd.

d) Theorem. The set of Lagrange multiplier vectors for the ordinary convex program 12c) is identical with -∂p(θ).

Proof. Let λ ≡ (λ_1,...,λ_n) be a Lagrange multiplier vector. Then, recalling that K_i ≡ {x ∈ X: f_i(x) ≤ 0},

p(θ) = value of (X, f + ∑δ_{K_i})
     = value of (X, f + ∑λ_if_i)
     ≤ f(x) + ∑λ_iy_i, if f_i(x) ≤ y_i ∀i.

Therefore, p(θ) ≤ p(y) + ∑λ_iy_i, which proves that -λ ∈ ∂p(θ). Conversely, assume that λ ∈ -∂p(θ). Then λ_i ≥ 0 (because, by definition of p, y ≥ θ ⟹ p'(θ;y) ≤ 0, so by 9a-Remark 3), -λ_i = (-λ)·e_i ≤ p'(θ;e_i) ≤ 0). Now choose any z ∈ X and put y_i = f_i(z). Then p(θ) ≤ p(y) + ∑λ_iy_i ⟹

p(θ) ≤ inf {f(x): f_i(x) ≤ y_i} + ∑λ_iy_i ≤ f(z) + ∑λ_if_i(z),

and so

value of (X, f + ∑δ_{K_i}) ≤ value of (X, f + ∑λ_if_i).

But the reverse inequality is also valid, since λ_i ≥ 0 and so the value of (X, f + ∑λ_if_i) cannot exceed the infimum of this function over the set ∩K_i. Hence the two values are equal and this means that λ is a Lagrange multiplier vector.

The upshot is that all questions about Lagrange multiplier vectors are reducible to questions about ∂p(θ). For instance, a Lagrange multiplier vector fails to exist if and only if ∂p(θ) = ∅; this latter condition is in turn equivalent to the existence of a vector y ∈ Rⁿ for which p'(θ;y) = -∞ [70, p. 216], in which case

the program is highly unstable for some perturbations. We also

obtain a satisfactory solution of the uniqueness question.

Corollary. A unique Lagrange multiplier vector λ exists if and only if p has a gradient at θ, and then λ_i = -∂p(θ)/∂y_i.
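To make the identification in d) concrete, here is a small numerical sketch (not part of the original text) for the hypothetical one-constraint program: minimize f(x) = x over R¹ subject to f_1(x) = x² - 1 ≤ 0. Its perturbation function is p(y) = -√(1+y), so the Corollary predicts the unique multiplier λ = -p'(θ) = 1/2.

    import numpy as np

    p = lambda y: -np.sqrt(1.0 + y)     # perturbation function of the sample program
    h = 1e-6
    lam = -(p(h) - p(-h)) / (2*h)       # -p'(0) by central differences
    print(lam)                          # approximately 0.5

    # Check the multiplier property: x0 = -1 minimizes f + lam*f1 over all of R,
    # and the two program values agree.
    xs = np.linspace(-3.0, 3.0, 10001)
    vals = xs + lam*(xs**2 - 1.0)
    print(xs[np.argmin(vals)], vals.min(), p(0.0))   # -1.0, -1.0, -1.0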

e) Example. The ordinary convex program defined by X = R², f(x) = x_1, f_1(x) = x_2, and f_2(x) = x_1² - x_2 has x = θ as its unique solution, but there is no Lagrange multiplier vector. Of course here the regularity assumption is violated.

f) We now briefly consider the use of Lagrange multipliers in

sensitivity analysis. This use is to provide an estimate for the



change in value of an ordinary convex program when the constraint

bounds are perturbed. The basic result, which depends on the

Theorem in d), is the following.

Theorem. Let f, f_1,...,f_n define an ordinary convex program as in 12c). For given y', y'' ∈ Rⁿ let x', x'' ∈ X be solutions of the programs inf {f(·): f_i(·) ≤ y'_i}, resp. inf {f(·): f_i(·) ≤ y''_i}. If λ', λ'' are corresponding Lagrange multiplier vectors, then

-λ''·(y'-y'') ≤ f(x') - f(x'') ≤ -λ'·(y'-y'').

Proof. We verify the right-hand inequality by examining the definition of λ'; the other inequality is handled similarly. Let q be the perturbation function for the ordinary convex program defined by f, f_1-y'_1,...,f_n-y'_n. Then by d), -λ' ∈ ∂q(θ). If p is the perturbation function for the original program, we see that p(y) = q(y-y') and hence that ∂q(θ) = ∂p(y'). Therefore, -λ' ∈ ∂p(y'), and so the right-hand inequality follows by noting that f(x') = p(y') and f(x'') = p(y'').

In addition to this property of Lagrange multiplier vectors, recall that it was also brought out in the course of the proof of d) that

-λ·y ≤ p'(θ;y),

whenever p is the perturbation function for the ordinary convex program 12c), λ is a Lagrange multiplier vector, and y ∈ Rⁿ. Taking in particular y to be the jth unit vector in Rⁿ, we may state that -λ_j is a lower bound on the marginal rate of change of the value of the program relative to an increase in the right hand side of the jth constraint. Further, as the Corollary in d) shows, this lower bound is exact when ∇p(θ) exists.



g) The usefulness of the preceding results in f) for deciding whether it is worthwhile to make changes in one or more of the constraint levels (for the purpose of decreasing the objective function) depends on the availability of a Lagrange multiplier vector. Fortunately, many practical algorithms (such as the simplex procedure for linear programs) will supply a Lagrange multiplier vector λ along with the program solution. This happens because λ is the solution of a "dual program" which is simultaneously solved. In fact, in practical problems (such as linear programs), the vector λ is usually unique.

Example. Consider the linear program in R⁴ defined by

maximize f_0(x) ≡ 2x_1 + x_2 + 10x_3 + 4x_4

subject to

5x_1 + x_2 + 15x_3 + 5x_4 ≤ 100
3x_1 + 2x_2 + 7x_3 + 5x_4 ≤ 125
0 ≤ x_1,...,x_4.

A solution is x̄ = (0,25,0,15) and the value is 85. A Lagrange multiplier vector is λ = (3/5,1/5). Question: how does a decrease in the first constraint bound affect the value of the program?

To answer this, we reformulate the given problem as an ordinary convex program by defining

f = -f_0,
f_1(x) = 5x_1 + x_2 + 15x_3 + 5x_4 - 100,
f_2(x) = 3x_1 + 2x_2 + 7x_3 + 5x_4 - 125.

Then -λ·y ≤ p(y) - p(θ) gives here

-(3/5,1/5)·(δ,0) ≤ p(y) - (-85),

where y = (δ,0), and so

-p(y) ≤ 85 + 3δ/5,

which is < 85 if δ < 0. In other words, any decrease in the first constraint level will decrease the maximum value of f_0 by at least the amount indicated.
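The following sketch (not part of the original text) checks this example numerically with scipy's LP solver; the dual variables reported by the 'highs' method are exactly the multipliers, up to the solver's sign convention.

    from scipy.optimize import linprog

    c = [-2, -1, -10, -4]                    # maximize f0  <=>  minimize -f0
    A = [[5, 1, 15, 5], [3, 2, 7, 5]]

    res = linprog(c, A_ub=A, b_ub=[100, 125], bounds=[(0, None)]*4, method='highs')
    print(res.x, -res.fun)                   # (0, 25, 0, 15) and 85
    print(res.ineqlin.marginals)             # (-3/5, -1/5), i.e. -lambda

    # Decreasing the first bound by 5 lowers the optimal value by at least 3:
    res2 = linprog(c, A_ub=A, b_ub=[95, 125], bounds=[(0, None)]*4, method='highs')
    print(-res2.fun)                         # 82 = 85 - (3/5)*5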

h) Remark. In this section we have only touched upon a few of

the highlights of the theory of Lagrange multipliers in convex pro-

gramming. A most conspicuous omission is the interplay between this

theory and that of dual programs. Also missing is the very pretty

economic interpretation in terms of "equilibrium prices". For these

ideas, and for a more detailed study of ordinary convex programs, we

must refer the reader to the literature, in particular [20, 44, 54, 55,

70].

§14. Conjugate Functions

In this section we introduce the second technical device for

analyzing optimization problems - the "conjugate" of a (generally

convex) function. This notion, in its modern form, has been developed

by Fenchel [16], Moreau [47, 50], Brøndsted [6], and Rockafellar

[66, 67]. A very useful survey, with applications to duality theorems

for mathematical programs and optimal control problems, and to

approximation theory, has been given by Ioffe and Tikhomirov [31].

For the remainder of Part II we will be working in a real lcs

(possibly a nls) X. As usual there is an immediate extension of the

theory to complex spaces, obtainable by passing to the real parts of

linear functionals. Our restriction to real spaces is thus motivated

primarily by the desire to avoid cluttering up the many formulas


43

below with innumerable "re's". We will also consistently use the notation ⟨x,y⟩ to denote the value of a linear functional y ∈ X* at the vector x ∈ X.

a) Definition. Let X be a real lcs and f: X → (-∞,+∞]. The conjugate (also called the Fenchel transform) of f is the function f*: X* → (-∞,+∞] defined by

f*(y) = sup {⟨x,y⟩ - f(x): x ∈ dom (f)}.

(We will assume that dom (f) ≠ ∅.)

b) A trivial yet useful consequence of this definition is Young's inequality (or Fenchel's inequality):

⟨x,y⟩ ≤ f(x) + f*(y).

Also immediate are the properties:

(f + c)* = f* - c, c ∈ R¹;
(cf)* = cf*(·/c), c > 0;
f*(y) = g*(cy) if f(x) = g(x/c), c > 0;
f*(y) = g*(y + z) if f(x) = g(x) - ⟨x,z⟩;

and

(inf f_α)* = sup f_α*

for any family {f_α} of functions on X.

c) Examples. 1) If f = δ_K for some K ⊆ X, then f*(y) = sup {⟨x,y⟩: x ∈ K} ≡ sup ⟨K,y⟩. This function is called the support function of K.

2) Let X be a nls and f(x) = ||x||. Then f* = δ_{U(X*)}. More generally, if M is a subspace of X, and f(x) = dist (x,M), then f* = δ_{U(M⊥)}.

3) Let X be a nls and for 1 < p < +∞ let f(x) = ||x||^p/p. Then f*(y) = ||y||^q/q, where 1/p + 1/q = 1.

4) Let X = R¹ and f(x) = e^x. Then

f*(y) = +∞ for y < 0; 0 for y = 0; y(log y - 1) for y > 0.

5) Let X = Rⁿ and let A be a symmetric positive definite n × n matrix. If 1/p + 1/q = 1 and

f(x) = (1/p)⟨x,Ax⟩^{p/2},

then

f*(y) = (1/q)⟨y,A⁻¹y⟩^{q/2}.

Exercise 23. Verify these examples.
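As a spot-check of these formulas, the sketch below (not part of the original text) computes the conjugate of Example 4) by brute force over a grid.

    # Verify numerically that for f(x) = e^x one has f*(y) = y(log y - 1), y > 0.
    import numpy as np

    xs = np.linspace(-20.0, 5.0, 200001)
    fstar = lambda y: np.max(xs*y - np.exp(xs))   # crude sup over the grid

    for y in [0.5, 1.0, 2.0, 3.0]:
        print(y, fstar(y), y*(np.log(y) - 1.0))   # the two columns agree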

d) Theorem. A conjugate function f* is always convex and w*-lsc.

Proof. f* is the pointwise supremum of the functions y ↦ ⟨x,y⟩ - f(x), each of which is a w*-continuous affine function on X*.

e) Definition. The second conjugate of a function f: X → (-∞,+∞] is the function f**: X → [-∞,+∞] defined by

f**(x) = sup {⟨x,y⟩ - f*(y): y ∈ dom (f*)}.

Examples. In Examples 2)-5) in c) we have f** = f. But in Example 1) in c) we see that

f** = δ_{c-o (K)},

since sup ⟨K,y⟩ = sup ⟨c-o (K),y⟩, ∀y ∈ X*.

f) Theorem. With f as defined in a), we have f** ≤ f, and dom (f**) ⊆ c-o (dom (f)), provided that f** is proper.

Proof. The pointwise inequality f** ≤ f follows directly from Young's inequality (b)). Suppose x_0 ∉ c-o (dom (f)). Choose y_0 ∈ X* according to 3h) so that

⟨x_0,y_0⟩ > sup ⟨c-o (dom (f)),y_0⟩ = sup ⟨dom (f),y_0⟩.

Since f** is proper, ∃ y_1 ∈ X* such that |f*(y_1)| < +∞. If t > 0 then

f*(y_1 + ty_0) = sup {⟨x,y_1 + ty_0⟩ - f(x): x ∈ dom (f)}
              ≤ f*(y_1) + t sup ⟨dom (f),y_0⟩,

and so

f**(x_0) ≥ ⟨x_0,y_1 + ty_0⟩ - f*(y_1 + ty_0)
         ≥ ⟨x_0,y_1⟩ - f*(y_1) + t(⟨x_0,y_0⟩ - sup ⟨dom (f),y_0⟩),

which is unbounded as t → +∞, qed.

g) For f and X as in a) we recall that the conditions "f is lsc on X" and "the sub-level sets of f (i.e., the sets {x ∈ X: f(x) ≤ c}) are closed" are equivalent. It is also easy to see that these are equivalent to the condition "epi (f) is closed in X × R¹". We denote by Γ(X) the set of all lsc proper convex functions on X.

Theorem. f** = f if and only if f ∈ Γ(X).

Proof. If f** = f, then by d) f must be convex and w-lsc, hence lsc since every weakly closed set is closed. Also, f is proper by the assumptions made in a). Conversely, suppose f ∈ Γ(X). Then the preceding theorem implies

(1) dom (f) ⊆ dom (f**) ⊆ cl (dom (f)).

Suppose ∃ x_0 ∈ dom (f**) such that f**(x_0) < f(x_0). Then (x_0,f**(x_0)) ∉ epi (f), a closed convex set. Hence ∃ y_0 ∈ X* and t_0 ∈ R¹ such that

(2) t_0f**(x_0) + ⟨x_0,y_0⟩ > sup {t_0t + ⟨x,y_0⟩: (x,t) ∈ epi (f)}.

Now t_0 ≠ 0, for otherwise (1) would be contradicted. In fact t_0 < 0, since otherwise the sup in (2) would be infinite. So we may assume t_0 = -1. Then for given x ∈ dom (f), the sup is attained when t = f(x). But now

⟨x_0,y_0⟩ - f**(x_0) > sup {⟨x,y_0⟩ - f(x): x ∈ dom (f)} = f*(y_0),

a contradiction to Young's inequality.

Corollary. With X* endowed with the w*-topology, the mapping f ↦ f* is an order-reversing bijection from Γ(X) onto Γ(X*).

Exercise 24. For any function f as in a),

f** = c-o (f) ≡ sup {g ≤ f: g convex and lsc on X}. (Conceivably, the only such g is the constant -∞, but then f* must be the constant +∞.)

h) Definition. Given functions f_1,...,f_n as in a), their (infimal) convolution ⊕f_i is the function defined by

(⊕f_i)(x) = inf {∑_{i=1}^n f_i(x_i): ∑_{i=1}^n x_i = x}.

Exercise 25. If each f_i ∈ Conv (X), then ⊕f_i is convex (although possibly not proper). Its effective domain is ∑ dom (f_i).

Examples. 1) For non-empty sets A, B ⊆ X, δ_A ⊕ δ_B = δ_{A+B}.

2) (δ_A ⊕ f)(x) = inf {f(x-a): a ∈ A}. In particular, if f is a semi-norm on X, then δ_A ⊕ f = dist (·,A). If also A is convex, then Exercise 25 shows that d(·,A) is a convex function on X.

Theorem. (⊕f_i)* = ∑_{i=1}^n f_i*.

Proof. (⊕f_i)*(y) = sup {⟨x,y⟩ - inf {∑f_i(x_i): ∑x_i = x, x_i ∈ dom (f_i)}: x ∈ ∑ dom (f_i)}

= sup_x sup_{∑x_i = x} ∑ (⟨x_i,y⟩ - f_i(x_i))

= ∑_i sup_x {⟨x,y⟩ - f_i(x)}

= ∑ f_i*(y), qed.

i) A formula for the conjugate of a sum is a little harder to

come by, although of greater interest. We give next one result in

this direction; however, we will later give a more useful formula

(requiring stronger hypotheses) as an illustration of the Fenchel

Duality Theorem - see 21a).



Theorem. If f_1,...,f_n ∈ Γ(X), then

(∑f_i)* = (⊕f_i*)**.

Proof. Whether or not the f_i are convex we always have the inequality

(3) ∑ f_i ≥ ∑ f_i**,

as follows from f) and g) above. By h) the right hand side of (3) equals (⊕f_i*)*. But since f_i ∈ Γ(X), there is actually equality in (3); taking conjugates then gives the assertion, qed.

§15. Polarity

a) Let X be a real lcs and A ⊆ X. In 3d) the Minkowski function of A was defined, provided that A was a convex θ-nbhd. Using the convention that the infimum of the empty set of real numbers is +∞, we now expand the coverage of that definition by defining p_A(θ) = 0 and

p_A(x) = inf {t > 0: x ∈ tA},

whenever x ≠ θ. Thus p_A: X → [0,∞], and is positively homogeneous: p_A(cx) = c p_A(x) for c > 0. We also recall that polars were discussed in §4.

Theorem. For any A ⊆ X, we have

(1) p_A* = δ_{A°},

and, if θ ∈ A,

(2) δ_A* = p_{A°}.

Proof. p_A*(y) = sup {⟨x,y⟩ - p_A(x): x ∈ dom (p_A)} = sup_x {⟨x,y⟩ - inf {t > 0: x ∈ tA}} = sup_x sup {⟨x,y⟩ - t: x ∈ tA}.

Now if y ∈ A° this last sup is ≤ 0, and hence = 0 since p_A(θ) = 0. On the other hand, if y ∉ A°, then ∃ x ∈ A such that ⟨x,y⟩ = 1 + ε > 1. Consequently,

p_A*(y) ≥ ⟨tx,y⟩ - t = tε,

for t > 0, so p_A*(y) = +∞.

To prove the second formula, recall from 14c) that

δ_A*(y) = sup ⟨A,y⟩ ≡ g(y) ≥ 0,

the last inequality coming from θ ∈ A. Now A° = {y: g(y) ≤ 1}. Suppose g(y) = 0. Then the ray {ty: t > 0} lies in A°, so p_{A°}(y) = 0. Suppose next that 0 < g(y) < +∞. Since g is positively homogeneous we have

p_{A°}(y) = inf {t > 0: y/t ∈ A°}
         = inf {t > 0: g(y/t) ≤ 1}
         = inf {t > 0: g(y) ≤ t} = g(y).

Similarly, if g(y) = +∞, then for no t > 0 is y/t ∈ A°, hence p_{A°}(y) = +∞, qed.

b) Formula (1) together with 14d) provides a new proof of the fact that A° is always w*-closed and convex. In turn this fact yields the highly useful

Corollary. (Bipolar Theorem)

A°° ≡ (A°)° = c-o ({θ} ∪ A).

Proof. Since {θ} ∪ A ⊆ A°°, which is closed and convex, we have at least that A°° ⊇ c-o ({θ} ∪ A). On the other hand, if any closed half-space contains {θ} ∪ A, it must also contain A°°; taking into account 3i) this proves the reverse inclusion.

Exercise 26. Let {A_α} be a family of closed convex subsets of X each containing θ. Show that

(3) (∩_α A_α)° = c-o (∪_α A_α°),

the closure here being of course taken in the w*-topology.

c) Example. This example completes the description of the extreme point sets of the unit balls of the classical Banach spaces. It is destined to play a vital role in the theory of best Chebyshev (uniform) approximation, to be presented in Part III.

Let X = C_R(Ω); we will characterize ext (U(X*)). For t ∈ Ω let δ_t be the point mass at t; as an element of X*, δ_t is the norm-one functional x ↦ x(t), x ∈ X. Now each δ_t ∈ ext (U(X*)). (If ∃ ν,σ ∈ S(X*) such that δ_t = ½(ν + σ), then both ν and σ must be ≥ θ, since the positive face (5b-Example 3)) of U(X*) is U(X*)-extremal. Hence they must both annul any non-negative x ∈ X which vanishes at t. It follows that the supports of ν and σ are just {t}; consequently, ν = σ = δ_t.) Therefore, the set E ≡ {± δ_t: t ∈ Ω} ⊆ ext (U(X*)). Further, E° = U(X), so the Bipolar Theorem implies U(X*) = E°° = c-o ({θ} ∪ E) = c-o (E) (w*-closures of course). Since E is w*-compact (the map t ↦ δ_t is a homeomorphism on Ω), the Krein-Milman Theorem 5e) shows that ext (U(X*)) ⊆ E; hence E = ext (U(X*)). Thus we have proved that

ext (U(C_R(Ω)*)) = {± δ_t: t ∈ Ω}.

A completely analogous characterization is valid for ext (U(C(Ω)*)), namely it is the set {αδ_t: t ∈ Ω and |α| = 1}.

d) We reconsider now the formula of Exercise 26. It is of frequent interest (for example, in the next section) to know that the convex hull on the right hand side of (3) is already (w*-) closed. In particular this is the case if there are only finitely many A_α, each of which is a (closed convex) θ-nbhd. For then, by §4, each A_α° is w*-compact convex, and the result follows from 5a). We sum up:

Lemma. Let A_1,...,A_n be closed convex θ-nbhds. in a lcs. Then

(A_1 ∩ ... ∩ A_n)° = co (A_1° ∪ ... ∪ A_n°).

§16. Dubovitskii-Milyutin Theory

We give next a brief introduction to a very general approach to

the solution of (not necessarily convex) mathematical programs.

Given a variational pair (X,f) (11a); here X is a real lcs), the

procedure yields a necessary condition, in the form of an equation in

X* ("abstract Euler equation"), for a specific element of dom (f)

to be a solution of the associated program. The scope of this theory

has recently been extended by Halkin [23] and Lobry [41] so as to be

applicable to optimal control problems. The original presentation

was [14]; a discussion has also been given in the Girsanov book [21].

a) Theorem. (Dubovitskii-Milyutin) Let K_0 be a convex set and K_1,...,K_n open convex cones with vertex θ in a real lcs X. Then

(1) ∩_{i=0}^n K_i = ∅

if and only if ∃ y_i ∈ K_i°, not all zero, such that

(2) y_0 + y_1 + ... + y_n = θ.

Proof. The existence of y_i's satisfying (2) is clearly sufficient for (1) to hold, since the cones K_1,...,K_n are open. Now conversely, if condition (1) holds, we can assume, without loss of generality, that K ≡ K_1 ∩ ... ∩ K_n ≠ ∅. Choose x̄ ∈ K and let J_i ≡ K_i - x̄, i = 0,1,...,n. Apply 3g) to separate J_0 and J ≡ J_1 ∩ ... ∩ J_n:

sup ⟨J_0,y_0⟩ ≤ λ_0 ≤ inf ⟨J,y_0⟩.

Since J is a θ-nbhd., λ_0 < 0. Hence y_0/λ_0 ∈ J° = co (J_1° ∪ ... ∪ J_n°), by 15d). This implies the existence of a_i ≥ 0, i = 1,...,n, a_1 + ... + a_n = 1, for which -y_0 = y_1 + ... + y_n, where y_i ∈ -λ_0a_iJ_i° ≡ λ_iJ_i°. Thus

(3) ∑_{i=0}^n y_i = θ and ∑_{i=0}^n λ_i = 0.

Now by definition of J_i, sup ⟨K_i,y_i/λ_i⟩ is bounded above, so sup ⟨K_i,y_i/λ_i⟩ ≤ 0 since K_i is a cone at θ, i = 1,...,n; in particular y_i ∈ K_i°. From this, and the fact that x̄ ∈ K, it follows that

sup ⟨K_0,y_0⟩ ≤ λ_0 + ⟨x̄,y_0⟩ = -∑_{i=1}^n (λ_i + ⟨x̄,y_i⟩) ≤ 0

by (3), so y_0 ∈ K_0° also, qed.

Other proofs of this theorem have been given by Vlach [76],

Halkin [23], and Pshenichnii [62]; the proof given above was adapted

from Ioffe-Tikhomirov [31]. The interest in this theorem, as will

be seen shortly, is that a necessary condition for the solution of a

variational problem can be expressed as the requirement that a cer-

tain finite family of convex cones should have an empty intersection.

The theorem then yields an equation, (2), which must be solved. We

will refer to (2) as the abstract Euler equation.

b) The variational problems (X,f) to which we will apply the preceding theorem are of the following type. There are sets Ω_1,...,Ω_{n-1}, each having non-void interior, and a set A which will not generally have interior points, such that

(4) dom (f) = A ∩ Ω_1 ∩ ... ∩ Ω_{n-1}.

Intuitively, each Ω_i is the set satisfying some inequality constraint, while A is the set where one or more equality constraints hold. We seek a condition which a given x_0 ∈ dom (f) must satisfy in order that it be a solution. Since we are not at present limiting ourselves to convex programs, we must allow the possibility of local minima; the condition to be derived will indeed be characteristic of each local minimum.

From the given data X (a real lcs), f, A, Ω_1,...,Ω_{n-1}, and x_0 ∈ dom (f), we now construct the sets K_0, K_1,...,K_n to which a) will be applied.

c) We begin with the objective function f.

Definition. x ∈ X is a direction of decrease of f (at x_0) (originally called a "prohibited variation") if ∃ ε > 0 and an x-nbhd. V such that 0 < t < ε and z ∈ V imply

f(x_0 + tz) < f(x_0).

The set C(x_0,f) of all such elements x is easily seen to be an open cone with vertex θ, or else it is void. There is no a priori reason to expect that the cone C(x_0,f) is convex; however, as we indicate next, this actually is the case in many commonly occurring situations.

Examples. 1) Suppose that ∇f(x_0) exists as an element of X*. Then the cone C(x_0,f) is the open half-space {x ∈ X: ∇f(x_0)·x < 0}.

2) Suppose that f ∈ Conv (X) and is continuous on some x_0-nbhd. Then C(x_0,f) is the convex cone {x ∈ X: f'(x_0;x) < 0}.



Exercise 27. Verify these examples.

d) We continue with the sets Ω_i; we take any one of them and call it Ω.

Definition. x ∈ X is admissible with respect to Ω if ∃ ε > 0 and an x-nbhd. V such that 0 < t < ε and z ∈ V imply x_0 + tz ∈ Ω.

The set C(x_0,Ω) of all such vectors in X is an open cone at θ; it might be void or it might not be convex. Only the case where x_0 is a boundary point of Ω is non-trivial: if x_0 ∈ int (Ω), then C(x_0,Ω) = X.

Examples. 1) Let Ω be a convex body in X. Then C(x_0,Ω) = int (S(x_0, int (Ω))) - x_0; that is, it is the convex cone at x_0 generated by int (Ω), which is then translated to θ.

2) Suppose that Ω = {x ∈ X: g(x) ≤ g(x_0)} for some real-valued function g on X. It is clear that C(x_0,g) ⊆ C(x_0,Ω). If either ∇g(x_0) exists in X* (and is not θ), or if g ∈ Conv (X), continuous on some x_0-nbhd., and the regularity assumption {x: g(x) < g(x_0)} ≠ ∅ is valid, then C(x_0,Ω) = C(x_0,g).

Exercise 28. Verify these examples.

e) Finally we consider the construction of a cone for the set A.

Definition. x ∈ X is admissible with respect to A (or, is a tangent direction to A at x_0) if ∃ a map r: [0,ε] → X for some ε > 0, such that x_0 + tx + r(t) ∈ A when 0 ≤ t ≤ ε, and r(t)/t → θ as t → 0+.

The set C(x_0,A) of all such vectors is again a cone at θ and θ ∈ C(x_0,A). In many cases of interest this cone is simply a linear subspace.

Examples. 1) If A is a flat in X then C(x_0,A) is the parallel subspace.

2) Let X and Y be Banach spaces and G: X → Y a mapping which is continuously Frechet differentiable on an x_0-nbhd. Assume that the differential dG(x_0) ∈ L(X,Y) is surjective. Then if A has the form {x ∈ X: G(x) = θ}, we have C(x_0,A) = nullspace of dG(x_0). (Without the assumed surjectivity, we can only assert that C(x_0,A) ⊆ nullspace of dG(x_0).) When X and Y are finite dimensional, the surjectivity condition is equivalent to the Jacobian matrix of G at x_0 having maximum row rank. This result is due to Liusternik [43]; see also Flett [17].

f) Now we come to the point of the last four sections. We reconsider the program formulated in b), and define

K_0 = C(x_0,A),
K_i = C(x_0,Ω_i), i = 1,...,n-1,
K_n = C(x_0,f).

It is explicitly assumed that all these sets are convex. We then obtain the Dubovitskii-Milyutin Optimality Criterion: if x_0 ∈ dom (f) ((4)) is a solution of the program (X,f), then ∃ y_i ∈ K_i°, not all θ, such that y_0 + y_1 + ... + y_n = θ; that is, the abstract Euler equation must hold.

Proof. By a) we must prove that (1) holds. Suppose ∃ x ∈ ∩K_i. Since the intersection of finitely many x-nbhds. is again such a nbhd., ∃ x-nbhd. V and ε > 0 such that f(x_0 + tz) < f(x_0) and x_0 + tz ∈ Ω_i whenever 0 < t < ε and z ∈ V. But x ∈ K_0 also. Hence x_0 + t(x + r(t)/t) ∈ A for sufficiently small t > 0. By definition of r(·), x + r(t)/t ∈ V for small t. This shows that ∃ t > 0 and z ∈ V such that x_0 + tz ∈ A ∩ Ω_1 ∩ ... ∩ Ω_{n-1} = dom (f) but f(x_0 + tz) < f(x_0), and so x_0 is not a solution after all, qed.

§17. An Application

As one illustration of the Dubovitskii-Milyutin procedure we con-

sider here the so-called "simplest problem in the calculus of

variations". In particular, we will see that the abstract Euler

equation of 16a) leads in this case to the classical Euler differen-

tial equation. The problem to be solved is essentially that of

minimizing a functional defined over a class of smooth curves joining

two fixed points in the plane R 2. Among such programs are included

the shortest distance problem, the brachistochrone problem (the shape

of a wire along which a ring descends in least time subject to

gravity), and the profile of a minimal surface of revolution. The

solutions of these three problems are respectively straight lines,

cycloids, and catenaries.

a) Let F: R³ → R¹ be continuous with continuous partial derivatives in its second and third arguments. Consider the functional

x ↦ ∫₀¹ F(t,x(t),x'(t)) dt,

defined for all x ∈ C¹_R([0,1]). We seek to minimize this functional over the set of all such x which satisfy x(0) = α, x(1) = β, for given fixed α and β.

To recast this problem in a more convenient form, let X = C_R([0,1]) × C_R([0,1]), and define f: X → R¹ by

f(x,y) ≡ f((x,y)) = ∫₀¹ F(t,x(t),y(t)) dt.

Define the constraint set A ⊆ X by

A = {(x,y): x(t) = α + ∫₀ᵗ y(s) ds, x(1) = β}.

Thus our variational problem becomes (X, f + δ_A). We now assume that this problem has a minimum at (x_0,y_0) ∈ A.

b) The objective function f is certainly not convex in general, but it is smooth on X. Indeed, we have the formula

(1) ∇f(x_0,y_0)·(x,y) = ∫₀¹ (F₂x + F₃y) dt,

where the subscripts indicate partial derivatives with respect to the second and third variables, and these derivatives are each evaluated at (t,x_0(t),y_0(t)). By 16c),

C((x_0,y_0),f) = {(x,y): ∇f(x_0,y_0)·(x,y) < 0}.

The polar of this cone is simply the ray {t∇f(x_0,y_0): t ≥ 0}.

c) Since A is a flat in X, 16e) implies that C((x_0,y_0),A) is the parallel subspace:

C((x_0,y_0),A) = {(x,y): x(t) = ∫₀ᵗ y(s) ds, x(1) = 0}.

The polar of this subspace is the annihilator subspace which consists of those φ ∈ X* having the form

(2) φ(x,y) = cx(1) + ∫₀¹ (x(t) - ∫₀ᵗ y(s) ds) dμ(t),

for some c ∈ R¹ and μ ∈ rca ([0,1]).

Exercise 29. Prove this last assertion.

d) We can now write down the abstract Euler equation which must be satisfied if (x_0,y_0) is to be a solution. There must exist c ∈ R¹, μ ∈ rca ([0,1]) and τ ≥ 0 such that

(3) -τ∇f(x_0,y_0) + φ = θ,

where φ is defined by (2). These (linear) functionals cannot both vanish, so τ > 0. Suppose we apply both sides of (3) to elements of the form (x,y) where x(t) ≡ ∫₀ᵗ y(s) ds. We obtain, using (1) and (2):

(4) -τ ∫₀¹ (F₂ ∫₀ᵗ y(s) ds + F₃y(t)) dt + c ∫₀¹ y(t) dt = 0,

for any y ∈ C_R([0,1]). If we integrate by parts the first integral in (4), we arrive at the equation

(5) ∫₀¹ (τ(∫ₜ¹ F₂ ds + F₃) - c) y(t) dt = 0.

Since (5) holds for every y ∈ C_R([0,1]), we actually must have

(6) τF₃ + τ ∫ₜ¹ F₂ ds - c = 0.

Finally, if F₃ happens to be differentiable, we obtain from (6) the classical Euler differential equation which x_0 and y_0 (= x_0') must satisfy:

(d/dt) F₃(t,x_0(t),y_0(t)) = F₂(t,x_0(t),y_0(t)).
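To illustrate, the following symbolic sketch (not part of the original text) checks the Euler equation for the minimal-surface integrand F(t,x,y) = x√(1+y²), whose extremals are the catenaries mentioned at the start of this section.

    # Check that x0(t) = cosh(t) satisfies (d/dt)F3 = F2 for F = x*sqrt(1+y^2).
    import sympy as sp

    t = sp.symbols('t', real=True)
    x, y = sp.symbols('x y')
    F = x*sp.sqrt(1 + y**2)

    x0 = sp.cosh(t)                  # candidate extremal
    y0 = sp.diff(x0, t)              # y0 = x0'

    F2 = sp.diff(F, x).subs({x: x0, y: y0})
    F3 = sp.diff(F, y).subs({x: x0, y: y0})
    print(sp.simplify(sp.diff(F3, t) - F2))   # prints 0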

§18. Conjugate Functions and Subdifferentials

We consider next a few relationships which depend on both the

conjugate operation and the subdifferential mapping. The first re-

sult does not depend on convexity and follows immediately from the

definitions.

a) Lemma. If X is a real lcs and f: X → (-∞,+∞] then y_0 ∈ ∂f(x_0) ⟺ f(x_0) + f*(y_0) = ⟨x_0,y_0⟩ (equality in Young's inequality).

b) Lemma. Let f, X be as in a). If f is subdifferentiable at x_0 ∈ X then f(x_0) = f**(x_0).

Proof. Choose any φ ∈ ∂f(x_0) and define a continuous affine function h by

h(x) = φ(x-x_0) + f(x_0).

We have h ≤ f pointwise on X and h(x_0) = f(x_0). Hence h ≤ f** ≤ f (recall Exercise 24), qed.

Corollary. y_0 ∈ ∂f(x_0) ⟹ x_0 ∈ ∂f*(y_0).

c) It follows from a) and b) that if f is subdifferentiable at x_0 ∈ X, then so is the lsc convex function f**, and furthermore

∂f(x_0) = ∂f**(x_0).

This provides some of the interest in the functions in Γ(X), since we may assume that f = f** ∈ Γ(X) for the purpose of finding subgradients. If f is already in Conv (X), then f** differs from f (if at all) only at certain relative boundary points of dom (f) where its values may be strictly smaller than the corresponding values of f. In fact, in this case, we have the formula

f**(x_0) = lim inf_{x→x_0} f(x),

valid ∀x_0 ∈ X, if f has a point of continuity.

d) Consider now an abstract convex program (X,f) (11a)). We already have studied the solvability condition θ ∈ ∂f(x_0), necessary and sufficient for x_0 ∈ X to be a solution. Now by definition f*(θ) = -inf f(X), so that f is bounded below on X exactly when θ ∈ dom (f*). Next we observe from a) that

∂f*(θ) = {x ∈ X: 0 = f*(θ) + f**(x)}
       = {x ∈ X: f**(x) = inf f(X)}.

Thus for f ∈ Γ(X), the set of solutions to the program (X,f) is just ∂f*(θ); in particular, the existence of a solution is equivalent to the subdifferentiability of f* at θ. Recalling Exercise 18, we see additionally that the existence of a unique solution is equivalent to the existence of the gradient ∇f*(θ) in X, this vector being then the unique solution to the program.

Especially when X is finite dimensional a much more detailed study of the minimum of a convex function is possible - see [70, §27].

e) Theorem. Let X be a real lcs, and f ∈ Conv (X). For any x_0 ∈ X,

f'(x_0;·)* = δ_{∂f(x_0)}.

Proof. For fixed t > 0 let

F_t(x) = (f(x_0 + tx) - f(x_0))/t.

Then

F_t*(y) = (f(x_0) + f*(y) - ⟨x_0,y⟩)/t,

by 14b), and F_t* ≥ 0. Consequently,

f'(x_0;·)* = (inf_t F_t)*
           = sup_t F_t*
           = sup_t (f(x_0) + f*(y) - ⟨x_0,y⟩)/t
           = δ_{∂f(x_0)},

where we have used 7a), 14b), and a) above.

Corollary. c-o (f'(x_0;·)) = f'(x_0;·)** = δ_{∂f(x_0)}*, the support function of ∂f(x_0) (14c)).

Suppose now that f is also continuous at x_0. Then according to Exercise 17, f'(x_0;·) is then continuous on X, and so

f'(x_0;·) = δ_{∂f(x_0)}*;

in this way we obtain a new proof via conjugate functions of the Moreau-Pshenichnii Theorem 10b).

§19. Distance Functions

a) To further illustrate the use of conjugate functions and the formulas of Sections 14 and 15, we study a very important example of a convex function - the distance to a convex set. Let K be a subset of a nls X. The function

x ↦ d(x,K) ≡ dist (x,K)

can be represented by 14h) as the convolution

d(·,K) = ||·|| ⊕ δ_K.

Hence if K is convex then so is d(·,K); it is easy to see that the converse is also valid. Of course, for any non-empty K ⊆ X, d(·,K) is (Lipschitz) continuous on X.

b) Theorem. (Duality Formula for Distance) Let θ ∈ K, a convex subset of a nls X. Then

(1) d(x,K) = max {⟨x,y⟩ - p_{K°}(y): y ∈ U(X*)}.

Proof. Applying successively 14h) and 15a) we find

d(·,K)* = ||·||* + δ_K* = δ_{U(X*)} + p_{K°}.

Then since certainly d(·,K) ∈ Γ(X), d(·,K) = d(·,K)**, whence

d(x,K) = sup {⟨x,y⟩ - δ_{U(X*)}(y) - p_{K°}(y): y ∈ X*}
       = sup {⟨x,y⟩ - p_{K°}(y): y ∈ U(X*)}.

The "sup" here is actually a "max" since U(X*) is w*-compact, and the function y ↦ ⟨x,y⟩ + (-p_{K°}(y)) is the sum of a w*-continuous and a w*-usc function, hence is w*-usc.

Note that if θ ∉ K we still obtain a duality formula, if p_{K°} is replaced by the support function of K. This also shows that if d(x_0,K) is attained (i.e., if ∃ z_0 ∈ K such that ||x_0 - z_0|| = d(x_0,K)), then since the "max" in (1) is attained only at points in S(X*), we have ⟨x_0 - z_0,y_0⟩ = ||x_0 - z_0||, for any y_0 where the "max" is attained. Hence any such y_0 is a subgradient of the norm at x_0 - z_0.

Corollary. If M is a linear subspace of X then

(2) d(x,M) = max {⟨x,y⟩: y ∈ U(M⊥)}.

c) Remark. In the preceding corollary the "max" may be replaced by a "sup" over ext U(M⊥) (5d)). This extreme point set is frequently much smaller than the entire unit ball, so that such a replacement may be a considerable simplification. This theme is developed at some length in [8].

Exercise 30. (Buck, Golomb) Let M be the subspace of separable functions in C_R([0,1]×[0,1]); that is, M = {z: z(s,t) = x(s) + y(t), x,y ∈ C_R([0,1])}. Let x_0(s,t) = st. Compute d(x_0,M).

d) Lemma. For 1 < p < +∞, let f = (1/p) d(·,K)^p, where K is a convex subset of a nls. Then f* = (1/q)||·||^q + σ_K, where σ_K is the support function of K and 1/p + 1/q = 1.

The proof follows from 14c) and 14h).

e) We study now the smoothness of the distance function when K is a convex subset of Hilbert space. To do so we must anticipate a result and some terminology from Part III. The result needed is a characterization of best approximation from convex sets in Hilbert space. Although not proven until §22, it is a consequence of the optimality criterion in 11d). We let P_K be the metric projection (see §32) onto the closed convex set K.

Theorem. Let K be a closed convex subset of a Hilbert space X, and let f = ½d(·,K)². Then f is a smooth convex function on X and

(3) ∇f = I - P_K.

Proof. Applying the lemma in d) (p = 2), and Young's

inequality (14b) we obtain

(4) f(x) >_ < x , y > - } [ [ y ] l 2 - oK(y),

for every x,y ~ X. (We a r e u s i n g the self-duality of Hilbert space

and < .,. > is the inner product.) Fix z ~ X and l e t

y = Z-PK(Z ) . Using the characterization of best approximations from

K (22d)), namely

o !<z-PK(z),P~(z)-Y>, y ~ K,

we have

°KEY) = ~up < K,y > =


sup < K,z-pK(z) > = < P~Cz),z-PK(Z) 3 ,

whence by ( 4 ) ,

f(x) >_ < X - P K ( Z ) , Z - P K ( z ) > - f ( z ) .

Therefore,

f(x)-f(z) >_ <~-p~Cz),z-pK(z) ~


(s)

-[[Z-pK(z)ll 2 = < x - z , z-pK(z) > .

Thus

o ! f(x)-f(z)- (~-z, ~-v~(~)>

< ix-z, z-PKcx)) - < x z, z-PKcz)3


= <x-z, (X-PK(X))-(Z-PK(Z)) >

llx-zll 11 <~-PK)<x-z)l I
211x-zll 2,
65

qed. (The second inequality here results from (5) by interchanging

x and z; the final inequality depends on the fact that PK is a

contraction (32a)) .

Remarks. 1) The formula (3) shows that f is actually continuously Frechet differentiable on X. It is interesting that this conclusion does not depend on any smoothness properties of the boundary of K.

2) The function f is not generally twice differentiable; there are trivial counterexamples. Thus higher order differentiability of f does depend on the boundary smoothness of K.

3) Of course the theorem implies that the distance function itself is smooth; indeed, its gradient vector at x ∈ X\K is (x - P_K(x))/||x - P_K(x)||.

4) Smoothness of d(·,K) on the open set X\K can be established in certain Banach spaces X. It is required that the norm in X be smooth and that P_K be (single valued and) continuous [Holmes-Kripke; unpublished].

Exercise 31. Verify the assertions in Remarks 2) and 3).
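A quick finite-difference sketch of the theorem in e) (not part of the original text), taking K to be the closed unit ball of R², for which P_K(x) = x/max(1,||x||):

    import numpy as np

    P = lambda x: x / max(1.0, np.linalg.norm(x))              # metric projection onto K
    f = lambda x: 0.5 * max(np.linalg.norm(x) - 1.0, 0.0)**2   # (1/2) d(x,K)^2

    x = np.array([2.0, -1.0])
    h = 1e-6
    num_grad = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(2)])
    print(num_grad)     # approximately (1.1056, -0.5528)
    print(x - P(x))     # the same vector, as (3) predicts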

§20. The Fenchel Duality Theorem

This theorem associates with a given convex program a "dual"

program wherein a concave function is to be maximized. Before

making precise the form of the dual programs we take time to rethink

the geometric interpretation of conjugate functions.

a) Let X be a real lcs and f ∈ Conv (X). As was observed in 8b), linear functionals on X × R¹ can be identified with pairs (φ,s) ∈ X* × R¹. The corresponding hyperplanes are called non-vertical provided s ≠ 0; if so, they each intersect the R¹-axis at exactly one point. Since we can assume that s = -1, this point of intersection is (θ,-λ) if the hyperplane has the form H_λ ≡ {(x,t) ∈ X × R¹: ⟨x,y⟩ - t = λ}, for some y ∈ X*.

Now suppose λ > f*(y). Then ∀x ∈ X, λ > ⟨x,y⟩ - f(x), or f(x) > ⟨x,y⟩ - λ; in other words the hyperplane H_λ lies strictly "below" epi (f). Similarly, if λ < f*(y) then H_λ intersects epi (f) strictly "above" some point (x,f(x)). Thus when λ = f*(y), H_λ is trying to be a supporting hyperplane to epi (f), although the two sets will intersect only if the "sup" that defines f*(y) is actually attained, which happens exactly when y is in the range of ∂f. In any event the "vertical height" of this hyperplane above the origin (θ,0) is -f*(y).

b) Next suppose that -g ∈ Conv (X). In this case we will say that g is a proper concave function on X and write g ∈ Conc (X). The theory of such functions is of course a mirror image of the previously developed theory of proper convex functions. Thus we define

dom (g) = {x ∈ X: -∞ < g(x)},
epi (g) = {(x,t) ∈ X × R¹: t ≤ g(x)}.

If we consider again the hyperplanes H_λ in X × R¹, and define an (extended) real number γ by requiring that for -λ > γ, H_λ lies "above" epi (g), while for -λ < γ, H_λ intersects "below" some x, then evidently γ = -inf {⟨x,y⟩ - g(x): x ∈ dom (g)}.

Definition. If g: X → [-∞,+∞), the (concave) conjugate of g is the function g⁺: X* → [-∞,+∞) defined by

g⁺(y) = inf {⟨x,y⟩ - g(x): x ∈ dom (g)}.

(Again we assume dom (g) ≠ ∅.)

Analogously to 14d) we have that g⁺ is a w*-usc concave function on X*. If h is any real-valued function on X, then

h⁺(y) = -(-h)*(-y).

Hence even when h ∈ Conv (X) ∩ Conc (X), i.e., when h is affine, h⁺ ≠ h*.

With the definition of g⁺ (g ∈ Conc (X)) we see that H_λ is "tangent" to epi (g) (that is, neither intersecting epi (g) strictly "below" some point (x,g(x)) nor lying strictly "above" epi (g)) exactly when λ = g⁺(y), and then the "vertical height" of this hyperplane over the origin is -g⁺(y).

c) We now consider a convex program of the form (X, f-g), where f, -g ∈ Conv (X). Such programs are not as special as they might at first appear, and we will discuss several examples shortly. For now, note that ∀x ∈ X, ∀y ∈ X*,

f(x) + f*(y) ≥ ⟨x,y⟩ ≥ g(x) + g⁺(y),

so

f(x) - g(x) ≥ g⁺(y) - f*(y),

that is,

(1) inf (f-g)(X) ≥ sup (g⁺-f*)(X*).

It is helpful to view (1) geometrically by considering epigraphs and hyperplanes in X × R¹. The inequality asserts that the value of the program (X, f-g) (the left hand side of (1), which can be thought of as the minimal vertical distance between epi (f) and epi (g)) is at least as large as the value of the concave program (X*, g⁺-f*) (the right hand side of (1), which, by the analysis of a) and b), can be interpreted as the maximum vertical separation of two parallel hyperplanes tangent to the two epigraphs).

Theorem. (Fenchel, Rockafellar) Let f, -g ∈ Conv (X) and assume that one of them is continuous at some point in dom (f) ∩ dom (g). Then

(2) inf (f-g)(X) = max (g⁺-f*)(X*).

Proof. Let f be continuous at x_0 ∈ dom (f) ∩ dom (g). Then x_0 ∈ int (dom (f)) and +∞ > f(x_0) - g(x_0) ≥ inf (f-g)(X) ≡ α. The theorem is clearly true if α = -∞, by (1). So we may assume α is finite. Introducing the sets

A = {(x,t) ∈ X × R¹: x ∈ int (dom (f)), t > f(x)},
B = {(x,t) ∈ X × R¹: t ≤ g(x) + α},

we note that they are convex and disjoint, and A is open. Hence ∃ a hyperplane H_λ separating A and B. H_λ cannot be vertical, for otherwise its projection onto X would separate the projections of A and B, viz. dom (f) and dom (g), and this would contradict the existence of x_0. With H_λ having the form {(x,t) ∈ X × R¹: ⟨x,y⟩ - t = λ} as in a), we can assume that t > f(x) ⟹ ⟨x,y⟩ - t ≤ λ. Thus

(3) ⟨x,y⟩ - λ ≤ f(x)

is valid throughout int (dom (f)), and it clearly holds outside dom (f). But (3) is also valid if x is a boundary point of dom (f), since then by 3c), tx_0 + (1-t)x ∈ int (dom (f)) for 0 < t ≤ 1, so (3) implies

⟨tx_0 + (1-t)x, y⟩ - λ ≤ f(tx_0 + (1-t)x) ≤ tf(x_0) + (1-t)f(x),

and we may let t → 0+. Consequently, f*(y) ≤ λ, and in a similar way we see that

⟨x,y⟩ - λ ≥ g(x) + α, ∀x ∈ X,

or α + λ ≤ g⁺(y). Therefore,

α ≤ g⁺(y) - λ ≤ g⁺(y) - f*(y) ≤ sup (g⁺-f*)(X*) ≤ inf (f-g)(X) = α,

qed.

d) We now want to consider when the "inf" in (2) is actually attained.

Definition. Let g ∈ Conc (X) and x_0 ∈ X. Any y ∈ X* for which

⟨x - x_0, y⟩ ≥ g(x) - g(x_0), ∀x ∈ X,

is called a supergradient of g at x_0.

The set of all supergradients of g at x_0 is written ∂g(x_0), and we have y ∈ ∂g(x_0) if and only if g(x_0) + g⁺(y) = ⟨x_0,y⟩.

Corollary. Assume that f and -g in Conv (X) satisfy equation (2). Then f - g attains its infimum over X at x_0 if and only if ∂f(x_0) ∩ ∂g(x_0) ≠ ∅. Points in this intersection are then exactly the points where g⁺ - f* attains its supremum over X*.

Proof. y_0 ∈ ∂f(x_0) ∩ ∂g(x_0) ⟺ f(x_0) + f*(y_0) ≤ ⟨x_0,y_0⟩ ≤ g(x_0) + g⁺(y_0) ⟺ f(x_0) - g(x_0) ≤ g⁺(y_0) - f*(y_0). Now use equation (2).
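A small numerical sketch of (2) (not part of the original text), for f(x) = e^x and g(x) = -x² on X = R¹, where g⁺(y) = inf_x (xy + x²) = -y²/4 and f* is as in 14c-Example 4):

    import numpy as np

    xs = np.linspace(-10.0, 10.0, 400001)
    primal = np.min(np.exp(xs) + xs**2)          # inf (f - g)

    ys = np.linspace(1e-9, 10.0, 400001)         # f*(y) = +inf for y < 0
    fstar = ys*(np.log(ys) - 1.0)
    gplus = -ys**2 / 4.0
    dual = np.max(gplus - fstar)

    print(primal, dual)                          # both approximately 0.827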



§21. Some Applications

a) As a first application of the Duality Theorem we establish

the promised improvement of the theorem in 14i).

Theorem. Let f_1,...,f_n ∈ Conv (X) where X is a real lcs. Suppose that f_2,...,f_n are continuous at a point in dom (f_1) ∩ int (dom (f_2)) ∩ ... ∩ int (dom (f_n)). Then

(∑f_i)* = ⊕f_i*.

Proof. An obvious induction will establish the general case after we attend to the case n = 2. In 20c) put f ≡ f_2 and g ≡ ⟨·,y_0⟩ - f_1, for any fixed y_0 ∈ X*. Then g⁺(y) = -f_1*(y_0 - y), and so the Duality Theorem implies

-(f_1 + f_2)*(y_0) = inf {f_1(x) + f_2(x) - ⟨x,y_0⟩: x ∈ X}
                  = max (g⁺-f*)(X*)
                  = max {-f_1*(y_0 - y) - f_2*(y): y ∈ X*}
                  = -(f_1* ⊕ f_2*)(y_0),

qed.

b) The standard convex program involving the minimization of f ∈ Conv (X) over a constraint set K can be cast in the form (X, f-g), to which the Duality Theorem is applicable, by setting g ≡ -δ_K. The original program can then be replaced by the dual concave (maximization) program (X*, g⁺-f*), provided the hypothesis of 20c) is satisfied. Of course, the interest in doing so depends on the ease with which f* and g⁺ can be calculated and the simplicity of the resultant dual program.



c) Example. Let K be a closed convex cone at θ in a real lcs X, and f ∈ Γ(X) with f continuous at some point in K. Then

inf f(K) = -min f*(-K°).

Notice that if K is a linear subspace of X, this formula reduces to

inf f(K) = -min f*(K⊥),

and if also f is the function ||x-z||, we obtain the duality formula (2) in 19b).

Exercise 32. Verify these assertions.

d) Example. Let X = Rⁿ, c ∈ X, and let A be a (real) m × n matrix. Consider the standard linear program

max {⟨x,c⟩: A·x = b, x ≥ θ}

for some given b ∈ Rᵐ. We view this as a concave program and use the Duality Theorem to construct the dual convex program. Let g_i(t) = c_it - δ_{P_1}(t), where P_1 ≡ {t ∈ R¹: t ≥ 0}. Then

g_i⁺(s) = inf {t(s - c_i): t ≥ 0}
        = 0 if s ≥ c_i; -∞ if s < c_i.

Hence for g(x) = ⟨x,c⟩ - δ_{P_n}(x), where P_n = {x ∈ Rⁿ: x ≥ θ}, we have

g⁺(x) = ∑_{i=1}^n g_i⁺(x_i)
      = 0 if x_i ≥ c_i ∀i; -∞ otherwise.

Now let K ≡ {x: A·x = b} and f ≡ δ_K. Then f* = σ_K, and

dom (f*) = nullspace (A)⊥ = range (A*) = row space of A.

So f*(y) < +∞ if and only if y is a linear combination of the rows A_i of A: y = z_1A_1 + ... + z_mA_m. In this case the functional ⟨·,y⟩ is constant on K, and this constant value is ∑z_i⟨x,A_i⟩ = ∑z_ib_i (x ∈ K). We now see that the dual program has the form "minimize f* - g⁺", or

min {⟨z,b⟩: A*·z ≥ c}.

Since the z-variable in Rᵐ enters so naturally, the dual program is always considered to be defined on Rᵐ rather than (Rⁿ)* = Rⁿ. (If the original constraint had been of the form Ax ≤ b, x ≥ θ, then the dual constraints would have turned out as A*·z ≥ c, z ≥ θ.)

Suppose that a solution z_0 has been obtained to the dual program. We then put y_0 = A*·z_0 and obtain (in principle) a solution x_0 to the original linear program from the requirements x_0 ∈ ∂f*(y_0) ∩ ∂g⁺(y_0), deduced in 20d). In practice, since the computational difficulty involved in solving a linear program depends more on the number of constraints (not counting non-negativity constraints) than on the number of variables, it tends to be more efficient to directly solve the dual program whenever m > n.



e) Example. Let X be a real Banach space, A ∈ L(X,Rⁿ), c ∈ Rⁿ. The problem of finding an element of minimal norm in the flat A⁻¹(c) will be called an abstract minimum effort control problem. This problem is discussed at length in the book [60], and is considered for illustrative purposes in the book [42]; cf. also the following example f).

Suppose that X is a Hilbert space. Then there is a unique solution and the subdifferential theory locates it as the point of intersection of A⁻¹(c) and (A⁻¹(θ))⊥. To proceed via the Duality Theorem, let f = ½||·||², K = A⁻¹(c) and g = -δ_K. Then dom (g⁺) = (A⁻¹(θ))⊥ = range (A*), and so g⁺(y) = inf ⟨K,y⟩ > -∞ if and only if y = A*(e) for some e ∈ Rⁿ, and then g⁺(y) = ⟨c,e⟩. Thus the dual program "maximize g⁺ - f*" becomes the finite dimensional (unconstrained) problem

(1) max {⟨c,e⟩ - ½||A*(e)||²: e ∈ Rⁿ}.

e_0 ∈ Rⁿ is a solution if and only if the gradient of the function in (1) vanishes at e_0, and this condition requires

⟨c,v⟩ - ⟨A*(e_0),A*(v)⟩ = 0 ∀v ∈ Rⁿ, that is,

(2) AA*(e_0) = c.

(If, more generally, X is a reflexive and rotund Banach space (see §27), the condition on e_0 is that A(∇f*(A*(e_0))) = c.)

Having solved (2) for e_0 we obtain a solution y_0 of the dual problem by y_0 = A*(e_0). However, y_0 is also a solution of the original problem by 20d), since y_0 ∈ K so that g(y_0) = 0, and therefore

||y_0||² = ⟨A*(e_0),A*(e_0)⟩ = ⟨AA*(e_0),e_0⟩ = ⟨c,e_0⟩;

that is,

f(y_0) - g(y_0) = ½||y_0||² = ⟨c,e_0⟩ - ½||y_0||² = g⁺(y_0) - f*(y_0).

Remark. If A is written in the form

(3) A(x) = ∑_{i=1}^n ⟨x,u_i⟩ e_i,

then

A*(e) = ∑_{i=1}^n ⟨e,e_i⟩ u_i.

Hence AA* is the Gram matrix [⟨u_i,u_j⟩]. Assuming that range (A) = Rⁿ, the set {u_1,...,u_n} is linearly independent so that AA* is invertible and hence

y_0 = A*(AA*)⁻¹(c) = A†(c),

where A† is the pseudoinverse of A (see §35).
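A finite-dimensional numerical sketch of this remark (not part of the original text), with X = R⁵ standing in for the Hilbert space:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))       # A in L(X, R^3), surjective a.s.
    c = rng.standard_normal(3)

    e0 = np.linalg.solve(A @ A.T, c)      # the Gram system AA* e0 = c, i.e. (2)
    y0 = A.T @ e0                         # minimal-norm element of A^{-1}(c)

    print(np.allclose(A @ y0, c))                    # True: y0 lies in the flat
    print(np.allclose(y0, np.linalg.pinv(A) @ c))    # True: y0 = pseudoinverse solution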

f) Example (cont.) A linear dynamical system is governed by a set of ordinary differential equations

(4) x'(t) = Fx(t) + Bu(t),

where x: [0,T] → Rⁿ, u: [0,T] → Rᵐ, and the matrices F, B may be functions of t. We assume that x(0) = θ and try to choose a control u so as to transfer the state of the system to c ∈ Rⁿ at time T (i.e., x(T) = c) with minimum expenditure of energy. The latter is taken proportional to

∫₀ᵀ ||u(t)||² dt.

(Note that no magnitude constraints are being imposed on the control u.)

Let X = L²(dt)ᵐ, where dt denotes Lebesgue measure on [0,T]. We will define an operator A ∈ L(X,Rⁿ). Let Φ be the transition matrix of the system (4). Then define

A(u) = x(T) = Φ(T) ∫₀ᵀ Φ⁻¹(t)B(t)u(t) dt.

(Recall that if F is a matrix of constants, then Φ(t) = exp (Ft).) We have now put this dynamical problem in the form of the abstract model considered in the preceding example.

Let [w_ij(t)] ≡ W(t) ≡ Φ(T)Φ⁻¹(t)B(t). Then

A(u) = ∑_{i=1}^n (∑_{j=1}^m ∫₀ᵀ w_ij(t)u_j(t) dt) e_i = ∑_{i=1}^n ⟨u,W_i⟩ e_i,

where ⟨·,·⟩ is the inner product in X and W_i is the ith row of W. Hence we see that the matrix

AA* = ∫₀ᵀ W(t)W*(t) dt.
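A concrete numerical instance (not part of the original text), assuming the double-integrator system x_1' = x_2, x_2' = u, for which W(t) = Φ(T-t)B = (T-t, 1)ᵀ and the Gram matrix integrates in closed form:

    import numpy as np

    T = 1.0
    c = np.array([1.0, 0.0])                  # desired state x(T)

    G = np.array([[T**3/3, T**2/2],           # AA* = integral of W(t) W(t)^T dt
                  [T**2/2, T     ]])
    e0 = np.linalg.solve(G, c)                # equation (2): AA* e0 = c
    u0 = lambda t: (T - t)*e0[0] + e0[1]      # minimum-energy control u0 = A*(e0)

    # Verify x(T) = c by a Riemann sum for the integral of W(t) u0(t) dt:
    ts = np.linspace(0.0, T, 200001)
    dt = ts[1] - ts[0]
    W = np.vstack([T - ts, np.ones_like(ts)])
    print((W * u0(ts)).sum(axis=1) * dt)      # approximately (1, 0)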

g) Remark. Further discussion and examples of the Fenchel

Duality Theorem occur in [42] and [70].


Part III

Theory of Best Approximation

We turn now to a consideration of a special class of convex

programs, namely those in which the objective function is the distance function defined by a convex subset of a nls - see §19. A

fair amount of the material in this part can also be found in the

books of Cheney [9] and Singer [72] (although most of that treatment

considers only approximation from linear subspaces); the latter work

in particular contains a great deal of additional information on

approximation theory, all developed within the framework of functional

analysis.

§22. Characterization of Best Approximations

a) Definition. Let K be a convex subset of a nls X and x ∈ X \ cl (K). x_0 is a best approximation (b.a.) to x from K if it is a solution of the convex program (X, f + δ_K), where f(z) ≡ ||x-z||.

Thus x_0 is simply an element of K of least distance to x: ||x-x_0|| = d(x,K) (≡ the value of the above program). The next

theorem is the main result characterizing best approximations; a

sharper version will be given later (23f)) for the finite dimensional

case.

b) Theorem (Garkavi; Deutsch-Maserick). x_0 is a b.a. to x from K if and only if ∃ φ ∈ S(X*) such that φ(x-x_0) = ||x-x_0|| and re φ(x_0) = max re φ(K).

Proof. If such a φ exists and y ∈ K then

||x-x_0|| = φ(x-x_0) = re φ(x-x_0)
         = re φ(x) - re φ(x_0) ≤ re φ(x) - re φ(y)
         = re φ(x-y) ≤ |φ(x-y)| ≤ ||x-y||.

Conversely, if x_0 is a b.a. to x from K, and we put f(z) ≡ ||x-z||, then by 11d) ∃ ψ ∈ ∂f(x_0) such that ψ(x_0) = min ψ(K). (If X is complex, apply the following argument to X_r and extend the resultant φ from X_r* to X* as usual.) In particular, for any y ∈ X,

(1) ψ(y) = ψ(x_0+y) - ψ(x_0) ≤ ||x-(x_0+y)|| - ||x-x_0|| ≤ ||y||,

so ||ψ|| ≤ 1. Similarly, ψ(x-x_0) ≤ -||x-x_0||. Let y = x_0-x in (1) to get

ψ(x_0-x) ≤ ||x-(2x_0-x)|| - ||x-x_0|| ≤ ||x-x_0||,

so that ψ(x_0-x) = ||x-x_0||, and we may take φ = -ψ, qed.

Geometrically this theorem says that x_0 is a b.a. to x if and only if there is a (real, closed) hyperplane H supporting K at x_0, separating K from x, and such that d(x,H) = d(x,K). Also, note that if K is a linear subspace, then φ must belong to K⊥.

c) Example. Let X = L^p(μ), 1 < p < ∞, and let K be a convex subset of X. Then x_0 is a b.a. to x from K if and only if

re ∫ (x_0-z) w̄ |w|^{p-2} dμ ≥ 0, where w ≡ x - x_0,

for every z ∈ K. When K is a linear subspace of X, the criterion is simply that

w̄ |w|^{p-2} ∈ K⊥ ⊆ L^q(μ).

d) Example. Let X be an inner product space, and K a convex subset of X. Then x_0 is a b.a. to x from K if and only if

re ⟨x_0-x, z-x_0⟩ ≥ 0,

for every z ∈ K. This results directly from b), since the functional φ there is now given by y ↦ re ⟨y, (x-x_0)/||x-x_0||⟩. Of course when K is a linear subspace we recover the usual criterion that (x-x_0) ⊥ K.
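For instance (a numerical sketch, not part of the original text), projecting onto the halfplane K = {z ∈ R²: z_2 ≤ 0} gives x_0 = (x_1, min(x_2,0)), and the inequality above can be sampled directly:

    import numpy as np

    x = np.array([0.7, 1.3])
    x0 = np.array([x[0], min(x[1], 0.0)])     # best approximation to x from K

    rng = np.random.default_rng(2)
    zs = rng.standard_normal((1000, 2))
    zs[:, 1] = -np.abs(zs[:, 1])              # sample points of K
    print(np.all((zs - x0) @ (x0 - x) >= -1e-12))   # True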

e) Example. Let X = C_R(Ω). We say that t ∈ Ω is a peak point of x ∈ X (written t ∈ P(x)) if |x(t)| = ||x||. We say that μ ∈ X* has the same sign as x if

∫_E x dμ ≥ 0,

for all Borel subsets E of Ω. The positive, negative, and total variations of μ are denoted by μ⁺, μ⁻, and |μ|, respectively.

Theorem. x_0 is a b.a. to x from a convex subset K of X if and only if there is a non-zero μ ∈ X* such that

∫ (x_0-z) dμ ≥ 0, z ∈ K;
μ has the same sign as x-x_0;
support (μ) ⊆ P(x-x_0).

The proof is an easy consequence of b) and the next lemma.

Lemma. Let x ∈ C_R(Ω), μ ∈ rca (Ω).

1) μ has the same sign as x if and only if

(2) ∫ x dμ = ∫ |x| d|μ|.

2) support (μ) ⊆ P(x) if and only if

(3) ∫ |x| d|μ| = ||x|| ||μ||.
Proof. 1) If (2) holds and μ does not have the same sign as x, then ∃ Borel set E ⊆ Ω such that

∫_E x dμ < 0.

Therefore,

∫ x dμ < ∫_{Ω\E} x dμ ≤ ∫ |x| d|μ|,

a contradiction. Conversely, assume that μ has the same sign as x. Let Ω = A ∪ B be a Hahn decomposition of Ω for μ. Then

∫ x dμ = ∫ x dμ⁺ - ∫ x dμ⁻
       = ∫_A x dμ⁺ - ∫_B x dμ⁻
       = ∫_A (x⁺-x⁻) dμ⁺ - ∫_B (x⁺-x⁻) dμ⁻.

(Here x⁺ is the positive part of x, etc.) Now

∫_A x⁻ dμ⁺ = 0 = ∫_B x⁺ dμ⁻

(check!), and so
(check!), and so
~:~ 0
/
c+ C~

fO

0
II

O
b~ D ~

A IA II -- ,g. II
II ~" , II II
IA ~ ~l~ ~ -
-- f
x --¢ -~
x
O ~ + II
+
03 ~
Z" v + i
f +
/ -- O h +
+ / +
n t1~~¸, + +
"~ N m
0 + +
,+ +
-2 ~ Im
I[ el - - ~:a
O II i
x C v ~ 4-
n2~ t~ + "E
- - N l
II
- - I
~...FI
C / --
b< i

r+ /

o .o
l_.a In.

/ o ~ N
r+ ~-'

t~ V
t:r
81

= IIxll I.IC ),

a contradiction, qed.

Exercise 33. Show that

(4) ∫ x dμ = ||x|| ||μ||

if and only if μ has the same sign as x and support (μ) ⊆ P(x), if and only if support (μ⁺) (resp. support (μ⁻)) is contained in {t ∈ Ω: x(t) = ||x||} (resp. {t ∈ Ω: x(t) = -||x||}).

Exercise 34. Take Ω = [0,1] and let μ be absolutely continuous wrt Lebesgue measure. Then μ has the same sign as x ∈ C_R([0,1]) if and only if the Radon-Nikodym derivative dμ/dt has the same sign almost everywhere (as a function) as x.

§23. Extremal Representations

In this section we consider some applications of the extreme point concept of §5 to the representation of linear functionals and

the implications of this for finite dimensional best approximation.

This abstract theory is then illustrated with applications to spaces

of continuous functions.

a) Lemma. (Carathéodory) Let A ⊆ X, an n-dimensional ls. If x ∈ co (A), then x is a convex combination of at most n + 1 elements of A (resp., at most 2n + 1 elements of A if the scalars are complex).

The proof of this well-known result is omitted here; it may be found in [9, p. 17], [44, p. 43], or [70, p. 155]. A particular consequence is that co (A) is compact whenever A is, a fact which is generally not valid in infinite dimensional spaces.



b) Lemma. Let K be a compact convex subset of an n-dimensional ls X. Then each boundary (resp. interior) point of K is a convex combination of at most n (resp. n + 1) points of ext(K). (If the scalars are complex these numbers are to be replaced by 2n and 2n + 1, resp.) In particular, K = co(ext(K)).

Proof. It will suffice to assume real scalars. We proceed by induction on the dimension d of K, the case d = 1 being trivial. Assume the lemma true for d ≤ m−1 and let d = m; we may also assume that θ ∈ K. If M ≡ span(K), then K has non-empty interior wrt M, namely rel-int(K), and it is convex by 3c). Let x be a relative boundary point of K; then by the Support Theorem (3f)) there is a hyperplane H in M supporting K at x. The set H ∩ K is compact, convex, K-extremal, and of dimension at most m − 1. By the induction hypothesis x is a convex combination of at most (m−1) + 1 = m points in ext(H ∩ K). But ext(H ∩ K) ⊆ ext(K) by 5c-3). Finally, if x ∈ rel-int(K), choose any z ∈ ext(K) and extend the line segment [z,x] until it meets the relative boundary of K at some y. Then x ∈ co(y,z) and we can apply what has just been proven to y.

c) Lemma. (Singer) Let M be a linear subspace of a nls X, and φ ∈ ext(U(M*)). Then there exists an extension of φ to all of X which belongs to ext(U(X*)).

Proof. Exercise 35.

d) Theorem. (Interpolation Formula for Linear Functionals) Let M be an n-dimensional linear subspace of a nls X and let φ ∈ S(M*). Then ∃ {φ_1,...,φ_m} ⊆ ext(U(X*)) and λ_1,...,λ_m > 0, with λ_1 +···+ λ_m = 1, such that

(1) φ = Σ_{j=1}^m λ_j φ_j |_M.

Here m ≤ n (real scalars) or m ≤ 2n−1 (complex scalars).

Proof. By c) we may assume that M = X. In the real case the result follows directly from b) with K = U(X*). Consider now that the scalars are complex; applying b) would give us a representation of the form (1), but we could only be sure that m ≤ 2n. The remaining argument allows us to reduce this bound to 2n−1.

Let K = ker(φ); then dim(K) = n−1, so dim(K_r) = 2n−2. Choose x_o ∈ S(X) such that φ(x_o) = 1 (= ||φ||). Let Y = real span(x_o, K_r) and define ψ ∈ S(Y*) by ψ = re φ|_Y. Apply b) to get {ψ_1,...,ψ_m} ⊆ ext(U(Y*)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1, with m ≤ dim(Y) = 2n−1, so that

ψ = Σ_{j=1}^m λ_j ψ_j.

Apply c) to extend each ψ_j to φ_j ∈ ext(U(X_r*)), put

σ = Σ_{j=1}^m λ_j φ_j,

and define Φ ∈ X* as the usual extension of σ from X_r* to X*. Thus

Φ = Σ_{j=1}^m λ_j Φ_j,

where Φ_j(x) = φ_j(x) − i φ_j(ix) (i = √−1), and each Φ_j is in ext(U(X*)). Now we have that ker(Φ) = ker(φ) and ||Φ|| = ||φ||, hence Φ = αφ for some scalar α with |α| = 1. We claim that α = 1. Indeed, since |Φ(x_o)| ≤ 1 and re Φ(x_o) = ψ(x_o) = 1, we must have Φ(x_o) = 1 = φ(x_o); that is, α = 1, qed.



e) Corollary. (Zuhovickii, Ptak, Rivlin-Shapiro) Let M be an n-dimensional linear subspace of C(Ω) and φ ∈ M*. Then ∃ {t_1,...,t_m} ⊆ Ω and scalars λ_1,...,λ_m such that

φ(x) = Σ_{j=1}^m λ_j x(t_j)   ∀ x ∈ M,

||φ|| = Σ_{j=1}^m |λ_j|,

sgn(λ_j) = x_o(t_j), 1 ≤ j ≤ m,

for any x_o ∈ S(M) satisfying φ(x_o) = ||φ||. Here m ≤ n (real scalars) or m ≤ 2n−1 (complex scalars).

Proof. Exercise 36.

f) We now reconsider the characterization theorem 22b) under the additional hypothesis that the convex set K lies in an n-dimensional subspace of the nls X. In this case the separating functional φ of 22b) can be written as a convex combination of m extreme points of U(X*), where m ≤ n+1 (real scalars) or m ≤ 2n+1 (complex scalars). This follows from the Interpolation Formula d) applied to the subspace M ≡ span({x,K}) (x as in 22b)), which has dimension ≤ n+1.

If we write this representation of φ as

φ = Σ_{j=1}^m λ_j φ_j,

where λ_j > 0, λ_1 +···+ λ_m = 1, and φ_j ∈ ext(U(X*)), and if x_o ∈ K is a b.a. to x, then we have in addition

φ_j(x − x_o) = ||x − x_o||, ∀j.

For suppose that re φ_j(x − x_o) < ||x − x_o|| for some j ≤ m. Then by 22b),
||x − x_o|| = φ(x − x_o) = Σ_{j=1}^m λ_j φ_j(x − x_o)

= re Σ_{j=1}^m λ_j φ_j(x − x_o) = Σ_{j=1}^m λ_j re φ_j(x − x_o) < ||x − x_o||,

a contradiction. Consequently,

||x − x_o|| = re φ_j(x − x_o) ≤ |φ_j(x − x_o)| ≤ ||x − x_o||,

qed. Since we are tacitly assuming the φ_j to be distinct from one another, this entails their pairwise linear independence.

We sum up the preceding remarks for the following important

special case.

Theorem. Let K be an n-dimensional linear subspace of the nls X. Then x_o ∈ K is a b.a. to x ∈ X \ K if and only if ∃ {φ_1,...,φ_m} ⊆ ext(U(X*)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1 such that

φ_j(x − x_o) = ||x − x_o||, ∀ j,

Σ_{j=1}^m λ_j φ_j ∈ S(K^⊥).

Corollary. (Cheney, Ikebe, Singer) Let K ≡ span({x_1,...,x_n}) be an n-dimensional subspace of the nls X with scalar field F. Then x_o ∈ K is a b.a. to x ∈ X \ K if and only if the origin in F^n belongs to

co({(\overline{φ(x − x_o)} φ(x_1),...,\overline{φ(x − x_o)} φ(x_n)): φ ∈ ext(U(X*)), |φ(x − x_o)| = ||x − x_o||}).

Proof. Exercise 37.

g) Corollary. (Distance Formula) Let K be a convex subset contained in an n-dimensional subspace of a nls X. Assume that K contains a b.a. to x ∈ X \ K (certainly true if K is closed). Then there are m (as in f)) pairwise linearly independent functionals φ_j ∈ ext(U(X*)) such that

d(x,K) = min { max_{1≤j≤m} |φ_j(x − z)|: z ∈ K}.

Proof. Let x_o be the assumed b.a. to x and apply the results in f) to obtain {φ_1,...,φ_m} ⊆ ext(U(X*)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1 such that

φ_j(x − x_o) = ||x − x_o||,

re Σ_{j=1}^m λ_j φ_j(x_o − z) ≥ 0, z ∈ K.

Thus for z ∈ K,

re Σ_{j=1}^m λ_j φ_j(x − z) ≥ re Σ_{j=1}^m λ_j φ_j(x − x_o) = ||x − x_o||,

and therefore,

d(x,K) = ||x − x_o||

≤ inf {|Σ_{j=1}^m λ_j φ_j(x − z)|: z ∈ K}

≤ inf {Σ_{j=1}^m λ_j |φ_j(x − z)|: z ∈ K}

≤ inf { max_{1≤j≤m} |φ_j(x − z)|: z ∈ K}

≤ inf {||x − z||: z ∈ K} = d(x,K).

Setting z = x_o shows that the infimum is actually attained, qed.

h) Example. The usefulness of the foregoing results evidently hinges on our knowledge of the ext(U(X*)). Ideally, this set should be small relative to S(X*), and known in explicit form. The outstanding example of such a space X is C(Ω).

Let K = span({x_1,...,x_n}) be an n-dimensional subspace of C(Ω) and let t̂ denote the n-tuple (x_1(t),...,x_n(t)). For fixed x outside K, let

r = r(λ_1,...,λ_n) ≡ x − Σ_{j=1}^n λ_j x_j

be the error function in the approximation to x by Σλ_jx_j. Then a nas condition that ||r|| achieve a minimum at a particular set of values (λ̄_1,...,λ̄_n) (so that Σλ̄_jx_j is a b.a. to x) is that the origin in n-space belong to co({r(t) t̂: |r(t)| = ||r||}). This conclusion is an immediate consequence of f) and 15c).

i) Example. (Remez, Schnirelman, Zuhovickii) Let x and K be as in h), and m as in f). Then ∃ {t_1,...,t_m} ⊆ Ω such that

d(x,K) = min { max_{1≤j≤m} |x(t_j) − z(t_j)|: z ∈ K}.

This follows immediately from g). The implication is that there is a finite subset {t_1,...,t_m} ⊆ Ω such that the minimum distance from x to K is the same as the minimum distance when all functions are restricted to this subset. Further, among these restrictions at least one of the b.a.'s to x will be a b.a. to x wrt the entire set Ω. These observations underlie the construction of the practical algorithms used for computing best approximations in the spaces C(Ω). These algorithms reduce the original problem to a succession of approximation problems involving functions defined on (judiciously chosen) finite subsets of Ω, and hence to a succession of finite dimensional convex programs.
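When the norm is the uniform norm over a finite subset, each such finite dimensional convex program is an ordinary linear program. The following sketch (Python with scipy; the grid, the function x(t) = e^t, and the subspace span({1,t}) are our own test choices, not taken from the text) computes a discrete uniform b.a. in exactly this way, minimizing d subject to |x(t_i) − Σ_j c_j x_j(t_i)| ≤ d:

    import numpy as np
    from scipy.optimize import linprog

    ts = np.linspace(0.0, 1.0, 201)            # a finite subset of Omega
    x = np.exp(ts)
    Phi = np.vstack([np.ones_like(ts), ts]).T  # basis values, shape (201, 2)

    # variables v = (c_1, c_2, d); minimize d subject to |x - Phi c| <= d
    n = Phi.shape[1]
    cost = np.r_[np.zeros(n), 1.0]
    A_ub = np.block([[ Phi, -np.ones((len(ts), 1))],
                     [-Phi, -np.ones((len(ts), 1))]])
    b_ub = np.r_[x, -x]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n + [(0, None)])
    print("coefficients:", res.x[:n], " discrete distance:", res.x[n])

A Remez-type algorithm then adjusts the finite subset between such solves.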

§24. Application to Gaussian Quadrature

a) Let μ be a positive Borel measure on [−1,1] and f ∈ C_R([−1,1]). Our problem is to guess the value of ∫ f dμ, given m samples f(t_1),...,f(t_m) of f. Confining ourselves to linear estimates, we are led to a quadrature formula:

(1) ∫_{−1}^{1} f dμ ≈ Σ_{j=1}^m A_j f(t_j).

It is clear that the A_j can be chosen so that the approximation (1) is exact whenever f is a polynomial of degree ≤ m−1. Gauss proved that by proper choice of the nodes {t_1,...,t_m} ⊆ [−1,1], the formula (1) becomes exact for all polynomials of degree ≤ 2m−1.

b) Let P_n be the (n+1)-dimensional space of polynomials of degree ≤ n on [−1,1]. Let

φ(x) ≡ ∫_{−1}^{1} x dμ,

so that φ ∈ P_n* and φ(1) = ||φ|| = ||μ||. By 23e), ∃ {t_1,...,t_m} ⊆ [−1,1], where m ≤ n+1, and positive numbers λ_1,...,λ_m, such that ∀ x ∈ P_n,

(2) ∫_{−1}^{1} x dμ = Σ_{j=1}^m λ_j x(t_j),

(3) ||φ|| = Σ_{j=1}^m λ_j.

We are concerned with the size of m in this representation of φ.

Theorem. (Krein, Rivlin-Shapiro) There is a unique representation of φ in the form (2), (3) for which m is minimal. This minimal value of m is 1 + [n/2], assuming that support(μ) contains at least 1 + [n/2] points. Furthermore, when m has this value and n is odd, formula (2) is exactly the Gauss quadrature formula referred to in a).

c) Before proving this theorem let us recall some facts about the Gauss formula. Suppose we apply the Gram-Schmidt orthonormalization procedure to the monomials {1, t, t²,...} in L²(μ). We thereby obtain a complete orthonormal sequence of polynomials {Q_0, Q_1, Q_2,...} in L²(μ). Each root of Q_j is simple and lies in (−1,1).

Example. If μ is Lebesgue measure then

Q_n = √((2n+1)/2) · L_n,

where L_n is the n-th Legendre polynomial:

L_n(t) = (1/(2^n n!)) (d^n/dt^n)(t² − 1)^n.

Theorem. If the quadrature formula (1) is exact on P_m, then it is exact on P_{2m−1} if and only if the nodes {t_j} are the roots of Q_m.

The proof of this theorem can be found in [9, p. 110]. The resulting quadrature formula is the Gauss quadrature formula.
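As a numerical check of this theorem in the Lebesgue-measure example above (a sketch; the choice m = 5 and the tolerance are arbitrary), the m-point rule with nodes at the roots of Q_m integrates t^k exactly for every k ≤ 2m − 1:

    import numpy as np

    m = 5
    nodes, weights = np.polynomial.legendre.leggauss(m)  # roots of L_m and weights A_j

    for deg in range(2 * m):                             # degrees 0, ..., 2m-1
        quad = np.dot(weights, nodes ** deg)
        exact = 0.0 if deg % 2 else 2.0 / (deg + 1)      # integral of t^deg over [-1,1]
        assert abs(quad - exact) < 1e-12
    print("the", m, "point Gauss rule is exact on polynomials of degree", 2 * m - 1)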

d) We now prove the theorem in b). We first note that (2) can only hold if m > [n/2]. For otherwise, ∃ p ∈ P_m such that p(t_j) = 0, 1 ≤ j ≤ m; since deg(p²) = 2m ≤ n, we have

∫_{−1}^{1} p² dμ = Σ_{j=1}^m λ_j p²(t_j) = 0.

This implies that p vanishes on support(μ), which therefore has at most m ≤ [n/2] points, contradicting the assumption about support(μ) in b). Now let m = (n+1)/2 (remember that n is assumed odd), and, for 1 ≤ j ≤ m and {t_j} = {roots of Q_m}, define

(4) λ_j ≡ (Q′_m(t_j))^{−1} ∫_{−1}^{1} Q_m(t)(t − t_j)^{−1} dμ(t).

Claim: each λ_j > 0, and relations (2), (3) hold. By the theorem in c), no other choice of λ_j with this small an m can satisfy (2); hence when the claim has been justified, the proof of the theorem in b) will be complete.

Proof of Claim. Let p ∈ P_n vanish at each t_j. Then p = Q_m · q for some q ∈ P_{m−1}. But Q_m is orthogonal to any such q (as elements of L²(μ)), and so ∫ p dμ = 0. From linear algebra there follows the existence of real numbers γ_1,...,γ_m such that

(5) φ(x) = Σ_{j=1}^m γ_j x(t_j),

∀ x ∈ P_n. Applying (5) to the functions

x_i(t) = (Q′_m(t_i))^{−1} Q_m(t)(t − t_i)^{−1},

and recalling (4), we find that γ_i = λ_i, 1 ≤ i ≤ m. This proves (2) and if, as will next be shown, λ_i > 0, relation (3) also follows.

Fix an i, 1 ≤ i ≤ m, and define

x(t) = (Q_m(t)/(t − t_i))²;

then deg(x) = 2m − 2 < 2m − 1. Apply φ to x:

φ(x) = λ_i (Q′_m(t_i))².

Now φ(x) > 0 since x ≥ 0, and x vanishes at only m−1 points while support(μ) contains at least (n+1)/2 > m−1 points. This shows that λ_i > 0, qed.

Exercise 38. Let A_n be the n-th Gauss quadrature formula, considered as an element of C_R([−1,1])*. That is,

A_n(x) = Σ_{j=1}^n λ_j^{(n)} x(t_j^{(n)})

for x ∈ C_R([−1,1]), where {t_j^{(n)}} = {roots of Q_n}. Prove that A_n → μ in the w*-topology.

§25. Haar Subspaces

In order to obtain a sharper and more useful form of the characterization theorem in 23f) in the case where X = C(Ω), we introduce the notion of a (finite dimensional) "Haar subspace" of C(Ω). This notion will later be generalized to subspaces of an arbitrary nls, and will play a role in the study of uniqueness questions in the theory of best approximation.

a) Definition. Let M be an n-dimensional linear subspace of C(Ω). Then M is a Haar subspace (interpolating subspace) if given any n distinct points {t_1,...,t_n} ⊆ Ω, and any n scalars {c_1,...,c_n}, there is exactly one x ∈ M for which x(t_i) = c_i, 1 ≤ i ≤ n.

The following lemma provides some alternative characterizations of Haar subspaces; its straightforward proof is omitted.



Lemma. Let M = span({x_1,...,x_n}) be an n-dimensional subspace of C(Ω). The following assertions are all equivalent.

1) M is a Haar subspace;

2) θ is the only element of M having at least n roots in Ω;

3) For distinct {t_1,...,t_n} ⊆ Ω, the matrix [x_i(t_j)] is non-singular;

4) For distinct {t_1,...,t_n} ⊆ Ω, the set of n-vectors {t̂_1,...,t̂_n} is linearly independent (the notation t̂ was defined in 23h)).

Remark. Obviously the span of any non-vanishing x ∈ C(Ω) is a one-dimensional Haar subspace. However, the existence of higher dimensional Haar subspaces imposes a severe topological restriction on Ω. In particular, if C_R(Ω) contains an n-dimensional Haar subspace (n ≥ 2) then Ω is homeomorphic to a (compact) subset of the unit circle (Mairhuber, Curtis, Sieklucki). For further details, see the discussion in Singer [72, p. 218-222].

c) We consider now several examples of Haar subspaces. First, since the Vandermonde determinant is non-zero, it follows from a) that the polynomial subspace P_n is a Haar subspace of C_R([a,b]) for any n ≥ 1 and a < b. This can also be viewed as a special case of the following fact. If x ∈ C_R^n([a,b]) and x^{(n)}(t) > 0 on [a,b], then span({1, t, t²,...,t^{n−1}, x}) is a Haar subspace of C_R([a,b]). On the other hand, span({t, e^t}) is not a Haar subspace of C_R([0,3]).
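The failure of span({t, e^t}) can be seen numerically via part 3) of the preceding lemma: it suffices to exhibit a sign change of D(t_1,t_2) = t_1 e^{t_2} − t_2 e^{t_1} over ordered pairs in [0,3], so that D vanishes at some pair of distinct points (a sketch; the sample pairs are arbitrary choices):

    import numpy as np

    def D(t1, t2):
        # determinant det [x_i(t_j)] for the basis {t, e^t}
        return t1 * np.exp(t2) - t2 * np.exp(t1)

    print(D(0.5, 1.0))   # negative
    print(D(1.0, 2.0))   # positive, so D = 0 somewhere in between

By continuity, some non-zero combination a·t + b·e^t then has two roots in [0,3], violating the Haar condition.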

Next we give a general result which shows that Haar subspaces

can be generated by solutions of certain special kinds of ordinary

differential equations.

Theorem. (Pólya, Zedek) Let I be any real interval. Define a linear differential operator

L_n(D) = (D + λ_n(t))(D + λ_{n−1}(t)) ··· (D + λ_1(t)),

where λ_i ∈ C_R^{n−1}(I), 1 ≤ i ≤ n, and D = d/dt. Then any non-zero solution of

(1) L_n(D)·x = θ

has at most (n−1) distinct roots in I. Consequently, any n linearly independent solutions of (1) span a Haar subspace over I.

The proof requires a preliminary lemma consisting of two generalizations of Rolle's theorem.

Lemma. 1) Let x be a differentiable function on [a,b] with x(a) = x(b) = 0, and let λ ∈ C_R([a,b]). Then ∃ c ∈ (a,b) such that

(D + λ(c))·x(c) ≡ x′(c) + λ(c)x(c) = 0.

2) Let x be n-times differentiable on [a,b] and have (n+1) distinct roots there. Let λ_i ∈ C_R^{n−1}([a,b]) for 1 ≤ i ≤ n. Then ∃ c ∈ (a,b) such that

(2) x_n(c) ≡ L_n(D)·x(t)|_{t=c} = 0.

Proof. 1) Apply Rolle's theorem to the function

y(t) ≡ x(t) exp(∫λ(t)dt).

2) Define x_0 = x, x_k = (D + λ_k)x_{k−1}, for 1 ≤ k ≤ n. By induction and the result in 1) we see that x_k has at least n − k + 1 roots, each lying between each pair of adjacent roots of x_{k−1}. When k = n we obtain (2).



Proof of the Theorem. We proceed by induction on n. For n = 1 the general non-zero solution of (1) is given by x = c exp(−∫λ_1(t)dt), where c ≠ 0. This x has at most n − 1 = 0 roots in I as claimed. Now assume the theorem true for the value n−1 and let x be a non-zero solution of (1). Then the function

w = L_{n−1}(D)·x

is a solution of the equation

(3) (D + λ_n)·w = θ.

Now two cases are possible. If w = θ, then x is a solution of (1) with n replaced by n−1, so x has at most (n−2) roots in I by the induction hypothesis. Otherwise, w ≠ θ and then by the first step of the induction, w has no roots in I, since it satisfies (3). But in this case the second part of the preceding lemma implies that x can have at most (n−1) distinct zeros in I, qed.

Exercise 39. Verify the assertions in the first paragraph of this sub-section. Also:

1) Let α_1 < α_2 < ··· < α_n, and 0 < a < b < +∞. Then

span({t^{α_1},...,t^{α_n}})

is a Haar subspace of C_R([a,b]).

2) Let {α_i} be as in 1), and a < b. Then

span({e^{α_1 t},...,e^{α_n t}})

is a Haar subspace of C_R([a,b]).

3) For n a positive integer,

span({1, cos kt, sin kt: 1 ≤ k ≤ n})

is a Haar subspace of the space of all real continuous 2π-periodic functions on the line (identified with C_R(Ω), where Ω is the unit circle).

4) For n a positive integer,

span({1, cos kt: 1 ≤ k ≤ n}),

span({sin kt: 1 ≤ k ≤ n})

are each a Haar subspace of C_R(Ω), where Ω is a compact subinterval of (0,π).

d) We are now ready to establish the famous "alternation theorem" which characterizes b.a.'s from Haar subspaces of C_R([a,b]). As a preliminary, let M = span({x_1,...,x_n}) be a Haar subspace of some space C_R(Ω). Define a function D ∈ C_R(Ω ×···× Ω) by

D(t_1,...,t_n) = det [x_i(t_j)].

Then D is zero only if two or more of the points {t_j} coincide. Given two sets {s_1,...,s_n} and {t_1,...,t_n}, each consisting of distinct points in Ω, suppose that it is possible to vary t_j continuously so that t_j → s_j while no two of the t_j become coincident; then sgn D(t_1,...,t_n) = sgn D(s_1,...,s_n). In particular, this can be done when Ω = [a,b].

Lemma. Let a ≤ s_1 <···< s_n ≤ b, a ≤ t_1 <···< t_n ≤ b, and let D be as just defined. Then sgn D(s_1,...,s_n) = sgn D(t_1,...,t_n).

Proof. Exercise 40.

Theorem. (Chebyshev-Bernstein Alternation Theorem) Let M be an n-dimensional Haar subspace of C_R([a,b]) and x ∈ C_R([a,b]) \ M. Then x_o ∈ M is a b.a. to x if and only if there are points a ≤ t_1 <···< t_{n+1} ≤ b such that

|x(t_j) − x_o(t_j)| = ||x − x_o||,

x(t_j) − x_o(t_j) = (−1)^{j+1}(x(t_1) − x_o(t_1)),

for 1 ≤ j ≤ n + 1.

Remark. Let r ≡ x − x_o be the error function. The condition just stated is that this error function should attain its maximum absolute value over [a,b] at least (n + 1) times, with alternate signs. Sometimes this is expressed by the statement "the error curve alternates (n + 1) times", or, "the error curve has (n + 1) alternating peak points". The point set {t_1,...,t_{n+1}} is called a Chebyshev alternance for r.

Proof of the Alternation Theorem. From 23f) we recall that a nas condition for x_o to be a b.a. to x from M is that there should exist m (≤ n + 1) points {t_j} ⊆ [a,b] (t_j < t_{j+1}), λ_1,...,λ_m > 0, and σ_1,...,σ_m with |σ_j| = 1 such that

σ_j(x(t_j) − x_o(t_j)) = ||x − x_o||,

Σ_{j=1}^m λ_j σ_j x_i(t_j) = 0,

for 1 ≤ i ≤ n. (That is, the φ_j of 23f) are σ_j δ_{t_j} here.) By setting α_j = σ_j λ_j the above nas condition is equivalent to the existence of m non-zero scalars α_j such that

|x(t_j) − x_o(t_j)| = ||x − x_o||,

(4) sgn α_j = sgn (x(t_j) − x_o(t_j)),

(5) Σ_{j=1}^m α_j δ_{t_j} ∈ M^⊥.

Suppose that m < n + 1. Then, because M is a Haar subspace, ∃ y ∈ M such that y(t_j) = α_j, 1 ≤ j ≤ m, and so by (5),

0 = Σ_{j=1}^m α_j y(t_j) = Σ_{j=1}^m α_j² ≠ 0,

a contradiction. Therefore m = n + 1.

Now let us rewrite equation (5) in the form

Σ_{j=1}^n α_j x_i(t_j) = −α_{n+1} x_i(t_{n+1})

for 1 ≤ i ≤ n, and solve for α_j by Cramer's rule:

(6) α_j = (−1)^{n+1−j} α_{n+1} D(t_1,...,t_{j−1},t_{j+1},...,t_{n+1}) / D(t_1,...,t_n).

Now by the preceding lemma, the ratio of the D's is positive. Thus the numbers α_j alternate in sign, and hence so do the numbers (x(t_j) − x_o(t_j)). This proves the necessity of the alternation condition for x_o to be a b.a. to x.

Conversely, if x − x_o has an (n + 1)-point Chebyshev alternance in [a,b], a ≤ t_1 <···< t_{n+1} ≤ b, set

α_{n+1} = sgn (x(t_{n+1}) − x_o(t_{n+1})),

and define α_1,...,α_n by means of (6). This means that {α_1,...,α_{n+1}} satisfy (4) and (5), and so the conditions for x_o to be a b.a. to x are met, qed.



e) The Alternation Theorem is very useful for actually computing best approximations in C_R([a,b]). For example, it easily implies that the b.a. to x by constant functions is the constant function whose value is the average of the minimum and maximum values assumed by x on [a,b].

Exercise 41. Verify this last assertion. Compute the (unique) b.a. from P_1 and from P_2 to x(t) = |t| on [−1,1], and from P_1 to x(t) = √t on [0,1].
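The constant-function rule is easy to check numerically (a sketch; the test function x(t) = e^t on [0,1] is our own choice): the error attains ±(max x − min x)/2 at the two endpoints, giving the required 2-point alternance for the one-dimensional subspace of constants.

    import numpy as np

    ts = np.linspace(0.0, 1.0, 100001)
    x = np.exp(ts)
    c = (x.min() + x.max()) / 2.0          # best constant approximation
    err = x - c
    print(np.abs(err).max(), (x.max() - x.min()) / 2)   # equal
    print(err[0], err[-1])                 # equal magnitudes, opposite signs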

We will shortly see a further application of the Alternation

Theorem in the discussion of Chebyshev polynomials.

f) The final result of this section is a classical theorem of de la Vallée Poussin which complements the distance formula of 23i) by providing lower estimates for the distance to a subspace M ⊆ C_R([a,b]), if M is Haar.

Theorem. Let M be an n-dimensional Haar subspace of C_R([a,b]) and x ∈ C_R([a,b]) \ M. Let z ∈ M have the property that x − z assumes alternately positive and negative values at (n + 1) consecutive points {t_j} ⊆ [a,b]. Then

d(x,M) ≥ min {|x(t_j) − z(t_j)|: 1 ≤ j ≤ n + 1}.

Proof. Suppose there is y ∈ M such that ||x − y|| < min |x(t_j) − z(t_j)|. Then y − z = x − z − (x − y) assumes alternately positive and negative values at the t_j, and so has at least n roots in [a,b], contradicting the Haar condition for M.

§26. Chebyshev Polynomials

These polynomials occur as solutions of certain best approximation problems on [−1,1] and have many remarkable properties. In particular, we will see that they can be used to conveniently produce "good" (although not necessarily "best") approximations to any function in C_R([−1,1]).

a) Lemma. For −1 ≤ t ≤ 1,

(1) cos(n arc cos(t)) = 2^{n−1} t^n + q_{n−1}(t),

where q_{n−1} ∈ P_{n−1}.

Proof. This follows from the formula

cos(nθ) = 2^{n−1}(cos(θ))^n + Σ_{k=0}^{n−1} λ_k^{(n)} (cos(θ))^k,

which in turn is proved by induction on n, making use of the identity

cos((n+1)θ) + cos((n−1)θ) = 2 cos(θ) cos(nθ).

Definition. The (n + 1)-st Chebyshev polynomial T_n is given by the right hand side of (1).

b) Consider the problem of finding the b.a. from P_{n−1} to t^n on [−1,1]. This is evidently equivalent to the problem of finding the monic polynomial in P_n which best approximates θ (i.e., which has least norm).

Theorem. (Chebyshev) The monic polynomial in P_n of least norm on [−1,1] is (1/2^{n−1})T_n.

Proof. Let p be the desired solution, and put A ≡ ||p||. Then by 25d) there must be an (n + 1)-point Chebyshev alternance in [−1,1]. The alternance must include the points ±1, since p′ vanishes at each alternance point in (−1,1), while deg(p′) = n − 1.
Thus the polynomials A² − p² and (1 − t²)(p′)² have the same roots, and each root in (−1,1) is a double root. This implies

A² − p(t)² = (1 − t²) p′(t)²/n²,

√(A² − p(t)²) = ± (1/n) √(1 − t²) p′(t).

Now p′ changes sign as each point of the alternance is passed; let I be an interval where p′ > 0. Then, for t ∈ I,

p′(t)/√(A² − p(t)²) = n/√(1 − t²).

Integrate both sides to get

arc cos(p(t)/A) = c + n arc cos(t),

p(t) = A cos(n arc cos(t) + c)

= A(cos(c) T_n(t) − sin(c) sin(n arc cos(t))),

for some constant c. But sin(c) must equal 0, since sin(n arc cos(t)) is not a polynomial. Therefore cos(c) = ±1, and since p is monic we finally obtain that cos(c) = 1 and A = 1/2^{n−1}. That is, p = (1/2^{n−1})T_n, qed.

c) We state without proof (for which see [51]) the following facts about the Chebyshev polynomials T_n.

1) T_0(t) = 1, T_1(t) = t,

T_n(t) = 2t T_{n−1}(t) − T_{n−2}(t), n ≥ 2.

2) For 1 ≤ k ≤ n, the roots of T_n are the points

cos((2k−1)π/2n).

For large n these roots tend to cluster toward the endpoints of [−1,1].
3) For any p ∈ P_n and t outside (−1,1) we have

|p(t)| ≤ ||p|| |T_n(t)|,

where ||p|| ≡ max {|p(t)|: |t| ≤ 1}. Thus all the zero tendency of T_n is compressed to within (−1,1); outside this interval it exceeds every other polynomial in P_n of norm ≤ 1. The above inequality is also valid if p and T_n are each replaced by their k-th derivatives.

4) The sequence

{T_0/√π, T_1·√(2/π), T_2·√(2/π), ...}

is a complete orthonormal set in L²(μ̃), where μ̃ is the positive measure on [−1,1] defined by

dμ̃(t) = dt/√(1 − t²).
Exercise 42. Prove this last fact.
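The recurrence in c-1) makes the T_n cheap to evaluate, and the minimality theorem of b) can then be verified directly (a sketch; n and the grid size are arbitrary choices):

    import numpy as np

    def cheb(n, t):
        # T_0 = 1, T_1 = t, T_n = 2 t T_{n-1} - T_{n-2}
        t0, t1 = np.ones_like(t), t
        if n == 0:
            return t0
        for _ in range(n - 1):
            t0, t1 = t1, 2 * t * t1 - t0
        return t1

    n = 6
    ts = np.linspace(-1, 1, 200001)
    print(np.abs(cheb(n, ts)).max() / 2 ** (n - 1), 2.0 ** (1 - n))  # monic norm = 2^{1-n}
    roots = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))  # fact 2)
    print(np.abs(cheb(n, roots)).max())                              # ~ 0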

d) In view of c-4) we now have the possibility of expanding any x ∈ C_R([−1,1]) in a Fourier-Chebyshev series

(2) x = c_0/2 + Σ_{k=1}^∞ c_k T_k,

where

c_k = (2/π) ∫_{−1}^{1} x(t) T_k(t)/√(1 − t²) dt,

and of course, the series (2) will converge to x in the L²(μ̃)-metric (μ̃ as defined in c-4)). We next observe that (2) will also converge to x in the C_R([−1,1])-metric, with a modest additional assumption on x.
Theorem. Assume that x ∈ C_R([−1,1]) is of bounded variation. Then the Fourier-Chebyshev series (2) converges uniformly to x on [−1,1].

Proof. The function y(s) = x(cos(s)) is of bounded variation in the space of real continuous 2π-periodic functions. A theorem of Titchmarsh [75, p. 410] guarantees the uniform convergence of the Fourier series for y. Since y is an even function its Fourier series has the form

y(s) = c_0/2 + Σ_{k=1}^∞ c_k cos(ks),

c_k = (2/π) ∫_0^π y(s) cos(ks) ds.

Now recalling the definition of T_k in a), and applying the change of variable t = cos(s), we obtain the conclusion of the theorem.

e) Let S_n[x] denote the truncation of the series (2) at k = n. Then, of course, S_n[x] is the L²(μ̃)-best approximation to x from P_n. We wish to show that it is also a "near-best" C_R([−1,1])-best approximation to x. This will prove to be a consequence of the following general theorem.

To set the stage, let ν be a positive measure on [−1,1], absolutely continuous wrt Lebesgue measure, and let {q_k: k = 0,1,...} be the sequence of ν-orthonormal polynomials. Define numbers

A_n^ν ≡ max_{|t|≤1} { ∫_{−1}^{1} |Σ_{k=0}^n q_k(s) q_k(t)| dν(s) }.

For fixed x ∈ C_R([−1,1]) let S_n^ν[x] be the L²(ν)-best approximation to x from P_n, and let Q_n[x] be the C_R([−1,1])-best approximation to x from P_n.
Theorem. (Alexits, Powell) In the preceding notation, we have

||x − S_n^ν[x]|| ≤ (1 + A_n^ν) ||x − Q_n[x]||

(where the norm on both sides is the uniform norm).

Proof. We may write

x − Q_n[x] − (x − S_n^ν[x]) = Σ_{k=0}^n a_k q_k,

where the a_k are to be determined. By 22d),

∫_{−1}^{1} (x − S_n^ν[x]) q_k dν = 0,

for 0 ≤ k ≤ n. Consequently,

a_k = ∫_{−1}^{1} (x − Q_n[x]) q_k dν,

for 0 ≤ k ≤ n. Hence, for −1 ≤ t ≤ 1,

|x(t) − S_n^ν[x](t)| = |x(t) − Q_n[x](t) − Σ_{k=0}^n (∫_{−1}^{1} (x − Q_n[x]) q_k dν) q_k(t)|

= |x(t) − Q_n[x](t) − ∫_{−1}^{1} (x − Q_n[x])(s) Σ_{k=0}^n q_k(s) q_k(t) dν(s)|

≤ ||x − Q_n[x]|| + ∫_{−1}^{1} |(x − Q_n[x])(s)| |Σ_{k=0}^n q_k(s) q_k(t)| dν(s)

≤ ||x − Q_n[x]|| (1 + A_n^ν),

qed.

Corollary. Let ν = μ̃, the Chebyshev measure defined in c-4), and let A_n ≡ A_n^{μ̃}. Then

(3) ||x − S_n[x]|| ≤ (1 + A_n) ||x − Q_n[x]||,

and
A_n = max_{0≤t≤π} (2/π) ∫_0^π |Σ′_{k=0}^n cos(ks) cos(kt)| ds,

where the prime on the summation sign indicates that the first term is to be halved.

This corollary follows directly from the theorem by making the change of variable used in d).

f) It is shown in [61, p. 406] that A_n admits an exact expression as a finite trigonometric sum, and hence that

A_n ~ (4/π²) log(n), n → ∞.

In particular, A_n < 5.1 for n ≤ 1000.

The practical implication of all these estimates is the following. Suppose we are given some x ∈ C_R([−1,1]), and we wish to approximate x by polynomials to within some specified tolerance. Then the partial sums S_n[x] of the Fourier-Chebyshev series (2) are at least good first attempts. Indeed, the inequality (3) shows that the best we could do by way of approximating x from P_n (fixed n), namely ||x − Q_n[x]||, is greater than (1/6.1)||x − S_n[x]||, provided n ≤ 1000. That is, we cannot even obtain one additional decimal place of accuracy by replacing the near-best approximation S_n[x] by the best approximation Q_n[x].
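To see e) and f) in action, the coefficients of (2) can be computed through the change of variable t = cos(s) used in d), i.e. c_k = (2/π) ∫_0^π x(cos s) cos(ks) ds (a sketch; the midpoint-rule quadrature, the grid sizes, and the test function x(t) = |t|, which is of bounded variation, are all our own choices):

    import numpy as np

    def cheb_coeffs(x, n, m=20000):
        s = (np.arange(m) + 0.5) * np.pi / m     # midpoint rule on (0, pi)
        y = x(np.cos(s))
        return np.array([(2.0 / m) * np.sum(y * np.cos(k * s))
                         for k in range(n + 1)])

    def S(x, n, ts):
        # partial sum c_0/2 + sum_{k=1}^n c_k T_k, using T_k(cos s) = cos(ks)
        c = cheb_coeffs(x, n)
        s = np.arccos(ts)
        return c[0] / 2 + sum(c[k] * np.cos(k * s) for k in range(1, n + 1))

    ts = np.linspace(-1, 1, 4001)
    for n in (2, 4, 8, 16):
        print(n, np.abs(np.abs(ts) - S(np.abs, n, ts)).max())  # uniform error decreases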

Results of Hornecker, Talbot, and Rivlin [64] show that for a certain class of rational functions x ∈ C_R([−1,1]), S_n[x] actually differs from Q_n[x] only in the top order term.

§27. Rotundity

In this and the next section we will be discussing the unique-

ness question in approximation theory. We consider two different

approaches to this problem: first, we impose global (rotundity)

conditions on the norm, sufficient to guarantee that every (convex)

approximation problem has at most one solution; on the other hand,

we consider particular subspaces having special properties sufficient

for uniqueness.

a) Definition. Let X be a ls and f ∈ Conv(X). f is strictly convex if u,v ∈ dom(f), u ≠ v, and 0 < t < 1 imply

(1) f(tu + (1−t)v) < tf(u) + (1−t)f(v).

Examples. If we refer back to 6a) - Example 5), a sufficient condition for strict convexity of f is that E(u,v) > 0 for u,v ∈ K, u ≠ v. In particular (refer to Exercise 12), f is strictly convex if d²f(x) is positive definite for every x ∈ K. The functional f in Exercise 20 is strictly convex.

Remark. The immediate relevance of the strict convexity con-

cept to optimization problems is contained in the following ob-

vious assertion: a convex program with a strictly convex objective

function has at most one solution.

b) Definition. A closed convex subset K of a tls is rotund if every boundary point of K belongs to ext(K). By abuse of language, a nls X is called rotund if U(X) is a rotund set in X.

We would like to think that the rotundity of a nls X is somehow connected with the strict convexity of the norm function on X. However, due to the homogeneity of norms, the norm is never a strictly convex function on X (the inequality (1) is violated along rays emanating from θ).

c) Definition. A norm ||·|| on a ls X is essentially strictly convex if u,v ∈ X and

||u + v|| = ||u|| + ||v||

implies u = tv for some t ≥ 0.

Theorem. A nls is rotund if and only if its norm is essentially strictly convex.

Proof. Exercise 43.

Thus by allowing the inequality (1) to fail only where it must, we have obtained a property of norms which is equivalent to the geometric property of rotundity. Of course, our interest in rotundity is that in a rotund nls each element has at most one b.a. from any specified convex set.

d) Definition. By abuse of language, a nls X is called smooth if its norm is a smooth function (in the sense of 7c)) on the open set X \ {θ}.

Theorem. Let X be a nls such that X* is smooth (resp. rotund). Then X is rotund (resp. smooth).

Proof. Suppose that X is not rotund. Then some point in S(X) is not an extreme point of U(X), and hence there exists a line segment [u,v] ⊆ S(X). By Mazur's Theorem (3e)), ∃ φ ∈ S(X*) such that re φ([u,v]) = 1. The canonical images of u,v in X** are then both subgradients of the norm (on X*) at φ. But this implies, by 10c), that the norm (on X*) has no gradient at φ, a contradiction. The proof of the remaining assertion of the theorem is similar.

In particular, when X is reflexive, there is complete duality between the properties of smoothness and rotundity. Since, according to Exercise 14, the L^p(μ) spaces are smooth for 1 < p < +∞, we see again that such spaces are also rotund (a direct proof was suggested as part of Exercise 7). It also follows that, for the same values of p, the closely related Sobolev spaces W^{k,p}(G) (here G is an open subset of R^n and W^{k,p}(G) consists of all those scalar-valued functions on G whose distributional derivatives up to order k are all in L^p; see [78, p. 55]) are smooth and rotund. Similarly the compact operator spaces S_p (which consist of all compact operators T acting on some fixed Hilbert space such that trace((T*T)^{p/2}) < +∞; see [22, Ch. III]) are smooth and rotund. (In fact, all the spaces just listed possess the much stronger properties of uniform rotundity and uniform smoothness. In particular, they are all E-spaces (see §31).)

e) Given a separable nls X it is always possible to find an equivalent essentially strictly convex norm on X, which furthermore differs arbitrarily little from the original norm. Much stronger renorming results are known (see the survey article [11], also [2]), but they are not too useful for approximation theory. The result just mentioned, however, will allow us to give an interesting application of approximation theory.

Lemma. (Clarkson) Let (X, ||·||) be a separable nls, and let ε > 0. Then there is an essentially strictly convex equivalent norm |||·||| on X, such that

||x|| ≤ |||x||| ≤ (1 + ε)||x||, ∀x ∈ X.
Proof. It will suffice to produce some essentially strictly convex norm |·| equivalent to the norm on X, for then we may take |||·||| = ||·|| + λ|·|, for suitably small λ > 0. To construct |·|, it will suffice to find an isomorphism T of X onto a subspace of a rotund nls (Y, ||·||′) and then set |x| = ||x|| + ||T(x)||′, for x ∈ X. But for Y we may take C([0,1]), and for ||·||′ the map

y ↦ ||y||′ ≡ ||y||_∞ + (∫_0^1 |y(t)|² dt)^{1/2},

where ||·||_∞ denotes the usual uniform norm on C([0,1]). The isomorphism T is then the composition of the identity map: (Y, ||·||_∞) → (Y, ||·||′) with an isometry of (X, ||·||) onto a subspace of (Y, ||·||_∞); this isometry exists by virtue of the separability of X (Banach's theorem).

f) Lemma. Let K be a compact convex subset of a rotund nls X. Then the metric projection P_K: X → K is a (single valued) continuous function on X.

Proof. Exercise 44. (Metric projections are defined and discussed in §32.)

g) Application. (Schauder Fixed Point Theorem) Let K be a closed convex subset of a nls X, and T a continuous mapping of K into a compact subset of K. Then T has a fixed point in K.

Proof. (Bonsall) We can initially reduce the problem to the case where K is bounded and X is separable and rotund. For, if T(K) ⊆ A, where A is compact in K, then it suffices to prove the theorem for B ≡ cl co(A) ⊆ K. Further, if we put Y ≡ span(B), then Y is a separable nls, and we may renorm Y so as to be
rotund by e).

Since T(K) is totally bounded, for each n = 1,2,..., there is a (1/n)-net {T(x_i): i = 1,...,m = m(n)} for T(K). Let Y_n be the linear hull of this (1/n)-net, and put K_n ≡ K ∩ Y_n, a compact subset of Y_n. Let P_n: X → K_n be the metric projection; then

T_n = (P_n ∘ T)|_{K_n}

is a continuous self-map of K_n. The classical Brouwer fixed point theorem provides a fixed point u_n for T_n: T_n(u_n) = u_n. By compactness, we may assume that T(u_n) → v, for some v ∈ K. Now

||u_n − v|| = ||T_n(u_n) − v||

≤ ||T_n(u_n) − T(u_n)|| + ||T(u_n) − v||

≤ 1/n + ||T(u_n) − v||,

because, ∀ x ∈ K_n, we have

||T(x) − T_n(x)|| = ||T(x) − P_n(T(x))|| = d(T(x), K_n)

≤ min {||T(x) − T(x_i)||: 1 ≤ i ≤ m} ≤ 1/n.

This proves that u_n → v, and so T(u_n) → T(v); that is, v is a fixed point of T, qed.

§28. Chebyshev Subspaces

a) Definition. A subset of a nls X is semi-Chebyshev (resp. Chebyshev) if it contains at most one (resp. exactly one) b.a. to every element of X.

From 27c) it follows that every convex subset of a rotund nls is semi-Chebyshev.
b) Theorem. Let K be a convex subset of a nls X. Then K is semi-Chebyshev if and only if there do not exist φ ∈ X* (φ ≠ θ), points x_1 ≠ x_2 in X, and points y_1, y_2 ∈ K such that

φ(x_i) = ||x_i||, i = 1,2,

re φ(y_i) = sup re φ(K), i = 1,2,

x_1 − x_2 = y_1 − y_2.

Proof. Suppose that φ, x_i, y_i exist as just described. We may assume that φ ∈ S(X*). Now

||x_1|| − ||x_2|| = φ(x_1 − x_2) = re φ(x_1 − x_2)

= re φ(y_1 − y_2) = re φ(y_1) − re φ(y_2) = 0,

so ||x_1|| = ||x_2||. Consider the convex set

K′ ≡ K − (x_2 + y_1).

The points

−x_1 = y_2 − (x_2 + y_1),

−x_2 = y_1 − (x_2 + y_1)

are both in K′ and we claim that they are both b.a.'s to θ from K′. This will imply that K′ and hence K are not semi-Chebyshev. That −x_i is a b.a. to θ from K′ follows from 22b), because

φ(θ − (−x_i)) = φ(x_i) = ||x_i||,

and (taking i = 1 for definiteness)

sup re φ(K′) = sup re φ(K − (x_2 + y_1))

= sup {re φ(y): y ∈ K} − re φ(x_2) − re φ(y_1)

= re φ(y_1) + re φ(−x_2) − re φ(y_1)

= re φ(−x_2).

For the converse, assume that K is not semi-Chebyshev; then ∃ x ∈ X with two distinct b.a.'s y_1, y_2 ∈ K. Then by 22b) again, ∃ φ ∈ S(X*) such that

re φ(y_i) = sup re φ(K),

φ(x − y_i) = ||x − y_i|| = d(x,K).

The points x_i ≡ x − y_i are then distinct, and together with φ and y_i they satisfy the second condition of the theorem.

Corollary. (Singer) A linear subspace K of a nls X is semi-Chebyshev if and only if there do not exist φ ∈ S(K^⊥), x ∈ X, θ ≠ y ∈ K such that φ(x) = ||x|| = ||x − y||.

c) As in §23 we now specialize the above considerations to the case where the convex set K is contained in an n-dimensional subspace of X. In fact, for simplicity, we will assume that K is an n-dimensional subspace of X.

Theorem. (Singer) An n-dimensional linear subspace M of a nls X is Chebyshev if and only if there do not exist {φ_1,...,φ_m} ⊆ ext(U(X*)) (where m ≤ n (real scalars) or m ≤ 2n−1 (complex scalars)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1, x ∈ X, θ ≠ y ∈ M, such that

φ_j(x) = ||x|| = ||x − y|| ∀ j,

Σ_{j=1}^m λ_j φ_j ∈ S(M^⊥).

Proof. Clearly if such φ_j, λ_j, x, y exist then by the preceding Corollary, M is not Chebyshev. Conversely, assume that M is not Chebyshev. Then ∃ z ∈ X \ M having distinct b.a.'s

y_1, y_2 ∈ M. Let x = z − y_1, y = y_2 − y_1 ≠ θ, so that θ and y are both b.a.'s to x from M. Let Y ≡ span({x,M}). By 22b) ∃ φ ∈ S(Y*) such that φ(M) = 0 and ||x|| = φ(x) = φ(x − y) = ||x − y||. By 23d) ∃ {φ_1,...,φ_m} ⊆ ext(U(Y*)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1 such that

(1) φ = Σ_{j=1}^m λ_j φ_j.

Here m ≤ n + 1 (real scalars) or m ≤ 2n + 1 (complex scalars). Since ||φ_j|| = 1 and λ_j > 0, φ_j(x) = ||x|| = ||x − y||, ∀j. Also, |φ_j(x − y)| ≤ ||x − y|| = φ_j(x), so that re φ_j(y) ≥ 0. Together with

Σ_{j=1}^m λ_j φ_j(M) = 0,

we deduce

(2) φ_j(y) = 0, ∀ j.

It now remains only to show that the φ_j can be chosen so that m ≤ n (real scalars) or m ≤ 2n − 1 (complex scalars); then an application of 23c) will complete the proof.

Claim. The set {φ_1,...,φ_m} can be chosen to be real linearly independent.

Granting (for a moment) the Claim, assume that the scalars are real and that m = n + 1. Then since Y is (n+1)-dimensional, the set {φ_j} forms a basis for Y*. But then by (2), y = θ, a contradiction. Consequently, m ≤ n. Similarly, if the scalars are complex, the assumption that m > 2n leads to a contradiction (since it entails real-dim({v ∈ Y: φ_j(v) = 0 ∀j}) ≤ 1, while (2) implies that this subspace contains the real linearly independent set {y, iy}).
Proof of the Claim. Choose m to be minimal wrt the representation (1), and assume that {φ_1,...,φ_m} is (real) linearly dependent. Then ∃ α_1,...,α_m not all zero such that α_1φ_1 +···+ α_mφ_m = θ. Since ||φ|| = 1, ∃ v ∈ S(Y) such that φ(v) = 1, and hence φ_j(v) = 1 ∀j, by (1). So α_1 +···+ α_m = 0, and

Σ_{j=2}^m α_j(φ_j − φ_1) = θ.

Thus the set {φ_2 − φ_1,...,φ_m − φ_1} is linearly dependent, and therefore the dimension of its linear hull is at most m − 2. Consequently,

dim(co({φ_1,...,φ_m})) ≤ m − 2.

Now we also have

{φ_1,...,φ_m} = ext(co({φ_1,...,φ_m})),

as follows from 5e) and the fact that φ_j ∈ ext(U(Y*)). But now, since φ ∈ co({φ_1,...,φ_m}), 23b) implies that φ is a convex combination of at most (m − 2) + 1 = m − 1 of the φ_j, and this contradicts the minimality of m, qed.

d) Lemma. Let X be a nls of dimension at least n. Then

U(X*) contains at least n linearly independent extreme points.

Proof. Exercise 45.

e) Definition. An n-dimensional subspace M of a nls X is an interpolating subspace if for every linearly independent set {φ_1,...,φ_n} ⊆ ext(U(X*)) and every set {c_1,...,c_n} of scalars, there is a unique y ∈ M for which φ_i(y) = c_i, ∀i.



This definition was given in [3]. It clearly reduces to the definition of Haar subspace when X = C(Ω), in view of 15c). Subspaces with this special property are rather rare in general normed spaces; for instance, if X* is rotund, then X contains no (non-trivial) interpolating subspace.

Exercise 46. Prove this last statement.

The remainder of this section is devoted to the connection be-

tween the interpolating property and the Chebyshev property for finite

dimensional subspaces of a nls. The preceding exercise shows that,

in general, a Chebyshev subspace need not be interpolating.

f) Theorem. Let M be an n-dimensional interpolating subspace of a real nls X. Then M is a Chebyshev subspace.

Proof. Suppose that M is not Chebyshev. By the theorem in c), there is a linearly independent subset {φ_1,...,φ_m} ⊆ ext(U(X*)) (m ≤ n), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1, x ∈ X, θ ≠ y ∈ M, such that

φ_j(x) = ||x|| = ||x − y|| ∀ j,

Σ_{j=1}^m λ_j φ_j ∈ S(M^⊥).

If we assume that m = n, equation (2) in the proof of c) together with the interpolating hypothesis on M shows that y = θ, a contradiction. Therefore, we assume m < n. Then the lemma in d) shows that {φ_1,...,φ_m} may be extended to a linearly independent set {φ_1,...,φ_m,...,φ_n} ⊆ ext(U(X*)). Setting λ_{m+1} = ··· = λ_n = 0, we have

Σ_{j=1}^n λ_j φ_j ∈ M^⊥.

Let {x_1,...,x_n} be a basis for M. Then the equations

Σ_{j=1}^n λ_j φ_j(x_k) = 0, 1 ≤ k ≤ n,

have a non-trivial solution (λ_1,...,λ_n). But this implies that the determinant det [φ_j(x_k)] = 0, and again this contradicts our hypothesis that M is interpolating, qed.

g) The upshot of the preceding two sections is that, in real spaces, interpolating subspaces are a special kind of Chebyshev subspace. As such they may be expected to possess properties not enjoyed by general Chebyshev subspaces. This is certainly the case, and such properties are studied in [3]. We now give the famous result which states that the two types of subspaces coincide in spaces C(Ω).

Theorem. (Haar, Kolmogorov) An n-dimensional subspace M of C(Ω) is a Chebyshev subspace if and only if it is a Haar subspace.

Proof. Assume that M is a Haar subspace. If the scalars are real, M is Chebyshev by f). Now, if the scalars are complex, we may still proceed as in the proof of f). However, we observe that the set {φ_1,...,φ_m} is pairwise (complex) linearly independent (this follows by definition, using that this set is real linearly independent). But the special nature of the extreme points of U(C(Ω)*) shows that a pairwise linearly independent subset thereof is actually linearly independent. Thus the proof of f) applies in the present case as well.

Conversely, assume that M is not a Haar subspace. The idea will be to use c) to show that M is not Chebyshev. Since M is not Haar, there are y ∈ S(M) and points t_1,...,t_n ∈ Ω such that y(t_j) = 0, 1 ≤ j ≤ n. It follows that ∃ scalars α_j, |α_1| +···+ |α_n| = 1, such that

Σ_{j=1}^n α_j δ_{t_j} ∈ M^⊥.

We now use the Tietze extension theorem to obtain z ∈ S(C(Ω)) such that z(t_j) = sgn(ᾱ_j) for 1 ≤ j ≤ m. Here the α_j are labeled so that exactly the first m, 1 ≤ m ≤ n, are non-zero. Define x = z(1 − |y|). We then have

x ∈ S(C(Ω)),

x(t_j) = sgn(ᾱ_j), 1 ≤ j ≤ m,

and

1 ≤ ||x − y|| ≤ || |x| + |y| ||

= || |z|(1 − |y|) + |y| || ≤ 1.

Finally, for 1 ≤ j ≤ m, define

φ_j = sgn(α_j) δ_{t_j},

λ_j = |α_j|.

Then since

φ_j(x) = sgn(α_j) x(t_j)

= sgn(α_j) sgn(ᾱ_j) = 1

= ||x|| = ||x − y||,

the condition of the theorem in c) is violated and M is not Chebyshev, qed.

h) We have now established several conditions sufficient to ensure that certain subspaces of various normed linear spaces are Chebyshev. Eventually we are led to inquire whether there necessarily exists any Chebyshev subspace in a given nls. For reflexive spaces it has been shown by Lindenstrauss [39] that there is a closed hyperplane which supports the unit ball at a single point, and so is the translate of a Chebyshev subspace. However, Garkavi [18] has exhibited a non-reflexive and non-separable Banach space which contains no Chebyshev subspace.

To reproduce this example, we first introduce some notation. Let card(S) denote the cardinality of a set S, and let c ≡ card(R¹). The ls of all bounded scalar-valued functions on S is denoted ℓ^∞(S); it is a Banach space under the sup-norm. The support of x ∈ ℓ^∞(S) is the set σ(x) = {s ∈ S: x(s) ≠ 0}, and the (closed) subspace of ℓ^∞(S) consisting of those functions with countable support is denoted ℓ_o^∞(S).

Theorem. (Garkavi) If card(S) > c, then ℓ_o^∞(S) contains no Chebyshev subspace.

Proof. Let M be a (closed) subspace of ℓ_o^∞(S). Suppose first that card(M) ≤ c. Since the union of c countable sets has cardinality c, ∃ s_o ∈ S such that y(s_o) = 0 ∀y ∈ M. Let x be the characteristic function of {s_o}. Then for any y ∈ U(M),

||x − y|| = max {1, sup_{s≠s_o} {|x(s) − y(s)|}} = 1,

that is, y is a b.a. to x from M. On the other hand, if card(M) > c, and x ∉ M, let y be any b.a. to x from M (if one happens to exist). The set A ≡ σ(x − y) is countable, and since card(M) > c = card(ℓ^∞(A)), there must exist ȳ ∈ S(M) such that ȳ|_A = θ. But then y − tȳ is also a b.a. to x from M for sufficiently small t > 0, qed.



Remark. It is apparently unknown if there is a separable Banach space with no Chebyshev subspaces. However, there does exist a separable incomplete nls containing no Chebyshev subspaces, namely the space of finitely supported scalar-valued functions on the integers with sup-norm (Klee-Singer).

Exercise 47. Show that the space c_o has no infinite dimensional Chebyshev subspaces. But, for each n = 1,2,..., show that there is an n-dimensional Chebyshev subspace.

§29. Algorithms for Best Approximation

Let M be a finite dimensional subspace of a nls X. Given some x̄ ∈ X \ M, how can one actually compute a b.a. to x̄ from M? Many practical algorithms for computing such b.a.'s can be considered as particular cases of the "method of nearby norms". This method generates a (relatively compact) sequence of elements of M which clusters in the set of b.a.'s to x̄. The members of this sequence appear as b.a.'s to x̄ from M wrt a sequence of norms on X which converges pointwise on X to the original norm on X. Since all the analysis takes place in span({x̄,M}), we can and will assume that X is finite dimensional in the main theorem.

a) Theorem. (Kripke) Let p be a norm on the finite dimensional ls X, and {p_k} a sequence of semi-norms on X which converges pointwise on X to p. Let M be a linear subspace of X, and x̄ ∈ X \ M. For each k choose a p_k-b.a., y_k, to x̄ from M. Then

1) Every subsequence of {y_k} has a p-convergent subsequence;

2) lim p(x̄ − y_k) = p-dist(x̄,M);

3) Every p-cluster point of {y_k} is a p-b.a. to x̄ from M;

4) If x̄ has a unique p-b.a., ȳ, from M, then lim p(ȳ − y_k) = 0.

Proof. Let {x_1,...,x_n} be a basis for X, and define a norm on X by

σ(x) ≡ σ(Σ_{i=1}^n c_i x_i) = Σ_{i=1}^n |c_i|.

Since all norms on X are equivalent, there exists λ > 0 such that

σ(x) ≤ λ p(x), ∀ x ∈ X.

We now claim that {p_k} constitutes a uniformly p-equicontinuous family of functions on X. To see this, let α = sup {p_k(x_i): 1 ≤ i ≤ n, k = 1,2,...}. Then for each k, and any x ∈ X,

p_k(x) = p_k(Σ c_i x_i) ≤ Σ |c_i| p_k(x_i) ≤ α σ(x) ≤ α λ p(x),

and so

|p_k(x) − p_k(z)| ≤ p_k(x − z) ≤ α λ p(x − z),

which proves the claim. It follows that the sequence {p_k} converges uniformly to p on the compact p-unit sphere S_p(X); in particular, if 0 < δ < 1, and k is sufficiently large, the homogeneity of semi-norms implies

(1) (1−δ)p(x) ≤ p_k(x) ≤ (1+δ)p(x), ∀x.

Note that, for such k, (1) shows that p_k must actually be a norm on X.

We next claim that {y_k} is a p-bounded set. Because, taking δ = 1/2 in (1),

p(y_k) ≤ 2p_k(y_k) ≤ 4p_k(x̄) ≤ 4αλ p(x̄);

this proves the claim, and hence statement 1). Now note that 3)

follows from 2). Also, if 4) is false, there is δ > 0 and a subsequence {z_j} of {y_k} such that p(ȳ − z_j) > δ. Using 1), let z̄ be a cluster point of {z_j}. Then p(ȳ − z̄) ≥ δ so that ȳ ≠ z̄, but by 3), z̄ is also a b.a. to x̄, a contradiction.

It remains to prove 2). Given ε in (0,1), we choose k large enough that (1) holds (with δ = ε), and then

p(x̄ − y_k) ≤ p_k(x̄ − y_k) + ε p(x̄ − y_k),

p(x̄ − y_k) ≤ (1/(1−ε)) p_k(x̄ − y_k)

≤ (1/(1−ε)) p_k(x̄ − y), ∀ y ∈ M,

≤ ((1+ε)/(1−ε)) p(x̄ − y),

qed.

Remark. This last estimate indicates the problem which must be solved in order to obtain useful bounds on the amount by which p(x̄ − y_k) exceeds the best value, p-dist(x̄,M). Namely, we must be able to estimate sup {|p_k(x) − p(x)|: p(x) = 1}.

b) We now consider some applications of the preceding theorem. In general, given the p-norm with respect to which best approximations are desired, the nearby (semi-)norms p_k should be chosen so that, first, it is (relatively) easy to compute p_k-best approximations (e.g., take p_k to be quadratic, or differentiable, or discrete (i.e., p_k has a finite codimensional kernel)), and second, it is possible to obtain (useful) estimates on the rate of convergence p_k → p.

Lemma. Let μ be a finite positive measure on some measure space, x ∈ L^∞(μ), and 1 ≤ p ≤ +∞. Suppose that p_k → p. Then ||x||_{p_k} → ||x||_p.
Proof. Exercise 48. (The cases p = 1 and p = +∞ are of special importance for the applications we have in mind.)

c) Corollary. (Polya) Let μ be as in b), M a finite dimensional subspace of L^∞(μ), and x̄ ∈ L^∞(μ) \ M. If p_k → +∞ then the sequence of (unique) L^{p_k}(μ)-best approximations to x̄ from M is a relatively compact set in L^∞(μ), and any cluster point is an L^∞(μ)-best approximation to x̄.

In particular, if {x̄,M} ⊆ C([0,1]), and M is a Haar subspace, then the sequence just defined converges uniformly to the (unique) b.a. to x̄ from M. If further M = P_n and x̄ is a smooth function, then Peetre [53, p. 255] obtains the estimate

||ȳ − y_k|| = O((log p_k)/p_k).

Here ||·|| is the sup-norm on [0,1], and y_k (resp. ȳ) is the unique L^{p_k}(dt)-b.a. (resp. C([0,1])-b.a.) to x̄ from P_n.
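A discrete stand-in for the Polya algorithm can be run in a few lines (a sketch; the grid, the test function e^t, the subspace P_1, and the use of Nelder-Mead minimization are all arbitrary choices, and a serious implementation would exploit the smoothness of the L^p norms for p < ∞):

    import numpy as np
    from scipy.optimize import minimize

    ts = np.linspace(0.0, 1.0, 401)
    x = np.exp(ts)
    Phi = np.vstack([np.ones_like(ts), ts]).T

    c = np.linalg.lstsq(Phi, x, rcond=None)[0]     # p = 2 solution as a warm start
    for p in (4, 8, 16, 32, 64):
        f = lambda c, p=p: np.sum(np.abs(x - Phi @ c) ** p) ** (1.0 / p)
        c = minimize(f, c, method="Nelder-Mead",
                     options={"xatol": 1e-10, "fatol": 1e-12}).x
        print(p, np.abs(x - Phi @ c).max())        # sup-norm error stabilizes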

Remarks. 1) It was conjectured for some time that the sequence {y_k}, where y_k is the unique L^{p_k}(μ)-b.a. to x̄ from M, was in fact a convergent sequence in L^∞(μ). It is now known [12, 46] that this is indeed the case when L^∞(μ) is finite dimensional. In fact, the limit was identified by Descloux as the "strict approximation" to x̄ from M, a concept earlier defined by Rice [63] in the context of uniform best approximation by functions defined on finite sets, and shown by him to be uniquely specified, even though uniform best approximation is not generally unique.

2) The problem indicated in the Remark in a) has been recently studied by Hebden [24] when p_k is the L^{p_k}(μ)-norm. Combining the bounds in [24] with the result in a) will yield a value of p (probably larger than necessary) for which the L^p(μ)-b.a. to x̄ from M is a suitable substitute for the uniform b.a. to x̄ (in the sense that the L^∞(μ)-distance between this approximation and x̄ is within some preassigned tolerance of the distance from x̄ to M).

3) An apparent improvement on the Polya algorithm has been given by Karlovitz [33]. The numbers p_k are restricted to be even integers. Then a sequence in M is constructed by alternately solving a weighted L²(μ)-best approximation problem out of M and then a one-dimensional L^{p_k}(μ)-best approximation problem. This sequence converges uniformly to a specified L^{p_k}(μ)-b.a. to x̄ from M. Thus it is possible to avoid the actual computation of L^{p_k}(μ)-best approximations. By choosing a specified member of each such sequence (for each p_k), we obtain a sequence, bounded in L^∞(μ), all of whose cluster points are L^∞(μ)-b.a.'s to x̄ from M.

(Note: in Remarks 2) and 3), the measure μ is Lebesgue measure on some compact subset of Euclidean space, and the functions involved are all continuous.)

d) It is not difficult to see that the theorem in a) remains valid if the subspace M is replaced by any closed convex subset of X (check!). If X is also rotund (or, more generally, possibly infinite dimensional but uniformly rotund), another approach to the construction of algorithms for approximating b.a.'s out of convex subsets has been given in [25]. Basically the idea is to solve a sequence of best approximation problems from convex sets which are appropriately "near" the original set, and which are such that the associated b.a. problems are easier to solve. Such sets might be hyperplanes, balls, or (convex) polyhedrons. Note that, in contrast to the algorithms discussed earlier in this section, the norm remains the same throughout.



§30. Proximinal Sets

In the many preceding sections of Part III we have considered characterizations and uniqueness of best approximations, while glossing over the basic question of their existence. We turn to this question now.

a) Definition. Let A be a subset of a nls X. Then A is called proximinal if every x ∈ X has at least one b.a. from A. On the other hand, A is called anti-proximinal if no x ∈ X \ A has a b.a. from A.

Obviously a proximinal set must be closed in X, and any compact set is clearly proximinal (see also 31a)). It is easy to see that any closed subset of a finite dimensional space is proximinal, but the following general existence theorem subsumes this along with many other special cases.

Theorem. A w*-closed subset A of a dual space X* is proximinal.

Proof. Given any y ∈ X* \ A, the function z ↦ ||y − z|| is w*-lsc on X*, and therefore attains its minimum on the w*-compact set (y − A) ∩ 2d(y,A)U(X*).

Corollary. A reflexive subspace of a nls is proximinal, and so is a closed convex subset of a reflexive space.

b) The preceding conditions are sufficient but not necessary for the existence of best approximations. In the remainder of this section we consider some special kinds of subspaces of non-reflexive spaces. Let us first recall that a closed subspace M of a nls X is factor-reflexive if the quotient space X/M is a reflexive (Banach) space. We then have the following necessary condition.

Lemma. (Phelps) Let M be a proximinal factor-reflexive subspace of a nls X. Then every functional in M^⊥ attains its norm on U(X).

Proof. Exercise 49. (It is to be shown that φ ∈ M^⊥ implies the existence of x ∈ S(X) for which φ(x) = ||φ||.)

In particular, this lemma applies to the case where M has finite codimension in X. Although generally not sufficient, the condition does characterize proximinal subspaces of codimension one, as we will see next. In general, the usefulness of the condition in a particular nls X depends on having available (useful) knowledge of the form of those functionals in X* which attain their norms. Such information has been summarized in [56, 57], for example.

Theorem. Let H ≡ {x ∈ X: re φ(x) = c} be a hyperplane in the nls X defined by φ (≠ θ) in X* and c ∈ R¹. Then H is proximinal (resp. anti-proximinal) if and only if φ attains (resp. does not attain) its norm on U(X).

Proof. Either directly, or as a consequence of Helly's theorem (e.g. [15, p. 86]), we see that

(1) d(x,H) = |c − re φ(x)| / ||φ||,

for every x ∈ X. Now if φ attains its norm on U(X) at some z ∈ S(X), then

P_H(x) ≡ x − ||φ||^{−1} (re φ(x) − c) z

is a b.a. to x from H, since its distance from x is the same as d(x,H) given in (1).

On the other hand, if φ does not attain its norm, then given x ∈ X and y ∈ H such that

||x − y|| = d(x,H) = |re φ(x) − c| / ||φ|| = |re φ(x − y)| / ||φ||,

it follows that x − y = θ, that is, x ∈ H. Hence no x ∈ X \ H has a b.a. from H, qed.
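In a finite dimensional space every functional attains its norm, so every hyperplane is proximinal there, and both formula (1) and the map P_H constructed above can be checked directly (a sketch; R⁴ with the Euclidean norm, φ(x) = ⟨a,x⟩, and c = 1 are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.normal(size=4); x = rng.normal(size=4)
    norm_phi = np.linalg.norm(a)            # ||phi|| for the Euclidean norm
    z = a / norm_phi                        # phi attains its norm at z
    d = abs(1.0 - a @ x) / norm_phi         # formula (1) with c = 1
    PHx = x - (a @ x - 1.0) / norm_phi * z  # the b.a. from the proof
    print(np.linalg.norm(x - PHx), d)       # equal
    print(a @ PHx)                          # = 1, so P_H(x) lies on H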

According to a profound theorem of James [32], such anti-proximinal hyperplanes exist in every non-reflexive Banach space. To see a simple example of such behavior, let X = L¹([0,1]), and consider the functional

φ(x) ≡ ∫_0^1 t x(t) dt, x ∈ X.

c) We give next an example of a closed but w*-dense subspace of a dual space which is proximinal. In general, there is not much theory of best approximation from non-reflexive and non-factor-reflexive subspaces, and each case must be handled on an ad hoc basis. For some other recent examples we refer to [29], wherein it is observed that, while most of the standard Banach spaces are proximinal when embedded as subspaces of their second duals, it is possible for such an embedding to result in an anti-proximinal subspace.

Theorem. (Holmes-Kripke) Let Ω be a paracompact topological space (e.g. a metric space). Let M be the subspace of ℓ_R^∞(Ω) consisting of all (bounded) continuous functions on Ω. Then M is proximinal in ℓ_R^∞(Ω).

Proof. For any x ∈ ℓ_R^∞(Ω), define

x_*(t) ≡ lim inf {x(s): s → t},

x*(t) ≡ lim sup {x(s): s → t},

so that x_*(t) ≤ x(t) ≤ x*(t), ∀ t ∈ Ω. Then define

Δ = Δ(x) ≡ (1/2) ||x* − x_*||,

u = x_* + Δ, v = x* − Δ,

the norm ||·|| being the sup-norm. Then v ≤ u and we claim that Δ = d(x,M) and that y is a b.a. to x from M if and only if v ≤ y ≤ u everywhere on Ω.

First, a < d ~- d ( x , M ) , since for any > 0, ~ y ~ M such

that li x - Yli ~ d + ~, or

x- d- e < y < x + d + e.

Taking the lim sup of the left h a n d inequality, and the lim inf of

the right h a n d side y i e l d s

x* d - e < y < x, + d + ~,

whence

0 < x* - x, < 2d + 2 e ,

or A ! d + s. Next, if v < y < u, then

x- A <x* - A= v<y

< U = X~ + a < x + A,

whence d ! llx - Yli ~ A ! d. Finally, the e x i s t e n c e of s u c h a y

follows f r o m the Interposition Theorem of D i e u d o n n ~ [13, p. 75]

taking into a c c o u n t the p a r a c o m p a c t n e s s hypothesis and the fact that

u (resp. v) is isc (resp. usc) on ~, qed.
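The construction in this proof is easy to carry out numerically. The following sketch (in Python with numpy; the grid, the window size, and the test function are our own illustrative choices, not from the text) approximates x_*, x*, and A(x) for a step function on Ω = [0,1]; it recovers A = d(x,M) = 1/2, the distance from a unit jump to the continuous functions.

    import numpy as np

    # x = indicator of [1/2, 1]; x_* and x* are approximated by the
    # inf/sup of x over a small window around each grid point.
    t = np.linspace(0.0, 1.0, 2001)
    x = (t >= 0.5).astype(float)
    h = 3  # window half-width in grid points, standing in for s -> t
    x_low  = np.array([x[max(i-h,0):i+h+1].min() for i in range(len(t))])
    x_high = np.array([x[max(i-h,0):i+h+1].max() for i in range(len(t))])

    A = 0.5 * np.max(x_high - x_low)   # = 1/2 = d(x, M)
    u, v = x_low + A, x_high - A       # continuous y with v <= y <= u are the b.a.'s
    print(A, np.all(v <= u))           # 0.5 True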

Remark. The foregoing result has been extended in two different directions. First, Olech [52] has allowed the functions being approximated by the bounded continuous functions on Ω to take as values bounded subsets of some Euclidean space. Second, Holmes and Kripke [28] have developed a general approach to approximation by various subspaces of ℓ∞(Ω) which depends on the idea of "interposing" a function between two others (in the sense that y was interposed pointwise between u and v in the preceding proof).


Part IV

Comments on the Problems

Exercise 2. For fixed t, 0 < t < 1, we must show that A ≡ t int(K) + (1-t)cl(K) ⊂ int(K). Since A is open, it is sufficient to show A ⊂ K. Now x ∈ int(K) so t(int(K) - x) is an open θ-nbhd. Therefore, (1-t)cl(K) = cl((1-t)K) ⊂ (1-t)K + t(int(K) - x) = (1-t)K + t int(K) - tx ⊂ K - tx, qed.

Exercise 3. Let {a_α + b_α} be a net in A + B convergent to some x ∈ X. Apply compactness to the net {b_α} and then use continuity of addition in X.

Exercise 4. Since X is separable, every subset is a Lindelöf space. Apply 3i) and then apply Lindelöf's Theorem to the union of the complements of the resultant half-spaces.

Exercise 6. Let K₁,...,Kₙ be the sets in question. Then co(∪Kᵢ) is the image of the compact set {(x₁,t₁,...,xₙ,tₙ) ∈ (X × R¹)ⁿ: xᵢ ∈ Kᵢ, 0 ≤ tᵢ ≤ 1, t₁ +...+ tₙ = 1} under the continuous map (x₁,t₁,...,xₙ,tₙ) ↦ Σ tᵢxᵢ.

Exercise 7-2). Recall the implication of equality in Minkowski's inequality.

Exercise 9. U(X) is weakly compact so by 5e) K ≡ co(ext(U(X))) is w-dense in U(X). But cl(K) being closed and convex is w-closed, hence cl(K) = U(X).

Exercise 10. ext(U(C_R(Ω))) = {x ∈ C_R(Ω): |x(t)| = 1 ∀ t ∈ Ω}.

Exercise 11. Verify condition 5e-2) by use of the Corollary in 5d).

Exercise 12. Let x,y ∈ K and define φ(t) ≡ (1-t)f(x) + tf(y) - f((1-t)x + ty). Then f ∈ Conv(X) ⟺ φ(t) ≥ 0 = φ(0) for 0 ≤ t ≤ 1. If this latter condition holds then E(x,y) = φ'(0) ≥ 0. Conversely, if x,y ∈ K ⟹ E(x,y) ≥ 0, then φ(t) = (1-t)E(x+t(y-x),x) + tE(x+t(y-x),y) ≥ 0. Now if f is also of class C² on K and x,y ∈ K (x ≠ y), then ∃ t, 0 < t < 1, such that E(x,y) = (1/2)d²f(x+t(y-x))·(y-x, y-x), using Taylor's formula.

Exercise 14. For 1 < p < +∞, the formula for ∇f(x₀) is obtained by writing out the appropriate difference quotient and differentiating under the integral sign. For p = 1 the nas condition on x₀ is that μ({t: x₀(t) = 0}) = 0, and then ∇f(x₀) can be identified with the function sgn x₀ ∈ S(L∞(μ)).

Exercise 17. The function g(x) ≡ f(x₀+x) - f(x₀) is continuous at θ and -g(-x) ≤ -f'(x₀;-x) ≤ f'(x₀;x) ≤ g(x) by 7a) and 7b). Thus f'(x₀;·) is continuous at θ and hence at any x because of its sublinearity. Of course the criterion of 10d-2) is also applicable here.

Exercise 19. Without loss of generality assume x₀ = θ. Choose δ > 0 so that ||x|| < δ ⟹ |f(x) - f(θ)| < 1. Let V = (δ/2)U(X) and λ = 8/δ. Suppose ∃ x,y ∈ V such that f(y) - f(x) > λ||x - y||. Let α = δ/2||x - y||; then α(f(y) - f(x)) > 4. Suppose α ≤ 1; then 4 < f(y) - f(x) ≤ |f(y) - f(x)| ≤ |f(y) - f(θ)| + |f(x) - f(θ)| ≤ 2, a contradiction. Therefore α > 1. But, if z ≡ x + α(y - x), then y = (1/α)z + (1 - 1/α)x, so f(y) ≤ (1/α)f(z) + (1 - 1/α)f(x); this implies α(f(y) - f(x)) ≤ f(z) - f(x) and the same contradiction results since ||z|| < δ. (Halkin)

Exercise 20. First compute that

   ∇f(x)·y = 2∫₀ᵀ (x(t)y(t) + ẋ(t)ẏ(t)) dt = ⟨u, y⟩,

where

   u(t)/2 = (1+t)∫₀ᵀ x(s) ds - ∫₀ᵗ (t-s)x(s) ds + x(t) - x(0).

Applying 11d) we must choose x₀ ∈ X so that x₀(0) = 1 and the Riesz representer u for ∇f(x₀) is constant on [0,T]. We are led to the ODE ẍ - x = θ and finally to the solution

   x₀(t) = cosh(t) - tanh(T) sinh(t).

The value of the program is tanh(T).
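As a quick numerical sanity check (not part of the original solution; Python with numpy, T = 1.5 being our own choice):

    import numpy as np

    # f(x) = integral_0^T (x^2 + x'^2) dt at x0(t) = cosh(t) - tanh(T) sinh(t)
    T = 1.5
    t = np.linspace(0.0, T, 20001)
    x  = np.cosh(t) - np.tanh(T)*np.sinh(t)
    xd = np.sinh(t) - np.tanh(T)*np.cosh(t)
    y  = x**2 + xd**2
    value = np.sum((y[1:] + y[:-1])/2 * np.diff(t))   # trapezoidal rule
    print(value, np.tanh(T))                          # both approximately 0.90515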

Exercise 21. The solution is the point (2/~'~,2).

Exercise 24. By 14f), f** ≤ f, so f** ≤ co(f) ≤ f. Hence f* ≤ co(f)* ≤ f*** = f* by 14g). Therefore, co(f) = co(f)** = f**.

Exercise 26. Let K be the right hand side of (3). Then K° = (∪A_α)° = ∩A_α°° = ∩A_α by 15b). Now take polars and use 15b) again.

Exercise 27. 2) Let K be the cone {x ∈ X: f'(x₀;x) < 0}. We have C(x₀,f) ⊂ K by 7a). Now choose x̄ ∈ K; then ∃ ε > 0 such that 0 < t ≤ ε ⟹ f(x₀+tx̄) - f(x₀) ≡ -δ < 0. Let V be the x̄-nbhd. {x ∈ X: |f(x₀+εx̄) - f(x₀+εx)| < δ/2}. This V and ε meet the requirements of 16c).

Exercise 28. 2) By 16d) and Exercise 27 it is sufficient to show that x ∈ C(x₀,Ω) ⟹ g'(x₀;x) ≤ 0. We have x₀ + tx ∈ Ω for 0 < t ≤ ε, so g'(x₀;x) ≤ (g(x₀+tx) - g(x₀))/t ≤ 0. Now by hypothesis ∃ x₁ so that g(x₁) < g(x₀) and so g'(x₀;x₁-x₀) < 0. Since C(x₀,Ω) is open, ∃ λ > 1 so that z ≡ (x₁-x₀) + λ(x̄-x₁+x₀) ∈ C(x₀,Ω), and therefore g'(x₀;z) ≤ 0. Finally,

   g'(x₀;x̄) = g'(x₀; (1/λ)z + (1-1/λ)(x₁-x₀))
            ≤ (1/λ)g'(x₀;z) + (1-1/λ)g'(x₀;x₁-x₀) < 0.

Exercise 29. Any φ ∈ X* has the form

   φ(x,y) = ∫₀¹ x dμ₁ + ∫₀¹ y dμ₂

where μᵢ ∈ rca([0,1]). Let V₀ be the Volterra operator x ↦ ∫₀ᵗ x(s)ds, and δ₁ point evaluation at t = 1. Then if φ ∈ C((x₀,y₀),A)⊥ and ∫₀¹ y(t)dt = 0,

   0 = ∫₀¹ V₀(y) dμ₁ + ∫₀¹ y dμ₂,

so μ₂ = (cδ₁ - μ₁) ∘ V₀ for some constant c. Now take μ = μ₁ to prove (2) in 17c).

Exercise 30. Some members of M⊥ are

   dμ = (sin πs)(sin πt) ds dt;
   dμ = (1 - ns^{n-1})(1 - nt^{n-1}) ds dt;
   dμ = (δ₁(s) - δ₀(s))(δ₁(t) - δ₀(t)).

The answer is d(x₀,M) = 1/4.

Exercise 33. Of the several implications here only the one that (4) ⟹ support(μ) ⊂ {t ∈ Ω: x(t) = ±||x||} ≡ E⁺ ∪ E⁻ is (perhaps) unclear. Assuming, as we may, that ||x|| = 1,

   ||μ|| = μ(E⁺) + (-1)μ(E⁻)
         = μ⁺(E⁺) - μ⁻(E⁺) - μ⁺(E⁻) + μ⁻(E⁻),

whence

   0 ≤ μ⁺(Ω \ E⁺) + μ⁺(E⁻) = -μ⁻(Ω \ E⁻) - μ⁻(E⁺) ≤ 0,

so all terms here are zero, and the result follows.

Exercise 34. Let E be a Borel subset of [0,1]. Then

   ∫_E x dμ = ∫_E x (dμ/dt) dt.

Exercise 35. Let K be the set of all norm-preserving extensions of φ. Then K is a w*-compact U(X*)-extremal set and so 5c), 5d) apply.

Exercise 37. The condition of the corollary holds if and only if ∃ λ₁,...,λ_m > 0, λ₁ +...+ λ_m = 1, such that

   0 = Σⱼ₌₁ᵐ λⱼ ψⱼ(x-x₀) ψⱼ(x₁),
   . . . . . . . . . . . . . .
   0 = Σⱼ₌₁ᵐ λⱼ ψⱼ(x-x₀) ψⱼ(xₙ),

where m ≤ n + 1 (real scalars) or m ≤ 2n + 1 (complex scalars), ψⱼ ∈ ext(U(X*)), and |ψⱼ(x-x₀)| = ||x-x₀||. Here we have used 23a). Defining

   σⱼ = ψⱼ(x-x₀)/||x-x₀||,

we have |σⱼ| = 1 and φⱼ ≡ σ̄ⱼψⱼ ∈ ext(U(X*)). Also

   φⱼ(x-x₀) = σ̄ⱼψⱼ(x-x₀)
            = σ̄ⱼ|ψⱼ(x-x₀)| sgn(ψⱼ(x-x₀))
            = ||x-x₀||.

Hence the condition of the corollary is equivalent to the nas condition of the theorem in 23f).

Exercise 38. Recall that the coefficients λⱼ⁽ⁿ⁾ are all positive, and that Σⱼ λⱼ⁽ⁿ⁾ = ||φ|| independently of n. Now use the classical Weierstrass theorem to approximate any given x ∈ C_R([-1,1]) by a polynomial p. Then, as soon as n exceeds the degree of p (so that Λₙ(p) = φ(p)),

   |φ(x) - Λₙ(x)| ≤ |φ(x) - φ(p)| + |Λₙ(p) - Λₙ(x)| ≤ 2||p - x|| ||φ||,

which can be made arbitrarily small by proper choice of p.

Exercise 39. Writing t^{λⱼ} = exp(λⱼ log t) reduces 1) to a special case of 2), which in turn follows directly from the theorem in 25c). To prove 3), we write

   p(t) ≡ Σₖ₌₀ⁿ (aₖ cos(kt) + bₖ sin(kt))

for arbitrary but fixed real aₖ, bₖ; it must be shown that p has at most 2n distinct roots in the interval [0,2π). There are complex cₖ such that

   p(t) = e^{-int} Σₖ₌₀²ⁿ cₖ e^{ikt}.

Hence if

   q(z) ≡ Σₖ₌₀²ⁿ cₖ zᵏ,

then with z = exp(it),

   q(z) = exp(int) p(t).

Since q has at most 2n distinct roots, the same is true for p in the interval [0,2π). Finally, for 4), we observe that a linear combination of {cos(kt)}, k = 0,...,n (resp. {sin(kt)}, k = 1,...,n) is an even (resp. odd) trigonometric polynomial of degree n, so by 3) can have at most 2n roots in [-π,π).

Exercise 40. Assume, if possible, that

   D(s₁,...,sₙ) < 0 < D(t₁,...,tₙ).

Then ∃ λ, 0 < λ < 1, such that

   D(λs₁+(1-λ)t₁,...,λsₙ+(1-λ)tₙ) = 0;

since M is a Haar subspace, this entails λsᵢ + (1-λ)tᵢ = λsⱼ + (1-λ)tⱼ for some i < j. Therefore, 0 < λ(sⱼ-sᵢ) = (1-λ)(tᵢ-tⱼ) < 0, a contradiction.

Exercise 41. The b.a. to √t from P₁ on [0,1] is t + 1/8. The b.a. to |t| from P₁ (resp. P₂) on [-1,1] is 1/2 (resp. t² + 1/8).
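For the first of these, the equioscillation behavior required by 25d) can be observed numerically (a sketch in Python/numpy, not part of the original solution):

    import numpy as np

    # error of the claimed b.a. t + 1/8 to sqrt(t) on [0,1]
    t = np.linspace(0.0, 1.0, 100001)
    e = np.sqrt(t) - (t + 0.125)
    print(e.min(), e.max())   # approximately -0.125 and +0.125,
                              # attained at t = 0, 1 and at t = 1/4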

Exercise 42. The orthonormality of the indicated sequence in L²(μ) follows from the familiar formula

   ∫₀^π cos(ms) cos(ns) ds = (π/2) δ_{mn},   m, n ≥ 1,

by making the change of variable t = cos(s). The indicated sequence is complete in L²(μ), since C_R([-1,1]) is dense in L²(μ) (using Lusin's theorem), and the polynomials are dense in C_R([-1,1]).
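The displayed formula itself is easily confirmed numerically (Python/numpy sketch, not in the text):

    import numpy as np

    s = np.linspace(0.0, np.pi, 100001)
    for m in (1, 2, 3):
        for n in (1, 2, 3):
            y = np.cos(m*s) * np.cos(n*s)
            I = np.sum((y[1:] + y[:-1])/2 * np.diff(s))  # trapezoidal rule
            print(m, n, round(I, 6))   # pi/2 = 1.570796 if m = n, else 0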

Exercise 43. Assume that the nls X is rotund and that ||u+v|| = ||u|| + ||v|| for some u, v (≠ θ) in X. The function

   φ(t) ≡ ||(1-t)u/||u|| + t v/||v|| ||

is convex on [0,1] and φ(0) = φ(1) = 1. We will show that φ(1/2) = 1; this will imply that φ is constantly equal to one on [0,1] and contradicts rotundity unless u/||u|| = v/||v||. Assuming, as we may, that ||u|| = 1, we must show that β ≡ ||u + v/||v|| || = 2 (it's certainly ≤ 2). If ||v|| ≥ 1, then β ≥ ||u+v|| - ||v - v/||v|| || = 1 + ||v|| - (1 - ||v||⁻¹)||v|| = 2. And if ||v|| ≤ 1, then || ||v||u + v || ≥ ||u+v|| - ||u - ||v||u|| = 1 + ||v|| - (1 - ||v||) = 2||v||, qed.

Exercise 45. By 23c), we may assume that X is n-dimensional. Suppose that m < n is the maximum cardinality of linearly independent subsets of ext(U(X*)), and let {φ₁,...,φ_m} be such a subset. Then every φ ∈ ext(U(X*)) belongs to span({φ₁,...,φ_m}). But, by 23d), every φ ∈ S(X*) belongs to co(ext(U(X*))). Consequently, X* = span({φ₁,...,φ_m}), so dim(X) = dim(X*) = m < n, a contradiction.

Exercise 46. Let M be any proper finite dimensional subspace of X. Then ∃ φ ∈ S(M⊥) ⊂ S(X*) = ext(U(X*)). Then the definition in 28e) cannot be satisfied if, in the set {φ₁,...,φₙ} there, we take φ₁ ∈ S(M⊥) and c₁ ≠ 0.

Exercise 47. Suppose that M is an infinite dimensional subspace of c₀ and that y ≡ (yᵢ) is a b.a. to x ≡ (xᵢ) from M. Choose an index n such that

   sup {|xᵢ - yᵢ|: i > n} < ||x - y||.

Then, because M is infinite dimensional, ∃ ỹ ∈ M such that ỹ ≠ θ but ỹᵢ = 0, i ≤ n. (Because, if y ∈ M and yᵢ = 0, i ≤ n, together imply y = θ, then M meets the n-codimensional subspace {y ∈ c₀: yᵢ = 0, i ≤ n} only in θ, whence dim(M) ≤ n < +∞.) It follows that y + tỹ is also a b.a. to x from M if t is sufficiently near 0, and so M is not Chebyshev. On the other hand, given a positive integer n, choose 0 < t₁ <...< tₙ < 1 and define M to be span({x₁,...,xₙ}), where the ith-coordinate of xₖ is (tₖ)ⁱ. Using that c₀* = ℓ¹ and that ext(U(ℓ¹)) consists of {αeₙ: |α| = 1}, where eₙ is the nth standard unit vector in ℓ¹, it is seen that M is actually interpolating, and hence Chebyshev by 28f).

Remark. In fact, it can be shown, using in part 28f) and the first half of the preceding exercise, that the only Chebyshev subspaces of c₀ are the interpolating subspaces ([3, p. 167]).

Exercise 49. Recalling the formula M⊥ ≅ (X/M)*, we consider the given φ ∈ M⊥ as a functional on X/M. Since this space is reflexive, ∃ z ∈ S(X/M) such that φ(z) = ||φ||. Now z is a translate of M, and since M is proximinal, z has an element x of minimal norm: ||z|| = ||x|| = 1. Thus φ(x) = ||φ|| also.

Bibliography

1) R. Arens and J. Kelley, Characterizations of the space of


continuous functions over a compact Hausdorff space. Trans.
Amer. Math. Soc. 62(1947), 499-508.

2) E. Asplund, Averaged norms. Israel J. Math. 5(1967), 227-233.

3) D. Ault, F. Deutsch, P. Morris, and J. Olson, Interpolating sub-


spaces in approximation theory. J. Approx. Th. 3(1970),
164-182.

4) N. Bourbaki, Éléments de mathématique, Livre V, Espaces
vectoriels topologiques. Hermann et Cie, Act. Sci. et Ind.
1189, Paris, 1953.

5) L. deBranges, The Stone-Weierstrass theorem. Proc. Amer. Math.


Soc. 10(1959), 822-824.

6) A. Brøndsted, Conjugate convex functions in topological vector
spaces. Mat.-Fys. Medd. Dansk. Vid. Selsk. 34(1964), 1-26.

7) and R. T. Rockafellar, On the subdifferentiability


of convex functions. Proc. Amer. Math. Soc. 16(1965),
605-611.

8) R. C. Buck, Applications of duality in approximation theory.


p. 27-44 in Approximation of Functions (H. Garabedian, Ed.),
Elsevier, Amsterdam-London-New York, 1965.

9) E. W. Cheney, Introduction to Approximation Theory. McGraw-Hill,


New York, 1966.

I0) G. Choquet, Lectures on Analysis (Vol. II). Benjamin, New York,


1969.

II) D. Cudia, Rotundity. p. 73-97 in Convexity (V. Klee, Ed.),


Amer. Math. Soc., Providence, 1963.

12) J. Descloux, Approximations in Lᵖ and Chebyshev approximations.
J. Soc. Ind. App. Math. 11(1963), 1017-1026.

13) J. Dieudonné, Une généralisation des espaces compacts. J. Math.
Pures Appl. 23(1944), 65-76.

14) A. Dubovitskii and A. Milyutin, Extremum problems in the


presence of constraints. Zh. Vychisl. Mat. Mat. Fiz.
5(1965), 395-453. (Russian)

15) N. Dunford and J. Schwartz, Linear Operators, Part I. Inter-


science, New York, 1958.

16) W. Fenchel, On conjugate convex functions. Canad. J. Math.


1(1949), 73-77.

17) T. Flett, On differentiation in normed vector spaces. J. Lon.


Math. Soc. 42(1967), 523-533.

18) A. Garkavi, On Chebyshev and almost Chebyshev subspaces. Izv.


Akad. Nauk SSSR Ser. Mat. 28(1964), 799-818. (Russian)

19) , Uniqueness of solutions of the L-problem of
moments. Izv. Akad. Nauk SSSR Ser. Mat. 28(1964), 553-570.
(Russian)

20) A. Geoffrion, Duality in nonlinear programming: a simplified
applications-oriented development. Soc. Ind. App. Math.
Rev. 13(1971), 1-37.

21) I. Girsanov, Lectures on the Mathematical Theory of Extremal


Problems. University of Moscow, Moscow 1970. (Russian)

22) I. Gohberg and M. Krein, Introduction to the Theory of Linear


Nonselfadjoint Operators. Amer. Math. Soc., Providence,
1969.

23) H. Halkin, A satisfactory treatment of equality and operator


constraints in the Dubovitskii-Milyutin optimization
formalism. J. Optim. Th. Appl. 6(1970), 138-149.

24) M. Hebden, A bound on the difference between the Chebyshev
norm and the Hölder norms of a function. Soc. Ind. App.
Math. J. Num. Anal. 8(1971), 270-277.

25) R. Holmes, Approximating best approximations. Nieuw Arch. voor


Wisk. 14(1966), 106-113.

26) , Smoothness indices for convex functions and the


unique Hahn-Banach extension problem. Math. Z. 119(1971),
95-110.

27) and B. Kripke, Approximation of bounded


functions by continuous functions. Bull. Amer. Math. Soc.
71(1965), 896-897.

28) , Interposition and approximation. Pac. J. Math.


24(1968), 103-110.

29) , Best approximation by compact operators. Ind.


Univ. Math. J. (to appear).

30) L. Hurwicz, Programming in linear spaces. Ch. 2 in Studies in


Linear and Nonlinear Programming (by K. Arrow, L. Hurwicz,
and H. Uzawa). Stanford Univ. Press, Stanford, 1958.

31) A. Ioffe and V. Tikhomirov, Duality of convex functions and


extremal problems. Russian Math. Surveys 23(1968), 53-124.

32) R. James, Characterizations of reflexivity. Studia Math.


23(1964), 205-216.

33) L. Karlovitz, Construction of nearest points in the Lᵖ, p
even, and L∞ norms. I. J. Approx. Th. 3(1970), 123-127.

34) J. Kelley and I. Namioka, Linear Topological Spaces. Van


Nostrand, Princeton, 1963.

35) J. Kingman and A. Robertson, On a theorem of Lyapunov. J. Lon.


Math. Soc. 43(1968), 347-351.

36) V. Klee, Extremal structure of convex sets. II. Math. Z.


69(1958), 90-104.

37) B. Kripke, Best approximation with respect to nearby norms.
Num. Math. 6(1964), 103-105.

38) J. Lindenstrauss, A short proof of Lyapunov's convexity


theorem. J. Math. Mech. 15(1966), 971-972.

39) , On nonseparable reflexive Banach spaces. Bull.
Amer. Math. Soc. 72(1966), 967-970.

40) and R. Phelps, Extreme point properties of convex


bodies in reflexive Banach spaces. Israel J. Math. 6(1968),
39-48.

41) C. Lobry, Étude Géométrique des Problèmes d'Optimisation en
Présence de Contraintes. Université de Grenoble, 1967.

42) D. Luenberger, Optimization by Vector Space Methods. Wiley,


New York, 1969.

43) L. Liusternik and V. Sobolev, Elements of Functional Analysis.


Ungar, New York, 1961.

44) O. Mangasarian, Nonlinear Programming. McGraw-Hill, New York,


1969.

45) G. Minty, On the monotonicity of the gradient of a convex
function. Pac. J. Math. 14(1964), 243-247.

46) B. Mitjagin, The extremal points of a certain family of


convex functions. Sibirsk. Math. Zh. 6(1965), 556-563.
(Russian)

47) J. Moreau, Fonctions Convexes en Dualité. Faculté des Sciences,
Université de Montpellier, 1962.

48) , Fonctionnelles sous-différentiables. C. R. Acad.
Sci. Paris 257(1963), 4117-4119.

49) , Sous-différentiabilité. Proc. Coll. Convexity,
Copenhagen 1965(1967), 185-201.

50) , Fonctionnelles Convexes. Séminaire "Équations
aux Dérivées Partielles". Collège de France, 1966.

51) I. Natanson, Constructive Theory of Functions. Ungar, New York,


1964.

52) C. Olech, Approximation of set-valued functions by continuous
functions. Coll. Math. 19(1968), 285-293.

53) J. Peetre, Approximation of norms. J. Approx. Th. 3(1970),


243-260.

54) E. Peterson, An economic interpretation of duality in linear


programming. J. Math. Anal. Appl. 30(1970), 172-196.

55) , Symmetric duality for generalized unconstrained


geometric programming. SIAM J. App. Math. 19(1970),
487-526.

56) R. Phelps, Subreflexive normed linear spaces. Arch. der Math.


8(1957), 444-450.

57) , Some subreflexive Banach spaces. Arch. der
Math. 10(1959), 162-169.

58) , Uniqueness of Hahn-Banach extensions and


unique best approximation. Trans. Amer. Math. Soc.
95(1960), 238-255.

59) , Lectures on Choquet's Theorem. Van Nostrand,


Princeton, 1966.

60) W. Porter, Modern Foundations of Systems Engineering.


Macmillan, New York, 1966.

61) M. Powell, On the maximum errors of polynomial approximations
defined by interpolation and by least squares criteria.
Comp. J. 9(1967), 404-407.

62) B. Pshenichnii, Convex programming in a normed space.


Kibernetika 1(1965), 46-54. (Russian).

63) J. Rice, Tchebycheff approximation in several variables.


Trans. Amer. Math. Soc. 109(1963), 444-466.

64) T. Rivlin, Polynomials of best uniform approximation to certain


rational functions. Num. Math. 4(1962), 345-349.

65) and H. Shapiro, A unified approach to certain


problems of approximation and minimization. J. Soc. App.
Ind. Math. 9(1961), 670-699.

66) R. T. Rockafellar, An extension of Fenchel's duality theorem
for convex functions. Duke Math. J. 33(1966), 81-90.

67) , Level sets and continuity of conjugate convex


functions. Trans. Amer. Math. Soc. 123(1966), 46-63.

68) , Characterization of the subdifferentials of


convex functions. Pac. J. Math. 17(1966), 497-510.

69) , Convex functions, monotone operators, and


variational inequalities, p. 35-65 in Theory and
Applications of Monotone Operators. Proc. NATO Advanced
Study Institute, Venice, 1968.

70) , Convex Analysis. Princeton University Press,


Princeton, 1970.

71) H. Schaefer, Topological Vector Spaces. Macmillan, New York,


1966.

72) I. Singer, Best Approximation in Normed Linear Spaces by


Elements of Linear Subspaces. Springer, Berlin-Heidelberg,
1970.

73) S. Stechkin and L. Taikov, On minimal extensions of linear


functionals. Trudy Mat. Inst. Steklov 78(1965), 12-23.

74) J. Tate, On the relation between extremal points of convex


sets and homomorphisms of algebras. Comm. Pure Appl.
Math. 4(1951), 31-32.

75) E. Titchmarsh, The Theory of Functions, 2nd Ed. Oxford Univ.


Press, Oxford, 1939.

76) M. Vlach, On necessary conditions of optimality in linear


spaces. Comm. Math. Univ. Carolinae 11(1970), 501-513.

77) J. Weston, A note on the extension of linear functionals. Amer.


Math. Monthly 67(1960), 444-445.

78) K. Yosida, Functional Analysis. Academic Press, New York, 1965.


Part V

Selected Special Topics

In this final supplementary part of these notes we consider, in

varying degrees of detail, a variety of special topics in approxima-

tion and optimization. For the most part they represent areas of

current and active research interest. Consequently, our aim is not

to present definitive treatments, but rather to alert the reader who

has come this far to the existence of several further areas for

study, to indicate a few of the results already known (and when

possible, to incorporate these results within the framework of

Parts I-III), and to provide some pertinent bibliographical references.

§31. E-spaces

The special class of Banach spaces to be defined next, the so-

called "E-spaces", appears to be the maximal satisfactory class of

Banach spaces for which all convex norm-minimization problems are

"strongly solvable" and all convex b.a. problems are "well posed"

in the sense of Hadamard (definitions below). By "satisfactory" we

mean that several different characterizations are known (over a

dozen as a matter of fact), that numerous concrete examples are

available, and that the E-property is stable wrt the formation of

subspaces, quotients and products.

a) Definition. Let (Ω, d) be a metric space, and Ω₀ a subset of Ω. Ω₀ is boundedly compact if its intersection with every closed ball in Ω is compact. Ω₀ is approximatively compact if for any x ∈ Ω, every minimizing sequence in Ω₀ (i.e., every sequence {xₙ} ⊂ Ω₀ for which d(x,xₙ) → d(x,Ω₀)) has a cluster point in Ω₀.

It is clear that bounded compactness ⟹ approximative compactness ⟹ proximinality in any nls (or in any metric space, for that matter). Simple examples in Hilbert space show, however, that neither of these implications is reversible.

b) Definition. A (real) Banach space X is an E-space if X is rotund and every weakly closed set in X is approximatively compact.

Such spaces were first introduced by Fan and Glicksberg [5], and characterized in several ways. We will establish one of their characterizations next. It depends in part on the theorem of James, alluded to in 30b), that a Banach space X is reflexive if (and only if) every element in X* attains its norm on U(X). We use the notation xₙ ⇀ x to denote weak convergence in a nls.

Theorem. A (real) Banach space X is an E-space if and only if X is reflexive, rotund, and xₙ, x ∈ S(X), xₙ ⇀ x, implies ||xₙ - x|| → 0 (that is, weak sequential convergence within S(X) entails norm convergence).

Proof. Using the weak compactness of closed balls in a reflexive space, it is straightforward to verify that these conditions imply the E-property. For the converse, reflexivity is obtained by applying James' Theorem and the approximative compactness of (closed) hyperplanes. Now let ||xₙ|| = ||x|| = 1, and xₙ ⇀ x. Choose φ ∈ S(X*) such that φ(x) = 1. Then φ(xₙ) → φ(x) = 1, so we may assume that all φ(xₙ) > 0. Let x̃ₙ = xₙ/φ(xₙ), so {x̃ₙ} ⊂ H ≡ {z ∈ X: φ(z) = 1}. Then {x̃ₙ} is a minimizing sequence for θ in H, and since H is approximatively compact by hypothesis, we may assume that x̃ₙ → x̃ ∈ H. But since x̃ₙ ⇀ x also, we have x̃ = x. Therefore, xₙ → x (norm convergence), qed.
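For example, in a Hilbert space the third condition is automatic: if xₙ, x ∈ S(X) and xₙ ⇀ x, then

   ||xₙ - x||² = ||xₙ||² - 2 re⟨xₙ, x⟩ + ||x||² → 1 - 2 + 1 = 0.

Since Hilbert space is also reflexive and rotund, the theorem exhibits it as an E-space; the spaces Lᵖ(μ), 1 < p < ∞, satisfy the same three conditions (cf. g) below).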

c) Before establishing our second characterization of E-spaces, we must introduce a stronger form of the definition in 27d) of a smooth nls.

Definition. A nls X is called strongly smooth if its norm is Frechet-differentiable on the open set X \ {θ}.

Lemma. (Shmulian) The norm is Frechet differentiable at x ∈ S(X) if and only if any sequence {φₙ} ⊂ U(X*) for which φₙ(x) → 1 is convergent.

Proof. Assume that the norm has a Frechet differential φ ∈ S(X*); we will show that any sequence {φₙ} as described in the Lemma must converge to φ. Since 1 ≥ ||φₙ|| ≥ φₙ(x) → 1, we may suppose that ||φₙ|| = 1. Now, if ||φₙ - φ|| ↛ 0, then ∃ ε > 0 and {zₙ} ⊂ S(X) such that (φₙ - φ)(zₙ) ≥ 2ε. Define

   xₙ = (1/ε)(||x|| - φₙ(x)) zₙ.

Then xₙ → θ, but

   (||x + xₙ|| - ||x|| - φ(xₙ))/||xₙ||
      ≥ (φₙ(x + xₙ) - 1 - φ(xₙ))/||xₙ||
      = ε(φₙ(x) - 1)/(1 - φₙ(x)) + (φₙ - φ)(zₙ)
      = (φₙ - φ)(zₙ) - ε ≥ ε,

which contradicts that φ is the Frechet differential of the norm at x.

Conversely, assume that the condition of the Lemma is satisfied. Then at least the norm has a gradient φ at x. For otherwise, by 10c), there are two distinct norm-subgradients φ, ψ at x, and then the sequence {φ, ψ, φ, ψ,...} violates the condition of the Lemma. Now if the norm is not Frechet differentiable at x, then ∃ ε > 0 and {xₙ} ⊂ X, xₙ → θ, such that

   (||x + xₙ|| - ||x|| - φ(xₙ))/||xₙ|| ≥ ε,

or

   ||x + xₙ|| - φ(x + xₙ) ≥ ε||xₙ||.

Choose φₙ ∈ S(X*) such that φₙ(x + xₙ) = ||x + xₙ||. Then

   φₙ(x) = ||x + xₙ|| - φₙ(xₙ) → ||x||,

since xₙ → θ. But

   ||φₙ - φ|| ≥ (φₙ - φ)(xₙ/||xₙ||) ≥ (φ(x) - φₙ(x))/||xₙ|| + ε ≥ ε,

since φ(x) = ||x|| ≥ φₙ(x), and so {φₙ} does not converge to φ. But this means that {φₙ} is not convergent, since any limit of {φₙ} must be a norm-subgradient at x. Thus we again arrive at a contradiction.

Corollary. Let X be a strongly smooth nls. Then the map which sends each x (≠ θ) in X into the gradient of the norm at x (= the Frechet differential of the norm at x) is continuous.

Proof. This gradient map must at least be continuous when X* is given its w*-topology, since its range lies in S(X*) ⊂ U(X*), and this latter set is w*-compact. But then the Lemma immediately implies that, in fact, the map is continuous when X* is given its norm topology.

Thus if X is strongly smooth, its norm is actually continuously Frechet differentiable on the open set X \ {θ}. It also follows from the Lemma that the norm is nowhere Frechet differentiable in such function spaces as C_R([0,1]) and L∞([0,1]).


d) Theorem. (Anderson) A (real) Banach space X is an E-space if and only if X* is strongly smooth.

Proof. If X is an E-space, then by the theorems in b) and 27d), X is reflexive and X* is smooth. We verify the Shmulian criterion at φ ∈ S(X*) by choosing {xₙ} ⊂ U(X) with φ(xₙ) → 1, and showing that {xₙ} is convergent. Now 1 ≥ ||xₙ|| ≥ φ(xₙ) → 1, so ||xₙ|| → 1. Let x̃ₙ ≡ xₙ/||xₙ||, and let x̃ be a weak cluster point of {x̃ₙ}. Then ||x̃|| ≤ 1, but φ(x̃) = lim φ(xₙ)/||xₙ|| = 1, so ||x̃|| = 1. Now the E-property implies that x̃ is a norm-cluster point of {x̃ₙ} (note that we have used the Eberlein theorem here, namely that U(X) is weakly sequentially compact). Because X* is smooth, x̃ is uniquely specified by the conditions: x̃ ∈ S(X) and φ(x̃) = 1. Therefore, x̃ₙ → x̃, and so xₙ → x̃ also. The converse implication is proved similarly, again making use of the Shmulian criterion.

e) We next want to mention some of the significance of E-spaces in optimization theory. Other uses of E-spaces are pointed out in §32-35 below.

Consider a variational pair (Ω,f) (11a)) where Ω is a metric space. We assume that the set of solutions of the associated mathematical program is a non-empty set Ω₀ ⊂ Ω. A minimizing sequence for (Ω,f) is any sequence {xₙ} ⊂ Ω for which f(xₙ) → inf f(Ω).

Definition. Such a mathematical program is called stable if every minimizing sequence {xₙ} for (Ω,f) satisfies d(xₙ,Ω₀) → 0. If the solution set Ω₀ is a singleton set, and the program is stable, it is called strongly solvable.

Theorem. A (real) Banach space X is an E-space if and only if every convex program (X, ||·|| + δ_K), where K is a closed convex subset of X, is strongly solvable.

Proof. The proof is straightforward; to show that this condition implies the E-property it is enough, by the proof in b), to show that every hyperplane K ⊂ X is approximatively compact.

Remark. A more general result has recently been obtained by Asplund and Rockafellar [1]. Namely, if X is a Banach space, and f a lsc function in Conv(X), then the convex program (X,f) is strongly solvable if and only if f* is Frechet differentiable at θ in X* (and then the Frechet differential is shown to belong to X, rather than just X**).

Corollary. (Regularization Algorithm for Convex Programs). Let f and X be as in the Remark, with X an E-space. Let {γₙ} decrease to the value of the convex program (X,f). Then the convex programs (X, ||·|| + δ_Ωₙ), where Ωₙ ≡ {x ∈ X: f(x) ≤ γₙ}, have unique solutions, and the resulting sequence converges to the element of minimal norm in the solution set Ω₀.

Thus any method of minimizing the norm in X over the convex sets Ωₙ leads to approximate solutions for the original program (X,f). This Corollary has been stated by Sholohovich [11].
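The algorithm is easy to run in a toy setting. In the sketch below (Python with numpy/scipy; the instance f, the sequence γₙ, and the use of ||x||² in place of ||x|| — which has the same minimizers — are our own illustrative choices, not from the text), X = R² and f(x) = (x₁ + x₂ - 2)², whose solution set is the line x₁ + x₂ = 2; the iterates approach its minimal-norm point (1,1).

    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: (x[0] + x[1] - 2.0)**2
    gammas = 2.0**(-np.arange(1, 15))     # gamma_n decreasing to inf f = 0
    x = np.zeros(2)
    for g in gammas:
        # minimize the norm over the level set Omega_n = {f <= gamma_n}
        res = minimize(lambda x: x @ x, x,
                       constraints=[{'type': 'ineq',
                                     'fun': lambda x, g=g: g - f(x)}])
        x = res.x
    print(x)                              # approximately (1, 1)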

f) It remains to give some examples of E-spaces. Once an initial class of E-spaces has been discovered, many other E-spaces may be constructed by use of the following operations.

Theorem. The E-property of Banach spaces is hereditary and divisible (that is, if M is a closed subspace of the E-space X, then M and X/M are E-spaces), and productive (in the sense that if X₁, X₂,... are all E-spaces, then P₂(Xₙ) is again an E-space).

(We recall that P₂(Xₙ) ≡ {(x₁,x₂,...): xₙ ∈ Xₙ and Σ||xₙ||² < +∞}, with

   ||(x₁,x₂,...)|| = (Σ||xₙ||²)^{1/2};

P₂(Xₙ) is called the ℓ²-product of the Banach spaces {Xₙ}.)

Proof. The first two assertions follow readily from the E-space characterizations in b) and d), respectively. Now since each Xₙ is in particular reflexive, and since the map

   (φ₁,φ₂,...) ↦ φ,   φ(x) ≡ φ((x₁,x₂,...)) = Σₙ₌₁^∞ φₙ(xₙ),

is an isometric isomorphism from P₂(Xₙ*) onto P₂(Xₙ)* (check!), we see that P₂(Xₙ) is reflexive. Also, by use of the Schwarz inequality in ℓ², we see that P₂(Xₙ) is rotund (since each Xₙ is). Finally, to complete the proof by use of b), suppose {x, x⁽ᵐ⁾} ⊂ S(P₂(Xₙ)) and x⁽ᵐ⁾ ⇀ x; we must show that ||x⁽ᵐ⁾ - x|| → 0. Given ε > 0, choose an index n₀ such that Σ_{n>n₀} ||xₙ||² < ε. Next, since xₙ⁽ᵐ⁾ ⇀ xₙ for each n (on account of the formula for P₂(Xₙ)*), we have

   lim inf_{m→∞} ||xₙ⁽ᵐ⁾|| ≥ ||xₙ||,

and so ||xₙ⁽ᵐ⁾|| → ||xₙ|| for each n (since Σₙ||xₙ⁽ᵐ⁾||² = 1 = Σₙ||xₙ||²). Therefore, because each Xₙ is an E-space, the result in b) implies ||xₙ⁽ᵐ⁾ - xₙ|| → 0, for each n. Thus we may choose an integer m₀ so that

   Σ_{n=1}^{n₀} ||xₙ⁽ᵐ⁾ - xₙ||² < ε

for m > m₀. Now, this implies in turn

   Σ_{n=1}^{n₀} (||xₙ⁽ᵐ⁾|| - ||xₙ||)² < ε,

and so

   1 - ε < Σ_{n=1}^{n₀} ||xₙ||² ≤ Σ_{n=1}^{n₀} ||xₙ⁽ᵐ⁾||² + 2√ε.

Consequently, Σ_{n>n₀} ||xₙ⁽ᵐ⁾||² < ε + 2√ε, and hence

   ||x⁽ᵐ⁾ - x||² = Σₙ₌₁^∞ ||xₙ⁽ᵐ⁾ - xₙ||²
      ≤ Σ_{n=1}^{n₀} ||xₙ⁽ᵐ⁾ - xₙ||² + 2 Σ_{n>n₀} (||xₙ⁽ᵐ⁾||² + ||xₙ||²)
      < ε + 2(ε + 2√ε) + 2ε,

whenever m > m₀, qed.


g) For most practical purposes the standard examples of E-spaces actually have the stronger property of "uniform rotundity", although there certainly are E-spaces without this property (the simplest example perhaps being P₂(ℓⁿ(2)) [3]). A discussion of both some uses and some limitations of uniformly rotund spaces in approximation theory is given in [6, 7].

Definition. Let K be a closed convex subset of a nls X. K is uniformly rotund if there is a non-decreasing function δ on [0,+∞), with δ(0) = 0, δ(t) > 0 for t > 0, such that

   (x + y)/2 + z ∈ K

whenever x,y ∈ K and ||z|| ≤ δ(||x-y||). By abuse of language, X is called uniformly rotund if U(X) satisfies this condition.

Clearly a uniformly rotund convex set is rotund (27b)). This definition is equivalent to the more usual definitions that X is uniformly rotund if either 1) for ε > 0 ∃ δ > 0 such that x,y ∈ U(X) and ||(x+y)/2|| > 1 - δ ⟹ ||x-y|| < ε; or 2) {xₙ,yₙ} ⊂ S(X) and ||xₙ + yₙ|| → 2 ⟹ ||xₙ - yₙ|| → 0.

Theorem. A uniformly rotund Banach space is an E-space.

Proof. We consider that the Banach space X is canonically embedded in X**. If X is not reflexive, ∃ x̂ ∈ S(X**) such that d(x̂,U(X)) ≡ 2ε > 0. If V is any w*- x̂-nbhd., then x̂ ∈ w*-cl(V ∩ U(X)), by the Goldstine density theorem [4, p. 424]. Let δ' ≡ 2δ(ε), where δ(ε) is as in the definition 1) of uniform rotundity. Choose φ ∈ S(X*) such that |x̂(φ) - 1| < δ'/2. Then define

   V = {ŷ ∈ X**: |ŷ(φ) - 1| < δ'/2}.

Now if x,y ∈ V ∩ U(X), then

   ||x+y|| ≥ |φ(x) + φ(y)| > 2 - δ',

whence ||x-y|| < ε. Thus for any fixed such x,

   V ∩ U(X) ⊂ x + ε U(X**).

Since the right hand side here is w*-closed, it follows that x̂ ∈ x + ε U(X**), that is, ||x̂ - x|| ≤ ε, a contradiction.

To complete the proof, we use definition 2) of uniform rotundity. Suppose that {xₙ, x} ⊂ S(X) and that xₙ ⇀ x. Choose φ ∈ S(X*) such that φ(x) = 1. Then

   ||xₘ + xₙ|| ≥ |φ(xₘ) + φ(xₙ)| → 2|φ(x)| = 2 as m,n → +∞,

and so ||xₘ - xₙ|| → 0. Since X is complete, xₙ → x' ∈ S(X); but also xₙ ⇀ x, whence x' = x, qed.

We have shown that uniform rotundity implies the E-property by verifying the conditions of the theorem in b). It is also possible to verify directly the conditions of the original definition in b), and this would be a bit shorter (cf. [2, p. 22]). The main interest in the above proof is that it establishes reflexivity independently of James' theorem, which was needed in b).

Example. For 1 < p < +∞, the spaces Lᵖ(μ), W^{p,k}(G), and S_p (defined in 27d)) are all uniformly rotund. (The elements of Lᵖ(μ) can even be vector-valued, taking values in some fixed uniformly rotund Banach space [9].) In particular, Hilbert space is uniformly rotund.
The last assertion follows readily from the parallelogram law.
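Explicitly, if x, y ∈ U(X) and ||x - y|| ≥ ε, then

   ||(x + y)/2||² = (||x||² + ||y||²)/2 - ||(x - y)/2||² ≤ 1 - ε²/4,

so that δ(ε) = 1 - (1 - ε²/4)^{1/2} serves in definition 1).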

The remaining assertions all hinge on the uniform rotundity of Lᵖ(μ). The most direct proof of this fact seems to be the one given recently by Morawetz [10]. The case of W^{p,k}(G) then follows from this and the easily checked fact that the finite ℓᵖ-product of uniformly rotund spaces is still uniformly rotund. Finally, the uniform rotundity of the operator spaces S_p has been established (at some length) by McCarthy [8].

We might also take note of one other class of uniformly rotund Banach spaces. Namely, any finite dimensional rotund nls is actually uniformly rotund. This follows easily from the compactness of the unit ball in such a space.



References for §31

1) E. Asplund and R. T. Rockafellar, Gradients of convex functions.


Trans. Amer. Math. Soc. 139(1969), 443-467.

2) E. W. Cheney, Introduction to Approximation Theory. McGraw-Hill,


New York, 1966.

3) M. Day, Reflexive Banach spaces not isomorphic to uniformly
convex spaces. Bull. Amer. Math. Soc. 47(1941), 313-317.

4) N. Dunford and J. Schwartz, Linear Operators, Part I. Inter-


science, New York, 1958.

5) K. Fan and I. Glicksberg, Some geometric properties of the spheres


in a normed linear space. Duke Math. J. 25(1958), 553-568.

6) R. Holmes, Approximating best approximations. Nieuw Arch. voor


Wisk. 14(1966), 106-113.

7) and B. Kripke, Smoothness of approximation, Mich.


Math. J. 15(1968), 225-248.

8) C. McCarthy, c_p. Israel J. Math. 5(1967), 249-271.

9) E. McShane, Linear functionals on certain Banach spaces. Proc.


Amer. Math. Soc. 1(1950), 402-408.

10) C. Morawetz, Two Lᵖ inequalities. Bull. Amer. Math. Soc.


75(1969), 1299-1302.

11) V. Sholohovich, Unstable extremal problems and geometric


properties of Banach spaces. Soviet Math. Dokl. 11(1970),
1470-1472.

§32. Metric projections

a) Definition. Let M be a subset of a nls X, and define

   P_M(x) ≡ {y ∈ M: ||x - y|| = d(x,M)}.

This set-valued mapping on X, x ↦ P_M(x), is called the metric projection on M.

It is clear that P_M(x) is a closed and bounded (but possibly void) subset of M, and is convex whenever M is convex. When M is a Chebyshev set, P_M is a single-valued mapping of X onto M, sometimes known as the "Chebyshev map", "best approximation operator", or "proximity map". It is a natural object of study in trying to understand the nature of a particular best approximation problem defined by some set M. We especially wish to learn for which sets of approximators M the metric projection is linear, or differentiable, or (at least) continuous. That there should be any question about the continuity of P_M when, say, M is a Chebyshev subspace of a Banach space, may seem surprising, but it turns out that even this modest property of best approximation can fail in general.

We consider first by far the most satisfactory setting for metric projections, namely, the case where X is an inner product space.

Example. Let K be a complete convex subset of an inner product space X. Then K is a Chebyshev set and P_K is a contraction on X:

   (1)   ||P_K(x) - P_K(y)|| ≤ ||x - y||.

When K is in addition a linear subspace of X, then P_K is the usual orthogonal projection of X onto K.

Let us just prove the first statement; the proof depends on the characterization of b.a.'s in 22d). Given x, y ∈ X, we have

   re ⟨P_K(x) - x, P_K(y) - P_K(x)⟩ ≥ 0,
   re ⟨y - P_K(y), P_K(y) - P_K(x)⟩ ≥ 0.

Addition of these two inequalities yields

   re ⟨y - x, P_K(y) - P_K(x)⟩ + re ⟨P_K(x) - P_K(y), P_K(y) - P_K(x)⟩ ≥ 0,

whence by the Schwarz inequality,

   ||x - y|| ||P_K(x) - P_K(y)|| ≥ ||P_K(x) - P_K(y)||²,

qed. This argument shows that equality can occur in (1) only if d(x,K) = d(y,K). It also shows that the metric projection P_K is a monotone mapping on X, since

   re ⟨y - x, P_K(y) - P_K(x)⟩ ≥ ||P_K(y) - P_K(x)||² ≥ 0.
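Both properties are easy to watch numerically. In the sketch below (Python with numpy; the set K and the random test points are our own choices, not from the text), K is the Euclidean unit ball in R², for which P_K(x) = x/max(1, ||x||) in closed form; the contraction inequality (1) and the monotonicity inequality are checked on random pairs.

    import numpy as np

    rng = np.random.default_rng(0)
    P = lambda x: x / max(1.0, np.linalg.norm(x))
    for _ in range(1000):
        x, y = rng.normal(size=2), rng.normal(size=2)
        assert np.linalg.norm(P(x) - P(y)) <= np.linalg.norm(x - y) + 1e-12
        assert (y - x) @ (P(y) - P(x)) >= -1e-12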

Either of the above properties of metric projections characterizes inner product spaces among general normed spaces, a point which emphasizes again how "unnatural" is the metric geometry associated with non-euclidean norms. For example, we have the following theorem, the proof of which depends on the Jordan-von Neumann and Kakutani characterizations of inner product spaces, and may be found in [22, p. 249].

Theorem. (James, Rudin-Smith) Let X be a nls of dimension at least 3, such that for all subspaces M of dimension n, where n is a fixed integer satisfying 1 ≤ n ≤ dim(X) - 2, M is Chebyshev and P_M is linear. Then X is an inner product space.

b) The restriction dim(X) ≥ 3 in the last theorem is essential, since all 2-dimensional rotund spaces have the property that P_M is linear for every subspace M. This follows from a more general fact about Chebyshev hyperplanes, which is a corollary to the next result.

Theorem. Let M be a Chebyshev subspace of a nls X. Then

1) P_M is idempotent;
2) P_M is closed (i.e., has closed graph);
3) P_M is homogeneous (i.e., P_M(tx) = tP_M(x), ∀ x ∈ X, ∀ scalars t);
4) P_M is additive mod M (i.e., P_M(x+y) = P_M(x) + P_M(y), if either x or y ∈ M).

The proof is completely routine; parts 3), 4) immediately imply the corollary mentioned above.

Corollary. Any Chebyshev hyperplane M in a nls has P_M linear.

c) Consider now the "fibres" defined by P_M, where M is some Chebyshev subspace of a nls X. The fibre over y ∈ M is the inverse image P_M⁻¹(y). All such fibres are isometric, being simply translates of one another:

   P_M⁻¹(y) = y + P_M⁻¹(θ).

Thus we need study only the fibre over θ, hereafter denoted M^θ, and called the metric complement of M in X. It consists of all x ∈ X satisfying ||x|| = d(x,M), such vectors being frequently said to be orthogonal to M. Evidently, M^θ is a closed and nowhere-dense subset of X. Also, from b), it follows that M^θ is a union of one-dimensional subspaces of X, and hence is contractible.

The metric complement M^θ can also be characterized by means of linear functionals, even if M is not Chebyshev. For, as has been noted by Murray and Singer, it is a consequence of the Hahn-Banach theorem that

   M^θ = {x ∈ X: ∃ φ ∈ S(M⊥) such that φ(x) = ||x||}.

Theorem. Let M be a Chebyshev subspace of a nls X. Then

1) M ⊕ M^θ = X;
2) Letting Q_M be the quotient map: X → X/M, we have

   (2) M^θ is convex ⟺
   (3) Q_M|M^θ is an isometry ⟸
   (4) P_M is a smooth mapping ⟺
   (5) P_M is linear.

Proof. 1) We must show that every x ∈ X can be uniquely expressed as a sum y + z, with y ∈ M, z ∈ M^θ. Since

   x = P_M(x) + (x - P_M(x)),

we need only check uniqueness. This reduces to showing that if z₁, z₂ ∈ M^θ and z₁ - z₂ = y ∈ M, then z₁ = z₂. But by b),

   z₁ = z₂ + y ⟹ P_M(z₁) = P_M(z₂) + P_M(y) = θ + y = y,

and since M is Chebyshev, we must have y = θ.

2) It is clear that (5) ⟹ (2), (3), and (4), and that (2) ⟹ (3), since if M^θ is convex, it is actually a linear subspace. Now assume (3) and let z₁, z₂ ∈ M^θ. Then

   ||z₁ - z₂|| = ||Q_M(z₁) - Q_M(z₂)|| = ||Q_M(z₁ - z₂)|| = d(z₁ - z₂, M),

which shows that z₁ - z₂ ∈ M^θ and so M^θ is a subspace, i.e., (2) holds. Finally, assume (4). This means by definition that

   P_M'(x;y) ≡ lim_{t→0} (P_M(x+ty) - P_M(x))/t

exists ∀ x, y ∈ X, and that the map x ↦ P_M'(x;y) is continuous for each fixed y. Now if δ > 0 then

   P_M'(δx;y) = δP_M'(x;δ⁻¹y) = P_M'(x;y).

Hence

   P_M'(x;y) = lim_{δ→0} P_M'(δx;y) = P_M'(θ;y) = P_M(y),

and therefore

   P_M(x+y) = P_M(x) + ∫₀¹ (d/dt)P_M(x+ty) dt
            = P_M(x) + ∫₀¹ P_M'(x+ty;y) dt = P_M(x) + P_M(y).

This completes the proof.

It should be noted that what makes the proof of (4) ⟹ (5) "work" is the continuity of P_M'(·;y) at θ. A more usual situation is that P_M'(·;y) is continuous on the open set X \ M, although P_M is not linear (provided the norm on X is sufficiently smooth; there is an extensive discussion of the differentiability of metric projections in [9]; see also [20]).

d) We consider next a few examples concerning the linearity of metric projections on Chebyshev subspaces of certain non-Hilbert spaces.

Examples. 1) Let X = ℓᵖ(3), 1 < p < ∞, and M = span({(1,1,1)}). Then P_M is linear only if p = 2. Because,

   M^θ = {(a,b,c) ∈ X: (d/dt)||(a,b,c) - t(1,1,1)||ᵖ|_{t=0} = 0}
       = {(a,b,c) ∈ X: a|a|^{p-2} + b|b|^{p-2} + c|c|^{p-2} = 0},

which is not a convex set if p ≠ 2 (see the numerical sketch following these examples). It follows that if μ is any positive measure on a measure space containing three disjoint sets of positive measure, then the corresponding characteristic functions span a subspace M of Lᵖ(μ) which is isometric to ℓᵖ(3) (or at least to a weighted ℓᵖ(3) space, but this does not affect the above example). Hence P_M is not linear on Lᵖ(μ).

2) More generally, Ando [1] has proved that a closed subspace M of Lᵖ(μ), where μ is a finite positive measure and 1 < p < ∞, has P_M linear if and only if the quotient space Lᵖ(μ)/M is isometrically isomorphic to some other Lᵖ(ν) space.

3) Let M be a finite dimensional Chebyshev subspace of C_R([0,1]). Then P_M is not linear. The proof of this is an easy consequence of a theorem of Daugavet [2], which asserts that if T is any compact linear operator on C_R([0,1]) then ||I + T|| = 1 + ||T||. (This result has been extended by Foias and Singer [5] to cover compact operators on spaces C_R(Ω), where Ω is perfect (i.e., has no isolated points).) Now suppose that P_M were linear. Then

   ||x - P_M(x)|| ≤ ||x - θ|| = ||x||,

whence we obtain the contradiction

   1 ≥ ||I - P_M|| = 1 + ||P_M|| ≥ 2.

Observe that this argument also demonstrates that no finite codimensional subspace in C_R(Ω) can be the range of a norm-one projection.
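Returning to Example 1, the failure of convexity of M^θ is concrete enough to compute (Python/numpy sketch, with p = 4 as our own illustrative choice, not from the text):

    import numpy as np

    # M^theta = {(a,b,c): a|a|^2 + b|b|^2 + c|c|^2 = 0} for p = 4
    g = lambda v: np.sum(np.sign(v) * np.abs(v)**3)
    z1 = np.array([1.0, 1.0, -2.0**(1/3)])   # g(z1) = 0
    z2 = np.array([1.0, -1.0, 0.0])          # g(z2) = 0
    print(g(z1), g(z2), g((z1 + z2)/2))      # ~0, 0, 0.75: the midpoint
                                             # lies outside M^theta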
e) We now come to the question of continuity of metric projections on Chebyshev subspaces. The basic sufficiency conditions are contained in the following theorem.

Theorem. Let M be a Chebyshev subspace of a nls X. Then P_M is continuous if either dim(M) < ∞ or else X is an E-space.

Proof. Suppose that X is an E-space and that xₙ → x in X. By the weak compactness of balls in X it follows that P_M(xₙ) ⇀ P_M(x). Further, {x - P_M(xₙ)} is a minimizing sequence for the norm on the coset x - M, because

   d(x, M) = ||x - P_M(x)||
      ≤ lim inf ||x - P_M(xₙ)||
      ≤ lim sup ||x - P_M(xₙ)||
      ≤ lim sup (||x - xₙ|| + ||xₙ - P_M(xₙ)||) = d(x, M).

By the definition of an E-space (31b)) it now follows that P_M(xₙ) → P_M(x), qed.

Remarks. 1) The preceding proof of continuity of P_M when X is an E-space works equally well when M is any weakly closed Chebyshev set in X. Thus, in an E-space, every convex b.a. problem is "well posed" in the sense of Hadamard: there is a unique solution which depends continuously on the point being approximated.

2) The E-property is not quite necessary for all metric projections P_M to be continuous. Lambert (unpublished) has shown that the dual of a Banach space constructed by Klee [11, p. 240] by suitably renorming ℓ² has all P_M continuous, but is "not quite" an E-space, because the Klee space fails to have a Frechet differentiable norm at a particular unit vector.

3) Recently Oshman [21] has announced nas conditions for all P_M to be continuous. They constitute a slight weakening of the weak-strong convergence implication in 31b). In view of Lambert's result, the dual of the Klee space is a concrete example wherein the Oshman conditions are fulfilled but the E-property is lacking.

4) It might be hoped that strengthening the E-property to uniform rotundity would result in the uniform continuity of the metric projections (and hence their Lipschitz continuity, since they are homogeneous maps). However, this fails to be true even in finite dimensions - [9, p. 246] shows that even pointwise Lipschitz continuity cannot be expected in general. What about the case of Lᵖ(μ) spaces? A result of Lindenstrauss [15, p. 270] implies in particular that if P_M is uniformly continuous on some reflexive nls X, then M is complemented in X. But Murray [19] has shown that infinite dimensional Lᵖ(μ) spaces contain closed subspaces without complements. On the brighter side, however, it has been proved [9, p. 236] that if Lᵖ(μ) is finite dimensional and 2 ≤ p, then every P_M is uniformly continuous.

5) There is also another positive result concerning the continuity of metric projections on a uniformly rotund Banach space X. Namely, the family of maps {P_M: M a closed subspace of X} is uniformly equicontinuous on any bounded subset of X [7, p. 109].
f) Next we record a few simple necessary conditions for the continuity of metric projections.

Theorem. Let M be a Chebyshev subspace of a nls X, and suppose that P_M is continuous. Then

1) P_M is an open mapping;
2) M^θ is a strong deformation retract of X;
3) M^θ is homeomorphic to a nls, and hence in particular is locally contractible.

Proof. 1) The fibre bundle (X, M, M^θ, P_M) is equivalent to the product bundle (M × M^θ, M, M^θ, P) under the homeomorphism x ↦ T(x) ≡ (P_M(x), x - P_M(x)). Here P: M × M^θ → M is projection on the first factor. Since such projection maps are always open, and since we clearly have P_M = P ∘ T, it follows that P_M is open.

2) The definition of a strong deformation retract (e.g. [4, p. 324]) requires us to show that the identity map of X is homotopic to a retraction of X onto M^θ in such a way that the points of M^θ remain fixed throughout the entire deformation. In the present case it is clear that the homotopy t ↦ I - tP_M, 0 ≤ t ≤ 1, meets these requirements.

3) An immediate consequence of the following theorem.

g) The next result reenforces earlier evidence of intimate connections between "smoothness" properties of a metric projection P_M and structural properties of the corresponding metric complement M^θ. It provides a nas condition for the continuity of P_M and has several interesting implications, one of which being the existence of discontinuous metric projections on C_R(Ω) for appropriate spaces Ω.

Theorem. (Holmes) Let M be a Chebyshev subspace of a nls X. Then the map Q ≡ Q_M|M^θ is a continuous norm-preserving bijection of M^θ onto X/M, and is a homeomorphism exactly when P_M is continuous.

Proof. The injectivity of Q is a consequence of the first part of the theorem in c). Suppose that Q⁻¹ is continuous on X/M, and let xₙ → x in X. Then xₙ + M → x + M in X/M, and so

   ||P_M(xₙ) - P_M(x)|| ≤ ||xₙ - P_M(xₙ) - (x - P_M(x))|| + ||xₙ - x||
      = ||Q⁻¹(xₙ + M) - Q⁻¹(x + M)|| + ||xₙ - x|| → 0.

Now suppose that P_M is continuous. Let xₙ + M → x + M in X/M and let ε > 0. Since P_M is continuous at Q⁻¹(x + M), ∃ δ > 0 such that ||z - Q⁻¹(x + M)|| < δ ⟹ ||P_M(z)|| < ε. Let V be the open set {z ∈ X: ||z - Q⁻¹(x + M)|| < min(δ, ε)}. Now Q_M(V) is an (x + M)-nbhd. in X/M and hence contains xₙ + M for n ≥ n₀, say. For each such n let zₙ ∈ V satisfy Q_M(zₙ) = xₙ + M. Then ∃ yₙ ∈ M so that xₙ - zₙ = yₙ, and hence P_M(xₙ) = yₙ + P_M(zₙ) = xₙ - zₙ + P_M(zₙ). Therefore,

   ||Q⁻¹(xₙ + M) - Q⁻¹(x + M)|| = ||xₙ - P_M(xₙ) - (x - P_M(x))||
      ≤ ||xₙ - P_M(xₙ) - zₙ|| + ||zₙ - (x - P_M(x))||
      = ||P_M(zₙ)|| + ||zₙ - (x - P_M(x))|| < 2ε

for n ≥ n₀, qed.

Several corollaries of this theorem are given in [8]; let us just list the following one here.

Corollary. (Cheney and Wulbert) Let M be a Chebyshev subspace of finite codimension in some nls. Then P_M is continuous if and only if M^θ is boundedly compact.

h) We can now give some examples of discontinuous metric projections. Historically, the first example of such pathological behavior is due to Lindenstrauss [16, p. 87]; the subspace M there is the 2-codimensional annihilator of the subspace span({t, t²}) ⊂ C_R([0,1]). Another example is given in [9, p. 245]; the subspace M again has codimension 2, it is contained in a rotund isomorph of ℓ∞, and the restriction of P_M to a line turns out to be discontinuous.

It has been shown by Garkavi [6] that if the (infinite) compact space Ω is the closure of its isolated points, then C_R(Ω) contains Chebyshev subspaces of all finite codimensions. We will continue to use the notation of 22e), and will also abbreviate ∫ x dμ to ⟨x,μ⟩.

Lemma. Let M be a Chebyshev subspace of finite codimension n in C_R(Ω), and let t ∈ Ω be not isolated in Ω. Then

1) θ ≠ μ ∈ M⊥ ⟹ t ∈ support(μ);
2) x ∈ S(C_R(Ω)) ∩ M^θ ⟹ |x(t)| = 1.

Proof. 1) It is enough to show that the open set Ω \ support(μ) is finite. In fact we can obtain a contradiction by assuming this set contains n or more points. If so, let N ≡ {x ∈ C_R(Ω): x(support(μ)) = 0}; we have dim(N) ≥ n. Choose a basis {μ, μ₂,...,μₙ} for M⊥. The subspace M₁ ≡ {x ∈ C_R(Ω): ⟨x,μᵢ⟩ = 0, i = 2,...,n} has codimension (n-1), hence ∃ y ∈ M₁ ∩ N, y ≠ θ, and since ⟨y,μ⟩ = 0 also, we have y ∈ M. But this implies that M is not Chebyshev. For, by the lemma in 30b) and by Exercise 33, we can choose z ∈ S(C_R(Ω)) such that z = ±1 on support(μ^±), and then the function x ≡ z(1 - |y|) has norm one, and has both θ and y as b.a.'s from M (check!).

2) From c) it follows that ∃ μ ∈ S(M⊥) such that 1 = ||x|| ||μ|| = ⟨x,μ⟩, whence x = ±1 on support(μ^±). We now conclude by use of 1), qed.

Theorem. (Morris) Let M be a Chebyshev subspace of C_R(Ω), satisfying 1 < codim(M) < +∞. If Ω contains infinitely many points, then P_M is discontinuous.

Proof. Since Ω is infinite we can choose some non-isolated point t ∈ Ω. Define A^± ≡ {x ∈ S(C_R(Ω)) ∩ M^θ: x(t) = ±1}. According to the lemma just proved, the sets A⁺ and A⁻ constitute a (closed) partition of S(C_R(Ω)) ∩ M^θ, so that this set is not connected. But since dim(X/M) ≥ 2, S(X/M) is connected. Consequently, the map Q_M|M^θ cannot be a homeomorphism, so by g), P_M is discontinuous.

Remarks. 1) This theorem has been generalized in [14, p. 210] to cover the case where C_R(Ω) is replaced by A_R(K), the space of real, affine, continuous functions on a Choquet simplex K.

2) A remarkable example has recently been discovered by Kripke (to appear). He has shown that Hilbert space may be renormed with a rotund norm so as to contain a subspace M with P_M discontinuous. This is the first example of such a metric projection acting on a reflexive space.

i) We recall that a continuous linear map between two lcs is weakly continuous (that is, continuous when both spaces have their weak topologies), and that the converse is true when the spaces involved are both Banach (or Frechet) spaces. Now since a metric projection P_M is generally not linear, it is not a priori clear whether or not P_M is weakly continuous. Once again, the answer depends on a topological property of M^θ.

Theorem. (Holmes, Kottman-Lin) Let M be a Chebyshev subspace of a nls X.

1) If M is finite dimensional, then P_M is w-continuous if (and only if) M^θ is w-closed;
2) If M is reflexive, then P_M is bw-continuous (resp. w-sequentially continuous) if (and only if) M^θ is bw-closed (resp. w-sequentially closed).

Proof. 1) To show that a map between two topological spaces is continuous, it is sufficient to show that the inverse image of each basic open set is open. Thus in the present situation it is sufficient to show that P_M⁻¹(y + r int(U(M))) is w-open for every y ∈ M and r > 0. If this is not the case for some such r and y, there is a net {x_α} w-convergent to x in X such that

   (6)   ||P_M(x) - y|| < r ≤ ||P_M(x_α) - y||

∀ α. Since P_M is norm-continuous by e), there is, for each α, a vector z_α ∈ co({x, x_α}) such that ||P_M(z_α) - y|| = r. That is,

   {z_α} ⊂ rS(M) + (y + M^θ),

and the set on the right is w-closed, since M^θ is w-closed by hypothesis, and S(M) is compact hence w-compact by finite dimensionality (we are using 3h) here). But {z_α} converges weakly to x, hence ||P_M(x) - y|| = r, contradicting (6).

2) We recall that in the bw-topology on X a set E ⊂ X is closed if and only if every bounded w-convergent net in E has its limit in E (cf. [3, p. 41]). Now let {x_α} be a bounded net (resp. sequence) w-convergent to some x in X. By b), {P_M(x_α)} is bounded and hence has a w-cluster point y ∈ M. Thus x is a w-cluster point of {x_α + (y - P_M(x_α))} ⊂ y + M^θ, which is bw-closed (resp. w-sequentially closed). Consequently, P_M(x) = y, qed.

This theorem does not completely settle the matter of deciding whether or not some particular metric projection is w-continuous, since it is not always so easy to decide whether or not some non-convex set is w-closed. Let us briefly indicate a few instances where the theorem has played a role.

Examples. 1) Let M be any closed subspace of ℓᵖ, 1 < p < ∞, p ≠ 2. Then P_M is w-sequentially continuous [8].

2) Let M be a finite dimensional subspace of Lᵖ(μ), 1 < p < ∞, p ≠ 2, where μ is a separable non-atomic measure (e.g., Lebesgue measure on [0,1]). Then P_M is not w-sequentially continuous at any point of Lᵖ(μ); in fact, M^θ is w-sequentially dense in Lᵖ(μ) [13].

3) The subspace Pₙ of nth-degree polynomials is Chebyshev in C_R([0,1]), but by the Alternation Theorem 25d), Pₙ^θ is not w-closed.

j) In all the foregoing discussion we have assumed that the metric projection P_M, whose continuity properties we have been studying, is single-valued, that is, M is a Chebyshev subspace. Now as we know, there are many examples of proximinal subspaces which are not Chebyshev. Among these are the non-Haar subspaces of C(Ω), the subspace of continuous functions in ℓ∞(Ω) (30c)), the subspace of compact operators in the space of all bounded operators on Hilbert space [10], and the finite dimensional subspaces of L¹(μ), μ sigma-finite and non-atomic [12, 18]. What can be said about the continuity of the metric projection in such cases?

Definition. A mapping T: X → 2^Ω, where X and Ω are topological spaces, is said to admit a continuous selection if there is a continuous function F: X → Ω (the selection) such that F(x) ∈ T(x) ∀ x ∈ X.

Thus one particular question which might be posed for a given proximinal subspace M of a nls X is whether or not P_M admits a continuous selection. The answer to this question is known to be affirmative whenever P_M is lsc on X, or, more strongly, whenever P_M is continuous wrt the Hausdorff metric on the closed bounded subsets of M. (This affirmative answer is a special case of the Michael selection theorem [17], which is applicable in the present situation provided the subspace M is complete.) Finite dimensionality of M is no help here. For example, if M is a finite dimensional subspace of L¹(μ) (μ as above), then P_M is not lsc nor does it admit a continuous selection [14]. The answer to the analogous question for C(Ω) is not as clear at present: some non-Haar subspaces of C([0,1]) do exist for which there is a continuous selection, but it is apparently not known how such subspaces are to be recognized in general.

We conclude with one strong positive assertion about the continuity of a particular class of metric projections, namely those shown to exist in 30c).

Theorem. (Kripke) Let M be the (proximinal) subspace of continuous functions in ℓ^∞_R(Ω), where Ω is paracompact. Then P_M is Lipschitz continuous wrt the Hausdorff metric on the closed bounded subsets of M, and in particular, P_M admits a continuous selection.

Proof. We use the notation of 30c). Let x, z ∈ ℓ^∞_R(Ω) and put ε ≡ ||x − z||. Then

x_* − ε ≤ z_* ≤ z* ≤ x* + ε.

For any y ∈ P_M(x), 30c) implies

x* − A(x) ≤ y ≤ x_* + A(x).

Therefore,

z* − ε − A(x) ≤ y ≤ z_* + ε + A(x),
z* − 2ε − A(z) ≤ y ≤ z_* + 2ε + A(z).

It follows that

v ≡ max(z* − A(z), y − 2ε) ≤ min(z_* + A(z), y + 2ε) ≡ u.

Now v is usc and u is lsc, so the Dieudonné interposition theorem provides a continuous function w interposed between u and v: v ≤ w ≤ u. We now have both

||w − y|| ≤ 2ε and z* − A(z) ≤ w ≤ z_* + A(z),

so that by 30c), w ∈ P_M(z), and thus d(y, P_M(z)) ≤ 2ε. By the symmetric roles played by x and z we are able to conclude that the Hausdorff distance between the sets P_M(x) and P_M(z) is at most 2ε ≡ 2||x − z||, qed.

References for §32

1) T. Ando, Contractive projections in L_p spaces. Pac. J. Math. 17(1966), 391-405.

2) I. Daugavet, A property of completely continuous operators in


the space C. Uspehi Mat. Nauk 18(1963), 157-158.(Russian)

3) M. Day, Normed Linear Spaces. Academic Press, New York, 1962.

4) J. Dugundji, Topology. Allyn and Bacon, Boston, 1966.

5) C. Foias and I. Singer, Points of diffusion of linear operators,


Math. Zeit. 87(1965), 434-450.

6) A. Garkavi, Approximative properties of subspaces with finite


defect in the space of continuous functions. Sov. Math.
5(1964), 440-443.

7) R. Holmes, Approximating best approximations. Nieuw Arch. voor


Wisk. 14(1966), 106-113.

8) ________, On the continuity of best approximation operators. Proc. Symp. Inf. Dim. Topology, Annals of Math. Study #69, Princeton Univ. Press, to appear.

9) ________ and B. Kripke, Smoothness of approximation. Mich. Math. J. 15(1968), 225-248.

10) ________, Best approximation by compact operators. Ind. Univ. Math. J., to appear.

11) V. Klee, Two renorming constructions related to a question of Anselone. Studia Math. 23(1969), 231-242.

12) B. Kripke and T. Rivlin, Approximation in the metric of L^1(X, μ). Trans. Amer. Math. Soc. 119(1965), 101-122.

13) J. Lambert, The weak sequential continuity of the metric projection in L^p spaces. Dissertation, Purdue Univ., 1970.

14) A. Lazar, P. Morris, and D. Wulbert, Continuous selections for


metric projections. J. Func. Anal. 3(1969), 193-216.

15) J. Lindenstrauss, On nonlinear projections in Banach spaces.


Mich. Math. J. 11(1964), 263-287.

16) , Extension of compact operators. Mem. Amer.


Math. Soc. #48, 1964.

17) E. Michael, Selected selection theorems. Amer. Math. Monthly


63(1956), 233-238.

18) R. Moroney, The Haar problem in L^1. Proc. Amer. Math. Soc. 12(1961), 793-795.

19) F. Murray, On complementary manifolds and projections in L_p and ℓ_p. Trans. Amer. Math. Soc. 41(1937), 138-152.

20) T. Newman and P. Odell, On the concept of a p-q generalized


inverse of a matrix. SIAM J. Appl. Math. 17(1969),
520-525.

21) E. Oshman, On continuity of metric projections onto some classes of subspaces in a Banach space. Sov. Math. 11(1970), 1521-1523.

22) I. Singer, Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin-Heidelberg, 1970.

§33. Optimal Estimation

The problems which will concern us now are special kinds of

convex programs motivated by the following practical situation. A physical object P is assumed to be adequately modeled by an unknown element x_P of some nls X. Typically X is a function space such as L^∞([a,b]) or C_R([a,b]) (or perhaps a finite product of such spaces). Via appropriate experiments, observations are taken of P, leading to a limited amount of information about x_P. For example, it may be possible to evaluate x_P (or some of its derivatives) at certain points, to compute certain moments or Fourier coefficients, to estimate the values of certain semi-norms at x_P, etc. Of course there will be experimental inaccuracies. We assume that the data thus accumulated is inadequate to specify x_P completely; it merely delineates some subset A ⊂ X, and our knowledge can be summarized by "x_P ∈ A". The problem is to obtain an estimate for x_P which is in some sense optimal.

a) Let X be a real nls and A ⊂ X. We consider the problem of choosing an element of X which best represents the set A. If x is any particular element of X chosen to represent the set A, the error incurred will be sup{||x − y|| : y ∈ A}. In order for this quantity to be finite it is nas that A be bounded, so we will make that assumption. Thus an x ∈ X will best represent the set A when the above error is a minimum.

Definition. Let A be a bounded set ⊂ X. A center (or Chebyshev center) of A is an element x_0 ∈ X for which

(1) sup{||x_0 − y|| : y ∈ A} = inf sup{||x − y|| : y ∈ A},

where the infimum is taken over all x ∈ X. The number on the right in (1) is called the Chebyshev radius of A, denoted r(A).

Thus r(A) is the radius of the smallest ball in X (if one exists) which contains the set A, and the centers of all such balls are just the centers of A in the above definition. We denote the collection of such centers by E(A). Referring back to the opening paragraph of this section we see that the estimation problem posed there is in principle resolved by our definition: an optimal estimate of x_P will be any element of E(A), where A is determined by the experimental data.

The set A is usually defined by means of affine and convex constraints:

A = {x ∈ X : φ_α(x) = c_α, ψ_β(x) ≤ d_β},

where {φ_α} ⊂ X* and {ψ_β} ⊂ Conv(X). (There may, however, also be some qualitative information which immediately constrains A to lie in some subspace or cone in X.) Thus A is usually closed and convex, as well as bounded. In any event this much may always be assumed, since for any (bounded) A, E(A) = E(c̄o(A)).

It is possible to make our estimation problem a bit more sophisticated by limiting our search for optimal estimates to some (convex) subset of X, e.g., a finite dimensional subspace. Although in practice this may be an important consideration for computational purposes, we will not pursue it further here. The interested reader may consult [13] for some results in this direction when A is compact.

b) The function

F_A(x) ≡ sup{||x − y|| : y ∈ A}

is evidently convex and (Lipschitz) continuous on X. Consequently, E(A), as the set of all solutions of the convex program (X, F_A), is a (bounded) closed convex subset of X. Now, as we recall from 11b), a nas condition for x_0 to belong to E(A) is that θ ∈ ∂F_A(x_0). As always, the efficacy of this condition depends on our ability to subdifferentiate the function in question; in this case, F_A.

A formula of some usefulness for the subdifferential at a point

of a function equal to the supremum of convex functions has been

given by Valadier [18]. This formula assumes an especially pleasant

appearance when the index set is compact.

Lemma. (Valadier) Let X be a real lcs, Ω a compact space, {f_t : t ∈ Ω} ⊂ Conv(X), f ≡ sup{f_t : t ∈ Ω}, and x_0 ∈ X. If there is an x_0-nbhd. V such that (t, x) ↦ f_t(x) is continuous on Ω × V, then f is continuous on V and

(2) ∂f(x_0) = c̄o{∂f_t(x_0) : t ∈ Ω_0},

where Ω_0 ≡ {t ∈ Ω : f(x_0) = f_t(x_0)}, and the closure is taken in the w*-topology.

Proof. The continuity hypothesis entails the equicontinuity of {f_t} on V, whence f is usc. Being also lsc by definition, it is continuous. Next, forming difference quotients and taking into account the definition of Ω_0, we see that

(3) f′(x_0; x) ≥ f_t′(x_0; x), ∀ t ∈ Ω_0, ∀ x ∈ X.

Hence, by 8a-3) we have

∂f(x_0) ⊃ ∂f_t(x_0), ∀ t ∈ Ω_0,

and therefore the inclusion from right to left in (2) is valid.

In order to reverse this inclusion, it is sufficient, in view of the Strong Separation Theorem 3h), to show that any w*-closed hyperplane containing ∪{∂f_t(x_0) : t ∈ Ω_0} must also contain ∂f(x_0). That is, if for some z ∈ X and some scalar λ we have

⟨z, φ⟩ ≤ λ

valid for every φ ∈ ∂f_t(x_0) and every t ∈ Ω_0, then the same inequality also holds for every φ ∈ ∂f(x_0). Recalling the Moreau-Pshenichnii formula 10b), this amounts to showing that

sup{f_t′(x_0; z) : t ∈ Ω_0} ≤ λ ⟹ f′(x_0; z) ≤ λ.

Thus it will suffice to show that for any fixed z ∈ X, there is an index t_0 ∈ Ω_0 such that

(4) f′(x_0; z) ≤ f_{t_0}′(x_0; z).

Since we already have the inequality (3), what we are actually about to demonstrate is that

(5) f′(x_0; z) = max{f_t′(x_0; z) : t ∈ Ω_0}.

Fix some β < f′(x_0; z). Then taking into account 7a), we see that

B_α ≡ {t ∈ Ω : (f_t(x_0 + αz) − f(x_0))/α > β}

is non-empty for every α, 0 < α ≤ α_0, where α_0 is chosen small enough that x_0 + α_0 z ∈ V. Now the sets B_α are compact and they decrease with α (again by 7a) and f_t(x_0) ≤ f(x_0), t ∈ Ω). Consequently, there exists

t_0 ∈ ∩{B_α : 0 < α ≤ α_0}.

For every such α,

f_{t_0}(x_0 + αz) ≥ f(x_0) + αβ,

whence f_{t_0}(x_0) ≥ f(x_0), that is, t_0 ∈ Ω_0. Therefore, for 0 < α ≤ α_0,

(f_{t_0}(x_0 + αz) − f_{t_0}(x_0))/α > β.

This inequality establishes the inequality (4), and hence the formula (5). The proof is complete.

Remarks. 1) We see that, under the hypotheses of the Lemma, ∂f(x_0) depends only on the functions f_t for which t ∈ Ω_0. In other words, ∂f(x_0) = ∂g(x_0), where g ≡ sup{f_t : t ∈ Ω_0}. (Of course, the "sup" here is really a "max".)

2) The proof of formula (5) only requires continuity of the functions t ↦ f_t(x), for x ∈ V.

To apply formula (2) to the computation of ∂F_A(x_0), we assume that A is compact in X. Then defining f_y(x) = ||x − y||, we can write

F_A(x) = sup{f_y(x) : y ∈ A}.

Now the subdifferentials ∂f_y(x) were in effect described in the course of the proof in 22b), namely

(6) ∂f_y(x_0) = {φ ∈ S(X*) : φ(x_0 − y) = ||x_0 − y||}.

Combining (2) and (6) we obtain

(7) ∂F_A(x_0) = c̄o{φ ∈ S(X*) : φ(x_0 − y) = F_A(x_0), for some y ∈ A},

where, as usual, the closure is taken in the w*-topology on X*.

The restriction to compact sets A is of course severe, but formulas for ∂F_A(x_0) when A is non-compact become even more unwieldy than (7). The authors of [13] suggest an instance of practical occurrence which leads to the necessity for estimating compact sets, namely when one is trying to approximate a continuous function which depends on several inexactly specified parameters:

x ↦ f(x, λ_1, ..., λ_m).

Assuming that enough is known about the parameters to permit the assertion α_i ≤ λ_i ≤ β_i, for each i, one is led to a compact

family of functions. For an extensive discussion of this problem,

based on formula (7), we refer to [13].
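In finite dimensions the convex program (X, F_A) can be handed directly to a numerical solver. The following minimal sketch is our own illustration (the point set, the SLSQP solver choice, and the starting guess are assumptions, not part of the text); it computes a Euclidean Chebyshev center by minimizing the radius r subject to ||x − y_i|| ≤ r:

```python
import numpy as np
from scipy.optimize import minimize

def chebyshev_center(points):
    """Center and radius of the smallest Euclidean ball containing `points`:
    the convex program  min_{x,r} r  subject to  ||x - y_i|| <= r."""
    pts = np.asarray(points, dtype=float)
    v0 = np.concatenate([pts.mean(axis=0), [np.ptp(pts) + 1.0]])  # (x, r)
    cons = [{"type": "ineq",
             "fun": lambda v, y=y: v[-1] - np.linalg.norm(v[:-1] - y)}
            for y in pts]
    res = minimize(lambda v: v[-1], v0, method="SLSQP", constraints=cons)
    return res.x[:-1], res.x[-1]

center, radius = chebyshev_center([(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)])
print(center, radius)   # approximately (1, 0) and 1 for this triangle
```

(This particular triangle reappears in g) below as an example of a centerable set.)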

c) Let us now consider the existence question for centers of bounded sets in a nls X. For brevity we will say that X "admits centers" if ∅ ≠ A (bounded) ⊂ X ⟹ E(A) ≠ ∅. The classical Banach spaces are known to admit centers, but in general a Banach space need not admit centers even for finite sets. Before presenting the main sufficiency condition, it is convenient to introduce a definition.

Definition. A subspace M of a nls X is constrained in X if M is the range of a norm-one projection defined on X.

Of course, a constrained subspace must be closed. All closed subspaces of a Hilbert space are constrained; on the other hand, as was observed in 32d), no finite codimensional subspace of C_R(Ω) is constrained (Ω perfect). For present purposes, we are interested in Banach spaces X which (when canonically embedded) are constrained in X**. It is known that such is the case if either X is a dual space or an L^1(μ) space.

Examples. 1) Let Y be a nls; then Y* is constrained in Y*** (= (Y*)**). Indeed, an appropriate norm-one projection is the map restricting a given element of Y*** to Y.

2) Let X = L^1(μ). Then it is known (Kakutani; [4, p. 100]) that X** is an (AL)-space. But Dean [5] has shown that every closed sublattice of an (AL)-space is constrained. Therefore, X is constrained in X**. (It is possible to proceed much more directly in special cases. For instance, let μ be Lebesgue measure on [a,b]. Given Φ ∈ X**, Φ defines an element Φ_1 ∈ C_R[a,b]* by restriction. Identifying Φ_1 with a normalized function of bounded variation on [a,b], we define P(Φ) = dΦ_1/dt, so P : X** → X. Now

∫_a^b |dΦ_1/dt| dt ≤ var(Φ_1) = ||Φ_1|| ≤ ||Φ||,

so that ||P|| ≤ 1. Finally, suppose that Φ ≡ x ∈ X. In this case, Φ_1 is the indefinite integral of x, whence P(Φ) = dΦ_1/dt = x, qed.)

Theorem. (Garkavi) If a Banach space X is constrained in X**, then X admits centers.

Proof. Suppose it known that every dual space admits centers. Then given A (bounded) ⊂ X, there is a center for A in X**. But the image of this center under the assumed norm-one projection on X** is easily seen to be a center for A in X. Thus we are reduced to the case where X is a dual space. For n = 1, 2, ..., we can find x_n ∈ X such that

sup{||x_n − y|| : y ∈ A} < r(A) + 1/n.

Let x_0 be a w*-cluster point of the sequence {x_n}; then because ||·|| is w*-lsc on X we have immediately ||x_0 − y|| ≤ r(A), for every y ∈ A, that is, x_0 ∈ E(A), qed.

d) The condition of the last theorem is certainly not necessary for a space X to admit centers. For example, the space c_0 is not even complemented in its second dual (= m ≡ ℓ^∞), yet Garkavi [6] proved that it does admit centers. In this section we show, more significantly, that the spaces C_R(Ω) admit centers. This was originally proved by Kadets and Zamyatin [11] for the case where Ω = [a, b], but their proof easily generalizes. In fact, it will be seen that their proof carries over to the case where X is the space of bounded continuous functions on a paracompact space Ω. The whole approach is quite reminiscent of 30c).

Given A (bounded) ⊂ C_R(Ω), we define

a̲(t) = inf{x(t) : x ∈ A},
ā(t) = sup{x(t) : x ∈ A},
a_*(t) = lim inf{a̲(s) : s → t},
a*(t) = lim sup{ā(s) : s → t}.

Then the function a* − a_* is usc on Ω, hence attains its maximum value ≡ 2r at a point t_0 ∈ Ω.

Lemma. The number r just defined satisfies r ≤ r(A).

Proof. For any z ∈ C_R(Ω), we show that F_A(z) ≡ sup{||y − z|| : y ∈ A} ≥ r. Given ε > 0, there is a t_0-nbhd. N on which the oscillation of z is < ε/2. By definition of r and t_0, we must have either

(a* − z)(t_0) ≥ r, or (z − a_*)(t_0) ≥ r,

say the former. By definition of a* and ā we can first find s ∈ N for which ā(s) > z(t_0) + r − ε/2, and then an x ∈ A for which x(s) > z(t_0) + r − ε/2. By definition of N, we then obtain

F_A(z) ≥ ||x − z|| ≥ (x − z)(s) > r − ε,

and since ε was arbitrary, this completes the proof.

Theorem. The space C_R(Ω) admits centers. If A is any bounded subset of C_R(Ω), we have

(8) E(A) = {x ∈ C_R(Ω) : a* − r ≤ x ≤ a_* + r},

where a*, a_*, and r were just defined. In particular, r = r(A).

Proof. Dieudonné's Interposition Theorem (cf. 30c)) guarantees that the right hand side of (8) is non-void. But choosing any x in such a fashion entails

||x − y|| ≤ r, ∀ y ∈ A,

and so the Lemma yields x ∈ E(A), and r = r(A). On the other hand, if we have any center x_0 for A, then ∀ t ∈ Ω, ∀ y ∈ A,

y(t) − r ≤ x_0(t) ≤ y(t) + r,

hence

ā(t) − r ≤ x_0(t) ≤ a̲(t) + r,

and finally,

a* − r ≤ x_0 ≤ a_* + r,

and so the proof is complete.

Corollary. If A is a compact subset of C_R(Ω), then the function (a̲ + ā)/2 belongs to E(A).

This follows because a̲ and ā are continuous, due to the equicontinuity of A, and so a̲ = a_*, ā = a*. The Corollary thus

provides a simple formula for a center of a compact set in C_R(Ω). Such formulas are generally not available in other Banach spaces.
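For a finite family of continuous functions sampled on a grid, the Corollary can be exercised in a few lines. A minimal sketch (the three functions are invented test data; any finite family is compact):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 201)
A = np.array([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])

a_lo = A.min(axis=0)                 # the lower envelope a(t), underlined
a_hi = A.max(axis=0)                 # the upper envelope a(t), overlined
center = (a_lo + a_hi) / 2           # a center, by the Corollary
radius = (a_hi - a_lo).max() / 2     # r(A) = (1/2) max of (upper - lower)

# every member of A lies within `radius` of the center in the sup norm:
assert np.abs(A - center).max() <= radius + 1e-12
print(radius)
```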

e) We now turn to the uniqueness problem for centers. Not surprisingly, the answer hinges on rotundity properties of the unit ball. A precise though somewhat unusual condition is known which is nas for every bounded set to have at most one center. This condition is that the unit ball should be "uniformly rotund in every direction" [6, 10]. This property is known to be weaker than uniform rotundity. In fact, there exist reflexive spaces having this property yet not isomorphic to uniformly rotund spaces. On the other hand, the property is definitely stronger than mere rotundity. For example, addition of the L² norm to the uniform norm turns C_R[0,1] into a rotund space which is not uniformly rotund in every direction. This property is discussed in detail in [10] and will not be considered further here. It appears that for most practical purposes the following two sufficient conditions are adequate.

Theorem. (Klee, Garkavi) Let X be a uniformly rotund Banach space. Then every bounded subset of X has a unique center in X.

Proof. We know (31g)) that X is an E-space, hence reflexive, so that by c) X admits centers. Now let A ⊂ X and suppose x_1, x_2 ∈ E(A). Then also x_0 ≡ (x_1 + x_2)/2 ∈ E(A) and we can choose {y_n} ⊂ A such that ||x_0 − y_n|| → r(A). Now

x_0 − y_n = (1/2)(x_1 − y_n) + (1/2)(x_2 − y_n)

and ||x_i − y_n|| ≤ r(A) (i = 1, 2), hence

lim ||x_i − y_n|| = r(A).

But also

lim ||(x_1 − y_n) + (x_2 − y_n)|| = lim ||2(x_0 − y_n)|| = 2r(A).

Consequently, by uniform rotundity,

0 = lim ||(x_1 − y_n) − (x_2 − y_n)|| = ||x_1 − x_2||,

and so E(A) is a singleton, qed.

Entirely similar arguments establish that with respect to the Hausdorff metric on the closed bounded convex subsets of a uniformly rotund space, the functions r(·) and E(·) are continuous. (In fact, r(·) is always continuous for any nls.) This statement, along with the preceding theorem, naturally reminds us of the analogous fact that in such spaces every convex best approximation problem is well posed. However, this last assertion is even valid in E-spaces, as was noted in 32e). It is apparently not known whether the E-property also suffices for the uniqueness and stability (i.e., continuity of E(·)) of centers for arbitrary bounded subsets, but it does suffice if we restrict ourselves to the consideration of compact sets.

Theorem. (P. Smith) If X is a rotund space then every compact set in X has at most one center in X. If X is an E-space, then each compact set in X has a unique center and E(·) is continuous with respect to the Hausdorff metric on the compact (convex) subsets of X.

Proof. Let x_1, x_2 ∈ E(A) for some compact A ⊂ X. Then (x_1 + x_2)/2 ∈ E(A) and ∃ y ∈ A such that

r(A) = ||(x_1 + x_2)/2 − y|| ≤ (1/2)||x_1 − y|| + (1/2)||x_2 − y|| ≤ r(A).

In order to avoid having a line segment on the sphere r(A)S(X) we must therefore have x_1 − y = x_2 − y, or x_1 = x_2.

Now let X be an E-space, and {A_n} a sequence of compact subsets converging in the Hausdorff metric to a compact subset A. Let E(A_n) = {x_n} and E(A) = {x_0}. Choose y_n ∈ A_n such that ||x_n − y_n|| = r(A_n). For any y ∈ A and any w-cluster point x of {x_n}, we have

||x − y|| ≤ lim inf ||x_n − y|| ≤ lim inf ||x_n − y_n|| = lim inf r(A_n) = r(A),

which shows that x ∈ E(A). Consequently, x_n ⇀ x_0. Now, given y_0 ∈ A satisfying ||x_0 − y_0|| = r(A), we have x_n − y_0 ⇀ x_0 − y_0. Therefore,

r(A) = ||x_0 − y_0|| ≤ lim inf ||x_n − y_0|| ≤ lim sup ||x_n − y_0|| ≤ lim sup ||x_n − y_n|| = lim sup r(A_n) = r(A),

and so the E-property entails x_n − y_0 → x_0 − y_0, hence x_n → x_0, qed.

f) In contrast with the best approximation problem we have in the present circumstances a new problem of location. Given A (bounded) in a nls X we have already noted that E(A) = E(c̄o(A)), but where is E(A) wrt c̄o(A)? In particular, do we have E(A) ⊂ c̄o(A), or at least E(A) ∩ c̄o(A) ≠ ∅? Unfortunately, the answer to even the latter question is generally negative, as we see next.

Theorem. (Klee, Garkavi) For a nls X, the following assertions are equivalent:

1) for each bounded A ⊂ X, E(A) ∩ c̄o(A) ≠ ∅;

2) dim X ≤ 2 or else X is a Hilbert space.

Proof. Let X be a Hilbert space, {x_0} = E(A), and suppose x_0 ∉ c̄o(A). Applying 3h), we strongly separate x_0 from c̄o(A) by a hyperplane H; we may assume that θ ∈ H, x_0 ∉ H. We set h ≡ P_H(x_0) (32a)) and consider any y ∈ A. If z is the point where the line segment [x_0, y] intersects H, then we have

||h − z|| = ||P_H(x_0 − z)|| < ||x_0 − z||,

||h − y|| ≤ ||h − z|| + ||y − z|| < ||x_0 − z|| + ||y − z|| = ||x_0 − y||.

This implies that h ∈ E(A) with h ≠ x_0, and thereby contradicts uniqueness of x_0.

For the converse, we may assume that A contains at least 3 points, and that dim X ≥ 3. By well-known characterizations of inner-product spaces (Jordan-von Neumann, Kakutani), it will suffice to assume dim X = 3, and to construct a norm-one projection from X onto a fixed but arbitrary 2-dimensional subspace L ⊂ X. Once it is known that X must be an inner-product space the proof will be accomplished, for if any nls X satisfies condition 1) above, it must be complete (if not, let A be the intersection of X with a ball centered at a point of the completion of X not in X; then E(A) is void).

Now if z_0 ∈ X\L is fixed, the sets

D_n ≡ {x ∈ L : ||x − z_0|| ≤ n},
Γ_n ≡ {x ∈ L : ||x − z_0|| = n}

are non-empty for large n. For y ∈ Γ_n let

S(y) ≡ {x ∈ L : ||x − y|| ≤ n}.

We now apply Helly's theorem [3] to conclude that ∩{S(y) : y ∈ Γ_n} ≡ S_n ≠ ∅. The hypothesis of Helly's theorem, namely that every three S(y)'s have non-void intersection, is justified by applying condition 1) to any 3-point subset of Γ_n. If x_n ∈ S_n then ||x_n − y|| ≤ ||y − z_0||, ∀ y ∈ Γ_n, and now a geometric argument shows that

(9) ||x − x_n|| ≤ ||x − z_0||,

for every x ∈ D_n. The sequence {x_n} is bounded in L, hence has a cluster point x_0 ∈ L. Because L is the union of the D_n, we see that ||x − x_0|| ≤ ||x − z_0||, ∀ x ∈ L. We now define a projection P : X → L via

P(t z_0 + x) = t x_0 + x,

for all scalars t and x ∈ L; taking into account (9) we obtain ||P|| = 1, qed.

g) Let A be a bounded set in some nls X. It is clear that

(10) r(A) ≥ (1/2) diam(A),

where diam(A) is the ordinary metric diameter of A. Let us say that A is centerable if equality holds in (10). In general we expect this property to depend on the shape of A vis-a-vis the shape of U(X). For example, in R² consider the triangles A_1 and A_2, where both A_i have vertices at (0, 0) and (2, 0), and the third vertex of A_1 (resp. A_2) is at (1, 1) (resp. (1, √3)). Now both A_i have diameter 2 wrt either the Euclidean norm or the sup norm, and both are centerable wrt the latter norm. However, although A_1 is also centerable wrt the Euclidean norm, A_2 is not. That the (equilateral) triangle A_2 is not centerable wrt the Euclidean norm is only a special case of the following classical result (Jung, 1901): if A ⊂ ℓ²(n), then

(11) r(A) ≤ (n/(2n + 2))^{1/2} diam(A),

with a nas condition for equality being that A is a regular simplex [2, 3]. The infinite dimensional analogue of (11), namely that we may let n → ∞ to obtain

r(A) ≤ 2^{-1/2} diam(A),

has been shown by Routledge [16] to be valid in any Hilbert space.
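Jung's inequality (11) is easy to check numerically for the equilateral triangle A_2 above (the vertices are those of the text; the rest is our own scaffolding):

```python
import numpy as np

A2 = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, np.sqrt(3.0)]])  # regular simplex

diam = max(np.linalg.norm(p - q) for p in A2 for q in A2)     # = 2
center = A2.mean(axis=0)       # centroid = circumcenter for this triangle
r = max(np.linalg.norm(center - p) for p in A2)               # = 2/sqrt(3)

jung = np.sqrt(2 / (2 * 2 + 2)) * diam   # (n/(2n+2))^{1/2} diam with n = 2
print(r, jung)                 # equal: the equality case of (11)
print(r > diam / 2)            # True: A_2 is not centerable
```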



It is natural to inquire whether some Banach spaces contain only centerable sets. In order to produce such examples, let us first recall that the following properties of a (real) Banach space X have been shown by Nachbin [15] and Kelley [12] to be equivalent:

1) X is a "P_1 space", that is, X is constrained (c)) in every Banach space containing it;

2) every collection of mutually intersecting (closed) balls in X has non-void intersection;

3) X is (isometric with) C_R(Ω), where the compact space Ω is extremally disconnected.

It is known that no P_1 space can be smooth and that no infinite dimensional P_1 space can be separable or w-sequentially complete (and hence cannot be reflexive) [8]. The standard examples of such spaces are ℓ^∞(S) and L_R^∞(μ). Combining condition 1) above with the theorem in c), we see that each P_1 space admits centers. A proof of the following theorem has been given by Belobrov [1] utilizing condition 2).

Theorem. Let X be a P_1 space. Then every (bounded) subset A of X is centerable.

Proof. We identify X with C_R(Ω) as in condition 3). Because Ω is extremally disconnected the space C_R(Ω) is boundedly complete (indeed this property is characteristic of such spaces [17]). Consequently, in the notation of d), the functions a̲ and ā belong to X, and so, as in d), (a̲ + ā)/2 ∈ E(A). But whenever this happens, we can, given ε > 0, find x̄ and x̲ in A such that

diam(A) ≥ ||x̄ − x̲|| ≥ x̄(t_0) − x̲(t_0) ≥ (ā(t_0) − ε) − (a̲(t_0) + ε) = 2r(A) − 2ε,

if t_0 ∈ Ω is chosen so that

r(A) = (1/2)||ā − a̲|| = (1/2)(ā(t_0) − a̲(t_0)).

This completes the proof.

Corollary. Every compact subset of any space C_R(Ω) is centerable.

This follows from the preceding argument and the corollary in d). However, an arbitrary bounded set in C_R([0,1]) need not be centerable.

h) To conclude this section we present a result due to Golomb and Weinberger, which shows that centers for certain subsets of Hilbert spaces may be identified with elements of minimal norm. This reduces the estimation problem to one of best approximation. An extensive variety of examples illustrating this method is available in [7].

Let X be a Hilbert space, M a closed linear subspace (especially important for the applications are the finite codimensional subspaces), and ρ > 0. We define A to be the intersection of ρU(X) with some fixed translate of M, and refer to A as a "hypercircle" in X.

Theorem. The center of any hypercircle A in X is the (unique) element of minimal norm.

Proof. Let x_0 ∈ A be the element of minimal norm, that is, x_0 = P_A(θ). By the characterization of b.a.'s in Hilbert space 22d), we have

0 ≤ ⟨y − x_0, x_0⟩, ∀ y ∈ A,

whence for any y ∈ A,

||x_0 − y||² = ||x_0||² + ||y||² − 2⟨x_0, y⟩
≤ ||x_0||² + ||y||² − 2||x_0||²
= ||y||² − ||x_0||² ≤ ρ² − δ² ≡ σ²,

where δ ≡ ||x_0||. To see that x_0 must be the center of A, we will produce y_1, y_2 ∈ A such that ||y_1 − y_2|| = 2σ. Because of (10), this will show that r(A) ≥ σ and hence complete the proof. Define

y_1 = x_0 + σz,
y_2 = x_0 − σz,

for any fixed z ∈ S(M). Then because x_0 ∈ M^⊥,

||x_0 ± σz||² = ||x_0||² + σ² = ρ²,

so that both y_i ∈ A and ||y_1 − y_2||² = 4σ², qed.

Note that we have obtained the formula

r(A) = σ ≡ (ρ² − ||x_0||²)^{1/2},

and that the center x_0 appears as the orthogonal projection of any point in A onto M^⊥.
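A finite-dimensional check of this theorem is straightforward. In the sketch below (the dimensions, the subspace M, and the translate are arbitrary choices of ours), the minimal-norm element of v + M is the component of v orthogonal to M, and boundary points x_0 + σ(Mu) realize the radius σ:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, rho = 6, 3, 2.0                          # ambient dim, dim M, radius

M = np.linalg.qr(rng.standard_normal((n, k)))[0]   # orthonormal basis of M
v = 0.3 * rng.standard_normal(n)                   # A = (v + M) ∩ rho U(X)

x0 = v - M @ (M.T @ v)                 # minimal-norm element of v + M
sigma = np.sqrt(rho**2 - x0 @ x0)      # the predicted radius r(A)

u = rng.standard_normal(k)
u /= np.linalg.norm(u)                 # M @ u is a unit vector of M
y1 = x0 + sigma * (M @ u)
print(np.linalg.norm(y1))              # = rho, so y1 lies in A
print(np.linalg.norm(y1 - x0))         # = sigma = r(A)
```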

Remarks. 1) As has been observed by P. Smith, the argument just presented is equally applicable to the case where the hypercircle is replaced by the intersection of a ball ρU(X) and a "strip" {x ∈ X : c_α ≤ φ_α(x) ≤ d_α}, for some family {φ_α} ⊂ X*. Indeed, letting A be the intersection and x_0 the minimal element, we see as before that r(A) ≤ sup{||x_0 − y|| : y ∈ A} ≤ σ, and then that r(A) = σ by consideration of the hypercircle {x ∈ A : φ_α(x) = φ_α(x_0)}.

2) Consider the very special case of the above theorem where M is one-dimensional and hence the set A is just a chord of the ball ρU(X). Now the center of a line segment is obviously its mid-point (in any nls). Hence the theorem asserts in this case that the minimal element of any such chord is its mid-point. It has recently been established by Gurarii and Sozonov [9] that this property is characteristic of inner-product spaces.

i) Frequently, in practical situations of estimation, we are confronted with the problem of estimating or "predicting" the value of one or more linear functionals at the unknown element x_P of X (notation as in the beginning of this section). Assuming as usual that our knowledge of x_P can be compressed into the assertion "x_P ∈ A", and given some φ ∈ X* (a "prediction functional"), we would like to enclose the image φ(A) in as small an interval as possible (assuming real scalars for simplicity). Then the mid-point of that interval is our estimate or "predicted value" of φ(x_P) and half the length of that interval is a bound on the error. This of course is simply a center problem in the scalar field. In particular, when A is convex, we would like to be able to actually compute the interval φ(A).

The only cases for which this problem has received attention are those where A is a finite codimensional hypercircle. Here we

report only the solution when X is a Hilbert space, without the finite codimensionality restriction. However, the interested reader is referred to some work of Meinguet [14], wherein a formula of some value for φ(A) is obtained for a general nls.

In the next theorem we let A be a hypercircle, defined as in h) to be the intersection of a translate of a closed subspace M with ρU(X), for some ρ > 0.

Theorem. Let X be a real Hilbert space, A a hypercircle in X, and φ ∈ X*. Then

φ(A) = [φ(x_0) − σ||φ|M||, φ(x_0) + σ||φ|M||],

where x_0 is the center (= minimal element) of A, and σ² ≡ ρ² − ||x_0||², as in h).

Proof. We know that x_0 = y − P_M(y), ∀ y ∈ A. Since y = P_M(y) + (y − P_M(y)), we have

||P_M(y)||² = ||y||² − ||x_0||² ≤ σ²,

whence

|φ(y) − φ(x_0)| = |φ(P_M(y))| ≤ ||φ|M|| ||P_M(y)|| ≤ σ||φ|M||.

To see that these bounds cannot be decreased, define

y_1 = x_0 + σm_0,
y_2 = x_0 − σm_0,

where m_0 ∈ S(M) satisfies φ(m_0) = ||φ|M||. Then just as in h) we have y_1, y_2 ∈ A and

φ(y_1) = φ(x_0) + σ||φ|M||,
φ(y_2) = φ(x_0) − σ||φ|M||.

This completes the proof.

This theorem shows that the optimal estimate for φ(x_P), given that x_P ∈ A, is the value φ(x_0), and the associated estimation error is ||φ|M|| (ρ² − ||x_0||²)^{1/2}.
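Continuing the finite-dimensional sketch from h), the interval φ(A) for a functional φ = ⟨c, ·⟩ follows from the same ingredients (the vector c is another arbitrary choice of ours):

```python
# continuing the sketch from h): the exact range of phi(x) = <c, x> over A
c = rng.standard_normal(n)
phi_M = np.linalg.norm(M.T @ c)        # ||phi restricted to M||
print(c @ x0 - sigma * phi_M, c @ x0 + sigma * phi_M)   # endpoints of phi(A)
```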

Example. Suppose that we have the following information about a function x(·) on the interval [a, b]:

x ∈ C_R([a, b]), and x′ is sectionally continuous on [a, b];
x(a) = α, x(b) = β;
∫_a^b x′(t)² dt ≤ ρ².

What is our optimal guess for x, and what value can we predict for ∫_a^b x(t) dt?

This problem is clearly a (very) special case of the ones just discussed. We choose as our underlying Hilbert space the space X = H¹([a, b]) (cf. Exercise 20, p. 31), with inner product

⟨x, y⟩ = ∫_a^b x′(t) y′(t) dt.

In order for ⟨x, x⟩^{1/2} to be a norm we must identify functions differing by a constant. Since the constant functions are disjoint from the subspace M of functions vanishing at a and b, this identification cannot lead to any ambiguity in our answer.

According to h) our optimal estimate for the unknown x is the minimal element x_0 in the variety {x ∈ X : x(a) = α, x(b) = β}. By using any of the several optimization techniques from Part II (for example, 12d, e) or 16f)) we are led to conclude that x_0 must have the property that for some scalars c and d, and all x ∈ X,

⟨x, x_0⟩ ≡ ∫_a^b x′(t) x_0′(t) dt = c x(a) + d x(b).

This entails x_0′ = constant, and so x_0 is just the linear function on [a, b] with the prescribed values at the endpoints. It follows that the predicted value for ∫_a^b x(t) dt is

(12) ∫_a^b x_0(t) dt = (b − a)(α + β)/2.

In other words, given only such limited information about the unknown function x, the optimal method for estimating its definite integral is to apply the trapezoid rule.

What is the error incurred by choosing (12) as our predicted value for ∫_a^b x(t) dt? The answer to this depends on computing the value of the program

(13) max{∫_a^b x(t) dt : x ∈ S(M)}.

This program has a unique solution m_0, characterized (again by the results in either §12 or §16) by the existence of c, d and λ > 0 such that

∫_a^b x(t) dt = c x(a) + d x(b) + λ ∫_a^b x′(t) m_0′(t) dt,

for all x. This entails m_0″ = constant, so m_0 must have the form

m_0(t) = p(t − a)(t − b),

for some scalar p. Choosing p so that ||m_0|| = 1, we find

p = (3/(b − a)³)^{1/2},

and thence the value of (13) is

((b − a)³/12)^{1/2}.

Assembling all this information we finally arrive at the conclusion that for any x satisfying the conditions listed at the beginning of this example,

|∫_a^b x(t) dt − (α + β)(b − a)/2| ≤ ((b − a)³/12)^{1/2} (ρ² − (β − α)²/(b − a))^{1/2}.
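The extremal function x_0 + σm_0 attains this bound, which makes the whole computation easy to verify numerically. A minimal sketch (the values of a, b, α, β, ρ are invented test data):

```python
import numpy as np

a, b, alpha, beta, rho = 0.0, 1.0, 1.0, 2.0, 3.0
t = np.linspace(a, b, 100001)

x0 = alpha + (beta - alpha) * (t - a) / (b - a)    # the linear estimate
sigma = np.sqrt(rho**2 - (beta - alpha)**2 / (b - a))

p = np.sqrt(3.0 / (b - a)**3)
m0 = -p * (t - a) * (t - b)            # unit norm in M, positive integral

x_worst = x0 + sigma * m0              # admissible and extremal
integral = np.sum((x_worst[:-1] + x_worst[1:]) / 2 * np.diff(t))
trap = (b - a) * (alpha + beta) / 2
bound = np.sqrt((b - a)**3 / 12) * sigma

print(integral - trap, bound)          # the two agree
```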

References for §33

1) P. Belobrov, On the problem of the Chebyshev center of a set. Izv. Vys. Ucheb. Zaved. (1964), 3-9. (Russian)

2) L. Blumenthal and G. Wahlin, On the spherical surface of smallest


radius enclosing a bounded subset of n-dimensional euclidean
space. Bull. Amer. Math. Soc. 47 (1941), 771-777.

3) L. Danzer, B. Grünbaum, and V. Klee, Helly's theorem and its relatives. Convexity, Proc. Symp. Pure Math. 7 (1963), Amer. Math. Soc.; 101-180.

4) M. Day, Normed Linear Spaces. Academic Press, New York, 1962.

5) D. Dean, Direct factors of (AL)-spaces. Bull. Amer. Math. Soc.


71 (1965), 368-371.

6) A. Garkavi, The best possible net and the best possible cross-
section of a set in a normed space. Izv. Akad. Nauk SSSR 26
(1962), 87-106. (Russian) (Translated in Amer. Math. Soc.
Trans., Ser. 2, 39 (1964).)

7) M. Golomb and H. Weinberger, Optimal approximation and error


bounds. On Numerical Approximation, R. Langer, Ed., Univ. of
Wisconsin Press, Madison, 1959; 117-190.

8) A. Grothendieck, Sur les applications linéaires faiblement compactes d'espaces du type C(K). Can. J. Math. 5 (1953), 129-173.

9) N. Gurarii and Ju. Sozonov, Normed spaces in which the unit sphere
has no bias. Math. Zametki 7 (1970), 307-310. (Russian)
(Translated in Math. Notes 7 (1970), 187-189.)

10) R. James and S. Swaminathan, Normed linear spaces that are uniformly convex in every direction. Preprint.

11) M. Kadets and V. Zamyatin, Chebyshev centers in the space C[a,b]. Teo. Funk., Funkcion. Anal. Pril. 7 (1968), 20-26. (Russian)

12) J. Kelley, Banach spaces with the extension property. Trans.


Amer. Math. Soc. 72 (1952), 323-326.

13) P. Laurent and P.-Dinh-Tuan, Global approximation of a compact


set by elements of a convex set in a normed space. Num.
Math. 15 (1970), 137-150.

14) J. Meinguet, Optimal approximation and interpolation in normed spaces. Numerical Approximation to Functions and Data, J. Hayes, Ed., Athlone Press, London, 1970; 143-157.

15) L. Nachbin, A theorem of the Hahn-Banach type for linear trans-


formations. Trans. Amer. Math. Soc. 68 (1950), 28-46.

16) N. Routledge, A result in Hilbert space. Quart. J. Math.


3 (1952), 12-18.

17) M. Stone, Boundedness properties in function-lattices. Can. J.


Math. 1 (1949), 176-186.

18) M. Valadier, Sous-différentiels d'une borne supérieure et d'une somme continue de fonctions convexes. C. R. Acad. Sci. Paris 268 (1969), A39-A42.

§34. Quasi-Solutions

A familiar application of optimization techniques, dating back to Cauchy, is to the location (or at least approximate location) of roots of functions and more general mappings; that is, to the solution of equations. For example, consider the problem of locating roots of a given polynomial p. We define a function f on R² in the following way:

p(z) ≡ p(x + iy) ≡ g(x, y) + i h(x, y),
f(x, y) ≡ g(x, y)² + h(x, y)².

Then clearly the value of the program (R², f) is 0, and even though f need not be convex, there are no other (local) minima of f (as follows easily from the Maximum Modulus Principle). Thus any scheme for minimizing smooth functions over R² can be used to locate roots of polynomials. Clearly this observation does not depend on p being a polynomial; it is equally applicable to any analytic function.
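A minimal sketch of this root-finding device (the cubic and the starting point are our own test data; any general-purpose minimizer would serve):

```python
import numpy as np
from scipy.optimize import minimize

# locate a root of p(z) = z^3 - 1 by minimizing f(x, y) = |p(x + iy)|^2
def f(v):
    z = v[0] + 1j * v[1]
    return abs(z**3 - 1.0) ** 2

res = minimize(f, x0=[-0.4, 1.2], method="Nelder-Mead")
print(res.x)   # near (-1/2, sqrt(3)/2), one of the three cube roots of 1
```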

In this section we give a brief introduction to the theory of equation solving via optimization. However, in order to have the resulting programs convex, we will confine ourselves to linear equations. On the other hand the underlying vector spaces can be infinite dimensional, as usual.

a) Let X and Y be respectively a tls and a nls, and let A ∈ L(X, Y). We will assume that A is injective, but not invertible. This means that the problem

(1) A(x) = y

is not well-posed (cf. 32e)); that is, a solution need not exist for all y ∈ Y, and that solution which does exist for y ∈ range(A) does not depend continuously on y. This lack of "stability" in the inverse problem inhibits the use of any sort of approximation scheme for solving (1).

As was observed by Tikhonov [4], one way to circumvent this difficulty is to seek solutions to (1) only within a fixed compact subset M ⊂ X. The point of this is that, due to a familiar theorem from topology, the restricted mapping A|M is then a homeomorphism. Hence if in (1) y varies within A(M), then the solution x depends continuously on y. Of course, the drawback to this approach is that it may not be clear whether y belongs to A(M) (in practical problems y may be known only via experiments, and therefore is not given with complete accuracy).

Another possible approach to the study of ill-posed problems such as (1) is to alter the notion of a solution. This was first suggested by Ivanov [1].

Definition. Let A, X, Y be as defined above and let M be a subset of X. An M-quasi-solution of equation (1) is a solution of the program (X, f + δ_M), where f(x) ≡ ||A(x) − y||.

Evidently, an M-quasi-solution exists if and only if y has a b.a. from A(M). In particular this is the case whenever A(M) is proximinal (30a)). A fairly general sufficient condition for this to occur is given next.

Lemma. Let M = B + F, where B is a compact subset of X and F is a finite dimensional subspace of X. Then A(M) is boundedly compact (31a)), and hence proximinal in Y.

Proof. A(M) = A(B) + A(F), the sum of a compact subset and a finite dimensional subspace of Y. Let {y_n} = {b_n′ + f_n′} be a bounded sequence in A(M). Then {b_n′}, and consequently {f_n′}, are bounded sequences. It is clear from this that we may extract a convergent subsequence {b_{n_i}′} from {b_n′}, and then a similar subsequence from {f_{n_i}′}. This shows that {y_n} has a convergent subsequence, and since A(M) is closed, the proof is complete.

For the next theorem we assume that X is also a nls, that

both X and Y are infinite dimensional, and that range (A) is

dense in Y. We see that the situation under discussion occurs in

particular whenever A is a compact (injective) linear map (that

is, such an operator cannot be invertible).

Theorem. (Ivanov) Let M = B + F ⊂ X as in the previous lemma. Then the M-quasi-solution program (X, f + δ_M) is stable, and is well-posed whenever N ≡ A(M) is Chebyshev.

Proof. Suppose, for the time being, that we know that the operator A_1 ≡ A|M is invertible. To establish the stability (31e)) of the program, we consider any minimizing sequence {x_n} (i.e., {x_n} ⊂ M and ||A(x_n) − y|| → d(y, N)). Then {A(x_n)} is bounded and hence

{x_n} = {A_1^{-1}(A(x_n))}

is also bounded. Since M is boundedly compact, stability follows. Now if N is Chebyshev, the associated metric projection P_N is continuous. Hence, given y ∈ Y, the unique M-quasi-solution of (1) is given by

(2) x = A_1^{-1}(P_N(y)).

It remains to show that A_1 is a homeomorphism. To this end, we choose a complementary (closed) subspace G for A(F):

(3) Y = A(F) ⊕ G,

and define a (closed) subspace E ⊂ X by

E = ⊥(A*(G^⊥)).

Then we easily see that

X = E ⊕ F,

and that the closure of A(E) is G (using the density of range(A) in Y). Next, let B_E be the projection of B on E (along F), so that M = B_E + F. Now the mappings A_B ≡ A|B_E and A_F ≡ A|F are invertible on their respective domains. Hence if P : Y → A(F) is the projection operator defined by (3), and if y ∈ N, then

A_1^{-1}(y) = A_F^{-1}(P(y)) + A_B^{-1}((I − P)(y));

this formula exhibits the continuous dependence of A_1^{-1}(y) on y, qed.

b) In order to attain a more versatile result, one which does not involve the restriction to subsets compact in a norm topology, we present the following variant of the theorem in a).

Theorem. Let X be a lcs and Y an E-space. Let A be a closed injective linear mapping with domain D(A) dense in X, and with values in Y. Assume that there is a compact set B ⊂ X such that M ≡ D(A) ∩ B is convex. Then the M-quasi-solution program is well-posed.

Proof. In analogy with the preceding proof, let A_1 = A|M and N = A(M). We will show that A_1 has a continuous inverse (on N) and that N is closed. Since N is also convex, P_N is then (single-valued and) continuous because of the E-property (32e)). Then formula (2) will hold for the M-quasi-solution as a function of y, and the result will be established.

Let M_1 be a relatively closed subset of M; that is, M_1 = D(A) ∩ B_1, where B_1 is a closed subset of B. We claim that N_1 ≡ A(M_1) is closed. This will prove both that N is closed (take M_1 = M), and that A_1^{-1} is continuous (since its inverse A_1 is mapping closed sets into closed sets). Let {y_n} ⊂ N_1 be a convergent sequence with limit y_0. Then x_n ≡ A^{-1}(y_n) ∈ M_1 ⊂ B_1, which is compact. Consequently, {x_n} has a cluster point x_0 ∈ B_1. Because A is closed, the point (x_0, y_0) belongs to the graph of A. Thus x_0 ∈ D(A), so in fact, x_0 ∈ M_1, and y_0 = A(x_0) ∈ N_1, qed.

Remark. A particular circumstance where the hypotheses of this theorem are satisfied is the following. There is a consistent vector topology τ on X, stronger than the given topology, and the mapping A is everywhere defined on X and τ-continuous. For example, τ might be the Mackey topology on X. Then since such an A is weakly continuous on X, A is closed wrt the original topology on X.

Example. Let X and Y be Hilbert spaces of infinite dimension, and A ∈ L(X, Y) be compact and injective. For some r > 0, let M = rU(X). Now M is convex and w-compact, so by the preceding remark and theorem, the M-quasi-solution program is well-posed. Let us compute this solution for a given y ∈ Y.

The operator A*A is compact, self-adjoint and positive semi-definite. Let λ_1 ≥ λ_2 ≥ ... > 0 be its eigenvalues, and {u_1, u_2, ...} the corresponding orthonormal basis of eigenvectors. Set β_n = ⟨A(u_n), y⟩ for n = 1, 2, .... Then the unique M-quasi-solution for the equation A(x) = y is

(4) x = Σ_{n=1}^∞ (β_n/(λ_n + λ)) u_n,

where λ = 0 if

(5) Σ_{n=1}^∞ β_n²/λ_n² ≤ r²,

and otherwise λ is the positive root of the equation

(6) Σ_{n=1}^∞ β_n²/(λ_n + λ)² = r².

To verify this assertion, we put f(x) = ||A(x) − y||², and compute that

θ = ∇f(x) = 2A*(A(x) − y)

if and only if

(7) A*A(x) = A*(y) = Σ_{n=1}^∞ β_n u_n.

Expanding x in terms of the basis {u_n}, and substituting it into (7), we find that

x = Σ_{n=1}^∞ (β_n/λ_n) u_n.

Therefore, if (5) holds, this x must be the desired quasi-solution. Otherwise, we have

Σ_{n=1}^∞ (β_n/λ_n)² > r²,

and now we have the constrained program of minimizing f(x) subject to ||x||² ≤ r². This is an ordinary convex program to which the (classical) Kuhn-Tucker conditions of 12d) are applicable. We conclude that there is λ > 0 such that the solution x satisfies ||x|| = r and

2A*(A(x) − y) + λ(2x) = θ,

or

A*A(x) + λx = A*(y).

Expanding x and A*(y) in terms of the basis {u_n} immediately leads to (4); the requirement ||x|| = r then implies that λ satisfies (6).
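In a truncated eigenbasis this recipe is a few lines of code. A minimal sketch (the eigenvalues λ_n, the data β_n, and the radius r are invented for illustration; the multiplier in (6) is found with a bracketing root-finder):

```python
import numpy as np
from scipy.optimize import brentq

lam = 1.0 / np.arange(1, 21) ** 2      # eigenvalues of A*A (hypothetical)
beta = 1.0 / np.arange(1, 21) ** 2     # beta_n = <A(u_n), y> (hypothetical)
r = 1.5

if np.sum((beta / lam) ** 2) <= r**2:  # test (5)
    mult = 0.0
else:                                  # solve (6) for the multiplier
    g = lambda s: np.sum(beta**2 / (lam + s) ** 2) - r**2
    mult = brentq(g, 0.0, 1e9)         # g decreases, so the root is unique

x = beta / (lam + mult)                # coefficients of the quasi-solution (4)
print(mult, np.linalg.norm(x))         # ||x|| = r when the constraint binds
```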

c) The practical problem involved in solving equation (1) (under the hypotheses on the operator A made in a)) is the following. Given that y is either known exactly, or can be computed (approximated) to arbitrarily high accuracy, compute (approximate) the solution x to arbitrary accuracy. That is, assuming that y ∈ range(A), and that we have a sequence {y_n} ⊂ Y with y_n → y, find a sequence {x_n} ⊂ X such that x_n → x ≡ A^{-1}(y).

One possible way to utilize the preceding results on quasi-solutions for the resolution of this problem is to choose an increasing sequence {M_n} of compact subsets of X such that

cl(∪_n M_n) = X.

This is certainly possible if X is a separable nls. We might then let x_n be the M_n-quasi-solution of equation (1) and try to prove that x_n → x. This scheme has in fact been suggested by Lavrentiev [2, p. 8], and alleged by him to always lead to a convergent sequence {x_n}. (Using the continuity and injectivity of A, it is not hard to see that the only possible limit of {x_n} is the true solution x.) However, this allegation is false, even when the underlying spaces X and Y are Hilbert spaces, as we see next.

Example. Let X = Y = ℓ². We define a compact injective linear operator A on X by

z = (z_1, z_2, ..., z_n, ...) ↦ A(z) = (z_1, z_2/2, ..., z_n/n, ...).

Let x = (1, 1/2, 1/3, ...) and y = A(x). Finally, define

B_n = {z ∈ X : |z_i| ≤ n, i ≤ n; z_i = 0, i > n},
M_n = co(B_n ∪ {x + e_{n²}}),

where e_n is the n-th standard unit vector. Now first we see that each M_n is compact and convex, and that their union is dense in X. Also, although {M_n} as given is not an increasing sequence, appropriate subsequences are increasing, for example,

{M_2, M_4, M_16, M_256, ...} = {M_{2^{2^n}}}.

Next we claim that the B_n-quasi-solution of the corresponding equation (1) is z_n = (1, 1/2, ..., 1/n, 0, 0, ...). From this it follows that

||A(z_n) − y|| = ||A(z_n − x)|| = (Σ_{k=n+1}^∞ k^{-4})^{1/2} > (1/3) n^{-3/2}

for all large n. On the other hand, for large n, the M_n-quasi-solution is x + e_{n²}, since

||A(x + e_{n²}) − y|| = ||A(e_{n²})|| = n^{-2},

which is eventually < a_n ≡ (1/3) n^{-3/2}. However, obviously x + e_{n²} ↛ x.

Thus, for Lavrentiev's scheme to be successful, we must be able to guarantee in advance that x ∈ ∪ M_n. Suppose this to be the case, say x ∈ M_n for n ≥ n_0. Let x_n (resp. x_n^k) be the M_n-quasi-solution of the equation A(z) = y (resp. A(z) = y_k). Then we have

lim_{k→∞} x_n^k = x_n, ∀ n,
x_n = x, n ≥ n_0.

From these equations it is clear that we can produce sequences in X which converge to the true solution x.

To avoid the problem of choosing the sets M_n so as to be sure in advance that x ∈ ∪ M_n, we might assume either that X is a reflexive nls, or else that X is a dual space and that A is the adjoint of an operator in L(Y*, *X) (where *X is the pre-dual of X). If we also assume that Y is an E-space, the result in b) becomes applicable, and we can produce sequences in X which converge weakly, or weak-star, to x.

d) A recent and related approach to the approximate solution of ill-posed linear equations of the form (1) is due to Tanana [3]. The assumptions on the mapping A are the same, but those on X and Y made in b) are interchanged. That is, it is assumed that X is an E-space and Y a lcs. Let us suppose given a directed nbhd. basis {N_α} of closed convex y-nbhds., for some y ∈ range(A). (In a practical problem, this corresponds to the possibility of determining y by experiment or measurement to arbitrary accuracy.) Then the discrepancy method of Ivanov and Tanana consists of minimizing the norm in X over the sets M_α ≡ A^{-1}(N_α). For each α the E-property guarantees a unique solution x_α ∈ M_α, and these are considered to be approximate solutions of A(x) = y. The E-property further entails the convergence of {x_α} to the exact solution A^{-1}(y). Conversely, if X is separable, and this discrepancy method always yields convergent nets of approximate solutions, for every A, every Y, and every directed nbhd. basis of every element of range(A), then X must be an E-space.

References for §34

1) V. Ivanov, On linear problems which are not well posed. Soviet Math. Dokl. 3(1962), 981-983.

2) M. Lavrentiev, Some Improperly Posed Problems of Mathematical Physics. Springer-Verlag, New York, 1967.

3) V. Tanana, Incorrectly posed problems and the geometry of Banach spaces. Soviet Math. Dokl. 11(1970), 864-867.

4) A. Tikhonov, On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(1944), 195-198. (Russian)

§35. Generalized Inverses

We continue with the study of abstract linear equations of the form A(x) = y, but with a somewhat different viewpoint and toward other ends than in the preceding section. Not only do we allow the equation to be inconsistent for some y's, but also we permit a superfluity of solutions, that is, the operator A need not be injective. We shall attempt to single out a unique "best approximate solution" for a given y, and to study the correspondence between y and this "solution". The mapping so defined has some of the properties of an inverse for A, and is known as the "generalized inverse" of A. It must be noted at the outset that the only really satisfactory results require that both x and y vary in Hilbert spaces, and that A have closed range.

a) Let X and Y be nls and A ∈ L(X,Y). For a given y_0 ∈ Y, we consider the linear equation

(1) A(x) = y_0.

Definition. An X-quasi-solution (34a)) of (1) is called an extremal solution (or sometimes, a virtual solution). An extremal solution of minimal norm is called a best approximate solution (b.a.s.) to (1).

Let R(A) (resp. N(A)) denote the range (resp. nullspace) of the operator A. It is clear that the existence of an extremal solution to (1) is equivalent to the condition

P_{R(A)}(y_0) ≠ ∅.

In particular, if cl R(A) is proximinal (30a)), then this last condition becomes

(2) P_{cl R(A)}(y_0) ∈ R(A).

A more sophisticated nas condition for the existence of an extremal solution is given next, in the Hilbert space case.

Theorem. (Tseng) Let X and Y be Hilbert spaces. There is an extremal solution to (1) if and only if there exists a positive constant β such that

(3) |⟨y_0, y⟩|² ≤ β⟨y, AA*(y)⟩

for every y ∈ N(AA*)^⊥.

Proof. Let us first prove the necessity of (3). We have

Y = cl R(A) ⊕ R(A)^⊥ = cl R(A) ⊕ N(A*).

Let y_0 = ỹ + w be the associated decomposition of y_0. Since an extremal solution exists, we have from (2) that ỹ ∈ R(A), that is, ỹ = A(x̃) for some x̃ ∈ X. Let β = ||x̃||². Then, noting that N(AA*) = N(A*), we have, for any y ∈ N(AA*)^⊥,

|⟨y_0, y⟩| = |⟨A(x̃), y⟩| = |⟨x̃, A*(y)⟩| ≤ ||x̃|| ||A*(y)||,

which squares to (3).

Conversely, let us demonstrate the sufficiency of (3). We define a new inner product (·,·) on cl R(A) by

(y_1, y_2) ≡ ⟨y_1, AA*(y_2)⟩.

Let Z be the completion of cl R(A) in the metric defined by (·,·). Since A* is continuous wrt this metric, and X is complete, we can extend A* to belong to L(Z, X). Decomposing y_0 as above,

namely y_0 = ỹ + w, we have for any y ∈ cl R(A):

|⟨ỹ, y⟩|² = |⟨y_0, y⟩|² ≤ β⟨y, AA*(y)⟩ = β(y, y).

That is, the linear functional f(y) ≡ ⟨ỹ, y⟩ is continuous on cl R(A) wrt the (·,·)-metric, and therefore can be extended to belong to Z*. By the Riesz Representation Theorem we have

f(z) = (z, z̄), ∀ z ∈ Z,

for some z̄ ∈ Z. Put x̃ ≡ A*(z̄) ∈ X. Hence, for y ∈ cl R(A),

⟨ỹ, y⟩ = f(y) = (y, z̄) = ⟨A*(y), A*(z̄)⟩ = ⟨y, A(x̃)⟩.

Thus we see that ∀ y ∈ cl R(A),

⟨y, A(x̃) − ỹ⟩ = 0.

But this of course entails A(x̃) = ỹ; in other words condition (2) holds, and x̃ is an extremal solution to equation (1), qed.

b) Returning now to the case of general nls X and Y, let E(A, y_0) be the (possibly void) set of extremal solutions to equation (1). Since E(A, y_0) is always closed and convex, we see that whenever X is reflexive and rotund, in particular whenever X is an E-space, there is a unique b.a.s. x_0. The idea is now to study the mapping

A⁺ : Y ⊃ D(A⁺) → X,
A⁺(y_0) ≡ x_0.

By definition, the domain D(A⁺) consists of those y_0 ∈ Y for which there exists a unique b.a.s. in X. Clearly, θ ∈ D(A⁺) always.

Definition. The mapping A⁺ just defined is the generalized inverse of A.

We now give conditions which imply that A⁺ is densely defined, and/or linear, and/or continuous, etc.

Theorem. Let X be reflexive and rotund, and let R(A) be a Chebyshev subspace of Y. Put B = (A|N(A)^θ)^{-1}. Then D(A⁺) ⊃ D(P_{R(A)}) and

(4) A⁺|D(P_{R(A)}) = B P_{R(A)}.

In particular, A⁺ is densely defined on Y.

Proof. We first note that P_{R(A)} is densely defined on Y because

D(P_{R(A)}) ⊃ R(A) ⊕ (cl R(A))^θ,

which is dense in Y by 32c). Next we note that the mapping B is well-defined, that is, A|N(A)^θ is injective, because N(A) is a Chebyshev subspace and so 32c-1) applies. Of course, B need not be linear or continuous. Now let y_0 ∈ D(P_{R(A)}) and define

x_0 = B P_{R(A)}(y_0).

Then x_0 ∈ E(A, y_0) because

||A(x_0) − y_0|| = ||P_{R(A)}(y_0) − y_0|| = d(y_0, R(A)) ≤ ||A(x) − y_0||, ∀ x ∈ X.

Now again using the (non-linear) direct sum decomposition of 32c), namely

X = N(A) ⊕ N(A)^θ,

we can express any x ∈ X as x = n + p, and then if x is also in E(A, y_0) we find

||x|| = ||n + p|| = ||n + BA(x)|| ≥ d(BA(x), N(A)) = d(B P_{R(A)}(y_0), N(A)) = ||x_0||.

This shows that x_0 is a b.a.s. to equation (1). Since any such b.a.s. must be unique because of the hypotheses on X, it follows that y_0 ∈ D(A⁺) and that (4) holds, qed.

It follows immediately that if to the preceding hypotheses we adjoin the assumption that A has closed range, then D(A⁺) = Y. If we also assume a little more about X and Y, then we can obtain a more striking improvement on the theorem.

Corollary. Let X and Y be E-spaces, and let A ∈ L(X,Y) have closed range. Then A⁺ is a continuous mapping of Y onto N(A)^θ, whose restriction to R(A) is a homeomorphism.

Proof. First of all, we have D(P_{R(A)}) = Y, so that A⁺ is given by the right hand side of (4). Since X and Y are E-spaces, both metric projections P_{N(A)} and P_{R(A)} are continuous by 32e). Hence we are reduced to checking the continuity of B. But, if A_1 is the isomorphism of X/N(A) with R(A) induced by A, and Q = Q_{N(A)}|N(A)^θ (where, for any closed subspace M ⊂ X, Q_M is the associated quotient map), then

B = Q^{-1} A_1^{-1}

is continuous by 32g), qed.

Since a number of operators A of interest have finite dimensional nullspaces, we can frequently drop the hypothesis in the Corollary that X be an E-space and just require that N(A) be a finite dimensional Chebyshev subspace of X.

Let us also note that under the hypotheses of the last theorem,

AA⁺ = P_{R(A)},
A⁺A = I − P_{N(A)},

so that

(5) AA⁺A = A,
A⁺AA⁺ = A⁺.
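In the matrix case these identities can be observed directly with numpy's built-in pseudoinverse (the matrix below is random test data of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))  # rank 3

Ap = np.linalg.pinv(A)                 # the pseudoinverse of A
print(np.allclose(A @ Ap @ A, A))      # A A+ A = A
print(np.allclose(Ap @ A @ Ap, Ap))    # A+ A A+ = A+

P_R = A @ Ap                           # A A+ = orthogonal projection on R(A)
P_N = np.eye(7) - Ap @ A               # I - A+ A = projection on N(A)
print(np.allclose(P_R @ P_R, P_R), np.allclose(P_N @ P_N, P_N))
```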

Example. In this example we see how the generalized inverse determines the solution set of the corresponding linear equation. Let X and Y be nls and A ∈ L(X,Y). Suppose that A⁺ satisfies (5), and that equation (1) has a solution, say x_0. Then every solution x of (1) has the form

(6) x = A⁺(y_0) + (I − A⁺A)(z),

for some z ∈ X, and conversely. Because, if x_1 is a solution of (1), then we may take z = x_1 in (6). And conversely, given z ∈ X and x defined by (6), we have

A(x) = A(A⁺(y_0)) + A(z) − AA⁺A(z)
= AA⁺A(x_0) + A(z) − A(z)
= A(x_0) = y_0.

In particular, the set of solutions to the homogeneous equation A(x) = θ is simply the range of I − A⁺A.

Also, we can observe that (5) entails a consistency criterion for equation (1), namely, this equation is consistent (i.e., solvable) if and only if AA⁺(y_0) = y_0.
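Continuing the matrix sketch above, formula (6) and the consistency criterion read:

```python
# continuing the sketch above: the solution set of a consistent system
y0 = A @ rng.standard_normal(7)        # y0 in R(A), so (1) is consistent
print(np.allclose(A @ Ap @ y0, y0))    # consistency: A A+ (y0) = y0

z = rng.standard_normal(7)
x = Ap @ y0 + (np.eye(7) - Ap @ A) @ z # formula (6)
print(np.allclose(A @ x, y0))          # every such x solves A x = y0
```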

c) For the remainder of this section we will assume that both X and Y are Hilbert spaces. This will suffice to guarantee that generalized inverses are always linear mappings; this in turn leads to a much more elegant (and useful) theory.

Theorem. Let X and Y be Hilbert spaces and A ∈ L(X,Y). Then A⁺ is a closed, densely defined linear mapping on Y.

Proof. The theorem in b) applies here and allows us to conclude that A⁺ is a densely defined linear mapping on Y, namely

A⁺ = B P_{R(A)},

whose domain D(A⁺) is the dense subspace R(A) ⊕ R(A)^⊥ of Y. Here B ≡ (A|N(A)^⊥)^{-1}. To see that A⁺ is closed, select {y_n} ⊂ D(A⁺) with y_n → y ∈ Y and A⁺(y_n) → x ∈ X. We can write

y_n = A(x_n) + v_n,

for x_n ∈ N(A)^⊥ and v_n ∈ R(A)^⊥. Then x_n = A⁺(y_n) → x, hence A(x_n) → A(x). Hence also v_n → y − A(x) ∈ R(A)^⊥. This shows that y ∈ D(A⁺) and that A⁺(y) = x, qed.

A quite similar argument shows that A⁺ is still a closed linear mapping if A is any closed linear mapping on X [7]. Now in the case we are considering, namely A ∈ L(X,Y), if we assume also that R(A) is closed in Y, then it is clear that A⁺ ∈ L(Y,X). In this most important case we adopt a special terminology and notation.

Definition. If for some A ∈ L(X,Y) we have A⁺ ∈ L(Y,X), then A⁺ is called the pseudoinverse of A, and is written A†.

Since we are only dealing with Hilbert spaces we see that A ∈ L(X,Y) has a pseudoinverse exactly when A has closed range (in which case A is usually said to be normally solvable).

2) If A is a partial isometry on X, then A† = A*. In particular, if A is an orthogonal projection, then A† = A. If A is normally solvable, then (A†)* = (A*)†.

3) For A ∈ L(X,Y), R(A) is closed exactly when

0 < γ(A) ≡ inf{||A(x)|| : x ∈ S(N(A)^⊥)}.

In this case we have ||A†|| = γ(A)^{-1} [11].

4) Showalter [13] shows that for A ∈ L(X,Y) we have

A† = lim_{t→∞} ∫_0^t exp(-A*A(t-s)) A* ds ≡ lim_{t→∞} B(t),

and estimates the rate of convergence by

γ(A) ‖A† - B(t)‖ ≤ exp(-t γ(A)^2),    t > 0.

It is further shown in [14], again with estimates of the convergence rate, that if A does not have closed range, we still have

A^+(y) = lim_{t→∞} B(t)(y)

for all y ∈ Y, that is, B(t) → A^+ strongly.
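
Since B(t) solves the operator differential equation B'(t) = A* - A*A B(t), B(0) = 0, Showalter's representation can be sampled numerically by simple Euler steps; the matrix below is an arbitrary full-column-rank example, and the step size and iteration count are ad hoc choices.

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 4.],
                  [0., 1.]])
    AtA = A.T @ A
    h = 1.0 / np.linalg.norm(AtA, 2)         # step size; need h < 2/||A*A||
    B = np.zeros((2, 3))                     # B(0) = 0
    for _ in range(20000):                   # Euler steps: B' = A* - A*A B
        B = B + h * (A.T - AtA @ B)

    assert np.allclose(B, np.linalg.pinv(A), atol=1e-6)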

5) Decell [6] applies the Cayley-Hamilton theorem to AA*, where A is an arbitrary (complex) matrix, to deduce the following formula for A†: let

p(λ) = (-1)^n Σ_{j=0}^n a_j λ^{n-j},    a_0 = 1,

be the characteristic polynomial of AA*. If k = max{j : a_j ≠ 0}, then

A† = -a_k^{-1} A* Σ_{j=0}^{k-1} a_j (AA*)^{k-j-1};

if k = 0, then A† = 0.
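
Decell's formula translates directly into code; characteristic-polynomial coefficients are numerically delicate, so the sketch below (with np.poly supplying the a_j, since det(λI - AA*) = Σ a_j λ^{n-j}) is meant only for small, well-conditioned examples.

    import numpy as np

    def decell_pinv(A, tol=1e-10):
        # Decell's Cayley-Hamilton formula for the pseudoinverse.
        M = A @ A.conj().T
        n = M.shape[0]
        a = np.poly(M)                       # a[0] = 1, then a_1, ..., a_n
        k = max(j for j in range(n + 1) if abs(a[j]) > tol)
        if k == 0:                           # only if M = 0, i.e. A = 0
            return np.zeros_like(A.conj().T)
        S = sum(a[j] * np.linalg.matrix_power(M, k - j - 1)
                for j in range(k))
        return (-1.0 / a[k]) * A.conj().T @ S

    A = np.array([[1., 2., 0.],
                  [2., 4., 0.]])             # rank 1
    assert np.allclose(decell_pinv(A), np.linalg.pinv(A))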

d) There are a variety of methods available for computing the pseudoinverse of a matrix, for example c-5) above; see also [15], [3, p. 685-688], [9]. Thus it is of interest, when possible, to have a procedure for reducing the computation of A† to the case where A is a matrix. The next theorem shows that this can always be done if one of X and Y is finite dimensional. We have seen an example of this situation in 21e,f).

Theorem. Let X and Y be Hilbert spaces and let A ∈ L(X,Y) be normally solvable. Then

A† = A*(AA*)†
   = (A*A)†A*.

Proof. Let us just verify the first equality. We have N(AA*) = N(A*) and therefore R(AA*) = R(A) (obviously R(AA*) ⊂ R(A); since Y is the orthogonal direct sum of N(AA*) = N(A*) and cl R(AA*), as well as of N(A*) and cl R(A), we have cl R(AA*) = cl R(A); but also γ(A) > 0 (see c-3)) ⟹ γ(A*) > 0 ⟹ γ(AA*) > 0 ⟹ R(AA*) closed). Now according to the theorem in c) we must show that for every x ∈ N(A)^⊥,

x = A* B_1 A(x),

where

B_1 ≡ (AA*|_{N(A*)^⊥})^{-1}.

But there is a unique y ∈ N(A*)^⊥ such that A*(y) = x, whence

B_1(A(x)) = B_1 AA*(y) = y,

so that A* B_1 A(x) = A*(y) = x, and this completes the proof.

Two special cases of this theorem are of importance. First, if either AA* or A*A happens to be invertible, then we have a formula for A†. Second, if X (resp. Y) is finite dimensional, then A*A (resp. AA*) is a matrix, to which the computational techniques mentioned above can be applied. Again we refer back to the example in 21e,f).
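
For instance, if A has full row rank then AA* is invertible and the theorem gives the closed form A† = A*(AA*)^{-1}, and dually (A*A)^{-1}A* in the full-column-rank case. A small check on an arbitrary example:

    import numpy as np

    A = np.array([[1., 0., 2.],
                  [0., 1., 1.]])             # full row rank
    assert np.allclose(A.T @ np.linalg.inv(A @ A.T), np.linalg.pinv(A))

    B = A.T                                  # full column rank
    assert np.allclose(np.linalg.inv(B.T @ B) @ B.T, np.linalg.pinv(B))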

e) We now work toward formulas for A† which do not require the computation of any other pseudoinverses. These formulas require a choice of an auxiliary operator with special range or nullspace. The following lemma expressing the pseudoinverse of a product is essential.

Lemma. Let X, Y, Z be Hilbert spaces, B ∈ L(Z,Y), C ∈ L(X,Z), with B* and C surjective. Define A ∈ L(X,Y) by A = BC. Then

A† = C*(CC*)^{-1}(B*B)^{-1}B*
   = C†B†.

Proof. Since B is an isomorphism of Z with a subspace of Y, the theorem in d) implies the second equality. So we concentrate on the first equality. We have

A† = A†AA† = (A†B)(CA†),

so that it will suffice to show

A†B = C*(CC*)^{-1},

or

(7)    B*A†* = (CC*)^{-1}C,

and the analogous formula for A†*C*. Now since C* is injective and R(CC*) = R(C) (see d)) = Z, we see that CC* is invertible. Next,

C*B*A†* = A*A†* = (A†A)* = A†A = A†BC
⟹ BCC*B*A†* = BCA†BC = AA†A = A = BC.

Left-multiply the two end terms of this last equation first by B^{-1} and then by (CC*)^{-1} to obtain (7), qed.
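
In matrix terms the Lemma is the familiar full-rank-factorization formula for the pseudoinverse; here is a sketch with arbitrary factors B (full column rank, so B* is surjective) and C (full row rank, so C is surjective).

    import numpy as np

    B = np.array([[1., 0.],
                  [2., 1.],
                  [0., 1.]])                 # 3x2, rank 2
    C = np.array([[1., 0., 1., 2.],
                  [0., 1., 1., 0.]])         # 2x4, rank 2
    A = B @ C

    At = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
    assert np.allclose(At, np.linalg.pinv(A))
    assert np.allclose(At, np.linalg.pinv(C) @ np.linalg.pinv(B))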

Theorem. (Boot, Minimide-Nakamura) Let A be normally solvable in L(X,Y). If for some Z there exists C, surjective, in L(X,Z) such that R(C*) = R(A*), then

A† = C*(CA*AC*)^{-1}CA*.

Similarly, if there exists B ∈ L(Z,Y), B* surjective, such that R(B) = R(A), then

A† = A*B(B*AA*B)^{-1}B*.

Proof. In the first case we can write

A = A P_{N(A)^⊥} = A P_{R(A*)}
  = A P_{R(C*)} = A C†C ≡ BC,

verify that B* is surjective, and then apply the Lemma. Similarly, in the second case, we can write

A = P_{R(A)}A = P_{R(B)}A
  = B B†A ≡ BC,

verify that C is surjective, and again apply the Lemma. Let us just give the details for the first case.

To see that B* is surjective, it suffices to prove that B has a bounded inverse. But R(B) = R(A) is closed, so we need only check that B is injective. Let B(z) = θ. Then AC†(z) = θ, so

θ = A†AC†(z)
  = P_{R(A*)}C†(z) = P_{R(C*)}C†(z)
  = P_{N(C)^⊥}C†(z) = C†(z).

But since C is surjective, C† is an isomorphism of Z with N(C)^⊥; consequently, z = θ.

Now applying the Lemma we obtain

A† = C†B†
   = C*(CC*)^{-1}(B*B)^{-1}B*
   = C*(CC*)^{-1}(C†*A*AC†)^{-1}C†*A*
   = C*(C†*A*AC†CC*)^{-1}C†*A*
   = C*(C†*A*AC*)^{-1}C†*A*.

Thus we are reduced to showing

(8)    (C†*A*AC*)^{-1}C†* = (CA*AC*)^{-1}C.

Since X = N(C) ⊕ R(C*), it is sufficient to check that both operators in (8) agree on N(C) and on R(C*). Now N(C) = N(C†*), so the two operators certainly agree on N(C). Next, let z ∈ Z; then

C†*(C*(z)) = (CC†)*(z) = P_{R(C)}(z) = z.

Thus we are further reduced to showing

(9)    (C†*A*AC*)^{-1}(z) = (CA*AC*)^{-1}(CC*(z)).

By rewriting (9) as an equation for z_1, where z_1 is chosen so that

z = C†*A*AC*(z_1),

we are led to showing

(10)    z_1 = (CA*AC*)^{-1}CC*C†*A*AC*(z_1).

However, (10) is certainly true, as we see by recalling that

C*C†* = (C†C)* = P_{R(C*)} = P_{R(A*)},

so that CC*C†*A*AC* = C P_{R(A*)} A*AC* = CA*AC*. This completes the proof.

An important use of this theorem is to reduce the computation of A† to the inversion of a matrix. This is possible when either X or Y is finite dimensional. Suppose, for example, that dim(Y) < ∞. Then in the theorem we can choose B to be the natural injection of Z ≡ R(A) into Y.
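
Concretely, with dim(Y) < ∞ we may take for B a matrix whose columns form an orthonormal basis of R(A); the second formula of the theorem then requires only the inversion of an r x r matrix, r = rank(A). A sketch on a hypothetical rank-deficient example:

    import numpy as np

    A = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 0.],
                  [1., 3., 1., 1.]])         # rank 2 (row 3 = row 1 + row 2)
    U, s, Vt = np.linalg.svd(A)
    r = int((s > 1e-10).sum())
    B = U[:, :r]                             # orthonormal basis of R(A)

    # A-dagger = A* B (B* A A* B)^{-1} B*.
    At = A.T @ B @ np.linalg.inv(B.T @ A @ A.T @ B) @ B.T
    assert np.allclose(At, np.linalg.pinv(A))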

f) Let us reconsider equation (1):

A(x) = y_o.

We have been considering extremal solutions of minimal norm to this equation. For some purposes in optimization and statistics, it is important to restrict the solutions to lie in some preassigned subset M ⊂ X. Such a requirement leads naturally to the notions of "restricted b.a.s." and "restricted pseudoinverse". Rather than

repeat most of the theory of a) and b), we shall continue to assume

that all spaces are Hilbert spaces, and that all operators are

normally solvable. For additional simplicity we shall also assume

that M is a closed linear subspace of X.

Definition. Let A ∈ L(X,Y), B ∈ L(X,Z) be normally solvable, and let A_B ≡ A|_{N(B)}. Suppose that A_B is also normally solvable. Then A_B† is called the restricted pseudoinverse of A (wrt B).

Since A_B is assumed normally solvable, we see that A_B†(y_o) = x_o means that x_o is the unique N(B)-quasi-solution of equation (1) with minimal norm. The assumption that A_B is normally solvable is equivalent to the assumption that the orthogonal projection of N(B) on N(A) is closed. This latter condition is certainly in effect if either dim(N(B)) < ∞, or else one of the nullspaces N(A), N(B) is contained in the other. For B = θ we obviously recover A† as defined in c).

As does the ordinary pseudoinverse, the restricted pseudoinverse satisfies various algebraic relations. In fact, we have the following algebraic characterization of A_B†.

Lemma. The restricted pseudoinverse A_B† is the unique solution E of the following equations:

(11)    BE = 0,
(12)    EAE = E,
(13)    (AE)* = AE,
(14)    AEA = A on N(B),
(15)    P_{N(B)}(EA)* = EA on N(B).



Proof. We omit the verification that A_B† satisfies (11)-(15). Let us, however, show that there is only one solution to this set of equations. Suppose that E and F are both solutions. Then

E = EAE = (P_{N(B)}(EA)*)E = P_{N(B)}A*E*E
  = (AFAP_{N(B)})*E*E
  = (AP_{N(B)}FAP_{N(B)})*E*E
  = (P_{N(B)}A*F*)(P_{N(B)}A*E*)E
  = FAEAE = FAE
  = FAFAE = F(F*A*)AE
  = F(E*A*AF)*
  = F(AEAF)*
  = F(AF)* = FAF = F,

where several times we have used (11) to conclude that R(E), R(F) ⊂ N(B), qed.

Our primary interest in restricted pseudoinverses is that they allow us to express the solution of certain kinds of quadratic optimization problems with operator constraint.

Theorem. (Minimide-Nakamura) Let A and B be operators satisfying the hypotheses of the preceding definition. Let y_o ∈ Y and z_o ∈ R(B). Then the b.a.s. to the equation A(x) = y_o, subject to the constraint B(x) = z_o, is given by

(16)    x_o = A_B†(y_o - AB†(z_o)) + B†(z_o).



Proof. Because of the hypotheses on A and B it is clear that this problem has a unique solution. Now let x_o be defined by (16). Applying (11), we see that

B(x_o) = B(B†(z_o)) = P_{R(B)}(z_o) = z_o.

Next choose any x for which B(x) = z_o. Then

‖A(x) - y_o‖^2 = ‖A(x - B†(z_o)) - AA_B†(y_o - AB†(z_o))‖^2
               + ‖(I - AA_B†)(y_o - AB†(z_o))‖^2
               ≥ ‖(I - AA_B†)(y_o - AB†(z_o))‖^2
               = ‖A(x_o) - y_o‖^2,

with equality if and only if

(17)    A(x - B†(z_o)) = AA_B†(y_o - AB†(z_o)).

(The first equality above arises from the Pythagorean Law applied to the sum of an element in R(A_B) and an element in R(A_B)^⊥.) Now, if x also satisfies (17), then

‖x‖^2 = ‖x - B†(z_o)‖^2 + ‖B†(z_o)‖^2
      = ‖x - B†(z_o) - A_B†(y_o - AB†(z_o))‖^2
      + ‖A_B†(y_o - AB†(z_o))‖^2 + ‖B†(z_o)‖^2
      ≥ ‖x_o‖^2,

unless x = x_o, qed. (Again we have applied the Pythagorean Law, first to elements in N(B)^⊥ and N(B), and then to elements in N(A_B)^⊥ and N(A_B).)
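
A numerical sketch of (16): the restricted pseudoinverse A_B† is realized by pinv(A N) in coordinates, where the columns of N form an orthonormal basis of N(B). All data below are random illustrative choices, and optimality is only spot-checked against sampled feasible points.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))
    B = rng.standard_normal((2, 5))
    y0 = rng.standard_normal(3)
    z0 = B @ rng.standard_normal(5)          # ensure z0 lies in R(B)

    # Orthonormal basis N of N(B); in these coordinates A_B is A @ N.
    _, s, Vt = np.linalg.svd(B)
    N = Vt[(s > 1e-10).sum():].T

    Bp = np.linalg.pinv(B)
    # Formula (16): x0 = A_B-dagger(y0 - A B-dagger(z0)) + B-dagger(z0).
    x0 = N @ np.linalg.pinv(A @ N) @ (y0 - A @ Bp @ z0) + Bp @ z0

    assert np.allclose(B @ x0, z0)           # the constraint holds
    for _ in range(100):                     # no sampled feasible x does better
        x = x0 + N @ rng.standard_normal(N.shape[1])
        assert (np.linalg.norm(A @ x - y0)
                >= np.linalg.norm(A @ x0 - y0) - 1e-9)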



g) In effect, what we have been studying in this section is a special class of "multi-stage" optimization problems, where each stage involves the minimization of a quadratic norm. So far we have only encountered two-stage problems, but higher-stage problems lie close at hand. For example, consider the problem solved in the last theorem, but suppose that z_o ∈ Z \ R(B). We can still define an element x_o via (16), but now its significance is that it is a solution of the following three-stage problem: find an extremal solution (= X-quasi-solution) of B(x) = z_o which, among all such extremal solutions, is a b.a.s. of A(x) = y_o (this latter problem of course being two-stage).

For another example, let X and Y be Hilbert spaces, let T ∈ L(Y,X) be an isomorphism and S ∈ L(X,X) an automorphism, and define new equivalent norms on X and Y by

‖y‖_T ≡ ‖T(y)‖,    ‖x‖_S ≡ ‖S(x)‖.

Then given y_o ∈ Y and A ∈ L(X,Y) (normally solvable), we pose the problem: among all extremal solutions of A(x) = y_o (wrt the ‖·‖_T-norm), find the (unique) element of least ‖·‖_S-norm. To solve this problem, we first note that

‖A(x) - y_o‖_T = ‖TA(x) - T(y_o)‖,

whence the set of ‖·‖_T-extremal solutions is the flat

(TA)†(T(y_o)) + (I - (TA)†(TA))(X).

Then the element of least ‖·‖_S-norm in this flat is given by

x_o = S^{-1}(TAS^{-1})†(T(y_o)).
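
A sketch of this two-stage recipe, taking for simplicity T an invertible diagonal weighting of Y and S one of X; the matrices are arbitrary, and the optimality of x_o over the flat is only spot-checked.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 3))
    A[:, 2] = A[:, 0] + A[:, 1]              # make A rank-deficient
    y0 = rng.standard_normal(4)
    T = np.diag([1., 2., 3., 4.])            # defines || . ||_T on Y
    S = np.diag([2., 1., 1.])                # defines || . ||_S on X

    Si = np.linalg.inv(S)
    # x0 = S^{-1} (T A S^{-1})-dagger T(y0).
    x0 = Si @ np.linalg.pinv(T @ A @ Si) @ T @ y0

    TA = T @ A
    P = np.eye(3) - np.linalg.pinv(TA) @ TA  # projection onto N(TA)
    for _ in range(100):                     # points of the T-extremal flat
        x = np.linalg.pinv(TA) @ T @ y0 + P @ rng.standard_normal(3)
        assert (np.linalg.norm(TA @ x - T @ y0)
                >= np.linalg.norm(TA @ x0 - T @ y0) - 1e-9)
        assert np.linalg.norm(S @ x) >= np.linalg.norm(S @ x0) - 1e-9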

The finite dimensional version of this last result has been used to construct the unbiased linear estimate of minimal variance (Gauss-Markov estimate) of the unknown vector of parameters appearing in a linear statistical model [12, p. 119]. In this situation the isomorphism S^2 represents the (positive definite) covariance matrix of the model, T is the identity, and A is determined by the particular physical situation involved.

Yet another kind of two-stage optimization problem of the type

under discussion occurs in the theory of optimal control. Namely,

from an admissible set of controls it is desired to choose one which

steers a given (linear) system in such a way that at the terminal

time, some (quadratic) function of the difference between the

achieved state and the desired state is minimized (it might also be

important to minimize some (quadratic) function of the difference

between the actual trajectory and a desired trajectory). If there

is more than one such "optimal control", then from among these it is

desired to choose one which minimizes some (quadratic) cost criterion.

An example along these lines is given in [10, p. 174].

h) Let us conclude by citing a few of the more recent works on pseudoinversion. The earlier literature is most adequately referenced in [3]. Very much in the spirit of the present notes are two papers of Ben-Israel [1,2], which expound the use of metric projections onto convex subsets of R^n and pseudoinverses to produce algorithms for solving non-linear equations and inequalities in several variables. For a related approach see also the recent paper of Fletcher [8]. An iterative scheme for computing the operator A†, which generalizes the hyperpower method for inverting an operator, is given by Petryshyn [11]. For an extensive survey of finite dimensional pseudoinverses, there are now available a symposium proceedings [4] and an introductory text [5].



References for §35

1) A. Ben-Israel, On iterative methods for solving nonlinear least squares problems over convex sets. Israel J. Math. 5(1967), 211-224.

2) _____, On Newton's method in nonlinear programming, p. 339-352 in Princeton Symposium on Mathematical Programming (H. Kuhn, Ed.), Princeton Univ. Press, Princeton, 1970.

3) _____ and A. Charnes, Contributions to the theory of generalized inverses. J. Soc. Ind. Appl. Math. 11(1963), 667-699.

4) T. Boullion and P. Odell, Ed's., Symposium on Theory and Application of Generalized Inverses of Matrices. Texas Tech. College, Lubbock, 1968.

5) _____, Generalized Inverse Matrices. Wiley-Interscience, New York, 1971.

6) H. Decell, An application of the Cayley-Hamilton Theorem to generalized matrix inversion. SIAM Rev. 7(1965), 526-528.

7) I. Erdelyi and A. Ben-Israel, Extremal solutions of linear equations and generalized inversion between Hilbert spaces. J. Math. Anal. Appl., to appear.

8) R. Fletcher, Generalized inverses for nonlinear equations and optimization, p. 75-86 in Numerical Methods for Nonlinear Algebraic Equations (P. Rabinowitz, Ed.), Gordon and Breach, New York, 1970.

9) T. Greville, Some applications of the pseudoinverse of a matrix. SIAM Rev. 2(1960), 15-22.

10) N. Minimide and K. Nakamura, A restricted pseudoinverse and its application to constrained minima. SIAM J. Appl. Math. 19(1970), 167-177.

11) W. Petryshyn, On generalized inverses and on the uniform convergence of (I - βK)^n with application to iterative methods. J. Math. Anal. Appl. 18(1967), 417-439.

12) C. Price, The matrix pseudoinverse and minimal variance estimates. SIAM Rev. 6(1964), 115-120.

13) D. Showalter, Representation and computation of the pseudoinverse. Proc. Amer. Math. Soc. 18(1967), 584-586.

14) _____ and A. Ben-Israel, Representation and computation of the generalized inverse of a bounded linear operator between Hilbert spaces. Appl. Math. Report No. 69-12, Northwestern Univ., 1969.

15) S. Zlobec, Explicit computation of the Moore-Penrose generalized inverse. SIAM Rev. 12(1970), 132-134.
