
Lecture Notes in

Mathematics
A collection of informal reports and seminars
Edited by A. Dold, Heidelberg and B. Eckmann, Zürich

257

Richard B. Holmes
Purdue University, Lafayette, IN/USA

A Course on
Optimization and
Best Approximation

Springer-Verlag
Berlin · Heidelberg · New York 1972

AMS Subject Classifications (1970): 41-02, 41A50, 41A65, 46B99, 46N05, 49-02, 49B30, 90C25

ISBN 3-540-05764-1 Springer-Verlag Berlin · Heidelberg · New York
ISBN 0-387-05764-1 Springer-Verlag New York · Heidelberg · Berlin

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned,
specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine
or similar means, and storage in data banks.
Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher,
the amount of the fee to be determined by agreement with the publisher.
© by Springer-Verlag Berlin · Heidelberg 1972. Library of Congress Catalog Card Number 70-189753. Printed in Germany.

Offsetdruck: Julius Beltz, Hemsbach/Bergstr.


PREFACE

The course for which these notes were originally prepared was a

one-semester graduate level course at Purdue University, dealing

with optimization in general and best approximation in particular.

The prerequisites were modest: a semester's worth of functional

analysis together with the usual background required for such a

course. A few prerequisite results of special importance have been

gathered together for ease of reference in Part I.

My general aim was to present an interesting field of application

of functional analysis. Although the tenor of the course is

consequently rather theoretical, I made some effort to include a

few fairly concrete examples, and to bring under consideration

problems of genuine practical interest. Examples of such problems

are convex programs (§'s 11-13), calculus of variations (§17),

minimum effort control (§21), quadrature formulas (§24), construction

of "good" approximations to functions (§'s 26 and 29), optimal

estimation from inadequate data (§33), solution of various ill-posed

linear systems (§'s 34-35). Indeed, the bulk of the notes is devoted

to a presentation of the theoretical background needed for the study

of such problems.

No attempt has been made to provide encyclopedic coverage of

the various topics. Rather I tried only to show some highlights,

techniques, and examples in each of the several areas studied.

Should a reader be stimulated to pursue a particular topic further,

he will hopefully find an adequate sample of the pertinent literature

included in the bibliographies. (Note that in addition to the main

bibliography between Parts IV and V, each section in Part V has its

own special set of references appended.)



The first three parts of these notes constitute a slightly

fleshed-out arrangement of the material actually covered in the Purdue

course. That course also involved the solution of numerous problems;

about 50 of those problems have been included here and Part IV

contains hints and/or complete solutions to most of them. Thus this

portion of the notes is reasonably self-contained, modulo the

indicated prerequisites (minor exceptions to this assertion occur on

pages 28, 81 and 89). Part V is a bit more loosely written; in

particular, it contains a few references without proof to rather

deep results. I feel that all the topics in Part V might have

legitimately been included in the course had time permitted. The

order of §'s 32 and 33 is somewhat arbitrary and could have been

reversed. §'s 34 and 35 provide some applications of metric

projections by illustrating their natural occurrence in attempts to

handle ill-posed linear equations.

It is my hope that the present notes can serve as the basis for

other courses besides the original; for example, a two-quarter

course covering essentially everything, a one-quarter course on best

approximation covering Part III, §'s 31, 32, and perhaps 19 and 35,

or a one-quarter course on convexity and optimization covering

Part II, §33 (note that 33b) contains a proof of Valadier's formula

for the subdifferential of a supremum of convex functions), and

perhaps some of the early material in Part III.

As format goes, sections are divided into sub-sections; each

sub-section contains at most one theorem, at most one definition,

etc. (the sole exception to this being 33e)). A reference to (sub-

section) 15b), say, is unambiguous; a reference to b), say, refers

to sub-section b) of the current section.


Because of typographical limitations, the symbol "φ" has been

used in two ways, which hopefully are distinguishable by context:

it denotes on occasion the empty set, and at other times, it

denotes a linear functional.

Some acknowledgments are now in order. Professor Frank Deutsch

generously made available to me a copy of his own lecture notes on

best approximation, and these proved quite useful in the arrangement

of some of the material in Part III. Mr. Philip Smith provided

several helpful comments about Chebyshev centers in §33. Professor

Paul Halmos kindly recommended the inclusion of the manuscript in

the Springer Lecture Notes Series. Finally, it is a pleasure to

thank Mrs. Nancy Eberle and Mrs. Judy Snider for their competent and

cheerful assistance in the preparation of the manuscript.

West Lafayette, Indiana


November, 1971
CONTENTS

Part I. Preliminaries . . . . . . . . . . . . . . . . . . . .   1

  §1. Notation . . . . . . . . . . . . . . . . . . . . . . .   1
  §2. The Hahn-Banach Theorem . . . . . . . . . . . . . . .   2
  §3. The Separation Theorems . . . . . . . . . . . . . . .   4
  §4. The Alaoglu-Bourbaki Theorem . . . . . . . . . . . . .   7
  §5. The Krein-Milman Theorem . . . . . . . . . . . . . . .   8

Part II. Theory of Optimization . . . . . . . . . . . . . . .  14

  §6. Convex Functions . . . . . . . . . . . . . . . . . . .  14
  §7. Directional Derivatives . . . . . . . . . . . . . . .  16
  §8. Subgradients . . . . . . . . . . . . . . . . . . . . .  20
  §9. Normal Cones . . . . . . . . . . . . . . . . . . . . .  23
  §10. Subdifferential Formulas . . . . . . . . . . . . . .  25
  §11. Convex Programs . . . . . . . . . . . . . . . . . . .  29
  §12. Kuhn-Tucker Theory . . . . . . . . . . . . . . . . .  32
  §13. Lagrange Multipliers . . . . . . . . . . . . . . . .  36
  §14. Conjugate Functions . . . . . . . . . . . . . . . . .  42
  §15. Polarity . . . . . . . . . . . . . . . . . . . . . .  48
  §16. Dubovitskii-Milyutin Theory . . . . . . . . . . . . .  51
  §17. An Application . . . . . . . . . . . . . . . . . . .  56
  §18. Conjugate Functions and Subdifferentials . . . . . .  58
  §19. Distance Functions . . . . . . . . . . . . . . . . .  61
  §20. The Fenchel Duality Theorem . . . . . . . . . . . . .  65
  §21. Some Applications . . . . . . . . . . . . . . . . . .  70

Part III. Theory of Best Approximation . . . . . . . . . . .  76

  §22. Characterization of Best Approximations . . . . . . .  76
  §23. Extremal Representations . . . . . . . . . . . . . .  81
  §24. Application to Gaussian Quadrature . . . . . . . . .  88
  §25. Haar Subspaces . . . . . . . . . . . . . . . . . . .  91
  §26. Chebyshev Polynomials . . . . . . . . . . . . . . . .  98
  §27. Rotundity . . . . . . . . . . . . . . . . . . . . . . 105
  §28. Chebyshev Subspaces . . . . . . . . . . . . . . . . . 109
  §29. Algorithms for Best Approximation . . . . . . . . . . 118
  §30. Proximinal Sets . . . . . . . . . . . . . . . . . . . 123

Part IV. Comments on the Problems . . . . . . . . . . . . . . 128

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . 138

Part V. Selected Special Topics . . . . . . . . . . . . . . . 145

  §31. E-spaces . . . . . . . . . . . . . . . . . . . . . . . 145
  §32. Metric Projections . . . . . . . . . . . . . . . . . . 157
  §33. Optimal Estimation . . . . . . . . . . . . . . . . . . 177
  §34. Quasi-Solutions . . . . . . . . . . . . . . . . . . . 203
  §35. Generalized Inverses . . . . . . . . . . . . . . . . . 214
Part I

Preliminaries

§1. Notation

Throughout these notes we will be dealing with linear spaces X, Y, ..., and various mappings defined on them. The underlying scalar field may be either the real or complex number field, unless one or the other is explicitly singled out. We list below some of the abbreviations and/or symbols to be employed throughout the text. Although not all used right away, it is convenient to have them collected together for ease of reference. Notation of less frequent usage will be introduced as the need arises.

We write:

ls - for linear space;

tls - for topological linear space;

lcs - for locally convex (Hausdorff) space;

nls - for normed linear space;

θ - for the zero vector in a ls;

Xᵣ - for the real restriction of a complex ls X;

X' - for the algebraic dual of a ls;

X* - for the continuous dual of a tls;

U(X) - for the unit ball {x ∈ X: ||x|| ≤ 1} of a nls X;

S(X) - for the unit sphere {x ∈ X: ||x|| = 1} of a nls X;

L(X,Y) - for the space of all continuous linear maps from a tls X into a tls Y;

Rⁿ - for real Euclidean n-space;

eᵢ - for the i-th standard unit vector in Rⁿ;

z̄ - for the conjugate of a complex number z;

sgn (z) - for the signum z̄/|z| of a non-zero complex number z (with sgn (0) = 0);

span (A) - for the linear hull of a set A;

co (A) - for the convex hull of a set A;

int (A) - for the interior of a set A;

rel-int (A) - for the relative interior of a set A;

cl (A) (or sometimes Ā) - for the closure of a set A;

f|A - for the restriction of a function f to a subset A of its domain;

wrt - for "with respect to";

nas - for "necessary and sufficient";

C(Ω) - for the space of continuous scalar-valued functions on a compact Hausdorff space Ω;

rca (Ω) - for the space of regular Borel measures on such a space Ω;

l^p(n), c₀, l^p, L^p(μ) - for the usual Banach spaces.

A subscript R attached to the symbol for a function space, as in C_R(Ω) or L_R^p(μ), means that the functions involved are real-valued; otherwise the scalars may be either real or complex.

Finally, the symbol "≡" is to be read "equals by definition".

§2. The Hahn-Banach Theorem

In this section we recall without proof some variants of the Hahn-Banach extension theorems. These results all assert the existence of linear functionals with certain properties. Together with their geometrical versions to be given in §3 below, they constitute the cornerstone of the existence and duality theory to be developed later in these notes.

a) Theorem. Let M be a linear subspace of a real ls X and f ∈ M'. Let p be a real-valued sublinear function on X such that f ≤ p|M. Then ∃ F ∈ X' satisfying F ≤ p and F|M = f.



Thus the linear functional f has a linear extension F to all

of X and this extension remains dominated (pointwise) by p. Using

a separation theorem (§3) Weston [77] has shown that the above result

remains true if p is replaced by a (finite) convex function on X.

b) Corollary. Let X be a complex ls and let f, M have the same meaning as in a). If p is a semi-norm on X such that |f(·)| ≤ p|M, then ∃ F ∈ X' such that |F(·)| ≤ p and F|M = f.

c) Corollary. Let X be a nls, M a linear subspace of X, and f ∈ M*. Then ∃ F ∈ X* such that ||F|| = ||f|| and F|M = f.

This result may be viewed in particular as asserting the existence of a continuous linear extension of f with minimal norm. Clearly f has "many" extensions F with ||F|| ≥ ||f||. It is less clear a priori whether or not an extension of minimal norm is unique. This question has some interesting connections with approximation and moment problems; the reader may consult [19, 26, 58, 73] for further details.

We note also that the proofs of b) and c) above establish some information about linear functionals on a complex nls. Namely, let X be such a space and f ∈ X*. Define (re f)(x) ≡ re f(x) and (im f)(x) ≡ im f(x). Then re f and im f belong to Xᵣ* (where Xᵣ denotes X regarded as a real ls), (im f)(x) = −(re f)(ix), and ||re f|| = ||f||. And conversely, if f ∈ Xᵣ* and F is defined by

F(x) = f(x) − i f(ix),  x ∈ X,

then F ∈ X* and ||F|| = ||f||.

d) Corollary. Let M be a linear subspace of the nls X and x₀ ∈ X \ cl (M). Then ∃ f ∈ S(X*) such that f(x) = 0 ∀ x ∈ M and f(x₀) = d(x₀,M).

Proofs of all the preceding results, along with further corollaries, can be found in [15, Ch. II].

§3. The Separation Theorems

The main results of this section, namely the Support Theorem 3f) and the Separation Theorem 3g), are actually equivalent, in their linear space formulations, to each other, and to the Hahn-Banach Theorem 2a). However, here they will simply be deduced as consequences of 2a).

a) Lemma. Let X and Y be real tls and T: X → Y an additive mapping. If T is continuous at θ, then T ∈ L(X,Y).

Proof. Exercise 1.

b) Lemma. Let X be a tls and f ∈ X', f ≠ 0. Then f is continuous if and only if re f is bounded (above or below) on some open set.

Proof. After a translation if necessary, we may assume that re f(U) ≤ c, where U is some θ-nbhd. Letting V ≡ U ∩ (−U) we obtain |re f(V)| ≤ c. Thus if ε > 0 and W ≡ (ε/c)V, then |re f(W)| ≤ ε, which proves that re f is continuous at θ. Hence re f is continuous by a), and therefore so is f.

c) Lemma. Let K be a convex subset of a tls. If x ∈ int (K) and y ∈ cl (K), then {tx + (1−t)y: 0 < t < 1} ⊂ int (K). Hence int (K) is convex, and cl (int (K)) = cl (K), provided int (K) ≠ ∅.

Proof. Exercise 2.

d) Let us recall that when K is a convex θ-nbhd in a tls X, there is defined on X a non-negative function p_K according to the formula

p_K(x) = inf {t > 0: x ∈ tK}.

p_K is called the Minkowski function of K. It is sublinear and we have int (K) = {x ∈ X: p_K(x) < 1} [15, p. 411], so in particular, p_K is continuous on X.
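For example, if X is a nls and K = U(X), then x ∈ tK exactly when ||x|| ≤ t, so p_K(x) = ||x||: the Minkowski function of the unit ball recovers the norm. More generally, if K = rU(X) with r > 0, then p_K(x) = ||x||/r.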

e) If X is a ls, a flat in X is any translate of a linear subspace of X. If X is a tls, a convex body in X is any convex set with non-empty interior.

Theorem. (Mazur, Bourgin) Let K be a convex body in a tls X, and V a flat in X such that V ∩ int (K) = ∅. Then ∃ a (real) closed hyperplane H satisfying V ⊂ H ⊂ X and H ∩ int (K) = ∅.

Proof. It will suffice to exhibit an f ∈ X* and a real c for which re f(V) = c > re f(int (K)). After a translation we may assume that θ ∈ int (K). Let M ≡ real span (V); then V is a hyperplane in M, hence ∃ f₀ ∈ M' such that V = {x ∈ M: f₀(x) = 1}. Now d) above implies f₀(x) = 1 ≤ p_K(x) ∀ x ∈ V. Hence f₀(tx) ≤ p_K(tx) if t > 0, while f₀(tx) ≤ 0 ≤ p_K(tx) if t ≤ 0. That is, f₀ ≤ p_K|M. By 2a) we can extend f₀ to f₁ ∈ Xᵣ' so that f₁ ≤ p_K. This inequality, together with a) and d) above, shows that f₁ is continuous. Let H = {x ∈ X: f₁(x) = 1}, take c = 1, and, if X is a complex space, define f ∈ X* (as in 2c)) so that re f = f₁, qed.

f) Corollary. (Support Theorem) If x is not an interior point of a convex body K in a tls, then ∃ a (real) closed hyperplane H containing x such that K lies on one side of H.

Proof. Let V = {x} in e).

In typical applications of the Support Theorem x is a boundary point of K. If x is also in K then it is a support point of K, and any hyperplane satisfying the conditions of f) is a supporting hyperplane to K at x.

g) Corollary. (Separation Theorem) Let K₁ be a convex body and K₂ a convex set in a tls X such that int (K₁) ∩ K₂ = ∅. Then ∃ a (real) closed hyperplane H separating K₁ and K₂.

Proof. We must produce a non-zero f ∈ X* such that sup re f(K₁) ≤ inf re f(K₂). Let K = K₁ − K₂ (vector difference). Then θ ∉ int (K) ≠ ∅, so by f) ∃ a non-zero f ∈ X* for which re f(K) ≤ re f(θ) = 0. If c is any number in the interval [sup re f(K₁), inf re f(K₂)], we may take H = {x ∈ X: re f(x) = c}, qed.

Remark. In case the space X is finite dimensional, a more precise version of the Separation Theorem is valid. It is based on the fact that every finite dimensional convex set has a non-void relative interior (i.e., interior relative to the flat generated by the set). Excluding the trivial case where both the convex sets lie in a common hyperplane, the Finite Dimensional Separation Theorem asserts that two convex sets can be separated by a (real) hyperplane if and only if their relative interiors are disjoint. The proof is similar to the one preceding, and may be found along with related results in [70, §11].

h) Lemma. Let A be closed and B compact in a tls X. Then A + B is closed.

Proof. Exercise 3.

Theorem. (Strong Separation Theorem) Let X be a lcs, and K₁, K₂ disjoint closed convex subsets with one of them compact. Then ∃ a (real) closed hyperplane strongly separating K₁ and K₂.

Proof. The assertion to be proved is that ∃ a non-zero f ∈ X* such that sup re f(K₁) < inf re f(K₂). We first observe that K ≡ K₂ − K₁ is closed by the lemma and that θ ∉ K. Then because X is locally convex, ∃ a convex θ-nbhd U such that U ∩ K = ∅. g) now implies the existence of a non-zero f ∈ X* such that sup re f(U) ≤ inf re f(K). Since f ≠ 0, ∃ x₀ ∈ X such that f(x₀) = 1, and then ∃ ε > 0 such that tx₀ ∈ U whenever |t| ≤ ε. Therefore, ε ≤ sup re f(U) since f(tx₀) = t. Thus ε ≤ inf re f(K₂ − K₁), and so sup re f(K₁) + ε ≤ inf re f(K₂), qed.

i) Corollary. A closed convex subset of a lcs is the intersection of all the closed (real) half-spaces which contain it.

We recall that a closed (real) half-space in a tls X is a set of the form {x ∈ X: re f(x) ≤ c} for some f ∈ X*. Thus any closed convex subset of a lcs can be defined by a family of (real) linear constraints. When X is a separable nls, the family can always be chosen to be countable.

Exercise 4. Prove this last assertion.

§4. The Alaoglu-Bourbaki Theorem

Let X be a tls with non-trivial dual X* (which is certainly the case if X is a lcs). We recall that the weak-star (w*-) topology on X* is the topology of pointwise convergence on X. The general basic w*-nbhd of f₀ ∈ X* is defined by {f ∈ X*: |f(x) − f₀(x)| < δ, x ∈ A}; here δ > 0 is arbitrary and A is an arbitrary finite subset of X. The w*-topology is locally convex and Hausdorff; it is the weakest topology on X* for which all the linear functionals f ↦ f(x) (x ∈ X) are continuous. The compactness theorem presented in this section and the earlier Hahn-Banach Theorem will justify our later interest in dual spaces.

a) Definition. Let A be a subset of a tls X. The polar of A is the set

A° ≡ {f ∈ X*: re f(A) ≤ 1}.

A° is evidently a non-empty (0 ∈ A° always) convex subset of X*, closed in the w*-topology.

Examples. 1) If M is a linear subspace of X, then M° = M⊥ ≡ {f ∈ X*: f(M) = 0} (the annihilator of M).

2) If X is a nls, then U(X)° = U(X*).

3) Let X be either real Euclidean space or the real Hilbert space l², and E the ellipsoid {x = (xₙ) ∈ X: Σ(xₙ/aₙ)² ≤ 1} for fixed aₙ > 0. Then E° = {x ∈ X: Σ aₙ²xₙ² ≤ 1}.

Exercise 5. Verify these examples.

b) It is clear from the relevant definitions that a subset B of X* is equicontinuous on X if and only if B is equicontinuous at θ, if and only if B ⊂ A° for some θ-nbhd A ⊂ X.

Theorem. (Alaoglu-Bourbaki) If B is an equicontinuous subset of the dual of a tls X, then B is relatively w*-compact.

The proof of this depends in part on the Tychonov compactness theorem for product spaces, and can be found in [71, p. 84]. The converse is generally false, but does hold when X is a Banach space (principle of uniform boundedness). The polarity concept will be examined in greater detail in §15 below.

§5. The Krein-Milman Theorem

In this final preliminary section we recall an extremely important device for describing a compact convex set K by means of smaller subsets. The most important applications of this procedure will be to the case where K is a (w*-closed) subset of U(X*), for some nls X. The notation c̄o (A) will mean cl (co (A)), this set being the same as the intersection of all closed convex sets containing A (with respect to some ambient tls). The main result gives two equivalent conditions on A ⊂ K, either of which in turn is equivalent to c̄o (A) = K.

a) Lemma. The convex hull of the union of finitely many compact convex subsets of a tls is compact.

Proof. Exercise 6.

b) Definition. Let X be a ls and E ⊂ K ⊂ X. E is (K-)extremal if kᵢ ∈ K, 0 < λ < 1 and λk₁ + (1−λ)k₂ ∈ E imply kᵢ ∈ E. If E is a singleton set, E = {k₀}, and meets the preceding condition, then k₀ is an extreme point of K; in this case we write k₀ ∈ ext (K).

Examples. 1) Let X = R³, K a cube or tetrahedron. Then the faces, sides, and vertices of K are the K-extremal subsets of K, while the extreme points of K are the vertices.

2) Let μ be a positive measure on some measure space and 1 < p < ∞. Let X = L^p(μ). Then ext U(X) = S(X).

3) Let X = C_R(Ω)* and take K = {μ ∈ X: μ(Ω) = 1 and μ ≥ 0}. That is, K consists of non-negative measures in rca (Ω) with total mass of unity (probability measures on Ω). The set K is called the positive face of U(X). The non-zero extreme points of K can then be described as either the set of delta measures (point masses) on Ω or, in functional terms, as the set of all real algebra homomorphisms of C_R(Ω).

4) Let X be either c₀ or L¹(μ), where μ is a positive non-atomic measure. Then ext U(X) is empty.

Exercise 7. Verify examples 1), 2), and 4).

Example 3) is more difficult to verify; it may be deduced from results in [1, 74]. Also, the delta measure characterization will be proved later in 15c).


c) Lemma. Let K be a subset of a ls X.

1) If {E_α} is a family of K-extremal sets, then ∪E_α and ∩E_α are also K-extremal sets.

2) Let E' ⊂ E ⊂ K. If E' is E-extremal and E is K-extremal, then E' is K-extremal.

3) If E is K-extremal, then ext (E) = ext (K) ∩ E.

Proof. Exercise 8.

We note that if E is an extremal subset of a convex set K, then K \ E is again convex. The converse statement is clearly false in general, but it is valid for singleton extremal sets. Thus we may state that the extreme points of a convex set K are exactly those points of K which may be deleted from K without destroying convexity. We also note that if x ∈ ext (K) ∩ co (J), where J is a finite subset of a convex set K, then x ∈ J.

d) Lemma. Let A be a compact subset of a lcs X. Then ext (A) ≠ ∅.

Proof. We order the non-empty compact A-extremal sets by inclusion and use Zorn's Lemma to obtain the existence of a minimal compact A-extremal set B. We wish to show that B is a singleton set. If not, ∃ distinct points p, q ∈ B. Then by 3h) ∃ f ∈ X* such that re f(p) ≠ re f(q). Let H be the hyperplane {x ∈ X: re f(x) = min re f(B)}. The set B ∩ H is then a proper compact B-extremal set. By c-2) it is also A-extremal, which contradicts the minimality of B, qed.

Corollary. Let A and X be as in the Lemma, and f a continuous convex function on X (e.g., f ∈ X*). Then f attains its maximum over A at an extreme point of A.

Proof. The subset of A where f attains its maximum is a non-empty compact A-extremal set. It therefore has an extreme point which must belong to ext (A) by c-3).

e) Theorem. (Extended Krein-Milman Theorem) Let K be a compact convex subset of a lcs X. The following statements about a subset A ⊂ K are equivalent:

1) c̄o (A) = K;

2) sup re f(A) = max re f(K), for any f ∈ X*;

3) ext (K) ⊂ cl (A).

Proof. The equivalence of 1) and 2) follows from 3h) and the fact that sup re f(A) = sup re f(c̄o (A)), for any f ∈ X* and A ⊂ X. The preceding corollary shows that 3) implies 2). Thus we need only check that 2) implies 3). Using 1) we must prove that ext (c̄o (A)) ⊂ cl (A). For this it is sufficient to prove that if V is any closed balanced convex θ-nbhd in X, and x ∈ ext (c̄o (A)), then (x+V) ∩ A ≠ ∅. Now since A is totally bounded, ∃ a finite set {x₁,···,xₙ} ⊂ A such that

A ⊂ (x₁+V) ∪ ··· ∪ (xₙ+V).

Now the sets Kᵢ ≡ c̄o ((xᵢ+V) ∩ A) are compact and convex, and so we have

c̄o (A) = c̄o (K₁ ∪ ··· ∪ Kₙ) = co (K₁ ∪ ··· ∪ Kₙ),

the last equality following from a). Hence we may write x as a convex combination of points in the Kᵢ, and since Kᵢ ⊂ c̄o (A), the comment immediately preceding d) implies that x belongs to some Kᵢ. Therefore, x = xᵢ + v for some v ∈ V, and so xᵢ = x − v is a point in (x+V) ∩ A, qed.


Corollary. If X is a nls, then ext U(X*) is total over X. That is, if x and y are distinct points in X, ∃ f ∈ ext U(X*) such that re f(x) ≠ re f(y).

Exercise 9. Let X be a reflexive Banach space. Then U(X) = c̄o (ext (U(X))), where the closure is taken here in the norm topology on X.

f) Examples. 1) Let X be a Banach space. Combining the Alaoglu-Bourbaki and Krein-Milman Theorems shows that U(X*) = c̄o (ext (U(X*))), with closure here in the w*-topology on X*. This observation leads to an easy proof that certain Banach spaces are not dual spaces. For if X is an infinite dimensional dual space then ext (U(X)) must be infinite. (If X is also reflexive, then the Krein-Milman Theorem can be used to show that actually ext U(X) is uncountable [40].) Thus if it is empty or finite, X cannot be a dual space. Hence from Example 4) in b) we see that L¹(μ) (μ non-atomic) and c₀ are not dual spaces. As a further illustration of this idea let X = C_R(Ω). Characterize ext (U(X)) (Exercise 10), and then show that if Ω has only a finite number of components, ext U(X) is finite. Hence, either Ω is finite, in which case X is finite dimensional, or else X is not a dual space.

2) An extreme point of a compact convex set K in a lcs is not necessarily a support point of K. An example of this phenomenon was given by Klee [36, p. 98]. Nevertheless, K is the closed convex hull of its extreme support points. Prove this as Exercise 11.

3) Some further idea of the power and scope of the extreme point concept and the Krein-Milman Theorem can be obtained by noting that they have been used as the basis for proofs of the Stone-Weierstrass Theorem [5], and the Lyapounov convexity theorem for vector-valued measures [35, 38]. They also underlie the extensive Choquet representation theory [10, 59]. Of course we will make our own particular applications of this material in our study of approximation theory below.

We conclude by remarking that a sharper version of the Krein-Milman Theorem is possible in the finite dimensional case. This will be developed when it is needed, namely in §23.


Part II

Theory of Optimization

In the next several sections we present an introduction to the theory of optimization in abstract spaces, together with selected applications of a more concrete nature. Parts III and V to follow provide many additional illustrations of the theory to be developed now. With the exception of an introduction to the Dubovitskii-Milyutin theory in §'s 16 and 17, which allows non-convex objective functions, we will only be concerned with convex optimization problems. Most generally, these are simply problems of minimizing a convex function over a convex set (or, what is the same, subject to linear or convex constraints) in linear spaces. We will adopt the device popularized by R. T. Rockafellar of redefining the function to be infinite outside the constraint set. The two basic tools for characterizing the solutions of our problems are the theory of subgradients and the theory of conjugate functions.

§6. Convex Functions

a) Definition. Let X be a ls and f: X → (−∞, +∞]. f is a proper convex function on X, written f ∈ Conv (X), if f is not identically +∞ and

f(tx + (1−t)y) ≤ tf(x) + (1−t)f(y),

whenever x, y ∈ X and 0 < t < 1. The effective domain of f is the set

dom (f) ≡ {x ∈ X: f(x) < +∞}.

Frequently an optimization problem involves a (finite) convex function defined only on some convex subset K ⊂ X. Such a function is obviously extendable to belong to Conv (X); we simply define its values at points in X \ K to be +∞.

Examples. 1) Evidently, any linear or sublinear function on X, hence any norm or semi-norm, belongs to Conv (X).

2) If K ⊂ X is a convex set containing θ, then the Minkowski function p_K is in Conv (X), and dom (p_K) is the (convex) cone generated by K.

3) A very important example of a convex (but not sublinear) function which repeatedly occurs in optimization problems is the following. Let X be a tls and Y a nls; let R ∈ L(X,Y) and y₀ ∈ Y. Then put f(x) = ||R(x) − y₀||.

4) If K is any subset of X, the indicator function of K, δ_K, is defined by

δ_K(x) = 0 if x ∈ K,
       = +∞ if x ∉ K.

Then δ_K ∈ Conv (X) exactly when K is convex. This seemingly innocuous function will play an important role in the analysis of constrained optimization problems.

5) Let X be a nls (in particular, consider X = Rⁿ), and K an open convex subset of X. Let f be a continuously differentiable real-valued function on K, and define the values of f to be +∞ off K. Define a function (the "excess function") E: K × K → R¹ by

E(x,y) = f(y) − f(x) − df(x)·(y−x),

for x, y ∈ K. Here df(x) is the (Frechet) differential of f at x, and the dot signifies its value at the vector (y−x). Then f ∈ Conv (X) if and only if E(x,y) ≥ 0 for all x, y ∈ K. (The reader should be sure to understand the geometric significance of E.)
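For instance, with X = R¹ = K and f(x) = x², we find E(x,y) = y² − x² − 2x(y−x) = (y−x)² ≥ 0, as convexity requires; in general, E(x,y) is the amount by which f(y) lies above the tangent hyperplane to the graph of f at x.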
Exercise 12. Verify this last assertion. Use the result to prove that if f is actually twice continuously differentiable on K, then f ∈ Conv (X) if and only if d²f(x) is positive semi-definite for every x ∈ K. Here d²f(x) is the second (Frechet) differential of f at x, and is a continuous symmetric bilinear function on X × X. This observation includes the familiar case where K = (a,b) ⊂ R¹ = X, and the convexity criterion is simply that f″(x) ≥ 0 for a < x < b.

In Example 4) above we associated a convex function to every convex set in a ls. We now give a converse association, by considering the region above the graph of a given convex function.

b) Definition. Let f ∈ Conv (X). The epigraph of f, epi (f), is the set defined by

epi (f) = {(x,t) ∈ X × R¹: f(x) ≤ t}.

Clearly every function f: X → (−∞, +∞] has an epigraph; the epigraph is convex exactly when f ∈ Conv (X). Note that for such f's, dom (f) is the projection of epi (f) into X. In general, the epigraph will be important to us because of the support and separation principles developed in §3.

§7. Directional Derivatives

a) Theorem. Let X be a ls and f ∈ Conv (X). Then if x₀ ∈ dom (f),

(1)  f'(x₀;x) ≡ lim_{t↓0} [f(x₀+tx) − f(x₀)]/t

exists in [−∞, +∞] for every x ∈ X.

Proof. Observe first that if g ∈ Conv (X) satisfies g(θ) = 0, then h(t) ≡ g(tx)/t is a non-decreasing function on (0, +∞). Because, if 0 < s < t, then

g(sx) ≤ (s/t) g(tx) + ((t−s)/t) g(θ),

whence g(sx)/s ≤ g(tx)/t. Apply this observation to the function g(y) ≡ f(x₀+y) − f(x₀) to conclude that the difference quotient in (1) is a non-decreasing function of t.

Remark. When x ∈ x₀ − dom (f), then −∞ < f'(x₀;x). We can see this by verifying that the difference quotient is bounded below for t > 0. In fact, in the inequality f(λu + (1−λ)v) ≤ λf(u) + (1−λ)f(v), let us replace u by x₀ + tx, v by x₀ − x, and λ by 1/(1+t):

f(x₀) = f((1/(1+t))(x₀+tx) + (t/(1+t))(x₀−x)) ≤ (1/(1+t)) f(x₀+tx) + (t/(1+t)) f(x₀−x),

whence

f(x₀) − f(x₀−x) ≤ [f(x₀+tx) − f(x₀)]/t.
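For example, let X = R¹ and f(x) = |x|. At x₀ = 0 the difference quotient is |tx|/t = |x| for every t > 0, so f'(0;x) = |x|: the one-sided limit (1) exists at a point where f fails to be differentiable.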

b) Theorem. Let f be a finite convex function on a ls X. Then f'(x₀;·) is a (finite) sublinear function on X, ∀ x₀ ∈ X.

Proof. The finiteness of f'(x₀;·) follows from a), and its homogeneity is an immediate consequence of (1). Let us prove its subadditivity. In the convexity inequality for f stated in the preceding Remark, we replace u by x₀ + 2tx, v by x₀ + 2ty, and set λ = 1/2. This yields

f(x₀ + t(x+y)) ≤ ½ (f(x₀+2tx) + f(x₀+2ty)),

and so

[f(x₀+t(x+y)) − f(x₀)]/t ≤ [f(x₀+2tx) − f(x₀)]/2t + [f(x₀+2ty) − f(x₀)]/2t.

Thus, in the limit as t ↓ 0, we obtain

f'(x₀;x+y) ≤ f'(x₀;x) + f'(x₀;y),

qed.

Corollary. Let f be as in the Theorem. Then

lim_{t↑0} [f(x₀+tx) − f(x₀)]/t = −f'(x₀;−x) ≤ f'(x₀;x),

for every x₀, x ∈ X.

Proof. Exercise 13.

c) The preceding Corollary has the following implication. Suppose that X is a real tls, f a finite convex function on X, and x₀ ∈ X. Suppose also that f'(x₀;·) ≡ φ belongs to X*. Then

(2)  φ(x) = lim_{t→0} [f(x₀+tx) − f(x₀)]/t,

that is, the two-sided limit exists ∀ x ∈ X.

Definition. When Equation (2) holds for a function f, a point x₀ ∈ X, all x ∈ X, and φ ∈ X*, then φ is called the gradient of f at x₀, and is written φ ≡ ∇f(x₀).

We will use this terminology even when f assumes infinite values or when f is not convex. It is certainly justified in particular when f is (Frechet) differentiable at x₀. When f has a gradient at all points of some open set, f will be said to be smooth on that set.

Exercise 14. Let X = L^p(μ), f(x) = ||x||, and θ ≠ x₀ ∈ X. If 1 < p < +∞, show that

∇f(x₀) = x₀(·)|x₀(·)|^{p−2} / ||x₀||^{p−1},

as an element of L^q(μ), 1/p + 1/q = 1. If p = 1, find necessary and sufficient conditions on x₀ in order that ∇f(x₀) exists, and identify it as an element of L^∞(μ) (assume μ to be σ-finite).

d) Remark. Let f be a smooth convex function on R¹. Then f' is non-decreasing. This follows from the excess function characterization of convexity given in 6a). Indeed, for x < y, the inequalities 0 ≤ E(x,y), 0 ≤ E(y,x) imply

f'(x) ≤ [f(y) − f(x)]/(y−x) ≤ f'(y).

(Of course a stronger statement is true, since even if f is not differentiable, both its one-sided derivatives are still non-decreasing functions.)

To generalize this monotonicity property of gradients of smooth convex functions on the line, let f be a smooth convex function on an open (convex) subset K of some tls X. The excess function characterization then implies

(3)  ∇f(x₀)·x ≡ φ(x) ≤ f(x₀+x) − f(x₀),

or

(4)  φ(x − x₀) ≤ f(x) − f(x₀),

for x₀, x ∈ K. If we now interchange x₀ and x, add the two inequalities, and perform a bit of algebra, we obtain the monotonicity inequality

0 ≤ (∇f(x) − ∇f(x₀))·(x − x₀),

which is valid ∀ x₀, x ∈ K. This is also described by saying that ∇f(·) is a monotone mapping of K into X*.
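For example, on X = R¹ with f(x) = x² we have ∇f(x) = 2x, and the monotonicity inequality reduces to (2x − 2x₀)(x − x₀) = 2(x − x₀)² ≥ 0.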


§8. Subgradients

We turn now to a satisfactory generalization of the notion of gradient in the case of convex functions which are not differentiable. The appropriate concept is a "subgradient" of a given convex function, the theory of which has been extensively developed in the last few years by Moreau [48, 49, 50], Rockafellar [7, 66, 68, 69], and others.

a) Definition. Let X be a real tls and f ∈ Conv (X). Any φ ∈ X* satisfying the equivalent equations (3) or (4) of §7 is called a subgradient of f at x₀. The set of all such φ is the subdifferential of f at x₀, denoted by ∂f(x₀). The mapping x ↦ ∂f(x) of X into (possibly empty) subsets of X* is the subderivative of f (sometimes called the subdifferential of f).

Remarks. 1) If X is a complex tls and f ∈ Conv (X), then ∂f(x₀) will consist by definition of those φ ∈ X* such that re φ is a subgradient of f at x₀ relative to Xᵣ.

2) If ∇f(x₀) exists then it obviously belongs to ∂f(x₀), and we will shortly see that in this case there can be no other subgradients at x₀.

3) If x₀ ∉ dom (f), then ∂f(x₀) is void by definition. On the other hand, if x₀ ∈ dom (f), then φ ∈ X* is a subgradient of f at x₀ if and only if φ(x) ≤ f'(x₀;x) ∀ x ∈ X (using 7a)).

4) If f is subdifferentiable at x₀, which means that ∂f(x₀) is not void, then f must be lower semi-continuous (lsc) at x₀.

Examples. 1) Let f be a continuous convex function on R¹. Then ∂f(x₀) is the compact interval whose left and right hand endpoints are respectively the left and right hand derivatives of f at x₀.

2) A lsc proper convex function need not be subdifferentiable throughout its effective domain. As an example, define f ∈ Conv (R¹) by

f(x) = −(1 − x²)^{1/2} if |x| ≤ 1,
     = +∞ otherwise.

Then ∂f(±1) = ∅.

3) Let f be the norm on some nls X. Then ∂f(θ) = U(X*).
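(Indeed, by (4) of §7, φ ∈ ∂f(θ) exactly when re φ(x) ≤ ||x|| ∀ x ∈ X, which holds precisely when ||φ|| ≤ 1.)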

Further examples of subgradients of particular convex functions

will be given in subsequent sections.

b) We consider next the geometrical interpretation of subdifferentiability. When X is a real tls, we recall that (X × R¹)* is (algebraically) isomorphic to X* × R¹ under the correspondence X* × R¹ ∋ (φ,s) ↔ Φ ∈ (X × R¹)*, where Φ((x,t)) ≡ Φ(x,t) = φ(x) + st. Indeed, this is a special case of the theorem asserting that a product of tls has the corresponding (direct) sum for its dual.

Theorem. Let X be a real tls and f ∈ Conv (X). Then φ ∈ ∂f(x₀) if and only if the graph of the affine mapping h(x) ≡ f(x₀) + φ(x − x₀) is a supporting hyperplane to epi (f) at the point (x₀, f(x₀)).

Proof. By definition φ ∈ ∂f(x₀) ⟺ h(·) ≤ f(·). In turn this happens exactly when Φ(epi (f)) ≥ f(x₀) − φ(x₀) ≡ c, where Φ(x,t) ≡ −φ(x) + t, and this definition of Φ entails Φ(x,h(x)) = c. (The graph of h is the hyperplane {(x,t): Φ(x,t) = c}.)

c) Lemma. Let f, X be as in b). Suppose that H ≡ {(x,t) ∈ X × R¹: Φ(x,t) = c} is a supporting hyperplane to epi (f) at (x₀, f(x₀)), say Φ(epi (f)) ≥ c. If Φ corresponds to (φ,s) ∈ X* × R¹, and H is non-vertical (that is, s ≠ 0), then s > 0 and −φ/s ∈ ∂f(x₀).

Proof. Exercise 15.

d) We are now ready to present the main existence theorem for subgradients, due originally to Minty [45]. Let us note in advance that the subdifferential of a given convex function at any point is always a convex and w*-closed subset of the dual space, although, as we have already seen, it may well be void.

Theorem. Let X be a lcs and f ∈ Conv (X). If f is continuous at x₀ ∈ dom (f), then ∂f(x₀) is a non-empty w*-compact convex subset of X*.

Proof. We first show that ∂f(x₀) ≠ ∅. Now epi (f) is a convex body in X × R¹ (for example, an open set in epi (f) is V × (b,+∞), where V is an open x₀-nbhd for which f(V) ≤ b). We may therefore apply the Support Theorem 3f) to obtain a hyperplane H ≡ {(x,t) ∈ X × R¹: Φ(x,t) = c} which supports epi (f) at (x₀, f(x₀)). Further, H is non-vertical since x₀ must belong to int (dom (f)). Hence the Lemma in c) establishes the existence of a subgradient at x₀. Now since ∂f(x₀) is always w*-closed, it will suffice to prove that it is, in the present case, relatively w*-compact, and this will complete the proof.

According to 4b), it will suffice in turn to find a θ-nbhd V ⊂ X so that ∂f(x₀) ⊂ V°. But we may take V = {x ∈ X: f(x₀+x) − f(x₀) ≤ 1}, qed.

e) Corollary. With the same hypotheses, except that now X is a nls, it follows that ∂f(x₀) is also a (norm-)bounded subset of X*.

Proof. A category argument can be used to establish that any w*-compact convex subset of the dual of a nls X is bounded (and convexity is essential should X not be complete); cf. [34, §18]. However, an ad hoc argument can be given in the present case. Suppose that ∂f(x₀) contains an unbounded sequence {φₙ}. Choose δ > 0 so that f(x) − f(x₀) ≤ 1 if ||x − x₀|| ≤ δ. Choose yₙ ∈ S(X) so that φₙ(yₙ) ≥ ||φₙ|| − 1/n. Let xₙ = δyₙ. Then we obtain the contradiction

1 ≥ f(x₀+xₙ) − f(x₀) ≥ φₙ(xₙ) = δφₙ(yₙ) ≥ δ(||φₙ|| − 1/n).

Example. Let X and Y be nls, R ∈ L(X,Y) and y₀ ∈ Y. For the (continuous) convex function f(x) = ||R(x) − y₀||, introduced in 6a), we see that all the subgradients of f at any point in X are contained in the ball ||R|| U(X*).

§9. Normal Cones

We recall that if x₀ belongs to a ls X, a cone with vertex x₀ (or simply a cone at x₀) is a union of rays emanating from x₀. That is, if K is a cone at x₀, then x ∈ K ⟹ x₀ + t(x − x₀) ∈ K, ∀ t > 0. Clearly a subspace M is a convex cone at each x₀ ∈ M, and a half-space H is a convex cone at each of its boundary points.

a) Definition. Let X be a lcs, K a convex subset of X, and x₀ ∈ K. The support cone to K at x₀ is the closed convex cone at x₀ generated by K; that is, it is the intersection of all closed convex cones at x₀ which contain K.

The support cone to K at x₀, denoted S(x₀,K), can also be described as the closure of the union of all rays emanating from x₀ and passing through a point in K. When x₀ is a support point of K, the dimension of S(x₀,K) is a measure of the smoothness of the boundary of K at x₀, since this in turn depends on the number of (linearly independent) supporting hyperplanes at x₀. In fact, S(x₀,K) is the intersection of the supporting half-spaces at x₀ (that is, the half-spaces containing K whose boundary hyperplanes support K at x₀). On the other hand, if x₀ is not a support point of K, then S(x₀,K) = X.

Exercise 16. Verify these last two assertions.

b) Let x₀ ∈ K, and X be as in a). We introduce another cone associated with x₀ and K, but lying in X*. In the special case where X = Rⁿ, this new cone arises as follows. For each supporting hyperplane H to K at x₀, there is a unique exterior ray emanating from x₀ which is normal (orthogonal) to H. The union of these rays taken over all such H forms a closed convex cone at x₀, called the "normal cone to K at x₀". Again, its dimension is a measure of the smoothness of the boundary of K at x₀. In particular, its dimension is unity exactly when x₀ is a "smooth point" of K, that is, a point through which passes a unique supporting hyperplane. The normal cone is usually replaced by the translated cone with vertex θ.

Definition. The normal cone to K at x₀ ∈ K is

(S(x₀,K) − x₀)°.

We are taking the polar here of a cone at θ, so we obtain a w*-closed convex cone at θ in X*, denoted N(x₀,K). We easily see from the definitions involved that this cone consists of those functionals whose real parts attain their maximum over K at x₀. That is,

N(x₀,K) = {φ ∈ X*: re φ(x₀) = max re φ(K)}.

In particular, when K is a subspace of X, N(x₀,K) is the subspace K⊥, provided x₀ ∈ K. We emphasize that when x₀ ∉ K, N(x₀,K) is empty by definition.
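For example, let X be a real Hilbert space, identified with X* in the usual way, and let K = U(X). If ||x₀|| = 1, then by the Cauchy-Schwarz inequality a vector z attains its maximum inner product over U(X) at x₀ exactly when z = λx₀ with λ ≥ 0; thus N(x₀,K) is the ray {λx₀: λ ≥ 0}.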
c) Example. Let K be a convex subset of a lcs X. Then for every x₀ ∈ X,

∂δ_K(x₀) = N(x₀,K).

(Indeed, the subgradient inequality φ(x − x₀) ≤ δ_K(x) − δ_K(x₀) ∀ x ∈ X says precisely that x₀ ∈ K and φ(x) ≤ φ(x₀) ∀ x ∈ K.) It is this formula which motivates our interest in normal cones to convex sets. We will have occasion to compute normal cones to specific convex sets (the constraint sets in optimization problems) in later sections.

§10. Subdifferential Formulas

a) Let X be a real lcs and x₀ ∈ X. The mapping f ↦ ∇f(x₀) is a linear mapping from the linear space of smooth functions on X into X*. Now the subdifferential mapping f ↦ ∂f(x₀) from the convex cone Conv (X) into the convex subsets of X* is supposed to play a role analogous to the gradient mapping. For this analogy to be viable it is important to know when the subdifferential mapping preserves the cone operations, i.e., positive linear combinations. Now it is immediate that

∂(λf)(x₀) = λ∂f(x₀),

∀ λ > 0, and this is also valid for λ = 0, provided ∂f(x₀) ≠ ∅. Of much greater import, however, is the following formula for sums of convex functions.

Lemma. (Moreau, Rockafellar) Let X be a real lcs and f, g ∈ Conv (X). Assume that f is continuous at some point in dom (f) ∩ dom (g) (≡ dom (f+g)). Then ∀ x₀ ∈ X,

∂(f+g)(x₀) = ∂f(x₀) + ∂g(x₀).

Proof. Directly from the definition of subgradient follows

∂(f+g)(x₀) ⊇ ∂f(x₀) + ∂g(x₀).

To prove the reverse inclusion, let φ ∈ ∂(f+g)(x₀). Replacing f and g if necessary by the convex functions

f₁(x) ≡ f(x₀+x) − f(x₀) − φ(x),
g₁(x) ≡ g(x₀+x) − g(x₀),

we can reduce the argument to the case where x₀ = θ, φ = θ, and f(θ) = g(θ) = 0. So now we have θ ∈ ∂(f+g)(θ). This implies that

min (f+g)(X) = (f+g)(θ) = 0.

It follows that the set

K = {(x,t) ∈ X × R¹: t ≤ −g(x)}

is disjoint from int (epi (f)), and so we may apply the Separation Theorem 3g) to separate K and epi (f). The resulting hyperplane is non-vertical since int (dom (f)) ∩ dom (g) ≠ ∅, and is the graph of a linear function (not merely affine) since (θ,0) ∈ K ∩ epi (f). Thus ∃ ψ ∈ X* such that

ψ(x) ≤ t  ∀ (x,t) ∈ epi (f),
t ≤ ψ(x)  ∀ (x,t) ∈ K.

These inequalities entail ψ ∈ ∂f(θ) and −ψ ∈ ∂g(θ), hence θ ∈ ∂f(θ) + ∂g(θ), qed.
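As a one-dimensional illustration, take X = R¹, f(x) = |x|, and g = δ_K with K = [0,∞); f is continuous at 0 ∈ dom (f) ∩ dom (g). Then ∂f(0) = [−1,1] and ∂g(0) = N(0,K) = (−∞,0], so the lemma yields ∂(f+g)(0) = [−1,1] + (−∞,0] = (−∞,1].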

We will give two immediate applications of this lemma; first, to obtain a relationship between the directional derivative at a point and the subdifferential at that point, and then to derive a basic optimality criterion for convex programs.

b) Theorem. (Moreau, Pshenichnii) Let X be a real lcs and f ∈ Conv (X). Assume that f is continuous at x₀. Then ∀ x ∈ X,

f'(x₀;x) = max {φ(x): φ ∈ ∂f(x₀)}.

Proof. Fix x ∈ X and let L be the line {x₀ + tx: t ∈ R¹}. Then

φ ∈ ∂(f+δ_L)(x₀),

φ(y − x₀) + f(x₀) ≤ f(y) ∀ y ∈ L,

tφ(x) + f(x₀) ≤ f(x₀+tx) ∀ t ∈ R¹,

and

−f'(x₀;−x) ≤ φ(x) ≤ f'(x₀;x)

are all equivalent statements about φ ∈ X*. Further, since X is locally convex, ∃ φ ∈ X* such that φ(x) = f'(x₀;x). Then, since the preceding lemma implies ∂(f+δ_L)(x₀) = ∂f(x₀) + ∂δ_L(x₀) = ∂f(x₀) + span(x)⊥ (using 9c)), we have

f'(x₀;x) = max {φ(x): φ = γ + η, γ ∈ ∂f(x₀) and η(x) = 0}
         = max {γ(x): γ ∈ ∂f(x₀)},

qed.

Exercise 17. Under the hypotheses of the Theorem, show that x ↦ f'(x₀;x) is a continuous function on X.

Corollary. With the same hypotheses,

−f'(x₀;−x) = min {φ(x): φ ∈ ∂f(x₀)}.

Hence the two-sided directional derivative

(1)  lim_{t→0} [f(x₀+tx) − f(x₀)]/t

exists and has the value λ if and only if the function φ ↦ φ(x) is constant (with value λ) on the set ∂f(x₀).

We note further that the set of x ∈ X for which the limit in (1) exists is a closed subspace of X on which the value of (1) defines a continuous linear functional [26, p. 97].

c) Corollary. With the same hypotheses, φ ≡ ∇f(x₀) exists if and only if ∂f(x₀) consists of a single element, namely φ.

Proof. Exercise 18.

d) Remarks. 1) An alternative proof of the theorem in b) will be given later, in §18, to illustrate a formula involving conjugate functions.

2) Since we will frequently encounter the hypothesis that a convex function f be continuous at a point, it is worthwhile to recall that this is not a very stringent requirement. Namely, if f is bounded above on some non-empty open set, in particular, if f is upper semi-continuous (usc) at any point, then f is necessarily continuous throughout int (dom (f)) [4, p. 92; 42, p. 193]. (A special case of this was proved in 3b).) As a special case it follows that if f ∈ Conv (X) where X is finite dimensional, then f is continuous throughout rel-int (dom (f)). (A direct proof of this last assertion is also available. Without loss of generality we may assume that K ≡ rel-int (dom (f)) is an open (convex) set in Rⁿ. Now if x₀ ∈ K and ||x|| ≡ ||(ξ₁,...,ξₙ)|| is sufficiently small, then

f(x₀+x) − f(x₀) = f(x₀ + Σξᵢeᵢ) − f(x₀)
= f((1/n)(x₀+nξ₁e₁) + ··· + (1/n)(x₀+nξₙeₙ)) − f(x₀)
≤ (1/n) Σᵢ₌₁ⁿ (f(x₀+nξᵢeᵢ) − f(x₀)),

which tends to 0 as x → θ, because a convex function on an interval must be continuous. This proves that f is usc at x₀. Now since we also have

f(x₀) ≤ ½ f(x₀+x) + ½ f(x₀−x),

then

f(x₀) − f(x₀+x) ≤ f(x₀−x) − f(x₀),

which as above tends to 0 as x → θ, qed.)

Exercise 19. Let X be a nls, and suppose that f ∈ Conv (X) is continuous at x₀. Then ∃ an x₀-nbhd V and k > 0 such that whenever x, y ∈ V, the Lipschitz inequality

|f(x) − f(y)| ≤ k||x − y||

holds.

§11. Convex Programs

a) Definition. A variational pair is an ordered pair (X,f), where X is a set and f: X → (−∞,+∞]; f is called the objective function. The associated variational problem, or abstract mathematical program, is to determine the number inf f(X), called the value of the program, and the points in X (if any) where the value is attained. All such points are then called solutions of the program. When X is a ls and f ∈ Conv (X), the associated variational problem is called an abstract convex program.

It is important to recognize at the outset that this definition encompasses the problem of minimizing a convex function over a convex set which is not a linear space. Indeed, the variational problem (K,f), where K is a convex set in a ls X, and f is a convex function defined on K, is for all purposes the same as the convex program (X, f̄), where f̄ agrees with f on K, and takes the value +∞ on X \ K. If f is a priori defined on all of X (i.e., f ∈ Conv (X)), then we will identify the programs (K, f|K) and (X, f+δ_K).

b) When X is a lcs and f a smooth convex function on X, then x₀ is a solution of the program (X, f) if and only if ∇f(x₀) = θ. More generally, for any f ∈ Conv (X), x₀ is a solution if and only if

(1)  θ ∈ ∂f(x₀).

Although trivial in itself, the optimality condition (1) is the basis for later more informative characterizations of solutions of convex programs.

Note that, by 8a) - Remark 3), (1) holds if and only if x₀ ∈ dom (f) and

(2)  f'(x₀;x) ≥ 0, ∀ x ∈ X.

This condition in turn depends only on the values of f in small nbhds of x₀. Thus if f has a local minimum at x₀, then (2) and hence (1) are satisfied, so f has a global minimum at x₀; that is, x₀ is a solution of the program (X, f).
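For example, with X = R¹ and f(x) = |x| we have ∂f(0) = [−1,1] ∋ θ, so x₀ = 0 is the solution of (X, f), even though f is not differentiable there.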

c) We remark that the set of all solutions of a convex program (X, f) is a convex set in X, which is closed whenever f is lsc on X.

d) We are now ready for the second application of the lemma in 10a).

Theorem. (Pshenichnii, Rockafellar) Let X be a lcs and f ∈ Conv (X). Suppose that K is a convex set in X such that either dom (f) ∩ int (K) ≠ ∅ or f is continuous at some point in K. Then x₀ ∈ K is a solution of the convex program (K, f|K) if and only if −N(x₀,K) ∩ ∂f(x₀) ≠ ∅.

Proof. Identifying the given program with (X, f+δ_K), and using successively b), 10a), and 9c), we see that x₀ ∈ K is a solution if and only if

θ ∈ ∂(f+δ_K)(x₀) = ∂f(x₀) + ∂δ_K(x₀) = ∂f(x₀) + N(x₀,K),

qed.

In other words, our optimality criterion states that a necessary and sufficient condition for x₀ ∈ K to be a solution to the convex program (K, f|K) is that there should exist φ ∈ ∂f(x₀) which attains its minimum over K at x₀. In particular, for smooth (convex) f, the condition is that ∇f(x₀) attains its minimum over K at x₀. If K happens to be a subspace of X, or more generally a flat (3e)) in X, and these are important cases in practice, then the functional φ above (or ∇f(x₀)) must belong to K⊥, respectively, to the annihilator of the subspace parallel to K.

Exercise 20. As a first illustration of the use of the optimality criterion above, we propose the solution of a simple quadratic variational problem in Hilbert space. Let X = H¹[0,T] ≡ {x ∈ C_R[0,T]: x is absolutely continuous with derivative ẋ ∈ L²[0,T]}. X is a Hilbert space under the inner product

⟨x,y⟩ ≡ x(0)y(0) + ∫₀ᵀ ẋ(t)ẏ(t) dt.

(Completeness of X follows readily from the completeness of L²[0,T].) Define f ∈ Conv (X) by

f(x) = ∫₀ᵀ (x(t)² + ẋ(t)²) dt,

and let K be the flat {x ∈ X: x(0) = 1}. Find the (unique) solution and the value of the program (K, f|K). (A simple approximate solution is x(t) = e⁻ᵗ, the approximation improving as T increases.)

§12. Kuhn-Tucker Theory

As a second illustration of the use of 11d), we consider a special class of convex programs ("ordinary convex programs") which, in the finite dimensional case, have been of great practical interest, and for which an elegant theory is available. The programs may be intuitively described as "minimizing a convex function subject to convex constraints".

a) Lemma. Let K₁,...,Kₙ be closed convex bodies in a lcs X whose interiors have a point in common. Let x₀ ∈ K ≡ ∩Kᵢ. Then

N(x₀,K) = Σ N(x₀,Kᵢ).

Proof. Apply 9c) and 10a) to δ_K = Σ δ_{Kᵢ}.

b) Lemma. Let f be a continuous convex function on a real lcs X which somewhere in X assumes a negative value. For the set K ≡ {x ∈ X: f(x) ≤ 0} we have

N(x₀,K) = ∅ if f(x₀) > 0,
        = {θ} if f(x₀) < 0,
        = S(θ,∂f(x₀)) if f(x₀) = 0.

Proof. Here we are using the notation of 9a) even though θ ∉ ∂f(x₀) when f(x₀) = 0; S(θ,∂f(x₀)) is simply the set of all non-negative multiples of elements of ∂f(x₀). Let us prove the indicated relation when f(x₀) = 0, the other two being quite straightforward. If φ ∈ ∂f(x₀), then φ(x − x₀) ≤ f(x) ∀ x ∈ X, and so φ(x) ≤ φ(x) − f(x) ≤ φ(x₀) ∀ x ∈ K. Therefore φ ∈ N(x₀,K), and so is λφ, ∀ λ ≥ 0. That is, S(θ,∂f(x₀)) ⊂ N(x₀,K). Conversely, let φ(≠ θ) ∈ N(x₀,K), so that φ(x₀) = max φ(K). Then f(x) < 0 ⟹ φ(x) < φ(x₀), hence φ(x) ≥ φ(x₀) ⟹ f(x) ≥ 0. Thus f(x₀) = min f(H), where H is the half-space {x ∈ X: φ(x) ≥ φ(x₀)}. By 10a) and 11b) this means that θ ∈ ∂f(x₀) + ∂δ_H(x₀). But this second subdifferential is just the ray of non-positive multiples of φ. Therefore, ∃ ψ ∈ ∂f(x₀) and λ ≥ 0 such that θ = ψ − λφ, and since f does not attain its minimum at x₀, λ cannot vanish. This proves that φ ∈ S(θ,∂f(x₀)), qed.

Corollary. Let f₁,...,fₙ be continuous convex functions on a real lcs X, let Kᵢ = {x ∈ X: fᵢ(x) ≤ 0}, and assume that all the fᵢ are simultaneously negative at some point in X. Then φ ∈ Σ N(x₀,Kᵢ) if and only if ∃ λᵢ ≥ 0 such that λᵢfᵢ(x₀) = 0 and φ ∈ Σ λᵢ∂fᵢ(x₀).

Proof. This is an immediate consequence of the results in a) and b). We are assuming here that x₀ ∈ ∩ Kᵢ.

c) Definition. An ordinary convex program is a convex program of the form (X, f + Σδ_{Kᵢ}), where X is a real lcs, f, f₁,...,fₙ ∈ Conv (X), f is finite, f₁,...,fₙ are continuous on X, and Kᵢ = {x ∈ X: fᵢ(x) ≤ 0}.

Thus we are in effect trying to minimize f subject to the simultaneous inequality constraints fᵢ(x) ≤ 0, i = 1,...,n. Such programs have been of wide-spread interest for the last two decades, and a considerable computational and duality theory has been developed. Some general references are [20, 30, 44, 55, 70]. Our only concern here is to characterize the solutions of these programs under the regularity assumption (or "constraint qualification") that there is a point in X where all the fᵢ's simultaneously assume a negative value. The existence question for solutions of (ordinary or abstract) convex programs is frequently easier to answer, and will not be discussed here. But see §30 below for some special cases, and [70, §27] for the general (finite dimensional) case.

d) Theorem. (Generalized Kuhn-Tucker Conditions) x₀ ∈ ∩ Kᵢ is a solution of the ordinary convex program defined in c) if and only if ∃ φᵢ ∈ ∂fᵢ(x₀) and λᵢ ≤ 0 such that λᵢfᵢ(x₀) = 0 and

λ₁φ₁ + ··· + λₙφₙ ∈ ∂f(x₀).

Proof. Immediate from the optimality criterion 11d) and the Corollary in b) above.

Corollary. (Classical Kuhn-Tucker Conditions) If all the functions f, f₁,...,fₙ are smooth, then the solvability condition becomes: ∃ λᵢ ≥ 0 such that λᵢfᵢ(x₀) = 0 and

∇f(x₀) + λ₁∇f₁(x₀) + ··· + λₙ∇fₙ(x₀) = θ.
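As a simple illustration, take X = R¹, f(x) = x², and the single constraint f₁(x) = 1 − x, so that K₁ = {x: x ≥ 1} and the regularity assumption holds (f₁(2) < 0). At x₀ = 1 the classical conditions are met with λ₁ = 2: λ₁f₁(x₀) = 0 and ∇f(x₀) + λ₁∇f₁(x₀) = 2 − 2 = 0. Hence x₀ = 1 solves the program; no x₀ > 1 can qualify, since then λ₁f₁(x₀) = 0 forces λ₁ = 0, while ∇f(x₀) = 2x₀ ≠ 0.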

Exercise 21. For x ≡ (x_1,x_2) ∈ R² define

f(x) = x_1³ + 2x_2² - 4x_1 - 6x_2.

Show that f is convex in the half space {(x_1,x_2): x_1 ≥ 0}. Use the conditions in d) to solve the ordinary convex program (R², f + δ_K) where K = {x: x_1 ≥ 0, x_2 ≤ 2, 3x_1 + 2x_2 ≤ 10}.
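A minimal numerical sketch of this exercise follows (not part of the original text), assuming the reconstruction of the partly garbled data above: objective f(x) = x_1³ + 2x_2² - 4x_1 - 6x_2 and constraint set K = {x_1 ≥ 0, x_2 ≤ 2, 3x_1 + 2x_2 ≤ 10}. With these assumed data the unconstrained stationary point (2/√3, 3/2) already lies in K, so every Kuhn-Tucker multiplier vanishes.

    # Sketch only; the objective and the constraint senses are assumptions
    # reconstructed from the garbled original.
    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: x[0]**3 + 2*x[1]**2 - 4*x[0] - 6*x[1]
    grad = lambda x: np.array([3*x[0]**2 - 4, 4*x[1] - 6])

    cons = [{'type': 'ineq', 'fun': lambda x: x[0]},                  # x1 >= 0
            {'type': 'ineq', 'fun': lambda x: 2 - x[1]},              # x2 <= 2
            {'type': 'ineq', 'fun': lambda x: 10 - 3*x[0] - 2*x[1]}]  # 3x1+2x2 <= 10

    sol = minimize(f, x0=[1.0, 1.0], constraints=cons)
    print(sol.x)        # approximately (2/sqrt(3), 3/2)
    print(grad(sol.x))  # approximately (0, 0): each lambda_i = 0 in d)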

e) Remark. It is possible to generalize the class of convex

programs for which conditions of Kuhn-Tucker type still characterize

the solutions. In one direction we may adjoin a finite number of

affine constraints (constraints of the form ℓ_j(x) = α_j, where ℓ_j is a continuous linear functional); the only change is that the corresponding additional "multipliers" λ_j are of unrestricted sign.
In another direction, we may replace the finite set of constraints

f_i(x) ≤ 0, i = 1,...,n

by an infinite family of smooth (convex) constraints,

f_α(x) ≤ 0, α ∈ A,

where A is a compact Hausdorff space and for each x, the function α ↦ f_α(x) is continuous on A. The appropriate regularity assumption is then that for some x,

sup {f_α(x): α ∈ A} < 0.

The conclusion is then that an x_0 satisfying all the constraints is a solution if and only if there exists a non-positive finite Borel measure λ on A such that

∫_A f_α(x_0) dλ(α) = 0

and

∫_A ∇f_α(x_0) dλ(α) ∈ ∂f(x_0).

These conditions are of course a precise generalization of those given in d) where A is a finite set. A proof of this result has been given by Rockafellar [69, p. 45].



§13. Lagrange Multipliers

We continue to study the ordinary convex program of 12c), the

regularity assumption of that paragraph being in force throughout the

present section. Classically the "method of Lagrange multipliers"

is a device for replacing an optimization problem with constraints

(such as the one under study) with a new unconstrained problem. In

addition to discussing this point of view, we develop an equally

interesting interpretation of Lagrange multipliers as measurements

of the rate of change of the value of the program when the con-

straints are slightly altered.

a) Definition. An n-tuple (λ_1,...,λ_n) of non-negative numbers is called a Lagrange multiplier vector (or, a Kuhn-Tucker vector) for the ordinary convex program 12c) if the value of the program (X, f + ∑λ_if_i) is finite and equal to the value of (X, f + ∑δ_{K_i}) (≡ the original program).

Thus if we can be assured of the existence of a Lagrange multiplier vector, then to solve the program we can try to minimize the function f + ∑λ_if_i over all of X (which computationally may be less difficult than minimizing f over some proper convex subset), and then examine these solutions to find those which satisfy the constraints f_i(x) ≤ 0.

b) Theorem. (Minimum Principle) If x_0 is a solution of the ordinary convex program 12c), then there exists a Lagrange multiplier vector (λ_1,...,λ_n) such that x_0 is a solution of the program (X, f + ∑λ_if_i).

Proof. The Kuhn-Tucker optimality condition of 12d) implies the existence of non-negative λ_1,...,λ_n such that θ ∈ ∂(f + ∑λ_if_i)(x_0). Consequently f + ∑λ_if_i attains its minimum over X at x_0. Further, the two minima are equal, since f(x_0) + ∑λ_if_i(x_0) = f(x_0) because λ_if_i(x_0) = 0 also, by 12d). Hence by definition the n-tuple (λ_1,...,λ_n) is a Lagrange multiplier vector.

c) We now work toward the establishment of a fundamental connection between Lagrange multiplier vectors and subgradients. This relationship leads both to existence criteria for Lagrange multiplier vectors and to their interpretation alluded to at the beginning of this section.

Definition. The perturbation function of the ordinary convex program 12c) is the mapping p: Rⁿ → [-∞,+∞] defined by

p(y) ≡ p(y_1,...,y_n) = inf {f(x): f_i(x) ≤ y_i, i = 1,...,n}.

Thus p is the value of a new ordinary convex program differing

from the original only in that the original constraint levels have

been perturbed. Intuitively we think of p(y) as the optimal payoff

for a given level y of resource expenditure. It is possible,

although in practice unlikely, that p may assume the value -∞. In any event, dom (p) ≡ {y ∈ Rⁿ: p(y) < +∞} = {y ∈ Rⁿ: f_i(x) ≤ y_i for some x ∈ X and all i}. We assume that θ ∈ dom (p), and direct our attention to the study of p(y) for small values of y.

The basic property of p is its convexity.

Lemma. The perturbation function p is a convex function on Rⁿ.

Proof. Since a priori p may assume the value -∞, we cannot directly verify the convexity inequality of 6a), because of the possibility of encountering an indeterminate expression such as -∞ + ∞. However, it is just as easy to establish the convexity of epi (p). To do this, choose points y, z ∈ dom (p), numbers α > p(y), β > p(z), and 0 < t < 1, and show, as Exercise 22, that

p(ty + (1-t)z) ≤ tα + (1-t)β.

Several facts about p follow directly from its convexity.

First, if p ever assumes the value -∞, then it has the value -∞ throughout the relative interior of its effective domain; hence if the value of the original program is finite (i.e., -∞ < p(θ)), this case cannot occur, so p must be continuous throughout rel-int (dom (p)) by 10d). If so, then p is subdifferentiable throughout this same relative interior by 8d). In particular, if θ ∈ int (dom (p)), then p is continuous and subdifferentiable on some θ-nbhd.

d) Theorem. The set of Lagrange multiplier vectors for the ordinary convex program 12c) is identical with -∂p(θ).

Proof. Let λ ≡ (λ_1,...,λ_n) be a Lagrange multiplier vector. Then, recalling that K_i ≡ {x ∈ X: f_i(x) ≤ 0},

p(θ) = value of (X, f + ∑δ_{K_i})
     = value of (X, f + ∑λ_if_i)
     ≤ f(x) + ∑λ_iy_i, if f_i(x) ≤ y_i ∀i.

Therefore, p(θ) ≤ p(y) + ∑λ_iy_i, which proves that -λ ∈ ∂p(θ). Conversely, assume that λ ∈ -∂p(θ). Then λ_i ≥ 0 (because, by definition of p, y ≥ θ ⟹ p'(θ;y) ≤ 0, so by 9a-Remark 3), -λ_i = (-λ)·e_i ≤ p'(θ;e_i) ≤ 0). Now choose any z ∈ X and put y_i = f_i(z). Then p(θ) ≤ p(y) + ∑λ_iy_i ⟹

p(θ) ≤ inf {f(x): f_i(x) ≤ y_i} + ∑λ_iy_i ≤ f(z) + ∑λ_if_i(z),

and so

value of (X, f + ∑δ_{K_i}) ≤ value of (X, f + ∑λ_if_i).

But the reverse inequality is also valid, since λ_i ≥ 0 and so the value of (X, f + ∑λ_if_i) cannot exceed the infimum of this function over the set ∩K_i. Hence the two values are equal and this means that λ is a Lagrange multiplier vector.

The upshot is that all questions about Lagrange multiplier vectors are reducible to questions about ∂p(θ). For instance, a Lagrange multiplier vector fails to exist if and only if ∂p(θ) = ∅; this latter condition is in turn equivalent to the existence of a vector y ∈ Rⁿ for which p'(θ;y) = -∞ [70, p. 216], in which case

the program is highly unstable for some perturbations. We also

obtain a satisfactory solution of the uniqueness question.

Corollary. A unique Lagrange multiplier vector λ exists if and only if p has a gradient at θ, and then λ_i = -∂p(θ)/∂y_i.
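To make the identification in d) concrete, here is a small numerical sketch (not part of the original text) for the hypothetical one-constraint program: minimize f(x) = x over R¹ subject to f_1(x) = x² - 1 ≤ 0. Its perturbation function is p(y) = -√(1+y), so the Corollary predicts the unique multiplier λ = -p'(θ) = 1/2.

    import numpy as np

    p = lambda y: -np.sqrt(1.0 + y)     # perturbation function of the sample program
    h = 1e-6
    lam = -(p(h) - p(-h)) / (2*h)       # -p'(0) by central differences
    print(lam)                          # approximately 0.5

    # Check the multiplier property: x0 = -1 minimizes f + lam*f1 over all of R,
    # and the two program values agree.
    xs = np.linspace(-3.0, 3.0, 10001)
    vals = xs + lam*(xs**2 - 1.0)
    print(xs[np.argmin(vals)], vals.min(), p(0.0))   # -1.0, -1.0, -1.0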

e) Example. The ordinary convex program defined by X = R², f(x) = x_1, f_1(x) = x_2, and f_2(x) = x_1² - x_2 has x = θ as its unique solution, but there is no Lagrange multiplier vector. Of course here the regularity assumption is violated.

f) We now briefly consider the use of Lagrange multipliers in

sensitivity analysis. This use is to provide an estimate for the



change in value of an ordinary convex program when the constraint

bounds are perturbed. The basic result, which depends on the

Theorem in d), is the following.

Theorem. Let f, f_1,...,f_n define an ordinary convex program as in 12c). For given y', y'' ∈ Rⁿ let x', x'' ∈ X be solutions of the programs inf {f(·): f_i(·) ≤ y'_i}, resp. inf {f(·): f_i(·) ≤ y''_i}. If λ', λ'' are corresponding Lagrange multiplier vectors, then

-λ''·(y'-y'') ≤ f(x') - f(x'') ≤ -λ'·(y'-y'').

Proof. We verify the right-hand inequality by examining the definition of λ'; the other inequality is handled similarly. Let q be the perturbation function for the ordinary convex program defined by f, f_1-y'_1,...,f_n-y'_n. Then by d), -λ' ∈ ∂q(θ). If p is the perturbation function for the original program, we see that p(y) = q(y-y') and hence that ∂q(θ) = ∂p(y'). Therefore, -λ' ∈ ∂p(y'), and so the right-hand inequality follows by noting that f(x') = p(y') and f(x'') = p(y'').

In addition to this property of Lagrange multiplier vectors, recall that it was also brought out in the course of the proof of d) that

-λ·y ≤ p'(θ;y),

whenever p is the perturbation function for the ordinary convex program 12c), λ is a Lagrange multiplier vector, and y ∈ Rⁿ. Taking in particular y to be the jth unit vector in Rⁿ, we may state that -λ_j is a lower bound on the marginal rate of change of the value of the program relative to an increase in the right hand side of the jth constraint. Further, as the Corollary in d) shows, this lower bound is exact when ∇p(θ) exists.



g) The usefulness of the preceding results in f) for deciding whether it is worthwhile to make changes in one or more of the constraint levels (for the purpose of decreasing the objective function) depends on the availability of a Lagrange multiplier vector. Fortunately, many practical algorithms (such as the simplex procedure for linear programs) will supply a Lagrange multiplier vector λ along with the program solution. This happens because λ is the solution of a "dual program" which is simultaneously solved. In fact, in practical problems (such as linear programs), the vector λ is usually unique.

Example. Consider the linear program in R⁴ defined by

maximize f_0(x) ≡ 2x_1 + x_2 + 10x_3 + 4x_4

subject to

5x_1 + x_2 + 15x_3 + 5x_4 ≤ 100
3x_1 + 2x_2 + 7x_3 + 5x_4 ≤ 125
0 ≤ x_1,...,x_4.

A solution is x̄ = (0,25,0,15) and the value is 85. A Lagrange multiplier vector is λ = (3/5,1/5). Question: how does a decrease in the first constraint bound affect the value of the program?

To answer this, we reformulate the given problem as an ordinary convex program by defining

f = -f_0,
f_1(x) = 5x_1 + x_2 + 15x_3 + 5x_4 - 100,
f_2(x) = 3x_1 + 2x_2 + 7x_3 + 5x_4 - 125.

Then -λ·y ≤ p(y) - p(θ) gives here

-(3/5,1/5)·(δ,0) ≤ p(y) - (-85),

where y = (δ,0), and so

-p(y) ≤ 85 + 3δ/5,

which is < 85 if δ < 0. In other words, any decrease in the first constraint level will decrease the maximum value of f_0 by at least the amount indicated.
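The following sketch (not part of the original text) checks this example numerically with scipy's LP solver; the dual variables reported by the 'highs' method are exactly the multipliers, up to the solver's sign convention.

    from scipy.optimize import linprog

    c = [-2, -1, -10, -4]                    # maximize f0  <=>  minimize -f0
    A = [[5, 1, 15, 5], [3, 2, 7, 5]]

    res = linprog(c, A_ub=A, b_ub=[100, 125], bounds=[(0, None)]*4, method='highs')
    print(res.x, -res.fun)                   # (0, 25, 0, 15) and 85
    print(res.ineqlin.marginals)             # (-3/5, -1/5), i.e. -lambda

    # Decreasing the first bound by 5 lowers the optimal value by at least 3:
    res2 = linprog(c, A_ub=A, b_ub=[95, 125], bounds=[(0, None)]*4, method='highs')
    print(-res2.fun)                         # 82 = 85 - (3/5)*5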

h) Remark. In this section we have only touched upon a few of

the highlights of the theory of Lagrange multipliers in convex pro-

gramming. A most conspicuous omission is the interplay between this

theory and that of dual programs. Also missing is the very pretty

economic interpretation in terms of "equilibrium prices". For these

ideas, and for a more detailed study of ordinary convex programs, we

must refer the reader to the literature, in particular [20, 44, 54, 55,

70].

§14. Conjugate Functions

In this section we introduce the second technical device for

analyzing optimization problems - the "conjugate" of a (generally

convex) function. This notion, in its modern form, has been developed

by Fenchel [16], Moreau [47, 50], Brøndsted [6], and Rockafellar

[66, 67]. A very useful survey, with applications to duality theorems

for mathematical programs and optimal control problems, and to

approximation theory, has been given by Ioffe and Tikhomirov [31].

For the remainder of Part II we will be working in a real lcs

(possibly a nls) X. As usual there is an immediate extension of the

theory to complex spaces, obtainable by passing to the real parts of

linear functionals. Our restriction to real spaces is thus motivated

primarily by the desire to avoid cluttering up the many formulas


43

below with innumerable "re's". We will also consistently use the notation ⟨x,y⟩ to denote the value of a linear functional y ∈ X* at the vector x ∈ X.

a) Definition. Let X be a real lcs and f: X → (-∞,+∞]. The conjugate (also called the Fenchel transform) of f is the function f*: X* → (-∞,+∞] defined by

f*(y) = sup {⟨x,y⟩ - f(x): x ∈ dom (f)}.

(We will assume that dom (f) ≠ ∅.)

b) A trivial yet useful consequence of this definition is Young's inequality (or Fenchel's inequality):

⟨x,y⟩ ≤ f(x) + f*(y).

Also immediate are the properties:

(f + c)* = f* - c, c ∈ R¹;
(cf)* = cf*(·/c), c > 0;
f*(y) = g*(cy) if f(x) = g(x/c), c > 0;
f*(y) = g*(y + z) if f(x) = g(x) - ⟨x,z⟩;

and

(inf f_α)* = sup f_α*

for any family {f_α} of functions on X.

c) Examples. 1) If f = δ_K for some K ⊆ X, then f*(y) = sup {⟨x,y⟩: x ∈ K} ≡ sup ⟨K,y⟩. This function is called the support function of K.

2) Let X be a nls and f(x) = ||x||. Then f* = δ_{U(X*)}. More generally, if M is a subspace of X, and f(x) = dist (x,M), then f* = δ_{U(M⊥)}.

3) Let X be a nls and for 1 < p < +∞ let f(x) = ||x||^p/p. Then f*(y) = ||y||^q/q, where 1/p + 1/q = 1.

4) Let X = R¹ and f(x) = e^x. Then

f*(y) = +∞ for y < 0; 0 for y = 0; y(log y - 1) for y > 0.

5) Let X = Rⁿ and let A be a symmetric positive definite n × n matrix. If 1/p + 1/q = 1 and

f(x) = (1/p)⟨x,Ax⟩^{p/2},

then

f*(y) = (1/q)⟨y,A⁻¹y⟩^{q/2}.

Exercise 23. Verify these examples.
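As a spot-check of these formulas, the sketch below (not part of the original text) computes the conjugate of Example 4) by brute force over a grid.

    # Verify numerically that for f(x) = e^x one has f*(y) = y(log y - 1), y > 0.
    import numpy as np

    xs = np.linspace(-20.0, 5.0, 200001)
    fstar = lambda y: np.max(xs*y - np.exp(xs))   # crude sup over the grid

    for y in [0.5, 1.0, 2.0, 3.0]:
        print(y, fstar(y), y*(np.log(y) - 1.0))   # the two columns agree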

d) Theorem. A conjugate function f* is always convex and w*-lsc.

Proof. f* is the pointwise supremum of the functions y ↦ ⟨x,y⟩ - f(x), each of which is a w*-continuous affine function on X*.

e) Definition. The second conjugate of a function f: X → (-∞,+∞] is the function f**: X → [-∞,+∞] defined by

f**(x) = sup {⟨x,y⟩ - f*(y): y ∈ dom (f*)}.

Examples. In Examples 2)-5) in c) we have f** = f. But in Example 1) in c) we see that

f** = δ_{c-o (K)},

since sup ⟨K,y⟩ = sup ⟨c-o (K),y⟩, ∀y ∈ X*.

f) Theorem. With f as defined in a), we have f** ≤ f, and dom (f**) ⊆ c-o (dom (f)), provided that f** is proper.

Proof. The pointwise inequality f** ≤ f follows directly from Young's inequality (b)). Suppose x_0 ∉ c-o (dom (f)). Choose y_0 ∈ X* according to 3h) so that

⟨x_0,y_0⟩ > sup ⟨c-o (dom (f)),y_0⟩ = sup ⟨dom (f),y_0⟩.

Since f** is proper, ∃ y_1 ∈ X* such that |f*(y_1)| < +∞. If t > 0 then

f*(y_1 + ty_0) = sup {⟨x,y_1 + ty_0⟩ - f(x): x ∈ dom (f)}
              ≤ f*(y_1) + t sup ⟨dom (f),y_0⟩,

and so

f**(x_0) ≥ ⟨x_0,y_1 + ty_0⟩ - f*(y_1 + ty_0)
         ≥ ⟨x_0,y_1⟩ - f*(y_1) + t(⟨x_0,y_0⟩ - sup ⟨dom (f),y_0⟩),

which is unbounded as t → +∞, qed.

g) For f and X as in a) we recall that the conditions "f is lsc on X" and "the sub-level sets of f (i.e., the sets {x ∈ X: f(x) ≤ c}) are closed" are equivalent. It is also easy to see that these are equivalent to the condition "epi (f) is closed in X × R¹". We denote by Γ(X) the set of all lsc proper convex functions on X.

Theorem. f** = f if and only if f ∈ Γ(X).

Proof. If f** = f, then by d) f must be convex and w-lsc, hence lsc since every weakly closed set is closed. Also, f is proper by the assumptions made in a). Conversely, suppose f ∈ Γ(X). Then the preceding theorem implies

(1) dom (f) ⊆ dom (f**) ⊆ cl (dom (f)).

Suppose ∃ x_0 ∈ dom (f**) such that f**(x_0) < f(x_0). Then (x_0,f**(x_0)) ∉ epi (f), a closed convex set. Hence ∃ y_0 ∈ X* and t_0 ∈ R¹ such that

(2) t_0f**(x_0) + ⟨x_0,y_0⟩ > sup {t_0t + ⟨x,y_0⟩: (x,t) ∈ epi (f)}.

Now t_0 ≠ 0, for otherwise (1) would be contradicted. In fact t_0 < 0, since otherwise the sup in (2) would be infinite. So we may assume t_0 = -1. Then for given x ∈ dom (f), the sup is attained when t = f(x). But now

⟨x_0,y_0⟩ - f**(x_0) > sup {⟨x,y_0⟩ - f(x): x ∈ dom (f)} = f*(y_0),

a contradiction to Young's inequality.

Corollary. With X* endowed with the w*-topology, the mapping f ↦ f* is an order-reversing bijection from Γ(X) onto Γ(X*).

Exercise 24. For any function f as in a),

f** = c-o (f) ≡ sup {g ≤ f: g convex and lsc on X}. (Conceivably, the only such g is the constant -∞, but then f* must be the constant +∞.)

h) Definition. Given functions f_1,...,f_n as in a), their (infimal) convolution ⊕f_i is the function defined by

(⊕f_i)(x) = inf {∑_{i=1}^n f_i(x_i): ∑_{i=1}^n x_i = x}.

Exercise 25. If each f_i ∈ Conv (X), then ⊕f_i is convex (although possibly not proper). Its effective domain is ∑ dom (f_i).

Examples. 1) For non-empty sets A, B ⊆ X, δ_A ⊕ δ_B = δ_{A+B}.

2) (δ_A ⊕ f)(x) = inf {f(x-a): a ∈ A}. In particular, if f is a semi-norm on X, then δ_A ⊕ f = dist (·,A). If also A is convex, then Exercise 25 shows that d(·,A) is a convex function on X.

Theorem. (⊕f_i)* = ∑_{i=1}^n f_i*.

Proof. (⊕f_i)*(y) = sup {⟨x,y⟩ - inf {∑f_i(x_i): ∑x_i = x, x_i ∈ dom (f_i)}: x ∈ ∑ dom (f_i)}

= sup_x sup_{∑x_i = x} ∑ (⟨x_i,y⟩ - f_i(x_i))

= ∑_i sup_x {⟨x,y⟩ - f_i(x)}

= ∑ f_i*(y), qed.

i) A formula for the conjugate of a sum is a little harder to

come by, although of greater interest. We give next one result in

this direction; however, we will later give a more useful formula

(requiring stronger hypotheses) as an illustration of the Fenchel

Duality Theorem - see 21a).



Theorem. If f_1,...,f_n ∈ Γ(X), then

(∑f_i)* = (⊕f_i*)**.

Proof. Whether or not the f_i are convex we always have the inequality

(3) ∑ f_i ≥ ∑ f_i**,

as follows from f) and g) above. By h) the right hand side of (3) equals (⊕f_i*)*. But since f_i ∈ Γ(X), there is actually equality in (3); taking conjugates then gives the assertion, qed.

§15. Polarity

a) Let X be a real lcs and A ⊆ X. In 3d) the Minkowski function of A was defined, provided that A was a convex θ-nbhd. Using the convention that the infimum of the empty set of real numbers is +∞, we now expand the coverage of that definition by defining p_A(θ) = 0 and

p_A(x) = inf {t > 0: x ∈ tA},

whenever x ≠ θ. Thus p_A: X → [0,∞], and is positively homogeneous: p_A(cx) = c p_A(x) for c > 0. We also recall that polars were discussed in §4.

Theorem. For any A ⊆ X, we have

(1) p_A* = δ_{A°},

and, if θ ∈ A,

(2) δ_A* = p_{A°}.

Proof. p_A*(y) = sup {⟨x,y⟩ - p_A(x): x ∈ dom (p_A)} = sup_x {⟨x,y⟩ - inf {t > 0: x ∈ tA}} = sup_x sup {⟨x,y⟩ - t: x ∈ tA}.

Now if y ∈ A° this last sup is ≤ 0, and hence = 0 since p_A(θ) = 0. On the other hand, if y ∉ A°, then ∃ x ∈ A such that ⟨x,y⟩ = 1 + ε > 1. Consequently,

p_A*(y) ≥ ⟨tx,y⟩ - t = tε,

for t > 0, so p_A*(y) = +∞.

To prove the second formula, recall from 14c) that

δ_A*(y) = sup ⟨A,y⟩ ≡ g(y) ≥ 0,

the last inequality coming from θ ∈ A. Now A° = {y: g(y) ≤ 1}. Suppose g(y) = 0. Then the ray {ty: t > 0} lies in A°, so p_{A°}(y) = 0. Suppose next that 0 < g(y) < +∞. Since g is positively homogeneous we have

p_{A°}(y) = inf {t > 0: y/t ∈ A°}
         = inf {t > 0: g(y/t) ≤ 1}
         = inf {t > 0: g(y) ≤ t} = g(y).

Similarly, if g(y) = +∞, then for no t > 0 is y/t ∈ A°, hence p_{A°}(y) = +∞, qed.

b) Formula (1) together with 14d) provides a new proof of the fact that A° is always w*-closed and convex. In turn this fact yields the highly useful

Corollary. (Bipolar Theorem)

A°° ≡ (A°)° = c-o ({θ} ∪ A).

Proof. Since {θ} ∪ A ⊆ A°°, which is closed and convex, we have at least that A°° ⊇ c-o ({θ} ∪ A). On the other hand, if any closed half-space contains {θ} ∪ A, it must also contain A°°; taking into account 3i) this proves the reverse inclusion.

Exercise 26. Let {A_α} be a family of closed convex subsets of X each containing θ. Show that

(3) (∩_α A_α)° = c-o (∪_α A_α°),

the closure here being of course taken in the w*-topology.

c) Example. This example completes the description of the extreme point sets of the unit balls of the classical Banach spaces. It is destined to play a vital role in the theory of best Chebyshev (uniform) approximation, to be presented in Part III.

Let X = C_R(Ω); we will characterize ext (U(X*)). For t ∈ Ω let δ_t be the point mass at t; as an element of X*, δ_t is the norm-one functional x ↦ x(t), x ∈ X. Now each δ_t ∈ ext (U(X*)). (If ∃ ν,σ ∈ S(X*) such that δ_t = ½(ν + σ), then both ν and σ must be ≥ θ, since the positive face (5b-Example 3)) of U(X*) is U(X*)-extremal. Hence they must both annul any non-negative x ∈ X which vanishes at t. It follows that the supports of ν and σ are just {t}; consequently, ν = σ = δ_t.) Therefore, the set E ≡ {± δ_t: t ∈ Ω} ⊆ ext (U(X*)). Further, E° = U(X), so the Bipolar Theorem implies U(X*) = E°° = c-o ({θ} ∪ E) = c-o (E) (w*-closures of course). Since E is w*-compact (the map t ↦ δ_t is a homeomorphism on Ω), the Krein-Milman Theorem 5e) shows that ext (U(X*)) ⊆ E; hence E = ext (U(X*)). Thus we have proved that

ext (U(C_R(Ω)*)) = {± δ_t: t ∈ Ω}.

A completely analogous characterization is valid for ext (U(C(Ω)*)), namely it is the set {αδ_t: t ∈ Ω and |α| = 1}.

d) We reconsider now the formula of Exercise 26. It is of frequent interest (for example, in the next section) to know that the convex hull on the right hand side of (3) is already (w*-) closed. In particular this is the case if there are only finitely many A_α, each of which is a (closed convex) θ-nbhd. For then, by §4, each A_α° is w*-compact convex, and the result follows from 5a). We sum up:

Lemma. Let A_1,...,A_n be closed convex θ-nbhds. in a lcs. Then

(A_1 ∩ ... ∩ A_n)° = co (A_1° ∪ ... ∪ A_n°).

§16. Dubovitskii-Milyutin Theory

We give next a brief introduction to a very general approach to

the solution of (not necessarily convex) mathematical programs.

Given a variational pair (X,f) (11a); here X is a real lcs), the

procedure yields a necessary condition, in the form of an equation in

X* ("abstract Euler equation"), for a specific element of dom (f)

to be a solution of the associated program. The scope of this theory

has recently been extended by Halkin [23] and Lobry [41] so as to be

applicable to optimal control problems. The original presentation

was [14]; a discussion has also been given in the Girsanov book [21].

a) Theorem. (Dubovitskii-Milyutin) Let K_0 be a convex set and K_1,...,K_n open convex cones with vertex θ in a real lcs X. Then

(1) ∩_{i=0}^n K_i = ∅

if and only if ∃ y_i ∈ K_i°, not all zero, such that

(2) y_0 + y_1 + ... + y_n = θ.

Proof. The existence of y_i's satisfying (2) is clearly sufficient for (1) to hold, since the cones K_1,...,K_n are open. Now conversely, if condition (1) holds, we can assume, without loss of generality, that K ≡ K_1 ∩ ... ∩ K_n ≠ ∅. Choose x̄ ∈ K and let J_i ≡ K_i - x̄, i = 0,1,...,n. Apply 3g) to separate J_0 and J ≡ J_1 ∩ ... ∩ J_n:

sup ⟨J_0,y_0⟩ ≤ λ_0 ≤ inf ⟨J,y_0⟩.

Since J is a θ-nbhd., λ_0 < 0. Hence y_0/λ_0 ∈ J° = co (J_1° ∪ ... ∪ J_n°), by 15d). This implies the existence of a_i ≥ 0, i = 1,...,n, a_1 + ... + a_n = 1, for which -y_0 = y_1 + ... + y_n, where y_i ∈ -λ_0a_iJ_i° ≡ λ_iJ_i°. Thus

(3) ∑_{i=0}^n y_i = θ and ∑_{i=0}^n λ_i = 0.

Now by definition of J_i, sup ⟨K_i,y_i/λ_i⟩ is bounded above, so sup ⟨K_i,y_i/λ_i⟩ ≤ 0 since K_i is a cone at θ, i = 1,...,n; in particular y_i ∈ K_i°. From this, and the fact that x̄ ∈ K, it follows that

sup ⟨K_0,y_0⟩ ≤ λ_0 + ⟨x̄,y_0⟩ = -∑_{i=1}^n (λ_i + ⟨x̄,y_i⟩) ≤ 0

by (3), so y_0 ∈ K_0° also, qed.

Other proofs of this theorem have been given by Vlach [76],

Halkin [23], and Pshenichnii [62]; the proof given above was adapted

from Ioffe-Tikhomirov [31]. The interest in this theorem, as will

be seen shortly, is that a necessary condition for the solution of a

variational problem can be expressed as the requirement that a cer-

tain finite family of convex cones should have an empty intersection.

The theorem then yields an equation, (2), which must be solved. We

will refer to (2) as the abstract Euler equation.

b) The variational problems (X,f) to which we will apply the preceding theorem are of the following type. There are sets Ω_1,...,Ω_{n-1}, each having non-void interior, and a set A which will not generally have interior points, such that

(4) dom (f) = A ∩ Ω_1 ∩ ... ∩ Ω_{n-1}.

Intuitively, each Ω_i is the set satisfying some inequality constraint, while A is the set where one or more equality constraints hold. We seek a condition which a given x_0 ∈ dom (f) must satisfy in order that it be a solution. Since we are not at present limiting ourselves to convex programs, we must allow the possibility of local minima; the condition to be derived will indeed be characteristic of each local minimum.

From the given data X (a real lcs), f, A, Ω_1,...,Ω_{n-1}, and x_0 ∈ dom (f), we now construct the sets K_0, K_1,...,K_n to which a) will be applied.

c) We begin with the objective function f.

Definition. x ∈ X is a direction of decrease of f (at x_0) (originally called a "prohibited variation") if ∃ ε > 0 and an x-nbhd. V such that 0 < t < ε and z ∈ V imply

f(x_0 + tz) < f(x_0).

The set C(x_0,f) of all such elements x is easily seen to be an open cone with vertex θ, or else it is void. There is no a priori reason to expect that the cone C(x_0,f) is convex; however, as we indicate next, this actually is the case in many commonly occurring situations.

Examples. 1) Suppose that ∇f(x_0) exists as an element of X*. Then the cone C(x_0,f) is the open half-space {x ∈ X: ∇f(x_0)·x < 0}.

2) Suppose that f ∈ Conv (X) and is continuous on some x_0-nbhd. Then C(x_0,f) is the convex cone {x ∈ X: f'(x_0;x) < 0}.



Exercise 27. Verify these examples.

d) We continue with the sets Ω_i; we take any one of them and call it Ω.

Definition. x ∈ X is admissible with respect to Ω if ∃ ε > 0 and an x-nbhd. V such that 0 < t < ε and z ∈ V imply x_0 + tz ∈ Ω.

The set C(x_0,Ω) of all such vectors in X is an open cone at θ; it might be void or it might not be convex. Only the case where x_0 is a boundary point of Ω is non-trivial: if x_0 ∈ int (Ω), then C(x_0,Ω) = X.

Examples. 1) Let Ω be a convex body in X. Then C(x_0,Ω) = int (S(x_0, int (Ω))) - x_0; that is, it is the convex cone at x_0 generated by int (Ω), which is then translated to θ.

2) Suppose that Ω = {x ∈ X: g(x) ≤ g(x_0)} for some real-valued function g on X. It is clear that C(x_0,g) ⊆ C(x_0,Ω). If either ∇g(x_0) exists in X* (and is not θ), or if g ∈ Conv (X), continuous on some x_0-nbhd., and the regularity assumption {x: g(x) < g(x_0)} ≠ ∅ is valid, then C(x_0,Ω) = C(x_0,g).

Exercise 28. Verify these examples.

e) Finally we consider the construction of a cone for the set A.

Definition. x ∈ X is admissible with respect to A (or, is a tangent direction to A at x_0) if ∃ a map r: [0,ε] → X for some ε > 0, such that x_0 + tx + r(t) ∈ A when 0 ≤ t ≤ ε, and r(t)/t → θ as t → 0+.

The set C(x_0,A) of all such vectors is again a cone at θ and θ ∈ C(x_0,A). In many cases of interest this cone is simply a linear subspace.

Examples. 1) If A is a flat in X then C(x_0,A) is the parallel subspace.

2) Let X and Y be Banach spaces and G: X → Y a mapping which is continuously Frechet differentiable on an x_0-nbhd. Assume that the differential dG(x_0) ∈ L(X,Y) is surjective. Then if A has the form {x ∈ X: G(x) = θ}, we have C(x_0,A) = nullspace of dG(x_0). (Without the assumed surjectivity, we can only assert that C(x_0,A) ⊆ nullspace of dG(x_0).) When X and Y are finite dimensional, the surjectivity condition is equivalent to the Jacobian matrix of G at x_0 having maximum row rank. This result is due to Liusternik [43]; see also Flett [17].

f) Now we come to the point of the last four sections. We reconsider the program formulated in b), and define

K_0 = C(x_0,A),
K_i = C(x_0,Ω_i), i = 1,...,n-1,
K_n = C(x_0,f).

It is explicitly assumed that all these sets are convex. We then obtain the Dubovitskii-Milyutin Optimality Criterion: if x_0 ∈ dom (f) ((4)) is a solution of the program (X,f), then ∃ y_i ∈ K_i°, not all θ, such that y_0 + y_1 + ... + y_n = θ; that is, the abstract Euler equation must hold.

Proof. By a) we must prove that (1) holds. Suppose ∃ x ∈ ∩K_i. Since the intersection of finitely many x-nbhds. is again such a nbhd., ∃ x-nbhd. V and ε > 0 such that f(x_0 + tz) < f(x_0) and x_0 + tz ∈ Ω_i whenever 0 < t < ε and z ∈ V. But x ∈ K_0 also. Hence x_0 + t(x + r(t)/t) ∈ A for sufficiently small t > 0. By definition of r(·), x + r(t)/t ∈ V for small t. This shows that ∃ t > 0 and z ∈ V such that x_0 + tz ∈ A ∩ Ω_1 ∩ ... ∩ Ω_{n-1} = dom (f) but f(x_0 + tz) < f(x_0), and so x_0 is not a solution after all, qed.

§17. An Application

As one illustration of the Dubovitskii-Milyutin procedure we con-

sider here the so-called "simplest problem in the calculus of

variations". In particular, we will see that the abstract Euler

equation of 16a) leads in this case to the classical Euler differen-

tial equation. The problem to be solved is essentially that of

minimizing a functional defined over a class of smooth curves joining

two fixed points in the plane R 2. Among such programs are included

the shortest distance problem, the brachistochrone problem (the shape

of a wire along which a ring descends in least time subject to

gravity), and the profile of a minimal surface of revolution. The

solutions of these three problems are respectively straight lines,

cycloids, and catenaries.

a) Let F: R³ → R¹ be continuous with continuous partial derivatives in its second and third arguments. Consider the functional

x ↦ ∫₀¹ F(t,x(t),x'(t)) dt,

defined for all x ∈ C¹_R([0,1]). We seek to minimize this functional over the set of all such x which satisfy x(0) = α, x(1) = β, for given fixed α and β.

To recast this problem in a more convenient form, let X = C_R([0,1]) × C_R([0,1]), and define f: X → R¹ by

f(x,y) ≡ f((x,y)) = ∫₀¹ F(t,x(t),y(t)) dt.

Define the constraint set A ⊆ X by

A = {(x,y): x(t) = α + ∫₀ᵗ y(s) ds, x(1) = β}.

Thus our variational problem becomes (X, f + δ_A). We now assume that this problem has a minimum at (x_0,y_0) ∈ A.

b) The objective function f is certainly not convex in general, but it is smooth on X. Indeed, we have the formula

(1) ∇f(x_0,y_0)·(x,y) = ∫₀¹ (F₂x + F₃y) dt,

where the subscripts indicate partial derivatives with respect to the second and third variables, and these derivatives are each evaluated at (t,x_0(t),y_0(t)). By 16c),

C((x_0,y_0),f) = {(x,y): ∇f(x_0,y_0)·(x,y) < 0}.

The polar of this cone is simply the ray {t∇f(x_0,y_0): t ≥ 0}.

c) Since A is a flat in X, 16e) implies that C((x_0,y_0),A) is the parallel subspace:

C((x_0,y_0),A) = {(x,y): x(t) = ∫₀ᵗ y(s) ds, x(1) = 0}.

The polar of this subspace is the annihilator subspace which consists of those φ ∈ X* having the form

(2) φ(x,y) = cx(1) + ∫₀¹ (x(t) - ∫₀ᵗ y(s) ds) dμ(t),

for some c ∈ R¹ and μ ∈ rca ([0,1]).

Exercise 29. Prove this last assertion.

d) We can now write down the abstract Euler equation which must be satisfied if (x_0,y_0) is to be a solution. There must exist c ∈ R¹, μ ∈ rca ([0,1]) and τ ≥ 0 such that

(3) -τ∇f(x_0,y_0) + φ = θ,

where φ is defined by (2). These (linear) functionals cannot both vanish, so τ > 0. Suppose we apply both sides of (3) to elements of the form (x,y) where x(t) ≡ ∫₀ᵗ y(s) ds. We obtain, using (1) and (2):

(4) -τ ∫₀¹ (F₂ ∫₀ᵗ y(s) ds + F₃y(t)) dt + c ∫₀¹ y(t) dt = 0,

for any y ∈ C_R([0,1]). If we integrate by parts the first integral in (4), we arrive at the equation

(5) ∫₀¹ (τ(∫ₜ¹ F₂ ds + F₃) - c) y(t) dt = 0.

Since (5) holds for every y ∈ C_R([0,1]), we actually must have

(6) τF₃ + τ ∫ₜ¹ F₂ ds - c = 0.

Finally, if F₃ happens to be differentiable, we obtain from (6) the classical Euler differential equation which x_0 and y_0 (= x_0') must satisfy:

(d/dt) F₃(t,x_0(t),y_0(t)) = F₂(t,x_0(t),y_0(t)).
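To illustrate, the following symbolic sketch (not part of the original text) checks the Euler equation for the minimal-surface integrand F(t,x,y) = x√(1+y²), whose extremals are the catenaries mentioned at the start of this section.

    # Check that x0(t) = cosh(t) satisfies (d/dt)F3 = F2 for F = x*sqrt(1+y^2).
    import sympy as sp

    t = sp.symbols('t', real=True)
    x, y = sp.symbols('x y')
    F = x*sp.sqrt(1 + y**2)

    x0 = sp.cosh(t)                  # candidate extremal
    y0 = sp.diff(x0, t)              # y0 = x0'

    F2 = sp.diff(F, x).subs({x: x0, y: y0})
    F3 = sp.diff(F, y).subs({x: x0, y: y0})
    print(sp.simplify(sp.diff(F3, t) - F2))   # prints 0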

§18. Conjugate Functions and Subdifferentials

We consider next a few relationships which depend on both the

conjugate operation and the subdifferential mapping. The first re-

sult does not depend on convexity and follows immediately from the

definitions.

a) Lemma. If X is a real lcs and f: X → (-∞,+∞] then y_0 ∈ ∂f(x_0) ⟺ f(x_0) + f*(y_0) = ⟨x_0,y_0⟩ (equality in Young's inequality).

b) Lemma. Let f, X be as in a). If f is subdifferentiable at x_0 ∈ X then f(x_0) = f**(x_0).

Proof. Choose any φ ∈ ∂f(x_0) and define a continuous affine function h by

h(x) = φ(x-x_0) + f(x_0).

We have h ≤ f pointwise on X and h(x_0) = f(x_0). Hence h ≤ f** ≤ f (recall Exercise 24), qed.

Corollary. y_0 ∈ ∂f(x_0) ⟹ x_0 ∈ ∂f*(y_0).

c) It follows from a) and b) that if f is subdifferentiable at x_0 ∈ X, then so is the lsc convex function f**, and furthermore

∂f(x_0) = ∂f**(x_0).

This provides some of the interest in the functions in Γ(X), since we may assume that f = f** ∈ Γ(X) for the purpose of finding subgradients. If f is already in Conv (X), then f** differs from f (if at all) only at certain relative boundary points of dom (f) where its values may be strictly smaller than the corresponding values of f. In fact, in this case, we have the formula

f**(x_0) = lim inf_{x→x_0} f(x),

valid ∀x_0 ∈ X, if f has a point of continuity.

d) Consider now an abstract convex program (X,f) (11a)). We already have studied the solvability condition θ ∈ ∂f(x_0), necessary and sufficient for x_0 ∈ X to be a solution. Now by definition f*(θ) = -inf f(X), so that f is bounded below on X exactly when θ ∈ dom (f*). Next we observe from a) that

∂f*(θ) = {x ∈ X: 0 = f*(θ) + f**(x)}
       = {x ∈ X: f**(x) = inf f(X)}.

Thus for f ∈ Γ(X), the set of solutions to the program (X,f) is just ∂f*(θ); in particular, the existence of a solution is equivalent to the subdifferentiability of f* at θ. Recalling Exercise 18, we see additionally that the existence of a unique solution is equivalent to the existence of the gradient ∇f*(θ) in X, this vector being then the unique solution to the program.

Especially when X is finite dimensional a much more detailed study of the minimum of a convex function is possible - see [70, §27].

e) Theorem. Let X be a real lcs, and f ∈ Conv (X). For any x_0 ∈ X,

f'(x_0;·)* = δ_{∂f(x_0)}.

Proof. For fixed t > 0 let

F_t(x) = (f(x_0 + tx) - f(x_0))/t.

Then

F_t*(y) = (f(x_0) + f*(y) - ⟨x_0,y⟩)/t,

by 14b), and F_t* ≥ 0. Consequently,

f'(x_0;·)* = (inf_t F_t)*
           = sup_t F_t*
           = sup_t (f(x_0) + f*(y) - ⟨x_0,y⟩)/t
           = δ_{∂f(x_0)},

where we have used 7a), 14b), and a) above.

Corollary. c-o (f'(x_0;·)) = f'(x_0;·)** = δ_{∂f(x_0)}*, the support function of ∂f(x_0) (14c)).

Suppose now that f is also continuous at x_0. Then according to Exercise 17, f'(x_0;·) is then continuous on X, and so

f'(x_0;·) = δ_{∂f(x_0)}*;

in this way we obtain a new proof via conjugate functions of the Moreau-Pshenichnii Theorem 10b).

§19. Distance Functions

a) To further illustrate the use of conjugate functions and the formulas of Sections 14 and 15, we study a very important example of a convex function - the distance to a convex set. Let K be a subset of a nls X. The function

x ↦ d(x,K) ≡ dist (x,K)

can be represented by 14h) as the convolution

d(·,K) = ||·|| ⊕ δ_K.

Hence if K is convex then so is d(·,K); it is easy to see that the converse is also valid. Of course, for any non-empty K ⊆ X, d(·,K) is (Lipschitz) continuous on X.

b) Theorem. (Duality Formula for Distance) Let θ ∈ K, a convex subset of a nls X. Then

(1) d(x,K) = max {⟨x,y⟩ - p_{K°}(y): y ∈ U(X*)}.

Proof. Applying successively 14h) and 15a) we find

d(·,K)* = ||·||* + δ_K* = δ_{U(X*)} + p_{K°}.

Then since certainly d(·,K) ∈ Γ(X), d(·,K) = d(·,K)**, whence

d(x,K) = sup {⟨x,y⟩ - δ_{U(X*)}(y) - p_{K°}(y): y ∈ X*}
       = sup {⟨x,y⟩ - p_{K°}(y): y ∈ U(X*)}.

The "sup" here is actually a "max" since U(X*) is w*-compact, and the function y ↦ ⟨x,y⟩ + (-p_{K°}(y)) is the sum of a w*-continuous and a w*-usc function, hence is w*-usc.

Note that if θ ∉ K we still obtain a duality formula, if p_{K°} is replaced by the support function of K. This also shows that if d(x_0,K) is attained (i.e., if ∃ z_0 ∈ K such that ||x_0 - z_0|| = d(x_0,K)), then since the "max" in (1) is attained only at points in S(X*), we have ⟨x_0 - z_0,y_0⟩ = ||x_0 - z_0||, for any y_0 where the "max" is attained. Hence any such y_0 is a subgradient of the norm at x_0 - z_0.

Corollary. If M is a linear subspace of X then

(2) d(x,M) = max {⟨x,y⟩: y ∈ U(M⊥)}.

c) Remark. In the preceding corollary the "max" may be replaced by a "sup" over ext U(M⊥) (5d)). This extreme point set is frequently much smaller than the entire unit ball, so that such a replacement may be a considerable simplification. This theme is developed at some length in [8].

Exercise 30. (Buck, Golomb) Let M be the subspace of separable functions in C_R([0,1]×[0,1]); that is, M = {z: z(s,t) = x(s) + y(t), x,y ∈ C_R([0,1])}. Let x_0(s,t) = st. Compute d(x_0,M).

d) Lemma. For 1 < p < +∞, let f = (1/p) d(·,K)^p, where K is a convex subset of a nls. Then f* = (1/q)||·||^q + σ_K, where σ_K is the support function of K and 1/p + 1/q = 1.

The proof follows from 14c) and 14h).

e) We study now the smoothness of the distance function when K is a convex subset of Hilbert space. To do so we must anticipate a result and some terminology from Part III. The result needed is a characterization of best approximation from convex sets in Hilbert space. Although not proven until §22, it is a consequence of the optimality criterion in 11d). We let P_K be the metric projection (see §32) onto the closed convex set K.

Theorem. Let K be a closed convex subset of a Hilbert space X, and let f = ½d(·,K)². Then f is a smooth convex function on X and

(3) ∇f = I - P_K.

Proof. Applying the lemma in d) (p = 2), and Young's

inequality (14b) we obtain

(4) f(x) >_ < x , y > - } [ [ y ] l 2 - oK(y),

for every x,y ~ X. (We a r e u s i n g the self-duality of Hilbert space

and < .,. > is the inner product.) Fix z ~ X and l e t

y = Z-PK(Z ) . Using the characterization of best approximations from

K (22d)), namely

o !<z-PK(z),P~(z)-Y>, y ~ K,

we have

°KEY) = ~up < K,y > =


sup < K,z-pK(z) > = < P~Cz),z-PK(Z) 3 ,

whence by ( 4 ) ,

f(x) >_ < X - P K ( Z ) , Z - P K ( z ) > - f ( z ) .

Therefore,

f(x)-f(z) >_ <~-p~Cz),z-pK(z) ~


(s)

-[[Z-pK(z)ll 2 = < x - z , z-pK(z) > .

Thus

o ! f(x)-f(z)- (~-z, ~-v~(~)>

< ix-z, z-PKcx)) - < x z, z-PKcz)3


= <x-z, (X-PK(X))-(Z-PK(Z)) >

llx-zll 11 <~-PK)<x-z)l I
211x-zll 2,
65

qed. (The second inequality here results from (5) by interchanging

x and z; the final inequality depends on the fact that PK is a

contraction (32a)) .

Remarks. 1) The formula (3) shows that f is actually continuously Frechet differentiable on X. It is interesting that this conclusion does not depend on any smoothness properties of the boundary of K.

2) The function f is not generally twice differentiable; there are trivial counterexamples. Thus higher order differentiability of f does depend on the boundary smoothness of K.

3) Of course the theorem implies that the distance function itself is smooth; indeed, its gradient vector at x ∈ X\K is (x - P_K(x))/||x - P_K(x)||.

4) Smoothness of d(·,K) on the open set X\K can be established in certain Banach spaces X. It is required that the norm in X be smooth and that P_K be (single valued and) continuous [Holmes-Kripke; unpublished].

Exercise 31. Verify the assertions in Remarks 2) and 3).
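A quick finite-difference sketch of the theorem in e) (not part of the original text), taking K to be the closed unit ball of R², for which P_K(x) = x/max(1,||x||):

    import numpy as np

    P = lambda x: x / max(1.0, np.linalg.norm(x))              # metric projection onto K
    f = lambda x: 0.5 * max(np.linalg.norm(x) - 1.0, 0.0)**2   # (1/2) d(x,K)^2

    x = np.array([2.0, -1.0])
    h = 1e-6
    num_grad = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(2)])
    print(num_grad)     # approximately (1.1056, -0.5528)
    print(x - P(x))     # the same vector, as (3) predicts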

§20. The Fenchel Duality Theorem

This theorem associates with a given convex program a "dual"

program wherein a concave function is to be maximized. Before

making precise the form of the dual programs we take time to rethink

the geometric interpretation of conjugate functions.

a) Let X be a real lcs and f ∈ Conv (X). As was observed in 8b), linear functionals on X × R¹ can be identified with pairs (φ,s) ∈ X* × R¹. The corresponding hyperplanes are called non-vertical provided s ≠ 0; if so, they each intersect the R¹-axis at exactly one point. Since we can assume that s = -1, this point of intersection is (θ,-λ) if the hyperplane has the form H_λ ≡ {(x,t) ∈ X × R¹: ⟨x,y⟩ - t = λ}, for some y ∈ X*.

Now suppose λ > f*(y). Then ∀x ∈ X, λ > ⟨x,y⟩ - f(x), or f(x) > ⟨x,y⟩ - λ; in other words the hyperplane H_λ lies strictly "below" epi (f). Similarly, if λ < f*(y) then H_λ intersects epi (f) strictly "above" some point (x,f(x)). Thus when λ = f*(y), H_λ is trying to be a supporting hyperplane to epi (f), although the two sets will intersect only if the "sup" that defines f*(y) is actually attained, which happens exactly when y is in the range of ∂f. In any event the "vertical height" of this hyperplane above the origin (θ,0) is -f*(y).

b) Next suppose that -g ∈ Conv (X). In this case we will say that g is a proper concave function on X and write g ∈ Conc (X). The theory of such functions is of course a mirror image of the previously developed theory of proper convex functions. Thus we define

dom (g) = {x ∈ X: -∞ < g(x)},
epi (g) = {(x,t) ∈ X × R¹: t ≤ g(x)}.

If we consider again the hyperplanes H_λ in X × R¹, and define an (extended) real number γ by requiring that for -λ > γ, H_λ lies "above" epi (g), while for -λ < γ, H_λ intersects "below" some x, then evidently γ = -inf {⟨x,y⟩ - g(x): x ∈ dom (g)}.

Definition. If g: X → [-∞,+∞), the (concave) conjugate of g is the function g⁺: X* → [-∞,+∞) defined by

g⁺(y) = inf {⟨x,y⟩ - g(x): x ∈ dom (g)}.

(Again we assume dom (g) ≠ ∅.)

Analogously to 14d) we have that g⁺ is a w*-usc concave function on X*. If h is any real-valued function on X, then

h⁺(y) = -(-h)*(-y).

Hence even when h ∈ Conv (X) ∩ Conc (X), i.e., when h is affine, h⁺ ≠ h*.

With the definition of g⁺ (g ∈ Conc (X)) we see that H_λ is "tangent" to epi (g) (that is, neither intersecting epi (g) strictly "below" some point (x,g(x)) nor lying strictly "above" epi (g)) exactly when λ = g⁺(y), and then the "vertical height" of this hyperplane over the origin is -g⁺(y).

c) We now consider a convex program of the form (X, f-g), where f, -g ∈ Conv (X). Such programs are not as special as they might at first appear, and we will discuss several examples shortly. For now, note that ∀x ∈ X, ∀y ∈ X*,

f(x) + f*(y) ≥ ⟨x,y⟩ ≥ g(x) + g⁺(y),

so

f(x) - g(x) ≥ g⁺(y) - f*(y),

that is,

(1) inf (f-g)(X) ≥ sup (g⁺-f*)(X*).

It is helpful to view (1) geometrically by considering epigraphs and hyperplanes in X × R¹. The inequality asserts that the value of the program (X, f-g) (the left hand side of (1), which can be thought of as the minimal vertical distance between epi (f) and epi (g)) is at least as large as the value of the concave program (X*, g⁺-f*) (the right hand side of (1), which, by the analysis of a) and b), can be interpreted as the maximum vertical separation of two parallel hyperplanes tangent to the two epigraphs).

Theorem. (Fenchel, Rockafellar) Let f, -g ∈ Conv (X) and assume that one of them is continuous at some point in dom (f) ∩ dom (g). Then

(2) inf (f-g)(X) = max (g⁺-f*)(X*).

Proof. Let f be continuous at x_0 ∈ dom (f) ∩ dom (g). Then x_0 ∈ int (dom (f)) and +∞ > f(x_0) - g(x_0) ≥ inf (f-g)(X) ≡ α. The theorem is clearly true if α = -∞, by (1). So we may assume α is finite. Introducing the sets

A = {(x,t) ∈ X × R¹: x ∈ int (dom (f)), t > f(x)},
B = {(x,t) ∈ X × R¹: t ≤ g(x) + α},

we note that they are convex and disjoint, and A is open. Hence ∃ a hyperplane H_λ separating A and B. H_λ cannot be vertical, for otherwise its projection onto X would separate the projections of A and B, viz. dom (f) and dom (g), and this would contradict the existence of x_0. With H_λ having the form {(x,t) ∈ X × R¹: ⟨x,y⟩ - t = λ} as in a), we can assume that t > f(x) ⟹ ⟨x,y⟩ - t ≤ λ. Thus

(3) ⟨x,y⟩ - λ ≤ f(x)

is valid throughout int (dom (f)), and it clearly holds outside dom (f). But (3) is also valid if x is a boundary point of dom (f), since then by 3c), tx_0 + (1-t)x ∈ int (dom (f)) for 0 < t ≤ 1, so (3) implies

⟨tx_0 + (1-t)x, y⟩ - λ ≤ f(tx_0 + (1-t)x) ≤ tf(x_0) + (1-t)f(x),

and we may let t → 0+. Consequently, f*(y) ≤ λ, and in a similar way we see that

⟨x,y⟩ - λ ≥ g(x) + α, ∀x ∈ X,

or α + λ ≤ g⁺(y). Therefore,

α ≤ g⁺(y) - λ ≤ g⁺(y) - f*(y) ≤ sup (g⁺-f*)(X*) ≤ inf (f-g)(X) = α,

qed.

d) We now want to consider when the "inf" in (2) is actually attained.

Definition. Let g ∈ Conc (X) and x_0 ∈ X. Any y ∈ X* for which

⟨x - x_0, y⟩ ≥ g(x) - g(x_0), ∀x ∈ X,

is called a supergradient of g at x_0.

The set of all supergradients of g at x_0 is written ∂g(x_0), and we have y ∈ ∂g(x_0) if and only if g(x_0) + g⁺(y) = ⟨x_0,y⟩.

Corollary. Assume that f and -g in Conv (X) satisfy equation (2). Then f - g attains its infimum over X at x_0 if and only if ∂f(x_0) ∩ ∂g(x_0) ≠ ∅. Points in this intersection are then exactly the points where g⁺ - f* attains its supremum over X*.

Proof. y_0 ∈ ∂f(x_0) ∩ ∂g(x_0) ⟺ f(x_0) + f*(y_0) ≤ ⟨x_0,y_0⟩ ≤ g(x_0) + g⁺(y_0) ⟺ f(x_0) - g(x_0) ≤ g⁺(y_0) - f*(y_0). Now use equation (2).
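A small numerical sketch of (2) (not part of the original text), for f(x) = e^x and g(x) = -x² on X = R¹, where g⁺(y) = inf_x (xy + x²) = -y²/4 and f* is as in 14c-Example 4):

    import numpy as np

    xs = np.linspace(-10.0, 10.0, 400001)
    primal = np.min(np.exp(xs) + xs**2)          # inf (f - g)

    ys = np.linspace(1e-9, 10.0, 400001)         # f*(y) = +inf for y < 0
    fstar = ys*(np.log(ys) - 1.0)
    gplus = -ys**2 / 4.0
    dual = np.max(gplus - fstar)

    print(primal, dual)                          # both approximately 0.827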



§21. Some Applications

a) As a first application of the Duality Theorem we establish

the promised improvement of the theorem in 14i).

Theorem. Let f_1,...,f_n ∈ Conv (X) where X is a real lcs. Suppose that f_2,...,f_n are continuous at a point in dom (f_1) ∩ int (dom (f_2)) ∩ ... ∩ int (dom (f_n)). Then

(∑f_i)* = ⊕f_i*.

Proof. An obvious induction will establish the general case after we attend to the case n = 2. In 20c) put f ≡ f_2 and g ≡ ⟨·,y_0⟩ - f_1, for any fixed y_0 ∈ X*. Then g⁺(y) = -f_1*(y_0 - y), and so the Duality Theorem implies

-(f_1 + f_2)*(y_0) = inf {f_1(x) + f_2(x) - ⟨x,y_0⟩: x ∈ X}
                  = max (g⁺-f*)(X*)
                  = max {-f_1*(y_0 - y) - f_2*(y): y ∈ X*}
                  = -(f_1* ⊕ f_2*)(y_0),

qed.

b) The standard convex program involving the minimization of f ∈ Conv (X) over a constraint set K can be cast in the form (X, f-g), to which the Duality Theorem is applicable, by setting g ≡ -δ_K. The original program can then be replaced by the dual concave (maximization) program (X*, g⁺-f*), provided the hypothesis of 20c) is satisfied. Of course, the interest in doing so depends on the ease with which f* and g⁺ can be calculated and the simplicity of the resultant dual program.



c) Example. Let K be a closed convex cone at θ in a real lcs X, and f ∈ Γ(X) with f continuous at some point in K. Then

inf f(K) = -min f*(-K°).

Notice that if K is a linear subspace of X, this formula reduces to

inf f(K) = -min f*(K⊥),

and if also f is the function ||x-z||, we obtain the duality formula (2) in 19b).

Exercise 32. Verify these assertions.

d) Example. Let X = Rⁿ, c ∈ X, and let A be a (real) m × n matrix. Consider the standard linear program

max {⟨x,c⟩: A·x = b, x ≥ θ}

for some given b ∈ Rᵐ. We view this as a concave program and use the Duality Theorem to construct the dual convex program. Let g_i(t) = c_it - δ_{P_1}(t), where P_1 ≡ {t ∈ R¹: t ≥ 0}. Then

g_i⁺(s) = inf {t(s - c_i): t ≥ 0}
        = 0 if s ≥ c_i; -∞ if s < c_i.

Hence for g(x) = ⟨x,c⟩ - δ_{P_n}(x), where P_n = {x ∈ Rⁿ: x ≥ θ}, we have

g⁺(x) = ∑_{i=1}^n g_i⁺(x_i)
      = 0 if x_i ≥ c_i ∀i; -∞ otherwise.

Now let K ≡ {x: A·x = b} and f ≡ δ_K. Then f* = σ_K, and

dom (f*) = nullspace (A)⊥ = range (A*) = row space of A.

So f*(y) < +∞ if and only if y is a linear combination of the rows A_i of A: y = z_1A_1 + ... + z_mA_m. In this case the functional ⟨·,y⟩ is constant on K, and this constant value is ∑z_i⟨x,A_i⟩ = ∑z_ib_i (x ∈ K). We now see that the dual program has the form "minimize f* - g⁺", or

min {⟨z,b⟩: A*·z ≥ c}.

Since the z-variable in Rᵐ enters so naturally, the dual program is always considered to be defined on Rᵐ rather than (Rⁿ)* = Rⁿ. (If the original constraint had been of the form Ax ≤ b, x ≥ θ, then the dual constraints would have turned out as A*·z ≥ c, z ≥ θ.)

Suppose that a solution z_0 has been obtained to the dual program. We then put y_0 = A*·z_0 and obtain (in principle) a solution x_0 to the original linear program from the requirements x_0 ∈ ∂f*(y_0) ∩ ∂g⁺(y_0), deduced in 20d). In practice, since the computational difficulty involved in solving a linear program depends more on the number of constraints (not counting non-negativity constraints) than on the number of variables, it tends to be more efficient to directly solve the dual program whenever m > n.



e) Example. Let X be a real Banach space, A ∈ L(X,Rⁿ), c ∈ Rⁿ. The problem of finding an element of minimal norm in the flat A⁻¹(c) will be called an abstract minimum effort control problem. This problem is discussed at length in the book [60], and is considered for illustrative purposes in the book [42]; cf. also the following example f).

Suppose that X is a Hilbert space. Then there is a unique solution and the subdifferential theory locates it as the point of intersection of A⁻¹(c) and (A⁻¹(θ))⊥. To proceed via the Duality Theorem, let f = ½||·||², K = A⁻¹(c) and g = -δ_K. Then dom (g⁺) = (A⁻¹(θ))⊥ = range (A*), and so g⁺(y) = inf ⟨K,y⟩ > -∞ if and only if y = A*(e) for some e ∈ Rⁿ, and then g⁺(y) = ⟨c,e⟩. Thus the dual program "maximize g⁺ - f*" becomes the finite dimensional (unconstrained) problem

(1) max {⟨c,e⟩ - ½||A*(e)||²: e ∈ Rⁿ}.

e_0 ∈ Rⁿ is a solution if and only if the gradient of the function in (1) vanishes at e_0, and this condition requires

⟨c,v⟩ - ⟨A*(e_0),A*(v)⟩ = 0 ∀v ∈ Rⁿ, that is,

(2) AA*(e_0) = c.

(If, more generally, X is a reflexive and rotund Banach space (see §27), the condition on e_0 is that A(∇f*(A*(e_0))) = c.)

Having solved (2) for e_0 we obtain a solution y_0 of the dual problem by y_0 = A*(e_0). However, y_0 is also a solution of the original problem by 20d), since y_0 ∈ K so that g(y_0) = 0, and therefore

||y_0||² = ⟨A*(e_0),A*(e_0)⟩ = ⟨AA*(e_0),e_0⟩ = ⟨c,e_0⟩;

that is,

f(y_0) - g(y_0) = ½||y_0||² = ⟨c,e_0⟩ - ½||y_0||² = g⁺(y_0) - f*(y_0).

Remark. If A is written in the form

(3) A(x) = ∑_{i=1}^n ⟨x,u_i⟩ e_i,

then

A*(e) = ∑_{i=1}^n ⟨e,e_i⟩ u_i.

Hence AA* is the Gram matrix [⟨u_i,u_j⟩]. Assuming that range (A) = Rⁿ, the set {u_1,...,u_n} is linearly independent so that AA* is invertible and hence

y_0 = A*(AA*)⁻¹(c) = A†(c),

where A† is the pseudoinverse of A (see §35).
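A finite-dimensional numerical sketch of this remark (not part of the original text), with X = R⁵ standing in for the Hilbert space:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))       # A in L(X, R^3), surjective a.s.
    c = rng.standard_normal(3)

    e0 = np.linalg.solve(A @ A.T, c)      # the Gram system AA* e0 = c, i.e. (2)
    y0 = A.T @ e0                         # minimal-norm element of A^{-1}(c)

    print(np.allclose(A @ y0, c))                    # True: y0 lies in the flat
    print(np.allclose(y0, np.linalg.pinv(A) @ c))    # True: y0 = pseudoinverse solution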

f) Example (cont.) A linear dynamical system is governed by a set of ordinary differential equations

(4) x'(t) = Fx(t) + Bu(t),

where x: [0,T] → Rⁿ, u: [0,T] → Rᵐ, and the matrices F, B may be functions of t. We assume that x(0) = θ and try to choose a control u so as to transfer the state of the system to c ∈ Rⁿ at time T (i.e., x(T) = c) with minimum expenditure of energy. The latter is taken proportional to

∫₀ᵀ ||u(t)||² dt.

(Note that no magnitude constraints are being imposed on the control u.)

Let X = L²(dt)ᵐ, where dt denotes Lebesgue measure on [0,T]. We will define an operator A ∈ L(X,Rⁿ). Let Φ be the transition matrix of the system (4). Then define

A(u) = x(T) = Φ(T) ∫₀ᵀ Φ⁻¹(t)B(t)u(t) dt.

(Recall that if F is a matrix of constants, then Φ(t) = exp (Ft).) We have now put this dynamical problem in the form of the abstract model considered in the preceding example.

Let [w_ij(t)] ≡ W(t) ≡ Φ(T)Φ⁻¹(t)B(t). Then

A(u) = ∑_{i=1}^n (∑_{j=1}^m ∫₀ᵀ w_ij(t)u_j(t) dt) e_i = ∑_{i=1}^n ⟨u,W_i⟩ e_i,

where ⟨·,·⟩ is the inner product in X and W_i is the ith row of W. Hence we see that the matrix

AA* = ∫₀ᵀ W(t)W*(t) dt.
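A concrete numerical instance (not part of the original text), assuming the double-integrator system x_1' = x_2, x_2' = u, for which W(t) = Φ(T-t)B = (T-t, 1)ᵀ and the Gram matrix integrates in closed form:

    import numpy as np

    T = 1.0
    c = np.array([1.0, 0.0])                  # desired state x(T)

    G = np.array([[T**3/3, T**2/2],           # AA* = integral of W(t) W(t)^T dt
                  [T**2/2, T     ]])
    e0 = np.linalg.solve(G, c)                # equation (2): AA* e0 = c
    u0 = lambda t: (T - t)*e0[0] + e0[1]      # minimum-energy control u0 = A*(e0)

    # Verify x(T) = c by a Riemann sum for the integral of W(t) u0(t) dt:
    ts = np.linspace(0.0, T, 200001)
    dt = ts[1] - ts[0]
    W = np.vstack([T - ts, np.ones_like(ts)])
    print((W * u0(ts)).sum(axis=1) * dt)      # approximately (1, 0)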

g) Remark. Further discussion and examples of the Fenchel

Duality Theorem occur in [42] and [70].


Part III

Theory of Best Approximation

We turn now to a consideration of a special class of convex

programs, namely those in which the objective function is the distance function defined by a convex subset of a nls - see §19. A

fair amount of the material in this part can also be found in the

books of Cheney [9] and Singer [72] (although most of that treatment

considers only approximation from linear subspaces); the latter work

in particular contains a great deal of additional information on

approximation theory, all developed within the framework of functional

analysis.

§22. Characterization of Best Approximations

a) Definition. Let K be a convex subset of a nls X and x ∈ X \ cl (K). x_0 is a best approximation (b.a.) to x from K if it is a solution of the convex program (X, f + δ_K), where f(z) ≡ ||x-z||.

Thus x_0 is simply an element of K of least distance to x: ||x-x_0|| = d(x,K) (≡ the value of the above program). The next

theorem is the main result characterizing best approximations; a

sharper version will be given later (23f)) for the finite dimensional

case.

b) Theorem (Garkavi; Deutsch-Maserick). x_0 is a b.a. to x from K if and only if ∃ φ ∈ S(X*) such that φ(x-x_0) = ||x-x_0|| and re φ(x_0) = max re φ(K).

Proof. If such a φ exists and y ∈ K then

||x-x_0|| = φ(x-x_0) = re φ(x-x_0)
         = re φ(x) - re φ(x_0) ≤ re φ(x) - re φ(y)
         = re φ(x-y) ≤ |φ(x-y)| ≤ ||x-y||.

Conversely, if x_0 is a b.a. to x from K, and we put f(z) ≡ ||x-z||, then by 11d) ∃ ψ ∈ ∂f(x_0) such that ψ(x_0) = min ψ(K). (If X is complex, apply the following argument to X_r and extend the resultant φ from X_r* to X* as usual.) In particular, for any y ∈ X,

(1) ψ(y) = ψ(x_0+y) - ψ(x_0) ≤ ||x-(x_0+y)|| - ||x-x_0|| ≤ ||y||,

so ||ψ|| ≤ 1. Similarly, ψ(x-x_0) ≤ -||x-x_0||. Let y = x_0-x in (1) to get

ψ(x_0-x) ≤ ||x-(2x_0-x)|| - ||x-x_0|| ≤ ||x-x_0||,

so that ψ(x_0-x) = ||x-x_0||, and we may take φ = -ψ, qed.

Geometrically this theorem says that x_0 is a b.a. to x if and only if there is a (real, closed) hyperplane H supporting K at x_0, separating K from x, and such that d(x,H) = d(x,K). Also, note that if K is a linear subspace, then φ must belong to K⊥.

c) Example. Let X = L^p(μ), 1 < p < ∞, and let K be a convex subset of X. Then x_0 is a b.a. to x from K if and only if

re ∫ (x_0-z) w̄ |w|^{p-2} dμ ≥ 0, where w ≡ x - x_0,

for every z ∈ K. When K is a linear subspace of X, the criterion is simply that

w̄ |w|^{p-2} ∈ K⊥ ⊆ L^q(μ).

d) Example. Let X be an inner product space, and K a convex subset of X. Then x_0 is a b.a. to x from K if and only if

re ⟨x_0-x, z-x_0⟩ ≥ 0,

for every z ∈ K. This results directly from b), since the functional φ there is now given by y ↦ re ⟨y, (x-x_0)/||x-x_0||⟩. Of course when K is a linear subspace we recover the usual criterion that (x-x_0) ⊥ K.
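For instance (a numerical sketch, not part of the original text), projecting onto the halfplane K = {z ∈ R²: z_2 ≤ 0} gives x_0 = (x_1, min(x_2,0)), and the inequality above can be sampled directly:

    import numpy as np

    x = np.array([0.7, 1.3])
    x0 = np.array([x[0], min(x[1], 0.0)])     # best approximation to x from K

    rng = np.random.default_rng(2)
    zs = rng.standard_normal((1000, 2))
    zs[:, 1] = -np.abs(zs[:, 1])              # sample points of K
    print(np.all((zs - x0) @ (x0 - x) >= -1e-12))   # True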

e) Example. Let X = C_R(Ω). We say that t ∈ Ω is a peak point of x ∈ X (written t ∈ P(x)) if |x(t)| = ||x||. We say that μ ∈ X* has the same sign as x if

∫_E x dμ ≥ 0,

for all Borel subsets E of Ω. The positive, negative, and total variations of μ are denoted by μ⁺, μ⁻, and |μ|, respectively.

Theorem. x_0 is a b.a. to x from a convex subset K of X if and only if there is a non-zero μ ∈ X* such that

∫ (x_0-z) dμ ≥ 0, z ∈ K;
μ has the same sign as x-x_0;
support (μ) ⊆ P(x-x_0).

The proof is an easy consequence of b) and the next lemma.

Lemma. Let x ∈ C_R(Ω), μ ∈ rca (Ω).

1) μ has the same sign as x if and only if

(2) ∫ x dμ = ∫ |x| d|μ|.

2) support (μ) ⊆ P(x) if and only if

(3) ∫ |x| d|μ| = ||x|| ||μ||.
Proof. 1) If (2) holds and μ does not have the same sign as x, then ∃ Borel set E ⊆ Ω such that

∫_E x dμ < 0.

Therefore,

∫ x dμ < ∫_{Ω\E} x dμ ≤ ∫ |x| d|μ|,

a contradiction. Conversely, assume that μ has the same sign as x. Let Ω = A ∪ B be a Hahn decomposition of Ω for μ. Then

∫ x dμ = ∫ x dμ⁺ - ∫ x dμ⁻
       = ∫_A x dμ⁺ - ∫_B x dμ⁻
       = ∫_A (x⁺-x⁻) dμ⁺ - ∫_B (x⁺-x⁻) dμ⁻.

(Here x⁺ is the positive part of x, etc.) Now

∫_A x⁻ dμ⁺ = 0 = ∫_B x⁺ dμ⁻

(check!), and so
(check!), and so
~:~ 0
/
c+ C~

fO

0
II

O
b~ D ~

A IA II -- ,g. II
II ~" , II II
IA ~ ~l~ ~ -
-- f
x --¢ -~
x
O ~ + II
+
03 ~
Z" v + i
f +
/ -- O h +
+ / +
n t1~~¸, + +
"~ N m
0 + +
,+ +
-2 ~ Im
I[ el - - ~:a
O II i
x C v ~ 4-
n2~ t~ + "E
- - N l
II
- - I
~...FI
C / --
b< i

r+ /

o .o
l_.a In.

/ o ~ N
r+ ~-'

t~ V
t:r
81

= IIxll I.IC ),

a contradiction, qed.

Exercise 33. Show that

(4) ∫ x dμ = ||x|| ||μ||

if and only if μ has the same sign as x and support (μ) ⊆ P(x), if and only if support (μ⁺) (resp. support (μ⁻)) is contained in {t ∈ Ω: x(t) = ||x||} (resp. {t ∈ Ω: x(t) = -||x||}).

Exercise 34. Take Ω = [0,1] and let μ be absolutely continuous wrt Lebesgue measure. Then μ has the same sign as x ∈ C_R([0,1]) if and only if the Radon-Nikodym derivative dμ/dt has the same sign almost everywhere (as a function) as x.

§23. Extremal Representations

In this section we consider some applications of the extreme point concept of §5 to the representation of linear functionals and

the implications of this for finite dimensional best approximation.

This abstract theory is then illustrated with applications to spaces

of continuous functions.

a) Lemma. (Carathéodory) Let A ⊆ X, an n-dimensional ls. If x ∈ co (A), then x is a convex combination of at most n + 1 elements of A (resp., at most 2n + 1 elements of A if the scalars are complex).

The proof of this well-known result is omitted here; it may be found in [9, p. 17], [44, p. 43], or [70, p. 155]. A particular consequence is that co (A) is compact whenever A is, a fact which is generally not valid in infinite dimensional spaces.



b) Lemma. Let K be a compact convex subset of an n-dimensional ls X. Then each boundary (resp. interior) point of K is a convex combination of at most n (resp. n + 1) points of ext(K). (If the scalars are complex these numbers are to be replaced by 2n and 2n + 1, resp.) In particular, K = co(ext(K)).

Proof. It will suffice to assume real scalars. We proceed by induction on the dimension d of K, the case d = 1 being trivial. Assume the lemma true for d ≤ m−1 and let d = m; we may also assume that θ ∈ K. If M ≡ span(K), then K has non-empty interior wrt M, namely rel-int(K), and it is convex by 3c). Let x be a relative boundary point of K; then by the Support Theorem (3f)) there is a hyperplane H in M supporting K at x. The set H ∩ K is compact, convex, K-extremal, and of dimension at most m − 1. By the induction hypothesis x is a convex combination of at most (m−1) + 1 = m points in ext(H ∩ K). But ext(H ∩ K) ⊆ ext(K) by 5c-3). Finally, if x ∈ rel-int(K), choose any z ∈ ext(K) and extend the line segment [z,x] until it meets the relative boundary of K at some y. Then x ∈ co(y,z) and we can apply what has just been proven to y.

c) Lemma. (Singer) Let M be a linear subspace of a nls X, and φ ∈ ext(U(M*)). Then there exists an extension of φ to all of X which belongs to ext(U(X*)).

Proof. Exercise 35.

d) Theorem. (Interpolation Formula for Linear Functionals) Let M be an n-dimensional linear subspace of a nls X and let φ ∈ S(M*). Then ∃ {φ_1,...,φ_m} ⊆ ext(U(X*)) and λ_1,...,λ_m > 0, with λ_1 +···+ λ_m = 1, such that

(1) φ = Σ_{j=1}^m λ_j φ_j |_M.

Here m ≤ n (real scalars) or m ≤ 2n−1 (complex scalars).

Proof. By c) we may assume that M = X. In the real case the result follows directly from b) with K = U(X*). Consider now that the scalars are complex; applying b) would give us a representation of the form (1), but we could only be sure that m ≤ 2n. The remaining argument allows us to reduce this bound to 2n−1.

Let K = ker(φ); then dim(K) = n−1, so dim(K_r) = 2n−2. Choose x_o ∈ S(X) such that φ(x_o) = 1 (= ||φ||). Let Y = real span(x_o, K_r) and define ψ ∈ S(Y*) by ψ = re φ|_Y. Apply b) to get {ψ_1,...,ψ_m} ⊆ ext(U(Y*)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1, with m ≤ dim(Y) = 2n−1, so that

ψ = Σ_{j=1}^m λ_j ψ_j.

Apply c) to extend each ψ_j to φ_j ∈ ext(U(X_r*)), put

σ = Σ_{j=1}^m λ_j φ_j,

and define Φ ∈ X* as the usual extension of σ from X_r* to X*. Thus

Φ = Σ_{j=1}^m λ_j Φ_j,

where Φ_j(x) = φ_j(x) − i φ_j(ix) (i = √−1), and each Φ_j is in ext(U(X*)). Now we have that ker(Φ) = ker(φ) and ||Φ|| = ||φ||, hence Φ = αφ for some scalar α with |α| = 1. We claim that α = 1. Indeed, since |Φ(x_o)| ≤ 1 and re Φ(x_o) = ψ(x_o) = 1, we must have Φ(x_o) = 1 = φ(x_o); that is, α = 1, qed.



e) Corollary. (Zuhovickii, Ptak, Rivlin-Shapiro) Let M be an n-dimensional linear subspace of C(Ω) and φ ∈ M*. Then ∃ {t_1,...,t_m} ⊆ Ω and scalars λ_1,...,λ_m such that

φ(x) = Σ_{j=1}^m λ_j x(t_j)   ∀ x ∈ M,

||φ|| = Σ_{j=1}^m |λ_j|,

sgn(λ_j) = x_o(t_j), 1 ≤ j ≤ m,

for any x_o ∈ S(M) satisfying φ(x_o) = ||φ||. Here m ≤ n (real scalars) or m ≤ 2n−1 (complex scalars).

Proof. Exercise 36.

f) We now reconsider the characterization theorem 22b) under the additional hypothesis that the convex set K lies in an n-dimensional subspace of the nls X. In this case the separating functional φ of 22b) can be written as a convex combination of m extreme points of U(X*), where m ≤ n+1 (real scalars) or m ≤ 2n+1 (complex scalars). This follows from the Interpolation Formula d) applied to the subspace M ≡ span({x,K}) (x as in 22b)), which has dimension ≤ n+1.

If we write this representation of φ as

φ = Σ_{j=1}^m λ_j φ_j,

where λ_j > 0, λ_1 +···+ λ_m = 1, and φ_j ∈ ext(U(X*)), and if x_o ∈ K is a b.a. to x, then we have in addition

φ_j(x − x_o) = ||x − x_o||, ∀j.

For suppose that re φ_j(x − x_o) < ||x − x_o|| for some j ≤ m. Then by 22b),
||x − x_o|| = φ(x − x_o) = Σ_{j=1}^m λ_j φ_j(x − x_o)

= re Σ_{j=1}^m λ_j φ_j(x − x_o) = Σ_{j=1}^m λ_j re φ_j(x − x_o) < ||x − x_o||,

a contradiction. Consequently,

||x − x_o|| = re φ_j(x − x_o) ≤ |φ_j(x − x_o)| ≤ ||x − x_o||,

qed. Since we are tacitly assuming the φ_j to be distinct from one another, this entails their pairwise linear independence.

We sum up the preceding remarks for the following important

special case.

Theorem. Let K be an n-dimensional linear subspace of the nls X. Then x_o ∈ K is a b.a. to x ∈ X \ K if and only if ∃ {φ_1,...,φ_m} ⊆ ext(U(X*)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1 such that

φ_j(x − x_o) = ||x − x_o||, ∀ j,

Σ_{j=1}^m λ_j φ_j ∈ S(K^⊥).

Corollary. (Cheney, Ikebe, Singer) Let K ≡ span({x_1,...,x_n}) be an n-dimensional subspace of the nls X with scalar field F. Then x_o ∈ K is a b.a. to x ∈ X \ K if and only if the origin in F^n belongs to

co({(\overline{φ(x − x_o)} φ(x_1),...,\overline{φ(x − x_o)} φ(x_n)): φ ∈ ext(U(X*)), |φ(x − x_o)| = ||x − x_o||}).

Proof. Exercise 37.

g) Corollary. (Distance Formula) Let K be a convex subset contained in an n-dimensional subspace of a nls X. Assume that K contains a b.a. to x ∈ X \ K (certainly true if K is closed). Then there are m (as in f)) pairwise linearly independent functionals φ_j ∈ ext(U(X*)) such that

d(x,K) = min { max_{1≤j≤m} |φ_j(x − z)|: z ∈ K}.

Proof. Let x_o be the assumed b.a. to x and apply the results in f) to obtain {φ_1,...,φ_m} ⊆ ext(U(X*)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1 such that

φ_j(x − x_o) = ||x − x_o||,

re Σ_{j=1}^m λ_j φ_j(x_o − z) ≥ 0, z ∈ K.

Thus for z ∈ K,

re Σ_{j=1}^m λ_j φ_j(x − z) ≥ re Σ_{j=1}^m λ_j φ_j(x − x_o) = ||x − x_o||,

and therefore,

d(x,K) = ||x − x_o||

≤ inf {|Σ_{j=1}^m λ_j φ_j(x − z)|: z ∈ K}

≤ inf {Σ_{j=1}^m λ_j |φ_j(x − z)|: z ∈ K}

≤ inf { max_{1≤j≤m} |φ_j(x − z)|: z ∈ K}

≤ inf {||x − z||: z ∈ K} = d(x,K).

Setting z = x_o shows that the infimum is actually attained, qed.

h) Example. The usefulness of the foregoing results evidently hinges on our knowledge of the ext(U(X*)). Ideally, this set should be small relative to S(X*), and known in explicit form. The outstanding example of such a space X is C(Ω).

Let K = span({x_1,...,x_n}) be an n-dimensional subspace of C(Ω) and let t̂ denote the n-tuple (x_1(t),...,x_n(t)). For fixed x outside K, let

r = r(λ_1,...,λ_n) ≡ x − Σ_{j=1}^n λ_j x_j

be the error function in the approximation to x by Σλ_jx_j. Then a nas condition that ||r|| achieve a minimum at a particular set of values (λ̄_1,...,λ̄_n) (so that Σλ̄_jx_j is a b.a. to x) is that the origin in n-space belong to co({r(t) t̂: |r(t)| = ||r||}). This conclusion is an immediate consequence of f) and 15c).

i) Example. (Remez, Schnirelman, Zuhovickii) Let x and K be as in h), and m as in f). Then ∃ {t_1,...,t_m} ⊆ Ω such that

d(x,K) = min { max_{1≤j≤m} |x(t_j) − z(t_j)|: z ∈ K}.

This follows immediately from g). The implication is that there is a finite subset {t_1,...,t_m} ⊆ Ω such that the minimum distance from x to K is the same as the minimum distance when all functions are restricted to this subset. Further, among these restrictions at least one of the b.a.'s to x will be a b.a. to x wrt the entire set Ω. These observations underlie the construction of the practical algorithms used for computing best approximations in the spaces C(Ω). These algorithms reduce the original problem to a succession of approximation problems involving functions defined on (judiciously chosen) finite subsets of Ω, and hence to a succession of finite dimensional convex programs.
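When the norm is the uniform norm over a finite subset, each such finite dimensional convex program is an ordinary linear program. The following sketch (Python with scipy; the grid, the function x(t) = e^t, and the subspace span({1,t}) are our own test choices, not taken from the text) computes a discrete uniform b.a. in exactly this way, minimizing d subject to |x(t_i) − Σ_j c_j x_j(t_i)| ≤ d:

    import numpy as np
    from scipy.optimize import linprog

    ts = np.linspace(0.0, 1.0, 201)            # a finite subset of Omega
    x = np.exp(ts)
    Phi = np.vstack([np.ones_like(ts), ts]).T  # basis values, shape (201, 2)

    # variables v = (c_1, c_2, d); minimize d subject to |x - Phi c| <= d
    n = Phi.shape[1]
    cost = np.r_[np.zeros(n), 1.0]
    A_ub = np.block([[ Phi, -np.ones((len(ts), 1))],
                     [-Phi, -np.ones((len(ts), 1))]])
    b_ub = np.r_[x, -x]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n + [(0, None)])
    print("coefficients:", res.x[:n], " discrete distance:", res.x[n])

A Remez-type algorithm then adjusts the finite subset between such solves.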

§24. Application to Gaussian Quadrature

a) Let μ be a positive Borel measure on [−1,1] and f ∈ C_R([−1,1]). Our problem is to guess the value of ∫ f dμ, given m samples f(t_1),...,f(t_m) of f. Confining ourselves to linear estimates, we are led to a quadrature formula:

(1) ∫_{−1}^{1} f dμ ≈ Σ_{j=1}^m A_j f(t_j).

It is clear that the A_j can be chosen so that the approximation (1) is exact whenever f is a polynomial of degree ≤ m−1. Gauss proved that by proper choice of the nodes {t_1,...,t_m} ⊆ [−1,1], the formula (1) becomes exact for all polynomials of degree ≤ 2m−1.

b) Let P_n be the (n+1)-dimensional space of polynomials of degree ≤ n on [−1,1]. Let

φ(x) ≡ ∫_{−1}^{1} x dμ,

so that φ ∈ P_n* and φ(1) = ||φ|| = ||μ||. By 23e), ∃ {t_1,...,t_m} ⊆ [−1,1], where m ≤ n+1, and positive numbers λ_1,...,λ_m, such that ∀ x ∈ P_n,

(2) ∫_{−1}^{1} x dμ = Σ_{j=1}^m λ_j x(t_j),

(3) ||φ|| = Σ_{j=1}^m λ_j.

We are concerned with the size of m in this representation of φ.

Theorem. (Krein, Rivlin-Shapiro) There is a unique representation of φ in the form (2), (3) for which m is minimal. This minimal value of m is 1 + [n/2], assuming that support(μ) contains at least 1 + [n/2] points. Furthermore, when m has this value and n is odd, formula (2) is exactly the Gauss quadrature formula referred to in a).

c) Before proving this theorem let us recall some facts about the Gauss formula. Suppose we apply the Gram-Schmidt orthonormalization procedure to the monomials {1, t, t²,...} in L²(μ). We thereby obtain a complete orthonormal sequence of polynomials {Q_0, Q_1, Q_2,...} in L²(μ). Each root of Q_j is simple and lies in (−1,1).

Example. If μ is Lebesgue measure then

Q_n = √((2n+1)/2) · L_n,

where L_n is the n-th Legendre polynomial:

L_n(t) = (1/(2^n n!)) (d^n/dt^n)(t² − 1)^n.

Theorem. If the quadrature formula (1) is exact on P_m, then it is exact on P_{2m−1} if and only if the nodes {t_j} are the roots of Q_m.

The proof of this theorem can be found in [9, p. 110]. The resulting quadrature formula is the Gauss quadrature formula.
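As a numerical check of this theorem in the Lebesgue-measure example above (a sketch; the choice m = 5 and the tolerance are arbitrary), the m-point rule with nodes at the roots of Q_m integrates t^k exactly for every k ≤ 2m − 1:

    import numpy as np

    m = 5
    nodes, weights = np.polynomial.legendre.leggauss(m)  # roots of L_m and weights A_j

    for deg in range(2 * m):                             # degrees 0, ..., 2m-1
        quad = np.dot(weights, nodes ** deg)
        exact = 0.0 if deg % 2 else 2.0 / (deg + 1)      # integral of t^deg over [-1,1]
        assert abs(quad - exact) < 1e-12
    print("the", m, "point Gauss rule is exact on polynomials of degree", 2 * m - 1)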

d) We now prove the theorem in b). We first note that (2) can only hold if m > [n/2]. For otherwise, ∃ p ∈ P_m such that p(t_j) = 0, 1 ≤ j ≤ m; since deg(p²) = 2m ≤ n, we have

∫_{−1}^{1} p² dμ = Σ_{j=1}^m λ_j p²(t_j) = 0.

This implies that p vanishes on support(μ), which therefore has at most m ≤ [n/2] points, contradicting the assumption about support(μ) in b). Now let m = (n+1)/2 (remember that n is assumed odd), and, for 1 ≤ j ≤ m and {t_j} = {roots of Q_m}, define

(4) λ_j ≡ (Q′_m(t_j))^{−1} ∫_{−1}^{1} Q_m(t)(t − t_j)^{−1} dμ(t).

Claim: each λ_j > 0, and relations (2), (3) hold. By the theorem in c), no other choice of λ_j with this small an m can satisfy (2); hence when the claim has been justified, the proof of the theorem in b) will be complete.

Proof of Claim. Let p ∈ P_n vanish at each t_j. Then p = Q_m · q for some q ∈ P_{m−1}. But Q_m is orthogonal to any such q (as elements of L²(μ)), and so ∫ p dμ = 0. From linear algebra there follows the existence of real numbers γ_1,...,γ_m such that

(5) φ(x) = Σ_{j=1}^m γ_j x(t_j),

∀ x ∈ P_n. Applying (5) to the functions

x_i(t) = (Q′_m(t_i))^{−1} Q_m(t)(t − t_i)^{−1},

and recalling (4), we find that γ_i = λ_i, 1 ≤ i ≤ m. This proves (2) and if, as will next be shown, λ_i > 0, relation (3) also follows.

Fix an i, 1 ≤ i ≤ m, and define

x(t) = (Q_m(t)/(t − t_i))²;

then deg(x) = 2m − 2 < 2m − 1. Apply φ to x:

φ(x) = λ_i (Q′_m(t_i))².

Now φ(x) > 0 since x ≥ 0, and x vanishes at only m−1 points while support(μ) contains at least (n+1)/2 > m−1 points. This shows that λ_i > 0, qed.

Exercise 38. Let A_n be the n-th Gauss quadrature formula, considered as an element of C_R([−1,1])*. That is,

A_n(x) = Σ_{j=1}^n λ_j^{(n)} x(t_j^{(n)})

for x ∈ C_R([−1,1]), where {t_j^{(n)}} = {roots of Q_n}. Prove that A_n → μ in the w*-topology.

§25. Haar Subspaces

In order to obtain a sharper and more useful form of the characterization theorem in 23f) in the case where X = C(Ω), we introduce the notion of a (finite dimensional) "Haar subspace" of C(Ω). This notion will later be generalized to subspaces of an arbitrary nls, and will play a role in the study of uniqueness questions in the theory of best approximation.

a) Definition. Let M be an n-dimensional linear subspace of C(Ω). Then M is a Haar subspace (interpolating subspace) if given any n distinct points {t_1,...,t_n} ⊆ Ω, and any n scalars {c_1,...,c_n}, there is exactly one x ∈ M for which x(t_i) = c_i, 1 ≤ i ≤ n.

The following lemma provides some alternative characterizations of Haar subspaces; its straightforward proof is omitted.



Lemma. Let M = span({x_1,...,x_n}) be an n-dimensional subspace of C(Ω). The following assertions are all equivalent.

1) M is a Haar subspace;

2) θ is the only element of M having at least n roots in Ω;

3) For distinct {t_1,...,t_n} ⊆ Ω, the matrix [x_i(t_j)] is non-singular;

4) For distinct {t_1,...,t_n} ⊆ Ω, the set of n-vectors {t̂_1,...,t̂_n} is linearly independent (the notation t̂ was defined in 23h)).

Remark. Obviously the span of any non-vanishing x ∈ C(Ω) is a one-dimensional Haar subspace. However, the existence of higher dimensional Haar subspaces imposes a severe topological restriction on Ω. In particular, if C_R(Ω) contains an n-dimensional Haar subspace (n ≥ 2) then Ω is homeomorphic to a (compact) subset of the unit circle (Mairhuber, Curtis, Sieklucki). For further details, see the discussion in Singer [72, p. 218-222].

c) We consider now several examples of Haar subspaces. First, since the Vandermonde determinant is non-zero, it follows from a) that the polynomial subspace P_n is a Haar subspace of C_R([a,b]) for any n ≥ 1 and a < b. This can also be viewed as a special case of the following fact. If x ∈ C_R^n([a,b]) and x^{(n)}(t) > 0 on [a,b], then span({1, t, t²,...,t^{n−1}, x}) is a Haar subspace of C_R([a,b]). On the other hand, span({t, e^t}) is not a Haar subspace of C_R([0,3]).
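The failure of span({t, e^t}) can be seen numerically via part 3) of the preceding lemma: it suffices to exhibit a sign change of D(t_1,t_2) = t_1 e^{t_2} − t_2 e^{t_1} over ordered pairs in [0,3], so that D vanishes at some pair of distinct points (a sketch; the sample pairs are arbitrary choices):

    import numpy as np

    def D(t1, t2):
        # determinant det [x_i(t_j)] for the basis {t, e^t}
        return t1 * np.exp(t2) - t2 * np.exp(t1)

    print(D(0.5, 1.0))   # negative
    print(D(1.0, 2.0))   # positive, so D = 0 somewhere in between

By continuity, some non-zero combination a·t + b·e^t then has two roots in [0,3], violating the Haar condition.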

Next we give a general result which shows that Haar subspaces

can be generated by solutions of certain special kinds of ordinary

differential equations.

Theorem. (Pólya, Zedek) Let I be any real interval. Define a linear differential operator

L_n(D) = (D + λ_n(t))(D + λ_{n−1}(t)) ··· (D + λ_1(t)),

where λ_i ∈ C_R^{n−1}(I), 1 ≤ i ≤ n, and D = d/dt. Then any non-zero solution of

(1) L_n(D)·x = θ

has at most (n−1) distinct roots in I. Consequently, any n linearly independent solutions of (1) span a Haar subspace over I.

The proof requires a preliminary lemma consisting of two generalizations of Rolle's theorem.

Lemma. 1) Let x be a differentiable function on [a,b] with x(a) = x(b) = 0, and let λ ∈ C_R([a,b]). Then ∃ c ∈ (a,b) such that

(D + λ(c))·x(c) ≡ x′(c) + λ(c)x(c) = 0.

2) Let x be n-times differentiable on [a,b] and have (n+1) distinct roots there. Let λ_i ∈ C_R^{n−1}([a,b]) for 1 ≤ i ≤ n. Then ∃ c ∈ (a,b) such that

(2) x_n(c) ≡ L_n(D)·x(t)|_{t=c} = 0.

Proof. 1) Apply Rolle's theorem to the function

y(t) ≡ x(t) exp(∫λ(t)dt).

2) Define x_0 = x, x_k = (D + λ_k)x_{k−1}, for 1 ≤ k ≤ n. By induction and the result in 1) we see that x_k has at least n − k + 1 roots, each lying between each pair of adjacent roots of x_{k−1}. When k = n we obtain (2).



Proof of the Theorem. We proceed by induction on n. For n = 1 the general non-zero solution of (1) is given by x = c exp(−∫λ_1(t)dt), where c ≠ 0. This x has at most n − 1 = 0 roots in I as claimed. Now assume the theorem true for the value n−1 and let x be a non-zero solution of (1). Then the function

w = L_{n−1}(D)·x

is a solution of the equation

(3) (D + λ_n)·w = θ.

Now two cases are possible. If w = θ, then x is a solution of (1) with n replaced by n−1, so x has at most (n−2) roots in I by the induction hypothesis. Otherwise, w ≠ θ and then by the first step of the induction, w has no roots in I, since it satisfies (3). But in this case the second part of the preceding lemma implies that x can have at most (n−1) distinct zeros in I, qed.

Exercise 39. Verify the assertions in the first paragraph of this sub-section. Also:

1) Let α_1 < α_2 < ··· < α_n, and 0 < a < b < +∞. Then

span({t^{α_1},...,t^{α_n}})

is a Haar subspace of C_R([a,b]).

2) Let {α_i} be as in 1), and a < b. Then

span({e^{α_1 t},...,e^{α_n t}})

is a Haar subspace of C_R([a,b]).

3) For n a positive integer,

span({1, cos kt, sin kt: 1 ≤ k ≤ n})

is a Haar subspace of the space of all real continuous 2π-periodic functions on the line (identified with C_R(Ω), where Ω is the unit circle).

4) For n a positive integer,

span({1, cos kt: 1 ≤ k ≤ n}),

span({sin kt: 1 ≤ k ≤ n})

are each a Haar subspace of C_R(Ω), where Ω is a compact subinterval of (0,π).

d) We are now ready to establish the famous "alternation theorem" which characterizes b.a.'s from Haar subspaces of C_R([a,b]). As a preliminary, let M = span({x_1,...,x_n}) be a Haar subspace of some space C_R(Ω). Define a function D ∈ C_R(Ω ×···× Ω) by

D(t_1,...,t_n) = det [x_i(t_j)].

Then D is zero only if two or more of the points {t_j} coincide. Given two sets {s_1,...,s_n} and {t_1,...,t_n}, each consisting of distinct points in Ω, suppose that it is possible to vary t_j continuously so that t_j → s_j while no two of the t_j become coincident; then sgn D(t_1,...,t_n) = sgn D(s_1,...,s_n). In particular, this can be done when Ω = [a,b].

Lemma. Let a ≤ s_1 <···< s_n ≤ b, a ≤ t_1 <···< t_n ≤ b, and let D be as just defined. Then sgn D(s_1,...,s_n) = sgn D(t_1,...,t_n).

Proof. Exercise 40.

Theorem. (Chebyshev-Bernstein Alternation Theorem) Let M be an n-dimensional Haar subspace of C_R([a,b]) and x ∈ C_R([a,b]) \ M. Then x_o ∈ M is a b.a. to x if and only if there are points a ≤ t_1 <···< t_{n+1} ≤ b such that

|x(t_j) − x_o(t_j)| = ||x − x_o||,

x(t_j) − x_o(t_j) = (−1)^{j+1}(x(t_1) − x_o(t_1)),

for 1 ≤ j ≤ n + 1.

Remark. Let r ≡ x − x_o be the error function. The condition just stated is that this error function should attain its maximum absolute value over [a,b] at least (n + 1) times, with alternate signs. Sometimes this is expressed by the statement "the error curve alternates (n + 1) times", or, "the error curve has (n + 1) alternating peak points". The point set {t_1,...,t_{n+1}} is called a Chebyshev alternance for r.

Proof of the Alternation Theorem. From 23f) we recall that a nas condition for x_o to be a b.a. to x from M is that there should exist m (≤ n + 1) points {t_j} ⊆ [a,b] (t_j < t_{j+1}), λ_1,...,λ_m > 0, and σ_1,...,σ_m with |σ_j| = 1 such that

σ_j(x(t_j) − x_o(t_j)) = ||x − x_o||,

Σ_{j=1}^m λ_j σ_j x_i(t_j) = 0,

for 1 ≤ i ≤ n. (That is, the φ_j of 23f) are σ_j δ_{t_j} here.) By setting α_j = σ_j λ_j the above nas condition is equivalent to the existence of m non-zero scalars α_j such that

|x(t_j) − x_o(t_j)| = ||x − x_o||,

(4) sgn α_j = sgn (x(t_j) − x_o(t_j)),

(5) Σ_{j=1}^m α_j δ_{t_j} ∈ M^⊥.

Suppose that m < n + 1. Then, because M is a Haar subspace, ∃ y ∈ M such that y(t_j) = α_j, 1 ≤ j ≤ m, and so by (5),

0 = Σ_{j=1}^m α_j y(t_j) = Σ_{j=1}^m α_j² ≠ 0,

a contradiction. Therefore m = n + 1.

Now let us rewrite equation (5) in the form

Σ_{j=1}^n α_j x_i(t_j) = −α_{n+1} x_i(t_{n+1})

for 1 ≤ i ≤ n, and solve for α_j by Cramer's rule:

(6) α_j = (−1)^{n+1−j} α_{n+1} D(t_1,...,t_{j−1},t_{j+1},...,t_{n+1}) / D(t_1,...,t_n).

Now by the preceding lemma, the ratio of the D's is positive. Thus the numbers α_j alternate in sign, and hence so do the numbers (x(t_j) − x_o(t_j)). This proves the necessity of the alternation condition for x_o to be a b.a. to x.

Conversely, if x − x_o has an (n + 1)-point Chebyshev alternance in [a,b], a ≤ t_1 <···< t_{n+1} ≤ b, set

α_{n+1} = sgn (x(t_{n+1}) − x_o(t_{n+1})),

and define α_1,...,α_n by means of (6). This means that {α_1,...,α_{n+1}} satisfy (4) and (5), and so the conditions for x_o to be a b.a. to x are met, qed.



e) The Alternation Theorem is very useful for actually computing best approximations in C_R([a,b]). For example, it easily implies that the b.a. to x by constant functions is the constant function whose value is the average of the minimum and maximum values assumed by x on [a,b].

Exercise 41. Verify this last assertion. Compute the (unique) b.a. from P_1 and from P_2 to x(t) = |t| on [−1,1], and from P_1 to x(t) = √t on [0,1].
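The constant-function rule is easy to check numerically (a sketch; the test function x(t) = e^t on [0,1] is our own choice): the error attains ±(max x − min x)/2 at the two endpoints, giving the required 2-point alternance for the one-dimensional subspace of constants.

    import numpy as np

    ts = np.linspace(0.0, 1.0, 100001)
    x = np.exp(ts)
    c = (x.min() + x.max()) / 2.0          # best constant approximation
    err = x - c
    print(np.abs(err).max(), (x.max() - x.min()) / 2)   # equal
    print(err[0], err[-1])                 # equal magnitudes, opposite signs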

We will shortly see a further application of the Alternation

Theorem in the discussion of Chebyshev polynomials.

f) The final result of this section is a classical theorem of de la Vallée Poussin which complements the distance formula of 23i) by providing lower estimates for the distance to a subspace M ⊆ C_R([a,b]), if M is Haar.

Theorem. Let M be an n-dimensional Haar subspace of C_R([a,b]) and x ∈ C_R([a,b]) \ M. Let z ∈ M have the property that x − z assumes alternately positive and negative values at (n + 1) consecutive points {t_j} ⊆ [a,b]. Then

d(x,M) ≥ min {|x(t_j) − z(t_j)|: 1 ≤ j ≤ n + 1}.

Proof. Suppose there is y ∈ M such that ||x − y|| < min |x(t_j) − z(t_j)|. Then y − z = x − z − (x − y) assumes alternately positive and negative values at the t_j, and so has at least n roots in [a,b], contradicting the Haar condition for M.

§26. Chebyshev Polynomials

These polynomials occur as solutions of certain best approximation problems on [−1,1] and have many remarkable properties. In particular, we will see that they can be used to conveniently produce "good" (although not necessarily "best") approximations to any function in C_R([−1,1]).

a) Lemma. For −1 ≤ t ≤ 1,

(1) cos(n arc cos(t)) = 2^{n−1} t^n + q_{n−1}(t),

where q_{n−1} ∈ P_{n−1}.

Proof. This follows from the formula

cos(nθ) = 2^{n−1}(cos(θ))^n + Σ_{k=0}^{n−1} λ_k^{(n)} (cos(θ))^k,

which in turn is proved by induction on n, making use of the identity

cos((n+1)θ) + cos((n−1)θ) = 2 cos(θ) cos(nθ).

Definition. The (n + 1)-st Chebyshev polynomial T_n is given by the right hand side of (1).

b) Consider the problem of finding the b.a. from P_{n−1} to t^n on [−1,1]. This is evidently equivalent to the problem of finding the monic polynomial in P_n which best approximates θ (i.e., which has least norm).

Theorem. (Chebyshev) The monic polynomial in P_n of least norm on [−1,1] is (1/2^{n−1})T_n.

Proof. Let p be the desired solution, and put A ≡ ||p||. Then by 25d) there must be an (n + 1)-point Chebyshev alternance in [−1,1]. The alternance must include the points ±1, since p′ vanishes at each alternance point in (−1,1), while deg(p′) = n − 1.
Thus the polynomials A² − p² and (1 − t²)(p′)² have the same roots, and each root in (−1,1) is a double root. This implies

A² − p(t)² = (1 − t²) p′(t)²/n²,

√(A² − p(t)²) = ± (1/n) √(1 − t²) p′(t).

Now p′ changes sign as each point of the alternance is passed; let I be an interval where p′ > 0. Then, for t ∈ I,

p′(t)/√(A² − p(t)²) = n/√(1 − t²).

Integrate both sides to get

arc cos(p(t)/A) = c + n arc cos(t),

p(t) = A cos(n arc cos(t) + c)

= A(cos(c) T_n(t) − sin(c) sin(n arc cos(t))),

for some constant c. But sin(c) must equal 0, since sin(n arc cos(t)) is not a polynomial. Therefore cos(c) = ±1, and since p is monic we finally obtain that cos(c) = 1 and A = 1/2^{n−1}. That is, p = (1/2^{n−1})T_n, qed.

c) We state without proof (for which see [51]) the following facts about the Chebyshev polynomials T_n.

1) T_0(t) = 1, T_1(t) = t,

T_n(t) = 2t T_{n−1}(t) − T_{n−2}(t), n ≥ 2.

2) For 1 ≤ k ≤ n, the roots of T_n are the points

cos((2k−1)π/2n).

For large n these roots tend to cluster toward the endpoints of [−1,1].
3) For any p ∈ P_n and t outside (−1,1) we have

|p(t)| ≤ ||p|| |T_n(t)|,

where ||p|| ≡ max {|p(t)|: |t| ≤ 1}. Thus all the zero tendency of T_n is compressed to within (−1,1); outside this interval it exceeds every other polynomial in P_n of norm ≤ 1. The above inequality is also valid if p and T_n are each replaced by their k-th derivatives.

4) The sequence

{T_0/√π, T_1·√(2/π), T_2·√(2/π), ...}

is a complete orthonormal set in L²(μ̃), where μ̃ is the positive measure on [−1,1] defined by

dμ̃(t) = dt/√(1 − t²).
Exercise 42. Prove this last fact.
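The recurrence in c-1) makes the T_n cheap to evaluate, and the minimality theorem of b) can then be verified directly (a sketch; n and the grid size are arbitrary choices):

    import numpy as np

    def cheb(n, t):
        # T_0 = 1, T_1 = t, T_n = 2 t T_{n-1} - T_{n-2}
        t0, t1 = np.ones_like(t), t
        if n == 0:
            return t0
        for _ in range(n - 1):
            t0, t1 = t1, 2 * t * t1 - t0
        return t1

    n = 6
    ts = np.linspace(-1, 1, 200001)
    print(np.abs(cheb(n, ts)).max() / 2 ** (n - 1), 2.0 ** (1 - n))  # monic norm = 2^{1-n}
    roots = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))  # fact 2)
    print(np.abs(cheb(n, roots)).max())                              # ~ 0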

d) In view of c-4) we now have the possibility of expanding any x ∈ C_R([−1,1]) in a Fourier-Chebyshev series

(2) x = c_0/2 + Σ_{k=1}^∞ c_k T_k,

where

c_k = (2/π) ∫_{−1}^{1} x(t) T_k(t)/√(1 − t²) dt,

and of course, the series (2) will converge to x in the L²(μ̃)-metric (μ̃ as defined in c-4)). We next observe that (2) will also converge to x in the C_R([−1,1])-metric, with a modest additional assumption on x.
Theorem. Assume that x ∈ C_R([−1,1]) is of bounded variation. Then the Fourier-Chebyshev series (2) converges uniformly to x on [−1,1].

Proof. The function y(s) = x(cos(s)) is of bounded variation in the space of real continuous 2π-periodic functions. A theorem of Titchmarsh [75, p. 410] guarantees the uniform convergence of the Fourier series for y. Since y is an even function its Fourier series has the form

y(s) = c_0/2 + Σ_{k=1}^∞ c_k cos(ks),

c_k = (2/π) ∫_0^π y(s) cos(ks) ds.

Now recalling the definition of T_k in a), and applying the change of variable t = cos(s), we obtain the conclusion of the theorem.

e) Let S_n[x] denote the truncation of the series (2) at k = n. Then, of course, S_n[x] is the L²(μ̃)-best approximation to x from P_n. We wish to show that it is also a "near-best" C_R([−1,1])-best approximation to x. This will prove to be a consequence of the following general theorem.

To set the stage, let ν be a positive measure on [−1,1], absolutely continuous wrt Lebesgue measure, and let {q_k: k = 0,1,...} be the sequence of ν-orthonormal polynomials. Define numbers

A_n^ν ≡ max_{|t|≤1} { ∫_{−1}^{1} |Σ_{k=0}^n q_k(s) q_k(t)| dν(s) }.

For fixed x ∈ C_R([−1,1]) let S_n^ν[x] be the L²(ν)-best approximation to x from P_n, and let Q_n[x] be the C_R([−1,1])-best approximation to x from P_n.
Theorem. (Alexits, Powell) In the preceding notation, we have

||x − S_n^ν[x]|| ≤ (1 + A_n^ν) ||x − Q_n[x]||

(where the norm on both sides is the uniform norm).

Proof. We may write

x − Q_n[x] − (x − S_n^ν[x]) = Σ_{k=0}^n a_k q_k,

where the a_k are to be determined. By 22d),

∫_{−1}^{1} (x − S_n^ν[x]) q_k dν = 0,

for 0 ≤ k ≤ n. Consequently,

a_k = ∫_{−1}^{1} (x − Q_n[x]) q_k dν,

for 0 ≤ k ≤ n. Hence, for −1 ≤ t ≤ 1,

|x(t) − S_n^ν[x](t)| = |x(t) − Q_n[x](t) − Σ_{k=0}^n (∫_{−1}^{1} (x − Q_n[x]) q_k dν) q_k(t)|

= |x(t) − Q_n[x](t) − ∫_{−1}^{1} (x − Q_n[x])(s) Σ_{k=0}^n q_k(s) q_k(t) dν(s)|

≤ ||x − Q_n[x]|| + ∫_{−1}^{1} |(x − Q_n[x])(s)| |Σ_{k=0}^n q_k(s) q_k(t)| dν(s)

≤ ||x − Q_n[x]|| (1 + A_n^ν),

qed.

Corollary. Let ν = μ̃, the Chebyshev measure defined in c-4), and let A_n ≡ A_n^{μ̃}. Then

(3) ||x − S_n[x]|| ≤ (1 + A_n) ||x − Q_n[x]||,

and
A_n = max_{0≤t≤π} (2/π) ∫_0^π |Σ′_{k=0}^n cos(ks) cos(kt)| ds,

where the prime on the summation sign indicates that the first term is to be halved.

This corollary follows directly from the theorem by making the change of variable used in d).

f) It is shown in [61, p. 406] that A_n admits an exact expression as a finite trigonometric sum, and hence that

A_n ~ (4/π²) log(n), n → ∞.

In particular, A_n < 5.1 for n ≤ 1000.

The practical implication of all these estimates is the following. Suppose we are given some x ∈ C_R([−1,1]), and we wish to approximate x by polynomials to within some specified tolerance. Then the partial sums S_n[x] of the Fourier-Chebyshev series (2) are at least good first attempts. Indeed, the inequality (3) shows that the best we could do by way of approximating x from P_n (fixed n), namely ||x − Q_n[x]||, is greater than (1/6.1)||x − S_n[x]||, provided n ≤ 1000. That is, we cannot even obtain one additional decimal place of accuracy by replacing the near-best approximation S_n[x] by the best approximation Q_n[x].
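To see e) and f) in action, the coefficients of (2) can be computed through the change of variable t = cos(s) used in d), i.e. c_k = (2/π) ∫_0^π x(cos s) cos(ks) ds (a sketch; the midpoint-rule quadrature, the grid sizes, and the test function x(t) = |t|, which is of bounded variation, are all our own choices):

    import numpy as np

    def cheb_coeffs(x, n, m=20000):
        s = (np.arange(m) + 0.5) * np.pi / m     # midpoint rule on (0, pi)
        y = x(np.cos(s))
        return np.array([(2.0 / m) * np.sum(y * np.cos(k * s))
                         for k in range(n + 1)])

    def S(x, n, ts):
        # partial sum c_0/2 + sum_{k=1}^n c_k T_k, using T_k(cos s) = cos(ks)
        c = cheb_coeffs(x, n)
        s = np.arccos(ts)
        return c[0] / 2 + sum(c[k] * np.cos(k * s) for k in range(1, n + 1))

    ts = np.linspace(-1, 1, 4001)
    for n in (2, 4, 8, 16):
        print(n, np.abs(np.abs(ts) - S(np.abs, n, ts)).max())  # uniform error decreases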

Results of Hornecker, Talbot, and Rivlin [64] show that for a certain class of rational functions x ∈ C_R([−1,1]), S_n[x] actually differs from Q_n[x] only in the top order term.

§27. Rotundity

In this and the next section we will be discussing the unique-

ness question in approximation theory. We consider two different

approaches to this problem: first, we impose global (rotundity)

conditions on the norm, sufficient to guarantee that every (convex)

approximation problem has at most one solution; on the other hand,

we consider particular subspaces having special properties sufficient

for uniqueness.

a) Definition. Let X be a ls and f ∈ Conv(X). f is strictly convex if u,v ∈ dom(f), u ≠ v, and 0 < t < 1 imply

(1) f(tu + (1−t)v) < tf(u) + (1−t)f(v).

Examples. If we refer back to 6a) - Example 5), a sufficient condition for strict convexity of f is that E(u,v) > 0 for u,v ∈ K, u ≠ v. In particular (refer to Exercise 12), f is strictly convex if d²f(x) is positive definite for every x ∈ K. The functional f in Exercise 20 is strictly convex.

Remark. The immediate relevance of the strict convexity con-

cept to optimization problems is contained in the following ob-

vious assertion: a convex program with a strictly convex objective

function has at most one solution.

b) Definition. A closed convex subset K of a tls is rotund if every boundary point of K belongs to ext(K). By abuse of language, a nls X is called rotund if U(X) is a rotund set in X.

We would like to think that the rotundity of a nls X is somehow connected with the strict convexity of the norm function on X. However, due to the homogeneity of norms, the norm is never a strictly convex function on X (the inequality (1) is violated along rays emanating from θ).

c) Definition. A norm ||·|| on a ls X is essentially strictly convex if u,v ∈ X and

||u + v|| = ||u|| + ||v||

implies u = tv for some t ≥ 0.

Theorem. A nls is rotund if and only if its norm is essentially strictly convex.

Proof. Exercise 43.

Thus by allowing the inequality (1) to fail only where it must, we have obtained a property of norms which is equivalent to the geometric property of rotundity. Of course, our interest in rotundity is that in a rotund nls each element has at most one b.a. from any specified convex set.

d) Definition. By abuse of language, a nls X is called smooth if its norm is a smooth function (in the sense of 7c)) on the open set X \ {θ}.

Theorem. Let X be a nls such that X* is smooth (resp. rotund). Then X is rotund (resp. smooth).

Proof. Suppose that X is not rotund. Then some point in S(X) is not an extreme point of U(X), and hence there exists a line segment [u,v] ⊆ S(X). By Mazur's Theorem (3e)), ∃ φ ∈ S(X*) such that re φ([u,v]) = 1. The canonical images of u,v in X** are then both subgradients of the norm (on X*) at φ. But this implies, by 10c), that the norm (on X*) has no gradient at φ, a contradiction. The proof of the remaining assertion of the theorem is similar.

In particular, when X is reflexive, there is complete duality between the properties of smoothness and rotundity. Since, according to Exercise 14, the L^p(μ) spaces are smooth for 1 < p < +∞, we see again that such spaces are also rotund (a direct proof was suggested as part of Exercise 7). It also follows that, for the same values of p, the closely related Sobolev spaces W^{k,p}(G) (here G is an open subset of R^n and W^{k,p}(G) consists of all those scalar-valued functions on G whose distributional derivatives up to order k are all in L^p; see [78, p. 55]) are smooth and rotund. Similarly the compact operator spaces S_p (which consist of all compact operators T acting on some fixed Hilbert space such that trace((T*T)^{p/2}) < +∞; see [22, Ch. III]) are smooth and rotund. (In fact, all the spaces just listed possess the much stronger properties of uniform rotundity and uniform smoothness. In particular, they are all E-spaces (see §31).)

e) Given a separable nls X it is always possible to find an equivalent essentially strictly convex norm on X, which furthermore differs arbitrarily little from the original norm. Much stronger renorming results are known (see the survey article [11], also [2]), but they are not too useful for approximation theory. The result just mentioned, however, will allow us to give an interesting application of approximation theory.

Lemma. (Clarkson) Let (X, ||·||) be a separable nls, and let ε > 0. Then there is an essentially strictly convex equivalent norm |||·||| on X, such that

||x|| ≤ |||x||| ≤ (1 + ε)||x||, ∀x ∈ X.
Proof. It will suffice to produce some essentially strictly convex norm |·| equivalent to the norm on X, for then we may take |||·||| = ||·|| + λ|·|, for suitably small λ > 0. To construct |·|, it will suffice to find an isomorphism T of X onto a subspace of a rotund nls (Y, ||·||′) and then set |x| = ||x|| + ||T(x)||′, for x ∈ X. But for Y we may take C([0,1]), and for ||·||′ the map

y ↦ ||y||′ ≡ ||y||_∞ + (∫_0^1 |y(t)|² dt)^{1/2},

where ||·||_∞ denotes the usual uniform norm on C([0,1]). The isomorphism T is then the composition of the identity map: (Y, ||·||_∞) → (Y, ||·||′) with an isometry of (X, ||·||) onto a subspace of (Y, ||·||_∞); this isometry exists by virtue of the separability of X (Banach's theorem).

f) Lemma. Let K be a compact convex subset of a rotund nls X. Then the metric projection P_K: X → K is a (single valued) continuous function on X.

Proof. Exercise 44. (Metric projections are defined and discussed in §32.)

g) Application. (Schauder Fixed Point Theorem) Let K be a closed convex subset of a nls X, and T a continuous mapping of K into a compact subset of K. Then T has a fixed point in K.

Proof. (Bonsall) We can initially reduce the problem to the case where K is bounded and X is separable and rotund. For, if T(K) ⊆ A, where A is compact in K, then it suffices to prove the theorem for B ≡ cl co(A) ⊆ K. Further, if we put Y ≡ span(B), then Y is a separable nls, and we may renorm Y so as to be
rotund by e).

Since T(K) is totally bounded, for each n = 1,2,..., there is a (1/n)-net {T(x_i): i = 1,...,m = m(n)} for T(K). Let Y_n be the linear hull of this (1/n)-net, and put K_n ≡ K ∩ Y_n, a compact subset of Y_n. Let P_n: X → K_n be the metric projection; then

T_n = (P_n ∘ T)|_{K_n}

is a continuous self-map of K_n. The classical Brouwer fixed point theorem provides a fixed point u_n for T_n: T_n(u_n) = u_n. By compactness, we may assume that T(u_n) → v, for some v ∈ K. Now

||u_n − v|| = ||T_n(u_n) − v||

≤ ||T_n(u_n) − T(u_n)|| + ||T(u_n) − v||

≤ 1/n + ||T(u_n) − v||,

because, ∀ x ∈ K_n, we have

||T(x) − T_n(x)|| = ||T(x) − P_n(T(x))|| = d(T(x), K_n)

≤ min {||T(x) − T(x_i)||: 1 ≤ i ≤ m} ≤ 1/n.

This proves that u_n → v, and so T(u_n) → T(v); that is, v is a fixed point of T, qed.

§28. Chebyshev Subspaces

a) Definition. A subset of a nls X is semi-Chebyshev (resp. Chebyshev) if it contains at most one (resp. exactly one) b.a. to every element of X.

From 27c) it follows that every convex subset of a rotund nls is semi-Chebyshev.
b) Theorem. Let K be a convex subset of a nls X. Then K is semi-Chebyshev if and only if there do not exist φ ∈ X* (φ ≠ θ), points x_1 ≠ x_2 in X, and points y_1, y_2 ∈ K such that

φ(x_i) = ||x_i||, i = 1,2,

re φ(y_i) = sup re φ(K), i = 1,2,

x_1 − x_2 = y_1 − y_2.

Proof. Suppose that φ, x_i, y_i exist as just described. We may assume that φ ∈ S(X*). Now

||x_1|| − ||x_2|| = φ(x_1 − x_2) = re φ(x_1 − x_2)

= re φ(y_1 − y_2) = re φ(y_1) − re φ(y_2) = 0,

so ||x_1|| = ||x_2||. Consider the convex set

K′ ≡ K − (x_2 + y_1).

The points

−x_1 = y_2 − (x_2 + y_1),

−x_2 = y_1 − (x_2 + y_1)

are both in K′ and we claim that they are both b.a.'s to θ from K′. This will imply that K′ and hence K are not semi-Chebyshev. That −x_i is a b.a. to θ from K′ follows from 22b), because

φ(θ − (−x_i)) = φ(x_i) = ||x_i||,

and (taking i = 1 for definiteness)

sup re φ(K′) = sup re φ(K − (x_2 + y_1))

= sup {re φ(y): y ∈ K} − re φ(x_2) − re φ(y_1)

= re φ(y_1) + re φ(−x_2) − re φ(y_1)

= re φ(−x_2).

For the converse, assume that K is not semi-Chebyshev; then ∃ x ∈ X with two distinct b.a.'s y_1, y_2 ∈ K. Then by 22b) again, ∃ φ ∈ S(X*) such that

re φ(y_i) = sup re φ(K),

φ(x − y_i) = ||x − y_i|| = d(x,K).

The points x_i ≡ x − y_i are then distinct, and together with φ and y_i they satisfy the second condition of the theorem.

Corollary. (Singer) A linear subspace K of a nls X is semi-Chebyshev if and only if there do not exist φ ∈ S(K^⊥), x ∈ X, θ ≠ y ∈ K such that φ(x) = ||x|| = ||x − y||.

c) As in §23 we now specialize the above considerations to the case where the convex set K is contained in an n-dimensional subspace of X. In fact, for simplicity, we will assume that K is an n-dimensional subspace of X.

Theorem. (Singer) An n-dimensional linear subspace M of a nls X is Chebyshev if and only if there do not exist {φ_1,...,φ_m} ⊆ ext(U(X*)) (where m ≤ n (real scalars) or m ≤ 2n−1 (complex scalars)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1, x ∈ X, θ ≠ y ∈ M, such that

φ_j(x) = ||x|| = ||x − y|| ∀ j,

Σ_{j=1}^m λ_j φ_j ∈ S(M^⊥).

Proof. Clearly if such φ_j, λ_j, x, y exist then by the preceding Corollary, M is not Chebyshev. Conversely, assume that M is not Chebyshev. Then ∃ z ∈ X \ M having distinct b.a.'s

y_1, y_2 ∈ M. Let x = z − y_1, y = y_2 − y_1 ≠ θ, so that θ and y are both b.a.'s to x from M. Let Y ≡ span({x,M}). By 22b) ∃ φ ∈ S(Y*) such that φ(M) = 0 and ||x|| = φ(x) = φ(x − y) = ||x − y||. By 23d) ∃ {φ_1,...,φ_m} ⊆ ext(U(Y*)), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1 such that

(1) φ = Σ_{j=1}^m λ_j φ_j.

Here m ≤ n + 1 (real scalars) or m ≤ 2n + 1 (complex scalars). Since ||φ_j|| = 1 and λ_j > 0, φ_j(x) = ||x|| = ||x − y||, ∀j. Also, |φ_j(x − y)| ≤ ||x − y|| = φ_j(x), so that re φ_j(y) ≥ 0. Together with

Σ_{j=1}^m λ_j φ_j(M) = 0,

we deduce

(2) φ_j(y) = 0, ∀ j.

It now remains only to show that the φ_j can be chosen so that m ≤ n (real scalars) or m ≤ 2n − 1 (complex scalars); then an application of 23c) will complete the proof.

Claim. The set {φ_1,...,φ_m} can be chosen to be real linearly independent.

Granting (for a moment) the Claim, assume that the scalars are real and that m = n + 1. Then since Y is (n+1)-dimensional, the set {φ_j} forms a basis for Y*. But then by (2), y = θ, a contradiction. Consequently, m ≤ n. Similarly, if the scalars are complex, the assumption that m > 2n leads to a contradiction (since it entails real-dim({v ∈ Y: φ_j(v) = 0 ∀j}) ≤ 1, while (2) implies that this subspace contains the real linearly independent set {y, iy}).
Proof of the Claim. Choose m to be minimal wrt the representation (1), and assume that {φ_1,...,φ_m} is (real) linearly dependent. Then ∃ α_1,...,α_m not all zero such that α_1φ_1 +···+ α_mφ_m = θ. Since ||φ|| = 1, ∃ v ∈ S(Y) such that φ(v) = 1, and hence φ_j(v) = 1 ∀j, by (1). So α_1 +···+ α_m = 0, and

Σ_{j=2}^m α_j(φ_j − φ_1) = θ.

Thus the set {φ_2 − φ_1,...,φ_m − φ_1} is linearly dependent, and therefore the dimension of its linear hull is at most m − 2. Consequently,

dim(co({φ_1,...,φ_m})) ≤ m − 2.

Now we also have

{φ_1,...,φ_m} = ext(co({φ_1,...,φ_m})),

as follows from 5e) and the fact that φ_j ∈ ext(U(Y*)). But now, since φ ∈ co({φ_1,...,φ_m}), 23b) implies that φ is a convex combination of at most (m − 2) + 1 = m − 1 of the φ_j, and this contradicts the minimality of m, qed.

d) Lemma. Let X be a nls of dimension at least n. Then

U(X*) contains at least n linearly independent extreme points.

Proof. Exercise 45.

e) Definition. An n-dimensional subspace M of a nls X is an interpolating subspace if for every linearly independent set {φ_1,...,φ_n} ⊆ ext(U(X*)) and every set {c_1,...,c_n} of scalars, there is a unique y ∈ M for which φ_i(y) = c_i, ∀i.



This definition was given in [3]. It clearly reduces to the definition of Haar subspace when X = C(Ω), in view of 15c). Subspaces with this special property are rather rare in general normed spaces; for instance, if X* is rotund, then X contains no (non-trivial) interpolating subspace.

Exercise 46. Prove this last statement.

The remainder of this section is devoted to the connection be-

tween the interpolating property and the Chebyshev property for finite

dimensional subspaces of a nls. The preceding exercise shows that,

in general, a Chebyshev subspace need not be interpolating.

f) Theorem. Let M be an n-dimensional interpolating subspace of a real nls X. Then M is a Chebyshev subspace.

Proof. Suppose that M is not Chebyshev. By the theorem in c), there is a linearly independent subset {φ_1,...,φ_m} ⊆ ext(U(X*)) (m ≤ n), λ_1,...,λ_m > 0, λ_1 +···+ λ_m = 1, x ∈ X, θ ≠ y ∈ M, such that

φ_j(x) = ||x|| = ||x − y|| ∀ j,

Σ_{j=1}^m λ_j φ_j ∈ S(M^⊥).

If we assume that m = n, equation (2) in the proof of c) together with the interpolating hypothesis on M shows that y = θ, a contradiction. Therefore, we assume m < n. Then the lemma in d) shows that {φ_1,...,φ_m} may be extended to a linearly independent set {φ_1,...,φ_m,...,φ_n} ⊆ ext(U(X*)). Setting λ_{m+1} = ··· = λ_n = 0, we have

Σ_{j=1}^n λ_j φ_j ∈ M^⊥.

Let {x_1,...,x_n} be a basis for M. Then the equations

Σ_{j=1}^n λ_j φ_j(x_k) = 0, 1 ≤ k ≤ n,

have a non-trivial solution (λ_1,...,λ_n). But this implies that the determinant det [φ_j(x_k)] = 0, and again this contradicts our hypothesis that M is interpolating, qed.

g) The upshot of the preceding two sections is that, in real spaces, interpolating subspaces are a special kind of Chebyshev subspace. As such they may be expected to possess properties not enjoyed by general Chebyshev subspaces. This is certainly the case, and such properties are studied in [3]. We now give the famous result which states that the two types of subspaces coincide in spaces C(Ω).

Theorem. (Haar, Kolmogorov) An n-dimensional subspace M of C(Ω) is a Chebyshev subspace if and only if it is a Haar subspace.

Proof. Assume that M is a Haar subspace. If the scalars are real, M is Chebyshev by f). Now, if the scalars are complex, we may still proceed as in the proof of f). However, we observe that the set {φ_1,...,φ_m} is pairwise (complex) linearly independent (this follows by definition, using that this set is real linearly independent). But the special nature of the extreme points of U(C(Ω)*) shows that a pairwise linearly independent subset thereof is actually linearly independent. Thus the proof of f) applies in the present case as well.

Conversely, assume that M is not a Haar subspace. The idea will be to use c) to show that M is not Chebyshev. Since M is not Haar, there are y ∈ S(M) and points t_1,...,t_n ∈ Ω such that y(t_j) = 0, 1 ≤ j ≤ n. It follows that ∃ scalars α_j, |α_1| +···+ |α_n| = 1, such that

Σ_{j=1}^n α_j δ_{t_j} ∈ M^⊥.

We now use the Tietze extension theorem to obtain z ∈ S(C(Ω)) such that z(t_j) = sgn(ᾱ_j) for 1 ≤ j ≤ m. Here the α_j are labeled so that exactly the first m, 1 ≤ m ≤ n, are non-zero. Define x = z(1 − |y|). We then have

x ∈ S(C(Ω)),

x(t_j) = sgn(ᾱ_j), 1 ≤ j ≤ m,

and

1 ≤ ||x − y|| ≤ || |x| + |y| ||

= || |z|(1 − |y|) + |y| || ≤ 1.

Finally, for 1 ≤ j ≤ m, define

φ_j = sgn(α_j) δ_{t_j},

λ_j = |α_j|.

Then since

φ_j(x) = sgn(α_j) x(t_j)

= sgn(α_j) sgn(ᾱ_j) = 1

= ||x|| = ||x − y||,

the condition of the theorem in c) is violated and M is not Chebyshev, qed.

h) We have now established several conditions sufficient to ensure that certain subspaces of various normed linear spaces are Chebyshev. Eventually we are led to inquire whether there necessarily exists any Chebyshev subspace in a given nls. For reflexive spaces it has been shown by Lindenstrauss [39] that there is a closed hyperplane which supports the unit ball at a single point, and so is the translate of a Chebyshev subspace. However, Garkavi [18] has exhibited a non-reflexive and non-separable Banach space which contains no Chebyshev subspace.

To reproduce this example, we first introduce some notation. Let card(S) denote the cardinality of a set S, and let c ≡ card(R¹). The ls of all bounded scalar-valued functions on S is denoted ℓ^∞(S); it is a Banach space under the sup-norm. The support of x ∈ ℓ^∞(S) is the set σ(x) = {s ∈ S: x(s) ≠ 0}, and the (closed) subspace of ℓ^∞(S) consisting of those functions with countable support is denoted ℓ_o^∞(S).

Theorem. (Garkavi) If card(S) > c, then ℓ_o^∞(S) contains no Chebyshev subspace.

Proof. Let M be a (closed) subspace of ℓ_o^∞(S). Suppose first that card(M) ≤ c. Since the union of c countable sets has cardinality c, ∃ s_o ∈ S such that y(s_o) = 0 ∀y ∈ M. Let x be the characteristic function of {s_o}. Then for any y ∈ U(M),

||x − y|| = max {1, sup_{s≠s_o} {|x(s) − y(s)|}} = 1,

that is, y is a b.a. to x from M. On the other hand, if card(M) > c, and x ∉ M, let y be any b.a. to x from M (if one happens to exist). The set A ≡ σ(x − y) is countable, and since card(M) > c = card(ℓ^∞(A)), there must exist ȳ ∈ S(M) such that ȳ|_A = θ. But then y − tȳ is also a b.a. to x from M for sufficiently small t > 0, qed.



Remark. It is apparently unknown if there is a separable Banach space with no Chebyshev subspaces. However, there does exist a separable incomplete nls containing no Chebyshev subspaces, namely the space of finitely supported scalar-valued functions on the integers with sup-norm (Klee-Singer).

Exercise 47. Show that the space c_o has no infinite dimensional Chebyshev subspaces. But, for each n = 1,2,..., show that there is an n-dimensional Chebyshev subspace.

§29. Algorithms for Best Approximation

Let M be a finite dimensional subspace of a nls X. Given some x̄ ∈ X \ M, how can one actually compute a b.a. to x̄ from M? Many practical algorithms for computing such b.a.'s can be considered as particular cases of the "method of nearby norms". This method generates a (relatively compact) sequence of elements of M which clusters in the set of b.a.'s to x̄. The members of this sequence appear as b.a.'s to x̄ from M wrt a sequence of norms on X which converges pointwise on X to the original norm on X. Since all the analysis takes place in span({x̄,M}), we can and will assume that X is finite dimensional in the main theorem.

a) Theorem. (Kripke) Let p be a norm on the finite dimensional ls X, and {p_k} a sequence of semi-norms on X which converges pointwise on X to p. Let M be a linear subspace of X, and x̄ ∈ X \ M. For each k choose a p_k-b.a., y_k, to x̄ from M. Then

1) Every subsequence of {y_k} has a p-convergent subsequence;

2) lim p(x̄ − y_k) = p-dist(x̄,M);

3) Every p-cluster point of {y_k} is a p-b.a. to x̄ from M;

4) If x̄ has a unique p-b.a., ȳ, from M, then lim p(ȳ − y_k) = 0.

Proof. Let {x_1,...,x_n} be a basis for X, and define a norm on X by

σ(x) ≡ σ(Σ_{i=1}^n c_i x_i) = Σ_{i=1}^n |c_i|.

Since all norms on X are equivalent, there exists λ > 0 such that

σ(x) ≤ λ p(x), ∀ x ∈ X.

We now claim that {p_k} constitutes a uniformly p-equicontinuous family of functions on X. To see this, let α = sup {p_k(x_i): 1 ≤ i ≤ n, k = 1,2,...}. Then for each k, and any x ∈ X,

p_k(x) = p_k(Σ c_i x_i) ≤ Σ |c_i| p_k(x_i) ≤ α σ(x) ≤ α λ p(x),

and so

|p_k(x) − p_k(z)| ≤ p_k(x − z) ≤ α λ p(x − z),

which proves the claim. It follows that the sequence {p_k} converges uniformly to p on the compact p-unit sphere S_p(X); in particular, if 0 < δ < 1, and k is sufficiently large, the homogeneity of semi-norms implies

(1) (1−δ)p(x) ≤ p_k(x) ≤ (1+δ)p(x), ∀x.

Note that, for such k, (1) shows that p_k must actually be a norm on X.

We next claim that {y_k} is a p-bounded set. Because, taking δ = 1/2 in (1),

p(y_k) ≤ 2p_k(y_k) ≤ 4p_k(x̄) ≤ 4αλ p(x̄);

this proves the claim, and hence statement 1). Now note that 3)

follows from 2). Also, if 4) is false, there is δ > 0 and a subsequence {z_j} of {y_k} such that p(ȳ − z_j) > δ. Using 1), let z̄ be a cluster point of {z_j}. Then p(ȳ − z̄) ≥ δ so that ȳ ≠ z̄, but by 3), z̄ is also a b.a. to x̄, a contradiction.

It remains to prove 2). Given ε in (0,1), we choose k large enough that (1) holds (with δ = ε), and then

p(x̄ − y_k) ≤ p_k(x̄ − y_k) + ε p(x̄ − y_k),

p(x̄ − y_k) ≤ (1/(1−ε)) p_k(x̄ − y_k)

≤ (1/(1−ε)) p_k(x̄ − y), ∀ y ∈ M,

≤ ((1+ε)/(1−ε)) p(x̄ − y),

qed.

Remark. This last estimate indicates the problem which must be solved in order to obtain useful bounds on the amount by which p(x̄ − y_k) exceeds the best value, p-dist(x̄,M). Namely, we must be able to estimate sup {|p_k(x) − p(x)|: p(x) = 1}.

b) We now consider some applications of the preceding theorem. In general, given the p-norm with respect to which best approximations are desired, the nearby (semi-)norms p_k should be chosen so that, first, it is (relatively) easy to compute p_k-best approximations (e.g., take p_k to be quadratic, or differentiable, or discrete (i.e., p_k has a finite codimensional kernel)), and second, it is possible to obtain (useful) estimates on the rate of convergence p_k → p.

Lemma. Let μ be a finite positive measure on some measure space, x ∈ L^∞(μ), and 1 ≤ p ≤ +∞. Suppose that p_k → p. Then ||x||_{p_k} → ||x||_p.
Proof. Exercise 48. (The cases p = 1 and p = +∞ are of special importance for the applications we have in mind.)

c) Corollary. (Polya) Let μ be as in b), M a finite dimensional subspace of L^∞(μ), and x̄ ∈ L^∞(μ) \ M. If p_k → +∞ then the sequence of (unique) L^{p_k}(μ)-best approximations to x̄ from M is a relatively compact set in L^∞(μ), and any cluster point is an L^∞(μ)-best approximation to x̄.

In particular, if {x̄,M} ⊆ C([0,1]), and M is a Haar subspace, then the sequence just defined converges uniformly to the (unique) b.a. to x̄ from M. If further M = P_n and x̄ is a smooth function, then Peetre [53, p. 255] obtains the estimate

||ȳ − y_k|| = O((log p_k)/p_k).

Here ||·|| is the sup-norm on [0,1], and y_k (resp. ȳ) is the unique L^{p_k}(dt)-b.a. (resp. C([0,1])-b.a.) to x̄ from P_n.
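A discrete stand-in for the Polya algorithm can be run in a few lines (a sketch; the grid, the test function e^t, the subspace P_1, and the use of Nelder-Mead minimization are all arbitrary choices, and a serious implementation would exploit the smoothness of the L^p norms for p < ∞):

    import numpy as np
    from scipy.optimize import minimize

    ts = np.linspace(0.0, 1.0, 401)
    x = np.exp(ts)
    Phi = np.vstack([np.ones_like(ts), ts]).T

    c = np.linalg.lstsq(Phi, x, rcond=None)[0]     # p = 2 solution as a warm start
    for p in (4, 8, 16, 32, 64):
        f = lambda c, p=p: np.sum(np.abs(x - Phi @ c) ** p) ** (1.0 / p)
        c = minimize(f, c, method="Nelder-Mead",
                     options={"xatol": 1e-10, "fatol": 1e-12}).x
        print(p, np.abs(x - Phi @ c).max())        # sup-norm error stabilizes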

Remarks. 1) It was conjectured for some time that the sequence {y_k}, where y_k is the unique L^{p_k}(μ)-b.a. to x̄ from M, was in fact a convergent sequence in L^∞(μ). It is now known [12, 46] that this is indeed the case when L^∞(μ) is finite dimensional. In fact, the limit was identified by Descloux as the "strict approximation" to x̄ from M, a concept earlier defined by Rice [63] in the context of uniform best approximation by functions defined on finite sets, and shown by him to be uniquely specified, even though uniform best approximation is not generally unique.

2) The problem indicated in the Remark in a) has been recently studied by Hebden [24] when p_k is the L^{p_k}(μ)-norm. Combining the bounds in [24] with the result in a) will yield a value of p (probably larger than necessary) for which the L^p(μ)-b.a. to x̄ from M is a suitable substitute for the uniform b.a. to x̄ (in the sense that the L^∞(μ)-distance between this approximation and x̄ is within some preassigned tolerance of the distance from x̄ to M).

3) An apparent improvement on the Polya algorithm has been given by Karlovitz [33]. The numbers p_k are restricted to be even integers. Then a sequence in M is constructed by alternately solving a weighted L²(μ)-best approximation problem out of M and then a one-dimensional L^{p_k}(μ)-best approximation problem. This sequence converges uniformly to a specified L^{p_k}(μ)-b.a. to x̄ from M. Thus it is possible to avoid the actual computation of L^{p_k}(μ)-best approximations. By choosing a specified member of each such sequence (for each p_k), we obtain a sequence, bounded in L^∞(μ), all of whose cluster points are L^∞(μ)-b.a.'s to x̄ from M.

(Note: in Remarks 2) and 3), the measure μ is Lebesgue measure on some compact subset of Euclidean space, and the functions involved are all continuous.)

d) It is not difficult to see that the theorem in a) remains valid if the subspace M is replaced by any closed convex subset of X (check!). If X is also rotund (or, more generally, possibly infinite dimensional but uniformly rotund), another approach to the construction of algorithms for approximating b.a.'s out of convex subsets has been given in [25]. Basically the idea is to solve a sequence of best approximation problems from convex sets which are appropriately "near" the original set, and which are such that the associated b.a. problems are easier to solve. Such sets might be hyperplanes, balls, or (convex) polyhedrons. Note that, in contrast to the algorithms discussed earlier in this section, the norm remains the same throughout.



§30. Proximinal Sets

In the many preceding sections of Part III we have considered characterizations and uniqueness of best approximations, while glossing over the basic question of their existence. We turn to this question now.

a) Definition. Let A be a subset of a nls X. Then A is called proximinal if every x ∈ X has at least one b.a. from A. On the other hand, A is called anti-proximinal if no x ∈ X \ A has a b.a. from A.

Obviously a proximinal set must be closed in X, and any compact set is clearly proximinal (see also 31a)). It is easy to see that any closed subset of a finite dimensional space is proximinal, but the following general existence theorem subsumes this along with many other special cases.

Theorem. A w*-closed subset A of a dual space X* is proximinal.

Proof. Given any y ∈ X* \ A, the function z ↦ ||y − z|| is w*-lsc on X*, and therefore attains its minimum on the w*-compact set (y − A) ∩ 2d(y,A)U(X*).

Corollary. A reflexive subspace of a nls is proximinal, and so is a closed convex subset of a reflexive space.

b) The preceding conditions are sufficient but not necessary for the existence of best approximations. In the remainder of this section we consider some special kinds of subspaces of non-reflexive spaces. Let us first recall that a closed subspace M of a nls X is factor-reflexive if the quotient space X/M is a reflexive (Banach) space. We then have the following necessary condition.

Lemma. (Phelps) Let M be a proximinal factor-reflexive subspace of a nls X. Then every functional in M^⊥ attains its norm on U(X).

Proof. Exercise 49. (It is to be shown that φ ∈ M^⊥ implies the existence of x ∈ S(X) for which φ(x) = ||φ||.)

In particular, this lemma applies to the case where M has finite codimension in X. Although generally not sufficient, the condition does characterize proximinal subspaces of codimension one, as we will see next. In general, the usefulness of the condition in a particular nls X depends on having available (useful) knowledge of the form of those functionals in X* which attain their norms. Such information has been summarized in [56, 57], for example.

Theorem. Let H ≡ {x ∈ X: re φ(x) = c} be a hyperplane in the nls X defined by φ (≠ θ) in X* and c ∈ R¹. Then H is proximinal (resp. anti-proximinal) if and only if φ attains (resp. does not attain) its norm on U(X).

Proof. Either directly, or as a consequence of Helly's theorem (e.g. [15, p. 86]), we see that

(1) d(x,H) = |c − re φ(x)| / ||φ||,

for every x ∈ X. Now if φ attains its norm on U(X) at some z ∈ S(X), then

P_H(x) ≡ x − ||φ||^{−1} (re φ(x) − c) z

is a b.a. to x from H, since its distance from x is the same as d(x,H) given in (1).

On the other hand, if φ does not attain its norm, then given x ∈ X and y ∈ H such that

||x − y|| = d(x,H) = |re φ(x) − c| / ||φ|| = |re φ(x − y)| / ||φ||,

it follows that x − y = θ, that is, x ∈ H. Hence no x ∈ X \ H has a b.a. from H, qed.
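In a finite dimensional space every functional attains its norm, so every hyperplane is proximinal there, and both formula (1) and the map P_H constructed above can be checked directly (a sketch; R⁴ with the Euclidean norm, φ(x) = ⟨a,x⟩, and c = 1 are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.normal(size=4); x = rng.normal(size=4)
    norm_phi = np.linalg.norm(a)            # ||phi|| for the Euclidean norm
    z = a / norm_phi                        # phi attains its norm at z
    d = abs(1.0 - a @ x) / norm_phi         # formula (1) with c = 1
    PHx = x - (a @ x - 1.0) / norm_phi * z  # the b.a. from the proof
    print(np.linalg.norm(x - PHx), d)       # equal
    print(a @ PHx)                          # = 1, so P_H(x) lies on H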

According to a profound theorem of James [32], such anti-proximinal hyperplanes exist in every non-reflexive Banach space. To see a simple example of such behavior, let X = L¹([0,1]), and consider the functional

φ(x) ≡ ∫_0^1 t x(t) dt, x ∈ X.

c) We give next an example of a closed but w*-dense subspace of a dual space which is proximinal. In general, there is not much theory of best approximation from non-reflexive and non-factor-reflexive subspaces, and each case must be handled on an ad hoc basis. For some other recent examples we refer to [29], wherein it is observed that, while most of the standard Banach spaces are proximinal when embedded as subspaces of their second duals, it is possible for such an embedding to result in an anti-proximinal subspace.

Theorem. (Holmes-Kripke) Let Ω be a paracompact topological space (e.g. a metric space). Let M be the subspace of ℓ_R^∞(Ω) consisting of all (bounded) continuous functions on Ω. Then M is proximinal in ℓ_R^∞(Ω).

Proof. For any x ∈ ℓ_R^∞(Ω), define

x_*(t) ≡ lim inf {x(s): s → t},

x*(t) ≡ lim sup {x(s): s → t},

so that x_*(t) ≤ x(t) ≤ x*(t), ∀ t ∈ Ω. Then define

Δ = Δ(x) ≡ (1/2) ||x* − x_*||,

u = x_* + Δ, v = x* − Δ,

the norm ||·|| being the sup-norm. Then v ≤ u and we claim that Δ = d(x,M) and that y is a b.a. to x from M if and only if v ≤ y ≤ u everywhere on Ω.

First, a < d ~- d ( x , M ) , since for any > 0, ~ y ~ M such

that li x - Yli ~ d + ~, or

x- d- e < y < x + d + e.

Taking the lim sup of the left h a n d inequality, and the lim inf of

the right h a n d side y i e l d s

x* d - e < y < x, + d + ~,

whence

0 < x* - x, < 2d + 2 e ,

or A ! d + s. Next, if v < y < u, then

x- A <x* - A= v<y

< U = X~ + a < x + A,

whence d ! llx - Yli ~ A ! d. Finally, the e x i s t e n c e of s u c h a y

follows f r o m the Interposition Theorem of D i e u d o n n ~ [13, p. 75]

taking into a c c o u n t the p a r a c o m p a c t n e s s hypothesis and the fact that

u (resp. v) is isc (resp. usc) on ~, qed.
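The construction in this proof is easy to carry out numerically. The following sketch (in Python with numpy; the grid, the window size, and the test function are our own illustrative choices, not from the text) approximates x_*, x*, and A(x) for a step function on Ω = [0,1]; it recovers A = d(x,M) = 1/2, the distance from a unit jump to the continuous functions.

    import numpy as np

    # x = indicator of [1/2, 1]; x_* and x* are approximated by the
    # inf/sup of x over a small window around each grid point.
    t = np.linspace(0.0, 1.0, 2001)
    x = (t >= 0.5).astype(float)
    h = 3  # window half-width in grid points, standing in for s -> t
    x_low  = np.array([x[max(i-h,0):i+h+1].min() for i in range(len(t))])
    x_high = np.array([x[max(i-h,0):i+h+1].max() for i in range(len(t))])

    A = 0.5 * np.max(x_high - x_low)   # = 1/2 = d(x, M)
    u, v = x_low + A, x_high - A       # continuous y with v <= y <= u are the b.a.'s
    print(A, np.all(v <= u))           # 0.5 True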

Remark. The foregoing result has been extended in two different directions. First, Olech [52] has allowed the functions being approximated by the bounded continuous functions on Ω to take as values bounded subsets of some Euclidean space. Second, Holmes and Kripke [28] have developed a general approach to approximation by various subspaces of ℓ∞(Ω) which depends on the idea of "interposing" a function between two others (in the sense that y was interposed pointwise between u and v in the preceding proof).


Part IV

Comments on the Problems

Exercise 2. For fixed t, 0 < t < 1, we must show that A ≡ t int(K) + (1-t)cl(K) ⊂ int(K). Since A is open, it is sufficient to show A ⊂ K. Now x ∈ int(K) so t(int(K) - x) is an open θ-nbhd. Therefore, (1-t)cl(K) = cl((1-t)K) ⊂ (1-t)K + t(int(K) - x) = (1-t)K + t int(K) - tx ⊂ K - tx, qed.

Exercise 3. Let {a_α + b_α} be a net in A + B convergent to some x ∈ X. Apply compactness to the net {b_α} and then use continuity of addition in X.

Exercise 4. Since X is separable, every subset is a Lindelöf space. Apply 3i) and then apply Lindelöf's Theorem to the union of the complements of the resultant half-spaces.

Exercise 6. Let K₁,...,Kₙ be the sets in question. Then co(∪Kᵢ) is the image of the compact set {(x₁,t₁,...,xₙ,tₙ) ∈ (X × R¹)ⁿ: xᵢ ∈ Kᵢ, 0 ≤ tᵢ ≤ 1, t₁ +...+ tₙ = 1} under the continuous map (x₁,t₁,...,xₙ,tₙ) ↦ Σ tᵢxᵢ.

Exercise 7-2). Recall the implication of equality in Minkowski's inequality.

Exercise 9. U(X) is weakly compact so by 5e) K ≡ co(ext(U(X))) is w-dense in U(X). But cl(K) being closed and convex is w-closed, hence cl(K) = U(X).

Exercise 10. ext(U(C_R(Ω))) = {x ∈ C_R(Ω): |x(t)| = 1 ∀ t ∈ Ω}.

Exercise 11. Verify condition 5e-2) by use of the Corollary in 5d).

Exercise 12. Let x,y ∈ K and define φ(t) ≡ (1-t)f(x) + tf(y) - f((1-t)x + ty). Then f ∈ Conv(X) ⟺ φ(t) ≥ 0 = φ(0) for 0 ≤ t ≤ 1. If this latter condition holds then E(x,y) = φ'(0) ≥ 0. Conversely, if x,y ∈ K ⟹ E(x,y) ≥ 0, then φ(t) = (1-t)E(x+t(y-x),x) + tE(x+t(y-x),y) ≥ 0. Now if f is also of class C² on K and x,y ∈ K (x ≠ y), then ∃ t, 0 < t < 1, such that E(x,y) = (1/2)d²f(x+t(y-x))·(y-x, y-x), using Taylor's formula.

Exercise 14. For 1 < p < +∞, the formula for ∇f(x₀) is obtained by writing out the appropriate difference quotient and differentiating under the integral sign. For p = 1 the nas condition on x₀ is that μ({t: x₀(t) = 0}) = 0, and then ∇f(x₀) can be identified with the function sgn x₀ ∈ S(L∞(μ)).

Exercise 17. The function g(x) ≡ f(x₀+x) - f(x₀) is continuous at θ and -g(-x) ≤ -f'(x₀;-x) ≤ f'(x₀;x) ≤ g(x) by 7a) and 7b). Thus f'(x₀;·) is continuous at θ and hence at any x because of its sublinearity. Of course the criterion of 10d-2) is also applicable here.

Exercise 19. Without loss of generality assume x₀ = θ. Choose δ > 0 so that ||x|| < δ ⟹ |f(x) - f(θ)| < 1. Let V = (δ/2)U(X) and λ = 8/δ. Suppose ∃ x,y ∈ V such that f(y) - f(x) > λ||x - y||. Let α = δ/2||x - y||; then α(f(y) - f(x)) > 4. Suppose α ≤ 1; then 4 < f(y) - f(x) ≤ |f(y) - f(x)| ≤ |f(y) - f(θ)| + |f(x) - f(θ)| ≤ 2, a contradiction. Therefore α > 1. But, if z ≡ x + α(y - x), then y = (1/α)z + (1 - 1/α)x, so f(y) ≤ (1/α)f(z) + (1 - 1/α)f(x); this implies α(f(y) - f(x)) ≤ f(z) - f(x) and the same contradiction results since ||z|| < δ. (Halkin)

Exercise 20. First compute that

   ∇f(x)·y = 2∫₀ᵀ (x(t)y(t) + ẋ(t)ẏ(t)) dt = ⟨u, y⟩,

where

   u(t)/2 = (1+t)∫₀ᵀ x(s) ds - ∫₀ᵗ (t-s)x(s) ds + x(t) - x(0).

Applying 11d) we must choose x₀ ∈ X so that x₀(0) = 1 and the Riesz representer u for ∇f(x₀) is constant on [0,T]. We are led to the ODE ẍ - x = θ and finally to the solution

   x₀(t) = cosh(t) - tanh(T) sinh(t).

The value of the program is tanh(T).
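As a quick numerical sanity check (not part of the original solution; Python with numpy, T = 1.5 being our own choice):

    import numpy as np

    # f(x) = integral_0^T (x^2 + x'^2) dt at x0(t) = cosh(t) - tanh(T) sinh(t)
    T = 1.5
    t = np.linspace(0.0, T, 20001)
    x  = np.cosh(t) - np.tanh(T)*np.sinh(t)
    xd = np.sinh(t) - np.tanh(T)*np.cosh(t)
    y  = x**2 + xd**2
    value = np.sum((y[1:] + y[:-1])/2 * np.diff(t))   # trapezoidal rule
    print(value, np.tanh(T))                          # both approximately 0.90515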

Exercise 21. The solution is the point (2/~'~,2).

Exercise 24. By 14f), f** ≤ f, so f** ≤ co(f) ≤ f. Hence f* ≤ co(f)* ≤ f*** = f* by 14g). Therefore, co(f) = co(f)** = f**.

Exercise 26. Let K be the right hand side of (3). Then K° = (∪A_α)° = ∩A_α°° = ∩A_α by 15b). Now take polars and use 15b) again.

Exercise 27. 2) Let K be the cone {x ∈ X: f'(x₀;x) < 0}. We have C(x₀,f) ⊂ K by 7a). Now choose x̄ ∈ K; then ∃ ε > 0 such that 0 < t ≤ ε ⟹ f(x₀+tx̄) - f(x₀) ≡ -δ < 0. Let V be the x̄-nbhd. {x ∈ X: |f(x₀+εx̄) - f(x₀+εx)| < δ/2}. This V and ε meet the requirements of 16c).

Exercise 28. 2) By 16d) and Exercise 27 it is sufficient to show that x ∈ C(x₀,Ω) ⟹ g'(x₀;x) ≤ 0. We have x₀ + tx ∈ Ω for 0 < t ≤ ε, so g'(x₀;x) ≤ (g(x₀+tx) - g(x₀))/t ≤ 0. Now by hypothesis ∃ x₁ so that g(x₁) < g(x₀) and so g'(x₀;x₁-x₀) < 0. Since C(x₀,Ω) is open, ∃ λ > 1 so that z ≡ (x₁-x₀) + λ(x̄-x₁+x₀) ∈ C(x₀,Ω), and therefore g'(x₀;z) ≤ 0. Finally,

   g'(x₀;x̄) = g'(x₀; (1/λ)z + (1-1/λ)(x₁-x₀))
            ≤ (1/λ)g'(x₀;z) + (1-1/λ)g'(x₀;x₁-x₀) < 0.

Exercise 29. Any φ ∈ X* has the form

   φ(x,y) = ∫₀¹ x dμ₁ + ∫₀¹ y dμ₂

where μᵢ ∈ rca([0,1]). Let V₀ be the Volterra operator x ↦ ∫₀ᵗ x(s)ds, and δ₁ point evaluation at t = 1. Then if φ ∈ C((x₀,y₀),A)⊥ and ∫₀¹ y(t)dt = 0,

   0 = ∫₀¹ V₀(y) dμ₁ + ∫₀¹ y dμ₂,

so μ₂ = (cδ₁ - μ₁) ∘ V₀ for some constant c. Now take μ = μ₁ to prove (2) in 17c).

Exercise 30. Some members of M⊥ are

   dμ = (sin πs)(sin πt) ds dt;
   dμ = (1 - ns^{n-1})(1 - nt^{n-1}) ds dt;
   dμ = (δ₁(s) - δ₀(s))(δ₁(t) - δ₀(t)).

The answer is d(x₀,M) = 1/4.

Exercise 33. Of the several implications here only the one that (4) ⟹ support(μ) ⊂ {t ∈ Ω: x(t) = ±||x||} ≡ E⁺ ∪ E⁻ is (perhaps) unclear. Assuming, as we may, that ||x|| = 1,

   ||μ|| = μ(E⁺) + (-1)μ(E⁻)
         = μ⁺(E⁺) - μ⁻(E⁺) - μ⁺(E⁻) + μ⁻(E⁻),

whence

   0 ≤ μ⁺(Ω \ E⁺) + μ⁺(E⁻) = -μ⁻(Ω \ E⁻) - μ⁻(E⁺) ≤ 0,

so all terms here are zero, and the result follows.

Exercise 34. Let E be a Borel subset of [0,1]. Then

   ∫_E x dμ = ∫_E x (dμ/dt) dt.

Exercise 35. Let K be the set of all norm-preserving extensions of φ. Then K is a w*-compact U(X*)-extremal set and so 5c), 5d) apply.

Exercise 37. The condition of the corollary holds if and only if ∃ λ₁,...,λ_m > 0, λ₁ +...+ λ_m = 1, such that

   0 = Σⱼ₌₁ᵐ λⱼ ψⱼ(x-x₀) ψⱼ(x₁),
   . . . . . . . . . . . . . .
   0 = Σⱼ₌₁ᵐ λⱼ ψⱼ(x-x₀) ψⱼ(xₙ),

where m ≤ n + 1 (real scalars) or m ≤ 2n + 1 (complex scalars), ψⱼ ∈ ext(U(X*)), and |ψⱼ(x-x₀)| = ||x-x₀||. Here we have used 23a). Defining

   σⱼ = ψⱼ(x-x₀)/||x-x₀||,

we have |σⱼ| = 1 and φⱼ ≡ σ̄ⱼψⱼ ∈ ext(U(X*)). Also

   φⱼ(x-x₀) = σ̄ⱼψⱼ(x-x₀)
            = σ̄ⱼ|ψⱼ(x-x₀)| sgn(ψⱼ(x-x₀))
            = ||x-x₀||.

Hence the condition of the corollary is equivalent to the nas condition of the theorem in 23f).

Exercise 38. Recall that the coefficients λⱼ⁽ⁿ⁾ are all positive, and that Σⱼ λⱼ⁽ⁿ⁾ = ||φ|| independently of n. Now use the classical Weierstrass theorem to approximate any given x ∈ C_R([-1,1]) by a polynomial p. Then, as soon as n exceeds the degree of p (so that Λₙ(p) = φ(p)),

   |φ(x) - Λₙ(x)| ≤ |φ(x) - φ(p)| + |Λₙ(p) - Λₙ(x)| ≤ 2||p - x|| ||φ||,

which can be made arbitrarily small by proper choice of p.

Exercise 39. Writing t^{λⱼ} = exp(λⱼ log t) reduces 1) to a special case of 2), which in turn follows directly from the theorem in 25c). To prove 3), we write

   p(t) ≡ Σₖ₌₀ⁿ (aₖ cos(kt) + bₖ sin(kt))

for arbitrary but fixed real aₖ, bₖ; it must be shown that p has at most 2n distinct roots in the interval [0,2π). There are complex cₖ such that

   p(t) = e^{-int} Σₖ₌₀²ⁿ cₖ e^{ikt}.

Hence if

   q(z) ≡ Σₖ₌₀²ⁿ cₖ zᵏ,

then with z = exp(it),

   q(z) = exp(int) p(t).

Since q has at most 2n distinct roots, the same is true for p in the interval [0,2π). Finally, for 4), we observe that a linear combination of {cos(kt)}, k = 0,...,n (resp. {sin(kt)}, k = 1,...,n) is an even (resp. odd) trigonometric polynomial of degree n, so by 3) can have at most 2n roots in [-π,π).

Exercise 40. Assume, if possible, that

   D(s₁,...,sₙ) < 0 < D(t₁,...,tₙ).

Then ∃ λ, 0 < λ < 1, such that

   D(λs₁+(1-λ)t₁,...,λsₙ+(1-λ)tₙ) = 0;

since M is a Haar subspace, this entails λsᵢ + (1-λ)tᵢ = λsⱼ + (1-λ)tⱼ for some i < j. Therefore, 0 < λ(sⱼ-sᵢ) = (1-λ)(tᵢ-tⱼ) < 0, a contradiction.

Exercise 41. The b.a. to √t from P₁ on [0,1] is t + 1/8. The b.a. to |t| from P₁ (resp. P₂) on [-1,1] is 1/2 (resp. t² + 1/8).
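For the first of these, the equioscillation behavior required by 25d) can be observed numerically (a sketch in Python/numpy, not part of the original solution):

    import numpy as np

    # error of the claimed b.a. t + 1/8 to sqrt(t) on [0,1]
    t = np.linspace(0.0, 1.0, 100001)
    e = np.sqrt(t) - (t + 0.125)
    print(e.min(), e.max())   # approximately -0.125 and +0.125,
                              # attained at t = 0, 1 and at t = 1/4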

Exercise 42. The orthonormality of the indicated sequence in L²(μ) follows from the familiar formula

   ∫₀^π cos(ms) cos(ns) ds = (π/2) δ_{mn},   m, n ≥ 1,

by making the change of variable t = cos(s). The indicated sequence is complete in L²(μ), since C_R([-1,1]) is dense in L²(μ) (using Lusin's theorem), and the polynomials are dense in C_R([-1,1]).
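The displayed formula itself is easily confirmed numerically (Python/numpy sketch, not in the text):

    import numpy as np

    s = np.linspace(0.0, np.pi, 100001)
    for m in (1, 2, 3):
        for n in (1, 2, 3):
            y = np.cos(m*s) * np.cos(n*s)
            I = np.sum((y[1:] + y[:-1])/2 * np.diff(s))  # trapezoidal rule
            print(m, n, round(I, 6))   # pi/2 = 1.570796 if m = n, else 0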

Exercise 43. Assume that the nls X is rotund and that ||u+v|| = ||u|| + ||v|| for some u, v (≠ θ) in X. The function

   φ(t) ≡ ||(1-t)u/||u|| + t v/||v|| ||

is convex on [0,1] and φ(0) = φ(1) = 1. We will show that φ(1/2) = 1; this will imply that φ is constantly equal to one on [0,1] and contradicts rotundity unless u/||u|| = v/||v||. Assuming, as we may, that ||u|| = 1, we must show that β ≡ ||u + v/||v|| || = 2 (it's certainly ≤ 2). If ||v|| ≥ 1, then β ≥ ||u+v|| - ||v - v/||v|| || = 1 + ||v|| - (1 - ||v||⁻¹)||v|| = 2. And if ||v|| ≤ 1, then || ||v||u + v || ≥ ||u+v|| - ||u - ||v||u|| = 1 + ||v|| - (1 - ||v||) = 2||v||, qed.

Exercise 45. By 23c), we may assume that X is n-dimensional. Suppose that m < n is the maximum cardinality of linearly independent subsets of ext(U(X*)), and let {φ₁,...,φ_m} be such a subset. Then every φ ∈ ext(U(X*)) belongs to span({φ₁,...,φ_m}). But, by 23d), every φ ∈ S(X*) belongs to co(ext(U(X*))). Consequently, X* = span({φ₁,...,φ_m}), so dim(X) = dim(X*) = m < n, a contradiction.

Exercise 46. Let M be any proper finite dimensional subspace of X. Then ∃ φ ∈ S(M⊥) ⊂ S(X*) = ext(U(X*)). Then the definition in 28e) cannot be satisfied if, in the set {φ₁,...,φₙ} there, we take φ₁ ∈ S(M⊥) and c₁ ≠ 0.

Exercise 47. Suppose that M is an infinite dimensional subspace of c₀ and that y ≡ (yᵢ) is a b.a. to x ≡ (xᵢ) from M. Choose an index n such that

   sup {|xᵢ - yᵢ|: i > n} < ||x - y||.

Then, because M is infinite dimensional, ∃ ỹ ∈ M such that ỹ ≠ θ but ỹᵢ = 0, i ≤ n. (Because, if y ∈ M and yᵢ = 0, i ≤ n, together imply y = θ, then M meets the n-codimensional subspace {y ∈ c₀: yᵢ = 0, i ≤ n} only in θ, whence dim(M) ≤ n < +∞.) It follows that y + tỹ is also a b.a. to x from M if t is sufficiently near 0, and so M is not Chebyshev. On the other hand, given a positive integer n, choose 0 < t₁ <...< tₙ < 1 and define M to be span({x₁,...,xₙ}), where the ith-coordinate of xₖ is (tₖ)ⁱ. Using that c₀* = ℓ¹ and that ext(U(ℓ¹)) consists of {αeₙ: |α| = 1}, where eₙ is the nth standard unit vector in ℓ¹, it is seen that M is actually interpolating, and hence Chebyshev by 28f).

Remark. In fact, it can be shown, using in part 28f) and the first half of the preceding exercise, that the only Chebyshev subspaces of c₀ are the interpolating subspaces ([3, p. 167]).

Exercise 49. Recalling the formula M⊥ ≅ (X/M)*, we consider the given φ ∈ M⊥ as a functional on X/M. Since this space is reflexive, ∃ z ∈ S(X/M) such that φ(z) = ||φ||. Now z is a translate of M, and since M is proximinal, z has an element x of minimal norm: ||z|| = ||x|| = 1. Thus φ(x) = ||φ|| also.

Bibliography

1) R. Arens and J. Kelley, Characterizations of the space of


continuous functions over a compact Hausdorff space. Trans.
Amer. Math. Soc. 62(1947), 499-508.

2) E. Asplund, Averaged norms. Israel J. Math. 5(1967), 227-233.

3) D. Ault, F. Deutsch, P. Morris, and J. Olson, Interpolating sub-


spaces in approximation theory. J. Approx. Th. 3(1970),
164-182.

4) N. Bourbaki, Éléments de mathématique, Livre V, Espaces
vectoriels topologiques. Hermann et Cie, Act. Sci. et Ind.
1189, Paris, 1953.

5) L. deBranges, The Stone-Weierstrass theorem. Proc. Amer. Math.


Soc. 10(1959), 822-824.

6) A. Brøndsted, Conjugate convex functions in topological vector
spaces. Mat.-Fys. Medd. Dansk. Vid. Selsk. 34(1964), 1-26.

7) and R. T. Rockafellar, On the subdifferentiability


of convex functions. Proc. Amer. Math. Soc. 16(1965),
605-611.

8) R. C. Buck, Applications of duality in approximation theory.


p. 27-44 in Approximation of Functions (H. Garabedian, Ed.),
Elsevier, Amsterdam-London-New York, 1965.

9) E. W. Cheney, Introduction to Approximation Theory. McGraw-Hill,


New York, 1966.

I0) G. Choquet, Lectures on Analysis (Vol. II). Benjamin, New York,


1969.

II) D. Cudia, Rotundity. p. 73-97 in Convexity (V. Klee, Ed.),


Amer. Math. Soc., Providence, 1963.

12) J. Descloux, Approximations in Lᵖ and Chebyshev approximations.
J. Soc. Ind. App. Math. 11(1963), 1017-1026.

13) J. Dieudonné, Une généralisation des espaces compacts. J. Math.
Pures Appl. 23(1944), 65-76.

14) A. Dubovitskii and A. Milyutin, Extremum problems in the


presence of constraints. Zh. Vychisl. Mat. Mat. Fiz.
5(1965), 395-453. (Russian)

15) N. Dunford and J. Schwartz, Linear Operators, Part I. Inter-


science, New York, 1958.

16) W. Fenchel, On conjugate convex functions. Canad. J. Math.


1(1949), 73-77.

17) T. Flett, On differentiation in normed vector spaces. J. Lon.


Math. Soc. 42(1967), 523-533.

18) A. Garkavi, On Chebyshev and almost Chebyshev subspaces. Izv.


Akad. Nauk SSSR Ser. Mat. 28(1964), 799-818. (Russian)

19) , Uniqueness of solutions of the L-problem of
moments. Izv. Akad. Nauk SSSR Ser. Mat. 28(1964), 553-570.
(Russian)

20) A. Geoffrion, Duality in nonlinear programming: a simplified
applications-oriented development. Soc. Ind. App. Math.
Rev. 13(1971), 1-37.

21) I. Girsanov, Lectures on the Mathematical Theory of Extremal


Problems. University of Moscow, Moscow 1970. (Russian)

22) I. Gohberg and M. Krein, Introduction to the Theory of Linear


Nonselfadjoint Operators. Amer. Math. Soc., Providence,
1969.

23) H. Halkin, A satisfactory treatment of equality and operator


constraints in the Dubovitskii-Milyutin optimization
formalism. J. Optim. Th. Appl. 6(1970), 138-149.

24) M. Hebden, A bound on the difference between the Chebyshev
norm and the Hölder norms of a function. Soc. Ind. App.
Math. J. Num. Anal. 8(1971), 270-277.

25) R. Holmes, Approximating best approximations. Nieuw Arch. voor


Wisk. 14(1966), 106-113.

26) , Smoothness indices for convex functions and the


unique Hahn-Banach extension problem. Math. Z. 119(1971),
95-110.

27) and B. Kripke, Approximation of bounded


functions by continuous functions. Bull. Amer. Math. Soc.
71(1965), 896-897.

28) , Interposition and approximation. Pac. J. Math.


24(1968), 103-110.

29) , Best approximation by compact operators. Ind.


Univ. Math. J. (to appear).

30) L. Hurwicz, Programming in linear spaces. Ch. 2 in Studies in


Linear and Nonlinear Programming (by K. Arrow, L. Hurwicz,
and H. Uzawa). Stanford Univ. Press, Stanford, 1958.

31) A. Ioffe and V. Tikhomirov, Duality of convex functions and


extremal problems. Russian Math. Surveys 23(1968), 53-124.

32) R. James, Characterizations of reflexivity. Studia Math.


23(1964), 205-216.

33) L. Karlovitz, Construction of nearest points in the Lᵖ, p
even, and L∞ norms. I. J. Approx. Th. 3(1970), 123-127.

34) J. Kelley and I. Namioka, Linear Topological Spaces. Van


Nostrand, Princeton, 1963.

35) J. Kingman and A. Robertson, On a theorem of Lyapunov. J. Lon.


Math. Soc. 43(1968), 347-351.

36) V. Klee, Extremal structure of convex sets. II. Math. Z.


69(1958), 90-104.

37) B. Kripke, Best approximation with respect to nearby norms.
Num. Math. 6(1964), 103-105.

38) J. Lindenstrauss, A short proof of Lyapunov's convexity


theorem. J. Math. Mech. 15(1966), 971-972.

39) , On nonseparable reflexive Banach spaces. Bull.
Amer. Math. Soc. 72(1966), 967-970.

40) and R. Phelps, Extreme point properties of convex


bodies in reflexive Banach spaces. Israel J. Math. 6(1968),
39-48.

41) C. Lobry, Étude Géométrique des Problèmes d'Optimisation en
Présence de Contraintes. Université de Grenoble, 1967.

42) D. Luenberger, Optimization by Vector Space Methods. Wiley,


New York, 1969.

43) L. Liusternik and V. Sobolev, Elements of Functional Analysis.


Ungar, New York, 1961.

44) O. Mangasarian, Nonlinear Programming. McGraw-Hill, New York,


1969.

45) G. Minty, On the monotonicity of the gradient of a convex
function. Pac. J. Math. 14(1964), 243-247.

46) B. Mitjagin, The extremal points of a certain family of


convex functions. Sibirsk. Math. Zh. 6(1965), 556-563.
(Russian)

47) J. Moreau, Fonctions Convexes en Dualité. Faculté des Sciences,
Université de Montpellier, 1962.

48) , Fonctionnelles sous-différentiables. C. R. Acad.
Sci. Paris 257(1963), 4117-4119.

49) , Sous-différentiabilité. Proc. Coll. Convexity,
Copenhagen 1965(1967), 185-201.

50) , Fonctionnelles Convexes. Séminaire "Équations
aux Dérivées Partielles". Collège de France, 1966.

51) I. Natanson, Constructive Theory of Functions. Ungar, New York,


1964.

52) C. Olech, Approximation of set-valued functions by continuous
functions. Coll. Math. 19(1968), 285-293.

53) J. Peetre, Approximation of norms. J. Approx. Th. 3(1970),


243-260.

54) E. Peterson, An economic interpretation of duality in linear


programming. J. Math. Anal. Appl. 30(1970), 172-196.

55) , Symmetric duality for generalized unconstrained


geometric programming. SIAM J. App. Math. 19(1970),
487-526.

56) R. Phelps, Subreflexive normed linear spaces. Arch. der Math.


8(1957), 444-450.

57) , Some subreflexive Banach spaces. Arch. der
Math. 10(1959), 162-169.

58) , Uniqueness of Hahn-Banach extensions and


unique best approximation. Trans. Amer. Math. Soc.
95(1960), 238-255.

59) , Lectures on Choquet's Theorem. Van Nostrand,


Princeton, 1966.

60) W. Porter, Modern Foundations of Systems Engineering.


Macmillan, New York, 1966.

61) M. Powell, On the maximum errors of polynomial approximations
defined by interpolation and by least squares criteria.
Comp. J. 9(1967), 404-407.

62) B. Pshenichnii, Convex programming in a normed space.


Kibernetika 1(1965), 46-54. (Russian).

63) J. Rice, Tchebycheff approximation in several variables.


Trans. Amer. Math. Soc. 109(1963), 444-466.

64) T. Rivlin, Polynomials of best uniform approximation to certain


rational functions. Num. Math. 4(1962), 345-349.

65) and H. Shapiro, A unified approach to certain


problems of approximation and minimization. J. Soc. App.
Ind. Math. 9(1961), 670-699.

66) R. T. Rockafellar, An extension of Fenchel's duality theorem
for convex functions. Duke Math. J. 33(1966), 81-90.

67) , Level sets and continuity of conjugate convex


functions. Trans. Amer. Math. Soc. 123(1966), 46-63.

68) , Characterization of the subdifferentials of


convex functions. Pac. J. Math. 17(1966), 497-510.

69) , Convex functions, monotone operators, and


variational inequalities, p. 35-65 in Theory and
Applications of Monotone Operators. Proc. NATO Advanced
Study Institute, Venice, 1968.

70) , Convex Analysis. Princeton University Press,


Princeton, 1970.

71) H. Schaefer, Topological Vector Spaces. Macmillan, New York,


1966.

72) I. Singer, Best Approximation in Normed Linear Spaces by


Elements of Linear Subspaces. Springer, Berlin-Heidelberg,
1970.

73) S. Stechkin and L. Taikov, On minimal extensions of linear


functionals. Trudy Mat. Inst. Steklov 78(1965), 12-23.

74) J. Tate, On the relation between extremal points of convex


sets and homomorphisms of algebras. Comm. Pure Appl.
Math. 4(1951), 31-32.

75) E. Titchmarsh, The Theory of Functions, 2nd Ed. Oxford Univ.


Press, Oxford, 1939.

76) M. Vlach, On necessary conditions of optimality in linear


spaces. Comm. Math. Univ. Carolinae 11(1970), 501-513.

77) J. Weston, A note on the extension of linear functionals. Amer.


Math. Monthly 67(1960), 444-445.

78) K. Yosida, Functional Analysis. Academic Press, New York, 1965.


Part V

Selected Special Topics

In this final supplementary part of these notes we consider, in

varying degrees of detail, a variety of special topics in approxima-

tion and optimization. For the most part they represent areas of

current and active research interest. Consequently, our aim is not

to present definitive treatments, but rather to alert the reader who

has come this far to the existence of several further areas for

study, to indicate a few of the results already known (and when

possible, to incorporate these results within the framework of

Parts I-III), and to provide some pertinent bibliographical references.

§31. E-spaces

The special class of Banach spaces to be defined next, the so-

called "E-spaces", appears to be the maximal satisfactory class of

Banach spaces for which all convex norm-minimization problems are

"strongly solvable" and all convex b.a. problems are "well posed"

in the sense of Hadamard (definitions below). By "satisfactory" we

mean that several different characterizations are known (over a

dozen as a matter of fact), that numerous concrete examples are

available, and that the E-property is stable wrt the formation of

subspaces, quotients and products.

a) Definition. Let (Ω, d) be a metric space, and Ω₀ a subset of Ω. Ω₀ is boundedly compact if its intersection with every closed ball in Ω is compact. Ω₀ is approximatively compact if for any x ∈ Ω, every minimizing sequence in Ω₀ (i.e., every sequence {xₙ} ⊂ Ω₀ for which d(x,xₙ) → d(x,Ω₀)) has a cluster point in Ω₀.

It is clear that bounded compactness ⟹ approximative compactness ⟹ proximinality in any nls (or in any metric space, for that matter). Simple examples in Hilbert space show, however, that neither of these implications is reversible.

b) Definition. A (real) Banach space X is an E-space if X is rotund and every weakly closed set in X is approximatively compact.

Such spaces were first introduced by Fan and Glicksberg [5], and characterized in several ways. We will establish one of their characterizations next. It depends in part on the theorem of James, alluded to in 30b), that a Banach space X is reflexive if (and only if) every element in X* attains its norm on U(X). We use the notation xₙ ⇀ x to denote weak convergence in a nls.

Theorem. A (real) Banach space X is an E-space if and only if X is reflexive, rotund, and xₙ, x ∈ S(X), xₙ ⇀ x, implies ||xₙ - x|| → 0 (that is, weak sequential convergence within S(X) entails norm convergence).

Proof. Using the weak compactness of closed balls in a reflexive space, it is straightforward to verify that these conditions imply the E-property. For the converse, reflexivity is obtained by applying James' Theorem and the approximative compactness of (closed) hyperplanes. Now let ||xₙ|| = ||x|| = 1, and xₙ ⇀ x. Choose φ ∈ S(X*) such that φ(x) = 1. Then φ(xₙ) → φ(x) = 1, so we may assume that all φ(xₙ) > 0. Let x̃ₙ = xₙ/φ(xₙ), so {x̃ₙ} ⊂ H ≡ {z ∈ X: φ(z) = 1}. Then {x̃ₙ} is a minimizing sequence for θ in H, and since H is approximatively compact by hypothesis, we may assume that x̃ₙ → x̃ ∈ H. But since x̃ₙ ⇀ x also, we have x̃ = x. Therefore, xₙ → x (norm convergence), qed.
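For example, in a Hilbert space the third condition is automatic: if xₙ, x ∈ S(X) and xₙ ⇀ x, then

   ||xₙ - x||² = ||xₙ||² - 2 re⟨xₙ, x⟩ + ||x||² → 1 - 2 + 1 = 0.

Since Hilbert space is also reflexive and rotund, the theorem exhibits it as an E-space; the spaces Lᵖ(μ), 1 < p < ∞, satisfy the same three conditions (cf. g) below).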

c) Before establishing our second characterization of E-spaces, we must introduce a stronger form of the definition in 27d) of a smooth nls.

Definition. A nls X is called strongly smooth if its norm is Frechet-differentiable on the open set X \ {θ}.

Lemma. (Shmulian) The norm is Frechet differentiable at x ∈ S(X) if and only if any sequence {φₙ} ⊂ U(X*) for which φₙ(x) → 1 is convergent.

Proof. Assume that the norm has a Frechet differential φ ∈ S(X*); we will show that any sequence {φₙ} as described in the Lemma must converge to φ. Since 1 ≥ ||φₙ|| ≥ φₙ(x) → 1, we may suppose that ||φₙ|| = 1. Now, if ||φₙ - φ|| ↛ 0, then ∃ ε > 0 and {zₙ} ⊂ S(X) such that (φₙ - φ)(zₙ) ≥ 2ε. Define

   xₙ = (1/ε)(||x|| - φₙ(x)) zₙ.

Then xₙ → θ, but

   (||x + xₙ|| - ||x|| - φ(xₙ))/||xₙ||
      ≥ (φₙ(x + xₙ) - 1 - φ(xₙ))/||xₙ||
      = ε(φₙ(x) - 1)/(1 - φₙ(x)) + (φₙ - φ)(zₙ)
      = (φₙ - φ)(zₙ) - ε ≥ ε,

which contradicts that φ is the Frechet differential of the norm at x.

Conversely, assume that the condition of the Lemma is satisfied. Then at least the norm has a gradient φ at x. For otherwise, by 10c), there are two distinct norm-subgradients φ, ψ at x, and then the sequence {φ, ψ, φ, ψ,...} violates the condition of the Lemma. Now if the norm is not Frechet differentiable at x, then ∃ ε > 0 and {xₙ} ⊂ X, xₙ → θ, such that

   (||x + xₙ|| - ||x|| - φ(xₙ))/||xₙ|| ≥ ε,

or

   ||x + xₙ|| - φ(x + xₙ) ≥ ε||xₙ||.

Choose φₙ ∈ S(X*) such that φₙ(x + xₙ) = ||x + xₙ||. Then

   φₙ(x) = ||x + xₙ|| - φₙ(xₙ) → ||x||,

since xₙ → θ. But

   ||φₙ - φ|| ≥ (φₙ - φ)(xₙ/||xₙ||) ≥ (φ(x) - φₙ(x))/||xₙ|| + ε ≥ ε,

since φ(x) = ||x|| ≥ φₙ(x), and so {φₙ} does not converge to φ. But this means that {φₙ} is not convergent, since any limit of {φₙ} must be a norm-subgradient at x. Thus we again arrive at a contradiction.

Corollary. Let X be a strongly smooth nls. Then the map which sends each x (≠ θ) in X into the gradient of the norm at x (= the Frechet differential of the norm at x) is continuous.

Proof. This gradient map must at least be continuous when X* is given its w*-topology, since its range lies in S(X*) ⊂ U(X*), and this latter set is w*-compact. But then the Lemma immediately implies that, in fact, the map is continuous when X* is given its norm topology.

Thus if X is strongly smooth, its norm is actually continuously Frechet differentiable on the open set X \ {θ}. It also follows from the Lemma that the norm is nowhere Frechet differentiable in such function spaces as C_R([0,1]) and L∞([0,1]).


d) Theorem. (Anderson) A (real) Banach space X is an E-space if and only if X* is strongly smooth.

Proof. If X is an E-space, then by the theorems in b) and 27d), X is reflexive and X* is smooth. We verify the Shmulian criterion at φ ∈ S(X*) by choosing {xₙ} ⊂ U(X) with φ(xₙ) → 1, and showing that {xₙ} is convergent. Now 1 ≥ ||xₙ|| ≥ φ(xₙ) → 1, so ||xₙ|| → 1. Let x̃ₙ ≡ xₙ/||xₙ||, and let x̃ be a weak cluster point of {x̃ₙ}. Then ||x̃|| ≤ 1, but φ(x̃) = lim φ(xₙ)/||xₙ|| = 1, so ||x̃|| = 1. Now the E-property implies that x̃ is a norm-cluster point of {x̃ₙ} (note that we have used the Eberlein theorem here, namely that U(X) is weakly sequentially compact). Because X* is smooth, x̃ is uniquely specified by the conditions: x̃ ∈ S(X) and φ(x̃) = 1. Therefore, x̃ₙ → x̃, and so xₙ → x̃ also. The converse implication is proved similarly, again making use of the Shmulian criterion.

e) We next want to mention some of the significance of E-spaces in optimization theory. Other uses of E-spaces are pointed out in §32-35 below.

Consider a variational pair (Ω,f) (11a)) where Ω is a metric space. We assume that the set of solutions of the associated mathematical program is a non-empty set Ω₀ ⊂ Ω. A minimizing sequence for (Ω,f) is any sequence {xₙ} ⊂ Ω for which f(xₙ) → inf f(Ω).

Definition. Such a mathematical program is called stable if every minimizing sequence {xₙ} for (Ω,f) satisfies d(xₙ,Ω₀) → 0. If the solution set Ω₀ is a singleton set, and the program is stable, it is called strongly solvable.

Theorem. A (real) Banach space X is an E-space if and only if every convex program (X, ||·|| + δ_K), where K is a closed convex subset of X, is strongly solvable.

Proof. The proof is straightforward; to show that this condition implies the E-property it is enough, by the proof in b), to show that every hyperplane K ⊂ X is approximatively compact.

Remark. A more general result has recently been obtained by Asplund and Rockafellar [1]. Namely, if X is a Banach space, and f a lsc function in Conv(X), then the convex program (X,f) is strongly solvable if and only if f* is Frechet differentiable at θ in X* (and then the Frechet differential is shown to belong to X, rather than just X**).

Corollary. (Regularization Algorithm for Convex Programs). Let f and X be as in the Remark, with X an E-space. Let {γₙ} decrease to the value of the convex program (X,f). Then the convex programs (X, ||·|| + δ_Ωₙ), where Ωₙ ≡ {x ∈ X: f(x) ≤ γₙ}, have unique solutions, and the resulting sequence converges to the element of minimal norm in the solution set Ω₀.

Thus any method of minimizing the norm in X over the convex sets Ωₙ leads to approximate solutions for the original program (X,f). This Corollary has been stated by Sholohovich [11].
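The algorithm is easy to run in a toy setting. In the sketch below (Python with numpy/scipy; the instance f, the sequence γₙ, and the use of ||x||² in place of ||x|| — which has the same minimizers — are our own illustrative choices, not from the text), X = R² and f(x) = (x₁ + x₂ - 2)², whose solution set is the line x₁ + x₂ = 2; the iterates approach its minimal-norm point (1,1).

    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: (x[0] + x[1] - 2.0)**2
    gammas = 2.0**(-np.arange(1, 15))     # gamma_n decreasing to inf f = 0
    x = np.zeros(2)
    for g in gammas:
        # minimize the norm over the level set Omega_n = {f <= gamma_n}
        res = minimize(lambda x: x @ x, x,
                       constraints=[{'type': 'ineq',
                                     'fun': lambda x, g=g: g - f(x)}])
        x = res.x
    print(x)                              # approximately (1, 1)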

f) It remains to give some examples of E-spaces. Once an initial class of E-spaces has been discovered, many other E-spaces may be constructed by use of the following operations.

Theorem. The E-property of Banach spaces is hereditary and divisible (that is, if M is a closed subspace of the E-space X, then M and X/M are E-spaces), and productive (in the sense that if X₁, X₂,... are all E-spaces, then P₂(Xₙ) is again an E-space).

(We recall that P₂(Xₙ) ≡ {(x₁,x₂,...): xₙ ∈ Xₙ and Σ||xₙ||² < +∞}, with

   ||(x₁,x₂,...)|| = (Σ||xₙ||²)^{1/2};

P₂(Xₙ) is called the ℓ²-product of the Banach spaces {Xₙ}.)

Proof. The first two assertions follow readily from the E-space characterizations in b) and d), respectively. Now since each Xₙ is in particular reflexive, and since the map

   (φ₁,φ₂,...) ↦ φ,   φ(x) ≡ φ((x₁,x₂,...)) = Σₙ₌₁^∞ φₙ(xₙ),

is an isometric isomorphism from P₂(Xₙ*) onto P₂(Xₙ)* (check!), we see that P₂(Xₙ) is reflexive. Also, by use of the Schwarz inequality in ℓ², we see that P₂(Xₙ) is rotund (since each Xₙ is). Finally, to complete the proof by use of b), suppose {x, x⁽ᵐ⁾} ⊂ S(P₂(Xₙ)) and x⁽ᵐ⁾ ⇀ x; we must show that ||x⁽ᵐ⁾ - x|| → 0. Given ε > 0, choose an index n₀ such that Σ_{n>n₀} ||xₙ||² < ε. Next, since xₙ⁽ᵐ⁾ ⇀ xₙ for each n (on account of the formula for P₂(Xₙ)*), we have

   lim inf_{m→∞} ||xₙ⁽ᵐ⁾|| ≥ ||xₙ||,

and so ||xₙ⁽ᵐ⁾|| → ||xₙ|| for each n (since Σₙ||xₙ⁽ᵐ⁾||² = 1 = Σₙ||xₙ||²). Therefore, because each Xₙ is an E-space, the result in b) implies ||xₙ⁽ᵐ⁾ - xₙ|| → 0, for each n. Thus we may choose an integer m₀ so that

   Σ_{n=1}^{n₀} ||xₙ⁽ᵐ⁾ - xₙ||² < ε

for m > m₀. Now, this implies in turn

   Σ_{n=1}^{n₀} (||xₙ⁽ᵐ⁾|| - ||xₙ||)² < ε,

and so

   1 - ε < Σ_{n=1}^{n₀} ||xₙ||² ≤ Σ_{n=1}^{n₀} ||xₙ⁽ᵐ⁾||² + 2√ε.

Consequently, Σ_{n>n₀} ||xₙ⁽ᵐ⁾||² < ε + 2√ε, and hence

   ||x⁽ᵐ⁾ - x||² = Σₙ₌₁^∞ ||xₙ⁽ᵐ⁾ - xₙ||²
      ≤ Σ_{n=1}^{n₀} ||xₙ⁽ᵐ⁾ - xₙ||² + 2 Σ_{n>n₀} (||xₙ⁽ᵐ⁾||² + ||xₙ||²)
      < ε + 2(ε + 2√ε) + 2ε,

whenever m > m₀, qed.


g) For most practical purposes the standard examples of E-spaces actually have the stronger property of "uniform rotundity", although there certainly are E-spaces without this property (the simplest example perhaps being P₂(ℓⁿ(2)) [3]). A discussion of both some uses and some limitations of uniformly rotund spaces in approximation theory is given in [6, 7].

Definition. Let K be a closed convex subset of a nls X. K is uniformly rotund if there is a non-decreasing function δ on [0,+∞), with δ(0) = 0, δ(t) > 0 for t > 0, such that

   (x + y)/2 + z ∈ K

whenever x,y ∈ K and ||z|| ≤ δ(||x-y||). By abuse of language, X is called uniformly rotund if U(X) satisfies this condition.

Clearly a uniformly rotund convex set is rotund (27b)). This definition is equivalent to the more usual definitions that X is uniformly rotund if either 1) for ε > 0 ∃ δ > 0 such that x,y ∈ U(X) and ||(x+y)/2|| > 1 - δ ⟹ ||x-y|| < ε; or 2) {xₙ,yₙ} ⊂ S(X) and ||xₙ + yₙ|| → 2 ⟹ ||xₙ - yₙ|| → 0.

Theorem. A uniformly rotund Banach space is an E-space.

Proof. We consider that the Banach space X is canonically embedded in X**. If X is not reflexive, ∃ x̂ ∈ S(X**) such that d(x̂,U(X)) ≡ 2ε > 0. If V is any w*- x̂-nbhd., then x̂ ∈ w*-cl(V ∩ U(X)), by the Goldstine density theorem [4, p. 424]. Let δ' ≡ 2δ(ε), where δ(ε) is as in the definition 1) of uniform rotundity. Choose φ ∈ S(X*) such that |x̂(φ) - 1| < δ'/2. Then define

   V = {ŷ ∈ X**: |ŷ(φ) - 1| < δ'/2}.

Now if x,y ∈ V ∩ U(X), then

   ||x+y|| ≥ |φ(x) + φ(y)| > 2 - δ',

whence ||x-y|| < ε. Thus for any fixed such x,

   V ∩ U(X) ⊂ x + ε U(X**).

Since the right hand side here is w*-closed, it follows that x̂ ∈ x + ε U(X**), that is, ||x̂ - x|| ≤ ε, a contradiction.

To complete the proof, we use definition 2) of uniform rotundity. Suppose that {xₙ, x} ⊂ S(X) and that xₙ ⇀ x. Choose φ ∈ S(X*) such that φ(x) = 1. Then

   ||xₘ + xₙ|| ≥ |φ(xₘ) + φ(xₙ)| → 2|φ(x)| = 2 as m,n → +∞,

and so ||xₘ - xₙ|| → 0. Since X is complete, xₙ → x' ∈ S(X); but also xₙ ⇀ x, whence x' = x, qed.

We have shown that uniform rotundity implies the E-property by verifying the conditions of the theorem in b). It is also possible to verify directly the conditions of the original definition in b), and this would be a bit shorter (cf. [2, p. 22]). The main interest in the above proof is that it establishes reflexivity independently of James' theorem, which was needed in b).

Example. For 1 < p < +∞, the spaces Lᵖ(μ), W^{p,k}(G), and S_p (defined in 27d)) are all uniformly rotund. (The elements of Lᵖ(μ) can even be vector-valued, taking values in some fixed uniformly rotund Banach space [9].) In particular, Hilbert space is uniformly rotund.
The last assertion follows readily from the parallelogram law.
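Explicitly, if x, y ∈ U(X) and ||x - y|| ≥ ε, then

   ||(x + y)/2||² = (||x||² + ||y||²)/2 - ||(x - y)/2||² ≤ 1 - ε²/4,

so that δ(ε) = 1 - (1 - ε²/4)^{1/2} serves in definition 1).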

The remaining assertions all hinge on the uniform rotundity of Lᵖ(μ). The most direct proof of this fact seems to be the one given recently by Morawetz [10]. The case of W^{p,k}(G) then follows from this and the easily checked fact that the finite ℓᵖ-product of uniformly rotund spaces is still uniformly rotund. Finally, the uniform rotundity of the operator spaces S_p has been established (at some length) by McCarthy [8].

We might also take note of one other class of uniformly rotund Banach spaces. Namely, any finite dimensional rotund nls is actually uniformly rotund. This follows easily from the compactness of the unit ball in such a space.



References for §31

1) E. Asplund and R. T. Rockafellar, Gradients of convex functions.


Trans. Amer. Math. Soc. 139(1969), 443-467.

2) E. W. Cheney, Introduction to Approximation Theory. McGraw-Hill,


New York, 1966.

3) M. Day, Reflexive Banach spaces not isomorphic to uniformly
convex spaces. Bull. Amer. Math. Soc. 47(1941), 313-317.

4) N. Dunford and J. Schwartz, Linear Operators, Part I. Inter-


science, New York, 1958.

5) K. Fan and I. Glicksberg, Some geometric properties of the spheres


in a normed linear space. Duke Math. J. 25(1958), 553-568.

6) R. Holmes, Approximating best approximations. Nieuw Arch. voor


Wisk. 14(1966), 106-113.

7) and B. Kripke, Smoothness of approximation, Mich.


Math. J. 15(1968), 225-248.

8) C. McCarthy, c_p. Israel J. Math. 5(1967), 249-271.

9) E. McShane, Linear functionals on certain Banach spaces. Proc.


Amer. Math. Soc. 1(1950), 402-408.

10) C. Morawetz, Two Lᵖ inequalities. Bull. Amer. Math. Soc.


75(1969), 1299-1302.

11) V. Sholohovich, Unstable extremal problems and geometric


properties of Banach spaces. Soviet Math. Dokl. 11(1970),
1470-1472.

§32. Metric projections

a) Definition. Let M be a subset of a nls X, and define

   P_M(x) ≡ {y ∈ M: ||x - y|| = d(x,M)}.

This set-valued mapping on X, x ↦ P_M(x), is called the metric projection on M.

It is clear that P_M(x) is a closed and bounded (but possibly void) subset of M, and is convex whenever M is convex. When M is a Chebyshev set, P_M is a single-valued mapping of X onto M, sometimes known as the "Chebyshev map", "best approximation operator", or "proximity map". It is a natural object of study in trying to understand the nature of a particular best approximation problem defined by some set M. We especially wish to learn for which sets of approximators M the metric projection is linear, or differentiable, or (at least) continuous. That there should be any question about the continuity of P_M when, say, M is a Chebyshev subspace of a Banach space, may seem surprising, but it turns out that even this modest property of best approximation can fail in general.

We consider first by far the most satisfactory setting for metric projections, namely, the case where X is an inner product space.

Example. Let K be a complete convex subset of an inner product space X. Then K is a Chebyshev set and P_K is a contraction on X:

   (1)   ||P_K(x) - P_K(y)|| ≤ ||x - y||.

When K is in addition a linear subspace of X, then P_K is the usual orthogonal projection of X onto K.

Let us just prove the first statement; the proof depends on the characterization of b.a.'s in 22d). Given x, y ∈ X, we have

   re ⟨P_K(x) - x, P_K(y) - P_K(x)⟩ ≥ 0,
   re ⟨y - P_K(y), P_K(y) - P_K(x)⟩ ≥ 0.

Addition of these two inequalities yields

   re ⟨y - x, P_K(y) - P_K(x)⟩ + re ⟨P_K(x) - P_K(y), P_K(y) - P_K(x)⟩ ≥ 0,

whence by the Schwarz inequality,

   ||x - y|| ||P_K(x) - P_K(y)|| ≥ ||P_K(x) - P_K(y)||²,

qed. This argument shows that equality can occur in (1) only if d(x,K) = d(y,K). It also shows that the metric projection P_K is a monotone mapping on X, since

   re ⟨y - x, P_K(y) - P_K(x)⟩ ≥ ||P_K(y) - P_K(x)||² ≥ 0.
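Both properties are easy to watch numerically. In the sketch below (Python with numpy; the set K and the random test points are our own choices, not from the text), K is the Euclidean unit ball in R², for which P_K(x) = x/max(1, ||x||) in closed form; the contraction inequality (1) and the monotonicity inequality are checked on random pairs.

    import numpy as np

    rng = np.random.default_rng(0)
    P = lambda x: x / max(1.0, np.linalg.norm(x))
    for _ in range(1000):
        x, y = rng.normal(size=2), rng.normal(size=2)
        assert np.linalg.norm(P(x) - P(y)) <= np.linalg.norm(x - y) + 1e-12
        assert (y - x) @ (P(y) - P(x)) >= -1e-12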

Either of the above properties of metric projections characterizes inner product spaces among general normed spaces, a point which emphasizes again how "unnatural" is the metric geometry associated with non-euclidean norms. For example, we have the following theorem, the proof of which depends on the Jordan-von Neumann and Kakutani characterizations of inner product spaces, and may be found in [22, p. 249].

Theorem. (James, Rudin-Smith) Let X be a nls of dimension at least 3, such that for all subspaces M of dimension n, where n is a fixed integer satisfying 1 ≤ n ≤ dim(X) - 2, M is Chebyshev and P_M is linear. Then X is an inner product space.

b) The restriction dim(X) ≥ 3 in the last theorem is essential, since all 2-dimensional rotund spaces have the property that P_M is linear for every subspace M. This follows from a more general fact about Chebyshev hyperplanes, which is a corollary to the next result.

Theorem. Let M be a Chebyshev subspace of a nls X. Then

1) P_M is idempotent;
2) P_M is closed (i.e., has closed graph);
3) P_M is homogeneous (i.e., P_M(tx) = tP_M(x), ∀ x ∈ X, ∀ scalars t);
4) P_M is additive mod M (i.e., P_M(x+y) = P_M(x) + P_M(y), if either x or y ∈ M).

The proof is completely routine; parts 3), 4) immediately imply the corollary mentioned above.

Corollary. Any Chebyshev hyperplane M in a nls has P_M linear.

c) Consider now the "fibres" defined by P_M, where M is some Chebyshev subspace of a nls X. The fibre over y ∈ M is the inverse image P_M⁻¹(y). All such fibres are isometric, being simply translates of one another:

   P_M⁻¹(y) = y + P_M⁻¹(θ).

Thus we need study only the fibre over θ, hereafter denoted M^θ, and called the metric complement of M in X. It consists of all x ∈ X satisfying ||x|| = d(x,M), such vectors being frequently said to be orthogonal to M. Evidently, M^θ is a closed and nowhere-dense subset of X. Also, from b), it follows that M^θ is a union of one-dimensional subspaces of X, and hence is contractible.

The metric complement M^θ can also be characterized by means of linear functionals, even if M is not Chebyshev. For, as has been noted by Murray and Singer, it is a consequence of the Hahn-Banach theorem that

   M^θ = {x ∈ X: ∃ φ ∈ S(M⊥) such that φ(x) = ||x||}.

Theorem. Let M be a Chebyshev subspace of a nls X. Then

1) M ⊕ M^θ = X;
2) Letting Q_M be the quotient map: X → X/M, we have

   (2) M^θ is convex ⟺
   (3) Q_M|M^θ is an isometry ⟸
   (4) P_M is a smooth mapping ⟺
   (5) P_M is linear.

Proof. 1) We must show that every x ∈ X can be uniquely expressed as a sum y + z, with y ∈ M, z ∈ M^θ. Since

   x = P_M(x) + (x - P_M(x)),

we need only check uniqueness. This reduces to showing that if z₁, z₂ ∈ M^θ and z₁ - z₂ = y ∈ M, then z₁ = z₂. But by b),

   z₁ = z₂ + y ⟹ P_M(z₁) = P_M(z₂) + P_M(y) = θ + y = y,

and since M is Chebyshev, we must have y = θ.

2) It is clear that (5) ⟹ (2), (3), and (4), and that (2) ⟹ (3), since if M^θ is convex, it is actually a linear subspace. Now assume (3) and let z₁, z₂ ∈ M^θ. Then

   ||z₁ - z₂|| = ||Q_M(z₁) - Q_M(z₂)|| = ||Q_M(z₁ - z₂)|| = d(z₁ - z₂, M),

which shows that z₁ - z₂ ∈ M^θ and so M^θ is a subspace, i.e., (2) holds. Finally, assume (4). This means by definition that

   P_M'(x;y) ≡ lim_{t→0} (P_M(x+ty) - P_M(x))/t

exists ∀ x, y ∈ X, and that the map x ↦ P_M'(x;y) is continuous for each fixed y. Now if δ > 0 then

   P_M'(δx;y) = δP_M'(x;δ⁻¹y) = P_M'(x;y).

Hence

   P_M'(x;y) = lim_{δ→0} P_M'(δx;y) = P_M'(θ;y) = P_M(y),

and therefore

   P_M(x+y) = P_M(x) + ∫₀¹ (d/dt)P_M(x+ty) dt
            = P_M(x) + ∫₀¹ P_M'(x+ty;y) dt = P_M(x) + P_M(y).

This completes the proof.

It should be noted that what makes the proof of (4) ⟹ (5) "work" is the continuity of P_M'(·;y) at θ. A more usual situation is that P_M'(·;y) is continuous on the open set X \ M, although P_M is not linear (provided the norm on X is sufficiently smooth; there is an extensive discussion of the differentiability of metric projections in [9]; see also [20]).

d) We consider next a few examples concerning the linearity of metric projections on Chebyshev subspaces of certain non-Hilbert spaces.

Examples. 1) Let X = ℓᵖ(3), 1 < p < ∞, and M = span({(1,1,1)}). Then P_M is linear only if p = 2. Because,

   M^θ = {(a,b,c) ∈ X: (d/dt)||(a,b,c) - t(1,1,1)||ᵖ|_{t=0} = 0}
       = {(a,b,c) ∈ X: a|a|^{p-2} + b|b|^{p-2} + c|c|^{p-2} = 0},

which is not a convex set if p ≠ 2 (see the numerical sketch following these examples). It follows that if μ is any positive measure on a measure space containing three disjoint sets of positive measure, then the corresponding characteristic functions span a subspace M of Lᵖ(μ) which is isometric to ℓᵖ(3) (or at least to a weighted ℓᵖ(3) space, but this does not affect the above example). Hence P_M is not linear on Lᵖ(μ).

2) More generally, Ando [1] has proved that a closed subspace M of Lᵖ(μ), where μ is a finite positive measure and 1 < p < ∞, has P_M linear if and only if the quotient space Lᵖ(μ)/M is isometrically isomorphic to some other Lᵖ(ν) space.

3) Let M be a finite dimensional Chebyshev subspace of C_R([0,1]). Then P_M is not linear. The proof of this is an easy consequence of a theorem of Daugavet [2], which asserts that if T is any compact linear operator on C_R([0,1]) then ||I + T|| = 1 + ||T||. (This result has been extended by Foias and Singer [5] to cover compact operators on spaces C_R(Ω), where Ω is perfect (i.e., has no isolated points).) Now suppose that P_M were linear. Then

   ||x - P_M(x)|| ≤ ||x - θ|| = ||x||,

whence we obtain the contradiction

   1 ≥ ||I - P_M|| = 1 + ||P_M|| ≥ 2.

Observe that this argument also demonstrates that no finite codimensional subspace in C_R(Ω) can be the range of a norm-one projection.
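Returning to Example 1, the failure of convexity of M^θ is concrete enough to compute (Python/numpy sketch, with p = 4 as our own illustrative choice, not from the text):

    import numpy as np

    # M^theta = {(a,b,c): a|a|^2 + b|b|^2 + c|c|^2 = 0} for p = 4
    g = lambda v: np.sum(np.sign(v) * np.abs(v)**3)
    z1 = np.array([1.0, 1.0, -2.0**(1/3)])   # g(z1) = 0
    z2 = np.array([1.0, -1.0, 0.0])          # g(z2) = 0
    print(g(z1), g(z2), g((z1 + z2)/2))      # ~0, 0, 0.75: the midpoint
                                             # lies outside M^theta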
e) We now come to the question of continuity of metric projections on Chebyshev subspaces. The basic sufficiency conditions are contained in the following theorem.

Theorem. Let M be a Chebyshev subspace of a nls X. Then P_M is continuous if either dim(M) < ∞ or else X is an E-space.

Proof. Suppose that X is an E-space and that xₙ → x in X. By the weak compactness of balls in X it follows that P_M(xₙ) ⇀ P_M(x). Further, {x - P_M(xₙ)} is a minimizing sequence for the norm on the coset x - M, because

   d(x, M) = ||x - P_M(x)||
      ≤ lim inf ||x - P_M(xₙ)||
      ≤ lim sup ||x - P_M(xₙ)||
      ≤ lim sup (||x - xₙ|| + ||xₙ - P_M(xₙ)||) = d(x, M).

By the definition of an E-space (31b)) it now follows that P_M(xₙ) → P_M(x), qed.

Remarks. 1) The preceding proof of continuity of P_M when X is an E-space works equally well when M is any weakly closed Chebyshev set in X. Thus, in an E-space, every convex b.a. problem is "well posed" in the sense of Hadamard: there is a unique solution which depends continuously on the point being approximated.

2) The E-property is not quite necessary for all metric projections P_M to be continuous. Lambert (unpublished) has shown that the dual of a Banach space constructed by Klee [11, p. 240] by suitably renorming ℓ² has all P_M continuous, but is "not quite" an E-space, because the Klee space fails to have a Frechet differentiable norm at a particular unit vector.

3) Recently Oshman [21] has announced nas conditions for all P_M to be continuous. They constitute a slight weakening of the weak-strong convergence implication in 31b). In view of Lambert's result, the dual of the Klee space is a concrete example wherein the Oshman conditions are fulfilled but the E-property is lacking.

4) It might be hoped that strengthening the E-property to uniform rotundity would result in the uniform continuity of the metric projections (and hence their Lipschitz continuity, since they are homogeneous maps). However, this fails to be true even in finite dimensions - [9, p. 246] shows that even pointwise Lipschitz continuity cannot be expected in general. What about the case of Lᵖ(μ) spaces? A result of Lindenstrauss [15, p. 270] implies in particular that if P_M is uniformly continuous on some reflexive nls X, then M is complemented in X. But Murray [19] has shown that infinite dimensional Lᵖ(μ) spaces contain closed subspaces without complements. On the brighter side, however, it has been proved [9, p. 236] that if Lᵖ(μ) is finite dimensional and 2 ≤ p, then every P_M is uniformly continuous.

5) There is also another positive result concerning the continuity of metric projections on a uniformly rotund Banach space X. Namely, the family of maps {P_M: M a closed subspace of X} is uniformly equicontinuous on any bounded subset of X [7, p. 109].
f) Next we record a few simple necessary conditions for the continuity of metric projections.

Theorem. Let M be a Chebyshev subspace of a nls X, and suppose that P_M is continuous. Then

1) P_M is an open mapping;
2) M^θ is a strong deformation retract of X;
3) M^θ is homeomorphic to a nls, and hence in particular is locally contractible.

Proof. 1) The fibre bundle (X, M, M^θ, P_M) is equivalent to the product bundle (M × M^θ, M, M^θ, P) under the homeomorphism x ↦ T(x) ≡ (P_M(x), x - P_M(x)). Here P: M × M^θ → M is projection on the first factor. Since such projection maps are always open, and since we clearly have P_M = P ∘ T, it follows that P_M is open.

2) The definition of a strong deformation retract (e.g. [4, p. 324]) requires us to show that the identity map of X is homotopic to a retraction of X onto M^θ in such a way that the points of M^θ remain fixed throughout the entire deformation. In the present case it is clear that the homotopy t ↦ I - tP_M, 0 ≤ t ≤ 1, meets these requirements.

3) An immediate consequence of the following theorem.

g) The next result reenforces earlier evidence of intimate connections between "smoothness" properties of a metric projection P_M and structural properties of the corresponding metric complement M^θ. It provides a nas condition for the continuity of P_M and has several interesting implications, one of which being the existence of discontinuous metric projections on C_R(Ω) for appropriate spaces Ω.

Theorem. (Holmes) Let M be a Chebyshev subspace of a nls X. Then the map Q ≡ Q_M|M^θ is a continuous norm-preserving bijection of M^θ onto X/M, and is a homeomorphism exactly when P_M is continuous.

Proof. The injectivity of Q is a consequence of the first part of the theorem in c). Suppose that Q⁻¹ is continuous on X/M, and let xₙ → x in X. Then xₙ + M → x + M in X/M, and so

   ||P_M(xₙ) - P_M(x)|| ≤ ||xₙ - P_M(xₙ) - (x - P_M(x))|| + ||xₙ - x||
      = ||Q⁻¹(xₙ + M) - Q⁻¹(x + M)|| + ||xₙ - x|| → 0.

Now suppose that P_M is continuous. Let xₙ + M → x + M in X/M and let ε > 0. Since P_M is continuous at Q⁻¹(x + M), ∃ δ > 0 such that ||z - Q⁻¹(x + M)|| < δ ⟹ ||P_M(z)|| < ε. Let V be the open set {z ∈ X: ||z - Q⁻¹(x + M)|| < min(δ, ε)}. Now Q_M(V) is an (x + M)-nbhd. in X/M and hence contains xₙ + M for n ≥ n₀, say. For each such n let zₙ ∈ V satisfy Q_M(zₙ) = xₙ + M. Then ∃ yₙ ∈ M so that xₙ - zₙ = yₙ, and hence P_M(xₙ) = yₙ + P_M(zₙ) = xₙ - zₙ + P_M(zₙ). Therefore,

   ||Q⁻¹(xₙ + M) - Q⁻¹(x + M)|| = ||xₙ - P_M(xₙ) - (x - P_M(x))||
      ≤ ||xₙ - P_M(xₙ) - zₙ|| + ||zₙ - (x - P_M(x))||
      = ||P_M(zₙ)|| + ||zₙ - (x - P_M(x))|| < 2ε

for n ≥ n₀, qed.

Several corollaries of this theorem are given in [8]; let us just list the following one here.

Corollary. (Cheney and Wulbert) Let M be a Chebyshev subspace of finite codimension in some nls. Then P_M is continuous if and only if M^θ is boundedly compact.

h) We can now give some examples of discontinuous metric projections. Historically, the first example of such pathological behavior is due to Lindenstrauss [16, p. 87]; the subspace M there is the 2-codimensional annihilator of the subspace span({t, t²}) ⊂ C_R([0,1]). Another example is given in [9, p. 245]; the subspace M again has codimension 2, it is contained in a rotund isomorph of ℓ∞, and the restriction of P_M to a line turns out to be discontinuous.

It has been shown by Garkavi [6] that if the (infinite) compact space Ω is the closure of its isolated points, then C_R(Ω) contains Chebyshev subspaces of all finite codimensions. We will continue to use the notation of 22e), and will also abbreviate ∫ x dμ to ⟨x,μ⟩.

Lemma. Let M be a Chebyshev subspace of finite codimension n in C_R(Ω), and let t ∈ Ω be not isolated in Ω. Then

1) θ ≠ μ ∈ M⊥ ⟹ t ∈ support(μ);
2) x ∈ S(C_R(Ω)) ∩ M^θ ⟹ |x(t)| = 1.

Proof. 1) It is enough to show that the open set Ω \ support(μ) is finite. In fact we can obtain a contradiction by assuming this set contains n or more points. If so, let N ≡ {x ∈ C_R(Ω): x(support(μ)) = 0}; we have dim(N) ≥ n. Choose a basis {μ, μ₂,...,μₙ} for M⊥. The subspace M₁ ≡ {x ∈ C_R(Ω): ⟨x,μᵢ⟩ = 0, i = 2,...,n} has codimension (n-1), hence ∃ y ∈ M₁ ∩ N, y ≠ θ, and since ⟨y,μ⟩ = 0 also, we have y ∈ M. But this implies that M is not Chebyshev. For, by the lemma in 30b) and by Exercise 33, we can choose z ∈ S(C_R(Ω)) such that z = ±1 on support(μ^±), and then the function x ≡ z(1 - |y|) has norm one, and has both θ and y as b.a.'s from M (check!).

2) From c) it follows that ∃ μ ∈ S(M⊥) such that 1 = ||x|| ||μ|| = ⟨x,μ⟩, whence x = ±1 on support(μ^±). We now conclude by use of 1), qed.

Theorem. (Morris) Let M be a Chebyshev subspace of C_R(Ω), satisfying 1 < codim(M) < +∞. If Ω contains infinitely many points, then P_M is discontinuous.

Proof. Since Ω is infinite we can choose some non-isolated point t ∈ Ω. Define A^± ≡ {x ∈ S(C_R(Ω)) ∩ M^θ: x(t) = ±1}. According to the lemma just proved, the sets A⁺ and A⁻ constitute a (closed) partition of S(C_R(Ω)) ∩ M^θ, so that this set is not connected. But since dim(X/M) ≥ 2, S(X/M) is connected. Consequently, the map Q_M|M^θ cannot be a homeomorphism, so by g), P_M is discontinuous.

Remarks. 1) This theorem has been generalized in [14, p. 210] to cover the case where C_R(Ω) is replaced by A_R(K), the space of real, affine, continuous functions on a Choquet simplex K.

2) A remarkable example has recently been discovered by Kripke (to appear). He has shown that Hilbert space may be renormed with a rotund norm so as to contain a subspace M with P_M discontinuous. This is the first example of such a metric projection acting on a reflexive space.

i) We recall that a continuous linear map between two lcs is weakly continuous (that is, continuous when both spaces have their weak topologies), and that the converse is true when the spaces involved are both Banach (or Frechet) spaces. Now since a metric projection P_M is generally not linear, it is not a priori clear whether or not P_M is weakly continuous. Once again, the answer depends on a topological property of M^θ.

Theorem. (Holmes, Kottman-Lin) Let M be a Chebyshev subspace of a nls X.

1) If M is finite dimensional, then P_M is w-continuous if (and only if) M^θ is w-closed;
2) If M is reflexive, then P_M is bw-continuous (resp. w-sequentially continuous) if (and only if) M^θ is bw-closed (resp. w-sequentially closed).

Proof. 1) To show that a map between two topological spaces is continuous, it is sufficient to show that the inverse image of each basic open set is open. Thus in the present situation it is sufficient to show that P_M⁻¹(y + r int(U(M))) is w-open for every y ∈ M and r > 0. If this is not the case for some such r and y, there is a net {x_α} w-convergent to x in X such that

   (6)   ||P_M(x) - y|| < r ≤ ||P_M(x_α) - y||

∀ α. Since P_M is norm-continuous by e), there is, for each α, a vector z_α ∈ co({x, x_α}) such that ||P_M(z_α) - y|| = r. That is,

   {z_α} ⊂ rS(M) + (y + M^θ),

and the set on the right is w-closed, since M^θ is w-closed by hypothesis, and S(M) is compact hence w-compact by finite dimensionality (we are using 3h) here). But {z_α} converges weakly to x, hence ||P_M(x) - y|| = r, contradicting (6).

2) We recall that in the bw-topology on X a set E ⊂ X is closed if and only if every bounded w-convergent net in E has its limit in E (cf. [3, p. 41]). Now let {x_α} be a bounded net (resp. sequence) w-convergent to some x in X. By b), {P_M(x_α)} is bounded and hence has a w-cluster point y ∈ M. Thus x is a w-cluster point of {x_α + (y - P_M(x_α))} ⊂ y + M^θ, which is bw-closed (resp. w-sequentially closed). Consequently, P_M(x) = y, qed.

This theorem does not completely settle the matter of deciding whether or not some particular metric projection is w-continuous, since it is not always so easy to decide whether or not some non-convex set is w-closed. Let us briefly indicate a few instances where the theorem has played a role.

Examples. 1) Let M be any closed subspace of ℓᵖ, 1 < p < ∞, p ≠ 2. Then P_M is w-sequentially continuous [8].

2) Let M be a finite dimensional subspace of Lᵖ(μ), 1 < p < ∞, p ≠ 2, where μ is a separable non-atomic measure (e.g., Lebesgue measure on [0,1]). Then P_M is not w-sequentially continuous at any point of Lᵖ(μ); in fact, M^θ is w-sequentially dense in Lᵖ(μ) [13].

3) The subspace Pₙ of nth-degree polynomials is Chebyshev in C_R([0,1]), but by the Alternation Theorem 25d), Pₙ^θ is not w-closed.

j) In all the foregoing discussion we have assumed that the metric projection P_M, whose continuity properties we have been studying, is single-valued, that is, M is a Chebyshev subspace. Now as we know, there are many examples of proximinal subspaces which are not Chebyshev. Among these are the non-Haar subspaces of C(Ω), the subspace of continuous functions in ℓ∞(Ω) (30c)), the subspace of compact operators in the space of all bounded operators on Hilbert space [10], and the finite dimensional subspaces of L¹(μ), μ sigma-finite and non-atomic [12, 18]. What can be said about the continuity of the metric projection in such cases?

Definition. A mapping T: X → 2^Ω, where X and Ω are topological spaces, is said to admit a continuous selection if there is a continuous function F: X → Ω (the selection) such that F(x) ∈ T(x) ∀ x ∈ X.

Thus one particular question which might be posed for a given proximinal subspace M of a nls X is whether or not P_M admits a continuous selection. The answer to this question is known to be affirmative whenever P_M is lsc on X, or, more strongly, whenever P_M is continuous wrt the Hausdorff metric on the closed bounded subsets of M. (This affirmative answer is a special case of the Michael selection theorem [17], which is applicable in the present situation provided the subspace M is complete.) Finite dimensionality of M is no help here. For example, if M is a finite dimensional subspace of L¹(μ) (μ as above), then P_M is not lsc nor does it admit a continuous selection [14]. The answer to the analogous question for C(Ω) is not as clear at present: some non-Haar subspaces of C([0,1]) do exist for which there is a continuous selection, but it is apparently not known how such subspaces are to be recognized in general.

We conclude with one strong positive assertion about the continuity of a particular class of metric projections, namely those shown to exist in 30c).

Theorem. (Kripke) Let M be the (proximinal) subspace of continuous functions in ℓ^∞_R(Ω), where Ω is paracompact. Then P_M is Lipschitz continuous wrt the Hausdorff metric on the closed bounded subsets of M, and in particular, P_M admits a continuous selection.

Proof. We use the notation of 30c). Let x, z ∈ ℓ^∞_R(Ω) and put ε ≡ ||x − z||. Then

x_* − ε ≤ z_* ≤ z* ≤ x* + ε.

For any y ∈ P_M(x), 30c) implies

x* − A(x) ≤ y ≤ x_* + A(x).

Therefore,

z* − ε − A(x) ≤ y ≤ z_* + ε + A(x),
z* − 2ε − A(z) ≤ y ≤ z_* + 2ε + A(z).

It follows that

v ≡ max(z* − A(z), y − 2ε) ≤ min(z_* + A(z), y + 2ε) ≡ u.

Now v is usc and u is lsc, so the Dieudonné interposition theorem provides a continuous function w interposed between u and v: v ≤ w ≤ u. We now have both

||w − y|| ≤ 2ε and z* − A(z) ≤ w ≤ z_* + A(z),

so that by 30c), w ∈ P_M(z), and thus d(y, P_M(z)) ≤ 2ε. By the symmetric roles played by x and z we are able to conclude that the Hausdorff distance between the sets P_M(x) and P_M(z) is at most 2ε ≡ 2||x − z||, qed.

References for §32

1) T. Ando, Contractive projections in L_p spaces. Pac. J. Math. 17(1966), 391-405.

2) I. Daugavet, A property of completely continuous operators in


the space C. Uspehi Mat. Nauk 18(1963), 157-158.(Russian)

3) M. Day, Normed Linear Spaces. Academic Press, New York, 1962.

4) J. Dugundji, Topology. Allyn and Bacon, Boston, 1966.

5) C. Foias and I. Singer, Points of diffusion of linear operators,


Math. Zeit. 87(1965), 434-450.

6) A. Garkavi, Approximative properties of subspaces with finite


defect in the space of continuous functions. Sov. Math.
5(1964), 440-443.

7) R. Holmes, Approximating best approximations. Nieuw Arch. voor


Wisk. 14(1966), 106-113.

8) ________, On the continuity of best approximation operators. Proc. Symp. Inf. Dim. Topology, Annals of Math. Study #69, Princeton Univ. Press, to appear.

9) ________ and B. Kripke, Smoothness of approximation. Mich. Math. J. 15(1968), 225-248.

10) ________, Best approximation by compact operators. Ind. Univ. Math. J., to appear.

11) V. Klee, Two renorming constructions related to a question of Anselone. Studia Math. 23(1969), 231-242.

12) B. Kripke and T. Rivlin, Approximation in the metric of L^1(X, μ). Trans. Amer. Math. Soc. 119(1965), 101-122.

13) J. Lambert, The weak sequential continuity of the metric projection in L^p spaces. Dissertation, Purdue Univ., 1970.

14) A. Lazar, P. Morris, and D. Wulbert, Continuous selections for


metric projections. J. Func. Anal. 3(1969), 193-216.

15) J. Lindenstrauss, On nonlinear projections in Banach spaces.


Mich. Math. J. 11(1964), 263-287.

16) , Extension of compact operators. Mem. Amer.


Math. Soc. #48, 1964.

17) E. Michael, Selected selection theorems. Amer. Math. Monthly


63(1956), 233-238.

18) R. Moroney, The Haar problem in L^1. Proc. Amer. Math. Soc. 12(1961), 793-795.

19) F. Murray, On complementary manifolds and projections in L_p and ℓ_p. Trans. Amer. Math. Soc. 41(1937), 138-152.

20) T. Newman and P. Odell, On the concept of a p-q generalized


inverse of a matrix. SIAM J. Appl. Math. 17(1969),
520-525.

21) E. Oshman, On continuity of metric projections onto some classes of subspaces in a Banach space. Sov. Math. 11(1970), 1521-1523.

22) I. Singer, Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin-Heidelberg, 1970.

§33. Optimal Estimation

The problems which will concern us now are special kinds of

convex programs motivated by the following practical situation. A physical object P is assumed to be adequately modeled by an unknown element x_P of some nls X. Typically X is a function space such as L^∞([a,b]) or C_R([a,b]) (or perhaps a finite product of such spaces). Via appropriate experiments, observations are taken of P, leading to a limited amount of information about x_P. For example, it may be possible to evaluate x_P (or some of its derivatives) at certain points, to compute certain moments or Fourier coefficients, to estimate the values of certain semi-norms at x_P, etc. Of course there will be experimental inaccuracies. We assume that the data thus accumulated is inadequate to specify x_P completely; it merely delineates some subset A ⊂ X, and our knowledge can be summarized by "x_P ∈ A". The problem is to obtain an estimate for x_P which is in some sense optimal.

a) Let X be a real nls and A ⊂ X. We consider the problem of choosing an element of X which best represents the set A. If x is any particular element of X chosen to represent the set A, the error incurred will be sup{||x − y|| : y ∈ A}. In order for this quantity to be finite it is nas that A be bounded, so we will make that assumption. Thus an x ∈ X will best represent the set A when the above error is a minimum.

Definition. Let A be a bounded set ⊂ X. A center (or Chebyshev center) of A is an element x_0 ∈ X for which

(1) sup{||x_0 − y|| : y ∈ A} = inf sup{||x − y|| : y ∈ A},

where the infimum is taken over all x ∈ X. The number on the right in (1) is called the Chebyshev radius of A, denoted r(A).

Thus r(A) is the radius of the smallest ball in X (if one exists) which contains the set A, and the centers of all such balls are just the centers of A in the above definition. We denote the collection of such centers by E(A). Referring back to the opening paragraph of this section we see that the estimation problem posed there is in principle resolved by our definition: an optimal estimate of x_P will be any element of E(A), where A is determined by the experimental data.

The set A is usually defined by means of affine and convex constraints:

A = {x ∈ X : φ_α(x) = c_α, ψ_β(x) ≤ d_β},

where {φ_α} ⊂ X* and {ψ_β} ⊂ Conv(X). (There may, however, also be some qualitative information which immediately constrains A to lie in some subspace or cone in X.) Thus A is usually closed and convex, as well as bounded. In any event this much may always be assumed, since for any (bounded) A, E(A) = E(c̄o(A)).

It is possible to make our estimation problem a bit more sophisticated by limiting our search for optimal estimates to some (convex) subset of X, e.g., a finite dimensional subspace. Although in practice this may be an important consideration for computational purposes, we will not pursue it further here. The interested reader may consult [13] for some results in this direction when A is compact.

b) The function

F_A(x) ≡ sup{||x − y|| : y ∈ A}

is evidently convex and (Lipschitz) continuous on X. Consequently, E(A), as the set of all solutions of the convex program (X, F_A), is a (bounded) closed convex subset of X. Now, as we recall from 11b), a nas condition for x_0 to belong to E(A) is that θ ∈ ∂F_A(x_0). As always, the efficacy of this condition depends on our ability to subdifferentiate the function in question; in this case, F_A.

A formula of some usefulness for the subdifferential at a point

of a function equal to the supremum of convex functions has been

given by Valadier [18]. This formula assumes an especially pleasant

appearance when the index set is compact.

Lemma. (Valadier) Let X be a real lcs, Ω a compact space, {f_t : t ∈ Ω} ⊂ Conv(X), f ≡ sup{f_t : t ∈ Ω}, and x_0 ∈ X. If there is an x_0-nbhd. V such that (t, x) ↦ f_t(x) is continuous on Ω × V, then f is continuous on V and

(2) ∂f(x_0) = c̄o{∂f_t(x_0) : t ∈ Ω_0},

where Ω_0 ≡ {t ∈ Ω : f(x_0) = f_t(x_0)}, and the closure is taken in the w*-topology.

Proof. The continuity hypothesis entails the equicontinuity of {f_t} on V, whence f is usc. Being also lsc by definition, it is continuous. Next, forming difference quotients and taking into account the definition of Ω_0, we see that

(3) f′(x_0; x) ≥ f_t′(x_0; x), ∀ t ∈ Ω_0, ∀ x ∈ X.

Hence, by 8a-3) we have

∂f(x_0) ⊃ ∂f_t(x_0), ∀ t ∈ Ω_0,

and therefore the inclusion from right to left in (2) is valid.

In order to reverse this inclusion, it is sufficient, in view of the Strong Separation Theorem 3h), to show that any w*-closed hyperplane containing ∪{∂f_t(x_0) : t ∈ Ω_0} must also contain ∂f(x_0). That is, if for some z ∈ X and some scalar λ we have

⟨z, φ⟩ ≤ λ

valid for every φ ∈ ∂f_t(x_0) and every t ∈ Ω_0, then the same inequality also holds for every φ ∈ ∂f(x_0). Recalling the Moreau-Pshenichnii formula 10b), this amounts to showing that

sup{f_t′(x_0; z) : t ∈ Ω_0} ≤ λ ⟹ f′(x_0; z) ≤ λ.

Thus it will suffice to show that for any fixed z ∈ X, there is an index t_0 ∈ Ω_0 such that

(4) f′(x_0; z) ≤ f_{t_0}′(x_0; z).

Since we already have the inequality (3), what we are actually about to demonstrate is that

(5) f′(x_0; z) = max{f_t′(x_0; z) : t ∈ Ω_0}.

Fix some β < f′(x_0; z). Then taking into account 7a), we see that

B_α ≡ {t ∈ Ω : (f_t(x_0 + αz) − f(x_0))/α > β}

is non-empty for every α, 0 < α ≤ α_0, where α_0 is chosen small enough that x_0 + α_0 z ∈ V. Now the sets B_α are compact and they decrease with α (again by 7a) and f_t(x_0) ≤ f(x_0), t ∈ Ω). Consequently, there exists

t_0 ∈ ∩{B_α : 0 < α ≤ α_0}.

For every such α,

f_{t_0}(x_0 + αz) ≥ f(x_0) + αβ,

whence f_{t_0}(x_0) ≥ f(x_0), that is, t_0 ∈ Ω_0. Therefore, for 0 < α ≤ α_0,

(f_{t_0}(x_0 + αz) − f_{t_0}(x_0))/α > β.

This inequality establishes the inequality (4), and hence the formula (5). The proof is complete.

Remarks. 1) We see that, under the hypotheses of the Lemma, ∂f(x_0) depends only on the functions f_t for which t ∈ Ω_0. In other words, ∂f(x_0) = ∂g(x_0), where g ≡ sup{f_t : t ∈ Ω_0}. (Of course, the "sup" here is really a "max".)

2) The proof of formula (5) only requires continuity of the functions t ↦ f_t(x), for x ∈ V.

To apply formula (2) to the computation of ∂F_A(x_0), we assume that A is compact in X. Then defining f_y(x) = ||x − y||, we can write

F_A(x) = sup{f_y(x) : y ∈ A}.

Now the subdifferentials ∂f_y(x) were in effect described in the course of the proof in 22b), namely

(6) ∂f_y(x_0) = {φ ∈ S(X*) : φ(x_0 − y) = ||x_0 − y||}.

Combining (2) and (6) we obtain

(7) ∂F_A(x_0) = c̄o{φ ∈ S(X*) : φ(x_0 − y) = F_A(x_0), for some y ∈ A},

where, as usual, the closure is taken in the w*-topology on X*.

The restriction to compact sets A is of course severe, but formulas for ∂F_A(x_0) when A is non-compact become even more unwieldy than (7). The authors of [13] suggest an instance of practical occurrence which leads to the necessity for estimating compact sets, namely when one is trying to approximate a continuous function which depends on several inexactly specified parameters:

x ↦ f(x, λ_1, ..., λ_m).

Assuming that enough is known about the parameters to permit the assertion α_i ≤ λ_i ≤ β_i, for each i, one is led to a compact

family of functions. For an extensive discussion of this problem,

based on formula (7), we refer to [13].
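In finite dimensions the convex program (X, F_A) can be handed directly to a numerical solver. The following minimal sketch is our own illustration (the point set, the SLSQP solver choice, and the starting guess are assumptions, not part of the text); it computes a Euclidean Chebyshev center by minimizing the radius r subject to ||x − y_i|| ≤ r:

```python
import numpy as np
from scipy.optimize import minimize

def chebyshev_center(points):
    """Center and radius of the smallest Euclidean ball containing `points`:
    the convex program  min_{x,r} r  subject to  ||x - y_i|| <= r."""
    pts = np.asarray(points, dtype=float)
    v0 = np.concatenate([pts.mean(axis=0), [np.ptp(pts) + 1.0]])  # (x, r)
    cons = [{"type": "ineq",
             "fun": lambda v, y=y: v[-1] - np.linalg.norm(v[:-1] - y)}
            for y in pts]
    res = minimize(lambda v: v[-1], v0, method="SLSQP", constraints=cons)
    return res.x[:-1], res.x[-1]

center, radius = chebyshev_center([(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)])
print(center, radius)   # approximately (1, 0) and 1 for this triangle
```

(This particular triangle reappears in g) below as an example of a centerable set.)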

c) Let us now consider the existence question for centers of bounded sets in a nls X. For brevity we will say that X "admits centers" if ∅ ≠ A (bounded) ⊂ X ⟹ E(A) ≠ ∅. The classical Banach spaces are known to admit centers, but in general a Banach space need not admit centers even for finite sets. Before presenting the main sufficiency condition, it is convenient to introduce a definition.

Definition. A subspace M of a nls X is constrained in X if M is the range of a norm-one projection defined on X.

Of course, a constrained subspace must be closed. All closed subspaces of a Hilbert space are constrained; on the other hand, as was observed in 32d), no finite codimensional subspace of C_R(Ω) is constrained (Ω perfect). For present purposes, we are interested in Banach spaces X which (when canonically embedded) are constrained in X**. It is known that such is the case if either X is a dual space or an L^1(μ) space.

Examples. 1) Let Y be a nls; then Y* is constrained in Y*** (= (Y*)**). Indeed, an appropriate norm-one projection is the map restricting a given element of Y*** to Y.

2) Let X = L^1(μ). Then it is known (Kakutani; [4, p. 100]) that X** is an (AL)-space. But Dean [5] has shown that every closed sublattice of an (AL)-space is constrained. Therefore, X is constrained in X**. (It is possible to proceed much more directly in special cases. For instance, let μ be Lebesgue measure on [a,b]. Given Φ ∈ X**, Φ defines an element Φ_1 ∈ C_R[a,b]* by restriction. Identifying Φ_1 with a normalized function of bounded variation on [a,b], we define P(Φ) = dΦ_1/dt, so P : X** → X. Now

∫_a^b |dΦ_1/dt| dt ≤ var(Φ_1) = ||Φ_1|| ≤ ||Φ||,

so that ||P|| ≤ 1. Finally, suppose that Φ ≡ x ∈ X. In this case, Φ_1 is the indefinite integral of x, whence P(Φ) = dΦ_1/dt = x, qed.)

Theorem. (Garkavi) If a Banach space X is constrained in X**, then X admits centers.

Proof. Suppose it known that every dual space admits centers. Then given A (bounded) ⊂ X, there is a center for A in X**. But the image of this center under the assumed norm-one projection on X** is easily seen to be a center for A in X. Thus we are reduced to the case where X is a dual space. For n = 1, 2, ..., we can find x_n ∈ X such that

sup{||x_n − y|| : y ∈ A} < r(A) + 1/n.

Let x_0 be a w*-cluster point of the sequence {x_n}; then because ||·|| is w*-lsc on X we have immediately ||x_0 − y|| ≤ r(A), for every y ∈ A, that is, x_0 ∈ E(A), qed.

d) The condition of the last theorem is certainly not necessary for a space X to admit centers. For example, the space c_0 is not even complemented in its second dual (= m ≡ ℓ^∞), yet Garkavi [6] proved that it does admit centers. In this section we show, more significantly, that the spaces C_R(Ω) admit centers. This was originally proved by Kadets and Zamyatin [11] for the case where Ω = [a, b], but their proof easily generalizes. In fact, it will be seen that their proof carries over to the case where X is the space of bounded continuous functions on a paracompact space Ω. The whole approach is quite reminiscent of 30c).

Given A (bounded) ⊂ C_R(Ω), we define

a̲(t) = inf{x(t) : x ∈ A},
ā(t) = sup{x(t) : x ∈ A},
a_*(t) = lim inf{a̲(s) : s → t},
a*(t) = lim sup{ā(s) : s → t}.

Then the function a* − a_* is usc on Ω, hence attains its maximum value ≡ 2r at a point t_0 ∈ Ω.

Lemma. The number r just defined satisfies r ≤ r(A).

Proof. For any z ∈ C_R(Ω), we show that F_A(z) ≡ sup{||y − z|| : y ∈ A} ≥ r. Given ε > 0, there is a t_0-nbhd. N on which the oscillation of z is < ε/2. By definition of r and t_0, we must have either

(a* − z)(t_0) ≥ r, or (z − a_*)(t_0) ≥ r,

say the former. By definition of a* and ā we can first find s ∈ N for which ā(s) > z(t_0) + r − ε/2, and then an x ∈ A for which x(s) > z(t_0) + r − ε/2. By definition of N, we then obtain

F_A(z) ≥ ||x − z|| ≥ (x − z)(s) > r − ε,

and since ε was arbitrary, this completes the proof.

Theorem. The space C_R(Ω) admits centers. If A is any bounded subset of C_R(Ω), we have

(8) E(A) = {x ∈ C_R(Ω) : a* − r ≤ x ≤ a_* + r},

where a*, a_*, and r were just defined. In particular, r = r(A).

Proof. Dieudonné's Interposition Theorem (cf. 30c)) guarantees that the right hand side of (8) is non-void. But choosing any x in such a fashion entails

||x − y|| ≤ r, ∀ y ∈ A,

and so the Lemma yields x ∈ E(A), and r = r(A). On the other hand, if we have any center x_0 for A, then ∀ t ∈ Ω, ∀ y ∈ A,

y(t) − r ≤ x_0(t) ≤ y(t) + r,

hence

ā(t) − r ≤ x_0(t) ≤ a̲(t) + r,

and finally,

a* − r ≤ x_0 ≤ a_* + r,

and so the proof is complete.

Corollary. If A is a compact subset of C_R(Ω), then the function (a̲ + ā)/2 belongs to E(A).

This follows because a̲ and ā are continuous, due to the equicontinuity of A, and so a̲ = a_*, ā = a*. The Corollary thus

provides a simple formula for a center of a compact set in C_R(Ω). Such formulas are generally not available in other Banach spaces.
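For a finite family of continuous functions sampled on a grid, the Corollary can be exercised in a few lines. A minimal sketch (the three functions are invented test data; any finite family is compact):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 201)
A = np.array([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])

a_lo = A.min(axis=0)                 # the lower envelope a(t), underlined
a_hi = A.max(axis=0)                 # the upper envelope a(t), overlined
center = (a_lo + a_hi) / 2           # a center, by the Corollary
radius = (a_hi - a_lo).max() / 2     # r(A) = (1/2) max of (upper - lower)

# every member of A lies within `radius` of the center in the sup norm:
assert np.abs(A - center).max() <= radius + 1e-12
print(radius)
```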

e) We now turn to the uniqueness problem for centers. Not surprisingly, the answer hinges on rotundity properties of the unit ball. A precise though somewhat unusual condition is known which is nas for every bounded set to have at most one center. This condition is that the unit ball should be "uniformly rotund in every direction" [6, 10]. This property is known to be weaker than uniform rotundity. In fact, there exist reflexive spaces having this property yet not isomorphic to uniformly rotund spaces. On the other hand, the property is definitely stronger than mere rotundity. For example, addition of the L² norm to the uniform norm turns C_R[0,1] into a rotund space which is not uniformly rotund in every direction. This property is discussed in detail in [10] and will not be considered further here. It appears that for most practical purposes the following two sufficient conditions are adequate.

Theorem. (Klee, Garkavi) Let X be a uniformly rotund Banach space. Then every bounded subset of X has a unique center in X.

Proof. We know (31g)) that X is an E-space, hence reflexive, so that by c) X admits centers. Now let A ⊂ X and suppose x_1, x_2 ∈ E(A). Then also x_0 ≡ (x_1 + x_2)/2 ∈ E(A) and we can choose {y_n} ⊂ A such that ||x_0 − y_n|| → r(A). Now

x_0 − y_n = (1/2)(x_1 − y_n) + (1/2)(x_2 − y_n)

and ||x_i − y_n|| ≤ r(A) (i = 1, 2), hence

lim ||x_i − y_n|| = r(A).

But also

lim ||(x_1 − y_n) + (x_2 − y_n)|| = lim ||2(x_0 − y_n)|| = 2r(A).

Consequently, by uniform rotundity,

0 = lim ||(x_1 − y_n) − (x_2 − y_n)|| = ||x_1 − x_2||,

and so E(A) is a singleton, qed.

Entirely similar arguments establish that with respect to the Hausdorff metric on the closed bounded convex subsets of a uniformly rotund space, the functions r(·) and E(·) are continuous. (In fact, r(·) is always continuous for any nls.) This statement, along with the preceding theorem, naturally reminds us of the analogous fact that in such spaces every convex best approximation problem is well posed. However, this last assertion is even valid in E-spaces, as was noted in 32e). It is apparently not known whether the E-property also suffices for the uniqueness and stability (i.e., continuity of E(·)) of centers for arbitrary bounded subsets, but it does suffice if we restrict ourselves to the consideration of compact sets.

Theorem. (P. Smith) If X is a rotund space then every compact set in X has at most one center in X. If X is an E-space, then each compact set in X has a unique center and E(·) is continuous with respect to the Hausdorff metric on the compact (convex) subsets of X.

Proof. Let x_1, x_2 ∈ E(A) for some compact A ⊂ X. Then (x_1 + x_2)/2 ∈ E(A) and ∃ y ∈ A such that

r(A) = ||(x_1 + x_2)/2 − y|| ≤ (1/2)||x_1 − y|| + (1/2)||x_2 − y|| ≤ r(A).

In order to avoid having a line segment on the sphere r(A)S(X) we must therefore have x_1 − y = x_2 − y, or x_1 = x_2.

Now let X be an E-space, and {A_n} a sequence of compact subsets converging in the Hausdorff metric to a compact subset A. Let E(A_n) = {x_n} and E(A) = {x_0}. Choose y_n ∈ A_n such that ||x_n − y_n|| = r(A_n). For any y ∈ A and any w-cluster point x of {x_n}, we have

||x − y|| ≤ lim inf ||x_n − y|| ≤ lim inf ||x_n − y_n|| = lim inf r(A_n) = r(A),

which shows that x ∈ E(A). Consequently, x_n ⇀ x_0. Now, given y_0 ∈ A satisfying ||x_0 − y_0|| = r(A), we have x_n − y_0 ⇀ x_0 − y_0. Therefore,

r(A) = ||x_0 − y_0|| ≤ lim inf ||x_n − y_0|| ≤ lim sup ||x_n − y_0|| ≤ lim sup ||x_n − y_n|| = lim sup r(A_n) = r(A),

and so the E-property entails x_n − y_0 → x_0 − y_0, hence x_n → x_0, qed.

f) In contrast with the best approximation problem we have in the present circumstances a new problem of location. Given A (bounded) in a nls X we have already noted that E(A) = E(c̄o(A)), but where is E(A) wrt c̄o(A)? In particular, do we have E(A) ⊂ c̄o(A), or at least E(A) ∩ c̄o(A) ≠ ∅? Unfortunately, the answer to even the latter question is generally negative, as we see next.

Theorem. (Klee, Garkavi) For a nls X, the following assertions are equivalent:

1) for each bounded A ⊂ X, E(A) ∩ c̄o(A) ≠ ∅;

2) dim X ≤ 2 or else X is a Hilbert space.

Proof. Let X be a Hilbert space, {x_0} = E(A), and suppose x_0 ∉ c̄o(A). Applying 3h), we strongly separate x_0 from c̄o(A) by a hyperplane H; we may assume that θ ∈ H, x_0 ∉ H. We set h ≡ P_H(x_0) (32a)) and consider any y ∈ A. If z is the point where the line segment [x_0, y] intersects H, then we have

||h − z|| = ||P_H(x_0 − z)|| < ||x_0 − z||,

||h − y|| ≤ ||h − z|| + ||y − z|| < ||x_0 − z|| + ||y − z|| = ||x_0 − y||.

This implies that h ∈ E(A) with h ≠ x_0, and thereby contradicts uniqueness of x_0.

For the converse, we may assume that A contains at least 3 points, and that dim X ≥ 3. By well-known characterizations of inner-product spaces (Jordan-von Neumann, Kakutani), it will suffice to assume dim X = 3, and to construct a norm-one projection from X onto a fixed but arbitrary 2-dimensional subspace L ⊂ X. Once it is known that X must be an inner-product space the proof will be accomplished, for if any nls X satisfies condition 1) above, it must be complete (if not, let A be the intersection of X with a ball centered at a point of the completion of X not in X; then E(A) is void).

Now if z_0 ∈ X\L is fixed, the sets

D_n ≡ {x ∈ L : ||x − z_0|| ≤ n},
Γ_n ≡ {x ∈ L : ||x − z_0|| = n}

are non-empty for large n. For y ∈ Γ_n let

S(y) ≡ {x ∈ L : ||x − y|| ≤ n}.

We now apply Helly's theorem [3] to conclude that ∩{S(y) : y ∈ Γ_n} ≡ S_n ≠ ∅. The hypothesis of Helly's theorem, namely that every three S(y)'s have non-void intersection, is justified by applying condition 1) to any 3-point subset of Γ_n. If x_n ∈ S_n then ||x_n − y|| ≤ ||y − z_0||, ∀ y ∈ Γ_n, and now a geometric argument shows that

(9) ||x − x_n|| ≤ ||x − z_0||,

for every x ∈ D_n. The sequence {x_n} is bounded in L, hence has a cluster point x_0 ∈ L. Because L is the union of the D_n, we see that ||x − x_0|| ≤ ||x − z_0||, ∀ x ∈ L. We now define a projection P : X → L via

P(t z_0 + x) = t x_0 + x,

for all scalars t and x ∈ L; taking into account (9) we obtain ||P|| = 1, qed.

g) Let A be a bounded set in some nls X. It is clear that

(10) r(A) ≥ (1/2) diam(A),

where diam(A) is the ordinary metric diameter of A. Let us say that A is centerable if equality holds in (10). In general we expect this property to depend on the shape of A vis-a-vis the shape of U(X). For example, in R² consider the triangles A_1 and A_2, where both A_i have vertices at (0, 0) and (2, 0), and the third vertex of A_1 (resp. A_2) is at (1, 1) (resp. (1, √3)). Now both A_i have diameter 2 wrt either the Euclidean norm or the sup norm, and both are centerable wrt the latter norm. However, although A_1 is also centerable wrt the Euclidean norm, A_2 is not. That the (equilateral) triangle A_2 is not centerable wrt the Euclidean norm is only a special case of the following classical result (Jung, 1901): if A ⊂ ℓ²(n), then

(11) r(A) ≤ (n/(2n + 2))^{1/2} diam(A),

with a nas condition for equality being that A is a regular simplex [2, 3]. The infinite dimensional analogue of (11), namely that we may let n → ∞ to obtain

r(A) ≤ 2^{-1/2} diam(A),

has been shown by Routledge [16] to be valid in any Hilbert space.
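Jung's inequality (11) is easy to check numerically for the equilateral triangle A_2 above (the vertices are those of the text; the rest is our own scaffolding):

```python
import numpy as np

A2 = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, np.sqrt(3.0)]])  # regular simplex

diam = max(np.linalg.norm(p - q) for p in A2 for q in A2)     # = 2
center = A2.mean(axis=0)       # centroid = circumcenter for this triangle
r = max(np.linalg.norm(center - p) for p in A2)               # = 2/sqrt(3)

jung = np.sqrt(2 / (2 * 2 + 2)) * diam   # (n/(2n+2))^{1/2} diam with n = 2
print(r, jung)                 # equal: the equality case of (11)
print(r > diam / 2)            # True: A_2 is not centerable
```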



It is natural to inquire whether some Banach spaces contain only centerable sets. In order to produce such examples, let us first recall that the following properties of a (real) Banach space X have been shown by Nachbin [15] and Kelley [12] to be equivalent:

1) X is a "P_1 space", that is, X is constrained (c)) in every Banach space containing it;

2) every collection of mutually intersecting (closed) balls in X has non-void intersection;

3) X is (isometric with) C_R(Ω), where the compact space Ω is extremally disconnected.

It is known that no P_1 space can be smooth and that no infinite dimensional P_1 space can be separable or w-sequentially complete (and hence cannot be reflexive) [8]. The standard examples of such spaces are ℓ^∞(S) and L_R^∞(μ). Combining condition 1) above with the theorem in c), we see that each P_1 space admits centers. A proof of the following theorem has been given by Belobrov [1] utilizing condition 2).

Theorem. Let X be a P_1 space. Then every (bounded) subset A of X is centerable.

Proof. We identify X with C_R(Ω) as in condition 3). Because Ω is extremally disconnected the space C_R(Ω) is boundedly complete (indeed this property is characteristic of such spaces [17]). Consequently, in the notation of d), the functions a̲ and ā belong to X, and so, as in d), (a̲ + ā)/2 ∈ E(A). But whenever this happens, we can, given ε > 0, find x̄ and x̲ in A such that

diam(A) ≥ ||x̄ − x̲|| ≥ x̄(t_0) − x̲(t_0) ≥ (ā(t_0) − ε) − (a̲(t_0) + ε) = 2r(A) − 2ε,

if t_0 ∈ Ω is chosen so that

r(A) = (1/2)||ā − a̲|| = (1/2)(ā(t_0) − a̲(t_0)).

This completes the proof.

Corollary. Every compact subset of any space C_R(Ω) is centerable.

This follows from the preceding argument and the corollary in d). However, an arbitrary bounded set in C_R([0,1]) need not be centerable.

h) To conclude this section we present a result due to Golomb and Weinberger, which shows that centers for certain subsets of Hilbert spaces may be identified with elements of minimal norm. This reduces the estimation problem to one of best approximation. An extensive variety of examples illustrating this method is available in [7].

Let X be a Hilbert space, M a closed linear subspace (especially important for the applications are the finite codimensional subspaces), and ρ > 0. We define A to be the intersection of ρU(X) with some fixed translate of M, and refer to A as a "hypercircle" in X.

Theorem. The center of any hypercircle A in X is the (unique) element of minimal norm.

Proof. Let x_0 ∈ A be the element of minimal norm, that is, x_0 = P_A(θ). By the characterization of b.a.'s in Hilbert space 22d), we have

0 ≤ ⟨y − x_0, x_0⟩, ∀ y ∈ A,

whence for any y ∈ A,

||x_0 − y||² = ||x_0||² + ||y||² − 2⟨x_0, y⟩
≤ ||x_0||² + ||y||² − 2||x_0||²
= ||y||² − ||x_0||² ≤ ρ² − δ² ≡ σ²,

where δ ≡ ||x_0||. To see that x_0 must be the center of A, we will produce y_1, y_2 ∈ A such that ||y_1 − y_2|| = 2σ. Because of (10), this will show that r(A) ≥ σ and hence complete the proof. Define

y_1 = x_0 + σz,
y_2 = x_0 − σz,

for any fixed z ∈ S(M). Then because x_0 ∈ M^⊥,

||x_0 ± σz||² = ||x_0||² + σ² = ρ²,

so that both y_i ∈ A and ||y_1 − y_2||² = 4σ², qed.

Note that we have obtained the formula

r(A) = σ ≡ (ρ² − ||x_0||²)^{1/2},

and that the center x_0 appears as the orthogonal projection of any point in A onto M^⊥.
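A finite-dimensional check of this theorem is straightforward. In the sketch below (the dimensions, the subspace M, and the translate are arbitrary choices of ours), the minimal-norm element of v + M is the component of v orthogonal to M, and boundary points x_0 + σ(Mu) realize the radius σ:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, rho = 6, 3, 2.0                          # ambient dim, dim M, radius

M = np.linalg.qr(rng.standard_normal((n, k)))[0]   # orthonormal basis of M
v = 0.3 * rng.standard_normal(n)                   # A = (v + M) ∩ rho U(X)

x0 = v - M @ (M.T @ v)                 # minimal-norm element of v + M
sigma = np.sqrt(rho**2 - x0 @ x0)      # the predicted radius r(A)

u = rng.standard_normal(k)
u /= np.linalg.norm(u)                 # M @ u is a unit vector of M
y1 = x0 + sigma * (M @ u)
print(np.linalg.norm(y1))              # = rho, so y1 lies in A
print(np.linalg.norm(y1 - x0))         # = sigma = r(A)
```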

Remarks. 1) As has been observed by P. Smith, the argument just presented is equally applicable to the case where the hypercircle is replaced by the intersection of a ball ρU(X) and a "strip" {x ∈ X : c_α ≤ φ_α(x) ≤ d_α}, for some family {φ_α} ⊂ X*. Indeed, letting A be the intersection and x_0 the minimal element, we see as before that r(A) ≤ sup{||x_0 − y|| : y ∈ A} ≤ σ, and then that r(A) = σ by consideration of the hypercircle {x ∈ A : φ_α(x) = φ_α(x_0)}.

2) Consider the very special case of the above theorem where M is one-dimensional and hence the set A is just a chord of the ball ρU(X). Now the center of a line segment is obviously its mid-point (in any nls). Hence the theorem asserts in this case that the minimal element of any such chord is its mid-point. It has recently been established by Gurarii and Sozonov [9] that this property is characteristic of inner-product spaces.

i) Frequently, in practical situations of estimation, we are confronted with the problem of estimating or "predicting" the value of one or more linear functionals at the unknown element x_P of X (notation as in the beginning of this section). Assuming as usual that our knowledge of x_P can be compressed into the assertion "x_P ∈ A", and given some φ ∈ X* (a "prediction functional"), we would like to enclose the image φ(A) in as small an interval as possible (assuming real scalars for simplicity). Then the mid-point of that interval is our estimate or "predicted value" of φ(x_P) and half the length of that interval is a bound on the error. This of course is simply a center problem in the scalar field. In particular, when A is convex, we would like to be able to actually compute the interval φ(A).

The only cases for which this problem has received attention are those where A is a finite codimensional hypercircle. Here we

report only the solution when X is a Hilbert space, without the finite codimensionality restriction. However, the interested reader is referred to some work of Meinguet [14], wherein a formula of some value for φ(A) is obtained for a general nls.

In the next theorem we let A be a hypercircle, defined as in h) to be the intersection of a translate of a closed subspace M with ρU(X), for some ρ > 0.

Theorem. Let X be a real Hilbert space, A a hypercircle in X, and φ ∈ X*. Then

φ(A) = [φ(x_0) − σ||φ|M||, φ(x_0) + σ||φ|M||],

where x_0 is the center (= minimal element) of A, and σ² ≡ ρ² − ||x_0||², as in h).

Proof. We know that x_0 = y − P_M(y), ∀ y ∈ A. Since y = P_M(y) + (y − P_M(y)), we have

||P_M(y)||² = ||y||² − ||x_0||² ≤ σ²,

whence

|φ(y) − φ(x_0)| = |φ(P_M(y))| ≤ ||φ|M|| ||P_M(y)|| ≤ σ||φ|M||.

To see that these bounds cannot be decreased, define

y_1 = x_0 + σm_0,
y_2 = x_0 − σm_0,

where m_0 ∈ S(M) satisfies φ(m_0) = ||φ|M||. Then just as in h) we have y_1, y_2 ∈ A and

φ(y_1) = φ(x_0) + σ||φ|M||,
φ(y_2) = φ(x_0) − σ||φ|M||.

This completes the proof.

This theorem shows that the optimal estimate for φ(x_P), given that x_P ∈ A, is the value φ(x_0), and the associated estimation error is ||φ|M|| (ρ² − ||x_0||²)^{1/2}.
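Continuing the finite-dimensional sketch from h), the interval φ(A) for a functional φ = ⟨c, ·⟩ follows from the same ingredients (the vector c is another arbitrary choice of ours):

```python
# continuing the sketch from h): the exact range of phi(x) = <c, x> over A
c = rng.standard_normal(n)
phi_M = np.linalg.norm(M.T @ c)        # ||phi restricted to M||
print(c @ x0 - sigma * phi_M, c @ x0 + sigma * phi_M)   # endpoints of phi(A)
```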

Example. Suppose that we have the following information about a function x(·) on the interval [a, b]:

x ∈ C_R([a, b]), and x′ is sectionally continuous on [a, b];
x(a) = α, x(b) = β;
∫_a^b x′(t)² dt ≤ ρ².

What is our optimal guess for x, and what value can we predict for ∫_a^b x(t) dt?

This problem is clearly a (very) special case of the ones just discussed. We choose as our underlying Hilbert space the space X = H¹([a, b]) (cf. Exercise 20, p. 31), with inner product

⟨x, y⟩ = ∫_a^b x′(t) y′(t) dt.

In order for ⟨x, x⟩^{1/2} to be a norm we must identify functions differing by a constant. Since the constant functions are disjoint from the subspace M of functions vanishing at a and b, this identification cannot lead to any ambiguity in our answer.

According to h) our optimal estimate for the unknown x is the minimal element x_0 in the variety {x ∈ X : x(a) = α, x(b) = β}. By using any of the several optimization techniques from Part II (for example, 12d, e) or 16f)) we are led to conclude that x_0 must have the property that for some scalars c and d, and all x ∈ X,

⟨x, x_0⟩ ≡ ∫_a^b x′(t) x_0′(t) dt = c x(a) + d x(b).

This entails x_0′ = constant, and so x_0 is just the linear function on [a, b] with the prescribed values at the endpoints. It follows that the predicted value for ∫_a^b x(t) dt is

(12) ∫_a^b x_0(t) dt = (b − a)(α + β)/2.

In other words, given only such limited information about the unknown function x, the optimal method for estimating its definite integral is to apply the trapezoid rule.

What is the error incurred by choosing (12) as our predicted value for ∫_a^b x(t) dt? The answer to this depends on computing the value of the program

(13) max{∫_a^b x(t) dt : x ∈ S(M)}.

This program has a unique solution m_0, characterized (again by the results in either §12 or §16) by the existence of c, d and λ > 0 such that

∫_a^b x(t) dt = c x(a) + d x(b) + λ ∫_a^b x′(t) m_0′(t) dt,

for all x. This entails m_0″ = constant, so m_0 must have the form

m_0(t) = p(t − a)(t − b),

for some scalar p. Choosing p so that ||m_0|| = 1, we find

p = (3/(b − a)³)^{1/2},

and thence the value of (13) is

((b − a)³/12)^{1/2}.

Assembling all this information we finally arrive at the conclusion that for any x satisfying the conditions listed at the beginning of this example,

|∫_a^b x(t) dt − (α + β)(b − a)/2| ≤ ((b − a)³/12)^{1/2} (ρ² − (β − α)²/(b − a))^{1/2}.
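The extremal function x_0 + σm_0 attains this bound, which makes the whole computation easy to verify numerically. A minimal sketch (the values of a, b, α, β, ρ are invented test data):

```python
import numpy as np

a, b, alpha, beta, rho = 0.0, 1.0, 1.0, 2.0, 3.0
t = np.linspace(a, b, 100001)

x0 = alpha + (beta - alpha) * (t - a) / (b - a)    # the linear estimate
sigma = np.sqrt(rho**2 - (beta - alpha)**2 / (b - a))

p = np.sqrt(3.0 / (b - a)**3)
m0 = -p * (t - a) * (t - b)            # unit norm in M, positive integral

x_worst = x0 + sigma * m0              # admissible and extremal
integral = np.sum((x_worst[:-1] + x_worst[1:]) / 2 * np.diff(t))
trap = (b - a) * (alpha + beta) / 2
bound = np.sqrt((b - a)**3 / 12) * sigma

print(integral - trap, bound)          # the two agree
```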

References for §33

1) P. Belobrov, On the problem of the Chebyshev center of a set. Izv. Vys. Ucheb. Zaved. (1964), 3-9. (Russian)

2) L. Blumenthal and G. Wahlin, On the spherical surface of smallest


radius enclosing a bounded subset of n-dimensional euclidean
space. Bull. Amer. Math. Soc. 47 (1941), 771-777.

3) L. Danzer, B. Grünbaum, and V. Klee, Helly's theorem and its relatives. Convexity, Proc. Symp. Pure Math. 7 (1963), Amer. Math. Soc.; 101-180.

4) M. Day, Normed Linear Spaces. Academic Press, New York, 1962.

5) D. Dean, Direct factors of (AL)-spaces. Bull. Amer. Math. Soc.


71 (1965), 368-371.

6) A. Garkavi, The best possible net and the best possible cross-
section of a set in a normed space. Izv. Akad. Nauk SSSR 26
(1962), 87-106. (Russian) (Translated in Amer. Math. Soc.
Trans., Ser. 2, 39 (1964).)

7) M. Golomb and H. Weinberger, Optimal approximation and error


bounds. On Numerical Approximation, R. Langer, Ed., Univ. of
Wisconsin Press, Madison, 1959; 117-190.

8) A. Grothendieck, Sur les applications linéaires faiblement compactes d'espaces du type C(K). Can. J. Math. 5 (1953), 129-173.

9) N. Gurarii and Ju. Sozonov, Normed spaces in which the unit sphere
has no bias. Math. Zametki 7 (1970), 307-310. (Russian)
(Translated in Math. Notes 7 (1970), 187-189.)

10) R. James and S. Swaminathan, Normed linear spaces that are uniformly convex in every direction. Preprint.

11) M. Kadets and V. Zamyatin, Chebyshev centers in the space C[a,b]. Teo. Funk., Funkcion. Anal. Pril. 7 (1968), 20-26. (Russian)

12) J. Kelley, Banach spaces with the extension property. Trans.


Amer. Math. Soc. 72 (1952), 323-326.

13) P. Laurent and P.-Dinh-Tuan, Global approximation of a compact


set by elements of a convex set in a normed space. Num.
Math. 15 (1970), 137-150.

14) J. Meinguet, Optimal approximation and interpolation in normed spaces. Numerical Approximation to Functions and Data, J. Hayes, Ed., Athlone Press, London, 1970; 143-157.

15) L. Nachbin, A theorem of the Hahn-Banach type for linear trans-


formations. Trans. Amer. Math. Soc. 68 (1950), 28-46.

16) N. Routledge, A result in Hilbert space. Quart. J. Math.


3 (1952), 12-18.

17) M. Stone, Boundedness properties in function-lattices. Can. J.


Math. 1 (1949), 176-186.

18) M. Valadier, Sous-différentiels d'une borne supérieure et d'une somme continue de fonctions convexes. C. R. Acad. Sci. Paris 268 (1969), A39-A42.

§34. Quasi-Solutions

A familiar application of optimization techniques, dating back to Cauchy, is to the location (or at least approximate location) of roots of functions and more general mappings; that is, to the solution of equations. For example, consider the problem of locating roots of a given polynomial p. We define a function f on R² in the following way:

p(z) ≡ p(x + iy) ≡ g(x, y) + i h(x, y),
f(x, y) ≡ g(x, y)² + h(x, y)².

Then clearly the value of the program (R², f) is 0, and even though f need not be convex, there are no other (local) minima of f (as follows easily from the Maximum Modulus Principle). Thus any scheme for minimizing smooth functions over R² can be used to locate roots of polynomials. Clearly this observation does not depend on p being a polynomial; it is equally applicable to any analytic function.
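A minimal sketch of this root-finding device (the cubic and the starting point are our own test data; any general-purpose minimizer would serve):

```python
import numpy as np
from scipy.optimize import minimize

# locate a root of p(z) = z^3 - 1 by minimizing f(x, y) = |p(x + iy)|^2
def f(v):
    z = v[0] + 1j * v[1]
    return abs(z**3 - 1.0) ** 2

res = minimize(f, x0=[-0.4, 1.2], method="Nelder-Mead")
print(res.x)   # near (-1/2, sqrt(3)/2), one of the three cube roots of 1
```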

In this section we give a brief introduction to the theory of equation solving via optimization. However, in order to have the resulting programs convex, we will confine ourselves to linear equations. On the other hand the underlying vector spaces can be infinite dimensional, as usual.

a) Let X and Y be respectively a tls and a nls, and let A ∈ L(X, Y). We will assume that A is injective, but not invertible. This means that the problem

(1) A(x) = y

is not well-posed (cf. 32e)); that is, a solution need not exist for all y ∈ Y, and that solution which does exist for y ∈ range(A) does not depend continuously on y. This lack of "stability" in the inverse problem inhibits the use of any sort of approximation scheme for solving (1).

As was observed by Tikhonov [4], one way to circumvent this difficulty is to seek solutions to (1) only within a fixed compact subset M ⊂ X. The point of this is that, due to a familiar theorem from topology, the restricted mapping A|M is then a homeomorphism. Hence if in (1) y varies within A(M), then the solution x depends continuously on y. Of course, the drawback to this approach is that it may not be clear whether y belongs to A(M) (in practical problems y may be known only via experiments, and therefore is not given with complete accuracy).

Another possible approach to the study of ill-posed problems such as (1) is to alter the notion of a solution. This was first suggested by Ivanov [1].

Definition. Let A, X, Y be as defined above and let M be a subset of X. An M-quasi-solution of equation (1) is a solution of the program (X, f + δ_M), where f(x) ≡ ||A(x) − y||.

Evidently, an M-quasi-solution exists if and only if y has a b.a. from A(M). In particular this is the case whenever A(M) is proximinal (30a)). A fairly general sufficient condition for this to occur is given next.

Lemma. Let M = B + F, where B is a compact subset of X and F is a finite dimensional subspace of X. Then A(M) is boundedly compact (31a)), and hence proximinal in Y.

Proof. A(M) = A(B) + A(F), the sum of a compact subset and a finite dimensional subspace of Y. Let {y_n} = {b_n′ + f_n′} be a bounded sequence in A(M). Then {b_n′}, and consequently {f_n′}, are bounded sequences. It is clear from this that we may extract a convergent subsequence {b_{n_i}′} from {b_n′}, and then a similar subsequence from {f_{n_i}′}. This shows that {y_n} has a convergent subsequence, and since A(M) is closed, the proof is complete.

For the next theorem we assume that X is also a nls, that

both X and Y are infinite dimensional, and that range (A) is

dense in Y. We see that the situation under discussion occurs in

particular whenever A is a compact (injective) linear map (that

is, such an operator cannot be invertible).

Theorem. (Ivanov) Let M = B + F ⊂ X as in the previous lemma. Then the M-quasi-solution program (X, f + δ_M) is stable, and is well-posed whenever N ≡ A(M) is Chebyshev.

Proof. Suppose, for the time being, that we know that the operator A_1 ≡ A|M is invertible. To establish the stability (31e)) of the program, we consider any minimizing sequence {x_n} (i.e., {x_n} ⊂ M and ||A(x_n) − y|| → d(y, N)). Then {A(x_n)} is bounded and hence

{x_n} = {A_1^{-1}(A(x_n))}

is also bounded. Since M is boundedly compact, stability follows. Now if N is Chebyshev, the associated metric projection P_N is continuous. Hence, given y ∈ Y, the unique M-quasi-solution of (1) is given by

(2) x = A_1^{-1}(P_N(y)).

It remains to show that A_1 is a homeomorphism. To this end, we choose a complementary (closed) subspace G for A(F):

(3) Y = A(F) ⊕ G,

and define a (closed) subspace E ⊂ X by

E = ⊥(A*(G^⊥)).

Then we easily see that

X = E ⊕ F,

and that the closure of A(E) is G (using the density of range(A) in Y). Next, let B_E be the projection of B on E (along F), so that M = B_E + F. Now the mappings A_B ≡ A|B_E and A_F ≡ A|F are invertible on their respective domains. Hence if P : Y → A(F) is the projection operator defined by (3), and if y ∈ N, then

A_1^{-1}(y) = A_F^{-1}(P(y)) + A_B^{-1}((I − P)(y));

this formula exhibits the continuous dependence of A_1^{-1}(y) on y, qed.

b) In order to attain a more versatile result, one which does not involve the restriction to subsets compact in a norm topology, we present the following variant of the theorem in a).

Theorem. Let X be a lcs and Y an E-space. Let A be a closed injective linear mapping with domain D(A) dense in X, and with values in Y. Assume that there is a compact set B ⊂ X such that M ≡ D(A) ∩ B is convex. Then the M-quasi-solution program is well-posed.

Proof. In analogy with the preceding proof, let A_1 = A|M and N = A(M). We will show that A_1 has a continuous inverse (on N) and that N is closed. Since N is also convex, P_N is then (single-valued and) continuous because of the E-property (32e)). Then formula (2) will hold for the M-quasi-solution as a function of y, and the result will be established.

Let M_1 be a relatively closed subset of M; that is, M_1 = D(A) ∩ B_1, where B_1 is a closed subset of B. We claim that N_1 ≡ A(M_1) is closed. This will prove both that N is closed (take M_1 = M), and that A_1^{-1} is continuous (since its inverse A_1 is mapping closed sets into closed sets). Let {y_n} ⊂ N_1 be a convergent sequence with limit y_0. Then x_n ≡ A^{-1}(y_n) ∈ M_1 ⊂ B_1, which is compact. Consequently, {x_n} has a cluster point x_0 ∈ B_1. Because A is closed, the point (x_0, y_0) belongs to the graph of A. Thus x_0 ∈ D(A), so in fact, x_0 ∈ M_1, and y_0 = A(x_0) ∈ N_1, qed.

Remark. A particular circumstance where the hypotheses of this theorem are satisfied is the following. There is a consistent vector topology τ on X, stronger than the given topology, and the mapping A is everywhere defined on X and τ-continuous. For example, τ might be the Mackey topology on X. Then since such an A is weakly continuous on X, A is closed wrt the original topology on X.

Example. Let X and Y be Hilbert spaces of infinite dimension, and A ∈ L(X, Y) be compact and injective. For some r > 0, let M = rU(X). Now M is convex and w-compact, so by the preceding remark and theorem, the M-quasi-solution program is well-posed. Let us compute this solution for a given y ∈ Y.

The operator A*A is compact, self-adjoint and positive semi-definite. Let λ_1 ≥ λ_2 ≥ ... > 0 be its eigenvalues, and {u_1, u_2, ...} the corresponding orthonormal basis of eigenvectors. Set β_n = ⟨A(u_n), y⟩ for n = 1, 2, .... Then the unique M-quasi-solution for the equation A(x) = y is

(4) x = Σ_{n=1}^∞ (β_n/(λ_n + λ)) u_n,

where λ = 0 if

(5) Σ_{n=1}^∞ β_n²/λ_n² ≤ r²,

and otherwise λ is the positive root of the equation

(6) Σ_{n=1}^∞ β_n²/(λ_n + λ)² = r².

To verify this assertion, we put f(x) = ||A(x) − y||², and compute that

θ = ∇f(x) = 2A*(A(x) − y)

if and only if

(7) A*A(x) = A*(y) = Σ_{n=1}^∞ β_n u_n.

Expanding x in terms of the basis {u_n}, and substituting it into (7), we find that

x = Σ_{n=1}^∞ (β_n/λ_n) u_n.

Therefore, if (5) holds, this x must be the desired quasi-solution. Otherwise, we have

Σ_{n=1}^∞ (β_n/λ_n)² > r²,

and now we have the constrained program of minimizing f(x) subject to ||x||² ≤ r². This is an ordinary convex program to which the (classical) Kuhn-Tucker conditions of 12d) are applicable. We conclude that there is λ > 0 such that the solution x satisfies ||x|| = r and

2A*(A(x) − y) + λ(2x) = θ,

or

A*A(x) + λx = A*(y).

Expanding x and A*(y) in terms of the basis {u_n} immediately leads to (4); the requirement ||x|| = r then implies that λ satisfies (6).
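In a truncated eigenbasis this recipe is a few lines of code. A minimal sketch (the eigenvalues λ_n, the data β_n, and the radius r are invented for illustration; the multiplier in (6) is found with a bracketing root-finder):

```python
import numpy as np
from scipy.optimize import brentq

lam = 1.0 / np.arange(1, 21) ** 2      # eigenvalues of A*A (hypothetical)
beta = 1.0 / np.arange(1, 21) ** 2     # beta_n = <A(u_n), y> (hypothetical)
r = 1.5

if np.sum((beta / lam) ** 2) <= r**2:  # test (5)
    mult = 0.0
else:                                  # solve (6) for the multiplier
    g = lambda s: np.sum(beta**2 / (lam + s) ** 2) - r**2
    mult = brentq(g, 0.0, 1e9)         # g decreases, so the root is unique

x = beta / (lam + mult)                # coefficients of the quasi-solution (4)
print(mult, np.linalg.norm(x))         # ||x|| = r when the constraint binds
```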

c) The practical problem involved in solving equation (1) (under the hypotheses on the operator A made in a)) is the following. Given that y is either known exactly, or can be computed (approximated) to arbitrarily high accuracy, compute (approximate) the solution x to arbitrary accuracy. That is, assuming that y ∈ range(A), and that we have a sequence {y_n} ⊂ Y with y_n → y, find a sequence {x_n} ⊂ X such that x_n → x ≡ A^{-1}(y).

One possible way to utilize the preceding results on quasi-solutions for the resolution of this problem is to choose an increasing sequence {M_n} of compact subsets of X such that

cl(∪_n M_n) = X.

This is certainly possible if X is a separable nls. We might then let x_n be the M_n-quasi-solution of equation (1) and try to prove that x_n → x. This scheme has in fact been suggested by Lavrentiev [2, p. 8], and alleged by him to always lead to a convergent sequence {x_n}. (Using the continuity and injectivity of A, it is not hard to see that the only possible limit of {x_n} is the true solution x.) However, this allegation is false, even when the underlying spaces X and Y are Hilbert spaces, as we see next.

Example. Let X = Y = ℓ². We define a compact injective linear operator A on X by

z = (z_1, z_2, ..., z_n, ...) ↦ A(z) = (z_1, z_2/2, ..., z_n/n, ...).

Let x = (1, 1/2, 1/3, ...) and y = A(x). Finally, define

B_n = {z ∈ X : |z_i| ≤ n, i ≤ n; z_i = 0, i > n},
M_n = co(B_n ∪ {x + e_{n²}}),

where e_n is the n-th standard unit vector. Now first we see that each M_n is compact and convex, and that their union is dense in X. Also, although {M_n} as given is not an increasing sequence, appropriate subsequences are increasing, for example,

{M_2, M_4, M_16, M_256, ...} = {M_{2^{2^n}}}.

Next we claim that the B_n-quasi-solution of the corresponding equation (1) is z_n = (1, 1/2, ..., 1/n, 0, 0, ...). From this it follows that

||A(z_n) − y|| = ||A(z_n − x)|| = (Σ_{k=n+1}^∞ k^{-4})^{1/2} > (1/3) n^{-3/2}

for all large n. On the other hand, for large n, the M_n-quasi-solution is x + e_{n²}, since

||A(x + e_{n²}) − y|| = ||A(e_{n²})|| = n^{-2},

which is eventually < a_n ≡ (1/3) n^{-3/2}. However, obviously x + e_{n²} ↛ x.

Thus, for Lavrentiev's scheme to be successful, we must be able to guarantee in advance that x ∈ ∪ M_n. Suppose this to be the case, say x ∈ M_n for n ≥ n_0. Let x_n (resp. x_n^k) be the M_n-quasi-solution of the equation A(z) = y (resp. A(z) = y_k). Then we have

lim_{k→∞} x_n^k = x_n, ∀ n,
x_n = x, n ≥ n_0.

From these equations it is clear that we can produce sequences in X which converge to the true solution x.

To avoid the problem of choosing the sets M_n so as to be sure in advance that x ∈ ∪ M_n, we might assume either that X is a reflexive nls, or else that X is a dual space and that A is the adjoint of an operator in L(Y*, *X) (where *X is the pre-dual of X). If we also assume that Y is an E-space, the result in b) becomes applicable, and we can produce sequences in X which converge weakly, or weak-star, to x.

d) A recent and related approach to the approximate solution of ill-posed linear equations of the form (1) is due to Tanana [3]. The assumptions on the mapping A are the same, but those on X and Y made in b) are interchanged. That is, it is assumed that X is an E-space and Y a lcs. Let us suppose given a directed nbhd. basis {N_α} of closed convex y-nbhds., for some y ∈ range(A). (In a practical problem, this corresponds to the possibility of determining y by experiment or measurement to arbitrary accuracy.) Then the discrepancy method of Ivanov and Tanana consists of minimizing the norm in X over the sets M_α ≡ A^{-1}(N_α). For each α the E-property guarantees a unique solution x_α ∈ M_α, and these are considered to be approximate solutions of A(x) = y. The E-property further entails the convergence of {x_α} to the exact solution A^{-1}(y). Conversely, if X is separable, and this discrepancy method always yields convergent nets of approximate solutions, for every A, every Y, and every directed nbhd. basis of every element of range(A), then X must be an E-space.

References for §34

1) V. Ivanov, On linear problems which are not well posed. Soviet Math. Dokl. 3(1962), 981-983.

2) M. Lavrentiev, Some Improperly Posed Problems of Mathematical Physics. Springer-Verlag, New York, 1967.

3) V. Tanana, Incorrectly posed problems and the geometry of Banach spaces. Soviet Math. Dokl. 11(1970), 864-867.

4) A. Tikhonov, On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(1944), 195-198. (Russian)

§35. Generalized Inverses

We continue with the study of abstract linear equations of the form A(x) = y, but with a somewhat different viewpoint and toward other ends than in the preceding section. Not only do we allow the equation to be inconsistent for some y's, but also we permit a superfluity of solutions, that is, the operator A need not be injective. We shall attempt to single out a unique "best approximate solution" for a given y, and to study the correspondence between y and this "solution". The mapping so defined has some of the properties of an inverse for A, and is known as the "generalized inverse" of A. It must be noted at the outset that the only really satisfactory results require that both x and y vary in Hilbert spaces, and that A have closed range.

a) Let X and Y be nls and A ∈ L(X,Y). For a given y_0 ∈ Y, we consider the linear equation

(1) A(x) = y_0.

Definition. An X-quasi-solution (34a)) of (1) is called an extremal solution (or sometimes, a virtual solution). An extremal solution of minimal norm is called a best approximate solution (b.a.s.) to (1).

Let R(A) (resp. N(A)) denote the range (resp. nullspace) of the operator A. It is clear that the existence of an extremal solution to (1) is equivalent to the condition

P_{R(A)}(y_0) ≠ ∅.

In particular, if cl R(A) is proximinal (30a)), then this last condition becomes

(2) P_{cl R(A)}(y_0) ∈ R(A).

A more sophisticated nas condition for the existence of an extremal solution is given next, in the Hilbert space case.

Theorem. (Tseng) Let X and Y be Hilbert spaces. There is an extremal solution to (1) if and only if there exists a positive constant β such that

(3) |⟨y_0, y⟩|² ≤ β⟨y, AA*(y)⟩

for every y ∈ N(AA*)^⊥.

Proof. Let us first prove the necessity of (3). We have

Y = cl R(A) ⊕ R(A)^⊥ = cl R(A) ⊕ N(A*).

Let y_0 = ỹ + w be the associated decomposition of y_0. Since an extremal solution exists, we have from (2) that ỹ ∈ R(A), that is, ỹ = A(x̃) for some x̃ ∈ X. Let β = ||x̃||². Then, noting that N(AA*) = N(A*), we have, for any y ∈ N(AA*)^⊥,

|⟨y_0, y⟩| = |⟨A(x̃), y⟩| = |⟨x̃, A*(y)⟩| ≤ ||x̃|| ||A*(y)||,

which squares to (3).

Conversely, let us demonstrate the sufficiency of (3). We define a new inner product (·,·) on cl R(A) by

(y_1, y_2) ≡ ⟨y_1, AA*(y_2)⟩.

Let Z be the completion of cl R(A) in the metric defined by (·,·). Since A* is continuous wrt this metric, and X is complete, we can extend A* to belong to L(Z, X). Decomposing y_0 as above,

namely y_0 = ỹ + w, we have for any y ∈ cl R(A):

|⟨ỹ, y⟩|² = |⟨y_0, y⟩|² ≤ β⟨y, AA*(y)⟩ = β(y, y).

That is, the linear functional f(y) ≡ ⟨ỹ, y⟩ is continuous on cl R(A) wrt the (·,·)-metric, and therefore can be extended to belong to Z*. By the Riesz Representation Theorem we have

f(z) = (z, z̄), ∀ z ∈ Z,

for some z̄ ∈ Z. Put x̃ ≡ A*(z̄) ∈ X. Hence, for y ∈ cl R(A),

⟨ỹ, y⟩ = f(y) = (y, z̄) = ⟨A*(y), A*(z̄)⟩ = ⟨y, A(x̃)⟩.

Thus we see that ∀ y ∈ cl R(A),

⟨y, A(x̃) − ỹ⟩ = 0.

But this of course entails A(x̃) = ỹ; in other words condition (2) holds, and x̃ is an extremal solution to equation (1), qed.

b) Returning now to the case of general nls X and Y, let E(A, y_0) be the (possibly void) set of extremal solutions to equation (1). Since E(A, y_0) is always closed and convex, we see that whenever X is reflexive and rotund, in particular whenever X is an E-space, there is a unique b.a.s. x_0. The idea is now to study the mapping

A⁺ : Y ⊃ D(A⁺) → X,
A⁺(y_0) ≡ x_0.

By definition, the domain D(A⁺) consists of those y_0 ∈ Y for which there exists a unique b.a.s. in X. Clearly, θ ∈ D(A⁺) always.

Definition. The mapping A⁺ just defined is the generalized inverse of A.

We now give conditions which imply that A⁺ is densely defined, and/or linear, and/or continuous, etc.

Theorem. Let X be reflexive and rotund, and let R(A) be a Chebyshev subspace of Y. Put B = (A|N(A)^θ)^{-1}. Then D(A⁺) ⊃ D(P_{R(A)}) and

(4) A⁺|D(P_{R(A)}) = B P_{R(A)}.

In particular, A⁺ is densely defined on Y.

Proof. We first note that P_{R(A)} is densely defined on Y because

D(P_{R(A)}) ⊃ R(A) ⊕ (cl R(A))^θ,

which is dense in Y by 32c). Next we note that the mapping B is well-defined, that is, A|N(A)^θ is injective, because N(A) is a Chebyshev subspace and so 32c-1) applies. Of course, B need not be linear or continuous. Now let y_0 ∈ D(P_{R(A)}) and define

x_0 = B P_{R(A)}(y_0).

Then x_0 ∈ E(A, y_0) because

||A(x_0) − y_0|| = ||P_{R(A)}(y_0) − y_0|| = d(y_0, R(A)) ≤ ||A(x) − y_0||, ∀ x ∈ X.

Now again using the (non-linear) direct sum decomposition of 32c), namely

X = N(A) ⊕ N(A)^θ,

we can express any x ∈ X as x = n + p, and then if x is also in E(A, y_0) we find

||x|| = ||n + p|| = ||n + BA(x)|| ≥ d(BA(x), N(A)) = d(B P_{R(A)}(y_0), N(A)) = ||x_0||.

This shows that x_0 is a b.a.s. to equation (1). Since any such b.a.s. must be unique because of the hypotheses on X, it follows that y_0 ∈ D(A⁺) and that (4) holds, qed.

It follows immediately that if to the preceding hypotheses we adjoin the assumption that A has closed range, then D(A⁺) = Y. If we also assume a little more about X and Y, then we can obtain a more striking improvement on the theorem.

Corollary. Let X and Y be E-spaces, and let A ∈ L(X,Y) have closed range. Then A⁺ is a continuous mapping of Y onto N(A)^θ, whose restriction to R(A) is a homeomorphism.

Proof. First of all, we have D(P_{R(A)}) = Y, so that A⁺ is given by the right hand side of (4). Since X and Y are E-spaces, both metric projections P_{N(A)} and P_{R(A)} are continuous by 32e). Hence we are reduced to checking the continuity of B. But, if A_1 is the isomorphism of X/N(A) with R(A) induced by A, and Q = Q_{N(A)}|N(A)^θ (where, for any closed subspace M ⊂ X, Q_M is the associated quotient map), then

B = Q^{-1} A_1^{-1}

is continuous by 32g), qed.

Since a number of operators A of interest have finite dimensional nullspaces, we can frequently drop the hypothesis in the Corollary that X be an E-space and just require that N(A) be a finite dimensional Chebyshev subspace of X.

Let us also note that under the hypotheses of the last theorem,

AA⁺ = P_{R(A)},
A⁺A = I − P_{N(A)},

so that

(5) AA⁺A = A,
A⁺AA⁺ = A⁺.
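In the matrix case these identities can be observed directly with numpy's built-in pseudoinverse (the matrix below is random test data of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))  # rank 3

Ap = np.linalg.pinv(A)                 # the pseudoinverse of A
print(np.allclose(A @ Ap @ A, A))      # A A+ A = A
print(np.allclose(Ap @ A @ Ap, Ap))    # A+ A A+ = A+

P_R = A @ Ap                           # A A+ = orthogonal projection on R(A)
P_N = np.eye(7) - Ap @ A               # I - A+ A = projection on N(A)
print(np.allclose(P_R @ P_R, P_R), np.allclose(P_N @ P_N, P_N))
```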

Example. In this example we see how the generalized inverse determines the solution set of the corresponding linear equation. Let X and Y be nls and A ∈ L(X,Y). Suppose that A⁺ satisfies (5), and that equation (1) has a solution, say x_0. Then every solution x of (1) has the form

(6) x = A⁺(y_0) + (I − A⁺A)(z),

for some z ∈ X, and conversely. Because, if x_1 is a solution of (1), then we may take z = x_1 in (6). And conversely, given z ∈ X and x defined by (6), we have

A(x) = A(A⁺(y_0)) + A(z) − AA⁺A(z)
= AA⁺A(x_0) + A(z) − A(z)
= A(x_0) = y_0.

In particular, the set of solutions to the homogeneous equation A(x) = θ is simply the range of I − A⁺A.

Also, we can observe that (5) entails a consistency criterion for equation (1), namely, this equation is consistent (i.e., solvable) if and only if AA⁺(y_0) = y_0.
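Continuing the matrix sketch above, formula (6) and the consistency criterion read:

```python
# continuing the sketch above: the solution set of a consistent system
y0 = A @ rng.standard_normal(7)        # y0 in R(A), so (1) is consistent
print(np.allclose(A @ Ap @ y0, y0))    # consistency: A A+ (y0) = y0

z = rng.standard_normal(7)
x = Ap @ y0 + (np.eye(7) - Ap @ A) @ z # formula (6)
print(np.allclose(A @ x, y0))          # every such x solves A x = y0
```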

c) For the remainder of this section we will assume that both X and Y are Hilbert spaces. This will suffice to guarantee that generalized inverses are always linear mappings; this in turn leads to a much more elegant (and useful) theory.

Theorem. Let X and Y be Hilbert spaces and A ∈ L(X,Y). Then A⁺ is a closed, densely defined linear mapping on Y.

Proof. The theorem in b) applies here and allows us to conclude that A⁺ is a densely defined linear mapping on Y, namely

A⁺ = B P_{R(A)},

whose domain D(A⁺) is the dense subspace R(A) ⊕ R(A)^⊥ of Y. Here B ≡ (A|N(A)^⊥)^{-1}. To see that A⁺ is closed, select {y_n} ⊂ D(A⁺) with y_n → y ∈ Y and A⁺(y_n) → x ∈ X. We can write

y_n = A(x_n) + v_n,

for x_n ∈ N(A)^⊥ and v_n ∈ R(A)^⊥. Then x_n = A⁺(y_n) → x, hence A(x_n) → A(x). Hence also v_n → y − A(x) ∈ R(A)^⊥. This shows that y ∈ D(A⁺) and that A⁺(y) = x, qed.

A quite similar argument shows that A⁺ is still a closed linear mapping if A is any closed linear mapping on X [7]. Now in the case we are considering, namely A ∈ L(X,Y), if we assume also that R(A) is closed in Y, then it is clear that A⁺ ∈ L(Y,X). In this most important case we adopt a special terminology and notation.

Definition. If for some A ∈ L(X,Y) we have A⁺ ∈ L(Y,X), then A⁺ is called the pseudoinverse of A, and is written A†.

Since we are only dealing with Hilbert spaces we see that A ∈ L(X,Y) has a pseudoinverse exactly when A has closed range (in which case A is usually said to be normally solvable).

2) If A is a partial isometry on X, then A† = A*. In particular, if A is an orthogonal projection, then A† = A. If A is normally solvable, then (A†)* = (A*)†.

3) For A ∈ L(X,Y), R(A) is closed exactly when

0 < γ(A) ≡ inf{||A(x)|| : x ∈ S(N(A)^⊥)}.

In this case we have ||A†|| = γ(A)^{-1} [11].

4) Showalter [13] shows that for A ∈ L(X,Y) we have

A† = lim_{t→∞} ∫_0^t exp(-A*A(t-s)) A* ds ≡ lim_{t→∞} B(t),

and estimates the rate of convergence by

γ(A) ‖A† - B(t)‖ ≤ exp(-t γ(A)^2),    t > 0.

It is further shown in [14], again with estimates of the convergence rate, that if A does not have closed range, we still have

A^+(y) = lim_{t→∞} B(t)(y)

for all y ∈ Y, that is, B(t) → A^+ strongly.
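
Since B(t) solves the operator differential equation B'(t) = A* - A*A B(t), B(0) = 0, Showalter's representation can be sampled numerically by simple Euler steps; the matrix below is an arbitrary full-column-rank example, and the step size and iteration count are ad hoc choices.

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 4.],
                  [0., 1.]])
    AtA = A.T @ A
    h = 1.0 / np.linalg.norm(AtA, 2)         # step size; need h < 2/||A*A||
    B = np.zeros((2, 3))                     # B(0) = 0
    for _ in range(20000):                   # Euler steps: B' = A* - A*A B
        B = B + h * (A.T - AtA @ B)

    assert np.allclose(B, np.linalg.pinv(A), atol=1e-6)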

5) Decell [6] applies the Cayley-Hamilton theorem to AA*, where A is an arbitrary (complex) matrix, to deduce the following formula for A†: let

p(λ) = (-1)^n Σ_{j=0}^n a_j λ^{n-j},    a_0 = 1,

be the characteristic polynomial of AA*. If k = max{j : a_j ≠ 0}, then

A† = -a_k^{-1} A* Σ_{j=0}^{k-1} a_j (AA*)^{k-j-1};

if k = 0, then A† = 0.
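
Decell's formula translates directly into code; characteristic-polynomial coefficients are numerically delicate, so the sketch below (with np.poly supplying the a_j, since det(λI - AA*) = Σ a_j λ^{n-j}) is meant only for small, well-conditioned examples.

    import numpy as np

    def decell_pinv(A, tol=1e-10):
        # Decell's Cayley-Hamilton formula for the pseudoinverse.
        M = A @ A.conj().T
        n = M.shape[0]
        a = np.poly(M)                       # a[0] = 1, then a_1, ..., a_n
        k = max(j for j in range(n + 1) if abs(a[j]) > tol)
        if k == 0:                           # only if M = 0, i.e. A = 0
            return np.zeros_like(A.conj().T)
        S = sum(a[j] * np.linalg.matrix_power(M, k - j - 1)
                for j in range(k))
        return (-1.0 / a[k]) * A.conj().T @ S

    A = np.array([[1., 2., 0.],
                  [2., 4., 0.]])             # rank 1
    assert np.allclose(decell_pinv(A), np.linalg.pinv(A))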

d) There are a variety of methods available for computing the pseudoinverse of a matrix, for example c-5) above; see also [15], [3, p. 685-688], [9]. Thus it is of interest, when possible, to have a procedure for reducing the computation of A† to the case where A is a matrix. The next theorem shows that this can always be done if one of X and Y is finite dimensional. We have seen an example of this situation in 21e,f).

Theorem. Let X and Y be Hilbert spaces and let A ∈ L(X,Y) be normally solvable. Then

A† = A*(AA*)†
   = (A*A)†A*.

Proof. Let us just verify the first equality. We have N(AA*) = N(A*) and therefore R(AA*) = R(A) (obviously R(AA*) ⊂ R(A); since Y is the orthogonal direct sum of N(AA*) = N(A*) and cl R(AA*), as well as of N(A*) and cl R(A), we have cl R(AA*) = cl R(A); but also γ(A) > 0 (see c-3)) ⟹ γ(A*) > 0 ⟹ γ(AA*) > 0 ⟹ R(AA*) closed). Now according to the theorem in c) we must show that for every x ∈ N(A)^⊥,

x = A* B_1 A(x),

where

B_1 ≡ (AA*|_{N(A*)^⊥})^{-1}.

But there is a unique y ∈ N(A*)^⊥ such that A*(y) = x, whence

B_1(A(x)) = B_1 AA*(y) = y,

so that A* B_1 A(x) = A*(y) = x, and this completes the proof.

Two special cases of this theorem are of importance. First, if either AA* or A*A happens to be invertible, then we have a formula for A†. Second, if X (resp. Y) is finite dimensional, then A*A (resp. AA*) is a matrix, to which the computational techniques mentioned above can be applied. Again we refer back to the example in 21e,f).
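
For instance, if A has full row rank then AA* is invertible and the theorem gives the closed form A† = A*(AA*)^{-1}, and dually (A*A)^{-1}A* in the full-column-rank case. A small check on an arbitrary example:

    import numpy as np

    A = np.array([[1., 0., 2.],
                  [0., 1., 1.]])             # full row rank
    assert np.allclose(A.T @ np.linalg.inv(A @ A.T), np.linalg.pinv(A))

    B = A.T                                  # full column rank
    assert np.allclose(np.linalg.inv(B.T @ B) @ B.T, np.linalg.pinv(B))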

e) We now work toward formulas for A† which do not require the computation of any other pseudoinverses. These formulas require a choice of an auxiliary operator with special range or nullspace. The following lemma expressing the pseudoinverse of a product is essential.

Lemma. Let X, Y, Z be Hilbert spaces, B ∈ L(Z,Y), C ∈ L(X,Z), with B* and C surjective. Define A ∈ L(X,Y) by A = BC. Then

A† = C*(CC*)^{-1}(B*B)^{-1}B*
   = C†B†.

Proof. Since B is an isomorphism of Z with a subspace of Y, the theorem in d) implies the second equality. So we concentrate on the first equality. We have

A† = A†AA† = (A†B)(CA†),

so that it will suffice to show

A†B = C*(CC*)^{-1},

or

(7)    B*A†* = (CC*)^{-1}C,

and the analogous formula for A†*C*. Now since C* is injective and R(CC*) = R(C) (see d)) = Z, we see that CC* is invertible. Next,

C*B*A†* = A*A†* = (A†A)* = A†A = A†BC
⟹ BCC*B*A†* = BCA†BC = AA†A = A = BC.

Left-multiply the two end terms of this last equation first by B^{-1} and then by (CC*)^{-1} to obtain (7), qed.
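
In matrix terms the Lemma is the familiar full-rank-factorization formula for the pseudoinverse; here is a sketch with arbitrary factors B (full column rank, so B* is surjective) and C (full row rank, so C is surjective).

    import numpy as np

    B = np.array([[1., 0.],
                  [2., 1.],
                  [0., 1.]])                 # 3x2, rank 2
    C = np.array([[1., 0., 1., 2.],
                  [0., 1., 1., 0.]])         # 2x4, rank 2
    A = B @ C

    At = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
    assert np.allclose(At, np.linalg.pinv(A))
    assert np.allclose(At, np.linalg.pinv(C) @ np.linalg.pinv(B))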

Theorem. (Boot, Minimide-Nakamura) Let A be normally solvable in L(X,Y). If for some Z there exists C, surjective, in L(X,Z) such that R(C*) = R(A*), then

A† = C*(CA*AC*)^{-1}CA*.

Similarly, if there exists B ∈ L(Z,Y), B* surjective, such that R(B) = R(A), then

A† = A*B(B*AA*B)^{-1}B*.

Proof. In the first case we can write

A = A P_{N(A)^⊥} = A P_{R(A*)}
  = A P_{R(C*)} = A C†C ≡ BC,

verify that B* is surjective, and then apply the Lemma. Similarly, in the second case, we can write

A = P_{R(A)}A = P_{R(B)}A
  = B B†A ≡ BC,

verify that C is surjective, and again apply the Lemma. Let us just give the details for the first case.

To see that B* is surjective, it suffices to prove that B has a bounded inverse. But R(B) = R(A) is closed, so we need only check that B is injective. Let B(z) = θ. Then AC†(z) = θ, so

θ = A†AC†(z)
  = P_{R(A*)}C†(z) = P_{R(C*)}C†(z)
  = P_{N(C)^⊥}C†(z) = C†(z).

But since C is surjective, C† is an isomorphism of Z with N(C)^⊥; consequently, z = θ.

Now applying the Lemma we obtain

A† = C†B†
   = C*(CC*)^{-1}(B*B)^{-1}B*
   = C*(CC*)^{-1}(C†*A*AC†)^{-1}C†*A*
   = C*(C†*A*AC†CC*)^{-1}C†*A*
   = C*(C†*A*AC*)^{-1}C†*A*.

Thus we are reduced to showing

(8)    (C†*A*AC*)^{-1}C†* = (CA*AC*)^{-1}C.

Since X = N(C) ⊕ R(C*), it is sufficient to check that both operators in (8) agree on N(C) and on R(C*). Now N(C) = N(C†*), so the two operators certainly agree on N(C). Next, let z ∈ Z; then

C†*(C*(z)) = (CC†)*(z) = P_{R(C)}(z) = z.

Thus we are further reduced to showing

(9)    (C†*A*AC*)^{-1}(z) = (CA*AC*)^{-1}(CC*(z)).

By rewriting (9) as an equation for z_1, where z_1 is chosen so that

z = C†*A*AC*(z_1),

we are led to showing

(10)    z_1 = (CA*AC*)^{-1}CC*C†*A*AC*(z_1).

However, (10) is certainly true, as we see by recalling that

C*C†* = (C†C)* = P_{R(C*)} = P_{R(A*)},

so that CC*C†*A*AC* = C P_{R(A*)} A*AC* = CA*AC*. This completes the proof.

An important use of this theorem is to reduce the computation of A† to the inversion of a matrix. This is possible when either X or Y is finite dimensional. Suppose, for example, that dim(Y) < ∞. Then in the theorem we can choose B to be the natural injection of Z ≡ R(A) into Y.
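
Concretely, with dim(Y) < ∞ we may take for B a matrix whose columns form an orthonormal basis of R(A); the second formula of the theorem then requires only the inversion of an r x r matrix, r = rank(A). A sketch on a hypothetical rank-deficient example:

    import numpy as np

    A = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 0.],
                  [1., 3., 1., 1.]])         # rank 2 (row 3 = row 1 + row 2)
    U, s, Vt = np.linalg.svd(A)
    r = int((s > 1e-10).sum())
    B = U[:, :r]                             # orthonormal basis of R(A)

    # A-dagger = A* B (B* A A* B)^{-1} B*.
    At = A.T @ B @ np.linalg.inv(B.T @ A @ A.T @ B) @ B.T
    assert np.allclose(At, np.linalg.pinv(A))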

f) Let us reconsider equation (1):

A(x) = y_o.

We have been considering extremal solutions of minimal norm to this equation. For some purposes in optimization and statistics, it is important to restrict the solutions to lie in some preassigned subset M ⊂ X. Such a requirement leads naturally to the notions of "restricted b.a.s." and "restricted pseudoinverse". Rather than

repeat most of the theory of a) and b), we shall continue to assume

that all spaces are Hilbert spaces, and that all operators are

normally solvable. For additional simplicity we shall also assume

that M is a closed linear subspace of X.

Definition. Let A ∈ L(X,Y), B ∈ L(X,Z) be normally solvable, and let A_B ≡ A|_{N(B)}. Suppose that A_B is also normally solvable. Then A_B† is called the restricted pseudoinverse of A (wrt B).

Since A_B is assumed normally solvable, we see that A_B†(y_o) = x_o means that x_o is the unique N(B)-quasi-solution of equation (1) with minimal norm. The assumption that A_B is normally solvable is equivalent to the assumption that the orthogonal projection of N(B) on N(A) is closed. This latter condition is certainly in effect if either dim(N(B)) < ∞, or else one of the nullspaces N(A), N(B) is contained in the other. For B = θ we obviously recover A† as defined in c).

As does the ordinary pseudoinverse, the restricted pseudoinverse satisfies various algebraic relations. In fact, we have the following algebraic characterization of A_B†.

Lemma. The restricted pseudoinverse A_B† is the unique solution E of the following equations:

(11)    BE = 0,
(12)    EAE = E,
(13)    (AE)* = AE,
(14)    AEA = A on N(B),
(15)    P_{N(B)}(EA)* = EA on N(B).



Proof. We omit the verification that A_B† satisfies (11)-(15). Let us, however, show that there is only one solution to this set of equations. Suppose that E and F are both solutions. Then

E = EAE = (P_{N(B)}(EA)*)E = P_{N(B)}A*E*E
  = (AFAP_{N(B)})*E*E
  = (AP_{N(B)}FAP_{N(B)})*E*E
  = (P_{N(B)}A*F*)(P_{N(B)}A*E*)E
  = FAEAE = FAE
  = FAFAE = F(F*A*)AE
  = F(E*A*AF)*
  = F(AEAF)*
  = F(AF)* = FAF = F,

where several times we have used (11) to conclude that R(E), R(F) ⊂ N(B), qed.

Our primary interest in restricted pseudoinverses is that they allow us to express the solution of certain kinds of quadratic optimization problems with operator constraint.

Theorem. (Minimide-Nakamura) Let A and B be operators satisfying the hypotheses of the preceding definition. Let y_o ∈ Y and z_o ∈ R(B). Then the b.a.s. to the equation A(x) = y_o, subject to the constraint B(x) = z_o, is given by

(16)    x_o = A_B†(y_o - AB†(z_o)) + B†(z_o).



Proof. Because of the hypotheses on A and B it is clear that this problem has a unique solution. Now let x_o be defined by (16). Applying (11), we see that

B(x_o) = B(B†(z_o)) = P_{R(B)}(z_o) = z_o.

Next choose any x for which B(x) = z_o. Then

‖A(x) - y_o‖^2 = ‖A(x - B†(z_o)) - AA_B†(y_o - AB†(z_o))‖^2
               + ‖(I - AA_B†)(y_o - AB†(z_o))‖^2
               ≥ ‖(I - AA_B†)(y_o - AB†(z_o))‖^2
               = ‖A(x_o) - y_o‖^2,

with equality if and only if

(17)    A(x - B†(z_o)) = AA_B†(y_o - AB†(z_o)).

(The first equality above arises from the Pythagorean Law applied to the sum of an element in R(A_B) and an element in R(A_B)^⊥.) Now, if x also satisfies (17), then

‖x‖^2 = ‖x - B†(z_o)‖^2 + ‖B†(z_o)‖^2
      = ‖x - B†(z_o) - A_B†(y_o - AB†(z_o))‖^2
      + ‖A_B†(y_o - AB†(z_o))‖^2 + ‖B†(z_o)‖^2
      ≥ ‖x_o‖^2,

unless x = x_o, qed. (Again we have applied the Pythagorean Law, first to elements in N(B)^⊥ and N(B), and then to elements in N(A_B)^⊥ and N(A_B).)
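
A numerical sketch of (16): the restricted pseudoinverse A_B† is realized by pinv(A N) in coordinates, where the columns of N form an orthonormal basis of N(B). All data below are random illustrative choices, and optimality is only spot-checked against sampled feasible points.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))
    B = rng.standard_normal((2, 5))
    y0 = rng.standard_normal(3)
    z0 = B @ rng.standard_normal(5)          # ensure z0 lies in R(B)

    # Orthonormal basis N of N(B); in these coordinates A_B is A @ N.
    _, s, Vt = np.linalg.svd(B)
    N = Vt[(s > 1e-10).sum():].T

    Bp = np.linalg.pinv(B)
    # Formula (16): x0 = A_B-dagger(y0 - A B-dagger(z0)) + B-dagger(z0).
    x0 = N @ np.linalg.pinv(A @ N) @ (y0 - A @ Bp @ z0) + Bp @ z0

    assert np.allclose(B @ x0, z0)           # the constraint holds
    for _ in range(100):                     # no sampled feasible x does better
        x = x0 + N @ rng.standard_normal(N.shape[1])
        assert (np.linalg.norm(A @ x - y0)
                >= np.linalg.norm(A @ x0 - y0) - 1e-9)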



g) In effect, what we have been studying in this section is a special class of "multi-stage" optimization problems, where each stage involves the minimization of a quadratic norm. So far we have only encountered two-stage problems, but higher-stage problems lie close at hand. For example, consider the problem solved in the last theorem, but suppose that z_o ∈ Z \ R(B). We can still define an element x_o via (16), but now its significance is that it is a solution of the following three-stage problem: find an extremal solution (= X-quasi-solution) of B(x) = z_o which, among all such extremal solutions, is a b.a.s. of A(x) = y_o (this latter problem of course being two-stage).

For another example, let X and Y be Hilbert spaces, let T ∈ L(Y,X) be an isomorphism and S ∈ L(X,X) an automorphism, and define new equivalent norms on X and Y by

‖y‖_T ≡ ‖T(y)‖,    ‖x‖_S ≡ ‖S(x)‖.

Then given y_o ∈ Y and A ∈ L(X,Y) (normally solvable), we pose the problem: among all extremal solutions of A(x) = y_o (wrt the ‖·‖_T-norm), find the (unique) element of least ‖·‖_S-norm. To solve this problem, we first note that

‖A(x) - y_o‖_T = ‖TA(x) - T(y_o)‖,

whence the set of ‖·‖_T-extremal solutions is the flat

(TA)†(T(y_o)) + (I - (TA)†(TA))(X).

Then the element of least ‖·‖_S-norm in this flat is given by

x_o = S^{-1}(TAS^{-1})†(T(y_o)).
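
A sketch of this two-stage recipe, taking for simplicity T an invertible diagonal weighting of Y and S one of X; the matrices are arbitrary, and the optimality of x_o over the flat is only spot-checked.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 3))
    A[:, 2] = A[:, 0] + A[:, 1]              # make A rank-deficient
    y0 = rng.standard_normal(4)
    T = np.diag([1., 2., 3., 4.])            # defines || . ||_T on Y
    S = np.diag([2., 1., 1.])                # defines || . ||_S on X

    Si = np.linalg.inv(S)
    # x0 = S^{-1} (T A S^{-1})-dagger T(y0).
    x0 = Si @ np.linalg.pinv(T @ A @ Si) @ T @ y0

    TA = T @ A
    P = np.eye(3) - np.linalg.pinv(TA) @ TA  # projection onto N(TA)
    for _ in range(100):                     # points of the T-extremal flat
        x = np.linalg.pinv(TA) @ T @ y0 + P @ rng.standard_normal(3)
        assert (np.linalg.norm(TA @ x - T @ y0)
                >= np.linalg.norm(TA @ x0 - T @ y0) - 1e-9)
        assert np.linalg.norm(S @ x) >= np.linalg.norm(S @ x0) - 1e-9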

The finite dimensional version of this last result has been used to construct the unbiased linear estimate of minimal variance (Gauss-Markov estimate) of the unknown vector of parameters appearing in a linear statistical model [12, p. 119]. In this situation the isomorphism S^2 represents the (positive definite) covariance matrix of the model, T is the identity, and A is determined by the particular physical situation involved.

Yet another kind of two-stage optimization problem of the type

under discussion occurs in the theory of optimal control. Namely,

from an admissible set of controls it is desired to choose one which

steers a given (linear) system in such a way that at the terminal

time, some (quadratic) function of the difference between the

achieved state and the desired state is minimized (it might also be

important to minimize some (quadratic) function of the difference

between the actual trajectory and a desired trajectory). If there

is more than one such "optimal control", then from among these it is

desired to choose one which minimizes some (quadratic) cost criterion.

An example along these lines is given in [10, p. 174].

h) Let us conclude by citing a few of the more recent works on pseudoinversion. The earlier literature is most adequately referenced in [3]. Very much in the spirit of the present notes are two papers of Ben-Israel [1,2], which expound the use of metric projections onto convex subsets of R^n and pseudoinverses to produce algorithms for solving non-linear equations and inequalities in several variables. For a related approach see also the recent paper of Fletcher [8]. An iterative scheme for computing the operator A†, which generalizes the hyperpower method for inverting an operator, is given by Petryshyn [11]. For an extensive survey of finite dimensional pseudoinverses, there are now available a symposium proceedings [4] and an introductory text [5].



References for §35

1) A. Ben-Israel, On iterative methods for solving nonlinear least squares problems over convex sets. Israel J. Math. 5(1967), 211-224.

2) _____, On Newton's method in nonlinear programming, p. 339-352 in Princeton Symposium on Mathematical Programming (H. Kuhn, Ed.), Princeton Univ. Press, Princeton, 1970.

3) _____ and A. Charnes, Contributions to the theory of generalized inverses. J. Soc. Ind. Appl. Math. 11(1963), 667-699.

4) T. Boullion and P. Odell, Ed's., Symposium on Theory and Application of Generalized Inverses of Matrices. Texas Tech. College, Lubbock, 1968.

5) _____, Generalized Inverse Matrices. Wiley-Interscience, New York, 1971.

6) H. Decell, An application of the Cayley-Hamilton Theorem to generalized matrix inversion. SIAM Rev. 7(1965), 526-528.

7) I. Erdelyi and A. Ben-Israel, Extremal solutions of linear equations and generalized inversion between Hilbert spaces. J. Math. Anal. Appl., to appear.

8) R. Fletcher, Generalized inverses for nonlinear equations and optimization, p. 75-86 in Numerical Methods for Nonlinear Algebraic Equations (P. Rabinowitz, Ed.), Gordon and Breach, New York, 1970.

9) T. Greville, Some applications of the pseudoinverse of a matrix. SIAM Rev. 2(1960), 15-22.

10) N. Minimide and K. Nakamura, A restricted pseudoinverse and its application to constrained minima. SIAM J. Appl. Math. 19(1970), 167-177.

11) W. Petryshyn, On generalized inverses and on the uniform convergence of (I - βK)^n with application to iterative methods. J. Math. Anal. Appl. 18(1967), 417-439.

12) C. Price, The matrix pseudoinverse and minimal variance estimates. SIAM Rev. 6(1964), 115-120.

13) D. Showalter, Representation and computation of the pseudoinverse. Proc. Amer. Math. Soc. 18(1967), 584-586.

14) _____ and A. Ben-Israel, Representation and computation of the generalized inverse of a bounded linear operator between Hilbert spaces. Appl. Math. Report No. 69-12, Northwestern Univ., 1969.

15) S. Zlobec, Explicit computation of the Moore-Penrose generalized inverse. SIAM Rev. 12(1970), 132-134.
