Sei sulla pagina 1di 322

Optima l

Control
CONTEMPORARY SOVIET MATHEMATICS
Series Editor: Revaz Gamkrelidze, Steklov Institute, Moscow, USSR

COHOMOLOGY OF INFINITE-DIMENSIONAL LIE ALGEBRAS


D. B. Fuks
DIFFERENTIAL GEOMETRY AND TOPOLOGY
A. T. Fomenko

LINEAR DIFFERENTIAL EQUATIONS OF PRINCIPAL TYPE


Yu. V. Egorov

OPTIMAL CONTROL
V. M. Alekseev, V. M. Tikhomirov, and S. V. Fomin

THEORY OF SOLITONS: The Inverse Scattering Method


S. Novikov, S. V. Manakov, L. P. Pitaevskii, and V. E. Zakharov
TOPICS IN MODERN MATHEMATICS: Petrovskii Seminar No.5
Edited by 0. A. Oleinik
Opt ima l
Con trol

V. M. Aleksee v
V. M. Tikhom irov
and S. V. Fomin
Department of Mathematics and Mechanics
Moscow State University
Moscow, USSR

Translated from Russian by


V. M. Volosov

SPRINGER SCIENCE+BUSINESS MEDIA, LLC


Library of Congress Cataloging in Publication Data

Alekseev, V. M. (Vladimir Mikhailovich)


Optimal control.
(Contemporary Soviet mathematics)
1. Control theory. 2. Mathematical optimization. I. Tikhomirov, V. M. II. Fomin, S.
V. (Sergei Vasil'evich Ill. Title. IV. Series.
QA402.3.A14 1987 001.53 87-6935
ISBN 978-1-4615-7553-5 ISBN 978-1-4615-7551-1 (eBook)
DOI 10.1007/978-1-4615-7551-1

This translation is published under an agreement with the


Copyright Agency of the USSR (VAAP) .

© 1987 Springer Science+Business Media New York


Originally published by Plenum Publishing Corporation, New York in 1987
Softcover reprint of the hardcover 1st edition 1987
All rights reserved

No part of this book may be reproduced, stored in a retrieval system, or transmitted


in any form or by any means, electronic, mechanical, photocopying, microfilming,
recording, or otherwise, without written permission from the Publisher
PREFACE

There is an ever-growing interest in control problems today, con-


nected with the urgent problems of the effective use of natural resources,
manpower, materials, and technology. When referring to the most important
achievements of science and technology in the 20th Century, one usually
mentions the splitting of the atom, the exploration of space, and computer
engineering. Achievements in control theory seem less spectacular when
viewed against this background, but the applications of control theory are
playing an important role in the development of modern civilization, and
there is every reason to believe that this role will be even more signifi-
cant in the future.
Wherever there is active human participation, the problem arises of
finding the best, or optimal, means of control. The demands of economics
and technology have given birth to optimization problems which, in turn,
have created new branches of mathematics.
In the Forties, the investigation of problems of economics gave rise
to a new branch of mathematical analysis called linear and convex program-
ming. At that time, problems of controlling flying vehicles and technolog-
ical processes of complex structures became important. A mathematical
theory was formulated in the mid-Fifties known as optimal control theory.
Here the maximum principle of L. S. Pontryagin played a pivotal role. Op-
timal control theory synthesized the concepts and methods of investigation
using the classical methods of the calculus of variations and the methods
of contemporary mathematics, for which Soviet mathematicians made valuable
contributions.
This book is intended to serve as a text in various courses in opti-
mization that are offered in universities and colleges. A brief outline of
the general concept and structure of the book is given below.
The history of the investigation of problems on extremum or, as we
usually say, "extremal problems," did not begin in our time- such problems
have attracted the attention of mathematicians for ages. Chapter 1, which
is written for a wide audience with diverse mathematical backgrounds, fol-
lows the trend of development of this science. Although the mathematical
apparatus used there is minimal, the style of presentation is that of a
precise mathematical text, confined to the most impressive yet elementary
periods in the history of the investigation of extremal problems. The aim
is to unite the early concepts of J. Kepler and P. Fermat, the problems
posed by C. Huygens, I. Newton, and J. Bernoulli, and the methods of

v
vi PREFACE

J. L. Lagrange, L. Euler, and K. T. W. Weierstrass with present-day theory


which, in fact, is an extension of the works of these great scientists.
Chapter 1 also describes some methods of solving concrete problems and
presents examples for solving these problems based on a unified approach.
The remainder of the book is intended for mathematicians. The appear-
ance of the new theory stimulated the development of both the old and new
branches of mathematical analysis, yet not every one of these branches is
adequately represented in modern mathematical education. That part of
classical mathematical analysis centered on the topic of implicit functions
plays an exceptional role at present in all aspects of finite-dimensional
and infinite-dimensional analysis. The same applies to the principles of
convex analysis. Finally, in the development of optimal control theory,
the basic principles of the theory of differential equations remain valid
for equations with discontinuous right-hand sides. The above three divi-
sions of mathematical analysis and geometry are dealt with in Chapter 2.
Chapters 3 and 4 deal with the theory of extremal problems, a substan-
tial portion of which forms the subject matter in the half-year courses
given by the authors at the Department of Mathematics and Mechanics of
Moscow State University. The text is so arranged that the proof of the
main theorems takes no more than one lecture (two sessions of 45 minutes
each in the Soviet Union). The presentation of the material is thorough,
the authors avoiding omissions and references to the obvious.
Some standard notation from set theory, functional analysis, etc. is
used in the text without further comment, and, to facilitate reading, a
list of basic notation with brief explanations is given at the end of the
book.
Sections in the text marked with an asterisk are intended to acquaint
the reader with modern methods in the theory of extremal problems. The
presentation of the material in these sections follows ciosely the style
of a monograph although a few references are made to some theorems which
have become classical but are not at present included in traditional mathe-
matical programs. These sections are based on special courses known as
"Convex Analysis," "Supplementary Chapters in the Theory of Extremal Prob-
lems," and others that have been given for a number of years at the Depart-
ment of Mathematics and Mechanics of Moscow State University.
The book is intended for a broad audience: primarily for students of
universities and colleges with comprehensive mathematical programs, and
also for engineers, economists, and mathematicians involved in the solution
of extremal problems. The Introduction (Chapter 1) is written for the last
group of readers, and researchers intimately associated with the theory of
extremal problems will find the Notes and Guide to the Literature at the
end of the book useful.
Credit must be given to the late Professor S. V. Fomin in organizing
and expanding the course in optimal control at the Department of Mathemat-
ics and Mechanics of Moscow State University. It was on his initiative
that the present book was written. His untimely death in the prime of his
life did not allow him to see this work realized. In the preparation of
this book the authors used the original text of the lectures given by S. V.
Fomin and V. M. Tikhomirov. Professor V. M. Alekseev, one of the co-authors
of this book, passed away on December 1, 1980. This book is his last major
work.
I express my gratitude to the Faculty of General Control Problems of
the Department of Mathematics and Mechanics of Moscow State University for
their participation in the discussion of teaching methods for the course
in optimal control and the method of presentation of the subjects touched
upon in this book.
PREFACE vii

I am indebted to A. A. Milyutin for his creative work in the formula-


tion of the mathematical concepts on which this book is based. Thanks are
also due to A. I. Markushevich for his valuable advice and to A. P. Bus-
laev and G. G. Magaril-Il'yaev for reading the manuscript and making a
number of valuable comments.
The English translation incorporates suggestions made by me and V. M.
~lekseev. It includes two supplementary sections: Section 3.5 of Chapter 3
and the whole of Chapter 5. I express my deep gratitude to Professor V.M.
Volosov for translating the book and for the various refinements that he
made in the translation.

V. M. Tikhomirov
CONTENTS

Chapter 1
INTRODUCTION

1.1. How Do Extremal Problems Appear?. . . . . ...... . 1


1.1.1. The Classical Isoperimetric Problem. Dido's Problem. 1
1.1.2. Some Other Ancient Extremal Problems in Geometry 5
1.1.3. Fermat's Variational Principle and Huygens'
Principle. The Light Refraction Problem . . . . 7
1.1.4. The Brachistochrone Problem. The Origin of the
Calculus of Variations . . . . . . . . . . . . . 9
1.1.5. Newton's Aerodynamic Problem . . . . . . . . . . 11
1.1.6. The Diet Problem and the Transportation Problem. 12
1.1.7. The Time-Optimal Problem . . . 13

1.2. How Are Extremal Problems Formalized? 13


1.2.1. Basic Definitions . . . . 13
1.2.2. The Simplest Examples of Formalization of Extremal
Problems . . • . . . . . . . . . . . . 14
1.2.3. The Formalization of Newton's Problem . . . . . 15
1.2.4. Various Formalizations of the Classical
Isoperimetric Problem and the Brachistochrone
Problem. The Simplest Time-Optimal Problem. . 17
1.2.5. Formalization of the Transportation and the Diet
Problems . . . . . . . . . . . . . . . 19
1.2.6. The Basic Classes of Extremal Problems . . . . 19

1.3. Lagrange Multiplier Rule and the Kuhn-Tucker Theorem. 23


1.3.1. The Theorem of Fermat . . . . 23
1.3.2. The Lagrange Multiplier Rule . . . . . . . 24
1.3.3. The Kuhn-Tucker Theorem . . . . . . . . . . 28
1.3.4. Proof of the Finite-Dimensional Separation
Theorem. . . 31

1.4. Simplest Problem of the Classical Calculus of Variations


and Its Generalizations . . . . . . 32
1.4.1. Euler Equation . . . . . . . . . . . . 32
1.4.2. Necessary Conditions in the Bolza Problem.
Transversality Conditions . . . . . . . 36
1.4.3. The Extension of the Simplest Problem . . . 37

ix
X CONTENTS

1.4.4. Needlelike Variations. Weierstrass' Condition 43


1.4.5. Isoperimetric Problem and the Problem with
Higher Derivatives . . . . . . . . . . . . . . 45

1.5. Lagrange Problem and the Basic Optimal Control Problem. 47


1.5.1. Statements of the Problems . . . . . . . . . 47
1.5.2. Necessary Conditions in the Lagrange Problem . 48
1.5.3. The Pontryagin Maximum Principle . . ... so
1.5.4. Proof of the Maximum Principle for the Problem
with a Free End Point. 51

1.6. Solutions of the Problems . 56


1.6.1. Geometrical Extremal Problems. 57
1.6.2. Newton's Aerodynamic Problem . 60
1.6.3. The Simplest Time-Optimal Problem. 63
1.6.4. The Classical Isoperimetric Problem and the
Chaplygin Problem. . . . . . . . . . . . . . 66
1.6.5. The Brachistochrone Problem and Some Geometrical
Problems . . . . . . . . . . . . . . . . . . . . 69

Chapter 2
MATHEMATICAL METHODS OF THE THEORY OF EXTREMAL PROBLEMS

2.1. Background Material of Functional Analysis . . 71


2.1.1. Normed Linear Spaces and Banach Spaces 71
2.1.2. Product Space. Factor Space . . . . . . 72
2.1.3. Hahn-Banach Theorem and Its Corollaries 74
2.1.4. Separation Theorems . . . . . . . . 77
2.1.5. Inverse Operator Theorem of Banach and the
Lemma on the Right Inverse Mapping 79
2.1.6. Lemma on the Closed Image . . . . . 80
2.1.7. Lemma on the Annihilator of the Kernel of a
Regular Operator . . . . . . . . . . . 81
2.1.8. Absolutely Continuous Functions . . . . 81
2.1.9. Riesz' Representation Theorem for the General
Linear Functional on the Space C. Dirichlet's
Formula . . . . . . . . . . . . . . . . . 84

2.2. Fundamentals of Differential Calculus in Normed Linear


Spaces . . . . . . . . . . . . . . . . . . . . . 85
2.2.1. Directional Derivative, First Variation, the
Frechet and the Gateaux Derivatives, Strict
Differentiability . . . . . . . . . . . . . . 85
2.2.2. Theorem on the Composition of Differentiable
Mappings . . . . . . . . . . . . . . . . . . 90
2.2.3. Mean Value Theorem and Its Corollaries . . . 92
2.2.4. Differentiation in a Product Space. Partial
Derivatives. The Theorem on the Total Differ-
ential . . . . . . . . . . . . . . . . . . . 94
2.2.5. Higher-Order Derivatives. Taylor's Formula. 97

2.3. Implicit Function Theorem . . . . . . . . 101


2.3.1. Statement of the Existence Theorem for an
Implicit Function. . . . . 101
2.3.2. Modified Contraction Mapping Principle 102
2.3.3. Proof of the Theorem . . . . . . . . . 103
2.3.4. The Classical Implicit Function and Inverse
Mapping Theorems . . . . . . . . . . . . . . 105
CONTENTS xi

2.3.5. Tangent Space and Lyusternik's Theorem . 108

2.4. Differentiability of Certain Concrete Mappings. 111


2.4.1. Nemytskii's Operator and a Differential Constraint
Operator . . . . . . . . . . . . 111
2.4.2. Integral Functional . . . . . . . 113
2.4.3. Operator of Boundary Conditions. 115

2.5. Necessary Facts of the Theory of Ordinary Differential


Equations . . . . . . . . . . 116
2.5.1. Basic Assumptions . . . 117
2.5.2. Local Existence Theorem. 118
2.5.3. Uniqueness Theorem . . . 120
2.5.4. Linear Differential Equations. 121
2.5.5. Global Existence and Continuity Theorem. 124
2.5.6. Theorem on the Differentiability of Solutions
with Respect to the Initial Data . . . . . 128
2.5.7. Classical Theorem on the Differentiability of
Solutions with Respect to the Initial Data 130

2.6. Elements of Convex Analysis . 133


2.6.1. Basic Definitions . . . 133
2.6.2. Convex Sets and Functions in Topological Linear
Spaces . . . . . . . . 138
2.6.3. Legendre-Young-Fenchel Transform. The
Fenchel-Moreau Theorem . . . . . . 144
2.6.4. Subdifferential. Moreau-Rockafellar Theorem.
Dubovitskii-Milyutin Theorem . . . . 147

Chapter 3
THE LAGRANGE PRINCIPLE FOR SMOOTH PROBLEMS WITH CONSTRAINTS

3.1. Elementary Problems . . . . . . . . . . . . . 154


3.1.1. Elementary Problem without Constraints 154
3.1.2. Elementary Linear Programming Problem. 157
3.1.3. The Bolza Problem . . . . . . . . . . 158
3.1.4. Elementary Optimal Control Problem . 160
3.1.5. Lagrange Principle for Problems with Equality and
Inequality Constraints . . . . . . . . . . . . . . 160

3.2. Lagrange Principle for Smooth Problems with Equality and


Inequality Constraints . . . . 163
3.2.1. Statement of the Theorem 163
3.2.2. Lagrange Multiplier Rule for Smooth Problems with
Equa1ity Constraints . . 164
3.2.3. Reduction of the Problem 165
3.2.4. Proof of the Theorem . . 166

3.3. Lagrange Principle and the Duality in Convex


Programming Problems . . . . . . . . . . . . . 169
3.3.1. Kuhn-Tucker Theorem (Subdifferential Form) . 169
3.3.2. Perturbation Method and the Duality Theorem. 170
3.3.3. Linear Programming: The Existence Theorem and the
Duality Theorem. . . . . . . . . . . . . . . . . . 174
3.3.4. Duality Theorem for the Shortest Distance Problem.
Hoffman's Lemma and Minimax Lemma . . . . . . . . . 178
xii CONTENTS

3.4. Second-Order Necessary Conditions and Sufficient Conditions


for Extremum in Smooth Problems . . . . . . . . . . 186
3.4.1. Smooth Problems with Equality Constraints . . . . . 186
3.4.2. Second-Order Necessary Conditions for Smooth Prob-
lems with Equality and Inequality Constraints . . . 188
3.4.3. Sufficient Conditions for an Extremum for Smooth
Problems with Equality and Inequality Constraints. 190

3.5. Application of the Theory to Algebra and Mathematical


Analysis . . . . . . . . . . . . . . . 193
3.5.1. Fundamental Theorem of Algebra . . . . . . . 193
3.5.2. Sylvester's Theorem . . . . . . . . . . . . . 194
3.5.3. Distance from a Point to a Subspace. Theorem
on the Orthogonal Complement. Gram Determinants 195
3.5.4. Reduction of a Quadratic Form to Its Principal
Axes . . . . . . . 197
3.5.5. Legendre Quadratic Forms 201

Chapter 4
THE LAGRANGE PRINCIPLE FOR PROBLEMS OF THE CLASSICAL CALCULUS
OF VARIATIONS AND OPTIMAL CONTROL THEORY

4.1. Lagrange Principle for the Lagrange Problem . . . . 203


4.1.1. Statement of the Problem and the Formulation of
the Theorem. . ........ . 203
4.1.2. Reduction of the Lagrange Problem to a Smooth
Problem . . . 207
4.1.3. Generalized DuBois-Reymond Lemma . . 208
4.1.4. Derivation of Stationarity Conditions. 210
4.1.5. Problem with Higher-Order Derivatives. The
Euler-Poisson Equation . . 212

4.2. The Pontryagin Maximum Principle. 214


4.2.1. Statement of the Optimal Control Problem 214
4.2.2. Formulation of the Maximum Principle. The
Lagrange Principle for the Optimal Control Problem 217
4.2.3. Needlelike Variations . . . . . . . . . . . 220
4.2.4. Reduction to the Finite-Dimensional Problem. 222
4.2.5. Proof of the Maximum Principle . . . . . . . 223
4.2.6. Proof of the Lemma on the Packet of Needles. 227
4.2.7. Proof of the Lemma on the Integral Functionals 234

4.3. Optimal Control Problems Linear with Respect to Phase


Coordinates 236
4.3.1. Reduction of the Optimal Control Problem Linear
with Respect to the Phase Coordinates to the
Lyapunov-Type Problem. 236
4.3.2. Lyapunov's Theorem 237
4.3.3. Lagrange Principle for the Lyapunov-Type
Problems 240
4.3.4. Duality Theorem. . . ..... . 245
4.3.5. Maximum Principle for Optimal Control Problems
Linear with Respect to the Phase Coordinates . 248

4.4. Application of the General Theory to the Simplest Problem


of the Classical Calculus of Variations . . . . 250
4.4.1. Euler Equation. Weierstrass' Condition.
Legendre's Condition . • • . . . . . . . 250
CONTENTS xiii

4.4.2. Second-Order Conditions for a Weak Extremum.


Legendre's and Jacobi's Conditions . . 252
4.4.3. Hamiltonian Formalism. The Theorem on the
Integral Invariant . . . . . . . . • . . . 255
4.4.4. Sufficient Conditions for an Absolute Extremum in
the Simplest Problem . . . . . . . . . . 261
4.4.5. Conjugate Points. Sufficient Conditions for Strong
and Weak Extrema . . . . . . . 265
4.4.6. Theorem of A. E. Noether . . . 272
4.4.7. Lagrange Variational Principle and the
Conservation Laws in Mechanics 275

PROBLEMS . . . . . . . . . 278
NOTES AND GUIDE TO THE LITERATURE. 298
REFERENCES . . 301
BASIC NOTATION . . . . . . . . . . 306
Chapter 1
INTRODUCTION

1.1. How Do Extremal Problems Appear?


Striving to improve oneself is a characteristic of human beings, and
when one has to choose from several possibilities one naturally tries to
find the optimal possibility among them.
The word "optimal" originates from the Latin "optimus," meaning the
best. To choose the optimal possibility one has to solve a problem of
finding a maximum or a minimum, i.e., the greatest or the smallest values
of some quantities. A maximum or a minimum is called an extremum ·(from
the Latin "extremus," meaning furthest off). Therefore the problems of
finding a maximum or a minimum are called extremal problems.
The methods of solving and analyzing all kinds of extremal problems
constitute special branches of mathematical analysis. The term optimiza-
tion problems is used in almost the same sense; this term indicates more
distinctly its connection with practical applications of mathematics.
The aim of this book is to present to the reader the theory of extre-
mal problems and the methods of their solution. However, before proceeding
to the formal and logically consistent presentation of this branch of math-
ematics, we turn to the past to gain a better understanding of the reasons
motivating scientists to pose and solve extremal problems in their applied
aspect, i.e., as optimization problems.
Mercatique solum, facti
de nomine Byrsam
Taurino quantum possent
circumdare tergo
P. Vergilius Mara, "The Aeneid"*
1.1.1. The Classical Isoperimetric Problem. Dido's Problem. Prob-
lems of finding maximum and minimum values were first formulated in anti-
uity. The classical isoperimetric problem is perhaps the most ancient of
the extremal problems known to us. It is difficult to say when the idea
was first stated that the circle and the sphere have the greatest "capac-
ity" among all the closed curves of the same length and all the surfaces
of the same area. Simplicius (6th century A.D.), one of the last pupils

*They bought a space of ground, which (Byrsa called, from the bull's hide)
they first enclosed ... " [The Works of Virgil, Oxford University Press
(1961), p. 144, translated by J. Dryden.]

1
2 CHAPTER 1

of the school of Neoplatonism at Athens who compiled extensive commentaries


on Aristotle's work (4th century B.C.), writes: "It was proved before
Aristotle (since the latter uses it as a known fact) and later proved by
Archimedes and Zenodoros more completely that, among the isoperimetric fig-
ures, the circle, and among the solids of constant surface area, the ball,
have the greatest capacity." These words indicate the statement of the
following extremal problems: among the plane closed curves of a givenlength
find the curve enclosing the greatest area, and among the closed spatial
surfaces of a given area find the surface enclosing the greatest volume.
Such a formulation of the problem is natural for a Platonist and is
connected with the search for perfect forms. As is known, the circle and
the ball were symbols of perfection in ancient times.
A more matter-of-fact motivation of the same isoperimetric problem and
a number of similar problems are found in a sufficiently distinct though
naive form in the legend of Dido. Let us recall the plot of the legend as
it is narrated in "The Aeneid" by the Roman poet Virgil (see the epigraph
at the beginning of this subsection).
The Phoenician princess Dido with a small group of inhabitants of the
city of Tyre fled from her brother, the tyrant Pygmalion, and set sail for
the West along the Mediterranean coast. Dido and her faithful companions
chose a good place on the north coast of Africa (at present the shore of
the Gulf of Tunis) and decided to found a settlement there. It seems that
among the natives there was not much enthusiasm for this idea. However,
Dido managed to persuade their chieftain Hiarbas to give her "as much land
as she could enclose with the hide of a bull." Only later did the simple-
hearted Hiarbas understand how cunning and artful Dido was: she then cut
the hull's hide into thin strips, tied them together to form an extremely
long thin thong, and surrounded with it a large extent of territory and
founded the city of Carthage* there. In commemoration of this event the
citadel of Carthage was called Byrsa.t According to the legend, all these
events occurred in 825 (or 814) B.C.
Analyzing the situation, we see that here there are several possibili-
ties of stating an optimization problem.
A) It is required to find the optimal form of a lot of land of the
maximum area S for a given perimeter L.
Clearly, this is that same classical isoperimetric problem.+ Its so-
lution is a circle.
Exercise. Assuming that the hide of a bull is a rectangle of 1 x 2 m
and that the width of the thong is 2 mm, determine L and the maximum S.
[The authors could not find information on the exact dimensions of
Byrsa. Situated on a high hill (63 m above sea level) it was unlikely to
be very large. Compare the length of the wall of the Kremlin in Moscow-
2235 m.]

*From the Phoenician name "Quarthadasht" ("new town").


tByrsa is a word Punic in origin (''bull 's hide"). The name "Punic" orig-
inates from the Latin "Punicus" pertaining to Carthage or its inhabitants.
tAnother real situation leading to the same problem was described in the
story "How much land does a man need?" by L. N. Tolstoy. For the analysis
of this story from the geometrical viewpoint, see the book "Zanimatel'naya
geometriya" (Geometry for Entertainment) by J. A. Perelman, Moscow, Gos-
tekhizdat (1950), Chapter 12 (in Russian).
INTRODUCTION 3

The solution of the isoperimetric problem is contained in the follow-


ing assertion:
If a rectifiable curve of length L encloses a plane figure of area S,
then
L';;;;:.4nS, (1)

and, moreover, the equality takes place here if and only if the curve is a
circle.
Relation (1) is called the isoperimetric inequality. For its proof,
see [21].
B) Other statements of the problems are obtained if we suppose (which
is quite natural) that Dido wanted to preserve the access to the sea. To
distinguish between these problems and the classical isoperimetric problem,
we shall call the former Dido's problems. For the sake of simplicity, we
shall first consider the case of a rectilinear coastline (Fig. 1).
Dido's First Problem. Among all arcs of length L, lying within a
half-plane bounded by a straight line l, with end points A, B E l find the
arc which together with the line segment [AB] encloses a figure of the max-
imum area S.
Solution. Let ACB be an arbitrary admissible arc with end points A,
BEL, enclosinga figure of areaS (Fig. 1). Together this arc and its sym-
metric reflection in l form a closed curve of length 2L enclosing a figure
of area 2S. According to the isoperimetric inequality,
(2L) 2 ;;;::. 4n2S, (2)

whence
So:;;;P;(2n). (3)

Consequently the maximum value of Scan only be equal to L2 /(2n), and this
value is in fact attained if ACB is a semicircle subtending the diameter
[AB]. The problem possesses a single solution to within a shift along the
straight line l (why?).
C) In the foregoing problem the end points A and B of the sought-for
arc could occupy arbitrary positions on the straight line l. Now, what
occurs when the end points are fixed?
Dido's Second Problem, Among all arcs of length L, lying in a half-
plane bounded by a straight line l, with fixed end points A,B E l, find the
arc which together with the line segment [AB] encloses a figure of the max-
imum area.
Solution. It is evident that the problem makes sense only for L >
IABI (otherwise, then either there is no arc satisfying the conditions of
the problem or there is only one such arc, namely (for L = IABI) the line
segment [AB] itself. By analogy with the first problem, it is natural to
expect that the solution is an arc of a circle for which [AB] is a chord.
Such an arc ACB is determined uniquely. Let us complete it to the whole
circle by adding the arc ADB (Fig. 2). Let us denote the length of the
arc ADB by A and the area of the segment of the circle bounded by this arc
and the line segment [AB] by o.
Now let ACB be an arbitrary arc satisfying the conditions of the prob-
lem which together with [AB] encloses an area S. The length of the closed
curve ACBD is equal to L +A and the area bounded by it is equal to S +a.
According to (1), we have 4n(S + o) ~ (L + A) 2 , whence
4 CHAPTER 1

~~
A,
', S
l
I

... ___ , "


', I

Fig. 1 Fig. 2 Fig. 3

As in (1), the equality in this relation and, consequently, the maximumS


are attained if and only if the curve ACBD is a circle, i.e., if the two
arcs are equal: ACB = ACB.
Attention should be paid. to the following distinction between the two
problems we have considered. In Dido's first problem the set of the ad-
missible curves is broader since the positions of the points A and B are
not prescribed. By the way, without loss of generality, one of them, say
A, can be regarded as fixed. Then the position of the point B is deter-
mined by an additional condition: ACB is not simply an arc of a circle as
in Dido's second problem but is a semicircle. In the equivalent form this
condition is stated thus: the sought-for arc approaches the straight line
l at angles of 90° at its end points. Later we shall see that this demon-
strates a general principle: when the end points of a sought-for curve
possess some freedom we must require that at these points certain condi-
tions called transversality conditions be fulfilled. As to the shape of
the sought-for _curve, it is the same in both the problems and is determined
by a certain equation (the Euler equation) which must hold along the curve.
In the case under consideration the sought-for curve must have one and the
same curvature at all its points.
D) Now let us consider the case when the coastline is not a straight
line. We shall confine ourselves to fixed end points. It can easily be
understood that if the coastline between A and B deviates slightly from a
straight line, the solution will be again the same circular arc ACB as be-
fore.
The above proof is completely applicable if we denote by cr the area
shaded in Fig. 3. Figure 4 shows what occurs when there is a big gulf be-
tween A and B. For instance, let a canal DC be dug perpendicularly to AB.
Assuming that the border of the city must go along the coastline we see
that the solution of the problem is the same arc ACB when the point D re-
mains inside ACB (Fig. 4a). In case the canal DC intersects the arc ACB
but the inequality lACI + ICBI < L holds, the solution is the curve composed
of the two circular arcs AC and CB (Fig. 4b). In the limiting case IACI +
ICBI = L the solution is the broken line ACB, and for IACI + ICBI > L there
exists no solution.
This version of the problem could be called Dido's problem with phase
constraints.
E) Finally, let us dwell on one more version of Dido's problem. Sup-
pose that for some reason (for instance, because of the ban imposed by the
priests of Eshmun, a god whose temple was built in Byrsa) the walls of the
city must not be inclined to the coastline (which we again assume to be
rectilinear) at an angle greater than 45°. Such a problem is an optimal
control problem. Its solution can be found with the aid of the Pontryagin
maximum principle which will be presented later. In a typical case the
solution is as shown in Fig. 5. The line segments AC and BE form angles
of 45° with the coastline, and CDE is an arc of a circle.
INTRODUCTION 5

a b A ff
Fig. 4 Fig. 5

1 .1.2. Some Other Ancient Extremal Problems in Geometry. The iso-


perimetric problem has been discussed sufficiently thoroughly, and in this
section we shall dwell on some other extremal probelms of geometrical
character which were considered by mathematicians of different centuries.
In particular, such problems are found in the works of Euclid, Archimedes,
and Apollonius, the great mathematicians of antiquity.
In Euclid's "Elements" (4th century B.C.) only one maximum problem is
found (Book 6, Proposition 27). In terms of modern language it is
stated thus:
The Problem of Euclid. In a given triangle ABC inscribe a parallelo-
gram ADEF (Fig. 6) with the greatest area.
Solution. The vertices D, E, and F of the sought-for parallelogram
are the midpoints of the corresponding sides of the given triangle. This
can be proved in various ways. For instance, it can easily be shown that
the parallelograms DDGE and FHEF have equal areas. It follows that the
area of the parallelogram ADEF is less than that of the parallelogram ADEF
since the latter is equal to the area of the figure ADGEHF containing the
parallelogram ADEF. •
In Archimedes' works (3rd century B.C.) known to us the isoperimetric
problem is not mentioned. It is still unknown what was done by Archimedes
in this field, and therefore the words of Simplicius quoted in Section
1.1.1 still remain mysterious. However, in the work "On a Ball and aCyl-
inder" by Archimedes there is a solution of a problem on figures of a given
area. Namely, in this work Archimedes stated and solved the problem on the
maximum volume that spherical segments with a given lateral surface area
may have.
The solution of this problem is a hemisphere (just as a semicircle
is the solution of Dido's second problem).
"Conics" is the name of the greatest work of Apollonius (3rd-2nd cen-
turies B.C.). The fifth book of "Conics" is related to our topic. Vander
Waerden* writes: Apollonius states the "problem of how to draw the shortest
and the longest line segments from a point 0 (see Fig. 7) to a conic sec-
tion. However, he does even more than he promises: he finds all the
straight lines passing through 0 which intersect the conic section at right
angles (at present they are called normals) and analyzes for what position
of 0 the problem has two, three, or four solutions." Moving the point 0 he
"determines the ordinates of the boundary points G1 and Gz where the number
of the normals through 0 passes at once from 2 to 4 and vice versa."
After the downfall of the ancient civilization the scientific activity
in Europe had been at a standstill until the 15th century. In the 16th
century the foundations of algebra were laid down and the first extremal
problem of algebraic character appeared.

*B. L. Vander Waerden, Ontwakende Wetenschap, Groningen (1950).


6 CHAPTER 1

Fig. 6 Fig. 7 Fig. 8


For example, N. Tartaglia (16th century) posed the following problem:
divide the number 8 into two parts such that the product of their product
by their difference is maximal.
Until the 17th century no general methods of solving extremal problems
were elaborated, and each problem was solved by means of a special method.
In 1615 the book "New Stereometry of Wine Barrels"* by J. Kepler was pub-
lished. Kepler begins his book thus: "The year when I got married the
grape harvest was rich and wine was cheap, and therefore as a thrifty man
I decided to store some wine. I bought several barrels of wine. Some time
later the wine merchant came to measure the capacity of the barrels in
order to set the price for the wine. To this end he put an iron into every
barrel and immediately announced how much wine there was in the barrel
without any calculations." (This "method" is demonstrated in Fig. 8.)
Kepler was very much surprised. He found it strange that it was pos-
sible to determine the capacity of barrels of different shape by means of
a single measurement. Kepler writes: "I thought it would be appropriate
for me to take a new subject for my mathematical studies: to investigate
the geometrical laws of this convenient measurement and to find its founda-
tions." To solve the stated problem Kepler laid the foundation of differ-
ential and integral calculus and at the same time formulated the first
rules for solving extremal problems. He writes: "Under the influence of
their good genius who was undoubtedly a good geometer the coopers began
shaping barrels in such a way that from the measurement of the given length
of a line it was possible to judge upon the maximum capacity of the barrel,
and since changes are insensible in the vicinity of every maximum, small
random deviations do not affect the capacity appreciably."
The words we have underlined contain the basic algorithm of finding
extrema which was later stated as an exact theorem first by Fermat in 1629
(for p0lynomials) and then by Newton and Leibniz and was called the theorem
of FeY'171at.
It should also be noted that Kepler solved a number of concrete ex-
tremal problems and, in particular, the problem on a cylinder of the
greatest volume inscribed in a ball.
To conclude this section, we shall present one more geometrical.
problem in which many mathematicians of the 17th century were interested
(Cavalieri, Viviani, Torricelli, Fermat, etc.). It was studied in the 19th
century by the Swiss-German geometer J. Steiner and therefore is called
Steiner's problem. t
Steiner's Problem. Find a point on a plane such that the sum of the
distances from that point t·o three given points on the plane is smallest.
*J. Kepler, Nova Stereometria Doliorum Vinariorum, Linz (1615).
tit should be noted that later problems of this kind appeared in highway
engineering, oil pipeline engineering, and laying urban service lines.
INTRODUCTION 7

Fig. 9 Fig. 10 Fig. 11

Solution. We shall present an elegant geometrical solution of the


problem for the case when the three given points are the vertices of an
acute· triangle. Let the angle at the vertex C of the triangle ABC (Fig. 9)
be not smaller than 60°. Let us turn the triangle ABC through an angle of
60° about the point C. This results in the triangle A'B'C. Now we take
an arbitrary point D within the triangle ABC and denote by D' the image of
D under this rotation through 60°. Then the sum of the lengths IADI +
IBDI + ICDI is equal to the length of the broken line IBDI + IDD'I + ID'A'I
because the triangle CDD' is equilateral and ID'A' I = IDAI.
Now let D be the Torricelli point, i.e., the point at which the angles
of vision of the three sides of the triangle are equal to 120°, and let D'
be the image of D under the rotation. Now it can easily be understood that
the points B, D, D', and A' lie in one straight line, and hence the Torri-
celli point is the solution of the problem. •
We have acquainted the reader with a number of problems of geometrical
character posed and solved at different times. Their statement can only be
partly justified from the viewpoint of practical demands, and perhaps they
were mainly stimulated by the desire to demonstrate the elegance of geom-
etry itself.
1.1.3. Fermat's Variational Principle and Huygens' Principle. The
Light Refraction Problem. The name of Pierre Fermat is connected with the
formulation of the first variational principle for a physical problem. We
mean Fe~at's variational principle vn geometrical optics. The law of
light refraction was established experimentally by W. Snell. Soon after
that R. Descartes suggested a theoretical explanation of Snell's law. How-
ever, according to Descartes, the speed of light in a denser medium (say,
in water) is greater than in a rarer medium (e.g., in the air). This
seemed strange to many scientists.
Fermat gave another explanation of the phenomenon. His basic idea
was that a ray of light "chooses" a trajectory so that the time of travel
from one point to another along this trajectory be minimal (in comparison
with any other trajectories joining the same points). In a homogeneous
medium where the speed of light propagation is the same at all points and
in all directions the time of travel along a trajectory is proportional to
its length. Therefore, the minimum time trajectory connecting two points
A and B is simply the line segment [AB]: light propagates along straight
lines in a homogeneous medium.
At present the derivation of Snell's law from Fermat's variational
principle is included in textbooks for schools (also see Section 1.6.1).
8 CHAPTER

Fermat's principle is based on the assumption that light propagates


along certain lines. C. Huygens (1629-1695) suggested another explanation
of the laws of propagation and refraction of light based on the concept of
light as a wave whose front moves with time.
Abstracting from the discussion of the physical foundations of this idea
and, in particular, the question of whether light is a wave or a flux of
particles, we shall give the following definition which is visual rather
than rigorous.
By a wave front St is meant the set of points that are reached in time
t by light propagating from a source So.
For instance, if So is a point source (Fig. 10) and the medium is
homogeneous, then St is a sphere (a "spherical wave") of radius vt with
center at So. As t increases the wave front expands uniformly in all di-
rections with velocity v. As to the lines of light propagation, they form
a pencil of rays (which are the radii) orthogonal to St at each instant of
time t. As the spherical wave moves off the source, the wave front becomes
flatter and still flatter, and if we imagine that the source recedes to
infinity, in the limit the wave front becomes a plane moving uniformly with
velocity v and remaining perpendicular to the beam of rays of light which
are parallel to one another (Fig. 11).
To determine the motion of a wave front in more complicated situations
Huygens uses the following rule (Huygens' principle). Every point of a
wave front St becomes a secondary source of light, and in time ~t we obtain
the family of the wave fronts of all these secondary sources, and the real
wave front St+~t at the instant of time t + ~t is the envelope of this fam-
ily (Fig. 12).
It can easily be seen that the propagation of light in a homogeneous
medium and the limiting case of a plane wave satisfy Huygens' principle.
As an example of the application of this principle for the simplest homo-
geneous medium, we shall present the derivation of Snell's law "according
to Huygens."
Let a parallel beam of rays of light be incident on a plane interface
~ between two homogeneous media; for simplicity, we shall assume that ~ is
horizontal and the light is incident from above (Fig. 13). By v 1 and v 2
we denote the speed of light above and below~. and by a1 and a 2 we denote
the angle of incidence and the angle of refraction (reckoned from the nor-
mal N to~). The wave front A1A2A propagates with velocity v1, and at some
timet the light issued from the point A reaches the interface.~ at the
point B. After that the point B becomes a secondary source of spherical
waves propagating in the lower medium with velocity v 2 • The light reaches
the point c1 at time t 1 =t+IB1C1 1=t+IBC1 j 8 ina1 , and the intermediate point
vl vl

DE[BC1] at time i 2 =i+JBDJ 8 ina1 • At time t1 the radius of the spherical


vl
wave issued from the secondary source B becomes equal to r 1 :=V2 (t1 -t) =
jBC 1 j~sinct
. vl
1 , and the radius of the spherical wave from D becomes equal to
r2 =v2 (t 1 -t 2 )=JDC 1 J~sinct1 .• The tangents C1C and C1C2 to these spheres
v1
coincide because
• B_...._CC r1 v. . ra . D_...._CC
Sill 1 = iBCd =v;-slllctf= IDC1l =Sill 1 •·

The point D was taken on BC1 quite arbitrarily, and consequently the en-
velope of the secondary waves at time t1 is the straight line CC1 forming
an angle a2 with ~ such that sinct2 =~sinct1 • But this is nothing other than
' !11
INTRODUCTION 9

Fig. 12 Fig. 13

Snell's law:
sin a 2 v2
sin a,_= "V;"
The principles of Huygens and Fermat are closely interrelated. In-
deed, let the position of a wave front St at an instant of time t be known.
Where will it be in some time fit? Let us take a point CESt+At (Fig. 14).
By definition, there exist A ESo and a path AC traveled by the light dur-
ing time t + ~t, and, according to Fermat's principle, it takes the light
more time to travel from A to C along any other path. By the continuity,
there is a point B on the arc AC such that it takes the light time t to
travel from A to B and time ~t to travel from B to C. Since the arc AC
possesses the minimality property, the arcs AB and BC possess the same
property. Indeed, if, for instance, there existed a path along which the
light could travel from A to B in shorter time than t, then adding the arc
BC to this path we would obtain a path from A to C along which the light
propagates during time shorter than t + ~t, which is impossible. It fol-
lows that, firstly, BESt , and, secondly, the point C belongs to the wave
front corresponding to the point source located at Band to time ~t. This
is in complete agreement with Huygens' principle: the point B became a sec-
ondary source, and the wave propagating from it reaches the point C in time
~t. The idea of a wave front, Huygens' principle, and the way of reasoning
we have sketched served later as a basis for the Hamilton-Jacobi theory and
in the middle of our century for the so-called dynamic programming, which is
an important tool for solving various applied extremal problems.
* * *
After Fermat's variational principle many other variational principles
were discovered: first in mechanics and then in physics. Gradually most
of the scientists began to believe that nature always "chooses" a motion
as if an extremal problem were solved. Here it is expedient to quote
Euler: "In everything that occurs in the world the meaning of a maximum
or a minimum can be seen." In our time C. L. Siegel said jokingly: "Ac-
cording to Leibniz, our world is the best of all the possible worlds, and
therefore its laws can be described by extremal principles."
1.1:4. The Brachistochrone Problem. The Origin of the Calculus of
Variations. In 1696 an article by Johann Bernoulli was published with an
intriguing title: "Problema novum, ad cujus solutionem mathematici invitan-
tur" ("A New Problem to Whose Solution Mathematicians Are Invited"). The
following problem was stated there: "Suppose that we are given two points
A and Bin a vertical plane (Fig. 15). Determine the path AMB along which
a body M which starts moving from the point A under the action of its own
10 CHAPTER 1

ll

Fig. 14 Fig. 15
gravity reaches the point B in the shortest time."*
The solution of this problem ("so beautiful and until recently un-
known," according to Leibniz) was given by Johann Bernoulli himself and
also by Leibniz, Jacob Bernoulli, and an anonymous author. Experts imme-
diately guessed that it was Newton ("ex unge leonem, "t as Johann Bernoulli
put it). The curve of shortest descent or the brachistochrone turned out
to be a cycloid. Leibniz' solution was based on the approximation of
curves with broken lines. The idea of broken lines was later developed by
Euler and served as a basis of the so-called direct methods of the calculus
of variations. The remarkable solution given by Jacob Bernoulli was based
on Huygens' principle and the concept of a "wave front." However, the most
popular solution is the one given by Johann Bernoulli himself, and we shall
present it here.
Let us introduce a coordinate system (x, y) in the plane so that the
x axis is horizontal and they axis is directed downward (Fig. 16). Ac-
cording to Galileo's law, the velocity of the body Mat the point with co-
ordinates (x, y(x)) is independent of the shape of the curve y(·) between
the points A and (x, y(x)) [provided that the body moves downward without
friction along the curve v(·)] and is dependent on the ordinate y(x) solely
and is equal to /zgy(x), where g is the acceleration of gravity. It is re-
quired to find the shortest time it takes the body to travel along the path
from A to B, i.e., the integral
r-s~-s
- _ 11 - -
ds
Y2gy(x)
AB AB

(here ds is the differential of arc length) must be minimized.


However (by virtue of Fermat's principle stated in Section 1.1.3), we
obtain exactly the same problem if we investigate the trajectories of light
in a nonhomogeneous (two-dimensional) medium where the speed of light at a
point (x, y) is equal to hgy. Further, Johann Bernoulli "splits" the me-
dium into parallel layers within each of which the speed of light is as-
sumed to be constant and equal to vi; i = 1, 2, ..• (Fig. 17). By virtue
of Snell's law, we obtain
sin as
... C$ -vt=const ,
sin tXt
-v;- = -v;- =
sin a.1

*It should be noted that in Galileo's "Dialogues" there is a slight hint


to the statement of the brachistochrone problem: Galileo proves that a
body which moves along a chord reaches the terminal point ·later than if it
moved along the circular arc subtended by the chord.
t"Painting a lion from the claw" (PZ.utareh, On the Cessation of Oracles).
INTRODUCTION 11

Fig. 16 Fig. 17

where ai are the angles of incidence of the ray. Performing the refinement
of the layers and passing to the limit (Johann Bernoulli did not, of course,
dwell on the justification of the validity of this procedure), we conclude
that
sin~(x)
V"(x) = const,

where v(x) = l2gy(x) and a(x) is the angle between the tangent to the curve
y(·) at the point (x, y(x)) and the axis Oy, i.e., sina.(x)=l/YI+(y'(x))1 •
Thus, the equation of the brachistochrone is written in the foliowing form:

V 1 +(y')' VY = C ~ y' = ..y!"C=Y


y ~ YC-y
dg y-y = dx.

Integrating this equation (with the aid of the substitution y = Csin1 ~,


dx =Csin1 ]-dt), we arrive at the equation of a cycloid:

x=C1 +; (t-sint), y=; (1-cost).


We would like to stress an important distinction between· the problem
of Euclid and Kepler's problem (on an inscribed cylinder), on the one hand,
and, say, the brachistochrone problem, on the other hand. The latter is
that the set of all parallelograms inscribed in a triangle and the set of
all cylinders inscribed in a ball depend only on one parameter. Hence in
these problems it is required to find an extremum of a function of one
variable. As to the brachistochrone problem, it requires one to find an ex-
tremum of a function of infinitely many variables. The history of mathe-
matics made a sudden jump: from the dimension one it passed at once to the
infinite dimension or from the theory of extrema of functions of one vari-
able to the theory of problems of the type of the brachistochrone problem,
i.e., to the calculus of variations, as this branch of mathematics was
called in the 18th century.
Soon after the work by Johann Bernoulli many other problems similar
to the brachistochrone problem were solved: on the shortest lines on a
surface, on· the equilibrium of a heavy thread, etc.
Traditionally the year 1696, the year of the brachistochrone problem,
is considered to be the starting point of the calculusofvariations. How-
ever, this is not exactly so, and we shall discuss it in the next sec-
tion.
1.1.5. Newton's Aerodynamic Problem. In 1687 the book "Mathematical
Principles of Natural Philosophy" by I. Newton was published. In Section
VII titled "The Motion of Fluids, and the Resistance Made to Projected
Body" Newton considers the problem on the resistance encountered by a ball
and a cylinder in a "rare" medium, and further in the "Scholium" he in-
vestigates the question of the resistance to a frustum of a cone moving in
12 CHAPTER 1

Fig. 18

the same "rare" medium.* In particular, he finds that among all the cones
of given width and altitude the cone encountering the least resistance has
an angle of 135°. He makes the remark that this result "may be of some
use in shipbuilding" and then writes: "Quod si figura DNFG ejusmodi sit ut,
si ab ejus puncto quovis N ad axem AB demittatur perpendiculum NM, et di-
catur recta GP quae parellela sit rectae figuram tangenti in N, et axem
productam sicet in P, fuerit MN ad GP ut cpcub ad 4BP x GBq, solidum quod
figurae hujus revolutione circa axem AB describitur resistetur minime
omnium ejusdem longitudinis & latl.tudinis."
Newton's words can be translated thus: "When the curve DNFG is such
that if a perpendicular is dropped from its arbitrary point N to the axis
AB and a straight line GP parallel to the tangent to the curve at the point
N is drawn [from the given point G] which intersects the axis at the point
P, then [there holds the proportion] MN:GP = GP 3 :(4BP x GB 2 ), the solid
obtained by the revolution of this curve about the axis AB·encounters the
least resistance in the above-mentioned rare medium among all the other
solids of the same length and width" (see Fig. 18 drawn by Newton). Newton
did not explain how he had obtained his solution. Later he gave his com-
mentators a sketch of the derivation, but it was published only in 1727-1729
when the first stage of the calculus of variations was completed. Newton's
preparatory material (published only in our time) shows that he had had a
good command of the elements of many mathematical constructions which were
realized later by Euler and Lagrange. As we shall see, Newton's problem
belongs to the optimal control theory whose elaboration began only in the
Fifties of our century rather than to the calculus of variations.
1.1.6. The Diet Problem and the Transportation Problem. Let us sup-
pose that an amount of a certain product is kept in several stores and that
the product must be delivered to several shops. Let the cost of the trans-
portation of the unit of the product from each store to each shop be known,
and let it be known how much product must be delivered to each shop. The
transportation problem consists in working out a transportation plan opti-
mal in the given situation, i.e., it is required to indicate what amount
of the product must be transported from each store to each of the shops
for the total cost of the transportation to be minimal. The diet problem
is similar to the transportation problem; it consists in working out a diet
with minimum cost satisfying the consumption requirements for a given as-
sortment of products on condition that the content of nutrient substances
in each of the products and the cost of the unit of each product are known.
A great number of problems of this kind appear in practical economics.
The two above-mentioned problems belong to the branch of mathematics
called linear programming. The linear programming theory was created quite
recently- in the Forties-Fifties of our century.

*For example, see I. Newton, Mathematical Principles of Natural Philosophy,


Cambridge (1934), pp. 327 and 333 (translated into English by Andrew Motte
in 1729).
INTRODUCTION 13

1.1.7. The Time-Optimal Problem. We shall present the simplest ex-


ample of an extremal problem with a "technical" content. Suppose that
there is a cart moving rectilinearly without friction along horizontal
rails. The cart is controlled by an external force which can be varied
within some given limits. It is required that the cart be stopped at a
certain position in the shortest time. In what follows this problem will
be called the simplest time-optimal problem.
A characteristic peculiarity of extremal problems in engineering is
that the acting forces are divided into two groups. One of them consists
of natural forces (such as the gravitational force) and the other includes
forces controlled by a man (e.g., the thrust force). Here there naturally
appear constraints on the controlled forces connected with limitations of
technical means.
The theory of the solution of such problems was constructed still
later- at the end of the Fifties. It is called the optimal control theory.

* * *
So, where do extremal problems appear from? We tried to demonstrate
by the examples presented here that there are many answers to this ques-
tion. Extremal problems appear both from natural science, economics, and
technology and from mathematics itself due to its own demands. That is
why the theory of extremal problems and its practical aspect - the opti-
mization theory- have won a wide popularity in our time.

1.2. How Are Extremal Problems Formalized?


1.2.1. Basic Definitions. Each of the problems of Section 1.1 was
formulated descriptively in terms of the specific field of knowledge where
the problem appeared. This is the way in which extremal problems are usu-
ally posed, and, generally speaking, it is not necessary to solve every
problem analytically. For example, Euclid's and Steiner's problems were
solved in Section 1.1.2 purely geometrically. However, i f wewant tomake
use of the advantages of the analytical approach, then the first thing to
do is to translate the problem from the "descriptive" language into the
formal language of mathematical analysis. Such a translation is called
formalization.
The exact statement of an extremal problem includes the following com-
ponents: a functionalt f:X + i defined on a set X, and a constraint, i.e.,
a subset C s X. (By i we denote the "extended real line," i.e., the collec-
tion of all real numbers together with the values and +oo.) The set X
-00

is sometimes called the class of underlying elements, and the points x EC


are said to be admissible with respect to the constraint. The problem it-
self is formulated thus: find the extremum (i.e., the infimum or the supre-
mum) of the functional f over aU xEC. For this problem we shall use the
following standard notation:
f(x)-+inf(sup); xec. (1)
Thus for the exact statement of the problem we must describe X, f,
and C.
If x = c, then problem (1) is called a problem without constraints.
A point xis called a solution of problem (1), namely, it is said to yield
an absolute minimum (maximum) in the problem (1), if f(x) ~ f(x) [f(x) ~
f(x)] for all xEC. As a rule, we shall write all the problems as mini-
mization problems replacing, when necessary, a problem of the type f(x) +
sup, xec, by the problem f(x) + inf, xEC, where f(x) = -f(x). In those
tin the theory of extremal problems scalar functions are often called
functionals.
14 CHAPTER

cases when we want to stress that it does not make any difference to us
whether a minimization or a maximization problem is considered we write
f(x) + extr.
Further, the set X is usually endowed with a topology, i.e., the no-
tion of closeness of elements makes sense in X. To this end we can, for
instance, specify a set of neighborhoods in X (as it is done in the stan-
dard manner in Rn or in a normed space). If X is a topological space, then
x is called a local minimum if there exists a neighborhood U of the point
x such that xis a solution of the problem f(x)-+-inf, xECnU. A local
maximum is defined in a similar way.
1.2.2. The Simplest Examples of Formalization of Extremal Problems.
We shall present the formalization of some of the problems mentioned inSec-
tion 1.1. Let us begin with the problem of Euclid (see Section 1.1.2,
Fig. 6). From the similarity of the triangles DBE and ABC, we obtain h(x)/
H = x/b. Here x is the side IAFI of the parallelogram ADEF, H is the alti-
tude of ~ABC, h(x) is the altitude of ~BDE, and b = IACI is the length of
the side AC. The area of the parallelogram ADEF is equal to (H- h(x))x ~
H(b - x)x/b. Now we obtain the following formalization of the problem of
Euclid:
H0-~x
b
.
->SUp, O,;;;x,;;;b, ~x(x-b)-..mf, xE(O, b]. ( 1)
Here the three components every formalization consists of are:
X=R, f=H(b-x)x/b, C=[O, b].
Let us formalize the problem of Archimedes on spherical segments with
a given surface area (see Section 1.1.2). Let h be the altitude of a
spherical segment, and let R be the radius of the ball. As is known from
geometry, the volume of a spherical segment is equal to nh 2 (R- h/3) while
its surface area is equal to 2nRh. We see that the problem of Archimedes
can be formalized in two ways:

nN ( R-4) -sup, 2nRh=a, R~O, 2R~h~O, (2)

or, eliminating R from the functional in (2),


h .. /a
ha nh3
2-3-+SUp, O,;;; ,;;; Jf n (2 ')

(the last ine~uality is a consequence of h,;;;2R>=;>a~nh 2 ). In the first case


we have X= R+, f = nh 2 (R- h/3) C = {(R, h)l2nRh =a, 2R ~ h}; in the
second case, putting X= [0, Ia/;], we obtain a problem without constraints
with the functional f = ha/2- nh 3 /3.
Kepler's problem on the cylinder of maximum volume inscribed in a ball
(see Section 1.1.2) admits of the following evident formalization:
2nx(1-x 2)-+-sup; o,;;;x,;;; 1 (X= R, C=[O, 1]). (3)
Here the radius of the ball is equal to unity, and x is half the altitude
of the cylinder.
The problem of light refraction at the interface between two homo-
geneous media which is solved with the aid of Fermat's variational prin-
ciple (see Section 1.1.3 and Fig. 19) is reducible to the following prob-
lem. Let the interface between two homogeneous media be the plane z = 0,
the speed of light in the upper and in the lower half-spaces being v 1 and
Vz, respectively. We must find the trajectory of the ray of light propa-
gating from the point A= (0, 0, a), a> 0, to the point B (~, 0, -£),
B > 0. Due to the symmetry, the ray must lie in the plane y 0. Let
C = (x, 0, 0) be the point of refraction of the ray. Then the time of
propagation of the ray from A to B is equal to
INTRODUCTION 15
z :c

Fig. 19 Fig. 20

T(x) =
Y a2+x' + Y pz+ (6-x)Z .
Vl VI

According to Fermat's principle, the coordinate x of the point of re-


fraction is found from the solution of the problem

(4)

It should be noted that we have obtained a problem without constraints.


Similarly, the problem without constraints
lx-stl+ lx-ssl+lx-~sl- inf, (5)

where X= C = R2 , ~1, ~2, and ~3 are three given points in the plane R2 ,
and lxl = g+x-'f is a formalization of Steiner's problem (see Section
1.1.2). Let us stress an important peculiarity of the functional of the
problem (5): it is a convex function which, however, is not everywhere dif-
ferentiable.
The problem of Apollonius on the shortest distance from a point ~ =
(~1, ~z) to an ellipse specified by an equation of the form xf/af + x~/a~
1 obviously admits of the following formalization:
I 2
(xl-£1)• +(xs-s.)•--. inf, !!..+.!!.. = 1
af a:
(x=R C={xl :i +:i =1}).
(6)
2
,

Thus, we have learned the formalization of the simplest problems. The


formalization of more complex problems arising in natural sciences, tech-
nology, and economics is a special and nontrivial problem. In these prob-
lems the formalization itself depends on physical or some other hypotheses.
In the next section this will be demonstrated by the example of Newton's
problem.
1.2.3. The Formalization of Newton's Problem. Naturally, the formal-
ization of this problem depends on the resistance law of the medium. New-
ton thought of the medium (he called it "rare") as consisting of fixed
particles with a constant mass m possessing the properties of perfectly
elastic balls. We shall also assume this hypothesis.
Let a solid of revolution about the x axis (Fig. 20) move in the di-
rection opposite to that of the x axis ("downward") in Newton's "rare"
medium with a velocity v. In its rotation about the x axis an element dr
on the r axis describes an annulus of area do = 2ndr; to this annulus there
corresponds a surface element d~ on the body of revolution. During time dt
this surface element "displaces" a volume dV = 2nrdrvdt. Let p be the den-
sity of the medium. Then the number of the particles which have hit the
16 CHAPTER 1

surface element is N=.J!_dV= pZnrdrv dt, where m is the mass of a particle.


m m
Let us calculate the force dF acting on the surface element dE during time
dt. Let the element ds be inclined to the r axis at an angle cp. When a
particle is reflected from dE, it gains an increment of momentum equal to
m(v 2 -v1 )=-2mvcoscp·n, where v = lv1/ = lv2/, n is the unit normal vector
to dE, and cp=arctan* is the angle between ds and the horizontal direction.
By virtue of Newton's third law, the body gains the opposite momentum in-
crement m2vcoscp·n, and during time dt there are N such increments; by vir-
tue of symmetry, the sum of the components of the momentum orthogonal to
the axis of revolution is equal to zero while the resultant axial component
of the increment of the momentum is equal to

N m 2vcos cp cos cp = Zpnr dr v dt m2v cos• cp = 4p:rw•r dr dt cos• cp.


m
By Newton's second law, this expression is equal to dFdt, whence dF
krdrcos 2 cp, k=4p:rw 2 , and the resultant resistance force is
R
(1)
F=k Sl+(~:;dr)••
0

Hence, replacing r and t by R and T we arrive at the extremal problem


T

S-l+x•
t dt . f
-.---+lll' X (0) = 0, X (T) = £. ( 2)
0

It can easily be understood without solving the problem (Legendre was


the first to note this fact in 1788) that the infimum in the problem is
equal to zero. Indeed, if we choose a broken line x(•) with a very large
absolute value of the derivative (Fig. 21), then integral (2) will be very
small. On the other hand, the integral in (2) is nonnegative for any func-
tion x(·). Thus the infimum of the values of the integral is equal to zero.
For this Newton was many times subjected to criticism. For example,
in the book* by the distinguished mathematician L. C. Young it is written:
"Newton formulated a variational problem of solid of least resistance, in
which the law of resistance assumed is physically absurd and ensures that
the problem has no solution- the more jagged the profile, the less the
assumed resistance .... If this problem had been even approximately cor-
rect ... there would be no need today for costly wind tunnel experiments."
But is all this really so in the situation we are discussing? First of
all, it should be noted that Newton himself did not formalize the problem.
This was done (and not quite adequately) by other people. For the correct
formalization one must take into account the monotonicity of the pro~ile,
which was tacitly implied (when the profile is jagged the particles undergo
multiple reflections, which leads to distortion of the whole phenomenon).
The requirement of monotonicity makes the problem physically consistent.
If this fact is taken into account, it is seen that the solution given by
Newton himself is not only "approximately correct" but marvellously correct
in every detail, which will be discussed later. And what is more, the
physical hypotheses Newton had put forward and the solution of the aero-
dynamic problem he had given turned out to be very important for modern
supersonic aerodynamics when it became necessary to construct high-speed
and high-altitude flying vehicles.
The assumption on the monotonicity leads to the following correct for-
malization of Newton's problem:
*L.. C. Young, Lectures on the Calculus of Variations and Optimal Control
Theory, W. B. Saunders, Philadelphia-London-Toronto (1969), p. 23.
INTRODUCTION 17

Fig. 21
T
S t dt f
--.--+In'
o I+xa
0

x(O)=O, x(T)=s, xER+· (3)

1.2.4. Various Formalizations of the Classical Isoperimetric Problem


and the Brachistochrone Problem. The Simplest Time-Optimal Problem. The
first two problems mentioned in the title of this section belong to the
most widely known ones, but they turn out to be perhaps the most difficult
for a complete analysis. We shall formalize each of them twice: first
in the traditional and well-known manner and then by means of a method
which is not so widely known. In this way we want to stress that, in prin-
ciple, the formalization procedure is not unique. An adequate choice of a
formalization is in fact an independent problem whose successful solution
is to a.great extent dependent on one's skill.
We shall begin with the classical isoperimetric problem. Let the
length of the curve be L, and let the curve itself be represented para-
metrically by means of functions x(•) andy(·), the parameter being the
arc length s reckoned along the curve from one of its points. Then the
x y
relation 2 (s) + 2 (s) = 1 holds at each point, and, moreover, x(O) = x(L)
and y(O) = y(L) since the curve is closed.
For definiteness, let the sought-for curve be located so that its
L L
center of gravity is at the origin, i.e., the equalities ~ x(s)ds= ~ y(s)ds=O
0 0 L
take place. The areaS bounded by the curve (x(·), y(·)) is equal to ~xyds.
0
We thus obtain the following formalization:
L
8= ~ Xgds-+SUp; X1(s)+y 2 (S)= },
0 ( 1)
L L
~ x(s)ds= ~ y(s)ds=O,_ x(O)=x(L), y(O)=y(L).
0 0

However, the same problem can be formalized in a different way. Sup-


pose that an airplane is required to circumnavigate the greatest area in a
given time and return to the airfield. If the maximum velocity of the
airplane is independent of the direction of the flight, the natural for-·
malization of the given problem is the following:
T
5
The area must be the greatest~} (x(t)v(t)-y(t)u(t))dt-+sup, where
i(t) = u(t), y(t) = v(t).
0

The maximum velocity of the airplane is equal to V~u•+v 2 :s;;;V'.


The airplane must return to its airfield ~ x (0) = x (T), y (0) = Y(T).
The problem also admits of a more general formulation when the maximum
velocity depends on the direction of the flight (e.g., in the presence of
wind). Then we have a more general problem:
18 CHAPTER 1

-} J(xu-yu)dt-.sup,
T

x=u, y= v,
(2)
x(O)=x(T), y(O)=y(T), (u, v)EA,

where A is the set of all admissible values of the velocity of the airplane.
If A is a circle, we obviously arrive at the classical isoperimetric
problem, and if A is a "shifted circle" (which corresponds to constant
wind), we obtain the well-known Chaplygin p:rooblem.
Now we shall present the most traditional formalization of the
b:roaahistoah:roone p:rooblem.
As in Section 1.1.4, let us introduce a coordi-
nate system (x, y) in the plane (Fig. 16) so that the x axis is horizontal
and the y axis is directed downward. Without loss of generality, we can
assume that the point A coincides with the origin. Let the coordinates of
the point B be (xl, Yl), x1 > O, Yl > 0 (see Fig. 16), and let y(·) be the
function specifying the equation of the curve joining the points A and B.
We remind the reader that, according to Galileo's law, the velocity of the
body M at the point (x, y(x)) is independent of the shape of the curve y(·)
in the interval (0, x) and is only dependent on the ordinate y(x) itself,
the value of this velocity being equal to l2gy(x), where g is the accelera-
tion of gravity. Consequently, the time T it takes the body to travel
along an element of the curve ds = ldx 2 + dy 2 from the point (x, y(x)) to
the point (x + dx, y(x) + dy) is equal to ds/l2gy(x), whence the following
formalization of the brachistochrone problem is obtained:
x,
:J (y ( ·)) = S~ dx-+ inf,
0 2gy
y (0) = 0, y (x1) = y1• (3)

Let us present another formalization of the brachistochrone problem


whose idea is similar to that of the second formalization of the classical
isoperimetric problem; here we shall follow the article by Johann Bernoulli
(1969) mentioned earlier in which he proceeded from Fermat's variational
principle.
Suppose that there is a nonhomogeneous medium in which the speed of
light propagation depends solely on the "depth" y according to the law
v 2 = 2gy. Then, by virtue of Fermat's variational principle, the ray of
light will travel along the path from A to B in the shortest time. In this
way we obtain a formalization of the brachistochrone problem in the form of
a time-optimal p:rooblem:

T-.inf, X=Vyu, u=VYv. u2 +v2 =2g,


(4)
x(O)=y(O)=O, x(T)=xi, y(T)=Yi·
The formalization of the simplest time-optimal p:rooblem (Section 1.1.7)
is also similar to the above. Let the mass of the cart be m, let its ini-
tial coordinate be xo, and let the initial velocity be v 0 • The external
force (the thrust force) and the variable coordinate of the cart will be
denoted by u and x(t), respectively. Then, by Newton's second law, mX = u.
The constraint on the thrust force will be set in the form uE[u 1 , u.]. We
thus have
T-+inf, mx=u, ue[u1 u.J, (5)
x(O)=X0 , x(O)=v0 , x(T)=x(T)=O.
The statement of the problem we have obtained is very similar to (2) and
(4).
Now we note an important fact: we have in fact "underformalized" the
problems. We mean that, for instance, in the formalization (3) the domain
of the functional 3 is not indicated precisely, and consequently it is
INTRODUCTION 19

yet unknown for what class of curves the problem should be considered, i.e.,
the set X (see Section 1.2.1) is not defined. The same applies to the
other formalizations of the present section. By the way, the "classics"
often did not pay attention at all to the precise formalizations of the
problems and simply solved them in the "underformalized" form. However,
in what follows we are going to be precise, and therefore we shall have to
do this rather tedious job: to stipulate every time within what class of
objects the solution is being sought (or was found).
1.2.5. Formalization of the Transportation and the Diet Problems.
We shall begin with the transportation problem. Let us introduce the fol-
lowing notation:
ai is the amount of the product (measured in certain units) kept in
the i-th store, 1 ~ i ~ m;
bi is the demand for the product (expressed in the same units) of the
j-th shop;
Cij is the cost of the transportation of the unit of the product from
the i-th store to the j-th shop;
Xij is the amount of the product (in the same units) to be transported
from the i-th store to the j-th shop. m n
Then the total cost of the transportation is equal to ~ ~ c11 x11 , and
i=l i=l

it must be minimized. Here we have the following constraints:


a) x11 E R+ (this is an obvious constraint on the amount of the product
to be transported);
n
b) ~ x 11 ~a 1 (this constraint means that it is impossible to trans-
i-=1
port more product from a store than the amount kept there);
m
c) ~ x 11 =b1 (this means that it is necessary to transport exactly
l=l
the amount of the product which is demanded).
As a result, we arrive at the following formalization:
m n n Ill

X;1 ~a;,
(1)
~ ~ C;jXij-+ inf, ~ ~ xu=b1 , x 11 ER+·
i=lj=l i=l 1=1

The diet problem is formalized as simply as the transportation prob-


lem. Let there be n products (corn, milk, etc.) and m components (fat,
protein, carbohydrates, etc.). Let us suppose that for the full value diet
bj units of the j-th component are needed, and let us denote by aij the
amount of the j-th component contained in a unit amount of the i-th prod-
uct and by Ci the cost of a unit of the i-th pr9duct.
Denoting by Xi the amount of the i-th product in the diet, we obtain
the problem
n n
~ C;X;-+ inf, .~ a 11 X;~b1 , X 1 ~0. (2)
i=l •=I
1.2.6. The Basic Classes of Extremal Problems. In Section 1.1 we
already mentioned briefly that in the theory of extremal problems a number
of sufficiently well-defined classes of problems are separated out. Before
describing them we shall present a brief review of the methods by means of
which the constraints were specified in the problems formalized above.
In the first place, we encountered formalizations in which there were
no constraints at all (say, in the light refraction problem or in Steiner's
20 CHAPTER 1
problem). In the second place, constraints were sometimes specified by a
system of equalities [for example, in the problem of Apollonius or in the
formalization (3) of the brachistochrone problem in Section 1.2.4 where the
boundary conditions were specified by equalities]. In the third place,
constraints were sometimes expressed by inequalities (e.g., in the trans-
portation problem). Finally, in the fourth place, some constraints were
written in the form of inclusions [for instance, the constraint x E R+ in
Newton's problem, and the constraint (u, v) E A , where A = { (u, v); u 2 + v 2 ~
1} in the formalization (2) of the classical isoperimetric problem in
Section 1.2.4].
It should be stressed that such a division is somewhat conditional.
x
For example, the constraint E R+ in Newton's problem could have been
written in the form of the inequality x
~ 0 while the constraint (u,v)EA
in the classical isoperimetric problem could have been written in the form
of the inequality u 2 + v 2 ~ 1. Conversely, every inequality f(x) ~ 0 can
be replaced by the equality f(x) + u = 0 and the inclusion uER+, etc.
Nevertheless, from the viewpoint accepted in the present book the
division of the constraints into equalities and inequalities, on the one
hand, and inclusions, on the other hand, is in a sense meaningful. In
courses of mathematical analysis the Lagrange multiplier rule is presented
for the solution of "conditional extremum" problems (this will be discussed
in detail in Section 1.3). As is known, at the initial stage of the appli-
cation of this rule the "Lagrange function" is formed which involves both
the functional under investigation and the functions specifying the con-
straints. It may happen that for some reason or other it is convenient not
to include some of the constraints in the Lagrange function. So, these are
the constraints which are not involved in the Lagrange function in the so-
lution of the corresponding problem that are separated out in the form of
inclusions. Here, as in the formalization of a problem (which can usually
be carried out in many ways and whose adequate choice among the possible
formalizations depends on one's skill), there is no uniqueness in the divi-
sion of the constraints. Now we pass to the description of the basic
classes of extremal problems.
In what follows, we shall consider, from a sufficiently general view-
point, the following four classes.
I. Smooth Problems with Equality and Inequality Constraints. Here
the class of underlying elements X is usually a normed space,* and the
constraint C is specified by an equality F(x) = 0, where F is a mapping
from X into another normed space Y and by a finite number of inequalities
fi(x) ~ 0, i = 0, 1, ... ,m. As a result, we obtain the class of the prob-
lems
f 0 (x)-inf, F(x)=O, f;(x)~O, i=l, ... ; m. (1)
It is supposed here that the functions fi, i = 1, 2, ••• ,m, and the mapping
F possess some smoothness properties. The smooth problem fo(x) +in£ will
be called an elementary smooth problem.t
II. The Classical Calculus of Variations. Here the traditional class
of underlying elements is a Banach space X= C1 ([to, t1l, Rn)ofcontinuously
differentiable n-dimensional vector functions x(·) = (x 1 (·), .•• ,xn(·)) in
*Here and below the terms "normed space" and "Banach space" are used only
for the sake of correctness of the statement of the problem. The reader
will find the exact definitions in Section 2.1.1. For simplicity, the
reader may assume that in problem (1) the element x is a vector x = (xl,••••
xn) in the n-dimensional arithmetic space Rn.
tMaximum problems or those involving the opposite inequalities (~) can
easily be brought to the form (1) (see Section 3.2).
INTRODUCTION 21

which the norm ~s determined by the formulas


I x ( · )ilt =max (II x ( ·) 11.. I x(·) 11.1.
max ( le[l
llx(·)l[o= l~i~n max [x 1 (t)l)·
0 ,11)

The functionals in the problems of the classical calculus of varia-


tions are usually of the following types:
integral functionals, i.e., functionals of the form
11 11
3(x(·))=~L(t,x, x)dt=~L(t,x1 (t), ... ,xn(t), xdt), ... ,xn(t))dt; (2)
t0 i0

endpoint functionals, i.e., functionals of the form


$"(x(·))=l(x(t 0 ), x(tt))=l(x 1 (t 0 ), . . . , x.(t.), xdt1), . . . . , xn_(t1)); (3)

functionals of mixed form, i.e.,


53 (x ( · )) = 3 (x ( ·)) +ff (x ( · )). (4)
[In what follows functionals of the form (4) will also be called Bolza
functionals.]
The constraints in the problems of the classical calculus of varia-
tions usually split into two groups:
differential constraints of the form
M(t, x(t), x(t))=O~M;(t, xt(t), ... , Xn(t), xdt), ... , xn(t))=O, i=l, 2, ... , p; (5)

boundary conditions of the form


~(x(t 0 ), x(t 1 ))=0~¢J(xt(to), ... , xn(t 0 ), X1 (i 1 ), ... , X 11 (t 1 ) ) , (6)
j=l, 2, ... , s.
The functions L and Mi in (Z) ana ~5) and the functions l and ~j in
(3) and (6) depend on 2n + 1 and 2n variables, respectively, and are as-
sumed to be smooth. The problem
.'7(x(·))->inf, M(t, x, x)=O, IJ'(X(t 0 ), x(t 1))=0 (7)

is called the Lagrange problem. The problem


$(x(·))-+inf, M(t, X, x)=O, ljJ(x(t.), x(tt))=O
is called the Bolza problem.
The problem
ff(x(·)-+inf, M(t, x, x)=O, ¢(x(t 0 ), x{t 1 ))=0
is called the Mayer problem.
The problem without constraints
I,

$(X(·))=~ L(t, x(t), x(t))dt+l(x(t:), x(t 1))-+inf (8)


t,

will be called the elementary Bolza problem.


The problem
It
(9)
'(x(. ))= ~ L(t, x(t), x(t))dt -inf, x(t.)=x•• x(ti)=Xt
t,

will be called the simplest vector problem of the classical calculus of


variations; in the case n = 1 it will be called the simplest problem of
the olassioal oaloulus of variations. For the sake of simplicity, here
we shall confine ourselves to problems with fixed time.
22 CHAPTER
In Chapter 4 the reader will find a more general statement in which
the functional and the constraints also depend on the variable instants of
time to and t1.
III. Convex Programming Problems. Here the class of underlying ele-
ments X is a linear space, and the constraint C is determined by a system
of equalities F(x) = 0 (F:X + Y,whereY is another linear space), inequal-
ities fi(x) :s 0, i = 1, 2, ••• ,m, and inclusions xEA. As a result, the class
of problems
f 0 (x)·-+inf, F(x)=O, f;(x)~O, i=l, •.. , m, xEA, ( 10)
is obtained.
It is supposed that the functions fi, i = 0, 1, ••• ,m, are convex, the
mapping F is affine [i.e., F(x) =Ax+ n, where n is a fixed vector and A
is a linear operator from X into Y], and A is a convex set. If all the
functions fi in (10) are linear and A is a certain standard cone, then
the problem (10) is called a linear programming problem. If there are no
constraints in (10), the problem fo(x) + inf with a convex function fo will
be called an elementary convex problem without constraints.
IV. Optimal Control Problems. In this book we shall consider the
following class of optimal control problems:

J0 (x( · ), u( · ), t0 , t 1 ) =f fo(t, x(t), u(t)) dt + l/lo(t0 , x(t0 ), tt. x(t 1 )) ~ inf,


to
(11)

x=q>(t,x, u),J,(x( · ), u( · ), t0 , t 1 ) = ff.(t,x(t), u(t)) dt+I/J,(t0 ,x(t0 ), tt.x(t1 ))~0, i=l, ... ,m,
to

where xE Rn a:nd u E R' • The symbol Ji § 0 means that the i-th constraint
is of the form Ji = 0 or Ji ~ 0 or Ji ~ 0.
In the general case to and t1 in (11) are not fixed, all the functions
f: a X an X ar +a, ~: R X an X ar +an, ~= a X an X a X an+ as, and g =
a x an x a x an + R are assumed to be continuous jointly with respect to
their arguments and continuously differentiable with respect to the vari-
ables t and x; the set U is an arbitrary subset of Rr.
For the complete description of the problem it only remains to explain
what objects form the class of underlying elements. To begin with, we
shall consider the collection of vector functions (x(·), u(·)) where u(·)
is defined and piecewise continuous on [to, t1l and, moreover, the inclu-
sion u(t)EU holds for all t, and x(·) is continuous on [to, tll and dif-
ferentiable at all the points except those where u(•) suffers discontinui-
ties; besides, it is supposed that the equality i(t)=cp(t, x(t), u(t)) holds at
all the points where x(·) is differentiable.
The problems in which t1 plays the role of the functional are called
time-optimal problems.
*
Now let us see to what classes the problems of Sections 1.2.1-1.2.5
* *
belong.
The problem of Euclid [see (1) in Section 1.2.2] can be considered
both as a smooth problem and as a convex programming problem. The formal-
izations (2) and (2') of the problem of Archimedes in Section 1.2.2 and
Kepler's problem (3) of Section 1.2.2 belong to the class of smooth prob-
lems. The light refraction problem [see (4) in Section 1.2.2] is both an
elementary smooth problem and an elementary convex problem. Steiner's
problem [see (5) in Section 1.2.2] is an elementary convex problem. The
INTRODUCTION 23
problem of Apollonius [see (6) in Section 1.2.2] is a smooth problem with
equality constraints. Newton's problem [see (3) in Section 1.2.3] is an
optimal control problem. The formalization (1) in Section 1.2.4 of the
classical isoperimetric problem belongs to the classical calculus of varia-
tions while the formalization (2) of the same problem belongs to the opti-
mal control theory. The brachistochrone problem [see (3) in Section 1.2.4]
is the simplest problem of the classical calculus of variations; the same
problem in the formalization (4) of Section 1.2.4 is a time-optimal problem
of the optimal control theory. The transportation problem and the diet
problem [see (1) and (2) in Section 1.2.5] are linear programming problems;
the simplest time-optimal problem is a problem of the optimal control the-
ory.
Thus the problems have been formalized and classified. Now we shall
see what methods the apparatus of mathematical analysis provides for the
solution of the problems.

1.3. Lagrange Multiplier Rule and the Kuhn-Tucker Theorem


1.3.1. The Theorem of Fermat. The first general analytical method
of solving extremal problems was elaborated by Pierre Fermat. It must
have been discovered in 1629, but for the first time it was presented suf-
ticiently completely in a letter of 1636 to G. P. Roberval.* In terms of
modern mathematical language Fermat's approach (Fermat himself developed
it only for polynomials) reduces to the fact that at a point ~ which yieids
an extremum in the problem without constraints f(x) + extr the equality
f'(x) = 0 must be fulfilled. As we remember, the first hint to this re-
sult can be found in Kepler's words in his "New Stereometry of Wine Bar-
rels."
The exact meaning of Fermat's reasoning was elucidated 46 years later
in 1684 when Leibniz' work appeared, in which the foundations of mathemati-
cal analysis were laid down. Even the title of this work itself,whichbe-
gins with the words "Nova methodus pro maximis et minimis .•• " ("A New Meth-
od of Finding the Greatest and the Smallest Values ... "), shows the important
role of the problem of finding extrema in the development of modern mathe-
matics. In this article Leibniz not only obtained the relation f'(x) = 0
as a necessary condition for an extremum (at present this result is called
the theorem of Fermat) but also used the second differential to distinguish
between a maximum and a minimum. However, we should remind the reader that
most of the results presented by Leibniz had also been known to Newton by
that time. But his work "The Method of Fluxions,"which had been basically
completed by 167l,waspublished only in 1736.
Now we shall remind the reader of some facts of mathematical analysis
and then pass to Fermat's theorem itself. To begin with, we shall consider
the one-dimensional case when the functions are defined on the real line R.
A function of one variable f: R +I is said to be differentiable at a point
x if there is a number a such that
t (x+A) =f (x) +aA+r (A),
where r(X) = o(IXI), i.e., for any £ > 0 there is o > 0 such that lXI ~ o
implies lr(X)I ~ eiXI. The number a is called the derivative off at the
point i and denoted by f' (~). Thus, f' (x) = lim (f(x+A)-f (i))/A..
,, ... 0

Theorem of Fermat, Let f be a function of one variable differentiable


at a point x. If x is a point of local extremum, then
(1)

*Oevres de Fermat, Vol. 1, Paris (1891), p. 71.


24 CHAPTER 1

The points x at which the relation (1) holds are said to be stationary.
According to the general definitions of Section 1.2.1, a point~ yields
a local minimum (maximum) for the function f if there is £ > 0 such that
the inequality lx- xl < £ implies the inequality f(x) ~ f(x) (f(x) ~ f(~)).
By Fermat's theorem, the points of local extremum (maximum or minimum) are
stationary. The converse is not of course true: for example, f(x) = x 3 ,
x = o.
Proof. Let us suppose that ~ is a point of local minimum of the func-
tion f but f'(x) =a~ 0. For definiteness, let a< 0. On setting£=
lal/2 we find o > 0 from the definition of the derivative such that IAI < o
implies lr(A)I < lal IAI/2. Then for 0 <A< owe obtain
t(~+A.)=f(X}+aA.+r(A.) ::(f(x)+aA.+ la~J.. =f(x)- 1adJ.. <f(x),

i.e., xis not a point of local minimum of f. This contradiction proves


the theorem. •
There are a number of different definitions of derivatives for the
case of several variables (and all the more so for "infinitely many" vari-
ables). In Chapter 2 (see Section 2. 2. 1) this will be discussed in greater
detail. Here we shall only remind the reader of the basic definition (whose
infinite-dimensional version was stated by Frechet).
A function f: Rn + R of n variables is said to be differentiable at a
point x if there are numbers (al,•••,an) (we briefly denote them by a) such
that
n
t (x+h) =f (x) + i=l
~a;h;+r (h),

where lr(h)l = o(lhl), i.e., for any£> 0 there is o > 0 such that lhl =
lhf + •.• h~ < o implies Ir(h) I ~ £h. The collection of numbers a = (a 1 , ••• ,
an) is called the derivative of f at the point x and is also denoted by
f' (x). We stress that f' (x) is a set of a numbers. The numbers a;= lim [f (x+
;\.-+0

A.e;)-f(x)]fA., i = 1, .. ,n, where ei = (0, ... ,1, ... ,0) (the unity occupies the
i-th place), are called partial derivatives of f and denoted by fxi (x) or
3f(x)/3xi· Thus, f'(x) = (fx 1 (x), ... ,fxn<x)). The relation f'(x) = o
means that fx 1 (x) = ... = fxn (x) = 0.
The "one-dimensional" Fermat theorem implies an obvious corollary:
COROLLARY (Fermat's Theorem for Functions of n Variables). Let f be
a function of n variables differentiable at a point x. If x is a ~oint of
local extremum of the function f, then
f'(x)=O~fx.(x)=O, i=l, ... , n. (2)
'
The points xE Rn at which equalities (2) take place are also called
stationary.
The Proof of the Corollary. If the function f has an extremum at a
point x, then the point zero must be a point of extremum of the function
~ def ~
!p;(A)=f(x+A.e;)· By Fermat's theorem, IJli(O)= 0, and we have IJli(O)=fx;(x).
1.3.2. The Lagrange Multiplier Rule. This section is devoted to
smooth finite-dimensional problems. We once again remind the reader of the
whimsical historical development of the calculus of variations. After the
methods of solving one-dimensional extremal problems had been elaborated
there came the time of the classical calculus of variations. In his "Ana-
lytical mechanics"* Lagrange stated the famous multiplier rule for variational
*J. L. Lagrange, Mecanique Analytique, Paris (1788).
INTRODUCTION 25
problems with constraints, i.e., for the infinite-dimensional case, and it
was only about a decade later that he applied this rule to finite-dimen-
sional problems (in 1797).
By a finite-dimensional smooth extPemal pPoblem with equality con-
stPaints (or a conditional extPemum pPoblem) is meant the problem
f 0 (x)--+ extr, f 1 (x) = .. · = fm (x) = 0. (1)
Here the functions fk: an+ R, k = 0, 1, ••• ,m, must possess some
smoothness (i.e., differentiability) properties. In the present section.
we shall assume that all the functions fk are continuously differentiable
in some neighborhood U in the space Rn (in the sense that all the partial
derivatives ofk/oxi exist and are continuous in U).
A point xEU, f,.(x)=O, k = 1, 2, ... ,m, yields a local minimum (maxi-
mum) in the problem ( 1) if there is e: > 0 such that if a point x E U sat-
isfies all the constraints (fi(x) = o, i = 1, ••. ,m) and lx- xl < e:, then
fo(x) ~ fo(x) (fo(x) ~ fo(x)).
The function
m
2 =2 (x, !.., 1..0 ) = ~ !..,.{,. (x), (2)
k=O
where A= (Aio···•Am), is called the LagPange function* of the problem (1),
and the numbers Ao, Alo···•Am are called the LagPange multipliePs.
The Lagrange Multiplier Rule. 1. Let all the functions fo, fl,•••,fm
in problem (1) be continuously differentiable in a neighborhood of a point
x. If xis a point of local extremum in problem (1), then there are
Lagrange multipliers~= (~lo••·•~m) and ~o, not all zero, such that there
hold the stationarity conditions for the Lagrange function with respect to
x:
2,.(x,_ i, }.0 )=0~2x1 (x, i 1, ••• , 1,., '1..0 )=0, (3)
i =I, 2, ... , n.
2. For ~o to be nonzero (~o ~ 0) it is sufficient that the vectors
f~(x), .•. ,f~(x) be linearly independent.
Thus, we have n + m equations with n + m + 1 unknowns for determining
x, ~. and ~o:

:-(f.I..J (x))=o,
OXj k=O
11 i=l, ... , n, ft(x)= ... =f,.(x)=O. (4)

It should be taken into consideration that the Lagrange multipliers are


determined to within proportionality. If it is known that ~o ~ 0 (this is
the most important case because if Ao = 0, then the relations (3) only ex-
press the degeneracy of the constraints and are not connected with the
functional), we can obtain the equality ~0 = 1 by multiplying all xi by a
constant. Then the number of the equations becomes equal to that of the
unknowns.
Equations (4) can also be written in the following more symmetric
form:
(4 I)

*In English mathematical literature L is called the Lagrangian. In the


present translation we follow the authors' distinction between the "Lagrange
function" and the "Lagrangian" (Sections 1.3.3, 1.4.1, 1.5.1, 1.5.3, 3.1.5,
3.3.1, 4.1.1, and 4.2.2)- Translator's note.
26 CHAPTER 1
The solutions of these equations are called stationary points of the prob-
lem (1).
From the times of Lagrange for almost a whole century the multiplier
rule was stated with ~o ~ 1 (Lagrange's formulation will be presented a
little later), although in this form it is incorrect without some additional
assumptions, for instance, without the assumption on the linear indepen-
dence of fi(x), i ~ 1, ••• ,m. To confirm what has been said let us consider
the problem
Xt-+inf, xf+4=0.
Its evident and single solution is x
~ (0, 0) because this is the only
admissible point. Now let us try to form the Lagrange function with Ao ~ 1
and then apply the algorithm (3). We have .9'=xt+l(x~+xD, whence

and we see that the first of these equations contradicts the equation xf +
X~ ~ 0.
Usually as a regularity condition, guaranteeing that ~o ~ 0, the lin-
ear independence of the derivatives f~(x), •.• ,f~(x) is used (see the asser-
tion 2 of the above theorem). However, as a rule, the verification of
this condition is more complicated than the direct verification, with the
aid of Eq. (3), of the fact that ~o cannot be equal to zero (see the solu-
tions of the problems in Section 1.6). Therefore, the above formulation
of the theorem, which involves no additional assumptions other than the
smoothness conditions, is very convenient.
The proof of the theorem is based on one of the most important theo-
rems of finite-dimensional differential calculus, namely the inverse
function theorem (see [14, Vol. 1,p. 455] and [9, Vol. 1]).
The Inverse Function Theorem. Let $l(xl,••••xs), ••• ,$s(xl, •.• ,xs) be
s functions of s variables continuously differentiable in a neighborhood
of a point i. Moreover, let the Jacobian
:; = det ( o1Jl; <x> )
OXj

be nonzero. Then there exist Eo > 0 and oo > 0 such that for any n ~
(n 0 , ••• ,ns), lnl <;Eo, there is t;;, lt;;l <; oo, such that $(x + t;;) ~ $(x) + n,
and, moreover, t;; + 0 when n + 0.
More general versions of this theorem will be proved in Section 2.3.
Proof of the Lagrange Multiplier Rule. Let a point x
yield a local
minimum in problem (1). Here there are two possibilities: either the vec-
tors f~(x), ••• ,f~(x) are linearly independent or these vectors are linearly
dependent.
A) Let the vectors be linearly dependent. Then, by the definition of
linear dependence, there are ~k• k ~ O, 1, ••• ,m, not all zero, for
which the condition (3) holds.
B) Let the vectors be linearly independent. We shall show that this
assumption contradicts the fact that x
yields a local minimum in '
the problem (1).
Let us consider the mapping ~(x) ~ (fo(x)- fo(x), fl(x), •.• ,fm(x)).
By assumption, the vectors f&(x), .•• ,f~(x) are linearly independent, i.e.,
the rank of the matrix
INTRODUCTION 27

at. (x) at. (x) )


-a--· ···· -a-
A=
(
atm~i) ...... ~~::xl
axl ' ... ' axn

is equal tom+ 1. For definiteness, let us suppose that the first m + 1


columns of the matrix A are linearly independent, i.e.,

at.(xl at.(x) )
~t ••• , axm+l
(
det . . ~· · . · · . · ·~ . =I= 0.
atm (x) atm (x)
~· ... , a.>:m+l

Then the functions


l)ldxi, · · ·, Xm+t) = fo (xl, · · ·, xm+i• Xm+o• · · ·, xn)-fo (xi, · · · ,xn),
~~'· (xl, ... 'Xm+l) = fl (xl, ... 'Xm+lt x"l+2' ... 'Xn), ...•
'i'm+i (xl, .. Xm+l) = fm (xl, ... 'Xm+tt xm+2t . . . t Xn)
0 '

satisfy the conditions of the inverse function theorem. By that theorem,


for any e, 0 <£<eo, there are Xl(£), •.. ,Xm+ 1 (e) such that
f 0 (X1 (e), · · .,Xm+de), Xm+>• · · ., xn)-f.(x)=-e,
ft(xl(e), ... ,xm+l(e), Xm+2• •.. 'xn)=O,
(5)

fm(x 1 (e), ... ,xm+i(e), Xm+>• •.. ,xn)=O,

and, moreover, Xi(£)+ ~i for£+ 0; i = 1, 2, ... ,m + 1. Now (5) directly


implies that x
is not a point of local minimum. We have thus arrived at
the desired contradiction. The assertion 2 of the theorem is obvious. •
The generalization of the Lagrange multiplier rule to infinite-dimen-
sional problems with equality and inequality constraints will be presented
in Chapter 3.
Now we shall make some more historical remarks. The multiplier rule
was first mentioned in Euler's works on isoperimetric problems (in 1744).
Later (in 1788) it was put forward by Lagrange in his "Analytical mechan-
ics" for a wide class of variational problems (the so-called Lagrange prob-
lems; this class of problems will be discussed later). In 1797 in his
book "The Theory of Analytical Functions"* Lagrange also touches upon the
question of finite-dimensional extremal problems.
He writes: "On peut les reduire a ce principe generale. Lars qu'une
fonction de plusieurs variables doit etre un maximum ou minimum, et qu'il
y a entre ces variables une ou plusieurs equations, il suffira d'ajouter a
la fonction proposee les fonctions qui doivent etre nulles, multipliees
chacune par une quantite indeterminee, et la chercher ensuite le maximum
ou minimum comme si les variables etaient independantes; les equations
qu'on trouvees, serviront a
determiner toutes les inconnues."
"The following general principle can be stated. If a maximum or a
minimum of a function of several variables is being sought under the con-
dition that there is a constraint connecting the variables which is deter-
mined by one or several equations, one must add the functions determining
the equations of the constraint multiplied by indeterminate multipliers to
the function to be minimized and then seek the maximum or the minimum of
the constructed sum as if the variables were independent. The equations

*J. L. Lagrange, Theorie des Fonctions Analytiques, Paris (1813).


28 CHAPTER 1

obtained, when added to the constraint equations, will serve for determining
all the unknowns."
We can say without exaggeration that the major part of the present
book is devoted to the explanation and the interpretation of Lagrange's
idea in its application to problems of different nature. Let us again
present here the main idea of Lagrange. Let it be required to find an ex-
tremum in the problem (1). Then we should form the Lagrange function (2)
and consider the problem ~(x, A, £0 )--+ extr without constraints. According
to the theorem of Fermat for. functions of n variables, the necessary con-
ditions for this problem without constraints yield the desired equations
9,.=0. Thus, in accordance with the Lagrange principle, the equations for
an extremum in the problem with constraints coincide with the equations
for the problem 9(x, ~. ~0 )--+ extr without constraints for an adequate choice
of the Lagrange multipliers ~. ~o.
Further, we shall see that the general idea of Lagrange proves to be
correct for a great number of problems.
1.3.3. The Kuhn-Tucker Theorem. In this section we shall consider
convex programming problems for which the idea of Lagrange assumes the most
complete form. The investigation of this class of problems has been going
on for a comparatively short time. The fundamentals of linear programming
(this is a special case of convex programming) were laid down in the work
by L. v. Kantorovich [55] in 1939. The Kuhn-Tucker theorem (which is the
basic result in the present section) was proved in 1951.
Let us consider the following extremal problem (a convex pro~amming
problem):
/0 (X)-+irrf, f;(x)~O, i=l, ... ,m,xEA, (1)
where X is a linear space (not necessarily finite-dimensiona l), fi are con-
vex functions on X, and A is a convex subset of X. Here it is important
that (1) is a minimization problem and not a maximization problem.
We remind the reader that a set C lying in a linear space is said to
be convex if, along with any two of its points x and y, it contains the
whole line segment [x, y] = {zlz =ax + (1 - a)y, 0 ~a~ 1}. A function
f: X~ R is said to be convex if Jensen's inequality
f(a.x+(l-a.)y)~a.f(x)+(l-a.)f(y), Va., O~a.~l,
holds for anyx,-yEXor, which is the same, if the "epigraph" off, i.e.,
the set
epi f = {(a., x) E R X X Ia.;;;;;:. f (x)} ,
is a convex set in the product space R x X.
The function
m
9=9(x, A., A0 )= ~ A.kfk(x), (2)
' k=O

where A= (Al•••••Am), is called the Lagrange function of the problem (1),


and the numbers Ao, •.• ,Am are called the Lagrange multipliers. Here the
constraint xEA is not involved in the expression (2).
The Kuhn-Tucker Theorem. 1. Let X be a linear space, let fi: X~ R,
i = 0, 1, ••• ,m, be convex functions on X, and let A be a convex subset of
X.
If xis a solution of problem (1), then there are Lagrange multipliers
~o, ~. not all zero, such that
INTRODUCTION 29

a) min£" (x, ~. ~0 ) = £" (x, ~. ~0 ) (this is the minimum principle);


xeA.

b) ~i ~ 0, i = 0, 1, ..• ,m (the nonnegativity condition);


c) ~ifi(x) o, i 1, 2, ... ,m (the condition of complementary slack-
ness).
2. If AO ~ 0, then the conditions a)-c) are sufficient for the admis-
sible point x to be a solution of problem (1).
3. For ~o to be nonzero (~o ~ 0) it is sufficient that there exist a
point i E A, for which the Slater condition holds: fi (i) < 0, i = 1, ••• ,m.

Hence, if the Slater condition is fulfilled, we can assume that Xo =

The relation a) expresses Lagrange's idea in the most complete form:


if x yields a minimum in a problem with constraints (inequality constraints)
then the same point is a point of minimum of the Lagrange function (in the
problem not involving those constraints which are included in the Lagrange
function). The relations b) and c) are characteristic of problems with
inequality constraints (for more detail, see Section 3.2).
The proof of the Kuhn-Tucker theorem is based on one of the most im-
portant theorems of convex analysis, namely, the separation theorem. How-
ever, here it suffices to prove the simplest finite-dimensional version
of this theorem which we state below and prove in the next section.
The Finite-Dimensional Separation Theorem. Let C be a convex subset
of RN not containing the point 0. Then there are numbers a 1 , ••• ,aN such
that the inequality
N
~ a1x 1 ;;;a.O (3)
1=1
N
holds for any x = (x1 , •• , xN)EC (in other words, the hyperplane ~ a1x1 =0,
I= I
passing through the point O,divides RN into two parts within one of which
the set Cis entirely contained).
Proof of the Kuhn-Tucker Theorem. Let x
be a solution of the problem.
Without loss of generality, we can assume that fo(x) = 0 [if otherwise, we
introduce the new function fo (x) = fo (x)-fo (x)].
,--,J A

Let us put
C={!!-ERm+l, J.t=(J.t 0 , ... ,J.tm)l3xEA: fo(x)<llo• f;(x):s=;;;,lli• i;;;a.l}. (4)
The further part of the proof is split into several stages.
A) The Set C is Nonempty and Convex •. Indeed, any v~ctor Jl E Rm+l with
positive components belongs to C because we can put x = x in (4). Conse-
quently, the set C is nonempty. Let us prove its convexity. Let ~ =
(~ 0 , •••• ~) and~· =(~~ •••• ,~)belong to C, 0 ~a~ 1, and let x and x'
be elements of A such that fo(x) < ~o, fo(x') < ~~. fi(x) ~ ~i• and fi(x')~
~I. i ~ 1 [according to (4), such elements x and x' exist]. Let us put
Xa = ax + (1 - a)x'. Then Xa;EA since A is convex, and, by the convexity
of fi•

i.e., the point a~+ (1- a)~' belongs to C.


B) The Point 0 E Rm+l Does Not Belong to C. Indeed, if the point 0
belonged to C, then, by virtue of definition (4), this would imply the
existence of an element (x) E A, for which the inequalities f o (x) < 0,
fi(x):s=;;;O, i~ 1, hold. However, from these inequalities it follows that
30 CHAPTER 1

x is not a solution of the problem. Consequently, 0 f. C.


Since C is convex and Qf.C, w~ can apply the separation theorem ac-
cording to which there are Ao, ... ,Am such that

(5)

for any p. EC.


C) The Multipliers ~j, i ~ 0, in (5) Are Nonnegative. Indeed, in
Section A) it was already said that any vector with positive components
belongs to C; in particular, the vector (E, .•• ,E, 1, E, ..• ,E) belongs to
C,where E > 0 and 1 occupies the io-th place, i 0 ~ 1. Substituting this
point into (5), we see that ~~;;;;::. - s ~ ~~, whence, by the arbitrariness of
0 i-=Pi0
E > 0, it follows that ~io ~ 0.
D) The Multipliers~;. i ~ 1, Satisfy the Conditions of Complementary
Slackness. Indeed, if fj 0 (x) -0, jo ~ 1, then the equality Xj 0 fj 0 (x) -0
is trivial. Let fj 0 (x) < 0. Then the point (6, 0, .•. ,0, fj (x), 0, .•• ,0),
where the number fj 0 (x) occupies the jo-th place and 6 > 0, ~elongs to C
[to show this it suffices to substitute the point for the point x in (4)]. x
Substituting this point (6, 0, •.• ,0, fj 0 (x), 0, ••• ,0) into (5),wesee
that ~j fj (x) ~ -~o6, whence, by the arbitrariness of 6 > 0, the inequal-
o 0 A

ity ~j 0 EO; 0 f~llows. How:ver, as was proved in Section B), we have Aj 0 ~ ~


0, and hence Aj 0 = 0 and Aj 0 fj 0 (x) = 0.
E) The Minimum Principle Holds at the Point Indeed, let xEA. Then x.
the point (fo(x) + 6, fl(x), ••• ,fm(x)) belongs to C for any 6 > 0 [see def-
inition (4)]. Therefore, taking into account (5), we obtain ~ofo(x) +
m
~ ~;{;(x);;;::.-i..ofJ, whence (by virtue of the arbitrariness of 6 > O) the in-
1=1

equality 9 (x, i:, i:0 );;;;::. 0 follows.


Now, taking into account the equality fo(x) =0 and the conditions of
complementary slackness, we obtain
m
.IE (x, i., i..o);;;;::. o= ~ i.;t; (x) =.IE (x, i., f- 0 ).
1=0

for any x E A • Assertion 1 of the theorem has been proved.


F) Let Us Prove Assertion 2. If we suppose that Ao ~ 0, then we can
put Ao = 1 [because the Lagrange multipliers satisfying the relations a)-c)
of the-theorem preserve their properties when multiplied by an arbitrary
positive factor]. Then for any admissible x(xEA, f;(x)::;;;;,O, i;;;::.I), we obtain
b) ~ A I!' a) I!'A A ~ A A c) A

fo (x);;;;::. fo (x) + ~ A;/; (x) = 2 (x, t.; I) ;;;::.9"(x, "'• 1) =fo (x) +~A;/; (x) =f0 (x),
1=1 1=1

i.e., xis a solution of the problem.


G) Let Us Prove Assertion 3. Let the Slater condition be fulfilled
[i.e., let forAsome x
.EA the inequalities fi(x) < 0, i ~ 1, hold] and,
moreover, let Ao = 0 in assertion 1. Then we immediately arrive at a con-
tradiction:
m
3 <x. i., o>= ~~ i.tti <x>< o=2 <x. i., o>
INTRODUCTION 31

(in the calculation we have used the fact that ~i• i ~ 1, are not all zero),
and, at the same time, by virtue of a), we have 2 (x, ).., 0);;;.. 2 (x, i, 0). •
We shall also present another version of the Kuhn-Tucker theorem.
Since ~o > 0 when the Slater conditions are fulfilled, and since the La-
grange multipliers are determined to within a positive factor, we can as-
sume that ~o = 1. Now the Lagrange function
m
9 (x, 'i., 1) = fo (x) +....~,?.d;(x)
is defined on the set
Ax R!' = {(x, 'i.) I x = (K1 , ... , x,.) E A; 'i. = ('i.:~, ... , 'i.,.), 'i.;;;;:.:O},
and the relations a)-c) are equivalent to the fact that (i, ~) is a saddZe
point of the Lagrange function, i.e.,
min2(x,>:, 1)=2(x,>:, 1)=max2(x,'i., 1}. (6)
xeA 1.:;..0

Indeed, the left equality (6) coincides with a), and the right one is
a consequence of c) and b):
m m
2 (x, i., 1) = fo (x) +.~ i.d; (x) =to (x);;;.. fo (x) + ~~ 'i.;{; (x) = 2 (x, 'i., 1).
In Sections 2.6, 3.1, 3.3, and 4.3 we shall present some other theo-
rems of convex analysis and convex programming.
Remark. In "convex analysis" it turns out to be convenient to con-
sider functions whose values belong to the extended real line R = R U{- oo,
+oo}; that is, it is convenient to admit that the functions may assume the
infinite values ~, +oo, The rules of arithmetic are applicable, with some
limitations, to the extended real line (this will be discussed in greater
detail in the footnote to Section 2.6.1). It can easily be seen that the
above proof of the Kuhn-Tucker theorem remains valid, without changes, for
such functions as well.
1.3.4. Proof of the Finite-Dimensional Separation Theorem. The
statement of the theorem was given earlier. We remind the reader that C
is a convex set in aN and oEc.
A) Let lin C be the Zinear> huU of the set C, i.e., the smallest linear
subspace containing c. Here there are two possible cases: either line ~
aN or lin C = aN. In the first case lin C is a proper subspace of aN, and
N
therefore there exists a hyperplane ~ a;x;=O, containing lin C and passing
i=l

through zero. This is the sought-for hyperplane.


B) If lin C = aN, then among the vectors belonging to C there are N
linearly independent ones, and, due to the linear independence, they form
a basis in aN. Let us denote these vectors by 'ito .. .• eN; iiEC, i= !, •.. , i'/.
Let us consider two convex cones in aN: the "negative orthant"

and the con1c hull of the set C:


32 CHAPTER
N
These cones do not intersect. Indeed, if a vector x=- ~ y;"h Y· > o, be-
io:l '

~iEC, such that x= ~ Cii~i •


- s
longed to ;K •. , then there would be s, cti ;;<: 0 and
io:l
But then the point 0 would belong to C because this point could be repre-
sented in the form of a convex combination

of points belonging to C.
C) Since ~1 is open, what is proved in B) implies that none of the
points of ~ 1 can belong to the closure ~~ of the set ~~. (It should
be noted that iK2 is closed and convex. Why?) Let us take an arbitrary
N
point belonging to ~1' say Xo=- ~ e;' and find the nearest point soE.lir.
io:l
to x 0 • There exists such a point; namely, this is the point belonging to
the compact set ~ 2 flB(x0 ,tx0 f), at which the continuous function f(x) "' lx-
xol attains its minimum.
D) Let us draw a hyperplane H through ~o perpendicular to x 0 - ~o and
show that it is the desired one, i.e., OEH and the set C is entirely con-
tained in one of the two closed half-spaces bounded by that hyperplane.
We shall prove even more; namely, we shall show that if H is the interior
of one of the half-spaces, the one which contains the point x 0 , then fl H
:IC 2 =fi!J. Since C is a subset of .lir,, it is entirely contained in the closed
half-space which is the complement to H.
Let us assume the contrary, and let s1 Efinffi,. Then the angle x~1
is acute. Moreover, [6 0 , sdE~.. since Yt, is convex. Let us drop a perpen-
dicular (x 0 , 62) , 62 €(6 0 , 61) , from xo to the straight line (~ o, ~ l) and show
that ~o is not the point of Jfi, nearest to xo. Indeed, the points ~o, ~ 1 ,
and ~ 2 lie in one straight line and s.Eh (why?). I f s2 E!so. sd, then soE.ii"o
and lxo- ~21 < lxo- ~ol (the perpendicular is shorter than an inclined
line). If ~l lies between ~o and ~2, then lxo- ~ll < lxo- ~ol (of two
inclined lines the shorter is the one whose foot is nearer to the foot of
the perpendicular).
On the other hand, H passes through the point 0 because, if otherwise,
the ray issued from 0 toward the point ~o which is entirely contained in
~. (why?) would have common points with H.
1.4. Simplest Problem of the Classical Calculus of
Variations and Its Generalizations
1.4.1. Euler Equation. Soon after the work by Johann Bernoulli on
the brachistochrone had been published, there appeared (and were solved)
many problems of this type. Johann Bernoulli suggested that his disciple
L. Euler should find the general method of solving such problems.
In 1744 was published Euler's work ·~ethodus inveniendi lineas curvas
maximi minimive proprietate gaudentes sive solutio problematis isoperi-
metrici latissimo sensu accepti": "A method of finding curves possessing
the properties of a maximum or a minimum or the solution of the isoperimetric
problem taken in the broadest sense." In this work the theoretical foun-
dations of. a new branch of mathematical analysis were laid down. In
INTRODUCTION 33

particular, approximating curves with broken lines, Euler derived a second-


order differential equation for the extremals. Later Lagrange called it
the Euler equation. In 1759 the first work by Lagrange on the calculus of
variations appeared in which new investigation methods were elaborated.
Lagrange "varied" the curve suspected to be an extremal, separat:ed out the
principal linear parts from the increments of the functionals which he
called variations, and used the fact that the variations must vanish at a
point of extremum. Later on the Lagrange method became generally accepted.
This is the method that we shall use to derive the Euler equation. It
should also be mentioned that after Lagrange's works had appeared Euler
suggested that the whole branch of mathematics in which the Lagrange
method was used should be called the calculus of variations.
Now we proceed to the derivation of the Euler equation for the sim-
plest problem of the classical calculus of variations. In Section 1.2.6
this name was given to the extremal problem
I,

.'7(x(·))=~L(t,x,x)dt------..extr, x(t.)=x., x(t 1 ).=xi, ( 1)


t,

considered in the space of continuously differentiable functions c1 ([t 0 ,


t1]), -ro <to< t1 < oo, C1 ([to, t1l) is a Banach space, i.e., a normed
space complete with respect to the norm
llx(·)llt=max( max lx(t)l, max lx(t)i).
te(l,,l,] le(l 0 ,1tl

We shall suppose that the function L (we call it the integrand or the
Lagrangian of the problem) together with its partial derivatives Lx and 1x
is continuous jointly with respect to the arguments t, x, i.
THEOREM (a Necessary Condition for an Extremum in the Simplest Problem
of the Classical Calculus of Variations) . Let a function ~(-) EC1 (ft 0 , t1})
yield a local extremum in the problem (1). Then it satisfies the equation
d ~ ~ ~ ~
-di'L;,(t, x(t), x(t))+Lx(t, x(t), x(t))=O. (2)

Equation (2) is called the Euler equation. An admissible function


~(·) for which the equation holds is called a stationary point of the prob-
lem (1) or an extremal. Thus, the admissible elements which yield local
extrema in the problem (1) are extremals; generally speaking, the converse
is not true.
In the calculus of variations a local extremum in the space c 1 ([to,
t 1 ]) is said to be weak. According to the general definition, a function
i(·) yields a local minimum (maximum) in the problem (1) in the space
C1 ([t 0 , t 1 ]) i f there is E > 0 such that for any function x(·)EC1 ([t0 , t 1 ]) sat-
isfying the conditions x(to) = x(tl) = 0 and llx(·)lll ~ E there holds the
inequality

We shall carry out the proof of the theorem twice. First we shall
make use of Lagrange's reasoning. However, to this end we shall have to
assume additionally that the function t + Lx(t, x(t), k(t)) is continuously
differentiable. After that we shall prove a very important lemma of
DuBois- Reymond from which the theorem under consideration follows without
that additional assumption as well. A generalization of the construction
elaborated by DuBois-Reymond will play an essential role in Chapter 4.
The proof of the theorem consists of three stages.
A) The Definition of the First Lagrange Variation. Let x(·)EC 1 ([t 0 ,t 1 ]).
Let us consider the function 1...-q:>"{At=.? (.£(. )+l.,t(·)). We have
34 CHAPTER 1
t, t,
cp(?..)= S F(t, ?..)dt= SL(t, x(t)+?..x(t).~(t)+1..X(t))dt.
t, t,

The conditions imposed on L allow us to differentiate the function F


under the integral sign (to this end it is sufficient that the functions
(t, A) + F(t, A) and (t, A) + aF(t, A)/aA be continuous (see [14, Vol. 2,
p. 661] and [9, Vol. 2]). Differentiating and substituting A = 0, we ob-
tain
t,
cp' (0) = s(q (/)X (t) +p (t) X(t)) dt, (3)
t,
where
p (t) =L;, (t, x(t), ~ (t)), q (t) = L,. (t, x (t), ~ (t)). (3a)
Thus, lim(3(x(·)+M (·))-3 (x(·))/1..) exists for any x(•)ECl([t 0 , t 1] ) . Let
~0

us denote this limit by ll3(x(·), x(·)). The function x(·)~-+.ll.'l(x(·), x(·)) is


called the first Lagrange variation of the functional .'1.
B) The Transformation of the First Variation with the Aid of Integra-
tion by Parts. Let x(to) = x(tl) = 0, and let the function p(·) be con-

.
tinuously differentiable. Following Lagrange's method, we have
~ ~
63 (x(·), x(·)) = s(q(t)x(t)+p(t)x(t))dt == sa(t)x(t)dt, (4)
~

where a(t) = -P(t) + q(t) [the product p(t)x(t) vanished at the end points
of the interval of integration].
We know that x(·) yields a local extremum (for definiteness, let it
be a minimum) in problem (1). I t follows that the function 1...-cp(?..) has a
local mi~imum at zero. By virtue of Fermat's theorem (see Section 1.3.1),
cp'(O)=ll.7(x(·),x(·))=O. Comparing this with (4), we conclude that for an ar-
bitrary function x(·) EC1 (ft0 , ti)) such that x(to) = x(td = 0 the equality
t,
S a(t)x(t)dt=O takes place.
to
B) The Fundamental Lemma of the Classical Calculus of Variations
(Lagrange Lemma). Let a continuous function t~-+a(t),tE[t 0 ,tij, possess the
t,
property that Sa(t)x(t)dt=O for any continuously differentiable function
t,
x(·) satisfying the condition x(to) = x(tl) = 0. Then a(t) =0.
..fl:QQ.f_, Let us suppose that a(T) ~ 0 at a point TE[t 0 , t 1 ]. Then the
continuity of a(·) implies that there is a closed interval &=['t'1, 't'2]c (t0, t1),
on which a(·) does not vanish. For definiteness, let a(t)~m>O, tEll. Let
us construct the function
·- { (t-'t'l)'(t-'t'2)~. t e&,
x(t)= 0, t~&.
It can easily be verified that x(·)EC1(ft•• tl]); moreover, x(to) = x(tl) = 0.
t,
By the condition of the lemma, Sa(t)x(t)dt=O. On the other hand, by the
t,
mean value theorem (see [14, Vol. 2, p. 115] and [9, Vol. 1]), we have
t, t,
Sa(t)x(t)dt=a(s)Sx(t)dt>O, and we arrive at a contradiction which proves
t, t,
the lemma. •
INTRODUCTION 35

Comparing what was established in A) and B), we see that .....P<t) + q(t) ::
0, which is what we had to prove.
D) DuBois-Reymond's Lemma. Let functions a(·) and b(·) be continuous
on [to, t1l, and let
t,
~ (a (t) .i: (t)+ b (t) x (t)) dt =0 ( 5)


for any function x(·)EC 1([to. t1]) such that x(to) = x(tl) = 0. Then the func-
tion a(·) is continuously differentiable and da/dt = b(t).
~. Integrating by parts in the second summand of the integrand
in (5), we obtain
~ t t ~ t
0 = ~ (a (t)- ~ b (s) ds) .i: (t) dt+ ~ b (s) ds x (t) ~~: =~(a (t)- ~ b (s) ds) .i: (t) dt. (6)
~ ~ 4 4 4
t
Let us prove that cp(t)=a(t)-~ b(s)ds=const. Indeed, if this is not so,

then there are t1 and


' '·
T2 such that cp(,;1);6cp(,; 2) . Without loss of generality,
we can assume that Tl and T2 are interior points of the closed interval
[to, til and that cp ('t"1) > cp (,;,), Let us choose o > 0 so that ['t;-11, 't"t+ll]c [to. td;
i = 1, 2, and so that cp(s-t-,;1)-cp(s-t-,;.):;;..a>O for lsi~ o.
Now let us take a function x(•) whose derivative has a special form:
-(t-,;1 +II) (t-,;1-ll), tEIT1-II, Tt +Ill,
x(t)= { -t-(t-T,-t-ll)(t-,;,-11), tE[T,-6, T,-t-11], (7)
0 for all the other ,t.
t,
If we put x(to) = 0 then, by virtue of (7), we have x(tt)= ~ x(t)dt=O, and
hence (6) must hold for such a function. However, '·
11 t ] ~,+6 't"o+G
x(t) dt = ~ cp (t) x(t) dt + ~ cp (t) .i: (t) dt =
[

~ a (t)- ~ b (s) ds
'• '· ~. -6 ~.-6
G 6
= ~ (cp ('t1 -t-s)-cp (,; 2 +s))(ll-s) (s-t-11) ds:;;.:,a; ~ (11-s)(s+ll)&s > 0.
-6 -0
t
We thus arrive at a contradiction, which shows that cp(t)=a(t)-~ b(s)ds=const,

and this implies that a(•) is differentiable and da/dt =b.


'·•
Applying DuBois-Reymond's lemma to the first variation (3), we con-
clude that p·(·)EO([t 0 , t 1 J) and p(t) = q(t).
The argument we have just used contains the so-called method of varia-
tions in its incipiency with the aid of which various necessary conditions
for extremum are derived. The essence of the method consists in the fol-
lowing. Let x be a point suspected to yield a minimum in the problem
f(x)~inf, xeC. Then one can try to construct a continuous mapping A+
x(A),A.ER+> such that x(O) = x and x(A.)EC, O~A.~A-0 • It is natural to call
this curve a variation of the argument. Let us put Ql(A)=f(x(A.)). Further,
let us suppose that the function Ql is differentiable on the right with
respect to A. If x
does in fact yield a minimum, then the inequality
~p' ( +0) =lim (QJ (A.)-Ql (0))/A.~ 0 must hold because it is evident that 1p (A.)~
).~0

~p(O) = f (x).. If one manages to construct a sufficiently broad set of varia-


tions of the argument, the set of the inequalities ~p'(-t-0)~0 corresponding
to all these variations forms a necessary condition for a minimum. When
deriving the Euler equation, we used "directional variations": x(A) = x +
AX. In the derivation of Weierstrass' necessary condition and the maximum
36 CHAPTER
principle we shall use variations of another form, namely, the so-called
"needlelike variations" (see Sections 1 .4 .4 and 1 .5 .4).
In conclusion, we indicate two first integrals of the Euler equation:
1) If the Lagrangian L does not depend on x, the Euler equation pos-
sesses the obvious integral
p (t) = L x(t, x(t)) =canst.
It is called the momentum integPal.
2) If the Lagrangian L does not depend on t, the Euler equation pos-
sesses the first integral
H(t) = L x (x (t), ~ (t)) ~ (t)-L (x (t), ~ (t)) ==canst.
It is called the energy integral (both the names originate from classical
mechanics; we shall have a chance to come back to this question; see Sec-
tions 4.4.6 and 4.4.7). To prove this we calculate the derivative:
dH ~ ..: " d ~ ..: ..:
Tt==Lx (x(t), x(t))x(t)+ dt Lx (x(t), x(t))x(t)- ·

-L.,(i(t), ~(t))i(t)-Lx (x(t), i(t))i(tl=(~ L.i(x(t), ~(t))-L.. (x(t), ~{t)))i(t)==O


(by virtue of the Euler equations). Consequently, H(t) = const.
Remark. In the course of the calculation we have made the additional
assumption on the existence of the second derivative ~(t). This assumption
may be weakened,.but it is impossible to do without it completely.
It should also be noted that the Euler equation is a second-order dif-
ferential equation. Its general integral depends on two arbitrary con-
stants. Their values are found from the boundary conditions.
1.4.2. Necessary Conditions in the Bolza Problem. Transversality
Conditions. The simplest problem considered in the foregoing section is
a problem with constraints: the boundary conditions x(to) = xo and x(t 1 ) =
x 1 form two equality constraints. The problem considered in the present
section (the so-called Bolza problem) is a problem without constraints.
Let L satisfy the same conditions as in Section 1.4.1; let l = l(x 0 , x 1 )
be a continuously differentiable function of two variables. We shall con-
sider the following extremum problem in the space C1 ([to, t 1 ]):
t,
:B(x(· ))= ~ L(t, x, x)dt +l (x(t0 ), x(tJ)-+ extr. (1)
to
This is the problem that is called the Bolza problem.
THEOREM (a Necessary Condition for an Extremum in the Bolza Problem).
Let a function :X (·)ECl([t0 ,t1]) yield a local minimum in the problem (1).
Then there hold the Euler equation

- ~ L;, (t, ::(t), i(t))+L.,(t, x(t), i(t)):;;;Q (2)

and the transversality conditions


L. (t -(t) ..: (t )) _iJl(x(to). x(ttl)
x o• X o' X o - ilxo ' (3)
L x (ti, x(t~>· ~ (tl) = -at <xu~~/ u~» .
~. In exactly the same way as at the first stage of the proof of
the theorem of the foregoing section we obtain the following expression
for the first variation of the functional ~ :
INTRODUCTION 37

t,
M3(x(·), x(·))= ~ (q(t)x(t)+p(t)x(t))dt+a.x(t.)+ a,x(t,), (4)
t,

where p(·) and q(·) are determined by relations (3a) of the foregoing
section and a.= al(x (to). x(t,) , i = 0, 1 . Since i: ( ·) is a point of local mini-
' ax;
mum, /)j](x(·); x(·))=O for any point x(·)EC1 ((te, t,]). In particular, /)j](x(·);
i:(·)) = 0 for any point x(·)EC'([t 0 , t,]) such that x(to) = x(t 1 ) = 0. By
DuBois-Reymond's lemma, the function p(•) is continuously differentiable,
and, moreover, dp(t)/dt = q(t). Integrating by parts in (4) and using the
equality p(t) = q(t), we arrive at the following expression for the first
Lagrange variation of the Bolza functional:
I,

0= /)j] (x( · ), X ( ·)) = ~ (q (t)-p (t)) X (t) dt+(a0 -p(t 0 )) X(t 0 ) +


t,

(5)

Putting, in succession, x(t) = (t- to) and then x(t) = t - t 1 in (5) we


conclude that ao = p(to), -a1 = p(tl)• •
Thus, we have again obtained a second-order differential equation and
two boundary conditions (the transversality conditions).
We have considered the "one-dimensional" Bolza problem. The vector
problem is stated quite analogously:
I,

~(x(·))=~L(t, Xi, •• ·., Xn, X1 , ... , Xn)dt+


lo
+l(xi;(t 0 ), ... , Xn{t 0 ), xdt 1 ), ••• , Xn(/1 )}-->-extr, ( 1 I)

where
L: RxRnxRn~R. l: RnxRn~R.

In this case the necessary conditions for an extremum are also of the
form (2), (3), but the corresponding equation should be understood in. the
vector sense:

- ~ L;'i Uu x, (t), ... , x,. (t), ~~ (t}, ... , x,. (t)) +


+Lx.(t, x,(t), ... , Xn(t), ~,(t), ... , ~,.{t))=O, (2')
J

L; (t;,
I
x, (t;) •... , x,..(t 1), i, (t;), ... , in (t;)) =
=(-l);aL(x,u.> ..... X..u.>. xdtd, .... i,.ull>, ( 3 I)
ax;1
i = 0,1; j = l, ... , n.
1.4.3. The Extension of the Simplest Problem. In Section 1.4.1 we
considered the simplest problem of the classical calculus of variations in
the space C1 ([to, t1]), which is traditional. The space C1 ([to, t1]) is of
course convenient, but it is by far not always natural for a given problem.
In particular, under the conditions imposed on the integrand in that space
[the assumption that L, Lx, and LX are continuous jointly with respect to
the arguments (t, x, x)] the existence of the solution in C1 ([to, tl]) can-
not always be guaranteed. Below we give one of the simplest examples.
Hilbert's Example. Let us consider the problem
I
:; (x(. )) =) t' 1•x2dt ~ inf, x (O) = o, x (l) = 1.
0
38 CHAPTER 1

Here the Euler equation possesses the momentum integral (see Section
x
1.4.1) t 2 13 = C, whence it follows that the only admissible extremal is
x(t) = t 113 This extremal does not belong to the space C1 ([to, t 1 ]) (why?).
At the same time, it yields an absolute minimum in the problem. Indeed,
let x(•) be an arbitrary absolutely continuous function* for which the in-
tegral .?(x(·)) is finite and x(O) = x(1) = 0. Then
I

, <x <. >+x <. ))= ~ t''· (i• (t)+ 2i(t)x(t)+x" (t))dt =
0

=" (x <.)) + j. Sx +" (x <.)) =" (x(. ))+.? (x <.));;;::: '(x <. )).
I

(t) dt
0
Hence, the solution of the problem exists but does not belong to
C 1 ([to, td).
Among the famous "Hilbert problems" there are some devoted to the cal-
culus of variations. In particular, the twentieth problem concerns the
existence of the solution. Hilbertt writes: "I am sure that it will be
possible to prove the existence with the aid of some general basic prin-
ciple indicated by the Dirichlet principle and which will probably enable
us to come nearer to the question whether every regular variational problem
admits of a solution ••• if, when necessary, the very notion of a solution
is given an extended interpretation." Hilbert's optimism is confirmed by
a great deal of experience in the classical mathematical analysis. Here
we shall demonstrate the problem of an "extended interpretation" of a solu-
tion by the example of the simplest variational problem.
One of the constructions of an extension consists in adding new ele-
ments to the original class of underlying elements and extending the func-
tional to the new elements. Here we shall make a small step in this direc-
tion: namely, we shall extend the simplest problem to the class of piecewise
smooth functions.
We remind the reader that by a piecewise smooth function x(·) on [to,
t 1 ] is meant a function possessing the property that the function itself is
continuous while its derivative is piecewise continuous, i.e., continuous
everywhere on [to, t1l except at a finite number of points to< T1 < ••. <
Tm < t1 t and, moreover t the derivative x( •)
has discontinuities of the first
kind at the points Ti· For example, the function

-·{ o, -I~ t~o.


x(t)- · _.!..
t• sm 1 ,
O<t ,

possesses the derivative which is not piecewise continuous. The collection


of all piecewise smooth functions on [to, t1l will be denoted by KC 1 ([t 0 ,
tl]).
The integral functional
- t,
:J (x( ·)) = ~ L (t, x (t), x(t))dt (1)
t.
(cf. Section 1.4.1) can be extended in a natural way to the elements x(·)E
KC 1 ([t 0 , t 1 ]) since the integrand is piecewise continuous for these x(·)
and the integral exists.
*For absolutely continuous functions, see Section 2.1.8. If the reader is
not familiar with this notion, it is allowable to assume that x( ·) is con-
tinuously differentiable in the half-open interval (0, 1], is integrable in
the improper sense on the closed interval [0, 1], possesses the finite in-
tegral3(x(·)), and x(O) = x(l) = 0.
tD. Hilbert, Gesammelte Abhandlungen, Vol. 3, Springer-Verlag, Berlin (1935).
INTRODUCTION 39
Now let us consider the simplest problem
3(x(·))-extr, x{t 1)=X 0 , x(t 1)=xi (2)
in the space KC 1 ( [ t 0 , t 1 ]). At the end of this section we shall give an
example of a function L satisfying the standard conditions for which, how-
ever, a problem of the form (2) possesses no solution in the space C1 ( [ t 0 ,
t 1 ]) while its solution in KC 1 ([to, t1]) exists, which shows that this ex-
tension of the problem is reasonable (although in Hilbert's example the
minimum is not attained in KC 1 either).
Here the notion of a local solution needs a somewhat more precise
definition. If, as before, we include in a neighborhood of an element
i(·)EKC1 {[t 0 , t 1 ]) those x(·) for which both the differences x(t)- x(t)
themselves and their derivatives are small (cf. the definition of a weak
extremum in Section 1.4.1), then the newl¥ added elements will lie far from
the old ones. Indeed, if the derivative x(·) has a jump of magnitude o at
some point then none of the functions x(·)EC1 {lt 0 , t 1 ]) can satisfy the in-
equality lx(t)- i(t)l < o/2 for all t [for which i(t) exists]. Generally,
this is inconvenient because it is usually desirable that the old space be
everywhere dense in the extended space. Therefore, the closeness in
KC 1 ( [to, t d) will be defined by comparing only the functions themselves but
not their derivatives. This leads to the replacement of the notion of a
weak extremum (Section 1.4.1) by the notion of a strong extremum.
Definition, A function x(·)El(O{[t 0 , t,l) is said to yield a strong
minimum (maximum) in the problem (2) if there exists E > 0 such that for
any x(·)EXC 1 ((t 0 , t1J), satisfying the conditions x(to) = xo, x(tl) = x 1 , and
llx( · )-x (·) l o = sup 1x(t)-x(t) I< e, (3)
le[l,, 11]

the inequality 3(x(·))~3(x(·)) (:1(xl·)) ~ 3(x{·))) holds.


The Lemma on "Rounding the Corners." 1) If a function L = L(t, x, x)
is continuous jointly with respect to its arguments, then
inf :J (x ( ·)) = inf 3 (x ( · )). (4)
x( ·) e KC 1 ((10 , 11 ]) x ( ·) e c• ([1 0 , 1,])
x (10 )=x 0 , x (1 1)=x 1 x (10 )=x8 , x (1 1 )=.<t

2) Equality (4) is retained when only those x(·)El(C1 ([t0 , tt]) are taken
which satisfy inequalities (3) for given X(•) and E ) 0.
3) The assertion also remains true when inf is replaced by sup.
Proof. Since l(C1 {ft0 , ti]) ::>C1 ([t 0 , tt]), the left-hand member of (4) does
not exceed the right-hand member, and it only remains to prove the opposite
inequality. Let us assume the contrary. If inf3(x(·))<inf:J(x(·)), then
KC• c•
there is a piecewise-smooth function x(·) and n > 0 such that :J(x(·))<
inf3(x(·))-T). Let Ti, i = 1, 2, ... ,m, be the points of discontinuity of the
c,
derivative i, and let ~i = i(Ti + 0) - i(Ti - O) be the jumps of the deri-
vative at these points.
The continuous function L is bounded on the closure of the bounded set
X= { (t, x, x)l to~ t~ tt, lx-x(t) I~ max I A; l6o/4, IX-~ (t) I ~max I A; I /2}
i.e., IL(t, x, x) I ~ M.
The function
(1-ltl>• I
a(t)== { - 4 - , tl~l, (5)
o, Ill> 1
40 CHAPTER 1

0 I
Fig. 22 Fig. 23
is continuous, and its derivative has a jump of magnitude -1 at t = 0.
Therefore, the function Ba(t6~1 ).whose graph is obtained from the graph of
a(•) by a similitude transformation and a shift (Fig. 22), is also continu-
ous everywhere except at the point Gi where, as before, it has a jump of
magnitude -1 .
Now it can easily be verified that the function
m

xo(t)=x(t)+ l:A BaC6""


t=l
1 1)

is continuous on [to, t1J together with its derivative, and, moreover,


Xo(t) = x(t) outside the closed intervals [Ti- o, Ti + o]. In particular,
these intervals do not overlap for sufficiently small o, x 0 (ti) = x(ti) =
Xi, i = 0, 1, lx0 (t)-x(t)l~maxl.1i1B/4, and lxo.(t)-i(t) l:s:;;;maxiA;i/2, because,
according to (5), we have la(t)l ~ 1/4 and la(t)l ~ 1/2. Therefore, for
0 < o < co we have
ts tt
.7(Xo(·))-.7(x{·)) ""'s L(t, X6(t), x6(t))dt-SL(t, x(t), i(t))dt==
~ ~
m ~;+6 •
= ~ s (L(t,
1=1 "'1-6
Xo(t), Xo(t))-L(t, x(t), x(t)))dt,

and consequently
.7 (x6 ( ·)):;:;;; :J (x (·))+4mB M < inf .7 (x ( · ))-1') +4mB M < inf .7 (x ( ·))
c• c•
if o is sufficiently small. Since x6 ( ·} E C1 ([t 0, 11]}, we have arrived at a
contradiction. For small o the function x 0 (·) satisfies the inequalities
(3) provided that x(·) satisfies them. This implies the second assertion
of the lemma. The third assertion is obvious. •
COROLLARY. If a function x(·)EC1 ([t 0 , 11.1) yields an absolute or a
strong minimum (maximum) in problem (2) in the space C1 ([to, tl]), then it
possesses the same property in the space KC 1 ([to, t 1 ]) as well.
The extension we have constructed is not in fact always sufficient.
Besides, the space KC 1 ([t 0 , t 1 ]) is not complete [with respect to the met-
ric determined by the left-hand side of (3)]. It would be more natural to
extend the problem to the class W~([to, t1]) of functions satisfying the
Lipschitz condition.* But this would require some facts of the theory of
functions. There is another important construction of an extension of the
simplest problem which will be demonstrated by the example below.

*It should be noted, however, that in this extension we would not obtain
the existence in Hilbert's problem because the function t + t 113 does not
satisfy the Lipschitz condition.
INTRODUCTION 41

The Bolza Example. Let us consider the problem


I

:; (x(· ))= ~ ((l-x2 ) 2 +x2 ) dt-..inf, x(O) =0, x(l)= 6.


0

Let us suppose that ~ = 0. Then it is clear that there exists no so-


lution of the problem. Indeed, here the infimum of the functional is equal
to zero. To show this it sufficies to consider the following minimizing
sequence of functions belonging to KC 1 ([0, 1]):
t
xn(t)=~sgnsin2n2"Td't", n=l, 2, ... (Fig. 23).
0

The sequence of the functions xu(·) is uniformly convergent to zero, and


I

~ACt)= 1 except at a finite number of points, i.e., .'J(xn(·))=~x~(t)dt-+-0.


0
On the other hand, if xo(·) = 0, then :J(x0 (·))=1, and if x(·) i. 0, then
I
1.'} (x(· )) > ~ x dt > 0.
0
2

The matter is that the functional .'J is not lower semicontinuous: as


was established, in any arbitrarily small neighborhood of the function
xo(·) = 0 on which the functional assumes the value equal to unity there
are functions for which the value of the functional is much smaller [since
.'J(xn(·)-+0 ]. It turns out that it is possible to construct a "lower semi-
continuous" extension of the problem in which the functional .'} is replaced
by the functional
1(X(·))= lim .'J(y(·)),
1/l·)...,.x(·)

while the original set of functions (say C1 ([to, t 1 ]) or KC 1 ([t 0 , t 1 ]))


remains the same. The passage to the limit for y(·) ...,. x(·) is understood
here in the sense of the space C([to, t1]) (i.e., in the sense of the uni-
form convergence). It turns out that the functional 1 admits of a very
simple description; namely, in the Bolza example we have
I

-;¥(x(·))= ~ (((x -1)+)


2 2 +x 2 )dt,
1

where

In the general case when


I,

.? (xf·)) = ~ L (t, x, x)dt,


I,

we have
I,

-;¥(x(· ))= ~ L(t, X, x)dt,


lo

where L is the "convexification" of L with respect to x, i.e., the function


X ..... L(t, x, x) is the greatest convex function not exceeding the function
X ..... L(t, x, x). This assertion is known as Bogolyubov's theorem.
The same Bolza example can be used to construct a problem of the type
(2) in which the minimum is attained on a curve with a corner. However,
for purely technical reasons, we shall consider a somewhat different prob-
lem:
42 CHAPTER 1

\ -t
I
Q

Fig. 24
L'"
1 II

t,
3(x(•))= ~ (f(x)+x 2 )dt-i>~f, x(t 0 )=x0 , x(t 1)=x1 , (6)
I!

where

The function f(u) is continuously differentiable and convex (Fig. 24). In


particular, the graph of f(u) does not lie lower than any of its tangents;
i.e., we always have
f (v);;;;::: f (u) + f' (u) (v-u).
Therefore, for any two functionsx(·),x(·)EKC 1 ((t 0 , / 1J) satisfying the
given boundary conditions x(ti) = x(ti) = Xi, i = 1, 2, We have
I,

: (x( · ))-.7 (~( · )) = ~ (f (x (t))-f (i (t)) +x• (t)-x• (t))dt ~


lo
t,
;;;;::: ~ (f' (~(t)) (x(t)-~(t)) +2x(t)(x(t)-x(t)) +(x(t)-x(t)) 2)dt;;;:::
lo
t,
; ; : : ~ <f' <~(t)) (x(t)-~(t))+2x(t) (x(t)-x(t)))dt.
t,

Now let us suppose that x(·) satisfies the Euler equation


d " ~
df f' (x (t)) = 2x (t) (7)

at all points of its differenti~bility. Integrating by parts on each of


the intervals of continuity of x(·), we obtain
I,

:; (x(·))-:J(x(·));;;::: 5(- ~ f'(~(t))+2x(t)) (x(t)-x(t))dt+


t,

provided that, in addition to (7), the function p(t) = f'(x(t)) (the momen-
tum) is continuous. Hence the function x(·) [satisfying the Euler equation,
the condition of continuity of p(·), and the boundary conditions] is a so-
lution of the problem (6). For instance, the function x(t) = eltl having a
corner at the point t = 0 is a solution of this problem for to= 1, t 1 = 1,
xo = x1 = e. Indeed,
~ { e1, t;;;:.o,
x(t) = e-t, t<O,

whence l~(t)l ~ 1, and consequently the function


INTRODUCTION 43

II
II
II
II
II
I
I I
1 e
II
II
I I
I lr
r-A I
I
a b

Fig. 25

, ~ { 2(e1 -l), t~o.


p (t) =f (x(t)) = 2 (-e-t 1), t < 0, +
has no jump at the point t = 0: p(+O) = p(-D) = 0.
2x(t), and hence the Euler equation is satisfied.
Moreover, :t p(t)=2eitl=
1.4.4. Needlelike Variations. Weierstrass' Condition. The notion
of a strong extremum was introduced into the calculus of variations by Weier-
strass. To prove a necessary condition for a strong minimum Weierstrass
used special variations of the following form:

{ £A.+(t·-'t'H, tE['t'-A., 't'],


h,.(t)=hJ..(t; 't', ~)= £A.-(t-'t')£-y'T, tE['t', 't'+JITJ,
(1)
XA, (t) =X (t) +hl.(t)
(see Fig. 25a).
The derivative of the variation hA(·) has the form shown in Fig. 25b.
The graph of the derivative resembles a needle to some extent, which ac-
counts for the name "needlelike." The needlelike variations are convenient
for investigating problems on a strong extremum.
Now we proceed to the derivation of Weierstrass' necessary condition.
Let us consider the simplest problem of the classical calculus of variations:
t,
.?' (x ( · )) = ) L (t, X, x) dt--+- inf, X {t 0) =X0 , X (ti) =Xi , (2)
t,

on the class KC 1 ([to, t1]) of piecewise-smo oth functions. Let x(·) be an


extremal suspected to yield a strong minimum; for definiteness, we shall
suppose that x(·) is smooth.
Following the general idea of the method of variations, we shall con-
sider the function
(3)
where hA is determined by the formulas (1), Tis an interior point of [to,
t 1 ], and~ is an arbitrary number.
For sufficiently small A~ 0 the function xA(·) is admissible in prob-
lem (2), i.e .• xA(ti) = i(ti) + hA(ti) = Xi• i = 0, 1.
44 CHAPTER ·1

The function cp(·) is defined for nonnegative A. Let us prove that it


is differentiable on the right at the point zero. From definition (1) it
is readily seen that
a) lha.(·>lo= max lha.(t)j=O(A),
1 e [10 , 1,]
(4)
b) llia.(t)l =ir'X=o(y'X), tE(t',>t+}"X).
It follows that
...
cp (A)-cp (0)= ~ (L (t, Xa. (t), i (t)+ ~)-L (t, X(t), ; (t))) dt+
T-A
T+JI'J:'
+ ~ (L(t,xa.(t), Xa.(t))-L(t, x(t), i(t)))dt=1i+1z·
(5)

The integral 1'-i in (5) can be estimated thus:


'1t=A(L(t, x(t), i(t)+~)-L(t, X(t), i ('t)))+o(A) (6)
[to this end one should make use of the mean value theorem of differential
calculus (see Section 2.2.3) and the estimate (4a)].
To estimate the second integral 1. we represent the difference
11=L(t, x(t)+ha.(t), i(t)+ha.(t))-L(t, x(t), x"(t))
in the form

[here again one must apply the mean value theorem and the estimate (4b)],
perform integration by parts in the second term and use the fact that

- ddt L,;, +Lx IA


x(l)
=0

[because x(·) is an extremal]. As a result, we obtain


T+'VX"
1.-- ~L,;,(t, X(t), i(t))+ ~ o( y'X) dt = - ~L.;, (t, x(t), i(t))+g (A.). (7)

Comparing (6) and (7), we find


'P(A)-cp(O)=A(L(t, x(t),.i(t)+s)-L('t, x('t), i('t))-iL.X (t, x(t), i(t)))+o(A). (8)

Hence, the function ljl(-} possesses the right-hand derivative at the


point A. = 0:
IP' <+O)= lim cp (A);:-cp (O) =L (t, X ('t'), i ('t)+ s)-L (t, x(t), ; (t)) -;L. ('t', X('t'),
q, 0 X
i (t)).
However, if x(·) yields a strong minimum, then 3(xa.(·))~ 3(x(·)), and conse-
quently
cp' ( +O) = lim (cp (1..)-cp (0))//.. ;;.:= 0,
Aj.O

i.e., the relation


B<"'· x<"'>· ~<"'>· ~<"'>+s>=
=L(-r;, x(T), ~(•)+s)-L(T,X(T), ~(T)) -sL,;, (,;,x(,;), ~(T))~O (9)
holds for any sE R. The function
B(t, x, y, z)=L(t, x,iY• z)-L(t, x, y)-(z-y)Lu(t, x, y)

is called the Weierstrass function. We have thus proved the following:


INTRODUCTION 45

THEOREM (Weierstrass' Necessary Condition for a Strong Minimum). For


an extremal Je(·)EC1([t 6 , t 1 ]) of the simplest problem of the classical calcu-
lus of variations (2) to ·yield a strong minimum it is necessary that the
equality
G (r:, x(r:), ~ (r:), ~ (r:) + 6) = L (r:, x(r:), i\r:) + 6)- L (r:, x(r:), ~ (r:))- 6Lx (r:, x(r:), i (r:)) ~ o
and any 6 E R.
hold for any r: E[! 0 , t1 ]
1.4.5.• Isoperimetric Problem and the Problem with Higher Derivatives.
By the isoperimetria problem (with fixed end points) in the calculus of
variations is usually meant the following problem:
t,
3o(X(•))=~fo(t,X, X)dt---..extr,
t,

t, (1)
3 1 (x(·))=~f 1 (t,x,x)dt=a1 , i=l, ... ,m,
.I,
x(t 0 )=-X0 , x(t 1)=x1 •

It is supposed that the functions fi satisfy the same conditions as


those for Lin Section 1.4.1: the functions themselves and their partial
derivatives with respect to x and x
are continuous jointly with respect to
their arguments. For the sake of simplicity, we shall confine ourselves to
the case m = 1.
THEOREM (a Necessary Condition for an Extremum in the Isoperimetric
Problem~. Let a function ~(•) yield a local extremum (relative to the
space C ([to, t1])) in the problem
t,
3 0 (X(·))=~ f0 (t, X, x)dt---..extr,
t, ( 1 I)
t,
3 1 (X(·))= ~ {1 (t, x, x)dt=a 1 , x(t;)=x 1 , i=O, l,
t,

and, moreover, let the functions t + f 0x(t, ~(t), i(t)) and t + f 1x(t, x(t),
:R(t)) belong to C1 ([t 0 , t 1 ]). Then therearenumbers, ~o and ~1, not vanish-
ing simultaneously, such that for the integrand L = ~ofo + ~1f1 the Euler
equation holds:
d ~ ..: ~ ..:
-TtL,; (t, x(t), x(t))+Lx(t, x(t), x(t))= 0. (2)

Proof. A) By analogy with what was done at the beginning of the proof
of the theorem of Section 1.4.1, we calculate the first Lagrange variations
of the functionals 3 0 and 31:
t,
M 1 (x(-), x(·))= ~ (p 1 (t)x(t)+q 1 (t)x(t))dt, i=O,I, (3)
t.

where
pt(t) = f 1x(t, x(t), i (t)), q1 (t) =fix (t, x(t), f (t)).
Here there are two possible cases: either 1131,==0, Vx(·) EC1 ([t 0 , t 1 ]),
x(to) = x(t 1 ) = 0, or 113i~O. In the first case, by the theorem of Section
1 • 4 • 1 , we have

and putting AO = 0 and Al =1 we immediately obtain (2).


46 CHAPTER

then there exists a function y(·)EC1 ((t 0 , 11 ]), y(t0 )=


B) Let 1\.'7 1 ~0;
y(t 1 ) = 0, for which Mix(·), y(·))*O. Let us consider the functions of
two variables
cp;(a..~)=.?t(x(·)+a.x(·)+~u(·)),
i=O, 1.
We leave it to the reader to verify that they are continuously differenti-
able in a neighborhood of zero and that

ilcp;~,O)=Il.?t(x(·), x('·)), ilcp;~~,O)=Mt(x(·), y(·)).

LEMMA· For any function. x(·)EC1 ((t0 , 11 ]) such that x(to) = x(tl) = 0
there holds the relation

(4)

~ If determinant (4) is nonzero, then there is a neighborhood


of the point (O, 0) which goes into a neighborhood of the point cp 1 (0, 0)),
(cp 0 (0, 0), under the mapping (a.,~)~ (cp0 (a.,~), ~pda., ~)) (the corresponding theo-
rem on the inverse mapping was already used in Section 1.3.2). In particu-
lar, there are a and B, and, consequently, an admissible function i(·) +
ax(•) +By(·), such that cp 0 (a., ~)=.? 0 (x(·)+a.x(·)+~y(·))=.'7 0 (x(·))-e, where
£ > 0 and <p 1 (a.,~)=cp 1 (0,0)=.? 1 (.x(·))=a. 1 • This contradicts the fact that~(·)
yields a local minimum in the problem (1). •
C) From equality (4), we obtain

(by assumption, the denominator is nonzero here). Putting AO 1' ~1


-l\.'7 0 (x(·), y(·))/11.'7 1 (x(·), y(·)), we see that
~oll.'7 0 (x(-), x(·))+~~Mdx(·), x(·))=O (5)
for any function x(·)EC1 ([t 0 , 11]) satisfying the condition x(t 0 ) = x(t 1 ) = 0.
It can easily be seen that the left-hand side of (5) has the form
t,
~ ((~ 0P0 (t) + i 1p1 (t)) X (t) + (,; 0 q0 (t) + ,; 1 q1 (t)) X (t)) dt.
t.

Applying the fundamental lemma of the classical calculus of variations, we


obtain the equation

coinciding with (2). •


Problem (1') was for the first time considered by Euler in his famous
work of 1744 where the method of broken lines was used to derive the rela-
tion (2). As a matter of fact, this was the main subject of the work. Un-
doubtedly, this work already contained the Lagrange multiplier rule in its
incipiency.
In conclusion, we shall briefly consider the problem with higher de-
rivatives:
t,
.'7 (x ( ·)) = ~ f (t, x, i, x, ... ,x<nl)dt-+extr,
t,
INTRODUCTION 47

xV>(t;)=xl. i=O, 1, j=O, .. . , n-1. (6)


For greater detail we refer the reader to [2] or [3]. We shall investigate
this problem in the space cn([to, t1]) of functions continuous together
with their derivatives up to the order n inclusive on the closed interval
[to, t1]. We shall assume that the function f and its derivatives with
respect to x, x, ...
,xCn) are continuous jointly with respect to their ar-
guments. Let x(·) be a function suspected to yield an extremum. Let us
calculate the first Lagrange variation of the functional:

(7)

where p1 (t) = ~ (t, X(t), ... , ,X<n> (t)).


8xU>
The conditions of local extremality imply that 6.7(x(·), X(·))=O. pro-
vided that x(j)(ti) = 0, i = 0, 1, j = 0, 1, ... ,n- 1. Integrating by
parts in (7) [let the reader state the necessary requirement concerning the
smoothness of Pj ( ·) for the integration by parts to be legitimate], we bring
the first variation to the form

6.7 (X (•), X ( ·)) =


1,( n
~ j~O (-1)1 py> (t)
)
X (t) dt,

Further, applying the generalization of the fundamental lemma of the class-


sica! calculus of variations to the case of functions belonging to
c(n) ([to, t 1 ] ) and satisfying the conditions x(j)(ti) = 0, i = 0, 1, j = 0,
1, .•• ,n- 1 (let the reader think over this generalization), we conclude
that the desired necessary condition for the extremality of x(·) is the
following equation:
n
" '
~(-1)'. ( dt
d )J -.(t,x(t),
at ~ ·'
x(t), ~
... ,x<n>(t))=O.
i=O axw ·

It is called the Euler-Poisson equation.

1 .5. Lagrange Problem and the Basic Optimal


Control Problem
1.5.1. Statements of the Problems. The publication in 1788 of the
treatise by Joseph Louis Lagrange "Analytical Hechanics" was an important
stage of the development of natural science. In particular, the treatise
also played an exceptional role in the development of the calculus of vari-
ations. Namely, the following conditional extremum problem was stated
there:

.7 (x ( · )) =)'· f (t, x (t), x(t)) dt-+ extr, (1)


t,
ID(t, x(t), x(t))=O~ID1 (t, x(t), x(t))=O,
j= l, ... , m, (2)
x(t 0 )=X0 , 'x(t 1 )=Xt•

Here x(·): [to, t 1 ] + Rn, f: R x Rn x Rn + R, and¢: R x Rn x Rn + Rm.


Later problem (1) and its various modifications connected with some addi-
tional constraints (some other boundary conditions, additional integral
relations, etc.; also see Section 1.2.6) were called the Lagrange problems.
To solve problem (1) Lagrange used the basic method described in Section
1 .3.2, i.e., the multiplier rule. However, he did not prove it rigorously,
and it took more than 100 years to give Lagrange's reasoning the form of a
rigorously proved theorem.
48 CHAPTER 1

Let us indicate two most important special types of constraints em-


braced by the general expression (2). In the calculus of variations a
constraint of the form ~(t, x) = 0 in which the function ~ involved in (2)
is independent of x
is called a phase constraint. In mechanics the phase
constraints are also called holonomic constraints.
The other case is the one in which the relation (2) can be resolved
with respect to the derivatives x.
Then this constraint is written in the
form of equations x =cp(t, x, u), x(·): [t 0 , td-Rn, u(·): [t 0 , t1 ]-R', cp: RxRnx
Rr + Rn. Here the variables x(·) are called phase coordinates and the
variables u(·) are called controls. Attention will be primarily paid to
this most important case. More precisely, further we shall consider the
problem
t,

:J (x ( 0 ), u ( o)) = ~ f (t, x (t), u (t)) dt+'lJl.(x (t 0), x (t 1) ) - extr, ( 1 I)


t,

x-cp(t,x, u)=O, 'ljl(x(t 0 ), x(t 1 ))=0(~'ljl1 (x(t 0 ), x(t 1 ))=0, j=l, ooo,s)o (2')
Formulas (1') and (2') involve functions f:V-R, cp: v-R",'ljl: w-R'.
where V and W are open sets in the spaces R x Rn x Rr and Rn x Rn, respec-
tively. Here we shall assume the instants of time to and t 1 to be fixed.
The constraint x-cp (t, X, u) = 0 is called a differential constraint,
and the constraints ~(x(to), x(tl)) = 0 are called boundary conditions.
Problem (1'), (2') will be called the Lagrange problem in Pontryagin's form.
We shall assume that all the functions f, cp , and~ are continuously
differentiable.
Later in Chapter 4 a still more general case will be considered in
which to and t 1 may also vary and isoperimetric equality and inequality
constraints, etc. are admitted.
Problem (1'), (2') will be considered in the Banach space Z = C1 ([t 0 ,
t 1 ], Rn) x C([t 0 , t 1 ], Rr). In other words, we shall consider the pairs
(x(·), u(·)) where x(·) is a continuously differentiable n-dimensional
vector function and u(·) is a continuous r-dimensional vector function.
For a pair (x(·), u(·)) we shall sometimes use the abbreviated notation z.
An element .z=~(o).,u(o))EZ will be called a controlled process in the
problem (1 1 ) , (2 1 ) if x(t)=cp(t, x(t), u(t)),.'v'tE[t •• t 1 ] , and will be called an
admissible controlled process if-, in addition, the boundary conditions are
satisfied. An admissible element z = (x(·), u(·)) will be called an opti-
mal process in the weak sense or will be said to yield a weak minimum in
the problem (1 1 ) , (2 1 ) if it yields a local minimum in the problem, i.e.,
if there is E: > 0 such that llx- xlll < E: and llu- ullo < E: imply :J(z)';;::J (z).
1.5.2. Necessary Conditions in the Lagrange Problem. Let us try to
apply the general method of Lagrange mentioned in Section 1.3.2 to the
problem ( 1 1 ) , (2 1 ) of the foregoing section. By analogy with the fi-
nite-dimensional case, the Lagrange function should be written thus:
t,
£'(x(o), U(o); p(o), ft, A- 0 )= ~ Ldt+l, ( 1)
t,
where
L(t, x, x, u)=p(t)(x-cp(t,x,u))+A..f(t, x, u),
s
(2)

l (x0 , X1 ) = 0~ f.tJ'ljlJ (x 0 , X1), A0 =fto· (3)


I =0

There is no doubt that the "endpoint part" l of the Lagrange function


is of the form (3): here there is a complete similarity between the case
INTRODUCTION 49

under consideration and the finite-dimensional case. As to the constraint


x =(Jl(t, x, u) , it must be fulfilled for all tE (t 0 , t 1 ] , and, by analogy, the
corresponding "Lagrange multiplier" p(·) must be a function of t, and its
contribution to the Lagrange function has the form of an integral instead
of a sum.
Thus, we have formed the Lagrange function. Now, following Lagrange's
recipe we must seek extremum conditions for the resultant expression "as if
the variables were independent." In other words, we should consider the
problem
2'(x(·), u(·); p(·), ~. }, 0)---+-extr, (4)
where the Lagrange multipliers are assumed to be fixed. The problem (4) is
a Bolza problem considered in Section 1.4.2. The application of the ex-
tremum conditions written in that section leads to the correct equations
called the Euler-Lagrange equations. More precisely, the following-theorem
holds:
The Euler-Lagrange Theorem. If~= (i(·), ~(·)) is an optimal process
in the weak sense in the problem (1'), (2') of Section 1.5.1, then there
are Lagrange multipliers ~o = ~o > 0 (in a minimum problem) or ~o = ~o ~ 0
(in a maximum problem), p (·)EC 1 ([t 0 , t 1 ], Rn), and~= (~1, ... ,~s), not all
zero,* such that there hold the Euler equations

- : 1 L;; u. x(t), ~(t), u(t>)+L.. (t, x(t).~ (t), u(t))=o, (5)

Lu(t, x(t), x(t), u(t))=O, (6)


and the transversality conditions

(7)

The Euler equation in u has the degenerate form (6) because L is in-
dependent of ~ and l is independent of u, and there is no transversality
condition "in u" at all.
This theorem (even in a more general form) will be proved in Section
4.1 as a direct consequence of a general theorem concerning the Lagrange
multiplier rule for smooth infinite-dimensional problems.
It should be noted that the Euler-Lagrange theorem implies a necessary
condition for an extremum in the isoperimetric problem (with an arbitrary
number of isoperimetric conditions) and the Euler-Poisson equation for
problems with higher derivatives. The isoperimetric constraints
tl
?;(x(·))=~f;(t,x,x)dt=rx;, i=l, ... ,m; xERn,
t,

can be taken into account by introducing new phase coordinates connected


with the old ones by the differential constraint Xn+i(t) = fi(t, xl(t), ... ,
xn(t), xl(t), •.. ,~n(t)) and satisfying the boundary conditions

i= 1, ... , m.

*An advocate of mathematical purism may notice a glaring inaccuracy in this


statement: ~o and 0i are numbers while p(·) is an element of a function
space, and therefore they cannot be simultaneously equal to one and the
same "zero." Each of them may or may not be equal to the zero element of
its own space. However, we are used to identifying zeros of all the space,
and this statement does not irritate us.
50 CHAPTER
Now the application of the Euler-Lagrange theorem leads to the desired
necessary conditions in the isoperimetric problem. When the problem with
higher derivatives is inyestigated, ~t can be brought to the form (1'),
(2') by putting x = x1, x1 = X2, ••• ,xn = u and applying the Euler-Lagrange
theorem.
1.5.3. The Pontryagin Maximum Principle. In the Fifties the demands
of applied sciences (technology, economics, etc.) stimulated the statement
and the analysis of a new class of extremal problems which were called
optimal eontPol ppoblems. A necessary extremum condition for this class
of problem.s, known as the "maximum principle,"was stated by L. S. Pontryagin
in 1953 and was proved and developed later by him and his pupils and colla-
borators (see [12]). It is important to note that this condition is of an
essentially different form in comparison with the classical equations of
Euler and Lagrange: the solution of an optimal control problem includes as
a necessary stage the solution of an auxiliary maximum problem (this ac-
counts for the name the "maximum principle").
Here we shall consider a special case of the general statement of an
optimal control problem in which the Lagrange problem in Pontryagin 's form [see
(1') and (2') in Section 1.5.1] involves one more additional condition on
the control: u E U.. More precisely, we shall consider the following problem:
t,
.7(x(·), u(·))=~f(t, x(t), u(t))dt+¢0 (x(t 0 ), x(t1))-inf, (1)
t,

x(t)-cp(t, x(t), u(t))=O, ..p1 (x(t 0 ), x(tJ)=O,


i= 1, ... , s, (2)
ueu. (3)
Here the functions f, cp, and ~j are the same as in problem (1 '),
(2') of Section 1. 5. 1 and U is a fixed set in Rr. A more general problem
will be considered in Chapter 4.
The Lagrange problem was considered in a Banach space. However, here
we want to use the simplest means, and the admissible elements will be de-
scribed in another way similar to the one mentioned in Section 1.4.3,where
the simplest problem of the classical calculus of variations was extended
to a problem in the space KC 1 ([to, t1]) of piecewise-differentiable func-
tions.
A pair (x(·), u(•)) will be called a controlled pPoeess in problem
(1)-(3) if the control u(·):(t 0 ,t1 ]-+U is a piecewise-continuous function,
the phase trajectory x(•) is piecewise-continuously differentiable, and,
moreover, the function x(·) satisfies the differential equation x(t)=cp(t,
x(t), u(t)) everywhere except at the.points of discontinuity of the control
u(·). A controlled process is said to be admissible if, in addition, the
boundary conditions are satisfied.
An admissible process (x(·), u(·)) is said to be optimal if there is
e: > 0 such that the inequality 3(x(·), u(·))~.7(x(·), u(·)) holds for every
admissible controlled process (x(·), u(·)) such thatjx(t)-x(t)l<e, YtE(t 0 ,
ttJ.
Let us again try to apply the Lagrange general method discussed in
Section 1.3.2 to problem (1)-(3). The Lagrange function of problem (1)-(3)
is of the same form as in the Lagrange problem: the constraints of the form
of inclusions (uEU) are not involved in the Lagrange function. Thus
t,
(4)
2(x(·)~ u(·); p(·), ~-'• 1..0)= ~ Ldt+l,
t,
INTRODUCTION 51

where
L(t, X, x', u)=p(t)(X-ffl(t, x, u))+A.of(t, x, u), (5)

l (x0 , X 1) = l1 1-'J'i'/ (x
i=O
0, x1 ), J.'o = 'A0 • (6)

Further, as usual, we have to consider the problem


.9'(x(·), u(·);p(·). ~. i0 )~inf (7)
(where the Lagrange multipliers are assumed to be fixed) "as if the vari-
ables were independent." Naturally, problem (7) splits into two partial
problems:
.2-'(x(·), u(·); P(·), ~. io}~inf (with respect to X(·}), (8)
.9'(x(·), u(·); P(·), ~. i 0)-+inf (with respect to u(·)E'U), (9)
where ~ designates the set of piecewise-smooth functions with values be-
belonging to U. Problem (8) is again a Bolza problem; as can easily be
understood, problem (9) has the following simpler form:
t,
~ x(t, u(t))dt-+inf, u(·)E~. ( 10)

where X(t, u) = iof (t, x(t), u)-p(t) fP (t, x(t), u).


It is quite evident that a necessary (and sufficient) condition in the
problem (10) can be stated thus: u(·H~~ yields a minimum in the pr>oblem
(10) if and only if the r>elation
minx{t. u)=x(t, u(t))~minL(t, x(t), ~(t), u)=L(t, x(t), ~(t), u(t))~
ueu ueu

~max(p(t)fP(t, X(t), U)-iof(t, X(t), U}]=p(t)ffl(t, X(t), u(t)}-~of(t, X(t), U(t)} (11)
UEII

holds ever>ywher>e except at the points of discontinuity of u(·).


The combination of the necessary conditions for extremum in problem
(8) and relation (11) leads to the necessary conditions for extremum in
problem (1)-(3) which were called the Pontr>yagin maximum pr>inciple [because
of the specific form of condition (11)). More precisely, the following
theorem holds:
THEOREM (Pontryagin Maximum Principle). If(~(·),~(·)) is an optimal
process in problem (1)-(3), then there are Lagrange multipliers ~o =no~
0, p(·) E KC 1 ([t 0 , t1 ], Rn•), ~ = (~1 , . . . , ~.), not all zero, such that there hold
the Euler equation
d - "- -.:-
iiiLi (t, x(t), x(t), u(t))=Lx(t, x(t), x(t), u(t)), (12)

the maximum principle (11), and the transversality conditions


LX (t,
-
X (t), X
at (x- (to), X- (tl)),
.: (t), u- (t)) It=t,. = ( -1)" ax,.
( 13)
k=O,l.
1.5.4. Proof of the Maximum Principle for the Problem with a Free
End Point. In this section the Pontryagin maximum princiole will be
proved for the simplest situation when there is no endpoint part in the
functional, one of the end points is fixed, and the other is free, i.e.,
when ~ 0 = 0 in relation (1) of Section 1.5.3 and ~j(x 0 , x 1 ) = xoj- xoj,
j = 1, .•. ,n, in relation (2) of that section.
Thus, we deal with the problem
I,

'.3(x(·), u(·))= ~ f(t, x(t), u(t))dt-..inf, (1)


I,
52 CHAPTER 1

x(t)-cp(t, x(t), u (t))=O, x(t 0 )=x0 , (2)


ueu. (3)

As in the foregoing section, the functions f, fx, cp, and cpx are
assumed to be continuous jointly with respect to their arguments. Let us
see what the Pontryagin maximum principle is like in this case. Since the
m
function l (x 0 , x 1 ) = ~~; (x 0 ;-X0 ;) does not depend on x1, the transversality
i=l
conditions (13) of Section 1.5.3 result in
Lx(tl, X (tl), X (/1), u(tl))=p(tl)=O. (4)
Further, Eq. (12) of Section 1.5.3 takes the form

-dt L;, + Lx = 0 ~- p (t) = P (t) Cflx (t)-/.. 0f x (t),


d -' ~ ~ ~-
(5)

where we have introduced the following abbreviated notation: fx(t)=fx(t,x(t),


u(t)), ~x(t)=qJ .. (t, x(t), u(t)). If we suppose that ~0 = 0, then, by the unique-
ness of the solution of Cauchy's problem for the homogeneous equation (5),
there must be p(·) = 0, and hence, by virtue of the transversality con-
dition at the left end point [see the relation (13) in Section 1.5.3] there
must also be ~ = 0. But this contradicts the condition of the theorem
according to which the Lagrange multipliers do not vanish simultaneously.
Therefore, ~ 0 ~ 0, and we can assume that ~o = 1. However, then p(·) is
uniquely determined by Eq. (5) and the boundary condition (4) (due to the
uniqueness of the solution of Cauchy's problem for a nonhomogeneous linear
system of differential equations). Summing up, we can state the maximum
principle thus:
THEOREM (Pontryagin Maximum Principle for the Problem with a Free End
Point. If a process (x(·), u(·)) is optimal in problem (1)-(3), then for
the solution p(·) of the system
-p (t) = p(t) ~.. (t)-f.. (t) (6)
with the boundary condition
(6a)
there holds the maximum principle at the points of continuity of the con-
trol u(·):
max(p(t)qJ(t,x(t), u)-f(t,x(t), u))=p(t)qJ(t, x(t), u(t))-f(t,x(t), u(t)). (7)
'UEII

As in the preceding cases, to prove the theorem we shall apply the


method of variations. To begin with, we shall state the definition of an
elementary (or Weierstrass' or needlelike) variation analogous to the one
used in Section 1 .4.4. Let us denote by To the set of those points be-
longing to (to, t1) at which ~( ·) is continuous. Let us fix a point r:ET 0 ,
an element <.' E U , and a number A > 0 so small that [r:-1., -r]c(t 0 , 11].
The control

UJ.. (t) = u~. (t; -r, v) = 1


f u(t) for
f
tH-r-"-, -r), (8)
l V or tE[-r-/.., -r),
will be called an elementary needlelike variation of the control;(·). Let
Xf.(t) = Xf.(t; r:, v) be the solution of the equation x ='qJ(t, x, u._(t)) with
the initial condition X;l.(to) = x(to) = xo. The function Xf.(t) will be
called an elementary needlelike variation of the trajectory, and the pair
(x>.(t), ~(·))will be called an elementary variation of the process (x(·),
u(·)). Finally, the pair (r:, v) specifying this variation will be called
an elementary needle.
INTRODUCTION 53

Proof. As usual, we split the proof of the theorem into stages. The
first two stages entirely belong to the theory of ordinary differential
equations.
A) The Lemma on the Properties of an Elementary Variation. · Let an
elementary needle (T, v) be fixed. Then there exists AO > 0 such that for
0 ~ A ~ Ao:
1) the trajectory xA(·) is defined throughout the closed interval
[to, t1l, and xA(t) + x(t) for A+ 0 uniformly on [to, t1l;
2) fort > T, 0 ~A~ AO, the derivative
d
df.. xl.(t; 't', v) = zl.(t; 't', v)

exists and is continuous with respect to A (for A =0 it is understood as


the right-hand derivative);
3) the function t + y(t) zo(t; T, v) satisfies the differential
equation
(9)
on the closed interval [T, t1l and the initial condition
Y('t')=cp('t', X('t'), v)-cp('t', x(T), u(T)). ( 1O)
The proof of the lemma is based on two well-known facts of the theory
of ordinary differential equations: the local existence and uniqueness
theorem and the theorem on the continuous differentiability of the solution
with respect to the initial data. These theorems are included in standard
textbooks on ordinary differential equations in the form suitable for our
aims (see [10, 11, 15]). We also refer the reader to the material of Sec-
tion 2.5.
First we shall prove the lemma for the case when the function~(·) is
continuous.
Let us consider the differential equations
i =q> (t, x, u,_ (t)), ( 11)
i =q> ~t. x, u(t)). (12)
According to (8), the right-hand sides of these equations coincide fort<
T - A, and since xA(to) = xo = x(to), the uniqueness theorem for the solu-
tion of Cauchy's problem implies that xA(t) = x(t) for t < T - A, and, by
the continuity,
~(f..)=x,_ (-r-J..)=x(-r-1..). ( 13)
In particular, ~(A) is differentiable with respect to A and
GCO>=x<-r>. s'<O>= -x<•>=-q>(-r,x<-r>, u c-r». ( 14)
Let us denote by =<t, s, ~) the solution of Cauchy's problem for the
differential equation with a fixed control v:
x=q>(t, X, v), X(s)=s· (15)
According to the local existence and uniqueness theorem, there are E1 >0
and o1 > 0 such that =<t, s, ~) is defined for
jt-Tj <Ill> js-Tj'< lito /s-x(-r)j <St.

and, by virtue of the theorem on the continuous differentiability of the


solution with respect to the initial data, is a continuously differenti-=
able function.
54 CHAPTER 1
According to (8) and (13), to define xA(t) on the closed interval [T-
A, T] we must put E;. = E;.(A) = x(T- A) in (15), and if AI < 61 is chosen so
that IE;.(A)- x(T)I < El for 0 ~A~ Al, then
x1.(t)=8(t, T-A, s(A.)), T-A<:t<!·

In particular, since the function


1J(A)=x1.(T)=8(T, T-A, s(A.)) (16)
is a compos1t1on of continuously differentiable functions, it is also con-
tinuously differentiable with respect to A and, by virtue of (14),
!](O)=x(-r), TJ'(0)=-8,(-r, T, £(0))+8~(T,T, s(0))£'(0)=
=-8,(-r, T, s(0))-8g(T, T, £(0))q>(T,X(T), u(-r)), (17)
The solution =<t, s, E;.) of Cauchy problem (15) satisfies the equivalent
integral equation
t
8 (t, s, &>=£+ ~ q> (a, 8 (a, s, S), v)da. (18)
s
Differentiating (18) with respect to s, we obtain
t
8, (t, s, s) = -cp (s, 8 (s, s, s). v>+ \ cp_.. (a, 8 (a, s,
J
s). v) 8, (a, s, ID da
s

and putting t = S = T, E;, = x(T) we derive (taking into account the obvious
identity =<t, t, E;.) = E;.] the relation
8, (T, T, x(-r)) = -q> (T, x(T), v). ( 19)
Similarly, differentiating (18) with respect to E;. and substituting the same
values of the arguments, we obtai~
si (-r, -r, x(-r)) = r. (20)
Here I = (aE;.i/aE;.k) = (oik) is the unit matrix (which is sometimes denoted
by E). The substitution of (19) and (20) into (17) results in
'I (O)=X(T), 1)' (O)=q>(T, X (T), v)-q>(T, X (T), u(-r)). (21)
Further, the function= is continuous at the point (T, T, i(T)) and,
moreover, =(T, T, X(T)) = x(T), and i(•) is continuous at the point T.
Therefore, for any £ > 0 there exists o > 0 such that for
lt--rl<ll, ls--rl<ll, j&-x(-r)l<ll
the inequalities
Is (t, s, s)-x <vi < e/2, I x(t) -x (-r)l < e/2 (22)
hold.
Let us take a positive number Al < o so small that the inequality
IE;.(A)- i(T)I < o holds for 0 ~A~ Al· Then forT- A~ t ~ T, s = T
A, and E;, = E;,(A) inequalities (22) hold, whence
h. (t)-x(t)j =IS (t, T-A, S(A.))-x (t)l <: (8 (t, T-A, S(A.)) -x(-r)l+lx(T)-x (t)l < e/2+e/2=s.
Since xA(t) = x(t) for to ~ t ~ T- A, we have
O<:A.<:A.l ~rx,. (t)-x(t)i < e, to<:t<:T,
which proves the first assertion of the lemma.
Now let us denote by X(·, n) the solution of Cauchy's. problem for Eq.
(12) with the initial condition x(T) = n. By the theorem on the continuous
differentiability with respect toAthe initial data, there exists £ 2 such
that X(t, n) is defined for In - x(<) I < £2, T ~ t ~ t 1 and is a continuously
differentiable function. According to (8) and (16), and also by virtue of
INTRODUCTION 55

the uniqueness theorem, xA(t) = X(t, n(A)). Again, since the function
xA(t) is a composition of continuously differentiable functions, it is con-
tinuously differentiable with respect to (t, A) for T ~ t ~ t1 and 0 ~ A~
A2,where A2 is chosen so that the inequality ln(A)- x(T)I < Ez holds for
0 ~A~ A2· Putting AO =min (Al, Az) we conclude that the second asser-
tion of the lemma is true.
Passing from Eq. (12) to the equivalent integral equation and taking
into account (16), we obtain the equation
t
XA (I)=I')(A) + ~ cp(a, XA (a), u(a))da.
~

Differentiating this equation with respect to A and then putting A = 0 and


denoting, just as in the condition of the lemma, y(t)=fxxA(t) JA=O' we find
t
y(t)=I'J'(O)+~ cpx(a, x(a), u(a))y(a)da,
'(

The last integral equation is equivalent to Eq. (9) with the initial condi-
tion y(T) = n'(O) coinciding with (10) by virtue of (21).
If the control u(·) is a piecewise-continuous function, we resort to
the following reasoning. For the sake of simplicity, let there be two
points of discontinuity, say a1 and az, and let T be a point [at which u(•)
must be continuous] lying between them: to < a1 < T < az < t1. Systems
(11) and (12) coincide in the strip to~ t ~ a1 [fort= a1 the control in
these systems must be understood as its limiting value u(a1 ..:...0)= lim u(t)],
t-Hts-0
and, by the uniqueness theorem, xA(t) = x(t).
Now we pass to the strip a1 < t < az [the control is again assumed to
be equal to the limiting values u(al + 0) and ~(az - 0) at the boundaries
t = a1 and t = az of the strip]. Here Eqs. (11) and (12) are solved with
the initial condition X= x(al). The foregoing assertions are applicable
to the present case, and thus we see that xA(t) is continuously differenti-
able with respect to A.
Finally, in the strip az ~ t ~ t1 (under the same agreement on the
value of the control at t = az) the differential equations are solved with
the initial data xA(az) and x(az). Once again referring to the theorem on
the continuous differentiability of the solution with respect to the ini-
tial data, we prove the continuous differentiability of xA(t) with respect
..... t1 and compute
..... t ""'
to A for az ""' d
y(t)=~ xA(t)
IA=O 1n. the same way as before. •

B) The Lemma on the Increment of the Functional. Let us put x(A) =


3 (x,(. ), u,(.)) and prove that this function is differentiable on the right
at the point A = 0.
Let p(·) be the solution of system (6) with the boundary condition
(6a). Then

•t.,'(+0)=~3(XA(•), UA(·))""=+O=a('r, v),

where
a(i, v)=f('t, x('t), v)-f('t, x(t)~ u('t))·-p("t)[cp('t, x('t), v)-cp('t, x-('t), u('t))]. (23)
~ Since
t, t, '
xJA)-X(O)= ~ f(t, xA(t), uA(t))dt- ~ f (t, x(t), u(t))dt=
t0 to
56 CHAPTER 1

~ ~

=~ [f(t. xA(t), u(t))-f(t, x(t), u~(t))]dt+ ~ (f(t, xA(t), v)-f(t, x(l), u(t)]dt,
~ ~-A

we have

AtO aA
~

x' (O)=lim X (1..)-X (O) =d"d sf (t, XA(t), u (t)) dt IA=+O AtO h
~

+lim+ s [f (1. XA (1), v)-f (t, x(t), u (t)] dt.


~-A
~

By the lemma of Section A), xA(t) is continuously differentiable with re-


spect to A, and therefore the ordinary differentiation rule under the in-
tegral sign is applicable to the first summand; to the second summand we
apply the mean value theorem; this results in
t,
X'(+O>=Stx(t, x(t), u(t>>aixA<t>/ dl+!im lf(c. xA(c), v)-f(c, x(c), u(c>]-
~ A=O AtO
t,
= ~ fx(t, x(l), u(t))y(t)dt+f(t, x(t), v)-f(t, x(t), u~(-r)) (24)
~

[here we have used the fact that T- A~ c ~ T, which implies c + T;


xA (c)-+ x(T) by virtue of the first assertion of the lemma in Section A);
y(t) denotes the same as in that lemma].
Further, p(·) satisfies system (6), andy(·) satisfies system (9).
Therefore
d~ -:.. ~ • ~ -~ ~ ~ ~ ~
iii p(t) Y (t) =p (I) y(t)+p (I) y(l) =-p (I) «i'x (I) y(ll+fx (I) y(t)+p (I) «i'x (I) y(t) =Jx(l) y(t).

Integrating this equality from T to t1 and taking into account the


boundary condition (6a) for p(·) and the condition (10) for y(•), we obtain

ft 1:·=-p(t)
t,
5
t,
Fx (t) y(t)dt= s (p(t) y(t)] dt=p(t) y(t) [q> (T, x(t), v)-q> (t, x(T), u(T))J. (25)
~ ~

Comparing (24), (25), and (23) we conclude that x'(+O) = a(T, v). •
C) Completion of the Proof. If the right end point is free, then, by
virtue of the lemma in Section A), every elementary variation is admissible
(for sufficiently small A). Hence, if (x(·), u(·)) is an optimal process,
then for small A we have
3(x.~,(·), u.~,(·));;;.:3(~(·), u(·))~x(t..);;;.:x(O)~x'(+O);;;.:O.
Applying the lemma of Section B) we conclude that the condition a(T, v) ~ 0
is necessary for the optimality of (i(·), u(·)). However, TETo and veU
were chosen arbitrarily, and consequently we have proved that for any t·
belonging to the set of the points of discontinuity of the control u(·) and
for any u E U there holds the inequality
p(t)cp(t, x(t), u)-f(t, x(t), u) ~p(t)cp(t, x(t), u(t))-f(t, x(t), u(t)),
equivalent to the inequality a(t, u) ~ 0 and (7). The proof of the theorem
is completed.

1 .6. Solutions of the Problems


The problems mentioned at the beginning of this chapter were posed for
different purposes and at different times; here we shall present a unified
approach to these problems following a standard scheme implied by the La-
grange principle.
INTRODUCTION 57

This scheme consists of six stages:


1. Writing the formalized problem and discussing the question of the
existence and the uniqueness of the solution.
2. Forming the Lagrange function.
3. Applying the Lagrange principle.
4. Analyzing the possibility of the equality AO = 0.
5. Finding the stationary points, i.e. solving the equations implied
by the Lagrange principle.
6. Investigating stationary points, choosing the solution, andwriting
the answer.
For all the problems that will be discussed the Lagrange principle is
a rigorously justified theorem which either has already been proved or will
be proved in the following chapters. The applicability of this unified
scheme to the problems so different in their subject stresses the univer-
sality of the Lagrange principle. Of course, when considering some other
problems researchers may encounter situations which are not embraced by any
of the known concrete schemes (the classical calculus of variations,
optimal control theory, linear programming, etc.). Nevertheless, even in
these cases the Lagrange principle understood in one way or another may
prove valid or can at least serve as a useful guide. The understanding
of the general ideas and the situations to which the principle is appli-
cable (they will be discussed in Chapters 3 and 4) may help to find neces-
sary conditions for extremum in a new situation as well. However, it is
also important to realize that the Lagrange principle is not always valid
and therefore it must not be applied automatically.
1 .6 .1. Geometrical Extremal Problems. In this section all the
problems posed in Section 1.1 .2 and formalized in Section 1 .2.2 are solved.
The first stage of the above scheme includes the discussion of the question
of the existence of the solution. The geometrical problems of the present
section are finite--dimensional, and the existence of their solutions is
guaranteed by the following theorem:
Weierstrass' Theorem. Let a function f: RN + R be continuous and let
the level set 2'af={xlf(x)~a} be nonempty and bounded for some a. Then
there exists a solution of the problem f(x) + inf.
The proof of this theorem is an evident consequence of the classical
Weierstrass theorem on the existence of a minimum of a continuous function
on a bounded and closed subset of RN (see [14, Vol. 1, pp. 176 and 370] and
[9, Vol. 1]) because the set 2'af is obviously closed.
Now we proceed to the solution of the problem.
The problem of Euclid on the inscribed parallelogram. The problem
was formalized thus [see formula (1) of Section 1 .2.2]:
1. fo(x) = x(x- b)+ inf, 0 ~ x ~b.
We have dropped the inessential factor H/b and reduced the problem to
a minimization problem. The function fo is continuous, and the interval
[0, b] is bounded and closed. By Weierstrass' theorem, there exists a so-
lution of the problem. Let this solution be i. It is clear that ~ 0x
and x ~ b because fo(O) = fo(b) = 0 and the function assumes negative val-
ues as well. Consequently, xE(O, b). The function fo is smooth. There-
fore, one should seek the stationary points of the problem fo(x) + inf.
2-5. f~(~) = 0~ X= b/2.
58 CHAPTER

6. By the uniqueness of the stationary point, =b/2 E [0, b] is the x


single solution of the problem; i.e., the sought-for parallelogram ADEF is
characterized by the property that IAFI = IACI/2, i.e., Fis the midpoint of
the line segment [AC]. This is the fact that was established in Section
1.1.2 purely geometrically.
Remark. The above problem turned out to be an elementary smooth prob-
lem. Therefore, we did not write the Lagrange function, and Sections
2-5 "merged." We used the Fermat theorem (Section 1.3.1).
The problem of Archimedes on the spherical segments with a given sur-
face area (Section 1.1.2). Here the solution is quite similar to that of
the problem of Euclid, and we present it without comments.
1. f.(h)=ha/2-nN/3-+sup; O~h<lfii7it.
2-5. {;(h)=0=>h=Va!2n.
6. The value of fo at zero is equal to zero, and its value at the
point la/n is smaller than the value at the stationary point la/2n. Conse-
quently h. = la/2n is a solution of the problem. Recalling that a = 2nRh
we obtain h =R; i.e., the sought-for spherical segment is a hemisphere (its
altitude is equal to the radius).
The problem of Apollonius on the shortest distance from a point to an
ellipse. It was formalized thus [see the formula (6) in Section 1.2.2]:
1. fo(x1, x2) = (x1- ~1) 2 + (x2- ~2) 2 + inf; f1(x1, xz) = (x1/a1) 2 +
(x 2 /a 2 ) 2 = 1. The ellipse is a bounded and closed set, the function fo is
continuous, and therefore, by Weierstrass' theorem, there exists a solution.
The functions f 0 and f 1 are smooth.
2. 2='A0 ((x 1 -s 1 )"+(x~-s 2 )")+'A 1 ((x 1 /a 1 ) 2 +(x2 /a 2 ) 2-I).
3. 2 .... = O=> x. (xl- 61) + Xlxl;a~ = o, 2'x, = o=> x. (x.-6.) + X~x.;a: = o.
4. Let ~o = 0. Then \1 ~ 0 (the Lagrange multipliers are not equal
to zero simultaneously). Hence, the third stage implies that x1 = x2 =
0 => h(O, 0) = f1(~1, ~2) = 0 ~ 1. Thus, ~o ~ O, and we can put Ao 1.
Let us denote ~1 =A.
5. (For simplicity, we confine ourselves to the case when ~1~2 ~ 0.)
·
I=
~ ~~a~
I • 2 =!>X;=-(-.--)' i= I, 2.
at+l.
Substituting this into the equation of the ellipse, we obtain

('A)
t2 2
,.1a1 .,.a.
•••
I
cp ... (a~+l.)" + (ah/.) 2 •
( 1)

6. The number of stationary points of the problem [i.e., the points


corresponding to those A which satisfy Eq. (1)] does not exceed four [see
Fig. 26; the inequality cp(O)=Waf+Uia:> I shows that the graph of cp(A.) is
depicted for a point (~1, ~2) lying outside the ellipse]. To complete the
solution of the problem one must solve Eq. (1), find Ai, find the corre-
sponding points x(Ai), substitute these points into f 0 , and choose the
smallest of the resultant numbers.
Remarks. 1. Problem 1 is a smooth problem with equality constraints.
In :Sections 2-5 we used the Lagrange multiplier rule (Section 1. 3. 2).
2. The relations <Xi - ~i) + Axi/ai. = 0 have an obvious geometrical
meaning: the vector ~ - x
joining the point ~ and the point of minimum is x
proportional to the gradient vector of the function fo at the point x, i.e.,
the vector ~- x lies on the normal to the ellipse. Apollonius was the
first to establish this fact.
INTRODUCTION 59

Fig. 26 Fig. 27

3. Let us use the relations obtained to derive the equation of the


curve "separating" those points f;, toward which two normals can be drawn
and the points toward which four normals can be drawn. The separation ob-
viously takes place for A satisfying relation (1) for which
~M ;W I
cp'(?..)= (~+1.)' (a:+l.)• =0, AE.(-a•. -a,'). (2)

From (2) we obtain

where

Substituting this into (l),weobtain the equation of the separating curve:


(sla./1' + (;~a~>''• = (a:-an''•.
This is the equation of an astroid (see Fig. 7 in Section 1 .1). To every
point there correspond two normals when the point lies outside the astroid,
four normals when it lies inside the astroid, and three normals on the
astroid itself (with the exception of the cusps at which there are two nor-
mals). This result was also obtained by Apollonius in his "Conics."
Kepler's problem on an inscribed cylinder [see the formula (3) in Sec-
tion 1.2.2]. The solution of this problem is similar to that of the prob-
lem of Euclid and we make no comments on it.
1. f 0 (x)=x(x2 -i)-+inf, O~x~i.

2-s. t~(x)=o~3x'=i~x=V3;3.
6. By virtue of the uniqueness of the stationary point in (0, 1), the
point x= lf/3 is the single solution in the problem: the sought-for cyl-
inder is characterized by the property that the ratio of its altitude 2x
to the radius 11="'¥ is equal to 12.
The Light Refraction Problem. This problem was posed and solved with
the aid of Huygens' method in Section 1.1 .3. Here we present its standard
solution, which goes back to Leibniz [see the formula (4) in Section 1 .2.2].
1. f.(x)= v a'+x 2/Vl + v~·+(£-x)'!v,-+ inf.
All the level sets 2afo of the continuous function fo are compact, and
hence, by Weierstrass' theorem, there exists a solution of the problem.

2 _ 5 • f'(x)=O~ x ~ = £-x ~ ¢=>;incp 1 =sincp 2


t• 1 V~•+x 2 v2 VB 2 +(£-x) 2 01 02

(see Fig. 19 in Section 1.2.2).


60 CHAPTER 1

6. The point x
satisfying the last equation is unique (check this!),
and hence it is the single solution of the problem. Thus the point of re-
fraction of a ray of light at the interface between two media is character-
ized by the property that the Patio of the sine of the angle of incidence
to the sine of the angle of PefPaction is equal to the patio of the speeds
of light in the coPPesponding media. This is known as Snell's law.
Steiner's Problems. Itwas formalized thus (see the formula (5) ±n
Section 1.2.2):
1. fo(x)=lx-s~l+lx-6.1+1x.-6sl->inf, xER 2 ,
s;ER 2 , i=l, 2, 3, lxi=Vxf+x~.
By Weierstrass' theorem, there exists a solution of this problem (check
this!). There are two possible cases here: the solution coincides either
with one of the points ~i. i = 1, 2, 3, or with none of them. We shall
solve the problem for the latter case. Then the function is smooth in a
neighborhood of the point x
(check this!).
, ~ 61-X' 6.-X' !is-X' <
2-5. f0 (x)=0~-,~-t-+-~-t-1 +I~ !; I 0. 3)
x-~~r rx -,.. x- 3

Equation (3) means that the sum of the three unit vectors drawn from
x toward ~ 1 , ~2, and ~3 is equal to zero. Hence, these vectors are parallel
to the sides of an equilateral triangle, and consequently at the point x
the angles of vision GxG, G#;3, and GxG of the sides of the given
triangle are equal to 120°, Thus, the point xis the ToPPicelli point. It
can be found as the point of intersection of the two circular arcs with the
chords [~ 1 , ~2l and [~1. ~3] subtending angles of 120°. This construction
is possible when none of the angles of the given triangle exceeds 120°. If
otherwise, the arcs do not intersect, and hence it is impossible that the
point x coincides with none of the points ~i. i = 1, 2, 3. Then it must
obviously coincide with the vertex of the obtuse angle because the side of
the triangle opposite to the obtuse angle is the greatest.
Answer: The sought-for point is the Torricelli point if all the angles
of the triangle are less than 120° and it coincides with the vertex of the
obtuse angle in all the other cases.
We have thus solved all the geometrical problems mentioned in Section
1.1.2 and also the light refraction problem (Section 1.1.3). In addition,
let us solve TaPtaglia's pPoblem:
1. f0 (x)=x(8-x)(8-2X)-+SUp, 0=::;;;x=::;;;4.
2-5. t~ (x) = o~ 3xB-24x+ 32 = o~ = 4-4/V3.x
6. Answer: One of the numbers is equal to 4- 4/1:3 and the other is
equal to 4 + 4//3.
1.6.2. Newton's Aerodynamic Problem. This problem was posed in Sec-
tion 1.1.5 and formalized in Section 1.2.3:
T
1. s~~~~-inf, X=U, x(O)=O, x(T)=6. uER+.
0

It is by far not a simple task to prove directly the existence theorem


for problems of this kind. The main obstacle is the nonconvexity of the
integrand t/(1 + u 2 ) with respect to u for u ~ 0. However, we shall present
an exhaustive solution of this problem. The plan of the solution is the
following. First we shall suppose that there exists a solution of the
problem and apply the Lagrange principle to the hypothetical solution. In
this way we shall find that there is a single admissible stationary curve
in the problem (i.e., an admissible curve for which all the conditions im-
plied by the Lagrange principle hold). Then we shall show by means of a
direct calculation that this very curve yields an absolute minimum. In
INTRODUCTION 61

conclusion we shall recall Newton's words and verify that the resultant
solution coincides exactly with the one described by Newton in 1687.
T

2. £'=~Ldt+~-tox(O)+~-tdx(T)-~),
0
L = l+u•
A0 t · )
+p(x-u.

3. The EuleP equation:


d ~
-dt L;. + L,.=O~ p (t)=cOnst =Po· (1)

The tPansvePsality condition:


~ ~

Po= I-to= -J.ti· (2)

The minimality condition in u:


Pott(t), Vu;;;;,o. (3)

4. If we assume that Xo = 0, then it is necessary that po ~ 0 [be-


cause, if otherwise, (2) would imply the equalities Xo = po = ~o = ~ 1 = 0
while the Lagrange multipliers are not all zero]. In case t ~o = 0 and p 0 ~
O, the relation (3) implies that u(t)::: 0, and hence x(t)=~u(t)d-c=:O. But
0
then the sought-for body has no "length"; i.e., it is a plane. If ~ > 0,
then Xo ~ 0, and we can assume that Ao = 1.
It should also be noted that the inequality Po ~ 0 is also impossible
because in this case the function t/(1 + u 2 ) - pou is monotone decreasing
and (3) does not hold for u > u(t).
5. From (3) (with ~o = 1) it follows that until an instant of time
the optimal control is equal to zero [verify that for Po < 0 and small t
the function u + (t/(1 + u 2 ) ) - pou attains its minimum at u = 0]. After
that the optimal control u(·) must be found from the equation
2ut
-Po= (I + u•)• , (4)

which is obtained from the equation Lu = 0. The instant of time T corre-


sponding to the corner point of the control is determined by the condition
that the function u + (T/(1 + u 2 )) - PoU has two equal minima: at zero and
at the point found from (4) fort= T. In other words, at timeT the rela-
tions
(5)

must be satisfied [here and henceforth u(T) designates u(T + O) ~ 0]. From
the second equation we obtain -fi 2 (T)T/(1 + U2 (T)) = pou(T), whence PO
~u(T)/(1 + u2 (T)). Substituting this relation into the first equation
(5), we find that u2 (T) = 1 ~ U(T) = 1 (because U ~ 0), and then the first
equation (5) yields the equality T = -2po.
For the time after the corner point the optimal solution satisfies the
relation (4) which implies that
(6)

Now we write
62 CHAPTER 1

Integrating this relation and taking into account the equalities x(T) = 0
and u(T) = 1, we obtain parametric equations of the sought-for optimal
curve:
X(t, P0)=- ~ ( lg ~+u 2 +{u') +fPo•
(7)
t =- ~ ( ~ + 2u + u•) , Po < 0.
6. The curve ( 7) is called Newton 1 s curve. Moreover, u E [1, oo) in
(7). It can easily be understood that there is a single point of intersec-
tion of the straight line x =at and Newton's curve corresponding to the
value p 0 = -1 of the parameter. Indeed, the function x(·, -1) is continu-
ous and convex because

On the other hand, it is seen from the formulas (7) that Newton's curve
x(·, po) is obtained from the curve x(·, -1) by means of the homothetic
transformation with center at (0, 0) and the coefficient lpol (Fig. 27).
Hence, to draw a curve belonging to the family (7) through the given point
(~, T) we must find the point of intersection of the straight line x = ~t/T
and the curve x(·, -1) and then perform the corresponding homothetic trans-
formation of the curve x(·, -1). In this way we obtain an admissible curve
x(·). Let us show that it yields an absolute minimum in the problem. To
this end we come back to the relation (3) with ~o = 1. Let x(·) be an
arbitrary admissible curve [i.e.,x(·)EKC 1 ([0, T]),x(0)=0, and x(T) = ~].
Then, by virtue of (3),
_ t_ _ px(t)>-: t ~( >
I+x"(t) o ~ I+uUJ" Pou t.
Integrating this relation and taking into account that u(t) x(t) and
T T
~ x(t)dt= ~ ~ (t)dt=L we obtain
0 0

The problem has been solved completely.


Remarks. 1. In Sections 2-5 we applied the Lagrange principle for
optimal control problems which is reducible to the Pontryagin maximum prin-
ciple (Section 1.5.3).
2. Let us compare the solution obtained with the solution described
by Newton himself. Let us recall Newton's words quoted in Section 1 .1.5
and look once again at his diagram shown in Fig. 18. In the figure, be-
sides the letters set by Newton himself, some additional symbols are writ-
ten. We have lMNI = t, IBMI = x, IBGI = T, and BGP=cp; now Newton's
construction implies
tancp=x(t), IBPI!IBGI= tancp~IBPI=-rx,
IGP I'= 1Ba I"+ IBP l'=(x"+ 1)-r".
Substituting our symbols into Newton's proportion

IMNI:IGPI=IGPI":4(IBP!xiGBI'),
we obtain

't• ex"+ '>'t. xt 't


(x"+IJ't.'t 4'tn• ~ ex•+ 1) 2 =-.r· (8)
INTRODUCTION 63

We see that this is nothing other than the relation (4) into which the val-
ue po = ~12 is substituted. With the aid of the same reasoning as before,
we find from (8), by means of integration, the expression (7) for Newton's
curve. It should also be noted that the "bluntness" of the curve and the
condition an the jump at the point G ~ c (where the angle is equal to
135°) were in fact indicated by Newton in his "Scholium" on a frustum of a
cone.
We see that Newton solved his problem completely, but the meaning of
the solution was beyond the understanding of his contemporaries and many
of his followers (up to the present time).
1.6.3. The Simplest Time-Optimal Problem. This problem was posed in
Section 1.1.7 and was formalized in Section 1.2.4 in the following way:
..
T-rinf, mx=u, uE[u 1 , u,], ( 1)
X (0) = X0 , X(0) = V0 , X (T) =X (T) = 0
[see the formulas (5) in Section 1.2.4]. The case u 1 = uz is of no inter-
est because in this situation for some pairs (xo, vo) there exist no func-
tions x(·) satisfying all the constraints (1), and if such a function ex-
ists and is not identically equal to zero, the value of T is determined
uniquely, and so there is no minimization problem.
For u 1 < u 2 we can reduce the number of the parameters in the problem
by means of the substitution x(t) = A~(t) + B(t- T) 2 • In terms of the
function ~(·) the general form of the problem (1) does not change, but the
parameters x 0 , vo, u 1 , and uz assume some other values. In particular, if
we put
A= (ui-u 2 )(2m, B = (u 1 + u2 )/4m,
then ~E[-1, 1]. Taking into consideration what has been said, we shall
assume that m = 1, u1 = -1, and uz = +1 in (1). Moreover, we denote x = x1
and i = xz.
Now we proceed to the realization of our standard scheme:
T
1. T=~1·dt-lnf,
0
x ,..x,,
1 x~=u, uE[-1,1],
(2)
X 1 (0) = X0 , X1 (0) = V0 , X1 (T) = X11 (T) = 0.
The existence of the solution will be treated in exactly the same way
as in the foregoing section: on finding with the aid of the Lagrange
principle a function x( ·) "suspected" to be optimal, we verify directly that
it yields the solution in the problem.
T
2. $ = ~ L dt + iJ.1 (x1 (0) ~x 0 ) + !12 (x, (O)-v0) + '\'1X1 (T) + V11Xs (T),
.0

where 1 = Ao + p1(x1- xz) + pz(xz- u).


3. The Euler-Lagrange equations:
d
--L·+L,..=O, 2 ..... dpl
i=1,- dt
lp,
= 0, -dt--p,.
~ (4)
dt Xi I

The transversality conditions:


p;!(O) = ~i• p, (0) = ~.. pt(T) =- vl, p, (T) =- v". (5)
The Maximum Principle: dropping the summands independent of u, we can
write this condition in the form
p, (t) u(t) = -1.-u.-1
max {p. (t) u} =I Pa (t) 1.
64 CHAPTER 1

or
sgnp (t) if p2 (t)=I=O,
u(t) = { an a~bi~rary value belonging to [-1, l], if p~ (t) = 0. (6)

Moreover, the terminal instant of time T is also variable in the prob-


lem under consideration (this is the quantity that must be minimized), and
so, formally, this problem lies out of the framework of Section 1.5.3.
However, it can be shown (this will be done in Chapter 4) that in such a
situation the condition .!iT=O should be added (the stationarity condition
of the Lagrange function with respect toT), which is in complete agreement
with the general idea of the Lagrange principle. Differentiating (3) with
respect toT and taking into account the equalities xz(T) = u(T) and ~ 1 (T)
x 2 (T) = 0, we obtain
(7)
4. It is unimportant in the problem under consideration whether or
not \o vanishes because \o is not involved in (4)-(6).
5. From Eqs. (4) we conclude that Pl(t) = canst and p2 (·) is an arbi-
trary linear function. Moreover, pz(t) t 0 because

= 0 '=*Pi (t) = 0 ~ J.ti =J.t


~ (4) ~ (5) ~ ~ ~ ~ (7) ~
p2 (t) 2 = v1 = V2 =0 '=*Ao =0
and all the Lagrange multipliers turn out to be zero.
A linear function which does not vanish identically has not more than
one zero on the closed interval [0, T]. Therefore, from (6) we obtain the
following possible cases for the optimal control:
a) u<t> = 1;
b) u<t>==-1
(pz(·) does not vanish on [0, T]);

c) - { I, 0 ~ t < T,
u (t) = - I ,T<t~T;
d) u(t) = { - I' 0 ~ t
I,T<t~T
< T'
[p 2 (·)
vanishes at the point t = T, the value of the control at the point
T is of no importance because its change at one point does not ~ffect the
function x(·) (why?); for the same reason, the cases T = T and T = 0 can
be dropped].
It is convenient to carry out the further analysis using the phase
plane (x 1 , xz) (Fig. 28). Solving Cauchy's problem
xi=xi, X2 =u(t), xdT)=x.(T)=O (8)
for one of the controls a)-d) we obtain a single solution together with a
single initial point (xo, vo) = (xl(O), xz(O)) corresponding to this solu-
tion. It can easily be verified that for all the possible values of T and
T these initial points cover the whole plane in a one-to-one manner. First
of all, we have
Xt =X2, X2 = 1 ~ Xt =X~/2 +Ci, (9)
xi =X2, X2 =- I ~X1 =-X~/2+C 2 , ( 10)
and hence for the intervals of constancy of the control the phase trajec-
tories lie on the parabolas of one of the families (9) and (10).
The initial points corresponding to the control a) lie on the arc OFA
(Fig. 28): xo = va/2, Vo < 0; the points corresponding to the control b)
lie on the arc ODB: x 0 = -va/2, v 0 > 0.
INTRODUCTION 65

:rz
0

Fig. 28 Fig. 29 Fig. 30

The initial points C lying to the left of the separatingcurveBDOFA


(xo = -volvol/2) correspond to the controls c: u(t) = 1 on the arc CD of
the family (9); to timet= T there corresponds the point D at which the
switching of the control takes place; after that the moving point on the
phase trajectory traces the arc DO with u(t) = -1. Similarly, to the ini-
tial points E lying to the right of the separating curve there corresponds
the controls d).
6. It remains to show that the single extremal x(t) = Xl(t) we have
found [corresponding to the given initial point (xo, vo)] does in fact
yield a solution in the problem (2).
Let us suppose that a function x(·) is defined on a closed interval
[0, T] where T ~ T, possesses the piecewise-continuous second derivative,
and satisfies the conditions x(O) = xo, x(O) = vo, x(T) = x(T) = 0. Let
us extend x(·) to the interval [T, T] (forT~ T) by putting x(t) = 0, t E
[T, T]. After that the two functions x(·) and x(·) will be defined on one
and the same closed interval [0, T] and will have the same boundary values:
x(O) =x(O)=x0 , x(O) =x(O)=v 0 ,
x (T) =x<T> =x (r) =x (T) = o. ( 11)

Let us show that if !xi~ 1, then x(·) = x(·), and, in particular, the in-
equality T <Tis impossible. This will prove the optimality of x(·). By
virtue of the symmetry of the problem, we may confine ourselves to the con-
trol c) [or to its limiting case a)]. If lxl ~ 1, then integrating twice
the inequality x(t) ~ 1 and taking into account (11), we obtain
< I
X('t)-X('t) = ~ ~ (1-x(s))dsdt~O, (12)
0 0

where the equality can only take place if x(s) =1 at all the points of
continuity, and thenx(t)=x(t), tE [0, T].
Similarly, integrating twice the inequality x(t) ): -1, we obtain
T T
X('t)-X(T) = ~ ~ (-1-x(s))dsdt~O, (13)
< I

and here too the equality can take place only if x(s) = -1 and x(t) = x(t),
t E [T, T].
However, comparing (12) and (13)' we conclude that x(T) = x(t)' and
then, as was already mentioned, x
(t) = x (t), t E[0, T] .
.Remark. Relations (5)-(7) imply the equality \o = lpz(T)I, and hence
the equality AO = 0 turns out to be possible in this problem when pz(·)
vanishes for t = T. Then pz(·) t 0 (because, if otherwise, all the Lagrange
66 CHAPTER 1

multipliers would be equal to zero), and, by virtue of the equality pz(T)


0, the function pz(•) does not change sign, and thus there are no switch-
ings of the control at all. Consequently, the case ~o = 0 corresponds to the
the motion along the lines of switching AFO and BDO.
1.6.4. The Classical Isoperimetric Problem and the Chaplygin Prob-
~ The most ancient extremal problem (the first problem mentioned in
the title of this section) was posed in Section 1.1.1 and formalized in
various ways in Section 1.2.4. In particular, the formalization (2) in
Section 1.2.4 reduced the problem to a more general problem embracing the
Chaplygin problem as well. Proceeding from this formalization, we shall
present the solution of the two problems using our standard scheme. Thus:
T

S
S=~ (xv-yu)dt-+-sup;
0 (1)
x=u, y=v, (u, v)E A, x(O)=x(T), y(O)=y(T).
We shall assume that the set A of the admissible velocities is a
closed,convex, and bounded set in R2 •
Exercise. Show that for the solvability of the problem stated it is
necessary that OEA. ~: one can make use of the result of Exercise 2 in
Section 2.2.3.
T
2. 2 = SL dt + ft (x (0)-x (T)) + v (y (0)-y (T)), (2)
0

where
Ao {xv-yu) + p (x-u)+q
L= - 2 • •
(y-v).

3. The Euler-Lagrange equations:


d -' ~0 ~
-diL;.+L,.=0~-p- 2 v= 0,
d -' ~0 ~ (3)
-diLu+Ly =0::::>-q +Tu=O.

The transversality conditions:


p(0) = p(T) = ~. q(0) = q(T) = v.
The maximum principle:

(ii<t)-~i/<t))u(t)+(q<t)+~; xu>)v(t)=
= (u:n~;A ((.0 (t)- ~ y(t)) U+ (Q (t) + ~; X(t)) V), (4)

4. If we assume that ~o = 0, then the Euler-Lagrange equations imply


that p(t) = const, q(t) = const, and, moreover, p2 + q2 > 0 because, if
otherwise, the transversality conditions would imply that all the Lagrange
multipliers are equal to zero. Now the maximum principle takes the form
pu (t) +(j v(t) = max (pu +qv),
(u, v)eA

whence it is seen that the point (u(t), v(t)) belongs to one and the same
straight line all the time, namely, to one of the two straight lines sup-
porting to the set A which are perpendicular to the vector (p, q) (Fig. 29).
Therefore,
~ (t) = u(t) = u(0)-et (t) q,
y(t) =v (t) = v(0) +a (t) p.
INTRODUCTION 67
Integrating these equalities, we obtain
t
x.(t)=x(O)+u (0) t - Sa(t)dt
0
·q,
(5)
t
y(t) = y(0) + v(0) t +~a (.t) dt. p.
0

Putting t = T in (5) and using the boundary conditions x(O) = x(T) and
y(O) = y(T), we conclude that the vector (q, -P) is proportional to (ii(O),
v(O)), after which it is seen from (5) that the point (x(t), y(t)) all the
time lies on the straight line parallel to the vector (q, -P) and passing
through the point (x(O), y(O)). Consequently, the closed curve {(x(t),
y(t)), 0 ~ t ~ T} degenerates (lies on a straight line), and the area
bounded by it is equal to zero (prove this analytically proceeding from
the indicated expression for S!). Consequently, this curve cannot be op-
timal.
5. The 4th stage implies that we can put ~o = 1. Then from (3) it
follows that
A 0 ~ y(f)
P +2 =O~p(t)+-2-=b=const,
:. u= 0 ~ q (t) - f =-a= l:onst.
q- 2
~ XIt)

Substituting these expressions into (4), we derive

(x(t)-a)v(t)-( y(t)-b)u(t)= max {(x(t)-a)v-(y( t)-b)u}. (6)


(u, v)eA

Now we lay aside the general case and confine ourselves to two spe-
cial versions.
a) The Classical Isoperimetric Problem. Here
A={(u, v)ju'+v'~l}
and from (6) we find
-' ~ y(t)- b -' ~ x(t)it,
a
x(t)=u(t)=- -:w-· y(t)=v(t)= (7)

where
,rt = ((r(t)-a)• + (y (t)-b)")'l·
[the scalar product of the vectors (x- a, y- b) and (v, -u) assumes its
maximum value when the vectors are in one direction and the second of them
has the maximum possible length, i.e., is of unit length]. From (7) we de-
rive

~ (.n'") = 0,
i.e.'
.n (x <t), y(t)) = ((x (t) -a)• + (y (t) -b)")' I.=
= max ((x(t)-a)v-(y(t )-b)u)=R=con st (8)
(u, V)EA

on the optimal trajectory. It follows that the optimal trajectory is a


circle of radius R with center at the point (a, b). Since the angular
velocity of the motion along this circle is equal to

d I · • I
dj = R" ((x -a) y-(y- b) x)"""' R,
68 CHAPTER

and since the moving point must return to the initial position (possibly
after it has made several revolutions) in time T, we have 2nn = T/R whence
R = T/2nn. The area bounded by this curve is equal to

S= 2I s
T

(xv-yu)dt= 2I s. .
T

(xy-yx)dt= 2I s
T
. R• T
((x-a)y-(y-b)4di=zR= .
T•
4nn·
0 0 0

Consequently, n = 1 (the circle is traced only once), and S = T2 /4n. It


should be noted that here T is the length of the curve.
b) The Chaplygin Problem. Here
A= {(u, v) I(u-u 0 ) 2 +v• ~ VZ},
(uo, 0) is the wind velocity, and V ~ luol is the maximum velocity of the
airplane. To solve the auxiliary extremal problem
(x (t)-a)v -(y (i)-b) u - sup,
(u-u 0 ) 2 +v•~v•,

we can make the substitution u = uo + f,, v = n , and use the same geometrical
consideration as in the foregoing case. This results in the equations
..: ~ y(t)-b ..: ~ x(t)-a
x(t) = u (t)=u 0 -V R(i), y (t)=v (t) = V R(i),

where the quantity

is no longer constant. However,


~ _u iy(t)
dR(t) I ..: •
Cit= R (t) ((x(t)-a)x(t) +(y(t)-b) y(t)) -~ U 0
" _x(t)-.,.-a
-v 0
dl,

and so we obtain

Consequently, the optimal curve is an ~llipse determined by the equation

( 10)

As in the foregoing case, it can be shown that the moving point (x(t),
y(t)) traces the ellipse only once. It should be noted that if the veloc-
ity of the airplane were less than the wind velocity (V < lu 0 1), the air-
plane could not return to the initial point (prove this; cf. the exercise
at the beginning of the present section), and the curve (10) would be a.
hyperbola, i.e., nonclosed.
Remark. Contrary to our scheme, we have disregarded the question of
the existence of the solution. The matter is that a simple reference to
Weierstrass' theorem will not do here. Let us outline briefly a possible
course of reasoning.
First of all, without loss of generality, we can assume that x(O) =
y(O) = 0.
Using Arzela's theorem (see [KF, pp. 110 and 111]), wecanshowthat
the set of the pairs of functions {(x(·), y(·)} satisfying the generalized
Lipschitz condition

( 11)

on the closed interval [0, T] and the boundary conditions


INTRODUCTION 69

x(O)=x(T)=O, y(O) = y(T)=O


is compact in the space C([O, T]) x C([O, T]).
Further, the "area" functional is defined on this set and is a lower
semicontinuous function. Applying the corresponding generalization of
Weierstrass' theorem, we prove the existence of the solution (x(·), y(·)).
Finally, the functions satisfying the condition (11) are differenti-
able almost everywhere, and moreover (x(t),y(t))EA. The maximum principle
can be extended to this situation, and consequently the above reasoning is
applicable to the solution (x(·), y(·)).
1.6.5. The Brachistochrone Problem and Some Geometrical Problems.
Let us consider the following series of the simplest problems of the clas-
sical calculus of variations:
"•
:J (y ( · )) = ~ !/" V 1+u'" dx-+ inf; Y (xo) = Yo• y (xi)= Y1· (1)
"•
It embraces a number of interesting problems: for a = 0 we obtain the
problem on the shortest lines in a plane, for a = -1/2 we obtain the
brachistochrone problem (see Section 1.2.4), for a= 1 we obtain the mini-
mum-surface-of-re volution problem, and for a = -1 this is the problem on
the shortest lines (the geodesics) on the Lobachevskian plane in Poincare's
interpretation (for the Poincare half-plane see [13, pp. 131 and 132]).
The integrand fa = ya/1 + y' 2 in the problems (1) does not depend on
x. Therefore, the Euler equation (Section 1.4.1) possesses the energy in-
tegral from which we find all the extremals lying in the upper half-plane.
2-4. y'fa.y•-f=const=>y- 2a.(t+y'")=D'. (2)
5. First we shall consider the case of negative a: a = -£, B > 0.
Then
y~dy (3)

Now we change the variable of integration:


0 11P
y•P =D" sin2 t => y=D 1 1P sin1 /P t => dy=Tsin 1 1P-ttcostdt.

Substituting the resultant expressions into (3), we arrive at the fol-


lowing relations (in which D1 /B C):
I

y=Csint/Pt, x=C 1 + ~ ~sln1 1Psds.


u

In particular, if B = 1/2 (the brachistochrone problem), we obtain a family


of cycloids represented parametrically(< = 2t):
-
y('t,C,
- c
C1 )= 2c (1-cos't), x('t, C, C1)=C1 + 2 ('t-sin't).

If B = 1 (the case of the Poincare half-plane), we obtain a family of semi-


circles:
y(t, C, C1)=Csint, x(t, 0, C1)=C1 -Ccost =>(x-C1) 1 +g"=C•.
Among the problems with a > 0, we shall only consider the minimum-surface-
of-revolution problem. From (2) we derive

dy =dx.
y D•y•'-i
70 CHAPTER 1

Using the substitution


Dy=cosh 1-=?Ddy= sinh tdt, V D 2y 2 - l =sinh t,
we find the two-parameter family of catenaries:

For all the cases we have obtained two-parameter families of extremals


(solutions of the Euler equation), and now we must choose the constants of
integration in such a way that the boundary conditions y(xi) = Yi are sat-
isfied and then pass to the 6th stage of our scheme, i.e., analyzing the
solutions we have found and writing the answer.
However, here we shall not do this. If, in addition, the reader re-
calls that we have again evaded the question of the existence of the solu-
tion, he will have even more ground than in the foregoing section to de-
clare: "Something is rotten in the state of Extremum!"* Indeed:
For a > 0 the problem (1) may have no ordinary solution. For example,
since the boundary conditions in the minimum-surface-of-revolution problem
(a= 1) are symmetric (y(-xo) = y(xo)), one should seek a symmetric extre-
mal, i.e., y = coshDx/D. All the extremals of this form are obtained from
the extremal y = coshx by means of a homothetic transformation, and their
collection does not cover the whole plane and only fills an angular region
(Fig. 30). Therefore, a problem with such boundary conditions, say y(-xo)
y(xo) = 1, is unsolvable for sufficiently large xo.
In the case a < 0 the integrand in the problem (1) tends to infinity
for y + 0. Therefore, for instance, the brachistochrone problem (a= -1/2)
with ordinary boundary conditions y(xo) = 0, y(xl) > 0 (Section 1.1.4) does
in fact lie outside the framework of the standard scheme.
The complete solution of the problems of the series (1) requires some
additional work.

*"Something is rotten in the state of Denmark" (Shakespeare, Hamlet, Act I,


Sc. IV).
Chapter 2

MATHEMATICAL METHODS OF THE THEORY


OF EXTREMAL PROBLEMS

Now the reader is familiar with various statements of extremal prob-


lems, with basic notions and general ideas, and with some methods of solv-
ing these problems. This chapter is devoted to a systematic and rather
formal presentation of the corresponding mathematical theory.
The generality of the theory and its comparative simplicity are due
to the free use of facts and notions of the related divisions of mathemat-
ics and, particularly, of functional analysis. Here they are presented in
the form suitable for our aims. A considerable part of the material of
this chapter comprises what could be called "elements of differential cal-
culus."

2.1. Background Material of Functional Analysis


In this section we gather the facts of functional analysis, which we
need for the further presentation of the theory of extremal problems. As
a rule, proofs that can be found in the textbook by A. N. Kolmogorov and
S. V. Fomin [KF] are omitted.
2.1.1. Normed Linear Spaces and Banach Spaces. We remind the reader
that a linear space X is said to be normed if a functional 11·11 :X + R (called
a norm) is defined on X which satisfies the following three axioms:
a) JlxJJ;;:;:,O, YxEX and llxii=O ~x=O,

b) IJcxxll=lcx!JixJI, YxEX, VcxER. (1)

c)

Sometimes in order to stress that the norm is defined on X we shall write


ll·llx.
Every normed linear space becomes a metric space if it is equipped
with the distance (metric)

A normed linear space complete with respect to the distance it is


equipped with is called a Banach space.

71
72 CHAPTER 2

Exercises. Let X = R 2 • \Vhich of the functions enumerated below (and


for what values of the parameters) specify a norm in X?
1• N (x) =(I x11P +I x 21P)1/_P, 0 < p < oo;
2. N (x)= I a 11 x1 +atoXol+ I auxt +a •• x.l;

3. N(x)=max{laux1+atoXol. ja21x1+a 22x 2 1}.


4. Describe all the norms in R 2 •

5. Prove that all the norms in R2 are equivalent. (Two norms N1 and
N2 are said to be equivalent if there are c > 0 and C > 0 such that
cN 1 (x) ·..:;;. N.<xKCNt(x), vxEX.)
The set X* of all continuous linear functionals on X (it is called
the space conjugate to X) is a Banach space if the norm in X* is defined
as
llx•llx•= sup <x",x>,
llxllx..; 1

where <x*, x> is the value of the functional x* assumed on x.


For greater detail, see [KF, Chapter 4, Section 2].
Exercise 6. Find the norms in the spaces conjugate to those described
in exercises 1-4.
For our further aims, the following Banach spaces will be of great lm-
portance.
Example 1. The space C(K, Rn) of all continuous vector functions
x(•):K + Rn defined on a compact set K with the norm
llx( ·)llo =max I x(t) J.
teK

The space C(K, R) will be denoted by C(K).


Example 2. The space cr([to, t1l, Rn) of r times continuously differ-
entiable vector functions x(·):[to, t1l + Rn defined on a finite closed
interval (t 0 , t 1]cR; and equipped with the norm
I x ( ·) 1/, =max(// x (.) J/0 , ••• , /J x<rl (.) 1/0 ).
The space cr([to, t1J, R) will be denoted by cr([to, t 1 ]).
Exercises. 7. Prove that every finite-dimensional normed space is
a Banach space.
8. Prove that the unit ball in a finite-dimensional normed space is
a convex, closed, bounded, and centrally symmetric set for which the origin
is an interior point, and, conversely, for every convex, closed, bounded,
and centrally symmetric set containing the origin as an interior point
there is a norm in which this set is the unit ball.
9. Give an example of a normed space which is not a Banach space.
2.1.2. Product Space. Factor Space. Let X andY be linear spaces.
Their (Cartesian) product X x Y, i.e., the set of all the pairs (x, y), xEX,
yEY, becomes a linear space if the operations of addition and multiplica-
tion by a number are defined coordinatewise:
(x1, y1) +(x., y2) = (x1 +X2, y1 + y.), a (x, y) =(ax, ay).
If X and Y are normed spaces, then the product space X x Y can be
equipped with a norm, for instance, by putting
/J(x, Y)llxxv=max{JJxJJx, !IYI!v}. (1)
Exercise. Verify that llxllx + llylly and /llxll* + llyll~ are also norms in
X x Y and are equivalent to norm (1).
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 73
The two lemmas below are quite obvious.
Lenna 1. If X and Y are Banach spaces, then X x Y is also a Banach space.
The proof is left to the reader.
LEMMA 2. Every linear functional AE(XxY)• can be uniquely repre-
sented in the form
<A, (x, y)>=<x•, x>+<y*, y>, (2)
where x• E x• and y• E Y*.
Proof. Let us put <x*, x> =<A, (x, 0)> and <y*, y> =<A, (0, y)>.
These functionals are obviously linear, and their continuity ( ~ bounded-
ness) follows from the inequalities

l<x•, x>I~//A\1\\(x,O)JJ=IIAJII\x[\
and
I <y*, Y> I ~/IAIIII (0, y)Jl=IJAJIIIYII·
The uniqueness of the representation (2) is also quite evident. •
Since formula (2) specifies a linear functional A E (Xx Y)• for any
x*EX" and .1/*EY*, we have obtained an exhaustive description of the space
(X x Y)*. It can be briefly expressed by the formula (X x Y)* =X* E9 Y*.
Now let X be a linear space, and let L be its subspace. He shall put
XNX' if x-x' E L . The relation thus introduced is an equivalence relation
x
(Since we obviously have X N X, 1 N X 2 =:} X 2 N X 1 and X 1 N Xz, X 2 N X3 =:} X 1 N X3 ) ,
and therefore [KF, Chapter 1, Section 2], a decomposition (partition) of X
into equivalence classes is defined. A class of equivalent elements with
respect to this equivalence relation is called a residue class generated
by the subspace L. The operations of addition and multiplication by a num-
ber are introduced in the set of the residue classes in a natural way [KF,
Chapter 3, Section 1, Subsection 4], and the axioms of linear space hold.
Thus, the set of the residue classes becomes a linear space; it is called
the factor space (or the quotient space) of X relative to L and is denoted
by X/L.
The residue class 11 (x) to which an element X EX belongs is the class
x + L. The mapping n:X + X/L is linear (prove this!). It is called the
canonical mapping of X on X/L. (The mapping 11 is of course an epimor-
phism.t) It is evident that L =Kern.
Now let X be a normed space, and let L be its subspace. Let us put
llsllx;L= inf \!xllx (3)
xE£
or
lls/[x;L= inf [\xllx = inf llxo+x'~. (3')
nx=~ nx0 =£
X'EL

From definitions (3) and (3') it readily follows that the operator 11
is continuous (1\nxl\ ~ 1\xl\), and for any sEX/L there is xEX, such that
:rtx=~. llx11~2\\s\lx;L· (4)
Theorem on the Factor Space. Let X be a normed space, and let L be
its closed subspace. Then the function ll·llx/L defined by relation (3) is
a norm on X/L. If X is a Banach space, then the factor space X/L with the
norm ll·llx/L is also a Banach space.
Proof. He have to prove that, in the first place, the functional
1\·llx/L satisfies the axioms of the norm and that, in the second place, the
completeness of X implies the completeness of X/L. Let us prove the first
tThat is, 11 maps X onto the whole factor space X/L.
74 CHAPTER 2

assertion of the theorem, i.e., verify that Axioms (1) of Section 2.1.1 are
fulfilled.
Axiom a) : 11; llxtL;;;;::. 0 (¥;) since the norm in X is nonnegative. If t;, = 0,
then X = 0 can be taken as xe;. and therefore IIOIIx/L = 0. Let llt;.llx/L = 0.
Then (3) implies that there is a sequence {xn}, Xn E; , such that llxnll + 0,
i.e. Xn + 0. Since t;, is closed, we have 0E6'; i.e., is the zero element
of x/L.
Axiom b): Let ex E R. Then if 1rx = t;,, we have
inf I Yll = inf I cull= Iex I inf
11a:; llxtL = "Y=e<l\ llx~= Iex Ills II·
ru:=!; ".r=!;
Axiom c):
1;1 +;.11= inf I xi +x2 +x'll= inf ll.xl +xi +x. +x;~~
";i';}i xjeL

~ inf ~xl+xill+ inf llx~+x~ll=llslllxtL+IIs.lixtL•


x~eL x;eL

Let us prove the second assertion of the theorem, i.e., the complete-
ness of X/L. Let {E;,n} be a fundamental (Cauchy) sequence in X/L, i.e.,
Ye > 0 3N (e): n > N (s)=>llsn+m.-snllxtL <: s, Ym;;;;a.l.
Now we choose e:k = 2-k and indices nk such that nk+ 1 > nk, llt;.nk+m- t;,nkll ~
z-k, k ;;:, 1 . Then II l';.n 1 - t;,n 2 11 ~ 1 /2, and from (4) it follows that there
exist representatives X; E Sn; such that llxz - x1ll ~ 1 • In a similar way we
construct elements {xn}, n;;:, 3, such that llx11 -x11 -d~2-<k-u, x11 Esnk, k = 3,
4,... . The sequence {Xk}k=l is fundamental in X (check this I), and, by the
hypothesis, X is complete. Hence, the limit x0 = limx,. exists. Let us con-
sider the class t;.o = 1rxo. We have ~~~~

~Sn
k
-sollxtL= inf llx.--xu-x'~x~llxk-xollx-+ 0 •
X'EL k+e

Thus, l';,nk + E;,o, and therefore we also have t;,n + t;.o since {t;.n} is fundamen-
tal. Consequently, X/L is complete. •
Exercises. 1. Let X= C([O, 1]), 0 ~ T1 < T2 < ... < Tm ~ 1, and let
the subspace L consist of the functions x(·) E C((O, 1]) such that x(Ti) =
0, i = 1, ••• ,m. Find the factor space X/L and determine ll·llx/L·
2. Generalize the result of the exercise 1 and prove that if F is a
closed subspace of the closed interval (0, 1] and
LF=fx(o)EC([O,l))lx(t) ... o, tEF},
then the space C([O, 1])/Lp is isometrically isomorphic to the space C(F).
2.1.3. Hahn-Banach Theorem and Its Corollaries. In the theory of ex-
tremal problems an important role is played by separation theorems and some
other facts of convex analysis. Most of them are consequences of the Hahn-
Banach theorem, which is often called the first principle of linear analy-
sis. Since this theorem is included in all standard courses in functional
analysis, we shall limit ourselves to its statement and to some of its
most important corollaries.
In this section X is a linear space and R is the extended real line,
i.e. R =R.U{-oo, +ooJ.
Definition 1. A function p:X-+B. is said to be convex and homogeneous
if
p(x+y)~p(x)+p(y) for any x, yEX, (1)
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 75

and
p (we)= a.p (x) for any X eX and a> 0. (2)
Examples. 1) X is a normed space and p (x) = llxll.
2) X is a linear space, L is a linear subspace of X, and
0, xEL,
{
p(x)= + oo, x~L.

3) X is a linear space, l1,····Lm is a set of linear functionals on


X, and p(x)=max(<li, X), ... , <lm, x)).
The following definition implies an important example.
Definition 2. Let X be a linear space, and let A be its convex sub-
set containing 0. The Minkowski function ~A(·) of A is defined by the
equality
~~oA (x) =inf {t >OJ xftE A} (3)
[if there are no t > 0 such that xtt E A, then ~A(x) = +co].
Proposition 1. The Minkowski function is nonnegative, convex, and
homogeneous;
{x I ~~oA (x) < 1} s;;; A s {xI p.A (x) ~ 1}. (4)
If X is a topological linear space,* then ~A(·) is continuous at the
point 0 if and only if OEintA.
Proof. If ~A(x) or ~A(y) is equal to +co then (1) holds. Therefore,
let ~A(x) < +co and ~A(y) < +co, By definition, for any e: > 0 there are t
and s such that
0 < t < !lA (x)+e/2 and x/tEA,
0 < s < !lA (y) +e/2 and y/sE A.
It follows that

since A is convex, and consequently


!lA (x+ y) ~ t+s < !lA (x) +p.A (y) +e.
By the arbitrariness of e:, condition (1) holds. Further, for a > 0 we have

p.A (ax)=inf {t > Ol e&X/tf~ A}= infia;s > Ol xjs E A}=a; inf{s > Ol X/&E A} =cq.tA (x),

and hence (2) is true.


The nonnegativity of ~A(x) and the second inclusion in (4) follow di-
rectly from the definition. If ~A(x) < 1, then there is t E (0, l) for which
xJtEA, and since OEA and A is convex, the inclusion

x=0·(1-t) +7· tEA


holds, i.e., the first inclusion in (4) is true.
Now let X be a topological linear space. Then the following chain of
equivalent assertions takes place: ~A(·) is continuous at 0 ~
Ve>O, 3U.~O. Vxeu •. !lA(x)<e~
3Uao. Vxe U 1 , 11A (x) < 1 ~
3U 1 ~ 0, U1 c A ~ 0 E int A
*On topological linear spaces see [KF, Chapter 3, Section 5].
76 CHAPTER 2

[here Ue: and U1 are neighborhoods of 0; the second equivalence relation


holds because of the homogeneity of ~A(·) and the possibility of putting
Ue; = EUll· •
Propos1t1on 2. For a linear functional x* on a topological linear
space X to be continuous it is necessary and sufficient that there exist a
convex homogeneous function p(·) continuous at the point 0 such that the
inequality
<x•, x>:::::;:; p (x) (5)
holds for all xEX .
Proof. The necessity is established at once by putting p(x) = l<x*,
x>l.
Now let us suppose that (5) holds and p(x) is continuous at the point
0. Then for any E > 0 there is a neighborhood U!)0 such that p (x) < e: for
all xEU. Since OEU and 0=(-0)E(-U), there is a neighborhood W such
thatOEWcUn(-U). If xEW, then x -xEU, and, by virtue of (5),
<x•, x>:::::;:; p (x) < e,
<x*-x>:::::;:; p (-x) <e.
Consequently, l<x*, x>l < e: for all xEW, i.e., x* is continuous at 0. It
remains to take into consideration that a linear functional continuous at
one point is continuous on the whole space X [KF, p. 174].
Hahn-Banach Theorem [KF, pp. 134-137]. Let p:X ~ R be a convex homo-
geneous function on a linear space X, and let l:L ~ R be a linear func-
tional on a subspace L of the space X such that
<l, x>:::::;:; p (x) for all x E L. (6)
Then there exists a linear functional A defined on the whole space X which
is an extension of l, i.e.,
<A, x>=<l, x>, xEL, (7)
and satisfies the inequality
<A, x>::::;;p(x) for all xEX. (8)
COROLLARY 1 • Let X be a normed space and X0 EX, x0 '=I= 0. Then there is
an element AEX* such that IIAII = 1 and <A, xo> = llx 0 11.
Proof. Let us define a linear functional l on the subspace
L={xl X=ctX 0 , aE~}

by putting
(9)
The function p(x) = llxll is convex and homogeneous, and <l, a.xo> = a.llxoll ~
la.lllxoll = lla.xoll = p(a.xo); therefore, inequality (6) holds. By the Hahn-
Banach theorem, l can be extended to a linear functional A on the whole
space X, and relations (7) and (8) hold. Recalling the definition of the
norm of a functional (Section 2.1.1), we obtain from (8) the inequality
IIAII= sup <A, x>=:::;; sup p(x)=l,
UxU=l 1\X\1=1

whence, in particular, AEX•.


On the other hand, (7) and (9) imply
(A, X0 > = <l, Xo> =II Xo II,
and thus the assertion of the corollary holds and, moreover,
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 77

IIAII= sup (A, x>-;;;;.(A, li~ooll)-=1.


\IX \I= I

Therefore, Iii\ II 1. •
Corollary immediately implies·:
COROLLARY 2. If a normed linear space X is nontrivial (i.e., X~ {0})
then the space X* conjugate to X is also nontrivial.
2.1 .4. Separation Theorems. In this section X is a topological
linear space and X* is its tonjugate space consisting of all the continuous
linear functionals on X.
Defjnjtjon 1. We say that a linear functional x•ex• separates sets
A c: X and B c: X if
sup <x', x> ~ inf (x', x>, ( 1)
xeA :ceB

and that it strongly separates A and B if


sup <x', x> < inf <x•, x). (2)
:ceA :ceB

The geometrical meaning of inequality (1) is that the hyperplane


H (x', c)= {xI (x', x> =c},
where sup <x•, x)~c~ inf <x•, x> separates the sets A and B in the sense that
xeA :ceB
A lies in one half-space (H+(x*, c) = {xl<x*, x> ~ c}) generated by H(x*,
c) and Bin the other half-space (EL(x*, c) = {xl<x*, x> ~ c}) (Fig. 31);
the inequality (2) means that c can be chosen so that A and B lie inside
the corresponding half-spaces and have no common points with H(x*, c) (Fig.
32).
First Separation Theorem. If sets A c: X and B c: X are convex and
nonempty and do not intersect, and A is open, then there exists a nonzero
continuous linear functional separating A and B.
Proof. A) Since A and B are nonempty, there exist points a0 EA and
b0 E B • The set
C=(A-a0 )-(B-b0 )={xix=a-a0-b+b0 , aEA, bEB},
is obviously convex (also see Proposition 1 and Exercise 1 in Section
2.6.1), contains 0, and is open. Indeed, if x = 0:- ao- b + bo and aEA,
there is a neighborhood u, aEUc;A' and then xEU·-ao-b+boc:C. More-
over, we have c=b0 -a0 ~C, because, if otherwise, then bo - a 0 =a- a 0 -
b + bo for some aEA and bEB, whence a=b EAnB, i.e., these sets inter-
sect, which contradicts the hypothesis.
B) Let us denote by p(x) the Minkowski function of the set C. Accord-
ing to Proposition 1 of Section 2.1.3, the function p(x) is nonnegative,
convex, homogeneous, and continuous at the point 0. Moreover, p(x) ~ 1 for
all xec.
C) Let us define a linear functional l on the subspace
L={xjx=ac=c.t(b0 -a0 ), ttER}
by putting <l, ~c> = ~p(c). Then for~> 0 we have <l, ~c> = ~p(c) = p(~c)
and for~~ 0 we have <l, ~c> = ~p(c) ~ p(~c) since p(·) is nonnegative.
Consequently, the inequality <l, x> ~ p(x) holds for all xEL, and, by the
Hahn-Banach theorem, l can be extended to a linear functional A such that
(3)
<A, ac>=<l, ac>=c.tp{c), ttER,
78 CHAPTER 2

Fig. 31 Fig. 32

and
(A, x>~p(x), xeX. (4)
Since p(·) is continuous at 0, inequality (4) implies the continuity of the
functional A (Proposition 2 of Section 2.1.3).
D) For any aeA and bEB we have
(A, a-b>=<A, a-a0 -b+bo>+<A, a0 -b0 >~
~p(a-a 0 -b+bo)+<l, (-l)(b0 -a0 )> ~ 1-p(b0 -a0 ),

since a-a0 -b+b0 EC, and p(x) :e;; 1 on C; besides, we have used (4). How-
ever, for 0 < t :e;; 1 the point (bo- ao)/t = c/t cannot belong to the set C
because Cis convex and does not contain 0 while [0, c/t] contains the point c =
bo- a0 ~C. Therefore,

I
p (bo -ao) = inf { t > 0 bo-; ao ec};;;;;;. I. (5)

Consequently, (A, a-b>~l-p(b0 -a0 )~0 for any aEA and bEB .• The ele-
ments aEA and bEB in the inequality <A. a>~<A, b> can be chosen in-
dependently and therefore
sup(A, a>~ inf(A, b).
liE.. beB
Moreover, according to (3) and (5), we have (A,b0 -a0 )=p(b0 -a0 );;;;,.l, and
therefore A ~ 0. Consequently, A separates A and B. •
Remark. In Section 1.3.3 we already proved the separation theorem for
the case of a finite-dimensional space X. Comparing its conditions with
those of the theorem we have just proved we see that in the finite-dimen-
sional case the condition that A is open can be omitted (the fact that in
Section 1.3.3 one of the sets was a point was insignificant). We can weaken
the conditions of the first separation theorem and require that intA ~ ~
and (intA)OB=0· However, it is impossible to do without the existence of
interior points of at least one of the sets.
Exercises. 1. Let A be a convex set of a topological linear space X,
and let int A ~ Ill and x• EX* • Prove that
sup (x•, x>= sup (x*, x). (6)
xeA xelniA

2. Derive from (6) the separation theorem for the case of convex A
and B such that int A¢: 0, B ¢: 0, .Bnint A= 0.
3. Prove that there is no continuous linear functional on the space
X = l2 separating the compact ellipsoid

A={ x=[xk] E 11 1k~ k 2xl..;;; I}


and the ray
B={x=[xk]iXk=t/k, t:>'O}.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 79

Second Separation Theorem. Let X be a locally convex topological


linear space [KF, p. 169], let A be a nonempty closed convex subset of X,
and let xEX be a point not belonging to A. Then there is a nonzero con-
tinuous linear functional strongly separating A and x.
Proof. Since i ~A and A is closed, there is a neighborhood V ~ x
such that An V = 0. Since the space X is locally convex, there exists a
convex neighborhood B c: V of the point x.
I t is evident that B n A= 0 ,
and, by the first separation theorem, there exists a nonzero linear func-
tional x* separating A and B:
sup <x•, x> ~ inf (x*, x>.
xeA xeB
It remains to use the fact that
inf (X*, X)< (X*, X)
xeB
because the infimum of the nonzero linear functional x* cannot be attained
at the interior point x
of the set B. •
Definition 2. By the annihilator A~ of a subset A of a linear space
X is meant the set of those linear functionals l on.X for which <l, x> = 0
for all xEA.
It should be noted that A~ is always closed and contains OEX*.
Lemma on the Nontriviality of the Annihilator. Let L be a closed sub-
space of a localll convex topological linear space X, and let L ~ X. Then
the annihilator L contains a nonzero element x• E X• .
..fi:.Q.Qf_. Let us take an arbitrary point x~L. By the second separation
theorem, there exists a nonzero functional x• E X•, strongly separating x and
L (Lis a subspace of a linear space and consequently it is convex):
sup <x•,
xFL
x> < (x*, x>. (7)

If there existed X0 EL, for which <x*, xo> ~ 0 then, since a.x0 EL for
any a E R , we would have
sup (x•, x> ;;a. sup (x•, a.x0 >= + oo,
XEL a.eR

which contradicts (7). Consequently, <x*, x> =0 on L, and therefore x*E


L~. •
2.1.5. Inverse Operator Theorem of Banach and the Lemma on the Right
Inverse Mapping. The theorem below is the second fundamental principle of
linear analysis.
Inverse Operator Theorem of Banach. Let X andY be Banach spaces, and
let A:X + Y be a continuous linear operator. If A is a monomorphism, i.e.,
Ker A= {xiAx=O}= {0},
and an epimorphism, i.e.,
ImA={ylu=Ax, xEX}=Y,
then A is an isomorphism between X andY, i.e., there exists a continuous
linear operator H = A- 1 : Y -X such that
MA=Ix, AM=Iy.
For the proof, see [KF, p. 225]. In the present book the following
consequence of the Banach theorem will be frequently used:
Lemma on the Right Inverse Happing. Let X andY be Banach spaces,
and let A be a continuous linear operator from X into Y which is an epi-
morphism. Then there exists a mapping M:Y + X (in the general case this
80 CHAPTER 2
mapping is discontinuous and nonlinear) satisfying the conditions
Ao M=ly,
and
II M (y) II~ Cil Y I for some C > 0.
It should be noted that, by virtue of the second condition, the map-
ping M is continuous at zero.
Proof. The operator A is continuous, and consequently its kernel
KerA, which is the inverse image of the point zero, is a closed subspace.
By the theorem on the factor space (Section 2.1.2), X/KerA is a Banach
space. Let us define an operator A:X/KerA + Y by putting A~ =Ax on the
residue class ~ = 1rx. This definition is correct: x1 , x2 E6~nx1 = 1tX 2 =s~
x1 -x.EKerA:::;.Ax1 =Ax2 . The operator A is linear: if 1TXi = ~i• l = 1, 2,
then
A(a161 +a.5.) =A (~ixt +a.x.) =a1Ax1 +a.Ax. =(X1A5i +a.Xsa•
The operator A is continuous: for any X E5 we have
II A~ II= IJAx II~ IIA 1111 x JJ,
whence
IIAsii~IIAII inf IJxiJ=IIAJIIIsllxti<•• A·
xe~

Finally, the operator A is bijective (one-to-one). Indeed, A is an epi-


morphism, and hence so is A: VyEY 3xeX, Ax=y~ As=Y for ~ = 1r(x). On
the other hand, Ker A=O: A6=0, nx=5~Ax=O~xeKerA~5=0.
We have thus proved that A is a continuous bijective operator from
X/KerA into Y. By the inverse operator theorem of Banach, there exists a
continuous operator M= x-
1 , M:Y + X/KerA, inverse to A. According to
relation (4) of Section 2.1.2, for the element~= My there is an element
x = x(~) such that xEi and ~x~~2~s!lxJKer.A• Let us put M(y) = x(S). This
results in
(A oM) (y) =Ax(6)=As =AMy =y,
I M (y) I =II X mII~ 21/sllxti<erA = 211 Myllxti<er A~ 211M lillY II· •
2.1.6. Lemma on the Closed Image. Let X, Y, and Z be Banach spaces,
and let A:X + Y and B:X + Z be continuous linear operators. The equality
Cx=(Ax, Bx)
specifies a continuous linear operator C:X + Y x Z.
Exercise. Prove that if the norm in Y x Z is defined by equality (1)
of Section 2.1.2, then IICII = max{IIAII, IIBII}.
Lemma on the Closed Image. If the subspace ImA is closed in Y and
the suhspace B Ker A* is closed in Z, then the subspace Im C is closed in
Y X Z.
....fiQ.Qf_, The closed su~space Y = ImA of the Banach space Y is itself
a Banach space, and A:X + Y is an epimorphism. By the lemma of Section
2.1.5, there exists the right inverse mapping M:Y +X to A. Let (y, z)
belong to the closure of the image of the operator C. This means that
there is a sequence {xn In ~ 1} such that y = lim Ax;., z = lim Bxn. Let us
set nw+ao n-+ao

*B Ker A is the image of the kernel of the operator A under the mapping B.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 81

(yEY because y is the limit of the elements Ax,.EY). Taking into account
the properties of M, we obtain
A (x,.-s,.) = Ax,.-(A oM) (Ax,.-y) = y,
I s,. II= I M (Ax,.- y) II~ CM# Ax,.-y 11---+ 0.
Therefore, Bl;n + 0 and lim B (x,. -s,.) = lim Bx,. = z , i.e. , z belongs to the
closure of the set n ..... 00 n ..... oo

~ = {~=Bx! Ax=y}.

This set is a translate of the subspace B Ker A:


~i•~.E~::::::>~f=Bx;, Ax;=y, i=l,2::::::>
+
~ ~.-~i = B (x 2 -X1), A (x.-x1) = 0::::;> ~. = ~1 (~.-~1),
~1 E ~. ~.-~ 1 E B Ker A.

However, by the hypothesis, BKer A is closed, and hence the set ~ is


also closed. Consequently, the element z belonging to the closure of ~
belongs to ~ itself, which implies the existence of an element i for which
Ai = y, Bx= z. Thus (y, z) E Im C • •
2.1.7. Lemma on the Annihilator of the Kernel of a Regular Operator.
Let X and Y be Banach spaces, and let A:X + Y be a continuous linear epi-
morphism. Then (Ker A).l = Im A*.
Proof. A) Let x* E Im A• ~ x• = A•y•. Then x E Ker A implies <x*, x> =
<A*y*, x> = <y*, Ax>= 0, i.e., ImA•c(KerA).l.
B) Let x*E(KerA).l, i.e., Ax=O:::;>(x*, x>=O· Let us consider the map-
ping C:X + R x Y, Cx = (<x*, x>, Ax). The image of the kernel of A under
the mapping x* is zero, i.e., a closed set, and, by the lemma on the closed
image (Section 2.1.6), Im C is a closed subspace of R x Y. It does not
coincide with R x Y since, for instance, (l, O)~ImC [indeed, if (a, O)
(<x*, x>, Ax) then Ax=O:::;>a=(x*, x)= 0].
By the lemma on the nontriviality of the annihilator of a closed
proper subspace (Section 2.1.4), there exists a nonzero continuous linear
functional AE(ImC).Lc(R.xY)*, and, according to the representation lemma
for the general linear functional on a product space (Section 2.1.2), there
exist a number ~. E R." = R. and an element y< E Y• such that
<~.x• + A•y•, x> = ~. <x•, x> + <y•, Ax> =(A, Cx> = 0, Vx.
The case ~o = 0 is impossible because Im A = Y, and hence
= O::::::>A = 0.
<y*, Ax>= 0 ::::::>]/*
Consequently, x* = A*(-y*/~o), i.e., x•e ImA•and KerA).L c Im A•. •
2.1.8. Absolutely Continuous Functions. In the theory of optimal
control, we permanently deal with differential equations of the form
(1)

where u(·) is a given function called a control. Although the function


<p (t, x, u) is usually assumed to be continuous (and even smooth), the con-
trol u(·) does not possess these properties in the general case. It may
be piecewise-continuous, and it is sometimes convenient to assume that u(·)
is only a measurable function. Hence, the right-hand side of Eq. (1) may
be discontinuous, and therefore it is necessary to state more precisely in
what sense its solution should be understood.
Remark. In the present section such notions as "measurability,"
"integrability," "almost everywhere," etc. are understood in the ordinary
Lebesgue sense [KF, Chapter 5, Sections 4 and 5]. The same rule is re-
tained in the remaining part of the book with some rare exceptions, which
82 CHAPTER 2

are carefully stipulated: for instance, the integral in the formulas of the
next section is understood in the Lebesgue or the Lebesgue-Stieltjes
sense [KF, Chapter 6, Section 6]. When these notions are applied to vec-
tor-valued or matrix functions, then each of their components must possess
the corresponding properties. For instance, a function x(·) = (x 1 (·), ... ,
xn(·)):[a, 8] + Rn is measurable and integrable if each of the scalar func-
tions Xi(·):[a, 8] + R. possesses the same properties, and, by definition,

(2)

It is most natural to assume (and this is what we are doing) that the
differential equation (1) together with the initial condition x(to) = xo
is equivalent to the integral equation
t
x(t)=xo+) rp(s, x(s), u(s))ds. (3)
t,

For this assumption to hold the function x(·) must be equal to the in-
definite integral of its derivative, i.e., the Newton-Leibniz formula must
hold:
t
x(t)-x(t') = ~ x(s)ds. (4)

"
In the Lebesgue theory of integration [KF, Chapter 6, Section 4] the func-
tions x(·) for which (4) holds are called absolutely continuous. It is
important that the following equivalent and more effectively verifiable
definition can be stated for these functions:
Definition. Let X be a normed linear space, and let ~ = [a, 8] be a
closed interval on the real line. A function x(·):~ +X is said to be
absolutely continuous if for any £ > 0 there is 8 > 0 such that for any
finite system of pairwise disjoint open intervals (a", ~") c: [a, ~J, k = 1,
2, ... ,N, the sum of whose lengths does not exceed 8:
N
~(~"-a..,)< 6, (5)
k=l

the inequality
N
~ 1/x(~")-x(cxk)ll < 11 (6)
k=l

holds.
Example. A function x(·):~ +X satisfying the Lipschitz condition
//x(t 1 )-x(t 2 )/I~K/ti-t 2 /, ti, t,€6,
1s absolutely continuous. Indeed, here 8 = £/K. From the definition it
is readily seen that an absolutely continuous function is uniformly contin-
uous on~ (we takeN= 1).
Proposition 1. Let X andY be normed linear spaces, and let~ be a
closed interval of the real line.
If functions Xi(·):~+ X, i 1, 2, are absolutely continuous and
c;ER, then the function c1x1(·) + czxz(·) is also absolutely continuous.
I f functions x(·) :~+X and A(·).: 6-+2(X, Y) are absolutely contin-
uous, then the function A(·)x(·):~ + Y is also absolutely continuous.
If a function ~(·):G + Y satisfies the Lipschitz condition on a set
X.c:G, and a function x(·):~ + G is absolutely continuous and imx(·) =
{YIY=X(t), tE6}c:X, then the function ~(x(·)):~ + Y is absolutely continu-
ous.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 83
Proof. The first assertion follows at once from the definition (also
see [KF, p. 343]). To prove the second assertion we note that since the
functions x(·) and A(·) are continuous, they are bounded, and therefore
Jlx(tH~M.. , !!A(t)II~MA, te&.
Now for any system of disjoint open intervals (~..... ~11 )c&, k = 1, •.• ,N,
we have

~I! A @~)x@")-A (a") x(a")ll~ ~ A(~") x(~")-A (~")x(a")ii+


t-1 N k=l
11

+k=!
~ 1!A@")x(a"}-A(a")x(a11)!!~
N N
~~!!A @")JIIIx~)-x(a">n+ ~ JIA@J-A (a")/1/lx(~")/1~
k=! k=!
N N
~MA ~ lix(~")--x(a")li+M,. ~ I!A@11 )-A(a")l!.
k=l k=l
Let us choose o > 0 so that (5) implies the inequalities

r. II A @,.)-A (et")~ < 2;, .


N

k=! ..

Then
N

L JIA (~")x(~")-A (a")x(a")/1 < 2~


k=! A
MA+ 2~ M,.=s,
X

and hence A(·)x(·) is absolutely continuous.


Similarly, if

then
N N
~ JI<I>(x@"))-ll>(x(a"))./1~ ~ !lx(~")-x(a")ll < s
k=! k=!

if o > 0 is chosen so that (5) implies (6), where E is replaced by E/K. •


We shall state the basic theorem of this section only for a finite-
dimensional space X to avoid the definition of the operation of integration
in an arbitrary normed space [for X = Rn it is sufficient to take into con-
sideration (2)]. In what follows we shall need only this case.
Lebesgue's Theorem [KF, p. 345]. If a function x(·):~ + Rn is abso-
lutely continuous, :i.t is differentiable almost everywhere and its derivative
:it(·) is integrable on~. and equality (4) holds for all t, -re&.
If a functions(·):~+ Rn is integrable on~ and TEL\, then the func-
1

tion x(t)=~.s(s)ds is absolutely continuous and x(t) = s(t) almost every-


where.
In accordance with this theorem, in what follows we shall say that a
function x( ·) is a solution of the differential equation (1) if it is abso-
lutely continuous and satisfies (1) almost everywhere. Indeed, by Lebes-
gue's theorem, to this end it is sufficient that (1) imply (3). Conversely,
if x(·) satisfies (3) [i.e., in particular, t---+~p{t, x(t), u(t)) is an inte-
grable function] then, by Lebesgue's theorem, x(·) is absolutely continuous
and (1) holds almost everywhere.
To conclude this section we shall state one more assertion whose
scalar analogue is well known in the Lebesgue theory of integration and
whose proof is left to the reader as an exercise.
84 CHAPTER 2
Prolosition 2. A measurable function x(·):~ + Rn is integrable if and
only if x(·)l is integrable, and then

I~ x (t) dt I~~ Ix(t) Idt. (7)

2.1.9. Riesz' Representation Theorem for the General Linear Func-


tional on the Space C. Dirichlet's Formula. The reader is supposed to be
familiar with the definitions of a function of bounded variation and
Stieltjes' integral [KF, Chapter 6, Sections 2 and 6]. A function of
bounded variation v(•):[a, S] + R will be called canonical if it is con-
tinuous from the right at all the points of the open interval (a, S) and
v(a) = 0.
Riesz' Representation Theorem [KF, p. 369]. To every continuous lin-
ear functional x•EC([a, ~J)• there corresponds a canonical function of
bounded variation v(•):[a, S] + R such that
II
<x", x(· )>= ~ x(t)dv(t) (1)

for all x(·) EC((a, ~J) , and the representation {1) is unique: if
II
Sx(t)dv(t)=O
co
for all x(·) E C([a, for a canonical v(·), then v(t) = 0.
~])and

This theorem can readily be generalized to the vector case. Let e 1


(1, 0, ••• , O), e2 = (0, 1, .•• ,0), .•• be the unit vectors of the standard
basis in &n. An arbitrary element x(·)=(xd·), ... , Xn(·))EC((a, ~], Rn) is
represented in the form
n
X ( ·) = ~ Xk ( ·) ek.
k=l

Now if x*EC([a, M, Rn)*, then


n n
(x*, x(·)>= ~ <x•, x"(·)e"> = ~ <x:, x"(·)>,
k=l k=l
where the functionals xi;EC([a, ~])• are determined by the equalities <~,
~(·)> = <x*, ~(·)ek>· Applying Riesz' theorem to~. we obtain the repre-
sentation
n ll
<x•, x(·)>= ~ Sxk(t)dvk(t). (2)
k=l co
The collection of functions of bounded variation v(•) (v 1 ( · ) , ••• ,vn(•))
can naturally be regarded as a vector-valued function of bounded variation
v(·):[a, S] + Rn* (whose values are row vectors). Then formula (2) can be
rewritten in the form
ll
<x", x(·)>=~dv(t)x(t) (3)

coinciding with (1) to within the order of the factors. As in Riesz' theo-
rem, the correspondence between x* and v( ·) is one-to-one if the condition:
that the function v(·) is "canonical" holds.
Every function of bounded variation v(·):[a, 8] + R specifies a (gen-
eralized) signed measure (or "charge") [KF, Chapter 6, Section 5]. Stiel-
tjes' integral (1) is nothing other than the integral with respect to such
a measure. Similarly, a vector-valued function of bounded variation v(·):
[a, sj + Rn* specifies a measure on [a, S] with vector values, (3) being
the integral with respect to this measure. The measure corresponding to
MATHEHATICAL HETHODS FOR EXTREHAL PROBLEHS 85

a function of bounded variation v(t) is usually denoted by dv(t).


If v1(·) and vz(·) are two functions of bounded variation on a closed
interval [a, B], then the product measure dv1 x dvz is defined on the
square [a, Bl x [a, 8], and Fubini's theorem holds [KF, p. 317]. In par-
ticular, there holds the well-known formula for interchanging the limits
of integration ("Dirichlet's formula"):
ll (
~ l~ f (t,
l
s) dvl (s) J dv 2 (t) = l \IllJf
ll
(t, s) dv 2 (t)
}
® 1 (s).
(4)

In the generalization of the formula (4) to vector-valued functions


(which we shall need in what follows) attention should be paid to the order
of the factors. For instance, if
f(·, ·):[a, ~]x[a, ~]-+..2'(Rn, R•),
V1 (·):[a, ~]-+Rn, V2 (·): [a, ~]-+Rn*,
then (4) should be written thus:

(5)

2.2. Ftmdamentals of Differential Calculus in Norwed


Linear Spaces
Differential calculus is a branch of mathematical analysis in which
the local properties of smooth mappings are studied. At an intuitive level,
the smoothness of a mapping means the possibility of approximating this
mapping locally by a properly chosen affine mapping, i.e., the differential,
or, more generally, by a polynomial mapping, which gives some information
about the properties of the mapping under consideration. Differential cal-
culus as well as convex analysis is a basic mathematical tool of the theory
of extremal problems.
The foundation of differential calculus of scalar functions on the
real line was laid by Newton and Leibniz. The problems of the calculus of
variations required the generalization of the notions of the derivative and
the differential to mappings of many-dimensional and infinite-dimensional
spaces, which was done by Lagrange in the 19th century; in our time this
theory was developed more completely by Frechet, Gateaux, Levy, and other
mathematicians.
2.2.1. Directional Derivative, First Variation, the Frechet and the
Gateaux Derivatives, Strict Differentiability. As is known, for real func-
tions of one real variable the two definitions based on the existence of a
finite limit
lim F(l:+h>-F<x> (1)
11-+0 h
and on the possibility of the asymptotic expansion
F (x+h> = F (X} +F' (x) h +o (h) (2)
for h + 0 lead to one and the same notion of differentiability. For func-
tions of several variables and all the more so for functions with infinite-
dimensional domains the situation is not so simple. The definition (1) and
the definition of a partial derivative generalizing (1) lead to the notions
of the directional derivative, the first variation, and the Gateaux deriva-
tive while the generalization of the definition (2) leads to the notions of
the Frechet derivative and the strict differentiability.
86 CHAPTER 2
Let X and Y be normed linear spaces, let U be a neighborhood of a
point x in X, and let F be a mapping from U into Y.
Definition 1. If the limit
. F(x+M)-F(x)
I lm -'--'-.....-'----'-'- (3)
qo A
exists, it is called the derivative of F at the point x in the direction h
and is denoted by F'(x; h) [and also by (VhF)(x}, DhF(x), and ahF(x)]. For
real functions (Y = R) we shall understand (1) in a broader sense by admit-
ting -oo and +oo as possible values of the limit (3) •
. Definition 2. Let us suppose that for any hEX there exists the di-
rectional derivative F'(x; h). The mapping o+F(x; ·):X+ Y determined by
the formula o+F(x; h) = F'(x; h) is called the first variation ofF at the
point x (in this case the function F is said to have the first variation
at the point i).
If o+(x, -h) = -o+(x, h) for all h or, in other words, if for any
hE X there exists the limit

"F(A
u x, h)-J"
-1m F(x+M)-F(x)
A ,
).-+0

then the mapping h + oF(x; h) is called the Lagrange first variation of the
mapping Fat the point x (see Section 1.4.1) •
. Pefinitiqn 3. Let F have the first variation at a point x; let us
suppose that there exists a continuous linear operator AE.!?(X, Y) such
that o+F(x; h) = Ah.
Then the operator A is called the Gateaux derivative of the mapping F
at the point x and is denoted by Fr(x).
Thus, Fr(x) is an element of !l'(X, Y) such that, given any hEX, the
relation
F (x +M) = F (x) +I.F[h +o (I.) (4)
holds for A. .j. 0.
Despite the resemblance between relations (2) and (4) there is a sig-
nificant distinction between them. As will be seen below, even for X = R2
a Gateaux differentiable function may be discontinuous. The matter is that
the expression o(A.) in the expansion (4) is not necessarily uniformly small
with respect to h.
Definition 4. Let it be possible to represent a mapping F in a neigh-
borhood of a point x in the form
F (x+h)=F (x> +Ah +a(h)llhl/. (5)
where AE2(X, Y) and
lim
111111-+0
!Ia. (h) II= !Ia. (OJ II= o. (6)
Then the mapping F(·) is said to be Frechet differentiable at the
point :X, and we write FE D 1 (x). The operator A is called the Frechet deriv-
ative (or simply the derivative) of the mapping F at the point x and is
denoted by F'(x).
Relations (5) and (6) can also be written thus:
(7)
It readily follows that the Frechet derivative is determined uniquely (for
the Gateaux derivative this is obvious since the directional derivatives
are determined uniquely) because the equality IIA1h - Azhll = o(llhll) for
operators A; E 2 (X, Y) , i = 1, 2, is only possible when A1 = Az.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 87

Finally, in terms of the £, o formalism the relations (5) and (6) are
stated thus: given an arbitrary £ > 0, there is o > 0 for which the in-
equality
liP (x+h)-P (x)-Ah~ ~ ellhll (8)
holds for all h such that llhll < o. This naturally leads to a further
strengthening:
Definition 5. A mapping F is said to be strictly diffe~entiable at a
point x [and we write PESD 1 (x)] if there is an operator AE2(X, Y) such
that for any £ > 0 there is o > 0 such that for all x1 and xz satisfying
the inequalities llxl - xll < 0 and llxz - xll < 0 the inequality
liP (xl)-P (X 2 )-A (xl-X 2 )11 ~ e II X1 --''X2 [[ (9)
holds.
Putting xz = x and x1 = x + h in (9), we obtain (8), and hence a
strictly differentiable function is Frechet differentiable and A= F'(x).
By definition, the (Gateaux, Fr:chet, or strict) derivative F'(x) is
a linear mapping from X into Y. The value of this mapping assumed on a
vector hEX will often be denoted by F' (x) [h]. For scalar functions of
one variable (when X = Y = R) the space 2 (X, Y)= 2 (R., R.) of continuous
linear mappings from R into R is naturally identified with R (with a linear
function y = kx its slope k is associated). In this very sense the deriv-
ative at a point in elementary mathematical analysis is a number (equal to
the slope of the tangent) and the corresponding linear mapping from R into
R is the differential dy = F'(x)dx (here dx and dy are elements of the one-
dimensional spaceR).
As is also known, for functions of one variable Definition 3 (or the
definition of the first Lagrange variation which coincides with Definition
3 in this case) and Definition 4 are equivalent (a function possesses the
derivative at a point if and only if it possesses the differential at that
point) and lead to one and the same notion of the derivative which was in
fact introduced by Newton and Leibniz. In this case Definition 2 is appli-
cable to functions possessing the two one-sided derivatives at a point x,
which do not necessarily coincide. In elementary mathematical analysis for
functions of several variables (1 < dimX < oo), we use Definition 1 [if h
is a unit vector, then (3) is the ordinary definition of the directional
derivative; if h is a base vector along an axis OXk, then replacing in (3)
the condition A + 0 by A + 0 we arrive at the definition of the partial
derivative oF/oXk] and Definition 4 (the existence of the total differen-
tial).
The generalization of the notion of the derivative to infinite-dimen-
sional spaces was stimulated by the problems of the calculus of variations.
The definition of the first variation oF(x; h) and its notation and name
were introduced by Lagrange. In terms of functional analysis the same
definition was stated by Gateaux and therefore the first Lagrange variation
is often called the Gateaux differential. The requirement of linearity in
the definition of the Gateaux derivative has been generally accepted after
the works of P. Levy. The definition most commonly used in both finite-
dimensional and infinite-dimensional analysis is that of the Frechet deriv-
ative. However, for some purposes (in particular, for our further aims)
theexistenceof the frechet derivative at a point is insufficient and some
strengthening is needed and this leads to the notion of the strict differ-
entiability.
Proposition 1. For Definitions 1-5 and the continuity of functions
the following implications hold:
88 CHAPTER 2

continuity at continuity in a
the point :X neighborliood of
the pointx
and none of them can be converted.
~ The positive part of the proof (i.e., the existence of the
implications) follows at once from the definitions, and the negative part
(the impossibility of converting the implications) is confirmed by the set
of the counterexamples below whose thorough study is left to the reader as
an exercise.
1) 2 does not imply 3: f 1 : R + R, f1(x) ~ lxl.
The variation is nonlinear at the point x ~ 0. The same example shows that
the continuity at a point does not imply the Frechet and the Gateaux dif-
ferentiability.
2) 3 does not imply 4: fz: R2 + R,

{
I, x 1 = x~. x 2 0, >
f, (xu x,) = 0 at all the other points.
This example shows that a Gateaux differentiable function [at the point
(0, 0)] is not necessarily continuous.
3) 4 does not imply 5: f3: R + R,
x• if x is rational,
f. (x) = { 0 if xis irrational.
This function is (Frechet) differentiable at the point x 0 but is not
strictly differentiable. •
Exercises.
1. Show that the function f4: R2 + R defined on polar coordinates on
R 2 by the equality
f 4 (x) = r cos 3lp (x =(xi> x 2) = (r cos lp, r sin lp))
possesses the first Lagrange variation at the point (0, 0) but is not
Gateaux differentiable.
2. If FESD 1 (x), then the function F satisfies the Lipschil;z condition
in a neighborhood of the point x.
3. If FESD (x) is a scalar function (X ~ Y ~ R), then F' (i) exists
1

at almost all the points of a neighborhood of the point ~-


Proposition 2. If mappings Fi:U + Y, i ~ 1, 2, and A: U-+-2(Y, Z),
where Z is a normed linear space, are differentiable in the sense of one
of the Definitions 1-5 (one and the same definition for all the three map-
pings), then for any a; E R , i ~ 1, 2 the mapping F ~ o:1F 1 + o:zFz is differ-
entiable at the point x in the same sense, and, moreover,
F' (x) = a,F~ (x) +a,F; (x)
or

The mapping

is differentiable at the point x in the same sense, and


MATHEMATICAL METHODS FOR EXTREMAL PROBLEHS 89

~. The proof follows at once from the corresponding definitions. •


In the special case when X = R and A: U --+2 (Y, R}=Y* the expression
¢(x) = <A(x), Fi(x)> is a scalar function and the last formula can be re-
written thus:
<A (x), F; (x)>' lx=x =<A' (x), F; (x)> +<A (x), Pi (x)>,
which corresponds to the ordinary differentiation formula.
Below we give two simple examples of computing derivatives.
1) Affine Happing. A mapping A:X + Y of one linear space into another
is said to be affine if there exist a linear mapping A:X + Y and a constant
a E Y ·such that
A (x)=Ax+a.
If X andY are normed spaces and AE2(X,Y), then the mapping A is strictly
differentiable at any point x, and, moreover,
A' (x)=A.
This assertion is verified directly. In particular, if A is linear (a
O), then A' (x) [h] = A(h), and the derivative of a constant mapping (A = 0)
is equal to zero.
RxPrcise 1. Prove the converse: if the derivative (it is sufficient
to understand it in Gateaux' sense) of a mapping A:X + Y exists at each
point xEX and is one and the same for all x, then A is an affine mapping.
2) Hultilinear Happing. Let X andY be topological linear spaces, and
lat 2" (X, Y) be the linear space of the continuous multilinear mappings of
the Cartesian product Xn=Xx ... xX into Y. We remind the reader that a
~
n times
mapping IT(x 1 , ••• ,xn) is said to be multilinear if the mapping

1s linear for every i. If X and Y are normed spaces, the continuity of IT


equivalent to its boundedness, i.e., the finiteness of the number
I II II= II x, II < 1 ••sup
, . , II Xn II < 1
I II (xu · · ·, Xn)ll· ( 10)

In this case 2" (X, Y) is a normed linear space with the norm ( 10), and
the inequality
( 11)

holds.
The function Qn(x) = IT(x, ... ,x) is called a form of degree n (a qua-
dratic form for n 2 and a ternary form for n = 3) corresponding to the
multilinear mapping IT. It follows from the definitions that
n
Qn(x+h)=Qn(x)+~ II(x, •.. , h, . .. , x)+o(llhll2 ), (12)
i=l

and therefore Qu(·) is Frechet differentiable at each point x and


n
Q~ (x)fh J= ~
i=l
rr (x, ... , h, ... , x) (13)

[in the formulas (12) and (13) the argument h occupies the i-th place in
the i-th term of the sum, the other arguments being equal to x]. If the
mapping IT is symmetric, i.e. , IT (xl, ... ,xu) does not change under an arbi-
trary permutation of the arguments Xi, then (13) turns into
90 CHAPTER 2

Q~ (x)[h l =niT (h, x, ... , x).


In particular, if X is a real Hilbert space with a scalar product
(·I·) and if AE2(X,X) and IICx1, xz) = (Ax11xz), then Qz(x) = (Axlx) (the
quadratic form of the operator A) is differentiable and
Q~(-~) fhJ =(Ax 1 h> +<Ah lx> = <Ax+A•x 1h>•
and therefore Q~(x) can naturally be identified with the vector AX + A*x.
For instance, for Qz(x) = 1 /z(xlx) 1 /zllxll 2 the derivative is Q'(x) x.
Exercise 2. Prove that Qn(x) JI(x, ... ,x), where IIE.Zn(X, Y) is
strictly differentiable for all x.
The proposition below implies a more complicated example.
Proposition 3. Let U c 2(X, Y) be the set of those continuous linear
operators A:X + Y for which the inverse operators A-~ E 2 (Y, X) exist. If
at least one of the spaces X and Y is a Banach space, then U is open and
the function <!>(A) = A- 1 is Frechet differentiable at any point A E U, and,
moreover, <I>'(A)[H] = -A- 1HA- 1 . ~
Proof. If X is a Banach space, then the sen.es -~ (-A -1fl)k is con-
k=O
vergent in 2' (X, X) for IIHII < IIA- 1 11- 1 and

(A +H) [k~O (-A -1fl)k A-1] =I Y•

l; (-A-lfl)kA-1] (A+H)=Ix,
[ k=O
which can readily be verified. Consequently,
(A+H)-1= l; (-A-1fl)kA-l
k=O
( 14)

for IIHII < IIA- 1 11- 1 , and therefore B(A, JJA- 1!i- 1}cU and U is open.
If Y is a Banachspace, then (14) should be replaced by

(A +H)-1 = A-1 l; (-HA-1)".


k=O
Further, from (14) we obtain

IJ(A+H)-1- A-1 +A-1HA-1JJ=t~. (- A-1H)"A -111~


&± (IJA-11/IIH!I)kiiA-1II =
""k=2'
IIA-:II"IIHII'
I-IIA- 1 1111HII
=o(!IH!I)

for H + 0, and consequently the function <!>(A) = A- 1 is differentiable and


its Fr~chet derivative is <I>'(A)[·] = -A- 1 (·)A- 1 • •
Exercise 3. Prove that <!>(A) is strictly differentiable at any point

2.2.2. Theorem on the Composition of Differentiable Mappings. The


Composition Theorem. Let X, Y, and Z be normed spaces; let U be a neigh-
borhood of a point x in X, let V be a neighborhood of a point y in Y, let
cp: U-+V, cp(x)=y, and 'ljl: y_,.z, and let f='ljlocp: u_,.z be the composition
of the mappings t:p and 1ji.
If 1ji is Frechet differentiable at the point y and cp is Frechet dif-
ferentiable (or is Gateaux differentiable or possesses the first variation
or possesses the derivative in a direction h) at the point x, then f pos-
sesses the same property as <{J at the point x, and, moreover,
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 91

( 1)
or
r (x; h)=IP' <u)[~p' <x; h)J. (2)
If 1jJ is strictly differentiable at y and <p is strictly differentiable
at x, then f is strictly differentiable at x.
~. We shall consider in detail the two extreme cases: the direc-
tional derivative and the strict differentiability.
A) By the definition of the Frechet derivative,
II' (y) =II' (J;)+IP' (y)[y-y] +a(y) IIY -ull.
where
lim a(y)=a(y)=O ..
g ... g
If the limit

exists, then

lim t(x+A.h)-f(X) lim 'IJ{cp'(x+Mll-lll(Yl


qo A. qo A.
=lim 'IJ'(y)(cp(x+A.h)-yJ+a~cp(x+MHllcp(x+Ml-'YII =.
AtO

=IP'(y) [lim cp(x+M>-Y] +lima(<p(x+t.h))limjj cp(x+'A.hl-y


.qo A. qD qo 'J...
II=
= '~'' (y) [ ql' (x; h)] +a <u> 11 <p' (x; h) II'= II'' (y)[~p' (x; h)],
which proves (2).
B) For brevity, we shall denote A1 =<p'(x), A2 =1i''(y). By the definition
of the strict differentiability, for any E1 > O, E2 > 0 there are 8 1 > 0,
82 > 0 such that
11x~-x11 <1\, llx.-xll < 61 =?II <p (x~)-<p (x.)-At (xl -x.)ll < e1ll xi-x.11 (3)
and
1/Yt-YII < 6,, llu.-ull < 6. =?IIIP(Yi)-'ljl (y.)-A. (yl-y.)jj <e. II Yt-Yall· (4)
Now for any E > 0 we choose E1 > 0 and E2 > 0 so that the inequality
e1IIA.JI-t:e.IIA1il+e1e. <e
holds, and for these E1, E2 we find 01 > 0 and 02 > 0 such that the rela-
tions (3) and (4) hold; finally we put

6=min(6i, et~I.Atll).
If llx1 - x\1 < 8 and llx2 - :XII < 8' then, by virtue of (3)'
I ~p <x~)-~p (x.) II <II 1p (x~)-<p (x.) -A~ (xi-x.) I +II Ai (xi-x.) II<
< e1ll xl -x.ll +II A1llll xl -x.ll =(II Ai II+ el) I xi -x.ll· (5)
Putting x 1 = x and x2 = x in this inequality, we obtain

ll~p(x;)-~p(x)ll=llrr(x;)-YII<(IIAil!+ei)6<~S.. i = 1, 2,
and thus (4) holds for Y;=<p(X;) . Now using (4), (3), and (5) we obtain
fl t (x1)-f (x.)-A,Al (xl -x,)II<IIIP (q> (xi))-'¢ (q> (x,)) -A. (q> (xi)-<p (x,))ll +
+~A, (<p (x 1)-<p (x,))-A,Ai (xi-x,)\1 <
92 CHAPTER 2

< e,JJ cp (xt)-cp (x,) II+ I A,llll cp (xt)-cp (x,)-A 1 ~xi-x,) II~
< e, (II A1 l +e1)llx1 -x,ll +I/ A,Jje1 jjxi-x,JI =
= (e,IIA 1 ii+B18 2 +II A~J/e 1 ) IIX1 -x,ll <ell X1 -x,ll.
which implies the strict differentiability of f at the point ~-
Putting xz = x and x1 = x + h in this argument, we arrive at the proof
of the theorem for the case of the Frechet differentiability of ,q> • The
remaining assertions are proved by analyzing equality (2), which we have
already proved. •
The counterexample below shows that in the general case the composi-
tion theorem does not hold when w is only Gateaux differentiable.
Example. Let X= Y = R2 , Z = R;
rp(x)=(rp,(x1 , x,), rp,(xl' x,))=(xf, x,);
f I for Y1 =yi, y, > 0,
1jl (y) = 1jl (y,, y,) = \ 0 in all the other cases

(cf. the counterexample in the proof of Proposition 1 of Section 2.1.1). Here


rp is Fr~chet differentiable at the point (0, 0) and even strictly differ-
entiable (check this!) and w is Gateaux differentiable at (0, O); however,
the function
I for x, =I xl I > 0,
) ljl (X~, X ,) = 0 in all the other cases
{
f (X=
) (11> 0 'P) (X=
is not Gateaux differentiable at (0, 0) [and even does not possess the de-
rivatives in the directions h = (1, 1) and h = (-1, 1) at that point].
COROLLARY. Let X andY be normed spaces, and let U be a neighborhood
of a point y in Y. Let f:U ~ R be Frechet differentiable at the point y,
and let AE2(X,Y). Then foA: A- 1 (U)-+R is Fn~chet differentiable at the
point x = A- 1 (y) and
(f o A)' (x) = A• (f' o A) (x). (6)
Proof. According to the definition,
(f' o A) (x) =f' (Ax) =f' (y) E 2(Y, R) = Y•
and
(f o A)' (x) = f' (Ax) o A= f' (y) o A E 2 (X, R) = X•.
Now for any hE X we have
<A• (f' o A) (x), h> = <(f' o A) (x), Ah> = f' (y) [ Ah] =
= (f' (y) o A) h = <f' (y) o A, h> = <(f o A)' (x), h>,
which proves (6).
2.2.3. Mean Value Theorem and Its Corollarjes. As is known, for
scalar functions of one variable Lagrange's theorem holds (it is also
called the mean value theorem or the formula of finite increments): if a
function f:[a, b)~ R is continuous on a closed interval [a, b) and differ-
entiable in the open interval (a, b), then there exists a point cE(a, b) such
that
j(b)-f (a)=!' (c)(b-a). (1)
It can also be easily shown that the formula (1) remains valid for scalar
functions f(x) whose argument belongs to an arbitrary topological linear
space.
In this case
[a, b]={xlx=a+t(b-a), O~t<l},
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 93

and the open interval (a, b) is defined analogously; the differentiability


may be understood in Gateaux' sense. Putting
<D (t) = f (a+ t (b-a)),
we reduce the proof to the case of one real variable.
For vector-valued functions the situation is quite different.
Example. Let the mapping f:R + R2 be determined by the equality f(t) =
(sin t, -cost). Then for every t there exists the (strict) Frechet deriva-
tive f'(t) (prove this!):
f' (t) [aJ=(cos t, sin t)a=(acos t, a sin t).
At the same time, for any c we have
f (2n) -f (0) = 0 =f= f' (c)[2n -0] = (2n case, 2n sin c),
and hence the formula (1) does not hold.
However, it should be noted that formula (1) itself is used in mathe-
matical analysis much rarer than the inequality lf(b)- f(a)l ~ Mlb- al
[where M = suplf'(x)l] following from it. We shall show that in this weak-
ened form the proposition can be extended to the case of arbitrary normed
spaces. By tradition, the name the "mean value theorem" is retained, al-
though it would be, of course, more correct to call it a "theorem on the es-
timation of a finite increment."
The Mean Value Theorem. Let X andY be normed linear spaces, and let
an open set U ~·X contain a closed interval [a, b ].
If f:U + Y is a Gateaux differentiable function at each point xE [a, b],
then
!lf(b)-f(a)JI::( sup !If' (c)l!llb-all· (2)
c E [a. b)

Proof. Let us take an arbitrary y• E Y• and consider the function


<D(t)=<y•, f(a+t(b-a))>.
This function possesses the left-hand and the right-hand derivatives at
each point of the closed interval [0, 1]:
<D~ ( t) =lim (lJ (1-a)-(lJ (t) =lim (y•, f (a+ (t -a) (b-~~- f (a+ t (b-a)) ) ==;
<ttO -a <ttO

=-~,
I • 1.
~
f(a+t(b-a)-a(b-a))-f(a+t(b-a)).)
. =
atO a
= - <y•, f' (a+ t (b -a))[- (b -a)]>= <y•, f' (a +t (b-a))[b -a]>,
and similarly
<D:(t)=lim (lJ(t+a~-(lJ(t) =<y*, f'(a+t(b-a))[b-a]>.
a<D
Since these derivatives coincide, the function ~(t) is differentiable (in
the ordinary sense) on the closed interval [0, 1] and therefore it is con-
tinuous on the interval. By Lagrange's formula, there exists 9E(0, 1) such
that
<y*, f (b) -f (a))= <D ( 1) -<D (0) = <D' (9) = <y•, f' (a+ a(b -a))fb-a]>.
Now we shall make use of Corollary 1 of the Hahn-Banach theorem (Sec-
tion 2.1.3), according to which for any element YEY there is a linear
functional y• E Y• such that lly*ll = 1 and <y*, y> = llyll. Taking this very
functional y* for the element y = f(b)- f(a), we obtain
llf (b)-f (a) II= <!f, f (b)-f (a)>= <y•, f' (a +9 (b-a)) [11:-a]> ~
::(II y• ITll f' (a +9 (b-a)) Ill\ b-all::(ce[a,
supbj l!r (c) llllb-aJJ,
94 CHAPTER 2

which is what we had to prove. •


We shall present several corollaries of the mean value theorem.
COROLLARY 1. Let all the conditions of the mean value theorem be ful-
filled, and let AE.2'(X, Y). Then
~((b)-f(a)-A(b-a)l/~ sup /If' (c)-A/1//b-a/1. (3)
ce(a, b]

~ The proof reduces to the application of the mean value theorem


to the mapping g(x) ; f(x) -Ax. •
COROLLARY 2, Let X and Y be normed spaces, let U be a neighborhood of
a point x in X, and let a mapping f:U ~ Y be Gateaux differentiable at each
point xEU • If the mapping x ~ fr(x) is continuous [with respect to the
uniform topology of operators in the space .2'(X, Y) ] at the point x, then
the mapping f is strictly differentiable at x (and consequently it is
Fr~chet differentiable at that point).
~ Given E > 0, there is o> 0 such that the relation
~x-xll < 6 ~llfr(x)-fr(x)ll < e (4)
holds. If llx1- xll < o and llxz- xll < o, then for any X=Xt+t(x1 -X1)Efxi,
x.], 0~ t ~I, we have
llx-xJI = llx1 + t (x.-x~>-xll= 11 t (x.-x> +(l-t)(x1-x>ll ~
~ tllx.-xJI+(l-t)l/xl-xll < t6 +(l~t)6 =6,
and therefore, by virtue of (4), we have llfr(x)- f'(x)ll <E.
Applying Corollary 1 to A; fr(x), we obtain
IJf(x~)-f(xa)-ff(X) (Xt-X2 )11 ~ sup llfr (x}-frfx)lll!xt-Xall ~eiJxi-xaJI,
· xe(x1 , xa]

which implies the strict differentiability of f at x. •


Definition. Let X andY be normed spaces. A mapping F:U ~ Y defined
on an open subset U e-X is said to belong to the class C1 (U) if it pos-
sesses the derivative at each point xeu and the mapping X~ F' (x) is con-
tinuous (in the uniform topology of operators).
Corollary 2 indicates that here it is unnecessary to stipulate which
derivative is meant. This fact is always used when the differentiability
of concrete functionals is verified: the existence of the Gateaux deriva-
tive is proved and its continuity is verified, which guarantees the strict
differentiability (and hence the existence of the Frechet derivative).
Exercises. Let the conditions of the mean value theorem hold, let
y = Rn, and let the mapping x ~ fr (x) be continuous in Uc X • Then
I
1) f(b)-f(a) =) tr (a+t (b-a))dt [b-a]. (5)
0

2) There exists an operator AEZ(X, Y) belonging to the convex closure


(see Section 2.6.2) of the set {f(.(.J),.JE[a,bJ} such that
f (b)-1 (a)=A (b-a).
3) Generalize 1) and 2) to the case of an arbitrary Banach space y by
defining appropriately the integral in (5).
2.2.4. Differentiation in a Product Space. Partial Derivatives. The
Theorem on the Total Differential. In this section X, Y, and Z are
normed spaces. We shall begin with the case of a mapping whose values be-
long to the product spaceY x Z, i.e.,F; U-+YXZ, Uc.X. Since a point of
Y x Z is a pair (y, z), the mapping F also consists of two components:
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 95

F(x) = (G(x); H(x)), where G:U + Y and H:U + Z. The corresponding defini-
tions immediately imply:
Proposition 1. Let X, Y, and Z be normed spaces, let U be a neighbor-
hood of a point x in X, and let G:U + Y and H:U + Z.
For the mapping F = (G, H):U + Y x Z to be differentiable at the point
x in the sense of one of the Definitions 1-5 of Section 2. 2.1 it is neces-
sary and sufficient that G and H possess the same property. Moreover, in
this case

r <x> =<a' <x>. n· <x»


or
F' h)= <x.
h), n· h)). <a' <x. <x.
Now we pass to the case when the domain of the mapping F:U + Z belongs
to the product space: Uc.XxY.
Definition. Let X, Y, and Z be normed spaces, let U be a neighborhood
of a point (x, y) in X x Y, and let F:U + z.
If the mapping x + F(x, y) is (Frechet, Gateaux, or strictly) differ-
entiable at the point x, its derivative is called the partial derivative of
the mapping F with respect to x at the point (x, y) and is denoted Fx(x, y)
or ~~ (x, y);

The partial derivative

with respect to y is defined in an analogous manner.


Theorem on the Total Differential. Let X, Y, and Z be normed spaces,
let U be a neighborhood in X x Y, and let F:U + Z be a mapping possessing
the partial derivatives Fx(x, y) and Fy(x, y) in Gateaux' sense at each
point (x, y) E W.
If the~mappings. (x, y) +.Fx(x, y) and (x, y) + Fy(x, y) are continuous
at a point (x, y) E U 1n the un1form topology of operators, then F is strictly
differentiable at the point, and moreover,
F'(x, .Y>Is. 1'J.l=F,.(x, iiHsJ +F11 (x, i/)[ 11 ].
Proof. Given an arbitrary £ > 0, let us choose o> 0 so small that
the "rectangular" neighborhood
6)x8(!,, 6) = {(x, u> lllx-xll < 6, llu-.Y~-< 6}
v = B(x,
of the point (x, y) is contained in U and the inequalities
IJF,.(x, y)-F,.(x, y)IJ<e, JJF11 (x, y)-Fy(x, y)JJ<£ (1)
hold in that neighborhood. Now we have
A=F (x1 ,y1)-F (x 2 ,y2 )-Fx(x, y')[ X -x,]-Fy(X., y}[y1-Y2l=
1

=[F(x1 , Y1 )-F(x,, Yl)-Fx(~. y)[xi-x2 J1+


+[F(x2, Y1)-F(x2, Y2)-:-F 11 (x, y)[yi-y.]].
It can readily be seen that if the points (xl, Yl), (x2, Y2) belong to V,
then (X 2 , y1 ) EV·, and moreover the line segments [ (xl, Yl), (x2, Y1)] and
[ (X2, y 1 ), (x 2 , Y2)] are contained in Vc.U . Therefore, the functions x +
F(x, y 1) andy+ F(x2, y) are Gateaux differentiable: the first of them
possesses the derivative Fx on [x1, x2l and the other possesses the deriva-
tive Fy on [y 1 , Y2l· Applying the mean value theorem to these functions
96 CHAPTER 2

[in the form of inequality (3) of Section 2.2.3 with the corresponding A]
we obtain, by virtue of (1), the relations
(x1 , y1)EV, (x., Y2 )EV=:>II~II~ sup {llfx(s, Yt)-Fx(x, Y)illlxt-x.JI}+
~e[x., x,]

+ sup {llfu(x., ·TJ)-Fy(x, y)IIIIYt-Y.II} .~ellx~-x.l!+el!yt-Y.II· •


TJE[y,, y,]

COROLLARY. For F to be of class C1 (U) it is necessary and sufficient


that the partial derivatives Fx and Fy be continuous in U.
Both Proposition 1 and the theorem on the total differential can easily
be generalized to the case of a product of an arbitrary finite number of
spaces.
Let us dwell on the finite-dimensional case. Let F:U + Rm be defined
on an open set U~Rn. Since we naturally have
Rm=Rx ... xR, Rn=Rx ... xR,
~ ~
m times n times
it is possible to use both Proposition 1 and the theorem on the total dif-
ferential. If

then

~
r
':· ,., ...
aF 1 -

(2)

ap!!L(x)
ax1

The (m x n) matrix

is called Jacobi's matr-ix of the mapping Fat the point x. It is readily


seen that this is nothing other than the matrix of the linear operator
F'(x):Rn +RID corresponding to the standard bases in Rn and Rm.
In classical mathematical analysis the notation

dX1 )

h=dx = ( :

dxn
is used and formula (2) has the form
- =""'ax:
dF (x)
i- aF (x) dx1 . A

i=l J

The assertion proved in this section is a generalization of the well-


known theorem stating that the existence of continuous partial derivatives
is a sufficient condition for the differentiability of a function of sev-
eral variables.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 97

2.2.5. Higher-Order Derivatives. Taylor's Formula. In this sec-


tion X and Y are normed spaces and U is an open set in X. The differenti-
ability is understood in Frechet's sense.
If a mapping f:U + Y is differentiable at each point xEU. then the
mapping f' (x): U--+ .2:'(X, Y) is defined. Since .2:'(X, Y) is also a normed
space, the question of the existence of the second derivative
f"(x)=(f')'(x)E.!?(X, !f(X, Y))
can be posed. The higher-order derivatives are defined by induction: if
f(n-l)(x) has already been defined in U, then
f'n>(x)=(f'n-1l)'(x)E2(X, ... , !f(X, Y) ... ).
~
n times
~ pefinition 1. Let f:U + Y. We shall say that f(n) exists at a point
xEU if f'(x), f"(x), ... ,f(n-l)(x) exist m a neighborhood of that point
and f(n)(x) exists.
If f (n) (x) exists at each point X E U and the mapping x + f (n) (x) is
continuous in the uniform topology of the space !f(X, ... ,!f(X, Y) ... ) (gener-
ated by the norm), then f is said to be a mapping of class clnJ(U). '
In what follows we shall need some properties of continuous multilin-
ear mappings [see Example 2) of Section 2.2.1).
Proposition 1. The normed spaces £'n (X, !fm (X, Y)) and .2:'n+m (X, Y) are
isometrically isomorphic.
Proof. If :rtE£:n(X, !fm(X, Y)), then :n(xi, ... , Xn)E!fm(X, Y) and the
equality
( 1)
specifies a multilinear mapping II of the space Xn+m=Xx ... xX into Y.
~
n + m times
Conversely, every such mapping II specifies [with the aid of equality (1)]
a multilinear mapping 11 of the space xn = Xx ... xX into the space of mul-
"----v---"
n times
tilinear mappings of xm into Y. It only remains to note that
II II II= sup II II (x1, .. ·, Xn+mlll=
II X .I/-<:: I
llxn+mll<l
= sup
llx,ll-<:lilxn+•ii-<:1
sup :rt (X1, · · ·, Xn) [Xn+i• · · ·, Xn+m.l =

= sup
llx1 11-<: I
llxnll< I
COROI.I.ARY 1. £~(X, !f(X, ... , !f(X, Y) ... ))is isometrically isomorphic
ntimes
to !fn (X, Y).
Thus, we can assume that f<n>(x)E£'n(X, Y). The values of this multilin-
ear mapping on the vectors (sl,·••,sn) will be denoted by f(n)(x)[s 1 , ••• ,
snl· In accordance with the inductive definition,
tn> (xo) [£v ... ' Sn] =lim f'n-1) (xo+a(;l) [(;,, ... , Gn]-[(n-1) (xo) [(;,, ... , Gnl.
czj,O IX

For an arbitrary IIE!fn(X, Y) and an arbitrary collection of k in-


dices (i 1 , ••• ,ik) each of which assumes one of the values 1, 2, ... ,n we
denote
98 CHAPTER 2

Proposition 2. If IlE.Zn(X, Y) and Q(x) = IT(x, ... ,x) then


n
Q' (x)[h 1] = ~ Ildx; h1),
i= I
n n
Q" (x) [hl, h2] = ~ ~
i=l i=l
rrij (x; hl, hz),
;,#
(2)
Q<nl (x) [ht, •.. ' hn] = ~
(1, •.••• In)
rr (ht, • ... ' ht.).
Q<kl(x)=O for k>n
(in the formula for q(n) the summation extends over all the permutations
of the indices).
These formulas are proved by means of the direct calculation (also cf.
the example of Section 2.2.1). It should be noted that, as (2) shows, each
of the derivatives q(l) is a symmetric function of (h 1 , ... ,hz). As will be
seen below, this is not accidental.
Proposition 3. If a mapping I1E2"(X,Y) is symmetric and Q(x) =
IT(x, ... ,x) :: 0, then IT :: 0.
~ Let us put
cp (t 1 , ••• , t.) = Q (t 1 X 1 -t. ... + t.xn) = ~
i1, ... , in
it,· . . it.IT(xt,; .. x;.)=

~ t~· ... t~" ~~IT (Xt, • ••. , x;.) =


o<.ac<n
I;a;=n
~ i~' ... t~"Na, ... a.IT(X1 , ••• ,X11 ••• ,x., ... ,x.),
0 < ai < n -..._,.._.... -..._,.._....
I;a 1=n a 1 times «n times
where in the sum~~ the terms are collected among whose indices i 1 , ... ,in
there are a1 indices equal to unity, ... ,n- an indices equal ton, and
Na 1 ... an is the number of the terms in this sum (in the passage to the last
equality we have used the symmetry of IT).
By the hypothesis, we have <p (t1 , •.• , t.) = 0 , and therefore all the co-
efficients of the polynomial ~ are equal to zero. In particular, the co-
efficient n!IT(xl, ... ,xn) in t1t2 ... tn is also equal to zero. •
Theorem on the Mixed Derivatives. If a function f:U 7 Y possesses
the second derivative f"(x), then
(3)
for all ~. 1] EX.
Proof. According to the definition of the second derivative,
i' (x)-f' (x) = (f')' (x)[x-x] +a(x}[[x-x[[,
where a(x.)E;2'(X, Y) and
lilll a (x) =a (x) = 0. (4)
X->X

For sufficiently small n and x close to x, then function


cp (x) = f (x +'tJ) -f (x)
is defined and we have
cp'(x) = f' (x + tJ) -f' (x) = f'(x + 1J) -f'(x) -(f' (x)-:f'(x))=
= (f')' (x) [x+ 1]-x] +a (x+ 1]) llx+ "YJ-x[[-
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 99

-(f')' (x) [x-x]-a.(x)llx-xll~= (f')' (x) [TJ] +a. (x+TJ)~x-x+ TJ~-a.(x)llx-xll·


In particular,
cp' (x) = (f')' (x) [ TJ] +a. (x + TJ) I TJ 11. (5)
and therefore
cp' (x)-cp' (~)=a. (x + 1J) llx-x + 1J ~-a. (x)ll x-ill-a. (i+ 1J) I TJ II· (6)
By virtue of (4), for any E > 0 there is 6 > 0 such that
llx-xll< 6::::::;>lla.(x)ll < 8, (7)
whence, by virtue of (6), it follows that
11x-x11 < 6;2,
~TJII < 6/2::::::;>Jicp' (x)-cp' (i)ll < 28(11x-xii+IITJII)· (8)
For sufficiently small ~ and n, the second difference
t. (TJ; ~> =t <x+~ +TJ>-t <x+~>-t <x+ TJ) +t <x> (9)
makes sense, and using (5), (7), and (8) we obtain for 11~11 < 6/2, lin II <
6/2 the inequalities
~& (TJ, ~>-r<xH '~· m=ll cp <x + ~)-cp <x>-«f'>' <x)[T)]) r~111;,
m
=llcp<x+~>-cp <x>-cp' <x> +a. <x+TJ> 1 TJ 1111 ~ m
~ sup licp' (x)-cp' <x>llll~ll+lla.<i..+TJ)JIIISIIIITJII~
xe[X', x+ !;]
~28 Gl~ll+ll TJ Iilli ~ 11+8JJ~IIli1JII ~ 38 <II~ 11+11 TJ!I> II~ II·
Replacing ~ and n by t~ and tn, where now ~ and n can be arbitrary
fixed values and tER must be small, we obtain the inequality
( 1 O)

Interchanging ~ and n in the above argument, we obtain, in addition


to (10), the inequality
( 11)

Now since the second difference (9) is symmetric with respect to ~ and n,
we have L\(tn, tO = L\(t~, tn) and on canceling by t 2 we derive from (10)
and (11) the inequality
IJf"<x)[TJI ~]-f"(x)[~. TJH::;;;ae(Jis!I+~TJII>"·
By the arbitrariness of E, this implies (3). •
The name given to this theorem is accounted for by the fact that in
the classical mathematical analysis to this theorem there corresponds the
theorem on the independence of the mixed derivative of the order of differ-
entiation: a 2 f/axay = a 2 f/ayax (why?).
COROLLARY 2. The derivative f(n)(x) (provided that it exists) is a
symmetric multilinear function.
Proof. For n = 2 this assertion was proved in the above theorem. The
further proof is carried out by induction using the equalities
((f<n-l>) (x) [G)) [ 1J1• ...• TJn-1l = f'n> (x) [si• TJ1• · · ·• TJn-d=
= ((f<n-•>)" (x) [s. TJt]) [ 1J., · · ·• 1Jn-d·
By the induction hypothesis, the left-hand equality implies the invariance
of f(n)(x)[~, n 1 , .•. ,nn- 1 l with respect to the permutation of the arguments
ni, and from Theorem 1 and the right-hand equality follows the invariance
with respect to the transposition of ~ and n1· This is sufficient for
proving the symmetry of f(n)(x). •
100 CHAPTER 2

Passing from a symmetric multilinear function to the form correspond-


ing to it (see the example of Section 2.2.1), we obtain the differential
from the derivative:
Definition 2.
dnt(x0 ; h)=t"(x0 )[h, ... , h].
Example. If Q(x) =II(x, ..• ,x), where IIE2'P(X,Y}, then, according to
Proposition 2,
dkQ (0; h)= 0 for k =I= n
and
d"Q (0; h)=k!Q (h).

LEMMA. If f(n)(x) exists and f(x) = O, f'(x) = O, ... ,f(n)(x) 0,


then
I. llf(x~+h)ll O
h~ ilhlln = '

Proof. For n = 1 the assertion is true by the definition of the de-


rivative. The further proof is carried out by induction. Let n > 1. Let
us put g(x) = f'(x). Then g(x) = O, ... ,g(n-l)(x) = 0, and, by the induc-
tion hypothesis, for any £ > 0 there is o > 0· such that
~g(x+h,)!l < e~h,lln-• for I: hill< 6.
Now, by the mean value theorem, for llhll < o we obtain
lit (x+h)~ =lit (x+h)-t (x)-f' (x) hJI~ sup lit' (x-t-h,)-f' (x) 1111 hJI =
h,E[0,/1]

= sup JJg(x+h,)JlJlhJl~ sup eJ/h1 //"- 1 //h//=eJlh//n . •


ll,e[0,/1] h,e[O,h]

Theorem on Taylor's Formula. If f(n)(x) exists, then


~ ~ ~ I ~

t (x+h) =t (x)+df (x; h)+ ... +tiid"f(x; h)+a:n(h)JJhiJ",


where
lim a (h) =an (0) = 0.
h->0

Proof. Let us consider the function


g<s> = t <x +6>-t <x>-dt <x; s>- ... -kd"t <x; 6>.
Using the above example and the equality dkt(x; 6)=flkl(x)[6,
~ I ~ ~
... , n f!kl <x> e
2'k(X, Y), we obtain dkg(O; h) =dkf(x; h)-ki(k!dkt(x; h))=O.

Applying Corollary 1, according to which g(k)(o) is a symmetric func-


tion belonging to _g:>k (X, Y) , and Proposition 3, according to which a symmet-
ric function vanishes when the corresponding form vanishes, we conclude
that g(O) = O, ... ,g(n)(o) = 0. The lemma implies that tlg(!;)tl = o(tl!;tln),
and the assertion of the theorem is thus proved, and, moreover, an(h) =
g(h)/llhtJn for h ~ 0. •
Exercise. If f(n)(x) exists in a starlike neighborhood of a point x,
then

and
MATHEMATICAL METHODS FOR EXTREMAL PROBLENS 101
Hint. One of the possible ways of solving the problem is the follow-
ing: for n > 1 the derivative f'(x) is continuous in a neighborhood of the
point x, and we can make use of the equality
1
t (x+h)-f(x)= ~ f' (xo+th) dt[hl
0

(see the exercise in Section 2.2.3).


The inequalities of this problem are generalizations of Taylor's
classical formula with the remainder in Lagrange's form, whereas the above
theorem implies the same formula with the remainder in Peano's form.
The proof of the existence of Frechet higher-order derivatives for
concrete mappings is often connected with cumbersome calculations. At the
same time, the calculation of the derivatives, or, more precisely, the dif-
ferentials, i.e., the corresponding homogeneous forms, is usually carried
out comparatively easily. Nost often this is performed with the aid of the
notion of the Lagrange variation.
Definition 3. A function f:U + Y is said to have the n-th Lagrange
variation at a point xEU if for any hEX the function <p(cx.)=f(x+cx.h) is
differentiable at the point a = 0 up to the order n inclusive. By the n-
th variation of f at the point x we mean the mapping h + onf(x; h) deter-
mined by the equality

(12)
COROLLARY 3. If f(n)(~) exists, then f possesses the n-th Lagrange
variation at the point x,
and, moreover
(13)
~ The proof is readily carried out by induction with the aid of
Taylor's formula.because this formula implies

t (x+a.h) =f (x) +ndf (x; h)+ ..• + ~; dnf (x; h)+o (an).
Thus, if it is known in advance that the derivative f(n)(x) exists,
then d(n)f(x; h) can be computed with the aid of formulas (12) and (13) on
the basis of the differential calculus of functions of a scalar argument,
which is of course much simpler. According to Proposition 3, the deriva-
tive f(n)(x) (i.e., the corresponding multilinear function) is uniquely
determined by the differential dnf(x; h).
The converse is not, of course, true: a function possessing the n-th
variation does not necessarily have the Frechet derivative f(n).
Let the reader find what additional properties the n-th variation must
possess for the analogue of the sufficient condition for differentiability
(Corollary 2 of Section 2.2.3) to hold.

2.3. Implicit Function Theorem


The generalization of the classical implicit function theorem, which
is proved in this section, can be fruitfully used in various branchesof
mathematical analysis. In what follows we shall have a chance to see this.
2.3.1. Statement of the Existence Theorem for an Implicit Function.
Let X be a topological space, let Y and Z be Banach spaces, let W be a
neighborhood of a point (xo, yo) in X x Y, let ~be a mapping from W into
Z, and let ~(xo, yo) = zo.
102 CHAPTER 2

If
1) the mapping x + ~(x, yo) is continuous at the point xo;
2) there exists a continuous linear operator A:Y + Z such that, given
any £ > 0, there is a number 8 > 0 and a neighborhood ~ of the
point x 0 possessing the property that the condition x E 8 and the
inequalities lly' -yo II < 8 and lly"- yo II < 8 imply the inequality
ll'l'(x, y')-'l'(x, y")-A(y' -y")IJ < ejjy' -y"IJ;
3) AY = Z;
then there is a number K > 0, a neighborhood '11 of the point (xo, zo) 1.n
X x Z and a mapping <p:'U-+Y such that:
a) 'I' (x, 1p (x, z)) = z;
and
b) jj<p(x, z)-Yoii::::::KII'I'(x, y0 )-zJI.
2.3.2. Modified Contraction Mapping Principle.
LEMMA. Let T be a topological space, let Y be a Banach space, let V
be a neighborhood of a point (to, yo) in T x Y, and let~ be a mapping from
V into Y. Then if there exist a neighborhood U of the point to in T, a
number 13 > 0, and a number 6, 0 < 6 < 1, such that tEU, and lly- y 0 11 < 13
imply
a) (t, <D (t, y)) E V,
b) IJ<D(t, <D(t, y))-<D(t, u)/1~6/J<D(t, y)-ull.
c) Jj<D(t, Yo)-Yoll<~(l-6),
then the sequence {yn(t)ln > 0} defined by the recurrence relations
Yo(i)=Yo• Yn(t)=<D(t, Yn-dt))
is contained in the ball B(yo, 13) = {y Illy - yo II < 13} for any t E U and is
convergent to a mapping t ,_ 1p (t), uniformly with respect to t E U, and, more-
over,

Proof. The lemma is proved by induction. Let t be an element belong-


ing to U. Let us denote by fn the assertion "the element Yic(t) is defined
for 0 < k <nand belongs to "§(yo, 13)." Since y.(t)=y0 EB(y0 , ~), the as-
sertion ro is true. Let fn be true. Then for 1 ~ k < n we have
del del
~Yl+ 1 (t)-y~(t)JI=II<D(t, y,.(t))-ct>(t, y,_dt)H=
=1/<D(t, <D(t, y,_dt))-ct>(t, y,_dt))/1~
def
~ 6JJ<D (t, Y~t-i (t))-y,_i (t) II =6 I y,. (t)-Yk-i (t) II·

Whence, continuing the above arguments, we obtain


IIYHi (t)-y,. (t)JJ~6JJ y, (t)-Y~-i (t) II~
IIY,.-i (t)-Yk-s (t) II~···~ 6"1/Yt (t)-Yo/1=6~ IJ<D (t, Yo)-Yo~·
~6 2

Now let k > 1, k + Z < n + 1; then, by the triangle inequality,


IIYk+l (t)-y, (t) I =IIYk+l (t) -Yk+l-l(t) + Yk+l-i (t)- ... + Yk+l (t)- y,. (t) II~
~(6k+l-1+ ... +6")//<D(t, Yo)-Yoll<

< (~ e•)ii<D(t, Yo)-Yoll= 1~ 6 1/<D(t, Yo)-Yoil < ~· (1)


MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 103

In particular, putting k + Z= n + 1, k = 1, in (1) and using the condition


c), we obtain
IIYn+i (t)-Yo II =II Yn+i (t) -Yi (t) + Y1 (t)-Yo ~ ~
~IIYn+l (t)-Yi (t)II+IIYl (t)-Yoll < Ofl+~(l-9) =P..
Hence rn+l is true, whence, by induction, the assertion rn is true for
all n. Therefore, we obtain from (1) the relation

(2)

From (2) it follows that the sequence {Yk(t)lk ~ 0} is fundamental and, by


virtue of the completeness of Y, it is convergent to an element which we
denote by q>(t). Passing to the limit in (2) for Z + oo, we conclude that
(3)

From (3) it follows that the mapping t.-y,.(t), tEU, is uniformly convergent
to the mapping t.-+q>(t). Finally, putting k = 1 in (3), we obtain
IIIJ> (t)-Yo II= II 1J> (t)-yl (t) + Y1 (t)-Yo ~~
~ I~ll~<I>(t, Yo)-YoHII<D(t, Yo)-Yoll= II«D(tj~~-Yon •

2.3.3. Proof of the Theorem. We break the proof of the theoremstated


in Section 2.3.1 into several sections.
A) According to the condition of the theorem, A is an epimorphism from
Y into Z. Consequently, by the lemma on the right inverse mapping, there
exists a mapping M:Z + Y such that
(AoM) (z)=z (1)

and
~M(z)II~CIIz~ (2)

for some C > 0. Now we put T = X x Z and verify the applicability of the
lemma of Section 2.3.2 to the mapping
ID(t, y)=ID(x, z, y)=y+M(z-'P'(x, y)).
B) On choosing e, 0 < e < 1, and E = 8/C, where C is the constant in
(2), we find a neighborhood E1 of the point xo and a number So > 0 such
that the inequalities lly' -yo II < So and lly" -Yo II < So imply [in accor-
dance with the condition 2) of the theorem] the inequality
I'!' (x, y')-'1' (x, y")-A(y' -y")ll < elly' -!I'll (3)

for xE81 •
Let us denote V=8 1 xZxB(y0 , ~0 ).
C) LetS= So/(CIIAII + 2). Using the continuity of the function (x,
z) + 'P(x, y 0 ) - z at the point (xo, yo) [see the condition 1) of the theo-
rem] we choose a neighborhood 8 2 c:8i of the point xo and y > 0 such that
(x,z)E'U=:S 1 XB(z 0 ,y) implies the inequality
II'P'(x, Y 0 )-z~<~·(l-9)/C. (4)

D) If (x, z) Eeu and yE B(Yo· ~) ' then


del (I)
liltl(x, z, y)-Yoii=IIY-Yo+M(z-'P'(x, y)H~
~IIY-Yoii+CIIz--'P'(x, Y)II~IIY-Y~H
+C{IIz-'P'(x, Y0 )-'P'(x, y)+'P'(x, Yo)+A(Y-Yo)-
-A(y-yo)~} ~IIY-Yo-ii+C {!lz-'P'(x, YuH+
104 CHAPTER 2
(3)
+i/111'(x, y)-lf(x, Yo)-A(y-yo)//+1/A(y -yo)/1} ~
~1/Y-Yo//+C {1/z-lf(x, !/ 0 )1/+el/y-yoi/+1/AI/1/Y - Yo//}~
~ (1 +a+C [[A[[)[[]!-Yo li+C // z-lf (x, Yo)//
(5)
(because EC =e).
By virtue of (4), we conclude from (5) that, in the first place,

Hence the condition a) of the lemma of Section 2.3.2 is fulfilled: (x, z,


ID(x, z, y))EV. In the second place, for y =Yo we obtain
~(1-6)
[/I!> (x, Z 0 , !10 )-y~ /1 ~C/1 z-lf (x, !/0 )// < C-c- =~{1-9).
(4 )
(5")

Therefore, the condition c) of the lemma of Section 2.3.2 is fulfilled.


E) If again (x, z) E CU, y E B(y 0 , ~) , then
del
ID(x, z, <l>(x, z, y))-CD(x, z, y)=
==<l>(x, z, y)+M(z-lf(x, CD(x, z, y)))-CD(x, z, y)=M(z-lf(x, CD(x, z, y))). (6)
Here (1) implies
AID(x, z, y) =Ay+AM(z-lf(x, y)) =Ay+z-lf(x,y),
whence
z=lf(x, y)+A(ID(x, z, y)-y).
(7)
Substituting (7) into (6) and taking into account that, by virtue of (5'),
the relation «tt(x, z, y)EB(y 0 , ~ 0 ) holds, we obtain
(6)
j)«ll(x, z, <l>(x, z, y))-<D(x, z, y)//=
. ~) m
=1/M(z-lf(x, CD(x, z, y)))/I~C[/z-lf(x, «ll(x, z, y))/1=
• (3)
=C/Ilf(x, <l>(x, z, y))-lf(x, y)-A(CD(x, z, y)-y)/1~
~Cel/y-CD(x, z, y)l/=a/ly-Cl>(x, z, y)/1. (8)
Thus, the condition b) of the lemma of Section 2.3.2 is fulfilled.
F) Let us again compare our construction with the lemma of Section
2.3.2. To the topological space T of the lemma there corresponds the prod-
uct space X x Z, to the point to there corresponds the point (x 0 , z 0 ) , to
the neighborhood U of the point to there corresponds the neighborhood CU
of the point (xo, zo), to the neighborhood V in the lemma there corresponds
the neighborhood V constructed in B), to the mapping (t, y) + ~(t, y) there
corresponds the mapping (x, z, y) + ~(x, z, y), and to the number B in the
lemma there corresponds the number B constructed in C).
Therefore, relation (5') means that condition a) of the lemma is ful-
filled, relation (5") means that condition c) of the lemma is fulfilled,
and finally relation (8) means that condition b) is fulfilled. Hence, the
lemma is applicable.
G) By the lemma, the sequence {Yn(x, z)jn~O}, (x, z)ECU, specified by
the recurrence relations
!lo (x, z) = y0 , Yn+i£x, z) =Ill (x, z, Yn (x, z))
is contained in the ball B(yo, B) for all n ~ 0 and is uniformly convergent
to a function cp (x, z), and, moreover,
del
)/cp(x, z)-Yo/1~1/Cl>(x, z, !/0 )-Yo/11(1-a)=
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 105

c
< 1_ 0 1 z- '¥ (x,
(2)
=~M (z-'1' (x, Yo))ll/(l-8) Yo)ij.

Therefore, the assertion b) of the theorem with K = C/(1 - 8) holds.


H)
(I)
Jl'l'(x, cp(x, z))-zjf~IJ'I'(x, qJ(X, z))-'l'(x, Yn(x, z))ll+ll'l'(x,y;.,(x,z))-zll~
~II'¥ (x, cp (x, z))- '¥ (x, Yn (x, z)) -A (qJ (x, z) -Yn (x, z)) II+
(3), (2)
+iiA(qJ(x, z)-Yn(x, z))II+IIAM(z-'l'(x, Yn(x, z)))il <
<e[Jcp(x, z)-Yn(x, z)JI+IIAIIII(jl(x, z)-Yn(x, z)Ji+IIAIIIIYn+dx, z)-Yn(x, z)ll--0
for n-+ oo. Consequently, 'l'(x, <p(X, z))=z. •
2.3.4. Classical Implicit Function and Inverse Mapping Theorems. From
the general theorem proved above, two classical theorems can easily be de-
rived which are constantly used in various applications of mathematical
analysis, for example, in differential geometry. In comparison with Sec-
tion 2.3.1, here we shall strengthen the requirements imposed on the func-
tion~ in two aspects. Firstly, we shall assume that ~ is continuously
differentiable and, secondly, and most significantly, we shall sup-
pose that the operator A involved in the statement of the theorem of Sec-
tion 2.3.1 is invertible. All this will guarantee the smoothness and the
uniqueness of the implicit function.
A) Classical Implicit Function Theorem. Let X, Y, and Z be Banach
spaces, let W be a neighborhood in X x Y, and let ~:W-+ Z be a mapping of
class C1 (W). If:
1) ~(xo, yo) = 0; ( 1)
2) there exists the inverse operator ['l'y {x0 , y E 2 (Z, Y), then there 0 )]- 1
exist £ > 0, o > 0 and a mapping qJ: B(x0 , 6)->-Y of class C1 (B(x 0 ,6))
such that:
a) qJ (x0 ) = y0 ; (2)
b) llx-x.ll<6~11qJ(x)-y0 JI<e and W(x, qJ(x))=O;
c) the equality ~(x, y) = 0 is possible in the "rectangle" B(xo, o) X
B(yo, £)only fory=qJ(x);
d) qJ'(x)=-['l'y(x, cp(x))j- 1 'l'x(x, cp(x)). (3)
Proof. The continuous differentiability of ~ in W implies that there
hold the conditions 1) and 2) of the existence theorem for an implicit
function of Section 2.3.1 [to verify 2) we must put A= ~y(xo, y 0 ) and make
use of the mean value theorem]. Condition 3) is fulfilled because ~y 1 (x 0 ,
y 0 ) exists. Hence, according to the implicit function theorem, there is a
number K > 0, a neighborhood U :i:l x0 , and a mapping qJ: U-+ Y such that
'¥ (x, (j) (x)) =0 (4)
and
h(x)-Yoll~K[I'I'(x, Yo)[[. (5)
Putting x = xo in (5) and using (1), we obtain (2), and (4) implies the
second half of assertion b).
B) Since 'l'fC 1 (W) , the mapping ~ is strictly differentiable at the
point (x 0 , y 0 ) (Corollary 2 of Section 2.2.3). Consequently, given any
x > 0 , there is e (x) > 0, such that
[Jxi-xoll<e(x), [!Yt-Yoll<e(x), i=l, 2, ~
=>JI'I'(xl> Yl)-'l'(x., Y.)-'l'x(x., Yo)(xl-x.)-
- 'l'y (X 0 , y 0)(Y 1 -y.) II~ x max {II xl~x.[l, IIYl-Yzll}· (6)
106 CHAPTER 2

First we shall put x 0 = 1 / 2 ll'l';1 (x., y 0)jj-\ and find the corresponding e: "'
s(xJ. Since ~(x, y) is continuous and ~(xo, Yo) "'0, there is&, 0 < & <
.::, such that

and
(7)

and consequently the first half of the assertion b) holds.


Now let us suppose that ~(:lt, 9) "'0 at a point(x~y)EB(xo,li)XB(y0 , s).
Applying (6) with 1<0 indicated above [X 1 =X1 =X, Y1 =y, Y1 =Cj)(x)], we obtain
fly-q> (x) I =II '1';1(xo, Yo) '~'u (xo, Yo)(Y-q> (x)) II~
~ - - - - - (6)
~ ll'l'i/1 (X0 , YoHII 'l'(x, y)- 'l'(x, q>(x))- 'l'y(Xo, Yo)(y-q>(x))ll~
~II 'I' y (xo, Yo)- 1 1 KoiiY-q> (x) I =2lly-q> (x) 11.
- - I - -

which is only possible for y=q>(x), i.e., assertion c) is true.


C) Let us put x1 "'x, x2 "'xo, Yl "'Y2 "'Yo in (6) for the same )G0 ,
e:, and o as above. Then
(6)
Hx-xoll < ll~ll'l'(x, Yo)-'¥ (xo, Ye)-1f,.. (Xo, Yo) (X-Xo)ll~
(5)

~xollx-xoll~ll'i' (x, Yo)\1 ~(11'1". . (Xo, Yo)ll+xo) llx-x,l! ~


~ 1111' (x)-Yo II~ K (ll'l",..(xo, Yo)ll+xo)llx-xoll· (8)

Now we take an arbitrary x1 > 0 and apply (6) to


X1 =X, X2 =X0 , Yt = q> (x), Y. = Yo•
For

we have
IIX-Xoll < lll ~11'1";;1 (xo, Yo) 'I'... (xo, Yo) (x-x0 )+q>(x)-y0 ~=
=111f;1 (X0 , g0 ){'1'... (X0 , y 0 )(X-Xo)+'l"y (x0 , Yo) (q> (x)-y0 )} II"-
~II '1';;1 (Xo, Yo )!Ill 'I' (x, q> (x))- 'I' (Xo, Yo)-
(6)
-1f,..(x0 , Y0 )(x-xo)-1fy(Xo, Yo)(q>(x)-Yo)ll~
(8)
~111fi/1 (xo, Yo) !1x1 tnfl.X {llx-xo 11. II II' (x)-Yo II}~ Cxt!IX-Xall•

where
C = 11 '1";1 (X 0 , y 0 ) I max { 1, K (j\1f... (x0 , Yo) 1\ + X 0 ) }.

Since x 1 can be chosen arbitrarily, the inequality we have proved implies


lfg-1(x 0 , Yo) 1f... (x0 , Yo) (X-Xo) + q> (x)-y 0 = o (x-x 0 )
for x ~ xo, i.e.,
q> (x) =Yo+ I- 'l"y-l (Xo, Yo) 'I'... (X 0 , Yo>J (X-X 0 ) +o (x-x 0 ),

and consequently the Frt'khet derivative q>' (~e0 ) exists and equality (3) holds
for x "' xo.
To prove the differentiability of q>' for the other values of x, we
take into account that the set of the operators A for which the inverse
operators exist is open in ~(X, Y) (Proposition 3 of Section 2.2.1).
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 107

Therefore, diminishing o, if necessary, we can assume that '1';;1 (x, q> (x))
exists for Ux- xoU < o.
Now we can repeat the whole proof replacing the point (x 0 , y 0 ) by the
point (i, tp(.~)), where Ux- xoU < o. By virtue of the uniqueness proved
above, we obtain the same function q>(x) (at least if llx- xU is suffi-
ciently small), and formula (3) holds for this function at x = x. Since
x is arbitrary, the assertion d) holds for all xEB(x0 , B).
Finally, the derivative q>'(x) is continuous because, according to
(3), it can be represented as the composition of the continuous mappings
xc- (x, q> (x)), (x, y)- 'l'...(x, y),(x, y) .-'I'Y (x, yj, and A -+ A- 1 (the last of them is
continuous since it is Frechet differentiable according to Proposition 3
of Section 2.2.1). •
Remark. If, in addition to the conditions of the theorem,
'I' E Cr (W), r ~ 2, then q> E cr (B (x0 , B)).
Indeed, from formula (3) and the composition theorem it follows that
the derivative
q>" (x) =-{'I' u (x, q> (x))- 1 o 'I'" (x, q> (x))}'
exists and is continuous. The further part of the proof is carried out by
induction.
Theorem on the Inverse Mapping, Let Y and Z be Banach spaces, let W
be a neighborhood of a point Yo in Y, let ~:W-+ Z be a mapping of class
C1 (W), and let Zo =~(yo).
If there exists the inverse operator 'l''(y0 )- 1 E 9(Z, Y) then:
a) there exists e: > 0 such that B(y0 , e) c:: W and V =~I z ='I' (y), IIY-
Yo II< e} ='I' (B (y 00 e:)) is a neighborhood of Zo in Z;
b) there exists a mapping ~:V-+ Y of class C1 (V) inverse to ~IB(y 0 ,
E), i.e., such that
y=ll>(z)~z='l'(y), (y, z) E B"(Y~r• e)xV; (9)
c) <I>' (z) =['I'' (<I> (z))]- 1 • ( 10)
Proof. A) The mapping 'I' EC (W) is strictly differentiable at each
1
point of Wand, in particular, at YO· Therefore, for every x>O there is
e (x) > 0 such that
IIY;-Yoll < e(x), i= 1, 2, ~ll'l'(yl)-'P'(y.)-'1'' (Yo)(Yi-Ys)ll:s;;;;xiiYi-Yoll· (11)
Let us choose e: > 0 such that for all y E B(y0 , e) there exists the in-
verse operator ~(y)- 1 (see Proposition 3 of Section 2.2.1) and e<e(x0 } ,
where %0 =~1.11'1'' (y0 )- 1 jl- 1 •
The last fact guarantees the injectivity of the mapping~ on B(yo, E):
Yt• Ya E B(Yo• e), 'I' (yl) ='I' (y,) ~ I!Yt-Yall =II 'I"' (Yo)-l 'P'' (yo)(Yi-Y•) II :s;;;;
(II)
:s;;;;l! 'I'' (Yo)- 1 1111'1' (Yt)-'1' (y.)-'1'' (YoHYt-Yo>ll ~
~II 'I'' (Yo)- 1 112llll''(~oJ-liiiiYi-Yoll = iiiYi-Yoll~ Yi = Ya·
Further we shall prove that the ball B(yo, e:) is the one mentioned in
Section a) of the theorem; at present we put V = ~(B(yo, e:)). Since the
mapping ~IB(y 0 , E) is injective, it is invertible, and hence there exists
~:V-+ ~(y 0 , E) possessing property (9).
B) For an arbitrary point yEB(y0 , e) let us apply the theorem of Sec-
tion 2.3.1 to the mapping ~(y) (this mapping is independent of x and there-
fore the space X can be arbitrary) in a neighborhood of the point y. The
108 CHAPTER 2

condition 1) of this theorem is trivial in this case; the condition 2)


turns into the condition for the strict differentiability at the point y
and, as was indicated above, it is fulfilled, and we have A= ~'(y); condi-
tion 3) is fulfilled since we have yEB(y0 ; e), and, according to the choice
of E, there exists ~'(y)- 1 •
By the theorem of Section 2.3.1, there is a neighborhood U of the
point z = ~(y) and a mapping q>: u__..y such that
'1.' (q> (z)) = z ( 12)
and
II !p (z)-ull~ Kllz-ill (13)
for some K > 0. Let us take a sufficiently small o>0 such that B(z, o)c
U and o < (E - II yo - yll) /K. Then

~z-zll < 6<~11~p(z)-Yii < K (s-ll~-ill) ~


~II q>(z)-Yoll < e ~ z = '1' (q> (z)) E '1' (B (y0 , e)).= V ~ iJ (i, 6) c V.
Thus, every point zEV possesses a neighborhood B(z,6)cV, and hence
V is open. Moreover, for z E iJ (z 0 , 6) we have
(12)
'l"(q>(i))=z= 'l'(<D(z))~q>(z)=ID(z) (14)
since ~ is bijective.
C) Now we shall prove the differentiability of the mapping ~ and the
formula (10). By virtue of the strict differentiability of~ at the point
y = ~ (z), an analogue of ( 11) holds: for any x > 0 there is e (x) > 0 such
that
IIY;-YII<e(x), i=l, 2, ~11'1"(y1)-'1"(y.)-'l''(y)(y1-Yo)ll~xllyt-Yoll· (15)
Putting Yl = ~(z) and Y2 = y = ~(z) in (15), we obtain
~ e(x)<t3)(14) ~ (15)
llz-z!l< K ~ II<D(z)-yll<e(x)~
.. ~ ~ ~ (13) ~ (12)
~II '1' (<D (z))- '1.' (<D (z))- '1'' (y) (<D (z)-<D (z) II~ x II<D (z)- ull ~xK I z-z II~

~llz-z- '1.'' (y) (<D (z)-<D (z)) II~ xKII z-zll~


~II<D(z)-<D(z)-'1' (y)- 1(z-z)ll=
= ll'l''(y)- 1(z-i- '1'' (y) (<D (z)-<D (z))) II~ II '1'' (y)- 1 I xK I z-zll·
and therefore
<D(z)-ID(z)-'1'' (y)- 1(z-i) =o (z-i)
for z + Therefore, the Frechet derivative ~'(z) =
z since x is arbitrary.
~'(y)- 1 exists, and thus (10) is proved. The derivative ~'(z) is continu-
ous on V because, according to (10), it can be represented in the form of
the composition of the continuous mappings z + ~(z), y + ~'(y), and A+ A-l
(the last of them is continuous by virtue of Proposition 3 of Section
2.2. 1).
2.3.5. Tangent Space and Lyusternik's Theorem. In this section X
is a normed space and M is its subset.
Definition. An element hE X is called a tangent vector to M at a
point X 0 EM, if there exist E > 0 and a mapping r:[-e:, E)+ X such that:

a) x0 +ah+r (a) EM,


b) r(a) = o(a) for a + 0.
An element hE X is said to be a one-sided tangent vector to M at a
point x0 EM i f there exist E > 0 and a mapping r: (0, E) + X such that a)
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 109

g.%_
~#J/'i
Fig. 33 Fig. 34

is fulfilled and b) holds for a ~ 0.


The set of all the tangent vectors to M at a point x 0 is denoted by
Tx 0 M, and the set of the one-sided tangent vectors is denoted by T~ 0 M. It'
is obvious that Tx,M c:: Ti,M and Tx,M=Ti,Mn(-Ti,M). If the set Tx 0 M is a
subspace of X, then it is called a tangent space toM at the point x 0 •
Remark. In geometry, by a tangent line, a tangent plane, etc. is
usually meant the affine manifold xo + Tx 0 M (Fig. 33) and not Tx 0 M.
Examples (the detailed proofs are left to the reader).
1) X=R 2 , M={(x,y)[x;;;..O} (Fig. 34); Tti,, 01 M::M,
T, 0, 01 M = {(0, b) [bE R}, T,l, o1M = Tti. o(M = R".
2) X=R 2 , M={(x, y)[x~O. y~O}=R~ (Fig. 35);
Tu, 01 M={(a, O)[aE R}, T<o. 01 M={0}, T<o.<nM={(O, b)[b E R}, .
Tti,o1M={(a,b)laER, b~O}, Tto.o1M=M, Tt;,,11M=1(a, b)[a~O, bER}.
3) X = Y x R, where Y is a normed space and M=i(y, z)[z=[[y[j} is a
cone (Fig. 36).

Exercises.
1. Assuming tha~ Y is a Hilbert space (for instance, Y = R2 ) find
T(y,llyii)M and T(y,llyii)M in the example 3) for y >' 0 (Fig. 36).
2. Find TaM and T~M, where:
a) M={2-n 1 n= I, 2, , .. } U {0} C R, ii""O;

b) M={*ln=l, 2, ... }u{o}c::R. a=-0;

c) M={(x, y)lx"=lf} c:: R•, a=(O, 0) and a ( 1 ' 1) ;


d) M={(x, y)jx•.;;;ya} c:: R•, a =(0', 0) and a ( 1 ' 1) ;
e) M ={(x, y) I x•;;;.ya} c:: R2 , a=(O, 0) and a ( 1 ' 1);
f) M={(x, y, z)lz=l, if x and y are rational; z =0 in the other cases},
a = (0, 0, 1).
In many cases including those important for the theory of extremal
problems, the set of tangent vectors can be found with the aid of the fol-
lowing general theorem.
Lyusternik' s Theorem. Let X and Z be Banach spaces, let U be a neigh-
borhood of a point xo in X, and let F:U + Z, F(xo) = 0.
IfF is strictly differentiable at the point xo and F'(xo) is an e~i­
morphism, then the set M = {xiF(x) = 0} has the tangent space
110 CHAPTER 2

Fig. 35 Fig. 36

at the point xo.


Proof. A) Let hET,:.M. Then if r(•) is the mapping mentioned in the
definition of a one-sided tangent vector, it follows, by the definition of
the strict differentiability, that
0 = F(x0 +cxh + r (a))= F (x0) +aF' (x0 ) h +o (a) =aF' (x0 ) h + o (a).
Consequently, F' (xo) h = 0 and Tt.M c: KerF' (x0 ).
B) Let us apply the theorem of Section 2.3.1 to the mapping ~(x, y) =
F(x + y). The strict differentiability ofF implies that the mapping x +
~(x, 0) is continuous at the point xo and

·11 'I' (x, y')- 'I' (x, y")- F' (xo) (y'- y") II~ 811 y'- y"li
if llx-x 0 11 < 15, lly'll < 15, lly"ll < 15. The condition 3) of the theorem of
Section 2.3.1 is also fulfilled because F'(xo) maps X into z. According
to this theorem, there is a mapping cp: U -..X of the neighborhood U c: X
of the point xo such that
'I' (x, cp (x)) = 0 ~ F (x+ cp (x)) ""'0,
II cp (x)II~KII'I' (x, 0)11 =K!IF(x)ll·
It only remains to put r (a)= cp (X0 +cxh). Then
F(x0 +cxh+r (a))= F(x0 +cxh+ cp (x0 +cxh)) =0,
IJr (a) II= I cp (Xo +cxh) I ~KII F (Xo +cxh) II=
=KII F (Xo +cxh)-F (Xo) I """KII F' (Xo) cxh + o (llcxhll) I =KIIaF' (Xo) h +o (a)ft,
and if hEKerF'(x0 ) , then llr(a)ll = o(a). Consequently,
KerF' (x0 ) c: T x.M c: r_:.M. •
Lyusternik's theorem is intuitively quite clear from the geometrical
viewpoint. Neglecting the rigorousness, we can define the tangent space L
to a set M at a point xo by saying that M is approximated by the affine
manifold xo + L with an "accuracy to within infinitesimals of higher order."
In an "infinitesimal neighborhood" of the point xo the function F is ap-
proximated by its linear part:
(1)
[since X0 EM, we have F(xo) = 0], and the set M = {xiF(x) = 0} is approxi-
mated by the set
M = {xl F' (x0) (x-x0 ) = 0} =K.er F' (x0 ) +x0 •
Hence M coincides with the manifold M = xo + KerF' (x 0 ) "to within infini-
tesimals of higher order," ~nd consequently KerF' (xo) is the tangent space
to M at xo.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 111

2.4. Differentiability of Certain Concrete Mappings


2.4.1. Nemytskii 's Operator and a Differential Constraint Operator.
Let U be an open set in R x Rn, and let a function f(t, x):U + Rm and its
partial derivative fx(t, x) be continuous in U. Let us consider the equal-
ity
off (x ( ·)) (t) == f (t, x (t)), 10 :;;;;;;, t :; ; ; , t1 , (1)
specifying a mapping JY': 'U-+C([t 0 , t 1 j, R"') defined on the set
'll = {x ( ·) EC ((t0 , t 1], Rn) I(t, X (t)) E U, t 0 :;;;;;;, t:;;;;;;, t 1 }. (2)
This mapping will be called Nemytskii's operator.
Exercise 1. Check that set (2) is open in C([to, t1J, Rn).
Proposition 1. Nemytskii' s operator specified by relation ( 1) is con-
tinuously differentiable on set (2), and, moreover,
off' <x<. >> [h <. n<t> =fx u> h (t), (3)
where

fx(t)~( a0x',~ ~= 1 ... , m,)


(t, x(t))
J=1,• ... , n (4)

is Jacobi's matrix.
Proof. To begin with, we shall compute the first Lagrange variation
of the mapping Jr. According to the definition,

6off(x(·); h(·))= lim off<x<·Hall<·>>-off<x<·».


a.-+0 IX

Here the left-hand and right-hand members are elements of the space C([to,
t 1 ], Rm), i.e., some functions of tE[t 0 , t 1]. For a fixed t we have
1. off (t (·Hall(.)) (I)-off (x (·))(I)
rm ~
a-+0

=lim f(l, x(IHall(l))-f(l, x(l)) =fx(t. x(t))h(t)=fx(t)h(t). (5)


a-+ 0 a

In order to show that here the convergence takes place in the sense of the
space C([to, t 1 ], Rm) as well, i.e., to prove that the convergence is uni-
form, we take into account that for some a > 0 we have
X'={(t,x)llx--,-x(t)j:;;;;;;,a, t0 :;;;;;;.t:::;;;t1 }c:U,
and fx is uniformly continuous on the compact set ~ . This means that
Ye > 0, 36 > 0, I t'-t"l < 6, lx' -x"l < 6 ~llf,(t', x')-f,.(t", x")JI < s.
If la.l < <5/llh(·)llc, then, by the mean value theorem (Section 2.2.3),

I JY' <i <·Hall~ n-JI' <i< ·» {!" (t) 1i (t)} II=


t (t, x(tHall (t))~f (t, x(t))-fx (t, x(t))ah (I) ~ _.....
... max II · 11::::::.
te(t0 ,1,J a:
~ max max llfx(t, x(t)+fhh(t))-f,.(t, x(t))lllh(t)J
IE [11 , 11 )9e [0; I)
:;;;;;;.ellhllc· (6)
.

Therefore, the convergence in (5) is in fact uniform, and the existence of


the Lagrange first variation
BH<x<·>· h<·>><t>==f,.<t>h<t>
is proved. Since the first variation is specified by a linearoperator
112 CHAPTER 2

[namely, oN'[ (x(. )) is the operator of multiplication by the matrix function


fx(t)] and this operator is bounded:
I oN'[ (x (·))[I~ max litx (t, i (t)) 11. (7)
I E[l 0 , 1,]

we see that oN'(x(·)) is Gateaux differentiable. To prove the continuity of


the Gateaux derivative, we shall estimate the norm of the difference
I cNf (x ( · ))-oN'f(x ( · )) = sup I oN'r (x ( · ))[h ( · )]~oN'Hx ( · ))[h (. m=
II h (-)lie.; I

= sup max llfx(t, x(t))h(tr-fx(t, x(t))h(t)ll<


ll~<·>llc<;; t le(t,, I.J
< max llfx(t, x(t))-fx(t,
IE [1 0 , 1,]
x(t))ll·
Using again the compact set ~ and the uniform continuity of fx, we con-
clude that oN'f(x(·)) is continuous with respect to x(·)E'U. Now applying
Corollary 2 of Section 2.2.3, we see that _oN'(x(·)) possesses the Frechet
derivative at each point of the set 'U and is strictly differentiable. The
equality oN'r =oN'' implies that the Frechet derivative is continuous in 'U .•
Exercise 2. Prove that in (7) the equality takes place.
Now let U be an open set in R x Rn x Rr, and let a function f(t, x,
u):U + Rm and its partial derivatives fx and fy be continuous in U. The
mapping J{': 'U -..C([t 0 , i 1 ], R"'). defined on the set
'U={(X(·), u(·))[(t, x(t), u(t))EU, t 0 <t<t1 }cC1 ([t 0 , i 1 ], Rn)xC([t 0 , i 1 ], R,') (8)
by the equality
oN'(x(·), u(·))(t)-==f(t, x(t), u(t)) (9)
will also be called Nemytskii's operator.
Proposition 2. Nemytskii' s operator determined by the relation (8) is
continuously differentiable on set (9), and, moreover,
Jf'(x(·), u(·))[h(·), v(·)j(t)=fx(t)h(t)+f,.(t)v(t), ( 10)
where

f-x (t) -..... (ati(t, ~~), ii (t)), L• = I , ... , m, 1. = I , ... , n 1~ ( 11)

and

(12)

Proof. As in the foregoing case, the mapping oN' possesses the partial
derivatives

oN'J~ (.)' u (. ))[h (.)] ( t) =f. (t) h ( t)


and
oN',. (i (. ), u (. ))fv (. )J (t) =f.(t) v (t),
and the mappings

and
oN',.: 'U-..$(C1 ([t 0 , t,], R'); C({t 0 , t1 ], R.m))
are continuous in 'U • It remains to refer to the theorem on the total dif-
ferential of Section 2.2.4. •
Further, let U and &U be the same as before, and let a function q> (t,
x, u) :U + Rn and its partial derivatives <fix and <J>u be continuous in U.
MATHEMATICAL ~lliTHODS FOR EXTREMAL PROBLEMS 113

The mapping ttl: 1l-+C(ft0 , t 1 ], Rn) determined by the equality


tll(x(·), u(·)Ht)=x(t)-cp(t, x(t), u(t)) ( 13)
will be called a differential constraint operator.
Proposition 3. The differential constraint operator determined by th~
relation (13) is continuously differentiable on the set (8), and, moreover,
<D' (x (·), u(·)) [h ( · ), v ( · )J (t) =h (t)-~x (t) h (t)-~u (t) v (t),. ( 14)
where
~ (t)-
fPx .,.,.
(acp;(t, ax(
x(tl. u(tll
'
. .
t, 1 =
1' ... ' n) (15)

and
~
fPu (t) ~
(acp;(t, x(t), uun • t.= 1' ... 'n, k = 1,
auk
)
... ' r . (16)

Proof. The mapping ¢ is the difference between the continuous linear


operator (x(·), u(·)) + x(·) and Nemytskii's operatoroN'(x(·) 1 u(·))(t)=cp(t,
x(t), u(t)). Therefore, the assertion we have to prove follows from the
general properties of the derivatives (Section 2.2.1) and Proposition 2. •
2.4.2. Integral Functional. Let U be an open set in R x R x Rn, and
let a function f(t, x, x): U + Rm and its partial derivatives fx and fx
be continuous in U. Let us define a mapping ::1: '71"-+ Rm on the set
'll"={x(·)EC 1 ([t 0 , f 1 ], Rn)/(t,x(t),x.(t))EU, f 0 '(;t=::;;;;t 1 } ( 1)
with the aid of the equality


::J(x(-))= ~ f(t, x(t), x(t))dt. (2)
I,

Proposition 1. The integral mapping (2) is continuously differenti-


able on set (1), and, moreover,
I,

:;' (x(· )) [h(· )] = ~ {fx(t)h (t)+fx (t)h (t)}dt, (3)


t,

where
~fx(t)~ ( ax;
ar ~ \
~
(t,x(t,.x(t)), i=l, ... ,m, j=1, ... ,n ) (4)

and

f·~ (t)~ (-.-' ~ "" i=l, ... ,m,


ar (t,x(t),x(t)), j=1, ... ,n. ) (5)
X axj
Proof. Let us represent mapping (2) in the form of the composition
::1=/o.,N'oD,
where

is the continuous linear operator determined by the equality


D (x(· )) =(x( · ), x(· ));
oN': CU-+ C ([t 0 , f 1 ], Rm)
is Nemytskii' s operator determined by equality (9) of Section 2.4 .1 (for
r = n) and
114 CHAPTER 2

is the operator of integration:


t,
l(x(·))=~x(t)dt
t.
which is also linear and continuous. All these operators are differenti-
able (see Section 2.2.1 and Proposition 2 of Section 2.4.1), the derivative
of the linear operator coinciding with that operator itself, and the deriv-
ative of Nemytskii' s operator being expressed by the formula (10) of Section
2.4.1. By the composition theorem,
:J' (x (·))=I o oN" (Dx (·)) o D, (6)

that is,

,. <xt·>Hh<· >J=I {Jf" <Dx<·>HDh<· m=I {oN'' <x< ·>. ;<· >Hh<· >. li<· n==
t,
=I {f" (t) h (t) + fx (t) h (t)} = ~ {f,. (t) h (t) +t;; (t) h (t)} dt.
to

This proves formula (3). The continuity of :J'(x(·)) follows from the con-
tinuity of the derivative of Nemytskii' s operator and equality (6). •
Exercise. Find the norms of operators D and I.
In problems of the classical calculus of variations and optimal con-
trol problems we also deal with integral functionals of form (2) with vari-
able limits of integration to and t1. To include them in the general scheme
we shall use the following procedure.
Let f(t, x, x)
satisfy the same conditions as before, let A!:R be a
closed interval, and let
'll={(x(·), t0 , i 1 )ix(·)EC1 (A, Rn), (t, x(t), x(t))EU, (7)
tEA, t0 , t1 E int A}.
Let us define a mapping 3: 'll---+Rm with the aid of the equality
t,
.:J (x ( · ), t0 , 11 ) = ~
t.
f (t, x ( t), x(t)) dt. (8)

Proposition 2. The integral mapping (8) is continuously differenti-


able on set (7), and, moreover,
t,
3' (x(-), fo. il)[h{·), -t0, -r_l] = ~ (t,.(t)h(t)+fx(t)_h(t))dt+f<i't> (9)
lo

where fx(t) and fx(t) are determined by formulas (4) and (5) and
f<t>=t<t. x<t>. ~<tn. ( 10)
Proof. We shall make use of the theorem on the total differential of
Section 2.2.4. The partial derivatives exist according to Proposition 1
and the classical theorem on the derivatives of an integral with respect
to its upper and lower limits of integration:
t;
:lx<·><x(·), i 0 , tl)[h(·)]= ~ {f,.(t)h(t)+fx(t)h(t)}dt,
r.
:lt.(.X (· ), fo, fllf-ro] = -1 (fo) 'to,
'~. <x <· >. fo, ~)[-rlJ = 1<~> -rl.
Now we shall verify the continuity of the partial derivatives.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 115

A)
[f.7t,(x(·), i 0 , ti)~:7t.(x(·), f 0 , f 1)[[=
- I • ~ ~~ ~
-sup l f(t.,x(t.),x(t.))-r:.-f(t., x(t.), x(t.))-r:oll<
1

ll<oll.;l
~lf(t •• x(t.), x(t.))-f(t•• x(f.), x(t~))l: ( 11)
Moreover,
1x u.>-x 0.> 1~ 1x u.>-x (t.) 1+ 1xu.>-x (t.) 1~II x(' >-x (. He• + 1xu.>-x (f.) 1 <12)
and
. ~ ..... . ..: ~ ..: ...... ..... ...: .:.
lx(t 0 )-x(t 0 ) I~ lx(t.)-x(t.) HI x(t.)-x(t.) I ~llx( · )-x (· )llc•+lx(t 0)-x(t 0 )j. (13)
\..,

Therefore, if to +to and x(·) + x(·) in C 1 (~, Rn), then x(to) + x(to),
~(to)+ iCto), and hence the right-hand side of equality (11) tends to
zero. Consequently, :7 1,(x(·), t 0 , f 1) depends continuously on (x(·), to, t 1)
(this derivative does not depend on t1 at all). The continuity of :7 1 (x(·),
to, t 1 ) is verified analogously. '
B) Let us choose a number a > 0 such that
X={(t, x, u)llx-x(t)l~a. ju--x(t)/::::;a, tEL\}cU.
The derivatives fx(t, x, and fx(t, x, x)
are uniformly continuous and x)
bounded on the compact set X. Now for llx(·)- x(·)llcl ~a we have

~max {II f X (t,


iiiC
X, u) ll+llfx (t, X, u) [[}([ tl- trl + )t.- r. [) +
+ Ta:{llf ... (t, x(t), x(t))-f ... (t, x(t), ~(t))ll+
{1 0 , 11 j

+II f.< (t, x (t), x (t))-f.;,(t, :X (t), ~ (t))lll·


Here the first term tends to zero for to+ to and t1 + t 1 • The second term
is estimated using the uniform continuity as in the proof of Proposition 1
of Section 2.4 .1, and hence this term tends to 'zero when x( ·) + X:(·) 1.n the
space C 1 (~, Rn).
Applying the theorem on the total differential, we obtain (9). •
2.4.3. Operator of Boundary Conditions. Let a function W(to, xo, t 1 ,
x 1 ):W + Rs be continuously differentiable on an open set WcRxR"XRXR",
and let
?P'={(x(·), i0 , i 1 )[x(·)EC1 (L\, R"), ( 1)
t 0 , t1 E int L\, (t 0 , x ((), t1, x (t 1) E W}.

The mapping '¥: 71" -..R• determined by the equality


\f (X ( • ), f0 , fi) = 'ljl (t 0 , X (t 0 ), f 1 , X (t 1 )) (2)
will be called an operator of boundary conditions.
Propos1.t1on. The operator of boundary conditions (2) is continuously
differentiable on set (1), and, moreover,
116 CHAPTER 2

where
(4)
and
(5)
Proof. A) First we shall consider the simplest mapping ev: C\(~. Rn)x
int~__,.Rn determined by the equality
ev(x(·), t 0 )=X(to)
(the mapping of the values). The mapping is linear with respect to x(·),
and therefore

The partial derivative with respect to to is an ordinary partial deriva-


tive:
ev 1,(x(·), t0 )[t0 ]=x(i;.)to.
Let us verify the continuity:
~evx (·) (x ( · ), to)-evx o (x (·), t 0) II= sup
UhPIIc,.;;l
llh Uo)-h (i;.)ll ~

~ sup sup llh(6)1ilto-fol~lto-f 0 /_,.0


!lh(·lllc'.;;lee[t,,I,J ·

for to +to. Further,


nev 1 (x(·), to)-evt (x(·), to)ll= sup llx(to)t.-~(i;.)toll=ll~(to)-~(t:)!l--..0
U o o I t,l<:; I

for to + to and x(·) + x(·) in the space C 1 (~, Rn) as was shown in the
proof of Proposition 2 of Section 2.4.2 [inequalities (12) and (13)]. By
virtue of the theorem on the total differential,

The continuous differentiability of mapping (x(·), t1) + x(ti.) is proved


in a similar way.
B) Using the composition theorem, we prove the differentiability of
mapping (2) and equality (3). •
Exercise 1. Let ev: CL([O, be the mapping determined by the
l])X(O, I)-+ R.
formula ev(x(·), to) = x(to). Prove that:
a) for the second variation 62 ev(x(·), t 0) to exist it l.S necessary and
and sufficient that x(to) exist and, moreover,

62 ev (x(·), to) [h(), <]=2h(to)<-l x'(to)< 2 ;

b) the mapping ev does not possess the second Frechet derivative, al-
though its first Frechet derivative is Gateaux differentiable.
Hint. li(t0 + T) -li(t0 ) ,0 O(( llh /1 2 + T 2 ) 112 ).

2.5. Necessary Facts of the Theory of Ordinary


Differential Equations
As was already mentioned in Section 2.1.8, differential equations
X=cp(t, x, u(t)) considered in optimal control problems possess certain spe-
cific properties. Since discontinuous controls u(·) are admitted, the
right-hand sides of the equations do not necessarily satisfy the conditions
of the standard theorems presented in courses of differential equations.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 117

Moreover, the necessary facts concerning the solutions are not always pre-
sented in these courses in a form suitable for our aims. That is why for
the completeness of the presentation of the material and in order to de-
monstrate the possibility of the application of the general theorems of
Sections 2.3 and 2.4 we shall present in this section the proofs of the
basic theorems on the existence, the uniqueness, and the differentiability
of solutions and also some special assertions concerning linear systems.
Some of the statements are given in a more general form than is usually
done. Most often we deal with piecewise-continuous controls u( · ), but even
in this case it is convenient to speak about "measurability," "integrabil-
ity," etc.
A function x(·) is said to be a solution of the differential equation
x=F(t, x)
if it is absolutely continuous (see Section 2.1.8) and satisfies the equa-
tion almost everywhere.
In the equivalent form we can say that x(·) must be the solution of
the integral equation
t
x(t)=X(t 0 )+~F(s, x(s))ds.
t,

2.5.1. Basic Assumptions. Here and in Sections 2.5.2-2.5.7 we shall


assume that G is a given open set in R x Rn and F:G + Rn is a given func-
tion satisfying the following three conditions:
A) For any x the function t + F(t, x) defined on the section Gx =
{tj(t, x)EG} is measurable and integrable on any finite closed in-
terval contained in Gx·
B) For any t the function x + F(t, x) defined on the section Gt =
{xj(t, x)EG} is differentiable (at least in Gateaux' sense).
C) For any compact set '!JCcG there exists a locally integrable func-
tion k(·) such that
\IPx(t, x)\l~k(t), V(t, x)EX. (1)

(A function is said to be locally integrable if it is integrable


on any finite closed interval.)
A typical example of this kind is the function P(t, x) = ~p(t, x, u(t)),
where ~p(t, x, u) and IJlx(t, x, u) are continuous onGX'll,, andR-..'ilis piecewise-
continuous. The conditions A) and B) are obviously fulfilled, and k(t)
can be defined as the maximum of Fx(t, x) on CJt. Another example is
F(t, x) = A(t)x, where A(·) is a locally integrable measurable matrix
function:
A: R-+..9'(Rn, Rn).
Remark. Conditions A)-C) are more restrictive than the well-known
Caratheodory conditions [6] but they are more suitable, on the one hand,
for our aims [the consideration of equations X=~p(t, x, u(t)) with right-hand
sides involving discontinuous controls u(•) for which, however, the deriva-
tives IJlx exist] and, on the other hand, for our facilities (the apparatus
of differential calculus presented in Section 2.2).
LEMMA 1. For any compact set :J(cG there exists a locally integrable
function x(·) such that
IP(t, x)l~x(t), V(t, x)E:J(. (2)
Proof. Since the set :1( is compact, there exist o>0 and E > 0 such
that for any point (t,,x0)E::7( we have
118 CHAPTER 2

ct,x,={(t, x) II t-t. I ,;;;;;6, lx-x. I :;;;;;e}cG


for the "cylinder" Ctoxo· For this cylinder we find the corresponding
locally integrable function Ktoxo(t) indicated in C).
Now we again use the compactness and cover ~ by a finite number of
N
cylinders: ~C U ChW
{=( L

The formula
N
x(t)= i::
~[I1
F(t, x,.)/+kt·x·(t)·e];'([t.-6.
l l l
t.+6J(t)
l

determines the sought-for function (here X[a,S](·) is the characteristic


function of the closed interval [a, 13]; IF(t, Xi)l is integrable on [ti
o, ti + o] by virtue of A)). Indeed,
(t, x)EX~3i, (t, x)EC 1 ixi~IF(t, x)I</F(t, x;)I+/F(t, x)-F(t, x,.)l,;;;;;
<IF(t, x)l+ sup IIFx(t, c)lle:;;;;;x(t)
ce(x, xll

(we have [(t, x), (t~ x,.)JcCt 1x,., and the mean value theorem of Section 2.2.3
is applicable).
LEMMA 2. If a function x:~ ~ Rn is continuous on a closed interval ~
and its graph {(t, x(t))ltE~} is contained in an open set OcRxRn, and a
function F:G ~ Rn satisfies the conditions A)-C), then the function t ~
f(t) = F(t, x(t)) is measurable and integrable on~.
Proof. For sufficiently small E > 0 we have
~={(t, x)llx-x(t)l:;;;;;e, tE~}cG.

Let k(·) be the function corresponding to the compact set ~ in accordance


with the condition C). The continuous function x(·) is the uniform limit
of the piecewise-constant functions xn(·), and, without loss of generality,
we can assume that the graphs of xn ( ·) are contained in ~.
By condition A), the function t ~ fn(t) = F(t, xn(t)) is measurable
and integrable on every interval of constancy of xn(·), and therefore fn(·)
is measurable and integrable on~. Using the mean value theorem (Section
2.2.3) and (1), we write
lfn(t)-f(t)I=IF(t, xn(t))-F(t, .t(t))J :;;;;;k(t)lxn(t)-x(t)l,
whence fn(t) ~ f(t) for all tE~, and consequently f(·) is measurable since
it is a limit of measurable functions [KF, p. 284]. Finally,
If ( t) I ,;;;;; Ifn (f) I + k ( t) IXn ( t)- X (f) I
'
and consequently f(t) is integrable.
2.5.2. Local Existence Theorem. Let the function F:G ~ Rn satisfy
the conditions A)-C) on the open set GcRxRn , and let ~cG be a compact
set.
Then there exist 0 > 0 and E > 0 such that for any point (t~ x) E~ and
for (t 0 , xo) satisfying the inequalities
lt.-[j <~'~. lx.-xl <e (1)
the solution X(t, to, xo) of Cauchy's problem
x=F(t, x), \ (2)
x(t 0 )=X0 , f
is defined on the closed interval [t-o, t + o] and ~s jointly continuous
with respect to its arguments.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 119

Proof. Let us apply the contraction mapping principle in the form


stated in Section 2.3.2. Cauchy's problem (2) is equivalent to the inte-
gral equation
t
x(t)=x0 + ~ F(s, x(s))ds, (3)
t,

and we shall apply the lemma of Section 2.3.2 to the mapping


(·)

(t 0 , X0 , X (·)).-<I> (t0 , X0 , X ( ·)) = X0 +~ F (s, X (s)) ds. (4)


t,
Let us choose y and S such that
~h\={(t, x)J\t-tJ~y. Jx-xl~P. (t~ x)E~}c:G,
and let k( •) and x ( ·) be integrable functions corresponding to the com-
pact set ~1 by virtue of C) and Lemma 1 of Section 2.5.1. Let us check
the conditions of the lemma of Section 2.3.2. Here the topological space
Tis the set (1), U =TandY= C([t- o, t + o], Rn), where o, 0 < o ~ y,
will be chosen later; yo(s) = x, and
V={(t 0 , x0 , x(·))\\t 0 -t\<~. \x0 -x\<e, (5)
\Jx(·)-Yo(·)\\c<~}.

The constants£ and 6 are such that 0 < 6 < 1 and 0 < £ < S(1- 6).
Condition c) of Section 2.3.2 is fulfilled if
t
lxo+ ~ F(s, x)ds-xj < ~(l-6) (6)
t,

for tEfi'_-:-_11, t+IIJ. By Lemma 1 of Section 2.5.1, JF(s, x)J~x(s), and taking
a sufficiently small o, we obtain
1+0 t+G
~ \F(s, x)\ds~ ~ x(s)ds<~~~-6)-e, (7)
1-o 1-o
whence follows (6) because
t t t+6
!xo+~F(s, x)ds-xi~Jx.-xJ+i~F(s, x)dsj~e+ ~ \F(s, x)jds<P(l-6).
to fo 1-6

Condition a) of Section 2.3.2 means that the mapping (to, xo, x(·)) +
(t 0 , x 0 , ~(to, xo, x(·))) transforms V into itself. It holds if
t
jx.+~F(s, x(s))ds-xi<P
to

for (t 0 , x0 , x(·))EV and tE[t-11, t+ll]. We shall estimate the left-hand


member of the last inequality using (6), the mean value theorem and rela-
tion (1) of Section 2.5.1:

l
x.+ { F(s,
t0
x(s))ds-xl~lxo+ { F(s, t1
x)ds-xl+l ( {F(s, x(s))-F(s, x)}dsl<
to

t+6 i+G
<~(1-6)+ ~ k(s)jx(s)-xjds~p(l-6+ ~ k(s)ds)~P.
t-G t-6
if the inequality
120 CHAPTER 2

t+6
~ k(s)ds~9 (8)
t-6

is fulfilled, which can be achieved by taking, if necessary, a smaller o.


Finally, since condition a) has already been verified, condition b) of
Section 2.3.2 is fulfilled provided that
llfD(t 0 , X0 , x(·))-cl>(to, Xo, Y(·))ll~611x(-)-y(·)~
\

for any (t 0 , X0 , x(-))EV and (t 0 , X0 , y(·))EV. By virtue of (8),


/lfD(t 0 , X0 , X(·))-CD(t 0 , X0 , Y(·))ll=

=max lx + { F(s,
t
0
t0
x(s))ds-x0 - ~ F(s,
to
y(s))dsl=

=maxi{ {F(s,
I t,
x(s))-F(s, y(s))}dsl~ ~r k(s)dsJJx(·)-y(-)/J~61Jx(·)-y(·)ll.
t -0

Hence the lemma of Section 2.3.2 is applicable, and therefore the se-
quence Xn(·, to, xo) determined by the equalities Xo(·, to, xo) = x and
(-)

xn+d· 'to, Xo)=xo+~ F(s,Xn(s, to, Xo))ds (9)



is uniformly convergent in the space C[(t- o, t + o], Rll) with respect to
(t 0 , x 0 ) satisfying (1), i.e., uniformly convergent with respect to tE[t-
8, t + o]. Passing to the limit in (9), we conclude that the function X(t,
t0 , x 0 )=limXn(t, t 0 , X0 ) satisfies the integral equation (3) and, consequently,
it is the solution of Cauchy's problem (2). It can easily be verified by
induction that the functions Xn(t, to, xo) are continuous, and since the
convergence is uniform, the function X(t, to, xo) is jointly continuous
with respect to its arguments.
2.5.3. Uniqueness Theorem.
LEMMA (Gronwall's Inequality). Let nonnegative functions a(·) and
w(·) be measurable on a closed interval ~. and let a(·)w(·) be integrable
on ~. If for some b > 0 and TEL\ and for all tEL\ the inequality

(1)

is fulfilled, then

ro (t) <;be
I I
}I a(s)ds
~ (2)
for all tE;L\.
Proof. We begin with t ~ T. By the hypothesis,

N""") rt (s)ro (s) ds < ()Q


,t
and, according to (1),
ro(t)~N+b.

By induction, we find that (1) implies the inequalities


MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 121

Indeed, (3) holds form= 0, and if (3) holds for some m, then substituting
(3) into (1) and using the equality

Jk+l
Ja (cr) dcr
I
[

we conclude that (3) holds for m + 1 • Passing to the limit ~n (3) for
m + oo, we obtain (1).
For t ~ T the argument is quite similar: it is only necessary to in-
terchange the limits of integration everywhere. [It should be noted that
the function a(·) is not necessarily integrable, and therefore the right-
hand side of (2) may be infinite.] •
Uniqueness Theorem. Let function F:G + Rn satisfy conditions A)-C)
of Section 2.5.1 on the open set GcRxR"; let 6i, i = 1, 2, be intervals
belonging toR, and let Xi(·):6i + Rn be two solutions of Cauchy's problem
(2) of Section 2.5.2 whose graphs are contained in G. Then x 1 (t) = x 2 (t)
for all t E ~ 1 n ~ 2 •
Proof. The set 6 = {tlxl(t) = xz(t)} is obviously closed in ~ 1 n~ 2
and nonempty because t0 E ~ • It rejllains to verify that it is open, and the
assertion of the theorem will follow from the connectedness of the interval
~i n~2·
Let tE~; then x1(t) = xz(t) = x. Let us choosey> 0 and B > 0 such
that X={(t, x)jjt-fj:o:;;;;y, jx-xj:o:;;;;~}cG and (t, x1 (t))EX for It- f.l ~ y; let
k(t) be the function corresponding to X by virtue of the condition C) of
Section 2.5.1.
Applying the mean value theorem (Section 2.2.3) and inequality (1) of
Section 2.5.1, we find that

Now if the lemma we have just proved [a(t) = k(t), b = O] is applied to


the function w(t) = lx1Ct)- xz(t)l on [t- y, t + y], then, by virtue of
(2), we obtain x1Ct) = xz(t).
Consequently (f-y, f+y)c~ , and 6 is open. •
2.5.4. Linear Differential Equations. In this section 6 is a
closed interval on the real line and
A: ~-2' (R", R"), b: ~- R"andc: ~-R"*
are an integrable matrix function and two integrable vector functions, x =
(xv ... , Xn) E R" and· p = (p 1 , ••• , p,) E R"*. We shall consider linear systems
of differential equations
x= A (t) x+b (t) ( 1)
and
p=-pA(t)+c(t) (2)
[(2) is sometimes called the system adjoint to the system (1)]. As is
known, for linear systems it is possible to prove a nonlocal existence
theorem for the solutions (for the whole closed interval 6 in the case un-
der consideration).
LEMMA. If the functions A: ~-2'(R", R"), b: ~-R", and c:6 + Rn* are
measurable and integrable on the closed interval 6, then any Cauchy's
122 CHAPTER 2

problem for Eq. (1) or Eq. (2) posed for an initial instant t0 ~d has a
unique solution which can be extended to the whole closed interval ~.
Proof. We shall confine ourselves to Eq. (1) since for Eq. (2) the
whole argument is quite analogous. For the sake of convenience, let us
extend the functions A(·), b(·), and c(·) to the whole real lineR assuming
that they are equal to zero outside ~. Then the function
F(t, x)=A(t)x+b(t)
will satisfy the conditions A)-C) of Section 2.5.1 in the domain G = R x
Rn, and therefore the uniqueness theorem of Section 2.5.3 and the local
existence theorem of Section 2.5.2 will be applicable to the differential
equation (1) at each point (t, x). Hence the solution of Eq. (1) with
x(t) = x is sure to be unique and defined on a closed interval [t-o, t +
0].
Now.we note that if lxl ~ M, then
I F (t. x)l ::::;; I A (t)ll M +I b (t)l
and
IfF.. (t. x)JI =II A ( t)JJ.
Coming back to the proof of the theorem of Section 2.5.2, we can choose
the constants y, S, 8, and E sufficiently arbitrarily by putting, for in-
stance, 8 = 1/2, S = 4, E = y = 1, and then inequalities (7) and (8) of
Section 2.5.2 specifying o take the form
t+O t+O
M ~ JJA(s)\\ds+) \b(s)Jds<l,
f-o 1-o (3)
t+O
~- I A (s)ll ds::::;; 1;2.
t-0
Since IIA(·)II and lb(·)l are integrable, we can choose a sufficiently small
o > 0 (not exceeding y = 1) such that inequalities (3) are fulfilled for
all t [KF, p. 301]. Consequently, the solution of Eq. (1) with x(t) = x
is defined on a closed interval [t-o, t + o], where 0 is guaranteed to
be one and the same for any t and any x such that lxl ~ M.
Now let t 0 ~d and xo be given, and let a solution of Eq. (1) be sought
which satisfies the condition
(4)
Let us put

(5)

Cauchy's problem (1), (4) is equivalent to the integral equation


t t
x (t) ""'X0 +) A (s) x (s) ds +) b (s) ds,
t, t,
whence

The solution x(·) of problem (1), (4) is sure to be defined on the closed
interval ~1 =[to-o, to+ o]. Applying the lemma of Section 2.5.3 to the
functionro{t)=fx(t)f, (a(t)=//A(t)~,b=lxo\+~Jb(s)fds)on that closed interval,
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 123

we obtain the inequality

(7)

and, in particular, according to (5), we have lx(to ±a) I~ M. Therefore,


the local existence theorem can again be applied for the points (to ± o,
x(to ± o)), and the solution x(·) can be extended to the closed interval
~2 =[to- 2o, to + 2o].
Further this procedure is repeated. Inequality (6) holds for ~2. and
hence (7) holds for ~2, and we have lx(to ± 2o)l ~ M, etc. After a finite
number of steps we extend the solution of Cauchy's problem (1), (4) to the
whole closed interval ~. •
The explicit formulas for the solutions of systems (1) and (2) are
expressed in terms of the principle matrix solution of the homogeneous sys-
tem
x=A(t)x. (8)
Definition. By the principal matrix solution n(t, T) of the system
(8) is meant the matrix function Q: l\xl\->-2(Rn, Rn), which is the solution
of Cauchy's problem
OQ~~· ,;) =A(t)Q(t, ,;), (9)
Q('t, 't)= I. (10)
In other words, each of the columns of the matrix n(t, T) is a solu-
tion of system (8), and fort= T these columns turn into the set of the
unit vectors forming the standard basis in Rn: e1 = (1, 0, ... ,0), e2 = (0,
1, •.. ,0), ... , en= (0, 0, ... ,1).
THEOREM. If the matrix function A(·) is integrable on the closed
interval ~. then the principal matrix solution n(t, T) of system (8) exists
and is continuous on the square ~ x ~. and, moreover:
1) n(t, s)n(s, T) = n(t, T·) for all t, s, TEd; ( 11)
2) n(t, T) is a solution of the differential equation
ao (t, 't)
-!J (t, 't) A ('t') (12)
.O't

for each t.
Moreover:
3) If the function b:~ ~ Rn is integrable on the closed interval
and x(·) is a solution of system (1), then
t
x(t)=Q(t, 't)X('t)+ ~ Q(t, s)b(s)ds ( 13)
~

for any t, TEd.


4) If function c:~ ~ Rn* is integrable on the closed interval ~ and
p(·) is a solution of system (1), then

I
T

p(t) = p(T)il(T, t)- c(s)O(s, t) ds (14)

for any t, 'tEd.


Proof. Performing the direct substitution and using (9), we readily
verify that the functions
124 CHAPTER ·2

are solutions of system (8) for any 6 ERn • By virtue of (10),


x1 (s) = Q (s, s) Q (s, -r)S = Q (s, -r)S = x2 (s),
and therefore the uniqueness theorem implies
Q (t, ~s) Q (s, -r) 6= X1 (t) = X2 (t) = Q (t, -r) 6.
Since~ is arbitrary, (11) must hold. Putting t =T in (11), we obtain
n(t, s)n(s, t) = I, whence
Q(t, s~=[Q(s, t)]- 1 •
Fixing s in (11), we represent the function of two variables n(t, T) in the
form of a product of two functions of one variable, and since these func-
tions are both continuous, the function n is continuous with respect to
(t, T) on ~ x ~. In particular, it is bounded. According to Proposition
3 of Section 2.2.1, the matrix function f(A) = A- 1 is continuously differ-
entiable on the set of invertible matrices and
f' (A) [H]=-A- 1 HA- 1 • (15)
The set X'={A/A=Q(s, t), s, iE~} is the image of the compact set~ x ~
under the continuous mapping Q: ~X~·--·.2'(Rn, Rn), and therefore the set
X is compact, and the function f is bounded on it (ilfll :s;; M) and satisfies
the Lipschitz condition since
//f(A)--f(B)[/=/)B- 1 -A- 1 )/=/)-B- 1 (B-A)A- 1 ))~M•)fB-A/f.

According to Proposition 1 of Section 2.1.8, the function


cp(s)=Q(t, s)=Q(s, t)- 1 =f(Q(s, t))
is absolutely continuous, and, by virtue of (15) and the composition theo-
rem of Section 2.2.2, the relation
aQb~; s) =-[Q(s, t)]- 1 aQ~; t) [Q(s, t)]- 1 ==
= - [Q (s, t)]- 1 A (s) Q (s, t)[Q (s, t)]-' = -Q (t, s) A (s)
holds almost everywhere, which proves (12).
Representing the right-hand side of (13) in the form

j
Q(t, -r) [x(-r)+ Q(-r, s)b(s)ds]. ( 16)

we see that it is absolutely continuous. Indeed, n(T, s)b(s) is integrable


(as a function of s) on ~. and the integral of an integrable function is
absolutely continuous (Section 2.1.8). Consequently, the expression in the
square brackets in (16) is absolutely continuous with respect to t. It
remains to note that the product of two absolutely continuous functions is
also absolutely continuous (Proposition 1 of Section 2.1 .8). Taking into
account(9) and (11) and differentiating (16),wesee that, like x(·), the
right-hand side of (13) is a solution of Eq. (1). Since these solutions
of Eq. (1) coincide for t = T, the uniqueness theorem implies that they
coincide for all tE/1.
Equality (14) is proved in a similar way. •
2.5.5. Global Existence and Continuity Theorem. Let us come back to
Cauchy's problem
. ( 1)
x=F(t, x),
X (t 0 ) = X0 (2)
HATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 125

for differential equations whose right-hand sides satisfy the conditions


A)-C) of Section 2.5.1. As it is done in courses of differential equa-
tions, we can use the local existence theorem and the uniqueness theorem
to extend the solution X(·, to, xo) of Cauchy's problem (1), (2) to the
maximum interval (a, b) of its existence. From the theorem of Section
2.5.2 it follows that for t + a and t t b this solution must leave any
compact set XcG because the possibility of extending the solution to an
interval ( t - o, t + o) is guaranteed as long as (t, x (t)) EX, whence a +
o < t < b + o since (a, b) is the maximum existence interval. The next
theorem shows in what sense X(·, to, xo) depends continuously on (t 0 , x 0 ) .
THEOREM. Let the function F:G + Rn satisfy conditions A)-C) of Sec-
tion 2.5.1 on the open set GcRx R", and let x:[to, t1l + Rn be a solution
of Eq. (1) whose graph is contained in G: f={(t, x(I))Jt~:S;:t~t 1 }cG.
Then there exist o > 0 and a neighborhood G::Jf such that for any
(t 0 , x0 )EG
the solution X(·, to, xo) of Cauchy's problem (1), (2) is defined
on the closed interval~= [to- 8, t1 +~],and the function (t, to, xo) +
X(t, to, xo) is jointly continuous with respect to its arguments on~ x G.
In particular, X(t, to, xo) + x(t) for to+ T, xo + x(T) uniformly with
respect to t E 11.
~. A) Applying the local existence theorem to the points (t 0
x(to)) and (tl, x(tl)) we extend the solution x(·) to a closed intervai
~ = [t 0 - 8, t 1 + ~] such that the graph of x(·) remains inside G. After
that we choose E so that
X={(t, x)Jix-x(I)\~E', IE11}cG.
Applying the theorem of Section 2.5.2 to the compact set X, we find o > 0
such that for (t., Xn)E::Jt" the solution X(·, to, xo) is defined on [to-o,
t 0 + o] . Finally, let k ( t) and x (-) be the locally integrable functions
corresponding to X according to condition C) and Lemma 1 of Section
2.5 .1.
Now we shall consider the domain
G={(t, x)ltE(t.-6',
+B), lx-x(t)l<e} t1

and show that for a sufficiently small E > 0 this domain possesses the
properties indicated in the statement of the theorem.
B) Let two solutions Xi(·), i = 1, 2, of Eq. (1) be defined on a
closed interval [~, y]c/1 such that (t, X;(t))EX' for S .;;; t.;;; y, and let T E
[S, y). Applying the mean value theorem (Section 2.2.3) and inequality
(1) of Section 2.5.1, we obtain

I x1 (t)-x, (t)l =I xi ('t') + j F (s, X1 j


(s)) ds-x, ('t')- F (s, X2 (s))-ds I~
~\Xd't')-x2 ('t')l+li [F(s, X1 (s))-F(s, x,(s))ldsl~
~\x1 ('t')-x~('t'JI+Ij k(s)\x 1 (s)-x,(s)ldsl. (3)

By the lemma of Section 2.5.3 [~{s)=k(s), w(s)=lx1 (s)-x,(s)1!; b=Jx 1 ('t')-x 2 ('t')[],
we have

(4)
126 CHAPTER 2
C) Let us choose E > 0 such that
J k(•)d•

ee • ~E, (5)
and let (t0 , x0)-E;G. Since Gc:.~, the solution x(·) =X(·, to, xo) is defined
on[~. y]=(t 0 -6, t 0 +6]nL\. Let us show that it remains in X on this closed
interval. Indeed, let
T=sup{tjt 0 ~t~min{t 0 +6, ~+B}; (6)
(s, x (s)) EX, Ys E [t 0 , t]}.
The argument of Section B) of the proof is applicable to the solutions
x(·) and x(·) on the closed interval [to, T], and, by virtue of (4), we
have
T
~ ~ Jk(s)ds Jk(s)ds ~
I x(T)-x(T)I ~I X 0 - X (t 0)1 e'• < ee 6 ~e.

Therefore (T, x(T))EintX, and the point (t, x(t)) remains in X for t >
T close toT as well, which contradicts the definition (6). Fort~ to the
argument is quite similar.
Since (p, x(p))EX and (y, x(y))EX, the solution x(·) is defined on the
closed interval Uo;_26, to +26] nL\. Applying the same argument to this
closed interval, we see that (t, x(t)) remains in ~. Continuing this
procedure we conclude that x(·) is defined on ~.
D) As has been proved, any solution X(·, to, xo) remains in x· if
(t0 , x0 ) E 6.
Using function x ( ·), we write the inequality

(7)

Now let us take two solutions for which (t 0 , x0 ) E G and (t0 , x EG •


0) Then,
by virtue of (4) and (7),
IX(l, t:, X)-X(t, t
0 0, X0 )!~
~!X(t, t 0 , X0 )-X(t, t 0 , X0 )[+1X(t, f0 , X0 )-X(t, t 0 , X0 )l~

~ I
t I __
) x(s) ds +I X0-X (t 0,
Jfk(s)dsJ
t0 , Xo)l e T. ~

~If x(s)dsl + {txo-Xoi+IJ."x(s)ds)}Jk(s)ds (8)

It is evident that the right-hand member of (8) tends to zero for t + t,


to+ to, xo + xo, which proves the continuity of X(t, t 0 , x 0 ). Putting
t = t, to = '• xo = x(c) in (8) and taking into consideration that, accord-
ing to the uniqueness theorem, the identity X(t, '• x(c)) : x(t) holds,
we obtain

whence we see that X(t, to, xo) + ~(t) uniformly for x 0 + ~(<), to+'· •
Now we pass to the continuity of solutions with respect to parameters.
Let us suppose that we are given a family of differential equations
x=Fa.(t, x) (9)
dependent on a parameter a E ~ , where ~ is a topological space.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 127

THEOREM 2. Let the functions of the family {F.. : G-RnJaE)l{} satisfy


conditions A) and B) of Section 2.5.1 on the open set GcRx Rn for each
aE)l{ : let x:
[to, td -+ Rn be a solution of Eq. (6) for a = 0., and let its
graph be contained in G: r={(t, x(t))jf'o~t~i;_}cG; also let the following
conditions hold:
C') For any compact set :KeG there exist locally integrable functions
x(·) and k(·) such that for any (t, x)E9'~, aE)l{ the inequalities
IFa(t, x)j~x(t), IIFax(t. x)jj~k(t) (1 0)
hold.
D)
7.
lin~~ IFa(s, x(s))-Fa(s, x(s))jds=O. ( 11)
a-+a 4

Then there exist 5 > 0, a neighborhood G:::>r in G and a neighborhood


U;,a in ~1 such that for (t 0 , x0 )EG, aEU the solution xa(t, t 0 , x 0 ) of
Cauchy's problem (9), (2) is defined on the closed interval~= [to- 8,
t1 +iS], and
Xa(t, 10 , x0 )-x(t)
for / 0 -•'t'E[i0 , t1 ], x0 -+X('t'), a-+a uniformly with respect to tE[i0 , [ 1 ].

Proof. With a few changes the proof of the theorem is carried out
according to the same scheme as the proof of Theorem 1. Namely, in
Section A) we take into account that the number o in the local existence
theorem is specified by inequalities (7) and (8) of Section 2.5.2, and, by
virtue of the condition C'), we can take one and the same o for all aE)l{.
In Section B) we replace x1(·) by the solution xa(•) of Eq. (9) and
x 2 (·) by the solution x(·). When estimating the difference between the
solutions, we must take into account that the right-hand sides of the equa-
tions are different now and therefore

jXa(t)-x~t)j ~jxa.('t')-x('t')j+lf {F.. (s, x(s))-F.x (s, x(s))}dsl+


+If k (s)j x .. (s)-x (s)jds I,
and (4) is replaced by the inequality

lx.. (t)-x-(t)j~ ( ,lxa.('t')-x('t')j + I J{F"(s, x(s))-Fa(s, x(s))}ds


t I) lfk(s)dslr.
e' (4 I)

In inequality (5) the number E should be replaced by 2E, and, accord-


ing to the conditions C') and D), the number 8 and the neighborhood U should
be chosen so that the inequalities

li {Fa.(s! x(s))-F.x(s, x(s))}d.sl~:ciF«(s, x(s)-F.x(s, x(s))jds~


<2( J +7
\1,-0
fx(s)ds)+f1Fa(s, x(s))·-F,x(s.,-x(s))jds<e
t, t,
(12)

hold. Then the argument of Section C) [with (4) replaced by (4')] re-
mains valid for all aEU simultaneously.
128 CHAPTER 2

In inequality (8) of Section D) the correction corresponding to


the replacement of (4) by (4') should be introduced, after which we obtain

IX"(t, 10 , x)-x(l)/ ~ (lx.-x('r)l+jf x(s)dsl+


0

7,
I, ) f k(s)ds
+~ IF"(s, x(s))-F;;; (s, x: s))/ds el,
t,

and the rest is obvious. •


COROLLARY. Let all the conditions of Theorem 2 be fulfilled, the
condition D) being taken in the strengthened form:
D') for any function x:[to, t1J + Rn whose graph is contained in G
a
and any E ~ the relation
I,
lind7o /F"(s, x(s))-Fry_(s, x(s))/ds=O
a_,..a

holds. Then the solution xa(t, to, xo) of Cauchy's problem (9), (2) is a
continuous function of (t, to, xo, a) on~ x GxU.
Proof. The graph of the solution x(·) =X&(·, to, ~o):~ + Rn is con-
tained in G for any (i •. x.) a
EO and E u . Now we can apply Theorem 2 after
replacing x(•) by x(·) in the conditions of this theorem . •
The following identity is an analogue of the basic property of the
principal matrix solution [formula (11) of the foregoing section]:
X (t, -r, X (-r, 10 , x0 )) =X (t, 10 , x 0 ). ( 13)
This identity follows automatically from the uniqueness theorem because
each of its members is a solution of Eq. (1), and fort= T these solutions
coincide. Let the reader analy~e the connection between identity (13) and
Huygens' principle stated in Section 1.1 .3.
2.5.6. Theorem on the Differentiability of Solutions with Respect to
the Initial Data. Up till now the presentation of the material has been
based on the comparatively weak conditions A)-C) of Section 2.5.1. In
order to carry out further analysis and to establish the differentiability
of the solution X(t, to, xo) of Cauchy's problem with respect to the ini-
tial value xo, these conditions should be strengthened to a certain extent.
Then it becomes possible to apply the classical implicit function theorem.
In the proof of the theorem of this section we shall not dwell on the
existence of the solution X(t, to, xo) for (1 0 , x.)EG and its extension to
all IE/j, since both the facts follow from Theorem 1 of Section 2.5.5.
Nevertheless, we recommend the reader to show that these facts also follow
from the implicit function theorem, i.e., it is possible to do without the
reference to the foregoing section. (An analogous argUOlent will be pre-
sented in full in Section 2.5.7, where the differentiability with respect
to to will be proved under still stronger assumptions and a somewhat dif-
ferent version of the implicit function theorem will be applied.)
THEOREM. Let function F:G + Rn satisfy conditions A) and C) of Sec-
tion 2.5.1 on the open set GcRxRn, and let the following condition hold:
B') for any t the function x + F(t, x) is continuously differentiable
on the section 0 1 ={x/(t, x)EG}.
Let x:[to, t1J + Rn be a solution of Eq. (1) of Section 2.5.5 whose
graph f={(t,x(t))/i;.~t~t1 } is contained in G, and let o, E, and Gbe the
same as in Theorem 1 of-Section 2.5.5; let~= [to- 8, t 1 + 8].
MATHEMATICAL HETHODS FOR EXTREHAL PROBLEHS 129

Then for any fixed f 0 EL'l the mapping xo-+ X(·, to, xo) from B(:x(to),
€) into C(6, Rn) is continuously differentiable in Fr~chet's sense, and,
moreover,
8X(t,·t 0 ,x 0 )=Q(t t)
Oxo ' o '
( 1)

where ~ is the principal matrix solution of the variational equation


Z=Fx(f, X(t, i 0 , X0 ))Zo (2)
Proof. Let the compact set ~ and the function k(t) be the same as
in Theorem 1 of Section 2.5.5. The set
:§={x(o)EC(L'l, R")J(t, x(t))EG, fEL'l}
is obviously open in C(6, Rn). Let us define a mapping ifF: R"x:§~C(L'l,
Rn) with the aid of the equality
I

ifF (X 0 , x ( o )) (f)= X 0 + ~ F (s, X (s)) ds-x (t). (3)


t,

It ~s evident that
(4)
Let XoEB(.~(t,),e) be fixed, and let x(t) = X(t, to, xo). We shall
verify that the classical implicit function theorem (Section 2.3.4) is
applicable in the neighborhood of the point (xo, x(·)). Indeed, the de-
rivative ifF x, (x 0 , x ( [so]= So is obviously continuous. Now we compute the
o ))

Gateaux derivative:
fEx(-)(Xo, x(·))[s(o )](t)= lim df(xo. x(·)-!-a~~))-ff(xo, x(·)) (t) =
a+O
t t
=lim 5F (s, x (sJ+a~~s)) -F (s, x (s)) ds -s (f)=~ Fx (s, x (s))S (s) ds- s(t). (5)
a+o 4 ~

To prove the existence of the uniform limit with respect to t we note that,
according to the mean value theorem (Section 2.2.3),

I
s
to
F(s, x(s))-\-a~ds))-F(s, x(s)) ds-.~ Fx(s, x(s))S(s)dsi<S ra(s)ds,
to a
where
ra(s)= max IIFx(s, c)o-Fx(s, x(s))ilis(s)]. ·.
c e [x (s), x (s)+a!; (s)]

The condition B') implies that ra(s) -+ 0 for any s and a+ 0, and, by vir-
tue of the relation (1) of Section 2.5.1, we have lra(s)l ~ 2k(s)llt;ll. By
Lebesgue's bounded convergence theorem [KF, p. 302], limit (5) exists and
is uniform.
The operator
t
fEx<o>(x 0 , x(-))[s(·)](i)= ~ Fx(s, x(s))S(s)ds
t,

is bounded:

It depends continuously on (xo, x(·)):


!Wx<olx., x(·))-fExo(X0 , x(·))ll=
130 CHAPTER 2
t t
= IIGII<l
sup sup I~ F., (s, x(s)}; (s)ds- ~ F .,(s, x(s))6(s)dsJ~
t to . t.
~ ~ 1/F.,(s, x(s))-F.,(s, x(s))]Jds.
4

By virtue of B'), for llx(·)- x(·)ll-+- 0 we have


~F.,(s, x(s))-F.,(s, x(s))JJ-0, VsE~.
Again using relation (1) of Section 2.5.1, we see that IIFx(s, x(s)) - Fx(s,
x(s))ll ~ 2k(s), and therefore Lebesgue's theorem is applicable. Now Corol-
lary 2 of Section 2.2.3 implies that the continuous Frechet derivative
;F.,(·) exists.
Let 'IJ(·)EC(~. Rn) be arbitrary. According to (5),
t
3F., 1.,(x0 , x(·))[s(·>J='IJ(·) ~ ~ F.,(s, x(s))6(s)ds-6(t)=TJ(t), (6)
t.

and to apply the implicit function theorem we must show that the equation
on the right-hand side of (6) is uniquely resolvable. The substitution
~(t) + n(t) = ~(t) transforms it into the integral equation
t
b(t)= SF.,(s, x(s))fb(s)-TJ(s)]ds,
t, .

which in its turn is equivalent to the linear differential equation


~(t)=F.,(t, x(t))b(t)-F.,(t, x(t))'IJ(t); bUo)=O.
It remains to refer to the lemma of Section 2.5.4. By Banach's theorem
(Section 2.1 •5), 3f~~·l is bounded.
By the implicit function theorem, there exists a mapping cp: U--..C(~.
Rn) continuously differentiable in a neighborhood U ~ 0 such that 3F (x0 , x
cp(x0 ))=0. According to (4), we have cp(x0 )(t)==X(t,t0 ,x0 ), and therefore the
mapping x 0 -+- X(t, to, xo) is in fact Frechet differentiable at the point
xo. By virtue of the same implicit function theorem,

iJX (~x:•· xo) So= cp' (x.) [so] (t) = - (3F~L o 3F"• [6 0 ]) (t) = 6(t),
and, as was already found, ~(t) is a solution of the integral equation (6),
which takes the form
t
SF.,(s, x(s))~(s)ds-s(t) =-6 0•
t, .
Consequently,
~=F.,(s, x(s))6, 6(to)=~o,
and the theorem of Section 2.5.4 implies that ~(t) = n(t, to)~o, whence
ax (t, t , x )=Q(t, t ).
-a •
0 0 0
Xo

2.5.7. Classical Theorem on the Differentiability of Solutions with


Respect to the Initial Data. In this section we again consider Cauchy's
problem
x=F(t, x), (1)
x(t 0 ) =X1 , (2)
but this time in the classical situation when the given function F(t, x)
and its partial derivative Fx(t, x) are continuous in G. For the sake of
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 131

convenience, the theorem stated below will be proved independently of the


theorems of Sections 2.5.5 and 2.5.6. The references to the local exis-
tence theorem (Section 2.5.2) and the uniqueness theorem (Section 2.5.3)
can be replaced by the reference to any textbook on differential equations.
THEOREM. Let function F:G + Rn and its partial derivative Fx be con-
tinuous on the open set Gc:·RxR", and let :le: [to, t1] +an be a solution
of Eq. (1) whose graph f={(t,x(t))jt0 ~t~t1 } is contained in G.
Thep there exists b > 0 and a neighborhood G:Jf such that for any
(t 0 , X 0)EG the solution X(t, to, xo) of Cauchy's problem (1), (2) is defined
on~ = [to-~. tl + ~] and is a jointly continuously differentiable func-
tion with respect to its arguments in the domain (to-~. t1 + ~) x Gand,
moreover,
X(t. -r, x(-r))=x(-r), (3)
iJX - -
ar(t, -r, x(-r))=F(t,x(t)), (4)
iJX (t,
-01
0
t 0 , -x(-r)) I
to=t
=-Q(t, -r)F(-r, ~x(-r)), (5)

i)iJX (t,T,X0 )1 ~ =Q(t,T), (6)


Xo x.=x(t)

where O(t, T) is the principal matrix solution of the variational system


of equations
z=Fx(t, x(t))z.
Proof. A) Applying the local existence theorem to the points (to,
x(to)) and (tl, x(tl)), we extend x(·) to the closed interval~= [to-~.
t 1 + ~] so that the graph {(t,x(t))jtE~} remains inside G. After that we
choose e > 0 such that
~={(t, x)ltE~.Ix-x(t)j.:s:;;;e}c: G.
Functions F(t, x) and Fx(t, x) are uniformly continuous on the compact set
~.

B) Now we define a function g{; ~---+C(~. R")XR" in the domain


~={(x(·), !0 , X0)1!x(t)-x(t)l <e. Yt.E~; t 0 Eint~. X 0 ER"}c:C1 (~. R")xRxR"
with the aid of the equality
dx (I) -F (I x (I)))
g{(x(·), t 0 ,x0 )(t)= ( dt ' • (7)
x(to)-x 0

It is evident that the equality 3f(x(·), t 0 , x0 )=0 is equivalent to Cauchy's


problem (1), (2). In particular, by the condition of the -theorem, T(x(·),
T, X(T)) = 0.
The function g{ is continuously differentiable in the domain ~.
Indeed, (7) involves the linear mappings x(·) + dx(·)/dt and x 0 + -xo,
the operator of boundary conditions (x(·), to)+ x(to), andNemytskii'sope r-
ator {x(t)} + {F(t, x(t))}, which are all continuously differentiable ac-
cording to Sections 2.4.1 and 2.4.3, and, moreover,

Tx<·>(X(·), t0 , Xo)[s(·)]=(~-Fx(t,x(t))S), (8)


6(to)
Tt,.(X(·), to, Xo)[aJ=(x(~)a)' (9)

g{ "• (x( · ), t0 , Xo)[so] = ( -~J · (10)


132 CHAPTER 2

C) Let ~ ~(t); then (t, x) E r. We shall verify that the operator


tFx<·> (x( · ), 'i, x) is invertible. The equ3lity

is equivalent to Cauchy's problem


~-F" (t, (t)) = (t), (t) = i'· x s s ( 11) s
According to the theorem of Section 2.5.4, problem (11) possesses the solu-
tion
t
s(t) =0 (t, t) i' +) Q(t, s)S (s)ds.
t
Thus, tFx<·l maps the Banach space C1 (L1, Rn) onto the whole Banach space C(D.,
Rn) x Rn. Since problem (11) possesses only the zero solution for s(·) =
Q. andy = 0, the operator tFx<·l is one-to-one. By Banach's theorem (Section
2.3.4), the operator tF;l., exists and is bounded.
D) According to the classical implicit function theorem (Section
2.3.4), there exist 8 > 0 and a continuously differentiable mapping
<D: {(to, Xo) II to-t I< ll, lxo -xl < ll}-+ C 1 (~. R")
such that

Let us denote
X (t, f 0 , X0 ) = <D (t 0 , X0 ) (t).
Then X(t, to, xo) LS the solution of Cauchy's problem (1), (2), and we have
ax F (t, X (t, t
ar= X 0 )), (12)
0,

ax a<D
ar;;=ar; (to, Xo) (t), (13)
ax a<D
;;-=;;-(to,
vx 0 vX 0
Xo)'(J). ( 14)

If (to, ~o) +(to, xo), then ¢(to, xo) +¢(to, xo) in C1 (D., Rn), i.e.,
X(t, t 0 , x0 ) + X(t, to, xo) uniformly on D.. It follows that X(t, t 0 , x 0 )
is jointly continuous with respect to its arguments. The same considera-
tion and formulas (12)-(14) show that the derivatives of the function X(t,
to, xo) are also continuous.
E) The foregoing argument was applicable to the initial data (to, x 0 )
belonging to the a-neighborhood of the point (i, x) E r . This point was
chosen arbitrarily, and the whole graph r can be covered with such neigh-
borhoods. If x( 1 )(t, to, xo) and x( 2 )(t, to, xo) are defined in two of
these neighborhoods having a common point (to, xo), then
X<ll (t 0 , t 0 , X0 ) = X0 = X< 2 l ( t 0 , t 0 , X0 )

and the uniqueness theorem of Section 2.5.3 implies that


X<ll (t, t0 , X0 ) = X< 2 l (t, t0 , X 0 ), Vt E ~.
Consequently, the solution X(t, to, xo) is defined correctly for (to, x 0 )
belonging to a neighborhood of r and is obviously continuously differenti-
able in this neighborhood.
F) It remains to derive formulas (3)-(6). Equality (3) holds accord-
ing to the uniqueness theorem, and then (4) is a consequence of (12). By
the implicit function theorem,
MATHEMATICAL METHODS FOR EXTRE}~ PROBLEMS 133

and therefore (13) and (9) yield


aX
~~ (t, 't,X('t))[ctJ=-o'T.<L(x(· ), 't, x('t))
A A A (0 )
uo xOOa
A •

()X
Therefore, a~;(t, 't, x('t))
A (
regarded as a function of t) is the solution of
Cauchy's problem (11) with s = 0 andy= -i(T) = -F(T, x(T)), and hence
(5) takes place.
Similarly,

and therefore from (14) and (9), we obtain

~X (t, 't, x(<))[yJ=-o'T,;~.)(x(·), 't, X('t)) ( 0


v~ -v).
Consequently, aX (t, 't, X('t)) (regarded as a function of t) is the solution
ilx 0
of Cauchy's problem (11) with s = 0, whence

~X
vXo
(t, 't, X ('t))[ y] = Q (t, 't) y,

which is equivalent to (6). •

2.6*. Elements of Convex Analysis


Convex analysis is a branch of mathematics in which convex sets and
functions are studied. The application of these concepts was already demon-
strated in Section 1.3.3 (the Kuhn-Tucker theorem). Since the concept of
convexity plays an important role in most of the works concerning extremal
problems, we shall present here, besides elementary facts, fundamentals of
the duality theory (the Fenchel-{1oreau theorem). We shall also present the
simplest properties of the subdifferential (a concept generalizing the con-
cept of the differential to the case of convex functions). In Sections
3.3, 3.4, and 4.3 we shall demonstrate the application of these concepts.
The presentation of the material in this section is primarily based on Sec-
tions 2.1.3 and 2.1.4.
2.6.1. Basic Definitions. Let X be a real linear space.
Definition 1. a) A set Cc X is said to be convex if it contains,
along with any two of its points x1 and x2, the whole (closed) line segment
fx 1 , x,J={xjx=ctx1 +(1-ct)X 2 , O~ct~l}.

b) A set Ac X is called an affine manifold if it contains, along with


any two of its points x1 and x2, the whole straight line
{xjx=ax 1 + (1-a) x,, aE R}.
c) A set Kc X 1.s called a cone (with vertex at the origin) if it
contains, along with its any point xo, the whole ray
{ax0 ja>0}.
By definition, an empty set is assumed to be a convex set, an affine
manifold, and a cone. The family of all the convex sets belonging to X is
denoted by )[\(X).
134 CHAPTER 2

Definition 1 immediately implies:


Proposition 1. a) The intersection of an arbitrary collection of con-
vex sets (affine manifolds or cones) is itself a convex set (an affine
manifold or a cone).
b) The image f(A) and the inverse image f- 1 (B) of convex sets (affine
manifolds or cones) Ac X and BeY under a linear mapping f :X + Y is a
convex set (an affine manifold or a cone).
c) A translate C + ~ of a convex set (an affine manifold) C is a con-
vex set (an affine manifold).
d) A cone K is a convex set if and only if
Xt. x.EK~(x 1 +x.)EK.
e) An affine manifold A is a linear subspace of X if and only if 0 EA.
Exercise 1. Prove that an affine manifold is a translate of a linear
subspace.
Definition 2. a) The intersection of all the convex sets C containing
a given set H is called the eonvex hull of the set H and is denoted by
convH:
conv M = n C, C E ill (M). (1)
C:::!M

b) If in (1), instead of the convex sets, all the convex cones K~M
(the affine manifolds A ~M or the linear subspaces L ~M) are taken,
then their intersection is called the eonie (the affine or the linear) hull
of H and is denoted by cone H (aff H or 1 in H) •
Definition 3. Let Xl••••,xn be elements of X. An element
n
X=~ A·X· (2)
1=1 I I

is called a linear or an affine or a eonie or a eonvex combination of


xl,••••xn if in (2):
Ai are arbitrary (for a linear combination)
or
n
~ A;= 1 (for an affine combination)
1=1
or
Ai ~ 0 (for a conic combination)
or
n
~A;= 1, A;~O (for a convex combination).
1=1

It is proved by induction that if Xl, •.. ,xn belong to a convex set C


(a convex cone Koran affine manifold A or a linear subspace L), then
their convex (conic or affine or linear) combination belongs to C (K or A
or L).
Proposition 2. a) The convex (the conic, the affine, or the linear)
hull of a set H consists of all the convex (the conic, the affine, or the
linear) combinations of the elements of H.
b) H is a convex set (a convex cone, an affine manifold, or a linear
subspace) if and only if it coincides with its convex (conic, affine, or
linear) hull.
Proof. We shall consider the case of a convex set H since in the
other cases the proof is similar. Let us denote by Mthe set of all the
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 135

convex combinations of the points of M. We have M ;:)M, and the set M is


convex because if
m, m,
X1 = ~ rxilxil• x2 = ~ rx; 2 x,. 2 , au~ 0,
i=l 1~1

then for rxEfO, I] we have


m1 ms
ax1 + (1-a) x 2 = ~ aailx; 1
i=l
+~ 1=1
(1-a) a; 2X; 2 ,

and the last sum is a convex combipation of the points x11o···•xm 1 1,


Xlz, ... ,xm 2 z. Consequently, McM (because convM is the intersection of
all the convex sets containing M). On the other hand, as was mentioned
above, each point of M is contained in every convex set containing M, and
therefore M c conv M.
If M is convex, then we can take C =Min (1), and consequently we
obviously have conv M = M. Conversely, if conv M = M, then M is convex ac-
cording to the definition of the convex hull and Proposition 1 a). •
Exercise 2. Prove that the arithmetical sum

M1 + ... +Mn={xlx= ±x;, x;EM,


1=1
i=l, ... , n}
of a finite number of convex sets (cones or affine manifolds or linear
spaces) possesses the same property.
Examples. 1) The nonempty convex sets on a straight line are all the
single-point sets and the intervals of all kinds (the closed intervals, the
open intervals, the half-open intervals, the closed and the open rays, and
the whole straight line itself).
2) Every linear subspace or affine manifold is a convex set.
Exercise 3. Prove these assertions.
3) The convex hull of finitely many points is called a convex polytope.
An important special case of a convex polytope is an n-dimensional simplex,
i.e., the convex hull of n + 1 affinely independent points (none of which
is an affine combination of the others).
Exercise 4. Prove that the convex hull of three points in R2 not
lying in one straight line (a two-dimensional simplex) is the triangle with
vertices at these points. What happens when the points are in one straight
line?
For a finite-dimensional space Proposition 2 can be strengthened. The
corresponding assertion will be called Caratheodory's theorem although,
strictly speaking, this name is only related to the part of the theorem
concerning the convex hull.
Caratheodory's Theorem. Let n = dim (lin A) < ""· Every point belong-
ing to the convex (the conic, the affine, or the linear) hull of the set A
is a convex (a conic, an affine, or a linear) combination of not more than
s points Xl,•••oXs of A.
For the convex and the affine hulls, s "' n + 1, and the points x 1 , ... ,
xs can be regarded as affinely independent.
For the conic and the linear hulls, s = n, and the points x1, ... ,xs can
be regarded as linearly independent.
136 CHAPTER 2

Proof. A) We shall limit ourselves to the most complicated case of a


convex hull; after the proof we shall indicate what changes must be made
for the other three cases.
According to Proposition 2 a), every point xEconv A can be repre-
sented in the form

X=A1x1+···+AkXk, Ak>O,
k
~ A1 = I, x 1 EA (3)
i=l

(in the definition of a convex hull, we have \i ~ 0, but it is clear that


\i = 0 can be discarded).
We shall show that if x 1 , ••• ,xk are affinely dependent, then the num-
ber of the points can be diminished while the representation (3) is re-
tained. Indeed, if the points xl,••·,Xk are affinely dependent, then one
of them is an affine combination of the others without loss of generality,
k-1
we can assume that X~o=f! 1 X 1 + ... +fA-k-lXk-· 1, ~ f!;= I· Putting )Jk = -1, we ob-
k k i=l
tain ~ f!;X;=O, ~ f!;=O. Now let
i=l i=l

Then
k k k k
x = ~ A1x1 +a ~ f.t;X;= ~ (A 1+af! 1)x1= ~ A;x1,
1=1 i=l i=l i=l

and, by virtue of the choice of a, we have t.i ~ 0, i = 1, 2, ... ,k, and at


least one number t.i is equal to zero. Deleting from the set {xl, ... ,xk}
those Xi for which t.i = 0, we arrive at a representation of form (3) with

a smaller number of points c~ Ai = t~ A; 11


+a ~ f!; =I)·

Let us denote by s(x) the smaller power of the set {x 1 , ••• ,Xk} for
which representation (3) is possible (any nonempty set of natural numbers
contains the smallest number). Then the above argument shows that the
points of the set {xl, ... ,xs(x)} must be affinely independent. It remains
to take into account that in an n-dimensional space any n + 2 points
{x 1 , ••• ,Xn+ 2 } are affinely dependent. Indeed, the points Xi- Xn+ 2 , i =
1, ... ,n + 1, must be linearly dependent, and if, for instance, Xn+t--Xn+ 2 =
n n
n ( n ) n
~~A; (X;--Xn+ 2), then Xn+t = 1-- 1~ A; Xn+2+ ~~ A;X1 • Since l--~A 1 +~'A 1 =1,
i= I i= I

the point Xn+l is an affine combination of the other points.


B) In the case of an affine hull, the part of the above argument con-
nected with the inequalities \i > 0, t.i ~ 0 is dropped, and we can put a =
-\i/)Ji for any i with )Ji ~ 0.
In the two remaining cases affine combinations are replaced by linear
ones, and the number of linear independent points cannot exceed n. For a
conic hull the argument with the inequalities is retained; for a linear
hull it can be dropped. •
Now we proceed to the description of convex functions. Here and
henceforth in the present section by a function we shall mean a mapping
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 137

f :X + i, where i = R U { - oo, + oo} is the extended real line.* With every


function f the following two sets can be associated:
dam f = {x EX If (x) < + oo}
and
epi f={(x, a) E Xx RIa> f(x), xE domf}.
They are called the effective domain and the ep~graph of the function
f, respectively.
Exercise 5. Prove that a set CcX X~ is the epigraph ·of a function
f:X + R if and only if Vx, {a.l(a., x) E C} is 0, R or [a, +oo), 3:a.
Definition 4. A function f for which domf ~ 0 and f(x) > -oo every-
where is said to be proper; all the other functions are called improper.
A function f:X +Ron a linear space X is said to be convex if epif is a
convex set in X x R. The set of all the convex functions on X is denoted
by 'P (X).
This definition immediately implies:
Proposition 3. a) For a proper function f to be convex it is neces-
sary and sufficient that Jensen's inequality

(4)

hold for any points X; Edam f and any a.i ;;;, 0, i 1, ... ,n, such that
n
~ Cl.i = 1.
i= 1,
n
b) A sum ~ {; (x) of a finite number of convex functions and the upper
i=l

bound Vafa(X)=sup{fa(x)} of any family of convex functions are convex.


a

Examples. 1) Convex Functions on the Straight Line. Let fo(x):(a.,


8) + R, and let the derivative f o (x) exist on the open interval (a,~) S R
and be nondecreasing. Let us put
x _{ fo (x), x E (a, ~).
f( )- + oo, x ~(a, ~).
For this function we have domf =(a., 8). Let us verify that it is convex.
An arbitrary point of the line segment joining two points (x 1, z1)Eepif, i =
1, 2, has the form (ax1 +(1-a)x 2 , az 1 +(1-a)z 2 ), aE[O, 1] . Let us apply La-
grange's formula; this yields
az 1 + (1-a) Z 2 -f (ax1 +(I -a) X 2 ) >af (x1 ) +(I -a) f (x 2 )-f (ax 1 +(I -a) x 2 ) =
=af' (c1 ) (x1 -(ax1 + (1-a) X2 )) +
+(I -a) f' (c 2 ) (x 2 -(ax1 +(I -a) x2 )) =a ( 1-a) (f' (c1 )-f' (c 2 )) (x1 -X 2 ) > 0
*The arithmetical operations and inequalities involving the improper num-
bers +oo and -oo are defined thus (below a E ~ is arbitrary and p > 0):

<+ oo)+(+oo)= + oo, ("+ oo) (± oo)= ± oo, + oo >a,


± oo+a= ± oo, (- oo) (± oo)= 'f oo, - oo <a,
(- oo)+(- oo)=- oo, (± oo)·p=± oo, + >- oo,
00
-(+-oo)=-oo, (± oo)·O=O,
-(-oo)=+oo, (± oo)·(-p)='F OO•

The expressions (+oo)- (+oo), (+oo) + (-oo), and (-oo)- (-oo) are regarded as
meaningless; the operations of division of the improper numbers are not
defined either.
138 CHAPTER 2

because we have X1 ~C1 ~ax1 +(I-a)x.~c.~x. and the derivative is nonde-


creasing by the hypothesis. Consequently, (ax1 +(1-a)x.,,az 1 +(1-a)z 2 )Eepif,
and f is convex. • ·
Exercise 6. Let a convex function f:R + R be finite on an open inter-
val I= (a, S). Prove that f is continuous on I and possesses the left-
hand and the right-hand derivatives f~(x) and f~(x) at each point xEI and
that these derivatives are nondecreasing on I and coincide everywhere ex-
cept, possibly,at not more than a countable number of points.
2) Let U be a function of class C2 (Section 2.2.5) on an open convex
set U of a normed space X, and let its second differential d 2 fo = f~(x)[s,
s] be nonnegative. As in the foregoing example, let us put
f(x)={ fo(x), x E U,
+oo, x~U.
The restriction of this function to any straight line in X possesses the
properties of the function considered in the foregoing example (prove
this!). Therefore, f is convex (to verify the convexity of epi f we can limit
ourselves to all the possible sections of this set by vertical two-dimen-
sional planes in X x R).
These examples yield a great number of concrete convex functions. On
R such are the functions eX, ax 2 + bx + c for a~ 0, -logx (completed by
the value +oo for x ~ 0), etc. In the Euclidean space the quadratic func-
tion f(x) = (Axlx) + (b lx) + c is convex, where A is a positive-definite
symmetric operator.
3) Indicator Function. This is the function

M(x)=
{ +oo,
0,
x~A,
xEA.
I t is clear that M (x) E 'Y" (X) if and only if A E lB (X).
4) Minkowski's Function (Section 2.1.3). Jensen's inequality (4) is
a source of many important inequalities used in various branches of mathe-
matics. In particular, applying it (with ai = 1/n) to the functions

-logx, x>O,
!1(x) = {
+oo, x,;;;O,

l
and
xlogx, x>O,
f 2 (x)= 0, x=O,
+oo, x<O,

we obtain, in the first case, the well-known Cauchy inequality connecting


the arithmetic and the geometric means and, in the second case, the im-
portant inequality of the probability theory for the entropy H(p) of a
n
distribution (pl,•••,Pn), where P;=xd~ X;, x;;;;:,O:
i=l
n
H(p)=- I p,logp,,;;;iogn.
i=l

2.6.2. Convex Sets and Functions in Topological Linear Spaces. Let


us suppose now that X is a locally convex topological linear space [KF,
Chapter 3, Section 5]. As is known, in this case the conjugate space X*
consisting of all continuous linear functionals on X is sufficiently rich
[KF, Chapter 4, Sections 1-2], which yields a great number of convex sets
and functions.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 139

The simplest convex sets in X are hyperplanes and half-spaces (here


x•ex•):
H (x*, a)= {X I (X*, x> =a} (a hyperplane),
H+ (x•, a)={xl<x•,_x>~a}}
(closed half-spaces),
H_ (x•, a}={xl <x•, x> ~a}.
H_

(x*, a)=.{xl <x•, x> <a}} (open half.-spaces),
H_(x•,a)={xi<x•, x>>a}
Exercise 1. Check that these sets are convex.
Further, by a convex polyhedron we shall mean the intersection of a
finite collection of closed half-spaces (it should be stressed tha~ such
an intersection is not necessarily a bounded set).
Exercise 2. Prove that every convex polytope (Section 2.6.1) in Rn
is a convex polyhedron and that every bounded convex polyhedron is a con-
vex polytope.
Now we present the list of the topological properties of convex sets
which we need for our further aims (for the topological terminology see
[KF]).
Proposition 1. Let A be a convex set. Then:
a) all the points of the half-open line segment [x 1 , x 2), where x 1 E
int A and x2 E A belong to int A;
b) the interior intA and the closure A of the set A are convex; if
int A ;t 0, then A = int A.
Proof. a) Let VcA be a convex neighborhood of the point x 1 • An
arbitrary point xE[x1 , x2) has the form x = ax1 + (1- a)x2, 0 <a~ 1, and
aV + (1- a)x2 is its neighborhood contained in A, i.e., xEintA.
From the assertion a) it follows that if int A ;t 0 then A c int A,
whence Ac intA. The opposite inclusion is obvious.
b) If X; E int A, i = 1, 2, then, according to a), we have fx 10 x2 JE int A ;
i.e., intA is convex. Now let x,.EA, i = 1, 2. Let us take an arbitrary
convex neighborhood V of zero. By the definition of the closure, there
exist xiE(x;+V)nA, i = 1, 2. For an arbitrary point x=ax 1 +(1--a)x.E[X1 ,
x2] let us put x' = axi + (1- a)x~. Then x'EA andx'Ea(x.+V)+( I-
a) (x 2 + V) = x + V; i.e., every neighborhood of the point x intersects A, whence
xEA and A is convex. •
Definition 1. The intersection of all the closed convex sets contain-
~a set A is called the convex closure of the set A and is denoted by
conv A.
It is clear that ~A is convex and closed.
Proposition 2. a) conv A = A ~ the set A is convex and closed;
b) the set ~A coincides with the intersection of all the closed
half-spaces containing A;
c) ~A= convA.
~ The first assertion follows at once from the definition. The
inclusion eonv A c Coi1v A is also evident (not every convex set containing
A is necessarily closed) and therefore conv A c conv A. On the other hand,
the set conv A is convex [Proposition 1 b)] and closed, and contains A, and
therefore, according to the definition, it must contain convA, whence
conv A =COnY A.
140 CHAPTER 2

Further, let us denote by B the intersection of all the closed half-


spaces containing A. Then conv AcB (not every closed convex set contain-
ing A is necessarily a half-space) . Let X 0 If. conv A. By the second separa-
tion theorem (Section 2.1.4), there exists a linear functional x*EX*
strongly separating the point xo and conv A:
<x*, x,> > sup <x*, x>=a.
XE conv A

I t follows that X 0 ~H+(x*,a)::::lconvA::::lA',and therefore X 0 ~B. Consequently,


convA =B. •
Exercise 3. Let us consider the infinite-dimensional ellipsoid B =
{ x=(x 1 , x 2 , ••• ) In~1 (xnlan) 2.;;;1 }, with semiaxes au in Hilbert 1 s space l 2 •

a) Prove that the set B is convex and closed.


b) Find the conditions on an under which intB ~ 0.
c) Let int B ~ ¢, and let a point y (yl, Y2, ••. ) lie on the boundary
00

of the ellipsoid, i.e.' ~ (Ynlan) 2 = 1. Prove that it is possible to


n=-1

draw a hyperplane through y such that B lies on its one side.


d) Is the foregoing assertion always true when int B = 0?
The simplest convex sets are half-spaces, and, similarly, the simplest
convex functions are affine functions
a (x) = <x*, x)-b, x• EX, bE R.
Exercise 4. Prove that a(·) is convex; find epia.
Propositions and 2 have their analogues to whose statement we now
proceed.
Definition 2. Let f:X + R. The function f specified by the condition
epi f = epi f is called the closure of the function f; if f = f, the func-
tion f is said to be closed. The function convf determined by the condi-
tion epi (conv f) = conv (epi f) is called the convex closure of f.
Exercise 5. Verify that epi f and conv f are epigraphs of certain func-
tions (see Exercise 5 in Section 2.6.1).
It is evident that ~ f = f ~ f is convex and closed.
We also remind the reader that a function f:X + R is said to be lower
semicontinuous at a point xo if lim f (x) ~ f (x,), and that it is simply called
X-+Xo

lower semicontinuous if the same holds for any x 0 •


Exercise 6. Prove that f:X + R is lower semicontinuous if and only
if the set -Z'cf={x If (x) .;;;;c} is closed in X for any c E R.
Proposition 3. a) For a proper function to be closed it is necessary
and sufficient that it be lower semicontinuous.
b) For a closed function f to be continuous on int dom f it is suffi-
cient that f ~e bounded (above) 1n a neighborhood U of a point ~ and finite
at the point x.
Moreover, f is proper, int dom f >' ¢,

intepi f ={(x, a) E Xx R Ix E int domf, a> f (x)} =!= 0,


and
f (x) = (conv f) (x), Vx E int dom f.
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 141

In accordance with Exercise 6, we shall verify the closedness of the


set :iA.
Proof. A) Let f be a closed function. Then epif is a closed set in
X x R. The hyperplane H={(x,aJa=c} is also closed. Therefore, the set
~cf={xJf(x)~c} is also closed because we have epifnH={(x,c)J xE2'cf} and
the mapping x -+ (x, c) is a homeomorphism.
Conversely, let all 2 cf be closed and (X0 , a 0 ) ~ epi f. Then we have
f(x 0 )> ao, and consequently f(xo) > ao + E: for some e > 0 and X0 ~2'a,+d.
The set .:Za,+ef being closed, there exists a neighborhood V:;lx 0 such that
Vil.:Za,+d=0·, i.e., we have f(x)>a.+e,xEV. The open set {(x,a)lxEV, a<
ao + E:} contains (xo, ao) and does not intersect epif. Consequentiy, the
complement of epi f is open and epi f itself is closed.
B) LEMMA. Let_X be a locally convex space, let V be a convex neigh-
borhood of a point x in X,_and let f be a convex function on X taking on a
finite value at the point x EV (f(x) ~ -oo) and bounded above in V. Then
f is continuous at the point i.
Proof of the Lemma. Performing the translation x -+ x + i and sub-
tracting the constant f(x) from f we reduce the situation to the case when
x=O, 0 E V, f(O)=O, supf(x)~C. Here V can be regarded as a symmetric neigh-
AEV
borhood of zero [if otherwise, we would consider W=Vn(-V)]. Let 0 <
a < 1 and xEaV. Then we have X/aEV, and the convexity of f implies
f (x) = f'((1-a) 0 +axja) ~ (1-a:) f (0) +af (xja) ~aC.

On the other hand, the symmetry of V implies - x/a E V, and using the equal-
ity 0 = x/(1 +a) + a/(1 + a)(-x/a) and the convexity off we obtain
0= f (0) = f (x/(1 +a) +a!( I +a)(- x;a)) ~ 1/(1 +a) f (x) +a/(1 +a) f (- xja),
and hence f(x) > -af(-x/a) > -aC.
Thus, if xEaV, then lf(x) I '( aC, i.e., f is continuous at zero. •
COROLLARY. Under the conditions of the lemma, intdomf "'C/J.
C) Now we come back to the proof of the assertion of Section b) of
Proposition 3. According to the lemma and its corollary, f is continuous
at i, and we have intdomf "'C/J.
Let us suppose that f(y) = -oo for some y, i.e., (y, a) Eepif for any
aER . Since epi f is convex,
((1-t.)x+t.y, (1-l.)f(x)+t.a)Eepif
for any aER, w):;ence f((1 - >..)i + /..y) = -oo, which contradicts the con-
tinuity of f at x for /.. + 0. Hence f(y) > everywhere, i.e., f is proper.
--C:)

Now let yEintdomf. Let us find p > 1 such that Z=x+p (y-x)Eintdomf
(this can be done because the intersection of the open set int dom f with
the straight line passing through i and y is an open interval of that line).
The homothetic transformation of G with center at z and the coefficient
(p- 1)/p transforms i into y and the neighborhood V into a neighborhood
v' of the pointy. Further, if ~EG(V)=V', then ~= (p-1)/px+z/p, xEV and
f(p~r-l
p
f(x)+~f(z)~r-
p p
1 c+~f(z).
p

Consequently, according to the lemma, f is continuous at the point y; i.e.,


f is continuous on int dom f.
If we have (x 0 , a 0 ) E int epi f., then, according to the definition, there
is a neighborhood W of the point xo and E: > 0 such that
{(x, a) JxE W, Ja-a 0 J < ~>} c epi f,
142 CHAPTER 2

whence x0 Eintdomf and ao > f(xo). The opposite inclusion


{(x,a)lxEintdomf, a>f(x)}c intepif
is evident. In particular, intepi f"' (/).
Since f is convex, conv (epi f) = epi f. Recalling Definition 2 and
Proposition 2c), we \vrite
epi (conv f)= conv (epi f)= conv (epi f)= epi (f) :::J epi (f).
Consequently, we always have (m
f) (x) ~ f (x). In case f is continuous (or
at least lower semicontinuous) at a point x and f(x) > a, the same inequal-
ity is retained in a neighborhood of the point x, and therefore we have
(x,a)~epif=epi(convf) and (convf)(x)>a. I t follows that (convf)(x)=f(x). In
particular, under the conditions of Section b) of the assertion we have
to prove this equality holds for x E int (domf) . •
Exercise 7. Let f be convex on X, and let f(x) = -oo at a point x E
intdomf. Then f_(x)=e-<Y:J,VxEintdornf,
Definition 3. An affine function a(x) <x*, x>- b is said to be a
supporting function to a function f if:
a) a(x) ~ f(x) for all x;
b) for any E > 0 there is x such that a(x) > f(x) -E. In other words,

b=sup{(x•, x>-f(x)}. (1)


X

Minkowski's Theorem. A proper function f is convex and closed if and


only if it is the upper bound of the set of all its supporting affine func-
tions.
Proof. 1) "If." An affine function is convex and closed since its
epigraph is a closed subspace. Further, for any family of functions {fa}
we have
epi {sup fa}= J(x, z) I z;;;::. sup fa (x)l = n {(x, z) Iz;;;::. fa (x)} = n epi fa, (2)
a t a ( a a

and if all fa are closed and convex, then epifa are closed convex sets and
their intersection possesses the same property.
2) "Only if." By the hypothesis, B = epi f is a nonempty closed convex
set in X x R. According to Proposition 2, the set B is the intersection of
all the closed subspaces containing it. Since every continuous linear
functional on X x R has the form (x*,x>+.Az, x*EX•, .AER (see Section 2.1.2),
a closed half-space is determined by an inequality
(3)
Since B = epif is nonempty and contains, along with its point (xo, zo), all
the points (xo, z), z > zo, the set B can be contained in the half-space
(3) only if A ~ 0. For A = 0 we shall call the half-space (3) vertical.
It is evident that
epif={(x, z)lf(x)~z} c {(x, z)l<x-, x>~b}~
~domf c {xl<x*, x>~b}=H+ (x•, b)~ epif c H+ (x*, b) x R.
For A< 0 all the terms of inequality (3) can be divided by IAI, and
thus we can assume that A = -1 and the half-space (3) coincides with the
epigraph of the affine function a(x) = <x*, x>- b. However,

B = epi f c epi a~ f (x);;;::. a (x), Vx ,


MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 143

and therefore
epif=nepian n [H+(x•,b)xRJ. (4)
a<; f dom f C. H + (x*,b)

Now we note that at least one affine function a 0 ~ f exists [if other-
wise, epi f would contain, along with the point (xo, zo), all the points
(x 0 , z), zER, whence f(xo) = -oo, which contradicts the fact that f is prop-
er]. But we have
epia.nH,.(x•, b)x R={(x, z)la.(x)~z. <x", x>~b}=
= n {(x, z)la0 (x)+J..((X 0 , X)-b)~z} = n epi{a0 (x)+J..((x*, X)-b)},
A~O A~O

and if a0 (x} ~ f (x) and dom f c H + (x•, b), then


f(x) <+oo~<x•, x>-b~O~a. (x)+A. (<x•, x>-b)~f(x).
Consequently, epia.n[H+ (x•, b)xR] c n epia, and hence we can discard all
a<;; I
the vertical half-space H+(x*, b) x R in (4). Thus
epif= n epia
a<;f

and, according to (2),


f(x) =sup {a (x) Ia (x} is affine and ~ f (x)}. (5)
It remains to note that in (5) we can confine ourselves to the supporting
functions to f since the supremum does not change if of all the functions
a(x) = <x*, x> - b ~ f(x) with one and the same x* we retain the one for
which b is determined by equality (1), i.e., a supporting function.
Exercise 8. If f:X + R is convex and closed and f(~) = -oo, then
f (x}""'- co, vx E dom f.
For smooth functions we can use the following assertion.
Proposition 4. If a function f is convex and Gateaux differentiable
at a point xo, then the function a(x) = <fr(xo), x- x 0 > + f(xo) is its
supporting function.
Proof. The condition b) of Definition 3 follows from the equality
a(xo) = f(xo). Let us suppose that f(xl) < a(xl) for some x 1 • Then
f(x 1 ) < a(xl) + £ for some £ > 0, and therefore for 0 < a < 1 we have
f(x 0 +a(x1 - X0 }) = f((l-a) x0 +ax1 ).:;;;
~(1-a.)f(x.)+a.f(x,) < (1-a.)f(x 0 )+a.(a(x1 )-e)=
= (1-a.) f (x.) +a. (f (x.) +<fi:(x.), x1 - X0 )-e) = f (x0 ) +a <fi- (x0 ), x1 -X0 )-a.e.
Consequently,
<f'r(x.), x.-x.> =lim f(xo+tt (x, ~xu})- f (xo} ~ <fr (x.), x.-x.>-e,
cqo
which is contradictory. Hence the condition a) is also fulfilled. •
Now let a function f:X +R be Gateaux differentiable at each point
X0 EX. Let us form its Weierstrass function (cf. Section 1 .4 .4):

li(x, X0 )=f(x)-f(x~)-<ff(x0), x-x0 ) .


Proposition 5. For f to be convex it is necessary and sufficient that
the inequality it(x, X0 );;;a.O,
Vx, X0 , hold.
Proof. a) If f is convex, then If (x, X0 );;;::. 0, Vx, x0 is implied by Propo-
sition 4 we have just proved.
b) Let x0 =ax1 +(1-a.)x,, O~a..:;;;l. The inequalities {i(X 10 x.)~O,{f(x,
xo) ~ 0 mean that the points (x 1 , f(xl)) and (x2, f(x2)) lie in the half-
space {(x, z) Iz;;;a. f (X0 ) +<fr (x0 ), x-x0 >}. Consequently, the line segment joining
144 CHAPTER 2
them also lies in this half-space, and therefore
a f (x1) + (l-a) f (x.);;:::, f (x.) + <f' (x0 ), ax1 + (1-a) x. -x.> = f (x.). •
Thus, the fulfillment of Weierstrass' condition of Section 1.4.4 for
all the points (t, x, x)
is equivalent to the convexity of the Lagrangian
L(t, x, x) with respect to x.
2.6.3. Legendre-Young=Fenchel Transform. The Fenchel-Moreau Theorem.
Let X be again a locally convex topological linear space, and let X* be its
conjugate space. Let us take an arbitrary function f:X + i and investigate
in detail the set of its supporting functions. It is natural to assume
that the function f is proper because, according to the definition, f(x) ~
a(x) > -oo everywhere if f possesses at least one supporting function, and
the case f(x) = +oo is trivial: any affine function is a supporting func-
tion.
Definition. The function on X* specified by the equality
f*(p)=sup{<p, x>-f(x)} ( 1)
xeX
is called the conjugate function to f or the Young-Fenchel transfo~ (some-
times the Legendre transfo~) off, and the function
f**(x)=sup {<p, x>-f*(p)} (2)
peX•

is called the second conjugate to f.


It is seen from equality (1) of Section 2.6.2 that the set of the
supporting functions to f and the set of those p EX* for which f*(p) is
finite are in one-to-one correspondence:
(p, f* (p)'=F ± oo)+-+ap(X)=<p, x>-f* (p). (3)
Remark. Despite the seeming symmetry of formulas (1) and (2), we have
f** 7 (f*)*. The matter is that, by definition, (f*)* must be regarded on
(X*)* and not on X. Only for reflexive spaces are (f*)* and f** identified
in a natural way.
Proposition 1. 1) Functions f* and f** are convex and closed.
2) For any xE.X and pEX* there holds Young's inequality
f (x) +f* (p) ;;t (p, X). (4)
3) f**(x) ~ f(x), Vx.
4) If f(x) ~ g(x), then f*(p) ~ g*(p) and f**(x) ~ g**(x).
Proof. 1) It is seen from 1) and 2) that each of the functions f*
and f** is obtained as the upper bound of a family of affine functions,
and therefore [according to formula (2) of Section 2.6.2] the epigraphs of
these functions are intersections of closed subspaces and thus are convex
and closed.
2) It follows from (1) that there must be f*(p) ~ <p, x>- f(x) for
any eX and p eX* ' which is equivalent to (4).
X

3) As has already been proved, f(x) ~ <p, x>- f*(p), whence


f (x);;:::, sup
, {<p, x>-f* (p)} = f** (x).
4) This assertion is an obvious consequence of the definitions. •
Examples. 1) Let f(x) = <x*, x>- b be an affine function. Such a
function f(x) has only one supporting function, which is f(x) itself.
Therefore,
_s~p {<p, x>-<x:, x> +b} = {+OCI,
f* (p) = _ b,
P"Jfi:x:,
p,.. x;, (5)
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 145

and
f"'* (.x)- sup {(P,, x>- f"' (p)} a. <x:, x> -b.;,. f (x). (6)
p

2) As in the example 1) of Section 2.6.1, let

f(x)={fo(X), cx<x<j},
+ ao, x ~(a., A),
and let f~(x) be continuous and monotone increasing on (a, S). Let us de-
note
A= lim f~ (x), B= lim f~ (x).
X-+et+O X--?P-0

Given any pE(A, B), the equality p = f'(xo) holds for some x0 E(a., ~).
Since f(x) is a convex function, Proposition 4 of Section 2.6.2 implies
that

~s a supporting function, and hence the equalities


f*(p)=PX 0 -fo(Xo), p=f~(Xo) (7)
specify f*(p) parametrically in (A, B) [in the classical mathematical anal-
ysis the function f* specified by equalities (7) is called the Legendre
transform of fo].
Exercises. 1. What can be said about f*(p) for p~(A, B) in the above
example?
2. For the following functions f on B. indicate dom f and epi f, verify
the, convexity and the closedness, and compute f* and f**:
a) ax 2 +bx+c, a~O, b) Jxl+!x-a!,
0, a,.;;;x<;~,
c)ill xj-IJ, d) .S [ex, ~] (x)= { d: [ R]
oo,
f){++oo,Vi-x
x~ a,~-''

e){-~, JxJ.:;;;l, 2, Jxj <I,


+oo, Jxl>i, jxj~l,

g) e',. { - log X, X > 0,


h) 0, x...;;O,

i) 0,txlogx,

+oo,
x>O,
x=O,
x<O,
. {xlog..r. x>O,
J) +oo, x.;;;O,

k)
f x"
a' x~o. (a~ 1),
l+oo, x<O
m) I x !"Ia (a~ 1).
3) The example below will be used in Section 3.3.
Proposition 2. Let f:Rs + R be the function determined by the equal-
ity f(x) = f(xl, ... ,xs) = max{xl, ... ,xs}.
Then

f*(p)=f*(pl, ... , Ps)= l 0


+ oo
if P;';?;O, ~Pi= I,
i=l
s

in all the other cases.

~· Since f is the maximum of a finite collection of linear (and


hence convex and continuous) functions, the convexity and the continuity
of f are obvious.
s
If P;';?;O, ~P;=I, then <p, x> ~ max{xl, ... ,xs}, and therefore
1=1

f*(p)=sup{<p, x)-max{x1 , ••• , x,}}=O.


X
146 CHAPTER 2

On the other hand, if f*(p) < +oo, then there must be f*(p) = 0 [by virtue
of the homogeneity of the functions x ~ <p, x> and x ~ max{x, ... ,xs}l,
i.e., for such p the inequality
s
~ P;X;-max{x 1 , ••• , x.}:::;;;o, Vx,
1=1

holds. Substituting Xj = 0, j ~ i, Xi = ~ into this inequality, we obtain


Pi~:( max{~, 0}, whence Pi;;,. 0. Putting Xi= a, i = 1, ... ,s, in the same
inequality, we obtain
s s
a~p;:::;;;a, VaER~~p 1 =1. •
1=1 i=l

4) The conjugate function to the norm in a normed space.


Proposition 3. Let X be a normed space, and let N(x) = llxll. Then N*
coincides with the indicator function cB* of the unit ball of the conju-
gate space X*, and N**(x) = N(x).
Proof. If llpll > 1, then <p, xo> > llxoll for some x 0 , and then
f*(p)=sup{<p. x)-jjxjj}~sup{(p, ax0 )-aj/x0 //}=+ oo.
x a~O

In case llpll :( 1, we have <p, x> :( llxll and


f*(p)=sup(<p, x>-//x//)=0.
X

Further,
f** (x) =sup {<p, x>-f* (p)} =sup {<p, x>}=J/ x((=N (x). •
p liPII<I

5) Let X= R, f(x) = 1/(1 + x 2 ). Then, as can easily be seen,


0, p=O,
f* (p)= { +oo, p=/=0, f**(x)=O,

and hence the equality f**(x) = f(x), which was observed in the examples
1) and 4), does not hold here.
The next theorem, which is one of the most important in convex analy-
sis, shows that the coincidence of f and f** is by far not accidental.
Fenchel-Moreau Theorem. Let X be a locally convex topological linear
space, and let f:X ~ R be a function greater than -oo everywhere. Then
a) f**(x) - f(x) if and only if f is convex and closed.
b) f**(x) sup{a(x)la(x) is affine and :(f(x)}.
c) If there exists at least one affine function a(x) :( f(x) [equiva-
lent conditions are: f*(p) t +oo or f**(x) > -oo·everywhere], then
f**(x) is the greatest of the convex functions not exceeding f(x),
i.e., f** = conv f.
d) (f**)* = f*.
Proof. A) "Only if" follows from Section 1) of Proposition 1. Let
us suppose now that a(x) :( f(x) is an affine function. In the example 1)
we establish that a**(x) = a(x). Using Proposition 1 again, we write
a (x) =a** (x):::;;;; f** (x):::;;;; f (x).
Therefore
Sl,lfl {a (x) I a (x) is affine and:::;;;; f (x)}:::;;;; f** (x):::;;;; f (x), (8)
whence, first of all, the "if" part of a) follows because when f is convex
and closed, the left-hand and the right-hand sides of (8) coincide (for a
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 147

proper function f this is implied by Minkowski's theorem, and if f - +~


then the left-hand side is also equal to+~). Hence a) is proved.
B) For an arbitrary f the following three cases are possible. If
there is no affine function a(x) ~ f(x), then f*(p)= sup{(p,x>-f(x)}=+ oo
X

for all p, and hence f**(x) ~ -oo. However, in this case the left-hand
side of (8) is also equal to -.co (supC}l ~ -oo by definition), i.e., b) takes
place.
In case there exists an affine function a(x) ~ f(x), relation (8) im-
plies that f**(x) > ~ everywhere. There remain the following two possi-
bilities. If f**(x) = +~, then f(x) = ~ and the two functions are equal
to the left-hand side of (8) because here the affine function can be arbi-
trary. In case f** is proper, Minkowski's theorem implies
f** (x) =sup {a (x) I a (x) is supporting to f** (x)}.
Comparing this with (8) we see that sup{a(x)la(x) is affine and :e;;;f(x)}~
f** (x) =sup {a (x) I a (x) is supporting to f** (x)} ~sup {a (x) I a{x) is affine and
~f(x)}. Thus, b) holds for these cases as well.
C) Further, let us suppose that there exists an affine function a(x) ~
f(x), and let g(x) ~ f(x) be an arbitrary closed convex function. Let us
put
h (x) =sup {g (x), f** (x)}.
This function is closed and convex, and f(x) ~ h(x) > -.co everywhere. Con-
sequently,
h (x) = h** (x) :e;;; f** (x) ~ h (x),
and therefore h(x) f**(x) and g(x) ~ f**(x), which proves c).
D) Since f**(x) ~ f(x), we have (f**)*(p) ~ f*(p). On the other hand,
by virtue of Young's inequality, we have f(x) ~ a(x) ~ <p, x>- f*(p),
whence, by virtue of (8),
f" (x) ~a (x) and (f••)• (p) =sup (<p, x>-f" (x)) :e;;; sup ((p, x>-a (x)) = f' (p)
X X

for a finite f*(p). The case f*(p) ~~is trivial, and for f*(p) ~-.co,
we obtain
f (x) = f" (x) 5all + oo and (f")' (p) = - oo = f• (p). •
Exercise 3. Let f be convex and bounded above in a neighborhood of a
point. Then t••(x)=f(x),YxEintdomf.
2.6.4. Subdifferential. Moreau-Rockafella r Theorem. Dubovitskii~
Milyutin theorem. Let X be a locally convex topological linear space, and
let f be a function on X (f:X + i).
Definition. The subset of X* consisting of those elements x'EX• for
which the inequality
(1)

holds is said to be the subdifferentiaZ of fat the point xo. The subdif-
ferential of the function fat the point XQ is denoted by df(xo).
Hence af(xo) is the set of the "slopes" of the affine functions a(x) ~
<x*, x>- b supporting to fat the point xo, i.e., such that the supremum
in the formula (1) of Section 2.6.2 is attained at x 0 :
(2)

Proposition 1. The subdifferential af(xo) is a convex (possibly emp-


ty) set in X*.
148 CHAPTER 2

Proof. Let xjEof(x~), i = 1, 2. Then, by definition, f(x)-f(~0 )";;;;;:<X:.


x-X0 ), f(x)-f(x 0 ) ;;o:.<x;, x-x0) . Multiplying the first inequality by a and
the second by 1 -a, where ttE[O, lj, and adding them together, we obtain
f(x)-f(x0);;;.. <cui +(1-a)x;,.x-x.>, Vx. •
Examples. 1) Subdifferentials of Certain Functions Defined on the
Straight Line and on the Plane. Let the reader check that:

a) fdx)=lxl, (xE R)~ofdx) =nign1;, 1], :;:~:

b) f.(x)=VX:+X:. (xeR·)~at.(x)= { -=--(


lxl- '~+
{ulVul+u:~1}. x=O,
xl Xa
a• '~+2 ,x:pO.
)

Jf Xi+ Xij Jf Xi+ Xii

c) fa(x)=J;tlax(lxll• lx.l), (xER 2 )~

{YIIYll+lu.l~1}, x=O, ·
{ (sgnxl, 0),
~ 0fa(x) = (0, sgnx 2 ),
lx1l > lxsl•
lx.l > lx1l•
conv ((sgnx1 , 0), (0, sgnx.)),lx1l=lx.l=z, z:pO.
2) Subdifferential of the Norm in a Normed Space.
Proposition 2. Let X be a normed space, let X* be its conjugate
space, and let N(x) = llxll. Then
B•, where s• is the unit ball of the
aN (x) ={ conjugate space if X =0,
{x• E X•111x-~= 1, (x-, x> =llxJI}, if x=FO.

Proof. As was indicated above [relation (3) of Section 2.6.3], the


set of the supporting functions to f and the set {x* E X•, f• (x•) =F ± oo} are
in one-to-one correspondence, and, moreover,
a..,• (x) =(X'", x>-f• (x*).
According to Proposition 2 of Section 2.6.3,
0, lix•ii~ 1,
N• (x") = 68• (x•) = { +oo, llx*ll > 1,

and hence the functions <x*, x>, llx*ll ~ 1, are supporting toN. Equality
(2) goes into
O=<x•, x.>-liXoJ!, (3)
and hence any function x + <x*, x>, llx*ll ~ 1, is supporting to N at the
point xo = 0, and <lN(O) = B*. In case xo ;t 0, the set of X'"EB• for which
(3) holds coincides with the one indicated in the statement of the propo-
sition. •
In the above examples the subdifferential existed at each point. Of
course, for nonconvex functions (e.g., for -jxJ) it may exist at none of
the points. However, convex functions may have no subdifferentials either
even at points of their effective domains.
Here is the simplest example:
f(x)={-V1-~· .. lxl~ l,
+oo, lxl>l.
The subdifferential is empty at the points x 1 = +1 and x 2 = -1. A
sufficient condition for the existence of the subdifferential will be
established later (see the corollary of the Moreau-Rockafella r theorem).
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 149

In convex analysis the theorem proved below is an analogue of the


theorem on the linear property of the differential.
Moreau-Rockafellar Theorem. Let fi, i = 1, ... ,n, be proper convex
functions on X. Then

(4)

If all the functions except possibly one are continuous at a point x,


and if this function is finite at the equality x,
(5)

holds at any point x.


Proof. We shall confine ourselves to the case n = 2; the case of a
greater number of summands is considered by induction. The inclusion
iJf1(x)+CJf.(x)ciJ(f1+f.)(x) follows at once from the definition of the sub-
differential.
Let x~EiJ(fi+f.)(x.). Without loss of generality, we can assume that
xo = 0, x~ = 0, fi(O) = 0, i = 1, 2. Indeed, if these relations are not
fulfilled, we can consider the functions
gl (x) = ft (x. +x)-f 1 (x0 )-<x;, x>
and

instead of the functions f1 and f2• Thus, let 0EiJ(fi+f1)(0). According to


(1), this means that
min (f1 (x)+ f. (x)) = f1 (0) +f. (0) = 0.
JC

Let x be the point at which f1 is continuous and f2 is finite. Let us


consider the two convex sets
cl = {(x, a.) I a.> fl (x), X E int dom ft} = int epi ft
and
c.= {(x, a.) 1-a.;;;;:;: f.(x)}.
It is clear that the sets C1 and C2 are convex and nonempty ((x,- f. (x)) E C,),
C1 is open, and C1 ~ 0 by virtue of Proposition 3b) of Section 2.6.2 be-
cause the function f1 is continuous at the point and therefore it is x
bounded in a neighborhood of X' and we have cl n c.= 0 • Indeed' if (al, xl) E
cl nc.' then ft (xl) <a.~- f. (xl) ; i.e.'
0 =min (f 1(x) +f. (x)) ~ f1 (xl) +f. (xl) < 0,
JC

which is impossible. By the first separation theorem (see Section 2.1.4),


c1 and C2 are separated by a nonzero linear functional (x~, S):
inf (<x;, x>+!h);;;;:;: sup (<x;, x>+jh). (6)
(x. a) e c, (x, a) ec,

It is clear that S ~ 0 because, if otherwise, the supremum would be equal


to +oo. If we assume that S = 0, then
sup <x:, x) ~ inf <x:, x).
xe int dom f 1 xe dom fs

Since the maximum of a linear function cannot be attained at an interior


point, there must be
150 CHAPTER 2

(x~, x> < sup


xEintdomf1
(x;, x>..;; inf (x;, x>..;; <x;,
xedomfa
x).
This contradiction shows that B ~ 0. Consequently, all the terms in (6)
can be divided by I BI, and then denoting :X~ "' I B l- 1 x~ and using the fact
that epif1 s;;intepift=C1 (Proposition 1 of Section 2.6.2), we obtain
aup {<x;, x>-fdx)} = sup {<i;, x>-g.J"""
X { ~~d~r'M,

=sup {<x;:x>-~}~ inf {<x:,x>-a}=inf{<i:,x>+f.(x)}.


(x,.,.)EC1 (x,.,.)ECa X

The values of the functions in the braces coincide and are equal to zero
for x"' 0 [f1(0) "'f2(0) "'0]. Hence
fdx)-fl(O)~<x;, x>, f,(x)-f,(O)~<-i;, x>;
i.e. •

and
oeofl (O) +of. (O). •
COROLLARY 1. Let f be a convex function continuous at a point xo.
Then af(xo) ~ 0.
Proof. Let us consider the function
f+tJ {xo} ={ f(xoo, 0), X=X 0 ,
x=Fx0 •
The functions f and o{xo} satisfy the conditions of the Moreau-Rockafellar
theorem. The equality
X• ==o(f+tJ {x0 }) (x0 ) =Of (x0 ) +06 {x0 } (x0 ) =iJf (x0 ) +X•
means that af(xo) ~ 0. •
The subdifferential af(xo) of a convex function continuous at a point
xo is in fact a compact convex set in the weak* topology.
Let K be a cone (Section 2.6.1). The cone K* consisting of those
elements x* for which (xe, X) ~0, VxE I(, is said to be the aone aonjugate
to K. If OEK, then definition (1) immediately implies the equality
aoK(O) "' -K*.
COROLLARY 2. Let Kl,••••Kn be open convex cones whose intersection
is nonempty. Then

( nn K1)• =
1=1
n
Y-Ki.
~~
Indeed, let us add the origin to Ki and consider the functions oKi·
Applying the Moreau-Rockafellar theorem to them (all oKi are continuous at
- n
a nonzero point X E n K; ) '
i=d
we obtain

COROLLARY 3. Dubovitskii=Milyutin Theorem on the Intersection of


Cones. For convex cones K1, ••• ,Kn, Kn+l, among which the first n are open,
not to intersect it is necessary and sufficient that there be functionals
n+l
xiEKi, i = 1, ... ,n + 1, not all zero, such that ~xi=O.
l=l
MATHEMATICAL HETHODS FOR EXTREMAL PROBLEMS 151

Proof. "Necessary." Without loss of generality, we can assume that


n
n K; =I= 0• · Then K is an open cone which, by the hypothesis, does not
K = i=l
intersect Kn+1· By the first separation theorem, there is a nonzero lin-
ear functional y• E X• separating K and Kn+ 1 :
inf <y•, x> ~ sup <y•, x).
xeK xeKn+•

The last relation means that y*EK•, while (-1)y*EK~+l (x = 0 is a limit


point both forK and for Kn+ 1 ). By Corollary 2, we can represent y* as a
n
sumu·= ~xi. xiEKi. 1~i~n; denoting (-1)y* by X~+l•weobtainwhatis
i=1
required.
n+I
"Sufficient." Let xi E Ki be not all zero, and let ~ x{ =0. Let
I= I
n+l
XE n Ki.
i=l
X=/=0, and xi,=/=0, 1 ~io~n. Then X E int K ;, :::;> <xi,, X) > 0, and hence
0 =<~xi, x> > 0. This contradiction proves the theorem. •
Dubovitskii-Milyutin Theorem on the Subdifferential of the Maximum.
Let f1, ... ,fn be convex functions on a locally convex topological space X
continuous at a point xo; let f(x) = max{fl(x), ... ,fn(x)}, and let I=
{i 1 , ... ,is} be a set of indices such that
for iE/,
for i~/.
Then
at (xo) = conv {atdxo) u u aft. (xo)} =
={x•i x•= ±A.11xj,,
0 0 0

xi,Eat£11 (xo), !.11~0,


k=l

Proof. A) If i EI and x• Eat 1 (x 0), then

f (x)-f (Xo) = f (~-fi (xo) ~ /; (x)-f; (Xo) ~


:;a=<x•, X-X 0 ):;>x•Eaf(x0 ):;>af (xo)=> U af;(xo):;> af(x0 )=> conv { U af;(.t0 ) } ,
IE/ i EI •

where the last implication is a consequence of the convexity of 3f(xo)


(Proposition 1).
The opposite inclusion will be proved by induction on the number of
functions. For n = 1 the assertion is obvious. In what follows we shall
suppose that it is true for n- 1 functions.
B) LEMMA. If a function f(x) is convex and the inequality
(7)

holds in an open convex set V and if IJl(t)=O for some xEV, then x•Eaf(x 0 ) .
Proof. Let us show that inequality (7) holds for all x; in accordance
with the definition of the subdifferential, this will imply the assertion
of the lemma.
Let x be arbitrary. The convexity of f implies the convexity of the
function qJI, and if <p(;!<O, then
(j) ((1-a)x+ax)~(1-a)«p (x)+ a«p (x) =a«p (.x) <0
for any aE(O, 1). For a sufficiently small et we have (1-a)x+axEV; we
have thus arrived at a contradiction. Consequently, «p(x)~O for all x. •
152 CHAPTER 2

C) Let xe E of (x0 ). We shall show that in this case there are two pos-
sibilities: either xo is a solution of the following linear programming
problem:
cpo(X)=ft.(x)-f(x0)-<x*, X-X0)·--.inf, (8)
cp;(x)=f;(x)-f(x0 )-<x•, x-x 0 >:s:;;;O, i:#:i 1 , (9)
or the assertion of the theorem is true. Indeed, if the first possibility
does not take place, then
( 10)

for some x E int dom cp • By the continuity, the inequality cp 0 (x) < 0 is
retained in a convex ~eighborhood V~x; i.e.,
{t.(x)-f(xo)<<x•,x-xo>· xev. (11)
Now let us take into consideration that x*Eof(x0 )., and consequently
the inequality
max{fl(x), ... , fn(x)}-f(x0 )~<x*, x-xo>
must hold. Comparing it with (11), we conclude that for xEV the inequal-
ity
cp(x)=max{f;(x) I i.Pi 1 }-f(x0 )-<x•, x-x0 )~0 ( 12)
holds. At the same time, according to (10),
cp(x)=max{f;(X) I i.Pil}-f(xo)-<x•, x-xo>~O.
and hence cp (x) = o, and, by the lemma, x• Eat (x0 ) , where
f(x)=max{f 1 (x) I i.Pi 1}.
Since f is the maximum of n- 1 functions, the induction hypothesis
implies that

D) Knowing that xo is a solution of problem (8), (9) we can make use


of the Kuhn-Tucker theorem (Section 1.3.3); attention should be paid to the
remark made after the proof of that theorem: although, by virtue of Propo-
sition 3 of Section 2.6.2, the functions we deal with are continuous on
int dom, they can be equal to +co outside it. ·
According to the Kuhn-Tucker theorem, there exists Lagrange multi-
pliers AO, Ai ~ O, i ~ i1, not all zero, for which xo is a point of mini-
mum of the Lagrange function

Moreover, the conditions of complementary slackness must hold according to


which Ai = 0 if ~pt(x,)=ft(x0 )-f(x0 ).PO, i.e., if i~/. Denoting Ao as Ai 1
and !Jlo as lpi1 , • we obtain !l'(x, A.)= ~A. 1cp;(x), and since the Lagrange multi-
le 1
pliers are determined to within a p.ositive factor, we can assume that ~A 1 =;
1• IE I

Now we have
1l' (x, A.)~ !l' (x0 , A.)~ ~ A1cp (x) ~ 0 ~
le 1

~ ~A;{ 1 (x)-( ~A- 1 ) f(x 0) - ( ~A- 1 ) <x•, x-x0 )~0 ~x·E fJ (~A.;{;(·))
lEI
(x0 ).
tel iel tel

Applying the Moreau-Rockafellar theorem and taking into account that the
definition of the subdifferential implies the evident equality
MATHEMATICAL METHODS FOR EXTREMAL PROBLEMS 153

a(f..f) (xo) = t..at (xo)


holding for A > 0, we see that

x• = i ~ 'Aixi, xi Eat; (x.),


EI

l.i~O, ~f..i = l,
iel
Chapter 3

THE LAGRANGE PRINCIPLE FOR SMOOTH PROBLEMS


WITH CONSTRAINTS

In the present chapter our primary aim is to prove the Lagrange prin-
ciple for smooth problems with equality and inequality constraints and to
derive sufficient conditions for an extremum in such problems. The general
results of this chapter will be applied in Chapter 4 to problems of the
classical calculus of variations and optimal control problems. Since a
part of the material was already presented at an elementary level in Chap-
ter 1, it is advisable to look through the corresponding passages in Sec-
tions 1.3 and 1.4.

3.1. Elementary Problems


3.1.1. Elementary Problems without Constraints. Let X be a topologi-
cal space, let U be a neighborhood in X, and let f:U + R. The problem
f (x) __,. extr
is called an elementaPy (extPemal) pPoblem without aonstPaints. In the
case when X is a normed linear space and f is smooth in a certain sense,
the problem ~) is called an elementaPy smooth pPoblem; if X is a topologi-
cal linear space and f is a convex function, the minimum problem (3) is
called an elementary aonvex pPoblem.
The definition of a local extremum for the problem (3) was stated in
Section 1.2.1. If x yields a local minimum (or a maximum or an extremum)
x
in the problem (3) we use the abbreviated notation E locmfh 3 (or x E
locmaxa, or x E locextq). For the definition of the various terms related to
the notion of smoothness, see Section 2.2.
We shall begin with the simplest one-dimensional case when X = R.
LEMMA. Let the function f:U + R be defined on an open interval U~R
containing a point x.
a) If f possesses the right-hand and the left-hand derivatives at the
point x and xE locmina (locmaxa) , then
f'(x+O);;:=O, f'(x-O):s;;;;O (f'(x+O):s;;;;O, f'(x-0);;:=0). (1)
If (1) involves strict inequalities, then ie locmina (locmaxa).
Further, let the function f be k times differentiable at the point x.
b) If xElocmina(Iocmaxa), then either f(i) (x) = o, 1 ~ i ~ k, or there
exists s, 1 ~ s ~ [k/2], such that

154
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 155

(2)
c) If there is s, 1 ~ s ~ [k/2], such that the relations (2) hold,
then x
e locm.in a (locmax a).
For k = 1 the assertion b) is known in mathematical analysis as the
Fermat theorem [xElocextq~f'(x)=O·]. We proved this theorem in Section
1.4.1.
Proof. The assertion a) follows at once from the definitions of one-
sided derivatives and a local extremum. For k = 1 the assertion b) (the
Fermat theorem) follows from a) if we take into account that the existence
of the derivative implies the equalities f'(x) = f'(x + 0) = f'(x- O).
Further, let k > 1; for definiteness, let us consider a minimum problem.
By Taylor's formula,

f(x+h)= :E f<'~ 1<x> hi +oW'>·


k . ~
(3)
i=O I

Letf'(x)=···=f<•-U(x)=O, l<m~k. f<•l(x)=FO· Here there are two possibil-


ities: m may be odd or even. For an odd m let us put q>(;}=f(x+'V'~). sER.
Then from (3) it follows that q> ED~ (0) and q>' (0) = f<•> (x)/m! :;6 0, whereas the
Fermat theorem implies that q>' (0) =0. This contradiction shows that m must
be even. For an even m we put ~ (s)=f (x+ 'V'1), s;;;;,: 0 and obtain from ( 1)
the inequality lji' ( +0) = f (m) (x) /m! > 0 which proves the assertion b). The
a~sertion c) immediately follows from Taylor's formula f(x +h) = f(x) +
ft 2 s)(x)h 2 Sj(2s)! + o(h 2 S)~which shows that if f( 2 s)(x) > 0, then x
and if f( 2 s) (x) < 0, then xElocmaxa. •
The assertions that follow are almost immediate consequences of the
corresponding definitions.
THEOREM 1 (a First-Order Necessary Condition for an Extremum). Let
X in (3) be a normed linear space, and let the function f possess the de-
rivative in a direction h at a point E U. x
If xE locmin a (x E locmax 3), then
r <x; h);;;;,:o <f' <x; h> ~ o).(4)
COROLLARY 1. Let f possess the first variation (the first Lagrange
variation) at a point X. If XE locmin a (rE locmax 3) ' then
B+f (x, h);;;;,: 0, Vh EX (B+f (x, h)~{), Vh EX) (5)
(Bf(x, h)=O, VhEX).
COROLLARY 2. Let f possess the Fr~chet (the Gateaux) derivative at a
point x. If x E locextq, then
r (x> = o <tr<x> = o). (6)
The assertion of Corollary 2 is also called the Fermat theorem. The
points i at which the equality (6) is fulfilled are called stationaPy
points of the problem W·
THEOREM 2 (a Second-Order Necessary Condition for an Extremum). Let
X in ta) be a normed }inear space, and let f possess the second Lagrange
variation at a point xEU. If xElocmina (xElocmaxa), then the following
conditions hold:
Bf(x, h)=O, VhEX, (7)
6 f (x, h)~ 0, Vh EX (f>'f (x, h)~ 0, Vh EX).
1 (8)
156 CHAPTER 3

Proof. Equality (7) is contained in Corollary 1. Inequalities (8)


follow at once from the definition of the second Lagrange variation and
the assertion b) of the lemma. •
COROLLARY 3. Let f possess the second Fr~chet derivative at a point
x. If xElocmins (xE locmaxa), then the following relations hold:

r <x>=o, (9)
r <x>~ o ( 10)
(These inequalities mean that the quadratic form d 2 f = f"(x)[h, h] is
nonnegative (nonpositive).)
Proof. Using Taylor's formula (see Section 2.2.5), we write

whence
t>f<x, h)=cp' <a>=f' <xHhJ,
t>•t (x, h)= cp" <a>= r <x>
[h, hJ,
and it remains to refer to (7) and (8). •
Corollary 2 asserts that the points of local extremum are stationary
points. As was mentioned in Section 1.3.1, the converse is not true. To
obtain sufficient conditions we have to take into consideration higher-
order derivatives as has already been done in the lemma.
THEOREM 3 (a Second-Order Sufficient Condition for an Extremum). Let
X in (~) be a normed space, and let f possess the second Fr~chet deriva-
tive. If the conditions
f' (i) =0, ( 11)
f"(x)[h, hJ~ajjhjj2 , VhEX(f"(x)[h,h]~-allhiJ2 , VhEX) (12)
are fulfilled for some a. >o, then xElocmina (xElocmax 0).
Proof. If the first of inequalities (12) holds, then using Taylor's
formula once again we obtain

provided that h "' 0 and llhll is sufficiently small. Consequently, xE


loomina. The case of a rr~ximum is considered in a similar way. •
The condition (12) is called the condition of strict positivity (nega-
tivity) of the second differential of f.
Remarks. 1. For the finite-dimensional case when X= Rn the asser-
tions of Theorems 1-3 are well known in mathematical analysis. In this
case the Fermat theorem means the following:
( 13)
From the second-order sufficient condition it follows that in the finite-
dimensional extremum problem the assertion xElocmin 0 implies [besides (13)]
that the matrix (;z~~-) is positive-semidefinite:
n •
~ a•f'(x) h h ....._ 0 wh n
"'-'-~it,::::;,
i, i=l X, Xj
v ER..
As can easily be seen, the condition that the matrix (:;~~~1 ) is positive-
definite guarantees the strict positivity of the second differential (and
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEHS 157

hence it is a sufficient condition for a minimum at a stationary point).


This is not the case for infinite-dimensional spaces .
..
Example. Let X= l2., and let f(x)= ~ ((x,!n")-x'). Then x = 0 is a
i=l
stationary point, f'(O) = O, and the second differential of fat zero
.,
r (0) fh, h] = ~ h';n•
i=l

is positive-definite. At the same time, O~locmin~ because the functional


f assumes negative values on the sequence {en/n}~=l (ei is the i-th base
vector of the space l2.): f(en/n) = (1/n 5 ) - (1/n 4 ) < 0 while the sequence
itself tends to zero.
2. In the one-dimensional case the lemma with which this section
begins gives in a certain sense an exhaustive answer to the question
whether the given point is a point of local minimum of an infinitely dif-
ferentiable function f: with the exception of the trivial case when all
the derivatives vanish at the given point the lemma answers this question.
Such an exhaustive procedure is unlikely to exist even in the case of func-
tions of two variables.
3. Inequalities (12) impose severe restrictions on the structure of
the normed space X. If the first of them holds, the symmetric bilinear
functions (~In) f"(x)[~, n] determines a scalar product in X, and the
inequalities

show that the norm generated by this scalar product is equivalent to the
original norm. Consequently, X is linear homeomorphic to a Euclidean
(pre-Hilbert) space. In particular, inequalities (12) cannot hold in the
spaces C and c 1 •
3.1.2. Elementary Linear Programming Problem. In Section 1.2.6 we
presented the general description of the class of linear programming prob-
lems. Here we shall consider the simplest problem of this class. It will
be of interest to us because its analysis will lead us to very important
conditions for extremum, namely, the conditions of correspondence and con-
cordance of signs and the conditions of complementary slackness.
Let X = B.n, and let aeRn• be an element of the conjugate space: a =
(al,••••an). We shall consider the problem
n
ax-..inf(sup); ~~a 1x 1 -inf(sup). (a)
1=1
x~.~o.

Here the symbol ~ means that the corresponding constraint may involve one
of the signs ~. ~. and =. The problem (3) will be called an elementary
linear programming problem.
We shall say that for a vector a the condition of correspondence of
signs (with the constraints x ~ 0) is fulfilled if ai ~ 0 for the con-
straint Xi ~ 0, ai ~ 0 for the constraint Xi ~ 0, and ai may have an arbi-
trary sign for the equality constraint Xi = 0. In the case when these in-
equalities are "reversed," i.e., if ai ~ 0 for the constraint Xi~ 0, ais
0 for the constraint Xi ~ 0 (and again ai may be of an arbitrary sign for
xi= 0), we shall say that the condition of concordance of signs is ful-
filled. In the former case we write ai ~ 0 and in the latter case ai ~ 0.
THEOREH (a Necessary and Sufficient Condition for an Extremum in an
Elementary Linear Programming Problem). For a point x E R~' to yield an
158 CHAPTER 3

absolute minimum (maximum) in the problem (3) , it is necessary and suffi-


cient that there hold:
a) the conditions of correspondence (concordance) of signs
( 1)
and
b) the conditions of complementary slackness
a;X;=O, i= 1, ... , n. (2)
Proof. Let us suppose that (3) is a minimization problem and that the
i 0 -th constraint has the form xi 0 ~ 0. If we assume that ai 0 < O, then the
infimum in the problem (3) is equal to -oo because xi 0 may assume arbitrarily
large positive values. Therefore, the condition ai 0 ~ 0 is necessary for
the value of the problem (the sought-for inf) to be finite, Now i f we assume
that ai 0 xi 0 ~ 0, then the condition Xi 0 ~ 0 and ~he inequality ai 0 ~ 0 we
have proved imply that there must be in fact ai 0 Xio > 0. However, then the
point x does not yield a minimum in this problem because ax < ax on the
admissible vector x = (x 1 , ... ,xi 0 - 1 , 0, Xio+l••••,xn). The sufficiency of
conditions (1), (2) is evident: if, for definiteness, (.3) is again a mini-
mization problem and the conditions of correspondence of signs and comple-
mentary slackness are fulfilled, then we obtain ax ~ 0 for any admissible
vector xERn ,whereas, by virtue of (2), we have o for ax= x. •
Exercise. Derive this theorem from the Kuhn-Tucker theorem.
3.1.3. The Bolza Problem. Let~ be a finite closed interval on the
real line. Let us consider the Banach space
E= c~ (d; Rn) x R x R
consisting of the elements~= (x(·), to, t1). For this space we shall
consider the problem
t,
.'73(6)=.1a(x(·), i 0 , i 1 ) = ~ L(t, x(t), x(t))dt+l(t 0 , x(t0 ), t 1 , x(i1)) ....... extr. (3)
lo

Here the functions (t, x, x) + L(t, x, x) and (to, xo, t1, Xl) + l(to, xo,
t 1 , x 1 ) in (3) are defined in some open subsets V and W of the spaces R x
Rn x Rn and R x Rn x R x Rn, respectively, and t0 , t 1 Ed. We shall suppose
that the two functions are at least continuous. The problem (3) is called
the Bolza problem with nonfixed time and the functional in (a) is called
the Bolza functional. In Section 1.4.2 we considered a special case of
this problem when n = 1 (the case n > 1 was only briefly mentioned) and
the instants to and t1 were fixed.
LEMMA. Let the functions L and l and their partial derivatives Lx,
L*, lti• and lxi (i = 0, 1) be continuous in the domains V and W, respec-
tively. Then the functional ~ is continuously (Frechet) differentiable
on the following open subset of the space ==
'll = {6 = (x ( · ), t 0 , i 1 ) IX ( ·) E C1 (d,_ Rn); tEA,
(t, x(t), x(t))EV; (t 0 , x(t 0 ), i1, x(tl)EW, t9, i 1 EinU},
and, moreover,
t;
.'73' (x ( · ), f0 , f1 )fh ( · ), T0, T 1] = ~ (lx (t) h (t) +
I,
+li (t) h (t))dt +L (f1 ) T1 -L (-10 ) T0 +40t 0 +L1,T1 + lx0 (~ (fo) +~ (t0) T0 )+ ~ 1 (h (11 ) +~ (i;,) T1 ),
(1)
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 159

where
i(t)-L(t, x(t), ~(t)), Lx(t)=Lx(t, x(t), ~(t)),
Lx<t>=LxU. x(t), x(t)). hi=olt{(i;,. x(i;.>J 4. x(i,.)).
lxi = lxi (i;,, X(i;,), £',., X(t:)), i = 0, 1.

Proof. The functional ~ is a sum of an integral functional and an


endpoint functional. Their continuous differentiability was proved in
Sections 2.4.2 and 2.4.3. [The endpoint functional·s + l(to, x(to), t 1 ,
x(t 1 )) is a special case of an operator of boundary conditions.] To obtain
(1) it remains to make use of formula (9) of Section 2.4.2 and the formula
(3) of Section 2.4.3. •
According to this lemma, (3) is an elementary smooth problem without
constraints. Therefore, we can apply to it the necessary and sufficient
conditions for an extremum of Section 3.1 .1. Then the assertion of the
Fermat theorem is interpreted as the following theorem:
THEORE~1 1 (a Fi,st-Order Necessary Condition for an Extremum in the
Bolza Problem). Let the conditions of the lemma be fulfilled. If an ele-
ment ~ ~ (x (·),
t 0 , t1) E 8 belongs to the set CU and yields a local minimum in
the problem (3), then
1) the stationary conditions with respect to x(·) hold:
a) the Euler equation:
d
-di Lx (t) +Lx (t) = 0
~ ~
(2)

and
b) the transversality conditions with respect to x:
Lx(ti)=(-l)tlxi' i=O, 1; (3)
2) the transversality conditions with respect to t hold:
H(t;)=(-l)i+lft .• i=O, I, (4)
'
where
ii (t) =LX (t) ~ (t) -L (t).
Relations (2) and (3) were already encountered in Section 1.4.2,where-
as the transversality conditions with respect to t have appeared for the
first time, and later in Chapter 4 we shall constantly deal with them.
~. As was already mentioned, the problem (3) is an elementary
smooth problem, and consequently the Fermat theorem is applicable to it
(Corollary 2 of Section 3.1.1) according to which
~ E locextr 3 ~ :iiJ' (~) = O#:iiJ' (~) [ Tl] = 0, (5)
V1J = (h ( · ), 't0 , 't1 ) E 8.
Now taking into account (1), we obtain from (5) the relation
I,
0=~' (~)[TI] = ~ (ix(t)h(t)+l;,(t)h(t))dt+a0h(1~) ta1 h(t1 )+~0't 0 +~ 1't 1 , (6)
t,
where, for brevity, the following notation has been introduced:
ai=lxi' ~i=Lti+lxix(ti)+(-l)i+~L(ti), i=O. I. (7)

The relation ~·(~)[1]]=0 holds for any element n = (h(·), To, Tl). Let us
first consider the elements of the form TJ=(h(·), 0, 0), hEC 1 (A, Rn), h(t 1)=
0, i = 1, 2. Then (6) implies that
160 CHAPTER 3

7,
~ (l,.(t)h(t)+l;.(t)h(t))dt=O (8)


for any vector function h(·) E C1 ([to, t1], Rn) such that h(t 0 ) h(t 1 )
0. The situation of this kind was already dealt with in Section 1.4.1.
From the DuBois-Reymond's lemma proved there we readily obtain relation (2)
[DuBois-Reymond's lemma was proved in Section 1.4.2 for scalar functions;
here it should be applied consecutively to each of the components of the
vector function h(·); putting h1(·) = •.• = hi-1(.) = hi+ 1 (·) = ••• = hn(·) =
0 we reduce (8) to an analogous scalar relation with one component hi(·)
and obtain the i-th equation of system (2)]. ·Incidentally, the continuous
differentiability of Lx(t) has also been proved. Therefore, we can perform
integration by parts in the expression (6), and thus [taking into account
(2)] we obtain
0 = .'B' (~)f'IJJ = (a.0 -l;. (t0))h (10) +(a.1+l;. (tl)) h (ti) +PoTo +P1Tf (9)
for arbitrary vectors h(to) and h1(t1) and numbers To and T1 . Equalities
(9) and (7) immediately imply relations (3) and (4). •
As is seen, the proof of the theorem is virtually the same as in Chap-
ter 1. The only distinction is that earlier we used primitive methods for
computing the derivatives while here we have used the general theorems of
differential calculus established in Sections 2.2 and 2.4 .
. 3.1.4. Elementary Optimal Control Problem. Let U be a topological
space, and let cp:[t0 , t 1j.X U--rR. We shall consider the problem
I,
f(u(·))= ~ cp(t, u(t))dt--rin£
lo

in the space KC([t 0 , t 1J, U) of piecewise-continuous functions u(·):ft0 , tt}--rU


with values belonging to U. We shall assume that the function q> is con-
tinuous in (t 0 , t1]XU. The problem (~) will be called an elementary optimal
control problem.
THEOREM (a Necessary and Sufficient Condition for a Minimum in an Ele-
mentary Optimal Control Problem). Let a function u (·)€KC(ft 0 , t 1J, U) yield
an absolute minimum in the problem (0). Then for every pofnt of continuity
of the function u(·), the condition
minf(t, U}=f(t, u(t)) (1)
UEU

holds.
Proof. Let us suppose that relation (1) does not hold and there exist
a point T [at which u(·) is continuous] and an element vEU such that f(T,
v) < f(T, u(T)). Since the functions t ~ f(t, v) and t ~ f(t, u(t)) are
continuous, there is a closed interval ~ = [T- 8, T + 8] such that f(t,
v) < f(t, u(t)) for tEA. Let us put il(t) = u(t) for t~A and il(t) = v
for tEA • Then we obtain f (il( ·)) < f (u( ·)), which contradicts the minimal-
ity .of u(·). The sufficiency is obvious.
3.1.5. Lagrange Principle for Problems with Equality and Inequality
Constraints. In Chapter 1 we already discussed the principle to which
there correspond necessary conditions for problems with constraints. Now
after we have separated out some "elementary problems" it is possible to
sum up certain results.
The extremal problems we dealt with were formalized in such a way that
the constraints were divided into two groups the first of which had the
form of equalities. With respect to the constraints of the first group the
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 161

Lagrange function was formed. Then the problem was mentally posed on the
corresponding extremum of the Lagrange function with respect to the second
group of cons·traints, i.e., the group which did not take part in the forma-
tion of the Lagrange function. Then it turned out that either the problem
thus stated was itself elementary or the "partial" problems obtained by
fixing all the unknowns except one were elementary. The collection of the
necessary conditions for these elementary problems formed the sought-for
set of necessary conditions for the extremum. Let the reader verify that
all the necessary conditions which were discussed in Chapter 1 and all the
necessary conditions which will be spoken of in Chapter 4 correspond to
the described procedure. The problems with inequality constraints make
an exception: they involve some additional necessary conditions. These
problems will be treated in the next section; here we shall show that they
can also be included in the scheme of the "Lagrange principle" we have de-
scribed.
For definiteness, let us consider a minimization problem
/ 0 (x)-.inf; F(x)=O, ft{x)SO, '""' l, ... , m, <a>
in which besides equality constraints there are also inequality constraints
[X andY in (3) are topological linear spaces; fi:X "*Rand F:X "* Y]. The
introduction of the new variables ui brings the problem (3) to the form
/0 (x)·-inf; F(x),.,.O, ft(x)+u 1 -0;
u1 ~ 0, i == 1, ... , m.
Let us divide the constraints into two groups the first of which consists
of equalities [F(x) = 0 and fi(x) + Ui = 0] and the other consists of the
constraints of the form Ui ~ 0. Let us form the Lagrange function for the
problem <3> , ignoring the constraints of the second group:
m
.li(x, u, y*, 1., 1.8)=1.0/ 0(x)+ ~ l.t(ft(x)+ut)+<y", F(x)), 1.=(1.1, • .. ,A.).
i=l

As to the sign of the multiplier Ao, we assume that AQ ~ 0 in a minimum


problem and AO ~ 0 in a maximum problem.
The problem of minimizing the Lagrange function (for fixed values of
the multipliers)
fi (X, u, ye, i, i 0) - inf
involves two groups of variables: x and u. If the variables u = u are
fixed, we obtain an elementary problem without constraints (a smooth or a
convex problem, etc.), and for this problem we can write the required
necessary condition for an extremum; as can easily be seen, fii are not in-
volved in this condition. If the variables x = x are fixed, we obtain an
elementary linear programming problem. The conditions for an extremum
written in accordance with Section 3.1.2 yield the conditions of correspon-
dence of signs for the Lagrange multipliers ~:
~~~o (1)
and the conditions of complementary slackness:
~i-u, = o~ J.tt, <x> = o. (2)
In what follows we shall permanently use this procedure, but if there
are inequality constraints we shall form the "shortened" Lagrange function
m
at the very beginning: !l' (x, y•. 1., A0 ) = ~ 1.;{1 (x)
i=O
+<y•, F (x)> • It is this very

function that will be called the Lagrange function of the problem (3). It
should be borne in mind, however, that to the conditions for the extremum of
162 CHAPTER 3

this function with respect to x we must add the relations following from
(1), (2), namely, the conditions of concordance of signs:
-/;(x)~O::;>i;~O (1 ')
and the conditions of complementary slackness:
i;t; <x>=o.
Now it should be noted that the Lagrange principle (like, for example,
the Fermat theorem for a smooth extremal problem) yields only necessary
conditions for an extremum, i.e., separates out a set of "suspected ob-
jects" but does not prove their "guilt." Therefore, for the complete so-
lution of the problem we must either have at our disposal a set of suffi-
cient conditions for an extremum or be sure that the solution exists.
Accordingly, in the first case we can "make an examination": each of
the objects separated out by the necessary conditions is tested for the
sufficiency. Finding among these objects the one satisfying the sufficient
conditions, we complete the solution of the problem. In the second case the
sought-for solution (whose existence is known in advance or is proved) must
be among the suspected objects satisfying the necessary conditions for an
extremum. On computing the values of the functional for all these objects,
we choose as the solution the object for which the value of the functional
is extremal.
Of course, we cannot give preference to either of these approaches.
Usually sufficient conditions do not coincide with necessary conditions
and therefore there can remain "suspects" whose "guilt" cannot be proved
[for example, the theorems of Section 3.1.1 give no answer to the question
about the existence of an extremum at a point x for a "planar" function
f(x) for which f(k) (x) =0 for all k]. On the other hand, even i f it is known
that the solution exists, difficulties may arise when the necessary condi-
tions separate out an infinite (or simply a very large) number of the sus-
pected objects.
For many questions concerning the existence of the solution it is pos-
sible to rely on a perfected version of the classical Weierstrass theorem
(this theorem was mentioned in Section 1.6.1): the existence is a conse-
quence of the compactness of the set of the admissible elements and the
semicontinuity of the functional.
Definition. A function f:X + R defined on a topological space X is
said to be lower semicontinuous at a point xo if lim f(x)~f(x0 ), and it is

simply called lower semicontinuous if it is lower semicontinuous at each


point (cf. Section 2.6.2).
The Weierstrass Theorem. A lower semicontinuous function f:X + R
attains its minimum on every countable compact subset of the topological
space X.
Proof. Let Ac::X be a countably compact subset, and let S be the
value of the problem
/(x)-inf, xeA; (3)
i.e.,
S = inf f (x).
XEA

According to the definition of the infimum, it is possible to choose


a minimizing sequence for problem (3), i.e. , a sequence of points Xn E A,
such that f(xn) + S. By the definition of countable compactness, xn con-
tains a subsequence xnk convergent to a point XE A . Since f is semicon-
tinuous, we have
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 163

f(x)~ lim f(x,k)= lim f(xn)=S,


k-+-co n~CD

and, on the other hand, since f(x) cannot be less than the value of the
problem (3), there must be f(x) ; S, i.e., xis a point of minimum. •
COROLLARY.Let f be lower semicontinuous on a topological space X.
If an a-level set £',.f={xlf(x)~et}of the function f is nonempty and count-
ably compact, the function f attains its minimum on X.

3.2. Lagrange Principle for Smooth Problems with


Equality and Inequality Constraints
3.2.1. Statement of the Theorem. Let us consider the extremal prob-
lem
fo(x)-..extr; F(x)=O,· fdx)50, i= l, ... , m. ( 1)
The symbol fi(x) § 0 means that the i-th constraint has either the
form fi(x) ; 0 or fi(x) < 0 or fi(x) ~ 0.
The problems of the type (1) where X and Y are normed spaces, fi are
smooth functions on X, and F is a smooth mapping from X into Y are called
smooth extremal problems hlith equality and inequality constraints.
By the Lagrange function of the problem (1) we mean the function
m
!l' (x, y•, A., A.o) = ~
i;O
"-J1 (x) + <y•, F (x)>. (2)

where A.= (A.i, ... , Am) E Rm•, "-o E R, y• E Y. are the Lagrange multipliers.
THEOREM (Lagrange Multiplier Rule for Smooth Problems with Equality
and Inequality Constraints). Let X andY be Banach spaces, let U be an
open set in X, and let the functions fi:U + R, i ; 0, 1, ... ,m, and the map-
ping F:U + Y be strictly differentiable at a point x.
If x yields a local extremum in problem (1) and if
the image ImF'(x) is a closed subspace in Y, (3)
then there are Lagrange multipliers y*, ~. ~o for which there hold:
a) the stationarity condition for the Lagrange function with respect
to x:
(4)
b) the condition of concordance of signs: \* ~ 0 in the case of a min-
imum problem and ~o < 0 in the case of a maximum problem, and
~~~o. i= l, ... , m; (5)
c) the conditions of complementary slackness:
XJ;(x) = o, i = l, ... , m. (6)
We once again remind the reader that th~ notation ~i ~ 0 means the
following: if fi(x) ~ 0 in problem (1), then \i < 0; if fi(x) <
. 0, then\.]. ~
0; and, finally, if fi(x) ; 0, then the sign of \i may be arbltrary.
The assertion that there exist Lagrange multipliers satisfying the
set of the conditions a)-c) is in exact agreement with the general prin-
ciple of constraint removal: this very principle was spoken of in the last
subsection of Section 3.1. Therefore, the result stated above is also
called the Lagrange principle for smooth problems hlith equality and in-
equality constraints.
164 CHAPTER 3

The proof of the general theorem is based, in the first place, on the
implicit function theorem of Section 2.3 and, in the second place, on the
Kuhn-Tucker theorem, i.e. ultimately on the finite-dimensional separation
theorem. The Kuhn-Tucker theorem was proved in Section 1.3, and this is
in fact the only significant reference to the first chapter in the present
part of the book. The reference to the Kuhn-Tucke-r theorem is connected
with the presence of inequalities in (1), and this complicates the proof
to a certain extent. As to the case when there are no inequalities, it is
quite simple and at the same time meaningful and instructive. Therefore,
we shall consider it separately although the result concerning this case
follows automatically from the general result. In the study of the next
section we recommend that the reader compare the infinite-dimensional ver-
sion of the proof with the finite-dimensional version considered in Section
1 .3.2.
3.2.2. Lagrange Multiplier Rule for Smooth Problems with Equality
Constraints.
THEOREM (Lagrange Principle for Smooth Problems with Equality Con-
straints).
a) Let X and Y be Banach spaces, let U be an open set in X, and let
f:U + R and F:U + Y be a function and a mapping strictly differentiable at
a point x.
If the point x is a point of local extremum in the problem
f (x)---. extr, F (x) = 0 ( 1)
and if the image ImF'(x) is a closed subspace of Y, then there are Lagrange
multipliers A0 ER and U*EY* for which the stationarity condition for the
Lagrange function holds:
(2)
b) If the condition of regularity holds for the mapping F, i.e., if
ImF' (x) coincides with the whole space Y, then the multiplier ~o is nonzero.
Proof. We shall carry out the proof for a minimum problem. Let us
define the mapping
. 3F(x)=(f(x)-f(x),,F(x)), 3F: U-->-RXY.
Obviously, the mapping 3F is stnctly diffe:rent~able at x, and 3F' (x) =
(f' (x), F' (x)).
There are two possible cases here: the image Im3F'(x) may either coin-
cide with the space R x Y or be different from it.
A) We shall begin with the case Im3F'(x):f<RxY. Let us apply the lem-
ma on the closed image (Section 2.1. 6) to the mapping 3F' (X) • By the hypo-
thesis, the image ImF'(x) is closed, and the image of f'(x)(KerF'(x)) is
either {0} orR, i.e., a closed subset of R. Hence, by that lemma, the
image Im 3F' (.~) is closed in R x Y. Since lm 3F' (x) does not coincide with
R x Y, it is a closed proper subspace. Further, according to the lemma on
the nontriviality of the annihilator (Section 2.1.4), there exist a number
~ 0 and an element y* (see Section 2.1.5 where the general form of a linear
functional in a product space was discussed), not vanishing simultaneously
(IX 0 1 + lly*ll ;t 0), such that
<1.of' (x), x> + <y*, F' (~)"[xJ> = 0,. Vx.
We see that this relation coincides with (2).
B) Now let Im3F'(x)=RxY. Then we can apply the implicit function
theorem (see Sections 2.3.1-2.3.3).
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 165

According to this theorem,t there exist a constant K > 0, a neighbor-


hood 'U of the point (0, 0) in the space R x Y, and a mapping cp: 'U-.X
such that
3f(~(cx, y))=(cxi y) and \\cp(a, y)-x\\~K\\8f(x)-(a, YH· (3)
Now we put x(s)=cp(-e, O)=cp(z(s)). Then (3) yields
3f (x (s)) = z (s) ~ f (x (s))-f (.x) = - P F (x (s)) = 0, (4)
llx(s)-x\1=\\cp (z (s))-xii~KIJ(O, 0)-(-s, O)II=Kls I· (S)
From (4) and (S) it follows that x(£) is an admissible element in
problem (1) which is arbitrarily close to i, and at the same time f(x(£)) <
f(x), i.e., x~locmin (I). This contradiction shows that the equality
Im 3f' (x) -= R x Y is impossible. Hence the assertion a) of the theorem is true.
C) Let F be regular at the point :X, and let ~o "'0. Then y* 7 0, and
relation (2) takes the following form: <y*, F' (x)tx]>=O, Vx. Let us choose
an element y such that <y*, y> 7 0 (this is possible because y* 7 0) and
find an element x such that F'(x)[x] = y [it exists by virtue of the equal-
ity ImF'(x)X = Y]. Then 0 7 <y*, y> = <y*, F'(x)[x]> = 0. This contra-
diction proves the second assertion. •
Remark. We constructed the above proof according to the same scheme
as the one used in the proof of the finite-dimensional Lagrange multiplier
rule (Section 1.3.2), and it turned out to be as simple and concise as the
latter. However, this required preliminary work because we used three
facts of functional analysis: the theorem on the nontriviality of the anni-
hilator (i.e., in fact the Hahn-Banach theorem), the lemma on the closed
image, and the implicit function theorem. To complete the general theory,
i.e., to prove the Lagrange multiplier rule for problems with inequality
constraints, we shall only use, besides these three facts, the Kuhn-Tucker
theorem, which is a direct consequence of the finite-dimensional separation
theorem (see Section F) of the proof of the theorem in Section 3.2.4.
It is remarkable that such modest means concerning general mathematical
structures yield a result from which, as its immediate consequence, in the
next chapter, with the aid of some simple auxiliary facts which are also
of general character (Riesz' representation theorem for linear functionals
in the space C and the standard theorems on differential equations), sig-
nificant concrete results are derived: necessary conditions for an ex-
tremum in problems of the classical calculus of variations and optimal
control problems.
3.2.3. Reduction of the Problem. Before proving the theorem stated
in Section 3.2.1, we shall perform a transformation of its conditions.
First we reduce the problem to a minimum problem by replacing, if neces-
sary, fo by (-1)fo. If among the constraints of the form fi ~ 0 or fi ~ 0
there are such that fi(x) < 0 or fi(x) > 0, respectively, we discard them
since they are insignificant from the local point of view because they hold
at all the points in a neighborhood of the point x. Further, if (1) in
Section 3.2.1 involves the relations fi(x) ~ 0, fi(x) = 0, we replace them
by the inequalities (-1)fi(x) ~ 0. As to the equalities fi(x) = 0, we add
tLet us compare the notation of the theorem of Section 2.3.1 with that of
the present se-ction. The topological space X in the implicit function
theorem (i.f.) consists here of one point {xo}, Y(i.f.) ~ X, Z(i.f.) ~
R x Y, '¥(x, y) "' '¥(xo, y) (i.f.) ~ 3f'(x), 1\.(i.f.) ~ 3f''(x), Yo (i.f.) ~ x,
z 0 (i.f.)~OE R x Y. Condition 1) of the theorem becomes trivial, condition
2) holds due to the strict differentiability of 3F because
'I' (x0 , y')- 'I' (x0 , y")-.A (y' -y") ~ 3f (x')-3f (x")-g{'' (x) [x'-x"J,
and, finally, condition 3) holds since Im~' (X}=RXY.
166 CHAPTER 3

them to the equality F(x) = 0. After that we renumber all the inequalities
specifying the constraints to obtain the following problem equivalent to
(1):

f.(x)-+-inf; F(x)=O, i<x)<,O, j=l, ... , m, ( 1 I)

where F(x)=(F(x), fm+l(x), ·· ., fll(x)), and for every j, 0 ~ j ~ ~. there exists


Ej = +1 or Ej ~ -1 such that
(2 I)
The element x yields a local minimum in the problem (1 1 ), and, moreover,
fjCX) = o, j = 1, ... ,m.
Now we shall show that all the conditions of the theorem hold for
problem (1 1 ) . Indeed, X andY x ~-iii are Banach spaces. The functions fj
and the mapping F are strictly differentiable at the point x (because F
and fj are). It only remains to show that L = ImF 1 (x) is closed in Y x
~-iii. We have F' (x) = (F' (x), <I>' (x)) ' where <I>' (x) [x] = <<1~+1 (x), x> • ... ' <1~ (x), xi),
<I>' (x): X-..Ril-m. The image Im F 1 (x) is closed by the hypothesis, and the
subspace <I>' (x) [KerF' (x)Jc Rll- m is closed like every subspace of a finite-
dimensional space. Therefore, by the lemma on the closed image (Section
2.1.6), ImF 1 (x) is closed in Y x R~-m.
Now let us suppose that the theorem under consideration is true for
problem (1 1 ) and hence there exist Lagrange multipliers y* = (y*, Xm+ 1 , ••• ,
X~), ~·, j = 0, 1, ... ,iii, for which the conditions a) and b) of Section
3.2.1 hold [the conditions c) hold a~tomatically because fj(x) = 0, j =
1, ... ,iii). By virtue of b), we have Aj ~ 0, j = 0, 1, ... ,m, and, by virtue
of a),
m
2,. (x, y•, i:i, ... , £;;,, i-o) = .l: ~~ t/ (x) + (y•o!)' (x) .... o, (3 I)
J= 0
m
where .9' = l: "i}/x)+(~of) (x).
/=0

Now it remains to put Ao=£•. ilj=e/Kj, j=l, ,... , m,J..;,='Ki;j=m+l, ····It


and ~i = 0 if fi(x) 7 0. The collection of the numbers ~ij obviously sat-
isfies the conditions of complementary slackness and the condition of con-
cordance of signs [because in (2 1 ) we have Ej = +1 when the constraint has
the form fij(x) ~ 0 and Ej = -1 for fij ~ 0]. Moreover, since
.fi' (x, Y., li, . · . , i-ll, 1
0) =2 (x, Y., ~f• • • ·, ~m• 1..0 ) ,
the stationarity condition is fulfilled or not fulfilled for the two La-
grange functions simultaneously.
Thus, in what follows we can limit ourselves to problem (1 1 ) . How-
ever, for the sake of simplicity we shall no longer write the tilde (-)
above F, fj, and m.
3.2.4. Proof of the Theorem. Thus, let a problem
f 0 (x)--..inf~ F(x)=O, f1 (x)<,O, j=l, ..• ,m, (1)
be given, let all the conditions of the theorem of Section 3.2.1 hold, and
let fj(x) = 0, j = 1, ... ,m. Without loss of generality, we can also assume
that fo(x) = 0.
We shall split the proof into several parts.
A) The Linear Case. Let us first consider the simplest situation
when fo = x* is a linear functional, fi = 0, i = 1, •.. ,m, and F =A is a
surjective continuous linear operator. The point = 0 is the solution of x
the problem
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 167

<x•, x)-+inf; Ax=O (AEJl'(X, Y), A(X)=Y) (2)


if and only if x• E (Ker A).l • By the lemma ~n the annihilator of the kernel
(Section 2.1.7), there exists an element y*EY* such that x* + 1\.*y* = 0.
The last relation is nothing other than the Lagrange principle for the
problem (2):
x•+A•y•= 0C:0.9'.c(O, y•,l)..,Q,
where .!l'(x, y•, A0 )=A0 (X 0 , x>+<y•,Ax)=(A 0 x"+A•y•, x> is the Lagrange function
of problem (2). Hence in this situation the Lagrange principle is true.
B) The Degenerate Case. Let Im F 1 (x) be a proper subspace of the
spaceY. By the lemma on the nontriviality of the annihilator (Section
2.1 .4), there is an element il" E (Im F' (x))l. ~ (il oF') (~)=0. It remains to put
\i = 0, i = O, ... ,m and to verify that the Lagrange principle holds for the
degenerate case under consideration as well.
In what follows we assume that F is a regular operator at the point
x, i.e. , Im F 1 (x) = Y.
Let us put
Ak={x!·<t;(x),x><O, i=k,k+l, ... ,m,F'(x)[xJ=O}
for 0 ~ k ~ m.
~C~)~L~e~mm==a~~<~B~a~s~i~c~)~. If xis a local solution in the problem (1), then
the set Ao is empty. In other words,
max <fi (x), x> ~ 0, Vx E KerF' (X).
o~i~m

Proof. Let us suppose that Ao is nonempty, i.e., there exists an


element~ such that F 1 (x)[~] = 0, <fi(x), ~> = Bi, Bi < 0, 0 ~ i ~ m. Then,
by Lyusternik 1 s theorem (Section 2.3.5), there exists a mapping r:[-a, a]+
X, a > O, such that
F(x+A.s+r(A.))=O, A.Ef-ct, ct], r(A.)=o(A.). (3)
For small A> 0 and i = 0, 1, ... ,m we have the inequalities

Relations (3) and (4i) (i = 1, ... ,m) mean that for small A> 0 the
element x
+A~+ r(A) is an admissible element in problem (1). Then in-
equality (4 0 ) contradicts the fact that x yields a local minimum in the
problem. •
D) LEMMA 2. If the set Am is empty, then the Lagrange principle holds
for problem (1).
Proof. The emptiness of Am means that x = 0 is a solution of the
problem
<t:n (x), ~> _,. inf, F' (x) [ x1= o.
By virtue of Section A), the Lagrange principle holds for this problem
and hence it holds for problem (1) as well [to show this it suffices to
put \o = ••• = ~m-l = O] ·
Thus, from Sections C) and D) i t follows that either the Lagrange
principle has already been proved (Am = 0) or there exists k, 0 ~ k < m,
such that
Ak= 0, Ak+i"fo 0.
E) LEMl1A 3. If relations (5) hold, then zero is a solution of the
following linear programming problem:
<fi. (X), X)-+ inf; <fi (X}, X) ~0,
168 CHAPTER 3

i=k+!, ... , m, F'(x)[x]=O .. (6)


Proof. Let us suppose that n is an admissible element in the problem
(6) (i.e., <fi(x), n> ~ 0, i > k + 1, F'(x)[n] = O) such that <fk(x), n> <
0. Lets be an element belonging to Ak+ 1 , i.e., <fi(x), s> < 0, i > k + 1,
F'(x)[G] = 0. Then the element n + Es belongs to Ak for smalls> 0, which
contradicts (5). •
F) Completion of the Proof. We apply the Kuhn-Tucker theorem (Section
1.3.3) to problem (6) taking into account that the Slater condition holds
for this problem (because Ak+l is nonempty). According to this theorem,
there are nonnegative numbers ~k+ 1 , ... ,Am such that the point zero is a
solution of the problem
m
<fi. (x), x> + ~ <~di (x), x> --*inf, F' (~)[xJ = 0. (7)
i=k+l

[The last assertion is nothing other than the minimum principle for problem
(6); the Lagrange multiplier in the functional is put to be equal to 1 since
the Slater condition is fulfilled.] Now, problem (7) is again the problem
mentioned in Section A). The Lagrange principle holds for the latter
problem or, which is the same in the present situation, the lemma on the
annihilator of the kernel of the operator F'(x) is applicable. In other
words, there exists an element y* for which
m
~ ~;~; <x> + <u· o F'Hx> = o.
t;. <x> + i=k+l
We see that this is nothing other than the stationarity condition for
the Lagrange function if we put ~o = ••• ~k-1 = 0, ~k = 1. •
In the course of the proof we have found that ~o = 1 for A1 ~ 0. Let
us show that if F is regular [i.e., Im F' (x) = Y], then there is no co llec-
tion of Lagrange multipliers at all for which the stationarity con~ition
for the Lagrange function holds and which contains the multiplier Ao = 0.
Indeed, for A1 ~ 0 there exists an element h suchthatF'(x)(h] = 0, <fi(x),
h> < 0, i = 1, ... ,m. Now let us suppose that there are Lagrange multi-
pliers X= (X 1 , ••• ,Xm), y*, not all zero, such that..2'"(x,y•,~,0)=0. Then,
by virtue of the inequalities Xi<fi(x), h> ~ 0, we have
m
O= ..2'"(x, ~ ~~<fi(x), h> + <y•, F'·(x>fhl> =
y•, i, o)[h]= i=l
i; ~~ <ti (x>, h)=> 1i = ... = x,. = o =>
= i=l u· + o,
and therefore
0=..2'x(x, y•, X, O)[x]=<y-., F' (x)[x]>, Yx:> lm F'(x)=;W,
which contradicts the hypothesis.
Let us state what has been said as a separate proposition.
Proposition. For the assertion of the theorem of Section 3.2.1 to
involve the inequality ~o 7 0 it suffices to add to its conditions that
ImF'(x) = Y and that there exists an element hEKerP'(x) for which <fi(x),
h> < 0 , i = 1 , ••• , m.
The additional assumptions which have been mentioned here will be
called the stPengthened PegulaPity conditions foP pPoblem (1).
The statement of the Lagrange principle we have proved includes the
important condition (and the only condition besides the requirement of
smoothness and the requirement that X and Y be Banach spaces) of the closed-
ness of the image ImF'(x). It should be noted that without conditions of
this type the Lagrange principle may not hold.
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 169

First of all, if the requirement that the operat·or AE.!l'(X, Y) be sur-


jective (X andY are Banach spaces) is dropped, the formula (KerJ\)J. = ImJ\*
may prove to be wrong or, more precisely, ImJ\* may turn out to be a proper
subspace of (Ker J\)J.. For instance, if X =Y =1 2 , x = (x 10 ••• , xn, ... ) E {2 , Ax= (xi,
X 2 /2, ... ,xn!n, •.. ), then KerJ\ = {0}, and hence (KerJ\)J. = l 2 , whereas Imi\=
ImJ\*"' l2 [for instance, the element y = (1, 1/2, ... ,1/n, ... ) belongs to
l2 while the solution of the equation J\x = y obviously does not exist]. Now
we can give an example of a problem for which the Lagrange principle does
not hold.
Example. Let X and Y be Banach spaces, and let an operator AE2(X,
Y) be such that KerJ\* = {0}, while ImJ\* is a proper subspace of (Keri\).1..
Now we choose x*E(KerA).l."lmA• and consider the problem
<x•, x>- inf; Ax =0.
The Lagrange principle does not hold for this problem. Indeed, x 0
is a point of minimum, and if there were ~o and y*EY* such that
£'.. (0, y•, J:.)[hj = 0,
Vh EX~ j;o <x•, h>+ <y•, M> =0,
VhEX,
then we would have ~ o = 0 (because, if otherwise, then x* E Im A•) , and con-
sequently 1\.*JI" =0 => y•=O.
Exercise.t Let Z=Y=l 2, z=(zl> •;o• Zn, ... )€1 2 , Az=(z1 , z2 /2, ... , Zn/n, ... ), y~lmA,
X=RXZ,f(x)=f(ct,z)=(%, f(x)=f((%, z)=Az+(1.2 y.
Show that the Lagrange principle
does not hold for the problem f(x) ~ inf; F(x) = 0.

3.3*. Lagrange Principle and the Duality in Convex


Programming Problems
3.3.1. Kuhn-Tucker Theorem (Subdifferential Form). The Lagrange
principle for convex programming problems (the Kuhn-Tucker theorem) was
already proved in Section 1 .3.3. In the present section the "subdiffer-
ential" form of this theorem is given, and its connection with some other
notions of convex analysis is elucidated.
Let X and Y be Banach spaces, let J\:X ~ Y be a continuous linear oper-
ator, let fi:X + i, i = 0, 1, ... ,m, be convex functions, let a= (a 1 , ••• ,
am)ERm, bEY, and let A be a convex set in X. We shall consider the fol-
lowing convex programming problem:
f0 (x)-.inf; ft(x)<,a;, i=l, ... , m, Ax=b, xEA.
The set {xiJ\x b} will be denoted by B. The Lagrange function of the
problem (3) is
m
2' (x, y•, A, A0 ) = A0 f0 (x) + ~A; (f; (x)-a;) + <y•, Ax-b>.
i;;:;l

Proposition. Let x be a point of absolute minimum in the problem (3)•


Then x is a point of absolute maximum in the elementary problem
f(x)=max(f.(x)-{ 0 (x), {1 (x)-a,, .. . , fm(x)-am)+6(AnB)(x)-+inf, (3')
where 6 (An B) is the indicator function of the set An B.
Indeed, if there exists an element x for which f(x) < 0, then this
implies that, firstly, xEA. xEB(~AX=b), secondly, fi(:K) < Gi, i = 1, ... ,
m (i.e. , :X is an admissible element of the problem) , and, 'thirdly, f o (:X) <
f 0 (x), which contradicts the hypothesis. •
THEOREM (Subdifferential Form of the Kuhn-Tucker Theorem). Let the
functions fi, i = 0, 1, ... ,m, in (3) be continuous at a point xE A nB,.
tThis exercise was suggested by the fourth-year student V. V. Uspenskii.
170 CHAPTER 3

which yields an absolute minimum in the problem. Then there are numbers
m
Ai): 0 such that ~A,=1, ~i(fi(:x)- ai) = o, i): 1, and an element x•e
1=0

iJ6 (An B) (x) for which


m
oe i=O
~ ~~ at;(x> +x•. ( 1)

Proof. By the hypothesis, x


yields an absolute m1n1mum in the ele-
mentary problem (3') . According to Theorem 1 of Section 3.1 .1, we have 0 ,E;
af(x). The function g(x)=.max(f0 (X)-f 0 (i), f1 (x)-a1 , . . . , fm(x)-am) is convex
and continuous at the point xEA n B [i.e., x belongs to dom6(AnB)],
and hence, by the Moreau-Rockafellar theorem (Section 2.6.4), iJf(x)~iJg(x)+
iJil(AnB)(x). Finally, the Dubovitskii-Milyutin theorem (Section 2.6.4) im-
plies iJg(x)= conv(iJfl,(x)u ... uiJ!I,(x)), where ij are those and only those in-
dices for which fi(x) = g(x). Thus, there exist two elements ~·e·og(x) and
x•EiJil(AnB)(x) for which
~·+x• = 0, ~· E conv (iJf;, (x) U ... UiJf;, (x)) ~ ~· =
s s
= ~ ~~1 x1
j =I I
.. xt1 eatt.(x), i=~~~1 =1,
1 I
1:,.;;;;:.o.
I

I t remains to put A1 = 0, i ~ {i 1 , ... , i,} . •


Exercise. Let B = {xI Ax = b}. Prove that if Xo E B, then a0 (B) (xo)
(Ker A).L.
COROLLARY (Lagrange Principle for Convex Programming Problems with
Equality and Inequality Constraints). Let A= X in the conditions of the
theorem, and let the image of X under the mapping A be closed in Y. Then
there are Lagrange multipliers A0 E R, AE Rm', E Y• such that y•
i;;. 0, A;(f,(x)- a;)= o, i;;. 1,
(2)
min f£(x, y*, A, A0 ) = f£(x, y*, A, A0 ).
X

Indeed, if Im A is a proper subspace in Y, then we can put A1 = 1..0 = 0, E y•


(lmA).L. In case A is a surjective operator, we have iJilB(x)=(KerA).L= ImA•.

Then, according to (1), 0 EiJ,.2(x, y•, A, A0 )= ~


1=0
1-: 1 iJf;(x)+ImA•, and hence,

by the definition of the subdifferential,


!l'(x, y•, A, A0 )-..2'(x, y•,1.., 1..0 )-<0, x-x>;;;;:.O~
~ !l' (x, y•, A, A0 );;;;:. ..2' (x, y•, A, A0 ), Vx. •
3.3.2. Perturbation Method and the Duality Theorem. The convex pro-
gramming problem (3) was considered in the foregoing section as an indi-
vidual problem. However, for many reasons it is natural and useful to con-
sider families of problems of this kind. Let us fix fi, A, ai, A, and b
and include the problem (3) in the family
/ 0 (X)·->-inf; f;(x)+cx 1 ~a 1 , i= 1, ... , m,
(3 (ex, Tj))
Ax+rt=b, xEA.
(Of course, we could simply assume ai and b to be variable parameters of
the family, but the above introduction of the parameters Cli, n leads to more
beautiful formulas.) The family of problems {(3(a, rt))} will be called a
perturbation of the problem (3) = (3 (0, 0)).
As was already seen in Section 3. 3.1, the problem (3) is reducible to
an elementary problem. Here, in a somewhat different manner, we shall
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 171
perform an analogous reduction of the family 3(~.~). Namely, let us denote
I(x,. ~.
) { / (x)
'I =
0 if f;(x)+~~~a;,Ax+oq=b,xEA,
(1)
+co 'in all the other cases.

Then the family (8(~, ~)) can be writUn in the form of an elementary prob-
lem:
f (x; a, ~)- inf (with respect to :x·E X). (2)
In what follows when speaking about the problem (~(~, f)-)),, we shall not dis-
tinguish between the original statement of the problem and (2). For the
indicated constraints the value of this problem, i.e., inf f 0 , is a function
of a= (a 1 , ••• ,am) and n which will be denoted by S (S:Rm x Y ~ R) and will
sometimes be called the S-funetion:
S (a, TJ) = inf f (x; ~. 'IJ) = inf fo (x). (3)
" xEA, f;(.cl +t<;.;; a 1, AxHt=b

LEMMA. Let F(x, z) be a convex function on the product of two linear


spaces X and Z. Then the function
S (z) = inf F (x, z)
"
is convex on Z.
Proof. Let (z 1, t 1)EepiS, i = 1, 2, and I..E[Ot 1]. Then S(zi) E;; ti,
i = 1, 2, and for every£> 0 there exist (xi, Zi) such that F(xi, zi) <
ti + e, i = 1, 2. By the convexity ofF, it £ollows that
F(t..x1 +(l-t..)x1 , t..zi +(1-1..) z1) ~
~t..F (x1 , ~)+(1-1..) F(x 1 , Z1 ) < 1. (t1+e)+(l-t..)(t1+e)= A.t 1 + (1-A.) t 1 +a.
Since e > 0 is arbitrary,
F(A.x:~+(l-t..)xi, l..z:~ +(1-1..) Za)~t..t 1 +(1-l..) f 1 ~S (A.z 1 +(1-A.) Z2 ) ~
~At 1 +(1-A.) t.~(l..z 1 +(1-t..) Z 2 , At 1 +(l ..... A.) f 2 ) Eepi S. •
Proposition 1. Let X and Y be linear spaces, let AcX be a convex
set, and let A:X ~.Y be a linear mapping.
If the functions fi:X ~ R, i = O, 1, ... ,m, are convex, the function
f(x; a, n) determined by equality (1) is convex on X x Rm x Y.
Proof. The sets
M0 = {(x, I
~. ~. t) (x, t) Eepi f 0 },
M 1 = {(x, I
~. ~. t) f;(x) +~;~a,},
MA={(x, ~. 'IJ, t)lxEA},
MA = {(x, ~. 1], t) IAx+'IJ
=b}
are convex in X x Rm x Y x R. Indeed, we obtain (changing, when necessary,
the order of the factors): M0 =epi/0 XR .. xY, MA=AxRmxYxR, MA=A-l(b)x
Rm x R, where A:(x, y) ~Ax+ y is a linear mapping from X x Y into Y and
Mi=aepi{f1 -a1}xRm- 1 xYxR, where o:(x, t) ~ (x, -t) is a symmetry trans-
formation in X x R. It follows that all these sets are convex, and it re-
mains to note that

COROLLARY 1. The S-f unction of the problem (3 (~. 'I])) is convex on


Rm x Y.
As was already seen in Section 2.6, convexity makes it possible to
associate with various objects (functions and sets) the objects dual to
them in the conjugate spaces. The same applies to the convex programming
172 CHAPTER 3

problems. In what follows we shall assume that X andY are locally convex
spaces, X* and Y* are their conjugate spaces, and A.= (Ai, ... , Am) E Rm•.
Definition 1. The family of extremal problems
g (x*; A, 11")-sup (with respect to AE Rm•, 11• E Y•), (3" (x"))
where (-1)g(x*; A, n*) ~ f*(x*; A, n*) is the Young-Fenchel transform (Sec-
tion 2.6.3) of the function (x, a, n) + f(x, a, n), determined by equality
( 1) is said to be dual to the family 3 (a, 11) <*(2).
The problem a• = a• (0) is said to be dual to the problem 3 = 3 (0, 0)
[with respect to the family of perturbations a(a, 11)].
The value of the problem 0 •(~) will be denoted
I: (x•) = sup g (x•; A, 11"). (4)
(1., 1]*)

The function f* being convex, its opposite function g is concave. The


dual problems could be called concave programming problem, but we would
rather do without this new term.
Definition 2. By the Lagrange function of the pair of dual families
3 (a, n) and 3" (x•) or the extended Lagrange function of the problem (3) , we
shall mean the function .§:X X Rm• X Y•-+ R determined by the relations

_ { fo (x) + i~l A; (f;(x) --a;)+ <11", Ax-b>,


2(x; A., 11") = xEA, A. ERr;'", (5)
-oo, xEA, A.~R~'•,
+oo, x~A.
This function is convex with respect to x for fixed (A, n*), and the
opposite function -~is convex with respect to (A, n*) for fixed x.
It should be noted that
£> (x; A, TJ") = 2 (x, 'l'J", A, I), x E A, AE R'L'". (6)
where 2 is the Lagrange function of the problem (3) defined in Section
3.3.1.
Proposition 2. The pair of dual families 3 (ex, 'l'J) and 3" (x•) ~s uniquely
determined by their Lagrange function since
f (x; ex, TJ) == '(- oi')•<•l, g (x*; A, TJ") = - §'•(1), (7)
where *(1) and *(2) denote the Young-Fenchel transforms with respect to the
arguments x and (A, n*), respectively.
Proof. The definition of the Young-Fenchel transform implies
- g (x~; A, 'l'J*) = f• (x•; A, 11") =
= sup {<x•, x>+Aex+<'l'J", 'l'J>-f(x; A, fJ}}=
(x, a, Til
= xeA.Ax+fl~b
sup {<x•, ~>+A.cx+<11"• 11>-fo(x)} =
Ii (x) +a;< a;

sup{ <x". x>-f 0 (x)- fA; (f 1 (x)-a;)-<11"• Ax-b>},


= { xeA 1=1 _ .fj?o(l) ( 0, A
AE R';'" - X' ' 11")'
+ oo, A.~ R'L'•

which proves the second of equalities (7). Incidentally, we have derived


the useful relation
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 173

inf (.i' (x; A., 1J0 ) -<x•, X)), A. eR~·. (8)


g( x•· A, n*) = { xeA
' '., - oo, "d:Rm•
~~~~+ •

On the other hand,t


+ oo, x~A,
{ sup (A.a.+ <1l", 11> + f0 (x) +
(- 2)• (I> = sup (A.a.+ <1l*, 1J) + 2 (x; 1., ·1J")) = ~:.;;..o,,;•
(A, 11*)
+ 1~1 A.j(fdx)-a 1 )+<1J*,Ax-b>)

+oo i f x~A or Ax-:-b=i=-1J or f 1 (x)-a 1>-a.1


= { for some t, = f (x; a., 1J). •
fo (x) in all the other cases.

CORQLLARY 2. The conjugate function to the S-function of the problem


3 (a., 1J) has the form
- inf ..2" (x,1J•, A., 1), A.E R~·.
S•(A., n*)=-g(O·, A. n•)- { xeA (9)
., '., - +oo, A.~R~•.

Proof. By definition,
s• (A., t
1J0 ) =sup (A.a.+ (1j 0 • 1])- inf (x; a., 1j)) =
(CX,I]) X

= sup (A.a.+ <1J*, 1J)- f (x; a., 1J)) = - g (0; A., 1)•),
(C%,1],%) .

after which (9) follows from (6) and (8).


Hence, the problem 3* = 3" (0) dual to the problem 3 = 3 (0, 0) can also
be stated thus:
-S•(A., 1J")=inf2(x, 1J*, A., 1)--+-sup; ( 10)
e
A. R'l.'*. 1l* v•. e
Remark. Generally speaking, the definition of the dual problem de-
pends on the family of perturbations in which the original problem (3) is
included. Nevertheless, the equalities specifying the extended Lagrange
function and equality (7) show that in a sense the families {(3(ct.,1]))} and
{(b" (.x*))} correspond to the problem (3) in a natural way, and hence the
problem (3") equivalent to (10) is the natural dual problem of (3) in the
same sense.
Exercise. Show that if the functions fi and the set A are convex and
closed and if fo(x) does not assume the value-= on the set of the elements
X admissible in the problem m, then the family dual to the family {(3°(x'))}
(see the footnote to the proof of Proposition 2) coincides with {(3(a, TJ))},
and thus (3) and (3*) form a pair of dual problems.
Duality Theorem for Convex Programming Problems. Let us suppose that
the S-function of the family {(3 (a., 1J))} is continuous at the point (0, 0).
Then
( 11)

tAs in Section 2.6.3, when computing the conjugate function to a function


defined on the conjugate space (on Rffi* x Y* in the case under considera-
tion),weregard the result as a function on the original space and not on
the second conjugate space.
174 CHAPTER 3
for any (a, f)) E int (dom S), and the values of the problem (3) and its dual
problem (a•) are equal:
inf
xeA
fo (x) = S (0, 0)-= I: (0) :=;
Ax=b
ft<x)<;a;.l=l, ... ,m

=sup { inf{f0 (x)


'-~0 xtA
+ i;t..1 (f;(x)-a 1)+<1J", Ax-b>}}·
1<=1
(12)
'l* e v•

~· By Corollary 1, the functionS is convex, and its continuity


at one point implies (see Proposition 3 of Section 2.6.2) its continuity
and the equality S(a, n) = (c"Ori.VS) (a, n) for all (a, TJ)Eint(domS). By. the
Fenchel-Moreau theorem (Section 2.6.3) and Corollary 2,
s (a., TJ) """s•• (a, TJ) = sup (A.a + <TJ",
(~ 'l*)
fJ) -S• (/..,
.
TJ")) """
= sup (A.a+<TJ", 11>+ inf 9(x, TJ", A, I)),
'-~ o xeA
f)• E Y*

which proves (11). Substituting a= 0 and n = O, we obtain (12). •


COROLLARY 3 (Minimax Theorem). Under the conditions of the duality
theorem, the relation
sup inf 2 (x, TJ", A., 1)
'-~0 xeA
= xeA
inf sup 2 (x, TJ", A., 1)
'-~0
( 13)
t)*EY* 'l*EY*

holds.
Proof. The left-hand side of (13) is equal to E(O) and, by virtue of
(12), it coincides with S(O, 0). As to the right-hand side, the formula
(7) implies that it is equal to

inf sup 9 (x, TJ", A, 1) = inf sup .!E (x; A., TJ") =
.uA '-~ o xeA <'-·11*)
t~•• v•

= inf ((- 2)• '11 (x; 0, 0)) = inf f (x; 0, 0)


xeA xeA

and thus it also coincides with S(O, 0).


3.3.3. Linear Programming: the Existence Theorem and the Duality
Theorem. Let X be a linear space, let X' be the space algebraically con-
jugate to X (i.e., the collection of all the linear functionals on X), and
let K be a polyhedral cone, i.e. the intersection of a finite number of
half-spaces Hi~{<xj,x>~O,xjEX', j=1, ... , s} (the condition that the cone
is polyhedral is-sometimes dropped).
The extremal problems of the type
<x;, x>-:-+ inf; ft(x) =<xi, x> ;;p-b1, i = I, ... , m, x E /(, (1)
form a separate class of linear programming problems.
In this section we shall consider the finite-dimensional case when
X= Rn, X' = Rn*, and K = ~· In this case the problem (1) can be rewrit-
ten thust:
(2)

tWe remind the reader that for finite-dimensional spaces, we have two types
of notation:
It
<c, x>=cx= ~CiXi, cER.11*, xER.n.
,-:1
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 175

where cERn•, A=(a;1 ), j = 1, ... ,n, i = 1, ... ,m, is a matrix specifying a


linear operator ±rom Rn into Rm, and bERm. For finite-dimensional vectors
z' and z the symbol z' ~ z means that all the coordinates of the vector z
do not exceed the corresponding coordinates of the vector z'. Problem (2)
will be called a finite-dimensional linear programming problem. This prob-
lem is a special case of the general convex programming problem. However,
it involves a still more special structure: here all the functions are lin-
ear and the cone is polyhedral. As usual, we shall agree that if the prob-
lem (2) is inconsistent, i.e., has no admissible elements, its value is
assumed to be equal to +oo.
The Existence Theorem. If the set of admissible elements of the fi-
nite-dimensional linear programming problem (2) is nonempty and its value
is finite, then the problem possesses a solution.
Proof. The proof of the theorem is based on a simple fact of finite-
dimensional geometry. We remind the reader that the conic hull cone C of
a set CcX is determined by the equality

conec={ X EX IX=i~ A;Xi, t.i;;;:o 0, xi E c}

and is a convex set (Proposition 1 of Section 2.6.1).


We shall say that the cone cone C is generated by the set c.
A) The Lemma on the Closedness of a Finitely Generated Cone in a Fi-
nite-Dimensional Space. Every cone in a finite-dimensional space generated
by a finite set of points is closed.
Proof. Let K=coneCcRN, C ={z 1 , .,., zk}, and ziERN. We shall prove
the lemma by induction.
1) k = 1. Then K=cone{z1 }={xlx='A1 Z1 , 'Af;;;:,O} is a closed half-line.
2) Now let us suppose that the lemma is true fork= s - 1. We shall
prove it for k = s. There are two possible cases here:
a) the cone K contains the vectors -z1, ... ,-zs· Then K is a subspace
of a finite-dimensional space and thus is a closed subset;
b) at least one of the vectors (-zi), i = 1, .•. ,s, say (-zs), does not
belong to the cone K. Let us denote K1 = cone{z 1 , ... ,zs- 1 }. By
the induction hypothesis, the cone K1 is closed. Let a vector z
belong to the closure of the cone K, i.e., Z=limsn; snEK. n;;;:,I.
According to the definition,

snEK=cone{z 1 , ••• ,
n
.
z,}~sn =~ 'A;nZi=~n+t.,nz,,
i=l

where ~n E Ki and Asn ~ 0.


If we assume that Asn + oo for n + oo, then
-z =lim ~n- ~n = lim ..b._ E Ki
1 n-.cc 'Asn n-'~>a> 'Asn

(we remind the reader that ~n + z, and therefore ~n/Asn + O, and sn/Asn E
K1 since sn E K1 ), i.e., - Z9 EKi because K1 is closed, which contradicts
the hypothesis. Hence Asn does not tend to infinity, and there is a subse-
quence Asnk convergent to a number Aso ~ 0. In this case
~nk = ~nk -Asnis-+ Z -A90 Z8 = i.
Since K1 is closed, we have iEKi, and consequently
176 CHAPTER 3
B) Now we shall prove the existence theorem.
Let us consider the set K in the space Rm+l formed of those vectors
(a,z),aER 1 zER•, for each of which there is at least one vector xER~ such
that ex~ a, Ax~ z.
The set K is a cone because if (a, z) EK, then ex ~ a, AX ~ z for some
xER~. Therefore, for any t ~ 0 we have txER~. c(tx)~ta, A(tx)~tz, i.e.,
t (a, z) E K. The convexity of K is proved in a similar way.
Let us show that the cone K is generated by a finite set of vectors
belonging to Rm+l:

~n+t=(l, 0, ... , 0), ~n+2=(0, -I, 0, ... , 0), ... , ~n+m+ 1 =(0, .. . , 0, -I).
First of all, we have ~ 1 E K, and hence gi, ... , ~n+m+l} c:: K. Indeed, if 1 ~
i ~ n, then the standard base vector ei should be taken as for the other x;
values of i we must take x = 0. Now let ~=(a,z)=(a,zi, ... ,zm)EK· Then
for some xeRn, X=(xlt ... ,x,.). such that Xl'~ o, ... ,xn ~ 0 and some So~ o,
Sj ~ 0 the relations
n n
.l:c1 x1 +~0 =a, .l:a 11x1 -~1 =z 1 , i=l, ... , m,
!=1 /=1

hold. But this exactly means that


n m
~=(a, z)= l: Xi~J+Mn+i+ 1=1
/=1
l: Mn+l+l•
x1 ~o. ~.~o. ~~~o.
i.e.'
t E cone {ti• · · •, ~+a+I}•
According to the lemma proved in the foregoing section, the cone K
is closed. By the condition of the theorem, the problem (2) has an admis-
sible element x, and the value of the problem is &> -oo. From the defini-
tion of Kit follows that the point (~,b), where a= ex, belongs to K.
Moreover, we obviously have a~ a..
Consequently, the set ~={aERI(a, b)EK}
is nonempty and ci=inf{alaEill}; i.e., (a, b) belongs to the closure of the
cone K, and hence to K itself. Therefore, there exists an element ~ 0 x
for which Ax~ band ex~ & ~ex=&., i.e., xis a solution of the prob-
lem. •
Now we proceed to the investigation of the dual problem. Here the
duality theorem assumes a more complete form than in the foregoin~ sec-
tion since the S-function of the problem turns out to be closed due to its
specific structure.
In accordance with formulas (5) and (6) of Section 3.3.2, the extended
Lagrange function is of the form
cx+'-(b-Ax), xER~. '-ER:'~,
Ji (x; '-) = { - oo, .xE R~. A~ R:'",
+ oo, x~R~.

From this expression we find the family of perturbations of the problem


(2) and the dual family. According to (7) in Section 3.3.2,
f (x; a)=(- 9')* 111 =sup (l..a+2 (x, /..)) =
~

{ +co, x~R~. {ex, x;;;;:,o, Ax;;;.b+~Z,


""' ~~(Aa+cx+l..(b-Ax), xE R: = + 1\111 in all the other cases.
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 177

Denoting z = b +a, we obtain the perturbation of problem (2):


cx-.inf, x~O, Ax~z. (3)
Similarly, for the determination of the dual problem, we have to calculate
g (0, A.)=(- $•<1)) (0, A)=- sup ( -.!i (~, )..))=-

={ -;~~(-cx-l.(b-Ax)), 2.;;;.0,
_ 00 , ')..~ R'J.'•,
={"'Ab, ~~0, 2.A~c,
- oo i.n all the other cases.
This leads us to the problem
(4)
dual to problem (2). It possesses the same structure as problem (2), and
it can easily be seen that if a dual problem is constructed proceeding from
problem (4) in accordance with the same rule with the aid of which problem
(4) was obtained from problem (2), we shall come back to the problem (2).
Therefore, it makes sense to speak about a pair of dual linear programming
problems.
The Duality Theorem. For a pair of dual linear programming problems
the following alternative holds: either the values of the problems are
finite and equal and solutions exist in both the problems or in one of the
problems the set of admissible elements is empty.
In the first case elements xERn and ~ERm• are solutions of prob-
lems (2) and (4), respectively, if and only if they are admissible in these
problems and satisfy one of the two relations
cx=1.b· (5)
and,
~(Ax-b)= (}.A-c) X. (6)
In the second case one of the problems is inconsistent and the other
is either inconsistent (i.e., its set of admissible elements is empty) or
has an infinite value.
Proof. A) The Construction of the S-Function. Let us again consider
the same cone K as in the existence theorem. If (a, z) E K and 13 ): a, then
(~. z) e K, and therefore K is the epigraph of the function

S(z)=inf{al(a, z)EK}. (7)


From the proof of the existence theorem it follows that inf is attained
and S(z) is the value of problem (3); consequently, formula (7) specifies
the S-function of problem (2). Since K is a convex and closed cone, the
S-function of problem (2) is closed and convex.
B) Let us compute the function S* conjugate to the function S. Ac-
cording to the definition,
s• ('A)= sup ('Az-S (z)) =sup {'Az- inf {ex I x E R~. Ax~ z}} =
z z X

=sup {A.z-cx Ix E R~. z E Rm, z ~Ax}.


(z, x)

It is evident that sup{'Az\zERm, z~Ax}='AAx<oo if and only if;>..): 0.


Thus,
sup('AA-c)x if
S"('A)= { x:;.o
+oo if A.~ R~·

={ 0 i f 'AA ~ c, 'A~ 0,
+oo in all the other cases.
178 CHAPTER 3

Therefore,
s•• (z) =sup {A.z 1A.A ~ c, A.~ 0}. (8)
In particular, it follows that S**(b) is the value of the dual problem
(4).
C) Completion of the Proof. The function S cannot be identically
equal to +» because S(O) ~ 0 since zero is an admissible element in the
problems ex+ inf, Ax~ 0, x ~ 0; consequently, domS ~ ~.
Here one of the following two cases takes place:
1) S(z) >-co, Vz or
2) there exists z for which S(z) : -m.

The case 1) in its turns splits into two subcases: 1a) bEdomS and
1b) b~domS. If the subcase 1a) takes place, then S is a proper function,
and the value of the problem is finite. By virtue of the closedness of S,
the Fenchel-Moreau theorem (Section 2.6.3) implies that S**(b) : S(b), and
now it is seen from (8) that the dual problem (4) has the same value as
the primal problem (2) and, in particular, it is consistent. By virtue of
the existence theorem, solutions exist in both the problems. If the sub-
case 1b) takes place, then S(b) :+»;i.e., problem (2) is inconsistent.
The Fenchel-Moreau theorem again implies S**(b) : S(b), and hence problem
(4) is consistent but its value is infinite.
If the case 2) takes place, we have S(z) : -m for all zEdomS (see
Exercise 8 in Section 2.6.2). According to the definition, S*(A) =+~and
S**(z) =-co. In particular, S**(b) : -=, i.e., problem (4) is inconsistent.
If bEdomS, then problem (2) is consistent and its value is infinite:
S(b) : -=. In case b~domS, we have S(b) : +~, and problem (2) is incon-
sistent. The proof of the alternative is completed.
Let us come back to the subcase 1a) in which, as was proved, solutions
of problems (2) and (4) exist. We denote them by x and ~. respectively.
It is proved that the values of the problems are equal: ex: ~b, i.e., (5)
holds, and therefore
f. (Ax-b)= f.Ax-M1 = ~Ax-ci = (f.A -c) x,
i.e., (6) holds. Further, if x and A are admissible elements (i.e., x ~ 0,
A~ 0, AA ~ c, Ax~ b), then
ex~ A.Ax ~A.b.

Therefore, if ex Ab, then x


and A are solutions of the problems. Further,
if ~(AX- b) : (~A- c)x, then ex: ~b, i.e., x and Aare solutions of the
problems.
Thus, if x and ~ are solutions, then (5) and (6) hold; if x and ~ are
admissible and either (5) or (6) holds, then x and A are solutions of the
problems. The proof of the theorem is completed. •
Exercise. What does the minimax theorem (Corollary 3 in Section 3.3.2)
turn into in the situation under consideration?
3.3.4. Duality Theorem for the Shortest Distance Problem. Hoffman's
Lemma and the Minimax Lemma. Let Y be a normed space, and let BeY be a
nonempty subset of Y. The quantity
S8 (TJ)=p(TJ, B)= infiiY-TJ~ (1)
geB

is called the distanae from the point n to the set B. The investigation
of the function SB(n) is one of the basic problems in the approximation
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 179
theory (see [84]). If B is a convex set, we obtain a convex programming
problem. The duality theorem proved below has many applications in mathe-
matical analysis.
Thus, we shall suppose that B is a convex set. (In what follows it
is fixed, and we drop the subscript B in the notation of the function S.)
Then the function n + S(n) is the S-function of the following problem:
jjzjj~inf; g-Z=TJ, gEB. (2)
Let us reduce (2) to the standard form of a family of convex programming
problems [see ( 3 (cx, n)) in Section 3.3.1]. To this end we put X= Y x Y,
x=(g, z), fo(x)=iiZI!• Ax=z-g, A: X-Y; A ... {x-(y, z)lgEB}i. and then problem (2)
assumes the form
fo(x)-....inf; Ax+rr=O, xEA. (3)
It follows that functionS is convex (Corollary 1 in Section 3.3.2).
For our further aims we need the following important geometrical no-
tion.
Definition. Given a set Bc:::Y, function sB:Y* + i determined by the
equality
sB (!f)= sup <!f, u>
IIEB

is called the support function of the set B.


Duality Theorem for the Shortest Distance Problem. The quantity S(n)
admits of the following dual representation:
S (TJ) = p (TJ, B) =-sup {<y•, u>-sB (!f) ill !f ~ ~ 1}. (4)
Proof. It is obvious that S ( n) ~ 0, Vn. On the other hand, we have
S(n) ~ llyo- nil, where Yo is an arbitrary point belonging to B. Therefore,
S is bounded both from above and from below, and consequently it is contin-
uous throughout the spaceY (Proposition 3 of Section 2.6.2). Now we can
apply the duality theorem proved in Section 3.3.2. Since there are no in-
equality constraints in (2), the Lagrange function has the form
.!l'(x, !f, I) =~zll+<y•, z-y>.
From the formula (11) of Section 3.3.2 we obtain
S (TJ) =sup (<y•, 'I>+ inf (iizll+<!f, z-y>)) =
II' IIEB
&EY
=sup{(<ye, f))-sup<!f, Y.>)-sup((-!f, z>-llzll)}•
I' IIEB IIY

,..
=sup {<!f, 'fJ>-sB (!f)-N•(- y•)},

where N(z) = llzll, and N*(z) = 0 for llz*ll ~ 1 and N*(z) = +oo for llz*ll > 1
(Proposition 3 of Section 2.6.3), whence follows (4). •
For our further aims we need the following generalization of Corollary
2 of Section 2.6.4:
The Lemma on the Conjugate Cone. Let X and Y be Banach spaces, let
A:X + y be a surjective linear operator from X into Y, let x~, •.. ,x~ be
elements of the conjugate space X*, and let
K={xi<xi,x>:s:;;;O, i=l, ... , s, Ax=O}
be a cone in X.
Then every element x~ belonging to the conjugate cone
K"={x"i<x", x>~O, xEK}
180 CHAPTER 3
admits of the representation of the form
s
-x:= ~ 'J. 1xi+A•y•
1=1

for some Ai ~ 0 and y"EY".


s
Proof. A) Let us denote L n
== i=O l(er xi, Z = l(er AIL, and let 1r :Ker !1. + Z
be the natural projection operator transforming x E l(er A into the class
n; (x) E Z containing it. Let us show that dim Z ..; s + 1 • Indeed, let
z~, •.. ,Zs+l be arbitrary elements of Z, Zi = 1T(xi). The homogeneous lin-
ear system of s + 1 equations
s+1 s+1
.~<x;,x1 >'J..1 =(xi, .~t..1 x1 }=0, i=0 1 1, ... , s,
i=O i=O
with s + 2 unknowns possesses a nonzero solution Xo, ... ,As+l· Hence
s+1 s+1 1+1
i=_~;:1,)EI<erxj, t .. o, ... , s.~xEL~O=:n:(~)=~ ~J1t(x1 )=~ ~1 z1 •
!=0 /=0 /=0
Consequently, any s + 2 elements of Z are linearly dependent and d =
dim Z ~ s + 1.
B) Choosing a basis f1, ..• ,fd in Z we establish the standard isomor-
phism between Z and Rd and between Z* and Rd*:
d
Z= ~ ~~~~-(~i• · • .,_ ~.,),
1=1
d
<z•, z>•:E<z",ft>t,~z·~(b;,· ... , ~;i)=(<z",fl>,
1=1
.... , (z•,f.,>).
Further, let us take the functionals z;ez•, i = 0, 1, ..• ,s, determined by
the equalities
<z;, :n: (x)> =<xi, x>
and consider the convex cone

[(=cone {z;, ... , z;} = {z•l 1~1 'J. 1z;, 'J. 1 ~ 0}


in Z*. Since Z* is finite-dimensional and the cone K is finitely gener-
ated, the lemma of Section 3.3.3 implies that K is closed.
Let us suppose that -z~~[(. Then, by the second separation theorem
(Section 2.1.4), there exists a linear functional l E(Z•)• strongly sepa-
rating {-z~} and K:

<1, -z~> >sup {<l, z•> Iz• E K}=sup L~1 'J. 1 (l, zi> I 'J. 1 ~0 }.
Therefore, there must necessarily_be <Z, z!>..; 0 (because, if otherwise,
then sup= +co), O=sup{(l, z•>!z"EK}, and <Z, z*> < 0.
Now we note that Z, like every linear functional on the finite-dimen-
sional space Z*, is specified by a linear form:
d d d
(1, z">= ~~
~ atti= ~ at<z•, ft>=(z~, ~ ad 1 )=<z•, a).
1=1 1=1
d
Choosing a representative x0 E l(er A of the class a=~ aJ 1 EZ=KerA;L, we
see that, on the one hand, 1=1
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 181

and consequently
s
XoEKerAn n {xl<xi. x>~O}=K,
1=1

and, on the other hand,


<x~. x0 ) = <z~. n (x0 ) ) = <z;, a)= <l, z0 ) < 0,
and consequently x~~l(• , which contradicts the condition of the lemma.
Thus, the assumption --z~~K leads to a contradiction, and therefore
s
--z~= ~ t.. 1zi for some "-i ~ 0.
1=1

C) For an arbitrary xEKerA we have


s s
( x~+ ~ t.. 1xi, x )=( z~+ ~ t.. 1zi, n (x) )=0.
1=1 1=1 .
s
Therefore, X~+ ~ t.. 1x: E (Ker A)l. , and the lemma on the kernel of a regular
1=1 s
operator (Section 2.1.7) implies that x~+~t.. 1x;=A•(--y•) for some y"EY*.•
1=1

Hoffman's Lemma. Let the same conditions be fulfilled as in the lemma


on the conjugate cone.
Then for the distance function from a point x to K the inequality

p (x, K) ~ctt <xi, x>+ +[[Ax[[}


1 (5)

* x>+ 1.s
holds, where <xi_, . equal to <xi,
* x> 1."f <xi,
* x> 'r 0 and 1.s
. equal to
zero in all the other cases, the constant C being independent of x.
Proof. The set K is the intersection of a finite collection of half-
spaces and a subspace and is therefore a convex cone in X. Let us compute
its support function sK. Let X* EX*. Here one of the following two cases
takes place: either there is an element x0 El\ such that <x*, x 0 > > 0 or
<x", x> ~ 0, "'x E K. In the former case we have sK (x•);;;;;, t <x•, X0 )
consequently sK(x*) = +00 • In the latter case (-1)x* belongs to the cone
E R+ , and "'t
conjugate to the cone K, and hence, by the foregoing lemma, we have x* =
s
~ t.. 1xi + A*y*, t.. 1 ;;;;;, 0, y• EY* . Thus
i=1
s
if 3t.. 1 2:: 0, y• E Y•: x• = ~ t.. 1xi + A*y,
sl((x")={ 0 i=l
+oo in all the other cases.

Now, applying formula (4) (the duality theorem) to the problem under
consideration, we obtain
s
p(x, K) =sup{<x", x> [x"= ~ t.. 1x;+A•y•, t.. 1 ;;;;;.0, [lx"[[~ 1}. (6)
t=1

The subspace L=lin{x;, ... , x;}+lmA* is the sum of the closed subspace
L1 = Im A* [because Im A* = (Ker A)J.., and an annihilator is always closed]
and the finite-dimensional space Lz {xt, ... ,x;}. Therefore, Lis closed
in X (prove this!) and consequently it is a Banach space. The operator
s
A1 : R•xy•_,.L, Adt.., y)= ~t.. 1 x~+A'y• is linear and continuous, and it maps
i= 1

Rs x Y* onto the Banach space L. By the lemma on the right inverse mapping
(Section 2.1.5), there exists a mapping M1:L + RS x Y* such that A1 oMi=/L,
s
IIHlx*ll :( cllx*ll. Therefore, if llx*ll :( 1, then IIMix"IIR•xY*= ~l"'ii+IIY"II~C.
1=1
182 CHAPTER 3

Hence we can assume that 0 ~ Ai ~ C, lly*ll ~ C in expression (6), whence

S(x)=p(Je, K)=s;;;sup{( tA,xitA•y•, x)lo:s;;;A,:s;;;C,


i= I
lluell~.c}~
~c{.~ <xj, x>+ +II Axil}· •

It should be noted that if the cone K is specified only by equalities:


K={xi<xi, x>=O, Ax=O}
(i.e., it is a subspace), then it can also be specified in terms of in-
equalities:
K={-xl<xi, x>:s;;;O, <(-l)xj, x>:s;;;O, Ax=O}.
Applying Hoffman's lemma, we see that

p(x, K) ~ct~ I<x~. x>I+IIAxll} (5 I)

in this case as well.


The Minimax Lemma. Let X and Y be Banach spaces, let A:X ~ Y be a
surjective operator, and let x;ex•, i=l, •.. , s, a=(ai, ... ,a,). Let us de-
fine the function S:RS x Y ~ R by means of the equality
S (a, y) = inf max (a 1 +<xi, x> ). (7)
Ar+v=O I~~~ •

If max <xi, x>;;;;;.O for any x-EI<erA, then the following duality for-
l<:t<s
mula holds:

S (a, y) =sup { ~ a.1a, + <y•, y> Ia.l;;;;;. 0,


I= I

ta.,=l,
i=l
~a. 1xi=O}.
A•y•+ 1=1 (8)

Moreover, inf in (7) is attained on some element X = x(a, y) (possibly


not unique), and there exist C > 0 and C > 0 independent of a andy such
that
llx(a, u>ll :s;;;c {IS (a, y) I+I al +llull} :s;;;c <I a! +!lull> (9)
for an appropriately chosen x(a, y).
Proof. The existence of the minimum in (7), the convexity of S, and
formula (8) will be proved by the reduction to the two standard problems
considered above.
A) Auxiliary Mappings. We shall use the lemma on the right inverse
mapping (Section 2.1.5). According to the lemma, there exists M:Y-+X
such that AoM=/, I!M(y)JI=s;;;Cdyll. Further, let <p: X-+R 1 be specified by
the equality q>(K)=(<x~, x>, ... ,(X~, x)). Let us denote L=<p(l(erA). Apply-
ing the same lemma once again, we conclude that there exists a mapping
~= L ~ Ker A such that

Further, L, like every subspace of as, can be determined by a linear



system of equations ~ b 11~, =0, l == 1, ••• , p , or with the aid of a mapping
I=! '

~: R1 -+RP, Pl=(~b
/=1
11 11) by means of the condition fH; = 0.
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 183

B) The Boundedness of S(a, b). We have


Ax+y=O .. x=M(-y)+x0 , x8 EK.erA::::>a 1 +<x;, x>=
= a1 +<xi, M(-y)>+<xi, X0). ( 1O)
Therefore, putting xo = 0 we obtain, on the one hand, the inequality
• } del
inf max (
a1 +<x,, x> ~ max {lati+C,jjxill~llli}-K, ( 11)
Ax+li"'O I ~I<• I ~I< 1

and, on the other hand, for Ax+ y = 0, according to (10),


(12)

because, by the hypothesis, max (<x:, X 0 ))~0 (x0 EK.erA). Now (11) and (12)
I<i<.s
imply the estimate

JS(a, y)I~K= max {Ja1 I+C1 jjx;JIIIYII}- <13 )


I .;; i .;; s

C) The Existence of the Minimum. Denoting Gi = ai + <xf, M(-y)> + K


and recalling that
X0 E K.er A .. ( <x~, X0 ) , ••• , <x~. X0 ) ) = fJl (x0 ) E L = fJl (K.er A)= K.er f},
we conclude from (10) and (7) that S(a, y) + K is the value of the problem
( 14)
By virtue of (13), we have S(a, y) + K ~ 0, and therefore the condition
c ~ 0 can simply be added to (14).
Problem (14) can be brought to the standard form of a linear program-
ming problem (2) of Section 3.3.3 in Rs+l. To this end we denote zo = c,
Zi = c - Si- ai, z = (zl,····zs). e = (1, ... ,1), (ih, ... ,as), z = a=
(z 0 , z). The condition 8s = 0 is written in the form
1}6 = o ~IJ (dl-z-a) =cfl6-fJz-fJa=O~ 8i~ b,
where

are a matrix of order 2p x (s + 1) and a 2p-dimensional vector (we have in


fact replaced the equality 8s = 0 by the two inequalities 8s ~ 0 and 8s ~
0).
Therefore, problem (14) is equivalent to the problem
z0 -+inf, Bz~b. i~o. (15)
having the standard form. This problem is consistent [for example, (2K,
2K- ih, ... ,2K- as) is an admissible vector to which 6=0EL corresponds],
and its value S(a, y) + K is finite (and even nonnegative). By the exis-
tence theorem of Section 3.3.3, problem (15) possesses a solution: z =
(zo, z) = (S(a, y) + K, z). Consequently, problem (14) possesses the solu-
tion ~ = (S(a, y) + K)8 - z -a,
and therefore
S(a, y)+K= max {a1 +~ 1 }= max {a 1 +<xi, M(-y)>+~1 }+K
l~i~s l<i<s

and

where, according to (10) and the definition of the mapping ~:L + KerA, we
have
184 CHAPTER 3

(16)
D) Convexity of S and the Duality Theorem. The function (a, y) +
S(a, y) is the S-function of the following convex programming problem:
maX(TJt• ... , TJ,)~inf, -TJ 1 +<xi, x>+a 1 =0, ( 17)
Ax+y=O.
Putting Z = Rs x X, Y = RS x Y, z = (n, x),
fo (z) =max (TJ., ... , TJ 3 ),
Az=(<x~, x)-TJ 1 , ••• , <x;, x~-TJs• Ax),
we reduce (17) to the standard form [ (3(a, n)) in Section 3.3.2]:
{0 (z)~inf, Az+(a, y)=O. (18)
Consequently, function (a, y) + S(a, y) is convex (Corollary 1 of Section
3.3.2). According to (13), it is bounded and therefore continuous through-
out the spaceY (Proposition 3 of Section 2.6.2). By the duality theorem
proved in Section 3.3.2 [we remind the reader that problem (18) involves
no inequality constraints],

S(a, y)=sup{<z•, (a, y)>+inf[<z•, Az>+rnaX(TJ 1 , ••• , 11,)]}. (19)


•• z

However, z•~=;(R•xY)•=R'*xY•, i.e., z* is representable in the form (a, y*),


et=(eti, ... , a,)ER'", y~EY*, and therefore
<z•, (a, y)> =eta+ <y*, y>,
s (20)
<z•, Az>= ~ c.t1 (<xi, x>-TJ;)~<y". Ax>.
i=l

According to Proposition 2 of Section 2.6.3, the Young-Fenchel trans-


form of the function f(n) =max (n 1 , ••• ,ns) has the form

f*(a)=SUp(etTJ-f(TJ))={ 0 for IX;;;;::O, i~c.t_;= l,


~ - (21)
+oo in all the other cases.
Now from (20) and (21) we obtain
inf(max(TJi• ... , TJ,)+<z•, Az>)=
2

=-sup{IXrJ-maX(TJi• ... , 11,)-(.;!:et,x;+A•y•, x)}-

== {
<~ x~or IX/;;;:::: 0, ~IX;"" l, A*y• ~= ~a1xi;;; 0,
~~ i=l (22)
- oo in all the other cases.

Substituting (22) into (19), we obtain (8).


E) A Geometrical Lemma. Let L be a subspace of Rs and let K be a
finitely generated cone in RS. Then there exists N > 0 such that
p(O, (L+a)nK)~Np(O, L+a)~NJaJ (23)
for any a E R' [this formula remains valid in the case (:L +a) nK = 0 as
well if we agree that p(O, 0) = -oo].
Proof. a) Let K = cone {xl, ... ,xm}. Let us represent every vector
as a sum of two components one of which is contained in the subspace L and
the other is orthogonal to L:
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 185

Then
p(O, (L+a)nK)=inf{l~li6EK. (s-a)EL}==

=inf{l,~"-dy1 +z;)IIA.1 ~ 0, ~~ A. (y +z )-b-ceL} ==1 1 1

=inr{j,~A.,(y,+z1),,A.,~o. 1~A.1z 1 = YtEL}~


~inf{ i;A.,
Ii=
max {IY1 i}+ltljA.,;;;:.o,
I<:; i.;;; m
i;A.,z, ...
I= 1
y,eL}. (24)

On the other hand,


p (0, L+a) = inf {I1J+aii1J ELf= inf { !11' +cii1J' EL} =lei ~V lb i"+lcl"= laj.
Since
m
~A. 1 z 1 =c·~cEcone{zt, ... , z,},
i=l

we see that (23) will follow from (24) if


c E cone {zi, ... , zm} ~ 3A. 1 ;;:;::. 0, (25)
m
~J.,~Ndcl
i=l

for some N1 > 0 [and then we shall have N=Ni max {/y1 1} + 1 in (23)].
1<0; i<;; m

b) Let Lf=lin{zi, ... , zm}• dimLi=n. Let us separate out the various
collections of indices I= {i1, ... ,in} for which {zi 1 , ... ,zin} is a basis
in L1. To each of these collections there corresponds a linear mapping
n
AI:Rn + L1 determined by the formula A 1x=~x"z 1 • Since {zi 1 , ... ,zin}
k=l k

is a basis in L1 , there exists the inverse mapping.


By Caratheodory's theorem (Section 2.6.1), every vector cEcone{zi, ... ,
zm} is a conic combination of not more than n linearly independent vectors
Zi!
C=A;,Z;,+···+A.,,z,,, A. 1k>O, s~n.

Completing, if necessary, the set {zi 1 , ••• ,zi 8 } with some vectors Zis+ 1 , ••• ,
Zin• to obtain a basis in L1 and putting Ais+ 1 = ... = Ain 0 we see that
c=A1A., A=(A.1,, ... , Atn), 1. 1 "~0, l=(ii, ... , in)•
Therefore
n
k~l A.;" ~n I A. I ~niiA/ 1 111 cl ~ n mlx {IIAi1 ll} lc/.

This proves (25) for Ni=nm1ax{[/Ai1 /j}.

F) Completion of the Proof of the Minimax Lemma. It only remains to


derive estimate (9). To this end we come back to the left problem in (14).
An element~= (~1,···•~s) is its solution if and only if
(26)
[since S(a, b) + K is the value of the problem, for at least one index i
the equality takes place in (26)]. Recalling the definition of ai in
Section C) and denoting a= (a1, ... ,as), where
(27)
186 CHAPTER 3
we rewrite (26) in the form ~i + ai ~ 0.
Thus, ~ is a solution of (14) *~+aE(L+a)n(-R~). Therefore,
inf{i!+all! isasolutionof (14)}=p(O, (L+a)n(-R~)).
Since R! = cone{-el, •.• ,-es} is a finitely generated cone, the geo-
metrical lemma of Section E) implies that the right-hand member of the
last relation does not exceed Np(O, L +a) ~ Nlal. Consequently,
inf{l~+allf isasolutionof (14)}~Nial,
and since l~l~l~+al+la/, we have
inf{l~ll~ isasolutionof (14)}~(N+l)ial.
If a~ 0, then (N + 1)lal < (N + 2)lal, and among the solutions of
problem (14) there is one (we denote it by ~) for which
I~I~(N+2)/al· (28)
If a = o, then E; = 0 is a solution, and inequality (28)· again holds.
Using formula (16), we find the element X= x(a, y) from~- To esti-
mate this element we take into account the inequalities
IIM(-y)ll ~C~IIull. ll~t(~)//~C~I~l (29)
[cf. the definitions of the right inverse mappings M and ~ in Section
A)] and

(30)

and also estimate (13):


A (16), (28) (28) (80)
y)J/
A A

~x(a, ~ Cr/lyJI+Cti61~C~IJuii+Cs(N+2)/a/~
,;-:- (13}
~Cdl +C. (N +2)r., max JJxi IJ)/IYII +C.(N+2)Ys(/al+IS(a, y)J)~
l<l<s

~C 1 (1 + VS max JlxiJJ2C2 (N +2))JJyJJ+2C8 (N +2)'VSJaj.


l<l;l<~;s

Therefore, inequality (9) with the constants

C = max{Cdl +C.(N +2)VS max 1/xn/, C2 (N+2)VS},


I <l<s
G=max{CiCI+VS max /lxi//2C.(N+2)), 2C1 (N+2)'VS}
l<l<s

holds.

3.4*. Second-Order Necessary Conditions and Sufficient


Conditions for Extremum in Smooth Problems
We shall again begin with the case when there are no inequality con-
straints.
3.4.1. Smooth Problems with Equality Constraints. We shall consider
the problem
f (x) __,. extr; F (x) = 0. (1)
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 187

THEOREM 1 (Second-Order Necessary Conditions). Let X andY be Banach


spaces, let U be an open set in X, and let the function f:U + R and the
mapping F:U + Y possess the second Fr~chet derivatives at a point xEU.
Let ~ (x, y*, 1) = f(x) + <y*, F(x)>.
If x
yields a local minimum (maximum) in problem (1) and if F is regu-
lar at the point x
[i.e., ImF' (x) = Y], then there exists a Lagrange multi-
plier ye EY• such that
!l',.(x.Y..t>=o, (2)
and for any y* possessing this property the relation
!l',.,.(x, Y.. 1)[h, h]~O(~O), VhEKerF'(x) (3)
holds.
Proof. The existence of y* for which equality (2) holds was proved
in Sect ion 3. 2. 2. Further we consider the case Elocmin ( 1) • Let hE x
KerF'(x). According to Lyusternik's theorem (Section 2.3.5), we have hET;elt
where elt= {x\F(x)=O}; i.e., there exists a mapping r{·): [-e, e]-+o4l such
that
F(x+th+r(t))=O, r(t)=o(t), tE{-e, e]. (4)
By virtue of (4), the element i + th + r(t) is an admissible element in
the problem for tE [-e, e], and consequently f(x) ,.;; f(x + th + r(t)) be-
cause xE locmin(1). Therefore,
f(x)~f<x+th+r(t))=.cl'<x+th_+r(t), y., 1)=
=!l'(x, i/•, 1)+~,.(x, u•.
1)(th+r<t>J+
+ 2 9',.,.(x,lf,
I ~ ~ ~ t• ·~ ~
1)[th+r(t), th+r(t)J+o(t 1)=f(x)+ 2 9',.,.(x, ye, 1)[h, h]+o(t1),

whence (3) immediately follows. •


THEOREM 2 (a Sufficient Condition for a Minimum). Let the conditions
of the foregoing theorem be fulfilled and, moreover, let the inequality

(5)
be fulfilled for some a > 0. Then ~ is a point of local minimum in problem
( 1) •
Proof. Without loss of generality, we can assume that f(x) = 0. Let
us denote by B(h 1 , h2) the bilinear form j-9',.,.(x, y•, 1)[h1, h2 ] . By the theo-
rem on mixed derivatives (Section 2.2.5), B is a continuous symmetric bi-
linear form. Let us choose E > 0 such that

1p(e) =a. (1-1!)1 -211Bil (1 +e) e:-UBUe1 -y > o (6)

[this can be done because q> (0) =a./2 > 0].


By the hypothesis, the functions F and 9'(·, 1) possess the second ye,
Frechet derivatives with respect to x at the point x. Using Taylor's for-
mula (Theorem 2 of Section 2.2.5) and taking into account that F(x) = 0,
9(x, ye,
1)=0 and 9,.(x, y\ 1)=0, we find C1 > 0 and o1 > 0 such that the
inequalities
UF (x+h)-F' (~) [h] tl=ll F (x+h)-F (x)-F' (x) [hi!l~Cllth!J•, (7)
t3<x+hJ g., 1)-B(h, h)t~-ftth~·
hold for llhll < lh. Applying the lemma proved in Section 2.1.5, we con-
struct the right inverse mapping M: Y--+X, F'(x)oM=/,JJM(y)j\=::;;;,qyj\ to
188 CHAPTER 3

F'(x). For£> 0 chosen earlier we find o, 0 < o < 01, such that
6CC1 <e. (8)
Now let llhll < o, and let x .,. h be an admissible element in the prob-
lem, i.e., F(x +h) = 0. Let us put h2 = M(F'(x)h) and denote by h 1 the
difference h- hz. Then from the estimate for M(y) and from (7) and (8)
we obtain
1\h,\1 ~Gil F' (x)l(hJI\1 ~CC1\\hl\a < e[\h[\, (9)
F'(x> [h1 J =F'<x> fh- h,]=F'<x> fh]-F'<x>M(F'(x> [hJ) = o.
Hence h1 EKerF'(x). From (9) it follows that (1-e)llh~~llh 1 \l~{l+e)flhll· As
a result, taking into account (6), (5), and (7) we obtain

t <x+h) = .:?(x+k. Y.. 1);;;;. B (h. h) -fit hi\'= B <hl +h•. hl +h.>-].l\h 11•;;;;.
;;;;, B (ht, h~)-211 B 1\\\hi Ill\ h. ij-\1 B \Ill h.\1'-fl\hll';;;;,

;;;;, ( a(l- e)'-211 Bll (l +e) e-IJB[Je'-%) IJhiJ' > 0,


i.e., xElocmin(l). •
3.4.2. Second-Order Necessary Conditions for Smooth Problems with
Equality and Inequality Constraints. We shall consider the problem
{0 (x)->-inf, F(x)=O, [;(x)~O. i=l, ... ,m.
Changing, if necessary, the signs of the functions we can reduce to
(3) any problem of Section 3. 2. The Lagrange function of the problem (3)
has the form
m
3 (x, y•, A., J.0 ) = ~ A.J i (x) + <y•, F (x)>-.
i=O .

THEOREM. Let X andY be Banach spaces, let U be an open set in X, let


the functions fi:U + R, i = 0, 1, ... ,m, and the mapping F:U + Y possess the
second Frechet derivatives at a point xEU, and, moreover, let fi(x) = 0,
i = 1 , ••• ,m.
If x
yields a local minimum in the problem (3), and i f F is regular
at the point x [i.e., Im F' (x) = Y], then
a) the collection D of the Lagrange multipliers (y•, A.,.A. 0 ), y*EY*, f.ERm•
such that
m
A.0 ;;;;, 0, A.;;;;,. 0, ~ A.;= l,
i=O ( 1)
2 x (x, y•, A., A.0 ) = o
is a nonempty convex compact set in Y* x Rm* x R;
b) for any ho belonging to the subspace
L={hl<fi(x), h>=O, i;;;;,O, F'(x)[h]=O}, (2)

there are Lagrange multipliers (y• (h 0), A(h 0 ), A0 (h 0 )) ED such that·


2 xx (x, y* (h 0 ), A. (h 0 ), A0 (h 0 ))[h 0 , h0 ];;;;, 0. (3)
For the sake of brevity, we shall denote F'(x) =A, f'(x)
x!·
A) Let us consider the mapping rp: a~X· of the si~lex 0'={(/1;,
A. 0 ) \A;;;;;, 0, i~ A;= l} determined by the equality rp (A., A.0 ) = i~ A.ix{ • Then
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 189

m
(y*, /.., f.. 0 )ED~ ~f..;xi+y•oA=
i=O

=qJ(f.., f.. 0 )+A*y*=0=>qJ(f.., f.. 0 )ElmA•=;:.(f.., P. 0 )Ea1 =.fp- 1 (ImA•).


By the hypothesis, ImA = Y, and consequently (Section 2.1.7) ImA*
Ker A) 1, and therefore Im fl * is closed. Consequently, the set 01 is closed
subset of a compact set and hence it is itself a compact set.
Now we note that KerA* ~ {0}. Indeed,
h*EKerA•=;:.<A*h*, x>=O, Vx~<h*, Ax>=O, Vx=>h*E(lmA).L=;>h*=O.
Since the closed subspace ImA* of the Banach space is itself a Banach
space, the Banach theorem implies that there exists the inverse mapping
r:ImA* + Y* to A*. Now we have
(y•, /.., /..0 ) ED~(/.., /.. 0 ) E a;,
qJ (f.., /.. 0 ) + A*y• = 0 ~ (f.., f..o.) E a;,
y• =- fqJ (/.., /.. 0 ).
Consequently, D is the image of the compact set 0 1 under the continuous
mapping (/.., /.. 0 ) - ( - fqJ (/.., /.. 0 ), A;, A0 ) and hence it is itself a compact set.
From the Lagrange principle proved in Section 3.2 it follows that D
is nonempty. The convexity of D is an obvious consequence of conditions
(1), and it can also be derived from the convexity of 0 and ImA* if we
take into account that the mappings considered above are linear. This
completes the proof of the assertion a).
B) As in Section 3.3.1, we replace the given problem with inequality
constraints by the corresponding problem with equality constraints:
f(x)=max {fo(x)-f 0 (X),f 1 (x), ... , f m(x)}--+ inf; F(x)=O.
If XE locmin 3' then XE locmin a' . Indeed'
x~ locmin3' :=;:. Ve > 0, 3x8 : jjx.-xii < e,
F (x.) = 0, f (xe) < 0:=;:. fo (x,) < fo (x), {; (x,) < 0,
i= !, ... , m, F(x,)=O=;:.x~locmin3.
Further we investigate the problem (a').
C) Let h0 EL.. Since D is compact, there are cY*, ~. ~ 0 ) such that
'¥ (h 0 ) =2 xx ()c, y•, ~. ~olfho, ho] = max
(y', ~. ~o)ED
2 xx (x, !!'• f.., /.. 0 ) [h 0 , h0 ) =

=max t~ f..Ji (x)[h 0 , h0 ] + <y•, F" (x)[h 0 , ho]> I f..;)': 0,

i~f..i=l, i~f..ixi+A•y•=o}.
The assertion b) is equivalent to the inequality ~(ho) ~ 0.
Let us suppose that ~(ho) < 0. Denoting
f.,2 ~
a; (f..)= 2 f'i (x) [h0 , h0 ], i = 0, .... m, (4)
f.,2 ~
y (/..) =2 F" (x)[h 0 , h,],

we consider the problem


max (ai(f..)+<xi, x>)-inf; Ax+y(f..)=O. (S)
O<i<.m

Let us verify that the minimax lemma of Section 3.3.4 is applicable


to problem (5). Indeed,
190 CHAPTER 3

max <xi, x>;;;a.O, YxEI(erA,


O<;;i<;;m

is a necessary condition for x


to yield a minimum in the problem (3) (Lem-
ma 1 of Section 3.2.4). Moreover, by the hypothesis, operator A is sur-
jective.
By the minimax lemma, there is an element xo(A) possessing the prop-
erties
max (a;(A.)+<xi, x0 (A.)>)=S(a(A.), y(A.)), (6)
O<;;i<;;m

where S(a(A), y(A)) is the value of problem (5) and


llxo (A.) II ~ci {max I a, (A.) I +IIY (A.) II}~ O.•. (7)
Here, according to formula (8) of Section 3.3.4,

S (a (A.), y (A.))=¥ max{.~ h;fi (x) [ho, h0 ] + <y•, 1"' (x)[h 0 , ho]> A.; ;;;a. 0, I (8)
m m . } ~
~~A;= 1, ~~ A.;xi+A-y•= 0 =2 'l"(ho)·
Therefore, (7) implies that IIAho + xo(A)II = O(h). By Taylor's formula
(Section 2.2.5),
F (x+xo (A.) +Mo)= F (x) +F' (x)[xo (A.)J +fF" (x)[M 0 +xo (A.), l..h 0 +xo (A.)J +o (!..')=

(we remind the reader that

and that, by virtue of (5),


F' (x) [xo (A.)] + (1.. 2/2) F" (x)[h 0 , ho] = 0.)
When proving Lyusternik's theorem in Section 2.3.5, we constructed
a mapping <p: U-+ X of a neighborhood U;) such that x
F(x+~p(x)):eO and ll~p(x)II~KIIF(x)ll.·
Putting r(i..)=<p(x+x0 (A)+A.h 0 ) , we write
F (X+ X0 (A.)+ i..h 0 + ((A.))= 0,
llr (i..) II ~KII F (x+xo (A.) +A.h0 }11=o (A. 1 ).
Now applying Taylor's formula to fi and using (6) and (8) we oboain
f (x+xo (A.) +Mo + r (A.))= maxj(f 0 (x+xo (A.) +Mo +r (1..))-fo (x),
ft<x+x 0 (A.)+A..ho+r(1..)), i= I. ... , ml=
= max (<xi,
O<;; i<O;m
X0 (A.)>+~ fi (x)[h0 , h0 ] +o (1.. 1 )) ~
~ max (<xi, x0 (1..)>+~fi(x)[h 0 , hoJ)+o(1.. 2 )=~22 '1"(h0 )+o(A. 1)<0
O<i<m

for small A2 • Hence 'l"(h 0)<0~ x~locmin3'~ x~!ocmin31· We have arrived


at a contradiction, which proves the theorem. •
3.4.3. Sufficient Conditions for an Extremum for Smooth Problems with
Equality and Inequality Constraints. As in the foregoing section, we
shall investigate the problem
fo (x)-+ inf; F (x) = 0, f;(x}~O. (3)
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 191

THEOREM. Let X and Y be Banach spaces, let U be an open 'set in X,


let xE U,
let the functions fi :U + R, i = 0, 1, .•• ,m, and the mapping F:
U + Y possess the second Frechet derivatives at a point x [which is admis-
sible in(~)], and let fi(X) = 0, i = 1, ... ,m.
Let us suppose that there exist Lagrange multipliers ~ERm•, y*EY* and
a number a > 0 such that ~i > 0,
m
2,.(x, :Y·. 1. 1)=t~<x>+ ~~dl<x>+F''6:Hu"l
i=l
=o (1)

and
!t',.,.(x, y•, A, 1)[h, hJ;;;::.2allhll2 (2)
for any h belonging to the subspace
L={hl<fl(x), h>=O, i;;;;;.1, F'(x) [h] O}. (3)
Then x yields a local minimum in the problem (3).
Proof. A) Let x + h be an admissible element, i.e.,
f,(x+h):s;;,O, i;;;::.1, F(.~+h)=O. (4)
We shall estimate the quantity fo(x +h) in two different ways. We have
m
fo(x+h)=fo (x+h)+ ~ ~ddx+h)+
i=l

u·. 1. 1)-~~J;(x+h).
m m
+<u·. F<x+h>>-~~J,<x+h)=2<x+h, <5>
1=1 i=l

The first estimation method is based on the direct expansion of


fo(x +h):
m
to<x+h)=2(x+h, u·. l, 1)-~X;t,(x+h)=!t'(x,
i=l
:Y·. ~. 1)+2,.(x, :Y·. ~. 1)[hJ-
m m m
-l:l,ft(x)-~<1..1f;(x), h>+rl(h)=fo(x)-~<l;t;(x), h>+rdh), (6)
i=l i=l i=l

where the remainder term r1(h) is O(llhll 2).


The second method is based on the fact that, by virtue of the inequal-
m
ities 1. 1 >0, f 1 (x+h)~O, we have ~~if 1 (x+h):s;;,O, and consequently
1=1

to<x+h);;;;;..2'(x+h • .Y·. ~. I)=.2'(x, .Y·. ~. I)+·.2',.(x, y., l, 1)[hJ+

+{2,.,.(x, y•, ~. 1)[h, h]+r 2 (h)=fo(x)+B(h, h)+r.(h), (7)


.
where B(h, h) denotes the quadrat1c form 2I 2,.,.(x,
• ~ ~ ~
y•, A., 1)[h, h], and for the
remainder term we have r2(h) = o(llhll 2).
B) Let us denote by K the cone consisting of those h for which <fi(x),
h> <; o, i ~ 1, F'(x)h = o.
By virtue of Hoffman's lemma, an arbitrary h can be represented as a
sum h 1 + h2, where h1 EK and h2 satisfies the inequality

(8)

Further, making use of the remark following Hoffman's lemma (Section


3.3.4) and of formula (5') of Section 3.3.4 we can al$o represent h 1 as a
sum h1 ~ h{ + h~, where h~EL and h~ satisfies the inequality

(9)
192 CHAPTER 3

Here we have used the fact that F'(~)[h1l = 0 (because h1 EK) and that
I <fi (~:), h1> I= I <fi (x), h~ +h~> I= I <fi (x), h~> I=
= - <fi (X), h;> (because <fi (x), h].> = 0 and <fi (x), h1 >::s;; 0).

C) From (4) we obtain


O=F(x+h)-F(x)= (.x) h+r. (h), r
(10)
o;:::tj<x+h)=<fi(x),h>+p;(h), i=I, ... ,m,
where llro(h)ll and lpi(h)l are on the order of O(llhll 2 ). Substituting these
estimates into (8), we find that if i +his an admissible element, then

/1 h,I/~Cl {i~ IPi (h) I+ltro (h) 11}. ( 11)

Now we fix a number e1 E(O, 1], whose magnitude will be specified later
and choose I) E (0, I B 11- 1 ) such that the inequality llhll :\'; o implies the in-
equalities
m '
~IPi(h)l+ ~llr1 (h)l/~elllhll, Jlr,(h)/I~TIIhJI'. (12)
i=l j=O

Further we choose a number A > 0 such that


(13)

and, finally, we choose £1 so that the inequalities


e< 1, a(l-e)'-2JIBJie(l+e)-I/BJie•-a;2;:::0 (14)
hold for £ = CC1 + A)£1.
D) Completion of the Proof. Let + h be an admissible element. Let x
us represent h in the form of a sum as it was done in B): h =hi+ h~ + h 2 •
Here the following two cases are possible: a) llh~ll > A£ 1 1ihll and b) llh~ll :\';
A£ 111hll. If the case a) takes place, relation (9) implies

Aetl!hl!<llh;II<C,( - 1~1 <ti(\:), h~>) (15)


for llhll :\'; o.
Then, by virtue of (6), (15), (11), (12), and (13), we obtain
~ (6) ~ ~ ~ ~ (15), (II), (12)
f 0 (x+h)= fo(x)-.~ <f..;{[(x), h;+h,>+r1 (h) ;;;;;.
t= 1

Now let the case b) take place. Then llh~ll :\'; A£ 1 11hll, and hence if we
denote h~ = h~ + hz, relations (11) and (12) imply the inequality llh~ll =
llh~ + hzll :;; (A + C1)E1IIhll = Elihll. Then h = hr + hL where (1 - £) llhll :;;
llhiii :\'; (1 + E)llhll. Now applying these inequalities and also (7), (2), and
( 12), we obtain
~ (7) ~ ~ .
f 0 (x+h)>f 0 (x)+B(h, h)+r,(h)=f 0 (x)+B(h~+h;, h~+h;)+r.(h)=
(2), (12)
= fo (x) + B (hi, hi)+ 28 (hi, h~) + B (h~, h;) + r, (h) ;:.:.
~

;;;::: fo (x) + (a(l-e) -2iiBiie(l +e)-IIB[Ie~~-%) llh 1 > fo(x).


11 2

The theorem is proved.


LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 193

3.5. Application of the Theory to Algebra and


Mathematical Analysis
In this section we gather several examples in which the solution of an
extremal problem is the key to the proof of the theoretical result. The
fundamental theorem of algebra, the theorem on the orthogonal complement,
and Hilbert's theorem belong to the most important theorems of the uni-
versity course of mathematics. Sylvester's theorem and the Gram deter-
minants are popular tools used by mathematicians. As to Section 3.5.5,
it contains a more exquisite material which is included because of the
great importance of the finiteness of the index of a quadratic form in the
classical calculus of variations and the Morse theory.
3.5.1. Fundamental Theorem of Algebra
THEOREM. Every polynomial with complex coefficients of degree not
less than unity has at least one complex root.
Proof. Let p(z) = ao + a 1 z + ••• + anzn be a polynomial of degree
n > 1, where an~ 0. Let us consider the elementary problem
f(z) = lp(zW-" inf.
LEMMA. There exists a solution of the problem (3).
Indeed,
f(z) = lp(zW =I kt akzk r;;. (lanllzl"- I lakllzlk r
= lanl 2 lzl 2 " (1 + 0(1/lzl))--" 00
for z + oo, It follows that all the level sets of f are compact, and now
th~ assertion of the lemma follows from the continuity of the function f
and the corollary of Weierstrass' theorem (Section 3.1.7).
Let z
be a solution of the problem (3). Without loss of generality,
we can assume that z
= 0 [if otherwise, we would consider the polynomial
g(z) = p(z - z)]. Thus,
VzEC.

If a 0 = 0, then the point zero is a root of the polynomial, and the theorem
has been proved. Let ao ~ O, and let s be the index such that a 1 = ••• =
as- 1 = 0, as ~ 0. Now we fix ~ = ei8 and consider the function of one
variable <p (t) = f(t~). By the hypothesis, the point zero yields a minimum
for this function. We have

dk 'P 1•~o
'P Ckl(O) = dtk = !!!:_
dtk [( ao + a, t'e'' + O(t'+'))(ii o+ ii, t'e-" + 0(1'+ ))]1 t~o
8 8 1

0, k = 1, ... , s -1,
{
= (2s!)Re(ii 0 a,e" 8 )=(2s)!laolla,lcos(sll+y), k=s.

Since s > 1, function 8 +cos (se + y) assumes both positive and nega-
tive values. Hence, by virtue of the lemma of Section 3.1.1, function ~
has no minimum at zero if~ is chosen so that cp(s)(o) < 0. This contra-
diction shows that ao = 0. •
The first attempts to state the fundamental theorem of algebra were
made by A. Girard (in 1629) and R. Descartes (in 1637). R. Descartes wrote:
"You must know that every equation may have as many roots as its dimension.
However, it sometimes happens that some of these roots are false or less
than nothing." The first proof of the theorem is known to have been given
by K. F. Gauss (in 1799). The idea of the proof presented above was in
fact suggested in the works by J, D'Alembert (in 1746-1748).
194 CHAPTER 3

3.5.2. Sylvester's Theorem. As we have already seen several times in


the present chapter, second-order conditions are connected, in the case of
a minimum, with the nonnegativity (a necessary condition) or the positivity
(a sufficient condition) of a quadratic form. In the finite-dimensional
case the positivity of a quadratic form is established with the aid of the
well-known Sylvester theorem. Here we shall show that the theorem itself
can be derived from Fermat's theorem.
We remind the reader that a symmetric matrix Am (aij); i, j = 1, ••• ,
m, aij = aji• is said to be positive-definite if the quadratic form xTAx =
£ aiix,xi=Om(x)
~j-1
corresponding to it is positive for any nonzero vector x E

Rm. The determinants det Ak, where Ak = (aij), i, j = 1, ••• ,k, 1 ~ k ~ m,


are called the principal minors of the matrix Am·
Sylvester's Theorem. For a matrix to be positive-definite it is nec-
essary and sufficient that its principal minors be positive.
Proof, A) For first-order matrices the assertion of the theorem is
trivial. We shall suppose that the theorem is true for the matrices of
order n- 1 (n ~ 2) and then prove it for the matrices of the n-th order.
Let An be a matrix of order n, and let An- 1 be its submatrix (of order
n - 1) obtained from the matrix An by deleting the last row and the last
column of the latter.
For an arbitrary F; = (F,;lt••••F,;m- 1 ) E Rm-l we denote~= (F;l,••••F,;m-l•
0) E Rm. Then Qn- 1 (F;) = Qn(~), and if An is positive-definite, the matrix
An- 1 is also positive-definite and consequently det Ak > 0; k = 1, ••• ,n- 1.
Thus, we have to prove the following assertion:·
LEMMA 1. Let detAk > 0, k = l, ... ,n- 1. The matrix An is positive-
definite if and only if detAn > 0.
B) Let x = (xl, •• .,xn) E Rn, where Xn ;t 0. Let us put
Y= (yh ... , Yn-1) = (x 1/ x., ... , x._tf x.) E R"- 1, a= (a 1., ... , a.- 1•• ).
It is clear that
Q.(x) = xrA.x = (yrA.- 1 y+2aTy+a•• )x~. (1)
Let us consider the following elementary extremal problem:
J(y) = yrA.- 1y+2a Ty+a•• -+inf. (2)
LEMMA 2. Under the conditions of Lemma 1, problem (2) has a solution.
Proof of Lemma 2. By assumption, det Ak > 0, k = 1, ••• ,n - 1. Since
the determinant of a matrix is continuous with respect to the elements of
the matrix, for the principal minors of the matrix An- 1 (e) = An- 1 - ei
(where I is the unit matrix) the same inequalities detAk(e) > 0, k =
1, •.• ,n- 1, are retained when e > 0 is sufficiently small. By the in-
duction hypothesis, it follows that the matrix An- 1 (e) is positive-definite
and the inequality
On-l(y) = yTAn-IY = YT(A.-t- el)y+ EIYI 2 "" EIYI 2
holds, whence, taking into account the Cauchy-Bunyakovskii inequality, we
obtain
lf(y)l = IY rA.-ty+2a ry+ a•• I"" eiYI 2 -2IaiiYI-Ia•• l-+ oo
for lyl ~ ~. It remains to refer to the corollary of Weierstrass' theorem
(Section 3.1.7).
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 195

C) Proof of Lemma 1. Let y be a solution of the problem (2). Apply-


ing Fermat's theorem, we obtain

whence
f(y) =a•• - aTA;;~ 1 a.
Further, expanding detAn in the minors of its last column and then
expanding all the minors except one (in the resultant formula) in the mi-
nors of their last rows we arrive at the equality
n-1
det A •• = a•• det A._,- L aika.,ak•
i,k=l

det A._,{a•• - _":[' d aA,k a.iakn} = det A._,{ a•• - aTA;;~ 1 a} = det A._,f(y).
=
•.k-1 et n-1

Here aik is the algebraic adjunct of the element aik of the matrix An- 1
(by virtue of the symmetry of the matrix, aik = aki), and we have used the
well-known formula for the elements of the inverse matrix.
According to (1) and (2), we have Qn(x) = f(y)xA for xn ~ 0, and, in
particular, f(y) = Qn(x), where x
= (y1•···•1) ~ 0. Consequently, if An
is positive-definite, then f(y) = Qn(x) > 0, and, according to (3), detAn
> 0. Conversely, if det An > 0, then f(y) > 0 and Qn(x) = f(¥)x~ ~
f(y)x~ > 0 for Xn ~ 0. If xn = O, then x = (~1•···•~n- 1 , 0) =~.and, by
the induction hypothesis, Qn(x) = Qn- 1 (~) > 0 for x ~ 0. This proves Lemma
1 together with Sylvester's theorem.
Exercise. Give an example in which detAk ~ 0, k = 1, .•• ,n, but
Qn(x) < 0 for some x +
0.
3.5.3. Distance from a Point to a Subspace. Theorem on the Orthogo-
nal Complement. Gram Determinants. Let X be a real Hilbert space with a
scalar product (·I·), and let L be a subspace of X. An element xis said
to be orthogonal to L if (xly) = 0 for any y E L. The collection of all
the vectors orthogonal to the given subspace L forms a subspace which is
called the orthogonal corrrplement to the subspace L and is denoted by L...1.. In
the foundation of the theory of Hilbert spaces lies the following theorem:
Theorem on the Orthogonal Complement. Let L be a closed subspace of
a Hilbert space X. Then X is representable as the orthogonal direct de-
composition L ffi LL; in other words, for any i: E X there exist uniquely
determined elements y E L and z
E LL such that = y + x z.
The proof of this theorem is based on the consideration of the problem
on the shortest distance from a point x
to an affine manifold L:
yeL, (1)
which was already studied in Section 3.3.4. Here using specific features
of Hilbert spaces, we can obtain a more extensive result.
LEMMA. If L is closed, then there exists a solution of the problem
( 1).
Proof. A) First we shall solve this problem for a straight line l
{xlx = ~ + tn}, n ~ o. Let ~(t) = (x-
~- tnlx- ~- tn). Function
t + ~(t) is quadratic and has a single minimum because the coefficient in
t 2 is positive. By Fermat's theorem, t E absmin<Jl ~ <p'(t) = 0 ~(nix­
~ - tn) = 0. It readily follows that
(2)
196 CHAPTER 3

and that x- z
is orthogonal to n. The point z is the foot of the perpen-
dicular dropped from x to l.
B) Now we shall prove the lenuna for the general case. If i E L, then
the lemma has been proved. If x •(
L, then, by virtue of the closedness
of L, the value p of problem (1) is positive. Let {yn} be a minimizing
sequence of problem (1), i.e., Yn E Land llx- Ynll->- p. Without loss of
generality, we can assume that the sequence {lix- Ynll} is monotone. Let
us show that it is fundamental. Let 0 < (llx- Yn11 2 - p 2) 1 12 < E. Let us
draw a straight line l through the point Yn and Yn+k• k ~ 1, and drop, as
above, a perpendicular with the foot z from x to l. The farther the foot
of an inclined line lies from the foot of the perpendicular, the greater
the length of the former is, and therefore
I .X- Yn II"" llx- Yn+kii=? liz- Yn II> liz- Yn+kll
==?llYn-Yn+kll"" liz- Yn+k II+ II.Yn- Ell ""2ilz- Yn I = 2(Jix- Yn 1 2 -liz- xll 2 ) 112 < 2e,
whence it follows that {yn} is fundamental.
Since X is complete and L is closed, the sequence {yn} is convergent
to an element y. By virtue of the continuity of the norm, llx-.YII=Iim llx-
Ynll = P·
Now the theorem can be proved quite simply. If y is a solution of
problem (1), then the function tp(t) = llx- y- tzll 2 has a minimum at zero
for any vector z E L. Applying Fermat's theorem once again [ fi?'(O) = 0],
we readily conclude that (x- yl z) = O, i.e., x- y E L.L. If there exist
two representations x = Yl + z1 = Y2 + z2, Yi E L, Zi E L.L; i = 1, 2,
then Yl- Y2 = Z2- Zl ~ (yl- yzlyl- Y2) = 0 ~ Yl = Y2, Z1 = Z2• •
Remark. Of course, following what was said in Section 3.1.1, we
could have proved the existence of the solution of problem (1) using the
lower semicontinuity (in the weak topology) of the function x->- N(x) = llxll
and the weak compactness of the level set {xiN(x) < 1}, i.e., of the unit
ball in the space X. However, it should be taken into account that the
fact of the weak compactness is based on the isometric isomorphism between
X and X*, and this isomorphism is proved with the aid of the theorem on
the orthogonal complement. (For the theorem on the weak compactness of
the unit ball in the space conjugate to a separable normed space, see [KF,
p. 202]. The function N is convex and continuous, and therefore it is
closed. By Minkowski's theorem (Section 2.6.2), this function is the
upper bound of affine functions which are continuous in the weak topology,
and therefore it is lower semicontinuous in the same topology.)
Now let us find the distance from a point to a hyperplane. Let a
hyperplane H be specified by an equation H = {yl(yle) =a}, llell = 1. The
square of the distance from a point X to the hyperplane H is the value of
the smooth problem llx- yll 2 = (x-
ylx- y) ->- inf; (yle) =a, which pos-
sesses a solution y
according to the lemma proved above (it is clear that
IIi- yll and llx- yll 2 attain their minima simultaneously).
Applying the Lagrange multiplier rule and omitting trivial details,
we obtain
.x =!(.X- yJ.X- y) +A (yJe), !t'y = O=?y = x+Ae,
A=a-(xje), p(x, H)= I .X- .YII = IAel = Ia- Cxle)J.
Finally, we shall present the well-known explicit formula for the dis-
tance from a point xo of a Hilbert space X to a subspace Ln, which is the
linear hull of n linearly independent vectors x 1 , ... ,xn· The square of
this distance is the value of the following finite-dimensional elementary
problem:
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 197

(4)

The existence and the uniqueness of the solution are obvious here, and it
is also possible to refer to the theorem on the orthogonal complement.
However, writing f(a) in the form
n n
f(a)= L (x;lx1 )a,a1 +2 L (x0 lx,)a;+(x0 ,x0 ) ,
1,]=1 1=1

we see that problem (4) coincides with the problem considered in Lemma
of the foregoing section if the matrix An is replaced by the matrix
((xdxJii,j=O, 1, ... , n) (5)
and the matrix An- 1 is replaced by the minor of matrix (5) obtained by de-
leting the first row and the first column. A matrix of form (5) is called
the Gram matrix of the system of vectors {xo, Xlo···•xn}; the determinant
of this matrix is called the Gram determinant (the Gramian) of the system
of vectors and is denoted by G(xo, x1, ... ,xu). If & is the solution of the
problem (4), then, according to formula (3) of Section 3.5.2,
G(x0, X~o ••• , xn) = f(a)G(x., ... , Xn).
Thus,
( 6)
In conclusion, we recommend the reader to solve the infinite-dimen-
sional analogue of the problem of Apollonius mentioned in Chapter 1.
Exercise. Let A:X + X be a compact continuous linear operator ~n a
Hilbert space X. Find the distance from a point to the ellipsoid x
IIYII,;; 1}.
3.5.4. Reduction of a Quadratic Form to Its Principal Axes. As is
known, every quadratic form Q(x) in a finite-dimensional Euclidean space
can be brought to a diagonal form Q(x)=Q(Uy)=IA,y7 with the aid of an.
orthogona~ transformation of coordinates x = Uy. The vectors xCi) = Ue(~)
(where e<~) are the elements of the standard orthonormal basis in the co-
ordinates y) are directed along the "principal axes," i.e., they are sta-
tionary points of the function Q on the sphere llxll 2 = 1.
Without exaggeration we can say that the infinite-dimensional general-
ization of this fact established by D. Hilbert stimulated first the crea-
tion of the theory of Hilbert space and then the development of functional
analysis as a whole.
We shall remind the reader of some necessary notions. Let X be a
separable real Hilbert space with a scalar product (·I·). A sequence xu
is said to be weakly convergent to x if (ylxn) + (ylx) for any y E X. A
continuous linear operator A:X + X is said to be self-adjoint if (xiAy) =
(Axly) for any x, y E X. The quadratic form Q(x) = (Ax [x) generated by
the operator A is said to be weakly continuous if x"~x~Q(xn)-'>Q(x).
weakly

Finally, an operator A is said to be compact if it transforms every bounded


set into a relatively compact set. In the case under consideration this is
equivalent to the fact that x"~x~IIAxn-Axll-'>0 (D. Hilbert himself used
weakly

the definition based on the last fact).


Hilbert's Theorem. For a quadratic form Q(x) = (Axlx), where A is a
self-adjoint continuous linear operator in a separable Hilbert space X, to
be weakly continuous, it is necessary and sufficient that the operator A
be compact.
198 CHAPTER 3

Moreover, there exists an orthonormal basis en in the space X consist-


ing of eigenvectors of the operator A and
00

Q(x) = L An(xien) 2 •
n=l

It is evident that the assertion about the finite-dimensional qua-


dratic forms stated at the beginning of the present section follows im-
mediately from this theorem since every continuous linear operator in a
finite-dimensional space is compact.
Proof. "Sufficient." Let A be a compact operator, and let {xn} be a
sequence weakly convergent to x. Then, by the Banach-Steinhaus theorem
[KF, p. 194], the sequence is strongly bounded, i.e., there exists a number
C > 0 such that llxnll ,;;;;; C, Vn. I t follows that llxll ,;;;;; C. By virtue of the
compactness of the operator, IIAxn- Axil + 0. As a result,
IQ(xn)- Q(x)j = j(Axnlxn)- (Axjx)l ~ i(Axnlxn)- (Axnjx)j

+j(Axnlx)-(Axjx)j ~ I:Axn- Axllllxnll + IIAxn- Axllllxll ~2CIIAxn- Axjl-->0


for n + oo,

B) Construction of the Orthonormal Basis. Let the form Q be weakly


continuous. Let us consider the following extremal problem:
i(Axlx )I-> sup, (xjx)~l, XEL, (1)
where L is a closed subspace.
LEMMA. There exists a solution of problem (1).
Indeed, as was already mentioned in the foregoing section the Hil-
bert space X is isometrically isomorphic to its conjugate space, and there-
fore its unit ball {xl(xlx),;;;;; 1} is weakly compact.
On the other hand, the decomposition X= L E9L~ proved in the fore-
going section implies the equality L = {xi (xly) = 0, y E L~} = (L~)~,
and if Xn E L is a sequence weakly convergent to x, then (xjy) =lim (xniY) = 0,
VyEL\ and hence x E L. Consequently, L is weakly closed, and therefore
the set of the admissible elements of problem (1) is weakly compact. By
assumption, (Axlx) is weakly continuous. Now the existence of the solution
follows from Weierstrass' theorem (Section 3.1.7). •
Let h be the value of problem ( 1) with L = X. First let us suppose
that ~ 1 = 0. Then (Axlx) = O, and hence any vector yields an extremum in
the elementary problem (Axlx) + inf. Applying Fermat's theorem to this
problem, we conclude that Ax = O, i.e., any vector x is an eigenvector be-
longing to the eigenvalue zero, and Q(x) = 0. In this case the theorem
has been proved.
Now let d1 'f 0, and let e 1 be a solution of problem (1) with L =X.
For definiteness, let (Aellel) < 0; then e1 7 0 is a solution of the
problem
(Axlx)-->inf, (xlx)~l. (2)
The Lagrange function of this problem has the form
I£(x, A~o Ao) = A0 (Axlx)+ A1 ((xjx)- 1 ),
and, according to the Lagrange multiplier rule, there exist Xo ~ 0, ~ 1 ~ 0,
not vanishing simultaneously, such that

I£x( e~o A~o A0 ) = 2A0 Ae 1 + 2A 1 e = 0.


1
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 199

It can easily be seen that neither \o nor ~1 can be equal to zero.


By the condition of complementary slackness, lle1ll = 1; denoting A. 1 = -~ 1 /~o
we obtain he1 = A.1e1, /..1 = (he1le1), and, as a result,
(3)
Now let us suppose that n orthonormal vectors e 1 , ... ,en (which are
eigenvectors of the operator h belonging to eigenvalues A. 1 , ... ,A.n) have
already been constructed. Let us consider the following extremal problem:
I(AxJx)l ~sup, (xJx) = 1, (xJe,) = 0, i = 1, ... , n. (4)
By the lemma, there exists a solution of this problem.
We shall denote by nn+ 1 the value of problem (4). Again, if 3n+l = 0,
then hx = 0 for all x E [lin{el,••••en}J.l. In this case Hilbert's theorem
has already been proved:
n n
Ax= L A;(xJe,)e,, Q(x) = L A,(xJeY.
i=l i=l

The first of these equalities obviously implies the compactness of h.


Let 3n+1"' 0, and let en+l be a solution of problem (4); A.n+l = (hen+ll
en+ 1), 1A.n+ 1 1 = 3n+ 1 . Then en+ 1 is a solution of the problem
(AxJx) (5)
---~sup; (xJx)~ 1, (xJe,) = 0, i = 1, ... , n,
An+l

and, by the Lagrange multiplier rule, there are ~o, ~l•···•~n+l such that
~o ~ 0, ~n+l ~ 0, and
n
AoAen+I +An+! en+!+ L A,e, = 0, (6)
i=l

Multiplying scalarly, in succession, the first of these equalities by


e 1 , ... ,en and using the orthonormality of the vectors ei and their ortho-
gonality to en+l• we obtain
i = 1, ... , n,
whence
A,= -Ao(Aen+tie,) = -Ao( en+tiAe;) = -AoA,( en+tie,) = o
fori= 1, .•• ,n. It follows that ~o "'0, ~n+l "'0 in (6) and llen+lll 1.
Let us denote A.n+l = -~n+ 1 /~o. Then
(7)
Thus, the inductive construction can be continued. As a result, either
ON = 0 at a finite step or there is a sequence {en};=l of orthonormal
eigenvectors of the operator h. Moreover, this construction immediately
implies that IA.1l ~ IA.2l > ... ~ I Ani ~ ..•. Indeed, when we pass from n
ton+ 1 in (4), the set of the admissible elements narrows, and therefore
3n= 1A.n+ 1 1 ~ on+1 = 1A.n+ 1 1. The sequence {en}~=l• llenll = 1, is weakly
convergent to zero, and therefore 1\nl = I(Aenlen) + 0 because the form
(Axlx) is weakly continuous. Let us consider the orthogonal complement L.L
to the subspace L spanned by the vectors {en};=l and the problem
J(Axlxll~sup; (xJx)=1, xELJ.. (8)
By the lemma, there exists a solution of problem (8). For the value
of the problem we have 3oo ~min 3n = 0, i.e., 3oo is equal to zero. Con-
sequently, hx = 0 on L.L. Taking a basis fl, •.. ,fs•··· in the space L.L we
form the basis consisting of the vectors {ei}'i=l and {fj}. Moreover, if
00 oo n
x =L (xlek)ek + L (xJjj)jj, then Ax= L Ak(xlek)ek. Let us put A.x = L Ak(xlek)ek. Then
k=l k=l k=I
200 CHAPTER 3

each of the operators An is compact and


/IA.-A//=sup {/IA.x-Ax/1//lx/1.;;1}.;; max /Ak/~0.
k~n+I

Consequently, the operator A is the limit of a sequence of compact opera-


tors convergent with respect to the norm and therefore A itself is compact
[KF, p. 241].
Moreover,

COROLLARY· The vectors en and -en are stationary points of the ex-
tremal problem
Q(x) = (Ax/x) ~ extr, (x/x) = 1. (9)
If all An are distinct, then this problem has no stationary points other
than en and -en.
Proof. By definition, the stationary points of the problem (9) are
the points at which the derivative of the Lagrange function
!£= A0 (Ax/x) + A1 (x/x)

vanishes. As before, if x is a stationary point, then Xo ~ 0 and Ax Ax,


where A= -~1/~o· Consequently,
A(x/e.) =(Ax/e.)= (x/Ae.) = A.(x/e.),

and therefore (xlen) ~ 0 only if A = An· This immediately implies the


assertion we intend to prove. •
In conclusion, let us suppose that the form Q(x) = (Axlx) is positive-
definite (all An are positive) and consider the ellipsoid
E = {x/(Ax/x) = 1}. ( 10)
For the sake of·simplicity, we shall confine ourselves to the case when all
An are distinct.
The stationary points x of the problem
(x/x) ~ extr, (Ax/x) = 1 ( 11)
specify the directions of the principal axes of the ellipsoid, and llill are
the magnitudes of its semiaxes. Again, forming the Lagrange function
!£ = JLo(x/x)+ JLt(Ax/x)
we find ~ox + ~ 1 Ax = 0, whence, as before, Ax = Ax. Thus, the directions
of the principal axes of the ellipsoid (10) are specified by the eigenvec-
tors of the operjtor Aden = Ux(n) lien, n = 1, 2, ••• , and the semiaxes Ux(n) II
are equal to A~ 1 2 •
Exercise. Let A be a compact self-adjoint operator. Let us arrange
its positive and negative eigenvalues in the decreasing order of their ab-
solute values:

and let ~. e~ be the corresponding eigenvectors of the operator A. Let


us denote by ~(hlo•••ohn-1) (S~(hl,••••hu-1)) the value of the extremal
problem
Q(x) = (Ax/x) ~sup (inf); (x/h,) = 0, i = 1, ... , n -1.

Prove that

A!= min S+(ht. ... , h._ 1 )(A~ = max S-(ht. ... , h.- 1 )).
{hl>···,hn-1} {hh ... , hn~t}
LAGRANGE PRINCIPLE FOR CONSTRAINED SMOOTH PROBLEMS 201

where the minimum (the maximum) is attained for hk = ek (hk = ek).


3.5.5. Legendre Quadratic Forms. In this section we shall continue
the study of extremal properties of quadratic forms in a Hilbert space and
prove Hestenes' theorem on the index of a quadratic form, which plays an
important role in the classical calculus of variations.
Definition, A quadratic form Q in a Hilbert space is said to be a
Legendre form if it is weakly lower semicontinuous and if the weak conver-
gence of xu to x and the convergence of Q(xn) to Q(i) imply the strong con-
vergence of xu to x.
The maximum dimension of the subspaces L on which a quadratic from Q
is negative-definite, i.e., Q(x) < 0 for all x E L, x ~ 0, is called the
index of the form and is denoted ind Q.
Hestenes' Theorem. The index of a Legendre quadratic form ~s finite.
Proof. Let us consider again the extremal problem
(Axlx) = Q(x)--> inf, ( 1)
The existence of a solution in problem (1) is guaranteed, as before,
by Weierstrass' theorem (Section 3.1.7): the ball {xl(xlx) ~ 1} is weakly
compact, and, by the hypothesis, Q is weakly lower semicontinuous. If the
value of problem (1) is equal to zero, thenQ(x) ~ 0, Vx, the index of Q is
equal to zero, and the theorem has been proved. Let the value of problem
(1) be negative. Then, as in the foregoing section [problem (2)], the
solution e 1 of problem (1) is an eigenvector:
Ae 1 = A 1 e~> lie~ II= 1, A1 <0.

W~ shall again construct by induction an orthonormal system by solving ~n


succession the extremal problems
(Axlx) = Q(x)--> inf, (xlx)=1, (xle,)=O, i= 1, ... , n, ( 2)
and finding ~n this way the eigenvectors:
(e,le,)=1, i = 1, 2, ... , A I"' A2"' ... "'0. (3)
Let us show that after a finite number of steps this procedure will stop
because the extremal problem (2) will no longer have a nonzero solution.
Indeed, let the sequence {eili ~ 1} be infinite. Then it is weakly con-
vergent to zero. By the lower semicontinuity of Q and the monotonicity of
{;>,i},

However, "-n ~ 0, and consequently lim "-n = 0. Therefore, {en} is weakly


convergent to zero, and, moreover, Q(eu) = (Aenlen) = "-n ~ 0. The form Q
being Legendre, the norm llenll must tend to zero, but this is not so because
llenll = 1. This contradiction shows that after a finite number of steps N
the procedure will stop. The form Q must be nonnegative on the orthogonal
complement L* = (lin{el,••••eN})l..
Let the form Q be negative-definite on a subspace L, and let {fu} be
a basis in L (which is infinite in case dimL = oo). By the theorem on the
orthogonal complement (Section 3.5.8), fn = gn + hu, where gu E LN, huE
LN.J..• If dimL, i.e., the number of the vectors in the basis {fn}, exceeds
N = dimLN, then the vectors g1, ... ,gn+l are linearly dependent, i.e.,
N
I angn= 0, where an are not all zero. It follows that
n=l
N+l N+l
L aJn = L anhn E L-j.,
n=l n=l
202 CHAPTER 3

and therefore Q(X,' a.J,);;;.o, whence ~~: a.f.=O because the form Q/1 is

negative-definite. The last equality is impossible since {fn} is a basis,


and, consequently, dimL ~ N. By the definition of the index, indQ ~ N.
On the other hand, the form Q is negative on 1:
N N
xELN='>x= L (xie,)e,=>Q(x)=(Axix)= L A,(x,ieY<O
i=l i=l

for x 7' 0. Hence, indQ ~ N, i.e., indQ = N. •


Chapter 4

THE LAGRANGE PRINCIPLE FOR PROBLEMS


OF THE CLASSICAL CALCULUS OF VARIATIONS
AND OPTIMAL CONTROL THEORY

The subject of this chapter is clear from its title. Here our primary
aim is, on the one hand, to show the similarity between the two versions
of the theory, which is stressed by the unified system of notation, and,
on the other hand, to elucidate the distinction between the classical and
the modern statements of the problem. We begin with necessary conditions
for the so-called Lagrange problem to which many other problems of the
classical calculus of variations can be reduced. Then we derive the Pon-
tryagin maximum principle, which is one of the most important means of the
modern theory of optimal control problems. The rest of the chapter is de-
voted to some more special classes of problems and to the derivation of
consequences of the general theory. Sufficient conditions for an extremum
are treated less thoroughly, and we confine ourselves to some particular
situations.

4.1. Lagrange Principle for the Lagrange Problem


4.1.1. Statement of the Problem and the Formulation of the Theorem.
Let us fix a closed interval ~=fa:, ~]c:R and consider the Banach space
8=C 1 (~. R.n)xC(~. R.')XR.XR.

consisting of the elements ~ (x ( • ) , u ( · ) , to , t 1) .


For this space we shall consider the problem
I,
53 0 (X(·), u(·), 10 , / 1) = ~ fo(t, x(t), u(t))dt+'IJo(t 0 , x(t 0 ), t,, x(t 1))-.extr, (1)

x=<p(t, x(t), u(t)), (2)


I,
53;(x(·), u(·), 10 , t,)= ~ f;(t, x(t), u(t))dl+'IJ;(l 0 , x(t 0 ), ti, x(t,));:SO. (3)
t,

Here and henceforth, f 1: V--+ R.. \jJ;: W--+ R, i = 0, 1, . . . m; <p: V ......... R" in
(1)-(3) are given functions, where V and Ware open sets in the spaces
R x Rn x Rr and R x Rn x R x Rn, respectively. All the enumerated func-
tions are assumed to be at least continuous. The symbol ~ is used in the
same sense as in Section 3.2.
Problem (1)-(3) will be referred to as the Lagrange problem in Pon-
tryagin's form. Some special cases of this problem were discussed in Sec-
tions 1.4 and 1.5 and in Section 3.1. Functionals of the type of 53;.

203
204 CHAPTER 4

involving both integral and endpoint parts were called Bolza functionals
(see Sections 1.4.2 and 3.1.3). If the endpoint part in a constraint ~i~
t.
0 is a constant, i.e., if the constraint takes the form ~f;(t, x, u)dt~a;,
t,
then, following L. Euler, we call it an isoperimetric constraint. Con-
versely, if there is no integral term, the constraint assumes the form
'IJl;(t~. x(t 0 ), t 1 , x(t 1 ))~0 and is called a boundary condition.

Constraint (2) is called a differential constraint. This type of a


differential constraint in the solved form with respect to is a charac- x
teristic feature of Pontryagin's form of the problem. J. L. Lagrange spe-
cified a differential constraint by an equation ~(t, x, x) = 0 (this was
briefly mentioned in Section 1.5.1).
Finally, in contrast to Chapter 1, the time interval [t 0 , t 1 ] in prob-
lem (1)-(3) is not assumed to be fixed.
A quadruple ~=(x{·), u(·), t 0 , t 1)E8 will be called a controlled process
in problem (1)-(3) if i 0 , i 1 Eint~. (t,· x(t), u(t))EV for iE~. and if the dif-
ferential constraint holds everywhere on [to, t 1 ], i.e., (C(i)=cp(t, x(t),
u(t)). Further, such a quadruple is called an admissible controlled pro-
cess if it is a controlled process and, moreover, conditions (3) hold.
A quadruple g = (~(·),~(·),to, t1) is called an optimal process in
the weak sense or is said to yield a weak extremum in problem (1)-(3) if
it yields a local extremum in the space~. i.e., if there is£> 0 such
that for any admissible controlled process ~ satisfying the condition
II~ - gII" < £ there holds one of the following two inequalities: ~.(~);;;;:,
~.(~} i~ the case of a minimum and ~.(6)~ ~.(~) in the case of a maximum.
By the Lagrange function of problem (1)-(3) we shall mean the function
t,
£'(x(·), u(·), i 0 , i 1 ; p(·), 'A, 'A 0 )= ~ Ldt+l, (4)
t,

where

are Lagrange multipliers,


m
L (t, x, x, u) = ~ 'A;[;(t, x, u)
i=O
+ p (t) (x-cp (t, x, u)) (5)

is the Lagrangian or the integrand, and


m
l (t 0 , x0 , ti, x1 ) = ~ 'A;'ll\ {t 0 , X0 , t1 , x1 ) (6)
i=O

is an endpoint functional.
As before, here and in the following sections we shall use the follow-
ing abbreviated notation:
Lx (t) = Lx (t, x(t), i (t), u~(t)),
Lx(t)=Lx(t, x(t), x(t}, u(t)),
Lu(t)=Lu(t, x(t), ~(t), u(t)),
i,k = t"Ji•. x(t.). t x(t 1, 1 )),

.!l\ = 2tJx (. ), u(. ), t.. tl; p(. ), ~. ~.) ' etc.


Euler-Lagrange Theorem., Let the functions{;: V-+R, i=O, 1, _. .. , m,
cp: V-+ Rn and their partial derivatives with respect r:o x and u be contin-
uous in the open set V in the space R x R_ll x Rr, let the functions
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 205
'lj11: W---. R , i = 0, 1, •.• ,m, be continuously differentiable in the open set
win the spaceR X Rn X R X Rn, and let ~(·)EC 1 (~. Rn), u(·)EC(~. R'), i•.
t1 E int ~ be such that
(t, x(t), u(t))EV, IE~. (i•. x(i;,), i,_, x(t.))EW.
If~ (i(·), ~(·),to, t1) is an optimal process in the weak sense
for problem (1)-(3), then there are Lagrange multipliers ~o ~ 0 (in the
case of a minimum problem) or ~o ~ 0 (in the case of a maximum problem),
p(·)EC 1 (~.Rn•) and~= (~1, ... ,\m), not all zero, such that:
a) the stationarity conditions hold for the Lagrange function:
with respect to x(·) (2x<·>=0):
-~ L;,(t)+L ... (t)=O, (7)
Lx(ik)=(-!)klXk' k=O, I; (8)
with respect to u(·) (j>u(·)=O):
Lu(f) = 0; (9)
with respect to tk:
.!i'tk=O, k=O, !; ( 10)
b) the condition of concordance of signs holds:
i.1 ~ 0, i = I, ... , m; ( 11)
c) the conditions of complementary slackness hold:
5:. 153;(~) =0, i =I, ... , m. (12)
_As in Sections 3.1 and 3.2, inequalities (11) mean that \i ~ 0 if
53;(~)~0 in condition (3)' xi ~ 0 if 9a; (6)~0 in (3)' and ~i may have
an arbitrary sign if 53;(6)=0.
The assertion on the existence of the Lagrange multipliers satisfying
the set of conditions a)-c) will be. briefly called the Lagrange prineipZe
for the Lagrange problem (1)-(3).
Let us show that the assertion of the theorem is in complete agreement
with the general Lagrange principle mentioned in Chapter 1 and Section
3.1.5. Indeed, the Lagrange function ~ is a function of three arguments:
x(·), u(•), and (to, t1). Hence, in accordance with the general Lagrange
principle, we have to consider the following three problems:
(a) ..9'(x(·), u(·), i•. t,_, p(·), ~. ~.)-extr,
(~) ..9'(x(·), u(.), t0 , t1; p(-), A., A- 0)'-extr;
(y) 2(x(-), u(:), '·· t.; IJ<·>. ~. X-.)-extr.
Problem (a) is an elementary Bolza problem, and the stationarity con-
ditions (7) and (8) are written in complete accordance with Section 1.4.2
and the theorem of Section 3.1.3. Problem (8) is also a Bolza problem, but
it is degenerate since u is not involved in the Lagrangian and u(t 0 ) and
u(t 1) are not involved in the endpoint functional. Therefore, the Euler-
Lagrange equation turns into (9), and the transversality condition becomes
trivial: 0 = 0. Finally, (y) is an elementary problem, and equalities
(10) simply express the Fermat theorem (see Sections 1.3.1 and 3.1.1).
As we know, the conditions of complementary slackness and the condi-
tions of concordance of signs are also written in accordance with the gen-
eral Lagrange principle applied to inequality constraints (Section 3.1.5).
206 CHAPTER 4

Hence, the basic theorem of the present section can be stated thus:
if the functions involved in the statement of the problem are sufficiently
smooth, then the Lagrange principle holds for a local extremum.
The stationarity ~onditions can be written in full; according to (5)
we obtain:
from (7):

(7a)

[(7a) is sometimes called the conjugate equation];


from (9):

(9a)

from (8):
(Sa)
and, finally, from (10):

0 = (- l)k-l L (1,.) +4k + fxk~ (t:) = (- l)k-1 L~o iJ;(tk) + p (tk) (~ (1,.) -~ (tk))] +
+ ltk +(-l)kp (tk)~(t,J=(-l)k-{~0 i]; (tk)-p(ik) ~ (t,)] +~k;
that is,

( 10a)

The expression
m
H(t, x, p, u)=Lxx-L=prp(t, x, u)- ~ i,.t; (t, x, u) ( 13)
1=0

will be called the Pontryagin function of the problem under consideration.


Using this function, we can rewrite Eqs. (7a)-(10a) and Eq. (2) thus:
dx ~ dp ~
Tt=HP' Tt=-Hx, fiu=O ( 14)

and
( 15)
Remark. Function (13) is sometimes called the Hamiltonian function.
We use the term the "Hamiltonian function" or the "Hamiltonian" in a dif-
ferent sense which is more natural from the viewpoint of classical mechan-
ics; namely, by the Hamiltonian we mean the function
% (t, X, p) =H (t, x, p, u (t, X, p)), ( 16)
where u(t, x, p) is the implicit function determined by the equation Hu(t,
x, p, u) = 0. When the standard conditions for the applicability of the
implicit function theorem hold, Eqs. (14) assume the ordinary form of a
canonical Hamiltonian system:

( 17)

It should be noted that the set of the conditions stated in the theo-
rem is complete in a certain sense. Indeed, we have the system of the
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 207

differential equations (2) and (7a) and the finite equation (9a) for the
determination of the unknown functions (x(·), p(·), u(·)). Expressing u(·)
in terms of x(·) and p(·) with the aid of Eq. (9a) (of course, when it is
possible, for instance when the conditions of the implicit function theorem
are fulfilled),weobtain a system of 2n scalar differential equations
[equivalent to system (17)]. Its general solution depends on 2n arbitrary
constants and also on the Lagrange multipliers Ai among which there are
m independent ones. Adding to and t1 to them, we obtain altogether 2n +
m + 2 unknowns. For their determination we have the 2n + 2 transversality
conditions (15) and them conditions of complementary slackness (12).
Thus, the number of the unknowns coincides with that of the equations.
It is this fact that was meant when we mentioned the "completeness" of the
set of the conditions. What has been said does not, of course, guarantee
the solvability of the resultant system of equations.
4.1.2. Reduction of the Lagrange Problem to a Smooth Problem. Let
us denote by Y the space C(~, Rn) and write problem (1)-(3) of Section
4.1.1 in the form
93 0 (~) __,. extr; <D (6) = 0, 93 ds) ~ 0, i = I, ... , m, (1)
where
<D(~)(t)=x(t)-qJ(t, x(t), u(t)). (2)
The mapping <P and the functionals 93; are defined in the domain
o/"={(x(·), u(·), t 0 , t,)j(t, x(t), u(t))EV, tE~;
t0 , t, E int ~. (t 0 , X (t 0 ), t,, X (t,)) E W}
in the space -·
Problem (1) has the same form as the problem of Section 3.2.1 of the
foregoing chapter for which the Lagrange principle was already proved. The
hypotheses of the corresponding theorem split into three groups: the condi-
tions that the basic spaces were Banach spaces, the conditions of smooth-
ness of the mappings, and the condition of the closedness of the image of
the infinite-dimensional mapping. Let us verify that all these require-
ments are fulfilled in problem (1).
The spaces ~ and Y are obviously Banach spaces: C1 and C are the basic
examples of Banach spaces in our presentation of the material (see Sections
2.1.1 and 2.1.2).
The smoothness (namely, the continuous differentiability in Frechet's
sense) of the functionals 93;, i 0, 1, ... ,m, follows from the results of
Section 2.4.
Indeed, the integral part can be considered in exactly the same way
as in Proposition 2 of Section 2.4.2 if we replace x(·) by u(·) everywhere,
and the differentiability of the endpoint part was proved in Section 2.4.3
(this part is an operator of boundary conditions). According to formula
(9) of Section 2.4.2 and formula (3) of Section 2.4.3, for~= (x(·), ~(·),
to, t 1 ) and n = (h(·), v(·), To, Tl), we have
t,
$j (~) [ TJ] = ~ aix (t) h (t) +ria (t) v (t)) dt +~
t,

+ fi (ii) Tf -f i (to) To+ Zt;To +It,Tf + Zx,h (T0 ) + l:.,h (f,) +lx,~ {1o)To +lx,~ (i',) Ti ( 3)
(cf. the lemma of Section 3.1.3).
Mapping (2) is also continuously Frechet differentiable (see Proposi-
tion 3 of Section 2.4.1). According to the formula (14) of Section 2.4.1,
its derivative has the form
208 CHAPTER 4

<D' (~) [ 11] (f)= h (f) -~x (f) h (f)-~u (f) V (f). (4)
The continuous Frechet differentiability of the functionals $; and
the mapping ¢ implies their strict differentiability required in the theo-
rem on the Lagrange principle.
The closedness of the image of the mapping¢' follows from its regu-
larity: the image¢'(~)~ coincides withY. Indeed, let us take an arbi-
trary y(·)EY=C(~, Rn), and put v(·) = 0, 'Io ='II = 0. The equation
<D' (~)[h ( · ), 0, 0, OJ ( ·) = y ( ·)
is equivalent to the linear differential equation h(t)-~x(t)h(t)=y(t) with
continuous coefficients which possesses a solution h ( ·) E C1 (~. Rn) by virtue
of the existence theorem forlinearsystems (Section 2.5.4). Hence, all the
conditions of the theorem of Section 3.2.1 are fulfilled.
For definiteness, let us assume that (1) is a minimum problem. Ac-
cording to the Lagrange principle, if~= (x(·), u(·), to, t1) yields a
local minim~m for (1), then there are Lagrange multipliers 10 , ~ERm• and
an element y•:EY•=(C(~. Rn))• such that for the Lagrange function
..2'(£; y•, /.., /..0 )=.2'(x(·), u(·), t., f 1 ; y•, X, A0 )=

= l(~t..;{dt, X, u) )dt +~ A;1jl; (t 0, X (t 0 ), t1 , x (t 1 )) + y• 0 ell (5)

there hold the conditions of concordance of signs, the conditions of com-


plementary slackness, and the stationarity conditions:

.£'~=0~-Pxo=O, 2uo=0, 2tk=(), k=O, 1.


The th~ee relations forming the right-hand side will be referred to as the
stationarity conditions with respect to x(·), u(·) and tk·
However, it should be noted that function (5) is not identical with
the Lagrange function (4) of Section 4.1.1. Using Riesz' representation
theorem for the general continuous linear functional on the space C(~, Rn)
(Section 2.1.9), we can only assert that

<y•, 11 (- )> = Sdv (t) 11 (t),


6.

where ~(t) = c~l(t), ...• ~n(t)) is a row vector formed of canonical func-
tions of bounded variation. Substituting this expression into (5) and
taking into account the notation in the formula (6) of Section 4.1.1, we
obtain

+
1
m )
.§' = ~ ~ A;{; (f,
1 (
X (t), U (f)) df

+ Sdv(t){x(t)-cp(t, x(t), u(t))} +l(t0 , x(t 0 ), t 1 , x(t 1)). (6)


6.

Comparing this formula with formulas (4)-(6) of Section 4.1.1, we see that
the Lagrange function ~ is obtained from ~ if the function~(·) is ab-
solutely continuous and its derivative is equal to zero outside the closed
interval [to, t1] and coincides with p(·) on this interval. Here the situ-
ation resembles the one in Section 1.4.1, where the DuBois-Reymond's lemma
allowed us to derive Euler's ~quation without the additional assumption
that the function t + p(t) = Lx(t) was differentiable. Similarly, an ap-
propriate generalization of that lemma will enable us to prove the continu-
ity of the function~(·).
4.1.3. Generalized DuBois-Reymond's Lemma. Let a vector function
a(·): [a, ~]-..Rn.o be Lebesgue integrable on a closed interval [a, S], let
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 209

a matrix function b(·): [ex, ~]-+..P(Rn, Rn) be continuous on [a., 8], let v(·):
,{ex, ~J---+ Rn• be a canonical vector function of bounded variation, let tk be
different points belonging to the open interval (a., 8), and let
Ck{Rn•, k = 0, 1, ••• ,p.
If the equality
II II P
~a (t) h (t)dt + ~ dv (t) {li (t) -b (t) h (t)} +
a a
L c,.h (t,.) = 0
k=O
(1)

holds for all h(·)EC1 ([ex, ~]. Rn), then function v(•) is absolutely continu-
ous; its derivative pl·) is continuous on [a., 8] except possibly at the
points tk, where it has the limits on the left and on the right, and is
absolutely continuous on any interval not containing the points tk, and,
moreover,
p (t) =a (t) -- p (t) b (t) almost everywhere, (2)
p (ex)= p (~) = 0, (3)
p (t,. +O)-p (t,.-0) =Ct.· (4)
t
Proof. Putting h(t)=y+~'ll(s)ds,, weobtain from (1) the relation

II II P } II t
{
~ a-(t) dt- ~ dv (t) b (t) +~ c,. y +~ a (t) ~ 'I') (s) ds dt +
II II t P tk

+ ~ dv (t) 1J (t)- ~ dv (t)b(t) ~ 'lJ (s) ds+ LC~c ~ 'I') (s)ds = 0.


a a a k=O a

Since y ERn and 'lJ ( ·) EC ([ex, ~J, Rn) can be chosen arbitrarily and indepen-
dently, in the first place, we have
II II P

~ a(t)dt-..~ dv(t)b(t)+ L c,.= 0 (5)


a a k=O

[because we can put n(·) =0 and take an arbitrary y] and, in the second
place, the equality
11 t II -~~ t P tk

~ a(t) ~ 'lJ(s)dsdt+ ~ dv(t) 'lJ (t)- ~ dv(t)b(t) ~ 'lJ (s)ds+ L c,. ~ 'lJ (s)ds=O (6)
a a a a a. k=O a

holds for any TJ(·)EC([ex, ~], Rn),.


Let us change the order of integration in the first and the third
terms in accordance with Dirichlet's formula [formula (5) in Section 2.1.9],
denote by s the variable of integration in the second term, put

(7)

and use the equality


II II
(8)
~ f (t) dt-t (t) = ~ f (t) t-t' (t) dt

(which holds for absolutely continuous functions [KF, p. 359]) to bring


equality (6) to the form

~ (fa(t)dt>(s)ds+ ~dv(s) TJ(S)-! (Jdv(t)b(t))l] (s)ds-t


II P II II
+ ~
a
L c,. Xfa,t~cJ (s) '!J (s)ds= a~ dA (s) '(} (s)+ a~ dv (s) 1J (s)=O,
k=O
(9)
210 CHAPTER 4

where

Function A(·) is absolutely continuous on [a, S], and we have A(a) = 0, and
consequently A(•) is a canonical function of bounded variation. Since v(·)
is also a canonical function of bounded variation, relation (9) and the
uniqueness property in Riesz' theorem (Section 2.1.9) imply the identity
A(s) + v(s) =
0. Hence, function v(·) is absolutely continuous, and its
derivative p(s) = v'(s) is equal to
II II P
p (s}= v' (s) =-A' (s) = - ~ a (t) dt + ~ p (t) b (t) dt-
s •
L c11X.£a, t11] (s)
k=O

for s ~ tk [here we again use (8)]. It is seen from this formula that
function p(·) is in its turn continuous everywhere except at the points
tk, where (4) holds and that (2) takes place. The equality p(S) = 0 is
evident, and p(a) = 0 is equivalent to (5), and hence (3) is also proved. •
4.1.4. Derivation of Stationarity Conditions. Thus, we have shown
that if~ yields a local minimum in problem (1)-(3) of Section 4.1.1, then
there are Lagrange multipliers %0 ;;;;:;:: 0, %E Rm• and yo EC (t:., Rn)• such that
there hold the conditions of concordance of signs and the conditions of
complementary slackness for A, and the stationarity conditions:

(1)

where .fi is determined by equality (1) of Section 4.1.2, and 2x<·>• 2u<·>•
and $, 11 are obtained from !ix<·>· 2u<·>, and fftk by substituting x< ·) = x< ·),
u(·) = u(·), tk = tk, k = o, 1.
"Now we pass to the jnterpretation of conditions (1). When differ-
entiating the function .!l', we shall take into consideration that the first
and the third terms of the formula (6) of Section 4.1.2 form a Bolza func-
tional whose derivative is expressed by formula (3) of that same section,
and the second term is the composition of the differential constraint oper-
ator and the linear mapping tJ(·)--~·dv(t)'IJ(t), and therefore its derivative
A
is obtained by differentiating the formula (4) of Section 4.1.2 with re-
spect to dv(t).
A) Stationarity with Respect to x(•), Differentiating .fi with respect
to x(·) we conclude, in accordance with (1a), that for all h(·)EC~(t:., R,n)
the equality

t;m
0= j x(·) [h ( · )] = ~ L 1.Jix (t) h (t)dt + ~ dv (t){/i (t)-~x (t) h (t)} + lx,h (lo) + fx,h (t1.} (2)
f. i=O A

holds. Let us begin with the case t1 >to. According to the generalized
DuBois-Reymond's lemma, it follows from (2) that the function v(·) is ab-
solutely continuous on~ and its derivative p(·) = v'(·) is continuous on
~ everywhere except possibly at the points to and t 1 , where it has limits
on the left and on the right, and relations (2)-(4) of Section 4.1.3 hold.
For the situation under consideration, these relations mean that
m
dpjdt = -p (t) ~" (t) +X(t,.. 7,] (t) 1~ %} ix (t) (3)
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 211

for t ~ to, t1, p(~) = p(~) = O, where~ and ~are the end points of ~ and
that
(4)
(here X[to,t 1 ] is the characteristic function equal to 1 on the closed
interval [to, t1l and to 0 outside it). It follows from Eq. (3) that p(·)
satisfies the ~omogeneo~s linear differential equation on the half-open
intervals [~, to) and (tl, ~]. Since it vanishes at the end points~ and
~ of these inter~als~ the uniq~eness theor~m of Section 2.5.3 implies that
p(t) = 0 on~\ [to, t1l and p(to- 0) = p(tl + O) = 0.
Now let us denote by p(•) the solution of Eq. (7a) of Section 4.1.1
coinciding with p(·) on the interval (to, t1) [Eq. (7a) of Section 4.1.1
and Eq. (3) are identical in this interval]. Since (7a) is a linear equa-
tion, the solution p(•) is defined on the whole interval ~ and belongs to
C 1 (~, an*) (Section 2.5.4). We have p(to) = p(to + 0) and p(tl) = p(tl-
0) for this solution, and since p(to - 0) = 0 and p(tl + 0) = 0, relation
(4) turns into condition (Sa) of Section 4.1.1. Hence, stationarity condi-
tions (7) and (S) [or (7a) and (Sa)] of Section 4.1.1 hold for p(·).
For t 1 <to we obtain, instead of (3), the equation
m,
dp/dt = -p (t) ~...(t)-X 1 t,, 7,] (t) ~~ 'ilix (t)
whose solution p(·) vanishes on[~, t1) and (to, ~]. In this case we de-
note by p(·) the solution of Eq. (7a) of Section 4.1.1, which coincides
with (-1)p(•) on (tl, to), and then conditions (7) and (8) of Section 4.1.1
hold.
Finally, for t1 = to we can again make use of the generalized DuBoi&-
Reymond's lemma, but in this case we have v'(t) = p(t) :: 0 everywhere on~
except possibly the point to = t1 at which the condition p(tl + 0) - p(tl -
0) = lx 0 + Zx 1 = 0 must be fulfilled. It follows that Zx 0 + Zx 1 ~ 0.
!aking asp(·) the solution of Eq. (7a) of Section 4.1.1 for which p(to) =
lx 0 , we again see that conditions (7) and (8) of Section 4.1 .1 hold.
B) Stationarity with Respect to u(•). Differentiating Jf with respect
to u(·) and taking into account that the function v(·) is absolutely con-
tinuous and its derivative p(·) = v'(·) is nonzero only in the interval
(to, t 1 ) and coincides with p(·) in this interval, we obtain the equality

I, { m }
O=e!l\[v(·)J=~ ~'id 1 u(t)-p(t)~u:(t) v(t)dt.

Since this equality holds for all v(·)EIC(A, Rn), we again refer to the
uniqueness in Riesz's theorem (Section 2.1.9) to obtain the equation
m
l: Atftu (t)- P(t) ~a (t) = 0,
i=O

which coincides with Eq. (9a) and consequently with the condition (9a) of
Section 4.1.1.
C) Stationarity with Respect to tk· Since
2(x(·), u(·), to, tl; ... )-.P(x(·), u(·), to, tl; ... )=
I,

= Sdv(t){~(t)-<p(t, x(t), u(t))}- ~ .O<t){~(t)-<p(t, x(t), u(t))}dt=o,


A 4
the derivatives of these two functions with respect to tk at the point ~

(x(.), u( ·), to, t 1) coincide, and since the equality .fitk=O takes place,
212 CHAPTER 4
condition (10) of Section 4.1.1 also holds.
4.1.5. Problem with Higher-Order Derivatives. The Euler-Poisson
Equation. Let us again fix a closed interval deft. We shall consider the
space
8,=C'(d, Rn)xRxR
whose elements are triples~= (x(·), to, t1), where the functions x(·):
6 ~ Rn are continuously differentiable up to the order s inclusive, and
4, t1 Eintd, When speaking about a pPobtem with higheP-oPdeP dePivatives
(in the classical calculus of variations), we shall mean the extremal prob-
lem
~ 0 (~) --+ extr; 9l i (~) :§0 0, i = I, ... , m, (1)
in the space ~s where
I,
~d~)= ~ f;(t, x(t), ... , x<•l(t))dt+
t,
+'IJli (!0 , X (t 0 ), ••• , x<•- 1 l (!0 ), ti, X (/ 1 ), ••• , x<s-ll (ti)). (2)
Here in (1) and (2) we have fi:V ~ R, ~i:W ~ R, i = 0, 1, ... ,m, where V and
Ware open sets in the spaceR x (aJl)S 1 and R x (Rn)s x ·R x (Rn)s, respec-
tively; functions fi and ~i are at least continuous. A triple~= (x(·),
/ 0 , t 1 )E8 8 is admissible in problem (1), (2) if (t, x(t), ... , x<•l(t))EV, tEd
and
(t., x (t.), ... , x<Hl (t.), 11 , x (/1), ••• , xcs-!l (11 )) EW.
An admissible triple~= (x(·), to, t1) is called a (loeal) solution
of problem (1), (2) if it yields a local extremum for the functional~.
in the space =s• i.e., if there exists E > 0 such that for all admissible
triples~= (x(•), to, t1) satisfying the conditions

there holds one of the two inequalities: .?0 (6);;;;;:. .?0 (~) in the case of a min-
imum and .? 0 (~)~.? 0 (~) in this case of a maximum.
Let us show that the problem with higher-order derivatives (1), (2)
is reducible to the Lagrange problem. To this end we denote
. . . , x,)E(Rn)•,
X=(X1 ,
(x ( · ),
X ( ·) = x(·), ... ,
x<•- 1 , (. )),
u(·)=x<•l(·), P(·)=(pl(·), ... , p,(·)).
Then problem (1), (2) takes the following standard form of the problem con-
sidered in Section 4.1.1:
t,
~ f 0 (t, x(t), u(t))dt+l)l0 (t 0 , x(t.), /1 , x(t 1))-+extr, ( 1 I)
t,
I,

~ ft(t, x(t), u(t))dt+l)l;(t 0 , x(t 0 ), / 1, x(t 1 )):§00,


t,
i=l, ... , m, (2 I)
XI=X/+1• j=l, ... , s-1, x,=U (3)
with the Lagrange function
t,
£'(X(·), U(·), i 0 , i 1 ;p(·),1.,,1., 0 )=~Ldt+l, (4)
t,
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 213
where

x, u)=~"-dt(t,
m s-1
L(t, x, x, u)+.~P1 (t)(x1 -x1 +i)+Ps(t)(x3 -u) (5)
0 i= 1=1
and
l (t 0 , X0 , X0 , ••• , X~s-u, t 10 X1, X1, ... , xl'_1,)=
m
= ~ ' ,., (t 0' Xo, .xo,
~ "'i't'i · . I., x<•-
0
1,
1
tit x lt x' l) ••• ,
<•-1,) ·-
xl
(6)
1=0
Restating the theorem of Section 4.1.2 for this case, we arrive at the
following necessary condition for an extremum:
THEOREM. Let the functions fi:V ~ R, i = 0, 1, .•. ,m, and their par-
tial derivatives with respect to x, ... ,x(s) be continuous in the open set
vc:Rx(Rn)s+ 1 , let the functions l/Ji:W ~ R, i = 0, 1, ... ,m, be continuously
differentiable in the open set Wc:RX(Rn)'xRx(Rn) s, and let x(·)EC8 (L\, Rn),
t 0 , ti (!: int L\ be such that
(t, x(t), .···• x<•,(t))EV, tEL\;
(f'o, x(t0 ), ... , x<•- 1 ,(f'o), t1 , x(t1 ), ... , i<s- 1 ,-(t~))EW.

If~ = (i(·), to, t1) is a local solution of problem (1), (2), then there
are Lagrange multipliers: ~o ~ 0 (in the case of a minimum problem) or
~o..:;; 0 (in the case of a maximum problem),p(·)=(p1 (·), . . . , p,(·))EC1 (d, (Rn)'")
and~= (~l,••••~n), not all zero, such that:
a) the stationarity conditions for the Lagrange function hold:
with respect to x(•) (2x(·)=0, 2u(·J=0)

(7)

(8)

w·ith respect to tk (.i'tk=O):


H(tk)=(-I)k+ 1 Ltk• li=O, I; (9)
b) the condition of concordance of signs holds:
11 ~ 0, i = I, ... , m; ( 10)
c) the conditions of complementary slackness hold:
115l 1 (~) = 0, i = I, ... , m. ( 11)
Here
m
f(t, x, X, ••• , x<s))= ~1;/ 1 (t, x, ... , x<•,), (12)
1=0
s
p
il(t)= ~ 1 (t)x<f,(t)-f(t, x(t), ... , x<s)(t)), ( 13)
i=l
and, as usual,
f <j) = 1 (t0 , (j) x(fa), ...• x<s-1) <lo>• t1, x<i1>• ... , x<s-l) (tl)),
"" "li
etc. and inequalities (10) mean that ~i ~ 0 if ~;(s),.;;;O in problem (1),
~i ..:;; 0 if _,1 (6);;;::;.0 in (1), and ~o may have an arbitrary sign if ~t(i.)=O
in ( 1). ·
214 CHAPTER 4
The detailed calculations bringing conditions (7)-(12) of Section
4.1.1 to the analogous conditions of the theorem under consideration are
left to the reader as an exercise.
For the special case of a problem with higher-order derivatives in
which the endpoint terms in (1') and (2') are constants and the end points
are fixed, the assertion of the theorem can be expressed in a simpler form.
In this case the transversality conditions (8) and (9) are dropped and the
system (7) can be written in the form of one equation

(14)

Finally, if we assume that the required derivatives exist, Eq. ( 14)


can be written in the form of the Euler-Poisson equation
dt
j,. (t)-di , .. + i(tif% (t)- ... -(-1)• (iii' f ,.w (t) = 0.
~ d2 ~ d•~
( 15)

The general solution of this equation of the 2s-th order involves 2s


arbitrary constants and also depends on the numbers Aa, .•• ,Am one of which
can be chosen arbitrarily. We have altogether 2s + m unknowns. For their
determination we use 2s boundary conditions (for the fixed end points),
and m more conditions are obtained from the conditions of complementary
slackness and inequalities (1) and (10) (check that altogether we can de-
rive from them exactly m equalities). It can similarly be verified that
in the general case the number of the equations the theorem yields also
coincides with the number of the unknowns.

4.2. Pontryagin Maximum Principle


4.2.1. Statement of the Optimal Control Problem. The extremal prob-
lem to which this section is devoted almost coincides in its form with the
Lagrange problem (1)-(3) of Section 4.1.1. Here we deal with the same
functional, the same differential constraint, and the same inequality con-
straints as before. The distribution of the new problem is that in its
formulation a constraint on the control is separated out explicitly; this
constraint has the form u (t) E U, where U is a topological space.
At first glance the new formulation does not provide anything new
and even seems less general. Indeed, in the Lagrange problem we dealt with
admissible processes which satisfied the condition (t, x(t), u(t)) EV. In the
special case, taking V=GxU, we split this condition into two: (t, x(t))EG
and u (t) E U.
However, the division of the variables into phase coordinates and con-
trols and the separation of the constraints imposed on the controls proved
very significant. It led to the appearance of a new branch of the theory
of extremal problems which very soon became very popular among applied
scientists due to its applications to practical problems and allowed us to
look at the classical theory from a new viewpoint.
Now, what caused all this? In the first place, it was the necessity
to investigate technical problems. The separation of phase coordinates
and controls and their connection by means of a differential equation is a
usual model of a process whose development obeys the laws of nature (the
differential equation!) and which is acted upon by the man controlling this
process according to his purposes and trying to make it in a sense optimal.
It is clear now that the constraints of the form u (t) E U imposed on the
controls are connected with the limitation of the possibility of influencing
the process (say, with the limitation of the range of the angle of rotation
of the rudder of a guided vehicle).
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 215
If U is assumed to be an open set in Rr, we obtain nothing new in-
deed. However, in technical and other applied problems the set U is not
necessarily open. The control can often be even discrete (switching on-
switching off). There naturally appear constraints for which U is a
closed set: the importance of such a situation was already demonstrated in
the solution of Newton's problem (Section 1.6.2). Even the simplest opti-
mal control problems involve constraints of the form lu(t)l ~ 1 (see the
time-optimal problem in Section 1.6.3). In the last case the set of the
admissible controls is a closed ball in the space Loo(~) of complex struc-
ture, and in the typical cases the optimal control lies on the boundary
of this ball [from what was said in Section 1.6.3 it is seen that the prob-
lem would have no solution if the constraint lu(t)l < 1 were imposed be-
cause the minimum time corresponds to a control for which lu(t)l = 1 almost
everywhere], and this boundary is highly nonsmooth and "multifaceted" or,
more precisely, "infinite-faceted ." All this hampers the application of
the ordinary methods of differential calculus, and smoothness conditions
with respect to the control often turn out to be unnatural. That is why
we shall no longer require that the derivatives fiu, ~"'etc. exist, and
even the set U of the possible values of the control will be regarded as
an arbitrary topological space which does not necessarily possess a linear
structure in the general case. The absence of the derivatives with re-
spect to u is the second distinction of the new problem from the old one.
In order to stress this fact we shall denote the functionals involved in
the formulation of the problems by a new symbol, although they almost coin-
cide in their form with Bolza functionals.
Finally, the last distinction is that the control is no longer sup-
posed to be continuous. Basically, this is connected with the fact that
the problem often has no solution in the class of continuous controls (for
instance, this is the case for most problems in which U is discrete or
for the time-optimal problem of Section 1.6.3). I t should be noted, however,
that the passage to "polygonal extremals" (which corresponds to piecewise-
continuous controls) is also made in the classical calculus of variations
(see Section 1.4.3). As to the very possibility of the free choice of the
control within the set U [it was already used when we considered needle-
like variations in the derivation of Weierstrass' conditions in Section
1.4.4 and in the proof of the maximum principle for the simplest situation
(Section 1.5.4)], it generates "latent convexity" with respect to the con-
trols, with which the form of the basic condition, i.e., the maximum prin-
ciple [see (10), (11) in Section 4.2.2], is connected: it resembles the
corresponding condition in the convex programming problems. Here this
connection will not be revealed in full. This will be done in the next
section for a particular version of the general problem.
Thus, let us consider the extremal problem
t,
.1 0 (x(·), u(·), t0 , t1)= ~ f0 (t, x(t), u(t))dt+'ljl0 (t 0, X(t 0 ), ti, x(t1 ))-inf, (1)
t,
x=q>(t, x(t), u(t)), u(t)EU, (2)
t,
.1;(x(·), u(·), t 0 , t 1)= ~ f 1 (t, x(t), u(t))dt+
t,
+'ljl;(t0 , x{t 0), ti, x(ti))~O, i=l,-2, ... , m. (3)
Here f;: GxU-+It 'ljl;:W-+R, i=O, 1, ... , m, <p: GxU-+-Rn, where G and
ware open sets in the spaces R x Rn and R x Rn x R x Rn, respectively, and
U is an arbitrary topological space. The symbol § has the same meaning as
in Sections 3.2 and 4.1. All the functions fi, ~j. and ~ are assumed to
be at least continuous.
216 CHAPTER 4

A quadruple (x( ·), u ( •), to, t1) will be called a controZZed process in
problem (1)-(3) if:
a) the control u(·): [t 0 , t1 ]-+U is a piecewise-continuous function.
Its values at the points of discontinuity are insignificant; for definite-
ness, we shall assume that u(·) is continuous on the right for to ~ t < t 1
and on the left at the point t1;
b) the phase trajectory x(·):[to, t1l + Rn is continuous and its graph
lies in G:

c) the function x(·) satisfies the differential equation


X(t) = cp (t, X (t), u (t))
for all t E [t 0 , t 1 ] except possibly at the points of discontinuity of the con-
trol u( ·) [in this case we say that x( •) corresponds to ·the control u( ·);
it can easily be seen that x(·) possesses the left-hand and the right-hand
derivatives at the points of discontinuity of the control].
A controlled process is said to be admissible if the conditions (3)
are satisfied.
An admissible controlled process (~(·),~(·),to, t 1 ) is said to be
(locally) optimal if there is E > 0 such that for every admissible con-
trolled process (x(·), u(·), to, t1) such that
lt.t-f.tl<e, k=O, land lx(t)-x(t)l<e (4)
for all t E [t 0, t1J n[t., 41
the inequality
(5)
holds.
It should be noted that here there is a change in comparison with the
Lagrange problem: we no longer require that the derivatives x(t) and ~(t)
be close to each other. In the classical calculus of variations this cor-
responds to the passage from a weak extremum to a strong one (see Section
1 .4.3).
It is instructive to compare the Lagrange problem and the optimal con-
trol problem for the case when U= R' and V = G x Rr in Section 4.1 .1.
Under these assumptions:
Every admissible controlled process (x(·), u(·), to, t 1 ) of the La-
grange problem is an admissible controlled process for the optimal control
problem as wel~ (more precisely, it becomes such for the restriction of
u(•) to [to, t1]).
This means that the problem has been extended (cf. Section 1.4.3).
Consequently, the value of the optimal control problem is not greater than
that of the Lagrange problem (inf.7 0 ~inf~0 ). However, for some natural as-
sumptions about the functions fi, t/Ji,and q> 'the two values coincide (for the
trivial case when x
= u this was proved in the "rounding the corners" lemma
in Section 1.4.3).
If inf-' 0 -<inf~•• then, as can easily be seen, the solutions of the two
problems exist or do not exist independently. In case the values of the
problem are equal, there takes place one of the following three possibili-
ties:
a) The Lagrange problem possesses a solution: inf~.==~0 (x(·),.u(·), ~. [1 )..
Then the same quadruple is a solution of the optimal control problem be- ·
cause
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 217

:70 (X(·), u(·), t 0 , ti)~930 (x(·), u(·), t t )=inf5.3


0, 1 0 =inf:7 0 •
b) The Lagrange problem has no solution while the optimal control
problem has: inf:/ 0 is attained. on a piecewise-smo oth curve~(·). Such a
situation was observed in the example (6) of Section 1.4.3 (for = u the
passage from the class C1 to the class KC 1 exactly corresponds to the pas-
x
sage to the optimal control problem).
c) The two problems have no solutions: the infimum of the functional
is not attained even when its domain is extended. This is the case in the
Bolza example (Section 1.4.3).
In the foregoing argument it was tacitly assumed that the minimization
was considered with respect to the whole class of the admissible controlled
processes. However, usually extremal problems are considered from a local
viewpoint, which, by the way, is reflected in the definitions we stated.
In Section 4.1.1 an admissible controlled process (x(·), u(·), t 0 , t 1 ) was
called optimal (in the weak sense) when it yielded a minimum for the func-
tional $ 0 in a C1 -neighborhood with respect to x(·) and in a C-neighbor-
hood with respect to u( ·),whereas in the present definition of optimality
we assume that the minimum of 5 0 takes place in a wider neighborhood de-
scribed by inequalities (4). This is in fact a C-neighborhood with respect
to x(·), and there are no constraints on the controls at all. As was al-
ready mentioned, in the classical situation this would be a strong minimum.
Therefore, a local solution of the Lagrange problem may not be an optimal
process for the optimal control problem. In case the control u(·) in an
optimal process (x(·), u(·), to, tl) is continuous, this process is a local
solution of the Lagrange problem as well under the ordinary assumptions
about the functions involved in the problem.
Conditions (3) are not the most general constraints which we have to
consider in optimal control problems. For instance, we do not consider at
all the so-called phase constraints of the form
(6)
where D is a subset in R x Rn. Of course, if D is open, then replacing G
by G nD in the definition of a controlled process and diminishing, if nec-
essary, sin (4) we automatically take into account constraints (6). There-
fore, of primary interest is the case when D is not an open set and the
optimal phase trajectory x(t) reaches the boundary of D for some t (see
Fig. 37; here the arc length is minimized, and the shaded part of the
plane does not belong to D). This leads to some essentially new facts, in
particular it becomes necessary to introduce a generalized function in the
Euler-Lagrang e equation, namely, the derivative of a measure which is not
absolutely continuous (or to replace this differential equation by an inte-
gral equation with Stieltjes' integral with respect to such a measure).
4.2.2. Formulation of the Maximum Principle. The Lagrange Principle
for the Optimal Control Problem. Let us again consider the problem
t,
:7 0 (x(·), u(-), t 0 , f 1 )=~f 0 (t, x(t), u(t))df+1Jl 0 (t 0 , x(t 0), tl' x(t 1 ))-..inf, (1)
t0

x=cp(t, x(t), u(t)), u(t)EU, (2)


t,
:/i(x(·), u(-), i 0 , i 1 )=) fdt, x(t), u(t))dt+
t,
+1Jldf 0 , x(t 0 ), ti, :t(f1 )):§:0, i= l, 2, ... , m. (3)
As in Section 4.1 .1, by the Lagrange function of this problem we mean
the function
218 CHAPTER 4

Fig. 37
t,
:l(x(·), u(·), t 0 , ti; p(-), A, A0)= ~ Ldt+l, (4)
lo

where
A0 E R, A= (Av ... , f..m) E R'"",
m
L (t, x, x, u) = ~ f..;{; (t, x, u) + p (t) (x-cp (t, x, u)), (5)
i=O
m
l (t 0 , X0 , f 1 , x 1) = ~ A;'\j);(/ 0 , X0 , f1 , X 1 ). (6)
i=O

The function p ( ·): [to, t1l -+ Rn* is assumed to be continuous. When


necessary, we shall suppose that the function p(·) is defined on a wider
interval than [to, t1l; to this end we extend it arbitrarily outside that
interval with the preservation of the continuity. Function p(·) and the
numbers Ai and Ao are called the Lagrange multipliers of problem (1)-(3);
function (5) is called the Lagrangian, and (6) is called the endpoint func-
tional. Finally, the function
m
H(t, x, u, p)=L.ix-L=prp(t, x, u)- ~ A;{;(t, x, u) (7)
i=O

will be called the Pontryagin function. As usual, we shall use the nota-
tion
lx(t)=Lx(t, x(t), x(t), u(t)), ~(t)=rp(t, x(t), u(t)),
lx" = lx" (fa, xu:). £;., x(i;.)),
etc.
THEOREM (Pontryagin Maximum Principle). Let G be an open set in the
space R x Rn, let W be an open set in the space R x Rn x R x Rn, and let U
be an arbitrary topological space. Let the functions f;: GxU-+R, i = 0,
1, ••• ,m, and rp: GxU-+R" and their partial derivatives with respect to x
be continuous in GxU, and let the functions '\j);: W-+R, i = 1, •.• ,m, be
continuously differentiable in W.
If (x(·), u(·), to, t1) is an optimal process for the problem (1)-(3),
then there are Lagrange multipliers

not all zero, such that:


a) the stationarity conditions and the minimum principle for the La-
grange function hold:
the stationarity condition with respect to x( ·) (Sx(·) =0)
d ' ' (8)
-Yt L~ (t) +Lx (t) = 0,
L; (i,)=(-J)kfxk' k=O, I; (9)
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 219

the minimum principle in the Lagrange form with respect to u(·):


L(t)s=L(t, x(t), ~(t), u(t))=minL(t, x(t), ~(t),v), ( 10)
VEU

or the maximum principle in the Hamiltonian (Pontryagin) form

H(t)=H(t, x(t), u(t), p(t))=maxH(t, x(t), v, p(t)), ( 11)


VE-lt

the function H(t) being continuous on the closed interval [to, t 1 ] ;


the stationarity conditions with respect to tk:
.itk=O, k=O, I; (12)
b) the condition of concordance of signs holds:
'A;~O; ( 13)
c) the conditions of complementary slackness hold:

( 14)
[as in the foregoing section, inequalities (13) mean that ~i ~ 0 if ';~0
in condition (3), \i.;;;; 0 if ';)'!0 in condition (3), and Ai may have an
arbitrary sign if ';=0].
The assertion about the existence of the Lagrange multipliers satis-
fying the set of the conditions a)-c) will be Qriefly called the Lagrange
principle for the optimal control problem (1)-(3) or the Pontryagin maximum
principle. As before, this assertion is in complete agreement with the
general Lagrange principle stated in Section 3.1 .5. Since the Lagrange
function~ is a function of the three arguments x(·), u(·), and (t 0 , t 1 ),
we have to consider the following three problems:

(a) ~(X(·), u(·), l •. tl; p(·), ~. X.)-inf,


(~) .!l(x(·), u(·), i •. ~; p(-), X, X.)--..inf,
('\') .Y(x(·), u(·), t0 , t 1 ; p(·), ~. X.)--..inf.
Here problems (a) and (y) are the same as in Section 4.1.1. Therefore, the
corresponding stationarity conditions with respect to x(·) and tk must also
have the same form as in the Lagrange problem, and this is really so. As
to problem (S), this is an elementary optimal control problem of Section
3.1.4, and the minimum principle (10) corresponds exactly to the condition
(1) of that subsection. Thus, the above formulation of the theorem does in
fact agree with the general Lagrange principle.
The stationarity conditions (8), (9), and (12) are usually written in
a different form: (8) is written as the Euler-Lagrange equation (alsocalled
the adjoint equation):
m
dp(t)jdi=-p(t)~x(f)+ ~ XJ;x(t); (Sa)
i=O

and (9) and (12) are written as the transversality conditions


p(t )+Lx, =0, -p (to)+Lx,=O
1 (9a)
and
(12a)
The corresponding transformations are carried out in the same way as 1n
Section 4.1.1 and we do not have to repeat them. At the points of discon-
tinuity of the control the right-hand derivative p~( ·) should be taken in
Eq. (Sa) (the left-hand derivative also exists). The condition of continuity
220 CHAPTER 4
of p(•) and H(·) at the points of discontinuity of the control [i.e., at
the corner points of x(·)] is referred to as the Weierstrass-Erdmann aondi-
tion.
Here it is also possible to carry out the analysis of the "complete-
ness" of the set of the necessary conditions given by the theorem we have
stated. However, it does not in fact differ from what was done for the
Lagrange problem. The distinction is that in the case under consideration
instead of finding u(·) as a function of x(·) and p(·) from the condition
Lu = 0 we must determine u(·) using the relation (10).
4.2.3. Needlelike Variations. As in the simplest situation considered
in Section 1.5.4, the basic method used in the proof of the maximum prin-
ciple which will be presented later consists in replacing the optimal con-
trol problem under consideration by a finite-dimensional extremal problem
or, more precisely, by a set of such problems. To this end we shall embed
the optimal process (x(·), u(·), to, tl) in a special family of controlled
processes, namely, a "packet" of needlelike variations, and consider the
restriction of problem (1)-(3) of Section 4.2.2 to this family.
The family of processes we need depends on the following parameters:
the initial data (to, xo) and the terminal instant t1;
a set a = (o. 1 , •.• ,o.N) where all a; E R are sufficiently small; for
brevity we denote o. = Eo.i;
a set t = (tl, ..• ,tN), where to < L1:;;; T2:;;; ..• :;;; tN < t1, the points
Ti (i = 1, .•• ,N) containing all the points of discontinuity of the
optimal control u(•);
a set v = (v 1 , •.• ,VN), where v;EU.
In what follows T and v
are fixed while the parameters (t1, to, xo, a)
change, and we shall differentiate various functions with respect to them.
Let us begin with N = 1. An elementary needlelike variation or, sim-
ply, an "elementary needle" corresponding to a pair (t, v) is determined
by the equalities
u (t; a, -r, v) = { ii (t), tH... -r+a),
(1)
v, tE[-r, -r+a).
Accordingly, a needlelike variation x(t; o., T, v) of the phase trajectory
x(t) is defined as the solution of Cauchy's problem
X=cp(t, x, u(t; a, -r, v)),
(2)
x(-r)=x(-r).
To within an insignificant distinction, we considered such a variation
in Section 1.5.4 (we took the half-open interval [T- A, t) instead of [t,
t + o.). It was found that the function x(t; o., t, v) possessed the right-
hand derivative with respect to o. for o. = 0 and that for t ~ T this de-
rivative y(t) = Xo,(t; 0, t, v) was the solution of the variational equation

(3)
with the initial condition
y (-r) = cp (-r, x(T), v)-cp ('t', x(-r), u('t')). (4)
As in Section 2.5.4, here and in the remaining part of the present
section we shall denote by n(t, t) the principal matrix solution of Eq. (3).
Moreover, for the sake of brevity we introduce a special symbol for the
right-hand side of equality (4):
6~ ('t', v) = cp ('t', X.('t'), v)-cp (-r, x(-r), u(-r)). (5)
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 221
Using this notation, we write
y(t)=xa.(t; 0, T, v)=!J(t, ,;)~~(,;, v). (6)
In Section 1.5.4 we managed to limit ourselves to the consideration
of one "needle" (a needlelike variation). Here we shall need a "packet"
of such variations because the problem under consideration is more complex
and we need a greater number of free parameters. A packet involves an
arbitrary finite number of needles with parameters (Ti, vi, ai), i = 1, •.• ,
N. Here it is important that some of these needles may only differ in the
values vi of the control for the same values of •i· Therefore, the needles
can no longer be specified by formulas of the form (1) because different
needles with the same 'i will overlap. To avoid this we shall shift the
N
interval of action of the i-th needle by a distance of ia=i ~a1 ; i.e.,
i=l

we shall take the interval 6i = [Ti + ia, 'i + ia + ail (it is evident that
if aj > 0 and aj are sufficiently small, then 6i do not overlap).
According to the assumptions made in Section 4.2.1, the optimal con-
trol u(·) is continuous from the left at the point tl and continuous from
the right at the point to. Let us extend~(·) outside the closed interval
[to, t1l with the preservation of the continuity [for instance, by putting
u(t) = u(to) fort< to and ~(t) = u(tl) fort> tll; in what follows we
shall not stipulate this again.
Let ai > 0 for all i = 1, ..• ,N. We define a needlelike variation .of
the control~(·) by means of the formulas

u(t; a, i,
_ {u(t), tE(t~-6. 4+6l""-j~~~;.
(7)
V)= [ N N )
V;, ~ E ~;= t;+i .~ a1, T;+i .~ a1 +a; .
1=1 i=l

The corresponding family of phase trajectories x(t; to, x 0 , a, T, ;)


is defined as the solution of Cauchy's problem
x=cp(t, x.• u(t; a, :r, v)), x(t.)=x•. (8)
For brevity, we shall write xo = x(to).
The Lemma on the Packet of Needles. 1) For sufficiently sn~ll Eo > 0
and 8 > 0, the solution of Cauchy's problem (8) such that
lt.-t~l<e 0 , jx.-x.i<E., 0<a1 <E0 , (9)
is defined for to
2) If to->- to, XO->- Xo, andaj + 0, then x(t; to, xo, a, T, v)-+ i(t)
uniformly on the closed interval [to, t1l·
3) The mapping (t 1, to, xo, a)
+ x(tl; to, Xo, a, T, can be ex- v)
tended to a continuously differentiable mapping defined in a neighborhood
of the point (th to, xo, 0), and, moreover,
ai;at 1 = xt, <il; r•. x., o;
:r, v) = cp(4, x (t,), u(t:)) = ~ (tl), ( 10)
ax;at.=xt,(i1 ; r..
x•. o, :r, v) =-r.~<4. t~)cp(t:. i .. u(t.))=-r.J<4. f.l~(t.), ( 11)
ax;ax 0 =Xx, (f1 ; f0 , Xo, 0, ""i", V)=!J(tl, [.), (12)
axjook = Xa.k (tl; £., Xo, 0, T; v) =!J (ti, to)~~ (Tk, Vk), (13)
where n(t, T) is the principal matrix solution of the variational system
(3) and ~~ (T, v) is determined by the formula (5) .
The proof of this lemma will be presented in Section 4.2.6. It is
clear that the lemma guarantees the differentiability of the endpoint terms
222 CHAPTER 4
involved in the conditions of the problem posed in Section 4.2.2. As to
the integral terms, we shall devote a separate lemma to them as in Section
1 .5.4.
The Lemma on the Integral Functionals. Let the functions fi(t, x, u)
satisfy the same conditions as in the formulation of the theorem of Section
4.2.2.
If u(t, a, T, v) is defined for ai > 0 by the formulas (7) and x(t;
to, xo, a,
T", v) is the solution of Cauchy's problem (8), then the functions
t,
F;<tl, t •• x •• a>= ~ ,, (t, x (t; t •• x ••
t,
a, -:t, v), u (t; a, :t, 'V)> dt,
i=O,I, ... , m,
can be extended to continuously differentiable functions in a neighborhood
of the point (tl, to, xo, 0) so that
aft;~at1 =Fu,(4, i0 , x., O)=ft(i;_, x(i;.), u(tl))=f;(tl), (14)
aft1;at. = Fu, (ii, fo, .x•• O) =
=-f;(fo, X0 , u(t~))+Pot(t:)rp(fo,X0 , u(fo))=-f1 (fo)+po;(t:)~(t:), (15)
aft;~ax. = F;x, (t t X.. 0) = -pot(i;,),
10 0, (16)
aff;1aa" = F ta." (t:, t:,
X0 , 0) = dft (-r", v") -poi('t'") d~ (""' v"), ( 17)
where Poi(T) is the solution of Cauchy's problem
dpo;('t')/d't'=---.,.pot('t')rp,.('t', x('r), u(-r))+f;,.('t', x(-r), u('t')),
(18)
Pot(i;.) =0
for the homogeneous linear system adjoint to the variational system (3),
and ~fi(T, v) is determined by a formula analogous to (5). The proof of
this lemma will be presented in Section 4.2.7.
4.2.4. Reduction to the Finite-Dimensional Problem. The graph of
the optimal phase trajectory x(t) lies in the domain G and, by virtue of
the second assertion of the lemma of Section 4.2.3 (on the packet of
needles), the graph of the solution x(t; to, xo, T, of Cauchy's a, v)
problem (8) of that same section also lies in this domain for t€[1 0 , td
if for a sufficiently small co inequalities (9) of Section 4.2.3 hold.
Hence the quadruple (x(·; to, xo, a, T, v), u(·; a, T, v), t 0 , t 1 ) is a
controlled process for problem (1)-(3) of Section 4.2.2.
Let us put
'l';(tl, t0, x•• a)='¢;(t •• x•• tl, x(tl; to, x•• a, :r, v)), (1)
t,
PI (tl, t •• x•• a)=~ ft (t, X (t; t.,
t,
Xo, a, T, v), u (t; a:. :r, 0) dt,
i=O, l, ... , m. (2)
By virtue of the third assertion of the lemma on the packet of needles and
according to the composition theorem, the functions (1) are continuously
differentiable in a neighborhood of the point (t 1 , t 0 , i 0 , O). The same
property of functions (2) follows from the lemma on the integral func-
tionals of Section 4.2.3.
Now we can consider the finite-dimensional extremal problem, which is
the restriction of problem (1)-(3) of Section 4.2.2 to the family of con-
trolled processes we have constructed:

(3)
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 223

1dt1 , t0 , x0 , a)= F 1 (tl, to, xo, a)+ w1 (t 1 , t0 , x0 , a):§: o·, (4)


i=l, 2, ... , m,
ak >
0, k =I, 2, ... , N. (5)
The point (tl, to, io, 0) is a local solution of this problem. Indeed, if
the constraints (4) hold, then the quadruple (x(·; to, x 0 , ~. ; , v), u(·;
~. T, v) is an admissible controlled process for problem (1)-(3) of Section
4.2.2. If eo in inequalities (9) of Section 4.2.3 is sufficiently small
then, by virtue of the second assertion of the lemma on the packet of
needles, inequalities (4) of Section 4.2.1 hold. Consequently, inequality
(5) of Section 4.2.1 hold; whence
lo(tl, to, Xo, a)=.'J.(x(·; to, Xo, a, :r, V), U(· Ia:, :r, v), to, tl)>
> '0 <x o. ii <. ), to, 'il) = 10 <'il, 'i•• x., o),
which means that (tl, to, io, 0)
is a local solution of problem (3)-(5).
Applying the Lagrange multiplier rule (Section 3.2) to problem (3)-
(5), we arrive at the following assertion:
LEMMA (Lagrange Multiplier Rule for the Auxiliary Finite-Dimensiona l
Problem). There exist Lagrange multipliers
1 0>0, i=(~.l' ... , J:m)ERm•, ~=(Ill• ... , llN), (6)
not all zero, such that for the Lagrange function
m N
A (t 1 , t0 , X0 , rx; A., ll• A.0 ) = L i=O
A.; (F 1 + 1¥1) + L
k=l
llkak (7)

the following conditions hold:


1) the stationarity conditions
m m
At,= L i;Fu, +L i;'Irit, =0, (8)
i=O i=O

r. 7:; it,=
m m
At,= L i;Ftt, + \jr 0, (9)
i=O i=O
m m
Ax,= L J:}';x, + LJ:;'irtx, = 0, ( 10)
i=O i=O
m m
Aa.k = L.7:lta.k + L 1:;Wia.k +!tk=O, ( 11)
1=0 i=O

where the notation A~k = A~k(t1, to, io, 0; ~. p, ~o), etc. is used;
2) the conditions of concordance of signs
i1 ~0, ~k~O; i=l, 2, ... , m; k=1, 2, ... ,N; ( 12)
3) the conditions of complementary slackness
,., ........... - .............
A.1 [F 1 (t 1 , t0 , x0 , O)+lfdt 1 , t 0 , x0 , 0)]=0,
-
( 13)
i=l, 2, ... , m.
4.2.5. Proof of the Maximum Principle. With the exception of one
passage which, to a certain extent, is not trivial, the remaining part of
the proof will be devoted to the interpretation of conditions (8)-(13) of
Section 4.2.4. The maximum principle itself (or, in the Lagrange form, the
minimum principle), i.e., relations (11) [or (10)] of Section 4.2.2, is
then obtained from equalities (11) of Section 4.2.4, i.e., from the condi-
tions A~k = 0. Each of those conditions corresponds to one of the needles
224 CHAPTER 4
included in the "packet," and if this needle is specified by a pair (•k•
Vk), we obtain the inequality H(•k· x(Tk), Vk· ·) ~ H(•k· x(Tk), u(Tk), ·)
from this condition. Hence, to derive the maximum principle we have to
consider the needles corresponding to all the possible pairs (T, v) while
the "packet" contains an arbitrary but finite number of needles. This dif-
ficulty is overcome with the aid of a simple topological consideration
based on the compactness. Using the lemma on the "centered system of sets,"
we can choose "universal" Lagrange multipliers applicable to the whole set
of needles.
A) Existence of the "Universal" Lagrange Multipliers. Let us study
in greater detail conditions (8)-(13) of Section 4.2.4 into which we shall
introduce the values of the derivatives of the functions Fi and ~i calcu-
lated with the aid of the two lemmas of Section 4.2.3; particular attention
will be paid to the terms involving the parameters (•k• vk) of the needles.
The derivatives Fi are given in the lemma on the integral functionals,
and denoting
AJc, v)=f;('r:, x(-c), v)-f;(-c, x(-c), (-c))- u
-po1(-c)[<p(-t, x(-c), v)-<p('t, x(-c), u(-c)J=L\f(-c, v)-po;('t)L\~('t, v), (1)
we see that
F;ak =Ai('tk, vk),
and none of the derivatives Fito• Fit 1 , and Fixo involves the parameters
of the needles, and hence these derivatives have the same values for any
"packet of needles" of Section 4.2.3. Similarly, the derivatives of the
functions
lJ'i (tl, to, Xo, a)= '¢;(to, Xo, tu X (tl; to, Xo, CX, :t", v))
are calculated with the aid of the composition theorem and the lemma on the
packet of needles, the parameters of the needles being involved only in
~iak· If we denote

(2)
then
if;a11 = B; ('tk, vk),

and the values of the derivatives ~ito• ~it 1 , ~ixo are the same £Qr any
"packet of needles."
Now we have
m m m
Aak = l; f.;fi;a 11 + l; liW;ak +~k= l; 1,(A;(-c11 , vk)+B;-('t'k, vk))+~k·
i=O i=O i=O

According to conditions (12) and (11) of Section 4.2.4, we have Pk ~ 0 and


Aak = 0, and therefore the inequalities
m
l; l,[A;('t'k, v11 )+B;(,;k, vk)J;;;;;:O (3)
i=O

must hold.
Now let us consider the following sets in the space R(m+l)*:

m
S ={(A, A0 ) I l; '-' = I},
l=O
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 225

m
K('r, v)={(A,, A0 )i ~ A,;(A;('r, v)+B;(T, v))>O},
i::;;O
m
T 1 ={(A,, A0 )1_~ A;(Fu1 + 'i'u 1)=0}, l=O, I,
~=0
m
X={(A,, A,o)i ~ A;(Ftx,+lftx,)=O},
i=O

and, finally, the set Z consisting of those (A, Ao) for which AO ~ 0 and
Ai, i = 1, ... ,m, satisfy the conditions of concordance of signs and the
conditions of complementary slackness (12) and (13) of Section 4.2.4 which
are identical with the conditions (13) and (14) of Section 4.2.2.
All these sets are closed, and, moreover, the sphere S is compact
since it is a bounded closed subset of the finite-dimensional space R(m+l)*.
The stationarity conditions (8)-(10) of Section 4.2.4 correspond to
the inclusions ().,, l.) E T,, T 0 , X, and the conditions of concordance of signs
and the conditions of complementary slackness correspond to the inclusion
(l, X.)EZ (all these conditions are the same for all packets of needles).
~onditions (11) of Section 4.2.4, which depend on the needles, hold when
(~. ~0 )EK(Tk, vk), k = 1, ... ,N. Finally, since the Lagrange multipliers are
determined to within a proportionality factor and are not all zero, they
can be normalized so that (i:, i:.)ES. Therefore, the assertion of the lellffila
of the foregoing section means that
N
snr.nrinxnzn n K(Tk, vk)=l=0. (4)
k=1

Let us consider the following system of closed subsets of the compact


set's:
R(T, v) = s n T. n T, n X n z n K (T, v), (S)
T E rt.. i,J. v E u.
Lemma on the Centered System of Sets. The intersection of the system
of sets K(T, v) is nonempty.
Proof, According to (4), the intersection of any finite collection
of sets belonging to the system K(T, v) is nonempty. Such a system is said
to be centered. By the well-known theorem [KF, p. 99], the intersection of
a centered system of subsets of a compact set is nonempty. •
Thus, there exist
(l, ~.)E n K(T, v)=Snr.nr,nxnzn n K(T, v). (6)
H (1,, t.J, H [7,, i'.J
oeu ven

These are the sought-for "universal" Lagrange multipliers. Indeed, the


multipliers ~i are not all zero((~. i:0 )ES)., satisfy the conditions of con-
cordance of signs and the conditions of complementary slackness ((~. ~.) EZ),
satisfy the conditions Aq = 0, l = 0, 1, and Ax,=O ((l, ~.)ET 0 nT,nX),
and, finally,
m
~ )., 1 [Ad~. v)+B 1 (T, v)]>O (7)
i=O

for all t E[t0 , i,], v EU.


B) Derivation of the Maximum Principle. Substituting (1) and (2) into
inequality (7), we bring it to the form
226 CHAPTER 4

=f('r:, x('t), v)-f('t, x('t), u('t))-p('t)[<p('t, x('t), v)-cp('t, x('t), u('t))J=

=-H('t, x('t), v, ii<'t))+H(-;, x('t), u('t), i<'t>>. <s)


where
m
f ('t, x, u) = L, ~;/; ('t, x, u), (9)
i=O

( 10)

is a solution of the Euler-Lagrange equation (Sa) of Section 4.2.2 (as will


be seen later), and
H('t, X, u, p)=pcp('t, X, u)-f('t, X, u) (11)
is the Pontryagin function [cf. (7) in Section 4.2.2].
Inequality (8), which holds for all 'tE [i'.. t:J and VE U, is equivalent
to the maximum principle (11) of Section 4.2.2:
H ('t, x('t), u('t), p('t)) =max H ('t, x('t), v, ji ('t)). ( 12)
C) Derivation of the Stationarity Conditions with Respect to x(·).
Differentiating (10) and taking into account that Poi(<) satisfy Eqs. (1S)
of Section 4.2.4 and

(see the theorem of Section 2.5.4), we obtain


m
dp('t)/d't= Li=O
~;{-poi('t)~.. ('t)+f;x('t)}-
m m
-L ~i ~1Jli Q(tl, 't)~.. ('t)=-p('t)~.. ('r:)+ L ~Jix('t) =-p('t)~.. ('r:)+fx('t);
~

i=O Xl . i=O

i.e., p(·) is a solution of the Euler-Lagrange equation (Sa) of Section


4.2.2.
Further, (10) directly implies that
m m .... m .... ...
P(tl) = L ~;Poi(tl)- L J.; ~~;
i=O i=O 1
g {tl, tl) = - L ).; ~~;
i:=O 1
= - :: ,
I
( 13)

since Poi(tl) = 0 by virtue of formula (1S) of Section 4.2.3, and n(t, t) =


E. Here, as in formula (6) of Section 4.2.2, the expression
m
l (t 0 , X0 , f 1, X1 ) = ~ ~;\jl; (t 0 , X0 , t 1 , .x1 ) ( 14)
i=O

is an endpoint functional, and equality (13) we have obtained 1s equivalent


to the first equality (9a) of Section 4.2.2.
Finally, the condition Ax 0 = 0, i.e., relation (10) of Section 4.2.4,
after the values of the derivatives (12) and (16) given in Section 4.2.3
have been substituted into it, yields
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 227

and hence the second transversality condition (9a) of Section 4.2.2 also
holds.
D) Derivation of the Stationarity Conditions wi-th Respect to ti. After
the values of the derivatives expressed by formulas (10), (11), (14), and
(15) of Section 4.2.3 have been substituted into the conditions At 0 = At 1 =
0 [conditions (8) and (9) of Section 4.2.4], we take into account the nota-
tion (9)-(11) and equality (13) (which has already been proved) and obtain

and
m m
O=At,= i=O
~ 1;Fu.+ ~ }.-;'Yu,=
1=0

::J
m m ~

= {;.0 1; [-f;(to) + poi(fo) ~ (t0)] + f;o 1; [1),u, +1),1x, =


m
=-f (to)+ 1~0 1;po; (to)~ (t +ft.+ z:, [-:-Q (ti, t:) ~ (t0)J =.
0)

= - f (t + P(t <P (t + lt,:: fi (t + Zt,•


0) 0) 0) 0)

This proves the two formulas (12a) of Section 4.2.2.


E) Continuity of the Function H(·). Since the control u(t) is contin-
uous from the right, so is the function

if<t>=H(t, x<t>, ii<t>. p(t>>=-t<t. i<t>. u<t>>+.U<t}cp(t, x(t), u<t>>=


m
=-~ t;t;(t, x<t>. u<t»+.O<t>cp(t, x<t>. u<t>>.
1=0

Horeover, each of the functions H(t, x(t), v, p(t)) is continuous with re-
spect to t, and passing to the limit in the inequality
H(t, x(t>. v, p(,;)):;:;;;;.n(t, i<t>. u(t), /J<t>>=il<t>. (15)
we obtain

whence
if(,;)= supH (,;,
ven
x(t), v, p (,;)):;:;;;;. H(-c- 0). (16)

On the other hand,


if (-c-O)=t....lim H(t, x(t), u(t), p(t)) =
~-0

=H(-c, x(-c), il(-c-o), p(-c))= lim H(-c,x(-c), v, P'<-c>>:;:; ; .


v-+UCT- o)
:;:;;;;,H (-c, X(t), u(-c), p(-c))= if (-c)
by virtue of the same inequality (15). The last inequality and (16) imply
H(T- 0) = H(T), i.e., H(t) is continuous from the left as well. •
4.2.6. Proof of the Lemma on the Packet of Needles. We remind the
reader that the needlelike variation x(t; to, xo, T, of the optimal a, v)
phase trajectory i(·) is the solution of Cauchy's problem
228 CHAPTER 4

X=cp(t, X, u(t; a,:;, V)), ( 1)


X (to)= Xo, (2)
where the variation u(t; a, T, v) of the optimal control u(t) is specified
for aj > 0 by the formulas
vi, tE~i=[Ti+ia, Ti+ia+a 1),
u(t; a, :r, v)= { u(t), ~~ l)~i; a=~ ar
N
(3)
' 1"1

It is convenient to split the proof of the lemma on the packet of needles


(the lemma of Section 4.2.3) into two parts. In the first part, which is
stated as a separate assertion, we prove the uniform convergence of the
varied phase trajectories to i(·). Here it is not the piecewise continuity
that plays the key role but the integral convergence of the corresponding
right-hand members of the differential constraint [see formula (5) below].
To stress this fact we extend the assertion to the case of measurable vari-
ations as well (this is preceded by the statement of an appropriate defini-
tion of measurability for the case when the controls assume values belong-
ing to an arbitrary topological space). The second part of the proof of
the lemma on the packet of needles is directly connected with the piecewise
continuity and is based on the classical theorem on continuous differenti-
ability of solutions with respect to the initial data.
Definition. Let U be an arbitrary topological space, and let I be a
closed interval on the real line. A mapping u: I is said to be mea-_,.u
surable (in Lebesgue's sense) if there exists a finite or a countable se-
quence of pairwise disjoint measurable subsets An such that:
a) the restriction ulAn can be extended to a function continuous on
the closure An;
b) the set I"'- U An is of Lebesgue's measure zero.
n
It is evident that every piecewise-continuous mapping is measurable
in the sense of this definition (in this case we take the intervals of con-
tinuity as the sets An). It is also readily seen that if Uc:R', then a
function measurable in the sense of this definition is measurable in the
ordinary sense as well [KF, Chapter 5, Section 4].
Indeed, let un(·) be a function continuous on An and coinciding with
u(·) on An· Putting un(t) = 0 outside An, we obtain a measurable function
on R. The characteristic function XAn(·) of the set An is also measurable,
and since u(t)·=~XAn(t)un(t) almost everywhere on I, the function u(t) is
n
also measurable.
Exercise. Prove that if U c: Rr arid u: 1-+ U is a function measurable in
the standard Lebesgue sense, then it is also measurable in the sense of the
above definition. Hint: use Luzin's C-property [KF, p. 291].
In the formulation of the next lemma, 2( is an arbitrary Hausdorff
topological space [KF, p. 95].
LEMMA 1. Let a function <p and its derivative <Jlx be continuous in
GxU, and let x:[to, td-+ Rn be a solution of the differential equation
x=cpu. x, u(t)), (4)
and let its graph lie in G: f={(t,x(t))li;,~t~t1 }c:G.
Further, let cSo > 0, and let Ua: (f0 -6 0 , ~+«'1 0 )-+ U be a family of mea-
surable controls such that the values of ua(t) belong to a compact set
X'0 c:U for all t and for all a E ~ .
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 229

Finally, let ~(t) :: ua(t) for some aE~(, and let


1t +Oo
lit~ ~ l<p(t, x(t), Ua(t))-<p(t, x(t), u(t))idt=O. (5)
a--1-cx 70 -6o

Then for some 8>0 the solution Xa(t, to, xo) of Cauchy's problem
X={p(t, X, Ua(t)), X{t 0 )=X0
is defined on the closed interval [to- 8, t1- 8] for all (t 0 , xo, a) be-
longing to a neighborhood of the point (to, x(to), &) in Rx Rnxm, and,
moreover,
X a (t, t 0 , x 0 )-+- X(t)
uniformly with respect to tE[t0 ,t1 J for to+ to, xo + x(to), a+ a.
Proof. Let us put
Fa(t, X)=<p(t, X, Ua(t)) (6)
and apply Theorem 2 of Section 2.5.5 to the case under consideration.
A) If An are those sets which are involved in the definition of mea-
surability applied to ua(·), then, for a fixed x, the functions t-qJ(t,x,
ua(t)) are continuous on An, and consequently t + Fa(t, x) are measurable
functions and the assumption A) of Section 2.5.1 holds. For a fixed t the
functidns Fa, along with <p, are differentiable with respect to x, and
hence the assumption B) of Section 2.5.1 also holds.
B) For any compact set Xc:O the functions <p and <p_. are continuous
on the compact set XXX 0 and consequently they are bounded. Denoting
M = max I <p (t, x, u) I,
;n-xe%o
we see that the condition C') of Theorem 2 of Section 2.5.5 with x(t)="=M
and k(t) :: Mx holds because Ua(t)E'Xo by the hypothesis. The condition D)
of the same theorem coincides with (5).
Thus, in the situation under consideration Theorem 2 of Section 2.5.5
is applicable, whence follows the required assertion. •
Proof of the Lemma on the Packet of Needles. A) Let us verify that
the conditions of the foregoing lemma hold. Their part related to <p and
x(·) is involved in the conditions of the theorem on the maximum principle.
Above we already assumed that the control u(·) was extended outside
the closed interval [to, t1l so that it is a continuous function fort~ to
and fort~ t1. Therefore, we shall regard u(·) as a function defined on
the closed interval~= [to- oo, tl + oo], oo > 0, its points of discon-
tinuity lying within the open interval (to, t1).
Let t(l) < t(z) < ••. < t(s) be these points of discontinuity, and let
t( 0 ) =to- oo and ts+l = t1 + oo. The function

u-(t)=
f u(t)
'
tU-t>::;=:t<t<i>
""" '
· t uW>-o), t=t<i)

is continuous on the closed interval Ii = [t(i-l), t(i)], and the image


X; of this interval under the continuous mapping Ui: I-+- U is compact in
'U since Ii is a compact set. According to formulas (3), the values of the
control u(t; a,T, v)
belong to the compact set
s+l
X 0= U X;U{V 1 , ••. ,vN}·
i=l
230 CHAPTER 4

It remains to verify the condition of integral convergence (5). The


continuous function 1p is bounded on the compact set
X'={(t, x(i), u)J tE!l, uEX'o}=fX%"0 ,
i.e., J~pi~M. For O.j > 0 so small that the half-open intervals lij do not
intersect we conclude from the formula (3) that
I, +6o ~ _ _ _ ~ ~
~ J~p (t, x (t), u (t; a, T, v))-1p (t, x (t), u (t)) Idt =
1,-6,
N
x(t), v1) x(t), u(t))
N
= ~~ I'P (t, -lp (t, J dt ~2M -~a1 =2Ma--+ 0
/=1 A/ /=1

for a.j ~ 0, and hence condition (5) holds.


Applying Lemma 1, · we see that the first two assertions of the lemma on
the packet of needles of S~ct~on_4.2.3 hold: for sufficiently small Eo > 0
the solution x(t; to, xo, a., T, v) of Cauchy's problem (1), (2) with (to,
xo, a.) satisfying inequalities (9) of Section 4.2.3 is defined on li = [to -
~, t 1 + 8] (this is the first assertion), and
X (t; to, Xo, a, :r, v)--+ X(t)
uniformly with respect to tE[t0 , t1] for to+ to, xo + xo, O.j ~ 0 (the second
assertion).
B) Let us proceed to the consideration of the mapping (t 1 , to, xo,
a)+ x(ti; to, T, a, v).
As usual, we denote by X(t, to, xo) the solution
of Cauchy's problem
u
X= 1p (t, x, (t)), x(t 0 ) =X0 • (7)
By Theorem 1 of Section 2.5.6, function X(·, ·, :) is defined and continu-
ous in the domain (to- oo, t1 + oo) x G, where G is a neighborhood of the
graph r of the solution x(·). If the domain is narrowed so that to and t
do not pass through the points of discontinuity t(i) of the control u(·),
then the differential equation (7) satisfies the conditions of the classi-
cal theorem on the differentiability of the solution with respect to the
initial data (Section 2.5.7), and therefore the function X(t, to, xo) is
jointly continuously differentiable with respect to its arguments. In par-
ticular, X is continuously differentiable in some neighborhoods of the
points (tk, tk, x(tk)) since, by the hypothesis, the control u(•) is con-
tinuous in some neighborhoods of the points tk.
Further, let us denote by =<~. a, T, v) the value at t = tl of the
solution of Cauchy's problem for Eq. (1) with the initial condition x(t 0 )
~. i.e.,

s (s, ;, :r, v) =x(t1, la. ;, a, :r, v). <s>


According to formula (13) of Section 2.5.5 and the formulas (3) speci-
fying the control u(t, a, T, v), we have

(9)
[in the interval from to to to the equation X=~p(t, x, u) is solved for the
control u = ~(t), from to and t 1 it is solved for the control u = u(t; a.,
T, u), and, finally, from t 1 to t1 it is again solved for the control u =
u(t)], It is important to note that, like the formula (13) of Section
~.5.5, formula (9) is valid for any location of the points ti relative to
ti·
Now if we manage to extend the mapping (~. a) + =<~. a, T, to a
mapping of class C1 defined in a neighborhood of the point (x(t 0 ), 0) (i.e.,
v)
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 231

if we manage to extend it to negative aj with the preservation of the con-


tinuous differentiability) then, accord1ng to formula (9), the mapping (tl,
to, xo, u) + x(tl; to, xo, a, ~. v) will also be extended to a neighborhood
of the point (tl, to, x, 0), and, by the theorem of Section 2.2.2, it will
be continuously differentiable since it is a composition of three contin-
uously differentiable mappings: (to, xo) + X(to, to, xo), (~, u) +=<~,a,
~. v), and (tl, no)+ X(tl, t1, no).
C) Let -rE(t0 , t1 ) and vEU be fixed. We shall denote by Yv(t, to, y 0 )
the solution of Cauchy's problem
y=cp(t, y, v), y(to)'==Yo· (1 0)
By virtue of the local existence theorem (Section 2.5.2) and the classical
theorem on the differentiability of the solution with respect to the ini-
tial data (Section 2.5.7), the solution Yv(t, to, yo) is defined and con-
tinuously differentiable in a neighborhood of the point (T, T, x(T)),
If T is a point of discontinuity of the control u(·), i.e., u(T- O) ~
u(T), then, along with u(·), we shall consider the control

ii<t>={ ~<t>. t~-r.


u(-r), t < -r.
Since u(·) is continuous in a neighborhood of the point T, the solution
X(t, to, xo) of Cauchy's problem
X=cp(t,x,u(t)), X(t 0 )=XOI (11)
is also defined and continuously differentiable in a neighborhood of the
point (<, '· x(T)). In case T is a point of continuity of~(·), the solu-
tion X(t, to, xo) itself possesses the same property, and we can put X = X
in the further formulas.
Let us add the points To= to and 'N+l = t1 to the set T = (Tl,···•TN)
so that to To < Tl ~ ... ~ TN < TN+l = t1 and consider the composition of
mappings
8 (~.a, .r, V)=P 0 XN 0 lNO ... oXA 0 z}l 0 Xll-1 o.: .o z1 0 Xo, (12)
where P(~, a) = ~.
( 13)
and
Zk(;, ci)=(X('t11 , -r,~:+ka+a11 , Yvk(-r 11 +ka+ak, -r11 +ka,
(14)
X (Til+ ka, Til,~))), a).
Since Xk and Zk are continuously differentiable in a neighborhood of the
point Pk = (x(Tk), 0), and we have Xk(Pk) = Pk+l• Zk(Pk) = Pk• and Pis
linear, composition (12) is defined and continuously differentiable in a
neighborhood of the point po(i(to), 0) = (xo, 0).
LEMMA 2. If all ak are positive and sufficiently small, then
8 (~.a, :r. 0=8(6, a, :r, V)=x(~. to,~.
(15) a, .r, V).
Proof. For the sake of brevity, we denote x(t) = x(t, to,~. a, T,v).
If all ak are positive and sufficiently small, then the half-open intervals
~i involved in definition (3) are pairwise disjoint and are arranged be-
tween to and t1 in the ascending order of their indices.
N
Let s;~'t;+ia=T;+i~ aj be the left end point of the half-open inter-
i=1
val ~i• and let so =To = to and sN+l = TN+l = t1. We shall prove by
232 CHAPTER 4

induction that
( 16)
where E;o E; and

~k=PoXk_ 1 oZk_ 1 oXk_ 2 o ... oZ1 oX 0 (~,a), k?;:l, ( 17)

and, in particular, ~i =Po X 0 (~, a)= X (T1, T0 , ~) • Since the differential


equations in (7) and (1) coincide on the half-open interval to = To ~ t <
s1 and x(To) = E; = X(To, To, E;o), we have x(t) = X(t, To, l;o) on that in-
terval. Passing to the limit for t + s and taking into consideration the
identity X(sl, To, E;o) = X(sl, T1, X(Tl, To, E;o)), we obtain (16) fork= 1.
Let us suppose that (16) holds for some k. By virtue of (3), to pass
from Sk to Sk+l along the solution x(·) the following operations must be
performed. On the interval from Sk to Sk+l we solve Eq. (10) with v = Vk
and the initial condition yo = x(sk), which results in x(sk + ak) = Yvk(sk +
ak, Sk, x(sk)). Further, from Sk + ak, i.e., from the endpointof llk, to
Sk+l• i.e., to the initial point of llk+I• we solve Eq. (7) with the initial
condition xo = x(sk + ak), which yields the equality
x (sk+r) =X (sk+i• sk +ak, x (sk +ak)) =X (sk+r• sk +ak, Yvk (sk +ak, sk, x (sk)).
According to formula (13) of Section 2.5.5, the identity
X (t, T, X (T, t0 , X0)) =X (t, t 0 , X0 )

holds. Using this identity and equality (16), which holds by the induction
hypothesis, we consecutively obtain
x(sk+ 1 )=X (sk+r• Tk+r• X (Tk+I> Tk, X (Tk, sk+ak, x (sk+ak))))=
=X.(sk+r• THt• X (Tk+r• Tk, X (Tk, sk +ak,
Yvk (sk +ak, sk, X (sk, Tk, ~k))))). ( 18)
Now we note that if all ak are positive, then Tk < Sk < Sk + ak, and since
Eqs. (7) and (11) coincide fort~ Tk, we have X(Tk, Sk + ak, E;) = X(Tk,
sk + ak, E;) and X(sk, Tk, E;k) = X(sk, Tk, E;k). Therefore, using first the
definitions (13) and (14) and then (17), we obtain
X (Tk+r• Tk, X (Tk, sk +ak, Yvk (sk +ak, sk, X (sk, Tk, ~k)))) =
=Po Xk o Zk (~k• a)=P o Xk o Zko Xk_ 1 o ... oX 0 (~,a) =~k+i·
It follows that (18) coincides with (16), where k is replaced by k + 1, and
hence (16) holds for all k ~ 1. It remains to note that fork= N + 1 re-
lation (16) results in the equalities
x(tr)=X(tl, fr, ~N+r)=GN+l = p 0 XN oZNo ... 0 X~ (s, a)= g (s, a, T, u)
[the last equality follows from (12)], and since, by definition, we have
x(t 1 ) = =<t;,a,
T, v),
relation (15) is proved. •
As was already mentioned, composition (12) is continuously differenti-
able in a neighborhood of the point (x 0 , 6). Consequently, function =<t;,
a, T, v) (which was earlier defined only for aj > O) admits of a continu-
ously differentiable extension to that neighborhood. According to Sec-
tion B), it follows that the function x(tl, to, xo, a, T, v) admits (for
fixed i and v) of a continuously differentiable extension to a neighborhood
of the point Ct1, to, xo, 6).
It only remains to prove formulas (10)-(13) of Section 4.2.3 for the
partial derivatives of this function at the point (t 1 , to, 0 , 6). x
D) First of all we note that

( 19)
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 233

x (t, t., x(t.>. o. ;, v) == x (t, t •• x(t.n=x <t>. (20)


x<ii, fo, x•• Ci, :r, v) == s <x., -a, :r, v). (21)

According to (19) and (21), the partial derivative with respect to x 0 can
be found using the theorem of Section 2.5.6:
iJx iJx ~ ~ - - -
-0 =-a (t 1 ,t 0 ,x 0 ,0,-r,v) x, =x(lo)
Xo Xo
~~
I aE (xo,a 0. T. V)
Xo
I ~~ =
x, =x(/ 0 )
_aX(tl>to,Xo)l -Q(t t) (22)
- ax-0 Xo=x,(l,)- 1' 0 '

where n(t, T) is the principal matrix solution of the variational equation


i= q>_. (t, x(t), u(t)) z
[in formula (2) of Section 2.5.6 we must put F(t, x)=q>(t, x, u(t)) and take
into account (20)]. This proves relation (12) of Section 4.2.3.
Now we shall make use of formula (9) and the fact that, as was said
in Section B), the derivatives of the function X(t, t 0 , x 0 ) in the
neighborhood of the points (tk, tk, x(tk)), k = o, 1, can be calculated
with the aid of the classical theorem on the differentiability with re-
spect to the initial data [formulas (3)-(6) in Section 2.5.7 where again
'F(t, X)=q>(t, X, u(t)]. Let us differentiate (9) taking into account formula
( 22) which we have already derived and also the fact that, according to (20) and
(21)'
s <X <l•. 1•. x<i.>>· o, :r. v) = s <x <t.>. o, ;, 0 = x <il, l•. x<i.>. o, ;, 'V> =x <il>·
This results in
x
aX;at 1 =aX(tl, i 1, (il))Jatl = q> (tl, (ll), 01n. x u
ai;at. =as ('6, o, :r, v);as l~=x<l.> ax (i•. t., x(t.);at. lt,=l. = g (ll, 7.)[-q> (l •• x(1.). u(/;,))].
We have thus proved relations (10) and (11) of Section 4.2.3.
Now let ai = O, i ~ k, and Ok > 0. For the sake of brevity, we shall
denote
x(t, a.")=x(t; t., x(t.), (0, ... , 0, a.", ... , 0), :r, v).
According to (3), for this choice of~ the differential equations (1) and
(7) coincide in the half-open interval to~ t < Tk + kak and x(to, ak) =
x(to); therefore x(t, ak) = x(t) in that interval, and passing to the limit
we obtain
(23)
Further, for Tk + kak ~ t < Tk + kak + ak we solve the differential equa-
tion (10) with the initial condition (23) and v = Vk• whichresults in
x(-rk+ka."+ a.", a.") = Yv" (-r"+ka."+a.k, -r"+ka.",:i:('tk+ka.k)). (24)
In the remaining interval Tk + (k + 1)ak ~ t ~ t1 we return to Eq. (7) for
which, however, the initial condition (24) is taken. We shall split this
interval into two by fixing s so that the interval (Tk, s) does not contain
points of discontinuity of u(·). We shall assume that the quantity ak > 0
is so small that Tk < Tk + (k + 1)ak < s. We have

~(t1 , a")=X (7i, s, x(s, a.")) =X (t1 , s, X (s, -r"+(k+ l)a... , x(-r"+(k+ l)a", a")))""'
=X(t1 , s, X(s, -r"+(k+l)a.", Yv"(t"+(k+l)a", -r"+ka", x(-r"+ka"))).

When differentiating the last formula, we must take into account the follow-
ing equalities:
234 CHAPTER 4

ax ~
iJxo (tl, s,
/
Xo) x,=x(s,
~
O)=;(s) = Q (tl, s)

[see equality (1) of Section 2.5.6; t1 and s are fixed];

~: (s, 10, x ("C,., 0)) I,=-r,. x('t u('t


I -
1 = - Q (s, "'") cp ('t11 , 11 ), 11 )),
o ax
oro (s, "'"' Xo) x,= .. <-r,.. o> =x <-rk> =Q (s, "'")
(see equalities (5) and (6) of Section 2.5.7; the classical theorem is
applicable because the control u(·) is continuous on the closed interval
[<k, s], and we have 'k < 'k + (k + 1)as < s);

iJYt,
a/ (t, "'"'
I~
x ('t,.)) 1 =-r,. = cp ('t,.,
~
x ('t,.), v11),
a~ ('t,., to, - ('t,.)) It,=-c"
iJt" X
~
= - cp ('til• x ('til), v...),
o iJY
iJYo
vk ("C,., "'"' Yo)=E

[see formulas (4)-(6) of Section 2.5.7 in which we must put F(t,X)i=cp(t,x,v,.)


and take into account that n(t, t) = E for any t]; and, finally, (7) im-
u
plies X('t,.) = cp ('t,., X('t,.), ('t,.))
·Thus,

ax =!.!__
aa; aa,. (t:. a.,.) I a:,.=o
= OXo
ax (f1 , s, xo)
.
I ~ {ax
Xo=< (s) at. x(1:~)) (k + 1) +
(s, 1:",

ax - [aY
+ 0~0 (s, "'"'X ('t,.)) at ("C,., "'"'X~ ('tk)) (k +I)+
iJY ~ iJY ~ ..: ]} -
+~to ("'II• "'"' X("C,.))k+ ay'J.. ("C,., "'~ x('t,.))i('t")k ==
=Q(ti, s){-Q(s, "C,.)cp('t11 , X('t,.), U('t,.))(k+l)+
x
+ Q (s, "'") [cp·('t,., ('t,.), v") (k +I) -cp ('t,., ('t,.), v11 ) k + x
x u x
+ cp ("C,., ('t,.), ('t,.) k]} = g <4.~ "'") [cp ("'"' ('t,.), v,.)- cp ("'"' ('t,.), ('t,.))J, x u
which proves relation (13) of Section 4.2.3 [the equality nCtt, s)n(s,
Tk) = n(L, 'k) holds by virtue of the basic property of the principal ma-
trix solution expressed by formula (11) of Section 2.5.4].
4.2.7.Proof of the Lemma on the Integral Functionals. Let~ =
(~o, ~l,···•~m), f = (fo, f1, .•. ,fm), F = (Fo, F1, •.• ,Fm) (the functions
Fi are defined in the formulation of the lemma of Section 4.2.3), u(t, a,
T, v), and x(t; to, xo, a,
T, v) have the same meaning as in Sections 4.2.3
and 4.2.6.
Let us consider Cauchy's problem for (n + m + 1)-th-order system
:. (x)6 -
X= . =cp(t, x, s,.u(t, a.,"'·
- - -v))= (<pf(t,(t, x,x, u (t, ii,_ '_T, _U}) ) ,
u(t, a, 1:, v))
( 1)
X (/ 0 ) = X0 , S(t 0 ) = 0'
where ak > 0. Here the equations for x do not involve ~ and coincide with
Eqs. (2) of Section 4.2.6. Therefore, their solution is x(t; to, x 0 , a,
T, v), whence, as can easily be seen,
F (/ 1 , 10 , X0 , ii) :s S(/1 ; ! 0 , X01 Ci, -:t, v).
Consequently, the possibility of extending the function F to nonpositive
~ and its continuous differentiability in a neighborhood of (t 1, to, ~ 0 ,
0) follow from the lemma on the packet of needles applied to system (1),
and the values of the derivatives are found by means of formulas similar
to formulas (10)-(13) of Section 4.2.3:
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 235

(2)

where
- (t,
Q t) =
(!12flu11 flu)
flu
is the principal matrix solution of the variational system
(3)

(here nij are the rectangular blocks Of the matrix n of dimensions n X n,


n x m, m x n, and m x m, respectively). Since~ does not depend on~. we
conclude from (3) that n is the solution of the .following Cauchy problem
for the block-matrix differential equation:

(4)

Solving consecutively the four matrix equations forming (4), we obtain


dQu (t, 1:)/dt=~x(t)Qu (t, 1:), Qll (1:, t)=En,
whence it follows that nll(t, t) = n(t, t), where n(t, t) is the principal
matrix solution of system (3) of Section 4.2.3. The equations
dQ12 (t, 1:)/dt =~X (t) Ql~ (t, 1:), Ql2 (1:, 1:) = 0
are satisfied by n 1 2(t, t) = 0, and, by the uniqueness theorem, there can
be no other solution. Further, we have
dQ21 (t, 1:)/dt = fx (t) Qll = fx (t) Q (t, 1:), Q21 (t, 1:) = 0,

whence
t
Q21 (t, 1:)= ~ fx(s)Q(s, 1:)ds.
~

In particular,
-Qu (tl, 1:) =Po (1:) = (Poo (1:), • · · • Pom (1:))
is the solution of the Cauchy problem (18) of Section 4.2.3 [see formula
(14) of Section 2.5.4; this can also be shown directly by differentiating
with respect to t and using the properties of the principal matrix solution
described in the same theorem of Section 2.5.4]. Finally,
dQ 28 (t, 1:)/dt=fx(t)Q12 (t, 1:)=0, Q22 ('t", 1:)=Em,
whence n22(t, t) = Em· Hence
Q (t , 1:) =
1
(fl (i;., 't)
- Po(T)
0 ) .
Em
Substituting this into (2) and separating what is associated with ~(t 1 ; t 0,
xo, a, T, v) F(tl, to, xo, a), we obtain
aPtatl = a~;atl = l <f~>·
oitot0 =a~;ot 0 = - Qn (t11 t0) ~ ('i0) -Q22 (11• to) f(ta)= Po (lu) ~ (to)-f (fa),
236 CHAPTER 4

aP;iJx0 =iJ~;ax.=Q•i (t~. to)=-Po (t.),


+
aP;aak = a~;aak = fJ, 1 (il' -rk) d~ (-rk. vk)
+ Q•• (tl, -rk) df (-rk, vk) = - P.-(-rk) d~ (-rk, vk) +df(-rk, vk),
which coincides with relations (14)-(17) of Section 4.2.3. •

4.3*. Optimal Control Problems Linear with Respect


to Phase Coordinates
As is seen from the title, this section is devoted to a special class
of optimal control problems. This class is rather important for applica-
tions, but this is not the only reason why we are interested in it. The
reason is that problems with a linear structure with respect to the phase
coordinates allow us to demonstrate most vividly one of the most important
ideas of the whole theory. We mean the "latent convexity" which is always
present in optimal control problems. Ultimately, it is this factor that
makes it possible to write the necessary condition in the form of the
"maximum principle," i.e., in the form characteristic of convex programming
problems. In contrast to the foregoing section, where we wanted to stress
the connection between the theory of optimal control problems and the gen-
eral theory of smooth extremal problems, here emphasis will be laid upon
the connection between optimal control problems and convex analysis.
It should also be noted that here the necessary conditions for an ex-
tremum almost coincide with the sufficient conditions (which is character-
istic of convex programming problems). Finally, in the present section we
shall consider not only piecewise-continuous controls but also measurable
ones, which is important for revealing the latent convexity.
4.3.1. Reduction of the Optimal Control Problem Linear with Respect
to the Phase Coordinates to the Lyapunov-Type Problem. Let ~ - [t 0 , t 1 ]
be a fixed closed interval of the real line, let a;: d-. R"•, i = 0, 1, ... ,
m, and A: a -+2'(Rn, Rn) be integrable vector and matrix functions, respec-
tively, let U be a topological space, let {;:dxlt-.R, i = 0, 1, ... ,m, and
F(·, ·): dXU-Rn be continuous functions, let Yoi, Yli• i = 1, ... ,m, be
elements of Rn*, and let Ci, i = 1, 2, ... ,m, be some numbers.
The extremal problem

J 0 (x(·), u(·))= ~ (a 0 (t)x(t)+f.(t, u(t)))dt+v •• x(t 0 )+'\'16x(t 1 )-·>inf,


4

x=A (t)x+F(t, u(t)), u(t)EU, (1)

J;(x(·), u(·))= ~ (a;(t)x(t)+f;(t, u(t)))dt+v.;x(t.)+'\'Iix(t 1 )~c;,


4

which involves linearly the quantities associated with the phase trajectory
x, is called an optimal contPol pPoblem lineaP with Pespect to the phase
cooPdinates. Of course, it can be regarded as a special case of the gen-
eral optimal control problem stated in Section 4.2. However, here we shall
consider a wider set of admissible processes.
Let us denote by ~ the set of measurable mappings (in the sense of
the definition of Section 4.2.6) u: d--->-U such that the functions t-+
fi(t, u(t)) andt-+F(t, u(t)) are integrable. A pair (x(·), u(·)) is called
an admissible pPocess if x(·):~-+ Rn is an absolutely continuous vector
f1.1nction (see Section 2.1.8), u(·)E~.
x(t)=A (t)x(t)+F(t, u(t))
almost everywhere, and the inequalities J;(X(·), u(·))~c1 hold. An admis-
sible process (x(·), u(·)) is said to be optimal if there exists£> 0
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 237

such that for any admissible process (x(·), u(·)) satisfying the condition
~x(·)-X(·)IIc<~. R") <e the inequality
J 0 (X(·), U(·)};?=J 0 (X(·), U(·))
holds.
We shall show that the linear structure makes it possible to transform
problem (1) to a form in which x(·) and u(·) are, in a sense, separated
(and, in particular, there is no differentia l constraint) .
Let ~(·, ·) be the principal matrix solution (Section 2.5.4) of the
homogeneous linear system
x=A(t)x. (2)

According to formula (13) of Section 2.5.4,


t
x(t)=Q(t, t 0 )x(t 0 )+ ~ Q(t, -r:)F(-r:, u(-r:))d-r:. (3)
t,

Let us transform the functionals of problem (1) by substitutin g (3):

J 1 (x( · ), u(· )) = H a1(t) [ Q (t, i 0 )x(t 0 )+ f Q (t, -r:) F('t, u (-r:)) d-r: l}dt +

+ ~ ft (t, U (t)) dt + '\' ;X (t 0 0) + '1' 1 ;[Q (1 1 , i 0 ) X (t 0 ) + ~ Q (t


4
1, T) F (-r:, U (-r:)} d-r:]=
4
~ t ~

= ~ ~ a;(t)Q(t, -r:)F(-r:, u(-r:))d-r:dt+ ~ f;(-r:, u(-r:))d-r:+


44 4

+{~t, a (t)Q(t, i )dt+'l'ot+'1' ;Q(ti, i )}x(t.)+


1 0 1 0

11 11 { 11
+ '\'1 ; ~ Q (t 1 , T) F(-r:, u (-r:))dT= ~ ~a; (t)Q(t, -r:)dt F(-r:, u(-r:))+
4 4 '

+f;(-r:, u(-r:))+'l'uQ(t 1 , -r:)F(-r:, u(•))}d•+

11 11
i 0 )dt+'l'ot+'l'u Q(ti• 10 ) x(t 0 )=- ~ 0 1 (-r:, u(-r:))d-r:+P;x (t 0 ),
}

+ {
~ a 1 (1)Q(I, (4)
4 4
where
0 1 (-r:, u) = Pt (-r:) F (-r:, u)-f; (-r:, u), (5)
Pt = 'l'oi -p; (t.), (6)

and
t,
p;(-r:) = - '\' 1;Q (tl, •)- ~a; (t) Q (t, -r:) dt, (7)
~

i=O, 1, .. . ,m.

Now problem (1) assumes the form


J 0 (x(·), u(·))=- ~
~
o.u. u(t))dt+P.x(t.)~inf,
(8)

J;(x(·), u(·))=-~ G 1 (t,u(t))dt+P 1 x(t 0 )~c1 •


~

Problem (8) will be called a Lyapunov-type problem.


4.3.2. Lyapunov's Theorem. Lyapunov's theorem plays the key role in
establishin g the connection between optimal control problems and convex
238 CHAPTER 4

programming problems. Here it will be stated for the case of a vector


function p:R + Rn and Lebesgue's measure m on the real lineR. A slight
perfection of the proof makes it possible to extend this theorem to the
case of an arbitrary space with a a-algebra @) defined in it and a finite
continuous measure p: @)-.. Rn. In the formulation below @) is the a-algebra
of Lebesgue measurable subsets of Rand JL(A)=~p(t)dt.
A

Here and in the next section, in contrast to the other parts of


this book, we shall use some facts of functional analysis and the theo-
ry of measure which are not usually included in the traditional courses.
We shall begin with the definitions. Here and henceforth the terms "mea-
surability," "integrability," "almost everywhere," etc. are understood in
Lebesgue's sense (for the measurability it is convenient to use the defini-
tion stated in Section 4.2.6). Two functions are said to be equivalent if
they coincide almost everywhere.
Definition 1. Let~ be an arbitrary interval on the real line. By
the spaces LI(~) and~(~) we shall mean the normed spaces whose elements
are the classes of equivalent functions x:~ + R possessing finite norms
specified by the following equalities:
in the space L1(~): //x(·)JJ=~Ix(t)jdt; (1)
.1

in the space ~(Ll): 1/x(·)l!=supvraijx(t)l= inf sup lx(t)j. (2)


.1 N, m (N)=O te&'N

Proposition 1. L1(~) and Loo(~) are Banach spaces, the space L 1 (~)
being separable. The space Loo(~) is isometrically isomorphic to the con-
jugate space LI(~)* so that the closed balls are weakly* compact in this
space.
Instead of proving these assertions we give the following references:
for the completeness and the separability of the space L1(~), see [KF,
Chapter 7, Section 1] (the existence of a countable base of the Lebesgue
measure on an infinite interval is an immediate consequence of the exis-
tence of such a base on a finite closed interval since every infinite in-
terval is representable as a countable union of finite closed intervals);
for the compactness of the closed balls in the weak* topology of the space
conjugate to a separable space, see [KF, p. 202], and for the isometric
isomorphism between LI(~)* and Loo(~) [which implies the completeness of
Lw(~)], see [4].

Definition 2. Let X be a linear space, and let A c X. A point x of


the set A is said to be an extremal point of A if
x=axi+(l-a)x2 , O<a< 1, X1 ,x2 EA=:>xi=x1 •
Exercise. Let xj E X, j = 1, ... ,n. Prove that at least one of the
points XI,••••xn is an extremal point of their convex hull conv{x 1 , •.. ,xn}.
Krein-Milman Theorem. A compact subset of a locally convex topological
linear space X has at least one extremal point and coincides with the con-
vex closure of its extremal points.
For the proof of this theorem, see [4].
Now we proceed to the basic theorem of the present section.
Theorem of A. A. Lyapunov. If p:~ + Rn, p(·) = (p 1 (·), ... ,pn(·)), is
an integrable vector function, then
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 239

is a convex compact set in Rn. (We remind the reader that @) is the a-
algebra of Lebesgue measurable sets.)
Proof. A) Let us consider the mapping A:Loo(~) + B? determined by the
equality

A{'IJ{•))= ~ ')l(t)p(t)dt=(~ 1jl(t)p1 {t)dt, •••• ~'IJ(i)pn{t)dt).


A A A

This mapping is linear and, moreover, it is continuous in the weak* topology


of Loo(~) [by definition, this topology is the weakest among those in which
all the mappings ~ (·).-<'I' ( · ), p ( · )>, p ( ·) E Ldfl) are continuous].
The set

is a closed ball in Loo(~) [of radius 1/2 and with center at the point
~(·), ~(t) = 1/2]. Consequently, it is convex and, according to Proposi-
tion 1, it is compact in the weak* topology. Therefore, its image AW under
the continuous linear mapping A is also convex and compact.
It is obvious that the characteristic function
I 1, t E A,
'XA (t) =\ i
0, t A
of any set A E e> belongs to W, and therefore

~ p(t)dt= ~ XA(t)p(t)dt=A(XA(·))EAW,
A A
and consequently M cAW. It remains to prove that M = AW.
B) Let us take a point 6EAW. Its inverse image W~=WnA-1(6) is
the intersection of the set W which is convex and compact in the weak*
topology and the convex set (the affine manifold) A- 1 (~) which is closed
in the same topology (because A is continuous with respect to that top-
ology). Consequently, the set W~ is convex and compact, and, by the Krein-
Milman theorem, it has an extremal point x( ·) EW;.
If we prove that x(·) is the characteristic function of a set AE@:i,
it will follow that

~=~X (t) p (t) dt = ~ p (t) dt EM'


A A

and the theorem will be proved.


C) If 0 < x(t) < 1 and x(·) is not a characteristic function, then
the set B 8 ={tle:<x(t):<l-e} has a positive measure for some£> 0. The
function a.-m(a.)=m[(t 0 ,a.)nB.] is continuous and varies from 0 = m(t 0 ) to
m(B 10 ) = m(t 1 ) [m(B) is the Lebesgue measure of the set B]. Let us take an
arbitrary Nand choose ak, k = 1, ... ,N- 1, such that m(ak) = k/NB 10 • This
specifies the partition B8 =B 1 U •.. UBN of the set B10 into N pairwise dis-
joint subsets of positive measure:
Bl=(-oo, a.lJnB••... , Bl<=(a.J<-t• a.J<]nB•• ••• , BN=(a.u-t• oo)nB•.
Now we put
y(t)={ Y~<• tEB~<, (3)
0, tiB •.

The homogeneous linear system


240 CHAPTER 4

possesses a nonzero solution (yl,····YN) for N > n. We shall denote the


corresponding function (3) by y( ·). If 0 < >.. < e: max { lyk I }- 1 , then

{ ~e. tEB.,
IA.y(t)l =0, t~B ••
and therefore, x( •) ± A.y( ·) E W. Moreover,

A (X ( ·) ± A.y (·))=A (x (·))±My ( ·) = ~ ±As y(t)


A
p (t) dt"" ;.

Consequently, ,x ( ·) ± AY( ·) E W;, and since >.. >' 0, we see that x( ·) is not an
extremal point.·
Hence, x(·) is a characteristic function, and the theorem is proved. •
4.3.3. Lagrange Principle for the Lyapunov-Type Problems. In this
section !::. is a fixed (finite or infinite) interval on the real line and
U is a topological space.
Lemma on the Superpositional Measurability. If a function f:~xU-..R
is continuous and a function u: A-U is measurable (in the sense of the
definition of Section 4.2.6), then the function t 7 f(t, u(t)) is also
measurable.
Proof. Let Anc~ be chosen so that m(~"-.~An)=O (where m is the
Lebesgue measure) and the restrictions u(·)IAn can be extended to continu-
ous functions on An· Then the functions t~f(t, u(t)), tEAn, can also be
extended to continuous functions on ~- •
Now let us suppose that continuous functions fi:!::. ·xU-+ R are given.
Let us denote by 'U the set of measurable mappings u: ~- U for which the
compositions t+f 1(t, u(t)) are not only measurable but also integrable on
!::.; then functions u(·)- Sft(t, u(t))dt are defined on 'U.
A

Let X be a linear space, let functions gi:X 7 R be convex for i = 0,


1, .•. ,m' and affine with finite values fori= m' + 1, ... ,m, and let A eX
be a convex subset.
The extremal problem

IF 0 (u(·.))+g0 (X)=) f 0 (t, u(t))dt+g.(x)-inf,


A

lft<u(·))+gt<x)= ~ ft<t, u(t))dt+gdx>{ : 0 °.· i= 1• ... ,


m', (1)
a - i=m' + 1, ... , m,
xeA, U(·)e'U

will be called a Lyapunov-type problem. The Lyapunov-type problems include


as a special case the standard convex programming problem studied in Sec-
tion 1.3.3 (form' = m and fi = 0), and, as will be seen later, Lyapunov's
theorem makes it possible to extend the argument with the aid of which the
Kuhn-Tucker theorem was proved to the more general situation considered
here.
The function
m
.!l(u(·), x, A, A0 )=~ A.1 (1F;(u(·))+gdx)) (2)
1=0

is called the Lagrange funation of problem (1).


THEOREM (Lagrange Principle for the Lyapunov-Type Problem). Let func-
tions fi: ~X'U-+R be continuous, let functions gi:X + R be convex for
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 241

i = 0, l, ... ,m' and affine with finite values fori= m' + 1, ... ,m, and let
iA c: X be convex.
1. If a pair (u(·), x) is a solution of problem (1), then there is a
vee tor ~ = (~ 1 , ••• , };.;n) E Rm• and a number Ao, not vanishing simultaneously,
such that:
m m
a) min~ };.Jft(t, u)= ~ };. 1 (t,'u(t)) almost everywhere, (3)
ueUI:O isO
m m
min~ ~ 1g;(x)= ~ ~ 1g 1 (1) (4)
<EAI=O izO

(this is the minimum principle);


b) ~~~0, i=O, I, ... , m' (5)
(this is the condition of concordance of signs);
c) (6)
(these are the conditions of complementary slackness).
2. If (u(·), x)E'!LxA and there exist ~ER'"* and \ 0 > 0 such that
conditions (3)-(6) hold, then (u(·), i) is a solution of problem (1).
Proof. A) Lemma on the Convexity of the Image.
The image
im<if=g=(~ 0 • • • • , ~mlls;=cfFJ(u(·)), u(·)E'!L}
of the mapping u(·)...-+3F(u(·))=(<if0 (U(•), ... , o'Fm(u(·)) is a convex set in
Rm+l.

Proof. Let
sk=(~:.sf ..... ~::.)=JF(u 17' 1 (·)), k=l, 2.
The function p(·)=(p 0 (·), ••• , Pm(·)): R--..Rm+l determined by the equalities
ft(t, u<ll(t))-{ 1(1, U121 (1)), IE~.
{
P;(t)~ O, ~~~

is integrable since u<k1 (·)E'!L. By Lyapunov's theorem, the set

is convex, and since this set contains the points 0 (A = 0) and ~ 1 - ~ 2


(A = 6), for any aE[O, I) there exists AaE~ (i.e., Aa. is Lebesgue mea-
surable) such that

o:m-~n= ~ pdt)dt= ) [f{(t, u<ll(t))-fL{t, U 111 (1)))dl=


Aa AanA

= ~ ft(t,u<U(t))dt+ ~ ft(t,u< 21 (t))dt-£,,


Aan4 A'Aa
whence
~ ft(t, u<ll(t))dt+ ~ f,(t, u<• 1 (t))dt=a~~+(l-a)£~. i=O, I, ... , m. (7)
A"n4 4'Aa

Now let us put

{ u<u (t), t E A,. n ~.


Ua (t) = u<21 (t), t E ~"A,..

Then, by virtue of (7),


3Ft (Ua (·))=a£~ +(J -ct) ~~.
242 CHAPTER 4

and hence T(ua(·))=~; 1 +(1-a:H 1 • •


B) Existence of the Lagrange Multipliers. As was already mentioned,
our argument will be similar to the proof of the Kuhn-Tucker theorem of
Section 1.3.3.
By the hypothesis, the pair (~(·), x)
is a solution of the problem
(1). We can assume, without loss of generality, that 9"0 (u(·))+ g0 (x)~O
(if otherwise, we can subtract the corresponding constant from T 0 +g0 ) .
Let us prove that the set
C={~=(~0 , ... , ~m)l3(u(·), x)E'UxA,
,... <"<. n+g. (x> < ~o. T, (ii <: »+g, <x> EO;~,.
l= 1, ... , m', TJ{u{·))+g 1 (x)=~1 , i=m' +1, .. . , m} (8)
is convex. Indeed, let ~"= (a:t ... , ~~) EC, k = 1, 2, 0 < e < ,, and let
(u(k) ( ·), x(k)) be elements of 'UX A such that
Y.,( u'"'(.)) +g. (x<"') < ~~.
y 1 (u~A> (.}) +g1 (x<"'>{ EO;~!• ~= 1 ·~ .. ·' m''
=~ 11 J=m +I, ... , m.
By the lemma on the convexity of the image, there exists a function ue(·)E'U
such that
T 1 (ue (. )) ... eT1 (u<l> (.)) + (1-9) T 1 (u<•> (. )),
i=O, I, •.• , m.
Moreover' Xa = ax<ll + (1-9) x<•l EA since A is convex' and
g 1 (Xe)=gJ{9x111 + (l-9) x ")= 9g1 (x<11) + (l-9)g1 (x< >)
1 1

fori= m' + 1, .•• ,m since these functions are affine. Consequently,


T 1 (u 6 ( ·)) +gdxe) =9 [T t{u 111 ( • )) +gt{x111 )] +(1-9) [3F" 1 (u 111 ( ·)) +g1 (x1" ) ] -
=lht+(l-6)~7. i=m' + 1, ... , m.

Further, since go is a convex function, we have


•Y o (ua ( ·)) + g0 (Xe) = 96" 0( um ( ·)) + ( 1-9) T 0 (U111( ·)) + g0 (9x111 + (1-9) x111 )~
E;;;9(6"0 (u1U( ·)) +Ko (x111 )) + (1-9) (6"0 (u 111{·)) +Ko (x111 )) < ~+(1-9) 0::.
It is proved in a similar way that
T1 (ua ( • ))+g1 (xe) ~lh}+(l-9)~,. i= 1, ... , m'.
Therefore, from the definition of the set C, it follows that 9a:1+(1-9)a:'E
C, i.e., C is convex.
Putting (u(·), x) = (u(·), x) in (8), we obtain the inclusion
C::>{a:=(aa, ... , a:m)la:o>O, ~,;;:;;.o, l-1, ... , m',
a:1 =0, l=m'+l, ... , m}. (9)
In particular, the set C is nonempty. Moreover, O~C because, if other-
wise, then, according to (8), we have
¥o (u <. »+a,<x> < o, Tt<ii"<. ))+gdi) ~ o,
i=l, ... , m',
l''d"u(·})+gt(x)=O, l=m' +I, ... , m,
for a pair (u(·), x), and consequently (u(·), x) is not a solution of prob-
lem (1).
Applying the finite-dimensional separation theorem we find ~i• i = 0,
m
1, ... ,m, not all zero, such that the inequality ~ i 1a 1 ~ 0 holds for all
a:EC • t=o
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 243

We have (s, 0, ... , 0, a1 = I, 0, ... , O)E<';_, wh~re 1 :EO; j :EO; m' and O.i = 0 for
i ~ 0, j, and therefore th~ inequality EAO + Aj ~ 0 holds. Passing to the
limit for E + 0 we obtain Aj ~ 0. In just the same way, we obtain
{8, 0, ... , O)EC ~~.~0.
and hence (5) holds.
For any E > 0 we have
(s, 0, .•• , 0, a1 =T1 (u(·))+g1 (x), 0, ... , O)E

eC==> l.!!+A1 [IF1 cu <. n+g, cx>J;;;;;. o~ A1 [IF, <u <. n+g, <x>J;;;;;. o. (1 0)
Since (~(·), x) is an admissible pair,
I <J<m', (11)
J=m'+l, ... , m.
Relations (11) and (10) and the inequalities ~j ~ 0, 1 :EO; j :EO; m', which we
have already proved yield (6).
Further, according to (8), for any s > 0, (u ( • ), x) E'llx A, we have

m
EC~ ~ idc1"E(u(·))+gt{x))+si0 =
laO

However, according to (6), (11), the inequality j ~ m' + 1, and the


above assumption about c1" 0 +g., there must be .!l'(u(·), J.., i.)=O, and there- x,
fore the minimum principle holds for the Lagrange function (2):
min .!l'(u(·), x, }.., A 0 )=$tu(·k~. i, 1.0 ). (12)
'UXA

If AO > 0, relation ( 12) takes place, and the conditions (5) and (6) hold,
then
~ (5) m' ~
A0 (T 0 (U(·))+g.(x));;;;. ~ A,(T 1 (u(·))+g,(x))=
1=0
~ 1t • ~ • (12)
= ~ A 1 (1F 1 (u(·))+g 1(x))=..9'(u(·), x, "· A0);;;:,
1=0
~ ~/;'/;' ~- - ~!6),(11)
;;;;:. .!l' (u ( · ), x, "• A 0) = ~At( IF 1 (u ( ·)) +g,(x)) =
1=0

i 0 (3F. (u <.))+g. <x>> ~6'".<u<.)) +g. <x>>;;;:, IF. <u <.)+g. cxn
=
for every admissible pair (u(·), x), and hence (u(·), x) is a solution of
problem ( 1).
C) Reduction to the Elementary Problem. It can easily be seen that
the fulfillment of relation (12) is equivalent to the simultaneous fulfill-
ment of relations (4) and
m
min ~i 1 1FE(u(·))=
U(·) E'U 1=0
r m m m
= u~/~oul 1~0 f-tft (t, u(t))dt= 1 1~0 itft (t, u {t))dt = 1~0 i 11FE(u(·))_. (13)

Indeed, if (4) and (13) hold, then


m m m m
2 <u <. ), x, }.., ~ i 11F 1(u <.)) + ~ f.,gt(x);;;:,
}...>=I=D 1=0
~ ~,.rr, cii <·)) + ~ l1g1(x) = .!l' (u <·), x, ~. ~.>
1~0 1~0
244 CHAPTER 4

for all u(·)E'U' and xEA, i.e., (12) takes place. However, if, for in-
xE A,
m m
stance, ~ i. 1g1 (x) < ~ i.1g1 (x) for some then
i=O 1=0
m m m m
2 <il <. >. ~ ~~F~<u <. »+ ~ i.tgdx> < ~ i.JF 1 <u <. »+ ~ i.igi <x>.
x-, r.. ,;.> = 1=0 1=0 1=0 i=O

and hence (12) cannot hold.


Now in order to pass from (4) and (13) to the maximum principle, i.e.,
to (3) and (4), we have to show that it is legitimate to "insert min under
the integral Sign, II i.e,> tO ShOW that U(") is a SOlution Of the auxiliary
optimal control problem

~ ~ i.tf d t, u ( · )) dt -
AI=O
inf, u ( · ) E'll , ( 14)

if and only if relation (3) holds.


A similar assertion was alr~ady proved in Section 3.1.4. However, it
was assumed there that function u(·) was piecewise continuous, whereas in
the present case it is only measurable, and therefore we shall have to use
a more subtle argument.
D) Elementary Problem with Measurable Controls. The Lemma on the
Minimum Conditions for the Elementary Optimal Control Problem. Let~ be
an interval in R, let U be a topological space, let f: L'lxU- R be a con-
tinuous function, and let 'll be the set of measurable mappings u: ~ __,. U
such that the functions t ~ f(t, u(t)) are integrable on 6.
For u(·) E'll to be a solution of the problem

IF ( u ( · )) = ~ f (t, u ( t)) d t --+ inf, u ( · ) Ei 'U, (15)


A

it is necessary and sufficient that the equality


f(t, u(t))= minf(t, u) (16)
uett

hold almost everywhere in 6.


Proof. "Sufficient." According to (16), we have f(t,u(t));;;.f(t,u(t))
almost everywhere for any u(·)E'U, whence J'F(u(·));;;.3F(u(·)).
"Necessary." Let~(·) be a solution of problem (15). Since u(·) is
measurable, there exist measurable sets An c ~ such that m (~"'~An)= 0
and ulAn can be extended to a continuous function on An (see Section
4. 2. 6). A point 't' E An will be called essential if m ((T-6, T+6) n An)> 0
for any 8 > 0 (and inessential if otherwise), and we shall show that equal-
ity (16) holds for all essential points.
Indeed, if the inequality f(T, v) < f(T, ;(T)) held for an essential
point tEAn and some vEU then, by virtue of the continuity of the func-
tions u(·)IAn and t ~ f(t, v), the same equality would hold in a o-neigh-
borhood of the point T in the set An, i.e.,
f(t, v) < f(t, u(t)). VtE(T-6, T+6)nAn.
It ~s evident that the function

belongs to 'll , and


LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 245

l""(u(·))-T(u(·))= ~ [f(t,v)-f(t, u(t))]dt<O


(~-6, ~+6) n An
since f(t, v) < f(t, u(t)) and m(('t-ll,'t+ll)nAn)>O.
It remains to prove that the set of the inessential points is of mea-
sure zero. If 'tE An is inessential then, according to the definition,
there is OT such that T E (a.T' ST) and m (('t-Il~, 't+ll~) n A;.)= 0. Narrowing
theinterval,we find rational (a.T, 13 1) such that m((a~,~~)nAn)=O. The set
of all the intervals with rational end points is countable, and therefore
the set of those of them which are of the form (a.T, ST) for an inessential
point 't" E An is not more than countable. Let these intervals be I 1 , Iz, ...
Then Ann U I~ contains all inessential points of An and m (A n U I~) ,;:::
k n k """

~m(Ann 1~)=0.
k.
Since there are not more than countably many sets An, the union of the
sets of the inessential points of each of them is also of measure zero. As
was proved, relation (16) can be violated only on !!.."'. U An and at inessen-
n
tial points; i.e., it holds almost everywhere.
E) Completion of the Proof. According to C) and D), for condition
(12) to hold it is necessary and sufficient that (3) and (4) hold simul-
taneously. Replacing in the last paragra~h of Section B) relation (12)
by (3) and (4), we complete the proof of all the assertions of the theorem •

4.3.4. Duality Theorem. The notation in this section is the same
as in the foregoing one. We remind the reader that the concept of duality
for convex programming problems was introduced in Section 3.3.2. This was
done in connection with the perturbation method, i.e., with the inclusion
of an individual extremal problem in a family of problems of the same type.
Here we shall carry out exactly the same procedure; for the sake of sim-
plicity, we shall confine ourselves to integral functionals. We shall as-
sume (and this assumption will be essential for the proof) that ~ is a
closed interval of the real line and that the topological space U to which
the values of the various controls u(•) belong is separable, i.e., it con-
tains a countable subset {Un} everywhere dense in U . For instance, U
can be an arbitrary subset of Rr, C(~, Rn), C 1 (~, Rn), orL1(l'i); as to the
space Loo(~) (see Section 4.3.2), it is not separable.
Exercise 1. Prove these assertions.
We shall consider the Lyapunov-type problem

T 0 (u(·))= ~ f,.(t, u(t))di- inf,


A

3Ft(u(. )) = ~ ft(t, u(t))dt ~o. u (t) E U, i= l, 2, ... , m. (3)


A

Let us fix fi and include the problem (3) in the family of problems
,..(u(·)).--.inf, cfF't(u(·))+a 1 ~0. (3(a))
As in Section 3.3.2, the S-functlon of the family 3(a) is determined by the
equality
(1)

The analogy between Lyapunov-type problems and the convex programming prob-
lem is retained here too, and although the functions 6" 1 are not convex,
there holds the following:
LEMMA 1 •. Function a.+ S(a.) is convex.
246 CHAPTER 4

Proof. According to the lemma of Section 4.3.3, the set


im T = {s = (so• ... ,
is convex in Rm+l. Rewriting (1) in the form
S ((t)= inf {so 161 +a1 ~-0, sE im T}
and putting gi(~) = gi(~o.····~m) = ~i• we find that S(a) is the value of
the convex programming problem
go(6)-inf, gt(s)+at~O, Himl"'.
The last problem is a special case of the problem considered in Section
3.3.2 (here there are no equality constraints and therefore Y, h, etc.
should be dropped), and hence the convexity of Sis guaranteed by Corollary
1 of Section 3.3.2.
Duality Theorem for the Lyapunov-Type Problems. Let 6 be a closed
interval in R, let U be a separable topological space, let functions fi:
L1xil"-R, i = 0, 1, •.• ,m, be continuous, and let~ be the set of measur-
able mappings u: 11-U such that the functions t-+- fi(t, u(t)) are inte-
grable on 6. ·
If the S-function of the family 3(c:t) is continuous at the point a = 0,
then
S (a)== sup (Ac:t
).:;.0
+ 1!!.~ <D (t, 'J..) dt) (2)

for any a E int (dom S), where

<D(t, 'J..)'=in:f
ueu
(t 0 (t, ~ 'J..Ift(t, u)).
u)+ 1=1 (3)

Proof. A) By Lemma 1, the function a+ S(a) is convex, and the con-


tinuity of the function at one point implies its continuity for all a E
int (dom S) (Proposition 3 of Section 2. 6. 2) and the equality S (a) =
(conv S) (a), a E int (dom S). By the Fenchel-Moreau theorem (Section
2.6.3), COriV S = S**, and hence
(4)

for the same a.


Now let us compute S*(A). According to the definition,

S•(A.)=sup(Ac:t-S(c:t))= sup {Ac:t-inf ( ~ f 0 (t, u(t))dtl ~ fj{t, u(t))dt+a1 ~0,


00 00 1!!. 1!!.

u(.) E 'l£.) l = sup sup {Ac:t- ~ fo (t, u (t)) dt} =


( u (·)eft oo 1 <-f t 1 (t, u(l)) dt 1!!.

l
1!!.

+oo, 'J..~R~,

= - inf ~ [to (t,


u(•)i'U 1!!.
u (t)) dt ~ 'J..Ift(t, u (t))] dt.
+ hd

Substituting the expression we have found into (4), we see that equal-
ity (2) and, along with it, the whole theorem will be proved if we verify
that
inf
ll(·)f!IIA
~ [t.(t, u(t))+ 1"'1
~ 'J..tft(t, u (t))]dt= AS<D(t, 'J..)dt, (5)

where ~(t, A) is determined by equality (3).


LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 247
B) LEMMA 2. Let !J. be a closed interval in R, let U be a separable
topological space, let a function f: ~X U-+R be continuous, and let 'IL
be the set of measurable mappings u: ~:_u such that the function
iJ-+f (t, u (t)) is integrable on !J..
Then the function
q> (t) = inf {f (t, v)} (6)
veil

is measurable on !J., the integral Slj) (t) dt (finite or equal to -<X)) exists,
and A

~'P(t)dt=~inff(t, u)dt= inf ~f(t, u(t))dt. (7)


A A U u(•le'UA

Proof. Let {Vklk ~ 1} be a countable subset everywhere dense in U.


By virtue of the continuity off, the functions
<p,.(t)= min f(t,v,.) (8)
l<l<<n

are continuous, and for n + oo we have


'P,(tH lim inf f(t, v,.)=inff(t, v,.)=inff(t, v)='P(t). (9)
n-+<~> l<·k<n I< veil

Moreover, there exist functions u,.(·)E'U such that '!p,.(t)=f(t, u,.(t)).


Indeed, this is true for n = 1, and if 'P,.- 1 (t)=f(t, u,_ 1 (t)), then
'P,.(t)= min [f(t, v,.)]=min[<p,.- 1 (t), f(t, v,.)]=
l<k<n
=min[f(t, u,- 1 (1)), f(t, v,.)]=)(t, u,.(t)),
where the function
u,.- 1 (t) for <p,.-dt) ~ f (t, v,.),
u, (t)= { .
v,. for <Jl,.- 1 (t) > f (t, v,.)
belongs to 'U [since the functions are continuous, the inequality <Jl,.-dt) >
f(t, vn) holds on an open set, and therefore the function un(·) is measur-
able; the integrability of the function t~-+f(t, u,.(t))=<p,.(t) follows from the
continuity of <p, (t)] •
By virtue of (9), function <p(·) is a pointwise limit of continuous
(and hence measurable) functions and therefore it is measurable [KF, p.
284]. Representing this function in the form of the difference
<p(t)=f(t, v)-[<p(t)-f(t, v)]
of a continuous function and a nonnegative function, we see that the inte-
gral S <p(t)dt exists, its value being finite or equal to -<X) [the function
A
f(t, v) is continuous on the closed interval !J. and therefore the integral
of f is finite while the integral of the nonnegative measurable function
<p(t)-f(t, v) exists but its value may be equal to +oo].
Since f(t, u(t));;;;:.<p(t) for any u(·)E'U, the inequality

inf S f(t, u(t))dt;;o, ~ <p (t)dt (10)


u(•lE'UA A

is obvious. If the left-hand member of ( 10) is equal to -oo, then equality


(7) holds. In case the left-hand member is finite, the inequalities

Scp1 (t)dt;;;;,: S'P,.(t)dt= Sf(t, u,.(t))dt;;a. inf Sf(t, u(t))dt


A A A U(•)I'UA
248 CHAPTER 4

imply that the sequence of integrals Scp,.(t)dt is bounded. Applying the


!f.
theorem of B. Levi [KF, p. 303] to the nondecreasing sequence --cp,.(·), we
find that the function cp(·) is integrable and

Scp(t)dt=lim) cp,.(t)dt;;;::::inf fcp,.(t)dt=inf St(t, u,.(t))dt;;;:;, inf Sf(t, u(t))dt.


4 n-+"'!f. n !f. n 4 u(•Je'lt!f.

The last relation and inequality (10) yield (7). •


By Lemma 2, relation (5) holds, which, as was seen, proves the theo-
rem.
Exercise 2. State and prove the duality theorem for the Lyapunov-type
problem of the general form [see (1) in Section 4.3.3].
4.3.5. Maximum Principle for Optimal Control Problems Linear with
Respect to the Phase Coordinates. Let us come back to the problem posed
in Section 4.3.1:

J 0 (X ( • ), u ( ·)) =) [a 0 (t).x (t) + fo (t, u (t))j dt + '\' 00 X (t + Y10 X (t,)-+ inf,


0) (1)
!f.

x-A(t)x+F(t, q(t)), u(t)EU, (2)


Jt(x(·), u(·))= S[a 1 (t)x(t)+ft(t, u(t))]dt+
!f.
+y 01 x(t 0 )+y, 1x(t,);!!ic 1, i-1, 2, ... , m. (3 )
THEOREM (Maximum Principle for Problems Linear with Respect to the
Phase Coordinates). Let functions a1: ~-R"*, i=O, I, ... , m, and A:~ .:......2(R",
R.n) be integrable on a closed interval ~=[t 0 , t,]cR, let U be a topological
space, let f 1: ~xU- R, i= 0, I, ... , m, F: ~XU- R" be continuous functions,
and let U be the set of measurable mappings u: ~ _,. U such that the func-
tions t + fi(t, u(t)) and t + F(t, u(t)) are integrable.
1. If (x(·), u(·)) is an optimal process for the problem (1)-(3),
then there exist a number ~ 0 ;;;:,. 0, a vector l=(A.,, ... , ~m)ERm•, and an ab-
solutely continuous function p:~ + Rm*, not all zero, such that:
a) the Euler-Lagrange equation (the adjoint equation) holds:

(4)

b) the transversality conditions hold:


m
p(tk)=(--l)k ~ i..t'Ykt; k=O, 1; (5)
i=O

c) the maximum principle holds almost everywhere:


maxG(t, V)=G(t, u(t)), (6)
veu
where
m
G(t, v)=p(t)F(t, v)-'~ '-ddt, v); (7)
1=0

d) the condition of concordance of signs holds:


'-,;eo, i= I, ... , m; (8)
e) the conditions of complementary slackness hold:
~;[Jt{x(·), U(·))--c;]=O, i= I, ... , m. (9)
2. If for an admissible process Sx(·), u(•)) there exist i..o>O. iERm•
and an absolutely continuous function p(·):~ +am* such that conditions
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 249

(4)-(9) hold, then (x(·), u(·)) is an optimal process for problem (1)-(3).
Formally, conditions (4)-(9) of the present section are the same as
the corresponding conditions in Section 4.2.2 (the reader can easily carry
out the transformation s from one group of conditions to the other). How-
ever, we remind the reader that the meaning of the conditions is somewhat
different: in the case under consideration we only assume that the controls
u(•) are measurable, and consequently the necessary conditions (4)-(9) do
not follow from the results of Section 4.2.
Moreover, the second part of the theorem yields sufficient conditions
for the optimality, which was not considered in Section 4.2.
Proof. A) Reduction to a Lyapunov-Type Problem. Problem (1)-(3) was
already reduced to a Lyapunov-type extremal problem in Section 4.3.1 [see
problem (8) in Section 4.3.1] which differs from the problems of Section
4.3.3 because it may include inequality constraints~. However, this is
inessential because the endpoint terms in problem (8) of Section 4.3.1 are
linear, and therefore, changing the signs, when necessary, we again obtain
convex functions.
Let us renumber the indices so that the inequality constraints occupy
the last places in (3), and then, as in Section 3.2, let us put Ei = 1 for
i = 0 and for those i to which the signs < and = correspond in (3) and
Ei = -1 for those ito which the sign~ corresponds in (3). Further, we
put go(x(·)) = 8ox(to) and gi(x(·)) = Ei(8ix(to)- Ci), i ~ 1.
Now problem (8) of Section 4.3.1 takes the standard Lyapunov-type form:

i.(x(· ), u(. ))=- 8o ~ a.u. u (t)) dt+g.(x(. )) - + inf'


( 10)
A
_ r { ~o. 1. ...• m',
i =
Jt(x(·), u(·))=-e 1 .JG 1 (t, u(t))dt+gdx(·) ) =O, i=m'+J.
A m,
where Gi are determined by formulas (5) of Section 4.3.1.
B) "Necessary." If (x(·), u(·)) is an optimal process, then, by the
theorem of Section 4.3.3, there exist ~i• nonnegative for 0 < i < m' and
not all zero, such that

( 11)

almost everywhere,
m m
~ 5: 1gdx (·))=min ~ ~ 1gt(x (.)) ( 1 2)
1=0 X(•) 1=0

and
5:Jt(x(·), u(·))=·O, I ~i~m. ( 13)
It can readily be seen that condition (8) is satisfied if we put \i
and (13) implies (9).
Further, let us put
m
p(t) = ~
I= 0
'A,pl (t), ( 14)

where Pi(·) are determined by formulas (7) of Section 4.3.1, whence, by


virtue of relations (6) of Section 4.3.1, it follows that
( 15)
By means of direct differentiatio n [or using formula (14) of Section 2.5.41,
we verify that p(·) is a solution of Eq. (4) and (5) holds fork= 1.
250 CHAPTER 4

From (7), (14), and from formula (5) of Section 4.3.1, we obtain
m m m
O(t, v)=p(t)F(t, v)- ~ f.tft(t, v)= ~ e1A1 [p 1 (t)F(t, v)~f 1 (t, v)]= ~ s1£10 1 (t, v),
laO lo•O · lsO

after which (11) goes into (6).


Finally, substituting the expressions of gi taken from the definition
into (12), we obtain

This equality is only possible if tL f. p = ~0 e l P= 0, after which (5) with


m
1 1
m
1 1 1

k =0 follows from (14) and (15).


C) "Sufficient." Let us suppose that for an admissible process (x(·),
u(·)) there are Ao > 0, Ai, p(•) such that conditions (4)-(9) hold. Fur-
ther, let Ei be the same as before, and let, as before, xi = Ei~i and Pi(·)
be determined by equalities (7) of Section 4.3.1 so that (15) takes place.
m A

Functions p(·)=~A 1p;(·) and p(·) satisfy Eq. (4), and, according to (5)
i=O
with k = 1, they coincide fort= t1. By the uniqueness theorem, p(t) =
p(t), and consequently
(7) m
G(t, v)=p(t)F(t, v)- ~ i.;tt(t, v)=
1=0

(16)

and

Relations (16) and (6) imply (11), (9) implies (13), and (8) together
with the equalities Xi= Ei~i implies that Xi~ 0, i = 1, ••• ,m. Moreover,
from (17) and the definition of gi(·), it follows that

and therefore (12) also takes place. By the theorem of Section 4.3.3,
conditions (11)-(13) together with the inequalities Xo > o and Xi~ o are
sufficient for the optimality of (x( ·), u(.)). •

4.4. Application of the General Theory to the Simplest


Problem of the Classical Calculus of Variations
In this section the general theory is applied to the simplest problem
of the classical calculus of variations upon which so much emphasis is laid
in mathematical textbooks. In the same manner it is also possible to de-
rive necessary and sufficient conditions for a weak extremum and necessary
conditions for a strong extremum in the general Lagrange problem. However,
this would require a rather lengthy exposition without introducing anything
essentially new. Therefore we shall confine ourselves to a special case
in which an exhaustive investigation can be completed and the relationship
between the old and the new theories can be demonstrated.
4.4.1. Euler Equation. Weierstrass' Condition. Legendre's Condi-
~ By the simplest (vector) problem of the classical calculus of varia-
tions we shall mean (cf. Section 1.2.6) the following extremal problem:
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 251

t,
:7(x(·))=~L(t, x(t), x(t))dt-+extr, ( 1)
t,
X {t 0) = X0 , X (ti) = Xi• (2)

Here the function L:V + R is assumed to be at least continuous in an open


set VcRxRnxRn, and xo, x1, to, and t1 are fixed.
Two versions of problem (1), (2) are investigated. In the first of
them we suppose that x(·) belongs to the space C1 ([t 0 , t 1 ], Rn), and in
this case a local extremum is said to be weak (cf. Section 1 .4.1). In the
second version the function x(·) belongs to the space KC 1 ([to, t1l, Rn) of
functions with piecewise-smooth derivatives endowed with the norm llx(·)ll
sup !x(t)l, and in this case a local extremum is said to be strong.
t,<,t<,t,

Exercise 1. Verify that KC 1 ([to, t1l. Rn) is a normed space but not
a Banach space (i.e., it is not complete).
In both versions a function x(·) is said to be admissible if its ex-
tended graph lies in V: {(t, x(t), x(t))jt 0 :::;;;t:::;;;t 1 }cV, and the boundary condi-
tions (2) are satisfied. The set of the admissible functions x(·) will be
denoted by ~~. The notation and the terminology introduced here and hence-
forth are retained throughout the present section. For brevity, problem
(1), (2) will be called the simplest problem without mentioning the other
details.
Exercise 2. Prove that
inf ~ (x ( ·)) = inf ~ (x ( ·)).
'Y-'nc• ([t,, I,], Rn) ~JnKc• ([1,, I,J, Rn)

Hint. Make use of the lemma on rounding the corners (Section 1.4.3).
If x(·) yields a strong extremum in problem (1) and if function x(·)
is continuous, then x(·), obviously, yields a weak extremum as well. There-
fore, necessary conditions for a weak extremum are also necessary for a
strong extremum [in the case when x(·) is continuous]. Some necessary con-
ditions are already known. In Section 1.4 we used elementary methods to
derive the Euler> equation which is a necessary condition for a weak extre-
mum and Weierstrass' condition (for n = 1) which is a necessary condition
for a strong minimum. Now we shall consider the simplest problem from the
general viewpoint. On the one hand, problem ( 1), (2) is reducible to the
following Lagrange problem:
I,

:7(x(·), u(·))= ~ L(t, x(t), u(t))dt---+extr, (3)


I,
x=u, x(t 0 )=X0 , x(t 1 )=xi.

If we suppose that not only the function L but also its derivatives
with respect to x and u are continuous in V, then the Euler-Lagrange theo-
rem of Section 4.1 can be applied to (3). In this case the weak optimality
of a process (x(·), u(·)) in problem (3) means exactly the same as the weak
extremality of x(·) in problem (1), (2).
If x(·) yields a weak extremum in problem (1), (2) then, by the Euler-
Lagrange theorem, there exist ~o and functions p(·)EC1 (ft 0 , t 1], Rn•) such that

p(t) = }.. Lx(t, x(t),


0 ~ (t)), p(t) = }.. L;, (t, x(t),
0 ~ (t)).
If Ao = 0, then all the Lagrange multipliers are zero, and therefore we can
put ~ 0 = 1. Eliminating p(·), we arrive at the system of Euler's equations

(4)
252 CHAPTER 4

[as usual, here Li(t) = Lx(t, ~(t), i(t)), etc.].


On the other hand, problem (3) (for definiteness, we assume that this
is a minimum problem; in a maximum problem L should be replaced by -L) can
be regarded as an optimal control problem with U~Rn. If Land Lx are as-
sumed to be jointly continuous with respect to the. arguments, then the max-
imum principle of Section 4.2 can be applied to (1), (2). In this case the
optimality of a process (i(·), u(·)) means that
X ( •) E KCl ([1 0 , 11], R3 )
yields a strong minimum in problem (1), (2).
From the maximum principle it follows that there exist ~ 0 ~ 0 and a
piecewise-differentiable function p(·):[to, t1l + Rn* such that

p(t) = 1-.l,. (t), (5)


(6)

Here the equality ~o = 0 is impossible because from (5) it follows that


~ 0 = 0 => p(t) = const, and, moreover, p(t) ~ 0 (because, if otherwise,
then all the Lagrange multipliers are equal to zero). However, if Ao = 0
and p ~ 0, then (6) cannot hold (a nontrivial linear function has no mini-
mum in Rn). Thus ~o ~ 0, and we can assume that ~o = 1.
Condition (6) can be rewritten in the subdifferential form:
p(t) E (iJ;,L) (t, x(t), i (t)). (7)
If L is differentiable with respect to x
too, then (6) implies the
equality p(t) = Lx(t), and, on the one hand, we again arrive at the Euler
equation (4), and, on the other hand, from (6) we obtain Weierstrass' nec-
essary condition
~(t))-L;, (t, x(t), ~(t))(u-~(t))~O,
L(t, x(t), u)-L(t, x(t), (8)
VuE Rn, Vt E fto• t1J·
Let us introduce the Weierstrass function 8 (the letter ·8 symbolizes
the Latin excessus- "excess"):
8(J, X,~. U)=L(t, X, u)-L(t, X, S)-L;,(t, X, s}(u-s). (9)
Then Weierstrass' condition (8) can be written in the form
8(t, x(t), ~(t), u)~O, VuERn, VtE[t 0 , ti}. (10)
If we assume that not only L and Lx but also Lxx is continuous in V,
then (8) implies the inequality
l;;, (t)~ 0, ( 11)
Condition (11) is called Legendre' s· condition.
Thus, the Euler equation is a necessary condition for both a weak ex-
tremum and a strong extremum. Moreover, from the maximum principle we
have derived two more necessary conditions for a strong minimum: Weier-
strass' condition and Legendre's condition (in the next section we shall
see that the last condition must hold for a weak minimum too).
4.4.2. Second-Order Conditions for a Weak Extremum. Legendre's and
Jacobi's Conditions.
LEMMA. Let the function L be continuous in V, and, moreover, let its
derivatives Lxi• LXi• Lxixj• LXiXj• LXixj exist and be continuous in V.
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 253

If :X(·)EC'(ft 0 , t,], Rn) satisfies the Euler equation, then the following
asymptotic formula holds for llh(·)llcl -+ 0:

=,. <x<. >> +~ {h <w LXX u> h <t> +


I,

"'6: <. >+h <. >>


t,

+2hT (t) Lx;, (t) Ji (t) +h (t)T L;; (t) h (t)} dt +a Gfl h (t) I'+ pi (t) I'] dt). (1)
Vh(·)EC 1 ([t 0 , t,], Rn), h(t0 )=h(t 1)=0.
Proof. As in Section 2.4.2, let us embed x(·) in a compact set en-
tirely contained in V and use the uniform continuity of the second deriva-
tives of the function L on that set to estimate the remainder term in
Taylor's formula. This yields the expansion

L(t, x(t)+h(t), x(t)+li(t)) =L(t)+Lx(t)h( t)+L; (t)li(t)+


+-} {hT (t) Lxx (t) h (t) + 2hT (t) lxx (t) fi (t) +
+liT (t) L;,;, {t) Ji (t)} + e (t, h (.))[I h (t) 11+IIi (t) n (2)
where dt; h(·))-+ 0 uniformly on t:, for llh(·)llcl-+ 0.

Integrating (2) from to to t1 and thep, as usual, performing integra-


tion by parts in the linear term involving h(t) and taking into considera-
tion the Euler equation, we obtain (1). •
If x(·) yields a weak minimum in problem (1), (2) of Section 4.4.1,
then replacing h(·) in (1) by a.h(·) and computing the limit Iim[.1(x(·)+
cxh(·))-.1(x(·)) ]/a2 for a.+ 0, we obtain
I,

Q(h ( •)) = ~ {hT {t) Lxx (t) h (t) + 2hT (t) Lxx (f) fi (f) +hT (f) L;x (t)h {t)} dt ~ 0, (3)
I,

Vh(·)EC 1 ([t0 , ti], Rn), h(t0 )=h(t1 )=0.


This means that the function h(·) = 0 yields a weak minimum in the follow-
ing secondary extremal problem for the quadratic functional Q(h(·)):
Q(h(·))--dnf, h(t0 )=h(t1 )=0. (4)
By the lemma on rounding the corners (Section 1.4.3), the infimum of the
functional Q on the set of admissible functions belonging to the space
C'([t 0 , t.], Rn) coincides with its infimum on the set of admissible func-
tions belonging to the space KC 1 ([to, t1l, Rn). Therefore, h(·) = 0 also
yields a strong minimum in problem (4). (It should be noted that, by vir-
tue of the homogeneity of Q, the local minima turn out to be global, and
the distinction between the weak and the strong minima reduces to the
change of the set of admissible functions.)
According to what was proved in Section 4.4.1, for problem (4) there
must hold Legendre's condition which obviously has the same form as in the
case of problem (1), (2):
(5)
Thus, Legendre's condition turns out to be necessary not only for a strong
minimum, as in Section 4.4.1, but also for a weak minimum.
The Euler equation for the secondary problem (4) has the form

ft [Lxx (t) h·(t) +l;x (t) h (t)] =Lxx (t) h(t) + Lxx (t) h (t), (6)

and is called the Jaaobi equation of problem (1), (2) of Section 4.4.1.
254 CHAPTER 4

Definition. A point T is said to be conjugate to the point to (with


respect to the functional Q) if the Jacobi equation (6) possesses a solu-
tion h(·) for which h(to) h(T) = 0 whilet
(7)

THEOREM 1 (Necessary Conditions for a Weak Extremum in the Simplest


Problem). Let the function L satisfy the conditions of the lemma.
If x(•) yields a weak minimum in problem (1), (2) of Section 4.4.1,
then:
1) x(·) satisfies the Euler equation (4) of Section 4.4.1;
2) Legendre's condition (5) holds;
3) Jacobi's condition holds: in the interval (t 0 , t 1 ) there are no
points conjugate to to;
4) inequality (3) holds.
In the case when x(•) yields a weak maximum, the sign~ in (3) and
(5) should be replaced by ~.
Proof. Assertions 1), 2), and 4) are already proved [the assertion
1) was proved in the foregoing section]. Therefore, it only remains to
prove 3). Let us suppose that the contrary is true, i.e., let -rE{t0 , t 1 ) be a
conjugate point to to, and let h(·) be the corresponding solution of the
Jacobi equation: h(to) = h(T) = 0 and (7) takes place. Let us put

iito={ li<t>,
0,
i 0 ~i~'f,
'f~t~ti. (8)

Then
'1:
~ {h(t)T lxx (t) h (t). + 2h (t)T Lxx (t)h (t) +h (t}T L;,;, (/) h (t)} dt =
0 0 0

Q (h { o )) =
t,
'1: '1:

- ~ h(i)T[l,.,.(t)h(t)+lx;,(t)h(t)Jdt +~ h(t)T[l;,.<(t)h(t)+l;,x (t)h(t)]dt


~ 4

[here we have used the fact that h(t)TLxx(t)fi(t) = fi(t)TLxx(t)h(t)]. Now,


recalling that h(•) satisfies Eq. (6) we integrate by parts in the first
integral [taking into account the equalit!es h(to) = h(T) = 0], after_which
the integrals mutually ca~cel. Hence, Q(h(·)) = O, which means that h(·)
[along with the function h(·) =
0] yields a strong minimum in problem (4).
Let us again apply t~e maximum principle of Section 4.2. According to the
principle, there is Ao ~ 0 and a piecewise-differentiable function p(•):
[to, tll +an* such that

p(t) = 2i.olxx (t) h(f)+ 2iofxx (t) (f), ii


min {A0uTf_._. (t) u+ 2iji (t)T l ..;, (t) u-p (t) u} =
ue-Rn

=f../iT {/) £;,;, (t)h{t)+ 2ioh (t)T Lxx (t) h(t)-p (t) h(t). (9)

trn courses in the calculus of variations conjugate points are usually defined
under the assumption that the strengthened L;gendre cond~tion LXx(t) > 0
holds. Under this assumption we have LXx(T)h(T) = 0 ~ h(T) = 0, and since
h(T) = 0, the uniqueness theorem implies that h(·) 0. Therefore, condi- =
tion (7) is equivalent to the condition of the nontriviality of h(·) and
is usually replaced by the latter.
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 255

Using the same argum~nt as above, we prove that Ao ~ 0, and therefore it


can be assumed that Ao = 1/2.
From relations (9) it follows that
p(t) = L;,;, (1) h(t) +LXX (t) fi (t)_. ( 10)
However, h(·) - 0 fort~ T, and therefore from (10) it follows that p(: +
O) = 0. On the ?ther hand, by virtue of (7), we have p(T- O) = Lxx(T)h x
(T - 0) = Lxx(T)h(T) ~ 0. Therefore, the function p is discontinuous
whereas, by virtue of the maximum principle, it must be continuous. This
contradiction proves that Jacobi's condition is fulfilled. •
Thus, we have shown that all the necessary conditions for an extremum
in the simplest problem which are usually presented in courses of calculus
of variations, namely, Euler's, Legendre's, Jacobi's, and Weierstrass' con-
ditions are consequences of the Pontryagin maximum principle. Now we pro-
ceed to sufficient conditions for a weak extremum.
THEOREM 2 (Sufficient Conditions for a Weak Extremum in the Simplest
Problem). Let the function L satisfy the conditions of the lemma, let
x(·)E C1 ([t 0 , t 1 ], Rn), and let the extended graph of the function x( ·) 1 ie in
V: {t, x(t), x(t))!t 0 ~t~t 1 }cV.
If:
1) the function x(·) satisfies the Euler equation [Eq. (4) of Section
4.4.1];
2) the boundary conditions x(to) = Xo, i(tl) = X1 are satisfied;
3) the strengthened Legendre condition

( 11)
holds;
4) the strengthened Jacobi condition holds: in the half-open interval
(to, t1l there are no conjugate points to t 0 , then i(·) yields a
strict weak minimum [or maximum if the sign> in (11) is replaced
by<] in the simplest problem (1), (2) of Section 4.4.1.
The proof of this theorem will be presented in Section 4.4.5.
4.4.3. Hamiltonian Formalism. The Theorem on the Integral Invariant.
We already mentioned in Section 4.1.1 that the Euler-Lagrange equations
together with the equation of the differential constraint can be brought
to a Hamiltonian (or canonical) system of the form
X=!/tp(t, X, p), p=-:Jt,.(t, X, p), ( 1)
where the function ~ is called the Hamiltonian (or the Hamiltonian func-
tion) of the original problem. Here we shall study this transformation in
greater detail under the conditions of the simplest problem. The passage
from the Euler equation of this problem [Eq. (4) of Section 4.4.1] to the
canonical system (1) is carried out with the aid of the Legendre transfor-
mation. We shall begin with the study of this transformation.
Definition 1. Let a function l:v + R belong to the class C1 (v) in an
open set v of a normed space X, and let its derivative l' specify a one-to-
one mapping of v onto d::O{p!p=l'(;), ;Ev}cX•; let :O::d + v be the inverse
mapping to Z'. The function h:d + R determined by the equality
h(p)=<p, 8(p)>-l(S(p)) (2)
Ls called the classical Legendre transform of the function l.
256 CHAPTER 4

Proposition 1. Let l:v 7 R belong to the class C2 (v) in an open con-


vex set of a normed space X. If l"(EJ is positive-definite, i.e.,
(3)
then the Legendre transform h(•) of the function l(·) exists and coincides
in d with the Young-Fenchel transform (Section 2.6.3) of the function

- { l (6), 6 Ev, (4)


l-
- +oo, ~~v.

which is convex.
Proof. The convexity of l was proved in Section 2.6.1 [the example
2)]; the fact that the mapping~ 7 Z'(~) is one-to-one can be proved by
analogy with Section B) of the proof of Proposition 2 below.
According to the definition of the conjugate function (Section 2.6.3),
l•(p)=sup{<p, ~>-7(6)}=-inf{l(~)-<p, ~)}.
;
If pEd, then p = l'(~) = i'<O for some ~EV· Consequently, O=(L'(~)­
P)Ea[f(·)-:-<P• ·>]{~, which implies that ~is a point of minimum of the
function Z(·) - <p, ·> and
l•(p)=-(l(~)-<p, ~>)=<p, f>-f(~.
Since p = l'(~), we have~= 3(p), and, according to the definition,
i*(p) = h(p) • •
Recalling Young's inequality (Section 2.6.3), we arrive at
COROLLARY 1. Under the conditions of Proposition 1,
l(~)+h(p);;;;o<p, ~>. VpEd, ~Ev. (5)
Given the Lagrangian L(t, x, x)
of the simplest problem, to obtain the
Hamiltonian of the problem we have to perform the Legendre transformation
with respect to the argument x.
Consequently,
flt(t, X, p)=px-L(t, X, x), (6)
where x must be eliminated using the relation
p=L;,(t, x, x), (7)
and hence flt is defined correctly by formula (6) if the mapping x 7 LX(t,
x, x) is one-to-one. [The quantity x in formulas (6) and (7) plays the
role of an independent argu~ent; to avoid the confusion of x with dx/dy we
shall denote the former as x = ~ although for the corresponding derivatives
the notation 4, aL/axi, etc. will be retained.]
Proposition 2. Let the function (t, x, ~) 7 L(t, x, ~) belong to the
class Cr(V), r;;,: 2, in an open set VcRxRnxRn.
If:
a) the second derivative LXX is nondegenerate in V, i.e.,
iJBL
det ( aii.ai, (t, x, 6)
)
* o, V (t, x, ~) E V;

b) the mapping P specified by the equality P(t, x, ~) = (t, x, LX(t,


x, U) maps V onto DcRxRnxRn• in a one-to-one manner, then
D is open, p-tE c•-1 (D}', and the function fit: D-+ R determined by
equalities (6) and (7) belongs to the class cr(D).
2) If the section Vt.~={61(t, X~ 6)EV} is convex for. any (t, x) and the
matrix Lxx<t, x, ~) is positive-definite for any (t, x, ~)EV, then the
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 257
conditions a) and b) are fulfilled and the assertion of Section 1) holds.
Proof. The inverse mapping of P, which exists according to b), is
obviously of the form (t, x, p) + (t, x, 3(t, x, p)), and hence
p=L;.(t, X, S)~s=E(t, x, p). (8)
Since LEC'(V), r~2, we have PEC'-l(V)c'C1 (V). The Jacobi matrix of the
mapping P has the form

I 0 0 )
( 0 E 0 ,
L.xt L.X% L ..
XX

and, according to a), its determinant, equal to detLxx• is nonzero.


Let (to, xo, po) = P(to, xo, ~o) be an arbitrary point of D. By the
inverse mapping theorem, there exists a neighborhood U ~ (t 0 , x0 , ~ 0 ) whose
image P(U)cD is a neighborhood of (to, xo, po). Consequently, Dis open.
By the same theorem, the mapping P- 1 (which is determined uniquely) coin-
cides on P(U) with a mapping of class C1 Since (t 0 , x 0 , p0) ED is an arbi-
trary point, we have P- 1 EC1 (D). Further, from (8) it is seen that 3(t, x,
p), as an implicit function, is determined by the equation F(t, x, p, ~) =
p - Lx(t, x, ~) = 0. This equation being formed by functions of class
cr- 1 , the function 3 and, together with it, the function p- 1 belong to the
class cr- 1 (d) (see the remark in Section 2.3.4 made after the implicit
function theorem).
From (6), (7), and (8) we derive the identity
f?t(t, X, p)=p'E(t, X, p)-L(t, X, 8(t, X, p)). (9)
The differentiation of (9) results in
fltt=P'Et-Lt-L;.'Et=[P-L;.(t , x, E)J'Et-Lt=-Lt(t, X, B(t, X, p)), (10)
flt,.=p'E,.-L,.-L;.B,.=-L,.(t, x, B(t, X, p)), (11)
f?tp=E+pEp-L;.'Ep='E(t,x, p). (12)
From formulas (10)-(12) it is seen that the first derivatives of the
function f?t are compositions of functions of class cr-l and consequently
they themselves belong to that class. Therefore, f?t E C' (D).
B) If the matrix Lxx is positive-definite, then, by Sylvester's theo-
rem [7], detLxx > 0, and hence a) holds. Let the sections Vt x be convex.
Then, given any pair of points (t, X, s ... )EV' k = 1' 2, we have (t, x, Cl~l +
(1-ct)s.)EV,ctE[O,lJ. Denoting Pk = LX(t, x, ~k), we write
.
Pl-Ps=L;.(t, X, 61)-L,;,(t, X, s.) =S d~ L;.(t,
0
X, cts1+(1-ct)6.)dct=

= ~ L;.i (t, x, as1 +(l-a)S.)(s1-ss)da.


0

If ~ 1 ~ ~2 while P(t, x, ~l) = (t, x, Pl) = (t, x, pz) P(t, x, ~z), then
Pl = pz and

r.
n

0 = <P1-p•• 61-s.> = (Pli-P•i)(su-sa;)-


'=1
=Sr. 0:x,o•L·.a·.(t,x,a~1+(1-a)S.)(s1;-s.;)
I n

0 i, i=l :x,
(st1 -6.1)da>O
because, by the hypothesis, the integrand function is positive <Lxx is
positive-definite). This contradiction shows that Pl = pz ~ ~l = sz, i.e.,
b) holds. •
258 CHAPTER 4
Recalling Corollary 1, we obtain
COROLLARY 2. Under the conditions of the second part of Proposition
2, for any (t, x, G)EV, {t, x, p)ED, the inequality
R(t, X, p)+L(t, X, ;);;;;;.<p, 6>
holds.
Now we proceed to the relationship between the Euler equation
d • •
dTL;, (t, x, x)=L,.(t, x, x) (13)

and the Hamiltonian system (1). When saying that x:~ +~is a solution of
Eq. (3) [or that a pair (x, p):~ +an x an* is a solution of system (1)] we
shall mean, without any special stipulation, that (t, x(t), x(t))EV for all
tEL\ [or, accordingly, (t; x(t), p(t))ED]. For brevity, a solution x:~ +an
of the Euler equation (13) and also its graph i(l, x(t))jt EA} will be called
an extPemaZ, and a solution (x, p):~ +an x an of the Hamiltonian system
(1) and its graph {(t, x(t), p(t))ltEA} will be called a canonical extPemaZ
(it will always be clear from the context whether the functions or their
graphs are meant).
Proposition 3. Let LEC1 (V), and let the function L satisfy the con-
ditions a) and b) of Proposition 2.
A pair (x(·), p(•)) is a canonical extremal if and only if x(·) is an
extremal and
p(t)=L;, (t, x(t). x(t)). (14)
Proof. A) Let x(·) be an extremal. If p(·) is determined by equality
(14), then, according to (8), we have x(t) = 3(t, x(t), p(t)). We see now
that the first of Eq. (1) follows from (12) and the second follows from
(13).
B) Let (x(·), p(•)) be a canonical extremal. Then (12), (8), and the
first of Eqs. (1) imply (14), and then, according to (11), the second equa-
tion (1) turns into (13). •
Remark. The assumption that LEC1 (V) is stronger than it is in fact
necessary here: as in Section 4.2.2, it would be sufficient to require the
existence and the continuity of the first and the second derivatives only
with respect to Xi and xj. The reader can verify this remark by making the
corresponding changes in the statements of the implicit function and the
inverse function theorems of Section 2.3.4.
The Hamiltonian system (1) possess many remarkable properties (e.g.,
see [1]). Here we shall limit ourselves to one of them which is related
to the derivation of sufficient conditions for an extremum.
Theorem on the Integral Invariant. Let the function .9,? belong to the
class c2 in an open set DcRx Rnx Rn*, and let w be the differential form
in D specified by the equality
n
ro=pdx-fl(dt= ~p 1 dx1 -fl((t, x, p)dt. (15)
I= I

Let r 1 =.{(t11 (a.), x11 (a.), p11 (a.))ja~a.~b}, k = 1, 2, be two piecewise-smooth


closed contours-in D [such that tk(a) = tk(b), Xk(a) = Xk(b), Pk(a) =
Pk(b)].
If the contour r 1 is obtained from the contour r 2 by translating the
latter along canonical extremals, i.e., if the points 5l 0 (a)= (t0 (a), X 0 (a),
Po (a)) and 5l 1 (a)= (tda.), X1 (a.), P1 (a)) lie on one canonical extremal for every
a, then
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 259
p

Fig. 38

( 16)

Proof. A) According to the classical theorem on the differentiability


of the solutions with respect to the initial data (Section 2.2.7) and the
composition theorem (Section 2.2.2), the solution of Cauchy's problem for
system (1) with the initial conditions (to(a), xo(a), po(a)) [we shall de-
note this solution by (x(t, a), p(t, a))] is piecewise-continuously dif-
ferentiable with respect to a and continuously differentiable with respect
to t. The mapping cr: {(t, a) 1-a~a~b, t E[t0 (a), ! 1 (a)]}-+D (for t1 (a) < to (a)
we must write [tl(a), to(a)] here) determined ~y the equality o(t, a) =
(t, x(t, a), p(t, a)) specifies a two-dimensional surface E in D (Fig. 38).
Orienting E in an appropriate manner, we can write aE = r 1 - r 0 (the two
arcs corresponding to a = a and a = b coincide geometrically but are ori-
ented oppositely so that the integrals along these arcs mutually cancel).
According to Stokes' theorem,

and hence equality (16) will be proved if we verify that dw[crt, oal - 0.
B) Differentiating form (15), we obtain

dw = 1~dp 1 !\dx,-[ :lt1 dt+ 1~1 (.9'tx 1 dx 1 +:ltp 1 dp 1)] 1\dt=


n
= i::~1[dp;!\dX;-fffx.dX;!\dt-fltp.dp;!\dt]
l l
(17)

(we remind the reader that for the exterior multiplication we have dt;\dt=
0).
If s = (st, sx, sp) and n = (nt, nx, np) are two tangent vectors, then
( 17) yields

I fftp 1-fffx1
st sx1 Sp 1
111 11x1 l'Jp1

Therefore

dw[a 1,aal""'-L
n I fft P; -fffx1
I xu(t,a) P;t(t,a)
I
=0
i=l 0 X;a,(t,a) P;a(t,a)

because the expression cr(t, a) = (x(t, a), p(t, a)) is a solution of system
(1) with respect to the argument t and hence the first and second rows of
the determinant coincide. •
260 CHAPTER 4

The differential form (15) for which equality (16) holds is called the
Poincare-Cartan integral invariant.
Definition 2. We say that AcD is a Legendre set if
( 18)

for any piecewise-smooth contour yc:A. In other words, the restriction


wiA is a closed form.
[Here we need a more precise definition of the piecewise-smoothness:
we shall suppose that A is the image of an open set GcRk, under a mapping
<p: G-+D of class C1 (G) and that y is the image of an ordinary piecewise-
smooth closed contour in Gunder this mapping.]
The following three examples of Legendre sets will play an important
role.
1) Let us fix a point (t 0 , X0 , Po)ED and consider the family of canoni-
cal extremals {(x(t, \), p(t, \)) Ia < t < S} with the initial conditions
X (t 0 , A)= Xo, p (t., A)= A. ( 19)
The set
~={(t, x, p)jx=x(t,A.), p=p(t, A.), jp0 -A.j<e, a<i<M
is the desired Legendre set; £ > 0 and the interval (a, M:;) t 0 are chosen so
that ~cD.
Indeed, as was said above, a piecewise-smooth closed contour yc~ is
the image of a piecewise-smooth contour
f={(t(s), A.(s))ja::::;;;s::::;;;b, t(a)=t(b), A.(a)=A.(b)}c:(a, ~)X(p 0 -e, p0 +e)
under the mapping (t, A) + (t, x(t, \), p(t, \)). Projecting ron the
planet= to,weobtain the contour fo = {(to,\(s))la < s < b, \(a) A(b) } ,
and, by virtue of (19), the image of fo is
'Vo={(t 0 , x(t 0 , A.(s)), p(t., l..(s))ja::::;;;s::::;;;b}={(t •• x 0 , A.(s))ia::::;;;s::::;;;b}. (20)
Since the contour Yo is obtained from the contour y by means of the trans-
lation of the latter along.canonical extremals, the theorem on the integral
invariant implies
1ro= 1ro= 1pdx-~dt=O
'l' 'l'o Vo

because, by virtue of (20), dx = 0 and dt = 0 on Yo· Since (18) holds for


any yc~, we see that l: is a Legendre set.
2) Let x(·) in problem (1), (2) be a scalar function, let us suppose
that a family of extremals x(·, A.): (a, ~)~R, A.E(a 1, ~ 1 ), is given, and let
x ( ·, ·) E C1 ((a, ~)X (a1, ~ 1 )).
The set
:2={(t, X, p) I X=x(t, A.), p=p(t, A.)=Lx(fl x(t, A.), x(t, A.)), tE(a, ~). A.E(ai, ~1)}
is the desired Legendre set. Indeed, let y, Yo, r, and r 0 be defined as
above.
Then
b

1ro = 1ro= 1p dx = 5p (t 0, A. (s)) ~ {t 0 , A. (s)) A.' (s) ds. (21)


"' Yo Yo a

Denoting by~(\) an antiderivative of the function A+ p(t 0 , \)xA(to, \),


we obtain from equalities (21) the relation
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 261

pw =~Ill' (1. (s)) 1.' (s) ds =Ill (1. (s)) /;::= 0


'I' a

because A(a) = A(b) (the contour is closed).


3) Proposition 4. Let a function p: G-. Rn•, P=(Pi• ... , Pn)• be defined
in an open set Gc:R X Rn, the graph of plying in D: l:-={!t, x, p(t, x))l(t, x)E
G}c:D. For I: to be a Legendre set:
a) under the assumption that p is continuous it is necessary and suf-
ficient that there exist a function SEC1 (G) such that

p (t, x) = iJS f; x), (22)

~~ + ~ ( f, X, ~:) = 0; (23)

b) under the assumption that pEC1 (G) it is necessary, and for a sim-
ply connected domain G it is also sufficient, that the relations
i}p; iJpk
OXk =ox; , ( 24)
n
k
op; - - of?t- ~ of?t opk
i}t - iJ x; ""'- iJPk iJ x; '
k=l

t, = I ' ... ' n,

hold at each point (t, x)EG.


Proof. Substituting p = p(t, x) into (15), we obtain the form
ro=p(t, x)dx-fft(t, X, p(t, x))dt.
The fact that I: is a Legendre set means that
r
pro=O for any rc:a·.
According to the theorem of classical mathematical analysis, this is
equivalent to the exactness of w, i.e., to the existence of a functionS
such that w = dS, whence follows- a).
If pEC1 (G), then wEC1 (G) and dw = d(dS) = 0, which is equivalent
to (24). The sufficiency of condition (24) in the case when G is simply
connected follows from the same classical theorem which implied a). •
Equation (23) is called the HamiZton-Jacobi equation. If a solution
of this equation is known in the domain G, then (22) specifies a function
whose graph is a Legendre set.
4.4.4. Sufficient Conditions for an Absolute ExtreiDllm in the Sim-
plest Problem. We shall again consider the simplest problem of the classi-
cal calculus of variations:
t,
:J (x(·))= ~ L(t, x, x)dt-+extr, (1)
t.
X (t 0) = X0 , X (t1 ) = Xt• (2)
Let us suppose that the Lagrangian L belongs to the class C2 in an open
set Vc:RxRnxRn. By the admissible function for problem (1), (2) we
shall mean the functions x(·)EKCl((t 0 , 11], Rn) with piecewise-con,tinuous
derivatives whose extended graphs lie in V: {(t, x(t), x(t)lt 0 . :s:;;;t=;;;;t1 }c:V. For
definiteness, the whole argument will be presented for the case of a mini-
IDllm; the reader can easily make the corresponding changes in the statements
for the case of a maximum by replacing L by -L.
As was already see~ in Sections 4.4.1 an~ 4.4.2, the fulfillment of
the Legendre condition Lxx(t) = Lxx<t, x(t), x(t)) ~ 0 is necessary for
262 CHAPTER 4

an admissible function~(·) to yield a minimum in problem (1), (2). On


the other hand, the strengthened Legendre condition Lxx<t) > 0 is a part
of the sufficient conditions for a minimum. This condition implies the
convexity of the function ~ + L(t, x, ~) in the vicinity of the points
(t, x(t), ~(t)), which suggests the idea that the convexity of the La-
grangian L with respect to ~ throughout its domain must be included in the
sufficient conditions for an absolute minimum. Bogolyubov's theorem (see
Section 1.4.3) confirms this idea. Therefore, we shall suppose that
Lici(t, x, £)>0, V(t, x, £)EV, (3)
and that the section
Vt, x= {6 I (t, X, £) E V} is convex (4)
for any (t, x). Then from Proposition 1 of Section 4.4.3 it follows that
the function
[,(t, X, 6 )={ , x',
L+(t00 6), (t, x, £)EV,
u. x, 6) ~v
(5)

is convex with respect to ~.

According to Proposition 2 of Section 4.4.3, conditions (3) and (4)


guarantee the correctness of the definition of the Hamiltonian fftEC•(D)
by the relations
fft(t, x, P)=px-L(t, x, x>, (6)
p=Lx(t, x, x)
from which x must be eliminated. By Corollary 2 in the same section,
L(t, X, 6)+fft(t, X, p);;:;:p'g,
(7)
V(t, x, 6)EV, V(t, x, p)ED.
Finally, we shall again deal with the Poincare-cartan differential form
w=pdx-fft(t, x, p)dt. (8)
THEOREM 1 (Sufficient Conditions for a Minimum). Let L belong to the
class C2 in an open set VcRxRnxRn, and let conditions (3) and (4) hold.
Further, let D be defined in the same way as in Proposition 2 of Section
4.4.3, and let G be an open set in R x Rn.
If:
1) x(·): [t 0 , tr]-+-Rn is an admissible extremal of problem (1), (2) and
its graph lies in G:
f'= {(t, x(t)) I to~t~tl}cG;
2) there exists a function p:G + Rn* of class C1 (G) whose graph lies
in D:
1:={(1, x, p(t, x)) I (t, x)EG}cD
and is a Legendre set;
A dcl A A ~
3) p(t)=p(t, x(t))=L;;(t, x(t), x(t));
then
~ ~

~ L(t, x(t), x(t)dt > ~ L(t, x(t), ~(t))dt


t0 10

for any admissible function x(·):[to, t 1 ] + Rn which satisfies boundary


conditions (2) and whose graph lies in G:
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 263

f={(t, x(t))lt 0 ~t~t 1 }c:G.


Remark. Proposition 3 of Section 4.4.3 implies the equality i(t)
fltp(t, x(t),_p(t)), and consequently the derivative i:(t) is continuous to-
gether with x(t) and P(t) = p(t, x(t)). Therefore, x(t) is continuously
differentiable although in the case under consideration the admissible
class consists of all piecewise-differen tiable curves. However, this fact
does not restrict the applicability of the theorem. Indeed, as was shown
in Section 4.4.1, the existence of a continuous (and even piecewise-differ-
entiable) function p(·), such that p(t) = Lx(t, x(t), i(t)), is a necessary
condition for a (strong) minimum which is implied by the maximum principle.
We see that under conditions (3) and (4) this means that the function ~(.)
must be continuous.
Proof. Let us denote p(t) = p(t, x(t)) and consider the curves
y.={(t, x(t), p(t)) I t 0 ~~ ~t 1 }c:l::,
v={(t, x(t), p{t))l to~t~tl}c:~.
Since x(ti) = x(ti), i = 0, 1, we have p(ti) = p(ti, x(ti)) = p(ti, x(ti)) =
p(ti). Therefore, the end points of~the curve y andy coincide, and we
can consider the closed contour y- yc:l:: which is obtained if we first
pass along y from (to, x(to), p(to)) to (tl, x(tl), p(tl)) and then pass
in the opposite direction along y. Since E is a Legendre set,

0= p co=Sco-S co. ( 10)


v-v v -v
Using equality (7), we obtain
t,
s spdx-fltdt= s{p(t)x(t)-D'((t, X(t), p(t))}dt~
CO=
'I' 11 t.
t,
~sL(t, x(t), x(t))dt=-'(x(·)), ( 11)

and the equality is only possible here if
p(t)x(t)-flt(t, x(t), p(t))==L(t, x(t), x(t)),
i.e. [we again use (7)], i f
p(t)x(t)-flt(t, x(t), p(t))·= max {px(t)-W(t, x(t), p>J.
{pI (t, x (I), Pl e Dl

By the Fermat theorem,


x(t)=fltp(t, x(t), p{t}). ( 12)
Since the graph of p is a Legendre set, equality (24) of Section
4.4.3 must hold, whence, taking into consideration (12), we obtain
dp; <t> =~ (t x (t)) = op, + ~ op, dx,. = - (O.flt +~an ap,.)
dt dt Pi ' · at ~ox,. dt ax, ~ iJp, iJ"1 +

+ 1:k 00P" ~'ItPit =-flt..1(t; x{t),


.1111
p (t)). ( 13)

According to (12) and (13), the pair (x(·), p(·)) is a canonical extremal.
Further, we have (x(t 0 ), p(to)) = (x(to), p(to)), and since (x(·), p(·))
is also a canonical extremal (Proposition 3 of Section 4.4.3), the unique-
ness theorem implies that x(t) = x(t), p(t) = p(t).
264 CHAPTER 4
Hence from (11) it follows that
t,
Sro= SL(t, x(t>, i(t))dt=:l<x<·>>.
v '·
and for the other admissible functions x(•) whose graphs lie in G and which
satisfy conditions (2) there must be

l co< :7 (x (. )).
'V

By virtue of (10), we obtain:7(x(·))>:7(x(·)).



COROLLARY. If, under the conditions of the theorem, the set G coin-
cides with the projection of D on R X Rn = {(t, x)}, then x(·) yields a
strict absolute maximum in problem (1), (2).
A part of the conditions of the theorem we have just proved is usually
presented in courses of the calculus of variations in a somewhat different form
with the notion of a field of extremals.
Definition. Let A and G be open sets in Rn and R x Rn, respectively.
A family of functions {x(·, A.): ft 0 (A.), tt{A)J--..Rn j.A.EA} is said to form a field
of extremals covering G if:
1) x(·, A) is an extremal (i.e., a solution of the Euler equation) of
problem (1), (2) for any A.EA;

2) the mapping (t, A) + (t, x(t, A)) is one-to-one and its image con-
tains G:

3) the function p:G + Rn* determined by the equalities


p(t, x)=L;,(t, x(t, A), x(t, A)),
(14)
(t, x)=(t, x(t, A))

belongs to the class C1 (G) and its graph is a Legendre set.


The function u:G + Rn determined by the equalities
x
u (t, x) = (t, A.), (t, x) = (t, x (t, A.)) ( 15)
is called the field slope function. An extremal i(·):[to, t 1 ] + Rn is
said to be embedded in the field x(·, ·) if its graph is contained in G
and x(t) :: x(t, ~) for some >:eA.
Substituting (14) and (15) into (8) and recalling the definitionofa
Legendre set (Section 4.4.3), we can restate condition (3) thus:
3') In the domain G the integral
(/1 ,x1)
S {L,i(t, x, u(t, x))dx- [L,i(t, x, u(t, x))u(t, x)-L(t, x, u(t, x))]}dt (16)
(/ 0 , x,)

does not depend on the path of integration joining the points (t 0 , x 0 ) and
( t1o x 1 ) (according to the theorem of classical mathematical analysis,
this is equivalent to the property that the integral over every closed
contour is equal to zero). Integral (16) is called Hilbert's invariant
integral.
THEOREM 1'. Let function L satisfy the same conditions as in Theorem
1, and let x(•):[to, t 1 ] + Rn be an admissible extremal whose graph is con-
tained in G.
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 265

rf·i(·) can be embedded in a field of extremals covering G, then x(•)


yields a minimum for functional (1) in the class of admissible functions
which satisfy conditions (2) and whose graphs are contained in G.
The reader can easily see that this is a mere restatement of Theorem
1. It is particularly convenient for n = 1 because in this case condition
(3) of the definition stated above follows from 1) and 2) (see the second
of the examples of Legendre sets in Section 4.4.3), This remark can be
used in the solution of many problems.
4.4.5.Conjugate Points. Sufficient Conditions for Strong and Weak
~xtrema. sect ion we shall again suppose that L E c• (V) and
In this
~(·)EC 1 ([t 0 ,t 1 ], Rn)
is an admissible extremal of problem (1), (2) of Section
4.4.4 along which the strengthened Legendre condition
(1)
holds. Since the matrix Lxx is continuous, it remains positive-definite
in a neighborhood Vof the extended graph {(t, x(t), i(t))lt 0 ~ t ~ t 1 },
Let us choose the neighborhood v so that its sections I (t, x, s)EV} vt .... ={s
are convex. Then the conditions of Proposition 2 of Section 4.4.3 are
fulfilled, and, accorgingly, we can define the domain Dc:::RXR"XR"• and
the Hamil ton ian ;/(: .b-+ R of class C2 •
The right-hand sides of the canonical system
X=.'ltp(t, X, p), P=-;K. . (t, X, p) (2)
are continuously differentiable in D. The canonical extremal (x(·), p(·))
corresponding to the extremal x(·), and hence x(·) itself, can be extended
to an open interval containing the closed interval [to, t1l· After that,
narrowing V, if necessary, we can assume that V has the form
V={(t, X, s)Jix-x(t)i<e, ls-~(t)l<e,t.-e<f<t.+e}.
Let (to, xo, po) =(to, x(to), Lx(to, x(to), ~(to))). Let us con-
struct the family of canonical extremals {(x(·, A), p(·, A)) I lA- pol < o}
determined by the initial conditions
(3)
(cf. the first example of a Legendre set in Section 4.4.3). For suffi-
ciently small o the extremals of this family, together with (x(·), p(•)) =
(x(·, p 0 ) , p(·, po)), are contained in :6 for tE[t •. ti]; it is evident that
the functions x(·, ·), p(•, •) are continuously differentiable (Section
2.5. 7).
Proposition 1. For a point 1: E{10 , 11] to be conjugate to t 0 it is
necessary and sufficient that
det (oX(T, A)
iiA
I )=0.
A=Po
(4)
Proof. A) By the theorem of Section 2.5.7, the collection of the de-
rivatives of the solutions of system (2) with respect to the initial data
for A = Po forms the principal matrix solution of the variational system
of equations
~ =np...s+sr'tppTJ,
(5)
~= -1t'xxs-.9't. .pTJ·
The derivatives ax/3Ai• 3p/3Ai we are interested in, which are arranged in
columns, occupy half of the columns (n of the 2n columns) of this matrix,
and, according to conditions (3),
266 CHAPTER 4

(!;~~~:) 11=1 = (e~) •


0
(6)

Equality (4) is equivalent to the existence of a nonzero vector cERn such


that

Let us denote

This expression is a linear combination of columns of the principal matrix


solution and therefore it is a solution of the system (5), and, by virtue
of (6), we have ~(to)= o, n(to) = c ~ 0. Moreover, ~(T) = 0.
Thus, equality (4) is equivalent to the existence of a nontrivial
solution of system (5) for which ~(to) = ~(T) = 0.
B) LEMMA. A pair {s(·), n(·)} is a solution of system (5) if and only
if s(·) is a solution of the Jacobi equation

(7)

and
11 (t) = Lx;~ (t) +L;., (t). s (8)
Proof. According to the definition (see Section 4.4.2), Eq. (7) is
the Euler equation for the secondary extremal problem with the Lagrangian

(9)

The corresponding Hamil ton ian S) is obtained by means of the Legendre


transformation with respect to ~:
~= TJ'~-~~· ~ ( 10)
TJ=~i =L.u~+Lu;.
To obtain the second of these equalities, we use the relation
- .
s~Lx.ts = L --aa-
j, k
a•f. .
s1 s~.
Xj· Xk

whence

Similarly,

Eliminating~ from (10), we obtain

~
I --1 L --1 ~ ~ ~ --1] }
=2 {tJTLx..· TJ-2sT xi L;i 1'J +sT{Lxx.Lxx -Lx.tLxx s ,· ( 11)

Now we make use of formulas (11), (12), and (8) of Section 4.4.3:
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 267

9't,.(t, x, p)=-L,.(t, X, B(t, X, p)),


9'tp(t,x,p)=B(t, x, p),
p==Lx(t, x, B(t, x, p)).
The differentiation of these equalities with respect to x and p results in
9't,.,. = -l,.,.-lxis,. = -l,.,.-lufftP"'
9't"P = -lu'B P = - Lufftn•
E= ~; =lx/Bp=lu9'tpp•
0= ~~ =l;x +L.rx'B,.=lg +lu9'tp,.•
It follows that

( 12)

Consequently, (11) can be rewritten thus:

,e,={{TJ·.rt'ppTJ +2~·.9'(,.pTJ+ ~·.rt,.,.~}.

According to Proposition 3 of Section 4.4.3, a pair{~(·), n(·)} is


a solution of the canonical system
;=.t)", ~=-.t)e (13)
if and only if ~(·) is a solution of (7) and the second of relations (10)
[coinciding with (8)] holds. It remains to note that system (13) is iden-
tical with (5). •
C) Completion of the Proof. In the first part of the proof it was
shown that equality (4) is equivalent to the existence of a nontrivial solu-
tion{~(·), n(·)} of system (5) for which ~(to)= ~(t) = 0. By the lemma,
~(·) i~ a solution of the Jacobi equation and (8) takes place, whence n(t) =
Lxx(t)~(t). The equality n(t) = 0 contradicts the nontriviality of the
solution (by virtue of the uniqueness theorem), and therefore n(t) ~ 0.
Recalling the definition stated in Section 4.4.2, we conclude that (4) is
equivalent to the fact that the point t is conjugate to the point to. •
The proposition we have proved leads us to the following geometrical
interpretation of the notion of a conjugate point. Let again (i(·),
p(·)) be the canonical extremal corresponding to the extremal x(·) under
consideration. For a sufficiently small o the canonical extremals deter-
mined by conditions (3) form a thin strip along the graph B0 B1 (Fig. 39).
For t = to the edge of this strip is "vertical" and its projection coin-
cides with the point Ao =(to, xo). Condition (4) which, as was shown,
is equivalent to the equality (ox/ at.)c= 0 means that for t = t the strip
is tangent to a "vertical" direction at the point Bt = (t, x(t), p(t))
(i.e., to a direction parallel to the planes t = const, x = const).
If we project the strip on the space (t, x), we obtain a pencil of
extremals issued from the point Ao (Fig. 39). In the vicinity of the point
AT = (t, x(t)) the mapping (t, A.) + (t, x(t, A.)) is not one-to-one ["the
extremals lying infinitely close to the extremal x(·) and having the same
origin as x(·) intersect at the point At"]. The point At is also said to
be conjugate to Ao.
Exercise. The arcs of the great circles on the sphere s 2 = {(x, y,
z)Jx2 + y2 + z 2 = R2 } are extremals of the distance functional. Draw a
great circle through a point· A 0 E 811 and find a point on it which is con-
jugate to Ao.
268 CHAPTER 4
Jl

Fig. 39

Proposition 2. If on an extremal i(•):[to, t1l + Rn there hold the


strengthened Legendre condition (1) and the strengthened Jacobi condition
(i.e., the half-open interval (to, t1] contains no points conjugate to t 0 ) ,
then this ext'remal can be embedded in a field of extremals covering a
neighborhood G of its graph {(t, i(t))lto ~ t ~ t 1 }.
Proof. A) We shall first prove that for a sufficiently small o > 0
none of the points of the closed interval [to, t 1 ] is conjugate to any of
the points ~~-E{t0 --6, 10 ). If otherwise, there exist t~m-..t 0 , timE[t0 , ta and
solutions E;ln) (•) of the Jacobi equation (7) such that ~<n>(t~m)=~<n>(tim)=O,
Lx:t(if')~<n>(tlm)=;i:O. The equation being homogeneous, we can put.j;"<n>(tlm)l=l.
Passing to a subsequence, if necessary, we can assume that tim-..tiE(t0 , t1],
and
~<n> (tlnl)--+ 1'J, 111 I= 1,

If condition (1) is fulfilled, the Jacobi equation is reducible to


the linear system (5) whose coefficients are continuous on a closed inter-
val tJ. :::> [1 0 , t 1J, and Cauchy's problem for this system with the initial con-
ditions
6(ti)= 0,
11 (ti) =lxx(ti) i {ti) +lx..(ti) 6' (ti) = lxx(t;) 'I
possesses a solution (E;(t), n(t)) defined on~ (Section 2.5.4). By the
theorem on the continuity of the solution with respect to the initial data
(Section 2.5.5), s(n)(t) + E;(t) and
11<"1 (t) =Lxx(tM"<nl(t) +L;..(t).~<n> (t)-+ 'I) (t)
uniformly on~. Therefore, there must also be ~(n)(t) + E;(t) uniformly
on ~.
Furtller, we have 6{t 0) =lim 6(t~m) =lim ~<n> (t~"') = 0, and the points /0 =lim t~"'
II II II

and ti=limtf11' cannot coincide because, if otherwise, then n = ~(t~) = 0


n
since

Since E;(to) = E;(t~) = 0 and Lxx<t~)~(t~) "'0, the point 1iE(t0 ,t1] is conju-
gate to to, which contradicts the hypothesis.
B) Let us choose t~ < to such that the closed interval [to, t 1 ] does
not contain conjugate points to t~. Let us construct a family of canonical
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 269

extremals (x(·, A), p(·, A)) by replacing to in (3) by t~. By virtue of


Proposition 1, det ~~ ('t, p0)-=/=0, Y't'E[t 0 , i 1 ].

Let us define a mapping «p: Co-+RxRn of the cylinder Co = [to - o,


tl + o] X B(po, o) by means of the equality «p(t,1..)=(t,x(t,,;.)) .• At the points
ta
('t', p0 ), 't' E [t 0 , the Jacob ian of «p is nonzero:
0 )=detx,.-=f=O,
det«p'=det( Xt1 x,_

and we can choose o so that this inequality is retained at all the points
of C0 • Moreover, by the inverse mapping theorem (Section 2.3.4), every
point (T, po) possesses a neighborhood UT which is mapped by «p onto WT =
«p (U ~) in a one-to-one manner, and «p-J. ECl (W ~>·
We shall show that for a sufficiently small o the mapping «p is one-
to-one throuRhout C0 • Indeed, if otherwise, there exist two sequences
(t~, A~), (tn, A~) such that A~ + Po, A~ + Po and (t~, x(t~. A.,',))=ip(t~, A~)=«p(t~,
A~)= (t~, x(t~, A~)). It follows that tJ = t~.
Passing to a subsequence we can assume that ~ = t~ + T. For suffi-
ciently large n the points (t~, A~) and (t~, A~) belong to u,, and the
equality «p(t~. 1..~)=«p(t~, 1..~) contradicts the definition of UT.
. Now we shall show that G = «p (int C6 ) is a neighborhood of the graph
f={(t, x(t))jt 0 :;;;;t:;;;;tt}. Since f=«p({(t, p0 )jt 0 :;;;;t:;;;;t1 })cG, it is necessary
to prove that G is open. Indeed, if (T, x) = «p (7, ~) EG, (T, i:) ECo , then
detcp'(T, i:')-=/=0, and, by the same theorem of Section 2.3.4, a neighborhood
of the point (t, X) is mapped on a neighborhood of the point (t, x). There-
fore, (t, x) is an interior point of G, and hence G is an open set.
Thus, the family of extremals {x(·, A)} covers Gin a one-to-one man-
ner. The fact that this family forms a field of extremals was proved in
the first example of a Legendre set in Section 4.4.3. •
The field we have constructed consists of extremals passing through
one and the same point (t~, xo) and is called a central field of extremals.
Now we come back to sufficient conditions for a minimum. First of all
we shall show how sufficient conditions for a weak minimum are derived from
the general theorem of Section 4.4.4. The corresponding theorem was al-
ready stated in Section 4.4.2.
Proof of the Jacobi Theorem (Theorem 2 of Section 4.4.2). If the
Legendre and the Jacobi strengthened conditions hold on an admissible ex-
tremal i(·) of problem (1), (2) of Section 4.4.4, then, according to Propo-
sition 2, it can be embedded in a field of extremals covering a neighbor-
hood G of the graph of x(·). By Theorem 1 (or Theorem 1') of Section
4.4.4, the inequality
t,
3(x(·))=~L(t, x(t), x(t))dt>.1(x(·))

holds for all x(·)EC1 (rt 0 , t 1J, Rn), which satisfy. the boundary conditions
x(t 0 ) = x(to), x(tl) = x(ti) and whose graphs are contained in G while
their extended graphs are contained in V. If the norm llx( ·) - x( ·) llcl is
sufficiently small, the last two conditions are satisfied. Consequently,
x(·) yields a C1 -local, i.e., weak, minimum in problem (1), (2) of Section
4.4.4. •
Now we proceed to sufficient conditions for a strong minimum. In
this case we again assume that functions x(·) and x(·) themselves are so
close to one another that the graph of x(·) is contained in G; however,
270 CHAPTER 4
~(·) may be arbitrary, and the extended graph of x(•) is not necessarily
contained in V. Outside Vcondition ( 1) does not guarantee the convexity of
the function E,; + L( t, x, E,;), and Young's inequality, which was used in the
proof of Theorem 1 of Section 4.4.4 [see formula (7) of Section 4.4.4], does
not necessarily hold. Therefore, in the formulation of the theorem we have
to introduce an additional assumption (Weierstrass' condition) about the
nonnegativity of the function
8(t, x, u, s)=L(t, x, s)-L(t,.x, u)-L;(t, X, u)(;-u).
Weierstrass' Theorem on Sufficient Conditions for a Strong Minimum.
Let the function L be of class c 2 in an open set c:: ~X ~nx.~n. let x(.) E v
C1 (ft 0 , t 1 j, ~n),t and let the extended graph of function Jt(•) lie in V:

r= {(t, x(t), ~ (t)) Ito~ t ~ tl} c:: v.


If:
1) x(•) satisfies the Euler equation
d ~ ~
TtL;,(t)~L;, (t);

2) i(·) satisfies the boundary conditions


X(t0 ) = X0 ; X(1 1 ) = X1 ;

3) the strengthened Legendre condition (1) holds along i(·);


4) the strengthened Jacobi condition holds along x(•): the half-open
interval (t 0 , t 1 ] contains no conjugate points to t 0 ;
5) there exists a neighborhood v;::) r
for which Weierstrass I condition
8(t, X, u, s)~O

holds for all (t, x, u, E;) such that (t, x, u)EY, (t, x, ~)EV:, then ~(·) yields
a strong minimum in the simplest problem (1), (2) of Section 4.4.4 (this
minimum is strict if 8>0 forE;~ u in Weierstrass' condition).
Proof •' Let us define a neighborhood V;::, f in the same way as the
beginning of the present section and construct a neighborhood, G;::, ~'=
,{(t, x(t))l_t 0 ~t~t 1 } an~ a ~ield of ex·tremals covering G; without loss of gen-
erality, we can put V = V. Let us denote by u(t, x) the slope of the field
at the point (t, x) = (t, x (t, }..)) EG:
u(t,. x)=x(t, i..) for x=x(t, l..),
and let
p(t, x)=L;,(t, x, u(t, x)).
According to Proposition 2 and the example 1) in Section 4.4.3, the graph
of the function p:G + R.n* is a Legendre set and lies in D:
ll={(t, x, p(t, x))l(t, x)EG}c::D.
Now for an admissible extremal x(·) whose graph is contained in G, we
write
~ ~
3(x(·))-~ w= ~ L(t, x(t), x(t))dt -~ {p(t, x(t))x(t)-W(t, x(t), p(t, x(t)))}dt==
' ~ ~
tMe remind the reader that function i(·), which yields a strong minimum
when condition (1) holds, must possess a continuous derivative although
in the present section the problem is considered in KC 1 (ft 0 , t~], Rn)
(see the remark after the statement of Theorem 1 of Section 4.4.4).
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 271

t,
=·t,~ {L(t, x(t), x(t))-Lx (t, x(t), u(t, x(t))x(t)+

+Lx (t, X (t), U (t, X (t)) U (t, X (t))-L(t, X (t), U (t, X (t)))}di=
t,
= ~ 8(t, x(t), u(t, x(t)), x(t))dt.
t,

Since the extremal x(·) is embedded in the field, we have x= u(t,


x(t)), and therefore
t,
.?(x(·))-5 (J)= ~ 8(t, x(t), u(t, x(t)), ~(t))dt~o. (14)
V I,

Consequently,
t,
.?(x(·))-.?'(.~(·))= ~ 8(t, x(t), ult, x(t)), x(t))dt (15)
t.

(as in Theorem 1 of Section 4.4.4, we have ~-(J)=~(J)),


y y
By construction (see the proof of Proposition 2 of Section 4.4.5), we
have (t, x(t, A), x(t, A))=(t, X, u(t, x))EV=Y for {t, x)=(t, x(t, A))EG and (t, x(t),
x(t))EV since x(·) is an admissible function. Consequently, Weierstrass'
condition is applicable, and (15) implies that .7(x(·ll-.?'(x(·))~O. Thus,
if x(·)GKC 1 ((t 0 , ! 1], Rn) is admissible and satisfies the boundary conditions,
and the norm llx(·) - x(·) lie is so small that the graph of x(·) lies in G,
then .?'(x(·})~:J.(x(·)). Hence, x(·) yields a strong minimum in problem (1)-
(4) of Section 4.4.4.
Here the equality .?'(x(·))=.?'(x(·)) can only hold when f{(t, x(t), u(t, x(t)),
x(t)) = 0 for all t except possibly the points of discontinuity of x(·).
Ifu*s:::;>'fi>O, thenx(t) =u(t, x(t)), and, in particular, x(·) turns out
to be continuous. Further, according to relations (8) and (12) of Section
4.4.3, we have
p(t, x)=Lx(t, x, u(t, x))<:>u(t, x)=

=3(t, x, p(t, x))<:>u(t, x)=g{p(t, x, p(t, x)):::;>x(t)=g{p(t, x, p(t, x(t)).


We have obtained a formula coinciding with formula ( 12) of Section
4.4.4, and using the same argument as in that section we find that x(t) _
x(t). •
Remark. Relation (15) allows us to show that the quadratic functional
t,
~<s<·H= ~ 9'(t, ;, ~>at
t,

with Lagrangian (9) can be brought to a "perfect square." Computing the


Weierstrass function [and putting~(·) = 0], we obtain
t,
Q(s (·))=SIP {t) ~+P (t)- 1 [l...x (t)-II (t) 8 -l(f)H l2dt, ( 16)
t.

where P(t) 2 = Lxx(t), and=<·) and IT(·) are the solutions of the matrix
system

8 (t) =:itpx (t) 8 +.*PP (t) II, (17)


IT (t) = - fft xx (t) E -1!xp (t) II
272 CHAPTER 4

with the initial conditions


S(t~)=O, II(t~)=E. ( 18)
4.4.6. Theorem of A. E. Noether. In Section 1.4.1 we already demon-
strated a number of situations in which the Euler equation possessed a
first integral. For instance, if the Lagrangian does not involve t ex-
plicitly, i.e., if it is of the form L(x, x), then H = Lx;- L = const
(the energy integral). In this section we shall consider a general method
of constructing first integrals for systems of differential equations which
are Euler equations of variational problems. This method is based on a
simple and, at the same time, profound theorem proved by Amalie Emmy
Noether (1882-1935), a distinguished German mathematician. The universal
principle expressed by the Noether theorem is often formulated thus: "Every
kind of symmetry in the world generates a conservation law" or thus: "The
invariance of a system with respect to a transformation group implies the
existence of a first integral of that system." For instance, the invari-
ance of a system relative to translation with respect to time, which means
that the time t is not involved in the Lagrangian, yields the energy inte-
gral.
Now we pass to strict definitions. Suppose that we are given a family
of mappings
Sr;.: RxRn--+RxRn, Ja!<e 0 ,
Sr;. (t, x) = (~ (t, x, a), :£ (t, x, a)).
We shall assume that this family possesses the following two properties:
1) the functions ~ and :£ belong to the class C2 ;
2) for a + 0 the relations
~(t, x, a)=t+aT(t, x)+o(a), (1)
:£(t, x, a)=x+aX(t, x)+o(a)

hold.
The vector field (T(t, x), X(t, x)) will be called the tangent vector
field of the family {Sa}• Under the action of the transformations Sa the
point P = (t, x) describes a curve in R x Rn, and {T(t, x), X(t, x)} is
the tangent vector to this curve at the point P (i.e., for a= 0; see Fig.
40).
LEMMA. Let X(·)EC'(ft 0 , 11 ], Rn). Then there exist ll>O, e>O, X(·, ·)E
C2 ((t 0 -ll, t 1 +ll)X,(-B, e), Rn), tk(·)EC 2 ((-s, e)), k = 0, 1, such that for a E
(-E, E) the image of the graph of x(•) is the graph of the function x(·,
a): [t 0 (a),t 1 (a)]--+Rn. Moreover,
x(t, O)=x(t), xa(t. O)=X(t, x(t))-x(t)T(t, x(t)), (2)
t{(O)=T(t;, x(t;)). (3)
Proof. The image of the graph of x(•) under the mapping Sa LS speci-
fied by the parametric formulas
i=T(s, a)=~(s, x(s), a), f 0 ~s~t 1 , (4)
x=x(s, a)=.I(s, x(s), a), laJ~s 0 •
The two functions T and x are compositions of functions of class C2 and
hence they themselves belong to this class.
According to (1),

~;(t,x,O)=l, ~;(t;x,O)=O,
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 273
and therefore

Using the compactness of the closed interval to ~ s ~ t 1 and the continuity


of oT/os, we find 01 > 0 and e:1 > 0 such that for Ia I < e: 1 and to - o1 <
s < t1 + o1 the inequality ~;(s, a)>l/2 holds. Then for a fixed aE(-e,
e:) the functions~ T(s, a) is monotone increasing and it maps [t 0 , t 1 ]
onto
[t0 (a), tda)J=[T(t 0 , a), -r(ti, a)J=f~(to, x(to), a), ~(ti, x(/ 1),
a)J. (5)
These equalities show that ti(a), i = 0, 1, are functions of class
C2 , and (1) implies (3).
By virtue of the same monotonicity, the mapping A determined by the
equality A(s, a) = (T(s, a), a) maps the rectangle IT= (to- o1 , t1 + o1) x
(-£ 1 , e: 1) onto its image W = A(IT) in a one-to-one manner. Since we have

det {o-r/as Ol:!aa.} ih > 1


0 1 =a; 2
for the Jacobian of A, the argument similar to the one in the proof of
Proposition 2 of Section 4.4.5 [Section B) of the proof] shows that the
set W is open and that the inverse mapping A- 1 :W ~IT, which is obviously
of the form (t, a)+ (cr(t, a), a), is continuously differentiable and a E
c1 (W).
From (4) and (1) we obtain x(s, O)=~(s, x(s), O)=s. Therefore, the
points (s, 0) remain fixed under the. mapping A, and A- 1 possesses the same
pro'perty, whence cr(t, 0) = 0.
Further, cr(·, ·) is found as an implicit function from the equation
F(cr, t, a) = T(cr, a) - t = 0. It follows that, in the first place, accord-
ing to (4) and (11), we have
C1a.(t, 0)=-F;;lFa.=-(-c,·(o (t, 0), 0)- 1 -ca.(o(t, 0), 0)=

=-f~t(t, x(t), O)+~.. (t, x(t), O)x(t)]- 1 ~a(t;-x(t), 0)=-T(t, x(t)), (6)
and, in the second place, since F is of class C2 , the remark made in Sec-
tion 2.3.4 implies that oEC'(W).
Further, A maps the closed interval {(s, O)lto ~ s ~ t1} into itself,
and this interval is contained in W. Consequently, W contains the rect-
angle (to - o, t 1 + o) x (-e:, e:) if o > 0 and e: > 0 are sufficiently small,
and the function
x(t, a)=.I(o(t, a), x(o(t, a)), a) (7)
is of class C2 on this rectangle. Let us choose e: so that the inequalities
lti(a) -til < o hold for lal < e:. Then recalling the definitions of func-
tions cr(·, •) and ti(·) we conclude from (4) and (7) that the image of the
graph of x(·) under the mapping Sa is the graph of the function x(·, a):
[to(a), t1(a)] + Rn.
Finally, d~fferentiating (7) and taking into account (1), (6) and the
equality cr(t, 0) = 0, we find that
Xa(t, O)=f.It(t, x(t), O)+.I.. (t, x(t), O)x(t)Joa(t, O)+
+.Ia.(t, x(t), O)=x(t)[-T(t, x(t))J+X(t, x(t)),

i.e., (2) holds.


Definition. Let a function L:V + R be at least continuous in an open
set VcRxRnxRn. The integral functional
274 CHAPTER 4

~{T,X}
.X(·,«)

Fig. 40
t,
.1 (X(·), i 0 , i 1 ) = ~ L(t, x(t), x(t))dt (8)
t,

is said to be invariant with respect to a family of mappings {Sa} if for


any function X(·)ECa([to, tl]., Rn) such that {(t, x(t), x'(t))lto~t~tl}c:V the
relation
(9)
holds for all sufficiently small a.
[Here x(·, a), to(a), t1(a) are constructed from x(·) as in the lemma,
and the interval where (9) holds may depend on x(•).]
Theorem of A. E. Noether. Let the function L, Lx, Lx be continuous
in an open set V c:Rx Rnx Rn, and let the integral functional (8) be in-
variant with respect to a family of transformations of class C2 satisfying
condition (1). Then the function
cp(t, X, x)=LJc(t, X, x)X(t, x)-[LJc(t, X, x)x-L(t, X, x)JT(t, x) (10)
is constant on every solution x(•) of the Euler equation
d • •
TtL;, (t, x(t), x(t))=Lx(t, x(t), x(t)) ( 11)

such thatx(·)EC"([t 0, tl], Rn), {(t, x(t), x(t))lt 0 ~t~t 1 }c:V and L~(·, x(·),
x· (•)) E C1 ([t 0 , f 1 ], Rn•).
Proof. A) Let x(·):[to, t1l + Rn be the solution of Eq. (11) which
is spoken of in the condition of the theorem. Using the lemma we construct
a family (x(·, a), to(a), tl(a)), where x(·, ·) is a function of class c 2
on (to-o, tl + o) X(-£, £), Let us fix a closed interval~= [S, y]
such that the inequalities to - o < S < to < t1 < y < t1 + o hold and di-
minish £, if necessary, for the inequalities S < to(a) < t 1 (a) < y to hold
for I a I < £.
Let us show that the mapping¢:(-£, £) + C 1 (~. Rn) X R2 , for which
a+ {x(·, a), to(a), t1(a)}, is Frechet differentiable at the point a= 0.
It is clear that it suffices to verify this property for the first compo-
e
nent of a + x( ·, a). Let us fix E (0, e) and apply the mean value theorem
[to the differentiable mappings a+ x(t, a); tis fixed]; then for lal ~ s,
a + 0, we obtain

U/1
I
Sup x(t, a)-x(t, 0)-'ax,.(t,
a
0) ,_..
· tetJ. ce[O, a]
I
""'=sup sup Xa (t , c) -Xa ( t, O)j -->-0,

sup j xt(t, a)-xt(t, O)-axta (t,


le/1 a
O) I~sup
.
sup lxta(t, c)-Xta(i, O)j-+0,
le/1 ce[O,a]
LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 275
since Xa and Xta are uniformly continuous on the compact set~ x [~, E].
Consequently, in the space C 1 (~, Rn) we have
x(:, ct)=x(·, G)+rLXah O)+o(lctl), ( 12)
which means the Frechet differentiability of ~. Moreover, (12) implies
that
Ill' (0) = {Xa ( ·, 0), t~ (0), ti (0)}. ( 13)
B) Let us put :J (ct) = (:J o Ill) (ct) = :J (x ( ·, ct), t 0 (ct), t1 (ct)). According to
(9), we have :J(ct)""":J(O), and hence :J'(O)=·:J'(x(·), t0 , t 1)[1D'(O)]=O. Using
formula (9) of Section 2.4.2 for the derivative of an integral mapping, we
derive from (13) the equality
t,
O=:J'(x(·), t0 , i 1 )fll>'(O)] = ~ {L..,(t, x(t), x(t))xa(t, O)+
t,

( 14)

By the hypothesis, Lx(t, x(t), ~(t)) is continuously differentiable;


integrating by parts in (14) (as in Section 1.4.1) and taking into account
(11), (2), (3), and (10), we obtain
O=[Lx (tt, x(tt), x(t;))xa(t 1, O)+L(t1, x(t 1), x(t 1 ))ti(O)]}:~=
=[Lx(t;, x(t 1), i(t 1))X(t1, x(t 1))+(L(t1, x(t 1), x(t 1) ) -
' • I 1 ' '
-Lx (tl, x(tt), x(tt))X(tt))T(t;, x(t;))],:o·=Cf!(tu X{t 1), X{t 1))-cp(t0, X{t 0), X{t 0)).
Thus,
cp (ti, x-(tl), xdtl)) = cp (to, X (to),· X(to)).
Repeating the same argument for an arbitrary closed interval ft 0 , t]c[t 0 , t 1 ],
we arrive at the equality
cp (t, x(t), x(t)) """cp (t0, x {t 0), x(t0)) == const. •
COROLLARY. If, under the conditions of the Noether theorem, the as-
sumptions a) and b) of Proposition 2 of Section 4.4.3 hold, then the func-
tion
¢(t, x, p)=pX(t, x)-.rt(t, x, p)T(t, x) ( 15)
is constant on every solution (x(t), p(t)) of the Hamiltonian system (1)
of Section 4.4.3.
The pPoof of the corollary is left to the reader. It is seen from
formula (15) that the first integral constructed in the Noether theorem
is the value of the Poincare---<:artan differential form ro=pdx-gtdt on the
tangent vector field of the family {Sa}.
As an example, let us consider the derivation of the energy and the
momentum integrals from the Noether theorem (Section 1.4.1). If L does
not depend on t, functional (8) is invariant with respect to the transfor-
mations (t, x) + (t +a, x) (check this!). Here T = 1, X= 0, and there-
fore cp=L-Ljc=-gt is constant. In case L does not depend on one of the
components of the vector x, say on Xk• then functional (8) is invariant
with respect to the transformations (t, x) + (t, x + aek) (check this!).
In this case T = 0, X = ek, and therefore cp=L;f!k=L;, •
.t
Some other examples of the application of the Noether theorem will be
presented in the next section.
4.4.7. Lagrange Variational Principle and the Conservation Laws in
Mechanics. Let us consider a system of n particles ("material points")
276 CHAPTER 4
with masses m1 , ••• ,mn; let the radius vector of the i-th particle be ri =
(xi, Yio Zi)• For brevity, we put

r=(rl, ... , r,)ERa", f=(a~~, ... ,a;..).


a~~ = ( a!t ' a:t ' a!t ) •
We shall suppose that the interaction between the particles and be-
tween the given system of particles and the surrounding medium is described
by the potential energy U(t, r) so that the equations of motion have the
form
miii =pi=- au JiJri. (1)
The kinetic energy of the system is determined by the equality

(2)

~EOREM (Lagrange Variational Principle). The equations of motion


(1) of the mechanical system are the Euler equations for the variational
problem
t,
~ L(t, r, r)dt-extr, (3)
lo
r(t 0)=r0, ,r(t1)=r1, L=K-U.
Proof. From (1) and (2) we derive
.. d • a!( au
mirt=(jf(m;r;) =(if
d
art d
=TtL;t =L,i =Trj· •
The significance of this theorem lies far beyond the simplest situa-
tion to which we are confining ourselves. In physics and in mechanics the
properties of a system are often described by setting directly its La-
grangian L. In this case K, U, and L = K- U are not necessarily expressed
in terms of Cartesian coordinates and may involve some other coordinates.
The Lagrangian may also involve certain terms describing magnetic or gyro-
scopic (nonpotential) forces, etc. If the Lagrangian is given, the Euler
equations of the variational principle (3) determine the law of motion of
the system under consideration.
In particular, the variational principle is convenient because it
allows us to derive first integrals ("conservation laws") with the aid of
the Noether theorem. We shall demonstrate this by the examples of the so-
called classical integrals corresponding to the conservation laws for ener-
gy, linear momentum, and angular momentum.
a) A Conservative System. In this case U does not depend on t. The
transformation group is (t, r) + (t +a, r). The function r(•):[t 0 , t 1 ] +
R3n goes into the function ra(·):[to +a, t1 +a]+ R3n, ra(t) = r(t- a)
under this transformation. The functional (3) is invariant:
~+~ ~+~ ~

~ L(r~(t), ;,.(t))dt =- ~ L(r(t-a.), ;(t-a.))dt= ~ L(r(t), ;(t))dt.


4M 4M 4

The tangent vector field is T = 1, R = 0. The first integral:

Consequently, the total energy


LAGRANGE PRINCIPLE IN CALCULUS OF VARIATIONS 277

fft=K+U
is a first integral of the conservative system (this is the "energy con-
servation law").
b) A Free Particle. In this case U does not depend on the radius rk
of one of the particles. The transformation group is
(t, r1 , ••• , r,., ... ,
where lER" is arbitrary. The invariance of the functional (3) is obvious.
The tangent vector field is
T=O, R=(Ri, ... , R~r., ... , Rn)=(O, ... , l, ... , 0).
The corresponding first integral is
-{_... iJK •
L;.R 1 =-. l=m,.(r,.ll).
<p=~
t=I ' a,,.
Since l is arbitrary, Pk = IDk~ must be constant. Hence, if there is
no interaction between the k-th particle and the rest of the system (i.e.,
the k-th particle makes no contribution to the potential energy), then the
"conservation law for the linear momentum" of that particle takes place.
c) There Are No External Forces. This means that U depends on the
differences ri - rj solely and L does not change if the system undergoes
translation in space as a rigid body, i.e., if a transformation (t, r 1 , •.. ,
rn) ~ (t, r1 + al, .•. ,rn + al) is performed where lis an arbitrary vector.

Here T = 0, R = (l, ••• ,l), and is preserved. Since


l is arbitrary,

" m1r1 =const.


P= ~
I= I

Thus, if there are only internal forces, the "law of conservation of


the total linear momentwn of the system" holds.
d) Rotational Symmetry. In this case U depends on the pairwise dis-
tances lri- rjl solely. The Lagrangian does not change when the system
undergoes an orthogonal transformation 6 because
L(6r1, 6r1)=K-U=

(the scalar products and the lengths do not change under an orthogonal
transformation). Let us fix a vector wand consider the transformation
group corresponding to the uniform rotation of the system about the origin
with the angular velocity w. Here the tangent vector field is T = 0, R =
(w x r 1 , ••• ,w x rn). The first integral is

Since w is arbitrary, the "conservation law for the angular momentwn M


..
~ m1r 1
i=I.
xr 1 of the system" must hold:
n
M= ~mtr1X;t const.
h=l
PROBLEMS

In this section we gather 100 problems. They are grouped by certain


subjects irrespective of their difficulty. We begin with problems of geo-
metrical character (Problems 1-14, 22, and 23) whose statement is evoked
by the problems of Euclid, Kepler, Steiner, and Apollonius mentioned in
Chapter 1. There is extensive literature devoted to extremal relations in
geometry. The reader can find many interesting problems in the book by
D. 0. Shklyarskii, N. N. Chentsov, and I. M. Yaglom.+ For most of them the
standard solution (provided that the formalization is correct) is not more
complicated than the "geometrical" one.
Problems 15-21 are related to the topic "Inequalities." Here too the
reader can enlarge the list of the problems by resorting to the book by G. H.
Hardy, G. E. Littlewood, and G. Polya.tt
Problems 24-30 are taken from two topics of approximation theory:
polynomials of least deviation from zero and inequalities for derivatives.
Extremal problems play an important role in approximation theory, and
most of the solved problems admit of the standard solution as well. A
series of extremal problems in approximation theory is considered in
the book by V. M. Tikhomirov [84].
Problems 31-100 are selected from the problems which were considered
at the seminars in the course "Optimal Control" at the Department of Mathe-
matics and Mechanics of Moscow State University. The most difficult prob-
lems are marked by stars.
We would like to draw the reader's attention to the fact that the
application of the ideas of optimal control theory makes it possible
to exhaustively analyze most of the well-known problems of the classical
calculus of variations in a very simple way [for example, see the solution
of Dido's problem (Problem 55), the problem on the harmonic oscillator
(Problem 47), etc.].

tin the preparation of the material of this section the authors received
valuable help from E. M. Galeev (Can. Phys. Math.) and from the students
Le Chyong Tung, Yu. Aleksandrov,V. Sokolova, and K. Makhmutov. To all of
them the authors express their gratitude.
tD. 0. Shklyarskii, N. N. Chentsov, and I. M. Yaglom, Geometrical Inequali-
ties and Problems on Maximum and Minimum [in Russian], Nauka, Moscow (1970).
ttG. H. Hardy, G. E. Littlewood, and G. Polya, Inequalities, Cambridge Univ.
Press, Cambridge (1934).

278
PROBLEMS 279

The answers to all the problems and hints to many of them are given
(or references are made to the literature where the problems are solved).
For the key problems the solutions are presented.
1. Find the point in ann-dimensional Euclidean space for which the
weighted sum of the squares of the distances (with positive weights) from
that point to several fixed points is smallest.
2. Solve Problem 1 on condition that the sought-for point lies on
the unit sphere.
3. Solve Problem 1 on condition that the sought-for point belongs to
the unit ball.
4. Given three different points in a plane, find the point in the
plane such that the weighted sum of the distances (with positive weights)
from that point to the three given points is smallest (this is a general-
ization of Steiner's problem; cf. Section 1.1.2).
5. Find the point on a plane such that the sum of the distances from
the point to four different points is smallest.
6. Inscribe in the unit circle the triangle with the maximum sum of
the squares of its sides.
7. Inscribe in the unit circle the triangle with the maximum weighted
sum (with positive weights) of the squares of its sides.
8. Inscribe in a sphere of unit radius in an n-dimensional Euclidean
space the cylinder with maximum volume (this is a generalization of Kepler's
problem; cf. Section 1.1.2).
9. Inscribe in a sphere of unit radius in an. n-dimensional Euclidean
space the cone with maximum volume.
10. Inscribe in a sphere of unit radius in an n-dimensional Euclidean
space the rectangular parallelepiped with maximum volume.
11. Find the simplex with maximum volume whose one vertex is at the
center of the unit sphere and the other vertices lie on that sphere.
12. Inscribe in a sphere of unit radius in an n-dimensional Euclidean
space the simplex with maximum volume.
13. In a given triangle inscribe the triangle the sum of the squares
of whose sides is smallest.
14. Find the shortest distance from a point lying outside an ellipsoid
in a Hilbert spacet to that ellipsoid (this is a generalization of the
problem of Apollonius; cf. Section 1.1.2).
n n
15. n xf•->sup;
1=1
L a;X;=u,
i=l
x,;;;:>-0, i= 1, ... , n, where O.i are
fixed numbers. From the solution of the problem derive the inequality for
the arithmetic and the geometric means.
n n
16. L jx;jP->sup; L jx,lq = 1, where p ;;,. 1 and q ;;.. 1 are fixed numbers.
i=l i=l

From the solution of the problem derive the inequality for power means.+

tBy an ellipsoid in a Hilbert space H is meant the image of the unit ball
under a continuous linear mapping A of the space into itself such that
ImA =H.
+By the power mean with an exponent p ~ 0 of nonnegative numbers al,••••an
is meant the expression (t af/nr 9

280 PROBLEMS
n
17. L ajx, -+sup; L x;=1, where ai are fixed numbers.
i=l i=l

18 . (alx)-> sup; (xlx) = 1, x E H, where H is a Hilbert space; a is a


fixed element of M. From the solutions of Problems 17 and 18 derive the
Cauchy-Bunyakovskii inequalities for Rn and for a Hilbert space.
n n
19 . L a,x,-> sup; L lx.IP = 1, where p is a fixed number, p ;;.: 1 . From
i=l i=l

the solution of the problem derive HBlder's inequality.


20. Among all discrete random variables assuming n values find the
one with the maximum entropy.t
21. Among absolutely continuous probability distributions on the
straight line R with a given mathematical expectation and a given disper-
sion find the random variable with the maximum differential entropy.t
22. Find the distance from a given point ~ in Rn to a given hyper-
plane L" a,x, =b.
i=l

23. Find the distance from a point ~ in Rn to the straight line,


with the direction vector ~. passing through a point n.
24. Among the polynomials of the form t 2 + a1t + az find the one of
least deviation from zero in the space Zz([-1, 1]).
25. Among the polynomials of the form tn + a1 tn+l + ... + an find the
one of least deviation from zero in the space Zz([-1, 1]).
26. Among the polynomials of the form tn + a1tn-l + ... +an find the
one of least deviation from zero in the space C([-1, 1]).
27. Among all nonnegative trigonometric polynomials of the form
x(t) = 1 + 2pl cost + ... + 2pn cos nt find the one for which the coeffi-
cient Pl is greatest.

28. x(O)-> sup; f (x 2 + (x<"J) 2 ) dt,; 1.

29. f x dt-> sup;


2 f (x + x
2 2) dt,; 1.
0

f
" "
30. * (x<kl) 1 dt-> sup; J (x<"l) 2 dt = y 2 , x<n( -7r) = x<n( 1r),

j=O,l, ... ,n-1,


(T0 ,0)

31. :j: f (x 1 +x) dt->inf.


(0,0)

tThe entropy of a collection of p~sitive numbers Pl•···•Pn whose sum is


equal to unity is the number H = L p, log (1/p;). Similarly, by the (differen-
i=t
tial) entropy of a positive function p(·) whose integral is equal to unity
is meant the number - f p(x) logp(x) dx.
R
:j:Here the limits of integration indicate the boundary conditions: x(O) = O,
x(To) = 0. In the next problem 32 the value x(To) is free. We write To
when this instant is fixed and T when it is nonfixed (e.g., in Problems
71, 76, etc.).
PROBLEMS 281

32. r
(0,0)
(:e+x) dHinf.

33. lxl.;;; 1.
(0,0)

34.
(0,0)

I x
(2.1)

35. t2 2 dt-+ inf.


(1,3)

I x
(1,1)

36. t2 2 dt-+inf (Weierstrass 1 example)


(0,0)

I x
(1,1)

3 7. t 213 2 dt-+ inf (Hilbert 1 s example) •


(0,0)

I
(1.~)

talx!~' d . f
38.* -{3- t-+tn, a, {3 ER (the generalized example of Weierstrass
(0,0)

and Hilbert) •

I
(1,1)

39. (x 2 -x) dt-+inf.


(0,0)

I (tx
(2,1)

40. 2 -x) dt-+inf.


(1,0)

I
(1,4)

41. (x 2 +xx+ l2tx) dHinf.


(0,1)

I
(To,~)

42. (x 2 +x2 ) dt-+inf.


(0,0)

43. <J~> (x + x 2 2) dt-+ inf,

(0,0)

J (x :x +lxl) dt-+inf.
(1.~)

2 2
44.
(0)

f
(!.~)

45. ((lxl-l)~+x 2 ) dt-+inf.

lxl.;;; 1, x(O) = 1.
282 PROBLEMS

(0,0)

To

48. J (x 2 -x 2 ) dt~inf.
(0,0)

(T0 ,0)

49. * J (x 2- x 2 ) dt ~ inf,
(0,0)

(T0,0)

50.* J (([x[-1)~-x 2 )dt~inf.


(0,0)

51. x;;.O,

fx
To

52. 2 dt- ax 2 ( T0 ) ~ inf, x(O) = 0.

f x dt~inf; f
I

53. 2
xdt=3, x(O) = 1, x(l)=6.
0

54.

To To

55. f xdt~inf; f J1+x 2 dt,;;l, x(-T0 )=x(T0 ) (Dido's problem).


-To -To

f
(rr,O) IT

56. J x dt~inf; 2
x sin tdt= L
(0,0)

f
(1,-14) I I

57. J x dt~inf;
2
xdt= -1.5, f txdt=-2.
(0,2)

f f
(1,4) I

58. x dt~inf;
2
txdt=O.
(0,-4)

f x dt~inf;
I

59. 2 x(O) = x(O) = o, x(1) = L


0
I

60. f x dt~inf; 2 x(O) = 1, x(O) = o.


2

61. f xdt--'>inf; x(O) = 1, x(O) = o, x(2) = -2.


PROBLEMS 283

f xdt~inf;
To

62. lxl,;;: 1, x(O)=x(T0 )=0.

f xdt~inf;
To

63. lxl,;;: 1, x(O) = x(O) = x( T0 ) = 0.

f xdt~inf;
To

64. lxl,;;: 1, x(O) = x(O) = x( T0 ) = x( T0 ) = 0.

f
I

65.* x dt~ inf; lx(n)l,;;: 1, x<kl(±1) = 0, k=O,I, ... ,n-1.


-I

f xdt~inf; f
I I

66. x 2 dt=144/5, x(O) = 1, x(O) = -4.

67. x(2) =sup; lxl,;;:2, x(O) = 0,

68. T~inf; -1,;;:.X,;;:3, x(O) = 1, x(O)=x(T)=O, x(T) = -1.

69. T~inf; lxl,;;: 1, x(O) = gh x(O) = g2, x(T) = 0.

70. T~inf; I.XI,;;: 1, x(O) = gh x(O) = g2, x(T)=x(T)=O (the simplest


time-optimal problem).

fx
T

71. T~inf; 2 dt =4, x(O) = 0, x(O) = 1, x(T) = -1.

72. T~inf; x(-1)=-1, x(-1)=0, x(T)=1, x(T)=O.


73. T~inf; x+x=u, lui,;;: 1, x(O) = g~> x(O) = g2 , x( T) = x( T) = o.
74. T-'>inf; x+x=u, lul,;;:1, x(O) = g~> x(O) = g2 , x 2 ( T) + x2 ( T) = R2,
gi+g~> R 2 •
75. T~inf; x+(l-w)x=O, x(O) = gh
O,;;: u,;;: 1, O<to<l.

fx
T

76. t 2 dt ~ extr; x(O) = 0, T+x(T)+ 1 = 0.

fx
I

77. 2 dt ~sup; x(O)=O (find all the extremals).

f
To

78. (x 2+ x 2) dt ~ inf; x(O) = x(O), x(To) = gh x( To)= g2·

f
To

79. (x 2- x 2) dt ~ inf; x(O) = x(O) = o, x(T0 ) = 0, x(T0 ) =0.

tHere Tis nonfixed; see the footnote to Problem 31.


284 PROBLEMS

J
To

80. ~
(x 2 - x 2 ) dt inf; x(O) = x(O) = o,

f dt~inf;
l

81. x2 jxj,;;I, x(O)=x(O)=O, x(I)=-11/24.

f
2

82. x 2 dt=inf; x;.o6, x(O)=x(O)=O, x(2)=I7.

83. J (;e -48x) dt ~ inf; x(O) = x(I) = 0.


0

J J
To To

84. x 2 dt~inf; x 2 dt= I, x(O)=x(O)=O.

f
2

85. u 2 dt+x 2 (0)=inf; x-x= u, x(2) = 1.


0

J dt+x 2 (0)~inf;
2

86. u2 X-x= u, x(2) =I, x(2) =0.

J
2

87. u 2 dt + 4x 2 (2) ~ inf; x(O) =I, x+4x = u.

J dt+4x2(2)~inf;
2

88. u2 x(O) =I, x(O)=O, x+4x= u.

89. x 2 (I) = 2 sin I sinh I.

J I
l l

90. (x 1 + x 2 ) dt ~ extr; x,x2dt=O, xi(O)=xz(O), .xJ(I)=I, X 2(I)=-3.

f iuiP dt~inf;
l

91. X=u, x(O) = o, x(O) = o, x(I) = g2,

Jlx-
To

9 2. * gj dt ~ inf; x(O) = x0 , XER"


0

(Goddard's problem).

I x"JI+x2 dt~inf,
(tl'xl)

93. a<O.
(to,xo)

f
(t,,g)

94. xJI+ x 2 dt~ inf.


(to, g)
PROBLEMS 285

I JJ+.x2
T

95. --x- dt ~ inf; x(O) = 1, T-x(T) = 1.

I (1+eJ.XIJdt~inf;
T

96.* x(O) = ~2, x(T) = x(T) =0,

97.* IluJPdt~inf; x+ux=O, x(O) = x(1) = 0, x(O) = 1, p> 1.

98. * I JuJ dt~inf; x+ux=O, x(O)=x(l)=O, x(o) = 1.

To

99. * I (Jxi+x~+J(x1-sin t) 2 +(i2 +cos t) 2 dt~inf; x 1(0)=x,(O) =0,

x,( T0 ) = ~2 (Ulam' s problem) .

100.* I
0
To ( cot x
---+---
xg + yt 2
xt
xg + yt 2
)
dt~sup; x(O) = x 0 , x(T0 )=1, where c, g, and

y are positive constants (this is the problem on the maximum flight alti-
tude of a rocket).
HINTS, SOLUTIONS, AND ANSWERS
Let us agree on some abbreviations: Swill mean "solution," H will
mean "hint," and F will mean "formalization." By (y, etc.) the solu- x z,
tion of the problem and by S its value will be denoted. If the problem
depends on some parameters, then S is written as a function of the corre-
sponding parameters; To means that time is fixed, and if in the statement
of the problem T is written, this means that time is nonfixed; T denotes
the optimal time.
1. F: Lm;Jx-x,J 2 ~inf, x,x,ER"; this is an elementary smooth problem;
x=i=Lm,x./Lm; is the center of mass.
2. F: L m;Jx-x,J 2 ~inf, JxJ=l, X,X;ER"; x=x/lxl if~"' o, and X with
lxl = 1 is arbitrary if x = 0; x is the center of mass.
3. ~=X if lxl ~ 1, and X= x/lxl if lxl > 1, X is the center of
mass.
4. F:f(x)=Lm;Jx-x,J~inf, x,x,ER 2 , i=1,2,3; this is an elementary convex
problem. S: X exists ~ 0 E af(x). If X (/:. {xl> X2' xd' then aj(x)=
{L m,(x-x.)/Jx-x;j}={O}, and this means that it is possible to construct a
triangle with sides of lengths m1 , mz, and m3. Let aij be the~angle of
this triangle formed by the sides of lengths mi and m~. Then xis the
point of intersection of the three arcs of circles, Wlth the chords [xi,
xj], subtending the angles TI- aij· If the triangle with sides of lengths
m1 , m2 , and m3 does not exist or if the construction described above is
impossible, the solution x coincides with one of the three given points.
5. We shall give the answer for the basic case when no three points
among {x 1 , x2, X3, x4} lie in one straight line. Then xis the point of
intersection of the diagonals if the quadrilateral with the vertices xi is
convex and is the interior vertex of the convex hull if the quadrilateral
is not convex.
286 PROBLEMS
6. F: lx 1 -xl+lx1-el 2 +lx2-ei 2 ~$Up, lx1l 2 =lxzl 2=l, e=(I,O), X;ER 2, i=I,2.
The problem possesses six stationary points; (x 1 , x2, e) forms an equi-
lateral triangle. For the detailed solution, see the book [54, p. 443].
7. F: m0 lx1-x212+m11x1- el 2 + m2lx2-e1 2 ~sup, lx1l 2= lxl= I. If i t is possible
to construct a triangle with sides of lengths m1m2, mom1, and mom2, and if
a 1 and a 2 are the angles between the sides ~f lengths m1m2, mom1 and m1m2,
m0 m2 , respectively, then the angle between x1 and e is equal to ~- a1, and
the angle between x2· and e is equal to ~ - a2. If the construction is im-
possible, then x1 = x2 = -e or x1 = -e, x2 = e.
Problems 8 and 9 admit of different formalizations connected with dif-
ferent interpretations of the terms "cylinder" and "cone." In what follows
we shall understand a cylinder in Rn as a product of an (n- !)-dimensional
ball by a line segment orthogonal to it, and by a cone in an we shall mean
the convex hull of an (n - !)-dimensional ball and a line segment orthogo-
nal to it.
8. i is half the altitude of the cylinder and is equal to n- 1 / 2 .
9. x is the difference between the altitude of the cone and the radi-
us of the ball and is equal to n- 1 .
10.
x is an n-dimensional cube.
11. x is a simplex whose n sides form an orthogonal basis, the lengths
of the sides being equal to the radius of the sphere. The solution of this
problem and the homogeneity imply HadamaPd's inequality:

[det(a,J] 2 ,;;;.~1 Ct a;i)·


For greater detail, see [54, p. 444].
12. x is a regular simplex.
13. F: lx-yl 2 +ly-zl 2 +lz-xl 2 ~inf,
(xia),;;;O, (ylb),;;;O, (zlc)+a,;;;_o, a>O,
a+ b + c = o, a, b, c, x, y, z ER2 •
The origin coincides with the vertex C; a, b,
and c are vectors orthogonal to [B, C), [C, A], and [A, B), respectively;
x is the point of intersection of the straight lines (xla) = 0 and (xlb)
S(b- alb); y is the point of intersection of (ylb) = 0 and (yia)=-{:l(b-
aia); z= (x~ y) +~ f:l(a +b); f:l = 2a/(3la + bl 2 + lb- ai 2 ).

14. F: llxo-t"ll~inf, t"=Ay, IIYII,;;;I, i=A.Y, where y= (A*A + h)- 1 x


A*xo, and 1 is the single positive solution of the equation II(A*A + AI) x
A*xoll = 1.
15. x1 xn = o/rai· The solution of the problem and the homo-
geneity imply the inequality

For ai = 1/n the inequality foP the aPithmetic· and the geometPic means is
obtained: (b1x,y 1
• ,;;;(~x;)/n.
16. x = ±ei, where e 1 , •.. ,en is the standard basis in Rnif q < p;
x=(±n-1/q, ..• ,±n- 1/q) if q > p; the case p = q is trivial. The solution of
this problem and the homogeneity imply the inequality foP poweP means:

q;;., p~up(x),;;; uq(X), up(x) = Ct xf/n Y'P, X;;a.O.


PROBLEMS 287

12
17. x,=a,j(,ta;y • The solution of this problem and the homogeneity

imply the Cauahy-Bunyakovskii inequality:


11: a,.x,l,;;;; (1: a7)1/2(1: x;)l/2.

18. x = a/lal. The solution of this problem and the homogeneity im-
ply the Cauchy-Bunyakovskii inequality.
1 1
-+-= 1. The solution of this
p p'
problem and the homogeneity imply Holder's inequality:
11: a,.x,l,;;;; (1: la.IP')I/p'(l: lx.IP) 11 P, .!+.!. = l.
p p'
For p 1:
x, = {sgn a, if lad= II all =max{la1l, ... , la.l},
0 ifla,l <II all-

20. " p, log p, -+ inf,


F: 1: l:p,=1, p,;;.o, p, =.!..n (the uniform distribu-
i=t
tion).

21. F: J p(x) logp(x) dx-+inf; J p(x) dx= 1, J xp(x) dx =a;

J[.\.
R R R R

u 2, p(x);;.O·; this is a Lyapunov-type problem. S: ::£= 0 p(x)logp(x)+A 1p(x)+


R

A2xp(x)+A 3(x-a) 2p(x)] dx. The extremum condition for p( · ): A0 p logp+A 1p+A 2xp+A 3 x
(x-afp;;.Aoft(x)logp(x)+)qp(x) +A2xp(x)+A3(x-a) 2p(x), Vp;;.O. It follows that
and ft(x)=e-<a,x'+b,x+c,J, where a1, b1, and c1 are chosen from the con-
~ 0 >' 0
straints, whence

(the Gauss distribution).


22. S = 11: a,g,- bl(l: a;)- 112.
23. S=(lg-1JI2-(g-1JI,/1'12)) 1/2.
24. i(t) = t 2 - 1/3. H (to Problems 22-24): problems of this kind
were investigated in Section 3.5.3 (Problem 24 can be interpreted as the
problem on the shortest distance from the function t 2 to the subspace
lin {1, t}).

J J[A~x +p(x<•>-n!)]dt.
I I

2 The Euler-
25. F: x dt-+ inf;
2 x<•>=n!. S: ::£=
-I -I

Poisson equation: p(n)(t)


p<kJ(±1)=0; k=0,1, ... ,n-1;

p<kl(±1) = O~p(t) = C(t 2-1)",

proving the sufficiency it is possible to refer to the convexity of the


problem.
26. F: f(x)= max {f(t,x)}-+inf; f(t,x)= It"+
-l~to:E;)
xktk-ll; this is an elementary I
k=l

convex problem whose solution exists. The maximum of the modulus of the
288 PROBLEMS

r
n
function x(t)=t"+ xktk-l is attained at s,;;; n + 1 points because if s >
k~l

n + 1, then~(·) has not less than n zeros in the interval (-1, 1), which
is impossible because deg ~( ·) = n - 1. Let us denote these points by
TI,•··•'s• TI < T2 < •.. < •s· If i(·) is a solution, then 0E8j(x) 2
! 4 conv x
s
r
i=l
a,=I, t a;
1=1
sgn X( T.)T~ = 0, k=O, I, ... ,

n- 1. The last relation implies that s = n + 1, i.e., x(·) assumes its


maximum and minimum values. at n + 1 poinFs. Therefore, these points must
include ±1 [because, if otherwise, then x(·) again has not less than n
zeros in the interval (-1, 1)]. It follows that (1- t 2 )~ 2 (t) and f 2 (x)-
x2 (t) are polynomials of degree 2n having simple zeros at T1 = 1 and
Tn+l = +1 and twofold zeros at Tz, ... ,Tn· Now, equating the coefficients
in t 2n, we obtain the equation (1- t 2 )~ 2 + n 2 (f 2 (x)- x 2 (t)); solving this
equation we find x(t) = z-(n-l) cos (n arccos t).
Now we can easily compute ak (this result will be of use to us later):

a=-
, n' i = 1, ... , n -I.

From what has been proved we can derive the following important equality:

f
I

sgn x(t)tk dt = 0, k = 0, 1, ... , n -1.


-I

27. p1 cos rr/(n + 2). H: By the well-known Riesz' theorem, the


nonnegative trigonometric polynomial x(·) admits of the representation
x(t)=I+2p 1 cost+ ... +2p.cosnt=lk~oxke'k'l ,where Xk are real numbers.
2
It fol-
lows that
n-1

L
k=O
X0k+l =PI·

Further, the Lagrange multiplier rule should be applied. For greater de-
tail, see [84, pp. 113 and 114].
A( ) _ ;, kjek/
28. t - ,;.. , where kj are the roots of the equation
(t ks )P'(fs)
X
1~1
x< 2n) + (-1)nx = 0 lying in the left half-plane, and P(z) = Gz- k 1) ...
(z- kn). For greater detail, see the book [84, pp. 121-123].
29. x(t) = Ae-t/z sin (t sin rr/3 - rr/3), where A is chosen from the
isoperimetric condition. For the solution of the problem, see the book
by G. H. Hardy, G. E. Littlewood, and G. Polya.* For the solution of the
general problem on inequalities for derivatives by the direct application
of the Lagrange principle, see the work by A. P. Buslaev.t
30. S = Y1P1 + yzpz, where PI and P2 satisfy the system p 1 + s 2np 2 =
s k, P1 + (s + 1) 2npz = (s + 1) 2k, where s=[(y2 /y 1) 11 [ 2 (n-oJ] and [ ] denotes
2

the integral part. H: the function x(·) should be expanded into Fourier's
series to reduce the problem to a linear programming problem. For greater
detail, see the article by Din Zung and V. M. Tikhomirov.~
*Loc. cit. p. 227.
tA. P. Buslaev, "On an extremal problem connected with inequalities for
derivatives," Vestn. Mosk. Gos. Univ., Ser. Mat., No.3, 67-77 (1978).
:j:Din Zung and V. M. Tikhomirov, "On inequalities for derivatives in the
metric of Lz," Vestn. Mosk. Gos. Univ., Ser. Mat., No.5, 3-11 (1979).
PROBLEMS 289

H (to Problems 31-100): in most of the problems that follow, the proof
of the fact that the solution x(·) yields absmin can be obtained either by
the direct analysis of the values of the functional at the points x(·) and
x(·) + h(·) (in such a case we write "is verified directly") or by an in-
dependent proof of the existence of the solution. In the latter case one
can make use of the following existence theorem for the simplest problem:
of the classical calculus of variations with a constraint on the derivative
which can be derived from the general results presented in [88]t:
Let the integrand L in the problem

j' L( t, x( t), .X( t)) dt .... inf; x(t0 ) = Xo,


to

be a continuous function on [to, t 1 ] x R x [-A, A], and, moreover, let the


function ~ + L(t,,x, ~) be convex. Then there exists an absolutely con-
tinuous function x(·) which yields a minimum in the problem(*).
The introduction of the "compulsory" constraint lxl <:;;A (which makes
it possible to guarantee the existence of the solution) followed by the
passage to the limit for A + co will be called the "compulsory constraint
method."
31. x(t) T0 t) /4 (absmin).
32. x(t) 2Tot) /4 (absmin).
33. If To<:;; 2, then x(t) = (t 2 - 2Tot)/4; if To > 2, then x(t) = -t
for 0 <:;; t <:;;To- 2 and x(t) = t(t- 2To)/4 + (T 0 - 2) 2 /4 for To- 2 <:;; t <:;;
~O· In all the cases ~(t) yields absmin.
34. If To <:;; 4, then x(t) = (t 2 - Tot)/4. If To > 4, then x(t) = -t
for 0 <:;; t <:;; (To- 4)/2, x(t) t(t- To)/4 + (To- 4) 2 /16 for (To- 4)/
2 <:;; t <:;; (To + 4)/2, and x(t) = t - To if (To + 4)/2 <:;; t <:;; T 0 • In all the
cases x(·) yields absmin.
H (to Problems 31-34): one should apply the Lagrange principle and
find the single extremal x(·). The fact that x(·) yields absmin can be
verified directly, and it is also possible to resort to the above-mentioned
existence theorem.
35. x<t> = 4/t - 1.
36. S = O, and there is no solution. H: consider the minimizing
sequence {xn(t)}, where xn(t) = nt for 0.:;; t.:;; 1/n and xn(t) = 1 for 1/
n <:;; t <:;; 1. This example was put forward by Weierstrass as an argument
against Riemann's proof of the Dirichlet principle.
37. x(t) = t 1 / 3 (this example was considered in Section 1.4.3; the
solution exists but does not belong to the class C1 ) .
38. This problem can be used to demonstrate the application of the

• f f
1 1

dual1ty methods. F: -tajujf3 . f


13-dt .... m ; u d t=g,a,{3E R ; this is a Lyapunov-type
Q

problem. The conjugate function is

If g} J ~~~ f (
1 1 1

S*( 1)) = s~p [ 17g- inf {f ta I~{3 dt u dt = = U7]- fa ~I") dt


0 0 0

tAlso see W. H. Fleming and R. W. Rishel, Deterministic and Stochastic


Control, Springer-Verlag (1975).
290 PROBLEMS

<5{0} for f3 < 1 or f3;;;. 1, f3,;;;; a+ 1;


<5{[-1,1]}for{3=1, a.;O;

{3- 1
f3- a -1
-~fora>1
{3' ~" '
a>a+1(_!_+_!_=1).
~" f3 {3'

l
The value of the problem is

(i) Oforf3<1or{3;;;.1, p.;a+1;


S(g, a, {3) = (ii) jgj for f3 = l, a,;;;; 0;

(. . )(/3- a
111 {3-1
-1)/l-I jgjll for a> l
f3 1-' '
a> a+ 1.
1-'

For (i) S < 1 there is no solution because the integrand is not convex (see
Bogolyubov's theorem in Section 1.4.3); for (i) S ~ 1 there is a discontin-
uous solution (a "jump at zero"); for (ii) a= 0 there are infinitely many
solutions (any monotone function is a solution); for (ii) a< 0 there is a
generalized solution (a "jump at unity"); for (iii) the solution is x(t) =
st<S-a- 1)/S (it belongs to C1 ([0, 1]) for a~ 0).
39. x(t) = -t 2 /4 + St/4 (absmin).
40. x(t) = -t/2 + 3 log t/2 log 2 + 1/2 (absmin).
41. x(t) = e + 2t + 1 (absmin).
H (to Problems 39-41): solve the corresponding Euler equations.
42. x(t, S) = s sinh t/sinh To (absmin).
43. If Is I cosh To ~ 1, then x(t, c;) = s sinh t/sinh To; if Is I cosh To >
1, then x(t, S) = ±sinh t/cosh T for 0 ,;;;; t ,;;;; T and x(t, /;) = ±(tanh T +
(t- T)) for T,;;;; t ~To, where T is the solution of the equation To- s
T - tanh T. In all the cases x(.) yields absmin.
44. If lsi ~ 1, then x(t, S) =I;; if lsi> 1, then x(t, s) = n for
0,;;;; t,;;;; lnl- 1 and x(t, <;) = ncosh (t- lnl- 1 ) for lnl- 1 ,;;;; t,;;;; 1, where n
is the solution of the equation ncosh(1- lnl- 1) = !;. In all the cases
x(·) yields absmin.
45. x(t, !;) = eltl + (!;- e)cosht/cosh 1. (This problem was dis-
cussed in Section 1.4.3.)
46. H:. let us suppose that x( ·) is a solution of the problem. Then,
according to the Pontryagin maximum principle, there exists a function
p(·) such that the pair (x(·), p(·)) satisfies the following conditions:
x=sgnp, -jj=x, x(O)=l, x(oo)=p(oo)=p(O)=O. (1)
If T is the first positive zero of p(•) and x(T) =a~ 0, then it can
easily be verified that the pair (x1(·), P1(·)), where
X 1 (t)=a- 1 x(~ t+r), p 1(t)=a- 2 sgn ap(J~ t+r),
also satisfies (1). The solution is unique due to the strict convexity of
the functional which must be minimized, and therefore
x(t)=a- 1 (J~ t+r), p(t)=a- 2 sgnap(J~ t+r).

It readily follows that x(·) and p(·) are finite functions, i.e., functions
of finite (compact) support, formed by "sticking together" a countable num-
ber of polynomials of the second degree and the fourth degree, respectively,
and are completely determined by setting three parameters: x(O) =-a,
p(O) = S, and T. Moreover, Sand T are expressed explicitly in terms of
PROBLEMS 291

a, and a itself [under the assumption that a E (/2, 2)] satisfies the
equation
<p(a)

J (t /2-at+1)(1f1(a)-t)dt=O,
2 (2)
0

where tp(a)=a( ~+•). ·1f1(a)=tp<a>((::~~r\•r'


It can be shown that (2) does in fact have a solution on (/2, 2), &
and if B
and T are the values of the parameters S and T corresponding to
this value of&, then we can easily construct a pair (x(·), p(·)) satis-
fying (1). This means that x(·) is a solution of the original problem.
Moreover, the function p(·) is a solution of a dual problem which is
in fact equivalent to the problem investigated in the work by L. D. Ber-
kovitz and H. Pollard.* The solutions of the two problems (considered for
a somewhat more general situation) are contained in the article by G. G.
Magaril-Ilyaev.t

={:
forT0 >1rorT0 =1T, €~0.
00
47. S(T0 , €) for T0 = 1r, § = 0,

ecot T0 for 0< T0 < 1T,


x(t; To,!;;)= (l;sint)/sinTo (absmin), To< 'Tf; x(t; 'Tf, 0) A sin t (absmin).
-oo for T0 > 1T/2
48. S(T0 ) ='{
0 for0<T0 <7r/2,
x(t; To)= 0, 0 < T0 < 'Tf/2 (absmin); x(t; 'Tf/2) A sin t (absmin).
0 for T0 < 1r,
A sin t,!AI.;;; I, for T0 = 1r,

49. x(t; T0 ) = t forO.;; t0 .;;; T0 /2- 6,


{ -./I+( T /2- 6) 2 cos ( T /2- t) for IT /2- tl.;;; 6,
0 0 0

T0 - t for T0 /2+ (},.;;; t.;;; T0 ,


where 6 = 6(To) is the minimum positive solutionof the equation cot z = T0 /2-z.
H (to Problems 47-49): there exists an extremal which yields a minimum
in Problem 49. Further, one should apply the maximum principle to show
that there is always a finite number of extremals. It can easily be veri-
fied that the extremal we have written yields the smallest value for the
functional. Problems 47 and 48 can be investigated exhaustively by using
the "compulsory constraint method"(lxl ~A). After that it is necessary
to solve some problems analogous to Problem 49 and then to pass to the
limit for A + oo.
SO.S(T) = {0 for To < 'Tf, For To < 'Tf there are countably many
0 -oo for To > 'Tf.
polygonal extremals; the first corner point lies on the curve (T, tanT),
0 ~ T ~ 'Tf/2. The curve with one corner point has the following form:
(sin t)/cos (T0 /2) forO.;; t.;;; T0 /2,
x 1(t) = {
sin (T0 - t)/cos (T0 /2) for T0 j2.;;; t.;;; T0 ,

*L. D. Berkovitz and H. Pollard, "A nonclassical variational problem aris-


ing from optimal filter problem. I," Arch. Rat. Mech. Analysis, 26, 281-
304 (1967); also see [88].
tG. G. Magaril-Ilyaev, "On Kolmogorov's inequalities on a half line,"
Vestn. Mosk. Univ., ~. 33-41 (1976).
292 PROBLEMS
and the others are constructed in a similar way.
51. s = 0. H: apply Bogolyubov's theorem (see Section 1.4.3).
52. x(t) _ 0 (absmin) for a·~ TQ 1 ; S = -oo for a> T0 1 •
53. x< t) 3t 2 + 2t + 1.
54. The stationary points of the problem are
xk(t) = (2/ T0)- 112 sink( 1r/ T0)t, k = 1, 2, ...•
The solution is x(t) = Xl(t) = (2/To)- 1 / 2 sin (7r/To)t (absmin).
The solution of this problem perfectly illustrates Hilbert's theorem
(Section 3.5.4).
55. The solution of the problem does not always exist. Let us apply

f
To

the "compulsory constraint method": lxl ~A. f£= (-A 0 x+A)1+u 2 +p(x-
. "" U -To
u)) dt~fi=-A 0 , fi=)'~ r:-:-;;;-; Ao=O~u(t)=const~u(t)=O (due to the boundary
"1 + u 2

conditions) ~ l = 2To. If l > 2To then, ~o ~ 0; i.e., we can assume


that ~o = +1. Then p(t) = -t. Integrating we conclude that x(•) is either
an arc of a circle or a curve composed of the line segments A(t + To) and
A(To - t) joined by an arc of a circle or a broken line:
A(t +To) for -To~ t ~ 0; A(To- t) for 0 ~ t ~ T0 •
Passing to the limit we obtain the following solution of the problem: for
2T 0 < l ~ 1rTo the curve x(·) is an arc of a circle with center on the
straight line t = O, and for l > 1rTo the curve x(•) is the semicircle
"lifted" by a distance of (l- 7rTo)/2.
56. jl:(t) = 2 sin t/1r.
57. x(t) = -10t 3 - 12t 2 + 6t + z.
58. x(t) 5t 3 + 3t- 4.
59. x(t) (3t 2 - t 3 )/2.
H (to Problems 56-59): x(•) yields absmin in all the cases, and this
can be verified directly.
60. x(t) = 1 (this can be noticed at once without any calculations
although the problem can also be solved using standard methods).
61. x(t) = -2t 2 + 1 for 0 ~ t ~ 1 and x(t) = t 2 - 6t + 4 for 1 ~
t ~ 2.
62. x(t) = t(t- To)/2.
63. x(t) = -t 2 /2
if 0 ~ t ~ (1 - z-l/ 2 )To, and x(t)
2)Tot + (/:2- 3/2)Tt if (1 - z-l/ 2 )To ~ t ~To.

f
t

64. x(t)= (t-T)U(T)dT, uA( t ) = -sgn cos-.


21Tt
To

H (to Problems 61-64): one should prove the existence of the solution
and apply the maximum principle. It turns out that there is a single sta-
tionary point which yields absmin.
Here the existence of the solution follows from the compactness in
the space C of the set of the functions with bounded second derivatives
(for the same boundary conditions as those involved in the statement of
PROBLEMS 293

To

the problem) and the continuity of the functional f(x( · ))= Jxdt.
f
t 0

65. x ( t ) =1- )( (t-T)"- 1 U(T) dT, u(t)=±sgncos[(n+1)arccos t], where the


n-1 !
-I

sign is chosen so that i:(t) ~ O, t E [-1, 1].


H: here the maximum principle leads to the following relations:
p<•) + (-1)"+1 = 0, .x<•l(t) = u(t) = sgn p(t),
i.e., p(·) is a polynomial of degree (n + 1) and

f
I

1 )
X(t)=-( (t-T)"- 1 sgnp(T)dT.
n-1 !
-I

If p(·) is Chebyshev's (n + 1)-th polynomial then, as it follows from the


solution of Problem 26,
I

J (1- T)n-k-l sgn p( T) dT = 0, k=O, 1, ... , n-1,


-I

i.e., x(k)(±1) = 0, k = 0, 1, ••• ,n- 1. Hence x(·) is a stationary point.


The fact that this point is a solution follows from the convexity of the
problem.
66. There are two extremals: -t 4 + t 3 - 6t 2 - 4t + 1 and (t- 1) 4 •
67. i(t) =t (absmin).
H: the fact that this solution yields absmin follows from the rela-
tions

x(2)= f2
0
xdt,;;..fi
( 2 )1/2
f
0
x 2 dt =2.

As a rule, in the problems that follow we only indicate the stationary


points. The verification of whether or not they yield the solutions of the
problems is left to the reader.
68. x(t) = -t 2 /2 + 1 if 0 ~ t ~ 1:3, and x(t) = 3(t- 4/1:3) 2 /2- 1 if
13 ~ t ~ 4//3; i' = 4/13.
69. u(t) - +1 or u(t) =-1.
70. For the solution, see Section 1.6.3.
71. '1' = 1; x(t) t - t2.
72. '1' = 1 ' x(t) (t + 1)2- 1 for -1 ~ t ~ 0, and x( t) = 1 - (t-
1) 2 for 0 ~ t ~ 1.
For the solutions of Problems 73 and 74, see [12, pp. 34 and 63]; for
the solution of Problem 75, see [54, pp. 436-439].
76. x(t) = -zt. ,
77. The extremals are xdt) f sgnsin(2k+1) ~T dT, k=0,±1, ... ; x(t)=±t.

A t t A( t • t 0 t ') A A
78. x(t)=C 1 sinh.fisin.fi+C2 cosh.fism.fi-smh.ficos,j2 ;, C1 and C2 are
found from the conditions x(To) = s1, i(To) = S2•
294 PROBLEMS

79. x(t) = 0, To ~ T, w~ere T is the first zero of the equation


cos Tcosh T = 1; S = --«>, To·> T.
80. x(t) = el(cosh t - cost) + Cz(sinh t - sin t) for To < T; T is
determined in Problem 79; cl and Cz are found from the relations x(To) =
Sl• ~(To) = f;z; S = 0 for To = T, Sl = s2 = 0; S =--«>for To > T.
81. x(t) = -t 2 /2 if 0 ~ t ~ 1/2, and x(t) = t 3 /3- t 2 + t/4- 1/24
if 1/2 ~ t ~ 1.

82. x(t) = -t 3 + 6t 2 for 0 ~ t ~ 1, and x(t) = 3t 2 + 3t- 1 for 1 ~


t ~ 2.
83. x(t) = t 4 - 2t 3 + t.
84. The function ~(t) = C[(sinh Wt- sin Wt)(cosh wT 0 - cos wT 0 ) -
(cosh wt -cos wt) (sinhwT 0 +sin WT 0 )] yields the infimum which is equal to w4 ,
where w > 0 is the minimum root of the equation cosh wTo cos wTo = -1, and
C is found from the condition

8s. x(t) sinh t/cosh 2.


86. x(t) (t- 2) cosh t/cosh 2.
87. x(t) = -sin2(t- 2)/sin4.
88. x(t) = [2tcos 2(t- 2)- sin2(t- 2)]/sin4.
89. Xl(t) = sin 1 sinh t - sinh 1 sin t, x2(t) = sinh 1 sin t + sin 1 X

sinh t.
x1(t) = 3t 2 - 2t, jxl(t) = 4t- 3t 2 ,
90 · { x2(t) = 3t 2 - 6t, xz(t) = -3t 2 •
91. x(t) l
a2pl/(p-l)
jat+bj(2 p-l)/(p-l)sgn(at+b)+A 1 t+A 2 for p > 1, where a

a. sgn a., and

b =a g2-gl
gl ,
bp/(p-1) Ibl (2p-I)/(p-I)
AI=~,
ap
A2 2
ap
1/(p o ·

H: apply the Lagrange principle. For the case p = 1, see Problem 92.
9·2. Let us denote n = vo - v1 - gTo, s = xo + voTo - x 1 - v 1 + v 0 +
gT 0 • Then S = S(n, E;) = ls/T 0 1 +In- s/Tol.
H: putting u x- g we arrive at the following Lyapunov-type problem:

tu(t)dt=~.
To

Jlui dt-+inf; x=y, y=u+g¢> Tiuldt-+inf; Tudt=71, T

Further use the results of Section 4.3.


93. The extremals of the problem were found in Section 1.6.5. For
a. < 0 the extremal joining the two points (to, xo) and (t 1 , x 1 ) is unique
if Xi > 0 and can easily be embedded in a family of extremals covering the
whole plane. Hence it yields an absolute minimum in the problem (see Sec-
tion 4.4).
PROBLEMS 295

(1 + 4o;) 2
x,=--2- x2

Fig. 41
94. This is the minimum-surface-of-revoluti on problem. The extremals
were found in Section 1.6.5. For the complete solution of the problem, see
the book [54, p. 427].
,-----~

95. x(t) = /2 - (t - 1) H: show that the Lagrange principle im-


plies that for~roblems of geometrical optics [in which the integrand
has the form (1 + x~/v(t, x)] the transversality condition coincides with
the orthogonality condition.
96. In Fig. 41 we present the phase diagram of the problem (cf. Fig.
28).
97. S: let us bring the problem to the standard form:

f iuiP
I

dt-+ inf; xiO)= 1.


0

In the last problem it is possible to prove the existence of the solution.


The Lagrangian of the problem is
L = AoluiP + p1(x1- x2) + P2<~2 + ux1).
Show that AO ~ 0. Then the Euler equations with respect to x and the
transversality conditions take the following form:
-p1 + UP2 = 0, -p2- P1 = 0, P2(1) =0 ~ P2 + UP2 = o, P2(1) = o.
Thus, x 1(·) and p 2 (·) satisfy one and the same second-order linear differ-
ential equation, and both the functions vanish at one and the same point
t = 1. Consequently, x1(•) and p2(•) are proportional: Cx1(•) = P2(•).
From the Euler equation with respect to u (Lu = 0) we obtain
p 2 (t)x 1 (t) = -plu(t) IP- 1 sgn u(t) ~ C1x~(t) = -plu(t) iP- 1 sgn u(t).
Hence, the optimal u(·) does not change sign. It can easily be seen that
u ~ 0. Therefore, C < 0, and thus
296 PROBLEMS
~(t) = Cl(xl(t)) 2 /(p-l), C1 > 0,
i.e., x(•) satisfies the equation x + Clx(p+l)/(p-l) = 0. Denoting (p +
1)/(p- 1) = q we derive the integral x 2/2 + (Cl/(q + l))xq+l = D from the
equation with respect to x. From the conditions x(O) = 0, x(O) = 1 we ob-

f
x

tain D = 1/2 and t = dz • The constant C1 is found from the


JI + (2C1/(q + I))zq+t
relation o
I

((q+I)/2C 1 ) 11 <q+ll xJ JI-7Jq+t


d7J C~'l<q+orq,

where fq is a known quantity. Thus, the extremal x(·) is a function sym-


metric with respect to the straight line t = 1/2, and on the interval [0,

fJ
X

1/2] this function is inverse to the function


cl = (2rq)q+l.
t= I- (2C,/~: + I) )zq+l ' where
0

Jt for 0 ~ t ~ 1/2,
98. x(t)
11- t for 1/2 ~ t ~ 1.
H: one should apply the compulsory constraint method (lui ~A), apply
the maximum principle, make use of the relation Cx1(·) = p2(·) analogous
to the one obtained in the solution of Problem 97, and then pass to the
limit for A + ""·
Problems 97 and 98 were considered in stability theory.*
99. H: Problem 99 is one of the formalizations of the well-known
Ulam problem of making line segments coincide. In his bookt Ulam writes:
"Suppose two segments are given in the plane, each of length one. One is
asked to move the first segment continuously without changing its length
to make it coincide at the end of the motion with the second interval in
such a way that the sum of the lengths of the two paths described by the
end points should be a minimum."
In the formalization the role of the parameter t is played by the
angle between the moving line segment and its initial position.
The problem can be reduced to a Lyapunov-type problem (because the
integrand does not depend on x 1 and x2) and investigated with the aid of
the duality methods of Section 4.3.+ However, the simplest way is to apply
the maximum principle. In Fig. 42 we present the optimal solution for line
segments lying in one straight line (here we require that the point x 1 go
to xt and the point x2 go to x~). The arrows indicate the trajectories
of the points Xl and X2· The point 0 is determined uniquely from the con-
ditions of the symmetry, of the parallelism of Ox{ to x2~ and of x 10 to
~·x~, and of the perpendicularity of 0~ to x2~ and of 0~' to x~~.

*G. Borg, "Uber die Stabilitat gewisser Klassen von linearen Differential-
gleichungen,"Arch. Mat. Astron. Phys., Bd31A, No. 41,460-482 (1944).
Yu. A. Levin, "On the zeroth stability zone," Dokl. Akad. Nauk SSSR, 145,
No. 6, 1021-1023 (1962). Yu. A. Levin, "On a stability criterion," Usp.
Mat. Nauk, 17, No.3, 211-212 (1962).
ts. M. Ulam:-A Collection of Mathematical Problems, Interscience, New York
(1960), p. 79.
+seeM. A. Rvachev, On Ulam's Problem of Making Line Segments Coincide,
Trudy Seminara "Kombinatsionnaya Geometriya i Optimalnye Razmeshcheniya,
Kiev (1973), pp. 42-52.
PROBLEMS 297

x2 xz'

Fig. 42

100. H: Problem 100 is a formalization of one of the problems on the


maximum flight altitude of a rocket.* The problem can easily be reduced
to an elementary control problem (since the Lagrangian depends linearly on

l
x).
x0 forO~t~th

x(t)= (y/cg)t 3 +(y/g)1 2 fort,~ 1~ 12,

1 fort> 12 ,

where t1 is the positive root of the equation (y/cg)t 3 + (y/g)t 2 xo and


t2 is the positive root of the equation (y/cg)t 3 + (y/g)t 2 = 1.t

*A. A. Kosmodemyanskii, Konstantin Eduardovich Tsiolkovskii [in Russian],


Nauka, Hoscow (1976), pp. 263-265.
tFor greater detail, see Yu. L. Aleksandrov,"The problem on the maximum
flight altitude of a meteorological rocket in a homogeneous atmosphere and
a homogeneous gravitation field," Vestn. Mosk. Gos. Univ., Ser. 7, Mat.
Mekh., No. 5, 70-74 (1981).
NOTES AND GUIDE TO THE LITERATURE

There is a huge literature on the theory of extremal problems. Neither


the bibliography nor the notes and the guide to the literature presented
here pretend to be complete: we confine ourselves to a number of fundamen-
tal works, basic monographs, textbooks, and review articles.
To Chapter 1. Section 1.1 . The history of the appearance of the
first maximum and minimum problems is presented in [86] (besides the book
by Vander Waerden quoted in Section 1.1.2); for the initial stage of the
classical calculus of variations, see [83] and [87]. The extremal proper-
ties of a circle and a ball are dealt with in the monograph [21]. The
transportation problem was already investigated in the work of L. V. Kan-
torovich [55], the first work on linear programming. The simplest auto-
matic control problems were first investigated by Bushaw [96]. The time-
optimal problem and a number of others were thoroughly considered in the
fundamental monograph by L. S. Pontryagin, V. G. Boltyanskii, R. V. Gam-
krelidze, and E. F. Mishchenko [12] and in many other books on optimal
control ([23], [27], etc.).
Sections 1.2-1.5. For the history of the Lagrange principle, see
the article [45]. The same topic but in the framework of the classical
calculus of variations is considered in the works [43}, [44}.
We also point out some other books and articles devoted to the topics
touched upon in these sections and meant for a broad audience: [71}, [74],
[76], [77], and [115]. The fundamentals of the classical calculus of vari-
ations are presented in the textbooks [2], [3], and [8].
To Chapter 2. Section 2.1. Besides the textbooks on functional
analysis already mentioned, we point out the books [92] and [107] spe-
cially meant for the "needs" of the theory of extremal problems.
Section 2.2. The differential calculus in normed linear spaces is
presented (besides [KF]) in the textbooks [5], [56}, and [69] and in the
monographs [28}, [54], and [59}.
Section 2.3. Finite-dimensional versions of the implicit function
theorem are presented in the textbooks [5], [56], and [69].
The theorem of Section 2.3.1 is sufficiently convenient for construct-
ing the general theory since it contains the classical implicit function
theorem and Lyusternik's theorem on the tangent space and yields an esti-
mate for the deviation from the kernel of a mapping. The constructions

298
NOTES AND GUIDE TO THE LITERATURE 299
used in the proof were in fact contained in the initial work [68] (also
see [69]). For other proofs and modifications, see [54] and [65].
Section 2.4. For the differentiability of other important concrete
functionals, see [28] and [59].
Section 2.6. The fundamentals of finite-dimensional convex analysis
were laid down by Minkowski [113] and [114] and Fenchel [101] and [102).
Convex analysis in finite-dimensional spaces was constructed in the Sixties
in the works of Br~nsted [95), Dubovitskii and Milyutin [46], Moreau, Rocka-
fellar, et al. The most complete review of the finite-dimensional theory
is contained in the monograph by Rockafellar [81) and also in the monograph
by Bonnesen and Fenchel [94) (in the aspect related only to convex sets).
The infinite-dimensional theory is presented in [16), [52), [54), [62),
[79], and [84].
Lately some attempts have been made to create a synthetic ("smooth-
convex") calculus (see [70], [99], and [ 118]).
To Chapter 3. Section 3.2. It is difficult to say who was the first
to prove the Lagrange multiplier rule for smooth finite-dimensional prob-
lems with equality and inequality constraints. Some American works con-
tain references to the work of Valentine [120] and the thesis of Karush
[109). The multiplier rule for the case of infinitely many inequalities
was proved by John [108]. The work of Kuhn and Tucker [111] played a very
important role in these problems. Infinite-dimensional versions of the
multiplier rule are presented in [37], [39], [46], [54], [79], and [84).
Section 3.3. Line~r and convex programming is widely represented in
Russian textbooks and monographs: [38], [40], [41], [50), [57], [58], [67],
[75], [76), [81], [89), and [90].
Section 3.4. The work [100] was one of the first works devoted to
necessary conditions for problems with inequality constraints. The finite-
dimensional theory is perfectly presented in the textbook by Hestenes
[106]; also see [51] and [85]. Recently Levitin, Milyutin, and Osmolovskii
have elaborated the complete theory of second-order conditions for problems
with constraints [64] and [65]. In the last of these works there is an
extensive bibliography. We used some of the constructions of [65] in the
presentation of the material in this book. Also see [105] and [117].
To Chapter 4. Sections 4.1 and 4.4. There are many textbooks and
monographs devoted to the classical calculus of variations. Besides those
mentioned above, see [20), [63], [91], [93], [97], and [103]. The theory
of necessary and sufficient conditions in the Lagrange problem is elabo-
rated most completely in the monograph by Bliss [20]. The his~ory of the
problem is also presented in that monograph. For the relationship between
the calculus of variations and classical mechanics, see [1].
Section 4.2. The initial outline of optimal control theory was
presented in [26] and in the review article by Pontryagin [78]. After
that Pontryagin, Boltyanskii, Gamkrelidze, and Mishchenko devoted their
monograph [12] to optimal control theory, which gave rise to the rapid
development of the whole branch of mathematics connected with the in-
vestigation of extremal problems. The proof of the maximum principle pre-
sented in Section 4.2 is a modification of the original proof given by
V. G. Boltyanskii although we used no other means than those of classi-
cal mathematical analysis. The method of centered systems was used in the
works of Dubovitskii and Milyutin. At present there are many proofs of the
maximum principle, e.g., see [24], [31], [46]-[49], [54], [104], and [112].
Optimal control theory is dealt with in the textbooks [23], [31],
and [35] and also in the monographs [25], [27], [32], [61], [66), [73],
300 NOTES AND GUIDE TO THE LITERATURE

etc. In these monographs much emphasis is laid upon the solution of vari-
ous concrete applied problems.
Section 4.3. The maximum principle for linear systems was first
proved by Gamkrelidze [36]. The complete theory of linear systems was
elaborated by Krasovskii [60]. The article [17] is devoted to the review
of the subjects connected with Lyapunov-type proble~; this article con-
tains a generalization of the results of Section 4.3 and an extensive
bibliography.
The development of the theory of Lyapunov-type problems has led to
the investigation of convex integral functionals in the works [54], [98],
[119], etc.
In conclusion, we mention several monographs which may help the
reader to get acquainted with some fundamental questions of the theory of
extremal problems which are not touched upon in the present book:
Many-dimensional calculus of variations: [110] and [116].
Existence theorems: [35], [54], and [116].
Sufficient conditions in optimal control problems: [22], [23], and [61].
Extension of extremal problems: [53] and [88].
Duality methods in the theory of extremal problems: [52] and [88].
Numerical methods: [29], [42], [72], [80], and [82].
Sliding regimes: [34].
Dynamic programming: [18] and [19].
Phase constraints and mixed-type constraints: [33], [47], and [48].
Also see the review.[30].
REFERENCES

KF. A. N. Kolmogorov and S. v. Fomin, Elements of the Theory of Functions


and Functional Analysis [in Russian], Nauka, Moscow (1976).
1. V. I. Arnold, Mathematical Methods of Classical Mechanics, Springer-
Verlag, New York (1978).
2. N. I. Akhiezer, Lectures on the Calculus of Variations [in Russian],
Gostekhizdat, Moscow (1955).
3. I. M. Gelfand and S. V. Fomin, Calculus of Variations, Prentice-Hall,
Englewood Cliffs, N.J. (1963).
4. N. Dunford and J. I. Schwartz, Linear Operators, Part 7: General The-
ory, Interscience, New York (1958).
5. H. Cartan, Calcul Differentiel. Formes Differentielles [in French],
Hermann, Paris (1967).
6. E. A. Coddington and N. Levinson, Theory of Ordinary Differential
Equations, McGraw-Hill, New York (1955).
7. A. G. Kurosh, Higher Algebra [Russian translation], Mir, Moscow
(1972).
8. M. A. Lavrent'ev and L. A. Lyusternik, A Course in the Calculus of
Variations [in Russian], Gostekhizdat, Moscow (1950).
9. S. M. Nikol 1 skii, A Course of Mathematical Analysis [Russian transla-
tion], Vols. 1-2, Mir, Moscow (1977).
10. I. G. Petrovskii, Lectures on the Theory of Ordinary Differential
Equations [in Russian], Gostekhizdat, Moscow (1952).
11. L. S. Pontryagin, Ordinary Differential Equations [in Russian], Nauka,
Moscow (1965).
12. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F.
Mishchenko, The Mathematical Theory of Optimal Processes, Intersci-
ence, New York (1962).
13. I. I. Privalov, Introduction to the Theory of Functions of a Complex
Variable [in Russian], Nauka, Moscow (1977).
14. G. M. Fikhtengol'ts, A Course in Differential and Integral Calculus
[in Ru~sian], Vols. 1 and 2, Nauka, Moscow (1969).
15. P. Hartman, Ordinary Differential Equations, Wiley, New York (1964).
16. G. p, Akilov and S. S. Kutateladze, Ordered Vector Spaces [in Rus-
sian], Nauka, Novosibirsk (1978).
17. v. I. Arkin and V. L. Levin, "The convexity of the values of vector
integrals, the measurable selection theorem and variational problems,"
Usp. Mat. Nauk, 27, No.3, 21-77 (1972).
18. R. Bellman, Dynamic Programming, Princeton Univ. Press, Princeton
(1957).

301
302 REFERENCES
19. R. Bellman and R. Kalaba, Dynamic Programming and Modern Control
Theory, Academic Press, New York (1965).
20. G. A. Bliss, Lectures on the Calculus of Variations, Univ. of Chicago,
Chicago (1946).
21. v. B. Blyashke, The Circle and the Ball [in Russian], Nauka, Moscow
(1967).
22. v. G. Boltyanskii, "Sufficient conditions for optimality and the justi-
fication of the dynamic programming methods," Izv. Akad. Nauk SSSR,
Ser. Mat., 28, No. 3, 481-514 (1964).
23. v. G. Boltyanskii, Mathematical Methods of Optimal Control, Holt, Rine-
hart, and Winston, New York (1971).
24. V. G. Boltyanskii, "The tent method in the theory of extremal prob-
lems," Usp. Mat. Nauk, 30, No. 3, 3-55 ( 1975).
25. V. G. Boltyanskii, Optimal Control of Discrete Systems [in Russian],
Nauka, Moscow (1973).
26. v. G. Boltyanskii, R. V. Gamkrelidze, and L. S. Pontryagin, "On the
theory of optimal processes," Dokl. Akad. Nauk SSSR, 110, No. 1, 7-10
( 1956).
27. A. E. Bryson andY. C. Ho, Applied Optimal Control, Blaisdell, New
York (1969).
28. M. M. Vainberg, The Variational Method and the Monotone Operator
Method in the Theory of Nonlinear Equations [in Russian], Nauka, Mos-
cow (1972).
29. F. P. vasil'ev, Lectures on the Methods of Solving Extremal Problems
[in Russian], Moscow State Univ. (1974).
30. R. Gabasov and F. M. Kirillova, Opti~l Control Methods [in Russian],
Itogi Nauki i Tekhniki. Sovremennye Problemy Matematiki, Vol. 6, Moscow,
pp. 133-206.
31. R. Gabasov and F. M. Kirillova, Optimization Methods [in Russian],
Belorussian State Univ., Minsk (1975).
32. R. Gabasov and F. M. Kirillova, Singular Optimal Control [in Russian],
Nauka, Moscow (1973).
33. R. V. Gamkrelidze, "Optimal control processes with constrained phase
coordinates," Izv. Akad. Nauk SSSR, Ser. Mat., 24, No. 3, 315-356
( 1960).
34. R. v. Gamkrelidze, "On sliding optimal regimes," Dokl. Akad. Nauk
SSSR, 143, No.6, 1243-1245 (1962).
35. R. V. Gamkrelidze, Fundamentals of Optimal Control [in Russian],
Tomsk State Univ. (1977).
36. R. v. Gamkrelidze, "The theory of time-optimal processes in linear
systems," Izv. Akad. Nauk SSSR, Ser. Matern., 22, No. 4, 449-474 (1958).
37. R. v. Gamkrelidze and G. L. Kharatishvili, First-Order Necessary Con-
ditions in Extremal Problems [in Russian], Nauka, Moscow (1972).
38. s: I. Gass, Linear Programming, McGraw-Hill, New York (1958).
39. I. v. Girsanov, Lectures on the Mathematical Theory of Extremal Prob-
lems [in Russian], Moscow State Univ. (1970).
40. E. G. Gol'shtein, Duality Theory in Mathematical Programming [in Rus-
sian], Nauka, Moscow (1971).
41. G. B. Dantsig, Linear Progra~ing and Extensions, Princeton Univ.
Press, Princeton, New Jersey (1963).
42. v. F. Demyanov and A. M. Rubinov, Approximate Methods of Solving Ex-
tremal Problems [in Russian], Leningrad State Univ. (1968).
43. A. V. Dorofeeva, "Calculus of variations in the second half of the
19th century," Istoriko-Matematicheskiye Issledovaniya, XV, 99-128
(1963). -
44. A. V. Dorofeeva, "The development of the variational calculus as a
calculus of variations," Istoriko-Matematicheskie Issledovaniya, XIV,
101-180 (1961). -
REFERENCES 303

45. A. V. Dorofeeva and V. M. Tikhomirov, "From the Lagrange multiplier


rule to the Pontryagin maximum principle," Istoriko-Matematicheskie
Issledovaniya, XXV (1979).
46. A. Ya. Dubovitskii and A. A. Milyutin, "Extremum problems in the pres-
ence of constraints," Zh. Vychisl. Mat. Mat. Fiz., No.3, 395-453
(1965).
47. A. Ya. Dubovitskii and A.A. Milyutin, Necessary Conditions for Weak
Extremum in the General Optimal Control Problem [in Russian], Nauka,
Moscow (1971).
48. A. Ya. Dubovitskii and A. A. Milyutin, "Necessary conditions for weak
extremum in optimal control problems with mixed-type inequality con-
straints," Zh. Vychisl. Mat. Mat. Fiz., No.4, 725-770 (1968).
49. A. Ya. Dubovitskii and A. A. Milyutin, "Translation of Euler's equa-
tion," Zh. Vychisl. Mat. Mat. Fiz., No.6, 1263-1284 (1969).
50. I. I. Eremin and N. N. Astaf'ev, Introduction to the Theory of Linear
and Convex Programming [in Russian], Nauka, Moscow (1976).
51. W. Zangwill, Nonlinear Programming- Unified Approach, Prentice-Hall,
Englewood Cliffs, New Jersey (1969).
52. A. D. Ioffe and V. M. Tikhomirov, "Duality of convex functions and
extremal problems," Usp. Mat. Nauk, No. 6, 51-116 ( 1968).
53. A. D. Ioffe and V. M. Tikhomirov, "Extension of variational problems,"
Trudy MMO, 18, 187-246 (1968).
54. A. D. Ioffeand v. M. Tikhomirov, Theory of Extremal Problems, North-
Holland, Amsterdam-New York-oxford (1979).
55. L. V. Kantorovich, Mathematical Methods of Organizing and Planning
Production [in Russian], Leningrad State Univ. (1939).
56. L. V. Kantorovich and G. P. Akilov, Functional Analysis [in Russian],
Nauka, Moscow (1977).
57. S. Karlin, Mathematical Methods and Theory in Games, Programming and
Economics, Addison-Wesley, Reading (1959).
58. V. G. Karmanov, ~mthematical Programming [in Russian], Nauka, Moscow
(1975).
59. M.A. Krasnosel 1 skii, P. P. Zabreiko, E. I. Pustyl'nik, and P. E. Sobo-
levskii, Integral Operators in the Space of Summable Functions [in
Russian], Nauka, Moscow (1966).
60. N. N. Krasovskii, Theory of Motion Control. Linear Systems [in Rus-
sian], Nauka, Moscow (1968).
61. V. F. Krotov and V. I. Gurman, Methods and Problems in Optimal Control
[in Russian], Nauka, Moscow (1973).
62. S. S. Kutateladze and A. M. Rubinov, The Minkowski Duality and Its
Applications [in Russian], Nauka, Novosibirsk (1976).
63. M. A. Lavrent'ev and L. A. Lyusternik, Foundations of the Calculus
of Variations [in Russian], Vols. I and II, ONTI, Moscow-Leningrad
( 1935).
64. E. S. Levitin, A. A. Milyutin, and N. P. Osmolovskii, "Conditions for
local minimum in problems with constraints," in: Mathematical Econom-
ics and Functional Analysis [in Russian], Nauka, Moscow (1974), pp.
139-202.
65. E. S. Levitin, A. A. Milyutin, and N. P. Osmolovskii, "Higher-order
conditions for local minimum in problems with constraints," Usp. Mat.
Nauk., 33, No.6, 85-148 (1978).
66. E. B. Lee and L. Markus, Foundations of Optimal Control Theory, Wiley,
New York (1967).
67. H. W. Kuhn and A. W. Tucker (eds.), "Linear inequalities and related
systems," Ann. Hath. Stud., No. 38 ( 1956).
68. L. A. Lyusternik, "On conditional extrema of functions," Mat. Sb.,
41, No.3, 390-401 (1934).
69. ~A. Lyusternik and V. I. Sobolev, Elements of Functional Analysis,
Ungar, New York (1961).
304 REFERENCES
70. G. G. Magaril-Il'yaev, "Implicit function theorem for the Lipschitzian
mappings," Usp. Mat. Nauk, 33, No. 1, 221-222 (1978).
71. N. Kh. Rozov (ed.), Mathematics for Engineers [in Russian], Znaniye,
Moscow (1973).
72. N. N. Moiseev, Numerical Methods in the Theory of Optimal Systems
[in Russian], Nauka, Moscow (1971).
73. N. N. Moiseev, Elements of the Theory of Optimal Systems [in Russian],
Nauka, Moscow (1971).
74. A. D. Myskis, Advanced Mathematics for Engineers, Special Courses
[Russian translation], Mir, Moscow (1975). ·
75. H. Nikaido, Convex Structures and Economic Theory, Academic Press,
New York-London (1968).
76. I. v. Nit, Linear Programming with Discussion of Some Nonlinear Prob-
lems [in Russian], Moscow State Univ. (1978).
77. Optimal Control [in Russian], Znaniye, Moscow (1978).
78. L. S. Pontryagin, "Optimal control processes," Usp. Mat. Nauk, ~.
No. 1, 3-20 (1959).
79. N. N. Pshenichnyi, Necessary Conditions for Extremum [in Russian],
Nauka, Moscow (1969).
80. B. N. Pshenichnyi and Yu. M. Danilin, Numerical Methods in Extremal
Problems [in Russian], Nauka, Moscow (1975).
81. R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, Princeton,
New Jersey (1970}.
82. I. v. Romanovskii, Algorithms for Solving Extremal Problems [in Rus-
sian], Nauka, Moscow (1977).
83. K. A. Rybnikov, "The first stages of the calculus of variations,"
Istoriko-Matematicheskie Issledovaniya, No. 2, GITTL, Moscow-Lenin-
grad (1949).
84. V. M. Tikhomirov, Some Problems in the Approximation Theory [in Rus-
sian], Moscow State Univ. (1976).
85. A. v. Fiacco and G. P. McCormick, Nonlinear Programming. Sequential
Unconstrained Technique, Wiley (1968).
86. H. G. Zeuthen, Histoir des Mathematique dans L'Antiquite et la Moyen
Age [in French], Paris (1902).
87. H. G. Zeuthen, Geschichte der Mathematik in XVI and XVIII Jahrhundert
[in Gern~n], Leipzig (1903).
88. I. Ekeland and R. Temam, Analyse Convexe et Problemes Variationelles
[in French], Hermann, Paris (1974).
89. K. J. Arrow, L. Hurwicz, and H. Uzawa, Studies in Linear and Nonlinear
Programming, Stanford Univ. Press, Stanford (1958).
90. D. B. Yudin and E. G. Gol'shtein, Linear Programming (Theory, Methods,
and Applications) [in Russian], Nauka, Moscow (1969).
91. L. C. Young, Lectures on the Calculus of Variations and Optimal Con-
trol Theory, W. B. Saunders, Philadelphia-London-Toronto (1969).
92. A. v. Balakrishnan, Applied Functional Analysis, Springer-Verlag, New
York (1976).
93. 0. Bolza, Vorlesungen fiber Variationsrechnung [in German], Leipzig
( 1949) .
94. T. Bonnesen and W. Fenchel, Theorie der konvexen KBrper [in German],
Springer-Verlag, Berlin (1934).
95. A. Br~ndsted, "Conjugate convex functions in topological vector
spaces," Mat. Fys. Medd. Dansk. Vid. Selsk., 34, No. 2, 1-26 (1964).
96. D. W. Bushaw, Differential Equations with a Discontinuous Forcing
Term, Princeton: Dept. of Math. Princeton Univ. (1952).
97. C. Caratheodory, Variationsrechnung und partielle Differentialgleich-
ungen erster Ordnung [in German], Teubner, Leipzig-Berlin (1935).
98. C. Castaing and M. Valadier, "Convex analysis and measurable multi-
functions," Lect. Notes Math., No. 580, Springer-Verlag, Berlin
(1977).
REFERENCES 305

99. F. H. Clarke, "Generalized gradients and applications," Trans. Am.


Math. Soc., 205, 247-262 (1975).
100. M. J. Cox, "On necessary conditions for relative minima," Am. J.
Math., 66, No. 2, 170-198 (1944),
101. W. Fenchel, Convex Cones, Sets and Functions, Princeton Univ. Press,
Princeton, New Jersey (1951).
102. W. Fenchel, "On conjugate convex functions," Can. J. Math.,l:_, 73-77
(1949).
103. J. Hadamard, Le~on sur le Calcul des Variations [in French], Hermann,
Paris (1910).
104. H. Halkin, "On necessary conditions for optimal control of nonlinear
systems," J. Analyse Math., 12, 1-82 (1964).
105. M. R. Hestenes, Calculus of Variations and Optimal Control Theory,
Wiley, New York (1966).
106. M. R. Hestenes, Optimization Theory. The Finite-Dimensional Case,
Wiley, New York (1975).
107. R. B. Holmes, Geometric Functional Analysis and Its Applications,
Springer-Verlag, New York (1975).
108. F. John, "Extremum problems with inequalities as subsidiary condi-
tions," in: Studies and Essays. Courant Anniversary Volume, Inter-
science, New York (1948), pp. 187-204.
109. W. E. Karush, Minima of Functions of Several Variables with Inequali-
ties as Side Conditions, Univ. of Chicago Press, Chicago (1939).
110. R. Klotzler, Hehrdimensionale Variationsrechnung, Berlin: VEB Deutscher
Verlag (1971).
111. H. W. Kuhn and A. W. Tucker, "Nonlinear programming," in: Proc. of
Second Berkeley Symp., Univ. of California Press, Berkeley ( 1951), pp.
481-492.
112. P. Michel, "Une demonstration elementaire du principe du maximum de
Pontriaguine," Bull. Math. Economiques, 1A ( 1977) .
113. H. Minkowski, Geometrie der Zahlen [in German], Teubner, Leipzig
(1910).
114. H. Minkowski, "Theorie der konvexen Korper," in: Gesammelte Ab-
handlungen [in German], Vol. II, Teubner, Leipzig-Berlin (1911).
115. N. N. Moiseev and v. M. Tikhomirov, "Optimization," in: Mathematics
Applied to Physics, Springer-Verlag, New York (1970), pp. 402-467.
116. Ch. B. Morrey, Multiple Integrals in the Calculus of Variations,
Springer-Verlag, New York (1966).
117. L. W. Neustadt, Optimization. A Theory of Necessary Conditions,
Princeton Univ. Press, Princeton, New Jersey (1976).
118. R. T. Rockafellar, "The theory of subgradients and its application
to problems of optimization," Lect. Notes Univ. Montreal (1978).
119. R. T. Rockafellar, "Convex-integral functionals and duality," in:
Contributions to Nonlinear Functional Analysis, Academic Press, New
York (1971), pp. 215-236.
120. F. A. Valentine, "The problem of Lagrange with differential inequali-
ties as added side conditions," in: Contributions to the Calculus of
Variations, Univ. of Chicago Press, Chicago (1933-1937).
BASIC NOTATION

v universal quantifier, "for every"


3 existential quantifier, "there exists"
=* sign of implication " ... implies ••• "
sign of equivalence
del by definition, is equal to
(4)
by virtue of (4), is equal to
x EA the element x belongs to the set A
x~ A the element x does not belong to the set A
s empty set
AUB the union of the sets A and B
AOB the intersection of the sets A and B
A"-B the difference of the sets A and B, i.e., the set of elements that
belong to the set A but do not belong to the set B
Ac B the set A is contained in the set B
A x B the Cartesian product of the sets A and B
A + B the arithmetical sum of the sets A and B
{xiP(x)} the set of those elements x that possess the property P(·)
{x 1 , ••• ,xn,•••} the set which consists of the elements x 1 , ••• ,xn•···
F:X + Y the mapping F of the set X into the set Y; the function F with
domain X whose values belong to the set Y
x + F(x) the mapping (function) F assigns the element F(x) to an element
x; the notation of the mapping (function) F in the case when it is de-
sirable to indicate the notation of its argument
F(·) the notation which stresses that F is a mapping (function)
F(A) the image of the set A under the mapping F
imF={ulu=F(x), x EX} the image of the mapping F:X + Y
F- 1 (A) the inverse image of the set A under the mapping F

306
BASIC NOTATION 307

FIA the restriction of the mapping F to the set A


FoG the composition of the mappings G and F: (F o G)(x) = F(G(x))
a the set of all real numbers; the number line (the real line)
'i = RU{- oo, +eo} the extended number line
inf A (supA) the infimum (supremum) of the numbers which belong to the set
A c: a
an the arithmetical n-dimensional space endowed with the standardEuclidean
structure; the elements of an should be thought of as column vectors even
when they are printed in rows
~ ={x=(xi, ... ,xn) ERnlx;;;,.Q} the nonnegative orthant of an
e1, ..• ,en the vectors of the standard orthonormal basis in an; e 1 = (1,
0, ••• , 0) , ••• , en= (O, 1 , ••• , 1)
an* the arithmetical n-dimensional space conjugate to an, the elements of
an* should be thought of as row vectors
n
px= <p, x)= ~ p;x; for any x E R.n , p E Rn•
i=l

xT the row vector which is the transpose of the column vector x


lxl the Euclidean norm in an; lxl 2 = (xlx) = xTx
n
(x!y)= ~x;y 1 = xTy the scalar product in an
1=1

llxll the norm of an element x in a normed space


p(x, y) the distance from an element x to an element y
p (x, A)=inf {p (x, y) 1yEA} the distance from an element x to a set A
B(x, r) {ylp(y, x) ~ r} the closed ball with center at x and radius r
B(x, r) {ylp(y, x) < r} the open ball with center at x and radius r
A the closure of the set A
int A the interior of the set A
TxM(TiM) the set of the tangent vectors (one-sided tangent vectors) to
the set A at the point x
Zaf-={x 1f(x)c;;;,a} an a-level set of the function f
X* the conjugate space of X
x* an element of the conjugate space X*
<x*, x> the value of the linear functional x-Ex• on the element xEX
(xly) the scalar product of elements x and y of a Hilbert space
A.L={x.-1 <x•, x)=O, x E A} the annihilator of the set A
dim L the dimension of the space L
X/L the factor space of the space X generated by the subspace L
Z(X,n the space of continuous linear mappings of the space X into the
space Y; the mappings belonging to the space Z(R.n, R.m) can be identified
with their matrices relative to the standard bases in an and am
zn (X, Y) the space of continuous multilinear mapping_s of the space X x ••. x X
into the space Y n times
308 BASIC NOTATION

I the unit operator; E or En is the matrix of the unit operator of :C(Rn,:


Rn), i.e., the unit n-th-order matrix (sometimes denoted by the same sym-
bol I)
A* the operator adjoint to the operator A; <A*y*, x> <y*, Ax>
Ker A = {xi Ax = 0} the kernel of the linear operator A
ImA = {yly =Ax} the image of the linear operator A
xTAx(yTAx) the notation for the quadratic (bilinear) form with a matrix
n
A = (aij): yTAx= ~ A;iYiXj.
i, i=l

A > 0 the matrix A is positive-definite


A >0 the matrix A is positive semidefinite (nonnegative definite)
C(K, Y) the space of continuous mappings from K into Y; if Y is normed,
then llx(-)[l~llx(·)llo=sup[[x(t)ll for x(·)EC(K, Y); C(K) = C(K, Y) if it is clear
IEK

what the space Y is or if Y = R


C([to, t 1 ]) the space of continuous functions on the closed interval [to,
tll
cr(u) the set of mappings having in U continuous derivatives up to the
order r inclusive; the notation is usually used in expressions of the
type "the function f belongs to the class cr(u)" or "F is a mapping of
class cr(u)" (FEC'(U))
cr (£1, Y) the space of mappings of the closed interval l\ c R into the space
Y (usually Y = Rn or Y = Rn*) possessing continuous derivatives up to the
order r inclusive; for x(·)EC'(I\, Y) we have llx(·)ll=llx(·)[[,=max{[lx(·)llo.
II x<·l llo· . .. • I x<r, ( · >llo}
KC 1 (6, Y) the space of mappings of the closed interval 1\cR into the
space Y (usually Y = Rn or Y = Rn*) possessing piecewise-continuous de-
rivatives. It is equipped with a norm as a subspace of the space C(£1, Y)
w.;;(£1, Y) the space of mappings of the closed interval 1\c R into the space
Y (usually Y = Rn or Y = Rn*) each of which satisfies the Lipschitz con-
dition with some constant. It is equipped with a norm as a subspace of
the space C(£1, Y)
F'(x; h) the derivative of the function Fat the point x in the direction
of the vector h
o+F(x; .) the first variation of the mapping F at the point x
oF(x; .) the first Lagrange variation of the mapping F at the point x
.snF(x; .) the n-th Lagrange variation of the mapping F at the point x
Ff.<x) the Gateaux derivative of the mapping F at the point x
F' (x) the Frechet derivative of the mapping F at the point x
F'(x)[h] or F'(x)h the value of the derivative of the mapping Fat the
point x on the vector h
the mapping F is Frechet differentiable at the point x
F € SD 1 (x) the mapping F is strictly differentiable at the point x
F"(x) the second Frechet derivative of the mapping F at the point x
F"(x)[hl, hz] the value of the second derivative (regarded as a bilinear
function) on the pair of vectors h 1 and h 2
BASIC NOTATION 309

h + d 2 F(x; h) = F"(x)[h, h) the second differential of the mapping Fat


the point x
Fx 1 (xl, xz) (Fx 2 (xl, xz)) the partial derivative with respect to x 1 (with
respect to xz) of the mapping F, i.e., the derivative of the mapping
x1 + F(xl, x2) (x2 + F(xl, x2))
[xi, x2 ]={x\x=a.xi+(l-a.)x 2, O.i;;;;a...;;l} the line segment joining the points x 1
and x2
dom f={xEXIf(x)<+oo} theeffective domain of the function f:X + R
epi f={(x, a.)EXX_R lf(x)..;;a., xEdoPJf} theepigraphof the function f:X + R
oA the indicator function of the set A
sA the support function of the set A
~A the Minkowski function of the set A
lin A the linear hull of the set A
aff A the affine hull of the set A
cone A the conic hull of the set A
conv A the convex hull of the set A
convA the convex closure, i.e., the closure of the convex bull, of the
set A
df(x) the subdifferential of the function f at the point X
f*(·) the function conjugate to the function f(·), or the Young-Fenchel
transform of the function f
f (x)-----+ inf (sup, extr), x E C the notation of an extremal problem
x a solution of an extremal problem
absmin_ (absmax, absextr) absolute minimum (maximum, extremum)
x m
E absmin (absmax m. x
absextr (3)) yields an absolute minimum (maxilllllm, ex-
tremum) in the problem,3
locmin (locmax, locextr) a local minimum (maximum, extremum)
x m.
E locmin (locmax (~). locextr (3}) x yields a minimum (maximum, extremum) in the
problem 3
:;, :§, o1t, (jf, :J> functionals in problems of the classical calculus of variations
and optimal control problems
Q the quadratic functional of a secondary problem of the calculus of vari-
at ions
L L(t, x,x) or L = L(t, x, x, u) the Lagrangian
,.z the Lagrange function
~ the Hamiltonian
H the Pontryagin function
B the Weierstrass function

Potrebbero piacerti anche