CALCULUS OF VARIATIONS
Already published
1 W.M.L. Holcombe Algebraic automata theory
2 K. Petersen Ergodic theory
3 P.T. Johnstone Stone spaces
4 W.H. Schikhof Ultrametric calculus
5 J.-P. Kahane Some random series of functions, 2nd edition
6 H. Cohn Introduction to the construction of class fields
7 J. Lambek & P.J. Scott Introduction to higher-order categorical logic
8 H. Matsumura Commutative ring theory
9 C.B. Thomas Characteristic classes and the cohomology of finite groups
10 M. Aschbacher Finite group theory
11 J.L. Alperin Local representation theory
12 P. Koosis The logarithmic integral I
13 A. Pietsch Eigenvalues and S-numbers
14 S.J. Patterson An introduction to the theory of the Riemann zeta-function
15 H.J. Baues Algebraic homotopy
16 V.S. Varadarajan Introduction to harmonic analysis on semisimple Lie groups
17 W. Dicks & M. Dunwoody Groups acting on graphs
18 L.J. Corwin & F.P. Greenleaf Representations of nilpotent Lie groups and their applications
19 R. Fritsch & R. Piccinini Cellular structures in topology
20 H. Klingen Introductory lectures on Siegel modular forms
21 P. Koosis The logarithmic integral II
22 M.J. Collins Representations and characters of finite groups
24 H. Kunita Stochastic flows and stochastic differential equations
25 P. Wojtaszczyk Banach spaces for analysts
26 J.E. Gilbert & M.A.M. Murray Clifford algebras and Dirac operators in harmonic analysis
27 A. Fröhlich & M.J. Taylor Algebraic number theory
28 K. Goebel & W.A. Kirk Topics in metric fixed point theory
29 J.F. Humphreys Reflection groups and Coxeter groups
30 D.J. Benson Representations and cohomology I
31 D.J. Benson Representations and cohomology II
32 C. Allday & V. Puppe Cohomological methods in transformation groups
33 C. Soulé et al. Lectures on Arakelov geometry
34 A. Ambrosetti & G. Prodi A primer of nonlinear analysis
35 J. Palis & F. Takens Hyperbolicity and sensitive chaotic dynamics at homoclinic bifurcations
36 M. Auslander, I. Reiten & S. Smalø Representation theory of Artin algebras
37 Y. Meyer Wavelets and operators
38 C. Weibel An introduction to homological algebra
39 W. Bruns & J. Herzog Cohen-Macaulay rings
40 V. Snaith Explicit Brauer induction
41 G. Laumon Cohomology of Drinfeld modular varieties I
42 E.B. Davies Spectral theory and differential operators
43 J. Diestel, H. Jarchow & A. Tonge Absolutely summing operators
44 P. Mattila Geometry of sets and measures in Euclidean spaces
45 R. Pinsky Positive harmonic functions and diffusion
46 G. Tenenbaum Introduction to analytic and probabilistic number theory
47 C. Peskine An algebraic introduction to complex projective geometry I
48 Y. Meyer & R. Coifman Wavelets and operators II
49 R. Stanley Enumerative combinatorics
50 I. Porteous Clifford algebras and the classical groups
51 M. Audin Spinning tops
52 V. Jurdjevic Geometric control theory
53 H. Voelklein Groups as Galois groups
54 J. Le Potier Lectures on vector bundles
55 D. Bump Automorphic forms
56 G. Laumon Cohomology of Drinfeld modular varieties II
59 P. Taylor Practical foundations of mathematics
60 M. Brodmann & R. Sharp Local cohomology
64 J. Jost & X. Li-Jost Calculus of variations
Calculus of Variations
Jürgen Jost and Xianqing Li-Jost
Max-Planck-Institute for Mathematics in the Sciences, Leipzig

CAMBRIDGE UNIVERSITY PRESS

PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Edinburgh Building, Cambridge CB2 2RU, UK http://www.cup.ac.uk
40 West 20th Street, New York, NY 10011-4211, USA http://www.cup.org
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

Cambridge University Press 1998

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 1998

Typeset in Computer Modern by the authors using LaTeX 2e

A catalogue record of this book is available from the British Library

Library of Congress Cataloguing in Publication data
Jost, Jürgen, 1956-
Calculus of variations / Jürgen Jost and Xianqing Li-Jost.
p. cm.
Includes index.
ISBN 0 521 64203 5 (hc.)
1. Calculus of variations. I. Li-Jost, Xianqing, 1956- II. Title.
QA315.J67 1999 515'.64-dc21 98-38618 CIP

ISBN 0 521 64203 5 hardback
Contents

Preface and summary
Remarks on notation

Part one: One-dimensional variational problems

1 The classical theory
1.1 The Euler-Lagrange equations. Examples
1.2 The idea of the direct methods and some regularity results
1.3 The second variation. Jacobi fields
1.4 Free boundary conditions
1.5 Symmetries and the theorem of E. Noether

2 A geometric example: geodesic curves
2.1 The length and energy of curves
2.2 Fields of geodesic curves
2.3 The existence of geodesics

3 Saddle point constructions
3.1 A finite dimensional example
3.2 The construction of Lyusternik-Schnirelman

4 The theory of Hamilton and Jacobi
4.1 The canonical equations
4.2 The Hamilton-Jacobi equation
4.3 Geodesics
4.4 Fields of extremals
4.5 Hilbert's invariant integral and Jacobi's theorem
4.6 Canonical transformations

5 Dynamic optimization
5.1 Discrete control problems
5.2 Continuous control problems
5.3 The Pontryagin maximum principle

Part two: Multiple integrals in the calculus of variations

1 Lebesgue measure and integration theory
1.1 The Lebesgue measure and the Lebesgue integral
1.2 Convergence theorems

2 Banach spaces
2.1 Definition and basic properties of Banach and Hilbert spaces
2.2 Dual spaces and weak convergence
2.3 Linear operators between Banach spaces
2.4 Calculus in Banach spaces

3 $L^p$ and Sobolev spaces
3.1 $L^p$ spaces
3.2 Approximation of $L^p$ functions by smooth functions (mollification)
3.3 Sobolev spaces
3.4 Rellich's theorem and the Poincaré and Sobolev inequalities

4 The direct methods in the calculus of variations
4.1 Description of the problem and its solution
4.2 Lower semicontinuity
4.3 The existence of minimizers for convex variational problems
4.4 Convex functionals on Hilbert spaces and Moreau-Yosida approximation
4.5 The Euler-Lagrange equations and regularity questions

5 Nonconvex functionals. Relaxation
5.1 Nonlower semicontinuous functionals and relaxation
5.2 Representation of relaxed functionals via convex envelopes

6 Γ-convergence
6.1 The definition of Γ-convergence
6.2 Homogenization
6.3 Thin insulating layers

7 BV-functionals and Γ-convergence: the example of Modica and Mortola
7.1 The space $BV(\Omega)$
7.2 The example of Modica-Mortola
The coarea formula
The distance function from smooth hypersurfaces

8 Bifurcation theory
8.1 Bifurcation problems in the calculus of variations
8.2 The functional analytic approach to bifurcation theory
8.3 The existence of catenoids as an example of a bifurcation process

9 The Palais-Smale condition and unstable critical points of variational problems
9.1 The Palais-Smale condition
9.2 The mountain pass theorem
9.3 Topological indices and critical points
The calculus of variations is concerned with the construction of optimal shapes, states, or processes where the optimality criterion is given in the form of an integral involving an unknown function. The task of the calculus of variations then is to demonstrate the existence and to deduce the properties of some function that realizes the optimal value for this integral. Such variational problems occur in manifold applications, in particular in physics, engineering, and economics, and the variational integral may represent some action, energy, or cost functional. The calculus of variations also has deep and important connections with other fields of mathematics. For instance, in geometrically defined classes of objects, a variational principle often permits the selection of a unique optimal representative, and the properties of this representative can frequently be used to much advantage to deduce additional information about its class. For these reasons, the calculus of variations is a rich and ample mathematical subject, and a good impression of this diversity can be obtained by reading the beautiful book by S. Hildebrandt and A. Tromba, The Parsimonious Universe, Springer, 1996.

In this textbook, we have attempted to present some of the many faces of the calculus of variations, and a brief summary may be useful before putting the contents into a broader perspective. At the same time, we shall also describe the logical connections between the various chapters, in order to facilitate reading for readers with a specific aim.

The book is divided into two parts. The first part treats variational problems for functions of one independent variable; the second, problems for functions of several variables. The distinction between these two parts, however, is also that the first treats the more elementary and more classical aspects of the subject, while the second is concerned with some more difficult topics and uses somewhat more abstract reasoning. In this second part, some examples that occurred in recent applications of the calculus of variations are also presented in detail. This second part leads the reader to some topics and questions of current research in the calculus of variations.

The first chapter of Part I is of a somewhat introductory nature and attempts to develop some intuition for the properties of solutions of variational problems. In the basic Section 1.1, we derive the Euler-Lagrange equations that any smooth solution of a variational problem has to satisfy. The topics of the other sections of that chapter contain some regularity questions and an outline of the so-called direct methods of the calculus of variations (a subject that will be taken up in much more detail in Chapter 4 of Part II), Jacobi's theory of the second variation and stability of solutions, and Noether's theorem that deduces conservation laws from invariance properties of variational integrals. All those results will not be directly applied in subsequent chapters, but should rather serve as a motivation. In any case, basically all the chapters of Part I can be read independently, after the reader has gone through Section 1.1.

In Chapter 2, we treat one of the most important variational problems, namely that of geodesics, i.e. of finding (locally) shortest curves under smooth geometric constraints. Geodesics are of fundamental importance in Riemannian geometry and several physical applications. We shall make use of the geometric nature of this problem and develop some elementary geometric constructions, to deduce the existence not only of length-minimizing curves, but also of curves that furnish unstable critical points of the length functional. In Chapter 3, we present some more abstract aspects of such so-called saddle point constructions. At this point, however, we can only treat problems that allow the reduction to a finite dimensional situation.
A deeper treatment needs additional tools and therefore has to wait until Chapter 9 of Part II. Geodesics will only occur once more in the remainder, namely as an example in Section 4.3.

Chapter 4 is concerned with one of the classical highlights of the calculus of variations, the theory of Hamilton and Jacobi. This theory is of particular importance in mechanics. Presently, its global aspects are resurging in connection with symplectic geometry, one of the most active fields of present mathematical research.

Chapter 5 is a brief introduction to dynamic optimization and control theory. The canonical equations of Hamilton and Jacobi of Section 4.1 briefly reoccur as an example of the Pontryagin maximum principle at the end of Section 5.3.

As mentioned, Part II is of a less elementary nature. We therefore need to develop some general theory first. In Chapter 1 of that part, Lebesgue integration theory is summarized (without proofs) for the convenience of the reader. While in Part I, the Riemann integral entirely suffices (with the exception of some places in Section 1.2), the function spaces that are basic for Part II, namely the $L^p$ and Sobolev spaces, are essentially based on Lebesgue's notion of the integral. In Chapter 2, we develop some results from functional analysis about Banach and Hilbert spaces that will be applied in Chapter 3 for deriving the fundamental properties of the $L^p$ and Sobolev spaces. (In fact, as the tools from functional analysis needed in subsequent chapters are of a quite varied nature, Chapter 2 can also serve as a brief introduction to the field of functional analysis itself.) These chapters serve the purpose of making the book self-contained, and for most readers the best strategy might be to start with Chapter 4, or at most with Chapter 3, and look up the results of the previous chapters only when they are applied.

Chapter 4 is fundamental. It is concerned with the existence of minimizers of variational integrals under appropriate convexity and lower semicontinuity assumptions. We treat both the standard method based on weak compactness and a more abstract method for minimizing convex functionals that does not need the concept of weak convergence. Chapters 5-7 essentially discuss situations where those assumptions are no longer satisfied. Chapter 5 deals with the method of relaxation, while Chapters 6 and 7 present the important concept of Γ-convergence for minimizing functionals that can be represented only in an indirect manner as limits of other functionals. Such problems occur in many applications, including homogenization and phase transitions, and several such examples are treated in detail.

Chapter 8 discusses bifurcation theory. We first discuss the variational aspects (Jacobi fields), taking up the constructions of Sections 1.1 and 1.3 of Part I, then develop a general functional analytic framework for analyzing bifurcation phenomena and then treat the example of minimal surfaces of revolution (catenoids) in the light of that framework. Chapter 8 is independent of Chapters 4-7, and of a more elementary nature than those. The key tool is the implicit function theorem in Banach spaces, proved in Section 2.4.

The last Chapter 9 returns to the topic of the existence of non-minimizing, unstable critical points of variational integrals. While such solutions usually cannot be observed in physical applications because of their unstable nature, they are of considerable mathematical interest, for example in the context of Riemannian geometry. Chapter 9 is independent of Chapters 4-8.
The present book is self-contained, with very few exceptions. Prerequisites are only the calculus of one and several variables.

Although, as indicated, there are important connections between the calculus of variations and geometry, the present book is of an analytic nature and does not explore those connections. One such connection concerns the global aspects of the space of solutions of one-dimensional variational problems and their trajectories that started with the qualitative investigations of Poincaré and is for example represented in V.I. Arnold, Mathematical Methods of Classical Mechanics, GTM 60, Springer, New York, 2nd edition, 1987. Here, geometric methods are used to study variational problems. In the opposite direction, variational methods can often be used to solve geometric problems. This is the topic of geometric analysis; we refer the interested reader to J. Jost, Riemannian Geometry and Geometric Analysis, Springer, Berlin, 2nd edition, 1998, and the references contained therein.

There is one important omission in this textbook. Namely, the regularity theory for solutions of variational problems is not treated, with the exception of the one-dimensional case in Section 1.2 of Part I, and the simplest example of the multi-dimensional theory, namely harmonic functions (plus an easy generalization) in Section 4.5 of Part II. Therefore, the solutions of the variational problems that are discussed usually are obtained only in some Sobolev space. We think that a detailed treatment of regularity theory more properly belongs to the realm of partial differential equations, and therefore we have to refer the reader to textbooks and monographs on partial differential equations, for example D. Gilbarg and N. Trudinger, Elliptic Partial Differential Equations of Second Order, Springer, Berlin, 2nd edition, 1983, or J. Jost, Partielle Differentialgleichungen, Springer, Berlin, 1998.

In any case, the present textbook cannot cover all the many diverse aspects of the calculus of variations. For readers who are interested in a more extensive treatment, we strongly recommend M. Giaquinta and St. Hildebrandt, Calculus of Variations, several volumes, Springer, Berlin, 1996 ff., as well as E. Zeidler, Nonlinear Functional Analysis and its Applications, Vols. III and IV, Springer, New York, 1984 ff. (a second edition of Vol. IV appeared in 1995). Additional references are given in the course of the text. Since the present book, however, is neither a research monograph nor an account of the historical development of the calculus of variations, references to individual contributions are usually not given. We just list our sources, and refer the interested readers as well as the contributing mathematicians to those for references to the original contributions.
The authors thank Felicia Bernatzki, Ralf Muno, Xiao-Wei Peng, Marianna Rolf, and Wilderich Tuschmann for their help in proofreading and checking the contents and various corrections, and Michael Knebel and Micaela Krieger for their competent typing.

The present authors owe much of their education in the calculus of variations to their teacher, Stefan Hildebrandt. In particular, the presentation of the material of Chapters 1 and 4 in Part I is influenced by his lectures that the authors attended as students. For example, the regularity arguments in Section 1.2 are taken directly from his lectures. For these reasons, and for his generous support of the authors over many years, and for his profound contributions to the subject, in particular to geometric variational problems, the authors dedicate this book to him.
Remarks on notation

A dot $\cdot$ denotes the Euclidean scalar product in $\mathbb{R}^d$, i.e. if $x = (x^1, \dots, x^d)$, $y = (y^1, \dots, y^d) \in \mathbb{R}^d$, then

$$x \cdot y = \sum_{i=1}^d x^i y^i,$$

and $|x|^2 = x \cdot x$.

In Part I, the independent variable is usually called $t$, because in many physical applications, it is interpreted as the time parameter. Here, the dependent variables are mostly called $u(t)$ or $x(t)$. In Part II, the independent variables are denoted by $x = (x^1, \dots, x^d)$, conforming to established conventions. We use the standard notation

$$C^k(\Omega)$$

for the space of $k$-times continuously differentiable functions on some open set $\Omega \subset \mathbb{R}^d$, for $k = 0$ (continuous functions), $1, 2, \dots, \infty$ (infinitely often differentiable functions). For vector valued functions, with values in $\mathbb{R}^d$, we write

$$C^k(\Omega, \mathbb{R}^d)$$

for the corresponding spaces. $C_0^k(\Omega)$ denotes the space of functions of class $C^k$ on $\Omega$ that vanish identically outside some compact subset $K \subset \Omega$ (where $K$ may depend on the function, of course). Occasionally, we also use the notation …

We use the symbol $:=$ to indicate that the expression on the left of this symbol is defined by the expression on the right of it.
Part one
One-dimensional variational problems

1 The classical theory
1.1 The Euler-Lagrange equations. Examples

The classical calculus of variations consists in minimizing expressions of the form

$$I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt,$$

where $F : [a,b] \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is given. One seeks a function $u : [a,b] \to \mathbb{R}^d$ minimizing $I$. More generally, one is also interested in other critical points of $I$. Usually, $u$ has to satisfy some constraints, the most common one being a Dirichlet boundary condition

$$u(a) = u_1, \qquad u(b) = u_2.$$

Also, one needs to specify a class of admissible functions among which one seeks a minimizing $u$. For example, one might want to take the class of continuously differentiable or piecewise continuously differentiable functions. Let us consider some examples of such variational problems:

(1) We want to minimize the arc-length of the graph of a function $u : [a,b] \to \mathbb{R}$, i.e. the length of the curve $(t, u(t)) \subset \mathbb{R}^2$ among all graphs with prescribed boundary values $u(a)$, $u(b)$. This leads to the variational problem

$$I(u) = \int_a^b \sqrt{1 + \dot u(t)^2}\,dt \to \min.$$

Of course, one knows and easily proves that the solution is the straight line between $u(a)$ and $u(b)$, i.e. satisfies $\ddot u(t) = 0$.
(2) Historically, the calculus of variations started with the so-called brachystochrone problem that was posed by Johann Bernoulli. Here, one wants to connect two points $(t_0, y_0)$ and $(t_1, y_1)$ in $\mathbb{R}^2$ by such a curve that a particle obeying Newton's law of gravitation and moving without friction travels the distance between those points in the fastest possible way. After falling the height $y$, the particle has speed $(2gy)^{1/2}$ where $g$ is the gravitational acceleration. The time the particle needs to traverse the path $y = u(t)$ then is

$$I(u) = \int_{t_0}^{t_1} \sqrt{\frac{1 + \dot u(t)^2}{2 g\, u(t)}}\,dt.$$

(3) More generally, one considers

$$I(u) = \int_a^b \frac{\sqrt{1 + \dot u(t)^2}}{\gamma(t, u(t))}\,dt,$$

where $\gamma : [a,b] \times \mathbb{R} \to \mathbb{R}$ is a given positive function. This variational problem also arises from Fermat's principle. That principle says that a light ray chooses the path that needs the shortest time to be traversed among all possible paths. If the speed of light in a given medium is $\gamma(t, u(t))$, we obtain the preceding variational problem.

If one seeks a minimum of a smooth function $f : \Omega \to \mathbb{R}$ ($\Omega$ open in $\mathbb{R}^d$), one knows that at a minimizing point $z_0 \in \Omega$, one necessarily has

$$Df(z_0) = 0,$$

where $Df$ is the derivative of $f$. The first variation of $f$ actually has to vanish at any stationary point, not only at minimizers. In order to distinguish a minimizer from other critical points, one has the additional necessary condition that the Hessian $D^2 f(z_0)$ is positive semidefinite and (at least for a local minimizer) the sufficient condition that it is positive definite.

In the present case, however, we do not have a function $f$ of finitely many independent real variables, but a functional $I$ on a class of functions. Nevertheless, we expect that a first derivative of $I$ (something still to be defined) needs to vanish at a minimizer, and moreover that a suitably defined second derivative is positive (semi)definite.
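The finite dimensional conditions just recalled can be illustrated by a small worked example. The function $f$ below is our own illustrative choice (it does not appear in the text); it shows that $Df(z_0) = 0$ alone does not single out minimizers.

```latex
% Illustrative example (our choice of f, not from the text), with d = 2.
\[
  f(z) = z_1^2 - z_2^2, \qquad
  Df(z) = (2z_1,\; -2z_2), \qquad
  D^2 f(z) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}.
\]
% Df(0) = 0, so z_0 = 0 is a critical point. The Hessian has the
% eigenvalues 2 and -2, hence it is not positive semidefinite, and indeed
\[
  f(0, z_2) = -z_2^2 < 0 = f(0) \quad \text{for } z_2 \neq 0,
\]
% so z_0 = 0 is a saddle point and not a minimizer.
```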
In order to investigate this more closely, we assume that $F$ is of class $C^1$ and that we have a minimizer or, more generally, a critical point of $I$ that also is $C^1$. We also assume prescribed Dirichlet boundary conditions $u(a) = u_1$, $u(b) = u_2$. In other words, we assume that $u$ minimizes $I$ in the class of all functions of class $C^1$ satisfying the prescribed boundary condition. We then have for any $\eta \in C_0^1([a,b], \mathbb{R}^d)$† and any $s \in \mathbb{R}$

$$I(u + s\eta) = \int_a^b F(t, u(t) + s\eta(t), \dot u(t) + s\dot\eta(t))\,dt.$$

Since $F$, $u$, and $\eta$ are assumed to be of class $C^1$, we may differentiate the preceding expression w.r.t. $s$ and obtain at $s = 0$

$$\frac{d}{ds} I(u + s\eta)\Big|_{s=0} = \int_a^b \{F_u(t, u(t), \dot u(t)) \cdot \eta(t) + F_p(t, u(t), \dot u(t)) \cdot \dot\eta(t)\}\,dt, \tag{1.1.1}$$

where $F_u$ is the vector of partial derivatives of $F$ w.r.t. the components of $u$, and $F_p$ the one w.r.t. the components of $\dot u(t)$. We now keep $\eta$ fixed and let $s$ vary. We are thus just in the situation of a real valued $f(s)$, $s \in \mathbb{R}$ ($f(s) = I(u + s\eta)$), and the condition $f'(0) = 0$ translates into

$$0 = \int_a^b \{F_u(t, u(t), \dot u(t)) \cdot \eta(t) + F_p(t, u(t), \dot u(t)) \cdot \dot\eta(t)\}\,dt, \tag{1.1.2}$$

and this actually then has to hold for all $\eta \in C_0^1$. We now assume that $F$ and $u$ are even of class $C^2$. Equation (1.1.2) may then be integrated by parts. Noting that we do not get a boundary term since $\eta(a) = 0 = \eta(b)$, we thus obtain

$$0 = \int_a^b \Big\{\Big(F_u(t, u(t), \dot u(t)) - \frac{d}{dt}\big(F_p(t, u(t), \dot u(t))\big)\Big) \cdot \eta(t)\Big\}\,dt \tag{1.1.3}$$

for all $\eta \in C_0^1([a,b], \mathbb{R}^d)$. In order to proceed, we need the so-called fundamental lemma of the calculus of variations:

† This means that $\eta$ is continuously differentiable as a function on $[a,b]$ with values in $\mathbb{R}^d$ and that there exist $a < a_1 < b_1 < b$ with $\eta(x) = 0$ if $x$ is not contained in $[a_1, b_1]$.
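Lemma 1.1.1. If $h \in C^0((a,b), \mathbb{R}^d)$ satisfies

$$\int_a^b h(t) \cdot \varphi(t)\,dt = 0 \quad \text{for all } \varphi \in C_0^\infty((a,b), \mathbb{R}^d),$$

then $h \equiv 0$ on $(a,b)$.

Proof. Otherwise, there exists some $t_0 \in (a,b)$ with

$$h(t_0) \neq 0.$$

Thus, $h^{i_0}(t_0) \neq 0$ for some index $i_0 \in \{1, \dots, d\}$. Since $h$ is continuous, there exists some $\delta > 0$ with

$$a < t_0 - \delta < t_0 + \delta < b$$

and

$$|h^{i_0}(t)| > \tfrac{1}{2}\,|h^{i_0}(t_0)| \quad \text{whenever } |t_0 - t| < \delta.$$

We then choose $\varphi \in C_0^\infty((a,b), \mathbb{R}^d)$ with

$$\varphi^{i_0}(t) = 0 \ \text{ if } |t_0 - t| \geq \delta, \qquad \varphi^{i_0}(t) > 0 \ \text{ if } |t_0 - t| < \delta, \qquad \varphi^{i}(t) = 0 \ \text{ for } i \neq i_0,\ i \in \{1, \dots, d\}.$$

For this choice of $\varphi$, however,

$$\int_a^b h(t) \cdot \varphi(t)\,dt = \int_{t_0 - \delta}^{t_0 + \delta} h^{i_0}(t)\,\varphi^{i_0}(t)\,dt \neq 0,$$

contradicting our assumption. Thus, necessarily $h(t_0) = 0$ for all $t_0 \in (a,b)$. q.e.d.

Lemma 1.1.1 and (1.1.3) imply that a minimizer of $I$ of class $C^2$ has to satisfy the so-called Euler-Lagrange equations, namely:

Theorem 1.1.1. Let $F \in C^2([a,b] \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$, and let $u \in C^2([a,b], \mathbb{R}^d)$ be a minimizer of

$$I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt$$

among all functions with prescribed boundary values $u(a)$ and $u(b)$. Then $u$ is a solution of the following system of second order ordinary differential equations, the Euler-Lagrange equations

$$\frac{d}{dt}\big(F_p(t, u(t), \dot u(t))\big) - F_u(t, u(t), \dot u(t)) = 0. \tag{1.1.4}$$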
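As a hedged illustration of how (1.1.4) is used, the following worked computation takes a simple integrand with $d = 1$. The choice $F(t,u,p) = \tfrac12 (p^2 - u^2)$ is our own assumption for the sake of the example and is not one of the examples of the text.

```latex
% Illustrative integrand (our assumption): d = 1,
\[
  F(t,u,p) = \tfrac12\,(p^2 - u^2), \qquad F_p = p, \qquad F_u = -u .
\]
% Inserting into the Euler-Lagrange equation (1.1.4):
\[
  \frac{d}{dt}\bigl(F_p(t,u(t),\dot u(t))\bigr) - F_u(t,u(t),\dot u(t))
  = \ddot u(t) + u(t) = 0 .
\]
% The general solution is u(t) = A \cos t + B \sin t; the two constants
% A, B are fixed by the boundary values u(a) and u(b) whenever
% \sin(b-a) \neq 0.
```

Note that (1.1.4) is only a necessary condition: the computation identifies candidates, and whether such a solution actually minimizes $I$ has to be checked separately.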
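Written out, (1.1.4) reads

$$F_{pp}(t, u(t), \dot u(t))\,\ddot u(t) + F_{pu}(t, u(t), \dot u(t))\,\dot u(t) + F_{pt}(t, u(t), \dot u(t)) - F_u(t, u(t), \dot u(t)) = 0, \tag{1.1.5}$$

i.e. a system of $d$ ordinary differential equations of second order that are linear in the second derivatives of the unknown function $u$.

Let us compute the Euler-Lagrange equations for our preceding three examples:

(1) Here $F_u = 0$, $F_p = \dfrac{\dot u}{\sqrt{1 + \dot u^2}}$, and (1.1.4) becomes

$$0 = \frac{d}{dt}\,\frac{\dot u(t)}{\sqrt{1 + \dot u(t)^2}} = \frac{\ddot u(t)}{(1 + \dot u(t)^2)^{3/2}},$$

i.e. $\ddot u(t) = 0$, meaning that $u$ has to be a straight line, a fact that we know of course.

(3) For the general example (3), we obtain as Euler-Lagrange equations

$$0 = \frac{d}{dt}\,\frac{\dot u(t)}{\gamma(t, u(t))\sqrt{1 + \dot u(t)^2}} + \frac{\gamma_u \sqrt{1 + \dot u(t)^2}}{\gamma^2} = \frac{\ddot u(t)}{\gamma\,(1 + \dot u(t)^2)^{3/2}} - \frac{\gamma_t\,\dot u(t)}{\gamma^2 \sqrt{1 + \dot u(t)^2}} + \frac{\gamma_u}{\gamma^2 \sqrt{1 + \dot u(t)^2}}, \tag{1.1.6}$$

where $\gamma$, $\gamma_t$ and $\gamma_u$ are evaluated at $(t, u(t))$.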
Actually, (2) is an example of an integrand $F(t, u, \dot u)$ that does not depend explicitly on $t$, i.e. $F_t = 0$. In this case

$$\frac{d}{dt}\big(F - \dot u F_p\big) = \dot u\Big(F_u - \frac{d}{dt} F_p\Big) = 0 \quad \text{by (1.1.4)},$$

and hence every solution of the Euler-Lagrange equation (1.1.4) satisfies

$$F(t, u(t), \dot u(t)) - \dot u(t)\,F_p(t, u(t), \dot u(t)) \equiv \text{constant}. \tag{1.1.7}$$
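The conservation law (1.1.7) can be checked by hand on a simple integrand without explicit $t$-dependence. The integrand below is again our own illustrative assumption ($d = 1$), not an example from the text.

```latex
% Illustrative integrand (our assumption) with F_t = 0:
\[
  F(u,p) = \tfrac12\,(p^2 - u^2), \qquad F_p = p, \qquad F_u = -u .
\]
% The conserved quantity of (1.1.7):
\[
  F - \dot u\,F_p = \tfrac12(\dot u^2 - u^2) - \dot u^2
                  = -\tfrac12\,(\dot u^2 + u^2).
\]
% Direct verification, using the Euler-Lagrange equation \ddot u + u = 0:
\[
  \frac{d}{dt}\Bigl(-\tfrac12(\dot u^2 + u^2)\Bigr)
  = -\dot u\,(\ddot u + u) = 0 .
\]
% This agrees with \dot u\,(F_u - \tfrac{d}{dt}F_p) = \dot u\,(-u - \ddot u).
```

Up to sign, the conserved quantity is the energy $\tfrac12(\dot u^2 + u^2)$ of a harmonic oscillator.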
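Conversely, every solution of (1.1.7), with the exception of $\dot u \equiv 0$, i.e. $u \equiv$ constant, also satisfies (1.1.4). In the case of example (2), we have $F_t = 0$, and (1.1.7) becomes

$$\frac{1}{2 g\, u(t)} = \lambda^2\,(1 + \dot u(t)^2),$$

if we denote the constant in (1.1.7) by $\lambda$.

In all examples (1)-(3), we actually had $d = 1$. If one modifies e.g. (1) and seeks a curve $g(t) = (g_1(t), \dots, g_d(t)) \subset \mathbb{R}^d$ connecting two given points $g(a)$ and $g(b)$, our variational problem becomes

$$I(g) = \int_a^b |\dot g(t)|\,dt = \int_a^b \Big(\sum_{i=1}^d \dot g_i(t)^2\Big)^{1/2} dt \to \min.$$

The Euler-Lagrange equations are

$$0 = \frac{d}{dt}\,\frac{\dot g_i(t)}{\big(\sum_{j=1}^d \dot g_j(t)^2\big)^{1/2}} = \frac{\ddot g_i \sum_{j=1}^d (\dot g_j)^2 - \dot g_i \sum_{j=1}^d \dot g_j \ddot g_j}{\big(\sum_{j=1}^d \dot g_j^2\big)^{3/2}} \tag{1.1.8}$$

for $i = 1, \dots, d$. We now recall that any smooth curve $g(t) \subset \mathbb{R}^d$ may be parameterized by arc-length, i.e.

$$|\dot g(t)| \equiv 1. \tag{1.1.9}$$

We also know that a reparameterization of a curve $g(t)$ does not change its arc-length $I(g)$. Consequently, we may assume (1.1.9) in (1.1.8). The latter then becomes

$$0 = \ddot g_i(t) \quad \text{for } i = 1, \dots, d,$$

so that $g$ is a straight line.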
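The step from the arc-length normalization (1.1.9) to the simplified form of the Euler-Lagrange equations (1.1.8) uses one short computation, which may be spelled out as follows (a sketch in the notation of the text, with $g = (g_1, \dots, g_d)$):

```latex
% Differentiating the normalization |\dot g(t)|^2 = 1 of (1.1.9) w.r.t. t:
\[
  0 = \frac{d}{dt}\sum_{j=1}^d \dot g_j(t)^2
    = 2\sum_{j=1}^d \dot g_j(t)\,\ddot g_j(t).
\]
% Hence, in the numerator of the Euler-Lagrange equations (1.1.8),
\[
  \ddot g_i \sum_{j=1}^d (\dot g_j)^2 - \dot g_i \sum_{j=1}^d \dot g_j\,\ddot g_j
  = \ddot g_i \cdot 1 - \dot g_i \cdot 0 = \ddot g_i ,
\]
% while the denominator equals 1, so (1.1.8) reduces to \ddot g_i = 0,
% i.e. g(t) = g(a) + (t-a)\,v with a constant unit vector v.
```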
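Let us finally consider the problem of minimizing

$$I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt$$

subject to the constraint

$$\int_a^b G(t, u(t), \dot u(t))\,dt = c \quad \text{(a given constant)}. \tag{1.1.10}$$

As in the case of finite dimensional minimization problems, one then finds a Lagrange multiplier $\lambda$ with

$$0 = \frac{d}{ds}\Big(\int_a^b F(t, u + s\eta, \dot u + s\dot\eta)\,dt + \lambda \int_a^b G(t, u + s\eta, \dot u + s\dot\eta)\,dt\Big)\Big|_{s=0} \tag{1.1.11}$$

for all $\eta \in C_0^1([a,b], \mathbb{R}^d)$, which leads to the Euler-Lagrange equations

$$\frac{d}{dt}\big(F_p(t, u(t), \dot u(t)) + \lambda\,G_p(t, u(t), \dot u(t))\big) - \big(F_u(t, u(t), \dot u(t)) + \lambda\,G_u(t, u(t), \dot u(t))\big) = 0. \tag{1.1.12}$$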
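For example, for

$$I(u) = \int_a^b \dot u(t)^2\,dt \tag{1.1.13}$$

with the constraint (1.1.10) for $G(t, u, p) = u^2$, (1.1.12) becomes

$$\ddot u(t) - \lambda\,u(t) = 0. \tag{1.1.14}$$

Thus, $\lambda$ is an eigenvalue for the differential operator $d^2/dt^2$ under the Dirichlet boundary conditions $u(a) = 0 = u(b)$. Of course, this example can easily be generalized.

Summary. We seek solutions of the variational problem

$$I(u) \to \min, \quad \text{with } I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt$$

for given $F$ and unknown $u : [a,b] \to \mathbb{R}^d$. If $F$ and $u$ are differentiable, one may consider some kind of partial derivative, namely

$$\delta I(u, \eta) := \frac{d}{ds} I(u + s\eta)\Big|_{s=0} \quad \text{for } \eta \in C_0^1([a,b], \mathbb{R}^d).$$

For a minimizer $u$ then $\delta I(u, \eta) = 0$ for all such $\eta$. If $F$ and $u$ are of class $C^2$, this implies the Euler-Lagrange equations

$$\frac{d}{dt} F_p(t, u(t), \dot u(t)) - F_u(t, u(t), \dot u(t)) = 0.$$

The classical strategy for solving the problem $I(u) \to \min$ consists in solving the Euler-Lagrange equations and then investigating whether a solution of the equations is a minimum of $I$ or not.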
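The eigenvalue problem for $d^2/dt^2$ under the Dirichlet boundary conditions $u(a) = 0 = u(b)$, mentioned at the end of the constrained example above, can be solved explicitly. The sketch below assumes the sign convention $\ddot u = \lambda u$; with the opposite convention the eigenvalues simply change sign.

```latex
% Eigenvalue problem (sign convention \ddot u = \lambda u is our assumption):
\[
  \ddot u(t) = \lambda\,u(t), \qquad u(a) = 0 = u(b).
\]
% For \lambda > 0, solutions vanishing at a are multiples of
% \sinh(\sqrt{\lambda}\,(t-a)), which has no further zero; for \lambda = 0
% they are multiples of (t-a). In both cases u(b) = 0 forces u \equiv 0.
% For \lambda = -\mu^2 < 0, solutions vanishing at a are multiples of
% \sin(\mu (t-a)), and u(b) = 0 requires \mu (b-a) = k\pi. Hence
\[
  \lambda_k = -\Bigl(\frac{k\pi}{b-a}\Bigr)^{2}, \qquad
  u_k(t) = \sin\Bigl(\frac{k\pi\,(t-a)}{b-a}\Bigr), \qquad k = 1, 2, 3, \dots
\]
```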
1.2 The idea of the direct methods and some regularity results

So far, our formulation of the variational problem

$$I(u) \to \min$$

has been rather vague, because we did not specify in which class of functions $u$ we are trying to minimize $I$. The only things we did prescribe were boundary conditions of Dirichlet type, i.e. we prescribed the values $u(a)$ and $u(b)$ for our functions $u : [a,b] \to \mathbb{R}^d$.

Because of our derivation of the Euler-Lagrange equations in the preceding section, it would be desirable to have a solution $u$ of class $C^2$. So one might want to specify in advance that one minimizes $I$ only among functions of class $C^2$. This, however, directly leads to the question whether $I$ achieves its infimum among functions of class $C^2$ (with prescribed Dirichlet boundary conditions, as always) or not, and if it does, whether the infimum of $I$ in some larger class of functions, say $C^1$, could be strictly smaller than the one in $C^2$. In the light of this question, it might be preferable to minimize $I$ in the class of all functions $u$ for which

$$I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt$$

is meaningful. Here, we assume that $F(t, u, p)$ is continuous in $u$ and $p$ and measurable in $t$. For this purpose, one needs the class of functions for which the derivative $\dot u(t)$ exists almost everywhere and is finite. This is the class $AC([a,b])$ of absolutely continuous functions. A function $u \in AC([a,b])$ satisfies for $t_1, t_2 \in [a,b]$

$$u(t_2) - u(t_1) = \int_{t_1}^{t_2} \dot u(t)\,dt.$$

Note that $F(t, u(t), \dot u(t))$ is a measurable function of $t$ for $u \in AC$ by our assumptions on $F$ and the fact that the composition of a measurable and a continuous function is measurable†. The idea of the direct methods in the calculus of variations, as opposed to the classical methods described in the preceding section, then consists in minimizing $I$ in a class of functions like $AC([a,b])$ and then trying to show that a solution $u$, because of its minimizing character, actually enjoys better regularity properties, for example to be of class $C^2$, provided $F$ satisfies suitable assumptions.

This minimizing procedure will be treated later‡, since we want to return to the classical theory for a while. Nevertheless, even for the classical theory, one occasionally needs certain regularity results, and therefore we now briefly address the regularity theory. To simplify our notation, we put $I := [a,b]$.§ A class of functions intermediate between $C^1$ and $AC$ is

$$D^1(I, \mathbb{R}^d) := \{u : I \to \mathbb{R}^d,\ u \text{ continuous and piecewise continuously differentiable, i.e. there exist } a = t_0 < t_1 < \dots < t_m = b \text{ with } u \in C^1([t_j, t_{j+1}], \mathbb{R}^d) \text{ for } j = 0, \dots, m-1\}.$$

$u \in D^1$ then has left and right derivatives $\dot u^-(t_j)$ and $\dot u^+(t_j)$ even at the points where the derivative is discontinuous.

† Lebesgue integration theory is summarized in Chapter 1 of Part II. The required composition property is stated there as Theorem 1.1.2. Here, this point will not be pursued or used any further.
‡ See Chapter 4 of Part II.
§ We shall use the same letter $I$ to denote the functional to be minimized and the domain of definition of the functions inserted into this functional. This conforms to standard notations. The reader should be aware of this and not be confused.
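The function classes just introduced can be illustrated on the simplest function with a corner. The following sketch checks the defining properties for $u(t) = |t|$ on $[-1,1]$ (with the single interior break point $t_1 = 0$):

```latex
% u(t) = |t| on [a,b] = [-1,1], with partition -1 = t_0 < t_1 = 0 < t_2 = 1.
\[
  \dot u(t) = \begin{cases} -1 & \text{for } -1 \le t < 0,\\
                            \phantom{-}1 & \text{for } 0 < t \le 1, \end{cases}
  \qquad \dot u^-(0) = -1, \qquad \dot u^+(0) = 1 .
\]
% u is continuous and C^1 on [-1,0] and on [0,1] (with one-sided
% derivatives at 0), hence piecewise continuously differentiable,
% but not C^1 on [-1,1].
% The absolute continuity identity, e.g. for t_1 = -1/2 and t_2 = 1:
\[
  \int_{-1/2}^{1} \dot u(t)\,dt = -\tfrac12 + 1 = \tfrac12
  = |1| - \bigl|-\tfrac12\bigr| = u(1) - u(-\tfrac12).
\]
```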
12
t i ( - l ) = 1 = u(l). A minimizer is ti(0 = | t | D ( / , R ) which is not of class C . The minimizer of / is not unique (exercise: determine all minimizers), but none of them is of class C .
1 1 1
f0 = \t
1
which again is of class D , but not C . Example 1.2.3. [a, 6] = [ - 1 , 1 ] , d = 1 J(u) = ^ u(-l) = 0 The unique minimizer is , .
=
f0
which is of class C , but not of class C . T h e o r e m 1.2.1. Let F(t,u,p) be of class C w.r.t. u and p and con tinuous w.r.t t ( F : J x R x R -+ R ) , and let u G AC(I,R ) be a solution of
d d d 1
6I(u,rj)
= J {F (t,u,u)-rj
u
+ F (t,u,u)-f)}dt
p
=0
(1.2.1)
Ja
13
for all rj G AC (I,R ) (i.e. rj G AC(I,R )) and we require that if I = [a, 6], there exist a < a\ <b\ < b with rj(x) 0 if x is not contained in [ai,&i], as in the definition of C$([a, 6], R )). Then for almost all points in I
d
j F (t,
t p
(1.2.2)
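(note, however, that the derivative on the left hand side cannot be computed by the chain rule). If $u \in C^1(I,\mathbb{R}^d)$, (1.2.2) holds for all $t \in I$, and if $u \in D^1(I,\mathbb{R}^d)$, then at those points $t_j$ where $\dot u$ is discontinuous, (1.2.2) holds for the left and right derivatives, i.e.
$$\frac{d}{dt^\pm}F_p(t_j,u(t_j),\dot u^\pm(t_j)) = F_u(t_j,u(t_j),\dot u^\pm(t_j)),$$
where $\frac{d}{dt^-}$ and $\frac{d}{dt^+}$ denote the left and right derivative.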
Remark. It actually suffices to assume (1.2.1) for all $\eta \in C_0^\infty(I,\mathbb{R}^d)$, because functions in $AC_0$ may be approximated by $C_0^\infty$ functions. If $u \in C^1$ or $D^1$, the proof anyway only requires (1.2.1) for $\eta \in C_0^1$ or $D_0^1$, respectively (where $D_0^1$ is defined analogously to $C_0^1$).
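Proof. We have, putting $F_u = F_u(t,u(t),\dot u(t))$ etc.,
$$\int_a^b F_u\cdot\eta\,dt = \int_a^b \frac{d}{dt}\Big(\int_a^t F_u\,dy\Big)\cdot\eta\,dt = -\int_a^b \Big(\int_a^t F_u\,dy\Big)\cdot\dot\eta\,dt,$$
since $\eta(a) = 0 = \eta(b)$. Hence (1.2.1) becomes
$$0 = \int_a^b \Big(F_p - \int_a^t F_u\,dy\Big)\cdot\dot\eta\,dt.$$
We now make use of:
Lemma 1.2.1. Let $h \in L^1(I,\mathbb{R})$ satisfy
$$\int_a^b h(t)\,\dot\varphi(t)\,dt = 0 \quad\text{for all } \varphi \in AC_0(I,\mathbb{R}). \tag{1.2.3}$$
Then there exists $c \in \mathbb{R}$ with $h(t) = c$ for almost all $t \in I$.
Remark. It actually suffices to assume (1.2.3) for all $\varphi \in C_0^\infty(I,\mathbb{R})$. If $h \in C^0$, one directly sees from the proof that $\varphi \in C_0^1$ suffices.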
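Proof. We put
$$c := \frac{1}{b-a}\int_a^b h(t)\,dt$$
and
$$\varphi(t) := \int_a^t (h(y) - c)\,dy.$$
Then $\varphi(a) = 0 = \varphi(b)$, and hence
$$\int_a^b \dot\varphi(t)\,dt = 0. \tag{1.2.4}$$
Inserting this $\varphi$ into (1.2.3), we obtain
$$0 = \int_a^b h(t)\,\dot\varphi(t)\,dt = \int_a^b (h(t) - c)\,\dot\varphi(t)\,dt = \int_a^b |h(y) - c|^2\,dy$$
because of (1.2.4). This implies the claim. q.e.d.
We now may complete the proof of Theorem 1.2.1: By Lemma 1.2.1 there exists $c \in \mathbb{R}^d$ with
$$F_p(t,u(t),\dot u(t)) = \int_a^t F_u(y,u(y),\dot u(y))\,dy + c \tag{1.2.5}$$
for almost all $t \in I$. Therefore, $F_p$ is of class $AC$, and differentiating (1.2.5) gives (1.2.2). The claims for $u \in C^1$ or $D^1$ are obvious from the proof. q.e.d.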
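The integrated form (1.2.5) of the Euler-Lagrange equations can be checked numerically. The following is only a minimal sketch under assumptions that are not taken from the text: the integrand $F(t,u,p) = \frac12 p^2 + \frac12 u^2$, the interval $[0,1]$ and the solution $u(t) = \cosh t$ are illustrative choices, and the function name is hypothetical. The sketch evaluates $F_p - \int_a^t F_u\,dy$ on a grid; by (1.2.5) this should be (approximately) a constant $c$, which for this example is $0$.

```python
import math


def integrated_euler_lagrange_residual(n=2000):
    """Return (min, max) over the grid of F_p - int_a^t F_u dy.

    Illustrative assumption: F(t, u, p) = p^2/2 + u^2/2 on [0, 1],
    so F_p = p and F_u = u, and u(t) = cosh(t) solves u'' = u.
    """
    a, b = 0.0, 1.0
    h = (b - a) / n
    ts = [a + i * h for i in range(n + 1)]
    f_u = [math.cosh(t) for t in ts]  # F_u(t, u(t), u'(t)) = u(t)
    f_p = [math.sinh(t) for t in ts]  # F_p(t, u(t), u'(t)) = u'(t)

    # Cumulative trapezoidal approximation of int_a^t F_u dy.
    integral = [0.0]
    for i in range(1, n + 1):
        integral.append(integral[-1] + 0.5 * h * (f_u[i - 1] + f_u[i]))

    # By (1.2.5) this difference should be the same constant c for all t.
    c_values = [f_p[i] - integral[i] for i in range(n + 1)]
    return min(c_values), max(c_values)


if __name__ == "__main__":
    print(integrated_euler_lagrange_residual())
```

Both returned values stay close to zero up to the quadrature error, in accordance with (1.2.5).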
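Theorem 1.2.2. Let $F : I \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ be of class $C^1$, and let $F_p$ be also of class $C^1$, and let $\det(F_{p^ip^j}(t,u(t),\dot u(t))) \ne 0$ for all $t \in I$ and a solution $u \in C^1(I,\mathbb{R}^d)$ of
$$\delta I(u,\eta) = 0 \quad\text{for all } \eta \in C_0^1(I,\mathbb{R}^d).$$
Then $u \in C^2(I,\mathbb{R}^d)$.
Proof. We consider $\phi : I \times \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ via
$$\phi(t,u,p,q) := F_p(t,u,p) - q.$$
Our assumption $\det F_{pp} \ne 0$ makes it possible to apply the implicit function theorem to conclude that $\phi(t,u,p,q) = 0$ may be uniquely solved w.r.t. $p$ near $u_0 = u(t_0)$, $p_0 = \dot u(t_0)$, $q_0 = F_p(t_0,u_0,p_0)$ for any $t_0 \in I$. Thus, there exists a neighbourhood $U$ of $(t_0,u_0,q_0)$ such that for each $(t,u,q) \in U$, $\phi = 0$ has a unique solution $p = \varphi(t,u,q)$, and $\varphi : U \to \mathbb{R}^d$ is of class $C^1$. Since we already know a solution of $\phi = 0$, namely $(t,u(t),\dot u(t),F_p(t,u(t),\dot u(t)))$, the uniqueness of the solution $\varphi$ implies
$$\dot u(t) = \varphi(t,u(t),F_p(t,u(t),\dot u(t))) \quad\text{for } t \text{ near } t_0.$$
Since $F_p(t,u(t),\dot u(t))$ is of class $C^1$ as a function of $t$ by Theorem 1.2.1, the right hand side is of class $C^1$ in $t$. Hence $\dot u \in C^1$, i.e. $u \in C^2$. q.e.d.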
T h e o r e m 1.2.3. Let F satisfy the assumptions of Theorem. 1.2.2, and in addition assume that F where ft C E
d + 1 pp
satisfy
Proof. Since the uniqueness result of the implicit function theorem is only local, it cannot be applied anymore because $\dot u(t)$ might be discontinuous. We thus need a global argument. Thus, assume that for given $(t,u,q) \in \Omega \times \mathbb{R}^d$, there are two solutions $p_1, p_2 \in \mathbb{R}^d$ of $\phi(t,u,p,q) = 0$, i.e.
$$q = F_p(t,u,p_1) \quad\text{and}\quad q = F_p(t,u,p_2).$$
Thus
$$\int_0^1 F_{pp}(t,u,p_1 + s(p_2 - p_1))\,ds\,(p_2 - p_1) = 0. \tag{1.2.6}$$
By our assumption on $F_{pp}$, the matrix in (1.2.6) is invertible, hence $p_2 = p_1$, hence uniqueness. Using this global uniqueness together with the existence result of the implicit function theorem, we now see that for any $(t,u,q)$ in a sufficiently small neighbourhood of $(t_0,u_0,q_0)$ ($t_0 \in I$, $u_0 = u(t_0)$, $q_0 = F_p(t_0,u_0,p_0)$, $p_0 = \dot u(t_0)$), there is a unique solution $\varphi(t,u,q)$ of
$$F_p(t,u,p) - q = 0,$$
and $\varphi$ is of class $C^1$. Thus, as in the proof of Theorem 1.2.2,
$$\dot u(t) = \varphi(t,u(t),F_p(t,u(t),\dot u(t)))$$
for almost all $t$ in a neighbourhood of $t_0$. Since $u(t)$ and $F_p(t,u(t),\dot u(t))$ are absolutely continuous w.r.t. $t$ (the latter by Theorem 1.2.1), $\dot u(t)$ coincides for almost all $t$ near $t_0$ with an absolutely continuous function $v(t)$. We put
$$w(t) := u(t_0) + \int_{t_0}^t v(s)\,ds.$$
Since $v = \dot u$ almost everywhere, we conclude $u = w$, hence $u \in C^1$ near $t_0$, which was arbitrary in $I$. Theorem 1.2.2 then gives $u \in C^2$. q.e.d.
Corollary 1.2.1. Under the assumptions of Theorem 1.2.3, any $AC$-solution of $\delta I(u,\eta) = 0$ for all $\eta \in AC_0(I,\mathbb{R}^d)$ is a solution of the Euler-Lagrange equations
$$\frac{d}{dt}F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = 0 \tag{1.2.7}$$
or equivalently of
$$F_{pp}(t,u(t),\dot u(t))\,\ddot u(t) + F_{pu}(t,u(t),\dot u(t))\,\dot u(t) + F_{pt}(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = 0. \tag{1.2.8}$$
The same holds under the assumptions of Theorem 1.2.2 for a $C^1$-solution of $\delta I(u,\eta) = 0$ for all $\eta \in C_0^1(I,\mathbb{R}^d)$. q.e.d.
Theorem 1.2.4. Let $F : I \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ be of class $C^k$, and let $F_p$ also be of class $C^k$, $k \in \{2,3,\dots,\infty\}$. Suppose $u$ is of class $C^1$ and a solution of $\delta I(u,\eta) = 0$ for all $\eta \in C_0^1(I,\mathbb{R}^d)$, and suppose
$$\det\big(F_{p^ip^j}(t,u(t),\dot u(t))\big) \ne 0 \quad\text{for all } t \in I. \tag{1.2.9}$$
Then $u \in C^{k+1}(I,\mathbb{R}^d)$. (The same result holds if we assume that $u \in C^2$ is a solution of the Euler-Lagrange equations (1.2.8).)
Proof. By Theorem 1.2.2, $u \in C^2$, and hence $u$ solves (1.2.8). Because of (1.2.9), $F_{pp}(t,u(t),\dot u(t))$ is an invertible matrix, hence
$$\ddot u(t) = F_{pp}(t,u(t),\dot u(t))^{-1}\{-F_{pu}(t,u(t),\dot u(t))\,\dot u(t) - F_{pt}(t,u(t),\dot u(t)) + F_u(t,u(t),\dot u(t))\}. \tag{1.2.10}$$
Let now $j \le k$, and suppose inductively $u \in C^j$. The right hand side of (1.2.10) then is of class $C^{j-1}$. Therefore, $\ddot u$ is of class $C^{j-1}$, hence $u$ is of class $C^{j+1}$. q.e.d.
The preceding proof most clearly shows the importance of the assumption $\det(F_{p^ip^j}(t,u(t),\dot u(t))) \ne 0$ that already occurred in the proof of Theorem 1.2.2. Namely, it implies that the Euler-Lagrange equations (1.2.8) can be solved for $\ddot u$ in terms of $u$ and $\dot u$.
Corollary 1.2.2. If under the assumption of Theorem 1.2.3, $F$ and $F_p$ are of class $C^k$, then a solution $u$ of $\delta I(u,\eta) = 0$ for all $\eta \in AC_0$ is of class $C^{k+1}$. q.e.d.
Summary. If one wants to solve $I(u) \to \min$ by a direct minimization procedure, it is preferable to admit a class of comparison functions $u$ that is as large as possible. $AC(I,\mathbb{R}^d)$ seems to be a good choice, because this is the largest class for which
$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt$$
is well defined, assuming $F(t,u,p)$ to be continuous in $u$ and $p$ and measurable in $t$. However, if one then finds a minimizer $u$, it might not be a solution of the Euler-Lagrange equations, because it is not regular enough. If the invertibility condition $\det F_{pp} \ne 0$ is satisfied, however, one may show that a minimizer $u$ is as regular as $F$ allows. Namely, if $F$ and $F_p$ are of class $C^k$, $k \in \{1,2,\dots,\infty\}$, then $u$ is of class $C^{k+1}$. Examples show that without such an invertibility condition, regularity need not hold. This invertibility condition $\det F_{pp} \ne 0$ implies that the Euler-Lagrange equations allow the expression of $\ddot u(t)$ in terms of $u(t)$ and $\dot u(t)$.
$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt, \tag{1.3.1}$$
$$f(s) = I(u + s\eta).$$
If we want to decide if a given solution $u$ minimizes $I$ instead of just being a critical point, we immediately see that a necessary condition would be
$$f''(0) \ge 0 \tag{1.3.2}$$
for the above function $f$ and all $\eta \in D_0^1(I,\mathbb{R}^d)$. Namely, by Taylor's theorem, since $f'(0) = 0$,
$$f(s) - f(0) = \tfrac12 s^2 f''(0) + o(s^2) \quad\text{for } s \to 0.$$
More precisely, (1.3.2) is needed for $u$ to minimize $I$ when compared with $u + s\eta$ for sufficiently small $s$. In other words, we want $u$ to minimize $I$ in a $D^1$-neighbourhood of itself, i.e. among functions $v \in D^1(I,\mathbb{R}^d)$ with $u(a) = v(a)$, $u(b) = v(b)$ and
$$\sup_{t \in I}|u(t) - v(t)| < \varepsilon, \tag{1.3.3}$$
$$\sup_{t \in I}\big(|\dot u^-(t) - \dot v^-(t)| + |\dot u^+(t) - \dot v^+(t)|\big) < \varepsilon \tag{1.3.4}$$
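for some $\varepsilon > 0$. (Note: It is not clear that $\varepsilon$ may be chosen independently of $v$.) We define the second variation of $I$ at $u$ in the direction $\eta \in D_0^1$ as
$$\delta^2 I(u,\eta) := \frac{d^2}{ds^2}I(u + s\eta)\Big|_{s=0}.$$
In order that this variation exists, we require for the rest of the section that $F$ is of class $C^2$. We then compute
$$\delta^2 I(u,\eta) = \int_a^b \{F_{p^ip^j}(t,u(t),\dot u(t))\,\dot\eta_i(t)\dot\eta_j(t) + 2F_{p^iu^j}(t,u(t),\dot u(t))\,\dot\eta_i(t)\eta_j(t) + F_{u^iu^j}(t,u(t),\dot u(t))\,\eta_i(t)\eta_j(t)\}\,dt. \tag{1.3.5}$$
Here, and in the sequel, we employ the standard summation conventions, e.g.
$$F_{p^ip^j}\dot\eta_i\dot\eta_j = \sum_{i,j=1}^d F_{p^ip^j}\dot\eta_i\dot\eta_j.$$
In abbreviated form, (1.3.5) reads
$$\delta^2 I(u,\eta) = \int_a^b \{F_{pp}\,\dot\eta\dot\eta + 2F_{pu}\,\dot\eta\eta + F_{uu}\,\eta\eta\}\,dt. \tag{1.3.6}$$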
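Our preceding considerations imply:
Theorem 1.3.1. Suppose $F \in C^2(I \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$ and let $u \in D^1(I,\mathbb{R}^d)$ satisfy $I(u) \le I(v)$ for all $v$ with (1.3.3), (1.3.4). Then
$$\delta^2 I(u,\eta) \ge 0 \quad\text{for all } \eta \in D_0^1(I,\mathbb{R}^d). \tag{1.3.7}$$
We now put, for given $u$,
$$\phi(t,\eta,\pi) := F_{p^ip^j}(t,u(t),\dot u(t))\,\pi_i\pi_j + 2F_{p^iu^j}(t,u(t),\dot u(t))\,\pi_i\eta_j + F_{u^iu^j}(t,u(t),\dot u(t))\,\eta_i\eta_j,$$
and we define the accessory variational problem for $I(u) \to \min$ as
$$Q(\eta) := \int_a^b \phi(t,\eta(t),\dot\eta(t))\,dt \to \min \quad\text{among all } \eta \in D_0^1(I,\mathbb{R}^d). \tag{1.3.8}$$
If $u$ satisfies the assumptions of Theorem 1.3.1, then
$$Q(\eta) \ge 0 \quad\text{for all } \eta \in D_0^1, \tag{1.3.9}$$
and hence $\eta \equiv 0$ is a trivial solution of (1.3.8). We are interested in the question whether there are others. The Euler-Lagrange equations for (1.3.8) are
$$\frac{d}{dt}\phi_\pi(t,\eta(t),\dot\eta(t)) = \phi_\eta(t,\eta(t),\dot\eta(t)), \tag{1.3.10}$$
i.e.
$$\frac{d}{dt}\big(F_{pp}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\,\eta(t)\big) = F_{pu}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{uu}(t,u(t),\dot u(t))\,\eta(t). \tag{1.3.11}$$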
Since $u$ is considered as given, our first observation is that (1.3.11) is a linear homogeneous system of second order equations for the unknown $\eta$. These equations are called Jacobi equations.
Definition 1.3.1. A solution $\eta \in C^2(I,\mathbb{R}^d)$ of the Jacobi equations (1.3.11) is called a Jacobi field along $u(t)$.
Lemma 1.3.1. Let $F \in C^3(I \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$, $\det F_{pp}(t,u(t),\dot u(t)) \ne 0$ for all $t \in I$, $u \in C^2(I,\mathbb{R}^d)$. Then any solution $\eta \in AC(I,\mathbb{R}^d)$ of $\delta Q(\eta,\varphi) = 0$ for all $\varphi \in AC_0(I,\mathbb{R}^d)$ is of class $C^2$ and hence a Jacobi field.
Proof. We apply Theorem 1.2.3. For that purpose, we note that
$$\phi_{\pi\pi}(t,\eta(t),\dot\eta(t)) = 2F_{pp}(t,u(t),\dot u(t)),$$
and so the assumption $\det F_{pp}(t,u(t),\dot u(t)) \ne 0$, that is seemingly weaker than the one of Theorem 1.2.3, indeed suffices to apply that Theorem. q.e.d.
We now derive the so-called necessary Legendre condition:
Theorem 1.3.2. Under the assumption of Theorem 1.3.1, i.e. $u \in D^1(I,\mathbb{R}^d)$ minimizes $I$ in the sense described there, we have that
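$$F_{pp}(t,u(t),\dot u(t)) \text{ is positive semidefinite for all } t \in I,$$
i.e.
$$F_{p^ip^j}(t,u(t),\dot u(t))\,\xi_i\xi_j \ge 0 \quad\text{for all } \xi \in \mathbb{R}^d.$$
(At points where $\dot u(t)$ is discontinuous, this holds for the left and right derivatives.)
Proof. We may assume that $t_0 \in (a,b)$ and $\dot u$ is continuous at $t_0$. The result at the points where $\dot u$ jumps then follows by taking appropriate limits, and likewise at $t_0 = a, b$. We then consider $0 < \varepsilon < \min(t_0 - a, b - t_0)$ and, for given $\xi \in \mathbb{R}^d$, define $\eta \in D_0^1(I,\mathbb{R}^d)$ by
$$\eta(t) := \begin{cases} 0 & \text{for } a \le t \le t_0 - \varepsilon \text{ or } t_0 + \varepsilon \le t \le b,\\ (t - t_0 + \varepsilon)\,\xi & \text{for } t_0 - \varepsilon \le t \le t_0,\\ (t_0 + \varepsilon - t)\,\xi & \text{for } t_0 \le t \le t_0 + \varepsilon, \end{cases}$$
i.e. $\eta(t_0) = \varepsilon\xi$, and $\eta$ is linear on $[t_0 - \varepsilon, t_0]$ and on $[t_0, t_0 + \varepsilon]$. Then
$$0 \le \delta^2 I(u,\eta) = \int_{t_0-\varepsilon}^{t_0+\varepsilon} F_{p^ip^j}(t,u(t),\dot u(t))\,\xi_i\xi_j\,dt + O(\varepsilon^2) \quad\text{for } \varepsilon \to 0,$$
since all other terms contain a factor $\varepsilon$, and we integrate over an interval of length $2\varepsilon$. Hence
$$F_{p^ip^j}(t_0,u(t_0),\dot u(t_0))\,\xi_i\xi_j = \lim_{\varepsilon \to 0}\frac{1}{2\varepsilon}\int_{t_0-\varepsilon}^{t_0+\varepsilon} F_{p^ip^j}(t,u(t),\dot u(t))\,\xi_i\xi_j\,dt \ge 0. \quad\text{q.e.d.}$$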
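The Jacobi equations and the notion of Jacobi fields are meaningful for arbitrary solutions of the Euler-Lagrange equations, not only for minimizing ones. In fact, Jacobi fields are solutions of the linearized Euler-Lagrange equations. Namely:
Theorem 1.3.3. Let $F \in C^3(I \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$, and let $u_s(t)$ be a family of $C^2$-solutions of the Euler-Lagrange equations
$$\frac{d}{dt}F_p(t,u_s(t),\dot u_s(t)) - F_u(t,u_s(t),\dot u_s(t)) = 0, \tag{1.3.12}$$
with $u_s$ depending differentiably on the parameter $s \in (-\varepsilon,\varepsilon)$. Then
$$\eta(t) := \frac{\partial}{\partial s}u_s(t)\Big|_{s=0}$$
is a Jacobi field along $u = u_0$.
Proof. We differentiate (1.3.12) w.r.t. $s$ at $s = 0$ to obtain
$$\frac{d}{dt}\big(F_{pp}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\,\eta(t)\big) - F_{pu}(t,u(t),\dot u(t))\,\dot\eta(t) - F_{uu}(t,u(t),\dot u(t))\,\eta(t) = 0. \quad\text{q.e.d.}$$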
L e m m a 1.3.2. Let a < a\ < a <b, and let F and F be of class C in [ a i , a ] , and suppose r\ G C ([a\,a ],R ) is a Jacobi field on [ a i , a ] with r](ai) = 0 = r)(a ). Then
l d 2 2 2 2
<p(t,r)(t),r)(t))dt
= 0.
(1.3.13)
Proof. Since $\phi$ is homogeneous of second order in $(\eta,\pi)$, we have
$$2\phi(t,\eta,\pi) = \phi_\eta(t,\eta,\pi)\,\eta + \phi_\pi(t,\eta,\pi)\,\pi. \tag{1.3.14}$$
Comparing (1.3.10) and (1.3.11), we see that $\phi_\pi$ is of class $C^1$ as a function of $t$. We may hence integrate the last term in (1.3.14) by parts:
$$2\int_{a_1}^{a_2}\phi(t,\eta,\dot\eta)\,dt = \int_{a_1}^{a_2}\Big(\phi_\eta(t,\eta,\dot\eta) - \frac{d}{dt}\phi_\pi(t,\eta,\dot\eta)\Big)\eta\,dt = 0,$$
since $\eta$ is a Jacobi field. q.e.d.
As before, let $F$ be of class $C^3$, and let $u(t)$ be a solution of class $C^2$ on $[a,b]$ of the Euler-Lagrange equations
$$\frac{d}{dt}F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = 0.$$
Definition 1.3.2. Let $a \le a_1 < a_2 \le b$. We call the parameter value $a_2$ conjugate to $a_1$ and the point $(a_2,u(a_2))$ conjugate to $(a_1,u(a_1))$ if there exists a not identically vanishing Jacobi field $\eta$ on $[a_1,a_2]$ with
$$\eta(a_1) = 0 = \eta(a_2).$$
We may derive the important result of Jacobi:
Theorem 1.3.4. Let $F \in C^3(I \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$ and suppose $u \in C^2(I,\mathbb{R}^d)$. Suppose that $F_{pp}(t,u(t),\dot u(t))$ is positive definite on $I$. If there exists $a^*$ with $a < a^* < b$ that is conjugate to $a$, then $u$ cannot be a local minimum of $I$. More precisely, for any $\varepsilon > 0$, there exists $v \in D^1(I,\mathbb{R}^d)$ with $v(a) = u(a)$, $v(b) = u(b)$,
<e
tl
Proof. Let $\eta(t)$ be a nontrivial Jacobi field on $[a,a^*]$ with $\eta(a) = 0 = \eta(a^*)$. We put
$$\eta^*(t) := \begin{cases} \eta(t) & \text{for } a \le t \le a^*,\\ 0 & \text{for } a^* \le t \le b. \end{cases}$$
By Lemma 1.3.2, $Q(\eta^*) = 0$. If $u$ were a local minimizer, then by Theorem 1.3.1
$$0 \le \delta^2 I(u,\eta) = Q(\eta) \quad\text{for all } \eta \in D_0^1(I,\mathbb{R}^d).$$
Hence $\eta^*$ would be a minimizer of $Q$, hence by Lemma 1.3.1 $\eta^* \in C^2(I,\mathbb{R}^d)$. Since $\eta^* \equiv 0$ on $[a^*,b]$, then
$$\dot\eta^*(a^*) = 0.$$
Since also $\eta^*(a^*) = 0$, and since $\eta^*$ solves the Jacobi equation, a (linear) second order ordinary differential equation, the uniqueness theorem for solutions of such equations implies
$$\eta^* \equiv 0,$$
a contradiction, because by assumption $\eta$ does not vanish identically. Hence $u$ cannot be a local minimizer. q.e.d.
In words, Theorem 1.3.4 says that a solution of the Euler-Lagrange equations cannot be minimizing beyond the first conjugate point. Turned the other way round, Theorem 1.3.4 says that if $u$ is a local minimizer, then there cannot be any parameter value $a^*$ with $a < a^* < b$ that is conjugate to $a$. It may happen, however, that $b$ is conjugate to $a$. An example will be given in the next chapter.
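The role of conjugate points can be illustrated numerically. The following is a minimal sketch under assumptions that are not from the text: we take the illustrative integrand $F(t,u,p) = \frac12(p^2 - u^2)$ on $[0,T]$ with $d = 1$ and the solution $u \equiv 0$, so that $F_{pp} = 1$, $F_{pu} = 0$, $F_{uu} = -1$, the second variation is $\int_0^T(\dot\eta^2 - \eta^2)\,dt$, and the Jacobi equation is $\ddot\eta + \eta = 0$. The Jacobi field $\sin t$ vanishes at $0$ and $\pi$, so $\pi$ is conjugate to $0$. The function name is hypothetical.

```python
import math


def second_variation(T, n=4000):
    """Midpoint-rule value of int_0^T (eta'^2 - eta^2) dt for eta(t) = sin(pi t / T).

    Illustrative assumption: F(t, u, p) = (p^2 - u^2) / 2 and u = 0,
    so this integral is the second variation of I at u in direction eta.
    """
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        eta = math.sin(math.pi * t / T)
        eta_dot = (math.pi / T) * math.cos(math.pi * t / T)
        total += (eta_dot ** 2 - eta ** 2) * h
    return total


if __name__ == "__main__":
    # T < pi: no conjugate point in (0, T), this test direction gives a positive value.
    print(second_variation(2.0))
    # T > pi: the conjugate point pi lies in (0, T), and the value becomes negative.
    print(second_variation(4.0))
```

For this particular $\eta$ the exact value is $\frac{T}{2}\big(\frac{\pi^2}{T^2} - 1\big)$, which changes sign at $T = \pi$; for $T > \pi$ the necessary condition (1.3.7) fails, in accordance with Theorem 1.3.4.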
Summary. In order to obtain necessary conditions for a solution of the Euler-Lagrange equations
$$\frac{d}{dt}F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = 0$$
to minimize $I$, one considers the second variation
$$\delta^2 I(u,\eta) = \frac{d^2}{ds^2}I(u + s\eta)\Big|_{s=0} \quad\text{for } \eta \in D_0^1.$$
If, for fixed $u$, we consider the variational problem $Q(\eta) \to \min$, we are led to the Jacobi equations
$$\frac{d}{dt}\big(F_{pp}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\,\eta(t)\big) = F_{up}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{uu}(t,u(t),\dot u(t))\,\eta(t)$$
for $\eta$. Solutions $\eta$ with $\eta(a) = \eta(b) = 0$ are called Jacobi fields, $a^* \in (a,b)$ for which there exists a nontrivial Jacobi field on $[a,a^*]$ is called conjugate to $a$, and if there exists such $a^*$, $u$ cannot be locally minimizing on $[a,b]$. In other words, a solution of the Euler-Lagrange equations cannot be minimizing beyond the first conjugate point.
1.4 Free boundary conditions
We recall the definition of an $n$-dimensional embedded differentiable submanifold $M$ of $\mathbb{R}^d$: For every $p \in M$, there have to exist a neighbourhood $V = V(p) \subset \mathbb{R}^d$, an open set $U \subset \mathbb{R}^n$ and an injective differentiable map $f : U \to V$ of everywhere maximal rank $n$ (i.e. for every $z \in U$, the derivative $Df(z)$, a linear map from $\mathbb{R}^n$ to $\mathbb{R}^d$, has rank $n$) with
$$M \cap V = f(U).$$
An example is the sphere $S^n$ described in detail in Section 2.1 (Example 2.1.1). The tangent space $T_pM$ of $M$ at $p = f(z)$ then is the vector space $Df(z)(\mathbb{R}^n)$. It can be considered as a subspace of the vector space $T_p\mathbb{R}^d$, the tangent space of $\mathbb{R}^d$ at $p$. As in 1.1, we now consider the variational problem
$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt \to \min$$
with $F$ of class $C^2$. This time, however, we do not impose the Dirichlet boundary condition that the values of $u(a)$ and $u(b)$ are prescribed, but the more general condition that for given submanifolds $M_1$, $M_2$ (differentiable, embedded) of $\mathbb{R}^d$, we require that
$$u(a) \in M_1, \quad u(b) \in M_2.$$
(Dirichlet boundary conditions constitute the special case where $M_1$ and $M_2$ are points.) In this section, we do not consider regularity questions. As an exercise,
the reader should supply the necessary regularity assumptions on $F$, $u$, etc. at each step. Let $u$ be a solution. Then, as before, $u$ has to satisfy the Euler-Lagrange equations, because if $u(a) \in M_1$, $\eta(a) = 0$, then also $u(a) + s\eta(a) \in M_1$ for any $s$, and likewise at $b$, and so we may again consider variations of the form $u + s\eta$, $\eta \in D_0^1$. This time, however, also more general variations are admissible. Namely, let $u_s(t)$ be a family of maps from $I$ into $\mathbb{R}^d$ depending differentiably on $s \in (-\varepsilon,\varepsilon)$, with $u(t) = u_0(t)$ and
$$u_s(a) \in M_1, \quad u_s(b) \in M_2 \quad\text{for all } s.$$
Let
$$\eta(t) := \frac{\partial}{\partial s}u_s(t)\Big|_{s=0}.$$
Then again
$$0 = \frac{d}{ds}I(u_s)\Big|_{s=0} = \int_a^b \{F_p(t,u(t),\dot u(t))\cdot\dot\eta(t) + F_u(t,u(t),\dot u(t))\cdot\eta(t)\}\,dt$$
$$= \int_a^b \Big\{-\frac{d}{dt}F_p + F_u\Big\}\cdot\eta\,dt + F_p(t,u(t),\dot u(t))\cdot\eta(t)\Big|_{t=a}^{t=b} = F_p(t,u(t),\dot u(t))\cdot\eta(t)\Big|_{t=a}^{t=b},$$
since $u$ solves the Euler-Lagrange equations. We now observe that $\eta(a) \in T_{u(a)}M_1$ (and likewise at $b$), since we may find a 'local chart' $f$ as above with $M_1 \cap V(u(a)) = f(U)$ for a neighbourhood $V$ of $u(a)$ and some open set $U \subset \mathbb{R}^{n_1}$ ($n_1 = \dim M_1$). By choosing $\varepsilon$ smaller if necessary, we may assume $u_s(a) \in M_1 \cap V = f(U)$ for $s \in (-\varepsilon,\varepsilon)$. Since $f$ is injective, there then has to exist a curve $\gamma(s) \subset U$ with $u_s(a) = f \circ \gamma(s)$ for all $s$. Hence $\eta(a) = \frac{\partial}{\partial s}u_s(a)|_{s=0} = Df(f^{-1}(u(a)))\,\dot\gamma(0)$ is indeed tangent to $M_1$ at $u(a)$. Moreover, any tangent vector to $M_1$ at $u(a)$ can be realized in this manner. Therefore, since we may choose the values of $\eta$ at $a$ and $b$ independently of each other, we conclude
$$F_p(a,u(a),\dot u(a))\cdot V = 0 \quad\text{for all } V \in T_{u(a)}M_1,$$
$$F_p(b,u(b),\dot u(b))\cdot W = 0 \quad\text{for all } W \in T_{u(b)}M_2.$$
Theorem 1.4.1. Let $u$ be a critical point of $I$ among curves with $u(a) \in M_1$, $u(b) \in M_2$ ($M_1$, $M_2$ given differentiable embedded submanifolds of $\mathbb{R}^d$), i.e. $\frac{d}{ds}I(u_s)|_{s=0} = 0$ for all variations $u_s(t)$ differentiable in $s$ with $u_s(a) \in M_1$, $u_s(b) \in M_2$ for all $s \in (-\varepsilon,\varepsilon)$ ($\varepsilon > 0$). Then $u$ is a solution of the Euler-Lagrange equations for $I$, and in addition, $F_p(a,u(a),\dot u(a))$ and $F_p(b,u(b),\dot u(b))$ are orthogonal to $T_{u(a)}M_1$ and $T_{u(b)}M_2$, respectively. In particular, if for example $M_1 = \mathbb{R}^d$, then $F_p(a,u(a),\dot u(a)) = 0$.
Summary. If instead of a Dirichlet boundary condition, we more generally impose a free boundary condition that $u(a)$ and $u(b)$ are only required to be contained in given differentiable submanifolds $M_1$ and $M_2$, respectively, of $\mathbb{R}^d$, then $F_p(a,u(a),\dot u(a))$ and $F_p(b,u(b),\dot u(b))$ are orthogonal to these submanifolds for a critical point of $I$ under those boundary conditions.
1.5 Symmetries and the theorem of E. Noether
In the variational problems of classical mechanics, one often encounters conserved quantities, like energy, momentum, or angular momentum. It was realized by E. Noether that all those conservation laws result from a general theorem stating that invariance properties of the variational integral $I$ lead to corresponding conserved quantities. We first treat a special case.
Theorem 1.5.1. We consider the variational integral
$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt,$$
with $F \in C^2([a,b] \times \mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$. We suppose that there exists a smooth one-parameter family of differentiable maps
$$h_s : \mathbb{R}^d \to \mathbb{R}^d$$
($s \in (-\varepsilon_0,\varepsilon_0)$ for some $\varepsilon_0 > 0$) with
$$h_0(z) = z \quad\text{for all } z \in \mathbb{R}^d$$
and satisfying
$$\int_a^b F\Big(t,h_s(u(t)),\frac{d}{dt}h_s(u(t))\Big)\,dt = \int_a^b F\Big(t,u(t),\frac{d}{dt}u(t)\Big)\,dt \tag{1.5.1}$$
for all $s \in (-\varepsilon_0,\varepsilon_0)$ and all $u \in C^2([a,b],\mathbb{R}^d)$. Then, for any solution $u(t)$ of the Euler-Lagrange equations (1.1.4) for $I$,
$$F_p(t,u(t),\dot u(t))\cdot\frac{d}{ds}h_s(u(t))\Big|_{s=0} \tag{1.5.2}$$
is constant in $t \in [a,b]$.
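Definition 1.5.1. A quantity $C(t,u(t),\dot u(t))$ that is constant in $t$ for each solution of the Euler-Lagrange equations of a variational integral $I(u)$ is called a (first) integral of motion.
Proof of Theorem 1.5.1: Equation (1.5.1) yields for any $t_0 \in [a,b]$, using $h_0(z) = z$,
$$0 = \frac{d}{ds}\int_a^{t_0} F\Big(t,h_s(u(t)),\frac{d}{dt}h_s(u(t))\Big)\,dt\,\Big|_{s=0} \tag{1.5.3}$$
$$= \int_a^{t_0}\Big\{F_u(t,u(t),\dot u(t))\cdot\frac{d}{ds}h_s(u(t)) + F_p(t,u(t),\dot u(t))\cdot\frac{d}{dt}\frac{d}{ds}h_s(u(t))\Big\}\,dt\,\Big|_{s=0} \tag{1.5.4}$$
$$= \int_a^{t_0}\Big\{\frac{d}{dt}F_p(t,u(t),\dot u(t))\cdot\frac{d}{ds}h_s(u(t)) + F_p(t,u(t),\dot u(t))\cdot\frac{d}{dt}\frac{d}{ds}h_s(u(t))\Big\}\,dt\,\Big|_{s=0} \tag{1.5.5}$$
(since $u$ solves the Euler-Lagrange equations)
$$= \int_a^{t_0}\frac{d}{dt}\Big(F_p(t,u(t),\dot u(t))\cdot\frac{d}{ds}h_s(u(t))\Big|_{s=0}\Big)\,dt = F_p(t_0,u(t_0),\dot u(t_0))\cdot\frac{d}{ds}h_s(u(t_0))\Big|_{s=0} - F_p(a,u(a),\dot u(a))\cdot\frac{d}{ds}h_s(u(a))\Big|_{s=0} \tag{1.5.6}$$
for any $t_0 \in [a,b]$. This means that (1.5.2) is constant on $[a,b]$. q.e.d.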
3 n
, u = (ui,...,u )
n
with
F(t,u(t),u(t))
= pmj-^f--
^II^H
= E ^ i j,
$V(u)$ that is independent of the third coordinates of the $u_i$. Then
$$h_s(z) = z + s e_3,$$
where $e_3$ is the third unit vector in $\mathbb{R}^3$, leaves $F$ invariant in the sense of Theorem 1.5.1. Since
$$\frac{d}{ds}h_s\Big|_{s=0} = e_3,$$
we conclude that
1=1
i.e. the third component of the momentum vector of the system is con served. Example 1.5.2. Similarly, i f a system as in Example 1.5.1 is invariant under rotations about the e -axis, and i f h now denotes such rotations, then (up to a constant factor)
3 8
L , n \s=oUi = es A Ui. as Hence, the conserved quantity is the angular momentum w.r.t. the e s 3
axis, n ^ F ez A Ui = ] P ( i i) i=l i
m u v
We now come to the general form of E. Noether's theorem T h e o r e m 1.5.2 ( T h e o r e m of E . N o e t h e r ) . We consider the varia tional integral rb I(u) = I F(t,u(t),u(t))dt
Ja
2 d d
with F C ([a,6] x R x E , E ) . We suppose that there exists a smooth one-parameter family of differentiate maps h = (h ,h )
s s s
: [a,b] x E E x E >
(s G ( - e o , e ) as before) with
0
h (t, z) = (t, z)
0
\ /
F(t,u(t),u(t))dt
(1.5.7) fort = h (t), all s G ( - e , e ) and a// it G C ( [ a , 6 ] , E ) . Then, for any solution u(t) of the Euler-Lagrange equations (1.1.4) for I ,
2 d s s 0 0
F {t,u(t),u(t))
p
'ds'
p
h {u{t))\
s
s=0
f c
(1-5.8)
Proof. We reduce the statement to the one of Theorem 1.5.1 by artifi cially considering t as a dependent variable on the same footing with u. Thus, we consider the integrand
F(t(T),u(t(T)),^,^u(t(r))
:=F
\
t M t ) , * ^ ) Z dr
dr
(1.5-9)
Then I(t,u) := j H F ( i ( r ) , u ( i ( r ) ) , =
=
| : ( * ( r ) ) ) dr i f i ( r ) = a, f ( n ) = 6
0
J F(t,u(t),u{t))dt,
/()
(1.5.10)
By our assumption, F remains invariant under replacing (t,u) h (t,u). Consequently, Theorem 1.5.1 applied to I yields that
s
by
F {t,u(t)Mt))~h (u(t))\ ^
p s s=Q
with p standing for the place of the argument ^ of F (while p stands as before for the arguments ii), is invariant. Since, by (1.5.9),
Fp F pi
Fo
p
= F -
Fu
p
= (t + s,z)
leaves $I$ invariant as required in Theorem 1.5.2. Therefore, the 'energy'
$$F(t,u(t),\dot u(t)) - F_p(t,u(t),\dot u(t))\,\dot u(t)$$
is conserved. We shall see another proof of this fact in Section 4.1.
Summary. The theorem of E. Noether identifies a quantity that is preserved along any solution $u(t)$ of the Euler-Lagrange equations of a variational integral, a so-called first integral of motion, with any differentiable symmetry of the integrand. For example, in classical mechanics, conservation of momentum and angular momentum correspond to translational and rotational invariance of the integral, respectively, while time invariance leads to the conservation of energy.
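The conservation laws described above can be observed numerically. The following is a minimal sketch under assumptions that are not from the text: we take a single particle of mass 1 in the plane with the illustrative integrand $F(t,u,p) = \frac12|p|^2 - V(|u|)$, $V(r) = r^4/4$, which is invariant under rotations and does not depend on $t$. Its Euler-Lagrange equations $\ddot u = -|u|^2 u$ are integrated with the classical Runge-Kutta scheme, and we monitor the angular momentum $u_1\dot u_2 - u_2\dot u_1$ and the mechanical energy $\frac12|\dot u|^2 + V(|u|)$, which is the negative of the 'energy' $F - F_p\dot u$ of the text. All function names and initial values are hypothetical.

```python
def acceleration(x, y):
    """Right hand side of u'' = -grad V(u) for V(r) = r^4 / 4 (illustrative choice)."""
    r2 = x * x + y * y
    return -r2 * x, -r2 * y


def derivative(state):
    """First order system for state = (x, y, vx, vy)."""
    x, y, vx, vy = state
    ax, ay = acceleration(x, y)
    return (vx, vy, ax, ay)


def rk4_step(state, dt):
    """One step of the classical fourth order Runge-Kutta method."""
    def shifted(s, k, factor):
        return tuple(si + factor * ki for si, ki in zip(s, k))

    k1 = derivative(state)
    k2 = derivative(shifted(state, k1, dt / 2))
    k3 = derivative(shifted(state, k2, dt / 2))
    k4 = derivative(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def angular_momentum(state):
    x, y, vx, vy = state
    return x * vy - y * vx


def energy(state):
    x, y, vx, vy = state
    return 0.5 * (vx * vx + vy * vy) + 0.25 * (x * x + y * y) ** 2


def conservation_drift(steps=5000, dt=1e-3):
    """Largest deviation of angular momentum and energy from their initial values."""
    state = (1.0, 0.0, 0.3, 0.8)  # hypothetical initial position and velocity
    l0, e0 = angular_momentum(state), energy(state)
    max_dl = max_de = 0.0
    for _ in range(steps):
        state = rk4_step(state, dt)
        max_dl = max(max_dl, abs(angular_momentum(state) - l0))
        max_de = max(max_de, abs(energy(state) - e0))
    return max_dl, max_de


if __name__ == "__main__":
    print(conservation_drift())
```

Both deviations remain at the level of the discretization error of the integrator, as one expects from rotational and time invariance of the integrand.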
E(u) : = i
f\u{t)\ dt
d l d
(| | is the Euclidean norm of E , i.e. for z (z ,..., z ), | | 2 _ J2i i(z ) ). Compute the Euler-Lagrange equations and the second variation. Also, let
1 2 z =
L(u) := I
Ja
\ii{t)\dt.
Exercises
1.2
w i t h equality if \u(t)\ = constant almost everywhere. (What is an appropriate regularity class for the mappings u that are considered here?) Determine all minimizers of the variational integral
1.3
1.4
w i t h u(~l) = 0 = u{l). Develop a theory of Jacobi fields for variational problems with free boundary conditions. I n particular, you should obtain an analogue of Jacobi's theorem. For mappings u : [a, 6] E , consider
d
Compute the first and second variation of / and the Jacobi equation. Can you find Jacobi fields?
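2.1 The length and energy of curves
We let $M$ be an $n$-dimensional embedded submanifold of $\mathbb{R}^d$. In this section, we assume that $M$ is of class $C^3$, i.e. that all local charts are thrice differentiable. We let $c \in AC([0,T],M)$ be a curve on $M$. This means that $c$ is an absolutely continuous map from the interval $[0,T]$ into $\mathbb{R}^d$ with the property that $c(t) \in M$ for every $t \in [0,T]$. The derivative of $c$ w.r.t. $t$ will be denoted by a dot,
$$\dot c(t) := \frac{dc}{dt}(t).$$
The length of $c$ is
$$L(c) := \int_0^T |\dot c(t)|\,dt = \int_0^T \Big(\sum_{\alpha=1}^d (\dot c^\alpha(t))^2\Big)^{1/2} dt, \tag{2.1.1}$$
and its energy is
$$E(c) := \frac12\int_0^T |\dot c(t)|^2\,dt = \frac12\int_0^T \sum_{\alpha=1}^d (\dot c^\alpha(t))^2\,dt. \tag{2.1.2}$$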
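Let
$$f : U \to V, \quad f(U) = M \cap V$$
be a local chart for $M$ as defined in Section 1.4. We assume for a moment that $c([0,T])$ is contained in $f(U)$. Since $f$ maps $U$ bijectively onto $f(U)$, there exists a curve
$$\gamma(t) \subset U$$
with
$$c(t) = f(\gamma(t)). \tag{2.1.3}$$
Since the derivative $Df(z)$ has maximal rank everywhere (by definition of a chart, cf. 1.4), $\gamma$ is absolutely continuous, since $c$ is, and we have the chain rule
$$\dot c(t) = (Df)(\gamma(t)) \circ \dot\gamma(t),$$
or
$$\dot c^\alpha(t) = \frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\,\dot\gamma^i(t). \tag{2.1.4}$$
Hence
$$L(c) = \int_0^T \Big(\frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\frac{\partial f^\alpha}{\partial z^j}(\gamma(t))\,\dot\gamma^i(t)\dot\gamma^j(t)\Big)^{1/2} dt$$
and
$$E(c) = \frac12\int_0^T \frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\frac{\partial f^\alpha}{\partial z^j}(\gamma(t))\,\dot\gamma^i(t)\dot\gamma^j(t)\,dt.$$
In these formulae, and in the sequel, the index $\alpha$ is summed from 1 to $d$. For $z \in U$, we put
$$g_{ij}(z) := \frac{\partial f^\alpha}{\partial z^i}(z)\frac{\partial f^\alpha}{\partial z^j}(z).$$
With this notation, the preceding formulae become
$$L(c) = \int_0^T \big(g_{ij}(\gamma(t))\,\dot\gamma^i(t)\dot\gamma^j(t)\big)^{1/2} dt, \tag{2.1.5}$$
$$E(c) = \frac12\int_0^T g_{ij}(\gamma(t))\,\dot\gamma^i(t)\dot\gamma^j(t)\,dt. \tag{2.1.6}$$
The matrix $(g_{ij}(z))_{i,j=1,\dots,n}$, the metric tensor of $M$ w.r.t. the chart $f$, is symmetric, i.e.
$$g_{ij}(z) = g_{ji}(z) \quad\text{for all } i,j,$$
and positive definite, i.e.
$$g_{ij}(z)\,\eta^i\eta^j > 0$$
whenever $\eta = (\eta^1,\dots,\eta^n) \ne 0 \in \mathbb{R}^n$.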
Remark 2.1.1. The use of local charts for $M$ seems to have the obvious disadvantage that the expressions for length and energy of curves become more complicated. The advantage of this approach, namely not to consider curves on $M$ as curves in $\mathbb{R}^d$ satisfying a constraint, is that this constraint now is automatically fulfilled. All curves represented in local charts lie on $M$. This more than compensates for the complication in the formulae for $L$ and $E$.
Our aim will be to find curves of shortest length or of smallest energy on $M$, i.e. to minimize the functionals $L$ and $E$ among curves on $M$. For this purpose it will be useful to observe certain invariance properties of $L$ and $E$. First of all, whenever $i : \mathbb{R}^d \to \mathbb{R}^d$ is a Euclidean isometry, i.e. $i(y) = Ay + b$ with $A \in O(d)$, the orthogonal group, and $b \in \mathbb{R}^d$, then
$$L(i(c)) = L(c) \tag{2.1.7}$$
$$E(i(c)) = E(c) \tag{2.1.8}$$
for any curve $c : [0,T] \to \mathbb{R}^d$. Secondly, $L$ is parameterization invariant in the sense that whenever $\tau : [0,S] \to [0,T]$ is a diffeomorphism (i.e. $\tau$ is bijective, and both $\tau$ and its inverse $\tau^{-1}$ are everywhere differentiable), then
$$L(c) = L(c \circ \tau),$$
Namely
L(cor)
$$L(c) = \int_0^T 1\cdot|\dot c(t)|\,dt \le \Big(\int_0^T dt\Big)^{1/2}\Big(\int_0^T |\dot c(t)|^2\,dt\Big)^{1/2} = \sqrt{2T}\sqrt{E(c)}, \tag{2.1.10}$$
with equality iff $|\dot c(t)| =$ constant almost everywhere. We have shown:
Lemma 2.1.1. For every $c \in AC([0,T],\mathbb{R}^d)$,
$$L(c) \le \sqrt{2T}\sqrt{E(c)}, \tag{2.1.11}$$
with strict inequality, unless $|\dot c(t)| =$ constant almost everywhere.
If $|\dot c(t)| =$ constant almost everywhere, we say that the curve $c$ is parameterized proportionally to arc-length, and if $|\dot c(t)| = 1$, we say that it is parameterized by arc-length. We recall that a Jordan curve, i.e. an injective curve $c : [0,T] \to \mathbb{R}^d$, is rectifiable if it is absolutely continuous (which we always assume), and this implies that it may be parameterized by arc-length, i.e. there exists a diffeomorphism $\tau : [0,L(c)] \to [0,T]$ such that the curve $c \circ \tau$, which satisfies
$$\Big|\frac{d}{ds}(c \circ \tau)(s)\Big| = 1,$$
is parameterized by arc-length. From Lemma 2.1.1, we obtain:
Corollary 2.1.1. Let $c : [0,L(c)] \to \mathbb{R}^d$ be a curve parameterized on $[0,L(c)]$. Among all reparameterizations
$$\tau : [0,L(c)] \to [0,L(c)]$$
(i.e. we keep the interval of definition fixed, namely $[0,L(c)]$), the parameterization by arc-length leads to the smallest energy. Namely, if $c : [0,L(c)] \to \mathbb{R}^d$ is parameterized by arc-length, then
$$E(c) = \tfrac12 L(c) \le E(c \circ \tau)$$
for every such reparameterization $\tau$.
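The inequality (2.1.11) and its equality case can be tested on discretized curves. The following is only a minimal sketch: the two sample curves and all function names are illustrative assumptions, and the derivative is replaced by difference quotients on a uniform grid.

```python
import math


def length_and_energy(curve, T, n=20000):
    """Approximate L(c) and E(c) on [0, T] by difference quotients on a uniform grid."""
    h = T / n
    length = 0.0
    energy = 0.0
    for i in range(n):
        p = curve(i * h)
        q = curve((i + 1) * h)
        speed = math.dist(p, q) / h  # approximates |c'(t)| on this subinterval
        length += speed * h
        energy += 0.5 * speed ** 2 * h
    return length, energy


def upper_bound(T, energy):
    """Right hand side sqrt(2T) * sqrt(E(c)) of (2.1.11)."""
    return math.sqrt(2.0 * T) * math.sqrt(energy)


def unit_speed_circle(t):
    """Circular arc with |c'(t)| = 1 (parameterized by arc-length)."""
    return (math.cos(t), math.sin(t))


def accelerating_circle(t):
    """Same circle traversed with non-constant speed |c'(t)| = 2t."""
    return (math.cos(t * t), math.sin(t * t))


if __name__ == "__main__":
    T = 2.0
    for curve in (unit_speed_circle, accelerating_circle):
        length, energy = length_and_energy(curve, T)
        print(curve.__name__, length, upper_bound(T, energy))
```

For the curve of constant speed, length and bound agree, while for the accelerating parameterization the bound is strictly larger than the length.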
We now return to those curves $c$ that are confined to lie on $M$, in order to discover a third invariance. Namely, we compare the two expressions (2.1.1) and (2.1.5) for the length of $c$, and similarly (2.1.2) and (2.1.6) for its energy. (2.1.1) is obviously independent of the chart $f : U \to V$ and its metric tensor, and therefore (2.1.5) has to be independent of them, too. In order to study this more closely, let $\tilde f : \tilde U \to \tilde V$ be another chart with
$$c([0,T]) \subset \tilde f(\tilde U).$$
We put
$$\varphi := \tilde f^{-1} \circ f : f^{-1}\big(f(U) \cap \tilde f(\tilde U)\big) \to \tilde f^{-1}\big(f(U) \cap \tilde f(\tilde U)\big)$$
(see Figure 2.1). $\varphi$ is called a coordinate transformation. $\varphi$ is a diffeomorphism, i.e. a bijective map between open subsets of $\mathbb{R}^n$ whose derivative $D\varphi(z)$ has maximal rank ($= n$) at every $z$. Then from
$$\tilde f \circ \tilde\gamma(t) = c(t) = f \circ \gamma(t),$$
we get
$$\tilde\gamma(t) = \varphi(\gamma(t)),$$
hence
$$\dot{\tilde\gamma}^k(t) = \frac{\partial\varphi^k}{\partial z^i}(\gamma(t))\,\dot\gamma^i(t), \tag{2.1.15}$$
and from
$$f(z) = \tilde f(\varphi(z))$$
we get
$$g_{ij}(z) = \tilde g_{kl}(\varphi(z))\,\frac{\partial\varphi^k}{\partial z^i}(z)\frac{\partial\varphi^l}{\partial z^j}(z). \tag{2.1.16}$$
Figure 2.1.
From (2.1.15) and (2.1.16), we see
$$g_{ij}(\gamma(t))\,\dot\gamma^i(t)\dot\gamma^j(t) = \tilde g_{kl}(\tilde\gamma(t))\,\dot{\tilde\gamma}^k(t)\dot{\tilde\gamma}^l(t), \tag{2.1.17}$$
and this shows again the equivalence of (2.1.5) and (2.1.1), and likewise for the corresponding expressions of the energy. The important transformation formula (2.1.16) shows how the metric tensor transforms under coordinate transformations. This invariance property of $L$ and $E$ makes it possible to express the length and energy of an arbitrary curve $c$ on $M$ that is not necessarily contained in the image of a single chart as follows: One finds a subdivision
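$$0 = t_0 < t_1 < \dots < t_m = T$$
of $[0,T]$ with the property that $c([t_{\nu-1},t_\nu])$ is contained in the image of a single chart $f_\nu : U_\nu \to V_\nu$, and puts
$$L(c) = \sum_{\nu=1}^m \int_{t_{\nu-1}}^{t_\nu}\big(g^\nu_{ij}(\gamma_\nu(t))\,\dot\gamma_\nu^i(t)\dot\gamma_\nu^j(t)\big)^{1/2} dt,$$
where $c(t) = f_\nu \circ \gamma_\nu(t)$ for $t \in [t_{\nu-1},t_\nu]$ and $g^\nu_{ij}$ is the metric tensor w.r.t. $f_\nu$. By the preceding considerations, this does not depend on the choice of charts $f_\nu$. For this reason, one usually just says that for a curve $c$ on $M$
$$L(c) = \int_0^T \big(g_{ij}(\gamma(t))\,\dot\gamma^i(t)\dot\gamma^j(t)\big)^{1/2} dt, \tag{2.1.18}$$
where $\gamma$ is the representation for $c$ w.r.t. a local chart, and $(g_{ij})_{i,j=1,\dots,n}$ is the metric tensor of $M$ w.r.t. this chart. Similarly
$$E(c) = \frac12\int_0^T g_{ij}(\gamma(t))\,\dot\gamma^i(t)\dot\gamma^j(t)\,dt. \tag{2.1.19}$$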
We now assume that the charts for $M$ are twice differentiable and return to the question of finding shortest curves on $M$, for example between two given points. By Corollary 2.1.1, it is preferable to minimize $E$ instead of $L$, because a minimizer for $E$ contains more information than one for $L$; namely, minimizers for $E$ are precisely those minimizers for $L$ that are parameterized proportionally to arc-length. Thus, minimizing $E$ not only selects shortest curves but also convenient parameterizations of such curves. We now compute the Euler-Lagrange equations for $E$ as given by (2.1.19):
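$$0 = \frac{d}{dt}E_{\dot\gamma^i} - E_{\gamma^i}$$
$$\iff 0 = \frac{d}{dt}\big(2g_{ij}(\gamma(t))\,\dot\gamma^j(t)\big) - g_{kj,i}(\gamma(t))\,\dot\gamma^k(t)\dot\gamma^j(t)$$
(the factor 2 in the first term results from the symmetry $g_{ij} = g_{ji}$)
$$\iff 0 = 2g_{ij}\,\ddot\gamma^j + 2g_{ij,k}\,\dot\gamma^k\dot\gamma^j - g_{kj,i}\,\dot\gamma^k\dot\gamma^j, \quad i = 1,\dots,n, \tag{2.1.20}$$
where $g_{ij,k} := \frac{\partial}{\partial z^k}g_{ij}$. $(g^{ij})_{i,j=1,\dots,n}$ is the matrix inverse to $(g_{ij})_{i,j=1,\dots,n}$, i.e.
$$g^{ij}g_{jk} = \delta^i_k := \begin{cases} 1 & \text{for } i = k\\ 0 & \text{for } i \ne k \end{cases} \quad\text{for all } i,k,$$
and we define the Christoffel symbols
$$\Gamma^i_{jk} := \frac12 g^{il}\big(g_{jl,k} + g_{kl,j} - g_{jk,l}\big).$$
Equation (2.1.20) then becomes
$$0 = \ddot\gamma^i + \frac12 g^{il}\big(2g_{lj,k} - g_{kj,l}\big)\dot\gamma^k\dot\gamma^j = \ddot\gamma^i + \frac12 g^{il}\big(g_{lj,k} + g_{lk,j} - g_{kj,l}\big)\dot\gamma^k\dot\gamma^j$$
by using symmetries. Thus:
Lemma 2.1.2. The Euler-Lagrange equations for the energy $E$ for curves on $M$ are
$$0 = \ddot\gamma^i(t) + \Gamma^i_{jk}(\gamma(t))\,\dot\gamma^j(t)\dot\gamma^k(t) \quad\text{for } i = 1,\dots,n. \tag{2.1.21}$$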
The theorem of Picard-Lindelöf about solutions of ordinary differential equations implies:
Lemma 2.1.3. For any $z \in U$, $v \in \mathbb{R}^n$, the system (2.1.21) has a unique solution $\gamma(t)$ with $\gamma(0) = z$, $\dot\gamma(0) = v$ for $t \in [-\varepsilon,\varepsilon]$ and some $\varepsilon > 0$. Moreover, $\gamma(t)$ depends differentiably on the initial values $z$, $v$.
Definition 2.1.2. The solutions of (2.1.21) are called geodesics on $M$.
:=S \
{(0,0,...,0,-1)}
40 and define
gi : fix - E as
, g : fi 2 2
and
*" > - ( I ^ j t
2
+,
nSsr)
(#1 and g are the stereographic projections from the south and north pole, respectively). We then obtain charts / /
1
= j r
9
: r - ^ \ { ( o
n
0,1)}
= 2- :R"-5 \{(0,...,0,-l)}.
/ 1
, _ f
- " + ! ) 4x
n + 1 x
n + 1
hence
4-1
and then
2 z J
1 Thus i
,...,2
4-
/ = l , . . . , n^. 1 0 )
2z
2z
zz
- 1
4*V
dz
~ (1 + z V ) '
41
()
^-6ij.
(2.1.22)
Actually, the metric tensor w.r.t. the chart $f_2$ is given by the same formula. In order to compute the expression for geodesics, we also need to compute the Christoffel symbols. It turns out that adding a little generality will actually facilitate the computations. We consider a metric of the form
(2.1-23)
=<l> 8 .
ij
(2.1.24)
Next It,- = \g =
= kl
(2.1.25)
+ 6 i j
dip dz^'
Thus,
T
a n d l ^ g
fari^j.
(2.1.26)
2z '
2
o = f + 2
r*,.(7) V' - r j ( ) 7 7 +
7 i 7
W W
(using the symmetry F\j = T^) r r r V y V TTTP 1 + |7l j^x + 171 We now claim that the geodesic ( t ) through the origin, i.e. 7 ( 0 ) = 0, with 7(0) = a R is given by = V - 2
1 7 n 7
( 2
L 2 7 )
() = aa(t),
(2.1.28)
where a : E E then satisfies a ( 0 ) = 0, d(0) = 1. Making the ansatz (2.1.28) in (2.1.27) leads to i fr[l if.. = a a
1 L
2a a + a \a\
2 L 2 2
.o
2\a\ a . \ - cr
2
. i = 1 , . . . , n.
1
l + \a\ a*
Since we may assume $a \ne 0$ (otherwise the solution with $\dot\gamma(0) = a$ is a point curve, hence uninteresting), this equation holds if $\sigma(t)$ satisfies the ordinary differential equation (ODE)
$$0 = \ddot\sigma - \frac{2|a|^2\sigma}{1 + |a|^2\sigma^2}\,\dot\sigma^2. \tag{2.1.29}$$
The theorem of Picard-Lindelöf implies that (2.1.29) has a unique solution in a neighbourhood of $t = 0$. We then have found a solution $\gamma(t)$ of (2.1.27) of the desired form (2.1.28). The image of $\gamma(t)$ is a straight line through 0. By Lemma 2.1.3, we have thus found all solutions through 0. The images of the straight lines under the chart $f^{-1}$ are the great circles on $S^n$ through the south pole. We can now use a symmetry argument to conclude that all the geodesic lines on $S^n$ are given by the great circles on $S^n$. Namely, the south pole does not play any distinguished role, and we could have constructed a local chart by stereographic projection from any other point on $S^n$ as well, and the metric tensor would have assumed the same form (2.1.22).

More generally, one may also argue as follows: We want to find the geodesic arc $\gamma(t)$ on $S^n$ with $\gamma(0) = p_0$, $\dot\gamma(0) = V_0$ for some $p_0 \in S^n$, $V_0 \in T_{p_0} S^n$. Let $c_0(t)$ be the great circle on $S^n$ parameterized such that $c_0(0) = p_0$, $\dot c_0(0) = V_0$. $c_0$ is contained in a unique two-dimensional plane through the origin in $\mathbb{R}^{n+1}$. Let $i$ denote the reflection across this plane. This is an isometry of $\mathbb{R}^{n+1}$ mapping $S^n$ onto itself. It therefore maps geodesics on $S^n$ onto geodesics, because we have observed that the length and energy functionals are invariant under isometries, and so isometries have to map critical points to critical points. Now $i$ maps $p_0$ and $V_0$ to themselves. If $\gamma$ were not invariant under $i$, $i \circ \gamma$ would be another geodesic with initial values $p_0$, $V_0$, contradicting the uniqueness result of Lemma 2.1.3. Therefore, $i \circ \gamma = \gamma$, and therefore $\gamma = c_0$.
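As a numerical sanity check of the ansatz, the following sketch integrates the radial ODE in the form $\ddot\alpha = 2|a|^2\alpha\dot\alpha^2/(1+|a|^2\alpha^2)$ with a classical Runge-Kutta scheme and compares the result with $\alpha(t) = \tan(|a|t)/|a|$. Both this explicit form of (2.1.29) and the closed-form solution rest on the assumption that the metric (2.1.22) is $g_{ij}(x) = 4\delta_{ij}/(1+|x|^2)^2$; the function names and the sample value $|a| = 0.7$ are illustrative choices, not taken from the text.

```python
import math

def integrate_alpha(a_norm, t_end, steps=2000):
    """RK4 for alpha'' = 2|a|^2 alpha alpha'^2 / (1 + |a|^2 alpha^2),
    with alpha(0) = 0, alpha'(0) = 1 (assumed form of (2.1.29))."""
    def rhs(alpha, beta):
        # first-order system: alpha' = beta, beta' = right-hand side of the ODE
        return beta, 2.0 * a_norm**2 * alpha * beta**2 / (1.0 + a_norm**2 * alpha**2)

    h = t_end / steps
    alpha, beta = 0.0, 1.0
    for _ in range(steps):
        k1 = rhs(alpha, beta)
        k2 = rhs(alpha + 0.5 * h * k1[0], beta + 0.5 * h * k1[1])
        k3 = rhs(alpha + 0.5 * h * k2[0], beta + 0.5 * h * k2[1])
        k4 = rhs(alpha + h * k3[0], beta + h * k3[1])
        alpha += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        beta += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return alpha

def alpha_closed_form(a_norm, t):
    """Candidate solution: stereographic image of a great circle run at constant speed."""
    return math.tan(a_norm * t) / a_norm

if __name__ == "__main__":
    print(integrate_alpha(0.7, 1.0), alpha_closed_form(0.7, 1.0))
```

The closed form reflects that a point at spherical distance $s$ from the south pole is projected to radius $\tan(s/2)$, so the straight line $a\,\alpha(t)$ is traversed exactly as a great circle run at constant speed would be.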
We draw some conclusions: The geodesic arc through two given points need not be unique. Namely, let $p$, $q$ be antipodal points on $S^n$, e.g. north and south pole. Then there exist infinitely many great circles that pass through both $p$ and $q$. We shall see later on that the first conjugate point of a point $p \in S^n$ along a great circle is the antipodal point $q$ of $p$. One also sees by explicit comparison that a geodesic arc on $S^n$ ceases to be minimizing beyond the first conjugate point, in accordance with Theorem 1.3.4.
2.2 Fields of geodesic curves

Let $M$ be an embedded, differentiable submanifold of $\mathbb{R}^d$, or, more generally, a Riemannian manifold† of dimension $n$, again of class $C^3$. Let $M_0$ be a submanifold of $M$; this means that $M_0$ itself is a differentiable submanifold of $\mathbb{R}^d$, respectively a Riemannian manifold, and that the inclusion $i : M_0 \to M$ is a differentiable embedding. We assume that $M_0$ has dimension $n - 1$, and that it is also of class $C^3$.
Theorem 2.2.1. For any $x_0 \in M_0$, there exist a neighbourhood $V$ of $x_0$ in $M$, and a chart $f : U \to V$ with the following properties:

(i) $U$ contains the origin of $\mathbb{R}^n$, $f(0) = x_0$.
(ii) $M_0 \cap V = f(U \cap \{x^n = 0\})$.
(iii) The curves $x^i = c_i$, $c_i$ constant, $i = 1, \dots, n-1$, are geodesics parameterized by arc-length. The arcs $t_1 \le x^n \le t_2$ on any such curve between the hypersurfaces $x^n = t_1$ and $x^n = t_2$ are all of the same length $t_2 - t_1$.
(iv) The metric tensor on $U$ satisfies

$$g_{nn} = 1, \qquad g_{in} = 0 \quad \text{for all } i = 1, \dots, n-1. \tag{2.2.1}$$

(The second relation means that the curves $x^i = c_i$, $i = 1, \dots, n-1$, intersect the hypersurfaces $x^n = \text{constant}$ orthogonally.)

† We do not introduce the concept of an abstract Riemannian manifold here, but some readers may know that concept already, and in fact it provides the natural setting for the theory of geodesics. On the other hand, the embedding theorem of J. Nash says that any Riemannian manifold can be isometrically embedded into some Euclidean space $\mathbb{R}^d$, hence considered as a submanifold of $\mathbb{R}^d$. Therefore, from that point of view, no generality is gained by considering Riemannian manifolds instead of submanifolds of $\mathbb{R}^d$.
Proof. Since $M_0$ is a hypersurface, for every $p \in M_0$, there exist two unit normal vectors $\pm n(p)$ to $M_0$ at $p$, i.e. $n(p) \in T_p M$ has unit length and is orthogonal to $T_p M_0 \subset T_p M$.

In a sufficiently small neighbourhood $V$ of $x_0$, we may assume that such a normal vector $n(p)$ may be chosen so that it depends smoothly on $p \in M_0 \cap V =: V_0$. We assume that there is a local chart $\varphi : U_0 \to V_0$ for $M_0$ ($U_0 \subset \mathbb{R}^{n-1}$), possibly choosing $V$ smaller, if necessary. For every $p \in M_0 \cap V$, we then consider the geodesic arc $\gamma_p(t)$ with

$$\gamma_p(0) = p, \qquad \dot\gamma_p(0) = n(p). \tag{2.2.2}$$
This geodesic exists for $|t| < \varepsilon = \varepsilon(p)$ by Lemma 2.1.3. By choosing $V$ smaller if necessary, we may assume that $\varepsilon > 0$ is independent of $p$. Instead of $\gamma_p(t)$, we write $\gamma(p, t)$. Since the solution of (2.2.2) depends differentiably on its initial values (see Lemma 2.1.3), hence on $p$, the map

$$f : U_0 \times (-\varepsilon, \varepsilon) \to M, \qquad (x, t) \mapsto \gamma(\varphi(x), t)$$

is likewise differentiable, where $\varphi : U_0 \to V_0$ is a local chart for $M_0$. We may assume $x_0 = \varphi(0)$, by composing $\varphi$ with a diffeomorphism if necessary. At $(0,0) \in U_0 \times (-\varepsilon, \varepsilon)$, the Jacobian of $f$ is spanned by the linearly independent vectors $\frac{\partial \varphi}{\partial x^j}(0)$, $j = 1, \dots, n-1$, and $n(x_0)$ (note that $\gamma(\varphi(x), 0) = \varphi(x)$ and that $n(\varphi(x))$ is orthogonal to all the vectors $\frac{\partial \varphi}{\partial x^j}(x) \in T_{\varphi(x)} M_0$, $j = 1, \dots, n-1$). Therefore, by the inverse function theorem, $f$ yields a chart in some neighbourhood $U$ of $(0,0) \in U_0 \times (-\varepsilon, \varepsilon)$. $f$ obviously satisfies (i), (ii) (after redefining $V$). (iii) also holds by construction (putting $x^n = t$). Next, $g_{nn} = 1$, since the curves $x^i = c_i$, namely $f(c_1, \dots, c_{n-1}, t)$, $t \in (-\varepsilon, \varepsilon)$, are geodesics parameterized by arc-length, hence $g_{nn} = \langle \frac{\partial f}{\partial t}, \frac{\partial f}{\partial t} \rangle = 1$. Finally, the system of equations for these curves to be geodesic is
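$$\frac{d^2 x^k}{dt^2} + \Gamma^k_{ij} \frac{dx^i}{dt} \frac{dx^j}{dt} = 0 \quad \text{for } k = 1, \dots, n.$$

Since along these curves $x^n = t$ and $x^i = c_i$ for $i < n$, this means

$$\Gamma^k_{nn} = 0 \quad \text{for } k = 1, \dots, n.$$

Thus

$$0 = \Gamma^k_{nn} = \tfrac12 g^{kl} (2 g_{nl,n} - g_{nn,l}) = g^{kl} g_{nl,n},$$

since $g_{nn} \equiv 1$. Therefore $g_{nk,n} = 0$ for $k = 1, \dots, n$.

Since furthermore $g_{nk}(x^1, \dots, x^{n-1}, 0) = 0$ for $k < n$, because the geodesic arc $x^n = t$, $x^i = c_i = \text{constant}$, is orthogonal to the surface $\varphi(x^1, \dots, x^{n-1}) = f(x^1, \dots, x^{n-1}, 0)$, we obtain

$$g_{nk} = 0 \quad \text{for } k = 1, \dots, n-1.$$

q.e.d.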
Definition 2.2.1. The coordinates whose existence is affirmed by Theorem 2.2.1 are called geodesic parallel coordinates based on the hypersurface $M_0$.
Theorem 2.2.2. Let $f : U \to V$ be a chart with the properties described in Theorem 2.2.1. In particular, the curves $x^i = c_i$, $c_i$ constant, for $i = 1, \dots, n-1$ are geodesic arcs. Then any such curve is the shortest connection of its endpoints when compared with all curves contained entirely in $U$ and having the same endpoints.
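Proof. Let

$$\gamma_0(t) = \{x^i = c_i \ (i = 1, \dots, n-1),\ x^n = t,\ -\varepsilon \le t \le \varepsilon\}$$

be such a curve, where $U = U_0 \times (-\varepsilon, \varepsilon)$. Let $\gamma(t)$, $t_1 \le t \le t_2$, be another curve in $U$ with $\gamma(t_1) = \gamma_0(-\varepsilon)$, $\gamma(t_2) = \gamma_0(\varepsilon)$. We have to prove

$$L(\gamma) \ge L(\gamma_0). \tag{2.2.3}$$

Now

$$L(\gamma) = \int_{t_1}^{t_2} \Big( \sum_{i,j=1}^{n-1} g_{ij}(\gamma(t)) \dot\gamma^i(t) \dot\gamma^j(t) + (\dot\gamma^n(t))^2 \Big)^{1/2} dt, \tag{2.2.4}$$

since $g_{nn} = 1$ and $g_{in} = 0$ for $i < n$. Hence

$$L(\gamma) \ge \int_{t_1}^{t_2} |\dot\gamma^n(t)| \, dt \ge \int_{t_1}^{t_2} \dot\gamma^n(t) \, dt = \gamma^n(t_2) - \gamma^n(t_1) = \gamma_0^n(\varepsilon) - \gamma_0^n(-\varepsilon) = L(\gamma_0).$$

The first inequality is strict, unless $\gamma^i$ is constant for $i = 1, \dots, n-1$, and the second one is strict, unless $\gamma^n(t)$ is monotonic. q.e.d.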
The geodesics

$$\gamma(t) = \{x^i = c_i \ (i = 1, \dots, n-1),\ x^n = t,\ -\varepsilon < t < \varepsilon\}$$

constitute a field of geodesics. Theorem 2.2.2 essentially says that any geodesic arc in this field is shorter than any other curve with the same endpoints in the region covered by the field. Both properties are essential. Namely, geodesic arcs on $S^n$ that are longer than a great semicircle show that geodesics not embedded in a field need not minimize the length between their endpoints. And geodesic arcs on a cylinder, contained in meridians, but longer than a semicircle show that there may be shorter curves not contained in the field.
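We observe that if $\gamma(t)$ solves (2.1.21), so does $\gamma(\lambda t)$ for $\lambda = \text{constant}$. We fix $z_0 \in U$ and denote the geodesic arc $\gamma$ of Lemma 2.1.2 with

$$\gamma(0) = z_0, \qquad \dot\gamma(0) = v$$

by $\gamma_v$. We obtain

$$\gamma_v(t) = \gamma_{\lambda v}\big(\tfrac{t}{\lambda}\big) \quad \text{for } \lambda \neq 0. \tag{2.2.5}$$

Thus $\gamma_{\lambda v}$ is defined on $[-\frac{\varepsilon}{\lambda}, \frac{\varepsilon}{\lambda}]$, if $\gamma_v$ is defined on $[-\varepsilon, \varepsilon]$. Since $\gamma_v$ depends differentiably on $v$, and since $\{v \in \mathbb{R}^n : |v| = 1\}$ is compact, there exists $\varepsilon_0 > 0$ with the property that for all $v$ with $|v| = 1$, $\gamma_v$ is defined on $[-\varepsilon_0, \varepsilon_0]$. From (2.2.5), we then conclude that for any $w \in \mathbb{R}^n$ with $|w| \le \varepsilon_0$, $\gamma_w$ is defined on $[-1, 1]$. For later purposes, we also note that by Lemma 2.1.3, $\varepsilon_0$ may be chosen to depend continuously on $z_0$.

We define the map

$$e : \{w \in \mathbb{R}^n : |w| \le \varepsilon_0\} \to U, \qquad w \mapsto \gamma_w(1).$$

Then $e(0) = z_0$. We compute the derivative of $e$ at 0 as

$$De(0)(v) = \frac{d}{dt} \gamma_{tv}(1) \Big|_{t=0} = \frac{d}{dt} \gamma_v(t) \Big|_{t=0} = \dot\gamma_v(0) = v,$$

where the second equality holds by (2.2.5). Hence the derivative of $e$ at 0 is the identity, and the inverse function theorem yields:

Theorem 2.2.3. $e$ maps a neighbourhood of $0 \in \mathbb{R}^n$ diffeomorphically (i.e. $e$ is bijective, and both $e$ and $e^{-1}$ are differentiable) onto a neighbourhood of $z_0 \in U$. q.e.d.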
We now normalize our chart $f : U \to V$. First, we may assume

$$z_0 = 0 \tag{2.2.6}$$

for the point $z_0 \in U$ under consideration. Secondly, the transformation formula (2.1.16) implies that we may perform a linear change of coordinates (i.e. replace $f$ by $f \circ A$, where $A \in GL(n, \mathbb{R})$) in order to achieve

$$g_{ij}(0) = \delta_{ij}. \tag{2.2.7}$$

We assume that $f : U \to V$ satisfies these normalizations. We then replace $f$ by $f \circ e$ defined on $\{w \in \mathbb{R}^n : |w| \le \varepsilon_0\}$.
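Theorem 2.2.4. In this new chart, the metric tensor satisfies

$$g_{ij}(0) = \delta_{ij}, \tag{2.2.8}$$
$$\Gamma^i_{jk}(0) = 0, \qquad g_{ij,k}(0) = 0 \quad \text{for all } i, j, k. \tag{2.2.9}$$

Proof. By (2.1.16), $g_{ij}(0) = \delta_{ij}$ holds, since the metric tensor w.r.t. the chart $f$ satisfies this property and $De(0)$ is the identity by the proof of Theorem 2.2.3. In order to verify (2.2.9), we observe that in our new chart, the straight lines $tv$ ($v \in \mathbb{R}^n$, $t \in \mathbb{R}$, $|tv| \le \varepsilon_0$) are geodesics. Namely, $tv$ is mapped to $\gamma_{tv}(1) = \gamma_v(t)$ (see (2.2.5)), where $\gamma_v(t)$ is the geodesic with initial direction $v$. We thus insert $\gamma(t) = tv$ into the geodesic equation (2.1.21). Then $\ddot\gamma = 0$, hence

$$\Gamma^i_{jk}(tv) v^j v^k = 0 \quad \text{for } i = 1, \dots, n.$$

In particular, for $t = 0$,

$$\Gamma^i_{jk}(0) v^j v^k = 0 \quad \text{for all } v \in \mathbb{R}^n,\ i = 1, \dots, n.$$

We insert $v = e_l$, the $l$-th unit vector, and obtain

$$\Gamma^i_{ll}(0) = 0 \quad \text{for all } i, l.$$

We next insert $v = \frac12 (e_l + e_m)$, $l \neq m$. The symmetry $\Gamma^i_{jk} = \Gamma^i_{kj}$ (which directly follows from the definition of $\Gamma^i_{jk}$ and the symmetry $g_{jk} = g_{kj}$) then yields

$$\Gamma^i_{lm}(0) = 0 \quad \text{for all } i, l, m.$$

The vanishing of $g_{ij,k}(0)$ then follows from the definition of the $\Gamma^i_{jk}$, since $g_{ij,k} = g_{il} \Gamma^l_{jk} + g_{jl} \Gamma^l_{ik}$. q.e.d.

The local coordinates constructed before are called Riemannian normal coordinates.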
We let $x^1, \dots, x^n$ be Riemannian normal coordinates. We transform them into polar coordinates $r, \varphi^1, \dots, \varphi^{n-1}$ in the standard manner (e.g. if $n = 2$, $x^1 = r \cos \varphi^1$, $x^2 = r \sin \varphi^1$). This coordinate transformation is of course singular at 0. We now express the metric tensor w.r.t. these polar coordinates. We write $g_{rr}$ instead of $g_{11}$, and we write $g_{r\varphi}$ instead of $g_{1l}$, $l = 2, \dots, n$, and $g_{\varphi\varphi}$ instead of $(g_{kl})_{k,l = 2, \dots, n}$. In particular, by Theorem 2.2.4 and the transformation rule (2.1.16),

$$g_{rr}(0) = 1, \qquad g_{r\varphi}(0) = 0. \tag{2.2.10}$$

The lines through the origin are geodesics by the construction of Riemannian normal coordinates, and in polar coordinates, they now become the curves $\varphi = (\varphi^1, \dots, \varphi^{n-1}) = \text{constant}$; thus they can be written as $\gamma(t) = (t, \varphi_0)$ with fixed $\varphi_0$. Inserting this into the geodesic equation gives
$$\Gamma^i_{rr} = 0 \quad \text{for all } i$$

(where of course $\Gamma^i_{rr}$ stands for $\Gamma^i_{11}$), hence

$$2 g_{rl,r} - g_{rr,l} = 0 \quad \text{for all } l. \tag{2.2.11}$$

Putting $l = r$ gives $g_{rr,r} = 0$, and with (2.2.10) then

$$g_{rr} \equiv 1. \tag{2.2.12}$$

Using this in (2.2.11) gives $g_{r\varphi,r} = 0$, hence with (2.2.10) again

$$g_{r\varphi} \equiv 0. \tag{2.2.13}$$

We have thus shown:

Theorem 2.2.5. In the preceding coordinates, so-called Riemannian polar coordinates, that are obtained by transforming Riemannian normal coordinates into polar coordinates, the metric tensor has the form

$$\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & g_{\varphi\varphi}(r, \varphi) & \\ 0 & & & \end{pmatrix},$$

where $g_{\varphi\varphi}$ stands for the $(n-1) \times (n-1)$-matrix of the components of the metric tensor w.r.t. the angular variables $\varphi^1, \dots, \varphi^{n-1}$.
Note that this generalizes the situation for Euclidean polar coordinates. The Euclidean metric on $\mathbb{R}^2$, written in polar coordinates, e.g. takes the form

$$\begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}.$$

Note that Theorem 2.2.5, in contrast to Theorem 2.2.4, is valid on the whole chart, not only at the origin.

Corollary 2.2.1. Riemannian polar coordinates are geodesic parallel coordinates based on the hypersurfaces $r = \text{constant}$ ($r \neq 0$, since $r = 0$ corresponds to a single point, and not a hypersurface).

Proof. By Theorem 2.2.5, all properties stated in Theorem 2.2.1 hold. q.e.d.
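For a concrete instance of Theorem 2.2.5, one may take the unit sphere $S^2 \subset \mathbb{R}^3$ with polar coordinates centred at the north pole, where the point with coordinates $(r, \varphi)$ is $(\sin r \cos\varphi, \sin r \sin\varphi, \cos r)$. The sketch below computes the induced metric coefficients by central differences and should return approximately $g_{rr} = 1$, $g_{r\varphi} = 0$, $g_{\varphi\varphi} = \sin^2 r$. The function names and the step size are illustrative assumptions, not part of the text.

```python
import math

def exp_north_pole(r, phi):
    """Point of the unit sphere at geodesic distance r from the north pole, direction phi."""
    return (math.sin(r) * math.cos(phi), math.sin(r) * math.sin(phi), math.cos(r))

def metric_polar(r, phi, h=1e-5):
    """Induced metric coefficients (g_rr, g_rphi, g_phiphi) by central differences."""
    # partial derivatives of the embedding with respect to r and phi
    d_r = [(a - b) / (2 * h)
           for a, b in zip(exp_north_pole(r + h, phi), exp_north_pole(r - h, phi))]
    d_phi = [(a - b) / (2 * h)
             for a, b in zip(exp_north_pole(r, phi + h), exp_north_pole(r, phi - h))]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    return dot(d_r, d_r), dot(d_r, d_phi), dot(d_phi, d_phi)

if __name__ == "__main__":
    print(metric_polar(1.0, 0.3))
```

The radial lines $\varphi = \text{constant}$ are the great circles through the pole, so the output illustrates both the block form of the metric and the fact that the angular part, unlike in the Euclidean case, is $\sin^2 r$ rather than $r^2$.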
By Corollary 2.2.1 and Theorem 2.2.2, the curves $\varphi = \text{constant}$, $r_1 \le r \le r_2$, are shortest connections between their end points among all curves lying in the chart. We are now going to observe that this holds even globally, i.e. also in comparison with curves that may leave the chart:

Theorem 2.2.6. For each $p \in M$, there exists $\varepsilon_0 > 0$ with the property that Riemannian polar coordinates centered at $p$ may be introduced with domain

$$\{(r, \varphi) : 0 < r \le \varepsilon_0\}.$$

$\varepsilon_0$ may be chosen to depend continuously on $p$. We denote the subset of $M$ corresponding to this coordinate domain by $B(p, \varepsilon_0)$. For any $\varepsilon$ with $0 < \varepsilon \le \varepsilon_0$ and any $q \in \partial B(p, \varepsilon)$, there exists precisely one geodesic of shortest length ($= \varepsilon$) from $p$ to $q$. Namely, if $q$ has coordinates $(\varepsilon, \varphi_0)$, this geodesic arc is given by $\gamma(t) = (t, \varphi_0)$, $0 \le t \le \varepsilon$.

Proof. The first claim follows from Theorem 2.2.3, since Riemannian polar coordinates are based on the diffeomorphism $e$ (see the constructions before Theorems 2.2.4 and 2.2.5). As already noted before Theorem 2.2.3, Lemma 2.1.3 implies that we may choose $\varepsilon_0$ as a continuous function of $p$. In order to verify the second claim, let $c(t)$ be a curve from $p$ to $q$, with $c(0) = p$. Let $t_0$ be the first parameter value at which $c$ meets $\partial B(p, \varepsilon)$,

$$t_0 := \inf\{t : c(t) \in \partial B(p, \varepsilon)\}.$$

Since w.l.o.g. $\varepsilon > 0$ and $c$ is continuous, $t_0$ is positive. We are going to show that

$$L(c|_{[0,t_0]}) \ge \varepsilon. \tag{2.2.14}$$

Since the curve $(t, \varphi_0)$, $0 \le t \le \varepsilon$, has length $\varepsilon$ as easily follows from Theorem 2.2.5, this will imply the claim. In order to verify (2.2.14), we proceed as follows (identifying $c|_{[0,t_0]}$ with its representation $(r(t), \varphi(t))$ in our coordinates):

$$L(c|_{[0,t_0]}) = \int_0^{t_0} \big( \dot r^2 + g_{\varphi\varphi} \dot\varphi \dot\varphi \big)^{1/2} dt \ge \int_0^{t_0} |\dot r| \, dt \ge \int_0^{t_0} \dot r \, dt = r(t_0) = \varepsilon.$$

Here, equality only holds if $g_{\varphi\varphi} \dot\varphi \dot\varphi \equiv 0$, i.e. $\varphi(t) = \text{constant}$, and $\dot r \ge 0$, i.e. if $c|_{[0,t_0]}$ is a straight line through the origin. The second claim now easily follows. q.e.d.

Corollary 2.2.2. If $M$ is compact, there exists $\varepsilon_0 > 0$ with the property that for every $p \in M$, there exist Riemannian polar coordinates with domain

$$\{(r, \varphi) : 0 < r \le \varepsilon_0\}.$$

Proof. This follows from Theorem 2.2.6, since the constructions employed for polar coordinates depend continuously on $p$ (see essentially the construction of the diffeomorphism $e$). q.e.d.
2.3 The existence of geodesics

Definition 2.3.1. Let $M$ be a connected differentiable submanifold of Euclidean space $\mathbb{R}^d$, or, more generally†, a connected Riemannian manifold. The distance between $p, q \in M$ is

$$d(p, q) := \inf\{L(c) : c \text{ a curve in } M \text{ from } p \text{ to } q\}.$$

Theorem 2.3.1. Let $M$ (as in Definition 2.3.1) be compact. There exists $\varepsilon_0 > 0$ with the property that any two points $p, q \in M$ with $d(p, q) \le \varepsilon_0$ can be connected by a unique shortest geodesic arc (i.e. of length $d(p, q)$). This geodesic arc depends continuously on $p$ and $q$.

Proof. We take $\varepsilon_0$ as described in Corollary 2.2.2. This gives a unique shortest geodesic arc from $p$ to $q$ which furthermore depends continuously on $q$. Exchanging the roles of $p$ and $q$ then yields continuous dependence on $p$, too. q.e.d.

† See the footnote in Section 2.2.
We now proceed to establish a global result:

Theorem 2.3.2. Let $M$ be a compact connected differentiable submanifold of $\mathbb{R}^d$, or, more generally, a compact connected Riemannian manifold. Then any two points $p, q \in M$ can be connected by a shortest geodesic arc (i.e. of length $d(p, q)$).

Proof. Let $(c_n)_{n \in \mathbb{N}}$ be a minimizing sequence. We may assume w.l.o.g. that all $c_n$ are parameterized on the interval $[0, 1]$ and proportionally to arc-length. Thus

$$c_n(0) = p, \quad c_n(1) = q, \quad L(c_n) \to d(p, q) \quad \text{for } n \to \infty.$$

For each $n$, we may find $t_{0,n} = 0 < t_{1,n} < \dots < t_{m,n} = 1$ with

$$L(c_n|_{[t_{j-1,n}, t_{j,n}]}) \le \varepsilon_0,$$

with $\varepsilon_0$ given by Theorem 2.3.1. By Theorem 2.3.1, there exists a unique shortest geodesic arc between $c_n(t_{j-1,n}) =: p_{j-1,n}$ and $c_n(t_{j,n}) =: p_{j,n}$. We replace $c_n|_{[t_{j-1,n}, t_{j,n}]}$ by this shortest geodesic arc and obtain a new minimizing sequence, again denoted by $c_n$, that now is piecewise geodesic. Since the lengths of the $c_n$ are bounded because of the minimizing property, we may actually assume that $m$ is independent of $n$. Since $M$ is compact, after selecting a subsequence of $c_n$, the points $p_{j,n}$ converge to limit points $p_j$ ($j = 0, \dots, m$) as $n \to \infty$. $c_n|_{[t_{j-1,n}, t_{j,n}]}$, the unique shortest geodesic arc between $p_{j-1,n}$ and $p_{j,n}$, then converges to the unique shortest geodesic arc between $p_{j-1}$ and $p_j$ (for this point, one verifies that limits of geodesic arcs are again geodesic arcs, that limits of shortest arcs are again shortest arcs, that $d(p_{j-1}, p_j) \le \varepsilon_0$, and one uses Theorem 2.3.1). We thus obtain a piecewise geodesic limit curve $c$, with $c(0) = p$, $c(1) = q$, and

$$L(c) = \sum_{j=1}^m d(p_{j-1}, p_j) = \lim_{n \to \infty} \sum_{j=1}^m d(p_{j-1,n}, p_{j,n}) = \lim_{n \to \infty} L(c_n),$$

hence

$$L(c) = d(p, q),$$

and $c$ thus is of shortest possible length. This implies that $c$ is geodesic. Namely, otherwise we could find $0 \le s_1 < s_2 \le 1$ with $L(c|_{[s_1, s_2]}) \le \varepsilon_0$, but with $c|_{[s_1, s_2]}$ not being geodesic. Replacing $c|_{[s_1, s_2]}$ by the shortest geodesic arc between $c(s_1)$ and $c(s_2)$ would yield a shorter curve (cf. Theorem 2.2.6), contradicting the minimizing property of $c$. q.e.d.

Thus, any two points on a compact $M$ may be connected by a shortest geodesic. We now pose the question whether they can be connected by more than one geodesic, not necessarily the shortest. On $S^n$, for example, this is clearly the case. Actually, the answer is that it is the case on any compact $M$. That result needs a topological result that is not available to us here, however. Therefore, we will restrict ourselves to a special case which, however, already displays the crucial geometric idea of the construction for the general case, too.
Theorem 2.3.3. Let $M$ be a differentiable submanifold of Euclidean space $\mathbb{R}^d$ (or, more generally†, a Riemannian manifold), diffeomorphic to the sphere $S^2$. The latter condition means that there exists a bijective map

$$h : S^2 \to M$$

that is differentiable in both directions. Then any two points $p, q \in M$ can be connected by at least two geodesics.

Proof. $M$ is compact and connected since it is diffeomorphic to $S^2$, which is compact and connected. Let us assume $p \neq q$. We leave it to the reader to modify our constructions in order that they also apply to the case $p = q$. (In that case, Theorem 2.3.3 asserts the existence of a nonconstant geodesic $c : [0, 1] \to M$ with $c(0) = p = c(1)$.) By Theorem 2.3.2, there exists a shortest geodesic $c : [0, 1] \to M$ with $c(0) = p$, $c(1) = q$. One may then construct a diffeomorphism $h_0 : S^2 \to M$ with

$$p = h_0(0, 0, 1), \qquad q = h_0(0, 0, -1), \qquad c(t) = h_0(0, \sin \pi t, \cos \pi t) \ \text{ for } t \in [0, 1].$$
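Let us point out that these normalizations are not at all essential, but only convenient for our constructions. We look at the family of curves

$$\gamma(t, s) = h_0(\sin 2\pi s \sin \pi t,\ \cos 2\pi s \sin \pi t,\ \cos \pi t), \qquad 0 \le s, t \le 1.$$

Then

$$\gamma(t, 0) = \gamma(t, 1) = c(t) \quad \text{for all } t$$

and

$$\gamma(0, s) = c(0), \qquad \gamma(1, s) = c(1) \quad \text{for all } s.$$

We find some number $\kappa$ with

$$L(\gamma(\cdot, s)) \le \kappa \quad \text{for all } s. \tag{2.3.2}$$

Redefining the parameter $t$, we may also assume that all curves $\gamma(\cdot, s)$ are parameterized proportionally to arc-length. By Theorem 2.3.1, there exists $\varepsilon_0 > 0$ such that the shortest geodesic between any $p, q \in M$ with $d(p, q) \le \varepsilon_0$ is unique. Let $0 = t_0 < t_1 < \dots < t_m = 1$ be a partition of $[0, 1]$ with

$$t_j - t_{j-1} \le \frac{\varepsilon_0}{2\kappa} \quad \text{for } j = 1, \dots, m. \tag{2.3.3}$$

Let furthermore

$$0 = \tau_0 < \tau_1 < t_1 < \tau_2 < \dots < t_{m-1} < \tau_m < t_m = \tau_{m+1} = 1$$

and

$$\tau_j - \tau_{j-1} \le \frac{\varepsilon_0}{2\kappa} \quad \text{for } j = 1, \dots, m+1. \tag{2.3.4}$$

If $\gamma : [0, 1] \to M$ is any curve parameterized proportionally to arc-length with $L(\gamma) \le \kappa$, we then have for $j = 1, \dots, m$

$$d(\gamma(t_{j-1}), \gamma(t_j)) \le L(\gamma|_{[t_{j-1}, t_j]}) \le \frac{\varepsilon_0}{2}.$$

We let $r_1(\gamma)$ be that piecewise geodesic curve for which $r_1(\gamma)|_{[t_{j-1}, t_j]}$ coincides with the shortest geodesic from $\gamma(t_{j-1})$ to $\gamma(t_j)$, $j = 1, \dots, m$. Likewise, we let $r_2(\gamma)$ be that piecewise geodesic curve for which $r_2(\gamma)|_{[\tau_{j-1}, \tau_j]}$ coincides with the again unique shortest geodesic from $\gamma(\tau_{j-1})$ to $\gamma(\tau_j)$, $j = 1, \dots, m+1$.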
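Lemma 2.3.1. Suppose $d(\gamma(t_j), \gamma(t_{j-1})) \le \varepsilon_0$ and $d(\gamma(\tau_j), \gamma(\tau_{j-1})) \le \varepsilon_0$ for all $j$. Let

$$r(\gamma) := r_2 \circ r_1(\gamma).$$

Then

$$L(r(\gamma)) \le L(\gamma), \tag{2.3.5}$$

with equality iff $\gamma$ is geodesic.

Proof. By uniqueness of the shortest geodesic between $\gamma(t_{j-1})$ and $\gamma(t_j)$, we have

$$L(r_1(\gamma)) \le L(\gamma)$$

with equality only in case $r_1(\gamma) = \gamma$. Likewise, for every curve $\gamma'$ with $L(\gamma'|_{[\tau_{j-1}, \tau_j]}) \le \varepsilon_0$ for all $j$,

$$L(r_2(\gamma')) \le L(\gamma')$$

with equality only in case $r_2(\gamma') = \gamma'$. Applying this to $\gamma' = r_1(\gamma)$ yields (2.3.5). (If $r_1(\gamma) = \gamma$, then $\gamma$ is piecewise geodesic with corners at most at the $t_j$, and if $r_2(r_1(\gamma)) = r_1(\gamma)$, then $r_1(\gamma)$ is geodesic with corners at most at the $\tau_j$. Thus, if $r(\gamma) = \gamma$, $\gamma$ cannot have any corners at all.) q.e.d.

Lemma 2.3.2. Let $\gamma : [0, 1] \to M$ be a curve parameterized proportionally to arc-length and with $L(\gamma) \le \kappa$. Then a subsequence of $r^n(\gamma)$ ($= r \circ \dots \circ r(\gamma)$, $n$ times) converges uniformly to a geodesic with the same endpoints as $\gamma$.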
Proof. Each curve $r^n(\gamma)$, $n \in \mathbb{N}$, is a piecewise geodesic with corners $r^n\gamma(\tau_1), \dots, r^n\gamma(\tau_m)$ and endpoints $r^n\gamma(\tau_0) = \gamma(0)$, $r^n\gamma(\tau_{m+1}) = \gamma(1)$. The individual segments are the unique shortest connections between these points. Therefore, each such curve is uniquely determined by the $m$-tuple

$$A_n := (r^n\gamma(\tau_1), \dots, r^n\gamma(\tau_m)) \in M \times \dots \times M \quad (m \text{ times}).$$

Since $M$ is compact, a subsequence of $A_n$ converges to some limit $(p_1, \dots, p_m) \in M \times \dots \times M$. $r^n(\gamma)$ then converges uniformly towards the piecewise geodesic $\gamma_0$ with endpoints $\gamma_0(0) = \gamma(0)$, $\gamma_0(1) = \gamma(1)$ and nodes $\gamma_0(\tau_i) = p_i$ ($i = 1, \dots, m$), with segments $\gamma_0|_{[\tau_i, \tau_{i+1}]}$ that are the shortest connections between their endpoints. This follows from the continuous dependence of the occurring geodesic arcs on their endpoints (Theorem 2.3.1). We denote the convergent subsequence of $(r^n(\gamma))_{n \in \mathbb{N}}$ by $(\gamma_\nu)_{\nu \in \mathbb{N}}$. For all $\nu \in \mathbb{N}$ then

$$\gamma_\nu = r^{n(\nu)}(\gamma) \quad \text{with } n(\nu) \in \mathbb{N}.$$

Now

$$L(\gamma_\nu|_{[\tau_{j-1}, \tau_j]}) = d(\gamma_\nu(\tau_{j-1}), \gamma_\nu(\tau_j)),$$

hence

$$L(\gamma_0) = \sum_{j=1}^{m+1} d(\gamma_0(\tau_{j-1}), \gamma_0(\tau_j)) = \lim_{\nu \to \infty} L(\gamma_\nu).$$

In the same manner, since $n(\nu + 1) \ge n(\nu) + 1$,

$$L(r(\gamma_0)) = \lim_{\nu \to \infty} L(r(\gamma_\nu)) = \lim_{\nu \to \infty} L(r^{n(\nu)+1}(\gamma)) \ge \lim_{\nu \to \infty} L(\gamma_{\nu+1}) = L(\gamma_0),$$

where the inequality holds by Lemma 2.3.1. Since also $L(r(\gamma_0)) \le L(\gamma_0)$, we have equality, and Lemma 2.3.1 then implies that $\gamma_0$ is geodesic. q.e.d.

We now return to the proof of Theorem 2.3.3: We apply the preceding curve shortening process to all curves $\gamma(\cdot, s)$, $s \in [0, 1]$, simultaneously. For each $s$, a subsequence of $r^n\gamma(\cdot, s)$ then converges to a geodesic from $p$ to $q$. We want to exclude the situation that all those limit geodesics coincide with $c$. Let
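$$\kappa_0 := L(c),$$

and

$$\kappa_1 := \inf_{n \in \mathbb{N}} \sup_{s \in [0,1]} L(r^n\gamma(\cdot, s)).$$

We distinguish two cases:

(1) $\kappa_1 > \kappa_0$.

Since $\gamma(\cdot, s)$ is continuous in $s$, so is $r^n\gamma(\cdot, s)$ for every $n \in \mathbb{N}$. We now claim: whenever $\varepsilon > 0$ and $n \in \mathbb{N}$ satisfy

$$\sup_s L(r^n\gamma(\cdot, s)) \le \kappa_1 + \varepsilon, \tag{2.3.6}$$

there exists $s_n \in [0, 1]$ with

$$L(r^n\gamma(\cdot, s_n)) - L(r^{n+1}\gamma(\cdot, s_n)) < 2\varepsilon, \tag{2.3.7}$$
$$L(r^{n+1}\gamma(\cdot, s_n)) \ge \kappa_1. \tag{2.3.8}$$

Indeed, by continuity in $s$ there is some $s_n$ at which $L(r^{n+1}\gamma(\cdot, s))$ attains its supremum, which is at least $\kappa_1$, while $L(r^n\gamma(\cdot, s_n)) \le \kappa_1 + \varepsilon$ (note that $\sup_s L(r^n\gamma(\cdot, s))$ is monotonically decreasing in $n$ by Lemma 2.3.1). By definition of $\kappa_1$, there exists a sequence $(\varepsilon_n)_{n \in \mathbb{N}} \to 0$ with

$$\sup_s L(r^n\gamma(\cdot, s)) \le \kappa_1 + \varepsilon_n.$$

A subsequence of $(r^n\gamma(\cdot, s_n))_{n \in \mathbb{N}}$ has to converge to some limit curve $\tilde c$ as above, and because of (2.3.7) with $\varepsilon = \varepsilon_n$, we conclude as in the proof of Lemma 2.3.2 that

$$L(r(\tilde c)) = L(\tilde c),$$

and $\tilde c$ is hence geodesic by Lemma 2.3.1. Because of (2.3.8) and continuity of $L$ in the limit as in the proof of Lemma 2.3.2, we get

$$L(\tilde c) \ge \kappa_1 > \kappa_0 = L(c).$$

Since $c$ and $\tilde c$ are both defined on $[0, 1]$ and have different lengths, they have to be different curves. Thus, $\tilde c$ is the desired second geodesic.

(2) $\kappa_1 = \kappa_0$.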
We are going to show that in this case, there even exist infinitely many geodesics from $p$ to $q$. For that purpose, we consider the curve

$$\hat\gamma(s) := \gamma(\tfrac12, s).$$

This is a closed curve with $\hat\gamma(0) = \hat\gamma(1) = c(\frac12)$ (see Figure 2.2). Since $h_0$ is a diffeomorphism and $r^n\gamma(t, s)$ is obtained through a process that can easily be made continuous from $\gamma(t, s) = h_0(\sin 2\pi s \sin \pi t, \cos 2\pi s \sin \pi t, \cos \pi t)$, $r^n\gamma(t, s)$ has to map $[0, 1] \times [0, 1]$ surjectively onto $M$. Therefore, for every $n \in \mathbb{N}$ and every $s \in [0, 1]$, there exists $\sigma_n(s)$ with

$$\hat\gamma(s) \in r^n\gamma(\cdot, \sigma_n(s)) =: \gamma_{n,s}(\cdot)$$

(in other words, $r^n\gamma(\cdot, \sigma_n(s))$ is a curve passing through $\hat\gamma(s)$). $\gamma_{n,s}(t)$ then is a curve with $\gamma_{n,s}(0) = c(0) = p$, $\gamma_{n,s}(1) = c(1) = q$, and because of $\kappa_1 = \kappa_0$, we obtain

$$\lim_{n \to \infty} L(\gamma_{n,s}(\cdot)) \le \lim_{n \to \infty} \sup_{0 \le s \le 1} L(r^n\gamma(\cdot, s)) = \kappa_0. \tag{2.3.9}$$

Figure 2.2.

After selection of a subsequence, $(\gamma_{n,s}(\cdot))_{n \in \mathbb{N}}$ again converges to some limit curve $c_s(\cdot)$ with

$$c_s(0) = p, \qquad c_s(1) = q$$

and

$$\hat\gamma(s) \in c_s(\cdot).$$

By (2.3.9),

$$L(c_s) \le \kappa_0,$$

and since $\kappa_0$ is the infimum of the lengths of all curves from $p$ to $q$ ($\kappa_0 = L(c)$, and $c$ is minimizing), $c_s(\cdot)$ is a minimizing curve itself, hence geodesic. Therefore, we have shown that for every $s$, there exists a geodesic from $p$ to $q$ that passes through $\hat\gamma(s)$. Hence there exist infinitely many geodesics from $p$ to $q$, as claimed. q.e.d.

Remarks:
(1) Lemmas 2.3.1 and 2.3.2 do not need that $M$ is diffeomorphic to $S^2$. Compactness suffices.
(2) We may construct the curves $\gamma_{n,s}(\cdot)$ at the end of the proof also in case $\kappa_1 > \kappa_0$. In that case, however, limits of such curves need not be geodesic anymore.
(3) See Section 3.1 for an abstract version of the argument at the end of the preceding proof.
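The curve shortening process $r = r_2 \circ r_1$ of Lemmas 2.3.1 and 2.3.2 can be tried out numerically. The sketch below is a discrete analogue on the unit sphere $S^2$, not the construction of the text itself: a curve is stored through its nodes $P_0, \dots, P_m$ at the parameters $t_j$, the step $r_1$ joins consecutive nodes by shortest great-circle arcs, the points at the intermediate parameters $\tau_j$ are taken to be the arc midpoints, and $r_2$ joins those by shortest arcs again. The zigzag initial curve, the number of nodes and all function names are assumptions made for illustration.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def dist(p, q):
    """Great-circle distance between two points of the unit sphere."""
    cross = (p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0])
    return math.atan2(math.sqrt(sum(c * c for c in cross)), sum(a * b for a, b in zip(p, q)))

def midpoint(p, q):
    """Midpoint of the shortest great-circle arc from p to q (p, q not antipodal)."""
    return normalize(tuple(a + b for a, b in zip(p, q)))

def length(nodes):
    """Length of the piecewise geodesic curve through the given nodes."""
    return sum(dist(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1))

def shorten(nodes):
    """One discrete step r = r_2 o r_1 with fixed endpoints."""
    m = len(nodes) - 1
    # r_1: shortest arcs between consecutive nodes, sampled at the intermediate points
    q = [nodes[0]] + [midpoint(nodes[j - 1], nodes[j]) for j in range(1, m + 1)] + [nodes[m]]
    # r_2: shortest arcs between the intermediate points, sampled at the original parameters
    return [nodes[0]] + [midpoint(q[j], q[j + 1]) for j in range(1, m)] + [nodes[m]]

def run_shortening(steps, m=8):
    """Lengths of the iterates, starting from a zigzag curve from (1,0,0) to (0,1,0)."""
    nodes = []
    for j in range(m + 1):
        angle = 0.5 * math.pi * j / m
        z = 0.3 * (-1) ** j * math.sin(math.pi * j / m)  # vanishes at both endpoints
        nodes.append(normalize((math.cos(angle), math.sin(angle), z)))
    lengths = [length(nodes)]
    for _ in range(steps):
        nodes = shorten(nodes)
        lengths.append(length(nodes))
    return lengths

if __name__ == "__main__":
    lengths = run_shortening(400)
    print(lengths[0], lengths[-1], math.pi / 2)
```

By the triangle inequality each step cannot increase the length, in line with (2.3.5), and for this initial curve the iterates approach the quarter great circle between the two endpoints, whose length is $\pi/2$.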
Exercises

2.1 For curves $\gamma : \mathbb{R} \to \{(x^1, x^2) \in \mathbb{R}^2 \mid x^2 > 0\}$, consider […]. Compute the Euler-Lagrange equations and determine all solutions.

2.2 For curves $\gamma(t) = (\gamma^1, \dots, \gamma^d) : \mathbb{R} \to \{(x^1, \dots, x^d) : |x| < 1\}$, consider […]. Compute the Euler-Lagrange equations and determine all solutions.

2.3 Determine all geodesics between two given points on a cylinder $\{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 = 1\}$.

2.4 […] for a smooth, positive $f : \mathbb{R} \to \mathbb{R}$. What can you say about geodesics on this surface? For example, are the curves $(x, y) = \text{constant}$ geodesics? When are the curves $z = \text{constant}$ geodesics?

2.5 Determine Riemannian polar coordinates on the sphere $S^n$ with a domain of definition that is as large as possible.

2.6 Let $p$ be the center of Riemannian polar coordinates on $M$, with domain of definition $\{v \in \mathbb{R}^d : \|v\| \le \rho\}$. Let $c : [0, \varepsilon] \to M$ be a geodesic with $c(0) = p$ that is parameterized by arc-length, $0 < \varepsilon < \rho$. Show that $c([0, \varepsilon])$ does not contain a point that is conjugate to $p$.

2.7 Let $M$ be a differentiable submanifold of $\mathbb{R}^d$ that is diffeomorphic to $S^2$. Show that for any $p \in M$, there exists a nonconstant geodesic $c : [0, 1] \to M$ with $c(0) = c(1) = p$.

2.8 Try to find other topological classes of manifolds with the property that there always exists more than one geodesic connection between any two points.
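Saddle point constructions

3.1 A finite dimensional example

Let $F : \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^1$ which is bounded from below and which is 'proper' in the following sense:

$$F(x) \to \infty \quad \text{for } |x| \to \infty. \tag{3.1.1}$$

This means that for every $s \in \mathbb{R}$,

$$\{x \in \mathbb{R}^d : F(x) \le s\} \quad \text{is compact.} \tag{3.1.2}$$

$F$ then assumes its infimum. Namely, let $s_0 > \inf_{x \in \mathbb{R}^d} F(x)$. Then $\{x \in \mathbb{R}^d : F(x) \le s_0\}$ is compact and nonempty, and since $F$ is continuous, it has to assume its infimum on that set. We now assume that $F$ even has two relative minima, $x_1$, $x_2$ in $\mathbb{R}^d$, and that they are strict in the following sense: For $x = x_1, x_2$, we have

$$\exists \delta_0 > 0 \ \forall y \text{ with } 0 < |y - x| \le \delta_0 : \quad F(y) > F(x). \tag{3.1.3}$$

Theorem 3.1.1. Under the above assumptions, $F$ has a third critical point $x_3$ with

$$F(x_3) > \max(F(x_1), F(x_2)) =: \kappa_0. \tag{3.1.4}$$

Proof. We consider curves $\gamma : [0, 1] \to \mathbb{R}^d$ with

$$\gamma(0) = x_1, \qquad \gamma(1) = x_2.$$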
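We first observe that there exists $a > 0$ with the property that for any such curve, there exists $t_0 \in (0, 1)$ with

$$F(\gamma(t_0)) \ge \kappa_0 + a. \tag{3.1.5}$$

In order to verify this, we may assume w.l.o.g. $F(x_1) \le F(x_2)$. We then choose $\delta$ with

$$0 < \delta < \min(\delta_0, \tfrac12 |x_1 - x_2|). \tag{3.1.6}$$

By (3.1.3), every $y$ with $|y - x_2| = \delta$ satisfies

$$F(y) > F(x_2),$$

and since $\{|y - x_2| = \delta\}$ is compact, $F$ assumes its minimum on this set, hence for some $a > 0$

$$\min_{|y - x_2| = \delta} F(y) \ge F(x_2) + a = \kappa_0 + a. \tag{3.1.7}$$

Since $|\gamma(0) - x_2| = |x_1 - x_2| > \delta$ and $|\gamma(1) - x_2| = 0$, continuity yields some $t_0 \in (0, 1)$ with

$$|\gamma(t_0) - x_2| = \delta$$

(recall (3.1.6)), and (3.1.5) follows from (3.1.7). We now define

$$\kappa_1 := \inf_\gamma \sup_{t \in [0,1]} F(\gamma(t)), \tag{3.1.8}$$

where the infimum is taken over all curves $\gamma$ from $x_1$ to $x_2$ as above. By (3.1.5), $\kappa_1 \ge \kappa_0 + a > \kappa_0$. We want to show that there exists a critical point $x_3$ of $F$ with $F(x_3) = \kappa_1$. $x_3$ will then necessarily be different from $x_1$ and $x_2$. As a step towards the existence of such a point $x_3$, we claim:

$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall$ curves $\gamma$ with $\gamma(0) = x_1$, $\gamma(1) = x_2$ with

$$\sup_{t \in [0,1]} F(\gamma(t)) \le \kappa_1 + \delta \tag{3.1.9}$$

$\exists t_0 \in [0, 1]$ with

$$F(\gamma(t_0)) \ge \kappa_1 - \varepsilon, \tag{3.1.10}$$
$$|(\nabla F)(\gamma(t_0))| < \varepsilon. \tag{3.1.11}$$

Suppose this is not the case. Then $\exists \varepsilon_0 > 0 \ \forall n \in \mathbb{N} \ \exists$ curve $\gamma_n$ between $x_1$ and $x_2$ with

$$\sup_t F(\gamma_n(t)) \le \kappa_1 + \frac1n, \tag{3.1.12}$$

and $\forall t_0$ with

$$F(\gamma_n(t_0)) \ge \kappa_1 - \varepsilon_0 \tag{3.1.13}$$

we have

$$|(\nabla F)(\gamma_n(t_0))| \ge \varepsilon_0. \tag{3.1.14}$$

We define a new curve $\gamma_{n,s}$ by

$$\gamma_{n,s}(t) := \gamma_n(t) - s (\nabla F)(\gamma_n(t)).$$

Since $x_1$ and $x_2$ are minima of $F$, $\nabla F(x_1) = 0 = \nabla F(x_2)$, and so

$$\gamma_{n,s}(0) = x_1, \qquad \gamma_{n,s}(1) = x_2,$$

so that the curves $\gamma_{n,s}$ are valid comparison curves. By our properness assumption (3.1.2) and (3.1.12), $\gamma_n(t)$ stays in a bounded subset of $\mathbb{R}^d$, and $\nabla F$ will then be bounded on that bounded set, and hence for any $s_0 > 0$ and all $0 \le s \le s_0$, the curves $\gamma_{n,s}(t)$ stay in some bounded set, too. This set is independent of $n$ (as long as $0 \le s \le s_0$, for fixed $s_0 > 0$). By Taylor's formula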
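$$F(\gamma_{n,s}(t)) = F(\gamma_n(t)) - s \nabla F(\gamma_n(t)) \cdot \nabla F(\gamma_n(t)) + o(s).$$

Since $F$ is continuously differentiable and $\gamma_{n,s}(t)$ is contained in a bounded set, $o(s)$ can be estimated independently of $n$ and $t$ (as long as $0 \le s \le s_0$). In particular, after possibly choosing $s_0 > 0$ smaller,

$$F(\gamma_{n,s}(t)) \le F(\gamma_n(t)) - \frac{s}{2} |\nabla F(\gamma_n(t))|^2 \tag{3.1.15}$$

for all $0 \le s \le s_0$ and all $t$ with

$$|\nabla F(\gamma_n(t))| \ge \varepsilon_0, \tag{3.1.16}$$

hence

$$F(\gamma_{n,s_0}(t)) \le F(\gamma_n(t)) - \frac{s_0}{2} \varepsilon_0^2 \tag{3.1.17}$$

for all such $t$ and all $n$. We now simply choose $n$ so large that

$$\frac1n < \frac{s_0}{4} \varepsilon_0^2. \tag{3.1.18}$$

Then by our assumption, all $t_0$ with $F(\gamma_n(t_0)) \ge \kappa_1 - \varepsilon_0$ satisfy (3.1.14), and hence for all such $t_0$, by (3.1.12),

$$F(\gamma_{n,s_0}(t_0)) \le F(\gamma_n(t_0)) - \frac{s_0}{2} \varepsilon_0^2 \le \kappa_1 + \frac1n - \frac{s_0}{2} \varepsilon_0^2 < \kappa_1 - \frac{s_0}{4} \varepsilon_0^2. \tag{3.1.19}$$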
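Having proved (3.1.19), there are now various ways to construct a path $\gamma$ from $x_1$ to $x_2$ with

$$F(\gamma(t)) < \kappa_1 \quad \text{for all } t \in [0, 1]. \tag{3.1.20}$$

One way is to refine the above construction by letting $s$ depend on $t$ as follows: we choose a smooth function

$$\sigma(t) : [0, 1] \to [0, s_0]$$

with

$$\sigma(t) = 0 \quad \text{whenever } F(\gamma_n(t)) \le \kappa_1 - \varepsilon_0$$

and

$$\sigma(t) = s_0 \quad \text{whenever } F(\gamma_n(t)) \ge \kappa_1 - \frac{\varepsilon_0}{2}.$$

We then look at the path $\gamma(t) = \gamma_{n,\sigma(t)}(t)$. Then for $t$ with $F(\gamma_n(t)) \le \kappa_1 - \varepsilon_0$,

$$F(\gamma(t)) = F(\gamma_n(t)) \le \kappa_1 - \varepsilon_0,$$

for $t$ with $\kappa_1 - \varepsilon_0 < F(\gamma_n(t)) < \kappa_1 - \frac{\varepsilon_0}{2}$,

$$F(\gamma(t)) \le F(\gamma_n(t)) < \kappa_1 - \frac{\varepsilon_0}{2}$$

(cf. (3.1.15), (3.1.16), (3.1.14)), and finally for all $t$ with $F(\gamma_n(t)) \ge \kappa_1 - \frac{\varepsilon_0}{2}$,

$$F(\gamma(t)) < \kappa_1$$

(cf. (3.1.19)).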
Thus, (3.1.20) holds indeed. This, however, contradicts the definition of $\kappa_1$. Therefore, the assumption that our claim was not correct led to a contradiction, and the claim holds. It is now simple to prove the theorem. Namely, we let $\varepsilon_n \to 0$ for $n \to \infty$, and for $\varepsilon = \varepsilon_n$, we find $\delta = \delta_n$ as in the claim. We then choose a curve $\gamma_n$ from $x_1$ to $x_2$ with

$$\sup_t F(\gamma_n(t)) \le \kappa_1 + \delta_n \tag{3.1.21}$$

and, by the claim, some $t_n$ with

$$F(\gamma_n(t_n)) \ge \kappa_1 - \varepsilon_n, \tag{3.1.22}$$
$$|(\nabla F)(\gamma_n(t_n))| < \varepsilon_n. \tag{3.1.23}$$

After selection of a subsequence, $(\gamma_n(t_n))_{n \in \mathbb{N}}$ then converges to some point $x_3$, because of (3.1.2) and (3.1.21). Since we may assume $\delta_n \to 0$, $x_3$ then satisfies by continuity of $F$ and $\nabla F$

$$F(x_3) = \kappa_1, \tag{3.1.24}$$
$$\nabla F(x_3) = 0. \tag{3.1.25}$$

Thus, $x_3$ is the desired critical point. q.e.d.

Theorem 3.1.1 may be refined as follows:

Theorem 3.1.2. Let $F$ as above again have two relative minima, not necessarily strict anymore. Then either $F$ has a critical point $x_3$ with

$$F(x_3) > \max(F(x_1), F(x_2)) = \kappa_0,$$

or it has infinitely many critical points.
Proof. For the argument of the proof of Theorem 3.1.1, we only need

$$\kappa_1 = \inf_\gamma \sup_{t \in [0,1]} F(\gamma(t)) > \kappa_0, \tag{3.1.26}$$

where the infimum again is taken over curves $\gamma : [0, 1] \to \mathbb{R}^d$ with $\gamma(0) = x_1$, $\gamma(1) = x_2$. So, suppose that (3.1.26) does not hold. We then want to show the existence of infinitely many critical points. As in the proof of Theorem 3.1.1, we may assume

$$F(x_1) \le F(x_2).$$

The argument at the beginning of the proof of Theorem 3.1.1 then shows that (3.1.26) holds if $x_2$ is a strict relative minimum. If $x_2$ is a relative minimum which is not strict, for all sufficiently small $\delta > 0$, say $\delta \le \delta_0$, we have

$$F(x_2) \le F(x) \quad \text{for all } x \text{ with } |x - x_2| \le \delta \tag{3.1.27}$$

and there always exists some $x_\delta$ with $0 < |x_\delta - x_2| < \delta$ and

$$F(x_\delta) = F(x_2). \tag{3.1.28}$$

We then put $\delta_1 = \delta_0 / 2$. Then $x_{\delta_1}$ is a relative minimum of $F$ by (3.1.27), (3.1.28), hence a critical point. Having found a critical point $x_{\delta_n}$ with $0 < |x_{\delta_n} - x_2| < |x_{\delta_{n-1}} - x_2|$, we put

$$\delta_{n+1} = \tfrac12 |x_{\delta_n} - x_2|$$

and find a critical point $x_{\delta_{n+1}}$ with

$$0 < |x_{\delta_{n+1}} - x_2| < \delta_{n+1}.$$

In this way we obtain infinitely many distinct critical points. q.e.d.

Remark. It is not very hard to sharpen the statement of Theorem 3.1.2 from 'infinitely many' to 'uncountably many'.
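The minimax value $\kappa_1$ and the deformation $\gamma_s(t) = \gamma(t) - s\nabla F(\gamma(t))$ used in the proof of Theorem 3.1.1 can be illustrated on a simple function. The sketch below uses the hypothetical example $F(x, y) = (x^2 - 1)^2 + y^2$, which is proper, of class $C^1$ and has the two strict minima $(\pm 1, 0)$; every curve joining them crosses the line $x = 0$, where $F = 1 + y^2 \ge 1$, so that $\kappa_1 = 1$, attained at the saddle point $(0, 0)$. The comparison curves and all names are assumptions made for illustration.

```python
import math

def F(x, y):
    """Hypothetical example: two strict minima at (-1, 0), (1, 0), saddle at (0, 0)."""
    return (x * x - 1.0) ** 2 + y * y

def grad_F(x, y):
    return 4.0 * x * (x * x - 1.0), 2.0 * y

def path(h, t):
    """Comparison curve from x_1 = (-1, 0) to x_2 = (1, 0), bulging sideways by h."""
    return -1.0 + 2.0 * t, h * math.sin(math.pi * t)

def sup_along(curve, samples=2001):
    """Maximum of F over equally spaced sample points of the curve."""
    return max(F(*curve(i / (samples - 1))) for i in range(samples))

def deformed(curve, s):
    """The deformed curve gamma_s(t) = gamma(t) - s * grad F(gamma(t))."""
    def new_curve(t):
        x, y = curve(t)
        gx, gy = grad_F(x, y)
        return x - s * gx, y - s * gy
    return new_curve

if __name__ == "__main__":
    gamma = lambda t: path(0.5, t)
    print(sup_along(gamma))                 # maximum of F along the original curve
    print(sup_along(deformed(gamma, 0.1)))  # smaller, but never below kappa_1 = 1
```

Along the bulged curve the maximum of $F$ is attained where $\nabla F \neq 0$, so the deformation lowers it; along the straight segment ($h = 0$) the maximum sits at the critical point $(0, 0)$ and cannot be lowered below $\kappa_1$.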
3.2 The construction of Lyusternik-Schnirelman

In this section, we want to prove the following theorem, in order to exhibit some important global construction in the calculus of variations, introduced by Lyusternik-Schnirelman. The result presented is much more elementary than the theorem of Lyusternik-Schnirelman, which says that on any surface with a Riemannian metric, e.g. a surface embedded in some Euclidean space, diffeomorphic to the two-dimensional sphere, there exist at least three closed geodesics without self-intersections. The more elementary character of our setting allows us to bypass essential geometric difficulties encountered in a detailed proof of the Lyusternik-Schnirelman Theorem.

Figure 3.1.
Theorem 3.2.1. Let $\gamma$ be a closed convex Jordan curve† of class $C^1$ in the plane $\mathbb{R}^2$. ($\gamma$ then divides the plane into a bounded region $A$, and an unbounded one, by the Jordan curve theorem. That $\gamma$ is convex means that the straight line between any two points of $\gamma$ is contained in the closure $\bar A$ of $A$.) Then there exist at least two such straight lines between points on $\gamma$ meeting $\gamma$ orthogonally at both end points (see Figure 3.1).
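For an ellipse, the two lines predicted by Theorem 3.2.1 are the two axes. The sketch below, with hypothetical semi-axes 2 and 1, computes for a chord between two points of the ellipse the cosines of the angles it forms with the curve at both end points, and evaluates the lengths in a sweep of the ellipse by vertical chords, a family that starts and ends with a point curve. All names and numerical values are illustrative assumptions.

```python
import math

A_AXIS, B_AXIS = 2.0, 1.0  # hypothetical semi-axes of the ellipse

def point(s):
    return A_AXIS * math.cos(s), B_AXIS * math.sin(s)

def tangent(s):
    return -A_AXIS * math.sin(s), B_AXIS * math.cos(s)

def chord_length(s, t):
    p, q = point(s), point(t)
    return math.hypot(q[0] - p[0], q[1] - p[1])

def chord_cosines(s, t):
    """Cosines of the angles between the chord point(s)point(t) and the curve at both ends."""
    p, q = point(s), point(t)
    cx, cy = q[0] - p[0], q[1] - p[1]
    cl = math.hypot(cx, cy)
    cosines = []
    for u in (s, t):
        tx, ty = tangent(u)
        cosines.append((cx * tx + cy * ty) / (cl * math.hypot(tx, ty)))
    return tuple(cosines)

def sweep_sup(samples=1001):
    """Largest length in the sweep by the vertical chords joining point(theta) and point(-theta)."""
    return max(chord_length(math.pi * i / (samples - 1), -math.pi * i / (samples - 1))
               for i in range(samples))

if __name__ == "__main__":
    print(chord_cosines(0.0, math.pi))                    # major axis
    print(chord_cosines(0.5 * math.pi, 1.5 * math.pi))    # minor axis
    print(chord_cosines(0.0, 0.5 * math.pi))              # a generic chord
    print(sweep_sup())
```

Both cosines vanish, up to rounding, exactly for the two axes, while a generic chord meets the curve obliquely. The longest chord in the vertical sweep is the minor axis, so a line meeting the curve orthogonally at both ends appears as the longest member of a sweeping family and not only as the longest chord overall, which is the major axis.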
Proof. We start by finding one such line. Let $\mathcal{L}$ be the set of all straight lines $l$ in $\bar A$ with $\partial l \subset \gamma$. We say that a sequence $(l_n)_{n \in \mathbb{N}} \subset \mathcal{L}$ converges to $l \in \mathcal{L}$, if the end points of the $l_n$ converge to those of $l$. In order to have a closed space, we allow lines to be trivial, i.e. to consist of a single point on $\gamma$ only. We denote the space of these point curves on $\gamma$ by $\mathcal{L}_0$. We let $I := [0, 1]$ be the unit interval. We consider continuous maps

$$v : I \to \mathcal{L}$$

with the following two properties:

(i) $v(0) = v(1)$.
(ii) To any such family, we may assign two subregions $A_1(t)$ and $A_2(t)$ of $A$ in a certain manner. Namely, we let $A_1(t)$ and $A_2(t)$ be the two regions into which $v(t)$ divides $A$. Having chosen $A_1(0)$ and $A_2(0)$, $A_1(t)$ and $A_2(t)$ then are determined by the continuity requirement. We then require $A_1(1) = A_2(0)$.

† A closed Jordan curve is a curve $\gamma : [0, T] \to \mathbb{R}^2$ with $\gamma(0) = \gamma(T)$ that is injective on $[0, T)$. Cf. the definition of a Jordan curve on p. 35.

Figure 3.2.

We let $V_1$ be the class of all such families $v$. The construction is visualized in Figure 3.2.

Actually, in order to simplify the visualization, if $v(0)$ is a point curve (on $\gamma$), (i) may be relaxed to just requiring that $v(1)$ also is a point curve (on $\gamma$), not necessarily coinciding with $v(0)$ (see Figure 3.3). Namely, any point curves can be connected through point curves, i.e. with vanishing length. We denote by $L(l)$ the length of $l \in \mathcal{L}$ and define

$$\kappa_1 := \inf_{v \in V_1} \sup_{t \in I} L(v(t)).$$

Figure 3.3.

We first claim $\kappa_1 > 0$. For this purpose, let $\rho > 0$ be the inner radius of $\gamma$, i.e. the largest $\rho$ for which there exists a disc $B(x_0, \rho) \subset A$ ($B(x_0, \rho) := \{x \in \mathbb{R}^2 : |x - x_0| < \rho\}$). We let $A_i'(t) := A_i(t) \cap B(x_0, \rho)$, $i = 1, 2$. Because of (ii) and the continuous dependence of $A_i(t)$ and hence also of $A_i'(t)$ on $t$, there exists some $t_0 \in I$ with

$$\operatorname{Area}(A_1'(t_0)) = \operatorname{Area}(A_2'(t_0)).$$

Thus $v(t_0)$ divides $B(x_0, \rho)$ into two subregions of equal area. $v(t_0)$ then has to contain a diameter of $B(x_0, \rho)$, i.e.

$$L(v(t_0) \cap B(x_0, \rho)) = 2\rho.$$

Therefore $\kappa_1 \ge 2\rho > 0$ and $\kappa_1$ is positive indeed. We are now going to show by a line of reasoning already familiar from Sections 2.3 and 3.1 that $\kappa_1$ is realized by a critical point $l_1$ of $L$ among all lines with end points on $\gamma$, i.e. by $l_1$ meeting $\gamma$ orthogonally (see Theorem 1.4.1). For that purpose we shall assume for the moment that $\gamma$ is of class $C^3$. Later on, we shall reduce the case where $\gamma$ is only $C^1$ to the present one by an approximation argument. We now claim
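$$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall v \in V_1 \text{ with } \sup_{t \in I} L(v(t)) < \kappa_1 + \delta \ \ \exists t_0 \in I \text{ with } L(v(t_0)) > \kappa_1 - \varepsilon, \ |\cos(\alpha_1(v(t_0)))| < \varepsilon, \ |\cos(\alpha_2(v(t_0)))| < \varepsilon,$$
where $\alpha_1(l)$ and $\alpha_2(l)$ denote the angles that a line $l$ forms with $\gamma$ at its two endpoints. Otherwise, there exists $\varepsilon_0 > 0$ such that
$$\forall n \in \mathbb{N} \ \exists v_n \in V_1 \text{ with } \sup_{t \in I} L(v_n(t)) < \kappa_1 + \frac{1}{n} \text{ such that } \forall t_0 \text{ with } L(v_n(t_0)) > \kappa_1 - \varepsilon_0: \ \max_{i = 1, 2} |\cos(\alpha_i(v_n(t_0)))| \ge \varepsilon_0.$$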
The idea to reach a contradiction from that assumption is simple, once the following Lemma is proved:

Lemma 3.2.1. For every planar closed Jordan curve $\gamma$ of class $C^3$, there exists $\beta > 0$ with the following property: Whenever $x \in \mathbb{R}^2$ satisfies
$$\operatorname{dist}(x, \gamma) := \inf_{y \in \gamma} |x - y| < \beta,$$
there exists a unique $y \in \gamma$ with $\operatorname{dist}(x, \gamma) = |x - y|$.

Proof. We consider $\gamma$ as an embedded submanifold of the Euclidean plane $\mathbb{R}^2$. $\gamma$ is then covered by the images of charts $f : U \to V$ of the type constructed in Theorem 2.2.1. Here, $U$ and $V$ are open in $\mathbb{R}^2$, and
$$\gamma \cap V = f(U \cap \{x^2 = 0\}).$$
Furthermore, the curves $x^1 = \text{constant}$ in $U$ correspond to geodesics, i.e. straight lines in $V$ perpendicular to $\gamma$, and they form shortest connections to $\gamma \cap V$. By shrinking $U$, if necessary, we may assume that it is of the form $(-\xi, \xi) \times (-\eta, \eta)$, with $\xi > 0$, $\eta > 0$. Since $\gamma$ is compact, it can be covered by finitely many such charts
$$f_i : (-\xi_i, \xi_i) \times (-\eta_i, \eta_i) \to V_i, \quad i = 1, \ldots, m.$$
If we then restrict $f_i$ to $(-\xi_i, \xi_i) \times (-\frac{\eta_i}{2}, \frac{\eta_i}{2})$, the lines $x^1 = \text{constant}$, $-\frac{\eta_i}{2} < x^2 < \frac{\eta_i}{2}$, then correspond to shortest geodesics to $\gamma$, since the part of $\gamma$ not contained in $V_i$ is not contained in the image of $f_i$, and hence has distance at least $\frac{\eta_i}{2}$ from the image of the smaller set $(-\xi_i, \xi_i) \times (-\frac{\eta_i}{2}, \frac{\eta_i}{2})$. This is indicated in Figure 3.4, where the broken lines correspond to $x^2 = \pm\frac{\eta_i}{2}$ and this is depicted for two different indices $i$. Therefore,
$$\beta := \min_{i = 1, \ldots, m} \frac{\eta_i}{2}$$
satisfies the claim. q.e.d.
Figure 3.5.
We now return to the proof of Theorem 3.2.1: Without loss of generality eo < 0 <
n
Assume e.g.
0
cosai (v (to)) > e . The following construction is depicted in Figure 3.5. Choose si(to)
with
$$|s_1(t_0) - p_1(t_0)| = \beta,$$
where $p_1(t_0)$ is the endpoint of $v_n(t_0)$ where it forms the angle $\alpha_1(t_0)$ with $\gamma$. We replace the subarc $v_n^1(t_0)$ of $v_n(t_0)$ between $p_1(t_0)$ and $s_1(t_0)$ by the shortest line segment $v_n'(t_0)$ from $s_1(t_0)$ to $\gamma$. By the theorem of Pythagoras and the convexity of $\gamma$,
$$L(v_n'(t_0)) \le L(v_n^1(t_0)) \sin \alpha_1(v_n(t_0)) < L(v_n^1(t_0)) \sqrt{1 - \varepsilon_0^2}.$$
We then let Vn(to) be the straight line from the second endpoint p (to)
2
of v (to)
n
to the
v (to)
n
between s\(to)
n 0
and p (to),
2 n 0
+ L(v (t )).
n 0 2
L{vl(t ))
Q
< KI-/? +
We then choose n so large that Py/l for some rj > 0. Hence ~4 + Ki-0+^<Ki-ri
whenever
L (v (t)) > Ki - 0
n
whenever
L (v (t)) < K\ 20
n
i = 1,2
foralU.
n
We then choose again the shortest lines from Si (t) to 7 and replace v (t) by the straight line v (t) between those points, where these two shortest lines meet 7. By our geometric argument above
n
(t)) >
KI -
e.
0
(t)),
min(77, (3)
contradicting the definition of $\kappa_1$. Consequently, our claim is correct. We then find a sequence $(t_n)_{n \in \mathbb{N}} \subset I$ and $(v_n)_{n \in \mathbb{N}} \subset V_1$ with
$$\sup_{t \in I} L(v_n(t)) < \kappa_1 + \frac{1}{n}, \quad L(v_n(t_n)) > \kappa_1 - \frac{1}{n}, \quad |\cos(\alpha_1(v_n(t_n)))|, \ |\cos(\alpha_2(v_n(t_n)))| < \frac{1}{n}.$$
A subsequence of $(v_n(t_n))_{n \in \mathbb{N}}$ then converges to a straight line $l_1$ in $A$ of length $\kappa_1$ meeting $\gamma$ orthogonally at its endpoints. In order to construct a second line $l_2$ meeting $\gamma$ orthogonally at its endpoints, we proceed as follows: We denote by $V_2$ the class of all continuous maps
$v : I \times I \to C$ with the property that for every continuous path $\tau : I \to I \times I$,
$$\tau(s) = (t_1(s), t_2(s)), \tag{3.2.1}$$
with
$$t_1(1) = 1 - t_1(0), \quad t_2(0) = 0, \quad t_2(1) = 1, \tag{3.2.2}$$
we have $v \circ \tau \in V_1$.

Figure 3.6.
Let us exhibit an example of such a $v \in V_2$ (see Figure 3.6). We consider the $v_1 \in V_1$ of Figure 3.5 where $v_1(0)$ and $v_1(1)$ were point curves on $\gamma$, and we rotate $v_1$ via the parameter $t_2$ so that at $t_2 = 1$ we have the same picture as at $t_2 = 0$, but with $t_1$ interchanged with $1 - t_1$. Equation (3.2.2) then holds. We note that $I \times I$ becomes a Möbius strip, when we identify the parameter $t_1$ on the line $t_2 = 1$ with the parameter $1 - t_1$ on the line $t_2 = 0$. We define
$$\kappa_2 := \inf_{v \in V_2} \sup_{(t_1, t_2) \in I \times I} L(v(t_1, t_2)).$$
Then
$$\kappa_2 \ge \kappa_1,$$
and $\kappa_2$ again is realized by some straight line $l_2$ in $A$ meeting $\gamma$ orthogonally at its endpoints. We consider two cases:

(1) $\kappa_2 > \kappa_1$. Then $L(l_2) = \kappa_2 > \kappa_1 = L(l_1)$, and $l_2$ hence is different from $l_1$.

(2) $\kappa_2 = \kappa_1$. We claim that in this case, we even get infinitely many solutions of our problem, i.e. lines in $A$ meeting $\gamma$ orthogonally. Namely, we let $v_0 \in V_2$ be any critical family, i.e. satisfying
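$$\sup_{(s, t) \in I \times I} L(v_0(s, t)) = \kappa_2.$$
(It is not hard to see that in the present case such a $v_0 \in V_2$ indeed exists.) We then have for any $\tau : I \to I \times I$ with (3.2.2)
$$\sup_{s \in I} L(v_0 \circ \tau(s)) \le \kappa_2, \tag{3.2.3}$$
and, since $v_0 \circ \tau \in V_1$,
$$\sup_{s \in I} L(v_0 \circ \tau(s)) \ge \kappa_1, \tag{3.2.4}$$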
and since $\kappa_1 = \kappa_2$, we have equality in (3.2.3) and (3.2.4). This means that $v_0 \circ \tau$ is a critical family for $\kappa_1$, and it then has to contain a solution $l$ of our problem. Let $S \subset \{(s, t) \in I \times I \mid L(v_0(s, t)) = \kappa_2\}$ denote the set in $I \times I$ corresponding to all solutions induced by $v_0$. After carrying out the identification prescribed by (3.2.2), which makes $I \times I$ into a Möbius strip, we see that the complement of $S$ in this Möbius strip is not path connected. Namely, otherwise we could find $\tau$ satisfying (3.2.2) for which $\tau(I)$ avoids $S$, and for such a $\tau$, $v_0 \circ \tau$ would then not contain a solution, as $S$ is the set of all solutions in the family $v_0$. This, however, contradicts what has just been said (see Figure 3.7). In fact, $S$ has to carry a one dimensional cycle† on the Möbius strip. Otherwise, $S$ would be contractible (in the Möbius strip) and one could reparameterize $v_0$ on $I \times I$ so that the set of solutions corresponds to a finite number of points. But this is incompatible with $\kappa_2 = \kappa_1$ as we have just seen. Since for each path $\tau$ as in (3.2.2) with $\tau(I) \subset S$, $v_0 \circ \tau \in V_1$ is nonconstant by (3.2.1) and (3.2.2), we obtain an uncountable number of solutions.

We thus have shown our result if $\gamma$ is of class $C^3$. If $\gamma$ is only of class $C^1$, we choose a sequence of curves $\gamma_n$ of class $C^3$ approximating $\gamma$. This means that there are parameterizations $\gamma_n(\tau)$, $\gamma(\tau)$ by arc-length with
$$\lim_{n \to \infty} \sup_{\tau} \left( |\gamma_n(\tau) - \gamma(\tau)| + |\dot\gamma_n(\tau) - \dot\gamma(\tau)| \right) = 0.$$
We then let $l_{1,n}$ and $l_{2,n}$ be the corresponding solutions for $\gamma_n$. After selection of subsequences, $l_{1,n}$ and $l_{2,n}$ then converge to solutions $l_1$, $l_2$ for $\gamma$, and those $l_1$ and $l_2$ realize the critical values $\kappa_1$ and $\kappa_2$, respectively. Since the argument to produce infinitely many solutions in case $\kappa_1 = \kappa_2$
† We have to employ here some constructions from algebraic topology. A reference is any good book on that subject, e.g. M. Greenberg, Lectures on Algebraic Topology, Benjamin, Reading, Mass., 1967, pp. 33-45, 186. While this is somewhat technical, we strongly urge the reader to try to understand the essential geometric idea of the preceding construction.

Figure 3.7.
did not depend on a higher differentiability assumption on $\gamma$, it is still applicable here, and we thus can complete the proof as before. q.e.d.
The variational content of Theorem 3.2.1 is that we produce two geodesics in $\mathbb{R}^2$ that meet a given convex Jordan curve orthogonally. In fact, this statement generalizes to any closed convex Jordan curve on some surface, enclosing a domain homeomorphic to the unit disk. In Sections 2.3 and 3.2, we could only treat variational problems that could be reduced to finite dimensional problems, because we did not yet develop tools to show the existence of critical points of functionals defined on infinite dimensional spaces. We shall develop such tools in Part II, and consequently in Chapter 9 of Part II, we shall be able to present general results about the existence of unstable critical points in the spirit of the preceding results. The crucial notion will be the Palais-Smale condition that guarantees that the type of reasoning presented in Section 3.1 extends to certain functionals defined on infinite dimensional spaces. Also, the reasoning employed in Section 3.2 that infinitely many critical points can be found if two suitable critical values coincide will be given an axiomatic treatment in Section 9.3 of Part II.
Exercises

3.1 Let $F \in C^1(M, \mathbb{R})$ ($M$ an embedded, connected, differentiable submanifold of $\mathbb{R}^d$) be bounded from below and proper (i.e. for all $s \in \mathbb{R}$, $\{x \in M : F(x) \le s\}$ is compact), and suppose $F$ has two relative minima $x_1$, $x_2$. Let
$$\kappa_0 := \max(F(x_1), F(x_2)).$$
Show that $F$ either possesses a critical point $x_3$ with $F(x_3) > \kappa_0$, or that it has uncountably many critical points.

3.2 Let $F \in C^1(\mathbb{R}^d, \mathbb{R})$ be bounded from below and proper, and suppose it has three strict relative minima $x_1$, $x_2$, $x_3$. Try to identify conditions under which $F$ then has to possess more than two additional critical points, e.g. three or four.

3.3 Let $A$ be a compact convex subset of the unit sphere $S^2 \subset \mathbb{R}^3$, and suppose $\partial A$ is a smooth curve $\gamma$; the convexity condition here means that for any two points in $A$, one can find precisely one geodesic arc inside $A$ that connects them. Show the existence of at least two geodesic arcs in $A$ that meet $\gamma$ orthogonally at both endpoints.
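4.1 The canonical equations

We let $t$ be a real parameter varying between $t_1$ and $t_2$. We consider the variational integral
$$I = \int_{t_1}^{t_2} L(t, x^1(t), \ldots, x^n(t), \dot x^1(t), \ldots, \dot x^n(t)) \, dt \tag{4.1.1}$$
for the unknown functions $x(t) = (x^1(t), \ldots, x^n(t))$ with fixed endpoints $x(t_1)$ and $x(t_2)$. Here, $\dot x^i = \frac{dx^i}{dt}$. The Euler-Lagrange equations are
$$\frac{d}{dt} L_{\dot x^i} - L_{x^i} = 0 \quad (i = 1, \ldots, n). \tag{4.1.2}$$
We assume
$$\det \left( L_{\dot x^i \dot x^j} \right)_{i,j = 1, \ldots, n} \ne 0. \tag{4.1.3}$$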
As shown in 1.2, this implies that solutions of (4.1.2) are of class $C^2$. (4.1.3) also implies that we may perform a Legendre transformation. Namely, by the implicit function theorem, we may then locally solve
$$p_i = L_{\dot x^i} \tag{4.1.4}$$
w.r.t. $\dot x^i$, i.e.
$$\dot x^i = \dot x^i(t, x, p) \quad (p = (p_1, \ldots, p_n)). \tag{4.1.5}$$
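The expressions $p_i$ are called momenta. The Hamiltonian $H$ is defined as
$$H(t, x, p) := \dot x^i p_i - L(t, x, \dot x) \tag{4.1.6}$$
(with summation over the repeated index $i$). We obtain
$$H_{x^i} = p_j \frac{\partial \dot x^j}{\partial x^i} - L_{x^i} - L_{\dot x^j} \frac{\partial \dot x^j}{\partial x^i} = -L_{x^i} = -\dot p_i. \tag{4.1.7}$$
Also
$$H_{p_i} = \dot x^i + p_j \frac{\partial \dot x^j}{\partial p_i} - L_{\dot x^j} \frac{\partial \dot x^j}{\partial p_i} = \dot x^i. \tag{4.1.8}$$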
(4.1.7) and (4.1.8) constitute a so-called canonical system. We are going to see that (4.1.7) and (4.1.8) also arise as Euler-Lagrange equations of the variational problem obtained by expressing $L$ in (4.1.1) through $H$ via (4.1.6). Namely,
$$I = \int_{t_1}^{t_2} \left( \dot x^j p_j - H(t, x, p) \right) dt, \tag{4.1.9}$$
where the unknown functions are $x(t)$ and $p(t)$, has Euler-Lagrange equations (4.1.7) and (4.1.8), and so does
$$J = \int_{t_1}^{t_2} \left( x^j \dot p_j + H(t, x, p) \right) dt. \tag{4.1.10}$$
Before proceeding, we observe that if $H$ does not depend explicitly on $t$, i.e. $H = H(x, p)$, then $H$ is a constant of motion, i.e. constant along any solution $x(t)$ of the equations. Namely,
$$\frac{d}{dt} H(x(t), p(t)) = H_{x^i} \dot x^i + H_{p_i} \dot p_i = 0 \tag{4.1.11}$$
by (4.1.7) and (4.1.8).
+ V(x),
This example, which describes the Newtonian motion of a particle of unit mass subject to a potential $V$, is helpful for remembering the signs in the canonical equations.
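The sign pattern of the canonical equations can also be checked numerically. The following is a minimal sketch, assuming the Hamiltonian $H(x, p) = \frac{1}{2} p^2 + V(x)$ of a unit-mass particle, so that $\dot x = H_p = p$ and $\dot p = -H_x = -V'(x)$. The quadratic test potential, the step size, and the leapfrog scheme (a standard symplectic integrator, not a method taken from the text) are illustrative assumptions, as are all function names.

```python
import math


def leapfrog(x, p, grad_V, dt, steps):
    """Integrate x' = H_p = p, p' = -H_x = -V'(x) with the leapfrog scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * grad_V(x)  # half step for p' = -V'(x)
        x += dt * p                # full step for x' = p
        p -= 0.5 * dt * grad_V(x)  # second half step for p
    return x, p


def hamiltonian(x, p, V):
    """H(x, p) = p^2 / 2 + V(x) for a particle of unit mass."""
    return 0.5 * p * p + V(x)


# Hypothetical test potential V(x) = x^2 / 2 (not taken from the text).
V = lambda x: 0.5 * x * x
grad_V = lambda x: x

x0, p0 = 1.0, 0.0
x1, p1 = leapfrog(x0, p0, grad_V, dt=1e-3, steps=10_000)

# H does not depend on t explicitly, so it should stay (nearly) constant.
energy_drift = abs(hamiltonian(x1, p1, V) - hamiltonian(x0, p0, V))
```

With the wrong sign in either equation, the computed trajectory would no longer track the exact solution $x(t) = \cos t$ of this test problem, and $H$ would not stay nearly constant.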
4.2 The Hamilton-Jacobi equation

Assumption. There is given a set $\Omega \subset \mathbb{R}^{n+1} = \{(t, x^1, \ldots, x^n)\}$ with the property that for any points $A, B \in \Omega$, $A = (\sigma, \kappa^1, \ldots, \kappa^n)$, $B = (s, q^1, \ldots, q^n)$, there is a unique solution $x(t) = (x^1(t), \ldots, x^n(t))$ of (4.1.2) contained in $\Omega$ with $(\sigma, x(\sigma)) = A$, $(s, x(s)) = B$.

Thus, $\Omega$ is covered by solutions of (4.1.2), and those can be considered as functions of their endpoints. Thus
$$x^i = f^i(t; s, q^1, \ldots, q^n; \sigma, \kappa^1, \ldots, \kappa^n) \tag{4.2.1}$$
and also
$$p_i = g_i(t; s, q^1, \ldots, q^n; \sigma, \kappa^1, \ldots, \kappa^n) = L_{\dot x^i}. \tag{4.2.2}$$
In particular,
$$\kappa^i = f^i(\sigma; s, q^1, \ldots, q^n; \sigma, \kappa^1, \ldots, \kappa^n), \quad q^i = f^i(s; s, q^1, \ldots, q^n; \sigma, \kappa^1, \ldots, \kappa^n). \tag{4.2.3}$$
We also define
$$\psi_i := g_i(\sigma; s, q^1, \ldots, q^n; \sigma, \kappa^1, \ldots, \kappa^n) = L_{\dot x^i}(\sigma, \kappa, \dot\kappa), \tag{4.2.4}$$
$$v_i := g_i(s; s, q^1, \ldots, q^n; \sigma, \kappa^1, \ldots, \kappa^n) = L_{\dot x^i}(s, q, \dot q).$$
In the sequel, $\dot f^i$ etc. will mean a derivative w.r.t. the first independent variable, $f^i_s$ etc. a derivative w.r.t. the second one. Inserting (4.2.1), (4.2.2) into $I$, we obtain
$$I = I(s, q, \sigma, \kappa) \tag{4.2.5}$$
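and call this expression the geodesic distance between $A$ and $B$. In this connection, $I$ is called eiconal. Recalling (4.1.9), we may write
$$I = \int_\sigma^s \left( \dot x^i p_i - H(t, x, p) \right) dt. \tag{4.2.6}$$
Inserting $x = f$, $p = g$ and differentiating w.r.t. $s$, we obtain
$$I_s = v_i \dot q^i - H(s, q, v) + \int_\sigma^s \left( \frac{\partial g_i}{\partial s} \dot f^i + g_i \dot f^i_s - H_{x^i} f^i_s - H_{p_i} \frac{\partial g_i}{\partial s} \right) dt = v_i \dot q^i - H(s, q, v) + \left[ g_i f^i_s \right]_{t = \sigma}^{t = s},$$
using the canonical equations $\dot f^i = H_{p_i}$, $\dot g_i = -H_{x^i}$. Differentiating (4.2.3) w.r.t. $s$ gives
$$\dot f^i + f^i_s = 0 \ \text{ for } t = s, \qquad f^i_s = 0 \ \text{ for } t = \sigma,$$
hence
$$I_s = v_i \dot q^i - H(s, q, v) - v_i \dot q^i = -H(s, q, v). \tag{4.2.7}$$
Similarly,
$$I_{q^j} = \left[ g_i \frac{\partial f^i}{\partial q^j} \right]_{t = \sigma}^{t = s},$$
and by (4.2.3)
$$\frac{\partial f^i}{\partial q^j} = 0 \ \text{ for } t = \sigma, \qquad \frac{\partial f^i}{\partial q^j} = \delta_{ij} \ \text{ for } t = s.$$
Thus
$$I_{q^j} = v_j = L_{\dot x^j}(s, q, \dot q). \tag{4.2.8}$$
In the same manner,
$$I_\sigma = H(\sigma, \kappa, \psi) = \dot\kappa^k L_{\dot x^k}(\sigma, \kappa, \dot\kappa) - L(\sigma, \kappa, \dot\kappa), \tag{4.2.9}$$
$$I_{\kappa^j} = -\psi_j = -L_{\dot x^j}(\sigma, \kappa, \dot\kappa). \tag{4.2.10}$$
From (4.2.7) and (4.2.8), we conclude
$$I_s + H(s, q, I_q) = 0. \tag{4.2.11}$$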
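Thus, the geodesic distance as function of the endpoint satisfies (4.2.11), a Hamilton-Jacobi equation. In the present context that equation then is also called eiconal equation. We observed at the end of Section 4.1 that $H$ is constant along solutions if it does not depend on $t$ explicitly. In that case, (4.2.11) implies that $I$ then depends linearly on $s$.

It may be useful for understanding the preceding formulae if we derive them without the use of the Legendre transformation. Thus
$$I = \int_\sigma^s L(t, x(t), \dot x(t)) \, dt = \int_\sigma^s L(t, f, \dot f) \, dt$$
and
$$I_s = L(s, q, \dot q) + \int_\sigma^s \left( L_{x^i} f^i_s + L_{\dot x^i} \dot f^i_s \right) dt = L(s, q, \dot q) + \left[ L_{\dot x^i} f^i_s \right]_{t = \sigma}^{t = s}$$
by (4.1.2), and so, since
$$\dot f^i + f^i_s = 0 \ \text{ for } t = s, \qquad f^i_s = 0 \ \text{ for } t = \sigma,$$
hence
$$I_s = L(s, q, \dot q) - L_{\dot q^i} \dot q^i.$$
In the same way, $I_{q^i} = L_{\dot q^i}$, and therefore
$$I_s - L(s, q, \dot q) + I_{q^i} \dot q^i = 0. \tag{4.2.12}$$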
We have seen in the preceding how solutions of the canonical equations yield solutions of the Hamilton-Jacobi equations. We now want to establish a converse result. Let $\varphi(t, x^1, \ldots, x^n)$ be a solution of the Hamilton-Jacobi equation which we now write as
$$p_0 + H(t, x^1, \ldots, x^n, p_1, \ldots, p_n) = 0 \tag{4.2.13}$$
with
$$p_0 = \varphi_t, \quad p_i = \varphi_{x^i}.$$
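Definition 4.2.1. If
$$\varphi = G(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n) \tag{4.2.14}$$
with $G \in C^2$ and
$$\det \left( G_{x^i \lambda_j} \right)_{i,j = 1, \ldots, n} \ne 0 \tag{4.2.15}$$
is a solution of (4.2.13) depending on the parameters $\lambda_1, \ldots, \lambda_n$, we call
$$\varphi = G(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n) + \lambda \tag{4.2.16}$$
(where $\lambda$ is a free real parameter) a complete integral of (4.2.13).

We have the following theorem of Jacobi:

Theorem 4.2.1. Let $\varphi = G(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n) + \lambda$ be a complete integral of (4.2.13). Then one may obtain a family of solutions of the canonical equations
$$\dot x^i = H_{p_i}, \tag{4.2.17}$$
$$\dot p_i = -H_{x^i} \tag{4.2.18}$$
depending on $2n$ parameters $\lambda_1, \ldots, \lambda_n$, $\mu^1, \ldots, \mu^n$ by solving
$$G_{\lambda_i} = \mu^i, \tag{4.2.19}$$
$$G_{x^i} = p_i. \tag{4.2.20}$$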
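Proof. Because of (4.2.15), (4.2.19) may be solved w.r.t. $x^i$,
$$x^i = x^i(t, \lambda_1, \ldots, \lambda_n, \mu^1, \ldots, \mu^n).$$
Inserting this into (4.2.20) then yields
$$p_i = p_i(t, \lambda_1, \ldots, \lambda_n, \mu^1, \ldots, \mu^n).$$
We have to show that $x^i$ and $p_i$ satisfy the canonical equations. For this purpose, we differentiate (4.2.13) w.r.t. $x^i$ and obtain:
$$G_{t x^i} + H_{p_k} G_{x^k x^i} + H_{x^i} = 0. \tag{4.2.21}$$
Differentiating (4.2.13) w.r.t. $\lambda_i$ yields
$$G_{t \lambda_i} + H_{p_k} G_{x^k \lambda_i} = 0, \tag{4.2.22}$$
while differentiating (4.2.19) w.r.t. $t$ yields
$$G_{\lambda_i t} + G_{\lambda_i x^k} \dot x^k = 0. \tag{4.2.23}$$
Comparing (4.2.22) and (4.2.23) and recalling (4.2.15) yields (4.2.17). Differentiating (4.2.20) w.r.t. $t$, we obtain
$$\dot p_i = G_{x^i t} + G_{x^i x^k} \dot x^k. \tag{4.2.24}$$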
Comparing (4.2.24) and (4.2.21) and using the relation (4.2.17) just derived, we then obtain (4.2.18). q.e.d.

The canonical equations are a system of ODE whereas the Hamilton-Jacobi equation is a first order partial differential equation (PDE). The preceding considerations show the equivalence of these equations. While in general, one may consider a PDE as being more difficult than a system of ODE, in applications, one may often find a solution of the canonical equations by solving the Hamilton-Jacobi equation. Here, it is typically of great help that the Hamilton-Jacobi equation does not depend on the unknown function itself, but only on its derivatives.

Let us consider the following example of geometric optics:
$$I = \int_{t_1}^{t_2} \varphi(t, x) \sqrt{1 + \dot x^2} \, dt \quad (\varphi(t, x) > 0),$$
already explained in Example (3) of Section 1.1 in a slightly different notation. The physical meaning is that $x(t)$ is considered as the graph of a light ray travelling in a medium with light velocity $\frac{c}{\varphi(t, x)}$, where $c$ is the velocity of light in vacuum. In this example, putting $L(t, x, \dot x) = \varphi(t, x) \sqrt{1 + \dot x^2}$, we have
$$p = L_{\dot x} = \frac{\varphi \dot x}{\sqrt{1 + \dot x^2}}, \tag{4.2.25}$$
$$H = p \dot x - L = -\sqrt{\varphi^2 - p^2}. \tag{4.2.26}$$
$I(s, q, \sigma, \kappa)$ here is the time that a light ray needs to travel from $A = (\sigma, \kappa)$ to $B = (s, q)$. The Hamilton-Jacobi equation $I_s + H(s, q, I_q) = 0$ becomes the eiconal equation
$$I_s^2 + I_q^2 = \varphi^2. \tag{4.2.27}$$
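The eiconal equation (4.2.27) can be sanity-checked numerically in the simplest case. The sketch below assumes a homogeneous medium, i.e. a constant $\varphi$, for which the travel time from $A = (\sigma, \kappa)$ to $B = (s, q)$ is $\varphi$ times the Euclidean distance; it approximates $I_s$ and $I_q$ by central differences. The constant `PHI`, the base point, and the helper names are hypothetical choices made for this illustration only.

```python
import math


def eikonal_residual(I, s, q, phi, h=1e-5):
    """Return I_s^2 + I_q^2 - phi^2 at (s, q), using central differences."""
    I_s = (I(s + h, q) - I(s - h, q)) / (2.0 * h)
    I_q = (I(s, q + h) - I(s, q - h)) / (2.0 * h)
    return I_s ** 2 + I_q ** 2 - phi ** 2


PHI = 1.5                  # assumed constant value of phi
SIGMA, KAPPA = 0.0, 0.0    # assumed starting point A = (sigma, kappa)


def travel_time(s, q):
    """Geodesic distance I(s, q) from A in a homogeneous medium."""
    return PHI * math.hypot(s - SIGMA, q - KAPPA)


residual = eikonal_residual(travel_time, 2.0, 1.0, PHI)
```

The residual vanishes up to discretization and rounding error, and the level sets of `travel_time` are circles around $A$.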
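The surfaces $I(s, q) = \text{constant}$ are called wave fronts.

Another simple example comes from a quadratic
$$L(t, x, \dot x) = \frac{1}{2} (\dot x^2 + a x^2) \quad (a = \text{constant}). \tag{4.2.28}$$
Then
$$p = L_{\dot x} = \dot x, \quad H = p \dot x - L = \frac{1}{2} (p^2 - a x^2), \tag{4.2.29}$$
and the Hamilton-Jacobi equation is
$$I_t + \frac{1}{2} (I_x^2 - a x^2) = 0. \tag{4.2.30}$$
If we substitute
$$I = \varphi(t) + \psi(x), \tag{4.2.31}$$
we obtain $\psi'(x)^2 - a x^2 = -2 \varphi'(t)$, so that both sides equal a constant $2\lambda$, and
$$I = -\lambda t + \int \sqrt{a x^2 + 2\lambda} \, dx. \tag{4.2.32}$$
By Jacobi's theorem, solutions are obtained from $I_\lambda = \mu$, i.e.
$$-t + \int \frac{dx}{\sqrt{a x^2 + 2\lambda}} = \mu.$$
This can be solved for $x$; let us assume for example $a < 0$; then the solution is
$$x = \sqrt{\frac{2\lambda}{-a}} \, \sin \left( \sqrt{-a} \, (t + \mu) \right).$$
$x$ of course solves the Euler-Lagrange equation for (4.2.28),
$$\ddot x = a x.$$
A physical realization is the harmonic oscillator, where $x(t)$ is the displacement of an oscillating spring, with $a = -\frac{k}{m}$ ($m$ = mass, $k$ = spring constant). Since
$$p = I_x, \quad I_t + H(x, I_x) = 0,$$
we obtain from (4.2.32)
$$\lambda = H(x, p),$$
i.e. $\lambda$ is the energy of the spring.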
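The oscillator example can be verified by direct substitution. The following sketch assumes $a < 0$ and the solution family $x(t) = \sqrt{2\lambda / (-a)} \, \sin(\sqrt{-a} \, (t + \mu))$; it checks by finite differences that $\ddot x = a x$ and that the Hamiltonian $\frac{1}{2}(p^2 - a x^2)$ with $p = \dot x$ equals $\lambda$. The sample values of $a$, $\lambda$, $\mu$ and the helper names are hypothetical.

```python
import math


def oscillator(t, a, lam, mu):
    """Candidate solution x(t) for a < 0 with energy lam and phase shift mu."""
    omega = math.sqrt(-a)
    return math.sqrt(2.0 * lam) / omega * math.sin(omega * (t + mu))


def first_derivative(f, t, h=1e-4):
    """Central difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def second_derivative(f, t, h=1e-4):
    """Central difference approximation of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


A, LAM, MU = -4.0, 3.0, 0.25  # assumed sample values, not from the text
x = lambda t: oscillator(t, A, LAM, MU)

t0 = 0.6
ode_residual = second_derivative(x, t0) - A * x(t0)  # should vanish
p = first_derivative(x, t0)                          # momentum p = x'
energy = 0.5 * (p * p - A * x(t0) ** 2)              # H(x, p)
```

Up to discretization error, `ode_residual` is zero and `energy` agrees with the parameter `LAM`, in line with the interpretation of $\lambda$ as the energy.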
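4.3 Geodesics

We consider the case where $L$ is homogeneous of degree 1 in $\dot x$, i.e.
$$L(t, x, \lambda \dot x) = \lambda L(t, x, \dot x) \quad \text{for } \lambda \ge 0. \tag{4.3.1}$$
Then
$$\det L_{\dot x^i \dot x^j} = 0, \tag{4.3.2}$$
and a Legendre transformation cannot be performed. By Euler's relation for homogeneous functions, however,
$$\dot x^i L_{\dot x^i} = L, \tag{4.3.3}$$
and the computations of Section 4.2 yield (writing $L_{\dot q^i}$ instead of $p_i$ etc.)
$$I_s = L(s, q, \dot q) - \dot q^i L_{\dot q^i} = 0. \tag{4.3.4}$$
An example are the geodesic lines considered in Chapter 2. Here,
$$L = \sqrt{Q} \quad \text{with } Q = g_{ij}(x^1, \ldots, x^n) \dot x^i \dot x^j. \tag{4.3.5}$$
The Euler-Lagrange equations are
$$\frac{d}{dt} \left( \frac{Q_{\dot x^i}}{\sqrt{Q}} \right) - \frac{Q_{x^i}}{\sqrt{Q}} = 0. \tag{4.3.6}$$
Since $t$ does not occur explicitly in (4.3.5) and since $I$ is invariant under transformations of $t$, we may choose $t$ such that
$$Q = 1, \tag{4.3.7}$$
i.e. that solutions are parameterized by arc-length. Equation (4.3.6) then becomes
$$\frac{d}{dt} Q_{\dot x^i} - Q_{x^i} = 0. \tag{4.3.8}$$
Conversely, along a solution of (4.3.8), we have $Q = \text{constant}$, justifying our choice of $t$. Namely, $Q$ is homogeneous of degree 2 w.r.t. the variables $\dot x^i$, hence
$$Q_{\dot x^i} \dot x^i = 2Q. \tag{4.3.9}$$
Therefore, along a solution of (4.3.8),
$$2 \frac{dQ}{dt} = \frac{d}{dt} \left( Q_{\dot x^i} \dot x^i \right) = Q_{x^i} \dot x^i + Q_{\dot x^i} \ddot x^i = \frac{dQ}{dt},$$
and hence $\frac{dQ}{dt} = 0$.

As already demonstrated in 2.1, (4.3.8) are the Euler-Lagrange equations for
$$E = \frac{1}{2} \int_\sigma^s Q(x(t), \dot x(t)) \, dt = \frac{1}{2} \int_\sigma^s g_{ij}(x(t)) \dot x^i(t) \dot x^j(t) \, dt. \tag{4.3.10}$$
By the Schwarz inequality,
$$\int_\sigma^s \sqrt{Q} \, dt \le (s - \sigma)^{\frac{1}{2}} \left( \int_\sigma^s Q \, dt \right)^{\frac{1}{2}}$$
with equality precisely if $Q = \text{constant}$, and the extremals of $E$ are precisely those extremals of $I$ parameterized proportionally to arc-length. In contrast to $I$, $E$ is no longer invariant under transformations of $t$. Therefore, for solutions of the Euler-Lagrange equations corresponding to $E$, the parameterization is determined up to a constant factor. The Hamiltonian for $E$ is
$$H = \frac{1}{2} Q_{\dot x^i} \dot x^i - \frac{1}{2} Q = \frac{1}{2} Q \quad \text{because of (4.3.9)}. \tag{4.3.11}$$
Moreover,
$$p_i = \frac{1}{2} Q_{\dot x^i} = g_{ij} \dot x^j. \tag{4.3.12}$$
Thus
$$H = \frac{1}{2} g^{ij} p_i p_j, \tag{4.3.13}$$
where $(g^{ij})$ denotes the inverse matrix of $(g_{ij})$. The Hamilton-Jacobi equation becomes
$$I_t + \frac{1}{2} g^{ij} I_{x^i} I_{x^j} = 0, \tag{4.3.14}$$
and the canonical equations are
$$\dot x^i = g^{ij} p_j, \quad \dot p_i = -\frac{1}{2} \frac{\partial g^{jk}}{\partial x^i} p_j p_k. \tag{4.3.15}$$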
-1 - - )' ( - - )> ( - - )-
4.4 Fields of extremals

For a given function $T \in C^1(\mathbb{R}^{n+1})$, the equation
$$T(\sigma, \kappa^1, \ldots, \kappa^n) = 0 \tag{4.4.1}$$
then defines a possibly degenerate hypersurface $\Sigma$ (assume $\Sigma \ne \emptyset$). Given $B = (s, q^1, \ldots, q^n) \in \Omega$, we seek $A = (\sigma, \kappa^1, \ldots, \kappa^n) \in \Sigma$ that minimizes
$$I(s, q^1, \ldots, q^n, \sigma, \kappa^1, \ldots, \kappa^n)$$
as a function of $(\sigma, \kappa^1, \ldots, \kappa^n)$ satisfying (4.4.1). At such a minimizing $A$, we have with some Lagrange multiplier $\lambda$
$$I_\sigma + \lambda T_\sigma = 0, \qquad I_{\kappa^j} + \lambda T_{\kappa^j} = 0 \quad (j = 1, \ldots, n). \tag{4.4.2}$$
Unless the situation is degenerate ($\lambda = 0$ or $T_\sigma = T_{\kappa^i} = 0$ for all $i$), this means that the vector $(I_\sigma, I_{\kappa^1}, \ldots, I_{\kappa^n})$ is proportional to the gradient of $T$, hence orthogonal to $\Sigma$. From (4.2.9), (4.2.10), we then obtain
$$L - \dot\kappa^k L_{\dot x^k} = \lambda T_\sigma, \qquad L_{\dot x^j}(\sigma, \kappa, \dot\kappa) = \lambda T_{\kappa^j}. \tag{4.4.3}$$
These are equations for the tangent vector $(\dot\kappa^1, \ldots, \dot\kappa^n)$ of the solution from $A$ to $B$. A solution satisfying (4.4.3) is called orthogonal to $\Sigma$. We want to use the following:

Assumption. Through each point of $\Omega$, there is precisely one solution orthogonal to $\Sigma$.

For each $B = (s, q^1, \ldots, q^n)$, we thus find a unique $A = (\sigma(s, q), \kappa(s, q)) \in \Sigma$ minimizing $I(s, q, \sigma, \kappa)$. We call
$$J(s, q) := I(s, q, \sigma(s, q), \kappa(s, q))$$
the geodesic distance from the hypersurface $\Sigma$.

Theorem 4.4.1. Given such a field of solutions orthogonal to $\Sigma$, the geodesic distance satisfies
$$J_s = -H(s, q, L_{\dot q}) \tag{4.4.4}$$
and
$$J_{q^i} = L_{\dot q^i}, \tag{4.4.5}$$
hence
$$J_s + H(s, q, J_q) = 0. \tag{4.4.6}$$
Proof.
$$J_s = I_s + I_\sigma \sigma_s + I_{\kappa^i} \kappa^i_s. \tag{4.4.7}$$
Differentiating $T(\sigma(s, q), \kappa(s, q)) = 0$ w.r.t. $s$ gives
$$T_\sigma \sigma_s + T_{\kappa^i} \kappa^i_s = 0,$$
and with (4.4.2), we conclude $J_s = I_s$, and likewise
$$J_{q^i} = I_{q^i},$$
and the result follows from (4.2.7), (4.2.8), (4.2.11). q.e.d.

Conversely

Theorem 4.4.2. If $J(s, q)$ is a solution of (4.4.6) of class $C^2$, there exists a field of solutions orthogonal to the hypersurfaces $J(s, q) = \text{constant}$, and $J$ is the geodesic distance from the hypersurface $J = 0$.

Proof. Let $J$ satisfy (4.4.6). We put
$$p_i := J_{q^i}(s, q). \tag{4.4.8}$$
The following system of ODE
$$\dot q^i = H_{p_i}(s, q, J_q) \tag{4.4.9}$$
then defines an $n$-parameter family of curves. By (4.4.8), we have along any such curve
$$\dot p_i = J_{q^i s} + J_{q^i q^j} \dot q^j = J_{q^i s} + J_{q^i q^j} H_{p_j},$$
and differentiating (4.4.6) w.r.t. $q^i$ gives
$$J_{s q^i} + H_{q^i} + H_{p_j} J_{q^j q^i} = 0,$$
hence
$$\dot p_i = -H_{q^i}. \tag{4.4.10}$$
Equations (4.4.9) and (4.4.10) state that the curves $q(s)$ constitute a field of solutions. (4.4.6) and (4.4.8) yield
$$-H = J_s, \qquad p_j = J_{q^j}.$$
This means that (4.4.3) is satisfied for $T = J$ with $\lambda = 1$, and the solutions are orthogonal to the hypersurfaces $J = \text{constant}$. q.e.d.

Theorem 4.4.1 gives solutions of the Hamilton-Jacobi equation (4.4.6) depending on an arbitrarily given function $T \in C^1(\mathbb{R}^{n+1})$ (namely, we obtain those solutions that start on $T = 0$), whereas Theorem 4.4.2 implies that all solutions are obtained in that way. The surfaces $J = \text{constant}$ are called parallel surfaces of the field. In the special case where the hypersurface $T = 0$ degenerates into a point, we recover the considerations of Section 4.2.
4.5 Hilbert's invariant integral and Jacobi's theorem

For a solution $J(t, x^1, \ldots, x^n)$ of the Hamilton-Jacobi equation, we put again
$$p_i := J_{x^i}(t, x^1, \ldots, x^n).$$
If $A = (\sigma, \kappa^1, \ldots, \kappa^n)$ and $B = (s, q^1, \ldots, q^n)$ are connected by an arbitrary differentiable path $x^i(\tau)$, the integral
$$J(B) - J(A) = \int_\sigma^s \frac{d}{d\tau} J(\tau, x(\tau)) \, d\tau = \int_\sigma^s \left( J_{x^i} \frac{dx^i}{d\tau} + J_\tau \right) d\tau$$
does not depend on this particular path, but only on the end points $A$ and $B$. We rewrite this integral as
$$\int_\sigma^s \left( p_i \frac{dx^i}{d\tau} - H(\tau, x(\tau), p(\tau)) \right) d\tau \tag{4.5.1}$$
and call it Hilbert's invariant integral. Conversely now let functions $p_i(\tau, x^1, \ldots, x^n)$ be given in a region $\Omega \subset \mathbb{R}^{n+1}$ for which the integral (4.5.1) does not depend on the path $x(\tau)$ connecting $A = (\sigma, x(\sigma))$ and $B = (s, x(s))$. Thus, we may define $J : \Omega \to \mathbb{R}$ by
$$J(B) - J(A) = \int_\sigma^s \left( p_i \frac{dx^i}{d\tau} - H(\tau, x(\tau), p(\tau)) \right) d\tau. \tag{4.5.2}$$
Since this integral does not depend on the path connecting $A$ and $B$, we must have
$$J_{x^i} = p_i, \tag{4.5.3}$$
$$J_t = -H(t, x, p).$$
$J$ then solves the Hamilton-Jacobi equation. By Theorem 4.4.2, any solution of the Hamilton-Jacobi equation is the geodesic distance function for a field of solutions of the canonical equations. Thus, any invariant integral of the form (4.5.1) yields a field of solutions.

Let us now reconsider Jacobi's Theorem 4.2.1. Let
$$\varphi = G(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n) + \lambda \tag{4.5.4}$$
be a complete integral of
$$\varphi_t + H(t, x^1, \ldots, x^n, p_1, \ldots, p_n) = 0 \tag{4.5.5}$$
($p_i = \varphi_{x^i}$); in particular
$$\det \left( G_{x^i \lambda_j} \right) \ne 0. \tag{4.5.6}$$
Jacobi's theorem says that we obtain a $2n$-parameter family of solutions of the canonical equations by solving
$$G_{\lambda_i} = \mu^i, \qquad G_{x^i} = p_i,$$
where the parameters are $\lambda_1, \ldots, \lambda_n$, $\mu^1, \ldots, \mu^n$. For fixed values of $\lambda_1, \ldots, \lambda_n$, $\lambda$, $G$ determines a field of solutions of the canonical equations, and by the preceding consideration, it is given by the corresponding invariant integral
$$G(B) - G(A) = \int_\sigma^s \left( G_{x^i} \frac{dx^i}{d\tau} - H \right) d\tau = \int_\sigma^s \left( L(\tau, x(\tau), \dot x(\tau)) + \left( \frac{dx^i}{d\tau} - \dot x^i(\tau) \right) L_{\dot x^i} \right) d\tau, \tag{4.5.7}$$
where $\dot x^i(\tau)$ now denotes the derivative in the direction of the solution and not in the direction of the arbitrary curve $x^i(\tau)$ connecting $A$ and $B$. We now vary $\lambda_1, \ldots, \lambda_n$, but keep the curve $x^i(\tau)$ fixed. Then the field of solutions varies, and so then does $\dot x^i(\tau)$. We also determine $\lambda$ so that $G(A) = 0$. Differentiating (4.5.7) then yields
$$G_{\lambda_i}(B) = \int_\sigma^s \left( \frac{dx^j}{d\tau} - \dot x^j \right) L_{\dot x^j \dot x^k} \frac{\partial \dot x^k}{\partial \lambda_i} \, d\tau. \tag{4.5.8}$$
In the same way as $G(B)$, this expression only depends on $B$ ($A$ is kept fixed for the moment) but not on the particular $x^i(\tau)$. For each $B$, we find $B_0$ on the surface
$$G = 0$$
that can be connected with $B$ by a solution of the canonical equations. Along such a solution, we have
$$\frac{dx^j}{d\tau} = \dot x^j,$$
and the integrand in (4.5.8) thus vanishes along this curve. Instead of integrating from $A$ to $B$, it therefore suffices to integrate from $A$ to $B_0$, and we obtain
$$G_{\lambda_i} = \mu^i \tag{4.5.9}$$
with $\mu^i$ being the value of the integral from $A$ to $B_0$. Thus, $\mu^i$ can be considered as a constant for the solution passing through $B_0$. If, conversely, (4.5.9) defines a family of curves $x^i(t, \lambda_j, \mu^j)$ (the family is locally unique because of (4.5.6)), then, since $G_{\lambda_i}$ is constant, the integrand in (4.5.8) has to vanish along any curve of the family. Thus
$$\left( \frac{dx^j}{dt} - \dot x^j \right) L_{\dot x^j \dot x^k} \frac{\partial \dot x^k}{\partial \lambda_i} = 0 \quad (i = 1, \ldots, n). \tag{4.5.10}$$
Since
$$\det \left( L_{\dot x^j \dot x^k} \frac{\partial \dot x^k}{\partial \lambda_i} \right) = \det G_{x^j \lambda_i} \ne 0,$$
this means that the curves defined by (4.5.9) are solutions of the canonical equations contained in the field defined by $G(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n)$. We also observe that the parameter $\lambda$ is only used for specifying the surface $G = 0$ and has no geometric meaning beside that.
4.6 Canonical transformations
-> R
H-> (,7r),
Equation (4.6.1) constitutes a system of ODE and if the assumptions of the Picard-Lindelöf theorem are satisfied, a solution exists for given initial values $x(t_0) = x_0$, $p(t_0) = p_0$ on some interval $[t_0, t_1]$. For any $t \in [t_0, t_1]$, we then obtain such a transformation by letting $\xi(x, p) = x(t)$, $\pi(x, p) = p(t)$, where $(x(t), p(t))$ is the solution of (4.6.1) with $x(t_0) = x$, $p(t_0) = p$. Thus, the evolution of (4.6.1) in time $t$, the so-called Hamiltonian flow, yields 'canonical transformations'. However, the concept of canonical transformations is more general as we now shall see. Since

† A diffeomorphism is a bijective map that together with its inverse is everywhere differentiable.
96 and
dx
r
d7T4
+ HPi
dx
dp
~
drci dpi dp dx
i
dp or in matrix notation
dp
0T 7
T
(r
l-(ff)
T
-()
(4.6.4)
where $A^T$ denotes the transpose of a matrix $A$. Obviously, this is a condition that does not depend anymore on the particular Hamiltonian $H$.

Definition 4.6.1. A diffeomorphism† $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$, $(x, p) \mapsto (\xi, \pi)$, satisfying (4.6.3) (or equivalently (4.6.4)) is called canonical transformation.

Canonical transformations can often be used to simplify the canonical equations. Before we return to that topic, however, we interrupt the discussion of the Hamilton-Jacobi theory in order to describe some basic points of symplectic geometry (for more information on that subject, we refer to D. McDuff, D. Salamon, Introduction to Symplectic Topology, Oxford University Press, Oxford, 1995). We denote the $(n \times n)$ unit matrix by $I_n$ and put
$$J := \begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}.$$
Then obviously
$$J^2 = -I_{2n}. \tag{4.6.5}$$
Equation (4.6.4) may then be written as
$$(D\psi)^{-1} = -J (D\psi)^T J, \tag{4.6.6}$$
or equivalently
$$(D\psi)^T J D\psi = J. \tag{4.6.7}$$
In this connection, a $\psi$ satisfying (4.6.7), i.e. a canonical transformation, is also called symplectomorphism. From these relations, one also easily sees that $\psi$ is a canonical transformation iff $\psi^{-1}$ is. In terms of $J$, the canonical equations (4.6.1) can also be written as
$$\dot z = -J \nabla H(t, z) \tag{4.6.8}$$
where $z = (x, p)$, $\nabla H(t, z) = (H_x, H_p)$. For a reader who knows the calculus of exterior differential forms, the following explanation should be useful. We consider the two-form
$$\omega = dx^i \wedge dp_i \quad \text{on } \mathbb{R}^{2n}.$$
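Then $\psi$ leaves $\omega$ invariant, i.e.
$$d\xi^i \wedge d\pi_i = dx^i \wedge dp_i,$$
precisely if $\psi$ is a canonical transformation. In fact, this is often used as the definition of a canonical transformation. If $\omega$ is left invariant under $\psi$, so is
$$\omega^n := \underbrace{\omega \wedge \cdots \wedge \omega}_{n \text{ times}} = n! \, (-1)^{\frac{n(n-1)}{2}} \, dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n. \tag{4.6.10}$$
Since
$$d\xi^1 \wedge \cdots \wedge d\xi^n \wedge d\pi_1 \wedge \cdots \wedge d\pi_n = \det(D\psi) \, dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n,$$
we conclude Liouville's:

Theorem 4.6.1. Every canonical transformation $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ satisfies
$$\det D\psi = 1. \tag{4.6.11}$$
q.e.d.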
One also expresses this result by saying that a canonical transformation is volume preserving in phase space as $dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n$ can be interpreted as the volume form of $\mathbb{R}^{2n}$. By what was observed in the beginning of this section, this applies in particular to the Hamiltonian flow, which constitutes Liouville's original statement.

After this excursion and interruption, we return to our canonical equations (4.6.1) and try to simplify them by suitable canonical transformations. Canonical transformations may be easily obtained from the variational integral
$$I = \int L(t, x, \dot x) \, dt = \int \left( \dot x \cdot p - H(t, x, p) \right) dt \quad (p = L_{\dot x}).$$
Namely, the critical paths of $I$ are not changed if we subtract a total derivative from the integrand, i.e. consider
$$I^* = \int \left( \dot x \cdot p - H(t, x, p) - \frac{dW}{dt} \right) dt,$$
so that $I^*$ and $I$ differ only by a constant independent of the particular path $x(t)$. Thus, we may for example take any function $W(t, x, \xi)$ and require that for all choices of $x$, $\xi$, $\dot x$, $\dot\xi$,
$$\dot x \cdot p - H(t, x, p) - \frac{dW}{dt} = \dot\xi \cdot \pi - H^*(t, \xi, \pi). \tag{4.6.12}$$
The transformed path $(\xi(t), \pi(t))$ of a solution $(x(t), p(t))$ then becomes a critical path for $I^*$. Since
$$\frac{dW}{dt} = W_t + W_x \cdot \dot x + W_\xi \cdot \dot\xi,$$
(4.6.12) becomes
$$\dot x \cdot (p - W_x) - \dot\xi \cdot (\pi + W_\xi) - H + H^* - W_t = 0. \tag{4.6.13}$$
Since (4.6.13) is required to hold for all choices of $x$, $\xi$, $\dot x$, $\dot\xi$, we obtain:

Theorem 4.6.2. Given an arbitrary (differentiable) function $W(t, x, \xi)$, a canonical transformation (transforming (4.6.1) into (4.6.2)) is obtained through the equations
$$p = W_x, \qquad \pi = -W_\xi, \qquad H^* = H, \text{ i.e. } W_t = 0. \tag{4.6.14}$$
$W_t = 0$ of course means that $W = W(x, \xi)$. In the same manner, we may also take a function $W(t, p, \xi)$, $W(t, x, \pi)$ or $W(t, p, \pi)$. In the first case, we obtain for example the equations
$$x = -W_p, \qquad H^* = H, \text{ i.e. } W_t = 0.$$
Here and above, of course $H^* = H^*(t, \xi, \pi)$.

We may now easily explain Jacobi's method for solving the canonical equations. We try to find $W(x, \xi)$ satisfying
$$H(t, x, W_x(x, \xi)) = H^*(\xi), \tag{4.6.15}$$
i.e. reduce the Hamiltonian to a function of the variable $\xi$ alone. We have to require that
$$\det W_{x^i \xi^j} \ne 0. \tag{4.6.16}$$
This ensures that the equation $\pi = -W_\xi$ determines $x$, and $p$ then is determined from $p = W_x$. If (4.6.15) holds, (4.6.2) becomes
$$\dot\xi = 0, \qquad \dot\pi = -H^*_\xi. \tag{4.6.17}$$
This implies that the $\xi^i$ are constants of motion (i.e. independent of $t$), or so-called integrals of the Hamiltonian flow. A system for which $n$ independent integrals can be found is called completely integrable. Thus, if we can find a so-called generating function $W(x, \xi)$ of the above type reducing the Hamiltonian to a function of $\xi$ alone, the canonical system is completely integrable. Clearly, since in this case $\xi^1, \ldots, \xi^n$ are constant in $t$, the relation $\dot\pi = -H^*_\xi(\xi)$ can then be used to determine $\pi_1, \ldots, \pi_n$. In other words, a completely integrable canonical system may be solved explicitly through quadratures. Actually, one may show in this case that the sets $T_c = \{\xi^1 = c^1, \ldots, \xi^n = c^n\}$ for a constant vector $c = (c^1, \ldots, c^n)$ are $n$-dimensional tori, if compact and connected. Thus, the so-called phase space $\{(x, p) \in \mathbb{R}^{2n}\}$ is foliated by tori that are invariant under the motion, and on each such torus, the motion is given by straight lines.

It should be pointed out, however, that completely integrable dynamical systems are quite rare, in the sense that the complete integrability usually depends on particular symmetries, and their dynamical behaviour is quite exceptional in the class of all Hamiltonian systems. The invariant tori may disappear under arbitrarily small perturbations. By way of contrast, the Kolmogorov-Arnold-Moser theory asserts that these invariant tori persist under sufficiently small and smooth perturbations if the coordinates of $H^*_\xi$ are rationally independent and satisfy certain Diophantine inequalities, and if the matrix $H^*_{\xi\xi}$ of second derivatives is invertible.
In the older literature, the notion of 'canonical transformation' is usually applied to any transformation $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ that preserves the form of the canonical equations, i.e. (4.6.1) is transformed into (4.6.2), but without requiring that
$$H^* = H.$$
If we now take a generating function $W(t, x, \xi)$ as above, the Hamiltonian is transformed into
$$H^* = H + W_t, \tag{4.6.18}$$
while
$$p = W_x, \qquad \pi = -W_\xi \tag{4.6.19}$$
still hold. This may be used to explain Jacobi's theorem once more, as we now shall see. Let $I(t, x^1, \ldots, x^n, \lambda_1, \ldots, \lambda_n)$ be a solution of the Hamilton-Jacobi equation
$$I_t + H(t, x, I_x) = 0, \tag{4.6.20}$$
with
$$\det I_{x^i \lambda_j} \ne 0. \tag{4.6.21}$$
We use $I$ as a generating function, with $\lambda$ in place of $\xi$. The corresponding transformation then is
$$p = I_x, \qquad \pi = -I_\lambda, \qquad H^*(t, \xi, \pi) = H(t, x, p) + I_t. \tag{4.6.22}$$
Because of (4.6.20), $H^* = 0$. Thus, the new canonical equations are just
$$\dot\xi = 0, \qquad \dot\pi = 0.$$
Solutions are of course
$$\xi = \lambda = \text{constant}, \qquad \pi = -I_\lambda = \mu = \text{constant}.$$
We have thus obtained the statement of Jacobi's Theorem 4.2.1, namely that from a solution of (4.6.20) with (4.6.21), we may obtain solutions of the canonical equations by solving
$$-I_\lambda = \mu, \qquad I_x = p$$
with parameters $\lambda = (\lambda_1, \ldots, \lambda_n)$, $\mu = (\mu^1, \ldots, \mu^n)$ (the sign of the arbitrary constants $\mu$ is immaterial).
Classical references for this chapter include: C.G.J. Jacobi, Vorlesungen über Analytische Mechanik (ed. H. Pulte), Vieweg, Braunschweig, Wiesbaden, 1996; C. Carathéodory, Variationsrechnung und partielle Differentialgleichungen erster Ordnung, Teubner, Leipzig, 1935; R. Courant, D. Hilbert, Methoden der Mathematischen Physik II, Springer, Berlin, 2nd edition, 1968. The global aspects are developed in V.I. Arnold, Mathematical Methods of Classical Mechanics, GTM 60, Springer, New York, 1978. A recent advanced monograph is H. Hofer, E. Zehnder, Symplectic Invariants and Hamiltonian Dynamics, Birkhäuser, Basel, 1994. That text will give readers a good perspective on the present research directions in the field.
Exercises 4.1 Discuss the relation between the canonical equations for the energy functional E and the equations for geodesies derived in Chapter 2. (Kepler problem) Consider the Lagrangian L(x,x) = \ \x\ + r i r 2 |x|
2
4.2
for x G E .
4.3
Compute the corresponding Hamiltonian and write down the canonical equations. Show that the three components of the angular momentum x A x are integrals of the Hamiltonian flow. For smooth functions F,G : E E, define their Poisson bracket as
2 n X
'
'
dxidpj
n n
dpjdxi'
where z = (x,p) = ( x , . . . , x , p \ , . . . ,p ) are Euclidean coordi nates of E . Let z(t) = (x(t),p(t)) be a solution of a canonical system
2 n
x = H
P = ~H
Exercises for some Hamiltonian H(x,p) that for any (smooth) F : E j F{z{t))
t
2 n
and satisfies the Jacobi identity {{F, G},L} + {{G, L},F} for all smooth F, G, L . Show that a diffeomorphism ip : R formation if
2 n
+ {{L, F}, G} = 0
~> R
2 n
is a canonical trans
5 Dynamic optimization
Optimal control theory is concerned with time dependent processes that can be influenced or controlled via the tuning of certain parameters. The aim is to choose these parameters in such a manner that a desired result is achieved and the cost resulting from the intermediate states of the process and from the application or change of the parameters is minimized. In some problems, the control parameter can be applied only at discrete time steps, while other problems can be continuously controlled. As we shall see, however, the discrete and the continuous case can be treated by the same principles. The end result may be prescribed, and the value of a parameter at some given time influences the state of the system at subsequent times and therefore typically also contributes, through this influence, to the cost of the process at those later times. For this reason, the determination of the optimal control parameters is best performed in a backward manner. In the discrete case, this means that one first selects the best value of the control parameter at the last stage, whatever state the system is in at that time. One then selects the value at the second-to-last stage; at this step, the contribution of the last control value to the total cost is already determined as a function of the state reached, so that one only needs to optimize the cost function w.r.t. the second-to-last parameter value, and so on.
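5.1 Discrete control problems

We consider a process with $n$ states $x_1,\dots,x_n\in\mathbb{R}^d$. At each state $x_i$, we may choose a control parameter
$$\lambda_i\in\Lambda_i, \tag{5.1.1}$$
where $\Lambda_i$ is a given control restriction ($\Lambda_i\subset\mathbb{R}^c$), to determine
$$x_{i+1} = \varphi_i(x_i,\lambda_i) \tag{5.1.2}$$
with cost $k_i(x_i,\lambda_i)$. The cost of the process from stage $\nu$ on is
$$K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n) := \sum_{i=\nu}^{n}k_i(x_i,\lambda_i),\quad\text{with } x_{i+1} = \varphi_i(x_i,\lambda_i). \tag{5.1.3}$$
We wish to minimize the total cost of the process and define the Bellman function
$$I_\nu(x_\nu) := \inf_{\lambda_i\in\Lambda_i,\ i=\nu,\dots,n}K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n)\qquad(\nu=1,\dots,n). \tag{5.1.4}$$

Theorem 5.1.1. The Bellman function satisfies the Bellman equation
$$I_\nu(x_\nu) = \inf_{\lambda_\nu\in\Lambda_\nu}\bigl(k_\nu(x_\nu,\lambda_\nu) + I_{\nu+1}(\varphi_\nu(x_\nu,\lambda_\nu))\bigr)\quad\text{for }\nu=1,\dots,n \tag{5.1.5}$$
(here, we put $I_{n+1} = 0$). Furthermore, $(\lambda_\nu,\dots,\lambda_n)\in\Lambda_\nu\times\dots\times\Lambda_n$, $(x_\nu,\dots,x_n)$ with (5.1.2) are solutions of (5.1.4) iff
$$I_j(x_j) = k_j(x_j,\lambda_j) + I_{j+1}(x_{j+1})\quad\text{for } j=\nu,\dots,n. \tag{5.1.6}$$

Proof. Since
$$K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n) = k_\nu(x_\nu,\lambda_\nu) + K_{\nu+1}(\varphi_\nu(x_\nu,\lambda_\nu);\lambda_{\nu+1},\dots,\lambda_n),$$
we get
$$\begin{aligned}
I_\nu(x_\nu) &= \inf_{\lambda_i\in\Lambda_i,\ i=\nu,\dots,n}K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n)\\
&= \inf_{\lambda_\nu\in\Lambda_\nu}\ \inf_{\lambda_i\in\Lambda_i,\ i=\nu+1,\dots,n}K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n)\\
&= \inf_{\lambda_\nu\in\Lambda_\nu}\Bigl[k_\nu(x_\nu,\lambda_\nu) + \inf_{\lambda_i\in\Lambda_i,\ i=\nu+1,\dots,n}K_{\nu+1}(\varphi_\nu(x_\nu,\lambda_\nu);\lambda_{\nu+1},\dots,\lambda_n)\Bigr]\\
&= \inf_{\lambda_\nu\in\Lambda_\nu}\bigl(k_\nu(x_\nu,\lambda_\nu) + I_{\nu+1}(\varphi_\nu(x_\nu,\lambda_\nu))\bigr).
\end{aligned}$$
Moreover, with $x_{j+1} = \varphi_j(x_j,\lambda_j)$ for $j=\nu,\dots,n$, (5.1.6) holds if and only if
$$K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n) = I_\nu(x_\nu).$$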
Corollary 5.1.1. $(\lambda_1,\dots,\lambda_n)\in\Lambda_1\times\dots\times\Lambda_n$, $(x_1,\dots,x_n)$ with (5.1.2) is a solution of (5.1.4) iff for all $\nu=1,\dots,n$, $(\lambda_\nu,\dots,\lambda_n)\in\Lambda_\nu\times\dots\times\Lambda_n$, $(x_\nu,\dots,x_n)$ with (5.1.2) is a solution of (5.1.4).
Corollary 5.1.2. (Bellman's method) An optimal solution of the process can be calculated as follows: For any value of $x_n$, compute $\lambda_n(x_n)$ minimizing (5.1.5) for $\nu = n$. Having computed $\lambda_j(x_j)$ for $j=\nu+1,\dots,n$, compute $\lambda_\nu(x_\nu)$ for any value of $x_\nu$ so as to minimize (5.1.5) and put $x_{\nu+1} = \varphi_\nu(x_\nu,\lambda_\nu(x_\nu))$. For an arbitrary initial value $x_1$, an optimal process thus is given by:
$$\lambda_1 := \lambda_1(x_1),\quad x_2 := \varphi_1(x_1,\lambda_1),\quad \lambda_2 := \lambda_2(x_2),\ \dots$$
= x
G B
with a given set $B_1\subset\mathbb{R}^d$. We have the control equation
$$\dot x(t) = f(t,x(t),\lambda(t))\quad\text{for almost all } t\in(t_0,t_1)$$
with controls $\lambda(t)\in\Lambda$ for some given $\Lambda\subset\mathbb{R}^c$. Pairs $(\lambda(t),x(t))$ satisfying all these restrictions are called admissible, and the set of admissible pairs is called $P(t_0,x_0)$. We put
$$I(t_0,x_0) := \inf_{(\lambda(t),x(t))\in P(t_0,x_0)}K(t_1,x(t_1))$$
(Bellman function).

Lemma 5.2.1.
(i) $I(t_1,x_1) = K(t_1,x_1)$ for all $x_1\in B_1$.
(ii) For any path $(\lambda(t),x(t))\in P(t_0,x_0)$, $I(t,x(t))$ is a monotonically increasing function of $t\in[t_0,t_1]$.

Proof. (i) is obvious. For (ii), if $t_0\le\tau_1<\tau_2\le t_1$, the set of all admissible paths from $(\tau_2,x(\tau_2))$ to $(t_1,B_1)$ can be considered as a subset of those from $(\tau_1,x(\tau_1))$ to $(t_1,B_1)$. Namely, if we have any path from $(\tau_2,x(\tau_2))$ to $(t_1,x_1)$ for some $x_1\in B_1$, we may compose it with $x(t)|_{[\tau_1,\tau_2]}$ to obtain a path from $(\tau_1,x(\tau_1))$ to $(t_1,x_1)$. Thus, every endpoint in $B_1$ that can be reached from $(\tau_2,x(\tau_2))$ by an admissible path can also be reached from $(\tau_1,x(\tau_1))$ by an admissible path. This implies monotonicity, since an infimum taken over a larger set cannot be larger.
q.e.d.

Theorem 5.2.1. $(\lambda(t),x(t))$ is a solution of the problem iff $I(t,x(t))$ is constant in $t$. Moreover, if there exist a function $J(t,x)$ that satisfies $J(t_1,x_1) = K(t_1,x_1)$ for all $x_1\in B_1$ and is monotonically increasing along any admissible path, and an admissible path $(\lambda(t),x(t))$ along which $J$ is constant, then that path is a solution of the problem.

Proof. For a solution,
$$I(t_0,x_0) = K(t_1,x(t_1)) = I(t_1,x(t_1))\qquad(x_0 = x(t_0)), \tag{5.2.1}$$
and $I(t,x(t))$ then is constant by Lemma 5.2.1 (ii). If $I(t,x(t))$ is constant, then (5.2.1) holds, and by Lemma 5.2.1, we have a solution. Given $J$ as described, by the monotonicity of $J$, for any admissible path
$$J(t_0,x_0)\le K(t_1,x(t_1)),$$
and for the path $(\lambda(t),x(t))$,
$$J(t_0,x_0) = J(t_1,x(t_1)) = K(t_1,x(t_1)),$$
so that this path is optimal.
q.e.d.
Lemma 5.2.1 implies that for those $t$ for which $I(t,x(t))$ is differentiable ($(\lambda(t),x(t))\in P(t_0,x_0)$)
$$\frac{d}{dt}I(t,x(t)) = I_t(t,x(t)) + I_x(t,x(t))f(t,x(t),\lambda(t))\ge 0.$$
For an optimal $(\lambda(t),x(t))$, we have by Theorem 5.2.1 then
$$I_t(t,x(t)) + I_x(t,x(t))f(t,x(t),\lambda(t)) = 0.$$

Corollary 5.2.1. (Bellman equation) Let $\tau\in[t_0,t_1]$. Assume that for every $\lambda\in\Lambda$, there exists an admissible pair $(\lambda(t),x(t))$ with $\lambda(\tau) = \lambda$, $x(\tau) = \xi$. Then
$$\inf_{\lambda\in\Lambda}\bigl(I_t(\tau,\xi) + I_x(\tau,\xi)f(\tau,\xi,\lambda)\bigr) = 0.$$
Proof. This follows from the proof of Lemma 5.2.1. Namely, the assumption implies that we may select $\lambda$ such that the path is optimal at the point $(\tau,\xi)$ under consideration.
q.e.d.

Example. We want to minimize the integral
$$\int_{t_0}^{t_1}\bigl(u^2(t) + \lambda^2(t)\bigr)dt$$
with $u(t_0) = u_0$ and the control equation
$$\dot u(t) = \alpha u(t) + \beta\lambda(t)\quad\text{with given }\alpha,\beta\in\mathbb{R}. \tag{5.2.2}$$
In order to express this problem as a control problem, we introduce a new dependent variable $v(t)$ as solution of the equation
$$\dot v(t) = u^2(t) + \lambda^2(t),\qquad v(t_0) = 0. \tag{5.2.3}$$
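The problem then is to minimize $v(t_1)$. Given $p:[t_0,t_1]\to\mathbb{R}$ with
$$\dot p(t) = \beta^2p^2(t) - 2\alpha p(t) - 1,\qquad p(t_1) = 0, \tag{5.2.4}$$
we put $J(t,u,v) := p(t)u^2 + v$. Then $J(t_1,u,v) = v$, and along any admissible path, with $J(t,x(t)) = p(t)u^2(t) + v(t)$,
$$\frac{d}{dt}J(t,x(t)) = \dot p(t)u^2(t) + 2p(t)u(t)\dot u(t) + \dot v(t) = \bigl(\lambda(t) + \beta p(t)u(t)\bigr)^2\ge 0,$$
and this expression vanishes precisely if
$$\lambda(t) = -\beta p(t)u(t). \tag{5.2.5}$$
By Theorem 5.2.1, $x(t) = (u(t),v(t))$ and $\lambda(t) = -\beta p(t)u(t)$ yield an optimal solution.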
If we substitute $\lambda(t)$ through the control equation (5.2.2) in the variational integral, we obtain (for $\beta\neq0$) the integral
$$\int_{t_0}^{t_1}\Bigl(u^2(t) + \frac{1}{\beta^2}\bigl(\dot u(t) - \alpha u(t)\bigr)^2\Bigr)dt,$$
which is essentially the same as the one considered at the end of 4.2 with integrand given by (4.2.28). We recall that the latter one had also been reduced to a Riccati equation. Equation (5.2.5) expresses the control parameter as a function of the state of the system. We thus have a feedback control: knowing the state at a given time determines the control needed to reach an optimal state at the next time.
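The feedback law of this example can be checked numerically. The sketch below is not part of the original text and rests on one stated assumption: that the function $p$ in (5.2.5) solves the Riccati equation $\dot p = \beta^2p^2 - 2\alpha p - 1$ with $p(t_1) = 0$, which is the choice that makes $\frac{d}{dt}(p\,u^2 + v) = (\lambda + \beta p u)^2$ along solutions of (5.2.2), (5.2.3). Under this assumption the optimal cost is $p(t_0)u_0^2$, and for $\alpha = 0$, $\beta = 1$ one has $p(t) = \tanh(t_1 - t)$. The function names and the parameter `gain_scale` are hypothetical; the integrator is a plain fourth-order Runge-Kutta scheme.

```python
import math


def solve_riccati(alpha: float, beta: float, t0: float, t1: float, steps: int) -> list:
    """Integrate p' = beta^2 p^2 - 2 alpha p - 1 backward from p(t1) = 0 (assumed Riccati equation).

    Returns p on a uniform grid: p[j] approximates p(t0 + j * (t1 - t0) / steps).
    """
    h = (t1 - t0) / steps

    def rhs(p: float) -> float:
        return beta * beta * p * p - 2.0 * alpha * p - 1.0

    p = [0.0] * (steps + 1)
    for j in range(steps, 0, -1):
        # classical RK4 step with step size -h (the equation is autonomous)
        k1 = rhs(p[j])
        k2 = rhs(p[j] - 0.5 * h * k1)
        k3 = rhs(p[j] - 0.5 * h * k2)
        k4 = rhs(p[j] - h * k3)
        p[j - 1] = p[j] - h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def closed_loop_cost(alpha: float, beta: float, t0: float, t1: float, u0: float,
                     steps: int, gain_scale: float = 1.0) -> float:
    """Simulate u' = alpha u + beta lam with lam = -gain_scale * beta * p(t) * u.

    Returns v(t1), the integral of u^2 + lam^2; gain_scale = 1 is the feedback (5.2.5).
    """
    p = solve_riccati(alpha, beta, t0, t1, 2 * steps)  # half steps give midpoint values
    h = (t1 - t0) / steps

    def rhs(u: float, p_val: float):
        lam = -gain_scale * beta * p_val * u
        return alpha * u + beta * lam, u * u + lam * lam

    u, v = u0, 0.0
    for i in range(steps):
        p_left, p_mid, p_right = p[2 * i], p[2 * i + 1], p[2 * i + 2]
        du1, dv1 = rhs(u, p_left)
        du2, dv2 = rhs(u + 0.5 * h * du1, p_mid)
        du3, dv3 = rhs(u + 0.5 * h * du2, p_mid)
        du4, dv4 = rhs(u + h * du3, p_right)
        u += h * (du1 + 2.0 * du2 + 2.0 * du3 + du4) / 6.0
        v += h * (dv1 + 2.0 * dv2 + 2.0 * dv3 + dv4) / 6.0
    return v
```

Comparing `closed_loop_cost(0.0, 1.0, 0.0, 1.0, 2.0, 200)` with `4 * math.tanh(1.0)`, and with the cost for `gain_scale` different from 1, illustrates that the feedback (5.2.5) attains the value $p(t_0)u_0^2$ while other gains cost strictly more.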
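5.3 The Pontryagin maximum principle

We consider the control problem
$$\int_{t_0}^{t_1}F(t,x(t),\lambda(t))\,dt\to\min \tag{5.3.1}$$
for
$$\dot x(t) = f(t,x(t),\lambda(t)),\qquad x(t_0) = x_0, \tag{5.3.2}$$
with controls $\lambda(t)\in\Lambda$ and with the end condition
$$g(t_1,x(t_1)) = 0. \tag{5.3.3}$$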
Here, $\lambda(t)$ is required to be piecewise continuous, and $x(t)$ to be continuous. (Equation (5.3.2) then has to be interpreted as the integral equation $x(t) = x_0 + \int_{t_0}^{t}f(\tau,x(\tau),\lambda(\tau))\,d\tau$.) $F$, $f$, and $g$ are required to be of class $C^1$. Also, $t_0$ is fixed, whereas $t_1>t_0$ is variable subject to the restriction (5.3.3). We define the Pontryagin function
$$H(x,\lambda,p,t,\mu_0) := p\cdot f(t,x,\lambda) - \mu_0F(t,x,\lambda).$$
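Theorem 5.3.1. If $(x(t),\lambda(t))$ is a solution of the control problem, there exist $\mu_0\ge0$, $a = (a_1,\dots,a_r)\in\mathbb{R}^r$, where $r$ denotes the number of components of $g$ ($a\neq0$ if $\mu_0 = 0$), and a continuous $p = (p_1,\dots,p_d)$ on $[t_0,t_1]$ such that at all points where $\lambda(t)$ is continuous, we have
$$H(x(t),\lambda(t),p(t),t,\mu_0) = \max_{\lambda\in\Lambda}H(x(t),\lambda,p(t),t,\mu_0) \tag{5.3.4}$$
and
$$\dot p = -H_x,\qquad \dot x = H_p, \tag{5.3.5}$$
and at the end point $t_1$, we have the transversality condition
$$p(t_1) = \frac{\partial g^j}{\partial x}(t_1,x(t_1))\,a_j. \tag{5.3.6}$$
There also exists a continuous function $\eta:[t_0,t_1]\to\mathbb{R}$ such that at all points where $\lambda(t)$ is continuous
$$\eta(t) = H(x(t),\lambda(t),p(t),t,\mu_0) \tag{5.3.7}$$
and
$$\dot\eta(t) = H_t, \tag{5.3.8}$$
$$\eta(t_1) = -\frac{\partial g^j}{\partial t}(t_1,x(t_1))\,a_j. \tag{5.3.9}$$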
(3) If we want to guarantee a fixed end time $t_1$, we simply introduce an additional variable $x^{d+1} = t$ with control conditions
$$\dot x^{d+1} = 1,\qquad x^{d+1}(t_0) = t_0$$
and end condition
$$x^{d+1}(t_1) = t_1.$$
We now want to exhibit the Hamilton-Jacobi theory as a special case of optimal control theory. Concretely, we want to derive the Euler-Lagrange equations, which are equivalent to the canonical equations of Chapter 4, from the Pontryagin maximum principle. We thus consider the variational problem
$$\int_{t_0}^{t_1}L(t,x(t),\dot x(t))\,dt\to\min$$
with $x(t_0) = x_0$, $x(t_1) = x_1$, $x:[t_0,t_1]\to\mathbb{R}^d$, where $x(t)$ is required to have piecewise continuous first derivatives. We introduce the control variable through the control equation $\lambda(t) = \dot x(t)$ with $\Lambda = \mathbb{R}^d$, i.e. no constraint imposed. We have
$$g(t_1,x(t_1)) = x(t_1) - x_1.$$
The Pontryagin function of this problem is
$$H(x,\lambda,p,t,\mu_0) = p\cdot\lambda - \mu_0L(t,x,\lambda).$$
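Theorem 5.3.1 then yields $\mu_0\ge0$, $a\in\mathbb{R}^d$ ($a\neq0$ for $\mu_0 = 0$), and $p\in C^0([t_0,t_1],\mathbb{R}^d)$ with
$$H(x(t),\lambda(t),p(t),t,\mu_0) = \max_{\lambda\in\mathbb{R}^d}H(x(t),\lambda,p(t),t,\mu_0),$$
$$\dot p = -H_x,\qquad p(t_1) = a,$$
and $\eta\in C^0([t_0,t_1],\mathbb{R})$ with
$$\eta(t) = H(x(t),\lambda(t),p(t),t,\mu_0),\qquad \dot\eta = H_t,\qquad \eta(t_1) = 0.$$
We now want to exclude that $\mu_0 = 0$. In that case, we would have
$$\dot\eta = H_t = 0,\quad\text{hence } \eta\equiv0 \text{ since } \eta(t_1) = 0,$$
and
$$\dot p = -H_x = 0,\quad\text{hence } p\equiv a \text{ since } p(t_1) = a.$$
Thus $H = a\cdot\lambda$, and since $\eta = 0$, $H(x(t),\lambda(t),p(t),t,0) = 0$. By (5.3.4), this value is the maximum of $a\cdot\lambda$ over all $\lambda\in\mathbb{R}^d$, and thus $a = 0$, contradicting the statement of the theorem that $a\neq0$ in case $\mu_0 = 0$. We may thus assume $\mu_0 = 1$. The Pontryagin maximum principle then gives the Weierstraß condition
$$L(t,x(t),\lambda) - L(t,x(t),\dot x(t))\ge p(t)\cdot(\lambda - \dot x(t))\quad\text{for all }\lambda\in\mathbb{R}^d \tag{5.3.10}$$
and
$$H_\lambda(x(t),\lambda(t),p(t),t,1) = 0, \tag{5.3.11}$$
i.e. $p = L_{\dot x}$. Moreover,
$$L_{\dot x\dot x}(t,x(t),\dot x(t))\ \text{is positive semidefinite.} \tag{5.3.12}$$
Since $\dot p = -H_x = L_x$ by (5.3.5),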
we obtain the Euler-Lagrange equations
$$\frac{d}{dt}L_{\dot x} = L_x. \tag{5.3.13}$$
A basic reference for the variational aspects of optimization and control theory, where also a detailed proof of the Pontryagin maximum principle together with many applications is given, is
E. Zeidler, Nonlinear Functional Analysis and its Applications, III, Springer, New York, 1984, pp. 93-6, 422-40.
Part two
Multiple integrals in the calculus of variations
1.1 The Lebesgue measure and the Lebesgue integral

In this section, we recall the basic notions and results about the Lebesgue measure and the Lebesgue integral that will be used in the sequel. Most proofs are omitted as they can be readily found in standard textbooks, e.g. J. Jost, Postmodern Analysis, Springer, Berlin, 1998, pp. 151-97 and 209-15.

Definition 1.1.1. A collection $\Sigma$ of subsets of $\mathbb{R}^d$ is called a $\sigma$-algebra (on $\mathbb{R}^d$) if

The Borel $\sigma$-algebra is the smallest $\sigma$-algebra containing all open subsets of $\mathbb{R}^d$. The elements of the Borel $\sigma$-algebra are called Borel sets.
(vi) ISA, Be
Definition 1.1.2. Let $\Sigma$ be a $\sigma$-algebra. A measure $\mu$ on $\Sigma$ is a countably additive function $\mu:\Sigma\to\mathbb{R}^+\cup\{\infty\}$, i.e.
$$\mu\Bigl(\bigcup_{n=1}^{\infty}A_n\Bigr) = \sum_{n=1}^{\infty}\mu(A_n)$$
for any collection of mutually disjoint ($A_m\cap A_n = \emptyset$ for $m\neq n$) elements of $\Sigma$. A measure defined on the Borel $\sigma$-algebra is called a Borel measure. A Borel measure $\mu$ is called a Radon measure if $\mu(K)<\infty$ for every compact $K\subset\mathbb{R}^d$ and $\mu(B) = \sup\{\mu(K)\mid K\subset B,\ K\text{ compact}\}$ for every Borel set $B$.

A measure $\mu$ on $\Sigma$ enjoys the following properties:
(vii) $\mu(\emptyset) = 0$
(viii) If $A,B\in\Sigma$, $A\subset B$, then $\mu(A)\le\mu(B)$
(ix) If $A_n\in\Sigma$, $n = 1,2,3,\dots$, and $A_n\subset A_{n+1}$ for all $n$, then
$$\mu\Bigl(\bigcup_{n=1}^{\infty}A_n\Bigr) = \lim_{n\to\infty}\mu(A_n).$$
Theorem 1.1.1. There exist a (unique) $\sigma$-algebra $\Sigma$ on $\mathbb{R}^d$ and a (unique) measure $\mu$ on $\Sigma$ satisfying
(x) Any open subset of $\mathbb{R}^d$ is contained in $\Sigma$ (in other words, $\Sigma$ contains the Borel $\sigma$-algebra)
(xi) For $Q := \{x = (x^1,\dots,x^d)\in\mathbb{R}^d\mid a_j\le x^j\le b_j,\ j = 1,\dots,d\}$,
$$\mu(Q) = \prod_{j=1}^{d}(b_j - a_j)$$
(xii) (translation invariance) For $x\in\mathbb{R}^d$, $A\in\Sigma$ we have $x + A := \{x + y\mid y\in A\}\in\Sigma$ and $\mu(x + A) = \mu(A)$.

This $\mu$ is called Lebesgue measure, and the elements of $\Sigma$ are called (Lebesgue) measurable. In later chapters, we shall however write meas in place of $\mu$ for Lebesgue measure.

One should note that the $\sigma$-algebra of (Lebesgue) measurable sets is larger than the Borel $\sigma$-algebra. We say that a property holds almost everywhere in $A\subset\mathbb{R}^d$ if it holds on $A\setminus B$ for some $B\subset A$ with $\mu(B) = 0$. We say that two functions $f,g:A\to\mathbb{R}\cup\{\pm\infty\}$ are equivalent if $f(x) = g(x)$ for almost all $x\in A$. A set contained in a set of measure 0 is called a null set. We usually write meas $A$ instead of $\mu(A)$ for a measurable set $A$.
Definition 1.1.3. Let $A\subset\mathbb{R}^d$ be measurable. A function $f:A\to\mathbb{R}\cup\{\pm\infty\}$ is called measurable if
$$\{x\in A\mid f(x)<\lambda\}$$
is measurable for every $\lambda\in\mathbb{R}$.

If $f_n$, $n\in\mathbb{N}$, are measurable, $c\in\mathbb{R}$, then $f_1 + f_2$, $cf_1$, $f_1f_2$, $\max(f_1,f_2)$, $\min(f_1,f_2)$, $\limsup_{n\to\infty}f_n$, $\liminf_{n\to\infty}f_n$ are likewise measurable. Any continuous function $f$ is measurable, because in that case $\{f(x)<\lambda\}$ is open in its domain of definition. We have the following important composition property:

Theorem 1.1.2. Let $g:A\to\mathbb{R}^c$ be measurable (i.e. $g = (g^1,\dots,g^c)$, and each component $g^j$ is measurable), $\gamma:\mathbb{R}^c\to\mathbb{R}$ continuous. Then $\gamma\circ g$ is measurable.

For $A\subset\mathbb{R}^d$, its characteristic function $\chi_A$ is defined as
$$\chi_A(x) := \begin{cases}1 & \text{if } x\in A\\ 0 & \text{otherwise.}\end{cases}$$
Thus, $A$ is measurable if and only if its characteristic function $\chi_A$ is measurable. More generally, $s:A\to\mathbb{R}$ is called a simple function or a step function if it assumes only finitely many values, say $s(A) = \{\lambda_1,\dots,\lambda_k\}$, and if all the sets $\{s(x) = \lambda_i\}$ are measurable. Thus
$$s = \sum_{i=1}^{k}\lambda_i\,\chi_{\{s(x)=\lambda_i\}}.$$

Theorem 1.1.3. $f:A\to\mathbb{R}$ is measurable if and only if it is the pointwise limit of a sequence of simple functions. If $f:A\to\mathbb{R}$ is measurable and bounded, then it is the uniform limit of a sequence of simple functions.
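Definition 1.1.4.
(1) Let $A\subset\mathbb{R}^d$ be measurable with $\mu(A)<\infty$, and let $s:A\to\mathbb{R}$ be a simple function with values $\lambda_1,\dots,\lambda_k$. The Lebesgue integral of $s$ is
$$\int_As(x)dx := \sum_{i=1}^{k}\lambda_i\,\mu(\{s(x) = \lambda_i\}).$$
(2) Let $A$ be as in (1), $f:A\to\mathbb{R}$ measurable and bounded. Let $s_n:A\to\mathbb{R}$ be a sequence of simple functions converging uniformly to $f$ according to Theorem 1.1.3. The Lebesgue integral of $f$ then is
$$\int_Af(x)dx := \lim_{n\to\infty}\int_As_n(x)dx$$
(this integral is independent of the choice of the sequence $(s_n)_{n\in\mathbb{N}}$).
(3) $A$ as in (1), $f:A\to\mathbb{R}\cup\{\pm\infty\}$ measurable. Put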
{
d
m n f(x)
f(x)dx
of $f$.
(4) $A\subset\mathbb{R}^d$ measurable, $f:A\to\mathbb{R}\cup\{\pm\infty\}$ measurable. $f$ is called integrable if for any increasing sequence $A_1\subset A_2\subset\dots\subset A$ of measurable subsets of $A$ with $\mu(A_n)<\infty$ for all $n$, $f\chi_{A_n}$ is integrable on $A_n$ and
$$\lim_{n\to\infty}\int_{A_n}f(x)\chi_{A_n}(x)dx$$
exists. That limit then is independent of the choice of $(A_n)$ and called the Lebesgue integral $\int_Af(x)dx$ of $f$.
Theorem 1.1.4. The Lebesgue integral is a linear nonnegative functional on $\mathcal{L}^1(A)$, the vector space of Lebesgue integrable functions on a measurable set $A$, and it satisfies:
(1) If $f\in\mathcal{L}^1(A)$ and $g = f$ almost everywhere on $A$, then $g\in\mathcal{L}^1(A)$, and
$$\int_Af(x)dx = \int_Ag(x)dx.$$
In particular, $\int_Af(x)dx = 0$ if $\mu(A) = 0$.
(2) If $f\in\mathcal{L}^1(A)$, then $|f|\in\mathcal{L}^1(A)$ and
$$\Bigl|\int_Af(x)dx\Bigr|\le\int_A|f(x)|dx.$$
(3) If $f\in\mathcal{L}^1(A)$, $h:A\to\mathbb{R}\cup\{\pm\infty\}$ measurable with $|h|\le f$, then $h\in\mathcal{L}^1(A)$ and
$$\int_Ah(x)dx\le\int_Af(x)dx.$$
(4) If $\mu(A)<\infty$, $f:A\to\mathbb{R}$ measurable with $m\le f\le M$, then $f\in\mathcal{L}^1(A)$, and
$$m\mu(A)\le\int_Af(x)dx\le M\mu(A).$$
(5) / / (-A )nN is a sequence of mutually disjoint (A n A = 0 for m^n) measurable sets, A := U^Li A , f ^(An) for every n, and if
n m n n
00
\f(x)\dx<oo,
f(x)dx.
Conversely, sequence
if f e C (A), (A ) ^.
n ne
theory
r \f{x) - (p{x)\dx
c
< e.
d
f(x)dx=
j
JA
( [
\JB
f(a,r))dri)dt
/
IB
(Here, for example j For / C (A),
l B
[JA
ffor,)di)
dn. A)
we put
We then have Jensen's inequality: T h e o r e m 1.1.6. Let A C R be bounded and measurable, f a convex function. Then for all ip G C (A)
l d
1.2 Convergence theorems

In this section, again no proofs are given, and the reader is referred to J. Jost, loc. cit., pp. 199-208.

Theorem 1.2.1 (B. Levi). Let $A\subset\mathbb{R}^d$ be measurable, and let $f_n:A\to\mathbb{R}\cup\{\pm\infty\}$ be a monotonically increasing sequence (i.e. $f_n(x)\le f_{n+1}(x)$ for all $x\in A$, $n\in\mathbb{N}$) of integrable functions. If
$$\lim_{n\to\infty}\int_Af_n(x)dx<\infty,$$
then $f := \lim_{n\to\infty}f_n$ is integrable, and
$$\int_Af(x)dx = \lim_{n\to\infty}\int_Af_n(x)dx.$$
123 : A R+ U { 0 0 }
Y] then YlnLi
s
fn(x)dx
<
OC,
J A ^
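Theorem 1.2.2 (Fatou). Let $A\subset\mathbb{R}^d$ be measurable, $f_n:A\to\mathbb{R}\cup\{\pm\infty\}$ integrable for $n\in\mathbb{N}$. Assume that there exists some integrable $F:A\to\mathbb{R}\cup\{\pm\infty\}$ with
$$f_n\ge F\quad\text{for all } n\in\mathbb{N},$$
and that
$$\int_Af_n(x)dx\le K<\infty$$
for some constant $K$ independent of $n$. Then $\liminf_{n\to\infty}f_n$ is integrable, and
$$\int_A\liminf_{n\to\infty}f_n(x)dx\le\liminf_{n\to\infty}\int_Af_n(x)dx.$$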
Theorem 1.2.3 (Lebesgue). Let $A\subset\mathbb{R}^d$ be measurable, $f_n:A\to\mathbb{R}\cup\{\pm\infty\}$ a sequence of integrable functions converging pointwise almost everywhere on $A$ to some function $f:A\to\mathbb{R}\cup\{\pm\infty\}$. Suppose there exists some integrable $F:A\to\mathbb{R}\cup\{\pm\infty\}$ with
$$|f_n|\le F\quad\text{for all } n.$$
Then $f$ is integrable, and
$$\int_Af(x)dx = \lim_{n\to\infty}\int_Af_n(x)dx.$$
Theorem 1.2.3 is called the theorem on dominated convergence. Let us consider an example that shows the necessity of the hypotheses in the previous results: $f_n:[0,1]\to\mathbb{R}$ is defined as
$$f_n(x) := \begin{cases}n & \text{for }\frac1n\le x\le\frac2n\\ 0 & \text{otherwise}\end{cases}\qquad(n\ge2).$$
Then
$$\lim_{n\to\infty}f_n(x) = 0 =: f(x)\quad\text{for all } x\in[0,1],$$
but
$$\lim_{n\to\infty}\int_0^1f_n(x)dx = 1\neq0 = \int_0^1f(x)dx.$$
The $f_n$ do not form a monotonically increasing sequence so that B. Levi's theorem does not apply, and they are not bounded by some integrable function that is independent of $n$ so that Lebesgue's theorem does not apply either. Considering $-f_n$ instead of $f_n$, we finally obtain a sequence for which the conclusion of Fatou's theorem does not hold. As a corollary of Theorem 1.2.3 one has (approximate the derivative by difference quotients):
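Corollary 1.2.2 (Differentiation under the integral). Let $I\subset\mathbb{R}$ be an open interval, $A\subset\mathbb{R}^d$ measurable, and suppose $f:A\times I\to\mathbb{R}\cup\{\pm\infty\}$ satisfies
(i) for any $t\in I$, $f(\cdot,t)$ is integrable on $A$
(ii) for almost all $x\in A$, $f(x,\cdot)$ is differentiable on $I$
(iii) there exists an integrable $\phi:A\to\mathbb{R}\cup\{\pm\infty\}$ with the property that for all $t\in I$ and almost all $x\in A$
$$\Bigl|\frac{\partial f}{\partial t}(x,t)\Bigr|\le\phi(x).$$
Then $\int_Af(x,t)dx$ is differentiable w.r.t. $t$ on $I$, and
$$\frac{d}{dt}\int_Af(x,t)dx = \int_A\frac{\partial f}{\partial t}(x,t)dx.$$
q.e.d.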
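The example given after Theorem 1.2.3 can be checked with exact rational arithmetic. The sketch below is not part of the original text; it assumes the functions of that example are $f_n = n$ on $[\frac1n,\frac2n]$ and $0$ elsewhere on $[0,1]$ (for $n\ge2$), and the helper names `f`, `integral` and `last_nonzero_index` are hypothetical. It illustrates that every $f_n$ has integral 1, while for each fixed $x$ only finitely many $f_n(x)$ are nonzero, so the pointwise limit is 0.

```python
import math
from fractions import Fraction


def f(n: int, x: Fraction) -> Fraction:
    """f_n(x) = n on [1/n, 2/n] and 0 elsewhere on [0, 1] (n >= 2)."""
    return Fraction(n) if Fraction(1, n) <= x <= Fraction(2, n) else Fraction(0)


def integral(n: int) -> Fraction:
    """Exact integral of the step function f_n over [0, 1]: height times interval length."""
    return Fraction(n) * (Fraction(2, n) - Fraction(1, n))


def last_nonzero_index(x: Fraction) -> int:
    """Largest n with f_n(x) != 0 for x in (0, 1], i.e. floor(2 / x); 0 if x <= 0."""
    if x <= 0:
        return 0
    return math.floor(Fraction(2) / x)
```

For a fixed point such as `Fraction(1, 7)`, evaluating `f(n, x)` for `n` beyond `last_nonzero_index(x)` always gives 0, whereas `integral(n)` stays equal to 1; no integrable function independent of $n$ can dominate all the $f_n$.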
2 Banach spaces
In this chapter, we present some results from functional analysis that will be needed in the sequel, in particular in the next chapter. All proofs are supplied. As a reference, one may use any good book on functional analysis, e.g. K. Yosida, Functional Analysis, Springer, Berlin, 5th edition, 1978, pp. 52-5, 81-3, 90-92, 102-28, 139-45 or F. Hirzebruch, W. Scharlau, Einführung in die Funktionalanalysis, Bibliograph. Inst., Mannheim, 1971, pp. 60-88, 107-12. (These were also our main sources when compiling this chapter.)
2.1 Definition and basic properties of Banach and Hilbert spaces

Definition 2.1.1. A vector space $V$ over $\mathbb{R}$ is called a normed space if there exists a map
$$\|\cdot\|:V\to\mathbb{R}$$
satisfying
(i) $\|v\|>0$ for all $v\in V\setminus\{0\}$
(ii) $\|\lambda v\| = |\lambda|\,\|v\|$ for all $\lambda\in\mathbb{R}$, $v\in V$
(iii) $\|v + w\|\le\|v\| + \|w\|$ for all $v,w\in V$ (triangle inequality).
A sequence $(v_n)_{n\in\mathbb{N}}\subset V$ is said to converge to $v\in V$ if
$$\lim_{n\to\infty}\|v_n - v\| = 0.$$
(In order to distinguish the notion of convergence just defined from the notion of weak convergence to be defined in the next section, we sometimes call it norm convergence or strong convergence.)

A sequence $(v_n)_{n\in\mathbb{N}}\subset V$ is called a Cauchy sequence if for every $\varepsilon>0$ we may find $N\in\mathbb{N}$ such that for all $n,m\ge N$
$$\|v_n - v_m\|<\varepsilon.$$
A normed space $(V,\|\cdot\|)$ is called a Banach space if it is complete w.r.t. the notion of convergence just defined, i.e. if every Cauchy sequence converges to some $v\in V$.
Examples
(1) Every finite dimensional normed vector space is a Banach space, for example $\mathbb{R}^d$ with its Euclidean norm $|\cdot|$.
(2) Let $K\subset\mathbb{R}^d$ be compact. $C^0(K) := \{f:K\to\mathbb{R}\text{ continuous}\}$ with $\|f\|_{C^0(K)} := \sup_{x\in K}|f(x)|$ for $f\in C^0(K)$ defines a Banach space. If we equip $C^m(K) := \{f:K\to\mathbb{R}\ m\text{-times continuously differentiable}\}$, $m\in\mathbb{N}$, with the norm $\|\cdot\|_{C^0(K)}$, it is not a Banach space, because it is not complete. Namely, the convergence w.r.t. $\|\cdot\|_{C^0(K)}$ is uniform convergence, and while the uniform limit of continuous functions is continuous, in general the uniform limit of differentiable functions is not necessarily differentiable.
(3) Let $(V,\|\cdot\|)$ be a Banach space, $W\subset V$ a linear subspace that is closed w.r.t. $\|\cdot\|$, i.e. if $(w_n)_{n\in\mathbb{N}}\subset W$ converges to $v\in V$ ($\lim_{n\to\infty}\|w_n - v\| = 0$), then $v\in W$. Then $(W,\|\cdot\|)$ is a Banach space itself.
Definition 2.1.2. A Hilbert space is a vector space $H$ over $\mathbb{R}$ equipped with a scalar product, i.e. a map $(\cdot,\cdot):H\times H\to\mathbb{R}$ satisfying
(i) $(v,w) = (w,v)$ for all $v,w\in H$
(ii) $(\lambda_1v_1 + \lambda_2v_2,w) = \lambda_1(v_1,w) + \lambda_2(v_2,w)$ for all $\lambda_1,\lambda_2\in\mathbb{R}$, $v_1,v_2,w\in H$
(iii) $(v,v)>0$ for all $v\in H\setminus\{0\}$.
In addition, we require
(iv) $H$ is complete w.r.t. the norm $\|v\| := (v,v)^{1/2}$, i.e. a Banach space.

Lemma 2.1.1. Let $(\cdot,\cdot):H\times H\to\mathbb{R}$ satisfy (i)-(iii) of Definition 2.1.2. Then we have the Schwarz inequality:
$$|(v,w)|\le\|v\|\,\|w\|\quad\text{for all } v,w\in H,$$
with equality if and only if $v$ and $w$ are linearly dependent.

Proof. We have for $v,w\in H$, $\lambda\in\mathbb{R}$
$$(v + \lambda w,v + \lambda w)\ge0\quad\text{by (iii)}.$$
Inserting $\lambda = -\frac{(v,w)}{\|w\|^2}$ (for $w\neq0$) and expanding with the help of (i), (ii) yields the Schwarz inequality. Since $\|v + w\|^2 = (v + w,v + w) = \|v\|^2 + \|w\|^2 + 2(v,w)$, the Schwarz inequality in turn implies the triangle inequality.
q.e.d.
Definition 2.1.3. Let $V$ be a vector space (over $\mathbb{R}$, as always). $M\subset V$ is called convex if whenever $x,y\in M$, then also $tx + (1-t)y\in M$ for all $0\le t\le1$.

Example 2.1.1. Let $(V,\|\cdot\|)$ be a normed space. Then for every $\mu>0$, $B_\mu := \{x\in V\mid\|x\|\le\mu\}$ is convex. Namely, if $x,y\in B_\mu$, i.e. $\|x\|,\|y\|\le\mu$, then for $0\le t\le1$
$$\|tx + (1-t)y\|\le t\|x\| + (1-t)\|y\|\le\mu,$$
hence $tx + (1-t)y\in B_\mu$.

The following definition contains a sharpening of the convexity of the balls $B_\mu$. It will be formulated only for $\mu = 1$, but by homogeneity ((ii) of Definition 2.1.1), it implies an analogous condition for any $\mu>0$.

Definition 2.1.4. A normed space $(V,\|\cdot\|)$ is called uniformly convex if for all $\varepsilon>0$ there exists $\delta>0$ with the property that for all $x,y\in V$ with $\|x\| = \|y\| = 1$, we have
$$\Bigl\|\frac12(x+y)\Bigr\|>1-\delta\ \Longrightarrow\ \|x-y\|<\varepsilon. \tag{2.1.1}$$
Remark 2.1.1. An equivalent form of the implication (2.1.1) is
$$\|x-y\|\ge\varepsilon\ \Longrightarrow\ \Bigl\|\frac12(x+y)\Bigr\|\le1-\delta \tag{2.1.2}$$
(again for $\|x\| = \|y\| = 1$).
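Example 2.1.2. In a Hilbert space $(H,(\cdot,\cdot))$, we have the parallelogram identity
$$\Bigl\|\frac12(x+y)\Bigr\|^2 + \Bigl\|\frac12(x-y)\Bigr\|^2 = \frac12\bigl(\|x\|^2 + \|y\|^2\bigr), \tag{2.1.3}$$
which follows by expanding the norms in terms of the scalar product. Therefore, any Hilbert space is uniformly convex. Indeed, for $\|x\| = \|y\| = 1$ and $\|x-y\|\ge\varepsilon$, (2.1.3) yields $\|\frac12(x+y)\|^2\le1-\frac{\varepsilon^2}{4}$.

Lemma 2.1.2. In Definition 2.1.4, the condition $\|x\| = \|y\| = 1$ may be replaced by
$$\|x\|\le1,\qquad\|y\|\le1.$$
Proof. In the situation of Definition 2.1.4, for $\varepsilon_0>0$, we may find $\delta_0>0$ such that for all $z,w$ with $\|z\| = \|w\| = 1$, we have
$$\Bigl\|\frac12(z+w)\Bigr\|>1-\delta_0\ \Longrightarrow\ \|z-w\|<\varepsilon_0. \tag{2.1.4}$$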
<
2(lWI
IMl)
>
2 ^ + ^)
llyll
> 1 - 3<5. We apply (2.1.4) with z = x w y Q = . I f 36 < <5Q, we then get e <2-
R
Now
llyll
1^
<
Ml
IMI
2.1 Basic properties of Banach and Hilbert spaces Choosing 8 = min(3<5o, e/8), we have shown the implication >l-6=>\\x-y\\<e for ||*|| < 1, ||y|| < 1.
(2.1.5)
2 (n
x
H"
%m)
l.
(2.1.6)
converges to some x G V with \\x\\ = 1. 0. (2.1.5) and (2.1.6) imply l i m | | x | | = 1. Therefore, by jjf^jp we may assume w.l.o.g | | x | | = 1. Because of apply Lemma 2.1.2. By (2.1.6), we may find N G N such N
n n
1 2 (n
x
^ ^
i.e. ( # ) n e N is a Cauchy sequence. Since (V,||-|| is a Banach space, i t has a limit x, and
n
\\x\\ = lim | | x | | = 1.
n
q.e.d.

In order to formulate the Hahn-Banach theorem, a fundamental extension result for linear functionals from a linear subspace to the whole space, we need:

Definition 2.1.5. Let $V$ be a (real) vector space. $p:V\to\mathbb{R}^+$ ($\mathbb{R}^+ := \{t\in\mathbb{R}\mid t\ge0\}$) is called convex if
(i) $p(x+y)\le p(x) + p(y)$ for all $x,y\in V$
(ii) $p(\lambda x) = \lambda p(x)$ for all $x\in V$, $\lambda\ge0$.

Example 2.1.3. The norm on a normed vector space.

Let $V_0$ be a linear subspace of the vector space $V$, $f_0:V_0\to\mathbb{R}$ linear. A linear $f:V\to\mathbb{R}$ is called an extension of $f_0$ if
$$f|_{V_0} = f_0.$$
Theorem 2.1.1 (Hahn-Banach). Let $V_0$ be a linear subspace of the vector space $V$, $p:V\to\mathbb{R}^+$ convex. Suppose that $f_0:V_0\to\mathbb{R}$ is linear and satisfies
$$f_0(x)\le p(x)\quad\text{for all } x\in V_0. \tag{2.1.7}$$
Then there exists an extension $f:V\to\mathbb{R}$ of $f_0$ with
$$f(x)\le p(x)\quad\text{for all } x\in V. \tag{2.1.8}$$
Remark 2.1.2. We shall need the Hahn-Banach theorem only in the case where $V$ possesses a countable basis, i.e. is separable (see p. 130).

Proof. We may assume $V_0\neq V$. Let $v\in V\setminus V_0$, and let $V_1$ be the linear subspace of $V$ spanned by $V_0$ and $v$, i.e.
$$V_1 := \{x + tv\mid x\in V_0,\ t\in\mathbb{R}\}.$$
We shall now investigate how $f_0$ can be extended to $f_1:V_1\to\mathbb{R}$ with
$$f_1(x)\le p(x)\quad\text{for all } x\in V_1. \tag{2.1.9}$$
We put $f_1(v) =: \alpha$. Then as an extension of $f_0$, $f_1$ satisfies
$$f_1(x + tv) = f_0(x) + t\alpha.$$
Equation (2.1.9) requires
$$f_0(x) + t\alpha\le p(x + tv). \tag{2.1.10}$$
For $t>0$, this is equivalent to
$$\alpha\le p\Bigl(\frac xt + v\Bigr) - f_0\Bigl(\frac xt\Bigr), \tag{2.1.11}$$
and for $t<0$ to
$$\alpha\ge-p\Bigl(-\frac xt - v\Bigr) - f_0\Bigl(\frac xt\Bigr). \tag{2.1.12}$$
For $x_1,x_2\in V_0$, we have
$$f_0(x_2) - f_0(x_1) = f_0(x_2 - x_1)\le p(x_2 - x_1) = p\bigl((x_2 + v) - (x_1 + v)\bigr)\le p(x_2 + v) + p(-x_1 - v),$$
hence
$$-f_0(x_2) + p(x_2 + v)\ge-f_0(x_1) - p(-x_1 - v). \tag{2.1.13}$$
Thus
$$\alpha := \inf_{x_2\in V_0}\bigl(-f_0(x_2) + p(x_2 + v)\bigr)\ \ge\ \sup_{x_1\in V_0}\bigl(-f_0(x_1) - p(-x_1 - v)\bigr)$$
satisfies (2.1.11) and (2.1.12), hence (2.1.10). Thus, the desired extension $f_1$ exists. If $V$ possesses a countable basis, we may use the preceding construction to extend $f_0$ inductively to all of $V$.

If $V$ does not possess a countable basis, we need to use Zorn's lemma to complete the proof. For that purpose, let
$$\Phi := \{\varphi:W\to\mathbb{R}\ \text{extension of } f_0 \text{ to some linear subspace } W,\ V_0\subset W\subset V,\ \text{satisfying } \varphi(x)\le p(x) \text{ for all } x\in W\}.$$
On $\Phi$, we have an obvious ordering relation (namely, for $\varphi_i:W_i\to\mathbb{R}$, $i = 1,2$, we have $\varphi_1\le\varphi_2$ if $W_1\subset W_2$ and $\varphi_2|_{W_1} = \varphi_1$), and every totally ordered subset $\Phi_0$ of $\Phi$ possesses an upper bound, namely $\varphi_0$ defined on the union of the domains of all $\varphi\in\Phi_0$ and coinciding with each such $\varphi$ on its domain of definition. By Zorn's lemma, $\Phi$ then contains a maximal element $f$. Let $W$ be the domain of definition of $f$. $f$ then extends $f_0$ to $W$. If $W$ were not the whole space $V$, we could use the preceding construction to extend $f$ to a larger subspace of $V$, contradicting the maximality of $f$. Therefore, $f$ furnishes the desired extension of $f_0$.
q.e.d.
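The one-step extension in the preceding proof can be made concrete in a small case. The sketch below is not part of the original text; it takes $V = \mathbb{R}^2$, the $\ell^1$ norm as the convex functional $p$, $V_0 = \{(s,0)\}$ with $f_0(s,0) = s$, and $v = (0,1)$, all of which are choices made only for illustration. It computes the two bounds that (2.1.11) and (2.1.12) impose on $\alpha = f_1(v)$ over a finite sample of $V_0$. In general such sampled bounds only approximate the infimum and supremum; in this particular case they are attained on the sample.

```python
def add(x, y):
    """Componentwise sum of two vectors in R^2."""
    return (x[0] + y[0], x[1] + y[1])


def neg(x):
    """Negative of a vector in R^2."""
    return (-x[0], -x[1])


def p(x):
    """A convex functional in the sense of Definition 2.1.5: the l^1 norm on R^2."""
    return abs(x[0]) + abs(x[1])


def f0(x):
    """Linear functional on V0 = {(s, 0)}; it satisfies f0 <= p on V0."""
    return x[0]


def extension_bounds(f0, p, v, samples):
    """Sampled bounds for alpha = f1(v).

    lower = max over x of -f0(x) - p(-x - v)   (from (2.1.12)),
    upper = min over x of -f0(x) + p(x + v)    (from (2.1.11)).
    """
    lower = max(-f0(x) - p(neg(add(x, v))) for x in samples)
    upper = min(-f0(x) + p(add(x, v)) for x in samples)
    return lower, upper


def f1(alpha, y):
    """Extension to V1 = R^2: y = (s, 0) + t * V with V = (0, 1), so f1(y) = f0((s, 0)) + t * alpha."""
    s, t = y
    return f0((s, 0)) + t * alpha


V = (0, 1)
V0_SAMPLES = [(s, 0) for s in range(-5, 6)]  # integer samples keep the arithmetic exact
```

Every $\alpha$ between the two bounds returned by `extension_bounds(f0, p, V, V0_SAMPLES)` gives an extension `f1(alpha, .)` dominated by `p`, while a value outside that interval violates (2.1.9) at some point.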
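Corollary 2.1.1. Let $V_0$ be a linear subspace of the normed vector space $(V,\|\cdot\|)$, $\lambda>0$, $f_0:V_0\to\mathbb{R}$ linear with
$$|f_0(x)|\le\lambda\|x\|\quad\text{for all } x\in V_0.$$
Then there exists an extension $f:V\to\mathbb{R}$ of $f_0$ with
$$|f(x)|\le\lambda\|x\|\quad\text{for all } x\in V.$$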
T h e o r e m 2.1.2 ( H e l l y ) . Let (V, ||-||) be a Banach space, / i , . . . , / linear functionals V R t/iot ore continuous w.r.t. the norm conver gence, / i , c * i , . . . , a G R. Suppose that for any X\,..., A G R
n n
(2.1.14)
fi(xi)=ai
for t = 1 , 2 , . . . , n
(2.1.15)
and | | X | | < / * + .
Proof. Let m < n be the maximal number of linearly independent fi, i = l , . . . , n . I t suffices to consider m linearly independent fi, w.l.o.g. /i /m since the remaining ones are easily seen to be taken care of by (2.1.14). F(x) : = ( / i ( x ) , . . . , fm(x)) may then be considered as a linear map onto R . We equip R w i t h its Euclidean structure. Let
m m
Bp :={xeV\
+e e
\\x\\ < + }.
Then F(B^ ) is a convex set containing 0 as an interior point. Also, F(B^ ) is balanced in the sense that w i t h p G R i t also contains p. We now assume that a i , . . . , a is not contained i n F(JE? ). Be cause of the properties of F(B ) just noted, we may then find A = ( A i , . . . , A ) with
m +e m M+C fl+e m
^2 ^
2=1
iOLi
S U
$>/<(*)
=1 TO
0* + ) 5 > / <
contradicting ( 2 . 1 . 1 4 ) . Thus (ai,...,a )
m
G F(JB
M + C
),
implying
the
claim.
q.e.d.
2.2 Dual spaces and weak convergence

Let $V$ be a vector space. The linear functionals $f:V\to\mathbb{R}$ then also form a vector space. If $(V,\|\cdot\|)$ is a normed vector space, we define the norm of a linear functional $f:V\to\mathbb{R}$ as
$$\|f\|_* := \sup_{x\neq0}\frac{|f(x)|}{\|x\|}. \tag{2.2.1}$$
The easy proof is left to the reader. (See also Lemma 2.3.1 below.)
q.e.d.

Definition 2.2.1. $V^* := \{f:V\to\mathbb{R}\text{ linear with }\|f\|_*<\infty\}$ equipped with the norm (2.2.1) is called the dual space of $(V,\|\cdot\|)$. (It is easy to verify that (2.2.1) defines a norm on $V^*$ in the sense of Definition 2.1.1.)

Lemma 2.2.2. $(V^*,\|\cdot\|_*)$ is a Banach space.

Proof. Let $(f_n)_{n\in\mathbb{N}}\subset V^*$ be a Cauchy sequence. For every $\varepsilon>0$ we may then find $N\in\mathbb{N}$ such that for $n,m\ge N$
$$\|f_n - f_m\|_*<\varepsilon.$$
By (2.2.1), this implies that for every $x\in V$
$$|f_n(x) - f_m(x)|<\varepsilon\|x\|.$$
Therefore, since $\mathbb{R}$ is complete, $(f_n(x))_{n\in\mathbb{N}}$ converges for every $x\in V$. We denote the limit by $f(x)$. $f:V\to\mathbb{R}$ then is a linear functional. It is an easy consequence of the triangle inequality that $\|f\|_*<\infty$ and that $\lim_{n\to\infty}\|f_n - f\|_* = 0$. This implies that $(f_n)_{n\in\mathbb{N}}$ converges to $f\in V^*$, and $(V^*,\|\cdot\|_*)$ therefore is complete, hence a Banach space.
q.e.d.
Remark 2.2.1. We did not assume that $V$ itself is a Banach space.

We now consider $(V^*)^* =: V^{**}$, the dual space of $V^*$, with norm denoted by $\|\cdot\|_{**}$. Any $x\in V$ defines a linear functional
$$i(x):V^*\to\mathbb{R},\qquad i(x)(f) := (f,x) := f(x).$$
Lemma 2.2.3. $\|i(x)\|_{**} = \|x\|$. Thus, the linear functional $i(x):V^*\to\mathbb{R}$ is contained in $V^{**}$, i.e. we have a linear isometric map $i:V\to V^{**}$.

Proof. We have
l ( / , * ) l < l l / I U M I ,
(2.2.2)
:= t\\x\\
for t G E.
0
By the Hahn-Banach theorem (Corollary 2.1.1), we may extend / { x | G E } t o V a s a linear functional / with = 1 and
l ( / , * ) l = N I -
from
Therefore
l l ^ ) I L = sup i ^ > | | x | | . (2.2.3)
Equations (2.2.2) and (2.2.3) imply the result.
q.e.d.

Definition 2.2.2. A normed linear space $(V,\|\cdot\|)$ is called reflexive if $i:V\to V^{**}$ is a bijective isometry (i.e. $\|x\| = \|i(x)\|_{**}$ for all $x\in V$).

Remark 2.2.2.
(1) Since $(V^{**},\|\cdot\|_{**})$ is a Banach space by Lemma 2.2.2, any reflexive space is complete, i.e. a Banach space.
(2) By the remark before Definition 2.2.2, the crucial condition in that definition is the surjectivity of $i$.

Definition 2.2.3.
(i) Let $(V,\|\cdot\|)$ be a normed linear space. $(x_n)_{n\in\mathbb{N}}\subset V$ is said to be weakly convergent to $x\in V$ if $f(x_n)$ converges to $f(x)$ for all $f\in V^*$, in symbols:
$$x_n\rightharpoonup x.$$
(ii) Let $(V^*,\|\cdot\|_*)$ be the dual of a normed linear space. $(f_n)_{n\in\mathbb{N}}\subset V^*$ is said to be weak* convergent to $f\in V^*$ if $f_n(x)$ converges to $f(x)$ for all $x\in V$.
Theorem 2.2.1. Let $V$ be a separable† normed linear space. Let $(f_n)_{n\in\mathbb{N}}\subset V^*$ be bounded, i.e. $\|f_n\|_*\le$ constant (independent of $n$). Then $(f_n)$ contains a weak* convergent subsequence.

† Separable means that $V$ contains a countable subset $(y_\nu)_{\nu\in\mathbb{N}}$ that is dense w.r.t. $\|\cdot\|$, i.e. for every $y\in V$, $\varepsilon>0$ there exists $y_\nu$ with $\|y - y_\nu\|<\varepsilon$.

Proof. Let $(y_\nu)_{\nu\in\mathbb{N}}$ be a dense subset of $V$. Since $(f_n(y_1))_{n\in\mathbb{N}}$ is bounded, a subsequence $(f_n^1(y_1))$ of $(f_n(y_1))$ converges. Having iteratively found a subsequence $(f_n^m)$ of $(f_n)$ for which $(f_n^m(y_\nu))_{n\in\mathbb{N}}$ converges for $1\le\nu\le m$, we may find a subsequence $(f_n^{m+1})$ of $(f_n^m)$ for which also $(f_n^{m+1}(y_{m+1}))_{n\in\mathbb{N}}$ converges. The diagonal sequence $(f_n^n)_{n\in\mathbb{N}}$ then converges at every $y_\nu$, $\nu\in\mathbb{N}$, and since $(y_\nu)_{\nu\in\mathbb{N}}$ is dense in $V$ and the norms $\|f_n\|_*$ are bounded, $(f_n^n(x))_{n\in\mathbb{N}}$ has to converge for every $x\in V$. Thus, we have found a weak* convergent subsequence of $(f_n)_{n\in\mathbb{N}}$.
q.e.d.

Remark 2.2.3.
(1) The argument employed in the preceding proof is called Cantor diagonalization.
(2) Theorem 2.2.1 remains true without the assumption that $V$ is separable, and so does the following:

Corollary 2.2.1. Let $(V,\|\cdot\|)$ be a separable reflexive Banach space. Then every bounded sequence $(x_n)_{n\in\mathbb{N}}$ contains a weakly convergent subsequence.

Proof. By (2.2.2) or reflexivity, $(i(x_n))_{n\in\mathbb{N}}$ is a bounded sequence in $V^{**}$ and therefore contains a weak* convergent subsequence. Since $V$ is reflexive, the limit is of the form $i(x)$ for some $x\in V$. Thus, along this subsequence,
$$f(x_n)\to f(x)\quad\text{for every } f\in V^*,$$
so that $(x_n)_{n\in\mathbb{N}}$ converges weakly to $x$.
q.e.d.
n
o, Banach
Proof. We shall show that i(x ) {feV*\ H/IL < 1 } . Then also
n
is
uniformly
bounded
on
I I ^ ) I I =
sup fev*
(2.2.4)
11/11*
n
is bounded (see Lemma 2.2.3 for the first equality). Since i(x ) is linear, it suffices to show uniform boundedness on some ball in V*. Otherwise, we find a sequence Bj of closed balls, Bj = { / G V* | ||/ - fj\\ < Qi} with Bj+x C JBJ and a subsequence (x' ) of (x )
n n
and with
l i m Qj = 0
j*oo
|(/,x;)|>i
(2.2.5)
By construction, (fj)jeN forms a Cauchy sequence and therefore con verges to some /o G V*, with
oo
/o e Because of (2.2.5), we have |(/o,Ol > J This is impossible since (fo,x ) weakly.
n f
n -Si-
foralljGN.
Example 2.2.1.
(1) In a finite dimensional normed vector space (which automatically is complete, i.e. a Banach space), weak convergence is just componentwise convergence and therefore equivalent to the usual convergence w.r.t. the norm.
(2) In an infinite dimensional reflexive Banach space $(V, \|\cdot\|)$, this is no longer so, because one may always find a sequence $(e_n)_{n\in\mathbb{N}} \subset V$ with $\|e_i\| \le 1$ for all $i$ and $\|e_i - e_j\| \ge 1$ for $i \ne j$. Such a sequence cannot converge w.r.t. $\|\cdot\|$, because it is not a Cauchy sequence, but it always contains a weakly convergent subsequence according to Corollary 2.2.1 (we have shown Corollary 2.2.1 only under the assumption of separability, but it holds true in general).
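The phenomenon of Example 2.2.1 (2) can be made concrete in a small numerical sketch. It is only an illustration under stated assumptions: the space is $\ell^2$ truncated to finitely many coordinates, the $e_n$ are the standard unit vectors, and the functional `f` (with coefficients $1/(k+1)$) is a hypothetical choice, none of which is taken from the text.

```python
import math

def unit_vector(i, dim):
    """i-th standard unit vector of R^dim (a finite truncation of l^2)."""
    return [1.0 if k == i else 0.0 for k in range(dim)]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def pairing(f, x):
    """Value f(x) of the functional induced by the square-summable sequence f."""
    return sum(a * b for a, b in zip(f, x))

dim = 2000
f = [1.0 / (k + 1) for k in range(dim)]  # assumed sample functional, f_k = 1/(k+1)

# Distinct unit vectors stay sqrt(2) apart, so no subsequence is Cauchy in norm.
dist = norm([a - b for a, b in zip(unit_vector(3, dim), unit_vector(7, dim))])

# f(e_n) = 1/(n+1) tends to 0, as weak convergence e_n -> 0 requires for this f.
values = [pairing(f, unit_vector(n, dim)) for n in (0, 9, 99, 999)]
print(dist, values)
```

Checking a single functional does not prove weak convergence, of course; the sketch merely shows how the pairings can decay while the mutual distances do not.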
Lemma 2.2.4. Let $(V, \|\cdot\|)$ be a separable normed space. Then $V^*$ satisfies the first axiom of countability w.r.t. the weak* topology, i.e. for each $f \in V^*$, there exists a sequence $(U_n)_{n\in\mathbb{N}}$ of subsets of $V^*$ that are open in the weak* topology and contain $f$ such that every $U$ that is open in this topology and contains $f$ contains some $U_n$. Consequently, if $(V, \|\cdot\|)$ is also reflexive, then $V^*$ satisfies the first axiom of countability w.r.t. the weak topology.
Proof. Let $f \in V^*$. Every neighbourhood of $f$ w.r.t. the weak* topology contains a neighbourhood of the form
$$U_{\varepsilon; v_1,\dots,v_k}(f) := \{ g \in V^* \mid |g(v_i) - f(v_i)| < \varepsilon \ \text{for } i = 1,\dots,k \}.$$
Since V is separable, there exists a sequence (w ) n C V that is dense w.r.t the 1 t o p o l o g y . We claim that the neighbourhoods of the form u ... (f)
tWilt tWik
form a basis of the neighbourhood system of / of the required type, i.e. every U ,,, (/)
e;Vu yVk
...
iWik
i.e. $g \in U_{\varepsilon; v_1,\dots,v_k}(f)$ as required. Finally, if $V$ is reflexive, then the weak* and the weak topology of $V^*$ coincide. q.e.d.

We now present some further applications of the Hahn-Banach theorem that will be used in Chapter 3.

Lemma 2.2.5. Let $(V, \|\cdot\|)$ be a normed space, $V_0$ a closed linear subspace. Then $V_0$ is also closed w.r.t. weak convergence.
Proof. By the Hahn-Banach theorem (Corollary 2.1.1), for every $x_0 \in V \setminus V_0$, we may find a continuous linear functional $f_0 : V \to \mathbb{R}$ with
$$f_0(x_0) = 1, \quad f_0|_{V_0} = 0.$$
Thus, $x_0$ cannot be a weak limit of a sequence in $V_0$. q.e.d.

Lemma 2.2.6. Let $(V, \|\cdot\|)$ be a reflexive Banach space, $V_0 \subset V$ a closed linear subspace. Then $V_0$ is reflexive.

Proof. We may identify $V_0^{**}$ with a subspace of $V^{**}$, by putting $v(f) = v(f|_{V_0})$ for $f \in V^*$, $v \in V_0^{**}$. Let $v \in V_0^{**}$. Since $V$ is reflexive, there exists $x \in V$ with $v(f) = f(x)$ for all $f \in V^*$. We claim $x \in V_0$. Otherwise, by the Hahn-Banach theorem (Corollary 2.1.1), there exists $f \in V^*$ with $f(x) \ne 0$, $f|_{V_0} = 0$. Since $f(x) = v(f|_{V_0})$ by the above, this is impossible. Since every $f_0 \in V_0^*$ can be extended to $f \in V^*$, again by Hahn-Banach, we conclude
$$v(f_0) = f_0(x) \quad \text{for all } f_0 \in V_0^*.$$
Thus, $v = i(x)$. This implies $V_0^{**} = i(V_0)$, i.e. reflexivity of $V_0$. q.e.d.

Corollary 2.2.2. A Banach space $(V, \|\cdot\|)$ is reflexive if and only if its dual $(V^*, \|\cdot\|_*)$ is reflexive.
Proof. If $V = V^{**}$, then also $V^* = V^{***}$. Thus, if $V$ is reflexive, so is $V^*$. Consequently, if conversely $V^*$ is reflexive, so then is $V^{**}$. Since $V$ can be identified with a closed subspace of $V^{**}$ by Lemma 2.2.2, Lemma 2.2.6 then yields reflexivity of $V$. q.e.d.

Lemma 2.2.7. Let $(V, \|\cdot\|)$ be a normed space, and suppose that $(x_n)_{n\in\mathbb{N}} \subset V$ converges weakly to $x \in V$. Then
$$\|x\| \le \liminf_{n\to\infty} \|x_n\|.$$

Proof. After selection of a subsequence, we may assume that $\|x_n\|$ converges (see Theorem 2.2.2). Assume
$$\|x\| > \lim_{n\to\infty} \|x_n\|.$$
We may find $f \in V^*$ with
$$\|f\|_* = 1, \quad |f(x)| = \|x\|.$$
But then
$$|f(x)| > \lim_{n\to\infty} \|x_n\| \ge \limsup_{n\to\infty} |f(x_n)|,$$
whereas weak convergence gives $f(x) = \lim_{n\to\infty} f(x_n)$.
This contradiction establishes the claim. q.e.d.

Theorem 2.2.3 (Milman). Any uniformly convex Banach space is reflexive.

Proof (Kakutani). Let $(V, \|\cdot\|)$ be a uniformly convex Banach space, and let $x_0^{**} \in V^{**}$. We need to show that there exists some $x_0 \in V$ with
$$i(x_0) = x_0^{**} \qquad (2.2.6)$$
(see Remark 2 after Definition 2.2.2). We may assume w.l.o.g. that $\|x_0^{**}\| = 1$. For every $n \in \mathbb{N}$, we may then find $f_n \in V^*$ with $\|f_n\| = 1$ and
$$1 - \frac{1}{n} \le x_0^{**}(f_n) \le 1. \qquad (2.2.7)$$
We now claim that for every n G N , we may find x fi(x )=x* *(fi)
n 0
fort = l , . . . , n
1+ -. n
(2.2.10)
4*
U > / i
< iixoir
|i=l
and so the claim follows from Helly's Theorem 2.1.2. Since in addition to (2.2.10) also
I K H = ||/n|| I k n l l >
n*oo
For ra > n, we have 2 2 2 - - < fn(Xn) + fn(Xm) < ||x + X \\ < \\x \\ + | | x | | < 2 + - . n n
m m m
xo eV,
and /<(*o)=x5*(/<)
f
f o r i = 1,2,3,...
(2.2.12)
The solution XQ of (2.2.11), (2.2.12) is unique. Namely, if there were another solution x , on one hand, we would have I N + Xoll < 2 (2.2.13) by uniform convexity. On the other hand
0
fi(x hence
+ x' ) = 2x1*(fi)
0
f o r
a 1 1
= fi(x
4 1 1
>
= *5*(/ )
G V\
(2.2.14)
so that XQ* = i(xo), proving the theorem. Let this /o G be given. In the above reasoning, we replace the sequence / i , / 2 , / 3 , . . . by /o, fi, / 2 , / 3 , We then obtain a?Q G V with
Since the solution $x_0$ of (2.2.11), (2.2.12) was shown to be unique, however, we must have $x_0 = x_0'$. Equation (2.2.15) for $i = 0$ then is (2.2.14). q.e.d.
Corollary 2.2.3 (Riesz). Any Hilbert space $(H, (\cdot,\cdot))$ can be identified with its dual $H^*$.

Proof. Since a Hilbert space is uniformly convex, Theorem 2.2.3 implies $H = H^{**}$. On the other hand, any $x \in H$ induces an $f_x \in H^*$ by
$$f_x(y) := (x, y) \quad \text{for } y \in H.$$
Thus, $H$ is isometrically embedded into $H^*$. For the same reason, $H^*$ is isometrically embedded into $H^{**}$, and since $H = H^{**}$, one readily verifies that these embeddings must be surjective, hence $H = H^* = H^{**}$. q.e.d.

Let $M$ be a linear subspace of a Hilbert space $H$. The orthogonal complement $M^\perp$ of $M$ is defined as
$$M^\perp := \{ x \in H : (x, y) = 0 \ \text{for all } y \in M \}.$$
It is clear that $M^\perp$ is a closed linear subspace of $H$. $M$ need not be closed here, but the orthogonal complement of $M$ is the same as the one of its closure $\bar{M}$ in $H$.

Corollary 2.2.4. Let $M$ be a closed linear subspace of the Hilbert space $H$. Then every $x \in H$ can be uniquely decomposed as
$$x = x_1 + x_2 \quad \text{with } x_1 \in M,\ x_2 \in M^\perp.$$
G H*
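space itself, and the restriction $f_x|_M$ is an element of the dual $M^*$. By Corollary 2.2.3, it corresponds to some $x_1 \in M$, i.e. $f_x|_M(y) = (x_1, y)$ for all $y \in M$. We put $x_2 := x - x_1$. Then for all $y \in M$,
$$(x_2, y) = f_x(y) - f_{x_1}(y) = 0 \quad \text{since } f_x = f_{x_1} \text{ on } M.$$
Therefore, $x_2 \in M^\perp$. Thus, we have constructed the required decomposition. Concerning uniqueness, if
$$x = x_1 + x_2 = x_1' + x_2'$$
with $x_1, x_1' \in M$ and $x_2, x_2' \in M^\perp$, then $x_1 - x_1' = x_2' - x_2 \in M \cap M^\perp = \{0\}$, hence $x_1 = x_1'$ and $x_2 = x_2'$. q.e.d.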
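In the finite dimensional case, the decomposition of Corollary 2.2.4 can be computed explicitly. The following is a minimal sketch, assuming $H = \mathbb{R}^3$ with the Euclidean scalar product; the helper names `dot` and `project` and the sample vectors are illustrative choices, not notation from the text. The projection $x_1$ is obtained from an orthonormal basis of $M$ built by Gram-Schmidt.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(x, spanning):
    """Orthogonal projection of x onto M = span(spanning), via Gram-Schmidt."""
    basis = []
    for v in spanning:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n2 = dot(w, w)
        if n2 > 1e-12:  # skip vectors that are linearly dependent on earlier ones
            s = n2 ** 0.5
            basis.append([wi / s for wi in w])
    x1 = [0.0] * len(x)
    for b in basis:
        c = dot(x, b)
        x1 = [p + c * bi for p, bi in zip(x1, b)]
    return x1

M = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # spanning vectors of the subspace M
x = [1.0, 2.0, 4.0]
x1 = project(x, M)                        # component in M
x2 = [a - b for a, b in zip(x, x1)]       # component in the orthogonal complement
print(x1, x2)
```

Here $M^\perp$ is spanned by $(1,-1,1)$, so one expects $x_2$ to be a multiple of that vector and to be orthogonal to both spanning vectors of $M$.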
Of course, the reader knows the preceding result in the case where $H$ is finite dimensional, i.e. a Euclidean space. $x_1$ is interpreted as the orthogonal projection of $x$ onto the subspace $M$, and therefore Corollary 2.2.4 is called the projection theorem.

The next result will be needed for Sections 4.2 and 4.3 when we establish the existence of minimizers for lower semicontinuous, convex functionals.

Theorem 2.2.4 (Mazur). Suppose $(x_n)_{n\in\mathbb{N}}$ converges weakly to $x$ in some Banach space $V$. For every $\varepsilon > 0$, we may then find a convex combination
$$\sum_{n=1}^{N} \lambda_n x_n \quad \Big(\lambda_n \ge 0,\ \sum_{n=1}^{N} \lambda_n = 1\Big)$$
with
$$\Big\| x - \sum_{n=1}^{N} \lambda_n x_n \Big\| < \varepsilon. \qquad (2.2.16)$$

Proof. We consider the set of all such convex combinations,
$$C_0 := \Big\{ \sum_{n=1}^{N} \lambda_n x_n \ \text{with } N \in \mathbb{N},\ \lambda_n \ge 0,\ \sum_{n=1}^{N} \lambda_n = 1 \Big\}.$$
Replacing all $x_n$ by $x_n - x_1$ and $x$ by $x - x_1$, we may assume $0 \in C_0$. If (2.2.16) is not true, then there exists $\varepsilon > 0$ with
$$\|x - y\| \ge \varepsilon \quad \text{for all } y \in C_0. \qquad (2.2.17)$$
$C_1 := \{ z \in V : \|z - y\| < \frac{\varepsilon}{2} \text{ for some } y \in C_0 \}$ is convex and contains the ball with radius $\frac{\varepsilon}{2}$ and center 0. We consider the Minkowski functional $p$ of $C_1$ defined by
$$p(z) := \inf\{ \lambda > 0 : \lambda^{-1} z \in C_1 \}.$$
$p$ is convex in the sense of Definition 2.1.5 since $C_1$ is convex, and continuous since $C_1$ contains the ball of radius $\frac{\varepsilon}{2} > 0$ about 0. Since, because of (2.2.17), $\|x - z\| \ge \frac{\varepsilon}{2}$ for every $z \in C_1$, we have $p(x) > 1$. More precisely, there exists $y_0$ with
$$x = \lambda^{-1} y_0, \quad p(y_0) = 1, \quad 0 < \lambda < 1.$$
We consider the linear subspace
$$V_0 = \{ t y_0,\ t \in \mathbb{R} \} \subset V$$
and the linear functional $f_0(t y_0) = t$. Then $f_0 \le p$ on $V_0$, and by the Hahn-Banach Theorem 2.1.1, there exists an extension $f$ of $f_0$ to all of $V$ with $f \le p$ on $V$. Since $p$ is continuous, $f$ is also continuous (see Lemma 2.2.1). We have
$$\sup_{y \in C_0} f(y) \le \sup_{y \in C_1} f(y) \le \sup_{y \in C_1} p(y) = 1 < \lambda^{-1} = f(\lambda^{-1} y_0) = f(x).$$
This, however, contradicts the fact that $(x_n)_{n\in\mathbb{N}} \subset C_0$ converges weakly to $x$. Thus, (2.2.17) cannot hold, and (2.2.16) is established. q.e.d.
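Mazur's theorem can be illustrated by a sketch under the assumption that $V = \ell^2$ and $x_n = e_n$ are the standard unit vectors (an example chosen here, not taken from the text). The $e_n$ converge weakly to $0$ although $\|e_n\| = 1$, while the convex combinations $\frac{1}{N}(e_1 + \dots + e_N)$ have norm $N^{-1/2}$ and thus approach the weak limit in norm.

```python
import math

def average_of_unit_vectors(N):
    """Coordinates of the convex combination (1/N) * (e_1 + ... + e_N) in l^2."""
    return [1.0 / N] * N

def norm(x):
    return math.sqrt(sum(t * t for t in x))

# Distance of the averaged vectors from the weak limit 0, for growing N.
norms = {N: norm(average_of_unit_vectors(N)) for N in (1, 4, 100, 10000)}
for N, value in norms.items():
    print(N, value, 1.0 / math.sqrt(N))
```

The equal weights $\lambda_n = 1/N$ are just one admissible choice; the theorem only asserts that some convex combination comes within $\varepsilon$.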
2.3 Linear operators between Banach spaces

The results of this section will be used in Chapter 8. In Section 2.2, we considered linear functionals $f : V \to \mathbb{R}$; in the beginning, $V$ was a normed linear space, with norm denoted by $\|\cdot\|$, and later, we also assumed that $V$ was complete, i.e. a Banach space. In the present section, we replace the target $\mathbb{R}$ by a general Banach space $W$, with norm also denoted by $\|\cdot\|$. We thus consider linear operators
$$T : V \to W,$$
and we put
$$\|T\| := \sup_{x \ne 0} \frac{\|Tx\|}{\|x\|} \in \mathbb{R}^+ \cup \{\infty\}. \qquad (2.3.1)$$

Lemma 2.3.1. The linear operator $T : V \to W$ is continuous if and only if $\|T\| < \infty$.

Proof. If $\|T\| < \infty$, then the inequality
$$\|Tx\| \le \|T\|\,\|x\| \qquad (2.3.2)$$
implies that $T$ is continuous. (Of course, this uses the linearity of $T$.) Conversely, if $T$ is continuous, we recall the usual $\varepsilon$-$\delta$ criterion for continuity, and so for $\varepsilon = 1$, we find some $\delta > 0$ with the property that $\|Ty\| \le 1$ if $\|y\| \le \delta$. For $x \in V \setminus \{0\}$, we then have with $y = \delta \frac{x}{\|x\|}$ (so that $\|y\| \le \delta$)
$$\frac{\|Tx\|}{\|x\|} = \frac{\|Ty\|}{\delta} \le \frac{1}{\delta}.$$
Thus $\|T\| \le \frac{1}{\delta} < \infty$. q.e.d.
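For matrices, the operator norm can be written down and the inequality $\|Tx\| \le \|T\|\,\|x\|$ checked by sampling. The sketch below assumes $V = W = \mathbb{R}^2$ with the maximum norm, for which the operator norm is known to be the largest absolute row sum; the matrix `T` and the helper names are illustrative choices.

```python
import random

def apply(T, x):
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def max_norm(x):
    return max(abs(t) for t in x)

def operator_norm(T):
    """Operator norm w.r.t. the max norm on both sides: largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in T)

T = [[1.0, -2.0], [3.0, 0.5]]
norm_T = operator_norm(T)  # row sums are 3.0 and 3.5

random.seed(0)
ratios = []
for _ in range(1000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    if max_norm(x) > 0:
        ratios.append(max_norm(apply(T, x)) / max_norm(x))

# The supremum is attained at x = (1, 1), where the second row gives 3.5.
attained = max_norm(apply(T, [1.0, 1.0])) / max_norm([1.0, 1.0])
print(norm_T, max(ratios), attained)
```

Random sampling only produces lower bounds for the supremum, which is why the maximising vector is evaluated separately.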
The space of continuous linear operators $T : V \to W$ between the normed spaces $(V, \|\cdot\|)$ and $(W, \|\cdot\|)$ is denoted by $L(V, W)$. It becomes a normed space with norm $\|T\|$.

Lemma 2.3.2. If $(W, \|\cdot\|)$ is a Banach space, then so is $(L(V, W), \|\cdot\|)$.

The proof is the same as the one of Lemma 2.2.2, simply replacing $(\mathbb{R}, |\cdot|)$ by $(W, \|\cdot\|)$.

Remark 2.3.1. Again, $(V, \|\cdot\|)$ need not be a Banach space here.

Lemma 2.3.3. Let $T \in L(V, W)$. Then
$$\ker T := \{ x \in V : Tx = 0 \}$$
is a closed linear subspace of $V$.

Proof. $\ker T = T^{-1}(0)$ is the pre-image of a closed set under a continuous map, hence closed. q.e.d.

In the sequel, we shall encounter bijective continuous linear operators
$$T : V \to W$$
between Banach spaces. It is a general theorem in functional analysis, the inverse operator theorem, that the inverse of $T$, denoted by $T^{-1}$, is then continuous as well. Here, however, we do not want to prove that result, and we shall therefore frequently assume that $T^{-1}$ is continuous although that assumption is automatically fulfilled in the light of that theorem.
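Lemma 2.3.4. Let $T : V \to W$ be a bijective continuous linear map between Banach spaces, with a continuous inverse $T^{-1}$. If $S \in L(V, W)$ satisfies
$$\|T - S\| < \frac{1}{\|T^{-1}\|}, \qquad (2.3.3)$$
then $S$ is bijective, and $S^{-1}$ is continuous, too.

Proof. Since $S = T(\mathrm{Id} - T^{-1}(T - S))$, the inverse of $S$ is formally given by
$$S^{-1} = \Big( \sum_{\nu=0}^{\infty} \big(T^{-1}(T - S)\big)^{\nu} \Big) T^{-1}. \qquad (2.3.4)$$
For $m \le n$, we have
$$\Big\| \sum_{\nu=m}^{n} \big(T^{-1}(T - S)\big)^{\nu} \Big\| \le \sum_{\nu=m}^{n} \big\| \big(T^{-1}(T - S)\big)^{\nu} \big\| \le \sum_{\nu=m}^{n} \big( \|T^{-1}\|\,\|T - S\| \big)^{\nu},$$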
and since $\|T^{-1}\|\,\|T - S\| < 1$ by assumption, the series satisfies the Cauchy property and hence converges to a linear operator with finite norm. q.e.d.

If $V$ is a vector space, we say that $V$ is the direct sum of the subspaces $V_1, V_2$,
$$V = V_1 \oplus V_2,$$
if every $x \in V$ can be uniquely decomposed as
$$x = x_1 + x_2 \quad \text{with } x_1 \in V_1,\ x_2 \in V_2.$$
We then also call $V_1$ and $V_2$ complementary subspaces of $V$. Easy linear algebra also shows that if $V_1$ possesses a complementary subspace of finite dimension, then the dimension of that space is uniquely determined, i.e. if $V_1 \oplus V_2 = V_1 \oplus V_2'$, then $\dim V_2 = \dim V_2'$.
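The series (2.3.4) from the proof of Lemma 2.3.4 can be tried out on matrices. This is a minimal sketch, assuming $V = W = \mathbb{R}^2$ and hypothetical sample matrices `T` and `S`; the function name `neumann_inverse` and the number of summed terms are illustrative choices. It sums finitely many terms of $\sum_\nu (T^{-1}(T-S))^\nu$ and multiplies by $T^{-1}$.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def neumann_inverse(T_inv, T, S, terms=60):
    """Partial sum of (sum_{v>=0} (T^{-1}(T - S))^v) T^{-1}, approximating S^{-1}."""
    n = len(T)
    diff = [[T[i][j] - S[i][j] for j in range(n)] for i in range(n)]
    K = matmul(T_inv, diff)  # K = T^{-1}(T - S)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # K^0
    total = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        total = matadd(total, power)
        power = matmul(power, K)
    return matmul(total, T_inv)

T = [[2.0, 0.0], [0.0, 2.0]]
T_inv = [[0.5, 0.0], [0.0, 0.5]]
S = [[2.0, 0.5], [0.3, 1.8]]  # a small perturbation of T
S_inv = neumann_inverse(T_inv, T, S)
check = matmul(S, S_inv)      # should be close to the identity matrix
print(check)
```

The perturbation is chosen small relative to $1/\|T^{-1}\|$, so the terms decay geometrically and a modest number of them suffices.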
We now consider a normed vector space $(V, \|\cdot\|)$. Then every finite dimensional subspace $V_0$ is complete, hence closed. We also have:

Lemma 2.3.5. Let $V_0 \subset V$ be a finite dimensional subspace of the normed vector space $(V, \|\cdot\|)$. Then $V_0$ possesses a closed complementary subspace $V_1$, i.e. $V = V_0 \oplus V_1$.

Proof. Let $e_1, \dots, e_n$ be a basis of $V_0$. By the Hahn-Banach theorem, there exist continuous linear functionals $f_j \in V^*$ with
$$f_j(e_i) = \delta_{ij} \quad (i, j = 1, \dots, n).$$
We put
$$\pi(x) := \sum_{j=1}^{n} f_j(x)\, e_j.$$
$\pi$ is continuous, with $\pi(V) = V_0$. $V_1 := \ker \pi$ then is closed as the kernel of a continuous linear operator (Lemma 2.3.3), and every $x \in V$ admits the unique decomposition
$$x = \pi(x) + (x - \pi(x))$$
with $\pi(x) \in V_0$, $x - \pi(x) \in V_1$, because $\pi \circ \pi = \pi$.
Definition 2.3.1. Let $T : V \to W$ be a continuous linear operator between Banach spaces $(V, \|\cdot\|)$ and $(W, \|\cdot\|)$. $T$ is called a Fredholm operator if the following conditions hold:
(i) $V_0 := \ker T$ is finite dimensional. Consequently, according to Lemma 2.3.5, there exists a closed subspace $V_1$ of $V$ with
$$V = V_0 \oplus V_1. \qquad (2.3.5)$$
(ii) There exists a finite dimensional subspace $W_0$ of $W$, called the cokernel of $T$ ($\operatorname{coker} T$), giving rise to a decomposition of $W$ into closed subspaces
$$W = W_0 \oplus W_1 \qquad (2.3.6)$$
with $W_1 = T(V)$. Thus, $T$ yields a bijective continuous linear operator $T_1 : V_1 \to W_1$. We finally require
(iii) $T_1^{-1} : W_1 \to V_1$ is continuous.
not a vector
Remark 2.3.3. As mentioned, condition (iii) is automatically satisfied as a consequence of the inverse operator theorem.

Remark 2.3.4. In our conventions, the cokernel of $T$ is only determined up to isomorphism, i.e. any $W_0$ satisfying (2.3.6) with $W_1 = T(V)$ is a cokernel. Usually, one defines the cokernel as the quotient space $W/W_1$, but here we do not want to introduce quotient spaces of Banach spaces.

Theorem 2.3.1. Let $V, W$ be Banach spaces. $F(V, W)$ is open in $L(V, W)$, and
$$\operatorname{ind} : F(V, W) \to \mathbb{Z}$$
is continuous, hence constant on each connected component of $F(V, W)$.

Proof. Let $T : V \to W$ be a Fredholm operator. We use the decompositions
$$V = V_0 \oplus V_1 \quad \text{with } V_0 = \ker T,$$
$$W = W_0 \oplus W_1 \quad \text{with } W_0 = \operatorname{coker} T$$
of Definition 2.3.1. For $S \in L(V, W)$, we define a continuous linear operator
$$S' : V_1 \times W_0 \to W, \quad S'(x, z) := Sx + z.$$
Since $T_1 : V_1 \to W_1$ is bijective with a continuous inverse, $T'$ (i.e. $S'$ for $S = T$) is also bijective with a continuous inverse, and by Lemma 2.3.4 this then also holds for $S'$ for all $S$ in some neighbourhood of $T$. For such $S$, $S'(V_1)$ is closed as $V_1$ is closed and $S'$ has a continuous inverse, and we have the decomposition
$$W = S'(V_1) \oplus S'(W_0),$$
and since $S'(V_1) = S(V_1)$ also
$$W = S(V_1) \oplus S'(W_0), \qquad (2.3.7)$$
and since $W_0$ is finite dimensional, so is $S'(W_0)$. Then $S(V) \supset S(V_1)$ is also closed since $S(V_1)$ is closed and possesses a complementary subspace of finite dimension. Finally, the dimension of the kernel of $S$ is upper semicontinuous.
Namely, if $S$ is in our above neighbourhood of $T$, then since $S'$ is bijective, $S$ is injective on $V_1$, and hence the kernel of $S$ is contained in some complementary subspace of $V_1$, and as observed above, the dimension of such a subspace equals the one of $V_0$. Thus
$$\dim \ker S \le \dim \ker T \qquad (2.3.8)$$
if $S$ is in a suitable neighbourhood of $T$ in $L(V, W)$. Altogether, we have verified that $S$ is a Fredholm operator if it is sufficiently close to $T$. From the preceding, we see that there exist finite dimensional subspaces $V_0' = \ker S$ and $V_0''$ of $V$ with
$$V = V_0' \oplus V_0'' \oplus V_1 \quad (V_0 = \ker T) \qquad (2.3.9)$$
and
$$W = S(V_1) \oplus S(V_0'') \oplus W_0'$$
with $W_0' = \operatorname{coker} S$, and from (2.3.7)
$$\dim S(V_0'') + \dim W_0' = \dim S'(W_0) = \dim W_0. \qquad (2.3.10)$$
Since $S$ is injective on $V_0''$, we have $\dim S(V_0'') = \dim V_0''$, and therefore
$$\operatorname{ind} S = \dim V_0' - \dim W_0' = (\dim V_0 - \dim V_0'') - (\dim W_0 - \dim S(V_0'')) = \dim V_0 - \dim W_0 = \operatorname{ind} T$$
by (2.3.9), (2.3.10), for $S$ in some neighbourhood of $T$. q.e.d.

The following result motivates the definition of a Fredholm operator:

Theorem 2.3.2 (Fredholm alternative). Let $V$ be a Banach space, $T : V \to V$ a Fredholm operator of index 0. We consider the equation
$$Tx = y. \qquad (2.3.11)$$
(i) Either $Tx = y$ is solvable for all $y$, and thus $T$ is surjective, hence also injective as $\operatorname{ind} T = 0$, and so the solution $x$ is uniquely determined by $y$, or
(ii) $Tx = y$ is only solvable if $y$ is contained in some proper subspace of $V$ (with a finite dimensional complementary subspace), and for each such $y$, the solutions $x$ constitute a finite dimensional affine subspace.

Proof. A direct consequence of the definition. q.e.d.
2.4 Calculus in Banach spaces

In this section, we collect some material that will only be used in Chapters 8 and 9.

Definition 2.4.1. Let $(V, \|\cdot\|_V)$, $(W, \|\cdot\|_W)$ be Banach spaces, $F : V \to W$ a map. $F$ is called differentiable (in the sense of Fréchet) at $u \in V$ if there exists a bounded linear map $DF(u) : V \to W$ with
$$\lim_{v \to 0,\ v \ne 0} \frac{\|F(u + v) - F(u) - DF(u)(v)\|_W}{\|v\|_V} = 0.$$
$F$ is called differentiable in $U \subset V$ if it is differentiable at every $u \in U$. $F$ is said to be of class $C^1$ if $DF(u)$ depends continuously on $u$. $F$ is said to be of class $C^2$ if $DF(u)$ is differentiable in $u$ and the derivative $D^2F(u) := D(DF)(u)$ depends continuously on $u$.
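The defining quotient of Definition 2.4.1 can be observed numerically. The following sketch assumes $V = W = \mathbb{R}^2$ with the Euclidean norm and a sample map `F` chosen here for illustration; `DF` is its Jacobian matrix, computed by hand. The quotient should shrink as the increment $v$ shrinks.

```python
import math

def F(u):
    x, y = u
    return [x * x + y, math.sin(x) * y]

def DF(u):
    """Jacobian matrix of F at u; it represents the bounded linear map DF(u)."""
    x, y = u
    return [[2.0 * x, 1.0], [math.cos(x) * y, math.sin(x)]]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def remainder_ratio(u, v):
    """||F(u+v) - F(u) - DF(u)v|| / ||v||, the quotient in the definition."""
    Fu = F(u)
    Fuv = F([a + b for a, b in zip(u, v)])
    J = DF(u)
    lin = [sum(J[i][j] * v[j] for j in range(2)) for i in range(2)]
    return norm([Fuv[i] - Fu[i] - lin[i] for i in range(2)]) / norm(v)

u = [0.7, -1.2]
ratios = [remainder_ratio(u, [h, -0.5 * h]) for h in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

Only one direction of $v$ is sampled here; differentiability requires the quotient to tend to zero uniformly over all directions.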
It is easy to show that a differentiable map is continuous. We now wish to derive the implicit and inverse function theorems in Banach spaces that will be used in Chapter 8. We shall need a technical tool, the Banach fixed point theorem:

Lemma 2.4.1. Let $A$ be a closed subset of some Banach space $(V, \|\cdot\|)$. Let $0 < q < 1$, and suppose $G : A \to A$ satisfies
$$\|Gy_1 - Gy_2\| \le q\,\|y_1 - y_2\| \quad \text{for all } y_1, y_2 \in A. \qquad (2.4.2)$$
Then there exists a unique $y \in A$ with
$$Gy = y. \qquad (2.4.3)$$
If we have a continuous family $G(x)$ where all the $G(x)$ satisfy (2.4.2) (with $q$ not depending on $x$), then the solution $y = y(x)$ of (2.4.3) depends continuously on $x$.

Proof. We choose $y_0 \in A$ and put iteratively
$$y_n := G y_{n-1}.$$
We have
$$y_n = \sum_{i=1}^{n} (y_i - y_{i-1}) + y_0 = \sum_{i=1}^{n} \big( G^{i-1} y_1 - G^{i-1} y_0 \big) + y_0 \qquad (2.4.4)$$
and
$$\sum_{i=1}^{n} \|y_i - y_{i-1}\| \le \sum_{i=1}^{n} q^{i-1} \|y_1 - y_0\| \le \frac{1}{1 - q} \|y_1 - y_0\|.$$
Consequently, the series $y_n$ in (2.4.4) converges absolutely and uniformly to some $y \in A$, noting that $A$ is assumed to be closed, and the limit function $y = y(x)$ is continuous. We have
$$y = \lim_{n\to\infty} G y_{n-1} = G\Big( \lim_{n\to\infty} y_{n-1} \Big) = Gy,$$
hence (2.4.3). The uniqueness of a solution of (2.4.3) follows from (2.4.2), since $q < 1$. q.e.d.

Theorem 2.4.1 (Implicit Function Theorem). Let $V_1, V_2, W$ be Banach spaces with all norms denoted by $\|\cdot\|$, $U \subset V_1 \times V_2$ open, $(x_0, y_0) \in U$, $F \in C^1(U, W)$, i.e. $F$ is continuously differentiable. For purposes of normalization solely, we assume
$$F(x_0, y_0) = 0. \qquad (2.4.5)$$
Suppose that $D_2F(x_0, y_0)$, the derivative of $F(x_0, \cdot) : V_2 \to W$ at $y = y_0$, is invertible. By our differentiability assumption, $D_2F(x_0, y_0)$ is continuous, and we assume that its inverse is likewise continuous. Then there exist open neighbourhoods $U_1$ of $x_0$, $U_2$ of $y_0$ with $U_1 \times U_2 \subset U$, and a differentiable map
$$\varphi : U_1 \to U_2$$
with
$$F(x, \varphi(x)) = 0 \qquad (2.4.6)$$
and
$$D\varphi(x) = -\big(D_2F(x, \varphi(x))\big)^{-1} \circ D_1F(x, \varphi(x)) \quad \text{for all } x \in U_1 \qquad (2.4.7)$$
($D_1F(\cdot, y) : V_1 \to W$ is the derivative of $F(\cdot, y) : V_1 \to W$). In fact, for every $x \in U_1$, $\varphi(x)$ is the only solution of (2.4.6) in $U_2$.
The content of the implicit function theorem is that the equation $F(x, y) = 0$ can be solved locally uniquely for $y$ as a function of $x$, if the derivative of $F$ w.r.t. $y$ is continuously invertible.

Proof. The idea is to transform the problem into a fixed point problem for which the Banach fixed point theorem is applicable. We put $l := D_2F(x_0, y_0)$ and consider the equation
$$y = \Phi(x, y) := y - l^{-1} F(x, y), \qquad (2.4.8)$$
which clearly is equivalent to our original equation $F(x, y) = 0$. For every $x$, we thus want to find a fixed point of $\Phi(x, \cdot)$. Using $l^{-1} \circ l = \mathrm{id}$ (note that $l$ is invertible by assumption), we get
$$\Phi(x, y_1) - \Phi(x, y_2) = l^{-1}\big( D_2F(x_0, y_0)(y_1 - y_2) - (F(x, y_1) - F(x, y_2)) \big).$$
In Lemma 2.4.1, we take q = ^, and by the differentiability of F at ( Oiyo) and the continuity of Z"" , we may find 6* > 0, > 0 with the property that for
x 1
12 ~ 2o 1 1/ 1 / 1
( hence also
- y \\ < 2e ) ,
2
we have ||*(ar,2/o) - * ( o , 2 / o ) | | < | Since $(sco, t/o) = 2/o by assumption, we then have for ||y j/o|| < - voll < - * ( z , y ) l l + ll*(,yo) - * ( a o , > ) | |
0
< ^lly-yoll + l
0
<e
whenever x | | < < : = min(<5',6"). This means that i f \\x #o|| < <5, 5 $ ( x , y) maps the closed ball A : = {y Vi : | | y - l / o | | < onto itself. By Lemma 2.4.1, for every x with \\x x \\ < <5, there exists a unique y =: tp(x) w i t h ||y y | | < d 2/ = $ ( # , t/), i.e. F(x,y) = 0. Moreover, t/ depends continuously on x. We consider the open balls
0 a n 0
Ux : = { x : H * - soil < } , ^2 : = { 2 / : ||y - l/o|| < e } . (<&(sc, ) also maps the open ball t/2 onto itself.) By choosing <$, > 0 smaller, i f necessary, we may assume
U1XU2C U.
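It remains to show that $\varphi$ is differentiable and that its derivative is given by (2.4.7). We consider $(x_1, y_1) := (x_1, \varphi(x_1)) \in U_1 \times U_2$ and put
$$l_1 := D_1F(x_1, y_1), \quad l_2 := D_2F(x_1, y_1).$$
Since $F$ is differentiable and $F(x_1, y_1) = 0$, we may write
$$F(x, y) = l_1(x - x_1) + l_2(y - y_1) + r(x, y),$$
where the remainder term satisfies
$$\lim_{(x,y) \to (x_1,y_1)} \frac{\|r(x, y)\|}{\|x - x_1\| + \|y - y_1\|} = 0. \qquad (2.4.9)$$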
I k - x i H < rj, | | 2 / - 2 / i | | < p Mx>v)\\ Thus I K * , <p(x))\\ < By (2.4.10), (2.4.11), | | ^ ( x ) - < ^ i ) l l < WtfhW\\x - x i | | + \ \\x - x i 1 + ~\\<p{x) 1 hence ||(/?(x) ^ ( x i ) | | < c\\x x\\\ for a constant c. We abbreviate r (x)
0 V
< || i||(lk-*i|| +
2 r
\\y-yill)-
(\\x - \\
Xl
+ \\<p(x) - ( x i ) | | ) .
V
(2.4.11)
(xi)||,
ip(x) with
(2.4.12)
from (2.4.9).
(2.4.13)
(2.4.12) and (2.4.13) yield the differentiability of $\varphi$ and (2.4.7). q.e.d.

Corollary 2.4.1 (Inverse Function Theorem). Let $V, W$ be Banach spaces, $U \subset V$ open, $y_0 \in U$. Let $f : U \to W$ be continuously differentiable, and assume that the derivative $Df(y_0)$ is invertible with a continuous inverse. Then there exist open neighbourhoods $U_2 \subset U$ of $y_0$ and $U_1$ of $f(y_0) =: x_0$ so that $f$ maps $U_2$ bijectively onto $U_1$, and the inverse $\varphi := f^{-1} : U_1 \to U_2$ is differentiable with
$$D\varphi(x_0) = (Df(y_0))^{-1}. \qquad (2.4.14)$$

Proof. We shall apply Theorem 2.4.1 to $F(x, y) := f(y) - x$, and find an open neighbourhood $U_1$ of $x_0$ and a differentiable function $\varphi : U_1 \to V$ with $\varphi(x_0) = y_0$ and $f(\varphi(x)) = x$ for $x \in U_1$.
As $\varphi(U_1) = f^{-1}(U_1)$ is open, we may redefine $U_2$ as $\varphi(U_1)$, and $\varphi$ then yields a bijection between $U_1$ and $U_2$. As $f(\varphi(x)) = x$, the chain rule implies
$$Df(\varphi(x_0)) \circ D\varphi(x_0) = \mathrm{Id},$$
i.e. (2.4.14). q.e.d.
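Formula (2.4.14) can be checked in the simplest case $V = W = \mathbb{R}$. The sketch below assumes the sample function $f(y) = y + y^3$, chosen here because it is strictly increasing, and computes its inverse by bisection (a numerical device, not part of the proof); a difference quotient of the inverse is then compared with $1/f'(y_0)$.

```python
def f(y):
    return y + y ** 3

def df(y):
    return 1.0 + 3.0 * y ** 2

def phi(x, lo=-10.0, hi=10.0):
    """Inverse of the strictly increasing f on [lo, hi], computed by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y0 = 1.0
x0 = f(y0)  # x0 = 2
h = 1e-6
numeric = (phi(x0 + h) - phi(x0 - h)) / (2.0 * h)  # difference quotient of phi at x0
predicted = 1.0 / df(y0)                            # (Df(y0))^{-1} = 1/4
print(numeric, predicted)
```

In one dimension the inverse of the linear map $Df(y_0)$ is just the reciprocal of $f'(y_0)$, which is what `predicted` encodes.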
The next topic concerns ordinary differential equations in Banach spaces. In Chapter 9, we shall use the Picard-Lindelöf theorem in a Banach space that we shall now derive. We need the integral of a continuous function $x : I \to V$ from some interval $I = [a, b] \subset \mathbb{R}$ into some Banach space $V$,
$$\int_a^b x(t)\,dt.$$
This can be defined as a Riemann integral as in the case of real-valued functions through approximation by step functions. Given a continuous $\Phi : \mathbb{R} \times V \to V$, we say that $x(t)$ solves the ODE (ordinary differential equation) on $I$,
$$\frac{d}{dt} x(t) = \dot{x}(t) = \Phi(t, x(t)) \quad \text{with } x(a) = x_0, \qquad (2.4.15)$$
if for all $t \in I$
$$x(t) = x_0 + \int_a^t \Phi(\tau, x(\tau))\,d\tau. \qquad (2.4.16)$$

Theorem 2.4.2 (Picard-Lindelöf). Suppose that $\Phi$ is uniformly Lipschitz continuous, i.e. suppose there exists some $L < \infty$ with
$$\|\Phi(t_1, x_1) - \Phi(t_2, x_2)\| \le L\big( |t_1 - t_2| + \|x_1 - x_2\| \big) \qquad (2.4.17)$$
for all $t_1, t_2 \in I$, $x_1, x_2 \in V$. Then, for every $x_0 \in V$, there exists a unique solution of (2.4.15).
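Proof. We shall solve (2.4.16) with the help of Lemma 2.4.1. For a continuous $y : I \to V$, we define $Gy \in C^0(I, V)$,
$$(Gy)(t) := x_0 + \int_a^t \Phi(\tau, y(\tau))\,d\tau.$$
We note that $C^0(I, V)$, the space of continuous functions from $I$ with values in $V$, is a Banach space w.r.t. the norm
$$\|y\|_{C^0} := \sup_{t \in I} \|y(t)\|.$$
(To verify this, one just needs to observe that any sequence $(y_n)_{n\in\mathbb{N}} \subset C^0(I, V)$ with
$$\lim_{n,m\to\infty} \|y_n - y_m\|_{C^0} \Big( = \lim_{n,m\to\infty} \sup_{t \in I} \|y_n(t) - y_m(t)\| \Big) = 0$$
converges uniformly, and that a uniform limit of continuous functions is continuous.) For $y_1, y_2 \in C^0(I, V)$, we have
$$\|Gy_1 - Gy_2\|_{C^0} = \sup_{t \in I} \Big\| \int_a^t \big( \Phi(\tau, y_1(\tau)) - \Phi(\tau, y_2(\tau)) \big)\,d\tau \Big\| \le \sup_{t \in I} |t - a|\, L\, \|y_1 - y_2\|_{C^0}$$
because of (2.4.17).

We choose $\varepsilon > 0$ so small that $\varepsilon L \le \frac{1}{2}$. Lemma 2.4.1 with $V$ replaced by $C^0([a, a + \varepsilon], V)$ and with $q = \frac{1}{2}$ then implies that there exists a unique $y \in C^0([a, a + \varepsilon], V)$ with
$$y(t) = (Gy)(t) = x_0 + \int_a^t \Phi(\tau, y(\tau))\,d\tau \quad \text{for } a \le t \le a + \varepsilon.$$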
Repeating the construction with $a + \varepsilon$ in place of $a$ and $y(a + \varepsilon)$ in place of $x_0$ yields the solution on $[a, a + 2\varepsilon]$, and so on. q.e.d.

Remark 2.4.1. If $I$ is an infinite or semi-infinite interval, e.g. $I = [a, \infty)$, and if (2.4.17) holds on $I$, we obtain a solution of (2.4.15) on $I$, since Theorem 2.4.2 yields a solution on every interval $[a, b]$ with $b < \infty$.

Corollary 2.4.2. Let the assumptions of Theorem 2.4.2 be satisfied on the interval $I = [0, \infty)$, and suppose that $\Phi$ does not depend explicitly on $t$, i.e. $\Phi : V \to V$, $\Phi = \Phi(x)$. For $x_0 \in V$ we thus consider the ODE
$$\dot{x}(t) = \Phi(x(t)), \quad x(0) = x_0 \qquad (2.4.18)$$
($x(0)$, the value at 'time' 0, is called initial value). We denote the solution by $x(x_0, t)$. Then for $s, t \ge 0$,
$$x(x_0, t + s) = x(x(x_0, t), s) \quad \text{(semigroup property)}.$$
Thus, the solution with initial value $x_0$ at 'time' $t + s$ is the same as the solution with initial value $x(x_0, t)$ computed at 'time' $s$.

Proof. This follows from the uniqueness statement in Theorem 2.4.2, as both sides of the semigroup identity, regarded as functions of $s$, are solutions of (2.4.18) with the same initial value $x(x_0, t)$. q.e.d.
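The iteration $y_n = G y_{n-1}$ from the proof of Theorem 2.4.2 and the semigroup property of Corollary 2.4.2 can both be tried out in a small sketch. It assumes $V = \mathbb{R}$ and the sample right-hand side $\Phi(x) = -x$ (so $L = 1$ and $\varepsilon = \frac12$ is admissible); functions are stored on a grid and the integral is replaced by the trapezoidal rule, which is a discretisation introduced here and not part of the theorem. The names `picard` and `solve` are illustrative.

```python
import math

def picard(phi, x0, eps, steps=200, iterations=30):
    """Iterate y_n = G y_{n-1} with (Gy)(t) = x0 + int_0^t phi(y(tau)) dtau on [0, eps].

    Functions are represented by their values on a uniform grid, and the
    integral is approximated by the trapezoidal rule.
    """
    dt = eps / steps
    y = [x0] * (steps + 1)  # starting guess: the constant function x0
    for _ in range(iterations):
        new = [x0]
        for k in range(steps):
            new.append(new[-1] + 0.5 * dt * (phi(y[k]) + phi(y[k + 1])))
        y = new
    return y

def solve(phi, x0, t_end, eps=0.5):
    """Continue the local solution interval by interval, as at the end of the proof."""
    t, x = 0.0, x0
    while t_end - t > 1e-12:
        step = min(eps, t_end - t)
        x = picard(phi, x, step)[-1]
        t += step
    return x

phi = lambda x: -x                            # Lipschitz constant L = 1, eps * L = 1/2
x_half = solve(phi, 1.0, 0.5)                 # should be close to exp(-0.5)
lhs = solve(phi, 1.0, 1.5)                    # x(x0, t + s) with t = 1.0, s = 0.5
rhs = solve(phi, solve(phi, 1.0, 1.0), 0.5)   # x(x(x0, t), s)
print(x_half, math.exp(-0.5), lhs, rhs)
```

Because the equation is autonomous, restarting the iteration at each subinterval from time 0 with the current value as initial value is legitimate; for a time-dependent $\Phi$ the starting time would have to be carried along.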
Exercises 2.1 Let (V,\\'\\y) (W, \ \ - \ \ ) be normed linear spaces. For a linear functional
w
2.2
Show that / is continuous iff | | / | | < oo. Let L(V,W) := {/ : V -+ W linear w i t h | | / | | < oo}. Show that i f ( W ^ I H I ^ ) is a Banach space then so is (L(V, W), ||-||). Show that a normed space (V, ||-||) is uniformly convex i f the following condition holds: Whenever ( x ) , (y )nN C V satisfy
n n N n
nco
then lim (x - y ) = 0. 7 O 1 O A normed space (V, ||-||) is called strictly normed i f the following condition holds: Whenever x , t / G ^ , x , | / / 0 satisfy
n n
2.3
l k
y||
I W I+
llll
Banach spaces Show that any uniformly convex normed space is strictly normed. Does the Banach fixed point theorem (Lemma 2.4.1) continue to hold if we replace (2.4.2) by the condition \\Gyi - Gy \\ < \\yi - y \\ for all y ,y
2 2 x 2
e AI
3.1 $L^p$ spaces

In the sequel, instead of functions $f : A \to \mathbb{R} \cup \{\infty\}$ ($A$ measurable), we shall consider equivalence classes of functions, where $f$ and $g$ are equivalent if $f(x) = g(x)$ for almost all $x \in A$. We shall be lax with the notation, however, not distinguishing between a function and its equivalence class. The equivalence class of the zero function is called the null class, and a function in that class is called a null function.

Definition 3.1.1. Let A C R L (A)
P d
be measurable, p G R \ { 0 } .
For f e L (A),
we put
I I / I I
^ I I / I I L
( A ) : = ( / J / W I
^ )
(3-1.1)
The notation suggests that ||-|| verify this for p > 1. First of all,
p
(3.1.2)
Thus, ||-|| is positive definite (on the set of equivalence classes). Next, for c G R, l|c/|| = |c|||/|| .
p p
(3.1.3)
h\\ i
L {A)
< WIIWLHA)
159
+ ll/allt^)
( -
Lemma 3.1.1 (Hölder's inequality). Let $p, q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, $f_1 \in L^p(A)$, $f_2 \in L^q(A)$. Then $f_1 f_2 \in L^1(A)$, and
$$\|f_1 f_2\|_1 \le \|f_1\|_p\, \|f_2\|_q. \qquad (3.1.5)$$
P
I l /
| | ,
= 1.
(3-1.6)
Recalling Young's inequality, namely a b ab < + p q we have for x A /1W/2W ^ z p hence by our normalization (3.1.6) / \fl(x)f2{x)\dx<
JA P
p q
(3.1.7)
+ ~ g
We now obtain the triangle inequality: L e m m a 3.1.2 (Minkowski's inequality). Let / i , / Then | | / i + / 2 l | < | | / i | | + ll/a|| .
p p p 2
U>(A), p > 1.
(3.1.8)
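Both inequalities can be sanity-checked on discretised functions. The following sketch assumes $A = [0,1]$ split into equal cells, with functions represented by random cell values and integrals by Riemann sums; the exponent $p = 3$ and the helper `p_norm` are illustrative choices. Since weighted finite sums are themselves integrals w.r.t. a discrete measure, Hölder's and Minkowski's inequalities hold for them exactly, up to rounding.

```python
import random

def p_norm(f, p, weight):
    """(sum |f_k|^p * weight)^(1/p): a Riemann-sum stand-in for the L^p(A) norm."""
    return (sum(abs(t) ** p for t in f) * weight) ** (1.0 / p)

random.seed(1)
n = 1000
weight = 1.0 / n  # cell size of a uniform partition of A = [0, 1]
f1 = [random.uniform(-1, 1) for _ in range(n)]
f2 = [random.uniform(-1, 1) for _ in range(n)]

p = 3.0
q = p / (p - 1.0)  # conjugate exponent, 1/p + 1/q = 1

holder_lhs = p_norm([a * b for a, b in zip(f1, f2)], 1.0, weight)
holder_rhs = p_norm(f1, p, weight) * p_norm(f2, q, weight)

minkowski_lhs = p_norm([a + b for a, b in zip(f1, f2)], p, weight)
minkowski_rhs = p_norm(f1, p, weight) + p_norm(f2, p, weight)
print(holder_lhs, holder_rhs, minkowski_lhs, minkowski_rhs)
```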
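Proof. The case $p = 1$ is given by (3.1.4). We now consider $p > 1$ and put $q = \frac{p}{p-1}$ (so that $\frac{1}{p} + \frac{1}{q} = 1$). For $\psi(x) := |f_1(x) + f_2(x)|^{p-1}$, we have
$$\psi^q = |f_1 + f_2|^p, \qquad |f_1 + f_2|^p \le |f_1 \psi| + |f_2 \psi|,$$
hence
$$\|f_1 + f_2\|_p^p \le \|f_1 \psi\|_1 + \|f_2 \psi\|_1 \le \big( \|f_1\|_p + \|f_2\|_p \big) \|\psi\|_q$$
by Hölder's inequality. Since $\|\psi\|_q = \|f_1 + f_2\|_p^{p/q} = \|f_1 + f_2\|_p^{p-1}$, (3.1.8) follows. q.e.d.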
I n fact, we have:
Proof. Let ( / ) e N C L (A) be a Cauchy sequence. For every v G N, we may then find n G N w i t h
n n v
ll/n - / n j |
<
^7
then converges in L (A). Since all elements of the series are nonnegative, ( # m ) m N converges to some g : A R U {oo} pointwise in A, and Corollary 1.2.1 implies that (g ) also converges to g in L ( A ) . I n particular, g(x) < oo for almost all x G A. Thus, our original sequence (3.1.10) is absolutely convergent for almost all x A, towards some / w i t h \f\<g+ |/m|; in particular / G L ( 0 ) . We interrupt the proof to record:
+ P m p
Lemma 3.1.3. Let $(f_n)_{n\in\mathbb{N}}$ converge to $f$ in $L^p(A)$. Then a subsequence converges pointwise almost everywhere to $f$.

In order to complete the proofs of Lemma 3.1.3 and Theorem 3.1.1, it remains to show that the series (3.1.10) converges to $f$ in $L^p(A)$. (Then a subsequence of $(f_n)$ converges to $f$ in $L^p(A)$. Since $(f_n)$ was assumed to be a Cauchy sequence in $L^p(A)$, the whole sequence has to converge in $L^p(A)$. It is in general not true, however, that the whole sequence also converges pointwise almost everywhere to $f$.) This is easy:
C O
/n>(*) + E (/.+(*) " " /(*)
162
/m(s) + E
i/=i
O W * ) ~ / . ) - /(*) ^
\fm(x)\,
we may apply Lebesgue's Theorem 1.2.3 on dominated convergence to conclude that we get convergence also w.r.t. ||-|| q.e.d. C o r o l l a r y 3.1.1. L (A)
2
(/i,/ ):= /
2
JA
be measurable, f : A -> R U { 0 0 }
(essential supremum), and L(A) := {(equivalence classes of) measurable functions f : A R U { 0 0 } with
ll/lloo : =
T h e o r e m 3.1.2. L(A)
is a Banach space.
Proof. It is straightforward to verify that $\|\cdot\|_\infty$ is a norm. It remains to show completeness. Thus, let $(f_n)_{n\in\mathbb{N}}$ be a Cauchy sequence in $L^\infty(A)$. For $\nu \in \mathbb{N}$, we find $n_\nu \in \mathbb{N}$ such that for $m, n \ge n_\nu$
$$\|f_n - f_m\|_\infty < \frac{1}{\nu}.$$
| x A | |/(x)-/ (x)|>i;}
m
< ~
for ra, n > n and x A\N, f converges uniformly on A \ N towards some / . We simply put f(x) = 0 for x G N. Then ess sup | / ( x ) - / ( x ) | = ess sup | / ( x ) - / ( x ) | ,
n
xeA
xeA\N
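since the essential supremum is not affected by null sets, and $f_n$ converges to $f$ in $L^\infty(A)$. q.e.d.

We also note that Hölder's inequality admits the following extension to the case $p = 1$, $q = \infty$:

Lemma 3.1.4. Let $f_1 \in L^1(A)$, $f_2 \in L^\infty(A)$. Then $f_1 f_2 \in L^1(A)$, and
$$\|f_1 f_2\|_1 \le \|f_1\|_1\, \|f_2\|_\infty. \qquad (3.1.11)$$

Proof.
$$\int_A |f_1(x) f_2(x)|\,dx \le \operatorname*{ess\,sup}_{x \in A} |f_2(x)| \int_A |f_1(x)|\,dx = \|f_2\|_\infty\, \|f_1\|_1.$$
q.e.d.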
i.e. ~ + ^ = 1. T/ien L ( A ) is t/ie dual space of L (A). L (A) is reflexive. Remark 3.1.1. The dual space of L (A) dual space of L(A) is larger than L (A). L(A) is reflexive.
l l
In
particular,
164
Remark 3.1.2. Clarkson's theorem holds more generally for 1 < q < oo. The proof for 1 < q < 2 is a little more complicated than the one for 2 < q < oo. The proof of Theorem 3.1.4 is based on: L e m m a 3.1.5. Let 2 < q < oo, f,g < L {A). E 11/ + g\\ +11/ - \\
q 9 q q q q
< 2 * - (H/H;
+ y )i.
(3.1.13)
(In order to verify the left inequality in (3.1.13), we may assume w.l.o.g. x + y = 1. Then x < x , y < y since g < 2, and the desired inequality easily follows. The right inequality follows for example from Holder's inequality (Lemma 3.1.1) applied to the following functions
2 2 q 2 q 2
/i,/ :(-U)-R
2
fx = 1, , .
h ( t }
/ a ~ \ 6
The left hand side of (3.1.13) implies (|a + b\ + \a- 6 | ) ' < (|a + 6| + |a - 6 | ) * <v^(a +6 )5 for a, 6 R, and by the right-hand-side of (3.1.13), we have V2(a
2 2 2 q 9 2 2
(3.1.14)
+ b )i
(3.1.15)
Equations (3.1.14) and (3.1.15) imply |/(:r) + gix)]* + \f(x) - ( r r ) | < 2"~ {\f{x)\"
5 9 l
+ | (x)| ),
f f
(3.1.16)
Proof {Theorem
3.14).
= N I = i.
0
165
\\f-9\\ <yq
Therefore, for e > 0, we may find 6 > 0 such that \\f-9\\ <e
g
whenever | | | ( / + g)\\
Proof
(Theorem 3.1.3).
->
L (A)
|i(/)(?)|<||/|| .
p
(3.1.17)
Thus i(f) is indeed an element of L (A)*. We claim that we have equality in (3.1.17). This means that there exists some g G L w i t h
q
/
JA
f(x)g(x)dx
p
l l / I L I M L .
Then \g\ =
= / |/(x)^(a dx (x)\ = / j / ( * ) ii dx
p
p ?
= ( j \f{x)\ d y
y A X
(j \f(x)\ d y
A X
= H/llplMI,This verifies (3.1.18), hence equality in (3.1.17). Equality in (3.1.17) implies that i is an isometry, in particular injective. I n order to complete the proof we need to show that i is surjective. Suppose on the contrary that L"(A)* \ i{W{A)) / 0.
166
P
Since L (A) is complete and i is continuous, i(L (A)) is complete, hence closed. By the Hahn-Banach theorem (Corollary 2.1.1), there then exists veL<*(A)**,v^0, with
v\i(LP(A)) = 0.
We now suppose for a moment that 1 < p < 2. Then 2 < q < oo, and L (A) is reflexive by Theorems 3.1.4 and 2.2.3. We may therefore find a g in L {A) with
q q
F(g)
= v(F)
P
for all F G
L (A)*.
hence $g = 0$ (by a reasoning as in the derivation of (3.1.18)), hence also $v = 0$, a contradiction. We have shown that $i$ furnishes an isomorphism between $L^p(A)$ and $L^q(A)^*$. Since $L^q(A)$ is reflexive, so is $L^q(A)^*$ by Corollary 2.2.2, hence $L^p(A)$. In conclusion, $L^p(A)$ has to be reflexive for any $1 < p < \infty$, and its dual space is given by $L^q(A)$. q.e.d.
3.2 Approximation of $L^p$ functions by smooth functions (mollification)

In this section, we shall smooth out $L^p$ functions by integrating them against smooth kernels. As these kernels approach the Dirac distribution, these regularizations will tend towards the original function. For that purpose, we need some $\varrho \in C_0^\infty(\mathbb{R}^d)$† with
$$\int_{\mathbb{R}^d} \varrho(x)\,dx \Big( = \int_{B(0,1)} \varrho(x)\,dx \Big) = 1.$$
Such a $\varrho$ is called a Friedrichs mollifier. In this section, $\Omega$ will always denote an open subset of $\mathbb{R}^d$. Let $f \in L^1(\Omega)$. We extend $f$ to all of $\mathbb{R}^d$ by putting
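† For $\Omega \subset \mathbb{R}^d$ open, $C_0^\infty(\Omega)$ is the space of all $C^\infty$ functions $\varphi$ on $\Omega$ for which the closure of $\{x \in \Omega \mid \varphi(x) \neq 0\}$, the support of $\varphi$ ($\operatorname{supp}\varphi$), is a compact subset of $\Omega$. Elements of $C_0^\infty(\Omega)$ are often called test functions.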
$f_h$ is called the mollification of $f$ with parameter $h$. In order to appreciate this definition, we first observe
$$\operatorname{supp}\varrho\Big(\frac{\cdot - y}{h}\Big) \subset B(y,h) := \{z \in \mathbb{R}^d \mid |z-y| \le h\}.$$ (3.2.5)
For these reasons, one expects that $f_h$ tends towards $f$ as $h$ tends to 0. It remains to clarify the type of convergence, however. The advantage of approximating $f$ by $f_h$ comes from:
Lemma 3.2.1. Let $\Omega' \subset\subset \Omega$,† $h < \operatorname{dist}(\Omega', \partial\Omega)$. Then $f_h \in C^\infty(\Omega')$.
Proof. By Corollary 1.2.2, we may differentiate w.r.t. $x$ under the integral sign in (3.2.4), and since $\varrho \in C^\infty$, so then is $f_h$. q.e.d.
We now start investigating the convergence of $f_h$ towards $f$.
Lemma 3.2.2. If $f \in C^0(\Omega)$, then for each $\Omega' \subset\subset \Omega$, $f_h$ converges uniformly to $f$ on $\Omega'$ as $h \to 0$. In symbols: $f_h \rightrightarrows f$ on $\Omega'$ as $h \to 0$.
Proof. We have
$$f(x) = \int_{|w|\le 1}\varrho(w)f(x)\,dw \quad \text{by (3.2.3)}$$ (3.2.7)
and
$$f_h(x) = \int_{|w|\le 1}\varrho(w)f(x-hw)\,dw.$$ (3.2.8)
† '$\Omega' \subset\subset \Omega$' means that the closure of $\Omega'$ is compact and contained in $\Omega$. We say that $\Omega'$ is relatively compact in $\Omega$.
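Consequently,
$$\sup_{x\in\Omega'}|f(x)-f_h(x)| \le \sup_{x\in\Omega'}\int_{|w|\le 1}\varrho(w)\,|f(x)-f(x-hw)|\,dw \le \sup_{x\in\Omega'}\,\sup_{|w|\le 1}|f(x)-f(x-hw)|,$$
using (3.2.3) once more. Since $\Omega'$ is bounded, $\{x \in \Omega \mid \operatorname{dist}(x,\Omega') \le h\}$ is compact (recall the choice of $h$). Therefore, $f$ is uniformly continuous on that set, and we conclude that
$$\sup_{x\in\Omega'}|f(x)-f_h(x)| \to 0 \quad \text{as } h \to 0.$$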
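The uniform convergence in Lemma 3.2.2 can be observed numerically. The following is a minimal sketch under stated assumptions, not part of the original text: it works in dimension $d = 1$ with $\Omega = (-1,1)$ and $\Omega' = [-0.5, 0.5]$, uses the standard bump $\exp(-1/(1-w^2))$ (normalized numerically) as mollifier, takes $f(x) = |x|$ as test function, and evaluates the integral (3.2.8) by a midpoint rule. The helper names `rho`, `integrate`, `mollify` and `sup_error` are illustrative only.

```python
import math

def rho(w):
    """Unnormalized bump exp(-1/(1 - w^2)), supported in (-1, 1)."""
    d = 1.0 - w * w
    return math.exp(-1.0 / d) if d > 0.0 else 0.0

def integrate(func, a, b, n=2000):
    """Composite midpoint rule on [a, b] with n cells."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

# Normalizing constant, so that the kernel rho / MASS has integral 1.
MASS = integrate(rho, -1.0, 1.0)

def mollify(f, x, h):
    """f_h(x) = integral over |w| <= 1 of rho(w) f(x - h w) dw, as in (3.2.8)."""
    return integrate(lambda w: rho(w) * f(x - h * w), -1.0, 1.0) / MASS

def sup_error(f, h, points):
    """Maximum of |f - f_h| over the sample points (a proxy for the sup on Omega')."""
    return max(abs(f(x) - mollify(f, x, h)) for x in points)

f = abs  # continuous on (-1, 1), not differentiable at 0
grid = [-0.5 + 0.05 * i for i in range(21)]  # sample points in Omega'
errors = [sup_error(f, h, grid) for h in (0.4, 0.2, 0.1, 0.05)]
print(errors)
```

All values of $h$ used are smaller than $\operatorname{dist}(\Omega', \partial\Omega) = 0.5$, as the lemma requires. Since $|f(x) - f(x - hw)| \le h$ for this $f$, the printed errors are bounded by $h$ and shrink as $h$ decreases.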
I
JQ
I
J\W\<1
g(w)g(x hw)dwdx
<
/
JQ
I /
\J\W\<1
g(w)dw J
J
I /
\J\U)\<1
Q(w) /
JR
d
\g{y)\ dydw
with
( - 3 2 1 0
Since <p has compact support, we may apply Lemma 3.2.2 to conclude that for sufficiently small h > 0, Ik
-WIIILP(R*)
^ |-
(3.2.11)
11/
^IILP(M^)
'
(3.2.12)
(3.2.10)-(3.2.12) yield
11/
~
M I L P ( Q )
<
H /
( ' '
1 3
g.e.d. Corollary 3.2.1. For 1 < p < oo, Cg(f2) is dense i n L ( f 2 ) . Proo/. Let / G L ( 0 ) , e > 0. We may then find Q' C C 0 w i t h H/llLp(n\n')
We put / ' : = / X L P ( O ' ) - T h e n
< p p
2*
( - - )
1 4
(3.2.15)
By (3.2.13), (3.2.14)
11/ ~" / ^ I I L P ( Q )
<
2*
Since
G C^(Q)
Corollary 3.2.2. $L^p(\Omega)$ is separable for $1 \le p < \infty$. Every $f \in L^p(\Omega)$ can be approximated by piecewise constant functions.
Proof. By Corollary 3.2.1, it suffices to find a countable subset $B_\Omega$ of $L^p(\Omega)$ with the property that for every $\varphi \in C_0^\infty(\Omega)$ and every $\varepsilon > 0$, there exists some $a \in B_\Omega$ with
$$\|\varphi - a\|_{L^p(\Omega)} < \varepsilon.$$ (3.2.16)
some fc, N G N and rational numbers c * i , . . . , a* and cubes Qi,..., fa such that for x Qi otherwise.
Clearly, $B$ is countable. Since a continuous function $\varphi$ with compact support is uniformly continuous, we may easily find some $a \in B$ with
$$\|a - \varphi\|_{L^p(\Omega)} \le \|a - \varphi\|_{L^p(\mathbb{R}^d)} < \varepsilon.$$ (3.2.17)
We put $B_\Omega := \{a\chi_\Omega \mid a \in B\}$. $B_\Omega$ is likewise countable, and from (3.2.15), (3.2.16), we conclude that $B_\Omega$ is dense in $L^p(\Omega)$.
q.e.d.
Remark 3.2.1. The separability of $L^p(\Omega)$ can also be seen by using Corollary 3.2.1 and the Weierstrass approximation theorem that allows the approximation of continuous functions with compact support by polynomials with rational coefficients.
The preceding results do not hold for $L^\infty(\Omega)$. Namely, if a sequence of continuous functions converges w.r.t. $\|\cdot\|_{L^\infty(\Omega)}$, then it converges uniformly, and therefore, the limit is again continuous. Therefore, noncontinuous elements of $L^\infty(\Omega)$ cannot be approximated by continuous functions in the $L^\infty$-norm. Also, $L^\infty(\Omega)$ is not separable. To see this, let $(a_n)_{n\in\mathbb{N}}$ be any sequence in $\{0,1\}$, i.e. $a_n \in \{0,1\}$ for all $n$. To $(a_n)$, we associate the function $f_{(a_n)}$ on $(0,1)$ defined by
for
/(an) { 0 0 for
= 0
n
for k G N.
Since the set of sequences in $\{0,1\}$ is uncountable, this implies that $L^\infty((0,1))$ is not separable. Of course, a similar construction is possible for $\Omega$ any open subset of $\mathbb{R}^d$. We finally note:
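$$\int_\Omega f(x)\varphi(x)\,dx = 0.$$
Then $f = 0$.
Proof. Since $C_0^\infty(\Omega)$ is dense in $L^2(\Omega)$, and since $g \mapsto \int_\Omega f(x)g(x)\,dx$ is continuous on $L^2(\Omega)$, we obtain
$$\int_\Omega f(x)g(x)\,dx = 0 \quad \text{for all } g \in L^2(\Omega).$$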
3.3 Sobolev spaces
In this section, we wish to introduce certain extensions of the $L^p$ spaces, the so-called Sobolev spaces. They will play a fundamental role in subsequent chapters because they constitute function spaces that are complete w.r.t. norms naturally occurring in variational problems. In this section, $\Omega$ will always denote an open subset of $\mathbb{R}^d$. We shall use the following notation: For a $d$-tuple $\alpha := (\alpha_1, \ldots, \alpha_d)$ of nonnegative integers,
$$|\alpha| := \sum_{i=1}^d \alpha_i, \qquad D_\alpha := \Big(\frac{\partial}{\partial x^1}\Big)^{\alpha_1}\cdots\Big(\frac{\partial}{\partial x^d}\Big)^{\alpha_d}.$$
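Definition 3.3.1. Let $u, v \in L^1(\Omega)$. Then $v$ is called the $\alpha$-th weak derivative of $u$, $v := D_\alpha u$, if
$$\int_\Omega \varphi\, v\,dx = (-1)^{|\alpha|}\int_\Omega u\,D_\alpha\varphi\,dx \quad \text{for all } \varphi \in C_0^{|\alpha|}(\Omega).$$ (3.3.1)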
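We can now define, for $k \in \mathbb{N}$ and $1 \le p < \infty$, the Sobolev space
$$W^{k,p}(\Omega) := \{u \in L^p(\Omega) \mid D_\alpha u \text{ exists and is in } L^p(\Omega) \text{ for all } |\alpha| \le k\}$$
with norm
$$\|u\|_{W^{k,p}(\Omega)} := \Big(\sum_{|\alpha|\le k}\int_\Omega |D_\alpha u|^p\,dx\Big)^{1/p}.$$
Finally, let $H^{k,p}(\Omega)$ and $H_0^{k,p}(\Omega)$ be the closures of $C^\infty(\Omega) \cap W^{k,p}(\Omega)$ and $C_0^\infty(\Omega) \cap W^{k,p}(\Omega)$, respectively, w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$.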
We shall use the following abbreviations for $u \in W^{1,p}(\Omega)$, $1 \le i \le d$. $D_i u$ is the weak derivative for the multiindex $(0, \ldots, 0, 1, 0, \ldots, 0)$, 1 at the $i$-th position, and $Du$ is the vector $(D_1 u, \ldots, D_d u)$ of all first weak derivatives. The following result is obvious.
Lemma 3.3.1. Let $u \in C^k(\Omega)$, and suppose all derivatives of $u$ of order $\le k$ are in $L^p(\Omega)$. Then $u \in W^{k,p}(\Omega)$, and the weak derivatives are given by the ordinary derivatives. q.e.d.
Thus, the $W^{k,p}$ spaces constitute a generalization of the spaces of $k$ times differentiable functions. The $W^{k,p}$ norm is considerably weaker than the $C^k$-norm, and so the $W^{k,p}$ spaces are larger than the $C^k$ spaces. Before investigating the properties of these spaces, it should be useful to consider an example: Let $\Omega = (-1,1) \subset \mathbb{R}$, $u(x) := |x|$. We claim that $u \in W^{1,p}(\Omega)$ for $1 \le p < \infty$. In order to see this it suffices to show that the first weak derivative of $u$ is given by
$$v(x) = \begin{cases} 1 & \text{for } 0 \le x < 1, \\ -1 & \text{for } -1 < x < 0. \end{cases}$$
Indeed, we have for $\varphi \in C_0^1((-1,1))$
$$\int_{-1}^1 \varphi(x)v(x)\,dx = -\int_{-1}^0 \varphi(x)\,dx + \int_0^1 \varphi(x)\,dx = -\int_{-1}^1 \varphi'(x)\,|x|\,dx.$$
We claim, however, that $u$ is not contained in $W^{2,p}(\Omega)$. Namely, if $w(x)$ were the second weak derivative of $u$, it would have to be the first weak derivative of $v$, and consequently, we would have $w(x) = 0$ for $x \neq 0$. The rule for integration by parts (3.3.1) would then require that for all $\varphi \in C_0^1((-1,1))$
$$0 = \int_{-1}^1 \varphi(x)w(x)\,dx = -\int_{-1}^1 \varphi'(x)v(x)\,dx = \int_{-1}^0 \varphi'(x)\,dx - \int_0^1 \varphi'(x)\,dx = 2\varphi(0),$$
which is not the case. Thus, $v$ does not have a first weak derivative.
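The two integral identities of this example can be checked numerically for one concrete test function. The sketch below is an illustration under stated assumptions rather than part of the original argument: it picks $\varphi(x) = \exp(-1/(1-(x/0.9)^2))\,(1+x)$, which is smooth with support in $[-0.9, 0.9]$, and approximates the integrals by a midpoint rule whose cells do not straddle $0$. The names `phi`, `dphi` and `integrate` are hypothetical helpers.

```python
import math

def integrate(func, a, b, n=4000):
    """Composite midpoint rule; with n even on [-1, 1], the point 0 is a cell boundary."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

R = 0.9  # the test function is supported in [-R, R], a compact subset of (-1, 1)

def phi(x):
    s = x / R
    d = 1.0 - s * s
    return math.exp(-1.0 / d) * (1.0 + x) if d > 0.0 else 0.0

def dphi(x):
    """Classical derivative of phi, computed by the product and chain rules."""
    s = x / R
    d = 1.0 - s * s
    if d <= 0.0:
        return 0.0
    bump = math.exp(-1.0 / d)
    dbump = bump * (-2.0 * s / (R * d * d))
    return dbump * (1.0 + x) + bump

u = abs                                   # u(x) = |x|
v = lambda x: 1.0 if x >= 0.0 else -1.0   # candidate first weak derivative

# Definition (3.3.1) with |alpha| = 1: int(phi * v) = -int(u * phi').
lhs = integrate(lambda x: phi(x) * v(x), -1.0, 1.0)
rhs = -integrate(lambda x: dphi(x) * u(x), -1.0, 1.0)

# For a second weak derivative w = 0 one would need -int(phi' * v) = 0,
# but this quantity equals 2 * phi(0).
obstruction = -integrate(lambda x: dphi(x) * v(x), -1.0, 1.0)

print(lhs, rhs, obstruction, 2.0 * phi(0.0))
```

The first two printed numbers agree up to quadrature error, while the third matches $2\varphi(0) = 2e^{-1} \neq 0$, which is the obstruction to a second weak derivative.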
Remark 3.3.1. Some readers may have encountered the notion of a distributional derivative. It is important to distinguish between weak and distributional derivatives. Any $L^1(\Omega)$ function possesses distributional derivatives of any order, but as the preceding example shows, not necessarily weak derivatives. In the example, of course, the second distributional derivative of $u$ is $2\delta_0$, where $\delta_0$ is the Dirac delta distribution at 0. $u$ does not possess a second weak derivative because the delta distribution cannot be represented by an $L^1$ function.
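Theorem 3.3.1. The spaces $W^{k,p}(\Omega)$ are separable Banach spaces w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$.
Proof. That $\|\cdot\|_{W^{k,p}(\Omega)}$ is a norm follows from the fact that $\|\cdot\|_{L^p(\Omega)}$ is a norm (see Section 3.1). Similarly, we shall now derive completeness of $W^{k,p}(\Omega)$ from the completeness of the $L^p(\Omega)$ spaces (Theorem 3.1.1). Thus, let $(u_n)_{n\in\mathbb{N}} \subset W^{k,p}(\Omega)$ be a Cauchy sequence w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$. This implies that $(D_\alpha u_n)_{n\in\mathbb{N}}$ is a Cauchy sequence w.r.t. $\|\cdot\|_{L^p(\Omega)}$ for all $|\alpha| \le k$. By Theorem 3.1.1, $(D_\alpha u_n)$ therefore converges in $L^p(\Omega)$ towards some $v_\alpha$. For $\varphi \in C_0^{|\alpha|}(\Omega)$,
$$\int_\Omega v_\alpha\,\varphi\,dx = \lim_{n\to\infty}\int_\Omega D_\alpha u_n\,\varphi\,dx = \lim_{n\to\infty}(-1)^{|\alpha|}\int_\Omega u_n\,D_\alpha\varphi\,dx = (-1)^{|\alpha|}\int_\Omega v_0\,D_\alpha\varphi\,dx.$$ (3.3.2)
Therefore, $v_\alpha$ is the $\alpha$-th weak derivative of $v_0$, the $L^p$-limit of $(u_n)_{n\in\mathbb{N}}$, and consequently $v_0 \in W^{k,p}(\Omega)$. The separability again follows from the corresponding property for $L^p(\Omega)$ (Corollary 3.2.2).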
H *(ft).
kyP
can be approximated by
so that W *(ft)
is possible (Corollary 3.2.1). This is seen from the following simple example: $\Omega = (-1,1) \subset \mathbb{R}$, $u(x) = 1$. If $(\varphi_n)_{n\in\mathbb{N}} \subset C_0^\infty(\Omega)$ converges to $u$ in $L^p(\Omega)$, then after selection of a subsequence, it converges pointwise almost everywhere (Lemma 3.1.3), and therefore, for sufficiently large $n$, there exists $x_n \in (-1,1)$ with $\varphi_n(x_n)$
f n
Proof (Theorem 3.3.2). We have to show that any $u \in W^{k,p}(\Omega)$ can be approximated by $C^\infty(\Omega)$ functions. As in Section 3.2, we extend $u$ to be 0 outside $\Omega$ and consider the mollifications $u_h \in C^\infty(\Omega)$. We compute
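$$D_\alpha(u_h(x)) = \frac{1}{h^d}\int D_{\alpha,x}\,\varrho\Big(\frac{x-y}{h}\Big)u(y)\,dy \quad \text{(using Corollary 1.2.2)}$$
$$= \frac{(-1)^{|\alpha|}}{h^d}\int D_{\alpha,y}\,\varrho\Big(\frac{x-y}{h}\Big)u(y)\,dy$$
$$= \frac{1}{h^d}\int \varrho\Big(\frac{x-y}{h}\Big)D_\alpha u(y)\,dy \quad \text{by definition of } D_\alpha u$$
$$= (D_\alpha u)_h(x),$$ (3.3.3)
where $D_{\alpha,x}$ and $D_{\alpha,y}$ denote the derivative $D_\alpha$ taken w.r.t. the variable $x$ and $y$, respectively.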
Thus, the derivative of the mollification is the mollification of the derivative. Since $D_\alpha u \in L^p(\Omega)$, by Theorem 3.2.1, $(D_\alpha u)_h$ converges to $D_\alpha u$ in $L^p(\Omega)$ for $h \to 0$. By (3.3.3), we conclude that $D_\alpha(u_h)$ converges to $D_\alpha u$ in $L^p(\Omega)$, for all $|\alpha| \le k$, and this means that $u_h$ converges to $u$ in $W^{k,p}(\Omega)$.
q.e.d.
Theorem
3.3.3.
W * (Q)
Proof. It follows from Theorem 3.1.3 that the dual space of $W^{k,p}(\Omega)$ is given by $W^{k,q}(\Omega)$,
Theorem
3.3.4.
HQ (Q)
,p
W (Q).
K,P
Proof. This follows from Lemma 2.2.5, since $H_0^{k,p}(\Omega)$ is a closed subspace (w.r.t. strong convergence) of $W^{k,p}(\Omega)$.
Theorem 3.3.5. For $1 < p < \infty$, $k \in \mathbb{N}$, any sequence in $W^{k,p}(\Omega)$ that is bounded w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$ contains a weakly convergent subsequence.
Proof. By Theorems 3.3.1 and 3.3.3, $W^{k,p}(\Omega)$ is separable and reflexive. Therefore, the result follows from Corollary 2.2.1. q.e.d.
3.4 Rellich's theorem and the Poincaré and Sobolev inequalities
The compactness theorem of Rellich is:
Theorem 3.4.1. Let $\Omega \subset \mathbb{R}^d$ be open and bounded. Let $(u_n)_{n\in\mathbb{N}} \subset H_0^{1,p}(\Omega)$ be bounded, i.e. $\|u_n\|_{W^{1,p}(\Omega)} \le c$ (independent of $n$). Then a subsequence of $(u_n)_{n\in\mathbb{N}}$ converges in $L^p(\Omega)$.
Remark 3.4.1. Rellich originally proved the theorem for $p = 2$. Kondrachev proved the stronger result that some subsequence converges in $L^q(\Omega)$ for $1 \le q < \frac{dp}{d-p}$ if $p < d$ and for $1 \le q < \infty$ if $p \ge d$. Of course, these exponents come from the Sobolev Embedding Theorem (see (3.4.12)). See Corollary 3.4.1 below.
Proof. Since $u_n \in H_0^{1,p}(\Omega)$, we can find $v_n \in C_0^\infty(\Omega)$ with
r n ~ n||v^i.p(n) < o
t;
(3.4.1)
(3.4.2)
n,h( )
v (y)dy
n
of v
V (x)\
nyh n
L
r
v (x
n
hw))dw
by (3.2.7), (3.2.8) w . M
\w\<\
<
/
J\w\<l
\dr
v (x
n
- n?) drdw
with
(3.4.3)
v , (x)\ dx
n h
JQ
( f < / I /
f JQ \J\W\<I
\ d
Y
n
g(w) /
JO
v (x-rd)
\ur i\ i
6 w
drdw]
J
w
dx
f ( f
=
f^\ \ I J
p
Jn\J\
/ ^^
\dr ^
p
Vn
drdw
dx
, j
Vn
Vn
11
D v
n 11 LP (Q)
^ o l K I I
^co(measn)
p |KH p
L
(3.4.5)
by Holder's inequality, and similarly dx' Vn,h( ) with Ci : = sup fixed h > 0,
2
X
^ J^i i(
m e a s n
lp
\K\\
LP{Q)
(3.4.6)
K , h | | i ( ) < constant
C Q n
(3.4.7)
(where the constant depends on $h$). Therefore, $(v_{n,h})_{n\in\mathbb{N}}$ contains a uniformly convergent subsequence by the Arzelà-Ascoli theorem. Since uniform convergence implies $L^p$-convergence (e.g. by Theorem 1.2.3), the closure of $(v_{n,h})_{n\in\mathbb{N}}$ is compact in $L^p(\Omega)$. Since a compact subset of a metric space (e.g. a Banach space) is totally bounded, there exist finitely many $w_1, \ldots, w_N \in L^p(\Omega)$ such that for every $n \in \mathbb{N}$ there exists $1 \le j \le N$ with
$$\|v_{n,h} - w_j\|_{L^p(\Omega)} < \frac{\varepsilon}{3}.$$ (3.4.8)
By (3.4.1), (3.4.4), (3.4.8), for every $n \in \mathbb{N}$ we find $1 \le j \le N$ with
$$\|u_n - w_j\|_{L^p(\Omega)} < \varepsilon.$$
Thus, $(u_n)_{n\in\mathbb{N}}$ is totally bounded in $L^p(\Omega)$. Therefore, the closure of $(u_n)_{n\in\mathbb{N}}$ in $L^p(\Omega)$ is compact (again, a general result for metric spaces), and it thus contains a convergent subsequence in $L^p(\Omega)$.
q.e.d.
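Theorem 3.4.2. Let $\Omega \subset \mathbb{R}^d$ be open and bounded, $u \in H_0^{1,p}(\Omega)$. Then
$$\|u\|_{L^p(\Omega)} \le \Big(\frac{\operatorname{meas}\Omega}{\omega_d}\Big)^{1/d}\|Du\|_{L^p(\Omega)},$$ (3.4.9)
where $\omega_d$ is the Lebesgue measure of the unit ball in $\mathbb{R}^d$.
Proof. Since $C_0^\infty(\Omega)$ is dense in $H_0^{1,p}(\Omega)$, we may assume $u \in C_0^\infty(\Omega)$. We put $u(x) = 0$ for all $x \in \mathbb{R}^d \setminus \Omega$. For $\vartheta \in \mathbb{R}^d$ with $|\vartheta| = 1$, we have
$$u(x) = -\int_0^\infty \frac{\partial}{\partial r}u(x + r\vartheta)\,dr.$$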
d u(x + dr , .
rfodddr
I f
Therefore
(J\u{x)\>dxY
p-1
k ( i (L J T ^ F
by Holder's inequality i du, \ J
d
I D u i v ) r d y
) [L ^
1
dx
/ 1 ( / ~dx\ W \x - V "
N
|Du(y)| d)
(3.4.10)
using Fubini's theorem to exchange the order of integration in the first factor. In order to control \ x -y\\ y
d
L we choose R with
l d X
'
J ^ ^ - J F * ^ we have
j
JQ \x y\ y\
**\*-v\>R ior\x-y\<R,
d-i^
/
JB(y,R) JB(y R)
y
d~i^
\x \x y\ y\
= dw R
d
(3.4.11) ft) .
1
= dw
1 d
(meas
We now come to somewhat stronger results that will however only be needed in Chapter 9. Namely, we have the Sobolev inequalities.
Theorem 3.4.3. Let $u \in H_0^{1,p}(\Omega)$.
(i) If $p < d$, then $u \in L^{\frac{dp}{d-p}}(\Omega)$ and
$$\|u\|_{\frac{dp}{d-p}} \le c\,\|Du\|_p.$$ (3.4.12)
(ii) If $p > d$, then $u \in C^0(\overline{\Omega})$, and
$$\sup_\Omega |u| \le c\,(\operatorname{meas}\Omega)^{\frac{1}{d}-\frac{1}{p}}\,\|Du\|_p$$ (3.4.13)
with constants $c$ depending only on $p$ and $d$. (Actually, by a theorem of Morrey, for $p > d$, $u \in H_0^{1,p}(\Omega)$ is even Hölder continuous with exponent $1 - \frac{d}{p}$.)
We only prove (i) as (ii) will not be used in the present book: Proof. We first assume u G CQ(Q). have
oo fort = l,2,...,d.
-OO
H y ) \ ^ < Ml /
oo
d
IAti(W
-oo
< ( r
\J
oo
\DMV)\dy )
1 7=1
r /
1
(n
r
J
J oo \
oo
J d
1
< (y
iDMy^dy ^
(jlf
|A()|dy'dx^
Pd-i
.
= d 1.
finally yields
i
Hi A
L
( f i )
^(n/ i^)i^
n
1 jlPILi(n)-
(3-4-14)
This is (3.4.12) for $p = 1$. The case of general $p$ may now be obtained by applying (3.4.14) to $|u|^\mu$ for suitable $\mu > 1$ and using Hölder's inequality. Namely, from (3.4.14) for $|u|^\mu$ in place of $u$
H I I <gjf Nx)rMiM*)i<fc
lA
and obtain
= q.e.d.
As a consequence, we obtain the theorem of Kondrachev:
Corollary 3.4.1. Let $\Omega \subset \mathbb{R}^d$ be open and bounded. Let $(u_n)_{n\in\mathbb{N}} \subset H_0^{1,p}(\Omega)$ be bounded for some $1 \le p < d$. Then a subsequence converges in $L^q(\Omega)$ for any $1 \le q < \frac{dp}{d-p}$.
Proof. From Theorem 3.4.1 we know already that a subsequence converges in $L^p(\Omega)$. We may assume $q > p$ as otherwise the result is an easy consequence of Hölder's inequality since $\Omega$ is bounded. We denote this converging subsequence again by $(u_n)$. From Hölder's inequality, we obtain
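$$\|u_n - u_m\|_{L^q(\Omega)} \le \|u_n - u_m\|_{L^1(\Omega)}^{\mu}\,\|u_n - u_m\|_{L^{\frac{dp}{d-p}}(\Omega)}^{1-\mu} \quad \Big(\text{with } \frac{1}{q} = \mu + (1-\mu)\Big(\frac{1}{p} - \frac{1}{d}\Big)\Big)$$
$$\le c\,\|u_n - u_m\|_{L^1(\Omega)}^{\mu}\,\|D(u_n - u_m)\|_{L^p(\Omega)}^{1-\mu}$$ (3.4.15)
by (3.4.12).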
Since $Du_n$ is bounded in $L^p(\Omega)$ by assumption, and $(u_n)$ is a Cauchy sequence in $L^p(\Omega)$, hence also in $L^1(\Omega)$, (3.4.15) then implies the Cauchy property in $L^q(\Omega)$. q.e.d.

Exercises
> 1} , A : = {x E
2
<i}.
for A E R .
P P X 2
1>
G L ( A ) , with
^nii/'iui
3.3 Let $A \subset \mathbb{R}^d$ be measurable, $\operatorname{meas}A < \infty$, $1 \le p \le q \le \infty$. Then $L^q(A) \subset L^p(A)$, and for $f \in L^q(A)$
$$\frac{1}{(\operatorname{meas}A)^{1/p}}\|f\|_{L^p(A)} \le \frac{1}{(\operatorname{meas}A)^{1/q}}\|f\|_{L^q(A)}.$$
= 1, /
= /)
i i
3.5
T I I / I I
LP(A)
^4) p
Loo (A)
Suppose f
n
P
/ . Is / G L (A),
||/-/|| ^0
p
asn^oo?
3.7 3.8
Let A\,A f
2l
W *(Ai)
W *(A )l
Consider the sequence $(\sin(nx))_{n\in\mathbb{N}}$ in $L^2((0,1))$. Does it converge in the $L^2$-norm? Does it converge weakly? If so, what is the limit?
4.1 Description of the problem and its solution
The typical problem of the calculus of variations is to minimize an integral of the form
$$F(u) = \int_\Omega f(x, u(x), Du(x))\,dx,$$
where $\Omega$ is some open subset of $\mathbb{R}^d$ (in most cases, $\Omega$ is bounded), among functions $u : \Omega \to \mathbb{R}$ belonging to some suitable class of functions and satisfying a boundary condition, for example a Dirichlet boundary condition $u(y) = g(y)$ for $y \in \partial\Omega$.
Here, $C$ is some space of functions. The strategy of the direct method is very simple: Take a minimizing sequence $(u_n)_{n\in\mathbb{N}} \subset C$, i.e.
$$\lim_{n\to\infty}F(u_n) = \inf_{u\in C}F(u),$$
and show that some subsequence of $(u_n)$ converges to a minimizer $u \in C$. For this strategy to be successful, several conditions should be met:
(1) Some compactness condition has to hold so that a minimizing sequence contains a convergent subsequence. This requires the careful selection of a suitable topology on $C$.
(2) The limit $u$ of such a subsequence should be contained in $C$. This is a closedness condition on $C$. In particular, for (1) and (2) to hold, $C$ should not be too restrictive. In other words, one should not specify too many properties for a solution $u$ in advance.
(3) Some lower semicontinuity condition of the form
$$F(u) \le \liminf_{n\to\infty}F(u_n) \quad \text{if } u_n \text{ converges to } u$$
has to hold, in order to ensure that the limit of a minimizing sequence is indeed a minimizer for $F$.
The lower semicontinuity condition becomes easier if the topology of $C$ is more restrictive, because the stronger the convergence of $u_n$ to $u$ is, the easier that condition is satisfied. That is at variance, however, with the requirement of (1), since for too strong a topology, sequences do not always contain convergent subsequences. Therefore, we expect that the topology for $C$ has to be carefully chosen so as to balance these various requirements. In order to gain some insights into this aspect, it is useful to approach the problem from an abstract point of view. Thus, we shall return to the concrete integral variational problem raised in the beginning only later.
4.2 Lower semicontinuity
We say that a topological space $X$ satisfies the first axiom of countability, if the neighbourhood system of each point $x \in X$ has a countable base, i.e. there exists a sequence $(U_\nu)_{\nu\in\mathbb{N}}$ of open subsets of $X$ with $x \in U_\nu$ with the property that for every open set $V \subset X$ with $x \in V$ there exists $n \in \mathbb{N}$ with
$$U_n \subset V.$$
$X$ satisfies the second axiom of countability if its topology has a countable base, i.e. there exists a family $(U_\nu)_{\nu\in\mathbb{N}}$ of open subsets of $X$ with the property that for every open subset $V$ of $X$ and every $x \in V$, there exists $n \in \mathbb{N}$ with
$$x \in U_n \subset V.$$
We note that separable metric spaces $X$ satisfy the second axiom of countability. In fact, let $(x_\nu)_{\nu\in\mathbb{N}}$ be a dense subset of $X$, and let $(r_\mu)_{\mu\in\mathbb{N}}$ be dense in $\mathbb{R}^+$. Then
$$\{U(x_\nu, r_\mu) := \{x \in X : d(x, x_\nu) < r_\mu\}\}$$
($d(\cdot,\cdot)$ the distance function of $X$) forms a countable base for the topology.
If the first countability axiom is satisfied, topological notions usually admit sequential characterizations. For example, if $(x_n)_{n\in\mathbb{N}} \subset X$ is a sequence in a topological space $X$ satisfying the first axiom of countability, then any accumulation point of $(x_n)$ (i.e. any $x \in X$ with the property that for every neighbourhood $U$ of $x$ and any $m \in \mathbb{N}$, there exists $n \ge m$ with $x_n \in U$) can be obtained as the limit of some subsequence of $(x_n)$. Although we shall often employ weak topologies which typically do not satisfy the first axiom of countability, for our purposes it will usually be sufficient to use sequential versions of topological properties. For that reason, we shall define our topological notions in sequential terms, without adding the word 'sequentially'.
Definition 4.2.1. Let $X$ be a topological space. A function $F : X \to \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ is called lower semicontinuous (lsc) at $x$ if
$$F(x) \le \liminf_{n\to\infty}F(x_n)$$
for any sequence $(x_n)_{n\in\mathbb{N}} \subset X$ converging to $x$. $F$ is called lower semicontinuous if it is lsc at every $x \in X$.
The following properties are immediate:
Lemma 4.2.1.
(i) If $F : X \to \overline{\mathbb{R}}$ is lsc, $\lambda > 0$, then $\lambda F$ is lsc.
(ii) If $F, G : X \to \overline{\mathbb{R}}$ are lsc, and if their sum $F + G$ is well defined (i.e. there is no $x \in X$ for which one of the values $F(x), G(x)$ is $+\infty$ and the other one is $-\infty$), then $F + G$ is also lsc.
(iii) For $F, G : X \to \overline{\mathbb{R}}$ lsc, $\inf(F, G)$ is also lsc.
(iv) If $(F_i)_{i\in I}$ is a family of lsc functions, then $\sup_{i\in I}F_i$ is also lsc.
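The inequality in Definition 4.2.1 can be probed on simple examples. The sketch below is only an illustration under stated assumptions: it works on $X = \mathbb{R}$, compares the characteristic functions of the open interval $(0,1)$ and of the closed interval $[0,1]$ at the point $0$, and replaces the $\liminf$ by the minimum over a finite tail of a test sequence, which is a proxy and not a proof. The names `chi_open`, `chi_closed` and `lsc_along` are hypothetical helpers.

```python
def chi_open(x):
    """Characteristic function of the open interval (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def chi_closed(x):
    """Characteristic function of the closed interval [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def lsc_along(F, x, seq, start=100):
    """Check F(x) <= min of F over a tail of seq (finite proxy for the liminf)."""
    return F(x) <= min(F(y) for y in seq[start:])

from_left = [-1.0 / n for n in range(1, 400)]   # converges to 0 from outside [0, 1]
from_right = [1.0 / n for n in range(2, 400)]   # converges to 0 from inside (0, 1)

print(lsc_along(chi_open, 0.0, from_left), lsc_along(chi_open, 0.0, from_right))
print(lsc_along(chi_closed, 0.0, from_left))
```

Along both sequences the characteristic function of the open interval satisfies the inequality at $0$, whereas that of the closed interval fails it along the sequence approaching from the left, since its value jumps up at the limit point.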
Examples.
(1) Any continuous function is lower semicontinuous.
(2) If $X$ satisfies the first axiom of countability, then $A \subset X$ is open if and only if its characteristic function $\chi_A$ is lsc.
Definition 4.2.2.
(i) Let $X$ be a normed space, with norm $\|\cdot\|$. $F : X \to \mathbb{R}$ is weakly proper, if for every sequence $(x_n)_{n\in\mathbb{N}} \subset X$ with $\|x_n\| \to \infty$ we have $F(x_n) \to \infty$ for $n \to \infty$.
(ii) Let $X$ be a topological space. $F : X \to \mathbb{R}$ is coercive if every sequence $(x_n) \subset X$ with $F(x_n) \le$ constant (independent of $n$) has an accumulation point.
We now formulate the following general existence theorem for minimizers:
Theorem 4.2.1. Let $X$ be a separable reflexive Banach space, $F : X \to \mathbb{R}$ weakly proper and lower semicontinuous w.r.t. weak convergence. Then there exists a minimizer $x_0$ for $F$, i.e.
$$F(x_0) = \inf_{x\in X}F(x) \quad (> -\infty).$$
Proof. Let $(x_n)_{n\in\mathbb{N}}$ be a minimizing sequence, i.e.
$$\lim_{n\to\infty}F(x_n) = \inf_{x\in X}F(x).$$
Since $F$ is weakly proper, $\|x_n\|$ is bounded. Since $X$ is reflexive, after selection of a subsequence, $x_n$ converges weakly to some $x_0 \in X$ by Corollary 2.2.1. By lower semicontinuity of $F$,
$$F(x_0) \le \lim_{n\to\infty}F(x_n) = \inf_{x\in X}F(x),$$
and since $x_0 \in X$, we must have in fact equality. Also, since $F$ assumes only finite values by assumption, this implies that $\inf_{x\in X}F(x) > -\infty$.
q.e.d.
Remark 4.2.1. The argument of the preceding proof also shows that in a separable reflexive Banach space, a weakly proper functional is coercive w.r.t. the weak topology.
Lower semicontinuity w.r.t. weak convergence is a rather strong property, in fact much stronger than lower semicontinuity w.r.t. the Banach space topology of $X$. Fortunately, there exists a general class of functionals, namely the convex ones, for which the latter property implies the former.
Definition 4.2.3. Let $V$ be a convex subset of a vector space; $F : V \to \overline{\mathbb{R}}$ is called convex if for any $x, y \in V$, $0 \le t \le 1$,
$$F(tx + (1-t)y) \le tF(x) + (1-t)F(y).$$
Lemma 4.2.2. Let $V$ be a convex subset of a separable reflexive Banach space, $F : V \to \overline{\mathbb{R}}$ convex and lower semicontinuous. Then $F$ is also lower semicontinuous w.r.t. weak convergence.
Proof. Let $(x_n)_{n\in\mathbb{N}} \subset V$ converge weakly to $x \in V$. We may assume that $F(x_n)$ converges to some $\kappa \in \mathbb{R}$. By Theorem 2.2.4, for every $m \in \mathbb{N}$ and every $\varepsilon > 0$, we may find a convex combination
$$y_m := \sum_{n=m}^N \lambda_n x_n \quad \Big(\lambda_n \ge 0,\ \sum_{n=m}^N \lambda_n = 1\Big)$$
with
$$\|y_m - x\| < \varepsilon.$$
Since $F$ is convex,
$$F(y_m) \le \sum_{n=m}^N \lambda_n F(x_n).$$ (4.2.1)
Given $\varepsilon > 0$, we choose $m = m(\varepsilon) \in \mathbb{N}$ so large that for all $n \ge m$,
$$F(x_n) < \kappa + \varepsilon.$$
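By (4.2.1), then $F(y_m) \le \kappa + \varepsilon$. Letting $\varepsilon \to 0$, so that $m \to \infty$ and $y_m \to x$, the lower semicontinuity of $F$ yields
$$F(x) \le \liminf_{m\to\infty}F(y_m) \le \kappa = \lim_{n\to\infty}F(x_n).$$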
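4.3 The existence of minimizers for convex variational problems
We return to the concrete variational problem discussed in Section 4.1 and begin with:
Lemma 4.3.1. Let $\Omega \subset \mathbb{R}^d$ be open, $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$, with $f(\cdot, v)$ measurable for all $v \in \mathbb{R}^d$, $f(x, \cdot)$ continuous for all $x \in \Omega$, and
$$f(x,v) \ge -a(x) + b|v|^p$$
for almost all $x \in \Omega$ and all $v \in \mathbb{R}^d$, with $a \in L^1(\Omega)$, $b \in \mathbb{R}$, $p > 1$. Then
$$\Phi(v) := \int_\Omega f(x, v(x))\,dx$$
is a lower semicontinuous functional on $L^p(\Omega)$, $\Phi : L^p(\Omega) \to \mathbb{R} \cup \{\infty\}$.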
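Proof. Since $f$ is continuous in $v$, $f(x, v(x))$ is a measurable function, and so $\Phi$ is well-defined on $L^p(\Omega)$, by Theorem 1.1.2. Suppose $(v_n)_{n\in\mathbb{N}}$ converges to $v$ in $L^p(\Omega)$. Then a subsequence converges pointwise almost everywhere to $v$ by Lemma 3.1.3. We shall denote this subsequence again by $(v_n)$, noting that the subsequent arguments may also be applied to any remaining subsequence. Since $f$ is continuous in $v$ (actually, it would suffice to have $f$ lower semicontinuous in $v$), we have
$$f(x, v(x)) - b|v(x)|^p \le \liminf_{n\to\infty}\big(f(x, v_n(x)) - b|v_n(x)|^p\big).$$
Since
$$f(x, v_n(x)) - b|v_n(x)|^p \ge -a(x)$$
and $a$ is integrable, Fatou's lemma yields
$$\int_\Omega \big(f(x, v(x)) - b|v(x)|^p\big)\,dx \le \liminf_{n\to\infty}\int_\Omega \big(f(x, v_n(x)) - b|v_n(x)|^p\big)\,dx.$$
Since $v_n$ converges to $v$ in $L^p(\Omega)$,
$$\int_\Omega b|v(x)|^p\,dx = \lim_{n\to\infty}\int_\Omega b|v_n(x)|^p\,dx,$$
and we conclude lower semicontinuity, namely
$$\int_\Omega f(x, v(x))\,dx \le \liminf_{n\to\infty}\int_\Omega f(x, v_n(x))\,dx.$$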
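q.e.d.
Lemma 4.3.2. Under the assumptions of Lemma 4.3.1, assume that $f(x, \cdot)$ is a convex function on $\mathbb{R}^d$ for every $x \in \Omega$. Then $\Phi(v) := \int_\Omega f(x, v(x))\,dx$ defines a convex functional on $L^p(\Omega)$.
Proof. Let $v, w \in L^p(\Omega)$, $0 \le t \le 1$. Then
$$\Phi(tv + (1-t)w) = \int_\Omega f(x, tv(x) + (1-t)w(x))\,dx \le \int_\Omega \{t f(x, v(x)) + (1-t) f(x, w(x))\}\,dx$$
by the convexity of $f$
$$= t\Phi(v) + (1-t)\Phi(w).$$
q.e.d.
We may now obtain a general existence result for the minimizer of a convex variational problem.
Theorem 4.3.1. Let $\Omega \subset \mathbb{R}^d$ be open and bounded, and let $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$ satisfy:
(i) $f(\cdot, v)$ is measurable for all $v \in \mathbb{R}^d$.
(ii) $f(x, \cdot)$ is convex for all $x \in \Omega$.
(iii) $f(x, v) \ge -a(x) + b|v|^p$ for almost all $x \in \Omega$ and all $v \in \mathbb{R}^d$, with $a \in L^1(\Omega)$, $b > 0$, $p > 1$.
Let $g \in H^{1,p}(\Omega)$, and let $A := g + H_0^{1,p}(\Omega)$. Then
$$F(u) := \int_\Omega f(x, Du(x))\,dx$$
assumes its infimum on $A$, i.e. there exists $u_0 \in A$ with
$$F(u_0) = \inf_{u\in A}F(u).$$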
Proof. By Lemma 4.3.1, $F$ is lower semicontinuous w.r.t. $H^{1,p}(\Omega)$ convergence†, and by Lemma 4.2.2, $F$ then is also lower semicontinuous w.r.t. weak $H^{1,p}(\Omega)$ convergence, since $H^{1,p}(\Omega)$ is separable and reflexive for $p > 1$ (see Theorems 3.3.1 and 3.3.3). Let $(u_n)_{n\in\mathbb{N}}$ be a minimizing sequence in $A$, i.e.
$$\lim_{n\to\infty}F(u_n) = \inf_{u\in A}F(u).$$
Since
$$\int_\Omega |Du_n|^p\,dx \le \frac{1}{b}F(u_n) + \frac{1}{b}\int_\Omega a(x)\,dx,$$
$(Du_n)_{n\in\mathbb{N}}$ is bounded in $L^p(\Omega)$, hence $(u_n)_{n\in\mathbb{N}} \subset g + H_0^{1,p}(\Omega)$ is bounded in $H^{1,p}(\Omega)$ by the Poincaré inequality (see Theorem 3.4.2). Since $H^{1,p}(\Omega)$ is a separable reflexive Banach space, by Theorem 3.3.5, after selection of a subsequence, $(u_n)_{n\in\mathbb{N}}$ converges weakly to some $u_0 \in A$ ($A$ is closed under weak convergence, Theorem 3.3.4). Since $F$ is convex by Lemma 4.3.2 and lower semicontinuous by Lemma 4.3.1, it is also lower semicontinuous w.r.t. weak $H^{1,p}(\Omega)$ convergence by Lemma 4.2.2. Therefore
$$F(u_0) \le \lim_{n\to\infty}F(u_n) = \inf_{u\in A}F(u),$$
and since $u_0 \in A$, we must have equality. q.e.d.
† Note that the maps $u \mapsto D_i u$ from $H^{1,p}(\Omega)$ to $L^p(\Omega)$, $i = 1, \ldots, d$, are continuous.
Remark 4.3.1. The condition $u \in g + H_0^{1,p}(\Omega)$, i.e. $u - g \in H_0^{1,p}(\Omega)$, is a (generalized) Dirichlet boundary condition. It means that $u = g$ on $\partial\Omega$ in the sense of Sobolev spaces.
4.4 Convex functionals on Hilbert spaces and Moreau-Yosida approximation
In this section, we develop a more abstract method for showing the existence of minimizers of variational problems. It has the advantages that it does not need the concept of weak convergence and that it provides a constructive approach for finding the minimizer. In order to concentrate on the essential aspects, we shall only treat a special situation.
Definition 4.4.1. Let $X$ be a metric space with metric $d(\cdot,\cdot)$, and let $F : X \to \mathbb{R} \cup \{\infty\}$ be a functional. For $\lambda > 0$, we define the Moreau-Yosida approximation $F^\lambda$ of $F$ as
$$F^\lambda(x) := \inf_{y\in X}\big(\lambda F(y) + d^2(x,y)\big)$$ (4.4.1)
for $x \in X$.
Remark 4.4.1. This is different from the definition in Section 5.1 where we shall take $d(x,y)$ instead of $d^2(x,y)$. Here, one might take $d^\alpha(x,y)$ for any exponent $\alpha > 1$. For our present purposes, it is most convenient to work with $\alpha = 2$.
We now let $H$ be a Hilbert space with scalar product $(\cdot,\cdot)$ and norm $\|\cdot\|$ and induced metric $d(x,y) = \|x - y\|$. Let $D(F) \subset H$, and let $F : D(F) \to \mathbb{R}$ be a functional. We say that $F$ is densely defined if $D(F)$ is dense in $H$. For $x \notin D(F)$, we put $F(x) = \infty$. We say that $F$ is convex if whenever $\gamma : [0,1] \to H$ is a straight line segment, then for $0 \le t \le 1$
$$F(\gamma(t)) \le (1-t)F(\gamma(0)) + tF(\gamma(1)).$$ (4.4.2)
In particular, if $\gamma(0), \gamma(1) \in D(F)$, then also $\gamma(t) \in D(F)$ for $0 \le t \le 1$.
Lemma 4.4.1. Let $F : H \to \mathbb{R} \cup \{\infty\}$ be convex, bounded from below, and lower semicontinuous. Then for every $x \in H$ and $\lambda > 0$, there exists a unique
$$y^\lambda =: J^\lambda(x)$$
with
$$F^\lambda(x) = \lambda F(y^\lambda) + d^2(x, y^\lambda).$$ (4.4.3)
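Proof. We have to show that the infimum in (4.4.1) is realized by a unique $y^\lambda$.
Uniqueness: Let $y_1^\lambda, y_2^\lambda$ be solutions of (4.4.3), and let
$$y_0^\lambda = \tfrac{1}{2}(y_1^\lambda + y_2^\lambda)$$
be their mean value. By convexity of $F$
$$F(y_0^\lambda) \le \tfrac{1}{2}\big(F(y_1^\lambda) + F(y_2^\lambda)\big),$$ (4.4.4)
and by Euclidean geometry, if $y_1^\lambda \neq y_2^\lambda$, we have
$$\|x - y_0^\lambda\|^2 < \tfrac{1}{2}\big(\|x - y_1^\lambda\|^2 + \|x - y_2^\lambda\|^2\big),$$ (4.4.5)
hence
$$\lambda F(y_0^\lambda) + \|x - y_0^\lambda\|^2 < \lambda F(y_1^\lambda) + \|x - y_1^\lambda\|^2 = \lambda F(y_2^\lambda) + \|x - y_2^\lambda\|^2,$$
contradicting the minimizing property of $y_1^\lambda$ and $y_2^\lambda$. Thus, we must have $y_1^\lambda = y_2^\lambda$, proving uniqueness.
Existence: (4.4.5) may be refined as follows: For $y_1, y_2 \in H$ and $y_0 := \tfrac{1}{2}(y_1 + y_2)$,
$$\|x - y_0\|^2 = \tfrac{1}{2}\|x - y_1\|^2 + \tfrac{1}{2}\|x - y_2\|^2 - \tfrac{1}{4}\|y_1 - y_2\|^2.$$ (4.4.6)
Let $(y_n)_{n\in\mathbb{N}}$ be a minimizing sequence, i.e.
$$\lambda F(y_n) + \|x - y_n\|^2 \to \inf_{y\in H}\big(\lambda F(y) + \|x - y\|^2\big) =: \kappa_\lambda.$$ (4.4.7)
Using the convexity of $F$ as in (4.4.4) and (4.4.6), we obtain for $y_{k,l} := \tfrac{1}{2}(y_k + y_l)$
$$\lambda F(y_{k,l}) + \|x - y_{k,l}\|^2 \le \tfrac{1}{2}\big(\lambda F(y_k) + \|x - y_k\|^2\big) + \tfrac{1}{2}\big(\lambda F(y_l) + \|x - y_l\|^2\big) - \tfrac{1}{4}\|y_k - y_l\|^2.$$ (4.4.8)
By definition of $\kappa_\lambda$ (see (4.4.7)), the left-hand side of (4.4.8) cannot be smaller than $\kappa_\lambda$, and so we conclude that
$$\|y_k - y_l\| \to 0 \quad \text{as } k, l \to \infty,$$
establishing the Cauchy property. Since the norm is continuous and $F$ is assumed to be lower semicontinuous, the limit $y^\lambda$ of $(y_n)_{n\in\mathbb{N}}$ then solves (4.4.3). q.e.d.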
$$x = \lim_{\lambda\to 0}J^\lambda(x).$$ (4.4.9)
Proof. Since $x$ is in the closure of $D(F)$, for every $\delta > 0$, we may find
$$x_\delta \in B(x,\delta) := \{y \in H : \|x - y\| < \delta\}$$
with $F(x_\delta) < \infty$.
+ \\x -
,511 )
<
62
(4.4.10)
(see (4.4.7) for the definition of $\kappa_\lambda$). Let us now assume that there exists a sequence $\lambda_n \to 0$ for $n \to \infty$ with
$$\|x - y^{\lambda_n}\| \ge \alpha > 0 \quad \text{for all } n.$$ (4.4.11)
+ \\x-y \\ )
Xn
< 0,
(4.4.12)
-> - o o
asn^oo.
(4.4.13)
+ | | x - y \\
< F(y )
Xn
+ \\x - y \f
Xn
-+ - o o
as n ^ oo
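which is impossible. Thus, (4.4.11) cannot hold, and (4.4.9) follows. q.e.d.
Theorem 4.4.1. Let $F : H \to \mathbb{R} \cup \{\infty\}$ be convex, bounded from below, and lower semicontinuous, and suppose that for some $x \in H$ and some sequence $\lambda_n \to \infty$, the sequence $y^{\lambda_n} := J^{\lambda_n}(x)$ stays bounded. Then $(y^{\lambda_n})_{n\in\mathbb{N}}$ converges to a minimizer of $F$.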
Proof. Since (y ) eN
minimizes
yen
inf F(y)
is monotonically increasing in A. Indeed, let 0 < fi\ < fa- Then by definition of y^ W hence
W
2
+ \\x - y^\\ Mi
> W
) + ||x Mi
y^\\ ,
+ -
| | x - y^\\
> F(y^)
+
M2
+ \\x - y || M2
( l ^ - ^
l
( r - r )
l l
- l l x - ^
i i
) .
\Mi
M2/ V
2
/ only i f
> I I * - 2 / '
I I
is bounded independently of A since it is assumed to be bounded for the sequence A oo. We next claim that >
n
F(y).
inf
{y.\\x-y\\<\\x-y>\\}
x
F(t/),
and therefore y has to decrease since | \x y 1 increases. The limit has 1 to be i n f # F(y) since this is so for the subsequence {y ) eNWe now claim that (?/ )A>O satisfies the Cauchy property, i.e. for every e > 0, there exists Ao > 0 such that for all A,// > A
Xn y n a 0
\\y -y\\ <e. For that purpose, we choose AQ SO large that for A, \i > AQ < \ (4-4.14)
which is possible by the preceding monotonicity and boundedness re sults. We may also assume i V ) We let > F(yn(4-4.15)
<
+ \{\\\ -y \\ >M ..
+
1
\ I I *
w " l l
-\\\y -
y"\\ )
<F
(y*)
(ll~
..AM ,
|L,A
by (4.4.14).
only if
Thus $(y^\lambda)_{\lambda>0}$ satisfies the Cauchy property for $\lambda \to \infty$, and it therefore converges to some $y \in H$. $y$ then minimizes $F$, because $F(y^\lambda)$ decreases towards $\inf_{y\in H}F(y)$ for $\lambda \to \infty$, and $F$ is lower semicontinuous. q.e.d.
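The behaviour of $J^\lambda(x)$ for small and large $\lambda$ can be made concrete in the simplest Hilbert space $H = \mathbb{R}$. The following sketch is an illustration under stated assumptions, not taken from the text: it uses $F(y) = |y - 1|$, for which minimizing $\lambda F(y) + (x - y)^2$ has the closed-form solution $J^\lambda(x) = 1 + \operatorname{sign}(x - 1)\max(|x - 1| - \lambda/2, 0)$ (soft thresholding), and compares it with a brute-force grid search. The names `J` and `J_brute` are hypothetical.

```python
def F(y):
    """Convex, lower semicontinuous, bounded below; minimized at y = 1."""
    return abs(y - 1.0)

def J(lam, x):
    """Closed-form minimizer of y -> lam * F(y) + (x - y)**2 (soft thresholding)."""
    shift = max(abs(x - 1.0) - lam / 2.0, 0.0)
    return 1.0 + (shift if x >= 1.0 else -shift)

def J_brute(lam, x, lo=-5.0, hi=5.0, n=100001):
    """Grid search for the same minimizer, used only as a sanity check."""
    step = (hi - lo) / (n - 1)
    return min((lo + i * step for i in range(n)),
               key=lambda y: lam * F(y) + (x - y) ** 2)

x = 3.0
for lam in (0.01, 1.0, 3.0, 10.0):
    print(lam, J(lam, x), J_brute(lam, x))
```

For small $\lambda$ the value $J^\lambda(x)$ stays close to $x$, in line with (4.4.9), while for $\lambda \ge 2|x - 1|$ it coincides with the minimizer $y = 1$ of $F$, in line with Theorem 4.4.1.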
The preceding reasoning is adapted from J. Jost, Convex functionals and generalized harmonic maps between metric spaces, Comment. Math. Helv. 70 (1995), 659-673. For a more general construction, see J. Jost, Nonpositive Curvature: Geometric and Analytic Aspects, Birkhäuser, Basel, 1997, pp. 61-64. In particular, the method also works in uniformly convex Banach spaces. General references for Moreau-Yosida approximation are the books of Attouch and dal Maso quoted in Chapter 6.
Theorem 4.4.1 yields an alternative proof of Theorem 4.3.1 in case $p = 2$. Namely, Lemma 4.3.1 implies the lower semicontinuity, Lemma 4.3.2 the convexity of the functional, and the Poincaré inequality the boundedness of any minimizing sequence, as described in the proof of Theorem 4.3.1. The present proof, however, does not need the concept of weak convergence. As mentioned, the method extends to uniformly convex Banach spaces, and thus can also handle arbitrary values of $p > 1$ (see Remark 3.1.2).
4.5 The Euler-Lagrange equations and regularity questions
In this section, we return to the variational problems considered in Sections 4.1 and 4.3; we consider variational integrals of the form
$$\Phi(u) = \int_\Omega f(x, u(x), Du(x))\,dx,$$
where $f : \Omega \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$ satisfies:
(i) $f(\cdot, u, v)$ is measurable for all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$.
(ii) $f(x, \cdot, \cdot)$ is differentiable for almost all $x \in \Omega$.
(iii) $|f(x,u,v)| \le c_0 + c_1|u|^p + c_2|v|^p$, $c_0, c_1, c_2$ constants, for almost all $x \in \Omega$, and all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$.
Condition (iii) implies that $\Phi(u)$ is finite for $u \in H^{1,p}(\Omega)$, since $\Omega$ is bounded. (If $\Omega$ is unbounded, this still holds provided $c_0 = 0$.) In the preceding section, we have obtained some results on the existence of a minimizer for $\Phi$ in the class $g + H_0^{1,p}(\Omega)$, for given $g \in H^{1,p}(\Omega)$. In the present section, we wish to characterize such minimizers by necessary conditions. These conditions will assume the form of differential equations. In fact, these differential equations will hold for arbitrary critical points of $\Phi$ (as specified in the assumptions of our subsequent results), and not only for minimizers.
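Theorem 4.5.1. Suppose that $f$ satisfies (in addition to (i)-(iii))
(iv) $\Big|\dfrac{\partial f}{\partial u}(x,u,v)\Big| + \sum_{i=1}^d \Big|\dfrac{\partial f}{\partial v^i}(x,u,v)\Big| \le c_3 + c_4|u|^p + c_5|v|^p$, $c_3, c_4, c_5$ constants, for almost all $x \in \Omega$, and all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$.
Let $u$ be a minimizer for $\Phi$ in the class $g + H_0^{1,p}(\Omega)$ ($g \in H^{1,p}(\Omega)$ given). We then have for all $\varphi \in C_0^\infty(\Omega)$
$$\int_\Omega \Big\{\frac{\partial f}{\partial u}(x, u(x), Du(x))\,\varphi(x) + \sum_{i=1}^d \frac{\partial f}{\partial v^i}(x, u(x), Du(x))\,D_i\varphi(x)\Big\}\,dx = 0.$$ (4.5.1)
Proof. Since $u$ is a minimizer for $\Phi$ in $g + H_0^{1,p}(\Omega)$,
$$\Phi(u) \le \Phi(u + t\varphi) \quad \text{for } t \in \mathbb{R},\ \varphi \in C_0^\infty(\Omega).$$ (4.5.2)
We have
$$\Phi(u + t\varphi) = \int_\Omega f(x, u(x) + t\varphi(x), Du(x) + tD\varphi(x))\,dx.$$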
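By (ii), (iii), (iv), we may apply Corollary 1.2.2 to conclude that $\Phi(u + t\varphi)$ is differentiable w.r.t. $t$, and
$$\frac{d}{dt}\Phi(u + t\varphi) = \int_\Omega \Big\{\frac{\partial f}{\partial u}(x, u(x) + t\varphi(x), Du(x) + tD\varphi(x))\,\varphi(x) + \sum_{i=1}^d \frac{\partial f}{\partial v^i}(x, u(x) + t\varphi(x), Du(x) + tD\varphi(x))\,D_i\varphi(x)\Big\}\,dx.$$ (4.5.3)
Since $\Phi(u + t\varphi)$ is minimal at $t = 0$ by (4.5.2),
$$\frac{d}{dt}\Phi(u + t\varphi)\Big|_{t=0} = 0.$$ (4.5.4)
Equations (4.5.3) and (4.5.4) imply (4.5.1). q.e.d.
Remark 4.5.1. From the preceding proof, it is clear that we do not need to assume that $u$ is a minimizer for $\Phi$. It suffices that $u$ is a critical point for $\Phi$ in the sense that
$$\frac{d}{dt}\Phi(u + t\varphi)\Big|_{t=0} = 0 \quad \text{for all } \varphi \in C_0^\infty(\Omega).$$ (4.5.5)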
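Corollary 4.5.1. Suppose that $f$ satisfies (i)-(iv), and in addition, $f \in C^2$. If $u \in C^2(\Omega)$ minimizes $\Phi$ in the class $g + H_0^{1,p}(\Omega)$ (or, more generally, satisfies (4.5.5)), then
\[ \sum_{i,j=1}^d \frac{\partial^2 f}{\partial v_i \partial v_j}(x,u(x),Du(x))\,\frac{\partial^2 u}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d \frac{\partial^2 f}{\partial v_i \partial u}(x,u(x),Du(x))\,\frac{\partial u}{\partial x_i}(x) + \sum_{i=1}^d \frac{\partial^2 f}{\partial v_i \partial x_i}(x,u(x),Du(x)) - \frac{\partial f}{\partial u}(x,u(x),Du(x)) = 0. \qquad (4.5.6) \]

Definition 4.5.1. Equation (4.5.6) is called the Euler-Lagrange equation for $\Phi$.

Proof (Corollary 4.5.1). By the differentiability assumptions made, we may integrate (4.5.1) by parts to obtain
\[ \int_\Omega \Big\{ \frac{\partial f}{\partial u}(x,u(x),Du(x)) - \sum_{i=1}^d \frac{d}{dx_i}\Big( \frac{\partial f}{\partial v_i}(x,u(x),Du(x)) \Big) \Big\}\,\varphi(x)\,dx = 0 \qquad (4.5.7) \]
for all $\varphi \in C_0^\infty(\Omega)$. From Lemma 3.2.3 (applied to $\operatorname{supp}\varphi \subset\subset \Omega$ so that the term in $\{\ \}$ is in $L^2$), we then obtain (4.5.6), after expanding the total derivative $\frac{d}{dx_i}$ by the chain rule. q.e.d.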
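As an illustration of how Theorem 4.5.1 and Corollary 4.5.1 are used, here is a short worked example. The integrand below and the function $h$ appearing in it are assumptions chosen for illustration and are not taken from the text; we take $p = 2$ and assume $h$ to be bounded and of class $C^2$, so that (i)-(iv) and $f \in C^2$ hold.

```latex
% Illustrative integrand (an assumption, not from the text):
\[
  f(x,u,v) = \tfrac12 |v|^2 + \tfrac12 u^2 - h(x)\,u .
\]
% Partial derivatives entering (4.5.1) and (4.5.6):
\[
  \frac{\partial f}{\partial u} = u - h(x), \qquad
  \frac{\partial f}{\partial v_i} = v_i, \qquad
  \frac{\partial^2 f}{\partial v_i \partial v_j} = \delta_{ij}, \qquad
  \frac{\partial^2 f}{\partial v_i \partial u} = 0
  = \frac{\partial^2 f}{\partial v_i \partial x_i}.
\]
% Weak form (4.5.1):
\[
  \int_\Omega \Big\{ (u - h)\,\varphi + \sum_{i=1}^d D_i u \, D_i \varphi \Big\}\,dx = 0
  \quad \text{for all } \varphi \in C_0^\infty(\Omega).
\]
% Euler-Lagrange equation (4.5.6) for u of class C^2:
\[
  \Delta u - u + h = 0,
  \quad \text{i.e.} \quad
  -\Delta u + u = h \ \text{ in } \Omega .
\]
```

The matrix $\big(\partial^2 f / \partial v_i \partial v_j\big)$ is the identity here, so this example is elliptic in the sense discussed in the text that follows.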
Equation (4.5.6) constitutes a quasilinear partial differential equation of second order for $u$. Many such partial differential equations arise as Euler-Lagrange equations of variational problems. Therefore, if one wants to solve such an equation, one might try to find a minimizer of the associated variational problem. However, the existence theory for minimizers as described in Section 4.3 naturally yields an element $u$ of the Sobolev space $H^{1,p}(\Omega)$, whereas in Corollary 4.5.1 it is required that $u$ be of class $C^2(\Omega)$. Thus, there exists a gap, since in general elements of $H^{1,p}(\Omega)$ are not of class $C^2$. It is the task of regularity theory to bridge this gap, i.e. to show that under suitable assumptions on $f$, any minimizer of $\Phi$ is smooth, and specifically here of class $C^2$. The theory of partial differential equations indicates that such a result does not hold without additional assumptions on $f$, like an ellipticity assumption, meaning that the matrix $(a^{ij}(x))_{i,j=1,\dots,d}$ with coefficients
\[ a^{ij}(x) = \frac{\partial^2 f}{\partial v_i \partial v_j}(x, u(x), Du(x)) \]
is positive definite. Indeed, examples show that without such an assumption, in general one does not get smoothness of minimizers. On the positive side, however, we do have de Giorgi's and Nash's:
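Theorem 4.5.2. Let $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$ be [...] $\le f(x,v) \le \Lambda(1 + |v|^p)$, and let $u \in g + H_0^{1,p}(\Omega)$ ($g \in H^{1,p}(\Omega)$ given) be a minimizer of
\[ \int_\Omega f(x, Du(x))\,dx. \]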
Then $u$ is smooth in $\Omega$ ($u \in C^\infty(\Omega)$).

The proof of the theorem of de Giorgi and Nash is too long to be presented here. We refer to M. Giaquinta, Introduction to Regularity Theory for Nonlinear Elliptic Systems, Birkhäuser, Basel, 1993, pp. 76-99, and J. Jost, Partielle Differentialgleichungen, Springer, Berlin, 1998, where a detailed proof is given. Of course, there also exist extensions of this result to more general integrands of the form $f(x, u, v)$. We refer the interested reader to O. Ladyzhenskaya, N. Ural'tseva, Linear and Quasilinear Elliptic Equations, Academic Press, New York, 1968 (translated from the Russian), Chapters IV-VI.

One remark is in order here: Since Sobolev functions are only equivalence classes of functions (in the sense specified at the beginning of Section 3.1), a more precise version of Theorem 4.5.2 is: Under the stated assumptions, the equivalence class of $u$ contains a function of class $C^\infty$. This point, however, usually is assumed to be implicitly understood in statements of regularity theorems.

In order to display at least one regularity result, however, we consider a particular example: For a bounded, open $\Omega \subset \mathbb{R}^d$ and $g \in H^{1,2}(\Omega)$, we wish to minimize Dirichlet's integral
\[ D(u) := \int_\Omega |Du(x)|^2\,dx \qquad (4.5.8) \]
in the class $g + H_0^{1,2}(\Omega)$. By Theorem 4.3.1, a minimizer $u$ exists, and by Theorem 4.5.1, it satisfies
\[ \int_\Omega Du(x) \cdot D\varphi(x)\,dx = 0 \quad \text{for all } \varphi \in C_0^\infty(\Omega). \qquad (4.5.9) \]
If $u$ can be shown to be of class $C^2$, then by Corollary 4.5.1 it satisfies
\[ \Delta u(x) := \sum_{i=1}^d \frac{\partial^2 u}{\partial x_i^2}(x) = 0 \quad \text{in } \Omega \]
($\Delta$ is called the Laplace operator), i.e. it is harmonic. This is the famous Dirichlet principle: obtain a harmonic function $u$ in $\Omega$ with boundary values $g$ by minimizing the Dirichlet integral among all functions with those boundary values. In order to justify Dirichlet's principle it thus remains to show that any solution of (4.5.9) is of class $C^2$. Actually, one can show more, namely $u \in C^\infty$ (in fact, $u$ is even real analytic in $\Omega$, but this will not be demonstrated here), and at the same time weaken the assumption. Namely, we have:
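Theorem 4.5.3. Let $u \in L^1(\Omega)$ satisfy
\[ \int_\Omega u(x)\,\Delta\varphi(x)\,dx = 0 \quad \text{for all } \varphi \in C_0^\infty(\Omega). \qquad (4.5.10) \]
Then $u \in C^\infty(\Omega)$.

Remark 4.5.2. (1) Clearly, (4.5.9) implies (4.5.10) by definition of $Du$. (2) The remark made after Theorem 4.5.2 again applies.

Proof (Theorem 4.5.3). We consider the mollifications with a rotationally symmetric $\rho$ (and we express this by writing $\rho$ as a function of $|x|$)
\[ u_h(x) = \frac{1}{h^d} \int_\Omega \rho\Big(\frac{|x-y|}{h}\Big)\,u(y)\,dy \]
as in Section 3.2. Given $\varphi \in C_0^\infty(\Omega)$, we restrict $h$ to be smaller than $\operatorname{dist}(\operatorname{supp}\varphi, \partial\Omega)$. We obtain
\[ \int_\Omega u_h(x)\,\Delta\varphi(x)\,dx = \int_\Omega \frac{1}{h^d} \int_\Omega \rho\Big(\frac{|x-y|}{h}\Big)\,u(y)\,dy\,\Delta\varphi(x)\,dx = \int_\Omega u(x)\,\Delta\varphi_h(x)\,dx, \qquad (4.5.11) \]
using Fubini's theorem.

Remark 4.5.3. We have also used the fact that $\Delta$ commutes with mollification, i.e.
\[ (\Delta\varphi)_h = \Delta(\varphi_h). \qquad (4.5.12) \]
For this, one needs that $\rho$ is a function of $|x|$ only, i.e. rotationally symmetric. Also, this point needs the rotational invariance of the Laplace operator $\Delta$. Therefore, the present proof does not generalize to other variational problems.

After this interruption, we return to (4.5.11) and conclude that
\[ \int_\Omega u_h(x)\,\Delta\varphi(x)\,dx = 0 \qquad (4.5.13) \]
by applying (4.5.10) to $\varphi_h \in C_0^\infty(\Omega)$ (by our choice of $h$). Since $u_h$ is smooth, we obtain e.g. from Corollary 4.5.1
\[ \Delta u_h = 0 \]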
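in $\Omega_h$, the set of points of $\Omega$ whose distance from $\partial\Omega$ is greater than $h$. Moreover,
\[ \int_{\Omega_h} |u_h(x)|\,dx \le \int_\Omega |u(x)|\,dx < \infty, \qquad (4.5.14) \]
since $\frac{1}{h^d}\int \rho\big(\frac{|x-y|}{h}\big)\,dy = 1$ by (3.2.3).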
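Therefore, the functions $u_h$ are uniformly bounded in $L^1$. We now need

Lemma 4.5.1. Let $f \in C^2(\Omega)$ be harmonic, i.e. $\Delta f(x) = 0$ in $\Omega$. Then $f$ satisfies the mean value property, i.e. for every ball $B(x_0, r) \subset \Omega$,
\[ f(x_0) = \frac{1}{\omega_d r^d} \int_{B(x_0,r)} f(x)\,dx = \frac{1}{d\,\omega_d r^{d-1}} \int_{\partial B(x_0,r)} f(x)\,d\sigma(x), \qquad (4.5.15) \]
where $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$.

Proof. For $0 < \varrho \le r$,
\[ 0 = \int_{B(x_0,\varrho)} \Delta f(x)\,dx = \int_{\partial B(x_0,\varrho)} \frac{\partial f}{\partial \nu}(x)\,d\sigma(x) \]
(with $\nu$ the exterior normal)
\[ = \int_{\partial B(0,1)} \frac{\partial f}{\partial \varrho}(x_0 + \varrho\omega)\,\varrho^{d-1}\,d\omega \quad \text{in polar coordinates } \omega \]
\[ = \varrho^{d-1} \frac{\partial}{\partial \varrho} \int_{\partial B(0,1)} f(x_0 + \varrho\omega)\,d\omega = \varrho^{d-1} \frac{\partial}{\partial \varrho} \Big( \varrho^{1-d} \int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x) \Big). \]
Thus,
\[ \frac{1}{d\,\omega_d \varrho^{d-1}} \int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x) \]
is constant in $\varrho$, and since its limit for $\varrho \to 0$ is $f(x_0)$ as $f$ is continuous, it has to coincide with $f(x_0)$ for all $0 < \varrho \le r$. Since
\[ \frac{1}{\omega_d r^d} \int_{B(x_0,r)} f(x)\,dx = \frac{1}{\omega_d r^d} \int_0^r \Big( \int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x) \Big)\,d\varrho, \]
the first equality in (4.5.15) also follows. q.e.d.

We return to the proof of Theorem 4.5.3: Since $u_h$ is harmonic, it satisfies the mean value properties of Lemma 4.5.1. Since the family $u_h$ is bounded in $L^1$,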
\[ u_h(x_0) = \frac{1}{\omega_d r^d} \int_{B(x_0,r)} u_h(x)\,dx \]
is bounded for fixed $r$ with $B(x_0, r) \subset \Omega_h$. Therefore, the $u_h$ are uniformly bounded in $\Omega_{h_0}$ for all sufficiently small $h > 0$. Furthermore, from (4.5.15)
\[ |u_h(x_1) - u_h(x_2)| \le \frac{1}{\omega_d r^d} \int_{(B(x_1,r) \setminus B(x_2,r)) \cup (B(x_2,r) \setminus B(x_1,r))} |u_h(x)|\,dx \le c(r)\,|x_1 - x_2| \qquad (4.5.16) \]
for some constant $c(r)$ depending on $r$, if $B(x_1, r), B(x_2, r) \subset \Omega_h$. Therefore, the gradient of $u_h$ is also uniformly bounded on $\Omega_{h_0}$. Likewise, derivatives of $u_h$ of all orders can be uniformly bounded on $\Omega_{h_0}$ (for all sufficiently small $h$), either by repeating the same procedure, or by observing that together with $u_h$, also all derivatives of $u_h$ are harmonic so that (4.5.16) can be iteratively applied to all derivatives in order to convert a bound on some derivative into a bound for a higher one. Therefore, a subsequence of $u_h$ converges towards some smooth function $v$, together with all its derivatives, as $h \to 0$. Since all the $u_h$ satisfy $\Delta u_h = 0$, so then does $v$:
\[ \Delta v = 0 \quad \text{in } \Omega. \]
Since on the other hand $u_h$ converges to $u$ in $L^1(\Omega)$ by Theorem 3.2.1, the two limits have to coincide (e.g. by Lemma 3.1.3). Therefore $u = v$, and consequently $u$ is smooth and harmonic. q.e.d.

As an application, we consider the following
Example 4.5.1. Let $a : \mathbb{R} \to \mathbb{R}$ be Lipschitz continuous with $0 < \lambda \le a(y) \le \Lambda < \infty$ for all $y \in \mathbb{R}$.
(4.5.17)
in the class A : = # + # d ' ( f i ) , with given # G i f (f2). By the PicardLindelof theorem, the ordinary differential equation P =
1 1
(4-5.18)
Since ^ > A~* > 0, the inverse function v(u) exists and is of class C ' as well, and we have by (4.5.19) and a chain rule for Sobolev functions that easily follows from the chain rule for differentiable functions by an approximation argument that
d d
a(u)DiuDiU
^2 F>ivDiV.
i=l
F(u)
i=l
= D(v).
Since the latter admits a smooth minimizer, the original problem (4.5.17) then admits a minimizer that is of class $C^{1,1}$ in $\Omega$.
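The mean value property (4.5.15) of Lemma 4.5.1, on which the proof of Theorem 4.5.3 rests, can be checked numerically in the plane. The following is a minimal sketch under stated assumptions: the test functions `harmonic` and `not_harmonic`, the centre and the radii are illustrative choices, not taken from the text, and the boundary integral over the circle is approximated by an average over equally spaced points.

```python
import math

def circle_average(f, x0, y0, r, samples=360):
    """Average of f over equally spaced points on the circle of radius r around (x0, y0)."""
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        total += f(x0 + r * math.cos(theta), y0 + r * math.sin(theta))
    return total / samples

def harmonic(x, y):
    """x^2 - y^2 has Laplacian 2 - 2 = 0, so it is harmonic."""
    return x * x - y * y

def not_harmonic(x, y):
    """x^2 + y^2 has Laplacian 4, so it is not harmonic."""
    return x * x + y * y

center = (0.3, -0.7)          # illustrative centre x_0
radii = (0.1, 0.5, 2.0)       # illustrative radii r

# For the harmonic function the circle average reproduces the value at the centre.
harmonic_gaps = [
    abs(circle_average(harmonic, center[0], center[1], r) - harmonic(*center))
    for r in radii
]

# For the non-harmonic function the circle average exceeds the centre value by r^2.
non_harmonic_gaps = [
    circle_average(not_harmonic, center[0], center[1], r) - not_harmonic(*center)
    for r in radii
]
```

For the harmonic polynomial the gaps vanish up to rounding for every radius, while for $x^2 + y^2$ the gap grows like $r^2$, which is what one expects from a function with constant positive Laplacian.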
Exercises

4.1 Weaken the growth assumption required in (iv) of Theorem 4.5.1. Hint: Use the Sobolev Embedding Theorem.

4.2 Compute the Euler-Lagrange equations for the variational integral
\[ A(u) := \int_\Omega \sqrt{1 + |Du(x)|^2}\,dx. \]
($A(u)$ represents the volume of the graph of $u$ over $\Omega$. Critical points are minimal hypersurfaces that can be represented as graphs over $\Omega$.)

4.3 Consider
\[ J(u) := \int_\Omega g^{ij}(x)\,D_i u(x)\,D_j u(x)\,\sqrt{\det g_{ij}(x)}\,dx, \]
where $(g^{ij}(x))_{i,j=1,\dots,d}$ is the inverse matrix of $(g_{ij}(x))_{i,j=1,\dots,d}$. Assume that $(g_{ij}(x))_{i,j=1,\dots,d}$ is positive definite for all $x \in \Omega$. Show that for given $g \in H^{1,2}(\Omega)$, there exists a unique minimizer of $J$ among all $u \in H^{1,2}(\Omega)$ with $u - g \in H_0^{1,2}(\Omega)$. (Minimizers for $J$ are harmonic functions w.r.t. the metric $g_{ij}(x)$.)
5 Nonconvex functionals. Relaxation

5.1 Nonlower semicontinuous functionals and relaxation

From Section 4.3, we recall the following

Theorem 5.1.1. Let $\Omega \subset \mathbb{R}^d$ be open, $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$ measurable, and suppose:
(i) For almost all $x \in \Omega$, $f(x, \cdot)$ is convex.
(ii) There exist $a \in L^1(\Omega)$, $b > 0$ with $f(x,v) \ge -a(x) + b|v|^p$ for almost all $x \in \Omega$ and all $v \in \mathbb{R}^d$.
Then
\[ F(u) := \int_\Omega f(x, Du(x))\,dx \]
is lsc and convex on $H^{1,p}(\Omega)$ equipped with its weak topology and assumes its infimum in the class of all $u \in H^{1,p}(\Omega)$ with $u - g \in H_0^{1,p}(\Omega)$ for some given $g \in H^{1,p}(\Omega)$.

Here, (ii) is just a coercivity condition ensuring that a minimizing sequence stays bounded w.r.t. the $H^{1,p}$-norm (w.l.o.g. $F \not\equiv \infty$). (i) implies that $F$ is lsc w.r.t. the norm topology of $H^{1,p}$, and the convexity then implies that $F$ is also lsc w.r.t. the weak $H^{1,p}$ topology. Since bounded sequences in $H^{1,p}$ have weakly convergent subsequences, any minimizing sequence has a convergent subsequence, and a limit of such a subsequence then minimizes $F$ by lower semicontinuity. Not all functionals that one wishes to consider in the calculus of variations are convex, however. As a motivation for what follows, we consider
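the following example: we wish to minimize
\[ F(u) := \int_0^1 \big( u(x)^2 + (u'(x)^2 - 1)^2 \big)\,dx \]
in $H_0^{1,4}((0,1))$. We claim
\[ \inf_{u \in H_0^{1,4}((0,1))} F(u) = 0. \qquad (5.1.1) \]
To see this, we consider the sawtooth functions
\[ u_n(x) := \begin{cases} x - \frac{i}{n} & \text{for } \frac{i}{n} \le x \le \frac{2i+1}{2n}, \\ -x + \frac{i+1}{n} & \text{for } \frac{2i+1}{2n} \le x \le \frac{i+1}{n}, \end{cases} \qquad i = 0, \dots, n-1. \qquad (5.1.2) \]
They satisfy
\[ u_n(0) = 0 = u_n(1), \qquad (5.1.3) \]
\[ |u_n'(x)| = 1 \quad \text{for almost all } x, \qquad (5.1.4) \]
and, since $0 \le u_n \le \frac{1}{2n}$,
\[ \lim_{n \to \infty} F(u_n) = 0. \]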
Since $F(u)$ is nonnegative for every $u$, (5.1.1) follows. The infimum of $F$ therefore cannot be realized by any $H_0^{1,4}$ function, because if we had
\[ F(u) = 0, \]
then $u(x) = 0$ for almost all $x \in (0,1)$ and $|u'(x)| = 1$ for almost all $x \in (0,1)$, and these two conditions are not compatible. (In fact, since $d = 1$ here, any $u \in H_0^{1,4}((0,1))$ is absolutely continuous, and so $u \equiv 0$ if $u(x) = 0$ a.e., hence $u$ is differentiable and $u' \equiv 0$. More generally, any Sobolev function that is constant on some set $A$ has a representative $u$ whose derivative $Du$ vanishes on $A$.) We have thus shown that the problem
\[ F(u) \to \min \quad \text{in } H_0^{1,4}(\Omega), \qquad \Omega = (0,1), \]
has no solution. Moreover, the sequence $(u_n)$ converges to zero weakly in $H^{1,4}$, but
\[ F(0) = 1 > 0 = \lim_{n \to \infty} F(u_n). \]
Therefore, $F$ is not lsc w.r.t. weak $H^{1,4}$-convergence although the integrand is continuous in $u'$. As we shall see, this results from the lack of convexity of the integrand. We also observe that any sequence of sawtooth functions $u_n$, i.e. satisfying
\[ |u_n'| = 1 \quad \text{a.e.,} \]
that converges to 0 in $L^2$ is a minimizing sequence for $F$.
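The behaviour of $F$ along sawtooth functions can be checked numerically. The sketch below is an illustration under stated assumptions: it uses sawtooth functions of slope $\pm 1$ with $n$ teeth on $[0,1]$, and a midpoint rule whose cell count (`cells`) is an arbitrary choice made so that the kinks fall on cell boundaries. For such $u_n$ the second term of the integrand vanishes and $F(u_n) = 1/(12 n^2)$, while $F(0) = 1$.

```python
def sawtooth(n, x):
    """Sawtooth u_n on [0, 1]: slope +1, then -1, on each interval of length 1/n."""
    s = (x * n) % 1.0              # relative position inside the current tooth
    return min(s, 1.0 - s) / n

def sawtooth_slope(n, x):
    """Derivative of the sawtooth (defined away from the kinks)."""
    s = (x * n) % 1.0
    return 1.0 if s < 0.5 else -1.0

def energy(u, du, cells=20000):
    """Midpoint-rule approximation of F(u) = int_0^1 (u^2 + (u'^2 - 1)^2) dx."""
    h = 1.0 / cells
    total = 0.0
    for k in range(cells):
        x = (k + 0.5) * h
        total += (u(x) ** 2 + (du(x) ** 2 - 1.0) ** 2) * h
    return total

# Energies along the sawtooth sequence: they decrease like 1 / (12 n^2).
values = {
    n: energy(lambda x, n=n: sawtooth(n, x), lambda x, n=n: sawtooth_slope(n, x))
    for n in (1, 2, 4, 8)
}

# Energy of the limit function u = 0: the integrand is identically (0 - 1)^2 = 1.
zero_energy = energy(lambda x: 0.0, lambda x: 0.0)
```

The energies of the sequence tend to 0 while the energy of its limit is 1, which is the failure of lower semicontinuity described above.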
Remark 5.1.1. Functionals of the type of our example often arise in optimal control theory as described in Section 5.2 of Part I. For example, one considers problems of the following type
\[ \int_0^T f(t, u(t), \alpha(t))\,dt \to \min \qquad (5.1.5) \]
under the side conditions
\[ u(0) = u_0, \quad u(T) = u_T, \qquad (5.1.6) \]
\[ u'(t) = g(t, u(t), \alpha(t)) \qquad (5.1.7) \]
with given functions $f$ and $g$. $u$ is called a state variable, $\alpha$ a control variable. This means that one assumes that $u$ describes the state of some system evolving in time $t$ whose derivative or rate of change can be controlled through a parameter $\alpha$. The aim then is to choose $\alpha$ in such a manner that the functional, often considered as 'cost function', is minimized. Thus, one needs to find some equation
\[ \alpha(t) = \varphi(t, u(t)) \]
for an optimal control $\alpha$ at time $t$ assuming a given state $u(t)$ of the system. If one knows the optimal control, one can reconstruct the evolution $u(t)$ of the state of the system from (5.1.6) and (5.1.7) under appropriate assumptions. The simplest control equation (5.1.7) is
\[ u'(t) = \alpha(t), \]
in which case the functional in (5.1.5) becomes
\[ \int_0^T f(t, u(t), u'(t))\,dt. \]
Expressions of the type $(u'(t)^2 - 1)^2$ can occur in many technical examples, like boats sailing against the wind.

Faced with a problem that one cannot solve, one may contemplate several options: One could try to modify the problem, or one might generalize the concept of a solution, or both. We shall discuss several such strategies. We first modify the problem via relaxation. This is an important method in the calculus of variations, and we therefore discuss it in some generality.

Definition 5.1.1. Let $X$ be a topological space, $F : X \to \overline{\mathbb{R}}$. We define the lower semicontinuous envelope or relaxed function $sc^-F$ of $F$ as follows:
\[ (sc^-F)(x) := \sup\{ \Phi(x) : \Phi : X \to \overline{\mathbb{R}} \text{ is lower semicontinuous with } \Phi(y) \le F(y) \text{ for all } y \in X \}. \]

Lemma 5.1.1. $sc^-F$ is the largest lsc function on $X$ that is $\le F$ everywhere.

Proof. $sc^-F$ is lsc as a supremum of lsc functions, see Lemma 4.2.1 (iv). Obviously, $sc^-F \le F$, and for all lsc $\Phi$ with $\Phi \le F$, we have $\Phi \le sc^-F$ by definition of $sc^-F$. q.e.d.

Theorem 5.1.2. Let $X$ be a topological space, $F : X \to \overline{\mathbb{R}}$ a function. Then every accumulation point of a minimizing sequence for $F$ is a minimum point for $sc^-F$. Consequently, if $F$ is coercive, then $sc^-F$ assumes its minimum, and
\[ \min_X sc^-F = \inf_X F. \]
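Proof. Let $(x_n)_{n \in \mathbb{N}} \subset X$ be a minimizing sequence for $F$ with accumulation point $x_0$. Then
\[ (sc^-F)(x_0) \le \limsup_{n \to \infty}\,(sc^-F)(x_n) \le \lim_{n \to \infty} F(x_n) = \inf_{y \in X} F(y). \qquad (5.1.8) \]
On the other hand, the constant function $\Phi \equiv \inf_{y \in X} F(y)$ is lsc and $\le F$, hence by Lemma 5.1.1 for every $x \in X$
\[ \inf_{y \in X} F(y) \le (sc^-F)(x). \qquad (5.1.9) \]
From (5.1.8) and (5.1.9) we conclude
\[ (sc^-F)(x_0) = \inf_{y \in X} F(y) \le (sc^-F)(x) \quad \text{for all } x \in X. \qquad (5.1.10) \]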
This implies the first claim. If $F$ is coercive, then every minimizing sequence has an accumulation point, and the second claim also follows. q.e.d.

What does Theorem 5.1.2 tell us for our example? It simply says that if we cannot minimize our original functional $F$ due to its lack of lower semicontinuity, we then minimize another functional instead, one that is lower semicontinuous and as close as possible to $F$. Theorem 5.1.2 then says that limits (or more generally, accumulation points) of minimizing sequences for $F$ do not minimize $F$, but the relaxed functional $sc^-F$. Since $sc^-F$ is the largest lsc functional $\le F$ by Lemma 5.1.1, that is the best one can hope for. It then remains the task to determine the relaxed functional of some given $F$. Before proceeding to do so for our example, let us relax ourselves a little and derive some easy consequences of the definition of the relaxed functional and consider some easier examples first.

Lemma 5.1.2. Let $X$ satisfy the first axiom of countability. Then $sc^-F$ is the relaxed function for $F : X \to \overline{\mathbb{R}}$ iff the following two conditions are satisfied:
(i) whenever $x_n \to x$,
\[ (sc^-F)(x) \le \liminf_{n \to \infty} F(x_n); \]
(ii) for every $x \in X$, there exists a sequence $x_n \to x$ with
\[ (sc^-F)(x) \ge \lim_{n \to \infty} F(x_n). \]
Proof. We claim that, since $X$ satisfies the first axiom of countability,
\[ (sc^-F)(x) = \inf\{ \liminf_{n \to \infty} F(x_n) : x_n \to x \text{ in } X \}. \qquad (5.1.11) \]
We denote the right hand side of (5.1.11) by $F^-(x)$. Then $F^-$ is lsc. In order to verify this, we have to check
\[ \liminf_{\nu \to \infty} \Big( \inf\{ \liminf_{n \to \infty} F(y_{\nu,n}) : y_{\nu,n} \to y_\nu \} \Big) \ge \inf\{ \liminf_{n \to \infty} F(x_n) : x_n \to x \} \qquad (5.1.12) \]
whenever $y_\nu \to x$. Indeed, otherwise, for some $\delta > 0$, we would find some diagonal sequence $y_{\nu,n_\nu} \to x$ as $\nu \to \infty$ with
\[ F(y_{\nu,n_\nu}) < \inf\{ \liminf_{n \to \infty} F(x_n) : x_n \to x \} - \delta, \]
which is impossible. Thus, $F^-$ is sequentially lsc, hence lsc, because $X$ is assumed to satisfy the first axiom of countability. Also, $F^- \le F$, and for every lsc $\Phi \le F$, we have for $x_n \to x$
\[ \Phi(x) \le \liminf_{n \to \infty} \Phi(x_n) \le \liminf_{n \to \infty} F(x_n), \]
and hence $\Phi(x) \le F^-(x)$. Thus, $F^-$ is the largest lsc functional $\le F$, and (5.1.11) follows from Lemma 5.1.1. It is then easy to see (and left as an exercise) that $F^-(x)$ satisfies and is characterized by the properties (i) and (ii). q.e.d.

Example 5.1.1. Let $X$ be a topological space, $A \subset X$ a subset. The indicator function $\chi_A$ is defined by
t A
x )
l {oo
if X iA.
i A,
We then have
SC~%A
functionals
and relaxation
211
={1
tf* iA.
Then sc
\A
= XA
-> R
defined by
/( ) = I L
U
:
\ \"
Du
+ fa M "
i f
i oo
C ^) otherwise.
1
(Note that 7(u) may also be infinite for some u G C (f2).) We claim (*r/)() = ( /n l ^ l " 1 oo
u d x
+ /n M " *
i f
# ( ) otherwise.
1 , P
In order to show this, we shall verify the conditions of Lemma 5.1.2: (i) (sc~I) is lower semicontinuous on L which yields condition (i). The lower semicontinuity is seen as follows: Suppose u u in L (fl). For the purpose of lower semicontinu ity, we may select a subsequence (w)eN C (w )nN w i t h
p n n p
lim (sc~I)(w )
l/
= liminf(sc~J)(u ),
n
v>oo
c 1 p
n o o
and we may also assume that this limit is finite. ( W ) N then is bounded in J f ' (f2). A subsequence of (w ) then converges weakly in H^ (fl) (Theorem 3.3.5), and by the Rellich-Kondrachev compactness Theorem 3.4.1, i t also converges strongly in L (fl). The limit has to be u, because the original sequence (u ) was assumed to converge to this limit. Since the H -norm is Isc w.r.t. weak H convergence (Lemma 2.2.7), we have
v p p n liP l,p
(sc~I)(u)
< l i m (sc~J)(uv)
VKX)
= liminf(sc~/)(u ).
n
212
Nonconvex functionals.
1 llP
Relaxation
(ii) Let u E H^(n). Since C (f2) fl H (Q) is dense in / f ^ f i ) , we may find a sequence ( w ) N C C ^ f i ) f l ff ( f i ) w i t h
1 , p n n
lim f/
N
|>u | + / K |
n
) = / |Du| + H
)
JQ
\JQ
JQ
(sc~I)(u).
HugH *^),
''
\oo
if w L P ( n ) \ c ( n ) ,
0
the relaxed functional is = U \ < i f t c e ^ n ) I oo otherwise. Remark 5.1.2. We may also define the above functionals 7, J on L f ( f i ) instead of L ( f i ) . The relaxed functionals will be given by the same formulae.
0 0 oc p
(sc-l )(u)
Remark 5.1.3. For p = 1, the relaxations of I and I are not given anymore by the J f ' -norm, but by the BV-norm which is defined in Chapter 7.
0 c 1 1
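In metric spaces, there is an alternative useful characterization of the relaxation of a given functional which we now want to describe.

Definition 5.1.2. Let $X$ be a metric space with distance function $d(\cdot,\cdot)$, and let $F : X \to \mathbb{R} \cup \{\infty\}$ be bounded from below, $F \not\equiv \infty$. For $\lambda > 0$, we define the Moreau-Yosida transform of $F$ as
\[ F_\lambda(x) := \inf_{y \in X} \big( F(y) + \lambda\,d(x,y) \big). \qquad (5.1.13) \]

Lemma 5.1.3. $F_\lambda$ is Lipschitz continuous with Lipschitz constant $\lambda$, i.e.
\[ |F_\lambda(x_1) - F_\lambda(x_2)| \le \lambda\,d(x_1, x_2) \quad \text{for all } x_1, x_2 \in X, \]
and
\[ (sc^-F)(x) = \sup_{\lambda > 0} F_\lambda(x) = \lim_{\lambda \to \infty} F_\lambda(x). \]

Proof. For $x_1, x_2, y \in X$, $\lambda > 0$, we obtain from the triangle inequality
\[ F(y) + \lambda\,d(x_1, y) \le F(y) + \lambda\,d(x_2, y) + \lambda\,d(x_1, x_2). \]
The definition of $F_\lambda(x_2)$ implies then
\[ \inf_{y \in X} \big( F(y) + \lambda\,d(x_1, y) \big) \le F_\lambda(x_2) + \lambda\,d(x_1, x_2), \]
hence
\[ F_\lambda(x_1) \le F_\lambda(x_2) + \lambda\,d(x_1, x_2). \]
Interchanging the roles of $x_1$ and $x_2$ yields the Lipschitz estimate. Since we have now shown that $F_\lambda$ is Lipschitz continuous, and since $F_\lambda \le F$, we obtain
\[ F_\lambda \le sc^-F, \qquad (5.1.16) \]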
A>0
+j .
Therefore lim x
A+00
A
= x
(5.1.17)
A+00
A+00
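The Moreau-Yosida transform $F_\lambda(x) = \inf_y (F(y) + \lambda d(x,y))$ introduced in this section can be illustrated on a finite metric space. In the sketch below, the space $X$ is an assumed finite grid in $[-1,1]$ with $d(x,y) = |x - y|$, and $F(y) = \sqrt{|y|}$ is an illustrative choice of a function that is continuous but not Lipschitz at 0; neither is taken from the text. On such a grid the infimum is a minimum, so the three properties checked (each $F_\lambda$ is $\lambda$-Lipschitz, $F_\lambda \le F$, and $F_\lambda$ increases with $\lambda$) hold exactly up to rounding.

```python
import math

def moreau_yosida(F_values, grid, lam):
    """Return [F_lam(x) for x in grid], with F_lam(x) = min_y (F(y) + lam * |x - y|)."""
    return [
        min(Fy + lam * abs(x - y) for Fy, y in zip(F_values, grid))
        for x in grid
    ]

def lipschitz_constant(values, grid):
    """Largest difference quotient between neighbouring grid points.

    On a one-dimensional grid this equals the Lipschitz constant over all pairs.
    """
    return max(
        abs(values[i + 1] - values[i]) / (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )

grid = [i / 50 for i in range(-50, 51)]            # assumed sample space X
F_values = [math.sqrt(abs(y)) for y in grid]       # illustrative F, not Lipschitz at 0

lambdas = (1.0, 4.0, 16.0)
transforms = {lam: moreau_yosida(F_values, grid, lam) for lam in lambdas}
constants = {lam: lipschitz_constant(transforms[lam], grid) for lam in lambdas}
```

Increasing $\lambda$ moves $F_\lambda$ up towards $F$ while the Lipschitz bound $\lambda$ is relaxed, which is the mechanism behind approximating the relaxed functional by the transforms $F_\lambda$.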
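5.2 Representation of relaxed functionals via convex envelopes

Theorem 5.2.1. Let $\Omega \subset \mathbb{R}^d$ be [...], $f : \mathbb{R}^d \to \mathbb{R}$ continuous with [...]. Let
\[ F : \{ u : u - u_0 \in H_0^{1,p}(\Omega) \} \to \mathbb{R} \]
be given by
\[ F(u) := \int_\Omega f(Du(x))\,dx \]
($u_0 \in H^{1,p}(\Omega)$ given). Then the relaxed function of $F$ w.r.t. the weak $H^{1,p}$ topology is
\[ (sc^-F)(u) = \int_\Omega (cvx^- f)(Du(x))\,dx, \]
where
\[ (cvx^- f)(v) := \sup\{ g(v) : g \le f,\ g \text{ convex} \} \]
is the largest convex function $\le f$.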
,x + m
d
(0 - a ))
d d
= / (x , ... , x )
: = f{nx)
for n G N .
n-^oo.
(5.2.1)
\f (x)\ dx
n
= I
JW
\f(nx)\ dx
= ^
n
f
JnW
\f(y)\ dy=
f
JW
\f(x)\ dx
= ll/H
(5-2-2)
[ f(x)dx= Jw
d
[ fdx. Jw
(5.2.3)
or more compactly W
0
= a + bW
(a = ( a , . . . , a ) , 6 = (&i, ...,&<*))
x d
of relaxed functionals
(/(*) ~f)dx=
f
Ja+bW
(f(nx) l
Jna+nbW
- f) dx (f(y)~f)dy ( / ( ) - f) V
d
= n
= A
7 1
/
Jna+[nb]W
d
+ n
I
Jna+(nb-\nb\)W
(/() - f)
+ A
n
/
i+(nb-[nb})W Jna+(nb-[nb})W
{f(y)~f)dy
by periodicity of / . The first term in the right-hand side vanishes by (5.2.3), and thus, again using the periodicity of / , 1/
\JWo
{fn{x)-f)dx\<-
[ I
7 1
\f(y)~f\dy.
JW
Letting n
= 0.
= / Jw
fg(x)dx.
< e
L*(W)
q
(5.2.6)
d
(The possibility of approximating L (fl) functions g (Q open in R ) in such a manner by step functions can easily be seen as follows: Since Cg(ft) is dense in L ( f i ) , there exist y> C(fi) w i t h 110 "~ ^ c l l / ^ n ) < I t is then easy to construct a step function A^x^t (Xi R, Wi disjoint rectangles contained in supp</? ) w i t h e sup 2 meas supp (p
9
supp <p
Nonconvex functionals.
Relaxation
9 ~ Then
Yl * *
Xi
LP(Q)
/ (fn(x) ~ f) \Jw I Jw
9(x)dx
{fn(x)
f)Y^XiXWi(x)
{
+
k
< I > l |
{fn(x)
- f) ^g(x) -
]T\XwA )
{fn(x)-f)dx
+e||/ -/||
n
by (5.2.6) and Hölder's inequality (Lemma 3.1.1). The first term tends to zero as $n \to \infty$ by (5.2.4), whereas the second one is bounded by $2\varepsilon \|f\|_{L^p(W)}$ by (5.2.2) and can hence be made arbitrarily small. Therefore, (5.2.5) holds. q.e.d.

The proof of Theorem 5.2.1 will be broken up into several steps:

(1) We put
\[ (q^- f)(v) := \inf\Big\{ \frac{1}{\operatorname{meas} U} \int_U f(v + D\varphi(x))\,dx : \varphi \in H_0^{1,p}(U) \Big\}, \qquad (5.2.7) \]
$U$ a bounded domain in $\mathbb{R}^d$,
(q~f)(Du(x))dx.
Proof. Replacing F(u) by G(v) := F(v + UQ) forv = u UQ, we may assume u = 0, i.e. u G HQ' (Q). Since the piecewise affine functions, i.e. those u for which Du is constant on disjoint rectangles W{ C fi, with \ (J W\ arbitrarily small, are dense in H (for the same reason that the functions that are piecewise constant on disjoint rectangles W{ are dense in L ) , and since F
P
LYP
5.2 Representation
on some rectangle W. We next observe that for a given constant vector v, (q~ f)(v) is independent of the choice of U in (5.2.7). First, the value of the inf on the right hand side of (5.2.7) does not change under translations or homotheties of U. The general case of U\ and XJi then is handled by approximating U\ by disjoint homothetical translations of U2 and vice versa. We may therefore take U = W in (5.2.7). We now choose a sequence (y> )nN C H*' (W) with
n P
(q~f)(vo)
+ D<p (x))dx
n
>
(q-f)(v ).
0
and put
0
(then Du = v )
:= u(x) + - ( / ? ( n x ) . n
n lyP
By Lemma 5.2.1, u
converges to u weakly in H .
n ( a v v
Then u
= u
= j f(v Jw
= n d
+
v
/
lim F(u ) n-KX)
n
I f( o JnW f(v
0
D(
Pn(y))dy (5.2.9)
+ D<p (y))dy
n
W since (p is periodic.
n
f(Du (x))dx
n
f (q-f)(v ) Jw
(q-f)(v )measW.
0
The claim then follows from the characterization of (sc~F), see e.g. Lemma 5.1.2(i). q.e.d.
Nonconvex functionals.
Relaxation
(put p = 0 in (5.2.7)).
(5.2.10)
:= / Jn
(q-f)(Du(x))dx,
- F = 5C-(g~F),
(5.2.11)
= sc-((q~) F),
(5.2.12)
where $(q^-)^n$ means performing the construction $q^-$ iteratively $n$ times. From the growth conditions on $f$ assumed in Theorem 5.2.1, we conclude that $((q^-)^n f)(v)$ is monotonically decreasing and bounded from below in $n$, hence converges to some limit $(Qf)(v)$. From B. Levi's Theorem 1.2.1, we conclude $\lim_{n \to \infty} ((q^-)^n F)(u)$
noo
n n
= lim f
n >o J o
(Qf)(Du(x))dx
for all n ,
<p e H^ (U),
UcR
5.2 Representation
of relaxed functionals
d
Definition 5.2.1. g : R > R is called quasiconvex if for all v G R , <p C H^ (U), U C R bounded and open
d P d
(5.2.15)
(J^dx^
<jf(i>(x))dx
(5.2.16)
(see Theorem 1.1.6). Since, as observed above, in Definition 5.2.1 it suffices to consider one fixed domain U, we may assume meas U = 1 and put ip(x) = v 4- D<p(x). Since <p G H^ , f ip(x) = v meas U = v, and (5.2.16) therefore implies that / is quasiconvex. We assume that / is quasiconvex, i.e.
f{Vo) =
p
^bjl
for all ip e
2
f{vo)dx
HQ' {U).
d
+ (1 - t)v )
2
< tf( )
Vl
+ (1 - t)f(v ).
2
(5.2.18)
+ (1 - t)v
+ D<p{y)) dy (5.2.19)
for all
and all (p
HQ (U).
220
Nonconvex functionals.
Relaxation
that v\ V2 is a positive multiple of the first basis vector of our standard basis of R , i.e. V\ v points in the ^ - d i r e c t i o n . We shall take a cube W : = (a, b) C R as our set U and construct a family of functions
d 2 d d
(v>) N C
6
H^{W)
on a set W C W w i t h meas W? = v)
2
V(p (x)
n
= -t(v
x
t(b-a)(b-a-) -
d 1
and H V ^ n l l ^ o o ^ ) < Co for some fixed constant Co that does not depend on n. Using these (p in (5.2.19) yields
n
f(tv
n
+ (1 - t)v )
2
< tf(V!)
+ (1 - t ) / ( v ) 4- Pn
2
with p 0 as n oo, hence (5.2.18). It remains to construct (p . We divide the interval (a, 6) into 2 + subintervals as follows:
n n
h = l2 =
h = (a+~(b~a),a+~(b-a)
have length ~(b a), and they alternate of length ^ ^ ( 6 - a). We then put
d 1
n 2
5.2 Representation of relaxed functionals via convex envelopes 221 We then put <p (a,x ,
n 2
...,x )
= 0,
2
d<p 8x
n(
l[X>
for x e W? forxeW?,
*$>=0*i
= 0. We also put
sup we get
n w
t),, ., , Ci (6 - a ) | v i - v | r * \2
xdip (x)\
n
Ci < n.
sup
xEWfUW?
"
ux
n
1 < |vi - v \ = : c .
2 0
This completes the construction of $\varphi_n$ and the proof of Lemma 5.2.3. q.e.d.

(4) We may now complete the proof of Theorem 5.2.1. From (2), we know
\[ (sc^-F)(u) \le QF(u) = \int_\Omega Qf(Du(x))\,dx. \]
By Lemma 5.2.3, $Qf$ is convex. By Lemma 4.3.1, $QF$ therefore is lsc w.r.t. weak $H^{1,p}$ convergence. Since $QF \le F$ (see (5.2.10) and the definition of $QF$), we must also have from the definition of $sc^-F$ that
\[ QF(u) \le (sc^-F)(u). \]
Hence equality. Thus
\[ (sc^-F)(u) = \int_\Omega (Qf)(Du(x))\,dx. \]
Finally, if $g$ is a convex function with $g \le f$, then by Lemma 4.3.1
\[ \int_\Omega g(Du(x))\,dx \]
is a weakly $H^{1,p}$ lsc functional $\le F$. Therefore, from the definition of $sc^-F$, the convex function $Qf$ must in fact be the largest convex function $\le f$. This completes the proof. q.e.d.

Corollary 5.2.1. $F$ as in Theorem 5.2.1 is weakly lower semicontinuous in $H^{1,p}$ if and only if $f$ is convex.
Proof. Lemma 4.3.1 says that convex functionals are weakly lower semicontinuous. If $f$ is not convex, then by Theorem 5.2.1 $sc^-F \ne F$, hence $F$ is not weakly lsc by Lemma 5.1.1. q.e.d.

Remark 5.2.1. One may also consider variational problems for vector valued functions $u : \Omega \subset \mathbb{R}^d \to \mathbb{R}^n$,
\[ F(u) := \int_\Omega f(Du(x))\,dx. \]
Again, $f$ is called quasiconvex if for all open and bounded $U \subset \mathbb{R}^d$, all $\varphi \in H_0^{1,p}(U; \mathbb{R}^n)$ and all $v \in \mathbb{R}^{nd}$
\[ f(v) \le \frac{1}{\operatorname{meas} U} \int_U f(v + D\varphi(x))\,dx. \]
In this case, however, while convex functions are still quasiconvex, the converse is no longer true. Theorem 5.2.1 continues to hold but with convexity replaced by quasiconvexity. Also, one may consider more general problems of the form
\[ F(u) = \int_\Omega f(x, u(x), Du(x))\,dx \]
with similar results and conceptually similar, but technically more involved proofs.

Remark 5.2.2. The notion of quasiconvexity and many of the basic corresponding lower semicontinuity results are due to C. Morrey. In fact, the quasiconvex functionals are precisely the weakly lower semicontinuous ones. For detailed references to the work of Morrey and other researchers, see the book of Dacorogna quoted at the end of this chapter.

Remark 5.2.3. Theorem 5.2.1 can be considered as a representation theorem for relaxed functionals. In particular, it says that a functional on $H^{1,p}$ obtained by integrating an integrand $f(Du(x))$ (with certain technical assumptions on $f$) has a relaxed functional of the same type, i.e. again representable by integration w.r.t. some integrand $g(Du(x))$ of the same type. Furthermore, $g$ may be computed explicitly from $f$.

We now return to our initial example
\[ F(u) = \int_0^1 \big\{ u(x)^2 + (u'(x)^2 - 1)^2 \big\}\,dx \]
for $u \in H_0^{1,4}((0,1))$. $F(u)$ is the sum of a functional which is continuous w.r.t. strong $L^2$-convergence, hence also w.r.t. weak $H^{1,4}$ convergence, and another one to which Theorem 5.2.1 applies. We conclude that
\[ (sc^-F)(u) = \int_0^1 \big\{ u(x)^2 + Q(u'(x)) \big\}\,dx, \]
with
\[ Q(v) := \begin{cases} (v^2 - 1)^2 & \text{if } |v| \ge 1, \\ 0 & \text{otherwise.} \end{cases} \]
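The convex envelope of the double-well integrand $(v^2 - 1)^2$ appearing in the relaxed functional of the initial example can be approximated numerically. The sketch below is only an illustration, not the construction used in the proof of Theorem 5.2.1: it samples the integrand on an assumed grid in $[-2, 2]$, computes the lower convex hull of the sampled graph with a standard monotone-chain scan, and compares the result with the function that vanishes on $[-1,1]$ and equals $(v^2-1)^2$ elsewhere.

```python
def double_well(v):
    """Integrand f(v) = (v^2 - 1)^2 of the example."""
    return (v * v - 1.0) ** 2

def cross(o, a, b):
    """Orientation test: positive if o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_convex_hull(points):
    """Lower convex hull of points sorted by their first coordinate."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or above the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def evaluate_hull(hull, v):
    """Piecewise linear interpolation along the hull at the point v."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= v <= x1:
            return y0 + (y1 - y0) * (v - x0) / (x1 - x0)
    raise ValueError("v lies outside the sampled interval")

def expected_envelope(v):
    """Zero between the wells, the integrand itself outside of them."""
    return double_well(v) if abs(v) >= 1.0 else 0.0

grid = [i / 100 for i in range(-200, 201)]          # assumed sample grid on [-2, 2]
hull = lower_convex_hull([(v, double_well(v)) for v in grid])
envelope = [evaluate_hull(hull, v) for v in grid]
```

The hull replaces the nonconvex part of the graph between the two wells $v = \pm 1$ by the flat segment joining them, which is why gradients of minimizing sequences may oscillate between the values $\pm 1$ at no cost.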
References

For the definition of relaxation and its general properties:
G. dal Maso, An Introduction to Γ-Convergence, Birkhäuser, Boston, 1993, pp. 28-37.
G. Buttazzo, Semicontinuity, Relaxation and Integral Representation in the Calculus of Variations, Pitman Research Notes in Math. 207, Longman Scientific, Harlow, Essex, 1989, pp. 7-28.

For Theorem 5.2.1 and generalizations thereof:
B. Dacorogna, Direct Methods in the Calculus of Variations, Springer, Berlin, 1989, pp. 197-249.
Relaxation
Determine sc~F and discuss the relaxation for F(u) = J (l-u'(x)) u(x) dx
2 2
foruGtf '
with F(u) = J
, 2
= 0 , u ( l ) = 1,
2
for
ueH
1A
((u(x)
- a) + ( u ( x ) - 1)) dx
for u G / J '
<** + Jn
cfa
i f ti C ^ ) otherwise.
d n d n
Why does the proof of Lemma 5.3.3 not work for vector-valued mappings R -+ M with n > 1, i.e. # : M -+ R, v G R , y> G # o ' ( R , R ) as in Remark 5.2.1?
p n
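6 Γ-convergence

6.1 The definition of Γ-convergence

In this chapter, we treat the important concept of Γ-convergence, introduced and developed by de Giorgi and his school.

Definition 6.1.1. Let $X$ be a topological space satisfying the first axiom of countability, $F_n, F : X \to \overline{\mathbb{R}}$ functions ($n \in \mathbb{N}$). We say that $F_n$ Γ-converges to $F$,
\[ F = \Gamma\text{-}\lim_{n \to \infty} F_n, \]
if
(i) for every sequence $(x_n)$ converging to some $x \in X$,
\[ F(x) \le \liminf_{n \to \infty} F_n(x_n), \]
and
(ii) for every $x \in X$, there exists a sequence $(x_n)$ converging to $x$ with
\[ F(x) = \lim_{n \to \infty} F_n(x_n). \]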
1 F (x)
n
nx -1
226
r'-convergence
while the pointwise limit is 0 for x = 0, 1(1) for x > 0 ( < 0). Example 6.1.2. F
n
F {x)
n
:=
2 nx 0
Then
(r-UmF )(x) = 0
n
nx
F (x)
n
Then
(r-limF )(x) = {
n
whereas the pointwise limit is again identically 0. Note that the F of 6.1.3 is the negative of the F of 6.1.2. Thus, in general
n
(r-limF )^r-lim(-F ).
n n
Example 6.1.4- F :
n
nx F (x)
n
for 0 < x < n for < x < n n otherwise for odd n for even n.
:= { nx 2 0 0
Example 6.1.5. F
: R R F ( x ) = sinnx.
n
Then
(r-limF )(x) = - l ,
n
whereas F
From Examples 6.1.4 and 6.1.5, we see that among the two notions of pointwise convergence and Γ-convergence, neither one implies the other.

Example 6.1.6. $F_n : X \to \overline{\mathbb{R}}$ converges continuously to $F : X \to \overline{\mathbb{R}}$ if for every $x \in X$ and every neighbourhood $V$ of $F(x)$ in $\overline{\mathbb{R}}$ (i.e. $V = \{ y \in \mathbb{R} : |F(x) - y| < \varepsilon \}$ for some $\varepsilon > 0$ in case $F(x) \in \mathbb{R}$, $V = \{ y \in \mathbb{R} : y > K \} \cup \{\infty\}$ for some $K \in \mathbb{R}$ in case $F(x) = \infty$, and analogously for $F(x) = -\infty$), there exist $n_0 \in \mathbb{N}$ and a neighbourhood $U$ of $x$ with
\[ F_n(y) \in V \]
for all $n \ge n_0$, $y \in U$. $F_n$ converges continuously if and only if both $F_n$ and $-F_n$ Γ-converge to $F$ and $-F$, respectively. Continuous convergence implies pointwise convergence, and we conclude from Examples 6.1.2 and 6.1.3 that Γ-convergence is weaker than continuous convergence.

Example 6.1.7. Let $X$ satisfy the first axiom of countability, $F_n = F : X \to \overline{\mathbb{R}}$ a constant sequence. Then
\[ \Gamma\text{-}\lim_{n \to \infty} F_n = sc^-F \]
is the relaxed function of $F$. Thus, we have the remarkable phenomenon that a constant sequence may converge to a limit different from the constant sequence element.

Remark 6.1.1. Without changing the content of the definition of Γ-convergence, condition (ii) may be replaced by the following condition which is weaker and therefore easier to verify:
(ii') for every $x \in X$, there exists a sequence $x_n$ converging to $x$ with
\[ \limsup_{n \to \infty} F_n(x_n) \le F(x). \]
The following result is useful in approximation arguments: L e m m a 6.1.1. Let X satisfy the first axiom of countability. Suppose ( m)meN converges to x in X, and
x
l i m s u p F ( x ) < F{x).
m
moo
Suppose that (ii') is satisfied for every x (i.e. for every m, there exists a sequence {x ,n)neN converging to x with
m
m
F(x )).
m
noo
T-convergence
Proof. Since X satisfies the first axiom of countability, we may take a neighbourhood system (U ) ^ of x and renumber i t and take intersec tions so that
u u
for all m
v
N,
u
and that every sequence (y^^n with y G U^ ) for all v and some sequence ^(y) oo as v oo converges to x. For n G N , we let ra Then lim m = oo. n+oo Namely, otherwise, we would find fco G N with
n
: = max jra G N : x , G U
m n
, F (x
n
m ) n
) < -f F ( x ) j .
m
Fn or
O&fc.uJ
>
+F(x )
k
%k,n Uk
u
To see that this is impossible we simply observe that since x and since Xk converges to Xk as n oo we have
0>n 0
G Uk
^fc ,n 4
0
f ^ sufficiently large n,
r a
) < F(x ),
fco
+ 7-
We then have
%m
n
,n
G f/m
F (x
n
m n
, ) < ^OmJ +
n
Therefore y
:= x
m n y n
+ J < F(x)
rn
n
n+oo
n+oo \
Let $F : X \to \mathbb{R} \cup \{\infty\}$ satisfy $\inf_{y \in X} F(y) > -\infty$. Given $\varepsilon > 0$, we say that $x \in X$ is an $\varepsilon$-minimizer of $F$ if
\[ F(x) \le \inf_{y \in X} F(y) + \varepsilon. \]
Note that $x$ is a minimizer of $F$ if it is an $\varepsilon$-minimizer for every $\varepsilon > 0$. In contrast to minimizers, $\varepsilon$-minimizers always exist for any $\varepsilon > 0$. The following result is a trivial consequence of the definition of Γ-convergence, but quite important.

Theorem 6.1.1. (Let $X$ satisfy the first axiom of countability.) Let the sequence of functions $F_n : X \to \overline{\mathbb{R}}$ Γ-converge to $F : X \to \overline{\mathbb{R}}$. Let $\inf_{y \in X} F_n(y) > -\infty$ for every $n \in \mathbb{N}$. Let $x_n$ be an $\varepsilon_n$-minimizer for $F_n$. Assume $\varepsilon_n \to 0$ and $x_n \to x$ for some $x \in X$. Then $x$ is a minimizer for $F$, and
\[ F(x) = \lim_{n \to \infty} F_n(x_n). \qquad (6.1.1) \]
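Proof. If $x$ were not a minimizer for $F$, there would exist $x' \in X$ with
\[ F(x') < F(x). \qquad (6.1.2) \]
Since $F_n$ Γ-converges to $F$, there exists a sequence $(x_n')$ converging to $x'$ with
\[ \lim_{n \to \infty} F_n(x_n') = F(x'). \]
We put $\delta := \frac14 (F(x) - F(x'))$. Then for all sufficiently large $n$,
\[ F_n(x_n') < F(x') + \delta, \qquad \varepsilon_n < \delta, \]
and, by property (i) of Definition 6.1.1, $F_n(x_n) > F(x) - \delta$. Since $x_n$ is an $\varepsilon_n$-minimizer of $F_n$,
\[ F_n(x_n') \ge F_n(x_n) - \varepsilon_n > F_n(x_n) - \delta. \]
Altogether,
\[ F(x) < F_n(x_n) + \delta < F_n(x_n') + 2\delta < F(x') + 3\delta, \]
contradicting (6.1.2) by definition of $\delta$. Thus, $x$ is a minimizer for $F$. If (6.1.1) did not hold, then after selection of a subsequence,
\[ F(x) < \lim_{n \to \infty} F_n(x_n), \]
whereas by property (ii) of Definition 6.1.1, there would exist a sequence $(x_n')$ converging to $x$ with
\[ F(x) = \lim_{n \to \infty} F_n(x_n'), \]
so that $F_n(x_n') < F_n(x_n) - \varepsilon_n$ for large $n$, contradicting the assumption that $x_n$ is an $\varepsilon_n$-minimizer of $F_n$.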
q.e.d. Corollary 6.1.1. (Let X satisfy the first axiom of countability.) Let F : X R r-converge to F : X R. Let x be a minimizer for F . If x x, then x minimizes F, and
n n n n
F(x)
= lim inf
F (x ).
n n
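The following result is similarly both trivial and important.
Theorem 6.1.2. (Let $X$ satisfy the first axiom of countability.) Let $F_n$ Γ-converge to $F$. Then $F$ is lower semicontinuous.
Proof. Otherwise, there exist some $x \in X$ and some sequence $(x_m)_{m\in\mathbb{N}} \subset X$ with
$$\lim_{m\to\infty} x_m = x, \qquad \lim_{m\to\infty} F(x_m) < F(x). \qquad (6.1.7)$$
By Γ-convergence, for every $m$ there exists a sequence $(x_{m,n})_{n\in\mathbb{N}} \subset X$ with
$$\lim_{n\to\infty} x_{m,n} = x_m, \qquad \lim_{n\to\infty} F_n(x_{m,n}) = F(x_m). \qquad (6.1.8)$$
We assume $-\infty < \lim_{m\to\infty} F(x_m)$, $F(x) < \infty$ simply to avoid case distinctions. We let
$$\delta := \tfrac14 \left( F(x) - \lim_{m\to\infty} F(x_m) \right) > 0$$
by (6.1.7). For every $m$, we may then choose $n_m$ with
$$F_{n_m}(x_{m,n_m}) - F(x_m) < \delta \qquad (6.1.9)$$
and
$$\lim_{m\to\infty} x_{m,n_m} = x, \qquad \lim_{m\to\infty} n_m = \infty. \qquad (6.1.10)$$
For sufficiently large $m$, we have
$$F(x_m) < F(x) - 3\delta \qquad (6.1.11)$$
and, by property (i) of Definition 6.1.1 applied to the sequence $(x_{m,n_m})$,
$$F_{n_m}(x_{m,n_m}) > F(x) - \delta. \qquad (6.1.12)$$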
Equations (6.1.9), (6.1.11) and (6.1.12) are not compatible, and the resulting contradiction proves the lower semicontinuity.
q.e.d.
Remark. As a consequence of Corollary 3.2.2 and Theorems 3.1.3, 3.3.1, and 3.3.3, in combination with Lemma 2.2.4, the weak topology of $L^p(\Omega)$ and $W^{k,p}(\Omega)$ for $1 < p < \infty$ satisfies the first axiom of countability so that the preceding notions are applicable.
The reference for this section is G. dal Maso, An Introduction to Γ-Convergence, Birkhäuser, Boston, 1993.
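As a numerical illustration of why the Γ-limit differs from a pointwise limit, the following sketch estimates the Γ-lower limit of the hypothetical sequence $F_n(y) = \sin(ny)$ on $\mathbb{R}$ at a point $x$. It assumes the standard neighbourhood characterization (not stated in this section) that the Γ-lower limit at $x$ is obtained by taking infima of $F_n$ over shrinking neighbourhoods of $x$ and then letting $n \to \infty$. The function names and the grid resolution are illustrative choices only.

```python
import math


def local_inf(f, x, delta, samples=20001):
    """Approximate inf of f over [x - delta, x + delta] on a uniform grid."""
    step = 2.0 * delta / (samples - 1)
    return min(f(x - delta + i * step) for i in range(samples))


def F(n):
    """Hypothetical test sequence F_n(y) = sin(n * y)."""
    return lambda y: math.sin(n * y)


def gamma_lower_estimate(x, delta, n):
    """inf of F_n over the delta-neighbourhood of x (one term of the Gamma-liminf)."""
    return local_inf(F(n), x, delta)


if __name__ == "__main__":
    # For large n the local infimum approaches -1, independently of x.
    for n in (1, 10, 100, 10000):
        print(n, gamma_lower_estimate(0.3, 0.01, n))
```

For fixed $\delta$ the local infimum approaches $-1$ as $n$ grows, which is consistent with the constant function $-1$ being the Γ-limit of this sequence, although $\sin(nx)$ has no pointwise limit.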
6.2 Homogenization
In this section and the next one, we describe two important examples of Γ-convergence. They are taken from H. Attouch, Variational Convergence for Functions and Operators, Pitman, Boston, 1984. In the discussion of these two examples, we shall be more sketchy about some technical details than in the rest of the book, because the main point of these examples is to show how the concept of Γ-convergence can be usefully applied to concrete problems that arise in various applications of the calculus of variations.
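Let $M$ be a smooth subset of the open unit cube $(0,1)^d$ of $\mathbb{R}^d$. $M$ is considered as a hole. Let
$$M_\varepsilon := \bigcup_{m\in\mathbb{Z}^d} \varepsilon(M + m)$$
($\varepsilon(M+m) := \{x = \varepsilon y + \varepsilon m \text{ with } y \in M\}$) be a periodic lattice of 'holes' of scale $\varepsilon$. Let $\Omega \subset \mathbb{R}^d$, $\Omega_\varepsilon := \Omega \setminus (M_\varepsilon \cap \Omega)$, i.e. a domain with many small holes. Such domains occur in many physical problems like crushed ice, porous media etc. Often, the physical value of $\varepsilon$ is so small that it is useful to perform the mathematical analysis for $\varepsilon \to 0$. This is called homogenization. Let
$$a(x) := \begin{cases} 0 & \text{for } x \in \mathbb{R}^d \setminus M_1 \\ \infty & \text{for } x \in M_1 \end{cases}$$
be the indicator function of $\mathbb{R}^d \setminus M_1$. $a(\frac{x}{\varepsilon})$ then is the indicator function of $\mathbb{R}^d \setminus M_\varepsilon$. We consider the functional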
F (u):=\e
e 2
2 2
(6.2.1)
/ Jn
f(x)u(x)dx
(6.2.2)
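Here $\partial\Omega_\varepsilon = \partial\Omega \cup (\partial M_\varepsilon \cap \Omega)$. The boundary condition on $\partial\Omega$ comes from the requirement that $u \in H_0^{1,2}(\Omega)$, while the boundary condition on $\partial M_\varepsilon$ is forced by the functional.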
convergence (6.2.3)
with
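where
$$\mu(M) := \int_{(0,1)^d \setminus M} |D\eta(x)|^2\,dx = \int_{(0,1)^d} \eta(x)\,dx,$$
and $\eta$ solves
$$\Delta\eta = -1 \ \text{in } (0,1)^d \setminus M, \qquad \eta = 0 \ \text{in } M, \qquad \eta \text{ is } \mathbb{Z}^d\text{-periodic}. \qquad (6.2.4)$$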
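Proof. We put $\eta^\varepsilon(x) := \eta(\frac{x}{\varepsilon})$. By Lemma 5.2.1, $\eta^\varepsilon$ converges weakly in $L^2(\Omega)$ to $\mu(M)$ as $\varepsilon \to 0$. Let now $u \in L^2(\Omega)$. By approximation, we may assume that $u$ is smooth. We put
$$u_\varepsilon := \frac{1}{\mu(M)}\, u\, \eta^\varepsilon.$$
Then $u_\varepsilon$ converges weakly in $L^2$ to $u$, and $u_\varepsilon = 0$ on $M_\varepsilon$. Moreover
$$F_\varepsilon(u_\varepsilon) = \frac{\varepsilon^2}{2} \int_\Omega |Du_\varepsilon|^2 = \frac{\varepsilon^2}{2\mu(M)^2} \int_\Omega \left( u^2 |D\eta^\varepsilon|^2 + 2u\eta^\varepsilon\, Du \cdot D\eta^\varepsilon + (\eta^\varepsilon)^2 |Du|^2 \right). \qquad (6.2.5)$$
Since $\varepsilon^2 |D\eta^\varepsilon|^2$ is periodic with period $\varepsilon$, asymptotically for $\varepsilon \to 0$ and every open $U \subset \Omega$
$$\varepsilon^2 \int_U |D\eta^\varepsilon|^2 \to \operatorname{meas} U \int_{(0,1)^d \setminus M} |D\eta|^2 = \operatorname{meas} U\, \mu(M), \qquad (6.2.6)$$
hence
$$\varepsilon^2 \int_\Omega u^2 |D\eta^\varepsilon|^2 \to \mu(M) \int_\Omega u^2, \qquad (6.2.7)$$
while
$$\lim_{\varepsilon\to0} \varepsilon^2 \int_\Omega (\eta^\varepsilon)^2 |Du|^2 = 0, \qquad (6.2.8)$$
and from (6.2.6), (6.2.7) and the Schwarz inequality, also
$$\lim_{\varepsilon\to0} \varepsilon^2 \int_\Omega u \eta^\varepsilon\, Du \cdot D\eta^\varepsilon = 0. \qquad (6.2.9)$$
Altogether
$$\lim_{\varepsilon\to0} F_\varepsilon(u_\varepsilon) = \frac{1}{2\mu(M)} \int_\Omega u^2 = F(u). \qquad (6.2.10)$$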
In order to complete the proof of Γ-convergence, we need to verify that whenever functions $v_\varepsilon$ that vanish on $M_\varepsilon$ converge weakly in $L^2(\Omega)$ to $u$, then
$$\liminf_{\varepsilon\to0} F_\varepsilon(v_\varepsilon) \ge F(u). \qquad (6.2.11)$$
as before. We have F (v )
e e
+ F (u )
e
>e
[
2
Dv Du
e !
e V(M)
f (m
w*Dve'Du
+ uDv -Dri ).
(6.2.12)
uDv Dr]
e
(6.2.13)
since the other term on the right hand side of (6.2.12) goes to 0 by a similar reasoning as above. Equation (6.2.4) implies
$$\varepsilon^2 \Delta\eta^\varepsilon = -1 \quad \text{in } \Omega_\varepsilon. \qquad (6.2.14)$$
Moreover
Duv D
e
Vt
<e
\D \ ^
Ve
' - 0,
2
(6.2.15)
since $v_\varepsilon$ as a weakly converging sequence is bounded in $L^2$, $|Du|$ is bounded by our approximation assumption that $u$ is smooth enough, and since we may use (6.2.6). Integrating the right-hand side of (6.2.13) by parts, and using (6.2.14) and (6.2.15), we obtain
$$\lim_{\varepsilon\to0} \frac{\varepsilon^2}{\mu(M)} \int_\Omega u\, Dv_\varepsilon \cdot D\eta^\varepsilon = \frac{1}{\mu(M)} \int_\Omega u^2,$$
since $v_\varepsilon$ converges weakly in $L^2$ to $u$. This implies (6.2.11) and concludes the proof.
q.e.d.
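The weak convergence of a rescaled periodic function to its mean value, which is what the proof uses for $\eta^\varepsilon$, can be observed numerically. The following is only a one-dimensional sketch: the periodic profile `eta`, the test function `phi`, and the quadrature resolution are hypothetical choices, not the cell function $\eta$ of the theorem.

```python
import math


def eta(y):
    """Hypothetical 1-periodic profile with mean value 1/2."""
    return math.sin(2.0 * math.pi * y) ** 2


def phi(x):
    """Fixed test function on (0, 1)."""
    return x * x


def pairing(eps, cells=200000):
    """Midpoint rule for the integral of eta(x / eps) * phi(x) over (0, 1)."""
    dx = 1.0 / cells
    total = 0.0
    for i in range(cells):
        x = (i + 0.5) * dx
        total += eta(x / eps) * phi(x)
    return total * dx


if __name__ == "__main__":
    # Expected weak limit: (mean of eta) * (integral of phi) = 1/2 * 1/3.
    limit = 0.5 / 3.0
    for eps in (0.5, 0.1, 0.02, 0.005):
        print(eps, pairing(eps), limit)
```

The pairings approach $\frac12\int_0^1 x^2\,dx = \frac16$ as $\varepsilon \to 0$, while $\eta(x/\varepsilon)$ itself keeps oscillating between 0 and 1 and does not converge pointwise.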
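6.3 Thin insulating layers
We consider an insulating layer of width $2\varepsilon$ and conductivity $\lambda$, and we want to analyse the limit where $\varepsilon$ and $\lambda$ tend to 0. Let $\Omega \subset \mathbb{R}^3$ be bounded and open, $S$ a smooth complete surface in $\mathbb{R}^3$, e.g. a plane,
$$\Sigma := \Omega \cap S, \qquad S_\varepsilon := \{x \in \mathbb{R}^3 : \operatorname{dist}(x, S) < \varepsilon\}, \qquad \Sigma_\varepsilon := \Omega \cap S_\varepsilon, \qquad \Omega_\varepsilon := \Omega \setminus \Sigma_\varepsilon.$$
Conductivity coefficient
$$a_{\varepsilon,\lambda} := \begin{cases} 1 & \text{on } \Omega_\varepsilon \\ \lambda & \text{on } \Sigma_\varepsilon \end{cases} \qquad (\lambda > 0).$$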
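$$J^{\varepsilon,\lambda} : H_0^{1,2}(\Omega) \to \mathbb{R}, \qquad J^{\varepsilon,\lambda}(u) := \frac12 \int_{\Omega_\varepsilon} |Du|^2\,dx + \frac{\lambda}{2} \int_{\Sigma_\varepsilon} |Du|^2\,dx. \qquad (6.3.1)$$
We consider the problem
$$J^{\varepsilon,\lambda}(u) - \int_\Omega f u \to \min \qquad (f \in L^2(\Omega) \text{ given}).$$
A minimizer $u_{\varepsilon,\lambda}$ satisfies
$$\Delta u_{\varepsilon,\lambda} + f = 0 \quad \text{on } \Omega_\varepsilon, \qquad (6.3.2)$$
$$\lambda \Delta u_{\varepsilon,\lambda} + f = 0 \quad \text{on } \Sigma_\varepsilon, \qquad (6.3.3)$$
$$u_{\varepsilon,\lambda}|_{\Omega_\varepsilon} = u_{\varepsilon,\lambda}|_{\Sigma_\varepsilon} \quad \text{on } \partial\Omega_\varepsilon \cap \partial\Sigma_\varepsilon, \qquad (6.3.4)$$
$$\frac{\partial u_{\varepsilon,\lambda}}{\partial\nu}\Big|_{\Omega_\varepsilon} = \lambda \frac{\partial u_{\varepsilon,\lambda}}{\partial\nu}\Big|_{\Sigma_\varepsilon} \quad \text{on } \partial\Omega_\varepsilon \cap \partial\Sigma_\varepsilon, \qquad (6.3.5)$$
$$u_{\varepsilon,\lambda} = 0 \quad \text{on } \partial\Omega. \qquad (6.3.6)$$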
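Theorem 6.3.1. We let $\varepsilon \to 0$, $\lambda \to 0$. If $\frac{\lambda}{\varepsilon} \to \alpha$ with $0 \le \alpha \le \infty$, then $u_{\varepsilon,\lambda} \rightharpoonup u$ weakly in $L^2(\Omega)$, $u_{\varepsilon,\lambda} \rightrightarrows u$ uniformly on every $\Omega_0 \subset\subset \Omega \setminus \Sigma$, where $u$ solves
$$\Delta u + f = 0 \quad \text{on } \Omega \setminus \Sigma, \qquad u|_{\partial\Omega} = 0,$$
$$\frac{\partial u}{\partial\nu^+} = \frac{\partial u}{\partial\nu^-} \quad \text{and} \quad \frac{\partial u}{\partial\nu} = \frac{\alpha}{2}[u] \quad \text{on } \Sigma,$$
where $[u]$ is the jump of $u$ across $\Sigma$ and $\frac{\partial}{\partial\nu^\pm}$ are the normal derivatives for the two components of $\Omega \setminus \Sigma$. (In case $\alpha = \infty$, $u$ is continuous across $\Sigma$, and $-\Delta u = f$ in $\Omega$.) Furthermore
$$\frac12 \int_{\Omega_\varepsilon} |Du_{\varepsilon,\lambda}|^2 + \frac{\lambda}{2} \int_{\Sigma_\varepsilon} |Du_{\varepsilon,\lambda}|^2 \to \frac12 \int_{\Omega\setminus\Sigma} |Du|^2 + \frac{\alpha}{4} \int_\Sigma [u]^2\,d\Sigma,$$
and the functionals $J^{\varepsilon,\lambda}$ Γ-converge to
$$I(u) := \begin{cases} \frac12 \int_{\Omega\setminus\Sigma} |Du|^2 + \frac{\alpha}{4} \int_\Sigma [u]^2\,d\Sigma & \text{if } u \in H^{1,2}(\Omega \setminus \Sigma) \\ \infty & \text{otherwise.} \end{cases}$$
Thus, in case $\alpha = 0$, we obtain a perfect insulation in the limit, whereas for $\alpha = \infty$, the limiting layer does not insulate at all. We assume for simplicity $S = \{x^3 = 0\}$.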
Lemma 6.3.1. There exists a constant $c_1$ (depending on $f$, $\Omega$, $S$, but not on $\varepsilon$, $\lambda$) such that for all sufficiently small $\varepsilon$, $\lambda$
+ \ f
\Bu , \
e x
Jn
+ A
JdQ
/ W
C
Jz
, A ^ ~ -
/ / *W A
C ) 2
M
^ l/L (Q)'
l .Alx,2( )
<C2 [
\Du , \
e x
<
<C3
| ^ C , A |
(y = x\y [
=
2
x ),
<c e
3 2
\Du \
yX
(we only get e instead of e , because the area of the portion of # E on which U A vanishes, namely Oil f l 5 , is proportional to e). Altogether
C C ) C
<
< C
( l
+ ^)
(jf
\Du , \
e X
+ \J^
|>
W e
,A| )
and the estimates follow. q.e.d. Proof (Theorem 6.3.1). We only consider the case 0 < a < oo (the other cases follow from a limiting argument). We first observe T- lim J ' ( u ) = oo We assume for simplicity A = A() = a , J : = J ' Let u e H
1,2 c c A ( c c A
i f u e L ( f i ) \ H^ (Q
\ E).
).
e
u weakly in L ( f i ) w i t h l i m / ( u ) = I(u).
c
We
if | z | > e
<.
J(ti )
e
= 5 / \Du\ * JQ
+ ^
2
[ J\x*\<e \*
+D ( y - I ~\l I
(u(x\x ,e)
2
\Du\ \
D u
+ ?L [
2
\ + rf
-f terms that contain x and go to zero as e 0 ( | # | < e). > If u is smooth (which we may assume by an approximation argument), therefore for e 0 > i2 / |D| + f
4
/[]
/E
JQ\E
e > 0
For u as above,
e
f
vE
e
\Dv \
e
+ ^
[
'Eg
\Du \ >aef
e
Du
</ E
e
Dv
> f
(!>{u(.,6)+u(.,-6)}
+ ) { ( - , e) - ( - , - e ) } +^(u(., )-u(.,- ))
e e
direction.
We may assume $u$ smooth (otherwise, we use an approximation argument). Then as above
(D
e
E E
e) e) -
-c))) D f z
e
-^
2 J
D?J
-c)).
supe / Consequently,
\Dv \
e
< oo.
(6.3.7)
limae J
(^J
\ *
Dv
for e -+ 0.
-* 0
e >v
3
c) 3
-c)).
e -/?v (w(-,c)-w(-,-c))
3
v
) -
W(., - ) ) -
V ( u ( . , C) C
-c)) ,
where here of course dUf = 0 0 {x = e}. Since we may assume l i m i n f _ o ^ (^c) < oo, v is bounded in H (Q ). Therefore, we may
c ly2 c e e
e
c
assume that the traces of v on d converge*) . Since u is assumed smooth and v converges to u weakly in L , we may assume
2 e
^,(0E )
e
w(',0 )
weakly in L ( d ) .
We then get
h e
^2 /
J Eg
^Dv (u(-,e)-u(-,-e))
{
= -
/ []..
J E
iff/ _| v,|'
E D
^ /
| > 2 /
| .
[u)l. q.e.d.
F (x)
n
:= n(smnx
+ 1)
F () : - | ^
n
Q
n z
f^ I ^ n ^ ^ 2
for < x < ~
n
2n - n x
n
$F_n(x) := \sin nx + \cos nx$.
6.2 Show the following result: Let $X$ be a topological space satisfying the first axiom of countability, $F_n, G_n : X \to \overline{\mathbb{R}}$. Suppose that $F_n$ Γ-converges to $F$, $G_n$ Γ-converges to $G$, $F_n + G_n$ Γ-converges to $H$ (assume that the sums $F_n + G_n$, $F + G$ are always well defined; for example, there must not exist $x \in X$ with $F(x) = \infty$, $G(x) = -\infty$ or vice versa). Then
$$F + G \le H.$$
Does one get equality instead of '$\le$' here? (Hint: Consider $F_n(x) = \sin nx$, $G_n(x) = -\sin nx$.)
† For this technical point, see e.g. W. Ziemer, Weakly Differentiable Functions, Springer, GTM 120, New York, 1989, pp. 189ff.
7.1 The space BV(Ω)
Let $C_0^0(\mathbb{R}^d)$ be the space of continuous functions on $\mathbb{R}^d$ with compact support. For each Radon measure $\mu$ and each $\mu$-measurable function $\nu : \mathbb{R}^d \to \mathbb{R}$ with $|\nu| = 1$ $\mu$-almost everywhere, we can form a linear functional
$$L : C_0^0(\mathbb{R}^d) \to \mathbb{R}, \qquad L(f) = \int_{\mathbb{R}^d} f \nu\,d\mu.$$
Conversely, we have the Riesz representation theorem, given here without proof (see e.g. N. Dunford, J. Schwartz, Linear Operators, Vol. I, Interscience, New York, 1958, p. 265).
Theorem 7.1.1. Let $L : C_0^0(\mathbb{R}^d) \to \mathbb{R}$ be a linear functional with
$$\|L\|_K := \sup\{L(f) : f \in C_0^0(\mathbb{R}^d),\ |f| \le 1,\ \operatorname{supp} f \subset K\} < \infty \qquad (7.1.1)$$
for each compact $K \subset \mathbb{R}^d$. Then there exist a Radon measure $\mu$ on $\mathbb{R}^d$ and a $\mu$-measurable function $\nu : \mathbb{R}^d \to \mathbb{R}$ with $|\nu| = 1$ $\mu$-almost everywhere with
$$L(f) = \int_{\mathbb{R}^d} f \nu\,d\mu \quad \text{for all } f \in C_0^0(\mathbb{R}^d). \qquad (7.1.2)$$
If $L$ is nonnegative, then $\nu \equiv 1$, i.e.
$$L(f) = \int_{\mathbb{R}^d} f\,d\mu. \qquad (7.1.3)$$
Thus, the Radon measures on $\mathbb{R}^d$ are precisely the nonnegative linear functionals on $C_0^0(\mathbb{R}^d)$. (Note that (7.1.1) automatically holds if $L$ is nonnegative; namely
$$\|L\|_K \le L(\chi_K)$$
in that case where $\chi_K$ is the characteristic function of $K$.) The same result more generally holds for $C_0^0(\mathbb{R}^d, H)$ where $H$ is a finite dimensional Hilbert space with scalar product $\langle\cdot,\cdot\rangle$. Then linear functionals $L : C_0^0(\mathbb{R}^d, H) \to \mathbb{R}$ satisfying (7.1.1) are represented as
$$L(f) = \int_{\mathbb{R}^d} \langle f, \nu \rangle\,d\mu, \qquad (7.1.4)$$
where $\mu$ again is a Radon measure and $\nu : \mathbb{R}^d \to H$ is $\mu$-measurable with $|\nu| = 1$ $\mu$-almost everywhere. Also, in the situation of Theorem 7.1.1, one has
$$\mu(\Omega) = \sup\{L(f) : f \in C_0^0(\Omega),\ |f| \le 1\}$$
for any open $\Omega \subset \mathbb{R}^d$. The expression $\nu\,d\mu$ in (7.1.4) ($|\nu| = 1$ $\mu$-almost everywhere) is called a vector-valued signed measure. ($\mu$ is supposed to be a Radon measure and $\nu$ a $\mu$-measurable function with values in $H$.)
Definition 7.1.1. Let $\Omega \subset \mathbb{R}^d$ be open. The space $BV(\Omega)$ consists of all functions $u \in L^1(\Omega)$ for which there exists a vector-valued signed measure $\nu\mu$ with $\mu(\Omega) < \infty$ and
$$\int_\Omega u \operatorname{div} g\,dx = -\int_\Omega g \cdot \nu\,d\mu \qquad (7.1.5)$$
for all $g \in C_0^\infty(\Omega, \mathbb{R}^d)$. In this case, we write $Du = \nu\mu$, $D_i u = \nu_i \mu$ ($\nu = (\nu_1, \ldots, \nu_d)$, $i = 1, \ldots, d$). For $u \in BV(\Omega)$, we put
$$\|Du\|(\Omega) := \mu(\Omega) = \sup\left\{ \int_\Omega u \operatorname{div} g\,dx : g = (g^1, \ldots, g^d) \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g| \le 1 \right\}.$$
For $u \in BV(\Omega)$, $\|Du\|$ is a Radon measure on $\Omega$:
$$\|Du\|(\Omega_0) = \sup\left\{ \int_{\Omega_0} u \operatorname{div} g\,dx : g \in C_0^\infty(\Omega_0, \mathbb{R}^d),\ |g| \le 1 \right\}$$
for $\Omega_0$ open in $\Omega$. We write
$$\|Du\|(\Omega_0) =: \int_{\Omega_0} \|Du\|.$$
then u BV(fl),
and
D U { X )
u(x) = { 0
\Du(x)
ifDu(x)^0 otherwise.
The proof is obvious.
q.e.d.
On a compact hypersurface $S \subset \mathbb{R}^d$ of class $C^1$, we have an induced metric and in particular a volume form $dS$. The $(d-1)$-dimensional volume of $S$ then is
$$|S|_{d-1} = \int_S dS.$$
of E.
= sup | ^ d i v
C ( R , R ) , \g\ < 1 J .
d d 0
[
JdE
g(x)n(x)d(dE)
\dE\ _
d
>sa y
P
d
divg:geCZ>(R ,R ),\g\
< l j .
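For the converse inequality, we use a partition of unity to extend $n$ to a $C^\infty$-vector field $V$ on $\mathbb{R}^d$ with $|V(x)| \le 1$ for all $x \in \mathbb{R}^d$. For $\varphi \in C_0^\infty$ with $|\varphi| \le 1$, we put $g = \varphi V$ and get
$$\int_E \operatorname{div} g = \int_{\partial E} g \cdot n\,d(\partial E) = \int_{\partial E} \varphi\,d(\partial E).$$
Consequently
$$\sup\left\{ \int_E \operatorname{div} g : g \in C_0^\infty,\ |g| \le 1 \right\} \ge \sup\left\{ \int_{\partial E} \varphi\,d(\partial E) : \varphi \in C_0^\infty,\ |\varphi| \le 1 \right\} = |\partial E|_{d-1}.$$
This completes the proof.
q.e.d.
The same conclusion holds if $E \subset\subset \Omega$ for some bounded open set $\Omega$; namely
$$|\partial E|_{d-1} = \|D\chi_E\|(\Omega) = \sup\left\{ \int_E \operatorname{div} g : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g| \le 1 \right\}$$
in that case.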
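Definition 7.1.2. A Borel set $E \subset \mathbb{R}^d$ has finite perimeter in an open set $\Omega$ if $\chi_E|_\Omega \in BV(\Omega)$. The perimeter of $E$ in $\Omega$ in that case is
$$P(E, \Omega) := \|D\chi_E\|(\Omega) \left( = \sup\left\{ \int_E \operatorname{div} g : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g| \le 1 \right\} \right). \qquad (7.1.8)$$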
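The following lower semicontinuity result is easy to prove and very useful.
Theorem 7.1.2. Let $\Omega \subset \mathbb{R}^d$ be open, $(u_n)_{n\in\mathbb{N}} \subset BV(\Omega)$, and suppose $u_n \to u$ in $L^1(\Omega)$. Then
$$\|Du\|(\Omega) \le \liminf_{n\to\infty} \|Du_n\|(\Omega). \qquad (7.1.9)$$
If
$$\sup\{\|Du_n\|(\Omega) : n \in \mathbb{N}\} < \infty, \qquad (7.1.10)$$
then $u \in BV(\Omega)$.
Proof. Let $g \in C_0^\infty(\Omega, \mathbb{R}^d)$ with $|g| \le 1$. Then
$$\int_\Omega u \operatorname{div} g = \lim_{n\to\infty} \int_\Omega u_n \operatorname{div} g \le \liminf_{n\to\infty} \|Du_n\|(\Omega).$$
Taking the supremum over all such $g$, we obtain (7.1.9). If $\varphi \in C_0^\infty(\Omega)$, then for $i = 1, \ldots, d$
$$\lim_{n\to\infty} \int_\Omega \varphi\, D_i u_n = -\lim_{n\to\infty} \int_\Omega u_n D_i\varphi = -\int_\Omega u\, D_i\varphi.$$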
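We next discuss the approximation of BV-functions by smooth ones through mollification. As usual, we let $\rho \in C_0^\infty(\mathbb{R}^d)$ be a mollifier with $\rho \ge 0$, $\operatorname{supp} \rho \subset B(0,1)$, $\int_{\mathbb{R}^d} \rho(x)\,dx = 1$, and we also impose the symmetry condition
$$\rho(x) = \rho(-x). \qquad (7.1.11)$$
For $h > 0$, we put
$$\rho_h(x) := h^{-d} \rho\left(\frac{x}{h}\right),$$
we extend $u$ to $L^1(\mathbb{R}^d)$, and we let
$$u_h(x) := \rho_h * u(x) := \int_{\mathbb{R}^d} \rho_h(x - y) u(y)\,dy \in C^\infty(\Omega).$$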
T h e o r e m 7.1.3. If u BV(ft),
then u
->
\\Du\\ in the sense of Radon measures as h 0, i.e. for every f C (ft) * lim / f\\Du \\^
h
f f\\Du\\.
(7.1.12)
In
(7.1.13)
Proof. $u_h \to u$ in $L^1(\Omega)$ by Theorem 3.2.1. It suffices to consider the case $f \ge 0$. From (7.1.3) it follows as in the proof of Theorem 7.1.2 that for every $f \in C_0^0(\Omega)$ with $f \ge 0$
$$\int_\Omega f \|Du\| \le \liminf_{h\to0} \int_\Omega f \|Du_h\|. \qquad (7.1.14)$$
It thus remains to prove that for such $f$
$$\limsup_{h\to0} \int_\Omega f \|Du_h\| \le \int_\Omega f \|Du\|. \qquad (7.1.15)$$
We have
$$\int_\Omega f \|Du_h\| = \sup\left\{ \int_\Omega g(x) Du_h(x)\,dx : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g(x)| \le f(x)\ \forall x \in \Omega \right\}. \qquad (7.1.16)$$
For any such $g$ as in (7.1.16)
$$\int_\Omega g(x) Du_h(x)\,dx = -\int_\Omega u_h(x) \operatorname{div} g(x)\,dx = -\int\!\!\int \rho_h(x - y) u(y)\,dy \operatorname{div} g(x)\,dx = -\int u(y) \operatorname{div}(g_h)(y)\,dy. \qquad (7.1.17)$$
Since we assume $|g| \le f$, we have
$$|g_h| \le |g|_h \le f_h,$$
and since $f$ is continuous, $f_h \rightrightarrows f$ uniformly as $h \to 0$ (see Lemma 3.2.2), i.e. $|f_h(x) - f(x)| \le \eta_h$ for all $x \in \Omega$, with $\lim_{h\to0} \eta_h = 0$. By definition of $\|Du\|$, the right hand side of (7.1.17) therefore is bounded
lim / g(x)Duh{x)dx
Then any
\\ TI\\BV
- K
or
C(fi) with
\\Dv \\(Q)<K
n lyl n
+ l.
Therefore $(v_n)_{n\in\mathbb{N}}$ is bounded in $W^{1,1}(\Omega)$. By the Rellich-Kondrachov compactness theorem 3.4.1, after selection of a subsequence, $(v_n)_{n\in\mathbb{N}}$ converges in $L^1(\Omega)$ to some $u \in L^1(\Omega)$. $(u_n)$ has to converge to $u$ as well (in $L^1(\Omega)$). By Theorem 7.1.1, $u \in BV(\Omega)$, and
$$\|u\|_{BV} \le K.$$
q.e.d.
A reference for the BV theory is W. Ziemer, Weakly Differentiable Functions, Springer, GTM 120, New York, 1989, Chapter 5.
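The behaviour of the total variation under mollification can be checked on a grid in one dimension. The sketch below is a discrete analogue only, under stated assumptions: the step function, the grid, the bump kernel and the clamped boundary treatment are all hypothetical choices, and the discrete sum of increments stands in for $\|Du\|$.

```python
import math


def total_variation(samples):
    """Discrete total variation: sum of absolute increments."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def bump(x):
    """Symmetric mollifier profile supported in (-1, 1) (not normalized)."""
    d = 1.0 - x * x
    return math.exp(-1.0 / d) if d > 0.0 else 0.0


def mollify(samples, dx, h):
    """Discrete convolution with the rescaled bump; values are clamped at the ends."""
    k = max(1, round(h / dx))
    weights = [bump(j * dx / h) for j in range(-k, k + 1)]
    total = sum(weights)
    weights = [w / total for w in weights]  # discrete analogue of integral = 1
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = min(max(i + j - k, 0), last)
            acc += w * samples[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    dx = 1.0 / 2000
    step = [3.0 if i * dx >= 0.5 else 0.0 for i in range(2001)]
    print(total_variation(step), total_variation(mollify(step, dx, 0.05)))
```

In this example the step function is not weakly differentiable with an integrable derivative, yet its total variation equals the jump height, and the smooth mollified functions have the same total variation.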
v oo
otherwise,
F{u)
:= { t L \ oo
1 d
] D U l
l | Z ? U | 1
otherwise.
Then w.r.t. $L^1(\mathbb{R}^d)$ convergence
$$F = \Gamma\text{-}\lim_{n\to\infty} F_n. \qquad (7.2.1)$$
(7.2.2)
n*oo
\nL (R ).
- h {t)\
n
<\s-t\
for all n G N , 5, t G .
L1
as n oo.
(7.2.3)
(7.2.4)
249
ou
u
7 T
L 1
-f
, h ou
n
2 u
7 T
L
1
as n
oo
(7.2.5)
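by (7.2.3), (7.2.4), and Lebesgue's Theorem 1.2.3 on dominated convergence. We may assume
$$u_n \in H^{1,2}(\mathbb{R}^d) \quad \text{for every } n \in \mathbb{N}, \qquad (7.2.6)$$
because otherwise $F_n(u_n) = \infty$. Then
$$\liminf_{n\to\infty} F_n(u_n) \ge 2 \liminf_{n\to\infty} \int |Du_n|\,|\sin(n\pi u_n)| = 2 \liminf_{n\to\infty} \int |D(h_n \circ u_n)| \ge \frac{4}{\pi} \int \|Du\| = F(u).$$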
This shows (7.2.2).
(ii) We want to show that for every $u \in L^1(\mathbb{R}^d)$, there exists a sequence $(u_n)_{n\in\mathbb{N}} \subset L^1(\mathbb{R}^d)$ converging to $u$ in $L^1(\mathbb{R}^d)$ with
$$\limsup_{n\to\infty} F_n(u_n) \le F(u), \qquad (7.2.7)$$
thereby completing the proof of Γ-convergence. This inequality will be much harder to show than (7.2.2), however. We shall proceed in several steps:
(1) We may assume $u \in C_0^\infty(\mathbb{R}^d)$. By a slight extension of the reasoning of Theorem 7.1.3, we may find $u_h \in C_0^\infty(\mathbb{R}^d)$ (take a smooth $\varphi_h$ with $\varphi_h = 1$ on $B(0, \frac1h)$, $\varphi_h = 0$ on $\mathbb{R}^d \setminus B(0, \frac1h + 1)$, $|D\varphi_h| \le 2$ and multiply the mollification of $u$ with parameter $h$ by $\varphi_h$) with
$$\lim_{h\to0} \int |u_h - u| = 0, \qquad \lim_{h\to0} F(u_h) = F(u).$$
Applying Lemma 6.1.1, we may indeed assume $u \in C_0^\infty(\mathbb{R}^d)$.
(2) We now want to show that it suffices to verify the claim for certain step functions.
d
= {x : u(x) = t}
v n and satisfying
v+l n
I Ju
oo
[u^^l^dt
r
E
l / = - 0 0 OO
J
"
n
l ^ w L v
E
v oo oo
1
^l
u _ 1
(^)L-i
2
t>= oo
We choose iV(n) G N with iV(n) > (nmax \u\ + 1) and put N(n) 1 ^
t/=-AT(n)
n>oo
If t ,
v n
2 < . n
u(x)\dx
= 0.
Lemma 6.1.1 then implies that it suffices to prove the claim for the functions $u_n$.
(3) In (2), we have reduced the claim to step functions
$$u = \sum_{i=1}^N a_i \chi_{\Omega_i},$$
where the $\Omega_i$ are disjoint bounded open sets with boundary $\partial\Omega_i$ of class $C^\infty$. Since the general case is completely analogous, for simplicity, we only consider the case $N = 1$, i.e.
$$u = a \chi_\Omega.$$
(7.2.8)
(7.2.9)
We let $0 < \rho < \varepsilon_0$, where $\varepsilon_0$ is given in Lemma B.1. Thus, the signed distance function $d(x)$ as defined in Appendix B is smooth on $\{x \in \mathbb{R}^d : \operatorname{dist}(x, \partial\Omega) < \rho\}$. We need the following auxiliary result:
4>n{x) : = fjy
+ nsin (7mx())} *
n
6e /ie one-dimensional analogue of F . Then there exist Lipschitz functions \n R R XnW = 0 /orKO for t> -~= \/n / o r O < K - f ,
XnW =
0< n(t)<a
X n
(7.2.10)
n+oo
fl"
We postpone the proof of Lemma 7.2.1 and proceed with the proof of the theorem. We choose a sequence
a,
Then u (x)
n n
= 0
n
for
xeR \fi
n
for X G fi \ fi for x e fi n
<a
We also note lim I f i J . = 0. Thus (cf. (7.2.8)) lim and u B) E : = {x R : d(x) = t}.
t d
(7.2.12)
|u(x) - u ( x ) | dx = 0,
n 1
(7.2.13)
We note Du (x)
n
= 0 , sin(n7n/ (z)) = 0
n
for x e R \
fi , (7.2.14)
n
by Lemma B . l .
(7.2.15)
= limsup /
l D U n { x ) l
\Dd{x)\dx
l i m s u p ^ ^Xn(*)l
2 + n s i n
2(
n 7 r X w
( )) j
t
>
rft
sup
y)<t<^=
0 (Xn)*|Et|d_i J
n
4 < -a = F(u)
This is (7.2.7). (4) I t only remains to prove Lemma 7.2.1: The idea is of course to minimize 4> {x) under the given side conditions on x- The Euler-Lagrange equations for <fi are
n n
^x"
7rnsin(7rnx) cos(7rnx),
(7.2.16)
We now construct a solution of (7.2.16) with the desired properties: w.l.o.g. a > 0 (the case a < 0 is analogous). We choose ci = in (7.2.16). We put
i
*n(0 : = / ~ I
7o n ^
^ r I +sm (n7T5) y
2
ds
A=a .
n
1
and
-Xn
(0 = ^ - +
sin>7r (*))J
Xn
(7.2.17)
(t)
= 0
+ nsin^(7rnxn(0) I ^
i
n
(7.2.17)
= 2 / io
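The constant $\frac{4}{\pi}$ in the limit functional can be made plausible numerically. The sketch below assumes the integrand $\frac1n |Du|^2 + n \sin^2(n\pi u)$ suggested by the one-dimensional functional $\phi_n$ above, so that pointwise $\frac1n p^2 + n \sin^2(n\pi s) \ge 2|p|\,|\sin(n\pi s)|$, and it takes $h_n(t) = \int_0^t |\sin(n\pi s)|\,ds$ as the auxiliary function; both are assumptions about the parts of the argument that are not fully legible here. The quadrature resolution is an arbitrary choice.

```python
import math


def h(t, n, cells=100000):
    """Midpoint rule for h_n(t) = integral from 0 to t of |sin(n * pi * s)| ds."""
    ds = t / cells
    total = 0.0
    for i in range(cells):
        total += abs(math.sin(n * math.pi * (i + 0.5) * ds))
    return total * ds


if __name__ == "__main__":
    # Cost of a transition of height a = 1 according to the lower bound 2 * h_n(a).
    print(2.0 * h(1.0, 50), 4.0 / math.pi)
    # h_n(t) is close to (2 / pi) * t for large n.
    print(h(0.37, 1000), 2.0 / math.pi * 0.37)
```

Under these assumptions, a jump of height $a$ costs approximately $2 h_n(a) \approx \frac{4}{\pi} a$ per unit of interface area, which matches the factor in $F$.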
References
L. Modica and St. Mortola, Un esempio di Γ⁻-convergenza, Boll. U.M.I. (5), 14-B (1977), 285-99.
L. Modica, The gradient theory of phase transitions and the minimal interface criterion, Arch. Rat. Mech. Anal. 98 (1987), 123-42.
Let us also quote without proof the following result of L. Modica, loc. cit., which plays an important role in the theory of phase transitions: Let $\Omega \subset \mathbb{R}^d$ be open and bounded with Lipschitz boundary, $W : \mathbb{R} \to \mathbb{R}^+$ be continuous with precisely two zeroes $\alpha, \beta$ (which then are absolute minima, because $W$ is nonnegative),
$$F_n(u) := \begin{cases} \int_\Omega \left( \frac1n |Du(x)|^2 + n W(u(x)) \right) dx & \text{if } u \in H^{1,2}(\Omega) \\ \infty & \text{otherwise} \end{cases}$$
and Fo(u) = { h I oo
2 c
H^ ll
or
W2(s)ds.
Then $F_0$ is the Γ-limit of $F_n$ w.r.t. $L^1$-convergence. The proof is similar to the one of Theorem 7.2.1, except that we cannot apply Sard's lemma anymore, because even for a smooth function $u$, $\alpha$ and $\beta$ need not be regular values. Thus, one has to consider nonsmooth level sets as well and appeal to some general results about BV-functions and sets of finite perimeter. The interpretation of Modica's theorem is the following: Consider first the problem
L
under the constraint
W(u(x))
dx min
1_2/ meas S
f a = such that Ai U A<i = ft,
(x)
= 7,
with a < 7 < (3 (w.l.o.g. assume a < (3). A minimizer then is of the form for A\ C ft for^Cfi
/ p ? n l o ( ?
- -
l 8 )
a meas A\ + /3 meas A
y
= 7 meas ft.
(7.2.19)
$u$ thus jumps from the value $\alpha$ to the value $\beta$ along $\partial A_1 \cap \Omega = \partial A_2 \cap \Omega =: \Gamma$. However, apart from the preceding relations (7.2.19), $A_1$ and $A_2$ and hence also $\Gamma$ are completely arbitrary. In particular, $\Gamma$ may be very irregular. In order to gain some control over the transition hypersurface $\Gamma$, one adds the regularizing term $\int_\Omega |Du(x)|^2$ to the functional, albeit with an arbitrarily small weight, and in fact one passes to the limit where this weight vanishes so that one preserves (7.2.18), (7.2.19). Although this regularizing term disappears in the limit it still has the effect of regularizing the hypersurface $\Gamma$ along which the transition from $\alpha$ to $\beta$ occurs. Namely, the hypersurface of discontinuity of the minimizer $u$ now is constrained by the requirement that the BV norm of $u$, $\int_\Omega \|Du\|$, be minimized. This means that $\Gamma$ is a so-called minimal hypersurface. The existence and regularity theory for such minimal hypersurfaces may be found for example in E. Giusti, Minimal Surfaces and Functions of Bounded Variation, Birkhäuser, Boston 1984, pp. 3-134.
Exercises
7.1 Try to construct bounded sets in $\mathbb{R}^d$ that do not have a finite perimeter.
7.2 Prove the preceding theorem of L. Modica for $d = 1$.
$$:= \{t \in \mathbb{R} : \exists x \in \mathbb{R}^d : Du(x) = 0,\ u(x) = t\}$$
has one-dimensional Lebesgue measure zero, and thus, for almost all $t \in \mathbb{R}$, $u^{-1}(t)$ is a smooth hypersurface by the implicit function theorem. We then have for every open $\Omega \subset \mathbb{R}^d$
$$\int_\Omega |Du(x)|\,dx = \int_{-\infty}^\infty |u^{-1}(t) \cap \Omega|_{d-1}\,dt. \qquad (A.1)$$
Proof.
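(1) We first show the result for a linear map $l : \mathbb{R}^d \to \mathbb{R}$ (w.l.o.g. $l \ne 0$). Let $\pi : \mathbb{R}^d \to \mathbb{R}$ be the projection onto the first coordinate. We may find $A \in Gl(1, \mathbb{R})$, $R \in O(d, \mathbb{R})$† with $l = A \circ \pi \circ R$. For every measurable subset $E$ of $\mathbb{R}^d$, we have by Fubini's theorem
$$|E| = \int_{-\infty}^\infty |E \cap \pi^{-1}(t)|_{d-1}\,dt,$$
where $|E| = \int \chi_E$ is the Lebesgue measure of $E$. Since $R$ is orthogonal, we likewise have
$$|E| = \int_{-\infty}^\infty |E \cap (\pi \circ R)^{-1}(t)|_{d-1}\,dt,$$
and substituting $s = At$,
$$|A|\,|E| = \int_{-\infty}^\infty |E \cap l^{-1}(s)|_{d-1}\,ds. \qquad (A.2)$$
Since $|A| = |Dl|$, and $l$ is linear, this is the coarea formula for linear maps.
† $Gl(d, \mathbb{R}) := \{A \in \mathbb{R}^{d\times d} : \det A \ne 0\}$, $O(d, \mathbb{R}) := \{A \in Gl(d, \mathbb{R}) : A^t = A^{-1}\}$.
(2) Let $S_u := \{x \in \mathbb{R}^d : Du(x) = 0\}$, and for $t \in \mathbb{R}$
$$U_t := \{x \in \mathbb{R}^d : u(x) > t\}.$$
We put
$$u_t := \begin{cases} \chi_{U_t} & \text{if } t \ge 0 \\ -\chi_{\mathbb{R}^d \setminus U_t} & \text{if } t < 0. \end{cases}$$
Then
$$u(x) = \int_{\mathbb{R}} u_t(x)\,dt.$$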
G Q ( R \ S ),
u
/
JR
D
JR<* JR
(A.3)
By definition of S and the implicit function theorem, 0 R \ S is a hypersurface of class C . Since we assume supp< C R \ S , we may apply the divergence theorem to obtain
d d u d u
/ div(p(x)dx= Ju
t
/ J(dU )nR \s
t d u
(f(x)n(x)d(dU )(x)
t
/ J (dUt)r\R<*\S
(p(x)n(x)d(dU )(x),
t u
where n(x) is the exterior normal of U . We use this in (3) (recall the definition of Ut) to obtain /
JR
d
Du(x)ip(x)dx = /
JR
d
= [ [ Jm Jdu nm \s
d t 1 d u
< / \ - (t)nR \S \ _ dt
u d 1
JR
JR
\Du{x)\dx=
[
JR \S
d u
\Du{x)\dx<
[ I ^ W L i ^ .
JR
(A.4)
(3) We now prove the reverse inequality. We let $l_n : \mathbb{R}^d \to \mathbb{R}$ be piecewise linear maps with
$$\lim_{n\to\infty} \int_{\mathbb{R}^d} |l_n - u| = 0 \qquad (A.5)$$
$$\lim_{n\to\infty} \int_{\mathbb{R}^d} |Dl_n| = \int_{\mathbb{R}^d} |Du|. \qquad (A.6)$$
Let
$$U_t^n := \{x \in \mathbb{R}^d : l_n(x) > t\}.$$
By (A.5), there exists a countable set $T_1 \subset \mathbb{R}$ with the property that for all $t \notin T_1$
$$\lim_{n\to\infty} \int_{\mathbb{R}^d} |\chi_t - \chi_t^n| = 0, \qquad (A.7)$$
where $\chi_t$ is the characteristic function of $\{u(x) > t\}$, and $\chi_t^n$ the one of $\{l_n(x) > t\}$. As noted above, by Sard's theorem and the implicit function theorem, there exists a null set $T_2 \subset \mathbb{R}$ such that for all $t \notin T_2$, $u^{-1}(t)$ is a smooth hypersurface of class $C^\infty$. We put
$$T := T_1 \cup T_2.$$
Appendix A Let t e R \ T , e > 0. By Lemma 7.1.2, there exists g CQ* with \g\ < 1 and div g(x)dx + - . We let M : = J n > n
0
(A.8)
Rd
\div g(x)\dx.
\Xt-Xt\
(A.9)
div g(x)dx j
J{ln(x)>t}
div g(x)da
X t
- ?|ds<5.
x
(A.10)
/
/
1
(*)>*}
div g(x)dx + e
J{1 = j g{x)n(x)d{d{l {x) > t}) + e, ./0{M*)>t} n(x) denoting the exterior normal of {l (x)
n n
> t}
<K (t)\ _
d
+^
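From Fatou's lemma (Theorem 1.2.2), (A.2) and (A.6), we obtain
$$\int_{-\infty}^\infty |u^{-1}(t)|_{d-1}\,dt \le \liminf_{n\to\infty} \int_{-\infty}^\infty |l_n^{-1}(t)|_{d-1}\,dt \le \liminf_{n\to\infty} \int_{\mathbb{R}^d} |Dl_n(x)|\,dx = \int_{\mathbb{R}^d} |Du(x)|\,dx. \qquad (A.11)$$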
d
# : R - R integrable, Q C R
open.
g(t)\u~\t)
DQl^dt.
(A.12)
Proof. (A.12) follows from Theorem A.1 if $g$ is the characteristic function of an open set and similarly if $g$ is the characteristic function of a measurable set. By considering
$$g^+(t) := \max(0, g(t)), \qquad g^-(t) := \max(0, -g(t))$$
separately, it suffices to consider the case where $g \ge 0$, since always $g(t) = g^+(t) - g^-(t)$. We thus assume $g \ge 0$. Let now $(\rho_n)_{n\in\mathbb{N}} \subset \mathbb{R}^+$ with
$$\lim_{n\to\infty} \rho_n = 0, \qquad \sum_{n=1}^\infty \rho_n = \infty.$$
We may then find measurable sets $A_n \subset \mathbb{R}$ with
$$g(x) = \sum_{n=1}^\infty \rho_n \chi_{A_n}(x). \qquad (A.13)$$
Since we observed that (A.12) holds for $\chi_{A_n}$ in place of $g$, the representation in (A.13) in conjunction with Beppo Levi's Theorem 1.2.1 on monotone convergence then implies (A.12) for $g$.
q.e.d.
Remark A.1. The coarea formula is due to Federer. It holds more generally for Lipschitz functions $u : \mathbb{R}^d \to \mathbb{R}$. See H. Federer, Geometric Measure Theory, Springer, New York, 1969, pp. 241-760, 268-71.
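The coarea formula can be checked numerically on a simple example. The sketch below uses the hypothetical choice $u(x, y) = x^2 + y^2$ on the open unit disc in $\mathbb{R}^2$, for which $|Du| = 2\sqrt{x^2 + y^2}$ and the level set $\{u = t\}$, $0 < t < 1$, is a circle of length $2\pi\sqrt{t}$; both sides of (A.1) then equal $\frac{4\pi}{3}$. The grid sizes are arbitrary, and the level set lengths are inserted analytically rather than measured.

```python
import math


def lhs_gradient_integral(cells=400):
    """Midpoint rule for the integral of |Du| over the unit disc, u = x^2 + y^2."""
    dx = 2.0 / cells
    total = 0.0
    for i in range(cells):
        x = -1.0 + (i + 0.5) * dx
        for j in range(cells):
            y = -1.0 + (j + 0.5) * dx
            r2 = x * x + y * y
            if r2 < 1.0:
                total += 2.0 * math.sqrt(r2)  # |Du(x, y)| = 2 * |(x, y)|
    return total * dx * dx


def rhs_level_set_integral(cells=10000):
    """Midpoint rule for the integral over t in (0, 1) of the length 2*pi*sqrt(t)."""
    dt = 1.0 / cells
    return sum(2.0 * math.pi * math.sqrt((k + 0.5) * dt) for k in range(cells)) * dt


if __name__ == "__main__":
    print(lhs_gradient_integral(), rhs_level_set_integral(), 4.0 * math.pi / 3.0)
```

The two quadratures agree up to discretization error, mainly caused by the grid cells that straddle the boundary of the disc.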
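We also need some elementary results about the (signed) distance function from a smooth hypersurface. Let $\Omega \subset \mathbb{R}^d$ be open with nonempty boundary $\partial\Omega$. We put
$$d(x) := \begin{cases} \operatorname{dist}(x, \partial\Omega) & \text{if } x \in \Omega \\ -\operatorname{dist}(x, \partial\Omega) & \text{if } x \in \mathbb{R}^d \setminus \Omega. \end{cases}$$
$d$ is Lipschitz continuous with Lipschitz constant 1. Namely, for $x, y \in \mathbb{R}^d$, we find $\pi_y \in \partial\Omega$ with $d(y) = |y - \pi_y|$, hence
$$d(x) \le |x - \pi_y| \le |x - y| + |y - \pi_y| = |x - y| + d(y),$$
and interchanging the roles of $x$ and $y$,
$$|d(x) - d(y)| \le |x - y|.$$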
We now assume that $\partial\Omega$ is of class $C^k$, $k \ge 2$. Let $x_0 \in \partial\Omega$. Let $n(x_0)$ be the outer normal vector of $\Omega$ at $x_0$, and let $T_{x_0}$ be the tangent plane of $\partial\Omega$ at $x_0$. We rotate the coordinates of $\mathbb{R}^d$ so that the $x^d$ coordinate axis is pointing in the direction of $-n(x_0)$. In some neighbourhood $U(x_0)$ of $x_0$, $\partial\Omega$ can then be represented as
$$x^d = f(x') \qquad (B.1)$$
with $x' = (x^1, \ldots, x^{d-1})$, where $f \in C^k(T_{x_0} \cap U(x_0))$, $Df(x'_0) = 0$. The Hessian $D^2 f(x'_0)$ is symmetric, and therefore, after a further rotation of coordinates, it becomes diagonalized,
$$D^2 f(x'_0) = \begin{pmatrix} \kappa_1 & & \\ & \ddots & \\ & & \kappa_{d-1} \end{pmatrix}. \qquad (B.2)$$
$\kappa_1, \ldots, \kappa_{d-1}$ are the eigenvalues of $D^2 f(x'_0)$, and they do not depend on the special position of our coordinates. They are invariants of $\partial\Omega$, and are called the principal curvatures of $\partial\Omega$ at $x_0$. The mean curvature of $\partial\Omega$ at $x_0$ is
$$H(x_0) = \frac{1}{d-1} \sum_{i=1}^{d-1} \kappa_i = \frac{1}{d-1} \Delta f(x'_0). \qquad (B.3)$$
n'(x)
^ ' ^
'
, , i= l,...,d-l
(B.4)
( l + |Z?/(x')|)'
(B.5)
) ) . I n particular
0
n'(x ) =
for t , j = 1 , . . . , d - 1.
d
(B.6)
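$$\Sigma_\eta := \{x \in \mathbb{R}^d : d(x) = \eta\}.$$
There exists $\varepsilon_0 > 0$ (depending on $\partial\Omega$) with the property that for $|\eta| \le \varepsilon_0$, $\Sigma_\eta$ is a hypersurface of class $C^k$, and
$$\lim_{\eta\to0} |\Sigma_\eta|_{d-1} = |\Sigma_0|_{d-1}. \qquad (B.7)$$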
Proof Since d f i is compact and of class C , there exists e > 0 with the following property: Whenever \rj\ < e for each x$ d f i , there exist two unique open balls B B with B C fi, JB C R \ f i ,
d u 2 x 2
Bi
n an = x = B n an
0 2 2
of a normalized
(B.8)
then #o K . Also, by uniqueness, these balls depend continuously on #o dft. Thus, i f \rj\ < e, each x is the centre of such a ball, and n
x
= x 4- n(x)d(x)
with n(x) : =
x
n(n )
x
(B.9)
is the unique point in dfl with |x 7r | = We once again employ the coordinates used for the definition of / and rewrite (B.9) as x = F(x', d) = {x\ f(x')) Then F G C
f c _ 1
- n(x', f{x'))d.
d 0
(((T
Xo
( 1 - Kid(x)
DF
K -\d(x)
d
by (6) .
(B.ll)
1/
By (B.8) and since \q\ < e, det D F ^ 0. By the inverse function theorem, x' and d therefore locally are C functions of x (cf. (B.9)). Since d(x) = d(x we have Dd(x) - U(XQ) = 1. Since d is Lipschitz w i t h Lipschitz constant 1, we conclude \Dd(x)\ and Dd(x)
k 0 k x
~ rjn(xo)) = 77,
= 1
=-n(x )
0
C*- .
k v
Thus d e C locally, and the level hypersurfaces Y> are of class C . For (B.7), we may w.l.o.g. take n > 0 as the case n < 0 succumbs to the same reasoning. We consider the vector field V(x) = The Gauss theorem yields /
J{0<d(x)<rj}
Dd(x).
divV(x)= /
JHo
K(x)n(x)d{E }(x)+/
0
>
where is the normal vector of pointing in the direction opposite to n. Since the measure of { 0 < d(x) < 77} goes to zero w i t h 7 and 7 V{x) = -n(x) V(x) = n^x) (B.7) easily follows. q.e.d. for x G E = dQ
0
for x G 7 ? ,
References
D. Gilbarg, N. Trudinger, Elliptic Partial Differential Equations, Springer, Berlin, 2nd edition, 1983, pp. 354-6.
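The two properties of the signed distance function used in this appendix, Lipschitz constant 1 and continuity of the area of the level sets near the boundary, can be illustrated on a disc. The following sketch takes $\Omega$ to be the disc of radius `R` in $\mathbb{R}^2$ as a hypothetical example, with the sign convention that $d$ is positive inside $\Omega$; the sample points are random and purely illustrative.

```python
import math
import random

R = 1.0  # radius of the hypothetical domain (a disc centred at the origin)


def signed_distance(p):
    """Signed distance to the circle of radius R, positive inside the disc."""
    return R - math.hypot(p[0], p[1])


def level_set_length(eta):
    """Length of {d = eta}: a circle of radius R - eta (for eta < R)."""
    return 2.0 * math.pi * (R - eta)


def lipschitz_violations(pairs, tol=1e-12):
    """Count pairs (p, q) with |d(p) - d(q)| > |p - q| (up to rounding)."""
    count = 0
    for p, q in pairs:
        dist = math.hypot(p[0] - q[0], p[1] - q[1])
        if abs(signed_distance(p) - signed_distance(q)) > dist + tol:
            count += 1
    return count


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(2000)]
    print(lipschitz_violations(list(zip(pts[::2], pts[1::2]))))
    print(level_set_length(0.01), level_set_length(0.0))
```

For the disc, the level sets $\{d = \eta\}$ are concentric circles whose length $2\pi(R - \eta)$ tends to the length of the boundary as $\eta \to 0$, in line with (B.7).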
8 Bifurcation theory
8.1 Bifurcation problems in the calculus of variations
We wish to consider a variational problem depending on a parameter $\lambda$, and to investigate how the space of solutions depends on this parameter. We thus consider
$$I(u, \lambda) := \int_a^b F(t, u(t), \dot u(t), \lambda)\,dt.$$
$\lambda$ is supposed to vary in some open set $\Lambda \subset \mathbb{R}^l$. Often, one has $l = 1$. We assume that
$$F : [a, b] \times \mathbb{R}^d \times \mathbb{R}^d \times \Lambda \to \mathbb{R}$$
is sufficiently often differentiable so that all derivatives taken in the sequel exist. For that purpose, one may simply assume that $F$ is of class $C^\infty$ in all its arguments although that is a little stronger than needed in the sequel.
Remark 8.1.1. One may also impose boundary conditions depending on $\lambda$, i.e.
$$u(a) = u_1(\lambda), \qquad u(b) = u_2(\lambda),$$
and finally, one may vary the boundary points themselves,
$$a = a(\lambda), \qquad b = b(\lambda).$$
This latter variation, however, can formally be incorporated in the variation of $F$, by transforming the integral.
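Let
$$\tau(\cdot, \lambda) : [a(\lambda_0), b(\lambda_0)] \to [a(\lambda), b(\lambda)]$$
be a bijective linear map, for some fixed $\lambda_0$. Then, with $v(t) := u(\tau(t, \lambda))$,
$$\int_{a(\lambda)}^{b(\lambda)} F(\tau, u(\tau), \dot u(\tau))\,d\tau = \int_{a(\lambda_0)}^{b(\lambda_0)} F\left( \tau(t, \lambda), v(t), \dot v(t) \left( \frac{\partial\tau(t, \lambda)}{\partial t} \right)^{-1} \right) \frac{\partial\tau(t, \lambda)}{\partial t}\,dt$$
yields a parameter-dependent variational integral for $v$ with fixed boundary points $a(\lambda_0)$, $b(\lambda_0)$. As established in Theorem 1.1.1 of Part I, a critical point $u$ of $I(\cdot, \lambda)$ of class $C^2$ satisfies the Euler-Lagrange equations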
2
F (t,
pp
(8.1.1)
In the light of Theorems 1.2.2 and 1.2.4 and Lemma 1.3.1 of Part I , we shall assume det F (t,u(t),u(t),X)
pp
^0
(8.1.3)
for all functions u occurring in the sequel. Equation (8.1.3) implies that (8.1.1) can be solved for u in terms of u and i.e. ii = -Fpp(t, ti(t), ii(t), A ) " {F (t,
pu 1
u(t), u(t),
u
X)u(t) (8.1.4)
+F (t,
pt
268
Bifurcation
theory
The topic of bifurcation theory then is to study the space of solutions of (8.1.5) in its dependence on the parameter $\lambda$. Before approaching this problem from a general point of view in the next section, we should briefly comment on the relations with the Jacobi theory introduced in Section 1.3 of Part I. For a critical point $u$ of $I(\cdot, \lambda)$ and $\eta \in D_0^1(I, \mathbb{R}^d)$, we had established the expansion
d
f J
r/, A) = Q (n)
x
:= ^
p p
A)cft,
a==0
= / 4abbreviated as
{F i j{t,u,u,X)rji7jj
u uj
(8.1.7)
F i (t,u,u,\)r)ir)j}dt,
b
f
Jx(u)n:=
{ F , r ) r ) 4 2F p 'nV + F uVV}
A pp Xy U XyU
dt.
+ F (t,u,u,
pu uu
\)rj)
(8.1.8)
-F (t,
pu
u, u, X)rj = 0.
$J_\lambda(u)$ is called the Jacobi operator associated with the critical point $u$ of $I(\cdot, \lambda)$. We also observe that
$$J_\lambda(u)\eta = \frac{d}{ds} L_\lambda(u + s\eta)\Big|_{s=0}. \qquad (8.1.9)$$
Of course, this is not surprising since $L_\lambda$ represents the first variation of $I(\cdot, \lambda)$ and $J_\lambda$ the second one. From the expansion (8.1.6) we see that
$$I(u + s\eta, \lambda) < I(u, \lambda) \qquad (8.1.10)$$
if
$$\delta^2 I(u, \eta, \lambda) < 0. \qquad (8.1.11)$$
Now by Lemma 1.3.2 of Part I, for a Jacobi field $\eta$ that vanishes at the boundary points $a$ and $b$, (8.1.11) holds. This indicates that Jacobi fields play a decisive role for deciding about the minimizing property of a critical point $u$ of $I(\cdot, \lambda)$. Jacobi fields satisfy
$$J_\lambda(u)\eta = 0, \qquad (8.1.12)$$
i.e. are solutions of the linearization of the equation $L_\lambda u = 0$ satisfied by $u$. This also indicates that Jacobi fields will play a decisive role in analysing the bifurcation behaviour of $L_\lambda u = 0$ as $\lambda$ varies. Namely, in finite dimensional problems, the presence of a nontrivial solution of the linearization of a parameter-dependent equation $L_\lambda u = 0$ at some parameter value $\lambda_0$ either results from a nontrivial family $u(\tau)$ of solutions of $L_{\lambda_0} u(\tau) = 0$ by differentiating the equation w.r.t. the parameter $\tau$, or it indicates a nontrivial bifurcation as $\lambda$ varies in the vicinity of $\lambda_0$. In the next section, we shall see that under appropriate assumptions, the same also holds in the present infinite dimensional context. In fact, the bifurcation problem will be reduced to a finite dimensional one via Lyapunov-Schmidt reduction. The reason why this is possible in our variational context is that under our assumption (8.1.3), the space of Jacobi fields is always finite dimensional. Namely, analogously to (8.1.4), (8.1.5), the assumption (8.1.3) implies that (8.1.8) can be solved w.r.t. $\ddot\eta$, i.e.
$$\ddot\eta - \psi(t, u, \dot u, \eta, \dot\eta, \lambda) = 0. \qquad (8.1.13)$$
(Although this is not indicated by the notation, (8.1.13) is a linear equation for $\eta$, and so the space of solutions is a linear space.) Now suppose that we have a sequence $(\eta_n)_{n\in\mathbb{N}}$ of solutions of (8.1.13) (for fixed $\lambda$) that are bounded in some appropriate function space like $C^2(I)$ or $W^{2,2}(I)$. For concreteness, let us consider $C^2(I)$, i.e. for example
$$\|\eta_n\|_{C^2(I)} \le 1.$$
By the Arzelà-Ascoli theorem, after selection of a subsequence, $(\eta_n)_{n\in\mathbb{N}}$ then converges in $C^1(I)$ to some limit denoted by $\eta_0$. (8.1.13) then implies that $(\ddot\eta_n)_{n\in\mathbb{N}}$ converges in $C^0(I)$ (as it follows from our assumptions on the differentiability of $F$ that $\psi$ is smooth, in particular continuous). Thus (since the uniform limit of derivatives is the derivative of the limit), $(\eta_n)_{n\in\mathbb{N}}$ converges in $C^2(I)$ to $\eta_0$, and consequently $\eta_0$ also solves (8.1.13). From this compactness result, one easily deduces that the space of solutions of (8.1.13) has finite dimension.
270
Bifurcation
theory
8.2 The functional analytic approach to bifurcation theory

We consider the following general situation. We have Banach spaces $V$, $W$, and a parameter space $\Lambda$. We assume that $\Lambda$ is an open subset of some Banach space. We consider a parameter dependent family of equations

$$L_\lambda u = 0, \tag{8.2.1}$$

with

$$L : V \times \Lambda \to W, \qquad (u,\lambda) \mapsto L_\lambda u.$$

We assume that $L_\lambda u$ is sufficiently often differentiable w.r.t. $u$ and $\lambda$ so that all subsequent expansions are valid. The aim of bifurcation theory is to study the set of solutions $u$ of (8.2.1) as $\lambda$ varies, to identify the bifurcation values of $\lambda$, i.e. those values of $\lambda$ where the structure of the solution set changes, and to investigate that structure at such bifurcation points. In order to arrive at concrete results, we need an additional assumption. We consider the derivative of $L_\lambda u$ w.r.t. $u$,

$$J_\lambda(u)v := (D_u L_\lambda(u))v := \frac{d}{dt} L_\lambda(u+tv)\Big|_{t=0} \tag{8.2.2}$$

for $v \in V$. We assume that $J_\lambda$ is a Fredholm operator of index 0, i.e. that $\ker J_\lambda$ and $\operatorname{coker} J_\lambda$ are of finite and equal dimension, and furthermore that there exists a canonical isomorphism

$$\ker J_\lambda \cong \operatorname{coker} J_\lambda. \tag{8.2.3}$$
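We first consider a solution $u_0$ at the parameter value $\lambda_0$, i.e.

$$L_{\lambda_0}u_0 = 0, \tag{8.2.4}$$

for which the linearization has trivial kernel,

$$\ker J_{\lambda_0}(u_0) = \{0\}. \tag{8.2.5}$$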
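We shall see that in this case, no bifurcation can occur at $\lambda_0$. Namely, we have:

Theorem 8.2.1. Let $L_{\lambda_0}u_0 = 0$ for some $\lambda_0 \in \Lambda$, $u_0 \in V$, $\ker J_{\lambda_0}(u_0) = \{0\}$. Then there exist neighbourhoods $U(\lambda_0)$ of $\lambda_0$ in $\Lambda$ and $V(u_0)$ of $u_0$ in $V$ such that for all $\lambda \in U(\lambda_0)$, there exists a unique $u \in V(u_0)$ with

$$L_\lambda u = 0.$$

Proof. Since $J_{\lambda_0}(u_0)$ is a Fredholm operator of index 0, $\ker J_{\lambda_0}(u_0) = \{0\}$ implies that

$$J_{\lambda_0}(u_0) : V \to W$$

is an isomorphism. Thus the derivative w.r.t. the variable $u$ of the map

$$V \times \Lambda \to W, \qquad (u,\lambda) \mapsto L_\lambda u$$

is an isomorphism at $(u_0,\lambda_0)$, and the implicit function Theorem 2.4.1 implies that the equation $L_\lambda u = 0$ can be locally resolved w.r.t. $u$, i.e. there exist neighbourhoods $U(\lambda_0)$, $V(u_0)$ and a map

$$U(\lambda_0) \to V(u_0), \qquad \lambda \mapsto u_\lambda, \qquad \text{with } L_\lambda u_\lambda = 0.$$

q.e.d.

We now turn to the case where

$$L_{\lambda_0}u_0 = 0 \tag{8.2.6}$$

and

$$K := \ker J_{\lambda_0}(u_0) \ \text{ is one-dimensional.} \tag{8.2.7}$$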
The assumption that this kernel is one-dimensional may look restrictive, but it is typically satisfied in variational problems, and in this situation, we can already see the typical phenomena of bifurcation while avoiding additional technical complications that arise for higher dimensional kernels. In the sequel, we shall assume for simplicity

$$u_0 = 0$$

(which can always be achieved by changing the dependent variables in our equation by a translation). In the sequel, we shall also usually write $J_{\lambda_0}$ in place of $J_{\lambda_0}(u_0) = J_{\lambda_0}(0)$. We may write

$$V = K \oplus V_1 \tag{8.2.8}$$

and in view of (8.2.3), we may also write

$$W = K \oplus W_1 \tag{8.2.9}$$

with

$$W_1 = J_{\lambda_0}(V) = J_{\lambda_0}(V_1). \tag{8.2.10}$$

We denote by

$$\pi : V \to K$$

the projection onto $K$ according to (8.2.8), and we consider $\pi(V)$ as a subspace of $W$, according to (8.2.9). Thus, if $u = \xi + w$ with $\xi \in K$, $w \in V_1$, then $\pi(u) = \xi$. In particular,

$$\pi(0) = 0.$$

Lemma 8.2.1. The map

$$A_{\lambda_0} : V \to W, \qquad u \mapsto L_{\lambda_0}u + \pi(u)$$

is a local diffeomorphism near 0, i.e. the derivative

$$DA_{\lambda_0}(0)v = J_{\lambda_0}v + \pi(v) \quad \text{for } v \in V \tag{8.2.11}$$

is an isomorphism.

Proof. The Fredholm operator $J_{\lambda_0}$ yields a bijective continuous linear map between $V_1$ and $W_1$ because of the decompositions (8.2.8), (8.2.9), (8.2.10), and its inverse is likewise continuous (by Definition 2.3.1). From the definition of $K$ and $\pi$ and (8.2.3) we then conclude that $DA_{\lambda_0}(0)$ is an isomorphism. q.e.d.
We now consider the map

$$A : V \times \Lambda \to W, \qquad (u,\lambda) \mapsto A_\lambda(u) := L_\lambda(u) + \pi(u).$$

By Lemma 2.3.4, there exists a neighbourhood $V(\lambda_0)$ of $\lambda_0$ in $\Lambda$ such that for all $\lambda \in V(\lambda_0)$, $A_\lambda$ is a local diffeomorphism near 0. We may therefore apply the implicit function Theorem 2.4.1. Consequently, as

$$A(0,\lambda_0) = 0, \tag{8.2.12}$$

there exist neighbourhoods $U(0)$ of $u = 0$ in $V$, $U_1(0)$ of 0 in $W$ such that for all $\lambda \in V(\lambda_0)$ and $\xi \in U_1(0)$, there exists a unique $u \in U(0)$ with

$$A(u,\lambda) = \xi, \tag{8.2.13}$$

i.e.

$$L_\lambda u + \pi(u) = \xi. \tag{8.2.14}$$

We denote this solution by $u = u(\xi,\lambda)$. In particular,

$$u(0,\lambda_0) = 0 \tag{8.2.15}$$

since $L_{\lambda_0}0 = 0$. If

$$\pi(u(\xi,\lambda)) = \xi, \tag{8.2.16}$$

then (8.2.14) becomes

$$L_\lambda u(\xi,\lambda) = 0, \tag{8.2.17}$$

which is the equation that we wish to solve. Since the image of $\pi$ is assumed to be one-dimensional (and in any case finite dimensional as $J_{\lambda_0}$ is supposed to be a Fredholm operator), we have reduced our bifurcation problem to a finite dimensional problem. In the sequel, we shall thus let $\xi$ vary only in $K$, the image of $\pi$. Thus, we may consider $\xi$ as a scalar quantity, $\xi = \alpha\xi_0$, with $\alpha \in \mathbb{R}$, where $\xi_0$ is a generator of $K$. We denote the derivatives of $u(\alpha\xi_0,\lambda)$ w.r.t. $\alpha$ and $\lambda$ at $\alpha = 0$, $\lambda = \lambda_0$ by $\partial_\alpha u$ and $\partial_\lambda u$, respectively.
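(Note that $\lambda$ in general is not a scalar quantity, as we do not assume that $\Lambda$ is one-dimensional.) Differentiating (8.2.14) w.r.t. $\alpha$ yields

$$J_{\lambda_0}\partial_\alpha u + \pi(\partial_\alpha u) = \xi_0. \tag{8.2.18}$$

Since $\xi_0 \in K$, also

$$J_{\lambda_0}\xi_0 + \pi(\xi_0) = \xi_0. \tag{8.2.19}$$

Since $J_{\lambda_0} + \pi$ is an isomorphism, (8.2.18) and (8.2.19) imply

$$\partial_\alpha u = \xi_0. \tag{8.2.20}$$

We are now ready for the essential point, namely the asymptotic expansion of the equation (8.2.16), i.e.

$$\pi(u(\alpha\xi_0,\lambda)) = \alpha\xi_0 \tag{8.2.21}$$

near $\alpha = 0$, $\lambda = \lambda_0$. We let $\partial_\alpha^2 u$, $\partial_\lambda^2 u$ be the second derivatives of $u(\alpha\xi_0,\lambda)$ w.r.t. $\alpha$ and $\lambda$, respectively, at $\alpha = 0$, $\lambda = \lambda_0$, and likewise $\partial^2_{\alpha\lambda}u$ be the mixed second derivative w.r.t. $\alpha$ and $\lambda$. Higher derivatives will be denoted similarly by corresponding symbols. Writing $\lambda = \lambda_0 + \mu$, the Taylor expansion of (8.2.16) then is

$$\alpha\xi_0 = \pi(0) + \pi(\partial_\alpha u)\alpha + \pi(\partial_\lambda u)\mu + \tfrac12\pi(\partial_\alpha^2 u)\alpha^2 + \pi(\partial^2_{\alpha\lambda}u)\alpha\mu + \tfrac12\pi(\partial_\lambda^2 u)\mu^2 \tag{8.2.22}$$

+ terms of higher order in $\alpha$ and $\mu$. Since $\pi(0) = 0$ and since, by (8.2.20), $\partial_\alpha u = \xi_0$, hence $\pi(\partial_\alpha u)\alpha = \alpha\xi_0$, we may write (8.2.22) as

$$0 = \pi(\partial_\lambda u)\mu + \tfrac12\pi(\partial_\alpha^2 u)\alpha^2 + \pi(\partial^2_{\alpha\lambda}u)\alpha\mu + \tfrac12\pi(\partial_\lambda^2 u)\mu^2 + \dots \tag{8.2.23}$$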
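Remark 8.2.1. In order to interpret the terms in this expansion, we differentiate (8.2.14), i.e.

$$L_\lambda u(\alpha\xi_0,\lambda) + \pi(u(\alpha\xi_0,\lambda)) = \alpha\xi_0, \tag{8.2.24}$$

w.r.t. $\alpha$ and obtain

$$J_\lambda(u)\partial_\alpha u + \pi(\partial_\alpha u) = \xi_0, \tag{8.2.25}$$

and differentiating once more,

$$DJ_\lambda(u)(\partial_\alpha u,\partial_\alpha u) + J_\lambda(u)\partial_\alpha^2 u + \pi(\partial_\alpha^2 u) = 0. \tag{8.2.26}$$

We put $\lambda = \lambda_0$ and project onto $K$ in the decomposition (8.2.9). We may also denote that projection by $\pi$, and we then have $\pi \circ J_{\lambda_0} = 0$. Also, from (8.2.20), $\partial_\alpha u = \xi_0$, and so we get

$$\pi(\partial_\alpha^2 u) = -\pi\big(DJ_{\lambda_0}(\xi_0,\xi_0)\big). \tag{8.2.27}$$

Thus, the coefficient $\pi(\partial_\alpha^2 u)$ in the expansion (8.2.23) can be expressed via $DJ_{\lambda_0}$. In a variational context, $J_\lambda$ represents the second variation, and so $DJ_\lambda$ represents the third variation of the variational integral. Likewise, if $\partial_\alpha^2 u$ vanishes, i.e. if the third variation vanishes on the Jacobi field $\xi_0$, then $\pi(\partial_\alpha^3 u)$ can be expressed by the fourth variation, and so on.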
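We now discuss the simplest case of a bifurcation, namely where

$$\pi(\partial_\alpha^2 u) \neq 0. \tag{8.2.28}$$

We put $\alpha = t\tau$, $\mu = t^2\tilde\mu$, and rewrite (8.2.23) as

$$0 = a_0 t^2 + a_1 t^2\tau^2 + t^2\,\Sigma(t,\tau,\tilde\mu) \tag{8.2.29}$$

with $a_0 := \pi(\partial_\lambda u)\tilde\mu$, $a_1 := \tfrac12\pi(\partial_\alpha^2 u)$, and a remainder term $\Sigma$ satisfying

$$\lim_{t\to 0}\Sigma(t,\tau,\tilde\mu) = 0. \tag{8.2.30}$$

For $t \neq 0$, (8.2.29) is equivalent to

$$0 = a_0 + a_1\tau^2 + \Sigma(t,\tau,\tilde\mu). \tag{8.2.31}$$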
We shall now see by a simple application of the implicit function theorem that the bifurcation behaviour of equation (8.2.31) is equivalent to the one of

$$0 = a_0 + a_1\tau^2. \tag{8.2.32}$$

We assume $a_0 \neq 0$; as will be discussed below (see Lemma 8.2.2), this can be derived from a suitable assumption about the variation of $L_\lambda$ as a function of $\lambda$. (8.2.28) of course means that $a_1 \neq 0$. If $\frac{a_0}{a_1} > 0$, then there is no solution $\tau$ of (8.2.32), whereas for $\frac{a_0}{a_1} < 0$, we have two solutions $\tau_1, \tau_2$. We keep $\tilde\mu$ fixed for the moment and write (8.2.31) as

$$0 = a_0 + a_1\tau^2 + \Sigma(t,\tau,\tilde\mu) =: \Phi(t,\tau). \tag{8.2.33}$$

In the case $\frac{a_0}{a_1} < 0$, we consider the two solutions $\tau_1, \tau_2$ of (8.2.32). As

$$\Phi(0,\tau_i) = 0 \tag{8.2.34}$$

and

$$\frac{\partial\Phi}{\partial\tau}(0,\tau_i) = 2a_1\tau_i \neq 0, \tag{8.2.35}$$

the implicit function theorem then implies the existence of (locally unique) functions

$$\tau_i(t) \in \mathbb{R} \quad \text{with } \tau_i(0) = \tau_i \text{ and } \Phi(t,\tau_i(t)) = 0 \quad \text{for } i = 1,2, \tag{8.2.36}$$

defined for sufficiently small $|t|$. We have thus found two solutions $\tau_1(t), \tau_2(t)$ of (8.2.33), hence (8.2.22), hence (8.2.16), hence (8.2.17), i.e. (8.2.1) for $t \neq 0$, for the parameters

$$\lambda_t = \lambda_0 + t^2\tilde\mu. \tag{8.2.37}$$
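The case distinction for the model equation (8.2.32) can be checked numerically. The following is only a minimal sketch: the function names are ours, and the remainder term $t\tau^3$ used in `continue_root` is a hypothetical stand-in for $\Sigma(t,\tau,\tilde\mu)$, chosen merely because it tends to 0 with $t$. It counts the real solutions of $a_0 + a_1\tau^2 = 0$ and then follows one of them to small $t \neq 0$ by Newton's method, mimicking the implicit function argument.

```python
import math


def quadratic_model_roots(a0, a1):
    """Real solutions tau of a0 + a1*tau**2 = 0, assuming a0 != 0 and a1 != 0."""
    if a0 == 0 or a1 == 0:
        raise ValueError("the discussion assumes a0 != 0 and a1 != 0")
    ratio = a0 / a1
    if ratio > 0:
        return []  # no real solution
    root = math.sqrt(-ratio)
    return [-root, root]  # two solutions tau_1, tau_2


def perturbed(a0, a1, t, tau):
    """Phi(t, tau) with the hypothetical remainder Sigma = t * tau**3."""
    return a0 + a1 * tau**2 + t * tau**3


def continue_root(a0, a1, tau_start, t, steps=30):
    """Newton iteration in tau for Phi(t, tau) = 0, started at a root of the model equation."""
    tau = tau_start
    for _ in range(steps):
        dphi = 2.0 * a1 * tau + 3.0 * t * tau**2  # dPhi/dtau, nonzero near tau_start
        tau -= perturbed(a0, a1, t, tau) / dphi
    return tau
```

For instance, `quadratic_model_roots(1.0, 2.0)` is empty, while `quadratic_model_roots(-2.0, 0.5)` returns the two roots $\pm 2$, each of which persists under the small perturbation.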
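In the other case, $\frac{a_0}{a_1} > 0$, (8.2.30) implies that for sufficiently small $|t| \neq 0$, there is no solution of (8.2.33), i.e. of (8.2.1). Thus, as promised, the bifurcation behaviour in case $\pi(\partial_\alpha^2 u) \neq 0$ ((8.2.28)) is completely described by the simple quadratic equation (8.2.32). Of course, replacing $\tilde\mu$ by $-\tilde\mu$ changes the sign of $a_0$ and thus interchanges the cases $\frac{a_0}{a_1} > 0$ and $\frac{a_0}{a_1} < 0$. We summarize our result in:

Theorem 8.2.2. We consider a parameter dependent family of equations

$$L_\lambda u = 0 \tag{8.2.38}$$

as above,

$$L : V \times \Lambda \to W, \qquad (u,\lambda) \mapsto L_\lambda u,$$

where $V$, $W$ are Banach spaces and $\Lambda$ is an open subset of some Banach space, and $L_\lambda u$ is smooth in $u$ and $\lambda$. We suppose that

$$L_{\lambda_0}0 = 0,$$

and that

$$J_\lambda(u)v = \frac{d}{dt}L_\lambda(u+tv)\Big|_{t=0}$$

is a Fredholm operator of index 0 with

$$\ker J_{\lambda_0} \cong \operatorname{coker} J_{\lambda_0} \qquad (J_{\lambda_0} = J_{\lambda_0}(0)). \tag{8.2.39}$$

We assume furthermore

$$\dim\ker J_{\lambda_0} = 1 \quad \text{(see (8.2.7))}, \tag{8.2.40}$$

$$a_0 := \pi(\partial_\lambda u)\tilde\mu \ \Big(= \frac{d}{dt}\pi\big(u(0,\lambda_0+t\tilde\mu)\big)\Big|_{t=0}\Big) \neq 0 \quad \text{for some } \tilde\mu, \tag{8.2.41}$$

$$a_1 := \tfrac12\pi(\partial_\alpha^2 u) \neq 0 \tag{8.2.42}$$

(nonvanishing of the third variation, see Remark 8.2.1). Then there exist $\epsilon > 0$ and a variation $\lambda_t = \lambda_0 + t^2\tilde\mu$ of $\lambda_0$ with the property that for $0 < t < \epsilon$, there exists a neighbourhood $U_t$ of 0 in $V$ such that the number of solutions $u \in U_t$ of

$$L_{\lambda_t}u = 0 \tag{8.2.43}$$

equals the number of solutions $\tau$ of

$$a_0 + a_1\tau^2 = 0. \tag{8.2.44}$$

q.e.d.

Remark 8.2.2. Since $\ker J_{\lambda_0}$, the image of $\pi$, is assumed to be one-dimensional, we have simply considered $\pi(\partial_\alpha^2 u)$, $\pi(\partial_\lambda u)\tilde\mu$ as scalar quantities.

We now turn to the case where

$$\pi(\partial_\alpha^2 u) = 0, \tag{8.2.45}$$

but $\pi(\partial_\alpha^3 u) \neq 0$.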
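For a complete description of the bifurcation behaviour, this time we need to consider a two parameter variation. We assume that there exist $\mu_1, \mu_2$ with

$$\pi(\partial_\lambda u)\mu_1 \neq 0, \tag{8.2.48}$$

$$\pi(\partial_\lambda u)\mu_2 = 0, \qquad \pi(\partial^2_{\alpha\lambda}u)\mu_2 \neq 0. \tag{8.2.49}$$

We put $\alpha := t\tau$, $\mu = t^3 b_1\mu_1 + t^2 b_2\mu_2$, with parameters $b_1, b_2$, and rewrite (8.2.47) as

$$0 = t^3\Big(\pi(\partial_\lambda u)\mu_1 b_1 + \pi(\partial^2_{\alpha\lambda}u)\mu_2 b_2\,\tau + \tfrac16\pi(\partial_\alpha^3 u)\tau^3 + \tilde\Sigma(t,\tau,\mu_1,\mu_2)\Big) =: c\,t^3\big(a_0 + a_1\tau + \tau^3 + \Sigma(t,\tau,\mu_1,\mu_2)\big), \tag{8.2.50}$$

with $c = \tfrac16\pi(\partial_\alpha^3 u)$. For $t \neq 0$, this is equivalent to

$$0 = a_0 + a_1\tau + \tau^3 + \Sigma(t,\tau,\mu_1,\mu_2). \tag{8.2.51}$$

Again

$$\lim_{t\to 0}\Sigma(t,\tau,\mu_1,\mu_2) = 0. \tag{8.2.52}$$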
As before, we may thus invoke the implicit function theorem to conclude that the qualitative description of the bifurcation behaviour is furnished by the solution structure of the cubic equation

$$0 = a_0 + a_1\tau + \tau^3. \tag{8.2.53}$$
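How the number of real solutions of the cubic (8.2.53) depends on the two parameters $a_0, a_1$ can be read off from its discriminant. The sketch below uses the standard discriminant $-4a_1^3 - 27a_0^2$ of a depressed cubic; the function name is our own choice, and the code is meant as an illustration rather than as part of the argument.

```python
def count_cubic_roots(a0, a1):
    """Number of distinct real solutions tau of a0 + a1*tau + tau**3 = 0."""
    disc = -4.0 * a1**3 - 27.0 * a0**2
    if disc > 0:
        return 3  # three distinct real roots
    if disc < 0:
        return 1  # one real root and a pair of complex conjugate roots
    # disc == 0: a multiple root (triple root only for a0 = a1 = 0)
    return 1 if a0 == 0 and a1 == 0 else 2
```

The parameter values with vanishing discriminant, $4a_1^3 + 27a_0^2 = 0$, form the cusp curve in the $(a_0,a_1)$-plane across which the number of solutions changes between one and three.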
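In particular, locally there exist at most three solutions. We summarize our result in:

Theorem 8.2.3. As in Theorem 8.2.2, assume the general conditions (8.2.38)-(8.2.40). Furthermore, assume $\pi(\partial_\alpha^2 u) = 0$, $\pi(\partial_\alpha^3 u) \neq 0$, with parameter variations $\mu_1, \mu_2$ satisfying

$$\pi(\partial_\lambda u)\mu_1 \neq 0, \tag{8.2.54}$$

$$\pi(\partial_\lambda u)\mu_2 = 0, \qquad \pi(\partial^2_{\alpha\lambda}u)\mu_2 \neq 0 \tag{8.2.55}$$

(see (8.2.48), (8.2.49)). Then there exist $\epsilon > 0$ and a variation

$$\lambda_t = \lambda_0 + t^3 b_1\mu_1 + t^2 b_2\mu_2 \tag{8.2.56}$$

such that for $0 < t < \epsilon$, there exists a neighbourhood $U_t$ of 0 in $V$ for which the number of solutions $u \in U_t$ of

$$L_{\lambda_t}u = 0 \tag{8.2.57}$$

equals the number of solutions $\tau$ of

$$a_0 + a_1\tau + \tau^3 = 0. \tag{8.2.58}$$

(See Remark 8.2.2 again.)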
What we are seeing in Theorem 8.2.3 is the so-called cusp catastrophe (in the language of R. Thom's theory of catastrophes), the bifurcation of the zero set of a cubic polynomial depending on the parameters $a_0, a_1$. In the same manner, one may also identify conditions where the bifurcation behaviour is described by other so-called elementary catastrophes, as classified by R. Thom (see e.g. Th. Bröcker, Differentiable Germs and Catastrophes, LMS Lect. Notes 17, Cambridge Univ. Press, Cambridge, 1975). The higher the order of the polynomial involved, however, the more independent parameters one needs. The general idea is that the singular behaviour at a bifurcation point, in particular the nonsmooth structure of the solution set at such a point, is simply the result of the projection of a smooth hypersurface in the product of the solution space and the parameter space onto the solution space. The singularity arises because that hypersurface happens to have a vertical tangent plane over the solution space at the bifurcation point.

In order to discuss the assumption (8.2.41), (8.2.54), we provide

Lemma 8.2.2. Assume that for every $\beta \in \mathbb{R}$, there exists some $\mu$ with

$$\beta = \pi\big(D_\lambda L_{\lambda_0}u(0,\lambda_0)\,\mu\big) \quad \Big(:= \pi\Big(\frac{d}{dt}L_{\lambda_0+t\mu}\,u(0,\lambda_0)\Big)\Big|_{t=0}\Big). \tag{8.2.59}$$

(Again, we write $\beta$ in place of $\beta\xi_0$ and consider it as a scalar quantity, as the image of $\pi$ is assumed to be one-dimensional.) Then for every $\beta \in \mathbb{R}$, there exists some $\mu$ with

$$\pi((\partial_\lambda u)\mu) = \beta. \tag{8.2.60}$$

Proof. By (8.2.14), with $\lambda_t = \lambda_0 + t\mu$,

$$L_{\lambda_t}u(0,\lambda_t) + \pi(u(0,\lambda_t)) = 0.$$

Differentiating w.r.t. $t$ at $t = 0$ gives

$$0 = \frac{d}{dt}\big(L_{\lambda_t}u(0,\lambda_0)\big)\Big|_{t=0} + D_uL_{\lambda_0}\frac{d}{dt}u(0,\lambda_t)\Big|_{t=0} + \pi\Big(\frac{d}{dt}u(0,\lambda_t)\Big|_{t=0}\Big).$$

Since $D_uL_{\lambda_0} = J_{\lambda_0}$, and $\pi \circ J_{\lambda_0} = 0$ by definition of $\pi$, applying $\pi$ to both sides of the preceding equation gives

$$\pi((\partial_\lambda u)\mu) = -\pi\big(D_\lambda L_{\lambda_0}u(0,\lambda_0)\,\mu\big),$$
and by assumption (8.2.59), we may find $\mu$ for which the right-hand side becomes $\beta\xi_0$. (We take $-\beta$ in place of $\beta$ in (8.2.59).) q.e.d.

The approach to bifurcation theory presented here originated with L. Lichtenstein, Untersuchung über zweidimensionale reguläre Variationsprobleme, Monatsh. Math. Phys. 28 (1917), and was developed in X. Li-Jost, Eindeutigkeit und Verzweigung von Minimalflächen, Thesis, Bonn, 1991, see also X. Li-Jost, Bifurcation near solutions of variational problems with degenerate second variation, Manuscr. Math. 86 (1995), 1-14, J. Jost, X. Li-Jost, X. W. Peng, Bifurcation of minimal surfaces in Riemannian manifolds, Trans. AMS 347 (1995), 51-62, Correction ibid. 349 (1997), 4689-90.

The reduction of a bifurcation problem in an infinite dimensional setting to a finite dimensional one is an example of the Lyapunov-Schmidt reduction which we now wish to discuss. As before, we consider a parameter dependent family of equations

$$L_\lambda u = 0 \tag{8.2.61}$$

with

$$L : V \times \Lambda \to W, \qquad (u,\lambda) \mapsto L_\lambda u$$

($V$, $W$ Banach spaces, $\Lambda$ an open subset of some Banach space) near $(u_0,\lambda_0)$ with

$$L_{\lambda_0}u_0 = 0. \tag{8.2.62}$$
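We assume that $V_0 := \ker J_{\lambda_0}(u_0)$ and $\operatorname{coker} J_{\lambda_0}(u_0)$ are finite dimensional, and we write

$$V = V_0 \oplus V_1, \tag{8.2.63}$$

$$W = W_0 \oplus W_1 \quad \text{with } W_1 = J_{\lambda_0}(u_0)(V). \tag{8.2.64}$$

Let

$$\pi : W \to W_0$$

be the projection defined by the decomposition (8.2.64). Then our equation $L_\lambda u = 0$ is equivalent to

$$\pi L_\lambda u = 0, \qquad (\mathrm{id} - \pi)L_\lambda u = 0. \tag{8.2.65}$$

We write $u = v' + v''$ with $v' \in V_0$, $v'' \in V_1$, and consider the map $(v'',v',\lambda) \mapsto (\mathrm{id}-\pi)L_\lambda(v'+v'')$ for $v'' \in V_1$, $v' \in V_0$, $\lambda \in \Lambda$. With $u_0 = v_0' + v_0''$, its derivative w.r.t. $v''$ at $(v_0'',v_0',\lambda_0)$,

$$(\mathrm{id}-\pi)D_vL_{\lambda_0}(v_0'+v_0'') : V_1 \to W_1,$$

is an isomorphism by definition of $V_1$, $W_1$; namely it is simply $J_{\lambda_0}(u_0)$, considered as a map from $V_1$ to $W_1$. Therefore, by the implicit function Theorem 2.4.1, near $(u_0,\lambda_0)$, we may find a unique $\varphi(v',\lambda) \in V_1$ with

$$(\mathrm{id}-\pi)L_\lambda(v'+\varphi(v',\lambda)) = 0. \tag{8.2.66}$$

Thus $u = v' + \varphi(v',\lambda)$ solves $L_\lambda u = 0$ if and only if

$$\pi L_\lambda(v'+\varphi(v',\lambda)) = 0. \tag{8.2.67}$$

Equation (8.2.67) is a finite dimensional system of equations, because the image of $\pi$, $W_0$, is finite dimensional. This is a Lyapunov-Schmidt reduction, and we have seen an instance of this in detail in the preceding for the case where $V_0$ and $W_0$ are one-dimensional. A general reference for this and other topics and methods in bifurcation theory is S. N. Chow, J. Hale, Methods of Bifurcation Theory, Springer, New York, 1982.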
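The reduction can be traced by hand in a toy finite dimensional example, which may serve as a sketch of the scheme. Everything in the following block is an assumption made for illustration: the map $L_\lambda(x,y) = (\lambda x - x^3 - xy,\ y - x^2)$ on $\mathbb{R}^2$ is hypothetical, with $V_0$ the $x$-axis, $V_1$ the $y$-axis, $W_0$ the first and $W_1$ the second coordinate axis at $(u_0,\lambda_0) = (0,0)$. Solving the $W_1$-component for $y = \varphi(x,\lambda)$ and inserting into the $W_0$-component leaves the scalar equation $\lambda x - 2x^3 = 0$.

```python
import math


def L(x, y, lam):
    """Hypothetical toy map L_lam : R^2 -> R^2; its linearization at (0, 0), lam = 0
    has kernel V_0 = x-axis and range W_1 = second coordinate axis."""
    return (lam * x - x**3 - x * y, y - x**2)


def phi(x, lam, steps=5):
    """Solve the W_1-component of L for y by Newton's method (analogue of (8.2.66))."""
    y = 0.0
    for _ in range(steps):
        y -= L(x, y, lam)[1] / 1.0  # d/dy (y - x^2) = 1
    return y


def reduced(x, lam):
    """The remaining W_0-component along the solved branch (analogue of (8.2.67))."""
    return L(x, phi(x, lam), lam)[0]


def reduced_roots(lam):
    """Roots of reduced(x, lam) = lam*x - 2*x**3 in closed form."""
    if lam <= 0:
        return [0.0]
    r = math.sqrt(lam / 2.0)
    return [-r, 0.0, r]
```

In this toy case the solution set changes from one to three solutions as $\lambda$ passes through $\lambda_0 = 0$, and each root of the reduced scalar equation yields a solution $(x,\varphi(x,\lambda))$ of the full system.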
8.3 The existence of catenoids as an example of a bifurcation process

We consider the variational problem

$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt \tag{8.3.1}$$

with

$$F(t,u,\dot u) = u\sqrt{1+\dot u^2}. \tag{8.3.2}$$

This variational problem is of the type considered in Section 1.1 of Part I. $I(u)$ with $F$ given by (8.3.2) is (up to the factor $2\pi$) the area of the surface of revolution obtained by rotating the curve $u(t)$, $a \le t \le b$, about the $t$-axis. Critical points are so-called minimal surfaces of revolution. According to Theorem 1.1.1 of Part I, the corresponding Euler-Lagrange equation is computed as

$$0 = \frac{d}{dt}F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = F_{pp}(t,u(t),\dot u(t))\,\ddot u(t) + F_{pu}(t,u(t),\dot u(t))\,\dot u(t) + F_{pt}(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)),$$

which in the present case becomes

$$\frac{d}{dt}\Big(\frac{u\dot u}{\sqrt{1+\dot u^2}}\Big) - \sqrt{1+\dot u^2} = 0, \tag{8.3.3}$$

or equivalently

$$\frac{u\ddot u}{(\sqrt{1+\dot u^2})^3} + \frac{\dot u^2}{\sqrt{1+\dot u^2}} - \sqrt{1+\dot u^2} = 0. \tag{8.3.4}$$

The general solution is

$$u(t) = \lambda\cosh\Big(\frac{t-t_0}{\lambda}\Big)$$
with parameters $\lambda, t_0$. Here $\lambda \neq 0$, and we may assume $\lambda > 0$ as the case $\lambda < 0$ is symmetric to the case $\lambda > 0$. Also, since $t_0$ just represents a translation of the independent variable, we may assume $t_0 = 0$, i.e.

$$u(t) = \lambda\cosh\frac{t}{\lambda}. \tag{8.3.5}$$

The curve $u(t)$ is called a catenary, and the minimal surface of revolution obtained by revolving $u(t)$ about the $t$-axis is called a catenoid. For the sake of normalization, we consider the interval $I = [-1,1]$. In order to use the general theory of Section 8.2, we need to choose appropriate Banach spaces $V$, $W$ and $\Lambda = \mathbb{R}$ and consider the operator

$$L : V \times \Lambda \to W, \qquad (u,\lambda) \mapsto \Big(\frac{d}{dt}\Big(\frac{u\dot u}{\sqrt{1+\dot u^2}}\Big) - \sqrt{1+\dot u^2},\ \ u(-1) - \lambda\cosh\frac1\lambda,\ \ u(1) - \lambda\cosh\frac1\lambda\Big). \tag{8.3.6}$$

On the right hand side, we have a differential operator of second order and a Dirichlet boundary condition. The boundary values are real numbers, and so $W$ should contain $\mathbb{R}^2$ as a factor as we have two boundary points. Otherwise, $V$ and $W$ should differ by two orders of differentiability. Thus, possible choices are Sobolev spaces

$$V = W^{k+2,p}(I), \qquad W = W^{k,p}(I) \times \mathbb{R}^2,$$

or spaces of differentiable functions

$$V = C^{k+2}(I), \qquad W = C^k(I) \times \mathbb{R}^2.$$

We shall use

$$V = W^{2,2}(I), \qquad W = L^2(I) \times \mathbb{R}^2, \tag{8.3.7}$$
but the reader should also convince herself or himself that the other choices work as well, although the space $L^2$ will always play some auxiliary role. In the sequel, we shall denote the scalar product in $L^2(I)$ by $(\cdot,\cdot)_{L^2}$, i.e.

$$(w_1,w_2)_{L^2} = \int_{-1}^{1} w_1(t)\,w_2(t)\,dt$$

for $w_1, w_2 \in L^2(I)$.
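The scalar product on $W = L^2(I) \times \mathbb{R}^2$,

$$(w,\omega)_W := (w_1,\omega_1)_{L^2} + s_1\sigma_1 + s_2\sigma_2$$

for $w = (w_1,s_1,s_2)$, $\omega = (\omega_1,\sigma_1,\sigma_2)$ with $w_1,\omega_1 \in L^2(I)$ and $s_1,s_2,\sigma_1,\sigma_2 \in \mathbb{R}$, is obtained from the scalar products on $L^2(I)$ and on $\mathbb{R}^2$. The Jacobi operator is given by

$$J_\lambda(u)v = D_uL_\lambda(u)v.$$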
we need to solve
= 0.
(8.3.9)
(These solutions are simply obtained by differentiating the general solu tion A cosh (^x* ) f (8.3.4) w.r.t. the parameters A and t (at to = 0), cf. Theorem 1.3.3 of Part I.) The boundary condition (8.3.11) cannot be satisfied by V\, and so we have to find out for which values of A
1 0
(8.3.12)
We agreed above to consider only positive values of $\lambda$, and this equation has precisely one positive solution which we denote by $\lambda_0$, and likewise, we put

$$u_0(t) = \lambda_0\cosh\frac{t}{\lambda_0},$$

cf. (8.3.5). The only solutions of (8.3.10), (8.3.11) are $a\,v(t)$ with $a \in \mathbb{R}$ and $v(t)$ given in (8.3.12), and so we have

$$\dim\ker J_{\lambda_0}(u_0) = 1. \tag{8.3.14}$$
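The critical parameter value $\lambda_0$ is only defined implicitly, by the requirement that the Jacobi field $v(t) = \cosh(t/\lambda) - (t/\lambda)\sinh(t/\lambda)$ vanishes at $t = \pm 1$. As a rough numerical illustration (a sketch with our own helper names, using plain bisection on the bracket $[0.5, 2]$, on which the boundary value changes sign), one can approximate it as follows.

```python
import math


def jacobi_boundary_value(lam):
    """v(1) for the Jacobi field v(t) = cosh(t/lam) - (t/lam)*sinh(t/lam)."""
    x = 1.0 / lam
    return math.cosh(x) - x * math.sinh(x)


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f changes sign on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


lambda0 = bisect(jacobi_boundary_value, 0.5, 2.0)
critical_height = lambda0 * math.cosh(1.0 / lambda0)  # boundary value u_0(1) = u_0(-1)
```

This gives $\lambda_0 \approx 0.8336$ and the boundary value $\lambda_0\cosh(1/\lambda_0) \approx 1.5089$.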
for all $\eta \in C_0^\infty(I)$. In the sequel, we shall need a little regularity result, namely that any solution $v$ of (8.3.15) of class $L^2(I)$ is automatically smooth, in fact of class $C^\infty(I)$. As we are dealing with a one-dimensional problem here, this result is not too hard to demonstrate, but since that would lead us too far astray, we omit the proof. It can be found in most good books on differential equations or functional analysis, e.g. K. Yosida, Functional Analysis, Springer-Verlag, Berlin, 5th edition, 1978, pp. 177-82. Of course, if $v$ is of class $C^2$, (8.3.15) is equivalent to (8.3.16) for all $\eta \in C_0^\infty(I)$, and by Lemma 1.1.1 of Part I, this is equivalent to $v$ being a solution of the Jacobi equation. We shall now identify $\ker J_{\lambda_0}(u_0)$ and $\operatorname{coker} J_{\lambda_0}(u_0)$ as required in (8.2.3). We shall simply write $J_{\lambda_0}$ in place of $J_{\lambda_0}(u_0)$. According to our choice (8.3.7), we consider $J_{\lambda_0}$ as an operator

$$J_{\lambda_0} : W^{2,2}(I) \to L^2(I) \times \mathbb{R}^2.$$
--
with
A o
(w,ip)
= (J\ v,(p) 2
0 L
(v,J cp) 2
Xo L
$= 0$ (in the same manner as the equivalence of (8.3.15) and (8.3.16) and noting that $\varphi$ is smooth and $v$ and $\varphi$ both vanish on $\partial I$). Thus if $w \in R_0(J_{\lambda_0})$, then also $w \in (\ker J_{\lambda_0})^\perp$, where $\perp$ denotes the orthogonal complement in the Hilbert space $L^2(I)$, as in Corollary 2.2.4. Consequently, if we denote the closure of a linear subspace $M$ of $L^2(I) \times \mathbb{R}^2$
(ker J A J - .
1
(w, ^A ^)vy = 0
By the regularity result mentioned above, this implies that w is smooth, and so we may integrate by parts to get (w, J\ v)
0 w
= (J w,
Xo A o
v)
for all v G
H$ (I)
RoiJxo)- .
= Ro(Jx )
Q L 0
L2
(BRoiJxo)- ,
A o
ker J
A o
2* coker J .
A o A o
We note that this depends on the fact that J in the sense that (v,J w)
Xo
= (J v,w)
Xo
(8.3.18)
if e.g.
2 2
Remark 8.3.1. The situation here is slightly different from the one in Section 8.2 inasmuch as we identify $\operatorname{coker} J_{\lambda_0}$ here with $R_0(J_{\lambda_0})^\perp$ and not with $R(J_{\lambda_0})^\perp$. Therefore, in the present situation, if $\pi$ denotes the orthogonal projection onto $\ker J_{\lambda_0} = \operatorname{coker} J_{\lambda_0}$, we have

$$\pi(J_{\lambda_0}v) = 0$$
only for v G H ' (I), but not for all v G H ' (I). This is for example relevant for the argument of the proof of Lemma 8.2.2. Regularity theory also implies that R(J )
Xo
2 2
2 2
W ' (I),
J\o^n
2 2
we have
fni
2 2 2 n
and f
converges to fo in L {I)
By Rellich's Theorem 3.4.1, after selection of a subsequence, $v_n$ then converges in $W^{1,2}(I)$.
..
2
\ .
V
1 A2
c o s
1
7TT 2
Vn
cosh ^
n + "J! n
' dt ^ c o s h i ^
nn + To 1
h ^
fn,
Thus, v
converges in
= /oA o
is closed. Thus, J
is a Fredholm operator of
Our aim is now to check that the assumptions of Theorem 8.2.2 hold. In order to verify (8.2.42), i.e. $\pi(\partial_\alpha^2 u) \neq 0$, according to Remark 8.2.1, we need to compute $DJ_{\lambda_0}$, i.e. the second derivative of $L_\lambda$. Starting from (8.3.3) and inserting (8.3.5), i.e. $u_0 = \lambda_0\cosh(t/\lambda_0)$, we obtain
J T
3Atanh^ .
By (8.2.27), we have to project this onto the kernel of $J_{\lambda_0}(u_0)$ and check that the result is nonzero, for our Jacobi field $v$ given in (8.3.12), i.e. $v = \cosh(t/\lambda_0) - (t/\lambda_0)\sinh(t/\lambda_0)$. Since here the projection $\pi$ is given by the orthogonal projection in the Hilbert space $L^2(I) \times \mathbb{R}^2$ onto $\ker J_{\lambda_0}(u_0)$, which is generated by the Jacobi field $v$, we simply have to verify that the $L^2$-product of $DJ_{\lambda_0}(u_0)(v,v)$ with $v$ is nonzero. Thus, we compute
1 cosh
A
3Atanh|
/ |
cosh ^
cosh' j
by an integration by parts 3v(t) , (Atanh { ?)() - v(t)) cosh A Now with v = cosh j j sinh ^ , we have A tanh j v(t) - v(t) = - cosh ^ ,
2
dt.
< 0.
> 0.
(8.3.19)
We finally consider (8.2.41). Thus, we have to verify that it{d\u) ^ 0, w i t h d\u = ^t)\t=o f
t r a
A of parameters. We start with (8.2.14), i.e. in the notations of Section 8.2 L n ( , A t ) + 7r(n(,At)) = .
At
(8.3.21)
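In the present case $L_\lambda$ is given by (8.3.6), and $\pi$ is the orthogonal projection in $L^2(I)$ onto the space spanned by $v(t) = \cosh(t/\lambda_0) - (t/\lambda_0)\sinh(t/\lambda_0)$ (see (8.3.12)), where $\lambda_0$ is so chosen that $v(1) = v(-1) = 0$. Thus, this $v$ can be taken as the $\xi_0$ of Section 8.2. However, since

$$\frac{\partial}{\partial\lambda}\Big(\lambda\cosh\frac1\lambda\Big)\Big|_{\lambda=\lambda_0} = 0$$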
by choice of A (see (8.3.13)), we shall need to employ a variation of the parameter somewhat different from the family At = A - f tfi employed in Section 8.2. Here, we put i / := A cosh-^
0 0
(8.3.22)
-f T J - ^
\\ \\ L ( / )
v 2
(d\u,v) 2v
L
= 0 = fi.
(8.3.23)
= d\u(-l)
=j
+
c Q s
^ 2 _L
( W^J
dxU
^ ^^
0
dxU{t)Ht)dt
289
= 0 v(l)
TT^TQ
V)V
2
0,
i.e. (8.3.20). We thus have verified all the assumptions of Theorem 8.2.2 (for the family $\lambda_t$ defined by (8.3.22) in place of the family $\lambda_t = \lambda_0 + t\tilde\mu$). Theorem 8.2.2 thus describes the bifurcation behaviour of the solutions of (8.3.3) or (8.3.4), i.e. the critical points of (8.3.1), (8.3.2) near $u_0(t) = \lambda_0\cosh\frac{t}{\lambda_0}$: For boundary values $u(1) = u(-1) < \lambda_0\cosh\frac{1}{\lambda_0}$, there is no solution (at least in the vicinity of $u_0$), whereas for $u(1) = u(-1) > \lambda_0\cosh\frac{1}{\lambda_0}$, we may find two solutions. Of course, this may also be verified directly without going through all the abstract machinery of Section 8.2, but hopefully this example can serve to illustrate the general scheme. The catenoids are frequently discussed in books on the calculus of variations, e.g. O. Bolza, Vorlesungen über Variationsrechnung, Teubner, Leipzig, Berlin, 1909, or M. Giaquinta, St. Hildebrandt, Calculus of Variations, Springer, Berlin, 1996, I, p. 366 and II, pp. 263-70. A discussion in terms of bifurcation theory also in the case of not necessarily symmetric boundary conditions (i.e. not requiring $u(1) = u(-1)$) is given by H. Wenk, Extremverhalten der Stabilität von Catenoiden als Rotationsminimalfläche, Diplom thesis, Bochum, 1994.
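The direct verification mentioned above can be sketched numerically: for a prescribed boundary height $h = u(1) = u(-1)$, one looks for all $\lambda > 0$ with $\lambda\cosh(1/\lambda) = h$. The block below is an illustration under our own naming; it relies on the fact that $\lambda \mapsto \lambda\cosh(1/\lambda)$ decreases on $(0,\lambda_0)$ and increases on $(\lambda_0,\infty)$, and it is intended for moderate values of $h$ (for extremely large $h$ the bracket search could overflow).

```python
import math


def boundary_height(lam):
    """Boundary value u(1) = u(-1) of the catenary u(t) = lam*cosh(t/lam)."""
    return lam * math.cosh(1.0 / lam)


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f changes sign on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def critical_lambda():
    """The parameter lambda_0 at which lam*cosh(1/lam) attains its minimum."""
    return bisect(lambda lam: math.cosh(1.0 / lam) - math.sinh(1.0 / lam) / lam, 0.5, 2.0)


def catenary_parameters(h):
    """All lam > 0 with lam*cosh(1/lam) = h, in increasing order."""
    lam0 = critical_lambda()
    if h < boundary_height(lam0):
        return []  # below the critical height: no catenary
    g = lambda lam: boundary_height(lam) - h
    lo = lam0
    while g(lo) <= 0:  # search a bracket on the decreasing branch
        lo *= 0.5
    hi = lam0
    while g(hi) <= 0:  # search a bracket on the increasing branch
        hi *= 2.0
    return [bisect(g, lo, lam0), bisect(g, lam0, hi)]
```

For example, `catenary_parameters(1.4)` is empty, whereas `catenary_parameters(1.6)` returns two parameter values, one on either side of $\lambda_0$.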
Exercises

8.1 How many parameters are needed for a complete description of the bifurcation behaviour of the roots of a fourth-order polynomial?

8.2
+ 1
u(t) dt
= U(K)
for a parameter $n > 0$. Determine the value $n_0$ for which a bifurcation occurs. (Hint: This problem can be reduced to the one considered in Section 8.3.)

8.3 Consider geodesics on $S^2$ as in Chapter 2 of Part I. More precisely, we take two points $p, q \in S^2$ with distance $d(p,q) = \lambda$, and consider geodesic arcs between $p$ and $q$ of length $\lambda$, i.e. length minimizing arcs. What happens at $\lambda = \pi$? Does this fit into the framework described in Section 8.2?
9.1 The Palais-Smale condition

In this chapter, we take up a direction that has already been presented in Chapter 3 of Part I, namely the search for nonminimizing critical points of variational problems. This chapter will consequently be independent of Chapters 4-8 of the present Part II. In Section 3.1 of Part I, we presented existence results for unstable critical points of functionals $F$ of class $C^1$ on some finite dimensional Euclidean space $\mathbb{R}^d$. We only needed a coercivity condition on the functional guaranteeing that a critical sequence $(x_n)_{n\in\mathbb{N}}$ (i.e. satisfying $DF(x_n) \to 0$, $|F(x_n)|$ bounded) stayed in a bounded set. The local compactness of $\mathbb{R}^d$ then allowed the extraction of a convergent subsequence whose limit $x_0$ satisfied $DF(x_0) = 0$, because of the continuity of $DF$. In Sections 2.3 and 3.2 of Part I, we also presented examples where variational problems could be reduced to such finite dimensional problems. The domain was a little more complicated than $\mathbb{R}^d$, but being finite dimensional, it was still locally compact so that we had no difficulties finding limits of subsequences for critical sequences. In the remainder of this book, however, we have had ample opportunity to realize that variational problems are typically and naturally posed on some infinite-dimensional Hilbert or Banach space. Such a space is not locally compact anymore w.r.t. its Hilbert or Banach space topology, so that the previous strategy encounters a serious problem. Also weak topologies do not help much as the functionals under consideration typically are not continuous w.r.t. the weak topology. If one searches for minimizers, this problem can be overcome by introducing convexity assumptions as we have seen in Chapters 4 and 8, but any convexity assumption excludes the existence of critical points other than minima.
Nevertheless, the lack of compactness of the underlying space must be compensated by an assumption on the functional that guarantees the appropriate compactness of critical sequences. In other words, we do not require the compactness of arbitrary bounded sequences on our space (which is impossible, as argued) but only of critical sequences. This is the idea of the Palais-Smale condition which we now formulate:

Definition 9.1.1. Let $(V,\|\cdot\|)$ be a Banach space, $F : V \to \mathbb{R}$ a functional of class $C^1$. We say that $F$ satisfies the Palais-Smale condition, abbreviated as (PS), if any sequence $(x_n)_{n\in\mathbb{N}} \subset V$ satisfying

(i) $|F(x_n)| \le c$ for some constant $c$,
(ii) $\|DF(x_n)\| \to 0$ for $n \to \infty$

contains a convergent subsequence.

Note that a limit $x_0$ of such a subsequence satisfies $DF(x_0) = 0$ (i.e. is a critical point of $F$) because $DF$ is continuous. A direct consequence of the definition is:

Lemma 9.1.1. Suppose $F : V \to \mathbb{R}$ satisfies (PS). Then for any $\alpha \in \mathbb{R}$,

$$K_\alpha := \{x \in V : F(x) = \alpha,\ DF(x) = 0\}$$
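(the set of critical points of $F$ with value $\alpha$) is compact. q.e.d.

We also have:

Lemma 9.1.2. Suppose $F : V \to \mathbb{R}$ satisfies (PS). For $\alpha \in \mathbb{R}$, we put

$$U_{\alpha,\rho} := \bigcup_{x \in K_\alpha}\{z \in V : \|x - z\| < \rho\} \quad (\rho > 0),$$

$$N_{\alpha,\delta} := \{z \in V : |F(z) - \alpha| < \delta,\ \|DF(z)\| < \delta\} \quad (\delta > 0).$$

Then the families $(U_{\alpha,\rho})_{\rho>0}$ and $(N_{\alpha,\delta})_{\delta>0}$ are fundamental systems of neighbourhoods of $K_\alpha$ (i.e. each neighbourhood of $K_\alpha$ contains some $U_{\alpha,\rho}$ and some $N_{\alpha,\delta}$).

Proof. It is clear that $U_{\alpha,\rho}$ and $N_{\alpha,\delta}$ are neighbourhoods of $K_\alpha$ for $\rho > 0$ and $\delta > 0$, respectively. It follows from the compactness of $K_\alpha$ that each neighbourhood of $K_\alpha$ contains some $U_{\alpha,\rho}$. Concerning the same property of the $N_{\alpha,\delta}$, let us assume on the contrary that there exist a neighbourhood $U$ of $K_\alpha$ and a sequence $(y_n)_{n\in\mathbb{N}}$ with

$$y_n \in N_{\alpha,\frac1n} \setminus (U \cap N_{\alpha,\frac1n})$$

for all $n$. (PS) implies that a subsequence of $(y_n)_{n\in\mathbb{N}}$ converges to some $y_0 \in K_\alpha \subset U$, contradicting the openness of $U$. q.e.d.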
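That the condition of Definition 9.1.1 is a genuine restriction can already be seen in one dimension. The following sketch uses a hypothetical example of our own choosing, $F(x) = \arctan x$ on $\mathbb{R}$: along $x_n = n$, the values $F(x_n)$ stay bounded and $DF(x_n) \to 0$, but consecutive terms keep distance 1, so no subsequence converges and (PS) fails.

```python
import math


def F(x):
    """Hypothetical example functional on R: bounded, without critical points."""
    return math.atan(x)


def dF(x):
    """Derivative of F."""
    return 1.0 / (1.0 + x * x)


def candidate_sequence(n_terms):
    """x_n = n: F(x_n) is bounded and dF(x_n) tends to 0, yet x_n diverges."""
    return [float(n) for n in range(1, n_terms + 1)]


xs = candidate_sequence(1000)
bounded = all(abs(F(x)) <= math.pi / 2 for x in xs)   # condition (i)
derivative_small = dF(xs[-1]) < 1e-5                  # condition (ii), numerically
min_gap = min(b - a for a, b in zip(xs, xs[1:]))      # terms never cluster
```

In contrast, for a coercive function such as $x \mapsto x^2$ on $\mathbb{R}$, any sequence with bounded values is bounded and hence has a convergent subsequence.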
In our applications below, we shall also encounter the situation where we want to find critical points of the restriction of some functional $F$ to the level hypersurface $G(x) = \beta$ of some other functional $G$. For that purpose, we shall need a relative version of the Palais-Smale condition which we shall formulate only for the case of a Hilbert space:

Definition 9.1.2. Let $(H,\langle\cdot,\cdot\rangle)$ be a Hilbert space, $F, G : H \to \mathbb{R}$ functionals of class $C^1$, $\beta \in \mathbb{R}$. Suppose

$$DG(x) \neq 0 \quad \text{for all } x \text{ with } G(x) = \beta.$$

We say that $F$ satisfies (PS) relative to $G = \beta$ if every $(x_n)_{n\in\mathbb{N}} \subset H$ with $G(x_n) = \beta$ and satisfying

(i) $|F(x_n)| \le c$ for some constant $c$,
(ii) $DF(x_n) - \frac{\langle DF(x_n),DG(x_n)\rangle}{\|DG(x_n)\|^2}\,DG(x_n) \to 0$ for $n \to \infty$

contains a convergent subsequence.

A limit $x_0$ of such a subsequence then satisfies

$$DF(x_0) - \frac{\langle DF(x_0),DG(x_0)\rangle}{\|DG(x_0)\|^2}\,DG(x_0) = 0, \qquad G(x_0) = \beta, \tag{9.1.1}$$

i.e. is a critical point of the restriction of $F$ to $G(x) = \beta$. Of course, results analogous to Lemmas 9.1.1 and 9.1.2 hold in the relative case. One simply intersects the corresponding sets with $\{G(x) = \beta\}$ and replaces $DF$ by its projection to that level set.

As in Sections 2.3, 3.1, 3.2 of Part I, in order to find critical points of a functional, one needs to construct (local) deformations that decrease the value of the functional except at or at least away from critical points. We shall now do so in stages of increasing generality. We start with a functional $F : H \to \mathbb{R}$ of class $C^2$ on some Hilbert space $(H,\langle\cdot,\cdot\rangle)$ that satisfies (PS). For each $u \in H$, $DF(u)$ is a linear functional on $H$, and by Corollary 2.2.3, it can therefore be identified with an element $\nabla F(u)$ of $H$, called the gradient of $F$ at $u$. Thus, $\nabla F(u)$ satisfies

$$DF(u)(\nabla F(u)) = \|\nabla F(u)\|^2, \tag{9.1.3}$$

$$\|\nabla F(u)\| = \|DF(u)\|. \tag{9.1.4}$$
Since $F$ is assumed to be of class $C^2$, $DF$ and hence $\nabla F$ are of class $C^1$ in their dependence on $u$. In particular, $\nabla F$ is locally Lipschitz. We now consider the (negative) gradient flow induced by $F$:

$$\frac{\partial}{\partial t}\psi(u,t) = -\nabla F(\psi(u,t)) \quad \text{for } t > 0, \tag{9.1.5}$$

$$\psi(u,0) = u. \tag{9.1.6}$$

Because of the Lipschitz property, by Theorem 2.4.2 and Corollary 2.4.2, for small $t > 0$, there exists a unique solution $\psi(u,t)$ satisfying the semigroup property

$$\psi(u,t+s) = \psi(\psi(u,t),s) \tag{9.1.7}$$

for sufficiently small $s, t > 0$. Moreover,

$$\psi(u,t) = u \quad \text{for all } t, \text{ if } DF(u) = 0. \tag{9.1.8}$$

Finally

$$F(\psi(u,t)) = F(u) + \int_0^t \frac{d}{d\tau}F(\psi(u,\tau))\,d\tau = F(u) + \int_0^t DF(\psi(u,\tau))\,\frac{\partial}{\partial\tau}\psi(u,\tau)\,d\tau = F(u) - \int_0^t \|DF(\psi(u,\tau))\|^2\,d\tau \quad \text{by (9.1.5), (9.1.3)}$$

$$< F(u) \quad \text{for } t > 0, \text{ if } DF(u) \neq 0, \tag{9.1.9}$$

i.e. if $u$ is not a critical point of $F$. Thus, we have found the prototype of a deformation that decreases the value of $F$ except at its critical points. For technical reasons, however, the above flow will need some modifications and generalizations. First of all, a solution of (9.1.5) need not exist for all $t > 0$ because it may become unbounded in finite 'time' $t$. This can be easily remedied by using the Lipschitz function

$$\eta : \mathbb{R}^+ \to \mathbb{R}^+, \qquad \eta(s) := \begin{cases} 1 & \text{for } 0 \le s \le 1,\\ \frac1s & \text{for } s \ge 1,\end{cases}$$

putting $\tilde\nabla F(u) := \eta(\|\nabla F(u)\|)\nabla F(u)$ (i.e. $\tilde\nabla F(u) = \nabla F(u)$ for $\|\nabla F(u)\| \le 1$ and $\|\tilde\nabla F(u)\| \le 1$ for all $u$) and replacing (9.1.5) by

$$\frac{\partial}{\partial t}\psi(u,t) = -\tilde\nabla F(\psi(u,t)). \tag{9.1.10}$$

Of course, we still use (9.1.6). Since $\|\tilde\nabla F(u)\| \le 1$ for all $u$, the solution of (9.1.10), (9.1.6) now exists for all $t > 0$, and satisfies (9.1.7) for all $s, t > 0$. Equation (9.1.8) also still holds, and as in the derivation of (9.1.9), we get

$$F(\psi(u,t)) = F(u) - \int_0^t \eta(\|\nabla F(\psi(u,\tau))\|)\,\|DF(\psi(u,\tau))\|^2\,d\tau < F(u) \quad \text{for } t > 0,$$

if $u$ is not a critical point of $F$. More generally, we have

$$F(\psi(u,t)) \le F(\psi(u,s)) \quad \text{whenever } 0 \le s \le t, \text{ for all } u.$$
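The truncated gradient flow can be imitated numerically in finite dimensions. The block below is merely a sketch under assumptions of our own: the functional $F(x,y) = (x^2-1)^2 + y^2$ on $\mathbb{R}^2$ is a hypothetical example, the cutoff is taken as $\eta(s) = \min(1, 1/s)$, and the flow is approximated by explicit Euler steps with a small step size. It records the values of $F$ along the discrete trajectory, which should be nonincreasing, and critical points should remain fixed.

```python
import math


def F(x, y):
    """Hypothetical C^2 functional on R^2 with critical points (-1, 0), (0, 0), (1, 0)."""
    return (x * x - 1.0) ** 2 + y * y


def grad_F(x, y):
    """Gradient of F."""
    return (4.0 * x * (x * x - 1.0), 2.0 * y)


def eta(s):
    """Cutoff: eta(s) = 1 for s <= 1 and 1/s for s >= 1, so that eta(s)*s <= 1."""
    return 1.0 if s <= 1.0 else 1.0 / s


def flow(u, t_end, dt=1e-3):
    """Explicit Euler approximation of d/dt psi = -eta(|grad F|) grad F, psi(u, 0) = u."""
    x, y = u
    values = [F(x, y)]
    for _ in range(int(round(t_end / dt))):
        gx, gy = grad_F(x, y)
        scale = eta(math.hypot(gx, gy))
        x -= dt * scale * gx
        y -= dt * scale * gy
        values.append(F(x, y))
    return (x, y), values
```

Starting for instance from $(2,3)$, the discrete trajectory moves with speed at most 1 and approaches the minimum at $(1,0)$, while the saddle point $(0,0)$ and the minima are left fixed.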
Next, we wish to localize the construction near a level $\alpha$. Thus, for given $\epsilon_0 > 0$ and a neighbourhood $U$ of $K_\alpha$ we want to have a flow $\psi(u,t)$ with (9.1.7), (9.1.8) and also

$$\psi(u,t) = u \quad \text{if } |F(u) - \alpha| \ge \epsilon_0, \tag{9.1.11}$$

and the following more explicit local decrease of the value of $F$: For $\alpha \in \mathbb{R}$, we put

$$F_\alpha := \{v \in H : F(v) < \alpha\}.$$

We want to find $\epsilon > 0$ with

$$\psi(F_{\alpha+\epsilon} \setminus U, 1) \subset F_{\alpha-\epsilon}, \tag{9.1.12}$$

$$\psi(F_{\alpha+\epsilon}, 1) \subset F_{\alpha-\epsilon} \cup U, \tag{9.1.13}$$

and we also require

$$F(\psi(u,t)) \le F(\psi(u,s)) \quad \text{if } 0 \le s \le t, \text{ for all } u. \tag{9.1.14}$$

We let $\varphi : \mathbb{R} \to \mathbb{R}$ be Lipschitz continuous with

$$\varphi(s) = 0 \ \text{ for } |\alpha - s| \ge \epsilon_0, \qquad \varphi(s) = 1 \ \text{ for } |\alpha - s| \le \tfrac{\epsilon_0}{2}, \qquad 0 \le \varphi(s) \le 1 \ \text{ for all } s,$$

and replace (9.1.10) by

$$\frac{\partial}{\partial t}\psi(u,t) = -\varphi(F(\psi(u,t)))\,\eta(\|\nabla F(\psi(u,t))\|)\,\nabla F(\psi(u,t)). \tag{9.1.15}$$
a + C
|F(V>(tx, t)) ~a\<e and therefore <p(F(i/>(u,t))) = 1 As before, we may now compute F(i>(u,l)) f
= F(u)+ = F(u)/
1
(9.1.16)
d
F(V(w,r))dr
d T
Jo
f
JO
(i W,T))i7(||VF(V'(u,T))||)||Z?i!'W,r))|| dT
; ,
<a + e-
[ min(l,||DF(^(u,r)|| )(ir Jo
a + C
(9.1.17)
since we assume u G F
C t/ ,
a
2 p
CU
Q
(9.1.18)
(here, we are using (PS)!). From the definition of N j, thus \\DF(ip(u T)\\ > 6 whenever ip(u,r) ^ N f Without loss of gen erality 6 < 1. (9.1.17) then yields
2 2 } a
F(i){u,
iV }) 6 .
M
(9.1.19)
condition
297
I :=
'
1
inf
wN
ai6
w|| ) > p.
Since 8 d t ^
M
< 1,
a<
(9.1.20)
therefore, i f u fi /, then also t/>(u, r ) ^ 7V 5 for 0 < r < p, and similarly, if ^ ( u , 1) fi t / , then also ip(u,r) fi N j for 1 - p < r < 1. Therefore, if either tx fi U or -0(tt, 1) ^ (7, then
a
> p.
(9.1.21)
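Thus, for $0 < \varepsilon < \min(\frac{\varepsilon_0}{2}, \frac{1}{2}\rho\delta^2)$, we get (9.1.12), (9.1.13). In conclusion, we have shown the following deformation result:

Theorem 9.1.1. Let $F : H \to \mathbb{R}$ be a $C^2$ functional on a Hilbert space $H$, satisfying (PS). Let $\alpha \in \mathbb{R}$, and put
$$F_\alpha := \{v \in H : F(v) < \alpha\}, \qquad K_\alpha := \{v \in H : F(v) = \alpha,\ DF(v) = 0\}.$$
Let $\varepsilon_0 > 0$, and let $U$ be a neighbourhood of $K_\alpha$. Then there exist $\varepsilon > 0$ and a continuous family $\psi : H \times [0,\infty) \to H$ with the semigroup property $\psi(\psi(u,s),t) = \psi(u,s+t)$ for all $s,t \ge 0$, $u \in H$ and with
(i) $\psi(u,0) = u$ for all $u \in H$
(ii) $F(\psi(u,t))$ is nonincreasing in $t$ for all $u \in H$
(iii) $\psi(u,t) = u$ for all $t$ whenever $DF(u) = 0$, in particular for $u \in K_\alpha$
(iv) $\psi(u,t) = u$ whenever $|F(u) - \alpha| \ge \varepsilon_0$, for all $t$
(v) $\psi(F_{\alpha+\varepsilon} \setminus U, 1) \subset F_{\alpha-\varepsilon}$, $\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \cup U$
(vi) If $F$ is even (i.e. $F(u) = F(-u)$ for all $u$), then also $F(\psi(u,t))$ is even in $u$ for all $t$ (i.e. $F(\psi(u,t)) = F(\psi(-u,t))$).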
(Property (vi) follows from the construction: All the auxiliary functions are invariant under replacing $u$ by $-u$ if $F$ is even, and $\nabla F(-u) = -\nabla F(u)$ in the even case.) q.e.d.

Corollary 9.1.1. If under the preceding assumptions, $F$ has no critical point with value $\alpha$, i.e. $K_\alpha = \emptyset$, then there exists a deformation $\psi$ with the preceding properties and
$$\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon}. \qquad (9.1.22)$$
Proof. If $K_\alpha = \emptyset$, we may take $U = \emptyset$ in Theorem 9.1.1 (v). q.e.d.
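To make the deformation of Theorem 9.1.1 more concrete, the following sketch integrates a truncated negative gradient flow for a model function on $H = \mathbb{R}^2$ with the explicit Euler method. The function `F`, the cut-off `eta`, the step size and the starting points are assumptions chosen only for illustration; in particular `eta` is not claimed to coincide with the $\eta$ of the text, and the localizing factor $\varphi$ is omitted. The computed values illustrate properties (ii) and (iii).

```python
import math

def F(x, y):
    # Model functional on R^2 (illustrative choice, not taken from the text).
    return 0.5 * x**2 - 0.25 * x**4 + 0.5 * y**2

def grad_F(x, y):
    # Gradient of F; it vanishes exactly at (0, 0) and (+-1, 0).
    return (x - x**3, y)

def eta(s):
    # Hypothetical cut-off that bounds the speed of the flow by 1.
    return 1.0 if s <= 1.0 else 1.0 / s

def flow(u, t_end, dt=0.01):
    """Explicit Euler approximation of d(psi)/dt = -eta(|grad F|) grad F."""
    x, y = u
    values = [F(x, y)]
    for _ in range(int(round(t_end / dt))):
        gx, gy = grad_F(x, y)
        speed = eta(math.hypot(gx, gy))
        x, y = x - dt * speed * gx, y - dt * speed * gy
        values.append(F(x, y))
    return (x, y), values

# A generic starting point: F decreases along the discrete flow.
end_point, values = flow((0.5, 0.5), t_end=10.0)
# A critical point of F: it is not moved by the flow.
fixed_point, _ = flow((1.0, 0.0), t_end=10.0)
```

In this example the trajectory starting at $(0.5, 0.5)$ approaches the local minimum $0$, while the saddle point $(1, 0)$ stays fixed, in accordance with (iii).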
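We shall now extend Theorem 9.1.1 in two directions. First, we consider the relative case, where in addition to $F$, we have another $C^2$ functional $G : H \to \mathbb{R}$ with
$$DG(x) \ne 0 \quad \text{for } G(x) = \beta$$
for some given value $\beta \in \mathbb{R}$. We wish to find critical points of the restriction of $F$ to $G = \beta$. We assume that $F$ satisfies the relative (PS) condition of Definition 9.1.2 on $G = \beta$. We then perform the preceding construction with
$$\nabla_G F(u) := \nabla F(u) - \frac{(\nabla F(u), \nabla G(u))}{\|\nabla G(u)\|^2}\,\nabla G(u) \qquad (9.1.23)$$
in place of $\nabla F(u)$. Then
$$\frac{d}{dt} G(\psi(u,t)) = -\varphi(F(\psi(u,t)))\,\eta(\|\nabla_G F(\psi(u,t))\|)\,\big(\nabla_G F(\psi(u,t)), \nabla G(\psi(u,t))\big)$$
from the chain rule and the analogue of (9.1.15)
$$= 0,$$
since $(\nabla_G F(v), \nabla G(v)) = 0$ for all $v \in H$. Therefore, the flow $\psi(u,t)$ now leaves $G = \beta$ invariant. We obtain:

Theorem 9.1.2. Let $F, G : H \to \mathbb{R}$ be $C^2$ functionals on a Hilbert space $(H, (\cdot,\cdot))$ with $F$ satisfying (PS) relative to $G = \beta$. Let $\alpha \in \mathbb{R}$,
$$F_\alpha^\beta := \{v \in H \mid F(v) < \alpha,\ G(v) = \beta\}, \qquad K_\alpha^\beta := \{v \in H \mid F(v) = \alpha,\ G(v) = \beta,\ \nabla_G F(v) = 0\}.$$
Let $\varepsilon_0 > 0$, and let $U$ be a neighbourhood of $K_\alpha^\beta$ in $\{G(v) = \beta\}$. Then there exist $\varepsilon > 0$ and a continuous semigroup family
$$\psi : \{G(v) = \beta\} \times [0,\infty) \to \{G(v) = \beta\}$$
satisfying
(i) $\psi(u,0) = u$ for all $u \in \{G(v) = \beta\}$
(ii) $F(\psi(u,t))$ is nonincreasing in $t$
(iii) $\psi(u,t) = u$ for all $u \in K_\alpha^\beta$
(iv) $\psi(u,t) = u$ for all $t$ if $|F(u) - \alpha| \ge \varepsilon_0$
(v) $\psi(F_{\alpha+\varepsilon}^\beta \setminus U, 1) \subset F_{\alpha-\varepsilon}^\beta$, $\psi(F_{\alpha+\varepsilon}^\beta, 1) \subset F_{\alpha-\varepsilon}^\beta \cup U$
(vi) If $F$ and $G$ are even, so is $F(\psi(\cdot,t))$ for all $t$.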
Secondly, we wish to extend the preceding construction to functionals on Banach spaces. For a functional on a Banach space, in general one does not have a good notion of a gradient. We therefore need to introduce Palais' concept of a pseudo-gradient:

Definition 9.1.3. Let $(V, \|\cdot\|)$ be a Banach space, $U \subset V$, $F : U \to \mathbb{R}$ a functional of class $C^1$. A pseudo-gradient vector field for $F$ is a locally Lipschitz continuous vector field $v : U \to V$ satisfying
$$\|v(u)\| < \min(1, \|DF(u)\|) \qquad (9.1.24)$$
$$DF(u)(v(u)) > \tfrac{1}{2} \min\left(\|DF(u)\|, \|DF(u)\|^2\right) \qquad (9.1.25)$$
for all $u \in U$.

Lemma 9.1.3. Let $F : V \to \mathbb{R}$ be a functional of class $C^1$ on the Banach space $V$. Then $F$ admits a pseudo-gradient vector field on $V' := \{u \in V \mid DF(u) \ne 0\}$.

Proof. For every $u \in V'$, by the definition of the norm $\|DF(u)\|$, we may find $w = w(u) \in V$ with
$$\|w\| < \min(1, \|DF(u)\|), \qquad DF(u)(w) > \tfrac{1}{2} \min\left(\|DF(u)\|, \|DF(u)\|^2\right).$$
Since $DF$ is continuous (as we assume $F \in C^1$), $w$ satisfies (9.1.24), (9.1.25) also for all $v$ in some neighbourhood $N_u$ of $u$. Since $\{N_u : u \in V'\}$ is an open covering of $V'$, it possesses a locally finite refinement $\{M_\alpha\}_{\alpha \in I}$.† For each $\alpha \in I$, we choose $u_\alpha$ with $M_\alpha \subset N_{u_\alpha}$ and put $w_\alpha := w(u_\alpha)$. Let
$$\rho_\alpha(v) := \operatorname{dist}(v, V' \setminus M_\alpha), \qquad \varphi_\alpha(v) := \frac{\rho_\alpha(v)}{\sum_{\beta \in I} \rho_\beta(v)}.$$

† This holds for any open covering of a paracompact set, see e.g. J. Dieudonné, Grundzüge der modernen Analysis, 2, Vieweg, Braunschweig, second edition, 1987, pp. 26-9; $V'$ is paracompact for example because it is metrizable.

Since each $v$ is only contained in finitely many $M_\beta$, because of the local finiteness of the covering, the denominator of $\varphi_\alpha$ is a finite sum. $(\varphi_\alpha)_{\alpha \in I}$ is a partition of unity subordinate to $\{M_\alpha\}$, i.e. $0 \le \varphi_\alpha \le 1$, $\varphi_\alpha = 0$ outside $M_\alpha$, $\sum_{\alpha \in I} \varphi_\alpha = 1$. Also, the $\varphi_\alpha$ are Lipschitz continuous. Then
$$v(u) := \sum_{\alpha \in I} \varphi_\alpha(u)\, w_\alpha$$
is a convex combination of vectors satisfying (9.1.24), (9.1.25) and hence satisfies these relations, too. $v(u)$ thus is a pseudo-gradient vector field for $F$. q.e.d.

Note that we only need to require $F \in C^1$, and not $F \in C^2$, in order to construct a locally Lipschitz pseudo-gradient field. We then have the following deformation for $C^1$-functionals on Banach spaces.
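Theorem 9.1.3. Let $F : V \to \mathbb{R}$ be a $C^1$-functional on a Banach space $V$ satisfying (PS). Let $\alpha \in \mathbb{R}$, $\varepsilon_0 > 0$, $U$ a neighbourhood of $K_\alpha$ as in Theorem 9.1.1. Then there exist $0 < \varepsilon < 1$ and a continuous family $\psi : V \times [0,\infty) \to V$ satisfying the semigroup property w.r.t. $t \ge 0$, and
(i) $\psi(u,0) = u$ for all $u \in V$
(ii) $F(\psi(u,s)) \le F(\psi(u,t))$ whenever $0 \le t \le s$, $u \in V$
(iii) $\psi(u,t) = u$ for all $t$ whenever $DF(u) = 0$
(iv) $\psi(u,t) = u$ whenever $|F(u) - \alpha| \ge \varepsilon_0$, for all $t$
(v) $\psi(F_{\alpha+\varepsilon} \setminus U, 1) \subset F_{\alpha-\varepsilon}$, $\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \cup U$
(vi) If $F(\cdot)$ is even, so is $F(\psi(\cdot,t))$ for all $t$.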
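The partition of unity used in the proof of Lemma 9.1.3 can be written out explicitly in a one-dimensional toy situation. In the following sketch, the set $V' = (0,3)$, the covering intervals `COVER` and the numbers `W` (playing the role of the local vectors attached to the covering sets) are hypothetical choices made only for illustration; the sketch assumes that no covering interval equals all of $V'$.

```python
DOMAIN = (0.0, 3.0)                           # stands in for V'
COVER = [(0.0, 1.5), (1.0, 2.5), (2.0, 3.0)]  # open intervals M_0, M_1, M_2
W = [1.0, 2.0, 4.0]                           # hypothetical local vectors

def rho(v, interval):
    """Distance from v to the complement of the interval within DOMAIN."""
    a, b = interval
    if not (a < v < b):
        return 0.0
    candidates = []
    if a > DOMAIN[0]:
        candidates.append(v - a)  # distance to the part of DOMAIN left of a
    if b < DOMAIN[1]:
        candidates.append(b - v)  # distance to the part of DOMAIN right of b
    return min(candidates)

def phi(v):
    """Partition of unity: rho_k(v) divided by the (finite) sum of all rho_j(v)."""
    weights = [rho(v, m) for m in COVER]
    total = sum(weights)
    return [w / total for w in weights]

def glued(v):
    """Convex combination of the local vectors, as in the proof of Lemma 9.1.3."""
    return sum(p * w for p, w in zip(phi(v), W))

samples = [0.01 * k for k in range(1, 300)]
sums = [sum(phi(v)) for v in samples]
```

Since `glued(v)` is a convex combination of the entries of `W`, every inequality that is preserved under convex combinations and holds for the local vectors also holds for the glued field; this is the only point used in the proof.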
Proof. The proof is the same as the one of Theorem 9.1.1, replacing VF(u) by a pseudo-gradient vector field v(u) except for the following technical point: Lemma 9.1.3 asserts the existence of a pseudo-gradient field only on {x G V \ DF(x) ^ 0}. We therefore have to choose another Lipschitz continuous cut-off function 7 : V E with 0 < 7 < 1, 7(1*) = 0 i f u G J V j , 7(14) = 1 for u G V \ N ^. We may then consider
a j a
with </?, 77 as before. This has the additional effect that dj>(u,t) dt _
whenever tp(u,t) G JV , which is a neighbourhood of K , while the evolution is the same as before (with v(u) in place of V F ( w ) ) outside N j- This cut-off near K does not affect the rest of the construction. If F is even, we may also choose 7 even. However, there might still exist critical points of F in F \ N s- I n order to take account of those, we strengthen the requirements on the above cut-off function (p to
a a a + C ay
p(s) = 0 (p(s) = 1
for |a - s\ < m i n ( y , - ) .
W i t h such a </?, the right-hand side of (9.1.26) vanishes near any critical point of F , and i t is therefore defined on all of V. I f we then also impose the additional restriction
4
everything works out as before. q.e.d. I t is possible, and not overly difficult, to extend Theorem 9.1.3 to the relative case and to obtain a result analogous to Theorem 9.1.2. Here, however, we refrain from doing so.
9.2 The mountain pass theorem

With the help of the deformation theorems of the previous section, one may easily derive existence results for critical points of a functional satisfying (PS). To illustrate this point, we start with the trivial

Lemma 9.2.1. Let $F : V \to \mathbb{R}$ be a $C^1$ functional on a Banach space satisfying (PS). If
$$\alpha := \inf_{u \in V} F(u) > -\infty,$$
then $F$ possesses a critical point $u_0$ with value $\alpha$ (i.e. $F(u_0) = \alpha$, $DF(u_0) = 0$).

Proof. Suppose that $K_\alpha = \emptyset$. Then $U = \emptyset$ is a neighbourhood of $K_\alpha$. Let $\varepsilon_0 > 0$ be arbitrary. Choose $\varepsilon$ as in Theorem 9.1.3. From the definition of $\alpha$,
$$F_{\alpha+\varepsilon} \ne \emptyset, \qquad F_{\alpha-\varepsilon} = \emptyset.$$
Therefore, it is impossible that, as Theorem 9.1.3 (v) asserts, the deformation $\psi(\cdot,1)$ maps $F_{\alpha+\varepsilon}$ into $F_{\alpha-\varepsilon}$. This contradiction implies $K_\alpha \ne \emptyset$, which means the existence of the desired critical point. q.e.d.
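Of course, the methods presented in Chapter 4 yield more general existence results for minimizers of variational problems. The strength of the Palais-Smale approach rather lies in its capability of producing nonminimizing critical points. To demonstrate this, we now present the mountain pass theorem of Ambrosetti-Rabinowitz.

Theorem 9.2.1. Let $F : V \to \mathbb{R}$ be a $C^1$ functional on a Banach space $(V, \|\cdot\|)$ satisfying (PS). Suppose $F(0) = 0$ and
(i) $\exists \rho > 0, \beta > 0$: $F(u) \ge \beta$ for all $u$ with $\|u\| = \rho$
(ii) $\exists u_1$ with $\|u_1\| > \rho$ and $F(u_1) < \beta$.
We let
$$\Gamma := \{\gamma \in C^0([0,1], V) : \gamma(0) = 0,\ \gamma(1) = u_1\}.$$
Then
$$\alpha := \inf_{\gamma \in \Gamma} \sup_{\tau \in [0,1]} F(\gamma(\tau)) \quad (> 0)$$
is a critical value of $F$ (i.e. there exists $u_0$ with $F(u_0) = \alpha$, $DF(u_0) = 0$).

Proof. Suppose again that $K_\alpha = \emptyset$, and take the neighbourhood $U = \emptyset$ of $K_\alpha$. We let $\varepsilon_0 = \min(\beta, \beta - F(u_1))$. Choose $\varepsilon$ as in Theorem 9.1.3. From the definition of $\alpha$, there exists $\gamma_0 \in \Gamma$ with
$$\sup_{\tau \in [0,1]} F(\gamma_0(\tau)) < \alpha + \varepsilon.$$
By Theorem 9.1.3 (v), then
$$\sup_{\tau \in [0,1]} F(\gamma(\tau)) < \alpha - \varepsilon \quad \text{for } \gamma(\tau) := \psi(\gamma_0(\tau), 1) \in F_{\alpha-\varepsilon},$$
and $\gamma \in \Gamma$, since Theorem 9.1.3 (iv) yields $\gamma(0) = \gamma_0(0) = 0$ and $\gamma(1) = \gamma_0(1) = u_1$ by choice of $\varepsilon_0$. This contradicts the definition of $\alpha$ and therefore implies $K_\alpha \ne \emptyset$, i.e. the existence of the desired critical point.
q.e.d.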
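The minimax value of Theorem 9.2.1 can be explored numerically in a finite-dimensional toy case. The sketch below takes $V = \mathbb{R}^2$, a model function `F` with $F(0) = 0$, $F \ge 7/64$ on the circle of radius $1/2$ and $F(u_1) = -2$ at $u_1 = (2, 0)$, and minimizes the supremum of `F` over a one-parameter family of paths from $0$ to $u_1$. The function, the path family and the grids are assumptions made for illustration only; since only a subfamily of $\Gamma$ is searched, the computation in general gives an upper bound for $\alpha$, which here happens to be attained at the saddle point $(1, 0)$.

```python
import math

def F(x, y):
    # Model function with a strict local minimum at 0 and saddle points at (+-1, 0).
    return 0.5 * x**2 - 0.25 * x**4 + 0.5 * y**2

def path(c, t):
    """Path from (0, 0) to u1 = (2, 0); the parameter c bends it away from the x-axis."""
    return (2.0 * t, c * math.sin(math.pi * t))

def sup_along_path(c, n=2000):
    """Maximum of F over n + 1 equally spaced points of the path with parameter c."""
    return max(F(*path(c, i / n)) for i in range(n + 1))

def minimax(cs, n=2000):
    """Smallest path maximum over the family, together with the minimizing parameter."""
    return min((sup_along_path(c, n), c) for c in cs)

cs = [k / 20 for k in range(-20, 21)]
alpha, c_best = minimax(cs)
```

The straight path ($c = 0$) is optimal within this family, and its maximum $1/4$ is the value of `F` at the saddle point $(1, 0)$, where the gradient $(x - x^3, y)$ vanishes.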
Let us summarize the essential features of the preceding reasoning:
(1) One chooses a family of sets, here $\Gamma$, that exploits some properties of $F$ and is invariant under the deformation $\psi(\cdot,1)$.
(2) This family yields a minimax value $\alpha$.
(3) $\alpha$ can be estimated from above with the help of any member of our family $\Gamma$ ($\alpha \le \sup_{\tau \in [0,1]} F(\gamma(\tau))$ for any $\gamma \in \Gamma$), and from below through the constraints that the members of $\Gamma$ have to satisfy (in Theorem 9.2.1, every $\gamma \in \Gamma$ intersects $\partial B(0,\rho)$, and therefore $\alpha \ge \beta > 0$, and therefore in particular, the critical point produced is different from 0).
(4) A reasoning by contradiction, based on the deformation theorem, shows that $\alpha$ is a critical value.

As an application of the mountain pass theorem, we consider the following example:

Theorem 9.2.2. Let $\Omega \subset \mathbb{R}^d$ be a bounded domain, $2 < p < \frac{2d}{d-2}$ (respectively $< \infty$ for $d = 1,2$). Then the Dirichlet problem
$$\Delta u + |u|^{p-2} u = 0 \quad \text{in } \Omega \qquad (9.2.1)$$
$$u = 0 \quad \text{on } \partial\Omega \qquad (9.2.2)$$
possesses at least two nontrivial solutions.
Proof. If $u$ is a solution, so is $-u$. Therefore, it suffices to verify the existence of one nontrivial solution. (9.2.1), (9.2.2) are the Euler-Lagrange equations in $H_0^{1,2}(\Omega)$ for the functional
$$F(u) = \frac{1}{2}\int_\Omega |Du|^2 - \frac{1}{p}\int_\Omega |u|^p. \qquad (9.2.3)$$
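This functional is a continuous functional on $H_0^{1,2}(\Omega)$, because $\int_\Omega |Du|^2$ clearly is continuous there, and $\int_\Omega |u|^p$ too, because of the Sobolev Embedding Theorem 3.4.3 as we assume $p < \frac{2d}{d-2}$. $F$ is also differentiable, with
$$DF(u)(\varphi) = \int_\Omega Du \cdot D\varphi - \int_\Omega |u|^{p-2} u \varphi. \qquad (9.2.4)$$
Again, $\varphi \mapsto \int_\Omega Du \cdot D\varphi$ clearly is continuous on $H_0^{1,2}(\Omega)$, whereas
$$\left|\int_\Omega |u|^{p-2} u \varphi\right| \le \left(\int_\Omega |u|^p\right)^{\frac{p-1}{p}} \left(\int_\Omega |\varphi|^p\right)^{\frac{1}{p}} \qquad (9.2.5)$$
$$\le c_0 \left(\int_\Omega |u|^p\right)^{\frac{p-1}{p}} \|\varphi\|_{H^{1,2}(\Omega)} \qquad (9.2.6)$$
by Hölder's inequality and the Sobolev Embedding Theorem 3.4.3 for some constant $c_0$. Thus $F : H_0^{1,2}(\Omega) \to \mathbb{R}$ is of class $C^1$.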
n n
(9.2.7) (9.2.8)
DF(u )
n
^ 0
for u ^ oo.
(9.2.9)
/
'O
Du Dip
n
u y>| 0
n
a 2
d obtain \u \
n p
+j
< c \\u \\ , .
2 n H1 2
(9.2.11)
< c ||w||i,2 + c .
3 A
(9.2.12)
=/
we conclude from (9.2.12)
+j
n
\Du \
n
<c J
5
\Du \
n
(9.2.13)
HnllHi.(fl) ^ 6 -
(9-2-14)
such a sequence ( u ) n N contains a convergent subsequence, thereby completing the verification of (PS). We need to show that, after selection of a subsequence, j \Du
n
Du \
m
for n, ra oo
(9.2.15)
Du D(u
n
- Um) -
|u | ~
n
u (u
n
- u)
m
for
71, ra
> oo
(9.2.16) by (9.2.10), (9.2.14). By the Rellich-Kondrachev theorem (Corollary 3.4.1), we may also assume (by selecting a subsequence) that $(u_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $L^p(\Omega)$. Then, using Hölder's inequality as in (9.2.5),
p-i
i
|tZ -U | ^
n m P
J \u \ ~ y
P n
U (u
n
U)
m
<
(^J \u \ ^j (^J
P n
m
^ 0
(9.2.17)
Du
- D(u
u)
which implies (9.2.15). We have thus verified (PS) for F. We shall now check the remaining assumptions of Theorem 9.2.1. First of all, F(0) = 0. Recalling that by the Sobolev Embedding Theorem 3.4.3 (and the Poincare inequality, see (9.2.13))
i
(/n f
we have F(u) > ( i C8
|D
|!
|||| ) I M I i . . ( n >
with /
n
> 0 >
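if $\|u\|_{H^{1,2}(\Omega)} = \rho$ is sufficiently small. Finally, take any $u_2 \in H_0^{1,2}(\Omega)$, $u_2 \not\equiv 0$. Then for sufficiently large $\lambda > 0$, $u_1 = \lambda u_2$ satisfies
$$F(u_1) = \frac{\lambda^2}{2}\int_\Omega |Du_2|^2 - \frac{\lambda^p}{p}\int_\Omega |u_2|^p < 0,$$
because $p > 2$.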
We have now verified all the assumptions of the mountain pass Theorem 9.2.1, and we consequently get a critical point $u$ of $F$ with
$$F(u) \ge \beta > 0.$$
This is the desired nontrivial solution. (In fact, regularity theory implies that any weak solution of (9.2.1) is smooth in $\Omega$, see e.g. Gilbarg-Trudinger, loc. cit.) q.e.d.

Remark 9.2.1. By the same method, we can also treat the equation
$$\Delta u - \lambda u + |u|^{p-2} u = 0 \qquad (9.2.18)$$
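The two geometric hypotheses of the mountain pass theorem that were verified in the proof of Theorem 9.2.2 can be observed numerically along a single ray in the simplest one-dimensional situation. The sketch below evaluates $F(\lambda u_2) = \frac{\lambda^2}{2}\int |u_2'|^2 - \frac{\lambda^p}{p}\int |u_2|^p$ on $\Omega = (0,1)$ by the midpoint rule. The choices $p = 4$, $u_2(x) = \sin(\pi x)$ and the grid size are assumptions made for illustration; for them the exact value is $\frac{\pi^2}{4}\lambda^2 - \frac{3}{32}\lambda^4$.

```python
import math

P = 4  # illustrative exponent; any p > 2 is admissible for d = 1

def energy(lam, n=2000):
    """Midpoint-rule value of F(lam * u2) for u2(x) = sin(pi x) on (0, 1)."""
    h = 1.0 / n
    dirichlet = 0.0  # approximates the integral of |D(lam u2)|^2
    potential = 0.0  # approximates the integral of |lam u2|^p
    for i in range(n):
        x = (i + 0.5) * h
        du = lam * math.pi * math.cos(math.pi * x)
        u = lam * math.sin(math.pi * x)
        dirichlet += du * du * h
        potential += abs(u) ** P * h
    return 0.5 * dirichlet - potential / P

small = energy(0.1)  # positive: the quadratic term dominates near 0
large = energy(6.0)  # negative: the term of order p dominates for large lam
```

Along this ray the functional is positive for small $\lambda > 0$ and negative for large $\lambda$, which is the one-dimensional shadow of hypotheses (i) and (ii) of Theorem 9.2.1.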
9.3 Topological indices and critical points

In Section 3.2 of Part I, we have seen an example where a topological construction permitted us to deduce the existence of more than one (unstable) critical point of a functional. In the present section, we first give an axiomatic approach to such constructions and then apply this in conjunction with the Palais-Smale condition to a concrete variational problem to show the existence of infinitely many solutions. Such global topological constructions originated with the work of Lyusternik. Contributors also include Schnirelman, and, more recently, Rabinowitz, and many others. The reader will find detailed references in the monographs quoted at the end of this chapter.

Definition 9.3.1. Let $X$ be a topological space, $F : X \to \mathbb{R}$ continuous. $x \in X$ is called a special point for $F$, with value $\alpha$, $x \in \operatorname{spec}_\alpha F$, if $x$ is contained in all $A \subset X$ with the following property: For each open $U \supset A$ there exist $\varepsilon = \varepsilon(U) > 0$ and a continuous $\psi : X \times [0,1] \to X$ satisfying
(i) $\psi(y,0) = y$ for $y \in X$
(ii) $F(\psi(y,t)) \le F(\psi(y,s))$ for all $y \in X$, $0 \le s \le t \le 1$
(iii) For every $y \in X \setminus U$ with $F(y) < \alpha + \varepsilon$, we have $F(\psi(y,1)) < \alpha - \varepsilon$.
Of course, the $\psi$ of the preceding definition is an abstract version of the deformations constructed in Section 9.1, and the notion of special point is a topological version of the notion of critical point.

Remark 9.3.1. Since the composition of any two deformations $\psi_1, \psi_2$ satisfying the properties of Definition 9.3.1 continues to satisfy these properties, the intersection of any two sets $A_1, A_2$ still satisfies the property expressed in Definition 9.3.1 if $A_1, A_2$ do. Therefore, if $\operatorname{spec}_\alpha F = \emptyset$, we may take $U = A = \emptyset$ in Definition 9.3.1 and find a deformation $\psi$ that satisfies (i)-(iii) for all $y \in X$.
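In order to illustrate the notion of special point as well as the topological constructions to follow, we now present the simple:

Lemma 9.3.1. Let $F : X \to \mathbb{R}$ be a continuous function on a topological space $X$. Let $\mathcal{M}$ be a (nonempty) family of subsets of $X$. If $\operatorname{spec}_\alpha F = \emptyset$, we require that $\mathcal{M}$ is invariant under the deformations $\psi$ considered in Definition 9.3.1, i.e. if $A \in \mathcal{M}$, then also $\psi(A,1) \in \mathcal{M}$. Suppose
$$-\infty < \alpha = \inf_{A \in \mathcal{M}} \sup_{y \in A} F(y) < \infty. \qquad (9.3.2)$$
Then $\operatorname{spec}_\alpha F \ne \emptyset$.

Proof. Suppose $\operatorname{spec}_\alpha F = \emptyset$. According to the preceding remark, we may then take $U = \emptyset$ and find $\psi : X \times [0,1] \to X$ and $\varepsilon > 0$ with
$$F(\psi(y,1)) < \alpha - \varepsilon \quad \text{for all } y \text{ with } F(y) < \alpha + \varepsilon. \qquad (9.3.3)$$
By the definition of $\alpha$, there exists $A_0 \in \mathcal{M}$ with
$$\sup_{y \in A_0} F(y) < \alpha + \varepsilon. \qquad (9.3.4)$$
However, if we take $A_1 := \psi(A_0, 1)$ then $A_1 \in \mathcal{M}$ by assumption, and by (9.3.3)
$$\sup_{y \in A_1} F(y) < \alpha - \varepsilon,$$
contradicting the definition of $\alpha$.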
q.e.d.

In order to obtain the existence of further special points, we now shall introduce the notion of a (topological) index. Such an index is based on symmetry or invariance properties of the functional under consideration. Here, we only consider the case of the simplest nontrivial symmetry group, namely $\mathbb{Z}_2$, although the subsequent constructions easily generalize to any compact group $G$. We thus make the following symmetry assumptions:
X is a topological space with a nontrivial involution, i.e. there exists a continuous map j : X > X , j ^ id, w i t h
j
2
= id.
M := {A C X I j(A) = A and for all (i.e. A contains no fixed points of j ) } . We now also require ip(j(x),t) Definition 9.3.1.
G M:
(i) i(A) = 0 ^ A = 0 (ii) (iii) (iv) (v) A finite (A ^ 0) i(A) = 1 A ) < i(Ai) + i(A ) Ai C A = < i(A ) t(i4) < i(j(A))
2(^1 U
2 2 2 2
(vi) $A$ compact $\Rightarrow$ $\exists$ neighbourhood $U$ of $A$ in $X$ with $U \in \mathcal{M}$, $i(A) = i(U) < \infty$.
For $n \in \{0, 1, 2, \ldots, \infty\}$, we put
$$\mathcal{M}_n := \{A \in \mathcal{M} \mid i(A) \ge n\}.$$
Remark 9.3.2. More precisely, one should call an $i$ as in Definition 9.3.2 an index for $(X, F, \mathbb{Z}_2)$, in order to specify the symmetry group involved.

For $n \in \{0, 1, 2, \ldots, \infty\}$, we define
$$\alpha_n := \inf_{A \in \mathcal{M}_n} \sup_{y \in A} F(y).$$
00 < a
n
< oof
(i)
Then spec
Q n
F^0
n n n
(9.3.6)
= c* +i = . . . = a _|_fc, ^ e n
Proof. We note that property (v) of Definition 9.3.2 implies that $\mathcal{M}_n$ is invariant under (symmetric) deformations $\psi$. Therefore, Lemma 9.3.1 implies $\operatorname{spec}_{\alpha_n} F \ne \emptyset$. For the second statement, we claim that for $A_0 = \operatorname{spec}_{\alpha_n} F$,
$$i(A_0) \ge k + 1. \qquad (9.3.7)$$
If $k \ge 1$, property (ii) of Definition 9.3.2 then implies the existence of infinitely many special points with value $\alpha_n$. Suppose on the contrary that
$$i(A_0) \le k. \qquad (9.3.8)$$
(9.3.9)
Since $A_0$ consists of special points, we may find a (symmetric) deformation $\psi$ with
$$F(\psi(y,1)) < \alpha_n - \varepsilon \quad \text{for all } y \in X \setminus U \text{ with } F(y) < \alpha_n + \varepsilon.$$

† Since the infimum over an empty set is $\infty$, this contains the assumption $\mathcal{M}_n \ne \emptyset$.
condition
n
a
yA
n +
k , we may find A e M +k
n
with
+ e,
(9.3.10)
hence $A \setminus U \ne \emptyset$ by (i). Since, as noted in the beginning, $\mathcal{M}_n$ is invariant under $\psi$, we get
$$\psi(A \setminus U, 1) \in \mathcal{M}_n,$$
hence
$$\sup_{y \in \psi(A \setminus U, 1)} F(y) \ge \alpha_n,$$
contradicting (9.3.10). q.e.d.

In order to apply the preceding considerations, we need to construct an index with the properties listed in Definition 9.3.2. We shall present here Coffman's version of the genus of Krasnoselskij.

Definition 9.3.3. Suppose the symmetry assumptions stated before Definition 9.3.2 hold. The genus of $A \ne \emptyset$, $A \in \mathcal{M}$ is defined as follows:
$$\operatorname{gen}(A) := \inf\{n \in \{1, 2, 3, \ldots, \infty\} \mid \exists\, f : A \to \mathbb{R}^n \setminus \{0\} \text{ continuous with } f(j(x)) = -f(x) \text{ for all } x \in A\},$$
while $\operatorname{gen}(\emptyset) := 0$.
As an example, we state:

Lemma 9.3.2. The genus of the unit sphere $S^{n-1} = \{\|x\| = 1\}$ in $\mathbb{R}^n$ (with involution $j(x) = -x$) is equal to $n$.

Proof. The inclusion map $S^{n-1} \to \mathbb{R}^n \setminus \{0\}$ satisfies the properties of Definition 9.3.3, and so $\operatorname{gen}(S^{n-1}) \le n$. If $n \ge 2$, $S^{n-1}$ is connected, and therefore, by the intermediate value theorem, there is no continuous map $f : S^{n-1} \to \mathbb{R}^1 \setminus \{0\}$ with $f(-x) = -f(x)$ for all $x$. Hence $\operatorname{gen}(S^{n-1}) \ge 2$. In fact, by the Borsuk-Ulam theorem, there is no such continuous map to $\mathbb{R}^m \setminus \{0\}$ with $m < n$. Therefore, $\operatorname{gen}(S^{n-1}) \ge n$. q.e.d.
Corollary 9.3.1. The genus of the unit sphere $S := \{x \in V : \|x\| = 1\}$ in an infinite dimensional Banach space $(V, \|\cdot\|)$ is $\infty$.

Proof. For any $n$-dimensional subspace $V_n$ of $V$,
$$\operatorname{gen}(S) \ge \operatorname{gen}(S \cap V_n) \ge n.$$
Theorem 9.3.2. The genus as defined in Definition 9.3.3 is an index in the sense of Definition 9.3.2.

Proof. We need to check the properties (i)-(vi) of Definition 9.3.2.
(i) is obvious.
(ii) If $A \in \mathcal{M}$ is finite, then $A$ is of the form $\{x_\nu, j(x_\nu) \mid \nu = 1, \ldots, k\}$ for some $k$. We define $f : A \to \mathbb{R} \setminus \{0\}$ by $f(x_\nu) = 1$, $f(j(x_\nu)) = -1$ for all $\nu$ (of course, we may assume $x_\mu \ne j(x_\nu)$ for all $\mu, \nu$).
(iii) Let $\operatorname{gen}(A_\nu) = n_\nu < \infty$, $\nu = 1,2$, and let the continuous $f_\nu : A_\nu \to \mathbb{R}^{n_\nu} \setminus \{0\}$ satisfy $f_\nu(j(x)) = -f_\nu(x)$ for all $x$. By the Tietze extension theorem, $f_\nu$ can be continuously extended to $f_\nu : X \to \mathbb{R}^{n_\nu}$ with the same symmetry property. The map $(f_1, f_2) : A_1 \cup A_2 \to \mathbb{R}^{n_1 + n_2} \setminus \{0\}$ then shows $\operatorname{gen}(A_1 \cup A_2) \le n_1 + n_2$.
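(vi) Let $A \in \mathcal{M}$ be compact. Since $j(x) \ne x$ for all $x \in A$ (by the properties of $\mathcal{M}$), for each $x \in A$, we may find a neighbourhood $U(x)$ with $U(x) \cap j(U(x)) = \emptyset$. Since $A$ is compact, it can be covered by finitely many such neighbourhoods $U_\nu$, $\nu = 1, \ldots, n$. For each $\nu$, we choose a continuous function $\varphi_\nu : X \to \mathbb{R}$ with $\varphi_\nu(x) > 0$ for $x \in U_\nu$, $\varphi_\nu(x) = 0$ for $x \in X \setminus U_\nu$. We then define $h = (h^1, \ldots, h^n) : A \to \mathbb{R}^n \setminus \{0\}$ by
$$h^\nu(x) := \begin{cases} \varphi_\nu(x) & \text{for } x \in U_\nu \\ -\varphi_\nu(j(x)) & \text{for } x \in j(U_\nu) \\ 0 & \text{otherwise.} \end{cases}$$
(Since every $x \in A$ is contained in some $U_\nu$, we have $h(x) \ne 0$ for all $x \in A$.) Thus $\operatorname{gen}(A) \le n < \infty$.
If $A \in \mathcal{M}$ is compact with $\operatorname{gen}(A) = n$, and $f : A \to \mathbb{R}^n \setminus \{0\}$ is continuous with
$$f(j(x)) = -f(x),$$
we may extend $f$ as before to $f : X \to \mathbb{R}^n$ (with the same symmetry property). Since $A$ is compact, so is $f(A)$, and therefore, we may find an open neighbourhood $V$ of $f(A)$ with $V \subset \mathbb{R}^n \setminus \{0\}$, which we may take symmetric, $V = -V$. Then $U := f^{-1}(V)$ satisfies $A \subset U$ and
$$\operatorname{gen}(U) \le n.$$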
Thus $\operatorname{gen}(U) = \operatorname{gen}(A)$ as required. q.e.d.

We may now obtain a general existence theorem for critical points of functionals satisfying (PS):

Theorem 9.3.3. Let $F, G : H \to \mathbb{R}$ be $C^2$ functionals on a Hilbert space $(H, (\cdot,\cdot))$ that are even, i.e. $F(x) = F(-x)$, $G(x) = G(-x)$ for all $x \in H$. Suppose $F$ satisfies (PS) relative to $G = \beta$, and is bounded from below. Let
$$\mathcal{M} := \{A \subset \{G(x) = \beta\} \mid 0 \notin A \text{ and } (x \in A \Rightarrow -x \in A)\}.$$
Let
$$\gamma_0 := \sup\{\operatorname{gen}(K) \mid K \in \mathcal{M} \text{ compact}\} \quad (\le \infty).$$
Then $F$ possesses at least $\gamma_0$ critical points relative to $G = \beta$.

Proof. Since (PS) holds, by Theorem 9.1.2, all special points (in the sense of Definition 9.3.1) for the restriction of $F$ to $X := \{x \in H \mid G(x) = \beta\}$ are critical points for $F$ relative to $G = \beta$. Hence, it suffices to produce $\gamma_0$ special points of $F$ on $X$. Let
$$\alpha_n := \inf_{A \in \mathcal{M},\ \operatorname{gen}(A) \ge n}\ \sup_{x \in A} F(x).$$
Since $F$ is bounded below, and since in the definition of $\gamma_0$, we only consider compact sets, we have
$$-\infty < \alpha_n < \infty \quad \text{for } 1 \le n \le \gamma_0.$$
By Theorem 9.3.2, we may apply Theorem 9.3.1 to the genus as an index. We have in fact
$$-\infty < \alpha_1 \le \alpha_2 \le \ldots \le \alpha_n \le \ldots$$
If all these numbers are different, the special points $x_n \in \operatorname{spec}_{\alpha_n} F$ produced by Theorem 9.3.1 (i) are all different, because their values $F(x_n)$ are all different. If however any two such numbers $\alpha_{n-1}$ and $\alpha_n$ are equal, then by Theorem 9.3.1 (ii) we even obtain infinitely many special points. Thus, in any case, we have at least $\gamma_0$ special, hence critical points. q.e.d.
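As an application of Theorem 9.3.3, we consider the example of the previous section:

Corollary 9.3.2. Let $\Omega \subset \mathbb{R}^d$ be a bounded domain, $2 < p < \frac{2d}{d-2}$ (respectively $< \infty$ for $d = 1,2$). Then for any $\lambda \ge 0$, the Dirichlet problem
$$\Delta u - \lambda u + |u|^{p-2} u = 0 \quad \text{in } \Omega \qquad (9.3.11)$$
$$u = 0 \quad \text{on } \partial\Omega \qquad (9.3.12)$$
possesses infinitely many solutions.

Proof. We consider the even functionals
$$F(u) = \frac{1}{2}\int_\Omega \left(|Du|^2 + \lambda |u|^2\right), \qquad G(u) = \frac{1}{p}\int_\Omega |u|^p$$
on $H_0^{1,2}(\Omega)$ and verify that $F$ satisfies (PS) relative to $G = 1$, similarly to the argument employed for the demonstration of Theorem 9.2.2: let $(u_n)_{n \in \mathbb{N}}$ be a critical sequence, i.e.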
n
F(u )
n
< ci
(9.3.13)
forn->oo
(9.3.14)
where all norms and scalar products are from H Q ' ( Q ) . From (9.3.13) (and the Poincare inequality in case A = 0), we obtain
(9.3.15)
We obtain as in the proof of Theorem 9.2.2 (cf. (9.2.5)), by using Holder's inequality, that \DG(u )(u
n
- u )\
m
= j
\u \
n
u (u -Um)
n n
2^1
" ( /
| U n | P
( / l ^ -
- |
' (
Since p < from (9.3.15) and Sobolev's Embedding Theorem 3.4.3, we conclude that / | u | is bounded, whereas (9.3.15) and the RellichKondrachev theorem (Corollary 3.4.1) imply that (u ) ^ is a Cauchy sequence in L ( f i ) . Thus, from (9.3.16)
p n n ne p
DG(u )(u
n
Um) > 0
for n, m oo.
(9.3.17)
Also \\DG(u )
n
\DG(u )(w)\
n
>
||U||1,2
!K\
(9.3.18)
||n||fl-.a
> 0 - / \u \ PJ
n p
= G(u )
= 1.
(9.3.19)
9.3 Topological indices and critical points Prom (9.3.17), (9.3.18) we conclude that there exist h DG(u )(u
n n n m
315 e H^ (ft)
2
- Um + h )
nm
= 0
2
l l ^ n m ! ! / ^ -
- u
4 h )
nm
-+ 0,
i.e. j {Du
n
(D(u
- u)
m
4 Dh )
nm
4 Xu {u
n
- u
+ h ))
nrn
- u ))
m
4 Xu (u
n
- Um))
0.
4 A | ( u - u )\ ^
n m
-+ 0
forn,ra->oo,
,2
In order to apply Theorem 9.3.3, we thus only need to check that in the present case, $\gamma_0 = \infty$. However,
$$\{u \in H_0^{1,2}(\Omega) : G(u) = 1\}$$
is the intersection of a sphere centered at the origin in $L^p(\Omega)$ with the subspace $H_0^{1,2}(\Omega)$. Therefore, the argument of Lemma 9.3.2 easily implies $\gamma_0 = \infty$. Theorem 9.3.3 thus produces infinitely many solutions $u_n$ of
$$DF(u_n) = \mu_n\, DG(u_n) \quad \text{with} \quad \mu_n = \frac{(DG(u_n), DF(u_n))}{\|DG(u_n)\|^2},$$
i.e. weak solutions of
$$\Delta u_n - \lambda u_n + \mu_n |u_n|^{p-2} u_n = 0 \quad \text{in } \Omega$$
$$u_n = 0 \quad \text{on } \partial\Omega.$$
If we choose $\nu_n$ with $\nu_n^{2-p} \mu_n = 1$, then $v_n := \nu_n u_n$ solves (9.3.11), (9.3.12) weakly. Again, we remark that elliptic regularity theory implies that all $u_n$ and $v_n$ are smooth in $\Omega$, so that in fact we obtain classical solutions of (9.3.11), (9.3.12). q.e.d.
In Theorem 9.2.2 and in Corollary 9.3.2, we had imposed the restriction
$$p < \frac{2d}{d-2} \quad (\text{in case } d \ge 3),$$
and the reader may wonder whether this is necessary. To pursue this question, we shall now discuss the theorem of Pohozaev:

Theorem 9.3.4. Let $\Omega \subset \mathbb{R}^d$ be a smooth domain which is strictly star shaped w.r.t. $0 \in \mathbb{R}^d$ (this means that the outer normal $\nu$ of $\partial\Omega$ satisfies $(x, \nu(x)) > 0$ for all $x \in \partial\Omega$). Then for $\lambda \ge 0$, any solution of
$$\Delta u - \lambda u + u|u|^{\frac{4}{d-2}} = 0 \quad \text{in } \Omega \qquad (9.3.22)$$
$$u = 0 \quad \text{on } \partial\Omega \qquad (9.3.23)$$
vanishes identically.

We shall present a complete proof only for $\lambda > 0$ and for smooth solutions $u$ (elliptic regularity implies that any weak solution of (9.3.22), (9.3.23) is automatically smooth† on $\Omega$, but the present book does not treat this topic): We multiply (9.3.22) by $\sum_{i=1}^d x^i \frac{\partial u}{\partial x^i}$ and obtain
(9.3.24)
= div
+
( W g ~^f~r
2 +
w*
(9.3.25)
^ | D u |
^ H
- - ^ H ^ .
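By (9.3.23), we have $u = 0$ on $\partial\Omega$, hence also $\sum_i x^i \frac{\partial u}{\partial x^i} = \sum_i x^i \nu^i \frac{\partial u}{\partial \nu}$ on $\partial\Omega$ ($\nu = (\nu^1, \ldots, \nu^d)$ is the exterior normal of $\Omega$). Integrating (9.3.25) therefore yields
$$\frac{d-2}{2}\int_\Omega |Du|^2 + \frac{\lambda d}{2}\int_\Omega |u|^2 - \frac{d-2}{2}\int_\Omega |u|^{\frac{2d}{d-2}} + \frac{1}{2}\int_{\partial\Omega} (x, \nu) \left|\frac{\partial u}{\partial \nu}\right|^2 = 0. \qquad (9.3.26)$$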
† See for example Appendix B in M. Struwe, Variational Methods, Springer, Berlin, 2nd edition, 1996.
On the other hand, multiplying (9.3.22) by $u$ leads to
$$\int_\Omega |Du|^2 + \lambda \int_\Omega |u|^2 = \int_\Omega |u|^{\frac{2d}{d-2}}. \qquad (9.3.27)$$
Equations (9.3.26) and (9.3.27) imply
$$\lambda \int_\Omega |u|^2 + \frac{1}{2}\int_{\partial\Omega} (x, \nu) \left|\frac{\partial u}{\partial \nu}\right|^2 = 0. \qquad (9.3.28)$$
If $\lambda > 0$, this implies $u \equiv 0$, hence the result. (If $\lambda = 0$, one still concludes that $\frac{\partial u}{\partial \nu} = 0$ on $\partial\Omega$. Since also $u = 0$ on $\partial\Omega$ by (9.3.23), one may invoke a unique continuation theorem for solutions of elliptic equations to obtain $u \equiv 0$ in $\Omega$. We omit the details.) q.e.d.

Theorem 9.3.4 implies that for $p = \frac{2d}{d-2}$ in Theorem 9.2.2 and Corollary 9.3.2, the Palais-Smale condition no longer holds. Namely, if it did, the proofs of those results would yield the existence of nontrivial solutions. It also shows that if the Palais-Smale condition fails, the whole scheme developed in the present chapter for producing critical points breaks down. Since for $p < \frac{2d}{d-2}$, (PS) does hold, the case $p = \frac{2d}{d-2}$ can be considered as a limit case for (PS). In fact, such limit cases of the Palais-Smale condition occur in many variational problems that are of importance in Riemannian geometry, e.g. the Yang-Mills functional on a four-dimensional Riemannian manifold, two-dimensional harmonic maps, surfaces of constant mean curvature, the Yamabe functional etc. The interested reader is for example referred to
K. C. Chang, Infinite Dimensional Morse Theory and Multiple Solution Problems, Birkhäuser, Boston, 1993,
J. Jost, Riemannian Geometry and Geometric Analysis, Springer, Berlin, 2nd edition, 1998,
M. Struwe, Variational Methods, Springer, Berlin, 2nd edition, 1996,
and the references contained therein.

The basic references that have been used in writing the present chapter are the monograph of M. Struwe just quoted, as well as
P. Rabinowitz, Minimax Methods in Critical Point Theory with Applications to Differential Equations, CBMS Reg. Conf. Ser. 65, AMS, Providence, 1986
and
E. Zeidler, Nonlinear Functional Analysis and its Applications, III, Springer, Berlin, 1984.
These three monographs contain not only detailed bibliographical references, which the reader is urged to consult in order to find the original sources of the results of the present chapter, but also many further results and examples concerning the Palais-Smale condition and index theories.
Exercises

9.1 Why is Theorem 9.2.1 called 'mountain pass theorem'? Hint: Try to find an analogy between the statement of that result and the geometry of mountain passes.
9.2 Try to find conditions for a function
$$f : \Omega \times \mathbb{R} \to \mathbb{R}$$
so that the reasoning of Theorem 9.2.2 can be extended to the Dirichlet problem
$$\Delta u(x) = f(x, u(x)) \quad \text{for } x \in \Omega$$
$$u(x) = 0 \quad \text{for } x \in \partial\Omega$$
in a smooth bounded domain $\Omega$. (An answer can be found in Theorem 6.2 of the quoted monograph of M. Struwe.)
9.3 Develop an index theory for a general compact group $G$ in place of $\mathbb{Z}_2$.
9.4 Extend Theorem 9.1.3 to the relative case as indicated at the end of Section 9.1.
Index
C ( n , R ) , xvi
fc
INI.
(v),
H/IL
R + := {t e R | t > 0 } , 130 V * := { / V R linear with | | / I U < o o } , 133 (V*)* = : V * * , 133 3Jn * X , 135 M := {a: # : (x, y) = 0 for all y A / } , 141
x
| | T | | := s u p ^ l | ^ f l e R + U {oo}, 144
x 0
6 / ( U , T J ) := ^ / ( u + t y ) , ^ 1 9
= 0
19
/ o ( E i i ( 0 ) * * , 32
E(c) :=yT\c(t)\ dt
2
fl'y.fc
r
:=
~$t9v>
Jfc
: =
^(gjitk+gku-gjid),
39
:=
L ( V , W), 145 k e r T : = {a; 6 V : To; = 0 } , 145 V = V i e V , 146 coker, 147 H ( T ) , 147 i n d T , 147 F ( V , W ) , 147 DF(u), 150 C , 150 C , 150 > F ( u ) , 150 ODE, 155 I M | o : = s u p | | y ( t ) | | , 156 H/llp = H / H L P ) : = U \f(x)\ dx)p, 159 esssup f(x) := inf { A G R | f(x) < X for almost all xeA}, 162
2 1 2 2 c t / : p M A x j 4
C (n), 166
0
supp^J, 166
ft, 167 Borel cr-algebra, 117 brachystochrone, 4 canonical equation, 85, 89, 95, 97, 99-101, 111 canonical equations, 80, 93 canonical system, 80 canonical transformation, 95-100, 103 Cantor diagonalization, 135 catenary, 283 catenoid, 283 Cauchy sequence, 126 characteristic function, 119, 211, 243 Christoffel symbols, 39 classical calculus of variations, 3 closed geodesic, 67 coarea formula, 250, 257 coercive, 186 coercivity condition, 291 Coffman, 310 cokernel, 147 compactness condition, 183 compactness of critical sequences, 292 complementary subspace, 146 complete, 126, 134 complete integral, 84, 93 completely integrable, 100 conjugate, 22, 24 conjugate point, 43 conservation law, 26 conserved quantities, 26 constant of motion, 80, 99 continuous linear functional, 133 continuous linear operator, 144 control condition, 109 control equation, 106, 108, 109, 111, 207 control parameter, 104 control problem, 109 control restriction, 105 control variable, 111, 207 converge, 125 convex, 68, 127, 130, 143, 186, 191, 193, 214, 219, 222 convex combination, 142 convex curve, 68 convex function, 122 convex functional, 188 convexity, 291 coordinate transformation, 36 cost, 105 cost function, 207 countable base, 184 countably additive, 118 critical family, 75 critical point, 5, 62, 66, 293, 294, 298, 301, 303, 306, 307, 312, 317 critical sequence, 291, 292, 304, 314
ft' cc
a
fh =3 / , 167
:= ( a , . . . , a ) ,
x d
171
ll
t; := D w , 171 W >*>(ft), 171
a fc
171
f e
y X
( A F ( y ) + d ( * , t / ) ) , 190
J (x),
191
(A:=Eti(afV),199
s c ~ F , 208 i , 210 9 " / , 216 r-limn-oo F,
A n
225
V(ft), 242
\\Du\\, 242 N I B V ( Q ) '= \M\mn) l^l -i,243
d X
+ H l l ( )>
Dw
2 4 2
P(E,Q) :=\\D E\\(n), 244 * u(x), 246 G / ( d , R ) , 257 0 ( d , R ) , 257 J\{u)v, 270 (,) 2,283 ( P S ) , 292 K , 292 V F ( w ) , 294 s p e c , 306 gen(A), 310 accessory variational problem, 19 accumulation point, 185, 208 Ambrosetti, 302 angular momentum, 26, 28, 30 arc-length, 3 Arzela-Ascoli theorem, 176
L a a
Banach fixed point theorem, 150, 152 Banach space, 126, 129, 132-134, 138, 145, 161, 162, 270, 291, 292, 299-301 Banach spaces, 150 Bellman equation, 105, 108 Bellman function, 105, 107 Bellman's method, 106 bifurcation theory, 268, 270 Borel measure, 118 Borel set, 117
critical value, 302, 303 cusp catastrophe, 279 de Giorgi, 225 deformation, 293, 294, 297, 298, 302, 307-309 dense, 169 diffeomorphism, 34, 95 differentiable, 150 differentiable map, 150 differentiation under the integral, 124 Dirac delta distribution, 173 Dirac distribution, 166 direct method, 183 Dirichlet boundary condition, 3, 26, 183, 190 Dirichlet principle, 199 Dirichlet's integral, 199, 203 distance, 51 distance function from a smooth hypersurface, 262 distributional derivative, 173 dual space, 133, 163 eiconal, 82 eiconal equation, 83, 86, 90 elementary catastrophes, 279 ellipticity assumption, 198 energy, 26, 30, 32, 34 e-minimizer, 229 equivalence classes of functions, 159 essential supremum, 162 Euler-Lagrange equation, 6, 8-10, 16, 17, 19, 21-23, 29, 38, 60, 79, 80, 83, 88, 89, 111, 197, 267, 282, 303 example of Bolza, 206 extension, 130 Federer, 261 feedback control, 109 Fermat's principle, 4 field of geodesies, 46 field of solutions, 90, 93 finite perimeter, 244 first axiom of countability, 137, 184, 185, 209, 225, 227, 228 first conjugate point, 23 first integral of motion, 30 flow, 298 foliated by tori, 100 Frechet differentiable, 150 Fredholm alternative, 149 Fredholm operator, 147-149, 270, 281, 287 free boundary condition, 26 Friedrichs mollifier, 166
fundamental lemma of the calculus of variations, 5 T-convergence, 225, 227, 229, 231 generating function, 100 genus, 310, 311, 313 genus of Krasnoselskij, 310 geodesic, 39, 43, 45, 50, 51, 55, 57, 58, 60, 88, 102 geodesic distance, 82, 90, 93 geodesic parallel coordinates, 45, 49 geometric optics, 86 gradient, 294, 299 gradient flow, 294 great circle, 42 Holder continuous, 179 Holder's inequality, 160, 163 Hahn-Banach theorem, 129, 134, 137, 143, 166 Hamilton-Jacobi equation, 83-86, 89, 92, 93, 101 Hamilton-Jacobi theory, 111 Hamiltonian, 80, 89 Hamiltonian flow, 95, 98 harmonic, 199, 201 harmonic oscillator, 87 Hessian, 4 Hilbert space, 126, 128, 141, 162, 293, 297 Hilbert's invariant integral, 92 homogenization, 232 implicit function theorem, 151, 152 index, 147, 308, 311, 313, 318 indicator function, 210 inner radius, 70 insulating layer, 235 integrable, 120 integral, 155 integral of motion, 27 integral of the Hamiltonian flow, 99 invariant integral, 93 inverse function theorem, 154 inverse operator theorem, 145 involution, 308 isometry, 34 Jacobi, 22 Jacobi equation, 20, 24, 268 Jacobi field, 20-22, 24, 269 Jacobi identity, 103 Jacobi operator, 268, 284 Jacobi's method, 99 Jensen's inequality, 122 Jordan curve, 35 Jordan curve Theorem, 68
Kakutani, 139 Kepler problem, 102 Kolmogorov-Arnold-Moser theory, 100 Kondrachev, 175 Lagrange multiplier, 9 Laplace operator, 199, 200 Lebesgue integral, 117, 120 Lebesgue measure, 117, 118 Legendre condition, 20, 112 Legendre transformation, 79, 88 length, 32, 34 length minimizing curve, 8 light ray, 4 limit cases of the Palais-Smale condition, 317 linear functional, 132, 133, 241 linear functionals, 129 linear operator, 144 Lipschitz continuous, 155, 203 local chart, 25, 32 local minimum, 22 lower semicontinuity, 184 lower semicontinuous, 185, 186, 188, 193, 208, 230 lower semicontinuous w.r.t. weak convergence, 187 lower semicontinuous envelope, 208 Lyapunov-Schmidt, 280 Lyapunov-Schmidt reduction, 269 Lyusternik, 306 Lyusternik-Schnirelman, 67 mean curvature, 263 mean value property, 201 measurable, 118-120 measure, 117 metric tensor, 33, 47 minimal hypersurface, 255 minimal hypersurfaces, 203 minimal surface of revolution, 282 minimax value, 303 minimizer, 4-6, 12, 183, 186, 229, 291, 302 minimizer of a convex variational problem, 189 minimizing, 3 minimizing sequence, 183 Minkowski functional, 143 Minkowski's inequality, 160 Modica, 254 Möbius strip, 75, 76 mollification, 167, 174, 175, 200, 245 momenta, 80 momentum, 26, 28, 30 monotonically increasing sequence, 122 Moreau-Yosida approximation, 190
Moreau-Yosida transform, 212 Morrey, 222 mountain pass theorem, 302, 303, 306, 318 neighbourhood system, 184 Newtonian motion, 81 Noether, 26 nonminimizing critical point, 291, 302 norm, 125 norm convergence, 125, 132 norm of a linear functional, 133 normed space, 125 null class, 159 null function, 159 optimal control theory, 111, 207 ordinary differential equation, 155 ordinary differential equations in Banach spaces, 155 orthogonal, 90 orthogonal complement, 141 Palais, 299 Palais-Smale condition, 77, 292, 293, 304, 306, 312, 317 parallel surfaces, 92 parallelogram identity, 128 parameterization invariant, 34 parameterized by arc-length, 8, 35, 36, 43, 88 parameterized proportionally to arc-length, 35, 38, 55, 89 perimeter, 244 phase space, 98, 100 phase transition, 254 Picard-Lindelöf theorem, 155 Poincaré inequality, 177, 304 Poisson bracket, 102 polar coordinate, 49 polar coordinates, 48 Pontryagin function, 110, 111 Pontryagin maximum principle, 110-112 principal curvature, 263 projection theorem, 142 proper, 62 pseudo-gradient, 299, 300 quasiconvex, 219, 222 quasilinear partial differential equation, 198 Rabinowitz, 302, 306 Radon measure, 118, 241 range, 147 rectifiable, 35
reflexive, 134, 135, 137-139, 163, 174, 186 regularity, 11 regularity theory, 198, 286, 306, 316 regularizing term, 255 relative minimum, 62, 66 relatively compact, 167 relaxation, 208 relaxed function, 208, 214 relaxed functional, 209 Rellich, 175 Rellich-Kondrachev theorem, 305 reparameterization, 8 Riccati equation, 86, 108 Riemannian manifold, 43, 52, 53 Riemannian normal coordinates, 48 Riemannian polar coordinate, 49, 51, 60 Riesz representation theorem, 241 rotational invariance, 200 Sard's theorem, 250, 257 scalar product, 126 Schnirelman, 306 Schwarz inequality, 35, 127 second axiom of countability, 184 second variation, 18, 23 semigroup family, 299 semigroup property, 157, 294, 297, 300 separable, 135, 169, 173, 184, 186 shortest geodesic, 52, 53, 55 shortest length, 50 σ-algebra, 117 signed measure, 242 simple function, 119 smoothing kernel, 166 Sobolev Embedding Theorem, 175, 179, 303, 305 Sobolev inequalities, 179 Sobolev space, 171, 173 special point, 306-309, 312 special value, 306, 307 sphere, 39 star shaped, 316 state variable, 207 step function, 119 strictly normed, 157 strong convergence, 125, 174 submanifold, 24, 32, 43, 52, 53 summation convention, xv, 19 support, 166 surface of revolution, 60, 282 symmetry assumption, 308-310 symplectic geometry, 96 symplectomorphism, 97 Taylor expansion, 274 test functions, 166 theorem of B. Levi, 122
theorem of Clarkson, 164 theorem of de Giorgi and Nash, 198 theorem of E. Noether, 28 theorem of Fatou, 123 theorem of Fubini, 122 theorem of Helly, 132 theorem of Jacobi, 84, 93, 101 theorem of Kondrachev, 180 theorem of Lebesgue, 123 theorem of Liouville, 98 theorem of Lyusternik-Schnirelman, 67 theorem of Mazur, 142 theorem of Milman, 139 theorem of Modica-Mortola, 248 theorem of Morrey, 179 theorem of Picard-Lindelöf, 39, 155 theorem of Pohozaev, 316 theorem of Rellich, 175 theorem of Riesz, 141 theorem of Riesz-Fischer, 161 theorem of Sobolev, 179 theorem on dominated convergence, 123 theory of catastrophes, 279 Thom, 279 topological space, 185 translation invariance, 118 transversality condition, 110 triangle inequality, 125, 126, 159 uniform convergence, 126, 168 uniformly continuous, 168 uniformly convex, 127, 129, 139, 157, 164 unstable critical point, 291 variational problem, 9 volume preserving, 98 weak convergence, 135-137, 142, 174, 186, 214 weak* convergence, 135 weak* convergent, 135 weak derivative, 171, 172 weak limit, 138 weak solution, 306, 315 weak solution of the Jacobi equation, 285 weak topology, 291 weak* topology, 137 weakly convergent, 135, 136 weakly lower semicontinuous, 222 weakly proper, 185 Weierstraß, 46 Weierstrass approximation theorem, 170 Weierstraß condition, 112 Weyl's lemma, 199 Young's inequality, 160 Zorn's lemma, 131