Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Analysis
Gerald Teschl
Gerald Teschl
Fakultät für Mathematik
Oskar-Mogenstern-Platz 1
Universität Wien
1090 Wien, Austria
E-mail: Gerald.Teschl@univie.ac.at
URL: http://www.mat.univie.ac.at/~gerald/
Preface vii
Chapter 0. Introduction 3
§0.1. Linear partial differential equations 3
iii
iv Contents
The present manuscript was written for my course Functional Analysis given
at the University of Vienna in winter 2004 and 2009. It was adapted and
extended for a course Real Analysis given in summer 2011. The last part are
the notes for my course Nonlinear Functional Analysis held at the University
of Vienna in Summer 1998 and 2001. The three parts are to a large extent
independent. In particular, the first part does not assume any knowledge
from measure theory (at the expense of not mentioning Lp spaces).
It is updated whenever I find some errors and extended from time to
time. Hence you might want to make sure that you have the most recent
version, which is available from
http://www.mat.univie.ac.at/~gerald/ftp/book-fa/
vii
viii Preface
Gerald Teschl
Vienna, Austria
August, 2013
Part 1
Functional Analysis
Chapter 0
Introduction
∂ ∂2
u(t, x) = u(t, x). (0.1)
∂t ∂x2
Here u(t, x) is the temperature distribution at time t at the point x. It
is usually assumed, that the temperature at x = 0 and x = 1 is fixed, say
u(t, 0) = a and u(t, 1) = b. By considering u(t, x) → u(t, x)−a−(b−a)x it is
clearly no restriction to assume a = b = 0. Moreover, the initial temperature
distribution u(0, x) = u0 (x) is assumed to be known as well.
3
4 0. Introduction
Since finding the solution seems at first sight not possible, we could try
to find at least some solutions of (0.1) first. We could for example make an
ansatz for u(t, x) as a product of two functions, each of which depends on
only one variable, that is,
u(t, x) = w(t)y(x). (0.2)
This ansatz is called separation of variables. Plugging everything into
the heat equation and bringing all t, x dependent terms to the left, right
side, respectively, we obtain
ẇ(t) y 00 (x)
= . (0.3)
w(t) y(x)
Here the dot refers to differentiation with respect to t and the prime to
differentiation with respect to x.
Now if this equation should hold for all t and x, the quotients must be
equal to a constant −λ (we choose −λ instead of λ for convenience later on).
That is, we are lead to the equations
− ẇ(t) = λw(t) (0.4)
and
− y 00 (x) = λy(x), y(0) = y(1) = 0, (0.5)
which can easily be solved. The first one gives
w(t) = c1 e−λt (0.6)
and the second one
√ √
y(x) = c2 cos( λx) + c3 sin( λx). (0.7)
However, y(x) must also satisfy the boundary conditions y(0) = y(1) = 0.
The first one y(0) = 0 is satisfied if c2 = 0 and the second one yields (c3 can
be absorbed by w(t)) √
sin( λ) = 0, (0.8)
which holds if λ = (πn)2 , n ∈ N. In summary, we obtain the solutions
2
un (t, x) = cn e−(πn) t sin(nπx), n ∈ N. (0.9)
So we have found a large number of solutions, but we still have not dealt
with our initial condition u(0, x) = u0 (x). This can be done using the
superposition principle which holds since our equation is linear. Hence any
finite linear combination of the above solutions will again be a solution.
Moreover, under suitable conditions on the coefficients we can even consider
infinite linear combinations. In fact, choosing
∞
2
X
u(t, x) = cn e−(πn) t sin(nπx), (0.10)
n=1
0.1. Linear partial differential equations 5
is a metric space and so is Cn together with d(x, y) = ( nk=1 |xk −yk |2 )1/2 .
P
The set
Br (x) = {y ∈ X|d(x, y) < r} (1.2)
is called an open ball around x with radius r > 0. A point x of some set
U is called an interior point of U if U contains some ball around x. If x
7
8 1. A first look at Banach and Hilbert spaces
Then v
n u n n
1 X uX X
√ |xk | ≤ t |xk |2 ≤ |xk | (1.4)
n
k=1 k=1 k=1
shows Br/√n (x) ⊆ B̃r (x) ⊆ Br (x), where B, B̃ are balls computed using d,
˜ respectively.
d,
1.1. Warm up: Metric and topological spaces 9
˜ y) = d(x, y)
d(x, (1.5)
1 + d(x, y)
without changing the topology (since the family of open balls does not
change: Bδ (x) = B̃δ/(1+δ) (x)).
A family of open sets B ⊆ O is called a base for the topology if for each
x and each neighborhood U (x), there is some set O ∈ B with x ∈ O ⊆ U (x).
Since an open set
S O is a neighborhood of every one of its points, it can be
written as O = O⊇Õ∈B Õ and we have
Lemma 1.1. If B ⊆ O is a base for the topology, then every open set can
be written as a union of elements from B.
and the largest open set contained in a given set U is called the interior
[
U◦ = O. (1.7)
O∈O,O⊆U
It is not hard to see that the closure satisfies the following axioms (Kura-
towski closure axioms):
(i) ∅ = ∅,
(ii) U ⊂ U ,
(iii) U = U ,
(iv) U ∪ V = U ∪ V .
In fact, one can show that they can equivalently be used to define the topol-
ogy by observing that the closed sets are precisely those which satisfy A = A.
We can define interior and limit points as before by replacing the word
ball by open set. Then it is straightforward to check
Lemma 1.2. Let X be a topological space. Then the interior of U is the
set of all interior points of U and the closure of U is the union of U with
all limit points of U .
Proof. From every dense set we get a countable base by considering open
balls with rational radii and centers in the dense set. Conversely, from every
countable base we obtain a dense set by choosing an element from each
element of the base.
Lemma 1.5. Let X be a separable metric space. Every subset Y of X is
again separable.
Proof. Let A = {xn }n∈N be a dense set in X. The only problem is that
A ∩ Y might contain no elements at all. However, some elements of A must
be at least arbitrarily close: Let J ⊆ N2 be the set of all pairs (n, m) for
which B1/m (xn ) ∩ Y 6= ∅ and choose some yn,m ∈ B1/m (xn ) ∩ Y for all
(n, m) ∈ J. Then B = {yn,m }(n,m)∈J ⊆ Y is countable. To see that B is
dense, choose y ∈ Y . Then there is some sequence xnk with d(xnk , y) < 1/k.
Hence (nk , k) ∈ J and d(ynk ,k , y) ≤ d(ynk ,k , xnk ) + d(xnk , y) ≤ 2/k → 0.
Proof. (i) ⇒ (ii) is obvious. (ii) ⇒ (iii): If (iii) does not hold, there is
a neighborhood V of f (x) such that Bδ (x) 6⊆ f −1 (V ) for every δ. Hence
we can choose a sequence xn ∈ B1/n (x) such that f (xn ) 6∈ f −1 (V ). Thus
xn → x but f (xn ) 6→ f (x). (iii) ⇒ (i): Choose V = Bε (f (x)) and observe
that by (iii), Bδ (x) ⊆ f −1 (V ) for some δ.
The last item implies that f is continuous if and only if the inverse
image of every open set is again open (equivalently, the inverse image of
every closed set is closed). If the image of every open set is open, then f
is called open. A bijection f is called a homeomorphism if both f and
its inverse f −1 are continuous. Note that if f is a bijection, then f −1 is
continuous if and only if f is open.
In a topological space, (iii) is used as the definition for continuity. How-
ever, in general (ii) and (iii) will no longer be equivalent unless one uses
generalized sequences, so-called nets, where the index set N is replaced by
arbitrary directed sets.
The support of a function f : X → Cn is the closure of all points x for
which f (x) does not vanish; that is,
supp(f ) = {x ∈ X|f (x) 6= 0}. (1.12)
If X and Y are metric spaces, then X × Y together with
d((x1 , y1 ), (x2 , y2 )) = dX (x1 , x2 ) + dY (y1 , y2 ) (1.13)
is a metric space. A sequence (xn , yn ) converges to (x, y) if and only if
xn → x and yn → y. In particular, the projections onto the first (x, y) 7→ x,
respectively, onto the second (x, y) 7→ y, coordinate are continuous. More-
over, if X and Y are complete, so is X × Y .
In particular, by the inverse triangle inequality (1.1),
|d(xn , yn ) − d(x, y)| ≤ d(xn , x) + d(yn , y), (1.14)
we see that d : X × X → R is continuous.
Example. If we consider R × R, we do not get the Euclidean distance of
R2 unless we modify (1.13) as follows:
˜ 1 , y1 ), (x2 , y2 )) = dX (x1 , x2 )2 + dY (y1 , y2 )2 .
p
d((x (1.15)
As noted in our previous example, the topology (and thus also conver-
gence/continuity) is independent of this choice.
Proof. Let {Uα } be an open cover for Y and let B be a countable base.
Since every Uα can be written as a union of elements from B, the set of all
B ∈ B which satisfy B ⊆ Uα for some α form a countable open cover for Y .
Moreover, for every Bn in this set we can find an αn such that Bn ⊆ Uαn .
By construction {Uαn } is a countable subcover.
Lemma 1.8 (Stone). In a metric space every countable open cover has a
locally finite open refinement.
Proof. Denote the cover by {Oj }j∈N and introduce the sets
[
Ôj,n = B2−n (x), where
x∈Aj,n
[
Aj,n = {x ∈ Oj \(O1 ∪ · · · ∪ Oj−1 )|x 6∈ Ôk,l and B3·2−n (x) ⊆ Oj }.
k∈N,1≤l<n
To show (ii) let y ∈ Ôj,i and z ∈ Ôk,i with j < k. We will show
d(y, z) > 2−n−m+1 . There are points r and s such that y ∈ B2−i (r) ⊆ Ôj,i
and z ∈ B2−i (s) ⊆ Ôk,i . Then by definition B3·2−i (r) ⊆ Oj but s 6∈ Oj . So
d(r, s) ≥ 3 · 2−i and d(y, z) > 2−i ≥ 2−n−m+1 .
Proof. (i) Observe that if {Oα } is an open cover for f (Y ), then {f −1 (Oα )}
is one for Y .
(ii) Let {Oα } be an open cover for the closed subset Y (in the induced
topology). Then there are open sets Õα with Oα = Õα ∩Y and {Õα }∪{X\Y }
is an open cover for X which has a finite subcover. This subcover induces a
finite subcover for Y .
(iii) Let Y ⊆ X be compact. We show that X\Y is open. Fix x ∈ X\Y
(if Y = X, there is nothing to do). By the definition of Hausdorff, for
every y ∈ Y there are disjoint neighborhoods V (y) of y and Uy (x) of x. By
compactness of TnY , there are y1 , . . . , yn such that the V (yj ) cover Y . But
then U (x) = j=1 Uyj (x) is a neighborhood of x which does not intersect
Y.
(iv) Let {Oα } be an open cover for X × Y . For every (x, y) ∈ X × Y
there is some α(x, y) such that (x, y) ∈ Oα(x,y) . By definition of the product
topology there is some open rectangle U (x, y) × V (x, y) ⊆ Oα(x,y) . Hence for
fixed x, {V (x, y)}y∈Y is an open cover of Y . Hence there are finitely many
1.1. Warm up: Metric and topological spaces 15
T
points yk (x) such that the V (x, yk (x)) cover Y . Set U (x) = k U (x, yk (x)).
Since finite intersections of open sets are open, {U (x)}x∈X is an open cover
and there are finitely many points xj such that the U (xj ) cover X. By
construction, the U (xj ) × V (xj , yk (xj )) ⊆ Oα(xj ,yk (xj )) cover X × Y .
(v) Note that a cover of the union is a cover for each individual set and
the union of the individual subcovers is the subcover we are looking for.
(vi) Follows from (ii) and (iii) since an intersection of closed sets is
closed.
Proof. It suffices to show that f maps closed sets to closed sets. By (ii)
every closed set is compact, by (i) its image is also compact, and by (iii)
also closed.
Oα(y) by construction whereas on the other hand B2d (xn1 ) ⊆ B4r/5 (a) ⊆
Oα(y) .
Proof. By Lemma 1.10 (ii) and (iii) it suffices to show that a closed interval
in I ⊆ R is compact. Moreover, by Lemma 1.12 it suffices to show that
every sequence in I = [a, b] has a convergent subsequence. Let xn be our
sequence and divide I = [a, a+b a+b
2 ] ∪ [ 2 , b]. Then at least one of these two
intervals, call it I1 , contains infinitely many elements of our sequence. Let
y1 = xn1 be the first one. Subdivide I1 and pick y2 = xn2 , with n2 > n1 as
before. Proceeding like this, we obtain a Cauchy sequence yn (note that by
construction In+1 ⊆ In and hence |yn − ym | ≤ b−a n for m ≥ n).
Combining Theorem 1.13 with Lemma 1.10 (i) we also obtain the ex-
treme value theorem.
Theorem 1.15 (Weierstraß). Let X be compact. Every continuous function
f : X → R attains its maximum and minimum.
A metric space for which the Heine–Borel theorem holds is called proper.
Lemma 1.10 (ii) shows that X is proper if and only if every closed ball is
compact. Note that a proper metric space must be complete (since every
Cauchy sequence is bounded). A topological space is called locally com-
pact if every point has a compact neighborhood. Clearly a proper metric
space is locally compact.
The distance between a point x ∈ X and a subset Y ⊆ X is
dist(x, Y ) = inf d(x, y). (1.16)
y∈Y
dist(x,C2 )
Proof. To prove the first claim, set f (x) = dist(x,C1 )+dist(x,C2 ) . For the
second claim, observe that there is an open set O such that O is compact
and C1 ⊂ O ⊂ O ⊂ X\C2 . In fact, for every x ∈ C1 , there is a ball Bε (x)
such that Bε (x) is compact and Bε (x) ⊂ X\C2 . Since C1 is compact, finitely
many of them cover C1 and we can choose the union of those balls to be O.
Now replace C2 by X\O.
Note that Urysohn’s lemma implies that a metric space is normal; that
is, for any two disjoint closed sets C1 and C2 , there are disjoint open sets
O1 and O2 such that Cj ⊆ Oj , j = 1, 2. In fact, choose f as in Urysohn’s
lemma and set O1 = f −1 ([0, 1/2)), respectively, O2 = f −1 ((1/2, 1]).
Lemma 1.18. Let X be a metric space and {Oj } a countable open cover.
Then there is a continuous partition of unity subordinate to this cover;
that is, there are continuous functions hj : X → [0, 1] such that hj has
compact support contained in Oj and
X
hj (x) = 1. (1.18)
j
Moreover, the partition of unity can be chosen locally finite; that is, every x
has a neighborhood where all but a finite number of the functions hj vanish.
we see that all fn (and hence all gn ) are continuous. Moreover, the very
same argument shows that f∞ is continuous and thus we have found the
required partition of unity hj = gj /f∞ .
Finally, by Lemma 1.8 we can replace the cover by a locally finite refine-
ment and the resulting partition of unity for this refinement will be locally
finite.
Problem 1.1. Show that |d(x, y) − d(z, y)| ≤ d(x, z).
Problem 1.2. Show the quadrangle inequality |d(x, y) − d(x0 , y 0 )| ≤
d(x, x0 ) + d(y, y 0 ).
Problem 1.3. Show that the closure satisfies the Kuratowski closure axioms.
Problem 1.4. Show that the closure and interior operators are dual in the
sense that
X\A = (X\A)◦ and X\A◦ = (X\A).
(Hint: De Morgan’s laws.)
Problem 1.5. Let U ⊆ V be subsets of a metric space X. Show that if U
is dense in V and V is dense in X, then U is dense in X.
Problem 1.6. Show that every open set O ⊆ R can be written as a countable
union of disjoint intervals. (Hint: Let {Iα } be the set of all maximal open
subintervals of O; that is, Iα ⊆ O and there is no other subinterval of O
which contains Iα . Then this is a cover of disjoint open intervals which has
a countable subcover.)
Problem 1.7. Show that the boundary of A is given by ∂A = A\A◦ .
and take m → ∞:
k
X
|aj − anj | ≤ ε. (1.25)
j=1
Since this holds for all finite k, we even have ka−an k1 ≤ ε. Hence (a−an ) ∈
`1 (N) and since an ∈ `1 (N), we finally conclude a = an + (a − an ) ∈ `1 (N).
By our estimate ka − an k1 ≤ ε our candidate a is indeed the limit of an .
and β = |bj |q in (1.27) and summing over all j. (A different proof based on
convexity will be given in Section 8.2.)
Now using |aj + bj |p ≤ |aj | |aj + bj |p−1 + |bj | |aj + bj |p−1 , we obtain from
Hölder’s inequality (note (p − 1)q = p)
ka + bkpp ≤ kakp k(a + b)p−1 kq + kbkp k(a + b)p−1 kq
= (kakp + kbkp )k(a + b)kp−1
p .
is a Banach space (Problem 1.14). Note that with this definition Hölder’s
inequality (1.28) remains true for the cases p = 1, q = ∞ and p = ∞, q = 1.
The reason for the notation is explained in Problem 1.16.
kfn − f k∞ < ε/3 and δ so that |x − y| < δ implies |fn (x) − fn (y)| < ε/3.
Then |x − y| < δ implies
ε ε ε
|f (x)−f (y)| ≤ |f (x)−fn (x)|+|fn (x)−fn (y)|+|fn (y)−f (y)| < + + = ε
3 3 3
as required. Hence f (x) ∈ C(I) and thus every Cauchy sequence in C(I)
converges. Or, in other words
Theorem 1.19. C(I) with the maximum norm is a Banach space.
and {δ n }∞
n=1 is a Schauder basis (linear independence is left as an exer-
cise).
A set whose span is dense is called total and if we have a countable total
set, we also have a countable dense set (consider only linear combinations
1.2. The Banach space of continuous functions 23
While we will not give a Schauder basis for C(I), we will at least show
that it is separable. In order to prove this we need a lemma first.
Lemma 1.20 (Smoothing). Let un (x) be a sequence of nonnegative contin-
uous functions on [−1, 1] such that
Z Z
un (x)dx = 1 and un (x)dx → 0, δ > 0. (1.34)
|x|≤1 δ≤|x|≤1
(In other words, un has mass one and concentrates near x = 0 as n → ∞.)
Then for every f ∈ C[− 21 , 12 ] which vanishes at the endpoints, f (− 21 ) =
f ( 12 ) = 0, we have that
Z 1/2
fn (x) = un (x − y)f (y)dy (1.35)
−1/2
Problem 1.11. While `1 (N) is separable, it still has room for an uncount-
able set of linearly independent vectors. Show this by considering vectors of
the form
aα = (1, α, α2 , . . . ), α ∈ (0, 1).
(Hint: Take n such vectors and cut them off after n + 1 terms. If the cut
off vectors are linearly independent, so are the original ones. Recall the
Vandermonde determinant.)
Problem 1.12. Prove (1.27). (Hint: Take logarithms on both sides.)
Problem 1.13. Show that `p (N) is a separable Banach space.
Problem 1.14. Show that `∞ (N) is a Banach space.
Problem 1.15. Show that `∞ (N) is not separable. (Hint: Consider se-
quences which take only the value one and zero. How many are there? What
is the distance between two such sequences?)
Problem 1.16. Show that if a ∈ `p0 (N) for some p0 ∈ [1, ∞), then a ∈ `p (N)
for p ≥ p0 and
lim kakp = kak∞ .
p→∞
Problem 1.17. Formally extend the definition of `p (N) to p ∈ (0, 1). Show
that k.kp does not satisfy the triangle inequality. However, show that it is
a quasinormed space, that is, it satisfies all requirements for a normed
space except for the triangle inequality which is replaced by
ka + bk ≤ K(kak + kbk)
with some constant K ≥ 1. Show in fact,
ka + bkp ≤ 21/p−1 (kakp + kbkp ), p ∈ (0, 1).
Moreover, show that k.kppsatisfies the triangle inequality in this case, but of
course it is no longer homogenous (but at least you can get a honest metric
d(a, b) = ka − bkpp which gives rise to the same topology). (Hint: Show
α + β ≤ (αp + β p )1/p ≤ 21/p−1 (α + β) for 0 < p < 1 and α, β ≥ 0.)
BMB
f B f⊥
B
1
B
fk
u
1
Proof. It suffices to prove the case kgk = 1. But then the claim follows
from kf k2 = |hg, f i|2 + kf⊥ k2 .
Note that the Cauchy–Schwarz inequality entails that the scalar product
is continuous in both variables; that is, if fn → f and gn → g, we have
hfn , gn i → hf, gi.
As another consequence we infer that the map k.k is indeed a norm. In
fact,
kf + gk2 = kf k2 + hf, gi + hg, f i + kgk2 ≤ (kf k + kgk)2 . (1.47)
But let us return to C(I). Can we find a scalar product which has the
maximum norm as associated norm? Unfortunately the answer is no! The
reason is that the maximum norm does not satisfy the parallelogram law
(Problem 1.20).
28 1. A first look at Banach and Hilbert spaces
Note that the parallelogram law and the polarization identity even hold
for sesquilinear forms (Problem 1.22).
But how do we define a scalar product on C(I)? One possibility is
Z b
hf, gi = f ∗ (x)g(x)dx. (1.50)
a
The corresponding inner product space is denoted by L2cont (I). Note that
we have p
kf k ≤ |b − a|kf k∞ (1.51)
and hence the maximum norm is stronger than the L2cont norm.
Suppose we have two norms k.k1 and k.k2 on a vector space X. Then
k.k2 is said to be stronger than k.k1 if there is a constant m > 0 such that
kf k1 ≤ mkf k2 . (1.52)
It is straightforward to check the following.
1.3. The geometry of Hilbert spaces 29
Lemma 1.25. If k.k2 is stronger than k.k1 , then every k.k2 Cauchy sequence
is also a k.k1 Cauchy sequence.
is finite. Show
kqk ≤ ksk ≤ 2kqk.
1.4. Completeness 31
(Hint: Use the parallelogram law and the polarization identity from the pre-
vious problem.)
Problem 1.24. Suppose Q is a vector space. Let s(f, g) be a sesquilinear
form on Q and q(f ) = s(f, f ) the associated quadratic form. Show that the
Cauchy–Schwarz inequality
|s(f, g)| ≤ q(f )1/2 q(g)1/2 (1.57)
holds if q(f ) ≥ 0.
(Hint: Consider 0 ≤ q(f + αg) = q(f ) + 2Re(α s(f, g)) + |α|2 q(g) and
choose α = t s(f, g)∗ /|s(f, g)| with t ∈ R.)
Problem 1.25. Prove the claims made about fn , defined in (1.53), in the
last example.
1.4. Completeness
Since L2cont is not complete, how can we obtain a Hilbert space from it?
Well, the answer is simple: take the completion.
If X is an (incomplete) normed space, consider the set of all Cauchy
sequences X̃. Call two Cauchy sequences equivalent if their difference con-
verges to zero and denote by X̄ the set of all equivalence classes. It is easy
to see that X̄ (and X̃) inherit the vector space structure from X. Moreover,
Lemma 1.27. If xn is a Cauchy sequence, then kxn k converges.
d
However, if we consider A = dx : D(A) ⊆ Y → Y defined on D(A) =
1
C [0, 1], then we have an unbounded operator. Indeed, choose
un (x) = sin(nπx)
which is normalized, kun k∞ = 1, and observe that
Aun (x) = u0n (x) = nπ cos(nπx)
is unbounded, kAun k∞ = nπ. Note that D(A) contains the set of polyno-
mials and thus is dense by the Weierstraß approximation theorem (Theo-
rem 1.21).
Theorem 1.31. The space L(X, Y ) together with the operator norm (1.61)
is a normed space. It is a Banach space if Y is.
The Banach space of bounded linear operators L(X) even has a multi-
plication given by composition. Clearly this multiplication satisfies
(A + B)C = AC + BC, A(B + C) = AB + BC, A, B, C ∈ L(X) (1.63)
and
(AB)C = A(BC), α (AB) = (αA)B = A (αB), α ∈ C. (1.64)
Moreover, it is easy to see that we have
kABk ≤ kAkkBk. (1.65)
In other words, L(X) is a so-called Banach algebra. However, note that
our multiplication is not commutative (unless X is one-dimensional). We
even have an identity, the identity operator I satisfying kIk = 1.
Problem 1.27. Consider X = Cn and let A : X → X be a matrix. Equip
X with the norm (show that this is a norm)
kxk∞ = max |xj |
1≤j≤n
and compute the operator norm kAk with respect to this matrix in terms of
the matrix entries. Do the same with respect to the norm
X
kxk1 = |xj |.
1≤j≤n
is a Banach space.
Proof. First of all we need to show that (1.66) is indeed a norm. If k[x]k = 0
we must have a sequence xj ∈ M with xj → x and since M is closed we
conclude x ∈ M , that is [x] = [0] as required. To see kα[x]k = |α|k[x]k we
36 1. A first look at Banach and Hilbert spaces
is a Banach space and this can be shown as in Section 1.2. The space of
continuous functions with compact support Cc (U ) ⊆ Cb (U ) is in general
not dense and its closure will be denoted by C0 (U ). If U is open it can be
interpreted as the functions in Cb (U ) which vanish at the boundary
C0 (U ) = {f ∈ C(U )|∀ε > 0, ∃K compact : |f (x)| < ε, x ∈ X \ K}. (1.68)
Of course Rm could be replaced by any topological space up to this point.
1.7. Spaces of continuous and differentiable functions 37
there must be a sequence |xj | → ∞ and some α such that |∂α f (xj )| ≥ ε > 0.
But then kf − gk∞,k ≥ ε for every g in C∞ k (Rn ) and thus C k (Rn ) is a closed
∞
subspace. In particular, it is a Banach space of its own.
Note that the space Cbk (U ) could be further refined by requiring the
highest derivatives to be Hölder continuous. Recall that a function f : U →
C is called uniformly Hölder continuous with exponent γ ∈ (0, 1] if
|f (x) − f (y)|
[f ]γ = sup (1.72)
x6=y∈U |x − y|γ
is finite. Clearly any Hölder continuous function is continuous and in the
special case γ = 1 we obtain the Lipschitz continuous functions.
38 1. A first look at Banach and Hilbert spaces
Note that by the mean value theorem all derivatives up to order lower
than k are automatically Lipschitz continuous.
While the above spaces are able to cover a wide variety of situations,
there are still cases where the above definitions are not suitable. For ex-
ample, suppose you want to consider continuous function C(R) but you do
not want to assume boundedness. Then you could try to use local uniform
convergence as follows. Consider the seminorms
kf kj = sup |f (x)|, j ∈ N, (1.74)
|x|≤j
is a Fréchet space.
Example. The space of all entire functions f (z) (i.e. functions which are
holomorphic on all of C) together with the seminorms kf kk = sup|z|≤k |f (x)|,
k ∈ N, is a Fréchet space. Completeness follows from the Weierstraß con-
vergence theorem which states that a limit of holomorphic functions which
is uniform on every compact subset is again holomorphic.
1.7. Spaces of continuous and differentiable functions 39
Hilbert spaces
41
42 2. Hilbert spaces
with equality holding if and only if f lies in the span of {uj }nj=1 .
Of course, since we cannot assume H to be a finite dimensional vec-
tor space, we need to generalize Lemma 2.1 to arbitrary orthonormal sets
{uj }j∈J . We start by assuming that J is countable. Then Bessel’s inequality
(2.4) shows that X
|huj , f i|2 (2.5)
j∈J
converges absolutely. Moreover, for any finite subset K ⊂ J we have
X X
k huj , f iuj k2 = |huj , f i|2 (2.6)
j∈K j∈K
P
by the Pythagorean theorem and thus j∈J huj , f iuj is a Cauchy sequence
if and only if j∈J |huj , f i|2 is. Now let J be arbitrary. Again, Bessel’s
P
inequality shows that for any given ε > 0 there are at most finitely many
j for which |huj , f i| ≥ ε (namely at most kf k/ε). Hence there are at most
countably many j for which |huj , f i| > 0. Thus it follows that
X
|huj , f i|2 (2.7)
j∈J
Proof. The first part follows as in Lemma 2.1 using continuity of the scalar
product. The same is true for the lastP part except for the fact that every
f ∈ span{uj }j∈J can be written as f = j∈J αj uj (i.e., f = fk ). To see this
let fn ∈ span{uj }j∈J converge to f . Then kf −fn k2 = kfk −fn k2 +kf⊥ k2 → 0
implies fn → fk and f⊥ = 0.
Note that from Bessel’s inequality (which of course still holds) it follows
that the map f → fk is continuous.
P particularly interested in the case where every f ∈ H
Of course we are
can be written as j∈J huj , f iuj . In this case we will call the orthonormal
set {uj }j∈J an orthonormal basis (ONB).
If H is separable it is easy to construct an orthonormal basis. In fact,
if H is separable, then there exists a countable total set {fj }N j=1 . Here
N ∈ N if H is finite dimensional and N = ∞ otherwise. After throwing
away some vectors, we can assume that fn+1 cannot be expressed as a linear
combination of the vectors f1 , . . . , fn . Now we can construct an orthonormal
set as follows: We begin by normalizing f1 :
f1
u1 = . (2.12)
kf1 k
Next we take f2 and remove the component parallel to u1 and normalize
again:
f2 − hu1 , f2 iu1
u2 = . (2.13)
kf2 − hu1 , f2 iu1 k
44 2. Hilbert spaces
for any finite n and thus also for n = N (if N = ∞). Since {fj }N j=1 is total,
so is {uj }N
j=1 . Now suppose there is some f = f k + f⊥ ∈ H for which f⊥ 6= 0.
Since {uj }j=1 is total, we can find a fˆ in its span, such that kf − fˆk < kf⊥ k
N
By continuity of the norm it suffices to check (iii), and hence also (ii),
for f in a dense set.
It is not surprising that if there is one countable basis, then it follows
that every other basis is countable as well.
Theorem 2.5. In a Hilbert space H every orthonormal basis has the same
cardinality.
Proof. To prove this we need to resort to Zorn’s lemma (see Appendix A):
The collection of all orthonormal sets in H can be partially ordered by inclu-
sion. Moreover, every linearly ordered chain has an upper bound (the union
of all sets in the chain). Hence a fundamental result from axiomatic set
theory, Zorn’s lemma, implies the existence of a maximal element, that is,
an orthonormal set which is not a proper subset of every other orthonormal
set.
M ⊕ M⊥ = H (2.20)
in this situation.
Proof. (i) is obvious. (ii) follows from hf, A∗∗ gi = hA∗ f, gi = hf, Agi. (iii)
follows from hf, (AB)gi = hA∗ f, Bgi = hB ∗ A∗ f, gi. (iv) follows using (2.25)
from
kA∗ k = sup |hf, A∗ gi| = sup |hAf, gi|
kf k=kgk=1 kf k=kgk=1
= sup |hg, Af i| = kAk
kf k=kgk=1
and
kA∗ Ak = sup |hf, A∗ Agi| = sup |hAf, Agi|
kf k=kgk=1 kf k=kgk=1
where we have used that |hAf, Agi| attains its maximum when Af and Ag
are parallel (compare Theorem 1.23).
Note that kAk = kA∗ k implies that taking adjoints is a continuous op-
eration. For later use also note that (Problem 2.8)
Ker(A∗ ) = Ran(A)⊥ . (2.27)
Combining the last two results we obtain the famous Lax–Milgram the-
orem which plays an important role in theory of elliptic partial differential
equations.
Theorem 2.15 (Lax–Milgram). Let s be a sesquilinear form which is
• bounded, |s(f, g)| ≤ Ckf k kgk, and
• coercive, s(f, f ) ≥ εkf k2 .
Then for every g ∈ H there is a unique f ∈ H such that
s(h, f ) = hh, gi, ∀h ∈ H. (2.28)
L∞
Example. j=1 C = `2 (N).
and we compute
X
hf, f i = |αjk |2 > 0. (2.36)
j,k
The completion of F(H, H̃)/N (H, H̃) with respect to the induced norm is
called the tensor product H ⊗ H̃ of H and H̃.
Lemma 2.16. If uj , ũk are orthonormal bases for H, H̃, respectively, then
uj ⊗ ũk is an orthonormal basis for H ⊗ H̃.
Example. We have H ⊗ Cn = Hn .
52 2. Hilbert spaces
Compact operators
53
54 3. Compact operators
Proof. First of all note that K(., ..) is continuous on [a, b] × [a, b] and hence
uniformly continuous. In particular, for every ε > 0 we can find a δ > 0
such that |K(y, t) − K(x, t)| ≤ ε whenever |y − x| ≤ δ. Let g(x) = Kf (x).
Then
Z b
|g(x) − g(y)| ≤ |K(y, t) − K(x, t)| |f (t)|dt
a
Z b
≤ε |f (t)|dt ≤ εk1k kf k,
a
Note that (almost) the same proof shows that K is compact when defined
on C[a, b].
Theorem 3.5 (Arzelà–Ascoli). Suppose the sequence of functions fn (x),
n ∈ N, in C[a, b] is (uniformly) equicontinuous, that is, for every ε > 0
there is a δ > 0 (independent of n) such that
|fn (x) − fn (y)| ≤ ε if |x − y| < δ. (3.2)
If the sequence fn is bounded, then there is a uniformly convergent subse-
quence.
3.2. The spectral theorem for compact symmetric operators 55
Remark: There are two cases where our procedure might fail to con-
struct an orthonormal basis of eigenvectors. One case is where there is
an infinite number of nonzero eigenvalues. In this case αn never reaches 0
and all eigenvectors corresponding to 0 are missed. In the other case, 0 is
reached, but there might not be a countable basis and hence again some of
the eigenvectors corresponding to 0 are missed. In any case by adding vec-
tors from the kernel (which are automatically eigenvectors), one can always
extend the eigenvectors uj to an orthonormal basis of eigenvectors.
Corollary 3.10. Every compact symmetric operator has an associated or-
thonormal basis of eigenvectors.
Proof. Pick a value λ ∈ R such that RL (λ) exists. By Lemma 3.4 RL (λ)
is compact and by Theorem 3.3 this remains true if we replace L2cont (0, 1)
by its completion. By Theorem 3.8 there are eigenvalues αn of RL (λ) with
corresponding eigenfunctions un . Moreover, RL (λ)un = αn un is equivalent
to Lun = (λ + α1n )un , which shows that En = λ + α1n are eigenvalues
of L with corresponding eigenfunctions un . Now everything follows from
Theorem 3.8 except that the eigenvalues are simple. To show this, observe
that if un and vn are two different eigenfunctions corresponding to En , then
un (0) = vn (0) = 0 implies W (un , vn ) = 0 and hence un and vn are linearly
dependent.
Problem 3.6. Show that for our Sturm–Liouville operator u± (z, x)∗ =
u± (z ∗ , x). Conclude RL (z)∗ = RL (z ∗ ). (Hint: Problem 3.2.)
Problem 3.7. Show that the resolvent RA (z) = (A−z)−1 (provided it exists
and is densely defined) of a symmetric operator A is again symmetric for
z ∈ R. (Hint: g ∈ D(RA (z)) if and only if g = (A−z)f for some f ∈ D(A)).
Chapter 4
Proof. Suppose X = ∞
S
n=1 Xn . We can assume that the sets Xn are closed
and none of them contains a ball; that is, X\Xn is open and nonempty for
every n. We will construct a Cauchy sequence xn which stays away from all
Xn .
Since X\X1 is open and nonempty, there is a closed ball Br1 (x1 ) ⊆
X\X1 . Reducing r1 a little, we can even assume Br1 (x1 ) ⊆ X\X1 . More-
over, since X2 cannot contain Br1 (x1 ), there is some x2 ∈ Br1 (x1 ) that is
not in X2 . Since Br1 (x1 ) ∩ (X\X2 ) is open, there is a closed ball Br2 (x2 ) ⊆
Br1 (x1 ) ∩ (X\X2 ). Proceeding by induction, we obtain a sequence of balls
such that
Brn (xn ) ⊆ Brn−1 (xn−1 ) ∩ (X\Xn ).
Now observe that in every step we can choose rn as small as we please; hence
without loss of generality rn → 0. Since by construction xn ∈ BrN (xN ) for
n ≥ N , we conclude that xn is Cauchy and converges to some point x ∈ X.
63
64 4. The main theorems about Banach spaces
But x ∈ Brn (xn ) ⊆ X\Xn for every n, contradicting our assumption that
the Xn cover X.
Proof. Let On be open dense sets whose intersection is not dense. Then
this intersection must be missing some closed ball Bε . This ball will lie in
S
n Xn , where Xn = X\On are closed and nowhere dense. Now note that
X̃n = Xn ∪ B are closed nowhere dense sets in Bε . But Bε is a complete
metric space, a contradiction.
Proof. Let
\
Xn = {x| kAα xk ≤ n for all α} = {x| kAα xk ≤ n}.
α
S
Then n Xn = X by assumption. Moreover, by continuity of Aα and the
norm, each Xn is an intersection of closed sets and hence closed. By Baire’s
theorem at least one contains a ball of positive radius: Bε (x0 ) ⊂ Xn . Now
observe
kAα yk ≤ kAα (y + x0 )k + kAα x0 k ≤ n + C(x0 )
x
for kyk ≤ ε. Setting y = ε kxk , we obtain
n + C(x0 )
kAα xk ≤ kxk
ε
for every x.
4.1. The Baire theorem and its consequences 65
Proof. Denote by BrX (x) ⊆ X the open ball with radius r centered at x
and let BrX = BrX (0). Similarly for BrY (y). By scaling and translating balls
(using linearity of A), it suffices to prove BεY ⊆ A(B1X ) for some ε > 0.
Since A is surjective we have
∞
[
Y = A(BnX )
n=1
and the Baire theorem implies that for some n, A(BnX ) contains a ball
BεY (y). Without restriction n = 1 (just scale the balls). Since −A(B1X ) =
A(−B1X ) = A(B1X ) we see BεY (−y) ⊆ A(B1X ) and by convexity of A(B1X )
we also have BεY ⊆ A(B1X ).
So we have BεY ⊆ A(B1X ), but we would need BεY ⊆ A(B1X ). To complete
the proof we will show A(B1X ) ⊆ A(B2X ) which implies Bε/2
Y ⊆ A(B X ).
1
k=1
P∞
By kxk k < 21−kthe limit x = k=1 xk exists and satisfies kxk < 2. Hence
X
y = Ax ∈ A(B2 ) as desired.
Proof. If Γ(A) is closed, then it is again a Banach space. Now the projection
π1 (x, Ax) = x onto the first component is a continuous bijection onto X.
So by the inverse mapping theorem its inverse π1−1 is again continuous.
Moreover, the projection π2 (x, Ax) = Ax onto the second component is also
continuous and consequently so is A = π2 ◦ π1−1 . The converse is easy.
Proof. If we set
Γ−1 = {(y, x)|(x, y) ∈ Γ}
then Γ(A−1 ) = Γ−1 (A) and
−1 −1
Γ(A−1 ) = Γ(A)−1 = Γ(A) = Γ(A)−1 = Γ(A ).
68 4. The main theorems about Banach spaces
The closed graph theorem tells us that closed linear operators can be
defined on all of X if and only if they are bounded. So if we have an
unbounded operator we cannot have both! That is, if we want our operator
to be at least closed, we have to live with domains. This is the reason why in
quantum mechanics most operators are defined on domains. In fact, there
is another important property which does not allow unbounded operators
to be defined on the entire space:
Theorem 4.9 (Hellinger–Toeplitz). Let A : H → H be a linear operator on
some Hilbert space H. If A is symmetric, that is hg, Af i = hAg, f i, f, g ∈ H,
then A is bounded.
X if and only if A is closable. Show that in this case the completion can
be identified with D(A) and that the closure of A in X coincides with the
extension from Theorem 1.30 of A in XA .
It turns out that many questions are easier to handle after applying a
linear functional ` ∈ X ∗ . For example, suppose x(t) is a function R → X
(or C → X), then `(x(t)) is a function R → C (respectively C → C) for
any ` ∈ X ∗ . So to investigate `(x(t)) we have all tools from real/complex
analysis at our disposal. But how do we translate this information back to
x(t)? Suppose we have `(x(t)) = `(y(t)) for all ` ∈ X ∗ . Can we conclude
x(t) = y(t)? The answer is yes and will follow from the Hahn–Banach
theorem.
We first prove the real version from which the complex one then follows
easily.
Theorem 4.10 (Hahn–Banach, real version). Let X be a real vector space
and ϕ : X → R a convex function (i.e., ϕ(λx+(1−λ)y) ≤ λϕ(x)+(1−λ)ϕ(y)
for λ ∈ (0, 1)).
70 4. The main theorems about Banach spaces
Proof. Let us first try to extend ` in just one direction: Take x 6∈ Y and
set Ỹ = span{x, Y }. If there is an extension `˜ to Ỹ it must clearly satisfy
˜ + αx) = `(y) + α`(x).
`(y ˜
˜
So all we need to do is to choose `(x) ˜ + αx) ≤ ϕ(y + αx). But
such that `(y
this is equivalent to
ϕ(y − αx) − `(y) ˜ ≤ inf ϕ(y + αx) − `(y)
sup ≤ `(x)
α>0,y∈Y −α α>0,y∈Y α
and is hence only possible if
ϕ(y1 − α1 x) − `(y1 ) ϕ(y2 + α2 x) − `(y2 )
≤
−α1 α2
for every α1 , α2 > 0 and y1 , y2 ∈ Y . Rearranging this last equations we see
that we need to show
α2 `(y1 ) + α1 `(y2 ) ≤ α2 ϕ(y1 − α1 x) + α1 ϕ(y2 + α2 x).
Starting with the left-hand side we have
α2 `(y1 ) + α1 `(y2 ) = (α1 + α2 )` (λy1 + (1 − λ)y2 )
≤ (α1 + α2 )ϕ (λy1 + (1 − λ)y2 )
= (α1 + α2 )ϕ (λ(y1 − α1 x) + (1 − λ)(y2 + α2 x))
≤ α2 ϕ(y1 − α1 x) + α1 ϕ(y2 + α2 x),
α2
where λ = α1 +α2 . Hence one dimension works.
To finish the proof we appeal to Zorn’s lemma (see Appendix A): Let E
be the collection of all extensions `˜ satisfying `(x)
˜ ≤ ϕ(x). Then E can be
partially ordered by inclusion (with respect to the domain) and every linear
chain has an upper bound (defined on the union of all domains). Hence there
is a maximal element ` by Zorn’s lemma. This element is defined on X, since
if it were not, we could extend it as before contradicting maximality.
Proof. Clearly if `(x) = 0 holds for all ` in some total subset, this holds
for all ` ∈ X ∗ . If x 6= 0 we can construct a bounded linear functional on
span{x} by setting `(αx) = α and extending it to X ∗ using the previous
corollary. But this contradicts our assumption.
Example. Let us return to our example `∞ (N). Let c(N) ⊂ `∞ (N) be the
subspace of convergent sequences. Set
l(x) = lim xn , x ∈ c(N), (4.9)
n→∞
then l is bounded since
|l(x)| = lim |xn | ≤ kxk∞ . (4.10)
n→∞
Hence we can extend it to `∞ (N) by Corollary 4.12. Then l(x) cannot be
written as l(x) = ly (x) for some y ∈ `1 (N) (as in (4.6)) since yn = l(δ n ) = 0
shows y = 0 and hence `y = 0. The problem is that span{δ n } = c0 (N) 6=
`∞ (N), where c0 (N) is the subspace of sequences converging to 0.
Moreover, there is also no other way to identify `∞ (N)∗ with `1 (N), since
is separable whereas `∞ (N) is not. This will follow from Lemma 4.17 (iii)
`1 (N)
below.
72 4. The main theorems about Banach spaces
Example. The Banach spaces `p (N) with 1 < p < ∞ are reflexive: Identify
`p (N)∗ with `q (N) and choose z ∈ `p (N)∗∗ . Then there is some x ∈ `p (N)
such that
y ∈ `q (N) ∼
X
z(y) = yj xj , = `p (N)∗ .
j∈N
But this implies z(y) = y(x), that is, z = J(x), and thus J is surjective.
(Warning: It does not suffice to just argue `p (N)∗∗ ∼
= `q (N)∗ ∼
= `p (N).)
However, `1 is not reflexive since `1 (N)∗ =∼ `∞ (N) but `∞ (N)∗ 6∼= `1 (N)
as noted earlier.
Example. By the same argument (using the Riesz lemma), every Hilbert
space is reflexive.
surjective.
(ii) Suppose X is reflexive. Then the two maps
(JX )∗ : X ∗ → X ∗∗∗ (JX )∗ : X ∗∗∗ → X ∗
−1
x0 7→ x0 ◦ JX x000 7→ x000 ◦ JX
−1 00
are inverse of each other. Moreover, fix x00 ∈ X ∗∗ and let x = JX (x ).
0 00 00 0 0 0 0 −1 00
Then JX (x )(x ) = x (x ) = J(x)(x ) = x (x) = x (JX (x )), that is JX ∗ =
∗
X. If it were not, we could find some x ∈ X\span{xn }∞ n=1 and hence there
is a functional ` ∈ X ∗ as in Corollary 4.14. Choose a subsequence `nk → `.
Then
k` − `nk k ≥ |(` − `nk )(xnk )| = |`nk (xnk )| ≥ k`nk k/2,
which implies `nk → 0 and contradicts k`k = 1.
W = (U − x0 ) + (V − y0 ) = {(x − x0 ) − (y − y0 )|x ∈ U, y ∈ V }
is open (since U is), convex (since U and V are) and contains 0. Moreover,
since U and V are disjoint we have z0 = y0 −x0 6∈ W . By the previous lemma,
the associated Minkowski functional pW is convex and by the Hahn–Banach
theorem there is a linear functional satisfying
and hence `(x) < `(y) for x ∈ U and y ∈ V . Therefore `(U ) and `(V )
are disjoint convex subsets of R. Since ` is surjective and hence open by
the open mapping theorem, `(U ) must be an open interval. Thus the claim
holds with c = sup `(U ).
Problem 4.7. Show that kly k = kykq , where ly ∈ `p (N)∗ as defined in (4.6).
(Hint: Choose x ∈ `p such that xn yn = |yn |q .)
Problem 4.9. Let c0 (N) ⊂ `∞ (N) be the subspace of sequences which con-
verge to 0, and c(N) ⊂ `∞ (N) the subspace of convergent sequences.
(i) Show that c0 (N), c(N) are both Banach spaces and that c(N) =
span{c0 (N), e}, where e = (1, 1, 1, . . . ) ∈ c(N).
(ii) Show that every l ∈ c0 (N)∗ can be written as
X
l(x) = yn x n
n∈N
Proof. (i) Choose ` ∈ X ∗ such that `(x) = kxk (for the limit x) and k`k = 1.
Then
kxk = `(x) = lim inf `(xn ) ≤ lim inf kxn k.
(ii) For every ` we have that |J(xn )(`)| = |`(xn )| ≤ C(`) is bounded. Hence
by the uniform boundedness principle we have kxn k = kJ(xn )k ≤ C.
(iii) If xn is a weak Cauchy sequence, then `(xn ) converges and we can define
j(`) = lim `(xn ). By construction j is a linear functional on X ∗ . Moreover,
by (ii) we have |j(`)| ≤ sup k`(xn )k ≤ k`k sup kxn k ≤ Ck`k which shows
j ∈ X ∗∗ . Since X is reflexive, j = J(x) for some x ∈ X and by construction
`(xn ) → J(x)(`) = `(x), that is, xn * x. (iv) This follows from
kxn − xm k = sup |`(xn − xm )|
k`k=1
Remark: One can equip X with the weakest topology for which all
` ∈ X ∗ remain continuous. This topology is called the weak topology and
it is given by taking all finite intersections of inverse images of open sets
as a base. By construction, a sequence will converge in the weak topology
if and only if it converges weakly. By Corollary 4.14 the weak topology is
Hausdorff, but it will not be metrizable in general. In particular, sequences
do not suffice to describe this topology.
In a Hilbert space there is also a simple criterion for a weakly convergent
sequence to converge in norm.
Proof. By (i) of the previous lemma we have lim kfn k = kf k and hence
kf − fn k2 = kf k2 − 2Re(hf, fn i) + kfn k2 → 0.
Now we come to the main reason why weakly convergent sequences are
of interest: A typical approach for solving a given equation in a Banach
space is as follows:
(i) Construct a (bounded) sequence xn of approximating solutions
(e.g. by solving the equation restricted to a finite dimensional sub-
space and increasing this subspace).
(ii) Use a compactness argument to extract a convergent subsequence.
(iii) Show that the limit solves the equation.
Our aim here is to provide some results for the step (ii). In a finite di-
mensional vector space the most important compactness criterion is bound-
edness (Heine-Borel theorem, Theorem 1.13). In infinite dimensions this
breaks down:
a contradiction.
In fact, uk converges to the Dirac measure centered at 0, which is not in
L1 (R).
Finally, let me remark that similar concepts can be introduced for oper-
ators. This is of particular importance for the case of unbounded operators,
where convergence in the operator norm makes no sense at all.
A sequence of operators An is said to converge strongly to A,
s-lim An = A :⇔ An x → Ax ∀x ∈ D(A) ⊆ D(An ). (4.19)
n→∞
It is said to converge weakly to A,
w-lim An = A :⇔ An x * Ax ∀x ∈ D(A) ⊆ D(An ). (4.20)
n→∞
Clearly norm convergence implies strong convergence and strong conver-
gence implies weak convergence.
4.3. Weak convergence 81
Then Sn converges to zero strongly but not in norm (since kSn k = 1) and Sn∗
converges weakly to zero (since hx, Sn∗ yi = hSn x, yi) but not strongly (since
kSn∗ xk = kxk) .
Example. Let us return to the example after Theorem 4.25. Consider the
Banach space of bounded continuous functions X = C(R). Using `f (ϕ) =
ϕf dx we can regard L1 (R) as a subspace of X ∗ . Then the Dirac measure
R
More on compact
operators
The numbers sj = sj (K) are called singular values of K. There are either
finitely many singular values or they converge to zero.
where vj = s−1
j Kuj . The norm of K is given by the largest singular value
85
86 5. More on compact operators
as required. Furthermore,
hvj , vk i = (sj sk )−1 hKuj , Kuk i = (sj sk )−1 hK ∗ Kuj , uk i = sj s−1
k huj , uk i
converges to K as n → ∞ since
kK − Kn k = max sj (K)
j≥n
by (5.4).
Moreover, this also shows that the adjoint of a compact operator is again
compact.
Corollary 5.3. An operator K is compact (finite rank) if and only K ∗ is.
In fact, sj (K) = sj (K ∗ ) and
X
K∗ = sj hvj , .iuj . (5.6)
j
Proof. First of all note that (5.6) follows from (5.3) since taking adjoints
is continuous and (huj , .ivj )∗ = hvj , .iuj . The rest is straightforward.
Finally, let me remark that there are a number of other equivalent defi-
nitions for compact operators.
Lemma 5.4. For K ∈ L(H) the following statements are equivalent:
(i) K is compact.
s
(ii) An ∈ L(H) and An → A strongly implies An K → AK.
(iii) fn * f weakly implies Kfn → Kf in norm.
The last condition explains the name compact. Moreover, note that one
cannot replace An K → AK by KAn → KA in (ii) unless one additionally
requires An to be normal (then this follows by taking adjoints — recall
that only for normal operators is taking adjoints continuous with respect
to strong convergence). Without the requirement that An be normal, the
claim is wrong as the following example shows.
88 5. More on compact operators
Example. Let H = `2 (N) and let An be the operator which shifts each
sequence n places to the left and let K = hδ1 , .iδ1 , where δ1 = (1, 0, . . . ).
Then s-lim An = 0 but kKAn k = 1.
Proof. First of all note that (5.11) implies that K is compact. To see this
let Pn be the projection onto the space spanned by the first n elements of
the orthonormal basis {wj }. Then Kn = KPn is finite rank and converges
to K since
X X X 1/2
k(K − Kn )f k = k cj Kwj k ≤ |cj |kKwj k ≤ kKwj k2 kf k,
j>n j>n j>n
5.2. Hilbert–Schmidt and trace class operators 89
P
where f = j cj wj .
The rest follows from (5.3) and
X X X X
kKwj k2 = |hvk , Kwj i|2 = |hK ∗ vk , wj i|2 = kK ∗ vk k2
j k,j k,j k
X
= sk (K)2 = kKk22 .
k
Since Hilbert–Schmidt operators turn out easy to identify (cf. Section 8.5),
it is important to relate J1 (H) with J2 (H):
Lemma 5.8. If K is trace class, then for every orthonormal basis {wn } the
trace X
tr(K) = hwn , Kwn i (5.15)
n
is finite and independent of the orthonormal basis.
Clearly for self-adjoint trace class operators, the trace is the sum over
all eigenvalues (counted with their multiplicity). To see this, one just has to
choose the orthonormal basis to consist of eigenfunctions. This is even true
for all trace class operators and is known as Lidskij trace theorem (see [19]
for an easy to read introduction).
Problem 5.3. Let H = `2 (N) and let A be multiplication by a sequence
a = (aj )∞ 2
j=1 . Show that A is Hilbert–Schmidt if and only if a ∈ ` (N).
Furthermore, show that kAk2 = kak in this case.
Problem 5.4. An operator of the form K : `2 (N) → `2 (N), fn 7→ j∈N kn+j fj
P
is called Hankel operator.
• Show that K is Hilbert–Schmidt if and only if j∈N j|kj |2 < ∞
P
Proof. We first show dim Ker(1 − K) < ∞. If not we could find an infinite
orthonormal system {uj }∞ j=1 ⊂ Ker(1 − K). By Kuj = uj compactness of
K implies that there is a convergent subsequence ujk . But this is impossible
by kuj − uk k2 = 2 for j 6= k.
To see that Ran(1 − K) is closed we first claim that there is a γ > 0
such that
k(1 − K)f k ≥ γkf k, ∀f ∈ Ker(1 − K)⊥ . (5.19)
In fact, if there were no such γ, we could find a normalized sequence fj ∈
Ker(1 − K)⊥ with kfj − Kfj k < 1j , that is, fj − Kfj → 0. After passing to a
subsequence we can assume Kfj → f by compactness of K. Combining this
with fj − Kfj → 0 implies fj → f and f − Kf = 0, that is, f ∈ Ker(1 − K).
On the other hand, since Ker(1−K)⊥ is closed, we also have f ∈ Ker(1−K)⊥
which shows f = 0. This contradicts kf k = lim kfj k = 1 and thus (5.19)
holds.
Now choose a sequence gj ∈ Ran(1 − K) converging to some g. By
assumption there are fk such that (1 − K)fk = gk and we can even assume
fk ∈ Ker(1−K)⊥ by removing the projection onto Ker(1−K). Hence (5.19)
shows
kfj − fk k ≤ γ −1 k(1 − K)(fj − fk )k = γ −1 kgj − gk k
that fj converges to some f and (1 − K)f = g implies g ∈ Ran(1 − K).
92 5. More on compact operators
Since
Ran(1 − K)⊥ = Ker(1 − K ∗ ) (5.20)
by (2.27) we see that the left and right hand side of (5.18) are at least finite
for compact K and we can try to verify equality.
Theorem 5.10. Suppose K is compact. Then
dim Ker(1 − K) = dim Ran(1 − K)⊥ , (5.21)
where both quantities are finite.
Note that (5.20) implies that in any case the inhomogeneous equation
f = Kf + g has a solution if and only if g ∈ Ker(1 − K ∗ )⊥ . Moreover,
combining (5.21) with (5.20) also shows
dim Ker(1 − K) = dim Ker(1 − K ∗ ) (5.25)
for compact K.
This theory can be generalized to the case of operators where both
Ker(1 − K) and Ran(1 − K)⊥ are finite dimensional. Such operators are
called Fredholm operators (also Noether operators) and the number
ind(1 − K) = dim Ker(1 − K) − dim Ran(1 − K)⊥ (5.26)
is the called the index of K. Theorem 5.10 now says that a compact oper-
ator is Fredholm of index zero.
Problem 5.5. Compute Ker(1 − K) and Ran(1 − K)⊥ for the operator
K = hv, .iu, where u, v ∈ H satisfy hu, vi = 1.
Chapter 6
Bounded linear
operators
Example. The bounded linear operators L(X) form a Banach algebra with
identity I.
95
96 6. Bounded linear operators
Lemma 6.1. Let X be a Banach algebra with identity e. Suppose kxk < 1.
Then e − x is invertible and
∞
X
(e − x)−1 = xn . (6.9)
n=0
Proof. Equation (6.13) already shows that ρ(x) is open. Hence σ(x) is
closed. Moreover, x − α = −α(e − α1 x) together with Lemma 6.1 shows
∞
1 X x n
(x − α)−1 = − , |α| > kxk,
α α
n=0
which implies σ(x) ⊆ {α| |α| ≤ kxk} is bounded and thus compact. More-
over, taking norms shows
∞
−1 1 X kxkn 1
k(x − α) k ≤ n
= , |α| > kxk,
|α| |α| |α| − kxk
n=0
which implies (x − α)−1 → 0 as α → ∞. In particular, if σ(x) is empty,
then `((x − α)−1 ) is an entire analytic function which vanishes at infinity.
98 6. Bounded linear operators
Next let us look at the convergence radius of the Neumann series for
the resolvent
∞
1 X x n
(x − α)−1 = − (6.16)
α α
n=0
encountered in the proof of Theorem 6.3 (which is just the Laurent expansion
around infinity).
The number
r(x) = sup |α| (6.17)
α∈σ(x)
is called the spectral radius of x. Note that by (6.14) we have
r(x) ≤ kxk. (6.18)
Theorem 6.6. The spectral radius satisfies
r(x) = inf kxn k1/n = lim kxn k1/n . (6.19)
n∈N n→∞
6.1. Banach algebras 99
Then `((x − α)−1 ) is analytic in |α| > r(x) and hence (6.20) converges
absolutely for |α| > r(x) by a well-known result from complex analysis.
Hence for fixed α with |α| > r(x), `(xn /αn ) converges to zero for every
` ∈ X ∗ . Since every weakly convergent sequence is bounded we have
kxn k
≤ C(α)
|α|n
and thus
lim sup kxn k1/n ≤ lim sup C(α)1/n |α| = |α|.
n→∞ n→∞
Since this holds for every |α| > r(x) we have
r(x) ≤ inf kxn k1/n ≤ lim inf kxn k1/n ≤ lim sup kxn k1/n ≤ r(x),
n→∞ n→∞
which finishes the proof.
To end this section let us look at two examples illustrating these ideas.
Example. Let X = L(C2 ) be the space of two by two matrices and consider
0 1
x= . (6.21)
0 0
Then x2 = 0 and consequently r(x) = 0. This is not surprising, since x has
the only eigenvalue 0. The same is true for any nilpotent matrix.
Hence r(K) = 0 and for every λ ∈ C and every y ∈ C(I) the equation
x − λK x = y (6.24)
has a unique solution given by
∞
X
−1
x = (I − λK) y= λn K n y. (6.25)
n=0
Problem 6.6. Show that the map Φ from the spectral theorem is positivity
preserving, that is, f ≥ 0 if and only if Φ(f ) is positive.
We have
µu,v1 +v2 = µu,v1 + µu,v2 , µu,αv = αµu,v , µv,u = µ∗u,v (6.35)
and |µu,v |(σ(A)) ≤ kukkvk. Furthermore, µu = µu,u is a positive Borel
measure with µu (σ(A)) = kuk2 .
6.3. Spectral measures 103
Problem 6.9. Show the following estimates for the total variation of spec-
tral measures
|µu,v |(Ω) ≤ µu (Ω)1/2 µv (Ω)1/2 (6.46)
(Hint: Use µu,v (Ω) = hu, PA (Ω)vi = hPA (Ω)u, PA (Ω)vi together with Cauchy–
Schwarz to estimate (9.14).)
hence pn (f ) → |f |.
106 6. Bounded linear operators
Proof. Just observe that F̃ = {Re(f ), Im(f )|f ∈ F } satisfies the assump-
tion of the real version. Hence every real-valued continuous functions can be
approximated by elements from the subalgebra generated by F̃ , in particular
this holds for the real and imaginary parts for every given complex-valued
function. Finally, note that the subalgebra spanned by F̃ contains the ∗-
subalgebra spanned by F .
Proof. There are two possibilities: either all f ∈ F vanish at one point
t0 ∈ K (there can be at most one such point since F separates points)
or there is no such point. If there is no such point, we can proceed as in
the proof of the Stone–Weierstraß theorem to show that the identity can
be approximated by elements in A (note that to show |f | ∈ A if f ∈ A,
we do not need the identity, since pn can be chosen to contain no constant
term). If there is such a t0 , the identity is clearly missing from A. However,
adding the identity to A, we get A + C = C(K) and it is easy to see that
A = {f ∈ C(K)|f (t0 ) = 0}.
Problem 6.10. Show that the functions ϕn (x) = √1 einx , n ∈ Z, form an
2π
orthonormal basis for H = L2 (0, 2π).
Problem 6.11. Let k ∈ N and I ⊆ R. Show that the ∗-subalgebra generated
by fz0 (t) = (t−z1 )k for one z0 ∈ C is dense in the C ∗ algebra C0 (I) of
0
continuous functions vanishing at infinity
• for I = R if z0 ∈ C\R and k = 1, 2,
• for I = [a, ∞) if z0 ∈ (−∞, a) and every k,
• for I = (−∞, a] ∪ [b, ∞) if z0 ∈ (a, b) and k odd.
(Hint: Add ∞ to R to make it compact.)
Problem 6.12. Let K ⊆ C be a compact set. Show that the set of all
functions f (z) = p(x, y), where p : R2 → C is polynomial and z = x + iy, is
dense in C(K).
Part 2
Real Analysis
Chapter 7
Almost everything
about Lebesgue
integration
• X ∈ A,
• A is closed under finite unions,
• A is closed under complements
is called an algebra. Note that ∅ ∈ A and that A is also closed under finite
intersections and relative complements: ∅ = X 0 , A ∩ B = (A0 ∪ B 0 )0 (de
Morgan), and A\B = A ∩ B 0 , where A0 = X\A denotes the complement.
If an algebra is closed under countable unions (and hence also countable
intersections), it is called a σ-algebra.
Example. Let X = {1, 2, 3}, then A = {∅, {1}, {2, 3}, X} is an algebra.
111
112 7. Almost everything about Lebesgue integration
• µ(∅) = 0,
∞
• µ( ∞
S P
j=1 A j ) = µ(Aj ) if Aj ∩ Ak = ∅ for all j 6= k (σ-additivity).
j=1
Example. Take a set X and Σ = P(X) and set µ(A) to be the number
of elements of A (respectively, ∞ if A is infinite). This is the so-called
counting measure. It will be finite if and only if X is finite and σ-finite if
and only if X is countable.
Proof. The first claim is obvious from µ(B) = µ(A) + µ(B\A). To see the
second define Ã1 = A1 , Ãn = An \An−1 and note that these sets are disjoint
and satisfy An = nj=1 Ãj . Hence µ(An ) = nj=1 µ(Ãj ) → ∞
S P P
j=1 µ(Ãj ) =
S∞
µ( j=1 Ãj ) = µ(A) by σ-additivity. The third follows from the second using
Ãn = A1 \An % A1 \A implying µ(Ãn ) = µ(A1 ) − µ(An ) → µ(A1 \A) =
µ(A1 ) − µ(A).
114 7. Almost everything about Lebesgue integration
For a finite measure the alternate normalization µ̃(x) = µ((−∞, x]) can
be used. The resulting distribution function differs from our above defi-
nition by a constant µ(x) = µ̃(x) − µ((−∞, 0]). In particular, this is the
normalization used in probability theory.
Conversely, to obtain a measure from a nondecreasing function m : R →
R we proceed as follows: Recall that an interval is a subset of the real line
of the form
with a ≤ b, a, b ∈ R∪{−∞, ∞}. Note that (a, a), [a, a), and (a, a] denote the
empty set, whereas [a, a] denotes the singleton {a}. For any proper interval
with different endpoints (i.e. a < b) we can define its measure to be
m(b+) − m(a+), I = (a, b],
m(b+) − m(a−), I = [a, b],
µ(I) = (7.5)
m(b−) − m(a+), I = (a, b),
m(b−) − m(a−), I = [a, b),
(which agrees with (7.5) except for the case I = (a, a) which would give a
negative value for the empty set if µ jumps at a). Note that µ({a}) = 0 if
and only if m(x) is continuous at a and that there can be only countably
many points with µ({a}) > 0 since a nondecreasing function can have at
most countably many jumps. Moreover, observe that the definition of µ
does not involve the actual value of m at a jump. Hence any function m̃
with m(x−) ≤ m̃(x) ≤ m(x+) gives rise to the same µ. We will frequently
assume that m is right continuous such that it coincides with the distribu-
tion function up to a constant, µ(x) = m(x+) − m(0+). In particular, µ
determines m up to a constant and the value at the jumps.
Now we can consider the algebra of finite unions of disjoint intervals
(check that this is indeed an algebra) and extend (7.5) to finite unions of
disjoint intervals by summing over all intervals. It is straightforward to
verify that µ is well defined (one set can be represented by different unions
of intervals) and by construction additive. In fact, it is even a premeasure.
Lemma 7.2. The interval function µ defined in (7.5) gives rise to a unique
σ-finite premeasure on the algebra A of finite unions of disjoint intervals.
116 7. Almost everything about Lebesgue integration
which shows
∞
X
µ(Ik ) ≤ µ(I).
k=1
To get the converse inequality, we need to work harder:
We can cover each Ik by some slightly larger open interval Jk such that
µ(Jk ) ≤ µ(Ik ) + 2εk (only closed endpoints need extension). First suppose
I is compact. Then finitely many of the Jk , say the first n, cover I and we
have
n
[ n
X ∞
X
µ(I) ≤ µ( Jk ) ≤ µ(Jk ) ≤ µ(Ik ) + ε.
k=1 k=1 k=1
Since ε > 0 is arbitrary, this shows σ-additivity for compact intervals. By
additivity we can always add/subtract the endpoints of I and hence σ-
additivity holds for any bounded interval. If I is unbounded we can again
assume that it is closed by adding an endpoint if necessary. Then for any
x > 0 we can find an n such that {Jk }nk=1 cover at least I ∩ [−x, x] and
hence
X∞ Xn n
X
µ(Ik ) ≥ µ(Ik ) ≥ µ(Jk ) − ε ≥ µ([−x, x] ∩ I) − ε.
k=1 k=1 k=1
Since the proof of this theorem is rather involved, we defer it to the next
section and look at some examples first.
Example. Suppose Θ(x) = 0 for x < 0 and Θ(x) = 1 for x ≥ 0. Then we
obtain the so-called Dirac measure at 0, which is given by Θ(A) = 1 if
0 ∈ A and Θ(A) = 0 if 0 6∈ A.
The typical use of this lemma is as follows: First verify some property
for sets in a set S which is closed under finite intersections and generates
120 7. Almost everything about Lebesgue integration
the σ-algebra. In order to show that it holds for every set in Σ(S), it suffices
to show that the collection of sets for which it holds is a Dynkin system.
As an application we show that a premeasure determines the correspond-
ing measure µ uniquely (if there is one at all):
Theorem 7.5 (Uniqueness of measures). Let S ⊆ Σ be a collection of sets
which generates Σ and which is closed under finite intersections and contains
a sequence of increasing sets Xn % X of finite measure µ(Xn ) < ∞. Then
µ is uniquely determined by the values on S.
where the infimum extends over all countable covers from A. Then the
function µ∗ : P(X) → [0, ∞] is an outer measure; that is, it has the
properties (Problem 7.6)
• µ∗ (∅) = 0,
• A1 ⊆ A2 ⇒ µ∗ (A1 ) ≤ µ∗ (A2 ), and
• µ∗ ( ∞
S P∞ ∗
n=1 An ) ≤ n=1 µ (An ) (subadditivity).
Note that µ∗ (A) = µ(A) for A ∈ A (Problem 7.7).
7.2. Extending a premeasure to a measure 121
Problem 7.7. Show that µ∗ defined in (7.8) extends µ. (Hint: For the
cover An it is no restriction to assume An ∩ Am = ∅ and An ⊆ A.)
7.3. Measurable functions 123
Problem 7.8. Show that the measure constructed in Theorem 7.7 is com-
plete.
Problem 7.9. Let µ be a finite measure. Show that
d(A, B) = µ(A∆B), A∆B = (A ∪ B)\(A ∩ B) (7.11)
is a metric on Σ if we identify sets of measure zero. Show that if A is an
algebra, then it is dense in Σ(A). (Hint: Show that the sets which can be
approximated by sets in A form a Dynkin system.)
Clearly the intervals (aj , ∞) can also be replaced by [aj , ∞), (−∞, aj ),
or (−∞, aj ].
124 7. Almost everything about Lebesgue integration
Proof. Note that addition and multiplication are continuous functions from
R2 → R and hence the claim follows from the previous lemma.
Proof. It suffices to prove that sup fn is measurable since the rest follows
from inf fn = − sup(−fn ), lim inf fn = sup S n inf k≥n fk , and lim sup fn =
inf n supk≥n fk . But (sup fn )−1 ((a, ∞]) = n fn−1 ((a, ∞]) are Borel and we
are done.
7.4. How wild are measurable objects 125
Proof. That µ is σ-finite is immediate from the definitions since for any
fixed x0 ∈ X the open balls Xn = Bn (x0 ) have finite measure and satisfy
Xn % X.
126 7. Almost everything about Lebesgue integration
To see that (7.18) holds we begin with the case when µ is finite. Denote
by A the set of all Borel sets satisfying (7.18). Then A contains every closed
set F : Given F define On = {x ∈ X|d(x, F ) < 1/n} and note that On are
open sets which satisfy On & F . Thus by Theorem 7.1 (iii) µ(On \F ) → 0
and hence F ∈ A.
Moreover, A is even a σ-algebra. That it is closed under complements
is easy to see (note that Õ = X\F and F̃ = X\O are the required sets
for ÃS= X\A). To see that it is closed under countable unions consider
A= ∞ n=1 An with ASn ∈ A. Then there are Fn , On such that µ(On \Fn ) ≤
ε2−n−1 . Now O = ∞
SN
n=1 On is open and F = n=1 Fn is closed for any
finite
S N . Since µ(A) is finite we can choose N sufficiently large such that
µ( ∞ \F ) ≤ ε/2. Then weShave found two sets of the required type:
N +1 FnP
µ(O\F ) ≤ ∞ ∞
n=1 µ(On \Fn ) + µ( n=N +1 Fn \F ) ≤ ε. Thus A is a σ-algebra
containing the open sets, hence it is the entire Borel algebra.
Now suppose µ is not finite. Pick some x0 ∈ X and set X0 = B2/3 (x0 )
and Xn = Bn+2/3 (x0 )\Bn−2/3 (x0 ), n ∈ N. Let An = A ∩ Xn and note that
A= ∞
S
n=0 An . By the finite case we canSchoose Fn ⊆ AS n ⊆ On ⊆ Xn such
that µ(On \Fn ) ≤ ε2 −n−1 . Now set F = n Fn and O = n On and observe
that F is closed. Indeed, let x ∈ F and let xj be some sequence from F
converging to x. Since d(x0 , xj ) → d(x0 , x) this sequence must eventually
lie in Fn ∪ Fn+1 for P
some fixed n implying x ∈ Fn ∪ Fn+1 = Fn ∪ Fn+1 ⊆ F .
Finally, µ(O\F ) ≤ ∞ n=0 µ(On \Fn ) ≤ ε as required.
Looking at the Borel σ-algebra the next general sets after open sets are
countable intersections of open sets, known as Gδ sets (here G and δ stand for
the german words Gebiet and Durchschnitt, respectively). The next general
sets after closed sets are countable unions of closed sets, known as Fσ sets
(here F and σ stand for the french words fermé and somme, respectively).
Example. The irrational numbers are a Gδ set in R. To see this let xn be
an enumeration of the rational numbers and consider the intersection of the
open sets On = R\{xn }. The rational numbers are hence an Fσ set.
Proof. Since Gδ sets are Borel, only the converse direction is nontrivial.
By LemmaT 7.14 we can find open sets On such that µ(On \A) ≤ 1/n. Now
let G = n On . Then µ(G\A) ≤ µ(On \A) ≤ 1/n for any n and thus
µ(G\A) = 0 The second claim is analogous.
Problem 7.12. Show directly (without using regularity) that for every ε > 0
there is an open set O of Lebesgue measure |O| < ε which covers the rational
numbers.
Problem 7.13. Show that a Borel set A ⊆ R has Lebesgue measure zero
if and only if for every
P ε there exists a countable set of Intervals Ij which
cover A and satisfy j |Ij | < ε.
128 7. Almost everything about Lebesgue integration
(v) follows
P from monotonicity
P of µ. (vi) follows since by (iv) we can write
s = j αj χCj , t = j βj χCj where, by assumption, αj ≤ βj .
7.5. Integration — Sum me up, Henri 129
In particular Z Z
f dµ = lim sn dµ, (7.25)
A n→∞ A
for every monotone sequence sn % f of simple functions. Note that there is
always such a sequence, for example,
n2n
X k k k+1
sn (x) = χ −1 (x), Ak = [ , ), An2n = [n, ∞). (7.26)
2n f (Ak ) 2n 2n
k=0
The second claim holds for simple functions and hence for all functions by
construction of the integral.
If the integral is finite for both the positive and negative part f ± =
max(±f, 0) of an arbitrary measurable function f , we call f integrable
and set Z Z Z
f dµ = f + dµ − f − dµ. (7.30)
A A A
7.5. Integration — Sum me up, Henri 131
Clearly f is integrable if and only if |f | is. The set of all integrable functions
is denoted by L1 (X, dµ).
Lemma 7.21. The integral is linear and Lemma 7.17 holds for integrable
functions s, t.
Furthermore, for all integrable functions f, g we have
Z Z
| f dµ| ≤ |f | dµ (7.32)
A A
Note that the proof also shows that if f is not 0 almost everywhere,
there is an ε > 0 such that µ({x| |f (x)| ≥ ε}) > 0.
In particular, the integral does not change if we restrict the domain of
integration to a support of µ or if we change f on a set of measure zero. In
particular, functions which are equal a.e. have the same integral.
Finally, our integral is well behaved with respect to limiting operations.
We first state a simple generalization of Fatou’s lemma.
Lemma 7.23 (generalized Fatou lemma). If fn is a sequence of real-valued
measurable function and g some integrable function. Then
Z Z
lim inf fn dµ ≤ lim inf fn dµ (7.36)
A n→∞ n→∞ A
if g ≤ fn and Z Z
lim sup fn dµ ≤ lim sup fn dµ (7.37)
n→∞ A A n→∞
if fn ≤ g.
R
Proof. To see the first apply Fatou’s lemma to fn −g and subtract A g dµ on
both sides of the result. The second follows from the first using lim inf(−fn ) =
− lim sup fn .
If in the last lemma we even have |fn | ≤ g, we can combine both esti-
mates to obtain
Z Z Z Z
lim inf fn dµ ≤ lim inf fn dµ ≤ lim sup fn dµ ≤ lim sup fn dµ,
A n→∞ n→∞ A n→∞ A A n→∞
(7.38)
which is known as Fatou–Lebesgue theorem. In particular, in the special
case where fn converges we obtain
Theorem 7.24 (Dominated convergence). Let fn be a convergent sequence
of measurable functions and set f = limn→∞ fn . Suppose there is an inte-
grable function g such that |fn | ≤ g. Then f is integrable and
Z Z
lim fn dµ = f dµ. (7.39)
n→∞
Proof. The real and imaginary parts satisfy the same assumptions and
hence it suffices to prove the case where fn and f are real-valued. Moreover,
since lim inf fn = lim sup fn = f equation (7.38) establishes the claim.
Remark: Since sets of measure zero do not contribute to the value of the
integral, it clearly suffices if the requirements of the dominated convergence
theorem are satisfied almost everywhere (with respect to µ).
Example. Note that the existence of g is crucial: The functions fn (x) =
1
R
χ
2n [−n,n] (x) on R converge uniformly to 0 but f
R n (x)dx = 1.
7.5. Integration — Sum me up, Henri 133
Problem 7.15. Show that the set B(X) of bounded measurable functions
with the sup norm is a Banach space. Show that the set S(X) of simple
functions is dense in B(X). Show that the integral is a bounded linear
functional on B(X) if µ(X) < ∞. (Hence Theorem 1.30 could be used to
extend the integral from simple to bounded measurable functions.)
Problem 7.16. Show that the monotone convergence holds for nondecreas-
ing sequences of real-valued measurable functions fn % f provided f1 is
integrable.
Problem 7.17. Show that the dominated convergence theorem implies (un-
der the same assumptions)
Z
lim |fn − f |dµ = 0.
n→∞
is continuous if there is an integrable function g(y) such that |f (x, y)| ≤ g(y).
Proof. As usual, we begin with the case where µ1 and µ2 are finite. Let
D be the set of all subsets for which our claim holds. Note that D contains
at least all rectangles. Thus it suffices to show that D is a Dynkin system
by Lemma 7.4. To see this note that measurability and equality of both
integrals follow from A1 (x2 )0 = A01 (x2 ) (implying µ1 (A01 (x2 )) = µ1 (X1 ) −
µ1 (A1 (x2 ))) for complements and from the monotone convergence theorem
for disjoint unions of sets.
136 7. Almost everything about Lebesgue integration
respectively,
Z
|f (x1 , x2 )|dµ2 (x2 ) ∈ L1 (X1 , dµ1 ), (7.53)
Proof. By Theorem 7.26 and linearity the claim holds for simple functions.
To see (i), let sn % f be a sequence of nonnegative simple functions. Then it
7.6. Product measures 137
follows by applying the monotone convergence theorem (twice for the double
integrals).
For (ii) we can assume that f is real-valued by considering its real and
imaginary parts separately. Moreover, splitting f = f + −f − into its positive
and negative parts, the claim reduces to (i).
Proof. Outer regularity holds for every rectangle and hence also for the
algebra of finite disjoint unions of rectangles (Problem 7.21). Thus the
claim follows from Lemma 7.9.
Problem 7.21. Show that the set of all finite union of rectangles A1 × A2
forms an algebra. Moreover, every set in this algebra can be written a finite
union of disjoint rectangles.
Problem 7.22. Let U ⊆ C be a domain, Y be some measure space, and f :
U × Y → C. Suppose y 7→ f (z, y) is measurable for every z and z 7→ f (z, y)
is holomorphic for every y. Show that
Z
F (z) = f (z, y) dµ(y) (7.55)
A
is holomorphic if for every compact subset V ⊂ U there is an integrable
function g(y) such that |f (z, y)| ≤ g(y), z ∈ V . (Hint: Use Fubini and
Morera.)
Proof. In fact, it suffices to check this formula for simple functions g, which
follows since χA ◦ f = χf −1 (A) .
Here ϕ = Vn−1 χB1 (0) and ϕε (y) = Rε−n ϕ(ε−1 y), where Vn is the volume of
the unit ball (cf. below), such that ϕε (x)dn x = 1.
We will evaluate this integral in two ways. To begin with we consider
the inner integral
Z
hε (y) = ϕε (f (z) − y)dn z.
R
For ε < ε0 the integrand is nonzero only for z ∈ K = f −1 (Bε0 (y)), where
K is some compact set containing x = f −1 (y). Using the affine change of
coordinates z = x + εw we obtain
f (x + εw) − f (x)
Z
1
hε (y) = ϕ dn w, Wε (x) = (K − x).
Wε (x) ε ε
By
f (x + εw) − f (x)
≥ 1 |w|, C = sup kdf −1 k
ε C
K
the integrand is nonzero only for w ∈ BC (0). Hence, as ε → 0 the domain
Wε (x) will eventually cover all of BC (0) and dominated convergence implies
Z
lim hε (y) = ϕ(df (x)w)dw = |Jf (x)|−1 .
ε↓0 BC (0)
with
x1 = r cos(ϕ) sin(θ1 ) sin(θ2 ) sin(θ3 ) · · · sin(θn−2 ),
x2 = r sin(ϕ) sin(θ1 ) sin(θ2 ) sin(θ3 ) · · · sin(θn−2 ),
x3 = r cos(θ1 ) sin(θ2 ) sin(θ3 ) · · · sin(θn−2 ),
x4 = r cos(θ2 ) sin(θ3 ) · · · sin(θn−2 ),
..
.
xn−1 = r cos(θn−3 ) sin(θn−2 ),
xn = r cos(θn−2 ).
The Jacobi determinant is given by
∂Tn
det = rn−1 sin(θ1 ) sin(θ2 )2 · · · sin(θn−2 )n−2 .
∂(r, ϕ, θ1 , . . . , θn−2 )
x
Proof. Consider the transformation f : Rn → [0, ∞) × S n−1 , x 7→ (|x|, |x| )
0 n−1
(with |0| = 1). Let dµ(r) = r dr and
Example. The above lemma can be used to determine when a radial func-
tion is integrable. For example we obtain
|x|α ∈ L1 (B1 (0)) ⇔ α > −n, |x|α ∈ L1 (Rn \B1 (0)) ⇔ α < −n.
(Hint: Use Fubini to show In = I1n and compute I2 using polar coordinates.)
144 7. Almost everything about Lebesgue integration
Clearly we have f (x−) ≤ f (x+) and a strict inequality can occur only at
a countable number of points. By monotonicity the value of f has to lie
between these two values f (x−) ≤ f (x) ≤ f (x+). It will also be convenient
to extend f to a function on the extended reals R ∪ {−∞, +∞}. Again
by monotonicity the limits f (±∞∓) = limx→±∞ f (x) exist and we will set
f (±∞±) = ±∞.
If we want to define an inverse, problems will occur at points where f
jumps and on intervals where f is constant. Formally speaking, if f jumps,
then the corresponding jump will be missing in the domain of the inverse and
if f is constant, the inverse will be multivalued. For the first case there is a
natural fix by choosing the inverse to be constant along the missing interval.
In particular, observe that this natural choice is independent of the actual
7.8. Appendix: Transformation of Lebesgue–Stieltjes integrals 145
value of f at the jump and hence the inverse loses this information. The
second case will result in a jump for the inverse function and here there is no
natural choice for the value at the jump (except that it must be between the
left and right limits such that the inverse is again a nondecreasing function).
To give a precise definition it will be convenient to look at relations
instead of functions. Recall that a (binary) relation R on R is a subset of
R2 .
To every nondecreasing function f associate the relation
Γ(f ) = {(x, y)|y ∈ [f (x−), f (x+)]}. (7.68)
Note that Γ(f ) does not depend on the values of f at a discontinuity and
f can be partially recovered from Γ(f ) using f (x−) = inf Γ(f )(x) and
f (x+) = sup Γ(f )(x), where Γ(f )(x) = {y|(x, y) ∈ Γ(f )} = [f (x−), f (x+)].
Moreover, the relation is nondecreasing in the sense that x1 < x2 implies
y1 ≤ y2 for (x1 , y1 ), (x2 , y2 ) ∈ Γ(f ). It is uniquely defined as the largest
relation containing the graph of f with this property.
The graph of any reasonable inverse should be a subset of the inverse
relation
Γ(f )−1 = {(y, x)|(x, y) ∈ Γ(f )} (7.69)
and we will call any function f −1 whose graph is a subset of Γ(f )−1 a
generalized inverse of f . Note that any generalized inverse is again non-
decreasing since a pair of points (y1 , x1 ), (y2 , x2 ) ∈ Γ(f )−1 with y1 < y2
and x1 > x2 would contradict the fact that Γ(f ) is nondecreasing. More-
over, since Γ(f )−1 and Γ(f −1 ) are two nondecreasing relations containing
the graph of f −1 , we conclude
Γ(f −1 ) = Γ(f )−1 (7.70)
since both are maximal. In particular, it follows that if f −1 is a generalized
inverse of f then f is a generalized inverse of f −1 .
There are two particular choices, namely the left continuous version
f−−1 (y) = inf Γ(f )−1 (y) and the right continuous version f+−1 (y) = sup Γ(f )−1 (y).
It is straightforward to verify that they can be equivalently defined via
f−−1 (y) = inf f −1 ([y, ∞)) = sup f −1 ((−∞, y)),
f+−1 (y) = inf f −1 ((y, ∞)) = sup f −1 ((−∞, y]). (7.71)
For example, inf f −1 ([y, ∞)) = inf{x|(x, ỹ) ∈ Γ(f ), ỹ ≥ y} = inf Γ(f )−1 (y).
The first one is typically used in probability theory, where it corresponds to
the quantile function of a distribution.
If f is strictly increasing the generalized inverse coincides with the usual
inverse and we have f (f −1 (y)) = y for y in the range of f . The purpose
146 7. Almost everything about Lebesgue integration
of the next lemma is to investigate to what extend this remains valid for a
generalized inverse.
Note that for every y there is some x with y ∈ [f (x−), f (x+)]. Moreover,
if we can find two values, say x1 and x2 , with this property, then f (x) = y
is constant for x ∈ (x1 , x2 ). Hence, the set of all such x is an interval which
is closed since at the left, right boundary point the left, right limit equals y,
respectively.
We collect a few simple facts for later use.
Lemma 7.34. Let f be nondecreasing.
(i) f−−1 (y) ≤ x if and only if y ≤ f (x+).
(i’) f+−1 (y) ≥ x if and only if y ≥ f (x−).
(ii) f (f−−1 (y)) ≥ y if f is right continuous at f−−1 (y) with equality if
y ∈ Ran(f ).
(ii’) f (f+−1 (y)) ≤ y if f is left continuous at f+−1 (y) with equality if
y ∈ Ran(f ).
Proof. Item (i) follows since both claims are equivalent to y ≤ f (x̃) for
all x̃ > x. Next, let f be right-continuous. If y is in the range of f ,
then y = f (f−−1 (y)+) = f (f−−1 (y)) if f −1 ({y}) contains more than one
point and y = f (f−−1 (y)) else. If y is not in the range of m we must have
y ∈ [f (x−), f (x+)] for x = f−−1 (y) which establishes item (ii). Similarly for
(i’) and (ii’).
Proof. First of all note that the assumptions in case f is bounded from
above or below ensure that (f? µ)(K) < ∞ for any compact interval. More-
over, we can assume m = m+ without loss of generality. Now note that we
7.8. Appendix: Transformation of Lebesgue–Stieltjes integrals 147
have f −1 ((y, ∞)) = (f −1 (y), ∞) for y ∈ L(f ) and f −1 ((y, ∞)) = [f −1 (y), ∞)
else. Hence
(f? µ)((y0 , y1 ]) = µ(f −1 ((y0 , y1 ])) = µ((f −1 (y0 ), f −1 (y1 )])
= m(f+−1 (y1 )) − m(f+−1 (y0 )) = (m ◦ f+−1 )(y1 ) − (m ◦ f+−1 )(y0 )
if yj is either in L(f ) or satisfies µ({f+−1 (yj )}) = 0. For the last claim observe
that f −1 ((y, ∞)) will jump from (f+−1 (y), ∞) to [f+−1 (y), ∞) at y = f (x).
Example. For example, consider f (x) = χ[0,∞) (x) and µ = Θ, the Dirac
measure centered at 0 (note that Θ(x) = f (x)). Then
+∞, 1 ≤ y,
−1
f+ (y) = 0, 0 ≤ y < 1,
−∞, y < 0,
and L(f ) = (−∞, 0)∪[1, ∞). Moreover, µ(f+−1 (y)) = χ[0,∞) (y) and (f? µ)(y) =
−1
χ[1,∞) (y). If we choose g(x) = χ(0,∞) (x), then g+ (y) = f+−1 (y) and L(g) =
−1
R. Hence µ(g+ (y)) = χ[0,∞) (y) = (g? µ)(y).
For later use it is worth while to single out the following consequence:
Corollary 7.36. Let m, f be as in the previous lemma and denote by µ, ν±
the measures associated with m, m± ◦ f −1 , respectively. Then, (f∓ )? µ = ν±
and hence Z Z
g d(m± ◦ f −1 ) = (g ◦ f∓ ) dm. (7.73)
for any Borel function g which is either nonnegative or for which one of the
two integrals is finite. Similarly, if n is left continuous and im is replaced
by m(m−1+ (y)).
R R
Hence the usual R (g ◦ m) d(n ◦ m) = Ran(m) g dn only holds if m is
continuous. In fact, the right-hand side looses all point masses of µ. The
above formula fixes this problem by rendering g constant along a gap in the
range of m and includes the gap in the range of integration such that it
makes up for the lost point mass. It should be compared with the previous
example!
If one does not want to bother with im one can at least get inequalities
for monotone g.
Problem 7.31. Show that Γ(f )◦Γ(f −1 ) = {(y, z)|y, z ∈ [f (x−), f (x+)] for some x}.
Problem 7.32. Let dµ(λ) = χ[0,1] (λ)dλ and f (λ) = χ(t,∞) (λ), t ∈ R.
Compute f? µ.
Hence sP,f,− (x) is a step function approximating f from below and sP,f,+ (x)
is a step function approximating f from above. In particular,
m ≤ sP,f,− (x) ≤ f (x) ≤ sP,f,+ (x) ≤ M, m = inf f (x), M = sup f (x).
x∈[a,b] x∈[a,b]
(7.82)
Moreover, we can define the upper and lower Riemann sum associated with
P as
n
X n
X
L(P, f ) = mj (xj − xj−1 ), U (P, f ) = Mj (xj − xj−1 ). (7.83)
j=1 j=1
Of course, L(f, P ) is just the Lebesgue integral of sP,f,− and U (f, P ) is the
Lebesgue integral of sP,f,+ . In particular, L(P, f ) approximates the area
under the graph of f from below and U (P, f ) approximates this area from
above.
By the above inequality
m (b − a) ≤ L(P, f ) ≤ U (P, f ) ≤ M (b − a). (7.84)
We say that the partition P2 is a refinement of P1 if P1 ⊆ P2 and it is not
hard to check, that in this case
sP1 ,f,− (x) ≤ sP2 ,f,− (x) ≤ f (x) ≤ sP2 ,f,+ (x) ≤ sP1 ,f,+ (x) (7.85)
as well as
L(P1 , f ) ≤ L(P2 , f ) ≤ U (P2 , f ) ≤ U (P1 , f ). (7.86)
Hence we define the lower, upper Riemann integral of f as
Z Z
f (x)dx = sup L(P, f ), f (x)dx = inf U (P, f ), (7.87)
P P
respectively. Clearly
Z Z
m (b − a) ≤ f (x)dx ≤ f (x)dx ≤ M (b − a). (7.88)
We will call f Riemann integrable if both values coincide and the common
value will be called the Riemann integral of f .
150 7. Almost everything about Lebesgue integration
Example. Let [a, b] = [0, 1] and f (x) = χQ (x). Then sP,f,− (x) = 0 and
R R
sP,f,+ (x) = 1. Hence f (x)dx = 0 and f (x)dx = 1 and f is not Riemann
integrable.
On the other hand, every continuous function f ∈ C[a, b] is Riemann
integrable (Problem 7.33).
In this case the above limit equals the Riemann integral of f and Pj can be
chosen such that Pj ⊆ Pj+1 and kPj k → 0.
and thus by Lemma 7.22 sf,+ (x) = sf,− (x) almost everywhere. Moreover,
f is continuous at every x at which equality holds and which is not in any
of the partitions. Since the first as well as the second set have Lebesgue
measure zero, f is continuous almost everywhere and
Z Z
lim L(Pj , f ) = lim U (Pj , f ) = sf,± (x)dx = f (x)dx.
j j
7.9. Appendix: The connection with the Riemann integral 151
and denote by Lp (X, dµ) the set of all complex-valued measurable functions
for which kf kp is finite. First of all note that Lp (X, dµ) is a vector space,
since |f + g|p ≤ 2p max(|f |, |g|)p = 2p max(|f |p , |g|p ) ≤ 2p (|f |p + |g|p ). Of
course our hope is that Lp (X, dµ) is a Banach space. However, Lemma 7.22
implies that there is a small technical problem (recall that a property is said
to hold almost everywhere if the set where it fails to hold is contained in a
set of measure zero):
Thus kf kp = 0 only implies f (x) = 0 for almost every x, but not for all!
Hence k.kp is not a norm on Lp (X, dµ). The way out of this misery is to
identify functions which are equal almost everywhere: Let
153
154 8. The Lebesgue spaces Lp
Then N (X, dµ) is a linear subspace of Lp (X, dµ) and we can consider the
quotient space
Lp (X, dµ) = Lp (X, dµ)/N (X, dµ). (8.4)
If dµ is the Lebesgue measure on X ⊆ Rn , we simply write Lp (X). Observe
that kf kp is well-defined on Lp (X, dµ).
Even though the elements of Lp (X, dµ) are, strictly speaking, equiva-
lence classes of functions, we will still call them functions for notational
convenience. However, note that for f ∈ Lp (X, dµ) the value f (x) is not
well defined (unless there is a continuous representative and different con-
tinuous functions are in different equivalence classes, e.g., in the case of
Lebesgue measure).
With this modification we are back in business since Lp (X, dµ) turns out
to be a Banach space. We will show this in the following sections. Moreover,
note that L2 (X, dµ) is a Hilbert space with scalar product given by
Z
hf, gi = f (x)∗ g(x)dµ(x). (8.5)
X
But before that let us also define L∞ (X, dµ). It should be the set of bounded
measurable functions B(X) together with the sup norm. The only problem is
that if we want to identify functions equal almost everywhere, the supremum
is no longer independent of the representative in the equivalence class. The
solution is the essential supremum
kf k∞ = inf{C | µ({x| |f (x)| > C}) = 0}. (8.6)
That is, C is an essential bound if |f (x)| ≤ C almost everywhere and the
essential supremum is the infimum over all essential bounds.
Example. If λ is the Lebesgue measure, then the essential sup of χQ with
respect to λ is 0. If Θ is the Dirac measure centered at 0, then the essential
sup of χQ with respect to Θ is 1 (since χQ (0) = 1, and x = 0 is the only
point which counts for Θ).
As before we set
L∞ (X, dµ) = B(X)/N (X, dµ) (8.7)
and observe that kf k∞ is independent of the equivalence class.
If you wonder where the ∞ comes from, have a look at Problem 8.2.
If X is a locally compact Hausdorff space (together with the Borel sigma
algebra), a function is called locally integrable if it is integrable when
restricted to any compact subset K ⊆ X. The set of all (equivalence classes
of) locally integrable functions will be denoted by L1loc (X, dµ). Of course
this definition extends to Lp for any 1 ≤ p ≤ ∞.
8.2. Jensen ≤ Hölder ≤ Minkowski 155
-
x y
If the inequality is strict, then ϕ is called strictly convex. It is not hard
to see (use z = (1 − λ)x + λy) that the definition implies
ϕ(z) − ϕ(x) ϕ(y) − ϕ(x) ϕ(y) − ϕ(z)
≤ ≤ , x < z < y, (8.10)
z−x y−x y−z
where the inequalities are strict if ϕ is strictly convex.
156 8. The Lebesgue spaces Lp
Now let us turn to the case p = ∞. For every ε > 0 the set Aε =
{x| |f (x)| ≥ kf k∞ −ε} has positive measure. Moreover, considering Xn % X
with µ(Xn ) < ∞ there must be some n such that Bε R= Aε ∩ Xn satisfies 0 <
µ(Bε ) < ∞. Then gε = sign(f ∗ )χBε /µ(Bε ) satisfies X f gε dµ ≥ kf k∞ − ε.
Finally, choosing a sequence of simple functions sn → gε in L1 finishes the
proof.
Note that in the case p = ∞ it suffices to assume that for every set A of
positive measure, there is a subset B ⊆ A with 0 < µ(B) < ∞. Moreover,
without this assumption the claim is clearly false since there will be no
simple functions s supported on A with ksk1 = 1.
As another consequence of Hölder’s inequality we also get
Theorem 8.6 (Minkowski’s integral inequality). Suppose, µ and ν are two
σ-finite measures and f is µ ⊗ ν measurable. Let 1 ≤ p ≤ ∞. Then
Z
Z
f (., y)dν(y)
≤ kf (., y)kp dν(y), (8.14)
Y p Y
S
A = n,m An,m is again of measure zero, we see that fn (x) is a Cauchy
sequence for x ∈ X\A. The pointwise limit f (x) = limn→∞ fn (x), x ∈ X\A,
is the required limit in L∞ (X, dµ) (show this).
Consequently, if fn ∈ Lp0 ∩ Lp1 converges in both Lp0 and Lp1 , then the
limits will be equal a.e. Be warned that the statement is not true in general
without passing to a subsequence (Problem 8.8).
It even turns out that Lp is separable.
Lemma 8.10. Suppose X is a second countable topological space (i.e., it has
a countable basis) and µ is an outer regular Borel measure. Then Lp (X, dµ),
1 ≤ p < ∞, is separable. In particular, for every countable base which is
closed under finite unions the set of characteristic functions χO (x) with O
in this base is total.
Proof. The set of all characteristic functions χA (x) with A ∈ Σ and µ(A) <
∞ is total by construction of the integral (Problem 8.10)). Now our strategy
is as follows: Using outer regularity, we can restrict A to open sets and using
the existence of a countable base, we can restrict A to open sets from this
base.
Fix A. By outer regularity, there is a decreasing sequence of open sets
On ⊇ A such that µ(On ) → µ(A). Since µ(A) < ∞, it is no restriction to
assume µ(On ) < ∞, and thus kχA −χOn kp = µ(On \A) = µ(On )−µ(A) → 0.
Thus the set of all characteristic functions χO (x) with O open and µ(O) < ∞
is total. Finally let B be a countable S base for the topology. Then, every
open set O can be written as O = ∞ j=1 Õj with Õj ∈ B. Moreover, by
consideringS the set of all finite unions of elements from B, it is no restriction
to assume nj=1 Õj ∈ B. Hence there is an increasing sequence Õn % O
with Õn ∈ B. By monotone convergence, kχO − χÕn kp → 0 and hence the
set of all characteristic functions χÕ with Õ ∈ B is total.
Problem 8.8. Find a sequence fn which converges to 0 in Lp (0, 1), 1 ≤
p < ∞, but for which fn (x) → 0 for a.e. x ∈ (0, 1) does not hold. (Hint:
Every n ∈ N can be uniquely written as n = 2m + k with 0 ≤ m and
0 ≤ k < 2m . Now consider the characteristic functions of the intervals
Im,k = [k2−m , (k + 1)2−m ].)
Problem 8.9. Let µj be σ-finite regular Borel measures on some second
countable topological spaces Xj , j = 1, 2. Show that the set of characteristic
8.4. Approximation by nicer functions 161
functions χA1 ×A2 with Aj Borel sets is total in Lp (X1 × X2 , d(µ1 ⊗ µ2 )) for
1 ≤ p < ∞. (Hint: Problem 7.21 and Lemma 8.10.)
Proof. As in the proof of Lemma 8.10 the set of all characteristic functions
χK (x) with K compact is total (using inner regularity). Hence it suffices to
show that χK (x) can be approximated by continuous functions. By outer
regularity there is an open set O ⊃ K such that µ(O\K) ≤ ε. By Urysohn’s
lemma (Lemma 1.17) there is a continuous function fε : X → [0, 1] with
compact support which is 1 on K and 0 outside O. Since
Z Z
|χK − fε |p dµ = |fε |p dµ ≤ µ(O\K) ≤ ε,
X O\K
Clearly this result has to fail in the case p = ∞ (in general) since the
uniform limit of continuous functions is again continuous. In fact, the closure
of Cc (Rn ) in the infinity norm is the space C0 (Rn ) of continuous functions
vanishing at ∞ (Problem 1.36).
If X is some subset of Rn , we can do even better and approximate
integrable functions by smooth functions. The idea is to replace the value
f (x) by a suitable average computed from the values in a neighborhood.
This is done by choosing a nonnegative bump function φ, whose area is
normalized to 1, and considering the convolution
Z Z
(φ ∗ f )(x) = φ(x − y)f (y)dy = φ(y)f (x − y)dy. (8.19)
Rn Rn
For example, if we choose φr = |Br (0)|−1 χBr (0) to be the characteristic
function of a ball centered at 0, then (φr ∗ f )(x) will be precisely the average
of the values of f in the ball Br (x). In the general case we can think of
(φ ∗ f )(x) as an weighted average. Moreover, if we choose φ differentiable,
we can interchange differentiation and integration to conclude that φ ∗ f will
162 8. The Lebesgue spaces Lp
φε (x)dn x = 0.
R
(iii) For every r > 0 we have limε↓0 |x|≥r
8.4. Approximation by nicer functions 163
-
In fact, (i), (ii) are obvious from kφε k1 = 1 and the integral in (iii) will be
identically zero for ε ≥ rs , where s is chosen such that supp φ ⊆ Bs (0).
Proof. We begin with the case where f ∈ Cc (Rn ). Fix some small δ > 0.
Since f is uniformly continuous we know |f (x − y) − f (x)| → 0 as y → 0
uniformly in x. Since the support of f is compact, this remains true when
taking the Lp norm and thus we can find some r such that
δ
kf (. − y) − f (.)kp ≤ , |y| ≤ r.
2C
(Here the C is the one for which kφε k1 ≤ C holds.) Now we use
Z
(φε ∗ f )(x) − f (x) = φε (y)(f (x − y) − f (x))dn y
Rn
Splitting the domain of integration according to Rn = {y||y| ≤ r} ∪ {y||y| >
r}, we can estimate the Lp norms of the individual integrals using the
164 8. The Lebesgue spaces Lp
Note that in case of a mollifier with support in Br (0) this result implies
a corresponding local version since the value of (φε ∗ f )(x) is only affected
by the values of f on Bεr (x). The question when the pointwise limit exists
will be addressed in Problem 8.13.
Now we are ready to prove
Theorem 8.14. If X ⊆ Rn is open and µ is a regular Borel measure, then
the set Cc∞ (X) of all smooth functions with compact support is dense in
Lp (X, dµ), 1 ≤ p < ∞.
gε f d n x = g f dn x =
R R
Rand thus dominated convergence implies 0 = limε↓0
n x.
K |f |d
Problem 8.10. Show that for any f ∈ Lp (X, dµ), 1 ≤ p < ∞ there exists
a sequence of integrable simple functions sn such that |sn | ≤ |f | and sn → f
in Lp (X, dµ). Show that for p = ∞ this only holds if the measure is finite.
(Hint: Split f into the sum of four nonnegative functions and use (7.26).)
is an approximate identity on R.
Show that the Cauchy transform
Z
1 f (λ)
F (z) = dλ
π R λ−z
of a function f ∈ Lp (R) is analytic in the upper half-plane with imaginary
part given by
Im(F (x + iy)) = (Py ∗ f )(x).
In particular, by Young’s inequality kIm(F (. + iy))kp ≤ kf kp and thus
supy>0 kIm(F (. + iy))kp = kf kp . Such analytic functions are said to be
in the Hardy space H p (C+ ).
(Hint: To see analyticity of F use Problem 7.22 plus the estimate
1 1 1 + |z|
λ − z ≤ 1 + |λ| |Im(z)| .)
Proof. We assume 1 < p < ∞ for simplicity and leave the R cases p = 1, ∞ to
the reader. Choose f ∈ Lp (Y, dν). By Fubini’s theorem Y |K(x, y)f (y)|dν(y)
is measurable and by Hölder’s inequality we have
Z Z
|K(x, y)f (y)|dν(y) ≤ K1 (x, y)K2 (x, y)|f (y)|dν(y)
Y Y
Z 1/q Z 1/p
q p
≤ K1 (x, y) dν(y) |K2 (x, y)f (y)| dν(y)
Y Y
Z 1/p
p
≤ C1 |K2 (x, y)f (y)| dν(y)
Y
for µ a.e. x (if K2 (x, .)f (.) 6∈ Lp (X, dν), the inequality is trivially true). Now
take this inequality to the p’th power and integrate with respect to x using
Fubini
Z Z p Z Z
|K(x, y)f (y)|dν(y) dµ(x) ≤ C1p |K2 (x, y)f (y)|p dν(y)dµ(x)
X Y X Y
Z Z
p
= C1 |K2 (x, y)f (y)|p dµ(x)dν(y) ≤ C1p C2p kf kpp .
Y X
Hence Y |K(x, y)f (y)|dν(y) ∈ Lp (X, dµ) and in particular it is finite for
R
Rµ-almost every x. Thus K(x, .)f (.) is ν integrable for µ-almost every x and
Y K(x, y)f (y)dν(y) is measurable.
Note that the assumptions are for example satisfied if kK(x, .)kL1 (Y,dν) ≤
C and kK(., y)kL1 (X,dµ) ≤ C which follows by choosing K1 (x, y) = |K(x, y)|1/q
and K2 (x, y) = |K(x, y)|1/p . For related results see also Problems 8.15 and
12.3.
8.5. Integral operators 167
Proof. Since K(x, .) ∈ L2 (X, dµ) for µ-almost every x we infer from Parse-
val’s relation
2 Z
X Z
|K(x, y)|2 dµ(y)
K(x, y)uj (y)dµ(y) =
j X X
Hence in combination with Lemma 5.5 this shows that our definition for
integral operators agrees with our previous definition from Section 5.2. In
particular, this gives us an easy to check test for compactness of an integral
operator.
Example. Let [a, b] be some compact interval and suppose K(x, y) is
bounded. Then the corresponding integral operator in L2 (a, b) is Hilbert–
Schmidt and thus compact. This generalizes Lemma 3.4.
Problem 8.14. Suppose (Y, dν) = (X, dµ) with X ⊆ Rn . Show that the
integral operator with kernel K(x, y) = k(x − y) is bounded in Lp (X, dµ) if
k ∈ L1 (X, dµ) with kKk ≤ kkk1 .
Problem 8.15 (Schur test). Let K(x, y) be given and suppose there are
positive measurable functions a(x) and b(y) such that
kK(x, .)b(.)kL1 (Y,dν) ≤ C1 a(x), ka(.)K(., y)kL1 (X,dµ) ≤ C2 b(y).
168 8. The Lebesgue spaces Lp
The two main results will follow as simple consequence of the following
result:
169
170 9. More measure theory
Hence by the Riesz lemma (Theorem 2.10) there exists a g ∈ L2 (X, dα) such
that Z
`(h) = hg dα.
X
By construction
Z Z Z
ν(A) = χA dν = χA g dα = g dα. (9.3)
A
where for the last equality we have assumed Nn ⊆ X̃n and fn (x) = 0 for
x ∈ X̃n0 without loss of generality. Now set N = n Nn as well as f = n fn ,
S P
9.1. Decomposition of measures 171
Proof. Just observe that in this case ν(A ∩ N ) = 0 for every A. Uniqueness
will be shown in the next theorem.
Proof. Taking νsing (A) = ν(A ∩ N ) and dνac = f dµ from the previous
theorem, there is at least one such decomposition. To show uniqueness
assume there is another one, ν = ν̃ac +ν̃sing , and let Ñ be such that µ(Ñ ) = 0
and ν̃sing (Ñ 0 ) = 0. Then νsing (A) − ν̃sing (A) = A (f˜ − f )dµ. In particular,
R
˜ ˜
R
A∩N 0 ∩Ñ 0 (f − f )dµ = 0 and hence, since A is arbitrary, f = f a.e. away
˜
from N ∪ Ñ . Since µ(N ∪ Ñ ) = 0, we have f = f a.e. and hence ν̃ac = νac
as well as ν̃sing = ν − ν̃ac = ν − νac = νsing .
Problem 9.1. Let µ be a Borel measure on B and suppose its distribution
function µ(x) is continuously differentiable. Show that the Radon–Nikodym
derivative equals the ordinary derivative µ0 (x).
Problem 9.2. Suppose µ is a Borel measure on R and f : R → R is contin-
uous. Show that f? µ is absolutely continuous if µ is. (Hint: Problem 7.13)
172 9. More measure theory
Problem 9.3. Suppose µ and ν are inner regular measures. Show that
ν µ if and only if µ(C) = 0 implies ν(C) = 0 for every compact set.
Problem 9.4. Suppose ν(A) ≤ Cµ(A) for all A ∈ Σ. Then dν = f dµ with
0 ≤ f ≤ C a.e.
Problem 9.5. Let dν = f dµ. Suppose f > 0 a.e. with respect to µ. Then
µ ν and dµ = f −1 dν.
Problem 9.6 (Chain rule). Show that ν µ is a transitive relation. In
particular, if ω ν µ, show that
dω dω dν
= .
dµ dν dµ
Problem 9.7. Suppose ν µ. Show that for every measure ω we have
dω dω
dµ = dν + dζ,
dµ dν
where ζ is a positive measure (depending on ω) which is singular with respect
to ν. Show that ζ = 0 if and only if µ ν.
Lemma 9.4 (Wiener covering lemma). Given open balls B1 = Br1 (x1 ), . . . ,
Bm = Brm (xm ) in Rn , there is a subset of disjoint balls Bj1 , . . . , Bjk such
that
m
[ [k
Bj ⊆ B3rj` (xj` ) (9.8)
j=1 `=1
Proof. Assume that the balls Bj are ordered by decreasing radius. Start
with Bj1 = B1 and remove all balls from our list which intersect Bj1 . Ob-
serve that the removed balls are all contained in B3r1 (x1 ). Proceeding like
this, we obtain the required subset.
The upshot of this lemma is that we can select a disjoint subset of balls
which still controls the Lebesgue volume of the original set up to a universal
constant 3n (recall |B3r (x)| = 3n |Br (x)|).
Now we can show
subcover of K from these balls. Moreover, by Lemma 9.4 we can refine our
set of balls such that
k k
X 3n X µ(O)
|K| ≤ 3n |Bri (xi )| < µ(Bri (xi )) ≤ 3n .
α α
i=1 i=1
and using the first part of Lemma 9.5 plus |{x | |h(x)| ≥ α}| ≤ α−1 khk1
(Problem 9.11), we see
ε
|{x | lim sup Dr (f )(x) ≥ 2α}| ≤ (3n + 1) .
r↓0 α
Since ε is arbitrary, the Lebesgue measure of this set must be zero for every
α. That is, the set where the lim sup is positive has Lebesgue measure
zero.
Note that the balls can be replaced by more general sets: A sequence of
sets Aj (x) is said to shrink to x nicely if there are balls Brj (x) with rj → 0
and a constant ε > 0 such that Aj (x) ⊆ Brj (x) and |Aj | ≥ ε|Brj (x)|. For
example Aj (x) could be some balls or cubes (not necessarily containing x).
However, the portion of Brj (x) which they occupy must not go to zero! For
example the rectangles (0, 1j ) × (0, 2j ) ⊂ R2 do shrink nicely to 0, but the
rectangles (0, 1j ) × (0, j22 ) do not.
9.2. Derivatives of measures 175
Proof. Let x be a Lebesgue point and choose some nicely shrinking sets
Aj (x) with corresponding Brj (x) and ε. Then
Z Z
1 1
|f (y) − f (x)|dy ≤ |f (y) − f (x)|dy
|Aj (x)| Aj (x) ε|Brj (x)| Brj (x)
and the claim follows.
Corollary 9.8. Let µ be a Borel measure on R which is absolutely con-
tinuous with respect to Lebesgue measure. Then its distribution function is
differentiable a.e. and dµ(x) = µ0 (x)dx.
Using the upper and lower derivatives, we can also give supports for the
absolutely and singularly continuous parts.
Theorem 9.10. The set {x|0 < (Dµ)(x) < ∞} is a support for the abso-
lutely continuous and {x|(Dµ)(x) = ∞} is a support for the singular part.
Proof. The first part is immediate from the previous theorem. For the
second part first note that by (Dµ)(x) ≥ (Dµsing )(x) we can assume that µ
is purely singular. It suffices to show that the set Ak = {x | (Dµ)(x) < k}
satisfies µ(Ak ) = 0 for every k ∈ N.
Let K ⊂ Ak be compact, and let Vj ⊃ K be some open set such that
|Vj \K| ≤ 1j . For every x ∈ K there is some ε = ε(x) such that Bε (x) ⊆ Vj
and µ(B3ε (x)) ≤ k|B3ε (x)|. By compactness, finitely many of these balls
cover K and hence
X
µ(K) ≤ µ(Bεi (xi )).
i
Lemma 9.11. The set Mac = {x|0 < (Dµ)(x) < ∞} is a minimal support
for µac .
9.2. Derivatives of measures 177
Let ε > 0 be fixed and for each An choose a disjoint partition Bn,k oaf
An such that
m
X ε
|ν|(An ) ≤ |ν(Bn,k )| + n .
2
k=1
Then
N
X N X
X m N
[
|ν|(An ) ≤ |ν(Bn,k )| + ε ≤ |ν|( An ) + ε ≤ |ν|(A) + ε
n=1 n=1 k=1 n=1
SN Sm SN
since
P∞ n=1 k=1 Bn,k = n=1 An . Since ε was arbitrary this shows |ν|(A) ≥
n=1 |ν|(An ).
Conversely, given a finite partition Bk of A, then
Xm Xm X ∞ X m X ∞
|ν(Bk )| = ν(Bk ∩ An ) ≤ |ν(Bk ∩ An )|
k=1 k=1 n=1 k=1 n=1
∞ X
X m ∞
X
= |ν(Bk ∩ An )| ≤ |ν|(An ).
n=1 k=1 n=1
P∞
Taking the supremum over all partitions Bk shows |ν|(A) ≤ n=1 |ν|(An ).
Hence |ν| is a positive measure and it remains to show that it is finite.
Splitting ν into its real and imaginary part, it is no restriction to assume
that ν is real-valued since |ν|(A) ≤ |Re(ν)|(A) + |Im(ν)|(A).
The idea is as follows: Suppose we can split any given set A with |ν|(A) =
∞ into two subsets B and A\B such that |ν(B)| ≥ 1 and |ν|(A\B) = ∞.
Then we can construct a sequence Bn of disjoint sets with |ν(Bn )| ≥ 1 for
which
X∞
ν(Bn )
n=1
diverges (the terms of a convergent series mustSconverge to zero). But σ-
additivity requires that the sum converges to ν( n Bn ), a contradiction.
It remains to show existence of this splitting. Let A with |ν|(A) = ∞
be given. Then there are disjoint sets Aj such that
n
X
|ν(Aj )| ≥ 2 + |ν(A)|.
j=1
S S
Now let A+ = {Aj |ν(Aj ) ≥ 0} and A− = A\A+ = {Aj |ν(Aj ) < 0}.
Then the above inequality reads ν(A+ ) + |ν(A− )| ≥ 2 + |ν(A+ ) − |ν(A− )||
implying (show this) that for both of them we have |ν(A± )| ≥ 1 and by
|ν|(A) = |ν|(A+ ) + |ν|(A− ) either A+ or A− must have infinite |ν| measure.
180 9. More measure theory
Note that this implies that every complex measure ν can be written as
a linear combination of four positive measures. In fact, first we can split ν
into its real and imaginary part
ν = νr + iνi , νr (A) = Re(ν(A)), νi (A) = Im(ν(A)). (9.17)
Second we can split every real (also called signed) measure according to
|ν|(A) ± ν(A)
ν = ν+ − ν− , ν± (A) = . (9.18)
2
By (9.16) both ν− and ν+ are positive measures. This splitting is also known
as Jordan decomposition of a signed measure.
Of course such a decomposition of a signed measure is not unique (we can
always add a positive measure to both parts), however, the Jordan decom-
position is unique in the sense that it is the smallest possible decomposition.
Lemma 9.13. Let ν be a complex measure and µ a positive measure satis-
fying |ν(A)| ≤ µ(A) for all measurable sets A. Then µ ≥ |ν|. (Here |ν| ≤ µ
has to be understood as |ν|(A) ≤ µ(A) for every measurable set A.)
Furthermore, let ν be a signed measure and ν = ν̃+ − ν̃− a decomposition
into positive measures. Then ν̃± ≥ ν± , where ν± is the Jordan decomposi-
tion.
Proof. It suffices to prove the first part since the second is a special case.
But for
P every measurable
P set A and a corresponding finite partition Ak we
have k |ν(Ak )| ≤ µ(Ak ) = µ(A) implying |ν|(A) ≤ µ(A).
finite sums, (9.14) holds for finitely many sets. Hence it remains to show
ν(Ãn ) → ν(A). This follows from
|ν(Ãn ) − ν(A)| ≤ |ν(Ãn ) − νk (Ãn )| + |νk (Ãn ) − νk (A)| + |νk (A) − ν(A)|
≤ 2Ck + |νk (Ãn ) − νk (A)|.
Finally, νk → ν since |νk (A) − ν(A)| ≤ Ck implies kνk − νk ≤ 4Ck (Prob-
lem 9.17).
S
Proof. If An are disjoint sets and A = n An we have
X X Z XZ Z
|ν(An )| = f dµ ≤ |f |dµ = |f |dµ.
n n An n An A
R
Hence |ν|(A) ≤ A |f |dµ. To show the converse define
k−1 arg(f (x)) + π k
Ank = {x ∈ A| < ≤ }, 1 ≤ k ≤ n.
n 2π n
Then the simple functions
n
X k−1
sn (x) = e−2πi n
+iπ
χAnk (x)
k=1
Conclude that |ν(A)| ≤ C for all measurable sets A implies kνk ≤ 4C.
is called the total variation of f over [a, b]. If the total variation is finite,
f is called of bounded variation. Since we clearly have
Vab (αf ) = |α|Vab (f ), Vab (f + g) ≤ Vab (f ) + Vab (g) (9.27)
the space BV [a, b] of all functions of finite total variation is a vector space.
However, the total variation is not a norm since (consider the partition
P = {a, x, b})
Vab (f ) = 0 ⇔ f (x) ≡ c. (9.28)
Moreover, any function of bounded variation is in particular bounded (con-
sider again the partition P = {a, x, b})
sup |f (x)| ≤ |f (a)| + Vab (f ). (9.29)
x∈[a,b]
Proof. From
f (y) − f (x) ≤ |f (y) − f (x)| ≤ Vxy (f ) = Vay (f ) − Vax (f )
for x ≤ y we infer Vax (f )−f (x) ≤ Vay (f )−f (y), that is, f+ is nondecreasing.
Moreover, replacing f by −f shows that f− is nondecreasing and the claim
follows.
Lemma 9.22. Suppose f ∈ BV [a, b]. We have limc↑b Vac (f ) = Vab (f ) if and
only if f (b) = f (b−) and limc↓a Vcb (f ) = Vab (f ) if and only if f (a) = f (a+).
In particular, Vax (f ) is left, right continuous if and only f is.
a < b we have
Vab (ν) = sup V (P, ν)
P ={a=x0 ,...,xn =b}
n
X
= sup |ν((xk−1 , xk ])| ≤ |ν|((a, b])
P ={a=x0 ,...,xn =b} k=1
as required.
The rest follows from Corollary 9.8.
Proof. This is just a reformulation of the previous result. To see the last
claim combine the last part of Theorem 9.23 with Lemma 9.17.
as well as
Z Z
f (x+)dg(x) = f (b+)g(b+) − f (a+)g(a+) − g(x−)df (x). (9.39)
(a,b] [a,b)
188 9. More measure theory
The dual of Lp
Theorem 10.1. Consider Lp (X, dµ) with some σ-finite measure µ and let
q be the corresponding dual index, p1 + 1q = 1. Then the map g ∈ Lq 7→ `g ∈
(Lp )∗ given by
Z
`g (f ) = gf dµ (10.1)
X
is an isometric isomorphism for 1 ≤ p < ∞. If p = ∞ it is at least isometric.
ν(A) = `(χA ).
S∞
Suppose A = j=1 Aj , where the Aj ’s are disjoint. Then, by dominated
convergence, k nj=1 χAj − χA kp → 0 (this is false for p = ∞!) and hence
P
X∞ ∞
X ∞
X
ν(A) = `( χA j ) = `(χAj ) = ν(Aj ).
j=1 j=1 j=1
191
192 10. The dual of Lp
Proof. Identify Lp (X, dµ)∗ with Lq (X, dµ) and choose h ∈ Lp (X, dµ)∗∗ .
Then there is some f ∈ Lp (X, dµ) such that
Z
h(g) = g(x)f (x)dµ(x), g ∈ Lq (X, dµ) ∼
= Lp (X, dµ)∗ .
But this implies h(g) = g(f ), that is, h = J(f ), and thus J is surjective.
Note that in the case 0 < p < 1, where Lp fails to be a Banach space,
the dual might even be empty (see Problem 10.1)!
Problem 10.1. Show that Lp (0, 1) is a quasinormed space if 0 < p < 1 (cf.
Problem 1.17). Moreover, show that Lp (0, 1)∗ = {0} in this case. (Hint:
Suppose there were a nontrivial ` ∈ Lp (0, 1)∗ . Start with f0 ∈ Lp such that
|`(f0 )| ≥ 1. Set g0 = χ(0,s] f and h0 = χ(s,1] f , where s ∈ (0, 1) is chosen
such that kg0 kp = kh0 kp = 2−1/p kf0 kp . Then |`(g0 )| ≥ 12 or |`(h0 )| ≥ 21 and
we set f1 = 2g0 in the first case and f1 = 2h0 else. Iterating this procedure
gives a sequence fn with |`(fn )| ≥ 1 and kfn kp = 2n−n/p kf0 kp .)
Remark: To obtain the dual of L∞ (X, dµ) from this you just need to
restrict to those linear functionals which vanish on N (X, dµ) (cf. Prob-
lem 10.2), that is, those whose content is absolutely continuous with respect
to µ (note that the Radon–Nikodym theorem does not hold unless the con-
tent is a measure).
Example. Consider B(R) and define
`(f ) = lim(λf (−ε) + (1 − λ)f (ε)), λ ∈ [0, 1], (10.9)
ε↓0
for f in the subspace of bounded measurable functions which have left and
right limits at 0. Since k`k = 1 we can extend it to all of B(R) using the
Hahn–Banach theorem. Then the corresponding content ν is no measure:
∞ ∞
[ 1 1 X 1 1
λ = ν([−1, 0)) = ν( [− , − )) 6= ν([− , − )) = 0. (10.10)
n n+1 n n+1
n=1 n=1
Lemma 11.1. The Fourier transform is a bounded map from L1 (Rn ) into
Cb (Rn ) satisfying
197
198 11. The Fourier transform
∂ |α| f
∂α f = , xα = xα1 1 · · · xαnn , |α| = α1 + · · · + αn . (11.9)
∂xα1 1· · · ∂xαnn
11.1. The Fourier transform on L1 and L2 199
Proof. The formulas are immediate from the previous lemma. To see that
fˆ ∈ S(Rn ) if f ∈ S(Rn ), we begin with the observation that fˆ is bounded
by (11.2). But then pα (∂β fˆ)(p) = i−|α|−|β| (∂α xβ f (x))∧ (p) is bounded since
∂α xβ f (x) ∈ S(Rn ) if f ∈ S(Rn ).
Hence we will sometimes write pf (x) for −i∂f (x), where ∂ = (∂1 , . . . , ∂n )
is the gradient. Roughly speaking this lemma shows that the decay of
a functions is related to the smoothness of its Fourier transform and the
smoothness of a functions is related to the decay of its Fourier transform.
In particular, this allows us to conclude that the Fourier transform of
an integrable function will vanish at ∞. Recall that we denote the space of
all continuous functions f : Rn → C which vanish at ∞ by C0 (Rn ).
Corollary 11.5 (Riemann-Lebesgue). The Fourier transform maps L1 (Rn )
into C0 (Rn ).
Proof. First of all recall that C0 (Rn ) equipped with the sup norm is a
Banach space and that S(Rn ) is dense (Problem 1.36). By the previous
lemma we have fˆ ∈ C0 (Rn ) if f ∈ S(Rn ). Moreover, since S(Rn ) is dense
in L1 (Rn ), the estimate (11.2) shows that the Fourier transform extends to
a continuous map from L1 (Rn ) into C0 (Rn ).
Proof. Due to the product structure of the exponential, one can treat
each coordinate separately, reducing the problem to the case n = 1 (Prob-
lem 11.2).
Let φz (x) = exp(−zx2 /2). Then φ0z (x)+zxφz (x) = 0 and hence i(pφ̂z (p)+
z φ̂0z (p))
= 0. Thus φ̂z (p) = cφ1/z (p) and (Problem 7.26)
Z
1 1
c = φ̂z (0) = √ exp(−zx2 /2)dx = √
2π R z
at least for z > 0. However, since the integral is holomorphic for Re(z) > 0
by Problem 7.22, this holds for all z with Re(z) > 0 if we choose the branch
cut of the root along the negative real axis.
and, invoking Fubini and Lemma 11.2, we further see that this is equal to
Z Z
ipx ∧ n 1
= (φε (p)e ) (y)f (y)d y = n/2
φ1/ε (y − x)f (y)dn y.
Rn Rn ε
But the last integral converges to f in L1 (Rn ) by Lemma 8.13.
(fˆ)∨ = f, (11.14)
where Z
1
fˇ(p) = eipx f (x)dn x = fˆ(−p). (11.15)
(2π)n/2 Rn
However, note that F : L1 (Rn ) → C0 (Rn ) is not onto (cf. Problem 11.5).
Nevertheless the Fourier transform F −1 is a closed map from Ran(F) →
L1 (Rn ) by Lemma 4.7. As F −1 and F differ only by a reflection in the
argument, the same is true for F.
Lemma 11.9. Suppose f, fˆ ∈ L1 (Rn ). Then f, fˆ ∈ L2 (Rn ) and
kf k2 = kfˆk2 ≤ (2π)−n/2 kf k1 kfˆk1
2 2 (11.16)
holds.
for f, fˆ ∈ L1 (Rn ).
We also note that this extension is still given by (11.1) whenever the
right-hand side is integrable.
Lemma 11.11. Let f ∈ L1 (Rn ) ∩ L2 (Rn ), then (11.1) continues to hold,
where F now denotes the extension of the Fourier transform from S(Rn ) to
L2 (Rn ).
In particular,
Z
1
fˆ(p) = lim e−ipx f (x)dn x, (11.17)
m→∞ (2π)n/2 |x|≤m
202 11. The Fourier transform
from which that last claim follows since F : Ran(F) → L1 (Rn ) is closed by
Lemma 4.7.
Finally, note that by looking at the Gaussian’s φλ (x) = exp(−λx2 /2) one
observes that a well centered peak transforms into a broadly spread peak and
vice versa. This turns out to be a general property of the Fourier transform
known as uncertainty principle. One quantitative way of measuring this
fact is to look at
Z
k(xj − x0 )f (x)k22 = (xj − x0 )2 |f (x)|2 dn x (11.21)
Rn
which will be small if f is well concentrated around x0 in the j’th coordinate
direction.
Theorem 11.15 (Heisenberg uncertainty principle). Suppose f ∈ S(Rn ).
Then for any x0 , p0 ∈ R we have
kf k22
k(xj − x0 )f (x)k2 k(pj − p0 )fˆ(p)k2 ≥ . (11.22)
2
0
Proof. Replacing f (x) by eixj p f (x + x0 ej ) (where ej is the unit vector into
the j’th coordinate direction) we can assume x0 = p0 = 0 by Lemma 11.2.
Using integration by parts we have
Z Z Z
2
kf k2 = |f (x)| d x = − xj ∂j |f (x)| d x = −2Re xj f (x)∗ ∂j f (x)dn x.
2 n 2 n
Rn Rn Rn
Hence, by Cauchy–Schwarz,
kf k22 ≤ 2kxj f (x)k2 k∂j f (x)k2 = 2kxj f (x)k2 kpj fˆ(p)k2
the claim follows.
204 11. The Fourier transform
where
Z
1 1
K(x, y) = ei(x−y)p χA (y)dn p = χ̂B (y − x)χA (y).
(2π)n B (2π)n
Since K ∈ L2 (Rn × Rn ) the corresponding integral operator is Hilbert–
Schmidt, and thus its eigenspace corresponding to the eigenvalue 1 can be
at most finite dimensional.
Now if there is a nonzero f , we can find a sequence of vectors xn → 0
such the functions fn (x) = f (x − xn ) are linearly independent (look at
their supports) and satisfy supp(fn ) ⊆ 2A, supp(fˆn ) ⊆ B. But this a
contradiction by the first part applied to the sets 2A and B.
Problem 11.1.
Qn Show that S(Rn ) ⊂ Lp (Rn ). (Hint: If f ∈ S(Rn ), then
|f (x)| ≤ Cm j=1 (1 + x2j )−m for every m.)
Show that f ∈ L1 (Rn ) with kf k1 = nj=1 kfj k1 and fˆ(p) = nj=1 fˆj (pj ).
Q Q
Problem 11.7. Suppose f (x)ek|x| ∈ L1 (R) for some k > 0. Then fˆ(p) has
an analytic extension to the strip |Im(p)| < k.
Since the right-hand side is integrable for n ≥ 3 we obtain that our solution
is necessarily given by
In fact, this formula still works provided g(x), |p|−2 ĝ(p) ∈ L1 (Rn ). Moreover,
if we additionally assume ĝ ∈ L1 (Rn ), then |p|2 fˆ(p) = ĝ(p) ∈ L1 (Rn ) and
Lemma 11.3 implies that f ∈ C 2 (Rn ) as well as that it is indeed a solution.
Note that if n ≥ 3, then |p|−2 ĝ(p) ∈ L1 (Rn ) follows automatically from
g, ĝ ∈ L1 (Rn ) (show this!).
Moreover, we clearly expect that f should be given by a convolution.
However, since |p|−2 is not in Lp (Rn ) for any p, the formulas derived so far
do not apply.
206 11. The Fourier transform
Lemma 11.17. Let 0 < α < n and suppose g ∈ L1 (Rn ) ∩ L∞ (Rn ) as well
as |p|−α ĝ(p) ∈ L1 (Rn ). Then
Z
(|p|−α ĝ(p))∨ (x) = Iα (|x − y|)g(y)dn y, (11.26)
Rn
where the Riesz potential is given by
Γ( n−α
2 ) 1
Iα (r) = α n/2 α r n−α . (11.27)
2 π Γ( 2 )
Proof. Note that, while |.|−α is not in Lp (Rn ) for any p, our assumption
0 < α < n ensures that the singularity at zero is integrable.
We set φt (p) = exp(−t|p|2 /2) and begin with the elementary formula
Z ∞
−α 1
|p| = cα φt (p)tα/2−1 dt, cα = α/2 ,
0 2 Γ(α/2)
which follows from the definition of the gamma function (Problem 7.27)
after a simple scaling. Since |p|−α ĝ(p) is integrable we can use Fubini and
Lemma 11.6 to obtain
Z Z ∞
−α ∨ cα ixp α/2−1
(|p| ĝ(p)) (x) = e φt (p)t dt ĝ(p)dn p
(2π)n/2 Rn 0
Z ∞ Z
cα ixp n
= e φ̂ 1/t (p)ĝ(p)d p t(α−n)/2−1 dt.
(2π)n/2 0 Rn
Note that the conditions of the above theorem are for example satisfied
if g, ĝ ∈ L1 (Rn ) which holds for example if g ∈ S(Rn ). In summary, if
g ∈ L1 (Rn ) ∩ L∞ (Rn ), |p|−2 ĝ(p) ∈ L1 (Rn ) and n ≥ 3, then
f =Φ∗g (11.28)
11.2. Applications to linear partial differential equations 207
Γ( n2 − 1) 1
Φ(x) = , n ≥ 3, (11.29)
4π n/2 |x|n−2
ĝ - P −1 ĝ
6
F F −1
?
g f
208 11. The Fourier transform
with
Z ∞
1 r ν r dt
Kν (r) = K−ν (r) = e−t− 4t , r > 0, ν ∈ R, (11.37)
2 2 0 tν+1
the modified Bessel function of the second kind of order ν ([16, (10.32.10)]).
2 α/2
= tα/2−1 e−t(1+|p| ) dt.
(1 + |p| ) 0
11.2. Applications to linear partial differential equations 209
Since g, ĝ(p) are integrable we can use Fubini and Lemma 11.6 to obtain
Γ( α2 )−1
Z Z ∞
ĝ(p) ∨ ixp α/2−1 −t(1+|p|2 )
( ) (x) = e t e dt ĝ(p)dn p
(1 + |p|2 )α/2 (2π)n/2 Rn 0
α −1 Z ∞ Z
Γ( 2 )
= n/2
e φ̂1/2t (p)ĝ(p)d p e−t t(α−n)/2−1 dt
ixp n
(4π) 0 Rn
α −1 Z ∞ Z
Γ( 2 )
= n/2
φ1/2t (x − y)g(y)d y e−t t(α−n)/2−1 dt
n
(4π) 0 Rn
α −1 Z Z ∞
Γ( 2 ) −t (α−n)/2−1
= n/2
φ1/2t (x − y)e t dt g(y)dn y
(4π) Rn 0
to obtain the desired result. Using Fubini in the last step is allowed since g
is bounded and B(n−α)/2 (|x|) ∈ L1 (Rn ) (Problem 11.11).
(Hint: Fubini.)
By Lemma 11.4 this definition coincides with the usual one for every f ∈
S(Rn ).
q
2 cos(p)−1
Example. Consider f (x) = (1 − |x|)χ[−1,1] (x). Then fˆ(p) = π p2
and f ∈ H 1 (R). The weak derivative is f 0 (x) = − sign(x)χ[−1,1] (x).
11.3. Sobolev spaces 211
We also have
Z
g(x)(∂α f )(x)dn x = hg ∗ , (∂α f )i = hĝ(p)∗ , (ip)α fˆ(p)i
Rn
= (−1)|α| h(ip)α ĝ(p)∗ , fˆ(p)i = (−1)|α| h∂α g ∗ , f i
Z
= (−1)|α| (∂α g)(x)f (x)dn x, (11.44)
Rn
is also called the weak derivative or the derivative in the sense of distri-
butions of f (by Lemma 8.15 such a function is unique if it exists). Hence,
choosing g = ϕ in (11.44), we see that H r (Rn ) is the set of all functions hav-
ing partial derivatives (in the sense of distributions) up to order r, which
are in L2 (Rn ).
In this connection the following norm for H m (Rn ) with m ∈ N0 is more
common:
X
kf k22,m = k∂α f k22 . (11.46)
|α|≤m
Proof. Abbreviate hpi = (1 + |p|2 )1/2 . Now use |(ip)α fˆ(p)| ≤ hpi|α| |fˆ(p)| =
hpi−s · hpi|α|+s |fˆ(p)|. Now hpi−s ∈ L2 (Rn ) if s > n2 (use polar coordinates
to compute the norm) and hpi|α|+s |fˆ(p)| ∈ L2 (Rn ) if s + |α| ≤ r. Hence
hpi|α| |fˆ(p)| ∈ L1 (Rn ) and the claim follows from the previous lemma.
The last example shows that in the case r < n2 functions in H r are no
longer necessarily continuous. In this case we at least get an embedding into
some better Lp space:
Theorem 11.23 (Sobolev inequality). Suppose 0 < r < n2 . Then H r (Rn )
2n
is continuously embedded into Lp (Rn ) with p = n−2r , that is,
Proof. Since kT (.)f k ∈ C([0, 1]) for every f ∈ X we have supt∈[0,1] kT (t)f k ≤
Mf . Hence by the uniform boundedness principle supt∈[0,1] kT (t)k ≤ M for
11.4. Applications to evolution equations 215
where the domain D(A) is precisely the set of all f ∈ X for which the above
limit exists.
Lemma 11.25. Let T (t) be a C0 semigroup with generator A. If g ∈ D(A)
then T (t)g ∈ D(A) and AT (t)g = T (t)Ag. Moreover, suppose g ∈ X with
u(t) = T (t)g ∈ D(A) for t > 0. Then u(t) ∈ C 1 ((0, ∞), X) ∩ C([0, ∞), X)
and u(t) is the unique solution of the abstract Cauchy problem
u̇(t) = Au(t), u(0) = g. (11.56)
This is for example the case if g ∈ D(A) in which case we even have u(t) ∈
C 1 ([0, ∞), X).
Next, one verifies (Problem 11.15) that the solution (in the sense defined
above) of this differential equation is given by
2
û(t)(p) = ĝ(p)e−|p| t . (11.58)
Accordingly, the solution of our original problem is
2
u(t) = TH (t)g, TH (t) = F −1 e−|p| t F. (11.59)
Note that TH (t) : L2 (Rn ) → L2 (Rn ) is a bounded linear operator with
2
kTH (t)k ≤ 1 (since |e−|p| t | ≤ 1). In fact, for t > 0 we even have TH (t)g ∈
H r (Rn ) for any r ≥ 0 showing that u(t) is smooth even for rough initial
functions g. In summary,
Theorem 11.26. The family TH (t) is a C0 -semigroup whose generator is
∆, D(∆) = H 2 (Rn ).
Proof. That H 2 (Rn ) ⊆ D(A) follows from Problem 11.15. Conversely, let
2
g 6∈ H 2 (Rn ). Then t−1 (eR−|p| t − 1) → −|p|2Runiformly on every compact
subset K ⊂ Rn . Hence K |p|2 |ĝ(p)|2 dn p = K |Ag(x)|2 dn x which gives a
contradiction as K increases.
Next we want to derive a more explicit formula for our solution. To this
end we assume g ∈ L1 (Rn ) and introduce
1 |x|2
− 4t
Φt (x) = e , (11.60)
(4πt)n/2
known as the fundamental solution of the heat equation, such that
û(t) = (2π)n/2 ĝ Φ̂t = (Φt ∗ g)∧ (11.61)
by Lemma 11.6 and Lemma 11.12. Finally, by injectivety of the Fourier
transform (Theorem 11.7) we conclude
u(t) = Φt ∗ g. (11.62)
Moreover, one can check directly that (11.62) defines a solution for arbitrary
g ∈ Lp (Rn ).
Theorem 11.27. Suppose g ∈ Lp (Rn ), 1 ≤ p ≤ ∞. Then (11.62) defines
a solution for the heat equation which satisfies u ∈ C ∞ ((0, ∞) × Rn ). The
solutions has the following properties:
(i) If 1 ≤ p < ∞, then limt↓0 u(t) = g in Lp . If p = ∞ this holds for
g ∈ C0 (Rn ).
(ii) If p = ∞, then
ku(t)k∞ ≤ kgk∞ . (11.63)
If g is real-valued then so is u and
inf g ≤ u(t) ≤ sup g. (11.64)
11.4. Applications to evolution equations 217
As in the case of the heat equation, we would like to express our solution
as a convolution with the initial condition. However, now we run into the
2
problem that e−i|p| t is not integrable. To overcome this problem we consider
2
fε (p) = e−(it+ε)p , ε > 0. (11.72)
Then, as before we have
|x−y|2
Z
∨ 1 − 4(it+ε)
(fε ĝ) (x) = e g(y)dn y (11.73)
(4π(it + ε))n/2 Rn
218 11. The Fourier transform
and hence Z
1 |x−y|2
i 4t
TS (t)g(x) = e g(y)dn y (11.74)
(4πit)n/2 Rn
for t 6= 0 and g ∈ L2 (Rn ) ∩ L1 (Rn ). In fact, letting ε ↓ 0 the left-hand
side converges to TS (t)g in L2 and the limit of the right-hand side exists
pointwise by dominated convergence and its pointwise limit must thus be
equal to its L2 limit.
Using this explicit form, we can again draw some further consequences.
For example, if g ∈ L2 (Rn ) ∩ L1 (Rn ), then g(t) ∈ C(Rn ) for t 6= 0 (use
dominated convergence and continuity of the exponential) and satisfies
1
ku(t)k∞ ≤ kgk1 . (11.75)
|4πt|n/2
Thus we have again spreading of wave functions in this case.
Finally we turn to the wave equation
utt − ∆u = 0, u(0) = g, ut (0) = f. (11.76)
This equation will fit into our framework once transform it to a first order
system with respect to time:
ut = v, vt = ∆u, u(0) = g, v(0) = f. (11.77)
After applying the Fourier transform this system reads
ût = v̂, v̂t = −|p|2 û, û(0) = ĝ, v̂(0) = fˆ, (11.78)
and the solution is given by
sin(t|p|) ˆ
û(t, p) = cos(t|p|)ĝ(p) + f (p),
|p|
v̂(t, p) = − sin(t|p|)|p|ĝ(p) + cos(t|p|)fˆ(p). (11.79)
Hence for (g, f ) ∈ H 2 (Rn ) ⊕ H 1 (Rn ) our solution is given by
!
sin(t|p|)
u(t) g cos(t|p|)
= TW (t) , TW (t) = F −1 |p| F.
v(t) f − sin(t|p|)|p| cos(t|p|)
(11.80)
Theorem 11.29. The family TW (t) is a C0 -semigroup whose generator is
A= ∆ 0 1 , D(A) = H 2 (Rn ) ⊕ H 1 (Rn ).
0
Note that kwk22 = hw, wi = hŵ, ŵi = hv̂, |p|2 v̂i = −hv, ∆vi.
sin(t|p|)
If n = 1 we have |p| ∈ L2 (R) and hence we can get an expression in
sin(t|p|)
terms of convolutions. In fact, since the inverse Fourier transform of |p|
is π2 χ[−1,1] (p/t), we obtain
p
1 x+t
Z Z
1
u(t, x) = χ[−t,t] (x − y)f (y)dy = f (y)dy
R 2 2 x−t
in the case g = 0. But the corresponding expression for f = 0 is just the
time derivative of this expression and thus
Z x+t
1 x+t
Z
1∂
u(t, x) = g(y)dy + f (y)dy
2 ∂t x−t 2 x−t
g(x + t) + g(x − t) 1 x+t
Z
= + f (y)dy, (11.83)
2 2 x−t
which is known as d’Alembert’s formula.
To obtain the corresponding formula in n = 3 dimensions we use the
following observation
∂ sin(t|p|) 1 − cos(t|p|)
ϕ̂t (p) = , ϕ̂t (p) = , (11.84)
∂t |p| |p|2
where ϕ̂t ∈ L2 (R3 ). Hence we can compute its inverse Fourier transform
using Z
1
ϕt (x) = lim ϕ̂t (p)eipx d3 p (11.85)
R→∞ (2π)3/2 BR (0)
where Z z
sin(x)
Si(z) = dx (11.88)
0 x
is the sine integral. Using Si(−x) = − Si(x) for x ∈ R and (Problem 11.19)
π
lim Si(x) = (11.89)
x→∞ 2
we finally obtain (since the pointwise limit must equal the L2 limit)
r
π χ[0,t] (|x|)
ϕt (x) = . (11.90)
2 |x|
For the wave equation this implies (using Lemma 7.33)
Z
1 ∂ 1
u(t, x) = f (y)dy
4π ∂t B|t| (|x|) |x − y|
Z |t| Z
1 ∂ 1
= f (x − rω)r2 dσ 2 (ω)dr
4π ∂t 0 S n−1 r
Z
t
= f (x − tθ)dσ 2 (ω) (11.91)
4π S n−1
and thus finally
Z Z
∂ t t
u(t, x) = g(x − tω)dσ 2 (ω) + f (x − tω)dσ 2 (ω), (11.92)
∂t 4π S n−1 4π S n−1
which is known as Kirchhoff ’s formula.
Finally, to obtain a formula in n = 2 dimensions we use the method of
descent: That is we use the fact, that our solution in two dimensions is also
a solution in three dimensions which happens to be independent of the third
coordinate direction. Hence our solution is given by Kirchhoff’s formula and
we can simplify the integral using the fact that f does not depend on x3 .
Using spherical coordinates we obtain
Z
t
f (x − tω)dσ n−1 (ω) =
4π S n−1
Z π/2 Z 2π
t
= f (x1 − t sin(θ) cos(ϕ), x2 − t sin(θ) sin(ϕ)) sin(θ)dθdϕ
2π 0 0
Z 1 Z 2π
ρ=sin(θ) t f (x1 − tρ cos(ϕ), x2 − tρ sin(ϕ))
= p ρdρ dϕ
2π 0 0 1 − ρ2
f (x − ty) 2
Z
t
= p d y
2π B1 (0) 1 − |y|2
which gives Poissons formula
g(x − ty) 2 f (x − ty) 2
Z Z
∂ t t
u(t, x) = p d y+ p d y. (11.93)
∂t 2π B1 (0) 1 − |y|2 2π B1 (0) 1 − |y|2
11.4. Applications to evolution equations 221
Problem 11.15. Show that u(t) defined in (11.58) is in C 1 ((0, ∞), L2 (Rn ))
2
and solves (11.57). (Hint: |e−t|p| − 1| ≤ t|p|2 for t ≥ 0.)
Problem 11.16. Suppose u(t) ∈ C 1 (I, X). Show that for s, t ∈ I
du
ku(t) − u(s)k ≤ M |t − s|, M = sup k (τ )k.
τ ∈[s,t] dt
Interpolation
kf kp ≤ kf k1−θ θ
p0 kf kp1 , (12.1)
where p1 = 1−θ θ p0
p0 + p1 , θ ∈ (0, 1). Note that L ∩ L
p1 contains all integrable
p
simple functions which are dense in L for 1 ≤ p < ∞ (for p = ∞ this is
only true if the measure is finite — cf. Problem 8.10).
This is a first occurrence of an interpolation technique. Next we want
to turn to operators. For example, we have defined the Fourier transform as
an operator from L1 → L∞ as well as from L2 → L2 and the question is if
this can be used to extend the Fourier transform to the spaces in between.
Denote by Lp0 + Lp1 the space of (equivalence classes) of measurable
functions f which can be written as a sum f = f0 + f1 with f0 ∈ Lp0
and f1 ∈ Lp1 (clearly such a decomposition is not unique and different
decompositions will differ by elements from Lp0 ∩ Lp1 ). Then we have
223
224 12. Interpolation
them by virtue of
A : Lp0 + Lp1 → Lq0 + Lq1 , f0 + f1 7→ A0 f0 + A1 f1 (12.3)
(check that A is indeed well-defined, i.e, independent of the decomposition
of f into f0 + f1 ). In particular, this defines A on Lp for every p ∈ (p0 , p1 )
and the question is if A restricted to Lp will be a bounded operator into
some Lq provided A0 and A1 are bounded.
To answer this question we begin with a result from complex analysis.
Theorem 12.1 (Hadamard three-lines theorem). Let S be the open strip
{z ∈ C|0 < Re(z) < 1} and let F : S → C be continuous and bounded on S
and holomorphic in S. If
(
M0 , Re(z) = 0,
|F (z)| ≤ (12.4)
M1 , Re(z) = 1,
then
1−Re(z) Re(z)
|F (z)| ≤ M0 M1 (12.5)
for every z ∈ S.
where f, g are simple functions with kf kpθ = kgkqθ0 = 1 and q1θ + q10 = 1.
P P θ
Now choose simple functions f (x) = j αj χAj (x), g(x) = k βk χBk (x)
with kf k1 = kgk1 = 1 and set fz (x) = j |αj |1/pz sign(αj )χAj (x), gz (y) =
P
1−1/qz sign(β )χ (y) such that kf k
P
k |βk | k Bk z pθ = kgz kqθ0 = 1 for θ = Re(z) ∈
[0, 1]. Moreover, note that both functions are entire and thus the function
Z
F (z) = (Afz )(y)gz dν(y)
1 1 1
where r = p + q − 1.
Proof. By Corollary 11.13 the claim holds for f, g ∈ S(Rn ). Now take a
sequence of Schwartz functions fm → f in Lp and a sequence of Schwartz
0
functions gm → g in Lq . Then the left-hand side converges in Lr , where
1 1 1
r0 = 2 − p − q , by the Young and Hausdorff-Young inequalities. Similarly
0
the right-hand side converges in Lr by the generalized Hölder (Problem 8.5)
and Hausdorff-Young inequalities.
Problem 12.1. Show (12.1). (Hint: Generalized Hölder inequality from
Problem 8.5.)
Problem 12.2. Show that the Fourier transform does not extend to a con-
tinuous map F : Lp (Rn ) → Lq (Rn ), for p > 2. Use the closed graph theorem
to conclude that F is not onto for 1 ≤ p ≤ 2. (Hint for the case n = 1:
Consider φz (x) = exp(−zx2 /2) for z = λ + iω with λ > 0.)
Problem 12.3 (Young inequality). Let K(x, y) be measurable and suppose
sup kK(x, .)kLr (Y,dν) ≤ C, sup kK(., y)kLr (X,dµ) ≤ C.
x y
and the corresponding spaces Lp,w (X, dµ) consist of all equivalence classes
of functions which are equal a.e. for which the above norm is finite. Clearly
the distribution function and hence the weak Lp norm depend only on the
equivalence class. Despite its name the weak Lp norm turns out to be only
a quasinorm (Problem 12.4). By construction we have
kf kp,w ≤ kf kp (12.17)
and thus Lp (X, dµ) ⊆ Lp,w (X, dµ). In the case p = ∞ we set k.k∞,w = k.k∞ .
1
Example. Consider f (x) = x in R. Then clearly f 6∈ L1 (R) but
1 2
Ef (r) = |{x| | | > r}| = |{x| |x| < r−1 }| =
x r
shows that f ∈ L1,w (R) with kf k1,w = 2. Slightly more general the function
f (x) = |x|−n/p 6∈ Lp (Rn ) but f ∈ Lp,w (Rn ). Hence Lp,w (Rn ) is strictly
larger than Lp (Rn ).
228 12. Interpolation
and
Z ∞Z
I1 = rp−p1 −1 χ{(x,r)| |f (x)|≤r} |f (x)|p1 dµ(x) dr
0 X
Z Z ∞
1
= |f (x)| p1
rp−p1 −1 dr dµ(x) = kf kpp .
X |f (x)| p1 − p
This is the desired estimate
1/p
kT (f )kp ≤ 2 p/(p − p0 )C0p0 + p/(p1 − p)C1p1 kf kp .
The case p1 = ∞ is similar: Split f ∈ Lp0 according to
f0 = f χ{x| |f |>s/C1 } ∈ Lp0 ∩ Lp , f1 = f χ{x| |f |≤s/C1 } ∈ Lp ∩ L∞
(if C1 = 0 there is noting to prove). Then kT (f1 )k∞ ≤ s/C1 and hence
ET (f1 ) (s) = 0. Thus
C0p0
Z
ET (f ) (2r) ≤ p0 |f |p0 dµ
r {x| |f |>r/C1 }
Proof. The first estimate follows literally as in the proof of Lemma 9.5
and combining this estimate with the trivial one (12.22) the Marcinkiewicz
interpolation theorem yields the second.
and the estimate follows upon taking absolute values and observing kφk1 =
P
j αj |Brj (0)|.
Nonlinear Functional
Analysis
Chapter 13
Analysis in Banach
spaces
where o, O are the Landau symbols. The linear map dF (x) is called deriv-
ative of F at x. If F is differentiable for all x ∈ U we call F differentiable.
In this case we get a map
dF : U → L(X, Y )
. (13.2)
x 7→ dF (x)
235
236 13. Analysis in Banach spaces
Pn
being fixed). We have dF v = i=1 ∂i F vi , v = (v1 , . . . , vn ) ∈ X, and
F ∈ C 1 (X, Y ) if and only if all partial derivatives exist and are continuous.
In the case of X = Rm and Y = Rn , the matrix representation of dF
with respect to the canonical basis in Rm and Rn is given by the partial
derivatives ∂i Fj (x) and is called Jacobi matrix of F at x.
We can iterate the procedure of differentiation and write F ∈ C r (U, Y ),
r ≥ 1, if the r-th derivative of F , dr F (i.e., the derivative of the (r − 1)-
th
T derivative of F ), exists and is continuous. Finally, we set C ∞ (U, Y ) =
r 0
r∈N C (U, Y ) and, for notational convenience, C (U, Y ) = C(U, Y ) and
d0 F = F .
It is often necessary to equip C r (U, Y ) with a norm. A suitable choice
is
|F | = max sup |dj F (x)|. (13.3)
0≤j≤r x∈U
The set of all r times continuously differentiable functions for which this
norm is finite forms a Banach space which is denoted by Cbr (U, Y ).
If F is bijective and F , F −1 are both of class C r , r ≥ 1, then F is called
a diffeomorphism of class C r .
Note that if F ∈ L(X, Y ), then dF (x) = F (independent of x) and
dr F (x) = 0, r > 1.
For the composition of mappings we note the following result (which is
easy to prove).
then
sup |dF (x)| ≤ M. (13.7)
x∈U
13.1. Differentiation and integration in Banach spaces 237
for any δ > 0. Let t0 = max{t ∈ [0, 1]|φ(t) ≤ 0}. If t0 < 1 then
since we can assume |o(ε)| < εδ for ε > 0 small enough, a contradiction.
|F | = sup
Qm
|F (x1 , . . . , xm )| < ∞. (13.11)
x: i=1 |xi |=1
shows that x is a fixed point and the estimate (13.20) follows after taking
the limit n → ∞ in (13.22).
240 13. Analysis in Banach spaces
Note that our proof is constructive, since it shows that the solution ξ(y)
can be obtained by iterating x − (∂x F (x0 , y0 ))−1 F (x, y).
Moreover, as a corollary of the implicit function theorem we also obtain
the inverse function theorem.
Proof. Fix x0 ∈ Cb (I, U ) and ε > 0. For each t ∈ I we have a δ(t) > 0
such that B2δ(t) (x0 (t)) ⊂ U and |f (x) − f (x0 (t))| ≤ ε/2 for all x with
|x − x0 (t)| ≤ 2δ(t). The balls Bδ(t) (x0 (t)), t ∈ I, cover the set {x0 (t)}t∈I
and since I is compact, there is a finite subcover Bδ(tj ) (x0 (tj )), 1 ≤ j ≤ n.
Let |x − x0 | ≤ δ = min1≤j≤n δ(tj ). Then for each t ∈ I there is ti such
that |x0 (t) − x0 (tj )| ≤ δ(tj ) and hence |f (x(t)) − f (x0 (t))| ≤ |f (x(t)) −
f (x0 (tj ))| + |f (x0 (tj )) − f (x0 (t))| ≤ ε since |x(t) − x0 (tj )| ≤ |x(t) − x0 (t)| +
|x0 (t) − x0 (tj )| ≤ 2δ(tj ). This settles the case r = 0.
Next let us turn to r = 1. We claim that df∗ is given by (df∗ (x0 )x)(t) =
df (x0 (t))x(t). Hence we need to show that for each ε > 0 we can find a
δ > 0 such that
sup |f (x0 (t) + x(t)) − f (x0 (t)) − df (x0 (t))x(t)| ≤ ε sup |x(t)| (13.32)
t∈I t∈I
Now we come to our existence and uniqueness result for the initial value
problem in Banach spaces.
14.1. Introduction
Many applications lead to the problem of finding all zeros of a mapping
f : U ⊆ X → X, where X is some (real) Banach space. That is, we are
interested in the solutions of
f (x) = 0, x ∈ U. (14.1)
In most cases it turns out that this is too much to ask for, since determining
the zeros analytically is in general impossible.
Hence one has to ask some weaker questions and hope to find answers for
them. One such question would be ”Are there any solutions, respectively,
how many are there?”. Luckily, these questions allow some progress.
To see how, lets consider the case f ∈ H(C), where H(U ) denotes the
set of holomorphic functions on a domain U ⊂ C. Recall the concept of
the winding number from complex analysis. The winding number of a path
γ : [0, 1] → C around a point z0 ∈ C is defined by
Z
1 dz
n(γ, z0 ) = ∈ Z. (14.2)
2πi γ z − z0
It gives the number of times γ encircles z0 taking orientation into account.
That is, encirclings in opposite directions are counted with opposite signs.
In particular, if we pick f ∈ H(C) one computes (assuming 0 6∈ f (γ))
Z 0
1 f (z) X
n(f (γ), 0) = dz = n(γ, zk )αk , (14.3)
2πi γ f (z)
k
245
246 14. The Brouwer mapping degree
Hence we see that we cannot expect too much from our degree. In addition,
since it is unclear how it should be defined, we will first require some basic
14.2. Definition of the mapping degree and the determinant formula 247
properties a degree should have and then we will look for functions satisfying
these properties.
Proof. For the first part of (i) use (D3) with U1 = U and U2 = ∅. For
the second part use U2 = ∅ in (D3) if i = 1 and the rest follows from
induction. For (ii) use i = 1 and U1 = ∅ in (ii). For (iii) note that H(t, x) =
(1 − t)f (x) + t g(x) satisfies |H(t, x) − y| ≥ dist(y, f (∂U )) − |f (x) − g(x)| for
x on the boundary.
Next we show that (D.4) implies several at first sight much stronger
looking facts.
Theorem 14.2. We have that deg(., U, y) and deg(f, U, .) are both contin-
uous. In fact, we even have
(i). deg(., U, y) is constant on each component of Dy (U , Rn ).
(ii). deg(f, U, .) is constant on each component of Rn \f (∂U ).
Moreover, if H : [0, 1] × U → Rn and y : [0, 1] → Rn are both contin-
uous such that H(t) ∈ Dy(t) (U, Rn ), t ∈ [0, 1], then deg(H(0), U, y(0)) =
deg(H(1), U, y(1)).
Note that this result also shows why deg(f, U, y) cannot be defined mean-
ingful for y ∈ f (∂D). Indeed, approaching y from within different compo-
nents of Rn \f (∂U ) will result in different limits in general!
In addition, note that if Q is a closed subset of a locally pathwise con-
nected space X, then the components of X\Q are open (in the topology of
X) and pathwise connected (the set of points for which a path to a fixed
point x0 exists is both open and closed).
Now let us try to compute deg using its properties. Lets start with a
simple case and suppose f ∈ C 1 (U, Rn ) and y 6∈ CV(f ) ∪ f (∂U ). Without
restriction we consider y = 0. In addition, we avoid the trivial case f −1 (y) =
∅. Since the points of f −1 (0) inside U are isolated (use Jf (x) 6= 0 and
the inverse function theorem) they can only cluster at the boundary ∂U .
But this is also impossible since f would equal y at the limit point on the
boundary by continuity. Hence f −1 (0) = {xi }N i=1 . Picking sufficiently small
neighborhoods U (xi ) around xi we consequently get
N
X
deg(f, U, 0) = deg(f, U (xi ), 0). (14.8)
i=1
It suffices to consider one of the zeros, say x1 . Moreover, we can even assume
x1 = 0 and U (x1 ) = Bδ (0). Next we replace f by its linear approximation
around 0. By the definition of the derivative we have
f (x) = df (0)x + |x|r(x), r ∈ C(Bδ (0), Rn ), r(0) = 0. (14.9)
Now consider the homotopy H(t, x) = df (0)x + (1 − t)|x|r(x). In order
to conclude deg(f, Bδ (0), 0) = deg(df (0), Bδ (0), 0) we need to show 0 6∈
H(t, ∂Bδ (0)). Since Jf (0) 6= 0 we can find a constant λ such that |df (0)x| ≥
λ|x| and since r(0) = 0 we can decrease δ such that |r| < λ. This implies
|H(t, x)| ≥ ||df (0)x| − (1 − t)|x||r(x)|| ≥ λδ − δ|r| > 0 for x ∈ ∂Bδ (0) as
desired.
In summary we have
N
X
deg(f, U, 0) = deg(df (xi ), U (xi ), 0) (14.10)
i=1
Using this lemma we can now show the main result of this section.
Theorem 14.4. Suppose f ∈ Dy1 (U , Rn ) and y 6∈ CV(f ), then a degree
satisfying (D1)–(D4) satisfies
X
deg(f, U, y) = sign Jf (x), (14.12)
x∈f −1 (y)
P
where the sum is finite and we agree to set x∈∅ = 0.
Proof. Since the claim is easy for linear mappings our strategy is as follows.
We divide U into sufficiently small subsets. Then we replace f by its linear
approximation in each subset and estimate the error.
Let CP(f ) = {x ∈ U |Jf (x) = 0} be the set of critical points of f . We first
pass to cubes which are easier to divide. Let {Qi }i∈N be a countable cover for
U consisting of open cubes such that Qi ⊂ U . Then it suffices S to prove that
f (CP(f )∩Qi ) has zero measure since CV(f ) = f (CP(f )) = i f (CP(f )∩Qi )
(the Qi ’s are a cover).
Let Q be any of these cubes and denote by ρ the length of its edges.
Fix ε > 0 and divide Q into N n cubes Qi of length ρ/N . Since df (x) is
uniformly continuous on Q we can find an N (independent of i) such that
Z 1
ερ
|f (x) − f (x̃) − df (x̃)(x − x̃)| ≤ |df (x̃ + t(x − x̃)) − df (x̃)||x̃ − x|dt ≤
0 N
(14.14)
for x̃, x ∈ Qi . Now pick a Qi which contains a critical point x̃i ∈ CP(f ).
Without restriction we assume x̃i = 0, f (x̃i ) = 0 and set M = df (x̃i ). By
det M = 0 there is an orthonormal basis {bi }1≤i≤n of Rn such that bn is
orthogonal to the image of M . In addition,
v
n u n n
X uX
i t √ ρ X √ ρ
Qi ⊆ { λi b | |λi |2 ≤ n } ⊆ { λi bi | |λi | ≤ n }
N N
i=1 i=1 i=1
252 14. The Brouwer mapping degree
C̃ε
and hence the measure of f (Qi ) is smaller than N n . Since there are at most
n
N such Qi ’s, we see that the measure of f (CP(f ) ∩ Q) is smaller than
C̃ε.
Having this result out of the way we can come to step one and two from
above.
SN i)
for x ∈ U \ i=1 U (x and hence
Z N Z
X
φε (f (x) − y)Jf (x)dx = φε (f (x) − y)Jf (x)dx
U i=1 U (xi )
N
X Z
= sign(Jf (x)) φε (x̃)dx̃ = deg(f, U, y), (14.18)
i=1 Bε0 (0)
where we have used the change of variables x̃ = f (x) in the second step.
Our new integral representation makes sense even for critical values. But
since ε depends on y, continuity with respect to y is not clear. This will be
shown next at the expense of requiring f ∈ C 2 rather than f ∈ C 1 .
The key idea is to rewrite deg(f, U, y 2 ) − deg(f, U, y 1 ) as an integral over
a divergence (here we will need f ∈ C 2 ) supported in U and then apply
Stokes theorem. For this purpose the following result will be used.
Lemma 14.7. Suppose f ∈ C 2 (U, Rn ) and u ∈ C 1 (Rn , Rn ), then
(div u)(f )Jf = div Df (u), (14.19)
where Df (u)j is the determinant of the matrix obtained from df by replacing
the j-th column by u(f ).
Proof. We compute
n
X n
X
div Df (u) = ∂xj Df (u)j = Df (u)j,k , (14.20)
j=1 j,k=1
where Df (u)j,k is the determinant of the matrix obtained from the matrix
associated with Df (u)j by applying ∂xj to the k-th column. Since ∂xj ∂xk f =
∂xk ∂xj f we infer Df (u)j,k = −Df (u)k,j , j 6= k, by exchanging the k-th and
the j-th column. Hence
Xn
div Df (u) = Df (u)i,i . (14.21)
i=1
(i,j) (i,j)
denote the (i, j) minor of df (x) and recall ni=1 Jf ∂xi fk =
P
Now let Jf (x)
δj,k Jf . Using this to expand the determinant Df (u)i,i along the i-th column
shows
n n n
(i,j) (i,j)
X X X
div Df (u) = Jf ∂xi uj (f ) = Jf (∂xk uj )(f )∂xi fk
i,j=1 i,j=1 k=1
n n n
(i,j)
X X X
= (∂xk uj )(f ) Jf ∂xj fk = (∂xj uj )(f )Jf (14.22)
j,k=1 i=1 j=1
as required.
254 14. The Brouwer mapping degree
Proof. Fix ỹ ∈ Rn \f (∂U ) and consider the largest ball Bρ (ỹ), ρ = dist(ỹ, f (∂U ))
around ỹ contained in Rn \f (∂U ). Pick y i ∈ Bρ (ỹ) ∩ RV(f ) and consider
Z
deg(f, U, y 2 ) − deg(f, U, y 1 ) = (φε (f (x) − y 2 ) − φε (f (x) − y 1 ))Jf (x)dx
U
(14.23)
for suitable φε ∈ C 2 (Rn , R) and suitable ε > 0. Now observe
Z 1
(div u)(y) = zj ∂yj φ(y + tz)dt
0
Z 1
d
= ( φ(y + t z))dt = φε (y − y 2 ) − φε (y − y 1 ), (14.24)
0 dt
where
Z 1
u(y) = z φ(y + t z)dt, φ(y) = φε (y − y 1 ), z = y 1 − y 2 , (14.25)
0
R
and apply the previous lemma to rewrite the integral as U div Df (u)dx.
Since the integrand vanishes in a neighborhood of ∂U it is no restriction to
assumeR that ∂U is smooth R such that we can apply Stokes theorem. Hence we
have U div Df (u)dx = ∂U Df (u)dF = 0 since u is supported inside Bρ (ỹ)
provided ε is small enough (e.g., ε < ρ − max{|y i − ỹ|}i=1,2 ).
This coincides with the boundary value approach for complex functions
(note that holomorphic functions are orientation preserving).
Proof. If f −1 (y) = ∅ the same is true for f + t g if |t| < dist(y, f (U ))/|g|.
Hence we can exclude this case. For the remaining case we use our usual
strategy of considering y ∈ RV(f ) first and then approximating general y
by regular ones.
Suppose y ∈ RV(f ) and let f −1 (y) = {xi }N j=1 . By the implicit function
theorem we can find disjoint neighborhoods U (xi ) such that there exists a
unique solution xi (t) ∈ U (xi ) of (f + t g)(x) = y for |t| < ε1 . By reducing
U (xi ) if necessary, we can even assume that the sign of Jf +t g is constant on
U (xi ). Finally, let ε2 = dist(y, f (U \ N i
S
i=1 U (x )))/|g|. Then |f + t g − y| > 0
for |t| < ε2 and ε = min(ε1 , ε2 ) is the quantity we are looking for.
It remains to consider the case y ∈ CV(f ). pick a regular value ỹ ∈
Bρ/3 (y), where ρ = dist(y, f (∂U )), implying deg(f, U, y) = deg(f, U, ỹ).
Then we can find an ε̃ > 0 such that deg(f, U, ỹ) = deg(f + t g, U, ỹ) for
|t| < ε̃. Setting ε = min(ε̃, ρ/(3|g|)) we infer ỹ−(f +t g)(x) ≥ ρ/3 for x ∈ ∂U ,
that is |ỹ − y| < dist(ỹ, (f + t g)(∂U )), and thus deg(f + t g, U, ỹ) = deg(f +
t g, U, y). Putting it all together implies deg(f, U, y) = deg(f + t g, U, y) for
|t| < ε as required.
f ∈ Dy (U , Rn ) we have
X
deg(f, U, y) = sign Jf˜(x) (14.30)
x∈f˜−1 (y)
Proof. Our previous considerations show that deg is well-defined and lo-
cally constant with respect to the first argument by construction. Hence
deg(., U, y) : Dy (U , Rn ) → Z is continuous and thus necessarily constant on
components since Z is discrete. (D2) is clear and (D1) is satisfied since it
holds for f˜ by construction. Similarly, taking U1,2 as in (D3) we can require
|f − f˜| < dist(y, f (U \(U1 ∪ U2 )). Then (D3) is satisfied since it also holds
for f˜ by construction. Finally, (D4) is a consequence of continuity.
Proof. If f does not vanish, then H(t, x) = (1 − t)x + tf (x) must vanish at
some point (t0 , x0 ) ∈ (0, 1) × ∂BR (0) and thus
0 = H(t0 , x0 )x0 = (1 − t0 )R2 + t0 f (x0 )x0 . (14.35)
But the last part is positive by assumption, a contradiction.
The same is true for the map f : ∂B1 (0) → ∂B1 (0), x 7→ −x (∂B1 (0) ⊂ Rn
is simply connected for n ≥ 3 but not homeomorphic to a convex set).
It remains to prove the result from topology needed in the proof of
the Brouwer fixed-point theorem. It is a variant of the Tietze extension
theorem.
Theorem 14.15. Let X be a metric space, Y a Banach space and let K
be a closed subset of X. Then F ∈ C(K, Y ) has a continuous extension
F ∈ C(X, Y ) such that F (X) ⊆ hull(F (K)).
Proof. Consider the open cover {Bρ(x) (x)}x∈X\K for X\K, where ρ(x) =
dist(x, K)/2. Choose a (locally finite) partition of unity {φλ }λ∈Λ subordi-
nate to this cover and set
X
F (x) = φλ (x)F (xλ ) for x ∈ X\K, (14.36)
λ∈Λ
fixed point we can define a retraction R(x) = f (x) + t(x)(x − f (x)), where
t(x) ≥ 0 is chosen such that |R(x)|2 = 1 (i.e., R(x) lies on the intersection
of the line spanned by x, f (x) with the unit sphere).
Using this equivalence the Brouwer fixed point theorem can also be de-
rived easily by showing that the homology groups of the unit ball B1 (0) and
its boundary (the unit sphere) differ (see, e.g., [20] for details).
Finally, we also derive the following important consequence know as
invariance of domain theorem.
Theorem 14.16 (Brower). Let U ⊆ Rn be open and let f : U → Rn be
continuous and injective. Then f (U ) is also open.
Proof. Suppose there where such a map and extend it to a map from U to
Rn by setting the additional coordinates equal to zero. The resulting map
contradicts the invariance of domain theorem.
of x (i.e., λi ≥ 0, m
P
wherePλmi are the barycentric coordinates i=1 λi = 1 and
x = i=1 λi vi ). By construction, f 1 ∈ C(K, K) and there is a fixed point
x1 . But unless x1 is one of the vertices, this doesn’t help us too much. So
lets choose a better function as follows. Consider the k-th barycentric sub-
division and for each vertex vi in this subdivision pick an element yi ∈ f (vi ).
Now define f k (vi ) = yi and extend f k to the interior of each subsimplex as
before. Hence f k ∈ C(K, K) and there is a fixed point
m
X m
X
xk = λki vik = λki yik , yik = f k (vik ), (14.42)
i=1 i=1
simplices shrink to a point, this implies vik → x0 and hence yi0 ∈ f (x0 ) since
(vik , yik ) ∈ Γ → (vi0 , yi0 ) ∈ Γ by the closedness assumption. Now (14.42) tells
us
Xm
x0 = λ0i yi0 ∈ f (x0 ) (14.43)
i=1
since f (x0 ) is convex and the claim holds if K is a simplex.
If K is not a simplex, we can pick a simplex S containing K and proceed
as in the proof of the Brouwer theorem.
If f (x) contains precisely one point for all x, then Kakutani’s theorem
reduces to the Brouwer’s theorem.
Now we want to see how this applies to game theory.
An n-person game consists of n players who have mi possible actions to
choose from. The set of all possible actions for the i-th player will be denoted
by Φi = {1, . . . , mi }. An element ϕi ∈ Φi is also called a pure strategy for
reasons to become clear in a moment. Once all players have chosen their
move ϕi , the payoff for each player is given by the payoff function
n
Y
Ri (ϕ), ϕ = (ϕ1 , . . . , ϕn ) ∈ Φ = Φi (14.44)
i=1
of the i-th player. We will consider the case where the game is repeated
a large number of times and where in each step the players choose their
action according to a fixed strategy. Here a strategy si for the i-th player is
a probability distribution on Φi , that is, si = (s1i , . . . , sm k
i ) such that si ≥ 0
i
262 14. The Brouwer mapping degree
and m
P i k
k=1 si = 1. The set of all possible strategies for the i-th player is
denoted by Si . The number ski is the probability for the Qn k-th pure strategy
to be chosen. Consequently, if s = (s1 , . . . , sn ) ∈ S = i=1 Si is a collection
of strategies, then the probability that a given collection of pure strategies
gets chosen is
Yn
s(ϕ) = si (ϕ), si (ϕ) = ski i , ϕ = (k1 , . . . , kn ) ∈ Φ (14.45)
i=1
(assuming all players make their choice independently) and the expected
payoff for player i is X
Ri (s) = s(ϕ)Ri (ϕ). (14.46)
ϕ∈Φ
By construction, Ri (s) is continuous.
The question is of course, what is an optimal strategy for a player? If
the other strategies are known, a best reply of player i against s would be a
strategy si satisfying
Ri (s\si ) = max Ri (s\s̃i ) (14.47)
s̃i ∈Si
Here s\s̃i denotes the strategy combination obtained from s by replacing
si by s̃i . The set of all best replies against s for the i-th player is denoted
by Bi (s). Explicitly, si ∈ B(s) if and only if ski = 0 whenever Ri (s\k) <
max1≤l≤mi Ri (s\l) (in particular Bi (s) 6= ∅).
Let s, s ∈ S, we call s a best reply against s if si is a Q
best reply against
s for all i. The set of all best replies against s is B(s) = ni=1 Bi (s).
A strategy combination s ∈ S is a Nash equilibrium for the game if it is
a best reply against itself, that is,
s ∈ B(s). (14.48)
Or, put differently, s is a Nash equilibrium if no player can increase his
payoff by changing his strategy as long as all others stick to their respective
strategies. In addition, if a player sticks to his equilibrium strategy, he is
assured that his payoff will not decrease no matter what the others do.
To illustrate these concepts, let us consider the famous prisoners dilemma.
Here we have two players which can choose to defect or cooperate. The pay-
off is symmetric for both players and given by the following diagram
R1 d2 c2 R2 d2 c2
d1 0 2 d1 0 −1 (14.49)
c1 −1 1 c1 2 1
where ci or di means that player i cooperates or defects, respectively. It is
easy to see that the (pure) strategy pair (d1 , d2 ) is the only Nash equilibrium
for this game and that the expected payoff is 0 for both players. Of course,
14.6. Further properties of the degree 263
both players could get the payoff 1 if they both agree to cooperate. But if
one would break this agreement in order to increase his payoff, the other
one would get less. Hence it might be safer to defect.
Now that we have seen that Nash equilibria are a useful concept, we
want to know when such an equilibrium exists. Luckily we have the following
result.
Theorem 14.19 (Nash). Every n-person game has at least one Nash equi-
librium.
and hence fore every i we have Ki ⊆ Gl for some l since components are
maximal connected sets. Let Nl = {i|Ki ⊆ Gl } and observe that we have
266 14. The Brouwer mapping degree
P
deg(k, Gl , y) = i∈Nl deg(k, Ki , y) and deg(h, Hj , Gl ) = deg(h, Hj , Ki ) for
every i ∈ Nl . Therefore,
XX X
1= deg(h, Hj , Ki ) deg(k, Ki , y) = deg(h, Hj , Ki ) deg(k, Ki , Hj )
l i∈Nl i
(14.61)
By reversing the role of C1 and C2 , the same formula holds with Hj and Ki
interchanged.
Hence
X XX X
1= deg(h, Hj , Ki ) deg(k, Ki , Hj ) = 1 (14.62)
i i j j
The Leray–Schauder
mapping degree
provided this definition is independent of the basis chosen. To see this let
ψ be a second isomorphism. Then A = ψ ◦ φ−1 ∈ GL(n). Abbreviate
f ∗ = φ ◦ f ◦ φ−1 , y ∗ = φ(y) and pick f˜∗ ∈ Cy1 (φ(U ), Rn ) in the same
component of Dy (φ(U ), Rn ) as f ∗ such that y ∗ ∈ RV(f ∗ ). Then A◦f˜∗ ◦A−1 ∈
Cy1 (ψ(U ), Rn ) is the same component of Dy (ψ(U ), Rn ) as A ◦ f ∗ ◦ A−1 =
ψ ◦ f ◦ ψ −1 (since A is also a homeomorphism) and
267
268 15. The Leray–Schauder mapping degree
Our next aim is to tackle the infinite dimensional case. The general
idea is to approximate F by finite dimensional maps (in the same spirit as
we approximated continuous f by smooth functions). To do this we need
to know which maps can be approximated by finite dimensional operators.
Hence we have to recall some basic facts first.
Proof. Pick {xi }ni=1 ⊆ K such that ni=1 Bε (xi ) covers K. Let {φi }ni=1 be
S
n
Pn to {Bε (xi )}i=1 , that is,
a partition of unity (restricted to K) subordinate
φi ∈ C(K, [0, 1]) with supp(φi ) ⊂ Bε (xi ) and i=1 φi (x) = 1, x ∈ K. Set
n
X
Pε (x) = φi (x)xi , (15.3)
i=1
then
n
X n
X n
X
|Pε (x) − x| = | φi (x)x − φi (x)xi | ≤ φi (x)|x − xi | ≤ ε. (15.4)
i=1 i=1 i=1
Now we are all set for the definition of the Leray–Schauder degree, that
is, for the extension of our degree to infinite dimensional Banach spaces.
Proof. Except for (iv) all statements follow easily from the definition of the
degree and the corresponding property for the degree in finite dimensional
spaces. Considering H(t, x) − y(t), we can assume y(t) = 0 by (i). Since
H([0, 1], ∂U ) is compact, we have ρ = dist(y, H([0, 1], ∂U ) > 0. By Theo-
rem 15.2 we can pick H1 ∈ F([0, 1] × U, X) such that |H(t) − H1 (t)| < ρ,
t ∈ [0, 1]. this implies deg(I + H(t), U, 0) = deg(I + H1 (t), U, 0) and the rest
follows from Theorem 14.2.
In addition, Theorem 14.1 and Theorem 14.2 hold for the new situation
as well (no changes are needed in the proofs).
Theorem 15.5. Let F, G ∈ Dy (U, X), then the following statements hold.
(i). We have deg(I + F, ∅, y) = 0. Moreover, if Ui , 1 ≤ i ≤ N , are
disjoint open subsets of U such that y 6∈ (I + F )(U \ N
S
PN i=1 Ui ), then
deg(I + F, U, y) = i=1 deg(I + F, Ui , y).
(ii). If y 6∈ (I + F )(U ), then deg(I + F, U, y) = 0 (but not the other way
round). Equivalently, if deg(I + F, U, y) 6= 0, then y ∈ (I + F )(U ).
(iii). If |f (x) − g(x)| < dist(y, f (∂U )), x ∈ ∂U , then deg(f, U, y) =
deg(g, U, y). In particular, this is true if f (x) = g(x) for x ∈ ∂U .
(iv). deg(I + ., U, y) is constant on each component of Dy (U , X).
(v). deg(I + F, U, .) is constant on each component of X\f (∂U ).
15.4. The Leray–Schauder principle and the Schauder fixed-point theorem
271
Proof. (1). F (∂U ) ⊆ U and F (x) = αx for |x| = ρ implies |α|ρ ≤ ρ and
hence (15.10) holds. (2). F (x) = αx for |x| = ρ implies (α − 1)2 ρ2 ≥
(α2 − 1)ρ2 and hence α ≤ 0. (3). Special case of (2) since |F (x) − x|2 =
|F (x)|2 − 2hF (x), xi + |x|2 .
that |f (t, s, x)−f (t0 , s, x)| ≤ ε1 and |τ (t)−τ (t0 )| ≤ ε2 for |t−t0 | < δ. Hence
we infer for |t − t0 | < δ
Z
τ (t) Z τ (t0 )
|F (x)(t) − F (x)(t0 )| = f (t, s, x(s))ds − f (t0 , s, x(s))ds
a a
Z τ (t0 )
Z τ (t)
≤ |f (t, s, x(s)) − f (t0 , s, x(s))|ds + |f (t, s, x(s))|ds
a τ (t0 )
≤ (b − a)ε1 + ε2 M = ε. (15.14)
This implies that F (U ) is relatively compact by the Arzelà-Ascoli theorem.
Thus F is compact.
This result immediately gives the Peano theorem for ordinary differential
equations.
Theorem 15.12 (Peano). Consider the initial value problem
ẋ = f (t, x), x(t0 ) = x0 , (15.16)
where f ∈ C(I ×U, Rn ) and I ⊂ R is an interval containing t0 . Then (15.16)
has at least one local solution x ∈ C 1 ([t0 −ε, t0 +ε], Rn ), ε > 0. For example,
any ε satisfying εM (ε, ρ) ≤ ρ, ρ > 0 with M (ε, ρ) = max |f ([t0 − ε, t0 + ε] ×
Bρ (x0 ))| works. In addition, if M (ε, ρ) ≤ M̃ (ε)(1 + ρ), then there exists a
global solution.
and the first part follows from our previous theorem. To show the second,
fix ε > 0 and assume M (ε, ρ) ≤ M̃ (ε)(1 + ρ). Then
Z t Z t
|x(t)| ≤ |f (s, x(s))|ds ≤ M̃ (ε) (1 + |x(s)|)ds (15.18)
0 0
implies |x(t)| ≤ exp(M̃ (ε)ε) by Gronwall’s inequality. Hence we have an a
priori bound which implies existence by the Leary–Schauder principle. Since
ε was arbitrary we are done.
Chapter 16
The stationary
Navier–Stokes equation
275
276 16. The stationary Navier–Stokes equation
Taking the completion with respect to the associated norm we obtain the
Sobolev space H 1 (U, R). Similarly, taking the completion of Cc1 (U, R) with
respect to the same norm, we obtain the Sobolev space H01 (U, R). Here
Ccr (U, Y ) denotes the set of functions in C r (U, Y ) with compact support.
This construction of H 1 (U, R) implies that a sequence uk in C 1 (U, R) con-
verges to u ∈ H 1 (U, R) if and only if uk and all its first order derivatives
∂j uk converge in L2 (U, R). Hence we can assign each u ∈ H 1 (U, R) its first
order derivatives ∂j u by taking the limits from above. In order to show that
this is a useful generalization of the ordinary derivative, we need to show
that the derivative depends only on the limiting function u ∈ L2 (U, R). To
see this we need the following lemma.
Lemma 16.1 (Integration by parts). Suppose u ∈ H01 (U, R) and v ∈ H 1 (U, R),
then Z Z
u(∂j v)dx = − (∂j u)v dx. (16.9)
U U
And since C0∞ (U, R) is dense in L2 (U, R), the derivatives are uniquely de-
termined by u ∈ L2 (U, R) alone. Moreover, if u ∈ C 1 (U, R), then the deriv-
ative in the Sobolev space corresponds to the usual derivative. In summary,
H 1 (U, R) is the space of all functions u ∈ L2 (U, R) which have first order
derivatives (in the sense of distributions, i.e., (16.10)) in L2 (U, R).
Next, we want to consider some additional properties which will be used
later on. First of all, the Poincaré-Friedrichs inequality.
Lemma 16.2 (Poincaré-Friedrichs inequality). Suppose u ∈ H01 (U, R), then
Z Z
u2 dx ≤ d2j (∂j u)2 dx, (16.11)
U U
where dj = sup{|xj − yj | |(x1 , . . . , xn ), (y1 , . . . , yn ) ∈ U }.
Proof. Again we can assume u ∈ Cc1 (U, R) and we assume j = 1 for nota-
tional convenience. Replace U by a set K = [a, b] × K̃ containing U and
278 16. The stationary Navier–Stokes equation
Hence, from the view point of Banach spaces, we could also equip H01 (U, R)
with the scalar product
Z
hu, vi = (∂j u)(x)(∂j v)(x)dx. (16.14)
U
This scalar product will be more convenient for our purpose and hence we
will use it from now on. (However, all results stated will hold in either case.)
The norm corresponding to this scalar product will be denoted by |.|.
Next, we want to consider the embedding H01 (U, R) ,→ L2 (U, R) a little
closer. This embedding is clearly continuous since by the Poincaré-Friedrichs
inequality we have
d(U )
kuk2 ≤ √ kuk, d(U ) = sup{|x − y| |x, y ∈ U }. (16.15)
n
Moreover, by a famous result of Rellich, it is even compact. To see this we
first prove the following inequality.
Lemma 16.3 (Poincaré inequality). Let Q ⊂ Rn be a cube with edge length
ρ. Then
2
nρ2
Z Z Z
1
u2 dx ≤ n udx + (∂k u)(∂k u)dx (16.16)
Q ρ Q 2 Q
for all u ∈ H 1 (Q, R).
Xn Z 1
≤n (∂i u)2 dxi . (16.18)
i=1 0
The last term can be made arbitrarily small by picking N large. The first
term converges to 0 since uk converges weakly and each summand contains
the L2 scalar product of uk − u` and χQi (the characteristic function of
Qi ).
Proof. We first prove the case where u ∈ Cc1 (U, R). The key idea is to start
with U ⊂ R1 and then work ones way up to U ⊂ R2 and U ⊂ R3 .
If U ⊂ R1 we have
Z x Z
2 2
u(x) = ∂1 u (x1 )dx1 ≤ 2 |u∂1 u|dx1 (16.24)
and hence Z
2
max u(x) ≤ 2 |u∂1 u|dx1 . (16.25)
x∈U
Here, if an integration limit is missing, it means that the integral is taken
over the whole support of the function.
If U ⊂ R2 we have
ZZ Z Z
u dx1 dx2 ≤ max u(x, x2 ) dx2 max u(x1 , y)2 dx1
4 2
x y
ZZ ZZ
≤4 |u∂1 u|dx1 dx2 |u∂2 u|dx1 dx2
ZZ 2/2 ZZ 1/2 ZZ 1/2
2 2
≤4 u dx1 dx2 (∂1 u) dx1 dx2 (∂2 u)2 dx1 dx2
ZZ ZZ
2
≤4 u dx1 dx2 ((∂1 u)2 + (∂2 u)2 )dx1 dx2 (16.26)
and applying Cauchy–Schwarz finishes the proof for u ∈ Cc1 (U, R).
If u ∈ H01 (U, R) pick a sequence uk in Cc1 (U, R) which converges to u in
H01 (U, R) and hence in L2 (U, R). By our inequality, this sequence pis Cauchy
in L4 (U, R) and qRconverges to a limit v ∈ L4 (U, R). Since kuk ≤ 4 |U |kuk
2 4
2 2
R R
( 1 · u dx ≤ 4
1 dx u dx), uk converges to v in L (U, R) as well and
hence u = v. Now take the limit in the inequality for uk .
As a consequence we obtain
8d(U ) 1/4
kuk4 ≤ √ kuk, U ⊂ R3 , (16.28)
3
16.3. Existence and uniqueness of solutions 281
and
Corollary 16.6. The embedding
H01 (U, R) ,→ L4 (U, R), U ⊂ R3 , (16.29)
is compact.
only solve the Navier-Stokes equations in the sense of distributions. But let
us try to show existence of solutions for (16.34) first.
For later use we note
Z Z
1
a(v, v, v) = vk vj (∂k vj ) dx = vk ∂k (vj vj ) dx
U 2 U
Z
1
=− (vj vj )∂k vk dx = 0, v ∈ X. (16.36)
2 U
We proceed by studying (16.34). Let K ∈ L2 (U, R3 ), then
R
U Kw dx is
a linear functional on X and hence there is a K̃ ∈ X such that
Z
Kw dx = hK̃, wi, w ∈ X. (16.37)
U
Moreover, the same is true for the map a(u, v, .), u, v ∈ X , and hence there
is an element B(u, v) ∈ X such that
a(u, v, w) = hB(u, v), wi, w ∈ X. (16.38)
In addition, the map B : X2 → X is bilinear. In summary we obtain
hηv − B(v, v) − K̃, wi = 0, w ∈ X, (16.39)
and hence
ηv − B(v, v) = K̃. (16.40)
So in order to apply the theory from our previous chapter, we need a Banach
space Y such that X ,→ Y is compact.
Let us pick Y = L4 (U, R3 ). Then, applying the Cauchy-Schwarz in-
equality twice to each summand in a(u, v, w) we see
XZ 1/2 Z 1/2
2
|a(u, v, w)| ≤ (uk vj ) dx (∂k wj )2 dx
j,k U U
XZ 1/4 Z 1/4
4
≤ kwk (uk ) dx (vj )4 dx = kuk4 kvk4 kwk.
j,k U U
(16.41)
Moreover, by Corollary 16.6 the embedding X ,→ Y is compact as required.
Motivated by this analysis we formulate the following theorem.
Theorem 16.7. Let X be a Hilbert space, Y a Banach space, and suppose
there is a compact embedding X ,→ Y . In particular, kukY ≤ βkuk. Let
a : X 3 → R be a multilinear form such that
|a(u, v, w)| ≤ αkukY kvkY kwk (16.42)
and a(v, v, v) = 0. Then for any K̃ ∈ X , η > 0 we have a solution v ∈ X to
the problem
ηhv, wi − a(v, v, w) = hK̃, wi, w ∈ X. (16.43)
16.3. Existence and uniqueness of solutions 283
Monotone maps
Proof. Our first assumption implies that G(x) = F (x)−y satisfies G(x)x =
F (x)x − yx > 0 for |x| sufficiently large. Hence the first claim follows from
Theorem 14.13. The second claim is trivial.
285
286 17. Monotone maps
Proof. Set
G(x) = x − t(F (x) − y), t > 0, (17.9)
then F (x) = y is equivalent to the fixed point equation
G(x) = x. (17.10)
It remains to show that G is a contraction. We compute
kG(x) − G(x̃)k2 = kx − x̃k2 − 2thF (x) − F (x̃), x − x̃i + t2 kF (x) − F (x̃)k2
C
≤ (1 − 2 (Lt) + (Lt)2 )kx − x̃k2 ,
L
where L is a Lipschitz constant for F (i.e., kF (x) − F (x̃)k ≤ Lkx − x̃k).
Thus, if t ∈ (0, 2C
L ), G is a uniform contraction and the rest follows from the
uniform contraction principle.
Again observe that our proof is constructive. In fact, the best choice
for t is clearly t = LC2 such that the contraction constant θ = 1 − ( C 2
L ) is
minimal. Then the sequence
C
xn+1 = xn − 2 (F (xn ) − y), x0 = y, (17.11)
L
17.2. The nonlinear Lax–Milgram theorem 287
Proof. By the Riez lemma (Theorem 2.10) there are elements F (x) ∈ X
and z ∈ X such that a(x, y) = b(y) is equivalent to hF (x) − z, yi = 0, y ∈ X,
and hence to
F (x) = z. (17.15)
By (17.13) the map F is strongly monotone. Moreover, by (17.14) we infer
kF (x) − F (y)k = sup |hF (x) − F (y), x̃i| ≤ Lkx − yk (17.16)
x̃∈X,kx̃k=1
see that we can apply the Lax–Milgram theorem if 4a0 c0 > b20 . In summary,
we have proven
Theorem 17.4. The elliptic Dirichlet problem (17.18) has a unique weak
solution u ∈ H01 (U, R) if a0 > 0, b0 = 0, c0 ≥ 0 or a0 > 0, 4a0 c0 > b20 .
17.3. The main theorem of monotone maps 289
whenever xn → x.
The idea is as follows: Start with a finite dimensional subspace Xn ⊂ X
and project the equation F (x) = y to Xn resulting in an equation
Fn (xn ) = yn , xn , yn ∈ Xn . (17.29)
More precisely, let Pn be the (linear) projection onto Xn and set Fn (xn ) =
Pn F (xn ), yn = Pn y (verify that Fn is continuous and monotone!).
Now Lemma 17.1 ensures that there exists a solution S
un . Now chose the
subspaces Xn such that Xn → X (i.e., Xn ⊂ Xn+1 and ∞ n=1 Xn is dense).
Then our hope is that un converges to a solution u.
This approach is quite common when solving equations in infinite di-
mensional spaces and is known as Galerkin approximation. It can often
be used for numerical computations and the right choice of the spaces Xn
will have a significant impact on the quality of the approximation.
So how should we show that xn converges? First of all observe that our
construction of xn shows that xn lies in some ball with radius Rn , which is
chosen such that
hFn (x), xi > kyn kkxk, kxk ≥ Rn , x ∈ Xn . (17.30)
Since hFn (x), xi = hPn F (x), xi = hF (x), Pn xi = hF (x), xi for x ∈ Xn we can
drop all n’s to obtain a constant R which works for all n. So the sequence
xn is uniformly bounded
kxn k ≤ R. (17.31)
Now by a well-known result there exists a weakly convergent subsequence.
That is, after dropping some terms, we can assume that there is some x such
that xn * x, that is,
hxn , zi → hx, zi, for every z ∈ X. (17.32)
And it remains to show that x is indeed a solution. This follows from
290 17. Monotone maps
Zorn’s lemma
A partial order is a binary relation ”” over a set P such that for all
A, B, C ∈ P:
• A A (reflexivity),
• if A B and B A then A = B (antisymmetry),
• if A B and B C then A C (transitivity).
Theorem A.1 (Zorn’s lemma). Every partially ordered set in which every
chain has an upper bound contains at least one maximal element.
291
292 A. Zorn’s lemma
293
294 Bibliography
[20] J.J. Rotman, Introduction to Algebraic Topology, Springer, New York, 1988.
[21] H. Royden, Real Analysis, Prencite Hall, New Jersey, 1988.
[22] W. Rudin, Real and Complex Analysis, 3rd edition, McGraw-Hill, New York,
1987.
[23] M. Ru̇žička, Nichtlineare Funktionalanalysis, Springer, Berlin, 2004.
[24] G. Teschl, Mathematical Methods in Quantum Mechanics; With Applications to
Schrödinger Operators, Amer. Math. Soc., Providence, 2009.
[25] J. Weidmann, Lineare Operatoren in Hilberträumen I: Grundlagen, B.G.Teubner,
Stuttgart, 2000.
[26] D. Werner, Funktionalanalysis, 3rd edition, Springer, Berlin, 2000.
[27] E. Zeidler, Applied Functional Analysis: Applications to Mathematical Physics,
Springer, New York 1995.
[28] E. Zeidler, Applied Functional Analysis: Main Principles and Their Applications,
Springer, New York 1995.
Glossary of notation
295
296 Glossary of notation
299
300 Index