Sei sulla pagina 1di 346

Introduction to Analysis

Jonathan Gleason

Trigger warning: Mathematics contained herein.

Contents

Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

What is a number? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.1

Cardinality of sets and the natural numbers

1.1.1
1.1.2
1.1.3
1.1.4

The natural numbers as a set . . . . . . . . . . . . . . . . . .


The natural numbers as an integral crig . . . . . . . . .
The natural numbers as a well-ordered set . . . . . . .
The natural numbers as a well-ordered integral crig

1.2

Additive inverses and the integers

25

1.3

Multiplicative inverses and the rational numbers

28

1.4

Least upper-bounds and the real numbers

30

1.4.1
1.4.2

Least upper-bounds and why you should care . . . . . . . . . . . . . . . . . . . . . . . . . 30


Dedekind cuts and the real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

1.5

Concluding remarks

Basics of the real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2.1

Cardinality and countability

41

2.2

The absolute value

44

2.3

The Archimedean Property

45

2.4

Nets, sequences, and limits

2.4.1
2.4.2
2.4.3
2.4.4
2.4.5

Nets and sequences . . . . . . . . .


Limits . . . . . . . . . . . . . . . . . . . . .
Cauchyness and completeness
Series . . . . . . . . . . . . . . . . . . . . .
Subnets and subsequences . . . .

13
.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

13
15
17
21

39

47
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

47
49
51
61
67

2.5

Basic topology of the real numbers

72

2.5.1
2.5.2
2.5.3

Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Open and closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Quasicompactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

2.6

Summary

Basics of general topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

3.1

The definition of a topological space

3.1.1
3.1.2
3.1.3
3.1.4

Bases, neighborhood bases, and generating collections


Some basic examples . . . . . . . . . . . . . . . . . . . . . . . . . . .
Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Some motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3.2

A review

3.2.1

A couple new things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

3.3

Filter bases

104

3.4

Equivalent definitions of topological spaces

107

3.4.1
3.4.2
3.4.3
3.4.4

Definition by specification of closures or interiors .


Definition by specification of convergence . . . . .
Definition by specification of continuous functions
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3.5

The subspace, quotient, product, and disjoint-union topologies

3.5.1
3.5.2
3.5.3
3.5.4

The subspace topology . . .


The quotient topology . . . .
The product topology . . . .
The disjoint-union topology

3.6

Separation Axioms

3.6.1
3.6.2
3.6.3

Separation of subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124


Separation axioms of spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

3.7

The Intermediate and Extreme Value Theorems

3.7.1
3.7.2

Connectedness and the Intermediate Value Theorem . . . . . . . . . . . . . . . . . 145


Quasicompactness and the Extreme Value Theorem . . . . . . . . . . . . . . . . . . . 149

3.8

Local properties

Uniform spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

4.1

Motivation

153

4.2

Basic definitions and facts

153

4.2.1
4.2.2
4.2.3

Star-refinements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Uniform spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Uniform bases, generating collections, and the initial and final uniformities . . 161

4.3

Semimetric spaces and topological groups

4.3.1
4.3.2
4.3.3

Semimetric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171


Topological groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Topological vector spaces and algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

88

89
.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

90
93
94
96

97

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

108
110
115
118

118
119
119
120
123

124

145

150

170

4.4

T0 uniform spaces are uniformly-completely-T3

184

4.4.1
4.4.2
4.4.3

Separation axioms in uniform spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184


The key result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
The counter-examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

4.5

Cauchyness and completeness

4.5.1
4.5.2
4.5.3

Completeness of MorTop (X, R) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193


The completion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Complete metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

5.1

Measure theory

5.1.1
5.1.2
5.1.3

Outer-measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Measures on topological and uniform spaces . . . . . . . . . . . . . . . . . . . . . . . . 214
The Haar-Howes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

5.2

The integral

5.2.1
5.2.2
5.2.3

The product measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235


Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
The integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

5.3

L p spaces

Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

6.1

Tensors and abstract index notation

275

6.2

The definition

280

6.2.1

The definition itself . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281

6.3

A smooth version of MorTop (Rd , R): C (Rd )

285

6.4

Calculus

286

6.4.1
6.4.2
6.4.3
6.4.4

Tools for calculation . . . . . . . . . . . . . . . . . . . . . . . . . . .


The Mean Value Theorem . . . . . . . . . . . . . . . . . . . . . .
The Fundamental Theorem of Calculus . . . . . . . . . . . .
Functions defined by their derivatives: exp, sin, and cos

6.5

Differentiability and continuity

303

6.6

The Inverse Function Theorem

310

Sets and categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

A.1

Basic set theory

A.1.1
A.1.2
A.1.3
A.1.4

What is a set? . . . . . . . . . . . . . . .
The absolute basics . . . . . . . . . .
Relations, functions, and orders
Sets with algebraic structure . . .

A.2

Basic category theory

A.2.1
A.2.2

What is a category? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333


Some basic notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

191

207

235

269

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

286
288
291
293

311
.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

311
312
315
327

332

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

Index of notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

Preface

I started writing these notes in order to prepare for the lectures1 I would be giving for MATH
104 Introduction to Analysis at the University of California, Berkeley during Summer 2015. My
obsessive-compulsive perfectionism and completionism turned them into what they are today.
At first, these notes were really just for meI wanted to be sure I was ready to teach the class.
As Im sure youre aware if youve ever taught before, there is much more to being able to teach
well than simply knowing all the material. For example, it is not enough to simply know Theorems
1 and 2. Among other things, for example, you have to know the order in which they come in the
theory. Most of the motivation for starting these notes was to make sure I got all of that straight in
my mind before I went up in front of a class.
It almost immediately became apparent, however, that the book I was assigned to use to teach
the course (which shall remain unnamed. . . ) was not sufficient for what I wanted to do. The target
audience for the course (or so I thought) was students who were at least considering going to
graduate school in mathematics, but also who had never seen any analysis before, and the book I
was supposed to use was just simply too elementary to serve this purpose. What really surprised me,
however, is that I could not find a single mathematical analysis book that served all of my needs.
For example, I wanted to teach the lebesgue integral in place of the riemann integral, and nearly all
of the mathematical analysis books that teach the lebesgue integral as the primary definition of the
integral assume that the student has seen analysis before. Thus, within a week or so after started the
course,2 the notes switched from just being a device I would use to prepare, to what would serve as
the effective text for the course.
I now briefly discuss several of the topics / ways of treating topics that I had difficultly finding
in standard analysis texts.
Analysis with an eye towards category theory

Given that this is a first course in analysis, I didnt want to go super heavy on the category theoryI
dont even use functors, for examplebut on the other hand I felt as if it would be a sin to say
nothing of it. For example, R as a set, a field, a topological space, a uniform space, a metric space,
1 This
2 The

is in quotes because they werent really lectures in the traditional sense of the word.
course ran 8 weeks in total.

an ordered-field, blah blah etc. etc.. . . all of these are very different things, and the appropriate way
to make these distinctions is of course via the use of morphisms (because morphisms determine
the meaning of isomorphism). In Set, the mathematics cannot tell the difference between R and
2N , whereas in Top they are very different: one is compact and the other is not. On the other hand,
in Top, the mathematics cannot tell between (0, 1) and R, whereas in Uni they are very different:
one is complete and the other is not. I can understand one making the argument that you can make
these distinctions without using categories per se. On the other hand, the budding mathematician
has to learn categories at some point, and so they may as well learn the absolute basics now.
I also implicitly made use of other more advanced categorical ideas, most notably of pseudouniversal properties.3 For example, the integers are defined as the unique4 totally-ordered integral
cring which (i) contains N and (ii) is contained in any other totally-ordered integral cring which
contains N. This is just one example on how I tried to stress that mathematical objects are defined by
the properties that uniquely specify them, instead of the nuts and bolts which happen to make them
up. The right way to think of the real numbers is as the unique dedekind complete totally-ordered
field, not as dedekind cuts or equivalence classes of cauchy sequences or whateverthese are
merely tools used to prove the existence of such an object.
Construction of the reals from scratch

Actually, many books go from N to R; what I actually had difficult finding was sources that
constructed N from scratch. I very much wanted to say at the end of the course that everything
we proved could be reduced to nothing more than pure logic5 ; and the idea that if you have a
bunch of things, you can give all those things a single name (X, for example), and now you have
a new thing.6 Doing this required us to construct N from the ground up, which we did by taking
equivalence classes of finite cardinals.
Nets

I really cant believe how many peers of mine, and even professional mathematicians, dont really
know nets. There is no really getting around them: if you want to do general topology, you have to
use nets. Sequences are just not enough. For example, the real numbers with the discrete topology
and the cocountable topology provide an example of a set equipped with two nonhomeomorphic
topologies for which the notions of sequential convergence agree. The point: if you ever want to
define a topology using convergence, you must use nets. Moreover, the definition of a uniform
space is the way it is because then the uniformity itself, ordered by reverse star-refinement, is a
directed set, so that one can conveniently index nets by the uniformity itself.
Even if you only care about R, however, nets can still be quite useful. For example, R+ both
with the usual ordering and reverse ordering provide examples of directed sets, and so you can
interpret lim0+ and limx both as limits of nets, which can be quite convenient. Its not just a
convenience, however. Nets allow you to solve problems in the reals that you cannot solve with
sequences alone. For example, suppose you have a doubly-indexed sequence hm, ni 7 am,n for
which limm limn am,n = limn limm am,n . Call this common limit a, . In a perfect world, we would
have limm am,m = a, , but this fails completely. One might hope then that you can at least construct
a single sequence m 7 bm from the terms of am,m for which limm bm = a, . In general, there is
no canonical construction that will do this,7 but if you allow yourself to construct a net whose
3I

say pseudo because I usually phrase things as A is the smallest things that satisfies XYZ and contains B.
instead of There is a universal morphism from B to A in category ABC.. Maybe I will change this in the future, but I
felt as if this level of mathematical sophistication in Chapter 1 of a first analysis course could lose a lot of people.
4 Up to isomorphism in the category of preordered rings.
5 In a naive sense. Not in the sense of mathematical logic.
6 That is, the naive notion of a set. I also made an attempt to explain away any potential set theoretic problems:
Basically, I said we can always do this, but when we do so, the thing we obtain might not be a set.
7 In fact, in some cases, there might be no construction at allI havent checked.

terms come from the set {am,n : m, n N}, then there is a construction that always workssee
Proposition 2.4.157.
Subnets

I imagine that there are at least some sources that shy away from using nets because subnets are
notoriously more difficult than subsequences. Despite how determined I was to use nets, even I
struggled with the tedium of the definition of subnets for awhile. Eventually I realized that this
difficult was a consequence of using the wrong definition of subnet. There are at least two distinct
definitions of subnet in the literature that I was awaresee Propositions 2.4.146 and 2.4.148and
both of these were wrong in the sense that they did not agree with what the analogous notion for
filter bases (filteringssee Definition 3.3.7) was.8 By examining the definition of a filtering, I was
able to formulate a definition of subnet (Definition 2.4.142) that agreed (Proposition 3.3.10) with
the definition of filter bases. That definition is as follows.
A subnet of a net x : X is a net y : 0 X such that
(i). for all 0 , y = x for some ; and
(ii). whenever U X eventually contains x, it eventually contains y.
That is, it is a net made up from terms of the original net that has the property that any set which
eventually contains the original net eventually contains the subnet. I personally find this to be a
much cleaner definition than the ones involving awkward conditions on the indices of the nets: The
indices themselves dont matterit is only what eventually happens that matters.
Uniform spaces

My inclusion of uniform spaces was essentially a replacement for a treatment of metric spaces
(which, for some reason, seems standard). Unfortunately, metric spaces are just not enough. For
example, if you care about continuous functions on the real line,9 you cannot get away with metric
spaces. You could generalize to semimetric spaces, but I find this to not be as clean as uniform
spaces. Uniform spaces also include all topological groups, and so by generalizing to uniform
spaces, you include another hugely important example of topological spaces. For some reason,
the theory of uniform spaces seems to be very obscure in standard undergraduate and graduate
curricula, and I myself was never formally taught it in my own education. Despite this, Ive found
knowledge of uniform spaces to be incredibly useful,10 in particular in functional analysis, and they
deserve more attention than they get in courses in analysis and general topology.
-algebras

In general, I try to place a relatively large amount of emphasis on motivation, and so when I first
started writing down my development of measure theory, I decided from the beginning that I would
first introduce the notion of an outer measure, and then introduce the notion of an abstract -algebra
once I had shown that Carathodorys Theorem (Theorem 5.1.23) says that the measurable sets
form a -algebra. That is, you dont have to pull the definition out of thin airthe theory gives it
to you. As I continued to develop the theory, however, I found that I never really needed to make
use of this notionworking with outer-measure spaces (sets equipped with an outer-measure) was
sufficient: all I had to do was add in the hypothesis measurable in certain places. I found that this
made the theory quite a bit cleaner. For one thing, it was easier to teach, but even in terms of the
mathematics itself, it made some things quite a bit easier. For example, when discussing product
8 There

wasnt really any uncertainly about what the right notion was for filter bases.
youre a mathematician or planning on becoming one, chances are you do, at least a little.
10 For example, there is a generalization of haar measure to uniform spaces (see The Haar-Howes Theorem), which,
among other things, gives isometric invariance of lebesgue measure for free.
9 If

measures, you run into the issue of whether or not the product -algebra agrees with the -algebra
of measurable sets on the product. If I recall, in general this fails, but whats better is that, if you
dont use -algebras to begin with, its a potentially delicate issue that you just never have to worry
about: Mr. Caratodory always tells you what the measurable sets are.
Measure theory and the lebesgue integral

Since the first day you learned the definition of the integral in your first calculus class,11 you
have almost certainly thought of the integral as the (signed) area under the curve. Thats what
it is. Measure theory allows one to make this the definition of the integral. There is no need for
these awkward approximations by rectangles.12 The counter-argument to this is Okay, sure, but
that just reduces the complication of the lebesgue integral to measure theory.. To be fair, this is
correctif you take this route, you have to develop measure theory, whereas, with the riemann
integral, you can start talking about partitions, etc., almost immediately. That being said, I take my
time developing measure theory more fully because I dont like approaches which run straight to
the finish line, but in principle, if you really dont like the amount of build-up that measure theory
requires, you can probably cut-down the length of my development by two-thirds or so to get to
the definition of the integral. To those who would still object to even a minimal measure theoretic
treatment, I would argue the following. There are two types of students: those who want to become
mathematicians and those who do not. The former need to learn the lebesgue integral anyways, and
so I dont see much point dicking around with the riemann integral when theyll have to learn the
lebesgue integral in a graduate analysis course (if not earlier) anyways. In the latter case, simply
stating informally Its the area under the curve. should sufficein fact, thats not even really a lie,
or even a stretch of the truthif you set things up right, thats literally what it is.
Tangent spaces in differentiation in Rd

I actually found differentiation to be much easier when studying it on manifolds as opposed to


in Rd . I think this is because it is easy to mix-up objects that should be thought of as points and
objects that should be thought of as vectors in Rd , something that is just plain impossible to do
in a general manifold. In an attempt to circumvent this potential confusion, I introduce the term
tangent space as well as notation for it, T(Rd )x. This is really only for conceptual clarity: it is
not the definition one would use on a manifold. In fact, it would be essentially impossible to do
this because the usual definition of the tangent space on a manifold requires one to already have a
theory of differentiation on Rd . Instead, I just take T(Rd )x as a metric vector space whereas the
entire space Rd is considered as a metric space.13
I also made a point to introduce index notation and tensor fields. The motivation for this one
really comes from my physics background: index notation and tensor calculus are mandatory
background if one wants to do theoretical physics. Its also incredibly useful in pure riemannian
geometryI personally find index notation much more transparent a lot of the time. For example,
compare
Rabcd := a b b a

(0.1)

versus
R(u, v)w := u v w v u w [u,v] w.
11 Or

(0.2)

when self-teaching, for the particularly precocious amongst you.


course, in the spirit of defining things by the properties they uniquely specify, I dont quite state it like this.
Instead I say something like The area under the curve is the unique extended real-valued function on nonnegative borel
functions such that. . . see Theorem 5.2.155.
13 Note the implicit elementary categorical ideas coming into play here. Also note the two distinct uses of the
term metric. Who the hell came-up with this terminology anyways? Mathematicians need to get better at naming
things. . . Surely you can find a more descriptive adjective than normal or regular for your new idea, yes?
12 Of

Morally, the curvature tensor is supposed to measure the lack of commutativity of the covariant
derivative, and the former makes that perfectly clear, whereas the latter has this awkward extra term
[u,v] w. Mathematicians then also tend to make this (what I find awkward) distinction between14
the curvature endomorphism R, the curvature tensor Rm, and the curvature form R, but in my
mind, these are all just different versions of Rabcd : Rabcd , Rabcd , and [Rab ]cd respectively. To each
his own, but, in this case, I personally find physicists notation much clearer, even in the realm of
pure mathematics. Note that there is no discussion of riemannian geometry herethis is just a
motivating example to justify the teaching of index notation.
I also use what seems to be a nonstandard definition of differentiable. This was almost by
accident. Without consulting a reference, I just wrote down what to me seemed to be the most
obvious thing: a real-valued function on Rd is differentiable at a point x iff (i) all the directional
derivatives at x exist and (ii) the map that sends a tangent vector to its directional derivative is linear
and continuous. That is, I put the minimal conditions in order to guarantee that the differential
was a one-form. It wasnt until I started trying to prove things that I realized this was nonstandard.
As it turns out, with this definition, there are infinitely-differentiable functions which are not
continuous.15 Oopsies. Despite this pathology, I just decided to roll with it, partially due to time
constraints. To give you some idea, the first draft when I had finished it was 344 pages, which had
been written in just a little more than 8 weeks. I had completely underestimated how much work
this was going to be16 and at that point in the course, were I try to go back and change the definition
of differentiable, I would have been at serious risk of just simply not being prepared for future
classes. In the end, things worked out just fine thoughI found that a lot of the time I didnt need
differentiable functions to be continuous, and when I did, a simple distinction between smooth and
infinitely-differentiable did the trick.17 In fact, as I find this definition simpler than the one usually
given (frchet differentiable), I may just stick with it the next time I teach.
Jonathan Gleason
Ph. D. Student, Department of Mathematics, University of California, Berkeley
First draft (of preface): 16 September 2015
: 16 September 2015

14 Using

the notation of [14].


Example 6.5.1.
16 By the end, I pretty much did nothing besides teach and write these notes.
17 Smooth wound-up meaning that (i) the function was infinitely differentiable and (ii) all the derivatives where
continuous.
15 See

1. What is a number?

What is a number? Of course, the answer is that the word number means whatever we declare
it to mean, and indeed, what a mathematician means when he says the word number will vary
from context to context. There is no single universal type of number that will work in all contexts,
so instead of coming up with a single answer to this question, we will instead define and develop
several different number systems, those being the natural numbers, the integers, the rational
numbers, and the real numbers. Being the most fundamental, we start with the natural numbers.

1.1

Cardinality of sets and the natural numbers


The motivation for introducing the natural numbers is that these are the things that allow us to count
things. But what does it mean to count? We will make sense of this by making sense of the notion
of the cardinality of a set, the cardinality being a sort of measure of how many elements the set
contains.

1.1.1

The natural numbers as a set


The first step in defining the cardinality of sets is being able to decide when two sets have the
same number of elements. So, suppose we are given two sets X and Y and that we would like to
determine whether X and Y have the same number of elements. How would you do this?
Intuitively, you could start by trying to label all the elements in Y by elements of X, without
repeating labels. If either (i) you ran out of labels before you finished labeling all elements in Y or
(ii) you were forced to use a label more than once, then you could deduce that X and Y did not have
the same number of elements. To make this precise, we think of this labeling as a function from X
to Y . Then, the first case corresponds to this labeling function not being surjective and the second
case corresponds to this labeling function not being injective.
The more precise intuition is then that the two sets X and Y have the same number of elements,
that is, the same cardinality, iff there is a bijection f : X Y between them: that f is an injection
says that we dont use a label more than once (or equivalently that Y has at least as many elements
as X) and that f is a surjection says that we label everything at least once (or equivalently that X

Chapter 1. What is a number?

14

has at least as many elements as Y ). In other words, according to Exercise A.2.14, two sets should
have the same cardinality iff they are isomorphic in Set.1
Definition 1.1.1 Equinumerous Let X and Y be sets. Then, X and Y are equinumerous

iff X
=Set Y .
So weve determined what it means for two sets to have the same cardinality, but what actually
is a cardinality? The trick is to identify a cardinal with the collection of all sets which have that
cardinality.
Definition 1.1.2 Cardinal number A cardinal number is an element of

K := Set0 /
=Set := {[X]
=Set : X Set0 } .
R

(1.1.3)

In other words, a cardinal is an equivalence class of sets, the equivalence relation


being equinumerosity.a Furthermore, for X a set, we write |X| := [X]
=Set .

a Recall

that the relation of isomorphism is an equivalence relation in every category. See Exercise A.2.15.

The idea then is that the natural numbers are precisely those cardinals which are finite. But
what does it mean to be finite? This is actually a bit tricky.
Obviously we dont have a precise definition yet, but everyone has an intuitive idea of what
it means to be infinite. Consider an infinite set X. Now remove one element x0 X to form the
set U := X \ {x0 }. For any reasonable definition of infinite, removing a single element from an
infinite set should not change the fact that it is infinite, and so U should still be infinite. In fact,
more should be true. Not only should U still be infinite, but it should still have the same cardinality
as X.2 It is this idea that we take as the defining property of being infinite.
Definition 1.1.4 Infinite Let X be a set. Then, X is infinite iff there is a bijection from X

to a proper subset of X.
Definition 1.1.5 Finite Let X be a set. Then, X is finite iff it is not infinite.

And now finally:


Definition 1.1.6 Natural numbers The natural numbers, N, are defined as

N := {|X| : X Set0 is finite.} .

(1.1.7)

In words, the natural numbers are precisely the cardinals of finite sets.
R

1 Set

Some people take the natural numbers to not include 0. This is a bit silly for a couple
of reasons. First of all, if you think of the natural numbers as cardinals, as we are
doing here, then 0 has to be a natural number as it is the cardinality of the empty set.
Furthermore, as we shall see in the next section, it makes the algebraic structure of N
slightly nicer because 0 acts as an additive identity.

is the category of sets. If you are reading this linearly and you see something you dont recognize, chances are
its in the appendix. Use the index and index of notation at the end of the notes to find exactly where.
2 We will see in the next chapter that there are infinite sets which are not of the same cardinality. That is, in this sense,
there is more than one type of infinity.

1.1 Cardinality of sets and the natural numbers


1.1.2

15

The natural numbers as an integral crig


Great! We now know what the natural numbers are. Our next objective then is to be able to add and
multiply natural numbers. In fact, we will define not only what it means to add and multiply natural
numbers, but instead we will define what it means to add and multiply any cardinal numbers. Then,
we will just need to check that the sum and product of two finite cardinals is again finite, so the the
operations restricted to N are closed.
Definition 1.1.8 Addition and multiplication Let m, n N and let M and N be sets

such that m = |M| and n = |N|. Then, we define


m + n := |M t N| and mn := |M N|.
R

(1.1.9)

Recall that |M| means the equivalence class of the set M under the equivalence relation
of equinumerosity (see the definition of a cardinal, Definition 1.1.2). Whenever
we define an operation on equivalence classes that makes reference to a specific
representative of that equivalence class, we must check that our definition does not
depend on this representative. For example, perhaps if I take a different set M 0 with
|M 0 | = |M|, it will turn out that |M 0 tN| =
6 |M tN|. If this happens, then our definition
doesnt make sense. Of course, it doesnt happen, but we need to check that it doesnt
happen. This is what it means to be well-defined.

Proposition 1.1.10 Addition and multiplication are well-defined.

Proof. Let M1 , M2 , N1 , N2 be sets with |M1 | = |M2 | and |N1 | = |N2 |. By definition,
this means that there are bijections f : M1 M2 and g : N1 N2 . We would
like to show that |M1 t N1 | = |M2 t N2 |. To show this, by definition, we need to
construction a bijection from M1 t N1 to M2 t N2 . We do this as follows. Define
h : M1 t N1 M2 t N2 by
(
f (x) if x M1
h(x) :=
.
(1.1.11)
g(x) if x N1
We now check that h is a bijection. Suppose that h(x1 ) = h(x2 ). This single
element must be contained in either M2 or N2 , so without loss of generality suppose
that h(x1 ) = h(x2 ) M2 . Then, from the definition of h, we have that f (x1 ) =
h(x1 ) = h(x2 ) = f (x2 ), and so because f is injective, we have that x1 = x2 . Thus, h
is injective. To show that h is surjective, let y M2 t N2 . Without loss of generality,
suppose that y M2 . Then, because f is surjective, there is some x M1 such that
f (x) = y, so that h(x) = f (x) = y. Thus, h is surjective, and hence bijective.
Thus, we have shown that
|M1 t N1 | = |M2 t N2 |,

(1.1.12)

so that addition is well-defined.


Exercise 1.1.13 Complete the proof by showing that multiplication is well-

defined.


Chapter 1. What is a number?

16
Definition 1.1.14 0 and 1 Define

0 := |0|,
/

1 := |{0}|
/ N.

(1.1.15)

In words, 0 is the cardinality of the empty set and 1 is the cardinality of the set that contains
the empty set and only the empty set.
Proposition 1.1.16 (K, +, , 0, 1) is an integral crig.a

Proof. We simply need to verify the properties of the definition of a crig, Definition A.1.125
(and also check that it is integral, Definition A.1.131). We first check that + is associative,
so let m, n, o N and write m = |M|, n = |N|, o = |O| for sets M, N, and O. Then,
(m + n) + o = (|M| + |N|) + |O| = |M t N| + |O| = |(M t N) t O| = |M t (N t O)|
= m + (n + o).
(1.1.17)
A similar argument shows that additive is commutative. As for the additive identity, we
have
m + 0 = |M t 0|
/ = |M| = m = 0 + m.

(1.1.18)

Thus, (K, +) is a commutative monoid.


Exercise 1.1.19 Check that (K, ) is a commutative semigroup.
Exercise 1.1.20 To show that K is a crig, there is one final property to check. What

is it? Check it.


To show that K is integral, suppose that mn = 0. Then, there must be a bijection between
M N and the empty-set, which implies that M N is empty. But if neither M nor N is
empty, then M N will be nonempty. Thus, we must have that either M or N is empty, or
equivalently, that either m = 0 or n = 0, so that K is integral.

a Of

course, K is not actually a set, but we dont care about this. See the remark in Appendix A.1.1 What is

a set?.

Now we must check that addition and multiplication restrict to operations on the collection
of finitely cardinalities, namely, N. This amounts to showing that the sum and product of finite
cardinalities are both finite.
Proposition 1.1.21 If M, N are finite sets, then M t N and M N are finite sets.

Proof. We first show that M t {} is finite if M is. We proceed by contradiction: suppose


there is a proper subset S M t{} and a bijection f : M t{} S. If M is empty, then {}
is finite because its only proper subset is the empty set to which there can be no bijection (in
fact, there is no function from a nonempty set to the empty set; see Exercise A.1.35). Thus,
we may without loss of generality assume that M is nonempty. So, let x0 M. We know that
either S is of the form S = M 0 for M 0 M or S = M 0 t {} for M 0 M. By relabeling as x0
and x0 as , we may as well assume that we are in the latter case, so that we have a bijection

1.1 Cardinality of sets and the natural numbers

17

f : M t {} M 0 t {} for M 0 M. (Forget the label x0 ; we will want that notation later


to refer to something else.) If maps to under f , then the restriction of f to M yields a
bijection from M to M 0 showing that M is infinite: a contradiction. Thus, we may as well
assume that f (x0 ) = for x0 M. Let g : M t {} M t {} be any bijection which sends
to x0 (such a bijection exists by Exercise A.1.49). Then, f g : M t {} M 0 t {} is a
bijection such that the image of M is M 0 . Thus, the restriction of f g to M yields a bijection
of M onto a proper subset, showing that M is infinite: a contradiction. Thus, M t {} is
finite.
Applying this result inductively shows that M t N is finite if both M and N are.
To see that M N is finite, think of the product as (Exercise A.1.50)
M N = tyN M.

(1.1.22)

That M N is finite now follows inductively from the fact that the disjoint union of two
finite sets is finite.

Corollary 1.1.23 (N, +, , 0, 1) is an integral crig.

1.1.3

The natural numbers as a well-ordered set


For the moment, we will set aside the algebraic structure we have just defined on N and equip N
with a preorder (which turns out to be a well-order). Then, in the next subsection, we will show
that these two structures are compatible in a way that makes N into a well-ordered integral crig.
Once again, well will in fact define the preorder on all of K and show that it is a well-order on K. It
will then follow automatically that it restricts to a well-order on N.
Definition 1.1.24 Let m, n K and let M and N be sets such that m = |M| and n = |N|.
Then, we define m n iff there is an injective map from M to N.
Exercise 1.1.25 Check that is well-defined.

Proposition 1.1.26 (K, ) is a preordered set.

Proof. Recall that being a preorder just means that is reflexive and transitive (see Definition A.1.87).
Let m, n, o N and let M, N, O be sets such that m = |M|, n = |N|, o = |O|. The
identity map from M to M is an injection (and, in fact, a bijection), which shows that
m = |M| |M| = m, so that is reflexive.
To show transitivity, suppose that m n and n o. Then, there is an injection f : M N
and an injection from g : N O. Then, g f : M O is an injection (this is part of
Exercise A.1.47), and so we have m = |M| |O| = o, so that is transitive, and hence a
preorder.

The next result is perhaps the first theorem we have come to that has a nontrivial amount of
content to it.
Theorem 1.1.27 Bernstein-Cantor-Schrder Theorem. (K, ) is a partially-ordered

set.

Chapter 1. What is a number?

18
R

This theorem is usually stated as If there is an injection from X to Y and there is an


injection from Y to X, then there is a bijection from X to Y ..

Proof. a S T E P 1 : R E C A L L W H AT I T M E A N S T O B E A PA R T I A L - O R D E R
Recall that being a partial-order just means that is an antisymmetric preorder. We have
just shown that is a preorder (see Definition A.1.91), so all that remains to be seen is that
is antisymmetric.
S T E P 2 : D E T E R M I N E W H AT E X P L I C I T LY W E N E E D T O S H O W
Let m, n K and let M, N be sets such that m = |M| and n = |N|. Suppose that m n
and n m. By definition, this means that there is an injection f : M N and an injection
g : N M. We would like to show that m = n. By definition, this remains we must show
that there is a bijection from M to N.
S T E P 3 : N O T E T H E E X I S T E N C E O F L E F T- I N V E R S E T O B O T H f A N D g
We use the result of Appendix A.1.3 which says that both f and g have left inverses. Denote
these inverses by f 1 : N M and g1 : M N respectively, so that
f 1 f = idM and g1 g = idN .

(1.1.28)

Note that it is not necessarily the case that f f 1 = idN (and similarly for g).
S T E P 4 : D E F I N E Cx
Fix an element x M and define



Cx := . . . , g1 f 1 g1 (x) , f 1 g1 (x) , g1 (x), x,
f (x), g ( f (x)) , f (g ( f (x))) , . . .} M t N.

(1.1.29)

(The C is for chain.)


S T E P 5 : S H O W T H AT {Cx : x M} F O R M S A PA R T I T I O N O F M t N
We now claim that the collection {Cx : x M} forms a partition of M t N (recall that this
means that any two given Cx s are either identical or disjoint; see Definition A.1.71). If Cx1
is disjoint from Cx2 we are done, so instead suppose that there is some element x0 that is
in both Cx1 and Cx2 . First, let us do the case in which x0 M. From the definition of Cx
(1.1.29), we then must have that
[g f ]k (x1 ) = x0 = [g f ]l (x2 )

(1.1.30)

for some k, l Z. Without loss of generality, suppose that k l. Then, applying f 1 g1


to both sides of this equation k times,b we find that
x1 = [g f ]lk (x2 ).

(1.1.31)

In other words, x1 Cx2 . Not only this, but f (x1 ) Cx2 as well because f (x1 ) =
f [g f ]lk (x2 ) . Similarly, g1 (x1 ) Cx2 , and so on. It follows that Cx1 Cx2 . Switching
1 2 and applying the same arguments gives us Cx2 Cx1 , and hence Cx1 = Cx2 . Thus,
indeed, {Cx : x M} forms a partition of M t N. It follows that (because partitions determine
equivalence relations; see Exercise A.1.75)
Cx = Cx0 for all x0 Cx .

(1.1.32)

1.1 Cardinality of sets and the natural numbers

19

S T E P 6 : D E F I N E X1 , X2 ,Y1 ,Y2
Now define
A :=

(1.1.33)

Cx

xM s.t.
Cx N f (M)

as well as
X1 := M A,

Y1 := N A,

X2 := M Ac ,

Y2 := N Ac

(1.1.34)

S T E P 7 : S H O W T H AT f |X1 : X1 Y1 I S A B I J E C T I O N
We claim that f |X1 : X1 Y1 is a bijection. Of course, it is injective because f is. To show
surjectivity, let y Y1 := N A. From the definition of A (1.1.33), we see that y Cx N for
some Cx with Cx N f (M), so that y = f (x0 ) for some x0 M. We still need to show that
x0 X1 . However, we have that x0 = f 1 (y), and so as y Cx , we have that x0 = f 1 (y) Cx
as well. We already had that Cx N f (M), so that indeed x0 A, and hence x0 X1 . Thus,
f |X1 : X1 Y1 is a bijection.
S T E P 8 : S H O W T H AT g|Y2 : Y2 X2 I S A B I J E C T I O N
We now show that g|Y2 : Y2 X2 is a bijection. Once again, all we must show is surjectivity,
so let x X2 = M Ac . It thus cannot be the case that Cx N is contained in f (M), so
that there is some y Cx N such that y
/ f (M). Then, by virtue of (1.1.32), we have that
Cx = Cy , and in particular x Cy . From the definition of Cy (1.1.29), it follows that either
(i) x = y, (ii) x is in the image of f 1 , or (iii) x is in the image of g. Of course it cannot
be the case that x = y because x M and y N. Likewise, it cannot be the case that x is
in the image of f 1 because x Ac . Thus, we must have that x = g(y0 ) for some y0 N.
Once again, we still must show that y0 Y2 ; however, we have that y0 = g1 (x), so that
y0 Cx Ac . Thus, y0 Ac , and so y0 Y2 . Thus, g|Y2 : Y2 X2 is a bijection.
STEP 9: CONSTRUCT THE BIJECTION FROM M TO N
Finally, we can define the bijection from M to N. We define h : M N by
(
f (x)
if x X1
h(x) :=
.
1
g (x) if x X2

(1.1.35)

Note that {X1 , X2 } is a partition of M and {Y1 ,Y2 } is a partition of N. To show injectivity,
suppose that h(x1 ) = h(x2 ). If this element is in Y1 , then because f |X1 : X1 Y1 is a bijection,
it follows that both x1 , x2 X1 , so that f (x1 ) = h(x1 ) = h(x2 ) = f (x2 ), and hence that x1 = x2 .
Similarly if this element is contained in Y2 . Toshow surjectivity, let y N. First assume that
y Y1 . Then, f 1 (y) X1 , so that h f 1 (y) = y. Similarly, if y Y2 , then h (g(y)) = y.
Thus, h is surjective, and hence bijective.

a Proof
b If

adapted from [1, pg. 29].


k happens to be negative, it is understood that we instead apply g f k times.

Theorem 1.1.36. (K, ) is well-ordered.

Proof.

S T E P 1 : C O N C L U D E T H AT I T S U F F I C E S T O S H O W T H AT E V E R Y

Chapter 1. What is a number?

20

NONEMPTY SUBSET HAS A SMALLEST ELEMENT

By Proposition A.1.98, we do not need to check totality explicitly, and so it suffices to show
that every nonempty subset of K has a smallest element.
STEP 2: DEFINE T AS A PREORDERED SET
So, let S K be a nonempty collection of cardinals and for each m S write m = |Mm | for
some set Mm . Define

M :=
Mm
(1.1.37)
mS

and
T := {T M : T Set0 ;
for all x, y T , if x 6= y it follows that xm 6= ym for all m S.} .

(1.1.38)

Note that the statement that T Set0 is just the statement that T is an actual set, instead of
just a set-like object (i.e. a proper class). Order T by inclusion.
S T E P 3 : V E R I F Y T H AT T S AT I S F I E S T H E H Y P O T H E S E S O F Z O R N S L E M M A
We wish to apply Zorns Lemma (Theorem A.1.106) to T . To do that of course, we must
first verify the hypotheses of Zorns Lemma. T is a partially-ordered set by Exercise A.1.93.
Let W T be a well-ordered subset and define
W :=

T.

(1.1.39)

T W

It is certainly the case that T W for all T W . In order to verify that W is indeed an
upper-bound of W in T , however, we need to check that W is actually an element of T .
So, let x, y W be distinct. Then, there are T1 , T2 W such that x T1 and y T2 . Because
W is totally-ordered, we may without loss of generality assume that T1 T2 . In this case,
both x, y T2 . As T2 T , it then follows that xm 6= xm for all m S. It then follows in turn
that W T .
STEP 4: CONCLUDE THE EXISTENCE OF A MAXIMAL ELEMENT
The hypotheses of Zorns Lemma being verified, we deduce that there is a maximal element
T0 T .
S T E P 5 : S H O W T H AT T H E R E I S S O M E P R O J E C T I O N W H O S E R E S T R I C T I O N
TO THE MAXIMAL ELEMENT IS SURJECTIVE

Let m : M Mm be the canonical projection. We claim that there is some m0 S such


that m0 (T0 ) = Mm0 . To show this, we proceed by contradiction: suppose that for all m M
there is some element xm Mm \ m (T0 ). Then, T0 {x} T is strictly larger than T0 : a
contradiction of maximality. Therefore, there is some m0 S such that m0 (T0 ) = Mm0 .
S T E P 6 : C O N S T R U C T A N I N J E C T I O N F R O M Mm0 T O Mm F O R A L L m S
The defining condition of T , (1.1.38), is simply the statement that m |T : T Mm is
injective for all T T . In particular, by the previous step, m0 |T0 : T0 Mm0 is a bijection.
And therefore, the composition m m0 |T0 : Mm0 Mm is an injection from Mm0 to M.
Therefore,
m0 = |Mm0 | |Mm | = m

(1.1.40)

1.1 Cardinality of sets and the natural numbers

21

for all m S. That is, m0 is the smallest element of S, and so K is well-ordered.


a Proof

adapted from [9].

Corollary 1.1.41 (N, ) is a well-ordered set.

1.1.4

The natural numbers as a well-ordered integral crig


We have shown that (N, +, ) is a crig and that (N, ) is a well-ordered. We now finally show how
these two different structures, the algebraic structure and the order structure, are compatible. But
before we do that, of course, we have to make precise what we mean by the word compatible.
Definition 1.1.42 Preordered rg A preordered rg is a set X equipped with two binary

operations + and , and a relation , so that


(i). (X, +, 0, ) is a rg,
(ii). (X, ) is a preordered set,
(iii). x y implies that x + z y + z for all x, y, z X,
(iv). x y and 0 z implies that xz yz,
and furthermore, in the case (X, +, 0, , 1) is a rig, that 0 1.
R

If X is a totally-ordered ring, we automatically have 0 1. By totality, we automatically have that either 0 1 or 1 0. Do you see why the latter cannot happen (if
1 6= 0)?
In general, however, we do need to make the requirement that 0 1.

For X a preordered rg, we write


X + := {x X : x > 0} and X0+ := {x X : x 0} .

(1.1.43)

A partially-ordered rg, totally-ordered rg, etc. are just preordered rgs whose underlying
preorder is respectively a partial-order, total-order, etc.

In a totally-ordered ring, we define sgn : X {0, 1, 1} X by

if x > 0
1
sgn(x) := 0
if x = 0 .

1 if x < 0

(1.1.44)

This is the signum function and is meant merely to return the sign of an element.
One thing to note about preordered rngs is that, to define the order, it suffices only to be able to
compare everything with 0.
Exercise 1.1.45 Let (X, +, 0, ) be a rng and let P be a subset of X (thought of as the

nonnegative elements) that is (i) closed under addition, (ii) closed under multiplication, and
(iii) contains 0. Show that there is a unique preorder on X such that (i) (X, +, 0, , ) is
a preordered rng and (ii) P = {x X : x 0}.. Furthermore, show that is a partial-order
iff x, x P implies x = 0, and furthermore show that is a total-order iff, in addition,

Chapter 1. What is a number?

22
either x P or x P for all x X.

Definition 1.1.46 The category of preordered rgs The category of preordered rgs is

the category PreRg whose collection of objects PreRg0 is the collection of all preordered
rgs, for every preordered rg X and every preordered rg Y the collection of morphisms from
X to Y , MorRg (X,Y ), is precisely the set of all nondecreasing homomorphisms from X to Y ,
composition is given by ordinary function composition, and the identities of the category
are the identity functions.
R

We similarly have categories of preordered rigs PreRig, preordered rngs PreRng, and
preordered ringsPreRing.

This should be pretty much what you expect: a preordered rg has two different structures on it,
namely the rg structure and the preorder structure, and so we require the morphisms in the category
of preordered rgs to preserve both of these structures.
Proposition 1.1.47 (K, +, , 0, 1, ) is a well-ordered integral crig.

Proof. We just showed in the last two sections (Proposition 1.1.16 and Theorem 1.1.36)
that (K, +, , 0, 1) is an integral crig and that (K, ) is well-ordered, so all that remains to
be checked are properties (iii) and (iv) of Definition 1.1.42.
We first check (iii) of Definition 1.1.42, so let m, n, o K and let M, N, O be sets such
that m = |M|, n = |N|, and o = |O|. Suppose that m n. This means that there is an injection
f : M N. We would like to show that m + o n + o. In other words, we want to show
that there is an injection from M t O to N t O. Of course, the function g : M t O N t O
defined by
(
f (x) if x M
g(x) :=
(1.1.48)
x
if x O
is an injection because f is, and so m + o n + o.
Now we show (iv). Suppose that m n and that 0 o.a Let f : M N be the injection
the same as before. We wish to show that mo no. In other words, we wish to construct an
injection from M O into N O. Of course, the function g : M O N O defined by
g (hx, yi) := h f (x), yi
is an injection because f is, and so mo no.
Finally, 0 1 follows because 0/ {0}
/ (the empty set is a subset of every set).
Thus, (K, +, 0, 1, ) is a well-ordered integral crig.
a Of

(1.1.49)

course we dont actually need to assume that 0 o. Every natural number is greater than or equal to 0.

Corollary 1.1.50 (N, +, , 0, 1, ) is a well-ordered integral crig.

Before moving on, we summarize all the properties of N that we have shown. This is nothing
more than explicitly spelling out what it means for (N, +, , 0, 1, ) to be a well-ordered integral
crig.
(i). + is associative,

1.1 Cardinality of sets and the natural numbers

23

(ii). + is commutative,
(iii). 0 is an additive identity,
(iv). is associative,
(v). is commutative,
(vi). 1 is a multiplicative identity,
(vii). multiplication distributes over addition,
(viii). mn = 0 implies either m = 0 or n = 0,
(ix). is reflexive,
(x). is transitive,
(xi). is antisymmetric,
(xii). is total,
(xiii). every nonempty subset has a smallest element,
(xiv). m n implies m + o n + o, and
(xv). m n and 0 o implies mo no.
Finally, we prove one more property of the natural numbers that we will need when discussing
the integers.
Proposition 1.1.51 Let m, n, o N. Then, if m + o n + o, then m n. In particular,

because this is a partial order, if m + o = n + o, then m = n.


R

Note that this is false in K. For example, 0 + 0 = 1 + 0 , but 0 6= 1. (0 := |N|.


See Definition 2.1.2.)

Proof. Suppose that m + o n + o. Let M, N, O be finite sets such that m = |M|, n = |N|,
and o = |O|. Because m + o n + o, there is an injection : M t O N t O. Define
P := {x M : (x) O} .

(1.1.52)

(P is for problematic.) We prove the result by induction on the cardinality of P.


If P is empty, then we must have that (M) N, so that |M : M N is an injection,
and hence m n.
Now suppose the result is true if |P| = k for k 1. We show that it must also be true
for |P| = k + 1. We first show that N \ (M) is nonempty. If it were empty, then we must
have that (O) O, and hence, as O is finite and is an injection, we must have that
(O) = O (injections are bijections onto their image). But then it cannot be the case that
P is nonempty: a contradiction of the fact that |P| = k 1. Thus, it must be the case that
N \ (M) is nonempty. Let n0 N \ (M). Let p0 P, so that, by the definition of P,
(p0 ) O. Let : N t O N t O be the map that exchanges (p0 ) and n0 and leaves
everything else fixed. This is a bijection, and so : M t O N t O is an injection.
Furthermore, now the image of p0 is contained in N (and there is no new point in M that
gets mapped into O), so that now there are only k elements of M which map into O via the
injection . By the induction hypothesis, there is then a injection from M to N, and we
are done.


Chapter 1. What is a number?

24
The Peano axioms

It is not uncommon for textbooks to introduce the natural numbers via the Peano axioms. We
included this material not because it is essential to the development of the real numbers, but rather
for the sake of completeness: even though it is not strictly necessary, people will expect every
mathematician to know of the Peano axioms.
People who do use the Peano axioms to introduce the natural numbers, instead of constructing
the natural numbers and proving they have the desired properties, will simply assume that a structure
which satisfies the Peano axioms exists. We, however, will instead prove the Peano axioms are true;
for us, they are theorems. Before you try to go off and prove them, however, I had probably better
tell you what they are.
Theorem 1.1.53 The Peano axioms. There exists a set N0 which contains an element

00 N0 and a function s : N0 N0 (called the successor function), such that


(i). m N0 is in the image of S iff m 6= 00 ;
(ii). s is injective, and

(iii). a subset S N0 such that (iii.a) 00 S and (iii.b) m S implies s(m) S is equal to
all of N0 .
R

The prime mark on N0 simply to distinguish the set in this theorem from our natural
numbers, namely the set that is the collection of cardinalities of finite sets Similarly
for 00 .

The successor of an element is of course supposed to be that element plus one.

(iii) is of course thought of as induction.

Proof. S T E P 1 : D E F I N E E V E R Y T H I N G
Define N0 := N, s(m) := m + 1, and 00 := 0.
S T E P 2 : P ROV E ( I )
Exercise 1.1.54 Prove that there is no m N such that m + 1 = 0.
Exercise 1.1.55 Prove that if m 6= 0, then there is some n N such that s(n) = m.

S T E P 3 : P ROV E ( I I )
Exercise 1.1.56 Prove that if m + 1 = n + 1, then m = n.

S T E P 4 : P ROV E ( I I I )
Let S N have the properties that (iii.a) 0 S and (iii.b) m S implies m + 1 S. We
wish to show that S = N. We proceed by contradiction: suppose that S 6= N. Then, Sc is
nonempty, and as N is well-ordered, Sc has a least element m0 Sc . As 0 S, it cannot be
the case that m0 = 0. Then, by (i), there is some n0 N such that s(n0 ) = m0 . As we have
defined s(n0 ) := n0 + 1, we in particular have that n0 < m0 . As n0 is less than m0 and m0 is

1.2 Additive inverses and the integers

25

the least element of Sc , we cannot have n0 Sc . Therefore, n0 S. But then, by hypothesis,


m0 = s(n0 ) S: a contradiction (as m0 Sc ). Hence, we must have that S = N.


1.2

Additive inverses and the integers


Suppose we have a simple algebraic equation involving three natural numbers, m + n = o, we are
given n and o, and we would like to find m. Of course, we know what the answer should be, namely
m = o n, but currently this is nonsensical as we have not defined what this crazy new symbol
means. The job of the integers is to make sense out of this.
We thus would like to find a set with algebraic structure (which will turn out to be an integral
cring whose elements are thought of as a new more general type of number) that allows us to
solve simple equations like m + n = o. More precisely, we seek a cring (that is a crig with additive
inverses) which contains N and, in some sense, is the simplest cring that will do so.
To understand what it means to be a cring Z that contains N is quite easy, but how does one
make sense of the statement that Z is the simplest cring with this property? The precise sense in
which Z should be the simplest cring which contains N is, if Z 0 is any other cring which contains N,
then Z 0 contains Z as well. That is, Z is contained in every cring which contains N.
Theorem 1.2.1 Integers. There exists a totally-ordered integral cring Z, the integers,

such that
(i). Z contains N as a subpreordered rig;a and
(ii). if Z 0 is any other totally-ordered integral cring which satisfies this property, then Z 0
contains a unique copy of Z as a subpreordered ring.
Furthermore, Z is unique up to isomorphism of preordered rings in the sense that, if Z
is any other totally-ordered integral cring which satisfies both of these properties, then
Z
=PreRing Z.
R

The order itself doesnt really play a role here. The significance is in going from a
crig to a cring; the order just comes along for the ride, so to speak.

This is actually a special case of a more general construction, the construction of


the Grothendieck Group of a commutative monoid. We dont need this general
construction and we are already pushing the envelope for level of abstraction in an
introductory analysis course, so we dont present it in this level of generality.

Integral crings are usually called integral domains. In fact, this term is the origin of
our use of the word integral.

Proof. S T E P 1 : D E F I N E A N E Q U I VA L E N C E R E L AT I O N O N N N
The idea being constructing the integers is to think of a pair of natural numbers hm, ni as
representing what should be m n. One issue with this, however, is that, if we do this, then
it should be the case that hm + 1, n + 1i and hm, ni both represent the same number. Thus,
we put an equivalence relation on N N.
Define hm1 , n1 i hm2 , n2 i iff m1 +n2 = m2 +n1 . We came up with this of course because
the statement hm1 , n1 i hm2 , n2 i should be m1 n1 = m2 n2 . Of course, this itself doesnt
make sense, and so we write this same thing as something that does make sense given what
we have already defined, namely m1 + n2 = m2 + n1 .
S T E P 2 : C H E C K T H AT I S A N E Q U I VA L E N C E R E L AT I O N

Chapter 1. What is a number?

26

Exercise 1.2.2 Show that is reflexive and symmetric.

To show that it is transitive, suppose that hm1 , n1 i hm2 , n2 i and hm2 , n2 i hm3 , n3 i.
Then, m1 + n2 = m2 + n1 and m2 + n3 = m3 + n2 , and so
m1 + n2 + m2 + n3 = m2 + n1 + m3 + n2 .

(1.2.3)

It follows from Proposition 1.1.51 that m1 + n3 = n1 + m3 , and so (m1 , n1 ) (m3 , n3 ), so


that is transitive.
STEP 3: DEFINE Z AS A SET
We define
Z := N N/ .

(1.2.4)

Recall (see Definition A.1.78) that this is the quotient set with respect to , the set of
equivalence classes.
S T E P 4 : D E F I N E A D D I T I O N A N D M U LT I P L I C AT I O N O N Z
We define
[hm1 , n1 i] + [hm2 , n2 i] := [hm1 + m2 , n1 + n2 i]

(1.2.5)

[hm1 , n1 i] [hm2 , n2 i] := [hm1 m2 + n1 n2 , m1 n2 + n1 m2 i].

(1.2.6)

and

Exercise 1.2.7 Show that + and are both well-defined.

STEP 5: DEFINE THE IDENTITIES


We define
0 := [h0, 0i] and 1 := [h1, 0i].

(1.2.8)

S T E P 6 : S H O W T H AT Z I S A N I N T E G R A L C R I N G
Exercise 1.2.9 Show that Z is an integral cring.

STEP 7: DEFINE A PREORDER ON Z


We define
[hm1 , n1 i] [hm2 , n2 i] iff m1 + n2 m2 + n1 .
Exercise 1.2.11 Show that is well-defined.

(1.2.10)

1.2 Additive inverses and the integers

27

Exercise 1.2.12 Show that is a preorder.

S T E P 8 : S H O W T H AT I S A T O TA L - O R D E R
Exercise 1.2.13 Show that is a total-order.

S T E P 9 : S H O W T H AT (Z, +, , 0, 1, , ) I S A T O TA L LY - O R D E R E D I N T E G R A L
CRING

Exercise 1.2.14 Show that (Z, +, , 0, 1, , ) is a totally-ordered integral cring.

S T E P 1 0 : S H O W T H AT Z C O N TA I N S N A S A P R E O R D E R E D - R I G
Exercise 1.2.15 Define a function : N Z. Show that it is (i) injective and (ii) a

morphism of preordered rigs.

S T E P 1 1 : S H O W T H AT E V E R Y T O TA L LY - O R D E R E D I N T E G R A L C R I N G W H I C H
C O N TA I N S N C O N TA I N S A U N I Q U E C O P Y O F Z
Let Z 0 be a totally-ordered integral cring which contains N. As Z 0 contains additive inverses,
it must contain the additive inverse of every element of N. However, N together with the
additive inverses of every element of N in Z 0 is just a copy of Z. There can only be one such
isomorphic copy of Z because, if there were another copy, they would have to share the
same multiplicative identity (by uniqueness of identities, Exercise A.1.116). Both copies
being rings, it would then follow that 2 := 1 + 1, 3 := 1 + 1 + 1, etc. would have to be the
same element in each, and hence that by uniqueness of inverses (Exercise A.1.118), 1, 2,
, 3, etc. would have to be the same element in each, and hence the two copies would be the
same.
S T E P 1 2 : S H O W T H AT Z I S U N I Q U E U P T O I S O M O R P H I S M
Let Z be some other totally-ordered cring that satisfies the two properties (i) and (ii). As Z
contains N and Z possesses property (ii), it follows that Z contains Z. On the other hand,
as Z contains N and Z possesses property (ii), it likewise follows that Z contains Z. As Z
contains Z and Z contains Z, and furthermore, because these copies are unique, it must be
the case that Z = Z.b

a When

we say that Z contains N as a subpreordered rig, what we actually mean is that there is an injective
morphism of preordered-rigs N , Z. In particular, it need not literally be the case that N is a subset of Z (just
an isomorphic copy of N).
b We say that Z is unique up to isomorphism because in fact all we can say is that Z contains an isomorphic
copy of Z. We abused language and said just Z instead of isomorphic copy of Z. This abuse of language is
very common in mathematics. Indeed, you should usually be thinking of two things which are isomorphic as,
for all intents and purpose, the same exact thing.

You will notice a common theme through these notes, and indeed, throughout all of mathematics:

Chapter 1. What is a number?

28

if ever we want something that isnt there, throw it in. For example, we had the natural numbers,
but wanted additive inverses, and so we came up with Z by just adjoining the additive inverses
of all the elements of N. Similarly, we want multiplicative inverses, and so by throwing them in,
we obtain Q. Doing the same with limits gives us R, and in turn doing the same with roots of
polynomials gives us C. If youve read the appendix, you may have seen already something vaguely
similar (cf. the paragraph containing (A.1.1) and what follows). We wrote down the definition of a
set-like thing, but then realized that it cannot be a set (this is Russels Paradox). To get around this,
instead of working with things that are strictly sets, we work in a larger universe that contains
proper classes as well: you might say that we throw in the proper classes alongside sets.
We now know that Z is a totally-ordered integral cring. Of course, Z satisfies other properties
as well (a lot of which follow from the fact that Z is a totally-ordered integral cring).
Exercise 1.2.16 Let X be a preordered ring and let x1 , x2 X. Show the following

statements.
(i). 0 x1 implies x1 0.
(ii). x1 , x2 0 implies 0 x1 x2 .
(iii). If X is totally-ordered, then 0 x12 .
(iv). If X is totally-ordered, then 0 1.
(v). x1 1 and 0 x2 imply x1 x2 x2 .
Exercise 1.2.17 Let m Z and suppose that 0 m 1. Show that either m = 0 or m = 1.

1.3

Multiplicative inverses and the rational numbers


The motivation of the introduction to the rationals is essentially the same as the motivation for the
introduction of the integers, just with multiplication instead of division. That is, we would like to
be able to solve equations of the form mn = o for m, n, o Z. There is one significant difference
however. Unlike with addition, we cannot invert everything: in particular, we cannot invert 0.
Exercise 1.3.1 Let R be a ring. Show that if 0 is invertible in R, then R = 0 := {0}.
Example 1.3.2 Tropical integers The tropical integers are an example of a nonzero
crig in which 0 is invertible. Thus, you do indeed need additive inverses for the result of the
previous exercise to hold.
Define R := N, m +R n := max{m, n}, m R n := m + n, and 0R := 0 =: 1R . Then,
(R, +R , R , 0R , 1R ) are the tropical integers. The subscript R is meant to distinguish tropical
version from the usual version (e.g. +R := max vs +).


Exercise 1.3.3 Show that indeed (R, +R , R , 0R , 1R ) is a crig. Furthermore, show that

0R R 0R = 1R , so that indeed 0R is invertible with multiplicative inverse 01


R = 0R .

Besides 0, however, we wind up being able to invert everything we would like.

1.3 Multiplicative inverses and the rational numbers

29

Theorem 1.3.4 Rational numbers. There exists a totally-ordered field Q, the rational

numbers, such that


(i). Q contains Z as a subpreordered ring; and
(ii). if Q0 is any other totally-ordered field which satisfies this property, then Q0 contains a
unique copy of Q as a subpreordered field.
Furthermore, Q is unique up to isomorphism of preordered-fields.
R

This is also a special case of a more general construction known as the construction
of fraction fields. For the same reason as with the Grothendieck group construction,
we do not present this in its full generality.

Proof. We leave this as an exercise.


Exercise 1.3.5 Try to do the entire proof yourself, using the proof of Theorem 1.2.1

as guidance.

Proposition 1.3.6 For all x Q, there exist unique m Z and n Z+ such that (i)

gcd(m, n) = 1 and (ii) x = mn .


Proof. By (i) of the previous theorem, Q contains Z. As Q is a field, it thus must contain
multiplicative inverses of every nonzero integer. For m Z not zero, denote its multiplicative
inverse in Q by m1 . For n Z, denote mn := n m1 , and define


Q := mn Q : m, n Z, m 6= 0 .
(1.3.7)
Note that Q is a field which contains Z has a subpreordered ring. By (ii) of the previous
theorem, we thus have that Q Q. Of course we already knew that Q Q, and so we have
that Q = Q.
Now for x Q arbitrary, we can write x = mn for some m, n Z with m 6= 0. If m < 0,
n
then we can write x = m
, so that now the denominator is positive. Define d := gcd(m, n)
0
0
and write m = m d and n = n0 d, so that x = mn 0 .
Exercise 1.3.8 Show that gcd(m0 , n0 ) = 1.

This shows existence.


We now prove uniqueness. Suppose that mn11 = mn22 for m1 , n1 , m2 , n2 Z with m1 , m2 > 0,
gcd(m1 , n1 ) = 1 = gcd(m2 , n2 ). Re-arranging, we have m2 n1 = m1 n2 , so that m2 | m1 n2 . As
gcd(m2 , n2 ) = 1, it follows that m2 | m1 . On the other hand, this same equation implies that
m1 | m2 n1 , and therefore m1 | m2 . That m1 | m2 an m2 | m1 implies that m1 = m2 . However,
as they are both positive, we have that m1 = m2 . It then follows that n1 = n2 from the
equation m2 n1 = m1 n2 .

Exercise 1.3.9 Let X be a totally-ordered field and let x1 , x2 X be nonzero. Show that

the following statements are true.


(i). x11 has the same sign as x1 .

Chapter 1. What is a number?

30
(ii). 0 < x1 x2 implies 0 < x21 x11 .

As a matter of fact, Q is not just he smallest totally-ordered field that contains Z, is is the
smallest totally-ordered field period.
Exercise 1.3.10 Let F be a totally-ordered field. Show that Char(F) = 0.
Proposition 1.3.11 Let F be a totally-ordered field. Then, there exists a subfield Q F
and an isomorphism of totally-ordered fields from Q to Q.
R

It is typical to identify Q with Q, so that Q can actually be regarded as a subfield of


F.

In fact, we could define the rationals as follows.


There exists a totally-ordered field Q, the rational numbers, such that, if
Q is another totally-ordered field, then Q Q. Furthermore, Q is unique (1.3.12)
up to isomorphism of preordered fields.
In fact, this is arguably preferable to Theorem 1.3.4, but we as we actually make
use of that result in the following proof, to do this, we would either have to rephrase
things entirely or redefine Q.

Proof. As Char(F) = 0, F contains a copy of N, namely, {0, 1, 2, 3, . . .}.a Recall that Z is


the smallest totally-ordered integral cring which contains N. Thus, as F is in particular now
a totally-ordered integral cring which contains N, we have that in fact F contains Z: Z R.
Similarly, as Q was the smallest totally-ordered field that contained Z, and we have just
showed that F is a totally-ordered field that contains Z, it follows that Q F.

a We

needed that Char(F) = 0 in order that this be a copy of N. For example, think of the integers modulo m
(Example A.1.149; see also the sequence of examples starting with Example A.1.64). For example, in the case
m = 3, we have that 1 + 1 + 1 = 0, and so this set {0, 1, 2 := 1 + 1, 3 := 1 + 1 + 1, . . .} would just be {0, 1, 2},
certainly not a copy of the natural numbers.

1.4

Least upper-bounds and the real numbers


Finally we come to the first material that might be considered the point of this course.
Just as the integers corrected the deficiency of the natural numbers that was the lack of
additive inverses, and the rationals corrected the deficiency of the integers that was the lack of
multiplicative inverses, the real numbers will correct a deficiency of the rational numbers. But
what is this deficiency? The answer turns out to be that this deficiency is a lack of what are
called least upper-bounds.

1.4.1

Least upper-bounds and why you should care


Definition 1.4.1 Suprema Let (X, ) be a preordered set, let S X, and let x X.

Then, x is a supremum of S iff


(i). x is an upper-bound of S, and
(ii). if x0 is any other upper-bound of S, then x x0 .
R

Supremum is synonymous with least upper-bounds (because they are an upper-bound


that is less than or equal to every other upper-bound).

1.4 Least upper-bounds and the real numbers


R

31

If S has a supremum x, then we write x := sup(S). This is justified by the following


exercise.

Exercise 1.4.2 Let (X, ) be a partially-ordered set, let S X, and let x1 , x2 S be

two suprema of S. Show that x1 = x2 .


We extend the definition of supremum as follows.
(

if S is not bounded above


sup(S) :=
.
if S = 0/

(1.4.3)

Exercise 1.4.4 Find an example of a preordered set (X, ) an a subset S X with two

distinct suprema.
R

It is because of counter-examples like these that we shall primarily concern ourselves


with partially-ordered sets in this section.

We similarly have a notion of infimum, which is just the same concept with inequalities
reversed.
Definition 1.4.5 Infimum Let (X, ) be a preordered set, let S X, and let s X. Then,

x is a infimum of S iff
(i). x is a lower-bound of S, and
(ii). if x0 is any other lower-bound of S, then x0 x.
R

A you might have guessed, infima are also called greatest lower-bounds.

Similarly, if S has an infimum x, then we write x := inf(S).

We extend the definition of infimum as follows.


(
if S is not bounded below
inf(S) :=
.

if S = 0/

(1.4.6)

So thats what suprema (and infima) are, but why should you care? At least one reason is the
following. Consider the set
n
o
n
S := 1 + 1n : n Z+ Q.
(1.4.7)
This set has two key properties (i) it is bounded above, and (ii) for every x S, there is some x0 S
with x < x0 . Draw yourself a picture of what this must look like. Though we dont know what a
limit is yet, from the picture we see that pretty much any reasonable definition of a limit should
have the property that

n
lim 1 + n1 = sup(S).
(1.4.8)
n

Though we dont even have the definition to make sense of it yet, youll recall that the answer
to the left-hand side should be e := exp(1), which of course is not rational.3 Thus, despite the
3 Of course, we havent proven that e is not rational, but right now, as we are only concerned with motivation, simply
knowing that it is not rational is enough to justify the desire to have suprema.

Chapter 1. What is a number?

32

fact that S Q, sup(S)


/ Q, and in fact, for us, sup(S) just doesnt make sense (yet). Ultimately,
because we want to do calculus, we want to be able to take limits. In particular, because of things
like (1.4.8), we had better be able to take suprema as well. It turns out that throwing in all least
upper-bounds gives us all the limits we were missing. This is really the motivation for demanding
the existence of least upper-bounds: we want to be able to take limits.
Thus, the desired property which Q lacks is the following.
Definition 1.4.9 Least upper-bound property A preordered set has the least upper-

bound property iff every nonempty subset that is bounded above has a least upper-bound.
Of course, we have the inequality-reversed notion as well.
Definition 1.4.10 Greatest lower-bound property A preordered set has the greatest

lower-bound property iff every nonempty subset that is bounded below has a greatest
lower-bound.
It turns out that it doesnt actually matter which property we require:
Proposition 1.4.11 Let X be a partially-ordered set with the least upper-bound property

and let S X be nonempty and bounded below. Then, T := {x X : x x0 for all x0 S}


is nonempty and bounded above, and sup(T ) = inf(S). In particular, X has the greatest
lower-bound property.
R

Of course, switching inequalities around gives us the converse as well.

Proof. As S is nonempty, there is some x0 S. From the definition of T , it follows that x0


is an upper-bound of T , so that T is bounded above. T is the set of all lower-bounds of S,
and so as S is bounded below, T is nonempty. Thus, because X has the least upper-bound
property, T has a supremum t := sup(T ).
We wish to show that t is the infimum of S. We must show two things: (i) that it is a
lower bound of S and (ii) that it is at least as large as every other lower-bound. To show the
first, let x S. By definition, x0 x for all x0 T , and so x is an upper-bound for T . As
t is the least upper-bound, we then have that t x, so that t is a lower-bound of S. Now
let t 0 X be some other lower-bound of S. Then, by definition, t 0 T , and so as t is an
upper-bound of T , we have that t 0 t, so that sup(T ) = t = inf(S).

R

Sometimes preordered sets which possess both of these properties are called dedekindcomplete. The term dedekind-complete is to contrast with the term cauchy-complete,
which we will meet in the Chapter 4 Uniform spacessee Definition 4.5.7. Indeed, you
should probably try to avoid saying merely complete (i) this can lead to confusion with
cauchy-completeness and (ii) in order theory the term complete (I believe?) usually implies
the existence of suprema and infima of all subsets (not necessarily nonempty and bounded).
That said, I can guarantee you Im going to be sloppy about this myself.

The real numbers were turn out to be a dedekind-complete totally-ordered set, and so it will be
useful to have the following equivalent characterization of suprema and infima in totally-ordered
sets.
Proposition 1.4.12 Let X be a totally-ordered set, let S X nonempty and bounded above,

and let x X be an upper-bound for S. Then, x = sup(S) iff for every x0 X with x0 < x,
there is some x00 S with x0 < x00 x.
R

Warning: This is not true if X is only partiall-orderedsee Example 1.4.14.

1.4 Least upper-bounds and the real numbers


R

33

You should think of this as saying that, in particular, S contains elements arbitrarily
close to its supremum.

Proof. () Suppose that x = sup(S). Let x0 X be such that x0 < x. We proceed by


contradiction: suppose that there is no x00 S such that x0 < x00 x. x being an upper-bound
of S, every x00 S is automatically less than or equal to S, so really this is just the same as
saying that there is no x00 S with x0 < x00 . By totality, it thus follows that we must have
x00 x0 for all x00 S, in which case x0 is an upper-bound for S. But x0 < x, which contradicts
the fact that x is the least upper-bound for x. Thus, there must be some x00 S such that
x < x00 x.
() Suppose that for every x0 S with x0 < x, there is some x00 S with x < x00 x. Let
x0 X be any other upper-bound for S. We would like to show that x x0 : we proceed by
contradiction: suppose that it is not the case that x x0 . By totality, this is equivalent to x0 < x.
Then, by hypothesis, there must be some x00 S such that x0 < x00 x, which contradicts the
fact that x0 is an upper-bound of S. Thus, it must be the case that x = sup(S).

Exercise 1.4.13 Write down and prove the analogous version of the previous proposition

for the infimum.


Example 1.4.14 A counter-example to Proposition 1.4.12 in the absence of totality Define X := Q t {A} and declare that A q if q > 0 an A q if q < 0 for q Q.


Then, 0 X is not a least upper-bound for (, 0) because it is not less-than-or-equal-to A


(in fact, its just not comparable to A). On the other hand, if q < 0, then q < q2 0, and so
0 X does satisfy the desired property even though it is not a least upper-bound
R

1.4.2

It is not uncommon to see others using facts like 2 is not rational. as motivation for the
introduction
of the real numbers. This is stupid. If all we really cared about were numbers

like 2, then we shouldnt be going from Q to R, but rather from Q to A, the algebraic
numbers (see Definition 2.1.13). The point of R is not to be able to take square-roots; the
point is to be able to take limits.

Dedekind cuts and the real numbers


Everybody reading these notes probably already has some intuition about the real numbers, most
likely gained from some sort of calculus course. Let us suppose for a moment that we know what
the real numbers are and that they make sense. Given a real number x0 R, how would you encode
x0 using only Q? The trick we use is to look at the set
Dx0 := {x Q : x x0 } .

(1.4.15)

This subset of Q uniquely determines x0 because sup(Dx0 ) = x0 . The idea then is to use sets of the
form (1.4.15) to define the real numbers. The only thing we have to do for this to make sense in Q
alone is to get rid of the reference to x0 . We do that as follows.
Definition 1.4.16 Dedekind cut Let (X, ) be a preordered set and let S X. Then, S

is a Dedekind cut in X iff


(i). S 6= 0,
/
(ii). S 6= X, and

Chapter 1. What is a number?

34

(iii). the set of all lower-bounds of the set of all upper-bounds of S is equal to S itself.
R

The word cut is used because, for example, Dx0 of (1.4.15) is sort of thought as
cutting Q at the point x0 .

Youll note that Dx0 of (1.4.15) is a Dedekind cut. Indeed,


Proposition 1.4.17 Let (X, ) be a dedekind-complete totally-ordered set and let S X be

a dedekind cut. Then,


S = {x X : x sup(S)} .

(1.4.18)

Proof. Let us write S0 := {x X : x sup(S)}. As sup(S) is in particular an upper-bound


of S, we immediately have the inclusion S S0 . On the other hand, suppose that x sup(S).
We wish to show that x S. As S is a Dedekind cut, this is the same as showing that x is
less than or equal to every upper-bound of S, so, let u X be an upper-bound of S. We now
wish to show that x u. We proceed by contradiction: suppose that u < x (this uses totality).
However, this of course contradicts the fact that u is an upper-bound for S. Thus, we must
have that x u, so that S0 S, so that S = S0 .

We will construct the real numbers as the set of all Dedekind cuts in Q.4
Theorem 1.4.19 Real numbers. There exists a nonzero dedekind-complete totally-

ordered field R, the real numbers, which is unique up to isomorphism of preordered fields.
R

Compare this with the analogous theorems for Z and Q, Theorems 1.2.1 and 1.3.4
respectively. Youll note that, for uniqueness, we need not require that R be the
simplest dedekind-complete totally-ordered field which contains Q, or even that R
contain Q at all. These will follow automatically from the fact that R is a dedekindcomplete totally-ordered field alone. (In fact, we get that R obtains Q for free via
Proposition 1.3.11.)

As you might have guessed, this is also a special case of a more general construction,
which takes partially-ordered sets to dedekind-complete partially-ordered sets, known
as the Dedekind-MacNeille completion. Be careful, however: in general the arithmetic
operations do not extend to the Dedekind-MacNeille completion; see Example 2.3.13.

Proof. S T E P 1 : D E F I N E R A S A S E T
Define
n
o
R := D 2Q : D is a Dedekind cut .

(1.4.20)

STEP 2: DEFINE A PREORDER ON R


We define
D E iff D E.
4 There

(1.4.21)

is another construction of the reals that is commonly taught, namely, the cauchy sequence construction
in which a real number is defined to be an equivalence class of cauchy sequences. While this works, I find this more
appropriate if one is thinking of the real numbers as a uniform space (in this case, a metric space), whereas we are
currently thinking of everything as algebraic structures with order. Because of this, I find it more natural to complete
the underlying partially-ordered set instead of the underlying uniform space (for one thing, we havent actually put a
uniform structure on Q yet).

1.4 Least upper-bounds and the real numbers

35

S T E P 3 : S H O W T H AT I S A T O TA L - O R D E R
is automatically a partial-order because set-inclusion is always a partial-order. To show
totality, let D, E R. If D E, we are done, so suppose this is not the case. We would like
to show that E D, i.e., that E D, so let e E. As D is a Dedekind cut, it suffices to show
that e is a lower-bound of every upper-bound of D, so let u Q be an upper-bound of D. We
wish to show that e u. We proceed by contradiction: suppose that u < e. Now, D is not
a subset of E (by hypothesis), there must be some d D with d
/ E. If we can show that
e d, then we will have u < d (because u < e), a contradiction. To show this itself (that
e d), we proceed by contradiction: suppose that d < e. Then, in particular, d is less than
every upper-bound of E, and so, as E is a cut, we have d E: a contradiction (recall that we
have taken d
/ E). Thus, we must that e d, which completes the proof of totality.
S T E P 4 : S H O W T H AT I S D E D E K I N D - C O M P L E T E
As is a total-order, it suffices simply to show that has the least upper-bound property
(by Proposition 1.4.11). To show this, let S R be nonempty and bounded above. Define
S :=

D.

(1.4.22)

DS

Exercise 1.4.23 Check that S is in fact a Dedekind cut.

We wish to show that S = sup (S ). As S is a superset of every element of S , we


certainly have that S is an upper-bound for S . To show that it is a least upper-bound, let S0
be some other upper-bound of S . We wish to show that S S0 . We proceed by contradiction:
suppose that S 6 S0 . Then, there is some x S with x
/ S0 . As x S, there must be some
D S with x D (by the definition of S). However, as S0 is an upper-bound of S , we have
that D S0 , which implies that x S0 : a contradiction. Thus, we must have that S S0 , so
that S = sup (S ).
STEP 5: DEFINE ADDITION
Addition of elements of R is just set addition:a
D + E := {d + e : d D, e E} .

(1.4.24)

Exercise 1.4.25 Check that D + E is in fact a Dedekind cut.

S T E P 6 : S H O W T H AT (R, +, 0, ) I S A C O M M U TAT I V E G R O U P
Associativity and commutativity follow from the fact that set addition is associative and
commutative. The cut
0 := {x Q : x 0}

(1.4.26)

functions as an additive identity.b


Exercise 1.4.27 Check that 0 is an additive identity.

We define the additive inverse


D := {x y : x 0 and y
/ D} .

(1.4.28)

Chapter 1. What is a number?

36
Note that we have
D + (D) = {d + (x y) : d D, x 0, y
/ D} .

(1.4.29)

As y
/ D, we have that d < y, and so d y < 0, and so d + (x y) < 0. In the other direction,
let x 0. Then,
x = d + ((x + (y d)) y) ,

(1.4.30)

and so it suffices to show that we can choose d D and y Dc so that x + (y d) < 0. To


do this, let > 0. We wish to show that there is some d D so that d +
/ D. We proceed
by contradiction: suppose that d + D for all d D. Then, for d0 D fixed, we have
d0 + D, and so d0 + 2 D, and so d0 + 3 D, etc.. As D is bounded (by any element
in Dc ), this is a contradiction. Thus, there is some d D such that d +
/ D. Then, taking
= x, we have that d x
/ D, and so we may take y := d x, so that x + (y d) = 0 0
as desired.
S T E P 7 : S H O W T H AT (R, +, 0, ) I S A T O TA L LY - O R D E R E D C O M M U TAT I V E
GROUP

Exercise 1.4.31 Check that D E implies D + F E + F.

S T E P 8 : D E F I N E M U LT I P L I C AT I O N
Multiplication is more complicated. For example, the product 0 0 should be 0 0 = 0;
however, the set product, 00 := {de : d 0, e 0}, isnt even bounded above. We have to
break-down the definition into cases. To simplify things, let us temporarily use the notation
D+
0 := {d D : 0 d} .

(1.4.32)

Here, will denote multiplication in R and juxtaposition will denote set multiplication. We
define

D+
if 0 D, E

0 E0 0

((D) E) if D 0, 0 E
D E :=
(1.4.33)

(D

(E))
if
0

D,
E

(D) (E) if D, E 0
From the definition (1.4.33), it suffices to show associativity and commutativity for the
case 0 D, E. Then,

+ +
D (E F) = D E0+ F0+ 0 = D+
(1.4.34)
0 E0 F0 0 = (D E) F,
and similarly for commutativity. The cut
1 := {x Q : x 1}
functions as a multiplicative identity.
Exercise 1.4.36 Check that 1 is a multiplicative identity.

(1.4.35)

1.4 Least upper-bounds and the real numbers

37

S T E P 9 : S H O W T H AT T H E A D D I T I V E I N V E R S E D I S T R I B U T E S
We wish to show that
(D + E) = (D) + (E).

(1.4.37)

On one hand we have


(D + E) = {x y : x < 0 and y
/ D + E} .

(1.4.38)

On the other hand,


(D) + (E) = {a + b : a D, b E}
= {(x1 y1 ) + (x2 y2 ) : x1 , x2 < 0; y1
/ D; y2
/ E}
= {(x1 + x2 ) (y1 + y2 ) : x1 , x2 < 0; y1
/ D; y2
/ E}

(1.4.39)

= {x (y1 + y2 ) : x < 0, y1
/ D, y2
/ E} .
Comparing this with (1.4.37) above, we see that it suffices to show that y
/ D + E iff
y = y1 + y2 for y1
/ D and y2
/ E. To show this, let y1 Dc , y2 E c and suppose that
y1 + y2 D + E, so that y1 + y2 = d + e for d D, e E. Then, y2 = e + (d y1 ) < e, which
implies that y2 E: a contradiction. Conversely, let y
/ D + E. Let > 0 and choose d D
such that d +
/ D. Let M 2 be such that y M D + E but y (M )
/ D + E. Write
0
0
0
0
0
y M = d + e for d D, e E. Without loss of generality, assume that d d. Then,

y M = d 0 + e0 = d + e0 (d d 0 ) .
(1.4.40)
As d d 0 0, e := e0 (d d 0 ) E. Then,
y (M ) = d + (e + ).

(1.4.41)

Therefore, as y (M )
/ D + E, it must be the case that e +
/ E. Then,
y = (d + (M )) + (e + ).

(1.4.42)

As d + (M ) d +
/ D, it follows that d + (M )
/ D, so that indeed y = y1 + y2 for
y1
/ D and y2
/ E.
+
+
S T E P 1 0 : S H O W T H AT [E + F]+
0 = E0 + F0 F O R 0 E, F
If e E, e 0 and f F, f 0, then of course e + f E + F, e + f 0. Conversely, let
x E + F, x 0 and write x = e + f for e E, f F. As x 0, we must have that either
e 0 or f 0 Without loss of generality, assume the former. If f 0, we are done, so
instead suppose that f < 0. Then, x = (e + f ) + 0. As e + f < e, e + f E, and of course,
as x 0, e + f 0, so that indeed x E0+ + {0} E0+ + F0+ .

S T E P 1 1 : S H O W T H AT (R, +, 0, , , 1) I S A C R I N G
All that remains to be shown is distributivity. Let D, E, F R and consider
D (E + F).
Let us first do the case with 0 D, E, F. Then, by the previous step,



+
+
+
+
+ +
+ +
D (E + F) = D+
0 [E + F]0 0 = D0 (E0 + F0 ) 0 = D0 E0 + D0 F0 0
+
+ +
= D+
0 E0 0 + D0 F0 0 = D E + D F.

(1.4.43)

(1.4.44)

Chapter 1. What is a number?

38

Because additive inverses distribute and the definition of multiplication, we may without
loss of generality assume that 0 D, E + F. Thus, either 0 E or 0 F. Without loss of
generality assume the former. We have already done the case 0 F, so let us instead assume
that F < 0. Then,
D E + D F = D ((E + F) + (F)) + D F = D (E + F) + D F + D (F)
= D (E + F) + D F + ((D F)) = D (E + F),

(1.4.45)

where we have used distributivity of additive inverse in D (F) = D (F + 0) = D F.


S T E P 1 2 : S H O W T H AT (R, +, 0, , , 1) I S A F I E L D
All that remains to be shown is the existence of multiplicative inverses. For D R not 0,
we define
(

y1 : y Dc
if D > 0
1

D :=
.
(1.4.46)
1
(D)
if D < 0
To finish this step, it suffices to prove that D D1 = 1 for D > 0.
1 +
D D1 = D+
0 [D ]0 0.

(1.4.47)

We would like to show that this is equal to 1 := {x 1}. Let > 0 and choose d D, d > 0
such that d +
/ D. Thus, D is bounded above by d + and Dc is bounded below by
1 +
d. It follows that D1 is bounded above by d 1 , so that D+
0 [D ]0 is bounded above by
1
1
(d + )d = 1 + d . As is arbitrary (and d gets smaller as gets smaller), it follows
1 +
1 1. For the other
that in fact D+
0 [D ]0 is bounded above by 1, which shows that D D
inclusion, let x 1. Let > 0 and choose y
/ D so that y D. As x < 1, d := x(y) D.
Thus,
D D1 3 (x(y )) y1 = x xy1 .

(1.4.48)

As this is true for all > 0, we must have that x D D1 , which completes the proof of
this step.
S T E P 1 3 : S H O W T H AT (R, +, 0, , , 1) I S A D E D E K I N D - C O M P L E T E T O TA L LY ORDERED FIELD

All that remains to be shown is that 0 D, E implies 0 D E. Of course, if d D, d 0


and e E, e 0, then de D E, so that 0 D E.
S T E P 1 4 : S H O W T H AT (R, +, 0, , , 1) I S U N I Q U E U P T O I S O M O R P H I S M O F
PREORDERED FIELDS

Let R be another nonzero dedekind-complete totally-ordered field. By Proposition 1.3.11, R


contains a copy of Q R (as well as R itself, Q R.). Let x X. Thus, because both R and
R contain Q, abusing notation, we may consider
Dx := {q Q : q x}

(1.4.49)

both as a subset of R and R.c With this abuse, we define : R R by


(x) := sup(Dx ),

(1.4.50)

1.5 Concluding remarks

39

where on the right-hande side, Dx is regarded as a subset of R and the supremum is taken in
R.
Exercise 1.4.51 Show that is an isomorphism of preordered fields.


a If

ever youre wondering how to come up with these definitions, just think of what the answer should be for
sets of the form (1.4.15).
b Of course this is abuse of notation. It should not cause any confusion as one is a subset of Q and the other
is an element of Q.
c If you want to be pedantic about things, there is a subfield Q R together with an isomorphism of
totally-ordered fields : Q Q, where Q R.

Definition 1.4.52 Irrational numbers An element x R is irrational iff x


/ Q.
R

In fact, it will be a little while before we can show that Qc is even nonempty. For
example, it is easy to show that there is no rational number whose square is 2but
can you show that there is a real number whose square is 2? (See the subsubsection
Square-roots in Subsection 2.4.3 Cauchyness and completeness.)

Exercise 1.4.53 Let A, B R be nonempty and bounded above.

(i). Show that sup(A + B) = sup(A) + sup(B).


(ii). Is it necessarily the case that sup(AB) = sup(A) sup(B)?

1.5

Concluding remarks
There are a couple of themes that we started to see in this chapter that you should pay particular
attention to.
The first theme is that, if ever we want to have a certain object that just doesnt exist in the
context in which we are working, enlarge the context in which you are working. For example,
when we wanted additive inverses but didnt have them, we went from N to Z; when we wanted
multiplicative inverses but didnt have them, we went from Z to Q; and when we wanted limits but
didnt have them, we went from Q to R.
The second theme you should take note of is the fact that we almost always defined an object
by the properties that uniquely specify it. The point is that it doesnt really matter at the end of
the day whether R is a set of Dedekind cuts or equivalence classes of Cauchy sequences; all that
matters is that it is a nonzero dedekind-complete totally-ordered field.
The third and final theme you should take note of (which is perhaps not quite as manifest as
the other two) is that the morphisms matter just as much (if not more) than the objects themselves.
For example, in light of the second theme, we cannot even make sense of the idea of an object
being unique without first talking about the morphisms. More significantly is that if you keep an
underlying set fixed, but change the relevant morphisms, you can completely different objects. For
example, if we consider Q and Z as crings, (Q, +, 0, , 1) and (Z, +, 0, , 1), Q and Z are totally
different; on the other hand, if we forget the extra structure (or, to put it another way, change our
morphisms from homomorphisms of rings to just ordinary functions) then Q and Z become the
same object, that is, Q
6 Ring Z but Q
=
= Z. The former is obvious (for example, 2 has an inverse in
Q but not in Z). As for the latter, we recommend to continue reading the following chapter. . .

2. Basics of the real numbers

First of all, let us recall our definition of the real numbers.


There exists a nonzero dedekind-complete totally-ordered field which is unique
up to isomorphism of totally-ordered fields. This field is the field of real numbers.

(2.1)

Recall that we also showed (Proposition 1.3.11) that R must contain Q, and hence in turn Z and N.

2.1

Cardinality and countability


This section is a bit of an asidewhile not on the real numbers per se, a knowledge of cardinality
and countability is absolutely essential to an understanding of the real numbers.
In the first chapter, we briefly discussed the notion of cardinality and its relationship to the
natural numbers. In fact, the natural numbers themselves are, as a set, precisely the finite cardinals.
In this chapter, we investigate the cardinality of N itself.
The first fact we point out is that the cardinality of the natural numbers is the smallest infinite
cardinal.
Proposition 2.1.1 Let be an infinite cardinal. Then, |N| .

Proof. Let K be any set such that |K| = . Recall that (Definition 1.1.24) to show that
|N| requires that we produce an injection from N into K. We construct an injection
f : N K inductively. Let k0 K be arbitrary and let us define f (0) := k0 . If K \ {k0 } were
empty, then K would not be infinite, therefore there must be some k1 K distinct from k0 ,
so that we may define f (1) := k1 . Continuing this process, suppose we have defined f on
0, . . . , n N, and wish to define f (n + 1). If K \ { f (0), . . . , f (n)} were empty, then K would
be finite. Thus, there must be some kn+1 K distinct from f (0), . . . , f (n). We may then
define f (n + 1) := kn+1 . The function produced must be injective because, by construction,
f (m) is distinct from f (n) for all n < m. Hence, |N| .

Thus, the cardinality of the natural numbers is the smallest infinite cardinality. We give a name
to this cardinality.

Chapter 2. Basics of the real numbers

42

Definition 2.1.2 Countability Let X be a set. Then, X is countably-infinite iff |X| = |N|.

X is countable iff either X is countably-infinite or X is finite. We write 0 := |N|.


R

It is not uncommon for people to use the term countable to mean what we call
countably-infinite. They would would then just say countable or finite in cases
that we would say countable.

Our first order of business is to decide what other sets besides the natural numbers are countablyinfinite.
Proposition 2.1.3 The even natural numbers, 2N, are countably-infinite.

Proof. On one hand, 2N N, so that |2N| 0 . On the other hand, 2N is infinite, and
as we just showed that 0 is the smallest infinite cardinal, we must have that 0 |2N|.
Therefore, by antisymmetry (Bernstein-Cantor-Schrder Theorem, Theorem 1.1.27) of ,
we have that |2N| = 0 .

Exercise 2.1.4 Construct an explicit bijection from N to 2N.

This is the first explicit example we have seen of some perhaps not-so-intuitive behavior of
cardinality. On one hand, our intuition might tell us that there are half as many even natural numbers
as there are natural numbers, yet, on the other hand, we have just proven (in two different ways,
if you did the exercise) that 2N and N have the same number of elements! This of course is not
the only example of this sort of weird behavior. The next exercise shows that this is actually quite
general.
Exercise 2.1.5 Let X and Y be countably-infinite sets. Show that X tY is countably-infinite.

Thus, it is literally the case that 20 = 0 . A simple corollary of this is that Z is countablyinfinite.
Exercise 2.1.6 Show that |Z| = 0 .

You (hopefully) just showed that 20 = 0 , but what about 20 ?


Exercise 2.1.7 Let X0 , X1 , . . . be a countably-infinite collection of finite sets. Show that
G

(2.1.8)

Xm

mN

is countably-infinite.
Proposition 2.1.9 20 = 0 .

Proof. For m N, define


Xm := {(i, j) N N : i + j = m}

(2.1.10)

Note that each Xm is finite and also that


NN =

Xm .

(2.1.11)

mN

Therefore, by the previous exercise, |N N| =: 20 = 0 .

2.1 Cardinality and countability

43

Exercise 2.1.12 Use Berstein-Cantor-Schrder and the previous proposition to show that

|Q| = 0 .
This result might seem a bit crazy at first. I mean, just look at the number line, right? Theres
like bajillions more rationals than naturals. Surely it cant be the case there there are no more
rationals than natural numbers, can it? Well, yes, in fact that can be, and in fact is, precisely the
case: despite what your silly intuition might be telling you, there are no more rational numbers
than there are natural numbers.
We mentioned briefly before the algebraic numbers in the remark right before Subsection 1.4.2
Dedekind cuts and the real numbers. We go ahead and define them now because they provide a
good exercise in cardinality.
Definition 2.1.13 Real algebraic numbers A real number R is algebraic iff there

exists a nonzero polynomial with integer coefficients p such that p() = 0. We write AR
for the set of real algebraic numbers.
R

The algebraic numbers (denoted simply by A) are those elements in C which are
roots of a nonzero polynomial with integer coefficients. The real algebraic numbers
are then those elements of C which are both real numbers and algebraic numbers. We
havent actually defined C, which is why we only define the real algebraic numbers.

Exercise 2.1.14 Show that AR is countable.

So, weve now done both Z and Q, but what about R? At first, you might have declared it
obvious that there are more real numbers than natural numbers, but perhaps the result about Q has
now given you some doubt. In fact, it does turn out that there are more real numbers than there are
natural numbers.
Theorem 2.1.15 Cantors Cardinality Theorem. Let X be a set. Then, |X| < |2X |.

Proof. We must show two things: (i) |X| |2X | and (ii) |X| 6= |2X |.
The first, by definition, requires that we construct an injection from X to 2X . This,
however, is quite easy. We may define a function X 2X by x 7 {x}. This is of course an
injection.
The harder part is showing that |X| 6= |2X |. To show this, we must show that there is
no surjection from X to 2X . So, let f : X 2X be a function. We show that f cannot be
surjective. To do this, we construct a subset of X that cannot be in the image of f .
We define
S := {x X : x
/ f (x)} .

(2.1.16)

We would like to show that S is not in the image of f . We proceed by contradiction: suppose
that S = f (x0 ) for some x0 X. Now, we must have that either x0 S or x0
/ S. If the
former were true, then we would have that x0
/ f (x0 ) = S: a contradiction. On the other
hand, in the latter case, we would have x f (x0 ) = S: a contradiction. Thus, as neither of
these possibilities can be true, there cannot be any x0 X such that f (x0 ) = S. Thus, S is
not in the image of f , and so f is not surjective.

Now, to show that |R| > 0 , we show that |R| = 20 . Before we do this, however, we must
know a little more about the real numbers, and so we shall return to this in the next chapter (see the
subsubsection The uncountability of the real numbers in Subsection 2.4.4 Series).

Chapter 2. Basics of the real numbers

44

2.2

The absolute value


R already has a decent amount of structure: both an order and a field structure. We now equip
it with yet another structure which will make arguments easier to work with and more intuitive
(though as the entire definition involves only the order and algebraic structure (because of the
additive inverse), in principle we could do without it).
Definition 2.2.1 Absolute value The absolute value of x R, denoted by |x|, is defined

by
(
x
|x| :=
x

if x 0
.
if x 0

(2.2.2)

For > 0 and x0 R, we write


B (x0 ) := {x R : |x x0 | < } and D (x0 ) := {x R : |x x0 | } .
R

(2.2.3)

B (x0 ) is the ball of radius centered at x0 and D (x0 ) is the disk of radius centered
at x0 .

Exercise 2.2.4 Let x1 , x2 R. Show that the following statements are true.

(i). (Nonnegativity) |x1 | 0.


(ii). (Definiteness) |x1 | = 0 iff x1 = 0.
(iii). (Homogeneity) |x1 x2 | = |x1 ||x2 |.
(iv). (Triangle Inequality) |x1 + x2 | |x1 | + |x2 |.
(v). (Reverse Triangle Inequality) ||x1 | |x2 || |x1 x2 |.
R

The reason that (iv) is called the triangle inequality is the following. First of all, by
(iii), the triangle inequality can instead be written as |x1 x2 | |x1 | + |x2 |. Then, if
you pretend that x1 and x2 are vectors representing the sides of the triangle, x1 x2 is
a vector representing the third side of the triangle. The triangle inequality then states
that the length of a side of a triangle is at most the sum of the lengths of the other two
sides (being equal iff the angle between those two sides is precisely ). Your solution
should help explain why the reverse triangle inequality is called what it is.

Intuitively, of course, |x| is supposed to be the distance x is from 0. Then, |x y| is suppose to


be the distance between x and y. A simple result we have is that, if distance between two integers is
less than 1, then they are the same integer.
Proposition 2.2.5 Let 0 < < 1 and let m, n Z. Then, if |m n| < , then m = n.

Proof. Suppose that |m n| < . Without loss of generality, suppose that m n, so that we
can write n = m + k for k Z+
0 . Then, |m n| = k < < 1. It follows from Exercise 1.2.17
that k = 0, so that m = n.


2.3 The Archimedean Property

2.3

45

The Archimedean Property


The first property of the real numbers we come to is called the Archimedean Property. The
Archimedean Property essentially says that the natural numbers are unbounded in the reals.
Definition 2.3.1 The Archimedean Property Let F be a nonzero totally-ordered field

so that Q F (see Proposition 1.3.11). Then, we say that F is archimedean iff for all x F
there is some m N F such that x < m.
Exercise 2.3.2 Let F be an archimedean ordered field and let > 0. Show that there is

some m N F such that

1
m

< .

Theorem 2.3.3 Archimedean property of the real numbers. R is archimedean.

Proof. If x 0, we may take m = 0. Otherwise, x > 0, and so the set


S := {m N : m < x}

(2.3.4)

is nonempty (it contains 0). On the other hand, it is also bounded-above (by x), and so it has
a supremum: write m0 := sup(S). We first show that m0 N, and then we will show that
x m0 + 1.
By Proposition 1.4.12, there must be some m1 S such that
m0 21 < m1 m0 .

(2.3.5)

If m1 = m0 , we are done (because m1 N), so otherwise suppose that m1 < m0 . Then, we


may use this same proposition again to obtain an m2 S such that
m1 < m2 m0 .

(2.3.6)

But then |m1 m2 | < 12 (because they are both between m0 21 and m0 ), and so m1 = m2
(by Proposition 2.2.5): a contradiction (of the fact that m1 < m2 ). hence, it must have been
the case that m0 = m1 N.
We cannot have that m0 + 1 S because then otherwise m0 would not be an upperbound of S. Therefore, m0 + 1 N \ S = {m N : x m}, and so x m0 + 1, and so
x < m0 + 2.

This might seem like it is obvious, but it is in fact not true of all totally-ordered fields. To
present such an example, we have to take a bit of an aside and discuss polynomial crings and fields
of rational functions. Such crings and fields are important for many other reasons than to present an
example of a totally-ordered field that does not have the Archimedean Property, so it is worthwhile
to understand the material anyways, though you might want to come back to it if you dont care
about the counter-example at the time being.
Polynomial crings and a totally-ordered field which is not archimedean
Definition 2.3.7 Polynomial cring Let R be a totally-ordered cring, denote by R[x] the

set of all polynomials with coefficients in R, let + and on R[x] be addition and multiplication
of polynomials, and define p > 0 iff the leading coefficient of p is greater than 0 in R.a

Chapter 2. Basics of the real numbers

46

Exercise 2.3.8 Show that (R[x], +, 0, , , 1, ) is a totally-ordered cring. What is

0 R[x]? What is 1 R[x]?


R[x] is the polynomial cring with coefficients in R. The degree of a polynomial p, denoted
by deg(p), is the highest power of x that appears in p with nonzero coefficient.
a Recall

that this is enough to define a total-order by Exercise 1.1.45.

Exercise 2.3.9 Show that the following statements are true.

(i). deg(p + q) max{deg(p), deg(q)}.


(ii). deg(pq) deg(p) + deg(q).
(iii). If R is integral, then deg(pq) = deg(p) + deg(q).
Exercise 2.3.10 Show that R is integral iff R[x] is.
Theorem 2.3.11 Field of rational functions. Let F be a field, so that F[x] is a totally-

ordered integral cring. Then, there exists a totally-ordered field F(x), the field of rational
functions with coefficients in F, such that
(i). F(x) contains F[x] as a subpreordered ring, and
(ii). if F 0 is any other totally-ordered field which satisfies this property, then F 0 contains F
as a subpreordered field.
Furthermore, F(x) is unique up to isomorphism of preordered fields.
R

Compare this to our definition of the rational numbers in Theorem 1.3.4. F(x) is to
F[x] as Q is to Z. Indeed, we mentioned there that the passage from Z to Q was an
example of the more general construction of the fraction field of an integral cring.
This is another example of this construction. In fact, the proof of this theorem is
exactly the same as the proof of Theorem 1.3.4, and so we refrain from presenting it
again (it was an exercise anyways).

Of course, we have a result from F(x) which is completely analogous to the result Proposition 1.3.6 for Q (which just says that we can write any rational function uniquely as the quotient of
two relatively-prime polynomials with the denominator positive).
Example 2.3.12 A totally-ordered field which is not archimedean R(x) is a
totally-ordered field, but on the other hand, m < x for all m N, and so R(x) is not
archimedean.


We mentioned in a remark when defining the real numbers (Theorem 1.4.19) that, in general,
the arithmetic operations on a partially-ordered field do not extend to its Dedekind-MacNeille
completion in a compatible way. That R(x) is not archimedean immediately tells us that this field
is also a counter-example to this statement.
Example 2.3.13 A totally-ordered field whose Dedekind-MacNeille completion
cannot be given the structure of a totally-ordered field If the arithmetic operations


2.4 Nets, sequences, and limits

47

on R(x) extended to give the structure of a totally-ordered field on its Dedekind-MacNeille


completion, then, by uniqueness, its Dedekind-MacNeille completion would have to be
R itself. In particular, R(x) would have to embed into R. But then, there would have
to be some natural number m N with R R(x) 3 x m (because R is archimedean: a
contradiction of the fact that R(x) is not archimedean.

The second property of the real numbers we come to is what is sometimes called the density of
the rationals in the reals. This is not quite literally true, though we will see why people refer to this
as density when we discuss basic topology in the next chaptersee Exercise 3.2.41.
Theorem 2.3.14 Density of Q in R. Let a, b R. Then, if a < b, then there exists

c Q such that c (a, b).


It turns out that this is also the case for the irrational numbers, Qc , but at the moment,
we dont even know how to construct a single irrational number, so we will have to
return to this at a later date (Theorem 2.4.76).a

Proof. The intuitive idea is to take a positive integer m at least as large as the length of
the interval b a and break-up the real line into intervals of length m1 . Then, one of the
end-points of these intervals must lie in the interval (a, b). This will be our desired point.
Define := b a > 0 and choose m0 N with m10 < (see Exercise 2.3.2). Define
n
S := m N :

m
m0

o
b .

(2.3.15)

By the Archimedean Property, S is nonempty. Therefore, because N is well-ordered, it has a


(a, b).
least element, say mn00 . We claim that n0m1
0
First of all, we know that n0 1
/ S, and so n0m1
< b. On the other hand,
0
n0 1
m0

and so

n0 1
m0

n0
m0

m10 b m10 > b = b (b a) = a,

(a, b).

(2.3.16)


a You

might be able to show that there is no positive rational number whose square is 2, but can you show
that there is some positive real number whose square is 2?

2.4
2.4.1

Nets, sequences, and limits


Nets and sequences
You probably recall sequences from calculus.
A (real-valued) sequence is a function from N to R.

(2.4.1)

(This is not our official definition of a sequence. We will present another definition later. This is
why the above has no definition bar.)
A net is a generalization of a sequence. We will actually not need to make use of nets much
in this course; the point of introducing them despite this is to put emphasis on the structure N is
equipped with in this context. In particular, the N in the definition of a sequence is not to be thought
of as a crig; for the purposes of sequences, the algebraic structure does not matter. Instead, in the
context of sequences, N should be thought of as a directed set. Another motivation for working

Chapter 2. Basics of the real numbers

48

with nets is that, when you go to generalize to examples more exotic than the real numbers, some of
the results that are true in the reals would fail to generalize if we restricted ourselves to only work
with sequences. See the remarks in Proposition 2.5.32 and Theorem 2.5.68.
Definition 2.4.2 Directed set A directed set is a nonempty partially-ordered set (, )

such that, for every x1 , x2 X, there is some x3 X with x1 , x2 x3 .


Exercise 2.4.3 Show that (N, ) is a directed set.
Definition 2.4.4 Net A (real-valued) net is a function from a directed set (, ) into R.
R

Similarly as with sequences, if a : R is a net, it is customary to write a := a( ).

It is very common that the specific directed set that is the domain is not so important
and that the result works for any directed set. As a result, in attempt to simplify
notation slightly, we frequently omit the actual domain of nets. You should usually
be able to infer the domain on the basis of the index we happen to be using.

In general, nets will take their values in topological spaces. Of course, we havent
defined what a topological space is yet (we could, but it would probably seem like I
just pulled some definition out of my ass), and so for the time being nets will take
their values in R.

For those of you born and raised in Sequence Land: Nets are a bit like the iPhone
you dont think you need them, and then you find out theyre a thing.a

a Disclaimer:

Im not actually a fan of the iPhone.

And now we give our official definition of a sequence.


Definition 2.4.5 Sequence A (real-valued) sequence is a net whose directed set is

order-isomorphic (i.e. isomorphic in Pre) to (N, ).

Typically people take a sequence to be a function from N into R. This is essentially


fine, but then, technically speaking, a function from Z+ to R is not a sequence, and
you have to reindex everything by 1 to make it a sequence. It is easier if we just
ignore this by only requiring that the domain of the net be isomorphic to (in Pre)
instead of equal to (N, ).

If the name of the sequence is not important, we may simply denote it by a list of its
values, e.g. (0, 1, 2, 3, . . .).

Example 2.4.6 A net which is not a sequence Recall that (Exercise A.1.93) 2R ,

the power-set of the reals, is a partially-ordered set with the order relation being inclusion.
Exercise 2.4.7 Check that (2R , ) is directed.

Now let f : R R be bounded. That f is bounded means that


aS := sup {| f (x)| : x S} := sup {| f (x)|}

(2.4.8)

xS

exists for all subsets S 2R . Therefore, the map S 7 aS is a net. Without knowing yet the
precise definition of convergence, do you know what the limit should be?

2.4 Nets, sequences, and limits

49

Youll recall that the third theme that we mentioned at the end of Chapter 1 What is a number?
was morphisms matter. The morphisms determine the relevant structure, and so, in particular, not
just the set, but the structure with which it is equipped, matters. The point of introducing the notion
of a net at this stage is simply to point-out that, when working with sequences, N is to be thought
of as a directed-set, as opposed to, for example, a crig.
2.4.2

Limits
We now define what it means to be the limit of a net. To make the definition as clean and concise as
possible, we introduce the concept of nets eventually doing something.
Eventuality

We often use the word eventually in the context of nets and sequences. For example, we might
say The net 7 a is eventually XYZ.. This means that there is some 0 such that if 0 it
follows that 7 a is XYZ. For example, a fact that will come in handy is that convergent nets
(and in fact, cauchy nets, once we define cauchy) are eventually bounded (Proposition 2.4.30).
We make this formal.
Meta-definition 2.4.9 Eventually XYZ Let 3 7 a R be a net. Then, 7 a is

eventually XYZ iff there is some 0 such that { : 0 } 3 7 x is XYZ.


R

In other words, once you go past 0 , you have a net which is actually XYZ.

We dont have this language yet, but nets of this form are called strict subnets of the
original netsee Definition 2.4.144.

If a net is eventually XYZ, it can be convenient to essentially just throw away the terms at the
beginning which are not XYZ so as to obtain a net which is not just eventually XYZ, but is XYZ
itself (for example, you can chop off the beginning of a convergent net to obtain a bounded net).
For all intents and purposes, you should think of the first elements of a net as not mattering;
only what eventually happens is what matters. For example, consider the sequence m 7 am := 1,
that is, the constant sequence 1 R. Obviously this converges to 1. The point to note here is that,
you can do whatever you like to any finite number of elements of this sequence, and you will have
no effect upon the fact that it converges to 1. We can essentially throw away any finite amount of
the sequence and nothing will change. This idea can actually be quite important in proofs where
we might know nothing about the first couple of elements.
Now with the concept of eventuality in hand, we present the definition of convergence.
Definition 2.4.10 Limit (of a net) Let 7 a be a net and let a R. Then, a is the

limit of 7 a iff for every > 0, 7 a is eventually contained in B (a ). If a net has


a limit, then we say that it converges.a
R

Note the use of the term eventually. Explicitly, this is equivalent to


a is the limit of 7 a iff for every > 0 there is some 0 such that,
(2.4.11)
whenever 0 , it follows that a B (a ).
Of course, when actually doing proofs, you will often need to make use of the explicit
definition, but I think its fair to say that the definition that uses the term eventually
is both more intuitive and concise.

Chapter 2. Basics of the real numbers

50

Exercise 2.4.12 Limits are unique (in R) Let a , b R be limits of the net

7 a . Show that a = b .
R

In general topological spaces (though not for most reasonable ones), limits
need not be unique (hence the reason for adding in R). In fact, limits are
unique iff the space is T2 see Proposition 3.6.42.

If a is the limit of 7 a , then we write lim a = a . Note that this is unambiguous by the previous exercise.

There are at least three directed sets that are commonly the domains of nets in calculus:
N, hR+ , i, and hR+ , i, that is, the natural numbers with the usual ordering, the
positive reals with the usual ordering, and the positive reals with the reverse of the
usual ordering. We shall denote limits of nets with these respective domains as
lim xm ,
m

a Divergence

lim xt ,

lim xt .

t0+

(2.4.13)

is not the same as nonconvergence. We will define divergence momentarily.

Exercise 2.4.14 Let x0 R and define 7 a := x0 . Show that lim a = x0 .


R

That is, constant nets converge to that constant. While trivial, it is significant in that
it becomes an axiom of the convergence definition of a topological space.

Exercise 2.4.15 Show that limm m1 = 0.


Proposition 2.4.16 Let |a| < 1. Show that limm am = 0.

Proof. Define b := 1a . Let M > 0. We show that m 7 bm is eventually larger than M. It will
then follows that m 7 am is eventually smaller than M1 . As M1 is just as arbitrary as > 0,
this will show that limm am = 0.
We have
m  
m
m
m
b = (1 + (b 1)) =
(b 1)k 1 + m(b 1).
(2.4.17)
k
k=0
Because b 1, it follows from the Archimedean Property (Theorem 2.3.3) that m 7 bm is
eventually larger than M, which completes the proof.

Definition 2.4.18 Divergence Let 7 a be a net. Then, 7 a diverges to +

iff for every M > 0 there is some 0 such that whenever 0 , it follows that a M.
Similarly for (you should consider writing this definition out explicitly yourself).
7 a diverges iff it either diverges to + or diverges to .
R

Of course, the intuition is just that a grows arbitrarily large.

Exercise 2.4.19 What is an example of a net which neither converges nor diverges?

2.4 Nets, sequences, and limits

51

The biggest problem with proving that a net converges is that we first need to know what the
limit is before hand. For example, consider
m

lim
m

1
k! .

(2.4.20)

k=0

You probably recall that this should converge to e, but what the hell is e? It has not yet been defined.
In fact, we will later define
m

e := lim
m

1
k! ,

(2.4.21)

k=0

but we cannot even make this definition if we dont know a priori that the limit of (2.4.20) exists!
Thus, we need to have a way of showing that (2.4.20) exists without making explicit reference to e.
The concepts of cauchyness and completeness (in the sense of uniform spaces, as opposed to in the
sense of partially-ordered sets) allow us to do this.
2.4.3

Cauchyness and completeness


Like with the concept of convergence and limits, we will first define what it means to be cauchy,
and then explain the intuition behind the definition.
Definition 2.4.22 Cauchyness Let 7 a be a net. Then, 7 a is cauchy iff for

every > 0, 7 a is eventually contained in some -ball.


R

Note the use of the term eventually. Similarly as before, explicitly, this means that
7 a is cauchy iff there is some 0 and some -ball B such that,
(2.4.23)
whenever 0 , it follows that a B.

You should compare this with the definition of a limit, Definition 2.4.10. The only essential
difference between the definition of convergence and the definition of cauchy is that, in the former
case, there is a single center of the -ball that works for all , whereas, in the case of cauchyness,
the center of the ball can vary with the choice of . Thus, we immediately have the implication:
Exercise 2.4.24 Let 7 a be a convergent net. Show that 7 a is cauchy.

As a short aside, we note that this is not how the definition of cauchyness is stated. Instead, the
following equivalent condition is given as the definition.
Exercise 2.4.25 Let 7 a be a net. Show that 7 a is cauchy iff for all > 0 there is

some 0 such that whenever 1 , 2 0 it follows that |a1 a2 | < .


We choose to use the Definition 2.4.22 because (i) it more closely resembles the definition
of convergence and (ii) it more closely resembles the definition in the more general case of
uniform spaces (see Definition 4.5.1). That being said, the equivalent condition in this exercise
(Exercise 2.4.25) is frequently easier to check. For example:


Example 2.4.26 In this example, we check that the sequence


m

m 7 sm :=

k!1
k=0

(2.4.27)

Chapter 2. Basics of the real numbers

52

is cauchy. (Note that sm Q, so in fact, this sequence is cauchy in Q as well as in R.)





m
n
n
n
2n 2m


(2.4.28)
|sm sn | = k!1 k!1 = k!1 21k = m+n 2m
k=m+1
k=0
2
k=0
k=m+1
where we have without loss of generality assumed that 3 m n (so that 2k k! for
k m + 1).
Exercise 2.4.29 Finish the proof that m 7 sm is cauchy.

One way to immediately tell if a net is not cauchy is if it is eventually unbounded.


Proposition 2.4.30 Let 7 a be a cauchy net. Then, 7 a is eventually bounded. In

particular, if this net is a sequence, the set {am : m N} is bounded.


Proof. Applying the definition to := 1, we deduce that there must be some 0 and some
open ball B of radius 1 such that, whenever 0 , it follows that a B. Thus, 7 a is
eventually in B, a bounded subset of R, and hence itself is eventually bounded.
Now assume that m 7 am is a sequence so that m N. Let us also write m0 := 0 . The
key to note here is that, unlike in the general case, the set Pmc 0 = {m N : m < m0 } is finite.
This enables us to make the definition
r0 := max {|a0 ||, . . . , |am0 1 |, 1 + |x0 |} ,

(2.4.31)

where x0 is the center of the ball B. The reason for the 1 + |a | is as follows. For m m0 ,
we have that
|am | = |(am x0 ) + x0 | |am a | + |x0 | = 1 + |x0 |.

(2.4.32)

From this, it follows that {am : m N]} Br0 (a ), so that {am : m N} is bounded.

Algebraic and Order Limit Theorems

Ideally, the content in this subsubsection would have been discussed very soon after defining a
limit itself, however, we will use the fact that convergent nets are eventually bounded. We could
have proved this, then done the Algebraic and Order Limit Theorems, but when we would have
wound-up essentially proving the same theorem twice (because cauchy nets are eventually bounded
as well). We decided to just postpone these results to a slightly less-than-maximally-coherent
location in the notes instead.
Proposition 2.4.33 Algebraic Limit Theorems Let 7 a and 7 b be two netsa

which converge to a , b R respectively, and let R. Then,


(i). lim(a + b ) = a + b ,
(ii). lim(a b ) = a b ,
1
(iii). lim(a1
) = a if a 6= 0, and

(iv). lim(a ) = a .
R

Later, we will see how all of this follow automatically from continuity considerations
(because + is continuous, for example).

2.4 Nets, sequences, and limits

53

Proof. We leave (i), (ii), and (iv) as exercises. Should you need guidance, check out our
proof of (iii).
Exercise 2.4.34 Prove (i).
Exercise 2.4.35 Prove (ii).

We prove (iii). It is likely the most difficult, and if you can follow this, you should be
able to do the others on your own.
As a 6= 0, this means that is is eventually bounded away from 0: Without loss of
generality take a > 0 (the other case is essentially identical). Define := 12 |a | = 12 a .
Then, there is some 0 so that whenever 0 it follows that a a |a a | < :=
1
2 a . It follows that
a > a 12 a = 21 a .

(2.4.36)

As 7 a is eventually (in this case) positive, we may without loss of generality assume
that there there is some M > 0 such that a > M for all .
Let > 0 and choose 0 such that, whenever 0 , it follows that |a a | < . Then,
for 0 ,
1
|a1
a | =

|a a |

|a a |

1
|a
M2

a | <

1
.
M2

(2.4.37)

As M12 is just as arbitrary as (this uses the fact that M does not depend on ), this
completes the proof.
Exercise 2.4.38 Prove (iv).


a The

common index is supposed to indicate that the two nets have the same domain.

Exercise 2.4.39 Order Limit Theorem Let 7 a and 7 b be two convergent nets

and suppose that it is eventually the case that a b . Show that lim a lim b .
R

Sometimes this (or really, I suppose, an immediate corollary of this) is called the
Squeeze Theorem.

As you (hopefully) just showed in Exercise 2.4.24, convergent sequences are always cauchy.
However, it is in general not the case that every cauchy sequence converges. For example, once we
1
show that the definition of e := limm m
k=0 k! makes sense (were about to) and that e is irrational,
then this will serve as an example of a sequence that is cauchy in Q (as we just showed) but does
not converge in Q (because e is irrational). However, it is true that every cauchy sequence does
converge in R, and in fact, you might say this is the real reason we care about R at all and that
the motivation for requiring the existence of least upper-bounds was so that we could prove this.
Before we actually prove this however, we first need to discuss limit superiors and limit inferiors.
Limit superiors and limit inferiors

The Monotone Convergence Theorem is the tool that will allow us to define lim sup and lim inf.

Chapter 2. Basics of the real numbers

54

Proposition 2.4.40 Monotone Convergence Theorem Let 7 a be a nondecreas-

ing net that is bounded above, or alternatively, a nonincreasing net that is bounded below.
Then, 7 a converges.
R

Thus, if 7 a , it always makes sense to write down lim a . In the nondecreasing


case, either the net is bounded above, and the Monotone Convergence Theorem tells
us that lim a exists, or it is not bounded above, in which case lim a := +.

Proof. We just do the case where 7 a is nondecreasing and bounded above. The other
case is essentially identical. Define
a := sup{a }.

(2.4.41)

As the net is bounded above, this definition makes sense. Now we need only prove that
lim a = a .
So, let > 0. By Proposition 1.4.12, there is some a0 such that
a < a0 a .

(2.4.42)

Now suppose that 0 . Then, by monotonicity, we have that


a < a0 a a .

(2.4.43)

This, however, implies that a B (a ), so that, by definition, lim a = a .

Definition 2.4.44 Limit superior and limit inferior Let a : R be a net and for each

0 define
u0 := sup {a } and l0 := inf {a }.
0

(2.4.45)

Exercise 2.4.46 Check that 7 u is nonincreasing and that 7 l is nonde-

creasing.
Then, we define the limit superior and the limit inferior respectively
lim sup a := lim u

lim inf a := lim l .

(2.4.47)

Note that it is the Monotone Convergence Theorem which guarantees that this
definition makes sense. In particular, unlike limits themselves, lim sup and lim inf
always make sense.

The intuition is that 7 u is the net of eventual upper bounds. lim sup a is then
the limit of this net (similarly for lim inf).

Definition 2.4.48 Let 7 a be a net. Show one of the following two statements.

(i). lim sup a is finite iff 7 a is eventually bounded above.

2.4 Nets, sequences, and limits

55

(ii). lim inf a is finite iff 7 a is eventually bounded below.


R

Both statements are true of course, but the proofs will be so similar that there is not
really much point in asking you to write both down.

Exercise 2.4.49 Let 7 a be a net. Show that lim inf a lim sup a .
R

Note that if we have equality at a finite value, then the net converges to the common
value. See (2.4.50).

A useful result is the following, though we wont have the tools to prove it until the end of the
chapter (see Proposition 2.5.79).
Let 7 a be a net. Show that 7 a converges iff lim sup a = lim inf a
is finite, and in this case, lim a is equal to this common value.


(2.4.50)

Example 2.4.51 Note that we can have equality at an infinite value. For example,

consider the sequence m 7 am := m. This is obviously bounded above, and so by definition


lim supm am = . On the other hand, infmm0 {am } = m0 , and so lim infm am = as well.

Now we can finally return to our current goal of proving that cauchy nets converge in R.
Theorem 2.4.52 Completeness of R. Let 7 a be a cauchy net. Then, 7 a

converges.
R

This property of having all cauchy nets converge is known as cauchy-completeness.


Being cauchy-complete is a property that a uniform space may or may not have,a and
we will see that, when we equip R with a uniform structure, this result is equivalent
to completeness in the sense of uniform spaces. You should be careful not to confuse
cauchy-completeness with dedekind-completeness. For example, while R is the
unique (up to isomorphism) dedekind-complete totally-ordered field, there are cauchycomplete totally-ordered field distinct from R. In brief, the example will turn out
to be the cauchy completion of R(x) (which cannot be dedekind-complete because
otherwise R(x) would embed into R; see Example 2.3.13), but we will have to wait
until the next chapter to be more precise about this.

Proof. To show that 7 a converges, we first have to find some number a R which we
think is going to be the limit of the net. Our guess, of course, is going to be a := lim sup a .
We know that the net 7 a is bounded (Proposition 2.4.30), so that a R is finite.
We wish to show that lim a = a . So, let > 0. First of all, choose 0 so that there is
some -ball B such that
{a : 0 } B .b

(2.4.53)

We will use this later.


Now recall the definition of lim sup:
!
a := lim u := lim

sup {a } ,

(2.4.54)

so there is some 00 such that, whenever 00 ,


u a < .

(2.4.55)

Chapter 2. Basics of the real numbers

56

Redefine 0 to be the maximum of 0 and 00 . This way, both (2.4.53) and (2.4.55) will hold
for 0 .
On the other hand, by Proposition 1.4.12, there is some 0 0 such that
u < a 0 u .

(2.4.56)

In particular,
u a 0 < .

(2.4.57)

Hence,
|a 0 a | = |(a 0 u ) + (u a )| |a 0 u | + |u a | < + = 2. (2.4.58)
Were almost done. The only problem with this is that 0 is a specific index. We need this
inequality to hold for all sufficiently large, not just a single 0 . Fortunately, the cauchyness
can do this for us: it follows from (2.4.53) that, whenever 0 , |a a 0 | < 2. Hence,
for 0 ,
|a a | = |(a a 0 ) + (a 0 a )| |a a 0 | + |a 0 a | < 2 + 2 = 4. (2.4.59)

R

Youll note that the final inequality the proof ended with was |a a | < 4, in contrast to the inequality (implicitly) in the definition of convergence (Definition 2.4.10)
|a a | < . This of course makes no difference because 4 > 0 is just as arbitrary
as > 0. Some people actually go to the trouble of doing the proof and then going
back and changing all the s to n s just so that the final inequality has an in it. This
is completely unnecessary. Dont waste your time.

a No, you are not expected to know what a uniform space is yet (though see Definition 4.2.53) if you cant
wait to find out.
b This is the definition of cauchy.

1
We showed above in Example 2.4.26 that the sequence m 7 sm := m
k=0 k! is cauchy. The
completeness of R that we just established then tells us that this sequence converges, and so now,
finally, we can define
m

e := lim
m

1
k! .

(2.4.60)

k=0

But lets not. Given what we currently know, it would seem like we would just be pulling this
definition out of our ass. This series is not why e matters. In fact, I would argue that e itself
doesnt matterit is the function exp that arises naturally in calculus (being the unique (up to scalar
multiples) nonzero function equal to its own derivative). Thus, we refrain from officially defining
e until we have defined the exponential, which of course we wont be able to do until we know
about differentiation.
Nevertheless, it would be nice to officially know that there is at least some real number that is
not rational.
Square-roots

I mentioned way back right before Subsection 1.4.2 Dedekind cuts and the real numbers that often
people introduce the real numbers so that we can take square-roots of numbers, but that this logic is

2.4 Nets, sequences, and limits

57

a bit silly, because if that was the objective, then we should really be extending our number system
from Q to A, not from Q to R. That being said, even though our justification for the introduction
of the reals was really so that we can do calculus (take limits), it does turn out that we do get
square-roots of all nonnegative reals. This should be viewed more as icing on the cake instead of
the raison dtre.
Proposition 2.4.61 Let x 0, define a0 := 1 and



x
1
am +
am+1 :=
2
am

(2.4.62)

for m 0. Then, m 7 am converges and (limm am )2 = x.


R

Not only does this proposition show the existence of a number whose square is x, but
it tells you how to compute it as well.

Proof. Let us first assume that m 7 am does converge, say to a := limm am . Then, taking
the limit of both sides of the equation (2.4.62), we find that it must be the case that


x
1
a +
a =
,
(2.4.63)
2
a
and hence that
a2 = x.

(2.4.64)

Thus, if the limit exists, it converges to a nonnegative number whose square is x (nonnegative
because each am is nonnegative). Thus, we now check that in fact it converges.
We apply the Monotone Convergence Theorem. The sequence is bounded below by 0,
so it suffices to show that it is nonincreasing. We thus look at


1
x
x a2m
am+1 am =
am +
am =
.
(2.4.65)
2
am
2am
We want this to be 0, so it suffices to show that eventually a2m x. However, for m 1
(so that am1 makes sense),
"
!
#

2
1
x
1
x2
2
2
am =
am1 +
= x+
am1 + 2x + 2
x
4
am1
4
am1
(2.4.66)

2
1
x
= x+
am1
x.
4
am1

Exercise 2.4.67 Let m Z+ . Develop an algorithm which computes mth roots.
Proposition 2.4.68 Let m Z+ and let x 0. Then, there is a unique nonnegative real

number,

m
x, whose mth power is x.

Proof. We have just established existence.

Chapter 2. Basics of the real numbers

58

Exercise 2.4.69 Show that the map x 7 xm is strictly increasing on R+


0.

As strictly increasing (and decreasing) functions are injective, it follows that there is at
most one real number whose mth power is x.

We would have been able to show almost from the very beginning that, if m Z+ is not a
perfect square, then there is no element x Q such that x2 = m. What we havent been able to
show until just now, however, is that there is a real number x R such that x2 = m. We now finally
check that there, indeed, there is no rational number whose square is m. Thus, we will have finally
established the existence of real numbers which are not rational.
Proposition 2.4.70 Let m Z+ . Then, if m is not a perfect square, then there is no x Q

such that x2 = m.
R

Thus, in particular, Proposition 2.4.70 gives an example of a sequence that is cauchy


in Q but does not converge in Q. In other words, Q is not cauchy-complete.

Proof. Suppose that m is not a perfect square. Then, we can write m = k2 n where n is
square-free. It thus suffices to show that there is no rational number whose square is n. We
proceed by contradiction: suppose there is some x Q such that x2 = n. Write x = ab with
b > 0 and gcd(a, b) = 1. Then,
a2 = nb2 ,

(2.4.71)

and so every prime factor of n divides a2 , and hence a (because the factor is prime). Let p
be some prime factor of n and write n = pn0 and a = pa0 . This gives us
p2 (a0 )2 = pn0 b2 ,

(2.4.72)

and hence
p(a0 )2 = n0 b2

(2.4.73)

As n is square-free, gcd(p, n0 ) = 1, and hence this equation implies that p divides b2 , and
hence divides b. But then gcd(a, b) p > 1: a contradiction.

We now show that all intervals in R are exactly what you think they are (confer Definition A.1.89). We could have done this awhile ago now, but we wanted to wait until we could
definitely provide an example of an interval in Q that is not of the form you would expect.1
Proposition 2.4.74 Let I R. Then, I is an interval iff either

(i). I = [a, b] for < a b < ,


(ii). I = (a, b) for a b ,
(iii). I = [a, b) for < a b , or
(iv). I = (a, b] for a b < ,
where a = inf(I) and b = sup(I).
1 By

what you would expect, we mean something like [a, b], or [a, b), etc.

2.4 Nets, sequences, and limits

59

Note that when the side of the interval is open, we allow (depending on which
side it is).

In the future, we shall simply write I = [(a, b)] to indicate that each end may be either
open or closedit is a pain to keep writing out all the four cases separately.

Proof. () Suppose that I is an interval. If I is empty, then we have that I = (0, 0), and
so is of the form (ii). Otherwise, we may define b := sup(I) and a := inf(I). For each a
and b there are two possibilities: either I contains a or it does not (and similarly for b). We
do one of the four cases: suppose that a, b I. Then, because I is an interval, we must
have immediately that [a, b] I. To show the other inclusion, let x I. We proceed by
contradiction: suppose that either x < a or x > b. Both cases are similar, so let us just
assume that x > b. Then, b is no longer an upper-bound for I: a contradiction. Therefore,
I [a, b], and so I = [a, b].
() Suppose that I is of the form (i)(iv). Recall that the definition of an interval (Definition A.1.89) is that, for any x1 , x2 I with x1 x2 , then x1 x x2 implies that x I. As
each of these forms satisfies this property (for trivial reasons), I is an interval.

Of course, we mentioned above, this is not true in Q.


Example
2.4.74((i)(iv)) Define
2.4.75 An interval in Q not of the form Proposition

I := [0, 2] Q Q. It follows from the fact that [0, 2] is an interval in R that I is an


interval in Q (because it satisfies the defining condition in Definition A.1.89). If it were
of the form [a, b] (or open-end variations of this), then we would have to have I = [0, b] or
I = [0, b). From the definition of I, however, it follows that we must have b2 = 2, and so
there is no such b (in Q).
Density of Qc in R

We mentioned back when we showed the density of Q in R (Theorem 2.3.14) that it was also true
that Qc was dense in R. At the time, however, we could not even construct a real number that we
could prove was irrational. Now, however, we have done so, and so we can return to the issue of
the density of Qc in R.
Theorem 2.4.76 Density of Qc in R. Let a, b R. Then, if a < b, then there exists

c Qc such that c (a, b).


Proof. We have just shown that Qc is nonempty, so let x Qc . By density of Q, we
may take a0 Q (a x, b x). Then, a0 + x (a, b), and furthermore, a0 + x =: q must be
irrational, because if it were rational, then x = q a0 would be rational: a contradiction. 
Counter-examples

In this small subsubsection, we present a couple of surprising counter-examples.




Example 2.4.77 For m, n Z+ , definea

am,n :=

1 1
mn
1
m2

+ n12

(2.4.78)

Chapter 2. Basics of the real numbers

60
Then, for fixed n Z+ ,
lim am,n =
m

0
= 0.
0 + n12

(2.4.79)

Similarly, for fixed m Z+ ,


lim am,n = 0.

(2.4.80)

On the other hand, if we set m = n and then take a limit, we get


!
lim am,m = lim
m

1
m2

1
m2

1
m2

= lim( 21 ) = 12 .
m

(2.4.81)

Thus:
Even if there is some number a such that (i) for all , lim a , = a ; and
(ii) for all , lim a , = a , it need not be the case that lim a , = a .

(2.4.82)

This can sort of be fixed, howeversee Proposition 2.4.157.


a This

is an example of a case where we would have to reindex by 1, so that we dont divide by 0, were we
forced to take m, n N. See the first remark in our definition of a sequence, Definition 2.4.5.

Example 2.4.83 Iterated limits need not agree For m, n Z+ , define

am,n :=

1
m
1
m

+ n1

(2.4.84)

Then,
lim am,n =
m

0
= 0,
0 + 1n

and hence


lim lim am,n = 0.
n

(2.4.85)

(2.4.86)

On the other hand,


lim am,n =
n

1
m
1
m

+0

= 1,

and hence


lim lim am,n = 1.
m

(2.4.87)

(2.4.88)

Thus:


It is possible for both iterated limits, lim lim a , and lim lim a, , to
exist and not agree.
R

This is actually incredibly important because it is often really tempting to interchange


limits, especially if they might be hidden implicitly in something like a derivative,
but this is in general just plain wrong.

2.4 Nets, sequences, and limits


2.4.4

61

Series
As you probably know from calculus, a series is an infinite sum. In other words, it is a limit of a
finite sum:
Definition 2.4.89 Series Let m 7 am be a sequence and define sm := m
k=0 ak . Then,

ak := ak := lim
ak
m

kN

k=0

(2.4.90)

k=0

is a series and the sm s are the partial sums of this series.a The series is said to converge iff
the sequence m 7 sm does and similarly for diverge. The series converges absolutely iff
m
k=0 |ak | converges. The series converges conditionally iff it converges but not absolutely.
a Of

course this limit need not exist; the use of kN ak here is just an abuse of notation.

1
Exercise 2.4.91 Geometric series Let |a| < 1. Show that mN am = 1a
.

Youll probably recall several tests from calculus that allow us to determine whether or not a
given series converges. One of the primary goals of this section is to prove several of these tests.
Perhaps the first thing you should check is whether or not the terms of the series go to 0, if only
because it (often) takes no time to check at all.
Proposition 2.4.92 Let m 7 am be a sequence. Then, if kN ak converges, then limm am =

0.
Proof. Suppose that kN ak converges. Let > 0. Then, the sequence of partial sums is
cauchy, there is some m0 N such that, whenever m0 m n, it



n
n

m



a
=
a

a
(2.4.93)
k k k < .
k=m+1 k=0

k=0
As this hold for all m n, we may take n := m + 1, in which case this inequality reduces to
|am+1 0| = |am+1 | < .
Hence, by definition, limm am = 0.

(2.4.94)


Thus, if the terms do not go to 0, you can immediately conclude that the series does not
converge.
Proposition 2.4.95 Absolute convergence implies convergence Let m 7 am be a

sequence and suppose that kN |ak | exists. Then, kN ak exists.


Proof. To show that kN ak exists, we show that the sequences of partials sums m 7 sm :=
m
k=0 ak is cauchy. Take m n. Then,





n
n

n

m
n
m





(2.4.96)
ak ak = ak |ak | = |ak | |ak | .
k=0
k=m+1 k=m+1
k=0

k=0
k=0
Now we are essentially done. Do you see why?

Chapter 2. Basics of the real numbers

62

Let > 0 and choose m0 so that whenever m0 m n it follows that




n

m


|ak | |ak | < .
k=0

k=0
Then, it follows from (2.4.96) that whenever m0 m n that


n

m


ak ak < ,
k=0

k=0

(2.4.97)

(2.4.98)

which shows that the sequence of partial sums is cauchy, and hence the series converges.

The next test we present is the Alternating Series Test.


Definition 2.4.99 Alternating series A series kN ak is alternating iff sgn(ak+1 ) =

sgn(ak ) for all k N.


R

In this case, we may write ak = (1)k bk where bk 0 and the is determined


depending on the sign of the first term.

Proposition 2.4.100 Alternating Series Test Let m 7 am be a nonincreasing sequence

that converges to 0. Then, mN (1)m am converges.


Proof. The first thing to notice is that
n

(1)km ak = am (am+1 am+2 )(am+3 am+4 ) (an1 an ) am

(2.4.101)

k=m

because each ak ak+1 0.a


Using this, we see that



n
n

m




(1)k ak (1)k ak = (1)ak am+1 .
k=0
k=m+1

k=0

(2.4.102)

Now that the partial sums are cauchy follows from the fact that limm am = 0.

a Depending

on whether m n is even or odd, the last term might have instead just be an instead of
(an1 an ). Either way, the same inequality holds.

Proposition 2.4.103 Comparison Test Let m 7 am and m 7 bm be eventually nonneg-

ative sequences such that am bm . Then,


(i). if m am diverges, then m bm diverges, and
(ii). if m bm converges, then m am converges, and furthermore, mN am mN bm .
Proof. (i) follows from the fact that the operation of taking limits (in this case, applied to
the partial sums) is nondecreasing (see Exercise 2.4.39).
Exercise 2.4.104 Prove (ii).

2.4 Nets, sequences, and limits

63

Proposition 2.4.105 Limit Comparison


Test Let m 7 am and m 7 bm be eventually


nonnegative sequences such that limm bamm converges to a nonzero number. Then, mN am
converges iff mN bm converges.


Proof. Define L := limm abmm > 0. Let > 0 be less than L, and choose m0 N such that,
whenever m m0 , it follows that


am

(2.4.106)
bm L < ,
that is,
<

am
bm

L < ,

(2.4.107)

or rather
bm (L ) < am < bm (L + ).

(2.4.108)

Note that L > 0. It follows from the Comparison Test applied to the first inequality
that, if mN am converges, then mN bm converges. Likewise, it follows from the second
inequality that, if mN bm converges, then mN am converges.

1

Proposition 2.4.109 Root test Let m 7 am be a sequence. Then, if lim supm |am | m < 1,
1

then mN am converges absolutely; if lim supm |am | m > 1, then mN am diverges.


1

Proof. Suppose that u := lim supm |am | m < 1. Let > 0 be less than 1 u > 0. Then, there
is some m0 such that, whenever m m0 , it follows that
1

sup {|an | n } u < ,

(2.4.110)

nm
1

so that supnm {|an | n } < + u < 1, so that |an | n < r := + u < 1, so that |an | < rn for all
n m0 . It follows that mN am converges absolutely by the comparison test.
1

Exercise 2.4.111 Prove the case where lim supm |am | m > 1.




Example 2.4.112 Harmonic series Consider the series mZ+ (1)m+1 m1 with terms

am := (1)m+1 m1 . This is called the alternating harmonic series, whereas the series of
absolute values mZ+ m1 is the harmonic series. We seem immediately from the Alternating
Series Test that the alternating harmonic series converges. In fact, it converges to ln(2),a
though we dont know that yet (we dont even know what ln(2) is!). The harmonic series
itself though diverges (so that the alternating harmonic series is conditionally convergent).
To see this, we apply the Comparison Test:

mZ+

1
m

= 1 + 12 + 13 + 14 + 51 + 16 + 71 + 81 +
1 + 12 +

1
4


+ 14 +

1
8

(2.4.113)

+ 18 + 81 + 81 + 18 = 1 + 12 + 21 + 12 + = .

Chapter 2. Basics of the real numbers

64

If you take a string and fix its endpoints, then there are only countably many fundamental frequencies that the string can vibrate at (fundamental in the sense that any
way in which the string might vibrate can be written as a sum of the fundamental
frequency solutions). These are called the harmonics and the wavelength of every
harmonic is of the form m1 2L, where L is the length of the string. This is where the
term harmonic comes from (or so it seems).

a See

Exercise 6.4.84.

The uncountability of the real numbers

We mentioned at the end of Section 2.1 Cardinality and countability that |R| = 20 but at the time
we did not know enough to prove it. We now return to this.
Theorem 2.4.114. |R| = 20 .

Proof. To prove this, we apply the Bernstein-Cantor-Schrder Theorem (Theorem 1.1.27).


Thus, we wish to construct in injection from 2N to R and an injection from R to 2N . By
Exercise A.1.51, we may replace 2N with {0, 1}N in this statement, and hence in return with
{0, 2}N (you will see why we do this momentarily).
The set {0, 2}N is just the collection of sequences m 7 am with am {0, 2}. We thus
define : {0, 2}N R by
(m 7 am ) :=

am
3m .

(2.4.115)

mN

Intuitively, the sequence m 7 am is thought of as the ternary expansion of a real number.


Switching from {0, 1} to {0, 2} was tantamount to changing from binary to ternary and not
including any numbers with 1 in their ternary expansion. The reason for this is because, in
binary, we have
1 = .1 := .111 ,

(2.4.116)

and so the resulting function would not be injective. Of course, in ternary, we still have
things like
1 = .2 = .222 ,

(2.4.117)

but 1 corresponds to the sequence (1, 0, 0, 0, . . .), which is not an element of {0, 2}N , and so
injectivity works out.
Exercise 2.4.118 Show that is injective.

It follows that 20 |R|.


As |Q| = 0 , it suffices to replace 2N by 2Q above, and so it suffices to produce an
injection from R to 2Q , the power-set of Q. Define : R 2Q by
(x) := {q Q : q x} .

(2.4.119)

Exercise 2.4.120 Show that is injective.

It follows that |R| 20 , and hence that |R| = 20 .

2.4 Nets, sequences, and limits

65

Addition of infinitely many numbers is noncommutative

We will make the title of this subsubsection precise in a moment, but for the moment we will settle
for something imprecise: there exists two convergent series mN am and mN bm which converge
to different values, but yet {am : m N} = {bm : m N}. That is, the terms themselves are the
same (though in different order of course), but yet the series converge to different values! If this is
your first time studying rigorous mathematics, this is probably one of these WTF!? moments
when you realize that sometimes mathematics can be so counter-intuitive so as to demand rigorif
we didnt require a proof, it would be very easy to dismiss commutativity of infinite series as
obvious. In fact, its not usually a good idea to use the word obvious in proofs at allat best
its lazy, and at worst, its just plain wrong.
Theorem 2.4.121. Let m 7 am be a sequence such that mN am converges absolutely

and let : N N be a bijection. Then, mN a (m) converges absolutely and mN am =


mN a (m) .
R

That is a bijection is the way we make precise the idea that bm := a (m) is just a
rearrangement of the original terms. Thus, this theorem says that the rearrangement
of any absolutely convergent series converges to the same value.

Proof. Define S := mN am . Let > 0 and choose m0 such that, whenever m m0 , it


follows that |m
k=0 ak S| < . Choose m1 such that, whenever m m1 , it follows that

|a
|
<

(we may do this because of the absolute convergence). Replace m0 by


k=m+1 k
max{m0 , m1 }, so that, whenever m m0 , it follows that both of these inequalities hold.
Define

n0 := max 1 ({m N : m m0 }) .
(2.4.122)
The set {m N : m n0 } is finite, and so of that set is finite, and so the definition of n0
makes sense. This definition guarantees that
{0, 1, . . . , m0 1, m0 } ({0, 1, . . . , n0 1, n0 }) .

(2.4.123)

Suppose that n n0 . Then,






n
n
m0

m0




a
a

a
+
a

S
|ak | + < 2, (2.4.124)
(k)
(k) k k
<

k=0
k=0
k=0

k=0
k=m +1
0

0
where the inequality of the first term follows from the fact that nk=0 a (k) m
k=0 ak contains
only terms ak for k m0 + 1 (by (2.4.123)).


a In the first term, because of our choice of n , every a for 0 k a


m0 appears somewhere in a (k) for
0
k
0 k n. Therefore, the difference s a finite sum of terms all whose indices are at least m0 + 1. (The finite
is crucialwe know we can rearrange finitely many terms, which is why we are able to get rid of the in

k=m0 +1 |ak |.)

Theorem 2.4.125. Let m 7 am be a sequence such that mN am converges condition-

ally and let x y . Then, there exists a bijection : N N such that


m
lim infm m
k=0 a (k) = x and lim supm k=0 a (k) = y. In particular, taking x = y, there is
a bijection : N N such that mN a (m) = x.
R

This theorem says something really quite surprising: Given a series that converges
conditionally, you can rearrange the terms to obtain a series which converges to any

Chapter 2. Basics of the real numbers

66
real number you choose whatsoever.
a

Proof.

Define

1
1
a+
m := 2 (|am | + am ) and am := 2 (|am | am ),

(2.4.126)

so that

am = a+
m am and |am | = am + am

(2.4.127)

with a+
m , am 0. In particular,

|am | = [a+m am ].

mN

(2.4.128)

mN

Thus, at least one of mN a+


m and mN am must diverge. On the other hand,

am = [a+m am ]
mN

(2.4.129)

mN

converges, and so we cannot just have one of mN a+


m and mN am diverge. Thus, we must
have that

a+m = = am .
mN

(2.4.130)

mN

Note that a+
m = am and am = 0 if am 0, and am = 0 and am = am if am 0. Thus,
modulo the existence of zero terms which make no difference, every term in both of the
series of (2.4.130) is a term in the original series mN am (up to a minus sign in the latter

case). We will build up a rearrangement of the original series mN am using a+


m and am .
The intuition is as follows: Because both of the series of (2.4.130) diverge, I can move
as far to the right as I like by choosing terms of the form a+
m , and likewise, I can move as
far to the left as I like by choosing terms of the form a
.
We
make this precise as follows.
m
Let m0 be the smallest natural number such that
m0

a+k y.

(2.4.131)

k=0

Such an m0 exists because the first series in (2.4.130) diverges. Note that, because m0 is the
smallest such number, we must have that
m0 1

a+
k < y,

(2.4.132)

k=0

so that
m0

a+k y < a+m .


0

(2.4.133)

k=0

Then, let n0 be the smallest natural number such that


m0

n0

k=0

k=0

a+k ak x.

(2.4.134)

2.4 Nets, sequences, and limits

67

Similarly as before, we now have that


!
m0

n0

k=0

k=0

a+k ak

0 x

< a
n0 .

(2.4.135)

Do this again: let m1 and n1 be the smallest natural numbers such that
m0

n0

a+k ak +

m1

k=0

k=0

k=m0 +1

m0

n0

m1

a+
k y

(2.4.136)

and

a+k ak +
k=0

k=0

a+
k

k=m0 +1

n1

a
k x

(2.4.137)

k=n0 +1

respectively. (Of course, inequalities analogous to (2.4.133) and (2.4.135) hold here as well.)
Continue this process inductively. The series
m0

n0

m1

a+k ak +
k=0

k=0

a+
k

k=m0 +1

n1

a
k +

k=n0 +1

m2

k=m1 +1

a+
k

n2

a
k +

(2.4.138)

k=n1 +1

is a rearrangement of the original series. Denote the partial sums of this series by Sm . Thus,
the inequalities (2.4.133) and (2.4.137), in terms of Sm , look like

0 Sm y < a+
im and 0 x Sm < a jm ,

(2.4.139)

where i, j : N N are strictly increasing functions of m (these are the mk and nk s). Hence,

0 sup {Sm } y < sup {a+


im } and 0 x inf {Sm } < inf {a jm }
mm0

mm0

mm0

mm0

(2.4.140)

Recall however that mN am converges. It follows that limm am = 0, and so in turn limm a+
m=
.that
lim
inf
S
=
x
and
lim
sup
S
=
y.
Thus,
taking
the
limit
of
(2.4.140)
0 = limm a
m m
m
m m
with respect to m0 , we obtain lim infm Sm = x and lim supm Sm = y.

a Adapted

from [21].

Exercise 2.4.141 Let kN ak be a conditionally convergent series. Does there exist a rearm
rangement : N N such that (i) lim supm m
k=0 a (k) = + and (ii) lim infm k=0 a (k) =
? Provide a proof or counter-example.

2.4.5

If this statement is true, holy bitch-tits batman that is insane! Disclaimer: This is not
meant to be a hint, just a comment (no, seriously).

Subnets and subsequences


The concept of a subnet is almost what you think it should be. To help understand the concept
before we go to the precise definition, lets think of what subnets of sequences should be.
Let a : N R be a sequence and let S N. When should a|S be a subsequence of the original
m 7 am ? Well certainly if S is finite, we should not consider a|S to be a subsequencewe need the
indices to get arbitrarily large. Moreover, whatever our definition of subsequence is, it should have
the property that, if m 7 am converges to a , then every subsequence of m 7 am should converge

Chapter 2. Basics of the real numbers

68

to a as well. If S is allowed to be finite, then of course this will not be the case. For example,
if we allowed this, (0) would be a subsequence of (0, 1, 1, 1, . . .). This is just silly. Thus, a key
requirement is that elements of S have to be able to become arbitrarily large.
Now let be a general directed set and let 0 be a subset whose elements are arbitrarily
large. (Precisely, this means that, for all , there is some 0 such that .) One way to
see we need to make this requirement is because, without this requirement, 0 would not itself be a
directed set in general. However, if we do require the elements of S to be arbitrarily large, then
0 will be a directed set, and so a|S will indeed be a net, and certainly it will turn-out that a|0 is a
subnet of a. However, it is not the case that every subnet of 7 a is of this form. The reason for
this is ultimately because, if we dont allow for more general subnets, then theorems we want to be
true will fail to be true (see, for example, the proofs of Theorems 2.5.68 and 3.4.8).
Definition 2.4.142 Subnet Let a : R be a be a net. Then, a subnet of a is a net

b : 0 R such that

(i). for all 0 , b = a for some ; and


(ii). whenever U R eventually contains a, it eventually contains b.
A subsequence is a subnet that is a sequence.
R

In other words, a subnet of a net is a net whose terms are all terms from the original
net and is eventually contained in every set that originally contains the original net.

There are at least two definitions in the literature that are distinct from this one. Our
definition is strictly weaker than both of them (see Propositions 2.4.146 and 2.4.148,
Example 2.4.147, and Exercise 2.4.149). These definitions are not so good because
they do not correspond precisely to the notion of filterings and filters (see Section 3.3
Filter bases).a This definition also makes a few proofs easier (see, for example, the
proofs of Theorems 2.5.68 and 3.4.8).

a You

If b : 0 R is a subnet of a net a : R, then, by (i), it follows that there is


some function : 0 such that b = a . However, in general there will not
be a unique such function. This almost never matters, and it is customary to write
b = a() := a for some noncanonically chosen .
Our definition of subsequence is slightly different than most authors (though our definition of subnet is the same). The primary reason for this is because typically people
do not introduce nets in a first analysis course, in which case the naive definition of
subsequence, i.e., a subnet of the form a|S , works. For them, subsequences are what
we would call strict subnets of a sequence (see Definition 2.4.144). In particular, we
allow for repeats whereas most authors will not.
are neither supposed to know what these are yet nor why this is significant.

As subnets of the more naive type are still quite important, we do given them a special name.
Definition 2.4.143 Cofinal subset Let S be a subset of a preordered set X. Then, S is

cofinal iff for every x X there is some s S such that s x.


R

Of course, saying that a subset is cofinal is just our fancy-schmancy way of saying
that the elements are arbitrarily large.

One way to see that we need elements to grow arbitrarily large is because, in a directed set, the
subset will itself be directed.

2.4 Nets, sequences, and limits

69

Definition 2.4.144 Strict subnet A strict subnet of a net a : R is a subnet of the

form a|0 : 0 R for 0 cofinal.

Exercise 2.4.145 Let a : R be a net and let 0 R. Then, if 0 is cofinal, show

that a|0 : 0 R is a subnet of x : R.


R

Thus, our definition does in fact make sense.

One key difference between the definition of a subnet and the more naive definition
(that is, if one were only to allow strict subnets) is that you are allowed to repeat
elements in a subnet, for example, (0, 0, 1, 2, 3, . . .) is a subnet of (0, 1, 2, 3, . . .), but
not a strict subnet.

We mentioned in the definition of a subnet, Definition 2.4.142, that there are at least two
definitions of subnets in the literature that are distinct from ours. Our definition is strictly weaker
than both of these, as we now show.
Proposition 2.4.146 Let a : R and b : 0 R be nets. Then, if there is a function

: 0 such that (i) b = a and (ii) for all there is some 0 0 such that,
whenever 0 , it follows that () , then b is a subnet of a.
R

Thus, in different notation, a subnet of 7 a is a net of the form 7 a , where


the function 7 has the property that, for all 0 , there is some 0 such that,
whenever 0 , it follows that 0 . Note that, of course, in this case, there
is a canonically chosen : 0 (confer the remarks in the definition of a subnet,
Definition 2.4.142).

This is sometimes taken as the definition of a subnet (for example, see [12, pg. 70]).
Our definition is strictly weaker than this one as the following example shows. This
definition is more or less perfectly okay for almost all purposes. The reason we have
chosen the definition we have over this one (aside from the fact that it makes some
proofs slightly easier), is that it is more natural in the sense that our definition is
the one that corresponds to the analogous notion with filters (see Section 3.3 Filter
bases).

Proof. Suppose that there is a function : 0 such that (i) b = a and (ii) for all
there is some 0 0 such that, whenever 0 , it follows that () . Let U R
eventually contain a. Then, there is some 0 such that, whenever 0 , it follows
that a U. Let 0 0 be such that, whenever 0 , it follows that () =: 0 .
Thus, whenever 0 , it follows that a U, so that 7 a is eventually contained in
U, and hence is a subnet of 7 a .

Example 2.4.147 A subnet that would not be a subnet in the sense of Proposition 2.4.146 Consider the constant sequence m 7 am := 0 and define : N N by


(n) := 0. Then, n 7 a(n) = 0, and so is certainly eventually contained in every set which
eventually contains a, and so is a subnet. On the other hand, does definitely not satisfy (ii)
of Proposition 2.4.146. Thus, this is an example of a subnet which would not be considered
a subnet if we had taken the conditions in Proposition 2.4.146 as our definition of a subnet.
And now we come to the second definition that is sometimes in the literature.

Chapter 2. Basics of the real numbers

70

Proposition 2.4.148 Let a : R and b : 0 R be nets. Then, if there is a function

: 0 such that (i) b = a , (ii) is nondecreasing and (ii) has cofinal image., b : 0 R
is a subnet of a.
This is sometimes taken as the definition of a subnet (for example, this is currentlya
the definition given on Wikipedia). The definition that is sometimes given as describe
in Proposition 2.4.146 is strictly weaker than this definition (see the next exercise),
and so in turn our definition is strictly weaker than this definition. This definition
also fails to exactly correspond to the analogous notion with filters, but, unlike the
definition of Proposition 2.4.146, using this definition can actually make things quite
a bit more difficult.b The problem is that we often construct a subnet of the form in
Proposition 2.4.146, and then showing that is nondecreasing is an extra step at best
and requires modification of the subnet at worst. As far as I am aware, there is just
no good reason to use this definition.

Proof. We apply the previous proposition. Let be arbitrary. Then, because has
cofinal image, there is some 0 0 such that (0 ) . Because is nondecreasing, it
follows that, if 0 , then () (0 ) .

a6

July 2015
maybe even impossible? I dont know because I dont use this definition.

b Or

Exercise 2.4.149 A subnet that would not be a subnet in the sense of Proposition 2.4.148 Find a net a : R, a directed set 0 with function : 0 that has the

property that, for all , there is some 0 0 such that, whenever 0 , it follows
that () , but yet either (i) is not nondecreasing or (ii) does not have cofinal image.
R

That is to say, the conditions of Proposition 2.4.146 are strictly weaker than the
conditions of Proposition 2.4.148.

Example 2.4.150

(i). (0, 0, 0, . . .) is a subsequence of (0, 1, 0, 1, 0, 1, . . .).


(ii). (0, 2, 4, , . . .) is a subsequence of (0, 1, 2, . . .).
(iii). (1, 1, 1, . . .) is not a subsequence of (0, 1, 2, . . .). (The indices that you hit in the
original sequence must get arbitrarily large; here, the only index hit is 1.)
(iv). (0, 1, 4, 9, 16, . . .) is a subsequence of (0, 1, 2, . . .).
Proposition 2.4.151 Let 7 a be a subnet of a net 7 a which converges to a R.

Then, 7 a converges to a .
Proof. Let > 0. Choose 0 such that, whenever 0 , it follows that |a a | < .
By the definition of a subnet, there is some 0 such that, whenever 0 , it follows that
0 . Hence, for 0 , it follows that |a a | < .

The following result is important because it becomes an axiom of the convergence definition of
topological spaces.

2.4 Nets, sequences, and limits

71

Proposition 2.4.152 Let 7 a be a net. Then, lim a = a iff every subnet 7 a


has in turn a subnet itself 7 a such that lim a = a .

Proof. () This follows from two applications of the preceding proposition.


() Suppose that every subnet 7 a has in turn a subnet itself 7 a such that
lim a = a . We proceed by contradiction: suppose that it is not the case that lim a =
a . Then,
there is some 0 > 0 such that for all there is some such that |a
(2.4.153)
a | 0 .
Define S := { : } and denote by 7 a the corresponding strict subnet, so that
|a a | 0 for all .
We wish to show that 7 a has no subnet which converges to a . This will be
a contradiction, thereby proving the result. To show this itself, we again proceed by
contradiction: suppose there is some subnet 7 a of 7 a that converges to a .
Then, there must be some 0 such that, whenever 0 , it follows that |a a | < 0 .
From this, we wish to derive a contradiction with (2.4.153) for := 0 : a contradiction. 
The following result, together with the above (and the fact that constant nets converge to that
constant) turn out to be sufficient to characterize the topology on a topological space. Before we
present it, however, we must first define an order on a product of preordered sets.
Definition 2.4.154 Product order Let P be a collection of preordered sets and let

x, y

PP P.

Then, we define x P y iff xP P yP for all P P.

Exercise 2.4.155 Show that P is a preorder.

Exercise 2.4.156 If P is a collection of directed sets, show that (

PP P, P )

is a

directed set.
Proposition 2.4.157 Let I be a directed seta and for each i I let ai : i R be a convergent

net. Then, if (a ) := limi lim (ai ) exists, then I


(a ) .

iI

3 (i, ) 7 (ai ) i converges to

In other words, if you have a nets worth of nets such that the iterated limit converges,
then you can write this limit as a limit of a single net (which itself is formed from the
nets worth of nets).

Note that if you insist upon working with only sequences, you havent a chance in
hell to make something like this work. In fact, recall our counter-example (Example 2.4.77), in which we had limm (an )m = 0 for all n N, and so of course we had
that limn (limm (an )m ) existed (and was equal to 0). In fact, the same was true with m
and n reversed. We then hoped that limm,m (am )m = 0, but found that this was not in
fact the case. This result tells us that you can indeed form a net from a nets worth of
convergent nets that converges to the thing you would like to, the catch being that the
answer is not as nice as one might have liked.

As messy as this answer might seem, in some sense, its the best we could do. Can
you write down any other net at all formed from just the (a ) s? As all the directed

Chapter 2. Basics of the real numbers

72

sets are in general distinct, this is essentially the simplest thing we can write
down.

Proof. Suppose that (a ) := limi lim (ai ) exists. Let > 0. Let i0 I be such that,
whenever i i0 , it follows that




lim(ai ) (a ) < .
(2.4.158)

Define
(ai ) := lim(ai ) .

(2.4.159)

Let 0i i be such that whenever i 0i it follows that


i

(a ) i (ai ) < .

(2.4.160)

Then, whenever (i, ) (i0 , 0 ), by definition of the product order, i i0 and i 0i for
all i I, and so
i



(a ) i (a ) (ai ) i (ai ) + (ai ) (a ) < + = 2.
(2.4.161)

a I

2.5

is for index.

Basic topology of the real numbers


Though we have not defined it yet (and will not do so until the next chapter), a topological space is
the most general context in which one can make precise the notion of continuity.2 The point is, is
that if continuity is something you care about, then topology in turn is something you should also
care about.

2.5.1

Continuity
Definition 2.5.1 Limit (of a function) Let f : R R be a function and let a, L R.
Then, L is the limit of f at a iff for every net 7 x such that (i) x 6= a and (ii) lim x = a
we have lim f (x ) = L.
R

If L is the limit of f at a, then we write limxa f (x) = L.

Consider the function f : R R defined by


(
0 if x 6= 0
f (x) :=
.
1 if x = 0

(2.5.2)

Then we would like to say that limx0 f (x) = 0. This is the motivation for imposing
the constraint x 6= a. Because, for example, the constant net 7 x := a does not
satisfy lim f (x ) = 0.

2 This

is arguably a slight lie, but in any case, I think its fair to say that continuity is really the point of working with
topological spaces.

2.5 Basic topology of the real numbers

73

Exercise 2.5.3 Let f : R R and let a, L R. Show that limxa f (x) = L iff for every

> 0 there is some > 0 such that, whenever 0 < |x a| < , it follows that | f (x) L| < .
The motivation for the condition 0 < |x a| is the same as the motivation for the
condition x 6= a in the previous definition.

Definition 2.5.4 Continuous (real) function Let f : R R be a function and let a R.


Then, f is continuous at a iff limxa f (x) = f (a). f is continuous iff it is continuous at a
for all a R.


Example 2.5.5 Dirichlet Function Define f : R R by

(
1 if x Q
f (x) :=
.
0 if x Qc

(2.5.6)

This is the Dirichlet Function.


Exercise 2.5.7 Where is the Dirichlet Function continuous?

Example 2.5.8 Thomae Function Define f : R R by

(
f (x) :=

1
n

if x = mn Q with gcd(m, n) = 1, n > 0


.
if x Qc

(2.5.9)

This is the Thomae Function.


Exercise 2.5.10 Show that the Thomae Function is continuous at x R iff x Qc .

Hint: For a fixed n Z+ , how many rational numbers are there in the interval [0, 1]
with denominator smaller than n?

Exercise 2.5.11 Let f : R R be a function and let a R. Show that the following are

equivalent.
(i). f is continuous at a.
(ii). For every net 7 x such that (i) x 6= a and (ii) lim x = a we have lim f (x ) =
f (a).
(iii). For every > 0 there is some > 0 such that, whenever 0 < |x a| < , it follows
that | f (x) f (a)| < .a
(iv). For every > 0 there is some > 0 such that f (B (a)) B ( f (a)).
(v). For every > 0 there is some > 0 such that B (a) f 1 (B ( f (a))).
a Note

that as | f (a) f (a)| < for all > 0, the < in 0 < |x a| < is not strictly necessary anymore.

Exercise 2.5.12 Show that the sum and product of continuous functions are continu-

ous. Show that the quotient of a continuous function by a function that never vanishes is

Chapter 2. Basics of the real numbers

74
continuous.

These equivalences are nice, but they all somehow have the disadvantage that they make
reference to the points of R.3 There is, however, a way to characterize continuity without making
reference to points at all. This is done by the introduction of open sets.
Exponentials

Despite all that weve done, there is still something you thought you knew how to do with numbers
since grade school, but we have yet to define: exponentials.4 Now that we know what it means to
be continuous, we can finally define exponentials.
Theorem 2.5.13 Exponentials. Let a R+ . Then, there is a unique continuous function,

R 3 x 7 ax R, such that
(i). ax+y = ax ay ; and
(ii). a1 = a.
Functions of this form are called exponential functions.
R

We also define: 0a := 0 for a > 0, and 00 := 1a see Exercise 5.2.117. 0a is undefined


for a < 0.

Proof. S T E P 1 : D E F I N E am F O R m N
For m N, am is defined by
am := a
a} .
| {z

(2.5.14)

S T E P 2 : D E F I N E a1/n F O R n Z+
1
For n Z+ , a n is defined to be the unique positive real number that has the property that
1
(a n )n = a.
S T E P 3 : D E F I N E ar F O R r Q
p
For qp Q with p 6= 0, q Z+ , and gcd(p, q) = 1, a q is defined by
1

a q := (a p ) q .

(2.5.15)

S T E P 4 : D E F I N E ax F O R x R
For x R, let 7 x Q be some net converging to x. Then, ax is defined by
ax := lim ax .

(2.5.16)

Exercise 2.5.17 Show that this is well-defined in the sense that if 7 b Q also

converges to a, then lim xa = lim xb .


3 Admittedly,

at this point, it should probably not be clearas to why this would be a disadvantage at all.
3

you really think you knew what something like 2 was? What are you going to do? Multiply 2 by itself

3 times? Good luck with that.


4 Did

2.5 Basic topology of the real numbers

75

S T E P 5 : C H E C K P RO P E RT I E S
a1 = a by definition.
Exercise 2.5.18 Show that ax+y = ax ay for all x, y R.

STEP 6: SHOW UNIQUENESS


Exercise 2.5.19 Show that this is the unique continuous function that is a at 1 and

takes sums to products.



a The

reason for this definition is nontrivial, and we will have to wait for lebesgue measure to justify it. That
being said, you can see intuitively why we make this definition simply by graphing the function hx, yi 7 xy on
R+ R+ .

Exercise 2.5.20 Show that (ax )y = axy .

2.5.2

Open and closed sets


Definition 2.5.21 Open set Let U R. Then, U is open iff for every x U there is

some > 0 such that B (x) U.


R

The intuition is that there is wiggle room around every point.

Exercise 2.5.22 Explain why 0/ is open.


R

When we generalize to topological spaces, we will require that the empty-set is open.

Exercise 2.5.23 Show that R is open.


R

Likewise, when we generalize to topological spaces, we will also require that the
entire set be open.

Exercise 2.5.24 Let x R and > 0. Show that B (x) is open.

The following result is incredibly important; it is neither particularly deep nor particularly
difficult, but it is what is taken as the definition of continuity when we generalize to topological
spaces, even though it might not be particularly intuitive at first.
Theorem 2.5.25. Let f : R R. Then, f is continuous iff the preimage of every open set

is open (i.e. iff U R open implies f 1 (U) R is open).

Proof. () Suppose that f is continuous. Let U R be open and let x f 1 (U). Then,
f (x) U, so because U is open, there is some > 0 such that B ( f (x)) U. Then, by
Exercise 2.5.11(iv), there is some > 0 such that f (B (x)) B ( f (x)) U. It follows that
B (x) f 1 (U), and so f 1 (U) is open.

Chapter 2. Basics of the real numbers

76

() Suppose that the preimage of every open set is open. Let x R and > 0. B ( f (x))
is open by Exercise 2.5.24, and so f 1 (B ( f (x))) is open. As this set contains x, there
is some > 0 such that B (x) f 1 (B ( f (x))), and so f (B (x)) B ( f (x)). Thus, f is
continuous by Exercise 2.5.11(iv).

Dual (but not opposite!)5 to the notion of an open set is that of a closed set.
Definition 2.5.26 Closed set Let C R. Then, C is closed iff Cc is open.

Example 2.5.27 R and the empty-set Hopefully you saw in Exercises 2.5.22

and 2.5.23 that both 0/ and R are open. As 0/ c = R and Rc = 0,


/ it follows that both 0/
and R are closed as well. In particular, it is possible for sets to be both open and closed.
This is what we meant when we said that openness and closedness are dual but not
opposite.
Exercise 2.5.28 Let f : R R. Then, f is continuous iff the preimage of every closed set

is closed.
Showing that Cc is open to show that C is closed can in fact be a very efficient way of doing so.
Nevertheless, it would be nice to have a direct way of checking that C is closed, which is why we
introduce the notion of accumulation point.
Definition 2.5.29 Accumulation point Let S R and let x0 R. Then, x0 is an

accumulation point of S iff for every > 0, B (x0 ) intersects S at a point distinct from x0 .
R

You might think of the accumulation points of S as being infinitely close to S in


some sense.

The reason we require that it intersect at a point distinct from x0 is because some
results would just fail to be true without it (for example, Proposition 2.5.36). A similar
problem which would arise is that the Equivalent formulations of quasicompactness
(Corollary 2.5.73) would be trivial without this extra condition.

Exercise 2.5.30 Let S R and let x0 R. Show that x0 is an accumulation point of S iff

for every > 0, B (x0 ) intersects S at infinitely many points.


R

Of course, there is a similar equivalence which combines this exercise with the
previous.

Warning: Many of the results we prove in this section generalize perfectly to the case
of a general topological space. This is not one of them! For example, obviously this
cannot be true in a topological space which only has finitely many points.

Definition 2.5.31 Limit point Let S R and let x0 R. Then, x0 is a limit point of S iff

there exists a net 7 x S with x 6= x0 such that lim x = x0 .


R

5 Dont

We choose our definition of limit point so that it agrees with the notion of accumulation point (see the next result, Proposition 2.5.32). This is the reason for the
requirement x 6= x0 .

make the mistake Hitler did; see https://www.youtube.com/watch?v=SyD4p8_y8Kw.

2.5 Basic topology of the real numbers

77

Proposition 2.5.32 Let S R and let x0 R. Then, x0 is an accumulation point of S iff it

is a limit point of S.
R

If you replace net with sequence in the definition of a limit point, then this result
will be false in general! Sequences are just fine if we restrict ourselves to R, but when
we generalize, this result would fail to hold if we constrained ourselves to only work
with sequences.

Proof. () Suppose that x0 is an accumulation point of S. Let > 0. Then, B (x0 ) S


contains some element x distinct from x0 . Note that the positive-real numbers (R+ , )
equipped with the reverse of the usual ordering  (i.e.,x  y is defined to be true iff y x)
is a directed set, so that 7 x is a net with x 6= x0 . We claim that lim x = x0 , so that
x0 will then be a limit point of S. Let > 0. Suppose that  (we are taking our 0
in the definition of the limit of a net, Definition 2.4.10, to be itself). Then, x B (x0 ),
and so |x x0 | < . As  , we have , and so |x x0 | < , which shows that
lim x = x0 , and so x0 is a limit point of S.
() Suppose that x0 is a limit point of S. Then, there is some net 7 x S with x 6= x0
such that lim x = x0 . Let > 0. Then, there is some x0 such that |x0 x0 | < . In other
words, x0 B (x0 ) S, so that x0 is an accumulation point of S.

The following result is an equivalent characterization of being a closed set and was our
motivation for introducing accumulation points and limit points at all.
Proposition 2.5.33 Let C R. Then, C is closed iff it contains all its accumulation points.

Proof. () Suppose that C is closed. Let x R be an accumulation point of C. We proceed


by contradiction: suppose that x Cc . Then, because Cc is open, there is some 0 > 0 such
that B0 (x) Cc . But then, B0 (x) C is empty: a contradiction of the fact that x is an
accumulation point of C. Thus, we must have had that x C.
() Suppose that C contains all its accumulation points. Let x Cc . Then, x is not an
accumulation point of C, and so there must be some 0 such that B (x) C is empty (it
cannot even contain x because x
/ C). In other words, it must be the case that B (x) Cc ,
c
so that C is open.

Corollary 2.5.34 Let C R. Then, C is closed iff it contains all its limit points.
R

I am quite confident this is not in fact the correct etymology of the term, but you
might think of closed sets as being closed under the operation of taking limits.

Exercise 2.5.35 Let S R be closed and bounded above. Show that sup(S) S.
R

Similarly, of course, if S is closed and bounded below, then inf(S) S.

Proposition 2.5.36 Let m 7 xm be a sequence that is not eventually constant and let x R.
Then, x is an accumulation point of {xm : m Z+ } iff there is a subsequence of m 7 xm
which converges to x.
R

Note that this result would be false if we did not require that B (x0 ) intersect the
set at a point distinct from x0 in the definition of an accumulation point (Defini-

Chapter 2. Basics of the real numbers

78

tion 2.5.29). For example, if we did not require this, 1 would be an accumulation
point of (1, 21 , 13 , 14 , . . .), but of course no subsequence converges to 1.
R

Warning: This is not true if you replace sequence with net. See the following
example.

Warning: This is not true in general topological spacessee Example 3.2.15.

Proof. () Suppose that x is an accumulation point of {xm : m Z+ }. We construct a


subsequence n 7 xmn inductively that has the property that xmn B 1 (x) with xmn 6= x. If we
n
can do so, then n 7 xmn will converge to x by the Archimedean Property (that is, because
numbers of the form 1n can be chosen to be arbitrarily small). Because x is an accumulation
point, we have that B1 (x) {xm : m Z+ } is contains some point distinct from x, and so we
can take xm0 to be any such element. Suppose now that we have constructed xm0 , . . . , xmn and
we wish to construct xmn+1 . By Exercise 2.5.30, not only is B 1 {xm : m Z} nonempty,
n+1
but it is infinite. Therefore, B 1 {xm : m > mn } is nonempty, and so we can choose xmn+1
n+1
to be any element in this set distinct from x.
Note that this direction of the proof fails for nets in general. We are implicitly
using the fact that infinite subsets of N are cofinal in N (and hence give us a (strict)
subsequence), and this is not true for general directed sets.

() Suppose that there is a subsequence m 7 xm which converges to x. Let > 0. Then,


there is some n0 such that, whenever n n0 , it follows that xmn B (x). In particular,
B (x) {xm : m Z+ } constant an element distinct from x (because the sequence m 7 xm
is not eventually constant), and so x is an accumulation point of {xm : m Z+ }.



Example 2.5.37 An accumulation point of a net to which no subnet converges

Define R+ 3 7 x := 1 . Then,
{x : R+ } = (0, ).

(2.5.38)

Thus, for example, 1 (0, ) is an accumulation point of the net. However, as lim x = 0,
it follows that no subnet can converge to 1.b
a This
b Of

example was inspired by a similar example showed to me by a student.


course, there is nothing special about 1any element of (0, ) would work just as well.

There is a dual (well, sort of) notion of accumulation point, though it is perhaps not quite as
useful.
Definition 2.5.39 Interior point Let S R and let x0 R. Then, x0 is an interior point
of S iff there is some 0 such that B0 (x0 ) S.

The result dual to Proposition 2.5.33 is in the following easy exercise.


Exercise 2.5.40 Let U R. Show that U is open iff every point in U is an interior point.

The next couple of results are incredibly important for the same reason that Theorem 2.5.25
(the characterization of continuity in terms of open sets) was: they are neither deep nor difficult (in
fact, theyre quite trivial), but they will be defining requirements of open sets in the more general
setting of topological spaces.

2.5 Basic topology of the real numbers


Theorem 2.5.41. Let U be a collection of open subsets of R. Then,
[

79

(2.5.42)

UU

is open.
In other words, an arbitrary union of open sets is open.

Proof. Let x UU U. Then, x U for some U U . Because U is open, there is some


S
S
0 > 0 such that B0 (x) U UU U, and so UU U is open.

S

Theorem 2.5.43. Let U1 , . . . ,Um R be open. Then,


m
\

(2.5.44)

Uk

k=1

is open.
R

In other words, the finite intersection of open sets is open.

Proof. Let x m
k=1 Uk , so that x Uk for 1 k m. Let k > 0 be such that Bk (x) Uk .
Define 0 := min{1 , . . . , m } > 0. Then, B0 (x) Bk (x) Uk for all k, and so B0 (x)
Tm
Tm

k=1 Uk , so that k=1 Uk is open.
T

Exercise 2.5.45 Find an infinite collection of open sets whose intersection is not open.
Exercise 2.5.46 Let C be a collection of closed subsets of R. Show that
\

(2.5.47)

CC

is closed.
Exercise 2.5.48 Let C1 , . . . ,Cm R be closed. Show that
m
[

Ck

(2.5.49)

k=1

is closed.
The closure of a set is the smallest closed set which contains it. Likewise, the interior of a set
is the largest open set which it contains. The sense in which these are respectively smallest and
largest is made precise by the following results.
Proposition 2.5.50 Closure Let S R. Then, there exists a unique set Cls(S) R, the

closure of S, that satisfies


(i). Cls(S) is closed;
(ii). S Cls(S); and

Chapter 2. Basics of the real numbers

80

(iii). if C is any other closed set which contains S, then Cls(S) C.


R
R

Compare this with the definition of the integers and rationals (Theorems 1.2.1
and 1.3.4).
We prefer the notation Cls(S) because (i)
Sometimes people denote the closure by S.
it is less ambiguous (the over-bar is used to denote many things in mathematics) and
(ii) the notation Cls(S) is just slightly more descriptive.

Proof. Define
Cls(S) :=

C.

(2.5.51)

CR closed
SC

Cls(S) is closed because the intersection of an arbitrary collection of closed sets is closed.
S Cls(S) because, by definition, S is contained in every subset in the intersection (2.5.51).
Let C be some other closed set which contains S. Then, C itself appears in the intersection
of (2.5.51), and so Cls(S) C.
If C is some other subset of R which satisfies (i)(iii), then, by (iii) applied to Cls(S),
we have that Cls(S) C. On the other hand, by (iii) applied to C, we have that C Cls(S).
Thus, Cls(S) = C.

We have a dual result for the interior.
Proposition 2.5.52 Interior Let S R. Then, there exists a unique set Int(S) R, the

interior of S, that satisfies


(i). Int(S) is open;
(ii). Int(S) S; and
(iii). if U is any other open set which is contained in S, then U Int(S).
R

Sometimes people denote the interior by S. We prefer the notation Int(S) for
essentially the same reasons as we prefer the notation Cls(S).

Proof. We leave this as an exercise.


Exercise 2.5.53 Complete this proof by yourself, using the dual proof for the

closure as guidance.

Exercise 2.5.54 Let S R. Show that

(i). if S is closed, then Cls(S) = S; and


(ii). if S is open, then Int(S) = S.
There is a relatively concrete description of the closure.

2.5 Basic topology of the real numbers

81

Proposition 2.5.55 Let S R. Then, Cls(S) is the union of S and its set of accumulation

points.
Proof. Let C be the union of S and its accumulation points. We simply have to verify that it
satisfies the axioms of the definition of the closure in Proposition 2.5.50.
To show that it is closed, we must show that it contains all its accumulation points. So,
let x be an accumulation point of C. If x S, we are done, so we may as well assume that
x
/ S. We show that x is an accumulation point of S, so that x C. Let > 0. We wish to
show that B (x) intersects S (as x
/ S, we know the point of intersection must be distinct
from x). As x is an accumulation point of C, we know, however, that B (x) contains either a
point of S or an accumulation point of S distinct from x. In the former case, we are done, so
let a B (x) be an accumulation point of S distinct from x. Because a is an accumulation
point of S, it must be the case that B (a ) intersects S at a point x S distinct from a . But
then, by the triangle inequality, x B2 (x). Thus, x is an accumulation point of S (x and x
must be distinct because one is in S and the other is not), and hence an element of C. Thus,
C is closed.
Because any closed set must contain all its accumulation points (Proposition 2.5.33), it
follows that C must be contained in any closed set which contains S, and so C = Cls(S). 
There is a dual concrete description of the interior.
Proposition 2.5.56 Let S R. Then, Int(S) is the set of interior points of S.

Proof. Because of the dual result to Proposition 2.5.33, namely Exercise 2.5.40 (a set is
open iff all of its points are interior points), just as in the dual proof above, it suffices to
show that the set of interior points of S is open.
So, let x R be an interior point of S. Then, there is some 0 > 0 such that B0 (x) S.
To show that the set of interior points is open, we need to show that in fact every point
of B0 (S) is an interior point of S. This however follows from the fact that balls are open
(Exercise 2.5.24).

Exercise 2.5.57 Let S R.

(i). Show that S is closed iff S = Cls(S).


(ii). Show that S is open iff S = Int(S).
In the next chapter, we will define a topological space as a set equipped with a choice of open
sets. The choice of open sets will be called a topology. Of course, it turns out that there are many
equivalent ways to specify a topology, and one way to do this is by defining what the closure (or
interior) of each set is. The following result is important because, when we go to generalize, it will
play the role of axioms which a closure (or interior) operator must satisfy.
Theorem 2.5.58 Kuratowski Closure Axioms. Let S, T R. Then,

(i). Cls(0)
/ = 0;
/
(ii). S Cls(S);
(iii). Cls(S) = Cls (Cls(S)); and
(iv). Cls(S T ) = Cls(S) Cls(T ).

Chapter 2. Basics of the real numbers

82
R

Careful: The closure of a finite union is the union of the closures, but this does not
hold in generalsee the Exercise 2.5.61 below.

Proof. The empty-set is closed, contains the empty-set, and is contained in every closed set
which contains the empty-set, and hence, by definition, Cls(0)
/ = 0.
/
(ii) follows from the definition.
(iii) follows from the fact that the closure of a set is closed (by definition) and the fact
that the closure of a closed set is itself (Exercise 2.5.54).
We now prove (iv). We show that Cls(S) Cls(T ) satisfies the axioms of the closure
of S T . Cls(S) Cls(T ) is a closed set which contains S T , and so it therefore contains
Cls(S T ). Let C be some other closed set which contains S T . C therefore contains S,
and so it must contain Cls(S). Likewise, it must contain Cls(T ), and so C must contain
Cls(S) Cls(T ). Therefore, by definition, Cls(S) Cls(T ) = Cls(S T ).

Of course, we have the dual result for interior.
Theorem 2.5.59 Kuratowski Interior Axioms. Let S, T R. Then,

(i). Int(R) = R;
(ii). Int(S) S;
(iii). Int(S) = Int (Int(S)); and
(iv). Int(S T ) = Int(S) Int(T ).
Proof. We leave this as an exercise.
Exercise 2.5.60 Complete this proof yourself, using the dual proof for the closure

as guidance.

Exercise 2.5.61 Let S 2R be a collection of subsets of R. Show that the following are

true.
(i).

Cls(S) Cls (

(ii).

Int(S) Int (

(iii).

Cls(S) Cls (

(iv).

Int(S) Int (

SS
SS
SS
SS

SS

SS

S).

SS

SS

S).

S).

S).

Find examples to show that we need not have equality in general.

2.5.3

Quasicompactness
You will find with experience that closed intervals on the real line are particularly nice to work
with. For example, youll probably recall from calculus that, on a closed interval, every continuous
function attains a maximum and minimum (the Extreme Value Theorem). In particular, continuous
functions are bounded on closed intervals. This is not just true of all closed intervals, however, but
is in fact true about any closed bounded subset of R.

2.5 Basic topology of the real numbers

83

The objective then is to characterize closed bounded sets in such a way that will (i) generalize
to arbitrary topological spaces and (ii) retain most of the nice properties that closed bounded
sets have in R. As the notion of bounded itself does not generalize, we will need an equivalent
characterization. This characterization is what is called quasicompactness.
Definition 2.5.62 Quasicompact Let S R. Then, S is quasicompact iff every open

cover of S has a finite subcover.


R

Of course, we have not defined what open covers are yet, so see the next definition in
case it is not obvious what this means.

For most authors, and in fact, for probably all authors of introductory analysis
books, my definition of quasicompact for them will be called just compact. Instead,
I reserve the term compact for spaces which are both quasicompact and T2 see
Definition 3.6.41.a As R is T2 , the notions of compact and quasicompact are the same
for the real numbers, but in general they will differ. To the best of my knowledge, the
term quasicompact originated in algebraic geometry because it was felt that things
that should not intuitively be thought of as compact nevertheless satisfied the defining
condition above. Thus, it was decided that such spaces should be referred to as
quasicompact and that instead the term compact should be reserved for spaces which
are both quasicompact and T2 . I prefer this terminology for two reasons: (i) the
terminology is more precise, that is, I have two terms to work with instead of just one;
and (ii) I strongly feel that it is a bad idea for the terminology we use to be dependent
on the mathematics we happen to be doing at the momentcompact should not
mean one thing today and something else tomorrow just because I decided to work
on something different. Terminology should be as consistent as possible across all of
mathematics.

a You

are not expected to know what T2 means. See the next chapter for details (specifically, Definition 3.6.40), though keep in mind, the details dont matter at the moment.

Definition 2.5.63 Cover Let S R and let U 2R . Then, U is a cover of S iff


S

S UU U. U is an open cover iff every U U is open. A subcover of U is a subset


V U that is still a cover of S.

And now of course we had better check that this condition properly characterizes closed and
bounded in the real numbers, as desired.
Theorem 2.5.64 Heine-Borel Theorem. Let S R. Then, S is closed and bounded iff

it is quasicompact.
R

This is equally true in Rd , but fails to hold in general topological spaces (for one
thing, boundedness does not make sense in arbitrary topological spaces). It will not
even hold in T2 topological spaces in which the notion of boundedness makes sense.

Proof. a () S T E P 1 : A S S U M E H Y P O T H E S E S
Suppose that S is closed and bounded. Let U be an open cover of S. We proceed by
contradiction: suppose that U has no finite subcover.
STEP 2: CONSTRUCT A SEQUENCE OF NONINCREASING CLOSED INTERVA L S W H O S E L E N G T H S G O T O 0 A N D W H O S E I N T E R S E C T I O N W I T H S H A S
N O F I N I T E S U B C OV E R

Let M > 0 be such that S [M, M]. Then, at least one of [M, 0] S and [0, M] S must
not have a finite subcover of U , because if they both did, then the cover of [M, 0] S

Chapter 2. Basics of the real numbers

84

together with the cover of [0, M] S would comprise a finite subcover of S. Without loss of
generality,
that
 Massume

 M [0,M] S has no finite subcover. Then, by exactly the same logic,
either 0, 2 S or 2 , M S has no finite subcover. Proceeding inductively, we obtain a
nonincreasing sequence of closed intervals I0 I1 I2 such that (i) Ik S has no finite
subcover and (ii) the lengths of the intervals converges to 0.
S T E P 3 : C O N S T R U C T A N E L E M E N T I N S kN Ik
Note that, as Ik S has no finite subcover, in particular, it cannot be empty (otherwise any
subset of U would cover it). So, let xk Ik S. Because the lengths of the intervals go to 0,
m 7 xm is a cauchy sequence, and hence converges to some x R. As S is closed, we have
in addition that x S. We wish to show that in addition x Ik for all k. Write Ik = [ak , bk ]
for ak bk . We wish to show that ak x bk . We show just x bk (the other inequality
is similar). We proceed by contradiction: suppose that there is some bk0 such that x > bk0 .
Then, there is some m0 such that, whenever m m0 , it follows that xm > bk0 . However, for
m at least as large as k0 , we need xm Im Ik0 , so that xm bk0 : a contradiction. Therefore,
x bk for all k, and so x Ik for all k.
T

STEP 4: DEDUCE THE CONTRADICTION


As x S, there is some U U such that x U. Then, because the lengths of the intervals
converge to 0 and U is open, there must be some Im0 such that x Im0 U. But then, {U}
is a finite open cover of Im0 S: a contradiction. Therefore, S is quasicompact.
()
STEP 5: ASSUME HYPOTHESES
Suppose that S is quasicompact.
S T E P 6 : S H O W T H AT S I S B O U N D E D
The cover {BM (0) : M > 0} covers all of R, and so certainly covers S. Therefore, this is
a finite subcover {BM1 (0), . . . , BMm (0) : M1 , . . . , Mm > 0}. Define M := max{M1 , . . . , Mm }.
Then, BM (0) contains each BMk (0), and so contains S. Therefore, S is bounded.
S T E P 7 : S H O W T H AT S I S C L O S E D
Let 7 x S be a net converging to x R. We must show that x S. We proceed by
contradiction: suppose that x
/ S. Then, for each s S, because s 6= x , there is some s > 0
{Bs (s) : s S}ois certainly an open cover of S, and so
such that x
/ Bs (s).b The collection
n
there is some finite subcover Bs1 (s1 ), . . . , Bsm (sm ) . Define 0 := min1km {|sk x |}.
Then, in particular, there is some x0 B0 (x ). However, by the Reverse Triangle Inequality
(Exercise 2.2.4(v)),




|x0 sk | = (x0 x ) + (x sk ) x0 x |x sk |


(2.5.65)
|x sk | x0 x > |x sk | min {|sk x |} 0,
k1m

n
o
and so in particular, x0
/ Bsk (sk ), a contradiction of the fact that Bs1 (s1 ), . . . , Bsm (sm )
is a cover of S. Therefore, lim x = x S, and we are done.

a Proof
b For

adapted from [1, pg. 8789].


what its worth, it is this step that does not work in general.

2.5 Basic topology of the real numbers

85

Equivalent formulations of quasicompactness

If for some reason you find the definition of quasicompactness in terms of open covers off-putting,
there are a couple of other equivalent formulations of the concept that we present in this section. In
contrast to the Quasicompactness, these characterizations of quasicompactness do generalize to
arbitrary topological spaces.
Proposition 2.5.66 Let K R and let C be a collection of closed subsets of R. Then, K is

quasicompact iff whenever every finite intersection of elements of C intersects K, the entire
T
intersection CC C also intersects K.

Proof. () Suppose that K is quasicompact. Let C be a collection of closed subsets


of R that has the property that the intersection of any finite number of elements of C
T
intersects K. We proceed by contradiction: suppose that CC C does not intersect K.
T
S
Then, ( CC C)c = CC Cc contains K,a and therefore the collection U := {Cc : C C }
constitutes an open cover of K. Because K is quasicompact, it follows that there is some
finite subcover C1c Cmc K. But then C1 Cm does not intersect K: a contradiction.
T
Therefore, CC C intersects K.
()
Exercise 2.5.67 Prove the converse.


a Recall

De Morgans Laws (Exercise A.1.16).

Theorem 2.5.68. Let K R. Then, K is quasicompact iff every net 7 a K has a

subnet that converges to a limit in K.


This is yet another result that will not hold in general if you replace the word net
with sequence (though it will hold in R).

Proof. () Suppose that K is quasicompact. Let 3 7 a K be a net. Define




C := Cls a :
(2.5.69)
and C := {C : }.
Exercise 2.5.70 Show that the intersection of finitely many C s is nonempty.

By the previous characterization of quasicompactness, it follows that


K, so let x K be in this intersection.
Then,

intersects

for every > 0 and for every , there is some a, B (x) with , . (2.5.71)
Define


0 := (, ) : R+ , : a B (x) .

(2.5.72)

We order R+ with the reverse of the usual ordering. Then, R+ is directed. We verify
that 0 is likewise directed. Let 1 , 2 > 0 and let 1 , 2 be such that ak Bk (x). Take

Chapter 2. Basics of the real numbers

86

:= min{1 , 2 } and let 3 be at least as large as 1 and 2 . By (2.5.71), there is some


3 such that a B (x). Then, (, ) 0 and (, ) (k , k ), so that 0 is directed.
Now, for (, ) 0 , pick some , such that (i) a, B (x) and (ii) , . Of
course (, ) 7 a, converges to x, but we still need to check that it is a subnet.
Let 0 be an arbitrary index. Then, if (, ) (1, 0 ), it follows that , 0 , and
so, by definition, this is indeed a subnet.
() Suppose that every net 7 a has a subnet that converges to a limit in K. Let C be a
collection of closed sets such that the intersection of finitely many elements of C intersects
K. Let C be the collection of all finite subsets of C ordered by inclusion, so that it is indeed
T
a directed set. For each element C C, let aC CC C. By hypothesis, this has a subnet
T
7 aC that converges to x K. We wish to show that x CC C. To show this, it suffices
to show that 7 aC is eventually in each C C . However, for C0 C , {C0 } C, and
so there there is some 0 such that that for all 0 it follows that C {C0 }. In other
T
words, for all 0 , it follows that C0 C , and as aC CC C, it in particular follows
that aC C0 for all 0 . That is, 7 aC is eventually in C0 .

Corollary 2.5.73 Bolzano-Weierstrass Theorem Every eventually bounded net has a

convergent subnet.
Proof. Every eventually bounded net is eventually contained in some closed interval
[M, M], and so, every eventually bounded net has a subnet which is contained (not
eventually contained) in [M, M]. [M, M] is quasicompact by the Quasicompactness (Theorem 2.5.64). By the subnet characterization of quasicompactness (the previous theorem), it
follows that this net has a convergent subnet.

Corollary 2.5.74 Bounded infinite sets have accumulation points.
R

This is sometimes also called the Bolzano-Weierstrass Theorem.

Proof. Any infinite set will have a sequence of distinct points contained in it. If the set
is bounded, this sequence will be bounded, and so by the Bolzano-Weierstrass Theorem
has a convergent subnet. The limit of this subnet is an accumulation point of the set
(Proposition 2.5.36).

And now we finally return to the result we mentioned way back in Equation (2.4.50) which we
now have the tools to prove.
Proposition 2.5.75 Let 7 a be a subnet of a net 7 a . Then,

lim inf a lim inf a lim sup a lim sup a .

(2.5.76)

Proof. We already know the middle inequality holds from Exercise 2.4.49. We prove the
lim sup inequality holds; the other is similar.
Let 0 be an arbitrary index, and let 0 be such that, whenever 0 , it follows that
0 . Then,
n
o
a : 0 {a : 0 } ,
(2.5.77)

2.5 Basic topology of the real numbers

87

and hence
sup {a } sup {a } .

(2.5.78)

It follows from the definition of the limit superior (2.4.47) and the Order Limit Theorem
(Exercise 2.4.39) that lim sup a lim a .

Proposition 2.5.79 Let 7 a be a net. Show that 7 a converges iff lim sup a =
lim inf a is finite, and in this case, lim a is equal to this common value.

Proof. () Suppose that 7 a converges. Then, it is eventually bounded, and so both


lim sup a and lim inf a are finite. Define u := lim sup a and l := lim inf a . Let > 0
and choose 0 such that, whenever 0 , it follows that
sup {a } u < and l inf {a } < .

(2.5.80)

Adding these two inequality, we find that, for 0 ,


!
l u < 2 +

inf {a } sup {a } .

(2.5.81)

Now define a := lim a and choose 00 such that, whenever 00 , we have that |a
a | < . In other words,
a a < and a a < .

(2.5.82)

Taking the sup of this inequality (and using the fact that sup(S) = inf(S), we find
sup {a } < + a and inf {a } < a .

(2.5.83)

Pick something larger than both 0 and 00 and change notation so that this new larger thing
is called 0 . Thus, now, for 0 , both inequalities hold, and so
l u < 2 + (( a ) + ( a )) = 4.

(2.5.84)

As was arbitrary, we have l = u.


() Suppose that lim sup a = lim inf a is finite. Define L := lim sup a = lim inf a .
We show that every subnet of 7 a has in turn a subnet which converges to L (see
Proposition 2.4.152. So, let 7 a be a subnet. As L is finite, the original net is bounded,
and so certainly 7 a is bounded, and hence, by the Bolzano-Weierstrass Theorem, it
has in turn a convergent subnet 7 a . But then,
L := lim inf a lim a lim sup a =: L

(2.5.85)

and so limn a = L, where we have applied both the () direction of this result as well as
the previous proposition

.

88

2.6

Chapter 2. Basics of the real numbers

Summary
This has been a rather long chapter and we have covered many different properties of the real
numbers. For convenience, we summarize here some of the main points we have covered.
(i). The real numbers are the unique (up to isomorphism of totally-ordered fields) nonzero
dedekind-complete totally-ordered field.
(ii). The real numbers are cauchy-complete.
(iii). The real numbers have cardinality 20 > 0 .
(iv). Nondecreasing/nonincreasing nets bounded above/bounded below converge (Monotone
Convergence Theorem).
(v). The real numbers are archimedean (the natural numbers are unbounded).
(vi). Every bounded net in R has a convergent subnet (Bolzano-Weierstrass Theorem).
(vii). A subset of R is closed and bounded iff it is quasicompact (Quasicompactness).

3. Basics of general topology

3.1

The definition of a topological space


We have mentioned topological spaces several times throughout the notes already, and finally we
turn to studying general topological spaces themselves. I said before that I think it is fair to say
that the purpose of topological spaces is to introduce the most general context as possible in which
one can talk about continuity. The motivating result in this regard is Theorem 2.5.25, which says
that a function is continuous iff the preimage of every open set is continuous. The idea is then to
axiomatize the notion of open set: a topological space will be a set X equipped with a collection
of subsets U 2X , called the open sets. Of course, if we want to be able to say anything of
interest, we cant just take any old collection of subsetswe must require the collection to satisfy
some conditions. The conditions we require are those that come from Theorems 2.5.41 and 2.5.43,
namely, that an arbitrary union and finite intersection of open sets is open. So, without further ado,
I present to you, the definition of a topological space.
Definition 3.1.1 Topological space A topological space is a set X equipped with a

collection of subsets U 2X , the topology on X, such that


(i). 0,
/ X U;
(ii).

(iii).

Tm

UV

U U if V U ; and

k=1 Uk

U if Uk U for 1 k m.

The elements of U are open sets.

A subset C X is closed iff Cc is open. (Recall that this is precisely the definition
we gave in Rsee Definition 2.5.26.)

In order to exclude stupid things like the empty topology, we require that the empty-set
and the entire set are open. (Recall that these were open in Rsee Exercises 2.5.22
and 2.5.23.)

Chapter 3. Basics of general topology

90

Of course, you can specify a topology just as well by saying what the closed sets are (the open
sets are then just the complements of these sets).
Exercise 3.1.2 Let X be a set and let C 2X be a collection of subsets of X such that

(i). 0,
/ X C;
(ii).

if D C ; and

(iii).

Sm

C if Ck C for 1 k m.

CD C
k=1 Ck

Show that there is a unique topology on X whose collection of closed sets is precisely C .
R

In other words, your collection of closed sets must be nontrivial, closed under
arbitrary intersection, and closed under finite union, just as was the case in Rsee
Exercises 2.5.46 and 2.5.48.

Definition 3.1.3 G and F sets Let S X be a subset of a topological space. Then, S

is a G set iff S is the countable intersection of open sets. S is an F set iff S i the countable
union of closed sets.
R

3.1.1

For us, these concepts will not come-up very often, so it is not imperative that you
remember these terms. They do come-up, however, and so it would be incomplete to
not include them. And of course, others use this terminology so you should at least
know where to refer back to if you ever need to know.

Bases, neighborhood bases, and generating collections


It is often not necessary to define every single open set explicitly, but rather, only a special class of
open sets that determine all the others. For example, in the real numbers, to determine whether an
arbitrary set was open, we made use of the special open sets B (x). The idea that generalizes this
notion is that of a base for a topology.
Definition 3.1.4 Base Let X be a topological space and let B be a collection of open

sets of X. Then, B is a base for the topology of X iff the statement that a subset U of X
is open is equivalent to the statement that, for every x U, there is some B B such that
x B U.


Example 3.1.5 The collection {B (x) : x R, > 0} is a base for the topology of R (by

definition).
The real reason bases are important is because they allow us to define topologies, and so it is
important to know when a collection of subsets of a set form a base for some topology.
Proposition 3.1.6 Let X be a set and let B be a collection of subsets of X. Then, there

exists a unique topology for which B is a base iff


(i). B covers X; and

(ii). for every x X and B1 , B2 B with x B1 , B2 , there is some B3 B such that


x B3 B1 B2 .
R

If X does not a priori come with a topology, we will still refer to any collection of
sets that satisfy (i)(ii) as a base.

3.1 The definition of a topological space

91

Proof. () Suppose that there is a unique topology U for which B is a base. We first
show that B covers X. We proceed by contradiction: suppose there is some x X which is
not contained in any B B. Then, as B is a base for the topology, X would not be open: a
contradiction. Therefore, B covers X.
Now for the second property. By the definition of a base, we have that B U . In
particular, B1 B2 U , and so because B is a base for U , there must be some B3 B
such that B3 B1 B2 .
() Suppose that (i) B covers X, and (ii) for every for every B1 , B2 B intersecting, there
is some B3 B such that B3 B1 B2 . We declare U X to be open iff for every x U
there is some B B with x B U. By the definition of bases, this was the only possibility.
We need only check that this is in fact a topology. The empty-set is vacuously open. X is
S
open because B covers X. Let V be a collection of open sets and let x UV U. Then,
S
x U for some U V , and so there is some B B such that x B U UV U. Thus,
S
Tm
Bk B
UV U is open. Let U1 , . . . ,Um be open and let x k=1 Uk . Then, there is someT
such that x Bk Uk . By (ii), there is some B B with x B B1 Bm m
k=1 Uk ,
Tm
and so k=1 Uk is open.

Exercise 3.1.7 Let X be a topological space and let B be a collection of subsets of X. Show

that B is a base for the topology of X iff it has the property that U X open is equivalent
to U being a union of elements of B.

There is a similar way of defining a topology. Instead of specifying a base for all open sets, you
specify a base for all the open sets at a point (see the following definitions) for every point.
Definition 3.1.8 Neighborhood Let X be a topological space and let S N X. Then,

N is a neighborhood of S iff there is some open set U X such that S U N. An open


neighborhood of S is just an open set which contains S. A(n open) neighborhood of a point
x is a(n open) neighborhood of {x}.
Definition 3.1.9 Neighborhood base Let X be a topological space and for each x X
let Bx be a collection of neighborhoodsa of x. Then, {Bx : x X} is a neighborhood base
for the topology of X (and Bx is a neighborhood base at x) iff the statement that a subset U
of X is open is equivalent to the statement that, for every x U, there is some Bx Bx such
that Bx U. In the case that every element of each Bx is open, we say that the neighborhood
base is a local base.
a Not

necessarily open!

Example 3.1.10 A neighborhood base with no open sets Take X := R, and for
x X define Bx := {D (x) : > 0}.a While we have not actually defined a topology on
R yet, once we do very shortly (Example 3.1.20), you can verify that this is in fact a
neighborhood base, but no D (x) is open.


a Recall

that D (x) := {y R : |y x| }.

Once again, neighborhood bases allow us to define topologies.

Chapter 3. Basics of general topology

92

Proposition 3.1.11 Let X be a set and for each x X let Bx be a nonempty collection

of subsets of X which contain x. Then, there exists a unique topology for which Bx is a
neighborhood base of x iff for every x X and B1 , B2 Bx , there is some B3 Bx such
that B3 B1 B2 .
S
Furthermore, if each Bx consists purely of open sets, then xX Bx is a base for this
topology.
R

Warning:
The elements of Bx need not be opensee Example 3.1.10. In particular,
S
xX Bx will not be a base in general.

Proof. () Suppose that there exists a unique topology for which Bx is a neighborhood
base at x. Let B1 , B2 Bx . Then, B1 and B2 are neighborhoods of x, and so there are open
sets U1 B1 and U2 B2 containing x. Then, U1 U2 is open and contains x, and therefore,
because Bx is a neighborhood base at x, there is some B3 Bx with B3 U1 U2 B1 B2 .
() Suppose that for every x X and B1 , B2 Bx , there is some B3 Bx such that
B3 B1 B2 . We declare U X to be open iff for every x U there is some B Bx with
x B U. By definition of neighborhood bases, this was the only possibility. We need
only check that this is in fact a topology. The empty-set is vacuously open. X is open
S
because each Bx is nonempty. Let V be a collection of open sets and let x UV U. Then,
S
x U for some U V , and so there is some B Bx such that x B U UV U. Thus,
S
Tm
UV U is open. Let U1 , . . . ,Um be open and let x k=1 Uk . Then, x Uk for each k, and
so there is some Bk Bx such that x Bk Uk . By hypothesis, there is then some B Bx
T
Tm
with x B B1 Bm m
k=1 Uk , and so k=1 Uk is open.
Now suppose that Bx consists purely of open sets. Define B := xX Bx . We wish to show
that B is a base. To do that, we must show that U X is open iff for all x U there is some
B B such that x B U. One direction is easy: if U is open and x X, then in fact there
is some B Bx such that x B U.
Conversely, suppose that for all x U there is some B B such that x B U. We wish
to show that U is open. So, let x U. We wish to find some B Bx such that x B U.
By hypothesis and the definition of B, we know that there is some y X and By By such
that x By U. However, were assuming that every element in By is open, and so in fact
there is some B Bx such that x B By , as desired.

S

Sometimes we have a collection of sets that we would like to be open, but they do not necessarily
form a base (or a neighborhood base), and so in this case we cannot just invoke Proposition 3.1.6.
However, what we can do is take the smallest topology which contains these sets.
Proposition 3.1.12 Generating collection (of a topology) Let X be a set and let

S 2X . Then, there exists a unique topology U on X, the topology generated by S , such


that
(i). S U ; and
(ii). if U 0 is any other topology on X containing S , it follows that U U 0 .
Furthermore, the collection of all finite intersections of elements of S {X} is a base for
this topology.a . S is a generating collection.
R

You should compare this with the definitions of the integers (Theorem 1.2.1), rationals
(Theorem 1.3.4), closure (Proposition 3.2.19), and interior (Proposition 3.2.22).

3.1 The definition of a topological space

93

Proof. Let B be the collection of all finite intersections of elements of S {X}. This
definitely covers X as X B. Furthermore, the intersection of any two elements of B is
also an element of B, by definition. Therefore, there is a unique topology U on X for
which B is a base (Proposition 3.1.6).
By construction, S B U . On the other hand, if U 0 is any other topology for which
every element of S is open, then, because topologies are closed under finite intersection,
U 0 must contain B, and hence it must contain U (because U 0 is closed under arbitrary
union and every element of U is a union of elements of B (Exercise 3.1.7)).

throw X in in case S did not cover X (recall that bases (Definition 3.1.4) need to cover the space).

a We

Generating collections are actually quite nice because several things can be checked just by
looking at generating collections, which are generally significantly smaller1 than the entire
topologysee Exercises 3.1.28 and 3.2.42, and the A couple new things (Theorem 3.2.43).
3.1.2

Some basic examples


We can use the notion of a base to define the order topology.
Definition 3.1.13 Order topology Let X be a totally-ordered set, and let B be the

collection of sets of the form (a, b) := {x X : a < x < b} for a, b X with a < b. (We
also allow a = and b = +, in which case (a, +) := {x X : x > a} and similarly for
(, b) and (, +).)
Exercise 3.1.14 Use Definition 3.1.4 to show that B is a base for a topology.

The topology defined by B is the order topology on X.


Example 3.1.15 A partially-ordered set whose open intervals do not form a
base Define


X := {x, y1 , y2 , y3 , z1 , z2 },

(3.1.16)

and declare that


x yk and z1 y1 , y2 and z2 y2 , y3

(3.1.17)

for all k ((and then take the reflexive and transitive closure, so that, for example, we also
have x zl , yk yk , etc.) Then,
(x, z1 ) = {y1 , y2 } and (x, z2 ) = {y2 , y3 },

(3.1.18)

so that
(x, z1 ) (x, z2 ) = {y2 }.

(3.1.19)

However, {y2 } is not an interval because if we had {y2 } = (a, b) for a < y2 and b > y2 , we
would necessarily have a = x or a = , and b = z1 , z2 , +. In all of these 6 cases, (a, b)
must contain at least either y1 or y3 as well, and so in particular, it would not be the case
that {y2 } = (a, b0.
R
1 Though

This is why we only define the order topology for totally-ordered sets.

not literally in the sense of cardinality.

Chapter 3. Basics of general topology

94


Example 3.1.20 N, Z, Q, and R N, Z, Q, and R are all totally-ordered and so we may

(and do) equip them with the order topology.


The order topology on N and Z are examples of a very special topology, the finest one possible,
namely the discrete topology.
Definition 3.1.21 Discrete topology Let X be any set. The discrete topology is the

topology in which every subset is open.


R

In other words, the topology is just the power-set of X. This is a topology for
tautological reasons.

Proposition 3.1.22 The order topologies on N and Z are both discrete.

Proof. We prove that the order topology on Z is discrete and leave the case of N as an
exercise (a very similar argument will work). We wish to show that every subset of Z is
open. To show this, it suffices to show that each singleton set is open because an arbitrary
set is going to be a union of singletons. So, let m Z. To show that {m} is open, it suffices
to find a, b Z such that (a, b) = {m} (because every element in the base of a topology is
automatically open). Take a := m 1 and b := m + 1. Recall (Exercise 1.2.17) that there is
no integer between 0 and 1. It follows that there is no integer between k and k + 1 for all
k Z, and so indeed, (m 1, m + 1) = {m}.
Exercise 3.1.23 Show that the order topology on N is discrete.


Exercise 3.1.24 Why is the topology on Q not discrete?

The discrete topology is the finest topology you can have (finer means more open sets). The
coarsest topology you can have is the indiscrete topology.
Definition 3.1.25 Indiscrete topology Let X be any set. The indiscrete topology is just

{0,
/ X}.
R

3.1.3

The definition of a topology requires that at least the empty-set and the entire set
are open. The indiscrete topology is when nothing else is open. It is not particularly
useful, but it can be an easy source of some counter-examples (cf. Example 3.2.4).

Continuity
We now finally vastly generalize our definition of continuity.
Definition 3.1.26 Continuous function Let f : X Y be a function between topologi-

cal spaces and let x X. Then, f is continuous iff the preimage of every open neighborhood
of f (x) is an open neighborhood of x. f is continuous iff f is continuous at x for all x X.
Exercise 3.1.27 Show that f : X Y is continuous iff the preimage of every open set is

open.
In fact, we can do a bit better than this.

3.1 The definition of a topological space


Exercise 3.1.28 Let X and Y be topological spaces and let S be generate the topology of

Y . Show that f is continuous iff f 1 (S) is open for all S S .


R

In particular, this is true for S a base for the topology of Y

Exercise 3.1.29 Let f : X Y be any function between two topological spaces. Show that,

if X has the discrete topology, then f is continuous.


R

In other words, every function on a discrete space is continuous.

Example 3.1.30 The category of topological spaces The category of topological


spaces is the category Top whose collection of objects Top0 is the collection of all topological spaces, for every topological space X and topological space Y the collection of
morphisms from X to Y , MorTop (X,Y ), is precisely the set of all continuous functions from
X to Y , composition is given by ordinary function composition, and the identities of the
category are the identity functions.


Exercise 3.1.31 Show that the composition of two continuous functions is continu-

ous.
R

Note that this is something you need to check in order for Top to actually
form a category (MorTop (X,Y ) needs to be closed under composition). You
also need to verify that the identity function is continuous, but this is trivial
(the preimage of a set is itself, so. . . ).

Definition 3.1.32 Homeomorphism Let f : X Y be a function between topological

spaces. Then, f is a homeomorphism iff f is an isomorphism in Top.


Exercise 3.1.33 Show that a function is a homeomorphism iff (i) it is bijective, (ii) it is

continuous, and (iii) its inverse is continuous.


Exercise 3.1.34 Find an example of a function that is bijective and continuous, but not a

homeomorphism.
R

Contrast this with, for example, isomorphisms in Grp. It follows immediately from
the definition that a function between groups is an isomorphism in Grp iff (i) it is
bijective, (ii) it is a homomorphism, and (iii) its inverse is a homomorphism. However,
by Exercise A.2.11, if the original function is a homomorphism, then we get that its
inverse is a homomorphism for free, so we only need to actually check (i) and (ii).

Definition 3.1.35 Embedding An (topological) embedding is a function that is a home-

omorphism onto its image.


R

In other words, an embedding is like a homeomorphism with the exception that it


might not be surjective.

The word topological is in parentheses because the word embedding is used in


other contexts (for analogous, but different, thingssee, for example, the remark in
Proposition 4.3.63). This is not unlike how we have homomorphisms of rings and
homomorphisms of groups.

95

Chapter 3. Basics of general topology

96

Exercise 3.1.36 Show that f : X Y is a homeomorphism iff it is (i) bijective, (ii) it is

continuous, and (iii) its inverse is continuous.


Exercise 3.1.37 What is an example of a function that is (i) bijective, (ii) continuous, but

(iii) does not have continuous inverse.


R

Note the contrast with isomorphisms in the category Ring, for example. If a function
between rings is a bijective homomorphism, then its inverse is automatically a
homomorphism.

There is a useful result that is applicable in general whenever you want to check that a piecewise function is continuous.
Proposition 3.1.38 Pasting Lemma Let X and Y be topological spaces, let C1 ,C2 X

be closed, and let f1 : C1 Y and f2 : C2 Y be continuous. Then, if f1 |C1 C2 = f2 |C1 C2 ,


then the function f : C1 C2 Y defined by
(
f1 (x) if x C1
f (x) :=
(3.1.39)
f2 (x) if x C2
is well-defined and continuous.
Proof. We leave this as an exercise.
Exercise 3.1.40 Prove this yourself.

3.1.4

Some motivation
At this point, it is reasonable for one to ask Why do we care about such generality in an introductory
real analysis course? Shouldnt we only be concerned with the real numbers for now?. I would
argue that, even in the case where you really are only interested in the real numbers, abstraction can
shed light onto such a specific example. The real numbers are many things: a field, a totally-ordered
set, a totally-ordered field, a metric space, a uniform space, a topological space, a manifold, etc.. In
mathematics, we are not just concerned about what is true, but why things are true. Because the
real numbers are so special, having so much structure, there are many things true about them. But
by the same token, because there is so much structure, unless you go through the proofs yourself in
detail, it can be difficult to recall why things are truedoes property XYZ hold for the real numbers
because they are a totally-ordered field or because they are a metric space or because they are a
topological vector space or because. . . ? Instead, however, if we step back and only study R as a
topological space, it becomes clearer why certain things are true. This allows us to tell easily what
is true about the real numbers because they are a topological space, as opposed to what is true about
the real numbers because they are a field, etc.. For example, consider the following.
 Example 3.1.41 Though we have not technically defined it yet, hopefully you are familiar
with the function arctan from calculus (see Definition 6.4.104). arctan : R ( 2 , 2 ) is a
homeomorphism (see Exercise 6.4.106).

Thus, from the point of view of general topology, there is no distinction between R and (0, 1)
(see the exercise below). This is completely analogous to the sense in which there is no difference

3.2 A review

97

between R and 2N at the level of sets. (R and (0, 1) are isomorphic in Top, and R and 2N are
isomorphic in Set.) Recall from the very end of Chapter 1 What is a number?morphisms matter.
Exercise 3.1.42 Find a homeomorphism from ( 2 , 2 ) to (0, 1).

Perhaps a good example of the more general context making it clearer why things are true is the
Intermediate Value Theorem. Here is the classical statement you are probably familiar with from
calculus.2
Let f : [a, b] R be continuous. Then, for all y between f (a) and f (b) (inclusive) there is some x [a, b] such that f (x) = y.

(3.1.43)

Compare that with the more general statement.3


The continuous image of a connected set is connected.

(3.1.44)

I think its fair to say that the latter is much more elegant, and perhaps even easier to understand.4

3.2

A review
Quite a many things that we did with the real numbers in the last chapter carry over to general
topological spaces no problem. For convenience, we present here all definitions and theorems
which carry over nearly verbatim to topological spaces. We will try to point out when things do not
carry over identically (for example, limits no longer need be unique (Example 3.2.4)).
One important thing to keep in mind is: Open neighborhoods of a point in a general topological
space play the same role that -balls did in R. Not only do we use this to determine appropriate
generalizations, but if you are feeling a bit uncomfortable with topological spaces, this should help
your intuition.
For the entirety of this section, X and Y will be general topological spaces. We recommend that
this section be used mainly as a referencethe pedagogy of the concepts is contained in the last
chapter.
Definition 3.2.1 Net A net in X is a function from a nonempty directed set (, ) into

X.
Definition 3.2.2 Sequence A sequence is a net whose directed set is order-isomorphic
(i.e. isomorphic in Pre) to (N, ).
Definition 3.2.3 Limit (of a net) Let 7 x be a net and let x X. Then, x is a limit

of 7 x iff for every open neighborhood U of x there is some 0 such that, whenever
0 , it follows that x U. If a net has a limit, then we say that it converges.
Example 3.2.4 Limits need not be unique Define X := {0, 1} and equip X with the
indiscrete topology (Definition 3.1.25). Then, the constant net 7 x := 0 converges to
both 0 and 1 (the only open neighborhood of both of these points is X itself, and of course
the net is eventually contained in X).


2 See

Corollary 3.7.24 for the proof.


Theorem 3.7.21 for the proof.
4 Admittedly, the proof to go from the general statement to the classical statement is not completely trivial (a subset
of R is connected iff it is an intervalsee Theorem 3.7.16), but the two results are still morally the same.
3 See

Chapter 3. Basics of general topology

98

Definition 3.2.5 Subnet Let x : X be a be a net. Then, a subnet of x is a net

y : 0 X such that
(i). for all 0 , y = x for some ; and
(ii). whenever U X eventually contains x, it eventually contains y.
A strict subnet of x is a net of the form x|0 for 0 cofinal. A subsequence is a subnet
that is a sequence.
Theorem 3.2.6 Kelleys Convergence Axioms.

(i). Constant nets converge to that constant.


(ii). A net converges to x X iff every subnet has in turn a subnet which converges to
x X.
(iii). Let I be a directed set and for each i I
let xi : i X be a convergent net. Then, if

i
(x ) := limi lim (x ) exists, then I iI i 3 (i, ) 7 (xi ) i converges to (x ) .
R

We will see later (Theorem 3.4.8) that these three properties can be used to define a
topology (hence, the reason we refer to them as axioms).

Definition 3.2.7 Limit (of a function) Let f : X Y be a function between topological

spaces, and let x X and y Y . Then, y is the limit of f at x iff for every net 7 x such
that (i) x 6= x and (ii) lim x = x we have lim f (x ) = y.
R

In the real numbers, this is true if you replace the word net with the word sequence,
but this fails in general topological spacessee Example 3.2.32.

Proposition 3.2.8 Let f : X Y be a function and let x0 X. Then, f is continuous at x0


iff limxx0 f (x) = f (x0 ). f is continuous iff it is continuous at x0 for all x0 X.
Proposition 3.2.9 Let f : X Y be a function. Then, f is continuous iff the preimage of

every closed set is closed.


Definition 3.2.10 Accumulation point Let S X and let x X. Then, x is an accu-

mulation point of S iff every open neighborhood of x0 intersects S at a point distinct from
x0 .
R

If you remove the requirement that the point be distinct from x0 , we obtain what is
usually called an adherent point.

Definition 3.2.11 Limit point Let S X and let x X. Then, x is a limit point of S iff

there exists a net 7 x S with x 6= x such that lim x = x.


Proposition 3.2.12 Let S X and let x X. Then, x is an accumulation point of S iff it is

a limit point of S.

3.2 A review

99

If you replace net with sequence in the definition of a limit point, then this result
will be false in general; see Example 3.2.27

Proposition 3.2.13 Let C X. Then, C is closed iff it contains all its accumulation points.
Corollary 3.2.14 Let C X. Then, C is closed iff it contains all its limit points.

We proved that (Proposition 2.5.36) in R that x R is an accumulation point of a sequence iff


there was a subsequence that converged to x. We also gave an example (Example 2.5.37) of how
this fails (even in R) for general nets. Not only this, but this also fails to hold (for sequences) in
general topological spaces.
Example 3.2.15 An accumulation point of a sequence to which no subsequence converges Define X := {x1 , x2 , x3 } and


U := {0,
/ X, {x1 , x2 }} .
Consider the sequence m 7 am defined by
(
x2 if m = 0
am :=
.
x3 otherwise

(3.2.16)

(3.2.17)

Then, it is true that every open neighborhood of x1 X contains an element of the set
{am : m N}, namely a0 := x2 distinct from x1 . On the other hand, no subsequence of
m 7 am converges to x1 as it is eventually outside an open neighborhood of x1 (namely the
open neighborhood {x1 , x2 }).
Definition 3.2.18 Interior point Let S X and let x0 X. Then, x0 is an interior point

of S iff there is some open neighborhood U of x0 such that U S.


Proposition 3.2.19 Closure Let S X. Then, there exists a unique set Cls(S) X, the

closure of S, that satisfies


(i). Cls(S) is closed;
(ii). S Cls(S); and
(iii). if C is any other closed set which contains S, then Cls(S) C.
Proposition 3.2.20 Let S X. Then, Cls(S) is the union of S and its set of accumulation

points.
Theorem 3.2.21 Kuratowski Closure Axioms. Let S, T X. Then,

(i). Cls(0)
/ = 0;
/
(ii). S Cls(S);
(iii). Cls(S) = Cls (Cls(S)); and

Chapter 3. Basics of general topology

100
(iv). Cls(S T ) = Cls(S) Cls(T ).

Proposition 3.2.22 Interior Let S X. Then, there exists a unique set Int(S) R, the

interior of S, that satisfies


(i). Int(S) is open;
(ii). Int(S) S; and
(iii). if U is any other open set which is contained in S, then U Int(U).
Proposition 3.2.23 Let S X. Then, Int(S) is the set of interior points of S.
Theorem 3.2.24 Kuratowski Interior Axioms. Let S, T X. Then,

(i). Int(X) = X;
(ii). Int(S) S;
(iii). Int(S) = Int (Int(S)); and
(iv). Int(S T ) = Int(S) Int(T ).
Proposition 3.2.25 Let S X. Then, S is closed iff S = Cls(S).
Proposition 3.2.26 Let S X. Then, S is open iff S = Int(S).

We postponed the following counter-examples because we technically had not introduced the
notion of closure in a general topological space.
 Example 3.2.27 A limit point that is not a sequential limit point Define X := R.
We equip R with a nonstandard topology, the so-called cocountable topology.a Let C X
and declare that

C is closed iff either (i) C = X or (ii) C is countable.

(3.2.28)

Because the finite union of countable sets is countable and an arbitrary intersection of
countable sets is countable (obviously), it follows that this defines a topology on X.
We first show that every convergent sequence is eventually constant. So, let m 7 xm X
be sequence that converges to x X. We proceed by contradiction: suppose that it is not
eventually constant. Then, the set
{m N : xm 6= x }

(3.2.29)

is cofinal, and hence defines a subsequence n 7 xmn that (i) converges to x but (ii) is
never equal to x . Hence, the set C : {xmn : n N} does not contain x , and so Cc is an
open neighborhood of x . But of course n 7 xmn is not eventually contained in Cc no
term of this subsequence is contained in Cc . Therefore, n 7 xmn cannot converge to x : a
contradiction. Therefore, m 7 xm must be eventually constant.
Define U := Qc . As U c = Q is countable, U is open. We note that the closure of U is all of
R: no countable set can contain U, and so the only closed set which contains U is R itself.

3.2 A review

101

It thus follows that, in particular, 0 is a limit point of U. On the other hand, as every
convergent sequence is eventually constant, no sequence in U := Qc can converge to 0.
a Just as cofinite means that the complement is finite, Cocountable means that the complement is countable.
The name of the topology here derives from the fact that the open sets have countable complement (except the
empty-set of course).

 Example 3.2.30 Two distinct topologies with the same notion of sequential convergence Let X be as in Example 3.2.27 and denote the topology on X given there (the

cocountable topology) by U . Denote by D the discrete topology on X. We saw in Example 3.2.27 that sequences converge iff they are eventually constant. However, we also have
the following result.
Exercise 3.2.31 Let X be a discrete space and let 7 x X be net. Show that

7 x iff it is eventually constant.


Thus, a given sequence m 7 xm X converges with respect to U iff it converges with
respect to D, and in this case, they converge to the same limit. On the other hand, the set
{0} is not open with respect to U a but it is open with respect to D.
Even though the topologies are distinct, perhaps it is the case that they are homeomorphic? We show that this cannot happen. In fact, we show that no bijective function from
: (X, U ) (X, D) is continuous. If were such a function, then 1 (Qc ) would be
uncountable and proper, and hence would not be closed, a contradiction of the fact that is
continuous and Qc is closed with respect to D.
a This

uses the fact that the real numbers are uncountable!

Example 3.2.32 A sequentially-continuous function that is not continuous Let


X be as in Example 3.2.27 (R with the cocountable topology) and define f : X X by
(
1
if x Q
f (x) :=
.
(3.2.33)
1 if x Qc


(Note that this is just the Dirichlet Function (Example 2.5.5)!).


This function is certainly not continuous because, for example, the preimage of {1} is not
closed. On the other hand, because every convergent sequence is eventually constant, the
condition that x 6= x in the definition of a limit (Definition 3.2.7), f vacuously satisfies the
continuity condition for sequences.
Definition 3.2.34 Cover Let S X and let U 2X . Then, U is a cover of S iff
S

S UU U. U is an open cover iff every U U is open. A subcover of U is a subset


V U that is still a cover of S.
Definition 3.2.35 Quasicompact Let S X. Then, S is quasicompact iff every open

cover of S has a finite subcover.


Exercise 3.2.36 Show that finite spaces are quaiscompact.

Chapter 3. Basics of general topology

102

Exercise 3.2.37 Show that closed subsets of quasicompact spaces are quasicompact.
Proposition 3.2.38 Let K X and let C be a collection of closed subsets of X. Then, K is

quasicompact iff whenever every finite intersection of elements of C intersects K, the entire
T
intersection CC C also intersects K.

Proposition 3.2.39 Let K X. Then, K is quasicompact iff every net 7 a K has a

subnet that converges to a limit in K.


Note that the Heine-Borel and Bolzano-Weierstrass Theorems (Theorem 2.5.64 and Corollary 2.5.73) do not hold in general, but we will wait until after having studied integration before
writing down counter-examples.
3.2.1

A couple new things


We present in this subsection a couple of facts that, while not technically review per se, make more
sense to place here once we begin study of new topological concepts in earnest.
Definition 3.2.40 Dense Let X be a topological space and let S X. Then, S is dense

in X iff Cls(S) = X.
Exercise 3.2.41 Show that Q and Qc are both dense in R.
R

We mentioned way back when we discussed density of Q and Qc in R (Theorems 2.3.14 and 2.4.76) that these results arent literally the statements that Q and
Qc are dense in R. When we say that Q is dense in R, what we mean of course is
that Cls(Q) = R (and similarly for Qc ). People abuse language and refer to the
properties of Theorems 2.3.14 and 2.4.76 as density because density in this sense
(that is, in the sense of the definition above) follows as an easy corollary of these
theorems.

The next couple of facts have to do with generating collections and their relations to concepts
reviewed in the previous subsection.
Exercise 3.2.42 Let X be a topological space, let S generate the topology of X, let

7 x S be a net, and let x X. Show that 7 x converges to x iff 7 x is


eventually contained in every element of S which contains x .
R

In other words, for the purposes of convergence, it suffices to look only at a generating
collection of the topology.

There is another characterization of quasicompactness that we did not introduce in the previous
chapter because it involves the concept of generating collections, something that we hadnt defined
in that context.
Theorem 3.2.43 Alexander Subbase Theorem. Let X be a topological space and let

S be a generating collection for the topology on X. Then, X is quasicompact iff every cover
by elements of S has a finite subcover.
R

This is of course just the defining property of quasicompactness, the only change
being that we need only check covers whose elements come from the generating
collection S .

3.2 A review

103

In particular, if S does not cover X, then X is quasicompact.

The term subbase is sometimes used for generating collections which cover the space.
In fact, people usually make the requirement that the generating collection cover the
space, but there is no need.

Proof. a () Of course, if X is quasicompact, then every open cover has a finite subcover,
and so certainly open covers that come from S will have finite subcovers.
() S T E P 1 : M A K E H Y P O T H E S E S
Suppose that every cover by elements of S has a finite subcover. To show that X is
quasicompact, we prove the contrapositive of the defining condition of quasicompactness.
That is, we show that every collection of open sets that has the property that no finite subset
covers X, also does not cover X. So, let U be a collection of open sets that has the property
that no finite subset covers X. We show that U itself does not cover X.
STEP 2: ENLARGE U TO A MAXIMAL COLLECTION
Let U be the collection of all collections of open sets which (i) contain U (ii) also have the
property that no finite subset covers X. This is a set that is partially-ordered by inclusion,
and so we intend to apply Zorns Lemma (Theorem A.1.106) to extract a maximal such
element. So, let W be a well-ordered subset of U and define
W0 :=

W.

(3.2.44)

W W

Certainly W0 is a collection of open sets, a collection which contains U . In order to be


an upper-bound for W , however, we need to check that no finite subset covers X. So,
let W1 , . . . ,Wm W0 . Then, each Wk Wk for some Wk W . Because W is in particular
totally-ordered, one of W1 , . . . , Wm must contain all the others, and in particular, all the Wk s
are contained in a single Wk . It follows that {W1 , . . . ,Wm } cannot cover X.
Therefore, by Zorns Lemma, there is a maximal collection of open sets U0 that (i)
contains U and (ii) has the property that no finite subset covers X. To show that U does not
cover X, it suffices to show that U0 does not cover X.
S T E P 3 : S H O W T H AT I F A N E L E M E N T O F U0 C O N TA I N S A N I N T E R S E C T I O N
O F O P E N S E T S , I T M U S T C O N TA I N O N E O F T H O S E S E T S
Let U U0 and suppose that U1 Um U for U1 , . . . ,Um open. We proceed by
contradiction: suppose that Uk
/ U0 for all k. This means that, by maximality, for each Uk ,
there are finitely many Uk1 , . . . ,Uknk such that X = Uk Uk1 Uknk . But then
!
!
nk
nk
m [
m
m [
m
[
\
[
\

l
l
U
Uk =
Uk
Uk
Uk Uk1 Uknk = X, (3.2.45)
k=1 l=1

k=1

k=1 l=1

k=1

so that
U

nk
m [
[

Ukl ,

k=1 l=1

a contradiction. Therefore, some Uk X.


S T E P 4 : D E D U C E T H AT U0 D O E S N O T C O V E R X

(3.2.46)

Chapter 3. Basics of general topology

104
Define
V0 := U0 S .

(3.2.47)

First of all, as no finite subset of U0 covers X, in particular, X


/ U0 , so that
V0 = U0 (S {X}).

(3.2.48)

As every element of V0 comes from U0 , it follows that no finite subset of V0 covers X. On


the other hand, every element of V0 comes from S , so that, by hypothesis, it in turn follows
that V0 does not cover X. Thus, we will be done if we can show that
[

V=

V V0

U.

(3.2.49)

UU0

The inclusion is obvious because V0 U0 . For the other inclusion, let x UU0 U.
Then, x U for some U U0 . Because S is a generating collection, the collection of all
finite intersections of elements of S {X} is a base (see Proposition 3.1.12), and so there
are U1 , . . . ,Um S such that x U1 Um U. But then, by the previous step, Uk U0
for some Uk . Of course, Uk came from S {X}, and so in fact Uk U0 (S {X}) = V0 ,
S
so that indeed x V V0 V .

S

a Proof

3.3

adapted from [12, pg. 139].

Filter bases
This section is a bit of an aside and can probably be skipped without too much trouble. Our
motivation for covering filter bases is (i) it will help us demonstrate by our definition of subnet is
the correct one, as opposed to, for example, the one currently given5 on Wikipedia, and (ii) it is
something that is important enough that you should probably at least be aware of and will almost
certainly eventually encounter if you decide to become a mathematician.
Filter bases are actually an alternative to nets. In principle, one could do the entirety of topology
never speaking of nets and instead using only filter bases. This would be one motivation for
introducing them (though we have decided to primarily stick to nets).
Definition 3.3.1 Filter base Let (X, ) be a partially-ordered set and let F X be

nonempty and not containing the empty-set. Then, F is a filter base of X iff for x1 , x2 F,
there is some x3 F such that x3 x1 , x2 .a
R

This is the definition of an abstract filter base in any partially-ordered set. For us, we
will essentially only be interested in filter bases of the partially-ordered set (2X , )
for X a topological space: if F is a filter base of (2X , ), X a topological space, then
we shall say that F is a filter base in X. (Note that the elements of filter base do not
have to be open sets.)

This condition is exactly analogous to the condition for directed sets , that for
x1 , x2 , there is some x3 with x3 x1 , x2 . In fact, you might even say that the
definition is what it is because a filter base ordered by reverse-inclusion is a directed
set.

a This

55

property is called being downward-directed.

July 2015

3.3 Filter bases

105

The relation between nets and filter bases is given by the following.
Definition 3.3.2 Derived filter base Let X be a topological space, let 7 x X be a

net, and define


F 7x := {F X : F eventually contains 7 x } .

(3.3.3)

Exercise 3.3.4 Show that F is a filter base.

F 7x is the derived filter of 7 x .


R

Given a net, we just defined a canonically associated filter. Of course, there will be
many nets which give us this filter. For example, I can change a single term of a net
without affecting its derived filter.a

Because of this, one might argue that filter bases are more fundamental than nets.
Nets somehow contain extra information that is completely irrelevant to topology. An
example of this is how one can always throw away finitely many terms of a sequence
without affecting anything of importance. Because the derived filter base of a net only
contains information about what eventually happens with the net, this extraneous
information is lost when passing from the net to its derived filter.

I personally find nets much more intuitive than filter bases (probably because they
are a much more straightforward generalization of sequences than filter bases are),
and thinking of how filter bases come from nets helps me understand some of the
intuition of filter bases themselves.

a In

general at least. In stupid cases, of course, e.g. if the domain of the net is a single point, then this will
change the derived filter.

At the bare minimum, in order for filter bases and nets to be effectively equivalent for the
purposes of topology, we must at least (i) define convergence of filter bases and (ii) show that
convergence of a net agrees with convergence of its derived filter.
Definition 3.3.5 Limit (of a filter base) Let X be a topological space, let F be a filter

base in X, and let x X. Then, x is a limit of F iff for every open neighborhood U of
x there is some F F such that F U. If a filter base has a limit, then we say that it
converges.

Proposition 3.3.6 Let X be a topological space, let 7 x X be a net, and let x X.

Then, 7 x converges to x iff F 7x converges to x .

Proof. () Suppose that 7 x converges to x . Let U be an open neighborhood of


x . Then, 7 x is eventually contained in U. Therefore, U F 7x , and so of course
F 7x converges to x .
() Suppose that F 7x converges to x . Let U be an open neighborhood of x . Then,
there is some F F 7x such that F U. By definition of derived filter bases, it follows
that 7 x is eventually contained in F, and hence eventually contained in U. Therefore,
7 x converges to x .

Now we turn to filterings and subnets. A filtering is to a filter base as a subnet is to a net. Recall
that one of our motivations for talking about filter bases at all was to argue that our definition

Chapter 3. Basics of general topology

106

of subnet was the correct one. One nice thing about filter bases is that there is no confusion
about what the definition of a filtering should be.6 We will then show that our definition of subnet
coincides with the notion of a filtering.
Definition 3.3.7 Filtering Let F be a filter base on X. Then, a filtering of F is a filter

base G that has the property that, for every F F , there is some G G such that G F.
R

Note that in some placea this is called a refinement. This is poor terminology because
it disagrees with the usual definition of refinements of coverssee Definition 4.2.26.
For comparison, we reproduce that definition here.
G is a refinement of F iff for every G G there is some F F such
(3.3.8)
that G F.

a *cough*Wikipedia*cough*

Exercise 3.3.9 Let F and G be filter bases. Show that if F G , then G is a filtering of

F.

In fact, for derived filter bases, every filtering is of this form.


Proposition 3.3.10 Let X be a topological space, let 3 7 x X be a net, and let

0 3 7 be a function such that 7 x is a net. Then the following are equivalent.


(i). 7 x is a subnet of 7 x .
(ii). F 7x F7x .
(iii). F7x is a filtering of F 7x .
R

In particular, as the definitions of subnet given in [12] (Proposition 2.4.146) and


on Wikipediaa (Proposition 2.4.148) are not equivalent (Example 2.4.147 and Exercise 2.4.149) to our definition of subnet (Definition 3.2.5), they are in turn not
equivalent to the filtering of filter bases!

Of course, in general there are filterings not of this form, but for derived filter bases,
filtering is equivalent to containing (as sets).

Proof. ((i) (ii)) Suppose that 7 x is a subnet of 7 x . Let F F 7x . Then,


7 x is eventually in F (see the definition of subnet, Definition 3.2.5), and so 7 x is
eventually in F, and so F F7x .
((ii) (iii)) Exercise 3.3.9
((iii) (i)) Suppose that F7x is a filtering of F 7x . Let F X be such that
F eventually contains 7 x . Then, F F 7x , and so there is some F 0 F7x
such that F 0 F. As 7 x is eventually contained in F 0 , it is eventually contained
in F, and so 7 x is a subnet of 7 x (once again, by the definition of subnets,
Definition 3.2.5).

a As

of 16 July 2015

In the next section, we present several new ways of defining topologies. One of these ways will
be by defining what it means for filters to converge, and so present here as theorems the results that
6 Though

evidently there is some confusion as to what they should be calledsee the remark in the definition below.

3.4 Equivalent definitions of topological spaces

107

will be used as axioms.


Proposition 3.3.11 Kelleys Filter Convergence Axioms Let X be a topological space.

Then,
(i). Px converges to x, where Px := {U X : x U};a
(ii). F converges to x iff for every filtering G F , there is some filtering H G such
that H converges to x; and
(iii). for all directed sets I and filters F i converging to xi X, for i I, if Fi7xi converges
to x , then
F := {U X : there exists iU I such that,

whenever i iU , U F i for some F i F i .

(3.3.12)

also converges to x .
R

In other words, F consists of those sets that eventually contain some element of
F i.

Note that these are completely analogous to Kelleys (Net) Convergence Axioms
(Theorem 3.2.6).

Perhaps this could be made into an argument that somehow nets are more
fundamentalto define a topology by convergence of filters, you have to make
use of nets (in this case, i 7 xi ).b

Proof. We ask you to prove (i) and (ii).


Exercise 3.3.13 Show (i).
Exercise 3.3.14 Show (ii).
R

Hint: Check out Proposition 2.4.152.

Let I be a directed set, for each i I let F i be a filter converging to xi X, and suppose
that Fi7xi converges to x . By Proposition 3.3.6, i 7 xi converges to x . To show that F
converges to x , it suffices to show that every open neighborhood of x is an element of
F . So, let U be an open neighborhood of x . Then, as Fi7xi converges to x , it follows
that there is some V Fi7xi such that V U. As i 7 xi is eventually contained in V , there
is some i0 I such that, whenever i i0 , it follows that xi V U. Thus, for i sufficiently
large, U is an open neighborhood of xi . But then, because F i converges to xi , it follows
that there is some F i F i such that F i U, and so U F as desired.

a P is for principal, the etymology being from the use of the word principal in the context of ideals in
ring theory (to the best of my knowledge anyways).
b Or rather, I am not aware of a way to state the axioms without at least implicitly using nets.

3.4

Equivalent definitions of topological spaces


There is more than one way to define a topology. By definition, the specification of a topology is
just the specification of what sets are open. Sometimes, however, what the open sets should be is

Chapter 3. Basics of general topology

108

not nearly as obvious as, for example, what convergence of nets should mean. In this section, we
present a couple of other ways you may define a topological space. Which one is most useful, of
course, will depend on the particular problem at hand.
To summarize, we have already shown that we may define a topology in the following ways.
We can define a topology
(i). by specifying the open sets (Definition 3.1.1);
(ii). by specifying the closed sets (Exercise 3.1.2);
(iii). by specifying a base for the topology (Proposition 3.1.6);
(iv). by specifying a neighborhood base for the topology (Proposition 3.1.11); or
(v). by specifying a generating collection for the topology (Proposition 3.1.12).
Of course, there are many other ways to specify a topology as well, and it is the purpose of this
section to list several others ways.
3.4.1

Definition by specification of closures or interiors


We have mentioned the Kuratowski Closure (Interior) Axioms as well as Kelleys (Filter) Convergence Axioms. These play a role analogous to (i)(iii) in the definition of a topological space,
Definition 3.1.1. For example, we have the following.
Theorem 3.4.1 Kuratowskis Closure Theorem. Let X be a set and let C : 2X 2X be

a function on the power-set of X. Then, if


(i). C(0)
/ = 0;
/
(ii). S C(S);
(iii). C(S) = C (C(S)); and
(iv). C(S T ) = C(S) C(T ),
then there exists a unique topology on X such that Cls(S) = C(S).
Proof. S T E P 1 : M A K E H Y P O T H E S E S
Suppose that (i) C(0)
/ = 0,
/ (ii) S C(S), (iii) C(S) = C (C(S)), and (iv) C(S T ) =
C(S) C(T ).
S T E P 2 : S H O W T H AT S T I M P L I E S C(S) C(T )
Suppose that S T , so that T = S (T \ S), and hence C(T ) = C(S) C(T \ S), and so
C(S) C(T ).
S T E P 3 : D E F I N E W H AT S H O U L D B E T H E C L O S E D S E T S
The idea of the proof is that the closed sets should be precisely the sets that are equal to
their closure (Proposition 3.2.25). We thus make the definition


C := C 2X : C = C(C) .
(3.4.2)
S T E P 4 : V E R I F Y T H AT T H I S D E F I N E S A T O P O L O G Y

3.4 Equivalent definitions of topological spaces

109

By (i), the empty-set is closed (by which we mean it is an element of C ). By (ii), we have
X C(X) X,

(3.4.3)

so that X is closed as well. Let D C and define


B :=

(3.4.4)

CD

Then, of course, B C for all C D, and so by Step 2, we have that C(B) C(C) for all
C D, and so
C(B)

C(C) =

CD

C =: B.

(3.4.5)

CD

As B C(B) by (ii), we have that B = C(B), so that C is closed under arbitrary intersections.
We now check that it is closed under finite unions, so let C, D C . Then,
C(C D) = C(C) C(D) = C D,

(3.4.6)

and so C D C . Thus, C is closed under finite unions, and hence defines a topology.
S T E P 5 : S H O W T H AT Cls(S) = C(S)
Let S X. By (iii), C(S) is closed, and by (ii), it contains S. Therefore, Cls(S) C(S).
It follows that C (Cls(S)) C(S). On the other hand, because S Cls(S), it follows that
C(S) C (Cls(S)) = Cls(S), and so indeed Cls(S) = C(S).

S T E P 6 : D E M O N S T R AT E U N I Q U E N E S S
If another topology V satisfies this property, that is, ClsV (S) = C(S), then we have that
ClsV (S) = Cls(S) (no subscript indicates the closure in the topology defi8ned by C ). It
follows that a set S is closed with respect to V iff it is equal to ClsV (S) iff it is equal to
Cls(S) iff it is closed in the topology defined by C . Thus, the two topologies have the same
closed sets, and hence are the same.
Similarly, we have a dual interior theorem.
Theorem 3.4.7 Kuratowskis Interior Theorem. Let X be a set and let I : 2X 2X be a

function on the power-set of X. Then, if


(i). I(X) = X;
(ii). I(S) S;
(iii). I(S) = I (I(S)); and
(iv). I(S T ) = I(S) I(T ),
then there exists a unique topology on X such that Int(S) = I(S).
R

We omit the proof as it is completely dual to the corresponding closure proof.

Chapter 3. Basics of general topology

110
3.4.2

Definition by specification of convergence


The previous two results showed that we can define a topology by defining what the closure or
interior of every set should be. The next result says that we can define a topology by defining what
it means for nets to converge.
Theorem 3.4.8 Kelleys Convergence Theorem. Let X be a set, denote by N the

collection of all nets in X, and let be a relation on N X. Then, if


(i). ( 7 x ) x ;

(ii). ( 7 x ) x iff every subnet 7 x has in turn a subnet 7 x such that


( 7 x ) x ; and
(iii). for all directed
sets I and nets xi :i X, if xi (xi ) and (i 7 (xi ) ) (x ) ,

then I iI i 3 (i, ) 7 (xi ) i (x ) ,


then there is a unique topology on X such that 7 x converges to x iff ( 7 x ) x .
R

People sometimes attempt to define a topology by defining what it means to sequences


to converge. This is nonsensical. For example, (iii) doesnt even make sense in
this context. Moreover, because of examples like Example 3.2.30 (the cocountable
topology and discrete topology on R have the same notion of sequential convergence),
trying to rectify this in general is hopeless. My best guess is that this is because people
tend to shy away from the usage of nets for some reason, and so they tend to define
topologies using convergence of sequences. In any case, I dont recall ever seeing a
case where such a definition was given that didnt make sense after you replaced the
word sequence with the word net. That is to say, even though they are technically
wrong, there is usually a very easy fix.

Proof. S T E P 1 : M A K E H Y P O T H E S E S
Suppose that (i) ( 7 x ) x ; (ii) ( 7 x ) x iff every subnet 7 x has in turn a
subnet 7 x such that ( 7 x ) x ; and (iii) for all directed sets I and nets xi : i


X, if xi (xi ) and (i 7 (xi ) ) (x ) , then I iI i 3 (i, ) 7 (xi ) i (x ) .


S T E P 2 : D E F I N E T H E N O T I O N O F A adherent point
For a subset S X and x X, we say that x is an adherent of S iff there is some net
7 x S such that ( 7 x ) x.
S T E P 3 : D E F I N E W H AT S H O U L D B E T H E C L O S U R E
For a subset S X, we define C(S) to be the set of adherent points of S.
S T E P 4 : S H O W T H AT C S AT I S F I E S K U R AT O W S K I S C L O S U R E A X I O M S
The empty-set has no adherent points points, and so trivially C(0)
/ = 0.
/
It follows from (i) that S C(S).
We wish to show that C (C(S)) C(S), so let (x ) C (C(S)). Then, there is a
net I 3 i 7 xi C(S) such that (i 7 xi ) (x ) . As each xi C(S), there is a net
i 3
i 7 (xi ) i S such that
i 7 (xi ) i (xi ) . By (iii), it then follows that

i
i
I iI 3 (i, ) 7 (x ) i (x ) . As each (xi ) i S, it follows that (x ) is an
adherent point of S, which completes the proof that C (C(S)) = C(S) (the C(S) C (C(S))
follows from the fact that S C(S)).
We now show that C(S T ) = C(S) C(T ). We have that S C(S T ), and so C(S)
C(S T ). Similarly for T , and so we have C(S) C(T ) C(S T ).

3.4 Equivalent definitions of topological spaces

111

We now show that C(S T ) C(S) C(T ). So, let x C(S T ), so that there is a net
3 7 x S T such that ( 7 x ) x . Define
S := { : x S} and T := { : x T } .

(3.4.9)

As = S T , either S or T is cofinal in . Without loss of generality, suppose that


S is cofinal in , so that x|S is a strict subnet of 7 x . By hypothesis, this in turn
must have a subnet 7 x , S , such that ( 7 x ) x . Thus, x C(S), and so
C(S T ) C(S) C(T ).
It follows by Kuratowskis Closure Theorem (Theorem 3.4.1) that there is a unique
topology on X such that Cls(S) = C(S).
S T E P 5 : S H O W T H AT 7 x C O N V E R G E S T O x I F F ( 7 x ) x
Suppose that 3 7 x converges to x . We proceed by contradiction: suppose that it is
not the case that ( 7 x ) x . Then, there must be some subnet I 3 7 x that has no
subnet 7 x such that ( 7 x ) x . To obtain a contradiction, we construct such
a subnet. For each 0 , define S0 := {x : 0 }. As 7 x converges to x , so does
7 x , and so x Cls(S0 ) for all 0 . Hence, by the definition of our closure, for each


0 I there is a net 0 3 0 7 x S0 , so that 0 0 , with 0 7 x
0
0


a

0
x . From (iii), it follows that I 0 I 3 (0 , ) 7 x ) x . Thus, if this is
0
in fact a subnet of 7 x , we will have our contradiction. To show that this isindeed a
subnet, let 0 be arbitrary. To show this, we apply Proposition 2.4.146. Let 0 0 I 0
be arbitrary, and suppose that (, ) (0 , 0 ). Recall that we have that 0 0 always,
and so certainly we will have that 0 0 .
Now suppose that ( 3 7 x ) x . We proceed by contradiction: suppose that
7 x does not converge to x . Then, there is a subnet 7 x which has no subnet
converging to x . In particular, 7 x itself does not converge to x , and so there is an
open neighborhood U of x which does not eventually contain 7 x . It follows that
I := { : x
/ U} is cofinal, so that I 3 7 x is a subnet contained in U c . On the
other hand, we know that (I 3 7 x ) x , so that x Cls({x : I}), but this is a
contradiction of the fact that it has an open neighborhood which contains no point of this set.
S T E P 6 : D E M O N S T R AT E U N I Q U E N E S S
Recall that sets are closed iff they contain all their limit points. If we have two topologies
with the same notion of convergence, then the set of limit points of a given set in each
topology are the same, and consequently, a set is closed in one iff it is closed in the
other.

a The

superscript 0 in 0 is just to help us keep track of which directed set 0 is contained in.

One thing to note is that, in practice, it is often easier to check two conditions which are
equivalent to the second axiom: (i) subnets of convergent nets converge to the same thing and (ii)
that () direction of the second axiom.
Proposition 3.4.10 Let X be a set, denote by N the collection of all nets in X, and let

be a relation on N X. Then, the following are equivalent.

(i). ( 7 x ) x iff every subnet 7 x has in turn a subnet 7 x such that


( 7 x ) x .

Chapter 3. Basics of general topology

112

(ii). (a). If ( 7 x ) x and 7 x is a subnet of 7 x , then ( 7 x ) x ;


and
(b). If every subnet 7 x has in turn a subnet 7 x such that ( 7 x ) x ,
then ( 7 x ) x .
Proof. () Suppose that ( 7 x ) x iff every subnet 7 x has in turn a subnet
7 x such that ( 7 x ) x . (ii)(b) holds by hypothesis. We thus check (ii)(a). So,
let 7 x be a net such that ( 7 x ) x and let 7 x be a subnet. To show that
( 7 x ) x , we show that every subnet 7 x has in turn a subnet 7 x such

that ( 7 x ) x . So, let 7 x be a subnet of 7 x . This itself is also a subnet

of 7 x , and so by hypothesis, it has a subnet 7 x such that ( 7 x ) x .

() Suppose that (a) if ( 7 x ) x and 7 x is a subnet of 7 x , then


( 7 x ) x ; and (b) if every subnet 7 x has in turn a subnet 7 x such that
( 7 x ) x , then ( 7 x ) x . The () direction of (i) is true by hypothesis, so it
suffices to show the () direction of (i). So, suppose that ( 7 x ) x and let 7 x
be a subnet. Then, by hypothesis, we also have that ( 7 x ) x , and so this subnet
7 x has in turn a subnet (namely itself) that is related to x by .

And of course, there is an analogous result for filters.
Theorem 3.4.11 Kelleys Filter Convergence Theorem. Let X be a set, denoted by F

the collection of all filter bases in X, and let be a relation on F X. Then, if


(i). Px x, where Px := {U X : x U};a

(ii). F x iff for every filtering F 3 G F , there is some filtering F 3 H G such


that H x; and
(iii). for all directed sets I and convergent filters F i xi X, for i I, if Fi7xi x ,
then
F := {U X : there exists iU I such that,

whenever i iU , U F i for some F i F i . x ,

(3.4.12)

then there is a unique topology on X such that F converges to x X iff F x .


Proof. S T E P 1 : M A K E H Y P O T H E S E S
Suppose that (i)Px x, where Px := {U X : x U}; (ii) F x iff for every filtering
F 3 G F , there is some filtering F 3 H G such that H x; and (iii) for all directed
sets I and convergent filters F i xi X, for i I, if Fi7xi x , then F x .
S T E P 2 : D E F I N E A R E L AT I O N O N N X
Let N denote the collection of all nets in X, and for 7 x N and x X, define
( 7 x ) x iff F 7x x .b
STEP 3: VERIFY (I) OF KELLEYS CONVERGENCE THEOREM

(3.4.13)

3.4 Equivalent definitions of topological spaces

113

As the derived filter base of the constant net 7 x is just the principal filter base Px , it
follows that (i) of Kelleys Convergence Theorem, Theorem 3.4.8 holds.
S T E P 4 : V E R I F Y () O F ( I I ) O F K E L L E Y S C O N V E R G E N C E T H E O R E M
Let 7 x be a net such that ( 7 x ) x , and let 7 x be a subnet of 7 x .
Then, F 7x x . As F7x F 7x , by (ii), there is a filter base G F7x with
G x . Without loss of generality, we may assume that each G G contains some x (we
may throw away all such sets which do not contain some x , and as ( 7 x ) x < this
will not affect the fact that G x ). For G1 , G2 G , define G1 G2 iff G1 G2 . By the
definition of a filter base (Definition 3.3.1), it follows that (G , ) is a directed set. For each
G G , let xG denote any element contained of the net 7 x contained in G.
We first check that G 7 xG is a subnet of 7 x . So, let U X eventually contain
7 x . Then, U F7x , and so U G because G F7x . Now suppose that
G U, so that, by definition of on G as a directed set, G U. Hence, for G U,
xG G U, so that G 7 xG is eventually contained in U, and hence is a subnet of
7 x .
Thus, as G 7 xG is a subnet of 7 x and ( 7 x ) x , it follows that (G 7
xG ) x . Thus, the arbitrary subnet of 7 x of 7 x does indeed have a subnet that
converges to x ,c and so () of (ii) of Kelleys Convergence Theorem is indeed satisfied.
S T E P 5 : V E R I F Y () O F ( I I ) O F K E L L E Y S C O N V E R G E N C E T H E O R E M
Let 7 x be a net such that for every subnet 7 x , there is some subnet of that
subnet 7 x such that ( 7 x ) x . We wish to show that ( 7 x ) x . In
other words, we want to show that F 7x x . To show this, we apply (ii). So, let
G F 7x be a filter base. Then, using the same technique as in the previous step,d there
is a subnet G 3 G 7 xG with G ordered by reverse-inclusion. By hypothesis, there is then
in turn a subnet of this 7 xG such that ( 7 xG ) x , and hence F7xG x . If

F7xG G , this completes this step by (ii). To show this, let G0 G and suppose that

G G0 . Then, G G0 , and so xG G G0 , so that G 7 xG is eventually contained in G0 .


Therefore, by the definition of a subnet (Definition 3.2.5), 7 xG is eventually contained
in G, and so G F7xG , and so F7xG G .

STEP 6: SHOW (III) OF KELLEYS CONVERGENCE THEOREM


Let I be a directed set, for each i I let xi : i X be a net such that xi (xi ) , and
suppose that (i 7 (xi ) ) (x ) X. Note that, for F i := F i 7(xi ) i as in (iii),
F := {U X : there exists iU I such that, whenever i iU ,

it follows that U eventually contains i 7 (xi ) i .

(3.4.14)

By (iii), F x .
We wish to show that
!
I

3 (i, ) 7 (x ) i

(x ) .

(3.4.15)

iI

In other words, we would like to show that


F(i, )7(xi ) i (x ) .

(3.4.16)

Chapter 3. Basics of general topology

114
To show this, by (ii), it suffices to show that
F F(i, )7(xi ) i .

(3.4.17)

So, let U F . Then, there is some iU I such that, whenever i iU , it follows that U
eventually contains i 7 (xi ) i . It follows from this that, for all such i,there is a ( i )U
such that, whenever i ( i )U , it follows that (xi ) i U. Define U iI i such that
(U )i := ( i )U for i iU and anything for i 6 iu . Then, whenever (i, ) (iU , U ), it follows
that i iU and each i (U )i := ( i )U , so that
(xi ) i U.

(3.4.18)

Thus, U eventually contains (i, ) 7 (xi ) i , and we are done with this step.
S T E P 7 : S H O W T H AT T H E T O P O L O G Y S AT I S F I E S F x I F F F C O N V E R G E S
T O x
After verifying (i)(iii) of Kelleys Convergence Theorem, it follows that there is a unique
topology on X that has the property that ( 7 x) x iff 7 x converges to x . We now
check that the analogous property in terms of filters holds for this topology.
For the remainder of this step, we will make use of Kelleys Filter Convergence Axioms
(Proposition 3.3.11), which we already know to be true about actual convergence of filter
bases.
Suppose that F x . To show that F converges to x , we prove that any filtering
G F has in turn a filtering which converges to x . Order G by reverse-inclusion and pick
xG G G so that G 7 xG is a net. It follows that FG7xG G , and so by (ii), FG7xG x ,
and so (G 7 xG ) x , and so G 7 xG converges to x . To show that FG7xG converges
to x , let U X be an open neighborhood of x . Then, G 7 xG is eventually in U, and so
U U FG7xG . Thus, every filtering of F has in turn a filtering which converges to x ,
and so F converges to x .
The converse follows because this argument is converges to symmetric.

a P

is for principal, the etymology being from the use of the word principal in the context of ideals in
ring theory (to the best of my knowledge anyways).
b Even though there is no topology around (yet), the notion of eventually containing still makes sense, and
so of course F 7x still makes sense.
c Converges to is in quotes because what we really mean is that it is related to x by the relation .

d Turn G into a directed set by reverse-inclusion, show that each G G must without loss of generality
contain some x , and pick any one such lambda to form a subnet G 7 xG .

And of course, there is an analogous result to Proposition 3.4.10 which gives an equivalent
characterization of the second axiom which can sometimes be easier to check.
Proposition 3.4.19 Let X be a set, denoted by F the collection of all filter bases in X, and

let be a relation on F X. Then, the following are equivalent.


(i). F x iff for every filtering F 3 G F , there is some filtering F 3 H G such
that H x.
(ii). (a). If F x and G F , then G x ; and
(b). If every filtering F 3 G F , there is some filtering F 3 H G such that
H x.
Proof. We leave this as an exercise.

3.4 Equivalent definitions of topological spaces

115

Exercise 3.4.20 Complete the proof yourself, using the proof of Proposition 3.4.10

as a guide.

3.4.3

Definition by specification of continuous functions


The following two results define a topology on a set by simply declaring that a collection of
functions be continuous. This is similar in nature to how we defined a topology with a generating
collection (Proposition 3.1.12)in this case, we started with a collection of subsets, and simply
declared them to be open.
Proposition 3.4.21 Initial topology Let X be a set, let Y be an indexed collection of

topological spaces, and for each Y Y let fY : X Y be a function. Then, there exists a
unique topology U on X, the initial topology with respect to { fY : Y Y }, such that
(i). fY : X Y is continuous with respect to U for all Y Y ; and
(ii). if U 0 is another topology for which each fY : X Y is continuous, then U U 0 .
Furthermore, if SY generates the topology on Y , then the collection
{ fY1 (U) : Y Y , U SY }

(3.4.22)

generates U .
R

In other words, the initial topology is the smallest topology for which each fY is
continuous.

But what about the largest such topology? Well, the largest such topology is always
going to be the discrete topology, which is not very interesting. This is how you
remember whether the initial topology is the smallest or largestit cant be the
largest because the discrete topology always works.

In particular,
{ fY1 (V ) : Y Y , V Y open}

(3.4.23)

generates the initial topology.


R

Warning: { fY1 (U) : Y Y , U SY } need not be a base for the topology even if
each SY is (indeed, or even if each SY is the topology of Y itself)see, for example,
the The product topology (Proposition 3.5.10).
Compare this with the definition of the integers, rationals, closure, interior, and
generating collections (Theorems 1.2.1 and 1.3.4 and Propositions 3.1.12, 3.2.19
and 3.2.22).

Proof. For Y Y , let SY be any generating collection, define


S := { fY1 (V ) : Y Y , V SY },

(3.4.24)

and let U be the topology generated by S (Proposition 3.1.12).


As every element of S is open, it is certainly the case that each fY is continuous by
Exercise 3.1.28 (it suffices to check continuity on a generating collection). On the other

Chapter 3. Basics of general topology

116

hand, if U 0 is another topology for each each fY is continuous, then it must certainly contain
S , in which case U 0 contains U by the definition of a generating collection.
Exercise 3.4.25 Show that the initial topology is unique.


There is a nice characterization of convergence in initial topologies.
Proposition 3.4.26 Let X have the initial topology with respect to the collection { fY : Y

Y }, let 7 x X be a net, and let x X. Then, 7 x converges to x in X iff


7 fY (x ) converges to fY (x ) in Y for all Y Y . Furthermore, the initial topology is
the unique topology that has this property.
Proof. () Suppose that 7 x converges to x in X. Then, because each fY is continuous,
it follows that 7 fY (x ) converges to fY (x ) in Y for all Y Y .
() Suppose that 7 fY (x ) converges to fY (x ) in Y for all Y Y . To show that
7 x converges to x in X, we apply Exercise 3.2.42 (it suffices to check convergence on
a generating collections). We know that
{ fY1 (U) : Y Y , U Y open}

(3.4.27)

generates the initial topology, and so we apply Exercise 3.2.42 to this. So, let fY1 (U)
be an open neighborhood of x . We must show that 7 x is eventually contained in
fY1 (U). However, if fY1 (U) is an open neighborhood of x , then U fY ( fY1 (U)) is an
open neighborhood of fY (x ), and so 7 fY (x ) is eventually contained in U because
7 fY (x ) converges to fY (x ). But then 7 x is eventually contained in U, and we
are done.
Uniqueness follows from Definition by specification of convergence.

Initial topologies are great in that you can determine whether functions into X are continuous
or not by looking at their composition with each fY .
Proposition 3.4.28 Let X have the initial topology with respect to the collection { fY : Y

Y }, let Z be a topological space, and let f : Z X be a function. Then, f is continuous iff


fY f is continuous for all Y Y . Furthermore, the initial topology is the unique topology
with this property.
Proof. () Suppose that f is continuous. Then, because each fY is itself continuous and
compositions of continuous functions are continuous, it follows that fY f is continuous for
all Y Y .
() Suppose that fY f is continuous for all Y Y . Let 7 x converge to x X. To
show that f is continuous, it suffices to show that 7 f (x ) converges to f (x ). However,
because each fY f is continuous, 7 fY ( f (x )) converges to fY ( f (x )). Therefore, by
the previous result, we do indeed have that 7 f (x ) converges to f (x ).
Uniqueness follows from the uniqueness in the previous proposition.

3.4 Equivalent definitions of topological spaces

117

There is a dual version of the initial topology, in which the functions map into the set on
which we would like to define a topology.
Proposition 3.4.29 Final topology Let X be a set, let Y be an indexed collection of

topological spaces, and for each Y Y let fY : Y X be a function. Then, there exists a
unique topology U on X, the final topology with respect to { fY : Y Y }, such that
(i). fY : Y X is continuous with respect to U for all Y Y ; and
(ii). if U 0 is another topology for which each fY is continuous, then U U 0 .
Furthermore,
U = {U 2X : fY1 (U) is open for all Y Y .}.

(3.4.30)

In other words, the final topology is the largest topology for which each fY is
continuous.

But what about the smallest such topology? Well, the smallest such topology is
always going to be the indiscrete topology, which is not very interesting. This is how
you remember whether the final topology is the smallest or largestit cant be the
smallest because the indiscrete topology always works.

Proof. Define
U := {U 2X : fY1 (U) is open for all Y Y .}.

(3.4.31)

Exercise 3.4.32 Check that U is actually a topology.


R

We didnt need to do any such checking in the construction of the initial


topology because there we just took the topology generated by the collection.

I am not aware of dual to Proposition 3.4.26 that characterizes convergence


in final topologies.

Of course every fY is continuous with respect to U as, by definition, the preimage of


every element of U is open. Furthermore, anything larger than U would necessarily have
to contain some set for which the preimage under some fY would not be open, and so that
fY would not be continuous.
Exercise 3.4.33 Show that the final topology is unique.


There is likewise a dual result to Proposition 3.4.28 which tells us when continuous functions
on a space equipped with a final topology are continuous.
Proposition 3.4.34 Let X have the final topology with respect to the collection { fY : Y Y },

let Z be a topological space, and let f : X Z be a function. Then, f is continuous iff f fY


is continuous for all Y Y . Furthermore, the final topology is the unique topology with
this property.
Proof. () Suppose that f is continuous. Then, because each fY is itself continuous and

Chapter 3. Basics of general topology

118

compositions of continuous functions are continuous, it follows that f fY is continuous for


all Y Y .
() Suppose that f fY is continuous for all Y Y . Let U Z be open. We must
show that f 1(U) is open. However, from (3.4.31), we know that f 1 (U) will be open iff
fY1 f 1 (U) will be open for all Y Y . However, fY1 f 1 (U) = [ f fY ]1 (U) is open
because f fY is continuous.
Let U be another topology that has the property that f : X Z is continuous iff f fY
is continuous for all Y Y . To show that U is the final topology, by the definition
(Proposition 3.4.29), it suffices to show that each fY : Y X is continuous with respect
to U and that any other such topology is contained in U . As idX : hX, U i hX, U i is
continuous, by hypothesis, it follows that idX fY = fY is continuous with respect to U . Let
U 0 be another topology such that fY is continuous with respect to U 0 for all Y Y . We
must show that U 0 U . To show this, it suffices to show that idX : hX, U i hX, U 0 i
is continuous. By the defining property of U , to show this, it suffices to show that the
composition of this with each fY is continuous. In other words, it suffices to show that each
fY is continuous with respect to U 0 , but this is true by hypothesis.


3.4.4

Summary
We now quickly recap all the ways in which we know how to specify a topology on a set.
(i). We can specify the open sets (Definition 3.1.1)..
(ii). We can specify the closed sets (Exercise 3.1.2).
(iii). We can specify a base (Definition 3.1.4).
(iv). We can specify a neighborhood base (Definition 3.1.9).
(v). We can generate a topology (Proposition 3.1.12).
(vi). We can define the closure of each set (Theorem 3.4.1).
(vii). We can define the interior of each set (Theorem 3.4.7).
(viii). We can define convergence of nets (Theorem 3.4.8).
(ix). We can define convergence of filters (Theorem 3.4.11).
(x). We can declare that functions on the space are continuous (the initial topologysee Proposition 3.4.21).
(xi). We can declare that functions into the space are continuous (the final topologysee Proposition 3.4.29.

3.5

The subspace, quotient, product, and disjoint-union topologies


The purpose of this section is to present several ways of constructing new topologies spaces
from old. In brief, the subspace topology will be the topology we put on subsets of topological
spaces, the quotient topology will be the topology we put on quotients of topological spaces,7 ,
7 Quotients

in the sense of Definition A.1.78.

3.5 The subspace, quotient, product, and disjoint-union topologies

119

the product topology is the topology we put on cartesian-products, the disjoint-union topology
(surprise, surprise) is the topology we put on disjoint-unions of topological spaces. They key to
defining all of these topologies are the initial (for the subspace and product topologies) and final
topologies (for the quotient and disjoint-union topologies) (Propositions 3.4.21 and 3.4.29).
3.5.1

The subspace topology


All subsets of topological spaces have a canonically associated topology, called the subspace
topology. Note that this is not completely immediatefor example, it is not the case that every
subset of a ring is a ring. There is definitely something to define and something to check (that the
subspace topology is in fact a topology).
Proposition 3.5.1 Subspace topology Let X be a topological space and let S X.

Then, there exists a unique topology U on S, the subspace topology, that has the property
that a function into S is continuous iff it is continuous regarded as a function into X.
Furthermore, the subspace topology is the initial topology with respect to the inclusion
: S , X. In particular,
U = {U S : U X is open.}.
R

(3.5.2)

Unless otherwise stated, subsets of topological spaces are always equipped with the
subspace topology.

Proof. All of this follows from the definition of the initial topology and its characterization of continuity of continuity of functions into initial topologies (Propositions 3.4.21
and 3.4.28), with exception of the fact that U = {U S : U X is open.}. Proposition 3.4.21
tells us that this generates the initial topology, but it does not tell us that it is a topology
itself. Of course, however, if a generating collection is itself a topology, then the topology it
generates is just itself. Therefore, it suffices just to check that {U S : U X is open.} is
in fact a topology.
Exercise 3.5.3 Check that {U S : U X is open.} is in fact a topology.




Example 3.5.4 Consider the subspace topology on [0, 1] R. Note that ( 12 , 1] = ( 12 , )

[0, 1] is open in [0, 1], despite the fact that it obviously not open in R.


Example 3.5.5 Note that the order topology on Q is the space as the subspace topology

inherited from R. This is of course because they are both equipped with the order topology
of the same order. Likewise, for N Z and Z Q.

3.5.2

The quotient topology


Whenever we have a surjective function from a topological space X onto a set Y , we can use this
function and the topology on X to place a topology on Y . Recall (Exercise A.1.80) that every
surjective function can be viewed as a quotient functionthis of course is the etymology of the
term quotient topology.

Chapter 3. Basics of general topology

120

Proposition 3.5.6 Let X be a topological space, let Y be a set, and let q : X Y be surjective.

Then, there exists a unique topology U on Y , the quotient topology, that has the property
that a function on Y is continuous iff its composition with q is continuous. Furthermore, the
subspace topology is the final topology with respect to q : X Y . In particular,
U = {U Y : q1 (U) is open.}.
R

(3.5.7)

Unless otherwise stated, quotients of topological spaces are always equipped with
the quotient topology.

Proof. All of this follows from the definition of the final topology and its characterization in
terms of continuity of functions defined on final topologies (Propositions 3.4.29 and 3.4.34).


3.5.3

The product topology


The product topology is the canonical topology we put on a cartesian-product of topological spaces
X Y . While we do technically use this in places, we use it in ways where we could have gottenaway with not speaking of the product topology per se (for example, the product topology on R R
is the same as the topology defined by -balls). The real reason we go to the trouble of talking about
the product topology explicitly is for the proof of producing a counter-example to the statement
A space is quasicompact iff every net has a convergent strict subnet.9

(3.5.8)

We mentioned when we defined subnets (Definition 3.2.5) that the notion of a strict subnet was
the more obvious naive notion (that is, you just take terms from the original net, subject to the
only condition that your indices get arbitrarily large), but that this naive notion was insufficient
because it didnt allow us to prove certain theorems. They key result, that spaces are quasicompact
iff every net has a convergent subnet (Proposition 3.2.39), was precisely the example of a theorem
we had in mind.
In brief, the counter-example will be

X := {0, 1},
(3.5.9)
2N

that is, an uncountable product (precisely, a product over the power set of N) of the two-element
set {0, 1}. Obviously {0, 1} is quasicompact (all finite spaces are)Tychnoffs Theorem is the
statement that arbitrary products of quasicompact spaces are quasicompact, which tells us that X is
quasicompact.
So before we do anything else then, we must first define the product topology.
Proposition 3.5.10 Product topology Let X be an indexed collection of topological

spaces. Then, there exists a unique


topology, the product topology, on XX X, that has
the property that a function into XX X is continuous iff each component of the function
is continuous. Furthermore, the product topology is the initial topology with respect to
{X : X X }.a In particular, if SX is a generating collection for the topology on X X ,
then the collection
{X1 (U) : U SX }

(3.5.11)

3.5 The subspace, quotient, product, and disjoint-union topologies


generates the product topology, so that
(
)

B :=
SX : SX SX and all but finitely many SX = X b

121

(3.5.12)

XX

is a base for the product topology.


R

The all but finitely many phrase is crucial. For example, in the space

R,

(3.5.13)

that is, a countably-infinite product of R, the set


(0, 1) (0, 1) (0, 1)

(3.5.14)

is not even open in the product topology on N R (much less an element of any base).
This topology (called the box topology) might be your more naive guess, but it is
wrong in the sense that, if we allow things like this, then we lose the property that
the continuity of a function is determined by the continuity of the components of the
functionsee Example 3.5.15 below.
R

Unless otherwise stated, products of topological spaces are always equipped with the
product topology.

Proof. All of this follows from the definition of the initial topology, its characterization in
terms of continuity of functions into initial topologies, and the defining result of generating
collections (Propositions 3.1.12, 3.4.21 and 3.4.28).

a Recall

(Definition A.1.57) that X : XX X X is just the projection.


b We can always without loss of generality assume that X B because, if it wasnt there before, throw it in.
X

We mentioned in the remarks of this theorem that if you take as a base sets of the form
XX SX (with SX 6= X for all X X permissible), then you will lose the property that a function
into the product is continuous iff each of its components is. We now present a counter-example.
Example 3.5.15 A discontinuous function into the box topology with each component continuous Define as a set


X :=

(3.5.16)

2N

and consider the function


idX : hX, product topologyi hX, box topologyi.

(3.5.17)

Then, each component of this function is continuous because the component of the identity
is just the projection from X onto the corresponding copy of R (the preimage of U R
under this projection is open in the product topology because in fact, according to the
theorem, such an element is in a generating collection of the product topology). On the
other hand,

(0, 1)
(3.5.18)
2N

is open in the box topology (by definition), but not open in the product topology by the
theorem above.

Chapter 3. Basics of general topology

122

There is a relatively useful corollary of the definition of the product topology that characterizes
convergence.
Corollary 3.5.19 Let X be an indexed collection of topological spaces, let 7 x

XX X be a net, and let x


XX X. Then, 7 x converges to x iff 7 (x )X
converges to (x )X in X for all X X .
In other words, nets converge to an element in a product iff every component converges to the corresponding component of that element.

Proof. As the product topology on XX X is the initial topology with respect to the
projections, {X : X X }, this result follows from Proposition 3.4.26.

Exercise 3.5.20 Projections are open Let X be an indexed collection of topological

spaces, let X X , and let U

XX

X be open. Show that X (U) X is open.

Functions that have the property that the image of open sets are open are open
functions.

Now that we have defined the product topology, we prove a relatively deep result concerning
quasicompactness of products which will allow us to produce the desired counter-example.
Theorem 3.5.21 Tychnoffs Theorem. Let X be a collection of quasicompact spaces.

Then,

XX

X is quasicompact.

Proof. a Recall that the product topology on XX X has a generating collection of the
form X1 (UX ) for UX X open, where X : XX X X is the projection. We apply
Alexanders Subbase Theorem (Theorem
3.2.43) to this generating collection.

So, let U be an open cover of XXX of subsets of the form X1 (UX ). It suffices to
show that if no finite subset of U covers XX X, then U itself does not cover X. Define


UX := U X open : X1 (U) U .
(3.5.22)

If a finite subset of UX covered X, then its preimage would cover XX X, and so by


quasicompactness of X, it follows that U
X does not cover X, so choose xX X not contained
S
in UUX U. Then, the element x XX X whose coordinate at X X is xX is not
S
contained in UU U.

a Proof

adapted from [12, pg. 143].

And finally we are able to present our counter-example.


 Example 3.5.23 A quasicompact space with a net that has no convergent strict
subnet a Define

X :=

{0, 1} = {0, 1}2

(3.5.24)

SN

that is, a product of 20 copies of the two element set {0, 1}. In other words, it is the set
of all functions from 2N into {0, 1}in fact, for the most of this example, we shall think
of this space as the collection of functions. This is quasicompact by Tychnoffs Theorem
(and because finite sets are quasicompactsee Exercise 3.2.36). On the other hand, we may

3.5 The subspace, quotient, product, and disjoint-union topologies


define a sequence m 7 xm X as follows.
(
1 if m S
xm (S) :=
,
0 if m
/S

123

(3.5.25)

where S N (xm is a function from 2N into {0, 1}, and so xm (S) is the value of this function
at the element S 2N ).
We now show that this sequence has no convergent strict subnet (necessarily also a sequence).
We proceed by contradiction: suppose that there were a cofinal subset 0 N, 0 =
{m0 , m1 , m2 , . . .} (with mn mn+1 ), such that n 7 xmn converges. Then, by Corollary 3.5.19
(nets in products converge iff each component does), for each S N, the sequence n 7
xmn (S) {0, 1} would have to converge. Thus, for each S N, the sequence n 7 xmn (S)
would have to be either eventually 0 or eventually 1. In other words, either (i) for all but
finitely many mn 0 , mn
/ S; or (ii) for all but finitely many mn 0 , mn S. In other
words, for all S N, either (i) there is a cofiniteb subset of 0 that is contained in Sc or (ii)
there is a cofinite subset of 0 that is contained in S.c
So, take S := {m0 , m2 , m4 , . . .} 0 . Then, there is some cofinite subset 00 0 such that
either 00 S or 00 Sc . In the former case, we have that
finite set = 0 \ 00 0 \ S = {m1 , m3 , m5 , . . .} :

(3.5.26)

a contradiction. Thus, we must have that 00 Sc , and so


finite set = 0 \ 00 0 \ Sc = {m0 , m2 , m4 , . . .} :

(3.5.27)

a contradiction. As both possibilities resulted in a contradiction, this itself is a contradiction, and so our assumption that there was a convergent strict subnet must have been
incorrect. Therefore, m 7 xm contains no convergent strict subnet, despite the fact that X is
quasicompact.
a This

example was shown to me by Eric Wofsey on mathoverflow.net.


means that the complement is finite.
c Cofinite in 0 , that is.

b Cofinite

3.5.4

The disjoint-union topology


Our inclusion of the disjoint-union topology is mostly because of its obvious duality to the product
topologyit feels incomplete not to include it. On the other hand, while in principle, being
completely dual to the product topology, it should be no more or less difficult work with, in practice
it seems that it is much easier to get a handle on, and so probably doesnt deserve as in-depth a
treatment.
Proposition 3.5.28 Disjoint-union topology Let X be an indexed collection of topo-

logical
spaces. Then, there exists a unique topology, thedisjoint-union topology, on

X,
that has the property that a function defined on XX X is continuous iff its
XX
restriction to each component is continuous. Furthermore, the disjoint-union
topology is the
a
final topology with respect to {X : X X }. In particular, a set U XX X is open iff
U X (X) is open for all X X . is a base for the product topology.
R

Unless otherwise stated, disjoint-unions of topological spaces are always equipped


with the disjoint-union topology.

Chapter 3. Basics of general topology

124

Proof. All of this follows from the definition of the final topology and its characterization in
terms of continuity of functions defined on final topologies (Propositions 3.4.29 and 3.4.34).

a Recall

3.6

(Definition A.1.52) that X : X

XX

X is just the inclusion.

Separation Axioms
We have mentioned the term T2 a couple of times now (see, for example, the definition of
quasicompactness (Definition 3.2.35) and the Quasicompactness (Theorem 2.5.64)). One of the
purposes of this section is to explain what we meant by this. The term T2 is a separation axiom,
and you should know the other separation axioms as well if you plan to become a mathematician,
but this is admittedly not a priority for this course (a study of them in detail is better suited for a
course on general topology itself).

3.6.1

Separation of subsets
In this subsection, we will define various levels of sepration of subsets of a topological space. In
the next section, we will then define corresponding levels of separation for spaces, which is roughly
the condition that any two points have the corresponding level of separation.10
Throughout this subsection, let S1 , S2 X be disjoint subsets of a topological space X. In the
various definitions will follow, we will say things like S1 and S2 are XYZ.. If S1 = {x1 } and
S2 = {x2 } are singletons, then instead we will say that x1 and x2 are XYZ.. In fact, this is probably
the case of most interest (though certainly not the only case).
Definition 3.6.1 Topologically-distinguishable

S1 and S2 are topologically-distinguishable iff there is an open set containing S1 but not S2
or vice-versa.

In other words, they do not have precisely the same open neighborhoods.

For example, in an indiscrete space, no two points are topologically-distinguishable.


In R on the other hand (and almost every space you work with that is not expressly
cooked-up for the sole purpose of being a counter-example to something), any two
points are topologically-distinguishable.

Example 3.6.2 Two distinct points which are not topologically-distinguishable

We just mentioned this in the remark above, but decided to place it in an example of its own
to make it easier to spot if skimming. Take X := {x1 , x2 } and equip X with the indiscrete
topology, that is,
U := {0,
/ X}.

(3.6.3)

Then, x1 6= x2 but x1 and x2 are contained in precisely the same open sets.
Definition 3.6.4 Separated S1 and S2 are separated iff there is an open neighborhood
U1 of S1 not intersecting S2 and an open neighborhood U2 of S2 not intersecting S1 .
R

The difference between this and topological-distinguishability is that this has to

10 If that doesnt make sense now, dont worryI cant be completely clear just yet. Worry if it doesnt make sense
after the next subsection ;-).

3.6 Separation Axioms

125

happen to both S1 and S2 , whereas, in the case of topological-distinguishability, we


only require that at least one of them has an open neighborhood that does not intersect
the other.

Example 3.6.5 Two points which are topologically-distinguishable but not separated Define X := {x1 , x2 } and


U := {0,
/ X, {x1 }} .

(3.6.6)

Then, x1 and x2 are topologically-distinguishable as {x1 } is an open neighborhood of x1


that does not contain x2 . On the other hand, every neighborhood of x2 contains x1 .
Definition 3.6.7 Separated by neighborhoods S1 and S2 are separated by neighbor-

hoods iff there is a neighborhood U1 of S1 and a neighborhood U2 of S2 with U1 and U2


disjoint.
R

Equivalently, we may replace U1 and U2 with open neighborhoods.

This is just like being separated, except that we may put U1 around S1 and U2 around
S2 simultaneously and have no intersection, whereas in the separated case, the U1
and U2 that work will in general intersect.

Example 3.6.8 Two points which are separated but not separated by neighborhoods Define X := {x1 , x2 , x3 } and


U := {0,
/ X, {x1 , x3 }, {x2 , x3 }, {x3 }} .

(3.6.9)

Then, {x1 , x3 } is an open neighborhood of x1 which does not contain x2 and {x2 , x3 } is
an open neighborhood of x2 which does not contain x1 . On the other hand, every open
neighborhood of x1 intersects every open neighborhood of x2 at x3 .
Definition 3.6.10 Separated by closed neighborhoods S1 and S2 are separated by

closed neighborhoods iff there is a closed neighborhood C1 of S1 and a closed neighborhood


C2 of S2 with C1 and C2 disjoint.
R

This is the same as being separated by neighborhoods, except that we can further
require that the neighborhoods are closed.a

a Recall

that neighborhoods do not have to be opensee Definition 3.1.8.

Example 3.6.11 Two points which are separated by neighborhoods but not by
closed neighborhoods Define X := {x1 , x2 , x3 } and


U := {0,
/ X, {x1 }, {x3 }, {x1 , x3 }} .

(3.6.12)

Then, {x1 } is a neighborhood of x1 , {x3 } is a neighborhood of x3 , and these two neighborhoods are disjoint, so that x1 and x3 are separated by neighborhoods. On the other hand,
x1 only has three neighborhoods: {x1 }, {x1 , x3 }, and all of X. The latter must intersect
every neighborhood of x3 and the former is not closed because {x1 }c = {x2 , x3 } is not open.
Therefore, we cannot separate x1 and x3 with closed neighborhoods.

Chapter 3. Basics of general topology

126

There is an equivalent, alternative way to think about being separated by closed neighborhoods
that you may find useful.
Proposition 3.6.13 S1 and S2 are separated by closed neighborhoods iff they are separated

by open neighoborhoods with disjoint closure.


Proof. () Suppose that S1 and S2 are separated by closed neighborhoods C1 and C2
respectively. By the definition of a neighborhood (Definition 3.1.8), there are then open
neighborhoods U1 and U2 with S1 U1 C1 and S2 U2 C2 . Then, Cls(U1 ) C1 and
Cls(U2 ) C2 , and so U1 and U2 constitute open neighborhoods of S1 and S2 with disjoint
closures.
() Suppose that S1 and S2 are separated by open neighborhoods with disjoint closure.
Then, these closure constitute closed neighborhoods which separate S1 and S2 .

Definition 3.6.14 Completely-separated S1 and S2 are completely-separated or sepa-

rated by (continuous) functions iff there is a continuous function f : X [0, 1] such that
f |S1 = 0 and f |S2 = 1.
R

Why does being completely-separated imply being separated by closed neighborhoods?a

a All the other implications are true too (i.e. separated implies topologically-distinguishable, separated by
neighborhoods implies separated, etc.), this is just the first one that is not completely obvious, which is why it is
the only one we have asked about.

Example 3.6.15 Two points which are separated by closed neighborhoods but
not completely-separated a Define


S := ((0, 1) (0, 1)) (Q Q) = {hx, yi (0, 1) (0, 1) : x, y Q} ,

(3.6.16)

T := { 12 } {r 2 : r Q} = {h 12 , r 2i : r Q}

(3.6.17)

X := S T {h0, 0i)} {h1, 0i)}.

(3.6.18)

and

We define a topology on X by defining a neighborhood base at each point (see Proposition 3.1.11). For hx, yi X, there are four cases: (i) hx, yi = h0, 0i, (ii) hx, yi = h1, 0i, (iii)
hx, yi T , and (iv) hx, yi S. We define
(

b
U

S
:
U
is
open
in
S.
if hx, yi S
o
Bhx,yi := n m
, (3.6.19)
+
Uhx,yi : m Z
if hx, yi T {h0, 0i} {h1, 0i}
where


m
:= {h0, 0i} hx, yi (0, 41 ) (0, m1 ) : x, y Q
Uh0,0i)


m
:= {h1, 0i} (x, y) ( 34 , 1) (0, m1 ) : x, y Q
Uh1,0i
n
o

U m1 := hx, yi ( 14 , 34 ) (r 2 m1 , r 2 + m1 ) : x, y Q .
h 2 ,r 2i

(3.6.20)

3.6 Separation Axioms

127

By Proposition 3.1.11, there is a unique topology for which Bhx,yi is a neighborhood base
of hx, yi X.
The closures of {h0, 0i} (0, 14 ) (0, 1n ) and {h1, 0i} ( 34 , 1) (0, 1n ) in X must be disjoint
as, in particular, any point in the former cannot have x-coordinate exceeding 14 and any point
in the latter cannot have x-coordinate strictly less than 43 .
On the other hand, h0, 0i and h1, 0i cannot be separated by a function. To see this, suppose
that f : X R were a continuous function such that f (h0, 0i) = 0 and f (h1, 0i) = 1. Then,
m
f 1 ([0, 41 )) would be an open neighborhood of h0, 0i, and so must contain Uh0,0i
for some
n
m Z+ . Similarly, f 1 (( 34 , 1]) must contain Uh1,0i
for some n Z+ . Let r Q be such that

r 2 < min{ m1 , 1n }. Obviously, f (h 12 , r 2i) cannot be in both [0, 14 ) and ( 34 , 1] as these sets
are disjoint, so without loss of generality assume that it is not contained in [0, 41 ), so let


U [0, 1] be an open neighborhood of f (h 12 , r 2i) with Cls(S) disjoint from Cls [0, 14 ) , so


that the preimages of Cls(U) and Cls [0, 41 are disjoint closed neighborhoods of h 12 , r 2i

and h0, 0i respectively. On the other hand, a disjoint closed neighborhood of h12 , r 2i must

contain Uho1 ,r2i for o Z+ with r 2 o1 > 0. As r < m1 , we have that 1o < m2 . But then,
2



m
Cls Uh0,0i
and Cls(Uho1 ,r2i ) must intersect at h 14 , r 2 1o i because r 2 1o < m1 1o < m1 .
2

This is the Arens Square.

R
a This

is significantly more nontrivial than the preceding counter-examples and comes from [23, pg. 98].
in the usual topology (the subspace topology inherited from (0, 1) (0, 1)).

b Open

Definition 3.6.21 Perfectly-separated S1 and S2 are perfectly-separated or precisely-

separated iff there is a continuous function f : X [0, 1] such that S1 = f 1 (0) and
S2 = f 1 (1).
R

Being completely-separated means that f is 0 on S1 and 1 on S2 . Being perfectlyseparated means that, furthermore, f is 0 nowhere else except on S1 and f is 1
nowhere else except on S2 .

Example 3.6.22 Two points which are completely-separated but not perfectlyseparated a This example is fairly similar to the cocountable topology examplesee

Example 3.2.27.
Define X := R. Let C X and declare that
X is closed iff either (i) C contains 0 or (ii) C is finite.

(3.6.23)

You can check for yourself that this satisfies the defining conditions for a topology in terms
of closed sets (Exercise 3.1.2).
We wish to show that 0, 1 X are completely-separated, but not perfectly-separated. We first
check that they are completely-separated by producing a continuous function f : X [0, 1]
such that f (0) = 0 and f (1) = 1. Define
(
1 if x = 1
f (x) :=
.
(3.6.24)
0 otherwise
We first check that f is continuous. Let C [0, 1] be closed. If C contains 0 [0, 1], then
f 1 (C) contains 0 X, and so is closed. If it does not contain 0, then f 1 (C) is finiteeither
C contained 1 in which case f 1 (C) = {1} or it did not in which case f 1 (C) = 0.
/

Chapter 3. Basics of general topology

128

Now we show that 0, 1 X are not perfectly separated. Suppose that there exists a continuous
function f : X [0, 1] such that {0} = f 1 (0) and {1} = f 1 (1). Then,
!
\
\

1
1
1
[0, m ) = b
(3.6.25)
{0} = f (0) = f
f 1 [0, m1 ) .
mZ+

mZ+

That is, {0} is a G set.c However, by definition, a set is open iff it does not contain 0 X
or its complement is finite. Of course, all the sets appearing in (3.6.25) must be of the latter
kind. However, taking the complement of this equation, we find
R \ {0}1 d =

f 1 (0, m1 )c ,

(3.6.26)

mZ+

so that R \ {0} is a countably-infinite union of finite setsa contradiction.


R

This is the Uncountable Fort Space.

a This

comes from [23, pg. 52].


A.1.40(ii)
c Recall that this is just a fancy-shmancy term for a set which is a countable intersection of open setssee
Definition 3.1.3.
d De Morgans Lawssee Exercise A.1.16
b Exercise

Note that we obviously have the implications


perfectly-separated completely-separated separated by closed neighborhoods separated by neighborhoods separated topologicallydistinguishable distinct.

(3.6.27)

Here, distinct literally means that S1 and S2 are not the same thing, i.e. S1 6= S2 . Furthermore, we
have presented counter-examples after each definition to show that each implication is strict.
3.6.2

Separation axioms of spaces


In the previous subsection, we defined several levels of separation between two disjoints subsets
of a topological space. We now use these definitions to put conditions on topological spaces
themselves.
Throughout this section, let X be a topological space.
Definition 3.6.28 T0 X is T0 iff any two distinct points are topologically-distinguishable.
R

Sometimes this condition is called kolmogorov. You will find that a lot (if not all) of
the separation axioms of spaces have other names. We have chosen the names we
have because (i) other terminology is less consistent and (ii) it carries less information
(of course that the subscript 0 in T0 has some significance).

This is an insanely reasonable condition. I might even argue that if you have a space
youre trying to study that is not T0 , youre doing something wrong. If the topology
cannot distinguish between two points, either (i) you may as well identify those two
points (see Proposition 3.6.30) or (ii) you should probably consider adding more
structure to your space that does distinguish between them.

Example 3.6.29 A space which is not T0 Any indiscrete space with at least two
points. The example above in Example 3.6.2 worked.


3.6 Separation Axioms

129

Proposition 3.6.30 T0 quotient Let X be a topological space. Then, there exists a unique

topological space T0(X), the T0 quotient of X, and a surjective map q : X T0(X) such
that
(i). T0(X) is T0 ; and
(ii). if Y is another T0 space with a continuous map : X Y , then there is a unique
continuous map 0 : T0(X) Y such that = 0 q.
R

As T0 is sometimes called kolmogorov, so to this is sometimes called the kolmogorov


quotient.

Compare this with the definitions of the integers, rationals, closure, interior, generating collections, initial topology, and final topology (Theorems 1.2.1 and 1.3.4
and Propositions 3.2.19, 3.2.22, 3.4.21 and 3.4.29). Note how this is a bit different
the key difference here is that the map from X to T0(X) is now surjective (i.e. a
quotient map) instead of in all the previous cases where it was injective (i.e. an
inclusion).

Proof. Define x1 x2 iff the open sets which contain x1 are precisely the same as the open
sets which contain x2 .
Exercise 3.6.31 Show that is an equivalence relation.

Define T0(X) := X/ and let q : X T0(X).


Exercise 3.6.32 Show that T0(X) satisfies (i) and (ii).

Exercise 3.6.33 Show that T0(X) is unique.


Definition 3.6.34 T1 X is T1 iff any two distinct points are separated.
R

Sometimes this condition is called accessible of frchet. I also prefer T1 over


accessible because, not only does it carry slightly more information, but its also a
lot more common. I would definitely recommend not to use the term Frchet to
describe this, as the term frceht space is usually meant to describe something else
entirely.

Warning: This is not the same as any two topologically-distinguishable points are
separated. That condition is called R0 and is rarely, if ever, used.a

In contrast to the T0 condition, there are incredibly important examples of spaces that
are not T1 .

a There

is no difference between R0 and T1 for spaces which are T0 .

Example 3.6.35 A space that is T0 but not T1 The example above in Example 3.6.5

worked.
While the above is probably the best way to state T1 as a definition because it makes its similarity
with other separation conditions more apparent, it is often best to think of T1 spaces are spaces in
which points are closed.

Chapter 3. Basics of general topology

130

Proposition 3.6.36 Let X be a topological space. Then, X is T1 iff {x} is closed for all

x X.
Proof. () Suppose that X is T1 . Let x X. For all y X distinct from x, let Uy be an open
neighborhood of y that does not contain x. Then,
[

Uy = X \ {x}

(3.6.37)

y6=x

is open, and so
\

Uyc = {x}

(3.6.38)

y6=x

is closed.
() Suppose that {x} is closed for all x X. Let x1 , x2 X. Then, {x1 }c is an open
neighborhood of x2 that does not contain x1 and {x2 }c is an open neighborhood of x1 that
does not contain x2 .

Corollary 3.6.39 Every finite T1 topological space is discrete.
R

In particular, it is a waste of time to look for counter-examples in finite spaces if your


space is T1 .

Proof. If the space is T1 , then every point is closed. If the space is finite, then every subset
is a union of finitely many points, that is, a finite union of closed sets, and hence closed. 
Definition 3.6.40 T2 X is T2 iff any two distinct points are separated by neighborhoods.
R

This is quite often referred to as hausdorff . In constrast to most of the alternative


terminologies, this actually might be more common than the term T2 .

Definition 3.6.41 Compact X is compact iff it is quasicompact and T2 .


Proposition 3.6.42 Let X be a topological space. Then, X is T2 iff limits are unique.
R

Because of lack of uniqueness in general, we have been hesitant to write lim x if


limits are not unique, then what limit does this symbol refer to? Hereafter, however,
in T2 spaces, we will not hesitate to use this notation.

0 X be limits
Proof. () Suppose that X is T2 . Let 7 x X be a net and let x , x
0 . Then, there exist disjoint open
of X. We proceed by contradiction: suppose that x 6= x
0
0
neighborhoods U of x and U of x . As 7 x converges to x , it is eventually contained
in U. But then, as U and U 0 are disjoint, 7 x is not eventually contained in U 0 a
contradiction.
0 X be distinct. We wish to show that there
() Suppose that limits are unique. Let x , x
0 . We proceed by contradiction: suppose that
exist disjoint open neighborhoods of x and x
0 . Let be the
every open neighborhood of x intersects every open neighborhood of x
collection of these intersections, that is, the collection of sets of the form U U 0 for U and U 0

3.6 Separation Axioms

131

0 respectively. Order by reverse-inclusion so as to form a


open neighborhoods of x and x
directed set, and for each U U 0 , choose xUU 0 U U 0 . Then, 3 U U 0 7 xUU 0
0 a contradiction.
converges to both x and x


Exercise 3.6.43 Show that a subspace of a T2 space is T2 .


Exercise 3.6.44 Show that an arbitrary product of T2 spaces is T2 .
R

Thus, by The product topology (Theorem 3.5.21), an arbitrary product of compact


spaces is compact.

Exercise 3.6.45 Show that quasicompact subsets (which are in fact compact by Exer-

cise 3.6.43) of a T2 space can be separated by neighborhoods.


Exercise 3.6.46 Show that quasicompact subsets of a T2 space are closed.
R
a This

Warning: The converse is falsea see the following counter-example.


is in contrast to the T1 situation in which T1 is equivalent to points being closed.

Example 3.6.47 A space for which every quasicompact subset is closed that
is not T2 Let X be as in Example 3.2.27, that is the real numbers with the cocountable


topology.
Any open neighborhood of 0, must intersect any open neighborhood of 1, because both of
these neighborhoods have countable complements and so, by the uncountability of R, must
intersect. Therefore, X is not T2 .
We claim that a subset of X is quasicompact iff it is finite. Finite subsets are always
quasicompact. On the other hand, take K X infinite. Then, there is in particular a
countably-infinite subset {x0 , x1 , x2 , . . .} K. Define Cm := {xk : k m} and C := {Cm :
m N}. Then, C is a collection of closed subsets of X. Furthermore, the intersection of
any finitely many of them intersects K. On the other hand, the intersection of all of them is
empty. Therefore, K is not quasicompact. In particular, X is not quasicompactly-generated.a
Now, if a subset of X is quasicompact, it is finite, and hence closed (by the definition of the
topology).
a Finite subsets of X are discrete, and so from what we just showed, it follows that all quasicompact subsets
of X are discrete. Therefore, S K is always open in K, regardless of whether S is open or not.

Hopefully you found an example in Exercise 3.1.34 of a continuous bijective function that was
not a homeomorphism. In certain special cases, however, you can immediately make this deduction.
Exercise 3.6.48 Show that continuous injective function from a quasicompact space into a

T2 space is a homeomorphism onto its image.




Example 3.6.49 A space that is T1 but not T2 We showed that the real numbers with

the cocountable topology is not T2 in Example 3.6.47. On the other hand, by definition,
countable sets are closed, and so certainly points are closed, and so X is T1 .

Chapter 3. Basics of general topology

132

As a sort of aside, note that (Exercise 3.6.46) T2 implies that quasicompact subsets are closed,
and in turn if quasicompact subsets are closed then the space is T1 (this follows from Proposition 3.6.36). Furthermore, we showed that one of these implications (T2 implies quasicompact
subsets closed) is strict in Example 3.6.47. In fact, the other implication (quasicompact subsets
closed implies T1 ) is strict as well.


Example 3.6.50 A T1 space for which not every quasicompact subset is closed

Define X := N. We equip X with a nonstandard topology, the so-called cofinite topology.


Let C X and declare that
C is closed iff either (i) C = X or (ii) C is finite.

(3.6.51)

Because the finite union and arbitrary intersection of finite sets is finite (obviously), it
follows that this defines a topology on X. X is of course T1 has points are finite, hence
closed.
We claim that Z+ N is quasicompact (of course this is not closed because it is not finite).
So, let U be an open cover of Z+ . Let U U be an open set containing 1 Z+ . From the
definition of the topology, N \U = Z+ \U is finite, so let us write Z+ \U = {m1 , . . . , mn }.
As U covers Z+ , there must be some Uk U with mk Uk . Then, {U,U1 , . . . ,Un } is a
finite subcover of U .
Here is where the consistency of the terminology breaks-down. If you guessed that the term T3
refers to spaces in which any two distinct points are separated by closed neighborhoods, youd be
wrong. Unfortunately, it seems that the term T3 was already taken (well see what it means in a bit)
when someone went to write this definition down, and so it is called T2 1 .
2

Definition 3.6.52 T2 1 X is T2 1 iff any two distinct points are separated by closed neigh2
2

borhoods.
R

The alternate terminology for this seems to be urysohn.

Example 3.6.53 A space that is T2 but not T2 1 a This example is very similar to the

Arens Squaresee Example 3.6.15.


Define

S := ((0, 1) (0, 1)) (Q Q) = {hx, yi (0, 1) (0, 1) : x, y Q}

(3.6.54)

X := S {h0, 0i} {h1, 0i}.

(3.6.55)

and

(This is the rationals in the open unit square together with the bottom-left and bottom-right
corners.) We define a topology on X by defining a neighborhood base at each point (see
Proposition 3.1.11). For (x, y) X, there are three cases: (i) hx, yi = h0, 0i, (ii) hx, yi = h1, 0i,
and (iii) hx, yi S. We define
(

b
U

S
:
U
is
open
in
S.
if hx, yi S
o
Bhx,yi := n m
,
(3.6.56)
Uhx,yi : m Z+
otherwise
where


m
:= {h0, 0i} hx, yi (0, 12 ) (0, m1 ) : x, y Q
Uh0,0i


m
:= {h1, 0i} hx, yi ( 12 , 1) (0, m1 ) : x, y Q .
Uh1,0i

(3.6.57)

3.6 Separation Axioms

133

By Proposition 3.1.11, there is a unique topology for which Bhx,yi is a neighborhood base
of hx, yi X.
The closure of any open neighborhood of h0, 0i contains points with x-coordinate 12 and
y-coordinate arbitrarily small, and the same goes from any open neighborhood of (1, 0).
Thus, these two points are not separated by closed neighborhoods.c
On the other hand, the x-coordinate of every point in every open neighborhood of h0, 0i
is strictly less than 12 , and similarly the x-coordinate of every point in every open neighborhood of h1, 0i is strictly greater than 12 . In particular, these two points are separated by
neighborhoods.
For any point hx, yi distinct from h0, 0i and h1, 0i, we can separate hx, yi from h0, 0i with
m
neighborhoods by taking Uh0,0i
3 h0, 0i for m Z+ with m1 < y and any -ball about hx, yi
of with radius less than y m1 . Similarly for hx, yi and h1, 0i. Of course, any two points
distinct from both h0, 0i and h1, 0i are separated by neighborhoods because they can be
separated by neighborhoods in S. Thus, X is indeed T2 .
R

This is the Simplified Arens Square.

a This

comes from [23, pg. 100].


in the usual topology (the subspace topology inherited from (0, 1) (0, 1)).
c Recall that two points are separated by closed neighborhoods iff they are separated by open neighborhoods
with disjoint closuresee Proposition 3.6.13.
b Open

You might be thinking to yourself Well, if T3 wasnt separated by closed neighborhoods, then
it must be completely-separated, right?. Sorry. Wrong again.
Definition 3.6.58 Completely-T2 X is completely-T2 iff any two distinct points are

completely-separated.

Naturally, this is sometimes called completely-hausdorff .

Sometimes people will also say that continuous functions separate points.

Example 3.6.59 A space that is T2 1 but not completely-T2 The Arens Square from
2

Example 3.6.15 will do the trick. There, we provided an example of two points which are
separated by closed neighborhoods, but not completely-separated. This is of course already
enough to show that the Arens Square is not completely-T2 , but we still need to check that
any two points can be separated by closed neighborhoods.

Denote the Arens Square by X and let h 12 , r 2i X for r Q. Take m Z+ such that

r 2 m1 > 0 and take n Z+ such that 1n < r 2 m1 . Then, using the definition (3.6.20),
m
we see that the closures of U(1,0)
and Uh 1 ,r2i are disjoint. Similarly, a point of this form
2
can be separated from (1, 0) by closed neighborhoods.
m
For hx, yi S, just take m Z+ with m1 < y. Then, the closure of Uh0,0i
and any -ball
around hx, yi with radius less than y m1 will be disjoint. A similar trick works for h1, 0i of
course.

For r, q Q with r < q, choose m, n Z+ so that r 2+ m1 < q 2 1n (and so that r 2 m1 >

0 and q 2 + n1 < 1 of course so that the open neighborhoods are actually contained in X).
Then, the closures of Uhm1 ,r2i and Uhn1 ,q2i are disjoint.
2
2

For hx, yi S and h 12 , r 2i T , we must have that y 6= r 2, and so, without loss of

generality, that y < r 2. Then, choose m Z+ large enough so that r 2 m1 > y (and

Chapter 3. Basics of general topology

134

so that Uhm1 ,2i is contained in X), and take a -ball about hx, yi with radius less than
2

(r 2 m1 ) y. Then, the closure of these two neighborhoods will be disjoint.


Finally, can separate two points in S by closed neighborhoods because we could do so in
(0, 1) (0, 1) (recall that S just as the subspace topology).
And of course, as now you probably saw coming, T3 does not mean distinct points can be
perfectly-separated.
Definition 3.6.60 Perfectly-T2 X is perfectly-T2 iff any two distinct points can be

perfectly-separated.
R

Disclaimer: I have never seen this term before. Then again, Ive never seen any term
to describe such spaces. But honestly, if completely-T2 means you can completelyseparate points, then a space in which you can perfectly-separate points is going to
be called. . .

Example 3.6.61 A space that is completely-T2 but not perfectly-T2 The Uncount-

able Fort Space from Example 3.6.22 will do the trick. There, we provided an example of
two points which are completely-separated, but not perfectly-separated. This is of course
already enough to show that the Uncountable Fort Space is not perfectly-T2 , but we still
need to check that any two points can be completely-separated.
Recall that the Uncountable Fort Space was defined to be X := R with the closed sets
being precisely the finite sets and also the sets which contained 0. We showed above in
Example 3.6.22 that 1 X can be completely-separated from 0 X. Of course, there was
nothing special about 1, and so all we need to do is to show that we can completely-separate
any two nonzero points in X.
So, let x1 , x2 X be nonzero and distinct, and define f : X [0, 1] by

0 if x = x1
f (x) := 1 if x = x2 .
(3.6.62)

1
otherwise
2
1
1 (C) contains 0 X, and is hence closed. Otherwise,
Then, if C [0, 1] closed contains
2, f

it is finite, as f 1 [0, 1] \ { 12 } = {x1 , x2 }, and hence closed. Thus, f is continuous, and so
completely-separates x1 and x2 .

But now weve gone through all the separation properties, right? How could T3 be a thing?
Well, now we enter a collection of separation axioms of a different naturenow we will focus on
separating closed sets. One unfortunate fact, however, is that, if points are not closed, then these
new separation axioms are not strictly stronger than the separation axioms we just presented. There
are thus two versions of the following separation axioms: ones in which the points are closed and
ones in which they arent.
Definition 3.6.63 Regular X is regular iff any closed set and a point not contained in it

can be separated by neighborhoods.


R

Youll note that we skipped right over being topologically-distinguishable and just
being separated. This is because a point is automatically separated from a closed
setthe complement of the closed set is an open neighborhood of the point which
does not intersect the closed set.

3.6 Separation Axioms


R

135

Unfortunately, the terminology of separation axioms is so fucked that theres really no


way of being consistent with all of the literature. There are sources which reverse my
conventions of regular and T3 . We explain the motivation of our choice of convention
in the definition of T3 (Definition 3.6.65).

Example 3.6.64 A space that is T0 regular but not T1 The indiscrete topology on
any set with at least two points is vacuously regular but not T0 .


Stupid examples like this is why we almost always care about the case when we in addition
impose the condition of being T1 . This finally brings us to the definition of T3 .
Definition 3.6.65 T3 X is T3 iff it is T1 and regular.
R

The T1 condition (points are closed) is added so that this is a strict specifization of
being T2 . In fact, by the following proposition, we would have equally well said T0 .

We choose to call this T3 instead of regular because then we have T3 T2 . If the


terms were reversed, then there would be T3 spaces (like indiscrete spaces) that were
not T2 ew.

Warning: Up until now, all of the Tk axioms (including things like T2 1 , completely-T2 ,
2
and perfectly-T2 ) have been strictly comparable, that is, T1 strictly implies T0 , T2
strictly implies T1 , etc.. With the introduction of T3 , this is no longer the case. It turns
out that T3 implies T2 1 , but that T3 is not comparable with either completely-T2 or
2
perfectly-T2 .

Proposition 3.6.66 If X is T0 and regular, then it is T2 .

Proof. Suppose that X is T0 and regular. Let x1 , x2 X. Because X is T0 , without loss


of generality there is some open neighborhood U of x1 which does not contain x2 . Then,
U c is closed and does not contain x1 , so because X is regular, there are disjoint open
neighborhoods V1 and V2 of x1 and U c respectively. Then, as x2 U c , this implies that x1
and x2 are separated by neighborhoods.

Proposition 3.6.67 If X is T3 , then X is T2 1 .
2

Proof. Suppose that X is T3 . Let x1 , x2 X be distinct. As X is T3 , it is definitely T2 , and


so we can separate x1 and x2 with open neighborhoods U1 and U2 containing x1 and x2
respectively. Then, U1c is a closed set not containing x1 , and so because X is T3 , there is an
open neighborhood V1 of x1 and an open neighborhood V2 of U1c which are disjoint. Then,
Cls(U2 ) U1c V2 and Cls(V1 ) V2c ,

(3.6.68)

so that Cls(U2 ) and Cls(V1 ) are disjoint, so that X is T2 1 by Proposition 3.6.13.


2

Example 3.6.69 A space that is perfectly-T2 but not T3 Define X := R. We equip

R with the so-called cocountable extension topology. Let C X and declare that
C is closed iff it is the union of a countable set and a set closed in the
(3.6.70)
usual topology on R.
Now, let x1 , x2 X be distinct, and let : R (0, 1) be any homeomorphism (in the usual

Chapter 3. Basics of general topology

136
topology). Now define

0
f (x) := 1

(x)

f : X [0, 1] by
if x = x1
if x = x2 .
otherwise

(3.6.71)

The preimage of any set in [0, 1] which contains neither 0 nor 1 will be closed because
is a homeomorphism. If it does contain 0 or 1, then the preimage will be the union of a
closed set in the usual topology on R and a finite set, and hence will be closed. Thus, f is
continuous, and so X is perfectly-T2 . We know check that X is not T3 .
Q X is closed because it is countable. Any open neighborhood of Q must be of the form
of a usual open neighborhood of Q with countably many points removed. Of course, the
only usual open neighborhood of Q is all of R, and so every open
neighborhood of Q is
just R with countably
many
points
removed.
On
the
other
hand,
2
/ Q and any open

neighborhood of 2 is a usual open neighborhood with countably many points removed.


Thus,
by uncountability of R, these two must intersection, and so you cannot separate Q
and 2 with neighborhoods. Thus, X is not T3 .


Example 3.6.72 A space that is T3 but not completely-T2 a For m 2Z, define

Lm := {m} [0, 12 ).

(3.6.73)

For n 1 + 2Z and k Z with k 2, let Tn,k be the legs of an isosceles triangle in R2 R2


with apex at pn,k := hn, 1 1k i and base [n (1 1k ), n + (1 + 1k )] {0} (including the
end-points of the legs on the base but not the apex pn,k ). Then, define
Y0 :=

Lm

m2Z

Y1 :=

{pn,k }
(3.6.74)

n1+2Z; kZ,k2

Y2 :=

Tn,k

n1+2Z; kZ,k2

Y := Y0 Y1 Y2 .
We define a topology on Y by defining a neighborhood base at each pointsee Proposition 3.1.11. For hx, yi Y2 , we declare a neighborhood base to be simply {hx, yi}. For
hx, yi = pn,k , we declare a neighborhood base of hx, yi to consist of subsets of the form
{pn,k }S for S Tn,k cofinite in Tn,k . For hx, yi Lm for m 2Z, we declare a neighborhood
base of hx, yi to consist of subsets of the form {hm, yi}S for S Y ((m 1, m + 1) {y})
cofinite in Y ((m 1, m + 1) {y}). It follows from Proposition 3.1.11 that there is a
unique topology for which Bhx,yi is a neighborhood base of hx, yi Y .
Finally define
X := Y t {p1 , p2 }

(3.6.75)

for distinct new points p1 , p2 . To define a topology on X, we once again use Proposition 3.1.11. A neighborhood base at hx, yi Y consists of precisely the same sets as it did
before. We furthermore declare that


B p1 := Up1 : R


(3.6.76)
B p2 := Up2 : R

3.6 Separation Axioms

137

to be neighborhood bases of p1 and p2 respectively, where


Up1 := {hx, yi Y : x < }

(3.6.77)

Up2 := {hx, yi Y : x > } .

We first check that X is not completely-T2 by showing that no continuous function can
separate p1 from p2 .
First suppose that f (pn,k ) = . Then,
!
f 1 () Tn,k = f 1

( m1 , + m1 ) Tn,k =

mZ+

Sm ,

(3.6.78)

mZ+

where
Sm := f 1 (( m1 , + m1 )) Tn,k

(3.6.79)

is cofinite in Tn,k . Hence,


[

Tn,k \ Sm .
Tn,k \ f 1 () Tn,k =

(3.6.80)

mZ+

In other words, the number of points in Tn,k that map to a different value than f (pn,k ) is at
most countable. Less precisely, f is constant on Tn,k modulo a countable set.
Let Cn,k denote the set of y-values in [0, 1k ) for which there is some point in Tn,k with
that y-coordinate and that maps to f (pn,k ). We just showed that this set is cocountable in
[0, 1 1k ), and hence it is certainly cocountable in [0, 12 ). Then, the intersection of them
T
1
kZ,k2 Cn,k must in turn then be cocountable in [0, 2 ), and in particular, be nonempty.
So, let ym Lm be such that there is some qm1,k Tm1,k and some qm+1,k Tm+1,k with
f (qm1,k ) = f (pm1,k ) and f (qm+1,k ) = f (pm+1,k ). Because the open neighborhoods of ym
are cofinite in ((m 1, m + 1) {y}) Y , the sequence k 7 qm1,k must eventually be in
every neighborhood of ym , and so we have limk qm1,k = ym = limk qm+1,k . It follows that
f (ym ) = lim f (qm+1,k ) = lim f (pm+1,k ) = lim f (qm+2,k ) = f (ym+2 ).
k

(3.6.81)

The crux: for each m 2Z, there exists points ym Lm and ym+2 Lm+2 with f (ym ) =
f (ym+2 ). By the definition of the open neighborhoods of p1 and p2 , it follows that f (p1 ) =
limm f (ym ) = limm+ f (ym ) = f (p2 ), so that p1 and p2 are not completely-separated,
and so X is not completely-T2 .
We now check that X is T3 . We first check that it is T1 . p1 is closed because
{p1 } =

(Up2 )c .

(3.6.82)

Similarly for p2 . Each point hm, yi Lm is closed because the union of all open sets not
contained in Bhm,yi , B p1 , or B p2 is precisely the complement of hm, yi. Similarly for the
pn,k s. For hx, yi Tn,k , the intersection over {hx0 , y0 i}c for hx0 , y0 i 6= hx, yi in Tn,k is {hx, yi}
and so will be closed if Tn,k is. Each Tn,k is open, however, (being the union of its points,
which are open), and so Tn,k is the complement of the union over Tn0 ,k0 with n0 6= n and k0 6= k
with {p1 , p2 } removed.
Now we just need to show that we can separate closed sets from points with open neighborhoods. One thing to note is that we need only separate closed sets that are complements of

Chapter 3. Basics of general topology

138

some element of some B p from points, because every closed set is an intersection of sets of
this form. So, let p X and let C X be closed not containing p. There is nothing to do
but break it down into cases.
First suppose that p = p1 and C Y {p2 }. If C contained points of arbitrarily small
x-coordinate, then because it is closed, it would have to contain p1 . Thus, there is some
x0 R such that the x-coordinate of every point (besides p2 of course) is at least x0 . Then
x0

x0

we can take Up81 as our open neighborhood of p1 and Up42 as our open neighborhood of C.
Similarly if our point is p2 .
We can easily separate points of Tn,k from closed sets as these points themselves are open.
Now take p = hm, yi Lm and C X not containing p. If C intersects ((m 1, m + 1)
{y}) Y at more than finitely many points, then y would be an accumulation point of C,
and hence would be contained in C. Thus, it must intersection ((m 1, m + 1) {y}) Y at
at most finitely many points, in which case we can simply remove them to obtain an open
neighborhood of hm, yi. These finitely many points are elements of Tm1,k or Tm+1,k for
some k, and so are in particular open. Thus, the union of these finitely many points together
(m1)(1 1 )

(m+1)+(1+ 1 )

k
k
with Up1
and Up2
then must contain C (because these two neighborhoods
of p1 and p2 contain all Tn,l for n 6= m 1, m + 1).
Finally if one of the points is some pn,k , then using very similar logic as in the previous
case, C must intersect Tn,k at at most finitely many points. Removing these points from Tn,k
gives us an open neighborhood which is disjoint from the neighborhood formed from the
union of these finitely many points (which are open) and the complement of Tn,k .

I do not know of a name for this space, but in order to have something to refer to, I
shall call it the Thomas Tent Space.b

a Evidently,

this example originally comes from [25], though I first heard about it from Brian M. Scott on
math.stackexchange.
b The T s are like tents, and indeed, that is the word Scott used to describe them.
n,k

Thus, once you hit T2 1 , you can increase your separation in one of two noncomparable ways:
2
you can make your space completely-T2 or you can make your space T3 . At the end of the section,
we will summarize how all the separation axioms related to each other.
Definition 3.6.83 T3 1 X is T3 1 iff it is T1 and any closed set can be separated by closed
2
2

neighborhoods from a point it does not contain.


R

This is not universally accepted terminology. Often people use the term T3 1 to refer
2
to what I call completely-T3 . I imagine this is the case because it turns out that my
T3 1 is equivalent to T3 (see the next proposition).
2

You will notice a pattern with the terminology. If Tk means you can separate XYZ
from ABC with neighborhoods, then Tk 1 means you can separate XYZ from ABC
2
with closed neighborhoods; completely-Tk means you can completely-separate XYZ
from ABC; perfectly-Tk means you can perfectly-separate XYZ from ABC. This
admittedly creates conflict with some peoples terminology, but the terminology of
separation axioms is so varied from source to source that it was impossible to come
up with a naming system that didnt conflict with something. I chose what I did
because it is the most systematic that does not completely depart from the established
nomenclature.

3.6 Separation Axioms

139

Proposition 3.6.84 If X is T3 , then X is T3 1


2

Proof. Suppose that X is T3 . Let C X be closed and let x Cc . As X is T3 , there is an


open neighborhood U of C and an open neighborhood V of X which are disjoint. Then, V c
is a closed set not containing X, and so there is an open neighborhood W1 of V c disjoint and
an open neighborhood W2 of x which are disjoint. Then,
C U Cls(U) V c W1 and x W2 Cls(W2 ) W1c ,

(3.6.85)

and so U and W2 are open neighborhoods of C and x respectively with disjoint closures, so
that X is T3 1 by Proposition 3.6.13.

2

Definition 3.6.86 Completely-T3 X is completely-T3 iff it is T1 and any closed set can

be completely-separated from a point it does not contain.


R

As noted above, sometimes people use the term T3 1 for this property. This is also
2
sometimes called tychonoff .

Example 3.6.87 A space that is T3 1 but not completely-T3 The Thomas Tent Space
2
X from Example 3.6.72 will do the trick. We showed there that it is T3 , but not completely-T2 .
From Proposition 3.6.84, it follows that X is T3 1 . However, it if were completely-T3 , then it
2
would also be completely-T2 a contradiction. Therefore, X is likewise not completely-T3 .


Definition 3.6.88 Perfectly-T3 X is perfectly-T3 iff it is T1 and any closed set can be

perfectly-separated from a point it does not contain.


Example 3.6.89 A space that is completely-T3 but not perfectly-T3 The Uncountable Fort Space from Example 3.6.22 will do the trick. In Example 3.6.61, we showed
that this was completely-T2 but not perfectly-T2 . As it is not perfectly-T2 , it is certainly not
perfectly-T3 , though we still need to check that it is completely-T3 .
Recall that the Uncountable Fort Space was defined to be X := R with the closed sets being
precisely the finite sets and also the sets which contained 0.
So, let C X be closed and let x0 Cc . First let us do the case where x0 = 0. Then, we may
define f : X [0, 1] by
(
1 if x C
f (x) :=
.
(3.6.90)
0 otherwise


Then, f 1 (1) = C is closed and f 1 (0) contains 0 X, and so is closed. Thus, f is


continuous. Now consider the case where 0 C. Then, we may define f : X [0, 1] by
(
1 if x = x0
f (x) :=
.
(3.6.91)
0 otherwise
f 1 (1) is just a point and so is closed and f 1 (0) contains 0 X and so is closed. Finally,

Chapter 3. Basics of general topology

140

let us consider the case where x0 6= 0 and 0


/ C. Then, we may define f : X [0, 1] by

1 if x C
f (x) := 0 if x = x0 .
(3.6.92)

1
otherwise
2
Then, f 1 (1) = C is closed, f 1 (0) = {x0 } is finite and hence closed, and f 1 ( 12 ) contains
0 X and is hence closed. Thus, indeed, X is completely-T3 .
The separation axioms T0 through perfectly-T2 all had to do with separation of points. All the
T3 separation axioms had to do with separating closed sets from points. Finally, the T4 axioms have
to do with separating closed sets from closed sets.
Definition 3.6.93 Normal X is normal iff any two disjoint closed subsets can be sepa-

rated by neighborhoods.

Similarly as with the definition of regular (Definition 3.6.63), we do not need to


consider topological-distinguishability and mere separatedness.

Similarly as with the term regular, some authors reverse my conventions of normal
and T4 . The motivation of our choice of convention is the same as it was for T3 .

Example 3.6.94 A space that is normal but not T0 The indiscrete topology on any

set with at least two points is vacuously normal but not T0 .


Stupid examples like this is why we almost always care about the case when we in addition
impose the condition of being T1 .
Definition 3.6.95 T4 X is T4 iff it is T1 and normal.
R

Just as before, the condition of T1 is imposed so that this is a strict specifization of


being T3 .

There are at least two relatively large families of spaces that are T4 metric spaces (which are
in fact perfectly-T 4see Proposition 4.3.20) and compact spaces.
Proposition 3.6.96 If X is compact, then it is T4 .

Proof. Suppose that X is compact. Then, closed subsets are quasicompact (Exercise 3.2.37),
and hence can be separated by neighborhoods by Exercise 3.6.45.

Proposition 3.6.97 If X is T4 , then it is T3 1 .
2

Proof. Suppose that X is T4 . Let C X be closed and let x Cc . As X is T4 , it is definitely


T3 , and so we can separate C and x with open neighborhoods U1 and U2 containing C and x
respectively. Then, U1c is a closed set disjoint from C, and so because X is T4 , there is an
open neighborhood V1 of C and an open neighborhood V2 of U1c which are disjoint. Then,
Cls(U2 ) U1c V2 and Cls(V1 ) V2 ,

(3.6.98)

so that Cls(U2 ) and Cls(V1 ) are disjoint, so that X is T3 1 by Proposition 3.6.13.


2

3.6 Separation Axioms

141

There are spaces that are perfectly-T3 but not T4 ; however, unfortunately we havent yet the
firepower to construct such a space yet (or at least prove it has the desired properties). The
counter-example we have in mind is Niemytzkis Tange Disk Topology, and you can find it in
Example 4.5.61.
Recall that (Example 3.6.72) there are T3 spaces that are not completely-T2 . This does not
happen with T4 and completely-T3 ! Fortunately, every T4 space is completely-T3 , as we shall see in
a moment (see Theorem 3.6.107).
Definition 3.6.99 T4 1 X is T4 1 iff it is T1 and any two disjoint closed subsets can be
2
2

separated by closed neighborhoods.


R

Just as with T3 1 (Definition 3.6.83), this terminology is not standard, presumably


2
because it is actually just equivalent to T4 .

Proposition 3.6.100 If X is T4 , then X is T4 1 .


2

Proof. Suppose that X is T4 . Let C1 ,C2 X be disjoint closed subsets. As X is T4 , there


are disjoint open neighborhoods U1 and U2 of C1 and C2 respectively. Then, U1c is a closed
set not containing C1 , and so there are disjoint open neighborhoods V and W of C1 and U1c
respectively. Then,
C2 U2 Cls(U2 ) U1c W and C1 V Cls(V ) W c ,

(3.6.101)

and so V and U2 are open neighborhoods of C1 and C2 respectively with disjoint closures,
so that X is T4 1 by Proposition 3.6.13.

2

Definition 3.6.102 Completely-T4 X is completely-T4 iff it is T1 and any two disjoint

closed subsets can be completely-separated.


This is in fact equivalent to being T4 , though this is relatively nontrivial, and the statement even
has a name associated to it. Before we prove that, however, we first present a useful lemma.
Proposition 3.6.103 Let X be a T1 topological space. Then,

(i). X is T3 iff whenever U is an open neighborhood of x X, there is an open neighborhood V of x such that x V Cls(V ) U; and
(ii). X is T4 iff whenever U is an open neighborhood of a closed subset C X, there is an
open neighborhood V of C such that C V Cls(V ) U.
Proof. We first prove (i).
() Suppose that X is T3 . Let U be an open neighborhood of x X. Then, U c is a
closed set that does not contain x, and so there are disjoint open neighborhoods V and W of
x and U c respectively. Then,
x V Cls(V ) W c U.

(3.6.104)

() Suppose that whenever U is an open neighborhood of x X, there is an open neighborhood V of x such that x V Cls(V ) U. Let C X be closed and let x Cc . Then, Cc
is an open neighborhood of x, and so there is an open neighborhood V of x such that
x V Cls(V ) Cc ,

(3.6.105)

Chapter 3. Basics of general topology

142

and so V and Cls(V )c are disjoint open neighborhoods of x and C respectively, so that X is
T3 .
Exercise 3.6.106 Prove (i).


Theorem 3.6.107 Urysohns Lemma. a If X is T4 , then X is completely-T4 .
R

This is often stated as If C1 ,C2 X are disjoint closed subsets of a T4 space X, then
there is a continuous function f : X [0, 1] that is 0 on C1 and 1 on C2 ..

Proof. Suppose that X is T4 . Let C1 ,C2 X be closed.


Let {rm : m N} be an enumeration of the rationals in [0, 1] with r0 = 1 and r1 = 0. We
define a collection of open sets {Urm : m N} inductively. During this process, we will
apply Proposition 3.6.103 repeatedly.
First of all, define U1 := C2c . Then, U1 is an open neighborhood of C1 , and so there is
some other open neighborhood U0 of C1 such that C1 U0 Cls(U0 ) U1 .
Suppose that we have defined Ur0 , . . .Urm such that Cls(Up ) Uq if r < q for r, q
{r0 , . . . , rm }. We wish to define Um+1 so that this property still remains to be true. As there
are only finitely many rational numbers in {r0 , . . . , rm } there is a largest p0 {r0 , . . . , rm }
with p0 < rm+1 and similarly there is a smallest q0 {r0 , . . . , rm } with rm+1 < q0 . Take
Urm+1 so that
Cls(Up0 ) Urm+1 Uq0 .

(3.6.108)

Proceeding inductively, this allows us to define Urm for all m N, and hence, we have
defined Ur for all r Q [0, 1]. By construction, it follows that
Cls(Up ) Uq

(3.6.109)

for p, q Q [0, 1] for p < q.


For x X, define
Qx := {r Q [0, 1] : x Ur }.
Note that Qx is empty iff x C2 . We then in turn define
(
1
if x C2
f (x) :=
.
inf (Qx ) otherwise

(3.6.110)

(3.6.111)

Of course f (C2 ) = {1}. Furthermore, if x C1 , then x U0 , and so indeed f (x) = 0. Thus,


we need only check that f is continuous.
So, let x0 X. First suppose that f (x0 ) 6= 0, 1. The other two cases are similar. Let > 0
be such that B ( f (x0 )) [0, 1] (if f (x0 ) = 0, for example, then you will instead use [0, ) in
place of B ( f (x0 ))). Let p, q Q be such that
f (x0 ) < p < f (x0 ) < q < f (x0 ) + .

(3.6.112)

Then, U := Uq \ Cls(Up ) is open in X and we claim that f (U) B (x0 ). This will show that
f is continuous at x0 , and hence continuous as x0 was arbitrary.

3.6 Separation Axioms

143

So, let x Uq \ Cls(Up ). Then, q Qx , and so f (x) = inf(Qx ) q. On the other hand, as
x
/ Up , it cannot be in Ur for any r p as Ur Up for r p. Therefore, p is a lower-bound
for Qx , and so f (x) = inf(Qx ) p. Hence,
f (x0 ) < p f (x) p < f (x0 ) + ,

(3.6.113)


and we are done.


a Proof

adapted from [17, pg. 207].

Definition 3.6.114 Perfectly-T4 X is perfectly-T4 iff it is T1 and any two disjoint closed

can be perfectly-separated.
R

For some reason, this is sometimes called T6 . What is T5 you ask? Evidently T5
means T1 and every subspace is T4 .

Example 3.6.115 A space that is completely-T4 but not perfectly-T4 The Uncountable Fort Space from Example 3.6.22 will once again do the trick. In Example 3.6.61,
we show that it is not perfectly-T2 , and so it is certainly not going to be perfectly-T4 . We
still need to check that it is completely-T4 .
Recall that the Uncountable Fort Space was defined to be X := R with the closed sets being
precisely the finite sets and also the sets which contained 0.
So, let C1 ,C2 X be disjoint closed sets. Let us first do the case where neither C1 nor C2
contains 0 X. Then, we may define f : X [0, 1] by

0 if x C1
f (x) := 1 if x C2 .
(3.6.116)

1
otherwise
2


The preimage of 0 is C1 is closed, the preimage of 1 is C2 is closed, and the preimage of 12


contains 0 X and so is closed. Thus, this function is continuous. Now suppose that 0 C1 .
Then, we may define f : X [0, 1] by
(
0 if x C1
f (x) :=
.
(3.6.117)
1 if x C2
The preimage of 0 is C1 is closed and the preimage of 1 is C2 is closed. Thus, this function
is continuous.

3.6.3

Summary
We summarize what we have covered so far in this section.
First of all, there are several levels of separation between two different objects in a space
Distinct 11 Topologically-distinguishable 12 Separated
13 Separated by neighborhoods 14 Separated by closed neighborhoods
15 Completely-separated 16 Perfectly-separated
(3.6.118)

Chapter 3. Basics of general topology

144

The arrows indicated implication of course, and all these implications are strict, as indicated by the
examples referenced in the footnotes.
There are three families of separation axioms of spaces: (i) separation of pairs of points, (ii)
separation of closed sets from points, and (iii) and separation of disjoint closed sets.
In the first family:
(i). Points being topologically-distinguishable is T0 (Definition 3.6.28).
(ii). Points being separated is T1 (Definition 3.6.34).
(iii). Points being separated by neighborhoods is T2 (Definition 3.6.40).
(iv). Points being separated by closed neighborhoods is T2 1 (Definition 3.6.52).
2

(v). Points being completely-separated is completely-T2 (Definition 3.6.58).


(vi). Points being perfectly-separated is perfectly-T2 (Definition 3.6.60).
In general, appending 12 2, completely, or perfectly to a separation axiom in this way changes
whatever the separation axiom was now to separated by closed neighborhoods, completelyseparated, and perfectly-separated respectively.
Closed sets and points, as well as closed sets and closed sets, are automatically separated, and
so there are no separation axioms analogous to T0 and T1 for families (ii) and (iii).
For (ii) and (iii), in order for them to be directly comparable with (i), we require that points be
closed (that is, we explicitly require spaces in families (ii) and (iii) to be T1 see Proposition 3.6.36).
Without this extra assumption, the separation axioms are called regular and normal respectively.
Thus, for the second family:17
(i). Closed sets and points being separated by neighborhoods is T3 (Definition 3.6.65).
(ii). Closed sets and points being separated by closed neighborhoods is T3 1 (Definition 3.6.83).
2

(iii). Closed sets and points being completely-separated is completely-T3 (Definition 3.6.86).
(iv). Closed sets and points being perfectly-separated is perfectly-T3 (Definition 3.6.88).
And similarly for the third family:

18

(i). Closed sets being separated by neighborhoods is T4 (Definition 3.6.65).


(ii). Closed sets being separated by closed neighborhoods is T4 1 (Definition 3.6.99).
2

(iii). Closed sets being completely-separated is completely-T4 (Definition 3.6.102).


(iv). Closed set being perfectly-separated is perfectly-T4 (Definition 3.6.114).
We now illustrate how all these axioms are related to each other. All implications are strict
(unless otherwise indicated by a ), in which case the offending counter-example is given in the
indicated footnote. Perhaps the only real surprise19 is the equivalence of T4 1 and completely-T4 :
2

17 Remember

that we require the spaces to be T1 in addition to these properties.


that we require the spaces to be T1 in addition to these properties.
19 Thats not to say that the counter-examples are easytheyre not (in fact, some of them are really fucking hard, like,
among-the-most-difficult-things-in-the-notes hard)but rather that you probably expect that some counter-example
exists, even if incredibly exotic.
18 Remember

3.7 The Intermediate and Extreme Value Theorems

145

Separation axioms of spaces (Theorem 3.6.107).20


T0
21

T1
22

T2
23

. (3.6.119)

1
22

24

T3

1
32

25

28
29

Completely-T3

33
34

Perfectly-T3

27

1
42

32
31

35

Perfectly-T2

3.7.1

T4

30

Completely-T2

3.7

26

Completely-T4
37

36

Perfectly-T4

The Intermediate and Extreme Value Theorems


Connectedness and the Intermediate Value Theorem
Youll recall from calculus that the classical statement of the Intermediate Value Theorem is
Let f : [a, b] R be continuous. Then, for all y between f (a) and f (b) (inclusive), there exists x [a, b] such that f (x) = y.

(3.7.1)

We will see that the proper way to interpret this statement is that the image of f is connected, so that
if the image contains f (a) and f (b) (which it does by definition), then it must contain everything
in-between as well. Of course, in order to make this precise, we have to first define what it means
to be connected.
Definition 3.7.2 Connected and disconnected Let X be a topological space. Then,

X is disconnected iff there exist disjoint nonempty closed sets C, D X such that X = C D.
X is connected iff it is not disconnected. A subset S of X is connected iff it is connected in
its subspace topology.
R

The intuition for the definition of disconnected of course is that we break-up the
space into two separate pieces which have no overlap.

Another way to say this is that X is disconnected iff it has a partition with two closed
sets.

Note that it is not the case that S is disconnected iff S = C D for nonempty disjoint
closed subsets C, D X. The reason for the difference is that subsets of S which are
closed in S need not be closed in X. For example, (0, 1) (2, 3) is a disconnected
space, but it cannot be written as the union of two nonempty disjoint closed subsets
of R.

This is usually phrased in terms of open sets instead of closed sets. It turns out the
definitions are equivalentsee the following result. The reason we state the definition
in terms of closed sets is because there is a related concept called hyperconnected in
which they are not equivalent and the correct notion is the one stated in terms of
closed sets.

20 Though

perhaps its worth nothing that Tk 1 is equivalent to completely-Tk for k = 3, 4 but not k = 2.
2

Chapter 3. Basics of general topology

146

Proposition 3.7.3 Let X be a topological space. Then, X is disconnected iff there exist

disjoint nonempty open sets U,V X such that X = U V .


Proof. () Suppose that X is disconnected. Then, by definition, there are disjoint nonempty
closed sets C, D X such that X = C D. As C and D are disjoint, in fact D = Cc , and so
D is itself open. Similarly C is open. This is the desired result.
() Suppose that there exist disjoint nonempty open sets U,V X such that X = U V .
Similarly as before, V = U c , and hence is closed, etc. etc..

Example 3.7.4 Define S := [0, 1] [2, 3] R =: X. Then, X is certainly disconnected
because U := [0, 1] and V := [2, 3] are nonempty disjoint open subsets of S such that
S = [0, 1] [2, 3]. Note, however, that if we required U and V to be open in X := R, then
this would not show that S is disconnected. This is why we require that U and V are open in
the subspace topology of S, instead of open in the ambient space.


Proposition 3.7.5 Let X be a topological space and let U be a collection of connected


S

subsets of X with nonempty intersection. Then,

UU

U is connected.

Proof. To simplify notation, let us write X 0 := UU U. We proceed by contradiction:


S
suppose that UU U is disconnected, so that we may write
S

X 0 = V W
for V,W X 0 open, nonempty, and disjoint. Let x0
assume that x0 V . For each U U , let us write
UV := U V and UW := U W,

(3.7.6)
T

UU

and without loss of generality

(3.7.7)

so that
U = UV UW

(3.7.8)

for all U U . As U is connected, it follows that, for each U, either UV or UW is empty.


However, we know that x0 UV , and so in fact, we must have that UW = 0/ for all U U ,
which in turn implies that W = 0:
/ a contradiction.

Definition 3.7.9 Connected component Let X be a topological space and let x1 , x2 .
Then, x1 and x2 are connected (to each other) iff there exists a connected set U X with
x1 , x2 U.

3.7 The Intermediate and Extreme Value Theorems

147

Proposition 3.7.10 The relation of being connected to is an equivalence relation on

X.
Proof. x is connected to itself because {x} is connected. The relation is symmetric
because the definition of the relation is symmetric. If x1 is connected to x2 and x2 is
connected to x3 , then there is some connected set U which contains x1 and x2 , and
there is some connected set V which contains x2 and x3 . As U and V both contain
x2 , it follows from the previous proposition that U V is connected, and hence x1 is
connected to x3 .

A connected component of X is an equivalence class of some point with respect to the
relation of being connected to.
Definition 3.7.11 Totally-disconnected Let X be a topological space. Then, X is

totally-disconnected iff every nonempty connected component of X is a point.


Proposition 3.7.12 Let X be a discrete topological space. Then, X is totally-disconnected.

Proof. Let U X have at least two distinct points x1 and x2 . Then,


U = {x1 } (U \ {x1 }),

(3.7.13)


and as every subset in a discrete space is open, it follows that U is disconnected.

Example 3.7.14 N, Z, and Q are totally-disconnected That N and Z are totallydisconnected follows from the fact that they are discrete.
Q is also totally-disconnected,a but this is more difficult to see. Let U Q have at least
two distinct points q1 and q2 . Without loss of generality, suppose that q1 < q2 . Then, by
density of Qc in R (Theorem 2.4.76), there is some x Qc with q1 < x < q2 . Define


V := (, x) U and W := (x, ) U.

(3.7.15)

Both V and W are open by the definition of the subspace topology (Proposition 3.5.1) of U,
and both are nonempty because q1 < x and q2 > x. Thus, as U = V W , U is disconnected,
and hence Q is totally-disconnected.
a In

particular, there are totally-disconnected spaces which are not discrete.

We mentioned at the beginning of this section that the proper way to interpret the Intermediate
Value Theorem is the statement that the image of connected sets are connected. There is one other
thing we need to check thoughwe need to check that intervals in R are in fact connected.
Theorem 3.7.16. Let I R. Then, I is connected iff it is an interval.
R

You might say that this result is to connectedness as the Quasicompactness is to


quasicompactnessthis result characterizes connectedness in R and the Quasicompactness characterizes quasicompactness in R. A key difference, however, is their
generalization to Rd there is no such thing as an interval in Rd , whereas the Quasicompactness holds verbatim in Rd .

In particular, R itself is connected, in contrast with N, Z, and Q.

Chapter 3. Basics of general topology

148

Proof. () Suppose that I is connected. Let a, b I with a b and let x R with a x b.


We must show that x I. If either x = a or x = b, we are done, so we may as well suppose
that a < x < b. We proceed by contradiction: suppose that x
/ I. Then,
I = (I (, x)) (I (x, )),

(3.7.17)

and so as I is connected, we must have that either I (, x) is empty or I (x, ) is empty.


But a is in the former and b is in the latter: a contradiction.
() Suppose that I is an interval. As I is an interval, by Proposition 2.4.74, we have that
I = [(a, b)] for a = inf(I) and b = sup(I).a We wish to show that I is connected. If a = b,
then either I = 0/ or I = {a}, in which case I is trivially connected. Therefore, we may
assume without loss of generality that a < b.
We proceed by contradiction: suppose that I is disconnected. The, we have that I = U V
for U,V I open in I disjoint and nonempty. As U and V are open in I and cover I, we
must have that at least one of them contains an open neighborhood of a, so without loss of
generality, suppose that [(a, x) U for some x R with a < x b. Now define
S := {x I : [(a, x) U} .

(3.7.18)

We just showed that this set is nonempty. It is also bounded above by b as b is in particular a
bound of I. Therefore, it has a supremum. We wish to show that sup(S) = b.
Note that a < sup(S) b, and so, because I is an interval, either sup(S) I or sup(S) = b.
In the latter case, we are done, so we may as well assume that sup(S) I.
In the case we are considering in which sup(S) I, we show that in fact sup(S) U.
We proceed by contradiction: suppose that sup(S) V (here is where we use the fact that
sup(S) I. As V is open, there is a neighborhood of sup(S) completely contained in V . On
the other hand, by Proposition 1.4.12, this neighborhood has to contain some element of
S, which in turn would imply that it would have to contain some element of U. But then U
intersects V : a contradiction. Therefore, sup(S) U.
We now finish the proof that sup(S) = b. We certainly have that sup(S) b as b is an
upper-bound of S. Thus, it suffices to show that sup(S) b. To show this, we proceed by
contradiction: suppose that sup(S) < b. Then, because U is open, there is some > 0 such
that (sup(S) , sup(S) + ) U. By Proposition 1.4.12, there must be some x S with
sup(S) < x sup(S), so that [(a, x) U. But then,
[(a, x)) (sup(S) , sup(S) + ) = [(a, sup(S) + ) U,

(3.7.19)

and so, in particular, there is some x0 > sup(S) such that [(a, x0 ) U: a contradiction.
Therefore, we must have that sup(S) = b.
Now that we have finally succeeded in showing that sup(S) = b, we finish the proof by
coming to a contradiction of the assumption of disconnectedness. As sup(S) = b, this means
that for every x I with a < x < b, we have that [(a, x) U, and hence
[

[(a, x) = [(a, b) U.

(3.7.20)

a<x<b

Thus, either V = {b} or V = 0:


/ a contradiction of being open in I or being nonempty
respectively.

a Recall that this notation just means that the end-points can be either open or closed; see the remark in
Proposition 2.4.74.

3.7 The Intermediate and Extreme Value Theorems

149

And now we finally get to the statement of the true Intermediate Value Theorem.
Theorem 3.7.21 Intermediate Value Theorem. Let f : X Y be a continuous function

and let S X be connected. Then, f (S) is connected.


Proof. We proceed by contradiction: suppose that f (S) is disconnected. Then, f (S) = U V
for U,V f (S) open in f (S) disjoint and nonempty. By continuity, we have thata
S b f |S 1 (U) f |S 1 (V ) S,

(3.7.22)

and so
S = f |S 1 (U) f |S 1 (V ).

(3.7.23)

As U and V are open in S and f is continuous, f |S 1 (U) and f |S 1 (V ) are open in S. They
also must be disjoint, for a point which lied in their intersection would be mapped into U S
via f . Therefore, because S is connected, we have that either f |S 1 (U) or f |S 1 (V ) is empty,
which implies respectively that either U or V is empty: a contradiction.

a Also
b This

recall that f 1 ( f (S)) Ssee Exercise A.1.48(ii).


follows from the fact that we are applying the preimage of the restriction of f to S.

As a corollary of this (and the fact that a subnet of R is connected iff it is an intervalsee
Theorem 3.7.16), we have the classical statement of the Intermediate Value Theorem.
Corollary 3.7.24 Classical Intermediate Value Theorem Let f : [a, b] R be con-

tinuous. Then, f ([a, b]) is an interval. In particular, any element between f (a) and f (b) is
in the image of f .
Proof. The in particular part follows from the definition of an interval, Definition A.1.89.
[a, b] is connected by Theorem 3.7.16, and so, by the Intermediate Value Theorem,
f ([a, b]) is connected, and so by Theorem 3.7.16 again, is an interval.


3.7.2

Quasicompactness and the Extreme Value Theorem


Youll recall from calculus that the classical statement of the Extreme Value Theorem is
Let f : [a, b] R be continuous. Then, there exists x1 , x2 [a, b] such that
f (x1 ) = infx[a,b] { f (x)} and f (x2 ) = supx[a,b] { f (x)}. In other words, continuous functions attain their maximum and minimum on closed intervals.

(3.7.25)

This is actually a special case of (or follows easily from) a much more general, elegant statement.
Theorem 3.7.26 Extreme Value Theorem. Let f : X Y be continuous and let K X.

Then, if K is quasicompact, then f (K).


R

In other words, the continuous image of a quasicompact set is quasicompact.

Let U be an open cover of f (K). Then,


Proof. Suppose
 that K is quasicompact.

f 1 (U ) := f 1 (U) : U U is an open cover of K, and therefore it has a finite subcover { f 1 (U1 ), . . . f 1 (Um )}. In other words,
K f 1 (U1 ) f 1 (Um ),

(3.7.27)

Chapter 3. Basics of general topology

150
and hence




f (K) f f 1 (U1 ) f 1 (Um ) = a f f 1 (U1 ) f f 1 (Um )
= b U1 Um ,

(3.7.28)

so that {U1 , . . . ,Um } is a finite subcover of U , and hence f (K) is quasicompact.


a By
b By

Exercise A.1.40(iii).
Exercise A.1.48(i).

And now we can present the classical version of the theorem.


Corollary 3.7.29 Classical Extreme Value Theorem Let f : [a, b] R be continuous.
Then, f ([a, b]) is closed and bounded. In particular, it attains a maximum and minimum on
[a, b].
R

I suppose that usually in calculus people dont make any statement regarding the
image being an intervalthis is separately stated as the Intermediate Value Theorem.

Proof. The in particular is a result of Exercise 2.5.35, the statement that closed bounded
sets contain their supreumum and infimum.
By the Quasicompactness (Theorem 2.5.64), [a, b] is quasicompact. Therefore, by the
Extreme Value Theorem, f ([a, b]) is quasicompact, and therefore, closed and bounded,
again by the Quasicompactness.

In fact, we may as well just combine the classical statements into one.
Corollary 3.7.30 Classical Intermediate-Extreme Value Theorem Let f : [a, b] R

be continuous. Then, f ([a, b]) is a closed, bounded, interval.

3.8

Local properties
For most topological properties, there is a local version.
Meta-definition 3.8.1 Locally XYZ A topological space is locally XYZ iff each point

has a neighborhood base consisting of sets that are XYZ.


Of particular importance are the notions of local connectedness, local quasicompactness, and
locally (completely/perfectly)-Ta .
Exercise 3.8.2 Show that if a space is T2 then it is locally T2 . Find a counter-example to

show that converse is false.


Exercise 3.8.3 Show that if a space is locally quasicompact and T2 , then it is locally

compact. Find a counter-example to show the converse is false.


The following is the result related to locally quasicompactness that will be used in the proof of
The Haar-Howes Theorem.
Proposition 3.8.4 Let X be a locally compact space, let K X be quasicompact, and let

3.8 Local properties

151

U X contain X. Then, there is an open set V X with compact closure such that
K V Cls(V ) U.

(3.8.5)

Proof. We leave this as an exercise.


Exercise 3.8.6 Complete the proof yourself.

4. Uniform spaces

4.1

Motivation
A uniform space is the most general context in which one can talk about concepts such as uniform
continuity, uniform convergence, cauchyness, completeness, etc.. To formalize this notion, we will
equip a set with a distinguished set of covers, called uniform covers. The example you should always
keep in the back of your mind is the collection of all -balls for a fixed : U := {B (x) : x R}.
The idea is that, somehow, all of the sets in the same uniform cover are of the same size.
Having specified the uniform covers, we will then be able to say things like a net 7 x
is cauchy iff for every uniform cover U , there is some U U such that 7 x is eventually
contained in U. In the case that the collection of uniform covers is {U : > 0},1 you can check
that this is precisely the definition of cauchyness we had in R (Definition 2.4.22).
Moreover, the generalization from R to uniform spaces is not a needless abstraction. Indeed, I
am required to cover metric spaces (Definition 4.3.6) in this course, and this is just a very special
type of uniform space. Indeed, essentially every topological space we look at in these notes
besides ones cooked up for the express purpose of producing a counter-examplehas a canonical
uniformity. On the other hand, it is certainly not the case that every topological space we encounter
will be a metric space. For example, something as simple as all continuous functions from R to R
has no canonical metric,2 but is trivially a uniform space (because it is a topological groupsee
Definition 4.3.57).

4.2

Basic definitions and facts


A uniform space will wind-up being a set equipped with a special set of covers, the uniform
covers. Of course, however, as you should expect, we cannot just take any collection of covers
and declare them to be the uniform coversthe collection of uniform covers has to satisfy certain
reasonable properties, analogous to the properties satisfied by the collections of all -balls. They
The collection of all the U is not actually a uniformity but rather a uniform basesee Definition 4.2.69.
what its worth, I believe the topology is metrizable (homeomorphic to a metric space), but certainly not with
any metric you would like to work with, much less a canonical one.
1 Disclaimer:
2 For

Chapter 4. Uniform spaces

154

key requirement is that the collection of uniform covers has to be downward-directed with respect to
a relation called star-refinement.3 Thus, before getting to the definition of a uniform space itself, we
must say what we mean by star-refinement (we will say what we mean by downward-directed
in the definition of a uniform space itself).
4.2.1

Star-refinements
Definition 4.2.1 Star Let X be a set, let S X, and let U be a cover of X. Then, the

star of S with respect to U , StarU (S), is defined by


[

StarU (S) :=

U.

(4.2.2)

UU s.t. US6=0/

The star of a point is the star of its singleton and denoted StarU (x).
In other words, the star of a set with respect to a cover is the union of all elements of
the cover which intersect the set.

To help guide your intuition, stars play similar roles as -balls did in R. For example, if
U := {B (x) : x R} is the cover of R ball all balls, then the star centered at a point x R is
StarU (x) = B2 (x).

(4.2.3)

Thus,
Stars are like big balls.
Example 4.2.4 The star of the preimage is not the preimage of the star One
might hope for the preimage of the star to be the star of the preimage, that is, for f : X Y ,
V Y , and V a cover of Y , that


f 1 (StarV (V )) = Star f 1 (V ) ( f 1 (V )).

(4.2.5)

Unfortunately, this is not necessarily the case. For example, take f to be the inclusion
R , R2 (with image the x-asix), take V to be any subset of R2 which does not intersect
the x-axis (e.g. V = {hx, yi R2 : y = 1}), and take V := {R2 }, namely the cover of R2
consisting of only R2 itself. Then,
StarV (V ) = R2 ,

(4.2.6)

and so
f 1 (StarV (V )) = R.

(4.2.7)

On the other hand,


Star f 1 (V ) ( f 1 (V )) = 0/
simply because f 1 (V ) = 0.
/
Despite this, we always have one inclusion.
3 You

are not expected to know what either of these terms mean yet.

(4.2.8)

4.2 Basic definitions and facts

155

Proposition 4.2.9 Let f : X Y be a function, let V Y , and let V be a cover of Y . Then,

Star f 1 (V ) ( f 1 (V )) f 1 (StarV (V )) .

(4.2.10)

Furthermore, if f satisfies f ( f 1 (S)) = S for all S X, then we have equality.


A sufficient condition for f ( f 1 (S)) = S is for f to be surjective because then f has
a right-inversesee Appendix A.1.3(ii)

Proof. Note that we have that (Exercise A.1.40(ii) and Exercise A.1.48(i)) V V0
f ( f 1 (V ) f 1 (V0 )). Therefore, if f 1 (V ) intersects f 1 (V0 ), it must be the case that V
intersects V0 . Furthermore, if we have that f ( f 1 (S)) = S, then we would in fact that that
V V0 = f ( f 1 (V ) f 1 (V0 )), so that in this case V intersects V0 iff f 1 (V ) intersects
f 1 (V0 ).
Star f 1 (V ) ( f 1 (V0 )) =

f 1 (V )

[
V V s.t. f 1 (V ) f 1 (V0 )6=0/

= a f 1

(4.2.11)

V V s.t. f 1 (V ) f 1 (V0 )6=0/

!
b f 1

= f 1 (StarV (V0 )) .

V V s.t. V V0 6=0/


a Exercise

A.1.40(i)
we are using the fact that f 1 (V ) intersects f 1 (V0 ) implies that V intersects V0 . Also note that we
have equality here if f ( f 1 (S)) = S.
b Here

We also have the dual counter-example and result for the image.


Example 4.2.12 The star of the image is not the image of the star Take f : R2 R

to be the projection onto the x-axis, define


U := {[m + y, m + 1 + y] {y} : m Z, y R},

(4.2.13)

and U0 := (0, 1) {0}. Then,


StarU (U0 ) = [0, 1] {0},

(4.2.14)

and so
f (StarU (U0 )) = [0, 1].

(4.2.15)

On the other hand,


f (U ) = {[m + y, m + 1 + y] : m Z, y R}

(4.2.16)

and f (U0 ) = (0, 1), and so f (U0 ) intersects [m + y, m + 1 + y] for m = 0 and 1 < y < 1,
and so
Star f (U ) f (U0 )

[y, y + 1] = (1, 2),

1<y<1

which is strictly larger than f (StarU (U0 )).

(4.2.17)

Chapter 4. Uniform spaces

156

Proposition 4.2.18 Let f : X Y be a function, let U X, and let U be a cover of X.

Then,
Star f (U ) ( f (U)) f (StarU (U)).

(4.2.19)

Furthermore, if f is surjective and satisfies f 1 ( f (S)) = S for all S X, then we have


equality.
R

Requiring that f be surjective is not really a big dealwe need f to be surjective for
the image of a cover to be a cover anyways.

A sufficient condition for f 1 ( f (S)) = S is for f to be injective because then f has a


left-inversesee Appendix A.1.3(i).

Proof. Note that we have that (Exercise A.1.40(iv) and Exercise A.1.48(ii)) U U0
f 1 ( f (U) f (U0 )). Therefore, if U intersects U0 , it must be the case that f (U) intersects
f (U0 ). Furthermore, if we have that f is surjective and f 1 ( f (S)) = S for all S X, then
we have that
U U0 = f 1 ( f (U)) f 1 ( f (U0 )) = f 1 ( f (U) f (U0 )),

(4.2.20)

and hence
f (U U0 ) = f (U) f (U0 ),

(4.2.21)

so in this case U intersects U0 iff f (U) intersects f (U0 ). Hence,


!
[

f (StarU (U)) = f

U0

=a

U 0 U s.t. U 0 U6=0/

f (U 0 )

U 0 U s.t. U 0 U6=0/

(4.2.22)

f (U ) = Star f (U ) f (U).

U 0 U s.t. f (U 0 ) f (U)6=0/


a Exercise

A.1.40(iii)
we are using the fact that U intersects U0 implies that f (U) intersects f (U0 ). Also note that we have
equality here if f is surjective and f 1 ( f (S)) = S.
b Here

On the other hand, we have the following similar-looking result where we have equality in all
cases.
Proposition 4.2.23 Let f : X Y be a function, let U X, and let V be a cover of Y .

Then,
Star f 1 (V ) (U) = f 1 (StarV ( f (U))) .

(4.2.24)

Proof. Note that f 1 (V ) intersects U iff V intersects f (U). Using this, we have

Star f 1 (V ) (U) :=
V V s.t.

f 1 (V ) = f 1

f 1 (V )U6=0/

V V s.t. V f (U)6=0/

(4.2.25)

=: f 1 (StarV ( f (U))) .


4.2 Basic definitions and facts

157

Definition 4.2.26 Refinement and star-refinement Let X be a set, and let U and V

be covers on X.
(i). U is a refinement of V , written U  V iff for every U U there is some V V
such that U V .
(ii). U is a star-refinement of V , written U V iff for every U U there is a V V
such that StarU (U) V .
R
R

The intuition is that every element of U is small enough to be contained in some


element of V .
In other words, U is a star-refinement of V iff for all U U , there is some V V
such that, whenever U 0 U intersects U, it follows that U 0 V . The intuition for
star-refinements is the same as for refinements, except that a star-refinement is much
finer than a mere refinement.

Exercise 4.2.27 Show that  is a preorder, but not a partial-order.


Exercise 4.2.28 Show that is transitive, but not even reflexive.

Any two covers always have a common refinement. In fact, they have a canonical (but not
unique!) largest one.
Definition 4.2.29 Meet of covers Let X be a set, and let U and V be covers of X.
Then, the meet of U and V , U V , is defined by

U V := {U V : U U and V V }.
R

(4.2.30)

The term meet and notation U V is notation taken from the theory of partiallyordered sets where x y (the meet) of x and y is defined to be inf{x, y}. In our case,
however, this is abuse of notation and terminology as  is not a partial-order (and so
infinma need not be uniquesee Exercise 1.4.2).

Exercise 4.2.31 Show that (i) U V  U , V ; and (ii) if W refines both U and V , then

it refines U V . Find an example to show that it is not the unique such cover with these
two properties.
Exercise 4.2.32 Does every cover have any maximal star-refinement?
Proposition 4.2.33 Let X be a set, let x X, and let U and V be covers of X. Then,

StarU (x) StarV (x) = StarU V (x).


R

(4.2.34)

Warning: This fails if you replace x with a general set Ssee the following counterexamplethough we should always have that inclusion.

Chapter 4. Uniform spaces

158
Proof. We simply compute.
!
[

StarU (x) StarV (x) =:

V V s.t. xV

UU s.t. xU

UU s.t. xU
V V s.t. xV

U V
(4.2.35)

U V =: StarU V (x).

UU ,V V s.t. xUV




Example 4.2.36 StarU (S) StarV (S) 6= StarU V (S) Define X := {0, 1, 2, 3}, and

U := {{0, 1}, {2}, {3}} and V := {{0, 2}, {1, 3}} .

(4.2.37)

U V = {0,
/ {0}, {1}, {2}, {3}} .

(4.2.38)

Then,

Define S := {1, 2}. Then, StarU (S) = {0, 1} {2} = {0, 1, 2} and StarV (S) = {0, 2}
{1, 3} = {0, 1, 2, 3}, and so
StarU (S) StarV (S) = {0, 1, 2}.

(4.2.39)

On the other hand,


StarU V (S) = {1, 2}.

(4.2.40)

Exercise 4.2.41 Show that if U1  U2 (resp.U1 U2 ) and V1  V2 (resp. V1 V2 ), then

U1 V1  U2 V2 (resp. U1 V1 U2 V2 ).

Later it will be useful to know that taking the preimage preserves (star-)refinement. But first,
we need to be a bit careful about what we mean by the image and preimage of a cover.
Definition 4.2.42 Image and preimage of a cover Let f : X Y be a function, let

U be a cover of X, and V be a cover of Y . Then, the image of U , f (U ), is defined by


f (U ) := { f (U) : U U }.

(4.2.43)

The preimage of V , f 1 (V ), is defined by


f 1 (V ) := { f 1 (V ) : V V }.
R

(4.2.44)

We needed to make these definitions because, technically speaking, we only defined


the image an preimage of subsets of X and Y respectively. As U and V are subsets of
2X and 2Y respectively, not X and Y , to talk about their usual image and preimage,
we would need to have a function from 2X to 2Y .a In particular, note that the definition
of the preimage of a cover is not {U 2X : f (U) V }.

a Actually, given a function f : X Y , we obtain a function from 2X to 2Y and a function from 2Y to


2X f : 2X 2Y (the function that sends a set to its image) and f 1 : 2Y 2X (the function that sends a set
to its preimage) respectively. The preimage of a cover is the image of the cover with respect to the preimage
function f 1 : 2Y 2X . Likewise, the image of a cover is the image of the cover with respect to the image
function f : 2X 2Y .

4.2 Basic definitions and facts

159

Proposition 4.2.45 Let f : X Y be a function and let U and V be covers of Y such that

U  V (U V ). Then, f 1 (U )  f 1 (V ) ( f 1 (U ) f 1 (V )).

Proof. We first do the case with U  V . Let f 1 (U) f 1 (U ) with of course U U .


Then, as U  V , there is some V V such that U V . Then, f 1 (U) f 1 (V ), and so
f 1 (U )  f 1 (V ).
Now we do the case U V . Let f 1 (U) f 1 (U ) with of course U U . Then, as
U V , there is some V V such that StarU (U) V . Hence,
Star f 1 (U ) ( f 1 (U)) f 1 (StarU (U)) f 1 (V ),

(4.2.46)

where we have applied Proposition 4.2.9, and so U V .

Exercise 4.2.47 Show that if U  V , then f (U )  f (V ).


Proposition 4.2.48 Let f : X Y be a function and let U and V be covers of Y . Then,

f 1 (U V ) = f 1 (U ) f 1 (V ).

Proof. We simply compute.




f 1 (U ) f 1 (V ) := f 1 (U) f 1 (V ) : U U ,V V


= f 1 (U V ) : U U ,V V =: f 1 (U V ).

(4.2.49)


Unfortunately, however, in general, it will not be the case that the image preserves starrefinements.
Exercise 4.2.50 Find an example of covers U and V with U V , but f (U ) not a

star-refinment of f (V ).
However, in special case, it will.
Proposition 4.2.51 Let f : X Y be a function and let U and V be covers of X such that

U V . Then, if f is surjective and f 1 ( f (U)) = U for all U U , then f (U ) f (V ).


Proof. Suppose that f is surjective and f 1 ( f (S)) = S for all S X.
Exercise 4.2.52 Using the fact that f preserves stars by Proposition 4.2.18 to show

that f (U ) f (V ).

4.2.2

Uniform spaces
Definition 4.2.53 Uniform space A uniform space is a set X equipped with a nonempty

fof covers, the uniformity, such that


collection U
fand U V , then V U
f; and
(i). (Upward-closed) if U U
f, then there is some W U
f such that W U
(ii). (Downward-directed) if U , V U

Chapter 4. Uniform spaces

160
and W V .
R

fare uniform covers.


The elements of U

The intuition is that, in a given uniform cover U , every element of U is of the same
size (think U := {B (x) : x R} for a fixed > 0).

Note that, by taking U = V in (ii), we see that, in particular, every uniform cover is
star-refined by some other uniform cover.

Note that the cover {X} is an element of every uniformity. This follows from the fact
that any collection of uniform covers is required to be nonempty and the fact that
collections of uniform covers are upward-closed with respect to star-refinement. (We
mention this because sometimes that {X} is a uniform cover is taken as an axiom, in
place of the requirement that the collection of uniform covers simply be nonempty.)

Of incredible importance is that uniformities define a canonical topology. Thus, we can think
of uniform spaces as topological spaces with extra structure.
fi be a uniform space. Then, for x X,
Proposition 4.2.54 Uniform topology Let hX, U
n
o
f
Bx := StarU (x) : U U

(4.2.55)

is a neighborhood base at x. The topology defined by this neighborhood base is the uniform
f.
topology on X with respect to U
R

Unless otherwise stated, uniform spaces are always equipped with the uniform
topology.

Proof. Let StarU1 (x), StarU2 (x) Bx . Let U3 be a common star-refinement of both U1 and
U2 . We wish to show that
StarU3 (x) StarU1 (x), StarU2 (x).

(4.2.56)

By 1 2 symmetry, it suffices to just prove one of these inclusions. By definition, we have


[

StarU3 (x) :=

U.

(4.2.57)

UU3 s.t. xU

So, let U U3 contain x. Because U3 star-refines U1 , there is some V U1 such that


StarU3 (U) V.

(4.2.58)

In particular, U V . Then, V contains x, and so V StarU1 (x), and so U StarU1 (x). It


follows that
StarU3 (x) StarU1 (x),

(4.2.59)

and we are done.


R

Note that this proof did not make use of the upward-closed axiom. Thus, in fact,
uniform bases (see below in Definition 4.2.69) suffice to define the uniform topology
as well.

4.2 Basic definitions and facts




161

Example 4.2.60 Discrete and indiscrete uniform spaces Just as with topological

spaces, we can always put the largest and the smallest uniformity on a set X. The former
case, in which every cover of X is a uniform cover, is the discrete uniformity, and the latter,
in which the only uniform cover is {X}, is the indiscrete uniformity.
Exercise 4.2.61 Show that the uniform topology with respect to the discrete uni-

formity is the discrete topology and that the uniform topology with respect to the
indiscrete uniformity is the indiscrete topology.

Just as we have continuous maps between topological spaces, we have uniformly-continuous


maps between uniform spaces.
Definition 4.2.62 Uniformly-continuous function Let f : X Y be a function between

uniform spaces. Then, f is uniformly-continuous iff the preimage of every uniform cover is
a uniform cover.


Example 4.2.63 The category of uniform spaces The category of uniform spaces

is the category Uni whose collection of objects Uni0 is the collection of all uniform spaces,
for every uniform space X and uniform space Y the collection of morphisms from X to
Y , MorUni (X,Y ), is precisely the set of all uniformly-continuous functions from X to Y ,
composition is given by ordinary function composition, and the identities of the category
are the identity functions.
Exercise 4.2.64 Show that the composition of two uniformly-continuous functions

is uniformly-continuous.
R

Note that this is something you need to check in order for Uni to actually
form a category (MorUni (X,Y ) needs to be closed under composition). You
also need to verify the identity function is uniformly-continuous, but this is
trivial (the preimage of a cover is itself, so. . . ).

Definition 4.2.65 Uniform-homeomorphism Let f : X Y be a function between

uniform spaces. Then, f is a uniform-homeomorphism iff it is an isomorphism in Uni.


Exercise 4.2.66 Show that a function is a uniform-homeomorphism iff (i) it is bijective,

(ii) it is uniformly-continuous, and (iii) its inverse is uniformly-continuous.


Exercise 4.2.67 Show that if a function is uniformly-continuous, then it is continuous.
Exercise 4.2.68 Find an example of a function that is bijective and uniformly-continuous,

but not a uniform-homeomorphism.

4.2.3

Uniform bases, generating collections, and the initial and final uniformities
It is usually convenient to not specify every uniform cover explicitly, but rather, to specify a certain
collection of uniform covers and then take the uniformity generated by this collection. This is
analogous to how it is often convenient to only specify a base for a topology.

Chapter 4. Uniform spaces

162

e be a collection of
Definition 4.2.69 Uniform base Let X be a uniform space and let B

e is a uniform base for the uniformity on X iff the statement


uniform covers of X. Then, B
e
that a cover U is a uniform cover is equivalent to the statement that there is some B B
such that B U .
R

You should compare this to the definition of a base for a topology (Definition 3.1.4).

And just like with bases, the real reason uniform bases are important is because they allow us
to define uniformities. Thus, same as before, it is important to know when a collection of covers of
a set form a uniform base for some uniformity.
e be a nonempty collection of covers of X. Then,
Proposition 4.2.70 Let X be a set and let B

e is a uniform base iff B


e is downward-directed
there exists a unique uniformity for which B
with respect to .
R

Just as we did for bases, if a set X does not a priori come with a uniformity, we will
still refer to any collection of covers that is downward-directed with respect to as a
uniform base.

e is a uniform base. Let


Proof. () Suppose that there exists a uniformity for which B
e Then, there is certainly some uniform cover U which star-refines both B and
B, C B.
e are a priori taken to be uniform covers). Thus, we will be done
C (recall that covers in B
e However, because U is a uniform cover and B
e is a uniform
if we can show that U B.
e
base, there is some D B such that D U . As U star-refines both B and C , it follows
that D does as well.
e is downward-directed with respect to . Define U
fto be the collection
() Suppose that B
e As B
e is nonempty, so to is U
f (it
of covers that are star-refined by some element of B.
e
must contain {X}). By the definition of uniform bases, this was the only possibility. As B
fcontains B,
e and
is nonempty and downward-directed with respect to , it follows that U
f
in particular is nonempty. U is upward-closed with respect to because if U is a uniform
e that star-refines U , and hence in turn
cover and star-refines V , then there is some B B
star-refines V . We now check that it is downward-directed with respect to . If U and V
e that star-refine U and V respectively. Because B
e is
are covers, then there are B, C B
e
downward-directed, there is then some D B which star-refines both B and C , and hence
both U and V .

As we mentioned above at the end of the proof of Proposition 4.2.54, uniform bases define the
uniform topology just as well as the entire uniformity.
e be a uniform base for the uniform space X. Then, for x X,
Corollary 4.2.71 Let B
n
o
f
Bx := StarU (x) : U U

(4.2.72)

is a neighborhood base at x for the uniform topology.


f:= {{x} : x X} is a uniform base for the discrete uniformity.
Exercise 4.2.73 Show that U
R

That is, the collection consisting of just a single open cover, which itself is just the
collection of all singletons, forms a uniform base. In other words, you need to check
that U star-refines itself.a

4.2 Basic definitions and facts


R

163

The dual result for the indiscrete uniformity is trivialthe indiscrete uniformity by
definition only has a single cover to begin with (namely {X}), and that single cover
certainly forms a uniform base for itself.

course, while in general is not reflexive, that doesnt mean we cant at least have U U some of
the time.
a Of

We will want to check that two collections of uniform covers are the same by just looking at
uniform bases. We did not present it because we did not need to make use of it, but of course there
is an analogous result for bases of topological spaces.
e and Ce be uniform bases on X. Then, B
e and
Proposition 4.2.74 Let X be a set, and let B

e there is some C Ce with C B;


Ce determine the same uniformity iff for every B B,
e
e
and for every C C , there is some B B with B C .
e Then, B
e and Ce determine the same uniformity. Let B B.
Proof. () Suppose that B
e
is in particular in the uniformity generated by B, and hence in the uniformity generated by
e Ce, the other result is
Ce. Thus, there is some C Ce such that C B. By symmetry B
true as well.
e there is some C Ce with C B; and for every
() Suppose that for every B B,
e
e
C C , there is some B B with B C . Let U be a uniform cover in the uniformity
e Then, there is some B B
e such that B U . By the hypothesis, then,
determined by B.
e
there is some C C with C B U . Thus, U is in the uniformity determined by Ce.
e Ce symmetry, the reverse inclusion is also true.
By B

We will also want to check whether a function is uniformly-continuous by simply looking at a
uniform base.
f) (Y, Ve) be a function between uniform spaces and
Proposition 4.2.75 Let f : (X, U
ffor each
let Ce be a uniform base for Ve. Then, f is uniformly-continuous iff f 1 (C ) U
C Ce.
Proof. () There is nothing to check (because every open cover in a uniform base is itself
a uniform cover).
f for each C Ce. We need to show that the preimave
() Suppose that f 1 (C ) U
of every uniform cover is a uniform cover. So, let V Ve. Then, there is some C Ce
such that C V . Then, by Proposition 4.2.45, it follows that f 1 (C ) f 1 (V ). As
fand U
fis upward-closed with respect to , it follows that f 1 (V ) U
f, so
f 1 (C ) U
that f is uniformly-continuous.

A uniform space does not start its life as a topological spaceinstead, it obtains a canonical
topology from its uniformity. In particular, it does not make sense a priori to just restrict to open
covers. On the other hand, once we specify the uniform covers, a topology is determined, and then,
it turns out (as the following proposition shows), that it suffices to just look at open uniform covers,
more precisely, those uniform covers obtained by taking the interior of the covers in your uniform
base.

Chapter 4. Uniform spaces

164

e be a uniform base on a set X. Then, (i) Int(B) := {Int(B) : B


Proposition 4.2.76 Let B
n
o

e := Int(B) : B B
e is (still) a uniform base on
B} is (still) a cover of X and (ii) Int(B)
e
X that generates the same uniform base as B.
e and let x X. Let C be a star-refinement of B. Let C C contain x
Proof. Let B B
and let B B be such that StarC (C) B. Then,
x StarC (x) StarC (C) B,

(4.2.77)

and so x Int(B), and so indeed Int(B) is a cover of X.


e and let D be a common star-refinement of B and C . We show that
Let B, C B
Int(D) is a common star-refinement of Int(B) and Int(C ). Because of B C symmetry,
it suffices to show that it is a star-refinement of Int(C ). So, let Int(D) Int(D). Then, there
is some C C such that StarD (D) C, and so because union of the interiors is contained
in the interior of the union (Exercise 2.5.61(ii)), we have
StarInt(D) (Int(D)) StarInt(D) (D) Int(C),

(4.2.78)

and so indeed Int(D) star-refines Int(D).


e induce the same uniform structure. To show this,
e and Int(B)
It remains to show that B
e
we apply Proposition 4.2.74. Let B B and let C be a star-refinement of B. Then, Int(C )
certainly star-refines B. For the other direction, let D be a star-refinement of C . We show
that D star-refinement Int(B). Let D D and let C C be such that StarD (D) C. Let
B B be such that StarC (C) B. Let x StarD (D). Then,
x C StarC (x) StarC (C) B,
and so indeed x Int(B).

(4.2.79)


As the idea of a uniformity is inherently global in nature, there really isnt a way to define
a uniformity that is analogous to the method of defining a topology by specifying neighborhood
bases.4 Furthermore, a bit more unexpected, and quite a bit more unfortunate, is that we cannot
generate a uniformity by simply declaring any collection of covers to be uniform in a way analogous
to specifying a generating collection of a topology.
Example 4.2.80 A collection of covers for which there is no unique minimal
uniformity making the covers uniform Before we begin with the actual counter-example,


let us elaborate on what we are trying to do. If you look back to generating collections
for topologies (Proposition 3.1.12), youll see that what we would like to be true is the
fis a nonempty collection of covers on X, then there is a unique uniformity
following: if S
f such that (i) every cover in S
fis uniform with respect to U
f, and (ii) that if U
f0 is any
U
0
f
f
other uniformity for which this is true then U U . The objective then is to construction a
ffor which there exists no such uniformity.
collection S
Define X := {1, 2, 3} and
U := {{1, 2}, {2, 3}}.
4 Of

(4.2.81)

course one cannot hope to make a statement like this precise (What does it mean for a method of defining a
uniformity to be analogous to a method of defining a topology?), but hopefully the intuition is clear. All elements in a
uniform cover, no matter where they are in the space, are supposed to be thought of as the same size. How could one
hope to encode the idea of two sets living far away are of the same size by the specification of local information alone?

4.2 Basic definitions and facts

165

f1 and U
f2 which contain {U } and for which there
We construct incomparable uniformities U
f
f1 and U
f2 . This is enough,
is no uniformity U containing {U } and contained in both U
because if there were a minimal uniformity containing {U } minimality would dictate it be
f1 and U
f2 .
contained in both U
Define
U1 := {{1}, {2}, {3}, {1, 2}} and U2 := {{1}, {2}, {3}, {2, 3}} .a

(4.2.82)

These covers both star-refine themselves, and hence both {U1 } and {U2 } form uniform
f1 and U
f2 the uniformities generated by these respective uniform bases
bases: denote by U
f
(so that Uk is the collection of covers which are star-refined by Uk ). As U is star-refined by
f1 and U
f2 . It remains to check that there is no
both U1 and U2 , U is contained in both U
uniformity contained in both which contains U .
f be a uniformity containing U and contained in both U
f1 and U
f2 . U
f must then
So, let U
f1 and U
f2 , V must then in-turn be
contain a star-refinement V of U . As V is in both U
star-refined by both U1 and U2 . However, there is no cover of X which star-refines U and
f.
is star-refined by both U1 and U2 :b a contradiction. Therefore, there is no such U
a Draw

a picture.
= {1, 2} and StarU2 ({2}) = {2, 3}, and so V would have to contain a set which contains
{1, 2} and a set which contains {2, 3}. This implies that the star of either of these sets with respect to V is
{1, 2, 3}, which is not contained in any element of U .
b Star

U1 ({2})

With topological spaces, we could define a topology by specifying the closure or interior, or
defining a notion of convergence. To the best of my knowledge, there is no analogous method for
any of these definitions of defining uniformities. As for the initial and final topologies, however,
there are analogous constructions (which you might not have expected as we no longer can simply
generate uniformities like we could with topologies).
Proposition 4.2.83 Initial uniformity Let X be a set, let Y be an indexed collection of

uniform spaces, and for each Y Y let fY : X Y be a function. Then, there exists a
fon X, the initial uniformity with respect to { fY : Y Y }, such that
unique uniformity U
ffor all Y Y ; and
(i). fY : X Y is uniformly-continuous with respect to U
f0 is another uniformity for which each fY is uniformly-continuous, then U
f U
f0 .
(ii). if U
Furthermore,
(i). if CeY is a uniform base for Y , then the collection of all finite meets of covers of the
f; and
form fY1 (V ) with V CeY is a uniform base for U
(ii). the uniform topology of the initial uniformity is the same as the initial topology.
R

In other words, the initial uniformity is the smallest uniformity for which each fY is
uniformly-continuous.

But what about the largest such uniformity? Well, the largest such uniformity is
always going to be the discrete uniformity, which is not very uninteresting. This is
how you remember whether the initial uniformity is the smallest or largestit cant
be the largest because the discrete uniformity always works.

Chapter 4. Uniform spaces

166

Proof. Let CeY be any uniform base for Y and let Ce be the collection of all finite meets of
covers of the form fY1 (V ) for V CeY . We wish to check that this is a uniform base.
So, let fY1
(V1 ) fY1
(Vm ) and fY1
(Vm ) fY1
(Vm+n ) be two elements of Ce.
m
m+n
1
m+1
Let Wk be a star-refinement of Vk in CeY . Then, because preimage preserves star-refinement
k

(Proposition 4.2.45) and the meet is compatible with the relation of star-refinement
(Exercise 4.2.41), it follows that
fY1
(W1 ) fY1
(Wm+n )
m+n
1

(4.2.84)

is a star-refinement of the two initial covers.


e is a uniform base. Denote by U
fthe uniformity generated by U
f.
Thus, B
1
By construction, fY (V ) is a uniform cover for all V CeY , so each fY is certainly
f0 were a uniformity for which each fY were
uniformly-continuous. On the other hand, if U
f0 would have to contain every cover of the form f 1 (V ) for
uniformly-continuous, then U
Y
f0 would have to
V CeY a uniform cover of Y . As uniformities are closed under meets,a U
f, as desired.
contain Ce and hence U
Exercise 4.2.85 Show that the initial uniformity is unique.

We finally check that the uniform topology of the initial uniformity agrees with the
initial topology. First of all, each fY is uniformly-continuous, hence continuous, and so
as the initial topology is the coarsest topology for which each fY is continuous, the initial
topology must be coarser than the uniform topology of the initial uniformity.
In the other direction, let U X be open with respect to the uniform topology of the
initial uniformity. By the defining result of the uniform topology (Proposition 4.2.54),
e such that x StarB (x) U. Write
this means that for all x U, there is some B B
1
1
B = fY1 (V1 ) fYm (Vm ). As, for points, the star with respect to a wedge is the
intersection of the stars (Proposition 4.2.33),
StarB (x) = Star f 1 (V1 ) (x) Star f 1 (Vm ) (x)
Y1

Ym

= fY1
(StarV1 ( fY1 (x))) fY1
(StarVm ( fYm (x))) .
m
1

(4.2.86)

where we have applied Proposition 4.2.23. As each StarVk ( fYk (x)) is a neighborhood of
fYk (x) Yk , fY1
(StarVk ( fYk (x))) is a neighborhood of x X for the initial topology, and so,
k
by the above equality, StarB (x) is likewise a neighborhood of x X for the initial topology.
This shows that every point in U has a neighborhood for the initial topology contained in U,
and hence U is likewise open for the initial topology, as desired.

a Why

Of course, we have a result that is perfectly analogous to Proposition 3.4.28 (a function is


continuous iff its composition with each fY is continuous).
Proposition 4.2.87 Let X have the initial uniformity with respect to the collection { fY :

Y Y }, let Z be a uniform space, and let f : Z X be a function. Then, f is uniformlycontinuous iff fY f is uniformly-continuous for all Y Y . Furthermore, the initial
uniformity is the unique uniformity with this property.
Proof. We leave the proof as an exercise.

4.2 Basic definitions and facts

167

Exercise 4.2.88 Prove this result, using the proof of Proposition 3.4.28 (the analo-

gous result for the initial topology) as guidance.



And just as we had with topological spaces, there is a dual version of the initial uniformity.
Proposition 4.2.89 Final uniformity Let X be a set, let Y be an indexed collection of

uniform spaces, and for each Y Y let fY : Y X be a function. Then, there exists a
fon X, the final uniformity with respect to { fY : Y Y }, such that
unique uniformity U
f; and
(i). fY : Y X is uniformly-continuous with respect to U
f0 is another uniformity for which each fY is uniformly-continuous, then U
f U
f0 .
(ii). if U
Furthermore, the uniform topology of the final uniformity is coarser than the final topology.
R

In other words, the final uniformity is the largest uniformity for which each fY is
uniformly-continuous.

But what about the smallest such uniformity? Well, the smallest such uniformity is
always going to be the indiscrete uniformity, which is not very interest. This is how
you remember whether the final uniformity is the smallest or largestit cant be the
smallest because the indiscrete uniformity always works.

For both the initial and final topologies, as well as the initial uniformity, we had a
relatively concrete description of the induced structure in terms of the structure on the
elements of Y . To the best of my knowledge, there is no such analogous description
for the final uniformity. For better insight as to why that is, see the proof.

Warning: The uniform topology of the final uniformity can be strictly coarser than
the final topologysee the following counter-example.

Proof. a Let Z be an indexed collection containing a copy of a uniform space Z for all
functions gZ : X Z that have the property that gZ fY is uniformly-continuous for all
fbe the initial uniformity with respect to {gZ : Z Z }.
Y Y . Let U
f.
By the previous result Proposition 4.2.87, each fY is uniformly-continuous U
0
f
Let U be another uniformity on X for which each fY is uniformly-continuous. Let U 0
0
f. We wish to show that U 0 U
f. To shows this, it suffices to show that : hX, U
fi
U
0
f
hX, U i is uniformly-continuous. However, fY is uniformly-continuous by assumption
for all Y Y , and so is amongst the gZ s, and in particular, is uniformly-continuous, as
desired.
Exercise 4.2.90 Show that the final uniformity is unique.

We finally check that the uniform topology of the final uniformity is coarser than the
final topology. First of all, each fY is uniformly-continuous, hence continuous, and so as
the final topology is the finest topology for which each fY is continuous, the final topology
must be finer than the uniform topology of the final uniformity.

a Proof

adapted from [18, Theorem 1.2.1.1].

Chapter 4. Uniform spaces

168

Example 4.2.91 The final topology need not agree with the uniform topology
of the final uniformity a Let q : R {(, 0], (0, )} =: X denote the quotient map. To


simplify notation, let us write x1 := (, 0] and x2 := (0, ). A subset of X is open in the


final topology iff its preimage under q is open. It follows that the final topology on X is
{0,
/ {x2 }, X}.
Obviously, every cover of X either contains X itself or it does not. If it does not, then it
must contain separately x1 and x2 , and so must be {{x1 }, {x2 }} (or this together with the
empty-set). The preimage of this cover under q is the cover of q {(, 0], (0, )}, which
is never a uniform cover of R.b Thus, every uniform cover of X (in the final uniformity)
must contain X. It follows that the uniform topology of the final uniformity is the indiscree
topology, which is strictly coarser than the final topology.
a Adapted

from [13, Exercise 2.1.10].


be a uniform cover of R, it must be star-refined by a cover of -balls, and the star of an -ball centered
at 0 R is contained in neither (, 0] nor (0, ).
b To

And the result dual to Proposition 4.2.87:


Proposition 4.2.92 Let X have the final uniformity with respect to the collection { fY : Y

Y }, let Z be a uniform space, and let f : X Z. Then, f is uniformly-continuous iff fY f


is uniformly-continuous for all Y Y . Furthermore, the final uniformity is the unique
uniformity with this property.
Proof. We leave the proof as an exercise.
Exercise 4.2.93 Prove this result, using the proof of Proposition 3.4.34 (the analo-

gous result for the final topology) as guidance



Of course, just as with topological spaces, a key application of the initial and final uniformities
is that they provide canonical uniformities on subsets, quotients, products, and disjoint-unions. The
definitions and results are completely analogous to the case of topological spaces, and so we omit
stating them explicitly.
After having discussed the real numbers themselves as a uniform space, we show below
(Example 4.2.107) that functions even as nice as polynomials are not uniformly continuous. On the
other hand, when restricted to quasicompact sets, all continuous functions are uniformly-continuous.
Proposition 4.2.94 Cantor-Heine Theorem Let f : X Y be a continuous function

between uniform spaces and let K X be quasicompact. Then, f |K : K Y is uniformlycontinuous.


R

Ideally we would have presented this result shortly after giving the definition of
uniformly-continuous functions, however, we do technically need the notion of the
subspace uniformity to state this result.

Proof. Let V be a uniform cover of Y . We would like to show that f |K 1 (V ) = f 1 (V )


{K} is a uniform-cover of K. To do this, by upward-closedness, it suffices to find a
uniform-cover of K which star-refines f 1 (V ) {K}.
f 1 (V ), while not necessarily a uniform cover of X, will certainly be an open cover, and
in particular will be an open cover of K. So, for x K, let Vx V be such that x f 1 (Vx ).

4.2 Basic definitions and facts

169

Then, choose a uniform cover Ux of X such that


StarUx {K} (x) f 1 (Vx ) K.

(4.2.95)



StarUx {K} (x) : x K

(4.2.96)

As

is an open cover of K, there is a finite subcover. So, let x1 , . . . , xm K be such that


n
o
StarUxk {K} (xk ) : 1 k m
(4.2.97)
is an open cover of K. Let U be a common star-refinement of each Uk a . Then,
StarU0 {K} (x) f 1 (Vx ) K

(4.2.98)

for all x K.
Let U be in turn a star-refinement of U0 . We show that U {K} is a star-refinement of
f 1 (V ) {K}. So, let U U . Let U0 U0 be such that StarU (U) U0 . Let x U. Then,
StarU {K} (U K) U0 K StarU0 {K} (x) f 1 (Vx ) K.

(4.2.99)


a It

is here that the finiteness given to us by quasicompactness is key.

Example 4.2.100 The real numbers The real numbers have a canonical uniformity

(and in fact, we will see below that this is just a special base of a more general construction):
let > 0 and define
U := {B (x) : x R}

(4.2.101)

f:= {U : > 0} .
U

(4.2.102)

and

fis a uniform base on R.


Exercise 4.2.103 Show that U

Chapter 4. Uniform spaces

170

Exercise 4.2.104 Show that f : R R is uniformly-continuous iff for every > 0


there is some > 0 such that f (B (x)) B ( f (x)) for every x R.
R

Compare this with the condition for f : R R being continuous at a R


given in Exercise 2.5.11(iii). We will spell-it-out here for convenience:
f : R R is continuous iff for every x R and for every > 0
(4.2.105)
there is some > 0 such that f (B (x)) B ( f (x)).
The key difference between continuity and uniform-continuity is the location
in which the quantification for every x R appears. In the former (just
continuous case), your choice of is allowed to depend on x, whereas to be
uniformly-continuous, a single has to work for every x R.

The result you just proved characterizing uniform-continuity in R is often


taken as the definition of uniform-continuity. Had we studied uniformcontinuity in the context of just the real numbers first (as opposed to in the
context of uniform spaces), we would have done the same. My personal
feeling, however, is that uniform continuity is not that incredibly important,
at least not to the point where it is worth going out of our way to discuss
it just in the context of R. The real reason we discuss uniform spaces is
for the purpose of discussing cauchyness and completeness, not uniform
continuity per se (and also of course because a huge collection of examples
of topological spaces are canonically uniform spaces).

Example 4.2.106 A uniformly-continuous function By Proposition 4.2.94, any con-

tinuous function restricted to a quasicompact set will be uniformly-continuous, so, for


example the function x 7 x2 is uniformly-continuous on [0, 1]. However, be careful: it is
not uniformly-continuous on all of R.
Example 4.2.107 A continuous function that is not uniformly-continuous Define
f : R R by f (x) := x2 . Of course f is continuous (because it is the product of continuous
functionssee Exercise 2.5.12).
On the other hand, we show that f does not satisfy the condition given in the previous
exercise. Take := 1. Then, if f were uniformly-continuous, there should be some > 0
such that
 2



x : |x x0 | < =: f (B (x0 )) B ( f (x0 )) := x R : |x x02 | < 1
(4.2.108)


for all x0 R. However, (x0 + 21 )2 is an element of the left-hand side, but


x0 + 12

2

x02 = x0 + 14 2

(4.2.109)

is not less than 1 in general (for example, for x0 = 1 (1 14 2 ).


On the other hand, by Proposition 4.2.94, f restricted to any closed interval is uniformlycontinuous.

4.3

Semimetric spaces and topological groups


As was previously mentioned, one big motivation for studying uniform spaces is that a huge
collection of very important examples of topological spaces admit a canonical uniformity. Two

4.3 Semimetric spaces and topological groups

171

such families of spaces that we will study are semimetric spaces and topological groups.
4.3.1

Semimetric spaces
Before we talk about any sort of uniformity, we had better first say what we mean by semimetric
space.
Definition 4.3.1 Semimetric and metric Let X be a set. Then, a semimetric on X is a

function |,| : X X R+
0 such that
(i). (Symmetry) |x, y| = |y, x|; and
(ii). (Triangle Inequality) |x, z| |x, y| + |y, z|.
|,| is a metric if furthermore (Definiteness) |x, y| = 0 implies x = y.

Semimetrics are also sometimes called pseudometric. However, the term seminorm
(something we havent discussed yetsee Definition 4.3.69) is actually much more
common than either of these terms, and as metrics are to norms (also something we
havent discussed yetsee Definition 4.3.69 again) as semimetrics/pseudometrics
are to seminorms, we feel as if the terminology semimetric is more appropriate.

It is much more common to denote (semi)metrics by d(, ), however, this conflicts with our conventions of reserving the letter d for dimension and differentials.

Example 4.3.2 Let X := R and define |x, y| := |x y|. Then, |,| is in fact a metric.
Of course, this is where the notation |,| in general comes from.

Example 4.3.3 A semimetric that is not a metric Let X be a topological space and
let K X be quasicompact. For f , g MorTop (X, R), we define


| f , g|K := sup{| f (x) g(x)|}.a .

(4.3.4)

xK

In general, this will not be a metric. For example, take X := R and K := [0, 1]. Then,
| f , 0|K = 0 iff f |[0,1] = 0, but of course, there are many nonzero real-valued continuous
functions on R that vanish on [0, 1] (by Urysohns Lemma (Theorem 3.6.107), for example,
if you want to use a sledgehammer (or maybe just a hammer?) to swat a fly).
a We

require that K be quasicompact so that f g is bounded on K (by the Quasicompactness and the
Extreme Value Theorem (Theorem 3.7.26))

Definition 4.3.5 Semimetric space A semimetric space is a set X equipped with a

collection D of semimetrics such that if |x, y| = 0 for all |,| D, then x = y.


R

For some reason, it seems that semimetric spaces are also referred to as gauge spaces.
Off the top of my head, I can think of at least two other distinct ways in which the
term gauge is used in mathematics, and so I would recommend not using this
terminology

Definition 4.3.6 Metric space A metric space (X, |, |) is a semimetric space (X, D)

in which D is a singleton, D = {|,|}.


R

Of course, the definiteness condition in the definition of a semimetric space forces

Chapter 4. Uniform spaces

172
|, | to be a metric.

The reason we take D to be a singleton instead of just an arbitrary collection of


metrics is to agree with standard terminology (metric spaces are almost always taken
to be sets equipped with a (single) metric).

Example 4.3.7 A semimetric space that is not a metric space Let X be a topological space, and for K X quasicompact nonempty, let |,|K be the semimetric on
MorTop (X, R) in (4.3.4), that is


| f , g| := sup{| f (x) g(x)|}.

(4.3.8)

xK

We already know from Example 4.3.3 that each |,|K is a semimetric on MorTop (X, R).
What we need to show that | f , g|K = 0 for all K X quasicompact implies f = g. This
however follows from the fact that K = {x} for x X is quasicompact (do you see why?).
As D clearly contains more than one element (at least so long as X contains more than one
point), you might think that this shows that this cannot be a metric space. However, the real
question is is it uniformly-homeomorphic to a metric space?a While not a proof, it is clear
that it should not be as, unless X is quasicompact itself, no element of D will actually be a
metric.
a Of

course, this doesnt quite make sense yet as we have not put a uniformity on semimetric spaces.

We noted at the very beginning of this chapter that the star of a point with respect to B in R is
just the ball of 2. You should be careful, however, as this does not hold in general.


Example 4.3.9 A metric space for which StarB (x) 6= B2 (x) Define X := {0, 1}

equipped with the metric |,| for which |0, 1| = 1, := 1, and x0 := 0. Then,
B = {{0}, {1}} ,

(4.3.10)

and so
StarB (x0 ) = {0}.

(4.3.11)

On the other hand,


B2 (x0 ) = {0, 1}.

(4.3.12)

Despite this, we always have one inclusion.


Exercise 4.3.13 Let X be a metric space, let x X, and let > 0. Show that StarB (x)

B2 (x).
Its worth noting that, in a metric space, for every closed subset, the distance (as defined below
in (4.3.15) from a point to the closed subset is a continuous function.
Proposition 4.3.14 Let hX, |,|i be a metric space and let C X. Then, the function

distC : X R defined by
distC (x) := inf {|x, c|}
cC

(4.3.15)

4.3 Semimetric spaces and topological groups

173

is uniformly-continuous and furthermore distC1 (0) = C.


Proof. Let > 0. Let x1 , x2 X lie in some ball. Choose some c C such that |x1 , c|
distC (x1 ) < .. Then,
distC (x2 ) |x2 , c| |x2 , x1 | + |x1 , c| < 2 + distC (x1 ),

(4.3.16)

and so
distC (x2 ) distC (x1 ) < 2.

(4.3.17)

By 1 2 symmetry, we also have that


distC (x1 ) distC (x2 ) < 2,

(4.3.18)

and hence
|distC (x1 ) distC (x2 )| < 2.

(4.3.19)

This shows that distC is uniformly-continuous.


Of course C distC1 (0). On the other hand, if x distC1 (0), then x is an accumulation
point of C, and hence contained in C. Thus, C = distC1 (0).

Proposition 4.3.20 Metric spaces are uniformly-perfectly-T4 .

Proof. Let X be a metric space and let C1 ,C2 X be closed and disjoint.
Exercise 4.3.21 Show that there is some f1 : X [0, 12 ] uniformly-continuous,

equal to 0 precisely on C1 , and equal to 12 on C2 . Similarly, show that there is some


f2 : X [0, 12 ] uniformly-continuous, equal to 12 precisely on C2 , and equal to 0 on
C1 .

Define f := f1 + f2 . Then, this is certainly 0 on C1 and 1 on C2 . Conversely, suppose


that f (x) = 0. Then, in particular, f1 (x) = 0 = f2 (x) = 0, and so in particular x C1 . On the
other hand, suppose that f (x) = 1. Then, we must have in particular that f2 (x) = 12 , which
implies that x C2 . Thus, f 1 (0) = C1 and f 1 (1) = C2 , and hence X is perfectly-T4 . 
Now that weve gotten that out of the way, we are ready to equip semimetric spaces with a
topology and uniformity.
Definition 4.3.22 Uniformity on a semimetric space Let (X, D) be a semimetric

space, and equip X with the uniformity generated by the uniform base defined by
n
o
+
1 ,...,|,|m
eD := U|,|,...,
B
:
m

Z
;
|,|
,
.
.
.
,
|,|

D;

,
.
.
.
,

>
0
,
(4.3.23)
1
m
1
m
m
1
where
|,| ,...,|,|m

1
B1 ,...,
m

n
o
|,|1 ,...,|,|m
:= B1 ,...,
(x) : x X
m

(4.3.24)

and
|,| ,...,|,|m

1
B1 ,...,
m

:= {y X : |y, x|1 < 1 , . . . , |y, x|m < m } .

(4.3.25)

Chapter 4. Uniform spaces

174

eD is indeed a uniform base.


Exercise 4.3.26 Show that B

Example 4.3.27 Discrete metric Not only does the discrete topology come from a
uniformity, but so to does the discrete uniformity in turn come from a metric.
Let X be a set and for x, y X, define
(
0 if x = y
|x, y| :=
.
(4.3.28)
1 otherwise


Exercise 4.3.29 Show that |,| is indeed a metric on X.


Exercise 4.3.30 Show that the uniformity defined by |,| is the discrete uniformity.

Exercise 4.3.31 Why does the indiscrete uniformity (on a set with at least two elements)

not come from a metric?


The following immediately follows from our result Proposition 4.2.75 characterizing uniformcontinuity in terms of uniform bases.
fi be a uniform space, let hY, |,|i be a metric space, and let
Proposition 4.3.32 Let hX, U
fsuch
f : X Y . Then, f is uniformly-continuous iff for every > 0 there is some U U
that for every U U , whenever x1 , x2 U, it follows that | f (x1 ), f (x2 )| < .
But before we head onto topological groups, what about the morphisms in the category of
semimetric spaces, you ask? Good question.
Definition 4.3.33 Bounded map (of semimetric spaces) Let f : hX, Di hY, E i be

a function between semimetric spaces. Then, f is bounded iff for every |,|0 E , there are
finitely-many |,|1 , . . . , |,|m D and constants K1 , . . . , Km 0 such that
| f (x1 ), f (x2 )|0 K1 |x1 , x2 |1 + + Km |x1 , x2 |m

(4.3.34)

for all x1 , x2 X.
R

If X and Y are metric spaces with metric |,|X and |,|Y respectively, this condition
reads just
| f (x1 ), f (x2 )|Y K|x1 , x2 |X .

(4.3.35)

In this case, f is called lipschitz-continuousa


R

Of all the categories weve come across, that the bounded maps are the right
notion of morphism between semimetric spaces is probably the least obvious.b The
motivation for the definition is that, for seminormed vector spaces, this definition is
equivalent to continuitysee Exercise 4.3.79. Perhaps a simpler explanation is that
it guarantees uniform-continuity.

a Dear lord. His name is a juxtaposition of the word lip and the word shits. You have my sympathies,
sir. . . .

4.3 Semimetric spaces and topological groups

175

b Of course, we can declare any collection of morphisms we like. Its just that, taking the morphisms to be
all functions when the objects are groups (for example) is not particularly usefulthe category wont be able to
tell that the groups are groups!

Exercise 4.3.36 Show that bounded maps between semimetric spaces are uniformly-

continuous.


Example 4.3.37 The category of semimetric spaces The category of semimetric

spaces is the category SemiMet whose collection of objects SemiMet0 is the collection of
all semimetric spaces, for every semimetric space X and semimetric space Y the collection
of all morphisms from X to Y , MorSemiMet (X,Y ), is precisely the set of all bounded maps
from X to Y , composition is given by ordinary function composition, and the identities of
the category are the identity functions.
R

a The

Every semimetric space is canonically a uniform spacesee Definition 4.3.22. By the


previous exercise, every bounded map is likewise uniformly-continuous. Therefore,
in fact, the category SemiMet embeds in Uni.a
thing to take note of is that both the objects and the morphisms have to be contained in Uni.

Example 4.3.38 A lipschitz-continuous function The function x 7 x from R to R.

Example 4.3.39 A uniformly-continuous function that is not lipschitz-continuous

Define f : [0, 1] R by f (x) := x. This function is continuous on [0, 1], and hence
uniformly-continuous because [0, 1] is quasicompact by the Quasicompactness (and by
Proposition 4.2.94). On the other hand, to show that it is not lipschitz-continuous, we need
to show that

x y
(4.3.40)
xy
is not bounded for x, y [0, 1] distinct. However, simply take x = 0. Then, we need to show
that

y
1
=
(4.3.41)
y
y
is not bounded on [0, 1]. Equivalently, you can show that limy0+

y = 0.a

a We have technically not defined one-sided limits. If this bothers you, its not a bad exercise to try to
come-up with the definition yourself.

4.3.2

Topological groups
Before we talk about any sort of uniformity, we had better first define what we mean by a topological
group.
Definition 4.3.42 Topological group A topological group is a group hG, , 1, 1 i

(Definition A.1.117) equipped with a topology such that


(i). : G G G is continuous; and

Chapter 4. Uniform spaces

176
(ii). 1 : G G is continuous.
R

That is to say, a topological group is a thing that is both a group and a topological space, subject to a couple of compatibility axioms that demand that the two
structures work together. This is very analogous to our definition of preordered rgs
(Definition 1.1.42)a preordered rg is both a preordered set and a rg subject to a
couple of compatibility conditions. This idea is not uncommon throughout all of
mathematics, and, as you might have expected by this point, can be unified with the
use of categories.

Recall that in a remark of the definition of a group (Definition A.1.117), we made a


slight deal about having inverses not being stated as an extra property but rather as
extra structure. This is one reason why. When we go to define a topological group,
the operation of taking inverses should be thought of as just thatan operation, on
the same footing as the product. If we think of the operation of taking inverses as on
the same footing as the product, then we almost have to also assume that the inverse
operation is likewise continuous, whereas if it were thought of just as an existence
property, it would not make as much sense to do this.

Warning: You do need to check both: it is possible that multiplication be continuous


but not inversion, or vice-versasee the following counter-examples.

Example 4.3.43 A topology on a group for which multiplication is continuous


but inversion is not a Define G := Q = {q Q : q 6= 0} equipped with the ordinary


multiplication in Q.
We define a topology on G by specifying a neighborhood base for the topology. For q G,
define
Bq := {(q + nZ) G : n Z, n 6= 0} .

(4.3.44)

We claim that this satisfies the hypotheses of Proposition 3.1.11, so that there will be a
unique topology on G for which {Bq : q G} is a neighborhood base. So, let (q + n1 Z)
G, (q + n2 Z) G Bq . We wish to show that an element of Bq is contained in their
intersection. This is indeed the case as (q + lcm(n1 , n2 )Z) (q + n1 Z) (q + n2 Z), where
here lcm is the least-common multiple, the smallest natural number which is a multiple of
both n1 and n2 .
We now wish to show that all the elements in this neighborhood base are in fact open
neighborhoods. So, fix an element in the neighborhood base (q0 + n0 Z) G and let q
(q0 + n0 Z) G. We need to show that an element of Bq is contained in (q0 + n0 Z) G.
We claim that in fact q + n0 Z = q0 + n0 Z. As q q0 + n0 Z, we can write q = q0 + n0 k
for some k Z. Hence, q + n0 Z = q0 + n0 k + n0 Z = q0 + n0 Z. Thus, all elements of the
neighborhood base are in fact open. It follows (Proposition 3.1.11 again) that
{q + nZ : q G, n Z, n 6= 0}

(4.3.45)

is a base for this topology.


The odd integers, (1 + 2Z) G = 1 + 2Z, thus themselves form
 an open set in this topology
(take q = 1 and n = 2), and its preimage under inversion is m1 : 2Z + 1 . If this were open,
then in particular, we would be able to find an element of B1 contained it in, which is
impossible as the only integers in this set are 1.
We now show that multiplication on the other hand is continuous. So, let 7 hq , r i
G G be a net converging to hq, ri G G. We wish to show that 7 q r converges

4.3 Semimetric spaces and topological groups

177

to qr. To do that, it suffices to show that 7 q r is eventually contained in qr + nZ


for all nonzero n Z. So, fix n Z nonzero arbitrary. That 7 hq , r i converges to
hq, ri implies that 7 q converge to q and 7 r converges to r (Corollary 3.5.19).
Thus, there is some 0 such that q q + nZ and r r + nZ whenever 0 . Of course,
however, if q q + nZ and r r + nZ, then we have that q r qr + nZ, as desired.
a This

is adapted from an answer on math.stackexchange.

Example 4.3.46 A topology on a group for which inversion is continuous but


multiplication is not Define G := R and equip G with the cocountable topology. Inversion


is bijective, and so the preimage of a countable set is countable, and hence the preimage of
every closed set under inversion is closed. Thus, inversion is continuous.
We now check that multiplication is not continuous. The set {x G : x 6= 1} is cocountable,
and hence open. Therefore, its preimage under multiplication,
S := {hx, yi G G : xy 6= 1} ,

(4.3.47)

should likewise be open. Thus, as h1, 2i S, if S were open, we should be able to find
U,V G open (i.e. cocountable) such that h1, 2i U V S. If this were true, then in
particular we would have that
 1

hx, x i : x R = Sc (U V )c = (U c R ) (R V c ).
(4.3.48)
There are uncountably many x R not contained in U, and as V c is countable, at least
one of them (in fact, uncountably many of them) must have the property that 1x is likewise
not in V . This, however, yields an element of Sc that cannot be contained in (U V )c : a
contradiction.


Example 4.3.49 The category of topological groups The category of topological

groups is the category TopGrp whose collections of objects TopGrp0 is the collection of all
topological groups, for every topological group G and topological group H the collection
of morphisms from G to Y , MorTopGrp (G, H), is precisely the set of all continuous group
homomorphisms from G to H, composition is given by ordinary function composition, and
the identities of the category are the identity functions.


Example 4.3.50 A topological group that is not T0 Define G := R/Q.a Let q : R

G be the quotient map (i.e. the map that sends an element to its equivalence class) and equip
R/Q with the quotient topology.
Exercise 4.3.51 Show that + : G G G and 1 : G G are continuous.

We now check that the quotient topology on G is not T0 . So, let x R be irrationals. We wish
to show every open neighborhood of x + Q contains 0 + Q and conversely. So, let U x + Q
be open. Then, by definition, q1 (U) is an open neighborhood of x, and so by density,
must

1
1
contain some rational number r q (U). But then, 0 + Q = r + Q q q (U) U. On
the other hand, if U is an open neighborhood of 0 + Q, q1 (U) is an open neighborhood
of 0, and so by density again, must contain some > 0 so that x is rational. But if
x Q, then x + Q = + Q q q1 (U) U.
R

We show in the next section that every T0 uniform space is in fact completely-T3 .

Chapter 4. Uniform spaces

178

Thus, this serves as an example of a uniform space which is not completely-T3 .


a This

is the quotient rng constructionsee Definition A.1.141.

A large number of examples of topological groups arise from totally-ordered rngs (or more
generally, totally-ordered commutative groups).
Exercise 4.3.52 Let G be a totally-ordered commutative group. Show that G is a topological

group with respect to the order topology.


Before we put a uniformity on topological groups, it will be useful to know at least one basic
fact about them.
Proposition 4.3.53 Let G be a topological group and let U be a neighborhood of the identity.

Then, there exists an open neighborhood V of the identity such that (i) VV U and (ii)
V 1 = V .
The notation means what you think it means: VV := {v1 v2 : v1 V, v2 V } (note
how this is not the same as V 2 ) and V 1 := {v1 : v V }.

Proof. Regarding the group operation as a function from GG to G, we know that +1 (U)
is an open neighborhood of h1, 1i in G G, and therefore we have that V W 1 (U) for
some V,W G open neighborhoods of the identity (by the definition of the product topology
Proposition 3.5.10). Replace V with V W , another open neighborhood of the identity,
so that V V 1 (U). In other words, VV U. Now do this exact same construction
again and find another open neighborhood of the identity W (replacing our old W ) with
WW V . Now define
W 0 := W [1 ](W 1 ),

(4.3.54)

that is, the intersection of W with the preimage of W 1 under the inverse function 1 : G
G.a This will be yet another open neighborhood of the identity, with both W 0 , (W 0 )1 W .
Finally, define
W 00 := (W 0 )(W 0 )1 .

(4.3.55)

This certainly satisfies (W 00 )1 = W 00 , and furthermore,


W 00W 00 := (W 0 )(W 0 )1 (W 0 )(W 0 )1 WWWW VV U.

(4.3.56)


a Yes,

I am aware that this notation is ridiculously obtuse.

Definition 4.3.57 Uniformity on a topological group Let G be a topological group,

and equip G with the uniformityd generated by the uniform base defined by
eG := {BU : U 3 1 is open.} ,
B

(4.3.58)

BU := {gU : g G} .

(4.3.59)

where

4.3 Semimetric spaces and topological groups

179

eG is indeed a uniform base.


Exercise 4.3.60 Show B
Note that we could have equally well taken the covers UG for only U N , N a
fixed neighborhood base of the identity (by Proposition 4.2.74).

We have a potential problem hereG started its life as a topological group, and in particular,
as a topological space. We then equipped it with a uniform structure, from which it obtains the
uniform topology. The question arises: Are these topologies the same, and if not, which one
should we use?. Fortunately, it turns out that they are the same.
Exercise 4.3.61 Show that the topology on a topological group agrees with the uniform

topology induced by the uniform base in (4.3.58).


We have yet another potential problem hereR is both a metric space and a topological group
(with respect to +), so which uniformity should we use? Fortunately, we neednt worry about this,
because the two uniformities are the same.
ehR,|,|i and B
ehR,+i define the same uniformity on R.
Exercise 4.3.62 Show that B
You might say that functional analysis is the study of topological vector spaces (the term
functional a result of the fact that many spaces of functions are topological vector spaces). As
vector spaces are in particular a group (just forget about the scalars), everything we say regarding
the uniformities of topological groups also applies to uniformities of topological vector spaces.
Thus, a knowledge of uniform spaces is very useful when studying functional analysis. In particular,
the following result is used ubiquitously (to the point where it is so common that it is not really
even mentioned)
Proposition 4.3.63 Let f : G H be a group homomorphism between topological groups

and let x0 G. Then, if f is continuous at x0 , then f is uniformly-continuous.


This shows that the category TopGrp embedsa into the category Uni. We already knew
that the objects embedded (from the canonical uniformity on topological groups
given in Definition 4.3.57)this result tells us furthermore that the morphisms
embed as well.

Proof. S T E P 1 : M A K E H Y P O T H E S E S
Suppose that f is continuous at x0 .
S T E P 2 : S H O W T H AT f I S C O N T I N U O U S .
We first show that f is continuous (as opposed to just continuous at x0 ). To show that, we
show that f is continuous at x G for arbitrary x. Let V be a neighborhood of f (x) H.
Then, f (x0 ) f (x)1V is a neighborhood of f (x0 ) H. As f is continuous at x0 , it follows

that f 1 f (x0 ) f (x)1V is a neighborhood of x0 , and so xx01 f 1 f (x0 ) f (x)1V is a
neighborhood of x.b However,


f xx01 f 1 f (x0 ) f (x)1V = f (x) f (x0 )1 f f 1 f (x0 ) f (x)1V
(4.3.64)
f (x) f (x0 )1 f (x0 ) f (x)1V = V,
so that

xx01 f 1 f (x0 ) f (x)1V f 1 (V ),

(4.3.65)

Chapter 4. Uniform spaces

180

so that f 1 (V ) is a neighborhood of x, so that f is continuous at x.


S T E P 3 : S H O W T H AT f I S U N I F O R M LY - C O N T I N U O U S .
To show that f is uniformly-continuous, we apply Proposition 4.2.75. So, let V H be
an open neighborhood of the identity and consider the cover UV := {hV : h H}. To show
that f 1 (UV ) is a uniform cover, it suffices to find an open neighborhood U G of the
identity such that UU f 1 (UV ). Take U 0 to be an open neighborhood of the identity such
that U 0U 0 f 1 (V ), and then in turn take U to be an open neighborhood of the identity
such that (i) UU U 0 and (ii) U = U 1 (which we may do by Proposition 4.3.53). We wish
to show that
xU f 1 (V ).

StarUU (U) =

(4.3.66)

xG s.t. xUU6=0/

It will follow from this (see (4.3.67)) that UU f 1 (UV ). So, let x G be such that
1 =
xU U 6= 0.
/ Then, there are u1 , u2 U such that xu1 = u2 , so that x = u2 u1
1 UU
0
0
0
1
UU U . Thus, xU U U f (V ). (4.3.66) follows from this. From this, we have
StarUU (x0U) =

xU =

x0 xU
xG s.t. xUx0U6=0/
xG s.t. xUU6=0/
x0 f 1 (V ) f 1 ( f (x0 )V ) f 1 (UV ),

= x0 StarUU (U)
(4.3.67)

so that UU f 1 (UV ).

a We

have not defined what precisely this means for categories, but with a little mathematical maturity, you
can probably figure it out. In any case, its okay if you dont know the precise definition
b The juxtaposition here is being used to denote multiplication in the group. Be careful not to confuse
preimages with inverse elements (even though the same symbol is used, the context makes the notation
unambiguous).

4.3.3

Topological vector spaces and algebras


An incredibly family of examples of topological groups are the topological vector spaces.
Definition 4.3.68 Topological vector space A topological vector space is real vector

space hV, +, 0, R, i such that


(i). hV, +, 0i is a topological group; and
(ii). : R V V is continuous.
R

Of course, this definition makes sense if we were to replace R with any topological
fielda , but for our purposes, restricting ourselves to working over the reals will be
sufficient. The other case of most interest is over C, but we have not even defined
the complex numbers. Most of the time, our topology comes from a collection of
seminorms (see Definition 4.3.69 below), in which case it takes a fair amount of
effort to work over topological fields whose topology does not come from a norm (at
least in the traditional sense of the word).

a What

do you think the definition of a topological field should be?

One big reason why we are interested in topological vector spaces is because almost all of the
examples of semimetrics we counter actually come seminorms.

4.3 Semimetric spaces and topological groups

181

Definition 4.3.69 Seminorm and norm Let V be a real vector space. Then, a seminorm

on V is a function || : V R+
0 such that
(i). (Homogeneity) |v| = |||v| for R and v V ;
(ii). (Triangle Inequality) |v1 + v2 | |v1 | + |v2 |.
|| is a norm if furthermore (Definiteness) |x| = 0 implies x = 0.
Definition 4.3.70 Semimetric induced by a seminorm Let V be a real vector space,

let || be a seminorm on V , and let v1 , v2 V . Then, the semimetric induced by ||, |,|, is
defined by
|v1 , v2 | := |v1 v2 |.

(4.3.71)

Exercise 4.3.72 Check that |,| is indeed a semimetric. Show that if || is a norm

then |,| is a metric.


R

Intuitively, the seminorm of something is like its size and semimetric is like
distancethe distance between two vectors is the size of their difference.

Definition 4.3.73 Seminormed vector space and normed vector space A semi-

normed vector space is a real vector space V together with a collection of seminorms D
such that hV, Di is a semimetric space. hV, Di is a normed vector space iff it is furthermore
a metric space.
As by now you should have expected, the morphisms of seminormed vector spaces are the
bounded (Definition 4.3.33) linear maps.


Example 4.3.74 The category of seminormed vector spaces The category of

seminormed vector spaces is the category SemiVect whose collection of objects SemiVect0
is the collection of all seminormed vector spaces and for every seminormed vector
space V and seminormed vector space W the collection of all morphisms from V to W ,
MorSemiVect (V,W ) is preicsely the set of all bounded linear maps from V to W , composition
is givne by ordinary function composition, and the identities of the category are the identity.
Exercise 4.3.75 Let f : hV, Di hW, E i be a linear map between seminormed vector

spaces. Show that f is bounded iff for every ||0 E there are finitely-many ||1 , . . . , ||m D
and constants K1 , . . . , Km 0 such that
| f (v)|0 K1 |v|1 + + Km |v|m

(4.3.76)

for all v V .
R

Compare this with the definition of bounded maps of semimetric spaces (Definition 4.3.33). Essentially this boils down to the statement that, to check that f is
bounded, it suffices to check for x2 = 0 and x1 =: v arbitrary (in the notation of
Definition 4.3.33).

Note that a priori a seminormed vector space is not a topological vector space. However, being

Chapter 4. Uniform spaces

182

in metric a semimetric space, it is in fact a uniform space, and so in turn is equipped with its uniform
topology. The question is then whether it is a topological vector space with respect to the uniform
topology. Of course, the answer is in the affirmative. Once we know that the seminormed vector
space V is likewise a topological vector space, we know in turn that its underlying topological
group hV, +, 0, i induces in turn yet another uniform structure, and so a new question arises as
to whether or not this uniform structures agrees with the one induced from the semimetric space
structure. Of course, the answer to this is likewise in the affirmative.
Exercise 4.3.77 Let V be a seminormed vector space. Show that V is a topological vector

space with respect to the uniform topology induced by the semimetric uniformity.
Exercise 4.3.78 Let V be a seminormed vector space. Show that the uniformity induced

by the topological group structure hV, +, 0, i is the same as the semimetric uniformity.
As a mater of fact, the morphisms dont care whether youre thinking of things as a semimetric
space or as a topological group either.
Exercise 4.3.79 Let f : hV, Di hW, E i be a linear map between two seminormed vector

spaces. Show that it is continuous iff it is bounded.

Unless otherwise stated, seminormed vector spaces are always equipped with the
uniformity induced by the semimetric space structure (or, equivalently, the uniformity
induced by the topological group structure).
In fact, a lot of examples of seminormed vector spaces have even more structure, namely, the
structure of an algebra.
Definition 4.3.80 Algebra An algebra is a set A equipped with the structure of a vector

space over a field F and the structure of a ring such that


(i). (1 2 ) a = 1 (2 a) for 1 , 2 F and a A; and
(ii). (a1 a2 ) = ( a1 )a2 for F and a1 , a2 A.
R

That is, an algebra is both a vector space and a ring subject to a couple of compatibility
axioms.

Definition 4.3.81 Homomorphism (of algebras) Let A and B be algebras and let

f : A B be a function. Then, f is a homomorphism iff f is both a linear map of the


underlying vectors spaces and a ring homomorphism of the underlying vector spaces.

Example 4.3.82 The category of algebras over a field F The category of algebras

over a field F is the category AlgF whose collection of objects is the collection of all algebras
over F, for every algebra A and algebra B over F the collection of all morphisms from X to
Y , MorAlgF (A, B), is precisely the set of all homomorphisms from A to Bs, composition is
given by ordinary function composition, and the identities of the categories are the identity
functions.

4.3 Semimetric spaces and topological groups

183

Definition 4.3.83 Seminormed algebra A seminormed algebra is an algebra whose

underlying vector space is a seminormed vector space such that |a1 a2 | |a1 ||a2 | for a1 , a2
A.


Example 4.3.84 The category of seminormed algebras The category of semi-

normed algebras is the category SemiAlg whose collection of objects SemiAlg0 is the
collection of all seminormed algebras, for every seminormed algebras A and seminormed
algebra B the collection of all morphisms from A to B, MorSemiAlg (A, B), is precisely the set
of all bounded homomorphisms, composition is given by ordinary function composition,
and the identities of the category are the identity functions.
With these new definitions in hand, we now present an incredibly important example of a
seminormed algebra, an example that was a large part of the motivation for introducing seminormed
algebras at all.


Example 4.3.85 Uniform convergence on quasicompact subsets Let X be a

topological space and define


A := MorTop (X, R).

(4.3.86)

Pointwise addition and pointwise scalar multiplication gives A the structure of a real vector
space. Pointwise multiplication gives A in turn the structure of a real algebra. The collection
{||K : K X quasicompact}, where
| f |K := sup{| f (x)|},

(4.3.87)

xK

the supremum seminorm on K, then gives A the structure of a seminormed algebra. It is thus
canonically a uniform space (and in turn a topological space). If X itself is quasicompact,
convergence in MorTop (X, R) is called uniform convergence.a Thus, in the general case,
people refer to convergence in the topological space MorTop (X, R) as uniform convergence
on quasicompact subsets.
The reason the case X quasicompact is special is because, in this case, it is actually isomorphic (in the category of seminormed algebras) to a normed algebra.
Exercise 4.3.88 Let X be quasicompact. Show that

idX : hX, {||X }i hX, {||K : K X quasicompact}i.

(4.3.89)

is an isomorphism in the category of seminormed algebras.


R

The point is that, if all we care about is the seminormed algebra structure, we
may always assume without loss of generality that MorTop (X, R) is in fact a
normed algebra with the single norm being given by | f |X := supXX {| f (x)|},
the supremum norm. In this case, we do not write | f |X but rather | f | . The
reason for this notation will become clear when we study L p spacessee
Section 5.3.

Unless otherwise stated, MorTop (X, R) is a always given the structure of a


seminormed algebra, the algebra structure defined pointwise and the seminorms
being ||K for K X quasicompact.

Chapter 4. Uniform spaces

184

Of incredible importance is that this space is in fact complete (at least for so-called
quasicompactly-generated spaces). Of course, we need to first actually define what we
mean by complete, and so we postpone this resultsee Theorem 4.5.9.
a Despite

the name and the context in which were presenting it, uniform convergence actually has nothing
to do with uniform spaces per se (in contrast to uniform continuity, for example). MorTop (X, R) has a topology,
and hence a notion of convergence, which we happen to call uniform convergence. In particular, we only
needed to equip MorTop (X, R) with a topology to define uniform convergence. The reason we waited until
the chapter on uniform spaces, of course, is because MorTop (X, R) obtains its topology from a family of
semimetrics (or in the case X is quasicompact, just a single metric), not because of any direct connection with
uniform convergence and uniform spaces.

4.4 T0 uniform spaces are uniformly-completely-T3


Our goal in this subsection is to show that all T0 uniform spaces are uniformly-T3 . Of course, to
prove this, we had better say what uniformly-completely-T3 means.
4.4.1

Separation axioms in uniform spaces


Throughout this subsection, let S1 , S2 X be disjoint subsets of a uniform space X.
Definition 4.4.1 Uniformly-distinguishable S1 and S2 are uniformly-distinguishable iff

there is some uniform cover U for which StarU (S1 ) 6= StarU (S2 ).

Definition 4.4.2 Uniformly-separated S1 and S2 are uniformly-separated iff there is

some uniform cover U for which ?U (S1 ) is disjoint from StarU (S2 ).
R

Note that this is analogous to the topological condition of being separated by


neighborhoods (Definition 3.6.7)there is not really any uniform analgoe of just
being plain separated (Definition 3.6.4).

Definition 4.4.3 Uniformly-completely-separated

S1 and S2 are uniformly-completely-separated iff there is a uniformly-continuous function


f : X [0, 1] such that f |S1 = 0 and f |S2 = 1.
Exercise 4.4.4 Show that if S1 and S2 are uniformly-completely-separated, then they are

uniformly-separated.
Definition 4.4.5 Uniformly-perfectly-separated S1 and S2 are uniformly-perfectly-

separated iff there is a uniformly-continuous function f : X [0, 1] such that S1 = f 1 (0)


and S2 = f 1 (1).
Definition 4.4.6 Uniformly-T0 X is uniformly-T0 iff any two distinct points are uniformly-

distinguishable.
Definition 4.4.7 Uniformly-T2 X is uniformly-T2 iff any two distinct points can be

uniformly-separated.

4.4 T0 uniform spaces are uniformly-completely-T3

185

Definition 4.4.8 Uniformly-completely-T2 X is uniformly-completely-T2 iff any two

distinct points can be uniformly-completely-separated.


Definition 4.4.9 Uniformly-perfectly-T2 X is uniformly-perfectly-T2 iff any two distinct

points can be uniformly-perfectly-separated.


Definition 4.4.10 Uniformly-T3 X is uniformly-T3 iff it is T1 and any closed set and a

point not contained in it can be uniformly-separated.


Definition 4.4.11 Uniformly-completely-T3 X is uniformly-completely-T3 iff it is T1

and any closed set and a point not contained in it can be uniformly-completely-separated.
Definition 4.4.12 Uniformly-perfectly-T3 X is uniformly-perfectly-T3 iff it is T1 and any

closed set and a point not contained in it can be uniformly-perfectly-separated.


Definition 4.4.13 Uniformly-T4 X is uniformly-T4 iff it is T1 and any two disjoint closed

subsets can be uniformly-separated.


Definition 4.4.14 Uniformly-completely-T4 X is uniformly-completely-T4 iff it is T1

and any two disjoint closed subsets can be uniformly-completely-separated.


Definition 4.4.15 Uniformly-perfectly-T4 X is uniformly-perfectly-T4 iff it is T1 and any

closed set and any two disjoint closed subsets can be can be uniformly-perfectly-separated.
The goal of this section is to prove that all of the from uniformly-T0 to uniformly-completelyT 3 (that is, T0 implies uniformly-completely-T3 see Corollary 4.4.31). We also present counterexamples to show that all these equivalent axioms are strictly weaker than both uniformly-T4 (see
Example 4.4.35) and uniformly-perfectly-T4 (see Example 4.4.38).
4.4.2

The key result


We actually prove a stronger result which requires the notion of the diameter of a set (in a metric
space).
Definition 4.4.16 Diameter Let hX, |,|i be a metric space and let S X. Then, the

diameter of S, diam(S), is defined by


diam(S) := sup {|x, y|}.

(4.4.17)

x,yS

Of course, it may be the case that diam(S) = .

And now we are ready to state our key result.


Theorem 4.4.18. Let U be a uniform cover of a T0 uniform space X. Then, there exists

a metric space Y and a uniformly-continuous surjective function q : X Y such that, if


diam(S) < 1 for S Y , then q1 (S) will be contained in some element of U .

Chapter 4. Uniform spaces

186

Proof. a To construct Y , we shall put a semimetric on X and then take the quotient set with
respect to the equivalence relation of being infinitely close to each other.
S T E P 1 : C O N S T R U C T A S E Q U E N C E O F S TA R - R E F I N E M E N T S O F U
Let us write U0 := U . Then, we take a star-refinement U1 of U0 , in turn another starrefinement U2 of U1 , and so on.
S T E P 2 : D E F I N E `(x1 , x2 ) F O R x1 , x2 X
Define
(
2
if x2
/ StarU0 (x1 )
`(x1 , x2 ) :=
.
1max{mN:x
Star
/
(x
)}
2
Um 1
2
otherwise

(4.4.19)

Note that this in particular implies that `(x1 , x2 ) = 0 if x2 StarUm (x1 ) for all m N. (We
do need to make the other extreme case explicit as the maximum of the empty-set is .)
Thus, the statement that X is T0 (together with the fact that stars for a base for the topology
(Proposition 4.2.54)) implies that either `(x1 , x2 ) > 0 or `(x2 , x1 ) > 0.
S T E P 3 : D E F I N E `(P) F O R PAT H S P
For the purposes of this proof, a path from x1 to x2 will be a finite sequence of points
(x , x1 , . . . , xm ) with x = x1 and xm = x2 .b If P = (x , . . . , xm ), then we define `(P) :=
`(x , x1 ) + `(x1 , x2 ) + + `(xm1 , xm ). We shall call this the length of the path.
STEP 4: DEFINE THE SEMIMETRIC
Finally, we define
|x1 , x2 | := min{inf ({`(P) : P is a path from x1 to x2 or a path from x2 to x1 .}) , 1}.
(4.4.20)

S T E P 5 : S H O W T H AT T H I S I S I N F A C T A S E M I M E T R I C
From the definition, we have that |,| is symmetric (this is the reason for putting the or
in the definitionnote that the definition of `(x1 , x2 ) is not manifestly symmetric). The
triangle inequality follows from the fact that a path from x1 to x3 and a path from x3 to x2
gives us a path from x1 to x2 , with the length of this new path being the sum of the lengths
of the other two. Thus, |,| is in fact a semimetric.
STEP 6: CONSTRUCT Y
Define x1 x2 iff |x1 , x2 | = 0. That this is an equivalence relation follows from the fact that
|,| is a semimetric. Thus, we may define
Y := X/ .

(4.4.21)

STEP 7: CONSTRUCT THE METRIC ON Y


We abuse notation and write the induced metric on Y with the same symbol |,| as the
semimetric on X:
|[x1 ] , [x2 ] | := |x1 , x2 |.

(4.4.22)

4.4 T0 uniform spaces are uniformly-completely-T3

187

Exercise 4.4.23 Check that |,| on Y is well-defined.

S T E P 8 : S H O W T H AT T H I S I N F A C T A M E T R I C
|,| on Y is automatically symmetric and satisfies the triangle inequality because |,| on X
does. Furthermore, if |[x1 ] , [x2 ] | = 0, then |x1 , x2 | = 0, and so x1 x2 by the definition of
. Thus, |,| is indeed a metric on Y .
STEP 9: DEFINE q : X Y
We take q : X Y to be the quotient map: q(x) := [x] . Of course q is surjective (all
quotient maps are).
S T E P 1 0 : S H O W T H AT q I S U N I F O R M LY - C O N T I N U O U S
We apply Proposition 4.3.32 which characterizes uniform-continuity for functions whose
codomain is a metric space. So, let > 0. We must find a uniform cover U of X such that
for every U U , whenever x1 , x2 U, it follows that |q(x1 ), q(x2 )| < . It suffices to show
this for := 21m . We show that Um is a uniform cover that works. So, let U Um and let
x1 , x2 U. Then, in particular, x2 StarUm (x1 ), and so
|q(x1 ), q(x2 )| := |x1 , x2 | < 21m =: .

(4.4.24)

S T E P 1 1 : F I N I S H T H E P RO O F B Y P ROV I N G T H E D E S I R E D P RO P E RT Y O F q
Let S Y and suppose that diam(S) < 1. We wish to show that there is some U U0
such that S U. It suffices to show that for x1 , x2 S, there is some Ux1 ,x2 U1 such that
x1 , x2 Ux1 ,x2 . This is because, if this is true, then S StarU1 (x1 ), which in turn is contained
in some element of U0 because U1 star-refines U0 .
To show this, it suffices to show that if |x1 , x2 | < 21m for m Z+ , then there is some
U Um such that x1 , x2 U. So, let x1 , x2 X be such that |x1 , x2 | < 21m . Then, without
loss of generality, there is some path (x , . . . , xn ) from x1 to x2 with
`(x , x1 ) + + `(xn1 , xn ) < 21m .

(4.4.25)

It thus suffices to show that, whenever (4.4.25) holds, there is some U Um with x , xn U.
We prove this by induction on n. For n = 1, (4.4.25) implies that `(x1 , x2 ), `(x2 , x1 ) < 21m ,
as was mentioned above in Step 2, because X is T0 , at least one of these is strictly positive
without loss of generality suppose that `(x1 , x2 ) > 0. Then, the fact that `(x1 , x2 ) < 21m
implies that
/
Uo (x1 )}
21max{oN:x2 Star
< 21m ,

(4.4.26)

so that
m max{o N : x2
/ StarUo (x1 )} = c min{o N : x2 StarUo (x1 )},

(4.4.27)

which implies that x2 StarUm+1 (x1 ), which implies that there is some U Um+1 such that
x1 , x2 U. Thus, this does the case for n = 1. (Note that in fact we can take U Um+1 this
will be important later.)
Now assume the result is true for all k n. We wish to prove the result for n + 1.

Chapter 4. Uniform spaces

188

We must have that `(x , x1 ) < 21n , because otherwise we would have to have that
for k 1, in which case xk and xk+1 lie in some U Uo for o arbitrarily large.
We can then guarantee that xk for k 1 are obtained in some element of Um+1 , and hence,
as x and x1 are obtained in some element of Um+1 , everything is contained in some element
of Um .
Thus, without loss of generality assume that `(x , x1 ) < 21n . Then, the inequality
(4.4.25) implies that there is some k0 such that
`(xk , xk+1 )

`(x , x1 ) + `(xk0 1 , xk0 ) 2m .

(4.4.28)

(This is the same inequality with m one larger). Similarly, there is some l0 such that
`(xl0 , xl0 +1 ) + + `(xn1 , xn ) 2m .

(4.4.29)

It then follows that we must also have that


`(xk0 , xk0 +1 ) + + `(xl0 1 , xl0 ).

(4.4.30)

By the induction hypotheses, we then must have in particular that there are U1 ,U2 ,U3 Um+1
such that x , xk0 U1 , xl0 , xn U2 , and xk0 , xl0 U3 . Then, there is some U Um such that
StarUm+1 (U3 ) U. However, as U1 ,U2 StarUm+1 (U3 ), this completes the proof.

a Proof

adapted from [11, pg. 8].

b The superscripts (as opposed to subscripts) if obviously for the purpose of not conflicting with the subscripts

on x1 and x2 .
c This equality implicitly uses the fact that `(x , x ) > 0, so that {o N : x Star } is bounded above.
1 2
2
Uo

From this, that every uniform space is uniformly-completely-T3 follows relatively easily.
Corollary 4.4.31 Let X be a T0 uniform space. Then, X is uniformly-completely-T3 .

Proof. a Let X be a uniform space. We first show that X is T1 . We know that X is T0 , and
so by Proposition 3.6.66 (regular T0 spaces are T2 ), X is T2 , hence T1 .
We now show that uniformly-continuous functions on X can separate closed sets from
points. So, let C X be closed, and let x0 Cc . As C is closed, there must be some
neighborhood of x0 that does not intersect C (otherwise, x0 would be an accumulation point
of C). Then, because stars for a basis for the topology (Proposition 4.2.54), there is a
uniform cover U such that
StarU (x0 ) Cc .

(4.4.32)

Now apply the previous theorem for the uniform cover U , so that there is a metric space
hY, |,|i and a uniformly-continuous map q : X Y such that, if diam(S) < 1 for S Y ,
it follows that q1 (S) is contained in some element of U . From (4.4.32), it follows that
C {x0 } is not contained in any element of U , and so
diam(q(C) {q(x0 )}) 1.

(4.4.33)

Define f : X [0, 1] by
f (x) := b max{distC (x), 1}.

(4.4.34)

This is uniformly-continuous because distC is. (4.4.33) implies that distC (x0 ) 1, and so
f (x0 ) = 1. We showed in Proposition 4.3.14 that distC (C) = 0.


4.4 T0 uniform spaces are uniformly-completely-T3

189

a Proof
b See

4.4.3

adapted from [11, pg. 8].


(4.3.15) for the definition of distC .

The counter-examples
We know from the diagram (3.6.119), that if we are to do any better in terms of separation axioms,
we would be able to prove that every uniform space is either perfectly-T3 or completely-T4 (which
is equivalent to T4 by Urysohns Lemma, Theorem 3.6.107). Unfortunately, however, there exist
counter-examples to both these separation axioms.
Example 4.4.35 A uniform space that is not perfectly-T3 The Uncountable Fort
Space X of Example 3.6.22 will do just fine yet again. We already know that this space is
not perfectly-T3 from Example 3.6.89. Thus, all that remains to be done is to equip X with a
uniformity that generates the Uncountable Fort Space Topology.
Define


e := f 1 (B ) : f MorTop (X, R), > 0 .
B
(4.4.36)


e is a uniform base for the Uncountable Fort Space


Exercise 4.4.37 Show that B
Topology.

Example 4.4.38 A uniform space that is not T4

We define a topology on
MorSet
the topology of pointwise convergence. We use Definition by specification of convergence, Theorem 3.4.8, to do it.
For f MorSet (R, R) and a let 7 f MorSet (R, R). Then,


(R, R),a

we declare that the net 7 f converges to f iff the net 7 f (x)


converges to f (x) for every x R.

(4.4.39)

Exercise 4.4.40 Show that this definition satisfies the axioms of Definition by

specification of convergence, and so define a topology on MorSet (R, R).


Exercise 4.4.41 Show that MorSet (R, R), + is a topological group, where + is

defined pointwise:
[ f1 + f2 ](x) := f1 (x) + f2 (x).

(4.4.42)

Thus, MorSet (R, R) is canonically a uniform space (and hence completely-T3 ).


We now shows that MorSet (R, R) is not completely-T4 with respect to this topology. To do
this, we first show that MorSet (R, Z) MorSet (R, R) is not completely-T4 .b
For m Z, define
n
o
Pm := f MorSet (R, Z) : f |[ f 1 (m)]c is injective. ,
(4.4.43)
that is, the set of all functions that are injective modulo sending more than one point
to m. We show that P0 and P1 are closed and disjoint, but cannot be separated by open
neighborhoods.

Chapter 4. Uniform spaces

190

We first check that P0 is closed (the proof that P1 is closed is nearly identical). So, let
7 f P0 converge to f . Suppose that f (x1 ) = f (x2 ) is distinct from 0. Then, by our
definition of convergence and the fact that our functions are taking values in the integers,
it must be the case that 7 f (xi ) is eventually equal to f (xi ) for i = 1, 2. Then, in
particular, we will have that f0 (x1 ) = f0 (x2 ) for 0 sufficiently large, and hence x1 = x2 .
Thus, f P0 .
We now check that P0 and P1 are disjoint. If f is injective on the complement of f 1 (0)
(i.e. if f P0 ), then this complement must be countable (because the image of f lies in
Z). In particular, there must be at least two elements in f 1 (0), and so f (x1 ) = 0 = f (x2 )
for x1 6= x2 . But then f 1 cannot be injective on the complement of f 1 (1), and so f
/ P1 .
Thus, P0 is disjoint from P1 .
Let A := {0 , 1 , 2 , . . .} be a countably-infinite subset of R, and for S A a finite, let us
define
US, f := {g MorSet (R, Z) : g|S = f |S }.

(4.4.44)

The complement of this is the collection of all functions which differ from f at at least one
point of S. Because the functions take their value in Z, however, if you take a net of such
functions, the limit (if it has one) must still disagree with f at least one point. Therefore,
US,c f is closed, and hence US,c f is open. Moreover, as
US, f UT, f = UST, f ,

(4.4.45)

it follows from Proposition 3.1.11 that this is a neighborhood base for f . If fact, if we restrict
ourselves to only taking S from a given infinite subset of A, we still get a neighborhood
base.
Now, let U and V be open neighborhoods of P0 and P1 respectively. Of course, we seek to
show that U and V must intersect. We seek to construct a sequence of functions fk U and
a countably-infinite collection of finite subsets Bk = {0 , . . . , mk } of A such that UBk , fk U
and
(
k if x = k Bk1
fk (x) :=
.
(4.4.46)
0 otherwise
We do so inductively. (We take B0 := 0.)
/


Take f1 := 0, so that of course f1 P0 . Thus, because US, f1 : S A finite. is a neighborhood base at f1 , there must be some finite subset B1 A such that UB1 , f1 U. In fact, we
can enlarge B1 so that it is of the form B1 = {0 , . . . , m1 }. Now define f2 according to
(4.4.46), that is
(
k if x = k B1
f2 (x) :=
.
(4.4.47)
0 otherwise
Then, f2 U, and so there must be some finite set B2 A such that UB2 , f2 U. By enlarging
B2 if necessary, we can guarantee it is of the form B2 = {0 , . . . , m2 } with m2 > m1 . Then,
we may define

2 if x B2 \ B1
f3 (x) := 1 if x B1
.
(4.4.48)

0 otherwise

4.5 Cauchyness and completeness

191

Then, for the same reason as before, f3 U, and so there is some finite set B3 A (that is
without loss of generality of the form B3 = {0 , . . . , m3 } with m3 > m2 ) and UB3 , f3 U.
Continue this process inductively.
Now define g MorSet (R, Z) by
(
k if x = k A
g(x) :=
.
(4.4.49)
1 otherwise
Then, g P1 , and so there is some finite set B A such that UB,g V . Let m be sufficiently
large so that B Bm . Then, fm UB,g V and fm UBm , fm U, and so, in particular, lies
in U V .
This shows that MorSet (R, Z) is not T4 , but we still must show that MorSet (R, Z) itself is
not T4 . This will follow from the following lemma.
Lemma 4.4.50 Let X be T4 and let C X be closed. Then, X is T4 .
R

Note
this doesnt hold in general. For example,
that
this example shows that
RR
R [, ] is not normal even though
R [, ] is (it is compact
by Exercise 3.6.44 and The product topology, Theorem 3.5.21, and hence T4
by Proposition 3.6.96.

Proof. Let C1 ,C2 C be disjoint and closed. Then, by definition of the subspace
topology (Proposition 3.5.1), C1 = C10 C and C2 = C20 C for C10 ,C20 X closed,
and so C1 and C2 are themselves closed in X because C is closed. Thus, because X is
T4 , C1 and C2 can be separated by neighborhoods in X, and hence can be separated
by neighborhoods in C.

a Mor
Set (R, R)
b The

is our fancy-schmancy notation for the set of all functions from R to R.


proof of this is adapted from [17, pg. 206].

Finally, we are ready to begin discussing cauchyness and completeness in the general context
of uniform spaces.

4.5

Cauchyness and completeness


fi be a uniform space and let 7 x be a net.
Definition 4.5.1 Cauchyness Let hX, U
f, there is some U U such that
Then, we say that 7 x is cauchy iff for every U U
7 x is eventually contained in U.
R

You should compare this with our definition of cauchyness in R, Definition 2.4.22.
As the collection of all -balls for all > 0 forms a uniform base, if we show that we
can replace the entire uniformity in the test of cauchyness, then the definition we gave
before in Definition 2.4.22 will be a word-for-word special case of this definition.
Indeed, part of the motivation for phrasing the definition in Definition 2.4.22 the way
we did was to make the transition to this higher level of generality as transparent as
possible. See the paragraphs that follow Definition 2.4.22 for a few more comments
regarding this.

To check whether a net is cauchy, it suffices to check on just a uniform base for the collection
of uniform covers.

Chapter 4. Uniform spaces

192

e be a uniform base on X, and let 7 x be a net.


Proposition 4.5.2 Let X be a set, let B

e there is some B B such that 7 x is


Then, 7 x is cauchy iff for every B B
eventually contained in B.
R

With this equivalence, the definition we gave before for cauchyness in R in Definie
tion 2.4.22 is literally verbatim equivalent to this definition upon replacement of B
with {B : > 0} and of B with B := {B (x) : x R}.

Proof. () There is nothing to check.


e there is some B B such that 7 x is eventually
() Suppose that for every B B
f. Let U U
f. Then, there is some B B
e
contained in B. Denote the uniformity on X by U
such that B U . Thus, there is some B B such that 7 x is eventually contained in
B. As B U , there is some U U such that StarB (B) U. In particular, B U, and so
if 7 x is eventually contained in B, it is certainly eventually contained in U.

From this, we obtain relatively nice description of what it means to be cauchy in our two large
families of examples, namely semimetric spaces and topological groups.
Exercise 4.5.3 Let hX, Di be a semimetric space and let 7 x X be a net. Show that

7 x is cauchy iff for every |,| D and for every > 0, 7 x is eventually contained
|,|
in B (x) for some x X.
R

In other words, a net in a semimetric space is cauchy iff it is cauchy with respect to
each semimetric.

Exercise 4.5.4 Let G be a topological group and let 7 g G be a net. Show that

7 g is cauchy iff for every open neighborhood U of the identity there is some g G
such that 7 g is eventually contained in gU.
Just as continuous functions preserve convergence, so to do uniformly-continuous functions
preserve cauchyness.
Exercise 4.5.5 Let f : X Y be uniformly-continuous and let 7 x X be cauchy.

Show that 7 f (x ) is cauchy.


Warning: Continuous functions do not necessarily preserve cauchyness.
Example 4.5.6 A continuous image of a cauchy net need not be cauchy Let
X := ( 2 , 2 ) and define m 7 arctan(m) X. This is certainly cauchy as, for every > 0,
it is eventually contained in ( 2 , 2 ). On the other hand, its image under the continuous
function tan : X R is not even bounded, much less cauchy.


Of course, if we know what it means for nets to be cauchy, then we likewise have a notion of
what it means for uniform spaces to be complete.
Definition 4.5.7 Completeness A uniform space is complete iff every cauchy net

converges.
R

In case there might be some confusion (e.g. if the topology of the underlying uniform
space comes from a totally-ordered set), then we shall say cauchy-complete in contrast
to dedekind complete.

4.5 Cauchyness and completeness

193

Exercise 4.5.8 Show that every compact metric space is complete.

4.5.1

Completeness of MorTop (X, R)


We mentioned back above in Example 4.3.85 that MorTop (X, R) is complete. We now prove this.
Theorem 4.5.9. Let X be a topological space that has the property that a subset is open iff

its intersection with each quasicompact subset K is open in K. MorTop (X, R) is complete.
R

In particular, a the limit of a uniformly convergent net of continuous functions is


continuous. This need not be the case if the convergence is just pointwisesee
Example 4.5.15.

This condition on X is called quasicompactly-generated. For example, the cocountable


topology on R is not quasicompactly-generatedsee Example 4.5.17.

Proof. S T E P 1 : P R O V E T H E R E S U LT F O R X Q U A S I C O M PA C T
We first take X to be quasicompact. In this case, the uniform structure on MorTop (X, R) is
the same as that generated by the single norm ||X . Therefore, by Exercise 4.5.3 (cauchyness
in semimetric spaces), to show that MorTop (X, R) is complete, it suffices to show that every
net that is cauchy with respect to ||X converges. So, suppose that 7 f MorTop (X, R)
is cauchy. As


(4.5.10)
| f1 (x) f2 (x)| f1 f2 ,
it follows that, for each x X, the net 7 f (x) R is cauchy. As R is complete, this
net has a limit. Call this limit f (x). We need to check two things: (i) that x 7 f (x) is
continuous (so that indeed f MorTop (X, R), and (ii) that 7 f converges to f in
MorTop (X, R).
We first show
that f is continuous. Let > 0. Let 0 be such that, whenever 1 , 2 0 ,
it follows that f1 f2 < . Let U be an open neighborhood of x such that f0 (U)
B ( f0 (x )). Let x U. Let 1 , 2 0 be such that | f1 (x) f (x)|, | f2 (x ) f (x )| < .
Then,
| f (x) f (x )| | f (x) f1 (x)| + | f1 (x) f0 (x)| + | f0 (x) f0 (x )|
+ | f0 (x ) f2 (x )| + | f2 (x ) f (x )| < 5.

(4.5.11)

Thus, f is continuous.
We now check that 7 f converges to f in MorTop (X, R). Let > 0. Now that we
know that f is continuous, for each x X, there is some open neighborhood Ux of x such
that fUx ) B ( f (x)). Then,
{Ux : x X}

(4.5.12)

is an open cover of X. Therefore, there is a finite subcover, Ux1 , . . . ,Uxm . Thus, we may
choose 0 such that, whenever 0 , it follows that | f (xk ) f (xk )| < for all 1 k m.
Now let x X be arbitrary. Without loss of generality, assume that x U1 . Then, whenever
0 , it follows that
| f (x) f (x)| | f (x) f (x1 )| + | f (x1 ) + f (x1 )| < 2.

(4.5.13)

Taking the supremum over x, we find that


| f f | < ,

(4.5.14)

Chapter 4. Uniform spaces

194
so that indeed 7 f converges to f .

S T E P 2 : P R O V E T H E R E S U LT I N G E N E R A L
We now do the general case, in which case X is not necessarily quasicompact. So, let
7 f MorTop (X, R) be cauchy. Then, by Exercise 4.5.5 again, we must have that
7 f |K MorTop (K, R) is cauchy for each quasicompact subset K X. Therefore, by
quasicompact case, 7 f converges to its pointwise limit f on each quasicompact subset.
As f is continuous on each quasicompact subset, the intersection of the preimage of every
open set with every quasicompact subset of X is open, and hence, by hypothesis, is open.
Therefore, f is continuous. Furthermore, 7 f converges to f in MorTop (X, R) because
it converges to f on each quasicompact subset.

Now that weve just proven what goes right, we turn to the more interesting side of thingswhat
can go wrong.


Example 4.5.15 A pointwise limit of continuous functions that is not continuous

Let fm : [0, 1] R be defined by fm (x) := xm . Then, of course each fm is continuous. On


the other hand,
(
0 if x [0, 1)
lim f (x) =
,
(4.5.16)
m
1 if x = 1
which is clearly not continuous.
Example 4.5.17 A discontinuous function that is continuous on every quasicompact subset Take X := R and equip it with our good old-buddy, the cocountable topology.


Recall that (Example 3.6.47) a subset of X is quasicompact iff it is finite.


Now let f : X R be any discontinuous function. For example, the Dirichlet Function
is discontinuous with respect to the cocountable topology (because Qc is not closed). On
the other hand, finite subsets of X are discrete, and so f restricted to finite subsets must be
continuous.
We can use this trick to find an example of a space for which MorTop (X, R) is not complete.


Example 4.5.18 A topological space for which MorTop (X, R) is not complete

Take X := R equipped with the cocountable extension topology. Recall that this means that
the only closed sets are (i) X itself, (ii) countable subsets, and (iii) subsets which are closed
in the usual topology of R.
The same proof in the previous example shows that the only quasicompact subsets of X are
the finite sets. Let f : X R be the Dirichlet function. As f 1 (1) = Qc is not closed, f is
not continuous. We construct a net of functions in MorTop (X, R) converging to f uniformly
on each quasicompact (i.e. each finite) subset of X.
Our directed set is the collection of all finite subsets of X ordered by inclusion. For S ,

4.5 Cauchyness and completeness

195

let us write S = {x1 , . . . , xm } with xk < xk+1 and define

f (x)
if x S

f (x )
if x < x1
1
fS (x) := f (xk+1 f (xk )
,

(x

x
)
for
x
<
x
<
x
k
k
k+1

x
x
k+1
k

f (x )
if x < x
m

(4.5.19)

That it, is is a constant f (x1 ) for all x x1 , and similarly for x xm . For x between xk and
xk+1 , is it a just the line segment going from f (xk ) at x = xk to f (xk+1 ) at x = xk+1 . By
construction, this is continuous with respect to the usual topology, and hence continuous
with respect to the cocountable extension topology.
The key result of this subsection is that MorTop (X, R) is complete for X quasicompactlygenerated. If turns out that the converse of this is false.
 Example 4.5.20 A space for which MorTop (X, R) is complete yet is not
quasicompactly-generated Take X := R equipped with the cocountable topology. We

show that every element of MorTop (X, R) is constant.


Let f : X R be continuous. If f is not constant, then f 1 (a) is proper for every a R,
and hence countable. The image cannot be countable then, because if it were, R would be
S
a countable union of countable sets,a and hence countable. As R = mZ [m, m + 1] and
the image is uncountable, some interval [m, m + 1] must intersect the image at uncountable
many points. But then, either [m, m + 1] contains the image or its preimage is a countable
set, in which case we have
countable set = f 1 ([m, m + 1]) =

f 1 (a)

(4.5.21)

a f (X)[m,m+1]

is an uncountable union of nonempty disjoint sets a contradiction. Therefore, it must


be that f (X) [m, m + 1], so that f 1 ([m, m + 1]) = X. Now, writing [m, m + 1] = [m, m +
1
1
2 ] [m + 2 , m + 1] and applying the same logic again, we find, without loss of generality,
that the image of f is contained in [m, m + 12 ]. Applying this logic inductively, we deduce
that the image of f is contained in an interval of length 21m for all m N In particular, the
image of f cannot contain more than one element, because two distinct points cannot fit
inside an interval sufficiently small.
A cauchy net of constant functions amounts to a cauchy net of real numbers, which converges, and so the original cauchy net of constant functions converges to this constant
function. Therefore, MorTop (X, R) is complete.
a Because

4.5.2

R=

aIm( f )

f 1 (a).

The completion
And now we show that every uniform space can be completed.5
Theorem 4.5.22 Completion. Let X be a uniform space. Then, there exists a complete

uniform space Cmp(X), the completion of X, such that


5 Of

course, we already know that Q is not complete (see Propositions 2.4.61 and 2.4.70), and so in general there will
most certainly be some completing to be done.

Chapter 4. Uniform spaces

196
(i). Cmp(X) contains X; and

(ii). if Y is any other complete uniform space which contains X, then Y contains Cmp(X).
Furthermore,
(i). Cmp(X) is unique up to uniform homeomorphism;a and
(ii). X is dense in Cmp(X).
R

You will see in the proof that this is the cauchy sequence construction we mentioned
in a footnote right before our proof of existence of the real numbers (Theorem 1.4.19).
It turns out that, in the case of R, this will gives us the right answer, but that the
passage from Q to R should really be thought of as the dedekind completion, not the
cauchy completion.

Proof. By replacing X with T0(X) if necessary, we may without loss of generality assume
that X is T0 .
S T E P 1 : D E F I N E A N E Q U I VA L E N C E R E L AT I O N O N C A U C H Y N E T S
Define first as a set
X 0 := { 7 x X : 7 x is cauchy.}

(4.5.23)

For 7 x and 7 y cauchy nets, define 7 x 7 y iff open subsets of X


eventually contain 7 x iff they eventually contain 7 y .
S T E P 2 : S H O W T H AT I S A N E Q U I VA L E N C E R E L AT I O N
That 7 x 7 x is tautological. The definition of is 7 x 7 y symmetric,
and so of course 7 x 7 y implies 7 y 7 x .
As for transitivity, suppose that 7 x 7 y and 7 y 7 z . We wish
to show that 7 x 7 z . Let U X be open. We must show that U eventually
contains 7 x iff it eventualy contains 7 z . By 7 x 7 z , it suffices to
show only one of these directions. So, suppose that U eventually contains 7 x . Then,
U eventually contains 7 y (because 7 x 7 y ), and so U eventually contains
7 z (because 7 y 7 z ).
S T E P 3 : D E F I N E Cmp(X) A S A S E T
We define
Cmp(X) := X 0 / .

(4.5.24)

S T E P 4 : D E F I N E A U N I F O R M I T Y O N X0.
f. For every uniform cover U U
f, we define a correDenote the uniformity on X by U
0
0
f
sponding cover U of X . For U U and U U , define


U 0 := x X 0 : x is eventually contained in U.
U 0 := {U 0 : U U }
(4.5.25)
f0 := {U 0 : U U
f}.
U
f0 is a uniform base on X 0 .b This will then define a topology and
We wish to show that U
compatible uniformity for each each U 0 is a uniform cover by Proposition 4.2.54 and

4.5 Cauchyness and completeness

197

f0 is
Proposition 4.2.70 respectively. To show this, we must show (i) that each element of U
0
0
f
in fact a cover of X and (ii) that U is downward-directed with respect to star-refinement.
f0 be arbitrary and let
We first check that each U 0 is in fact a cover of X 0 . So, let U 0 U
x X 0 . As x is cauchy, there is some U U such that x is eventually contained in U, so that
x U 0 , and hence U 0 covers X 0 .
f0 is downward-directed with respect to star-refinement. So, let
We now check that U
f0 . Let W be a common star-refinement of U and V . We wish to show that W 0 is
U 0, V 0 U
a common star-refinement of U 0 and V 0 . By U V symmetry, it suffices to prove that W 0
is a star-refinement of U 0 . So, let W00 W 0 . As W is a star-refinement of U , it follows that
there is some U0 U such that StarW (W0 ) U0 . We wish to show that StarW 0 (W00 ) U00 .
So, let W 0 W 0 intersect W00 . Then, there is a net that is eventually contained in both W and
W0 , so that, in particular, W and W0 intersect. It follows that W U0 , and hence in turn, that
any net eventually contained in W is eventually contained in U0 , so that W 0 U00 , and hence
StarW 0 (W00 ) U00 as desired.
f0 is a uniform base on X 0 .
This completes the proof that U
S T E P 5 : S H O W T H AT T H E Q U O T I E N T M A P q : X 0 Cmp(X) S AT I S F I E S
q1 (q(U 0 )) = U 0
As it is always the case that U 0 q1 (q(U 0 )), it suffices to show that q1 (q(U 0 )) S. So,
let x q1 (q(U 0 )). Then, x is a cauchy net and there is another cauchy net y U 0 such that
x y. That is, open sets eventually contain x iff they eventually contain y. However, as
y U 0 , by definition of U 0 ((4.5.25)), y is eventually contained in U, and so x is eventually
contained in U, and so x U 0 . Hence, q1 (q(U 0 )) U 0 , and we are done.
S T E P 6 : D E F I N E A U N I F O R M B A S E O N Cmp(X)
f, we define a corresponding cover Cmp(U ) of Cmp(X).
For every uniform cover U U
fand U U , definec
For U U
Cmp(U) := q(U 0 )
Cmp(U ) := q(U 0 ) := {Cmp(U) : U U }

(4.5.26)

^ ) := q(U
f0 ) := {Cmp(U ) : U U
f}.
Cmp(U
^ ) is a uniform base on Cmp(X). Certainly each Cmp(U ) is a cover
We claim that Cmp(U
of Cmp(X) because U 0 is a cover of X 0 and q is surjective.
It follows from Proposition 4.2.51 that q preserves star-refinement (the purpose of the
previous step was to verify the requisite hypotheses of Proposition 4.2.51), and so that
f) is a uniform base follows from the fact that that U
f0 was a uniform base on X 0 .
Cmp(U
S T E P 7 : S H O W T H AT Cmp(X) I S C O M P L E T E .
Let 7 q(x ) be cauchy in Cmp(X). By definition of cauchyness and our uniformity
f), there is some
on Cmp(X), this means that, for every uniform cover Cmp(U ) Cmp(U

Cmp(U) Cmp(U ) such that 7 q(x ) is eventually contained in Cmp(U) := q(U 0 ).


Thus, for each x X 0 for sufficiently large, there is some y U 0 with x y . However,
by definition of U 0 , U eventually contains y , and hence, because x y , eventually
contains x , and so in fact x U 0 . Thus, we have a net 7 x X 0 that has the property
f, there is some U U such that 7 x is eventually
that, for every uniform cover U U
contained in U 0 .

Chapter 4. Uniform spaces

198

Let us denote the domain of 7 x by . Now, each x is itself a net, and so let us
denote its domain by M . Define x X 0 to be the net

M 3 ( , ) 7 (x ) X.
(4.5.27)

We first must check that this is cauchy, so that indeed x X 0 .


f be a uniform cover. Then, there is some U U such that 7 x is
So, let U U
eventually contained in U 0 . So, let 0 be such that, whenever 0 , it follows that x U 0 .
For all such , the net x itself must be eventually contained in U, so let 0 be such that,
whenever 0 , it follows that (x ) . (For not at least 0 , 0 may be anything).
Now, suppose that ( , ) (0 , 0 ). Then,
(x ) U

(4.5.28)

Thus, ( , ) 7 (x ) is cauchy.
We now show that 7 q(x ) converges to q(x ). To show this, as stars form neighborhood bases, it suffices to show that 7 q(x ) is eventually contained in StarCmp(U ) (q(x ))
f. As q is surjective, the preimage of a star is equal to the star of the preimfor all U U
age, and so it suffices to show that 7 x is eventually contained in StarU 0 (x ) for all
f. So, let U U
fbe a uniform cover. Then, there is some U U such that 7 x
U U
is eventually contained in U 0 . Thus, we will be done if we can show that x U 0 (so that
then U 0 StarU 0 (x ). To show this, we must show that x is eventually contained in U.
However, the exact same argument that was used above to show that x was cauchy shows
precisely this. Therefore, 7 x converges to x .
S T E P 8 : S H O W T H AT Cmp(X) C O N TA I N S X
Of course, when we say that Cmp(X) contains X, what we really means is that there is a
subset of Cmp(X) which is uniformly-homeomorphic to X. So, we define : X Cmp(X),
and show that it is a uniform-homeomorphism onto its image.d
Define c : X X 0 by c(x) := ( 7 x := x), that is, c sends x to the constant net with
value x. (Constant nets converge, and hence are in particular cauchy.) Then we define
:= q c. We first show that this is injective. Suppose that (x1 ) = (x2 ). Then, every
neighborhood that eventually contains the constant net x1 eventually contains the constant
net x2 . In other words, open sets in X contain x1 iff they contain x2 , which implies that
x1 = x2 because X is T0 .
We now check that is uniformly-continuous. We claim that 1 (Cmp(U ) = U .
However, using result that q1 (q(U 0 )) = U 0 from Step 5, we have that

1 (Cmp(U )) = c1 q1 q(U 0 ) = c1 (U 0 ) = {c1 (U 0 ) : U U }.
(4.5.29)
However, the only constant nets which are eventually contained in U are the elements of U
themselves, and so c1 (U 0 ) = U, and so indeed
1 (Cmp(U )) = U .
The subspace uniformity induced on (X) is simply
n
o
f ,
Cmp(U ) {(X)} : U U

(4.5.30)

(4.5.31)

4.5 Cauchyness and completeness

199

that is, a cover of (X) is uniform iff it is a cover that is obtained from a uniform cover of
Cmp(X) by simply restricting that cover to (X). Because the preimage of a cover with
respect its inverse is the same as the image of that uniform cover, to show that the inverse of
: X (X) is uniformly-continuous, it suffices to show that (U ) is a uniform cover on
(X). By (4.5.31), it thus suffices to show that
(U ) = Cmp(U ) {(X)}.

(4.5.32)

However,




Cmp(U ) {(X)} = q(U 0 ) q(c(X)) : U U = e q(U 0 c(X)) : U U
= {(U) : U U } ,

(4.5.33)

which demonstrates the truth of (4.5.32).


S T E P 9 : S H O W T H AT A N Y O T H E R C O M P L E T E U N I F O R M S PA C E T H AT C O N X C O N TA I N S Cmp(X)
Let Y be some other complete uniform space that contains X. Let x X 0 be a cauchy net in
X and let x be its (unique) limit in Y . Define : Cmp(X) Y by (q(x)) := x .

TA I N S

Exercise 4.5.34 Show that is well-defined and is a uniform-homeomorphism onto

its image.

S T E P 1 0 : S H O W T H AT Cmp(X) I S U N I Q U E
Let Y be another complete uniform space which (i) contains X and (ii) is contained in every
other complete uniform space that contains X. From this, we know that Y is contained in
Cmp(X). On the other hand, we already knew that Cmp(X) was contained in Y . Therefore,
Cmp(X) = Y .
S T E P 1 1 : S H O W T H AT X I S D E N S E I N Cmp(X)
Let q(x) Cmp(X).
Exercise 4.5.35 Show that 7 (x ) converges to q(x) in Cmp(X).

It follows from this exercise that Cls((X)) = Cmp(X), and so that X is dense in Cmp(X).

a This

of course justifies our use of the notation Cmp(X).


B is of course to remind us that we dont know this to be a uniformity per se, only a uniform base.
c As you might have guessed, this is just the quotient uniformity induced from the one on X 0 written out
exlicitly.
d Its image will be equipped with the subspace uniformity, that is, the initial uniformity (see Proposition 4.2.83)
with respect to the inclusion into Cmp(X).
e Careful: f (X) f (Y ) = f (X Y ) does not hold in general. It does, however, hold for q because q is
surjective and satisfies q1 (q(U)) = U.
b The

One thing that we will frequently want to do is extend a given function to its completion. If
the codomain is complete as well, we can do this, and in fact, we can do it whenever we have a
continuous function defined on a dense subspace.

Chapter 4. Uniform spaces

200

Proposition 4.5.36 Let S X be a dense subset of a topological space, let Y be a complete

T0 uniform space, and let f : S Y be uniformly-continuous. Then, there exists a unique


uniformly-continuous map g : X Y such that g|S = f .
Mere continuity does not suffice, even in the nicest of cases. See the counter-example
below (Example 4.5.40).

Proof. Let x X. As Cls(S) = X, there is a net 7 x X converging to x. Pick any


such neta . By Exercise 4.5.5, 7 f (x ) is cauchy in Y , so that we may simply take
its limit (because Y is complete and is T0 , and hence T2 , so that limits are uniquesee
Proposition 3.6.42). So, let us define g : X Y by
g(x) := lim f (x ).

(4.5.37)

Exercise 4.5.38 Show that if 7 y also converges to x, then lim f (x ) =

lim f (x ).
This exercises established uniqueness. Thus, the choice of net does not matter, and so
for x S, we may simply take the constant net 7 x := x, so that g is indeed an extension
of f .
Exercise 4.5.39 Show that g is uniformly-continuous.


a Our

proof of uniqueness will show that this choice ultimately does not matter.

Example 4.5.40 A real-valued continuous function on a dense subspace that


does not extend In the notation of the previous proposition, take S := R, X := [, +],


Y := R, and f := idR . If this had an extension to all of X, then in particular the sequence
m 7 xm := m would have to converge in R.
R

In fact, if perhaps you thought you could make use of the fact that continuous functions
restricted to quasicompact subsets are uniformly-continuous (Proposition 4.2.94) to
prove the result in special cases, this even provides a counter-example in which every
point of X has a compact neighborhood.

Among other things, the significance of this result is that group operations of topological groups
extend uniquely to their completions.
Having shown that uniform spaces always have completions, finally, we may return to an
unresolved issue all the way back from Chapter 1.
Example 4.5.41 A nonzero totally-ordered cauchy-complete field distinct from
R Recall the field of rational functions with coefficients in the reals, R(x), from Exam

ple 2.3.12. Being a totally-ordered field, it is in particular a topological group (Exercise 4.3.52) with respect to its underlying commutative group hR(x), +, 0, i, and so has a
canonical uniform structure. Thus, we may complete to form the complete topological field
Cmp(R(x)).

4.5 Cauchyness and completeness

201

Exercise 4.5.42 Extend the order on R(x) to Cmp(R(x)) so that Cmp(R(x)) is a

totally-ordered field containing R(x).


By construction then, Cmp(R(x)) is a nonzero totally-ordered cauchy-complete field. Not
only is it distinct from R it cannot even embed in R as, if it did, then so to R(x) would embed
in R (as it embeds in Cmp(R(x))), and hence R(x) would be archimedeana contradiction
of Example 2.3.12.
In conclusion:
If a space is quasicompactly-generated, then MorTop (X, R) is complete. Furthermore, if MorTop (X, R) is complete, then X is a topological space.7 Both of these
implications are strict: the reals with the cocountable topology show that the first
implication is strict, and the reals with the cocountable extension topology show
that the second implication is strict.
4.5.3

(4.5.43)

Complete metric spaces


We present here two important results that are specific to complete metric spaces.
The Baire Category Theorem

The Baire Category Theorem is an important result concerning complete metric spaces. It has many
important applications, most of which we havent the time to present. One important application
for us, however, is that it will allow us to finally wrap up the separation axiom counter-examples
(see Example 4.5.61) below.
Theorem 4.5.44 Baire Category Theorem. Let X be a complete metric space. Then,

(i). the countable intersection of open dense subsets of X is dense; and


(ii). the countable union of closed sets with empty interior has empty interior.
R

These conclusions are equivalent, the equivalence being obtained by taking the
complement of the conclusion. The former is arguably a bit more intuitive to prove,
while the latter the form that is probably more frequently used in concrete situations.

Warning: This is false in general for complete uniform spacessee Example 4.5.50.

The word category in the name has nothing to do with categoriesthe terminology
it refers to is archaic.

Proof. For m N, let Um X be an open dense subset. We wish to show that


U :=

Um

(4.5.45)

mN

is dense. The definition of density is that the closure is equal to all of X, in other words,
that every point of X is an accumulation point, or in other words, that every open subset
intersects the dense set. So, let V X be open. We wish to show that V intersects U.
As U0 is dense, V intersects U0 , say at x0 U0 V . As U0 V is open. We can fit an
-ball around x0 inside U0 V . In fact, as X is perfectly-T4 , and in particular T3 , we can find
some 0 > 0 such that
Cls (B0 (x0 )) U0 V.

(4.5.46)

Chapter 4. Uniform spaces

202

In fact, by making 0 smaller if necessary, we may without loss of generality assume that
0 < 20 = 1.
Now, because U1 is dense, there is some x1 U1 B0 (x0 ), and so, just the same as
before, there is some 0 < < 1 < 21 such that
Cls (B1 (x1 )) U1 B0 (x0 ).

(4.5.47)

Proceeding inductively, we can find xm X and 0 < m < 2m such that


Cls (Bm (xm )) Um1 Bm1 (xm1 ).

(4.5.48)

We now check that m 7 xm is cauchy. Let > 0. Choose m N such that m < .
Suppose that n m. Then, xn Bn (xn ) Bm (xm ) B (xm ). Thus, m 7 xm is eventually
contained in some ball, and is hence cauchy. As X is complete, it converges (to a unique
limit) and so we may define
x := lim xm .

(4.5.49)

We claim that x U V . As explained above, this will complete the proof. m 7 xm is


eventually contained in Cls(B0 (x0 )) V , and so x Cls(B0 (x0 )) V . Similarly, m 7 xm
is eventually contained in Cls (Bm (xm )) Um1 , and so, same as before, x Um1 . Hence,
x U, and we are done.



Example 4.5.50 A complete uniform space which is not a baire space a Define

X := { f : Z+ [0, 1] : f (m) = 0 for all but finitely many m N.}

(4.5.51)

We define a uniformity on X as follows. First of all, for m Z+ , define


Xm := [0, 1] [0, 1]
|
{z
}

(4.5.52)

equipped with the product uniformity. Note that Xm embeds in X via m : Xm X defined
by
(
xk if k m
m (hx1 , . . . , xm i) := k 7
,
(4.5.53)
0 otherwise
that is, hx1 , . . . , xm i is sent to the function from Z+ into [0, 1] which sends k to xk for k m
and 0 otherwise. We then equip X with the final uniformity with respect to the collection
{m : m Z+ }.
We first wish to show that 3 7 x is eventually contained in Xm0 for some m0 Z+ .
To show this, we proceed by contradiction: suppose that for every m N and every there
is some m, such that xm,
/ Xm . Define 0 := Z+ . Then, 0 3 hm, i 7 xm, is
a subnet of 7 x , and hence is in turn cauchy. Thus, for every 1 , 2 , 3 , . . . > 0, there is
some hm0 , 0 i such that, whenever hm, i, hn, i hm0 , 0 i, it follows that
|xm, (k) xn, (k)| < k

(4.5.54)

4.5 Cauchyness and completeness


for all k Z+ . As xm

0 ,0

203

(k) = 0 for all k sufficiently large, we find that

xn, (k) < k

(4.5.55)

for all k sufficiently large, say for k k0 , and all hn, i hm0 , 0 i. Take n := max{k0 , m0 }.
As xn,
/ Xn , there is some l > n k0 such that xn, (l) 6= 0. Then,
0 < xn, (l) < l .

(4.5.56)

As l is arbitrary, this is a contradiction. Thus, there is some m0 Z+ such that 7 x is


eventually contained in Xm0 .
However, Xm0 , being a finite product of compact metric spaces, is a compact metric space,
and hence complete, so that 7 x converges in Xm0 , and hence in X. Therefore, X is
complete.
We now wish to check that X is not a baire space. From the definition, we have that
X=

Xm .

(4.5.57)

mZ+

Exercise 4.5.58 Show that Xm is closed in X.

Thus, to show that X is not a baire space, it suffices to show that each Xm has empty interior.
So, let x Xm . We show that every open neighborhood around x contains an element of
Xm+1 that is not contained in Xm . Let us write
x = hx1 , x2 , . . . , xm , 0, 0, 0, . . .i

(4.5.59)

Then, for every neighborhood U of x, there will be some 0 > 0 sufficiently small so that
x = hx1 , x2 , . . . , xm , 0 , 0, 0, . . .i U,

(4.5.60)

so that x is not in the interior of Xm , so that each Xm has empty interior.


a A baire space is a topological space in which the conclusion of the The Baire Category Theorem holds.
This example was inspired by priels answer on mathoverflow.net. Thanks to Nate Eldredge for nudging me
towards the correct proof.

Finally, we are able to tie-up our one loose end with the separation axioms.


Example 4.5.61 A space that is perfectly-T3 but not T4 a Define X := R R+


0 , the

upper-half plane, and define a base for a topology on X by




B := B (hx, yi) R R+ : > 0, hx, yi R R+
{B (hx, i) {hx, 0i} : x R, > 0} .

(4.5.62)

That is, we take all -balls contained in the (strict) upper half-plane together with all balls in the upper half-plane which are tangent to the x-axis (together with the point of
tangency).
We first check that X is perfectly-T3 . So, let C X be closed and let hx0 , y0 i Cc . The
subspace topology of R+ R+ X is just the usual topology, which, being a metric space,
is perfectly-T4 . Thus, we only need to check the case where y0 = 0. Let > 0 be such that

Chapter 4. Uniform spaces

204
B (hx0 , i) is disjoint from C. Then define f : X [0, 1] by

0
f (hx, yi) := 1

(xx0 )2 +y2

if hx, yi = hx0 , 0i
if hx, yi
/ B (hx0 , i) {hx0 , 0i} .

(4.5.63)

2y

For 0 < a < 1, f 1 ([0, a)] is Ba (hx, ai) {hx, 0i}, hence open; f 1 ((a, 1]) is X \
Cls(Ba (hx, ai)), hence open; and so f 1 (a, b) = f 1 ((a, 1]) f 1 ([0, b)) is open.
We now check that X is not T4 . To do this, we show that Q, R \ Q R R+
0 (as subsets of
the x-axis) are closed and cannot be separated by neighborhoods.b In fact, we check that
every subset S R {0} is closed. To show that, we show that Sc is open. For hx, yi Sc ,
if y > 0, then certainly we can put an -ball around it that does not intersect the x-axis,
and hence does not intersection S. On the other hand, for hx, 0i Sc , B (hx, i) {hx, 0i}
intersects the x-axis only at hx, 0i, and so does not intersect S. Therefore, Sc is open, and so
S is closed.
We now check that Qc and (R \ Q)c cannot be separated by neighborhoods.c Let U be an
open neighborhood o R \ Q. We show that there is some point q0 Q every neighborhood
of which intersects U.
For x R \ Q, let x > 0 be such that
R \ Q 3 x Bx (hx, x i) {x} U.

(4.5.64)

For m Z+ , define
Sm := {x R \ Q : x > m1 }.

(4.5.65)

Then,
R=

[
mZ+

Sm

{x},

(4.5.66)

xQ

and so
R=

[
mZ+

ClsR (Sm )

{x}.d

(4.5.67)

xQ

As R is a complete metric space, by the The Baire Category Theorem, there must be some
m0 Z+ such that ClsR (Sm ) does not have empty interior. So, let (a, b) ClsR (Sm ) and let
Q 3 q0 (a, b). Then, for every > 0, (q0 , q0 + ) intersects Sm , say at x . But then,
for all sufficiently small,
hx , i B (hq0 , i) B 1 (hx , m1 i) B (hq0 , i) U.

(4.5.68)

Thus, every neighborhood of hq0 , 0i Q intersects U.


R
a This

This is Niemytzkis Tangent Disk Topology.

example comes from [23, pg. 100].


write Q and R \ Q instead of Q {0} and (R \ Q) {0}.
c Note that we cannot write Qc to denote the irrationals as usual because, in this context, Qc means
(R R+
0 ) \ Q.
d The subscript R here is to indicate that we are taking the closure with respect to the usual topology on R.
b We

4.5 Cauchyness and completeness

205

Banach Fixed-Point Theorem

We finish the chapter with an application of the The Baire Category Theorem that will prove useful
to us later when we study differentiation.
Theorem 4.5.69 Banach Fixed-Point Theorem. Let hX, |,|i be a metric space and let

f : X X be such that
| f (x1 ), f (x2 )| M|x1 , x2 |

(4.5.70)

for some 0 1 < M. Then, there is at most one point x0 X such that f (x0 ) = x0 . If X is
nonempty and complete, then there is exactly one x0 X such that f (x0 ) = x0 .
R

Such an x0 is called a fixed-point, hence the name of the theorem. Thus, the theorem
tells us that (i) fixed-points, if they exist, have to be unique; and (ii) in the (nonempty)
complete case, there has to be some fixed-point (and hence exactly one fixed-point).

(4.5.70) is just the statement that f is lipschitz-continuous for a constant M < 1. Such
maps are called contraction-mapppings, hence the alternative name for this theorem,
the Contraction-Mapping Theorem.

Proof. Let x1 , x2 X be two fixed points of f . Then,


|x1 , x2 | = | f (x1 ), f (x2 )| M|x1 , x2 |,

(4.5.71)

and hence, if |x1 , x2 | 6= 0, we would have (because M < 1)


|x1 , x2 | < |x1 , x2 | :

(4.5.72)

a contradiction. Therefore, |x1 , x2 | = 0, and hence x1 = x2 .


Now take X to be nonempty and complete. We construct an actual fixed point of f . Let
x0 X (there is such a point because X is nonempty). For m Z+ , define
xm := f (xm1 ).

(4.5.73)

We wish to show that the sequence m 7 xm is cauchy. If we can do so, then its limit x must
exist, and so by taking the limit of the previous equation, we would find that x = f (x ) ( f
is lipschitz-continuous, hence uniformly-continuous, hence continuous). Thus, it suffices to
show that m 7 xm is cauchy
To see this, we first notice thata
|y1 , y2 | |y1 , f (y1 )| + | f (y1 ), f (y2 )| + | f (y2 ), y2 | |y1 , f (y1 )| + M|y1 , y2 | + | f (y2 ), y2 |,
(4.5.74)
and so
|y1 , y2 |

1
(|y1 , f (y1 )| + | f (y2 ), y2 |)
1M

(4.5.75)

Also note that


| f m (y1 ), f m (y2 )| M m |y1 , y2 |,

(4.5.76)

Chapter 4. Uniform spaces

206

which follows of course from just applying (4.5.70) inductively. Taking y1 := f m (x0 ) and
y2 := f n (x0 ) in (4.5.75), we find
1
1
(|xm , xm+1 | + |xn+1 , xn |)
(M m |x0 , f (x0 )| + M n | f (x0 ), x0 |)
1M
1M
Mm + Mn
|x0 , f (x0 )|.
=
1M
(4.5.77)

|xm , xn |

+M
Because M < 1, we can make M1M
arbitrarily small by taking m and n sufficiently large.b
Hence, this sequence is cauchy, and we are done.

a Were

using ys instead of xs because those symbols are already used-up.


you are not comfortable with this amount of detail, I suggest you fill in the gaps. You will want to get to
the point where you feel comfortable just asserting cauchyness after obtaining an inequality like this.
b If

Example 4.5.78 A contraction mapping with no fixed-point Take X := R+ and

define f (x) := 12 x. The fixed-point should be 0, but 0


/ R+ . You can turn this intuition
into a proof that this map indeed has no fixed point, and we recommend you try to do so.

5. Integration

So, first things firstfuck the riemann integral. Seriously. The only argument pro-riemann-integral
is that it is easier. What a ridiculous argument. This is math, dude. If you choose to do things
because theyre easy, youre in the wrong subject. Moreover, I would argue that this is not even
trueif you set things up right, you can literally define the (lebesgue) integral to be the area
(measure) under the curve. Or, if you prefer, you can take a limit over the size of a partition of
the sum of the areas of the rectangles corresponding to the subsets of the partition (the riemann
integral). Are you really going to sit here and try to argue that this is easier to teach? I call bullshit.
And besides, if youre going to become a mathematician, you have to learn the lebesgue integral at
some point anyways. . . why learn something only to have to relearn it later?
Okay, so now that my rant is out of the way, lets actually do some mathematics.

5.1

Measure theory
All of integration theory ultimately boils down to measure theory. The definition of the integral
itself is relatively easy. In fact, the definition of abstract measure spaces is even easier. Theres
really no question that writing down the definition of the lebesgue integral is significantly easier
than that of the riemann integral. What is a bit tricky, however, is constructing specific measures.
In our case, we will primarily be concerned with constructing lebesgue measure (on Rd ), and this is
really the only part that is a bit tricky. Fortunately, there is a huge theorem that will just spit out
lebesgue measure for us, the The Haar-Howes Theorem.

5.1.1

Outer-measures
The intuition behind measure is actually quite easya measure is just an axiomatization of our
intuition about notion of things like length, area, and volume. Before we define a measure, it will
be convenient to introduce a couple of terms.
Definition 5.1.1 Subadditivity and additivity Let X be a set, let m : 2X [0, ], and

let M 2X .

Chapter 5. Integration

208
(i). m is subadditive on M iff for {Mm : m N} M we have
!
m

Mm

m(Mm );

(5.1.2)

mN

mN

(ii). m is additive on M iff for {Mm : m N} M a disjoint collection we have


!
m

Mm

mN

m(Mm ).

(5.1.3)

mN

m is simply just subadditive (resp. additive) if it is subadditive (resp. additive) on all of X.


R

You might think that we should always have additivity, or at the very least, we
should have finite additivity: if S and T are disjoint, then m(S T ) = m(S) + m(T ).
Unfortunately, this is falsesee Example 5.2.93. We will have additivity on a very
large class of sets, however, the so-called measurable setssee Definition 5.1.19.

Exercise 5.1.4 Let M be a collection of sets that is closed under union, intersection, and

complementation. Show that if m is additive on M then it is subadditive on M .


R

There is something to show here. While (5.1.3) itself is obviously a stronger condition
than (5.1.2), it is also only assumed for disjoint collections. The problem then is to
show that, if (5.1.3) holds for disjoint collections, then (5.1.2) holds for all collections.

Definition 5.1.5 Outer-measure Let X be a set. An outer-measure on X is a function

m : 2X [0, ] such that


(i). m(0)
/ = 0;
(ii). (Nondecreasing) m : h2X , i [0, ] is nondecreasing;a and
(iii). (Subadditivity) m is subadditive.
A set equipped with an outer-measure is an outer-measure space.
R

Note that we allow the measure of sets to be infinite. This is incredibly importantfor
example, we will want m(R) = (for lebesgue measure anyways).

As a consequence of this, we neednt worry about convergence in the third axiom (see
(5.1.2)). As a matter of fact,Swe definitely want to allow this sum to divergethink
about what the measure of mZ (m, m + 1) should be.

a Concretely,

this means that m(S) m(T ) if S T .

In measure theory, things almost always matter only up to sets of measure 0. This concept is
so important that there is a term for it.
Meta-definition 5.1.6 Almost-everywhere XYZ Let f : hX, mi Y be a function on

an outer-measure space. Then, f is almost-everywhere XYZ iff


m ({x X : f (x) is not XYZ.}) = 0.

(5.1.7)

5.1 Measure theory

209

Whenever X is equipped with a measure, all relations on MorSet (X,Y ) are defined
only up to measure zero. For example, if we write f = g, what we really mean is that
m ({x X : f (x) 6= g(x)}) = 0. Similarly, if we write f g, what we really mean is
that m ({x X : f (x) 6 g(x)}) = 0. Etc.. A fancier way of putting this, is that we work
in the almost-everywhere category.

Example 5.1.8 The almost-everywhere category The almost-everywhere category

is the category AlE whose collection of objects AlE0 is the collection of all outer-measure
spaces, for every outer-meausre space X and outer-meausre space Y the collection of
morphisms from X to Y , MorAlE (X,Y ), is the quotient set MorSet (X,Y )/ , where f
g iff f and g are equal almost everywhere, composition is given by ordinary function
compositiona , and the identities of the category are the (equivalence classes) of the identity
functions.
Exercise 5.1.9 Check that is an equivalence relation.
Proposition 5.1.10 Composition in AlE is well-defined.

Proof. Let X,Y, Z be outer-measure spaces, let f1 , g1 : X Y be equal almosteverywhere and let f2 , g2 : Y Z be equal almost-everywhere. We must show that
f2 f1 is equal to g2 g1 almost everywhere. However, of course, if f1 (x) = g1 (x),
then certainly f2 ( f1 (x)) = g2 (g1 (x)), and so
{x X : f1 (x) = g1 (x)} {x X : [ f2 f1 ](x) = [g2 g1 ](x)} ,

(5.1.11)

and so
{x X : [ f2 f1 ](x) 6= [g2 g1 ](x)} {x X : f1 (x) 6= g1 (x)} .

(5.1.12)

As the set on the right-hand side has measure zero, so too does the set on the
left-hand side.
R

Note that we didnt even need to make use of the fact that f2 (x) = g2 (x) for
almost-every x.


a One

must check well-definednesssee below

Example 5.1.13 The zero measure

Let X be a set and define m : 2X [0, ] by

m(S) := 0. How terribly interesting.

Example 5.1.14 The infinite measure Let X be a set and define m : 2X [0, ] by

(
0 if S = 0/
m(S) :=
.
otherwise
Dear god, this example is even more interesting than the last one.

(5.1.15)

Chapter 5. Integration

210


Example 5.1.16 The unit measure Let X be a set and define m : 2X [0, ] by

(
0
m(S) :=
1

(5.1.17)

Note that this is never additive (unless of course X is either empty or a single point).
This makes it useful for producing counter-examples (see Example 5.1.48), and not
much else.

if S = 0/
.
otherwise

Example 5.1.18 The counting measure Let X be a set and for S X define m(S) :=

|S|, that is, the cardinality of S.


This is actually incredibly important, as we shall see that sums are just integrals with
respect to the counting measure.

While we will be assigning a measure to every set, not all of them will be considered to be
measurable. In general, we will not have additivity; however, when we restrict our (outer) measures
to the collection of measurable sets, we will have additivity. This is more or less the point of talking
about measurable setsadditivity is a nice thing to have.
Definition 5.1.19 Measurable (set) Let m : 2X [0, ] be an outer-measure on a set

X and let M X. Then, M is measurable iff


m(S) = m(S M) + m(S M c )

(5.1.20)

for all sets S X.


R

Think about what this means: M is chopping up S into two pieces, the set of points in
M and the set of points not in M. M is measurable, then, if the measure of S is the
sum of the measure of these two pieces for all S. In particular, S itself is definitely
not required to be measurable.a

By subadditivity, we always have that m(S) m(S M) + m(S M c ). Therefore, in


fact, M is measurable iff
m(S) m(S M) + m(S M c )

(5.1.21)

for all S X.
R

You might say that the motivation for the definition is that, for sets S, T that satisfy this
property, we should have at least finite additivity (i.e. S, T disjoint implies m(S T ) =
m(S) + m(T )). It turns-out that this is true, but in fact, perhaps surprisingly, we have
much more than thiswe actually have (countable) additivitysee Theorem 5.1.23.
By asking for finite additivity, we get countable additivity for free!

Warning: There definitely exist sets that are not measurable in general! In fact,
nonmeasurable sets are easy to find in artificial spacessee Example 5.1.48. But
such pathologies exist even for the nicest of measures. Indeed, see Example 5.2.84
for a set that is not measurable with respect to lebesgue measure.

This is also what is sometimes referred to carathodory measurable.

a For

one thing, this would make the definition circular.

5.1 Measure theory

211

Exercise 5.1.22 Show that if m(M) = 0, then all subsets of M are measurable.
Theorem 5.1.23 Carathodorys Theorem. Let m : 2X [0, ] be an outer-measure.

Then,
(i). m is additive on the collection of measurable sets;
(ii). the countable union of measurable sets is measurable;
(iii). the countable intersection of measurable sets is measurable;
(iv). the complement of a measurable set is measurable;
(v). 0/ and X are measurable.
R

Note that Sets (Exercise A.1.16) imply that (ii) are (iii) are equivalent if (iv) is true, in
which case together they imply (v).a A nonempty collection of sets which satisfies
(ii)(v) is called a -algebra. We do not use this language, but it is important to know
for consulting other references.

Proof. S T E P 1 : S H O W ( I V )
The definition of measurability is S Sc symmetric, so (iv) is automatically true.
STEP 2: SHOW (V)
The empty-set is measurable by the previous exercise, and hence by (iv), X is measurable
as well, which establishes (v).
STEP 3: REDUCE THE PROOF OF (III) TO THE PROOF OF (II)
As was explained in a remark, we need not show (iii) itselfit will now follow if we can
show (ii).
S T E P 4 : P ROV E ( I I ) F O R F I N I T E U N I O N S
We first show that the union of finitely many measurable sets is measurable. It suffices
of course to then just show that the union of two measurable sets is measurable. So, let
M1 , M2 X be measurable and let S X. Then,
m(S) = b m(S M2 ) + m(S M2c )
= c m(S M2 M1 ) + m(S M2 M1c ) + m(S M2c M1 ) + m(S M2c M1c )
d m(S (M1 M2 )) + m(S (M1 M2 )c )
(5.1.24)
Thus, indeed, M1 M2 is measurable.
STEP 5: COMPLETE THE PROOF OF (II)
Let {Mm : m N} be a countable collection of measurable sets. We wish to show that
[

(5.1.25)

Mm

mN

is measurable. First of all, define


Mm0

:= Mm \

m1
[
k=0

Mk .

(5.1.26)

Chapter 5. Integration

212

Note that each Mm0 is measurable because we already know that finite unions, complements,
and hence also finite intersections, of measurable sets are measurable.
0 M , (ii) the collection {M 0 : m N} is disjoint,
Exercise 5.1.27 Show that (i) Mm
m
m
S
S
0

and (iii)

mN Mm

mN Mm .

This trick (the one in (5.1.26), that is) is important. Dont forget it.

Thus, as
[

Mm =

mN

Mm0 ,

(5.1.28)

mN

it suffices to prove this step in the case where {Mm : m N} is itself disjoint (just rename
Mm to now be Mm0 ). Thus, we now without loss of generality assume that {Mm : m N} is
disjoint.
Now define
Nm :=

m
[

Mk and N :=

k=0

Mm .

(5.1.29)

kM

so that, by the previous step, we have that Nm is measurable. Thus, for S X,


m(S Nm ) = e m(S Nm Mm ) + m(S Nm Mmc ) = f m(S Mm ) + m(S Nm1 )
m

(5.1.30)

= g m(S Mk ).
k=0

Thus,
m

m(S) = m(S Nm ) + m(S Nmc ) =

m(S Mk ) + m(S Nmc )


k=0

m(S Mk ) + m(S N

(5.1.31)

).

k=0

Hence,
m(S)

m(S Mm ) + m(S N c ) i m(S N) + m(S N c ),

(5.1.32)

mN

and so indeed N is measurable.


S T E P 6 : P ROV E ( I )
Let N and S be as in the previous step and take S := N. Then, by (5.1.32), we have that
!
[

mN

Mm

m(Mm ).

(5.1.33)

mN

The other inequality is automatic from subadditivity, and so indeed, we have equality.

0/ = S Sc .
M2 is measurable.
c Because M is measurable (applied twice).
1
d By subadditivity and the fact that M M = (M M ) (M M c ) (M c M ).
1
2
1
2
1
2
2
1
e Because E is measurable.
m
Justf Because
as we the
have
a notion of measurable set, so too do we have a notion of measurable function
collection {Mm : m N} is disjoint.
g Apply this trick inductively
Definition
5.1.34
h Because N
c N c . Measurable (function) Let f : hX1 , m1 i hX2 , m2 i be a function
m
i By subadditivity.
between
outer-measure spaces. Then, f is measurable iff
a Because
b Because

5.1 Measure theory

213

(i). the preimage of every set is measurable; and


(ii). the preimage of a set of measure 0 has measure 0.

This is a very strong condition. For example, there are uniform-homeomorphisms on


R that are not measurablesee Example 5.2.101 (the Cantor Function).

Neither of these conditions imply one anothersee the following examples.

Example 5.1.35 A function which preserves measurability but not measure 0

Precisely, we give a function that has the property that the preimage of every set is
measurable but the preimage of a set of measure 0 does not have measure 0.
Let X1 := {x1 } be a one point set and let m1 be the infinite measure on X1 . Let X2 := {x2 } be
a one point set and let m2 be the zero measure on X2 . Let f : X1 X2 be the only function
that exists from X1 to X2 .
Every subset of X1 is measurable, and so trivially the preimage of every subset of X2 is
measurable. On the other hand, m2 ({x2 }) = 0, but m1 f 1 ({x2 }) = 6= 0.

Example 5.1.36 A function which preserves measure 0 but not measurability

Precisely, we give a function that has the property that the preimage of every set of measure
0 has measure 0 but the preimage of some measurable set is not measurable.
Let X1 := {x1 , x2 } =: X2 be a two point set. Equip X1 with the unit measure and equip X2
with the infinite measure. Let f := id{x1 ,x2 } .
There is only one subset of X2 with measure 0, namely 0,
/ and of course the preimage of the
empty-set (the empty-set itself) also has measure 0. On the other hand, {x1 } is measurable
with respect to the infinite measure, but not the unit measure, and so its preimage (namely
itself) is not measurable.

I should probably mention at this point that the way I am presenting measure theory
goes against the orthodoxy. (You can skip this comment if you dont plan to look in
other sources.) Every other author I am aware of only works with measurable sets.
For them, a measure space is a set, together with a -algebra, the measurable sets,
along with a set function that is additive on the -algebra (and sends the empty-set
to 0). For them, outer-measures are merely tools for constructing measures (a l
Mr. Carathodory). Of course, their measure1 is just the restriction of the outermeasure to the collection of all measurable sets. You might say they simply forget that
they had ever assigned a measure to the nonmeasurable sets. I find this unnecessarily
complicated and messy. For one thing, if you do things the way I have presented them,
you dont have to worry about -algebras (at least not explicitly). The disadvantage
is that now I have to add the hypothesis These sets are measurable to a lot of my
theorems. Meh. Its a trade-off, but I quite like not having to every worry about
-algebras explicitly.
Exercise 5.1.37 Let m : 2X [0, ] be an outer-measure and let S X be measurable have
1 The

quotes are to indicate that this is what they call itI myself have not defined this term.

Chapter 5. Integration

214
finite measure. Show that
m(S \ T ) = m(S) m(T ).
R

(5.1.38)

You need S to have finite measure otherwise the right-hand side of this equation will
be undefined if T also has infinite measure.

for any measurable set T S.


Exercise 5.1.39 Let M0 M1 be an nondecreasing countable collection of measurable

sets. Show that


m

!
Mk

k=0

= lim m
m

m
[

!
Mk

k=0

= lim m(Mk ).
m

(5.1.40)

Exercise 5.1.41 Let M0 M1 be a nonincreasing countable collection of measurable

sets. Show that, if at least one Mk has finite measure,


!
!
m

k=0

Mk

= lim m
m

m
\

k=0

Mk

= lim m(Mk ).
m

(5.1.42)

Though we technically have not defined measure on R yet, it is easy to see intuitively
why we would neither expect nor want this to hold if the measure of each Mk were
infinite. For example,
take Mk := (, k). Then, on one hand, m(Mk ) = for all
T
k, but yet m(
M
)
= m(0)
/ = 0.
k
k=0

At some point in the near future, we will be doing arithmetic with for example,
what should the measure of R {0} in R2 be? Of course, from our definition of
product measures, this will turn out to be 0. We hence declare that
0 := 0 =: 0 .

(5.1.43)

There are other arithmetic notions we have to technically define (e.g. x + = ), but
this is the only nonobvious one.
5.1.2

Measures on topological and uniform spaces


One can go ahead and develop the theory for outer-measures on arbitrary sets, but in practice, we
will only be working with measures defined on spaces with a lot of extra structure. This motivates
us to investigate outer-measures on topological and uniform spaces, in which case we are of course
going to require our outer-measures to be compatible with this extra structure.
Measures on topological spaces

There are definitely outer-measures on topological spaces for which the open sets are not
measurablesee Example 5.1.48in fact, this can even be the case for regular measures (the
example in Example 5.1.48 is regularsee Definition 5.1.45 for the definition of a regular measure),
but most outer-measures on topological spaces which are not cooked-up for the sole purpose of
producing counter-examples have the property that open sets are measurable. We have a name for
such measures: borel measures.

5.1 Measure theory

215

Definition 5.1.44 Borel measure Let X be a topological space and let m : 2X [0, ]

be an outer-measure. Then, m is borel iff every open set is measurable. A topological space
equipped with a borel measure is a borel measure space.
By Outer-measures, it follows that the -algebra generated by the open sets likewise
consists of measurable sets. The term for the sets in this -algebra is borel set, hence
the term, borel measurea borel measure is a measure in which the borel sets are
measurable (which is of course equivalent to the open sets being measurable).

Definition 5.1.45 Regular measure Let X be a topological space and let m : 2X [0, ]

be an outer-measure. Then, m is regular iff


(i). m is finite on quasicompact subsets;
(ii). (Outer-regular) for S X,
m(S) = inf{m(U) : S U, U open.}; and

(5.1.46)

(iii). (Inner-regular on open sets) for U X open,


m(U) = sup{m(K) : K U, K quasicompact.}.

(5.1.47)

A topological space equipped with regular measure is a regular measure space.


Warning: We will see below (Proposition 5.1.101) that open sets must be measurable
in uniform measure spaces; however, this need not be the case in regular measure
spacessee the following example.

Example 5.1.48 A regular measure that is not borel Define X := {x1 , x2 }, and

equip it with the discrete topology and the unit measure (see Example 5.1.16). Of course
the measure of every quasicompact subset is finite. But furthermore, it is also inner-regular
on open sets (and in fact on every set), because this is must the statement that 1 = sup{1}
(the measure of a nonempty open set, 1, is equal to the suprema of the set of all measures of
quasicompact sets contained in it, namely, 1). Similarly, it is outer-regular, hence regular.
On the other hand,
m({x1 , x2 }) = 1 < 1 + 1 = m({x1 }) + m({x2 }).

(5.1.49)

Therefore, one of the open sets {x1 } or {x2 } must have been nonmeasurable.
Of course, there is a counter-example to the other potential implication as well.
Example 5.1.50 A borel measure that is not regular Consider the counting measure
on R. The set [0, 1] is quasicompact, but has infinite measure. Therefore, the counting
measure on R is not regular. On the other hand, the cardinality of the union of two
disjoint sets is the sum of the cardinalities of those two sets,a and so we always have
m(S) = m(S M) + m(S M c ), that is to say, every set is measurable, and so certainly the
open sets are measurable.


a This

is how we defined addition of cardinals!

Chapter 5. Integration

216

In general topology, I really dislike imposing unnecessary countability assumptions. On the


other hand, countability is something fundamental in measure theory, simply because of the
conditions of additivity and subadditivitysee Definition 5.1.5. Furthermore, as explained there,
we dont want any stronger additivity assumptions. Thus, in the context of measures on topological
spaces, it makes sense to impose countability conditions on our spaces. This leads us to the
following definition.
Definition 5.1.51 -quasicompact A topological space is -quasicompact iff it is

the countable union of quasicompact sets. A topological space is -compact iff it is the
countable union of compact sets.
R

Recall that compact is synonymous (by definition) with T2 and quasicompact. In


particular, compact sets are closed (and so measurable for borel measurs).

Exercise 5.1.52 Show that the product of -quasicompact spaces is -quasicompact. Show

that the product of -compact spaces is -compact.


Definition 5.1.53 Topological measure space A topological measure space is a -

compact topological space equipped with a regular borel measure.


Exercise 5.1.54 Show that a topological measure space can be written as the countable

disjoint union of measurable sets of finite measure.


R

By regularity and -compactness, you by definition have that topological measure


spaces can be written as the countable union of measurable sets of finite measure. It
is your job to turn that union into a disjoint one.

I told you once upon a time not to forget a trick. You havent forgotten it, have you?

The following is nice characterization of measurability in topological measure spaces. It is


arguably the reason why we give the conditions -compact, regular, borel a name in the first
place.
Proposition 5.1.55 Let hX, mi be a topological measure space and M X. Then, M is

measurable iff for every > 0, there is an open set U and a closed set C such that
C M U and m(U \C ) < .
Proof. Write X =

mN Km

(5.1.56)

for Km X compact.

() Suppose that M is measurable. Define Mm := M Km . As Mm Km , Mm has finite


measure because m is regular. Let > 0. By outer-regularity, there is some open Um
containing Mm such that

(5.1.57)
m(Mm ) m(Um ) < m(Mm ) + m .
2
Because M is measurable, it in turn follows thata

m(Um Mm ) < m .
(5.1.58)
2
S
Define U := mN Um . Then,
m(U \ M)

m(Um Sm ) < 2.
mN

(5.1.59)

5.1 Measure theory

217

Applying this same logic to M c , we can find an open set V containing M c such that
m(V \ M c ) < 2.

(5.1.60)

Then, V c is of course a closed subset of S and


m(U \V c ) = m(U V ) m(U V M) + m(U V M c )
m(V \ M c ) + m(U \ S) < 4.

(5.1.61)

() Suppose that for every > 0, there is an open set U and a closed set C such that
C M U and m(U \C ) < .
For :=

1
2m ,

C :=

let Cm S be closed and such that m(S \Cm ) <

Cm .

(5.1.62)

2m .

Define
(5.1.63)

mN

C is measurable because it is the countable union of closed sets, which are measurable
because, by hypothesis, the measure in borel. On the other hand, m(S \ C) < 2. As
is arbitrary, it follows that m(M \ C) = 0, and so S \ C is measurable. Therefore, M =
C (M \C) is measurable.

a See Exercise 5.1.37. We are also using here the fact that K is measurable, because it is compact, hence
m
closed, and the measure is borel (by hypothesis).

Before moving onto discussion of measures on uniform spaces, we have one more adjective
that is used to describe outer-measures on topological spaces.
Definition 5.1.64 Topological-additivity Let X be a topological space and let m be an

outer-measure on X. Then, m is topologically-additive iff whenever S, T X are separated


by neighborhoods, it follows that m(S T ) = m(S) + m(T ).
R

Note that the unit measure on a two point space with the discrete topology (see
Example 5.1.48) is an example of a regular measure that is not topologically-additive.
The point is, that we do need to make this an additional assumptionwe do not get
it for free. On the other hand, note that regular borel measures are automatically
topologically-additivesee Exercise 5.1.65

In uniform spaces, there are substantially less levels of separation.a Thus, in uniform
spaces, there are not many additivity conditions one can place on the measure
(see the definition of uniform-additivity, Definition 5.1.95). On the other hand, the
separation axioms in topological spaces do not collapse at all (see the diagram in
(3.6.119)), and so there are many additivity conditions one might consider putting
on measures on topological spaces. We choose the one we do because it winds up
being equivalent to uniform-additivitysee Proposition 5.1.97.

a Well, a posteriori anywaysmany of them wind up being equivalent, though this is nontrivialsee
Corollary 4.4.31.

Exercise 5.1.65 Let m be a regular borel measure. Show that m is topologically-additive.

Chapter 5. Integration

218

Proposition 5.1.66 Let m a regular topologically-additive measure on a T2 space. Then, m

is borel.
R

Thus, by the previous exercise, for T2 spaces, borel is equivalent to topological


additivity.

Warning: This will fail if the space is not T2 see Example 5.1.102.

By Outer-measures, it thus follows that the -algebra generated by the topology,


the borel sets, are measurable. In particular, F and G sets are measurable.
a

Proof.

Let U X be open and let A X be arbitrary. We wish to show that

m(A) m(A U) + m(A U c ).

(5.1.67)

If m(A) = , this is automatically satisfied, so we may as well assume that m(A) < .
Let > 0. Then, by outer-regularity, there is some open set U that contains U and
m(A) m(U ) < m(A) + .

(5.1.68)

Then, by inner-regularity on opens, there is some quasicompact K U U such that


m(U U ) < m(K ) m(U U ).

(5.1.69)

As X is T2 , so that K is closed, so that U Kc is open, and so there is some compact


L U Kc such that
m(U Kc ) < m(L ) m(U Kc ).

(5.1.70)

Hence,
m(A) > m(U ) b m(K L ) = c m(K ) + m(L )
> m(U U ) + m(U Kc ) 3 d m(U U ) + m(U U c ) 3 (5.1.71)
e m(A U) + m(A U c ) 3.
As is arbitrary, we have that m(A) m(A U) + m(A U c ), and so U is measurable. 
a Proof

adapted from [2, pg. 194].


K L U .
c You can separate by neighborhoods disjoint compact subsets of T spaces. Then we apply the fact that m is
2
topologically-additive.
d Because U U c U K c .

e Because A U .

b Because

By definition, regularity requires inner-regularity in open sets. In certain nice cases, however,
we also get inner-regularity on measurable sets of finite measure.
Proposition 5.1.72 Let hX, mi be a topological measure space. Then, m is inner-regular on

measurable sets of finite measure.


Proof. Write X = mN Km for Km quasicompact. Let S X be measurable and of finite
measure. Define Sm := S Km . As X is T2 , K is closed. Because X is T2 , m is borel, and so
closed sets are measurable. Thus, K is measurable, and so Sm is measurable.
Let > 0. Then, there are open sets Um, and closed sets Cm, such that (i) Cm,
S

5.1 Measure theory

219

Sm Um, and (ii) m(Um, \ Cm, ) < 2m . As Cm, Km , Cm, is quasicompact, and so
S
Lm, := m
k=0 Ck, is likewise quasicompact. Furthermore,
m

m(S) m(Lm, ) =

k=0

k=0

m(Sk ) + (m(Sk ) m(Ck, ))

k=m+1
m

k=0

m(Sk ) + (m(Uk, ) m(Ck, )) <

k=m+1

m(Sk ) m(Ck, ) =
k=0

k=m+1

k
k=0 2

m(Sk ) +

m(Sk ) + 2

k=m+1

(5.1.73)
As m(S) =
k=0 m(Sk ) is finite, we can make this arbitrarily small by taking m sufficiently
large. As Lm, S is quasicompact, we have that
m(S) = sup{m(K) : K S, K quasicompact},

(5.1.74)


and so m is inner-regular on S.
A related result is the following.

Proposition 5.1.75 Let X be a topological space that can be written as the countable union

of quasicompact sets, let m be a regular measure, and let S X. Then, if m(S) = , then
there is a subset T S with 0 < m(T ) < .
R

This condition, that every set of infinite measure has a subset of finite positive measure
is sometimes called semifinite.

In particular, topological measure spaces are semi-finite.


S

Proof. Suppose that m(S) = . Write X = mN Km for Km quasicompact. Define Sm :=


S Km . As Km is quasicompact, it has finite measure, and so Sm has finite measure. We also
have that
!
= m(S) = m

[
mN

Sm

m(Sm ),

(5.1.76)

mN

which means that the measure of at least some Sm is strictly positive.

Another important fact about topological measure spaces is that every set is measurable up to
sets of measure 0.
Proposition 5.1.77 Let hX, mi be a topological measure space and let S X. Then, there

is some M X measurable with m(S M c ) + m(Sc M) = 0.


R

In the venn diagram of S and M, S M c Sc M is everything outside of S M.


We are saying that, in particular, this set has measure 0. This is what we mean when
we say that every set is measurable modulo a set of measure 0.

Proof. Write X = mN Km for Km X quasicompact. Define Sm := S Km . Let > 0.


Let Um X be open with Sm Um and m(Um ) m(Sm ) < 2m . Then, let Lm Um be
S
quasicompact and such that m(Um \ Lm ) < 2m .a Define M := mN Lm . As a countable
S

Chapter 5. Integration

220

union of measurable sets, M is measurable. For convenience, let us also write U :=


so that we have
m(U \ L)

m(Um \ Lm ) = 2.

mN Um ,

(5.1.78)

mN

We similarly have that


m(U \ S) < 2.

(5.1.79)

From these, we have


m(S M c ) m(U \ L) < 2,

(5.1.80)

which gives m(S M c ) = 0. On the other hand, we also have


m(Sc M) m(U \ S) < 2,

(5.1.81)

which likewise gives us that m(Sc M) = 0, and we are done.


a Here,

we implicitly using the fact that Lm ,Um are measurable.

We can use this to give us yet another characterization of measurability in topological measure
spaces. For convenience, we collect it together with our previous characterization (Proposition 5.1.55).
Theorem 5.1.82. Let hX, mi be a topological measure space and M X. Then, the following

are equivalent.
(i). M is measurable.
(ii). For every > 0, there is an open set U and a closed set C such that
C M U and m(U \C ) < .

(5.1.83)

(iii). For every measurable set S X of finite measure,


m(S) = m(S M) + m(S M c ).
R

(5.1.84)

The last condition of course just says that we have to verify the usual condition of
measurability (see Definition 5.1.19) for S measurable and of finite measure.

Proof. We have already shown that (i) and (ii) are equivalent in Proposition 5.1.55. Furthermore, (i) implies (iii) immediately from the definition. Thus, it suffices to show (iii) implies
(i).
Let S X be arbitrary and let B X be measurable and such that m(S Bc ) + m(Sc
S
B) = 0 (by the previous result). By Exercise 5.1.54, write B = mN Bm as a countable
disjoint union of measurable sets of measure 0. Then, it follows from the hypothesis that
m(B) = m(B M) + m(B M c ).

(5.1.85)

5.1 Measure theory

221

Then,
m(S) m(S B) = m(S B) + m(Sc B) m(B) = m(B M) + m(B M c )
m(B S M) + m(B S M c )
= m(B S M) + m(Bc S M) + m(B S M c ) + m(Bc S M c )

(5.1.86)

m(S M) + m(S M c ).

One might expect that nonempty open sets have positive measure in topological measure spaces.
This is not true.
 Example 5.1.87 An outer-measure on R for which hR, mi is a topological measure space, but is not strictly-positivethe dirac measure a Define m : 2R [0, ]

by
(
0 if 0
/S
m(S) :=
.
1 if 0 S

(5.1.88)

m((0, 1)) = 0 for example, so this is definitely not strictly-positive. What we need to check
is that it is a topological measure.
We first check that it is regular. m itself is finite, and so certainly finite on quasicompact
sets. Let U R be open. If 0 U, then as {0} is quasicompact, we have that
m(U) = 1 = sup{m(K) : K U, K quasicompact}.

(5.1.89)

On the other hand, if 0


/ U, then no subset of U will contain 0, and so once again we have
m(U) = 0 = sup{m(K) : K U, K quasicompact}.

(5.1.90)

Thus, m is inner-regular on opens. Now let S R be arbitrary. If 0 S, then every open set
which contains S will also contain 0, and so we definitely have
m(S) = 1 = inf{m(U) : S U U open}.

(5.1.91)

On the other hand, if S does not contain 0, then {0}c is an open set containing S, and so
m(S) = 0 = inf{m(U) : S U U open}.

(5.1.92)

Thus, m is outer-regular, and hence regular.


We now check that m is borel. To show this, obviously it suffices to show that every subset
of R is measurable with respect to m.
So, let M, S R. We would like to show that M is measurable, and so we need to show that
m(S) = m(S M) + m(S M c ).

(5.1.93)

There are two cases: either 0 M or 0


/ M. By M M c , we may as well assume that
0 M. Thus, we need to show that
m(S) = m(S M).

(5.1.94)

There are two cases: 0 S or 0


/ S. In the former case, this equation reads 1 = 1, an in the
later case it reads 0 = 0. Either way, it is true, and so M is measurable.
a Stricty-positive means just thatthat every nonempty open set has positive measure. The term here
presumably comes from the dirac delta function.

Chapter 5. Integration

222
Measures on uniform spaces

Definition 5.1.95 Uniform-additivity Let X be a uniform spac and let m be an outer-

measure on X. Then, m is uniformly-additive iff whenever S, T X are uniformly-separated,


it follows that m(S T ) = m(S) + m(T ).
R

Warning: The term uniform most of the time is something strictly-stronger than
something only topological. This is not the case here: topological-additivity is
superficially stronger than uniform-additivity because it is easier to be separated
by neighborhoods than it is to be uniformly-separated. In particular, topologicallyadditive measures on uniform spaces are automatically uniformly-additive.

The second condition is a generalization of the defining condition of what is called a


metric outer-measure. In particular, if X is a metric space, then any uniform meausre
is (by definition) a metric outer-measure.

Note that the unit measure on a two point space with the discrete topology (see
Example 5.1.48) is an example of a regular measure that is not topologically-additive.
The point is, that we do need to make this an additional assumptionwe do not get it
for free.

Exercise 5.1.96 Can you find an example of a regular measure m on a uniform space X

with S, T X uniformly-separated, but m(S T ) 6= m(S) + m(T ).


R

The point is: do we really need to assume uniform-additivity, or can we get it for
free?

Hint: We have already encountered a counter-example that will do the trick.

Proposition 5.1.97 Let m be a regular measure on a T0 uniform space. Then, the following

are equivalent.
(i). m is topologically-additive.
(ii). m is uniformly-additive.
(iii). m is borel.
Proof. Let X be the T0 uniform space that m is a regular measure on.
((i) (iii)) T0 uniform spaces are T2 . Therefore, by Exercise 5.1.65 and Proposition 5.1.66,
topological-additivity is equivalent to being borel.
((i) (ii)) Suppose that m is topologically additive. Let S, T X be uniformly-separated.
Then, S and T are separated by neighborhoods, and so m(S T ) = m(S) + m(T ). Thus, m
is uniformly-additive.
((ii) (i)) Suppose that m is uniformly-additive. Let S, T X be separated by neighborhoods. By definition, we want to show that m(S T ) = m(S) + m(T ). By subadditivity, it
suffices to show that m(S T ) m(S) + m(T ). If either one of the sets has infinite measure,
then this inequality reads , and so is automatically satisfied. Thus, we may as well
without loss of generality assume that m(S), m(T ) is finite. Then, by outer-regularity, for
every > 0, there is an open set W with S T W and
m(S T ) m(W ) < m(S T ) + .

(5.1.98)

5.1 Measure theory

223

On the other hand, because S and T are separated by neighborhoods, we know that
there are disjoint open sets U and V with S U and T V . Define U := U W and
V := V W , so that U and V are both disjoint and open. By the previous result, U and
V are measurable, and so we have
m(U V ) = m(U ) + m(V ).

(5.1.99)

Thus,
m(S T ) > m(W ) a m(U V ) = m(U ) + m(V ) b m(S) + m(T ) .
(5.1.100)
It follows that m(S T ) m(S) + m(T ), and we are done.

a Because U V W .

b Because S U and T V .

One important fact about uniformly-additive measures is that the open sets are automatically
measurable.
Proposition 5.1.101 Let m be a uniformly-additive regular measure on a T0 uniform space.
Then, m is a borel measure.

Proof. By the previous result, m is a topologically-additive. A T0 uniform space is T2 .


Therefore, by Proposition 5.1.66 (topologically-additive measures on T2 spaces are borel),
m is borel.

 Example 5.1.102 A topologically-additive regular on a uniform space that is
not borel a Define X := {x1 , x2 , x3 }, and equip it uniform base with just one cover, B :=

{{x1 , x2 }, {x3 }}. (This is a uniform base because this single cover star-refines itself.) The
neighborhood bases defined by this uniform base are
Bx1 = {StarB (x1 )} = {{x1 , x2 }}
Bx2 = {StarB (x2 )} = {{x1 , x2 }}

(5.1.103)

Bx3 = {StarB (x3 )} = {{x3 }}.


In particular, the open sets areb
0,
/ X, {x1 , x2 }, {x3 }.
We define an outer-measure m : 2X [0, ] by
(
0 if S = 0,
/ {x3 }
m(S) :=
.
1 otherwise

(5.1.104)

(5.1.105)

Every set has finite measure, and so certainly every quasicompact set does. There are
only four open sets to check, and you can simply check by hand that m is inner-regular on
each one of them. Measures are outer-regular on open sets, and so we only need to check
outer-regularity on the remaining 8 4 = sets that are not open. Once again, there are only
four things to check, and so you can just do so by hand.
We now check that this measure is uniformly-additive. If S and T are uniformly-separated,
then, without loss of generality, we have that S {x1 , x2 } and T {x3 }. We automatically

Chapter 5. Integration

224

have that m(S T ) = m(S) + m(T ) if either S or T is empty, so we may without loss of
generality assume that T = {x3 } and S {x1 , x2 } is nonempty. Then,
m(S T ) = 1 = 1 + 0 = m(S) + m(T ),

(5.1.106)

and so the measure is uniformly-additive.


On the other hand, it is not borel, because U := {x1 , x2 } is not measurable, as we now check.
Take S := {x1 }. Then,
m(S) = 1 6= 1 + 1 = m(S U) + m(S U c ).

(5.1.107)

a Topologically-additive

measures on uniform spaces are automatically uniformly-additivesee the remark


in Definition 5.1.95.
b Recall that, for a topology defined by a neighborhood base (see Definition 3.1.9, a set U is open iff every
point x U has an element B Bx such that x B U. As our set only contains 3 points, you can simply
verify by hand that these are the only open sets.

A summary

Before finally moving on to the The Haar-Howes Theorem, let us briefly summarize.
(i). Borel means that open sets are measurablesee Definition 5.1.44.
(ii). Regular means quasicompact sets have finite measure, inner-regularity on opens, and outerregularity on everythingsee Definition 5.1.45.
(iii). A topological space is -(quasi)compact iff it is the countable union of (quasi)compact
setssee Definition 5.1.51.
(iv). A topological measure space is a -compact topological space equipped with a regular borel
measuresee Definition 5.1.53.
(v). Topological-additivity means that the measure is additive on two sets which are separated by
neighborhoodssee Definition 5.1.64.
(vi). Uniform-additivity means that the measure is additive on two sets which are uniformlyseparatedsee Definition 5.1.95.
(vii). Neither regular topologically-additive nor regular uniformly-additive measures need be
borel (Example 5.1.102), but in fact for T2 spaces this is equivalent to being borelsee
Propositions 5.1.66 and 5.1.101.
5.1.3

The Haar-Howes Theorem


It turns out that there is a theorem, the The Haar-Howes Theorem,2 be that is quite general and will
just spit out regular uniformly-additive for us. This is how we will construct lebesgue measure.
That being said, it doesnt just work for any old uniform space. Were going to need extra structure:
2 Warning: This is the name I have chosen to call it, because I am unaware of another name for this result. In particular,
dont expect others to know what youre talking about if you reference this theorem by name. (It is a generalization
of the existence and essential uniqueness of haar measure, in case you are more familiar with that.) While the proof
of the result I cobbled together from other sources, the formulation of the result I found essentially in [10], in which
he claims the author established essential uniqueness, as well as existence, which he and another mathematician
(Izkowitz) established independently.

5.1 Measure theory

225

Definition 5.1.108 Isogeneous space An isogeneous space is a uniform space X

equipped with a group of uniform-homeomorphisms such that


e := {BU } where BU := { (U) : } and U X open.
B

(5.1.109)

is a uniform base for X.


R

e is the
is called the group of symmetries of X. It is a subgroup of AutUni (X). B
isogeneous base and each BU is an isogeneous cover.

The example you should have in mind here is that of a topological group G. In this
e := {BU } with BU := {gU : g U} for
case, the uniform base is the canonical one, B
U an open neighborhood of the identity, and the set of all uniform homeomorphisms
is the set of all left translations, := {g : g G} where g (x) := gx. Of course, you
e and !) if you so
can also choose right-translations over left-translations (in both B
desire.

What do you think the morphisms of isogeneous spaces should be?

Exercise 5.1.110 Let X be a uniform space with uniform topology U , let B be a base for

the topology, and let be a subgroup of AutUni (X). Show that


{BB : B B}

(5.1.111)

e is.
is a uniform base iff B
R

The point is that, in order to show that makes X into an isogeneous space, we only
need to check that the smaller collection of covers in (5.1.111), the covers coming
from only elements of the base instead of all open sets, form a uniform base.

Exercise 5.1.112 Let G be a topological group and let := {g : G G}, where g : G G

is defined by g (x) := gx. Show that hG, i is an isogeneous space.


Definition 5.1.113 Uniformly-measurable Let m : 2X [0, ] be an outer-measure on

a set X and let U is a cover of X. Then, U is uniformly-measurable iff m is constant on U .


A uniform base consisting of uniformly-measurable covers is a uniformly-measurable base.
R

Think about what having a uniform base of uniformly-measurable covers means for
a metric spaceif we take as a uniform base the collection of all covers by -balls,
then this is just the statement that every -ball has to have the same measure.

Note that you definitely do not want to require every uniform cover be uniformlymeasurable. For example, in a metric space, by upward-closedness the collection of
all -balls together with a single 2-ball will also be a uniform-coverwe definitely
do not want to require that a 2 ball has the same measure as an -ball.

Definition 5.1.114 Isogeneous measure Let hX, i be an isogeneous space. A iso-

e is a uniformlygeneous measure is a uniformly-additive regular measure for which B


measurable base.
R

Explicitly, this means that


m is regular;

Chapter 5. Integration

226
(ii).
(i). m is uniformly-additive; and
(iii). m(h(U)) = m(U) for h U and U X open.

Exercise 5.1.115 Let m be an isogeneous measure on an isogeneous space hX, i. Show

that m( (S)) = m(S) for all and S X.


R

In other words, this holds for all S, not just open S.

Exercise 5.1.116 Let m be an isogeneous measure on an isogeneous space hX, i, let

M X be measurable, and let . Show that (M) is measurable.


R

In other words, the symmetries of the isogeneous space preserve measurability


(in particular, for lebesgue measure, we will have that isometries of Rd preserve
measurabilitysee Definition 5.1.153 (the definition of lebesgue measure)).

And finally now the key result that will allow us to define basically every measure we work
with in these notes (and much more).
Theorem 5.1.117 Haar-Howes Theorem. Let hX, i be a T0 locally quasicompacta

isogeneous space and let K X be quasicompact with nonempty interior.. Then, there exists
a unique isogeneous m on X such that m(K) = 1.
R

You should think of K has a set with which we can compare all other sets to get
a measure of size. The condition that it be quasicompact you can think of the
condition that the measure of K be finite, and the condition that it have nonempty
interior you can think of the condition that the measure of K be positive. If m(K) is
neither infinite nor zero, then we can normalize to get m(K) = 1.

Note that in fact, by Proposition 5.1.97, the resulting measure will actually turn-out to
be topologically-additive.

Classically, the term haar measure is reserved for G a T0 locally quasicompact group
with the set of all left-translations. In particular, lebesgue measure will be the haar
measure for the topological group hR,d +, 0, i.b

Proof. S T E P 1 : M A K E H Y P O T H E S E S A N D I N T R O D U C E N O TAT I O N
Suppose that X has a quasicompact set K0 with nonempty interior. Denote the uniform
topology on X by U and denote the collection of all quasicompact subsets of X by K
S T E P 2 : D E F I N E (K : U) F O R K K A N D U U
The cover BU := {h(U) : h H} is an open cover of K, and so there are a finite subcover.
Let (K : U) denote the cardinality of the smallest such subcover.
S T E P 3 : D E F I N E HU : K R+
0
For U U , define HU : K R+
0 by
HU (K) :=

(K : U) c
.
(K0 : U)

S T E P 4 : S H O W T H AT HU (K) (K : Int(K0 ))

(5.1.118)

5.1 Measure theory

227

We now check that HU (K) (K : Int(K0 )), that is, (K : U) (K : Int(K0 ))(K0 : U). Let
us temporarily write m := (K : Int(K0 )) and n := (K0 : U). There are thus 1 , . . . , m
such that {1 (Int(K0 )), . . . , m (Int(K0 ))} covers K. There are also 10 , . . . , n0 such that
{10 (U), . . . , n0 (U)} covers K. Therefore,
!
K

m
[

hk (Int(K0 ))

k=1

m
[

hk (K0 )

k=1

m
[

hk

k=1

n
[

l=1

h0l (U) =

m [
n
[

[hk h0l ](U) (5.1.119)

k=1 l=1

Hence, K is covered by mn elements of BU ,d and hence (K : U) mn := (K : Int(K0 ))(K0 :


U).
+
S T E P 5 : D E F
I N E H : K R0
Define H := KK [0, (K : K0 )]. Each HU may be thought of as a point in H , whose
component at K K is HU (K) [0, (K : K0 )].e Thus, for U U , let us define

CU := Cls ({HV : U 3 V U})

(5.1.120)

C := {CU : U U }.

(5.1.121)

and

We wish to show that the intersection of any finitely many elements of C is nonempty. Then,
because H is quasicompact by The product topology (Theorem 3.5.21), it will follow that
the intersection over all elements in C will be nonempty.
This is actually really easy, however, because for U1 , . . . ,Um U , we have that
m
\

HU1 Um

CUk .

(5.1.122)

k=1

Therefore, by quasicompactness, there is some


H

CU .

(5.1.123)

UU

S T E P 6 : S H O W T H AT H(K1 ) H(K2 ) I F K1 K2
Let K1 , K2 K be such that K1 K2 . We first show that, for each U U , HU (K1 )
HU (K2 ). But this is trivial, because the covering of K2 with (K2 : U) elements of BU is
also a covering of K1 with (K2 : U) elements of BU , so that (K1 : U) (K2 : U), and hence
HU (K1 ) HU (K2 ).
Thinking of elements f of H as functions from K to R, consider the mapf that
sends f T to f (K2 ) f (K1 ). This is a composition of continuous functions, and hence
continuous.g This map is also nonnegative on each CU because HU (K1 ) HU (K2 ) for
each U U (we need continuity so that we know it is nonnegative on the closure of
{HV : V U}). As m is an element of each CU , it follows that this map is also nonnegative
at m, so that H(K1 ) H(K2 ).
S T E P 7 : S H O W T H AT H(K1 K2 ) H(K1 ) + H(K2 )
Let K1 , K2 K . We first show that HU (K1 K2 ) HU (K1 ) + HU (K2 ) for each U U .
This is trivial, because a covering of K1 with (K1 : U) elements of BU together with a

Chapter 5. Integration

228

covering of K2 with (K2 : U) elements of BU is a cover of K1 K2 with (K1 : U) + (K2 : U)


elements of BU , so that (K1 K2 : U) (K1 : U) + (K2 : U). It follows that HU (K1 K2 )
HU (K1 ) + HU (K2 ).
Proceeding similarly as in Step 6, the map that sends f T to f (K1 )+ f (K2 ) f (K1 K2 )
is continuous and nonnegative on each CU , and hence is nonnegative for H T . Thus,
H(K1 K2 ) H(K1 ) + H(K2 ).
S T E P 8 : S H O W T H AT HU (K1 K2 ) = HU (K1 ) + HU (K2 ) I F K1 A N D K2 A R E
U N I F O R M LY - S E PA R AT E D
Let K1 , K2 K be uniformly-separated with respect to BU . We have already shown
that HU (K1 K2 ) HU (K1 ) + HU (K2 ), so it suffices to show that HU (K1 ) + HU (K2 )
HU (K1 K2 ). In other words, it suffices to show that (K1 : U)+(K2 : U) (K1 K2 : U) =: m.
Let 1 (U), . . . , m (U) BU be a cover of K1 K2 . By hypothesis,h every single one of these
can only intersect U1 or U2 , but not both. Thus, after relabeling if necessary, the first k of these
guys will form a cover of K1 and the latter m k will form a cover of K2 . Thus, (K1 : U) k
and (K2 : U) m k, and so (K1 : U) + (K2 : U) k + (m k) = m := (K1 K2 : U), which
completes this step.
STEP 9: DEFINE THE MEASURE m ON ALL OPEN SUBSETS OF X
For U X open, define
m(U) := sup{H(K) : K U, K K }.

(5.1.124)

STEP 10: EXTEND m TO ALL SUBSETS OF X


Now, for an arbitrary subsets S of X, define
m(S) := inf{m(U) : S U, U U }.

(5.1.125)

Exercise 5.1.126 Show that this agrees with (5.1.124) when S is open, so that this

is indeed an extension.

S T E P 1 1 : S H O W T H AT m I S A N O U T E R - M E A S U R E
Exercise 5.1.127 Check that m(0)
/ = 0 and that m is nondecreasing.

We now check that it is subadditive. To prove this, we will first need a lemma.
Lemma 5.1.128 Let X be T2 , let K X be quasicompact, and let U1 ,U2 X be

open and such that K U1 U2 . Then, there are quasicompact subsets K1 , K2 X


such that (i) K1 U1 , (ii) K2 U2 , and (iii) K = K1 K2 .
Proof. We leave this as an exercise.
Exercise 5.1.129 Complete the proof yourself.

5.1 Measure theory

229

Having (hopefully) proved the lemma, we now show subadditivity for open sets. (We
will then prove subadditivity in general.) So, let {Um : m N} be a countable collection
S
S K
of open sets of X. Let K mN Um . Then, there is some mK N such that K m
k=1 Uk .
By applying this lemma inductively then, we may find quasicompact sets K1 , . . . , Km such
S
that (i) Kk Uk for 0 k m and K = m
k=1 Kk . Using the fact that we have already proved
finite subadditivity (of H) for quasicompact sets (Step 7), we find that
m

H(K)

H(Kk )

m(Uk ) m(Um ).

k=1

k=1

(5.1.130)

mN

Taking the sup of K, we find that


!
(
m

Um

:= sup H(K) : K

mN

Um , K K

m(Um ).

(5.1.131)

mN

mN

Having proved subadditivity for open sets, we now prove it for arbitrary sets. So, let
{Sm : m N} be an arbitrary countable collection of subsets of X. If mN m(Sm ) = , then
there is nothing to show, and so we may as well suppose without loss of generality that
mN m(Sm ) < . Let > 0 and for each m N pick an open set Um such that (i) Sm Um
and (ii) m(Sm ) m(Um ) < m(Sm ) + 2m . Then, using subadditivity for open sets, we find
!
m

Sm

!
m

mN

Um

mN

m(Um ) <
mN

mN



m(Sm ) + 2m =

m(Sm ) + 2.
mN

(5.1.132)
Hence, as > 0 was arbitrary, we have that
!
m

[
mN

Sm

m(Sm ).

(5.1.133)

MN

Thus, m is an outer-measure on X.
S T E P 1 : S H O W T H AT E A C H BU I S U N I F O R M LY - M E A S U R A B L E W I T H R E SPECT TO m
Let . We want to show that m( (U)) = m(U). Then, for any other 0 , we will
have that m( (U)) = m(U) = m( 0 (U)), so that indeed every element of BU has the same
measure. However, K is a quasicompact set contained in U iff (K) is a quasicompact set
contained in (U).i Therefore, by the definition of m(U) (5.1.124) it suffices to show that
H(K) = H(h(K)). To show this, we first show that HU (K) = HU (h(K)) for all U U . That
is, we would like to show that (K : U) = ( (K) : U). However, every cover of K by elements
of BU , 1 (U), . . . , m (U), gives a cover of (K) by elements of BU of the same cardinality,
(1 (U)), . . . , (m (U)). It thus follows that HU (K) = HU (h(K)).
For fixed, consider the map from H to R that sends f to f ( (K)) f (K). We
just showed that this is 0 on each HU T , and so it is 0 on CU , and so it is 0 on H, that is,
H(K) = H(h(K)).
S T E P 2 : S H O W T H AT I F S A N D T A R E U N I F O R M LY - S E PA R AT E D , T H E N m(S
T ) = m(S) + m(T )
Note that we always have that m(S T ) m(S) + m(T ), and so it suffices to show that
m(S T ) m(S) + m(T ).

Chapter 5. Integration

230

We first prove this for open sets. So, let U,V U . If either U or V s has infinite measure,
then this inequality just reads , and is so automatically satisfied. Thus, without loss
of generality, assume that m(U), m(V ) < . Let > 0. Then, there is are some K, L K
such that K U, L V , and
m(U) < H(K) m(U) and m(V ) < H(L) m(V ).

(5.1.134)

If U and V are uniformly-separated, then certainly K and L are uniformly-separated, and so


by Step 8, we have that
H(K L) = H(K) + H(L),

(5.1.135)

and so
m(U V ) H(K L) = H(K) + H(L) > m(U) + m(V ) 2.

(5.1.136)

Hence, m(U V ) m(U) + m(V ).


We now do the general case. Once again, if either S or T has infinite measure, we are
done, so we may as well suppose that m(S), m(T ) < .
Our first order of business it to show that there are some open sets containing S and T
respectively which are uniformly-separated.
Look at any open cover B which positively-separates S and T , and take an open starrefinement C of thisj . Define U := StarC (S) and V := StarC (T ). We wish to show that U
and V are uniformly-separated with respect to C . Because B positively-separates S and
T , by definition (see Definition 4.4.2), we have that StarB (S) and StarB (T ) are disjoint.
Therefore, it suffices to show that StarC (U) StarB (S) (and similarly for V ). So, suppose
that C C intersects U. Then, by definition of U, it must intersect some element C0 C
which intersects S. Let B B be such that StarC (C0 ) B. We then have that
C k StarC (C0 ) B l StarB (S).

(5.1.137)

Thus, indeed, StarC (U) StarB (S).


So, let U,V U be open sets containing S and T respectively which are uniformlyseparated. Let > 0, and choose W U that contains S T and satisfies
m(S T ) m(W ) < m(S T ) + .

(5.1.138)

Let us replace U and V by U W and V W upon doing so, it will still be the case that
U,V U , it will still be the case that S U and T V , and it will still be the case that U
and V are uniformly-separated, but now we will also have that m(U V ) m(W ). Now we
have
m(W ) < m(S T ) + m(S) + m(T ) + m(U) + m(V ) + = m(U V ) +
m(W ) + .
(5.1.139)
Thus, we do indeed have that
m(S V ) = m(S) + m(T ).

(5.1.140)

e is a uniformly-measurable base for m and that


In particular, we have now shown that B
m is additive for uniformly-separated sets, so that indeed m is a uniformly-additive on X.

5.1 Measure theory

231

It remains to show that m is regular.


S T E P 3 : S H O W T H AT H(K) m(K)
Let U U contain K. Then, by the definition of m(U), (5.1.124), we have that H(K)
m(U). Taking the infimum over all such U, we obtain H(K) m(K).
S T E P 4 : S H O W T H AT m I S R E G U L A R
The first thing we check is that m(K) < for K quasicompact. By Proposition 3.8.4, there
is some open U X containing K with compact closure. Hence, for K K 0 U, with K 0
quasicompact, we have
H(K 0 ) H(Cls(U)) < .m

(5.1.141)

Taking the supremum over such K 0 , we have that m(U) H(Cls(U)), and so, as m(K)
m(U) (because m is an outer-measure), m(K) is finite.
m is outer-regular by definition (5.1.125).
We now turn to inner-regular on open subsets. This is almost true by the definition
(5.1.124), but we dont have H(K) = m(K). However, we actually dont need thiswe only
ned H(K) m(K). To show this, let U U contain K. Then, by the definition of m(U),
(5.1.124), we have that H(K) m(U). Taking the infimum over all such U, we obtain
H(K) m(K).
Thus, m is regular.
S T E P 5 : S H O W T H AT m(K0 ) = 1
First of all, from the definition, we have that HU (K0 ) = 1 for all U U . Perhaps we could
try harder and show that we already do have that m(K0 ) = 1; however, this enough is to
show simply that m(K0 ) > 0, and so by simply dividing by m(K0 ) if necessary, we obtain a
e and m(K0 ) = 1.
regular uniformly-additive measure with uniformly-measurable base B
S T E P 6 : S H O W T H AT m I S U N I Q U E
e
Let m0 be another regular uniformly-additive on X with uniformly-measurable base B
0
and m(K0 ) = 1. As both m and m are outer-regular, it suffices to show that they agree on
open sets. Then, because they are both inner-regular on open sets, it suffices to show that
they both agree on quasicompact subsets. However, on account of Proposition 3.8.4 and
outer-regularity, it in turn suffices to show that they agree on open sets with compact closure.
So, let G denote the collection of all open sets with compact closures. Furthermore, let us
define
Z(G
 ) :=
S 2X : for every > 0 there are C closed and U G such that

C S U and m(U \C ), m0 (U \C ) < .

(5.1.142)

e for
For the rest of this proof, let us write S b T iff there is some uniform cover B B
which StarB (S) T . The first thing to note is that, if S b T , then Cls(S) Int(T ). Suppose
that S b T and let x X be an accumulation point of S. By definition, there is some uniform
e such that StarB (S) T . Because x is an accumulation point of S and every
cover B B
element of B is open, every element of B that contains x (of which there must be at least
one, say B B), must intersect S, and so we have that x B StarB (S) T , which implies
that x Int(T ).

Chapter 5. Integration

232

To prove that they agree on G , we will first show that they agree on Z(G ) G . To show
that this is in fact sufficient, we prove that
m(U) = sup{m(V ) : V G Z(G ), V b U}

(5.1.143)

for U G open (the exact same proof will work for m0 ). As m is inner-regular on open
sets (and so in particular on elements of G ), it suffices to show that, for every K U
quasicompact, there is some VK G Z(G ) with K VK b U. We first show that we can
find such a VK G .
As X is T0 , it is uniformly-completely-T3 , and so we may uniformly-separated quasicome such that StarB (K)
pact sets from closed sets, and so there is some uniform cover B B
c
is disjoint from StarB (U ). Because X is a locally compact T2 isogeneous space, by taking
a star-refinement if necessary, we can without loss of generality assume that each element of
e . Once
B has compact closure (i.e. is an element of G ).n Take a star-refinement C B
again, every element of C has compact closure. By quasicompactness of K, there are finitely
many C1 , . . . ,Cm C that over K. Define C := C1 Cm G . Furthermore,
StarC (C) StarB (K) StarB (U c )c U.

(5.1.144)

Thus, we have shown that for U G and K U quasicompact, there is some VK G with
K VK b U. Thus, to finish the proof that it suffices to show that they agree on Z(G ) G ,
it suffices to show that, for every U,V G with U b V , there is some W Z(G ) G with
U b W b V.
So, let U,V G with U b V . We showed before that this implies that Cls(U) V .
Then, because we may uniformly separate quasicompact sets from closed sets in uniformlye such that StarB (Cls(U)) is
completely-T3 spaces, there is some uniform cover B B
c
disjoint from StarB (V ). Define
W := StarB (Cls(U)).

(5.1.145)

W is open as every element of B is open. It also has compact closure as its closure is
contained in Cls(U), which is compact (closed subsets of compact sets are compact in T2
spaces). By definition, we have that U b W . We check that also W b V . To show this,
we show that StarB (W ) V . So, let B B intersect W . We proceed by contradiction:
suppose that B intersects V c . Then, it is contained in StarB (V c ), which is disjoint from W :
a contradiction. Thus, U b W b V and W G . It remains to show that W Z(G ). This,
however, follows from the fact that that W is measurable with respect to both m and m0
(because it is opensee Proposition 5.1.101), the fact that it has finite measure for both m
and m0 (because its closure is quasicompact and the measure is regular), and Theorem 5.1.82
(we can force U there to have compact closure by intersecting it with V ). This finally
establishes (5.1.143), and so finishes our proof that it suffices to show that m and m0 agree
on G Z(G ).
For the rest of the proof, take note that everything we know about m0 is likewise true
about m. Therefore, everything we prove to be true about m0 will also be true about m. Thus,
hereafter, if we prove facts about either m or m0 , we shall prove them about m0 they are
then automatically true about m.
For U G Z(G ), every cover BV has a finite subcover of U, and so just as we did for
K compact, we may define (U : V ) to be the the cardinality of the smallest such subcover.
We shall use this notation in a moment.

5.1 Measure theory

233

Exercise 5.1.146 Let U G Z(G ). Show that, for every > 0, there is some open

U , such that

m0 StarBU (U) U <

(5.1.147)

Now, fix U0 Z(G ) G and let U open be arbitrary.


Exercise 5.1.148 Show that there is some U 0 b U such that, for all V sufficiently

small (with respect to b),


m0 (U0 ) (U00 : V )

m0 (U)
(U 0 : V )

(5.1.149)

whenever StarBU 0 (U0 ) U00 .


R

Hint: See (10.2) in [10].

Exercise 5.1.150 Show that there is some U 00 b U 0 such that, for all V sufficiently

small (with respect to b),


(U00 : V )
m0 (U0 )

m0 (U 00 ) (U 0 : V )

(5.1.151)

whenever StarBU 0 (U00 ) U0 .


R

Hint: See (10.3) in [10].

Exercise 5.1.152 Combine the last three exercises to show that m0 and m agree on

G Z(G ) up to a scalar multiple.


R

Hint: You should be able to combine these results to get an expression


for m0 (U0 ) that, besides factors of m0 (U 00 ) and m0 (U 0 ), will be completely
independent of m0 . As explained above, everything true of m0 must also be
true of m, and so, we will have the same expression for m, with exception of
the fact that the factors m0 (U 0 ) and m0 (U 00 ) will be different.


a Hence locally compact as T implies uniformly-completely-T implies T and subspaces of T spaces are T .
0
3
2
2
2
b If the group is commutative, the symmetries by left translation and right translations are the same, so left

vs. right does not matter.


c K is nonempty, and so cannot be covered by anything empty. Therefore, (K : U) 1, and in particular, is
0
0
not 0.
d Here we are using the fact that is closed under composition, so that 0 .
k
l
e That was sort of the point of the previous step.
f For each K , K K with K K , we have such a map.
1 2
1
2
g The first map from H into R R is the projection of f T onto the K th coordinate in the first coordinate
1
and the projection of f T onto K2th coordinate in the second coordinate. This map is continuous because it
is continuous in each coordinate. Each coordinate is continuous because projections are continuous. The first
map is followed by the map from R R into R given by subtraction, which is continuous because we know that
hR, +, 0, i is a topological group.
h This is the definition of uniformly-separatedsee Definition 4.4.2.
i This implicitly uses the fact that 1 .

Chapter 5. Integration

234

j Note that every B is an open cover, so there is no need to say open hereit is just to clarify. We dont
U
write BU to simplify the notation (and also because we will want to write U for something else).
k Because C and C0 intersect.
l Because B contains C0 , which intersects S.
m Recall that H is always finite, by definition.
n Take any open set U with compact closure (which exists by local compactness). Then, B will be a cover
U
whose elements are in G . Take a common star-refinement of this and B. The closures of elements of this new
cover will be contained in the elements of BU , which themselves will be compact as closed sets of compact sets
are compact.

Dayyyuuuummmm. That was a hard theorem. Probably the hardest in these notes. But holy
jesus was it worth it. Check out this epic definition.
Definition 5.1.153 Lebesgue measure Lebesgue measure m on Rd is the haar measure

with respect to the symmetry group of all isometriesa such that m([0, 1] [0, 1]) = 1.
R

You have to check that the group of isometries actually generate a uniform base for
Rd via (5.1.109), but this is actually really trivial, because the isometric image of an
-ball isgaspanother -ball!

This is actually much better than defining lebesgue measure to be classical haar
measure (i.e. haar measure with respect to the topological group structure). With this
definition we automatically get that lebesgue measure is invariant under rotations for
free, whereas with the classical definition, this requires some work.

Besides using haar measure to define lebesgue measure, it is also common to use
something called Carathodorys Extension Theorem, which, while not that bad, has
the problem that it lacks a uniqueness result.b Moreover, carathodory doesnt even
give us a regular measureit actually just gives us an outer-measure. We would
then have to go through by hand and check that the measure defined in this way is
finite on quasicompact sets, inner-regular on open sets, outer-regular, has a uniformlymeasurable base, is uniformly-additive, is invariant under translation, is invariant
under rotation, is invariant under reflection, and that translations, rotations, and
reflections actually give us all the isometries. Ew. The theory is just so much prettier
when all of this hard work is done for us by one result, instead of by fifty-bajillion
separate ones.

a An isometry is an isomorphism in the category of metric spaces.

Concretely, this means that | f (x1 ), f (x2 )| =


|x1 , x2 |. In the case of Rd , all rotations, reflections, and translations are isometries. (In fact, one can show,
though we will not, that these generate all isometries.)
b At least in general. I think in certain nice cases it can be made to work.

We can even define counting measure using the The Haar-Howes Theorem.3
Definition 5.1.154 Counting measure Let X be a set equipped with the discrete uni-

formity and take := AutSet (X) to be the group of all bijections from X to itself. Then,
{B{x0 } : x0 X} := {{h({x0 }) : h H}} = {{{x} : x X}} a

(5.1.155)

is a uniform base, and so, because B := {{x0 } : x0 X} is a base for this topology, by
Exercise 5.1.110, hX, Hi is an isogeneous space. Therefore, by the The Haar-Howes
Theorem, there is a unique isogeneous measure m, the counting measure, on X such that
m({x0 }) = 1.
3 Though

this is a bit like nuking the planet to swat a fly.

5.2 The integral

235

a This

is the uniform base which has a single cover, namely, the cover by singletons. I say this because what
is going on here is actually very easy, even though the notation may be a bit hard to parse.

5.2

The integral
As you know, the integral is supposed to be the area under the curve. The functions we will
be integrating will take values in the reals, and so the integral will spit out a real number. This
is probably nothing new to you. What is almost certainly new to you, however, is that now the
domains of the functions we will be integrating will be regular measure spaces.4 That is, we will be
integrating functions f : X R, for X a regular measure space. We will then simply define the
integral to be the measure of the set5
{hx, yi X R : 0 < y < f (x)} .

(5.2.1)

To do this, of course, we must first define a measure on X R.


5.2.1

The product measure


The product measure isyou guessed ita measure on the product. As the integral is the area
under the curve and the area under the curve is a subset of X [0, ] (for f : X [0, ]), it is
necessary for our development to put a measure on X [0, ].6
Theorem 5.2.2 Product measure. Let hX1 , m1 i and hX2 , m2 i be topological measure

spaces. Then, there exists a unique regular borel measure m1 m2 on X1 X2 , the product measure, such that [m1 m2 ](K1 K2 ) = m1 (K1 ) m2 (K2 ) for Ki Xi quasicompact.
Furthermore, it satisfies
(i). [m1 m2 ](S1 S2 ) = m1 (S1 ) m2 (S2 ) for all Si Xi a ;
(ii).
(
[m1 m2 ](U) := sup

m1 (K1,k ) m2 (K2,k ) :
k=0

Ki,k Xi quasicompact,
{K1,k K2,k : 0 k m} is disjoint,
)
m
[

K1,k K2,k U, m N

(5.2.3)

for U X1 X2 open

k=0

[m1 m2 ](S) := inf {[m1 m2 ](U) : S U, U open} ;


and
4 You can do things much more generally than this, but for us,there is no need. As a rule, it is usually best to do
things in the nicest theory that encompasses every example youre interested in, and for us, we will not be interested in
measures that are not regular (except perhaps when it comes to counter-examples).
5 At least when f is nonnegativewell have to work just a teensy bit harder to take the signed area if the function
is negative somewhere.
6 Of course, you dont have to do things this way, and indeed, this is not how its usually done. Doing things the other
way, however, is arguably one reason why people think the lebesgue integral is not geomeric. On the other hand, I
cant believe anybody would argue that the area under the curve is not geometricthis is in fact how we explain things
to high school students, after all.

Chapter 5. Integration

236
(iii).
(

m1 (U1,m ) m2 (U2,m ) : Ui,m Xi open,

m(S) = inf

mN

(5.2.4)

)
S

U1,m U2,m .

mN

So that formulas look ridiculous, but I promise, its not that complicated. The first
simply says that we can approximate open sets from the inside with compact sets of a
special form (namely, finite disjoint unions of compact rectangles)essentially just
a nice version of inner-regularity. The second simply says that we can approximate
any set from the outside with open sets of a special form (namely, countable unions
of open rectangles)essentially just a nice version of outer-regularity. Perhaps one
thing to note is that in we do not need to require disjointness explicitly in (5.2.4)not
having disjointness makes the sum larger, and so the infimum doesnt care about these
cases, so to speak.

You (hopefully) showed in Exercise 5.1.52 that the product of two -compact spaces is
-compact. It thus follows that hX1 X2 , m1 m2 i is likewise a topological measure
space.

Proof. S T E P 1 : D E F I N E m1 m2
We define
mK (K1 K2 ) := m1 (K1 ) m2 (K2 ) for K1 X1 , K2 X2 quasicompact
(
m

mU (U) := sup

mK (K1,k K2,k ) :
k=0

K1,k X1 , K2,k X2 quasicompact,


{K1,k K2,k : 0 k m} is disjoint,
)
m
[

K1,k K2,k U, m N

(5.2.5)

for U X1 X2 open

k=0

[m1 m2 ](S) := inf{mU (U) : S U, U open}.


We will eventually show that all these formulas agree in the case that more than one applies
to a give set (e.g. the second and third both apply to any open set). After which time, we
shall simply denote m1 m2 for everything.
S T E P 2 : S H O W T H AT m1 m2 I S N O N D E C R E A S I N G
Exercise 5.2.6 Check that mK is nondecreasing on sets of the form K1 K2 for

Ki Xi quasicompact.
Exercise 5.2.7 Check that mU is nondecreasing on open sets.
Exercise 5.2.8 Check that m1 m2 itself is nondecreasing.

5.2 The integral

237

S T E P 3 : S H O W T H AT mU (U1 U2 ) = mU (U1 ) mU (U2 ) F O R Ui Xi O P E N


Let Ui Xi be open. Let us first suppose that both m1 (U1 ) = = m2 (U2 ). Then, for every
M > 0, there are quasicompact Ki Ui with m(Ki ) > M. Then,
mU (U1 U2 ) m1 (K1 ) m2 (K2 ) > M 2 ,

(5.2.9)

and so mU (U1 U2 ) = .
Let us now do the case where one has measure 0 and one has infinite measure. Without
loss of generality, take m1 (U1 ) = 0 and m2 (U2 ) = . Then, whenever we have
m
[

K1,k K2,k U1 U2 ,

(5.2.10)

k=0

we must have that K1,k U1 for all 0 k m,b which forces m1 (K1,k ) = 0, and so in turn it
forces
m

mK (K1,k K2,k ) = 0,

(5.2.11)

k=0

and hence mU (U1 U2 ) = 0 = m1 (U1 ) m2 (U2 ).


Exercise 5.2.12 Do the case where one has finite positive measure and one has

infinite measure.
Exercise 5.2.13 Do the case where one has zero measure and the other has finite

positive measure.
Finally, consider the case where both 0 < m(U1 ), m(U2 ) < . Let
(m1 (U1 ) + m2 (U2 )) min{m1 (U1 ), m2 (U2 )} > > 0

(5.2.14)

and choose Ki Ui quasicompact so that


mi (Ki ) mi (Ui ) < m(Ki ) +

.
m1 (U2 ) + m2 (U2 )

(5.2.15)

Then,
m1 (U1 ) m2 (U2 ) m1 (K1 ) m2 (K2 )



> m1 (U1 )
m1 (U2 ) + m2 (U2 )



m2 (U2 )
m1 (U1 ) + m2 (U2 )
2
= m1 (U1 ) m2 (U2 ) +
(m1 (U1 ) + m2 (U2 ))2
> m1 (U1 ) m2 (U2 ) ,
whence it follows that mU (U1 U2 ) = m1 (U1 ) m2 (U2 ).
S T E P 4 : S H O W T H AT [m1 m2 ](U) = mU (U)

(5.2.16)

Chapter 5. Integration

238

Let S X1 X2 be open. We must show that [m1 m2 ](S) = mU (S). Of course, mU (S)
{mU (U) : S U, U open}, which gives us that [m1 m2 ](S) mU (S). On the other hand,
because mU is nondecreasing on open sets, we have that mU (S) is a lower-bound for
{mU (U) : S U, U open}, which gives us the other inequality.
S T E P 5 : S H O W T H AT [m1 m2 ](S1 S2 ) = m1 (S1 ) m2 (S) F O R m(Si ) <
In particular, this will show that [m1 m2 ](K1 K2 ) = mK (K1 K2 ) for Ki Xi quasicompact. Let Si Xi with mi (Si ) < . We want to show that [m1 m2 ](S) = m1 (S1 ) m2 (S2 ). In
other words, we want to show that
m1 (S1 ) m2 (S2 ) = inf{mU (U) : S1 S2 U, U open}.

(5.2.17)

Let > 0. Choose > 0 such that (m1 (S1 ) + m2 (S2 )) + 2 < . Because mi is regular,
there is some open Ui Xi with Ki Ui and
mi (Si ) mi (Ui ) < mi (Si ) + .

(5.2.18)

Then, S1 S2 U1 U2 , U1 U2 is open, and


m1 (S1 ) m2 (S2 ) m1 (U1 ) m2 (U2 ) = mU (U1 U2 )
< m1 (S1 ) m2 (S2 ) + (m1 (S1 ) + m2 (S2 )) + 2

(5.2.19)

< m1 (S1 ) m2 (S2 ) + ,


whence it follows that [m1 m2 ](S1 S2 ) = m1 (S1 ) m2 (S2 ).
S T E P 6 : S H O W T H AT mU (K1 K2 ) = mK (K1 K2 )
Now suppose that U := K1 K2 is open for Ki Xi quasicompact. We must show that
mK (U) = mU (U). That is, we must show that mU (U) = m1 (K1 ) m2 (K2 ). However, from
the definition of mU as a supremum, we have that mU (U) m1 (K1 ) m2 (K2 ). To show the
other inequality, let
m
[

K1,m K2,m K1 K2

(5.2.20)

k=0

be a disjoint union for Ki,k Xi quasicompact.c We need to show that


m

m1 (K1,k ) m2 (K2,k ) m1 (K1 ) m2 (K2 ).

(5.2.21)

k=1

Ki,k Ki for 0 k m, and so


!
!
m
[

K1,k

k=0

m
[

K2,k

K1 K2 .d

(5.2.22)

k=0

Now, here is where the notation becomes atrocious, but the idea is only a little bit tricky.
The intuition is that we break-up the bounding rectangle into a grid of smaller rectangles
so that each of the original rectangles is a disjoint union of the grid rectangles. Let
S {1, . . . , k 1, k, . . . , m} =: Sk and define
Ki,k,S := Ki,k

\
lSk

Ki,l

\
l S
/ k

c
Ki.l
,

(5.2.23)

5.2 The integral

239

so that
Ki,k =

(5.2.24)

Ki,k,S

SSk

and
Li :=

m
[

Ki,k =

k=0

m [
[

(5.2.25)

Ki,k,S

k=1 SSk

are disjoint unions. Then, we have that


m

m1 (K1,k ) m2 (K2,k ) =
k=0

k=0

m1 (K1,k,S )

SSk

m2 (K2,k,S )

SSk

m1 (K1,k,S )

k=0 SSk

m2 (K2,k,S )

(5.2.26)

k=0 SSk

= m1 (L1 ) m2 (L2 ) m1 (K1 ) m2 (K2 ),


as was to be shown.
S T E P 7 : S H O W T H AT m1 m2 I S A N O U T E R - M E A S U R E
That [m1 m2 ](0)
/ = 0 follows immediately from the definition. We already showed that it
is nondecreasing.
Exercise 5.2.27 Check that m1 m2 is subadditive.

S T E P 8 : S H O W T H AT m1 m2 I S R E G U L A R
That it is outer-regular follows immediately from the definition.e Inner-regularity also
follows quite easily.
(
m

[m1 m2 ](U) := sup

[m1 m2 ](K1,k K2,k ) :


k=0

K1,k X1 ,
K2,k X2 quasicompact,
{K1,k K2,k : 0 k m} is disjoint,
)
m
[

(5.2.28)

K1,k K2,k U, m N

k=0

sup{[m1 m2 ](K) : K U, K quasicompact} f [m1 m2 ](U),


and so all of these inequalities must be equalities. We now check that it is finite on
quasicompact sets. Let K X1 X2 be quasicompact. Define Ki := i (K) Xi . This is
quasicompact by the Quasicompactness and the Extreme Value Theorem, and so mi (Ki ) < .
However, K K1 K2 , and hence [m1 m2 ](K) m1 (K1 ) m2 (K2 ) < .
S T E P 9 : S H O W T H AT m1 m2 I S A D D I T I V E O N F I N I T E D I S J O I N T U N I O N S O F
Q U A S I C O M PA C T R E C TA N G L E S

Chapter 5. Integration

240
Sm

is a disjoint union.

m1 (K1,k ) m2 (K2,k ).

(5.2.29)

Let Ki,k Xi be quasicompact for 0 k m be such that


We wish to show that
!
[m1 m2 ]

m
[

k=0 K1,k K2,k

K1,k K2,k

k=0

k=0

The equality follows from subadditivity. To show the other inequality, let > 0 and let
U X1 X2 be open and such that
!
m
[

K1,k K2,k U and [m1 m2 ](U) < [m1 m2 ]

k=0

m
[

K1,k K2,k + .

(5.2.30)

m1 (K1,k ) m2 (K2,k ),

(5.2.31)

k=0

Then,
[m1 m2 ]

m
[

!
K1,k K2,k

> [m1 m2 ](U)

k=0

k=0

which gives us the other inequality.


S T E P 1 0 : S H O W T H AT m1 m2 I S B O R E L
Let U X1 X2 be open. By inner-regularity, modulo a set of measure 0, we can write
U as a countable union of quasicompact that are finite disjoint unions of rectangles of the
form K1 K2 for Ki Xi quasicompact. As sets of measure 0 are measurable and countable
unions of measurable sets are measurable, it thus suffices to show that K1 K2 is measurable
for Ki Xi quasicompact.
So, let S X1 X2 be arbitrary. We wish to show that
[m1 m2 ](S) [m1 m2 ](S (K1 K2 )) + [m1 m2 ](S (K1 K2 )c ).

(5.2.32)

First note that this holds for S = L1 L2 , Li Xi quasicompact, by the previous step:
L1 L2 = ((L1 K1 ) (L1 K1c )) ((L2 K2 ) (L2 K2c ))
= (L1 K1 ) (L2 K2 )
(L1 K1 ) (L2 K2c )
(L1 K1c ) (L2 K2 )
(L1 K1c ) (L2 K2c )

(5.2.33)

= ((L1 L2 ) (K1 K2 ))
((L1 L2 ) ((K1c K2 ) (K1 K2c ) (K1c K2c )))
= ((L1 L2 ) (K1 K2 )) ((L1 L2 ) (K1 K2 )c ) .
If [m1 m2 ](S) = , we are done, so we may as well assume that [m1 m2 ](S) < . Let >
0. Let U X1 X2 be open and such that (i) S U and (ii) [m1 m2 ](S) > [m1 m2 ](U).
Because the space is T2 , K1 K2 is closed, and so U (K1 K2 )c is open. Therefore, there
is a disjoint union
m
[
k=0

L1,k L2,k U (K1 K2 )c

(5.2.34)

5.2 The integral

241

for Li,k Xi quasicompact such that


m

[m1 m2 ](U (K1 K2 )c ) <

[m1 m2 ](L1,k L2,k ).

(5.2.35)

k=0

Similarly, there is a disjoint union


n
[

M1,k M2,k U

k=0

m
[

!c
L1,k L2,k

(5.2.36)

k=0

for Mi,k Xi quasicompact such that


[m1 m2 ] U

m
[

!c !
L1,k L2,k

<

[m1 m2 ](M1,k M2,k ).

(5.2.37)

k=0

k=0

Hence,
[m1 m2 ](S) > [m1 m2 ](U) m

m
[

L1,k L2,k

k=0

n
[

!
M1,k M2,k

k=0

= g [m1 m2 ](L1,k L2,k ) + [m1 m2 ](M1,k M2,k )


k=0

k=0
m
[

> m (U (K1 K2 )c ) + m U

(5.2.38)

!c !
L1,k L2,k

k=0

[m1 m2 ](U (K1 K2 )c ) + [m1 m2 ](U (K1 K2 )) 3


[m1 m2 ](S (K1 K2 )c ) + [m1 m2 ](S (K1 K2 )).
STEP 11: SHOW UNIQUENESS
To show that two regular measures agree, by outer-regularity, it suffices to show that they
agree on on open sets. By inner-regularity, to show this, it suffices to show that
(
m

sup{m(K) : K U, K quasicompact} = sup

[m1 m2 ](K1,k K2,k ) :


k=0

K1,k X1 , K2,k X2 quasicompact,


{K1,k K2,k : 0 k m} is disjoint,
)
m
[

K1,k K2,k U, m N

(5.2.39)

k=0

The left-hand side is automatically at least as large as the right-hand side because the
set inside the sups on the left-hand side is a superset of the set inside the sup right-hand
side. On the other hand, for K X1 X2 quasicompact, Ki := i (K) is quasicompact by
the Quasicompactness and the Extreme Value Theorem, and as K K1 K2 , we have
that m(K) m1 (K1 ) m2 (K2 ), which showed that the left-hand side is no larger than the
right-hand side, which gives equality.
S T E P 1 2 : S H O W T H AT [m1 m2 ](S1 S2 ) = m1 (S1 ) m2 (S2 ) F O R Si Xi

Chapter 5. Integration

242

We have actually already done the case where mi (Si ) < see Step 5. We first prove this
for Si measurable. We then do the general case by using the fact that, in topological measure
spaces, every set is measurable up to a set of measure 0 (Proposition 5.1.77).
S
By Exercise 5.1.54, write X1 X2 = mN M1,m M2,m as the countable disjoint union
of measurable rectangles of finite measure.
Exercise 5.2.40

Use this together with the fact that we already know that
[m1 m2 ](F1 F2 ) = m1 (F1 ) m2 (F2 ) for set of finite measure to finish the proof
that [m1 m2 ](S1 S2 ) = .
Exercise 5.2.41 Now show that [m1 m2 ](S1 S2 ) = 0 if S1 , S2 are both measurable,

one of which has measure 0.


Exercise 5.2.42 Finally, show that [m1 m2 ](S1 S2 ) = m1 (S1 ) m2 (S2 ) for arbi-

trary S1 , S2 , using Proposition 5.1.77 as explained at the beginning of this step.

S T E P 1 3 : S H O W R E C TA N G L E O U T E R - R E G U L A R I T Y
Finally, we seek to show (5.2.4). By outer-regularity, it suffices to show that we can
approximate open sets with covers of open rectangles. To show this, we show that open sets
can be approximated with countable covers of (not-necessarily-open) rectangles. If we can
do this, then by -compactness, we can approximate open sets with countable covers of
(not-necessarily-open) rectangles of finite measure. By outer-regularity of X1 and X2 , we
will be able to approximate each of these rectangles by open rectangles, which will give us
finally that we can approximate open sets with countable covers by open rectangles, which
will complete the proof. Thus, we seek to show that we can approximate open sets with
countable covers of (not-necessarily-open) rectangles. To show this, it suffices to show that
every open set can be written as the countable disjoint union of rectangles.
So, let U X1 X2 be open. For each hx1 , x2 i U, let U1,x1 ,x2 U2,x1 ,x2 U contain
hx1 , x2 i. By -compactness, U can be written as the union of countably-many of these.
Changing up notation a bit, let us this write
U=

U1,m U2,m .

(5.2.43)

mN

Exercise 5.2.44 Finish the proof by writing U as a disjoint union of countably many

rectangles.

a That

is, you assume that area is base times height for quasicompact rectangles, and you then you get that
area is base times height for all rectanglesyou dont even need measurability.
b Okay, you caught me. This is not necessarily true if K
/ Congratulations. Would you like a cookie?
2,k = 0.
c Imagine a finite disjoint union of rectangles contained inside another big rectangle.
d This is the statement that the bounding rectangle of the union of rectangles will still be contained inside
the big rectangle.
e Indeed, the definition was designed for the purpose of being regular.
f Because m m is nondecreasing.
1
2
g By the previous step, because these two collections of rectangles are disjoint.
h Here, m denotes any regular borel measure on X X such that m(K K ) = m (K ) m (K ) for K X
i
i
1
2
1
2
1 1
2 2
quasicompact.

5.2 The integral

243

The product of two isogeneous spaces is canonically an isogeneous spaces, and if they are both
T0 locally quasicompact, so too will the product be. Hence, the The Haar-Howes Theorem tells us
that the product will obtain a measure. The question, then, is whether this measure agrees with the
product measure as defined above. Fortunately, the answer is in the affirmative.
Definition 5.2.45 Product isogeneous structure Let hX1 , H1 i and hX2 , H2 i be two

isogeneous structures. Then, the product isogeneous structure on X1 X2 is given by


H1 H2 := {h1 h2 : h1 H1 , h2 H2 },

(5.2.46)

where h1 h2 : X1 X2 X1 X2 is defined by
[h1 h2 ](hx1 , x2 i) := hh1 (x1 ), h2 (x2 )i.

(5.2.47)

Exercise 5.2.48 Show that this in fact gives an isogeneous structure.


R

In other words, you need to show that


{BU1 U2 }

(5.2.49)

is a uniform base, where


BU1 U2 := {h1 (U1 ) h2 (U2 ) : h1 H1 , h2 H2 }.

(5.2.50)

Let hX1 , H1 , m1 i and hX2 , H2 , m2 i be isogeneous measure with


m1 (K1 ) = 1 = m2 (K2 ). Then, m1 m2 is the haar measure on the product isogeneous
space X1 X2 that gives measure 1 to K1 K2 .
Proposition 5.2.51

In particular, the product of lebesgue measure on Rd with the product of lebesgue


measure on Re is the lebesgue measure on Rd+e .

Proof. We leaves this as an exercise.


Exercise 5.2.52 Prove the result yourself.

5.2.2

Lebesgue measure
Before continuing on with integration, we prove some properties about lebesgue measure. Knowing
that lebesgue measure on Rd is just the product of the lebesgue measure on R is actually incredibly
useful, and so now that we know this, we take advantage of this fact.
Proposition 5.2.53 The lebesgue measure of a point is 0.

Proof. The argument is the exact same in Rd as it is in R, so we write down the argument
in R and save us from some .
The first thing to notice is that, by translation invariance, the measure of every point is
the same. Let us denote this measure by M
Points are closed, and hence measurable, and so m is additive on points. Therefore, we

Chapter 5. Integration

244
have that

1 = m([0, 1]) m

{x} =

m({x}) =

x[0,1]Q

x[0,1]Q

(5.2.54)

mN

This equation forces M = 0.

Corollary 5.2.55 m([0, 1) [0, 1)) = 1.


R

Sets of the form


[a1 , b1 ) [ad , bd )

(5.2.56)

are important in measure theory because, for example,


[0, 2) = [0, 1) [1, 2)

(5.2.57)

is a disjoint union. If we tried replacing everything here with all open intervals we
we would have that (0, 2) = (0, 1) (1, 2), which is just plain false, and if we tried
replacing everything here with all closed intervals the union would not be disjoint
([0, 1] and [1, 2] intersect at 1). The disjointness is important in measure theory of
course because of additivity (on measurable sets). Sets of the form (5.2.56) are
half-open rectangles.

Proof. Now that we know that


m([0, 1) [0, 1)) = m([0, 1)) m([0, 1)),

(5.2.58)

it suffices to prove this result in R.


That it is true in R follows from the fact that
1 = m([0, 1]) = m([0, 1)) + m({1}) = m([0, 1)).

(5.2.59)


Proposition 5.2.60 m([a1 , b1 ] [ad , bd ]) = (b1 a1 ) (bd ad ).

Proof. Now that we know that it is a product measure, it suffices to show the onedimensional case, and so we prove that m([a, b]) = b a.
By translation invariance, it suffices to show that m([0, b]) = b.
Because the measure of points is 0, it suffices to show that m([0, b)) = b.
Exercise 5.2.61 Show that m([0, m)) = m for m N.
p

Exercise 5.2.62 Show that m([0, q )) = q for p, q Z+ .

Then, we have that


q m([0, q)) m([0, b)) m([0, r)) = r
for all p, q Q+ with q b r. It follows that m([0, b)) = b.

(5.2.63)


5.2 The integral

245

Exercise 5.2.64 Show that any open set in Rd can be written at the countable disjoint union

of half-open rectangles.
Exercise 5.2.65 Let X be a regular measure space and for S X Rd and a R, define

aS := {hx, ayi : hx, yi S} ,

(5.2.66)

where x X and y Rd . Show that


m(aS) = |a|d m(S).
R

(5.2.67)

Prove it for Rd alone first (i.e. if X is just a point, so that X Rd = Rd ). For this
case, prove it for S open using the previous exercise and the fact that we already
know it is true for half-open rectangles. Then prove it in general for Rd by outerregularity. Then generalize this to the case X Rd for X not-necessarily a point using
the definition of product measures.

Exercise 5.2.68 Let T : Rd Rd be a linear transformation and let S Rd . Show that

m(T (S)) = |det(T )| m(S).


R

(5.2.69)

Hint: Recall your matrix decompositions from linear algebra and use the fact that we
already know (by Haar-Howe) that lebesgue measure is invariant under isometries
together with the result of the previous exercise.

Exercise 5.2.70 Show that countable sets have 0 measure.


R

In particular, Q has 0 measure!

Example 5.2.71 An uncountable set of zero measurethe Cantor Set Let L

R+

be less than 1.
Define C0 := [0, 1]. Define C1 to be C0 with the middle open interval of length L removed,
that is
C1 := C0 \ ( 12 12 L, 12 + 12 L) = [0, 12 21 L] [ 12 + 12 L, 1].

(5.2.72)

Thus, C1 is the disjoint union of two-intervals of length 12 (1 L).


 Now from each of these
intervals remove the middle open interval of length L 21 (1 L) = 12 (L L2 ).a This is C2 .

Chapter 5. Integration

246
That is,



1
1
2 1
2
1
1
1
1
1
:=
C2 C1 \
2 ( 2 2 L) 4 (L L ), 2 ( 2 2 L) + 4 (L L )


1
1
2 1
2
1
1
1
1
1
2 ( 2 + 2 L + 1) (L L ), 2 ( 2 + 2 L + 1) + (L L )
4
4


1
= 0, 12 ( 12 12 L) (L L2 )
4


1
2 1
1
1
1
1
2 ( 2 2 L) + (L L ), 2 2 L]
4


1
2
1
1
1
1
1
2 + 2 L, 2 ( 2 + 2 L + 1) (L L )
4


1
2
1
1
1
2 ( 2 + 2 L + 1) + (L L ), 1 .
4

(5.2.73)

Continue this process inductively, at each step removing the middle open intervals of
1
L to define Ck . Finally, define
length 2k1
C :=

Ck .b

(5.2.74)

k=0

C is the Cantor Set.c


C0 is the disjoint union of 1 closed interval of length L0 := 1.
C1 is the disjoint union of 2 closed intervals of length
1
1
L1 := (L0 L L0 ) = (1 L)
2
2

(5.2.75)

C2 is the disjoint union of 4 closed intervals of length


1
1
L2 := (L1 L L1 )d = (1 L)2
2
4

(5.2.76)

C3 is the disjoint union of 8 closed intervals of length


1
1
L3 := (L2 L L2 ) = (1 L)3 .
2
8

(5.2.77)

By now, hopefully you see the pattern, which you could prove by induction if you really
k
wanted: Ck is the disjoint union of 2k closed intervals of length 1L
. Thus, in general,
2
we have that
m(Ck ) = (1 L)k .

(5.2.78)

By Exercise 5.1.41 then, it follows that


m(C) = lim(1 L)k = 0.
k

It remains to show that C is uncountable. We do this in two steps.


Exercise 5.2.80 Show that every point of C is an accumulation point.

(5.2.79)

5.2 The integral

247

Exercise 5.2.81 Show that a nonempty closed subset of R that has the property that

every point is an accumulation point is uncountable.


a You

should think of L as the fraction of the interval that you remove at each step.
this point, as I havent had a chance to add in my own picture of this, I highly suggest you google it to
find a picture.
c Actually, the classical Cantor Set is the case L = 1 .
3
d The way to think about this is as follows: You take the intervals of length L , you remove the fraction L of
1
that length from the middle to obtain a set made of two pieces of total length L1 L L1 , so that each half itself
is of length 12 (L1 L L1 ).
b At

Example 5.2.82 A set with empty interior of positive measurethe CantorSmith-Volterra Set We leave this as an exercise.


Exercise 5.2.83 Modify construction of the Cantor Set starting with C0 := [0, 1]

again, but upon constructing Ck from Ck1 , you remove the middle open interval
1
from the middle of each closed interval of Ck1 . The resulting set is the Cantor4k
Smith-Volterra Set. Show that it has measure 12 , but has empty-interior.
R

The Cantor-Smith-Volterra Set is still uncountable of course, just as the Cantor Set
likewise had empty-interior. Its just that its not surprising for a set with empty
interior to have measure 0what is surprising however is a set with empty interior of
positive measure with empty-interior. Similarly, its not surprising for a uncountable
set to have positive measurewhat is surprising is an uncountable set of measure
zero0.

Example 5.2.84 A set that is not measurable a

Define q : R [0, 1) by q(x) := x bxc]. In other words, this is just the fractional part
of the real number (intuitively, you drop of the integer in front of its decimal expansion).
Note that 1 gets sent to 0, and so, if you like, you can think of the image as a circle, with 1
glued-to 0, if this helps your intuition.
Now fix Qc and define R : R R by R(x) := x + .b For x1 , x2 R, define x1 x2 iff
there is some m Z such that x1 = Rm (x2 ) := x2 + m .
Exercise 5.2.85 Show that is an equivalence relation on R.

Denote by Ox the equivalence class of x R with respect to .c


Exercise 5.2.86 Show that q(Ox ) is dense in [0, 1).
R

Hint: This is why we needed to be irrational.

Now, let N R be any set which contains exactly one point from each equivalence class with
respect to . We claim that q(N) [0, 1) is not measurable. We proceed by contradiction:
suppose that q(N) is measurable.

Chapter 5. Integration

248

Exercise 5.2.87 Use the fact that is an equivalence relation to show that q(Rm (N))

and q(Rn (N)) are disjoint iff m 6= n.


R

Hint: See Proposition A.1.72equivalence classes form a partition of the


set.

Exercise 5.2.88 Show that


[

q(Rm (N))

[0, 1) =

(5.2.89)

mZ

Hint: Once again, uses the fact that equivalence classes form a partition.

Exercise 5.2.90 Let S R. Show that q(S) is measurable iff q(R(S)) is measurable.
Exercise 5.2.91 Let S R. Show that m(q(S)) = m(q(R(S))).

Thus, these exercises give us that


[0, 1) =

q(Rm (N))

(5.2.92)

mZ

is a disjoint union of measurable sets, all of which have the same measure M := q(N). If
M = 0, then, by additivity, we have m([0, 1)) = 0: a contradiction. On the other hand, if
M > 0, by additivity again, we have m([0, 1)) = : a contradiction. Therefore, it cannot be
the case that N is measurable.
a Construction

adapted from [19, pg. 407].


R is for rotation because in the circle picture, this will correspond to a rotation by angle t.
c The O is for orbityou can imagine the point x orbiting around the circle as you apply R to it
over-and-over.
b The

Example 5.2.93 Two disjoint sets S, T with m(S T ) 6= m(S) + m(T ) Let N be as in

the previous example. We showed there that it was not measurable. Therefore, there must
exist some S R such that
m ((S N) (S N c )) = m(S) 6= m(S N) + m(S N c ).

(5.2.94)

As a matter of fact, this trick can be adapted to prove that every set of positive measure has a
nonmeasurable subset.
Proposition 5.2.95 Let S R. Then, if m(S) > 0, then there is some subset N S that is

not measurable.
Proof. a If m(S) = , then because lebesgue measure is semifinite (Proposition 5.1.75),
there is some subset T S with 0 < m(T ) < . If T contains a set that is not measurable,
then of course so does S, so it suffices to prove the case where S has finite measure. Thus,
without loss of generality, suppose that m(S) < .
Consider the collection of cosetsb {x + Q : x R}. Let N R be set that contains

5.2 The integral

249

precisely one element of each coset. It follows thatc


[

R=

(r + N)

(5.2.96)

rQ

is a disjoint union, and so it in turn follows that


S=

[S (r + N)]

(5.2.97)

rQ

is a disjoint union.
We proceed by contradiction: suppose that every subset of S is measurable. Then, the
previous equality implies that

m(S (r + N))

m(S) =

(5.2.98)

rQ

We show that m(S (r + N)) = 0 for all r Q, which will gives us that m(S) = 0: a
contradiction.
As m(S (r + N)) < and R is inner-regular on measurable sets of finite measure (see
Proposition 5.1.72), it suffices to show that every quasicompact subset of S (r + N) has
measure 0. So, let K S (r + N) be quasicompact.
Define
H :=

(s + K).

(5.2.99)

sQ[0,1]

As K r + N, this is a disjoint union, and so we have


m(H) =

sQ[0,1]

m(s + K) = d

m(K).

(5.2.100)

sQ[0,1]

H is bounded, and so has finite measure. Thus, the above equality forces m(K) = 0, and we
are done.

a Proof

adapted from [22, pg. 53].


Definition A.1.139 for the definition of cosets.
c This follows from the fact that the cosets form a partition of R, which in turn follows from the fact that
equivalence classes form partitionssee Corollary A.1.73.
d By translation invariance.
b See

We mentioned awhile ago in the definition of measurable functions (Definition 5.1.34) the
existence of a uniform-homeomorphism of R that preserves neither measurability not measure 0. It
is time we return to this.
Example 5.2.101 A uniform-homeomorphism of R that preserves neither measurability nor measure 0the Cantor Function We first define a uniformly-continuous

function f : [0, 1] [0, 1]. This function will be nondecreasing, and so g := f + id[0,1] :
[0, 1] [0, 2] will be increasing, and hence injective. It will turn out that f (0) = 0 and
f (1) = 1, and so we will then extend g to all of R periodically, defining g(x) to be
g(x 1) + 2 for x [1, 2], etc..a
R

f is the Devils Staircase, and g is the Cantor Function.b

We define f : [0, 1] [0, 1] as the uniform limit of a sequence of continuous functions.

Chapter 5. Integration

250

It will then be continuous because MorTop ([0, 1], R) is complete, and hence uniformlycontinuous because [0, 1] is quasicompact. We then must check that f is nondecreasing.
Once we do so, we will have that g is increasing, and hence injective. Then, because the
domain [0, 1] is quasicompact and the codomain [0, 2] is T2 , we will have that its inverse is
continuous (Exercise 3.6.48 does this for us), and hence uniformly-continuous, and hence a
uniform-homeomorphism. Extending g periodically has no effect on this.
So, let us get started.c Define f0 : [0, 1] [0, 1] to be the identity function. We define
fm : [0, 1] R for m > 0 recursively:

if x [0, 31 ]
2 fm1 (3x)
(5.2.102)
fm (x) := 12
if x [ 13 , 23 ] .

1 1
2
2 + 2 f m1 (3x 2) if x [ 3 , 1]
Exercise 5.2.103 Show that fm is well-defined for all m N.
R

You must check that the two expressions agree for x =

1
3

and x = 23 .

Exercise 5.2.104 Show that fm (0) = 0 and fm (1) = 1 for all m N.


Exercise 5.2.105 Show that fm (x) [0, 1], so that indeed fm defines a function into

[0, 1].
Exercise 5.2.106 Show that fm is continuous.
R

Hint: Use induction and apply the Continuity (Proposition 3.1.38).

Exercise 5.2.107 Show that fm is nondecreasing.

We now check that m 7 fm EndTop ([0, 1]) is cauchy.d To do this, we will show by
induction that
 m+1
1
| fm+1 fm |
.
(5.2.108)
2
It will then follow from the Triangle Inequality that
 
 m
 k
1 ( 21 )m+1
1
1
1
1
| fn fm |

=

=
, (5.2.109)
1
1
2
1 2
1 2
k=m+1 2
k=m+1 2
n

from which cauchyness follows immediately.


Using the definition (5.2.102), we have that

if x [0, 13 ]
2x
f1 (x) := 21
if x [ 13 , 23 ] ,

1
2
2 (3x 1) if x [ 3 , 1]

(5.2.110)

5.2 The integral

251

and so

if x [0, 13 ]
2x
| f1 (x) f0 (x)| = |x 12 |
if x [ 13 , 23 ] .

1
2
2 (1 x) if x [ 3 , 1]

(5.2.111)

It follows that
| f1 f0 | =

1
6


1 0+1
.
2

(5.2.112)

This completes the base case. As for the inductive case, fix m 1 and assume that (5.2.108)
holds for 0 n < m. We prove that it holds for m as well. From the definition (5.2.102)
again, we have that

if x [0, 13 ]
| fm (3x) fm1 (3x)|
| fm+1 (x) fm (x)| = 12 0
(5.2.113)
if x [ 13 , 23 ] .

2
| fm (3x 2) fm1 (3x 2)| if x [ 3 , 1]
Thus, by the induction hypothesis,
| fm+1 fm |

1
2 | f m f m1 |

 m+1
1
=
.
2

(5.2.114)

We may now define f := limm f EndTop ([0, 1]). You showed that each fm is nondecreasing,
and so, as limits preserve inequalities, f itself is nondecreasing. Now define g(x) :=
f (x) + x. As described at the beginning of the example, g : [0, 1] [0, 1] is a uniformhomeomorphism.
We now show that g preserves neither measurability nor measure 0. Let C [0, 1] denote
the (L = 13 ) Cantor and let Cm denote the set defined in the mth step of the construction
of the Cantor Setsee Example 5.2.71 if you dont know what were referring to. For
convenience of notation, define Dm := [0, 1] \Cm and D := [0, 1] \C.
Exercise 5.2.115 Show that fk |Dm = fm |Dm is constanta on each component of Dm

for all k m. In particular, the image of fk on Dm is finite for all k m.


R

a Note

Though its perhaps not clear from the formulas, the fm s were defined so
that precisely this is true: even though f (0) = 0 and f (1) = 1, f does all of
increasing on a set of measure 0, namely the Cantor Set.
that this constant will depend on the component.

From this, it follows that f itself is constant on each component of D.


Recall that each Dm is a disjoint union of open intervals. As f is constant on each one of
these intervals, the measure of the image of each one of these intervals under g is just the
length of that interval (the image is the interval itself plus whatever constant f happened to
be on that interval). Furthermore, by injectivity, the images of each of these intervals must
be disjoint, and hence, m(g(D)) is the sum of the measures of all these intervals, namely,
m(D) = 1. As g([0, 1]) = [0, 2], we thus have that
m(g(C)) = m(g([0, 1])) m(g([0, 1] \C) = m([0, 2]) m(g(D)) = 2 1 = 1. (5.2.116)
Thus, m(g(C)) = 1, despite the fact that m(C) = 0.

Chapter 5. Integration

252

Every subset of R of positive measure contains a nonmeasurable set (Proposition 5.2.95), so


let N g(C) be some such set. Then, g1 (N) C is measurable because it has measure 0.
a Of

course, we have to shift the graph up to maintain continuity.


people use both these terms to refer to f . I prefer this convention so that each has its own name.
c It will probably help to find a picture on this internet of what this is supposed to look like. (Sorry. Making
diagrams takes awhile and I have not had the time.) Let Cm denote the mth step in the construction of the
Cantor Set (see Example 5.2.71). In words, fm is supposed to be the function that is constant [0, 1] \Cm and
increases linearly on the intervals that remain in Cm .
d Recall that End
Top ([0, 1]) := MorTop ([0, 1], [0, 1])see Definition A.2.7. To simplify notation, we shall
denote the norm on EndTop ([0, 1]) simply as ||.
b Some

Before we finally move on to the integral, we tie-up a loose-end: we defined 00 := 1 way back
in Chapter 2 (see Theorem 2.5.13), but we never justified itit is about time we do so.
Exercise 5.2.117 00 := 1 Show that, for every > 0, there is some open neighborhood
+
U of h0, 0i R+
0 R0 such that

m ({hx, yi U : |xy 1| })
< .
m(U)

5.2.3

(5.2.118)

I totally understand if people wish to leave the symbol 00 undefined, but I think this
result makes it clear that, if 00 is to represent any real number at all, then that real
number should be 1.

Perhaps its still worth keeping in mind that, for any real number a [0, 1], there is
+
y
a net 7 hx , y i R+
0 R0 that converges the origin, but for which 7 (x )
converges to a.

The integral
Characteristic, simple, borel, and integrable functions

An important type of function in measure theory are characteristic functions.


Definition 5.2.119 Characteristic functions and simple functions Let X be a set and

let S X. Then, the characteristic function of S, S : X {0, 1}, is defined by


(
1 if x S
S (x) :=
.
(5.2.120)
0 if x
/S
A simple function is a finite nonnegative linear combination of characteristic functions.
R

The reason that characteristic functions are important in measure theory is because
the integral of S will turn-out to simply be m(S). In particular, if you know the
integral, you know the measure.

We have seen a characteristic function beforethe Continuity is the characteristic


function of Q R.

You might say that the lebesgue integral is to characteristic functions as the riemann
integral is to rectanglesone can define the lebesgue integral (though we will not)
by approximating functions with characteristic functions,a just as one defines the
riemann integral by approximating functions with rectangles (aka (scalar multiples
of) characteristic functions of intervals).b

5.2 The integral

253

a Well,

actually finite linear combinations of characteristic functions, the so-called simple functions.
actually find this approach a bit messy, but its just yet more evidence that the lebesgue integral is hardly
more difficult than the riemann integralits the same fucking thing with arbitrary (measurable) sets instead of
intervals.
bI

In principle, we can integrate any function.7 The problem with


this, of course, Ris that the
R
integral
will
not
satisfy
certain
properties
we
would
like
it
to
(e.g.
dx
[
f (x) + g(x)] = dx f (x) +
R
dx g(x)) if we dont restrict the functions we integrate. This brings us to the definition of borel
functions.
Definition 5.2.121 Borel function and absolutely integrable Let X be an outer-

measure space and let f : X [, ] be a function. Then, f is borel iff


{hx, yi X [, ] : 0 y < f (x)} and {hx, yi X [, ] : f (x) y < 0} (5.2.122)
are measurable. If furthermore they have finite measure, then f is absolutely integrable.
R

We explain the reason the term borel is used here in Proposition 5.2.129 below.

The term globally integrable is in contrast to the terms locally integrable and just
plain integrablesee the following definition and Theorem 5.2.155 (the defining
theorem of the integral).

The collection of all (equivalence classes of) borel functions on X in


MorAlE (X, [, ]) is denoted Bor(X). The collection of all nonnegative borel
functions on X is denoted Bor+
0 (X).

If every we say a net of functions in Bor(X) converges, we mean that it converges


pointwise for almost-every x X. You can even apply Definition by specification of
convergence to have this define a topology on Bor(X), but this isnt terribly useful.
For example, the integral will not be continuous with respect to this topologysee
Example 5.2.210. For us, it will essentially just be used as concise languageits
easier to say just converges than converges pointwise almost-everywhere all the
time. (In fact, the same goes for MorAlE (X, [, ]).)

There is a symbol for the absolutely integrable functions on X as well (namely


L1 ), but we shall wait to discuss this further until we encounter the L p spacessee
Section 5.3.

Of course, the sets in (5.2.122) are just the area under the curve (one for the
nonnegative part of f and one for the nonpositive part of f ), and so the integral will
wind up being the measure of the first minus the measure of the second. Thus, a
function is borel iff the two sets needed to define its integral are measurable, and is
absolutely integrable if the measure of these sets are finite (so that one can subtract
them to define the integral).

Definition 5.2.123 Compactly integrable Let hX, mi be a topological measure space

and let f Bor(X). Then, f is compactly integrable iff f K is absolutely integrable for all
compact subsets K X.a
R

This is the condition that the function has a finite integral on every compact set. For
example, every continuous function is compactly integrable by the Quasicompactness
and the Extreme Value Theorem, but not even all constant functions will be absolutely

7 Literally. If you already know what it means for a function to be measurable, you dont even need this to be the case
in order to integrate it.

Chapter 5. Integration

254

integrable. This notion is useful because it will allow us to integrate things that would
otherwise be undefined (x 7 sin(x)
x is the classic exampleit is not integrable, but it
is compactly integrable).
R

This is more often called locally integrable, but I find this to be misleading
terminologyto mean, locally integrable should mean Every point has a neighborhood base consisting of measurable sets on which the function is absolutely
integrable..

a I definitely mean compact, not quasicompact. Quasicompact sets need not be measurable because they
need not be closed. Counter-example?

It is most convenient to work with


{hx, yi X [0, ] : y < f (x)} ,

(5.2.124)

for similar reasons as it is more convenient to work with [a, b) than it is [a, b], but it actually makes
no difference, as we now check.
Proposition 5.2.125

Let hX, mi be a topological measure space and let f

MorAlE (X, [0, ]). Then,


{hx, yi X [0, ] : y < f (x)}

(5.2.126)

is measurable iff
{hx, yi X [0, ] : f (x) y}

(5.2.127)

is. Furthermore, they both have the same measure.


Proof. We leave this as an exercise.
Exercise 5.2.128 Prove the result yourself.
R


Hint: For the direction, consider the sequence m 7 1 + m1 f . For the

direction, consider the sequence m 7 1 m1 f .


There is a very important characterization of borel functions. In fact, this characterization is the
reason borel functions are called borel functions.
Proposition 5.2.129 Let hX, mi be a topological measure space. Then, f Bor(X) iff the

preimage of every open set is measurable.


R

As preimages preserves unions, intersections, and complements, it follows that the


preimage of any borel set (see Definition 5.1.44) will likewise be measurable. This is
why we call borel functions borel funtions.

A lot of authors call these measurable functions. This conflicts with other standard
terminology (which we make use ofsee Definition 5.1.34), and so I recommend
you not use it. There is a way to make this not inconsistent, but its awkward: when
you have a function f : R R, for example, you simply declare that the measurable
sets in the codomain are the borel sets but the measurable sets in the domain are all
(lebesgue) measurable sets. Ew.

5.2 The integral


Proof.

255

() Suppose that f Bor(X). Let us write

f := {hx, yi X [0, ] : f (x) y}.

(5.2.130)

We first do the case where f is bounded, say by C [0, ), and is 0 outside a set of finite
measure M X. In this case, [m mL ]( f ) is finite,b and so by inner-regularity, there is an
increasing sequence of compact sets K0 K1 K2 f such that [m mL ]( f \C) =
S
0,c where C := mN Km . Now, define gm , hm : X [0, ] by
(
suphx,yiKm {y} if Km ({x} R) 6= 0/
gm (x) :=
(5.2.131)
0
if Km ({x} R) = 0/
and
(
infhx,yi(XR)\Km {y} if Km ({x} R) 6= 0/
hm (x) :=
.
0
if Km ({x} R) = 0/

(5.2.132)

Note that, as Km f , gm (x) < f (x) and f (x) hm (x). Furthermore, for each fixed x X,
the sequence m 7 gm (x) is nondecreasing, and so by the Limit superiors and limit inferiors,
we may define g(x) := limm gm (x). Similarly, we may define h(x) := limm hm (x).
Exercise 5.2.133 Show that the preimage under gm of every open set is measurable

by showing that the preimage of (, b) is open for all b R. Show that the
preimage under hm of every open set is measurable by showing that the preimage of
(a, ) is open for all a R.
Exercise 5.2.134 Use this to deduce that the preimage of every open set under both

g and h measurable.
Exercise 5.2.135 Show that [m mL ](hg ) = 0.
Exercise 5.2.136 Use this, together with the fact that the preimage of every open

set under both g and h is measurable, to show that g(x) = h(x) for almost every x.
As g f h, it follows that f itself is equal almost-everywhere to both g and h.
Exercise 5.2.137 Use the fact that f = g in Bor( X) and the fact that the preimage

of every open set under g is measurable to show that the preimage of every open set
under f is measurable.
This complete the proof in the case that f is bounded and vanishes outside a set of finite
measure.
S
We now do the proof for f Bor+
0 (X) be arbitrary. Write X = mN Mm as the increasing
union of measurable sets of finite measure, so that
X [0, ] =

Mm [0, m)

mN

Mm {}.

(5.2.138)

f (Mm {}).

(5.2.139)

mN

Thus,
f =

[
mN

f (Mm [0, m))

[
mN

Chapter 5. Integration

256
Note that

f (Mm [n, n+1)) = {hx, yi X [, ] : y < f (x), x Mm , n y < n + 1} = fm ,


(5.2.140)
where fm : X R is defined by

f (x) if x Mm and f (x) < m


fm (x) := m
if x Mm and f (x) m .

0
otherwise

(5.2.141)

fm is measurable because f is. Furthermore, it is bounded and vanishes outside a set of


measure 0, and so the previous case applies, and we have that the preimage of open sets
under fm is measurable.
Exercise 5.2.142 Use this to show that the preimage of open sets under f is

measurable.
Finally, for f Bor(X) arbitrary (not necessarily nonnegative), each of the nonnegative
and nonpositive parts of f are borel (by definition), and so the previous case shows that the
preimage under either of these functions of open sets is measurable. It then follows that the
preimage under f of open sets are measurable.
() Suppose that the preimage of every open set is measurable. It follows that the preimage
of half-open rectangles are measurable,d and so
{hx, yi X [0, ] : 0 y < f (x)} =

f 1 ([r, )) [0, r)

(5.2.143)

rQ

is measurable. Similarly the negative version of this set is measurable too, and so f is
borel.

a Proof

adapted from [19, pg. 384].


denotes lebesgue measure on [0, ], to distinguish it from the measure on X.
c We are implicitly using the fact that is measurable here.
f
d Why?

bm

Proposition 5.2.144 Let hX, mi be a topological measure space and let { fm : m N}

Bor(X) be a countable collection of measurable functions. Then, x 7 supmN fm (x) and


x 7 infmN fm (x) are borel.
Proof. We leave this as an exercise.
Exercise 5.2.145 Prove the result yourself.


Proposition 5.2.146 Let hX, mi be a topological measure space and let 7 f Bor(X)

converge to f MorAlE (X, [, ]). Then, f Bor(X).


R

If you want to be fancy about things, this is just the statement that Bor(X)

5.2 The integral

257

MorAlE (X, [, ]) is closed (with respect to the topology of almost-everywhere


pointwise convergence).

Proof. We leave this as an exercise.


Exercise 5.2.147 Prove the result yourself.
R

Hint: Reduce it to the case where 7 f is a sequence. Then, for a


convergent sequence m 7 fm Bor(X), define gm := infkm fk . gm is borel
by the previous result. Check that gm is nondecreasing and has the same limit
as m 7 fm . Deduce that limm fm is borel.


In the course of showing uniqueness of the integral, it will be important to know that we can
approximate borel functions by finite linear combinations of characteristic functions.
Proposition 5.2.148 Let X be a topological measure space and let f Bor+
0 (X). Then,

there exists a nondecreasing sequence m 7 sm Bor+


0 (X) of simple functions such that
limm sm f in Bor+
(X).
0
Proof.

Write X = mN Km for Km X compact. Define fm : X [0, ] by

f (x) if x Km and f (x) m


fm (x) := m
if x Km and f (x) > m .

0
otherwise
S

For n N and 0 o < mn, define



Sm,n,o := x Km : mo < fn (x)

o+1
m

(5.2.149)

(5.2.150)

Finally, define sm : X [0, ] by


m2

sm :=

mo S

m,m,o

(5.2.151)

o=0

Exercise 5.2.152 Show that m 7 sm is a nondecreasing sequence of borel simple

functions converging pointwise to f



a Proof

adapted from [24, pg. 31].

The integral itself

One last thing before we get to the integralwe decompose a function into its nonnegative and
nonpositive parts.
Definition 5.2.153 Let X be a set and let f : X R be a function. Then, we write

f+ := max{ f , 0} and f := min{ f , 0}.

(5.2.154)

Chapter 5. Integration

258

That is, f+ is equal to f if f is positive and 0 otherwise; likewise, f is equal to f if


f is negative and 0 otherwise.

Note that always f+ , f 0.

Also note that f = f+ f and | f | = f+ + f .

Theorem 5.2.155 Integral.

Let hX, mi be a topological measure space.a Then, there is a unique function I :


Bor+
0 [0, ], the integral, such that
(i). (Normalization) I(S ) = m(S);
(ii). (Additivity) I( f + g) = I( f ) + I(g);
(iii). (Nonnegative-homogeneity) I(a f ) = aI( f ) for a 0;b and
(iv). (Lebesgues Monotone Convergence Theorem) whenever 7 f is a nondecreasing net of functions then
lim I( f ) = I(lim f ).c

(5.2.156)

For f Bor(X) absolutely integrable, we define


I( f ) := I( f+ ) I( f ).

(5.2.157)

f Bor(X) is integrable iff f is compactly integrabled and


K 3 K 7 I( f K )

(5.2.158)

converges, where K is the directed set of all compact subsets of X ordered by


inclusion, in which case we definee
I( f ) := lim I( f K ).
KK

(5.2.159)

We write
Z

d m(x) f (x) := I( f ).

(5.2.160)

Furthermore, this satisfies


(i).
Z
X

d m(x) f (x) = [m mL ] ({hx, yi X [0, ] : y < f (x)}) .f ; (5.2.161)

and
(ii). (Lebesgues Dominated Convergence Theorem) Let 7 f Bor(X) converge
to f Bor(X). Then, if there is some absolutely integrable g Bor+
0 (X) such
that eventually | f | g,g then f is absolutely integrable and
Z

lim

d m(x) | f (x) f (x)| = 0.

(5.2.162)

5.2 The integral

259

Note that functions are absolutely integrableR iff the integral of their absolute value is
finite. That is, f is absolutely integrable iff X d m(x) | f (x)| < .

There is a lot going on here, so let me try to clarify. We can integrate any nonnegative
borel function (with the answer possibly being ), with the integral being the measure
of the area under the curve. The integral on nonnegative borel functions is specified
uniquely by four different properties.
We extend this to not-necessarily-nonnegative functions by declaring that the integral
of a absolutely integrable function is just the integral of the nonnegative part of the
function minus the integral of the nonpositive part of the function. In other words, it is
the signed area under the curve. In order for the subtraction to make sense, we need
at least one of these quantities to be finite, and it is traditional for the term (globally)
integrable to actually imply that both of them are finite, so that the integral of a
absolutely integrable function is a finite number.
Unfortunately, there are still things to which we would like to assign the value of an
integral, but which are not integrable. This happens when there is infinite positive
area and infinite negative area, but they should cancel each other in such a way
so as to get a finite number. As mentioned previously, the classic example of which
is x 7 sin(x)
x . To include such functions, we thus extend the definition of the integral
even further by approximating the integral of such functions by their integrals on
R
R
quasicompact sets. This is like defining 0 dx f (x) := limb 0b dx f (x). In the case
R
R
that f is integrable over [0, ], we have that [0,] dx f (x) = limb 0b dx f (x), so that,
in such cases, it doesnt matter which definition you use to compute the integral.

For S X, we define
Z

d m(x) f (x) :=
S

d m(x) S (x) f (x).

(5.2.163)

If X = R and S = [a, b], we define


Z b

dx f (x) :=
[a,b]

d mL (x) f (x)

(5.2.164)

If the measure is clear


R from context,
R we may just simply write dx instead of d m(x).
We may even write X d m f or just X f if the variable of integration is irrelevant.

Warning: lim X f = X lim f can fail if the convergence is not monotone, or even
if it is monotone decreasingsee the Example 5.2.210.

In particular, by the The integral itself, it follows that

lim

d m(x) f (x) =

d m(x) lim f (x).

(5.2.165)

Warning: For Lebesgues Dominated Convergence Theorem, t is not enough that f


be absolutely integrable.h Our example above, Example 5.2.210, is a counter-example
to this. In fact, even if your space has finite measure, and every fm is absolutely
integrable, and f is absolutely integrable, this can still fail. See the following
example.

Its worth noting that if you want to pull this shit (i.e. commute the limits with the
integrals) with the riemann integral you need uniform convergence. Holy crap thats
an inconveniently strong condition.

Chapter 5. Integration

260

Damn this is sexy. First of all, we are characterizing the integral uniquely via certain
properties it satisfies, and then furthermore, it turns out that this unique such function
is simply given by the area under the curve. What do you think of that, Riemann?i

Because I is defined on Bor+


0 (X), it is implicit that if f (x) = g(x) almost-everywhere,
then I( f ) = I(g). So, for example, the integral of the Continuity is 0!j

k
I happen to prefer to put the dx in front
R of the integral, but dont let that confuse
youthe meaning is just the same as f (x) dx.

Proof. l S T E P 1 : I N T R O D U C E N O TAT I O N
As we will be making use of the set quite a bit, it will be useful to introduce the notation
f := {hx, yi X [0, ] : y f (x)} .

(5.2.166)

STEP 2: DEFINE I
Let f : X R+
0 and define
I( f ) := [m mL ]( f ).

(5.2.167)

S T E P 3 : S H O W T H AT I D E S C E N D S T O A W E L L - D E F I N E D F U N C T I O N O N
MorAlE (X, [0, ])
We must check that if f = g in MorAlE (X, [0, ]), then I( f ) = I(g). Define
S := {x X : f (x) 6= g(x)}.

(5.2.168)

By definition of AlE, we have that m(S) = 0. We then have that


f cg ) = {hx, yi X [0, ] : y f (x) and y > g(x)} S [0, ],

(5.2.169)

and so
[m mL ]( f cg ) [m mL ](S [0, ]) = 0 := 0.

(5.2.170)

Hence,
[m mL ]( f ) = [m mL ]( f g ) + [m mL ( f cg ) = [m mL ]( f g ). (5.2.171)
By f g symmetry, we likewise have that [m mL ](g ) = [m mL ]( f g ). Combining
this with the previous equality gives us that I( f ) = I(g).
S T E P 4 : S H O W T H AT I(S ) = m(S)
S := {hx, yi X R : S (x) y} = S [0, 1]

(5.2.172)

and so
I(S ) := [m mL ](S ) = m(S) 1 = m(S).
S T E P 5 : S H O W T H AT I I S A D D I T I V E

(5.2.173)

5.2 The integral

261

We wish to show that I( f + g) = I( f ) + I(g). If either I( f ) or I(g) is infinite, then this is


trivially satisfied as this equation then reads = . Thus, without loss of generality, we
may assume that I( f ), I(g) < .
For f : X [0, ], define f : X [0, ] X [0, ] by
f (hx, yi) := hx, f (x) + yi.

(5.2.174)

This definition was made so that we have


f +g = f f (g )

(5.2.175)

is a disjoint union.m Thus, it suffices to show that (i) f (M) is measurable if M is and (ii) that
[m mL ]( f (M)) = [m mL ](M) for M X [0, ] measurable. To show this, we apply
Theorem 5.1.82 (sets in topological measure spaces are measurable iff you can approximate
them with open and closed setsit is thus very important that the product measure is regular
and borel).
We prove this in stages. First of all, take M = MX [0, b) for MX X measurable. Then,
M = g for g := bMX ,

(5.2.176)

and so we have that (by (5.2.175))


f f (M) = f +g = g+ f = M g ( f ).

(5.2.177)

Furthermore,
g ( f ) = g ({hx, yi f : x M} {hx, yi f : x
/ MX })
= {hx, y + hi : hx, yi f , x MX } {hx, yi f : x
/ MX },

(5.2.178)

and so by translation invariance (and the measurability of M), we have


[m mL ](g ( f )) = [m mL ]( f ),

(5.2.179)

and hence
[m mL ]( f ) + [m mL ]( f (M)) = [m mL ](M) + [m mL ]( f ),

(5.2.180)

which yields
[m mL ]( f (M)) = [m mL ](M).

(5.2.181)

Now take M = MX [a, b). Then,


M = g \ h for g := bMX and h := aMX .

(5.2.182)

Now the fact that [m mL ]( f (M)) follows from the previous case.
S
For the general case, let M X [0, ] be arbitrary measurable. Write X = mN Rm
as the disjoint union of measurable sets of finite measure (this is an application of Exercise 5.1.54), and define Mm,n := M (Rm [n, n + 1)) (as well as Mm, := M (Rm {}) if
you really like, though this will not matter as everything here has measure 0). Then, M is the
disjoint union of Mm,n , and so it suffices to prove the result for subsets of Rm [m, m + 1).

Chapter 5. Integration

262

So, now, let us change notation, let MX X be measurable of finite measure and take M
to be a measurable subset of MX [n, n + 1). Let > 0. By rectangle outer-regularity (see
S
(5.2.4)), there is a countable cover of M by open rectangles mN U1,m U2,m such that

m(U1,m ) mL (U2,m ) < [m mL ](M) + .

(5.2.183)

mN

Without loss of generality,n we may assume that each U2,m = [am , bm ), in which case the
previous case applies, so that
[m mL ]( f (M))

[m mL ]( f (U1,m U2,m )) = [m mL ](U1,m U2,m )


mN

mN

< [m mL ](M) + ,
(5.2.184)
and so
[m mL ]( f (M)) [m mL ](M).

(5.2.185)

Similarly,
[m mL ] ( f ((MX [n, n + 1)) \ M)) [m mL ]((MX [n, n + 1)) \ M)

(5.2.186)

Hence,
[m mL ]( f (M))
+ [m mL ] ( f ((MX [n, n + 1)) \ M))

[m mL ](M)
+ [m mL ]((MX [n, n + 1)) \ M)
= [m mL ](MX [n, n + 1))
= [m mL ]( f (MX [n, n + 1))
[m mL ]( f (M))
+ [m mL ] ( f (MX [n, n + 1)) \ f (M))
= [m mL ]( f (M))
+ [m mL ] ( f ((MX [n, n + 1)) \ M)) .
(5.2.187)

Thus, all of the inequalities must be inequalities, which in particular implies that
[m mL ]( f (MX [n, n + 1))) = [m mL ]( f (M))
+ [m mL ] ( f (MX [n, n + 1)) \ f (M)) .

(5.2.188)

By our characterization of measurability in topological measure spaces (Theorem 5.1.82),


this implies that f (M) is measurable. Hence,
0 = [m mL ](MX [n, n + 1)) [m mL ]( f (MX [n, n + 1)))
= ([m mL ](M) + [m mL ]((MX [n, n + 1)) \ M))
([m mL ]( f (M)) + [m mL ]( f (MX [n, n + 1)) \ f (M)))
= ([m mL ](M) [m mL ]( f (M)))
+ ([m mL ]((MX [n, n + 1)) \ M) [m mL ] ( f ((MX [n, n + 1)) \ M))) .

5.2 The integral

263
(5.2.189)

We already showed that [m mL ](M) [m mL ]( f (M)) (and similarly for the complement), so that both of these terms are nonnegative, and hence 0. This gives us
[m mL ]( f (M)) = [m mL ](M), as desired.
S T E P 6 : S H O W T H AT I I S N O N N E G AT I V E H O M O G E N E O U S
Note that
a f := {hx, yi X R : ay < f (x)} = {hx, ayi X R : y < f (x)} = a f . (5.2.190)
Then, it follows from Exercise 5.2.65 that
I(a f ) := m(a f ) = m(a f ) = a m( f ) = aI( f ).

(5.2.191)

S T E P 7 : P ROV E L E B E S G U E S M O N OTO N E C O N V E R G E N C E T H E O R E M
Let us define f : X [0, ] by f (x) := lim f (x). For almost every x X, the sequence
7 f (x) is nondecreasing, and so by the usual Limit superiors and limit inferiors, this
limit exists in [0, ]o for almost every x, and so defines a function f MorAlE (X, [0, ]).
For all the points where we do not have monotonicity, we may without loss of generality
take f to be infinite there.
First of all, if f (x) = for all x X, we have that
Z

lim

d m(x) f (x) = p =

Z
X

d m(x) f (x).

(5.2.192)

Otherwise, there is some x0 X for which f (x0 ) < .


Unfortunately,q measure theory is an intrinsically countable subject, and so to apply the
machinery weve developed, we must reduce the proof to the case of sequences. To do this,
we show that every subnet
7 I( f )

(5.2.193)

has in turn another subnetr


7 I( f )

(5.2.194)

that converges to I( f ). As f (x0 ) < , for := m1 , there is some m such that f (x)
fm (x) < m1 . As subnets of convergent nets converge to the same thing, we still have that
m 7 fm (x) converges to f (x) for all x X. Define fm := fm . We wish to show that
lim I( fm ) = I( f ).

(5.2.195)

However, note that


fm fm+1 and

fm = f

(5.2.196)

mN

for all m, and hence


I( fm ) := [m mL ]( fm ) [m mL ]( f ) = I( f ) := I(lim fm ).
m

(5.2.197)

Chapter 5. Integration

264
It then follows that
lim I( fm ) = I( f )

(5.2.198)

by Exercise 5.1.39.s
STEP 8: SHOW UNIQUENESS
Now, let I : Bor+
0 (X) [0, ] be an additive nonnegative-homogeneous map that satisfies
Lebesgues Monotone Convergence Theorem and I(S ) = m(S). It follows immediately that
they agree on borel simple functions. However, we already know (Proposition 5.2.148) that
any borel function can be written as the limit monotone limit of simple functions, so that,
by Lebesgues Monotone Convergence Theorem, we actually have equality of I with the
integral on all borel functions.
S T E P 9 : P R O V E L E B E S G U E S D O M I N AT E D C O N V E R G E N C E T H E O R E M
We leave the proof as a series of exercises.
Exercise 5.2.199 First of all check that, if the result is true, then f is indeed

intergable.
Exercise 5.2.200 Reduce the proof to the case where each f 0.
Exercise 5.2.201 Reduce the proof to the case of sequences m 7 fm .
Exercise 5.2.202 Define gm := infkm fk and hm := supkm fk .a Show that
Z
Z

lim
m

d m(x) |gm (x) f (x)| = 0 = lim


m

d m(x) |hm (x) f (x)| . (5.2.203)

Hint: Use the definition of the integral as the measure of the area under the
curve. Then apply your knowledge of measure theory.

a These

are all borel functions by Proposition 5.2.144.

Exercise 5.2.204 Use this to finish the proof.

S T E P 1 0 : P R O V E T H AT T H E I N T E G R A L I S W E L L - D E F I N E D
By this, we mean that we need to check that
Z

d m(x) f (x) =

lim

KK

d m(x) f (x)

(5.2.205)

for f Bor(X) globally integrable.

In this case, the net K 7 K f bounded by the absolutely integrable function f itself, and so
this actually just follows from Lebesgues Dominated Convergence Theorem.
a Recall

that this means that (i) X is -compact, (ii) m is regular, and (iii) m is borel.

5.2 The integral

265

b Of course, when we extend the definition of the integral to not-necessarily-nonnegative functions, this will
be true for all a R, but for the time being, we require that a 0 so that a f itself is nonnegative.
c Note that lim f (x) always exists, by the usual Limit superiors and limit inferiorseither it is bounded, in

which case it converges in R, or it is unbounded, in which case it converges to . (I lied a bit. . . as it is only
nondecreasing almost-everywhere, then the limit exists only almost-everywhere.) Furthermore, f Bor
0 (X)
(see Proposition 5.2.146), so that the integral is indeed defined for f . (Of course, the measure of the area under
the curve always makes sense, but we only consider it to be the integral for borel functions.)
d So that the terms of the net themselves make sense.
e This is well-defined in the sense that this agrees with the definition above in (5.2.157) in the case that f is
in fact globally integrable.
f m is lebesgue measure, simply to distinguish it from the measure on X. The strict < as opposed to is
L
crucial for making a couple of arguments easier, similar to how it is more convenient to work with [a, b) than it
is [a, b].
g That is, | f (x)| g(x) for almost every x X.

h This is part of the conclusion, not part of the hypothesis


i Seriously, people actually claim that the lebesgue integral is not geometric. Da fuq? Can someone please
explain to me how limits of partitions of sums of areas of rectangles is more geometric than the area under the
curve?
j Another way to see this is from (i) and the fact that m (Q) = 0 because Q is countablesee Exercise 5.2.70.
L
k For two reasons: (i) it tells you immediately what the variable you are integrating with respect to is (just a
slight convenience, especially when the integrand is complicated) and (ii) it is more analogous to how we write
d
the derivativeno one writes d f (x)/ dxpeople write dx
f (x).
l Proof adapted from [19, pg. 377].
m This requires the use of the strict inequality in the definition of and .
g
f
n Why?
o If the net is eventually bounded, the Limit superiors and limit inferiors tells us it converges in R. Otherwise,
it converges to .
p You caught me again: this is false if m(X) = 0. Heres another cookie: *gives cookie to reader*.
q Or fortunately, I suppose, depending on your perspective.
r Actually, a subsequence.
s I find it fascinating how relatively trivial this is compared to additivity, despite this having a name attached
to it and additivity practically taken for granted. On the contrary: Lebesgues Monotone Convergence Theorem
requires just basic theory of outer-measuresadditivity on the other hand requires a reasonably well-developed
theory of topological measure spaces!

This result presents the properties of the integral which uniquely characterize it (as well as a
couple other absolutely fundamental facts about the integral). Of course, the integral satisfies many
other properties as well.
Exercise 5.2.206 Let hX, mi be a topological measure space and let f Bor+
0 (X). Show
R

that if
R

d m(x) f (x) = 0, then f = 0.

Remember: the statement that f = 0 in Bor+


0 (X) just means that (any function that
represents) it is 0 almost-everywhere.

Proposition 5.2.207 Triangle Inequality Let hX, mi be a topological measure space

and let f Bor(X) be integrable. Then,


Z
Z


d m(x) f (x) d m(x) | f (x)| .
X

X

(5.2.208)

Chapter 5. Integration

266

Proof. First take f to be globally integrable | f | = f+ + f . Using this, we have


Z
Z

Z



d m(x) f (x) := d m(x) f+ (x) d m(x) f (x)



X

d m(x) f+ (x) +
X

d m(x) f (x) =
X

(5.2.209)

d m(x) | f (x)| .

The result follows for f a general integrable function because the absolute value (being
continuous) commutes with limits.

 Example 5.2.210 A monotone sequence of functions which converges to 0
whose integrals converge to This is actually quite easy. Not even close to being exotic

compared to some of the counter-examples we see. For example, define fm : R [0, ] by


fm := [m,) . Then, limm fm (x) = 0 for all x R, and so
Z

dx lim fm (x) = 0.

(5.2.211)

On the other hand,

R dx f m (x)

= for all m N, and so

dx lim fm (x) = .

lim
m

(5.2.212)

Fortunately, however, we do have commutation of limits with integrals in cases more than
just a nondecreasing sequence of functions, because of, for example, Lebesgues Dominated
Convergence Theorem.
As mentioned in a remark of the theorem that defined the integral (Theorem 5.2.155), integrability of the limit and each function in the net is not enoughyou really need them all to be
bounded by a single absolutely integrable function.
Example 5.2.213 A sequence of absolutely integrable functions converging to
a absolutely integrable function on a space of finite measure, but for which the
limit of the integral is not the integral of the limit For m N, define fm : [0, 1] R by


fm (x) := (m + 1)xm . Then, limm fm (x) = 0 almost everywhere, and so we have


Z 1
0

dx lim fm (x) = 0.

(5.2.214)

On the other hand, we have that

R1
0

dx fm (x) = 1 for all m N, and so

Z 1

lim
m

dx fm (x) = 1

(5.2.215)

Exercise 5.2.216 Can you find a counter-example to show that Lebesgues Dominated

Convergence Theorem fails if the net of borel functions is eventually bounded by an


integrable function instead of an absolutely integrable one?
We mentioned awhile back that sums are really just integrals over the counting measure.

5.2 The integral

267

Proposition 5.2.217 Let m denote the counting measure on N and let f Bor+
0 (hN, mi).

Then,
Z

d m(m) f (m) =
N

f (m).

(5.2.218)

mN

If youre ever teaching a child how to add two numbers, say 3 and 5, dont tell them
to integrate the function f : {x1 , x2 } N, f (x1 ) = 3 and f (x2 ) = 5, over the two
point space equipped with the counting measure. As elucidating as that may seem,
its circularwe needed to know how to add natural numbers to get to this point. Its
probably best to teach them that it is the cardinality of the disjoint union of the set
{0, 1, 2} and {0, 1, 2, 3, 4}.a

Proof. For m N, define


m

fm :=

{k} f (k).

(5.2.219)

k=0

Note that m 7 fm is nondecreasing and converges to f . Hence, by Lebesgues Monotonic


Convergence Theorem,
Z

d m(m) f (m) =
N

d m(m) lim {k} (m) f (k) = lim


n

= b lim f (k)
n

k=0

k=0

d m(m)
N

{k} (m) f (k)


k=0

Z
N

{k} (n) = lim f (k) m({k}) = lim f (k)


n

k=0

k=0

f (m).

mN

(5.2.220)

a Dont

forget to explain that a finite cardinal is an equivalence class of sets, all of which have the property
that there is no bijection onto a proper subset, with respect to the equivalence relation that is isomorphism in the
category sets. If you omit this, I fear you might risk confusing them.
b By additivity and nonnegative-homogeneity.

Proposition 5.2.221 Let hX, mi be a topological measure space and let f : X [, ]

be absolutely integrable. Then, for everyR > 0, there is some > 0 such that, whenever
m(S) < is measurablea , it follows that S d m(x) | f (x)| < .
Proof. For convenience, write g := | f |. Let > 0. For m N, define
Mm := g1 ([0, m])

(5.2.222)

Note that this is measurable because g is borel (see Proposition 5.2.129). It follows that
gm := f |Mm is borel. As m 7 gm is nondecreasing and converge to g in Bor+
0 (X), it follows
from Lebesgues Monotone Convergence Theorem that
Z

d m(x) gm (x) =

lim
m

d m(x) g(x).
X

(5.2.223)

Chapter 5. Integration

268

Thus, there is some m0 Z+ such that, whenever m m0 , it follows that


Z

d m(x) [g(x) gm (x)] < .

(5.2.224)

Define :=
Z

m0 .

Now, let S X be such that m(S) < . Then,

d m(x) | f (x)| =

d m(x) g(x) =
S

Z
X

d m(x) [g(x) gm0 (x)] +

Z
S

d m(x) gm0 (x)


(5.2.225)

d m(x) [g(x) gm0 (x)] + m0 m(S) < + m0 = 2.




a The measurability hypothesis is needed so that the integral over S is defined. Recall that
so for this to be defined, we require that S be borel, which requires that S be measurable.
b Because g
m0 is bounded by m0 .

:=

S , and

Exercise 5.2.226 Let I R be an interval, let f Bor(I) be absolutely integrable, and let
Rx

a I. Show that the map x 7

dt f (t) is uniformly-continuous.

The following is an important result that we will need to prove the Fundamental Theorem of
Calculus in the next chapter. To state it, we will need to introduce a couple of definitions.
Definition 5.2.227 Upper-semicontinuity and lower-semicontinuity Let f : X

[, ] be a function on a topological space X and let x0 X. Then, f is uppersemicontinuous at x0 iff for every > 0, there is an open neighborhood U of x0 such
that, whenever x U, it follows that f (x) f (x0 ) < . f is at x0 iff for every > 0, there is
an open neighborhood U of x0 such that, whenever x U, it follows that f (x0 ) f (x) < .
R

Upper-semicontinuity means that you can force f (x) to be as close as you like to
f (x0 ) from above (by moving sufficiently close to x0 ), but you have no control over
what happens below f (x) can be arbitrarily negative in a neighborhood of x0 and it
will still be upper-semicontinuous. Similarly, for lower-semicontinuous.

Theorem 5.2.228 Carathodory-Vitali Theorem. Let hX, mi be a topological measure

space and let f Bor(X) be absolutely integrable. Then, for every > 0, there are u, v
Bor(X) such that
(i). u f v;
(ii). u is upper-semicontinuous and bounded above;
(iii). v is lower-semicontinuous and bounded below; and
(iv).
Z

d m(x) [v(x) u(x)] < .

(5.2.229)

Proof. a We first prove the case for f nonnegative.


Then, by Proposition 5.2.148, there is an nondecreasing sequence m 7 sm of simple
functions that converges to f in Bor+
0 (X). Then, as
sm = (sm sm1 ) + (sm1 sm2 ) + + (s1 s0 ) + s0 ,

(5.2.230)

5.3 L p spaces

269

we can also write f as a series of simple functions


f=

m M

(5.2.231)

mN

for m > 0 and Mm X measurable. Let > 0, and by inner and outer-regularity let
Km Mm Um be such that Km is quasicompact, Um is open, and
m m(Um \ Km ) <

2m .

(5.2.232)

As f was assumed to be absolutely integrable, it must be the case that


Z

m m(Mm ) =

d m(x) f (x)

(5.2.233)

mN

converges, and so there is some m0 N such that

m m(Mm ) < .

(5.2.234)

m=m0 +1

Finally, define
m0

v :=

m Um and

m=0

m K

(5.2.235)

m=0

Exercise 5.2.236 Show that u and v satisfy the desired properties.


Exercise 5.2.237 Finish the proof by doing the general case by writing f = f+ f .


a Proof

adapted from [22, pg. 56].

Riemann integrability

Finally, as I am required to teach you the riemann integral, I begrudgingly include a cop-out version
just to say I taught it.
Definition 5.2.238 Riemann integrable Let f : [a, b] R. Then, f is riemann inte-

grable iff f is bounded


and continuous almost-everywhere. In this case, its riemann integral
R
is defined to be ab dx f (x).
R

This is usually a theorem, but because I think its a waste of time to develop the
riemann integral when the lebesgue integral does everything the riemann integral
does (and much more), I include it as a definition just to say I did it. Congratulations!
You now know the riemann integral!

5.3 L p spaces
The idea of the L p spaces is simplelets study a space by studying the functions on that space. In
this case, the space is a topological measure space, and so the relevant functions will be the borel
functions. In a perfect world, I suppose we would use the integral to put (semi)norms on Bor(X)

Chapter 5. Integration

270

itself, but unlike in the continuous case where we had theorems (namely the Quasicompactness
and the Extreme Value Theorem) to guarantee the finiteness of the supremum seminorms on
quasicompact sets, there are no such theorems in this casetheres no getting around the fact that
some borel function will have infinite integral on any set of positive measure. And so, instead of
studying all of the borel functions at once, we study certain subsets of them. The L p spaces are
certain nice spaces of borel functions, namely the ones whose pth power is absolutely integrable.
Before we do anything else, we define the so-called L p norms.
Definition 5.3.1 L p -norm Let hX, mi be a topological measure space and let p [1, ].

Then, the L p norm, || p : Bor(X) [0, ], is defined by

1
R
p p
(
d
m(x)
|
f
(x)|
)
X
| f | p :=
sup y R : m f 1 ([y, ]) > 0

p 6= .
p=

(5.3.2)

If you take X := {1, . . . , d} to be a d point set with the counting measure, this reads

|v| p = (|v1 | p + + |vd | p ) p ,

(5.3.3)

where we have suggestively written vk := v(k). In particular, L2 ({1, . . . , d})


=HilR
Rd .a
R

| f | is the essential supremum. Intuitively, youre taking the supremum over all
y-values that that f is at least as large as on a set of (strictly) positive measure. So for
example, the function that is 0 on Qc and on Q has essential supremum 0that
it is infinite on an infinite set does not matter, as this infinite set has measure 0.
Note that in the case that X is quasicompact, the supremum norm | f | as defined
in Example 4.3.85 agrees with the essential supremum norm | f | , so there is no
ambiguity. The reason this is called the p = norm is because lim p | f | p = | f |
(at least when | f | p < for some p < ).

The L is for lebesgue. I have no idea what the p is for. In fact, I think its a bad
choice of letter, but were pretty much stuck with it at this point.

a Hil is the category of (real) hilbert spaces. If you know what this is, greatheres another cookie *gives
R
yet another cookie to the reader*. If not, dont worry about it for nowas long as you intuitively see why this
is the same as the euclidean norm, youre fine.

Definition 5.3.4 Hlder conjugate Let 1 < p, q < . Then, p and q are hlder conju-

gate iff
1
p

+ 1q = 1.

(5.3.5)

1 and are hlder conjugates.


R

Note that 2 is hlder conjugate to itself.

One way to see that we dont want toa investigate the case 0 < p < 1 is because
then it does not have a hlder conjugate.b This really breaks things because then, for
example, L p spaces will not hold.

a Well,
b Or

usually not.
at least, its hlder conjugate would be negative.

5.3 L p spaces

271

Proposition 5.3.6 Hlders Inequality Let 1 p, q be hlder conjugates and let

f , g Bor(X). Then,
| f g|1 | f | p |g|q .
R

(5.3.7)

The case p = 2 = q is called the Cauchy-Schwarz Inequality. In the case that X is


a d point space with the counting measure, it is literally the statement that the dot
product of two vectors is at most the product of their euclidean norms.

Proof. If both | f | p = = |g|q , then the inequality is trivially satisfied. If one of | f | p and
|g|q is 0, without loss of generality, say | f | p , then we get that | f | = 0 by Exercise 5.2.206,
and so | f | = 0, and so | f g| = 0, and so | f g|1 = 0,a and so in this case the inequality is
satisfies as well. Thus, we may as well assume that 0 < | f | p , |g|q < . Then, replacing f by
f
g
| f | and |g| , we see that it suffices to show that | f g|1 1 if | f | p = 1 = |g|q .
p

First take 1 < p, q < . Let us write 2s = | f | p and 2t = |g|q for s,t Bor(X).
Exercise 5.3.8 Show that that if a + b = 1 with a, b 0, then 2ax+by a2x + b2y .
R

Note that this does not necessarily work if a or b is negative. Thus, is the
point where the proof breaks down if p < 1 (because then either p is negative
or q is negative).

This is called convexity.

Using this, we obtain


| f ||g| = 2s/p 2t/q = 2s/p+t/q

2s 2t
| f | p |g|q
+ =
+
.
p
q
p
q

(5.3.9)

Integrating this inequality gives Hlders Inequality.


Exercise 5.3.10 Prove the result for p = 1 and q = .


a I feel as if this notation,

is the number defined by

while consistent, is a bit obtuse| f | is the function defined by x 7 | f (x)|, and | f |1


d
X m(x) | f (x)|. I may go back and change things in the future if I ever find the time.

Proposition 5.3.11 Minkowskis Inequality Let hX, mi be a topological measure space,

let f , g Bor(X), and let 1 p . Then,


| f + g| p | f | p + |g| p .

(5.3.12)

Proof. First take 1 < p < . Then,







|( f + g) p |1 = f ( f + g) p1 + g( f + g) p1 1 a f ( f + g) p1 1 + g( f + g) p1




b | f | p ( f + g) p1 q + |g| p ( f + g) p1 q
(5.3.13)



p1
= ( f + g) q | f | p + |g| p .

Chapter 5. Integration

272

where q is the hlder conjugate of p. This implies that (p 1)q = p, and so


1
Z
q



p1 q

( f + g) p1 :=
d m(x) ( f (x) + g(x))
q
X

Z
=

d m(x) | f (x) + g(x)| p

(5.3.14)

1

= |( f

1
+ g) p |1q .

So in fact, the previous inequality reads


p

|( f + g) |1 |( f
Exercise 5.3.16

1
p q
+ g) |1



| f | p + |g| p

(5.3.15)

Finish the proof in the cases where |( f + g)| p ]1 = 0 and

|( f + g) p |1 = .
If 0 < |( f + g) p |1 < , then we may divide this inequality by this factor to obtain
| f + g| p := |( f

1
p p
+ g) |1

1
1 q

= |( f + g) |1

| f | p + |g|q .

(5.3.17)

Exercise 5.3.18 Do the case p = 1 and p = .


a By
b By

the usual Triangle Inequality.


L p spaces.

L p spaces is just the statement that the norm8 || p satisfies the triangle inequality, something
we needed if it were to be a norm on a vector space. Now that we know this, we are free to defined
the L p spaces.
Definition 5.3.19 L p spaces Let hX, mi be a topological measure space and let 1 p

. Then, L p (X) is the normed vector space defined by


L p (X) := { f Bor(X) : | f | p < },

(5.3.20)

with the addition and scalar multiplication the same as that in Bor(X) equipped with the L p
norm, || p .
R

8 In

I dont like this definition. I think it is artificially cooked-up because people are too
afraid to work with things that arent metric spaces. A definition like this is analogous
to studying only the bounded continuous functions on a topological space, instead of
all continuous functions MorTop (X, R). That is, you restrict your class of functions
for no other reason that you want one norm, instead of a family of seminorms.a I
think it is much more natural to look at compactly integrable functions. Then, just
as in MorTop (X, R), you would obtain a seminorm for each quasicompact subset.
For one thing, continuous functions are all compactly integrable, and so we would
p
have MorTop (X, R) Lloc
(X). Obviously this fails for the global L p spaces. For
example, x 7 x is not an element of L1 (R). In fact, I prefer the local approach so
much more that I think I may add it at a later date when I have more time. For the
moment, however, the usual L p spaces it is.

quotes because we dont know a priori that it is a norm.

5.3 L p spaces

273

a The supremum norm of a bounded function is always finite, even if your topological space is not necessarily

quasicompact.

One hugely important property that the L p spaces have is that they are complete.
Theorem 5.3.21 Riesz-Fischer Theorem. Let hX, mi be a topological measure space

and let 1 p . Then, L p (X) is complete.


Proof. We leave this as an exercise.
Exercise 5.3.22 Prove this result yourself.
R

Hint: Yet another one that should not be an exercise. This time, check-out
[24, pg. 70].


Remember way back when we did Heine-Borel? We mentioned there briefly that the result
fails to hold in general, that is, there are spaces in which closed bounded sets are not necessarily
quasicompact. The L p spaces provide examples of for which this can happen.
Example 5.3.23 A closed bounded set in a normed vector space that is not
quasicompact We take our topological measure space to be XhNi equipped with the


counting measure. Note then that elements of L1 (X) are just sequences m 7 am for which

|am | < .

(5.3.24)

mN

Consider the closed unit ball D1 (0) := { f L1 (X) : | f |1 1}.


Exercise 5.3.25 Check that D1 (0) is closed.

It is clearly bounded. We show that D1 (0) is not quasicompact. To do so, we construct


a sequence in D1 (0) which has no convergent subnet. For m N, let am be the sequence
which sends m to 1 and everything else to 0. So, for example,
a0 := (1, 0, 0, 0, . . .)
a1 := (0, 1, 0, 0, . . .)
a2 := (0, 0, 1, 0, . . .)

(5.3.26)

etc.
We then have that
(
2
|a a |1 =
0
m

if m 6= n
.
if m = n

(5.3.27)

This shows that there is not cauchy subnet, and hence certainly no convergent subnet.
TO DO
Second derivative test

6. Differentiation

Finally we are ready to begin our study of differentiation. We could have started this awhile ago,
but we really needed certain facts about continuity, uniform convergence, and even integration,
before we could address all the facts we care about related to differentiation. This thus led us to a
relatively broad study of the most general spaces1 in which these notions makes sense. For the time
being, however, we return to Rd .

6.1

Tensors and abstract index notation


Throughout this chapter, all vector spaces will be finite-dimensional and real. For the
remainder of this section, V and W will always denote such vector spaces. We omit
proofs in this section under the assumption that you have either already seen the results
or can prove them on your own if you so desired.
We plan to do differentiation in Rd , and to do this (instead of just in R), it will be useful to
know some basic facts about linear algebra. The real motivation for taking this small side route is
that we want to use Penroses abstract index notation.2
If you dont understand the details of this section your first time through, thats fine.
Learn what you can, and then come back as you need to when the concepts come up in
the actual differentiation part of the chapter.
We first discuss the dual space and tensor product.
Definition 6.1.1 Dual space The dual space of V , V , is defined to be

V := MorVectR (V, R).


1 Meh,

(6.1.2)

basically anyways

2 This is conceptually different,

but mechanically very similar to Einsteins index notation. You might say that abstract
index notation is choice-free Einstein index notation (the choice of course being a choice of basis).

Chapter 6. Differentiation

276

V has the structure of a vector space by defining addition and scalar multiplication pointwise. The elements of V are linear functionals or covectors.
R

Though we have not precisely defined in yet, the category VectF , for F a field, is
the category whose objects are vector fields over F and whose morphisms are linear
transformations. Thus, MorVectR (V, R) is our fancy-schmancy notation for the vector
space of linear functions from V into R.

In other words, the elements of V take in elements of V and spit out numbers.
This is actually not that foreign of a conceptfor example, the derivative takes in a
vector (the direction in which to differentiate) and spits out a number (the directional
derivative in that direction).
We reserve the notation V for the conjugate-dual (something we wont see in these
notes)this is why we write V instead of V .

If V comes with a topology, youre only going to want to look at the continuous linear
functionals. Of course, you can look at all of them (including the discontinuous ones),
but in comparison this space will be an ugly beast of a motherfucker.a

a Can

you tell I write not much differently from how I speak? ;-)

Of critical importance is that the dual of the dual is the original vector space.3
Proposition 6.1.3 The map V 3 v 7 v (V ) , where v : V R is defined by

v () := (v)

(6.1.4)

is an isomorphism.
Warning: This is false in infinite dimensions. You will see that we show that this map
is injective and linear, and so by the Rank-Nullity Theorem (something specific to
finite-dimensions) is an isomorphism.

Definition 6.1.5 Tensor product Let V and W be finite-dimensional real vector spaces.

Then, the tensor product of V and W , V W is defined to be the set of all functions
T : V W R such that
(i). for each fixed V , the map 7 T (, ) is linear; and
(ii). for each fixed W , the map 7 T (, ) is linear.
Let v V and w W . Then, the tensor product, v w V W , is defined by
[v w](, ) := (v)(w).
R

3 Careful:

(6.1.6)

To clarify, there are tensor products of vector spaces, and then there are tensor
products of vectors themselves. The tensor product of two vectors lives in the tensor
product of the corresponding vector spaces. And in fact, everything in V W , while
not of the form v w itself necessarily, can be written as a finite sum of elements of
this formsee Proposition 6.1.7 below. (Elements of the form v w are sometimes
called pure or simple, as opposed to, e.g. , v1 w1 + v2 w2 ).

It will frequently be the case that V is isomorphic to V , but in a noncanonical way. On the other hand
and V are canonically isomorphic. The way to make this intuition precise requires more category theory than is
probably helpful. Suffice it to say, the idea is to show that the constructions (read functors) are isomorphic, not the
objects themselves.

(V )

6.1 Tensors and abstract index notation

277

In other words, V W is the set of all bilinear maps on V W , where bilinear


means that, if you fix all arguments except one, you obtain a linear map.

In practice, I find it easier to think of the tensor product as the vector space spanned
by guys of the form v w (as opposed to bilinear maps on the cartesian product of
the dualsick).

This is neither the most general nor the most elegant definition of the tensor product.
The right way to define the tensor product is, as usual, by finding the properties
which uniquely characterize it. Maybe I will update the notes at a later date to include
this, but my personal feeling right now is that this would take us a bit too far astray
(after all, were not studying linear algebra or tensors for their own sakefor us,
theyre just tools to do calculus).

Proposition 6.1.7 V W is the span of {v w : v V, w W }.


Definition 6.1.8 Tensor A tensor of rank hk, li over V is an element of

V V V V
| {z } |
{z
}
k

(6.1.9)

k is the contravariant rank and l is the covariant rank. If l = 0, then the tensor is contravariant, and if k = 0, then the tensor is covariant. If T is a tensor of rank hk, li, then we shall
write
T a1 akb1 bl

(6.1.10)

to help remind us what type of tensor this is.a


R

Do not be sloppy and not stagger your indices! If you do, you will eventually make a
mistake. For example, later we will be raising and lowering indices. Suppose I start
with T ab , I lower to obtain Tba , and then I raise again to obtain T ba I should obtain
the same thing, but in general T ab 6= T ba , and so I have an error. It may seem obvious
to the point of being silly when I point it out like this, but this is a mistake that is
easy to make if there is a big long computation in between the raising and lowering
(especially if its more than just a and b floating around). And of course, you will
never have this problem if you stagger: T ab goes to T ab goes back to T ab .

I claim that this is likewise not that foreign of a concept. In fact, there are so many
examples you are familiar with that I dont even want to put them in a remark, so see
the next example.

a Dont let the indices mislead youthere are no choices being made. Everything is coordinate-free, even
later when we start manipulating the indices themselvesits still all coordinate-free.

 Example 6.1.11 Disclaimer: While none of the examples themselves make use of things
we havent done yet, some of the notation does (e.g. va a ). Read onwards and come back
later if this really bothers you.

(i). Vectors (written va ) themselves are tensors of type h1, 0i.


(ii). Covectors (or linear functionals) (written a ) are of typea h0, 1i. For a linear
functional and v a vector, (v) is written as va a .

Chapter 6. Differentiation

278

(iii). The dot product (written temporarily as gab ) is an example of a tensor of type h0, 2iit
takes in two vectors and spits out a number, written v w = va wb gab .
(iv). Linear transformations (written T ab ) are tensors of type h1, 1iit takes in a single
vector and spits out another vector (written va 7 T ab vb ). Note that it is T ab and
not T ba your convention could go either way, but in the convention we choose the
indices that are contracted during composition are closer together.
a By definition. The term covector is our word for a tensor of type h0, 1i, or in other words, an element of
V (i.e. a linear functional).

There are three key constructions involving tensors that we will need, the tensor product,
contraction, and the dual vector. The tensor product we have already done in Definition 6.1.5,4 and
so we simply explain the how to write the tensor product in index notation.
a1 ...ak1

The tensor product of [T1 ]


a1 ...ak1

[T1 ]

a1 ...ak2

b1 ...bl1

and [T2 ]

a1 ...ak2
b1 ...bl1 [T2 ]
b1 ...bl2 .

b1 ...bl2

is denoted
(6.1.12)

That is, you literally just juxtapose them.


We now turn to contraction.
a1 ...ak
b1 ...bl

:= [v1 ]a1 [vk ]ak [1 ]b1 [l ]bl be a tensor of rank hk, li (recall (Proposition 6.1.7) that every tensor can be written as a sum of
tensors of this form). Then, the contraction of T along the ai and b j index is defined to be
Definition 6.1.13 Contraction Let T

a1 ...ai1 ai+1 ...ak


b1 ...b j1 b j+1 ...bl :=
a1
j (vi ) [v1 ] [vi1 ]ai1 [vi+1 ]ai+1 [vk ]ak [1 ]b1 [ j1 ]b j1 [ j+1 ]b j+1 [l ]bl .

(6.1.14)
The contraction of tensor that is a sum of simple tensors is defined to be the sum of the
contraction of those simple tensors.
R

In general, the way one can decompose a general tensor as a sum of simple ones is
not unique, so we must technically check that this is well-defined.

This might look atrocious, but its actually quite simple. Covectors take in vectors
and spit-out numbers, and so the contraction of a tensor product in its ai and b j index
is formed by plugging in the ith vector into the jth covector.

Keep in mind that you can only contract upper-indices (contravariant) with lower
(covariant) ones.

A couple of examples: The only possible contraction of the tensor va b is written


va a and this of course is just (v). If T ab is a linear transformation and va is a
vector, then the contraction T ab vb is just T (v), that is, the image of v under the linear
transformation T .

The index k in [vk ]ak is part of the name of the vectorthe entire name is vk , and then
the notation [vk ]ak reminds us that vk is a h1, 0i-tensor, i.e. just a vector.

4 Well, I suppose we have to define the tensor products of arbitrary tensors, as opposed to just vectors, but the
definition in general is just an extension of the one weve already written down.

6.1 Tensors and abstract index notation

279

We need one more ingredient before we can get to actual differentiation, namely that of a
metric.
Definition 6.1.15 Metric (on a vector space) A metric ga on V is a covariant tensor

of rank 2 such that


(i). (Symmetry) g(v1 , v2 ) = g(v2 , v1 ); and
(ii). (Nonsingularity) the map from V to V defined by v 7 g(v, ), where g(v, ) is the
linear functional which sends w to g(v, w), is an isomorphism of vectors spaces.
R

If va is a vector, then we write va := gab vb . va is the dual vector (which itself is not a
vectorits a covector) of va . Nonsingualirty is key because it allows us to reverse
this process. If a is a covector, then because the map va 7 va is an isomorphism,
there is a unique vector, written a , that is equal to a under this map.

(i) can be written gab = gba . Also note that g(v1 , v2 ) = [v1 ]a [v2 ]b gab .

The idea of a notion of a metric on a vector space and a metric on a set (in the context
of uniform space theory) have little to nothing to do with each other. It is merely a
coincidence of terminology that is so ingrained that even I dare not go against it.

The term metric in this sense of the word should really not be thought of as a
sort of distance, but rather as a sort of dot product. Indeed, you can verify that the
dot product is a metric, and furthermore, in a sense that we dont bother to make
precise, every positive-definite metric (on a vector space) is equivalent to the usual
euclidean dot product. There is some connection with the other notion of metric,
howeverpositive-definite metrics give us norms (the square-root g(v, v)), which in
turn gives us a metric (in the other sense).

Nonsingularity is usually replaced with the requirement that g(v, w) = 0 for all w
implies that v = 0 (called nondegeneracy. In finite dimensions, this is equivalent to
nonsingularity (by the Rank-Nullity Theorem). In infinite dimensions, however, they
are not equivalent, and it is nonsingularity that we want (so that we can raise and
lower indices).

a The

g is for gravity (in general relativity, gravity is modeled as a metric on space-time).

Its worth noting that, everything except raising and lowering indices we can do without a
metric. To raise and lower indices, we do need that extra structure. In particular, if you pick a
different metric, then your meaning of va will change even though the metric does not appear
explicitly in this notation.
In summary:
(i). The tensor product of two vectors va and wa , written va wb , is defined to be the bilinear map
that sends the pair of covectors ha , a i to (a va )(a wa ). In practice, its not particularly
helpful to think of what this is5 in practice what matters is can you manipulate them.
(ii). A general tensor of rank hk, liis an element in the tensor product of k copies of V with l
copies of V .
(iii). The definition of the tensor product of vectors can be extended to the tensor product of any
tensors. In index notation, this is denoted simply by juxtaposition.
5 When

you add 2 to 3 do you think about 2 being an equivalence class of sets with respect to the equivalence relation
of isomorphism in the category of sets? Here, Ill help you out: No, you do not.

Chapter 6. Differentiation

280
(iv). We can contract indices.
(v). If we have a metric, we can also raise and lower indices.

6.2

The definition
One thing that I personally found conceptually confusing with differentiation in Rd itself that was
elucidated for me when passing to the study of more general manifolds was the distinction between
a vector and a point. The problem in Rd of course is that the space of points and the space of factors
are effectively the same thing, they are both given by a d-tuple of real numbers, when in fact, they
are really playing quite different roles. The points tell you where we are and the vectors tell you
what direction to go in. In a general manifold, the points that tell you where form the points of
the space itself and the vectors do not live in the entire space itself, but rather the tangent spaces.
Thus, while we have no intention of doing manifold theory in general,6 we will make use of
some suggestive notation that comes from the theory.
Throughout this chapter, the symbol Rd will be used to denote d-dimensional euclidean
space as a metric space. For each x Rd , we define T(Rd )x, the tangent space at x in
Rd to be the metric vector space Rd (with metric being the dot product).7 Furthermore,
we declare that T(Rd )x1 6= T(Rd )x2 for x1 6= x2 .8 We will often, but not always, use
abstract index notation for vectors va T(Rd )x to help remind us that they are to be
thought of as vectors instead of points. It will sometimes be useful to assign other
vectors spaces to each point (for example, for a function f : Rd Rm := V ). In general,
the assignment of a vector space Vx to each point of Rd will be called a vector bundle
on Rd . The assignment to each point of the tangent space is a special vector bundle
called the tangent bundle.
In particular, as sets, we might have that Rd = T(Rd )x, but thats itthe two objects dont even live
in the same category, and so it doesnt even make sense to ask whether there is some isomorphism
between them (unless you forget some of the structure, in which case youre actually changing the
object). For example, Rd is just a metric spaceyou cannot add any two of its elements. Tangent
vectors, elements of T(Rd )x, on the other hand, we can add just fine.
To clarify, this is not actually how the definition goes in general. The general definition of the
tangent space requires us to first be able to talk about manifolds, which in turn requires us to know
how differentiation in Rd works, which of course we have not done yet. Thus, we are making use
of this notation only to help clarify the study of differentiation in Rd . In principle, once enough
of this theory has been developed so that we can talk about tangent spaces in general, we would
replace the above with the actual definition. Likewise, this is not the actual definition of vector
bundlesfor us, the term vector bundle is just a phrase in the English language that we are
going to use to communicate mathematical ideas. It requires basic manifold theory to define vector
bundles properly.
This speak of tangent spaces allows us to make an important definition.
Definition 6.2.1 Tensor field A tensor field of rank hk, li is a function on Rd whose

value at x is a rank hk, li tensor on T(Rd )x. A tensor field T a1 ...akb1 ...bl is continuous
6 In contrast to topology, for example, you do have to prove essentially all of your results in Rd first and then extend
them to arbitrary manifolds, whereas in principle you can prove all the results about topology you ever wanted without
even mentioning R. This is one reason among others why we do general topology but not manifold theory.
7 Careful: the word metric here is being used in two totally different senses. I know the terminology is perverse, but
dont look at me! Im not the one that came-up with it.
8 There are many ways to do this, but one way, for example, is to take T(Rd )x := Rd {x}.

6.2 The definition

281

iff [v1 ]b1 [vl ]bl [1 ]a1 [k ]ak T a1 ...akb1 ...bl is continuous for all vectors [v1 ]a , . . . , [vl ]a and
covectors [1 ]a , . . . , [k ]a .
R

6.2.1

Its just an assignment of a tensor to every point in Rd . For example, a vector field is
an assignment of a vector to every point.

The definition itself


Finally, with this (hopefully elucidating) notation in hand, we can define the derivative.
Definition 6.2.2 Directional derivative Let S Rd , let f : S R, let x Int(S), and

let v T(Rd )x. Then, the directional derivative of f at x in the direction v, Dv f (x), is
defined by
Dv f (x) := lim

0

f (x + v) f (x)
.


(6.2.3)

If this limit exists, then f is differentiable at x in the direction va .


R

The expression inside the limit on the right-hand side of (6.2.3) is the difference
quotient.

Note that we only define the derivative at points on the interior of the domain. The
reason we do this is because otherwise we cannot guarantee that x + v is in the
domain of f for even a single > 0, so that f (x + v) does not make sense, in which
case it is nonsensical to ask whether the limit of the difference quotient exists or not.

For some reason, people tend to write limh0 in this definition instead of lim0 . Not
sure why.  is used for small numbers everywhere else in analysis. Moreover, what
happens if your function is called h (not that uncommon of a name for a function)?
What are you going to do? Write h(x + hv)? Ew. Maybe you object But  cant
be negative! Thats against the rules!. No. is typically taken to be positive, but I
am not writing that. . . I am writing . The distinction is deliberate is used to
indicate a small positive number and  is used to indicate a small (not-necessarilypositive) number.a If you think the difference between these symbols is hard to notice,
I recommend you increase your TEXnichal skillsb (one is \varepsilon and the
other is \epsilon)c .

aI

do not guarantee that I will never break this convention.


get glasses.
c Actually, if youre really T Xnichal, you might have noticed that this itself is a slight liebecause of font
E
issues, I had to steal \epsilonup from the newpx package.
b Or

Definition 6.2.4 Derivative (of a function) Let S Rd , let f : S R and let x Int(S).

Then, f is differentiable at x iff


(i). Dv f (x) exists for all v T(Rd )x;
(ii). the map T(Rd )x 3 v 7 Dv f (x) is linear; and
(iii). the map T(Rd )x 3 v 7 Dv f (x) is continuous.a
In this case, we write
va a f (x) := Dv f (x)

(6.2.5)

to indicate the linearity. f is differentiable (on S) iff f is differentiable at x for all x Int(S).

Chapter 6. Differentiation

282
R

Warning: This is not the universally accepted definition of differentiability. This


is (almost) what is referred to as gteau differentiability as opposed to a stronger
definition called frchet differentiability. We say almost because we require that
the map va 7 va a f (x) be linear (and continuous), whereas gteaux differentiability
merely requires the existence of Dv f (x) for all v T(Rd )x. (Thus, our definition
is slightly stronger.) We chose this definition because it is easier and, IMHO, more
natural, than frchet differentiability.b That being said, it does have its pathologies.
Of particular note is that there are differentiable functions that are not continuoussee Example 6.5.1. Fortunately, this cannot happen in one-dimensionsee
Proposition 6.5.8.

Of course, there exist examples in which the directional derivative is not linear
(otherwise we would not have made this an assumption)see the following example
(Example 6.2.10). In particular, if we say that f is differentiable at x, then we are
assuming the directional derivative is linear, not only that it exists.
Despite this pathology, fortunately, the derivative is homogeneous, a fact particularly
relevant in one-dimension.

Exercise 6.2.6 Let S Rd , let f : S R, let x Rd , let v T(Rd )x, and let R.

Show that, if Dv f (x) exists, then Dv f (x) exists and furthermore that
Dv f (x) = Dv f (x).
R

(6.2.7)

A corollary of this is that, for a function f : R R to be differentiable at a point, it


need only have one directional derivative (because then it has all directional derivatives by the previous exercise). Furthermore, in one dimension, there is essentially
only one choice of va everything else is a scalar multiple. Thus, in one dimension,
we always take va = 1 T(R)x
=VectR R and write
d
f (x) := va a f (x).
dx

(6.2.8)

The notation va a f is obviously suggestive. Hopefully it reminds you of the fact from
multivariable calculus that the directional derivative of a function f in the direction ~v
is given by the dot product of ~v with the gradient of f : ~v ~ f .c For fixed f and x, the
map va 7 [va a f ](x) is linear in va , and so defines a linear functional, that is to say,
a f (x) T(Rd )x , or equivalently, that a f is a covector field on Rd . This covector
field is the derivatived or gradient of f at x.

Perhaps more accurate notation would have been [va a f ](x), that is, va a f is a
function (in the case that f is differentiable anyways), and so va a f (x) is the value
of va a f at x, as opposed to, the derivative of the function f (x), which is just 0 of
course.e The point is: you must compute the derivative, and then plug-in x. It might
seem silly in such a simple context, but in much more complicated contexts I myself
have made this very mistake. For example, suppose you are computing a functional
derivative (to find a noether charge, say) of some action functional in physics, and
you want to see that this quantity is conserved on-shell (i.e. when the equations of
motion hold)you cannot use the equations of motion before you finish computing
the noether charge off-shell: thats cheating (and more importantly, possibly just
plain wrong)!f

Suppose that f is differentiable. Then, va a f itself is a function on all of Rd , and so


we may differentiateg this as well to obtain
wb b (va a f ) = wb va b a f .

(6.2.9)

6.2 The definition

283

(Warning: This equality will not hold in general if va is a nonconstant function of x.)
Just as a f (x) defined a covector on T(Rd )x, so to does b a f (x) define a covariant
2-tensor on T(Rd )x.h
If all higher derivatives of f exist, then f is infinitely-differentiable. If all higher
derivatives of f exist and they are continuous, then f is smooth.i

a Actually,

in finite dimensions, continuity follows from linearity for free (though we will not check this).
We state this explicitly because you will want to do so when you generalize to infinite dimensions, where you
wont get continuity for free.
b We certainly need the map v 7 D f (x) to be linear if we are to do tensor calculus (otherwise the derivative
v
itself wont be a tensor).
c Contraction of indices should always be thought of as a sort of dot product.
d That is to say, f is the derivative and va f is the derivative in the direction of va (or the directional
a
a
derivative in the direction of va ).
e I know to some this may seem pedantic, but f is the function, f (x) is the value at x of the function, so that
f (x) is just a number.
f Once again, if the physics analogy is meaningless to you, just ignore it.
g If the limit exists, of course
h Note that a covariant 2-tensor is an element of T(Rd )x T(Rd )x . When we say it is a covariant 2-tensor
on T(Rd )x, this is what we meanits neither a (real-valued) function nor an element of T(Rd )x.
i Yes, there is a difference. In fact, infinitely-differentiable functions need not even be continuoussee
Example 6.5.1.

Example 6.2.10 A function which is gteaux differentiable but not differentiable

In other words, we seek a function all of whose directional derivatives exist at a point, but
for which the map v 7 Dv f (x) is not linear.
Define f : R2 R by
(
0
if hx, yi = h0, 0i
f (hx, yi) :=
.
(6.2.11)
x2 y
otherwise
x2 +y2
Let v := hvx , vy i T(Rd )h0, 0i be nonzero (the directional derivative in the direction of the
zero vector is alwaysgaspzero). Then,
f (h0, 0i + v) f (h0, 0i)
=


(vx )2 (vy )
(vx )2 +(vy )2

v2x vy
.
v2x + v2y

(6.2.12)

In particular, the limit of this as  0 always exist. On the other hand, the map
v 7

v2x vy
=: Dv f (h0, 0i)
v2x + v2y

(6.2.13)

is certainly not linear. For example,


Dh1,1i f (h0, 0i) =
a This

1
6= 0 + 0 = Dh1,0i f (h0, 0i) + Dh0,1i f (h0, 0i).
2

(6.2.14)

example comes from Ted Shifrins math.stackexchange answer.

In fact, things that are arguably even worse than this can happenit can happen that all
directional derivatives exist and furthermore the map that seconds a vector to the directional
derivative in that direction is linear, but yet the function not even be continuous. Unfortunately, we

Chapter 6. Differentiation

284

will have to postpone such a counter-example until after having defined the exponential function
see Example 6.5.1
We can use the definition of the derivative of a function to define the derivative for all tensor
fields. The idea is that, by plugging-in enough vectors and covectors, all tensor fields reduce to just
a function.
Definition 6.2.15 Derivative (of a tensor) Let T

Rd .

T a1 ...akb1 ...bl

a1 ...ak
b1 ...bl

be a tensor field of rank


is differentiable at x iff for all covectors 1 , . . . , k and

hk, li on
Then,
vectors v, v1 , . . . , vl .
h
i
(i). Dv [1 ]a1 [k ]ak [v1 ]b1 [vl ]bl T a1 ...akb1 ...bl (x) exists for all v T(Rd )x;

h
i
(ii). the map T(Rd )x 3 V 7 Dv [1 ]a1 [k ]ak [v1 ]b1 [vl ]bl T a1 ...akb1 ...bl (x) is linear;
and
h
i
(iii). the map T(Rd )x 3 v 7 Dv [1 ]a1 [k ]ak [v1 ]b1 [vl ]bl T a1 ...akb1 ...bl (x) is continuous.
In this case, the derivative, a T a1 ...akb1 ...bl , of T a1 ...akb1 ...bl is defined by
va [1 ]a1 [k ]ak [v1 ]b1 [vl ]bl a T a1 ...akb1 ...bl =


va a [1 ]a1 [k ]ak [v1 ]b1 [vl ]bl T a1 ...akb1 ...bl

(6.2.16)

for all covectors 1 , . . . , k and vectors v, v1 , . . . , vl .


R

This makes sense because the thing inside the derivative on the right-hand side is just
a function.

This looks a lot more atrocious than it is. All we are doing is reducing tensor fields to
functions, and then applying the definitions we know for functions. So, for example,
consider a tensor field T abc of rank h2, 1i. For every choice of vector [v1 ]a and choice
of two covectors [1 ]a , [2 ]a , we obtain a function, [v1 ]a [1 ]b [2 ]c T bca . The tensor
T abc is then differentiable if every one of these functions is differentiable, and in this
case, the derivative is a tensor field that now takes in two vectors va , [v1 ]aand two
covectors [1 ]a , [2 ]a , and spits out the function va b [v1 ]b [1 ]c [2 ]d T cdb .

Among other things, we can how differentiate functions which take their values in
Rm , as we just interpret such a function as a vector field on Rd .a

Thus, the in particular shifts the covariant rank of a tensor up by 1 (for example, as
functions are just h0, 0i tensors, it takes functions to covector fields).

a We

are cheating a bit. Rm is not the tangent space of any pointyou can tell because it might not even
have the right dimension. The appropriate way to deal with this is to attach a new vector space Vx to each point,
where Vx
=VectR Re , and then interpret the value f (x) as an element of Vx . Fortunately, this has no effect on the
above definition. For such functions, I will write f , to remind us that f lives in a different vector bundle than
usual.

Example 6.2.17 Gradient, divergence, curl, and laplacian The gradient, diver-

gence, curl, and laplacian from multivariable calculus are all just special cases of derivatives
of tensors (along with other tensorial constructions such as contraction).
Throughout this example, let f be a smooth function on Rd and let va be a vector field on
Rd .

6.3 A smooth version of MorTop (Rd , R): C (Rd )

285

The gradient of f is simply a f . Youve already seen this guy.


The divergence of va is a va . Note how in multivariable calculus you might write this as
~ ~v, the dot product of the gradient with the vector field ~v. The index contraction here
hopefully makes obvious how this is the divergence that you know and (maybe) love.
The curl is trickier, but this is not surprising, as it was trickier than the rest in multivariable
calculus as well. First of all, the curl of a vector field is in general not a vector fieldthis
is very specific to three-dimensions, essentially because the cross-product is specific to
three-dimensions. In general, it is also more convenient to define the curl of a covector
field instead of a vector field.a For us, the curl of a covector field a will be defined to be
a b b a .b For what its worth, in higher dimensions, mathematicians dont call this
the curlthey call it the exterior derivative.c
The laplacian of a function is the divergence of the gradientliterally (modulo the raising
of an index), that is, the laplacian of f is a a f .
a They

are essentially equivalent, however, as you can translate back and forth by raising and lowering
indices.
b To get back a vector field in three-dimensions you have to in turn contract this with abc , where abc is the
unique completely antisymmetric contravariant tensor of rank 3 of trace 1. Completely antisymmetric means
if you flip any two indices you pick-up a minus sign (e.g. abc = bac ; of trace 1 means abc abc = 1. The
reasons you dont get back a vector in higher dimensions is because in higher dimensions this becomes a1 ...ad ,
and then you will have instead a contravariant tensor of rank d 2.
c Not sure whats exterior about it though.

6.3

A smooth version of MorTop (Rd , R): C (Rd )


Our goal in this section is to find a notion of convergence (i.e. a topology) that has the property that
a net of smooth functions converges to another smooth function. The statement that MorTop (Rd , R)
is complete from the last chapter (Theorem 4.5.9) says that continuous functions which converge
in MorTop (Rd , R) (i.e. converges uniformly on quasicompact subsets) converge to a continuous
function.9 Unfortunately, this fails miserably if we replace continuous with the word smooth.
In fact, we have have a sequence of smooth functions which converge uniformly to a function that is
differentiable-nowhere.10 To construct such a counter-example, we will have to wait until we have
access to trigonometric functions11 see Example 6.5.36. There is a simple solution howeverif
your derivatives are continuous, and you want them to converge to something continuous, just force
them to converge uniformly as well.
Definition 6.3.1 C (Rd ) C (Rd ) seminormed algebra space whose underlying set is the

collection of all smooth real-valued functions, whose algebra structurea is given by the
pointwise operations, and for each quasicompact K Rd , m N, and [v1 ]a , . . . , [vm ]a Rd
there is a seminorm ||K,v1 ,...,vm defined by
| f |K,v1 ,...,vm := sup |[v1 ]a1 [vm ]am a1 am f (x)| .

(6.3.2)

xK

9 In

C (Rd , Rm ) is the set of all smooth functions from Rd into Rm . The differences: (i)
there is now no longer any multiplication, and (ii) in the definition of the seminorm
you interpret the absolute value as the eudlicean norm in Rm .

fact, it says more than thisif your net is cauchy, it has a limit, and that limit is continuous.
it is of course necessarily continuous by completeness of MorTop (Rd , R).
11 Perhaps one can construct an example without them, but the standard example does use them.

10 Though

Chapter 6. Differentiation

286

We have not defined it, but in the category of manifolds, Man, it will turn out that
C (Rd , Rm ) := MorMan (Rd , Rm ).b

a Recall that an algebra (Definition 4.3.80) is a vector space that is also a ring subject to a compatibility
condition. In other words, you can add things, you can multiply things, and you can multiply by scalars.
b The objects in Man are manifolds, not men. Dont objectify people. Its wrong.

Proposition 6.3.3 Let 7 f C (Rd ) be a net and let f C (Rd ). Then, 7

f converges to f (is cauchy) in C (Rd ) iff 7 [v1 ]a1 [vm ]am a1 am f converges to [v1 ]a1 [vm ]am a1 am f (is cauchy) in MorTop (Rd , R) for all m N and
[v1 ]a , . . . , [vm ]a Rd .
You might say that the definition of the seminorms was made the way they were so
that this is the case.

Proof. We leave this as an exercise.


Exercise 6.3.4 Prove this yourself.


Theorem 6.3.5. C (Rd ) is complete.

Proof. Let 7 f C (Rd ) be cauchy. By the previous result,


7 [v1 ]a1 [vm ]am a1 am f

(6.3.6)

is cauchy in MorTop (Rd , R) for all m N and v1 , . . . , vm Rd , and so, for each
m N and v1 , . . . , vm Rd , there is some gv1 ,...,vm MorTop (Rd , R) such that 7
[v1 ]a1 [vm ]am a1 am f converges to gv1 ,...,vm in MorTop (Rd , R).
Exercise 6.3.7 Complete the proof by showing that

gv1 ,...,vm = [v1 ]a1 [vm ]am a1 am g.

(6.3.8)


6.4
6.4.1

Calculus
Tools for calculation
And now we come to the results which you are probably most familiar with from calculus.
Proposition 6.4.1 Algebraic Derivative Theorems Let f , g : Rd R be differen-

tiable.a and let R. Then,


(i). (Linearity) a ( f + g) = a f + a g;
(ii). (Homogeneity) a ( f ) = a f ;
(iii). (Product Rule)a ( f g) = (a f )g + f (a g);b and

6.4 Calculus

287

(iv). (Quotient Rule) a


R

 
f
g

(a f )g f (a g)
g2

whenever g 6= 0.

The first three are true just as well for f and g arbitrary tensor fields, with essentially
the same exact proofs (the juxtaposition denotes the tensor product of course). The
Quotient Rule does not make sense, however, as in general you cannot invert tensors.

Proof. (i) and (ii) follows straight from the corresponding results above limitssee Proposition 2.4.33(i) and Proposition 2.4.33(ii).
As for (iii), we have
f (x + hv)g(x + hv) f (x)g(x) d
=




h
g(x + hv) g(x)
f (x + hv) f (x)
g(x + hv) + f (x)
,
h
h

(6.4.2)

and so taking limits gives us the Product Rule.


Similarly, the proof of the Quotient Rule amounts to just algebraic manipulation of the
difference quotient:




f (x+hv) f (x)
g(x+hv)g(x)
f (x+hv)
f (x)
g(x)

f
(x)
h
h
g(x+hv) g(x)
=
.
(6.4.3)
h
g(x + hv)g(x)

a Everything

works just as well (with the same proof) if you just assume it is differentiable in some direction
at some point, but the notation is more tedious.
b I would get in the habit of not mixing-up the order of f and g (e.g. by writing ( f )g + ( g) f or
a
a
something of the like). It wont matter for us, but it can and will latter when youre working with things that are
not commutative (the cross-product of vectors is probably the most elementary example).
c We added and subtracted f (x)g(x + hv).
d We added and subtracted f (x)g(x + hv).

Proposition 6.4.4 Chain Rule Let f : Rd Rm and g : Rm Rn be differentiable.a

Then,
a [g f ] (x) = g ( f (x))a f (x).
R

(6.4.5)

One of the things I really love about index notation is that it almost dictates what the
answer has to behow many ways can you? construct a tensor with one Rd covariant
index and one Rn contravariant index using only f , g , and their derivatives?

Proof. To prove this, by the definition of the derivative of tensors, we need to show that
a [g f ] (x) = g ( f (x))a f (x)

(6.4.6)

for all covectors (living in Rn ) . In particular, it suffices to prove the result for g : Rm R
(because now g : Rm R).
Let va be a constant vector field on Rd . Then, what we actually want to show is
va a [g f ](x) = ( g( f (x))) (va a f (x)) ,

(6.4.7)

as va is arbitrary. On one hand


g ( f (x + hv)) g( f (x))
h0
h

va a [g f ](x) = lim

(6.4.8)

Chapter 6. Differentiation

288
and on the other hand

1
[g ( f (x) + hva a f (x)) g( f (x))]
h0 h

( g( f (x))) (va a f (x)) = lim

(6.4.9)

Thus, we want to show that


f (x + hv) = f (x) + hva a f (x) as h 0,

(6.4.10)


but of course this is just the very definition of the derivative.


a The

6.4.2

index is used to remind us that g lives in a Rn (as opposed to Rm or Rd ).

The Mean Value Theorem


One of the most important theorems of traditional calculus is of course the Mean Value Theorem.
We first prove a couple of results, important in their own right, leading up to it.
Definition 6.4.11 Local extrema Let f : X R be a function from a topological

space into R and let x0 X. Then, x0 is a local maximum of f iff there is some open
neighborhood U of x0 such that (i) f (x0 ) = supxU { f (x)} and (ii) x0 is the unique such
point.a x0 is a local minimum of f iff there is some open neighborhood U of x0 such that (i)
f (x0 ) = infxU { f (x)} and (ii) x0 is the unique such point. x0 is a local extremum of f iff it
is either a local maximum or a local minimum.
a So

for example, constant functions have no local maxima.

R
Proposition 6.4.12 First Derivative Test Let S Rd , let x0 (S), and let f : S R be

differentiable at x0 . Then, if x0 is a local extremum of f , then a f (x0 ) = 0.


R

Sometimes this is also referred to as Fermats theorem. Evidently, it was not the last
result of his life.

The first derivative alone, however, cannot detect the difference between local minima and maxima. For that we need the Differentiability and continuity, a stronger
version of Second Derivative Test, with which you are likely more familiarsee
Theorem 6.5.58.

Proof. Suppose that f is differentiable at x0 and x0 is a local extremum of f . Let U be an


open neighborhood about x0 so that x0 is either a maximum or minimum in U. We proceed
by contradiction: suppose that a f (x0 ) 6= 0. Then, there is some va T(Rd )x0 such that
va a f (x0 ) 6= 0. Without loss of generality, suppose that va a f (x0 ) > 0. Then, there is some
h > 0 such that x0 + hv, x0 hv U and
f (x0 + hv) f (x0 )
f (x0 hv) f (x0 )
=: K > 0 and
=: L > 0,
h
h

(6.4.13)

f (x0 + hv) = f (x0 ) + hK > f (x0 ) and f (x0 hv) = f (x0 ) hL < f (x0 ),

(6.4.14)

so that f (x0 ) can be neither a maximum nor a minimum in U: a contradiction.

and so

6.4 Calculus

289

Theorem 6.4.15 Mean Value Theorem. Let f : [a, b] R be continuous and differen-

tiable.a Then, there is some c (a, b) such that


d
dx

f (x) =

f (b) f (a)
.
ba

(6.4.16)

The right-hand side of this equation is called the average slope of f on [a, b].

This essentially does not generalize at all to higher dimensions. It just fails outright if
you increase the dimension of the codomainsee Example 6.4.20. On the other hand,
there is the question of what should the statement of the Mean Value Theorem even be
if you increase the dimension of the domainwhat is the average slope over a square,
for example? What you can do is state the result in terms of curves between points in
Rd , but this is really just the one-dimensional statement here for the composition of
the function with the curve (which of course itself is a function from a closed interval
in R to Rd ).

Proof. Define g : [a, b] R by




f (b) f (a)
g(x) := f (x)
(x a).
ba

(6.4.17)

d
The first thing to notice is that we want to find some c (a, b) such that dx
g(c) = 0.
g is continuous on [a, b], and therefore, by the Quasicompactness and the Extreme Value
Theorem, attains a maximum and a minimum somewhere on [a, b]. Suppose that both the
maximum and minimum were achieved at the endpoints. As g(a) = f (a) and g(b) = f (a),
this would imply that f (a) was both a minimum and a maximum for g on [a, b], which in
d
turn would imply that g is constant on [a, b]. In particular, dx
g(c) = 0 for any c (a, b). On
the other hand, if at least one of the maximum and the minimum where attained at c (a, b),
then it would be a local extremum, and so by the The Mean Value Theorem, we would have
d
g(c) = 0.

that dx
a Recall that (see the definition, Definition 6.2.4) a function being differentiable means that it is differentiable
on its interior (we only defined the derivative for interior points). Thus, while it may be the case that differentiability at a point implies continuity at a point, saying that f is differentiable here only implies continuity on the
interior of the domain, (a, b). But in fact, we do need continuity on all of [a, b]see the next example.

Example 6.4.18 A function which is differentiable on [a, b] for which the The
Mean Value Theorem fails Define f : [0, 1] R by

(
0
f (x) :=
1

if x = 0
.
otherwise

(6.4.19)

f is differentiable on [0, 1] because its derivative exists at every point in the interior of [0, 1]
(it is always 0 of course). On the other hand, the average slope of this function is 10
10 = 1.
Example 6.4.20 A function continuous and differentiable on [a, b] into R2 for
which the The Mean Value Theorem fails Define f1 , f2 : [1, 1] R by


f1 (x) := x2 and f2 (x) := x3

(6.4.21)

Chapter 6. Differentiation

290
and f : [1, 1] R2 by
f (x) := h f1 (x), f2 (x)i.

(6.4.22)

We show there is no c (1, 1) such that


d
f (1) f (1)
f (c) =
.
dx
1 (1)

(6.4.23)

f1 (1) = 1 and f1 (1) = 1, and on the other hand, f2 (1) = 1 and f2 (1) = 1. Therefore,
the right-hand side of this equation becomes
f (1) f (1)
= h0, 1i.
1 (1)

(6.4.24)

If (6.4.23) were to hold, then we must have that f10 (c) = 0, which forces c = 0. But then,
f20 (c) = 0 6= 1.
Exercise 6.4.25 Let f : [a, b] R be continuous and differentiable. Show that the following

are true.
(i). If

d
dx

f (c) = 0 for all c (a, b), then f is constant on [a, b].

(ii). if

d
dx

f (c) > 0 for all c (a, b), then f is increasing on [a, b].

(iii). If

d
dx

f (c) < 0 for all c (a, b), then f is decreasing on [a, b].

Exercise 6.4.26 Let f : [a, b] R and let c (a, b). Then, if f is differentiable at c and
d
dx

f (c) > 0, is f necessarily increasing in a neighborhood of c?

Be careful though: we can deduce nothing if the derivative is positive at a point alone.
Example 6.4.27 A point at which the derivative is positive but has no neighborhood in which the function is increasing Define f : R R by


(
x 1
f (x) := 2 4
x

if x Q
.
if x Qc

We first show that the derivative at 12 is 1.



 1
1
1
( 2 +) 4 4

1
1
f ( 2 + ) f ( 2 )
=  1 2 1


2 + 4


(6.4.28)

if  Q
if x

Qc

(
1
=
1+

if  Q
.
if  Qc

(6.4.29)

Hence, we do indeed have that


lim

0

f ( 21 + ) f ( 12 )
= 1 > 0.


(6.4.30)

On the other hand, let U be any open neighborhood of 12 and let > 0 be rational and such
that + 12 U. Now choose 0 < < irrational and such that + 2 > . Then, despite

6.4 Calculus

291

the fact that 12 + < 12 + , we nevertheless have that


f ( 12 + ) = ( 12 + )2 = 41 + + 2 > 14 + = f ( 14 + ).

6.4.3

(6.4.31)

The Fundamental Theorem of Calculus


So there is this theorem. Its kind of important. Maybe youve even heard of it. Its called the
Fundamental Theorem of Calculus and it asserts that
Z x

f (x) =
a

d
dx dx
f (x).

(6.4.32)

Well, sort of. If were not careful about what we mean, this can completely fail.. Thats rightyour
calculus teachers lied to you. You should sue them.12
Example 6.4.33 A uniformly-continuous function on [0, 1] that is 0 at 0, 1 at 1,
but whose derivative is 0 almost everywherethe Devils Staircase Weve seen this

bad boy beforesee Example 5.2.101.a The construction is relatively long, and so we
dont repeat it here, but if it helps you remember, its that function which is constant on
the components of the complement of the Cantor Set in [0, 1]. In fact, as the Cantor Set
has measure 0, it follows from this fact that the derivative
of the Devils Staircase is 0
R
d
almost-everywhere. Hence, on one hand we have that 01 dx dx
f (x) = 0, but on the other
hand we have f (1) f (0) = 1. Turns-out that a lot can happen on a set of measure 0!
In fact, you can modify this by adding x to obtain a uniform-homeomorphism (the Cantor Functionsee the same example, Example 5.2.101) which has derivative 1 almosteverywhere, but yet goes from 0 to not 1, but 2, on [0, 1]. The point is, you can have
ridiculously nice functions for which the Fundamental Theorem of Calculus fails.
a There,

it was used to generate an example of a uniform-homemorphism of R that preserves neither


measurability not measure 0.

Actually, this shouldnt come as that much of a surprise.13 For one thing, pretty much everything
you integrate in standard calculus classes is a polynomial, a trig function, an exponential function,
a inverse of one of these, or a function formed from these by addition multiplication etc.. That is,
you only deal with incredibly nice functions all the time. While the Devils Staircase is actually
relatively nice as far as all continuous functions go (its uniformly-continuous), it certainly isnt
smooth everywhere.
Also, its worth mentioning that we cheated a bit for these counter-examples: the Devils
Staircase is not differentiable on [0, 1]it is only differentiable almost-everywhere. Its important
that we allow this, however. Remember, we want to go both ways: given a absolutely integrable
function, we want to use the integral to (hopefully) generate a function that is an antiderivative14 ;
and conversely given a differentiable function, we want to use the integral to (hopefully) recover
the original function as the integral of the derivative. The point is that, in one of these cases, the
input is a absolutely integrable function, in which case things only matter up to measure zero.
12 Im exaggerating for dramatic effect. (So dramatic, isnt it?) When you first took calculus, the idea that a derivative
would be defined only almost-everywhere was probably just not a thing.
13 By now, you should pretty much expect many things that seem to be completely reasonable to be utterly false. For
example, check-out the function that is infinitely-differentiable but not continuousExample 6.5.1.
14 That is, a function that has the property that when you differentiate it, you get back the original function.

Chapter 6. Differentiation

292

d
To clarify: if F(x) := ax dt f (t), we could never hope to have that dx
F(x) = f (x) for all x15
instead, we can only hope this to be true almost-everywhere. Fortunately enough, this much is
true.

Theorem 6.4.34 The Fundamental Theorem of CalculusPart I. Let f : [a, b] R

be absolutely integrable. Then,


d
dx

Z x

dt f (t) = f (x)

(6.4.35)

for almost-all x [a, b].


R

In other words, The derivative of the integral is the original function..

Proof. We leave this an an exercise for the reader.


Exercise 6.4.36 Prove this yourself.
R

Hint: This is definitely in the category of This really shouldnt be an exercise,


but it is anyways because Jonny ran out of time.. Pugh ([19]) actually proves
something stronger than this on pg. 396, so your hint is to look there.


In the direction in which your input is a differentiable function, of course, there is (seemingly) nothing to worry about because we are a priori assuming that the function is differentiable
everywhere (not just almost-everywhere), and in this case, we have the following.
Theorem 6.4.37 The Fundamental Theorem of CalculusPart II. Let f : [a, b] R

be continuous and differentiable and let x [a, b]. Then, if t 7 dtd f (t) is absolutely integrable
on (a, b), then
Z x

f (x) =
a

dt dtd f (t).

(6.4.38)

In other words, The integral of the derivative is the original function..

In particular, the following is true: Given f : (a, b) R absolutely integrable and


d
F : [a, b] R any continuous differentiable function that satisfies dx
F(x) = f (x)
Rb
for all x (a, b), then a dt f (t) = F(b) F(a). This is how it is most often used
in calculusyou find some such F and then compute F(b) F(a) to compute the
integral.

d
Proof. a Suppose that x 7 dx
f (x) is absolutely integrable on (a, b). Without loss of generality, assume that x = b. Let > 0. Then, by The integral itself (Theorem 5.2.228), there is
a lower-semicontinuous function g : (a, b) R such that (i) dtd f g and (ii)

Z b

Z b

dt g(t) <
a
15 Take

dt dtd f (t) + .

(6.4.39)

any function f , any point x R, and any value y Ryou can redefine f to be equal to y at x, and you will
have no effect on its integral.

6.4 Calculus

293

By replacing g with g + for sufficiently small , we may without loss of generality assume
that dtd f < g. For > 0, define F : [a, b] R by
Z x

F (x) :=

dt g(t) f (x) + f (a) + (x a).

(6.4.40)

By lower-semicontinuity and the definition of the derivative, for each x [a, b), there is
some x > 0 such that
g(t) >

d
dt

f (x) and

f (t) f (x) d
< d f (x) + eta
t x

(6.4.41)

for all t (x, x + x ). Thus, for t (x, x + x , we have


F (t) F (x) =

Z t

ds g(s) [ f (t) f (x)] + (t x)




> (t x) dtd f (x) (t x) dtd f (x) + + (t x) = 0.
x

(6.4.42)

Define x := sup{t [a, b] : F (t) = 0}. By continuity, F (x ) = 0, It then follows from the
previous inequality that F (b) 0. As is arbitrary, it follows (6.4.40) that
f (b) f (a)

Z b

Z b

dt g(t) <
a

dt dtd f (t) + ,

(6.4.43)

and so
f (b) f (a)

Z b
a

dt dtd f (t).

(6.4.44)

Replacing everywhere in this argument f with f gives us


f (b) ( f (a))

Z b
a

dt dtd f (t).

(6.4.45)

Combining these two inequalities, we obtain


f (b) f (a) =

Z b
a

dt, dtd f (t).

(6.4.46)


a Proof

6.4.4

adapted from [22, pg. 149].

Functions defined by their derivatives: exp, sin, and cos


So, weve proven several properties about how to manipulate derivatives, but what is there to
differentiate? Polynomials? Thats no fun. Lets find a function even easier to differentiate. In fact,
lets see if we can find a function that is equal to its own derivative
d
dx

f (x) = f (x).

(6.4.47)

Dont be a smart-asszero doesnt count. Of course, you already know the answeror so you
think you do. Can you define exp(x)? For what its worth, you do have the tools to do so at this
point, but its quite likely that someone told you once upon a time that e 2.718 . . . and then
exp(x) := ex . If its not clear to you at this point that this is just complete and utter nonsense, then

Chapter 6. Differentiation

294

apparently Im not very good at writing mathematical exposition.16 What you could do, however,
xm
is define exp by its power-series, exp(x) := mM m!
, but where did that formula come from? Your
ass? No. The proper way to define the exponential function is that it is the unique function from R
to R that (i) is equal to its own derivative and (ii) is 1 at 0.17 Of course, as always, we cant just go
around asserting things like this exist willy-nilly. Who can even comprehend the chaos that might
ensue? We must prove that such a thing exists, and that only one such thing exists. The tool that
will allow us to do this is the following theorem about the existence and uniqueness of (ordinary)
differential equations.
Solutions to the differential equations we will be investigating are quite nice. They are even
better than smooth. They are what is called analytic.
Definition 6.4.48 Radius of convergence Let x0 Rd and for m N, let [cm ]a1 am be

a rank m covariant tensor. Then, the radius of convergence of the power series

[cm ]a a

(x x0 )a1 (x x0 )am ,

(6.4.49)

mN



is sup |x x0 | : the series converges absolutely for x Rd . .
In one dimension, we can say a bit more about the radius of convergence.
Proposition 6.4.50 The radius of convergence of the power series

cm (x x0 )m

(6.4.51)

mN

is the unique real number r0 such that (i) the series converges absolutely for all x R with
|x x0 | and (ii) diverges for all x Rd with |x x0 | > r0 .
Furthermore,
r0 =

(6.4.52)

1
lim supm |cm | m

Warning: Anything may happen if |x x0 | = rsee the following exercise (Exercise 6.4.54).

Proof. We leave this as an exercise.


Exercise 6.4.53 Prove this yourself. Hint: Use the Series.


Exercise 6.4.54 In the all of the following three examples, show that the radius of conver-

gence is 1.
(i). Show that mN xm diverges if |x| = 1.
m

(ii). Show that mN xm diverges for x = 1 and converges for x = 1.


m

(iii). Show that mN mx 2 converges absolutely for |x| = 1.


16 Or

maybe youre just stupid. (Disclaimer: That was a joke.)


unique function that that is equal to its own derivative
and is equal to 0 at 0 is the function that is everywhere 0.

1 is the next most obvious choice, as opposed to, say .


17 The

6.4 Calculus

295

Definition 6.4.55 Taylor series Let S Rd , let x0 Int(S), and let f : S R be

infinitely-differentiable at x0 . Then, the taylor series of f at x0 is the function


Br (x0 ) 3 x 7

a1 am f (x0 )
(x x0 )a1 (x x0 )am 3 R,
m!
mN

(6.4.56)

where r is the radius of convergence.


R

If x0 = 0, then this is the maclaurin series of f .

Every smooth function has a taylor series. Not every smooth function is the limit of
its taylor seriessee Exercise 6.4.86.

As mentioned in the remark, not every smooth function is the limit of its taylor series. In fact,
there is a name for such functions which do possess this property.
Definition 6.4.57 Analytic Let S Rd , let x Int(S), and let f : S R. Then, f is

analytic at x iff (i) f is infinitely-differentiable at x and (ii) there is a neighborhood U of x


on which f is the limit of its taylor series at x in C (U). f is analytic (on S) iff f is analytic
at x for all x Int(S). f is globally-analytic iff (i0 f is analytic and (ii) f is the limit of its
taylor series at x in C (S).
R

The difference between analytic and globally-analytic is that, in order to be globallyanalytic, the taylor series at every point has to converge to the function on the entire
1
domain. For example, the function R \ {1} 3 x 7 1x
is analytic, but not globally2
analyticits taylor series at 0 is 1 + x + x + , which does not converge on all of
R \ {1}. On the other hand, we will see that exp is globally-analytic.

Recall that there is a difference between smooth and infinitely-differentiable (see one
of the remarks in the definition of the derivative, Definition 6.2.4). Analytic functions
are in fact smooth, because polynomials are smooth and analytic functions are limits
of polynomialsa in C (Rd ).

As was mentioned above in the definition of taylor series themselves, not all smooth
functions are analyticsee Exercise 6.4.86.

This is actually one huge difference between real and complex analysis. When you
study complex analysis, you will find that all (complex) differentiable functions are
analytic.

The first condition (infinite-differentiability) is really just imposed so that the taylor
series itself makes sense.

a Namely

the partial sums of their taylor series.

Exercise 6.4.58 Let S Rd and let f : S R. Show that if there is some x S for which

the taylor series at x converges to f in C (S), then f is globally-analytic.


R

In other words, to show that it is globally-analytic, it suffices to check that one point
works globally.

Youll recall from calculus that a lot of the functions you worked with (e.g. ln, arctan, etc.)
were defined as inverses to other functions. It will thus be of interest to us in order to know that
inverses of analytic functions are analytic.

Chapter 6. Differentiation

296

Proposition 6.4.59 Let S R, let x Int(S), and let f : S R be injective. Then, if f is

analytic at x at a f (x) 6= 0, then f 1 : f (S) R is analytic at f (x).


R

Warning: This will fail if a f (x) = 0. For example, R 3 x 7 x3 R is analytic and


bijective but its inverse is not even differentiable at 0.

Warning: Even if f is bijective with a f (x) 6= 0 for all x R, f globally-analytic


does not imply that f 1 is globally-analytic. For example, we will see that exp
d
is bijective, dx
exp(x) 6= 0 for all x R, and exp is globally-analytic, but yet its
inverse, ln, is not globally-analytic, because, for example, its taylor series at x = 1 is
3
2
+ (x1)
+ , which does not converge on all of R+ .
(x 1) (x1)
2
3

Proof. We leave this an an exercise.


Exercise 6.4.60 Prove the result yourself.


Theorem 6.4.61 Existence and uniqueness of linear constant coefficient ODEs.

Let Aab : Rd Rd be a nonzero linear map and let [0 ]a Rd . Then, there exists a unique
globally-analytic map a : R Rd such that
(i). a (0) = [0 ]a ; and
(ii).
a
d
dt (t)

= Aab (t)b .

(6.4.62)

Proof. S T E P 1 : S H O W S M O O T H N E S S G I V E N E X I S T E N C E
For the proof, we do not make use of index notation.
STEP 2: SHOW EXISTENCE
Define m : R Rd by
m

m (t) :=

Ak t k
0 .
k=0 k!

(6.4.63)

Exercise 6.4.64 Show that m 7 m is cauchy in C (R, Rd ).

By completeness (Theorem 6.3.5), we may then define := limm m C (R, Rm ). By


the definition of convergence in C or rather, a corollary of itsee Proposition 6.3.3), we
have that
m

d
dt

m1
Ak
Ak1 k1
t k1 = A lim
t
m
k=1 (k 1)!
k=1 (k 1)!

= lim dtd m = lim


m

Ak
= A lim t k = A.
m
k=0 k!

(6.4.65)

And the other condition follows easily from the fact that m (0) = 0 for all m N. By
construction,a is the limit of its taylor series in C (R), and therefore, by the previous
exercise, is globally-analytic.
STEP 3: SHOW UNIQUENESS

6.4 Calculus

297

Let c : R Rd be some other curve that satisfies (i) c(0) = 0 and (ii)
d
c = Ac.
dt

(6.4.66)

Define
L := sup |Av|.

(6.4.67)

|v|=1

As A is nonzero, this is positive. We wish to show that c = on an interval of length L1


centered at 0. We will then have that they satisfy the same differential equation and same
1
initial condition at 2L
, and so by the same argument we can extend the interval on which
1
they agree from an interval of length L1 to an interval of length L1 + 2L
. Extending in other
2
direction as well shows that they agree on an interval of length L centered at 0. Continuing
this process inductively, we show thatthey agree onall of R.

1 1
1 1
Define T : MorTop 2L
, 2L , Rd MorTop 2L
, 2L , Rd by
Z t

[T ( f )] (t) := 0 + A

ds f (s).

(6.4.68)

 1 1  d
, 2L , R . It will then follow
We show that T is a contraction mapping on MorTop 2L
from the Banach Fixed-Point Theorem
that
T
has
a
unique
fixed point, that is, there is a
 1 1
unique continuous function on 2L
, 2L
Z t

(t) = 0 + A

ds (s).

(6.4.69)

By the Fundamental Theorem of Calculus, this is equivalent to


complete the proof.
To show that it is a contraction mapping, note that
Zt




|T ( f ) T (g)| = h sup i A ds f (s) g(s)

= A(t), which will

1 1
t 2L , 2L

Z t

d
dt (t)

sup

ds | f (s) g(s)|

(6.4.70)

i 0
h
1 1
t 2L , 2L

L
| f g| = 12 | f g|,
2

so that this is indeed a contraction mapping.


a Almostyou

have to check that this is actually the taylor series of the function.

We get as an easy corollary the following result.


Corollary 6.4.71 Let m N, a0 , . . . , am R with am 6= 0, and f00 , . . . , f0m1 R. Then,

there exists a unique smooth map f : R R such that


(i).

dk
dxk

f (k) (0) = a f0k for 0 k m 1; and

Chapter 6. Differentiation

298
(ii).
m

d
am dx
m f + + a0 f = 0.

(6.4.72)
k

d
Proof. For 0 k m 1, define gk := dx
k f . Then, (6.4.72) together with these definitions
gives us a system of m first-order differential equations:

0
1
0

0
g0
g0

0
1

0 .
d . 0
(6.4.73)
.. = ..
..
..
.. ..
.
..
dt
.
.
.
.
gm1
gm1
aam0 aam1 aam2 aam1
m

So the general notation makes this look obtuse, but I promise, everything going on here
d
d2
is absolutely trivial. For example, consider the differential equation dx
2 f 3 dx f + 2 f = 0.
d
d
Then, upon defining g0 := f and g1 := dx
f = dx
g0 , then our system would read
d
dx g0
d
dx g1

+ 1 g1

=
=

2 g0

(6.4.74)

+ 3 g1 .

By the previous theorem, there is a unique smooth solution to (6.4.73) that satisfies
gk (0) = f0k for 0 k m 1. This is just the same as saying, however, that there is a unique
smooth map f : R R that satisfies (6.4.72) and dxdk f (0) = f0k for 0 k m 1.

The definitions
Definition 6.4.75 Exponential function The exponential function, exp, is the unique

function exp : R R such that (i)

d
dx

exp(x) = exp(x) and (ii) exp(0) = 1.

Definition 6.4.76 Eulers Number Eulers Number is e := exp(1).

Cool, so now we have a function that is its own derivative. What about a function that is minus
its own derivative. Well, now that we have exp in hand, this is actually really easy.
Exercise 6.4.77 Show that x 7 exp(x) is the unique function from R to R such that (i) is

equal to minus its derivative and (ii) is equal to 1 at 0.


What about second derivatives? Well, of course, we exp is equal to its own second derivative.
Thats easy. But what about a function that is equal to minus its own second derivative? If we
wanted to play the same trick as before, we would need to consider the function x 7 exp(x) for
some number with 2 = 1, but thats just craziness. Only looney-toons would ever think about
such nonsense.18 Looks like were just going to have to come up with a new name for such a
function. Cosine sounds rad. Lets call it that.
Definition 6.4.78 Cosine and sine The cosine function, cos, is the unique function

cos : R R such that (i)

d2
dx2

cos(x) = cos(x), (ii) cos(0) = 1, and (iii)

The sine function, sin, is the unique function sin : R R such that (i)
18 But

seriously: why is there no real number whose square is 1?

d2
dx2

d
dx

cos(0) = 0.

sin(x) = sin(x),

6.4 Calculus

299

(ii) sin(0) = 0, and (iii)


R

d
dx

sin(0) = 1.

In the case of exp, because it was defined by a differential equation of first-order,


only one initial condition was required to uniquely specify it. In contrast, however,
there is more than one function equal to minus its own second-derivative (and is 1 at
0). To uniquely specify the function, you must also specify its first derivative. sin and
cos are essentially everything in the sense that any other function that is equal to
minus its own second derivative can be written as a linear combination of these (and
furthermore, you can figure out what that linear combination should be by looking at
initial conditions at 0).

In terms of finding functions that are equal to their own derivatives or minus their own derivatives, we are essentially done. exp always works for finding functions equal to its own nth derivative,
and if you want a function equal to minus its own nth derivative, x 7 exp(x) will work for n odd,
and cos and sin will work for n even. For the time being then, we simply return to the study of cos
and sin.
Their properties
Exercise 6.4.79 exp is an exponential function Show that exp(x) = ex .
R

exp is the unique function that is equal to its own derivative and is 1 at 0. On the
other hand, x 7 ex is an exponential function in the usual sense, that is, in the sense
of Theorem 2.5.13. These are not a priori the same thing.

Exercise 6.4.80 Show that the image of exp is R+ .


d
In particular, as dx
exp = exp > 0, exp is increasing, and in particular, injective. This allows us
to make the following definition.

Definition 6.4.81 The natural logarithm The natural logarithm, ln : R+ R, is defined

by ln := exp1 .

Exercise 6.4.82 Show that limx exp(x) = and limx xm exp(x) = 0 for all m N.
d
Exercise 6.4.83 Show that dx
ln(x) = 1x .

Exercise 6.4.84 Proposition 6.4.59 implies that ln is analytic. Compute the taylor series of

ln at x = 1 and compute its radius of convergence using Proposition 6.4.50. Conclude that
ln(2) =

(1)m+1
m ,
mZ+

(6.4.85)

finally wrapping up a loose-end from way back in Example 2.4.112.


Exercise 6.4.86 Taylor series need not converge to the original function Define

f : R R by
(

exp 1x
if x > 0
f (x) :=
.
0
if x 0

(6.4.87)

Chapter 6. Differentiation

300
Show that
R

dm
dxm

f (0) = 0 for all m N.

If you are familiar with taylor series from calculus, this example is meant to show
that there are functions which are not equal to their taylor seriesthis function has
taylor series identically 0 at 0, but of course the function itself is not 0. Thus, there is
no general theorem concerning existence of taylor seriessometimes they just do
plain not exist (or at least, do not converge to what you want them to).

d
d
Exercise 6.4.88 Show that dx
sin = cos and dx
cos = sin.

Exercise 6.4.89 Show that

cos(x + y) = cos(x) cos(y) sin(x) sin(y)

(6.4.90)

sin(x + y) = cos(x) sin(y) + cos(y) sin(x)


for all x, y R.
Exercise 6.4.91 Show that sin2 + cos2 = 1.

Proposition 6.4.92 , , and the periodicity of sin and cos There is a positive real

number R+ such that


(i). is the smallest positive real number such that cos(x + ) = cos(x) for all x R; and
(ii). is the smallest positive real number such that sin(x + ) = sin(x) for all x R.
Furthermore, it satisfies

cos x + 2 = sin(x)

sin x + 2 = cos(x).
R

(6.4.93)

For historical reasons, it is customary to use the half-period := 12 instead.

Proof. It follows from the definition that


d2
dx2

d2
dx2

cos(0) = cos(0) = 1. We first show that

d
cos is not everywhere negative. If this were the case that dx
cos would be (strictly)
d
d
decreasing. As dx cos(0) = 0, there would then be some x0 > 0 with dx
cos(x0 ) =: y0 < 0.
d
Then, because the derivative is decreasing again, we would have that dx cos(t) y0 for all
y x0 . Hence,

Z x

d
dt cos(t) =
dt

Z x

d
cos(x) =
dt cos(t) +
dt
0
x0
Z x0
d
y0 (x x0 ) +
dt cos(t).
dt
0

Z x0

dt
0

d
cos(t)
dt

(6.4.94)

By taking x sufficiently large, we would obtain the


n existence2 of some point
o x R with
d
+
cos(x) < 1: a contradiction. Therefore, the set x R : dx2 cos(x) = 0 is nonempty,
and so we may define


d2
+
:= inf x R : 2 cos(x) = 0 .
(6.4.95)
dx

6.4 Calculus

301
2

d
By continuity, we have that dx
2 cos( ) = 0, and so cos( ) = 0 (from the differential equa2
2
tion). As cos + sin = 1, it follows that sin( ) = 1.
d
We check that in fact sin( ) = 1. As sin(0) = 0, it suffices to show that dx
sin(x) > 0 for
d
x (0, ). However, dx sin(x) = cos(x), so it suffices to show that cos(x) > 0 for x (0, ).
By the Connectedness and the Intermediate Value Theorem, it suffices to show that cos does
d2
not vanish on (0, ). However, if cos were to vanish on (0, ), then dx
2 cos would vanish on
(0, ), contradicting the definition of .
We then have that

cos(x + 4 ) = cos(x + 3 ) cos( ) sin(x + 3 ) sin( ) = sin(x + 3 )


= cos(x + 2 ) sin( ) sin(x + 2 ) cos( ) = cos(x + 2 )
= cos(x + ) cos( ) + sin(x + ) sin( ) = sin(x + )

(6.4.96)

= cos(x) sin( ) + sin(x) cos( ) = cos(x).


We wish to show that 4 is the smallest such positive real number. So, let T R+ be
some other positive real number such that cos(x + T ) = cos(x) for all x R. We wish to
T
d2
show that 4 T . To show this, by the definition of , it suffices to show that dx
2 cos( 4 ) = 0.
We compute again using the angle addition formula

1 = cos(0) = cos(T ) = cos T4 + T4 + T4 + T4
4
2
2
4
= a cos T4 6 cos T4 sin T4 + sin T4 = C4 6C2 (1 C2 ) + (1 C2 )2 (6.4.97)
= 8C4 8C2 + 1,

where of course we have defined C := cos T4 . From this equation, it follows that either

d2
cos T4 = C = 0, in which case we are done, from the definition of , we have that
dx2
T
4 , or
C2 1 = 0,

(6.4.98)


from which it follows that cos T4 = 1. If it were 1, that would imply that cos, and
d2
T
T
hence dx
2 cos vanishes in (0, 4 ), which implies that 4 > . If it were 1, then as cos is
less than 1 in a neighborhood of 0, there would have to be some point in x0 (0, T4 ) for
d
which the derivative was positive. As d cos(0)
= 0, this implies that there must be some
point in x1 (0, x0 ) on which the second derivative is positive. However, as the second
derivative is negative in a neighborhood of 0, this implies (by the Connectedness and the
Intermediate Value Theorem again), that the second derivative must vanish in (0, x1 ). Then,
< x1 < x0 < T4 . And so, in all cases, we do indeed have that T4 , as desired.
Exercise 6.4.99 Finish the theorem by showing that cos(x + ) = sin(x) and

sin(x + ) = cos(x).

a Algebra

omitted. Its tedious.

Exercise 6.4.100 Show that


(i). cos is nonvanishing on 2 , 2 ; and

Chapter 6. Differentiation

302
(ii). sin is nonvanishing on (0, ).
This enables us to make the following definitions.

Definition 6.4.101 tan, cot, sec, and csc Define tan, sec : 2 , 2 R and cot, csc :

(0, ) R by
(i). tan :=

sin
cos ;

(ii). cot :=

cos
sin ;

(iii). sec :=

1
cos ;

(iv). csc :=

1
sin .

and

Exercise 6.4.102 Show that

(i).

d
dx

tan = sec2 ;

(ii).

d
dx

cot = csc2 ;

(iii).

d
dx

sec = sec tan; and

(iv).

d
dx

csc = csc cot.

Thus, tan is increasing and cot is decreasing. In particular, they are both injective.
Exercise 6.4.103 Show that the image of both tan and cot is R.

This allows us to make the following definition.


Definition 6.4.104 Arctangent and arccotangent The arctangent and arccotangent,

arctan, arccot : R R, are defined by arctan := tan1 and arccot := cot1 .


R

You can define inverses for the rest of these so-called trigonometric functions as
well, but in general you must restrict the domain of definition (for example, cos is
not injective on R).

Exercise 6.4.105 Show that

(i).

d
dx

arctan(x) =

1
;
x2 +1

(ii).

d
dx

arccot(x) = x21+1 .

and

We can finally tie-up a loose-end from way back in the beginning of Chapter 3.

Exercise 6.4.106 Show that arctan : R 2 , 2 and arccot : R (0, ) are homeomorphisms.
R

Recall that
 they cannot be uniform-homeomorphisms because R is complete, and
2 , 2 and (0, ) are not.

6.5 Differentiability and continuity

6.5

303

Differentiability and continuity


Having defined the exponential function, we can now tie-up a loose-end from before.


Example 6.5.1 A function which is infinitely-differentiable but not continuous a

Define f : R2 R by

0  
1
exp 2 y
f (hx, yi) :=
x


exp 2 +y2

if x = 0
otherwise

(6.5.2)

x2


Order R+ by the usual order and consider the net R+ 3 t 7 xt := ht, exp t12 i R2 . Then,
limt0+ xt = h0, 0i, but
f (xt ) = f (ht, exp( t12 )i) =

exp( t22 )
exp( t22 ) + exp( t22 )

1
2

6= 0 = f (h0, 0i),

(6.5.3)

and so f is not continuous at h0, 0i.


On the other hand, it is smooth away from x = 0, so we need only show that it is infinitelydifferentiable on the set x = 0. For v := hvx , vy i T(R2 )h0, yi, vx 6= 0,b we have

f (h0, yi + v) f (h0, yi)


=

=



1
exp
(y+vy )
2
(vx )


2
+(y+vy )2
exp
(vx )2


1

 exp



1
(y + vy )
exp

2
1
(vx )


=
 exp 2
+ (y + vy )2 (6.5.4)
(vx )2

y + vy
1
(vx )2

+ (y + vy )2 exp

1
(vx )2

.

Hence,
Dv f (h0, yi) := lim =
0

f (h0, yi + v) f (h0, yi)


= 0.


(6.5.5)

Of course, the map v 7 0 is linear.


Thus, Dv f (h0, yi) exists for all v T(Rd )h0, yi and the map v 7 Dv f (h0, 0i) is linear, but
nevertheless f is not continuous at h0, 0i.
Exercise 6.5.6 Show that in fact f is infinitely-differentiable.
R

Hint: Note that there is nothing to worry about ever away from x = 0.
 As for

x = 0, try using an induction argument to show that the power of exp (v1 )2
x
will always be greater than the power of the same thing in the numerator of
every term of every component of the higher derivatives.

a This
b If

comes from [4, pg. 116].


vx = 0, then the f (h0, yi + v) = 0, and so of course the directional derivative in this direction is 0.

Exercise 6.5.7 Does there exist an infinitely-differentiable function which is nowhere

continuous?
R

Honestly, I have no fucking idea. At this rate, probably. lulz

Chapter 6. Differentiation

304

Fortunately, however, this cannot happen in one-dimension (thank god!).


Proposition 6.5.8 Let f : R R be a function and let x R. Then, if f is differentiable at

x0 , then it is continuous at x0 .
Proof. Consider

f (x) f (x0 ) =


f (x) f (x0 )
(x x0 ).
x x0

(6.5.9)
0

f (x0 )
d
Taking the limit of both sides as x x0 , because limxx0 f (x)
= dx
(x0 ) exists (and is
xx0
finite), we find that limxx0 ( f (x) f (x0 )) = 0, so that f is continuous at x0 .


Be careful though: f need not be continuous in a neighborhood of x0 .


 Example 6.5.10 A function on R differentiable at a point and not continuous in
a neighborhood of that point Define f : R R by

(
x2
f (x) :=
0

if x Q
.
if x Qc

(6.5.11)

Exercise 6.5.12 Show that f is continuous at 0 and only at 0.


Exercise 6.5.13 Show that f is differentiable at 0 with derivative 0.
R

You might be tempted to ask Well, what if it is twice differentiable at a point?.


The problem with this is, in order to even define the second derivative at the point,
the derivative has to exist in a neighborhood of that point,a in which case it is then
automatically continuous on that same neighborhood by the previous result.

a Otherwise, the limit as h 0 of the difference quotient of the derivative (the definition of the second
derivative) does not make sense.

Exercise 6.5.14 A function that is not differentiable Find an example of a function

that is not differentiable.


Exercise 6.5.15 A function differentiable in one but not all directions Find a func-

tion f : R2 R that is differentiable along the x-axis at the origin, but not along the
y-axis.
Exercise 6.5.16 A function differentiable on the standard basis but not differentiable Find a function f : R2 R that is differentiable along both the x-axis and the y-axis

at the origin, but not in the direction of the vector h1, 1i.
R

In particular, it is quite possible for a the partial derivativesa of a function to exist


without that function being differentiable.

a The partial derivatives of a function are the directional derivatives in the direction of a standard orthonormal

basis vector.

6.5 Differentiability and continuity

305

Exercise 6.5.17 A continuous function that is not differentiable Find an example a

function that is continuous but not differentiable.


As a matter of fact, you can even find continuous functions which arent differentiable anywhere!


Example 6.5.18 A continuous function that is nowhere-differentiable a Define

f0 : [0, 1] R by
(
x
f0 (x) :=
1x

if x [0, 12 ]
if x [ 12 , 1]

(6.5.19)

and extend f0 : R R periodically with period 1. For k N, now define fk : R R by


1
f (4k x).
4k 0

fk (x) :=

(6.5.20)

Then define sm : R R by
m

sm (x) :=

fk (x).

(6.5.21)

k=0

We wish to show that m 7 sm is cauchy in MorTop (R, R). Then, because MorTop (R, R) is
complete (Theorem 4.5.9), we may define f : R R by

f (x) :=

fk (x)

(6.5.22)

k=0

and will automatically have that s MorTop (R, R) (that is, that f is continuous). We will
then show that f is not differentiable anywhere.
We thus verify cauchyness. For n > m, we have
n

|sn (x) sm (x)| =

m
1
1
1
1

4k 4k 1 1 4k
k=m+1
k=0
k=m+1
4
n

fk (x) <

k=m+1

1 ( 14 )m+1 4 1
1
=

= 3 m+1 .
4
1 41
1 41

(6.5.23)

In fact, for K R quasicompact,


|sn sm |K |sn sm |R

4 1
,
3 4m+1

(6.5.24)

and hence the sequence m 7 sm is cauchy in MorTop (R, R), and hence converges to f .
We now turn to the proof that f is differentiable nowhere. So, fix x R. We want to show
that f is not differentiable at R. To prove this, we first prove a small lemma: we show that
1
there exists a sequence m 7 m such that |m | = 4m+1
and
0
0
| fm (x + m
) fm (x)| = |m
|
0 | | |. Define r := 4m x b4m xc, so that r [0, 1). We define
whenever |m
m
m
m
(
1
if r [0, 41 ) or r [ 12 , 43 )
1

.
m := 4m+1
1 if r [ 14 , 21 ) or r [ 34 , 1)

(6.5.25)

(6.5.26)

Chapter 6. Differentiation

306
1
Obviously |m | = 4m+1
. For the other fact, notice that
(
r
if rm [0, 12 )
m
,
f0 (4 x) =
1 r if x [ 12 , 1)

(6.5.27)

Thus,
f0 (4m (x + m )) f0 (4m x) = f0 (rm + 4m m ) f0 (rm ) = f0 (rm 41 ) f0 (rm ). (6.5.28)
The sign was purposefully chosen so that both rm and rm 14 lie in the same half of the
interval [0, 1], so that the same rule in the definition of f0 applies. In fact, the same is of
0 with |0 |  . Thus, in either case, for such an 0 ,
course true if 14 replaced by any 4m m
m
m
m
we have that
0
0
f0 (4m (x + m
)) f0 (4m x) = 4m m
.

(6.5.29)

From this it follows that


0
| fm (x + m
) fm (x)| =

m
0
m
1
4m | f 0 (4 (x + m )) f 0 (4 x)|

1
4m

0
0
4m |m
| = |m
| (6.5.30)

Take n > m. Then,


fn (x + m ) =

n
n
n(m+1)
1
1
) = b 41n f0 (4n x) = fn (x)
4n f 0 (4 (x + m )) = 4n f 0 (4 x 4

(6.5.31)

Thus,
m

f (x + m ) f (x) =

[ fk (x + m ) fk (x)] =

[ fk (x + m ) fk (x)].

k=0

k=0

(6.5.32)

On the other hand, for n m,


fn (x) =

n
1
4n f 0 ( f x)

4mn
4n f 0


x 
1
x),
4m mn = 4mn fm ( 4mn
4

(6.5.33)

and so



x
hm
| fn (x + m ) fn (x)| = 4
fn 4mn + 4mn fn

m
= c 4mn mn = |m |
4
mn




4mn
x

(6.5.34)

Thus,
m :=

m
f (x + m ) f (x)
fk (x + m ) fk (x)
=
.
m
k
k=0

(6.5.35)

By (6.5.34), each term in this sum is 1, so that the difference quotient f (x+mm) f (x) is an

integer. Let n+
m denote the number of +1s that appear in this finite sum and similarly let nm
+

denote the number of 1s that appear in this sum, so that m = nm nm and m = nm + nm .


Hence, m = m 2n
m . Thus, the parity of m is constant, and so cannot converge. Thus,
this limit, the derivative, does not exist at x.
a Proof

adapted from [3].


f0 has period 1 and n m + 1.
c Because, for n m, | m | | |.
m
4mn

b Because

In fact, if you thought this was bad, we can do even worse than this. (This also wraps up a

6.5 Differentiability and continuity

307

loose-end when we motivated the introduction of the topology on C (Rd ).)


Example 6.5.36 A sequence of smooth functions which converges in
MorTop (R, R) to a function which is nowhere-differentiable Weierstrass Function


It is often said that a function on a closed interval that has a continuous derivative is lipschitzcontinuous. One should be careful, however, because in our language, this is not true.
Example 6.5.37 A function on a closed interval with continuous derivative that
is not lipschitz-continuous Define f : [0, 1] R by


(
x sin( 1x ) if x 6= 0
.
f (x) :=
0
if x = 0

(6.5.38)

Then, f is differentiable on [0, 1] with derivativea


d
f (x) = sin( 1x ) 1x cos( 1x ).
dx

(6.5.39)

Exercise 6.5.40 From here, show that f is not lipschitz-continuous on [0, 1].
a Remember,

we only define the derivative at interior points of domain, and f is in fact smooth on (0, 1).

On the other hand, if the derivative extends to a continuous function on the entire domain, then
we are good to go.
Exercise 6.5.41 Let K Rd be quasicompact and let f : K R. Show that if f has a

derivative which extends to a continuous function on K, then f is lipschitz-continuous.


The derivative (as a function on the tangent space) is itself continuous (hence uniformlycontinuous because continuous homomorphisms of topological groups are uniformly-continuous
(Proposition 4.3.63).
Exercise 6.5.42 Show that the map T(Rd )x 3 va 7 va a f (x) R is continuousa if f is

differentiable at x.
a Recall that T(Rd )x is a metric vector space, with the metric being the dot product. This metric induces a
norm (the usual euclidean norm), which in turn induces a metric, which in turn induces a uniformity, which in
turn induces a topology.

And now we come to the classical Do the partial derivatives commute? question.
Theorem 6.5.43 Clairuts Theorem. Let S Rd , let x S, and let f : S R. Then, if

f has a second derivative at that is continuous at x, then, a b (x) f = b a f (x).


R

If f is smooth, by applying this result inductively, it follows that all mixed partials
agree.

This applies equally well for arbitrary tensors of course, with the usual proof or just
reducing it to this case.

Warning: This fails in general, even for smooth functions, if there is curvature on a
riemannian manifold! As a matter of fact, the curvature tensor is defined by this lack

Chapter 6. Differentiation

308
of commutativity.

Proof. Let va , wa Rd . Then,


wb b f (x + v v) wb b f (x)
v 0
v
f (x + v v + w w) f (x + v v) f (x + w w) + f (x)
= lim lim
.
v 0 w 0
v w

[va a (wb b f )](x) = lim

(6.5.44)

For w R, define gw : Uhw R by


gw () := f (x + w w + v) f (x + v),

(6.5.45)

where Uw R is an open neighborhood of 0 chosen so that x + w + v U is in the domain


of f for all  Cls(Uw ). In fact, by making Uw smaller if necessary, way may without loss
of generality assume that it is an open interval containing 0. Using this definition, we have
1 gw (v ) gw (0)
.
v 0 w 0 w
v

[va a (wb b f )](x) = lim lim

(6.5.46)

gw is continuous on the closed interval Cls(Uw ) and differentiable on its interior,a and so
by the The Mean Value Theorem there is some cv (0, v ) such that
d
d gw (cv )

gw (v ) gw (0)


,
v

(6.5.47)

so that
1 d
gw (cv )
v 0 w 0 w d
va a f (x + w w + cv v) va a f (x + cv v)
.
= lim lim
v 0 w 0
w

[va a (wb b f )](x) = lim lim

(6.5.48)

Similarly as before, for v R, defineb hv : Vv R by


hv () := va a f (x + cv v + w)
for a sufficiently small neighborhood Vv of 0. Then,
h
i
h (w ) hv (0)
va a (wb b f ) (x) = lim lim v
.
v 0 w 0
w

(6.5.49)

(6.5.50)

Applying the The Mean Value Theorem again, we deduce the existence of some cw (0, w )
such that
d
d hv (cw )

hv (w ) hv (0)


,
w

so that
h
i
va a (wb b f ) (x) = lim lim wb b (va a f (x + cv v + cw w)) .
v 0 w 0

Continuity of the second derivative applied to this gives the desired result.
a Despite

(6.5.51)

(6.5.52)


the fact that the existence of the derivative of f does not imply continuity of f , it is the case that the
existence of directional derivatives of f implies existence of derivative of gw , which, because it is defined in
one-dimension, does imply continuity.
b For what its worth, this is what ultimately motivated me to change from hs to s in the difference quotient
at first I tried writing g1hw and g2hv , which works, but. . . ew. Being able to use h is so much better.

6.5 Differentiability and continuity

309

Exercise 6.5.53 Define f : R2 R by


xy(x2 y2 )
x2 +y2

if hx, yi 6= h0, 0i

if hx, yi = h0, 0i

(
f (hx, yi) :=

(6.5.54)

Show that this function is differentiable everywhere with gradient (in standard coordinates)
given by
(D y(x4 +4x2 y2 y4 ) x(x4 4x2 y2 y4 ) E
, (x2 +y2 )2
if hx, yi 6= h0, 0i
(x2 +y2 )2
.
(6.5.55)
a f (hx, yi) =
h0, 0i
if hx, yi = h0, 0i
In other words, for a vector va := hvx , vy i R2 ,
 4 22 4 
(  y(x4 +4x2 y2 y4 ) 
x(x 4x y y )
v
+
v
if hx, yi 6= h0, 0i
x
y
(x2 +y2 )2
(x2 +y2 )2
va a f (hx, yi) =
. (6.5.56)
0
if hx, yi = h0, 0i
Show that
R2 3 w 7 Dw (va f ) (h0, 0i)

(6.5.57)

is not linear, so that f is not twice-differentiable at h0, 0i.


On the other hand, if you plug-in v = h1, 0i and w = h0, 1i, and then plug-in v = h0, 1i
and w = h1, 0i, you should get different answers. Thus, the second partial derivatives
do not commute, but it is also not twice-differentiable. Thus, this example does not
show the necessity of the continuity assumption in the previous theorem. Can you
find an example of a function second differentiable at a point that is not symmetric
there?

Differentiability and continuity finally allows us to address what is probably one of the most
familiar (and useful) results from elementary calculus: the second derivative test.19
Theorem 6.5.58 Higher Derivative Test. Let S Rd , let x0 Int(S), f : S R be

differentiable at x0 , and let m Z+ {} be the least for which a1 am f (x0 ) 6= 0.a


Then, if m < , then x0 is a local maximum (resp. local minimum) of f iff (i) m is even and
(ii) a1 am f (x0 ) is negative-definite (resp. positive-definite).
R

Warning: There are smooth functions with local extrema and all derivatives vanishing
at this point (i.e. m = ), in which case the test is not applicablesee the following
exercise.

For example, this will tell us that R 3 x 7 x4 R has a local minimum at x = 0,


something which the Second Derivative Test is unable to do (it is inconclusive).

Proof. We leave this as an exercise.


Proof. Prove this yourself.




a With

19 Though

m := if all derivatives which exist vanish.

we present a stronger result, what we call simply the Highest Derivative Test.

Chapter 6. Differentiation

310

Exercise 6.5.59 Find a smooth function f : R R for which x = 0 is a local maximum

and

6.6

dn
dxn

f (0) = 0 for all n Z+ .

The Inverse Function Theorem


We have talked about functions which take values in Rm before, f : Rd Rm , and we have
interpreted them as vector fields on Rd . I dont want to do this now. Now, the codomain is going to
play what will be the role of a general manifold when you generalize, so for the time being I will
write M := Rm to be suggestive of this.
When you generalize to manifolds and you have a smooth function f : M N between
manifolds, then for each x X the derivative becomes a linear map at the tangent
space T(M)x 3 va 7 vb b f (x) T(N) f (x). (Once again, the index indicates that
this vector lives in a different vector space than va does.) Unfortunately, in the very
special case of f : Rd Rm , because T(M)x
=VectR Rd is the same as20 M and
m
T(N) f (x)
=VectR R is the same as N, the same thing gets treated in very different
ways in different, and this can make things very confusing. For example, f (x) Rm
the metric space21 , but va a f (x) Rm the vector space (namely, the tangent space
of Rm at f (x)).
Thus, for a smooth map f : Rd M, at each point, a f (x) MorVectR (T(Rd )x, T(M) f (x), the
linear map being defined as va 7 vb b f (x) .
For the special case d = m, something a little bit special happens: we can then ask whether the
derivative is invertible at each point or not (regarding as a linear map between tangent spaces of the
same dimension). An incredibly important fact, known as the Inverse Function Theorem, is that, if
a function is smooth and the derivative is invertible at a point, then the function itself is invertible
with smooth inverse in a neighborhood of that point. You might say that infinitesimal invertibility
implies local invertibility.
Theorem 6.6.1 Inverse Function Theorem. Let S Rd , let f a : S Rd be smooth,

and let x Int(S). Then, if a f (x)b : T(Rd )x T(Rd ) f (x) is invertible, then there exists
an open neighborhood U of x such that f |U is invertible with smooth inverse.
R

Bijective functions which are smooth and have smooth inverse will be called diffeomorphisms when you pass to the category of manifolds (they wind-up being of course
the isomorphisms in the category of manifolds). Thus, we say that if a smooth map
has a derivative that is invertible at a point, then it is a local diffeomoprhism.

Proof. We leave this as an exercise.


Exercise 6.6.2 Prove this yourself.
R

Hint: Yet another exercise that should not besee [21, pg. 221].

20 Or

so the symbols would suggest.


we could, we should regard it as a manifold, but we dont know what manifolds are, and a metric space is the next
best thing (that we have at our disposal).
21 If

A. Sets and categories

A.1
A.1.1

Basic set theory


What is a set?
For the most part, we will completely ignore any set-theoretic concerns in these notes, but before
we do just blatantly ignore any potential issues, we should first probably (attempt to) justify this
dismissal.
What is a set? Of course, intuitively, a set is just a thing that contains a bunch of other
things, but this itself is not a precise mathematical definition, so how do we come up with a precise
mathematical definition of the idea of a set? One way to do this would be to attempt to develop an
axiomatic set theory, but there is a certain circularity problem in doing this.
The term axiomatic set theory here refers to any collection of axioms which attempt to make
precise the intuitive idea of a set. In a given theory, however, the symbols which we make use of
to write down the axioms themselves form a set. The point is that, in attempting to write down a
mathematically precise definition of a set, one must make use of the naive notion of a set.
Of course this example might not be very convincing. Why not just not think of all the symbols
together and just think of them individually? It is true that if you fudge things around a bit you may
be able to convince yourself that youre not really making use of the naive notion of a set here. That
being said, even if you can convince yourself that you can get around the problem of first requiring
a set of symbols, sooner or later, in attempting to make sense out of an axiomatic set theory, you
will need to make use of the naive notion of a set.
Because of this, we consider the idea of a set to be so fundamental as to be undefinable, and we
simply assume that we can freely work with this intuitive idea of a collection of things all thought
of as one thing, namely a set.
One has to be careful however. Naive set theory has paradoxes, a famous example of which is
Russels paradox. Consider for example the set1
X := {Y : Y
/ Y}.
1 Hopefully

(A.1.1)

you have seen notation like this before. If not, really quickly skip ahead to Appendix A.1.2 The absolute
basics to look-up the meaning of this notation.

Chapter A. Sets and categories

312

Is X X? One resolution of this paradox is that it is nonsensical to construct the set of all things
satisfying a certain property. Whenever you construct a set in this manner, your objects have to be
already living inside some other set. For example, we can write
X := {Y Z : Y
/ Y}

(A.1.2)

for some fixed set Z. Russels paradox now becomes the statement that X
/ Z.
This is still somehow not enough. For example, if you turn to Example A.2.2, the category of
sets, youll see that we do need to make use of the notion of the collection of all sets. To get around
this, we think of the X of (A.1.1) as just being on an entirely new level of set: for example, one
way to get around this is to understand that Y varies over all sets and then to just interpret Russels
paradox as saying that X is a set-like object that is not itself a set (often called a proper class).
Do not freak-out if we refer to a set-like object, which is in fact not as a set, as a set. For us,
this is just a convenient abuse of language. Ultimately, whether or not a set-like object is truly
a set or not wont matter too much to us.

The content of this section was meant only to convince you that (i) there is no way of getting
around the fact that the idea of collecting things together is undefinably fundamental, and that (ii)
ultimately this naive idea is not paradoxical (if you cheat a little).
Disclaimer: I am neither a logician nor a set-theorist, so take what I say with a grain of salt.
A.1.2

The absolute basics


Some comments on logical implication

The word iff is short-hand for the phrase if and only if. So, for example, if A and B are statements,
then the sentence A iff B. is logically equivalent to the two sentences A if B. and A only if B..
In symbols, we write B A and A B respectively. The former logical implication is perhaps
more obvious; the other might be slightly trickier to translate from the English to the mathematics.
The way you might think about it is this: if A is true, then, because A is true only if B is true, it must
have been the case that B was true too. Thus, A only if B. is logically equivalent to A implies
B..
For us, the term statement will refer to something that is either true or false. If A and B are
statements, then A B are statements: True True is considered true, True False is considered
false, False True is considered true, and False False. Hopefully the first two of these make
sense, but how does one understand why it should be the case that False True is true? To see
this, I think it helps to first note the following.2
x X, P(x). is logically equivalent to x X P(x).,

(A.1.3)

where P(x) is a statement that depends on x.


Now consider the following example in English.
Every pig on Mars owns a shotgun.

(A.1.4)

Is this statement true or false? Under the (hopefully legitimate assumption) that there is no pig
on Mars at all, my best guess is that most native English speakers would say that this is a true
statement. In any case, this is mathematics, not linguistics, and for the sake of definiteness, we
simply declare a statement such at this to be vacuously true (unless of course there are pigs on
Mars, in which case we would need to determine if they all owned shotguns). This example is
meant to convince you that, in the case that X is empty, it is reasonable to declare the statement
x X, P(x) to be true for tautological reasons.
Now, appealing back to (A.1.3), hopefully it now also seems reasonable to declare statements
of the form False Q to be true, likewise for tautological reasons.
2 The

symbol in English reads for all. Similarly, the symbol is read as there exists.

A.1 Basic set theory

313

Sets

The idea of a set is something that contains other things.


If X is a set which contains an element x, then we write x X. Two sets are
equal iff they contain the same elements.

(A.1.5)

Definition A.1.6 Empty-set The empty-set, 0,


/ is the set 0/ := {}. That is, it is the set

which contains no elements.


R

If ever you see an equals sign with a colon in front of it (e.g. in 0/ := {}), it means that
the equality is true by definition. This is used in definitions themselves, but also outside of
definitions to serve as a reminder as to why the equality holds.

Definition A.1.7 Subset Let X and Y be sets. Then, X is a subset of Y iff whenever

x X it is also the case that x Y . If X is a subset of Y , we write X Y .


R

Generally speaking we put slashes through symbols to indicate that the statement that would
have been conveyed without the slash is false. For example, x
/ X means that x is not an
element of X, the statement that X 6 Y means that X is not a subset of Y , etc..

Exercise A.1.8 Let X and Y be sets. Show that X = Y iff X Y and Y X.


Definition A.1.9 Proper subset Let X be a subset of Y . Then, X is aproper, and write

X Y , iff there is some y Y that is not also in X.


Let X be a set, let P be a property that an element in X may or may not satisfy, and
let us write P(x) iff x satisfies the property P. Then, the notation
{x X : P(x)}
is read The set of all elements in X such that P(x). and represents a set whose
elements are precisely those elements of X for which P is true. Sometimes this is also
written as
{x X|P(x)} ,
but our personal opinion is that this can look ugly (or even slightly confusing) if, for
example, P(x) contains an absolute value in it:
{x R||x| < 1} .
Definition A.1.10 Complement Let X and Y be sets. Then, the complement of Y in X,

X \Y , is
X \Y := {x X : x
/ Y }.
If X is clear from context, sometimes we write Y c := X \Y .

(A.1.11)

Chapter A. Sets and categories

314

Definition A.1.12 Union and intersection Let A, B be subsets of a set X. Then, the

union of A and B, A B, is
A B := {x X : x A or x B} .

(A.1.13)

The intersection of A and B, A B, is


A B := {x X : x A and x B} .

(A.1.14)

Definition A.1.15 Disjoint and intersecting Let A, B be subsets of a set X. Then, A

and B are disjoint iff A B = 0.


/ A and B intersect (or meet) iff A B 6= 0.
/
Exercise A.1.16 De Morgans Laws Let {Si X : i} be a collection of subsets of a set

X. Show that
!c
[

!c
=

Si

\
i

Sic

and

Si

Sic .

(A.1.17)

The union and intersection of two sets are ways of constructing new sets, but one important
thing to keep in mind is that, a priori, the two sets A and B are assumed to be contained within
another set X. But how do we get entirely new sets without already living inside another? There
are several ways to do this.
Definition A.1.18 Cartesian-product Let X and Y be sets. Then, the cartesian-product

of X and Y , X Y , is
X Y := {hx, yi : x X, y Y } .
R

(A.1.19)

If you really insist upon everything being defined in terms of sets we can take
hx, yi := {x, {x, y}} .

(A.1.20)

The reason we use the notation hx, yi as opposed to the probably more common
notation (x, y) is to avoid confusion with the notation for open intervals.
R

If Y = X, then it is common to write X 2 := X X, and similarly for products of more


than two sets (e.g. X 3 := X X X). Elements in finite products are called tuples.
For example, the elements of X 2 are 2-tuples (or just ordered pairs), the elements in
X 3 are 3-tuples, etc.

Definition A.1.21 Disjoint-union Let X and Y be sets. Then, the disjoint-union of X

and Y , X tY , is
X tY := {ha, mi : m {0, 1}, a X if m = 0, a Y if m = 1} .
R

(A.1.22)

Intuitively, this is supposed to be a copy of X together with a copy of Y . a can come


from either set, and the 0 or 1 tells us which set a is supposed to come from. Thus,
we think of X X tY as X = {(a, 0) : a X} and Y X tY as Y {(a, 1) : a Y }.

The key difference between the union and disjoint-union is that, in the case of the union of A and
B, an element that x is both in A and in B is a single element in A B, whereas in the disjoint-union

A.1 Basic set theory

315

there will be two copies of it: one in A and one in B. Hopefully the next example will help clarify
this.
 Example A.1.23 Union vs. disjoint-union Define A := {a, b, c} and B := {c, d, e, f }.
Then, A B = {a, b, c, d, e, f }. On the other hand, A t B = {a, b, cA , cB , d, e, f }, where
A t B A = {a, b, cA } and A t B B = {cB , d, e, f }.

Definition A.1.24 Power set Let X be a set. Then, the power set of X, 2X , is the set of

all subsets of X,
2X := {A : A X} .
R

A.1.3

(A.1.25)

We will discuss the motivation for this notation in the next subsection (see Exercise A.1.51).

Relations, functions, and orders


We can then make the following definition.
Definition A.1.26 Relation A relation between two sets X and Y is a subset R of X Y .

For a given relation R, we then write x R y, or just x y if R is clear from context iff
(x, y) R. Often we will simply refer to the relation by the symbol instead of R.
Definition A.1.27 Composition Let X, Y , and Z be sets, and let R be a relation on X

and Y , and let S be a relation on Y and Z. Then, the composition, S R, of R and S is the
relation on X and Z defined by
S R := {hx, zi X Z : y Y such that hx, yi X Y and hy, zi Y Z.} . (A.1.28)
R

You will see in the next definition that a function is in fact just a very special type
of relation, in which case, this composition is exactly the composition that you
(hopefully) know and love.

There are several different important types of relations. Perhaps the most important is the
notion of a function.
Definition A.1.29 Function A function from a set X to a set Y is a relation f that has the

property that for each x X there is exactly one y Y such that x f y. For a given function
f , we denote by f (x) that unique element of Y such that x f f (x). X is the domain of f
and Y is the codomain of f . The set of all functions from X to Y is denoted Y X .
R

The motivation for this notation is that, if X and Y are finite sets, then the cardinality
(see Chapter 1 What is a number?) of the set of all functions from X to Y is |Y ||X| .

Example A.1.30 Identity function For every set X (including the empty-set), there is
a function, idX : X X, the identity function, defined by


idX (x) := x.

(A.1.31)

Chapter A. Sets and categories

316

Definition A.1.32 Inverse function Let f : X Y and g : Y X be functions. Then,

g is a left-inverse of f iff g f = idX ; g is a right-inverse of f iff f g = idY ; g is a


two-sided-inverse, or just inverse, iff g is both a left- and right-inverse of f .
Exercise A.1.33 Let g and h be two (two-sided)-inverses of f . Show that g = h.

Because of the uniqueness of two-sided-inverses, we may write f 1 for the unique twosided-inverse of f .
Exercise A.1.34 Provide examples to show that left-inverses and right-inverses need not

be unique.
Exercise A.1.35 Let X be a nonempty set.

(i). Explain why there is no function f : X 0.


/
(ii). Explain why there is exactly one function f : 0/ X.
(iii). How many functions are there f : 0/ 0?
/
Definition A.1.36 Image Let f : X Y be a function and let S X. Then, the image

of S under f , f (S), is
f (S) := { f (x) : x X} .

(A.1.37)

The range of f , f (X), is the image of X under f .


R

Note the difference between range and codomain. For example, consider the function
f : R R defined by f (x) := x2 . Then, the codomain is R but the range is just
[0, ). In fact the range and codomain are the same precisely when f is surjective
(see Exercise A.1.44).

Definition A.1.38 Preimage Let f : X Y be a function and let T Y . Then, the

preimage of T under f , f 1 (T ), is
f 1 (T ) := {x X : f (x) T } .

(A.1.39)

Exercise A.1.40 Let f : X Y be a function, and let S and T be a collection of subsets

of X and Y respectively. Show that the following statements are true.


(i). f 1 (

T T

(ii). f 1 (

T T

T) =

f 1 (T ).

T) =

f 1 (T ).

T T
T T

S) =

f (S)

S)

f (S).

(iii). f (

(iv). f (

SS
SS

SS
SS

Find an example to show that we need not have equality in (iv).

A.1 Basic set theory

317

Exercise A.1.41 Let f : X Y be a function and let T Y . Show that f 1 (T c ) = f 1 (T )c .

For S X, find examples to show that we need not have either f (Sc ) f (S)c nor f (S)c
f (Sc ).
Definition A.1.42 Injectivity, surjectivity, and bijectivity Let f : X Y be a function.

Then,
(i). (Injective) f is injective iff for every y Y there is at most one x X such that
f (x) = y.
(ii). (Surjective) f is surjective iff for every y Y there is at least one x X such that
f (x) = y.
(iii). (Bijective) f is bijective iff for every y Y there is exactly one x X such that
f (x) = y.
R

It follows immediately from the definitions that a function f : X Y is bijective iff it is both
injective and surjective.

Exercise A.1.43 Let f : X Y be a function. Show that f is injective iff whenever

f (x1 ) = f (x2 ) it follows that x1 = x2 .


Exercise A.1.44 Let f : X Y be a function. Show that f is surjective iff f (X) = Y .
Example A.1.45 The domain and codomain matter Consider the function
f (x) := x2 . Is this function injective or surjective? Defining functions like this may
have been kosher back when you were doing mathematics that wasnt actually mathematics,
but no longer. The question does not make sense because you have not specified the domain
or codomain. For example, f : R R is neither injective nor surjective, f : R+
0 R is
+
+
injective but not surjective, f : R R0 is surjective but not injective, and f : R+
0 R0
is both injective and surjective. Hopefully this example serves to illustrate: functions are
not (just) rulesif you have not specified the domain and codomain, then you have not
specified the function.


Exercise A.1.46 Let f : X Y be a function between nonempty sets. Show that

(i). f is injective iff it has a left inverse,


(ii). f is surjective iff it has a right inverse, and
(iii). f is bijective iff it has a (two-sided) inverse.
R

By Exercise A.1.35(ii), there is exactly one function from 0/ to {0}.


/ This function
is definitely injective as every element in the codomain has at most one preimage.
On the other hand, there is no function from {0}
/ to 0/ (by Exercise A.1.35(i)), and so
certainly no left-inverse to the function from 0/ to {0}.
/ This is why we require the
sets to be nonempty.

Chapter A. Sets and categories

318
Exercise A.1.47 Show that

(i). the composition of two injections is an injection,


(ii). the composition of two surjections is a surjection, and
(iii). the composition of two bijections is a bijection.
Exercise A.1.48 Let f : X Y be a function, let S X, and let T Y . Show that the

following statements are true.



(i). f f 1 (T ) T , with equality if f is surjective.
(ii). f 1 ( f (S)) S, with equality if f is injective.
Find examples to show that we need not have equality in general.
R

Maybe this is a bit silly, but I remember which one is which as follows. First of all,
write these both using , not , that is, S f 1 ( f (S)) and f ( f 1 (S)) S. Then,
the 1 is always closest symbol that represents being smaller (that is ).

Exercise A.1.49 Let X and Y be sets, and let x0 X and y0 Y . If there is some bijection

from X to Y , show that in fact there is a bijection from X to Y which sends x0 to y0 .


Exercise A.1.50 Let X and Y be sets. Show that there is a bijection from X Y to tyY X.
Exercise A.1.51 Let X be a set. Construction a bijection from 2X , the power set of X, and

{0, 1}X , the set of functions from X into {0, 1}.


R

This is the motivation for the notation 2X to denote the power set.

Arbitrary disjoint-unions and products


Definition A.1.52 Disjoint-union (of collection) Let X be an indexed collectiona of

sets. Then, the disjoint-union over all X X ,

X := {hx, Xi : X X x X}.

XX

X, is
(A.1.53)

XX

The intuition and way to think of notation is just the same as it was in the simpler
case of the disjoint-union of two sets (Definition A.1.21).

indexed collection we mean a set in which elements are allowed to be repeated. So, for example, X
is allowed to contain two copies of N. The reason for the term indexed collection is that indices are often
used to distinguish between the two identical copies, e.g., Y = {N1 , N2 }as sets are not allowed to repeat
elements, we add the indices so that, strictly speaking, N1 6= N2 as elements of X , even though they represent
the same set. (If this is confusing, dont think about it too hardits just a set where elements are allowed to be
repeated.)
a By

A.1 Basic set theory

319

Definition A.1.54 Restrictions (of functions defined on a disjoint-union) Let X be

an indexed collection of sets, let Y be a set, and let f :


the restriction of f to X, f |X : X Y , is defined by

XX

X Y be a function. Then,

f |X (x) := f (hx, Xi).

(A.1.55)

In particular, the inclusion is defined to be




X := [idXX ] ,

(A.1.56)

that is, the restriction of the identity idXX X :

XX

XX

X.

Definition A.1.57 Cartesian-product (of collection) Let X be an indexed collection

of sets. Then, the cartesian-product over all X X ,


(
)

X := f : X
X : f (X) X .
XX

XX

X, is
(A.1.58)

XX

Admittedly this notation is a bit obtuse. The cartesian-product is still supposed to


be thought of a collection of ordered-pairs, except now the pairs arent just pairs,
but can be 3, 4, or even infinitely many coordinates. The coordinates are indexed
by elements of X ,
and the X-coordinate for X X must lie in X itself. Thus, for
example, X1 X2 = XX X for X = {X1 , X2 }. The key that is probably potentially
the most confusing is that the elements of X are playing more than one role: on one
hand, they index the coordinates, and on the other hand, they are the set in which
the coordinates take their values. Hopefully keeping in mind the case X = {X1 , X2 }
helps this make sense. So, for example, in the statement f (X) X, on the left-hand
side, X is being thought of as an index, and on the right-hand side it is being thought
of as the space in which a coordinate
lives. This is thus literally just the statement

that the X-coordinate of f XX X must be an element of the set X.

For x

XX

X, we write xX := x(X) for the X-component or X-coordinate.

Definition A.1.59 Components (of functions into a product) Let X be an indexed

collection of sets, let Y be a set, and let f : Y


component, fX : Y X, is defined by

XX

X be a function, then the X-

fX (y) := f (y)X .

(A.1.60)

In particular, the projection, X , is defined to be


X := [idXX X ]X ,
that is, it is the X-component of the identity idXX X :
R

(A.1.61)

XX

XX

X.

For example, in the case f : Y X1 X2 , then f (y) = h f1 (x), f2 (x)i.

Before introducing other important special cases of relations, we must first introduce several
properties of relations.

Chapter A. Sets and categories

320
Definition A.1.62 Let be a relation on a set X.

(i). (Reflexive) is reflexive iff x R x for all x X.


(ii). (Symmetric) is symmetric iff x1 x2 is equivalent to x2 x1 for all x1 , x2 X.
(iii). (Transitive) is transitive iff x1 x2 and x2 R x3 implies x1 R x2 .
(iv). (Antisymmetric) is antisymmetric iff x1 x2 and x2 x1 implies x1 = x2 .a
(v). (Total) is total iff for every x1 , x2 X either x1 x2 or x2 x1 .
a Admittedly

the terminology here with symmetric and antisymmetric is a bit unfortunate.

Equivalence relations
Definition A.1.63 Equivalence relation An equivalence relation on a set X is a relation

on X that is reflexive, symmetric, and transitive.

Example A.1.64 Integers modulo m Let m Z+ and let x, y Z. Then, x and y are

congruent modulo m, and write x


= y (mod m), iff x y is divisible by m.
Exercise A.1.65 Check that
= (mod m) is an equivalence relation.

For example, 3 and 10 are congruent modulo 7, 1 and 3 are congruent modulo 4, 2 and
6 are congruent modulo 8, etc..
R

We will see a better way of viewing the integers modulo m in Example A.1.149. It
is better in the sense that it is much more elegant and concise, but requires a bit of
machinery and will probably not be as transparent if you have never seen it before.
Thus, it is probably more enlightening, at least the first time, to see things spelled out
in explicit detail.

Definition A.1.66 Equivalence class Let be an equivalence relation on a set X and

let x0 X. Then, the equivalence class of x0 , denoted by [x0 ] , or just [x0 ] if is clear from
context, is
[x0 ] := {x X : x x0 } = {x X : x0 x} .
R

(A.1.67)

Note that the second equation of (A.1.67) uses the symmetry of the relation.

Example A.1.68 Integers modulo m This is a continuation of Example A.1.64. For


example, the equivalence class of 5 modulo 6 is


[5]
= ( mod 6) = {. . . , 1, 5, 11, 17, . . .} ,

(A.1.69)

the equivalence class of 1 modulo 8 is


[1]
= ( mod 8) = {. . . , 17, 9, 1, 7, 15, . . .} ,
etc..

(A.1.70)

A.1 Basic set theory

321

An incredibly important property of equivalence classes is that they form a partition of the set.
Definition A.1.71 Partition Let X be a set. Then, a partition of X is a collection X of

subsets of X such that


(i). X =

UX

U, and

(ii). for U1 ,U2 X either U1 = U2 or U1 is disjoint from U2 .


Proposition A.1.72 Let be an equivalence relation on a set X and let x1 , x2 X. Then,

either (i) x1 x2 or (ii) [x1 ] is disjoint from [x2 ] .


Proof. If x1 x2 we are done, so suppose that this is not the case. We wish to show that
[x1 ] is disjoint from [x2 ] , so suppose that this is not the case. Then, there is some x3 X
with x1 x3 and x3 x2 . Then, by transitivity x1 x2 : a contradiction. Thus, it must be
the case that [x1 ] is disjoint from [x2 ] .

Corollary A.1.73 Let X be a set and let be an equivalence relation on X. Then, the

collection X := {[x] : x X} is a partition of X.


Proof. The previous proposition, Proposition A.1.72, tells us that X has property (ii) of the
definition of a partition, Definition A.1.71. Property (i) follows from the fact that x [x] ,
so that indeed
X=

[
xX

[x] =

U.

(A.1.74)

UX


Conversely, a partition of a set defines an equivalence relation.
Exercise A.1.75 Let X be a set, let X be a partition of X, and define x1 x2 iff there is

some U X such that x1 , x2 U. Show that is an equivalence relation.




Example A.1.76 Integers modulo m This in turn is a continuation of Example A.1.68.

The equivalence classes modulo 4 are


[0]
= ( mod 4) = {. . . , 8, 4, 0, 4, 8, . . .}
[1]
= ( mod 4) = {. . . , 7, 3, 1, 5, 9, . . .}
[2]
= ( mod 4) = {. . . , 6, 2, 2, 6, 10, . . .}

(A.1.77)

[3]
= ( mod 4) = {. . . , 5, 1, 3, 7, 11, . . .} .
You can verify directly that (i) each integer appears in at least one of these equivalence
classes and (ii) that no integer appears in more than one. Thus, indeed, the set

[0]
= ( mod 4) , [1]
= ( mod 4) , [2]
= ( mod 4) , [3]
= ( mod 4) is a partition of Z.
Given a set X with an equivalence relation , we obtain a new set X/ , the collection of all
equivalence classes of elements in X with respect to .

Chapter A. Sets and categories

322

Definition A.1.78 Quotient set Let be an equivalence relation on a set X. Then, the

quotient of X with respect to , X/ , is defined as


X/ := {[x] : x X} .

(A.1.79)

The function q : X X/ defined by q(x) := [x] is the quotient function.


Of course the quotient function is surjective. Whats perhaps a bit more surprising is that every
surjective function can be viewed as the quotient function with respect to some equivalence relation.
Exercise A.1.80 Let q : X Y be surjective and for x1 , x2 X, define x1 x2 iff x1 , x2
q1 (y) for some y Y . Show that (i) is an equivalence relation on X and (ii) that
q(x) = [x] .
Example A.1.81 Integers modulo m This in turn is a continuation of Example A.1.76.
For example, the quotient set mod 5 is


Z/
= (mod 5) = [0]
= ( mod 5) , [1]
= ( mod 5) , [2]
= ( mod 5) , [3]
= ( mod 5) , [4]
= ( mod 5) . (A.1.82)


It is quite common for us, after having defined the quotient set, to want to define operations on
the quotient set itself. For example, we would like to be able to add integers modulo 24 (we do this
when telling time). In this example, we could make the following definition.
[x]
= ( mod 24) + [y]
= ( mod 24) := [x + y]
= ( mod 24) .

(A.1.83)

This is okay, but before we proceed, we have to check that this definition is well-defined. That is,
there is a potential problem here, and we have to check that this potential problem doesnt actually
happen. I will try to explain what the potential problem is.
Suppose we want to add 3 and 5 modulo 7. On one hand, we could just do the obvious thing
3 + 5 = 8. But because we are working with equivalence classes, I should just as well be able to
add 10 and 5 and get the same answer. In this case, I get 10 + 5 = 15. At first glance, it might seem
we got different answers, but, alas, while 8 and 15 are not the same integer, they are congruent
modulo 7.
In symbols, if I take two integers x1 and x2 and add them, and you take two integers y1 and y2
with y1 equivalent to x1 and y2 equivalent to x2 , it had better be the case that x1 + x2 is equivalent to
y1 + y2 . That is, the answer should not depend on the representative of the equivalence class we
chose to do the addition.


Example A.1.84 Integers modulo m his in turn is a continuation of Example A.1.81.

Let m Z+ , let x1 , x2 Z, and define


[x1 ]
= ( mod m) + [x2 ]
= ( mod m) := [x1 + x2 ]
= ( mod m) .

(A.1.85)

We check that this is well-defined. Suppose that y1


= x1 (mod m) and y2
= x2 (mod m). We
must show that x1 + x2
= y1 + y2 (mod m). Because yk
= xk (mod m), we know that yk xk
is divisible by m, and hence (y1 x1 ) + (y2 x2 ) = (y1 + y2 ) (x1 + x2 ) is divisible by m.
But this is just the statement that x1 + x2
= y1 + y2 (mod m), exactly what we wanted to
prove.
Exercise A.1.86 Define multiplication modulo m and show that is is well-defined.

A.1 Basic set theory

323

Preorders
Definition A.1.87 Preorder A preorder on a set X is a relation on X that is reflexive

and transitive. A set equipped with a preorder is a preordered set.


Note that an equivalence relation is just a very special type of preorder.

Exercise A.1.88 Find an example of

(i). a relation that is both reflexive and transitive (i.e. a preorder),


(ii). a relation that is reflexive but not transitive,
(iii). a relation that is not reflexive but transitive, and
(iv). a relation that is neither reflexive nor transitive.
The notion of an interval is obviously important in mathematics and you almost have certainly
encountered them before in calculus. We give here the abstract definition (see Proposition 2.4.74 to
see that this agrees with what you are probably familiar with).
Definition A.1.89 Interval Let hX, i be a preordered set and let I X. Then, I is an

interval iff for all x1 , x2 I with x1 x2 , whenever x1 x x, it follows that x I.


R

In other words, I is an interval iff everything in-between two elements of I is also in


I.

Definition A.1.90 Monotone Let X and Y be preordered sets and let f : X Y be a

function. Then, f is nondecreasing iff x1 x2 implies that f (x1 ) f (x2 ). If the second
inequality is strict for distinct x1 and x2 , i.e. if x1 < x2 implies f (x1 ) < f (x2 ), then f
is increasing. If the inequality is in the other direction, i.e. if f (x1 ) f (x2 ), then f is
nonincreasing. If it is both strict and reversed, i.e. if x1 < x2 implies f (x1 ) > f (x2 ), then f
is decreasing. f is monotone iff it is either nondecreasing or nonincreasing and f is strictly
monotone iff it is either increasing or decreasing.
R

Note that the that appears in x1 x2 is different than the that appears in
f (x1 ) f (x2 ): the former is the preorder on X and the latter is the preorder on Y .
We will often abuse notation in this manner.

In this course, we will almost always be dealing with preordered sets whose preorder is in
addition antisymmetric.
Definition A.1.91 Partial-order A partial-order is an antisymmetric preorder. A set

equipped with a partial-order is a partially-ordered set or a poset.


Example A.1.92 Power set The archetypal example of a partially-ordered set is given
by the power set. Let X be a set and for U,V 2X , define U V iff U V .


Exercise A.1.93 Check that (2X , ) is in fact a partially-ordered set.

Exercise A.1.94 What is an example of a preorder that is not a partial-order?

Chapter A. Sets and categories

324

While we will certainly be dealing with nontotal partially-ordered sets, totality of an ordering is
another property we will commonly come across.
Definition A.1.95 Total-order A total-order is a total partial-order. A set equipped with

a total-order is a totally-ordered set.


Exercise A.1.96 What is an example of a partially-ordered set that is not a totally-ordered

set.
And finally we come to the notion of well-ordering, which is an incredibly important property
of the natural numbers.
Definition A.1.97 Well-order A well-order on a set X is a total-order that has the

property that every nonempty subset of X has a smallest element. A set equipped with a
well-order is a well-ordered set.
In fact, we do not need to assume a priori that the order is a total-order. This follows simply
from the fact that every nonempty subset has a smallest element.
Proposition A.1.98 Let X be a partially-ordered set that has the property that every

nonempty subset of X has a smallest element. Then, X is totally-ordered (and hence


well-ordered).
Proof. Let x1 , x2 X. Then, the set {x1 , x2 } has a smallest element. If this element is
x1 , then x1 x2 . If this element is x2 , then x2 x1 . Thus, the order is total, and so X is
totally-ordered.

Exercise A.1.99 What is an example of a totally-ordered set that is not a well-ordered set?
Zorns Lemma

We end this subsection with an incredibly important result known as Zorns Lemma. At the moment,
its importance might not seem obvious, and perhaps one must see it in action in order to appreciate
its significance. For the time being at least, let me say this: if ever you are trying to produce
something maximal by adding things to a set one-by-one (e.g. if you are trying to construct a basis
by picking linearly-independent vectors one-by-one), but you are running into trouble because,
somehow, this process will never stop, not even if you go on forever: give Zorns Lemma a try.
Definition A.1.100 Upper-bound and lower-bound Let hX, i be a preordered set,

let S X, and let x X. Then, x is an upper-bound iff s x for all s S. x is a lower-bound


iff x s for all s S.
Definition A.1.101 Maximum and minimum Let hX, i be a preordered set and let

x X. Then, x is a maximum of X iff x is an upper-bound of all of X. x is a minimum of X


iff x is a lower-bound of all of X.
Definition A.1.102 Maximal and minimal Let hX, i be a preordered set, let S X,

and let x S. Then, x is maximal in S iff whenever y S and y x, it follows that x = y. x


is minimal in S iff whenever y S and y x it follows that x = y.
R

In other words, maximal means that there is no element in S strictly greater than x

A.1 Basic set theory

325

(and similarly for minimal). Contrast this with maximum and minimum: if x is a
maximum of S it means that y x for all y S (and analogously for minimum).

Exercise A.1.103 Let X be a partially-ordered set and let S X.

(i). Show that every maximum of S is maximal in S.


(ii). Show that S has at most one maximum element.
(iii). Come up with an example of X and S where S has two distinct maximal elements.
Definition A.1.104 Downward-closed and upward-closed Let X be a preordered

set and let S X. Then, S is downward-closed in X iff whenever x s S it follows that


x S. S is upward-closed in X iff whenever x s S it follows that x S.
Proposition A.1.105 Let X be a well-ordered set and let S X be downward-closed in X.

Then, there is some s0 X such that S = {x X : x < s0 }.


Proof. As S is a proper subset of X, Sc is nonempty. As X is well-ordered, it follows that
Sc has a smallest element s0 . We claim that S = {x : x < s0 }. First of all, let x X and
suppose that x < s0 . If it were not the case that x S, then s0 would no longer be the smallest
element in Sc . Hence, we must have that x S. Conversely, let x S. By totality, either
x s0 or s0 x. As x S and s0 Sc , we cannot have that x = s0 , so in fact, in the former
case, we would have x < s0 , and we are done, so it suffices to show that s0 x cannot
happen. If s0 x, then because S is downward-closed in X and x S, it would follows that
s0 S: a contradiction. Therefore, it cannot be the case that s0 x.

Theorem A.1.106 Zorns Lemma. Let X be a partially-ordered set. Then, if every
well-ordered subset has an upper-bound, then X has a maximal element.

Proof. a S T E P 1 : M A K E H Y P O T H E S E S
Suppose that every well-ordered subset has an upper bound. We proceed by contradiction:
suppose that X has no maximal element.
S T E P 2 : S H O W T H AT E V E R Y W E L L - O R D E R E D S U B S E T H A S A N U P P E R B O U N D not C O N TA I N E D I N I T
Let S X be a well-ordered subset, and let u be some upper-bound of S. If there were no
element in X strictly greater than u, then u would be a maximal element of X. Thus, there is
some u0 > u. It cannot be the case that u0 S because then we would have u0 u because u
is an upper-bound of S. But then the fact that u0 u and u u0 would imply that u = u0 : a
contradiction. Thus, u0
/ S, and so constitutes an upper-bound not contained in S.
S T E P 3 : D E F I N E u(S)
For each well-ordered subset S X, denote by u(S) some upper-bound of S not contained
in S.
S T E P 4 : D E F I N E T H E N O T I O N O F A u- S E T
We will say that a well-ordered subset S X is a u-set iff x0 = u ({x S : x < x0 }) for all

Chapter A. Sets and categories

326
x0 S.

S T E P 5 : S H O W T H AT F O R u- S E T S S A N D T , E I T H E R S I S D O W N WA R D C L O S E D I N S O R T I S D O W N WA R D C L O S E D I N S
Define
[

D :=

A.

(A.1.107)

AX
A is downward-closed inS
B is downward-closed in T

That is, D is the union of all sets that are downward-closed in both S and T .
We first check that D itself is downward-closed in both S and T . Let d D, let s S,
and suppose that s d. As d D, it follows that d A for some A X downward-closed in
both S and T . As A is in particular downward-closed in S, it follows that s D, and so D is
downward-closed in S. Similarly it is downward-closed in T .
If D = S, then S = D is downward-closed in T , and we are done. Likewise if D = T , so we
may as well assume that D is a proper subset of both S and T . Then, by Proposition A.1.105,
there are s0 S and t0 T such that {s S : s < s0 } = D = {t T : t < t0 }. Because S and
T are u-sets, it follows that
s0 = u ({s S : s < s0 }) = u ({t T : t < t0 }) = t0 .

(A.1.108)

Define D {s0 } =: D0 := D {t0 }. Let d D0 , let s S, and suppose that s d. Either


d = s0 or d D. In the latter case, d < s0 . Either way, d s0 , and so we have that s s0 , and
so either s = s0 or s < s0 ; either way, s D0 . The conclusion is that D0 is downward-closed
in S. It is similarly downward-closed in T . By the definition of D, we must have that D0 D:
a contradiction. Thus, it could not have been the case that D was a proper subset of both S
and T .
STEP 6: DEFINE U
Define
U :=

S.

(A.1.109)

SX
S is a u-set

We claim that U is a u-set. It will then also be the case that U 0 := U {u(U)} is a u-set, and
so, by the definition of U, we will have U 0 U: a contradiction, which will complete the
proof. Thus, it suffices to show that U is a u-set.
S T E P 7 : F I N I S H T H E P R O O F B Y S H O W I N G T H AT U I S A u- S E T
We first need to check that U is well-ordered. Let A U be nonempty. For S X a u-set,
define AS := A S. For each AS that is nonempty, denote by aS the smallest element in AS .
Let T X be some other u-set. Then, by the previous step, without loss of generality, S is
downward-closed in T . In particular, S T so that aS T . Hence, aT aS . Then, because
S is downward-closed, aT S, and hence aT aS , and hence aT = aS . This unique element
is a smallest element of A.
Let u0 U. All that remains to be shown is that u0 = u ({x U : x < u0 }). To do this,
we first show that every u-set is downward-closed in U.
Let S X be a u-set, let s S, let x U, and suppose that x s. As x U, there is some
u-set T such that x T . Then, by the previous step, either S is downward-closed in T or T

A.1 Basic set theory

327

is downward-closed in S. If the former case, then we have that x S because x s. On the


other hand, in the latter case, we have that x S because x T S.
Now we finally return to showing that u0 = u ({x U : x < u0 }). By definition of U,
u0 S for some u-set S, and therefore, u0 = u ({x S : x < u0 }). Therefore, it suffices to
show that if x U is less than u0 , then it is in S (because then {x S : x < u0 } = {x U :
x < u0 }. This, however, follows from the fact that S is downward-closed in U.

a Proof

A.1.4

adapted from [7].

Sets with algebraic structure


Definition A.1.110 Binary operation A binary operation on a set X is a function

: X X X. It is customary to write x1 x2 := (x1 , x2 ) for binary operations.


R

Sometimes people say that closure is an axiom. This is not necessary. That a binary
operation on X takes values in X implicitly says that the operation is closed. That
doesnt mean that you never have to check closure, however. For example, in order to
verify that the even integers 2Z are a subrng (see Definition A.1.122).

Definition A.1.111 Let be a binary relation on a set X.

(i). (Associative) is associative iff (x1 x2 ) x3 = x1 (x2 x3 ) for all x1 , x2 , x3 X.


(ii). (Commutative) is commutative if x1 x2 = x2 x1 for all x1 , x2 X.
(iii). (Identity) An identity of is an element 1 X such that 1 x = x = x 1 for all x X.
(iv). (Inverse) If has an identity and x X, then an inverse of x is an element x1 X
such that x x1 = 1 = x1 x.
We first consider sets equipped just a single binary operation.
Definition A.1.112 Magma A magma is a set equipped with a binary operation.
Exercise A.1.113 Let hX, i be a magma and let x1 , x2 , x3 X. Show that x1 = x2 implies

x1 x3 = x2 x3 .
R

My hint is that the solution is so trivial that it is easy to overlook.

This is what justifies the trick (if you can call it that) of doing the same thing to
both sides of an equation that is so common in algebra.

Note that the converse is not true in general. That is, we can have x1 x2 = x1 x3 with
x2 6= x3 .

Definition A.1.114 Semigroup A semigroup is a magma hX, i such that is associative.


Definition A.1.115 Monoid A monoid hX, , 1i is a semigroup hX, i equipped with an

identity 1 X.

Chapter A. Sets and categories

328

Exercise A.1.116 Identities are unique Let X be a monoid and let 1, 10 X be such

that 1 x = x = x 1 and 10 x = x = x 10 for all x X. Show that 1 = 10 .


Definition A.1.117 Group A group is a monoid hX, , 1i equipped with a function

1 : X X so that x1 is an inverse of x for all x X.


R

Usually this is just stated as X has inverses.. This isnt wrong, but this way of
thinking about things doesnt generalize to universal algebra or category theory quite
as well. The way to think about this is that, inverses, like the binary operation (as well
as the identity) is additional structure. This is in contrast to the axiom of associativity
which should be thought of as a property satisfied by an already existing structure
(the binary operation).

Exercise A.1.118 Inverses are unique Let X be a group, let x X, and let y, z X

both be inverses of x. Show that y = z.


Exercise A.1.119 Let hX, i be a group and let x1 , x2 , x3 X. Show that if x1 x2 = x1 x3 ,

then x2 = x3 .
R

Thus, the converse to Exercise A.1.113 holds in the case of a group.

Definition A.1.120 Homomorphism (of magmas) Let X and Y be magmas and let

f : X Y be a function. Then, f is a homomorphism iff f (x1 x2 ) = f (x1 ) f (x2 ) for all


x1 , x2 X, and furthermore, in the case that both X and Y have identities, f (1) = 1.
R

Note that, once again, the in f (x1 x2 ) is not the same as the in f (x1 ) f (x2 ). Confer
the remark following the definition of a nondecreasing function, Definition A.1.90.

We now move on to the study of sets equipped with two binary operations.
Definition A.1.121 Rg A rg is a set equipped with two binary operations hX, +, i such

that
(i). hX, +i is a commutative monoid,
(ii). hX, i is a semigroup, and
(iii). distributes over +, that is, x1 (x2 + x3 ) = x1 x2 + x1 x3 for all x1 , x2 , x3 X.
R

In other words, writing out what it means for hX, +i to be a commutative monoid and
for hX, i to be a a semigroup, these three properties are equivalent to
+ is associative,
(ii).
(i). + is commutative,
(iii). + has an identity,
(iv). is associative,
(v). distributes over +.

For x X and m Z+ , we write m x := x + + x. Note that we do not make this


| {z }
m

definition for m = 0 N. An empty-sum is always 0 (by definition), but 0 x need


not be 0 in a general rg (see the tropical integers in Example 1.3.2).

A.1 Basic set theory


R

329

I have actually never seen the term rg used before. That being said, I havent seen
any term to describe such an algebraic object before. Nevertheless, I have seen both
the terms rig and rng before (see below), and, well, given those terms, rg is pretty
much the only reasonable term to give to such an algebraic object. We dont have a
need to work with rgs directly, but we will work with both rigs and rngs, and so it is
nice to have an object of which both rigs and rngs are special cases.

Definition A.1.122 Rng A rng is a rg such that hX, +, 0, i is a commutative group, that

is, a rg that has additive inverses.


Exercise A.1.123 Let hX, +, 0, i be a rng and let x1 , x2 X. Show the following proper-

ties.
(i). 0 x1 = 0.
(ii). (x1 ) x2 = (x1 x2 ).


Example A.1.124 A rg that is not a rng The even natural numbers 2N with their

usual addition and multiplication is also an example of a rg that is not a rng.


Definition A.1.125 Rig A rig is a rg such that hX, , 1i is a monoid, that is, a rg that has

a multiplicative identity.
R

In a rig R, we write
R := {r R : r has a multiplicative inverse.} .

(A.1.126)

R is a group with respect to the ring multiplication and is known as the group of units in R.
R

I believe it is more common to refer to rigs as semirings. I dislike this terminology


because it suggests an analogy with semigroups, of which there is none. The term rig
is also arguably more descriptiveeven if you didnt know what the term meant, you
might have a good chance of guessing, especially if you had seen the term rng before.

Definition A.1.127 Characteristic Let hX, +, 0, , , 1i be a rig. Then, either (i) there

is some m Z+ such that m 1 = 0 X or (ii) there is no such m. In the former case, the
smallest positive integer such that m 1 = 0 X is the characteristic, and in the latter case
the characteristic is 0. We denote the characteristic by Char(X).

Example A.1.128 A rg that is not a rig The even natural numbers 2N with their usual

addition and multiplication is a rg that is not a rig.


Definition A.1.129 Ring A ring is a rg that is both a rig and a rng.
R

The motivation for the terminology is as follows. Historically, the term ring was the first
to be used. It is not uncommon for authors to use the term ring to mean both our definition
and our definition minus the requirement of having a multiplicative identity. To remove this
ambiguity in terminology, we take the term ring to imply the existence of the identity and

Chapter A. Sets and categories

330

the removal of the i from the word is the term used for objects which do not necessarily
have an identity. Similarly, thinking of the n in ring as standing for negatives, a rig is
just a ring that does not necessarily posses additive inverses.
R

Whenever we say that a rg is commutative, we mean that the multiplication is commutative


(this should be obviousaddition is always commutative). Instead of saying referring to
things as commutative rgs etc. we will often shorten this to crg etc..

Note that it follows from Exercise A.1.123 that 1 x = x for all x X, X a ring.

Exercise A.1.130 Let X be a ring and suppose that 0 = 1. Show that X = {0}.
R

This is called the zero cring.

Definition A.1.131 Integral A rg hX, +, , 0i is integral iff it has the property that,
whenever x y = 0, it follows that either x = 0 or y = 0.
R

Usually the adjective integral is applied only to crings, in which case people refer
to this as an integral domain instead of an integral cring. As the natural numbers have
this property (i.e. xy = 0 x = 0 y = 0) I wanted an adjective that would describe
rgs with this property and integral was an obvious choice because of common use
of the term integral domain. It is then just more systematic to refer to them as
integral crings instead of integral domains. This is usually not an issue because it is
not very common to work with rigs or rgs (we of course need to because we construct
the natural numbers from the ground up).

Definition A.1.132 Skew-field A Skew-field is a ring hX, +, , 0, , 1i such that hX \

{0}, , 1,1 i is a group. That is, it is a ring such that every nonzero element has a multiplicative inverse.
R

In other words, every nonzero element has a multiplicative inverse.

Sometimes people use the term division ring instead of skew-field.

Exercise A.1.133 Show that all skew-fields are integral.

Definition A.1.134 Field A field is a commutative skew-field.


Exercise A.1.135 Let F be a field with positive characteristic p. Show that p is prime.
Definition A.1.136 Homomorphisms (of rgs) Let hX, +, i and hY, +, i be rgs and let

f : X Y be a function. Then, f is a homomorphism iff f is both a homomorphism (of


magmas) from hX, +i to hY, +i and from hX, i to hY, i. Explicitly, this means that
f (x + y) = f (x) + f (y), f (0) = 0, and f (xy) = f (x) f (y),

(A.1.137)

and furthermore, in the case that both X and Y are rigs, that
f (1) = 1.

(A.1.138)

A.1 Basic set theory

331

Quotient groups rngs

It is probably worth noting that this subsubsection is of relatively low priority. We present this
information here essentially because it gives a more unified, systematic, sophisticated, and elegant
way to view things presented in other places in the notes, but it is also not really strictly required to
understand these examples.
If you have never seen quotient rngs before, it may help to keep in the back of your mind
a concrete example as you work through the definitions. We recommend you keep in mind the
example R := Z and I := mZ (all multiplies of m) for some m Z+ . In this case, the quotient R/I
is (supposed to be, and in fact will turn-out to be) the integers modulo m. While this is a quotient
rng, it is also of course a quotient group (just forget about the multiplication), so this example may
also help you think about quotient groups as well.
As we shall use quotient groups to define quotient rngs, we do them first. The first thing to
notice is that every subgroup of a group induces an equivalence relation.
Definition A.1.139 Cosets (in groups) Let G be a group, let H be a subgroup, and let

g1 , g2 G. Then, we define g1
= g2 (mod H) iff g1
2 g1 H.
Exercise A.1.140 Show that
= (mod H) is an equivalent relation.

The equivalence class of g with respect to this equivalence relation is the left H-coset of g
and written gH. The quotient set (Definition A.1.78) of G with respect to this equivalence
relation id denoted G/H.
R

By changing the definition of the equivalent relation to . . . iff g1 g1


2 H, then we
obtain the corresponding definition of right H-cosets, denoted by Hg. Of course, in
general if the binary operation is not commutative, then gH 6= Hg.

For a subgroup H of G, G/H will always be a set. However, in good cases, it will be more than
just a setit will be a group in its own right.
Definition A.1.141 Ideals and quotient groups Let G be a group, let H G be a

subgroup, and let g1 , g2 G. Define


(g1 H) (g2 ) := (g1 g2 )H.

(A.1.142)

H is an ideal iff this is well-defined on the quotient set G/H. In this case, G/H is itself a
group, the quotient group of G modulo H.
R

Recall that gH represents the equivalent class of g modulo H, and so, in particular,
these definitions involve picking representatives of equivalence classes. Thus, in
order for these operations to make sense, they must be well-defined. In general, they
will not be well-defined, and we call H an ideal precisely in the good case where
these operations to make sense.

In the context of groups, it is much more common to refer to ideals as normal subgroups. As always, we choose the terminology we do because it is more universally
consistent, even if less common.

There is an easy way to check that a subgroup is an ideal or not that does not require directly
checking well-definedness.
Exercise A.1.143 Let G be a group and let H G be a subset. Show that H is an ideal iff

(i) it is a subgroup and (ii) gHg1 H for all g G.

Chapter A. Sets and categories

332

And now we turn to quotient rngs, whose development is completely analogous.


Definition A.1.144 Cosets (in rngs) Let R be a rng, let S R be a subrng, and let

r1 , r2 R. Then, we define r1
= r2 (mod S) iff r1 r2 S.
Exercise A.1.145 Show that
= (mod S) is an equivalence relation.

The equivalence class of r with respect to this equivalence relation is the S-coset of r and
written r + S. The quotient set of R with respect to this equivalence relation is denoted R/S.
You can check that mZ is indeed a subrng of Z and that Z/mZ consists of just m cosets:
0 + mZ, 1 + mZ, . . . , (m 1) + mZ, though you are probably more familiar just writing this as
0 (mod m), 1 (mod m), . . . , m 1 (mod m). Of course, however, Z/mZ is more than just a set, it has
a ring structure of its own, and in good cases, R/S will obtain a canonical ring structure of its own
as well.
Definition A.1.146 Ideals and quotient rngs Let R be a rng, let S R be a subrng,

and let r1 , r2 R. Define


(r1 + S) + (r2 + S) := (r1 + r2 ) + S and (r1 + S) (r2 + S) := (r1 r2 ) + S. (A.1.147)
S is an ideal iff both of these operations are well-defined. In this case, R/S is the quotient
rng of R modulo S.
Just as before, we have an easy way of checking of subsets are ideals.
Exercise A.1.148 Let R be a rng and let S R be a subset. Show that S is an ideal iff (i) it

is a subrng and (ii) r R and s S implies that r s, s r S.


R

The second property is sometimes called absorbing, because elements in the ideal
absorb things into the ideal when you multiply them.

Example A.1.149 Integers modulo m Let m Z+ .


Exercise A.1.150 Show that mZ is an ideal in Z.

Then, the integers modulo m are defined to be the quotient cring Z/mZ.

A.2

Basic category theory


First of all, a disclaimer: it is probably not best pedagogically speaking to start with even the very
basics of category theory. While in principle anyone who has the prerequisites for these notes
knows everything they need to know to understand category theory, it may be difficult to understand
the motivation for things without a collection of examples to work with in the back of your mind.
Thus, if anything in this section does not make sense the first time you read through it, you should
not worryt will only be a problem if you do not understand ideas here as they occur later on in
the text. In fact, it is probably perfectly okay to completely skip this section and reference back to
it as needed. In any case, our main motivation for introducing category theory in a subject like this
is simply that we would like to have more systematic language and notation.

A.2 Basic category theory


A.2.1

333

What is a category?
In mathematics, we study many different types of objects: sets, preordered sets, monoids, rngs,
topological spaces, schemes, etc..3 In all of these cases, however, we are not only concerned with
the objects themselves, but also with maps between them that preserve the relevant structure.
In the case of a set, there is no extra structure to preserve, and so the relevant maps are all the
functions. In contrast, however, for topological spaces, we will see that the relevant maps are
not all the functions, but instead all continuous functions.4 Similarly, the relevant maps between
monoids are not all the functions but rather the homomorphisms.5 The idea then is to come up with
a definition that deals with both the objects and the relevant maps, or morphisms, simultaneously.
This is the motivating idea of the definition of a category.
Definition A.2.1 Category A category C is

(i). a collection C0 called the objects of C ;


(ii). together with, for each A, B C0 , a collection MorC (A, B) called the morphisms from
A to B in C ;a
(iii). for each A, B,C C0 , a function : MorC (B,C) MorC (A, B) MorC (A,C) called
composition;
(iv). and for each A C0 a distinguished element idA MorC (A, A) called the identity of
A;
such that
(i). is associative, that is, f (g h) = ( f g) h for all morphisms f , g, h for which
these composition make sense,b and
(ii). f idA = f = idA f for all A C0 .
a No,

we do not require that MorC (A, B) be a set.


case youre wondering, the quotes around associative are used because usually the word associative
refers to a property that a binary operation has. A binary operation on a set S is, by definition, a function
from X X into X. Composition however in general is a function from X Y into Z for X := MorC (B,C),
Y := MorC (A, B) and Z := MorC (A,C), and hence not a binary operation.
b In

The intuition here is that the objects C0 are the objects you are interested in studying, and for
objects A, B C0 , the morphisms MorC (A, B) are the maps relevant to the study of the objects in
C . For us, it will usually be the case that every element of C0 is a set equipped with extra structure
(e.g. a binary operation) and the morphisms are just the functions that preserve this structure
(e.g. homomorphisms).
At the moment, this might seem a bit abstract because of the lack of examples. As you continue
through the main text, you will encounter more examples of categories, which will likely elucidate
this abstract definition. However, even already we have a couple basic examples of categories.


Example A.2.2 The category of sets The category of sets is the category Set whose

collection of objects Set0 is the collection of all sets,a for every set X and every set Y the
collection of morphisms from X to Y , MorSet (X,Y ), is precisely the set of all functions
3 No,

you are not expected to know what all of these are.


might say that the entire point of the notion of a topological space is it is one of the most general contexts in
which the notion of continuity makes sense. We will see exactly how this works later in the body of the text.
5 Once again, it is perfectly okay if you dont know what either a monoid or a homomorphism is.
4 You

Chapter A. Sets and categories

334

from X to Y ,b composition is given by ordinary function composition, and the identities of


the category are the identity functions.
a This
b Note

is not paradoxical as the collection of all sets is not itself a set.


that each MorSet (X,Y ) itself is a set.

We also have another example at our disposal, namely the category of preordered sets.
Example A.2.3 The category of preordered sets The category of preordered sets
is the category Pre whose collection of objects Pre0 is the collection of all preordered sets,
for every preordered set X and preordered set Y the collection of morphisms from X to Y ,
MorPre (X,Y ), is precisely the set of all nondecreasing functions from X to Y , composition
is given by ordinary function composition, and the identities of the category are the identity
functions.


The idea here is that the only structure on a preordered set is the preorder, and that the precise
notion of what it means to preserve this structure is to be nondecreasing. Of course, we could
everywhere replace the word preorder (or its obvious derivatives) with partial-order or totalorder and everything would make just as much sense. Upon doing so, we would obtain the category
of partially-ordered sets and the category of totally-ordered sets respectively.
We also have the category of magmas.
 Example A.2.4 The category of magmas The category of magmas is the category
Mag whose collection of objects Mag0 is the collection of all magmas, for every magma X
and every magma Y the collection of morphisms from X to Y , MorMag (X,Y ), is precisely
the set of all homomorphisms from X to Y , composition is given by ordinary function
composition, and the identities of the category are the identity functions.

Similarly, the idea here is that the only structure here is that of the binary operation (and possibly
an identity element) and that it is the homomorphisms which preserve this structure. Of course, we
could everywhere here replace the word magma with semigroup, monoid, group, etc. and
everything would make just as much sense. Upon doing so, we would obtain the categories of
semigroups, the category of monoids, and the category of groups respectively.
Finally we have the category of rgs.
Example A.2.5 The category of rgs The category of rgs is the category Rg whose
collection of objects Rg0 is the collection of all rgs, for every rg X and every rg Y the collection of morphisms from X to Y , MorRg (X,Y ), is precisely the set of all homomorphisms
from X to Y , composition is given by ordinary function composition, and the identities of
the category are the identity functions.


A.2.2

The same as before, we could have everywhere replaced the word rg with rig,
rng, or ring. These categories are denoted Rig, Rng, and Ring respectively.

Some basic notation


The real reason we introduce the definition of a category in notes like these is that it allows us
to introduce consistent notation and terminology throughout the text. Had we forgone even the
very basics of categories, we would still be able to do the same mathematics, but the notation and
terminology would be much more ad hoc.

A.2 Basic category theory

335

Definition A.2.6 Domain and codomain Let f : A B be a morphism in a category.

Then, the domain of f is A and the codomain of f is B.


R

Of course, these terms generalize the notions of domain and codomain for sets.

Definition A.2.7 Endomorphism Let C be a category and let A C0 . Then, an en-

domorphism is a morphism f MorC (A, A). We write EndC (A) := MorC (A, A) for the
collection of endomorphisms on A.
In other words, endomorphism is just a fancy name for a morphism with the same domain
and co-domain.
Definition A.2.8 Isomorphism Let f : A B be a morphism in a category. Then, f is

an isomorphism iff it is invertible, i.e., iff there is a morphism g : B A such that g f = idA
and f g = idB . In this case, g is an inverse of f . The collection of all isomorphisms from
A to B is denoted by IsoC (A, B).
Exercise A.2.9 Let f : A B be a morphism in a category and let g, h : B A be two

inverses of f . Show that g = h.


R

As a result of this exercise, we may denote the inverse of f by f 1 .a

a If inverses were not unique, then the notation

f 1 would be ambiguous: what inverse would we be referring

to?

Exercise A.2.10 Show that a morphism in Mag is an isomorphism iff (i) it is bijective, (ii)

it is a homomorphism, and (iii) its inverse is a homomorphism.


Exercise A.2.11 Show that the inverse of a bijective homomorphism of magmas is itself a

homomorphism.
R

Thus, if you want to show a function is an isomorphism of magmas, in fact you only
need to check (i) and (ii) of the previous exercise, because then you get (iii) for free.
(Of course, essentially the very same thing happens in Rg as well.)

Definition A.2.12 Isomorphic Let A, B C0 be objects in a category. Then, we A and

B are isomorphic iff there is an isomorphism from A to B. In this case, we write A


=C B, or

just A = B if the category C is clear from context.


Definition A.2.13 Automorphisms Let C be a category and let A C0 . Then, an

automorphism f : A A is a morphism which is both an endomorphism and an isomorphism.


We write AutC (A) := IsoC (A, A) for the collection of automorphisms of A.
Exercise A.2.14 Let X and Y be set. Show that IsoSet (X,Y ) is the set of all bijections from

X to Y .
Exercise A.2.15 Let C be a category. Show that
=C is an equivalence relation on C0 .

Bibliography

[1] Abbot, Stephen. Understanding Analysis. Springer-Verlag UTM. (2010).


[2] Cohn, Donald L.. Measure Theory, 2nd Ed.. Springer. (2013).
[3] Coleman, Mark. Continuous functions nowhere differentiable. Constnondiff.pdf
[4] Gelbaum, Bernard R.; Olmsted, John M. H.. Counterexamples in Analysis. Dover Publications
Inc.. (1992).
[5] Gleason, Jonatahn. A homeomorphism of R that doesnt preserve lebesgue measurability. (2009). https://www.academia.edu/14753183/A_homeomorphism_of_mathbb_R_
that_doesnt_preserve_lebesgue_measurability.
[6] Gleason, Jonathan. Existence and uniqueness of haar measure. (2010). https://www.
academia.edu/740208/Existence_and_Uniqueness_of_Haar_Measure.
[7] Grayson, Daniel R.. Zorns Lemma.
[8] Hart, K. P.; Nagata, J.; Vaughn, J. E.. Encyclopedeia of General Topology. Elsevier Science.
(2004).
[9] Hnig, Chaim Samuel. Proof of the well-ordering of cardinal numbers
[10] Howes, Norman R.. Modern Analysis and Toology. Springer-Verlag Universitext. (1991).
[11] Isbell, J. R. . Uniform spaces. American Mathematical Society. (1964).
[12] Kelley, John L.. General Topology. Springer-Verlag GTM. (1955).
[13] Kunzi, Hans-Peter A.. An Introduction to the Theory of Quasi-uniform Spaces. (7 May 2005).
http://math.sun.ac.za/cattop/Output/Kunzi/quasiintr.pdf.
[14] Lee, John M.. Riemannian ManifoldsAn Introduction to Cuvature. Springer-Verlag GTM.
(1997).

338

BIBLIOGRAPHY

[15] mathoverflow
[16] math.stackexchange
[17] Munkres, James. Topology, 2nd Ed.. Prentice Hall. (2000).
[18] Preuss, Gerhard.. Foundations of TopologyAn Approach to Convenient Topology. Kluver
Academic Publishers. (2002).
[19] Pugh, Charles Chapman. Real Mathematical Analysis. Springer-Verlag UTM. (2002).
[20] Ross, Kenneth. Elementary Analysis: The Theory of Calculus. Springer-Verlag UTM. (2013).
[21] Rudin, Walter. Principles of Mathematical Analysis, 3rd Ed.. McGraw-Hill. (1976).
[22] Rudin, Walter. Real and Complex Analysis, 3rd Ed.. McGraw-Hill. (1987).
[23] Steen, Lynn Arthur. Seebach Jr., J. Arthur. Counterexamples in Topology. Holt, Rinehart and
Winston Inc.. (1970).
[24] Stein, Elias M. Shakarchi, Rami. Real AnalysisMeasure Theory, Integration, and Hilbert
Spaces. Princeton University Press. (2005).
[25] Thomas, J.. A regular space, not completely regular. The American Mathematical Monthly,
Vol. 76, No. 2, pp. 181182. (1969).
[26] Wikipedia

Index

F set, 90
G set, 90
L p norm, 270
T0 , 128
T0 quotient, 129
T1 , 129
T2 , 130
T4 , 140
T2 1 , 132
2
T3 1 , 138
2
T4 1 , 141
2
-algebra, 211
-compact, 216
-quasicompact, 216
Absolute convergence, 61
Absolute value, 44
Absolutely integrable, 253
Accessible (topological space), 129
Accumulation point, 76, 98
Additive (measure), 208
Adherent point, 98
Algebra, 182
Algebraic (number), 43
Algebraic Derivative Theorems, 286
Algebraic Limit Theorems, 52
Almost-everywhere category, 209
Almost-everywhere XYZ, 208

Alternating harmonic series, 63


Alternating series, 62
Analytic, 295
Antisymmetric, 320
Arccotangent, 302
Archimedean field, 45
Arctangent, 302
Arens Square, 127
Associative, 327
Automorphism, 335
Average slope, 289
Baire space, 203
Ball, 44
Banach Fixed-Point Theorem, 205
Base, 90
Bernstein-Cantor-Schrder Theorem, 17
Bijective, 317
Bilinear, 277
Binary operation, 327
Bolzano-Weierstrass Theorem, 86
Borel (function), 253
Borel measure, 215
Borel measure space, 215
Borel sets, 218
Bounded map (of semimetric spaces), 174
Box topology, 121
Cantor Function, 249

340
Cantor Set, 246
Cantors Cardinality Theorem, 43
Cantor-Heine Theorem, 168
Cantor-Smith-Volterra Set, 247
Carathodorys Theorem, 211
Carathodory-Vitali Theorem, 268
Cardinal number, 14
Cartesian-product, 314, 319
Category, 333
cauchy, 51
Cauchy (in uniform spaces), 191
Cauchy completion, 195
Cauchy-Schwarz Inequality, 271
Chain rule, 287
Characteristic (of a rig), 329
Characteristic function, 252
Clairuts Theorem, 307
Classical Extreme Value Theorem, 150
Classical Intermediate Value Theorem, 149
Classical Intermediate-Extreme Value
Theorem, 150
Closed (in R), 76
Closed (in a topological space), 89
Closure, 79, 99
Cocountable, 101
Cocountable extension topology, 135
Cocountable topology, 100
Codomain (of a function), 315
Codomain (of a morphism), 335
Cofinal, 68
Cofinite, 123
Cofinite topology, 132
Commutative, 327
Compact, 130
Compactly integrable, 253
Comparison Test, 62
Complement, 313
Complete (uniform space), 192
Completely-T2 , 133
Completely-T3 , 139
Completely-hausdorff, 133
Completely-separated, 126
Completion (of a uniform space), 195
Component (cartesian-product), 319
Component (of a function into a product),
319
Composition, 315, 333
Conditional convergence, 61
Connected, 145

BIBLIOGRAPHY
Connected component, 147
Continuous, 73, 94
Continuous (at a point), 73, 94
Continuous (tensor), 280
Contraction, 278
Contraction-Mapping Theorem, 205
Contravariant rank, 277
Contravariant tensor, 277
Convergence, 49, 97, 105
Coordinate (cartesian-product), 319
Coset, 331
Cosine function, 298
Countably-infinite, 42
Covariant rank, 277
Covariant tensor, 277
Covector, 276
Cover, 83, 101
Crg, 330
Crig, 330
Cring, 330
Crng, 330
Curl, 285
De Morgans Laws, 314
Decreasing, 323
Dedekind cut, 33
Dedekind-complete, 32
Dedekind-MacNeille completion, 34
Degree (of a polynomial), 46
Dense, 102
Derivative (of a tensor field), 284
Derived filter, 105
Devils Staircase, 249
Diameter, 185
Diffeomorphism, 310
Difference quotient, 281
Differentiable (function), 281
Differentiable (tensor field), 284
Dirac measure, 221
Directed set, 48
directional derivative, 281
Dirichlet Function, 73
Disconnected, 145
Discrete topology, 94
Discrete uniformity, 161
Disjoint, 314
Disjoint-union, 314, 318
disjoint-union topology, 123
Disk, 44
Divergence, 50, 285

341

BIBLIOGRAPHY
Division ring, 330
Domain (of a function), 315
Domain (of a morphism), 335
Downward closed, 325
Downward-directed, 159
Dual space, 275
Dual vector, 279
Embedding, 95
Empty-set, 313
Endomorphism, 335
Equinumerous, 14
Equivalence class, 320
Equivalence relation, 320
Essential supremum, 270
Eulers Number, 298
Eventually XYZ, 49
Exponential, 74
Exponential function, 298
Exterior derivative, 285
Extreme Value Theorem, 149
Fermats theorem, 288
Field, 330
Filter base, 104
Filtering, 106
Final topology, 117
Final uniformity, 167
Finite, 14
First Derivative Test, 288
Frchet (topological space), 129
Frchet differentiable, 282
Fraction field, 29
Function, 315
Fundamental Theorem of Calculus, 292
Gteau differentiable, 282
Gauge spaces, 171
Generate (a topology), 92
Generating collection (of a topology), 92
Geometric series, 61
Globally-analytic, 295
Gradient, 282
Greatest lower-bound property, 32
Greatest lower-bounds, 31
Grothendieck Group, 25
Group, 328
Group of units, 329
Hlder conjugate, 270
Hlders Inequality, 271

Half-open rectangles, 244


Harmonic series, 63
Hausdorff (topological space), 130
Heine-Borel Theorem, 83
Higher Derivative Test, 309
Homeomorphism, 95
Homomorphism (of algebras), 182
Homomorphism (of magmas), 328
Homomorphism (of rgs), 330
Hyperconnected, 145
Ideal, 331, 332
Identity (in a category), 333
Identity function, 315
Iff, 312
Image, 316
Inclusion (disjoint-union), 319
Increasing, 323
Indexed collection, 318
Indiscrete topology, 94
Indiscrete uniformity, 161
Infimum, 31
Infinite, 14
Infinitely-differentiable, 283
Initial topology, 115
Injective, 317
Integers, 25
Integers modulo m, 332
Integral, 330
Integral domain, 25, 330
Interior, 80, 100
Interior point, 78, 99
Intermediate Value Theorem, 149
Intersect, 314
Intersection, 314
Interval, 323
Inverse (of a function), 316
Inverse Function Theorem, 310
Irrational numbers, 39
Isogeneous base, 225
Isogeneous cover, 225
Isogeneous measure, 225
Isomorphic, 335
Isomorphism, 335
Kelleys Convergence Axioms, 98
Kelleys Convergence Theorem, 110
Kelleys Filter Convergence Axioms, 107
Kelleys Filter Convergence Theorem, 112
Kolmogorov (topological space), 128

342
Kolmogorov quotient, 129
Kuratowski Closure Axioms, 81, 99
Kuratowski Interior Axioms, 82, 100
Kuratowskis Closure Theorem, 108
Kuratowskis Interior Theorem, 109
Laplacian, 285
Least upper-bound property, 32
Lebesgue measure, 234
Lebesgues Monotone Convergence
Theorem, 258
Left-inverse (of a function), 316
Limit (of a filter), 105
Limit (of a function), 72, 98
Limit (of a net), 49, 97
Limit Comparison Test, 63
Limit inferior, 54
Limit point, 76, 98
Limit superior, 54
Linear functionals, 276
Lipschitz-continuous, 174
Local base, 91
Local diffeomorphism, 310
Local extremum, 288
Local maximum, 288
Local minimum, 288
Locally XYZ, 150
Lower-bound, 324
Lower-semicontinuous, 268
Maclaurin series, 295
Magma, 327
Maximal, 324
Maximum, 324
Mean Value Theorem, 289
Measurable (function), 212
Measurable (set), 210
Meet, 314
Meet (of covers), 157
Metric, 171
Metric (on a vector space), 279
Metric outer-measure, 222
Metric space, 171
Minimal, 324
Minimum, 324
Minkowskis Inequality, 271
Monoid, 327
Monotone Convergence Theorem, 54
Morphisms, 333
Natural logarithm, 299

BIBLIOGRAPHY
Natural numbers, 14
Neighborhood, 91
Neighborhood base, 91
Net, 48, 97
Niemytzkis Tangent Disk Topology, 204
Nondecreasing, 323
Nondegenrate (metric), 279
Nonincreasing, 323
Nonsingular (metric), 279
Normal (topological space), 140
Normal subgroups, 331
Normed vector space, 181
Objects, 333
Open (in R), 75
Open (in a topological space), 89
Open cover, 83, 101
Open function, 122
Open neighborhood, 91
Order Limit Theorem, 53
Order topology, 93
Ordered pair, 314
Outer-measure, 208
Outer-measure space, 208
Partial derivatives, 304
Partial sums, 61
Partial-order, 323
Partially-ordered set, 323
Partition, 321
Pasting Lemma, 96
Peano axioms, 24
Perfectly-T2 , 134
Perfectly-separated, 127
Polynomial cring, 46
Poset, 323
Power series, 294
Power set, 315
Precisely-separated, 127
Preorder, 323
Preordered rg, 21
Preordered set, 323
Product isogeneous structure, 243
Product measure, 235
Product Rule, 286
Product topology, 120
Projection (cartesian-product), 319
Proper class, 312
Proper subset, 313
Pseudometrics, 171

343

BIBLIOGRAPHY
Quasicompact, 83, 101
Quasicompactly-generated, 193
Quotient function, 322
Quotient group, 331
Quotient rng, 332
Quotient Rule, 287
Quotient set, 322
Quotient topology, 120
Radius of convergence, 294
Range, 316
Rational functions, 46
Rational numbers, 29
Real numbers, 34
Refinement, 157
Reflexive, 320
Regular (topological space), 134
Regular measure, 215
Regular measure space, 215
Relation, 315
Restriction (disjoint-union), 319
Reverse Triangle Inequality, 44
Rg, 328
Riemann integrable, 269
Riemann integral, 269
Riesz-Fischer Theorem, 273
Rig, 329
Right-inverse (of a function), 316
Ring, 329
Rng, 329
Root Test, 63
Russels paradox, 311
Semifinite, 219
Semigroup, 327
Semimetric, 171
Semimetric space, 171
Seminorm, 181
Seminormed algebra, 183
Seminormed vector space, 181
Separated, 124
Separated by closed neighborhoods, 125
Separated by continuous functions, 126
Separated by neighborhoods, 125
Sequence, 48, 97
Series, 61
Signum function, 21
Simple function, 252, 253
Simplified Arens Square, 133
Sine function, 298

Skew-field, 330
Smooth, 283
Squeeze Theorem, 53
Star, 154
star-refinement, 157
Statement, 312
Strict subnet, 69, 98
Strictly-positive (measure), 221
Subadditive, 208
Subbase, 103
Subnet, 9, 68, 98
Subsequence, 68, 98
Subset, 313
Subspace topology, 119
Successor function, 24
Supremum, 30
Supremum norm, 183
Supremum seminorm, 183
Surjective, 317
Symmetric, 320
Tangent bundle, 280
Tangent space, 280
Taylor series, 295
Tensor, 277
Tensor field, 280
Tensor product (of tensors), 276
Tensor product (of vector spaces), 276
The Haar-Howes Theorem, 226
Thomae Function, 73
Thomas Tent Space, 138
Topological measure space, 216
Topological space, 89
Topological vector space, 180
Topologically-additive, 217
Topologically-distinguishable, 124
Topology, 89
Total, 320
Total-order, 324
Totally-ordered set, 324
Transitive, 320
Triangle Inequality, 44
Triangle Inequality (integral), 265
Tropical integers, 28
Tuple, 314
Two-sided-inverse (of a function), 316
Tychnoffs Theorem, 122
tychonoff, 139
Uncountable Fort Space, 128

344
Uniform base, 162
Uniform convergence, 183
Uniform covers, 160
Uniform space, 159
Uniform topology, 160
Uniform-homeomorphism, 161
Uniformity, 159
Uniformly-T0 , 184
Uniformly-T2 , 184
Uniformly-T3 , 185
Uniformly-T4 , 185
Uniformly-additive, 222
uniformly-completely-T2 , 185
Uniformly-completely-T3 , 185
Uniformly-completely-T4 , 185
Uniformly-completely-separated, 184
Uniformly-distinguishable, 184
Uniformly-measurable base, 225
Uniformly-measurable cover, 225
uniformly-perfectly-T2 , 185

BIBLIOGRAPHY
Uniformly-perfectly-T3 , 185
Uniformly-perfectly-T4 , 185
Uniformly-perfectly-separated, 184
Uniformly-separated, 184
Union, 314
Unique up to isomorphism, 25
Upper bound, 324
Upper-semicontinuous, 268
Upward closed, 325
Upward-closed, 159
Urysohn, 132
Urysohns Lemma, 142
Vector bundle, 280
Well-defined, 15
Well-order, 324
Well-ordered set, 324
Zero cring, 330
Zorns Lemma, 325

Index of notation

2X , 315
A B, 314
A
= B, 335
A
=C B, 335
A B, 314
B (x0 ), 44
C (Rd ), 285
D (x0 ), 44
Hg, 331
R , 329
S R, 315
V W , 276
X \Y , 313
X tY , 314
X Y , 313
X Y , 314
X + , 21
X0+ , 21
Y X , 315
Y c , 313
[(a, b)], 59
[x0 ], 320
[x0 ] , 320
AR , 43
A, 43
AutC (A), 335
Bor(X), 253
Bor+
0 (X), 253

Cls(S), 79, 99
Dv f (x), 281
EndC (A), 335
Int(S), 80, 100
IsoC (A, B), 335
Mag, 334
MorC (A, B), 333
N, 14
PreRg, 22
PreRig, 22
PreRing, 22
PreRng, 22
Pre, 334
Rg, 334
Rig, 334
Ring, 334
Rng, 334
Set, 333
StarU (S), 154
StarU (x), 154
TopGrp, 177
Top, 95
Uni, 161
Z, 25
|X|, 14
|x|, 44
0 , 42
S , 252

:=, 313
hx,
yi, 314
XX X, 318
cos, 298
deg(p), 46
distC (x), 173
e, 298
, 312
exp, 298
,
312
R
d m(x) f (x), 258, 259
RX
dx f (x), 259
RX
[a,b] dx f (x), 259
lcm, 176
lim inf x , 54
lim sup x , 54
lim x , 50
limm xm , 50
limt0+ xt , 50
limt xt , 50
limxa f (x) = L, 72
BU , 178
C0 , 333
F 7x , 105
U V , 157
U  V , 157
U V , 157
U , 169

346
m1 m2 , 235
| f |K , 183
| f | , 183
| f | p , 270

X , 319
XX X, 319
f |X , 319
sgn, 21
sin, 298
supxS , 48

BIBLIOGRAPHY
, 300
, 300
eG , 178
B
fD , 173
U
ax , 74
a , 48
f (U ), 158
f 1 , 316
f 1 (V ), 158
f+ , 258

f , 258
fX , 319
gH, 331
g1
= g2 (mod H), 331
m n, 17
v w, 276
va a f (x), 281
x R y, 315
x y, 315

Potrebbero piacerti anche