Sei sulla pagina 1di 142

An Introduction to

Multivariable Mathematics
Copyright © 2008 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in
printed reviews, without the prior permission of the publisher.

An Introduction to Multivariable Mathematics


Leon Simon
www.morganclaypool.com

ISBN: 9781598298017 paperback


ISBN: 9781598298024 ebook

DOI 10.2200/S00147ED1V01Y200808MAS003

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON MATHEMATICS AND STATISTICS

Lecture #3
Series Editor: Steven G. Krantz, Washington University, St. Louis

Series ISSN
Synthesis Lectures on Mathematics and Statistics
ISSN pending.
An Introduction to
Multivariable Mathematics

Leon Simon
Stanford University

SYNTHESIS LECTURES ON MATHEMATICS AND STATISTICS #3

M
&C Morgan & cLaypool publishers
ABSTRACT
The text is designed for use in a 40 lecture introductory course covering linear algebra, multivariable
differential calculus, and an introduction to real analysis.
The core material of the book is arranged to allow for the main introductory material on linear
algebra, including basic vector space theory in Euclidean space and the initial theory of matrices and
linear systems, to be covered in the first 10 or 11 lectures, followed by a similar number of lectures
on basic multivariable analysis, including first theorems on differentiable functions on domains in
Euclidean space and a brief introduction to submanifolds. The book then concludes with further
essential linear algebra, including the theory of determinants, eigenvalues, and the spectral theorem
for real symmetric matrices, and further multivariable analysis, including the contraction mapping
principle and the inverse and implicit function theorems. There is also an appendix which provides
a 9 lecture introduction to real analysis. There are various ways in which the additional material in
the appendix could be integrated into a course—for example in the Stanford Mathematics honors
program, run as a 4 lecture per week program in the Autumn Quarter each year, the first 6 lectures
of the 9 lecture appendix are presented at the rate of one lecture a week in weeks 2–7 of the quarter,
with the remaining 3 lectures per week during those weeks being devoted to the main chapters of
the text.
It is hoped that the text would be suitable for a 1 quarter or 1 semester course for students who have
scored well in the BC Calculus advanced placement examination (or equivalent), particularly those
who are considering a possible major in mathematics.
The author has attempted to make the presentation rigorous and complete, with the clarity and
simplicity needed to make it accessible to an appropriately large group of students.

KEYWORDS
vector, dimension, dot product, linearly dependent, linearly independent, subspace,
Gaussian elimination, basis, matrix, transpose matrix, rank, row rank, column rank,
nullity, null space, column space, orthogonal, orthogonal complement, orthogonal pro-
jection, echelon form, linear system, homogeneous linear system, inhomogeneous linear
system, open set, closed set, limit, limit point, differentiability, continuity, directional
derivative, partial derivative, gradient, isolated point, chain rule, critical point, second
derivative test, curve, tangent vector, submanifold, tangent space, tangential gradient,
permutation, determinant, inverse matrix, adjoint matrix, orthonormal basis, Gram-
Schmidt orthogonalization, linear transformation, eigenvalue, eigenvector, spectral the-
orem, contraction mapping, inverse function, implicit function, supremum, infimum,
sequence, convergent, series, power series, base point, Taylor series, complex series, vec-
tor space, inner product, orthonormal sequence, Fourier series, trigonometric Fourier
series.
v

Contents
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .v

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

1 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1 Vectors in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Dot product and angle between vectors in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3 Subspaces and linear dependence of vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

4 Gaussian Elimination and the Linear Dependence Lemma . . . . . . . . . . . . . . . . . . . . .7


5 The Basis Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

6 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

7 Rank and the Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

8 Orthogonal complements and orthogonal projection . . . . . . . . . . . . . . . . . . . . . . . . . .18

9 Row Echelon Form of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

10 Inhomogeneous systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2 Analysis in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

1 Open and closed sets in Euclidean Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


2 Bolzano-Weierstrass, Limits and Continuity in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3 Differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .35

4 Directional Derivatives, Partial Derivatives, and Gradient . . . . . . . . . . . . . . . . . . . . . 37


5 Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

6 Higher-order partial derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42


vi CONTENTS

7 Second derivative test for extrema of multivariable function . . . . . . . . . . . . . . . . . . . 44

8 Curves in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
9 Submanifolds of Rn and tangential gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .53

3 More Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

2 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .64

3 Inverse of a Square Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69


4 Computing the Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5 Orthonormal Basis and Gram-Schmidt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

6 Matrix Representations of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . 75


7 Eigenvalues and the Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4 More Analysis in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

1 Contraction Mapping Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

2 Inverse Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82


3 Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

A Introductory Lectures on Real Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

Lecture 1: The Real Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

Lecture 2: Sequences of Real Numbers and the Bolzano-Weierstrass Theorem . . . . . . . . .91

Lecture 3: Continuous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96


Lecture 4: Series of Real Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

Lecture 5: Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105


Lecture 6: Taylor Series Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

Lecture 7: Complex Series, Products of Series, and Complex Exponential Series . . . . . . 113

Lecture 8: Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116


CONTENTS vii

Lecture 9: Pointwise Convergence of Trigonometric Fourier Series . . . . . . . . . . . . . . . . . . 121

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
viii CONTENTS
Preface
The present text is designed for use in an introductory honors course. It is in fact a fairly faithful rep-
resentation of the material covered in the course “Math 51H—Honors Multivariable Mathematics”
taught during the 10 week Autumn quarter in the Stanford Mathematics Department, and typically
taken in the first year by those students who have scored well in the BC Calculus advanced placement
examination (or equivalent), many of whom are considering a possible major in mathematics.
The text is designed to be covered by 4 lectures per week (each of 50 minutes duration) in a 10 week
quarter, or by 3 lectures per week over a 13 week semester. In the Stanford program the material in
the Appendix (“Introductory lectures on real analysis”) is run parallel to the material of the main
text—the first 6 real analysis lectures are presented in doses of 1 lecture per week (Mondays are “real
analysis days” in the Stanford program) over a 6-week period, and during these weeks the remaining
3 lectures per week are devoted to the main text. A typical timetable based on this approach is to be
found at:
http://math.stanford.edu/˜lms/51h-timetable.htm
A serious effort has been made to ensure that the present text is rigorous and complete, but at the
same time that it has the clarity and simplicity needed to make it accessible to an appropriately large
group of students. For most students, regular (at least weekly) problem sets will be of paramount
importance in coming to grips with the material, and ideally these problem sets should involve a
mixture of questions which are closely integrated with the lectures and which are designed to help
in the assimilation the various concepts at a level which is at least one step removed from routine
application of standard definitions and theorems.
This text is dedicated to the many Stanford students who have taken the Honors Multivariable
Mathematics course in recent years. Their vitality and enthusiasm have been an inspiration.

Leon Simon
Stanford University
July 2008
x PREFACE
1

CHAPTER 1

Linear Algebra
1 VECTORS IN Rn
Rn will denote real Euclidean space of dimension n—that is, Rn is the set of all ordered n-tuples
of real numbers, referred to as “points in Rn ” or “vectors in Rn ”—the two alternative terminologies
“points” and “vectors” will be used quite interchangeably, and we in no way distinguish between
them.1 Depending on the context we may decide to write vectors (i.e., points) in Rn as columns or
rows: in the latter case, Rn would denote the set of all ordered n-tuples (x1 , . . . , xn ) with xj ∈ R
⎛ ⎞
x1
⎜x2 ⎟
⎜ ⎟
for each j = 1, . . . , n. In the former case, Rn would denote the set of all columns ⎜ . ⎟. In this
⎝ .. ⎠
xn
case we often (for compactness of notation) write (x1 , . . . , xn )T as an alternative notation for the
⎛ ⎞ ⎛ ⎞T
x1 x1
⎜ x2 ⎟ ⎜ x2 ⎟
⎜ ⎟ ⎜ ⎟
column vector ⎜ . ⎟, and conversely ⎜ . ⎟ = (x1 , . . . , xn ).
⎝ .. ⎠ ⎝ .. ⎠
xn xn
For the moment, we shall write our vectors (i.e., points) as rows. Thus, for the moment, points
x ∈ Rn will be written x = (x1 , . . . , xn ). The real numbers x1 , . . . , xn are referred to as components
of the vector x. More specifically, xj is the j -th component of x.
To start with, there are two important operations involving vectors in Rn : we can (i) add them
according to the definition

1.1 (x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn )

and (ii) multiply a vector (x1 , . . . , xn ) in Rn by a scalar (i.e., by a λ ∈ R) according to the definition

1.2 λ(x1 , . . . , xn ) = (λx1 , . . . , λxn ) .

Using these two definitions we can prove the following more general property: If x = (x1 , . . . , xn ),
y = (y1 , . . . , yn ), and if λ, μ ∈ R, then

1.3 λx + μy = (λx1 + μy1 , . . . , λxn + μyn ) .


1 Nevertheless, from the intuitive point of view it is sometimes helpful to think geometrically of vectors as directed arrows; for
example, such a geometric interpretation is suggested in the discussion of line vectors below.
2 CHAPTER 1. LINEAR ALGEBRA

Notice, in particular, that, using the definitions 1.1, 1.2, starting with any vector x = (x1 , . . . , xn )
we can write

n
1.4 x= xj ej ,
j =1

where ej is the vector with each component = 0 except for the j -th component which is equal to 1.
Thus, e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1). The vectors e1 , . . . , en are
called standard basis vectors for Rn .
The definitions 1.1, 1.2 are of course geometrically motivated—the definition of addition is motivated
by the desire to ensure that the triangle (or parallelogram) law of addition holds (we’ll discuss this
below after we introduce the notion of “line vector” joining two points of Rn ) and the second is
motivated by the natural desire to ensure that λx should be a vector in either the same or reverse
direction as x (according as λ is positive or negative) and that it should have “length” equal to |λ|
times the original length of x. To make sense of this latter requirement, we need to first define length
of a vector x = (x1 , . . . , xn ) ∈ Rn : motivated by2 Pythagoras, it is natural to define the length of x
(also referred to as the distance of x from 0), denoted x, by




n 2
1.5 x = xj .
j =1


There is also a notion of “line vector” AB from one point A = a = (a1 , . . . , an ) ∈ Rn to another
point B = b = (b1 , . . . , bn ) defined by

1.6 AB = (b1 − a1 , . . . , bn − an ) (= b − a using the definition 1.3 ) .

Notice, that geometrically, we may think of AB as a directed line segment (or arrow) with tail at

A and head at B, but mathematically, AB is nothing but b − a, i.e., the point with coordinates
(b1 − a1 , . . . , bn − an ). By definition, we then do have the triangle identity for addition:
→ → →
1.7 AB + BC = AC ,
→ →
because if A = a = (a1 , . . . , an ), B = b = (b1 , . . . , bn ), C = c = (c1 , . . . , cn ) then AB + BC =

(b − a) + (c − b) = c − a = AC.

2There is often confusion here in elementary texts, which are apt to claim, at least in the case n = 2, 3, that 1.5 is a consequence
of Pythagoras, i.e., that 1.5 is proved by using Pythagoras’ theorem; that is not the case—1.5 is a definition, motivated by the desire
to ensure that the Pythagorean theorem holds when we later give the appropriate definition of angle between vectors.
2. DOT PRODUCT AND ANGLE BETWEEN VECTORS IN Rn 3

2 DOT PRODUCT AND ANGLE BETWEEN VECTORS IN Rn


If x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ), we define the dot product of x and y, denoted by x · y,

by x · y = nj=1 xj yj . Notice that then the dot product has the properties:

(i) x·y =y·x


(ii) (μx + λy) · z = μx · z + λy · z
(iii) x · x = x2 .

Using the above properties, we can check the identity

2.1 x + y2 = x2 + y2 + 2x · y

as follows:

x + y2 = (x + y) · (x + y) (by (iii))


= x · x + y · y + 2x · y (by several applications of (i) and (ii))
= x2 + y2 + 2x · y (by (iii) again) .

In particular, with ty in place of y (where t ∈ R) we see that

2.2 x + ty2 = t 2 y2 + 2tx · y + x2 .

Observe that if y = 0 this is a quadratic in the variable t, and by using “completion of the square”
at 2 + 2bt + c = a(t + b/a)2 + (ac − b2 )/a (valid if a  = 0), we see that 2.2 can be written as:

2.3 x + ty2 = y2 (t + x · y/y)2 + (y2 x2 − (x · y)2 )/y .

Observe that the right side here evidently takes its minimum value when, and only when, t =
−x · y/y and the corresponding minimum value of x + ty2 is (y2 x2 − (x · y)2 )/y and
the corresponding value of x + ty is the point x − y−2 (x · y)y. Thus,

2.4 x + ty2 ≥ (y2 x2 − (x · y)2 )/y ,

with equality if and only if t = −x · y/y, in which case x + ty = x − y−2 (x · y)y. We can give
a geometric interpretation of 2.4 as follows.
For y  = 0, the straight line  through x parallel to y is by definition the set of points {x + ty : t ∈ R},
so 2.4 says geometrically that the point x − y −2 (x · y)y is the point of  which has least distance
to the origin, and the least distance is equal to (y2 x2 − (x · y)2 )/y.
Observe that an important particular consequence of the above discussion is that y2 x2 − (x ·
y)2 ≥ 0, i.e.,

2.5 |x · y| ≤ x y ∀ x, y ∈ Rn ,


4 CHAPTER 1. LINEAR ALGEBRA

and in the case y  = 0 equality holds in 2.5 if and only if x = y−2 (x · y)y. (We technically
proved 2.5 only in case y  = 0, but observe that 2.5 is also true, and equality holds, when y = 0.)
The inequality 2.5 is known as the Cauchy-Schwarz inequality.
An important corollary of 2.5 is the following “triangle inequality” for vectors in Rn :

2.6 x, y ∈ Rn ⇒ x + y ≤ x + y .

This is proved as follows. By 2.1 and 2.5, x + y2 = x2 + y2 + 2x · y ≤ x2 + y2 +
2xy = (x + y)2 and then 2.6 follows by taking square roots.
Now we want to give the definition of the angle between two nonzero vectors x, y. As a preliminary
we recall that the function cos θ is a 2π -periodic function which has maximum value 1, attained
at θ = kπ , k any even integer, and minimum value −1 attained at θ = kπ with k any odd integer.
Also cos θ is strictly decreasing from 1 to −1 as θ varies between 0 and π. Thus, for each value
t ∈ [−1, 1] there is a unique value of θ ∈ [0, π] (denoted arccos t) with cos θ = t. We will give a
proper definition of the function cos θ and discuss these properties in detail later (in Lecture 6 of
Appendix A), but for the moment we just assume them without further discussion. We are now ready
to give the formal definition of the angle between two nonzero vectors. So suppose x, y ∈ Rn \ {0}.
Then we define3 the angle θ between x, y by
 
2.7 θ = arccos x−1 x · y−1 y .

Notice this makes sense because x−1 x · y−1 y ∈ [−1, 1] by the Cauchy-Schwarz inequality 2.5,
which says precisely that −xy ≤ x · y ≤ xy.
Observe that by definition we then have

2.8 x · y = xy cos θ, x, y ∈ Rn \ {0} ,

where θ is the angle between x, y.

2.9 Definition: Two vectors x, y are said to be orthogonal if x · y = 0.


Observe that then the zero vector is orthogonal to every other vector, and, by 2.8, two nonzero
vectors x, y ∈ Rn are orthogonal if and only if the angle between them is π/2 (because cos θ = 0
with θ ∈ [0, π ] if and only if θ = π/2).

SECTION 2 EXERCISES
2.1 Explain why the following “proof ” of the Cauchy-Schwarz inequality |x · y| ≤ x y, where
x, y ∈ Rn , is not valid:
3 Again, there is often confusion in elementary texts about what is a definition and what is a theorem. In the present treatment,

2.7 is a definition which relies only on the Cauchy-Schwarz inequality 2.5 (which ensures x −1 y −1 x · y ∈ [−1, 1]) and the
fact that arccos is a well defined function on [−1, 1] (with values ∈ [0, π ]). With this approach it becomes a theorem rather than
a definition that cos θ is the ratio (length of the edge adjacent to the angle θ )/(length of hypotenuse) in a right triangle—see
problem 6.5 of real analysis Lecture 6 in the Appendix for further discussion.
3. SUBSPACES AND LINEAR DEPENDENCE OF VECTORS 5

Proof: If either x or y is zero, then the inequality |x · y| ≤ x y is trivially correct because both
sides are zero. If neither x nor y is zero, then as proved above x · y = x y cos θ, where θ is the
angle between x and y. Hence, |x · y| = x y | cos θ | ≤ x y.
Note: We shall give a detailed discussion of the function cos x later (in Lecture 6 of the Appendix); for the moment you should of
course assume that cos x is well defined and has its usual properties (e.g., it is 2π -periodic, has absolute value ≤ 1 and takes each
value in [−1, 1] exactly once if we restrict x to the interval [0, π ], etc.). Thus, | cos θ | ≤ 1 (used in the last step above) is certainly
correct.

2.2 (a) The Cauchy-Schwarz inequality proved above guarantees |x · y| ≤ xy and hence x · y ≤
xy for all x, y ∈ Rn . If y  = 0, prove that equality holds in the first inequality if and only if
x = λy for some λ ∈ R and equality holds in the second inequality if and only if x = λy for some
λ ≥ 0.
(b) By examining the proof of the triangle inequality x + y ≤ x + y given above (recall that
proof began with the identity x + y2 = x2 + y2 + 2x · y), prove that equality holds in the
triangle inequality ⇐⇒ either at least one of x, y is 0 or x, y  = 0 and y = λx with λ > 0.

2.3 (Another proof of the Cauchy-Schwarz inequality.) If a = (a1 , . . . , an ), b =



(b1 , . . . , bn ) ∈ Rn , prove the identity 21 ni,j =1 (ai bj − aj bi )2 = a2 b2 − (a · b)2 , and
hence prove |a · b| ≤ ab.

2.4 Using the dot product, prove, for any vectors x, y ∈ Rn :


(a) The parallelogram law: x − y2 + x + y2 = 2(x2 + y2 ).
(b) The law of cosines: x − y2 = x2 + y2 − 2x y cos θ , assuming x, y are nonzero
and θ is the angle between x and y.
(c) Give a geometric interpretation of identities (a),(b) (i.e., describe what (a) is saying about the
−→ −→ −→
parallelogram determined by x, y—i.e., OACB where OA = x, OB = y, OC = x + y, and what (b)
−→ −→
is saying about the triangle determined by x and y—i.e., OAB , where OA = x, OB = y).

3 SUBSPACES AND LINEAR DEPENDENCE OF VECTORS


We say that a subset V of Rn is a subspace of Rn (or more specifically a linear subspace of Rn ) if V
has the properties that it contains at least the zero vector 0 = (0, . . . , 0), i.e.,

3.1 0∈V

and if it is closed under addition and multiplication by scalars, i.e.,

3.2 x, y ∈ V and λ, μ ∈ R ⇒ λx + μy ∈ V .

For example, the subset V consisting of just the zero vector (i.e., V = {0}) is a subspace (usually
referred to as the trivial subspace) and the subset V consisting of all vectors in Rn (i.e., V = Rn ) is a
subspace.
6 CHAPTER 1. LINEAR ALGEBRA

To facilitate discussion of some less trivial examples we need to introduce some important termi-
nology, as follows: We first define the span of a given collection v 1 , . . . , v N of vectors in Rn .
3.3 Definition:

span{v 1 , . . . , v N } = {c1 v 1 + c2 v 2 + · · · cN v N : c1 , . . . , cN ∈ R} .

An expression of the form c1 v 1 + c2 v 2 + · · · + cN v N (with c1 , . . . , cN ∈ R) is a linear combination


of v 1 , . . . , v N , and the linear combination is said to be nontrivial if not all the c1 , . . . , cN are zero.
Using this terminology the Def. 3.3 says that span{v1 , . . . , v N } is the set of all linear combinations
of v 1 , . . . , v N .
We claim the following.

3.4 Lemma. For any given nonempty collection v 1 , . . . , v N ∈ Rn , span{v 1 , . . . , v N } is a subspace of Rn .

Proof: Let V = span{v 1 , . . . , v N }. Notice that c1 v 1 + · · · + cN v N = 0 if cj = 0 ∀ j = 1, . . . , N,


hence 0 ∈ V . Also, if w1 , w2 ∈ V then w1 = c1 v 1 + · · · + cN v N and w 2 = d1 v 1 + · · · + dN v N for
suitable choices of cj , dj ∈ R, and so λw1 + μw 2 = (λc1 + μd1 )v 1 + · · · + (λcN + μdN )v N ∈ V .

One of the theorems we’ll prove later is that in fact any subspace V of Rn can be expressed as the
span of a suitable collection of vectors v 1 , . . . , v N ∈ Rn .

We need one more piece of terminology.


3.5 Definition: Vectors v 1 , . . . , vN are linearly dependent (l.d.) if some nontrivial linear combina-
tion of v 1 , . . . , v N is the zero vector, i.e., v 1 , . . . , v N are linearly dependent if there are constants
c1 , . . . , cN not all zero such that c1 v 1 + · · · + cN v N = 0.
Notice that v 1 , . . . , v N are l.d. if and only if one of the vectors v 1 , . . . , v N can be expressed as a
linear combination of the others. That is:

3.6 Lemma. v 1 , . . . , v N are l.d. ⇐⇒ ∃ j ∈ {1, . . . , N} and constants ci , i  = j , with vj =


N
i=j,i=1 ci v i .

Proof of ⇒: We are given c1 , . . . , cN not all zero such that i ci v i = 0. Pick j such that cj = 0.
−ci
Then v j = N i=j,i=1 cj v i .

Proof of ⇐: Assume j ∈ {1, . . . , N} and ci , i = j , are such that v j = N i=j,i=1 ci v i . Then
N
i=1 ci v i = 0, where we define cj = −1.

SECTION 3 EXERCISES
3.1 Let v 1 , . . . , v k be any set of vectors in Rn . Let  v k be obtained by applying any one of
v 1 , . . . ,
the 3 “elementary operations” to v 1 , . . . , v k , where the 3 elementary operations are: (i) interchange
of any pair of vectors (i.e., (i) merely changes the ordering of the vectors); (ii) multiplication of one
4. GAUSSIAN ELIMINATION AND THE LINEAR DEPENDENCE LEMMA 7

of the vectors by a nonzero scalar; (iii) replacing the i-th vector v i by the new vector 
v i = v i + μv j ,
where i  = j and μ ∈ R.) Prove that span{v 1 , . . . , v k } = span{ v k }.
v 1 , . . . ,

4 GAUSSIAN ELIMINATION AND THE LINEAR


DEPENDENCE LEMMA
We begin with some remarks about linear systems of equations:


⎪ a11 x1 + a12 x2 + · · · + a1 n xn = b1

⎨ a21 x1 + a22 x2 + · · · + a2 n xn = b2
(∗) .. .. ..

⎪ . . .


am1 x1 + am2 x2 + · · · + amn xn = bm ,
where aij , bi are given real constants and the unknowns x1 , . . . , xn are to be determined.
Terminology:
(1) Any choice x1 , . . . , xn which satisfies all m of the equations is called a solution of the system.
(2) The corresponding vector (x1 , x2 , · · · , xn )T is called a vector solution (or a solution vector) for
the system. (Here we use the notation that (x1 , x2 , · · · , xn )T denotes the column vector with j -th
entry xj ; for reasons which will become apparent later, we always write solution vectors as column
vectors rather than row vectors.)
(3) The set of all possible solution vectors is called the solution set of the system.

There is a systematic procedure for solving (i.e., finding the solution set of ) such linear systems,
known as Gaussian elimination. We’ll discuss Gaussian elimination and its consequences in detail
later in this chapter, but for the moment we just need to describe the first step in the process.
The procedure is based on the (easily checked) observation that the set of solutions of the system (∗)
is unchanged under any one of the following 3 operations:
(i) interchanging any two of the equations;
(ii) multiplying any one of the equations by a nonzero constant;
(iii) adding a multiple of one equation to another one of the equations, i.e., if we add a multiple of
the i-th equation to the j -th equation, where i  = j .
We now consider the system (∗), and the following alternatives.
Case 1: The coefficient ai1 of x1 is zero for each i = 1, . . . , m; thus in this case the system (∗) has
the form ⎧

⎪ 0x1 + a12 x2 + · · · + a1 n xn = b1

⎨0x1 + a22 x2 + · · · + a2 n xn = b2
⎪ .. .. ..

⎪ . . .

0x1 + am2 x2 + · · · + amn xn = bm
8 CHAPTER 1. LINEAR ALGEBRA

(i.e., the unknown x1 does not appear at all in any of the equations).
Case 2: At least one of the coefficients ai1  = 0; then by operation (i) we may arrange that a11 = 0,
and by operation (ii) we may actually arrange that a11 = 1, so that the system (∗) can be changed
to a new system having the same set of solutions as the original system and having the form


⎪ 1 x1 +  a12 x2 + · · · +  a1 n xn =  b1

⎨a21 x1 +  a22 x2 + · · · +  a2 n xn =  b2
⎪ .. .. ..

⎪ . . .

am1 x1 + 
 am2 x2 + · · · +  amn xn =  bm .
Now the operation (iii) allows us to subtract  a21 times the first equation from the second equation,

a31 times the first equation from the third equation, and so on, thus giving a new system having the
same set of solutions as the original system and having the form


⎪ 1x1 +  a12 x2 + · · · +  a1 n xn =  b1

⎨ 0x1 +  a22 x2 + · · · +  a2 n xn =  b2
⎪ .. .. ..

⎪ . . .

0x1 +  am2 x2 + · · · +  amn xn =  bm
(i.e., the x1 unknown only appears in the first equation), where each  bj is a linear combination
of the original b1 , . . . , bn ; in particular, 
bj = 0 for each j = 1, . . . , m if all the original bj = 0,
j = 1, . . . , m.
This completes the first step of the Gaussian elimination process. Notice that this first step provides
us with a new system of equations which is equivalent to (i.e., has precisely the same solution set as)
the original system, and which either has the form


⎪ 0x1 + a12 x2 + · · · + a1 n xn = b1

⎨0x1 + a22 x2 + · · · + a2 n xn = b2
(∗) .. .. ..

⎪ . . .


0x1 + am2 x2 + · · · + amn xn = bm
(i.e., the x1 unknown does not appear at all in any of the equations), or


⎪ 1x1 +  a12 x2 + · · · +  a1 n xn = b1

⎨0x1 +  
a22 x2 + · · · +  a2 n xn = b2
(∗) . . ..

⎪ .
. .
. .


0x1 + am2 x2 + · · · +  amn xn =  bm
(i.e., the x1 unknown only appears in the first equation).
The system (∗) is said to be homogeneous if all the bj = 0, j = 1, . . . , m; as we noted above, the
first step in the Gaussian elimination process gives a new system which is also homogeneous if the
original system is homogeneous.
4. GAUSSIAN ELIMINATION AND THE LINEAR DEPENDENCE LEMMA 9

The following lemma refers to such a homogeneous system of linear equations, i.e., a system of m
equations in n unknowns thus:


⎪ a11 x1 + a12 x2 + ···+ a1 n xn =0

⎨ a21 x1 + a22 x2 + ···+ a2 n xn =0
(∗∗) .. .. ..

⎪ . . .


am1 x1 + am2 x2 + · · · + amn xn = 0.

Such a system is called “under-determined” if there are fewer equations than unknowns, i.e., if
m < n.

4.1 “Under-determined systems lemma:” An under-determined homogeneous system of linear equations


(i.e., a system as in (∗∗) above with m < n) always has a nontrivial solution.

Proof: The proof is based on induction on m. When m = 1 the given system is just the single
equation
a11 x1 + a12 x2 + · · · + a1n xn = 0
with n ≥ 2, and if a11 = 0 we see that x1 = 1, x2 = · · · = xn = 0 is a nontrivial solution. On the
other hand, if a11  = 0 then we get a nontrivial solution by taking x2 = 1, x1 = −(a12 /a11 ), and
x3 = · · · = xn = 0. Thus, the lemma is true in the case m = 1.
So take m ≥ 2 and, as an inductive hypothesis, assume that the theorem is true with m − 1 in place
of m, and that we are given the homogeneous system (∗∗) with m < n. According to the above
discussion of Gaussian elimination we know that (∗∗) has the same set of solutions as a system of
the form


⎪a11 x1 + 
a12 x2 + · · · +  a1 n xn = 0

⎨ 0 x1 +  a22 x2 + · · · +  a2 n xn = 0
(∗∗) . . ..

⎪ .. .. .


0 x1 + am2 x2 + · · · +  amn xn = 0,

where either  a11 = 1 or 0, so it suffices to show that there is a nontrivial solution of (∗∗). The last
m − 1 equations here is an under-determined system of m − 1 equations in the n − 1 unknowns
x2 , . . . , xn , so by the inductive hypothesis there is a nontrivial solution x2 , . . . , xn . If 
a11 = 1 we
also solve the first equation by taking x1 = −( a12 x2 + · · · + 
a1 n xn ), so the proof is complete in the
case  a11 = 1.
On the other hand, if 
a11 = 0 then we get a nontrivial solution of the system (∗∗) by simply taking
x1 = 1 and x2 = · · · = xn = 0, so the proof is complete.

A very important corollary of the under-determined systems lemma is the following linear depen-
dence lemma.
10 CHAPTER 1. LINEAR ALGEBRA

4.2 “Linear Dependence Lemma.” Suppose v 1 , v 2 , . . . , v k are any vectors in Rn . Then any k + 1
vectors in span{v 1 , . . . , v k } are l.d.
4.3 Remark: An important special case of this is when k = n and v j = ej , where ej is the j -th
standard basis vector as in 1.4. Since span{e1 , . . . , en } = all of Rn (by 1.4), in this case the above
Linear Dependence Lemma says simply that any n + 1 vectors in Rn are linearly dependent.

Proof of the Linear Dependence Lemma: Let w1 , . . . , wk+1 ∈ span{v 1 , . . . , v k }.Then each wj =
a linear combination of v 1 , . . . , v k , so that

(1) w j = a1j v 1 + a2j v 2 + · · · + akj v k , j = 1, . . . , k + 1 ,

for suitable scalars aij , i = 1, . . . , k, j = 1, . . . , k + 1.


Now we want to prove that w 1 , . . . , wk+1 are l.d.; in other words, we want to prove that the system
of equations

(2) x1 w1 + · · · + xk+1 wk+1 = 0

has a nontrivial solution (i.e., we must show that we can find x1 , . . . , xk+1 which are not all zero
and which satisfy the system (2)).
Substituting the given expressions (1) for the wj into (2), we see that (2) actually says

x1 (a11 v 1 + · · · + ak1 v k ) + x2 (a12 v 1 + · · · + ak2 v k ) + · · ·


· · · + xk+1 (a1 k+1 v 1 + · · · + ak k+1 v k ) = 0 ;
that is,

(3) (x1 a11 + · · · + xk+1 a1 k+1 )v 1 + (x1 a21 + · · · + xk+1 a2 k+1 )v 2 + · · ·


· · · + (x1 ak 1 + x2 ak 2 + · · · + xk+1 ak k+1 )v k = 0 ,

so it suffices to get a nontrivial solution of the homogeneous linear system

a11 x1 + a12 x2 + · · · + a1 k+1 xk+1 =0


a21 x1 + a22 x2 + · · · + a2 k+1 xk+1 =0
.. .. ..
. . .
ak1 x1 + ak2 x2 + · · · + ak k+1 xk+1 =0

of k equations in the k + 1 unknowns x1 , . . . , xk+1 , and we can do this by the under-determined


systems Lem. 4.1, so the proof is complete.

SECTION 4 EXERCISES
4.1 Use the Linear Dependence Lemma to prove that if k ∈ {2, . . . , n}, then k vectors v 1 , . . . , v k ∈
Rn are l.d. ⇐⇒ dim span{v 1 , . . . , v k } ≤ k − 1.
5. THE BASIS THEOREM 11

5 THE BASIS THEOREM


We now introduce the notion of basis for a nontrivial subspace of Rn . For this we first need the
appropriate definitions, as follows.
5.1 Definition: Vectors v 1 , . . . , v N ∈ Rn are linearly independent (l.i.) if they are not linearly de-
pendent.

Thus, v 1 , . . . , v N are l.i. if and only if N
j =1 cj v j = 0 ⇒ cj = 0 ∀ j = 1, . . . , N.

5.2 Definition: Let V be any nontrivial subspace of Rn . A basis for V is a collection of vectors
w 1 , . . . , wq such that
(i) w1 , . . . , wq are l.i., and
(ii) span{w1 , . . . , wq } = V .

Such a basis always exists:


5.3 Theorem (“The Basis Theorem.”) Suppose V is a nontrivial subspace of Rn . Then
(a) V has a basis; and
(b) If u1 , . . . , uk are l.i. vectors in V then there is a basis for V which contains u1 , . . . , uk ; more precisely,
there is a basis v 1 , . . . , v q for V with q ≥ k and v j = uj for each j = 1, . . . , k.
Proof of (a): Define

q = max{ ∈ {1, . . . , n} : ∃ l.i. vectors w1 , . . . , w ∈ V } ,

and choose l.i. vectors v 1 , . . . , v q ∈ V . (Thus, roughly speaking, v 1 , . . . , v q are chosen to give “a
maximum number of linearly independent vectors in V .”) We claim that such v 1 , . . . , v q must span
V .To see this let v be an arbitrary vector in V and consider the vectors v 1 , . . . , v q , v. If q = n this is a
set of n + 1 vectors in Rn = span{e1 , . . . , en } and so must be l.d. by the Linear Dependence Lem. 4.2.
On the other hand, if q < n then q + 1 ≤ n and so v 1 , . . . , v q , v must again be l.d., otherwise
v 1 , . . . , v q , v is a set of q + 1 ∈ {1, . . . , n} l.i. vectors in V contradicting the definition of q. Thus,
in either case (q = n, q < n), the vectors v 1 , . . . , v q , v are l.d. Thus, c0 v + c1 v 1 + · · · + cq v q = 0
for some c0 , . . . , cq not all zero. But of course then c0  = 0 because otherwise this identity would tell
us that c1 v 1 + · · · + cq v q = 0 with not all c1 , . . . , cq zero, contradicting the linear independence
of v 1 , . . . , v q . Thus, v = −c0−1 (c1 v 1 + · · · + cq v q ), which completes the proof.

Proof of (b): The proof of (b) is almost the same as the proof of (a), except that we start by defining

q = max{ ∈ {k, . . . , n} : ∃ l.i. vectors w1 , . . . , w ∈ V


with w j = uj for each j = 1, . . . , k} ,

so that we can select l.i. vectors v 1 , . . . , v q ∈ V with v j = uj for j = 1, . . . , k. The remainder of


the proof is identical, word for word, with the proof of (a), and yields the conclusion that v 1 , . . . , v q
span V , and hence v 1 , . . . , v q are a basis for V with q ≥ k and v j = uj for each j = 1, . . . , k.
12 CHAPTER 1. LINEAR ALGEBRA

5.4 Theorem. Let V be a nontrivial subspace of Rn . Then every basis for V has the same number of
vectors.

Proof: (Based on the “Linear Dependence Lemma” 4.2.) Otherwise, we would have two sets of
basis vectors v 1 , . . . , v k and w1 , . . . , wq with q > k. Then, in particular, V = span{v 1 , . . . , v k } =
span{w 1 , . . . , wq } ⊃ {w1 , . . . , wk+1 }; that is, w1 , . . . , wk+1 are k + 1 linearly independent vectors
contained in span{v 1 , . . . , v k } which contradicts the linear dependence lemma.

In view of the above result we can now define the notion of “dimension” of a subspace.
5.5 Definition: If V is a nontrivial subspace of Rn then the dimension of V is the number of vectors
in a basis for V . The trivial subspace {0} is said to have dimension zero.
5.6 Remarks: Of course Rn itself has dimension n, because e1 , . . . , en (as in 1.4) are l.i. and span
all of Rn ; i.e., e1 , . . . , en is a basis of Rn (which we call the “standard basis for Rn ”; the vector ej is
called the “j -th standard basis vector”).
We now prove a third theorem, which is also, like Thm. 5.4 above, a consequence of the Linear
Dependence Lemma.

5.7 Theorem. Let V be a nontrivial subspace of Rn of dimension k (so k ≤ n because any n + 1 vectors
in Rn are l.d. by the Linear Dependence Lemma). Then:
(a) any k vectors v 1 , . . . , v k in V which span V must be l.i. (and hence form a basis for V ); and
(b) any k l.i. vectors v 1 , . . . , v k in V must span V (and hence form a basis for V ).

Proof of (a): v 1 , . . . , v k l.d. means (by Lem. 3.6 of the present chapter) that some v j is a linear
combination of the remaining v i and hence for some j ∈ {1, . . . , k} we have
span{v 1 , . . . , v k }(= V ) = span{v 1 , . . . , v j −1 , v j +1 , . . . , v k }
and so any basis of V consists of k l.i. vectors in span{v1 , . . . , v j −1 , v j +1 , . . . , v k }, contradicting
the linear dependence lemma.
Proof of (b): Otherwise, there is a vector v ∈ V which is not in span{v 1 , . . . , vk } and so v 1 , . . . , v k , v
are (k + 1) l.i. vectors (see Exercise 5.2 below) in the subspace V = span{w1 , . . . , wk }, where
w 1 , . . . , wk is any basis for V , contradicting the linear dependence lemma.

SECTION 5 EXERCISES
5.1 Use the basis theorem to prove that if V , W are subspaces of Rn with V ⊂ W and dim V =
dim W , then V = W .

5.2 If v 1 , . . . , v k are l.i. and v ∈


/ span{v 1 , . . . , v k }, prove that v 1 , . . . , v k , v are l.i.
6. MATRICES 13

6 MATRICES
We begin with some basic definitions.
6.1 Definitions: (i) An m × n real matrix A = (aij ) is an array of mn real numbers aij , arranged in
m rows and n columns, with aij denoting the entry in the i-th row and j -th column. Thus, the i-th
row is the vector
ρi = (ai1 , . . . , ain ) (so ρiT ∈ Rn )
and the j -th column is the vector

α j = (a1j , . . . , amj )T ∈ Rm .

(ii) If A = (aij ) an m × n matrix as in (i) with columns α 1 , . . . , α n ∈ Rm and if x = (x1 , . . . , xn )T ∈


Rn , we define the matrix product Ax to be the vector y = (y1 , . . . , ym )T ∈ Rm with


n
y= xk α k .
k=1

Equivalently,

n
yi = aik xk , i = 1, . . . , m ,
k=1
which is also equivalent to

m 
n
y= aik xk ei ,
i=1 k=1
where ei is the i-th standard basis vector in Rm as in 1.4. We note, in particular, that by taking the
special choice x = ej we see that this says exactly


m
Aej = the j -th column α j = (a1j , . . . , amj )T = aij ei of A, j = 1, . . . , n .
i=1

(iii) More generally, if A = (aij ) is an m × n matrix and B = (bij ) is and n × p matrix, then AB
denotes the m × p matrix with j -th column Aβ j (∈ Rm ), where β j (∈ Rn ) is the j -th column of B;
thus
j -th column of AB = Aβ j where β j = the j -th column of B .
Equivalently,


n
AB = (cij ), where cij = aik bkj , i = 1, . . . , m, j = 1, . . . , p .
k=1
14 CHAPTER 1. LINEAR ALGEBRA

(iv) If A = (aij ) is an m × n matrix, then the transpose of A, denoted AT , is the n × m matrix with
entry aj i in the i-th row and j -th column. Notice that this notation is consistent with our previous
usage of the notation x T when x is a row or column vector: If we are writing vectors in Rn as columns,
then x T denotes the row vector (i.e., 1 × n matrix) with the same entries as x and if we are writing
vectors in Rn as rows then x T denotes the column (i.e., n × 1 matrix) with the same entries as x.
Notice that

α j = j -th column of A ⇒ αTj is the j -th row of AT


ρ i = i-th row of A ⇒ ρTi is the i-th column of AT .

Notice that in a more formal sense an m × n matrix A can be thought of as a point in Rnm , with
the agreement that the entries are ordered into rows and columns rather than a single row or single
column. From this point of view it is natural to define the sum of matrices, multiplication of a matrix
by a scalar, and the length (or “norm”) A of a matrix, as in the following.
Analogous to addition of vectors in Rn we also add matrices component-wise; thus, if A = (aij ), B =
(bij ) are m × n matrices then we define A + B to the be m × n matrix with aij + bij in the i-th
row and j -th column. Thus,

6.2 (aij ) + (bij ) = (aij + bij ) .

Note, however, that it does not make sense to add matrices of different sizes.
Again analogous to the corresponding operation for vectors in Rn , for a given m × n matrix A = (aij )
and a given λ ∈ R we define

6.3 λA = (λaij ) ,

and for such an m × n matrix A = (aij ) we define the “length” or “norm” of A, A, by
⎛ ⎞




m  n

n  m
6.4 A = aij2 ⎝= aij2 ⎠ .
i=1 j =1 j =1 i=1

Since after all this norm really is just the length of the vector with the nm entries aij , we can then
use the usual properties of the norm of vectors in Euclidean space:

6.5 A + B ≤ A + B, λA = |λ|A .

6.6 Remark: Observe that if A = (aij ) is m × n and x ∈ Rn then the matrix product of A times
x gives a vector Ax in Rm , provided we agree to write vectors in Rn and Rm as columns. Indeed, if
x = (x1 , . . . , xn ) is a row vector of length n (i.e., a 1 × n matrix) with n ≥ 2 then Ax makes no
sense. So we shall henceforth always assume we are writing vectors in Rn as columns when we wish
to use the matrix product Ax of an m × n matrix A by a vector in Rn .
7. RANK AND THE RANK-NULLITY THEOREM 15

We also have the important inequality

6.7 Ax ≤ Ax, A any m × n matrix and x ∈ Rn .



Proof of 6.7: Since x = nj=1 xj ej we have Ax = nj=1 xj Aej = nj=1 xj α j , where α j is the j -

th column of A (by (ii) above), and so, by the triangle inequality 2.6, Ax ≤ nj=1 |xj |α j  = y ·
γ , where y = (|x1 |, . . . , |xn |)T , γ = (α 1 , . . . , α n )T ∈ Rn , and hence by the Cauchy-Schwarz
inequality we have



n
Ax ≤ y α j 2 = xA
j =1

as claimed.

One of the key reasons that matrices are important is that m × n matrices naturally represent linear
transformations Rn → Rm as follows.
Suppose T : Rn → Rm is a linear transformation (i.e., T is a mapping from Rn to Rm such that

λ, μ ∈ R, x, y ∈ Rm ⇒ T (λx + μy) = λT (x) + μT (y)). Then we can write x = nj=1 xj ej and
n
use the linearity of T to give T (x) = j =1 xj T (ej ). Thus, if we let A be the m × n matrix with
j -th column α j = T (ej )(∈ Rm ) we then have by 6.1(ii) above that T (x) = Ax. That is:

6.8 T : Rn → Rm linear ⇒ T (x) ≡ Ax, where the j -th column of A = T (ej ) .

SECTION 6 EXERCISES
6.1 Let θ ∈ [0, 2π ) and let T be the linear
 transformation of  R defined
2
 by T (x) = Q(θ)x, where
cos θ − sin θ r cos α
Q(θ ) is the 2 × 2 matrix . Prove that if x = (with r ≥ 0 and α ∈ [0, 2π))
sin θ cos θ r sin α
 
r cos(α + θ)
then T (x) = . With the aid of a sketch, give a geometric interpretation of this.
r sin(α + θ)

6.2 What is the matrix (in the sense described above) of the linear transformation T : R2 → R2
which takes the point (x, y) to its “reflection in the line y = x,” i.e., the transformation T (x, y) =
(y, x).
Caution: In this exercise points in R2 are written as row vectors, but in order to represent T in terms
of matrix multiplication you should first rewrite everything in terms of column vectors.

7 RANK AND THE RANK-NULLITY THEOREM


Let A = (aij ) be an m × n matrix. We define the column space C(A) of A to be the subspace of
Rm spanned by the columns of A: thus, if α j = (a1j , a2j , . . . , amj )T is the j th column of A, then

7.1 C(A) = span{α 1 , . . . , α n } ,


16 CHAPTER 1. LINEAR ALGEBRA

which (by 3.4 of the present chapter) is a subspace of Rm . The “null space” N(A) of A is defined to
be the set of all solutions x ∈ Rn of the homogeneous system Ax = 0:

7.2 N (A) = {x ∈ Rn : Ax = 0} ,

so that N(A) is a subspace of Rn .


The rank (or column rank) of A is defined to be the dimension of the column space C(A) of A; that
is, it is the dimension of the subspace of Rn spanned by the columns of A.
The row rank of A is defined to be dimension of the subspace of Rn spanned by the rows of A, in
case we agree to write all vectors in Rn as row vectors. (Equivalently, it is the column rank, i.e., the
dimension of the column space, of AT .)
One of the things we’ll prove later (in Thm. 8.5 below) is that the row rank and the column rank are
equal. This is not intuitively obvious, but true nevertheless.
The nullity of A is defined to be the dimension of the null space N(A).

Let’s look at the extreme cases for N (A): (a) N (A) = {0} and (b) N (A) = Rn (i.e., the case when
N(A) is trivial and the case when N (A) is the whole space Rn ). In case (a), we have that Ax = 0 has

only the trivial solution x = 0; but Ax = nj=1 xj α j , so this just says nj=1 xj α j = 0 has only the
trivial solution x = 0. That is, (a) just says that the columns of A are l.i., and hence are a basis for
C(A) and the column space C(A) has dimension n. In case (b), we have Ax = 0 for each x in Rn ,
which, in particular, means that Aej = 0, which says that the j -th column of A is the zero vector
for each j , i.e., A = O (the zero matrix, with every entry = 0), and hence C(A) = {0} in this case.
Notice that in each of these two extreme cases we have dim C(A) + dim N(A) = n.
The rank/nullity theorem (below) says that this is in fact true in all cases, not just in the extreme
cases.

7.3 Rank/Nullity Theorem. Let A be any m × n matrix. Then


dim C(A) + dim N (A) = n .

7.4 Remarks: (1) Notice this is correct in the extreme cases N (A) = {0} and N(A) = Rn (as we
discussed above), so we only of have to give the proof in the nonextreme cases (i.e., when N(A)  = {0}
and N(A)  = Rn ).
(2) Recall that dim C(A) is called the “rank” of A (i.e., “rank(A)”) and dim N(A) the “nullity” of A
(i.e., “nullity(A)”); using this terminology the above theorem says
rank (A) + nullity (A) = n ,
hence the name “rank/nullity theorem.”

Proof of Rank/Nullity Theorem: By Rem. (1) above we can assume that N (A) = {0} and N(A) =
Rn . In particular, since N(A) is nontrivial it has a basis u1 , . . . , uk . By the part (b) of the Basis
7. RANK AND THE RANK-NULLITY THEOREM 17

Thm. 5.3 we can find a basis v 1 , . . . , v n for all of Rn with v j = uj for each j = 1, . . . , k, and of
course also k ≤ n − 1 because N (A) = Rn . Now

C(A) = span{α 1 , . . . , α n } where α 1 , . . . , α n denote the columns of A


= {x1 α 1 + · · · + xn α n : x1 , . . . , xn ∈ R}
= {Ax : x ∈ Rn }

= {A( nj=1 cj v j ) : c1 , . . . , cn ∈ R} (because span{v 1 , . . . , v n } = Rn )

= { nj=1 cj Av j : c1 , . . . , cn ∈ R}

= { nj=k+1 cj Av j : ck+1 , . . . , cn ∈ R} (because Av j = 0 for j = 1, . . . , k)
= span{Av k+1 , . . . , Av n } .

Also, Av k+1 , . . . , Av n are l.i.; check:


n  n 
j =k+1 cj Av j = 0 ⇒ A j =k+1 cj v j = 0

⇒ nj=k+1 cj v j ∈ N (A) = span{v 1 , . . . , v k }

⇒ nj=k+1 cj v j = kj =1 dj v j for some d1 , . . . , dk

⇒ kj =1 dj v j − nj=k+1 cj v j = 0
⇒ ck+1 , . . . , cn , d1 , . . . , dk are all 0, because v 1 , . . . , v n are l.i.

Thus, Av k+1 , . . . , Av n are a basis for C(A), and hence dim C(A) = n − k = n − dim N(A).

SECTION 7 EXERCISES
7.1 (a) Use Gaussian elimination to show that the solution set of the homogeneous system Ax = 0
with matrix ⎛ ⎞
1 2 3 4
⎜1 4 3 2 ⎟
A=⎜ ⎝2 5 6 7⎠

1 0 3 6
is a plane (i.e., the span of 2 l.i. vectors), and find explicitly 2 l.i. vectors whose span is the solution
space.

(b) Find the dimension of the column space of the above matrix (i.e., the dimension of the subspace
of R4 which is spanned by the columns of the matrix).
Hint: Use the result of (a) and the rank nullity theorem.

7.2 If A, B are any m × n matrices, prove that rank(A + B) ≤ rank (A) + rank (B).
(rank (A) is the dimension of the column space of A, i.e., the dimension of the subspace of Rm
spanned by the columns of A.)

7.3 (a) If A, B are, respectively, m × n and n × p matrices, prove that rank AB ≤


min{rank A, rank B}.
18 CHAPTER 1. LINEAR ALGEBRA

(b) Give an example in the case n = m = p = 2 to show that strict inequality may hold in the
inequality of (a).

7.4 Suppose v 1 , . . . , v n are vectors in Rm , not all zero, and let  ∈ {1, . . . , n} be the maximum number
of linearly independent vectors which can be selected from v 1 , . . . , v n . Select any 1 ≤ j1 < j2 <
· · · < j ≤ n such that v j1 , . . . , v j are l.i. Prove that v j1 , . . . , v j is a basis for span{v 1 , . . . , v n }.
(In particular, given a nonzero m × n matrix A with j -th column α j , we can always select certain
of the columns α j1 , . . . , α jQ to give a basis for C(A).)

8 ORTHOGONAL COMPLEMENTS AND ORTHOGONAL


PROJECTION
Let V be a subspace of Rn . The “orthogonal complement” V ⊥ of V is defined to be the set of all
vectors x in Rn such that x · v = 0 for each v ∈ V . That is, V ⊥ is the set of all vectors which are
orthogonal to each vector in V . Of course V ⊥ so defined is a subspace of Rn .
If V is the trivial subspace {0} we evidently have V ⊥ = Rn , whereas if V is nontrivial the Basis
Thm. 5.3(a) guarantees that we can choose a basis u1 , . . . , uk for V and then by 5.3(b) (applied with
Rn in place of V ) there is a basis v 1 , . . . , v n for Rn with v j = uj for each j = 1, . . . , k. It is then
very easy to check that V ⊥ is exactly characterized by saying that

x ∈ V ⊥ ⇐⇒ uj · x = 0 for each j = 1, . . . , k .

The k equations here form a homogeneous system of k linear equations in the n unknowns x1 , . . . , xn ,
so by the under-determined systems Lem. 4.1 it is a nontrivial subspace unless k = n, i.e., unless
V = Rn .
Our main initial results about V ⊥ are given in the following theorem.

8.1 Theorem. Let V be a subspace of Rn . Then the subspace V ⊥ satisfies the following:
(i) V⊥ ∩V = {0}
(ii) V +V⊥ = Rn
(iii) dim V + dim V ⊥ =n
(iv) (V ⊥ )⊥ =V .
Remark: Notice that (i) says that V and V ⊥ have only the zero vector in common, and (ii) says that
every vector in Rn can be written as a the sum of a vector in V and a vector in V ⊥ .

Proof of (i): w ∈ V ∩ V ⊥ ⇒ w ∈ V and w ∈ V ⊥ . Thus, in particular, w · w = 0, i.e., w2 = 0,


i.e., w = 0.
Proof of (ii): V + V ⊥ = {v + u : v ∈ V , u ∈ V ⊥ } and this is clearly a subspace of Rn . Call this
subspace W . Take any vector x ∈ W ⊥ , so that x · (v + u) = 0 for each v ∈ V and each u ∈ V ⊥ .
8. ORTHOGONAL COMPLEMENTS AND ORTHOGONAL PROJECTION 19

Taking u = 0, this implies x · v = 0 for each v ∈ V , so x ∈ V ⊥ , and, taking v = 0, it implies x · u =


0 for each u ∈ V ⊥ . In particular, since x ∈ V ⊥ , we must have x · x = 0. Thus, x = 0; that is, we have
shown that W ⊥ is the trivial subspace, and, by the remarks preceding the statement of the theorem,
this can occur only if W = Rn , so (ii) is proved.

Proof of (iii): Notice that (iii) is trivially true if V is the trivial subspace (because then V ⊥ = Rn ),
so we can assume V is nontrivial. Likewise, we can assume that V ⊥ is nontrivial, because, as we
pointed out in the discussion preceding the statement of the theorem, V ⊥ trivial implies V = Rn
and again (iii) holds. Thus, we can assume that both V and V ⊥ are nontrivial, and hence by the
basis theorem we can select a basis v 1 , . . . , vk for V and a basis u1 , . . . , u for V ⊥ . Of course
then by (ii) we have span{v 1 , . . . , v k , u1 , . . . , u } = Rn and we claim that v 1 , . . . , v k , u1 , . . . , u
are linearly independent. To check this, suppose that c1 v 1 + · · · + ck v k + d1 u1 + · · · + d u = 0.
Then c1 v 1 + · · · + ck v k = −d1 u1 − · · · − d u and the left side is in V while the right side is in
V ⊥ , so by (i) both sides must be zero.That is, c1 v 1 + · · · + ck v k = 0 = d1 u1 + · · · + d u and since
v 1 , . . . , v k are l.i., this gives all the cj = 0, and similarly, since u1 , . . . , u are l.i., all the dj = 0.
Thus, we have proved that the vectors v 1 , . . . , v k , u1 , . . . , u both span Rn and are l.i.; that is, they
are a basis for Rn , and hence k +  = n. That is, dim V + dim V ⊥ = n, as claimed.

Proof of (iv): Observe that by definition of V ⊥ we have v · u = 0 for each v ∈ V and each u ∈ V ⊥ ,
which, in particular, says that each v ∈ V is in (V ⊥ )⊥ (because it says that each v ∈ V is orthogonal
to each vector in V ⊥ ), so we have shown V ⊂ (V ⊥ )⊥ .
By (iii) dim V ⊥ = n − dim V . Also, V ⊥ is a subspace of Rn , so (iii) holds with V ⊥ in place of V .
That is, dim(V ⊥ )⊥ = n − dim V ⊥ = n − (n − dim V ) = dim V .
That is, we have shown that V ⊂ (V ⊥ )⊥ and that V , (V ⊥ )⊥ have the same dimension.They therefore
must be equal. (Notice that here we use the following general principle: if V , W are subspaces of
Rn and V ⊂ W then dim V = dim W implies V = W . This is an easy consequence of the Basis
Thm. 5.3—see Exercise 5.1 above.)
This completes the proof of Thm. 8.1.

8.2 Remark: 8.1(ii) tells us that we can write any vector x ∈ Rn as x = v + u with v ∈ V and
u ∈ V ⊥ . We claim that such v, u are unique. Indeed, if we could also write x =  v + u with v∈V
and u ∈ V ⊥ then we would have x = v + u =  v + u, hence v − v =u − u. Since the left side is in
V and the right side is in V ⊥ , we would then have by 8.1(i) that v − 
v = 0 = u − u, i.e., that v = v
and u =  u. So v, u are unique as claimed. For this reason the decomposition of 8.1(ii) is sometimes
referred to as a direct sum.

Using Thm. 8.1 we can prove that existence of a unique orthogonal projection of Rn onto a given
subspace V , as follows.
20 CHAPTER 1. LINEAR ALGEBRA

8.3 Theorem (Orthogonal Projection.) Given a subspace V ⊂ Rn there is a unique map PV : Rn →


Rn with the two properties
PV (x) ∈ V ∀x ∈ Rn
x − PV (x) ∈ V ⊥ ∀x ∈ Rn .
This map automatically has the additional properties
(i) PV : Rn → Rn is linear (i.e., PV (λx + μy) = λPV (x) + μPV (y) ∀ x, y ∈ Rn )
(ii) x · PV (y) = PV (x) · y ∀ x, y ∈ Rn
(iii) ∀ x ∈ Rn , x − PV (x) ≤ x − y ∀ y ∈ V , with equality ⇐⇒ y = PV (x) .
8.4 Remark: The map PV is called the orthogonal projection of Rn onto V . Notice that the properties
(i), (ii) tell us that PV is given by matrix multiplication by a symmetric matrix (pij ); that is, PV (x) =
n
i,j =1 pij xj ei , where pij = pj i for each i, j = 1, . . . , n, and property (iii) says that PV (x) is the
unique nearest point of V to x, as in the following schematic diagram depicting the case n = 3,
dim V = 2.

Figure 1.1: The orthogonal projection PV onto a subspace V .

Proof of Theorem 8.3: Theorem 8.1(ii) implies that for any x ∈ Rn we can write x = v + u with
v ∈ V and u ∈ V ⊥ and by Rem. 8.2 such v, u are unique. Thus, we can define PV (x) = v (∈ V ),
and then x − PV (x) = u (∈ V ⊥ ). Using the Thm. 8.1(i) we can readily check that the claimed
uniqueness of PV the stated properties (i) and (ii) (see Exercise 8.2 below).
To prove (iii), observe that for any x ∈ Rn and y ∈ V we have x − y2 = (x − PV (x)) − (y −
PV (x))2 = x − PV (x)2 + y − P V (x)2 + 2(x − PV (x)) · (y − P (x)), and the last term is
8. ORTHOGONAL COMPLEMENTS AND ORTHOGONAL PROJECTION 21

zero because x − PV (x) ∈ V ⊥ and y − PV (x) ∈ V . That is, we have shown that x − y2 =
x − PV (x)2 + y − PV (x)2 ≥ x − PV (x)2 , with equality if and only if y − PV (x) = 0, i.e.,
if and only if y = PV (x).

The following theorem, which is based on Thm. 8.1 above and the Rank-Nullity Theorem, establishes
some important connections between column spaces and null spaces of a matrix A and its transpose
AT .
8.5 Theorem. Let A be an m × n matrix. Then
(i) (C(A))⊥ = N(AT )
(ii) C(A) = (N (AT ))⊥
(iii) dim C(A) = dim C(AT ) .
8.6 Remark: Notice that (iii) says that the row rank and the column rank are the same.
Proof of (i): Let α 1 , . . . , α n be the columns of A. Then
x ∈ (C(A))⊥ ⇐⇒ x · α j = 0 ∀ j = 1, . . . , n
⇐⇒ AT x = 0
⇐⇒ x ∈ N(AT ) .

Proof of (ii): By (i) (C(A))⊥ = N(AT ) and hence (using (iv) of Thm. 8.1) C(A) = ((C(A))⊥ )⊥ =
(N(AT ))⊥ .

Proof of (iii): By the rank-nullity theorem (applied to the transpose matrix AT ) we have
(1) dim C(AT ) = m − dim N(AT ) .
By part (iii) of Thm. 8.1 and by (i) above we have
(2) dim C(A) = m − dim(C(A))⊥ = m − dim N(AT ) .
By (1),(2) we have dim C(A) = dim C(AT ) as claimed.

SECTION 8 EXERCISES
8.1 (a) If V is a subspace of Rn and if P is the orthogonal projection onto V , prove that I − P is
the orthogonal projection onto V ⊥ (I is the identity transformation, so (I − P )(x) = x − P (x)).
(b) If α ∈ Rn with α = 1 and if V = span{α}, find the matrix of the orthogonal projection onto
V and also the matrix of the orthogonal projection onto V ⊥ .

8.2 If V and PV (x) are as in the first part of Thm. 8.3 above, prove that properties (i), (ii) hold as
claimed.
Hint: To check (i) note that PV (λx + μy) − λPV (x) − μPV (y) = −(λx + μy − PV (λx +
μy)) + λ(x − PV (x)) + μ(y − PV (y)) and use Thm. 8.1(i) and the stated properties of PV .
22 CHAPTER 1. LINEAR ALGEBRA

9 ROW ECHELON FORM OF A MATRIX

Let A = (aij ) be an m × n matrix. Recall that the “First Stage” (described in Sec. 4 above) of the
process of Gaussian elimination leads to 2 possible cases: either the first column is zero, or else there
is at least one nonzero entry in the first column, and using only the 3 elementary row operations we
produce a new matrix with all entries in the first column zero except for the first entry which we
can arrange to be 1. That is, Stage 1 of the process of Gaussian elimination produces a new matrix
 with the same null space as A (because elementary row operations leave the solution set of the
A
homogeneous system Ax = 0 unchanged) and A  either has the form:
⎛ ⎞
0 a12 · · ·  a1n
⎜0  a22 · · ·  a2n ⎟
⎜ ⎟
Case 1  ⎜
A = ⎜ .. .. .. ⎟
⎝ . . ··· . ⎟

am2 · · · 
0  amn

or

⎛ ⎞
1 a12 ··· 
a1n
⎜ 0 a22 ··· 
a2n ⎟
⎜ ⎟
Case 2 = ⎜
A .. .. .. ⎟
⎜ . . ··· . ⎟
⎝ ⎠
0 am2 ··· 
amn .

Now in Case 1 we can apply this whole first stage process again to the m × (n − 1) matrix

⎛ ⎞

a12 ··· 
a1n
⎜ 
a22 ··· 
a2n ⎟
= ⎜
A ⎜ .. ..


⎝ . ··· . ⎠

am2 ··· 
amn ,

while in Case 2 we can apply the whole first stage process again to the (m − 1) × (n − 1) matrix

⎛ ⎞

a22 ··· 
a2n
 ⎜ .. .. ⎟ .
A=⎝ . ··· . ⎠

am2 ··· 
amn
9. ROW ECHELON FORM OF A MATRIX 23

Using mathematical induction on the number n of columns we can check that, in fact after at most
n repetitions of the process described above, we obtain a matrix of the following form:

col. j1 col. j2 col. j3 ··· col. jQ


↓ ↓ ↓ ↓
⎛ ⎞⎫
0 ··· 0 1 ∗ ··· ∗ ∗ ∗ ··· ∗ ⎪
··· ∗ ∗ ∗ ··· ∗ ∗ ∗ ⎪
⎜ ⎪

⎜ 0 ··· 0 0 0 ··· ∗ ∗ ∗ ··· ∗ ⎟
··· 0 1 ∗ ··· ∗ ∗ ∗ ⎟⎪⎬
⎜ 0 ··· 0 0 0 ··· ∗ ∗ ∗ ··· ∗ ⎟
··· 0 0 0 ··· 0 1 ∗
⎜ ⎟⎪ (a)
⎜ .. .. .. .. .. .. .. .. .. ..
.. .. .. .. ⎟ ⎪

⎜ . ⎟
⎜ . . . . . . . . . .
. . . ⎟⎪⎪
⎜ ⎟⎭
⎜ 0 ··· 0 0 0 ··· 0 0 0 ··· 0 0 0
··· 0 1 ∗ ··· ∗ ⎟
⎜ ⎟⎫
⎜ 0 ··· 0 0 0 · · · 0 0 0 · · · 0 ⎟⎪
··· 0 0 0 ··· 0 0 0 ⎬
⎜ .. ⎟
⎜ .. .. .. .. .. .. .. .. .. ..
.. .. .. ⎟ (b)
⎝ . . . . . . . . . .
. . . . ⎠⎪

0 ··· 0 0 0 ··· 0 0 0 ··· 0 0 0 ··· 0 0 0 ··· 0

Schematic Diagram of the Row Echelon Form of A

where the first Q rows (labeled (a) above) are nonzero, and each contains a pivot, i.e., a 1 preceded
by an unbroken string of zeros, and each successive one of these rows has a strictly longer unbroken
string of zeros before the pivot; and the remaining rows (labeled (b) above), if there are any, are all
zero.
Notice that mathematical induction does indeed provide a rigorous proof of this because the result is
trivially true for n = 1, and for n ≥ 2 the inductive hypothesis that the result is true for matrices with
n − 1 columns would tell us that the matrix A  above can be reduced to echelon form by elementary
row operations. By applying the same row operations to the matrix A  we then get the required
echelon form for A.
Such a form (obtained from A by elementary row operations) is called a “row echelon form” for
the matrix A: Notice that there are a certain number Q ≤ min{m, n} of nonzero rows, each of
which consists of an unbroken string of zeros (“the leading zeros” of the row) followed by a 1
called the “leading 1” or “pivot,” and each successive nonzero row has strictly more leading zeros
than the previous. Correspondingly, the column numbers of the pivot columns (i.e., the columns
which contain the pivots) are j1 , . . . , jQ , with 1 ≤ j1 < j2 < · · · < jQ ≤ n, as shown in the above
schematic diagram.
Of course by using further elementary row operations (subtracting the appropriate multiples of a
row containing a pivot from the rows above it) we can now eliminate all nonzero entries above the
pivot in each of the pivot columns, thus yielding a matrix called the “reduced row echelon form of
A” (abbreviated rrefA):
24 CHAPTER 1. LINEAR ALGEBRA

col. j1 col. j2 col. j3 ··· col. jQ


↓ ↓ ↓ ↓
⎛ ⎞
0 ··· 0 1 ∗ ··· ∗ 0 ∗ ··· ∗ 0 ∗
··· ∗ 0 ∗ ··· ∗
⎜ 0 ··· 0 0 0 ··· ∗ 0 ∗ ··· ∗ ⎟
··· 0 1 ∗ ··· ∗ 0 ∗
⎜ ⎟
⎜ 0 ··· 0 0 0 ··· ∗ 0 ∗ ··· ∗ ⎟
··· 0 0 0 ··· 0 1 ∗
⎜ ⎟
⎜ .. .. .. .. .. .. .. .. ⎟
.. .. ..
.. .. ..
⎜ . ⎟
(‡) ⎜ . . . . . . . . . .
. . . ⎟
⎜ ⎟ .
⎜ 0 ··· 0 0 0 ··· 0 0 0 ··· 0 0 0
··· 0 1 ∗ ··· ∗ ⎟
⎜ ⎟
⎜ 0 ··· 0 0 0 ··· 0 0 0 ··· 0 ⎟
··· 0 0 0 ··· 0 0 0
⎜ .. ⎟
⎜ .. .. .. .. .. .. ..
.. .. .. .. .. .. ⎟
⎝ . . . . . . . . . .
. . . . ⎠
0 ··· 0 0 0 ··· 0 0 0 ··· 0 0 0 ··· 0 0 0 ··· 0

The Reduced Row Echelon Form (“rref A”) of A

Note: Such schematic diagrams are accurate and useful, but must be interpreted correctly; for
instance, if j1 = 1 (i.e., if there is a pivot in the first column), then the first few zero columns shown
in the above diagrams are not present, and the reduced row echelon form of A (rrefA) would look
as follows:

col. j1 col. j2 col. j3 ··· col. jQ


⎛ ↓ ↓ ↓
⎞ ↓
1 ∗ ··· ∗ 0 ∗ ··· ∗ 0 ∗ ··· ∗ 0 ∗
··· ∗
⎜ 0 0 ··· 0 1 ∗ ··· ∗ ⎟
··· ∗ 0 ∗ ··· ∗ 0 ∗
⎜ ⎟
⎜ 0 0 ··· 0 0 0 ··· ∗ ⎟
··· 0 1 ∗ ··· ∗ 0 ∗
⎜ ⎟
⎜ . .. .. .. .. .. ⎟
.. .. .. .. .. ..
⎜ .. . ⎟
⎜ . . . . . . . ⎟. . .
⎜ ⎟ .
⎜ 0 0 ··· 0 0 0 ··· 0 0 0 ··· 0 1 ∗
··· ∗ ⎟
⎜ ⎟
⎜ 0 0 ··· 0 0 0 ··· 0 ⎟
··· 0 0 0 ··· 0 0 0
⎜ . .. ⎟
⎜ . .. .. .. .. .. .. .. ⎟
.. .. ..
⎝ . . . . . . . . . ⎠. . .
0 0 ··· 0 0 0 ··· 0 0 0 ··· 0 0 0 ··· 0

In the case when there is a pivot in every column (i.e., Q = n and j1 = 1, j2 = 2, . . . , jn = n), a
row echelon form of A would look like
9. ROW ECHELON FORM OF A MATRIX 25

⎛ ⎞
1 ∗ ∗ ··· ∗
⎜ 0 1 ∗ ··· ∗ ⎟
⎜ ⎟
⎜ .. .. .. ⎟
⎜ . . . ⎟
⎜ ⎟
⎜ 0 0 0 ··· 1 ⎟
⎜ ⎟
⎜ 0 0 0 ··· 0 ⎟
⎜ ⎟
⎜ .. .. .. ⎟
⎝ . . . ⎠
0 0 0 ··· 0

and the reduced row echelon form of A (i.e., rrefA) is:


⎛ ⎞
1 0 0 ··· 0
⎜ 0 1 0 ··· 0 ⎟
⎜ ⎟
⎜ .. .. .. ⎟
⎜ . . . ⎟
⎜ ⎟
⎜ ⎟.
(‡‡) ⎜ 0 0 0 ··· 1 ⎟
⎜ 0 0 0 ··· 0 ⎟
⎜ ⎟
⎜ . .. .. ⎟
⎝ .. . . ⎠
0 0 0 ··· 0

That is, if there is a pivot in every column (Q= n and j1 , . . . , jQ = 1, . . . , n, respectively) then the
reduced row echelon form of A has first n rows equal to the n × n identity matrix, and remaining
rows all zero.

Since Gaussian elimination (which uses only the 3 elementary row operations) does not change the
null space we have
N (A) = N(rrefA) ,
so to find N(A) we instead only need solve the simpler problem of finding N(rrefA).
Notice in case Q = 0 we have the trivial case A = O and N(A) = Rn and when Q = n we
have a pivot in every column and N (A) = N(rrefA) = {0}, so we assume from now on that
Q ∈ {1, . . . , n − 1}, and we proceed to compute N(rrefA).
If we let bij denote the element in the i-th row and j -th column of rrefA, then see that,
for i ∈ {1, . . . , Q}, equation number i in the system rrefAx = 0 is satisfied ⇐⇒ xji =

− k=j1 ,...,jQ xk bik , and of course the last m − Q equations are all trivially true for all x be-
cause the last m − Q rows of rrefA are zero. Thus, since for any vector x ∈ Rn we can write
Q
x = nk=1 xk ek = i=1 xji eji + k=j1 ,...,jQ xk ek , we see that

Ax = 0 ⇐⇒ rrefAx = 0 ⇐⇒ xji = − k=j1 ,...,jQ xk bik , i = 1, . . . , Q
Q
⇐⇒ x = − i=1 ( k=j1 ,...,jQ xk bik )eji + k=j1 ,...,jQ xk ek
Q
⇐⇒ x = k=j1 ,...,jQ xk (ek − i=1 bik eji )
26 CHAPTER 1. LINEAR ALGEBRA

with xk arbitrary for k  = j1 , . . . , jQ . Thus,

Q
N(A) = span{ek − i=1 bik eji : k = j1 , . . . , jQ } ,

and the n − Q vectors on the right here are clearly l.i., so in fact we have dim N(A) = n − Q.

Observe also that rrefA has the standard basis vectors e1 , . . . , eQ in columns number j1 , . . . , jQ ,
and all the other columns have their last m − Q entries = 0, so trivially the column space of rrefA
is spanned by the l.i. vectors e1 , . . . , eQ : if k  = j1 , . . . , jQ then the k-th column β k of rrefA =
Q
=1 c e for suitable constants c1 , . . . , cQ . Since the j -th column of a matrix is obtained by
multiplying the matrix by the j -th standard basis vector ej , in terms of matrix multiplication this
Q Q
says exactly rrefA(ek − =1 c ej ) = 0—i.e., the vector ek − =1 c ej is in N(rrefA). Since
Q
N (rrefA) = N(A) we must then have that ek − =1 c ej is in N(A), or in other words, A(ek −
Q
c ej ) = 0.That is, for each k = j1 , . . . , jQ , the k-th column α k of A is the linear combination
=1Q
=1 c α j . Thus, we have proved that the column space C(A) of A is spanned by the columns
number j1 , . . . , jQ of A (i.e., the columns α j1 , . . . , α jQ of A, where j1 , . . . , jQ are the numbers of the
pivot columns of rrefA). We also claim that α j1 , . . . , α jQ are l.i., because otherwise there are scalars
Q Q
c1 , . . . , cQ not all zero such that =1 c α j = 0 and this says exactly that the vector =1 c ej ∈
Q Q
N(A) = N (rrefA), with c1 , . . . , cQ not all zero, so that =1 c e = rrefA( =1 c ej ) = 0 with
c1 , . . . , cQ not all zero, which is of course not true because e ,  = 1, . . . , Q, are l.i. vectors. Observe
that this part of the argument (that α j1 , . . . , α jQ are l.i.) is also correct if Q = n; of course in this
case we must have j1 = 1, j2 = 2, . . . , jn = n and the result is that all n columns of A are l.i.
Thus, to summarize, we have proved that if the pivot columns of rrefA are column numbers
j1 , . . . , jQ (Q ≥ 1), then the columns α j1 , . . . , α jQ are a basis for the column space of A. In
particular, Q (the number of nonzero rows in rrefA) is equal to the dimension of C(A), the column
space of A. Since we already proved directly that dim(N (A)) = n − Q, we have thus given a second
proof of the rank/nullity theorem and at the same time we have shown how to explicitly find the
null space N(A) and the column space C(A).

SECTION 9 EXERCISES
9.1 Write down a single homogeneous linear equation with unknowns x1 , x2 , x3 such that the
⎛ ⎞ ⎛ ⎞
1 1
solution set consists of the span of the 2 vectors ⎝ 2 , −2⎠.
⎠ ⎝
−1 0
10. INHOMOGENEOUS SYSTEMS 27

9.2 Use Gaussian elimination to show that the solution set of the homogeneous system Ax = 0 with
matrix ⎛ ⎞
1 2 3 4
⎜1 4 3 2 ⎟
A=⎜ ⎝2 5 6 7⎠

1 0 3 6
is a plane (i.e., the span of 2 l.i. vectors), and find explicitly 2 l.i. vectors whose span is the solution
space.

(b) Find the dimension of the column space of the above matrix (i.e., the dimension of the subspace
of R4 which is spanned by the columns of the matrix).
Hint: Use the result of (a) and the rank nullity theorem.

9.3 Suppose
⎛ ⎞
1 0 1 1 1
⎜2 1 1 3 1⎟
A=⎜
⎝1
⎟ .
1 2 0 2⎠
0 0 1 −1 1
Find a basis for the null space N (A) and the column space C(A) of A.

10 INHOMOGENEOUS SYSTEMS
Notice that Gaussian elimination can also be used to solve inhomogeneous systems

10.1 Ax = b ,

where A is m × n, b is a given vector in Rm , and x is to be determined. First we have the following


lemma, which establishes that the set of all solutions of such an inhomogeneous system is either
empty or an affine space which is a translate of the null space N (A) of A.
10.2 Lemma. Suppose A is m × n and b ∈ Rm . Then:
(i) ∃ a solution x ∈ Rn of 10.1 ⇐⇒ b ∈ C(A) (the column space of A);
(ii) if x 0 ∈ Rn is any solution of 10.1 then the complete set of solutions
x of 10.1 is the affine space x 0 + N(A).
10.3 Remark: If V is a subspace of Rn and a ∈ Rn , then “the affine space through a parallel to V ”
means the translate of V by the vector a; that is, a + V (= {a + x : x ∈ V }). With this terminology,
part (ii) of the above theorem says that the set of all solutions of 10.1 is the affine space through x 0
parallel to N(A), assuming that there is at least one solution x = x 0 of 10.1.
Proof of 10.2: Observe first that if α j is the j -th column of A then by definition of matrix multipli-

cation 6.1(ii) we have Ax = nj=1 xj α j which means that the set of all vectors Ax corresponding
28 CHAPTER 1. LINEAR ALGEBRA

Figure 1.2: Schematic picture of the affine space a + V with n = 3, dim V = 2.

to x ∈ Rn is just the set of all linear combinations of the columns α 1 , . . . , α n , i.e., the column space
of A. Hence, Ax = b has a solution if and only if b ∈ C(A), so (i) is proved.
Suppose x 0 ∈ Rn with Ax 0 = b and observe that then

Ax = b ⇐⇒ A(x − x 0 ) + Ax 0 = b ⇐⇒ A(x − x 0 ) = 0
⇐⇒ x − x 0 ∈ N (A) ⇐⇒ x ∈ x 0 + N(A) ,

so (ii) is also proved.

In practice, we can check if there is a solution x 0 , and if so actually find the whole solution set,
by using Gaussian elimination as follows: We start with the m × (n + 1) “augmented matrix” A|b
which has columns α 1 , . . . , α n , b and use the same elementary row operations which reduce A to
rrefA, but applied to the augmented matrix instead of A: this gives a new m × (n + 1) augmented
matrix rrefA|b such that rrefAx =  b has the same set of solutions as the original system Ax = b.
One can then directly check whether the new system rrefAx =  b has a solution, and, if so, actually
find all the solutions x 0 + N (A).

SECTION 10 EXERCISES
10.1 Find a cubic polynomial ax 3 + bx 2 + cx + d whose graph passes through the 4 points (−1, 0),
(0, 1), (1, −2), (2, 5). Is this cubic polynomial the unique such?
10.2 For each of the inhomogeneous systems with the indicated augmented matrix A|b: (a) check
whether or not a solution exists, and (b) in case a solution exists find the set of all solutions (describe
10. INHOMOGENEOUS SYSTEMS 29

your solution in terms of an affine space x 0 + V , giving x 0 and a basis for the subspace V in each
case):
(i) ⎛  ⎞
1 0 1 1 1 1

⎜2 1 1 3 1  1⎟
A|b = ⎜
⎝1
 ⎟
 2⎠
1 2 0 2 
0 0 1 −1 1 1

(ii) ⎛  ⎞
1 0 1 1 2 1

⎜2 1 1 3 1  1⎟
A|b = ⎜
⎝1
 ⎟.
 2⎠
1 2 0 2 
0 0 1 −1 1 1
30
31

CHAPTER 2

Analysis in Rn
1 OPEN AND CLOSED SETS IN Rn
We begin with the definition of open and closed sets in Rn . In these definitions we use the notation
that for ρ > 0 and y ∈ Rn the open ball of radius ρ and center y (i.e., {x ∈ Rn : x − y < ρ}) is
denoted Bρ (y).
1.1 Definition: A subset U ⊂ Rn is open if for each a ∈ U there is a ρ > 0 such that the ball
Bρ (a) ⊂ U .
Observe that direct use of the above definition yields the following general facts:
• Rn and ∅ (the empty set) are both open.
• If U1 , . . . , UN is any finite collection of open sets, then ∩N
j =1 Uj is open (thus “the intersection of
finitely many open sets is again open”).
• If {Uα }α∈
(
any nonempty indexing set) is any collection of open sets, then ∪α∈
Uα is open
(thus “the union of any collection of open sets is again open”).
In the definition of closed set we need the notion of limit point of a set C ⊂ Rn : a point y ∈ Rn is
said to be a limit point of a set C ⊂ Rn if there is a sequence {x (k) }k=1,2,... with x (k) ∈ C for each k
and limk→∞ x (k) = y (i.e., limk→∞ x (k) − y = 0). For example, in the case n = 1 the point 0 is
a limit point of the open interval (0, 1), because { k+1
1
}k=1,2,... is a sequence of points in (0, 1) with
limit zero. Notice that any point y ∈ C is trivially a limit point of C, because it is the limit of the
constant sequence {x (k) }k=1,2,... with x (k) = y for each k, hence it is always true that

C ⊂ {limit points of C} .

1.2 Definition: A subset C ⊂ Rn is closed if C contains all its limit points.

There is a very important connection between open and closed sets.

1.3 Lemma. A set U ⊂ Rn is open ⇐⇒ Rn \ U is closed.

Before we give the proof, we point out an equivalent version of the same result.

1.4 Corollary. A set C ⊂ Rn is closed ⇐⇒ Rn \ C is open.

Proof of Corollary: Apply 1.3 to the case U = Rn \ C; this works because U = Rn \ C ⇐⇒ C =


Rn \ U .
32 CHAPTER 2. ANALYSIS IN Rn

Proof of 1.3 ⇒: Assume U is open and let y be a limit point of Rn \ U . If y ∈ U then by definition
of open there is ρ > 0 such that Bρ (y) ⊂ U , so all points of Rn \ U have distance at least ρ from y,
so, in particular, y cannot be a limit point of Rn \ U , contradicting the choice of y. Thus, we have
eliminated the possibility that y ∈ U and hence y ∈ Rn \ U .
Proof of 1.3 ⇐: Take y ∈ U . Then y is not a limit point of Rn \ U (because Rn \ U contains all
of its limit points) and we claim there must be some ρ > 0 such that Bρ (y) ∩ Rn \ U = ∅. Indeed,
otherwise we would have Bρ (y) ∩ Rn \ U = ∅ for every ρ > 0, and, in particular, B 1 (y) ∩ Rn \
k
U  = ∅ for each k = 1, 2, . . ., so for each k = 1, 2, . . . there is a point x (k) ∈ B 1 (y) ∩ Rn \ U , hence
k
x (k) − y < k1 and y is a limit point of Rn \ U , a contradiction. So in fact there is indeed a ρ > 0
such that Bρ (y) ∩ Rn \ U = ∅, or, in other words, Bρ (y) ⊂ U .

Notice that in view of the De Morgan Laws (see Exercise 1.5 below) that Rn \ (∪α∈
Uα ) =
∩α∈
(Rn \ Uα ) and Rn \ (∩α∈
Uα ) = ∪α∈
(Rn \ Uα ) and the general properties of open sets men-
tioned above, we conclude the following general properties of closed sets:
• Rn and ∅ are both closed.
• If C1 , . . . , CN is any finite collection of closed sets, then ∪N
j =1 Cj is closed (thus “the union of
finitely many closed sets is again closed”).
• If {Cα }α∈
(
any nonempty indexing set) is any collection of closed sets then ∩α∈
Cα is closed
(thus “the intersection of any collection of closed sets is again closed”).

SECTION 1 EXERCISES
1.1 Prove that {(x, y) ∈ R2 : y > x 2 } is an open set in R2 which is not closed, and {(x, y) ∈ R2 :
y ≤ x 2 } is closed but not open.

1.2 Let A ⊂ Rn , and define E ⊂ A to be relatively open in A if for each x ∈ E there is δ > 0 such
that Bδ (x) ∩ A ⊂ E.
Prove that E ⊂ A is relatively open in A ⇐⇒ ∃ an open set U ⊂ Rn such that E = A ∩ U .

1.3 Check the claim made above that U1 , . . . , UN open ⇒ ∩N


j =1 Uj is open.

1.4 Give an example, in the case n = 1, to show that the result of the previous exercise does not
extend to the intersection of infinitely many open sets; thus, give an example of open sets U1 , U2 , . . .
in R such that ∩∞j =1 Uj is not open.

1.5 Prove the first De Morgan law mentioned above; that is, if {Aα }α∈
is any collection of subsets
of Rn , then Rn \ (∪α∈
Aα ) = ∩α∈
(Rn \ Aα ).
Hint: You have to prove Rn \ (∪α∈
Aα ) ⊂ ∩α∈
(Rn \ Aα ) and Rn \ (∪α∈
Aα ) ⊃ ∩α∈
(Rn \ Aα ),
i.e., x ∈ Rn \ (∪α∈
Aα ) ⇒ x ∈ ∩α∈
(Rn \ Aα ) and x ∈ ∩α∈
(Rn \ Aα ) ⇒ x ∈ Rn \ (∪α∈
Aα ).
2. BOLZANO-WEIERSTRASS, LIMITS AND CONTINUITY IN Rn 33

2 BOLZANO-WEIERSTRASS, LIMITS AND CONTINUITY


IN Rn
Here we shall write points in Rn as rows: x = (x1 , . . . , xn ).
We first give the definition of convergence of a sequence {x (k) }k=1,2,... of such points:
2.1 Definition: Given a sequence {x (k) }k=1,2,... of points in Rn , limk→∞ x (k) = y means
limk→∞ x (k) − y = 0.
Notice that this makes sense, because {x (k) − y}k=1,2,... is a sequence of real numbers, and we
are already familiar with the definition of convergence of such real sequences. In terms of ε, N, the
Def. 2.1 actually says that lim x (k) = y ⇐⇒ for each ε > 0 there is N such that x (k) − y < ε
whenever k ≥ N . In view of the inequalities

 (k)  
n
(k)
2.2 max |xj − yj | : j = 1, . . . , n ≤ x − y ≤
(k)
|xj − yj |
j =1

it is very easy to check the following lemma (see Exercise 2.2 below).
(k)
2.3 Lemma. lim x (k) = y with y = (y1 , . . . , yn ) ⇐⇒ lim xj = yj ∀ j = 1, . . . , n.
(k)
2.4 Remark: The real sequences {xj }k=1,2,... (for j = 1, . . . , n) are referred to as the component
sequences corresponding to the vector sequence x (k) , so with this terminology Lem. 2.3 says that the
vector sequence x (k) converges if and only if each of the component sequences converge, and in this
case the limit of the vector sequence is the point in Rn with j -th component equal to the limit of
the j -th component sequence, j = 1, . . . , n.

Recall from Lecture 2 of the Appendix that every bounded sequence in R has a convergent subse-
quence. A similar theorem is true for sequences in Rn .

2.5 Theorem (Bolzano-Weierstrass in Rn .) If {x (k) }k=1,2,... is a bounded sequence in Rn (so that there
is a constant R with x (k)  ≤ R ∀ k = 1, 2, . . .), then there is a convergent subsequence {x (kj ) }j =1,2,... .
Proof: As we mentioned above, this is already proved in Lecture 2 of the Appendix for the case
n = 1, and the general case follows from this and induction on n. The details are left as an exercise
(see Exercise 2.3 below).

Now suppose A is an arbitrary subset of Rn , and let f : A → Rm , where m ≥ 1. We want to define


continuity of f at a point a ∈ A. The definition which follows is similar to the the special case
discussed in Lecture 3 of the Appendix A—i.e. the special case when A is an interval in R and
m = 1. In the general case when A is any subset of Rn and m ≥ 1 the definition is in fact as follows:

Definition (i): f : A → Rm is continuous at the point a ∈ A if for each ε > 0 there is δ > 0 such
that x ∈ Bδ (a) ∩ A ⇒ f (x) − f (a) < ε.
34 CHAPTER 2. ANALYSIS IN Rn

(ii) f : A → Rm is said to be continuous if is continuous at every point a ∈ A.

Analogous to the 1-variable theorem discussed in Lecture 3 of Appendix A, we have the following
theorem in Rn .

2.6 Theorem. Suppose K ⊂ Rn is compact (i.e., K is a closed bounded subset of Rn ) and f : K → R is


continuous. Then f attains its maximum and minimum values on K; that is, there are points a, b ∈ K
with f (a) ≤ f (x) ≤ f (b) ∀ x ∈ K.

Proof: The proof is almost identical to the proof of the corresponding 1-variable theorem given in
Lecture 3 of the Appendix, except that we use the Bolzano-Weierstrass theorem in Rn rather than
in R. (See Exercise 2.4 below.)

We shall also need the concept of of limit of a function which is defined on a subset of Rn . So
suppose f : A → Rm , where A ⊂ Rn and m ≥ 1. We shall define the notion of limit at any point
a ∈ A which is not an isolated point of A—a point a ∈ A is called an isolated point of A if there is
some δ0 > 0 such that A ∩ Bδ0 (a) = {a} (i.e., if there is some ball Bδ0 (a) such that a is the only
point of A which lies in the ball Bδ0 (a)). Thus a ∈ A is not an isolated point of A if and only if
A ∩ (Bδ (a) \ {a})  = ∅ for each δ > 0.
2.7 Definition: Suppose A ⊂ Rn , f : A → Rm , a ∈ A, a not an isolated point of A, and y ∈ Rm .
Then limx →a f (x) = y means that for each ε > 0 there is a δ > 0 such that x ∈ A ∩ Bδ (a) \ {a} ⇒
f (x) − y < ε.

2.8 Remark: With the notation of the above definition, notice that limx →a f (x) = y with y =
(y1 , . . . , ym ) ⇐⇒ limx →a fj (x) = yj ∀ j = 1, . . . , m.
We leave the proof as an exercise (see Exercise 2.4 below), and we also leave the proofs of the standard
limit theorems in the following lemma as an exercise (see Exercise 2.4).

2.9 Lemma. Suppose λ, μ ∈ R, f, g : A → Rm , a ∈ A, and limx →a f (x) and limx →a g(x) both exist.
Then:
(i) lim (λf (x) + μg(x)) exists and = λ lim f (x) + μ lim g(x)
x →a x →a x →a
(ii) m = 1 ⇒ lim (f (x)g(x)) exists and = ( lim f (x))( lim g(x))
x →a x →a x →a
(iii) m = 1 and lim g(x)  = 0 ⇒ lim (f (x)/g(x))
x →a x →a
exists and = ( lim f (x))/( lim g(x)) .
x →a x →a

2.10 Remark: Suppose f : A → Rm and a ∈ A. Observe that f is trivially continuous at a if a is


an isolated point of A, and if a is not an isolated point of A then f is continuous at a if and only if
limx →a f (x) = f (a) (meaning limx →a f (x) exists and is equal to f (a)).
3. DIFFERENTIABILITY 35

SECTION 2 EXERCISES
2.1 (Sandwich theorem.) Suppose A ⊂ Rn , a ∈ A, y ∈ Rm , f, g, h : A → Rm and limx →a g(x) =
limx →a h(x) = y. Prove g(x) ≤ f (x) ≤ h(x) ∀ x ∈ A ⇒ limx →a f (x) = y.
Note: Part of what you have to prove is that limx →a f (x) exists.

2.2 Give the proof of 2.2 above.

2.3 Using the Bolzano-Weierstrass theorem for real sequences (from Lecture 2 of Appendix A), give
the proof of Thm. 2.5 above.
Caution: An application of the theorem in the case n = 1 (Lecture 2 of Appendix A) tells us that
(k)
each component sequence {xj }k=1,2,... has a convergent subsequence, but you need to take account
of the fact that there may be a different subsequence corresponding to each j .

2.4 Give the proofs of Thm. 2.6, Rem. 2.8, and Lem. 2.9 above.

2.5 Suppose K is a compact (i.e., closed and bounded) subset of Rn and let f : K → R be continuous.
Prove that for each ε > 0 there is δ > 0 such that x, y ∈ K and x − y < δ ⇒ |f (x) − f (y)| < ε.
Hint: Otherwise there is ε = ε0 > 0 such that this fails for each δ > 0, which means it fails for δ = 1
k
for each k = 1, 2, . . ., which means that for each k there are points x k , y k ∈ K with x k − y k  < 1
k
and |f (x k ) − f (y k )| ≥ ε0 .

3 DIFFERENTIABILITY IN Rn
Recall first the definition of differentiability of a function f : (α, β) → R of one variable: f is
differentiable at a point a ∈ (α, β) if

f (a + h) − f (a)
3.1 lim exists .
h→0 h

In this case, the limit is denoted f (a) and is referred to as the derivative of f at a.
This definition does not extend well to functions of more than 1-variable because in the case of
functions f : U → R with U ⊂ Rn open and n ≥ 2, difference quotients like f (a+h)−f h
(a)
make
no sense (we cannot divide by the vector quantity h). We therefore need an alternative definition of
differentiability which makes sense for such functions and which is equivalent to the usual defini-
tion 3.1 of differentiability when n = 1. The key observation in this regard is that 3.1 (in the case
n = 1 when f : (α, β) → R and a ∈ (α, β)) is equivalent to the requirement that there is a real
number A such that
 
3.2 lim |h|−1 |f (a + h) − f (a) + Ah | = 0 .
h→0
36 CHAPTER 2. ANALYSIS IN Rn

Notice that such A exists if and only if A = lim f (a+h)−f


h
(a)
, because
 f (a + h) − f (a) 
|h|−1 |f (a + h) − (f (a) + Ah)| =  − A
h
and hence
f (a + h) − f (a)
lim |h|−1 |f (a + h) − (f (a) + Ah)| = 0 ⇐⇒ lim =A.
h→0 h→0 h
So 3.2 is equivalent to 3.1 in case n = 1, but the point now is that 3.2 has a natural generalization
to the case when f : U → Rm , where U ⊂ Rn is open and m, n are arbitrary positive integers, as
follows.
3.3 Definition: If f : U → Rm then we say that f is differentiable at a ∈ U if there is an m × n
matrix A such that
lim h−1 f (a + h) − (f (a) + Ah) = 0 .
h→0

Figure 2.1: Definition of differentiability, n = 2, m = 1.

In case 3.3 holds, A is called the derivative matrix (or Jacobian matrix) of the function f at the point
a. It is unique if it exists at all, and we could fairly easily check that now, but in any case it will be a
by-product of the result of Thm. 4.3 in the next section, so we’ll leave the proof of uniqueness until
then.
The derivative matrix A of f at a, when it exists, is usually denoted Df (a).

As for functions of 1-variable we have the fact that differentiability ⇒ continuity. More precisely:
4. DIRECTIONAL DERIVATIVES, PARTIAL DERIVATIVES, AND GRADIENT 37

3.4 Lemma. Suppose f : U → Rm with U open in Rn and suppose a ∈ U . Then differentiability of f


at a ⇒ continuity of f at a.
Proof: Assume f is differentiable at a ∈ U . Pick ρ > 0 such that Bρ (a) ⊂ U , which we can do
because U is open. Then for 0 < h < ρ we have

f (a + h) − f (a) = f (a + h) − (f (a) + Df (a)h) − Df (a)h


≤ f (a + h) − (f (a) + Df (a)h) + Df (a)h
≤ (h−1 f (a + h) − (f (a) + Df (a)h))h + Df (a)h

and limh→0 h−1 (f (a + h) − (f (a) + Df (a)h)), limh→0 h, limh→0 Df (a)h are all
zero, so, by Lem. 2.9, limh→0 f (a + h) − f (a) = 0.0 + 0 = 0, which is the definition of conti-
nuity of f at a.

4 DIRECTIONAL DERIVATIVES, PARTIAL DERIVATIVES,


AND GRADIENT
Notice that if f : U → Rm with U open and if a ∈ U , we can find ρ > 0 such that Bρ (a) ⊂ U
and so, for any vector v ∈ Rn , f (a + tv) is a function of the real variable t, and is well defined for
|t| < v−1 ρ if v = 0 and is a constant function f (a + tv) ≡ f (a) for all t ∈ R if v = 0.
4.1 Definition: The directional derivative of f at a by the vector v, denoted Dv f (a), is defined by

Dv f (a) = lim h−1 (f (a + hv) − f (a))


h→0

whenever the limit on the right exists. Observe that, since g(t) = f (a + tv) is a function of the real
variable t for t in some open interval containing 0, the Def. 4.1 just says that the directional derivative
Dv f (a) is exactly the derivative g (0) at t = 0 of the function g(t) defined by g(t) = f (a + tv).
4.2 Definition: In the special case v = ej (the j -th standard basis vector in Rn ) the directional
derivative De j f (a) is usually denoted Dj f (a) whenever it exists; thus

Dj f (a) = lim h−1 f (a + hej − f (a)) .


h→0

Dj f (a) so defined is called the j -th partial derivative of f at a. An alternative notation (sometimes
referred to as “classical notation”) is to write
∂f (x)
= Dj f (x) .
∂xj
Notice that the simplest way of calculating Dj f (x) is usually simply to take the derivative (as in
1-variable calculus) of the function f (x1 , . . . , xn ) with respect to the single variable xj while holding
all the other variables, xi , i  = j , fixed. Thus, in general, it is not necessary to use the formal limit
definition of 4.2.
38 CHAPTER 2. ANALYSIS IN Rn

There is an important connection between partial derivatives and the derivative matrix at points
where f is differentiable, given in the following theorem.
4.3 Theorem. If f : U → Rm with U open and if f is differentiable at a point a ∈ U , then all the
partial derivatives of f at a exist, and in fact the j -th column of the derivative matrix Df (a) is exactly
the j -th partial derivative Dj f (a), j = 1, . . . , n. Furthermore, if f is differentiable at a then all the
directional derivatives Dv f (a) exist, and are given by the formula
Dv f (a) = Df (a)v ,
or equivalently,

n
Dv f (a) = vj Dj f (a) .
j =1

4.4 Caution: The above says, in particular, that f differentiable at a implies that all the partial
derivatives Dj f (a), j = 1, . . . , n, and all the directional derivatives Dv f (a) exist, but the converse
is false; that is, it may happen that all the directional derivatives (including the partial derivatives)
exist yet f still fails to be differentiable at a. (See Exercise 4.1 below.)
Proof of Theorem 4.3: Take ρ > 0 such that Bρ (a) ⊂ U and let A = Df (a). We are given
limh→0 h−1 f (a + h) − (f (a) + Ah) = 0 so given ε > 0 there is a δ ∈ (0, ρ) such that 0 <
h < δ ⇒ h−1 f (a + h) − (f (a) + Ah) < ε and taking, in particular, h = hej we see that
h ∈ R with 0 < |h| < δ ⇒ |h|−1 f (a + hej ) − (f (a) + hAej ) < ε ⇐⇒ h−1 (f (a + hej ) −
f (a)) − Aej  < ε, which is precisely the ε, δ definition of lim h−1 (f (a + hej ) − f (a)) = Aej , so
in fact Dj f (a) exists and equals the j -th column of Df (a). If v is any vector in Rn \ {0} and if h ∈ R
with 0 < |h| < δ/v then |h|−1 f (a + hv) − (f (a) + hAv) < ε, and hence h−1 (f (a + hv) −
f (a)) − Av < ε which, since ε > 0 is arbitrary, says limh→0 h−1 (f (a + hv) − f (a)) = Av; that

is Dv f (a) exists and is equal to Av = Df (a)v = nj=1 vj Dj f (a), since we already proved the
j -th column of Df (a) is Dj f (a).
4.5 Definition: Let f : U → R with U open. The gradient of f at a point a ∈ U is the vector
(D1 f (a), . . . , Dn f (a))T .
We claim that the gradient has geometric significance: If f is differentiable at a it gives the direction
of the fastest rate of change of f when we start from the point a, and furthermore it’s magnitude
∇f (a) gives the actual rate of change. More precisely:
4.6 Lemma. Suppose f : U → R with U open, let a ∈ U and assume that f is differentiable at a. Then
max{Dv f (a) : v ∈ Rn , v = 1} = ∇f (a), and for ∇f (a)  = 0 the maximum is attained when
and only when v = ∇f (a)−1 ∇f (a).

Proof: By Lem. 4.4 we have Dv f (a) = nj=1 vj Dj f (a) = v · ∇f (a), and by the version of the
Cauchy-Schwarz inequality in Ch. 1, Exercise 2.2 we have for v = 1 that
(1) v · ∇f (a) ≤ ∇f (a)
4. DIRECTIONAL DERIVATIVES, PARTIAL DERIVATIVES, AND GRADIENT 39

and equality holds in this inequality if and only if ∇f (a) = λv with λ ≥ 0. But ∇f (a) = λv
with λ ≥ 0 and v = 1 ⇒ λ = ∇f (a), so if ∇f (a)  = 0 equality holds in (1) if and only if
v = ∇f (a)−1 ∇f (a), as claimed.

In view of the remarks in 4.4 above, we have the inconvenient fact that all the partial derivatives
of f might exist, yet f still not be differentiable. Fortunately, there is after all a criterion, involving
only the partial derivatives of f , which guarantees differentiability of f , as follows.

4.7 Theorem. Let f : U → Rm and assume that there is a ball Bρ (a) ⊂ U such that the partial
derivatives Dj f (x) exist at each point x ∈ Bρ (a) and are continuous at a. Then f is differentiable at a.

4.8 Remark: In particular, if the partial derivatives Dj f exist everywhere in U and are continuous
at each point of U (in this case we say that “f is a C 1 function on U ”), then we can apply the above
theorem at every point of U , so f is differentiable at every point of U in this case.

Proof of Theorem 4.7: Observe that it suffices to prove the theorem in the case m = 1, because in
case m ≥ 2 we have f (x) = (f1 (x), . . . , fm (x))T and we can apply the case m = 1 to each individual
component fj , which evidently implies the required result. With m = 1 we have f real-valued and
for 0 < |h| < ρ we have


n
 
(1) f (a + h) − f (a) = f (a + h(j ) ) − f (a + h(j −1) ) ,
j =1

j
where we use the notation h(j ) = i=1 hi ei for j = 1, . . . , n and h(0) = 0. Observe that then
h(j ) = h(j −1) + hj ej for j = 1, . . . , n, and so in the difference f (a + h(j ) ) − f (a + h(j −1) ) =
f (a + h(j −1) + hj ej ) − f (a + h(j −1) ) we are varying only the j -th variable (i.e., a single vari-
able), so the Mean Value Theorem from 1-variable calculus implies f (a + h(j ) ) − f (a + h(j −1) ) =
hj Dj f (h(j −1) + θj hj ej ) for some θj ∈ (0, 1), whence (1) implies


n
f (a + h) − f (a) = hj Dj f (a + h(j −1) + θj hj ej ) ,
j =1

and hence


n
 
f (a + h) − f (a) − Df (a)h = hj Dj f (a + h(j −1) + θj hj ej ) − Dj f (a) .
j =1
40 CHAPTER 2. ANALYSIS IN Rn

and in particular, with Df = (D1 f, . . . , Dn f ),

n
 
(2) f (a + h) − f (a) − Df (a)h =  hj Dj f (a + h(j −1) + θj hj ej ) − Dj f (a) 
j =1

n
  
≤ hj Dj f (a + h(j −1) + θj hj ej ) − Dj f (a)
j =1
 n
 
≤ h Dj f (a + h(j −1) + θj hj ej ) − Dj f (a) ,
j =1

where we used the triangle inequality (2.6 of Ch. 1) to go from the first line to the second.
Let ε > 0. Since Dj f (x) is continuous at x = a, for each j = 1, . . . , n we have δj ∈ (0, ρ) such
that
k < δj ⇒ Dj f (a + k) − Dj f (a) < ε/n ,
and hence with δ = min{δ1 , . . . , δn } we have

(3) k < δ ⇒ Dj f (a + k) − Dj f (a) < ε/n for each j = 1, . . . , n .



j −1 2
Finally, observe that if h < δ then h(j −1) + θj hj ej  = i=1 hi + θj hj ≤ h < δ, and
2 2

hence we can use (3) with k = h(j −1) + θj hj ej and so (2) implies

f (a + h) − f (a) − Df (a)h < εh whenever 0 < h < δ ,

thus proving that f is differentiable at a.

SECTION 4 EXERCISES
x3
if y = 0
4.1 Let f be the function on R2 defined by f (x, y) = y
0 if y = 0.
(i) Prove that the directional derivative Dv f (0, 0) (exists and) = 0 for each v ∈ R2 .
(ii) f is not continuous at (0, 0).
(iii) f is not differentiable at (0, 0).
⎧ |x |x
⎨ 1 2 if (x1 , x2 )  = (0, 0)
4.2 For the function f (x1 , x2 ) = x12 +x22

0 if (x1 , x2 ) = (0, 0),
(i) Find the directional derivatives at (0, 0) (and show they all exist),

(ii) Show that the formula Dv f (0, 0) = 2j =1 vj Dj f (0, 0) fails for some vectors v ∈ R2 ,
(iii) Hence, using (ii) together with the appropriate theorem, prove that f is not differentiable at
(0, 0).
5. CHAIN RULE 41

5 CHAIN RULE

The chain rule for functions of one variable says that the composite function g ◦ f is differentiable at
a point x with derivative g (f (x))f (x) provided both the derivatives f (x) and g (f (x)) exist. The
chain rule for functions of more than one variable is similar, and the proof is more or less identical,
except that the derivatives f , g must be replaced by the relevant derivative matrices. The precise
theorem is as follows.

5.1 Theorem (Chain Rule.) Suppose U ⊂ Rn and V ⊂ Rm are open and f : U → V , g : V → R


are given, and suppose that f is differentiable at a point x ∈ U and that g is differentiable at the point
y = f (x) ∈ V . Then g ◦ f is differentiable at x and D(g ◦ f )(x) = Dg(f (x))Df (x).

5.2 Remark: Dg(f (x))Df (x) is the product of the  × m matrix Dg(f (x)) times the m × n matrix
Df (x), which makes sense and gives an  × n matrix, which is at least the correct size matrix to
possibly represent D(g ◦ f )(x), because the composite g ◦ f maps the set U ⊂ Rn into R , so at
least the statement of the theorem makes sense.

Proof of the Chain Rule: Let ε ∈ (0, 1) and take δ0 > 0 such that Bδ0 (x) ⊂ U and Bδ0 (f (x)) ⊂ V .
Observe first that by differentiability of g at y = f (x) we can choose δ1 ∈ (0, δ0 ) such that

z − y < δ1 ⇒ g(z) − g(y) − Dg(y)(z − y) ≤ εz − y

and, on the other hand, by differentiability of f at x we can choose δ2 ∈ (0, δ0 ) such that

h < δ2 ⇒ f (x + h) − f (x) − Df (x)h ≤ εh

and, in particular,

h < δ2 ⇒ f (x + h) − f (x) = f (x + h) − f (x) − Df (x)h + Df (x)h


≤ f (x + h) − f (x) − Df (x)h + Df (x)h
≤ εh + Df (x))h
≤ (1 + Df (x))h < δ1

if h < (1 + Df (x))−1 δ1 . So, in particular, if h < min{(1 + Df (x))−1 δ1 , δ2 } then we have
the two inequalities

f (x + h) − f (x) − Df (x)h ≤ εh


g(z) − g(y) − Dg(y)(z − y) ≤ ε(1 + Df (x))h
42 CHAPTER 2. ANALYSIS IN Rn

with z = f (x + h), y = f (x). So with z = f (x + h), y = f (x) we see h < min{(1 +
Df (x))−1 δ1 , δ2 } ⇒
g(z) − g(y) − Dg(y)Df (x)h  
= g(z) − g(y) − Dg(y)(z − y) − Dg(y) f (x + h) − f (x) − Df (x)h  
≤ g(z) − g(y) − Dg(y)(z − y) + Dg(y) f (x + h) − f (x) − Df (x)h 
≤ g(z) − g(y) − Dg(y)(z − y) + Dg(y) f (x + h) − f (x) − Df (x)h
 − y + εDg(y) h 
≤ εz
≤ ε 1 + Df (x) + Dg(y) h = Mεh ,
where M = 1 + Df (x) + Dg(f (x)). That is we have proved that if we take δ = min{(1 +
Df (x))−1 δ1 , δ2 } then
h < δ ⇒ g(f (x + h)) − g(f (x)) − Dg(f (x))Df (x)h ≤ Mεh .
Since we can repeat the whole argument with M −1 ε in place of ε we have thus shown that there is
a δ > 0 such that
h < δ ⇒ g(f (x + h)) − g(f (x)) − Dg(f (x))Df (x)h ≤ εh .
That is, g ◦ f is differentiable at x and D(g ◦ f )(x) = Dg(f (x))Df (x).

SECTION 5 EXERCISES
5.1 Use the chain rule and any other theorems from the text that you need to show the following:
(i) If f : U → V and g : V → Rp are C 1 , where U ⊂ Rn , V ⊂ Rm are open, then g ◦ f is C 1 on
U.
(ii) If f : R → R is C 1 and g : R3 → R is defined by g(x) = f (2x1 − x2 + 3x3 ), prove that g is
C 1 and show D1 g(x) = −2D2 g(x) = (2/3)D3 g(x) at each point x = (x1 , x2 , x3 ) ∈ R3 .
(iii) If ϕ : R → R is C 1 and g(x) = ϕ(x) for x ∈ Rn , then g is C 1 on Rn \ {0} and Dj g(x) =
xj
x  ϕ (x) for all x  = 0.
(iv) If γ : [0, 1] → Rn is a C 1 curve and if f : Rn → R is C 1 then f ◦ γ is C 1 on [0, 1] and
(f ◦ γ ) (t) = ∇f (γ (t)) · γ (t) = (Dγ (t) f )(γ (t)) ∀ t ∈ [0, 1], where ∇f is the gradient of f (i.e.,
(Df )T ) and Dv f means the directional derivative of f by v.

6 HIGHER-ORDER PARTIAL DERIVATIVES


Recall that f : U → Rm , with U ⊂ Rn open, is said to be C 1 on U if all the partial derivatives Dj f
exist and are continuous on U . Likewise f is said to be C 2 on U if each of the partial derivatives
Dj f is C 1 , so in this case the second order partial derivatives Di Dj f exist and are continuous. An
important general fact about such second order partials is that
6.1 Di Dj f = Dj Di f on U , provided f is C 2 on U ;
6. HIGHER-ORDER PARTIAL DERIVATIVES 43

i.e., we claim that the order in which one computes the mixed partials is irrelevant. Of course then
if k ≥ 2 and f : U → Rm is C k (meaning all partial derivatives Di1 Di2 · · · Dik f (x) exist and are
continuous for any choice of i1 , . . . , ik ∈ {1, . . . , n}), then again the order in which the partial
derivatives are taken is irrelevant:

6.2 f ∈ C k ⇒ Di1 Di2 · · · Dik f (x) = Dij1 Dij2 · · · Dijk f (x)

whenever j1 , . . . , jk is a re-ordering (“a permutation”) of the integers 1, . . . , k, which is an easy


consequence of 6.1 and induction on k, because for k ≥ 3 we can write

Di1 Di2 · · · Dik f (x) = Di1 Di2 · · · Dik−1 (Dik f )(x) = Di1 (Di2 · · · Dik−1 Dik f )(x) .

A partial derivative Di1 · · · Dik f (x) as in 6.2 above is usually referred to as a partial derivative of f
of order k.

We will actually prove a result which is slightly more general than 6.1, as follows.

6.3 Theorem. Suppose f : U → Rm , where U is an open subset of Rn , suppose ρ > 0, a ∈ U , Bρ (a) ⊂


U , i, j ∈ {1, . . . , n}, and suppose the second-order partial derivatives Di Dj f (x) and Dj Di f (x) exist
at each point x of Bρ (a) and are continuous at the point x = a. Then Di Dj f (a) = Dj Di f (a).
Proof: Evidently, there is nothing to prove when i = j , so we can assume i  = j . Also, we can assume
m = 1, because otherwise we can first apply the case m = 1 of the theorem to each component fj of
f = (f1 , . . . , fm )T and this then evidently implies the theorem as stated. So we assume that m = 1
(i.e., that f is real-valued) and i = j . For h, k ∈ (−ρ/2, ρ/2) we observe that the points a + hei ,
a + kej and a + hei + kej are all in Bρ (a) and so it makes sense to consider the “second difference”

f (a + hei + kej ) − f (a + hei ) − f (a + kej ) + f (a) .

The expression here is called a second difference because it can be written

ϕj,k (a + hei ) − ϕj,k (a) with ϕj,k (x) = f (x + kej ) − f (x)

(i.e., it can be written as the “first difference of the first difference ϕj,k ” hence the name “second
difference”), and similarly it can also be written

ψi,h (a + kej ) − ψi,h (a) with ψi,h (x) = f (x + hei ) − f (x) ,

so in fact,

(1) ϕj,k (a + hei ) − ϕj,k (a) = ψi,h (a + kej ) − ψi,h (a) .

Now only the i-th variable is being varied in the difference on the left of (1), so we can use the Mean
Value Theorem from 1-variable calculus in order to conclude that

ϕj,k (a + hei ) − ϕj,k (a) = hDi ϕj,k (a + θ hei ) for some θ ∈ (0, 1) ,
44 CHAPTER 2. ANALYSIS IN Rn

and of course by taking the partial derivative with respect to xi on each side of the identity ϕj,k (x) =
f (x + kej ) − f (x), we see that Di ϕj,k (x) = Di f (x + kej ) − Di f (x). Hence,

ϕj,k (a + hei ) − ϕj,k (a) = hDi f (a + θ hei + kej ) − hDi f (a + θ hei )

and on the right here is a difference in which only the j -th variable is being varied, hence we can
again use the Mean Value Theorem from 1-variable calculus to see that the expression on the right
can be written hkDj Di f (a + θ hei + ηkej for some η ∈ (0, 1). Thus, finally we have shown

(2) ϕj,k (a + hei ) − ϕj,k (a) = hkDj Di f (a + θ hei + ηkej ) for |h|, |k| < ρ/2 .

By a similar argument (starting with the right side of (1) instead of the left) we have

(3) ψi,h (a + kej ) − ψi,h (a) = khDi Dj f (a + 


θ hei + 
ηkej ) for |h|, |k| < ρ/2 ,

and for suitable 


θ, 
η ∈ (0, 1). Thus, by (1), (2), (3) we see that

(4) Dj Di f (a + θhei + ηkej ) = Di Dj f (a + θ hei + 


ηkej ) for |h|, |k| < ρ/2 .

Taking limits as h2 + k 2 → 0 and using the fact that Di Dj f (x) and Dj Di f (x) are both contin-
uous at x = a, we have Di Dj f (a) = Dj Di f (a) as claimed.

SECTION 6 EXERCISES
6.1 Suppose f, g : R → R are C 2 functions, and let F : R2 → R be defined by F (x, y) = f (x +
y) + g(x − y). Prove that F is C 2 on R2 and satisfies the wave equation on R2 (i.e., the equation
∂2F 2
∂x 2
− ∂∂yF2 = 0).

6.2 Prove that the degree n polynomials u, v obtained by taking the real and imaginary parts of
∂2p ∂2p
(x + iy)n are harmonic (i.e., ∂x 2
+ ∂y 2
≡ 0 on R2 in both cases p = u, p = v).
Hint: Thus, (x + iy)n = u(x, y) + iv(x, y) where u, v are real-valued polynomials in the variables
∂u
x, y. Start by showing that ∂x = ∂v
∂y and ∂y = − ∂x .
∂u ∂v

7 THE SECOND DERIVATIVE TEST FOR EXTREMA OF A


MULTIVARIABLE FUNCTION
Here we want to discuss conditions on a C 2 function f : U → R, with U open in Rn , which are
sufficient to guarantee f has a local maximum or local minimum at a point a ∈ U . First we give the
precise definition of local maximum and local minimum.
7.1 Definition: Suppose f : U → R with U open and a ∈ U . f is said to have a local maximum at
a if there is ρ > 0 with Bρ (a) ⊂ U and

f (x) ≤ f (a) ∀ x ∈ Bρ (a)


7. SECOND DERIVATIVE TEST FOR EXTREMA OF MULTIVARIABLE FUNCTION 45

and the local maximum is said to be strict if there is ρ > 0 such that Bρ (a) ⊂ U and
f (x) < f (a) ∀ x ∈ Bρ (a) \ {a} .

Likewise, f is said to have a local minimum at a if there is ρ > 0 with Bρ (a) ⊂ U and f (x) ≥
f (a) ∀ x ∈ Bρ (a) and the local minimum is said to be strict if ∃ ρ > 0 such that we have the strict
inequality f (x) > f (a) ∀ x ∈ Bρ (a) \ {a}.
Of course if f : U → R is differentiable with U ⊂ Rn open, then ∇f (a) = 0 at any point a ∈ U
where f has a local maximum or minimum (because if f has a local maximum or minimum at a
then f (a + tej ) clearly has a local maximum or minimum at t = 0 and hence by 1-variable calculus
has zero derivative with respect to t at t = 0, and the derivative at t = 0 is by definition the j ’th
partial derivative Dj f (a)).
Recall the second derivative test from single variable calculus that if f : (α, β) → R is C 2 (i.e., the
first and second derivatives of f exist and are continuous) then f has a strict local maximum at a
point a ∈ (α, β) if f (a) = 0 and f (a) < 0 and a strict local minimum at a point a ∈ (α, β) if
f (a) = 0 and f (a) > 0.
Here we want to show that a similar statement holds for functions f : U → R, with U open in
Rn . In order to state the relevant theorem, we first need to make a brief discussion of the notion of
quadratic form.
A quadratic form on Rn is an expression of the form

n
7.2 Q(ξ ) = qij ξi ξj , ξ = (ξ1 , . . . , ξn ) ∈ Rn ,
i,j =1

where (qij ) is a given n × n real symmetric matrix. The quadratic form is said to be positive definite
if Q(ξ ) > 0 ∀ ξ ∈ Rn \ {0} and negative definite if Q(ξ ) < 0 ∀ ξ ∈ Rn \ {0}.
Of course since a quadratic form is just a homogeneous degree 2 polynomial in the variables
ξ1 , . . . , ξn , it is evidently continuous and hence by Thm. 2.6 of the present chapter attains its
maximum and minimum value on any closed bounded set in Rn . In particular, the unit sphere S n−1
in Rn ,
7.3 S n−1 = {x ∈ Rn : x = 1} ,
is a closed bounded set, hence the restriction of the quadratic form to S n−1 , Q|S n−1 , therefore does
attain its maximum and minimum value. Let
7.4 m = min Q|S n−1 , M = max Q|S n−1
and observe that for ξ ∈ Rn \ {0} we can write ξ = ξ 
ξ , where 
ξ = ξ −1 ξ ∈ S n−1 , and so Q(ξ ) =
ξ  Q(
2 ξ ), whence
7.5 mξ 2 ≤ Q(ξ ) ≤ Mξ 2 , ∀ ξ ∈ Rn ,
46 CHAPTER 2. ANALYSIS IN Rn

with m, M as in 7.4.
Notice, in particular, if Q is positive definite then the minimum m of Q|S n−1 is positive, and so 7.5
guarantees
7.6 Q positive definite ⇒ ∃ m > 0 such that Q(ξ ) ≥ mξ 2 ∀ ξ ∈ Rn ,
and similarly,
7.7 Q negative definite ⇒ ∃ m > 0 such that Q(ξ ) ≤ −mξ 2 ∀ ξ ∈ Rn .
We are now ready to state the second derivative test for C 2 functions f : U → R, where U is open
in Rn . In this theorem, Hessf,a (ξ ) will denote the Hessian quadratic form of f at a, defined by

n
7.8 Hessf,a (ξ ) = Di Dj f (a)ξi ξj , ξ = (ξ1 , . . . , ξn )T ∈ Rn .
i,j =1

7.9 Theorem (Second derivative test for functions of n variables.) Suppose U is open in Rn ,
f : U → R is C 2 , and a ∈ U . Then
f (x) = f (a) + (x − a) · ∇f (a) + 1
2 Hessf,a (x − a) + E(x) ,
where limx →a x − a−2 E(x) = 0.
In particular, f has a strict local maximum at a if ∇f (a) = 0 and Hessf,a (ξ ) is negative definite, and
a strict local minimum at a if ∇f (a) = 0 and Hessf,a (ξ ) is positive definite.
7.10 Terminology: If f : U → R is C 1 and U ⊂ Rn is open, points a ∈ U with ∇f (a) = 0 are
usually referred to as critical points of f . Using this terminology, Thm. 7.9 says that if f is C 2 then
f has a strict local minimum at any critical point a ∈ U where Hessf,a (ξ ) is positive definite and
strict local maximum at any critical point a ∈ U where Hessf,a (ξ ) is negative definite.
Proof of 7.9: We begin with an identity from single variable calculus: If h : [0, 1] → R is C 2 , then
we have the identity
!1
(1) 0 (1 − t)h (t) dt = h(1) − h(0) − h (0) ,

which is proved using integration by parts. (See Exercise 7.3 below.) We apply (1) in the special
case when h(t) = f (a + t (x − a)), with x ∈ Bρ (a). By the Chain Rule, h(t) is then indeed C 2

on [0, 1] and h (t) = (x − a) · ∇f (a + t (x − a)), h (t) = ni,j =1 (xi − ai )(xj − aj )Di Dj f (a +
t (x − a)), so (1) gives
!1
f (x) = f (a) + (x − a) · ∇f (a)+ ni,j =1 (xi −ai )(xj −aj ) 0 (1− t)Di Dj f (a + t (x − a)) dt .
!1 !1
Since 0 (1 − t)Di Dj f (a + t (x − a)) dt = 0 (1 − t)(Di Dj f (a + t (x − a)) − Di Dj f (a)) dt +
!1 !1
0 (1 − t) dtDi Dj f (a) and 0 (1 − t) dt = 2 , this can be rewritten
1

(2) f (x) = f (a) + (x − a) · ∇f (a) + 1


2 Hessf,a (x − a) + E(x) ,
7. SECOND DERIVATIVE TEST FOR EXTREMA OF MULTIVARIABLE FUNCTION 47

n
where E(x) = i,j =1 qij (x)(xi − ai )(xj − aj ) with
!1  
qij (x) = 0 (1 − t) Di Dj f (a + t (x − a) − Di Dj f (a) dt .

Now observe that


! 1   
(3) |qij (x)| =  0 (1 − t) Di Dj f (a + t (x − a) − Di Dj f (a) dt 
!1
≤ 0 (1 − t)|Di Dj f (a + t (x − a) − Di Dj f (a)| dt ,

and let ε > 0 and i, j ∈ {1, . . . , n}. Since Di Dj f (x) is continuous at x = a we then have a δij > 0
such that x ∈ Bδij (a) ⇒ |Di Dj f (x) − Di Dj f (a)| < ε/n2 . Let δ = min{δij : i, j ∈ {1, . . . , n}}
and observe that x ∈ Bδ (a) ⇒ a + t (x − a) ∈ Bδ (a) ∀ t ∈ [0, 1], so then we have

x ∈ Bδ (a) ⇒ |Di Dj f (a + t (x − a) − Di Dj f (a)| < ε/n2 for each i, j = 1, . . . , n

and each t ∈ [0, 1]. Hence, using (3),


!1
x ∈ Bδ (a) ⇒ |qij (x)| ≤ 0 (1 − t) dt ε/n2 = 21 ε/n2 < ε/n2 ,

and so

n
(4) x ∈ Bδ (a) ⇒ |E(x)| ≤ |xi −ai ||xj − aj ||qij (x)| ≤ x − a2 n2 ε/n2 = εx − a2 .
i,j =1

Notice that this is the ε, δ definition of limx →a x − a−2 E(x) = 0, so the first part of the lemma
is proved.
Now suppose that Hessf,a (ξ ) is positive definite. Then by 7.6 there is m > 0 such that the term
2 Hessf,a (x − a) in (2) satisfies
1

1
(5) Hessf,a (x − a) ≥ mx − a2 for all x ∈ Rn ,
2

and, on the other hand, by using (4) with ε = m/2 we have δ > 0 such that

(6) x ∈ Bδ (a) ⇒ |E(x)| ≤ mx − a2 /2 ⇒ E(x) ≥ −mx − a2 /2 ,

and so using (5) and (6) in (2) we see that if ∇f (a) = 0

x ∈ Bδ (a) ⇒ f (x) ≥ f (a) + mx − a2 /2 ,

so f has a strict local minimum at a.


Applying the same argument with −f in place of f we deduce that f has a strict local maximum
at a if ∇f (a) = 0 and if Hessf,a (ξ ) is negative definite.
48 CHAPTER 2. ANALYSIS IN Rn

SECTION 7 EXERCISES
7.1 (a) We proved above that if U ⊂ Rn is open, if f : U → R is C 2 , and if x 0 ∈ U is a critical
point (i.e., ∇f (x 0 ) = 0), then f has a local minimum/maximum at x 0 if the Hessian quadratic form

Q(ξ ) ≡ ni,j =1 Di Dj f (x 0 )ξi ξj is positive definite/negative definite, respectively. Prove that f has
neither a local maximum nor a local minimum at a critical point x 0 of f if Q(ξ ) is indefinite (i.e., if
there are unit vectors ξ, η such that Q(ξ ) > 0 and Q(η) < 0.
(b) Find all local max/local min for the function f : R2 → R given by f (x, y) = 13 (x 3 + y 3 ) −
x 2 − 2y 2 − 3x + 3y, and justify your answer.

7.2 Suppose that f (x, y) = 13 (x 3 + y 3 ) − x 2 − 2y 2 − 3x + 3y. Find the critical points (i.e., the
points where ∇f = 0) of f , and discuss whether f has a local maximum/minimum at these points.
( Justify any claims that you make by proof or by referring to the relevant theorem above.)
!1
7.3 Prove the identity 0 (1 − t)h (t) dt = h(1) − h(0) − h (0) used in the proof of Thm. 7.9,
assuming that h is a C 2 function on [0, 1].
!1 !1
Hint: Write 0 (1 − t)h (t) dt = 0 (1 − t) dt
d
h (t) dt, and use integration by parts.

8 CURVES IN Rn
By a C 0 curve in Rn we mean a continuous map γ : [a, b] → Rn , where a, b ∈ R with a < b. Of
course one might think geometrically of the curve as the image γ ([a, b]), but formally the curve is
in fact the map γ : [a, b] → Rn .
By a C 1 curve in Rn we shall mean a C 1 map γ : [a, b] → Rn . We should clarify this, because to
date we have only considered functions which are C 1 on open sets, whereas here we take the domain
to be the closed interval [a, b]. In fact, by saying that γ : [a, b] → Rn is C 1 we mean that: (i) the
derivative γ (t) = limh→0 h−1 (γ (t + h) − γ (t)) exists for each t ∈ (a, b); (ii) the one-sided deriva-
tives γ (a) = limh↓0 h−1 (γ (a + h) − γ (a)), γ (b) = limh↑0 h−1 (γ (a + h) − γ (a)) exist; and (iii)
γ (t), so defined, is a continuous function of t on the closed interval [a, b].
γ (t) is sometimes referred to as the velocity vector of γ , because if t, t + h ∈ [a, b], the quantity
γ (t + h) − γ (t) is the change of position of γ as t changes from t to t + h, so, as h → 0, h−1 (γ (t +
h) − γ (t)) can be thought of as the (vector) rate of change of the point γ (t). If γ (t) = 0 we define

8.1 τ (t) = γ (t)−1 γ (t) ,

and we refer to τ (t) as the unit tangent vector of the curve γ . From the schematic diagram (depicting
the case n = 3) it is intuitively evident that indeed γ (t) is tangent to the curve γ (t) if it is nonzero,
and hence that the terminology “unit tangent vector” is appropriate.
We can also discuss the length (γ ) of a C 0 curve γ as follows.
8. CURVES IN Rn 49

σ(t+h)
σ(t) σ(b)

σ(a)

σ(t+h)-σ(t)
0

Figure 2.2: Schematic picture of definition of velocity vector n = 3.

First we introduce the notion of partition of the interval [a, b], meaning a finite collection P =
{t0 , . . . , tN } of points of [a, b] with a = t0 < t1 < · · · < tN = b. For such a partition P we define


N
(γ , P ) = γ (tj ) − γ (tj −1 ) .
j =1

Geometrically, (γ , P ) is the length of the “polygonal approximation” to γ obtained by taking the
union of the straight line segments γ (tj −1 ) γ (tj ) joining the points γ (tj −1 ) and γ (tj ), as in the
following schematic diagram, which depicts the case n = 3.
We say the curve has finite length if the set

8.2 {(γ , P ) : P is a partition of [a, b]}

is bounded above. In case γ has finite length we can define the length (γ ) to be the least upper
bound of the set in 8.2:

8.3 (γ ) = sup{(γ , P ) : P is a partition of [a, b]} .

It is a straightforward exercise (Exercise 8.1 below) to check the additivity property that if c ∈ (a, b)
then

8.4 (γ ) = (γ |[a, c]) + (γ |[c, b]) .


50 CHAPTER 2. ANALYSIS IN Rn

Figure 2.3: Schematic picture of polygonal approximation of γ , n = 3.

The explicit formula for length of a C 1 curve given in the following theorem is important.
8.5 Theorem. If γ : [a, b] → Rn is C 1 , then γ has finite length and in fact
" b
(γ ) = γ (t) dt .
a

Proof: First observe by the fundamental theorem of calculus


" tj
γ (tj ) − γ (tj −1 ) = γ (t) dt .
tj −1
! tj ! tj
Using the fact that  tj −1 γ (t) dt ≤ tj −1 γ (t) dt (see Exercise 8.3 below), we then have
" tj
γ (tj ) − γ (tj −1 ) ≤ γ (t) dt
tj −1

for each j = 1, . . . , N, and hence by summing over j we have


N " b
γ (tj ) − γ (tj −1 ) ≤ γ (t) dt ,
j =1 a

so the collection of all sums



N

γ (tj ) − γ (tj −1 ) : t0 = a < t1 < · · · < tN = b is a partition of [a, b]
j =1
8. CURVES IN Rn 51

!b
has the integral a γ (t) dt as an upper bound so indeed γ has finite length as claimed and
" b
(1) (γ ) ≤ γ (t) dt .
a
For t ∈ [a, b] we define
S(t) = (γ |[a, t])
so that S(a) = 0 and S(b) = (γ ). Take τ ∈ [a, b) and h > 0 such that τ + h ∈ [a, b], and observe
that, by 8.4 with a, τ, τ + h in place of a, c, b, respectively, we have
S(τ + h) = S(τ ) + (γ |[τ, τ + h])
so
(2) (γ |[τ, τ + h]) = S(τ + h) − S(τ ) ,
but, on the other hand, by (1) and the definition of length we have
" τ +h
γ (τ + h) − γ (τ ) ≤ (γ |[τ, τ + h]) ≤ γ (t) dt ,
τ
i.e., by (2)
" τ +h " τ +h " τ
γ (τ +h) − γ (τ ) ≤ S(τ +h)−S(τ ) ≤ γ (t) dt = γ (t) dt − γ (t) dt ,
τ a a
so
" τ +h " τ
h−1 γ (τ + h) − γ (τ ) ≤ h−1 (S(τ + h) − S(τ )) ≤ h−1 ( γ (t) dt − γ (t) dt) .
a a

Notice the right side has limit as h ↓ 0 equal to dτ
d
a γ (t) dt, which by the fundamental theorem
of calculus is γ (τ ). So both the left and right side has limit as h ↓ 0 equal to γ (τ ), and hence

by the Sandwich Theorem (Exercise 2.1 of the present chapter) we deduce that for τ ∈ [a, b)
lim h−1 (S(τ + h) − S(τ )) = γ (τ ) .
h↓0

By a similar argument with h < 0 we also prove for τ ∈ (a, b]


lim h−1 (S(τ + h) − S(τ )) = γ (τ ) .
h↑0

Thus, we have shown s (t) exists and equals γ (t) for every t ∈ [a, b],1 so by the fundamental
theorem of calculus and the fact that S(a) = 0 we have
" b
S(b) = γ (t) dt .
a
1 Of course we proved only the existence of the relevant 1-sided derivatives at the end-points a, b.
52 CHAPTER 2. ANALYSIS IN Rn

Since, by definition S(b) = (γ ), this completes the proof.

8.6 Remark: If γ : [a, b] → Rn is C 1 , then, as shown in the above proof, the function S(t) =
(γ |[a, t]) is a C 1 function on [a, b], and S (t) = γ (t) for each t ∈ [a, b].Thus, if γ (t)  = 0 then
S has positive derivative, hence is a strictly increasing function mapping [a, b] onto [0, (γ )], and
by 1-variable calculus the inverse function T : [0, (γ )] → [a, b] is C 1 and the derivative T (s) =
(S (t))−1 , assuming the variables s, t are related by s = S(t) (or, equivalently, t = T (s)). Notice that
then if we define  γ = γ ◦ T , so γ (t) = γ (s), assuming again that s, t are related by s = S(t) or,
equivalently, t = T (s). The map  γ is called the arc-length parametrization of γ (because s = S(t) =
(γ |[a, t])). Notice that by the 1-variable calculus chain rule we have  γ (s) = γ (t)−1 γ (t) and,
in particular,  γ (s) ≡ 1. The reader should keep in mind that all of this is dependent on the
assumption that γ is C 1 with γ (t) = 0.

SECTION 8 EXERCISES
8.1 Let γ : [a, b] → Rn be an arbitrary continuous curve of finite length. Prove directly from the
definition of length (as the least upper bound of lengths of polygonal approximations, as in 8.3) that
if c ∈ (a, b) then
(γ |[a, c]) + (γ |[c, b]) = (γ ) .

Hint: Start by showing that if P , Q are partitions for [a, c] and [c, b], respectively, then
(γ |[a, c], P ) + (γ |[c, b], Q) ≤ (γ ).

8.2 Let γ : [0, 1] → R2 be the C 0 curve defined by γ (t) = (t, t sin 1t ) if t ∈ (0, 1] and γ (0) = (0, 0).
Prove that γ has infinite length.
Hint: One approach is to directly check that for each real K > 0 there is a partition P : t0 = 0 <

t1 < t2 < · · · < tN = 1 such that Nj =1 γ (tj ) − γ (tj −1 ) > K.

!b
8.3 Let f = (f1 , . . . , fn ) : [a, b] → Rn be continuous and as usual take a f (t) dt to mean
! b !b  !b !b
f1 (t) dt, . . . , a fn (t) dt (∈ Rn ). Prove  a f (t) dt ≤ a f (t) dt. Hint: First observe v ·
!ba !b
a f (t) dt = a v · f (t) dt for any fixed vector v ∈ R .
n

8.4 Consider the “helix” in R3 given by the curve γ (t) = (a cos ωt, a sin ωt, bωt) for t ∈ R, where
a, b, ω are given positive constants.
(a) Show that the speed (i.e., γ (t)) is constant for t ∈ R.
(b) Show that the velocity vector γ (t) makes a constant nonzero angle with the z-axis.
(c) With t1 = 0 and t2 = 2π/ω, show that there is no τ ∈ (t1 , t2 ) such that γ (t1 ) − γ (t2 ) = (t1 −
t2 )γ (τ ).
(Thus the Mean Value Theorem fails for vector-valued functions.)
9. SUBMANIFOLDS OF Rn AND TANGENTIAL GRADIENTS 53

8.5 For the Helix γ (t) = (a cos ωt, a sin ωt, bωt), where t ∈ [0, 2π/ω] and a, b, ω > 0, prove that
the arc-length variable s = S(t)(= (γ |[0, t])) (as in Rem. 8.6 above) is a constant multiple of t.
Also, find the length, unit tangent vector, and curvature vector γ (s) for γ .
8.6 Suppose γ : [a, b] → Rn is a C 2 curve with nonzero velocity vector γ . If  γ (s) = γ (t), where
s is the arc-length parameter (i.e., s = S(t) = (γ |[a, t]) as in Rem. 8.6 above), prove that the
d
curvature vector κ = ds 
γ (s) at s is given in terms of the t-variable by the formula
# $
κ(s) = γ (t)−2 I − γ (t)−2 γ (t)γ (t)T γ (t) .

Here I is the n × n identity matrix and we assume γ (t) is written as a column; γ (t)γ (t)T is matrix
multiplication of the n × 1 matrix γ (t) by the 1 × n matrix γ (t)T .

9 SUBMANIFOLDS OF Rn AND TANGENTIAL


GRADIENTS
In this section it will be convenient to write vectors in Rn as rows, x = (x1 , . . . , xn ), and the reader
should keep in mind that then the gradient ∇f (x) of a C 1 function f : U → R will also be a row
vector ∇f (x) = (D1 f (x), . . . , Dn f (x)).
We first introduce the notion of k-dimensional graph in Rn , with k ∈ {1, . . . , n − 1}.
9.1 Definition: If g = (g1 , . . . , gn−k ) : U → Rn−k , where U is open in Rk , the graph map corre-
sponding to g is the map G : U → Rn defined by
G(x) = (x, g(x)), x∈U ,
and the graph of g is defined to be G(U ), i.e., {G(x) : x ∈ U }. Notice that G is C 1 if g is C 1 and
in this case we can view the graph of g (i.e., G(U )) as a k-dimensional C 1 surface sitting in Rn , as
represented in the following schematic diagram, which is actually representative of the special case
k = 2, n = 3.
9.2 Definition: A subset M ⊂ Rn is a k-dimensional C 1 submanifold of Rn if for each a ∈ M
there is a δ > 0 such that M ∩ Bδ (a) is Pj1 ,...,jn (G(U )) where G(x) (as in 9.1) is the graph map
(x, g(x)) of a C 1 function g : U → Rn−k with U open in Rk , and Pj1 ,...,jn is the permutation map
Pj1 ,...,jn (x1 , . . . , xn ) = (xj1 , . . . , xjn ), with j1 , . . . , jn a permutation (i.e., a re-ordering) of 1, . . . , n.
This definition sounds at first a little abstract, but it is really just saying that if a ∈ M then, in a
suitably small ball Bδ (a), we can re-label the coordinates (using xji as the new i-th coordinate for
i = 1, . . . , n) in such a way that M ∩ Bδ (a) becomes the graph of a C 1 function as in 9.1.
Figure 2.5 gives a schematic picture of what is going on in the case k = 1, n = 2.
54 CHAPTER 2. ANALYSIS IN Rn

Figure 2.4: Schematic picture of a C 1 graph in case k = 2, n = 3.

Next we define the notion of tangent space Ta M of the C 1 manifold M at a ∈ M, as follows.


9.3 Definition: If M is a k-dimensional C 1 submanifold of Rn and a ∈ M then the tangent space
of M at a, Ta M, is defined by

Ta M = {γ (0) : γ is a C 1 map [0, α) → Rn with α > 0, γ ([0, α)) ⊂ M, γ (0) = a} .

Thus, in fact Ta M is the set of all initial velocity vectors of C 1 curves γ (t) which are contained in
M and which start from the given point a at t = 0. We claim that Ta M is a k-dimensional subspace
of Rn .

9.4 Lemma. If M is a k-dimensional C 1 submanifold of Rn and if a ∈ M, then the tangent space Ta M


is a k-dimensional subspace of Rn .

9.5 Remark A schematic picture representing the situation described in the above lemma, in the
case k = 2 and n = 3, is given in Figure 2.6.
Proof of 9.4: Since it is merely a matter of relabeling the coordinates, we can suppose without loss of
generality that the permutation j1 , . . . , jn needed in the Def. 9.2 is in fact the identity permutation
(i.e., ji = i for each i = 1, . . . , n), and hence there is δ > 0 such that

M ∩ Bδ (a) = graph g ,
9. SUBMANIFOLDS OF Rn AND TANGENTIAL GRADIENTS 55

Figure 2.5: Schematic picture of a C 1 manifold M in case k = 1, n = 2.

where g : U → Rn−k is C 1 and U is open in Rk . If v ∈ Ta M then by definition of Ta M there


is a C 1 curve γ : [0, α) → Rn (α > 0) with γ ([0, α)) ⊂ M, γ (0) = a and γ (0) = v. Since γ is
continuous we have ρ > 0 such that 0 ≤ t < ρ ⇒ γ (t) ∈ M ∩ Bδ (a), and so 0 ≤ t < ρ ⇒
(1) γ (t) = G(σ (t)) ,
where G is the graph map as in 9.1 and σ (t) = (γ1 (t), . . . , γk (t)) (i.e., σ (t) is the vector in Rk
obtained by taking first k components of γ (t)), so that σ (t) : [0, ρ) → Rk is C 1 with σ (0) = a 0 ,
where a 0 = (a1 , . . . , ak ) (i.e., a 0 is the vector in Rk obtained by taking the first k components of
a). By applying the chain rule to the composite on the right of (1) we then have

n
(2) v = γ (0) = yj Dj G(a 0 ) ,
j =1

where (y1 , . . . , yn ) = σ (0) ∈ Rk , thus showing, since v ∈ Ta M was arbitrary, that


(3) Ta M ⊂ span{D1 G(a 0 ), . . . , Dk G(a 0 )} .
We can also prove the reverse inclusion
(4) span{D1 G(a 0 ), . . . , Dk G(a 0 )} ⊂ Ta M ,
56 CHAPTER 2. ANALYSIS IN Rn

Figure 2.6: The tangent space Ta M of M.


Notice that the tangent space in general does not pass through the point a; whereas the affine space
a + Ta M = {a + v : v ∈ Ta M} (i.e., the translate of the subspace Ta M by the vector a) does pass through
a.

because if y ∈ Rk we can take ρ > 0 such that Bρ (a 0 ) ⊂ U (because U is open), and hence for any
α > 0 with αy < ρ we have γ (t) = G(a 0 + ty) is C 1 for |t| < α, γ ((−α, α)) ⊂ M, γ (0) = a
and again (2) holds by the chain rule. Since y1 , . . . , yk were arbitrary, this does prove the reverse
inclusion (4). Thus, in fact (by (3),(4)) we have proved

(5) Ta M = span{D1 G(a 0 ), . . . , Dk G(a 0 )} .

Also, D1 G(a 0 ), . . . , Dk G(a 0 ) are l.i.; indeed,

G(x) = (x, g(x)) ⇒ Dj G(x) = (ej , Dj g(x)), j = 1, . . . , k ,

which are clearly l.i. vectors. This completes the proof that Ta M is a k-dimensional subspace of Rn .

9.5 Remark: Part of the above proof (in fact the proof of inclusion (4)) shows that for any a ∈ M and
for any y ∈ Rk we can find γ : (−α, α) → Rn (i.e., γ is C 1 on the whole interval (−α, α) rather than

just on [0, α)) with α > 0, γ ((−α, α)) ⊂ M, γ (0) = a and γ (0) = kj =1 yj Dj G(a 0 ). Notice also
9. SUBMANIFOLDS OF Rn AND TANGENTIAL GRADIENTS 57

k
that for any v ∈ Ta M we can choose these y1 , . . . , yk such that the j =1 yj Dj G(a 0 ) = v (by (5)
in the above proof ), i.e., γ can be selected so that γ (0) = v.

Given a C 1 map f : U → R, where U is an open subset of Rn with U ⊃ M, we define the tangential


gradient relative to M of f , denoted ∇M f , (also referred to as the “gradient of f |M”) by
9.6 ∇M f (a) = P Ta M (∇Rn f (a)), a∈M,
where PTa M : Rn → Rn is the orthogonal projection onto the tangent space Ta M (as in 8.3 of Ch. 1
with V = Ta M). Recall that by 8.3 of Ch. 1 in fact the definition 9.6 simply says that ∇M f (a) is
the Ta M-component v in the decomposition ∇Rn f (a) = v + u with v ∈ Ta M and u ∈ (Ta M)⊥ ,
hence the name “tangential gradient.”
Of course since P Ta M is a linear map we have that ∇M f is also linear in f in the sense that if
fj : Uj → R is C 1 with Uj open and Uj ⊃ M for j = 1, 2, and if λ, μ ∈ R, then
9.7 ∇M (λf1 + μf2 )(a) = λ∇M f1 (a) + μ∇M f2 (a), a∈M.

9.8 Lemma. Suppose M is a k-dimensional C 1 submanifold of Rn and f : U → R is C 1 , where U is


an open set containing M. Then at each point a ∈ M we have
(∗) v ∈ Ta M ⇒ Dv f (a) = v · ∇Rn f (a) = v · ∇M f (a) .
 → R is another C 1 function with U
Furthermore, if f : U  ⊃ M and if f|M = f |M, then ∇M f (a) =

∇M f (a) ∀ a ∈ M.
9.9 Remarks: (1) Notice, in particular, that (∗) implies that all the directional derivatives of f by
vectors v ∈ Ta M are determined by the tangential gradient ∇M f (a).
(2) In the last part of the above lemma we use the notation f |M to mean the “restriction of f to
M,” meaning the function which is defined on M only (and not on the larger set U ) and which
agrees with f at all points of M; the last part of the lemma in fact shows that the gradient of f on
M, ∇M f , depends only on f |M and is unchanged if we take another f , provided the new f is still
C 1 and agrees with the original f at all points of M.

Proof of 9.8: For v ∈ Ta M Def. 9.3 ensures that there is a C 1 map γ : [0, α) → Rn with α > 0,
γ ([0, α)) ⊂ M, γ (0) = a, and γ (0) = v. By the chain rule, dt d
f (γ (t)) = γ (t) · ∇Rn f (γ (t)) so,
d
in particular, dt f (γ (t))|t=0 = v · ∇Rn f (a) = Dv f (a). On the other hand, since v ∈ Ta M we have
v = P Ta M (v) and then by 8.3 (with V = Ta M) we have v · ∇Rn f (a) = P Ta M (v) · ∇Rn f (a) =
v · P Ta M (∇Rn f (a)) = v · ∇M f (a) by the definition 9.6.
To prove that last part of the lemma, observe that under the stated hypotheses f− f is C 1 on the
open set U  ∩ U ⊃ M, and so we can apply the above argument with f− f in place of f . Since

f − f = 0 at each point of M, this gives v · ∇M (f− f )(a) = 0 ∀ a ∈ M and all v ∈ Ta M and so
taking v = ∇M (f− f )(a) we conclude ∇M (f− f )(a) = 0. Then by the linearity 9.7 this can be
written ∇M f− ∇M f (a) = 0, so the proof of 9.8 is complete.
58 CHAPTER 2. ANALYSIS IN Rn

In view of the above theorem the following definition is natural.

9.10 Definition: Suppose M is a k-dimensional C 1 submanifold of Rn and f : U → R is C 1 , where


U is an open set containing M. Then a point a ∈ M is called a critical point of f |M (the restriction
of f to M as in 9.9(2)) if ∇M f (a) = 0.
Using the terminology of the above definition we then have the following.

9.11 Lemma. If f |M has a local maximum or minimum at a ∈ M (i.e., if there is δ > 0 such that either
f (x) ≤ f (a) ∀ x ∈ M ∩ Bδ (a) or f (x) ≥ f (a) ∀ x ∈ M ∩ Bδ (a)) then a is a critical point of f |M
(i.e., ∇M f (a) = 0).
9.12 Caution: Of course it need not be true under the hypotheses of the above lemma that the full
Rn gradient ∇Rn f (a) = 0, because we are only assuming f |M has a local maximum or minimum
at a, and this gives no information about the directional derivatives of f at a by vectors v which are
not in Ta M.

Proof of Lemma 9.11: Assume f |M has a local maximum at a, and take δ > 0 such that f (x) ≤
f (a) for all x ∈ M ∩ Bδ (a). Let v ∈ Ta M. By Rem. 9.5 we can find a C 1 map γ : (−α, α) → Rn
with γ ((−α, α)) ⊂ M, γ (0) = 0, and γ (0) = v. γ is C 1 , hence continuous, and so there is an
η > 0 such that |t| < η ⇒ γ (t) − γ (0) < δ, so, since γ (0) = a, we have |t| < η ⇒ γ (t) ∈ M ∩
Bδ (a), and hence |t| < η ⇒ f (γ (t)) ≤ f (a), so f (γ (t)) has a local maximum at t = 0. Hence,
by 1-variable calculus the derivative at t = 0 is zero. That is, by the Chain Rule and 9.8, 0 =

dt f (γ (t))|t=0 = γ (0) · ∇R f (a) = v · ∇R f (a) = v · ∇M f (a), so v · ∇M f (a) = 0 for each v ∈
d n n

Ta M. In particular, with v = ∇M f (a) we infer ∇M f (a)2 = 0, hence ∇M f (a) = 0.


If f |M has a local minimum at a the same argument can be applied to −f , hence the proof of 9.11
is complete.

One of the things we shall prove later (in Ch. 4) is that if k ∈ {1, . . . , n − 1} then intersection of
the level sets of the n − k C 1 functions g1 , . . . , gn−k is a k-dimensional C 1 submanifold, at least
near points of the intersection where the gradients ∇Rn g1 , . . . , ∇Rn g n−k are l.i. More precisely:

9.13 Theorem. If U ⊂ Rn is open, if g1 , . . . , gn−k : U → R are C 1 , if S = {x ∈ U : gj (x) =


0 for each j = 1, . . . , n − k} and if M  = ∅, where
M = {y ∈ S : ∇Rn g1 (y), . . . , ∇Rn gn−k (y) are l.i.} ,
then M is a k-dimensional C 1 submanifold.
We shall give the proof of this later, in Sec. 3 of Ch. 4, as a corollary of the Implicit Function
Theorem.

9.14 Lagrange Multiplier Theorem. If U ⊂ Rn is open, if f, g1 , . . . , gn−k : U → R are


C 1 , if S = {x ∈ U : gj (x) = 0 for each j = 1, . . . , n − k}, if a ∈ M ≡ {y ∈ S : ∇g1 (y), . . . ,
9. SUBMANIFOLDS OF Rn AND TANGENTIAL GRADIENTS 59

∇gn−k (y) are l.i.} (so that M is a k-dimensional C 1 manifold by 9.13), and if a is a critical point

of f |M, then ∃ λ1 , . . . , λn−k ∈ R such that ∇Rn f (a) = n−k
j =1 λj ∇Rn gj (a).

9.15 Terminology: The constants λ1 , . . . , λn−k are called Lagrange Multipliers for f relative to the
constraints g1 , . . . , gn−k .
9.16 Remark: In view of 9.11 the hypothesis that a is a critical point of f |M is automatically true
(and hence the conclusion of 9.14 applies) if f |M has a local maximum or local minimum at a (or,
in particular, if f |S has a local maximum or local minimum at a ∈ M).
Proof of 9.14: By definition of critical point of f |M we have
PTa M (∇Rn f (a)) = ∇M f (a) = 0 ,
and hence ∇Rn f (a) ∈ (Ta M)⊥ . By 9.4, Ta M has dimension k, and hence, by 8.1(iii) of Ch. 1, we
have dim(Ta M)⊥ = n − k.
Now gj ≡ 0 on M, hence by Lem. 9.8 above we have ∇M gj ≡ 0 on M and hence we also have
∇Rn gj (a) ∈ (Ta M)⊥ . Furthermore, ∇Rn g1 (a), . . . , ∇Rn gn−k (a) are l.i., and hence, by Lem. 5.7(b)
of Ch. 1, they must be a basis for (Ta M)⊥ , so there are unique λ1 , . . . , λn−k ∈ R such that ∇Rn f (a) =
n−k
j =1 λj ∇Rn gj (a).

SECTION 9 EXERCISES
9.1 (a) Find the maximum value of the product x1 x2 · · · xn subject to the constraints xj > 0 for each

j = 1, . . . , n and nj=1 xj = 1, and (rigorously) justify your answer.
a1 +a2 +···+an
(b) Prove that if a1 , . . . , an are positive numbers, then (a1 a2 · · · an )1/n ≤ n . (“The in-
equality between the arithmetic and geometric mean.”)
9.2 Find:
(a) the minimum value of x 2 + y 2 on the line x + y = 2;
(b) the maximum value of x + y on the circle x 2 + y 2 = 2.
9.3 Suppose f (x, y, z) = 3xy + z3 − 3z for (x, y, z) ∈ R3 .
(i) Show that f has no local max or min in R3 .
(ii) Prove that f attains its max and min values on the sphere x 2 + y 2 + z2 − 2z = 0.
(iii) Find all the points on the sphere x 2 + y 2 + z2 − 2z = 0 where f attains its max and min values.
9.4 Let n ≥ 2 and S n−1 = {x ∈ Rn : x = 1}. Prove that S n−1 is an (n − 1)-dimensional C 1
submanifold.
# %  &$
Hint: Notice that we can write S n−1 = ∪nj=1 x ∈ Rn : i=j xi2 < 1 and xj = 1 − i=j xi2
# %  &$
∪ ∪nj=1 x ∈ Rn : i=j xi2 < 1 and xj = − 1 − i=j xi2 .
60
61

CHAPTER 3

More Linear Algebra


1 PERMUTATIONS
A permutation on n elements is an ordered n-tuple (i1 , . . . , in ) which is a re-ordering of the first n
positive integers 1, . . . , n; that is the set P n of all such permutations is
 
P n = (i1 , . . . , in ) : {i1 , . . . , in } = {1, . . . , n} .

Notice that there are exactly n! such permutations, because there are exactly n ways to choose the
first entry i1 , and for each of these n choices there are exactly n − 1 ways of choosing the entry i2 ,
so the number of ways of choosing the first two entries i1 , i2 is n(n − 1), and similarly (if n ≥ 3), a
total of n(n − 1)(n − 2) ways of choosing the first 3 entries i1 , i2 , i3 , and so on.
Let 1 ≤ k <  ≤ n, and define τk, : Rn → Rn by
k -th -th k -th -th
↓ ↓ ↓ ↓
τk, (x1 , . . . , xk , . . . , x , . . . , xn ) = (x1 , . . . , x , . . . , xk , . . . , xn )

(i.e., τk, interchanges the elements at position k, ). Such a transformation is called a transposition;
notice that τk, is its own inverse. Thus, τk, ◦ τk, = identity transformation on Rn :

τk, ◦ τk, (x1 , . . . , xn ) ≡ (x1 , . . . , xn ) ∀ (x1 , . . . , xn ) ∈ Rn .

We claim that any permutation can be achieved by applying a finite sequence of such transposi-
tions. Indeed, if (i1 , . . . , in ) is any permutation then we can write this permutation in terms of a
permutation of the first n − 1 (instead of n) integers as follows:



perm. of 1, . . . , n − 1,
' () *



⎨ ( i1 ,. . . , in−1 , n) if in = n
(i1 , . . . , in ) = τ ( i , . . . ,i , . . . , i , n) if i < n (i.e., i = n for some k < n) ,

⎪ k,n 1 n n−1 n k

⎪ k th slot

⎩ ) *' (
perm. of 1, . . . , n − 1

and so by induction on n it follows that each permutation can indeed be achieved by a sequence
of transpositions; that is, for any permutation (i1 , . . . , in ) we can find a sequence τ (1) , . . . , τ (q) of
transpositions such that

1.1 (i1 , . . . , in ) = τ (1) ◦ τ (2) ◦ · · · ◦ τ (q) (1, . . . , n) .


62 CHAPTER 3. MORE LINEAR ALGEBRA

Correspondingly, we can permute the n coordinates x1 , . . . , xn of a point in Rn using the same


sequence of transpositions:

1.2 (xi1 , . . . , xin ) = τ (1) ◦ τ (2) ◦ · · · ◦ τ (q) (x1 , x2 , . . . , xn ) .

1.3 Note: We of course do not claim the representation 1.1 is unique, and it is not—indeed there
are many different ways of writing a given permutation in terms of a sequence of transpositions.
For example, if n = 3 then the permutation (3, 2, 1) can be written as τ1,3 (1, 2, 3) but also as
τ2,3 ◦ τ1,2 ◦ τ2,3 (1, 2, 3).
An important quantity related to a permutation (i1 , . . . , in ) is the integer N(i1 , . . . , in ), which is
defined as the number of inversions present in the sequence i1 , . . . , in ; that is,

N (i1 , . . . , in ) = the number of pairs k,  ∈ {1, . . . , n} with  > k and i < ik .


n−1
One usually computes this as j =1 Nj (i1 , . . . , in ), where

N1 (i1 , . . . , in ) = number of integers  > 1 such that i < i1


N2 (i1 , . . . , in ) = the number of integers  > 2 such that i < i2
.. ..
. .
Nn−1 (i1 , . . . , in ) = the number of integers  > n − 1 such that i < in−1

(so that Nn−1 (i1 , . . . , in ) = 1 if in < in−1 and = 0 if in > in−1 ).


The quantity N (i1 , . . . , in ) allows us to define the “parity” (i.e., “evenness” or “oddness”) of a per-
mutation.
1.4 Definition: The permutation (i1 , . . . , in ) is said to be even or odd according as the integer
N (i1 , . . . , in ) is even or odd.
Notice that then
+1 if (i1 , . . . , in ) is even
(−1)N (i1 ,...,in ) =
−1 if (i1 , . . . , in ) is odd .

The quantity (−1)N(i1 ,...,in ) is called the index of the permutation (i1 , . . . , in ).

1.5 Lemma. If 1 ≤ k <  ≤ n and if (i1 , . . . , in ) is a permutation of (1, . . . , n), then


(−1)N (i1 ,...,i ,...,ik ,...,in ) = −(−1)N (i1 ,...,ik ,...,i ,...,in ) .

1.6 Remarks: (1) Notice that (i1 , . . . , i , . . . , ik , . . . , in ) can be written in terms of the transposition
τk, as τk, (i1 , . . . , ik , . . . , i , . . . , in ), so the above lemma says that the parity of the permutation
changes each time we apply a transposition; notice, in particular, this means that the index (−1)N(i1 ,...,in )
can be written alternatively as
(−1)N (i1 ,...,in ) = (−1)q ,
1. PERMUTATIONS 63

where q is the number of transpositions appearing in 1.1 above.


(2) An additional consequence of the lemma is that for each n ≥ 2 there are n! 2 even permutations
and n!
2 odd permutations, because according to the lemma the transposition τ1,2 maps the even
permutations to the odd permutations, and this map is clearly 1:1 and onto (it is onto because any
permutation (i1 , i2 , i3 , . . . , in ) can be written τ1,2 (i2 , i1 , i3 , . . . , in )).
Proof of Lemma 1.5: If  = k + 1, then ik , i are adjacent elements in the permutation (i1 , . . . , in ).
Observe that, if we use the notation that #A = the number of elements in the set A (zero if A
is empty), then since the pair in the k-th and (k + 1)’st slots is ik+1 , ik (which is an inversion if
ik+1 > ik but not if ik < ik+1 ) we have
k -th (k + 1)’st
↓ ↓

ik+1 > ik ⇒ Nk (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) = 1 + #{j : j ≥ k + 2 and ij < ik+1 }


and Nk+1 (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) = #{j : j ≥ k + 2 and ij < ik } ,

so that

Nk (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) + Nk+1 (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) =


1 + #{j : j ≥ k + 2 and either ij < ik+1 or ij < ik } ,

which, except for the “1+” on the right side, is exactly the same as

Nk (i1 , . . . , ik−1 , ik , ik+1 , . . . , in ) + Nk+1 (i1 , . . . , ik−1 , ik , ik+1 , . . . , in ) .

That is, we have shown that

ik+1 > ik ⇒Nk (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) + Nk+1 (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) =
Nk (i1 , . . . , ik−1 , ik , ik+1 , . . . , in ) + Nk+1 (i1 , . . . , ik−1 , ik , ik+1 , . . . , in ) + 1 .

By a similar argument (indeed it’s the same identity applied to the permutation (j1 , . . . , jn ), where
j = i for  = k, k + 1 and jk = ik+1 , jk+1 = ik ) we have

ik+1 < ik ⇒Nk (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) + Nk+1 (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) =
Nk (i1 , . . . , ik−1 , ik , ik+1 , . . . , in ) + Nk+1 (i1 , . . . , ik−1 , ik , ik+1 , . . . , in ) − 1 .

Since trivially Np (i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) = Np (i1 , . . . , ik−1 , ik , ik+1 , . . . , in ) if either p < k
or p > k + 1, then we have actually shown that

N(i1 , . . . , ik−1 , ik+1 , ik , . . . , in ) = N(i1 , . . . , ik−1 , ik , ik+1 , . . . , in ) ± 1

and so (−1)N (i1 ,...,ik−1 ,ik+1 ,ik ,...,in ) = −(−1)N (i1 ,...,ik−1 ,ik ,ik+1 ,...,in ) and the lemma is proved in the
case when  = k + 1.
In the case when  ≥ k + 2 we can get the permutation (i1 , . . . , i , . . . , ik , . . . , in ) from the original
permutation (i1 , . . . , ik , . . . , i , . . . , in ) by first making a sequence of  − k transpositions of adja-
cent elements to move ik to the right of i , and then making a sequence of  − k − 1 transpositions
64 CHAPTER 3. MORE LINEAR ALGEBRA

of adjacent elements to move i back to the k th position. In all that is 2( − k) − 1 transpositions
of adjacent elements, each changing the index by a factor of −1, so the total change in the index is
(−1)2(−k)−1 = −1, and the lemma is proved.

If (i1 , . . . , in ) is any permutation of 1, . . . , n, recall that then we can pick transpositions τ (1) , . . . , τ (q)
such that (i1 , . . . , in ) = τ (1) ◦ τ (2) ◦ · · · ◦ τ (q) (1, . . . , n). Then (since τ (i) ◦ τ (i) = identity for each
i) the same sequence of transpositions in reverse order “undoes” this permutation: (1, . . . , n) =
τ (q) ◦ · · · ◦ τ (1) (i1 , . . . , in ). The permutation (j1 , . . . , jn ) = τ (q) ◦ · · · ◦ τ (1) (1, . . . , n) is called the
“inverse” of (i1 , . . . , in ). Observe that jk is uniquely determined by the condition ijk = k for each k =
1, . . . , n, as one sees by taking xk = ik in the identity (xj1 , . . . , xjn ) = τ (q) ◦ · · · ◦ τ (1) (x1 , . . . , xn ).
Observe also that (−1)N(j1 ,...,jn ) = (−1)q = (−1)N (i1 ,...,in ) ; that is, the permutation (i1 , . . . , in ) and
its inverse (j1 , . . . , jn ) have the same index.

SECTION 1 EXERCISES
1.1 For each of the following permutations, find the parity (i.e., evenness or oddness) by two methods:
(i) by directly calculating the number N of inversions; and (ii) by representing the permutation by a
sequence of transpositions applied to the trivial permutation (1, 2, . . . , n):
(a) (4, 3, 1, 5, 7, 2, 6), (b) (2, 3, 1, 8, 5, 4, 7, 6).

2 DETERMINANTS
We here take the general “multilinear algebra” approach, which begins with the following abstract
theorem (which in the course of the proof will become quite concrete).
()
' n-factors *
2.1 Theorem. There is one and only one function D : Rn × · · · × Rn → R having the following prop-
erties:
(i) D is linear in the j -th factor for each j = 1, . . . , n; For j = 1 this means
D(c1 β 1 + c2 γ 1 , α 2 , . . . , α n ) = c1 D(β 1 , α 2 , . . . , α n ) + c2 D(γ 1 , α 2 , . . . , α n ) ,
for any c1 , c2 ∈ R and any β 1 , γ 1 , α 2 , . . . , α n ∈ Rn , and we require a similar identity for the
other factors;

(ii) D changes sign when we interchange any pair of factors: that is, for any k < ,
D(α 1 , . . . , α k , . . . , α  , . . . , α n ) = −D(α 1 , . . . , α  , . . . , α k , . . . , α n ) ;

(iii) D(e1 , . . . , en ) = 1.
2.2 Remark: Observe that property (ii) says, in particular, that D(α 1 , . . . , α n ) = 0 in case any two
of the vectors α i , α j (i  = j ) are equal, because in that case by interchanging α i and α j and using
property (ii) we conclude that D(α 1 , . . . , α n ) must be the negative of itself, i.e., it must be zero.
2. DETERMINANTS 65

Proof: The proof is fairly straightforward now that we have the basic properties of permutations at
our disposal: We start by assuming such a D exists, and seeing what conclusions that enables us to
draw. In fact, if such D exists, then we can write each vector α j in the expression D(α 1 , . . . , α n ) as
a linear combination of the standard basis vectors and use the linearity property (i). Thus, we write

α 1 = ni1 =1 ai1 1 ei1 , α 2 = ni2 =1 ai2 2 ei2 , . . . , α n = nin =1 ain n ein , and use the linearity property (i)
on each of the sums. This gives the expression
n
D(α 1 , . . . , α n ) = ai1 1 ai2 2 · · · ain n D(ei1 , ei2 , . . . , ein ) .
i1 ,i2 ,...,in =1

In view of Rem. 2.2 above we can drop all terms where i = ik for some k = ; that is we are
summing only over distinct i1 , . . . , in , i.e., over i1 , . . . , in such that (i1 , . . . , in ) is a permutation of
the integers 1, . . . , n, so only n! terms (from the total of nn terms) are possibly nonzero, and we can
write 
D(α 1 , . . . , α n ) = ai1 1 ai2 2 · · · ain n D(ei1 , ei2 , . . . , ein ) .
distinct i1 ,...,in ≤ n
Now interchanging any two of the indices ik , i (with k = ) just changes the sign of the ex-
pression D(ei1 , ei2 , . . . , ein ) (by virtue of property (ii)). So pick a sequence of transpositions
τ (1) , . . . , τ (q) such that (i1 , . . . , in ) = τ (1) ◦ τ (2) ◦ · · · ◦ τ (q) (1, . . . , n); then D(ei1 , . . . , ein ) =
(−1)q D(e1 , . . . , en ), and from our discussion of permutations we know that (−1)q is just the index
(−1)N (i1 ,...,in ) of the permutation (i1 , . . . , in ), and also D(e1 , . . . , en ) = 1 by property (iii), so we
have proved (assuming a D satisfying (i), (ii), (iii) exists) that

(1) D(α 1 , . . . , α n ) = (−1)N (i1 ,...,in ) ai1 1 ai2 2 · · · ain n .
perms. (i1 ,...,in ) of {1,...,n}

Thus, if such a map D exists, then there can only be one possible expression for it, Viz. the expression
on the right of (1). Thus, to complete the proof we just have to show that if we define D(α 1 , . . . , α n )
to be the expression on the right side of (1), then D has the required properties (i), (ii), (iii).
Now properties (i), (iii) are obviously satisfied, so all we really need to do is check property (ii).
To check property (ii) observe first that ik , i are dummy variables so we can relabel them p, q, in
which case
k -th  -th
↓ ↓

D(α 1 , .., α  , .., α k , .., α n ) k -th  -th


↓ ↓ k -th  -th
↓ ↓

= (−1) N (i1 ,...,p,...,q,...,in )
ai1 1 ai2 2 · · · ap · · · aqk · · · ain n
distinct i1 ,...p,...,q,...,in

= (−1)N (i1 ,...,p,...,q,...,in ) ai1 1 ai2 2 · · · aqk · · · ap · · · ain n
distinct i1 ,...p,...,q,...,in

=− (−1)N (i1 ,...,q,...,p,...,in ) ai1 1 ai2 2 · · · aqk · · · ap · · · ain n ,
distinct i1 ,...q,...,p,...,in
66 CHAPTER 3. MORE LINEAR ALGEBRA

where we used the fact that (−1)N (i1 ,...,p,...,q,...,in ) = −(−1)N(i1 ,...,q,...p,...,in ) (by the main lemma,
Lem. 1.5, in the previous section), and now we simply relabel q = ik and p = i to see that the final
expression here is indeed, as claimed, the quantity −D(α 1 , . . . , α k , . . . , α  , . . . , α n ).
Thus, the proof of the theorem is complete.
We can now define the determinant det A of an n × n matrix A = (aij ).
2.3 Definition: det A = D(α 1 , . . . , α n ), where α 1 , . . . , α n are the columns of A.
We can now check the various properties of determinant: The first 3 are just restatements of prop-
erties (i), (ii), (iii) of the theorem, and hence require no proof.
P1. det A is a linear function of each of the columns of A (assuming the other columns are held
fixed).
P2. If we interchange 2 (different) columns of A, then the value of the determinant is changed by a
factor −1 (and hence det A = 0 if A has two equal columns).
P3. If A = I (the identity matrix), then det A = 1.
P4. If A, B are n × n matrices, then det AB = det A det B.
Proof of P4: Let A = (aij ), B = (bij ). Recall that by definition of matrix multiplication we have

that the j -th column of AB is nk=1 bkj α k , where α k is the k-th column of A. Thus, using the
linearity and alternating properties (i), (ii) for D we have


n 
n
det AB = D( bk1 1 α k1 , . . . , bkn n α kn )
k1 =1 kn =1

n
= bk1 1 bk2 2 · · · bkn n D(α k1 , . . . , α kn )
k1 ,k2 ,...,kn =1

= bk1 1 bk2 2 · · · bkn n D(α k1 , . . . , α kn )
distinct k1 ,k2 ,...,kn

= (−1)N (k1 ,...,kn ) bk1 1 bk2 2 · · · bkn n D(α 1 , . . . , α n )
distinct k1 ,k2 ,...,kn
= det B det A

as claimed.
P5. det A = det AT (and hence the properties P1, P2 apply also to the rows of A).
P6. (Expansion along the i-th row or j -th column.) The following formulae are valid:

det A = nj=1 (−1)i+j aij det Aij for each i = 1, . . . , n

det A = ni=1 (−1)i+j aij det Aij for each j = 1, . . . , n ,
2. DETERMINANTS 67

where Aij denotes the (n − 1) × (n − 1) matrix obtained by deleting the i-th row and j -th column
of the matrix A.

We defer the proof of P5, P6 for a moment, in order to make an important point about practical
computation of determinants:

2.4 Remark: (Highly relevant for the efficient computation of determinants.) Notice that one almost
never directly uses the definition (or for that matter P6) to compute a determinant. Rather, one uses
the properties P1, P2 (and the corresponding facts about rows instead of columns which we can do
by P5) in order to first drastically simplify the matrix before attempting to expand the determinant.
In particular, observe that the 3rd kind of elementary row operation does not change the value of
the determinant at all:
⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞
r1 r1 r1 r1
⎜ r ⎟ ⎜r ⎟ ⎜r ⎟ ⎜r ⎟
⎜ 2 ⎟ ⎜ 2⎟ ⎜ 2⎟ ⎜ 2⎟
⎜ .. ⎟ ⎜.⎟ ⎜.⎟ ⎜.⎟
⎜ ⎟ ⎜ . ⎟ ⎜ . ⎟ ⎜ .. ⎟
⎜ . ⎟ ⎜ ⎟. ⎜ ⎟. ⎜ ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜ ri ⎟ ⎜ri ⎟ ⎜ r i ⎟ ← i-th ⎜ri ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟ = det ⎜ ⎟
det ⎜ .. ⎟ = det ⎜ .. ⎟ + λ det ⎜ .. ⎟ ⎜ .. ⎟
⎜ . ⎟ ⎜.⎟ ⎜.⎟ ⎜.⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜r j + λr i ⎟ ⎜r j ⎟ ⎜ r i ⎟ ← j -th ⎜r j ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜ .. ⎟ ⎜ .. ⎟ ⎜ .. ⎟ ⎜ .. ⎟
⎝ . ⎠ ⎝.⎠ ⎝.⎠ ⎝.⎠
rn rn rn rn

where we used P1 and P2. We can of course also use the other two types of elementary row operations,
so long as we keep track of the fact that interchanging 2 rows changes the sign of the determinant
and multiplying a row by a constant λ has the effect of multiplying the determinant by λ.

We now want to check properties P5, P6.

Proof of P5: Recall that if (i1 , . . . , in ) is any permutation of 1, . . . , n then we can pick transpo-
sitions τ (1) , . . . , τ (q) such that (i1 , . . . , in ) = τ (1) ◦ τ (2) ◦ · · · ◦ τ (q) (1, . . . , n), and then the same
sequence of transpositions in reverse order “undoes” this permutation: (1, . . . , n) = τ (q) ◦ · · · ◦
τ (1) (i1 , . . . , in ). The permutation (j1 , . . . , jn ) = τ (q) ◦ · · · ◦ τ (1) (1, . . . , n) is called the “inverse” of
(i1 , . . . , in ). Observe that jk is uniquely determined by the condition ijk = k for each k = 1, . . . , n,
as one sees by taking xk = ik in the identity (xj1 , . . . , xjn ) = τ (q) ◦ · · · ◦ τ (1) (x1 , . . . , xn ). Ob-
serve also that (−1)N (j1 ,...,jn ) = (−1)q = (−1)N (i1 ,...,in ) ; that is, the permutation (i1 , . . . , in ) and
its inverse (j1 , . . . , jn ) have the same index.
Now 
det A = (−1)N (i1 ,...,in ) ai1 1 ai2 2 · · · ain n ,
distinct i1 ,...,in
68 CHAPTER 3. MORE LINEAR ALGEBRA

and observe that the terms in the product ai1 1 ai2 2 · · · ain n can be reordered so that the jk -th entry
(which is aijk jk = akjk ) appears as the k-th factor. Thus,

ai1 1 ai2 2 · · · ain n = a1j1 a2j2 · · · anjn ,

and, as we mentioned above, the index (−1)N (j1 ,...,jn ) of (j1 , . . . , jn ) is the same as the index
(−1)N(i1 ,...,in ) of (i1 , . . . , in ). We thus have

det A = (−1)N (j1 ,...,jn ) a1j1 a2j2 · · · anjn .
distinct i1 ,...,in

Since the correspondence (i1 , . . . , in ) → (j1 , . . . , jn ) between a permutation and its inverse is 1:1
(hence onto all n! permutations), we see that (j1 , . . . , jn ) visits each of the n! permutations exactly
once in the above sum over distinct i1 , . . . , in . That is, we have shown

det A = (−1)N (j1 ,...,jn ) a1j1 a2j2 · · · anjn .
distinct j1 ,...,jn

But now observe what we have here is exactly the expression for the determinant of AT . Thus,
det A = det AT as claimed in P5.

Proof of P6: In view of P5 we need only prove the second identity here, because the first follows
by applying the second to the transpose matrix. Furthermore, we need only prove the first identity
in the case j = n, because the case j = n implies the result for j < n by virtue of property P2. Let

α 1 , . . . , α n be the columns of A. Since α n = ni=1 ain ei , we have

(1) det A = D(α 1 , . . . , α n ) = D(α 1 , . . . , α n−1 , ni=1 ain ei )
n
= i=1 ain D(α 1 , . . . , α n−1 , ei ) ,

and by making n − i interchanges of adjacent rows, and using the fact that (by P5) property P2 also
applies with respect to interchanges of rows rather than columns, we can move the i-th row of the
n × n matrix (α 1 α 2 · · · α n−1 ei ) to the n-th position, at the expense of gaining a factor (−1)n−i ; and
so, since (−1)n−i = (−1)n+i , we obtain
 
Ain 0
(2) D(α 1 , . . . , α n−1 , ei ) = (−1) det
n+i
,
ri 1

where 0 is the zero column vector in Rn−1 and r i is the row vector (ai1 , ai2 , . . . , ain−1 ) ∈ Rn−1
(i.e., r i is just the first n − 1 entries of the i-th row of A).
Now by definition
  
Ain 0
det = (−1)N (i1 ,...,in ) bi1 1 bi2 2 . . . bin n ,
ri 1
distinct i1 ,...,in ≤ n
3. INVERSE OF A SQUARE MATRIX 69
 
Ain 0
where bij is the element in the i-th row and j -th column of the matrix ; notice that
ri 1
bin n = 0 if in  = n and 1 if in = n, so this expression simplifies to
  
Ain 0
det = (−1)N (i1 ,...,in−1 ,n) bi1 1 bi2 2 . . . bin−1 n−1 ,
ri 1
distinct i1 ,...,in−1 ≤n−1

which of course (since N(i1 , . . . , in−1 , n) = N(i1 , . . . , in−1 )) is just exactly the expression for de-
 (n − 1)
terminant of the  × (n − 1) matrix obtained by deleting the n-th row and the n-th column
Ain 0
of the matrix , i.e., it is the determinant det Ain ; that is,
ri 1
 
Ain 0
(3) det = det Ain .
ri 1

By combining (1), (2), (3) we have P6 as claimed.

SECTION 2 EXERCISES
2.1 Calculate the determinant of
⎛ ⎞
10 11 12 13 426
⎜2000 2001 2002 2003 421 ⎟
⎜ ⎟
⎜ 2 2 1 0 419 ⎟
⎜ ⎟
⎝ 100 101 101 102 2000⎠
2003 2004 2005 2006 421

(show all row/column operations: no calculators).


⎛ ⎞
1 1 ··· 1
⎜ x1 x2 ··· xn ⎟
⎜ ⎟
⎜ x12 x22 ··· xn2 ⎟
2.2 Prove that the “Vandermonde determinant”  = det ⎜ ⎟
⎜ .. .. ⎟
⎝ . . ⎠
x1n−1 x2n−1 · · · xnn−1
is the product of all possible factors xj − xi with 1 ≤ i < j ≤ n; that is,  = 1≤i<j ≤n (xj − xi ).
Hint: Consider using row operations to reduce the n-variable case to the (n − 1)-variable case.

3 INVERSE OF A SQUARE MATRIX


One very important theoretical consequence of the formula P6 of the previous section is that it enables
us to prove the inverse of an n × n matrix A (i.e., the n × n matrix A−1 such that A−1 A = AA−1 = I )
70 CHAPTER 3. MORE LINEAR ALGEBRA

exists if det A  = 0. First observe that according to the first identity in P6 we have
⎛ ⎞
a11 a12 · · · a1n
⎜a ⎟
⎜ 21 a22 · · · a2n ⎟
⎜ . .. .. ⎟
⎜ .. . ⎟
⎜ . ⎟
 ⎜ ⎟
n
⎜ a 1 a 2 · · · an ⎟ ← i-th row
(−1) ak det Aik = det ⎜ .
i+k ⎜ .. ⎟
. .. ⎟
⎜ . . . ⎟
k=1 ⎜ ⎟
⎜a1 a2 · · · an ⎟ ← -th row
⎜ ⎟
⎜ .. .. .. ⎟
⎝ . . . ⎠
an1 an2 · · · ann

which is zero if i = . On the other hand, if i =  then P6 says the expression on the left is just
det A. Thus, we have
n
det A if i = 
(−1)i+k ak det Aik =
k=1
0 if i =  ,
or in other words (labeling the indices differently)


n
det A if i = j
(−1)j +k aik det Aj k =
k=1
0 if i  = j .

Notice that the expression on the left is just the element in the i-th row and j -th column of the
matrix product AM, where M is the n × n matrix with entry mij in the i-th row and j -th column,
where mij = (−1)i+j det Aj i ; in other words,
 
M = (−1)i+j det Aj i ,

and then the above, in matrix terminology, just says

AM = (det A) I ,

where I is the n × n identity matrix. By using an almost identical argument, based on the second
identity in P6 rather than the first, we also deduce

MA = (det A) I ,

where M is the same matrix.

3.1 Terminology: The above matrix M (which has the entry (−1)i+j det Aj i in its i-th row and
j -th column) is called the “adjoint matrix of A,” and is denoted adj A.

Because of the above we now have a complete understanding of when an n × n matrix A has an
inverse, i.e., when there is an n × n matrix A−1 such that A−1 A = AA−1 = I .
3. INVERSE OF A SQUARE MATRIX 71

3.2 Theorem. Let A be an n × n matrix. Then A has an inverse if and only if det A  = 0, and in this
case the inverse A−1 is given explicitly by A−1 = det1 A adj A.

Proof: First assume det A  = 0. Then by the formulae adj A A = A adj A = (det A) I proved above,
we see that indeed the inverse exists and is given explicitly by the formula det1 A adj A.
Conversely, if A−1 exists, then we have AA−1 = I , and hence taking determinants of each side and
using properties P4 and P3 we obtain (det A) (det A−1 ) = 1, which implies det A  = 0, so the proof
is complete.

3.3 Remark: Notice that the last part of the above argument gives the additional information that
1
det A−1 = .
det A

Recall (see the remark after the statement of properties P1–P6) that we can compute det A by using
elementary row operations and that the first type of elementary row operation (interchanging any
2 rows) only changes the sign, the second type (multiplying a row by a nonzero scalar λ) multiplies
the determinant by λ (= 0), and the third type (replacing the i-th row by the i-th row plus any
multiple of the j -th row with i  = j ) does not change the value of the determinant at all. Thus,
none of the elementary row operations can change whether or not the matrix has zero determinant.
In particular, this means that det A  = 0 ⇐⇒ det rref A  = 0. On the other hand, since A is n × n
(i.e., we are in the case m = n), then, from our previous discussion of rref A in Sec. 9. of Ch. 1, we
know that either rref A = I (the n × n identity matrix) or rref A has a zero row (in which case it
trivially has determinant = 0). We also recall (again from Sec. 9 of Ch. 1) that (since we are in the
case m = n) rref A has a zero row if and only if the null space N (A) is nontrivial (i.e., if and only
if the homogeneous system Ax = 0 has a nontrivial solution), which in turn is true if and only if A
has rank less than n.
Thus, combining these facts with Thm. 3.2 above, we have:

3.4 Theorem. If A is an n × n matrix, then


A−1 exists ⇐⇒ det A  = 0 ⇐⇒ rref A = I ⇐⇒ N (A) = {0} ⇐⇒ rank A = n .

SECTION 3 EXERCISES
3.1 Suppose A, B are n × n matrices and AB = I . Prove that then det A = 0 and B =
(det A)−1 adj A. (In particular, AB = I ⇒ BA = I and B is the unique inverse (det A)−1 adj A).

3.2 If A, B are, respectively, m × n and n × p matrices, prove


(a) (AB)T = B T AT , and (b) m = n = p and A, B invertible ⇒ AB invertible and (AB)−1 =
B −1 A−1 .
72 CHAPTER 3. MORE LINEAR ALGEBRA

3.3 Let A = (aij ) be an n × n matrix with det A = 0, and b = (b1 , . . . , bn )T ∈ Rn . Prove that the
solution x = (x1 , . . . , xn )T = A−1 b of the system Ax = b is given by the formula
det Ai
xi = , i = 1, . . . , n ,
det A
where Ai is obtained from A by replacing the i th column of A by the given vector b.
 
Hint: Use the formula A−1 = (det A)−1 (−1)i+j det Aj i proved above to calculate the i-th com-
ponent of A−1 b.

4 COMPUTING THE INVERSE


By Exercise 3.1 above, an n × n matrix B is the inverse of the n × n matrix A if and only if AB = I ,
where I is the n × n identity matrix, i.e., the matrix with j -th column ej . Since the j -th column of
AB is Aβ j , where β j is the j -th column of B, to find the inverse of A we thus have to solve
4.1 Ax = ej , j = 1, . . . , n .
In view of the discussion in Sec. 10 of Ch. 1 we can attempt to do this by using elementary row
operations on the augmented matrix A|ej to give the new augmented matrix
4.2 rref A|β j ,
and by Thm. 3.4 above we know that A−1 exists if and only if rref A = I , in which case the augmented
system in 4.2 is simply
4.3 I |β j , j = 1, . . . , n ÿ,
and hence if A−1 exists then the solution of 4.1, which gives the j -th column of A−1 , is exactly
the vector which appears right side of 4.2 (and in that case rref A = I ). We can of course solve all
j equations in 4.1 simultaneously by working with the n × 2n augmented matrix A|I (because the
j -th column of I is ej ).
Thus, to summarize, we can find the inverse by starting with the n × 2n augmented matrix A|I and
applying elementary row operations, which produce rref A = I as the first n columns, and produces
the n × 2n augmented matrix
I |B .
Then B is the required inverse A−1 . This process will also tell us when A−1 does not exist, because
in that case rref A will have at least one row of zeros (and det A = 0).
⎛ ⎞
1 4 3
For example, if A is 3 × 3 matrix ⎝1 4 5⎠ then we compute the inverse (and at the same time
2 5 1
check that the inverse exists) by looking at the 3 × 6 augmented matrix
⎛ ⎞
1 4 3 1 0 0
⎝ 1 4 5 0 1 0 ⎠ ,
2 5 1 0 0 1
5. ORTHONORMAL BASIS AND GRAM-SCHMIDT 73

and using elementary row operations as follows:


⎛ ⎞ ⎛ ⎞
1 4 3 1 0 0 1 4 3 1 0 0
⎝ 1 4 5 0 1 0 ⎠ r2 → r2 − r1 ⎝ 0 0 2 −1 1 0 ⎠
2 5 1 0 0 1 r3 → r3 − 2r1 0 −3 −5 −2 0 1
⎛ ⎞
1 4 3 1 0 0
r2 → 21 r2 ⎝ 0 0 1 −1/2 1/2 0 ⎠
r3 → −r3 0 3 5 2 0 −1
⎛ ⎞
r1 → r1 − 3r2 1 4 0 5/2 −3/2 0
⎝ 0 0 1 −1/2 1/2 0 ⎠
r3 → r3 − 5r2 0 3 0 9/2 −5/2 −1
⎛ ⎞.
1 4 0 5/2 −3/2 0
⎝ 0 0 1 −1/2 1/2 0 ⎠
r3 → 1
3 r3 0 1 0 9/6 −5/6 −1/3
⎛ ⎞
1 0 0 −21/6 11/6 4/3
r1 → r1 − 4r3 ⎝ 0 ⎠
0 1 −1/2 1/2 0
0 1 0 9/6 −5/6 1/3
⎛ ⎞
1 0 0 −21/6 11/6 4/3
r2 ↔ r3 ⎝ 0 1 0 9/6 −5/6 1/3 ⎠
0 0 1 −1/2 1/2 0
⎛ ⎞
−7/2 11/6 4/3
Therefore, A−1 = ⎝ 3/2 −5/6 −1/3⎠.
−1/2 1/2 0

SECTION 4 EXERCISES
⎛ ⎞
1 1 1 1
⎜0 1 1 1⎟
4.1 Calculate the inverse A−1 of A, if A is the 4 × 4 matrix ⎜
⎝1
⎟.
0 2 3⎠
0 0 1 2

5 ORTHONORMAL BASIS AND GRAM-SCHMIDT


Let v 1 , . . . , v N be vectors in Rn . We say that the vectors are orthonormal if v i · v j = δij , where δij is
the Kronecker delta (= 1 when i = j and = 0 when i  = j ). Thus, v 1 , . . . , v N orthonormal means
that each vector has length 1 and the vectors are mutually orthogonal (i.e., each vector is orthogonal
to all the others).
If V is a nontrivial subspace of Rn then an orthonormal basis for V is a basis for V consisting of an
orthonormal set of vectors.
74 CHAPTER 3. MORE LINEAR ALGEBRA

5.1 Remark: Observe that orthonormal vectors v 1 , . . . , v N are automatically l.i., because

N
ci v i = 0 ⇒ cj = 0 ∀ j = 1, . . . , N ,
i=1
N
as we see by taking the dot product of each side of the identity i=1 ci v i = 0 with the vector v j .

5.2 Lemma. If V is a nontrivial subspace of Rn of dimension k and if u1 , . . . , uk is an orthonormal


basis for V , then the orthogonal projection P V of Rn onto V (as in Thm. 8.3 of Ch. 1) is given explicitly
by
k
P V (x) = x · ui ui , x ∈ Rn .
i=1
k
Proof: Define Q(x) = i=1 x · ui ui for x ∈ Rn , and observe that then Q(x) ∈ V by definition.
Also, since uj · ui = δij , uj · (x − Q(x)) = uj · x − x · uj = 0, so that x − Q(x) is orthogonal to
each uj , hence orthogonal to span{u1 , . . . , uk }. That is, x ∈ V and x − Q(x) ∈ V ⊥ and hence Q(x)
satisfies the two conditions which according to Thm. 8.3 of Ch. 1 uniquely determine the orthogonal
projection P V . Hence, Q = P V and the proof is complete.

Our main result here is the following.

5.3 Theorem. Let v 1 , . . . , v k be any l.i. set of vectors in Rn . Then there is an orthonormal set of
vectors u1 , . . . , uk such that span{u1 , . . . , uj } = span{v 1 , . . . , v j } for each j = 1, . . . , k, and in fact
u1 , . . . , uj is an orthonormal basis for span{v 1 , . . . , v j } for each j = 1, . . . , k.

Proof (“Gram-Schmidt orthogonalization”): We proceed inductively to construct orthonormal


u1 , u2 , . . . , uk such that span{u1 , . . . , uj } = span{v 1 , . . . , v j } for each j by the following procedure,
in which we use the notation that Pj denotes the orthogonal projection of Rn onto span{v 1 , . . . , v j }
for each j = 1, . . . , k:
u1 = v 1 −1 v 1
and, assuming j ∈ {2, . . . , k − 1} and that we have selected orthonormal u1 , . . . , uj with
span{u1 , . . . , uj } = span{v 1 , . . . , v j }, we define
(1) uj +1 = v j +1 − Pj (v j +1 )−1 (v j +1 − Pj (v j +1 )) .

Observe that this makes sense because v j +1 − Pj (v j )  = 0 (because by definition Pj (v j +1 ) is a linear


combination of v 1 , . . . , v j and so v j +1 − Pj (v j ) is a nontrivial linear combination of v 1 , . . . , v j +1
and hence cannot be zero because v 1 , . . . , v j +1 are l.i.).
Then by definition, uj +1 is a unit vector, and if i = 1, . . . , j then we have
ui · uj +1 = c(ui · v j +1 − ui · Pj (v j +1 )), c = v j +1 − Pj (v j +1 )−1 ,
6. MATRIX REPRESENTATIONS OF LINEAR TRANSFORMATIONS 75

and by Thm. 8.3(ii) of Ch. 1 the right side here is c(ui · v j +1 − P (ui ) · v j +1 ) = c(ui · v j +1 − ui ·
v j +1 ) = 0, so uj +1 is orthogonal to u1 , . . . , uj , hence (since the inductive hypothesis guarantees
that u1 , . . . , uj are orthonormal) u1 , . . . , uj +1 are orthonormal and by 5.1 are l.i. vectors in the
(j + 1)-dimensional space span{v 1 , . . . , v j +1 }, and so are a basis for span{v 1 , . . . , v j +1 }. Thus, in
particular, span{u1 , . . . , uj +1 } = span{v 1 , . . . , v j +1 } and the proof of 5.3 is complete.

5.4 Remark: Notice that in view of Lem. 5.2 the inductive definition (1) in the above proof can be
written
j 
j
uj +1 = v j +1 − v j +1 · ui ui −1 (v j +1 − v j +1 · ui ui ) .
i=1 i=1

The procedure of getting orthonormal vectors in this way is known as Gram-Schmidt orthogonaliza-
tion.
Finally, we want to introduce the important notion of orthogonal matrix.
5.5 Definition: An n × n matrix Q = (qij ) is orthogonal if the columns q 1 , . . . , q n of Q are
orthonormal.

Observe that by definition Q orthogonal ⇒ QT Q = I , because by definition of matrix multiplica-


tion the entry of QT Q in the i-th row and j -th column is q i · q j which is δij because q 1 , . . . , q n
are orthonormal. Notice that the fact the QT Q = I means that det(QT Q) = 1, and hence by prop-
erties P4, P5 we see that (det Q)2 = 1, or in other words det Q = ±1. In particular, det Q  = 0,
so Q−1 exists by Thm. 3.4. Indeed, since QT Q = I we see that Q−1 = QT and hence we must
automatically have QQT = I . That is (since Q = (QT )T ) we have
Q orthogonal ⇐⇒ QT orthogonal ,

so, in fact, Q orthogonal implies that the rows of Q are orthonormal.

SECTION 5 EXERCISES
5.1 If S is the two-dimensional subspace of R4 spanned by the vectors (1, 1, 0, 0)T , (0, 0, 1, 1)T ,
find an orthonormal basis for S and find the matrix of the orthogonal projection of R4 onto S.

5.2 If S is the subspace of R4 spanned by the vectors (1, 0, 0, 1)T , (1, 1, 0, 0)T , (0, 0, 1, 1)T , find an
orthonormal basis for S, and find the matrix of the orthogonal projection of R4 onto S.

6 MATRIX REPRESENTATIONS OF LINEAR


TRANSFORMATIONS
Suppose V is a nontrivial subspace of Rn and T : V → V is linear. Thus, T (λx + μy) = λT (x) +
μT (y) ∀ x, y ∈ V and for all λ, μ ∈ R. We showed in Sec. 6 of Ch. 1 that, in case V = Rn , T is
76 CHAPTER 3. MORE LINEAR ALGEBRA

represented by the n × n matrix A with j -th column T (ej ) in the sense that T (x) ≡ Ax for all
x ∈ Rn . A similar result holds in the present more general case when V is an arbitrary nontrivial
subspace of Rn .
To describe this result we first need to introduce a little terminology.
6.1 Terminology: Let v 1 , . . . , v k be a basis for V . Then each x ∈ V can be uniquely represented
k
j =1 xj v j . The vector ξ = (x1 , . . . , xk ) ∈ R is referred to as the coordinates of x relative to the
T k

given basis v 1 , . . . , v k .

Observe that x = kj =1 xj v j ⇒ T (x) = nj=1 xj T (v j ) by linearity of T , and since T (v j ) ∈ V
and v 1 , . . . , v k is a basis for V we can express T (v j ) as a unique linear combination of v 1 , . . . , v k ;
thus

k
6.2 T (v j ) = aij v i for some unique aij ∈ R .
i=1

The k × k matrix A = (aij ) is called the matrix of T relative to the given basis v 1 , . . . , v n .

Observe that, still with x = kj =1 xj v j , we then have T (x) = kj =1 xj T (v j ) =
k k k k
i=1 aij xj v i = i=1 ( j =1 aij xj )v i . Thus if x has coordinates ξ = (x1 , . . . , xk )
T
j =1
relative to v 1 , . . . , v k then T (x) has coordinates Aξ relative to the same basis v 1 , . . . , v k , where
A = (aij ) is the matrix of T relative to the given basis v 1 , . . . , v k as in 6.2.

7 EIGENVALUES AND THE SPECTRAL THEOREM


Let A = (aij ) be an n × n (real) matrix. We begin with the definition of eigenvalue of A.
Observe that the expression det(A − λI ) is a degree n polynomial in the variable λ and the coefficient

of the top degree term λn is (−1)n . Thus, det(A − λI ) = (−1)n (λn + n−1 j
j =0 aj λ ) for some real
numbers a1 , . . . , an−1 .
One can check this directly in case n = 2 (in which case det(A − λI ) = (a11 − λ)(a22 − λ) −
a12 a21 = λ2 − (a11 + a22 )λ + (a11 a22 − a12 a21 )) and it can easily be checked, in general, by in-
duction on n.
Notice that then the fundamental theorem of algebra tells us that there are (possibly complex)
numbers λ1 , . . . , λn such that
7.1 det(A − λI ) ≡ (−1)n (λ − λ1 ) · · · (λ − λn ), ∀λ .

7.2 Definition: The numbers λ1 , . . . , λn (which are the roots of the equation det(A − λI ) = 0) are
called eigenvalues of the matrix A.

There are two important properties of such eigenvalues.


7. EIGENVALUES AND THE SPECTRAL THEOREM 77

7.3 Lemma. Let λ1 , . . . , λn be the eigenvalues of A as in 7.2 above. Then:

(i) det A = λ1 · · · λn (i.e., det A = the product of the eigenvalues of A);


(ii) for any λ ∈ R: λ an eigenvalue of A ⇐⇒ ∃ v ∈ Rn \ {0} with Av = λv .

7.4 Remark: Any v ∈ Rn \ {0} with Av = λv as in (ii) above is called an eigenvector of A corresponding
the eigenvalue λ. Notice we only here consider eigenvectors corresponding to real eigenvalues λ; it
would also be possible to consider eigenvectors corresponding to complex eigenvalues λ, although
in that case the corresponding eigenvectors would be in Cn \ {0} rather than in Rn \ {0}.

Proof of 7.3: First note that (i) is proved simply by setting λ = 0 in the identity 7.1.
To prove (ii), simply observe that, for any real λ, det(A − λI ) = 0 ⇐⇒ N(A − λI ) = {0} ⇐⇒
∃ v ∈ Rn \ {0} with (A − λI )v = 0 by Thm. 3.4 of the present chapter.

We can now state and prove the spectral theorem.

7.4 Theorem (Spectral Theorem.) Let A = (aij ) be a symmetric n × n matrix. Then there is an
orthonormal basis v 1 , . . . , v n for Rn consisting of eigenvectors of A.

Proof: The proof is by induction on n. It is trivial in the case n = 1 because in that case the matrix
A is just a real number and Rn is the real line R. So assume n ≥ 2 and as an inductive hypothesis
assume the theorem is true with n − 1 in place of n.
Let A(x) be the quadratic form given by the n × n matrix A. Thus,

n
A(x) = aij xi xj ,
i,j =1

and observe that we can write


(1) A(x) = x2 A(
x ), x = x−1 x ∈ S n−1 , x ∈ Rn \ {0} .


By Thm. 2.6 of Ch. 2 we know that A|S n−1 attains its minimum value m at some point v 1 ∈ S n−1 ,
so, by (1), for x ∈ Rn \ {0}
(2) x−2 A(x) = A(
x ) ≥ m = x−2 A(x)|x =v 1

and hence the function x−2 A(x) : Rn \ {0} → R actually attains a minimum at the point v 1 and
hence (see the discussion following 7.1 of Ch. 2) must have zero gradient at this point. But direct
computation shows that the i-th partial derivative of the function x−2 A(x) is

n 
n

2x−2 aij xj − x−3 ak xk x xi ,
j =1 k,=1
78 CHAPTER 3. MORE LINEAR ALGEBRA

so at x = v 1 = (v11 , v12 , . . . , v1n )T we get



n
aij v1j − mv1i = 0, i = 1, . . . , n ,
j =1

which in matrix terms says


Av 1 = mv 1 ,
so, by 7.3(ii), we have shown that v 1 is an eigenvector corresponding to eigenvalue m.
Let T be the linear transformation determined by the matrix A, so that
T (x) = Ax, x ∈ Rn ,
and let V = span{v 1 }.Then dim V = 1 and by Thm. 8.1(iii) we know that dim V ⊥ = n − 1, and it is
a straightforward exercise (Exercise 7.1(b) below) to check that T : V ⊥ → V ⊥ . So let w 1 , . . . , wn−1
be an orthonormal basis for V ⊥ and let A0 = (aij ) be the (n − 1) × (n − 1) matrix of T |V ⊥ relative
(0)

to the basis w 1 , . . . , wn−1 as discussed in Sec. 6 above. Observe that then by 6.2 we have

n−1
(0)
T (w j ) = akj wk , j = 1, . . . , n − 1 ,
k=1

and by taking the dot product of each side with w i we get (using Exercise 7.1(a) below)
(0) (0)
aij = wi · T (w j ) = wj · T (w i ) = aj i ,
(0) (0) (0)
so that aij = aj i . Thus, (aij ) is an (n − 1) × (n − 1) symmetric matrix, and we can use the
inductive hypothesis to find an orthonormal basis u2 , . . . , un of Rn−1 and corresponding λ2 , . . . , λn
such that
A0 uj = λj uj , j = 2, . . . , n ,
so, with uj = (uj 1 , . . . , uj n−1 )T ,


n−1
(0)
(2) aki uj i = λj uj k , j, k = 1, . . . , n − 1 .
i=1
n−1
We then let v j = i=1 uj i wi for j = 2, . . . , n and observe that by (2)


n−1 
n−1 
n−1
(0)

n−1
Av j = uj i Awi = uj i T (w i ) = uj i aki wk = λj uj k wk = λj v j ,
i=1 i=1 i,k=1 k=1

for j = 2, . . . , n, so v j are eigenvectors of A for j = 2, . . . , n. Finally, observe that since by con-


struction v 2 , . . . , v n ∈ V ⊥ and v 1 ∈ V we have that v 1 , . . . , v n are an orthonormal set of vectors in
Rn , and hence (since orthonormal vectors are automatically l.i. by Rem. 5.1 of the present Chapter)
v 1 , . . . , v n are an orthonormal basis for Rn .
7. EIGENVALUES AND THE SPECTRAL THEOREM 79

SECTION 7 EXERCISES
7.1 Let T : Rn → Rn be a linear transformation with the property that the matrix of T (i.e., the
matrix Q such that T (x) = Qx for all x ∈ Rn ) is symmetric. (i.e., Q = (qij ) with qij = qj i ). Prove:
(a) x · T (y) = y · T (x) for all x, y ∈ Rn .
(b) If v is a nonzero vector such that T (v) = λv for some λ ∈ R, and if  is the one-dimensional
subspace spanned by v, then T (⊥ ) ⊂ ⊥ .

7.2 Let A be an n × n symmetric matrix. Prove there is an n × n orthogonal matrix Q (i.e., QT Q = I )


such that QT AQ is a diagonal matrix.
Hint: The Spectral Theorem proved above says that there is an orthonormal basis v 1 , . . . , v n for Rn
such that Av j = λj v j for each j (where λj ∈ R for each j ).
80
81

CHAPTER 4

More Analysis in Rn
1 CONTRACTION MAPPING PRINCIPLE
First we define the notion of contraction. Given a mapping f : E → Rn , where E ⊂ Rn , we say that
f is a contraction if there is a constant θ ∈ (0, 1) such that
1.1 f (x) − f (y) ≤ θx − y, ∀ x, y ∈ E .

Theorem (Contraction Mapping Theorem.) Let E be a closed subset of Rn and f a contraction


mapping of E into E. (Thus, f (E) ⊂ E and f satisfies 1.1.) Then f has a fixed point, i.e., there is a
z ∈ E such that f (z) = z.
Remark: The proof we give below is constructive, i.e., not only will the proof show that there is a fixed
point z, but at the same time it actually gives a practical method for finding (or at least approximating
arbitrarily closely) such a point z.

Proof of the Contraction Mapping Theorem: We inductively construct a sequence {x k }k=0,1,2,...


of points of E as follows:
(i) Take x 0 ∈ E arbitrary.
(ii) Assuming that k ≥ 1 and that x 0 , . . . , x k−1 are already chosen, define
x k = f (x k−1 ) .
Observe that then we have, using 1.1,
x k+1 − x k  = f (x k ) − f (x k−1 ) ≤ θx k − x k−1 , k≥1,
and by mathematical induction we have
x k+1 − x k  ≤ θ k x 1 − x 0 , k≥1,
and so if  > k ≥ 1 we have
−1
x  − x k  =  −1
j =k (x j +1 − x j ) ≤ =k x j +1 − x j 
−1 j j−1
≤ ( j =k θ )x 1 − x0  = ( j =k θ j )f (x 0 ) − x0  .

On the other hand, ∞ j k
j =k θ is a convergent geometric series with first term θ and common ratio
θ , so it has sum θ /(1 − θ), and hence the above inequality gives
k

(∗) x  − x k  ≤ (1 − θ )−1 f (x 0 ) − x 0  θ k , >k≥1.


82 CHAPTER 4. MORE ANALYSIS IN Rn

In particular, with k = 1 we have


x   = x  − x 1 + x 1  ≤ x  − x 1  + x 1  ≤ (1 − θ)−1 f (x 0 ) − x 0  + f (x 0 )

for all  > 1, so the sequence {x k }k=1,2,... is bounded, and by the Bolzano-Weierstrass theorem there
is a convergent subsequence {xkj }j =1,2,... . Since E is closed the limit limj →∞ x kj is a point of E.
Notice that we can also take  = kj in (∗), giving
(∗∗) x kj − x k  ≤ (1 − θ )−1 f (x 0 ) − x 0  θ k , j >k≥1,

and hence, letting z = limj →∞ x kj , and taking limits in (∗∗), we see that

z − x k  ≤ (1 − θ )−1 f (x 0 ) − x 0  θ k , ∀ k≥1,

so that, in fact (since θ k → 0 as k → ∞), the whole sequence x k (and not just the subsequence x kj )
converges to z. Then by continuity of f at the point z we have lim f (x k ) = f (z). But, on the other
hand, f (x k ) = x k+1 so we must also have lim f (x k ) = lim x k+1 = z. Thus, f (z) = z.

SECTION 1 EXERCISES
1.1 Show that the system of two nonlinear equations
x 2 − y 2 + 4x − 1 = 0
1 
x 2 x 2 + y 2 + 4y + 1 = 0

has a solution in the disk x 2 + y 2 ≤ 1.


 T
Hint: Define f (x, y) = x + 41 (x 2 − y 2 ) − 41 , y + 41 ( 21 x 2 + y 2 )x + 41 and show that F (x, y) =
(x, y)T − f (x, y) has a fixed point in the disk.

2 INVERSE FUNCTION THEOREM


We here prove the inverse function theorem, which guarantees that C 1 function f : U → Rn is
locally C 1 invertible near points where the Jacobian matrix is nonsingular. More precisely:

Theorem (Inverse Function Theorem.) Suppose U ⊂ Rn is open, f : U → Rn is C 1 , and x 0 ∈ U is


such that Df (x 0 ) is nonsingular (i.e., (Df (x 0 ))−1 exists). Then there are open sets V , W containing x 0
and f (x 0 ), respectively, such that f |V is a 1:1 map of V onto W with C 1 inverse g : W → V .

Proof: We first consider the special case when Df (x 0 ) = I . (The general case follows directly by
applying this special case to the function f˜(x) ≡ (Df (x 0 ))−1 f (x).)
Let ε ∈ (0, 21 ]. Observe that since U is open and Df (x) is continuous and = I at x =
x 0 we have a ρ > 0 such that Bρ (x 0 ) ⊂ U and Df (x) − I  < ε ≤ 21 for x − x 0  <
ρ, so, in particular, Df (x) v  =  v + (Df (x) − I ) v  ≥  v  − (Df (x) − I ) v  ≥  v  −
2. INVERSE FUNCTION THEOREM 83

(Df (x) − I ) v  ≥  v  − 21  v  = 21  v  for any v ∈ Rn and hence the null space N(Df (x))
is trivial. Thus, the above choice of ρ ensures that (Df (x))−1 exists for each x ∈ Bρ (x 0 ).
Next observe that with the same ρ > 0 as in the above paragraph we have, by the fundamental
theorem of calculus, ∀ x, y ∈ Bρ (x 0 )
!1 d
(y − f (y)) − (x − f (x)) = 0 dt F (x + t (y − x)) dt, F (x) = x − f (x) ,

which by the Chain Rule gives


!1
(y − f (y)) − (x − f (x)) = 0 (I − Df (x + t (y − x))) · (y − x) dt ,

and hence
!1
(1) y − f (y)) − (x − f (x)) =  0 (I − Df (x + t (y − x))) · (y − x) dt
!1
≤ 0 (I − Df (x + t (y − x))) · (y − x) dt
!1
≤ 0 I − Df (x + t (y − x)) y − x dt
≤ εy − x ∀ x, y ∈ Bρ (x 0 ) ,

and hence
(2) (1 − ε)x − y ≤ f (x) − f (y) ≤ (1 + ε)x − y, ∀ x, y ∈ Bρ (x 0 ) .

We are going to prove the theorem with V = Bρ (x 0 ). We first claim that if z is an arbitrary point
of the open ball Bρ (x 0 ) and if σ ∈ (0, ρ − z − x 0 ) then
(3) Bσ/2 (f (z)) ⊂ f (Bσ (z)) ⊂ f (Bρ (x 0 )) .

Notice that the second inclusion here is trivial because of the inclusion B σ (z) ⊂ Bρ (x 0 ) which we
check as follows: x ∈ B σ (z) ⇒ x − x 0  = x − z + z − x 0  ≤ x − z + z − x 0  ≤ σ + z −
x 0  < ρ because σ < ρ − z − x 0 . Thus, we have only to prove the first inclusion in (3). To
prove this we have to show that for each vector v ∈ Bσ/2 (f (z)) there is an x v ∈ Bσ (z) with
f (x v ) = v . We check this using the contraction mapping principle on the closed ball B σ (z): We
define F (x) = x − f (x) + v , and note that
F (x) − z = (x − f (x)) − (z − f (z)) + v − f (z)
1 1
≤ (x − f (x)) − (z − f (z)) +  v − f (z) < σ + σ = σ ,
2 2

for all x ∈ B σ (z), so F maps the closed ball B σ (z) into the corresponding open ball Bσ (z) and since
F (x) − F (y) = (x − f (x)) − (y − f (y)) we see by (1) that F is a contraction mapping of B σ (z)
into itself and so has a fixed point x ∈ B σ (z) by the contraction mapping principle. This fixed point
x actually lies in the open ball Bσ (z) because F maps B σ (z) into Bσ (z); thus x − f (x) + v = x, or
in other words f (x) = v , with x ∈ Bσ (z), so x is the required point x v ∈ Bσ (z), and indeed (3) is
proved. Notice that (3) in particular guarantees that the image f (Bρ (x 0 )) is an open set, which we
84 CHAPTER 4. MORE ANALYSIS IN Rn

call W . By (2), f is a 1:1 map of Bρ (x 0 ) onto this open set W . Let g : W → Bρ (x 0 ) be the inverse
function and write x = g( u ), y = g( v ) ( u , v ∈ W ); then the first inequality in (2) can be written
(2) g( u ) − g( v ) ≤ (1 − ε)−1  u − v  ≤ 2 u − v , u, v ∈ W ,

which guarantees that the inverse function g is continuous on W . To prove the inverse is differen-
tiable at a given v ∈ W , pick y ∈ Bρ (x 0 ) such that f (y) = v , let A = Df (y). We claim that g is
differentiable at v with Dg( v ) = (Df (y))−1 . To see this let A = Df (y), and for given ε ∈ (0, 41 )
pick δ > 0 such that Bδ ( v ) ⊂ W (which we can do since W is open), Bδ (y) ⊂ Bρ (x 0 ), and
(4) 0 < x − y < δ ⇒ f (x) − f (y) − A(x − y) < εx − y ,

which we can do since f is differentiable at y. Recall that we already checked that A−1 (= (Df (y))−1 )
exists for all y ∈ Bρ (x 0 ), and then
g( u ) − g( v ) − A−1 ( u − v ) = A−1 (A(g( u ) − g( v )) − ( u − v ))
= A−1 (A(x − y) − (f (x) − f (y)))
(5) ≤ A−1  f (x) − f (y) − A(x − y)

for every u ∈ W , where x = g( u ) (i.e., u = f (x)). But, since x − y = g( u ) − g( v ) ≤


2 u − v  (by (2) ), we then have  u − v  < δ/2 ⇒ x − y < δ, which by (4) implies the ex-
pression on the right of (5) is ≤ εA−1 x − y = εA−1 g( u ) − g( v ) ≤ 2εA−1 u − v.
That is, we have proved
 u − v  < δ/2 ⇒ g( u ) − g( v ) − A−1 ( u − v ) < 2εA−1  u − v  ,

which proves that g is differentiable at v with Dg( v ) = A−1 , as claimed.


Finally, we have to check the continuity of the Jacobian matrix Dg( u ) for u ∈ W ; but f (g( u ))) ≡ u
for u ∈ W , so by the chain rule we have Df (g( u ))Dg( u ) = I for every u ∈ W ; i.e., Dg( u ) =
(Df (g( u )))−1 , which is a continuous function of u because g is a continuous function of u ∈ W
and Df is continuous on Bρ (x 0 ), which in turn implies that (Df )−1 is continuous on Bρ (x 0 ). This
completes the proof of the inverse function theorem.

3 IMPLICIT FUNCTION THEOREM


The implicit function theorem considers C 1 mappings from an open set U in Rn into Rm , with m < n.
It is convenient notationally to think of Rn as the same as the product Rn−m × Rm , and accordingly,

x
points in R should be written
n where x ∈ Rn−m and y ∈ Rm ; this is somewhat cumbersome
y
so we instead abbreviate it by writing [x, y] (with square brackets). Likewise, a function f on Rn

x
should be written f , again too cumbersome, so we’ll write f [x, y]. After a while you’ll get
y
comfortable with this, and start fudging notation by simply writing f (x, y) (with round brackets)
3. IMPLICIT FUNCTION THEOREM 85

instead of f [x, y]. Note that if f [x, y] is a C 1 function then we use the notation Dx f [x, y] =
[x ,y ]
( ∂f∂x1
, . . . , ∂f∂x[xn−m
,y ] [x ,y ]
) and Dy f [x, y] = ( ∂f∂y1
[x ,y ]
, . . . , ∂f∂y m
), and so in this notation the Jacobian
matrix of f is (Dx f [x, y], Dy f [x, y]).

Implicit Function Theorem. Let U be an open subset of Rn , G : U → Rm a C 1 mapping with m < n,


let S = {[x, y] ∈ U : G[x, y] = 0}, and let [a, b] be a point in S such that the matrix Dy G[x, y] (that’s
,y ]
the m × m matrix with j -th column ∂G[x∂yj ) is nonsingular at the point [a, b]. Then there are open
sets V ⊂ Rn and W ⊂ Rn−m with [a, b] ∈ V and a ∈ W and a C 1 map h : W → Rm such that
S ∩ V = graph h.
Note: As usual, graph h = {[x, y] : x ∈ W and y = h(x)}, that is, {[x, h(x)] : x ∈ W }.

Proof of the Implicit Function Theorem: The proof is rather easy: it is simply a clever application
of the inverse function theorem. We start by defining a map f : U → Rn by simply taking f [x, y] =
[x, G[x, y]]. Of course this is a C 1 map because G is C 1 . Note that the mapping f leaves the first
n − m coordinates (i.e., the x coordinates) of each point [x, y] fixed. Also, by direct calculation the
Jacobian matrix Df [a, b] (which is n × n) has the block form
 
I(n−m)×(n−m) O
Dx G[a, b] Dy G[a, b]

and hence det Df [a, b] = det Dy G[a, b]  = 0 so the inverse function theorem implies that there are
open sets V , Q in Rn with [a, b] ∈ V , f [a, b] = [a, G[a, b]] = [a, 0] ∈ Q such that f : V → Q
is 1:1 and onto and the inverse g : Q → V is C 1 . Since f [x, y] ≡ [x, G[x, y]] (i.e., f fixes the first
n − m components x of the point [x, y] ∈ Rn ), g must have the form
(1) g[x, y] = [x, H [x, y]], with H a C 1 function on Q .

Now we define W = {x ∈ Rn−m : [x, 0] ∈ Q ∩ (Rn−m × {0})} (it is easy to check that W is then
open in Rn−m because Q is open in Rn ), and let h(x) = H [x, 0] with H as in (1), so that
g[x, 0] = [x, h(x)], x ∈ W, where h(x) = H [x, 0] with H as in (1) ,

and then observe that [x, 0] = f (g[x, 0]) = f [x, h(x)] = [x, G[x, h(x)]; that is, G[x, h(x)] = 0,
so we’ve shown that graph h ⊂ S ∩ V . To check equality, we just have to check the reverse inclusion
S ∩ V ⊂ graph h as follows:
[x, z] ∈ S ∩ V ⇒ G[x, z] = 0 ⇒ f [x, z] = [x, 0] ⇒
[x, z] = g(f [x, z]) = g[x, 0] = [x, h(x)] ,

so h(x) = z and [x, z] ∈ graph h.

Finally, we give an alternative version of the implicit function theorem which we stated in Thm. 9.13
of Ch. 2 and which was used in the proof of the Lagrange multiplier Thm. 9.14 of Ch. 2.
86 CHAPTER 4. MORE ANALYSIS IN Rn

Corollary. If k ∈ {1, . . . , n − 1}, if U ⊂ Rn is open, if g1 , . . . , gn−k : U → R are C 1 , if S = {x ∈


U : gj (x) = 0 for each j = 1, . . . , n − k} and if M  = ∅, where
M = {y ∈ S : ∇Rn g1 (y), . . . , ∇Rn gn−k (y) are l.i.} ,
then M is a k-dimensional C 1 manifold.

Proof: Let m = n − k, let a ∈ M and let G = (g1 , . . . , gm )T : U → Rm . The m rows of the m × n


matrix DG(a) are given to be l.i., so DG(a) has rank m and we must we able find m l.i. columns
Dj1 G(a), . . . , Djm G(a). Reordering the coordinates if necessary, we can suppose without loss of
generality that j1 = n − m + 1, j2 = n − m + 2, . . . , jm = n (i.e., j1 , . . . , jm are the last m integers
between 1 and n), and we can relabel the coordinates of points of Rn as [x, y] with x ∈ Rn−m , y ∈ Rm
and the point a can be written also as [a 0 , b0 ] with a 0 ∈ Rn−m and b0 ∈ Rm . Then Dy G[x, y] is
an m × m nonsingular matrix at the point [a 0 , b0 ], so the Implicit Function Theorem applies with
[a 0 , b0 ] in place of [a, b] and hence the Def. 9.2 of Ch. 2 is checked.
87

APPENDIX A

Introductory Lectures on Real


Analysis
LECTURE 1: THE REAL NUMBERS
We assume without proof the usual properties of the integers: For example, that the integers are
closed under addition and subtraction, that the principle of mathematical induction holds for the
positive integers, and that 1 is the least positive integer.
We also assume the usual field and order properties for the real numbers R. Thus, we accept without
proof that the reals satisfy the field axioms, as follows:

F1 a + b = b + a, ab = ba (commutativity)

F2 a + (b + c) = (a + b) + c, a(bc) = (ab)c (associativity)

F3 a(b + c) = ab + ac (distributive law)

F4 ∃ elements 0, 1 with 0 = 1 such that 0 + a = a and 1 · a = a ∀ a ∈ R (additive and multiplica-


tive identities)

F5 ∀ a ∈ R, ∃ an element −a such that a + (−a) = 0 (additive inverse)

F6 ∀ a ∈ R with a  = 0, ∃ an element a −1 such that aa −1 = 1 (multiplicative inverse).

Terminology: a + (−b), ab−1 are usually written a − b , ab , respectively. Notice that the latter
makes sense only for b = 0.

Notice that all the other standard algebraic properties of the reals follow from these. (See Exercise 1.1
below.)

We here also accept without proof that the reals satisfy the following order axioms:

01 For each a ∈ R, exactly one of the possibilities a > 0, a = 0, −a > 0 holds.

02 a > 0 and b > 0 ⇒ ab > 0 and a + b > 0.


88 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

Terminology: a > b means a − b > 0, a ≥ b means that either a > b or a = b, a < b means b > a,
and a ≤ b means b ≥ a.

We claim that all the other standard properties of inequalities follow from these and from F1–F6.
(See Problem 1.2 below.)
Notice also that the above properties (i.e., F1–F6, O1, O2) all hold with the rational numbers
Q ≡ { pq : p, q are integers, q  = 0} in place of R. F1–F6 also hold with the complex numbers C =
{x + iy : x, y ∈ R} in place of R, but inequalities like a > b make no sense for complex numbers.

In addition to F1–F6, O1, O2 there is one further key property of the real numbers. To discuss it
we need first to introduce some terminology.

Terminology: If S ⊂ R we say:

(1) S is bounded above if ∃ a number K ∈ R such that x ≤ K ∀ x ∈ S. (Any such number K is called
an upper bound for S.)
(2) S is bounded below if ∃ a number k ∈ R such that x ≥ k ∀ x ∈ S. (Any such number k is called
a lower bound for S.)
(3) S is bounded if it is both bounded above and bounded below. (This is equivalent to the fact that
∃ a positive real number L such that |x| ≤ L ∀ x ∈ S.)

We can now introduce the final key property of the real numbers.

C. (“Completeness property of the reals”): If S ⊂ R is nonempty and bounded above, then S has
a least upper bound.
Notice that the terminology “least upper bound” used here means exactly what it says: a number α
is a least upper bound for a set S ⊂ R if
(i) x ≤ α ∀ x ∈ S i.e., α is an upper bound), and
(ii) if β is any other upper bound for S, then α ≤ β i.e., α is ≤ any other upper bound for S).

Such a least upper bound is unique, because if α1 , α2 are both least upper bounds for S, the property (ii)
implies that both α1 ≤ α2 and α2 ≤ α1 , so α2 = α1 . It therefore makes sense to speak of the least
upper bound of S (also known as “the supremum” of S). The least upper bound of S will henceforth
be denoted sup S. Notice that property C guarantees sup S exists if S is nonempty and bounded
above.

Remark: If S is nonempty and bounded below, then it has a greatest lower bound (or “infimum”),
which we denote inf S. One can prove the existence of inf S (if S is bounded below and nonempty)
by using property C on the set −S = {−x : x ∈ S}. (See Exercise 1.5 below.)
We should be careful to distinguish between the maximum element of a set S (if it exists) and the
supremum of S. Recall that we say a number α is the maximum of S (denoted max S) if
LECTURE 1. THE REAL NUMBERS 89

(i) x ≤ α ∀ x ∈ S (i.e., α is an upper bound for S), and


(ii) α ∈ S.
These two properties say exactly that α is a upper bound for S and also one of the elements of S.
Thus, clearly a maximum α of S, if it exists, satisfies both (i), (ii) and hence must agree with sup S.
But keep in mind that max S may not exist, even if the set S is nonempty and bounded above: for
example, if S = (0, 1) (= {x ∈ R : 0 < x < 1}), then sup S = 1, but max S does not exist, because
1∈/ S.

Notice that of course any finite nonempty set S ⊂ R has a maximum. One can formally prove this
by induction on the number of elements of the set S. (See Exercise 1.3 below.)

Using the above-mentioned properties of the integers and the reals it is now possible to give formal
rigorous proofs of all the other properties of the reals which are used, even the ones which seem self-
evident. For example, one can actually prove formally, using only the above properties, the fact that
the set of positive integers are not bounded above. (Otherwise there would be a least upper bound
α so, in particular, we would have n ≤ α for each positive integer n and hence, in particular, n ≤ α
hence n + 1 ≤ α for each n, or in other words n ≤ α − 1 for each positive integer n, contradicting
the fact that α is the least upper bound!) Thus, we have shown rigorously, using only the axioms
F1–F6, O1, O2, and C, that the positive integers are not bounded above. Thus,
# 1 1
$
∀ positive a ∈ R, ∃ a positive integer n with n > a i.e., < .
n a

This is referred to as “The Archimedean Property” of the reals.


Similarly, using only the axioms F1–F6, O1,2, and C, we can give a formal proof of all the basic
properties of the real numbers—for example, in Problem 1.7 below you are asked to prove that square
roots of positive numbers do indeed exist.

Final notes on the Reals: (1) We have assumed without proof all properties F1–F6, O1,O2 and C. In
a more advanced course we could, starting only with the positive integers, give a rigorous construction
of the real numbers, and prove all the properties F1–F6, O1, O2, and C. Furthermore, one can prove
(in a sense that can be made precise) that the real number system is the unique field with all the
above properties.
(2) You can of course freely use all the standard rules for algebraic manipulation of equations and
inequalities involving the reals; normally you do not need to justify such things in terms of the axioms
F1–F6, O1, O2 unless you are specifically asked to do so.

LECTURE 1 PROBLEMS
1.1 Using only properties F1–F6, prove
(i) a · 0 = 0∀a ∈ R
(ii) ab = 0 ⇒ either a = 0 or b = 0
90 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

(iii) a
b + c
d = ad+bc
bd ∀ a, b, c, d ∈ R with b  = 0, d  = 0.
Note: In each case present your argument in steps, stating which of F1–F6 is used at each step.
Hint for (iii): First show the “cancellation law” that x
y = xz
yz for any x, y, z ∈ R with y  = 0, z  = 0.
1.2 Using only properties F1–F6 and O1,O2, (and the agreed terminology) prove the following:
(i) a > 0 ⇒ 0 > −a i.e., −a < 0)
(ii) a > 0 ⇒ 1
a >0
(iii) a > b > 0 ⇒ 1
a < 1
b
(iv) a > b and c > 0 ⇒ ac > bc.
1.3 If S is a finite nonempty subset of R, prove that max S exists. (Hint: Let n be the number of
elements of S and try using induction on n.)
1.4 Given any number x ∈ R, prove there is an integer n such that n ≤ x < n + 1.
Hint: Start by proving there is a least integer > x.
Note: 1.4 Establishes rigorously the fact that every real number x can be written in the form x =
integer plus a remainder, with the remainder ∈ [0, 1). The integer is often referred to as the “integer
part of x.” We emphasize once again that such properties are completely standard properties of the
real numbers and can normally be used without comment; the point of the present exercise is to show
that it is indeed possible to rigorously prove such standard properties by using the basic properties
of the integers and the axioms F1–F6, O1,O2, C.
1.5 Given a set S ⊂ R, −S denotes {−x : x ∈ S}. Prove:
(i) S is bounded below if an only if −S is bounded above.
(ii) If S is nonempty and bounded below, then inf S exists and = − sup(−S).
(Hint: Show that α = − sup (−S) has the necessary 2 properties which establish it to be the greatest
lower bound of S.)
1.6 If S ⊂ R is nonempty and bounded above, prove ∃ a sequence a1 , a2 , . . . of points in S with
lim an = sup S.
Hint: In case sup S ∈
/ S, let α = sup S, and for each integer j ≥ 1 prove there is at least one element
aj ∈ S with α > aj > α − j1 .
1.7 Prove that every positive real number has a positive square root. (That is, for any a > 0, prove
there is a real number α > 0 such that α 2 = a.)
Hint: Begin by observing that S = {x ∈ R : x > 0 and x 2 < a} is nonempty and bounded above,
and then argue that sup S is the required square root.
LECTURE 2. SEQUENCES OF REAL NUMBERS 91

LECTURE 2: SEQUENCES OF REAL NUMBERS AND THE


BOLZANO-WEIERSTRASS THEOREM

Let a1 , a2 , . . . be a sequence of real numbers; an is called the n-th term of the sequence.We sometimes
use the abbreviation {an } or {an }n=1,2,...

Technically, we should distinguish between the sequence {an } and the set of terms of the sequence—i.e.,
the set S = {a1 , a2 . . . }. These are not the same: e.g., the sequence 1, 1, . . . has infinitely many
terms each equal to 1, whereas the set S is just the set {1} containing one element.
Formally, a sequence is a mapping from the positive integers to the real numbers; the nth term an of
the sequence is just the value of this mapping at the integer n. From this point of view—i.e., thinking
of a sequence as a mapping from the integers to the real numbers—a sequence has a graph consisting
of discrete points in R2 , one point of the graph on each of the vertical lines x = n. Thus, for example,
the sequence 1, 1, . . . (each term = 1) has graph consisting of the discrete points marked “⊗” in the
following figure:

Figure A.1: Graph of the trivial sequence {an }n=1,... where an = 1 ∀ n.

Terminology: Recall the following terminology. A sequence a1 , a2 , . . . is:

(i) bounded above if ∃ a real number K such that an ≤ K ∀ integer n ≥ 1.


(ii) bounded below if ∃ a real number k such that an ≥ k ∀ integer n ≥ 1.
(iii) bounded if it is both bounded above and bounded below. (This is equivalent to the fact that ∃
a real number L such that |an | ≤ L ∀ integer n ≥ 1.)
(iv) increasing if an+1 ≥ an ∀ integer n ≥ 1.
(v) strictly increasing if an+1 > an ∀ integer n ≥ 1.
(vi) decreasing if an+1 ≤ an ∀ integer n ≥ 1.
(vii) strictly decreasing if an+1 < an ∀ integer n ≥ 1.
(viii) monotone if either the sequence is increasing or the sequence is decreasing.
92 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

(ix) We say the sequence has limit  ( a given real number) if for each ε > 0 there is an integer
N ≥ 1 such that
(∗) |an − | < ε ∀ integer n ≥ N .3

(x) In case the sequence {an } has limit  we write


lim an =  or lim an =  or an →  .
n→∞

(xi) We say the sequence {an } converges (or “is convergent”) if it has limit  for some  ∈ R.

Theorem 2.1. If {an } is monotone and bounded, then it is convergent. In fact, if S = {a1 , a2 . . . } is the
set of terms of the sequence, we have the following:
(i) if {an } is increasing and bounded then lim an = sup S.
(ii) if {an } is decreasing and bounded then lim an = inf S.

Proof: See Exercise 2.2. (Exercise 2.2 proves part (i), but the proof of part (ii) is almost identical.)

Theorem 2.2. If {an } is convergent, then it is bounded.

Proof: Let l = lim an . Using the definition (ix) above with ε = 1, we see that there exists an integer
N ≥ 1 such that |an − l| < 1 whenever n ≥ N. Thus, using the triangle inequality, we have |an | ≡
|(an − l) + l| ≤ |an − l| + |l| < 1 + |l| ∀ integer n ≥ N . Thus,
|an | ≤ max{|a1 |, . . . |aN |, |l| + 1} ∀ integer n ≥ 1 .

Theorem 2.3. If {an }, {bn } are convergent sequences, then the sequences {an + bn }, {an bn } are also
convergent and
(i) lim(an + bn ) = lim an + lim bn
(ii) lim(an bn ) = (lim an ) · (lim bn ) .
In addition, if bn = 0 and lim bn  = 0, then
an lim an
(iii) lim = .
bn lim bn
Proof: We prove (ii); the remaining parts are left as an exercise. First, since {an }, {bn } are convergent,
the previous theorem tells us that there are constants L, M > 0 such that
(∗) |an | ≤ L and |bn | ≤ M ∀ positive integer n .
3 Notice that (∗) is equivalent to  − ε < a <  + ε ∀ integers n ≥ N.
n
LECTURE 2. SEQUENCES OF REAL NUMBERS 93

Now let l = lim an , m = lim bn and note that by the triangle inequality and (∗),
|an bn − lm| ≡ |an bn − lbn + lbn − lm|
≡ |(an − l)bn + l(bn − m|
≤ |an − l||bn | + |l||bn − m|
(∗∗) ≤ M|an − l| + |l||bn − m| ∀ n ≥ 1 .

On the other hand, for any given ε > 0 we use the definition of convergence, i.e., (ix) above) to
deduce that there exist integers N1 , N2 ≥ 1 such that
ε
|an − l| < ∀ integer n ≥ N1
2(1 + M + |l|)
and
ε
|bn − m| < ∀ integer n ≥ N2 .
2(1 + M + |l|)
Thus, for each integer n ≥ max{N1 , N2 }, (∗∗) implies
ε ε
|an bn − lm| < M + |l|
2(1 + M + |l|) 2(1 + M + |l|)
ε ε
< + =ε,
2 2
and the proof is complete because we have shown that the definition (ix) is satisfied for the sequence
an bn with lm in place of l.
Remark: Notice that if we take {an } to be the constant sequence −1, −1, . . . (so that trivially
lim an = −1) in part (ii) above, we get
lim(−bn ) = − lim bn .

Hence, using part (i) with {−bn } in place of bn we conclude


lim(an − bn ) = lim an − lim bn .

By similar argument (i), (ii), imply that


lim(αan + βbn ) = α lim an + β lim bn .

For any α, β ∈ R (provided lim an , lim bn both exist).


The following definition of subsequence is important.
Definition: Given a sequence a1 , a2 , . . . , we say an1 , an2 , . . . is a subsequence of {an } if n1 , n2 , . . .
are integers with 1 ≤ n1 < n2 < n3 < . . . . (Note the strict inequalities.)

Thus, {bn }n=1,2,... is a subsequence of {an }n=1,2,... if and only if for each j ≥ 1 both the following
hold:
(i) bj is one of the terms of the sequence a1 , a2 , . . . , and
94 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

(ii) the real number bj +1 appears later than the real number bj in the sequence a1 , a2 , . . .
Thus, {a2n }n=1,2,... = a2 , a4 , a6 . . . is a subsequence of {an }n=1,2,... and 21 , 13 , 41 , 15 . . . is a subse-
quence of { n1 }n=1,2,... but 13 , 21 , 41 , 15 , . . . is not a subsequence of { n1 }.

Theorem 2.4. (Bolzano-Weierstrass Theorem.) Any bounded sequence {an } has a convergent subse-
quence.
Proof: (“Method of bisection”) Since {an } is bounded we can find upper and lower bounds, respec-
tively. Thus, ∃ c < d such that
(∗) c ≤ ak ≤ d ∀ integer k ≥ 1 .
Now subdivide the interval [c, d] into the two half intervals [c, c+d2 ], [ 2 , d]. By (∗), at least one of
c+d

these (call it I1 say) has the property that there are infinitely many integers k with ak ∈ I1 . Similarly,
we can divide I1 into two equal subintervals (each of length d−c 4 ), at least one of which (call it I2 )
has the property that ak ∈ I2 for infinitely many integers k. Proceeding in this way we get a sequence
of intervals {In }n=1,2... with In = [cn , dn ] and with the properties that, for each integer n ≥ 1,
d −c
(1) length In ≡ dn − cn =
2n
(2) [cn+1 , dn+1 ] ⊂ [cn , dn ] ⊂ [c, d]
(3) ak ∈ [cn , dn ] for infinitely many integers k ≥ 1 .
(Notice that (3) says cn ≤ ak ≤ dn for infinitely many integers k ≥ 1.)
Using properties (1), (2), (3), we proceed to prove the existence of a convergent subsequence as
follows.
Select any integer k1 ≥ 1 such that ak1 ∈ [c1 , d1 ] (which we can do by (3)). Then select any integer
k2 > k1 with ak2 ∈ [c2 , d2 ]. Such k2 of course exists by (3). Continuing in this way we get integers
1 ≤ k1 < k2 . . . such that akn ∈ [cn , dn ] for each integer n ≥ 1. That is,
(4) cn ≤ akn ≤ dn ∀ integer n ≥ 1 .
On the other hand, by (1), (2) we have
(5) c ≤ cn ≤ cn+1 < dn+1 ≤ dn ≤ d ∀ integer n ≥ 1 .
Notice that (5), in particular, guarantees that {cn }, {dn } are bounded sequences which are, respectively,
increasing and decreasing, hence by Thm. 2.1 are convergent. On the other hand, (1) says dn − cn =
2n (→ 0 as n → ∞), hence lim cn = lim dn (=  say). Then by (4) and the Sandwich Theorem
d−c

(see Exercise 2.5 below) we see that {akn }n=1,2,... also has limit .

LECTURE 2 PROBLEMS
2.1 Use the Archimedean property of the reals (Lem. 1.1 of Lecture 1) to prove rigorously
that lim n1 = 0.
LECTURE 2. SEQUENCES OF REAL NUMBERS 95

2.2 Prove part (i) of Thm. 2.1.


Hint: Let α = sup S, and show first that for each ε > 0, ∃ an integer N ≥ 1 such that aN > α − ε.
2.3 Using the definition (ix) on p. 92 prove that a sequence {an } cannot have more than one limit.
2.4 If {an }, {bn } are given convergent sequences and an ≤ bn ∀ n ≥ 1, prove lim an ≤ lim bn .
Hint: lim(an − bn ) = lim an − lim bn , so it suffices to prove that lim cn ≤ 0 whenever {cn } is con-
vergent with cn ≤ 0 ∀ n.
2.5 (“Sandwich Theorem.”) If {an }, {bn } are given convergent sequences with lim an = lim bn ,
and if {cn } is any sequence such that an ≤ cn ≤ bn ∀ n ≥ 1, prove that {cn } is convergent and
lim cn = lim an (= lim bn ) (Hint: Let l = lim an = lim bn and use the definition (ix) on p. 92.)
2.6 If k is a fixed positive integer and if {an } is any sequence such that 1
nk
≤ an ≤ nk ∀ n ≥ 1,
1/n 1
prove that lim an = 1. (Hint: use 2.5 and the standard limit result that lim n n = 1.)
96 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

LECTURE 3: CONTINUOUS FUNCTIONS


Here we shall mainly be interested in real valued functions for some closed interval [a, b]; thus
f : [a, b] → R. (This is reasonable notation, since for each x ∈ [a, b], f assigns a value f (x) ∈ R.)
First we recall the definition of continuity of such a function.
Definition 1. f : [a, b] → R is said to be continuous at the point c ∈ [a, b], if for each ε > 0 there
is a δ > 0 such that
x ∈ [a, b] with |x − c| < δ ⇒ |f (x) − f (c)| < ε .

Definition 2. We say f : [a, b] → R is continuous if f is continuous at each point c ∈ [a, b].

We want to prove the important theorem that such a continuous function attains both its maximum
and minimum values on [a, b]. We first make the terminology precise.

Terminology: If f : [a, b] → R, then:


(1) f is said to attain its maximum at a point c ∈ [a, b] if f (x) ≤ f (c) ∀ x ∈ [a, b];
(2) f is said to attain its minimum at a point c ∈ [a, b] if f (x) ≥ f (c) ∀ x ∈ [a, b].

We shall also need the following lemma, which is of independent importance.

Lemma 3.1. If an ∈ [a, b] ∀n ≥ 1 and if lim an = c ∈ [a, b] and if f : [a, b] → R is continuous at c,


then
lim f (an ) = f (c) ,
i.e., the sequence {f (an )}n=1,2... converges to f (c).
Proof: Let ε > 0. By Def. 1 above, ∃ δ > 0 such that
(∗) |f (x) − f (c)| < ε whenever x ∈ [a, b]with |x − c| < δ .

On the other hand, by the definition of lim an = c, with δ in place of ε (i.e., we use the definition
(ix) on p. 92 of Lecture 2 with δ in place of ε ) we can find an integer N ≥ 1 such that
|an − c| < δ whenever n ≥ N .

Then, (since an ∈ [a, b] ∀ n) (∗) tells us that


|f (an )−f (c)| < ε ∀ n ≥ N .

Theorem 3.2. If f : [a, b] → R is continuous, then f is bounded and ∃ points c1 , c2 , ∈ [a, b] such
that f attains its maximum at the point c1 and its minimum at the point c2 ; that is, f (c) ≤ f (x) ≤
f (c2 ) ∀ x ∈ [a, b].
LECTURE 3. CONTINUOUS FUNCTIONS 97

Proof: It is enough to prove that f bounded above and that there is a point c1 ∈ [a, b] such that
f attains its maximum at c1 , because we can get the rest of the theorem by applying this results to
−f .
To prove f is bounded above we argue by contradiction. If f is not bounded above, then for each
integer n ≥ 1 we can find a point xn ∈ [a, b] such that f (x) ≥ n. Since xn is a bounded sequence,
by the Bolzano-Weierstrass Theorem (Thm. 2.4 of Lecture 2) we can find a convergent subsequence
xn1 , xn2 , . . . Let c = lim xnj
Of course, since a ≤ xnj ≤ b ∀j , we must have c ∈ [a, b]. Also, since 1 ≤ n1 < n2 < . . . (and since
n1 , n2 . . . are integers), we must have nj +1 ≥ nj + 1 , hence by induction on j
(1) nj ≥ j ∀ integer j ≥ 1 .

Now since xnj ∈ [a, b] and lim xnj = c ∈ [a, b] we have by Lem. 3.1 that lim f (xnj ) = f (c). Thus,
f (xnj )j =1,2,... is convergent, hence bounded by Thm. 2.2. But by construction f (xnj ) ≥ nj (≥ j by
(1)), hence f (xnj )j =1,2... is not bounded, a contradiction.This completes the proof that f is bounded
above.
We now want to prove f attains its maximum value at some point c1 ∈ [a, b]. Let S = {f (x) : x ∈
[a, b]}. We just proved above that S is bounded above, hence (since it is non empty by definition)
S has a least upper bound which we denote by M. We claim that for each integer n ≥ 1 there is a
point xn ∈ [a, b] such that f (xn ) > M − n1 . Indeed, otherwise M− n1 would be an upper bound for
S, contradicting the fact that M was chosen to be the least upper bound. Again we can use the
Bolzano-Weierstrass Theorem to find a convergent subsequence xn1 , xn2 . . . and again (1) holds.
Let c1 = lim xnj . By Lem. 3.1 again we have
(2) f (c1 ) = lim f (xnj ) .

However, by construction we have


1 1
M ≥ f (xnj ) > M − ≥M− (by (1)) .
nj j

And hence by the Sandwich Theorem (Exercise 2.4 of Lecture 2) we have lim f (xnj ) = M. By (2)
this gives f (c1 ) = M. But M is an upper bound for S = {f (x) : x ∈ [a, b]}, hence we have f (x) ≤
f (c1 ) ∀ x ∈ [a, b], as required.

An important consequence of the above theorem is the following.

Lemma (Rolle’s Theorem): If f : [a, b] → R is continuous, if f (a) = f (b) = 0 and if f is differ-


entiable at each point of (a, b), then there is a point c ∈ (a, b) with f (c) = 0.

Proof: If f is identically zero then f (c) = 0 for every point c ∈ (a, b), so assume f is not identically
zero. Without loss of generality we may assume f (x) > 0 for some x ∈ (a, b) (otherwise this
98 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

property holds with −f in place of f ). Then max f (which exists by Thm. 3.2) is positive and is
attained at some point c ∈ (a, b). We claim that f (c) = 0. Indeed, f (c) = limx→c f (x)−f
x−c
(c)
=
limx→c+ f (x)−f
x−c
(c)
= limx→c− f (x)−f
x−c
(c)
and the latter 2 limits are, respectively, ≤ 0 and ≥ 0. But
they are equal, hence they must both be zero.

Corollary (Mean Value Theorem): If f : [a, b] → R is continuous and f is differentiable at each


point of (a, b), then there is some point c ∈ (a, b) with f (c) = f (b)−f
b−a
(a)
.

Proof: Apply Rolle’s Theorem to the function f(x) = f (x) − f (a) − f (b)−f (a)
b−a (x − a).

LECTURE 3 PROBLEMS
3.1 Give an example of a bounded function f : [0, 1] → R such that f is continuous at each
point c ∈ [0, 1] except at c = 0, and such that f attains neither a maximum nor a minimum value.

3.2 Prove carefully (using the definition of continuity on p. 96) that the function f : [−1, 1] → R
defined by
+1 if 0 < x ≤ 1
f (x) =
0 if − 1 ≤ x ≤ 0

is not continuous at x = 0. (Hint: Show the definition fails with, e.g., ε = 21 .)

3.3 Let f : [a, b] → R be continuous, and let |f | : [a, b] → R be defined by |f |(x) = |f (x)|.
Prove that |f | is continuous.

3.4 Suppose f : [a, b] → R and c ∈ [a, b] are given, and suppose that lim f (xn ) = f (c) for
all sequences {xn }n=1,2,... ⊂ [a, b] with lim xn = c. Prove that f is continuous at c. (Hint: If not,
∃ ε > 0 such that (∗) on p. 96 fails for each δ > 0; in particular, ∃ ε > 0 such that (∗) fails for δ = n1 ∀
integer n ≥ 1.)

3.5 If f : [0, 1] → R is defined by

1 if x ∈ [0, 1] is a rational number


f (x) =
0 if x ∈ [0, 1] is not rational .

Prove that f is continuous at no point of [0, 1].


Hint: Recall that any interval (c, d) ∈ R (with c < d) contains both rational and irrational numbers.

3.6 Suppose f : (0, 1) → R is defined by f (x) = 0 if x ∈ (0, 1) is not a rational number, and
f (x) = 1/q if x ∈ (0, 1) can be written in the form x = pq with p, q positive integers without
common factors. Prove that f is continuous at each irrational value x ∈ (0, 1).
Hint: First note that for a given ε > 0 there are at most finitely many positive integers q with 1
q ≥ ε.
LECTURE 3. CONTINUOUS FUNCTIONS 99

3.7 Suppose f : [0, 1] → R is continuous, and f (x) = 0 for each rational point x ∈ [0, 1]. Prove
f (x) = 0 for all x ∈ [0, 1].
3.8 If f : R → R is continuous at each point of R, and if f (x + y) = f (x) + f (y) ∀x, y ∈ R,
prove ∃ a constant a such that f (x) = ax ∀x ∈ R. Show by example that the result is false if we
drop the requirement that f be continuous.
100 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

LECTURE 4: SERIES OF REAL NUMBERS


Consider the series
a1 + a2 + · · · + an + . . .

(usually written with summation notation as ∞ n=1 an ), where a1 , a2 , . . . is a given sequence of real
numbers. an is called the n-th term of the series. The sum of the first n terms is

n
sn = ak ;
k=1

sn is called the n-th partial sum of the series. If


sn → s

(i.e., if lim sn = s) for some s ∈ R, then we say the series converges, and has sum s. Also, in this case
we write
∞
s= an .
n=1
If sn does not converge, then we say the series diverges.

Example: If a ∈ R is given, then the series 1 + a + a 2 + . . . (i.e., the geometric series) has nth
partial sum
n if a = 1
sn = 1 + a + · · · + a n−1 = 1−a n
1−a if a = 1 .

Using the fact that a n → 0 if |a| < 1, we thus see that the series converges and has sum 1−a 1
if
|a| < 1, whereas the series diverges for |a| ≥ 1. (Indeed {sn } is unbounded if |a| > 1 or a = 1, and
if a = −1, {sn }n=1,2,... = 1, 0, 1, 0, . . . .)
The following simple lemma is of key importance.

Lemma 4.1. If ∞ n=1 an converges, then lim an = 0.
∞ 1
Note: The converse is not true. For example, we check below that n=1 n does not converge, but its
nth term is n1 , which does converge to zero.
Proof of Lemma 4.1: Let s = lim sn . Then of course we also have s = lim sn+1 . But, sn+1 − sn =
(a1 + a2 + · · · + an+1 ) − (a1 + a2 + · · · an ) = an+1 , hence (see the remark following Thm. 2.3 of
Lecture 2) we have lim an+1 = lim sn+1 − lim sn = s − s = 0. i.e., lim an = 0.
The following lemma is of theoretical and practical importance.

Lemma 4.2. If ∞ n=1 an , n=1 bn both converge, and have sum s, t, respectively, and if α, β are

real numbers, then ∞ (αa + βb ) also converges and has sum αs + βt, i.e., (αa n + βbn ) =
∞n=1 n
∞ n
∞ n=1
α ∞ a
n=1 n + β b
n=1 n if both n=1 n a , b
n=1 n converge.
LECTURE 4. SERIES OF REAL NUMBERS 101


Proof: Let sn = nk=1 an , tn = nk=1 bk .We are given sn → s and tn → t, then αsn + βtn → αs +

βt (see the remarks following Thm. 2.3 of Lecture 2). But αsn + βtn = α nk=1 an + β nk=1 bk =
n ∞
k=1 (αak + βbk ), which is the nth partial sum of n=1 (αan + βbn ).

There is a very convenient criteria for checking convergence in case all the terms are nonnegative.
Indeed, in this case
sn+1 − sn = an+1 ≥ 0 ∀ n ≥ 1 ,
hence the sequence {sn } is increasing if an ≥ 0. Thus, by Thm. 2.1(i) of Lecture 2 we see that {sn }
converges if and only if it is bounded. That is, we have proved:

Lemma 4.3. If each term of ∞ n=1 an is nonnegative (i.e., if an ≥ 0 ∀ n) then the series converges if and
only if the sequence of partial sums (i.e., {sn }n=1,2,... ) is bounded.

Example. Using the above criteria, we can discuss convergence of ∞ 1
n=1 np where p > 0 is given.
The nth partial sum in this case is
 n
1
sn = .
kp
k=1

Since 1
xp is a decreasing function of x for x > 0, we have, for each integer k ≥ 1,
1 1 1
≤ p ≤ p ∀ x ∈ [k, k + 1] .
(k + 1) p x k
Integrating, this gives
" k+1 " k+1 " k+1
1 1 1 1 1
≡ ≤ dx ≤ dx ≡ p ,
(k + 1)p k (k + 1)p k xp k k p k
so if we sum from k = 1 to n, we get

n " n+1  1 n
1 1
≤ dx ≤ .
(k + 1)p 1 xp kp
k=1 k=1

That is, " n+1 1


sn+1 − 1 ≤ dx ≤ sn ∀n ≥ 1 .
1 xp
But
" n+1 1 log(n + 1) if p = 1
dx = (n+1)1−p −1
1 xp 1−p if p = 1 .
Thus, we see that {sn } is unbounded if p ≤ 1 and bounded for p > 1, hence from Lem. 4.2 we
conclude
∞
1 converges p > 1
n=1
n p
diverges p≤1.
102 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

Remark. The above method can be modified to discuss convergence of other series. See Exercise 4.6
below.

Theorem 4.4. If ∞ n=1 |an | converges, then n=1 an converges.

Terminology: If n=1 |an | converges, then we say that ∞ n=1 an is absolutely convergent. Thus, with
this terminology, the above theorem just says “absolute convergence ⇒ convergence.”

Proof of Theorem 4.4: Let sn = nk=1 ak , tn = nk=1 |ak |. Then we are given tn → t for some
t ∈ R.
For each integer n ≥ 1, let

an if an ≥ 0
pn =
0 if an < 0
−an if an ≤ 0
qn =
0 if an > 0 ,
n n
and let sn+ = −
k=1 pn , sn = k=1 qn . Notice that for each n ≥ 1 we then have
an = pn − qn , sn = sn+ − sn−
|an | = pn + qn , tn = sn+ + sn− ,

and pn , qn ≥ 0. Also,
0 ≤ sn+ ≤ tn ≤ t and 0 ≤ sn− ≤ tn ≤ t .

Hence, we have shown that ∞n=1 pn , n=1 qn have bounded partial sums. Hence, by Lem. 4.3,
∞ ∞ ∞
both n=1 pn , n=1 qn converge. But then (by Lem. 4.2) ∞n=1 (pn − qn ) converges, i.e., n=1 an
converges.

Rearrangement of series: We want to show that the terms of an absolutely convergent series can
be rearranged in an arbitrary way without changing the sum. First we make the definition clear.
Definition: Let j1 , j2 , . . . be any sequence of positive integers in which every positive integer appears
once and only once (i.e., the mapping n → jn is a 1 : 1 mapping of the positive integers onto the

positive integers). Then the series ∞ n=1 ajn is said to be a rearrangement of the series n=1 an .
∞ ∞
Theorem 4.5. If n=1 an is absolutely convergent, then any rearrangement n=1 ajn converges, and

has the same sum as ∞ n=1 an .

Proof: We give the proof when an ≥ 0 ∀ n (in which case “absolute convergence” just means “con-
vergence”). This extension to the general case is left as a exercise. (See Exercise 4.8 below.)

Hence, assume ∞ n=1 an converges, and an ≥ 0 ∀ integer n ≥ 1, and let n=1 ajn be any rearrange-
ment. For each n ≥ 1, let
P (n) = max{j1 , . . . , jn } .
LECTURE 4. SERIES OF REAL NUMBERS 103

So that
{j1 , . . . , jn } ⊂ {1, . . . P (n)} and hence (since an ≥ 0 ∀ k)
aj1 + aj2 + · · · + ajn ≤ a1 + · · · + aP (n) ≤ s ,

where s = n=1 an . Thus, we have shown that the partial sums of ∞ n=1 ajn are bounded above
∞ ∞
by s, hence by Lem. 4.3, n=1 ajn converges, and has sum t satisfying t ≤ s. But n=1 an is a

rearrangement of ∞ n=1 ajn (using the rearrangement given by the inverse mapping jn → n), and
hence by the same argument we also have s ≤ t. Hence, s = t as required.

LECTURE 4 PROBLEMS
The first few problems give various criteria to test for absolute convergence (and also, in some cases,
for testing for divergence).

4.1 (i) (Comparison test.) If ∞ a , ∞ n=1 bn are given series and if |an | ≤ |bn | ∀ n ≥ 1, prove
∞ ∞n
n=1
b
n=1 n absolutely convergent ⇒ n=1 n absolutely convergent.
a
(ii) Use this to discuss convergence of:

(a) ∞ sin n
n=1 n2
sin( n1 )
(b) ∞ n=1 n .

4.2 (Comparison test for divergence.) If ∞ an , ∞ bn are given series with nonnegative
∞ n=1 ∞
n=1
terms, and if an ≥ bn ∀ n ≥ 1, prove n=1 bn diverges ⇒ n=1 an diverges.

4.3 (i) (Ratio Test.) If an  = 0 ∀ n, if λ ∈ (0, 1) and if there is an integer N ≥ 1 such that |a|an+1
n|
|


λ ∀ n ≥ N, prove that n=1 an is absolutely convergent. (Hint: First use induction to show that
|an | ≤ λn−N ∀ n ≥ N.)
|an+1 |
(ii) (Ratio test for divergence.) If an  = 0 ∀n and if ∃ an integer N ≥ 1 such that |an | ≥ 1 ∀n ≥ N

then prove ∞ n=1 an diverges.
1
4.4 (Cauchy root test.) (i) Suppose ∃ λ ∈ (0, 1) and an integer N ≥ 1 such that |an | n ≤ λ ∀ n ≥ N.

Prove that ∞ n=1 an converges.

(ii) Use the Cauchy root test to discuss convergence of ∞ n=1 n x . (Here x ∈ R is given—consider
2 n

the possibilities |x| < 1, |x| > 1, |x| = 1.)



4.5 Suppose an ≥ 0 ∀ n ≥ 1 and ∞ n=1 an diverges. Prove
∞ an
(i) n=1 1+an diverges
∞ an
(ii) n=1 1+n2 a converges.
n

4.6 (Integral Test.) If f : [1, ∞) → R is positive and continuous at each point of [1, ∞), and if f is
decreasing, i.e., x < y ⇒ f (y) ≤ f (x), prove ! n using a modification of the argument on pp. 100–102

that ∞ n=1 f (n) converges if and only if { 1 f (x) dx}n=1,2,... is bounded.
104 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

4.7 Using the integral test (in Exercise 4.6 above) to discuss convergence of
∞ 1
(i) n=2 n log n
∞ 1
(ii) n=2 n(log n)1+ε , where ε > 0 is a given constant.

4.8 Complete the proof of Thm. 4.5 (i.e., discuss the general case when |an | converges). (Hint:
The theorem has already been established for series of nonnegative terms; use pn , qn as in Thm. 4.4.)
LECTURE 5. POWER SERIES 105

LECTURE 5: POWER SERIES



A power series is a series of the form ∞ n
n=0 an x , where a0 , a1 , . . . are given real numbers and x is
a real variable. Here we use the standard convention that x 0 = 1, so the first term a0 x 0 just means
a0 .

Notice that for x = 0 the series trivially converges and its sum is a0 .

The following lemma describes the key convergence property of such series.

Lemma 1. If the series ∞ n=0 an x converges for some value x = c, then the series converges absolutely
n

for every x with |x| < |c|.



Proof: ∞ n=0 an c converges ⇒ lim an c = 0 ⇒ {an c }n=1,2,... is a bounded sequence. That is,
n n n

there is a fixed constant M > 0 such that |an c | ≤ M ∀ n = 0, 1, . . ., and so |x| < |c| ⇒, for any
n

j = 0, 1, . . .,
   xj   xj  # |x| $j |x|
|aj x j | = aj c j   j  ≤ M  j  = M = Mr j , r = <1,
c c |c| |c|
and hence
n−1 n−1 1 − rn M
j =0 |aj x |≤M ≤
=M , n = 1, 2, . . . .
j j
j =0 r
1−r 1−r

Thus, the series ∞ n=0 |an x | is convergent, because it has nonnegative terms and we’ve shown its
n

partial sums are bounded. This completes the proof.


We can now directly apply the above lemma to establish the following basic property of power series.

Theorem 5.1. For any given power series ∞ n
n=0 an x , exactly one of the following 3 possibilities holds:
(i) the series diverges ∀ x  = 0, or
(ii) the series converges absolutely ∀ x ∈ R, or
(iii) ∃ ρ > 0 such that the series converges absolutely ∀ x with |x| < ρ, and diverges ∀ x with |x| > ρ.
Terminology: If (iii) holds, the number ρ is called the radius of convergence and the interval (−ρ, ρ)
is called the interval of convergence. If (i) holds we say the radius of convergence is zero, and if (ii)
holds we say the radius of convergence =∞.

Note: The theorem says nothing about what happens at x = ±ρ in case (iii).

Proof of Theorem 5.1: Consider the set S ⊂ R defined by



S = {|x| : ∞ n
n=0 an x converges} .
Notice that we always have 0 ∈ S, so S is nonempty. If S = {0} then case (i) holds, so we can
assume S contains at least one c with c  = 0. If S is not bounded then by Lem. 1 we have that
∞ n
n=0 an x is absolutely convergent (A.C.) on (−R, R) for each R > 0, and hence (ii) holds. If
106 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

S = {0} is bounded then R = sup S exists and is positive. Now for any x ∈ (−R, R) we have c ∈ S
with |c| > |x| (otherwise |c| ≤ |x| for each c ∈ S meaning that |x| would be an upper bound for

S smaller than R, contradicting R = sup S), and hence by Lem. 1 an x n is A.C. So, in fact,
∞ ∞
n=0 an x is A.C. for each x ∈ (−R, R). We must of course also have n=0 an x diverges for
n n

each x with |x| > R because otherwise we would have x0 with |x0 | > R and n=0 an x0n convergent,
hence |x0 | ∈ S, which contradicts the fact that R is an upper bound for S.

Suppose now that a given power series ∞ n
n=0 an x has radius of convergence ρ > 0 (we include
here the case ρ = ∞, which is to be interpreted to mean that the series converges absolutely for all
x ∈ R).
A reasonable and natural question is whether or not we can also expand f (x) in terms of powers of
x − α for given α ∈ (−ρ, ρ).The following theorem shows that we can do this for |x − α| < ρ − |α|
(and for all x in case ρ = ∞).

Theorem 5.2 (Change of Base-Point.) If f (x) = ∞ n
n=0 an x has radius of convergence ρ > 0 or
ρ = ∞, and if |α| < ρ (and α arbitrary in case ρ = ∞), then we can also write

(∗) f (x) = ∞ m=0 bm (x − α)
m
∀ x with |x − α| < ρ − |α| (x, α arbitrary if ρ = ∞) ,
∞  n 
where bm = n=m m an α n−m (so, in particular, b0 = f (α)); part of the conclusion here is that the series

for bm converges, and the series m=0 bm (x − α)m converges for the stated values of x, α.

Note: The series on the right in (∗) is a power series in powers of x − α, hence the fact that it
converges for |x − α| < ρ − |α| means that it has radius of convergence (as a power series in powers
of x − α) ≥ ρ − |α| (and radius of convergence = ∞ in case ρ = ∞). Thus, in particular, the series
on the right of (∗) automatically converges absolutely for |x − α| < ρ − |α| by Lem. 1.
Proof of Theorem 5.2: We take any α with |α| < ρ and any x with |x − α| < ρ − |α| (α, x are

arbitrary if ρ = ∞), and we look at the partial
 sum SN = N

n
n=0 an x . Since the Binomial Theorem
n n n−m
tells us that x = (α + (x − α)) = m=0 m α
n n (x − α) , we see that SN can be written
m

 
N N n n n−m
n=0 an x = (x − α)m .
n
n=0 an m=0 α
m
Using the interchange of sums formula (see Problem 6.3 below)
N n N N
n=0 m=0 cnm = m=0 n=m cnm ,

this then gives


#   $
N N N n
(1) n=0 an x
n
= m=0 n=m an α n−m (x − α)m .
m
 n 1/n
Now
n since m → 1 as n → ∞ for each   fixed m, we see that for any ε > 0 we have N such that
≤ (1 + ε) n for all n ≥ N, hence |a n x n | ≤ |a ((1 + ε)x)n | ∀ n ≥ N and hence by the com-
m n n m n
parison test ∞ n=0 m n a x n also converges absolutely for all x ∈ (−ρ, ρ) (because |x| < ρ ⇒ (1 +
LECTURE 5. POWER SERIES 107

 n 
ε)|x| < ρ for suitable ε > 0).Thus, since |α| < ρ we have, in particular, that n absolutely
m an α is
N  n  ∞  n  ∞ n
convergent and we can substitute n=m m an α n−m = n=m m an α n−m − n=N +1 m an α n−m
in (1) above, whence (1) gives
#   $
N n N ∞ n
a
n=0 n x = m=0 n=m an α n−m
(x − α)m
m  
# n $

− N m=0 n=N +1 an α n−m (x − α)m
m
and
 #   $  #   $
 N ∞ n m N ∞ n
 m=0 n=N +1 a n α n−m
(x −α)  ≤ m=0 n=N +1 |an | |α|n−m
|x − α| m
m m
∞ # N n $
≡ n=N +1 m=0 |an | |α|n−m |x − α|m
# m $
∞ n n
≤ n=N +1 m=0 |an | |α|n−m
|x − α| m

∞ m
≡ n=N +1 |an |(|α|+|x −α|)n , |α|+|x −α| < ρ ,

where we used the Binomial Theorem again in the last line. Now observe that we have
∞ n ∞
n=N+1 |an |(|α| + |x − α|) = n=1 |an |(|α| + |x − α|)
n
N
− n=1 |an |(|α| + |x − α|)n → 0 as N → ∞ ,

so the above shows that limN →∞ Nm=0 bm (x − α) exists (and is real), i.e., the series
m b (x −
∞ m=0n m
α) converges for |x − α| < ρ − |α|, and that the sum of the series agrees with n=0 an x , so the
m

proof of Thm. 5.2 is complete.

LECTURE 5 PROBLEMS

5.1 (i) Suppose the radius of convergence of ∞ n
n=0 an x is 1, and the radius of convergence of
∞ ∞
n=0 (an + bn )x has radius of convergence 1.
n n
n=0 bn x is 2. Prove that
(Hint: Lemma 1 guarantees that {an x n }n=1,2,... is unbounded if |x| is greater than the radius of

convergence of ∞ n
n=0 an x .)

(ii) If both n=0 an x n , ∞ n
n=0 bn x have radius of convergence = 1, show that

(a) The radius of convergence of ∞ n=0 (an + bn )x is ≥ 1, and
n

(b) For any given number R > 1 you can construct examples with radius of convergence of

n=0 (an + bn )x = R.
n

5.2 If ∃ constants c, k > 0 such that c−1 n−k ≤ |an | ≤ cnk ∀n = 1, 2, . . ., what can you say about

the radius of convergence of ∞ n
n=0 an x .
108 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

LECTURE 6: TAYLOR SERIES REPRESENTATIONS


The change of base point theorem proved in Lecture 5 is actually quite strong; for example, it
makes it almost trivial to check that a power series is differentiable arbitrarily many times inside its
interval of convergence (i.e., a power series is “C ∞ ” in its interval of convergence), and furthermore
all the derivatives can be correctly calculated simply by differentiating each term (i.e., “termwise”
differentiation is a valid method for computing the derivatives of a power series in its interval of
convergence). Specifically, we have:

Theorem 6.1. Suppose the power series ∞ n=0 an x has radius of convergence ρ > 0 (or ρ = ∞), and
n

let f (x) = n=0 an x n for |x| < ρ. Then all derivatives f (m) (x), m = 1, 2, . . ., exist at every point x
with |x| < ρ, and, in fact,

f (m) (x) = ∞ n=m n(n − 1) · · · (n − m + 1)an x
n−m
, x ∈ (−ρ, ρ) ,
which says precisely that the derivatives of f can be correctly computed simply by differentiating the series

an x n termwise (because n(n − 1) · · · (n − m + 1)an x n−m is just the mth derivative of an x n ); that
d ∞ ∞ d m
n=0
m
n=0 an x = n=0 dx m an x for |x| < ρ.
is, dx n n
m

Proof: It is enough to check the stated result m = 1, because then the general result follows directly
by induction on m.
The proof for m = 1 is an easy consequence of Thm. 5.2 (change of base-point theorem), which
tells us that we can write

(1) f (x) − f (α) = ∞ m=1 bm (x − α) ≡ (x − α)b1 + (x − α) m=2 bm (x − α)
m m−1

∞ n
for |x − α| < ρ − |α|, where bm = n=m m an α
n−m . That is,
f (x) − f (α) − b1 (x − α) ∞
(2) = m=2 bm (x − α)m−1 , 0 < |x − α| < ρ − |α|
x−α
and the expression on the right is a convergent power series with radius of convergence at least
r = ρ − |α| > 0 and hence is A.C. if x − α is in the interval of convergence (−r, r). In particular,
it is A.C. when |x − α| = r/2 and hence (2) shows that
 f (x) − f (α) 
 
(3)  − b1  ≤ ∞
m=2 |bm | |x − α|
m−1
≤ |x − α| ∞ m=2 |bm |(r/2)
m−2
x−α
for 0 < |x − α| < r/2 (where r = ρ − |α| > 0). Since the right side in (3) → 0 as x → α, this

shows that f (α) exists and is equal to b1 = ∞
n=1 nan α
n−1 .

We now turn to the important question of which functions f can be expressed as a power series on
some interval. Since we have shown power series are differentiable to all orders in their interval of
convergence, a necessary condition is clearly that f is differentiable to all orders; however, this is not
LECTURE 6. TAYLOR SERIES REPRESENTATIONS 109

sufficient; see Exercise 6.2 below. To get a reasonable sufficient condition, we need the following
theorem.

Theorem 6.2 (Taylor’s Theorem.) Suppose r > 0, α ∈ R and f is differentiable to order m + 1 on the
interval |x − α| < r. Then ∀ x with |x − α| < r, ∃ c between α and x such that
m f (n) (α) f (m+1) (c)
f (x) = (x − α)n + (x − α)m+1 .
n=0
n! (m + 1)!
Proof: Fix x with 0 < x − α < r (a similar argument holds in case −r < x − α < 0), and, for
|t − α| < r, define
m f (n) (α)
(1) g(t) = f (t) − n=0 (t − α)n − M(t − α)m+1 ,
n!
where M (constant) is chosen so that g(x) = 0, i.e.,
f (n) (α)
(f (x) − m n=0 n! (x − α) )
n
M= .
(x − α)m+1
Notice that, by direct computation in (1),

g (n) (α) = 0 ∀ n = 0, . . . , m
(2)
g (m+1) (t) = f m+1 (t) − M(m + 1)!, |t − α| < r.

In particular, since g(α) = g(x) = 0, the mean value theorem tells us that there is c1 ∈ (α, x) such
that g (c1 ) = 0. But then g (α) = g (c1 ) = 0, and hence again by the mean value theorem there is
a constant c2 ∈ (α, c1 ) such that g (c2 ) = 0.
After (m + 1) such steps we deduce that there is a constant cm+1 ∈ (α, x) such that g (m+1) (cm+1 ) =
0. However, by (2), g (m+1) (t) = f (m+1) (t) − M(m + 1)!, hence this gives
f (m+1) (cm+1 )
M= .
(m + 1)!
In view of our definition of M, this proves the theorem with c = cm+1 .

Theorem 6.2 gives us a satisfactory sufficient condition for writing f in terms of a power series.
Specifically we have:

Theorem 6.3. If f (x) is differentiable to all orders in |x − α| < r, and if there is a constant C > 0 such
that
 f (n) (x) 
  n
(∗)   r ≤ C ∀ n ≥ 0, and ∀ x with |x − α| < r ,
n!

then ∞ f (n) (α)
n=0 n! (x − α) converges, and has sum f (x), for every x with |x − α| < r.
n
110 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS


Note 1: Whether or not (∗) holds, and whether or not ∞ f (n) (α)
n=0 n! (x − α) converges to f (x), we
n
∞ f (n) (α)
call the series n=0 n! (x − α)n the Taylor series of f about α.
Note 2: Even if the Taylor series converges in some interval x − α < r, it may fail to have sum f (x).
(See Exercise 6.2 below). Of course the above theorem tells us that it will have sum f (x) in case
the additional condition (∗) holds.
(m+1)
Proof of Theorem 6.3: The condition (∗) guarantees that the term f(m+1)!(c) (x − α)m+1 on the right
 m+1
in Thm. 6.2 has absolute value ≤ C |x−α|
r and hence Thm. 6.2, with m = N , ensures
   
f (x) − N f (α) (x − α)n  ≤ C |x − α| N+1 → 0 as N → ∞
(n)
n=0
n! r
if |x − α| < r. Hence,
N f (n) (α)
lim n=0 (x − α)n = f (x)
N →∞ n!
whenever |x − α| < r, i.e.,
∞ f (n) (α)
n=0 (x − α)n = f (x), |x − α| < r .
n!

LECTURE 6 PROBLEMS
6.1 Prove that the function f , defined by

e−1/x
2
if x  = 0
f (x) =
0 if x = 0 ,

is C ∞ on R and satisfies f (m) (0) = 0 ∀ m ≥ 0.

Note: This means the Taylor series of f (x) about 0 is zero; i.e., it is an example where the Taylor
series converges, but the sum is not f (x).

6.2 If f is as in 6.1, prove that there does not exist any interval (−ε, ε) on which f is represented

by a power series; that is, there cannot be a power series ∞ n=0 an x such that f (x) =
n
n=0 an x
n

for all x ∈ (−ε, ε).

6.3 Let bnm be arbitrary real numbers 0 ≤ n ≤ N, 0 ≤ m ≤ n. Prove


N n N N
n=0 m=0 bnm = m=0 n=m bnm .

bnm if m ≤ n
Hint: Define b̃nm =
0 if n < m ≤ N.
LECTURE 6. TAYLOR SERIES REPRESENTATIONS 111

6.4 Find the Taylor series about x = 0 of the following functions; in each case prove that the series
converges to the function in the indicated interval.
(i) 1
1−x 2
, |x| < 1 (Hint: 1
1−y = 1 + y + y 2 . . . , |y| < 1).
(ii) log(1 + x), |x| < 1
(iii) ex , x ∈ R
2
(iv) ex , x ∈ R (Hint: set y = x 2 ).

6.5 (The analytic definition of the functions cos x, sin x, and the number π .)

Let sin x, cos x be defined by sin x = ∞ n x 2n+1
n=0 (−1) (2n+1)! , cos x =
n x 2n
n=0 (−1) (2n)! . For convenience
of notation, write C(x) = cos x, S(x) = sin x. Prove:
(i) The series defining S(x), C(x) both have radius of convergence ∞, and S (x) ≡ C(x), C (x) ≡
−S(x) for all x ∈ R.
(ii) S 2 (x) + C 2 (x) ≡ 1 for all x ∈ R. Hint: Differentiate and use (i).
(iii) sin, cos (as defined above) are the unique functions S, C on R with the properties (a) S(0) =
0, C(0) = 1 and (b) S (x) = C(x), C (x) = −S(x) for all x ∈ R. Hint: Thus, you have to show
that S = S and C  = C assuming that properties (a),(b) hold with   in place of S, C, respectively;
S, C
show that Thm. 6.3 is applicable.
(iv) C(2) < 0 and hence there is a p ∈ (0, 2) such that C(p) = 0, S(p) = 1 and C(x) > 0 for all
x ∈ [0, p).
# x 4k+2 $
Hint: C(x) can be written 1 − x2 + x24 − ∞
2 4 x 4k+4
k=1 (4k+2)! − (4k+4)! .

(v) S, C are periodic on R with period 4p. Hint: Start by defining C(x) = S(x + p) and 
S(x) =
−C(x + p) and use the uniqueness result of (iii) above.
Note: The number 2p, i.e., twice the value of the first positive zero of cos x, is rather important, and
we have a special name for it—it usually denoted by π . Calculation shows that π = 3.14159 . . ..
(vi) γ (x) = (C(x), S(x)), x ∈ [0, 2π ] is a C 1 curve, the mapping γ |[0, 2π ) is a 1:1 map of [0, 2π)
onto the unit circle S 1 of R2 , and the arc-length S(t) of the part of the curve γ |[0, t] is t for each
t ∈ [0, 2π ]. (See Figure A.2.)

Remark: Thus, we can geometrically think of the angle between e1 and (C(t), S(t)) as t (that’s t
radians, meaning the arc on the unit circle going from e1 to P = (C(t), S(t)) (counterclockwise) has
length t as you are asked to prove in the above question) and we have the geometric interpretation
that C(t)(= cos t) and S(t)(= sin t) are, respectively, the lengths of the adjacent and the opposite
sides of the right triangle with vertices at (0, 0), (0, C(t)), (C(t), S(t)) and angle t at the vertex
(0, 0), at least if 0 < t < p = π2 . (Notice this is now a theorem concerning cos t, sin t as distinct
from a definition.)
112 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

Figure A.2:

Note: Part of the conclusion of (vi) is that the length of the unit circle is 2π. (Again, this becomes
a theorem; it is not a definition—π is defined to be 2p, where p, as in (iv), is the first positive zero

of the function cos x = ∞ n x 2n
n=0 (−1) (2n)! .)
LECTURE 7. COMPLEX SERIES AND COMPLEX EXPONENTIAL SERIES 113

LECTURE 7: COMPLEX SERIES, PRODUCTS OF SERIES, AND


COMPLEX EXPONENTIAL SERIES

In Real Analysis Lecture 4 we discussed series ∞ n=1 an where an ∈ R. One can analogously consider
complex series, i.e., the case when an ∈ C. The definition of convergence is exactly the same as in

the real case. That is we say the series converges if the nth partial sum (i.e., nj=1 aj ) converges;
more precisely:
n
Definition: ∞ n=1 an converges if the sequence of partial sums { j =1 aj }n=0,1,... is a
convergent
sequence in C; that is, there is a complex number s = u + iv (u, v real) such that limn→∞ nj=1 aj =
s.

Note: Of course here we are using the terminology that a sequence {zn }n=1,2,... ⊂ C converges, with
limit a = α + iβ, if the real sequence |zn − a| has limit zero, i.e., limn→∞ |zn − a| = 0. In terms of
“ε, N” this is the same as saying that for each ε > 0 there is N such that |zn − a| < ε for all n ≥ N.
Writing zn in terms of its real and imaginary parts, zn = un + ivn , then this is the same as saying
un → α and vn → β, so applying this to the sequence of partial sums we see that the complex series
∞ ∞
n=1 an with an = αn + iβn is convergent if and only if both of the real series ∞n=1 αn , n=1 βn

are convergent, and in this case ∞ n=1 na = ∞
α
n=1 n + i ∞
n=1 nβ .
Most of the theorems we proved for real series carry over, with basically the same proofs, to

the complex case. For example, if ∞ n=1 an , n=1 bn are convergent complex series then the se-

ries n=1 (an + bn ) is convergent and its sum (i.e., limn→∞ nj=1 (aj + bj )) is just ∞ n=1 an +

n=1 bn .

Also, again analogously to the real case, we say the complex series ∞n=1 an is absolutely convergent

if n=1 |an | is convergent. We claim that just as in the real case absolute convergence implies
convergence.

∞ ∞
Lemma 1. The complex series n=1 an is convergent if n=1 |an | is convergent.

Proof: Let αn , βn denote the real and imaginary parts of an , so that an = αn + iβn and |an | =

αn2 + βn2 ≥ max{|αn |, |βn |}, so ∞
n=1 |an | converges ⇒ ∃ a fixed M > 0 such that nj=1 |aj | ≤
n
M ∀ n ⇒ j =1 |αj | ≤ M ∀ n ⇒ ∞ j =1 |αn | is convergent, so

α is absolutely conver-
∞ n n=1 n n
gent, hence convergent. Similarly, n=1 βn is convergent. But j =1 aj = j =1 αj + i nj=1 βj
n n n ∞
and so limn→∞ j =1 an exists and equals limn→∞ j =1 αj + i limn→∞ j =1 βj = n=1 αn +

i ∞ n=1 βn , which completes the proof.


We next want to discuss the important process of multiplying two series: If ∞ n=0 an and b
N n=0 n
are given complex series, we observe that the product of the partial sums, i.e., the product ( n=0 an ) ·

( N n=0 bn ), is just the sum of all the elements in the rectangular array
114 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

⎛ ⎞
aN b0 aN b1 aN b2 · · · · · · · · · aN bN
⎜aN−1 b0 aN −1 b1 aN −1 b2 · · · · · · · · · aN −1 bN ⎟
⎜ ⎟
⎜ .. .. .. .. ⎟
⎜ . . . . ⎟
⎜ ⎟
⎜ .. .. .. .. ⎟
⎜ . . . . ⎟
⎜ ⎟
⎜ a2 b 0 a2 b1 a2 b2 ··· ··· ··· a2 bN ⎟
⎜ ⎟
⎝ a1 b0 a1 b1 a1 b2 ··· ··· ··· a1 bN ⎠
a0 b0 a0 b1 a0 b2 ··· ··· ··· a0 bN

#
N $#
N $ 2N #
  $
an bn = ai bj ,
n=0 n=0 n=0 0≤i,j ≤N, i+j =n

and observe that if i, j ≥ 0 and i + j ≤ N we automatically have i, j ≤ N, and so with cn =


n
i,j ≥0, i+j =n ai bj (= i=0 ai bn−i ) we see that the right side of the above identity can be written
N
n=0 cn + 0≤i,j ≤N, i+j >N ai bj , and so we have the identity

#
N $#
N $ 
N 
(∗) an bn − cn = ai bj
n=0 n=0 n=0 0≤i,j ≤N, i+j >N


for each N = 0, 1, . . .. (Geometrically, Nn=0 cn is the sum of the lower triangular elements of the
array, including the leading diagonal, and the term on the right of (∗) is the sum of the remaining,

upper triangular, elements.) If the given series an , bn are absolutely convergent, we show

below that the right side of (∗) → 0 as N → ∞, so that ∞ n=0 cn converges and has sum equal to
∞ ∞
( n=0 an ) ( n=0 bn ). That is:

Lemma (Product Theorem.) If an and b are absolutely convergent complex series, then
∞ ∞ ∞ nn
( n=0 an )( n=0 bn ) = n=0 cn , where cn = i=0 ai bn−i for each n = 0, 1, 2, . . .; furthermore,

the series cn is absolutely convergent.

Proof: By (∗) we have


#
N $#
N $ N     
   
 ai bj − cn  =  a i bj  ≤ |ai ||bj |
i=0 j =0 n=0 i,j ≤N, i+j >N i,j ≤N, i+j >N
 
≤ |ai ||bj | + |ai ||bj |
i,j ≤N, i>N/2 i,j ≤N, j >N/2
#  N $#
N $# N $#  N $
= |ai | |bj | |ai | |bj |
i=[N/2]+1 j =0 i=0 j =[N/2]+1
LECTURE 7. COMPLEX SERIES AND COMPLEX EXPONENTIAL SERIES 115

# ∞
 $#
∞ $ #
∞ $# ∞
 $
≤ |ai | |bj | + |ai | |bj |
i=[N/2]+1 j =0 i=0 j =[N/2]+1
→ 0 as N → ∞ ,

where [N/2] = N2 if N is even and [N/2] = N 2−1 if N is odd. Notice that in the last line we used
[N/2]
the fact that ∞ |a | = ∞ i=0 |ai | − i=0 |ai | → 0 as N → ∞ because by definition
∞ J i
i=[N/2]+1
∞ ∞ [N/2]
n=0 |a n | = lim J →∞ n=0 |a n |, and similarly, j =[N/2]+1 |b j | = j =0 |b j | − j =0 |bj | → 0,
∞ J
because n=0 |bn | = limJ →∞ n=0 |bn |.
∞ ∞ ∞
This completes the proof that ∞ n=0 cn converges, and n=0 cn = ( n=0 an ) ( n=0 bn ).
 
To prove that ∞  
n=0 |cn | converges, just note that for each n ≥ 0 we have |cn | ≡ i+j =n ai bj
≤ i+j =n |ai ||bj | ≡ Cn say, and the above argument, with |ai |, |bj | in place of ai , bj , respectively,

shows that ∞ n=0 Cn converges, so by the comparison test n=0 |cn | also converges.

We now define the complex exponential series.


Definition: The complex exponential function exp z (also denoted ez ) is defined by exp z =
∞ z n
n=0 n! .
|z|n
Observe that this makes sense for all z because the series ∞ is the real exponential series,
∞ znn!
n=0
which we know is convergent on all of R, so that the series n=0 n! is absolutely convergent (hence
convergent by Lem. 1) for all z.
We can use the above product theorem to check the following facts, which explains why the notation
ez is sometimes used instead of exp z:
(i) (exp a)(exp (b) = exp(a + b), a, b ∈ C,
(ii) exp ix = cos x + i sin x, x ∈ R .

The proof is left as an exercise (Exercises 7.1 and 7.2 below).


Notice that it follows from (i) that exp z is never zero (because by (i) (exp z) (exp −z) = exp 0 =
1  = 0 ∀ z ∈ C).

LECTURE 7 PROBLEMS
7.1 Use the product theorem to show that exp a exp b = exp(a + b) for all a, b ∈ C.

7.2 Justify the formula exp ix = cos x + i sin x for all x ∈ R.



Note: cos, sin are defined by cos x = ∞ k x 2k
k=0 (−1) (2k)! and sin x =
k x 2k+1
k=0 (−1) (2k+1)! for all real x.
116 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

LECTURE 8: FOURIER SERIES


So far we have only considered vectors in Rn , but of course the concept of vector is really much
more general. Indeed, any collection of objects V which have a well defined operation of addition
and multiplication by real scalars such that following eight vector space axioms are satisfied, is called
a real vector space and the objects themselves are called vectors:

8.1 Vector Space Axioms for V :


We assume that for v, w ∈ V and λ ∈ R there are well-defined operations “+” (addition) and “.”
(multiplication of vector by a scalar) to give v + w ∈ V and λ · v ∈ V (henceforth written just as
λv) and that the following properties hold:
(i) v, w ∈ V ⇒ v + w = w + v
(ii) u, v, w ∈ V ⇒ (u + v) + w = u + (v + w)
(iii) ∃ 0 ∈ V (called the “zero vector”) with v + 0 = v ∀ v ∈ V
(iv) v ∈ V ⇒ ∃ an element −v with v + (−v) = 0
(v) λ, μ ∈ R, v ∈ V ⇒ λ(μv) = (λμ)v
(vi) λ, μ ∈ R, v ∈ V ⇒ (λ + μ)v = (λv) + (μv)
(vii) λ ∈ R, v, w ∈ V ⇒ λ(v + w) = (λv) + (λw)
(viii) v ∈ V ⇒ 1 v = v.
Note: In a general vector space we usually write v − w as an abbreviation for v + (−w).

There are many examples of such general real vector spaces: For example, the set C 0 ([a, b]) of
continuous functions f : [a, b] → R trivially satisfies the above axioms if addition and multiplication
by scalars are defined pointwise (i.e., f, g ∈ C 0 ([a, b]) ⇒ f + g ∈ C 0 ([a, b]) is defined by (f +
g)(x) = f (x) + g(x)∀x ∈ [a, b], and λ ∈ R and f ∈ C 0 ([a, b]) ⇒ λf is defined by (λf )(x) =
λf (x) for x ∈ [a, b]). In this case, the zero vector is the zero function 0(x) = 0 ∀x ∈ [a, b]. Most
of the definitions and theorems for the Euclidean space Rn carry over with only notational changes
to general real vector spaces. For example, the linear dependence lemma holds in a general vector
space and the basis theorem applies in any subspace which is the span of finitely many vectors.

Suppose now that V is a real vector space equipped with a inner product  , . Thus, v, w is real
valued and
(i) v, w = w, v ∀ v, w ∈ V ,
(ii) αv1 + βv2 , w = αv1 , w + βv2 , w ∀ v1 , v2 , w ∈ V and α, β ∈ R, and
(iii) v, v ≥ 0 ∀ v ∈ V with equality if and only if v = 0.

We define the norm of V , denoted  , by v = v, v .
LECTURE 8. FOURIER SERIES 117

Notice that then the inner product and the norm have properties very similar to the dot product and
norm of vectors in Rn (indeed an important special case of inner product is the case V = Rn and
v, w = v · w). In the general case, by properties (i), (ii), (iii), we have
(1) v − w2 = v2 + w2 − 2v, w ∀ v, w ∈ V ,

and we can prove various other results which have analogues in the special case when V = Rn and
v, w = v · w; for example, there is a Cauchy-Schwarz inequality (proved exactly as in Rn ):
|v, w | ≤ v w, v, w ∈ V .

Analogous to the Rn case, a set v1 , . . . , vN of vectors in V is said to be orthonormal if


(2) vi , vj = δij ∀ i, j ∈ {1, . . . , N}

where δij denotes the “Kronecker delta,” defined by

1 if i = j
δij =
0 if i  = j .

Given such an orthonormal set v1 , . . . , vN and given any other v ∈ V , we define


(3) cn = v, vn .

Then by using (1) we get


N N
v − N n=1 cn vn  = v +
2 2
n=1 m=1 cn cm vn , vm − 2cn v, vn .

And hence by (2), (3) we get


N N
(∗) v − n=1 cn vn 
2 = v2 − 2
n=1 cn ,

and hence in particular,


N
(∗∗) 2
n=1 cn ≤ v2 ,
N
with equality if and only if v = N n=1 cn vn . (Notice that geometrically n=1 cn vn represents the
“orthogonal projection” of v onto the subspace spanned by v1 , . . . vN —see Exercise 8.1.)
Now in the case when V is not finite dimensional we can have an orthonormal sequence v1 , v2 . . . of
elements of V . In particular, the above discussion applies to v1 , v2 , . . . , vN for each integer N ≥ 1,
∞ 2
and (∗∗) tells us that the partial sums N 2
n=1 cn of the series n=1 cn are bounded above by v .
2
∞ 2
Hence, n=1 cn converges (recall from Lecture 4 that a series of nonnegative terms converges if
and only if the partial sums are bounded) and


(‡) cn2 ≤ v2 .
n=1
118 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

(‡) is called Bessel’s Inequality. Notice (by (∗)) that equality holds in (‡) if and only if limN→∞ v −
N
n=1 cn vn  = 0. This ties in with the following definition.

Definition: The orthonormal sequence v1 , v2 , . . . is said to be complete if, for each v ∈ V , we have

limN→∞ v − N n=1 cn vn  = 0; here of course cn = v, vn as in (3) above. Notice that by (∗)
completeness holds if and only if equality holds in (‡) ∀ v ∈ V .

Terminology: If limN →∞ v − N n=1 cn vn  = 0 then we say the series n=1 cn vn converges to v

in the norm   of V . Whether or not this convergence takes place, the series ∞ n=1 cn vn is called the
Fourier series for v relative to the orthonormal sequence v1 , v2 , . . ..

For the rest of this lecture we specialize to a concrete situation—Viz., Fourier series for piecewise
continuous functions. First we introduce some notations: for any given closed interval [a, b] ⊂ R,
CP ([a, b]) = the set of piecewise continuous functions on [a, b]. That is f ∈ CP ([a, b]) means there
is a partition a = x0 < x1 < . . . < xJ = b of the interval [a, b] such that f is continuous at each
point of (xj −1 , xj ), j = 1, . . . , J , and such that
lim f (x), lim f (x), lim f (x), lim f (x) all exist j = 1, . . . , J − 1 .
x↓a x↑b x↓xj x↑xj

CA ([a, b]) denotes the subset of CP ([a, b]) functions f with the additional properties that
1# $
f (a) = f (b) = lim f (x) + lim f (x)
2 x↓a x↑b
and
1# $
f (xj ) = lim f (x) + lim f (x) , j = 1, . . . , J − 1 .
2 x↓xj x↑xj

One readily checks that if f, g ∈ CP ([a, b]) and if α, β ∈ R, then αf + βg ∈ CP ([a, b]), so
CP ([a, b]) is a vector space which is a subspace of the space of all functions f : [a, b] → R. Likewise,
CA ([a, b]) is a vector space, which is a subspace of CP ([a, b]).
We now specialize further to the interval [−π, π] and we define an inner product on CA ([a, b]) by

(†) f, g = π1 −π f (x)g(x) dx ,
 !
π
so that f  = π1 −π f 2 (x) dx. (The factor π1 is present purely for technical reasons.)
By direct computation, one checks that the sequence √1 , cos x, sin x, cos 2x, sin 2x, . . . ,
2
cos nx, sin nx, . . . is an orthonormal sequence relative to this inner product (Exercise 8.4.).

Definition: Given f ∈ CA ([−π, π]), the trigonometric Fourier series of f is defined to be


the Fourier series of f relative to the orthonormal sequence √1 , cos x, sin x, cos 2x, sin 2x,
2
. . . , cos nx, sin nx, . . .. Thus, the trigonometric Fourier series of f can be written

c0 √1 + ∞ n=1 (an cos nx + bn sin nx),
2
where
LECTURE 8. FOURIER SERIES 119


c0 = 1 √1 f (x) dx,
π
!−π
π
2
an = 1
cos(nx)f (x) dx, n = 1, 2, . . .
π !−π
π
bn = 1
π −π sin(nx)f (x) dx, n = 1, 2, . . . .

A much more convenient notation is achieved by setting a0 = 2c0 , in terms of which the above
trigonometric Fourier series is written
a0 ∞
2 + n=1 (an cos nx + bn sin nx),
where ! π
an = π1 −π cos(nx)f (x) dx, n = 0, 1, 2, . . .
! π
bn = π1 −π sin(nx)f (x) dx, n = 1, 2, . . . .

Bessel’s inequality (∗∗) says in this case (keeping in mind that c02 = a02 /2) that
a02 ∞ !π
2 + 2
n=1 (an + bn2 ) ≤ 1
π −π f 2 (x) dx ,

and by the above general discussion we have equality if and only if the Fourier series is norm
convergent to f (i.e., if and only if the orthonormal sequence √1 , cos x, sin x, cos 2x, sin 2x, . . ., is
2
complete in the vector space V = CA ([−π, π]). We claim that this is in fact true; that is, we claim:

Theorem 8.2. For each f ∈ CA ([−π, π]), we have


  
limN →∞ f − a20 + N n=1 (an cos nx + bn sin nx)
 = 0,
and  2 !π
a02 N
2 + 2
n=1 (an + bn2 ) = f  ≡ 1
π −π f 2 (x) dx .

Proof: The proof is nontrivial, and we omit it. Nevertheless, in the next lecture we will give a
complete discussion (with proof ) of pointwise convergence of the Fourier series to f . (For pointwise
convergence we will have to assume a stronger condition on f than that merely f ∈ CA ([−π, π ]).)

LECTURE 8 PROBLEMS

8.1 With notation as on p. 117 prove that N n=1 cn vn really does represent “orthogonal pro-
N
jection” onto the subspace of V spanned by v1 , . . . vN ; that is, prove (i) n=1 cn vn = v when

v ∈ span{v1 , . . . vN }, and (ii) v − Nn=1 cn vn , w = 0 whenever w ∈ span{v1 , . . . vN }.

8.2 (i) With the same notation, prove that for any constants d1 , . . . , dN and any v ∈ V
N
v − N n=1 dn vn  = v −
2 2
n=1 (dn − 2cn dn ) (cn = v, vn ) .
2

(ii) Using (i) together with the inequality 2ab ≤ a 2 + b2 with equality if and only if a = b, prove
that (with cn = v, vn )
N
(∗) v − N n=1 dn vn  ≥ v −
2
n=1 cn vn 
2
120 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

with equality if and only if cn = dn ∀ n = 1, . . . , N.



(Notice that geometrically (∗) says that Nn=1 cn vn is the point in the subspace spanned by v1 , . . . , vN
which has least distance from v.)

8.3 If f, g ∈ CA ([a, b]) and α, β ∈ R, prove αf + βg ∈ CA ([a, b]).

8.4 Prove that  , defined in (†) is an inner product on CA ([−π, π]), and check that the sequence
of functions √1 , cos x, sin x, cos 2x, sin 2x, . . . is an orthonormal sequence (as claimed in the above
2
lecture).

8.5 Compute the Fourier series for the following functions f ∈ CA ([−π, π]):
(i) f (x) = |x|, −π ≤ x ≤ π
x, −π < x < π
(ii) f (x) =
0, x = ±π.

8.6 Assuming the result of Thm. 8.1 and using the Fourier series of 8.5(i), (ii), find the sums of the
series
∞ 1
(i) n=1 (2n−1)4 ,
∞ 1
(ii) n=1 n2 .
LECTURE 9. POINTWISE CONVERGENCE OF TRIGONOMETRIC FOURIER SERIES 121

LECTURE 9: POINTWISE CONVERGENCE OF


TRIGONOMETRIC FOURIER SERIES
In the previous lecture we defined CA ([a, b]) to be the space of functions f : [a, b] → R which are
piecewise continuous (i.e., in C P ([a, b])) and which satisfy the additional conditions
1 
f (a) = f (b) = f (a+ ) + f (b− )
2
and
1 
(∗) f (y) = f (y+ ) + f (y− )
2
at each point y ∈ (a, b). Here and subsequently we use the notation f (y+ ) = limx↓y f (x), f (y− ) =
limx↑y f (x); notice that of course we automatically have f (y+ ) = f (y− ) = f (y) at any point y
where f is continuous, so (∗) really only imposes a restriction on f at the finitely many points where
f is not continuous.

Now take f ∈ CA ([−π, π ]). Since f (−π ) = f (π ), we can extend f to give a function f¯ : R → R
by defining
f¯(x + 2kπ ) = f (x), x ∈ [−π, π], k = 0, ±1, ±2, . . . .

This extended function is 2π -periodic. (A function g : R → R is said to be p-periodic if g(x + p) =


g(x) ∀ x ∈ R—evidently f¯ has this property with p = 2π by definition.) Henceforth we shall drop
the “bar” in f¯; that is f will denote both the given function in CA ([−π, π ]) and its 2π -periodic
extension as defined above. Of course the 2π -periodic extension is piecewise continuous on all of R
and satisfies (∗) at each point y ∈ R.
An important fact about periodic functions (which we shall need below) is the following.

Lemma 9.1. If g : R → R is piecewise continuous and p-periodic for some p > 0, then
! a+p !p
a f (x) dx = 0 f (x) dx ∀ a ∈ R .

Proof:
!a !a
g(x) dx = 0 g(x + p) dx (since g(x) ≡ g(x + p))
0 ! a+p
= p g(t) dt (by change of variable t = x + p) ,

so
! a+p ! a+p !a
g(t) dt = 0 g(t) dt − 0 g(t) dt
a !p ! a+p !a
= 0 g(t) dt + p g(t) dt − 0 g(t) dt
!p
= 0 g(t) dt .
122 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

We now prove the main result about pointwise convergence of trigonometric Fourier series: Notice
that in the statement of the following theorem we use the 2π -periodic extension of f as described
above in order to make sense of the quantities f (x+ ) if x = π and f (x− ) if x = −π, and also to
make sense of the one-sided derivatives when x = ±π .
Theorem 9.2. Suppose f ∈ CA ([−π, π]) and let x ∈ [−π, π] be such that the one-sided derivatives
f (x + h) − f (x− ) f (x + h) − f (x+ )
(∗) lim , lim ,
h↑0 h h↓0 h

both exist. Then the Fourier series a20 + ∞ n=1 (an cos nx + bn sin nx) converges to f (x).

Important Notes: (1) Of course the limits in ∗ exist automatically (and are equal) if f is differentiable
at x.

(2) Here the coefficients are as described in Lecture 8, i.e.,



an = π1 −π f (t) cos nt dt, n ≥ 0,
(‡) !π
bn = π1 −π f (t) sin nt dt, n ≥ 1 .
Recall also that Bessel’s inequality tells us that
a02 ∞ !π
2 + 2
n=1 (an + bn2 ) ≤ 1
π −π f 2 (t) dt .
(As a matter of fact, Thm. 8.1 tells us that equality holds here, but we will not need this fact in the
present discussion.)

A key step in the proof of Thm. 9.2 is the following lemma, which gives an alternative expression

for the partial sum a20 + Nn=1 (an cos nx + bn sin nx).

Lemma 9.3. If SN (x) = a20 + Nn=1 (an cos nx + bn sin nx) with an , bn as above, then for N ≥ 1,
!0 !π
SN (x) − f (x) = −π (f (x − t) − f (x+ ))DN (t) dt + 0 (f (x − t) − f (x− ))DN (t) dt ,
where
N
DN (t) = 1 1
π (2 + n=1 cos nt) .
Proof: Using (‡) we have

SN (x) = π1 −π f (t){ 21 + N (cos nt cos nx + sin nt sin nx)} dt,
1 π
! n=1
N
= π −π f (t){ 2 + n=1 cos(n(x − t))} dt
1

= −π f (t)DN (x − t)) dt, (by definition of DN )
! x+π
= x−π f (x − s)DN (s) ds, (by change of variable t → x − s)
! x+π
= x−π f (x − t)DN (t) dt,

= −π f (x − t)DN (t) dt, (by Lem. 9.1)
!0 !π
(1) = −π f (x − t)DN (t) dt + 0 f (x − t)DN (t) dt ,
LECTURE 9. POINTWISE CONVERGENCE OF TRIGONOMETRIC FOURIER SERIES 123

where in the second line we used the identity cos(A − B) = cos A cos B + sin A sin B.
!0 !π
Now evidently −π DN (t) dt = DN (t) dt = 21 , and since f ∈ CA ([−π, π ]), we have f (x) =
!0
0 !0
2 (f (x+ ) + f !(x− )). f (x − t)DN (t) dt = −π (f (x − t) − f (x+ ))DN (t) dt +
1
Hence, −π
π ! π
2 f (x+ ), and 0 f (x − t)DN (t) dt = 0 f (x − t) − f (x− ))DN (t) dt + 2 f (x− ), so (1) yields
1 1

!0 !π
SN (x) − f (x) = −π (f (x − t) − f (x+ ))DN (t) dt + 0 (f (x − t) − f (x− ))DN (t) dt

as required.

Before proceeding to the proof of Thm. 9.2 we notice that we have the following alternative expres-
sions for DN (t):
sin(N + 21 )t
(∗) DN (t) = , −π ≤ t ≤ π, t = 0 .
2π sin( 21 t)

1 N
To see this, note that π DN (t) ≡ 21 + N n=1 cos nt = 2 n=−N e
int (by virtue of the formula cos x =
−ix ), so that by multiplying by e±it/2 and taking the difference of the resulting expressions,
2 (e + e
1 ix

we have
1 i(N + 1 )t 1
π DN (t)(eit/2 − e−it/2 ) = (e 2 − e−i(N+ 2 )t ) ,
2

and so, using sin x = 1 ix


2i (e − e−ix ), we obtain
π sin 21 t DN (t) = 1
2 sin(N + 21 )t ,

or, in other words,


sin(N + 21 )t
DN (t) = if − π < t ≤ π, t = 0 .
2π sin 21 t

We shall also need the following inequality for DN (t):



(∗∗) |t||DN (t)| ≤ 1, |t| ∈ (0, ).
3

|t|
Indeed, this is an easy consequence of (∗), because | sin( 2t )| ≥ 4 for |t| ≤ 2π
3 (see Exercise 9.1), and
hence by (∗)
| sin(N + 21 )t| |t| 4  1  2π
|t||DN (t)| ≡ · ≤  sin(N + )t  ≤ 1 for |t| ≤ .
2π | sin( 2t )| 2π 2 3
124 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS

Proof of Theorem 9.2: Choose an arbitrary δ ∈ 0, 2π 3 ), and note that by Lem. 9.3 and (∗) (and
using the identity sin(N + 21 )t = sin Nt cos 21 t + cos Nt sin 21 t) we can write
(1) 2π(SN (x) − f (x)) =
! −δ ! −δ
−π (f (x − t) − f (x+ )) cos Nt cot 2t dt −π (f (x − t) − f (x+ )) sin Nt dt
!π !π
+ δ (f (x − t) − f (x− )) cos Nt cot 2t dt + δ (f (x − t) − f (x− )) sin N t dt
!0 !δ
+ 2π −δ (f (x − t) − f (x+ ))DN (t) dt + 2π 0 (f (x − t) − f (x− ))DN (t) dt .

Now notice that the first 2 integrals on the right are just the Fourier coefficients of cos Nx and
sin N x in the Fourier series of
1
2 (f (x − t) − f (x+ ) cot 2t , −π < t < −δ
g1 (t) =
0 −δ < t < π
and
1
2 (f (x − t) − f (x+ ) −π < t < −δ
g2 (t) =
0 −δ < t < π ,

respectively, and hence, since g1 (t), g2 (t) are piecewise continuous on [−π, π], these first 2 integrals
converge to zero (by Bessel’s inequality) as N → ∞ for each fixed δ ∈ (0, 2π 3 ). Similarly, the 3rd
and 4th integrals converge to zero as N → ∞ for each fixed δ ∈ (0, 2π 3 ). On the other hand, using
the hypotheses (∗) of Thm. 9.2 we have that there is a fixed constant C and a constant δ1 > 0 such
that
|f (x − t) − f (x+ )| ≤ C|t|, −δ1 < t < 0
and
|f (x − t) − f (x− )| ≤ C|t|, 0 < t < δ1 .

Hence, the final two integrals on the right in the expression above can be estimated as follows if
δ ≤ δ1 :
!0 !δ
| −δ (f (x − t) − f (x+ ))DN (t) dt| + | 0 (f (x − t) − f (x− ))DN (t) dt|
!0 !δ
≤ −δ |(f (x − t) − f (x+ )||DN (t) dt + 0 |(f (x − t) − f (x− )||DN (t)| dt
#! !δ $
0
≤ C −δ |t||DN (t)| dt + 0 |t||DN (t)| dt ≤ 2Cδ by (∗∗) above .

Thus, we can finish the proof as follows. Let ε > 0 be given. Choose δ ∈ (0, 2π 3 ) such that δ < δ1
and 2Cδ < 2ε (i.e., δ < 4C
ε
).Then with this choice of δ fixed, the first 4 integrals on the right of identity
(‡) converge to zero as N → ∞, hence there is a constant N1 ≥ 1 such that the absolute value of
each of the first 4 integrals is < 8ε for all N ≥ N1 . That is,

|f (x) − SN (x)| < 4 × ε


8 + ε
2 = ε, for all N ≥ N1 .
LECTURE 9. POINTWISE CONVERGENCE OF TRIGONOMETRIC FOURIER SERIES 125

LECTURE 9 PROBLEMS
|t|
9.1 Prove | sin t| ≥ 2 for |t| ≤ π
3.
(Hint: Since | sin t|, |t| are even functions, it is enough to check the inequality for t ∈ [0, π3 ]. Use
the mean value theorem on the function sin t − 2t .)

9.2 Use the Fourier series of the function in Problem 8.4 (i) of Lecture 8, together with the pointwise

convergence result of Thm. 9.2 to find the sum of the series ∞ 1
n=1 (2n−1)2 .

9.3 Calculate the Fourier series for the function f (x) = x 2 , −π ≤ x ≤ π and prove that the Fourier
series converges to f (x) for each x ∈ [−π, π]. (Use any general theorem from the text that you need.)
Hence, check the identity
∞ 1 π2
n=1 n2 = 6

(independently of the result of Thm. 8.1).

9.4 By using a change of scale x → πx L and the theory of trigonometric Fourier series for functions in
CA ([−π, π ]), give a general discussion of trigonometric Fourier series of functions in CA ([−L, L]).
126 APPENDIX A. INTRODUCTORY LECTURES ON REAL ANALYSIS
127

Index

absolute convergence, 103 bisection, method of, 94


absolutely convergent, 102 Bolzano-Weierstrass, 33
absolutely, converges, 105 Bolzano-Weierstrass theorem, 33, 82, 94
addition and multiplication by scalars, closed bound
under, 5 least upper, 88
addition, definition of, 1 lower, 88
affine space, 27, 28 upper, 88
algebra, linear, 1, 61 bounded, 88, 91
analysis above, 88, 91
in Rn , 31 below, 88, 91
in Rn , more, 81 sequence, 94
introductory lectures on real, 87
angle between two nonzero vectors, 4 cancellation law, 90
angle between vectors in Rn , 3 Cauchy root test, 103
approximation, polygonal, 49 Cauchy-Schwarz inequality, 4, 5, 117
Archimedean property, 89 chain rule, 41
arrow, 2 chain rule theorem, 41
attain its maximum, 96 change of base-point theorem, 106
attain its minimum, 96 closed sets in Euclidean space, 31
augmented matrix, 28 closed under addition and multiplication by
axioms scalars, 5
field, 87 column
order, 87 pivot, 23, 26
vector space, 116 rank of a matrix, 16
space, 15, 16
basis, 11 combination, linear, 6
orthonormal, 73 comparison test for divergence, 103
theorem, 11 complements, orthogonal, 18
vectors, 12 completeness property, 88
vectors for Rn , standard, 2 completion of the square, 3
Bessel’s inequality, 118 complex exponential series, 113
binomial theorem, 107 complex series, 113
128 INDEX

component of the vector, 1 of subsequence, 93


component sequences, 33 of the angle between vectors, 4
computation of determinants, 67 orthogonal complement, 18
computing the inverse, 72 derivatives
construction of the real numbers, 89 directional, 37
continuity definition, 96 higher-order partial, 42
continuity of the Jacobian matrix, 84 matrix, 36
continuous one-sided, 48, 122
at the point, 96 partial, 37
functions, 96 test, second, 44
functions, piecewise, 118 determinants, 64
contraction mapping principle, 81 determinants, efficient computation of, 67
convergence difference, second, 43
interval of, 105 differentiability definition, 35
pointwise, 119 differentiability in Rn , 35
radius of, 105 dimension of subspace, 12, 16
convergent, absolutely, 102 direct sum, 19
convergent, sequences, 92 directed line segment, 2
converges, 102 directional derivatives, partial derivatives,
absolutely, 105 gradient, 37
sequences, 92 divergence, 103
Taylor series, 110 comparison test for, 103
cosines, law of, 5 ratio test for, 103
curves in Rn , 48 dot product
and angle between vectors in Rn , 3
De Morgan Laws, 32 definition, 3
decreasing, 91 properties, 3
decreasing, strictly, 91
definition echelon form of a matrix, 22, 23
complex exponential series, 115 efficient computation of determinants, 67
continuity, 96 eigenvalues
dot product, 3 and the spectral theorem, 76
local maximum, 44 of the matrix, 76
local minimum, 45 properties, 76
of addition, 1 eigenvector, 77
of differentiability, 35 elementary operations, 6
of matrices, 13 elimination, Gaussian, 7, 22
of open and closed sets, 31 equations, linear systems of, 7
of span, 6 Euclidean space, 14
INDEX 129

definitions and theorems for, 116 integral test, 103


open and closed sets in, 31 interval of convergence, 105
real, 1 inverse
extrema of a multivariable function, 44 computing the, 72
function theorem, 82
field axioms, 87 matrix, 69
finite length, 49 of a square matrix, 69
Fourier series, 116, 118 inversions, number of, 62
Fourier series, pointwise convergence of
trigonometric, 121 Jacobian matrix, 36
function j -th standard basis vector, 12
continuous, 96
extrema of a multivariable, 44 Lagrange multiplier theorem, 59
multivariable, 44 law
piecewise continuous, 118 cancellation, 90
second derivative test for, 46 De Morgan, 32
of cosines, 5
Gaussian elimination, 7, 22 parallelogram, 5
Gaussian elimination and the linear triangle, 2
dependence lemma, 7 least upper bound, 88, 97
gradient, 37 length finite, 49
gradient, tangential, 53, 57 length of a matrix, 14
Gram-Schmidt orthogonalization, 74, 75 limit point of a set, 31
graph map, 53 limits and continuity in Rn , 33
higher-order partial derivatives, 42 line segment, directed, 2
homogeneous system, 8 line vectors, 1, 2
linear
identity matrix, 66 combination, 6
implicit function theorem, 84, 85 combination, nontrivial, 6
increasing, strictly, 91 dependence
indefinite, 48 lemma, 10
index of a permutation, 62 lemma, Gaussian elimination and the, 7
inequality of vectors, subspaces, 5
Bessel’s, 118 subspace, 5
Cauchy-Schwarz, 4, 5, 117 systems of equations, 7
triangle, 4, 40 transformations, 15, 78
infimum, 88 transformations, matrix representations of,
inhomogeneous systems, 27 75
inner product, 116 linearly dependent vectors, 6
130 INDEX

linearly independent vectors, 11 norm of a matrix, 14


local maximum, 44, 59 null space, 16
local minimum, 45, 59 nullity of a matrix, 16
lower bound, 88
one-sided derivatives, 48, 122
matrices, 13 open and closed sets in Euclidean space, 31
matrix operations, elementary, 6
augmented, 28 operations with vectors, 1
column rank of a, 16 order axioms, 87
component-wise, 14 orthogonal
continuity of the Jacobian, 84 complements and orthogonal projection,
derivative, 36 18
eigenvalues of the, 76 matrix, 75
identity, 66 projection, 119
inverse, 69 orthogonal complements and, 18
inverse of a square, 69 unique, 19
Jacobian, 36 vectors, 4
length of a, 14 orthogonalization, Gram-Schmidt, 74, 75
norm of a, 14 orthonormal
nullity of a, 16 basis, 73
orthogonal, 75 rows, 75
product, 13 set of vectors, 73
rank, 16 vectors, 73, 78, 117
real, 13
reduced row echelon form, 23 parallelogram law, 5
representations of linear transformations, parity of the permutation, 62
75 partial derivatives
row echelon form of a, 22 gradient, directional derivatives, 37
row rank of a, 16 higher-order, 42
symmetric, 20 second order, 42
mean value theorem, 39, 98 partition, 49
method of bisection, 94 permutation, 61
monotone, 91 even, 62
multiplying two series, 113 index of a, 62
multivariable function, 44 odd, 62
parity of the, 62
negative definite quadratic form, 45 piecewise continuous functions, 118
nontrivial linear combination, 6 pivot columns, 23, 26
nontrivial subspace, 18 pointwise convergence, 119
INDEX 131

pointwise convergence of trigonometric component, 33


Fourier series, 121 convergent, 92
polygonal approximation, 49 of real numbers and the
positive definite quadratic form, 45 Bolzano-Weierstrass theorem, 91
power series, 105 of transpositions, 62
product real, 33
dot, 3 series
inner, 116 complex, 113
matrix, 13 complex exponential, 113
of series, 113 converges, 100, 113
properties of dot, 3 absolutely, 105
theorem, 114 Taylor, 110
projection, orthogonal, 19 definition of complex exponential, 115
Pythagoras’ theorem, 2 Fourier, 116, 118
quadratic form, 45 multiplying two, 113
of real numbers, 100
radius of convergence, 105 pointwise convergence of trigonometric
rank Fourier, 121
and the rank-nullity theorem, 15 power, 105
column, 16 products of, 113
nullity theorem, 16 rearrangement of, 102
nullity theorem, rank and the, 15 representations, Taylor, 108
of a matrix, 16 trivially converges, 105
ratio test, 103 solution
ratio test for divergence, 103 of the system, 7
rational numbers, 88 set of the system, 7
rearrangement of series, 102 vector, 7
reduced row echelon form, 23
span definition, 6
representations of linear transformations,
spectral theorem, 77
matrix, 75
spectral theorem, eigenvalues and the, 76
representations, Taylor series, 108
square matrix, inverse of a, 69
Rolle’s theorem, 97
standard basis vectors for Rn , 2
Sandwich theorem, 35, 95 strictly decreasing, 91
second derivative test, 44 strictly increasing, 91
second difference, 43 submanifolds of Rn and tangential gradients,
segment, directed line, 2 53
sequences subsequence definition, 93
bounded, 94 subspace
132 INDEX

dimension, 16 Taylor’s, 109


dimension of a, 12 transformation, 61
linear, 5 linear, 15, 78
linear dependence of vectors, 5 matrix representations of linear, 75
nontrivial, 18 transposition, 61, 62
trivial, 5 triangle
sum, direct, 19 identity, 2
symmetric matrix, 20 inequality, 4, 40
law, 2
tangent vector, unit, 48 trivial subspace, 5
tangential gradient, 57
Taylor series representations, 108 under-determined system, 9
Taylor theorem, 109 unit tangent vector, 48
theorem upper bound, 88
basis, 11 upper bound, least, 88
binomial, 107
vectors, 1
Bolzano-Weierstrass, 33, 82, 94
angle between, 3
chain rule, 41
basis, 12
change of base-point, 106
components of the, 1
contraction mapping, 81
definition of the angle between, 4
eigenvalues and the spectral, 76
in Rn , 1
implicit function, 84, 85
j -th standard basis, 12
inverse function, 82
line, 1, 2
Lagrange multiplier, 59
linearly dependent, 6
mean value, 39, 98 linearly independent, 11
orthogonal projection, 20 norm, 14
product, 114 orthogonal, 4
Pythagoras’, 2 orthonormal, 73, 78, 117
rank and the rank-nullity, 15 orthonormal set of, 73
rank/nullity, 16 unit tangent, 48
Rolle’s, 97 velocity, 48
Sandwich, 35, 95 zero, 5
second derivative test, 46 zero column, 68
sequences of real numbers and the velocity vector, 48
Bolzano-Weierstrass, 91
spectral, 77 x, length of, 2

Potrebbero piacerti anche