
Linear Algebra: MAT 217

Lecture notes, Spring 2013

Michael Damron

Princeton University

Contents

1 Vector spaces
  1.1 Vector spaces and fields
  1.2 Subspaces
  1.3 Spanning and linear dependence
  1.4 Bases
  1.5 Exercises

2 Linear transformations
  2.1 Definitions
  2.2 Range and nullspace
  2.3 Isomorphisms
  2.4 Matrices and coordinates
  2.5 Exercises

3 Dual spaces
  3.1 Definitions
  3.2 Annihilators
  3.3 Transpose
  3.4 Double dual
  3.5 Exercises

4 Determinants
  4.1 Permutations
  4.2 Determinants: existence and uniqueness
  4.3 Properties of the determinant
  4.4 Exercises
  4.5 Exercises on polynomials

5 Eigenvalues
  5.1 Diagonalizability
  5.2 Eigenspaces
  5.3 Exercises

6 Jordan form
  6.1 Primary decomposition theorem
  6.2 Nilpotent operators
  6.3 Existence and uniqueness of Jordan form, Cayley-Hamilton
  6.4 Exercises

7 Bilinear forms
  7.1 Definition and matrix representation
  7.2 Symmetric bilinear forms
  7.3 Sesquilinear and Hermitian forms
  7.4 Exercises

8 Inner product spaces
  8.1 Definitions
  8.2 Orthogonality
  8.3 Adjoint
  8.4 Spectral theory in inner product spaces
  8.5 Appendix: proof of Cauchy-Schwarz by P. Sosoe
  8.6 Exercises

1 Vector spaces

1.1 Vector spaces and fields

Linear algebra is the study of linear functions. In R^n these are the functions f satisfying
f(x + y) = f(x) + f(y) and f(cx) = cf(x) for all x, y ∈ R^n, c ∈ R.
We will generalize this immediately, taking from R^n only what we absolutely need. We start
by looking at the value c above: it is called a scalar. Generally scalars do not need to come
from R; we only need a certain amount of structure on the set of scalars.
Definition 1.1.1. A set F is called a field if for each a, b ∈ F there is an element ab ∈ F
and another a + b ∈ F such that
1. for all a, b, c ∈ F, (ab)c = a(bc) and (a + b) + c = a + (b + c),
2. for all a, b ∈ F, ab = ba and a + b = b + a,
3. there exist elements 0, 1 ∈ F such that for all a ∈ F, a + 0 = a and 1a = a,
4. for all a ∈ F there is an element −a ∈ F such that a + (−a) = 0, and if a ≠ 0 there is
an element a^{-1} ∈ F such that aa^{-1} = 1, and
5. for all a, b, c ∈ F, a(b + c) = ab + ac.
This is our generalization of R. Note one interesting point: there is nothing that asserts
that F must be infinite, and indeed there are finite fields. Take any prime p and consider
the set Zp given by
Zp = {0, . . . , p − 1} with modular arithmetic.
That is, a + b is defined as (a + b) mod p (for instance, in Z3, (2 + 2) mod 3 = 1). Then this is
a field. You will verify this in the exercises. Another neat fact: if F is a finite field then it
must have p^n elements for some prime p and n ∈ N. You will prove this too.
Other examples are R and C.
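As a quick illustration (this Python sketch is ours, not part of the original notes), the snippet below implements the modular operations on Zp for p = 5 and checks by brute force that every nonzero element has a multiplicative inverse, the only field axiom that is not immediate.

    # Sketch (not from the notes): arithmetic in Z_p for p = 5.
    p = 5

    def add(a, b):
        return (a + b) % p

    def mul(a, b):
        return (a * b) % p

    # Every nonzero a should have some b with mul(a, b) == 1.
    for a in range(1, p):
        inverse = next(b for b in range(1, p) if mul(a, b) == 1)
        print(a, "has inverse", inverse)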
Given our field of scalars we are ready to generalize the idea of R^n; we will call this a
vector space.
Definition 1.1.2. A collection (V, F) of a set V and a field F is called a vector space (the
elements of V are called vectors and those of F scalars) if the following hold. For each
v, w ∈ V there is a vector sum v + w ∈ V such that
1. there is one (and only one) vector called ~0 such that v + ~0 = v for all v ∈ V,
2. for each v ∈ V there is one (and only one) vector −v such that v + (−v) = ~0,
3. for all v, w ∈ V, v + w = w + v,
4. for all v, w, z ∈ V, v + (w + z) = (v + w) + z.
Furthermore for all v ∈ V and c ∈ F there is a scalar product cv ∈ V such that
1. for all v ∈ V, 1v = v,
2. for all v ∈ V and c, d ∈ F, (cd)v = c(dv),
3. for all v, w ∈ V and c ∈ F, c(v + w) = cv + cw, and
4. for all v ∈ V and c, d ∈ F, (c + d)v = cv + dv.
This is really a ton of rules, but they have to be verified! In case (V, F) is a vector space,
we will typically say V is a vector space over F or V is an F-vector space. Let's look at some
examples.
1. Take V = R^n and F = R. We define addition as you would imagine:
(v1, . . . , vn) + (w1, . . . , wn) = (v1 + w1, . . . , vn + wn)
and scalar multiplication by
c(v1, . . . , vn) = (cv1, . . . , cvn) .
2. Let F be any field, let n ∈ N, and write
F^n = {(a1, . . . , an) : ai ∈ F for i = 1, . . . , n} ,
and define addition and scalar multiplication as above. This is a vector space. Note in
particular that F is a vector space over itself.
3. If F1 ⊆ F2 are fields (with the same 0, 1 and operations) then F2 is a vector space over
F1. This situation is called a field extension.
4. Let S be any nonempty set and F a field. Then define
V = {f : S → F : f is a function} .
Then V is an F-vector space using the operations
(f1 + f2)(s) = f1(s) + f2(s) and (cf1)(s) = c(f1(s)) .
Facts everyone should see once.
1. For all c ∈ F, c~0 = ~0.
Proof.
c~0 = c(~0 + ~0) = c~0 + c~0 ,
so adding −(c~0) to both sides,
~0 = c~0 + (−(c~0)) = (c~0 + c~0) + (−(c~0)) = c~0 + (c~0 + (−(c~0))) = c~0 .
2. For all v ∈ V, 0v = ~0.
Proof. 0v = (0 + 0)v = 0v + 0v. Adding −(0v) to both sides gives the result.
3. For all v ∈ V, (−1)v = −v.
Proof.
v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = ~0 ,
so (−1)v is the additive inverse of v; that is, (−1)v = −v.

1.2 Subspaces

Definition 1.2.1. Let V be a vector space over F. Then W ⊆ V is called a subspace of V
if W is a vector space over F using the same operations as in V.
Suppose we are given a vector space V. To check that W ⊆ V is a subspace we need to
verify eight properties! Do not worry: many of them follow immediately, by inheritance.
That is, they are true simply because they were true in V. For example if V is a vector
space over F and v, w ∈ W then clearly v + w = w + v, since these are also vectors in V and
addition is commutative in V.
We only need to check the following.
1. ~0 ∈ W.
2. (closed under addition) For all v, w ∈ W, v + w ∈ W.
3. (closed under scalar multiplication) For all v ∈ W and c ∈ F, cv ∈ W.
4. (closed under inverses) For all v ∈ W, −v ∈ W.
Proposition 1.2.2. Let (V, F) be a vector space. Then W ⊆ V is a subspace if and only if
it is nonempty and for all v, w ∈ W and c ∈ F, cv + w ∈ W.
Proof. Suppose that W satisfies the property in the proposition. Then let v ∈ W. Taking
w = v and c = −1, we get ~0 = (−1)v + v ∈ W. Next, if c ∈ F then cv = cv + ~0 ∈ W; in
particular −v = (−1)v ∈ W. If w ∈ W then taking c = 1 gives v + w ∈ W, so W is a
subspace. Conversely, if W is a subspace then for all v ∈ W and c ∈ F, cv ∈ W, so if
w ∈ W we get cv + w ∈ W. Furthermore W is nonempty since it contains ~0.
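As an illustration (ours, not from the notes), here is a small Python check of the criterion of Proposition 1.2.2 for the subspace W = {(x, y) ∈ R^2 : x + y = 0}: random samples of cv + w with v, w ∈ W always land back in W.

    # Sketch (not from the notes): spot-checking "cv + w stays in W" for
    # W = {(x, y) in R^2 : x + y = 0}.
    import random

    def in_W(u, tol=1e-9):
        return abs(u[0] + u[1]) < tol

    def random_W_vector():
        x = random.uniform(-10, 10)
        return (x, -x)

    for _ in range(1000):
        v, w = random_W_vector(), random_W_vector()
        c = random.uniform(-10, 10)
        u = (c * v[0] + w[0], c * v[1] + w[1])   # cv + w
        assert in_W(u)
    print("cv + w stayed in W in all sampled cases")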
If V is a vector space over F with W1, W2 subspaces we can generate a new space. We
define
W1 + W2 = {w1 + w2 : w1 ∈ W1, w2 ∈ W2} .
Generally we define
W1 + · · · + Wn = (W1 + · · · + Wn−1) + Wn .
Claim 1.2.3. W1 + W2 is a subspace.
Proof. First, it is nonempty. Next, if v, w ∈ W1 + W2 and c ∈ F, we can write v = w1 + w2
and w = w1' + w2' for w1, w1' ∈ W1 and w2, w2' ∈ W2. Then
cv + w = c(w1 + w2) + (w1' + w2') = (cw1 + w1') + (cw2 + w2') .
Since W1 and W2 are subspaces, the first element is in W1 and the second in W2, giving
cv + w ∈ W1 + W2, so it is a subspace.
Question from class. If V is a vector space over F and W is a subset of V that is a vector
space using the same operations of addition and scalar multiplication, can the zero element
of W be different from the zero element of V ? No. Let ~0W be the zero element from W .
Then ~0W + ~0W = ~0W . However denoting by v the additive inverse element of ~0W from V ,
we have
~0 = ~0W + v = (~0W + ~0W ) + v = ~0W + (~0W + v) = ~0W .
Examples.
1. For all n1 ≤ n2, C^{n1} is a subspace of C^{n2} (as C-vector spaces). Here we identify
C^{n1} = {(z1, . . . , z_{n2}) : z_{n1+1} = · · · = z_{n2} = 0} .
2. Given a vector space V over F, {~0} is a subspace.
3. In R^2, any subspace is either (a) R^2, (b) {~0} or (c) a line through the origin. Why?
If W is a subspace and contains some w ≠ ~0, it must contain the entire line spanned
by w; that is, the set {cw : c ∈ R}. This is a line through the origin. If it contains
anything outside this line, we can use this new vector along with w to generate all of
R^2.
4. More generally, in R^n any subspace other than {~0} and R^n itself is a line, plane, or
higher-dimensional flat through the origin.
Last time we saw that if W1 and W2 are subspaces of a vector space V then
W1 + W2 = {w1 + w2 : w1 ∈ W1, w2 ∈ W2}
is also a subspace. This is actually the smallest subspace containing both W1 and W2. You
might think this would be W1 ∪ W2, but in general the union does not need to be a subspace.
Consider V = R^2 over R and
W1 = {(x, 0) : x ∈ R}, W2 = {(0, y) : y ∈ R} .
Then both of these are subspaces but their union is not, since it is not closed under addition
((1, 1) = (1, 0) + (0, 1) ∉ W1 ∪ W2).
In the case that W1 ∩ W2 = {~0}, we say that W1 + W2 is a direct sum and we write it
W1 ⊕ W2.

1.3 Spanning and linear dependence

Given a subset S (not necessarily a subspace) of a vector space V we want to generate the
smallest subspace containing S.
Definition 1.3.1. Let V be a vector space and S ⊆ V. The span of S is defined as
Span(S) = ⋂_{W ∈ C_S} W ,
where C_S is the collection of subspaces of V containing S.
Note that the span is the smallest subspace containing S, in that if W is another subspace
containing S then Span(S) ⊆ W. The fact that Span(S) is a subspace follows from:
Proposition 1.3.2. Let C be a collection of subspaces of a vector space V. Then ⋂_{W ∈ C} W
is a subspace.
Proof. First, each W ∈ C contains ~0, so ⋂_{W ∈ C} W is nonempty. If v, w ∈ ⋂_{W ∈ C} W and c ∈ F
then v, w ∈ W for all W ∈ C. Since each W is a subspace, cv + w ∈ W for all W ∈ C,
meaning that cv + w ∈ ⋂_{W ∈ C} W, completing the proof.
Examples.
1. Span(∅) = {~0}.
2. If W is a subspace of V then Span(W) = W.
3. Span(Span(S)) = Span(S).
4. If S ⊆ T ⊆ V then Span(S) ⊆ Span(T).
There is a different way to generate the span of a set. We can imagine that our initial
definition of span is from the outside in. That is, we are intersecting spaces outside of S.
The second will be from the inside out: it builds the span from within, using the elements
of S. To define it, we introduce some notation.
Definition 1.3.3. If S ⊆ V then v ∈ V is said to be a linear combination of elements of
S if there are finitely many elements v1, . . . , vn ∈ S and scalars a1, . . . , an ∈ F such that
v = a1 v1 + · · · + an vn.
Theorem 1.3.4. Let S ⊆ V be nonempty. Then Span(S) is the set of all linear combinations
of elements of S.
Proof. Let S̄ be the set of all linear combinations of elements of S. We first prove S̄ ⊆
Span(S), so let a1 v1 + · · · + an vn ∈ S̄. Each of the vi's is in S and therefore in Span(S).
By closure of Span(S) under addition and scalar multiplication, we find a1 v1 + · · · + an vn ∈
Span(S).
To show that Span(S) ⊆ S̄, it suffices to show that S̄ is a subspace of V; then it is one of
the spaces we are intersecting to get Span(S) and we will be done. Because S ≠ ∅ we can find
s ∈ S, and then 1s is a linear combination of elements of S, making S̄ nonempty. So
let v, w ∈ S̄ and c ∈ F. We can write v = a1 v1 + · · · + an vn and w = b1 w1 + · · · + bk wk
for vi, wi ∈ S. Then
cv + w = (ca1)v1 + · · · + (can)vn + b1 w1 + · · · + bk wk ∈ S̄ .

Corollary 1.3.5. If W1, W2 are subspaces of V then Span(W1 ∪ W2) = W1 + W2.
Proof. Because ~0 is in W1 and in W2, we have W1 ∪ W2 ⊆ W1 + W2. Therefore W1 + W2 is one
of the subspaces we intersect to get the span, and Span(W1 ∪ W2) ⊆ W1 + W2. Conversely,
any element of W1 + W2 is in Span(W1 ∪ W2), as it is already a linear combination of elements
of W1 ∪ W2.
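For concreteness (this Python example is ours and works over real coordinates), membership in a span can be tested numerically: v lies in Span(S) exactly when appending v to the vectors of S does not increase the rank.

    # Sketch (not from the notes): is v a linear combination of the rows of S?
    import numpy as np

    S = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])      # S spans a plane in R^3
    v = np.array([2.0, 3.0, 5.0])        # 2*(1,0,1) + 3*(0,1,1)

    in_span = (np.linalg.matrix_rank(np.vstack([S, v]))
               == np.linalg.matrix_rank(S))
    print("v in Span(S):", in_span)      # True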
Definition 1.3.6. A vector space V is finitely generated if there is a finite set S ⊆ V such
that V = Span(S). Such an S is called a generating set.
The space R^n is finitely generated: we can choose
S = {(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)} .
The space
R^∞_c = {(x1, x2, . . .) : xi ∈ R, finitely many nonzero terms}
with coordinate-wise addition and scalar multiplication is not finitely generated.
Generating sets are closely linked to linear independence.
Definition 1.3.7. A set S ⊆ V is called linearly dependent if there exists v ∈ S such that
v ∈ Span(S \ {v}). We decree that ∅ is linearly independent; linearly independent means
not linearly dependent.
The intuition is that a set is linearly dependent if it contains elements that are unnecessary
for spanning Span(S). Indeed, we can restate this condition for S ≠ ∅ as
S is linearly dependent iff there exists v ∈ S such that Span(S) = Span(S \ {v}) .
Exercise: prove this!
Examples.
1. {~0} is linearly dependent in any vector space.
2. In C^2, {(1, 0), (0, 1), (1, 1)} is linearly dependent, since (1, 1) ∈ Span({(1, 0), (0, 1)}).
3. In C^n,
{(1, 0, . . . , 0), . . . , (0, . . . , 0, 1)}
is linearly independent. Indeed, suppose we remove any element from this set. For
simplicity let us take the first. Then every element in the span of the others must have
zero first coordinate, and so cannot be (1, 0, . . . , 0).
There is a very simple condition we can check to see if a set is linearly independent.
Proposition 1.3.8. Let V be a vector space and S ⊆ V. Then S is linearly independent if
and only if whenever a1, . . . , an ∈ F and v1, . . . , vn ∈ S satisfy
a1 v1 + · · · + an vn = ~0
we must have a1 = · · · = an = 0.
Proof. If S = ∅ then S is linearly independent. Furthermore, it satisfies the condition of
the proposition vacuously: it is true because we cannot ever find a linear combination of
elements of S equal to ~0.
Otherwise suppose that S is linearly dependent but S ≠ ∅. Then we can find v ∈ S such
that v ∈ Span(S \ {v}). Therefore v is a linear combination of elements of S \ {v}: we can
find w1, . . . , wn ∈ S \ {v} and scalars a1, . . . , an such that v = a1 w1 + · · · + an wn. Then
(−a1)w1 + · · · + (−an)wn + v = ~0 .
This is a linear combination of elements of S equal to ~0 with not all coefficients equal to 0,
proving that if the condition of the proposition holds, then S must be linearly independent.
Conversely, if S is linearly independent, suppose that
a1 v1 + · · · + an vn = ~0
for some v1, . . . , vn ∈ S and a1, . . . , an ∈ F. If the coefficients are not all 0, we can find one,
say a1, which is nonzero. Then we solve:
v1 = (−a1^{-1}) [a2 v2 + · · · + an vn] ,
giving v1 ∈ Span(S \ {v1}) and contradicting linear independence of S; so all coefficients
must be 0. (Note here that a1^{-1} is defined since a1 ≠ 0 and all nonzero field
elements are invertible.)
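Numerically (again, our illustration rather than the notes'), the proposition translates into a rank test: a finite list of vectors in R^n is linearly independent exactly when the matrix having them as rows has rank equal to the number of vectors.

    # Sketch (not from the notes): linear independence via matrix rank.
    import numpy as np

    vectors = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [1.0, 1.0, 0.0]])   # third row = first + second

    independent = np.linalg.matrix_rank(vectors) == len(vectors)
    print("linearly independent:", independent)   # False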
Corollary 1.3.9. Let S1 ⊆ S2 ⊆ V, an F-vector space.
1. If S1 is linearly dependent, so is S2.
2. If S2 is linearly independent, so is S1.
Proof. The first item follows from the second, so we prove the second. Suppose that S2 is
linearly independent and that v1, . . . , vn ∈ S1 and a1, . . . , an ∈ F are such that
a1 v1 + · · · + an vn = ~0 .
Since these vectors are also in S2 and S2 is linearly independent, a1 = · · · = an = 0. Thus
S1 is linearly independent.
Recall the intuition that a set is linearly independent if each vector in it is truly needed
to represent vectors in the span. Not only are they all needed, but linear independence implies
that there is exactly one way to represent each vector of the span.
Proposition 1.3.10. Let S ⊆ V be linearly independent. Then for each nonzero vector v ∈
Span(S) there exists exactly one choice of v1, . . . , vn ∈ S and nonzero coefficients a1, . . . , an ∈
F such that
v = a1 v1 + · · · + an vn .
Proof. Let v ∈ Span(S) be nonzero. By the characterization of the span as the set of linear
combinations of elements of S, there is at least one representation as above. To show
it is unique, suppose that v = a1 v1 + · · · + an vn and v = b1 w1 + · · · + bk wk, and write
S1 = {v1, . . . , vn}, S2 = {w1, . . . , wk}. We can rearrange the Si's so that the elements
v1 = w1, . . . , vm = wm are the common ones; that is, the ones in S1 ∩ S2. Then
~0 = v − v = (a1 − b1)v1 + · · · + (am − bm)vm + am+1 vm+1 + · · · + an vn − bm+1 wm+1 − · · · − bk wk .
This is just a linear combination of elements of S, so by linear independence all coefficients
are zero, implying that aj = bj for j = 1, . . . , m, and that all the other al's and bp's are zero. Thus
all nonzero coefficients are the same in the two linear combinations and we are done.

1.4 Bases

We are now interested in maximal linearly independent sets. It turns out that these must
generate V as well, and we will work toward proving that.
Definition 1.4.1. Let V be an F-vector space and S ⊆ V. If S generates V and is linearly
independent then we call S a basis for V.
Note that the above proposition says that any vector in V has a unique representation
as a linear combination of elements from the basis.
We will soon see that any two bases of V must have the same number of elements. To prove
that, we need a famous lemma. It says that if we have a linearly independent set T and a
spanning set S, we can add #S − #T vectors from S to T to make it spanning.
Theorem 1.4.2 (Steinitz exchange lemma). Let S = {v1 , . . . , vm } satisfy Span(S) = V and
let T = {w1 , . . . , wk } be linearly independent. Then
1. k ≤ m, and
2. after possibly reordering the set S, we have
Span({w1 , . . . , wk , vk+1 , . . . , vm }) = V .
Proof. The proof is by induction on k, the size of T. If k = 0 then T is empty and thus
linearly independent. In this case, we do not exchange any elements of T with elements of
S and the lemma simply states that 0 ≤ m and Span(S) = V, which is true.
Suppose that for some k ≥ 0 and all linearly independent sets T of size k the lemma
holds; we will prove it holds for k + 1, so let T = {w1, . . . , wk+1} be a linearly independent
set of size k + 1. By last lecture, {w1, . . . , wk} is linearly independent, and by induction
k ≤ m and we can reorder S so that
Span({w1, . . . , wk, vk+1, . . . , vm}) = V .
Because of this we can find scalars a1, . . . , am such that
a1 w1 + · · · + ak wk + ak+1 vk+1 + · · · + am vm = wk+1 .     (1)
If k = m, or if k ≤ m − 1 but all the coefficients ak+1, . . . , am are zero, then we have
wk+1 ∈ Span({w1, . . . , wk}), a contradiction since T is linearly independent. Therefore we
must have k + 1 ≤ m and at least one of ak+1, . . . , am must be nonzero. Reorder the set S
so that ak+1 ≠ 0. Then we can solve for vk+1 in (1) to find
vk+1 ∈ Span({w1, . . . , wk+1, vk+2, . . . , vm}) .
Therefore each element of {w1, . . . , wk, vk+1, . . . , vm} can be represented as a linear combination
of elements from {w1, . . . , wk+1, vk+2, . . . , vm}, and since the former set spans V, we
see that
Span({w1, . . . , wk+1, vk+2, . . . , vm}) = V .
This completes the proof.
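The proof is constructive, and the exchange can be carried out greedily. The sketch below (ours, for R^n with numerical rank checks) swaps the vectors of a linearly independent set T into a spanning set S one at a time while preserving the span.

    # Sketch (not from the notes): a greedy Steinitz exchange over R^3.
    import numpy as np

    S = [np.array(v, float) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]]  # spans R^3
    T = [np.array(v, float) for v in [(1, 1, 0), (1, -1, 0)]]            # independent

    current = list(S)
    for k, w in enumerate(T):
        for j in range(k, len(current)):
            trial = list(current)
            trial[j] = w                 # try replacing one original vector by w
            if np.linalg.matrix_rank(np.array(trial)) == 3:
                current[j] = current[k]  # keep the displaced vector available
                current[k] = w           # w now occupies slot k
                break

    print([tuple(v) for v in current])                      # starts with T
    print(np.linalg.matrix_rank(np.array(current)) == 3)    # still spans R^3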
We can now give all the consequences of this theorem.
Corollary 1.4.3. Let V be an F-vector space. If B1 and B2 are both bases for V then they
have the same number of elements.
Proof. If B1 is finite, with n elements, then suppose that B2 has at least n + 1 elements.
Choosing any such subset of size n + 1 as T and B1 as the spanning set in the previous
theorem, we see that n + 1 ≤ n, a contradiction. This means #B2 ≤ #B1. If on the
other hand B1 is infinite, then if B2 were finite, we could reverse the roles of B2 and B1,
apply Steinitz again, and see #B1 ≤ #B2, a contradiction. Therefore in all cases we have
#B2 ≤ #B1. Applying this same logic with B1 and B2 reversed, we get #B1 ≤ #B2, proving
the corollary.
Definition 1.4.4. A vector space with a basis of size n is called n-dimensional and we
write dim(V) = n. If this is true for some n we say the vector space is finite dimensional.
Otherwise we say that V is infinite dimensional and write dim(V) = ∞.
Note that {~0} is zero-dimensional, since ∅ is a basis for it.
Corollary 1.4.5. Let V be an n-dimensional vector space (n ≥ 1) and S = {v1, . . . , vm}.
1. If m < n then S cannot span V .
2. If m > n then S cannot be linearly independent.
3. If m = n then S is linearly independent if and only if it spans V .
Proof. Let B be a basis for V. Then using Steinitz with B as the linearly independent set
and S as the spanning set, we see that if S spans V then S has at least n elements, proving
the first part. Similarly, using Steinitz with B as the spanning set and S as the linearly
independent set, we get part two.
If m = n and S is linearly independent then Steinitz implies that we can add 0 vectors
from B to S to make S span V. This means S itself spans V. Conversely, if S spans V but
is not linearly independent, we can find v ∈ S such that v ∈ Span(S \ {v}), so
V = Span(S) ⊆ Span(S \ {v}). Therefore S \ {v} is a smaller spanning set, contradicting
the first part.
Corollary 1.4.6. If W is a subspace of V then dim(W) ≤ dim(V). In particular, if V has
a finite basis, so does W.
Proof. If V is infinite dimensional there is nothing to prove, so let B be a finite basis for V
of size n. Consider all subsets of W that are linearly independent. By the previous corollary,
none of these has more than n elements (they cannot be infinite either, since we could then
extract a linearly independent subset of size n + 1). Choose one with the largest
number of elements and call it BW. It must be a basis: the reason is that it is a maximal
linearly independent subset of W (this is an exercise on this week's homework). Because it
has no more than dim(V) elements, we are done.
Now we have one of two main subspace theorems. It says we can extend a basis for a
subspace to a basis for the full space.
Theorem 1.4.7 (One subspace theorem). Let W be a subspace of a finite-dimensional vector
space V . If BW is a basis for W , there exists a basis B of V containing BW .
Proof. Consider all linearly independent subsets of V that contain BW (there is at least one,
BW itself!) and choose one, S, of maximal size. We know that #S ≤ dim V, and if #S = dim V
it must be a basis and we are done, so assume that #S = k < dim V. We must then
have Span(S) ≠ V, so choose a vector v ∈ V \ Span(S). We claim that S ∪ {v} is linearly
independent, contradicting maximality of S. To see this write S = {v1, . . . , vk} and
a1 v1 + · · · + ak vk + bv = ~0 .
If b ≠ 0 then we can solve for v, getting v ∈ Span(S), a contradiction, so we must have
b = 0. But then a1 v1 + · · · + ak vk = ~0 and linear independence of S gives ai = 0 for all i.
Thus S ∪ {v} is linearly independent, which contradicts the maximality of S.
The second subspace theorem will follow from a dimension theorem.
Theorem 1.4.8. Let W1, W2 be subspaces of V, a finite-dimensional vector space. Then
dim(W1 + W2) + dim(W1 ∩ W2) = dim(W1) + dim(W2) .
Proof. Let B̄ be a basis for the intersection W1 ∩ W2. By the one subspace theorem we can
find bases B1 and B2 of W1 and W2 respectively that both contain B̄. Write
B̄ = {v1, . . . , vk}
B1 = {v1, . . . , vk, vk+1, . . . , vl}
B2 = {v1, . . . , vk, wk+1, . . . , wm} .
We will now show that B = B1 ∪ B2 is a basis for W1 + W2. This will prove the theorem,
since then dim(W1 + W2) + dim(W1 ∩ W2) = (l + m − k) + k = l + m.
To show that B is a basis for W1 + W2 we first must prove Span(B) = W1 + W2. Since
B ⊆ W1 + W2, we have Span(B) ⊆ Span(W1 + W2) = W1 + W2. On the other hand, each
vector in W1 + W2 can be written as w1 + w2 for w1 ∈ W1 and w2 ∈ W2. Because B contains
a basis for each of W1 and W2, these vectors w1 and w2 can be written in terms of vectors
in B, so w1 + w2 ∈ Span(B).
Next we show that B is linearly independent. We set a linear combination equal to zero:
a1 v1 + · · · + ak vk + ak+1 vk+1 + · · · + al vl + bk+1 wk+1 + · · · + bm wm = ~0 .     (2)
By moving the w terms to one side we find that bk+1 wk+1 + · · · + bm wm ∈ W1. But this
sum is already in W2, so it must be in the intersection. As B̄ is a basis for the intersection
we can write
bk+1 wk+1 + · · · + bm wm = c1 v1 + · · · + ck vk
for some ci's in F. Moving the v's to the other side and using linear independence of B2 gives
bk+1 = · · · = bm = 0. Therefore (2) reads
a1 v1 + · · · + al vl = ~0 .
Using linear independence of B1 gives ai = 0 for all i, and thus B is linearly independent.
The proof of this theorem gives:
Theorem 1.4.9 (Two subspace theorem). If W1 , W2 are subspaces of a finite-dimensional
vector space V , there exists a basis of V that contains bases of W1 and W2 .
Proof. Use the proof of the last theorem to get a basis for W1 + W2 containing bases of W1
and W2 . Then use the one-subspace theorem to extend it to V .
Note the difference from the one subspace theorem. We are not claiming that you can
extend any given bases of W1 and W2 to a basis of V . We are just claiming there exists at
least one basis of V such that part of this basis is a basis for W1 and part is a basis for W2 .
In fact, given bases of W1 and W2 we cannot generally find a basis of V containing these
bases. Take
V = R3 , W1 = {(x, y, 0) : x, y R}, W2 = {(x, 0, z) : x, z R} .
If we take bases B1 = {(1, 0, 0), (1, 1, 0)} and B2 = {(1, 0, 1), (0, 0, 1)}, there is no basis of
V = R3 containing both B1 and B2 since V is 3-dimensional.
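Before moving to the exercises, here is a numerical sanity check of Theorem 1.4.8 (our sketch, over R rather than a general field): dim(W1 + W2) is the rank of the stacked generating vectors, and W1 ∩ W2 is the orthogonal complement of the sum of the two orthogonal complements.

    # Sketch (not from the notes): checking
    # dim(W1 + W2) + dim(W1 ∩ W2) = dim(W1) + dim(W2) for random subspaces of R^6.
    import numpy as np

    def complement(A, tol=1e-10):
        # rows spanning {x : A x = 0}, the orthogonal complement of A's row space
        _, s, vt = np.linalg.svd(A)
        return vt[int((s > tol).sum()):]

    rng = np.random.default_rng(0)
    A1 = rng.standard_normal((3, 6))   # rows span W1
    A2 = rng.standard_normal((4, 6))   # rows span W2

    dim_sum = np.linalg.matrix_rank(np.vstack([A1, A2]))
    dim_int = 6 - np.linalg.matrix_rank(np.vstack([complement(A1), complement(A2)]))
    print(dim_sum + dim_int == np.linalg.matrix_rank(A1) + np.linalg.matrix_rank(A2))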

1.5 Exercises

We will write N = {1, 2, . . .} and Z = {. . . , −1, 0, 1, . . .} for the natural numbers and integers,
respectively. Let N̄ = N ∪ {0}. The rationals are Q = {m/n : m, n ∈ Z, n ≠ 0} and R
stands for the real numbers.
1. If a, b ∈ N we say that a divides b, written a | b, if there is another natural number c
such that b = ac. Fix m, n ∈ N and define
S = {mp + nq : p, q ∈ Z} ∩ N .
(a) Let d be the smallest element of S. Show that d | m and d | n.
Hint. You can use the statement of the division algorithm without proof; that
is, if a, b ∈ N then there exist r, s ∈ N̄ such that r < b and a = bs + r.
(b) Show that if e is another element of N that divides both m and n then e | d. This
number d is called the greatest common divisor of m and n, written d = gcd(m, n).
(c) For any nonzero integers m, n define gcd(m, n) = gcd(|m|, |n|). Show there exist
p, q ∈ Z such that mp + nq = gcd(m, n).
2. Let p be a prime and Zp be the set {0, . . . , p − 1}. Show that Zp is a field using the
operations
ab = (ab) mod p and a + b = (a + b) mod p .
Here we have defined a mod p, for a ∈ N̄, as the unique r ∈ N̄ with r < p such that
a = ps + r for some s ∈ N̄.
3. Let S be a nonempty set and F a field. Let V be the set of functions from S to F and
define addition and scalar multiplication on (V, F) by
(f + g)(s) = f (s) + g(s) and (cf )(s) = c(f (s)) .
Show V is a vector space over F.

4. Let W be a subspace of an F-vector space V and define the set
V/W = {v + W : v ∈ V} .
Here the notation v + W means the set {v + w : w ∈ W}, so V/W is a set whose
elements are sets.
(a) Show that two elements v1 + W and v2 + W of V/W are equal if and only if
v1 − v2 ∈ W. In this case we say that v1 and v2 are equivalent modulo W.
(b) Show that the elements of V/W form a partition of V. That is, their union is V
and distinct elements must have empty intersection.
(c) In the case of V = R^2 and W = {(x, y) : x + y = 0}, with F = R, give a geometric
description of the elements of V/W.
(d) Define addition and scalar multiplication on V/W as follows. For C1, C2 ∈ V/W,
select v1, v2 ∈ V such that C1 = v1 + W and C2 = v2 + W. Define
C1 + C2 = (v1 + v2) + W and cC1 = (cv1) + W for c ∈ F .
Show that these definitions do not depend on the choice of v1, v2.
(e) Prove that the above operations turn V/W into an F-vector space. It is called
the quotient space of V over W.
5. If V is an F-vector space, recall that V is finitely generated if there is a finite set S ⊆ V
such that V = Span(S).
(a) Is R finitely generated as a vector space over Q?
(b) Is the space of functions from R to R finitely generated as a vector space over R?
6. Show that if S ⊆ V is a finite generating set then S contains a basis for V. Deduce
that V is finitely generated if and only if it has a finite basis.
7. Let S ⊆ V.
(a) Suppose that S generates V but no proper subset of S generates V (that is, S is
a minimal spanning set). Show that S is a basis.
(b) Suppose that S is linearly independent and is not a proper subset of any linearly
independent set in V (that is, S is a maximal linearly independent set). Show
that S is a basis.
8. Let W be a subspace of V .
(a) We say that S ⊆ V is linearly independent modulo W if whenever v1, . . . , vk ∈ S
and a1, . . . , ak ∈ F are such that
a1 v1 + · · · + ak vk ∈ W
then a1 = · · · = ak = 0. Show that S is linearly independent modulo W if and
only if the set {v + W : v ∈ S} is linearly independent as a subset of V/W.
(b) Assume now that V has dimension n < ∞. If W has dimension m, show that
V/W has dimension n − m.
Hint. Let BW be a basis for W and use the one subspace theorem to extend it
to a basis B for V. Show that {v + W : v ∈ B but v ∉ BW} is a basis for V/W.
(c) Let W1 ⊆ W2 ⊆ V be subspaces. Show that
dim W2/W1 + dim V/W2 = dim V/W1 .
9. If W1, . . . , Wk are subspaces of V we write W1 ⊕ · · · ⊕ Wk for the sum space W1 + · · · + Wk
if
Wj ∩ [W1 + · · · + Wj−1] = {~0} for all j = 2, . . . , k .
In this case we say that the subspaces W1, . . . , Wk are independent.
(a) For k = 2, this definition is what we gave in class: W1 and W2 are independent if
and only if W1 ∩ W2 = {~0}. Give an example to show that for k > 2 this is not
true. That is, if W1, . . . , Wk satisfy Wi ∩ Wj = {~0} for all i ≠ j then these spaces
need not be independent.
(b) Prove that the following are equivalent.
1. W1, . . . , Wk are independent.
2. Whenever w1 + · · · + wk = ~0 for wi ∈ Wi for all i, then wi = ~0 for all i.
3. Whenever Bi is a basis for Wi for all i, the Bi's are disjoint and B := ∪_{i=1}^k Bi
is a basis for W1 + · · · + Wk.
10. Give an example to show that there is no three subspace theorem. That is, if
W1 , W2 , W3 are subspaces of V then there need not exist a basis of V containing a
basis for Wi for all i = 1, 2, 3.
11. Let F be a finite field. Define a sequence (sn) of elements of F by s1 = 1 and sn+1 =
sn + 1 for n ∈ N. Last, define the characteristic of F as
char(F) = min{n ∈ N : sn = 0} .
(If the set on the right is empty, we set char(F) = 0.)
(a) Show that because F is finite, its characteristic is a prime number p.
(b) Show that the set {0, s1, . . . , sp−1} with the same addition and multiplication as
in F is itself a field, called the prime subfield of F.
(c) Using the fact that F can be viewed as a vector space over its prime subfield,
show that F has p^n elements for some n ∈ N.

2 Linear transformations

We now move on to the main subject of the course, linear transformations.

2.1 Definitions

Definition 2.1.1. Let V and W be vector spaces over the same field F. A function T : V → W
is called a linear transformation if
T(v1 + v2) = T(v1) + T(v2) and T(cv1) = cT(v1) for all v1, v2 ∈ V and c ∈ F .
As usual, we only need to check the single condition
T(cv1 + v2) = cT(v1) + T(v2) for v1, v2 ∈ V and c ∈ F .
Examples
1. Consider C as a vector space over itself. Then if T : C → C is linear, we can write
T(z) = zT(1) ,
so T is completely determined by its value at 1.
2. Let V be finite dimensional and B = {v1, . . . , vn} a basis for V. Each v ∈ V can be
written uniquely as
v = a1 v1 + · · · + an vn for ai ∈ F .
So define T : V → F^n by T(v) = (a1, . . . , an). This is called the coordinate map relative
to B. It is linear because if v = a1 v1 + · · · + an vn, w = b1 v1 + · · · + bn vn and c ∈ F,
cv + w = (ca1 + b1)v1 + · · · + (can + bn)vn
is one representation of cv + w in terms of the basis. But this representation is unique,
so we get
T(cv + w) = (ca1 + b1, . . . , can + bn) = c(a1, . . . , an) + (b1, . . . , bn) = cT(v) + T(w) .
3. Given any m × n matrix A with entries from F (the notation from the homework is
A ∈ M_{m,n}(F)), we can define linear transformations L_A : F^n → F^m and R_A : F^m → F^n
by
L_A(v) = A · v and R_A(v) = v · A .
Here we are using matrix multiplication and, in the first case, representing v as a column
vector. In the second, v is a row vector.
4. In fact, the set of linear transformations from V to W, written L(V, W), forms a vector
space! Since the space of functions from V to W is a vector space, it suffices to check
that it is a subspace. So given T, U ∈ L(V, W) and c ∈ F, we must show that cT + U
is a linear transformation. So let v, w ∈ V and c' ∈ F:
(cT + U)(c'v + w) = (cT)(c'v + w) + U(c'v + w)
= c(T(c'v + w)) + U(c'v + w)
= c(c'T(v) + T(w)) + c'U(v) + U(w)
= c'(cT(v) + U(v)) + cT(w) + U(w)
= c'(cT + U)(v) + (cT + U)(w) .
Another obvious way to build linear transformations is composition.
Proposition 2.1.2. Let T : V → W and U : W → Z be linear (with all spaces over the
same field F). Then the composition UT is a linear transformation from V to Z.
Proof. Let v1, v2 ∈ V and c ∈ F. Then
(UT)(cv1 + v2) = U(T(cv1 + v2)) = U(cT(v1) + T(v2))
= cU(T(v1)) + U(T(v2)) = c(UT)(v1) + (UT)(v2) .
Recall that each linear T : C → C is completely determined by its value at 1.
Note that {1} is a basis. This fact holds true for all linear transformations and is one of the
most important theorems of the course: in the words of Conway, each linear transformation
is completely determined by its values on a basis, and any values will do!
Theorem 2.1.3 (The slogan). Let V and W be vector spaces over F. If {v1, . . . , vn} is
a basis for V and w1, . . . , wn are any vectors in W (with possible duplicates) then there is
exactly one T ∈ L(V, W) such that T(vi) = wi for all i = 1, . . . , n.
Proof. This is an existence and uniqueness statement, so let's first prove uniqueness. Suppose
that T, U ∈ L(V, W) both map vi to wi for all i. Then write an arbitrary v ∈ V uniquely as
v = a1 v1 + · · · + an vn. We have
T(v) = T(a1 v1 + · · · + an vn) = a1 T(v1) + · · · + an T(vn) = a1 w1 + · · · + an wn
= a1 U(v1) + · · · + an U(vn) = U(a1 v1 + · · · + an vn) = U(v) .
To prove existence we must construct one such linear map. Each v ∈ V can be written
uniquely as v = a1 v1 + · · · + an vn, so define T : V → W by
T(v) = a1 w1 + · · · + an wn .
The fact that T is a function (that is, for each v ∈ V there is exactly one w ∈ W such that
T(v) = w) follows from uniqueness of the representation of v in terms of the basis. So we
must show linearity. If v, v' ∈ V, write v = a1 v1 + · · · + an vn and v' = b1 v1 + · · · + bn vn. We
have, for c ∈ F,
T(cv + v') = T((ca1 + b1)v1 + · · · + (can + bn)vn)
= (ca1 + b1)w1 + · · · + (can + bn)wn
= c(a1 w1 + · · · + an wn) + (b1 w1 + · · · + bn wn)
= cT(v) + T(v') .
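To make the slogan concrete (this sketch is ours), one can extend arbitrary prescribed values on a basis of R^2 to a linear map into R^3: compute the coordinates of v relative to the basis and recombine them with the chosen images.

    # Sketch (not from the notes): the unique linear extension of v_i -> w_i.
    import numpy as np

    B = np.column_stack([[1.0, 1.0], [1.0, -1.0]])    # columns: basis v1, v2 of R^2
    W = np.column_stack([[1.0, 0.0, 2.0],
                         [0.0, 1.0, 3.0]])            # columns: chosen images w1, w2

    def T(v):
        a = np.linalg.solve(B, v)    # coordinates a1, a2 of v in the basis
        return W @ a                 # a1*w1 + a2*w2

    v = np.array([3.0, 1.0])         # v = 2*v1 + 1*v2
    print(T(v))                      # 2*w1 + 1*w2 = [2. 1. 7.]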

2.2 Range and nullspace

Next we define two very important subspaces that are related to a linear transformation T.
Definition 2.2.1. Let T : V → W be linear. The nullspace, or kernel, of T is the set
N(T) ⊆ V defined by
N(T) = {v ∈ V : T(v) = ~0} .
The range, or image, of T, is the set R(T) ⊆ W defined by
R(T) = {w ∈ W : T(v) = w for some v ∈ V} .
In the definition of N(T) above, ~0 is the zero vector in the space W.
Proposition 2.2.2. Let T : V → W be linear. Then N(T) is a subspace of V and R(T) is
a subspace of W.
Proof. First, N(T) is nonempty, since each linear transformation must map ~0 to ~0: T(~0) =
T(0~0) = 0T(~0) = ~0. If v1, v2 ∈ N(T) and c ∈ F,
T(cv1 + v2) = cT(v1) + T(v2) = c~0 + ~0 = ~0 ,
so cv1 + v2 ∈ N(T), showing that N(T) is a subspace of V. R(T) is also nonempty,
since ~0 is mapped to by ~0. If w1, w2 ∈ R(T) and c ∈ F, choose v1, v2 ∈ V such that
T(v1) = w1 and T(v2) = w2. Then
cw1 + w2 = cT(v1) + T(v2) = T(cv1 + v2) ,
so cw1 + w2 is mapped to by cv1 + v2, a vector in V, and we are done.
In the finite-dimensional case, the dimensions of these spaces are so important they get
their own names: the rank of T is the dimension of R(T ) and the nullity of T is the dimension
of N (T ). The next theorem relates these dimensions to each other.
Theorem 2.2.3 (Rank-nullity). Let T : V → W be linear and dim(V) < ∞. Then
rank(T) + nullity(T) = dim(V) .
Proof. In a way, this theorem is best proved using quotient spaces, and you will do this in
the homework. We will prove it the more standard way, by counting and using bases. Let
{v1, . . . , vk} be a basis for the nullspace of T and extend it to a basis {v1, . . . , vk, vk+1, . . . , vn}
for V. We claim that T(vk+1), . . . , T(vn) are distinct and form a basis for R(T); this will
complete the proof. If T(vi) = T(vj) for some i ≠ j in {k + 1, . . . , n}, we then have T(vi − vj) =
~0, implying that vi − vj ∈ N(T). But we have a basis for N(T): we can write
vi − vj = a1 v1 + · · · + ak vk
and, moving vi − vj to the other side, we have a linear combination of elements of a basis
equal to zero with some nonzero coefficients, a contradiction.
Now we show B = {T(vk+1), . . . , T(vn)} is a basis for R(T). These vectors are clearly contained
in the range, so Span(B) ⊆ R(T). Conversely, if w ∈ R(T) we can write w = T(v) for some
v ∈ V and, using the basis, find coefficients bi such that
w = T(v) = T(b1 v1 + · · · + bn vn) .
Expanding the inside, we get b1 T(v1) + · · · + bn T(vn). The first k vectors are zero, since
v1, . . . , vk ∈ N(T), so
w = bk+1 T(vk+1) + · · · + bn T(vn) ,
proving w ∈ Span(B), and therefore B spans R(T).
For linear independence, let bk+1 T(vk+1) + · · · + bn T(vn) = ~0. Then
~0 = T(bk+1 vk+1 + · · · + bn vn) ,
so bk+1 vk+1 + · · · + bn vn ∈ N(T). As before, we can then write this vector in terms of
v1, . . . , vk and use linear independence of {v1, . . . , vn} to get bi = 0 for all i.
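As a numerical illustration (ours, not the notes'), for the map L_A(v) = Av on R^5 the rank is the number of nonzero singular values of A and the remaining rows of V^T in the SVD span the nullspace, so the two dimensions add up to 5.

    # Sketch (not from the notes): rank(L_A) + nullity(L_A) = dim(R^5).
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 5))
    A[2] = A[0] + A[1]                        # make the rank drop on purpose

    rank = np.linalg.matrix_rank(A)
    _, s, vt = np.linalg.svd(A)
    null_basis = vt[(s > 1e-10).sum():]       # rows spanning {v : A v = 0}
    print(rank, len(null_basis), rank + len(null_basis) == 5)   # 2 3 True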
One reason the range and nullspace are important is that they tell us when a transformation
is one-to-one (injective) or onto (surjective). Recall these definitions:
Definition 2.2.4. If X and Y are sets and f : X → Y is a function then we say that f is
one-to-one (injective) if f maps distinct points to distinct points; that is, if x1, x2 ∈ X with
x1 ≠ x2 then f(x1) ≠ f(x2). We say that f is onto (surjective) if each point of Y is mapped
to by some x; that is, for each y ∈ Y there exists x ∈ X such that f(x) = y.
Proposition 2.2.5. Let T : V → W be linear. Then
1. T is injective if and only if N(T) = {~0}.
2. T is surjective if and only if R(T) = W.
Proof. The second is just the definition of surjectivity, so we prove the first. Suppose that
T is injective and let v ∈ N(T). Then T(v) = ~0 = T(~0), and because T is injective, v = ~0,
proving that N(T) ⊆ {~0}. As N(T) is a subspace, we have {~0} ⊆ N(T), giving equality.
Conversely suppose that N(T) = {~0}; we will prove that T is injective. So assume that
T(v1) = T(v2). By linearity, T(v1 − v2) = ~0, so v1 − v2 ∈ N(T). But the only vector in N(T)
is the zero vector, so v1 − v2 = ~0, giving v1 = v2, and T is injective.
In the previous proposition, the second part holds for all functions T , regardless of
whether they are linear. The first, however, need not be true if T is not linear. (Think
of an example!)
We can give an alternative characterization of one-to-one and onto:
Proposition 2.2.6. Let T : V → W be linear.
1. T is injective if and only if it maps linearly independent sets of V to linearly independent sets of W.
2. T is surjective if and only if it maps spanning sets of V to spanning sets of W.
3. T is bijective if and only if it maps bases of V to bases of W.
Proof. The third part follows from the first two. For the first, assume that T is injective
and let S ⊆ V be linearly independent. We will show that T(S) = {T(v) : v ∈ S} is linearly
independent. So let
a1 T(v1) + · · · + an T(vn) = ~0 .
This implies that T(a1 v1 + · · · + an vn) = ~0, implying that a1 v1 + · · · + an vn = ~0 by injectivity.
But this is a linear combination of vectors in S, a linearly independent set, giving ai = 0 for
all i. Thus T(S) is linearly independent.
Conversely, suppose that T maps linearly independent sets to linearly independent sets,
and let v ∈ N(T). If v ≠ ~0 then {v} is linearly independent, so {T(v)} is linearly independent.
But T(v) = ~0, and {~0} is linearly dependent, which is impossible. Thus no nonzero vector
lies in N(T), so N(T) = {~0}, implying T is injective.
For item two, suppose that T is surjective and let S be a spanning set for V. Then if
w ∈ W we can find v ∈ V such that T(v) = w and a linear combination of vectors of S equal
to v: v = a1 v1 + · · · + an vn for vi ∈ S. Therefore
w = T(v) = a1 T(v1) + · · · + an T(vn) ,
meaning that w ∈ Span(T(S)), so T(S) spans W. Conversely, if T maps spanning
sets to spanning sets, then T(V) = R(T) must span W. But since R(T) is a subspace of W,
this means R(T) = W and T is onto.

2.3 Isomorphisms

Definition 2.3.1. A linear transformation T : V → W that is bijective (that is, injective
and surjective) is called an isomorphism.
Generally speaking, we can view a bijection between sets X and Y as a relabeling of the
elements of X (to get those of Y). In the case of an isomorphism, this relabeling also respects
the vector space structure, being linear.
Proposition 2.3.2. Let T : V → W be an isomorphism. Then T^{-1} : W → V is an
isomorphism. Here, as always, the inverse function is defined by
T^{-1}(w) = v if and only if T(v) = w .
Proof. It is an exercise to see that any bijection has a well-defined inverse function and that
this inverse function is a bijection. (This was done, for example, in the 215 notes in the first
chapter.) So we must only show that T^{-1} is linear. To this end, let w1, w2 ∈ W and c ∈ F.
Then
T(T^{-1}(cw1 + w2)) = cw1 + w2 ,
whereas
T(cT^{-1}(w1) + T^{-1}(w2)) = cT(T^{-1}(w1)) + T(T^{-1}(w2)) = cw1 + w2 .
Since T is injective, this implies that T^{-1}(cw1 + w2) = cT^{-1}(w1) + T^{-1}(w2).
Using the notion of isomorphism, we can see that any n-dimensional vector space V over
F is just F^n.
Theorem 2.3.3. Let V be an n-dimensional vector space over F. Then V is isomorphic to
F^n.
Proof. Let B = {v1, . . . , vn} be a basis for V. We will think of B as being ordered. Define the
coordinate map TB : V → F^n as before, as follows. Each v ∈ V has a unique representation
v = a1 v1 + · · · + an vn; set TB(v) = (a1, . . . , an). This was shown before to be a linear
transformation, so we must just show it is an isomorphism.
Since the dimension of V is equal to that of F^n, we need only show that TB is onto. Then
by the rank-nullity theorem, we will find
dim N(TB) = dim(V) − dim(R(TB)) = dim(V) − dim(F^n) = 0 ,
implying that N(TB) = {~0}, and that TB is one-to-one. So to show onto, let (a1, . . . , an) ∈ F^n.
The element v = a1 v1 + · · · + an vn maps to it:
TB(v) = TB(a1 v1 + · · · + an vn) = (a1, . . . , an) ,
so TB is onto and therefore an isomorphism.

2.4 Matrices and coordinates

We will now see that, just as V with dimension n looks just like F^n, all linear maps from
V to W look just like matrices with entries from F.
Suppose that T : V → W is linear and these are finite-dimensional vector spaces with
dimension n and m respectively. Fix B = {v1, . . . , vn} and C = {w1, . . . , wm} to be bases of
V and W respectively. We know that T is completely determined by its values on B, and
each of these values lies in W, so we can write
T(v1) = a1,1 w1 + · · · + am,1 wm
T(v2) = a1,2 w1 + · · · + am,2 wm
and so on, up to
T(vn) = a1,n w1 + · · · + am,n wm .
Now we take an arbitrary v ∈ V and express it in terms of coordinates using B. This
time we write it as a column vector and use the notation [v]_B:
[v]_B is the column vector with entries a1, . . . , an, where v = a1 v1 + · · · + an vn .
Let us compute T(v) and write it in terms of C:
T(v) = a1 T(v1) + · · · + an T(vn)
= a1(a1,1 w1 + · · · + am,1 wm) + · · · + an(a1,n w1 + · · · + am,n wm)
= (a1 a1,1 + · · · + an a1,n)w1 + · · · + (a1 am,1 + · · · + an am,n)wm .
Therefore we can write T(v) in coordinates using C as
[T(v)]_C = the column vector whose j-th entry is a1 aj,1 + · · · + an aj,n ,
which is exactly the product of the m × n matrix with (i, j) entry ai,j and the column
vector [v]_B.
Therefore we have found one half of:


Theorem 2.4.1 (Matrix representation). Let T : V → W be linear and B = {v1, . . . , vn}
and C = {w1, . . . , wm} be (ordered) bases of V and W respectively. There exists a unique
matrix, written [T]^B_C, such that for all v ∈ V,
[T(v)]_C = [T]^B_C [v]_B .
Proof. We have already shown existence. To show uniqueness, suppose that A is any m × n
matrix with entries from F such that for all v ∈ V, A[v]_B = [T(v)]_C. Choose v = vi for
some i = 1, . . . , n (one of the basis vectors in B). Then the coordinate representation of v
is [v]_B = ei, the vector with all 0's but a 1 in the i-th spot. Now the product A[v]_B
actually gives the i-th column of A. We can see this by using the matrix multiplication
formula: if M is an m × n matrix and N is an n × p matrix then the matrix MN is m × p
and its (i, j)-th coordinate is given by
(MN)_{i,j} = Σ_{k=1}^{n} M_{i,k} N_{k,j} .
Therefore, as A is m × n and [v]_B is n × 1, the matrix A[v]_B is m × 1 and its (j, 1)-th coordinate
is
(A[v]_B)_{j,1} = Σ_{k=1}^{n} A_{j,k} ([v]_B)_{k,1} = Σ_{k=1}^{n} A_{j,k} (ei)_{k,1} = A_{j,i} .
This means the entries of A[v]_B are A_{1,i}, A_{2,i}, . . . , A_{m,i}, the i-th column of A. However, this
also equals [T(vi)]_C, which is the i-th column of [T]^B_C by construction. Thus A and [T]^B_C have
the same columns and are thus equal.
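The construction of [T]^B_C can be mirrored numerically (our sketch, with T given by a matrix A in standard coordinates and B, C non-standard bases of R^2 and R^3): the i-th column is the coordinate vector of T(vi) relative to C, and the defining identity can then be verified directly.

    # Sketch (not from the notes): build [T]^B_C column by column and check
    # [T(v)]_C = [T]^B_C [v]_B.
    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 1.0],
                  [3.0, 0.0]])                          # T(v) = A v : R^2 -> R^3
    B_mat = np.column_stack([[1.0, 1.0], [1.0, -1.0]])  # columns: basis B of R^2
    C_mat = np.column_stack([[1.0, 0.0, 0.0],
                             [1.0, 1.0, 0.0],
                             [1.0, 1.0, 1.0]])          # columns: basis C of R^3

    def coords(M, x):
        return np.linalg.solve(M, x)     # coordinates of x relative to the basis M

    T_BC = np.column_stack([coords(C_mat, A @ B_mat[:, i]) for i in range(2)])

    v = np.array([2.0, 5.0])
    print(np.allclose(coords(C_mat, A @ v), T_BC @ coords(B_mat, v)))   # True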
In fact much more is true. What we have done so far is define a mapping Φ : L(V, W) →
M_{m,n}(F) in the following manner. Given fixed bases B and C of sizes n and m respectively,
we set
Φ(T) = [T]^B_C .
This function is actually an isomorphism, meaning that the space of linear transformations
is just a relabeling of the space of matrices (after choosing coordinates B and C):
Theorem 2.4.2. Given bases B and C of V and W of sizes n and m, the spaces L(V, W)
and M_{m,n}(F) are isomorphic via the mapping Φ.
Proof. We must show that Φ is a bijection and linear. First off, if Φ(T) = Φ(U) then for all
v ∈ V, we have
[T(v)]_C = Φ(T)[v]_B = Φ(U)[v]_B = [U(v)]_C .
But the map sending vectors in W to their coordinates relative to C is also a bijection, so
T(v) = U(v). Since this is true for all v, we get T = U, meaning Φ is injective. To show
surjectivity, let A be any m × n matrix with (i, j)-th entry A_{i,j}. Then we can define a linear
transformation T : V → W by its action on the basis B: set
T(vi) = A_{1,i} w1 + · · · + A_{m,i} wm .
By the slogan, there is a unique linear transformation satisfying this, and you can then check
that [T]^B_C = A, meaning Φ is surjective and therefore a bijection.
To see that Φ is linear, let T, U ∈ L(V, W) and c ∈ F. Then the i-th column of [cT + U]^B_C
is simply the coefficients of (cT + U)(vi) expressed relative to the basis C. This coordinate
map is linear, so
[(cT + U)(vi)]_C = [cT(vi) + U(vi)]_C = c[T(vi)]_C + [U(vi)]_C ,
which is c times the i-th column of Φ(T) plus the i-th column of Φ(U). Thus
[cT + U]^B_C = c[T]^B_C + [U]^B_C .

Last time we saw that if V and W have dimension n and m and we fix bases B of V and
C of W then there is an isomorphism Φ : L(V, W) → M_{m,n}(F) given by
Φ(T) = [T]^B_C .
A simple corollary of this follows. Because an isomorphism maps any basis to a basis, these
spaces have the same dimension:
Corollary 2.4.3. The dimension of L(V, W) is mn, where V has dimension n and W has
dimension m. Given bases B of V and C of W, a basis of L(V, W) is given by the set of
size mn
{T_{i,j} : 1 ≤ i ≤ n, 1 ≤ j ≤ m} ,
where T_{i,j} is the unique linear transformation sending vi to wj and all other elements of B
to ~0.
Proof. Since L(V, W) and M_{m,n}(F) are isomorphic, they have the same dimension, which in
the latter case is mn (that was a homework problem). Further, the basis of M_{m,n}(F) of size
mn given by the matrices with a 1 in the (i, j)-th entry and 0 everywhere else maps under Φ^{-1}
to a basis for L(V, W), and it is exactly the set listed in the corollary.
We can now give many nice properties of the matrix representation.
1. Let T : V → W and U : W → Z be linear with B, C, D bases for V, W, Z. For any
v ∈ V,
[(UT)v]_D = [U(T(v))]_D = [U]^C_D [T(v)]_C = [U]^C_D [T]^B_C [v]_B .
However, [UT]^B_D is the unique matrix with this property, so we find
[UT]^B_D = [U]^C_D [T]^B_C .
In other words, composition of transformations corresponds to matrix multiplication. A
good way to remember this is that the C's cancel out on the right.
2. If T : V → W is an isomorphism, setting IdV : V → V and IdW : W → W as the
identity maps and I as the identity matrix,
I = [IdV]^B_B = [T^{-1}]^C_B [T]^B_C
I = [IdW]^C_C = [T]^B_C [T^{-1}]^C_B .
In other words, [T]^B_C is an invertible matrix.
Definition 2.4.4. We say that A ∈ M_{n,n}(F) is invertible if there is a B ∈ M_{n,n}(F)
such that AB = BA = I.
You will show in the homework that if A is invertible, there is exactly one (invertible) B
that satisfies AB = BA = I. Therefore we write A^{-1} = B. This gives
([T]^B_C)^{-1} = [T^{-1}]^C_B .
Exercise: if A is an invertible n × n matrix and B is a basis for V then there is an
isomorphism T : V → V such that [T]^B_B = A.
We summarize the relation between linear transformations and matrices using the
following table. Fix V, W, T : V → W and bases B, C of V, W.

Linear transformations           Matrices
v ∈ V                            the n × 1 column vector [v]_B
w ∈ W                            the m × 1 column vector [w]_C
T                                the m × n matrix [T]^B_C
UT (composition)                 [U]^C_D [T]^B_C (matrix multiplication)
isomorphisms                     invertible matrices
3. Change of basis. Suppose we have T : V → W with B, C bases of V, W. We would
like to relate [T]^B_C to [T]^{B'}_{C'}, the matrix relative to other bases B', C' of V, W. How do
we do this? Consider the matrices [IdV]^{B'}_B and [IdW]^C_{C'}:
[IdW]^C_{C'} [T]^B_C [IdV]^{B'}_B = [IdW T IdV]^{B'}_{C'} = [T]^{B'}_{C'} .
Note that [IdW]^C_{C'} and [IdV]^{B'}_B are invertible. Therefore:
If T : V → W is linear and B, B' are bases of V with C, C' bases of W, there exist
invertible matrices P = [IdW]^C_{C'} ∈ M_{m,m}(F) and Q = [IdV]^{B'}_B ∈ M_{n,n}(F) such that
[T]^{B'}_{C'} = P [T]^B_C Q .
Not only is each [IdV]^{B'}_B invertible, each invertible matrix can be seen as a change
of basis matrix: given a basis B of V and an invertible matrix P ∈ M_{n,n}(F), there
exists a basis B' of V such that P = [IdV]^{B'}_B.
Proof. By the exercise above, there is an isomorphism TP : V → V such that
[TP]^B_B = P. Writing B = {v1, . . . , vn}, define B' = {TP(v1), . . . , TP(vn)}. Then the
j-th column of [IdV]^{B'}_B is computed by evaluating
[IdV(TP(vj))]_B = [TP(vj)]_B = the j-th column of [TP]^B_B, which is the j-th column of P .
So [IdV]^{B'}_B and P have the same columns and are thus equal.
In one case we have a simpler form for P and Q. Suppose that T : V → V is
linear and B, B' are bases for V. Then
[T]^{B'}_{B'} = [IdV]^B_{B'} [T]^B_B [IdV]^{B'}_B .
That is, we have [T]^{B'}_{B'} = P^{-1} [T]^B_B P, where P = [IdV]^{B'}_B is an invertible n × n matrix. This
motivates the definition:

Definition 2.4.5. Two n × n matrices A and B are said to be similar if there is
an invertible n × n matrix P such that B = P^{-1} A P.
The message is that similar matrices represent the same transformation, but relative to a different basis. Therefore if there is some property of matrices that is the
same for all matrices that are similar, we are right to say it is a property of the
underlying transformation. For instance we define the trace of an n × n matrix A
by
Tr(A) = Σ_{i=1}^{n} A_{i,i} .
We can show easily that Tr(AB) = Tr(BA):
Tr(AB) = Σ_{i=1}^{n} (AB)_{i,i} = Σ_{i=1}^{n} Σ_{k=1}^{n} A_{i,k} B_{k,i}
= Σ_{k=1}^{n} Σ_{i=1}^{n} B_{k,i} A_{i,k} = Σ_{k=1}^{n} (BA)_{k,k} = Tr(BA) .
Therefore if P is invertible, Tr(P^{-1}AP) = Tr(AP P^{-1}) = Tr(A). This means
that if T : V → V is linear, we can define its trace as Tr(T) = Tr([T]^B_B) for any
basis B (and it will not depend on our choice of B!).
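To close this section, here is a short numerical check (ours, not from the notes) of the facts above: composition corresponds to matrix multiplication, a change of basis is conjugation by an invertible P, and the trace is unchanged under that conjugation.

    # Sketch (not from the notes): composition, similarity, and trace invariance.
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3))          # [T]^B_B in the standard basis
    U = rng.standard_normal((3, 3))          # [U]^B_B
    P = rng.standard_normal((3, 3))          # columns of P form another basis B'

    v = rng.standard_normal(3)
    print(np.allclose(U @ (A @ v), (U @ A) @ v))         # [UT] = [U][T]

    A_prime = np.linalg.inv(P) @ A @ P                   # [T]^{B'}_{B'}
    print(np.isclose(np.trace(A), np.trace(A_prime)))    # trace is basis-free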

2.5 Exercises

1. Let T : V → V be linear with dim V < ∞. Show that the following two statements
are equivalent.
(A) V = R(T) ⊕ N(T).
(B) N(T) = N(T^2), where T^2 = TT.
2. Let T : V → W be linear with dim(V) = n and dim(W) = m.
(a) Prove that if n > m then T cannot be injective.
(b) Prove that if n < m then T cannot be surjective.
(c) Prove that if n = m then T is injective if and only if it is surjective.
3. Let V, W and Z be finite-dimensional vector spaces over F. If T : V → W and
U : W → Z are linear, show that
rank(UT) ≤ min{rank(U), rank(T)} .
Prove also that if either of U or T is invertible, the rank of UT is equal to the rank of
the other one. Deduce that if P : V → V and Q : W → W are isomorphisms then the
rank of QTP equals the rank of T.
4. Given an angle θ ∈ [0, 2π), let Tθ : R^2 → R^2 be the function that rotates a vector
clockwise about the origin by the angle θ. Find [Tθ]^B_B, where B = {(1, 0), (0, 1)}.
5. Let V and W be finite-dimensional vector spaces over F and T : V → W linear. Show
there exist ordered bases B of V and C of W such that
([T]^B_C)_{i,j} = 0 if i ≠ j, and ([T]^B_C)_{i,i} is either 0 or 1.
6. Let F be a field and consider the vector space of polynomials of degree at most n:
F_n[x] = {an x^n + · · · + a0 : ai ∈ F for i = 0, . . . , n} .
(a) Show that B = {1, x, x^2, . . . , x^n} is a basis for this space.
(b) Fix an element b ∈ F and define the evaluation map Tb : F_n[x] → F by Tb(p) =
p(b). Show this is linear. Find the range and nullspace of Tb.
(c) Give the representation of Tb in terms of the basis B for F_n[x] and the basis {1}
for F.
(d) For distinct b1, . . . , bn+2 in F show that the functions Tb1, . . . , Tbn+2 are linearly
dependent in L(F_n[x], F). Deduce that any polynomial p in F_n[x] with at least
n + 1 zeros must have p(x) = 0 for all x ∈ F.
7. Here you will give an alternative proof of the rank-nullity theorem. Let T : V → W
be linear and suppose that dim(V) < ∞.
(a) Consider the quotient space V/N(T) and define a function T̄ : V/N(T) → W as
follows. If C ∈ V/N(T) is some element, we may represent it as v + N(T) for
some v ∈ V. Select one such element v and define T̄(C) = T(v). Show that this
definition does not depend on the choice of v, so long as v + N(T) = C; that is,
that T̄ as defined is a (well-defined) function.
(b) Prove that T̄ is an isomorphism from V/N(T) to R(T). (This is a version of the
first isomorphism theorem, as proved for groups.)
(c) Deduce the rank-nullity theorem.
8. Let A ∈ M_{n,n}(F) be invertible and B be a basis for an n-dimensional F-vector space
V. Show there is an isomorphism T : V → V such that [T]^B_B = A.
9. (a) Let A ∈ M_{n,n}(F) be invertible. Show that the inverse matrix is unique.
(b) Let V be an n-dimensional F-vector space and T : V → V and U : V → V be
linear maps that satisfy
(UT)(v) = v for all v ∈ V .
Show that (TU)(v) = v for all v ∈ V.
(c) Let A, B ∈ M_{n,n}(F) satisfy AB = I. Show that BA = I.
10. If A ∈ M_{m,n}(F) we define the column rank of A as the dimension of the span of the n
columns of A in F^m. Similarly, we define the row rank of A as the dimension of the span
of the rows of A in F^n.
(a) Show that the column rank of A is equal to the rank of the linear transformation
L_A : F^n → F^m defined by L_A(v) = A · v, matrix multiplication of the column
vector v by the matrix A on the left.
(b) Use exercise 7 on the previous homework to show that if P ∈ M_{n,n}(F) and Q ∈
M_{m,m}(F) are both invertible then the column rank of A equals the column rank
of QAP.
(c) Show that the row rank of A is equal to the rank of the linear transformation
R_A : F^m → F^n defined by R_A(v) = v · A, viewing v as a row vector and multiplying
by A on the right.
(d) Show that if P ∈ M_{n,n}(F) and Q ∈ M_{m,m}(F) are both invertible then the row
rank of A equals the row rank of QAP.

(e) Use exercise 9 on the previous homework and parts (a)-(d) above to show that
the row rank of A equals the column rank of A.
11. Given m ∈ R define the line
Lm = {(x, y) ∈ R^2 : y = mx} .
(a) Let Tm be the function which maps a point in R^2 to its closest point on Lm. Find
the matrix of Tm relative to the standard basis.
(b) Let Rm be the function which maps a point in R^2 to the reflection of this point
about the line Lm. Find the matrix of Rm relative to the standard basis.
Hint for both. First find the matrix relative to a carefully chosen basis.

3 Dual spaces

3.1 Definitions

We have been talking about coordinates, so let's examine them more closely. Let V be an
n-dimensional vector space and fix a basis B = {v1, . . . , vn} of V. We can write any vector
v ∈ V in coordinates relative to B as [v]_B, the column vector with entries a1, . . . , an, where
v = a1 v1 + · · · + an vn.
For any i = 1, . . . , n we can define the i-th coordinate map vi* : V → F by vi*(v) = ai,
where ai is the i-th entry of [v]_B. These elements vi* are linear and are thus in the space
L(V, F). This space comes up so much that we give it a name:
Definition 3.1.1. We write V* = L(V, F) and call it the dual space to V. Elements of V*
will be written f and called linear functionals.
Given any basis B = {v1, . . . , vn} we call B* = {v1*, . . . , vn*} the basis of V* dual to B.
Proposition 3.1.2. If B is a basis of V then B* is a basis of V*.
Proof. The dimension of V* is n, the dimension of V, so we must show B* is linearly
independent or spanning. We show linearly independent: suppose that
a1 v1* + · · · + an vn* = ~0 ,
where ~0 on the right is the zero transformation from V to F. Apply both sides to vi. For
j ≠ i we get vj*(vi) = 0, since the j-th coordinate of vi is 0. For j = i we get vi*(vi) = 1, so
ai = (a1 v1* + · · · + an vn*)(vi) = ~0(vi) = 0 .
This is true for all i so B* is linearly independent and we are done.
It is not surprising that B* is a basis of V*. The reason is that each element f ∈ V* can
be written in its matrix form using the basis B of V and {1} of F. Then the matrix for vi*
is
[vi*]^B_{{1}} = (0 · · · 0 1 0 · · · 0) ,
where the 1 is in the i-th spot. Clearly these form a basis for M1,n(F) and since the map
sending linear transformations to their matrices relative to these bases is an isomorphism,
so should B* be a basis of V*.
There is an alternate characterization: each vi* is in L(V, F) so can be identified by its
action on the basis B:
vi*(vj) = 1 if i = j, and vi*(vj) = 0 otherwise.
One nice thing about considering the dual basis B* is that we can write an arbitrary
f ∈ V* in terms of the basis B* quite easily.
Proposition 3.1.3. Let B be a basis for V and B* the dual basis for V*. Then if f ∈ V*,
f = f(v1) v1* + · · · + f(vn) vn* .
Proof. We simply need to check that both sides give the same answer when evaluated at the
basis of V. So apply each to vi: the left side gives f(vi) and the right gives
(f(v1) v1* + · · · + f(vn) vn*)(vi) = f(v1) v1*(vi) + · · · + f(vn) vn*(vi) = f(vi) vi*(vi) = f(vi) .

A nice way to think about linear functionals involves their nullspaces. By the rank-nullity
theorem, if f ∈ V*,
dim(N(f)) + dim(R(f)) = dim(V) .
Because R(f) ⊆ F, it is at most one-dimensional. Therefore N(f) = V or N(f) is (n − 1)-
dimensional, where n = dim(V). This gives:
If f is not the zero functional, nullity(f) = dim(V) − 1. A subspace of this dimension
is called a hyperspace.
Because of the simple structure of the nullspace, we can characterize linear functionals easily.
Proposition 3.1.4. Two nonzero elements f, g ∈ V* are equal if and only if they have the
same nullspace N = N(f) = N(g) and they agree at one vector outside N.
Proof. One direction is clear, so suppose that N = N(f) = N(g) and v ∈ V \ N satisfies
f(v) = g(v). You can check that if BN is a basis for N then BN ∪ {v} is a basis for V. (The
proof is similar to how we proved the one-subspace theorem.) But then f and g agree on
BN ∪ {v} and must agree everywhere, giving f = g.
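As a quick computational illustration of coordinate maps (a minimal Python sketch using numpy; the basis chosen below is an arbitrary example, not one from these notes): if the columns of an invertible matrix P are a basis B of R^3, then the coordinate functionals v1*, v2*, v3* are exactly the rows of P^{-1}.

import numpy as np

# Columns of P form a basis B = {v1, v2, v3} of R^3.
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# The coordinate functionals v1*, v2*, v3* are the rows of P^{-1}:
# the i-th row applied to v returns the i-th coordinate of v relative to B.
dual = np.linalg.inv(P)

# Check vi*(vj) = 1 if i = j and 0 otherwise.
print(np.allclose(dual @ P, np.eye(3)))    # True

# Coordinates of an arbitrary vector v relative to B.
v = np.array([2.0, 3.0, 5.0])
coords = dual @ v                           # [v]_B
print(np.allclose(P @ coords, v))           # v = a1 v1 + a2 v2 + a3 v3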

3.2 Annihilators
As we have seen before, one useful tool for the study of linear transformations is the nullspace.
We will consider the dual version of this now: given S ⊆ V, we give a name to those f ∈ V*
such that S ⊆ N(f).
Definition 3.2.1. If S ⊆ V then the annihilator of S is
S° = {f ∈ V* : f(s) = 0 for all s ∈ S} .
Note that if S ⊆ T then T° ⊆ S°.
Proposition 3.2.2. Let S ⊆ V (not necessarily a subspace).
1. S° is a subspace of V*.
2. S° = (Span(S))°.
3. Let V be finite-dimensional with U a subspace. Let {v1, . . . , vk} be a basis for U and
extend it to a basis {v1, . . . , vn} for V. If {v1*, . . . , vn*} is the dual basis then
{vk+1*, . . . , vn*} is a basis for U°.
Proof. For the first item, S° contains the zero linear functional, so it is nonempty. If
f, g ∈ S° and c ∈ F then for any s ∈ S,
(cf + g)(s) = c f(s) + g(s) = c · 0 + 0 = 0 ,
so cf + g ∈ S°. Thus S° is a subspace of V*.
Next, since S ⊆ Span(S), we have (Span(S))° ⊆ S°. Conversely, if f(s) = 0 for all s ∈ S
then let a1 s1 + · · · + ak sk ∈ Span(S). Then
f(a1 s1 + · · · + ak sk) = a1 f(s1) + · · · + ak f(sk) = 0 ,
so f ∈ (Span(S))°.
For the third item, each of vk+1*, . . . , vn* annihilates v1, . . . , vk, so they annihilate
everything in the span, that is, U. In other words, they are in U°, and we already know they
are linearly independent since they are part of the dual basis. To show they span U°, let
f ∈ U° and write f in terms of the dual basis using the previous proposition:
f = f(v1) v1* + · · · + f(vk) vk* + f(vk+1) vk+1* + · · · + f(vn) vn*
= f(vk+1) vk+1* + · · · + f(vn) vn* ∈ Span({vk+1*, . . . , vn*}) .

Corollary 3.2.3. If V is finite dimensional and W is a subspace,
dim(V) = dim(W) + dim(W°) .
Proof. This follows from item 3 above.

3.3 Transpose
Given T : V → W that is linear, we will define a corresponding transformation T^t on the
dual spaces, but it will act in the other direction. We will have T^t : W* → V*.
Definition 3.3.1. If T : V → W is linear, we define the function T^t : W* → V* by the
following. Given g ∈ W*, set T^t(g) ∈ V* as the linear functional such that
(T^t(g))(v) = g(T(v)) for all v ∈ V .
T^t is called the transpose of T.
Note that the definition here is T^t(g) = g ∘ T. Since both maps on the right are linear,
so is their composition. So T^t(g) is in fact a linear functional (it is in V*).
Proposition 3.3.2. If T : V → W is linear then T^t : W* → V* is linear.
Proof. Let g1, g2 ∈ W* and c ∈ F. We want to show that T^t(c g1 + g2) = c T^t(g1) + T^t(g2),
and these are both elements of V*, so we want to show they act the same on each element
of V. So let v ∈ V and compute
(T^t(c g1 + g2))(v) = (c g1 + g2)(T(v)) = c g1(T(v)) + g2(T(v)) = c (T^t(g1))(v) + (T^t(g2))(v)
= (c T^t(g1) + T^t(g2))(v) .

The matrix of T^t can be written in a very simple way using dual bases.
Theorem 3.3.3. Let T : V → W be linear and B, C bases for V and W. Writing B* and
C* for the dual bases,
[T^t]^{C*}_{B*} = ([T]^B_C)^t .
The matrix on the right is the transpose matrix; that is, if A is a matrix then the transposed
matrix A^t is defined by (A^t)_{i,j} = A_{j,i}.
Proof. Let B = {v1, . . . , vn} and C = {w1, . . . , wm}. When we build the matrix [T]^B_C, we
make the j-th column by expanding T(vj) in terms of the basis C. So our matrix can be
rewritten as
            ( w1*(T(v1))  w1*(T(v2))  · · ·  w1*(T(vn)) )
  [T]^B_C = ( w2*(T(v1))  w2*(T(v2))  · · ·  w2*(T(vn)) )
            (    · · ·                                  )
            ( wm*(T(v1))  wm*(T(v2))  · · ·  wm*(T(vn)) )
To build the matrix [T^t]^{C*}_{B*}, we begin with the first vector of C* and express it in terms
of B*. We write
T^t(w1*) = a1 v1* + · · · + an vn* .
The coefficients have a simple form:
T^t(w1*) = (T^t(w1*))(v1) v1* + · · · + (T^t(w1*))(vn) vn*
= w1*(T(v1)) v1* + · · · + w1*(T(vn)) vn* .
This means the first column of our matrix is
(w1*(T(v1)), . . . , w1*(T(vn)))^t .
This is just the first row of [T]^B_C. Similarly, the j-th column of [T^t]^{C*}_{B*} is the j-th row
of [T]^B_C and this completes the proof.

Proposition 3.3.4. Let T : V → W be linear with V, W finite-dimensional. Then
1. N(T^t) = R(T)°,
2. R(T^t) = N(T)°,
3. rank(T^t) = rank(T) and nullity(T^t) = nullity(T).
Proof. For the first item, let g ∈ R(T)°. Then we would like to show that g ∈ N(T^t), or
that T^t(g) = 0. Since T^t(g) ∈ V* this amounts to showing that (T^t(g))(v) = 0 for all v ∈ V.
So let v ∈ V and compute
(T^t(g))(v) = g(T(v)) = 0 ,
since g annihilates the range of T. This shows that R(T)° ⊆ N(T^t). For the other direction,
let g ∈ N(T^t) and w ∈ R(T). Then we can find v ∈ V such that w = T(v) and so
g(w) = g(T(v)) = (T^t(g))(v) = 0 ,
since T^t(g) = 0. This completes the proof of the first item.
Next, if f ∈ R(T^t) we can find g ∈ W* such that f = T^t(g). If v ∈ N(T) then
f(v) = (T^t(g))(v) = g(T(v)) = g(0) = 0 ,
so f ∈ N(T)°. This shows that R(T^t) ⊆ N(T)°. To show the other direction, we count
dimensions:
dim R(T^t) = dim W* − dim N(T^t)
= dim W − dim R(T)°
= dim R(T)
= dim V − dim N(T)
= dim N(T)° .
Since these spaces have the same dimension and one is contained in the other, they must be
equal.
The last item follows from dimension counting as well.

3.4 Double dual
We now move one level up, to look at the dual of the dual, the double dual.
Definition 3.4.1. If V is a vector space, we define the double dual V** as the dual of V*.
It is the space L(V*, F) of linear functionals on V*.
As before, when V is finite-dimensional, since dim L(V*, F) = dim(V*) · dim(F), we find
dim V** = dim V when dim V < ∞.
Exercise. Show this is true even if dim V = ∞.
There are some simple elements of V**, the evaluation maps. Given v ∈ V we set
eval_v : V* → F by eval_v(f) = f(v) .
eval_v is a linear functional on V*. To see this, note first that it certainly maps V* to F,
so we must only show it is linear. The proof is the same as in the previous homework:
let f, g ∈ V* and c ∈ F. Then
eval_v(cf + g) = (cf + g)(v) = c f(v) + g(v) = c eval_v(f) + eval_v(g) .
In fact the map ψ : V → V** given by ψ(v) = eval_v is an isomorphism when dim V < ∞.
It is called the natural isomorphism. We first show it is linear, so let v1, v2 ∈ V
and c ∈ F. If f ∈ V* then
(ψ(c v1 + v2))(f) = eval_{c v1 + v2}(f) = f(c v1 + v2) = c f(v1) + f(v2)
= c eval_{v1}(f) + eval_{v2}(f) = c (ψ(v1))(f) + (ψ(v2))(f)
= (c ψ(v1) + ψ(v2))(f) .
Since this is true for all f, we get ψ(c v1 + v2) = c ψ(v1) + ψ(v2).
Now to prove that ψ is an isomorphism we only need to show it is injective (since V and
V** have the same dimension). So assume that ψ(v) = 0 (the zero element of V**).
Then for all f ∈ V*, we have f(v) = eval_v(f) = 0. We now use a lemma to finish the
proof.
Lemma 3.4.2. If v ∈ V is nonzero, there exists f ∈ V* such that f(v) ≠ 0.
Proof. Let v ∈ V be nonzero and extend it to a basis B for V. Then the element v* in
the dual basis B* has v*(v) = 1. So set f = v*.
Because f(v) = 0 for all f ∈ V*, the lemma says v = ~0. Therefore N(ψ) = {~0} and
ψ is injective, implying it is an isomorphism.
When we looked at the dual V* and constructed the dual basis B* for a basis B, the dual
element v* to an element v ∈ B actually depended on the initial choice of basis B. This is
because to define v*, we must express a vector in terms of coordinates using the entire basis
B, and then take the coefficient of v. The identification of V with V** via the isomorphism
ψ, however, does not depend on the choice of basis. Some people get extremely excited about
this independence of basis, apparently. There are relations, however, between the mapping
ψ and the concepts that we developed earlier about V*.
Theorem 3.4.3. Let V be finite dimensional and B a basis of V. Then ψ(B) = (B*)*.
Proof. Write v1, . . . , vn for the elements of B. The elements v1*, . . . , vn* of B* are characterized
by vi*(vj) = 0 when i ≠ j and 1 if i = j. Similarly, the elements v1**, . . . , vn** of (B*)* are
characterized by vi**(vj*) = 0 when i ≠ j and 1 otherwise. But the elements of ψ(B) also
have this property:
(ψ(vi))(vj*) = eval_{vi}(vj*) = vj*(vi) = 0 when i ≠ j and 1 otherwise .
Therefore ψ(vi) = vi** and we are done.
The interesting part of the previous theorem is that the mapping of B to B* depends on
the choice of B, whereas ψ does not. Therefore when we take the dual basis twice, mapping
B to B* and then B* to (B*)*, the dependence on B disappears.
Theorem 3.4.4. If W ⊆ V is a subspace and dim V < ∞ then ψ(W) = (W°)°.
Proof. Let {v1, . . . , vk} be a basis for W such that {v1, . . . , vn} is a basis for V. From the
results on annihilators, {vk+1*, . . . , vn*} is a basis for W° such that {v1*, . . . , vn*} is a basis
for V*. Applying this result again, we see that {v1**, . . . , vk**} is a basis for (W°)°. But ψ is an
isomorphism so ψ({v1, . . . , vk}) is also a basis for ψ(W). Since these bases are the same sets
(from the last theorem), this finishes the proof.

3.5 Exercises
1. Let V be an F-vector space and C a basis of V*. Show there is a basis B of V such
that B* = C.
2. Let V be an F-vector space and S′ ⊆ V*. Define the lower annihilator
°S′ = {v ∈ V : f(v) = 0 for all f ∈ S′} .
Show the following:
(a) °S′ = °(Span(S′)).
(b) Assume that V is finite-dimensional and U′ ⊆ V* is a subspace. Let {f1, . . . , fn}
be a basis for V* such that {f1, . . . , fk} is a basis for U′. If {v1, . . . , vn} is a basis
for V such that vi* = fi for all i (given by exercise 9) then show that {vk+1, . . . , vn}
is a basis for °U′. In particular, deduce that dim(U′) + dim(°U′) = dim(V).
3. Let V and W be finite dimensional vector spaces and ψ_V : V → V** and ψ_W : W → W**
be the isomorphisms
ψ_V(v) = eval_v and ψ_W(w) = eval_w .
Show that if T : V → W is linear then ψ_W^{-1} ∘ (T^t)^t ∘ ψ_V = T.

4. Let V be finite-dimensional and C a basis for V*. Show that there is a basis B of V
such that B* = C.
Hint. Consider C* ⊆ V**, the dual basis to C.
5. Let S ⊆ V, a finite dimensional vector space. Show that if ψ_V : V → V** is the map
ψ_V(v) = eval_v, then ψ_V^{-1}((S°)°) = Span(S).

4 Determinants

4.1 Permutations
We now move to permutations, which will be useful in the study of determinants.


Definition 4.1.1. A bijection from {1, . . . , n} to {1, . . . , n} is called a permutation on n
letters. The set of permutations on n letters is written Sn and is called the symmetric group.
A permutation can be seen as simply a rearrangement of the set {1, . . . , n}. It is truly a
relabeling. There are at least two simple ways to represent a permutation.
1. We write the elements of {1, . . . , n} in a row, with the images below:
1 2 3 4 5 6
6 3 2 5 1 4
This permutation maps 1 to 6, 2 to 3, 3 to 2, 4 to 5, 5 to 1 and 6 to 4.
2. Cycle notation. We start with 1 and follow its path by iterating the permutation.
In the above example, first 1 maps to 6. Then 6 maps to 4, so in two steps, 1 maps to
4. Next 4 maps to 5 and then 5 maps back to 1. We write this as (1645). Now that
we have completed a cycle, we move to the next element of {1, . . . , n} that we have
not used yet: 2. We see that 2 maps to 3 and then back to 2. So this gives (23) and
therefore we write
(1645)(23) .
Since we have written all the elements of {1, . . . , n}, we finish. We have decomposed
our permutation into two cycles. The convention is that we omit any cycle of length
1, but we do not have any here.
I usually think about permutations in terms of their cycle decomposition. In the exercises,
you will prove:
Exercise. For each permutation on n letters, its cycle decomposition exists and is unique
(up to rearrangement of the individual cycles).
Here are some facts about permutations.
The identity permutation maps every element back to itself.
There are n! elements of Sn .
The elements of Sn can be multiplied (that is, composed). If σ, τ ∈ Sn we define
the product στ as σ ∘ τ. The composition of two bijections is a bijection, so στ ∈ Sn.
Products fare quite well in the cycle decomposition; here is an example. Take σ as the
permutation in S6 whose representation is (1645)(23). Take τ = (123456). Then
στ = (1645)(23)(123456) .
These cycles are not disjoint, so we can make them so. Start with 1 and feed it into
the right side. The first factor maps 1 to 2, so 1 exits from the left side of the rightmost
factor as a 2. It enters the middle factor as a 2, and exits as a 3, entering the leftmost
factor to leave unchanged (since 3 does not appear in the leftmost factor). This gives
us
(13
We begin again with 3, feeding it into the rightmost factor. It maps to 4, then stays
unchanged, then maps to 5, so we get
(135
continuing, 5 maps to 6 and to 4:
(1354
4 maps to 5 and back to 1, so this closes a cycle.
(1354) .
We start again with the next unused letter, 2. It maps to 3 and then back to 2, so it
is unchanged and we omit it. The last letter is 6, which maps to 1 and back to 6. So
we get
στ = (1354) .
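This bookkeeping is easy to automate. Here is a minimal Python sketch (the helper names from_cycles, compose and cycle_decomposition are ad hoc choices, not standard library functions) that reproduces the computation στ = (1354).

def from_cycles(cycles, n):
    """Build a permutation on {1,...,n} (as a dict) from disjoint cycles."""
    p = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            p[a] = b
    return p

def compose(sigma, tau):
    """(sigma tau)(x) = sigma(tau(x)): apply tau first, then sigma."""
    return {x: sigma[tau[x]] for x in tau}

def cycle_decomposition(p):
    """Return the cycles of p (cycles of length 1 omitted)."""
    seen, cycles = set(), []
    for start in sorted(p):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = p[x]
        if len(cyc) > 1:
            cycles.append(tuple(cyc))
    return cycles

sigma = from_cycles([(1, 6, 4, 5), (2, 3)], 6)    # (1645)(23)
tau   = from_cycles([(1, 2, 3, 4, 5, 6)], 6)      # (123456)
print(cycle_decomposition(compose(sigma, tau)))    # [(1, 3, 5, 4)], i.e. (1354)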
The symmetric group is, in fact, a group.
Definition 4.1.2. A set G with a binary operation (that is, a function · : G × G → G)
is called a group if the following hold:
1. there is an element e ∈ G such that eg = ge = g for all g ∈ G,
2. for all g ∈ G there is an inverse element g^{-1} ∈ G such that g g^{-1} = g^{-1} g = e and
3. for all g, h, k ∈ G, we have (gh)k = g(hk).
A group G is called abelian (or commutative) if gh = hg for all g, h ∈ G.
For n ≥ 3 the group Sn is non-abelian.
We will look at the simplest permutations, the transpositions:
Definition 4.1.3. An element τ ∈ Sn is called a transposition if it can be written
τ = (ij) for some i, j ∈ {1, . . . , n} with i ≠ j .
Every permutation can be written as a product of transpositions (but they will not
necessarily be disjoint!) This can be seen because it can be written in cycle notation, and
then we can decompose each cycle into a product of transpositions. Indeed, if (a1 · · · ak) is
a cycle then you can verify that
(a1 · · · ak) = (a1 ak)(a1 ak−1) · · · (a1 a2) .
The main theorem we want to prove is:
Theorem 4.1.4. Given σ ∈ Sn, write σ = τ1 · · · τk and σ = ρ1 · · · ρl, where all the τi's and
ρi's are transpositions. Then (−1)^k = (−1)^l.
This theorem means that if we represent a permutation as a product of transpositions,
the number of such transpositions may be different, but the parity (oddness or evenness) is
the same. This allows us to define:
Definition 4.1.5. The signature of a permutation σ is sgn(σ) = (−1)^k, where σ is written
as a product of k transpositions.
To prove Theorem 4.1.4, we need to introduce another definition.
Definition 4.1.6. A pair {i, j} is called an inversion pair for σ if i ≠ j and i − j has a
different sign than σ(i) − σ(j). (σ reverses the order of i and j.) Write N(σ) for the number
of inversion pairs for σ.
As an example, the permutation σ = (1645)(23) from before has inversion pairs
{1, 2}, {1, 3}, {1, 4}, {1, 5}, {1, 6}, {2, 3}, {2, 5}, {3, 5}, {4, 5}, {4, 6}, so N(σ) = 10 .
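A short Python sketch for counting inversion pairs (illustrative only; it takes for granted that (−1)^{N(σ)} agrees with the signature of Definition 4.1.5, which is what the lemma below is building toward):

def inversion_pairs(p):
    """All pairs {i, j} with i < j whose order is reversed by p."""
    keys = sorted(p)
    return [(i, j) for a, i in enumerate(keys) for j in keys[a + 1:]
            if (i - j) * (p[i] - p[j]) < 0]

def sgn(p):
    """Signature via the parity of the number of inversion pairs."""
    return (-1) ** len(inversion_pairs(p))

# sigma = (1645)(23) written out as a mapping on {1,...,6}.
sigma = {1: 6, 2: 3, 3: 2, 4: 5, 5: 1, 6: 4}
print(len(inversion_pairs(sigma)))   # 10, matching N(sigma) above
print(sgn(sigma))                    # +1: sigma is a product of 4 transpositions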
The number of inversion pairs acts nicely with multiplying by adjacent transpositions.
Lemma 4.1.7. Let σ ∈ Sn and τ = (k k+1) be an adjacent transposition. Then
N(τσ) − N(σ) = ±1 .
Proof. Write Inv(σ) for the set of inversion pairs of σ. Then we will show that
Inv(τσ) Δ Inv(σ) = {{σ^{-1}(k), σ^{-1}(k + 1)}} .
Here A Δ B is the symmetric difference of sets: it is defined as (A \ B) ∪ (B \ A). This
will prove the lemma because when #(A Δ B) = 1 it must be that either A contains B but
has one more element, or B contains A but has one more element. In this case we have
#A − #B = ±1.
First we show { 1 (k), 1 (k + 1)} Inv( )Inv(). If 1 (k) > 1 (k + 1) then
this is an inversion pair for since ( 1 (k)) = k < k + 1 = ( 1 (k + 1)). However then
( 1 (k)) = k + 1 > k = ( 1 (k + 1)), so it is not an inversion pair for and therefore
is in Inv( )Inv(). In the case that 1 (k) < 1 (k + 1) a similar argument shows that
{ 1 (k), 1 (k + 1)} is not an inversion pair for but it is one for and therefore is in
Inv( )Inv().
Now we must show that if {a, b} =
6 { 1 (k), 1 (k + 1)} then {a, b} is an inversion pair
for if and only if it is an inversion pair for . We will just show one direction; the other
is similar. This will prove that Inv( )Inv() does not contain any other elements and
we will be done with the lemma.
So suppose that {a, b} is an inversion pair for but it is not equal to { 1 (k), 1 (k +1)}.
If neither of a, b are equal to 1 (k), 1 (k + 1) then we have
((a)) ((b)) = (a) (b) = b a ,
so {a, b} is an inversion pair for . Otherwise if exactly one of a, b is equal to 1 (k), 1 (k+
1) then let us suppose that a < b (else we can just switch the roles of a and b). Then because
{a, b} is an inversion pair for we have (b) < (a), so if a = 1 (k), we must have
(b) < k = (a), so (b) = (b) < (a) < k + 1 = (a), so {a, b} is still an inversion
pair for . If instead a = 1 (k + 1) we cannot have b = 1 (k), so (b) < k, giving
(b) = (b) < k = (a) and {a, b} is an inversion pair for . Last, if (a)
/ {k, k + 1}
we must have (b) {k, k + 1} and therefore if (b) = k, (a) > k + 1, giving (b) =
k + 1 < (a) = (a), so {a, b} is an inversion pair for . If (b) = k + 1 then (a) > k + 1
and (b) = k < k + 1 = (b) < (a) = (a), so {a, b} is an inversion pair for . This
completes the proof.

4.2 Determinants: existence and uniqueness
Given n vectors ~v1 , . . . , ~vn in Rn we want to define something like the volume of the parallelepiped spanned by these vectors. What properties would we expect of a volume?
1. vol(e1 , . . . , en ) = 1.
2. If two of the vectors ~vi are equal the volume should be zero.
3. For each c > 0, vol(c~v1 , ~v2 , . . . , ~vn ) = c vol(~v1 , . . . , ~vn ). Same in other arguments.
4. For each ~v10 , vol(~v1 +~v10 , ~v2 , . . . , ~vn ) = vol(~v1 , . . . , ~vn )+vol(~v10 , ~v2 , . . . , ~vn ). Same in other
arguments.
Using the motivating example of the volume, we define a multilinear function as follows.
Definition 4.2.1. If V is an n-dimensional vector space over F then define
V n = {(v1 , . . . , vn ) : vi V for all i = 1, . . . , n} .
A function f : V n F is called multilinear if for each i and vectors v1 , . . . , vi1 , vi+1 , . . . , vn
V , the function fi : V F is linear, where
fi (v) = f (v1 , . . . , vi1 , v, vi+1 , . . . , vn ) .
A multilinear function f is called alternating if f (v1 , . . . , vn ) = 0 whenever vi = vj for some
i 6= j.
Proposition 4.2.2. Let f : V^n → F be a multilinear function. If F does not have characteristic
two then f is alternating if and only if for all v1, . . . , vn and i < j,
f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn) .

Proof. Suppose that f is alternating. Then
0 = f(v1, . . . , vi + vj, . . . , vi + vj, . . . , vn)
= f(v1, . . . , vi, . . . , vi + vj, . . . , vn) + f(v1, . . . , vj, . . . , vi + vj, . . . , vn)
= f(v1, . . . , vi, . . . , vj, . . . , vn) + f(v1, . . . , vj, . . . , vi, . . . , vn) .
Conversely suppose that f has the property above. Then if vi = vj,
f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn)
= −f(v1, . . . , vi, . . . , vj, . . . , vn) .
Since F does not have characteristic two, this means this is zero.
Corollary 4.2.3. Let f : V^n → F be an n-linear alternating function. Then for each σ ∈ Sn,
f(v_{σ(1)}, . . . , v_{σ(n)}) = sgn(σ) f(v1, . . . , vn) .
Proof. Write σ = τ1 · · · τk where the τi's are transpositions and (−1)^k = sgn(σ). Then
f(v_{σ(1)}, . . . , v_{σ(n)}) = −f(v_{τ1 · · · τ_{k−1}(1)}, . . . , v_{τ1 · · · τ_{k−1}(n)}) .
Applying this k − 1 more times gives the corollary.
Theorem 4.2.4. Let V be an F-vector space and {e1 , . . . , en } a basis. There exists a unique
n-linear alternating function f on V such that f (e1 , . . . , en ) = 1.
Proof. We will first prove uniqueness, so assume that f is an n-linear alternating function
on V such that f(e1, . . . , en) = 1. We will show that f must have a certain form. Let
v1, . . . , vn ∈ V and write them as
vk = a_{1,k} e1 + · · · + a_{n,k} en .
We can then expand using n-linearity:
f(v1, . . . , vn) = f(a_{1,1} e1 + · · · + a_{n,1} en, v2, . . . , vn)
= Σ_{i1=1}^n a_{i1,1} f(e_{i1}, v2, . . . , vn)
= · · ·
= Σ_{i1=1}^n · · · Σ_{in=1}^n a_{i1,1} · · · a_{in,n} f(e_{i1}, . . . , e_{in})
= Σ_{i1, . . . , in} a_{i1,1} · · · a_{in,n} f(e_{i1}, . . . , e_{in}) .
Since f is alternating, all choices of i1, . . . , in that are not distinct have f(e_{i1}, . . . , e_{in}) = 0.
So we can write this as
Σ_{i1, . . . , in distinct} a_{i1,1} · · · a_{in,n} f(e_{i1}, . . . , e_{in}) .

The choices of distinct i1, . . . , in can be made using permutations. Each permutation σ ∈ Sn
gives exactly one such choice. So we can yet again write this as
Σ_{σ ∈ Sn} a_{σ(1),1} · · · a_{σ(n),n} f(e_{σ(1)}, . . . , e_{σ(n)}) .
Using the lemma from last time, f(e_{σ(1)}, . . . , e_{σ(n)}) = sgn(σ) f(e1, . . . , en) = sgn(σ), so
f(v1, . . . , vn) = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} .
If g is any other n-linear alternating function with g(e1, . . . , en) = 1 then the same
computation as above gives the same formula for g(v1, . . . , vn), so f = g. This shows uniqueness.
For existence, we need to show that the formula above actually gives an n-linear
alternating function with f(e1, . . . , en) = 1.
1. We first show f(e1, . . . , en) = 1. To do this, we write
ek = a_{1,k} e1 + · · · + a_{n,k} en ,
where a_{j,k} = 0 unless j = k, in which case it is 1. If σ ∈ Sn is not the identity,
we can find k ≠ j such that σ(k) = j. This means that a_{σ(k),k} = a_{j,k} = 0 and so
sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} = 0. Therefore
f(e1, . . . , en) = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} = sgn(id) a_{1,1} · · · a_{n,n} = 1 .
2. Next we show alternating. Suppose that vi = vj for some i ≠ j and let τ_{i,j} be the
transposition (ij). Split all permutations into A, those which invert i and j, and Sn \ A,
those which do not. Then if vk = a_{1,k} e1 + · · · + a_{n,k} en, we can write
f(v1, . . . , vn) = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n}
= Σ_{σ ∈ Sn \ A} [sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} + sgn(στ_{i,j}) a_{στ_{i,j}(1),1} · · · a_{στ_{i,j}(n),n}]
= Σ_{σ ∈ Sn \ A} sgn(σ) [a_{σ(1),1} · · · a_{σ(n),n} − a_{στ_{i,j}(1),1} · · · a_{στ_{i,j}(n),n}] .
Note however that a_{στ_{i,j}(i),i} = a_{σ(j),i} = a_{σ(j),j}, since vi = vj. Similarly
a_{στ_{i,j}(j),j} = a_{σ(i),i}. Therefore
a_{σ(1),1} · · · a_{σ(n),n} = a_{στ_{i,j}(1),1} · · · a_{στ_{i,j}(n),n} .
So the above sum is zero and we are done.
3. For n-linearity, we will just show it in the first coordinate. So let v, v1, . . . , vn ∈ V and
c ∈ F. Writing
vk = a_{1,k} e1 + · · · + a_{n,k} en and v = a1 e1 + · · · + an en ,
then c v + v1 = (c a1 + a_{1,1}) e1 + · · · + (c an + a_{n,1}) en, so
f(c v + v1, v2, . . . , vn) = Σ_{σ ∈ Sn} sgn(σ) (c a_{σ(1)} + a_{σ(1),1}) a_{σ(2),2} · · · a_{σ(n),n}
= c Σ_{σ ∈ Sn} sgn(σ) a_{σ(1)} a_{σ(2),2} · · · a_{σ(n),n} + Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n}
= c f(v, v2, . . . , vn) + f(v1, . . . , vn) .
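The uniqueness argument above produced an explicit formula, det = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n}, and it is easy to evaluate directly for small matrices. A minimal Python sketch (the 3 × 3 test matrix is an arbitrary choice, used only for illustration):

from itertools import permutations
from fractions import Fraction

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    n, s = len(perm), 1
    for i in range(n):
        for j in range(i + 1, n):
            if perm[i] > perm[j]:
                s = -s
    return s

def det_leibniz(A):
    """det A = sum over permutations of sgn(s) * a_{s(1),1} ... a_{s(n),n}."""
    n = len(A)
    total = Fraction(0)
    for perm in permutations(range(n)):
        term = Fraction(sign(perm))
        for col in range(n):
            term *= A[perm[col]][col]
        total += term
    return total

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
print(det_leibniz(A))   # 18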

One nice property of n-linear alternating functions is that they can determine when
vectors are linearly independent.
Theorem 4.2.5. Let V be an n-dimensional F-vector space and f a nonzero n-linear
alternating function on V. Then {v1, . . . , vn} is linearly independent if and only if f(v1, . . . , vn) ≠ 0.
Proof. If n = 1 then the proof is an exercise, so take n ≥ 2 and first assume that the vectors
are linearly dependent. Then we can write one as a linear combination of the others. Suppose
for example that v1 = b2 v2 + · · · + bn vn. Then
f(v1, . . . , vn) = b2 f(v2, v2, . . . , vn) + · · · + bn f(vn, v2, . . . , vn) = 0 .
Here we have used that f is alternating.
Conversely suppose that {v1, . . . , vn} is linearly independent. Then it must be a basis.
We can then proceed exactly along the development given above and, if u1, . . . , un are vectors
written as
uk = a_{1,k} v1 + · · · + a_{n,k} vn ,
then if f(v1, . . . , vn) = 0, we find
f(u1, . . . , un) = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} f(v1, . . . , vn) = 0 .
Therefore f is zero. This is a contradiction, so f(v1, . . . , vn) ≠ 0.


Definition 4.2.6. Choosing V = Fn and e1 , . . . , en the standard basis, we write det (the
determinant) for the unique n-linear alternating function such that det(e1 , . . . , en ) = 1. If
A Mn,n (F) we define det(A) = det(~a1 , . . . , ~an ), where ~ai is the i-th column of A.
Corollary 4.2.7. Let A Mn,n (F). Then det(A) 6= 0 if and only if A is invertible.
Proof. By the previous theorem, det(A) 6= 0 if and only if the columns of A are linearly
independent. This is equivalent to saying that the column rank of A is n, or that A is
invertible.
4.3 Properties of the determinant
One of the most important properties of the determinant is that it factors through products
(compositions).
Theorem 4.3.1. Let A, B Mn,n (F). We have the following factorization:
det AB = det A det B .
Proof. First if det A = 0 the matrix A cannot be invertible and therefore neither is AB, so
det AB = 0, proving the formula in that case. Otherwise we have det A 6= 0. In this case we
will use a method of proof that is very common when dealing with determinants. We will
define a function on matrices that is n-linear and alternating as a function of the columns,
mapping the identity to 1, and use the uniqueness of the determinant to see that it is just
the determinant function. So define f : Mn,n(F) → F by
f(B) = det(AB) / det(A) .

First note that if In is the n n identity matrix, f (In ) = (det AIn )/ det A = 1. Next, if
B has two equal columns, its column rank is strictly less than n and so is the column rank
of AB, meaning that AB is non-invertible. This gives f (B) = 0/ det A = 0.
Last to show n-linearity of f as a function of the columns of B, write B in terms of its
columns as (~b1 , . . . , ~bn ). Note that if ei is the i-th standard basis vector, then we can write
~bi = Bei . Therefore the i-th column of AB is (AB)ei = A~bi and AB = (A~b1 , . . . , A~bn ). Thus
if ~b1, ~b1′ are column vectors and c ∈ F,
det(A(c~b1 + ~b1′, ~b2, . . . , ~bn)) = det(A(c~b1 + ~b1′), A~b2, . . . , A~bn)
= det(cA~b1 + A~b1′, A~b2, . . . , A~bn)
= c det(A~b1, A~b2, . . . , A~bn) + det(A~b1′, A~b2, . . . , A~bn) .
This means that det AB is n-linear (at least in the first column; the same argument works
for all columns), and so is f.
There is exactly one n-linear alternating function f with f (In ) = 1, so f (B) = det B.
Here are some consequences.
Similar matrices have the same determinant.
Proof.
det(P^{-1}AP) = det(P^{-1}) det(A) det(P) = det(A) det(P^{-1}) det(P) = det(A) det(In) = det(A) .
If A is invertible then det(A^{-1}) = 1 / det(A).
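A quick numerical sanity check of these consequences, assuming numpy is available (the random 4 × 4 matrices are illustrative only):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))   # generically invertible

# det(AB) = det(A) det(B)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))

# Similar matrices have the same determinant: det(P^{-1} A P) = det(A)
print(np.isclose(np.linalg.det(np.linalg.inv(P) @ A @ P), np.linalg.det(A)))

# det(A^{-1}) = 1 / det(A)
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))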

The definition of the determinant is probably different than what you may have seen
before. So now we will relate the definition to the Laplace (cofactor) expansion.
Definition 4.3.2. Given A ∈ Mn,n(F), the (i, j)-th minor of A, written A(i|j), is the
(n − 1) × (n − 1) matrix formed by removing the i-th row and the j-th column from A.
The Laplace expansion is a recursive formula for the determinant. We can write det A
in terms of the determinant of smaller matrices, the minors of A.
Theorem 4.3.3. Let A ∈ Mn,n(F) with entries (a_{i,j}). Then
det A = Σ_{i=1}^n (−1)^{i−1} a_{i,1} det A(i|1) .

Proof. Write the columns of A as ~a1, . . . , ~an with ~a1 = a_{1,1} e1 + · · · + a_{n,1} en (where e1, . . . , en
are the standard basis vectors) and use n-linearity on the matrix A = (~a1, . . . , ~an) to get
det A = det(a_{1,1} e1, ~a2, . . . , ~an) + · · · + det(a_{n,1} en, ~a2, . . . , ~an) = Σ_{i=1}^n a_{i,1} det(ei, ~a2, . . . , ~an) .
Now we must only show that det(ei, ~a2, . . . , ~an) = (−1)^{i−1} det A(i|1).
We will need to use two facts from the homework.
1. For any matrix B, det B = det B^t. As a consequence of this, det is n-linear and
alternating when viewed as a function of the rows of B.
2. If B is any block upper triangular matrix; that is, of the form
B = ( C   D )
    ( 0   E )
for square matrices C and E, then det B = det C · det E.
So now use i − 1 adjacent row swaps to turn the matrix (ei, ~a2, . . . , ~an) into
( 1   a_{i,2}    a_{i,3}    · · ·  a_{i,n}   )
( 0   a_{1,2}    a_{1,3}    · · ·  a_{1,n}   )
(                  · · ·                     )
( 0   a_{i−1,2}  a_{i−1,3}  · · ·  a_{i−1,n} )
( 0   a_{i+1,2}  a_{i+1,3}  · · ·  a_{i+1,n} )
(                  · · ·                     )
( 0   a_{n,2}    a_{n,3}    · · ·  a_{n,n}   )
Since we applied i − 1 transpositions, the determinant of this matrix equals (−1)^{i−1} times
det(ei, ~a2, . . . , ~an). Now we apply the block upper-triangular result, noting that this matrix
is of the form
( 1   D      )
( 0   A(i|1) ) .
Therefore det(ei, ~a2, . . . , ~an) = (−1)^{i−1} det A(i|1) and we are done.
There is a more general version of this result. The above we call expanding along the
first column. We can expand along the j-th column by first applying j − 1 adjacent column
swaps to get
det A = Σ_{i=1}^n (−1)^{i+j} a_{i,j} det A(i|j) .
By taking the transpose initially we can expand along any row too.
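The Laplace expansion translates directly into a recursive program. A minimal Python sketch of expansion along the first column (indices are 0-based in the code, so the sign (−1)^{i−1} becomes (−1)^i; the test matrix is an arbitrary choice):

from fractions import Fraction

def minor(A, i, j):
    """A(i|j): delete row i and column j (0-based indices here)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_laplace(A):
    """Cofactor expansion along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det_laplace(minor(A, i, 0)) for i in range(n))

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
print(det_laplace(A))   # 18, agreeing with the permutation-sum formula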

4.4 Exercises
1. Prove that Sn , the set of permutations on n letters, is a group under composition.


Show that Sn is abelian (its multiplication is commutative) if and only if n < 3.
2. Let f : Sn Z be a function that is multiplicative; that is, f ( ) = f ()f ( ). Show
that f must be one of the following three functions: identically zero, identically 1 or
the signature function.
3. List the elements of S4 and state which are odd and which are even.
4. Show that any element of Sn can be written as a product of disjoint cycles.
5. Show that if T ⊆ Sn is a subgroup (a subset that is also a group under composition)
and T contains both (12) and (12 · · · n) then T = Sn.
Hint. If σ ∈ Sn then what is the relation between the cycle decomposition of τ and
that of στσ^{-1}?
6. Let V be an F-vector space of dimension n and let f be a k-linear alternating function
on V with k > n. Show that f is identically zero.
7. Suppose that A Mn,n (F) is upper-triangular ; that is, ai,j = 0 if i > j. Show that
det A = a1,1 a2,2 an,n . (Dont use the next exercise though!)
8. This exercise is a generalization of the previous one to block upper-triangular matrices.
For n ≥ 2 we say that M ∈ Mn,n(F) is block upper-triangular if there exists k with
1 ≤ k ≤ n − 1 and matrices A ∈ Mk,k(F), B ∈ Mk,n−k(F) and C ∈ Mn−k,n−k(F) such
that M has the form
( A   B )
( 0   C ) .
That is, the elements of M are given by
Mi,j = A_{i,j}        if 1 ≤ i ≤ k and 1 ≤ j ≤ k,
Mi,j = B_{i,j−k}      if 1 ≤ i ≤ k and k < j ≤ n,
Mi,j = 0              if k < i ≤ n and 1 ≤ j ≤ k,
Mi,j = C_{i−k,j−k}    if k < i ≤ n and k < j ≤ n.

We will show in this exercise that
det M = det A · det C .
(a) Show that if det C = 0 then the above formula holds.
(b) Suppose that det C ≠ 0 and define a function δ : Mk,k(F) → F by
δ(Ã) = [det C]^{-1} det ( Ã   B )
                        ( 0   C ) .
That is, δ(Ã) is a scalar multiple of the determinant of the block upper-triangular
matrix we get when we replace A by Ã and keep B and C fixed.
i. Show that δ is k-linear as a function of the columns of Ã.
ii. Show that δ is alternating and satisfies δ(Ik) = 1, where Ik is the k × k
identity matrix.
iii. Conclude that the above formula holds when det C ≠ 0.
9. Let a0, . . . , an be distinct complex numbers. Write Mn(a0, . . . , an) for the Vandermonde
matrix
( 1   a0   a0^2   · · ·   a0^n )
( 1   a1   a1^2   · · ·   a1^n )
(             · · ·            )
( 1   an   an^2   · · ·   an^n ) .

The goal of this exercise is to prove the Vandermonde determinant formula
det Mn(a0, . . . , an) = Π_{0 ≤ i < j ≤ n} (aj − ai) .

We will argue by induction on n.


(a) Show that if n = 2 then the Vandermonde formula holds.
(b) Now suppose that k ≥ 2 and that the formula holds for all 2 ≤ n ≤ k. Show that
it holds for n = k + 1 by completing the following outline.
i. Define the function f : C → C by f(z) = det Mn(z, a1, . . . , an). Show that f
is a polynomial of degree at most n.
ii. Find all the zeros of f.
Hint. Recall what was proved on a past homework: if a polynomial of degree
n has at least n + 1 zeros then it must be identically zero.
iii. Show that the coefficient of z^n is (−1)^n det M_{n−1}(a1, . . . , an).
iv. Show that the Vandermonde formula holds for n = k + 1, completing the
proof.
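For concreteness, here is a Python sketch that checks the Vandermonde formula for one choice of nodes, (a0, . . . , a3) = (0, 1, 3, 7); the nodes are an arbitrary choice and the determinant is evaluated exactly with the permutation-sum formula:

from fractions import Fraction
from itertools import permutations

def det(A):
    """Exact determinant via the permutation-sum formula."""
    n, total = len(A), Fraction(0)
    for perm in permutations(range(n)):
        sgn, term = 1, Fraction(1)
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sgn = -sgn
        for col in range(n):
            term *= A[perm[col]][col]
        total += sgn * term
    return total

a = [Fraction(x) for x in (0, 1, 3, 7)]               # a_0, ..., a_n with n = 3
M = [[ai ** k for k in range(len(a))] for ai in a]    # rows (1, a_i, a_i^2, a_i^3)
lhs = det(M)
rhs = Fraction(1)
for j in range(len(a)):
    for i in range(j):
        rhs *= a[j] - a[i]
print(lhs, rhs, lhs == rhs)    # 1008 1008 True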
10. Show that if A Mn,n (F) then det A = det At , the determinant of the transpose of A.
11. Let A ∈ M7,7(C) be anti-symmetric; that is, A = −A^t. What is det A?


12. Let T : V → V be linear and B a finite basis for V. We define
det T = det [T]^B_B .
(a) Show that the above definition does not depend on the choice of B.
(b) Show that if f is any nonzero n-linear alternating function on V then
det T = f(T(v1), . . . , T(vn)) / f(v1, . . . , vn) ,
where we have written B = {v1, . . . , vn}. (This is an alternate definition of det T.)


13. Let A ∈ Mn,n(F) for some field F. Recall that if 1 ≤ i, j ≤ n then the (i, j)-th minor
of A, written A(i|j), is the (n − 1) × (n − 1) matrix obtained by removing the i-th row
and j-th column from A. Define the cofactor
Ci,j = (−1)^{i+j} det A(i|j) .
Note that the Laplace expansion for the determinant can be written
det A = Σ_{i=1}^n Ai,j Ci,j .
(a) Show that if 1 ≤ j, k ≤ n with j ≠ k then
Σ_{i=1}^n Ai,k Ci,j = 0 .
(b) Define the classical adjoint of A, written adj A, by
(adj A)_{i,j} = C_{j,i} .
Show that (adj A) A = (det A) I.
(c) Show that A (adj A) = (det A) I and deduce that if A is invertible then
A^{-1} = (det A)^{-1} adj A .
Hint: begin by applying the result of the previous part to A^t.
(d) Use the formula in the last part to find the inverses of the following matrices:
( 1   2   4  )        ( 1   2   3   4 )
( 1   3   9  )  and   ( 1   0   0   0 )
( 1   4   16 )        ( 0   1   1   1 )
                      ( 6   0   1   1 ) .
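The cofactor formula A^{-1} = (det A)^{-1} adj A of part (c) is easy to implement. A minimal Python sketch (the 3 × 3 test matrix is an arbitrary Vandermonde-type choice, not necessarily one of the matrices in part (d)):

from fractions import Fraction

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def adjugate(A):
    """(adj A)_{i,j} = C_{j,i} = (-1)^{i+j} det A(j|i)."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]

A = [[Fraction(x) for x in row] for row in [[1, 2, 4], [1, 3, 9], [1, 4, 16]]]
d, adj = det(A), adjugate(A)

# A * adj(A) should equal det(A) * I.
prod = [[sum(A[i][k] * adj[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(prod == [[d if i == j else Fraction(0) for j in range(3)] for i in range(3)])  # True
print([[a / d for a in row] for row in adj])   # A^{-1} by the cofactor formula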

14. Consider a system of equations in n variables with coefficients from a field F. We can
write this as AX = Y for an nn matrix A, an n1 matrix X (with entries x1 , . . . , xn )
and an n 1 matrix Y (with entries y1 , . . . , yn ). Given the matrices A and Y we would
like to solve for X.
(a) Show that
(det A) xj = Σ_{i=1}^n (−1)^{i+j} yi det A(i|j) .

(b) Show that if det A ≠ 0 then we have
xj = (det A)^{-1} det Bj ,
where Bj is the n × n matrix obtained from A by replacing the j-th column of A
by Y. This is known as Cramer's rule.
(c) Solve the following systems of equations using Cramer's rule.
2x − y + z = 3
2y − z = 1
y − x = 1
and
2x − y + z − 2t = −5
2x + 2y − 3z + t = −1
x + y − z = −1
4x − 3y + 2z − 3t = −8
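Cramer's rule is also straightforward to program. A minimal Python sketch (the 3 × 3 system solved at the end is an illustrative example and not necessarily identical to the systems in part (c)):

from fractions import Fraction

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    def minor(M, i):
        return [row[1:] for k, row in enumerate(M) if k != i]
    return sum((-1) ** i * A[i][0] * det(minor(A, i)) for i in range(n))

def cramer(A, y):
    """Solve A x = y via x_j = det(B_j) / det(A), B_j = A with column j replaced by y."""
    n, d = len(A), det(A)
    xs = []
    for j in range(n):
        Bj = [row[:j] + [y[i]] + row[j + 1:] for i, row in enumerate(A)]
        xs.append(det(Bj) / d)
    return xs

# Illustrative system: 2x - y + z = 3,  2y - z = 1,  -x + y = 1.
A = [[Fraction(v) for v in row] for row in [[2, -1, 1], [0, 2, -1], [-1, 1, 0]]]
y = [Fraction(v) for v in (3, 1, 1)]
print(cramer(A, y))   # [1, 2, 3]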

4.5 Exercises on polynomials
1. Let F be a field and write F[x] for the set of polynomials with coefficients in F. Define
deg(p) for the degree of p ∈ F[x]: the largest k such that the coefficient of x^k in p is
nonzero. The degree of the zero polynomial is defined to be −∞.
(a) Show that for p, q F[x], the product pq has degree deg(pq) = deg(p) + deg(q).
(b) Show that for p, d F[x] such that d is nonzero, there exist q, r F[x] such that
p = qd + r and deg(r) < deg(d). (This result is called the division algorithm.)
Hint. We may assume that deg(p) 0, for otherwise we can choose r = q = 0.
Also we can assume deg(d) deg(p), or else we choose q = 0 and r = p. So use
induction on deg(p), starting with deg(p) = 0, meaning that p(x) = c for some
nonzero c F. For the inductive step, if deg(p) > 0, find some q1 F[x] such
that deg(p q1 d) < deg(p) and continue.
2. Show that if p ∈ F[x] and c ∈ F then p(c) = 0 if and only if the polynomial x − c
divides p (that is, we can find d ∈ F[x] such that (x − c)d = p).
3. Let p, q F[x] be nonzero and define the subset S of F[x] as
S = {ap + bq : a, b F[x]} .

(a) Let d S be nonzero of minimal degree. Show that d divides both p and q (see
the definition of divides in exercise 2).
(b) Show that if s F[x] divides both p and q then s divides d.
(c) Conclude that there exists a unique monic polynomial (that is, with leading coefficient 1) d F[x] satisfying:
i. d divides both p and q and
ii. if s F[x] divides both p and q then s divides d.
(This d is called the greatest common divisor of p and q.)
4. A field F is called algebraically closed if every p ∈ F[x] with deg(p) ≥ 1 has a zero in
F. Prove that if F is algebraically closed then for any nonzero p ∈ F[x], we can find
a, λ1, . . . , λk ∈ F and natural numbers n1, . . . , nk with n1 + · · · + nk = deg(p) such that
p(x) = a (x − λ1)^{n1} · · · (x − λk)^{nk} .
Here we say that λ1, . . . , λk are the roots of p and n1, . . . , nk are their multiplicities.
Hint. Use induction on the degree of p.
5. Let F be algebraically closed. Show that for nonzero p, q F[x], the greatest common
divisor of p and q is 1 if and only if p and q have no common root. Is this true for
F = R?

5 Eigenvalues

5.1 Diagonalizability

Our goal for most of the rest of the semester is to classify all linear transformations T : V → V
when dim V < ∞. How can we possibly do this? There are so many transformations. Well,
let's start with the simplest matrices.
Definition 5.1.1. A ∈ Mn,n(F) is called a diagonal matrix if a_{i,j} = 0 when i ≠ j.
Notice that if A is a diagonal matrix, then A acts very simply on the standard basis.
Precisely, if A is diagonal with entries λ1, . . . , λn, then
A ei = λi ei .
For the next couple of lectures we will try to determine exactly when a linear transformation
has a diagonal matrix representation (for some basis). This motivates the following definition.
Definition 5.1.2. If T : V → V is linear then a nonzero vector v ∈ V is called an eigenvector
for T with associated eigenvalue λ if T(v) = λv. If there is a basis for V consisting of
eigenvectors for T then we say T is diagonalizable.
Proposition 5.1.3. Let T : V → V be linear. Then T is diagonalizable if and only if there
is a basis B of V such that [T]^B_B is diagonal.
Proof. If [T]^B_B is diagonal with entries λ1, . . . , λn, then writing B = {v1, . . . , vn}, we have
[T(vi)]_B = [T]^B_B [vi]_B = [T]^B_B ei = λi ei = [λi vi]_B .
Therefore T(vi) = λi vi. Since vi is part of a basis, vi ≠ ~0 and therefore each vi is an
eigenvector.
Conversely, suppose that B = {v1, . . . , vn} is a basis of V consisting of eigenvectors for
T. Then the i-th column of [T]^B_B is the column vector [T vi]_B = λi [vi]_B = λi ei, so [T]^B_B is
diagonal.
Imagine that we have a linear transformation T : V → V and we are trying to build a
basis of eigenvectors for T. We find first an eigenvector v1 with eigenvalue λ1. Next we find
v2 with value λ2. How do we know they are linearly independent? Here is a sufficient (but
not necessary!) condition.
Theorem 5.1.4. Let v1, . . . , vk be eigenvectors with respective eigenvalues λ1, . . . , λk such
that λi ≠ λj for i ≠ j. Then {v1, . . . , vk} is linearly independent.
Proof. As usual, suppose that
a1 v1 + · · · + ak vk = ~0 .
Let's assume that we have already removed all vectors which have zero coefficients, and
among all such linear combinations equal to zero, this is one with the least number of
coefficients. We may assume that there are at least two coefficients, or else we would have
a1 v1 = ~0 and since v1 ≠ ~0 we would have a1 = 0, meaning all coefficients are zero and
{v1, . . . , vk} is linearly independent.
So apply T to both sides:
a1 T(v1) + · · · + ak T(vk) = ~0 .
Since these are eigenvectors, we can rewrite this as
a1 λ1 v1 + · · · + ak λk vk = ~0 .
However, multiplying the original linear combination by λ1 we get
a1 λ1 v1 + · · · + ak λ1 vk = ~0 .
Subtracting these two,
a2 (λ1 − λ2) v2 + · · · + ak (λ1 − λk) vk = ~0 .
All λi's were distinct and all ai's were nonzero, so this is a linear combination of the vi's
equal to zero with fewer nonzero coefficients than in the original one, a contradiction.
For a matrix A ∈ Mn,n(F), we define its eigenvalues and eigenvectors similarly: λ is an
eigenvalue of A if there is a nonzero v ∈ F^n such that A v = λv.
To find the eigenvalues, we make the following observation.
λ is an eigenvalue for A if and only if there exists a nonzero v such that (λI − A)(v) = ~0.
This is true if and only if λI − A is not invertible. Therefore
λ is an eigenvalue of A ⟺ (λI − A) is not invertible ⟺ det(λI − A) = 0 .
This leads us to define:
Definition 5.1.5. The characteristic polynomial of a matrix A ∈ Mn,n(F) is the function
cA : F → F given by
cA(x) = det(xI − A) .
The definition is similar for a linear transformation. The characteristic polynomial of
T : V → V is cT(x) = det [xI − T]^B_B, where B is any finite basis of V. (You will show on
homework that this definition does not depend on the choice of basis.)
Facts about the characteristic polynomial.
1. cA is a monic polynomial of degree n.

Proof. We simply write out the definition of the determinant, using the notation that
A_{i,j}(x) is the (i, j)-th entry of xI − A:
cA(x) = det(xI − A) = Σ_{σ ∈ Sn} sgn(σ) A_{σ(1),1}(x) · · · A_{σ(n),n}(x) .
Each term in this sum is a product of n polynomials, each of degree at most 1 (it is
either a field element or a polynomial of the form x − a for some a). So each term is a
polynomial of degree at most n, implying the same for cA. The only term of degree n
corresponds to choosing all the diagonal elements of the matrix: this is the identity
permutation. This term is (x − a_{1,1}) · · · (x − a_{n,n}) and the coefficient of x^n is 1.
2. If we look for terms of degree n − 1 in cA(x), we note that in the determinant expansion,
each permutation that is not the identity must have at least two numbers k ≠ j from
1 to n such that σ(k) ≠ k and σ(j) ≠ j. Therefore all nonidentity permutations give
terms with degree at most n − 2. So the only term contributing to degree n − 1 is the
identity. Thus the coefficient of cA(x) of degree n − 1 is the same as that of
(x − a_{1,1}) · · · (x − a_{n,n}) = x^n − x^{n−1}[a_{1,1} + · · · + a_{n,n}] + · · · = x^n − x^{n−1} Tr(A) + · · ·
This means the coefficient of x^{n−1} is −Tr(A).
3. The coefficient of degree zero (the last term of cA) is (−1)^n det A. This follows by just
plugging in x = 0. Thus
cA(x) = x^n − x^{n−1} Tr(A) + · · · + (−1)^n det A .
4. The field F is called algebraically closed if every p ∈ F[x] with degree at least 1 has a
zero in F. (For example, C is algebraically closed.) You will show on homework that
in this case, each polynomial can be factored into factors of the form:
p(x) = a (x − r1)^{n1} · · · (x − rk)^{nk} ,
for a, r1, . . . , rk ∈ F and natural numbers n1, . . . , nk. The ri's are the roots of p and
the ni's are their multiplicities.
So if F is algebraically closed, we can factor
cA(x) = (x − λ1)^{n1} · · · (x − λk)^{nk}
with n1 + · · · + nk = n. Note the leading coefficient a here is 1 since cA is monic.
Expanding this, we see that if F is algebraically closed, then
Tr(A) = Σ_{i=1}^k ni λi   and   det A = Π_{i=1}^k λi^{ni} .
Thus the trace is the sum of eigenvalues (repeated according to multiplicity) and the
determinant is the product of eigenvalues (again repeated according to multiplicity).
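These facts are easy to check numerically, assuming numpy is available (np.poly returns the coefficients of the characteristic polynomial of a square matrix; the 3 × 3 test matrix is an arbitrary choice):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])

# Coefficients of c_A(x) = det(xI - A), highest degree first (monic, so leading 1).
coeffs = np.poly(A)
eigs = np.linalg.eigvals(A)

print(np.isclose(coeffs[1], -np.trace(A)))                    # coefficient of x^{n-1} is -Tr(A)
print(np.isclose(coeffs[-1], (-1) ** 3 * np.linalg.det(A)))   # constant term is (-1)^n det(A)
print(np.isclose(eigs.sum(), np.trace(A)))                    # sum of eigenvalues = trace
print(np.isclose(np.prod(eigs), np.linalg.det(A)))            # product of eigenvalues = det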
5.2 Eigenspaces
If T : V → V and dim V = n then T is diagonalizable if it has n distinct eigenvalues. But
this is of course not necessary: consider T to be the identity operator. Then every nonzero
vector is an eigenvector with eigenvalue 1. But of course T is diagonalizable since its matrix
form (relative to any basis) is the identity, a diagonal matrix.
Now we look more into the necessary conditions. For this we define the eigenspace Eλ.
Definition 5.2.1. If λ ∈ F then the eigenspace
Eλ = {v ∈ V : T(v) = λv} = N(λI − T) .
Note that λ is an eigenvalue of T if and only if Eλ ≠ {~0}.
The eigenspace Eλ is the set of all eigenvectors associated to λ, unioned with ~0. Just as
eigenvectors for distinct eigenvalues are linearly independent, so are the eigenspaces.
Theorem 5.2.2. Let T : V → V be linear. If λ1, . . . , λk are distinct (not necessarily
eigenvalues) then Eλ1, . . . , Eλk are independent:
Eλ1 + · · · + Eλk = Eλ1 ⊕ · · · ⊕ Eλk .
Proof. In the homework, you showed that for subspaces A1, . . . , Ak, the following are
equivalent.
1. A1, . . . , Ak are independent.
2. Whenever Bi is a basis of Ai for i = 1, . . . , k, the set B = ∪_{i=1}^k Bi is a basis for
A1 + · · · + Ak.
3. Whenever vi ∈ Ai for i = 1, . . . , k and v1 + · · · + vk = ~0, all vi's must be ~0.
So let vi ∈ Eλi for all i be such that v1 + · · · + vk = ~0. By way of contradiction, assume
they are not all zero. This means that for some subset S of {1, . . . , k}, the vectors vi for
i ∈ S are nonzero (and are thus eigenvectors) and Σ_{i ∈ S} vi = ~0. But then we have a linear
combination of eigenvectors for distinct eigenvalues equal to zero. Linear independence
gives a contradiction.
We can now give the main diagonalizability theorem.
Theorem 5.2.3 (Main diagonalizability theorem). Let T : V → V be linear with dim V =
n < ∞. The following are equivalent.
1. T is diagonalizable.
2. V = Eλ1 ⊕ · · · ⊕ Eλk, where λ1, . . . , λk are the distinct eigenvalues of T.
3. We can write cT(x) = (x − λ1)^{n1} · · · (x − λk)^{nk}, where ni = dim Eλi.
Proof. Assume that T is diagonalizable. Then we can find a basis B for V consisting of
eigenvectors for T. Each of these vectors is associated with a particular eigenvalue, so write
λ1, . . . , λk for the distinct ones. We can then group together the elements of B associated
with λi and span them; the resulting subspace is contained in Eλi. It follows then that
Eλ1 ⊕ · · · ⊕ Eλk = Eλ1 + · · · + Eλk ⊇ Span B = V ,
so V = Eλ1 ⊕ · · · ⊕ Eλk. So 1 implies 2.
Now assume that 2 holds. Then build a basis Bi of size ni of each Eλi. Since the λi's
are eigenvalues, the Bi's consist of eigenvectors for eigenvalue λi and ni ≥ 1 for all i. Since
we assumed item 2, the set B = ∪_{i=1}^k Bi is a basis for V and [T]^B_B is a diagonal matrix
with distinct entries λ1, . . . , λk, with λi repeated ni times. By computing the characteristic
polynomial cT through this matrix, we find item 3 holds. This proves 2 implies 3.
Last, if 3 holds, then n1 + · · · + nk = n, because cT has degree n. Therefore
dim Eλ1 + · · · + dim Eλk = n .
Because each λi is a root of cT, it is an eigenvalue and therefore has an eigenvector. This
means each Eλi has dimension at least 1. Let Bi be a basis of Eλi. As the λi's are distinct,
the Eλi's are independent and so
Eλ1 + · · · + Eλk = Eλ1 ⊕ · · · ⊕ Eλk .
This means B = ∪_{i=1}^k Bi is a basis for the sum and therefore is linearly independent. Since
it has size n, it is a basis for V. But each vector of B is an eigenvector for T so T is
diagonalizable.
Let's finish by giving an example. We would like to check if a matrix is diagonalizable.
Let A be the real matrix
A = (  1   1   1 )
    ( −1   1   1 )
    (  0   0   1 ) .
So we compute the characteristic polynomial.
cA(x) = det ( x−1   −1    −1  )
            (  1    x−1   −1  )
            (  0     0    x−1 )
      = (x − 1) det ( x−1   −1  )
                    (  1    x−1 ) .
Here we have used the formula for the determinant of a block upper-triangular matrix. So
it equals
(x − 1)((x − 1)^2 + 1) = (x − 1)(x^2 − 2x + 2) .
The last factor does not have any roots in R. Therefore we cannot write cA in the form
(x − λ1)^{n1} · · · (x − λk)^{nk} and A is not diagonalizable.
On the other hand, if we consider A as a complex matrix, it is diagonalizable. This is
because the characteristic polynomial factors as
(x − 1)(x − a)(x − ā) ,
where a = 1 + i (where i = √−1) and ā is the complex conjugate of a. Since A has 3 distinct
eigenvalues and the dimension of C^3 is 3, the matrix is diagonalizable.
If, however, the characteristic polynomial were
cA(x) = (x − 1)(x − i)^2 ,
then we would have to investigate further. For instance this would be the case if we had
A = ( 1   0   0 )
    ( 0   i   1 )
    ( 0   0   i ) .
In this case if we write cA(x) = (x − λ1)^{n1}(x − λ2)^{n2}, we have λ1 = 1, λ2 = i and
n1 = 1, n2 = 2. If A were diagonalizable, then we would need (from the previous theorem)
1 = n1 = dim E1 and 2 = n2 = dim Ei. However, note that
iI − A = ( i−1   0    0 )
         (  0    0   −1 )
         (  0    0    0 ) ,
which has nullity 1, not 2. This means that dim Ei = 1 and A is not diagonalizable.
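A numerical check of the first example, assuming numpy is available (the signs in A are taken as reconstructed above, so that cA(x) = (x − 1)(x^2 − 2x + 2)):

import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [-1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

eigs = np.linalg.eigvals(A)
print(np.round(eigs, 6))          # 1, 1+i, 1-i: only one eigenvalue is real

# Over C there are 3 distinct eigenvalues, so A is diagonalizable: P^{-1} A P is diagonal.
vals, P = np.linalg.eig(A)
D = np.linalg.inv(P) @ A @ P
print(np.allclose(D, np.diag(vals)))   # True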

5.3 Exercises
1. Let V be a vector space and W1 , . . . , Wk be subspaces. Show that the Wi s are independent if and only if for each v W1 + +Wk , there exist unique w1 W1 , . . . , wk Wk
such that v = w1 + + wk .
2. In this problem we will show that if F is algebraically closed then any linear T : V V
can be represented as an upper triangular matrix. This is a simpler result than (and
is implied by) the Jordan Canonical form, which we will cover in class soon.
We will argue by (strong) induction on the dimension of V . Clearly the result holds for
dim V = 1. So suppose that for some k 1 whenever dim W k and U : W W is
linear, we can find a basis of W relative to which the matrix of U is upper-triangular.
Further, let V be a vector space of dimension k + 1 over F and T : V V be linear.
(a) Let λ be an eigenvalue of T. Show that the dimension of R := R(T − λI) is
strictly less than dim V and that R is T-invariant.
(b) Apply the inductive hypothesis to T |R (the operator T restricted to R) to find a
basis of R with respect to which T |R is upper-triangular. Extend this to a basis
for V and complete the argument.
3. Let A be the matrix

6 3 2
A = 4 1 2 .
10 5 3
(a) Is A diagonalizable over R? If so, find a basis for R3 of eigenvectors of A.


(b) Is A diagonalizable over C? If so, find a basis for C3 of eigenvectors of A.
4. Let A be the matrix

1 1
1
1 1 1 .
1
1 1

Find An for all n 1.


Hint: first diagonalize A.
5. Let A Mn,n (F) be upper-triangular. Show that the eigenvalues of A are the diagonal
entries of A.
6. Let V be a finite dimensional vector space over a field F and let T : V V be linear.
Suppose that every subspace of V is T -invariant. What can you say about T ?
7. A linear transformation T : V V is called a projection if T 2 = T . Let T : V V
be a projection with dim V < .
(a) Show that
V = R(T ) N (T )
is a T -invariant direct sum.
Hint. Use exercise 5, homework 3.
(b) Show that there is a basis B of V such that [T ]B
B is diagonal with entries equal to
1 or 0. How is this result different from that of exercise 9, homework 3?
(c) Let U : V V be linear such that U 2 = I. Derive a simple matrix representation
for U .
Hint. Consider (1/2)(U + I).

6 Jordan form
For a diagonalizable transformation, we have a very nice form. However it is not always
true that a transformation is diagonalizable. For example, it may be that the roots of the
characteristic polynomial are not in the field (as in the example above). Even if the roots
are in the field, they may not have multiplicities equal to the dimensions of the eigenspaces.
So we look for a more general form. Instead of looking for a diagonal form, we look for a
block diagonal form. That is, we want to write
A = ( B   0   0   · · · )
    ( 0   C   0   · · · )
    ( 0   0   D   · · · )
    (       · · ·       ) ,
where B, C, D, . . . are square matrices.

6.1 Primary decomposition theorem
For this purpose we define generalized eigenspaces.
Definition 6.1.1. Let T : V → V be linear. Given λ ∈ F, the generalized eigenspace Ẽλ is
the subspace
Ẽλ = {v ∈ V : (λI − T)^k v = ~0 for some k ∈ N} .
Note that Eλ ⊆ Ẽλ and we can write
Ẽλ = ∪_{k=1}^∞ N(λI − T)^k .
Although in general if A, B are subspaces of V then A ∪ B need not be a subspace, we know
it is a subspace if A ⊆ B. Because
N(λI − T) ⊆ N(λI − T)^2 ⊆ N(λI − T)^3 ⊆ · · · ,
you can verify that Ẽλ is a subspace.
When dim V < ∞ we know that T : V → V is diagonalizable if and only if V can
be written as a direct sum of eigenspaces. Even if T is not diagonalizable, if the field F is
algebraically closed, we can always write V as a direct sum of generalized eigenspaces.
Theorem 6.1.2 (Primary decomposition theorem). Let T : V → V be linear with dim V <
∞. If F is algebraically closed, then
V = Ẽλ1 ⊕ · · · ⊕ Ẽλk ,
where λ1, . . . , λk are the distinct eigenvalues of T.
Why will this theorem be useful?
Lemma 6.1.3. If T : V → V is linear and λ ∈ F then Ẽλ is T-invariant. That is, if v ∈ Ẽλ
then T(v) ∈ Ẽλ.
Proof. Let v ∈ Ẽλ. Then there is some k such that (λI − T)^k v = ~0. Now
(λI − T)^k (T(v)) = T((λI − T)^k (v)) = T(~0) = ~0 .
Here we have used that (λI − T)^k and T commute. This is because the first operator is
just a combination of operators of the form T^m, all of which commute with T. Therefore
T(v) ∈ Ẽλ.
From this lemma, we see that once we prove the primary decomposition theorem, we will
be able to write V as a T-invariant direct sum of generalized eigenspaces. So letting Bi be
a basis of Ẽλi and B = ∪_{i=1}^k Bi, B is a basis for V and [T]^B_B is a block diagonal matrix. The
reason is that if we take T(v) for some vector v ∈ Bi then the result lies in Ẽλi and therefore
we only need to use the vectors in Bi to represent it. All entries in its expansion in terms of
B corresponding to vectors in Bj (for j ≠ i) will be zero, giving a block diagonal matrix.
Proof of the primary decomposition theorem. The proof will follow several steps.
Step 1. Peeling off a generalized eigenspace. The point of this step is to show that for some
eigenvalue λ1 and some subspace W1, we can write
V = Ẽλ1 ⊕ W1 .
Then we will restrict T to W1 and argue by induction.
Since the characteristic polynomial cT has coefficients from an algebraically closed field
F, it has a root 1 F. Then the generalized eigenspace E1 has nonzero dimension.
We claim that there is some k1 such that E1 = N (1 I T )k1 . To show this, let
v1 , . . . , vt1 be a basis for E1 . Then for each j = 1, . . . , t1 there exists pj 1 such that
(1 I T )pj (vj ) = ~0. Let k1 = max{p1 , . . . , pt1 }. Clearly N (1 I T )k1 E1 , so we need
only show the other inclusion. If v E1 we can write
v = a1 v1 + + at1 vt1 ,
so
(1 I T )k1 (v) = a1 (1 I T )k1 (v1 ) + + (1 I T )k1 (vt1 ) .
However k1 pj for all j so this is zero. Therefore v N (1 I T )k1 and we have proven
the claim.
Next we show that
V = N(λ1 I − T)^{k1} ⊕ R(λ1 I − T)^{k1} .
(As pointed out in class, this claim can be proved by just noticing that the operator U =
(1 I T )k1 satisfies N (U ) = N (U 2 ) and therefore, by a homework problem, V = N (U )
R(U ).) The rank-nullity theorem implies that their dimensions sum to the dimension of V ,
so we need only show that their intersection is the zero subspace. (Use the two subspace
dimension theorem.) So assume that v is in the intersection, meaning that there exists w V
such that (1 I T )k1 (w) = v and (1 I T )k1 (v) = ~0. But then we get (1 I T )2k1 (w) = ~0
and therefore w E1 . This means actually
~0 = (1 I T )k1 (w) = v .
We now set W1 = R(1 I T )k1 and we are done with this step.
Step 2. T -invariance of the direct sum. We saw above that E1 is T -invariant. We claim
that W1 is as well, so that the direct sum is T -invariant, and we have obtained our first
block.
If v W1 = R(1 I T )k1 then there exists w V such that (1 I T )k1 (w) = v. Then
T (v) = T ((1 I T )k1 (w)) = (1 I T )k1 (T (w))
because T and 1 I T commute. Therefore T (v) W1 and W1 is T -invariant.
To conclude this step, we define T1 to be T restricted to W1 . That is, we view W1 as a
vector space of its own and T1 : W1 W1 as a linear transformation defined by T1 (w) = T (w)
for w W1 .
Step 3. E2 , . . . , Ek are in W1 . We now show that if 1 , . . . , k are the distinct eigenvalues
of T then E2 , . . . , Ek are contained in W1 .
So first let v Ej for some j = 2, . . . , k. By definition of the generalized eigenspace
we can find t such that v N (j I T )t . We will now use a lemma that follows from the
homework.
Lemma 6.1.4. There exist polynomials p, q ∈ F[x] such that
(λj − x)^t p + (λ1 − x)^{k1} q = 1 .
Proof. Since (λj − x)^t and (λ1 − x)^{k1} do not have a common root, you proved in the homework
that their greatest common divisor is 1. Then the lemma follows from the result on the
homework: if r, s ∈ F[x] have greatest common divisor d then there exist p, q ∈ F[x] such
that rp + sq = d.
We will use the lemma but in its transformation form. For any polynomial a F[x] of
the form a(x) = an xn + + a0 we define
a(T ) = an T n + + a1 T + a0 I .
Therefore
I = (j I T )t p(T ) + (1 I T )k1 q(T ) .
Applying this to v, we get
v = (j I T )t p(T )(v) + (1 I T )k1 q(T )(v) .
But all these polynomial transformations commute, so we can use v N (j I T )t to find
v = (1 I T )k1 (q(T )(v)) R(1 I T )k1 = W1 .
Therefore v W1 and so Ej W1 for j = 2, . . . , k.


Step 4. Eλ2, . . . , Eλk are the generalized eigenspaces of T1. Let w ∈ W1 be a vector in a generalized eigenspace of T1 with eigenvalue λ. Then for some t, (λI − T1)^t(w) = ~0 and since T1 acts the same as T, we find w is a generalized eigenvector of T. This means that w ∈ Eλj for some j and thus λ = λj. If j = 1 then we would have w ∈ W1 ∩ Eλ1, giving w = ~0. Therefore either j > 1 or w = ~0, meaning in either case that w ∈ Eλ2 ⊕ · · · ⊕ Eλk.
Conversely, if w ∈ Eλj for some j = 2, . . . , k then w ∈ W1. Then for some t, (λjI − T)^t(w) = ~0. But T acts the same as T1 on W1, so (λjI − T1)^t(w) = ~0 and w is a generalized eigenvector of T1.
Step 5. The inductive step. We will argue for the theorem by induction on the number of distinct eigenvalues of T. Let e(T) be this number. If e(T) = 1 then we have seen that V = Eλ1 ⊕ W1 and that T1 has no eigenvalues. This means that W1 must have dimension zero, so V = Eλ1 and we are done.
Now assume that the theorem holds for all linear U : V → V such that e(U) ≤ k (for some k ≥ 1) and let T : V → V be linear with e(T) = k + 1. Then let λ1 be an eigenvalue of T and decompose V = Eλ1 ⊕ W1. The transformation T1 has e(T1) ≤ k, so we can write W1 as a direct sum of its generalized eigenspaces. These are just Eλ2, . . . , Eλ(k+1), the other generalized eigenspaces of T, so
W1 = Eλ2 ⊕ · · · ⊕ Eλ(k+1) .
Therefore V = Eλ1 ⊕ · · · ⊕ Eλ(k+1) and we are done.
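For readers who like to experiment, here is a minimal computational sketch of the decomposition just proved, using Python with sympy (the matrix A is an arbitrary illustration, not an example from the text): each generalized eigenspace is computed as N((λI − A)^{dim V}), and their dimensions sum to dim V.

from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])          # eigenvalues 2 (twice) and 3
n = A.shape[0]

eigenvalues = list(A.eigenvals().keys())        # distinct eigenvalues
gen_eigenspaces = {
    lam: ((lam * eye(n) - A) ** n).nullspace()  # E_lambda = N((lambda*I - A)^n)
    for lam in eigenvalues
}

dims = {lam: len(basis) for lam, basis in gen_eigenspaces.items()}
print(dims)                      # dimensions of the generalized eigenspaces
assert sum(dims.values()) == n   # they sum to dim V
combined = Matrix.hstack(*[v for basis in gen_eigenspaces.values() for v in basis])
assert combined.rank() == n      # the sum is direct and fills V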

6.2 Nilpotent operators
Now that we have done the primary decomposition of V into generalized eigenspaces, we will do a secondary decomposition, and break each space into smaller subspaces, each spanned by a chain basis. To do this we notice that if we consider λiI − T on the generalized eigenspace Eλi, then it is nilpotent.
Definition 6.2.1. A linear U : V → V is called nilpotent if there is some k ≥ 1 such that U^k = 0. The minimal such k is called the degree of U.
We will consider a nilpotent U : V → V and find a nice matrix representation for it. We will relate this basis back to T later. The nice representation will come from chains.
Definition 6.2.2. A set {v, U(v), U^2(v), . . . , U^l(v)} is called a chain of length l for U if U^s(v) ≠ ~0 for s ≤ l but U^{l+1}(v) = ~0.
Theorem 6.2.3 (Structure theorem for nilpotent operators). Let U : V → V be nilpotent and dim V < ∞. There exists a basis for V consisting of chains for U.
Our main tool to prove the theorem will be linear independence mod a subspace. Recall that if W is a subspace of V then v1, . . . , vk are said to be linearly independent mod W if whenever
a1v1 + · · · + akvk ∈ W ,
it follows that a1 = · · · = ak = 0. We will give some important lemmas about this concept. Many of these can be seen as statements we derived at the beginning of the semester, but in the setting of quotient spaces, specifically in V/W. For example, the following is analogous to the one-subspace (basis extension) theorem:
Proposition 6.2.4. Let W1 ⊆ W2 be subspaces of V. If dim W2 − dim W1 = m and v1, . . . , vl ∈ W2 are linearly independent mod W1, we can find m − l vectors vl+1, . . . , vm ∈ W2 \ W1 such that {v1, . . . , vm} is linearly independent mod W1.
Proof. You showed in homework that {v1, . . . , vl} is linearly independent mod W1 if and only if {v1 + W1, . . . , vl + W1} is linearly independent in W2/W1. Further, you showed that the dimension of W2/W1 is dim W2 − dim W1. So we can use the one-subspace theorem in W2/W1 to extend {v1 + W1, . . . , vl + W1} to a basis of W2/W1, adding elements Cl+1, . . . , Cm ∈ W2/W1. Each of these elements can be written as v + W1 for some v, so we obtain a set
{v1 + W1, . . . , vl + W1, vl+1 + W1, . . . , vm + W1}
which is a basis of W2/W1. By the equivalence above, {v1, . . . , vm} is then linearly independent mod W1.
Another similar statement is:
Proposition 6.2.5. Let W1 ⊆ W2 be subspaces of V. If dim W2 − dim W1 = m and {v1, . . . , vl} ⊆ W2 is linearly independent mod W1 then l ≤ m.
Proof. Since {v1, . . . , vl} is linearly independent mod W1, {v1 + W1, . . . , vl + W1} is linearly independent in W2/W1. This space has dimension m, so by Steinitz, l ≤ m.
Now let's specialize to the case of subspaces associated to a nilpotent operator. Given a nilpotent U : V → V of degree k with V finite-dimensional, we construct the subspaces
N0 = {~0}, N1 = N(U), . . . , Nk−1 = N(U^{k−1}), Nk = V .
Note that
N0 ⊆ N1 ⊆ · · · ⊆ Nk and Nk−1 ≠ V .
We will prove a couple of properties about this tower of subspaces.
1. If v ∈ Nj \ Nj−1 for j = 2, . . . , k then U(v) ∈ Nj−1 \ Nj−2.
Proof. If v ∈ Nj \ Nj−1 then U^j(v) = ~0 but U^{j−1}(v) ≠ ~0. Thus
U^{j−1}(U(v)) = ~0 but U^{j−2}(U(v)) ≠ ~0 ,
meaning U(v) ∈ Nj−1 \ Nj−2.
2. If {v1, . . . , vl} is linearly independent mod Nj for j ≥ 1 then {U(v1), . . . , U(vl)} is linearly independent mod Nj−1.
Proof. Suppose that {v1, . . . , vl} is linearly independent mod Nj and j ≥ 1. Then suppose
a1U(v1) + · · · + alU(vl) ∈ Nj−1 for some a1, . . . , al ∈ F .
Then we can write U(a1v1 + · · · + alvl) ∈ Nj−1, meaning
U^j(a1v1 + · · · + alvl) = U^{j−1}(U(a1v1 + · · · + alvl)) = ~0 .
Thus a1v1 + · · · + alvl ∈ Nj and linear independence mod Nj gives ai = 0 for all i. We conclude {U(v1), . . . , U(vl)} is linearly independent mod Nj−1.
Finally we prove the structure theorem for nilpotent operators.
Proof of structure theorem. We will prove by induction on the degree of U. We will prove a slightly stronger statement: for any k, let Sk be the statement "whenever U : V → V is nilpotent of degree k, writing m = dim Nk − dim Nk−1, if {v1, . . . , vm} is linearly independent mod Nk−1 then there is a basis of V consisting of chains for U such that v1, . . . , vm each begin a chain."
For k = 1, the statement is pretty easy. Let U : V → V be nilpotent of degree 1. Then m = dim N1 − dim N0 = dim V. Also if {v1, . . . , vm} is linearly independent mod N0, since N0 = {~0}, this set is truly linearly independent and thus a basis. Now since U(v) = ~0 for all v, each vi starts a chain of length 1 and we are done.
Now let U : V → V be nilpotent of degree k ≥ 2 and assume that the statement Sl holds for l = k − 1. Suppose that {v1, . . . , vdk} are given vectors that are linearly independent mod Nk−1, where dk = dim Nk − dim Nk−1. By the second property above,
{U(v1), . . . , U(vdk)} is linearly independent mod Nk−2 ,
so by the first proposition we may extend it to a set
{U(v1), . . . , U(vdk), w1, . . . , wm−dk} , where m = dim Nk−1 − dim Nk−2 ,
which is linearly independent mod Nk−2. Now we apply the statement Sk−1 to this set to start chains. The space Nk−1 is U-invariant, and so we can restrict U to it, defining the restricted operator Uk−1. It is not hard to check that it is nilpotent of degree k − 1 and has tower of nullspaces equal to the first k − 1 subspaces for U. That is, N(Uk−1^j) = Nj for j = 0, . . . , k − 1. So the inductive hypothesis says that there is a basis Bk−1 of Nk−1 consisting of chains for Uk−1 such that each of U(v1), . . . , U(vdk), w1, . . . , wm−dk starts a chain. Now we may simply append vi to the chain started by U(vi) for i = 1, . . . , dk; this produces a collection of chains with a total of #Bk−1 + dk = dim Nk−1 + dk = dim V elements. We are left to just check that {v1, . . . , vdk} ∪ Bk−1 is linearly independent; then it will be a basis for V consisting of chains for U (such that v1, . . . , vdk each start a chain).
To prove that, note that Bk−1 is linearly independent (and a subset of Nk−1) and {v1, . . . , vdk} is linearly independent mod Nk−1. Thus if we have a linear combination
a1v1 + · · · + adk vdk + Σ_{v∈Bk−1} bv v = ~0 ,
then a1v1 + · · · + adk vdk ∈ Nk−1 and linear independence mod Nk−1 gives a1 = · · · = adk = 0. Thus we have Σ_{v∈Bk−1} bv v = ~0 and linear independence of Bk−1 gives bv = 0 for all v.
Next we prove uniqueness of the nilpotent representation.
Theorem 6.2.6 (Uniqueness). Let U : V → V be nilpotent and dim V < ∞. If B is a basis of V consisting of chains for U, write
li(B) = # of (maximal) chains of length i in B .
Then if B, B′ are bases of V consisting of chains for U, li(B) = li(B′) for all i.
Proof. Write k for the degree of U. Let ni(B) be the number of elements of B that are in Ni \ Ni−1. Since each element of B is in exactly one of these sets (it is a basis of chains), we have
n1(B) + · · · + nk(B) = dim V .
On the other hand, since the elements of B in Ni \ Ni−1 are linearly independent and outside of Ni−1, they are actually linearly independent mod Ni−1 (easy exercise). Therefore
ni(B) ≤ dim Ni − dim Ni−1 =: mi for all i .
Since also m1 + · · · + mk = dim V, we must have ni(B) = mi for all i. However ni(B) is equal to the number of (maximal) chains of length at least i in B, so these numbers must be the same in B and B′. Last,
li(B) = ni(B) − ni+1(B) = ni(B′) − ni+1(B′) = li(B′) .
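The proof above is effectively an algorithm: mi = dim Ni − dim Ni−1 counts the chains containing at least i vectors, and mi − mi+1 counts those with exactly i vectors. Here is a small sympy sketch of that computation (the nilpotent matrix U below is an arbitrary example, not one from the text).

from sympy import Matrix, zeros

U = Matrix([[0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]])       # arbitrary nilpotent example
n = U.shape[0]

# degree k: smallest k with U^k = 0
k = next(j for j in range(1, n + 1) if (U ** j) == zeros(n, n))

# m_i = dim N(U^i) - dim N(U^{i-1}) = number of chains with at least i vectors
nullity = [0] + [n - (U ** i).rank() for i in range(1, k + 1)]
m = [nullity[i] - nullity[i - 1] for i in range(1, k + 1)]

# number of maximal chains with exactly i vectors is m_i - m_{i+1}
m_pad = m + [0]
counts = {i + 1: m_pad[i] - m_pad[i + 1] for i in range(k)}
print(counts)                    # for this U: one chain of 3 vectors, one single vector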

6.3 Existence and uniqueness of Jordan form, Cayley-Hamilton
Definition 6.3.1. A Jordan block for λ of size l is the l × l matrix
Jλ,l = [ λ 1 0 · · · 0 ]
       [ 0 λ 1 · · · 0 ]
       [ . . . . . . .  ]
       [ 0 0 · · · λ 1 ]
       [ 0 0 · · · 0 λ ] ,
that is, λ in each diagonal entry, 1 in each entry just above the diagonal, and 0 elsewhere.
Theorem 6.3.2 (Jordan canonical form). Let T : V → V be linear with dim V < ∞ and F algebraically closed. Then there is a basis B of V such that [T]_B^B is block diagonal with Jordan blocks.
Proof. First decompose V = Eλ1 ⊕ · · · ⊕ Eλk. On each Eλi, the operator T − λiI is nilpotent. Each chain for (T − λiI)|_{Eλi} gives a block in the nilpotent decomposition. Then T = (T − λiI) + λiI gives a Jordan block.
We can now see the entire decomposition. We first decompose
V = Eλ1 ⊕ · · · ⊕ Eλk
and then
Eλi = C_1^i ⊕ · · · ⊕ C_{ki}^i ,
where each C_j^i is the span of a chain of generalized eigenvectors {v1, . . . , vp}, with
T(v1) = λi v1 , T(v2) = λi v2 + v1 , . . . , T(vp) = λi vp + vp−1 .
Note that chains of generalized eigenvectors are not mapped to each other in the same way that chains for nilpotent operators are. This is because we have to add the scalar operator λi I.
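Computer algebra systems implement exactly this decomposition. As a quick illustration (a Python/sympy sketch; the matrix A is an arbitrary example, not one from the text), Matrix.jordan_form returns an invertible P and a block-diagonal J of Jordan blocks with A = P J P^{-1}:

from sympy import Matrix

A = Matrix([[5, 4, 2, 1],
            [0, 1, -1, -1],
            [-1, -1, 3, 0],
            [1, 1, -1, 2]])      # arbitrary example matrix

P, J = A.jordan_form()           # A = P * J * P**(-1), J block diagonal
print(J)                         # the Jordan blocks, one per chain
assert A == P * J * P.inv()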
To build up to the Cayley-Hamilton theorem, we need a couple of lemmas.
Lemma 6.3.3. If U : V → V is linear and nilpotent with dim V = n < ∞ then
U^n = 0 .
Therefore if T : V → V is linear and v ∈ Eλ then
(T − λI)^{dim Eλ}(v) = ~0 .
Proof. Let B be a basis of chains for U. Then the length of the longest chain is at most n, so U^n sends every element of B to ~0.
Lemma 6.3.4. Let T : V → V be linear with dim V < ∞ and let B be a basis such that [T]_B^B is in Jordan form. For each eigenvalue λ, let Sλ be the set of basis vectors corresponding to blocks for λ. Then
Span(Sλ) = Eλ for each λ .
Therefore if
cT(x) = ∏_{i=1}^{k} (λi − x)^{ni} ,
then ni = dim Eλi for each i.
Proof. Write λ1, . . . , λk for the distinct eigenvalues of T. Let
Wi = Span(Sλi) .
We may assume that the blocks corresponding to λ1 appear first, those for λ2 second, and so on. Since [T]_B^B is in block form, this means V is a T-invariant direct sum
W1 ⊕ · · · ⊕ Wk .
However, for each i, T − λiI restricted to Wi is in nilpotent form. Thus (T − λiI)^{dim Wi}(v) = ~0 for each v ∈ Sλi. This means
Wi ⊆ Eλi for all i, so dim Wi ≤ dim Eλi .
But V = Eλ1 ⊕ · · · ⊕ Eλk, so Σ_{i=1}^{k} dim Eλi = dim V. This gives that dim Wi = dim Eλi for all i, or Wi = Eλi.
For the second claim, ni is the number of times that λi appears on the diagonal; that is, the dimension of Span(Sλi).
Theorem 6.3.5 (Cayley-Hamilton). Let T : V → V be linear with dim V < ∞. Writing cT for the characteristic polynomial of T,
cT(T) = 0 .
Proof. We will only give a proof in the case that F is algebraically closed. The general case follows by doing a field extension. (You can see many different proofs online or in textbooks though.) Since F is algebraically closed we may factor the characteristic polynomial
cT(x) = (x − λ1)^{n1} · · · (x − λk)^{nk} ,
where λ1, . . . , λk are the distinct eigenvalues. Let B be a basis such that [T]_B^B is in Jordan form. Last lecture it was shown that ni is equal to the dimension of Eλi.
So taking v ∈ B such that v ∈ Eλi, we get
(cT(T))(v) = (T − λ1I)^{n1} · · · (T − λkI)^{nk}(v) = ( ∏_{j≠i} (T − λjI)^{nj} ) (T − λiI)^{ni}(v) .
In the last lecture it was shown that (T − λiI)^{dim Eλi} restricted to Eλi is zero. Therefore cT(T)(v) = ~0. Since cT(T) sends all basis vectors to ~0, it must be the zero operator.
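Cayley-Hamilton is easy to test numerically. A minimal Python/sympy sketch (the matrix is an arbitrary example; sympy's sign convention for the characteristic polynomial differs from these notes, but the vanishing at A is unaffected): evaluate the characteristic polynomial at the matrix itself and check that the result is the zero matrix.

from sympy import Matrix, eye, zeros, symbols

x = symbols('x')
A = Matrix([[2, 1, 0],
            [0, 2, 1],
            [0, 0, 3]])          # arbitrary example

c = A.charpoly(x)                # characteristic polynomial of A
n = A.shape[0]

# Horner evaluation of c at the matrix A
cA = zeros(n, n)
for coeff in c.all_coeffs():     # coefficients, highest degree first
    cA = cA * A + coeff * eye(n)

assert cA == zeros(n, n)         # Cayley-Hamilton: c_A(A) = 0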

We can now finish with the uniqueness of Jordan form.


Theorem 6.3.6 (Uniqueness of Jordan form). Let T : V → V be linear, with dim V < ∞ and F algebraically closed. If B and B′ are bases such that [T]_B^B and [T]_{B′}^{B′} are in Jordan form, these matrices are equal up to permutation of blocks.
Proof. Write λ1, . . . , λk for the distinct eigenvalues of T. Setting Si as the generalized eigenvectors in B corresponding to λi and S′i the same for B′, we have from last lecture that Si, S′i are bases for Eλi.
Therefore both Si and S′i are chain bases for T − λiI restricted to Eλi. By the uniqueness of nilpotent form, the number of chains of each length is the same in Si and S′i. This means the number of Jordan blocks of each size for λi is the same in B and B′. This is true for all i, and proves the theorem.

6.4 Exercises
Notation. Let T : V → V be linear, where V is an F-vector space. If p ∈ F[x] has the form p(x) = anx^n + · · · + a1x + a0 then we define the linear transformation p(T) : V → V by
p(T) = anT^n + · · · + a1T + a0I .
Exercises.
1. Let V be a finite dimensional vector space with T : V → V linear and let
V = W1 ⊕ · · · ⊕ Wk
be a T-invariant direct sum decomposition. That is, the subspaces Wi are independent, sum to V and are each T-invariant. If B1, . . . , Bk are bases for W1, . . . , Wk respectively, let B = ∪_{i=1}^k Bi and show that [T]_B^B is a block diagonal matrix with k blocks of sizes #B1, . . . , #Bk.
2. Let U : V → V be nilpotent of degree k and dim V < ∞. In this question we will sketch an approach to the structure theorem for nilpotent operators using quotient spaces.
(a) For i = 1, . . . , k, define Ni = N(U^i) and N0 = {~0}. Define the quotient spaces Ñi = Ni/Ni−1 for i ≥ 1 and Ñ0 = {~0}. Show that if C ∈ Ñi for i ≥ 1 then there exists D ∈ Ñi−1 such that U(C) ⊆ D, where
U(C) = {U(v) : v ∈ C} .
(b) For i = 1, . . . , k, define Ui : Ñi → Ñi−1 by Ui(C) = D, where D is the unique element of Ñi−1 such that U(C) ⊆ D. Show that Ui is linear.
(c) For i = 1, . . . , k, show that Ui is injective.
(You do not need to do anything for this paragraph.) From this point on, the proof would proceed as follows. Let C1^{(k)}, . . . , Clk^{(k)} be a basis of Ñk. By injectivity of Uk, {Uk(C1^{(k)}), . . . , Uk(Clk^{(k)})} is linearly independent. Extend it to a basis of Ñk−1 and call the elements of this basis C1^{(k−1)}, . . . , Clk−1^{(k−1)}, where lk−1 ≥ lk. Continue, and after constructing the basis of Ñk−r whose elements are C1^{(k−r)}, . . . , Clk−r^{(k−r)}, extend their images under Uk−r to a basis of Ñk−r−1. In the end we get a family of bases of the spaces Ñ1, . . . , Ñk. This family is analogous to the chain bases constructed in class, but now lives in quotient spaces. At this point, one must just extract the chain bases, and you can think about how to do that.
3. The minimal polynomial. Let V be a finite-dimensional F-vector space and T : V → V linear.
(a) Consider the subset S ⊆ F[x] defined by
S = {p ∈ F[x] : p(T) = 0} .
Show that S contains a nonzero element.
Hint. Let {v1, . . . , vn} be a basis for V and for each i, consider
{vi, T(vi), T^2(vi), . . . , T^n(vi)} .
Show this set is linearly dependent and therefore there is a nonzero polynomial pi ∈ F[x] such that (pi(T))(vi) = ~0. Then define p as the product p1 · · · pn.
(b) Let mT ∈ S be a monic non-zero element of S of minimal degree. Show that mT divides any other element of S. Conclude that mT is unique. We call mT the minimal polynomial of T.
(c) Prove that the zeros of mT are exactly the eigenvalues of T by completing the following steps.
i. Suppose that r ∈ F is a zero of mT. Show that
mT(x) = q(x)(x − r)^k
for some k ∈ N and q ∈ F[x] such that q(r) ≠ 0. Prove also that q(T) ≠ 0.
ii. Show that if r ∈ F is a zero of mT then rI − T is not invertible and so r is an eigenvalue of T.
iii. Conversely, if λ is an eigenvalue of T, let v be a corresponding eigenvector. Show that if p ∈ F[x] then (p(T))(v) = p(λ)v. Conclude that λ is a zero of mT.
4. Let T : V → V be linear and V a finite-dimensional vector space over F. Let p, q ∈ F[x] be relatively prime (that is, their greatest common divisor is 1). We will show that
N(p(T)q(T)) = N(p(T)) ⊕ N(q(T)) .
(a) Show that if v ∈ N(p(T)) + N(q(T)) then v ∈ N(p(T)q(T)).
(b) Show that if v ∈ N(p(T)q(T)) then v ∈ N(p(T)) + N(q(T)).
Hint. Since p, q are relatively prime, we may find a, b ∈ F[x] such that ap + bq = 1. Now apply this to v.
(c) Show that N(p(T)) ∩ N(q(T)) = {~0}.
5. Let T : V → V be linear and V a finite-dimensional vector space over an algebraically closed field F. Show that if the minimal polynomial is factored as
mT(x) = (x − λ1)^{n1} · · · (x − λk)^{nk}
then V = N((λ1I − T)^{n1}) ⊕ · · · ⊕ N((λkI − T)^{nk}). How is this a different route to prove the primary decomposition theorem?
Hint. Use exercise 4.
6. Let T : V → V be linear and V a finite-dimensional vector space over F. Show that T is diagonalizable if and only if there exist distinct λ1, . . . , λk in F such that
mT(x) = (x − λ1) · · · (x − λk) .
Hint. Use exercise 4.
7. Let V be a finite dimensional vector space over F, an algebraically closed field. If T : V → V is linear, show that the multiplicity of an eigenvalue λ in mT is equal to the size of the largest Jordan block of T for λ. For which T is cT = mT?
8. Find the Jordan form for each of the following matrices over C. Write the minimal
polynomial and characteristic polynomial for each. To do this, first find the eigenvalues.
Then, for each eigenvalue , find the dimensions of the nullspaces of (A I)k for
pertinent values of k (where A is the matrix in question). Use this information to
deduce the block forms.

1 0 0
2 3 0
5
1
3
2
0
(b) 0 1 0
(c) 0
(a) 1 4 1
1 4 0
0 1 2
6 1 4
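As a computational aid for this style of problem, here is a sketch (Python/sympy; the matrix used is an arbitrary stand-in, not one of the matrices above) of the procedure described in the exercise: for each eigenvalue λ, the nullities of (A − λI)^k determine the number of Jordan blocks of each size.

from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 2]])          # arbitrary stand-in matrix
n = A.shape[0]

for lam, alg_mult in A.eigenvals().items():
    # nullity of (A - lam*I)^k for k = 1, 2, ... until it reaches the algebraic multiplicity
    nullities = []
    k = 1
    while True:
        nullities.append(n - ((A - lam * eye(n)) ** k).rank())
        if nullities[-1] == alg_mult:
            break
        k += 1
    # number of blocks of size >= k equals nullity_k - nullity_{k-1} (with nullity_0 = 0)
    print(lam, nullities)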
9. (a) The characteristic polynomial of the matrix

7 1 2
2
1 4 1 1

A=
2 1 5 1
1 1 2
8
is c(x) = (x − 6)^4. Find an invertible matrix S such that S^{−1}AS is in Jordan form.
(b) Find all complex matrices in Jordan form with characteristic polynomial
c(x) = (i − x)^3 (2 − x)^2 .

10. If T : V → V is linear and V is a finite-dimensional F-vector space with F algebraically closed, we define the algebraic multiplicity of an eigenvalue λ to be a(λ), the dimension of the generalized eigenspace of λ. The geometric multiplicity of λ is g(λ), the dimension of the eigenspace Eλ. Finally, the index of λ is i(λ), the length of the longest chain of generalized eigenvectors for λ.
Suppose that λ is an eigenvalue of T and g = g(λ) and i = i(λ) are given integers.
(a) What is the minimal possible value for a = a(λ)?
(b) What is the maximal possible value for a?
(c) Show that a can take any value between the answers for the above two questions.
(d) What is the smallest dimension n of V for which there exist two linear transformations T and U from V to V with all of the following properties? (i) There exists λ ∈ F which is the only eigenvalue of either T or U, (ii) T and U are not similar transformations and (iii) the geometric multiplicity of λ for T equals that of U, and similarly for the index.
11. Let T : V → V be linear on a finite-dimensional vector space. If W is a T-invariant subspace of V, define the restriction TW of T to W.
(a) Show the characteristic polynomial of TW divides that of T. Show the minimal polynomial of TW divides that of T.
(b) Show that if T is diagonalizable then so is TW.
12. Let T, U : V → V be linear on a finite-dimensional vector space. Assume that TU = UT and that both T and U are diagonalizable. We will show that T and U are simultaneously diagonalizable; that is, there is a basis B such that both [T]_B^B and [U]_B^B are diagonal.
(a) Show that each eigenspace Eλ of T is U-invariant.
(b) Show that there is a basis of each Eλ consisting of eigenvectors for both T and U. Conclude that T and U are simultaneously diagonalizable.
13. In this problem we will inspect the interaction between R^n and C^n. This will be used to establish the real Jordan form in the next problem.
(a) Every vector in C^n can be written as v + iw where v, w ∈ R^n. Define the inclusion map ι : R^n → C^n by ι(v) = v = v + i~0. Show that ι is R-linear; that is, ι(cv + w) = cι(v) + ι(w) for v, w ∈ R^n and c ∈ R.
(b) Define the complex conjugation map c : C^n → C^n by
c(v + iw) = v − iw .
Show that c^2 is the identity and c is anti-linear; that is, c is additive but c(z(v + iw)) = z̄ c(v + iw). (Here z̄ represents the complex conjugate of the number z ∈ C.)
(c) Prove that if W is a subspace of R^n then Span(ι(W)) is c-invariant. Conversely, if W′ is a c-invariant subspace of C^n, show that W′ = Span(ι(W)) for some subspace W of R^n.
14. In this problem we establish the real Jordan form. Let T : R^n → R^n be linear. The complexification of T is defined as TC : C^n → C^n by
TC(v + iw) = T(v) + iT(w) .
(a) Show that TC is a linear transformation on C^n. If λ ∈ C is one of its eigenvalues and Eλ is the corresponding generalized eigenspace, show that c(Eλ) = Eλ̄. (Here c is the complex conjugation map from the last problem.)
(b) Show that the non-real eigenvalues of TC come in pairs. In other words, show that we can list the distinct eigenvalues of TC as
λ1, . . . , λr, µ1, . . . , µ2m ,
where for each j = 1, . . . , r, λ̄j = λj and for each i = 1, . . . , m, µ̄2i−1 = µ2i.
(c) Because C is algebraically closed, the proof of Jordan form shows that
C^n = Eλ1 ⊕ · · · ⊕ Eλr ⊕ Eµ1 ⊕ · · · ⊕ Eµ2m .
Using the previous two parts, show that for j = 1, . . . , r and i = 1, . . . , m, the subspaces of C^n
Eλj and Eµ2i−1 ⊕ Eµ2i
are c-invariant.
(d) Deduce from the previous problem that there exist subspaces X1, . . . , Xr and Y1, . . . , Ym of R^n such that for each j = 1, . . . , r and i = 1, . . . , m,
Eλj = Span(ι(Xj)) and Eµ2i−1 ⊕ Eµ2i = Span(ι(Yi)) .
Show that R^n = X1 ⊕ · · · ⊕ Xr ⊕ Y1 ⊕ · · · ⊕ Ym.
(e) Prove that for each j = 1, . . . , r, the transformation T − λjI restricted to Xj is nilpotent and thus we can find a basis Bj for Xj consisting entirely of chains for T − λjI.
(f) For each k = 1, . . . , m, let
Ck = {v1^{(k)} + iw1^{(k)}, . . . , vnk^{(k)} + iwnk^{(k)}}
be a basis of Eµ2k−1 consisting of chains for TC − µ2k−1 I. Prove that
C̃k = {v1^{(k)}, w1^{(k)}, . . . , vnk^{(k)}, wnk^{(k)}}
is a basis for Yk. Describe the form of the matrix representation of T restricted to Yk, relative to the basis C̃k.
(g) Gathering the previous parts, state and prove a version of Jordan form for linear transformations on R^n. Your version should be of the form "If T : R^n → R^n is linear then there exists a basis B such that [T]_B^B has the form . . ."
7 Bilinear forms

7.1 Definition and matrix representation
We now move to bilinear forms, which are 2-linear functions.
Definition 7.1.1. A 2-linear function f : V × V → F is called a bilinear form. The set Bil(V, F) of bilinear forms on V forms a vector space.
If V is finite dimensional then we can find a matrix form for f in terms of a basis.
Theorem 7.1.2. Let f be a bilinear form on V and B a basis for V. Define the matrix [f]_B^B by
([f]_B^B)_{i,j} = f(vj, vi) .
Then for all v, w ∈ V, we have
f(v, w) = [w]_B^t [f]_B^B [v]_B .
Furthermore, [f]_B^B is the unique matrix such that this equation holds for all v, w ∈ V.
Proof. To show this, let B be a basis for V and write B = {v1, . . . , vn}. If v, w ∈ V, write
v = a1v1 + · · · + anvn and w = b1v1 + · · · + bnvn .
Then
f(v, w) = f(a1v1 + · · · + anvn, w) = Σ_{i=1}^n ai f(vi, w)
 = Σ_{i=1}^n ai f(vi, b1v1 + · · · + bnvn)
 = Σ_{i=1}^n ai [ Σ_{j=1}^n bj f(vi, vj) ]
 = Σ_{i=1}^n ([v]_B)_{i,1} Σ_{j=1}^n ([w]_B^t)_{1,j} ([f]_B^B)_{j,i}
 = Σ_{i=1}^n ([w]_B^t [f]_B^B)_{1,i} ([v]_B)_{i,1}
 = [w]_B^t [f]_B^B [v]_B .
For uniqueness, note that if A is any matrix such that f(v, w) = [w]_B^t A [v]_B for all v, w, we apply this to vi, vj:
f(vj, vi) = [vi]_B^t A [vj]_B = ei^t A ej = Ai,j .
Thus A has the same entries as those of [f]_B^B.
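In coordinates this is easy to play with. A small numpy sketch, following the convention just proved ([f]_{i,j} = f(vj, vi) and f(v, w) = [w]^t [f] [v]); the matrix A below is an arbitrary example:

import numpy as np

A = np.array([[1., 2., 0.],
              [0., 3., 1.],
              [4., 0., 1.]])     # matrix of a bilinear form on F^3 in the standard basis

def f(v, w):
    return w @ A @ v             # f(v, w) = w^t A v

e = np.eye(3)
M = np.array([[f(e[j], e[i]) for j in range(3)] for i in range(3)])
assert np.allclose(M, A)         # recovering the matrix entries from the form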

73

Remarks.
• If we define the standard dot product of two vectors ~a, ~b ∈ F^n by
~a · ~b = a1b1 + · · · + anbn ,
where we have written the vectors ~a = (a1, . . . , an) and ~b = (b1, . . . , bn), then we can write the above result as
f(v, w) = ([f]_B^B [v]_B) · [w]_B .
• Given any A ∈ Mn,n(F) and basis B of V (of size n), the function fA, given by
fA(v, w) = [w]_B^t A [v]_B ,
is bilinear (check this!) and has A for its matrix relative to B. To see this, we compute the (i, j)-th entry of [fA]_B^B: it is
fA(vj, vi) = [vi]_B^t A [vj]_B = ei^t A ej = Ai,j .
An important example comes from A = I. This corresponds to the dot product in basis B:
(v, w) ↦ [w]_B^t [v]_B = [v]_B · [w]_B .
• Given B a basis of V, the map f ↦ [f]_B^B is an isomorphism from Bil(V, F) to Mn,n(F).
Proof. If f, g ∈ Bil(V, F) and c ∈ F,
([cf + g]_B^B)_{i,j} = (cf + g)(vj, vi) = cf(vj, vi) + g(vj, vi) = c([f]_B^B)_{i,j} + ([g]_B^B)_{i,j} ,
so [cf + g]_B^B = c[f]_B^B + [g]_B^B, proving linearity. If [f]_B^B is the zero matrix, then
f(v, w) = [w]_B^t [f]_B^B [v]_B = 0 for all v, w ∈ V ,
so f is zero, proving injectivity. Last, if A is a given matrix in Mn,n(F), we showed above that [fA]_B^B = A. This means that the map is surjective and we are done.
Definition 7.1.3. The rank of a bilinear form f on a finite-dimensional vector space V is defined as rank([f]_B^B) in any basis B.
To show this is well-defined will take a bit of work. Given a bilinear form f on V we can fix a vector v ∈ V and define
Lf(v) : V → F by (Lf(v))(w) = f(v, w) .
Since f is bilinear, this is a linear functional and thus lives in V∗. Let Bil(V, F) denote, as before, the vector space of bilinear forms on V.
We can nicely represent the matrix of Lf relative to the basis B and the dual basis B∗.
Proposition 7.1.4. Lf : V → V∗ is a linear transformation and
[Lf]_B^{B∗} = [f]_B^B .
Proof. Let v1, v2 ∈ V and c ∈ F. To show that Lf(cv1 + v2) = cLf(v1) + Lf(v2), we will need to apply these functionals to vectors in V. So let w ∈ V and compute
Lf(cv1 + v2)(w) = f(cv1 + v2, w) = cf(v1, w) + f(v2, w)
 = cLf(v1)(w) + Lf(v2)(w) = (cLf(v1) + Lf(v2))(w) .
This is true for all w, so Lf is linear.
The i-th column of [Lf]_B^{B∗} is obtained by writing
B = {v1, . . . , vn}, B∗ = {v1∗, . . . , vn∗}
and writing Lf(vi) in terms of B∗. Recall that if g is a linear functional then its representation in terms of the dual basis can be written
g = g(v1)v1∗ + · · · + g(vn)vn∗ .
Therefore we can write
Lf(vi) = Lf(vi)(v1)v1∗ + · · · + Lf(vi)(vn)vn∗
 = f(vi, v1)v1∗ + · · · + f(vi, vn)vn∗ .
This means that the (j, i)-th entry of [Lf]_B^{B∗} is f(vi, vj), the (j, i)-th entry of [f]_B^B.
Here are some nice consequences of the equality [f]_B^B = [Lf]_B^{B∗}.
B = [Lf ]B .

1. The definition of the rank of f does not depend on the choice of basis B. Indeed, let C be another basis. Then
rank(f) = rank([f]_B^B) = rank([Lf]_B^{B∗}) = rank([Lf]_C^{C∗}) = rank([f]_C^C) .
The third equality follows from the fact that the rank of the matrix of Lf does not depend on the bases used to represent it.
2. We can equally well define Rf to be the map from V to V∗ given by
Rf(v)(w) = f(w, v) .
Then [Rf]_B^{B∗} = ([f]_B^B)^t. The reason is as follows. Define g(v, w) = f(w, v). Then Lg = Rf. Now the matrix [g]_B^B is easily seen to be the transpose of [f]_B^B (its (i, j)-th entry is g(vj, vi) = f(vi, vj)). So
([f]_B^B)^t = [g]_B^B = [Lg]_B^{B∗} = [Rf]_B^{B∗} .
3. We say that a bilinear form is degenerate if its rank is not equal to dim V. We can now state many equivalent conditions for this: the following are equivalent when f ∈ Bil(V, F) and dim V = n < ∞.
(a) f is degenerate.
(b) Define the nullspace of f to be
N(f) = {v ∈ V : f(v, w) = 0 for all w ∈ V} .
(This is also called the left nullspace.) Then N(f) ≠ {~0}.
(c) Defining the right nullspace by
NR(f) = {v ∈ V : f(w, v) = 0 for all w ∈ V} ,
then NR(f) ≠ {~0}.
Note here that N(f) = N(Lf) and NR(f) = N(Rf). By this representation, we have
rank(f) + dim N(f) = dim V .
This comes from the rank-nullity theorem applied to Lf.
Theorem 7.1.5. Let V be finite-dimensional. The map L : Bil(V, F) → L(V, V∗) given by
L(f) = Lf
is an isomorphism.
Proof. First we show linearity. Given f, g ∈ Bil(V, F) and c ∈ F, we want to show that
L(cf + g) = cL(f) + L(g) .
To do this, we need to show that when we apply each side to a vector v ∈ V, we get the same result. The result will be in the dual space, so we need to show this result, applied to a vector w ∈ V, is the same. Thus we compute
(L(cf + g)(v))(w) = (cf + g)(v, w) = cf(v, w) + g(v, w)
and the right side is
((cL(f) + L(g))(v))(w) = (c(L(f)(v)) + L(g)(v))(w)
 = c((L(f)(v))(w)) + (L(g)(v))(w)
 = cf(v, w) + g(v, w) .
To show bijectivity, we note that the dimension of Bil(V, F) is n^2, the same as that of L(V, V∗) (since the map sending a bilinear form to a matrix is an isomorphism). Thus we need only show the map is one-to-one. If L(f) = 0, then L(f)(v) = 0 for all v ∈ V, meaning for all w ∈ V,
0 = (L(f)(v))(w) = f(v, w) .
This being true for all v, w means f is zero, so L is injective.
Now we move to changing coordinates. This is one big difference between the matrix of a linear transformation and the matrix of a bilinear form. Instead of conjugating by a change of basis matrix as before, we multiply on the right by the change of basis matrix and on the left by its transpose.
Proposition 7.1.6. Let f be a bilinear form on V, a finite-dimensional vector space over F. If B, B′ are bases of V then
[f]_{B′}^{B′} = ([I]_{B′}^B)^t [f]_B^B [I]_{B′}^B .
Proof. For any v, w ∈ V,
[w]_{B′}^t ([I]_{B′}^B)^t [f]_B^B [I]_{B′}^B [v]_{B′} = ([I]_{B′}^B [w]_{B′})^t [f]_B^B ([I]_{B′}^B [v]_{B′})
 = [w]_B^t [f]_B^B [v]_B
 = f(v, w) .
Another way to see the theorem is that if a matrix A represents a bilinear form in some basis and P is an invertible matrix, then P^t A P represents the bilinear form in a different basis.
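A quick numerical check of this congruence rule (numpy sketch; the matrices A and P are arbitrary examples, with the columns of P giving the new basis vectors in old coordinates):

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)   # matrix of f in basis B
P = np.array([[1., 1., 0.],
              [0., 1., 2.],
              [0., 0., 1.]])                          # change of basis matrix [I]_{B'}^B

A_new = P.T @ A @ P                                   # matrix of f in basis B'

# f(v, w) computed in either coordinate system agrees
v_new, w_new = rng.standard_normal(3), rng.standard_normal(3)
v_old, w_old = P @ v_new, P @ w_new
assert np.isclose(w_old @ A @ v_old, w_new @ A_new @ v_new)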

7.2 Symmetric bilinear forms
Definition 7.2.1. A form f ∈ Bil(V, F) is called symmetric if f(v, w) = f(w, v) for all v, w ∈ V. The space of symmetric bilinear forms is denoted Sym(V, F).
Symmetric forms are represented by symmetric matrices. That is, if f is symmetric and B is a basis, then [f]_B^B is equal to its transpose ([f]_B^B)^t. One of the fundamental theorems about symmetric bilinear forms is that they can be diagonalized.
Definition 7.2.2. A basis B of V is called orthogonal relative to f ∈ Bil(V, F) if f(v, w) = 0 for all distinct v, w ∈ B.
The basis B being orthogonal relative to f is equivalent to [f]_B^B being a diagonal matrix.
Theorem 7.2.3 (Diagonalization of symmetric bilinear forms). Let V be a finite-dimensional vector space over F, a field of characteristic not equal to 2. If f ∈ Sym(V, F) then V has a basis orthogonal relative to f.
Remark. Equivalently, if A ∈ Mn,n(F) is symmetric, then there is an invertible P ∈ Mn,n(F) such that P^t A P is diagonal.
Proof. First note that if f is the zero form then the theorem is trivially true. So assume f ≠ 0.
We will argue by induction on the dimension of V. For the base case, if V has dimension 1, then any basis is orthogonal relative to f. If dim(V) = n > 1 then we begin by finding a first element of our basis. To do this, let v ∈ V; we will need to make sure that v can be chosen such that f(v, v) ≠ 0. For this, we use a lemma.
Lemma 7.2.4. Let f ∈ Sym(V, F) be nonzero. If F does not have characteristic two then f(v, v) ≠ 0 for some v ∈ V.
Proof. The idea of the proof is to develop a so-called polarization identity. For v, w ∈ V,
f(v + w, v + w) − f(v − w, v − w) = 4f(v, w) ,
so
f(v, w) = (1/4)(f(v + w, v + w) − f(v − w, v − w)) .
Note that what we have written as 1/4 is actually the inverse of 4 in F. This exists because F does not have characteristic 2. Therefore if f(z, z) = 0 for all z, we apply this with z = v + w and z = v − w to find f(v, w) = 0 for all v, w, contradicting f ≠ 0.
From the lemma, since f ≠ 0, we can find v ∈ V such that f(v, v) ≠ 0. This implies that v itself is nonzero. Now consider the functional Lf(v). Since it is a nonzero linear functional (for instance Lf(v)(v) ≠ 0), its nullspace must be of dimension n − 1. Define f̃ to be f restricted to N(Lf(v)) and note that f̃ is a symmetric bilinear form on N(Lf(v)). Since this has dimension strictly less than that of V, we use induction to find {v1, . . . , vn−1}, a basis for N(Lf(v)) that is orthogonal relative to f̃.
Since v ∉ N(Lf(v)), it follows that {v1, . . . , vn−1, v} is a basis for V. It is also orthogonal because f(vi, vj) = f̃(vi, vj) = 0 whenever i ≠ j with i, j ∈ {1, . . . , n − 1}, and for i = 1, . . . , n − 1 we have
f(vi, v) = f(v, vi) = 0 since vi ∈ N(Lf(v)) .
This completes the proof.
The above shows that any symmetric bilinear form can be diagonalized as long as char(F) ≠ 2. In the case that the characteristic is 2, we cannot necessarily find an orthogonal basis: consider
A = [ 0 1 ]
    [ 1 0 ] .
In a field with characteristic 2, 1 = −1, so this is a symmetric matrix and thus defines a symmetric bilinear form f on F^2, where F = Z2, by f(v, w) = w^t A v. However for any v = ae1 + be2,
f(v, v) = a^2 f(e1, e1) + ab f(e1, e2) + ab f(e2, e1) + b^2 f(e2, e2) = 0 + ab + ab + 0 = 2ab = 0 .
So if {v, w} were an orthogonal basis, we would have f(v, v) = f(w, w) = f(v, w) = f(w, v) = 0, implying f is 0, a contradiction.
In the case of C^n, the diagonalization result allows us actually to get a matrix with only 1s and 0s. To do this, take an orthogonal basis B for a symmetric form f and label it {v1, . . . , vn}. Now define
wi = vi/√(f(vi, vi)) if f(vi, vi) ≠ 0, and wi = vi otherwise.
Then {w1, . . . , wn} is still orthogonal relative to f but satisfies f(wi, wi) = 0 or 1. This actually works in any field F in which each element has a square root. This of course does not work in R^n, but we have a separate result for that.
Theorem 7.2.5 (Sylvester's law of inertia). Let f ∈ Sym(R^n, R). Then there exists a basis B such that [f]_B^B is diagonal with only 1s, −1s and 0s. Furthermore, if B′ is another basis such that [f]_{B′}^{B′} is in this form, the number of 1s, −1s and 0s respectively is the same.
Proof. For the first part, just take a basis B = {v1, . . . , vn} from the last theorem and define wi = vi/√|f(vi, vi)| when f(vi, vi) ≠ 0. For the second part, define S0(B) as the set of vectors v in B such that f(v, v) = 0, S+(B) as the set of v ∈ B such that f(v, v) > 0 and S−(B) as the set of v ∈ B such that f(v, v) < 0. Last, define their spans as V0(B), V+(B) and V−(B). Since each vector of B falls into one of these categories,
V = V0(B) ⊕ V+(B) ⊕ V−(B) .
We have a similar decomposition for B′.
Note that if v ∈ V+(B) we can write it as v = Σ_{i=1}^t ai vi, where v1, . . . , vt are the elements of S+(B). Then using orthogonality, for v ≠ ~0,
f(v, v) = Σ_{i=1}^t ai^2 f(vi, vi) > 0 .
The same is clearly true for V+(B′). Now assume for a contradiction that #S+(B) > #S+(B′). Then by the two subspace dimension theorem, writing V≤0(B′) = V0(B′) ⊕ V−(B′),
dim(V+(B) ∩ V≤0(B′)) + dim(V+(B) + V≤0(B′)) = dim V+(B) + dim V≤0(B′) .
Using dim(V+(B) + V≤0(B′)) ≤ dim V and dim V+(B) > dim V+(B′),
dim(V+(B) ∩ V≤0(B′)) ≥ dim V+(B) + dim V≤0(B′) − dim V > dim V+(B′) + dim V≤0(B′) − dim V = 0 .
Therefore there is a nonzero vector v ∈ V+(B) ∩ V≤0(B′). But such a vector must have f(v, v) > 0 and f(v, v) ≤ 0, a contradiction. So #S+(B) ≤ #S+(B′). Reversing the roles of B and B′ gives
#S+(B) = #S+(B′) .
An almost identical argument gives #S−(B) = #S−(B′). This means we must have also #S0(B) = #S0(B′), since both bases have the same number of elements.
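Numerically, the signature of a real symmetric matrix can be read off from the signs of its eigenvalues, since the orthogonal diagonalization A = Q D Q^t is a particular congruence. A small numpy sketch (the matrix A is an arbitrary example):

import numpy as np

A = np.array([[2., 1., 0.],
              [1., -1., 3.],
              [0., 3., 0.]])      # arbitrary real symmetric matrix

eigs = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
tol = 1e-10
n_plus = int(np.sum(eigs > tol))
n_minus = int(np.sum(eigs < -tol))
n_zero = len(eigs) - n_plus - n_minus
print(n_plus, n_minus, n_zero)    # counts of +1s, -1s, 0s in Sylvester's normal form

# sanity check: a random congruence P^t A P has the same signature
rng = np.random.default_rng(1)
P = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # generically invertible
eigs2 = np.linalg.eigvalsh(P.T @ A @ P)
assert int(np.sum(eigs2 > tol)) == n_plus and int(np.sum(eigs2 < -tol)) == n_minus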
Some remarks are in order here.
1. The space V0(B) is unique; that is, for another basis B′ giving f the matrix form above, we have V0(B) = V0(B′). This is because they are both equal to N(f).
Proof. We will show V0(B) = N(f). The same proof shows V0(B′) = N(f). Let v ∈ S0(B). Since B is orthogonal, if w is another element of B (not equal to v) then f(v, w) = 0. However we also have f(v, v) = 0 since v ∈ S0(B). This means Lf(v) kills all basis elements and must be 0, giving v ∈ N(f). Therefore
S0(B) ⊆ N(f), so V0(B) = Span(S0(B)) ⊆ Span(N(f)) = N(f) .
However,
#S0(B) = dim V − [#S+(B) + #S−(B)] = dim V − rank f = dim(N(f)) .
Thus V0(B) = N(f).
2. The spaces V−(B) and V+(B) are not unique; they just have to have the same dimensions as V−(B′) and V+(B′) respectively, if B′ is another basis that puts f into the form of the theorem. As an example, take f ∈ Bil(R^2, R) with matrix in the standard basis
[f]_B^B = [ 1 0 ]
          [ 0 −1 ] .
Then take v1 = (2, √3) and v2 = (√3, 2). Since f((a, b), (c, d)) = ac − bd,
f(v1, v1) = (2)(2) − (√3)(√3) = 1 ,
f(v1, v2) = (2)(√3) − (√3)(2) = 0 ,
f(v2, v2) = (√3)(√3) − (2)(2) = −1 .
This means if B′ = {v1, v2} then [f]_{B′}^{B′} = [f]_B^B, but the spaces V+(B) and V+(B′) are not the same (nor are V−(B) and V−(B′)).

7.3 Sesquilinear and Hermitian forms
One motivation for considering symmetric bilinear forms is to try to abstract the standard dot product. In this direction, we know that in R^n, the quantity
‖~a‖ = √(~a · ~a) = √(a1^2 + · · · + an^2)
measures the length of the vector ~a = (a1, . . . , an). If we try to give this same definition for complex vectors, we get
(i, . . . , i) · (i, . . . , i) = −n ,
which is bad because we should have ~a · ~a ≥ 0 for all vectors. This is the motivation for introducing a different dot product on C^n: it is given by
~a · ~b = a1 b̄1 + · · · + an b̄n ,
where z̄ represents the complex conjugate x − iy of a complex number z = x + iy. Note that this dot product is no longer bilinear because of the conjugate; however, it is sesquilinear.
Definition 7.3.1. If V is a vector space over C, a function f : V × V → C is called sesquilinear if
1. for each fixed w ∈ V, the function v ↦ f(v, w) is linear and
2. for each fixed v ∈ V, the function w ↦ f(v, w) is anti-linear; that is, for w1, w2 ∈ V and c ∈ C,
f(v, cw1 + w2) = c̄ f(v, w1) + f(v, w2) .
If, in addition, f(v, w) is the complex conjugate of f(w, v) for all v, w ∈ V, we call f Hermitian.
The theory of sesquilinear and Hermitian forms parallels that of bilinear and symmetric forms. We will not give the proofs of the following statements, as they are quite similar to before:
• If f is a sesquilinear form and B is a basis of V then there is a matrix of f relative to B as before:
for v, w ∈ V, f(v, w) = conj([w]_B)^t [f]_B^B [v]_B .
Here, the (i, j)-th entry of [f]_B^B is f(vj, vi), where B = {v1, . . . , vn}, as before.
• The function Lf(v) is no longer linear; it is anti-linear. Although Rf(w) is linear, the map w ↦ Rf(w) (from V to V∗) is no longer an isomorphism of vector spaces. It is a bijective anti-linear function.
• We have the polarization formula
4f(u, v) = f(u + v, u + v) − f(u − v, u − v) + i f(u + iv, u + iv) − i f(u − iv, u − iv) .
This implies that if f(v, v) = 0 for all v, then f = 0.
• There is a corresponding version of Sylvester's law:
Theorem 7.3.2 (Sylvester for Hermitian forms). Let f be a Hermitian form on a finite-dimensional complex vector space V. There is a basis B of V such that [f]_B^B is diagonal with only 0s, 1s and −1s. Furthermore, the number of each does not depend on B as long as the matrix is in diagonal form.
The proof is the same.

7.4 Exercises
1. Let V be an F-vector space and {v1, . . . , vn} a basis for V. Consider the dual basis {v1∗, . . . , vn∗} and for all pairs i, j ∈ {1, . . . , n} define fi,j(v, w) = vi∗(v) vj∗(w). Show that
{fi,j : i, j ∈ {1, . . . , n}}
is a basis for Bil(V, F). Find the nullspace of each element in this basis.
2. Let V be a vector space and f ∈ Sym(V, F). If W is a subspace of V such that V = W ⊕ N(f), show that fW, the restriction of f to W, is non-degenerate.
3. Let V be a vector space over F with characteristic not equal to 2. Show that if V is finite dimensional and W is a subspace such that the restriction fW of f ∈ Sym(V, F) to W is non-degenerate, then V = W ⊕ W^⊥f. Here W^⊥f is defined as
W^⊥f = {v ∈ V : f(v, w) = 0 for all w ∈ W} .
Hint. Use induction on dim W.
4. Let V be a vector space of dimension n < ∞ and f ∈ Sym(V, F) be non-degenerate.
(a) A linear T : V → V is called orthogonal relative to f if f(T(v), T(w)) = f(v, w) for all v, w ∈ V. Show that if T is orthogonal then it is invertible.
(b) For any g ∈ Bil(V, F) and linear U : V → V we can define gU : V × V → F by
gU(v, w) = g(U(v), U(w)) .
Show that gU ∈ Bil(V, F). Given a basis B of V, how do we express the matrix of gU relative to that of g? Use this to find the determinant of any T that is orthogonal relative to f.
(c) Show that the orthogonal group
O(f) = {T ∈ L(V, V) : T is orthogonal relative to f}
is, in fact, a group under composition.
5. Let V be a finite-dimensional F-vector space such that char(F) ≠ 2. If f is a skew-symmetric bilinear form on V (that is, f(v, w) = −f(w, v) for all v, w ∈ V), can one find a basis B of V such that [f]_B^B is diagonal?
6. Let f be a symmetric bilinear form on R^n.
(a) Show that
fH((v, w), (x, y)) := f(v, x) + f(w, y) − i f(v, y) + i f(w, x)
defines a Hermitian form on C^n. (Here we are writing (v, w) for the vector v + iw as in the last homework.)
(b) Show that N(fH) = Span(ι(N(f))), where ι is the embedding ι(v) = (v, 0).
(c) Show that if f is an inner product then so is fH.
7. For the matrix A below, find an invertible matrix S such that S^t A S is diagonal:
A = [ 0 1 2 3 ]
    [ 1 0 1 2 ]
    [ 2 1 0 1 ]
    [ 3 2 1 0 ] .
8 Inner product spaces

8.1 Definitions
We will be interested in positive definite Hermitian forms.
Definition 8.1.1. Let V be a complex vector space. A Hermitian form f is called an inner product (or scalar product) if f is positive definite. In this case we call V a (complex) inner product space.
An example is the standard dot product:
⟨u, v⟩ = u1 v̄1 + · · · + un v̄n .
It is customary to write an inner product f(u, v) as ⟨u, v⟩. In addition, we write ‖u‖ = √⟨u, u⟩. This is the norm induced by the inner product ⟨·, ·⟩. In fact, (V, d) is a metric space, using
d(u, v) = ‖u − v‖ .
Properties of the norm. Let (V, ⟨·, ·⟩) be a complex inner product space.
1. For all c ∈ C, ‖cu‖ = |c| ‖u‖.
2. ‖u‖ = 0 if and only if u = ~0.
3. (Cauchy-Schwarz inequality) For u, v ∈ V,
|⟨u, v⟩| ≤ ‖u‖ ‖v‖ .
Proof. If u or v is ~0 then we are done. Otherwise, set
w = u − (⟨u, v⟩/‖v‖^2) v .
Then
0 ≤ ⟨w, w⟩ = ⟨w, u⟩ − conj(⟨u, v⟩/‖v‖^2) ⟨w, v⟩ .
However
⟨w, v⟩ = ⟨u, v⟩ − (⟨u, v⟩/‖v‖^2) ⟨v, v⟩ = 0 ,
so
0 ≤ ⟨w, u⟩ = ⟨u, u⟩ − (⟨u, v⟩/‖v‖^2) ⟨v, u⟩ = ‖u‖^2 − |⟨u, v⟩|^2/‖v‖^2 ,
and therefore
|⟨u, v⟩|^2 ≤ ‖u‖^2 ‖v‖^2 .
Everything above is an equality exactly when w = ~0, so we have equality if and only if v and u are linearly dependent.
4. (Triangle inequality) For u, v ∈ V,
‖u + v‖ ≤ ‖u‖ + ‖v‖ .
This is also written ‖u − v‖ ≤ ‖u − w‖ + ‖w − v‖.
Proof.
‖u + v‖^2 = ⟨u + v, u + v⟩ = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
 = ⟨u, u⟩ + 2 Re⟨u, v⟩ + ⟨v, v⟩
 ≤ ‖u‖^2 + 2|⟨u, v⟩| + ‖v‖^2
 ≤ ‖u‖^2 + 2‖u‖‖v‖ + ‖v‖^2 = (‖u‖ + ‖v‖)^2 .
Taking square roots gives the result.
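A quick numerical sanity check of Cauchy-Schwarz and the triangle inequality for the standard complex inner product (numpy sketch with arbitrary randomly chosen vectors):

import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

inner = np.vdot(v, u)            # <u, v> = sum_i u_i * conj(v_i); np.vdot conjugates its first argument
norm = lambda x: np.sqrt(np.vdot(x, x).real)

assert abs(inner) <= norm(u) * norm(v) + 1e-12          # Cauchy-Schwarz
assert norm(u + v) <= norm(u) + norm(v) + 1e-12         # triangle inequality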

8.2 Orthogonality

Definition 8.2.1. Given a complex inner product space (V, ⟨·, ·⟩) we say that vectors u, v ∈ V are orthogonal if ⟨u, v⟩ = 0.
Theorem 8.2.2. Let v1, . . . , vk be nonzero and pairwise orthogonal in a complex inner product space. Then they are linearly independent.
Proof. Suppose that
a1v1 + · · · + akvk = ~0 .
Then we take the inner product with vi:
0 = ⟨~0, vi⟩ = Σ_{j=1}^k aj ⟨vj, vi⟩ = ai ‖vi‖^2 .
Therefore ai = 0.
We begin with a method to transform a linearly independent set into an orthonormal set.
Theorem 8.2.3 (Gram-Schmidt). Let V be a complex inner product space and v1, . . . , vk ∈ V linearly independent. There exist u1, . . . , uk such that
1. {u1, . . . , uk} is orthonormal and
2. for all j = 1, . . . , k, Span({u1, . . . , uj}) = Span({v1, . . . , vj}).
Proof. We will prove this by induction. If k = 1, we must have v1 ≠ ~0, so set u1 = v1/‖v1‖. This gives ‖u1‖ = 1, so that {u1} is orthonormal, and certainly the second condition holds.
If k ≥ 2 then assume the statement holds for j = k − 1. Find vectors u1, . . . , uk−1 as in the statement. Now to define uk we set
wk = vk − [⟨vk, u1⟩u1 + · · · + ⟨vk, uk−1⟩uk−1] .
We claim that wk is orthogonal to all the uj's and is not zero. To check the first, let 1 ≤ j ≤ k − 1 and compute
⟨wk, uj⟩ = ⟨vk, uj⟩ − [⟨vk, u1⟩⟨u1, uj⟩ + · · · + ⟨vk, uk−1⟩⟨uk−1, uj⟩]
 = ⟨vk, uj⟩ − ⟨vk, uj⟩⟨uj, uj⟩ = 0 .
Second, if wk were zero then we would have
vk ∈ Span({u1, . . . , uk−1}) = Span({v1, . . . , vk−1}) ,
a contradiction to linear independence. Therefore we set uk = wk/‖wk‖ and we see that {u1, . . . , uk} is orthonormal and therefore linearly independent.
Furthermore note that by induction,
Span({u1, . . . , uk}) ⊆ Span({u1, . . . , uk−1, vk}) ⊆ Span({v1, . . . , vk}) .
Since the spaces on the left and right have the same dimension they are equal.
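The proof is constructive; here is a direct transcription into Python/numpy for the standard complex inner product (a sketch: the input vectors are assumed linearly independent, and the example vectors are arbitrary):

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of complex vectors."""
    ortho = []
    for v in vectors:
        w = v.astype(complex)
        for u in ortho:
            w = w - np.vdot(u, v) * u        # subtract <v, u> u (np.vdot conjugates its first argument)
        ortho.append(w / np.linalg.norm(w))  # normalize; w != 0 by linear independence
    return ortho

vs = [np.array([1., 1j, 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
us = gram_schmidt(vs)
for i, u in enumerate(us):
    for j, u2 in enumerate(us):
        assert np.isclose(np.vdot(u, u2), 1.0 if i == j else 0.0)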
Corollary 8.2.4. If V is a finite-dimensional inner product space then V has an orthonormal basis.
What do vectors look like represented in an orthonormal basis? Let B = {v1, . . . , vn} be an orthonormal basis and let v ∈ V. Then
v = a1v1 + · · · + anvn .
Taking the inner product with vj on both sides gives aj = ⟨v, vj⟩, so
v = ⟨v, v1⟩v1 + · · · + ⟨v, vn⟩vn .
Thus in this (orthonormal) case we can view the number ⟨v, vi⟩ as the projection of v onto vi. We can then find the norm of v easily:
‖v‖^2 = ⟨v, v⟩ = ⟨v, Σ_{i=1}^n ⟨v, vi⟩vi⟩ = Σ_{i=1}^n ⟨v, vi⟩ conj(⟨v, vi⟩) = Σ_{i=1}^n |⟨v, vi⟩|^2 .
This is known as Parseval's identity.

Definition 8.2.5. If V is an inner product space and W is a subspace of V we define the orthogonal complement of W as
W⊥ = {v ∈ V : ⟨v, w⟩ = 0 for all w ∈ W} .
For example, {~0}⊥ = V and V⊥ = {~0}.
If S ⊆ V then S⊥ is always a subspace of V (even if S was not). Furthermore,
S⊥ = (Span S)⊥ and (S⊥)⊥ = Span S .
Theorem 8.2.6. Let V be a finite-dimensional inner product space with W a subspace. Then
V = W ⊕ W⊥ .
Proof. Let {w1, . . . , wk} be a basis for W and extend it to a basis {w1, . . . , wn} for V. Then perform Gram-Schmidt to get an orthonormal basis {v1, . . . , vn} such that
Span({v1, . . . , vj}) = Span({w1, . . . , wj}) for all j = 1, . . . , n .
In particular, {v1, . . . , vk} is an orthonormal basis for W. We claim that {vk+1, . . . , vn} is a basis for W⊥. To see this, define W̃ to be the span of these vectors. Clearly W̃ ⊆ W⊥. On the other hand,
W ∩ W⊥ = {w ∈ W : ⟨w, w′⟩ = 0 for all w′ ∈ W} ⊆ {w ∈ W : ⟨w, w⟩ = 0} = {~0} .
This means that dim W + dim W⊥ ≤ n, or dim W⊥ ≤ n − k. Since dim W̃ = n − k we see they are equal.
Definition 8.2.7. Let V be a finite-dimensional (complex) inner product space. Letting W be a subspace of V, we can write each vector v uniquely as v = w1 + w2, where w1 ∈ W and w2 ∈ W⊥. Then define the orthogonal projection onto W as
PW(v) = w1 .
There are many ways to define the orthogonal projection. In some sense it is the closest vector to v in W (we will see this soon). Here are some simple properties.
• PW is linear.
• PW^2 = PW.
• P_{W⊥} = I − PW.
• For all v1, v2 ∈ V, ⟨PW(v1), v2⟩ = ⟨v1, PW(v2)⟩. This says PW can be moved to the other side in the inner product. Formally this means PW is its own adjoint (defined soon).
Proof.
⟨PW(v1), v2⟩ = ⟨PW(v1), PW(v2)⟩ + ⟨PW(v1), P_{W⊥}(v2)⟩ = ⟨PW(v1), PW(v2)⟩ .
By the same argument, ⟨v1, PW(v2)⟩ = ⟨PW(v1), PW(v2)⟩.
• For all v ∈ V, w ∈ W,
‖v − PW(v)‖ ≤ ‖v − w‖ with equality iff w = PW(v) .
Here the norm comes from the inner product. This says that PW(v) is the unique closest vector to v in W.
Proof. Note first that for any w ∈ W and w′ ∈ W⊥, the Pythagorean theorem holds:
‖w + w′‖^2 = ⟨w + w′, w + w′⟩ = ⟨w, w⟩ + ⟨w, w′⟩ + ⟨w′, w⟩ + ⟨w′, w′⟩ = ‖w‖^2 + ‖w′‖^2 .
Now
‖v − w‖^2 = ‖PW(v) − w + P_{W⊥}(v)‖^2 = ‖PW(v) − w‖^2 + ‖P_{W⊥}(v)‖^2 .
This is at least ‖P_{W⊥}(v)‖^2 = ‖v − PW(v)‖^2, and they are equal if and only if PW(v) = w.
Projection onto a vector. If w is a nonzero vector, we can define W = Span({w}) and consider the orthogonal projection onto W. Let {w1, . . . , wn−1} be an orthonormal basis for W⊥ (which exists by Gram-Schmidt) and normalize w to w0 = w/‖w‖. Then we can write an arbitrary v ∈ V in terms of the basis {w1, . . . , wn−1, w0} as
v = ⟨v, w1⟩w1 + · · · + ⟨v, wn−1⟩wn−1 + ⟨v, w0⟩w0 .
This gives a formula for PW(v) = ⟨v, w0⟩w0. Rewriting in terms of w, we get
PW(v) = (⟨v, w⟩/‖w‖^2) w .
Since the orthogonal projection onto a subspace was defined without reference to a basis, we see that this does not depend on the choice of w!
We then define the orthogonal projection onto the vector w to be PW, where W = Span({w}). That is, Pw(v) = (⟨v, w⟩/‖w‖^2) w.
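In coordinates (standard inner product on C^n), projection onto a vector, and more generally onto the span of an orthonormal set, looks as follows (numpy sketch; the vectors are arbitrary examples):

import numpy as np

def proj_onto_vector(v, w):
    """Orthogonal projection of v onto Span({w}) for the standard inner product."""
    return (np.vdot(w, v) / np.vdot(w, w)) * w      # (<v, w> / ||w||^2) w

def proj_onto_span(v, onb):
    """Projection onto Span(onb), where onb is a list of orthonormal vectors."""
    return sum(np.vdot(u, v) * u for u in onb)      # sum of <v, u> u

v = np.array([1.0 + 1j, 2.0, 0.0])
w = np.array([1.0, 1.0, 1.0 + 0j])
p = proj_onto_vector(v, w)
assert np.isclose(np.vdot(w, v - p), 0.0)           # the residual v - p is orthogonal to w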

8.3 Adjoint

Before, we saw that an orthogonal projection can be moved to the other side of the inner product. This motivates us to look at which operators can do this. Given any linear T, we can define another linear transformation which acts on the other side of the inner product.
Theorem 8.3.1 (Existence of adjoint). Let T : V → V be linear and V a finite-dimensional complex inner product space. There exists a unique linear T∗ : V → V such that for all v, w ∈ V,
⟨T(v), w⟩ = ⟨v, T∗(w)⟩ .
T∗ is called the adjoint of T.
Proof. For the proof, we need a lemma.
Lemma 8.3.2 (Riesz representation theorem). Let V be a finite-dimensional inner product space. For each f ∈ V∗ there exists a unique wf ∈ V such that for all v ∈ V,
f(v) = ⟨v, wf⟩ .
Proof. Recall the map R⟨·,·⟩ : V → V∗ given by R⟨·,·⟩(w)(v) = ⟨v, w⟩. Since the inner product is a sesquilinear form, this map is additive and anti-linear. In fact, the inner product has rank equal to dim V when viewed as a sesquilinear form, since only the zero vector is in its nullspace. Thus R⟨·,·⟩ is a bijection from V to V∗. This means that given f ∈ V∗ there exists a unique wf ∈ V such that R⟨·,·⟩(wf) = f. In other words, there is a unique wf ∈ V such that for all v ∈ V,
⟨v, wf⟩ = R⟨·,·⟩(wf)(v) = f(v) .

We now use the lemma. Given T : V → V linear and w ∈ V, define a function fT,w : V → C by
fT,w(v) = ⟨T(v), w⟩ .
This is a linear functional, since it equals R⟨·,·⟩(w) ∘ T. So by Riesz, there exists a unique vector w̃ such that
⟨T(v), w⟩ = fT,w(v) = ⟨v, w̃⟩ for all v ∈ V .
We define T∗(w) = w̃.
By definition T∗(w) satisfies ⟨T(v), w⟩ = ⟨v, T∗(w)⟩ for all v, w ∈ V. We must simply show it is linear. So given c ∈ C and v, w1, w2 ∈ V,
⟨v, T∗(cw1 + w2)⟩ = ⟨T(v), cw1 + w2⟩ = c̄⟨T(v), w1⟩ + ⟨T(v), w2⟩
 = c̄⟨v, T∗(w1)⟩ + ⟨v, T∗(w2)⟩
 = ⟨v, cT∗(w1) + T∗(w2)⟩ .
By uniqueness, T∗(cw1 + w2) = cT∗(w1) + T∗(w2).
The adjoint has many interesting properties. Some simple ones you can verify:
• (T + S)∗ = T∗ + S∗.
• (TS)∗ = S∗T∗.
• (cT)∗ = c̄ T∗.
These can be seen to follow from the next property.
Proposition 8.3.3. Let T : V → V be linear with B an orthonormal basis. Then
[T∗]_B^B = ([T]_B^B)∗ ,
the conjugate transpose of [T]_B^B.
Proof. Write B = {v1, . . . , vn} and use orthonormality to express
T∗(vj) = ⟨T∗(vj), v1⟩v1 + · · · + ⟨T∗(vj), vn⟩vn
 = conj(⟨T(v1), vj⟩) v1 + · · · + conj(⟨T(vn), vj⟩) vn ,
using ⟨T∗(vj), vi⟩ = ⟨vj, T(vi)⟩ = conj(⟨T(vi), vj⟩). This means the (i, j)-th entry of [T∗]_B^B is conj(⟨T(vi), vj⟩).
On the other hand, we can write
T(vj) = ⟨T(vj), v1⟩v1 + · · · + ⟨T(vj), vn⟩vn ,
so the (i, j)-th entry of [T]_B^B is ⟨T(vj), vi⟩. Comparing the two, [T∗]_B^B is the conjugate transpose of [T]_B^B.
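In matrix terms this is easy to check numerically: for the standard inner product on C^n (whose standard basis is orthonormal), the adjoint of "multiply by A" is "multiply by the conjugate transpose of A". A numpy sketch with an arbitrary matrix:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A_star = A.conj().T                       # conjugate transpose

v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# <A v, w> == <v, A* w> for <x, y> = sum x_i conj(y_i)
lhs = np.vdot(w, A @ v)                   # <A v, w>
rhs = np.vdot(A_star @ w, v)              # <v, A* w>
assert np.isclose(lhs, rhs)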
Many properties of linear transformations are defined in terms of their adjoints.
Definition 8.3.4. Let V be an inner product space and T : V → V be linear. Then T is
• self-adjoint if T∗ = T;
• skew-adjoint if T∗ = −T;
• unitary if T is invertible and T^{−1} = T∗;
• normal if T T∗ = T∗ T.
Recall that last time we defined T : V → V (in an inner product space V) to be unitary if T is invertible and T^{−1} = T∗. There are alternate characterizations of unitary operators.
Proposition 8.3.5. The following are equivalent when V is a finite-dimensional inner product space and T : V → V is linear.
1. T is unitary.
2. ‖T(v)‖ = ‖v‖ for all v ∈ V.
3. ⟨T(v), T(w)⟩ = ⟨v, w⟩ for all v, w ∈ V.
4. {T(v1), . . . , T(vk)} is orthonormal whenever {v1, . . . , vk} is.
5. Whenever B is an orthonormal basis (for ⟨·, ·⟩), the columns of [T]_B^B are orthonormal (relative to the standard dot product).
Proof. Suppose that T is unitary. Then if v ∈ V,
‖T(v)‖^2 = ⟨T(v), T(v)⟩ = ⟨T∗T(v), v⟩ = ⟨T^{−1}T(v), v⟩ = ⟨v, v⟩ = ‖v‖^2 .
So 1 implies 2.
Assume 2. Then using the polarization identity for Hermitian forms: for u, v ∈ V,
4⟨T(u), T(v)⟩ = ⟨T(u + v), T(u + v)⟩ − ⟨T(u − v), T(u − v)⟩
 + i⟨T(u + iv), T(u + iv)⟩ − i⟨T(u − iv), T(u − iv)⟩
 = ⟨u + v, u + v⟩ − ⟨u − v, u − v⟩ + i⟨u + iv, u + iv⟩ − i⟨u − iv, u − iv⟩
 = 4⟨u, v⟩ .
So 2 implies 3.
Assuming 3, 4 follows immediately. If the vectors {v1, . . . , vk} are orthonormal then
⟨vi, vj⟩ = 1 if i = j and 0 if i ≠ j .
Then using ⟨T(vi), T(vj)⟩ = ⟨vi, vj⟩, the same is true for {T(v1), . . . , T(vk)}.
Assume 4. Then let B = {v1, . . . , vn} be an orthonormal basis. Then the matrix [⟨·, ·⟩]_B^B is the identity. This means that for v, w ∈ V,
⟨v, w⟩ = conj([w]_B)^t [⟨·, ·⟩]_B^B [v]_B = [v]_B · [w]_B ,
where · is the standard dot product. Now the columns of [T]_B^B are equal to [T(v1)]_B, . . . , [T(vn)]_B, so we get by 4
[T(vi)]_B · [T(vj)]_B = ⟨T(vi), T(vj)⟩ = 1 if i = j and 0 if i ≠ j .
This means 5 holds.
Last, assuming 5, we show that T is unitary. Taking B to be any orthonormal basis, the matrix [T]_B^B has orthonormal (and thus linearly independent) columns, so it is invertible, giving that T is invertible. Furthermore, the (i, j)-th entry of [T∗T]_B^B = ([T]_B^B)∗[T]_B^B is the dot product (in the sense above) of the j-th column of [T]_B^B with the i-th column, which by 5 is 1 if i = j and zero otherwise. This means [T∗T]_B^B is the identity matrix, giving T∗ = T^{−1}.
Remark. Given a matrix A ∈ Mn,n(C) we say that A is unitary if A is invertible and A∗ = A^{−1}. Here A∗ = (Ā)^t is the conjugate transpose. Note that the (i, j)-th entry of A∗A is just the standard dot product of the i-th column of A with the j-th column of A. So A is unitary if and only if the columns of A are orthonormal. Thus the last part of the previous proposition says
T unitary ⟺ [T]_B^B unitary for any orthonormal basis B .
8.4 Spectral theory in inner product spaces

We now study the eigenvalues of self-adjoint and unitary operators.
Theorem 8.4.1. Let T : V → V be linear on a finite-dimensional inner product space V and λ an eigenvalue of T.
1. If T is self-adjoint, λ ∈ R.
2. If T is skew-adjoint, λ ∈ iR (λ is imaginary).
3. If T is unitary, |λ| = 1.
Proof. If T is self-adjoint, then if v is an eigenvector for eigenvalue λ,
λ⟨v, v⟩ = ⟨λv, v⟩ = ⟨T(v), v⟩ = ⟨v, T(v)⟩ = ⟨v, λv⟩ = λ̄⟨v, v⟩ .
Since v ≠ ~0, this means λ = λ̄, or λ ∈ R.
If T is skew-adjoint, then iT is self-adjoint:
(iT)∗ = ī T∗ = (−i)(−T) = iT .
So since the eigenvalues of iT are just iλ for the eigenvalues λ of T, we see that all eigenvalues of T are imaginary.
If T is unitary with v an eigenvector for eigenvalue λ,
⟨v, v⟩ = ⟨T(v), T(v)⟩ = ⟨λv, λv⟩ = λλ̄⟨v, v⟩ .
This means λλ̄ = 1, or |λ| = 1.
The main theorem in spectral theory regards diagonalization of self-adjoint operators. In fact, they are more than diagonalizable; one can change the basis using a unitary transformation.
Definition 8.4.2. A linear T : V → V is unitarily diagonalizable on an inner product space V if there exists an orthonormal basis of V consisting of eigenvectors for T. A matrix A is unitarily diagonalizable if there is a unitary matrix P such that
P^{−1}AP is diagonal .
The main theorem is the following. Recall that T is normal if T T∗ = T∗ T. Self-adjoint, skew-adjoint and unitary operators are normal.
Theorem 8.4.3. Let T : V → V be linear and V a finite-dimensional inner product space. Then T is normal if and only if T is unitarily diagonalizable.
Proof. One direction is easy. Suppose that T is unitarily diagonalizable. Then we can find an orthonormal basis B such that [T]_B^B is diagonal. Then [T∗]_B^B is also diagonal, since it is just the conjugate transpose of [T]_B^B. Any two diagonal matrices commute, so, in particular, these matrices commute. This means
[T T∗]_B^B = [T]_B^B [T∗]_B^B = [T∗]_B^B [T]_B^B = [T∗ T]_B^B ,
giving T T∗ = T∗ T.
The other direction is more difficult. We will first show that if T is self-adjoint then T is unitarily diagonalizable. For that we need a lemma.
Lemma 8.4.4. If T : V → V is linear then
R(T)⊥ = N(T∗) and N(T)⊥ = R(T∗) .
Proof. Let us assume that v ∈ N(T∗). For any w ∈ R(T) we can find w′ ∈ V such that T(w′) = w. Then
⟨v, w⟩ = ⟨v, T(w′)⟩ = ⟨T∗(v), w′⟩ = 0 .
So v ∈ R(T)⊥. This means N(T∗) ⊆ R(T)⊥. On the other hand, these spaces have the same dimension:
dim R(T)⊥ = dim V − dim R(T) = dim N(T) ,
but the matrix of T∗ is just the conjugate transpose of that of T (in any orthonormal basis), so dim N(T) = dim N(T∗), completing the proof of the first statement.
For the second, we apply the first statement to T∗:
R(T∗)⊥ = N(T)
and then take perps on both sides: R(T∗) = N(T)⊥.
Now we move to the proof.
Self-adjoint case. Assume that T∗ = T; we will show that V has an orthonormal basis of eigenvectors for T by induction on dim V. First, if dim V = 1 then any nonzero vector v is an eigenvector for T. Set our orthonormal basis to be {v/‖v‖}.
If dim V > 1 then, by the fact that we are over C (which is algebraically closed), let v be an eigenvector for T with eigenvalue λ. Then dim N(T − λI) > 0 and thus, by the lemma,
dim R(T − λI) = dim V − dim N((T − λI)∗) = dim V − dim N(T − λ̄I) .
However, as T is self-adjoint, λ ∈ R, so this is
dim V − dim N(T − λI) < dim V .
Furthermore, if dim R(T − λI) = 0 then we must have dim N(T − λI) = dim V, meaning the whole space is the eigenspace for λ. In this case we just take any orthonormal basis B of V; it consists of eigenvectors of T, [T]_B^B is diagonal, and we are done.
So we may assume that
0 < dim R(T − λI) < dim V .
Now both N = N(T − λI) and R = R(T − λI) are T-invariant (check this!), so we can consider TN and TR, the restrictions of T to N and R. These are still self-adjoint, as for instance
⟨TR v, w⟩ = ⟨v, TR w⟩ for all v, w ∈ R ,
meaning TR∗ = TR (similarly for TN). Therefore by induction we can find orthonormal bases
BN = {v1, . . . , vl} and BR = {vl+1, . . . , vn}
of N and R consisting of eigenvectors for TN and TR (and thus of T). But N and R are perpendicular, meaning that B = BN ∪ BR is still orthonormal.
Normal case. Suppose now that T is normal and write T as a self-adjoint part plus a
skew-adjoint part:

T = (1/2)(T + T*) + (1/2)(T − T*) = T₁ + T₂.

Since T is normal, these two parts commute! Since self-adjoint operators are unitarily
diagonalizable, so are skew-adjoint ones (since whenever U is skew-adjoint, iU is self-adjoint).
Therefore we have commuting unitarily diagonalizable transformations, and they are
simultaneously diagonalizable. This follows from a proof from the homework, where it was
shown that commuting diagonalizable transformations are simultaneously diagonalizable; the
transformations we consider here are unitarily diagonalizable, but the same proof works
(check this!). So we can find an orthonormal basis B such that [T₁]_B^B and [T₂]_B^B are
diagonal. Thus

[T]_B^B = [T₁]_B^B + [T₂]_B^B is diagonal

and we are done.
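To illustrate the normal case (again a small example chosen here), consider the rotation matrix A = [0 −1; 1 0]. It satisfies A* = −A = A⁻¹, so it is skew-adjoint and unitary, hence normal, but it has no real eigenvalues. Over C it has eigenvalues ±i with orthonormal eigenvectors (1/√2)(i, 1) and (1/√2)(−i, 1); taking P with these as columns, P is unitary and

P⁻¹AP = P*AP = [ i   0 ]
               [ 0  −i ] .

In the notation of the proof, its self-adjoint part is T₁ = (1/2)(A + A*) = 0 and its skew-adjoint part is T₂ = A, which of course commute.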

8.5 Appendix: proof of Cauchy-Schwarz by P. Sosoe

Let V be an inner product space over C, and let u, v ∈ V. Then

|⟨u, v⟩| ≤ ‖u‖‖v‖.    (3)

To prove this, the idea is to start from a weaker inequality and upgrade it by exploiting some
invariances of the quantities involved. The starting point is the positive-definiteness of the
inner product:

0 ≤ ⟨u − v, u − v⟩.    (4)

Expanding the inner product on the right, we find:

⟨u − v, u − v⟩ = ⟨u, u⟩ − ⟨u, v⟩ − ⟨v, u⟩ + ⟨v, v⟩
              = ‖u‖² − 2ℜ⟨u, v⟩ + ‖v‖².    (5)

To pass to the second line, we have used the Hermitian property

⟨v, u⟩ = \overline{⟨u, v⟩}

and the fact that for any complex number z,

z + z̄ = 2ℜz.

Combining (4) and (5), we find

2ℜ⟨u, v⟩ ≤ ‖u‖² + ‖v‖²,

which we rewrite as

ℜ⟨u, v⟩ ≤ ‖u‖²/2 + ‖v‖²/2.    (6)

This inequality is weaker than (3): the left side is in general smaller than |⟨u, v⟩|, while the
right side is larger than ‖u‖‖v‖.
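To see concretely how much has been given up (a toy example chosen for illustration, with the standard inner product on C²), take u = (2, 0) and v = (i, 0). Then

ℜ⟨u, v⟩ = 0,  |⟨u, v⟩| = 2 = ‖u‖‖v‖,  ‖u‖²/2 + ‖v‖²/2 = 5/2,

so (6) only says 0 ≤ 5/2, whereas (3) is the equality 2 ≤ 2. The two upgrades below recover exactly this lost ground.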
To improve the situation, notice that if λ ∈ C is such that |λ| = 1, then

‖λv‖² = |λ|²‖v‖² = ‖v‖²,

while

ℜ⟨u, λv⟩ = ℜ(λ̄⟨u, v⟩).

So, applying (6) with v replaced by λv, we get, for any λ such that |λ| = 1:

ℜ(λ̄⟨u, v⟩) ≤ ‖u‖²/2 + ‖v‖²/2.    (7)

This is one of those great situations where we have an inequality holding for every value of
some parameter. The obvious next step is to optimize the choice of λ. To see how to do this
in the present case, recall that any complex number z can be written as

z = |z|e^{iθ},

where θ is real. Thus also

⟨u, v⟩ = e^{iθ}|⟨u, v⟩|

for some θ. Similarly write λ = e^{iφ} for some real φ. Then

ℜ(λ̄⟨u, v⟩) = ℜ(e^{−iφ}e^{iθ}|⟨u, v⟩|) = cos(θ − φ)|⟨u, v⟩|,

recalling that e^{iα} = cos α + i sin α for real α. Using this in (7), we find

cos(θ − φ)|⟨u, v⟩| ≤ ‖u‖²/2 + ‖v‖²/2,

where θ is determined by u and v, but φ is at our disposal. We get the strongest statement
by setting φ = θ, improving (7) to

|⟨u, v⟩| ≤ ‖u‖²/2 + ‖v‖²/2.    (8)
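In the toy example above (u = (2, 0), v = (i, 0)), inequality (8) now reads 2 ≤ 5/2: the choice of λ has restored the full left-hand side |⟨u, v⟩| of (3), and it remains to bring the right-hand side down to ‖u‖‖v‖ = 2.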

We can further improve (8) by introducing a new parameter c > 0. Notice that if we apply
(8) with u replaced by cu and v replaced by (1/c)v, then the left side is

|⟨cu, (1/c)v⟩| = (c/c)|⟨u, v⟩| = |⟨u, v⟩|.

On the other hand, the right side of (8) is now

c²‖u‖²/2 + (1/c²)‖v‖²/2,

so that for any c > 0, we have

|⟨u, v⟩| ≤ c²‖u‖²/2 + (1/c²)‖v‖²/2.    (9)

Since the left side is independent of c, we should once again optimize in our free parameter c
to get the best possible inequality. The inequality will obviously be strongest for the smallest
possible value of the right side. The function

x ↦ x‖u‖²/2 + (1/x)‖v‖²/2

is differentiable for x > 0 and tends to infinity as x → 0+ and x → ∞. It thus has a (global)
minimum at the zero of its derivative, which occurs when

‖u‖²/2 − (1/x²)‖v‖²/2 = 0,

or

x₀ = ‖v‖/‖u‖

(we may assume u and v are both nonzero, since otherwise (3) is trivial). This tells us that the
optimal choice of c in (9), for fixed ‖u‖ and ‖v‖, is c = √x₀, which leads to the right side being

(‖v‖/‖u‖)·‖u‖²/2 + (‖u‖/‖v‖)·‖v‖²/2 = ‖u‖‖v‖.

This finishes the proof of (3).
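Carrying the same toy example (u = (2, 0), v = (i, 0) in C²) through this last step: x₀ = ‖v‖/‖u‖ = 1/2, so c = 1/√2 and the right side of (9) becomes

(1/2)·(4/2) + 2·(1/2) = 2 = ‖u‖‖v‖,

so the optimized inequality reads 2 ≤ 2, which is exactly (3) in this case (with equality, as it must be, since v is a scalar multiple of u).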

8.6 Exercises
1. Let A be a symmetric matrix in M_{n,n}(R).
   (a) A is called positive-definite if Av · v > 0 for all nonzero v ∈ Rⁿ. (Here · is the
       standard dot-product.) Show that A is positive-definite if and only if there exists
       an invertible B ∈ M_{n,n}(R) such that A = BᵗB.
   (b) A is called positive semi-definite if Av · v ≥ 0 for all v ∈ Rⁿ. Formulate a similar
       result to the above for such A.
2. Let V be a complex inner product space. Let T ∈ L(V, V) be such that T* = −T.
   We call such T skew-self-adjoint. Show that the eigenvalues of T are purely imaginary.
   Show further that V is the orthogonal direct sum of the eigenspaces of T. In other
   words, V is a direct sum of the eigenspaces and ⟨v, w⟩ = 0 if v and w are in distinct
   eigenspaces.
   Hint: Construct from T a suitable self-adjoint operator and apply the known results
   from the lecture to that operator.
3. Let (V, ⟨·, ·⟩) be a complex inner product space, and let ψ be a Hermitian form on V (in
   addition to ⟨·, ·⟩). Show that there exists an orthonormal basis B of V such that [ψ]_B^B
   is diagonal, by completing the following steps:
   (a) Show that for each w ∈ V, there exists a unique vector, which we call Aw, in V
       with the property that for all v ∈ V,
       ψ(v, w) = ⟨v, Aw⟩.
   (b) Show that the map A : V → V, which sends a vector w ∈ V to the vector Aw just
       defined, is linear and self-adjoint.
   (c) Use the spectral theorem for self-adjoint operators to complete the problem.
4. Let (V, ⟨·, ·⟩) be a real inner product space.
   (a) Define ‖·‖ : V → R by
       ‖v‖ = √⟨v, v⟩ .
       Show that for all v, w ∈ V,
       |⟨v, w⟩| ≤ ‖v‖‖w‖ .
   (b) Show that ‖·‖ is a norm on V.
   (c) Show that there exists an orthonormal basis of V.
5. Let (V, ⟨·, ·⟩) be a real inner product space and T : V → V be linear.
   (a) Prove that for each f ∈ V* there exists a unique z ∈ V such that for all v ∈ V,
       f(v) = ⟨v, z⟩ .
   (b) For each u ∈ V define f_{u,T} : V → R by
       f_{u,T}(v) = ⟨T(v), u⟩ .
       Prove that f_{u,T} ∈ V*. Define Tᵗ(u) to be the unique vector in V such that for all
       v ∈ V,
       ⟨T(v), u⟩ = ⟨v, Tᵗ(u)⟩ ,
       and show that Tᵗ is linear.
   (c) Show that if β is an orthonormal basis for V then
       [Tᵗ]_β^β = ([T]_β^β)ᵗ .
6. Let ⟨·, ·⟩ be an inner product on V = Rⁿ and define the complexification of ⟨·, ·⟩ by
       ⟨(v, w), (x, y)⟩_C = ⟨v, x⟩ + ⟨w, y⟩ − i⟨v, y⟩ + i⟨w, x⟩ .
   (a) Show that ⟨·, ·⟩_C is an inner product on Cⁿ.
   (b) Let T : V → V be linear.
       i. Prove that (T_C)* = (Tᵗ)_C.
       ii. If Tᵗ = T then we say that T is symmetric. Show in this case that T_C is
          Hermitian.
       iii. If Tᵗ = −T then we say T is anti-symmetric. Show in this case that T_C is
          skew-adjoint.
       iv. If T is invertible and Tᵗ = T⁻¹ then we say that T is orthogonal. Show in
          this case that T_C is unitary. Show that this is equivalent to
          T ∈ O(⟨·, ·⟩) ,
          where O(⟨·, ·⟩) is the orthogonal group for ⟨·, ·⟩.
7. Let ⟨·, ·⟩ be an inner product on V = Rⁿ and T : V → V be linear.
   (a) Suppose that TTᵗ = TᵗT. Show then that T_C is normal. In this case, we can find
       a basis β of Cⁿ such that β is orthonormal (with respect to ⟨·, ·⟩_C) and [T_C]_β^β is
       diagonal. Define the subspaces of V
       X₁, . . . , X_r, Y₁, . . . , Y_{2m}
       as in problem 1, question 3. Show that these are mutually orthogonal; that is, if
       v, w are in different subspaces then ⟨v, w⟩ = 0.
   (b) If T is symmetric then show that there exists an orthonormal basis β of V such
       that [T]_β^β is diagonal.
   (c) If T is skew-symmetric, what is the form of the matrix of T in real Jordan form?
   (d) A ∈ M_{2,2}(R) is called a rotation matrix if there exists θ ∈ [0, 2π) such that
       A = [ cos θ  −sin θ ]
           [ sin θ   cos θ ] .
       If T is orthogonal, show that there exists a basis β of V such that [T]_β^β is block
       diagonal, and the blocks are either 2 × 2 rotation matrices or 1 × 1 matrices
       consisting of 1 or −1.
       Hint. Use the real Jordan form.
