Michael Damron
Princeton University
Contents

1 Vector spaces
  1.1 Vector spaces and fields
  1.2 Subspaces
  1.3 Spanning and linear dependence
  1.4 Bases
  1.5 Exercises

2 Linear transformations
  2.1 Definitions
  2.2 Range and nullspace
  2.3 Isomorphisms
  2.4 Matrices and coordinates
  2.5 Exercises

3 Dual spaces
  3.1 Definitions
  3.2 Annihilators
  3.3 Transpose
  3.4 Double dual
  3.5 Exercises

4 Determinants
  4.1 Permutations
  4.2 Determinants: existence and uniqueness
  4.3 Properties of the determinant
  4.4 Exercises
  4.5 Exercises on polynomials

5 Eigenvalues
  5.1 Diagonalizability
  5.2 Eigenspaces
  5.3 Exercises

6 Jordan form
  6.1 Primary decomposition theorem
  6.2 Nilpotent operators
  6.3 Existence and uniqueness of Jordan form, Cayley-Hamilton
  6.4 Exercises

7 Bilinear forms
  7.1 Definition and matrix representation
  7.2 Symmetric bilinear forms
  7.3
  7.4
1 Vector spaces

1.1 Vector spaces and fields
Linear algebra is the study of linear functions. In R^n these are functions f satisfying

f(x + y) = f(x) + f(y) and f(cx) = cf(x) for all x, y ∈ R^n, c ∈ R.

We will generalize this immediately, taking from R^n only what we absolutely need. We start by looking at the c value above: it is called a scalar. Generally scalars do not need to come from R; there is only a certain amount of structure we need from the set of scalars.

Definition 1.1.1. A set F is called a field if for each a, b ∈ F, there is an element ab ∈ F and another a + b ∈ F such that

1. for all a, b, c ∈ F, (ab)c = a(bc) and (a + b) + c = a + (b + c),
2. for all a, b ∈ F, ab = ba and a + b = b + a,
3. there exist elements 0, 1 ∈ F such that for all a ∈ F, a + 0 = a and 1a = a,
4. for all a ∈ F, there is an element −a ∈ F such that a + (−a) = 0, and if a ≠ 0 there is an element a^{-1} ∈ F such that aa^{-1} = 1, and
5. for all a, b, c ∈ F, a(b + c) = ab + ac.

This is our generalization of R. Note one interesting point: there is nothing that asserts that F must be infinite, and indeed there are finite fields. Take any prime p and consider the set Z_p given by

Z_p = {0, . . . , p − 1} with modular arithmetic.

That is, a + b is defined as (a + b) mod p (for instance (2 + 5) mod 3 = 1). Then this is a field. You will verify this in the exercises. Another neat fact: if F is a finite field then it must have p^n elements for some prime p and n ∈ N. You will prove this too.

Other examples are R and C.
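Because Z_p has only finitely many elements, the field axioms of Definition 1.1.1 can be checked by brute force for any fixed prime. The following Python sketch is purely illustrative (it is not part of the exercises' intended solution); it verifies the axioms for a small prime and fails on composites, where some elements have no multiplicative inverse.

    # Brute-force check that Z_p with modular arithmetic satisfies the field axioms.
    p = 7                      # any prime; try p = 6 to see the inverse check fail
    elems = range(p)
    add = lambda a, b: (a + b) % p
    mul = lambda a, b: (a * b) % p

    for a in elems:
        for b in elems:
            assert add(a, b) == add(b, a) and mul(a, b) == mul(b, a)    # commutativity
            for c in elems:
                assert add(add(a, b), c) == add(a, add(b, c))           # associativity of +
                assert mul(mul(a, b), c) == mul(a, mul(b, c))           # associativity of *
                assert mul(a, add(b, c)) == add(mul(a, b), mul(a, c))   # distributivity
        assert add(a, 0) == a and mul(a, 1) == a                        # identities
        assert any(add(a, b) == 0 for b in elems)                       # additive inverse
        if a != 0:
            assert any(mul(a, b) == 1 for b in elems)                   # multiplicative inverse
    print("Z_%d satisfies the field axioms" % p)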
Given our field of scalars we are ready to generalize the idea of Rn ; we will call this a
vector space.
Definition 1.1.2. A collection (V, F) of a set V and a field F is called a vector space (the elements of V called vectors and those of F called scalars) if the following hold. For each v, w ∈ V there is a vector sum v + w ∈ V such that

1. there is one (and only one) vector called ~0 such that v + ~0 = v for all v ∈ V,
2. for each v ∈ V there is one (and only one) vector −v such that v + (−v) = ~0,
3. for all v, w ∈ V, v + w = w + v,
4. for all v, w, z ∈ V, v + (w + z) = (v + w) + z.

Furthermore for all v ∈ V and c ∈ F there is a scalar product cv ∈ V such that

1. for all v ∈ V, 1v = v,
2. for all v ∈ V and c, d ∈ F, (cd)v = c(dv),
3. for all v, w ∈ V and c ∈ F, c(v + w) = cv + cw and
4. for all v ∈ V and c, d ∈ F, (c + d)v = cv + dv.

This is really a ton of rules but they have to be verified! In case (V, F) is a vector space, we will typically say V is a vector space over F or V is an F-vector space. Let's look at some examples.
1. Take V = R^n and F = R. We define addition as you would imagine:

(v1, . . . , vn) + (w1, . . . , wn) = (v1 + w1, . . . , vn + wn)

and scalar multiplication by

c(v1, . . . , vn) = (cv1, . . . , cvn) .

2. Let F be any field and n ∈ N, and write

F^n = {(a1, . . . , an) : ai ∈ F for i = 1, . . . , n}

and define addition and scalar multiplication as above. This is a vector space. Note in particular that F is a vector space over itself.

3. If F1 ⊆ F2 are fields (with the same 0, 1 and operations) then F2 is a vector space over F1. This situation is called a field extension.

4. Let S be any nonempty set and F a field. Then define

V = {f : S → F : f a function} .

Then V is an F-vector space using the operations

(f1 + f2)(s) = f1(s) + f2(s) and (cf1)(s) = c(f1(s)) .
Facts everyone should see once.

1. For all c ∈ F, c~0 = ~0.

Proof.

c~0 = c(~0 + ~0) = c~0 + c~0 , so
~0 = c~0 + (−(c~0)) = (c~0 + c~0) + (−(c~0)) = c~0 + (c~0 + (−(c~0))) = c~0 + ~0 = c~0 .
1.2 Subspaces

Then both of these are subspaces but their union is not, since it is not closed under addition ((1, 1) = (1, 0) + (0, 1) ∉ W1 ∪ W2).

In the case that W1 ∩ W2 = {~0}, we say that W1 + W2 is a direct sum and we write it W1 ⊕ W2.
1.3 Spanning and linear dependence

Given a subset S (not necessarily a subspace) of a vector space V we want to generate the smallest subspace containing S.

Definition 1.3.1. Let V be a vector space and S ⊆ V. The span of S is defined as

Span(S) = ∩_{W ∈ C_S} W ,

where C_S is the collection of subspaces of V containing S.
Note that the Span is the smallest subspace containing S in that if W is another subspace containing S then Span(S) ⊆ W. The fact that Span(S) is a subspace follows from:

Proposition 1.3.2. Let C be a collection of subspaces of a vector space V. Then ∩_{W ∈ C} W is a subspace.

Proof. First, each W ∈ C contains ~0, so ∩_{W ∈ C} W is nonempty. If v, w ∈ ∩_{W ∈ C} W and c ∈ F then v, w ∈ W for all W ∈ C. Since each W is a subspace, cv + w ∈ W for all W ∈ C, meaning that cv + w ∈ ∩_{W ∈ C} W, completing the proof.
Examples.

1. Span(∅) = {~0}.
2. If W is a subspace of V then Span(W) = W.
3. Span(Span(S)) = Span(S).
4. If S ⊆ T ⊆ V then Span(S) ⊆ Span(T).
There is a different way to generate the span of a set. We can imagine that our initial
definition of span is from the outside in. That is, we are intersecting spaces outside of S.
The second will be from the inside out: it builds the span from within, using the elements
of S. To define it, we introduce some notation.
Definition 1.3.3. If S ⊆ V then v ∈ V is said to be a linear combination of elements of S if there are finitely many elements v1, . . . , vn ∈ S and scalars a1, . . . , an ∈ F such that v = a1v1 + ⋯ + anvn.
Theorem 1.3.4. Let S ⊆ V be nonempty. Then Span(S) is the set of all linear combinations of elements of S.

Proof. Let S̃ be the set of all linear combinations of elements of S. We first prove S̃ ⊆ Span(S), so let a1v1 + ⋯ + anvn ∈ S̃. Each of the vi's is in S and therefore in Span(S). By closure of Span(S) under addition and scalar multiplication, we find a1v1 + ⋯ + anvn ∈ Span(S).

To show that Span(S) ⊆ S̃, it suffices to show that S̃ is a subspace of V; then it is one of the spaces we are intersecting to get Span(S) and we will be done. Because S ≠ ∅ we can find s ∈ S, and then 1s is a linear combination of elements of S, making S̃ nonempty. So let v, w ∈ S̃ and c ∈ F. We can write v = a1v1 + ⋯ + anvn and w = b1w1 + ⋯ + bkwk for vi, wi ∈ S. Then

cv + w = (ca1)v1 + ⋯ + (can)vn + b1w1 + ⋯ + bkwk ∈ S̃ .
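Deciding whether a concrete vector lies in the span of finitely many vectors of F^n is a linear-system (equivalently, rank) computation. The numerical sketch below is only an illustration over R, with arbitrarily chosen vectors; it is not part of the notes' development.

    import numpy as np

    # Is v in Span({s1, s2}) inside R^3?  Adding v to the list must not increase the rank.
    s1 = np.array([1.0, 0.0, 2.0])
    s2 = np.array([0.0, 1.0, 1.0])
    v  = np.array([2.0, 3.0, 7.0])          # equals 2*s1 + 3*s2

    S = np.column_stack([s1, s2])
    in_span = np.linalg.matrix_rank(np.column_stack([S, v])) == np.linalg.matrix_rank(S)
    print(in_span)                           # True

    # When v is in the span, coefficients a with S @ a = v come from least squares.
    a, *_ = np.linalg.lstsq(S, v, rcond=None)
    print(a)                                 # approximately [2. 3.]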
Proof. The first item follows from the second, so we prove the second. Suppose that S2 is linearly independent and that v1, . . . , vn ∈ S1 and a1, . . . , an ∈ F are such that

a1v1 + ⋯ + anvn = ~0 .

Since these vectors are also in S2 and S2 is linearly independent, a1 = ⋯ = an = 0. Thus S1 is linearly independent.
Recall the intuition that a set is linearly independent if each vector in it is truly needed to represent vectors in the span. Not only are they all needed, but linear independence implies that there is exactly one way to represent each vector of the span.
Proposition 1.3.10. Let S ⊆ V be linearly independent. Then for each nonzero vector v ∈ Span(S) there exists exactly one choice of v1, . . . , vn ∈ S and nonzero coefficients a1, . . . , an ∈ F such that

v = a1v1 + ⋯ + anvn .

Proof. Let v ∈ Span(S) be nonzero. By the characterization of the Span as the set of linear combinations of elements of S, there is at least one representation as above. To show it is unique, suppose that v = a1v1 + ⋯ + anvn and v = b1w1 + ⋯ + bkwk and write S1 = {v1, . . . , vn}, S2 = {w1, . . . , wk}. We can rearrange the Si's so that the elements v1 = w1, . . . , vm = wm are the common ones; that is, the ones in S1 ∩ S2. Then

~0 = v − v = Σ_{j=1}^{m} (aj − bj)vj + Σ_{l=m+1}^{n} al vl − Σ_{p=m+1}^{k} bp wp .

Since this is a linear combination of distinct elements of the linearly independent set S, every coefficient is zero: aj = bj for j ≤ m, al = 0 for l > m, and bp = 0 for p > m. Because the ai's and bi's were assumed nonzero, there are no terms beyond the common ones, and the two representations coincide.
1.4 Bases
We are now interested in maximal linearly independent sets. It turns out that these must
generate V as well, and we will work toward proving that.
Definition 1.4.1. Let V be an F-vector space and S ⊆ V. If S generates V and is linearly independent then we call S a basis for V.
Note that the above proposition says that any vector in V has a unique representation
as a linear combination of elements from the basis.
We will soon see that any basis of V must have the same number of elements. To prove that, we need a famous lemma. It says that if we have a linearly independent set T and a spanning set S, we can add #S − #T vectors from S to T to make it spanning.
Theorem 1.4.2 (Steinitz exchange lemma). Let S = {v1, . . . , vm} satisfy Span(S) = V and let T = {w1, . . . , wk} be linearly independent. Then

1. k ≤ m and
2. after possibly reordering the set S, we have

Span({w1, . . . , wk, vk+1, . . . , vm}) = V .
Proof. The proof is by induction on k, the size of T. If k = 0 then T is empty and thus linearly independent. In this case, we do not exchange any elements of T with elements of S and the lemma simply states that 0 ≤ m and Span(S) = V, which is true.

Suppose that for some k ≥ 0 and all linearly independent sets T of size k, the lemma holds; we will prove it holds for k + 1, so let T = {w1, . . . , wk+1} be a linearly independent set of size k + 1. By last lecture, {w1, . . . , wk} is linearly independent and by induction, k ≤ m and we can reorder S so that

Span({w1, . . . , wk, vk+1, . . . , vm}) = V .

Because of this we can find scalars a1, . . . , am such that

a1w1 + ⋯ + akwk + ak+1vk+1 + ⋯ + amvm = wk+1 .    (1)
Definition 1.4.4. A vector space with a basis of size n is called n-dimensional and we write dim(V) = n. If this is true for some n we say the vector space is finite dimensional. Otherwise we say that V is infinite dimensional and write dim(V) = ∞.

Note that {~0} is zero dimensional, since ∅ is a basis for it.
Corollary 1.4.5. Let V be an n-dimensional vector space (n ≥ 1) and S = {v1, . . . , vm}.
1. If m < n then S cannot span V .
2. If m > n then S cannot be linearly independent.
3. If m = n then S is linearly independent if and only if it spans V .
Proof. Let B be a basis for V . Then using Steinitz with B as the linearly independent set
and S as the spanning set, we see that if S spans V then S has at least n elements, proving
the first part. Similarly, using Steinitz with B as the spanning set and S as the linearly
independent set, we get part two.
If m = n and S is linearly independent then Steinitz implies that we can add 0 vectors from B to S to make S span V. This means S itself spans V. Conversely, if S spans V but is not linearly independent, we can find v ∈ S such that v ∈ Span(S \ {v}), so V = Span(S) = Span(S \ {v}). Therefore S \ {v} is a smaller spanning set, contradicting the first part.
Corollary 1.4.6. If W is a subspace of V then dim(W) ≤ dim(V). In particular, if V has a finite basis, so does W.

Proof. If V is infinite dimensional there is nothing to prove, so let B be a finite basis for V of size n. Consider all subsets of W that are linearly independent. By the previous corollary, none of these have more than n elements (they cannot be infinite either, since we could then extract a linearly independent subset of size n + 1). Choose any one with the largest number of elements and call it BW. It must be a basis: the reason is that it is a maximal linearly independent subset of W (this is an exercise on this week's homework). Because it has no more than dim(V) elements, we are done.
Now we have one of two main subspace theorems. It says we can extend a basis for a
subspace to a basis for the full space.
Theorem 1.4.7 (One subspace theorem). Let W be a subspace of a finite-dimensional vector
space V . If BW is a basis for W , there exists a basis B of V containing BW .
Proof. Consider all linearly independent subsets of V that contain BW (there is at least one, BW itself!) and choose one, S, of maximal size. We know that #S ≤ dim V, and if #S = dim V it must be a basis and we are done, so assume that #S = k < dim V. We must then have Span(S) ≠ V, so choose a vector v ∈ V \ Span(S). We claim that S ∪ {v} is linearly independent, contradicting maximality of S. To see this write S = {v1, . . . , vk} and

a1v1 + ⋯ + akvk + bv = ~0 .    (2)
By subtracting the w terms to one side we find that bk+1wk+1 + ⋯ + bmwm ∈ W1. But this sum is already in W2, so it must be in the intersection. As B̃ is a basis for the intersection, we can write

bk+1wk+1 + ⋯ + bmwm = c1v1 + ⋯ + ckvk

for some ci's in F. Subtracting the w's to one side and using linear independence of B2 gives bk+1 = ⋯ = bm = 0. Therefore (2) reads

a1v1 + ⋯ + alvl = ~0 .

Using linear independence of B1 gives ai = 0 for all i and thus B is linearly independent.
The proof of this theorem gives:
Theorem 1.4.9 (Two subspace theorem). If W1 , W2 are subspaces of a finite-dimensional
vector space V , there exists a basis of V that contains bases of W1 and W2 .
Proof. Use the proof of the last theorem to get a basis for W1 + W2 containing bases of W1
and W2 . Then use the one-subspace theorem to extend it to V .
Note the difference from the one subspace theorem. We are not claiming that you can
extend any given bases of W1 and W2 to a basis of V . We are just claiming there exists at
least one basis of V such that part of this basis is a basis for W1 and part is a basis for W2 .
In fact, given bases of W1 and W2 we cannot generally find a basis of V containing these
bases. Take

V = R^3, W1 = {(x, y, 0) : x, y ∈ R}, W2 = {(x, 0, z) : x, z ∈ R} .

If we take bases B1 = {(1, 0, 0), (1, 1, 0)} and B2 = {(1, 0, 1), (0, 0, 1)}, there is no basis of V = R^3 containing both B1 and B2, since V is 3-dimensional.
1.5 Exercises

We will write N = {1, 2, . . .} and Z = {. . . , −1, 0, 1, . . .} for the natural numbers and integers, respectively. Let N̄ = N ∪ {0}. The rationals are Q = {m/n : m, n ∈ Z, n ≠ 0} and R stands for the real numbers.
1. If a, b ∈ N we say that a divides b, written a | b, if there is another natural number c such that b = ac. Fix m, n ∈ N and define

S = {mp + nq : p, q ∈ Z} ∩ N .

(a) Let d be the smallest element of S. Show that d | m and d | n.
Hint. You can use the statement of the division algorithm without proof; that is, if a, b ∈ N then there exist r, s ∈ N̄ such that r < b and a = bs + r.

(b) Show that if e is another element of N that divides both m and n then e | d. This number d is called the greatest common divisor of m and n, written d = gcd(m, n).

(c) For any nonzero integers m, n define gcd(m, n) = gcd(|m|, |n|). Show there exist p, q ∈ Z such that mp + nq = gcd(m, n).
2. Let p be a prime and Zp be the set {0, . . . , p − 1}. Show that Zp is a field using the operations

ab = (ab) mod p and a + b = (a + b) mod p .

Here we have defined a mod p for a ∈ N as the unique r ∈ N̄ with r < p such that a = ps + r for some s ∈ N̄.
3. Let S be a nonempty set and F a field. Let V be the set of functions from S to F and
define addition and scalar multiplication on (V, F) by
(f + g)(s) = f (s) + g(s) and (cf )(s) = c(f (s)) .
Show V is a vector space over F.
(b) Assume now that V has dimension n < ∞. If W has dimension m, show that V/W has dimension n − m.
Hint. Let BW be a basis for W and use the one subspace theorem to extend it to a basis B for V. Show that {v + W : v ∈ B but v ∉ BW} is a basis for V/W.

(c) Let W1 ⊆ W2 ⊆ V be subspaces. Show that

dim W2/W1 + dim V/W2 = dim V/W1 .
9. If W1, . . . , Wk are subspaces of V we write W1 ⊕ ⋯ ⊕ Wk for the sum space W1 + ⋯ + Wk if

Wj ∩ [W1 + ⋯ + Wj−1] = {~0} for all j = 2, . . . , k .

In this case we say that the subspaces W1, . . . , Wk are independent.

(a) For k = 2, this definition is what we gave in class: W1 and W2 are independent if and only if W1 ∩ W2 = {~0}. Give an example to show that for k > 2 this is not true. That is, if W1, . . . , Wk satisfy Wi ∩ Wj = {~0} for all i ≠ j then these spaces need not be independent.

(b) Prove that the following are equivalent.
1. W1, . . . , Wk are independent.
2. Whenever w1 + ⋯ + wk = ~0 with wi ∈ Wi for all i, then wi = ~0 for all i.
3. Whenever Bi is a basis for Wi for all i, the Bi's are disjoint and B := ∪_{i=1}^{k} Bi is a basis for W1 + ⋯ + Wk.
10. Give an example to show that there is no three subspace theorem. That is, if
W1 , W2 , W3 are subspaces of V then there need not exist a basis of V containing a
basis for Wi for all i = 1, 2, 3.
11. Let F be a finite field. Define a sequence (sn) of elements of F by s1 = 1 and sn+1 = sn + 1 for n ∈ N. Last, define the characteristic of F as

char(F) = min{n ∈ N : sn = 0} .

(If the set on the right is empty, we set char(F) = 0.)

(a) Show that because F is finite, its characteristic is a prime number p.
(b) Show that the set {0, s1, . . . , sp−1} with the same addition and multiplication as in F is itself a field, called the prime subfield of F.
(c) Using the fact that F can be viewed as a vector space over its prime subfield, show that F has p^n elements for some n ∈ N.
2 Linear transformations

2.1 Definitions
Definition 2.1.1. Let V and W be vector spaces over the same field F. A function T : V → W is called a linear transformation if

T(v1 + v2) = T(v1) + T(v2) and T(cv1) = cT(v1) for all v1, v2 ∈ V and c ∈ F .

As usual, we only need to check the condition

T(cv1 + v2) = cT(v1) + T(v2) for v1, v2 ∈ V and c ∈ F .
Examples
1. Consider C as a vector space over itself. Then if T : C → C is linear, we can write

T(z) = zT(1)

so T is completely determined by its value at 1.

2. Let V be finite dimensional and B = {v1, . . . , vn} a basis for V. Each v ∈ V can be written uniquely as

v = a1v1 + ⋯ + anvn for ai ∈ F .

So define T : V → F^n by T(v) = (a1, . . . , an). This is called the coordinate map relative to B. It is linear because if v = a1v1 + ⋯ + anvn, w = b1v1 + ⋯ + bnvn and c ∈ F,

cv + w = (ca1 + b1)v1 + ⋯ + (can + bn)vn

is one representation of cv + w in terms of the basis. But this representation is unique, so we get

T(cv + w) = (ca1 + b1, . . . , can + bn) = c(a1, . . . , an) + (b1, . . . , bn) = cT(v) + T(w) .
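For V = F^n, computing the coordinate map relative to a basis B amounts to solving one linear system with the basis vectors as columns. A small numpy sketch over R, with a basis chosen only for illustration:

    import numpy as np

    # Coordinate map relative to the basis B = {(1,1), (1,-1)} of R^2.
    B = np.column_stack([[1.0, 1.0], [1.0, -1.0]])   # basis vectors as columns
    v = np.array([3.0, 1.0])

    coords = np.linalg.solve(B, v)   # [v]_B: the unique a with B @ a = v
    print(coords)                    # [2. 1.]  since v = 2*(1,1) + 1*(1,-1)
    print(B @ coords)                # reconstructs v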
3. Given any m × n matrix A with entries from F (the notation from the homework is A ∈ Mm,n(F)), we can define linear transformations LA : F^n → F^m and RA : F^m → F^n by

LA(v) = A · v and RA(v) = v · A .

Here we are using matrix multiplication and, in the first case, representing v as a column vector. In the second, v is a row vector.
4. In fact, the set of linear transformations from V to W, written L(V, W), forms a vector space! Since the space of functions from V to W is a vector space, it suffices to check that it is a subspace. So given T, U ∈ L(V, W) and c ∈ F, we must show that cT + U is a linear transformation. So let v, w ∈ V and c′ ∈ F:

(cT + U)(c′v + w) = (cT)(c′v + w) + U(c′v + w)
= c(T(c′v + w)) + U(c′v + w)
= c(c′T(v) + T(w)) + c′U(v) + U(w)
= c′(cT(v) + U(v)) + cT(w) + U(w)
= c′(cT + U)(v) + (cT + U)(w) .
Another obvious way to build linear transformations is composition.

Proposition 2.1.2. Let T : V → W and U : W → Z be linear (with all spaces over the same field F). Then the composition U ∘ T is a linear transformation from V to Z.

Proof. Let v1, v2 ∈ V and c ∈ F. Then

(U ∘ T)(cv1 + v2) = U(T(cv1 + v2)) = U(cT(v1) + T(v2)) = cU(T(v1)) + U(T(v2)) = c(U ∘ T)(v1) + (U ∘ T)(v2) .
2.2 Range and nullspace
Next we define two very important subspaces that are related to a linear transformation T.

Definition 2.2.1. Let T : V → W be linear. The nullspace, or kernel, of T is the set N(T) ⊆ V defined by

N(T) = {v ∈ V : T(v) = ~0} .

The range, or image, of T, is the set R(T) ⊆ W defined by

R(T) = {w ∈ W : T(v) = w for some v ∈ V} .

In the definition of N(T) above, ~0 is the zero vector in the space W.
Proposition 2.2.2. Let T : V → W be linear. Then N(T) is a subspace of V and R(T) is a subspace of W.

Proof. First N(T) is nonempty, since each linear transformation must map ~0 to ~0: T(~0) = T(0~0) = 0T(~0) = ~0. If v1, v2 ∈ N(T) and c ∈ F,

T(cv1 + v2) = cT(v1) + T(v2) = c~0 + ~0 = ~0 ,

so cv1 + v2 ∈ N(T), showing that N(T) is a subspace of V. For R(T), it is also nonempty, since ~0 is mapped to by ~0. If w1, w2 ∈ R(T) and c ∈ F, choose v1, v2 ∈ V such that T(v1) = w1 and T(v2) = w2. Then

cw1 + w2 = cT(v1) + T(v2) = T(cv1 + v2) ,

so cw1 + w2 is mapped to by cv1 + v2, a vector in V, and we are done.
In the finite-dimensional case, the dimensions of these spaces are so important they get
their own names: the rank of T is the dimension of R(T ) and the nullity of T is the dimension
of N (T ). The next theorem relates these dimensions to each other.
Theorem 2.2.3 (Rank-nullity). Let T : V → W be linear and dim(V) < ∞. Then

rank(T) + nullity(T) = dim(V) .
Proof. In a way, this theorem is best proved using quotient spaces, and you will do this in the homework. We will prove it the more standard way, by counting and using bases. Let {v1, . . . , vk} be a basis for the nullspace of T and extend it to a basis {v1, . . . , vk, vk+1, . . . , vn} for V. We claim that T(vk+1), . . . , T(vn) are distinct and form a basis for R(T); this will complete the proof. If T(vi) = T(vj) for some i ≠ j in {k + 1, . . . , n}, we then have T(vi − vj) = ~0, implying that vi − vj ∈ N(T). But we have a basis for N(T): we can write

vi − vj = a1v1 + ⋯ + akvk

and, subtracting vi − vj to the other side, we have a linear combination of elements of a basis equal to zero with some nonzero coefficients, a contradiction.

Now we show B = {T(vk+1), . . . , T(vn)} is a basis for R(T). These vectors are clearly contained in the range, so Span(B) ⊆ R(T). Conversely, if w ∈ R(T) we can write w = T(v) for some v ∈ V and, using the basis, find coefficients bi such that

w = T(v) = T(b1v1 + ⋯ + bnvn) .

Expanding the inside, we get b1T(v1) + ⋯ + bnT(vn). The first k vectors are zero, since v1, . . . , vk ∈ N(T), so

w = bk+1T(vk+1) + ⋯ + bnT(vn) ,

proving w ∈ Span(B), and therefore B spans R(T).

For linear independence, let bk+1T(vk+1) + ⋯ + bnT(vn) = ~0. Then

~0 = T(bk+1vk+1 + ⋯ + bnvn) ,

so bk+1vk+1 + ⋯ + bnvn ∈ N(T). As before, we can then write this vector in terms of v1, . . . , vk and use linear independence of {v1, . . . , vn} to get bi = 0 for all i.
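For a matrix map L_A : F^n → F^m the theorem is easy to check numerically: the rank is the dimension of the column space and the nullity is n minus the number of nonzero singular values. A short numpy sketch with an arbitrary example matrix (illustration only):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])              # a 2x3 matrix of rank 1

    rank = np.linalg.matrix_rank(A)              # dim R(L_A)
    _, s, _ = np.linalg.svd(A)
    nullity = A.shape[1] - np.sum(s > 1e-12)     # dim N(L_A)
    print(rank, nullity, rank + nullity)         # 1 2 3 = dim of the domain F^3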
One reason the range and nullspace are important is that they tell us when a transformation is one-to-one (injective) or onto (surjective). Recall these definitions:
Definition 2.2.4. If X and Y are sets and f : X → Y is a function then we say that f is one-to-one (injective) if f maps distinct points to distinct points; that is, if x1, x2 ∈ X with x1 ≠ x2 then f(x1) ≠ f(x2). We say that f is onto (surjective) if each point of Y is mapped to by some x; that is, for each y ∈ Y there exists x ∈ X such that f(x) = y.
Proposition 2.2.5. Let T : V → W be linear. Then

1. T is injective if and only if N(T) = {~0}.
2. T is surjective if and only if R(T) = W.

Proof. The second is just the definition of surjective, so we prove the first. Suppose that T is injective and let v ∈ N(T). Then T(v) = ~0 = T(~0), but because T is injective, v = ~0, proving that N(T) ⊆ {~0}. As N(T) is a subspace, we have {~0} ⊆ N(T), giving equality.

Conversely suppose that N(T) = {~0}; we will prove that T is injective. So assume that T(v1) = T(v2). By linearity, T(v1 − v2) = ~0, so v1 − v2 ∈ N(T). But the only vector in N(T) is the zero vector, so v1 − v2 = ~0, giving v1 = v2, and T is injective.
In the previous proposition, the second part holds for all functions T , regardless of
whether they are linear. The first, however, need not be true if T is not linear. (Think
of an example!)
We can give an alternative characterization of one-to-one and onto:
Proposition 2.2.6. Let T : V W be linear.
1. T is injective if and only if it maps linearly independent sets of V to linearly independent sets of W .
2. T is surjective if and only if it maps spanning sets of V to spanning sets of W .
3. T is bijective if and only if it maps bases of V to bases of W .
Proof. The third part follows from the first two. For the first, assume that T is injective
and let S V be linearly independent. We will show that T (S) = {T (v) : v S} is linearly
independent. So let
a1 T (v1 ) + + an T (vn ) = ~0 .
This implies that T (a1 v1 + + an vn ) = ~0, implying that a1 v1 + + an vn = ~0 by injectivity.
But this is a linear combination of vectors in S, a linearly independent set, giving ai = 0 for
all i. Thus T (S) is linearly independent.
Conversely suppose that T maps linearly independent sets to linearly independent sets and let v ∈ N(T). If v ≠ ~0 then {v} is linearly independent, so {T(v)} is linearly independent. But T(v) = ~0 and {~0} is linearly dependent, a contradiction. Thus v = ~0 and N(T) = {~0}, implying T is injective.
For item two, suppose that T is surjective and let S be a spanning set for V . Then if
w W we can find v V such that T (v) = w and a linear combination of vectors of S equal
to v: v = a1 v1 + + an vn for vi S. Therefore
w = T (v) = a1 T (v1 ) + + an T (vn ) ,
meaning that we have w Span(T (S)), so T (S) spans W . Conversely if T maps spanning
sets to spanning sets, then T (V ) = R(T ) must span W . But since R(T ) is a subspace of W ,
this means R(T ) = W and T is onto.
2.3 Isomorphisms
Proof. It is an exercise to see that any bijection has a well-defined inverse function and that this inverse function is a bijection. (This was done, for example, in the 215 notes in the first chapter.) So we must only show that T^{-1} is linear. To this end, let w1, w2 ∈ W and c ∈ F. Then

T(T^{-1}(cw1 + w2)) = cw1 + w2 ,

whereas

T(cT^{-1}(w1) + T^{-1}(w2)) = cT(T^{-1}(w1)) + T(T^{-1}(w2)) = cw1 + w2 .

Since T is injective, this implies that T^{-1}(cw1 + w2) = cT^{-1}(w1) + T^{-1}(w2).
Using the notion of isomorphism, we can see that any n-dimensional vector space V over F is just F^n.

Theorem 2.3.3. Let V be an n-dimensional vector space over F. Then V is isomorphic to F^n.

Proof. Let B = {v1, . . . , vn} be a basis for V. We will think of B as being ordered. Define the coordinate map TB : V → F^n as before, as follows. Each v ∈ V has a unique representation v = a1v1 + ⋯ + anvn. So set TB(v) = (a1, . . . , an). This was shown before to be a linear transformation, so we must just show it is an isomorphism.

Since the dimension of V is equal to that of F^n, we need only show that TB is onto. Then by the rank-nullity theorem, we will find

dim N(TB) = dim(V) − dim(R(TB)) = dim(V) − dim(F^n) = 0 ,

implying that N(TB) = {~0}, and that TB is one-to-one. So to show onto, let (a1, . . . , an) ∈ F^n. The element v = a1v1 + ⋯ + anvn maps to it:

TB(v) = TB(a1v1 + ⋯ + anvn) = (a1, . . . , an) ,

so TB is an isomorphism.
2.4 Matrices and coordinates
We will now see that, just as V with dimension n looks just like Fn , all linear maps from
V to W look just like matrices with entries from F.
Suppose that T : V W is linear and these are finite dimensional vector spaces with
dimension n and m respectively. Fix B = {v1 , . . . , vn } and C = {w1 , . . . , wm } to be bases of
V and W respectively. We know that T is completely determined by its values on B, and
each of these values lies in W , so we can write
T (v1 ) = a1,1 w1 + + am,1 wm
T (v2 ) = a1,2 w1 + + am,2 wm
and so on, up to
T (vn ) = a1,n w1 + + am,n wm .
Now we take some arbitrary v ∈ V and express it in terms of coordinates using B. This time we write it as a column vector and use the notation [v]_B:

[v]_B = (a1, . . . , an)^t (a column vector), where v = a1v1 + ⋯ + anvn .

Let us compute T(v) and write it in terms of C:

T(v) = a1T(v1) + ⋯ + anT(vn)
= a1(a1,1w1 + ⋯ + am,1wm) + ⋯ + an(a1,nw1 + ⋯ + am,nwm)
= (a1a1,1 + ⋯ + ana1,n)w1 + ⋯ + (a1am,1 + ⋯ + anam,n)wm .

Therefore we can write T(v) in coordinates using C as

[T(v)]_C = (a1a1,1 + ⋯ + ana1,n, . . . , a1am,1 + ⋯ + anam,n)^t = A [v]_B ,

where A is the m × n matrix whose (i, j)-th entry is ai,j; that is, the j-th column of A holds the coefficients of T(vj) relative to C.
Recall that the (i, j)-th entry of a product of matrices M and N is

(MN)_{i,j} = Σ_{k=1}^{n} M_{i,k} N_{k,j} .

Therefore, as A is m × n and [v]_B is n × 1, the matrix A[v]_B is m × 1. Taking v = vi, so that [v]_B = ei (the i-th standard basis vector), its (j, 1)-th coordinate is

(A[v]_B)_{j,1} = Σ_{k=1}^{n} A_{j,k} ([v]_B)_{k,1} = Σ_{k=1}^{n} A_{j,k} (ei)_{k,1} = A_{j,i} .

This means the entries of A[vi]_B are A_{1,i}, A_{2,i}, . . . , A_{m,i}, the i-th column of A. However, this also equals [T(vi)]_C, which is the i-th column of [T]^B_C by construction. Thus A and [T]^B_C have the same columns and are thus equal.
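The rule that the j-th column of [T]^B_C is [T(vj)]_C turns directly into a procedure: apply T to each basis vector and record its coordinates. The sketch below illustrates this with a hypothetical example not taken from the notes, namely differentiation T = d/dx : F2[x] → F1[x] with monomial bases B = {1, x, x^2} and C = {1, x}.

    import numpy as np

    B = [np.poly1d([1]), np.poly1d([1, 0]), np.poly1d([1, 0, 0])]   # 1, x, x^2
    C_dim = 2                                                        # C = {1, x}

    def coords_in_C(p):
        # Coordinates of a polynomial of degree <= 1 relative to C = {1, x}.
        c = np.zeros(C_dim)
        for power, coeff in enumerate(reversed(p.coeffs)):
            c[power] = coeff
        return c

    # Build [T]^B_C column by column: column j holds [T(v_j)]_C.
    M = np.column_stack([coords_in_C(np.polyder(v)) for v in B])
    print(M)
    # [[0. 1. 0.]
    #  [0. 0. 2.]]   since T(1) = 0, T(x) = 1, T(x^2) = 2x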
In fact much more is true. What we have done so far is define a mapping Φ : L(V, W) → Mm,n(F) in the following manner. Given fixed bases B and C of sizes n and m respectively, we set

Φ(T) = [T]^B_C .

This function is actually an isomorphism, meaning that the space of linear transformations is just a relabeling of the space of matrices (after choosing coordinates B and C):

Theorem 2.4.2. Given bases B and C of V and W of sizes n and m, the spaces L(V, W) and Mm,n(F) are isomorphic via the mapping Φ.

Proof. We must show that Φ is a bijection and linear. First off, if Φ(T) = Φ(U) then for all v ∈ V, we have

[T(v)]_C = Φ(T)[v]_B = Φ(U)[v]_B = [U(v)]_C .

But the map sending vectors in W to their coordinates relative to C is also a bijection, so T(v) = U(v). Since this is true for all v, we get T = U, meaning Φ is injective. To show surjective, let A be any m × n matrix with (i, j)-th entry Ai,j. Then we can define a linear transformation T : V → W by its action on the basis B: set

T(vi) = A1,iw1 + ⋯ + Am,iwm .

By the slogan, there is a unique linear transformation satisfying this, and you can then check that [T]^B_C = A, meaning Φ is surjective and therefore a bijection.

To see that Φ is linear, let T, U ∈ L(V, W) and c ∈ F. Then the i-th column of [cT + U]^B_C is simply the coefficients of (cT + U)(vi) expressed relative to the basis C. This coordinate map is linear, so

[(cT + U)(vi)]_C = [cT(vi) + U(vi)]_C = c[T(vi)]_C + [U(vi)]_C ,

which is c times the i-th column of Φ(T) plus the i-th column of Φ(U). Thus

[cT + U]^B_C = c[T]^B_C + [U]^B_C .

Last time we saw that if V and W have dimension n and m and we fix bases B of V and C of W then there is an isomorphism Φ : L(V, W) → Mm,n(F) given by

Φ(T) = [T]^B_C .

A simple corollary of this follows. Because the image of any basis under an isomorphism is again a basis, these spaces have the same dimension:

Corollary 2.4.3. The dimension of L(V, W) is mn, where V has dimension n and W has dimension m. Given bases B of V and C of W, a basis of L(V, W) is given by the set of size mn

{Ti,j : 1 ≤ i ≤ n, 1 ≤ j ≤ m} ,

where Ti,j is the unique linear transformation sending vi to wj and all other elements of B to ~0.
Proof. Since L(V, W) and Mm,n(F) are isomorphic, they have the same dimension, which in the latter case is mn (that was a homework problem). Further, the basis of Mm,n(F) of size mn given by the matrices with a 1 in the (i, j)-th entry and 0 everywhere else maps by Φ^{-1} to a basis for L(V, W), and it is exactly the set listed in the corollary.
We can now give many nice properties of the matrix representation.

1. Let T : V → W and U : W → Z be linear with B, C, D bases for V, W, Z. For any v ∈ V,

[(U ∘ T)v]_D = [U(T(v))]_D = [U]^C_D [T(v)]_C = [U]^C_D [T]^B_C [v]_B .

However [U ∘ T]^B_D is the unique matrix with this property, so we find

[U ∘ T]^B_D = [U]^C_D [T]^B_C .

2. If T : V → W is an isomorphism then, applying the previous item to U = T^{-1}, we get [T^{-1}]^C_B [T]^B_C = [Id_V]^B_B = I, and similarly [T]^B_C [T^{-1}]^C_B = I. In other words, [T]^B_C is an invertible matrix.
Definition 2.4.4. We say that A ∈ Mn,n(F) is invertible if there is a B ∈ Mn,n(F) such that AB = BA = I.

You will show in the homework that if A is invertible, there is exactly one (invertible) B that satisfies AB = BA = I. Therefore we write A^{-1} = B. This gives

([T]^B_C)^{-1} = [T^{-1}]^C_B .

Exercise: if A is an invertible n × n matrix and B is a basis for V then there is an isomorphism T : V → V such that [T]^B_B = A.
We summarize the relation between linear transformations and matrices using the following table. Fix V, W, T : V → W and bases B, C of V, W.

Linear transformations        Matrices
v ∈ V                         the n × 1 column vector [v]_B
w ∈ W                         the m × 1 column vector [w]_C
T                             the m × n matrix [T]^B_C
U ∘ T (composition)           [U]^C_D [T]^B_C (matrix multiplication)
isomorphisms                  invertible matrices
How does the matrix of T change when we change the bases? If B, B′ are bases of V and C, C′ are bases of W, then

[Id_W]^C_{C′} [T]^B_C [Id_V]^{B′}_B = [Id_W ∘ T ∘ Id_V]^{B′}_{C′} = [T]^{B′}_{C′} .

Note that [Id_W]^C_{C′} and [Id_V]^{B′}_B are invertible. Therefore:

[T]^{B′}_{C′} = P [T]^B_C Q

for invertible matrices P = [Id_W]^C_{C′} and Q = [Id_V]^{B′}_B. Conversely, every invertible matrix is a change-of-basis matrix: given an invertible P ∈ Mn,n(F), define vectors v1′, . . . , vn′ ∈ V by letting [vj′]_B be the j-th column of P; these form a basis B′ of V. So [Id_V]^{B′}_B and P have the same columns and are thus equal.

In one case we have a simpler form for P and Q. Suppose that T : V → V is linear and B, B′ are bases for V. Then

[T]^{B′}_{B′} = [Id_V]^B_{B′} [T]^B_B [Id_V]^{B′}_B .

That is, we have [T]^{B′}_{B′} = P^{-1} [T]^B_B P, where P = [Id_V]^{B′}_B is an invertible n × n matrix. This motivates the definition: two matrices A, A′ ∈ Mn,n(F) are called similar if there is an invertible P ∈ Mn,n(F) such that A′ = P^{-1}AP.

One quantity shared by similar matrices is the trace, Tr(A) = Σ_{i=1}^{n} A_{i,i}. Indeed, for any A, B ∈ Mn,n(F),

Tr(AB) = Σ_{i=1}^{n} (AB)_{i,i} = Σ_{i=1}^{n} Σ_{k=1}^{n} A_{i,k} B_{k,i} = Σ_{k=1}^{n} Σ_{i=1}^{n} B_{k,i} A_{i,k} = Σ_{k=1}^{n} (BA)_{k,k} = Tr(BA) ,

so Tr(P^{-1}AP) = Tr(APP^{-1}) = Tr(A).
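These change-of-basis identities are easy to sanity-check numerically: conjugating a matrix by an invertible P leaves the trace (and the determinant) unchanged. A small numpy sketch with randomly chosen example matrices (illustration only):

    import numpy as np

    rng = np.random.default_rng(0)
    T_B = rng.integers(-3, 4, size=(3, 3)).astype(float)    # [T]^B_B in some basis B
    P = rng.integers(-3, 4, size=(3, 3)).astype(float)      # candidate change of basis
    while abs(np.linalg.det(P)) < 1e-9:                      # ensure P is invertible
        P = rng.integers(-3, 4, size=(3, 3)).astype(float)

    T_Bp = np.linalg.inv(P) @ T_B @ P                        # [T]^{B'}_{B'} = P^{-1} [T]^B_B P
    print(np.trace(T_B), np.trace(T_Bp))                     # equal up to rounding
    print(np.linalg.det(T_B), np.linalg.det(T_Bp))           # equal up to rounding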
2.5 Exercises
1. Let T : V → V be linear with dim V < ∞. Show that the following two statements are equivalent.

(A) V = R(T) ⊕ N(T).
(B) N(T) = N(T^2), where T^2 = T ∘ T.
2. Let T : V W be linear with dim(V ) = n and dim(W ) = m.
(a) Prove that if n > m then T cannot be injective.
(b) Prove that if n < m then T cannot be surjective.
(c) Prove that if n = m then T is injective if and only if it is surjective.
3. Let V, W and Z be finite-dimensional vector spaces over F. If T : V → W and U : W → Z are linear, show that

rank(U ∘ T) ≤ min{rank(U), rank(T)} .

Prove also that if either of U or T is invertible, the rank of U ∘ T is equal to the rank of the other one. Deduce that if P : V → V and Q : W → W are isomorphisms then the rank of QTP equals the rank of T.
4. Given an angle θ ∈ [0, 2π), let Tθ : R^2 → R^2 be the function that rotates a vector clockwise about the origin by the angle θ. Find [Tθ]^B_B, where B = {(1, 0), (0, 1)}.
5. Let V and W be finite dimensional vector spaces over F and T : V → W linear. Show there exist ordered bases B of V and C of W such that

([T]^B_C)_{i,j} = 0 if i ≠ j, and ([T]^B_C)_{i,j} = 0 or 1 if i = j.
6. Let F be a field and consider the vector space of polynomials of degree at most n:

F_n[x] = {a_n x^n + ⋯ + a_0 : a_i ∈ F for i = 0, . . . , n} .

(a) Show that B = {1, x, x^2, . . . , x^n} is a basis for this space.
(b) Fix an element b ∈ F and define the evaluation map Tb : F_n[x] → F by Tb(p) = p(b). Show this is linear. Find the range and nullspace of Tb.
(c) Give the representation of Tb in terms of the basis B for F_n[x] and the basis {1} for F.
(d) For distinct b1 , . . . , bn+2 in F show that the functions Tb1 , . . . , Tbn+2 are linearly
dependent in L(Fn [x], F). Deduce that any polynomial p in Fn [x] with at least
n + 1 zeros must have p(x) = 0 for all x F.
7. Here you will give an alternative proof of the rank-nullity theorem. Let T : V → W be linear and suppose that dim(V) < ∞.

(a) Consider the quotient space V/N(T) and define a function T̃ : V/N(T) → W as follows. If C ∈ V/N(T) is some element, we may represent it as v + N(T) for some v ∈ V. Select one such element v and define T̃(C) = T(v). Show that this definition does not depend on the choice of v, so long as v + N(T) = C; that is, that T̃ as defined is a (well-defined) function.
(b) Prove that T̃ is an isomorphism from V/N(T) to R(T). (This is a version of the first isomorphism theorem when it is proved for groups.)
(c) Deduce the rank-nullity theorem.
8. Let A ∈ Mn,n(F) be invertible and B be a basis for an n-dimensional F-vector space V. Show there is an isomorphism T : V → V such that [T]^B_B = A.
9. (a) Let A ∈ Mn,n(F) be invertible. Show that the inverse matrix is unique.
(b) Let V be an n-dimensional F-vector space and T : V → V and U : V → V be linear maps that satisfy

(U ∘ T)(v) = v for all v ∈ V .

Show that (T ∘ U)(v) = v for all v ∈ V.
(c) Let A, B ∈ Mn,n(F) satisfy AB = I. Show that BA = I.
10. If A ∈ Mm,n(F) we define the column rank of A as the dimension of the span of the n different columns of A in F^m. Similarly, we define the row rank of A as the dimension of the span of the m different rows of A in F^n.

(a) Show that the column rank of A is equal to the rank of the linear transformation LA : F^n → F^m defined by LA(v) = A · v, matrix multiplication of the column vector v by the matrix A on the left.
(b) Use exercise 7 on the previous homework to show that if P ∈ Mn,n(F) and Q ∈ Mm,m(F) are both invertible then the column rank of A equals the column rank of QAP.
(c) Show that the row rank of A is equal to the rank of the linear transformation RA : F^m → F^n defined by RA(v) = v · A, viewing v as a row vector and multiplying by A on the right.
(d) Show that if P ∈ Mn,n(F) and Q ∈ Mm,m(F) are both invertible then the row rank of A equals the row rank of QAP.
(e) Use exercise 9 on the previous homework and parts (a)-(d) above to show that the row rank of A equals the column rank of A.
11. Given m ∈ R define the line

Lm = {(x, y) ∈ R^2 : y = mx} .

(a) Let Tm be the function which maps a point in R^2 to its closest point in Lm. Find the matrix of Tm relative to the standard basis.
(b) Let Rm be the function which maps a point in R^2 to the reflection of this point about the line Lm. Find the matrix of Rm relative to the standard basis.

Hint for both. First find the matrix relative to a carefully chosen basis.
3 Dual spaces

3.1 Definitions
We have been talking about coordinates, so let's examine them more closely. Let V be an n-dimensional vector space and fix a basis B = {v1, . . . , vn} of V. We can write any vector v ∈ V in coordinates relative to B as

[v]_B = (a1, . . . , an)^t, where v = a1v1 + ⋯ + anvn .

For any i = 1, . . . , n we can define the i-th coordinate map vi∗ : V → F by vi∗(v) = ai, where ai is the i-th entry of [v]_B. These elements vi∗ are linear and are thus in the space L(V, F). This space comes up so much we give it a name:

Definition 3.1.1. We write V∗ = L(V, F) and call it the dual space to V. Elements of V∗ will be written f and called linear functionals.

Given any basis B = {v1, . . . , vn} we call B∗ = {v1∗, . . . , vn∗} the basis of V∗ dual to B.
Proposition 3.1.2. If B is a basis of V then B∗ is a basis of V∗.

Proof. The dimension of V∗ is n, the dimension of V, so we must show B∗ is either linearly independent or spanning. We show linearly independent: suppose that

a1v1∗ + ⋯ + anvn∗ = ~0 ,

where ~0 on the right is the zero transformation from V to F. Apply both sides to vi. For j ≠ i we get vj∗(vi) = 0, since the j-th coordinate of vi is 0. For j = i we get vi∗(vi) = 1, so

ai = (a1v1∗ + ⋯ + anvn∗)(vi) = ~0(vi) = 0 .

This is true for all i, so B∗ is linearly independent and we are done.
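For V = F^n with basis vectors written as the columns of an invertible matrix, the dual basis has a very concrete form: vi∗ is the functional that takes the i-th coordinate relative to B, i.e. the i-th row of the inverse of the basis matrix. A short numpy sketch over R with an arbitrarily chosen basis (illustration only):

    import numpy as np

    B = np.column_stack([[2.0, 1.0], [1.0, 1.0]])   # basis of R^2 as columns
    B_inv = np.linalg.inv(B)                         # rows of B_inv are v_1*, v_2*

    v = np.array([5.0, 3.0])
    coords = B_inv @ v          # (v_1*(v), v_2*(v)) = coordinates of v relative to B
    print(coords)
    print(B @ coords)           # reconstructs v
    print(np.round(B_inv @ B))  # duality relation v_i*(v_j) = delta_ij: identity matrix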
It is not surprising that B∗ is a basis of V∗. The reason is that each element f ∈ V∗ can be written in its matrix form using the basis B of V and {1} of F. Then the matrix for vi∗ is

[vi∗]^B_{{1}} = (0 ⋯ 0 1 0 ⋯ 0) ,

where the 1 is in the i-th spot. Clearly these form a basis for M1,n(F) and, since the map sending linear transformations to their matrices relative to these bases is an isomorphism, B∗ should be a basis of V∗.

There is an alternate characterization: each vi∗ is in L(V, F) so can be identified by its action on the basis B:

vi∗(vj) = 1 if i = j and 0 otherwise.
One nice thing about considering the dual basis B∗ is that we can write an arbitrary f ∈ V∗ in terms of the basis B∗ quite easily.
Proposition 3.1.3. Let B be a basis for V and B∗ the dual basis for V∗. Then if f ∈ V∗,

f = f(v1)v1∗ + ⋯ + f(vn)vn∗ .

Proof. We simply need to check that both sides give the same answer when evaluated at the basis of V. So apply each to vi: the left side gives f(vi) and the right gives

(f(v1)v1∗ + ⋯ + f(vn)vn∗)(vi) = f(v1)v1∗(vi) + ⋯ + f(vn)vn∗(vi) = f(vi)vi∗(vi) = f(vi) .
A nice way to think about linear functionals involves their nullspaces. By the rank-nullity theorem, if f ∈ V∗,

dim(N(f)) + dim(R(f)) = dim(V) .

Because R(f) ⊆ F, it is at most one-dimensional. Therefore N(f) = V or N(f) is (n − 1)-dimensional, where n = dim(V). This gives:

If f is not the zero functional, nullity(f) = dim(V) − 1. A subspace of this dimension is called a hyperspace.
Because of the simple structure of the nullspace, we can characterize linear functionals easily.

Proposition 3.1.4. Two nonzero elements f, g ∈ V∗ are equal if and only if they have the same nullspace N = N(f) = N(g) and they agree at one vector outside N.

Proof. One direction is clear, so suppose that N = N(f) = N(g) and v ∈ V \ N satisfies f(v) = g(v). You can check that if BN is a basis for N then BN ∪ {v} is a basis for V. (The proof is similar to how we proved the one-subspace theorem.) But then f and g agree on BN ∪ {v} and must agree everywhere, giving f = g.
3.2 Annihilators
As we have seen before, one useful tool for the study of linear transformations is the nullspace. We will consider the dual version of this now: given S ⊆ V, we give a name to those f ∈ V∗ such that S ⊆ N(f).

Definition 3.2.1. If S ⊆ V then the annihilator of S is

S° = {f ∈ V∗ : f(s) = 0 for all s ∈ S} .

Note that if S ⊆ T then T° ⊆ S°.

Proposition 3.2.2. Let S ⊆ V (not necessarily a subspace).

1. S° is a subspace of V∗.
2. S° = (Span(S))°.
Span({vk+1∗, . . . , vn∗}) .
3.3 Transpose
The matrix of T^t can be written in a very simple way using dual bases.

Theorem 3.3.3. Let T : V → W be linear and B, C bases for V and W. Writing B∗ and C∗ for the dual bases,

[T^t]^{C∗}_{B∗} = ([T]^B_C)^t .

The matrix on the right is the transpose matrix; that is, if A is a matrix then the transposed matrix A^t is defined by (A^t)_{i,j} = A_{j,i}.

Proof. Let B = {v1, . . . , vn} and C = {w1, . . . , wm}. When we build the matrix [T]^B_C, we make the j-th column by expanding T(vj) in terms of the basis C. So our matrix can be rewritten entrywise as

([T]^B_C)_{i,j} = wi∗(T(vj)) .

On the other hand, the first column of [T^t]^{C∗}_{B∗} is obtained by expanding T^t(w1∗) = w1∗ ∘ T in terms of the dual basis B∗; by Proposition 3.1.3 its entries are

w1∗(T(v1)), . . . , w1∗(T(vn)) .

This is just the first row of [T]^B_C. Similarly, the j-th column of [T^t]^{C∗}_{B∗} is the j-th row of [T]^B_C, and this completes the proof.
Theorem. Let T : V → W be linear, with V and W finite dimensional. Then

1. N(T^t) = R(T)°,
2. R(T^t) = N(T)°,
3. rank(T^t) = rank(T) (and hence, when dim V = dim W, nullity(T^t) = nullity(T)).

Proof. For the first item, let g ∈ R(T)°. Then we would like to show that g ∈ N(T^t), or that T^t(g) = 0. Since T^t(g) ∈ V∗ this amounts to showing that (T^t(g))(v) = 0 for all v ∈ V. So let v ∈ V and compute

(T^t(g))(v) = g(T(v)) = 0 ,

since g annihilates the range of T. This shows that R(T)° ⊆ N(T^t). For the other direction, let g ∈ N(T^t) and w ∈ R(T). Then we can find v ∈ V such that w = T(v) and so

g(w) = g(T(v)) = (T^t(g))(v) = 0 ,

since T^t(g) = 0. This completes the proof of the first item.

Next, if f ∈ R(T^t) we can find g ∈ W∗ such that f = T^t(g). If v ∈ N(T) then

f(v) = (T^t(g))(v) = g(T(v)) = g(~0) = 0 ,

so f ∈ N(T)°. This shows that R(T^t) ⊆ N(T)°. To show the other direction, we count dimensions:

dim R(T^t) = dim W∗ − dim N(T^t) = dim W − dim R(T)° = dim R(T) = dim V − dim N(T) = dim N(T)° .

Since these spaces have the same dimension and one is contained in the other, they must be equal.

The last item follows from dimension counting as well.
3.4 Double dual
We now move one level up, to look at the dual of the dual, the double dual.

Definition 3.4.1. If V is a vector space, we define the double dual V∗∗ as the dual of V∗. It is the space L(V∗, F) of linear functionals on V∗.

As before, when V is finite-dimensional, since dim L(V∗, F) = dim(V∗) · dim(F), we find dim V∗∗ = dim V when dim V < ∞.

Exercise. Show this is true even if dim V = ∞.

There are some simple elements of V∗∗, the evaluation maps. Given v ∈ V we set

eval_v : V∗ → F by eval_v(f) = f(v) .
eval_v is a linear functional on V∗. To see this, note first that it certainly maps V∗ to F, so we must only show it is linear. The proof is the same as in the previous homework: let f, g ∈ V∗ and c ∈ F. Then

eval_v(cf + g) = (cf + g)(v) = cf(v) + g(v) = c eval_v(f) + eval_v(g) .

In fact the map Ψ : V → V∗∗ given by Ψ(v) = eval_v is an isomorphism when dim V < ∞. It is called the natural isomorphism. We first show it is linear, so let v1, v2 ∈ V and c ∈ F. If f ∈ V∗ then

(Ψ(cv1 + v2))(f) = eval_{cv1+v2}(f) = f(cv1 + v2) = cf(v1) + f(v2)
= c eval_{v1}(f) + eval_{v2}(f) = c(Ψ(v1))(f) + (Ψ(v2))(f)
= (cΨ(v1) + Ψ(v2))(f) .

Since this is true for all f, we get Ψ(cv1 + v2) = cΨ(v1) + Ψ(v2).

Now to prove that Ψ is an isomorphism we only need to show it is injective (since V and V∗∗ have the same dimension). So assume that Ψ(v) = 0 (the zero element of V∗∗). Then for all f ∈ V∗, we have f(v) = eval_v(f) = 0. We now use a lemma to finish the proof.

Lemma 3.4.2. If v ∈ V is nonzero, there exists f ∈ V∗ such that f(v) ≠ 0.

Proof. Let v ∈ V be nonzero and extend it to a basis B for V. Then the element v∗ in the dual basis B∗ has v∗(v) = 1. So set f = v∗.

Because f(v) = 0 for all f ∈ V∗, the lemma says v = ~0. Therefore N(Ψ) = {~0} and Ψ is injective, implying it is an isomorphism.
When we looked at the dual V∗ and constructed the dual basis B∗ for a basis B, the dual element v∗ of an element v ∈ B actually depended on the initial choice of basis B. This is because to define v∗, we must express a vector in terms of coordinates using the entire basis B, and then take the coefficient of v. The identification of V with V∗∗ via the isomorphism Ψ, however, does not depend on the choice of basis. Some people get extremely excited about this independence of basis, apparently. There are relations, however, between the mapping Ψ and the concepts that we developed earlier about V∗.
Theorem 3.4.3. Let V be finite dimensional and B a basis of V. Then Ψ(B) = (B∗)∗.

Proof. Write v1, . . . , vn for the elements of B. The elements v1∗, . . . , vn∗ of B∗ are characterized by vi∗(vj) = 0 when i ≠ j and 1 if i = j. Similarly, the elements v1∗∗, . . . , vn∗∗ of (B∗)∗ are characterized by vi∗∗(vj∗) = 0 when i ≠ j and 1 otherwise. But the elements of Ψ(B) also have this property:

(Ψ(vi))(vj∗) = eval_{vi}(vj∗) = vj∗(vi) = 0 when i ≠ j and 1 otherwise .

Therefore Ψ(vi) = vi∗∗ and we are done.

The interesting part of the previous theorem is that the mapping of B to B∗ depends on the choice of B, whereas Ψ does not. Therefore when we take the dual basis twice, mapping B to B∗ and then B∗ to (B∗)∗, the dependence on B disappears.
Theorem 3.4.4. If W ⊆ V is a subspace and dim V < ∞ then Ψ(W) = (W°)°.

Proof. Let {v1, . . . , vk} be a basis for W such that {v1, . . . , vn} is a basis for V. From the
3.5 Exercises

S° = (Span(S))° .
4 Determinants

4.1 Permutations
These cycles are not disjoint, so we can make them so. Start with 1 and feed it into
the right side. The first factor maps 1 to 2, so 1 exits from the left side of the rightmost
factor as a 2. It enters the middle factor as a 2, and exits as a 3, entering the leftmost
factor to leave unchanged (since 3 does not appear in the leftmost factor). This gives
us
(13
We begin again with 3, feeding it into the rightmost factor. It maps to 4, then stays
unchanged, then maps to 5, so we get
(135
continuing, 5 maps to 6 and to 4:
(1354
4 maps to 5 and back to 1, so this closes a cycle.
(1354) .
We start again with the next unused letter, 2. It maps to 3 and then back to 2, so it
is unchanged and we omit it. The last letter is 6, which maps to 1 and back to 6. So
we get
= (1354) .
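The bookkeeping in this example (feed each letter through the factors from right to left, then start a new cycle at the next unused letter) is exactly an algorithm. A small Python sketch; the product of cycles used here is chosen only for illustration and is not the one from the lecture.

    def compose(*perms):
        # Compose permutations of {1,...,n} given as dicts, applying the rightmost first.
        n = max(max(p) for p in perms)
        result = {}
        for x in range(1, n + 1):
            y = x
            for p in reversed(perms):      # rightmost factor acts first
                y = p.get(y, y)
            result[x] = y
        return result

    def cycle(*elts):
        # The cyclic permutation (a1 a2 ... ak) as a dict.
        return {a: elts[(i + 1) % len(elts)] for i, a in enumerate(elts)}

    def disjoint_cycles(perm):
        # Disjoint cycle decomposition, omitting fixed points.
        seen, cycles = set(), []
        for start in sorted(perm):
            if start in seen:
                continue
            c, x = [], start
            while x not in seen:
                seen.add(x)
                c.append(x)
                x = perm[x]
            if len(c) > 1:
                cycles.append(tuple(c))
        return cycles

    sigma = compose(cycle(2, 3), cycle(3, 4, 1), cycle(1, 2))
    print(disjoint_cycles(sigma))    # [(1, 3, 4)]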
The symmetric group is, in fact, a group.

Definition 4.1.2. A set G with a binary operation (that is, a function · : G × G → G) is called a group if the following hold:

1. there is an element e ∈ G such that eg = ge = g for all g ∈ G,
2. for all g ∈ G there is an inverse element g^{-1} ∈ G such that gg^{-1} = g^{-1}g = e and
3. for all g, h, k ∈ G, we have (gh)k = g(hk).

A group G is called abelian (or commutative) if gh = hg for all g, h ∈ G. For n ≥ 3 the group Sn is non-abelian.
We will look at the simplest permutations, the transpositions:

Definition 4.1.3. An element σ ∈ Sn is called a transposition if it can be written

σ = (i j) for some i, j ∈ {1, . . . , n} with i ≠ j .

Every permutation can be written as a product of transpositions (but they will not necessarily be disjoint!). This can be seen because it can be written in cycle notation, and then we can decompose each cycle into a product of transpositions. Indeed, if (a1 ⋯ ak) is a cycle then you can verify that

(a1 ⋯ ak) = (a1 ak)(a1 ak−1) ⋯ (a1 a2) .
The main theorem we want to prove is:
so {a, b} is an inversion pair for τσ. Otherwise, if exactly one of a, b is equal to σ^{-1}(k) or σ^{-1}(k + 1), then let us suppose that a < b (else we can just switch the roles of a and b). Then because {a, b} is an inversion pair for σ we have σ(b) < σ(a), so if a = σ^{-1}(k), we must have σ(b) < k = σ(a), so (τσ)(b) = σ(b) < σ(a) < k + 1 = (τσ)(a), so {a, b} is still an inversion pair for τσ. If instead a = σ^{-1}(k + 1) we cannot have b = σ^{-1}(k), so σ(b) < k, giving (τσ)(b) = σ(b) < k = (τσ)(a), and {a, b} is an inversion pair for τσ. Last, if σ(a) ∉ {k, k + 1} we must have σ(b) ∈ {k, k + 1}, and therefore if σ(b) = k, then σ(a) > k + 1, giving (τσ)(b) = k + 1 < σ(a) = (τσ)(a), so {a, b} is an inversion pair for τσ. If σ(b) = k + 1 then σ(a) > k + 1 and (τσ)(b) = k < k + 1 = σ(b) < σ(a) = (τσ)(a), so {a, b} is an inversion pair for τσ. This completes the proof.
4.2 Determinants: existence and uniqueness
Given n vectors ~v1 , . . . , ~vn in Rn we want to define something like the volume of the parallelepiped spanned by these vectors. What properties would we expect of a volume?
1. vol(e1 , . . . , en ) = 1.
2. If two of the vectors ~vi are equal the volume should be zero.
3. For each c > 0, vol(c~v1 , ~v2 , . . . , ~vn ) = c vol(~v1 , . . . , ~vn ). Same in other arguments.
4. For each ~v1′, vol(~v1 + ~v1′, ~v2, . . . , ~vn) = vol(~v1, . . . , ~vn) + vol(~v1′, ~v2, . . . , ~vn). Same in other arguments.
Using the motivating example of the volume, we define a multilinear function as follows.

Definition 4.2.1. If V is an n-dimensional vector space over F then define

V^n = {(v1, . . . , vn) : vi ∈ V for all i = 1, . . . , n} .

A function f : V^n → F is called multilinear if for each i and vectors v1, . . . , vi−1, vi+1, . . . , vn ∈ V, the function fi : V → F is linear, where

fi(v) = f(v1, . . . , vi−1, v, vi+1, . . . , vn) .

A multilinear function f is called alternating if f(v1, . . . , vn) = 0 whenever vi = vj for some i ≠ j.

Proposition 4.2.2. Let f : V^n → F be a multilinear function. If F does not have characteristic two, then f is alternating if and only if for all v1, . . . , vn and i < j,

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn) .
f(v1, . . . , vn) = Σ_{i1=1}^{n} a_{i1,1} f(e_{i1}, v2, . . . , vn) = ⋯ = Σ_{i1=1}^{n} ⋯ Σ_{in=1}^{n} a_{i1,1} ⋯ a_{in,n} f(e_{i1}, . . . , e_{in}) .

Since f is alternating, all choices of i1, . . . , in that are not distinct have f(e_{i1}, . . . , e_{in}) = 0. So we can write this as

Σ_{i1, . . . , in distinct} a_{i1,1} ⋯ a_{in,n} f(e_{i1}, . . . , e_{in}) .

Each choice of distinct i1, . . . , in corresponds to a permutation σ ∈ Sn with σ(k) = ik. Using the lemma from last time, f(e_{σ(1)}, . . . , e_{σ(n)}) = sgn(σ) f(e1, . . . , en) = sgn(σ), so

f(v1, . . . , vn) = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} ⋯ a_{σ(n),n} .

If g is any other n-linear alternating function with g(e1, . . . , en) = 1 then the same computation as above gives the same formula for g(v1, . . . , vn), so f = g. This shows uniqueness.
For existence, we need to show that the formula above actually gives an n-linear alternating function with f(e1, . . . , en) = 1.

1. We first show f(e1, . . . , en) = 1. To do this, we write

ek = a_{1,k}e1 + ⋯ + a_{n,k}en ,

where a_{k,j} = 0 unless j = k, in which case it is 1. If σ ∈ Sn is not the identity, we can find k ≠ j such that σ(k) = j. This means that a_{σ(k),k} = a_{j,k} = 0 and so sgn(σ) a_{σ(1),1} ⋯ a_{σ(n),n} = 0. Therefore

f(e1, . . . , en) = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} ⋯ a_{σ(n),n} = sgn(id) a_{1,1} ⋯ a_{n,n} = 1 .

2. Next we show alternating. Suppose that vi = vj for some i ≠ j and let τ_{i,j} be the transposition (i j). Split all permutations into A, those which invert i and j, and Sn \ A, those which do not. Then if vk = a_{1,k}e1 + ⋯ + a_{n,k}en, we can write

f(v1, . . . , vn) = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} ⋯ a_{σ(n),n} = Σ_{σ ∈ A} sgn(σ) a_{σ(1),1} ⋯ a_{σ(n),n} + Σ_{σ ∈ Sn \ A} sgn(σ) a_{σ(1),1} ⋯ a_{σ(n),n} ,

and σ ↦ στ_{i,j} is a sign-reversing bijection between A and Sn \ A. Note however that a_{στ_{i,j}(i),i} = a_{σ(j),i} = a_{σ(j),j}, since vi = vj. Similarly a_{στ_{i,j}(j),j} = a_{σ(i),i}. Therefore

a_{σ(1),1} ⋯ a_{σ(n),n} = a_{στ_{i,j}(1),1} ⋯ a_{στ_{i,j}(n),n} .

So the terms of the two sums cancel in pairs, the above sum is zero, and we are done.

3. For n-linearity, we will just show it in the first coordinate. So let v, v1, . . . , vn ∈ V and c ∈ F. Writing

vk = a_{1,k}e1 + ⋯ + a_{n,k}en and v = a1e1 + ⋯ + anen ,

then cv + v1 = (ca1 + a_{1,1})e1 + ⋯ + (can + a_{n,1})en, so

f(cv + v1, v2, . . . , vn) = Σ_{σ ∈ Sn} sgn(σ)(ca_{σ(1)} + a_{σ(1),1}) a_{σ(2),2} ⋯ a_{σ(n),n}
= c Σ_{σ ∈ Sn} sgn(σ) a_{σ(1)} a_{σ(2),2} ⋯ a_{σ(n),n} + Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} a_{σ(2),2} ⋯ a_{σ(n),n}
= c f(v, v2, . . . , vn) + f(v1, . . . , vn) .
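The uniqueness argument gives an explicit formula, det A = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} ⋯ a_{σ(n),n}, which can be evaluated directly for small n. A Python sketch (exponential in n, so purely illustrative):

    from itertools import permutations

    def sign(perm):
        # Sign of a permutation given as a tuple of 0-based images, via inversion count.
        inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                         if perm[i] > perm[j])
        return -1 if inversions % 2 else 1

    def det(A):
        # det A = sum over sigma of sgn(sigma) * prod_i A[sigma(i)][i].
        n = len(A)
        total = 0
        for sigma in permutations(range(n)):
            prod = 1
            for i in range(n):
                prod *= A[sigma[i]][i]
            total += sign(sigma) * prod
        return total

    print(det([[1, 2], [3, 4]]))                      # -2
    print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))     # 24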
One nice property of n-linear alternating functions is that they can determine when vectors are linearly independent.

Theorem 4.2.5. Let V be an n-dimensional F-vector space and f a nonzero n-linear alternating function on V. Then {v1, . . . , vn} is linearly independent if and only if f(v1, . . . , vn) ≠ 0.

Proof. If n = 1 then the proof is an exercise, so take n ≥ 2 and first assume that the vectors are linearly dependent. Then we can write one as a linear combination of the others. Suppose for example that v1 = b2v2 + ⋯ + bnvn. Then

f(v1, . . . , vn) = b2 f(v2, v2, . . . , vn) + ⋯ + bn f(vn, v2, . . . , vn) = 0 .

Here we have used that f is alternating.

Conversely suppose that {v1, . . . , vn} is linearly independent. Then it must be a basis. We can then proceed exactly along the development given above and, if u1, . . . , un are vectors written as

uk = a_{1,k}v1 + ⋯ + a_{n,k}vn ,

then if f(v1, . . . , vn) = 0, we find

f(u1, . . . , un) = Σ_{σ ∈ Sn} sgn(σ) a_{σ(1),1} ⋯ a_{σ(n),n} f(v1, . . . , vn) = 0 .

This holds for all u1, . . . , un, so f is the zero function, a contradiction.
4.3 Properties of the determinant

One of the most important properties of the determinant is that it factors through products (compositions).

Theorem 4.3.1. Let A, B ∈ Mn,n(F). We have the following factorization:

det AB = det A det B .
Proof. First, if det A = 0 the matrix A cannot be invertible and therefore neither is AB, so det AB = 0, proving the formula in that case. Otherwise we have det A ≠ 0. In this case we will use a method of proof that is very common when dealing with determinants. We will define a function on matrices that is n-linear and alternating as a function of the columns, mapping the identity to 1, and use the uniqueness of the determinant to see that it is just the determinant function. So define f : Mn,n(F) → F by

f(B) = det AB / det A .

First note that if In is the n × n identity matrix, f(In) = (det AIn)/det A = 1. Next, if B has two equal columns, its column rank is strictly less than n and so is the column rank of AB, meaning that AB is non-invertible. This gives f(B) = 0/det A = 0.

Last, to show n-linearity of f as a function of the columns of B, write B in terms of its columns as (~b1, . . . , ~bn). Note that if ei is the i-th standard basis vector, then we can write ~bi = Bei. Therefore the i-th column of AB is (AB)ei = A~bi and AB = (A~b1, . . . , A~bn). Thus if ~b1, ~b1′ are column vectors and c ∈ F,

det(A(c~b1 + ~b1′, ~b2, . . . , ~bn)) = det(A(c~b1 + ~b1′), A~b2, . . . , A~bn)
= det(cA~b1 + A~b1′, A~b2, . . . , A~bn)
= c det(A~b1, A~b2, . . . , A~bn) + det(A~b1′, A~b2, . . . , A~bn) .

This means that det AB is n-linear as a function of the columns of B (at least in the first column; the same argument works for all columns), and so is f.

There is exactly one n-linear alternating function f with f(In) = 1, so f(B) = det B; that is, det AB = det A det B.
Here are some consequences.

Similar matrices have the same determinant.

Proof.

det(P^{-1}AP) = det P^{-1} det A det P = det A det P^{-1} det P = det A det In = det A .

If A is invertible then det A ≠ 0 and

det(A^{-1}) = 1/det A .
The definition of the determinant is probably different than what you may have seen
before. So now we will relate the definition to the Laplace (cofactor) expansion.
Definition 4.3.2. Given $A\in M_{n,n}(\mathbb F)$, the $(i,j)$-th minor of $A$, written $A(i|j)$, is the $(n-1)\times(n-1)$ matrix formed by removing the $i$-th row and the $j$-th column from $A$.
The Laplace expansion is a recursive formula for the determinant. We can write det A
in terms of the determinant of smaller matrices, the minors of A.
Theorem 4.3.3. Let $A\in M_{n,n}(\mathbb F)$ with entries $(a_{i,j})$. Then
$$\det A = \sum_{i=1}^n (-1)^{i+1}a_{i,1}\det A(i|1)\,.$$
Proof. Write the columns of $A$ as $\vec a_1,\ldots,\vec a_n$ with $\vec a_1 = a_{1,1}e_1 + \cdots + a_{n,1}e_n$ (where $e_1,\ldots,e_n$ are the standard basis vectors) and use $n$-linearity on the matrix $A = (\vec a_1,\ldots,\vec a_n)$ to get
$$\det A = \det(a_{1,1}e_1,\vec a_2,\ldots,\vec a_n) + \cdots + \det(a_{n,1}e_n,\vec a_2,\ldots,\vec a_n) = \sum_{i=1}^n a_{i,1}\det(e_i,\vec a_2,\ldots,\vec a_n)\,.$$
Now we must only show that $\det(e_i,\vec a_2,\ldots,\vec a_n) = (-1)^{i-1}\det A(i|1)$.
We will need to use two facts from the homework.
1. For any matrix $B$, $\det B = \det B^t$. As a consequence of this, $\det$ is $n$-linear and alternating when viewed as a function of the rows of $B$.
2. If $B$ is any block upper triangular matrix; that is, of the form
$$B = \begin{pmatrix} C & D\\ 0 & E\end{pmatrix}$$
for square matrices $C$ and $E$, then $\det B = \det C\det E$.
So now use $i-1$ adjacent row swaps to get
$$\begin{pmatrix}
1 & a_{i,2} & a_{i,3} & \cdots & a_{i,n}\\
0 & a_{1,2} & a_{1,3} & \cdots & a_{1,n}\\
\vdots & \vdots & \vdots & & \vdots\\
0 & a_{i-1,2} & a_{i-1,3} & \cdots & a_{i-1,n}\\
0 & a_{i+1,2} & a_{i+1,3} & \cdots & a_{i+1,n}\\
\vdots & \vdots & \vdots & & \vdots\\
0 & a_{n,2} & a_{n,3} & \cdots & a_{n,n}
\end{pmatrix}.$$
Since we applied $i-1$ transpositions, the determinant of this matrix equals $(-1)^{i-1}$ times $\det(e_i,\vec a_2,\ldots,\vec a_n)$. Now we apply the block upper-triangular result, noting that this matrix is of the form
$$\begin{pmatrix} 1 & D\\ 0 & A(i|1)\end{pmatrix}.$$
Therefore $\det(e_i,\vec a_2,\ldots,\vec a_n) = (-1)^{i-1}\det A(i|1)$ and we are done.
There is a more general version of this result. The above we call expanding along the first column. We can expand along the $j$-th column by first applying $j-1$ adjacent column swaps to get
$$\det A = \sum_{i=1}^n (-1)^{i+j}a_{i,j}\det A(i|j)\,.$$
By taking the transpose initially we can expand along any row too.
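The expansion along the first column translates directly into a short recursive routine. The following Python sketch is our illustration (the helper names `minor` and `det` are ours); it mirrors the formula exactly and is meant only as an illustration, since it is exponentially slow.

```python
# Determinant by Laplace (cofactor) expansion along the first column:
# det(A) = sum_i (-1)^{i+1} a_{i,1} det(A(i|1)), here with 0-based indexing.
def minor(A, i, j):
    """Return A with row i and column j removed."""
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))

# Example: this 3x3 matrix has determinant 0.
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0
```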
4.4
Exercises
$$M_{i,j} = \begin{cases}
A_{i,j} & 1\le i\le k,\ 1\le j\le k\\
B_{i,j-k} & 1\le i\le k,\ k<j\le n\\
0 & k<i\le n,\ 1\le j\le k\\
C_{i-k,j-k} & k<i\le n,\ k<j\le n
\end{cases}$$
$$\begin{pmatrix}
1 & a_0 & a_0^2 & \cdots & a_0^n\\
1 & a_1 & a_1^2 & \cdots & a_1^n\\
\vdots & & & & \vdots\\
1 & a_n & a_n^2 & \cdots & a_n^n
\end{pmatrix}$$

$$\frac{f(T(v_1),\ldots,T(v_n))}{f(v_1,\ldots,v_n)}$$

$$\sum_{i=1}^n A_{i,j}C_{i,j}\,, \qquad \sum_{i=1}^n A_{i,k}C_{i,j} = 0\,.$$
1 2 3
1 2 4
1 0 0
1 3 9 ,
0 1 1
1 4 16
6 0 1
4
0
.
1
1
14. Consider a system of equations in n variables with coefficients from a field F. We can
write this as AX = Y for an nn matrix A, an n1 matrix X (with entries x1 , . . . , xn )
and an n 1 matrix Y (with entries y1 , . . . , yn ). Given the matrices A and Y we would
like to solve for X.
(a) Show that
$$(\det A)\,x_j = \sum_{i=1}^n (-1)^{i+j}y_i\det A(i|j)\,.$$
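Part (a) is Cramer's rule: the sum on the right is the cofactor expansion, along the $j$-th column, of the determinant of $A$ with that column replaced by $Y$. A small numpy sketch of this restatement (our illustration, with an arbitrary invertible example) is:

```python
# Cramer's rule: x_j = det(A_j) / det(A), where A_j is A with column j replaced by Y.
import numpy as np

def cramer_solve(A, Y):
    A = np.asarray(A, dtype=float)
    Y = np.asarray(Y, dtype=float).ravel()
    d = np.linalg.det(A)
    x = np.empty(len(Y))
    for j in range(len(Y)):
        Aj = A.copy()
        Aj[:, j] = Y              # replace the j-th column by Y
        x[j] = np.linalg.det(Aj) / d
    return x

A = [[2, -1, 1], [0, 2, -1], [-1, 1, 0]]
Y = [3, 1, 1]
print(cramer_solve(A, Y), np.linalg.solve(np.asarray(A, float), Y))  # same answer
```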
$$\begin{aligned}
2x - y + z &= 3\\
2y - z &= 1\\
y - x &= 1
\end{aligned}
\qquad\qquad
\begin{aligned}
2x - y + z - 2t &= -5\\
2x + 2y - 3z + t &= -1\\
x + y - z &= -1\\
4x - 3y + 2z - 3t &= -8
\end{aligned}$$

4.5 Exercises on polynomials
1. Let $\mathbb F$ be a field and write $\mathbb F[x]$ for the set of polynomials with coefficients in $\mathbb F$. Define $\deg(p)$ for the degree of $p\in\mathbb F[x]$: the largest $k$ such that the coefficient of $x^k$ in $p$ is nonzero. The degree of the zero polynomial is defined to be $-\infty$.
(a) Show that for p, q F[x], the product pq has degree deg(pq) = deg(p) + deg(q).
(b) Show that for p, d F[x] such that d is nonzero, there exist q, r F[x] such that
p = qd + r and deg(r) < deg(d). (This result is called the division algorithm.)
Hint. We may assume that $\deg(p)\geq 0$, for otherwise we can choose $r = q = 0$. Also we can assume $\deg(d)\leq\deg(p)$, or else we choose $q = 0$ and $r = p$. So use induction on $\deg(p)$, starting with $\deg(p) = 0$, meaning that $p(x) = c$ for some nonzero $c\in\mathbb F$. For the inductive step, if $\deg(p) > 0$, find some $q_1\in\mathbb F[x]$ such that $\deg(p - q_1d) < \deg(p)$ and continue.
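The hint's inductive step (cancel the leading term and recurse) is essentially long division of polynomials. Below is a minimal Python sketch of it, an illustration we add here; coefficients are stored lowest degree first and we work over $\mathbb Q$ via `fractions.Fraction`.

```python
# Division algorithm in F[x] (illustrative): repeatedly cancel the leading term of p.
from fractions import Fraction

def poly_divmod(p, d):
    """Divide p by d (coefficient lists, lowest degree first); return (q, r)."""
    p, d = [Fraction(c) for c in p], [Fraction(c) for c in d]
    q = [Fraction(0)] * max(len(p) - len(d) + 1, 1)
    r = p[:]
    while len(r) >= len(d) and any(r):
        shift = len(r) - len(d)
        coeff = r[-1] / d[-1]          # leading coefficient of r over that of d
        q[shift] = coeff
        for i, c in enumerate(d):      # subtract coeff * x^shift * d (the q_1 step)
            r[i + shift] -= coeff * c
        while r and r[-1] == 0:
            r.pop()
    return q, r

# Example: x^3 + 1 divided by x + 1 gives q = x^2 - x + 1, r = 0.
print(poly_divmod([1, 0, 0, 1], [1, 1]))
```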
2. Show that if $p\in\mathbb F[x]$ and $c\in\mathbb F$ then $p(c) = 0$ if and only if the polynomial $x - c$ divides $p$ (that is, we can find $d\in\mathbb F[x]$ such that $(x-c)d = p$).
3. Let p, q F[x] be nonzero and define the subset S of F[x] as
S = {ap + bq : a, b F[x]} .
(a) Let d S be nonzero of minimal degree. Show that d divides both p and q (see
the definition of divides in exercise 2).
(b) Show that if s F[x] divides both p and q then s divides d.
(c) Conclude that there exists a unique monic polynomial (that is, with leading coefficient 1) d F[x] satisfying:
i. d divides both p and q and
ii. if s F[x] divides both p and q then s divides d.
(This d is called the greatest common divisor of p and q.)
4. A field $\mathbb F$ is called algebraically closed if every $p\in\mathbb F[x]$ with $\deg(p)\geq 1$ has a zero in $\mathbb F$. Prove that if $\mathbb F$ is algebraically closed then for any nonzero $p\in\mathbb F[x]$, we can find $a,\lambda_1,\ldots,\lambda_k\in\mathbb F$ and natural numbers $n_1,\ldots,n_k$ with $n_1+\cdots+n_k = \deg(p)$ such that
$$p(x) = a(x-\lambda_1)^{n_1}\cdots(x-\lambda_k)^{n_k}\,.$$
Here we say that $\lambda_1,\ldots,\lambda_k$ are the roots of $p$ and $n_1,\ldots,n_k$ are their multiplicities.
Hint. Use induction on the degree of p.
5. Let F be algebraically closed. Show that for nonzero p, q F[x], the greatest common
divisor of p and q is 1 if and only if p and q have no common root. Is this true for
F = R?
5 Eigenvalues

5.1 Diagonalizability
Our goal for most of the rest of the semester is to classify all linear transformations $T: V\to V$ when $\dim V < \infty$. How can we possibly do this? There are so many transformations. Well, let's start with the simplest matrices.
Definition 5.1.1. $A\in M_{n,n}(\mathbb F)$ is called a diagonal matrix if $a_{i,j} = 0$ when $i\neq j$.
Notice that if $A$ is a diagonal matrix, then $A$ acts very simply on the standard basis. Precisely, if $A$ is diagonal with entries $\lambda_1,\ldots,\lambda_n$, then
$$Ae_i = \lambda_ie_i\,.$$
For the next couple of lectures we will try to determine exactly when a linear transformation has a diagonal matrix representation (for some basis). This motivates the following definition.
Definition 5.1.2. If $T: V\to V$ is linear then a nonzero vector $v\in V$ is called an eigenvector for $T$ with associated eigenvalue $\lambda$ if $T(v) = \lambda v$. If there is a basis for $V$ consisting of eigenvectors for $T$ then we say $T$ is diagonalizable.
Proposition 5.1.3. Let $T: V\to V$ be linear. Then $T$ is diagonalizable if and only if there is a basis $B$ of $V$ such that $[T]_B^B$ is diagonal.
Proof. If $[T]_B^B$ is diagonal with entries $\lambda_1,\ldots,\lambda_n$, then writing $B = \{v_1,\ldots,v_n\}$, we have
$$[T(v_i)]_B = [T]_B^B[v_i]_B = [T]_B^Be_i = \lambda_ie_i = [\lambda_iv_i]_B\,.$$
coefficients. We may assume that there are at least two nonzero coefficients, or else we would have $a_1v_1 = \vec 0$, and since $v_1\neq\vec 0$ we would have $a_1 = 0$, meaning all coefficients are zero and $\{v_1,\ldots,v_k\}$ is linearly independent.
So apply $T$ to both sides:
$$a_1T(v_1) + \cdots + a_kT(v_k) = \vec 0\,.$$
Since these are eigenvectors, we can rewrite this as
$$a_1\lambda_1v_1 + \cdots + a_k\lambda_kv_k = \vec 0\,.$$
However, multiplying the original linear combination by $\lambda_1$ we get
$$a_1\lambda_1v_1 + \cdots + a_k\lambda_1v_k = \vec 0\,.$$
Subtracting these two,
$$a_2(\lambda_1-\lambda_2)v_2 + \cdots + a_k(\lambda_1-\lambda_k)v_k = \vec 0\,.$$
All $\lambda_i$'s were distinct and all $a_i$'s were nonzero, so this is a linear combination of the $v_i$'s equal to zero with fewer nonzero coefficients than in the original one, a contradiction.
For a matrix $A\in M_{n,n}(\mathbb F)$, we define its eigenvalues and eigenvectors similarly: $\lambda$ is an eigenvalue of $A$ if there is a nonzero $v\in\mathbb F^n$ such that $Av = \lambda v$.
To find the eigenvalues, we make the following observation. $\lambda$ is an eigenvalue for $A$ if and only if there exists a nonzero $v$ such that $(\lambda I - A)(v) = \vec 0$. This is true if and only if $\lambda I - A$ is not invertible. Therefore
$$\lambda\text{ is an eigenvalue of }A \iff (\lambda I - A)\text{ not invertible} \iff \det(\lambda I - A) = 0\,.$$
This leads us to define
Definition 5.1.5. The characteristic polynomial of a matrix $A\in M_{n,n}(\mathbb F)$ is the function $c_A: \mathbb F\to\mathbb F$ given by
$$c_A(x) = \det(xI - A)\,.$$
The definition is similar for a linear transformation. The characteristic polynomial of $T: V\to V$ is $c_T(x) = \det[xI - T]_B^B$, where $B$ is any finite basis of $V$. (You will show on homework that this definition does not depend on the choice of basis.)
Facts about the characteristic polynomial.
1. cA is a monic polynomial of degree n.
Proof. We simply write out the definition of the determinant, using the notation that $A_{i,j}(x)$ is the $(i,j)$-th entry of $xI - A$:
$$c_A(x) = \det(xI - A) = \sum_{\sigma\in S_n}\mathrm{sgn}(\sigma)\,A_{\sigma(1),1}(x)\cdots A_{\sigma(n),n}(x)\,.$$
Each term in this sum is a product of $n$ polynomials, each of degree at most 1 (it is either a field element or a polynomial of the form $x - a$ for some $a$). So each term is a polynomial of degree at most $n$, implying the same for $c_A$. The only term of degree $n$ corresponds to choosing all the diagonal elements of the matrix; this is the identity permutation. This term is $(x-a_{1,1})\cdots(x-a_{n,n})$ and the coefficient of $x^n$ is 1.
2. If we look for terms of degree $n-1$ in $c_A(x)$, we note that in the determinant expansion, each permutation $\sigma$ that is not the identity must have at least two numbers $k\neq j$ from 1 to $n$ such that $\sigma(k)\neq k$ and $\sigma(j)\neq j$. Therefore all non-identity permutations give terms with degree at most $n-2$. So the only term contributing to degree $n-1$ is the identity. Thus the coefficient of $c_A(x)$ of degree $n-1$ is the same as that of
$$(x-a_{1,1})\cdots(x-a_{n,n}) = x^n - x^{n-1}[a_{1,1} + \cdots + a_{n,n}] + \cdots = x^n - x^{n-1}\operatorname{Tr}(A) + \cdots$$
This means the coefficient of $x^{n-1}$ is $-\operatorname{Tr}(A)$.
3. The coefficient of degree zero (the last term of $c_A$) is $(-1)^n\det A$. This follows by just plugging in $x = 0$. Thus
$$c_A(x) = x^n - x^{n-1}\operatorname{Tr}(A) + \cdots + (-1)^n\det A\,.$$
4. The field $\mathbb F$ is called algebraically closed if every $p\in\mathbb F[x]$ with degree at least 1 has a zero in $\mathbb F$. (For example, $\mathbb C$ is algebraically closed.) You will show on homework that in this case, each polynomial can be factored into factors of the form
$$p(x) = a(x-r_1)^{n_1}\cdots(x-r_k)^{n_k}\,,$$
for $a, r_1,\ldots,r_k\in\mathbb F$ and natural numbers $n_1,\ldots,n_k$. The $r_i$'s are the roots of $p$ and the $n_i$'s are their multiplicities.
So if $\mathbb F$ is algebraically closed, we can factor
$$c_A(x) = (x-\lambda_1)^{n_1}\cdots(x-\lambda_k)^{n_k}$$
with $n_1 + \cdots + n_k = n$. Note the leading coefficient $a$ here is 1 since $c_A$ is monic. Expanding this, we see that if $\mathbb F$ is algebraically closed, then
$$\operatorname{Tr}(A) = \sum_{i=1}^k n_i\lambda_i \quad\text{and}\quad \det A = \prod_{i=1}^k \lambda_i^{n_i}\,.$$
Thus the trace is the sum of eigenvalues (repeated according to multiplicity) and the determinant is the product of eigenvalues (again repeated according to multiplicity).
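These facts can be checked numerically. The sketch below is ours (the example matrix is arbitrary); it uses `numpy.poly`, which returns the coefficients of the monic characteristic polynomial of a matrix, together with `numpy.linalg.eigvals`.

```python
# Characteristic polynomial coefficients, trace = sum of eigenvalues,
# det = product of eigenvalues (over C, with multiplicity).
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, -1.0],
              [1.0, 0.0, 3.0]])

coeffs = np.poly(A)                 # [1, -Tr(A), ..., (-1)^n det(A)]
eigs = np.linalg.eigvals(A)

print(np.isclose(coeffs[1], -np.trace(A)))                     # coefficient of x^{n-1}
print(np.isclose(coeffs[-1], (-1) ** 3 * np.linalg.det(A)))    # constant term
print(np.isclose(np.trace(A), eigs.sum()))
print(np.isclose(np.linalg.det(A), np.prod(eigs)))
```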
5.2 Eigenspaces
Proof. Assume that $T$ is diagonalizable. Then we can find a basis $B$ for $V$ consisting of eigenvectors for $T$. Each of these vectors is associated with a particular eigenvalue, so write $\lambda_1,\ldots,\lambda_k$ for the distinct ones. We can then group together the elements of $B$ associated with $\lambda_i$, span them, and call the resulting subspace $E_i$. It follows then that
$$E_1\oplus\cdots\oplus E_k = E_1 + \cdots + E_k = \operatorname{Span} B = V\,.$$
So 1 implies 2.
Now assume that 2 holds. Then build a basis $B_i$ of size $n_i$ of each $E_i$. Since the $\lambda_i$'s are eigenvalues, the $B_i$'s consist of eigenvectors for eigenvalue $\lambda_i$ and $n_i\geq 1$ for all $i$. Since we assumed item 2, the set $B = \cup_{i=1}^k B_i$ is a basis for $V$ and $[T]_B^B$ is a diagonal matrix with distinct entries $\lambda_1,\ldots,\lambda_k$, with $\lambda_i$ repeated $n_i$ times. By computing the characteristic polynomial $c_T$ through this matrix, we find item 3 holds. This proves 2 implies 3.
Last, if 3 holds, then $n_1 + \cdots + n_k = n$, because $c_T$ has degree $n$. Therefore
$$\dim E_1 + \cdots + \dim E_k = n\,.$$
Because each $\lambda_i$ is a root of $c_T$, it is an eigenvalue and therefore has an eigenvector. This means each $E_i$ has dimension at least 1. Let $B_i$ be a basis of $E_i$. As the $\lambda_i$'s are distinct, the $E_i$'s are independent and so
$$E_1 + \cdots + E_k = E_1\oplus\cdots\oplus E_k\,.$$
This means $B = \cup_{i=1}^k B_i$ is a basis for the sum and therefore is linearly independent. Since it has size $n$, it is a basis for $V$. But each vector of $B$ is an eigenvector for $T$ so $T$ is diagonalizable.
Let's finish by giving an example. We would like to decide whether the following matrix is diagonalizable. Let $A$ be the real matrix
$$A = \begin{pmatrix} 1 & -1 & 1\\ 1 & 1 & 1\\ 0 & 0 & 1\end{pmatrix}.$$
Then
$$c_A(x) = \det\begin{pmatrix} x-1 & 1 & -1\\ -1 & x-1 & -1\\ 0 & 0 & x-1\end{pmatrix} = (x-1)\det\begin{pmatrix} x-1 & 1\\ -1 & x-1\end{pmatrix}.$$
Here we have used the formula for the determinant of a block upper-triangular matrix. So it equals
$$(x-1)\left((x-1)^2 + 1\right) = (x-1)(x^2 - 2x + 2)\,.$$
The last factor does not have any roots in $\mathbb R$. Therefore we cannot write $c_A$ in the form $(x-\lambda_1)^{n_1}\cdots(x-\lambda_k)^{n_k}$ and $A$ is not diagonalizable.
On the other hand, if we consider $A$ as a complex matrix, it is diagonalizable. This is because the characteristic polynomial factors as
$$(x-1)(x-a)(x-\bar a)\,,$$
where $a = 1+i$ (where $i = \sqrt{-1}$) and $\bar a$ is the complex conjugate of $a$. Since $A$ has 3 distinct eigenvalues and the dimension of $\mathbb C^3$ is 3, the matrix is diagonalizable.
If, however, the characteristic polynomial were
$$c_A(x) = (x-1)(x-i)^2\,,$$
then we would have to investigate further. For instance, this would be the case if we had
$$A = \begin{pmatrix} 1 & 0 & 0\\ 0 & i & 1\\ 0 & 0 & i\end{pmatrix}.$$
In this case if we write $c_A(x) = (x-\lambda_1)^{n_1}(x-\lambda_2)^{n_2}$, we have $\lambda_1 = 1$, $\lambda_2 = i$ and $n_1 = 1$, $n_2 = 2$. If $A$ were diagonalizable, then we would need (from the previous theorem) $\dim E_1 = n_1 = 1$ and $\dim E_i = n_2 = 2$. However, note that
$$iI - A = \begin{pmatrix} i-1 & 0 & 0\\ 0 & 0 & -1\\ 0 & 0 & 0\end{pmatrix},$$
which has nullity 1, not 2. This means that $\dim E_i = 1$ and $A$ is not diagonalizable.
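Both examples can be confirmed numerically. The sketch below is ours; it uses the matrices as reconstructed above, together with numpy's eigenvalue and rank routines.

```python
# Numerical check of the two diagonalizability examples.
import numpy as np

# Real matrix with eigenvalues 1, 1+i, 1-i: not diagonalizable over R,
# but three distinct complex eigenvalues, hence diagonalizable over C.
A = np.array([[1.0, -1.0, 1.0],
              [1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
print(np.linalg.eigvals(A))          # approximately [1+1j, 1-1j, 1]

# Complex matrix with c(x) = (x-1)(x-i)^2: the eigenspace for i has
# dimension 3 - rank(iI - B) = 1 < 2, so B is not diagonalizable.
B = np.array([[1, 0, 0],
              [0, 1j, 1],
              [0, 0, 1j]])
print(3 - np.linalg.matrix_rank(1j * np.eye(3) - B))   # 1
```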
5.3 Exercises
1. Let V be a vector space and W1 , . . . , Wk be subspaces. Show that the Wi s are independent if and only if for each v W1 + +Wk , there exist unique w1 W1 , . . . , wk Wk
such that v = w1 + + wk .
2. In this problem we will show that if F is algebraically closed then any linear T : V V
can be represented as an upper triangular matrix. This is a simpler result than (and
is implied by) the Jordan Canonical form, which we will cover in class soon.
We will argue by (strong) induction on the dimension of V . Clearly the result holds for
dim V = 1. So suppose that for some k 1 whenever dim W k and U : W W is
linear, we can find a basis of W relative to which the matrix of U is upper-triangular.
Further, let V be a vector space of dimension k + 1 over F and T : V V be linear.
(a) Let be an eigenvalue of T . Show that the dimension of R := R(T I) is
strictly less than dim V and that R is T -invariant.
(b) Apply the inductive hypothesis to T |R (the operator T restricted to R) to find a
basis of R with respect to which T |R is upper-triangular. Extend this to a basis
for V and complete the argument.
3. Let A be the matrix
6 3 2
A = 4 1 2 .
10 5 3
1 1
1
1 1 1 .
1
1 1
6 Jordan form
For a diagonalizable transformation, we have a very nice form. However it is not always
true that a transformation is diagonalizable. For example, it may be that the roots of the
characteristic polynomials are not in the field (as in the example above). Even if the roots
are in the field, they may not have multiplicities equal to the dimensions of the eigenspaces.
So we look for a more general form. Instead of looking for a diagonal form, we look for a block diagonal form. That is, we want to write
$$A = \begin{pmatrix} B & 0 & 0\\ 0 & C & 0\\ 0 & 0 & D\end{pmatrix},$$
where $B$, $C$ and $D$ are square blocks.

6.1 Primary decomposition theorem
dimension theorem.) So assume that $v$ is in the intersection, meaning that there exists $w\in V$ such that $(\lambda_1I - T)^{k_1}(w) = v$ and $(\lambda_1I - T)^{k_1}(v) = \vec 0$. But then we get $(\lambda_1I - T)^{2k_1}(w) = \vec 0$ and therefore $w\in E_1$. This means actually
$$\vec 0 = (\lambda_1I - T)^{k_1}(w) = v\,.$$
We now set $W_1 = R\left((\lambda_1I - T)^{k_1}\right)$ and we are done with this step.
Step 2. $T$-invariance of the direct sum. We saw above that $E_1$ is $T$-invariant. We claim that $W_1$ is as well, so that the direct sum is $T$-invariant, and we have obtained our first block.
If $v\in W_1 = R\left((\lambda_1I - T)^{k_1}\right)$ then there exists $w\in V$ such that $(\lambda_1I - T)^{k_1}(w) = v$. Then
$$T(v) = T\left((\lambda_1I - T)^{k_1}(w)\right) = (\lambda_1I - T)^{k_1}(T(w))$$
because $T$ and $\lambda_1I - T$ commute. Therefore $T(v)\in W_1$ and $W_1$ is $T$-invariant.
To conclude this step, we define $T_1$ to be $T$ restricted to $W_1$. That is, we view $W_1$ as a vector space of its own and $T_1: W_1\to W_1$ as a linear transformation defined by $T_1(w) = T(w)$ for $w\in W_1$.
Step 3. $E_2,\ldots,E_k$ are in $W_1$. We now show that if $\lambda_1,\ldots,\lambda_k$ are the distinct eigenvalues of $T$ then $E_2,\ldots,E_k$ are contained in $W_1$.
So first let $v\in E_j$ for some $j = 2,\ldots,k$. By definition of the generalized eigenspace we can find $t$ such that $v\in N\left((\lambda_jI - T)^t\right)$. We will now use a lemma that follows from the homework.
Lemma 6.1.4. There exist polynomials $p, q\in\mathbb F[x]$ such that
$$(\lambda_j - x)^tp + (\lambda_1 - x)^{k_1}q = 1\,.$$
Proof. Since $(\lambda_j - x)^t$ and $(\lambda_1 - x)^{k_1}$ do not have a common root, you proved in the homework that their greatest common divisor is 1. Then the lemma follows from the result on the homework: if $r, s\in\mathbb F[x]$ have greatest common divisor $d$ then there exist $p, q\in\mathbb F[x]$ such that $rp + sq = d$.
We will use the lemma but in its transformation form. For any polynomial $a\in\mathbb F[x]$ of the form $a(x) = a_nx^n + \cdots + a_0$ we define
$$a(T) = a_nT^n + \cdots + a_1T + a_0I\,.$$
Therefore
$$I = (\lambda_jI - T)^tp(T) + (\lambda_1I - T)^{k_1}q(T)\,.$$
Applying this to $v$, we get
$$v = (\lambda_jI - T)^tp(T)(v) + (\lambda_1I - T)^{k_1}q(T)(v)\,.$$
But all these polynomial transformations commute, so we can use $v\in N\left((\lambda_jI - T)^t\right)$ to find
$$v = (\lambda_1I - T)^{k_1}(q(T)(v)) \in R\left((\lambda_1I - T)^{k_1}\right) = W_1\,.$$
6.2 Nilpotent operators
Now that we have done the primary decomposition of V into generalized eigenspaces, we will
do a secondary decomposition, and break each space into smaller subspaces, each spanned by
a chain basis. To do this we notice that if we consider $\lambda_iI - T$ on the generalized eigenspace $E_i$, then it is nilpotent.
Definition 6.2.1. A linear $U: V\to V$ is called nilpotent if there is some $k\geq 1$ such that $U^k = 0$. The minimal such $k$ is called the degree of $U$.
We will consider a nilpotent $U: V\to V$ and find a nice matrix representation for it. We will relate this basis back to $T$ later. The nice representation will come from chains.
Definition 6.2.2. A set $\{v, U(v), U^2(v),\ldots,U^l(v)\}$ is called a chain of length $l$ for $U$ if $U^s(v)\neq\vec 0$ for $s\leq l$ but $U^{l+1}(v) = \vec 0$.
Theorem 6.2.3 (Structure theorem for nilpotent operators). Let $U: V\to V$ be nilpotent and $\dim V < \infty$. There exists a basis for $V$ consisting of chains for $U$.
Our main tool to prove the theorem will be linear independence mod a subspace. Recall
that if W is a subspace of V then v1 , . . . , vk are said to be linearly independent mod W if
whenever
a1 v1 + + ak vk W ,
it follows that v1 , . . . , vk W . We will give some important lemmas about this concept.
Many of these can be seen as statements we have derived at the beginning of the semester,
but in the setting of quotient spaces, specifically in V /W . For example, the following is
analogous to the one-subspace (basis extension) theorem:
Proposition 6.2.4. Let $W_1\subseteq W_2$ be subspaces of $V$. If $\dim W_2 - \dim W_1 = m$ and $v_1,\ldots,v_l\in W_2$ are linearly independent mod $W_1$, we can find $m - l$ vectors $v_{l+1},\ldots,v_m\in W_2\setminus W_1$ such that $\{v_1,\ldots,v_m\}$ is linearly independent mod $W_1$.
Proof. You showed in homework that $\{v_1,\ldots,v_l\}$ is linearly independent mod $W_1$ if and only if $\{v_1 + W_1,\ldots,v_l + W_1\}$ is linearly independent in $W_2/W_1$. Further, you showed that the dimension of $W_2/W_1$ is $\dim W_2 - \dim W_1$. So we can use the one-subspace theorem in $W_2/W_1$ to extend $\{v_1 + W_1,\ldots,v_l + W_1\}$ to a basis of $W_2/W_1$, adding elements $C_{l+1},\ldots,C_m\in W_2/W_1$. Each of these elements can be written as $v + W_1$ for some $v$, so we obtain a set
$$\{v_1 + W_1,\ldots,v_l + W_1, v_{l+1} + W_1,\ldots,v_m + W_1\}$$
which is a basis of $W_2/W_1$. By the equivalence above, $\{v_1,\ldots,v_m\}$ is then linearly independent mod $W_1$.
Another similar statement is:
Proposition 6.2.5. Let $W_1\subseteq W_2$ be subspaces of $V$. If $\dim W_2 - \dim W_1 = m$ and $\{v_1,\ldots,v_l\}\subseteq W_2$ is linearly independent mod $W_1$ then $l\leq m$.
Proof. Since $\{v_1,\ldots,v_l\}$ is linearly independent mod $W_1$, $\{v_1 + W_1,\ldots,v_l + W_1\}$ is linearly independent in $W_2/W_1$. This space has dimension $m$, so by Steinitz, $l\leq m$.
Now let's specialize to the case of subspaces associated to a nilpotent operator. Given a nilpotent $U: V\to V$ of degree $k$ with $V$ finite-dimensional, we construct the subspaces
$$N_0 = \{\vec 0\},\ N_1 = N(U),\ \ldots,\ N_{k-1} = N(U^{k-1}),\ N_k = V\,.$$
Note that
$$N_0\subseteq N_1\subseteq\cdots\subseteq N_k \quad\text{and}\quad N_{k-1}\neq V\,.$$
We will prove a couple of properties about this tower of subspaces.
1. If $v\in N_j\setminus N_{j-1}$ for $j = 2,\ldots,k$ then $U(v)\in N_{j-1}\setminus N_{j-2}$.
Proof. If $v\in N_j\setminus N_{j-1}$ then $U^j(v) = \vec 0$ but $U^{j-1}(v)\neq\vec 0$. Thus
$$U^{j-1}(U(v)) = \vec 0 \quad\text{but}\quad U^{j-2}(U(v))\neq\vec 0\,,$$
meaning $U(v)\in N_{j-1}\setminus N_{j-2}$.
To prove that, note that $B_{k-1}$ is linearly independent (and a subset of $N_{k-1}$) and $\{v_1,\ldots,v_{d_k}\}$ is linearly independent mod $N_{k-1}$. Thus if we have a linear combination
$$a_1v_1 + \cdots + a_{d_k}v_{d_k} + \sum_{v\in B_{k-1}} b_vv = \vec 0\,,$$
then $a_1v_1 + \cdots + a_{d_k}v_{d_k}\in N_{k-1}$ and linear independence mod $N_{k-1}$ gives $a_1 = \cdots = a_{d_k} = 0$. Thus we have $\sum_{v\in B_{k-1}} b_vv = \vec 0$ and linear independence of $B_{k-1}$ gives $b_v = 0$ for all $v$.
Next we prove uniqueness of the nilpotent representation.
Theorem 6.2.6 (Uniqueness). Let U : V V be nilpotent and dim V < . If B is a basis
of V consisting of chains for U , write
li (B) = # of (maximal) chains of length i in B .
Then if B, B 0 are bases of V consisting of chains for U , li (B) = li (B 0 ) for all i.
Proof. Write k for the degree of U . Let ni (B) be the number of elements of B that are in
Ni \ Ni1 . Since each element of B is in one of these sets exactly (it is a basis of chains), we
have
n1 (B) + + nk (B) = dim V .
On the other hand, since the elements of $B$ in $N_i\setminus N_{i-1}$ are linearly independent and outside of $N_{i-1}$, they are actually linearly independent mod $N_{i-1}$ (easy exercise). Therefore
$$n_i(B) \leq \dim N_i - \dim N_{i-1} =: m_i \quad\text{for all } i\,.$$
Since also $m_1 + \cdots + m_k = \dim V$, we must have $n_i(B) = m_i$ for all $i$. However $n_i(B)$ is equal to the number of (maximal) chains of length at least $i$ in $B$, so these numbers must be the same in $B$ and $B'$. Last,
$$l_i(B) = n_i(B) - n_{i+1}(B) = n_i(B') - n_{i+1}(B') = l_i(B')\,.$$
6.3 Existence and uniqueness of Jordan form, Cayley-Hamilton
$$J_l = \begin{pmatrix}
\lambda & 1 & & & \\
& \lambda & 1 & & \\
& & \ddots & \ddots & \\
& & & \lambda & 1\\
& & & & \lambda
\end{pmatrix}$$
Theorem 6.3.2 (Jordan canonical form). Let $T: V\to V$ be linear with $\dim V < \infty$ and $\mathbb F$ algebraically closed. Then there is a basis of $V$ such that $[T]$ is block diagonal with Jordan blocks.
Proof. First decompose $V = E_1\oplus\cdots\oplus E_k$. On each $E_i$, the operator $T - \lambda_iI$ is nilpotent. Each chain for $(T - \lambda_iI)|_{E_i}$ gives a block in the nilpotent decomposition. Then $T = (T - \lambda_iI) + \lambda_iI$ gives a Jordan block.
We can now see the entire decomposition. We first decompose
$$V = E_1\oplus\cdots\oplus E_k$$
and then
$$E_i = C_1^i\oplus\cdots\oplus C_{k_i}^i\,,$$
where each $C_j^i$ is the span of a chain of generalized eigenvectors: $C_j^i = \operatorname{Span}\{v_1,\ldots,v_p\}$, where
$$T(v_1) = \lambda_iv_1,\quad T(v_2) = \lambda_iv_2 + v_1,\quad\ldots,\quad T(v_p) = \lambda_iv_p + v_{p-1}\,.$$
Note that chains of generalized eigenvectors are not mapped to each other in the same way that chains for nilpotent operators are. This is because we have to add the scalar operator $\lambda I$.
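For experimentation, a computer algebra system can produce this decomposition directly. The sketch below is our illustration (the matrix is an arbitrary example, not one from the notes); it uses sympy's `jordan_form`, which returns $P$ and $J$ with $A = PJP^{-1}$.

```python
# Jordan canonical form via sympy (illustration only).
import sympy as sp

A = sp.Matrix([[5, 4, 2, 1],
               [0, 1, -1, -1],
               [-1, -1, 3, 0],
               [1, 1, -1, 2]])
P, J = A.jordan_form()
print(J)                                  # Jordan blocks along the diagonal
print(sp.simplify(P * J * P.inv() - A))   # zero matrix: A = P J P^{-1}
```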
To build up to the Cayley-Hamilton theorem, we need a couple of lemmas.
Lemma 6.3.3. If $U: V\to V$ is linear and nilpotent with $\dim V = n < \infty$ then
$$U^n = 0\,.$$
Therefore if $T: V\to V$ is linear and $v\in E_\lambda$ then
$$(T - \lambda I)^{\dim E_\lambda}v = \vec 0\,.$$
Proof. Let $B$ be a basis of chains for $U$. Then the length of the longest chain is at most $n$.
Lemma 6.3.4. If $T: V\to V$ is linear with $\dim V < \infty$ and $\beta$ is a basis such that $[T]_\beta$ is in Jordan form, then for each eigenvalue $\lambda$, let $S_\lambda$ be the basis vectors corresponding to blocks for $\lambda$. Then
$$\operatorname{Span}(S_\lambda) = E_\lambda \quad\text{for each }\lambda\,.$$
Therefore if
$$c(x) = \prod_{i=1}^k(\lambda_i - x)^{n_i}\,,$$
then for each $i$ and each $v\in E_{\lambda_i}$,
$$(\lambda_iI - T)^{n_i}(v) = \vec 0\,.$$
Proof. Write $\lambda_1,\ldots,\lambda_k$ for the distinct eigenvalues of $T$. Setting $S_i$ as the generalized eigenvectors in $B$ corresponding to $\lambda_i$ and $S_i'$ the same for $B'$, we have from last lecture that $S_i, S_i'$ are bases for $E_{\lambda_i}$. Therefore both $S_i$ and $S_i'$ are chain bases for $T - \lambda_iI$ restricted to $E_{\lambda_i}$. By the uniqueness of nilpotent form, the number of chains of each length is the same in $S_i, S_i'$. This means the number of Jordan blocks of each size for $\lambda_i$ is the same in $B$ and $B'$. This is true for all $i$, and proves the theorem.
6.4
Exercises
Notation. Let T : V V be linear, where V is an F-vector space. If p F[x] has the form
p(x) = an xn + + a1 x + a0 then we define the linear transformation p(T ) : V V by
p(T ) = an T n + + a1 T + a0 I .
Exercises.
1. Let V be a finite dimensional vector space with T : V V linear and let
V = W1 Wk
be a T -invariant direct sum decomposition. That is, the subspaces Wi are independent,
sum to V and are each T -invariant. If B1 , . . . , Bk are bases for W1 , . . . , Wk respectively,
let B = ki=1 Bi and show that [T ]B
B is a block diagonal matrix with k blocks of sizes
#B1 , . . . , #Bk .
2. Let U : V V be nilpotent of degree k and dim V < . In this question we will
sketch an approach to the structure theorem for nilpotent operators using quotient
spaces.
(a) For $i = 1,\ldots,k$, define $N_i = N(U^i)$ and $N_0 = \{\vec 0\}$. Define the quotient spaces $\widetilde N_i = N_i/N_{i-1}$ for $i\geq 1$ and $\widetilde N_0 = \{\vec 0\}$. Show that if $C\in\widetilde N_i$ for $i\geq 1$ then there exists $D\in\widetilde N_{i-1}$ such that $U(C)\subseteq D$, where
$$U(C) = \{U(v): v\in C\}\,.$$
(b) For $i = 1,\ldots,k$, define $U_i: \widetilde N_i\to\widetilde N_{i-1}$ by $U_i(C) = D$, where $D$ is the unique element in $\widetilde N_{i-1}$ such that $U(C)\subseteq D$. Show that $U_i$ is linear.
(c) For $i = 1,\ldots,k$, show that $U_i$ is injective.
(You do not need to do anything for this paragraph.) From this point on, the proof would proceed as follows. Let $C_1^{(k)},\ldots,C_{l_k}^{(k)}$ be a basis of $\widetilde N_k$. By injectivity of $U_k$, $\{U_k(C_1^{(k)}),\ldots,U_k(C_{l_k}^{(k)})\}$ is linearly independent. Extend it to a basis of $\widetilde N_{k-1}$ and
(k1)
(k1)
1 0 0
2 3 0
5
1
3
2
0
(b) 0 1 0
(c) 0
(a) 1 4 1
1 4 0
0 1 2
6 1 4
9. (a) The characteristic polynomial of the matrix
7 1 2
2
1 4 1 1
A=
2 1 5 1
1 1 2
8
is c(x) = (x 6)4 . Find an invertible matrix S such that S 1 AS is in Jordan
form.
(b) Find all complex matrices in Jordan form with characteristic polynomial
c(x) = (i x)3 (2 x)2 .
2i
Show that R = X1 Xr Y1 Ym .
(e) Prove that for each j = 1, . . . , r, the transformation T j I restricted to Xj is
nilpotent and thus we can find a basis Bj for Xj consisting entirely of chains for
T j I.
(f) For each k = 1, . . . , m, let
(k)
(k)
7 Bilinear forms

7.1 Definition and matrix representation
$$\begin{aligned}
\sum_{i=1}^n a_if(v_i,w) &= \sum_{i=1}^n a_if(v_i, b_1v_1 + \cdots + b_nv_n) = \sum_{i=1}^n a_i\left[\sum_{j=1}^n b_jf(v_i,v_j)\right]\\
&= \sum_{i=1}^n ([v]_B)_{i,1}\sum_{j=1}^n ([w]_B^t)_{1,j}\left([f]_B^B\right)_{j,i} = \sum_{i=1}^n \left([w]_B^t[f]_B^B\right)_{1,i}([v]_B)_{i,1} = [w]_B^t[f]_B^B[v]_B\,.
\end{aligned}$$
For uniqueness, note that if $A$ is any matrix such that $f(v,w) = [w]_B^tA[v]_B$ for all $v, w$, we can apply this to $v_i, v_j$:
$$f(v_j,v_i) = [v_i]_B^tA[v_j]_B = e_i^tAe_j = A_{i,j}\,.$$
Thus $A$ has the same entries as those of $[f]_B^B$.
Remarks.
If we define the standard dot product of two vectors $\vec a, \vec b\in\mathbb F^n$ by
$$\vec a\cdot\vec b = a_1b_1 + \cdots + a_nb_n\,,$$
where we have written the vectors $\vec a = (a_1,\ldots,a_n)$ and $\vec b = (b_1,\ldots,b_n)$, then we can write the above result as
$$f(v,w) = \left([f]_B^B[v]_B\right)\cdot[w]_B\,.$$
Given any $A\in M_{n,n}(\mathbb F)$ and basis $B$ of $V$ (of size $n$), the function $f_A$, given by
$$f_A(v,w) = [w]_B^tA[v]_B\,,$$
is bilinear (check this!) and has $A$ for its matrix relative to $B$. To see this, we compute the $(i,j)$-th entry of $[f_A]_B^B$: it is
$$f_A(v_j,v_i) = [v_i]_B^tA[v_j]_B = e_i^tAe_j = A_{i,j}\,.$$
An important example comes from $A = I$. This corresponds to the dot product in basis $B$:
$$(v,w)\mapsto [w]_B^t[v]_B = [v]_B\cdot[w]_B\,.$$
Given $B$ a basis of $V$, the map $f\mapsto[f]_B^B$ is an isomorphism.
Proof. If $f, g\in\operatorname{Bil}(V,\mathbb F)$ and $c\in\mathbb F$,
$$\left([cf+g]_B^B\right)_{i,j} = (cf+g)(v_j,v_i) = cf(v_j,v_i) + g(v_j,v_i) = c\left([f]_B^B\right)_{i,j} + \left([g]_B^B\right)_{i,j}\,,$$
so $[cf+g]_B^B = c[f]_B^B + [g]_B^B$, proving linearity. If $[f]_B^B$ is the zero matrix, then
$$f(v,w) = [w]_B^t[f]_B^B[v]_B = 0 \quad\text{for all } v,w\in V\,,$$
so $f$ is zero, proving injectivity. Last, if $A$ is a given matrix in $M_{n,n}(\mathbb F)$, we showed above that $[f_A]_B^B = A$. This means that the map is surjective and we are done.
Definition 7.1.3. The rank of a bilinear form $f$ on a finite-dimensional vector space $V$ is defined as $\operatorname{rank}\left([f]_B^B\right)$ in any basis $B$.
To show this is well-defined will take a bit of work. Given a bilinear form $f$ on $V$ we can fix a vector $v\in V$ and define
$$L_f(v): V\to\mathbb F \quad\text{by}\quad (L_f(v))(w) = f(v,w)\,.$$
Since $f$ is bilinear, this is a linear functional and thus lives in $V^*$. Let $\operatorname{Bil}(V,\mathbb F)$ be the vector space of bilinear forms on $V$.
We can nicely represent the matrix of $L_f$ relative to the basis $B$ and the dual basis $B^*$.
Proof. Let $v_1, v_2\in V$ and $c\in\mathbb F$. To show that $L_f(cv_1 + v_2) = cL_f(v_1) + L_f(v_2)$, we will need to apply these functionals to vectors in $V$. So let $w\in V$ and compute
$$L_f(cv_1+v_2)(w) = f(cv_1+v_2,w) = cf(v_1,w) + f(v_2,w) = cL_f(v_1)(w) + L_f(v_2)(w) = (cL_f(v_1) + L_f(v_2))(w)\,.$$
This is true for all $w$, so $L_f$ is linear.
The $i$-th column of $[L_f]_B^{B^*}$ is obtained by writing
$$B = \{v_1,\ldots,v_n\},\qquad B^* = \{v_1^*,\ldots,v_n^*\}$$
and writing $L_f(v_i)$ in terms of $B^*$. Recall that if $g$ is a linear functional then its representation in terms of the dual basis can be written
$$g = g(v_1)v_1^* + \cdots + g(v_n)v_n^*\,.$$
Therefore we can write
$$L_f(v_i) = L_f(v_i)(v_1)v_1^* + \cdots + L_f(v_i)(v_n)v_n^* = f(v_i,v_1)v_1^* + \cdots + f(v_i,v_n)v_n^*\,.$$
This means that the $(j,i)$-th entry of $[L_f]_B^{B^*}$ is $f(v_i,v_j)$, the $(j,i)$-th entry of $[f]_B^B$.
Here are some nice consequences of the equality $[f]_B^B = [L_f]_B^{B^*}$.
1. The definition of rank of $f$ does not depend on the choice of basis $B$. Indeed, let $C$ be another basis. Then
$$\operatorname{rank}(f) = \operatorname{rank}[f]_B^B = \operatorname{rank}\left([L_f]_B^{B^*}\right) = \operatorname{rank}\left([L_f]_C^{C^*}\right) = \operatorname{rank}[f]_C^C\,.$$
The third equality follows from the fact that the rank of the matrix of $L_f$ does not depend on the bases used to represent it.
2. We can equally well define $R_f$ to be the map from $V$ to $V^*$ given by
$$R_f(v)(w) = f(w,v)\,.$$
Then $[R_f]_B^{B^*} = \left([f]_B^B\right)^t$. The reason is as follows. Define $g(v,w) = f(w,v)$. Then $L_g = R_f$. Now the matrix $[g]_B^B$ is easily seen to be the transpose of $[f]_B^B$ (its $(i,j)$-th entry is $g(v_j,v_i) = f(v_i,v_j)$). So
$$\left([f]_B^B\right)^t = [g]_B^B = [L_g]_B^{B^*} = [R_f]_B^{B^*}\,.$$
3. We say that a bilinear form is degenerate if its rank is not equal to dim V . We
can now state many equivalent conditions for this: the following are equivalent when
f Bil(V, F) and dim V = n < .
(a) f is degenerate.
(b) Define the nullspace of $f$ to be
$$N(f) = \{v\in V: f(v,w) = 0 \text{ for all } w\in V\}\,.$$
(This is also called the left nullspace.) Then $N(f)\neq\{\vec 0\}$.
(c) Defining the right nullspace by
$$N_R(f) = \{v\in V: f(w,v) = 0 \text{ for all } w\in V\}\,,$$
then $N_R(f)\neq\{\vec 0\}$.
Note here that $N(f) = N(L_f)$ and $N_R(f) = N(R_f)$. By this representation, we have
$$\operatorname{rank}(f) + \dim N(f) = \dim V\,.$$
This comes from the rank-nullity theorem applied to Lf .
Theorem 7.1.5. Let $V$ be finite-dimensional. The map $\mathcal L: \operatorname{Bil}(V,\mathbb F)\to L(V,V^*)$ given by
$$\mathcal L(f) = L_f$$
is an isomorphism.
Proof. First we show linearity. Given $f, g\in\operatorname{Bil}(V,\mathbb F)$ and $c\in\mathbb F$, we want to show that
$$\mathcal L(cf+g) = c\mathcal L(f) + \mathcal L(g)\,.$$
To do this, we need to show that when we apply each side to a vector $v\in V$, we get the same result. The result will be in the dual space, so we need to show this result, applied to a vector $w\in V$, is the same. Thus we compute
$$(\mathcal L(cf+g)(v))(w) = (cf+g)(v,w) = cf(v,w) + g(v,w)$$
and the right side is
$$((c\mathcal L(f) + \mathcal L(g))(v))(w) = (c(\mathcal L(f)(v)) + \mathcal L(g)(v))(w) = c((\mathcal L(f)(v))(w)) + (\mathcal L(g)(v))(w) = cf(v,w) + g(v,w)\,.$$
To show bijectivity, we note that the dimension of $\operatorname{Bil}(V,\mathbb F)$ is $n^2$, the same as that of $L(V,V^*)$ (since the map sending a bilinear form to a matrix is an isomorphism). Thus we need only show one-to-one. If $\mathcal L(f) = 0$, then $\mathcal L(f)(v) = 0$ for all $v\in V$, meaning for all $w\in V$,
$$0 = (\mathcal L(f)(v))(w) = f(v,w)\,.$$
This being true for all $v, w$ means $f$ is zero, so $\mathcal L$ is injective.
Now we move to changing coordinates. This is one big difference between the matrix of a
linear transformation and the matrix of a bilinear form. Instead of conjugating by a change
of basis matrix as before, we multiply on the right by the change of basis matrix and on the
left by its transpose.
Proposition 7.1.6. Let $f$ be a bilinear form on $V$, a finite-dimensional vector space over $\mathbb F$. If $B, B'$ are bases of $V$ then
$$[f]_{B'}^{B'} = \left([I]_{B'}^B\right)^t[f]_B^B[I]_{B'}^B\,.$$
Proof. For any $v, w\in V$,
$$[w]_{B'}^t\left([I]_{B'}^B\right)^t[f]_B^B[I]_{B'}^B[v]_{B'} = \left([I]_{B'}^B[w]_{B'}\right)^t[f]_B^B\left([I]_{B'}^B[v]_{B'}\right) = [w]_B^t[f]_B^B[v]_B = f(v,w)\,.$$
Another way to see the theorem is that if a matrix A represents a bilinear form in some
basis and P is an invertible matrix, then P t AP represents the bilinear form in a different
basis.
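In coordinates this is just the congruence $A\mapsto P^tAP$. The numpy sketch below is ours (arbitrary example data); it checks that evaluating the form in either coordinate system gives the same number.

```python
# Change of basis for a bilinear form: congruence by the change of basis matrix.
import numpy as np

A = np.array([[2.0, 1.0], [3.0, -1.0]])   # matrix of f in the old basis
P = np.array([[1.0, 1.0], [0.0, 2.0]])    # columns: new basis vectors in old coordinates
A_new = P.T @ A @ P                        # matrix of f in the new basis

v_new, w_new = np.array([1.0, 2.0]), np.array([-1.0, 3.0])
v_old, w_old = P @ v_new, P @ w_new
print(np.isclose(w_old @ A @ v_old, w_new @ A_new @ v_new))   # True
```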
7.2 Symmetric bilinear forms
Definition 7.2.1. A form f Bil(V, F) is called symmetric if f (v, w) = f (w, v) for all
v, w V . The space of symmetric bilinear forms is denoted Sym(V, F).
Symmetric forms are represented by symmetric matrices. That is, if $f$ is symmetric and $B$ is a basis, then $[f]_B^B$ is equal to its transpose $\left([f]_B^B\right)^t$. One of the fundamental theorems about symmetric bilinear forms is that they can be diagonalized.
Definition 7.2.2. A basis $B$ of $V$ is called orthogonal relative to $f\in\operatorname{Bil}(V,\mathbb F)$ if $f(v,w) = 0$ for all distinct $v, w\in B$.
The basis $B$ being orthogonal relative to $f$ is equivalent to $[f]_B^B$ being a diagonal matrix.
Theorem 7.2.3 (Diagonalization of symmetric bilinear forms). Let $V$ be a finite-dimensional vector space over $\mathbb F$, a field of characteristic not equal to 2. If $f\in\operatorname{Sym}(V,\mathbb F)$ then $V$ has a basis orthogonal relative to $f$.
Remark. Equivalently, if A Mn,n (F) is symmetric, then there is an invertible P Mn,n (F)
such that P t AP is diagonal.
Proof. First note that if f is the zero form then the theorem is trivially true. So assume
f 6= 0.
We will argue by induction on the dimension of V . For the base case, if V has dimension
1, then any basis is orthogonal relative to f . If dim(V ) = n > 1 then we begin by finding
a first element of our basis. To do this, let v V ; we will need to make sure that v can be
chosen such that f (v, v) 6= 0. For this, we use a lemma.
Lemma 7.2.4. Let $f\in\operatorname{Sym}(V,\mathbb F)$ be nonzero. If $\mathbb F$ does not have characteristic two then
$$f(v,v) = 0 \text{ for all } v\in V \implies f = 0\,.$$
Proof. The idea of the proof is to develop a so-called polarization identity. For $v, w\in V$,
$$f(v+w,v+w) - f(v-w,v-w) = 4f(v,w)\,,$$
so
$$f(v,w) = \frac14\left(f(v+w,v+w) - f(v-w,v-w)\right)\,.$$
Note that what we have written as $1/4$ is actually the inverse of 4 in $\mathbb F$. This exists because $\mathbb F$ does not have characteristic 2. Therefore if $f(z,z) = 0$ for all $z$, we apply this for $z = v+w$ and $z = v-w$ to find $f(v,w) = 0$.
From the lemma, since $f\neq 0$, we can find $v\in V$ such that $f(v,v)\neq 0$. This implies that $v$ itself is nonzero. Now consider the functional $L_f(v)$. Since it is a nonzero linear functional (for instance $L_f(v)(v)\neq 0$) its nullspace must be of dimension $n-1$. Define $\tilde f$ to be $f$ restricted to $N(L_f(v))$ and note that $\tilde f$ is a symmetric bilinear form on $N(L_f(v))$. Since this space has dimension strictly less than that of $V$, we use induction to find $\{v_1,\ldots,v_{n-1}\}$, a basis for $N(L_f(v))$ that is orthogonal relative to $\tilde f$.
Since $v\notin N(L_f(v))$, it follows that $\{v_1,\ldots,v_{n-1},v\}$ is a basis for $V$. It is also orthogonal because $f(v_i,v_j) = \tilde f(v_i,v_j) = 0$ whenever $i\neq j$ with $i, j\in\{1,\ldots,n-1\}$, and for $i = 1,\ldots,n-1$ we have
$$f(v_i,v) = f(v,v_i) = 0 \quad\text{since } v_i\in N(L_f(v))\,.$$
This completes the proof.
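Over $\mathbb R$, one concrete way to produce such an orthogonal basis, different from the inductive construction in the proof, is the spectral decomposition: for a real symmetric $A$, numpy's `eigh` returns an orthogonal $Q$, and $Q^tAQ$ is diagonal, so $Q$ serves as the change of basis matrix. A minimal sketch of ours, with an arbitrary example:

```python
# Diagonalizing a real symmetric bilinear form by congruence (here via eigh).
import numpy as np

A = np.array([[0.0, 1.0, 2.0],
              [1.0, 3.0, -1.0],
              [2.0, -1.0, 1.0]])
vals, Q = np.linalg.eigh(A)          # Q has orthonormal columns
D = Q.T @ A @ Q
print(np.allclose(D, np.diag(vals)))  # True: Q^t A Q is diagonal
```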
The above shows that any symmetric bilinear form can be diagonalized as long as $\operatorname{char}(\mathbb F)\neq 2$. In the case that the characteristic is 2, we cannot necessarily find an orthogonal basis: consider
$$A = \begin{pmatrix} 0 & 1\\ 1 & 0\end{pmatrix}.$$
In a field with characteristic 2, $1 = -1$, so this is a symmetric matrix and thus defines a symmetric bilinear form $f$ on $\mathbb F^2$, where $\mathbb F = \mathbb Z_2$, by $f(v,w) = w^tAv$. However for any $v = ae_1 + be_2$,
$$f(v,v) = a^2f(e_1,e_1) + abf(e_1,e_2) + abf(e_2,e_1) + b^2f(e_2,e_2) = 2ab = 0\,.$$
t
X
i=1
The same is clearly true for $V_+(B')$. Now assume for a contradiction that $\#S_+(B) > \#S_+(B')$. Then by the two subspace dimension theorem, writing $V_{\leq}(B') = V_0(B')\oplus V_-(B')$,
$$\dim\left(V_+(B)\cap V_{\leq}(B')\right) + \dim\left(V_+(B) + V_{\leq}(B')\right) = \dim V_+(B) + \dim V_{\leq}(B')\,.$$
Using $\dim\left(V_+(B) + V_{\leq}(B')\right)\leq\dim V$ and $\dim V_+(B) > \dim V_+(B')$,
$$\dim\left(V_+(B)\cap V_{\leq}(B')\right) > \dim V_+(B') + \dim V_{\leq}(B') - \dim V = 0\,.$$
Therefore there is a nonzero vector $v\in V_+(B)\cap V_{\leq}(B')$. But such a vector must have $f(v,v) > 0$ and $f(v,v)\leq 0$, a contradiction. So $\#S_+(B)\leq\#S_+(B')$. Reversing the roles of $B$ and $B'$ gives
$$\#S_+(B) = \#S_+(B')\,.$$
An almost identical argument gives $\#S_-(B) = \#S_-(B')$. This means we must have also $\#S_0(B) = \#S_0(B')$, since both bases have the same number of elements.
Then take $v_1 = (2,\sqrt 3)$ and $v_2 = (\sqrt 3, 2)$. Since $f((a,b),(c,d)) = ac - bd$,
$$f(v_1,v_1) = (2)(2) - (\sqrt 3)(\sqrt 3) = 1\,,\qquad f(v_2,v_2) = (\sqrt 3)(\sqrt 3) - (2)(2) = -1\,,\qquad f(v_1,v_2) = 0\,.$$
This means if $B' = \{v_1, v_2\}$ then $[f]_{B'}^{B'} = [f]_B^B$, but the spaces $V_+(B)$ and $V_+(B')$ are not the same (nor are $V_-(B)$ and $V_-(B')$).
7.3
One motivation for considering symmetric bilinear forms is to try to abstract the standard dot product. In this direction, we know that in $\mathbb R^n$, the quantity $\sqrt{\vec a\cdot\vec a}$ gives the length of $\vec a$. If we use the same formula on $\mathbb C^n$, the number $\vec a\cdot\vec a$ need not be a nonnegative real number, which is bad because we should have $\vec a\cdot\vec a\geq 0$ for all vectors. This is the motivation for introducing a different dot product on $\mathbb C^n$: it is given by
$$\vec a\cdot\vec b = a_1\overline{b_1} + \cdots + a_n\overline{b_n}\,,$$
where $\bar z$ represents the complex conjugate $x - iy$ of a complex number $z = x + iy$. Note that
this dot product is no longer bilinear because of the conjugate; however, it is sesquilinear.
Definition 7.3.1. If $V$ is a vector space over $\mathbb C$, a function $f: V\times V\to\mathbb C$ is called sesquilinear if
1. for each fixed $w\in V$, the function $v\mapsto f(v,w)$ is linear and
2. for each fixed $v\in V$, the function $w\mapsto f(v,w)$ is anti-linear; that is, for $w_1, w_2\in V$ and $c\in\mathbb C$,
$$f(v, cw_1 + w_2) = \bar cf(v,w_1) + f(v,w_2)\,.$$
If, in addition, $f(v,w) = \overline{f(w,v)}$ for all $v, w\in V$, we call $f$ Hermitian.
The theory of sesquilinear and Hermitian forms parallels that of bilinear and symmetric forms. We will not give the proofs of the following statements, as they are quite similar to before:
If $f$ is a sesquilinear form and $B$ is a basis of $V$ then there is a matrix of $f$ relative to $B$ as before:
$$\text{for } v, w\in V,\quad f(v,w) = \overline{[w]_B}^{\,t}[f]_B^B[v]_B\,.$$
Here, the $(i,j)$-th entry of $[f]_B^B$ is $f(v_j,v_i)$, where $B = \{v_1,\ldots,v_n\}$, as before.
The function $L_f(v)$ is no longer linear; it is anti-linear. Although $R_f(w)$ is linear, the map $w\mapsto R_f(w)$ (from $V$ to $V^*$) is no longer an isomorphism. It is a bijective anti-linear function.
We have the polarization formula
$$4f(u,v) = f(u+v,u+v) - f(u-v,u-v) + if(u+iv,u+iv) - if(u-iv,u-iv)\,.$$
This implies that if $f(v,v) = 0$ for all $v$, then $f = 0$.
There is a corresponding version of Sylvester's law:
Theorem 7.3.2 (Sylvester for Hermitian forms). Let $f$ be a Hermitian form on a finite-dimensional complex vector space $V$. There is a basis $B$ of $V$ such that $[f]_B^B$ is diagonal with only 0's, 1's and $-1$'s. Furthermore, the number of each does not depend on $B$ as long as the matrix is in diagonal form.
The proof is the same.
7.4
Exercises
1. Let $V$ be an $\mathbb F$-vector space and $\{v_1,\ldots,v_n\}$ a basis for $V$. Consider the dual basis $\{v_1^*,\ldots,v_n^*\}$ and for all pairs $i, j\in\{1,\ldots,n\}$ define $f_{i,j}(v,w) = v_i^*(v)v_j^*(w)$. Show that
$$\{f_{i,j}: i, j\in\{1,\ldots,n\}\}$$
is a basis for $\operatorname{Bil}(V,\mathbb F)$. Find the nullspace of each element in this basis.
2. Let $V$ be a vector space and $f\in\operatorname{Sym}(V,\mathbb F)$. If $W$ is a subspace of $V$ such that $V = W\oplus N(f)$, show that $f_W$, the restriction of $f$ to $W$, is non-degenerate.
3. Let $V$ be a vector space over $\mathbb F$ with characteristic not equal to 2. Show that if $V$ is finite dimensional and $W$ is a subspace such that the restriction $f_W$ of $f\in\operatorname{Sym}(V,\mathbb F)$ to $W$ is non-degenerate, then $V = W\oplus W^{\perp_f}$. Here $W^{\perp_f}$ is defined as
$$W^{\perp_f} = \{v\in V: f(v,w) = 0 \text{ for all } w\in W\}\,.$$
Hint. Use induction on $\dim W$.
4. Let V be a vector space of dimension n < and f Sym(V, F) be non-degenerate.
(a) A linear T : V V is called orthogonal relative to f if f (T (v), T (w)) = f (v, w)
for all v, w V . Show that if T is orthogonal then it is invertible.
(b) For any g Bil(V, F) and linear U : V V we can define gU : V V F by
gU (v, w) = g(U (v), U (w)) .
Show that gU Bil(V, F). Given a basis B of V , how do we express the matrix
of gU relative to that of g? Use this to find the determinant of any T that is
orthogonal relative to f .
(c) Show that the orthogonal group
O(f ) = {T L(V, V ) : T is orthogonal relative to f }
is, in fact, a group under composition.
5. Let $V$ be a finite-dimensional $\mathbb F$-vector space such that $\operatorname{char}(\mathbb F)\neq 2$. If $f$ is a skew-symmetric bilinear form on $V$ (that is, $f(v,w) = -f(w,v)$ for all $v, w\in V$) can one find a basis $B$ of $V$ such that $[f]_B^B$ is diagonal?
6. Let f be a symmetric bilinear form on Rn .
(a) Show that
$$f_H((v,w),(x,y)) := f(v,x) + f(w,y) - if(v,y) + if(w,x)$$
defines a Hermitian form on Cn . (Here we are writing (v, w) for the vector v + iw
as in last homework.)
(b) Show that N (fH ) = Span((N (f ))), where is the embedding (v) = (v, 0).
(c) Show that if f is an inner product then so is fH .
7. For the matrix A below, find an invertible
0 1
1 0
2 1
3 2
2 3
1 2
.
0 1
1 0
8.1
Definitions
$$\langle w,v\rangle = \langle u,v\rangle - \frac{\langle u,v\rangle}{\|v\|^2}\langle v,v\rangle = 0\,,$$
so
$$0\leq\langle w,w\rangle = \langle w,u\rangle = \langle u,u\rangle - \frac{\langle u,v\rangle\overline{\langle u,v\rangle}}{\|v\|^2}\,,$$
and therefore
$$0\leq\langle u,u\rangle - \frac{|\langle u,v\rangle|^2}{\|v\|^2}\,,\quad\text{that is,}\quad |\langle u,v\rangle|^2\leq\|u\|^2\|v\|^2\,.$$
Here $w = u - \frac{\langle u,v\rangle}{\|v\|^2}v$. Everything above is an equality if and only if $w = \vec 0$; that is, we have equality if and only if $v$ and $u$ are linearly dependent.
4. (Triangle inequality) For $u, v\in V$,
$$\|u+v\|\leq\|u\|+\|v\|\,.$$
This is also written $\|u-v\|\leq\|u-w\|+\|w-v\|$.
Proof.
$$\|u+v\|^2 = \langle u+v,u+v\rangle = \|u\|^2 + 2\Re\langle u,v\rangle + \|v\|^2 \leq \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = (\|u\|+\|v\|)^2\,.$$
8.2
Orthogonality
Definition 8.2.1. Given a complex inner product space $(V,\langle\cdot,\cdot\rangle)$ we say that vectors $u, v\in V$ are orthogonal if $\langle u,v\rangle = 0$.
Theorem 8.2.2. Let $v_1,\ldots,v_k$ be nonzero and pairwise orthogonal in a complex inner product space. Then they are linearly independent.
Proof. Suppose that
$$a_1v_1 + \cdots + a_kv_k = \vec 0\,.$$
Then we take the inner product with $v_i$:
$$0 = \langle\vec 0, v_i\rangle = \sum_{j=1}^k a_j\langle v_j,v_i\rangle = a_i\|v_i\|^2\,.$$
Therefore $a_i = 0$.
We begin with a method to transform a linearly independent set into an orthonormal set.
Theorem 8.2.3 (Gram-Schmidt). Let V be a complex inner product space and v1 , . . . , vk V
which are linearly independent. There exist u1 , . . . , uk such that
1. {u1 , . . . , uk } is orthonormal and
2. for all j = 1, . . . , k, Span({u1 , . . . , uj }) = Span({v1 , . . . , vj }).
Proof. We will prove by induction. If $k = 1$, we must have $v_1\neq\vec 0$, so set $u_1 = v_1/\|v_1\|$. This gives $\|u_1\| = 1$ so that $\{u_1\}$ is orthonormal and certainly the second condition holds.
If $k\geq 2$ then assume the statement holds for $j = k-1$. Find vectors $u_1,\ldots,u_{k-1}$ as in the statement. Now to define $u_k$ we set
$$w_k = v_k - \left[\langle v_k,u_1\rangle u_1 + \cdots + \langle v_k,u_{k-1}\rangle u_{k-1}\right]\,.$$
We claim that $w_k$ is orthogonal to all $u_j$'s and is not zero. To check the first, let $1\leq j\leq k-1$ and compute
$$\langle w_k,u_j\rangle = \langle v_k,u_j\rangle - \left[\langle v_k,u_1\rangle\langle u_1,u_j\rangle + \cdots + \langle v_k,u_{k-1}\rangle\langle u_{k-1},u_j\rangle\right] = \langle v_k,u_j\rangle - \langle v_k,u_j\rangle\langle u_j,u_j\rangle = 0\,.$$
Second, if $w_k$ were zero then we would have
$$v_k\in\operatorname{Span}(\{u_1,\ldots,u_{k-1}\}) = \operatorname{Span}(\{v_1,\ldots,v_{k-1}\})\,,$$
a contradiction to linear independence. Therefore we set $u_k = w_k/\|w_k\|$ and we see that
{u1 , . . . , uk } is orthonormal and therefore linearly independent.
Furthermore note that by induction,
Span({u1 , . . . , uk }) Span({u1 , . . . , uk1 , vk }) Span({v1 , . . . , vk }) .
Since the spaces on the left and right have the same dimension they are equal.
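The proof is constructive, and the construction is short to transcribe. The Python sketch below is ours; it orthonormalizes a linearly independent list using the standard inner product $\langle u,v\rangle = \sum_i u_i\overline{v_i}$ on $\mathbb C^n$.

```python
# Gram-Schmidt, following the proof: subtract projections, then normalize.
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    ortho = []
    for v in map(np.asarray, vectors):
        w = v.astype(complex)
        for u in ortho:
            w = w - np.vdot(u, v) * u        # subtract <v, u> u
        ortho.append(w / np.linalg.norm(w))
    return ortho

us = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
G = np.array([[np.vdot(b, a) for b in us] for a in us])   # Gram matrix <a, b>
print(np.allclose(G, np.eye(3)))   # True: the u_i are orthonormal
```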
Corollary 8.2.4. If V is a finite-dimensional inner product space then V has an orthonormal
basis.
What do vectors look like represented in an orthonormal basis? Let $\beta = \{v_1,\ldots,v_n\}$ be an orthonormal basis and let $v\in V$. Then
$$v = a_1v_1 + \cdots + a_nv_n\,.$$
Taking the inner product with $v_j$ on both sides gives $a_j = \langle v,v_j\rangle$, so
$$v = \langle v,v_1\rangle v_1 + \cdots + \langle v,v_n\rangle v_n\,.$$
Thus we can view in this case (orthonormal) the number $\langle v,v_i\rangle$ as the projection of $v$ onto $v_i$. We can then find the norm of $v$ easily:
$$\|v\|^2 = \langle v,v\rangle = \left\langle v,\sum_{i=1}^n\langle v,v_i\rangle v_i\right\rangle = \sum_{i=1}^n\overline{\langle v,v_i\rangle}\langle v,v_i\rangle = \sum_{i=1}^n|\langle v,v_i\rangle|^2\,.$$
$$P_{W^\perp} = I - P_W\,.$$
For all $v_1, v_2\in V$, $\langle P_W(v_1),v_2\rangle = \langle v_1,P_W(v_2)\rangle$. This says $P_W$ can be moved to the other side in the inner product. Formally this means $P_W$ is its own adjoint (defined soon).
Proof.
$$\langle P_W(v_1),v_2\rangle = \langle P_W(v_1),P_W(v_2)\rangle + \langle P_W(v_1),P_{W^\perp}(v_2)\rangle = \langle P_W(v_1),P_W(v_2)\rangle\,.$$
By the same argument, $\langle v_1,P_W(v_2)\rangle = \langle P_W(v_1),P_W(v_2)\rangle$.
For all v V , w W ,
kv PW (v)k kv wk with equality iff w = PW (v) .
Here the norm comes from the inner product. This says that PW (v) is the unique
closest vector to v in W .
Proof. Note first that for any $w\in W$ and $w'\in W^\perp$, the Pythagorean theorem holds:
$$\|w+w'\|^2 = \langle w+w',w+w'\rangle = \langle w,w\rangle + 2\Re\langle w,w'\rangle + \langle w',w'\rangle = \|w\|^2 + \|w'\|^2\,.$$
Now
$$\|v-w\|^2 = \|P_W(v) - w + P_{W^\perp}(v)\|^2 = \|P_W(v) - w\|^2 + \|P_{W^\perp}(v)\|^2\,.$$
This is at least $\|P_{W^\perp}(v)\|^2 = \|v - P_W(v)\|^2$ and they are equal if and only if $P_W(v) = w$.
Projection onto a vector. If $w$ is a nonzero vector, we can define $W = \operatorname{Span}(\{w\})$ and consider the orthogonal projection onto $W$. Let $\{w_1,\ldots,w_{n-1}\}$ be an orthonormal basis for $W^\perp$ (which exists by Gram-Schmidt) and normalize $w$ to be $w_0 = w/\|w\|$. Then we can write an arbitrary $v\in V$ in terms of the basis $\{w_1,\ldots,w_{n-1},w_0\}$ as
$$v = \langle v,w_1\rangle w_1 + \cdots + \langle v,w_{n-1}\rangle w_{n-1} + \langle v,w_0\rangle w_0\,.$$
This gives a formula for $P_W(v) = \langle v,w_0\rangle w_0$. Rewriting in terms of $w$, we get
$$P_W(v) = \frac{\langle v,w\rangle}{\|w\|^2}\,w\,.$$
Since the orthogonal projection onto a subspace was defined without reference to a basis, we see that this does not depend on the choice of $w$!
We then define the orthogonal projection onto the vector $w$ to be $P_W$, where $W = \operatorname{Span}(\{w\})$. That is,
$$P_w(v) = \frac{\langle v,w\rangle}{\|w\|^2}\,w\,.$$
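The projection formula is easy to test numerically. The sketch below is ours (arbitrary complex vectors); it computes $P_w(v)$ and checks that $v - P_w(v)$ is orthogonal to $w$.

```python
# Orthogonal projection onto a single vector in C^n.
import numpy as np

def proj(v, w):
    """P_w(v) = (<v, w> / ||w||^2) w for the standard complex inner product."""
    return (np.vdot(w, v) / np.vdot(w, w)) * w   # np.vdot conjugates its first argument

v = np.array([1.0 + 2j, 0.0, 3.0])
w = np.array([1.0, 1.0 + 1j, -1.0])
p = proj(v, w)
print(np.isclose(np.vdot(w, v - p), 0))   # True: v - P_w(v) is orthogonal to w
```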
8.3
Adjoint
Before we saw that an orthogonal projection can be moved to the other side of the inner
product. This motivates us to look at which operators can do this. Given any linear T , we
can define another linear transformation which acts on the other side of the inner product.
Theorem 8.3.1 (Existence of adjoint). Let $T: V\to V$ be linear and $V$ a finite-dimensional complex inner product space. There exists a unique linear $T^*: V\to V$ such that for all $v, w\in V$,
$$\langle T(v),w\rangle = \langle v,T^*(w)\rangle\,.$$
$T^*$ is called the adjoint of $T$.
Proof. For the proof, we need a lemma.
Lemma 8.3.2 (Riesz representation theorem). Let $V$ be a finite-dimensional inner product space. For each $f\in V^*$ there exists a unique $w_f\in V$ such that for all $v\in V$,
$$f(v) = \langle v,w_f\rangle\,.$$
Proof. Recall the map $R_{\langle\cdot,\cdot\rangle}: V\to V^*$ given by $R_{\langle\cdot,\cdot\rangle}(w)(v) = \langle v,w\rangle$. Since the inner product is a sesquilinear form, this map is anti-linear. In fact, the inner product has rank equal to $\dim V$ when viewed as a sesquilinear form, since only the zero vector is in its nullspace. Thus the rank of $R_{\langle\cdot,\cdot\rangle}$ is $\dim V$ and it is a bijection. This means that given $f\in V^*$ there exists a unique $w_f\in V$ such that $R_{\langle\cdot,\cdot\rangle}(w_f) = f$. In other words, a unique $w_f\in V$ such that for all $v\in V$,
$$\langle v,w_f\rangle = R_{\langle\cdot,\cdot\rangle}(w_f)(v) = f(v)\,.$$
We now use the lemma. Given $T: V\to V$ linear and $w\in V$, define a function $f_{T,w}: V\to\mathbb C$ by
$$f_{T,w}(v) = \langle T(v),w\rangle\,.$$
This is a linear functional, since it equals $R_{\langle\cdot,\cdot\rangle}(w)\circ T$. So by Riesz, there exists a unique $\tilde w$ such that
$$\langle T(v),w\rangle = f_{T,w}(v) = \langle v,\tilde w\rangle$$
for all $v\in V$. We define $T^*(w) = \tilde w$.
By definition $T^*(w)$ satisfies $\langle T(v),w\rangle = \langle v,T^*(w)\rangle$ for all $v, w\in V$. We must simply show it is linear. So given $c\in\mathbb C$ and $v, w_1, w_2\in V$,
$$\langle v,T^*(cw_1+w_2)\rangle = \langle T(v),cw_1+w_2\rangle = \bar c\langle T(v),w_1\rangle + \langle T(v),w_2\rangle = \bar c\langle v,T^*(w_1)\rangle + \langle v,T^*(w_2)\rangle = \langle v,cT^*(w_1)+T^*(w_2)\rangle\,.$$
By uniqueness, $T^*(cw_1+w_2) = cT^*(w_1) + T^*(w_2)$.
The adjoint has many interesting properties. Some simple ones you can verify:
$(T+S)^* = T^* + S^*$.
$(TS)^* = S^*T^*$.
$(cT)^* = \bar cT^*$.
These can be seen to follow from the next property.
Proposition 8.3.3. Let $T: V\to V$ be linear with $B$ an orthonormal basis. Then
$$[T^*]_B^B = \overline{\left([T]_B^B\right)^t}\,.$$

and zero otherwise. This means $[T^*T]_B^B$ is the identity matrix, giving $T^* = T^{-1}$.
Remark. Given a matrix $A\in M_{n,n}(\mathbb C)$ we say that $A$ is unitary if $A$ is invertible and $A^* = A^{-1}$. Here $A^* = \bar A^t$ is the conjugate transpose. Note that the $(i,j)$-th entry of $A^*A$ is just the standard (Hermitian) dot product of the $j$-th column of $A$ with the $i$-th column of $A$. So $A$ is unitary if and only if the columns of $A$ are orthonormal. Thus the last part of the previous proposition says
$$T \text{ unitary} \iff [T]_B^B \text{ unitary for any orthonormal basis } B\,.$$
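In coordinates (with respect to the standard orthonormal basis of $\mathbb C^n$), the adjoint is the conjugate transpose, and unitarity amounts to orthonormal columns. A small numpy sketch of ours, with arbitrary example matrices:

```python
# Adjoint = conjugate transpose in an orthonormal basis; unitary = orthonormal columns.
import numpy as np

T = np.array([[1.0 + 1j, 2.0], [0.0, -1j]])
T_star = T.conj().T

# <T v, w> = <v, T* w> for the standard inner product <a, b> = sum a_i conj(b_i)
v, w = np.array([1.0, 2j]), np.array([-1j, 3.0])
print(np.isclose(np.vdot(w, T @ v), np.vdot(T_star @ w, v)))   # True

# A unitary example: U* U = I, i.e. the columns of U are orthonormal.
U = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
print(np.allclose(U.conj().T @ U, np.eye(2)))                  # True
```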
8.4
Proof. One direction is easy. Suppose that $T$ is unitarily diagonalizable. Then we can find an orthonormal basis $B$ such that $[T]_B^B$ is diagonal. Then $[T^*]_B^B$ is also diagonal, since it is just the conjugate transpose of $[T]_B^B$. Any two diagonal matrices commute, so, in particular, these matrices commute. This means
$$[T^*T]_B^B = [T^*]_B^B[T]_B^B = [T]_B^B[T^*]_B^B = [TT^*]_B^B\,,$$
giving $T^*T = TT^*$.
The other direction is more difficult. We will first show that if $T$ is self-adjoint then $T$ is unitarily diagonalizable. For that we need a lemma.
The other direction is more difficult. We will first show that if T is self-adjoint then T is
unitarily diagonalizable. For that we need a lemma.
Lemma 8.4.4. If $T: V\to V$ is linear then
$$R(T)^\perp = N(T^*) \quad\text{and}\quad N(T)^\perp = R(T^*)\,.$$
Proof. Let us assume that $v\in N(T^*)$. Then for any $w\in R(T)$ we can find $w'\in V$ such that $T(w') = w$. Then
$$\langle v,w\rangle = \langle v,T(w')\rangle = \langle T^*(v),w'\rangle = 0\,.$$
So $v\in R(T)^\perp$. This means $N(T^*)\subseteq R(T)^\perp$. On the other hand, these spaces have the same dimension:
$$\dim R(T)^\perp = \dim V - \dim R(T) = \dim N(T)\,,$$
but the matrix of $T^*$ is just the conjugate transpose of that of $T$ (in any orthonormal basis), so $\dim N(T) = \dim N(T^*)$, completing the proof of the first statement.
For the second, we apply the first statement to $T^*$:
$$R(T^*)^\perp = N(T)$$
and then perp both sides: $R(T^*) = N(T)^\perp$.
Now we move to the proof.
Self-adjoint case. Assume that $T = T^*$; we will show that $V$ has an orthonormal basis of eigenvectors for $T$ by induction on $\dim V$. First, if $\dim V = 1$ then any nonzero vector $v$ is an eigenvector for $T$. Set our orthonormal basis to be $\{v/\|v\|\}$.
If $\dim V > 1$ then, by the fact that we are over $\mathbb C$, let $v$ be any eigenvector for $T$ (this is possible since $\mathbb C$ is algebraically closed) for eigenvalue $\lambda$. Then $\dim N(T-\lambda I) > 0$ and thus
$$\dim R(T-\lambda I) = \dim V - \dim N\left((T-\lambda I)^*\right) = \dim V - \dim N(T-\bar\lambda I)\,.$$
However, as $T$ is self-adjoint, $\lambda\in\mathbb R$, so this is
$$\dim V - \dim N(T-\lambda I) < \dim V\,.$$
Furthermore, if $\dim R(T-\lambda I) = 0$ then we must have $\dim N(T-\lambda I) = \dim V$, meaning the whole space is the eigenspace for $T$. In this case we just take any orthonormal basis
8.5

$$|\langle u,v\rangle| \leq \|u\|\|v\|\,. \qquad (3)$$

To prove this, the idea is to start from a weaker inequality and upgrade it by exploiting some invariances of the quantities involved. The starting point is the positive-definiteness of the inner product:
$$0\leq\langle u-v,u-v\rangle\,. \qquad (4)$$
Expanding the right side of (4) gives
$$0\leq\|u\|^2 - 2\Re\langle u,v\rangle + \|v\|^2\,, \qquad (5)$$
that is,
$$\Re\langle u,v\rangle \leq \frac{\|u\|^2}{2} + \frac{\|v\|^2}{2}\,. \qquad (6)$$
This inequality is weaker than (3): the left side is in general smaller than $|\langle u,v\rangle|$, while the right side is larger than $\|u\|\|v\|$.
To improve the situation, notice that if $\alpha\in\mathbb C$ is such that $|\alpha| = 1$, then applying (6) to $\alpha u$ and $v$ gives
$$\Re\langle\alpha u,v\rangle \leq \frac{\|u\|^2}{2} + \frac{\|v\|^2}{2}\,. \qquad (7)$$
This is one of those great situations where we have an inequality holding for every value of some parameter. The obvious next step is to optimize the choice of $\alpha$. To see how to do this in the present case, recall that any complex number $z$ can be written as
$$z = |z|e^{i\theta}\,,$$
where $\theta$ is real. Thus also
$$\langle u,v\rangle = e^{i\theta}|\langle u,v\rangle|$$
for some $\theta$. Similarly write $\alpha = e^{i\psi}$ for some real $\psi$. Then
$$\Re\langle\alpha u,v\rangle = \Re\left(e^{i\psi}e^{i\theta}|\langle u,v\rangle|\right) = \cos(\psi+\theta)|\langle u,v\rangle|\,,$$
recalling that $e^{i\gamma} = \cos(\gamma) + i\sin(\gamma)$. Using this in (7), we find
$$\cos(\psi+\theta)|\langle u,v\rangle| \leq \frac{\|u\|^2}{2} + \frac{\|v\|^2}{2}\,,$$
where $\theta$ is determined by $u$ and $v$, but $\psi$ is at our disposal. We get the strongest statement by setting $\psi = -\theta$, improving (7) to
$$|\langle u,v\rangle| \leq \frac{\|u\|^2}{2} + \frac{\|v\|^2}{2}\,. \qquad (8)$$
We can further improve (8) by introducing a new parameter $c > 0$. Notice that if we apply (8) with $u$ replaced by $\sqrt c\,u$ and $v$ replaced by $\frac{1}{\sqrt c}v$, then the left side is
$$\left|\left\langle\sqrt c\,u, \tfrac{1}{\sqrt c}v\right\rangle\right| = \frac{\sqrt c}{\sqrt c}|\langle u,v\rangle| = |\langle u,v\rangle|\,,$$
while the right side is $c\frac{\|u\|^2}{2} + \frac1c\frac{\|v\|^2}{2}$. We conclude that
$$|\langle u,v\rangle| \leq c\frac{\|u\|^2}{2} + \frac1c\frac{\|v\|^2}{2}\,. \qquad (9)$$
Since the left side is independent of $c$, we should once again optimize in our free parameter $c$ to get the best possible inequality. The inequality will obviously be strongest for the smallest possible value of the right side. The function
$$x\mapsto x\frac{\|u\|^2}{2} + \frac1x\frac{\|v\|^2}{2}$$
is differentiable for $x > 0$ and tends to infinity as $x\to 0^+$ and $x\to\infty$. It thus has a (global) minimum at the zero of its derivative, which occurs when
$$\frac{\|u\|^2}{2} - \frac{1}{x^2}\frac{\|v\|^2}{2} = 0\,,$$
or
$$x_0 = \frac{\|v\|}{\|u\|}\,.$$
This tells us that the optimal choice of $c$ in (9) for fixed $\|u\|$ and $\|v\|$ is $c = x_0$, which leads to the right side being
$$\frac{\|v\|}{\|u\|}\frac{\|u\|^2}{2} + \frac{\|u\|}{\|v\|}\frac{\|v\|^2}{2} = \|u\|\|v\|\,.$$
This finishes the proof of (3).
8.6
Exercises
$$\|v\| = \sqrt{\langle v,v\rangle}\,.$$
(a) Prove that for each $f\in V^*$ there exists a unique $z\in V$ such that for all $v\in V$,
$$f(v) = \langle v,z\rangle\,.$$
(b) For each $u\in V$ define $f_{u,T}: V\to\mathbb R$ by
$$f_{u,T}(v) = \langle T(v),u\rangle\,.$$
Prove that $f_{u,T}\in V^*$. Define $T^t(u)$ to be the unique vector in $V$ such that for all $v\in V$,
$$\langle T(v),u\rangle = \langle v,T^t(u)\rangle$$
and show that $T^t$ is linear.
(c) Show that if $\beta$ is an orthonormal basis for $V$ then
$$[T^t]_\beta = \left([T]_\beta\right)^t\,.$$
6. Let $\langle\cdot,\cdot\rangle$ be an inner product on $V = \mathbb R^n$ and define the complexification of $\langle\cdot,\cdot\rangle$ by
$$\langle(v,w),(x,y)\rangle_{\mathbb C} = \langle v,x\rangle + \langle w,y\rangle - i\langle v,y\rangle + i\langle w,x\rangle\,.$$
(a) Show that h, iC is an inner product on Cn .
(b) Let T : V V be linear.
i. Prove that $(T_{\mathbb C})^* = (T^t)_{\mathbb C}$.
ii. If $T^t = T$ then we say that $T$ is symmetric. Show in this case that $T_{\mathbb C}$ is Hermitian.
iii. If $T^t = -T$ then we say $T$ is anti-symmetric. Show in this case that $T_{\mathbb C}$ is skew-adjoint.
iv. If $T$ is invertible and $T^t = T^{-1}$ then we say that $T$ is orthogonal. Show in this case that $T_{\mathbb C}$ is unitary. Show that this is equivalent to
$$T\in O(\langle\cdot,\cdot\rangle)\,,$$
where $O(\langle\cdot,\cdot\rangle)$ is the orthogonal group for $\langle\cdot,\cdot\rangle$.
7. Let h, i be an inner product on V = Rn and T : V V be linear.
(a) Suppose that $TT^t = T^tT$. Show then that $T_{\mathbb C}$ is normal. In this case, we can find a basis $\beta$ of $\mathbb C^n$ such that $\beta$ is orthonormal (with respect to $\langle\cdot,\cdot\rangle_{\mathbb C}$) and $[T_{\mathbb C}]_\beta$ is diagonal. Define the subspaces of $V$
$$X_1,\ldots,X_r,\ Y_1,\ldots,Y_{2m}$$
as in problem 1, question 3. Show that these are mutually orthogonal; that is, if $v, w$ are in different subspaces then $\langle v,w\rangle = 0$.
(b) If $T$ is symmetric then show that there exists an orthonormal basis $\beta$ of $V$ such that $[T]_\beta$ is diagonal.
(c) If $T$ is skew-symmetric, what is the form of the matrix of $T$ in real Jordan form?
(d) $A\in M_{2\times 2}(\mathbb R)$ is called a rotation matrix if there exists $\theta\in[0,2\pi)$ such that
$$A = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\,.$$
If $T$ is orthogonal, show that there exists a basis $\beta$ of $V$ such that $[T]_\beta$ is block diagonal, and the blocks are either $2\times 2$ rotation matrices or $1\times 1$ matrices consisting of 1 or $-1$.
Hint. Use the real Jordan form.