University of New South Wales School of Mathematics

MATH2601 Higher Linear Algebra

6. THE JORDAN CANONICAL FORM

Problem. Find a formula for $A^n$, where
$$A = \begin{pmatrix} 2 & 9 \\ -1 & 8 \end{pmatrix}.$$
We shall follow the standard procedure for this sort of problem: find the
eigenvalues $\lambda_1, \lambda_2$ and corresponding eigenvectors $v_1, v_2$ of $A$; set
$$D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix} v_1 & v_2 \end{pmatrix};$$
then $A = PDP^{-1}$ and
$$A^n = (PDP^{-1}) \cdots (PDP^{-1}) = P D^n P^{-1},$$
where $D^n$ is easy to calculate. So,
$$\det(A - \lambda I) = \lambda^2 - 10\lambda + 25 = (\lambda - 5)^2$$
and
$$A - 5I = \begin{pmatrix} -3 & 9 \\ -1 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & -3 \\ 0 & 0 \end{pmatrix},$$
which shows that we have
$$\lambda = 5 \quad\text{with}\quad v = t \begin{pmatrix} 3 \\ 1 \end{pmatrix}.$$
Since $A$ has only one (independent) eigenvector, the method fails.

What can we do? Well, as we don't have enough eigenvectors,
perhaps other vectors would work instead. Remember that we need
something like $Av = 5v$. Let's try a simple vector like $v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$; then
$$Av_1 = \begin{pmatrix} 2 \\ -1 \end{pmatrix} = 5v_1 + \begin{pmatrix} -3 \\ -1 \end{pmatrix}.$$
We notice that the remainder vector is an eigenvector of $A$! Writing
$v_2 = \begin{pmatrix} -3 \\ -1 \end{pmatrix}$, we have
$$Av_1 = 5v_1 + v_2, \qquad Av_2 = 5v_2.$$
These equations are very similar to eigenvalue equations, so perhaps
they will give something close to a diagonalisation of $A$. We could try
$$A \begin{pmatrix} v_1 & v_2 \end{pmatrix} = \begin{pmatrix} 5v_1 + v_2 & 5v_2 \end{pmatrix} = \begin{pmatrix} v_1 & v_2 \end{pmatrix} \begin{pmatrix} 5 & 0 \\ 1 & 5 \end{pmatrix}$$
or
$$A \begin{pmatrix} v_2 & v_1 \end{pmatrix} = \begin{pmatrix} 5v_2 & 5v_1 + v_2 \end{pmatrix} = \begin{pmatrix} v_2 & v_1 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ 0 & 5 \end{pmatrix}.$$
We see that $A$ is related to two matrices which are almost diagonal.
The two possibilities lead to (more or less) the same theory, and it is
customary to choose the second, which gives an element 1 above the
diagonal. So, set
$$P = \begin{pmatrix} v_2 & v_1 \end{pmatrix} = \begin{pmatrix} -3 & 1 \\ -1 & 0 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 5 & 1 \\ 0 & 5 \end{pmatrix}.$$
Then $AP = PJ$ and it is clear (at least, in this case) that $P$ is invertible, so $A = PJP^{-1}$. Does this help us to calculate $A^n$? As in the
diagonalisable case,
$$A^n = (PJP^{-1}) \cdots (PJP^{-1}) = P J^n P^{-1},$$
so the real problem is to find $J^n$. But if we calculate the values of $J^2$, $J^3$
and perhaps one or two more powers, it is easy to guess and then prove
by induction that
$$J^n = \begin{pmatrix} 5^n & n\,5^{n-1} \\ 0 & 5^n \end{pmatrix}.$$
So we can solve the problem posed above:
$$A^n = P J^n P^{-1} = \begin{pmatrix} -3 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 5^n & n\,5^{n-1} \\ 0 & 5^n \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & -3 \end{pmatrix} = \begin{pmatrix} 5^n - 3n\,5^{n-1} & 9n\,5^{n-1} \\ -n\,5^{n-1} & 5^n + 3n\,5^{n-1} \end{pmatrix}.$$
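As a quick sanity check, the closed form for $A^n$ can be verified numerically; this is a minimal numpy sketch (an illustrative addition, not part of the original notes), using the matrix and formula derived above.

```python
import numpy as np

A = np.array([[2.0, 9.0],
              [-1.0, 8.0]])

def A_power_formula(n):
    # Closed form A^n = P J^n P^{-1} derived above.
    c = n * 5.0**(n - 1)
    return np.array([[5.0**n - 3*c, 9*c],
                     [-c, 5.0**n + 3*c]])

n = 6
assert np.allclose(np.linalg.matrix_power(A, n), A_power_formula(n))
```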

The matrix $J$ above is an example of a Jordan form. We shall
prove in this chapter that every $n \times n$ matrix over $\mathbb{C}$ is similar to a
Jordan form, and shall show how to find the Jordan form of a given
matrix. Moreover, each matrix is similar to only one Jordan form (with
certain reservations), and so the Jordan form provides a definite test for
similarity of two matrices. In Chapter 7 we shall show how the Jordan
form can be used in solving systems of simultaneous linear differential
equations. We begin with an important lemma.

Lemma. Schur's Lemma. Let $A$ be an $n \times n$ matrix over $\mathbb{C}$. Then there
exists an $n \times n$ unitary matrix $Q$ such that $Q^* A Q$ is upper triangular.
The diagonal of this upper triangular matrix consists of the eigenvalues
of $A$, repeated according to multiplicity.

Proof. The result is obvious for $n = 1$, and we proceed by induction
in very much the same way as we did in proving the spectral theorem
for real transformations (chapter 5, page 12). Suppose that $n > 1$ and
let $A$ be an $n \times n$ matrix over $\mathbb{C}$; let $\lambda$ be an eigenvalue of $A$ and
let $B = \{ e_1, \ldots, e_m \}$ be an orthonormal basis for the corresponding
eigenspace. Extend $B$ to an orthonormal basis $B'$ for $\mathbb{C}^n$. The matrix
of $A$ with respect to $B'$ is
$$Q_1^* A Q_1 = \begin{pmatrix} \lambda I_m & A_1 \\ 0 & A_2 \end{pmatrix},$$
where $Q_1$ is unitary because its columns are the (orthonormal) vectors
from the basis $B'$. Now if $m = n$ there is nothing more to prove; if $m < n$
then we apply the inductive hypothesis to the $(n-m) \times (n-m)$ matrix
$A_2$. Thus $Q_2^* A_2 Q_2 = U$ for some $(n-m) \times (n-m)$ unitary matrix $Q_2$
and some upper triangular matrix $U$. If we now define the $n \times n$ matrix
$$Q = Q_1 \begin{pmatrix} I_m & 0 \\ 0 & Q_2 \end{pmatrix},$$
then $Q$ is unitary, and
$$Q^* A Q = \begin{pmatrix} I_m & 0 \\ 0 & Q_2^* \end{pmatrix} \begin{pmatrix} \lambda I_m & A_1 \\ 0 & A_2 \end{pmatrix} \begin{pmatrix} I_m & 0 \\ 0 & Q_2 \end{pmatrix} = \begin{pmatrix} \lambda I_m & A_1 Q_2 \\ 0 & U \end{pmatrix},$$
which is upper triangular, and the existence of such a matrix for all $n$
follows by induction. The diagonal elements are the eigenvalues of $A$
because similar matrices have the same characteristic polynomial.

Alternatively, Schur's Lemma can be stated and proved in terms of
transformations rather than matrices; in this formulation it resembles
our proof of the spectral theorem for normal transformations.

Lemma. Let $T$ be a linear transformation on a finite-dimensional inner
product space $V$ over $\mathbb{C}$. Then there is an orthonormal basis of $V$
with respect to which the matrix of $T$ is upper triangular and has the
eigenvalues of $T$, repeated according to multiplicity, on its diagonal.

Sketch of proof. The result is clear for $\dim V = 1$; assume it is true
whenever $\dim V < n$, and let $T$ be a transformation on an $n$-dimensional
complex inner product space $V$. Let $W = E_\lambda$ be an eigenspace of $T$
having an orthonormal basis $B_1 = \{ e_1, \ldots, e_m \}$. Consider the map
$$S : W^\perp \to W^\perp \quad\text{where}\quad S(v) = \mathrm{proj}_{W^\perp}(T(v)).$$
By induction we may assume that $W^\perp$ has an orthonormal basis $B_2 = \{ e_{m+1}, \ldots, e_n \}$ with respect to which the matrix $U$ of $S$ is upper triangular; then $B = B_1 \cup B_2$ is an orthonormal basis for $V$, and we find
the matrix of $T$ with respect to this basis. The first $m$ columns are left
as an exercise. For the others, since $S(e_{m+j})$ is the projection of $T(e_{m+j})$ onto $W^\perp$, we have
$$T(e_{m+j}) = S(e_{m+j}) + w \quad\text{with}\quad w \in W;$$
if we find the coordinate vectors of the two terms on the right hand side
with respect to $B$ then the first will end with the $j$th column of $U$, while
the second will end with $n - m$ zeros. Thus the matrix of $T$ with respect
to $B$ has the form
$$\begin{pmatrix} \lambda I_m & A \\ 0 & U \end{pmatrix},$$
which is upper triangular.
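Schur's Lemma corresponds to the Schur decomposition available in numerical libraries; here is a minimal scipy illustration (an assumed tooling choice, not part of the notes), applied to the matrix from the opening example.

```python
import numpy as np
from scipy.linalg import schur

# The matrix from the opening example; its only eigenvalue is 5.
A = np.array([[2.0, 9.0],
              [-1.0, 8.0]])

# output='complex' gives A = Z T Z*, with T upper triangular and the
# eigenvalues of A on the diagonal of T.
T, Z = schur(A, output='complex')

assert np.allclose(Z @ T @ Z.conj().T, A)
print(np.diag(T))   # both diagonal entries are (close to) 5
```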

Comment. Consider the significance of this lemma. For many applications, we would like to diagonalise a matrix $A$, making it similar to a
matrix with the eigenvalues of $A$ on the diagonal and zeros elsewhere.
We know, however, that this is not always possible. Schur's lemma
shows us that it is always possible to make $A$ similar to a matrix with
the eigenvalues of $A$ on the diagonal and zeros below the diagonal. This
is a first step on the way to the Jordan form, which allows us to further
specify the elements above the diagonal. Here is our aim:
Definition. The direct sum of square matrices $A_1, A_2, \ldots, A_r$ is the
matrix with $A_1, A_2, \ldots, A_r$ on the diagonal and zeros elsewhere,
$$A_1 \oplus A_2 \oplus \cdots \oplus A_r = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix}.$$

Examples. Note that the $A_j$ need not all be the same size:
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \oplus \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 3 & 4 & 0 & 0 \\ 0 & 0 & 5 & 6 \\ 0 & 0 & 7 & 8 \end{pmatrix};$$
$$(2) \oplus \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \oplus (4) = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$

Definition. A Jordan block is a $k \times k$ matrix of the form
$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}.$$
A Jordan matrix is a direct sum of Jordan blocks. (Camille Jordan, 1838-1922.)
Thus, our matrix $J$ from page 2 is a Jordan matrix consisting of a single
Jordan block $J_2(5)$.

Matrix polynomials. Let $\mathbb{F}$ be a field. We write $\mathbb{F}[z]$ for the set of all
polynomials in one variable $z$ which have coefficients in the field $\mathbb{F}$. If
$f(z) = f_0 + f_1 z + \cdots + f_d z^d$ is a polynomial in $\mathbb{F}[z]$ and $A$ is an $n \times n$
matrix over $\mathbb{F}$, then $f(A)$ is the $n \times n$ matrix
$$f(A) = f_0 I + f_1 A + \cdots + f_d A^d.$$
If $A$ is fixed, the set of all such matrices is denoted $\mathbb{F}[A]$. That is,
$$\mathbb{F}[A] = \{ f(A) \mid f \text{ is a polynomial} \}.$$
Lemma. Properties of matrix polynomials. If $f, g$ are polynomials,
$\alpha$ is a scalar and $A$ is an $n \times n$ matrix, then
$(f + g)(A) = f(A) + g(A)$;
$(\alpha f)(A) = \alpha f(A)$;
$(z^p f)(A) = A^p f(A)$;
$(gf)(A) = g(A)f(A)$;
$f(A)$ and $g(A)$ commute;
$Af(A) = f(A)A$.
Moreover, if $v$ is a vector in the eigenspace $E_\lambda$ of $A$, then
$$f(A)v = f(\lambda)v.$$
Finally, if $P$ is invertible and $A = PBP^{-1}$, then
$$f(A) = Pf(B)P^{-1}.$$
Proof. Follows directly from the definition.
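As an aside, a matrix polynomial $f(A)$ can be evaluated directly from the definition; here is a small numpy sketch (an illustrative addition, using Horner's rule and the matrix and polynomial that appear in the Cayley-Hamilton example below).

```python
import numpy as np

def poly_at_matrix(coeffs, A):
    """Evaluate f(A) = f0*I + f1*A + ... + fd*A^d by Horner's rule.
    coeffs = [f0, f1, ..., fd]."""
    n = A.shape[0]
    result = np.zeros_like(A, dtype=float)
    for c in reversed(coeffs):
        result = result @ A + c * np.eye(n)
    return result

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# f(z) = z^2 - 5z - 2 is the characteristic polynomial of this A (see below).
print(poly_at_matrix([-2.0, -5.0, 1.0], A))   # prints the zero matrix
```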

Corollary. For any $n \times n$ matrix $A$ over $\mathbb{F}$, the set $\mathbb{F}[A]$ is a vector
subspace of $M_{n \times n}(\mathbb{F})$.
Proof. The first two parts of the lemma show that $\mathbb{F}[A]$ is closed under
addition and scalar multiplication. But it is clear that $\mathbb{F}[A]$ is a non-empty
subset of $M_{n \times n}(\mathbb{F})$, so $\mathbb{F}[A]$ is a subspace of this set.

Problem. What is the dimension of the vector space $\mathbb{C}[A]$? Since this
space contains all possible (complex) polynomial expressions in $A$, a
spanning set is
$$\{ I, A, A^2, A^3, \ldots \}.$$
There is no immediately obvious finite spanning set for $\mathbb{C}[A]$, so it might
appear that the space is infinite-dimensional. However, the previous result shows that this is not so: $M_{n \times n}(\mathbb{C})$ has dimension $n^2$, so the subspace
$\mathbb{C}[A]$ must have dimension $n^2$ or less. In fact, much more than this is
true!

Theorem. The Cayley-Hamilton Theorem. (Arthur Cayley, 1821-1895.)
Let $A$ be an $n \times n$ complex matrix, and let $p(z)$
be the characteristic polynomial of $A$. Then
$p(A)$ is the zero matrix.
Proof. By Schur's Lemma $A = QUQ^*$ for
some upper triangular $U$, and so by the last
part of the lemma on page 6 we have
$$p(A) = Q\, p(U)\, Q^*.$$
So it suffices to prove that $p(U) = 0$. Now $A$ and $U$ have the same
eigenvalues, and $p(z)$ is given in terms of the eigenvalues by
$$p(z) = (z - \lambda_1) \cdots (z - \lambda_n).$$
Using the fourth property of the lemma on page 6, we have
$$p(U) = (U - \lambda_1 I) \cdots (U - \lambda_n I).$$
Now let $V_j = \mathrm{span}\{ e_1, \ldots, e_j \}$ be the span of the first $j$ standard basis
vectors in $\mathbb{C}^n$. If $j \leq k$ then
$$(U - \lambda_k I)e_j = \{ j\text{th column of } U - \lambda_k I \} \in V_{k-1},$$
and therefore $(U - \lambda_k I)(V_k) \subseteq V_{k-1}$. Applying this result repeatedly,
$$p(U)(\mathbb{C}^n) = (U - \lambda_1 I) \cdots (U - \lambda_n I)(V_n) \subseteq (U - \lambda_1 I) \cdots (U - \lambda_{n-1} I)(V_{n-1}) \subseteq \cdots \subseteq (U - \lambda_1 I)(V_1) = \{0\}.$$
That is, $p(U)$ maps every element of $\mathbb{C}^n$ to the zero vector, and so $p(U)$
is the zero matrix.

Example. The characteristic polynomial of $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ is
$$p(z) = \det(zI - A) = z^2 - 5z - 2,$$
and we can calculate
$$p(A) = A^2 - 5A - 2I = \begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix} - 5\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 0.$$

Exercise. With $A$ as above, find $A^5$ and $A^{-1}$ as linear combinations of
$A$ and $I$.
Solution. By using the characteristic equation we have $A^2 = 5A + 2I$
and so
$$A^3 = 5A^2 + 2A = 27A + 10I,$$
$$A^4 = 27A^2 + 10A = 145A + 54I,$$
$$A^5 = 145A^2 + 54A = 779A + 290I.$$
Moreover, $2I = A^2 - 5A = (A - 5I)A$ and so
$$A^{-1} = \tfrac{1}{2}(A - 5I).$$

By considering this exercise, we see that if $A$ is $n \times n$, then $A^n$ and each
higher power can be written as a linear combination of lower powers.
Corollary. If $A$ is an $n \times n$ complex matrix, then
$$\{ I, A, A^2, \ldots, A^{n-1} \}$$
is a spanning set for $\mathbb{C}[A]$, and hence $\dim \mathbb{C}[A] \leq n$.
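A quick sympy check of the example and exercise above (a minimal sketch, not part of the notes):

```python
from sympy import Matrix, symbols, eye

z = symbols('z')
A = Matrix([[1, 2], [3, 4]])

print(A.charpoly(z).as_expr())                  # z**2 - 5*z - 2
assert (A**2 - 5*A - 2*eye(2)) == Matrix.zeros(2, 2)   # Cayley-Hamilton
assert A.inv() == (A - 5*eye(2)) / 2                   # A^{-1} = (A - 5I)/2
```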
Lemma. The division algorithm for polynomials. Let $\mathbb{F}$ be a field and
let $f, g$ be in $\mathbb{F}[z]$. If $g$ is not the zero polynomial then there exists a
unique pair of polynomials $q, r$ in $\mathbb{F}[z]$ such that
$$f = qg + r$$
and either $r = 0$ or $\deg r < \deg g$.

Comment. There are various conventions in the literature regarding
the degree of the zero polynomial: sometimes it is said to have degree
$-\infty$, sometimes $-1$. For this course we shall say that the degree of the
zero polynomial is undefined.
Proof. The existence part, in effect, states the possibility of long (or
short) division of polynomials, which is familiar to you from school. Let's
prove it anyway. Write
$$g = g_d z^d + g_{d-1} z^{d-1} + \cdots + g_0$$
with $g_d \neq 0$, and consider the set of polynomials
$$R = \{ f - qg \mid q \in \mathbb{F}[z] \}.$$
If $R$ contains the zero polynomial then $f = qg$ for some $q$ and we are
done. Otherwise let
$$f - qg = r = r_e z^e + r_{e-1} z^{e-1} + \cdots + r_0$$
be a polynomial of smallest possible degree in $R$. If $e \geq d$ then
$$f - \Bigl( q + \frac{r_e}{g_d} z^{e-d} \Bigr) g = r - (r_e z^e + \cdots)$$
is a polynomial in $R$ having degree smaller than that of $r$, which is
impossible. Therefore $e < d$ and we have found the required $q, r$. To
prove uniqueness, suppose that
$$f = q_1 g + r_1 = q_2 g + r_2,$$
where the pairs $q_1, r_1$ and $q_2, r_2$ both satisfy the conditions of the theorem. Then
$$(q_1 - q_2)g = r_2 - r_1;$$
consequently $q_1 - q_2 = 0$ and $r_2 - r_1 = 0$, as otherwise the right hand side
has smaller degree than the left hand side. This completes the proof.

By applying the division algorithm repeatedly we obtain the Euclidean algorithm for polynomials. For example, let
$$f = 2z^5 - 3z^4 + 6z^3 - 7z^2 + 6 \quad\text{and}\quad g = z^4 - z^3 + 2z^2 - 3z - 3.$$
Then
$$f = (2z - 1)g + r_1 \quad\text{where } r_1 = z^3 + z^2 + 3z + 3,$$
$$g = (z - 2)r_1 + r_2 \quad\text{where } r_2 = z^2 + 3,$$
$$r_1 = (z + 1)r_2.$$

As we have done in $\mathbb{Z}$, we can use this both to prove the existence
of greatest common divisors for polynomials, and to calculate them in
practice. Note that $\leq$ makes no sense for polynomials, so we have to
define greatest common divisors in a slightly different way.
Definition. Let $f_1, f_2$ be polynomials over a field $\mathbb{F}$. We say that $f_2$ is
a factor of $f_1$, written $f_2 \mid f_1$, if $f_1 = qf_2$ for some $q \in \mathbb{F}[z]$. If $f_1$ and $f_2$
are not both the zero polynomial, then a greatest common divisor
of $f_1$ and $f_2$ is a polynomial $g$ such that
$g \mid f_1$ and $g \mid f_2$;
if $d \in \mathbb{F}[z]$ and $d \mid f_1$ and $d \mid f_2$, then $d \mid g$.
Lemma. Existence and uniqueness of the greatest common divisor.
Any two polynomials $f_1$ and $f_2$, not both zero, have a unique monic
greatest common divisor.
Proof. By symmetry we may assume that $f_2 \neq 0$; write $f_2 = r_0$ and
apply the division algorithm repeatedly:
$$f_1 = q_1 r_0 + r_1,\quad r_0 = q_2 r_1 + r_2,\quad \ldots,\quad r_{n-2} = q_n r_{n-1} + r_n,\quad r_{n-1} = q_{n+1} r_n. \qquad (\ast)$$
Note that as long as $r_k \neq 0$ we have $\deg r_k < \deg r_{k-1}$; this cannot continue indefinitely, so at some stage we must have $r_{n+1} = 0$, as indicated
in this calculation. Looking at the last equation and working backwards
we have
$$r_n \mid r_{n-1} \;\Rightarrow\; r_n \mid r_{n-2} \;\Rightarrow\; \cdots \;\Rightarrow\; r_n \mid f_2 \text{ and } r_n \mid f_1.$$
On the other hand, if we start at the beginning and work forwards, we
have
$$d \mid f_1 \text{ and } d \mid f_2 \;\Rightarrow\; d \mid r_1 \;\Rightarrow\; \cdots \;\Rightarrow\; d \mid r_n.$$
By definition, $r_n$ is a greatest common divisor of $f_1$ and $f_2$; dividing $r_n$
by its leading coefficient gives a monic greatest common divisor. If $f_1, f_2$
have monic greatest common divisors $g_1$ and $g_2$, then from the second
part of the definition we have $g_1 \mid g_2$ and $g_2 \mid g_1$; since they are monic,
this implies that $g_1 = g_2$. The proof is complete.
Comment. We see from this proof that a greatest common divisor of
$f_1, f_2$ can be found as the last divisor, or the last non-zero remainder,
in the Euclidean algorithm.
Definition. If $f_1$ and $f_2$ are not both zero, we write $\gcd(f_1, f_2)$ for the
(unique) monic greatest common divisor of $f_1$ and $f_2$. We say that $f_1$
and $f_2$ are coprime or relatively prime if $\gcd(f_1, f_2) = 1$.
Theorem. Bezout's identity. Let $f_1, f_2$, not both zero, be in $\mathbb{F}[z]$. Then
there exist $a_1, a_2 \in \mathbb{F}[z]$ such that $a_1 f_1 + a_2 f_2 = \gcd(f_1, f_2)$.
Proof. Consider the set
$$L = \{ a_1 f_1 + a_2 f_2 \mid a_1, a_2 \in \mathbb{F}[z] \}$$
and let $g$ be a monic polynomial of smallest possible degree in $L$. As $g$ is
in $L$ it can be written as $g = a_1 f_1 + a_2 f_2$; we claim that $g = \gcd(f_1, f_2)$.
First, use the division algorithm to write $f_1 = qg + r$; then
$$r = f_1 - q(a_1 f_1 + a_2 f_2) = (1 - qa_1)f_1 - (qa_2)f_2 \in L.$$
By the requirement that $g$ have minimal degree, it is impossible that
$\deg r < \deg g$; so $r = 0$. This shows that $g \mid f_1$, and similarly $g \mid f_2$. If
$d$ is any common factor of $f_1$ and $f_2$ then $d \mid a_1 f_1 + a_2 f_2$, that is, $d \mid g$.
By definition $g = \gcd(f_1, f_2)$, and the theorem is proved.

Alternative proof (sketch). Consider again the Euclidean algorithm
$(\ast)$ on page 11. It is clear that $f_1$ and $f_2$ are in the set $L$ defined above;
the first equation of the algorithm shows that $r_1 = f_1 - q_1 r_0$ is in $L$; and,
in general, if two successive remainders can be written as $b_1 f_1 + c_1 f_2$ (the earlier) and $b_0 f_1 + c_0 f_2$ (the later), then the next remainder is
$$(b_1 f_1 + c_1 f_2) - q(b_0 f_1 + c_0 f_2) = (b_1 - qb_0)f_1 + (c_1 - qc_0)f_2,$$
which is again in $L$; and so on. Eventually we find that $r_n$ is in $L$; and we already
know that $\gcd(f_1, f_2)$ is a constant times $r_n$.

Example and comment. Let
$$f = 2z^5 - 3z^4 + 6z^3 - 7z^2 + 6 \quad\text{and}\quad g = z^4 - z^3 + 2z^2 - 3z - 3$$
as on page 9. From the Euclidean algorithm on page 10 we have
$$\gcd(f, g) = r_2 = z^2 + 3;$$
running the Euclidean algorithm backwards, we have
$$z^2 + 3 = g - (z - 2)r_1 = g - (z - 2)\bigl(f - (2z - 1)g\bigr) = -(z - 2)f + (2z^2 - 5z + 3)g,$$
which exhibits the greatest common divisor as a linear combination of $f$ and $g$.

If you recall the proofs of existence of gcds and of Bezout's identity,
and the associated calculation techniques, for integers, you will see
that the corresponding material for polynomials is very similar.
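The Euclidean algorithm and the Bezout combination above can be reproduced mechanically; the following sympy sketch (an illustrative addition, not part of the notes) tracks each remainder as a combination $a f + b g$.

```python
from sympy import symbols, div, expand, LC

z = symbols('z')
f = 2*z**5 - 3*z**4 + 6*z**3 - 7*z**2 + 6
g = z**4 - z**3 + 2*z**2 - 3*z - 3

# Extended Euclidean algorithm: keep each remainder as a combination a*f + b*g.
r0, a0, b0 = f, 1, 0
r1, a1, b1 = g, 0, 1
while r1 != 0:
    q, r = div(r0, r1, z)
    r0, a0, b0, r1, a1, b1 = r1, a1, b1, r, expand(a0 - q*a1), expand(b0 - q*b1)

lc = LC(r0, z)                                   # make the gcd monic
gcd_, a, b = expand(r0/lc), expand(a0/lc), expand(b0/lc)
print(gcd_)                                      # z**2 + 3
print(a, b)                                      # 2 - z and 2*z**2 - 5*z + 3, as found above
assert expand(a*f + b*g - gcd_) == 0
```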

We now return to polynomials of matrices.

Lemma. The minimal polynomial. Let $A \in M_{n \times n}(\mathbb{C})$, and let $m$ be
a non-zero polynomial of smallest possible degree in $\mathbb{C}[z]$ such that
$m(A) = 0$. If $f \in \mathbb{C}[z]$ and $f(A) = 0$, then $m \mid f$. In particular, $m$
is a factor of the characteristic polynomial of $A$.

Proof (sketch). If $f = qm + r$ then $r(A) = f(A) - q(A)m(A) = 0$. But
$\deg r < \deg m$ is not possible, so we have $r = 0$ and hence $m \mid f$.
Definition. Let $A$ be an $n \times n$ complex matrix. The minimal polynomial of $A$ is the unique (non-zero) monic polynomial $m$ of smallest
degree such that $m(A) = 0$.
Example. We find a minimal polynomial by trial and error: we shall
see much better methods later. Let
$$A = \begin{pmatrix} 1 & 3 & 3 \\ -4 & -7 & -6 \\ 2 & 3 & 2 \end{pmatrix}.$$
Since $A$ is not a scalar multiple of $I$, no linear polynomial $f = f_0 + f_1 z$
can satisfy $f(A) = 0$. However,
$$A^2 = \begin{pmatrix} -5 & -9 & -9 \\ 12 & 19 & 18 \\ -6 & -9 & -8 \end{pmatrix},$$
and we see that $A^2 = -3A - 2I$. So the minimal polynomial is the
quadratic $m(z) = z^2 + 3z + 2$.
Lemma. Let $A$ be an $n \times n$ matrix over $\mathbb{C}$. Suppose that $\lambda_1, \ldots, \lambda_s$ are
the distinct eigenvalues of $A$ and that they have algebraic multiplicities
$a_1, \ldots, a_s$ respectively. Then the minimal polynomial of $A$ is
$$m(z) = (z - \lambda_1)^{b_1} \cdots (z - \lambda_s)^{b_s}$$
for some exponents $b_k$ with $1 \leq b_k \leq a_k$.
Proof. Since we already know that $A$ has characteristic polynomial
$$p(z) = (z - \lambda_1)^{a_1} \cdots (z - \lambda_s)^{a_s}$$
and that $m(z)$ is a factor of this, the only thing left to prove is that $b_k$
can never be zero. To do this, let $v$ be an eigenvector of $A$ corresponding
to $\lambda_k$. Using one of the properties from page 6, we have
$$0 = m(A)v = m(\lambda_k)v \quad\text{and so}\quad m(\lambda_k) = 0,$$
which shows that $z - \lambda_k$ is a factor of $m(z)$ and hence $b_k \geq 1$.
Example. We use the lemma to give an alternative solution for the
previous example. (There are still better solutions to come!) Using
standard methods, we find that the characteristic polynomial of $A$ is
$$p(z) = z^3 + 4z^2 + 5z + 2 = (z + 1)^2(z + 2),$$
and so the minimal polynomial must be
$$m_1(z) = (z + 1)(z + 2) \quad\text{or}\quad m_2(z) = (z + 1)^2(z + 2).$$
If $m_1(A) = 0$ then the minimal polynomial is $m_1$; if not, then it is
$m_2$. Direct calculation shows that $(A + I)(A + 2I) = 0$, so the minimal
polynomial is $(z + 1)(z + 2)$, confirming our previous result.
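A short sympy check of this example (illustrative only, using the matrix above):

```python
from sympy import Matrix, eye, symbols, factor

z = symbols('z')
A = Matrix([[1, 3, 3],
            [-4, -7, -6],
            [2, 3, 2]])

print(factor(A.charpoly(z).as_expr()))            # (z + 1)**2*(z + 2)
# Trial test of the candidate m1(z) = (z + 1)(z + 2):
assert (A + eye(3)) * (A + 2*eye(3)) == Matrix.zeros(3, 3)
```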

Theorem. The primary decomposition theorem. Let $f$ be a polynomial
over $\mathbb{F}$, and suppose that $f = f_1 f_2$ where $f_1$ and $f_2$ are coprime. For
any $A \in M_{n \times n}(\mathbb{F})$ we have
$$\ker(f(A)) = \ker(f_1(A)) \oplus \ker(f_2(A)).$$
Proof. Since $f_1$ and $f_2$ are coprime, there exist polynomials $a_1$ and $a_2$
such that $a_1(z)f_1(z) + a_2(z)f_2(z) = 1$ for all $z$; evaluating both sides at
the matrix $A$, we have
$$a_1(A)f_1(A) + a_2(A)f_2(A) = I.$$
Therefore any $v \in \ker(f(A))$ can be written as
$$v = Iv = a_1(A)f_1(A)v + a_2(A)f_2(A)v.$$
But since matrix polynomials (in the same matrix) commute, we have
$$f_2(A)\bigl(a_1(A)f_1(A)v\bigr) = a_1(A)\bigl(f_1(A)f_2(A)v\bigr) = a_1(A)f(A)v = 0.$$
That is, $a_1(A)f_1(A)v$ is in $\ker(f_2(A))$; similarly, $a_2(A)f_2(A)v$ is in
$\ker(f_1(A))$; so $v$ is the sum of a vector in $\ker(f_1(A))$ and a vector in
$\ker(f_2(A))$, and we have proved that
$$\ker(f(A)) = \ker(f_1(A)) + \ker(f_2(A)). \qquad (\dagger)$$
On the other hand, if $v$ is in both $\ker(f_1(A))$ and $\ker(f_2(A))$ then
$$v = Iv = a_1(A)f_1(A)v + a_2(A)f_2(A)v = 0;$$
that is, $\ker(f_1(A))$ and $\ker(f_2(A))$ have zero intersection; so the sum $(\dagger)$
is direct, and the proof is finished.
Exercise. Confirm the obvious inductive extension of this result: if
$f = f_1 \cdots f_s$ and the factors $f_k$ are pairwise coprime (that is, there is
no common factor of any two), then
$$\ker(f(A)) = \ker(f_1(A)) \oplus \cdots \oplus \ker(f_s(A)).$$
Note that this is stronger than saying that there is no common
factor of all $s$ polynomials.
We shall apply the above exercise to the characteristic polynomial $p(z)$
of a matrix, factorised into powers of linear polynomials,
$$p(z) = (z - \lambda_1)^{a_1} \cdots (z - \lambda_s)^{a_s}.$$
But first we introduce some terminology for the kernels involved in the
direct sum.
Definition. Let $A$ be an $n \times n$ complex matrix; let $\lambda$ be an eigenvalue
of $A$ having algebraic multiplicity $a$. The generalised eigenspace of
$A$ corresponding to $\lambda$ is
$$GE_\lambda = \ker((A - \lambda I)^a).$$
The non-zero elements of $GE_\lambda$ are called generalised eigenvectors.
Lemma. Properties of generalised eigenspaces. Let $A \in M_{n \times n}(\mathbb{C})$;
suppose that $\lambda_1, \ldots, \lambda_s$ are the distinct eigenvalues of $A$ and that they
have algebraic multiplicities $a_1, \ldots, a_s$ respectively. Then
1. every $GE_{\lambda_k}$ is invariant under $A$;
2. $\mathbb{C}^n = GE_{\lambda_1} \oplus \cdots \oplus GE_{\lambda_s}$;
3. $\mathbb{C}^n$ has a basis $B$ consisting of generalised eigenvectors of $A$ such
that the matrix of $A$ with respect to $B$ has the form
$$A_1 \oplus A_2 \oplus \cdots \oplus A_s,$$
where $A_k$ is an upper triangular matrix of size $a_k \times a_k$, with all
diagonal elements equal to $\lambda_k$;
4. $GE_{\lambda_k}$ has dimension equal to the algebraic multiplicity $a_k$.
Comment. Two important things are assured by this lemma. Firstly,
though we have seen that the dimension of an eigenspace may be less
than the corresponding algebraic multiplicity, this is not so for generalised eigenspaces: each generalised eigenspace has dimension equal to
the algebraic multiplicity of the corresponding eigenvalue. So, at the expense of introducing some extra complications, we have overcome one of
the obstacles to diagonalisation in the general case. Secondly, although
we cannot always find a basis of $\mathbb{C}^n$ consisting of eigenvectors of a given
matrix $A$, we can always find a basis (moreover, quite a special one)
consisting of generalised eigenvectors.
Proof of the lemma. For the first result we use the fact that the matrices
$A$ and $(A - \lambda I)^a$ commute (exercise: give two different proofs of this).
Therefore
$$v \in GE_\lambda \;\Rightarrow\; (A - \lambda I)^a v = 0 \;\Rightarrow\; (A - \lambda I)^a Av = A(A - \lambda I)^a v = 0 \;\Rightarrow\; Av \in GE_\lambda.$$
For the second we apply the primary decomposition theorem (page 12)
or its inductive extension (page 13) to $p(z)$, the characteristic polynomial
of $A$. We have (by definition of algebraic multiplicity) that
$$p(z) = (z - \lambda_1)^{a_1} \cdots (z - \lambda_s)^{a_s};$$
the factors $(z - \lambda_k)^{a_k}$ are coprime in pairs and therefore
$$\ker(p(A)) = \ker((A - \lambda_1 I)^{a_1}) \oplus \cdots \oplus \ker((A - \lambda_s I)^{a_s}) = GE_{\lambda_1} \oplus \cdots \oplus GE_{\lambda_s}.$$
By the Cayley-Hamilton theorem we have $p(A) = 0$, so $\ker(p(A)) = \mathbb{C}^n$,
and this completes the proof of the second claim.
For the third claim, recall that each generalised eigenspace $GE_{\lambda_k}$ is
invariant under $A$; so the map
$$T_k : GE_{\lambda_k} \to GE_{\lambda_k} \quad\text{where}\quad T_k(v) = Av$$
is well-defined; and by Schur's lemma there is a basis $B_k$ of $GE_{\lambda_k}$ such
that the matrix $A_k$ of $T_k$ with respect to $B_k$ is upper triangular. Let
$B = \{ b_1, \ldots \}$ be the union of the bases $B_k$; this is a basis for $\mathbb{C}^n$.
Suppose that the basis vectors from $GE_{\lambda_k}$ are $b_{r+1}, \ldots, b_{r+m_k}$. Then
$[Ab_{r+j}]_{B_k}$ is the $j$th column of $A_k$. This is the $k$th section of $[Ab_{r+j}]_B$;
and all other sections are zero because $Ab_{r+j} \in GE_{\lambda_k}$. So the matrix
of $A$ with respect to $B$ is
$$M = A_1 \oplus A_2 \oplus \cdots \oplus A_s.$$
Now each $T_k$ has eigenvalue $\lambda_k$ only (because different generalised eigenspaces have zero intersection), and so the diagonal of $M$ contains these
eigenvalues, with $\lambda_k$ occurring $m_k$ times, where $m_k = \dim GE_{\lambda_k}$. But
$M$ is similar to $A$; therefore $m_k = a_k$ for each $k$. This completes the
proof of part 3 of the lemma, and also proves part 4.
Corollary. If the $n \times n$ matrix $A$ has only one eigenvalue $\lambda$, then
$GE_\lambda = \mathbb{C}^n$.
Lemma. Computing generalised eigenspaces. Let $\lambda$ be an eigenvalue of
an $n \times n$ matrix $A$; for $k = 0, 1, 2, \ldots$ write
$$d_k = \mathrm{nullity}(A - \lambda I)^k = \dim \ker(A - \lambda I)^k.$$
Then
$\ker(A - \lambda I) \subseteq \ker(A - \lambda I)^2 \subseteq \ker(A - \lambda I)^3 \subseteq \cdots \subseteq \mathbb{C}^n$;
$0 = d_0 \leq d_1 \leq d_2 \leq d_3 \leq \cdots \leq n$;
$d_1 - d_0 \geq d_2 - d_1 \geq d_3 - d_2 \geq \cdots$;
$\ker(A - \lambda I)^m = \ker(A - \lambda I)^{m+1}$ for some positive integer $m \leq a$,
where $a$ is the algebraic multiplicity of $\lambda$; and for any such $m$ we
have $GE_\lambda = \ker(A - \lambda I)^m$.

Comment. The second and third results of this lemma will be very
useful later in avoiding hard work. They show that if we write down
the sequence of nullities, then the nullities themselves must be non-decreasing, while their differences must be non-increasing. For example,
$$5, 8, 11, 12, 12, 12, \ldots$$
is a possible sequence, but
$$5, 8, 7, 12, 12, 12, \ldots \quad\text{and}\quad 5, 8, 9, 12, 12, 12, \ldots$$
are not. The fourth result shows that in order to find $GE_\lambda$, we do not
necessarily have to calculate the kernel of $(A - \lambda I)^a$; we only need to
calculate successive kernels until there is no change.
Proof of the lemma. The first statement is a very easy exercise; the
second follows immediately. For the third, choose a basis $\{ u_1, \ldots, u_p \}$
for $\ker(A - \lambda I)^k$ and extend it to obtain a basis
$$\{ u_1, \ldots, u_p, v_1, \ldots, v_q \}$$
for $\ker(A - \lambda I)^{k+1}$; extend this again to a basis
$$\{ u_1, \ldots, u_p, v_1, \ldots, v_q, w_1, \ldots, w_r \}$$
for $\ker(A - \lambda I)^{k+2}$. Now for $j = 1, 2, \ldots, r$ the vector $(A - \lambda I)w_j$ is
in $\ker(A - \lambda I)^{k+1}$ and hence is a linear combination of $u_1, \ldots, u_p$ and
$v_1, \ldots, v_q$; for each $j$ we write
$$(A - \lambda I)w_j = x_j + y_j$$
with $x_j \in \mathrm{span}\{ u_1, \ldots, u_p \}$ and $y_j \in \mathrm{span}\{ v_1, \ldots, v_q \}$. We wish to
show that the vectors $y_1, y_2, \ldots, y_r$ are linearly independent. So, set
$$\alpha_1 y_1 + \alpha_2 y_2 + \cdots + \alpha_r y_r = 0.$$
Now $(A - \lambda I)^k x_j = 0$ for each $j$, and so
$$(A - \lambda I)^{k+1}(\alpha_1 w_1 + \alpha_2 w_2 + \cdots + \alpha_r w_r) = (A - \lambda I)^k(\alpha_1 x_1 + \cdots + \alpha_r x_r) + (A - \lambda I)^k(\alpha_1 y_1 + \cdots + \alpha_r y_r) = 0,$$
which shows that $\alpha_1 w_1 + \alpha_2 w_2 + \cdots + \alpha_r w_r \in \ker(A - \lambda I)^{k+1}$. Hence
$$\alpha_1 w_1 + \cdots + \alpha_r w_r = \beta_1 u_1 + \cdots + \beta_p u_p + \gamma_1 v_1 + \cdots + \gamma_q v_q$$
for some scalars $\beta_1, \ldots, \gamma_q$. As $\{ u_1, \ldots, u_p, v_1, \ldots, v_q, w_1, \ldots, w_r \}$ is
an independent set the scalars $\alpha_j$ (among others) are all zero. Therefore the span of the vectors $v_1, \ldots, v_q$ contains $r$ independent elements
$y_1, \ldots, y_r$; hence $q \geq r$, that is, $d_{k+1} - d_k \geq d_{k+2} - d_{k+1}$, as claimed.
The third statement shows that if $d_m = d_{m+1}$, then $d_{m+1} = d_{m+2}$
and so on, and hence all the kernels from $\ker(A - \lambda I)^m$ onwards are
equal; all that remains is to prove that this must occur for some $m \leq a$;
and it suffices to show that $\ker(A - \lambda I)^a = \ker(A - \lambda I)^{a+1}$. So, let
$v \in \ker(A - \lambda I)^{a+1}$. We can write the characteristic polynomial of $A$ as
$$p(z) = (z - \lambda)^a q(z) = (z - \lambda)^a \sum_{k=0}^{n-a} \gamma_k (z - \lambda)^k$$
for some scalars $\gamma_0, \ldots, \gamma_{n-a}$; moreover, $z - \lambda$ is not a factor of $q(z)$ and
so $\gamma_0 \neq 0$. But $p(A)$ is the zero matrix (Cayley-Hamilton theorem) and
therefore
$$0 = p(A)v = \sum_{k=0}^{n-a} \gamma_k (A - \lambda I)^{a+k} v = \gamma_0 (A - \lambda I)^a v;$$
hence $(A - \lambda I)^a v = 0$ and $v \in \ker(A - \lambda I)^a$. This shows that
$$\ker(A - \lambda I)^{a+1} \subseteq \ker(A - \lambda I)^a;$$
but we already know that the reverse inclusion holds, and the proof is
complete.
Corollary. If $\lambda$ is an eigenvalue of $A$ then
$$GE_\lambda = \bigcup_{k=1}^{\infty} \ker(A - \lambda I)^k.$$
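The nullity sequence is easy to compute numerically; here is a minimal numpy sketch (an illustrative addition; the matrix is the matrix B of example 3 below).

```python
import numpy as np

def nullity_sequence(A, lam, kmax):
    """Nullities of (A - lam*I)^k for k = 1..kmax, computed via the matrix rank."""
    n = A.shape[0]
    N = A - lam * np.eye(n)
    return [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
            for k in range(1, kmax + 1)]

# Matrix B from example 3 below: single eigenvalue 2, nullities 2, 3, 3, ...
B = np.array([[0.0, 4.0, -4.0],
              [-2.0, 6.0, -4.0],
              [-1.0, 2.0, 0.0]])
print(nullity_sequence(B, 2.0, 3))   # [2, 3, 3]
```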

Jordan forms. It can be shown that the basis found in the previous
lemma can be further refined to give a Jordan basis for $\mathbb{C}^n$, with
respect to which the matrix of $A$ is in Jordan form.
Definition. A Jordan chain in a generalised eigenspace $GE_\lambda$ of a
matrix $A$ is a sequence $v_1, v_2, \ldots, v_k$ such that
$$(A - \lambda I)v_j = v_{j+1} \text{ for } j = 1, \ldots, k-1 \quad\text{and}\quad (A - \lambda I)v_k = 0,$$
that is, $v_k$ is an eigenvector. We shall often write such a chain as
$$v_1 \xrightarrow{A-\lambda I} v_2 \xrightarrow{A-\lambda I} \cdots \xrightarrow{A-\lambda I} v_{k-1} \xrightarrow{A-\lambda I} v_k \xrightarrow{A-\lambda I} 0$$
(where the zero vector is not actually part of the chain). A Jordan
basis for an $n \times n$ matrix $A$ is a basis of $\mathbb{C}^n$ consisting of one or more
Jordan chains of $A$.
Lemma. Jordan chains produce Jordan blocks. Let $v$ be a generalised
eigenvector of an $n \times n$ matrix $A$ corresponding to the eigenvalue $\lambda$.
Then there is a Jordan chain
$$v = v_1 \xrightarrow{A-\lambda I} v_2 \xrightarrow{A-\lambda I} \cdots \xrightarrow{A-\lambda I} v_{k-1} \xrightarrow{A-\lambda I} v_k \xrightarrow{A-\lambda I} 0$$
with $v_k \neq 0$, and
1. every $v_j$ is in $GE_\lambda$;
2. the set $\{ v_k, \ldots, v_1 \}$ is linearly independent;
3. the subspace $W = \mathrm{span}\{ v_k, \ldots, v_1 \}$ of $\mathbb{C}^n$ is invariant under $A$;
4. the matrix of $T : W \to W$, $T(v) = Av$ with respect to the ordered
basis $B = \{ v_k, \ldots, v_1 \}$ is the Jordan block $J_k(\lambda)$.
Proof. For the first claim note that by definition of a Jordan chain we
have
$$(A - \lambda I)^{k+1-j} v_j = 0.$$
For the second, let $\alpha_1 v_1 + \cdots + \alpha_k v_k = 0$. Multiplying by $(A - \lambda I)^{k-1}$
gives $\alpha_1 v_k = 0$ and so $\alpha_1 = 0$; a similar calculation shows that every
coefficient $\alpha_j$ is zero. For the rest, note that by definition
$$Av_j = v_{j+1} + \lambda v_j \text{ for } j = 1, \ldots, k-1 \quad\text{and}\quad Av_k = \lambda v_k;$$
this shows that $Av_j \in W$, and that the matrix of $T$ with respect to $B$
has $j$th column from the right given by
$$[Av_j]_B = (0, \ldots, 0, 1, \lambda, 0, \ldots, 0)^T$$
(with the $1$ in row $j+1$ from the bottom and $\lambda$ in row $j$ from the bottom)
provided $j < k$, and $k$th column from the right (that is, first column)
equal to $(\lambda, 0, \ldots, 0)^T$. So the matrix of $T$ is $J_k(\lambda)$, as claimed.
Examples.
1. Consider the matrix from the start of this chapter,
$$A = \begin{pmatrix} 2 & 9 \\ -1 & 8 \end{pmatrix}.$$
We know already that $A$ has only one eigenvalue $\lambda = 5$. It is easy
to check that $(A - 5I)^2 = 0$ and so
$$GE_5 = \ker(0) = \mathbb{C}^2.$$
Alternatively, this is given without calculation by the corollary on
page 17. Starting with the vector $v = \begin{pmatrix}1\\0\end{pmatrix}$, we have a chain
$$v = v_1 = \begin{pmatrix}1\\0\end{pmatrix} \xrightarrow{A-5I} v_2 = \begin{pmatrix}-3\\-1\end{pmatrix} \xrightarrow{A-5I} \begin{pmatrix}0\\0\end{pmatrix},$$
and the matrix of $A$ with respect to the ordered basis $\{ v_2, v_1 \}$ is
$$J = J_2(5) = \begin{pmatrix} 5 & 1 \\ 0 & 5 \end{pmatrix}$$
as we found earlier. The fact that the generalised eigenspace is the
whole of $\mathbb{C}^2$ shows that our choice of $v_1 = \begin{pmatrix}1\\0\end{pmatrix}$ was not just lucky:
in fact, any nonzero vector other than an eigenvector would have
done as well. For example, the chain
$$v_1 = \begin{pmatrix}7\\3\end{pmatrix} \xrightarrow{A-5I} v_2 = \begin{pmatrix}6\\2\end{pmatrix} \xrightarrow{A-5I} \begin{pmatrix}0\\0\end{pmatrix}$$
gives
$$A = \begin{pmatrix} 6 & 7 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 6 & 7 \\ 2 & 3 \end{pmatrix}^{-1}.$$
2. Take
$$A = \begin{pmatrix} 7 & 1 & -1 \\ 2 & 7 & -2 \\ 5 & 3 & 1 \end{pmatrix};$$
then $A$ has eigenvalues 6 (with algebraic multiplicity 2) and 3, and
corresponding eigenvectors
$$v = \begin{pmatrix}1\\0\\1\end{pmatrix}, \qquad v = \begin{pmatrix}1\\3\\7\end{pmatrix}$$
respectively. We have
$$GE_6 = \ker(A - 6I)^2 = \ker\begin{pmatrix} -2 & -1 & 2 \\ ? & ? & ? \\ ? & ? & ? \end{pmatrix} = \mathrm{span}\left\{ \begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}-1\\2\\0\end{pmatrix} \right\},$$
noting that the remaining entries in $(A - 6I)^2$ need not be calculated
(why?). The chain
$$v_1 = \begin{pmatrix}-1\\2\\0\end{pmatrix} \xrightarrow{A-6I} v_2 = \begin{pmatrix}1\\0\\1\end{pmatrix} \xrightarrow{A-6I} \begin{pmatrix}0\\0\\0\end{pmatrix}$$
provides a basis for $GE_6$: note that we were careful to obtain a
chain of length 2 by ensuring that our choice for $v_1$ was not an
eigenvector. We have $GE_3 = \ker(A - 3I) = E_3$, which has a chain
of length 1,
$$v_3 = \begin{pmatrix}1\\3\\7\end{pmatrix} \xrightarrow{A-3I} \begin{pmatrix}0\\0\\0\end{pmatrix}.$$
We have
$$Av_2 = 6v_2, \qquad Av_1 = v_2 + 6v_1, \qquad Av_3 = 3v_3,$$
and so $A = PJP^{-1}$ where
$$P = \begin{pmatrix} v_2 & v_1 & v_3 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 2 & 3 \\ 1 & 0 & 7 \end{pmatrix}$$
and
$$J = J_2(6) \oplus J_1(3) = \begin{pmatrix} 6 & 1 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$
3. Consider the matrices
$$A = \begin{pmatrix} 4 & 2 & 1 \\ 2 & 3 & 1 \\ -6 & -5 & -1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 4 & -4 \\ -2 & 6 & -4 \\ -1 & 2 & 0 \end{pmatrix}.$$
Each has eigenvalue $\lambda = 2$, with algebraic multiplicity 3, and therefore has one generalised eigenspace $GE_2 = \mathbb{C}^3$. We may compute
$$\ker(A - 2I) = \ker\begin{pmatrix} 2 & 2 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \mathrm{span}\left\{ \begin{pmatrix}1\\0\\-2\end{pmatrix} \right\}$$
and
$$\ker(A - 2I)^2 = \ker\begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \mathrm{span}\left\{ \begin{pmatrix}1\\0\\-2\end{pmatrix}, \begin{pmatrix}1\\-2\\0\end{pmatrix} \right\}.$$
A chain beginning with a vector in $\ker(A - 2I)$ will only have length
1; one beginning in $\ker(A - 2I)^2$ will have length 2; so we need to
begin with a vector in neither. For example, take
$$\begin{pmatrix}0\\0\\1\end{pmatrix} \xrightarrow{A-2I} \begin{pmatrix}1\\1\\-3\end{pmatrix} \xrightarrow{A-2I} \begin{pmatrix}1\\0\\-2\end{pmatrix} \xrightarrow{A-2I} \begin{pmatrix}0\\0\\0\end{pmatrix};$$
then we have $A = PJP^{-1}$ with
$$P = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ -2 & -3 & 1 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.$$
If we try to do the same with $B$ we find that things are not quite
the same, because we have
$$\ker(B - 2I) = \ker\begin{pmatrix} 1 & -2 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \mathrm{span}\left\{ \begin{pmatrix}0\\1\\1\end{pmatrix}, \begin{pmatrix}2\\1\\0\end{pmatrix} \right\}$$
and already
$$\ker(B - 2I)^2 = \ker\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \mathbb{C}^3.$$
We will not get a Jordan chain of three generalised eigenvectors;
but there will be independent chains of length 2 and 1, which will
lead to a Jordan matrix consisting of two separate Jordan blocks.
We begin with a vector $v_1$ not in $\ker(B - 2I)$, giving a chain such
as
$$v_1 = \begin{pmatrix}1\\0\\0\end{pmatrix} \xrightarrow{B-2I} v_2 = \begin{pmatrix}-2\\-2\\-1\end{pmatrix} \xrightarrow{B-2I} \begin{pmatrix}0\\0\\0\end{pmatrix}.$$
As $B$ has two independent eigenvectors and our first chain includes
only one of them, we may construct a second chain starting with
an eigenvector independent of $v_2$, say
$$v_3 = \begin{pmatrix}0\\1\\1\end{pmatrix} \xrightarrow{B-2I} \begin{pmatrix}0\\0\\0\end{pmatrix}.$$
We have $B = PJP^{-1}$ with
$$P = \begin{pmatrix} v_2 & v_1 & v_3 \end{pmatrix} = \begin{pmatrix} -2 & 1 & 0 \\ -2 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix};$$
alternatively,
$$P = \begin{pmatrix} v_3 & v_2 & v_1 \end{pmatrix} = \begin{pmatrix} 0 & -2 & 1 \\ 1 & -2 & 0 \\ 1 & -1 & 0 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.$$
The facts we have proved about the dimensions of successive kernels will,
in certain circumstances, make it very easy to find the Jordan form $J$
of a matrix; though if we want to find the matrix $P$ as well, there will
usually still be a lot of computation to be done.
Examples/exercises.
1. A $12 \times 12$ matrix $A$ has only one eigenvalue $\lambda = 7$. Given that
$$\mathrm{nullity}(A - 7I) = 4, \quad \mathrm{nullity}(A - 7I)^2 = 7, \quad \mathrm{nullity}(A - 7I)^3 = 10, \quad \mathrm{nullity}(A - 7I)^4 = 12,$$
find a Jordan form of $A$.

Solution. A diagram is helpful.

[Diagram: the nested kernels $\ker(A-7I) \subset \ker(A-7I)^2 \subset \ker(A-7I)^3 \subset \ker(A-7I)^4 = \mathbb{C}^{12}$, drawn as concentric regions, with the chain vectors $v_1, \ldots, v_{12}$ placed in the successive rings.]

First choose a vector $v_1$ in $\ker(A - 7I)^4$ but not in $\ker(A - 7I)^3$.
Then construct a chain of 4 vectors
$$v_1 \xrightarrow{A-7I} v_2 \xrightarrow{A-7I} v_3 \xrightarrow{A-7I} v_4.$$
As $v_4$ is an eigenvector of $A$ this chain will go no further and we construct a new one, starting with a vector $v_5$ which is in $\ker(A - 7I)^4$
but is not a linear combination of $v_1, v_2, v_3, v_4$ and vectors in
$\ker(A - 7I)^3$. This gives another chain of four generalised eigenvectors
$$v_5 \xrightarrow{A-7I} v_6 \xrightarrow{A-7I} v_7 \xrightarrow{A-7I} v_8.$$
All the vectors we shall choose must be independent, as we need $P$
to be an invertible matrix. But there are no further independent
vectors to be chosen outside $\ker(A - 7I)^3$; therefore we take $v_9$ in
$\ker(A - 7I)^3$ but not $\ker(A - 7I)^2$, making sure that it is not a
linear combination of $v_1, v_2, \ldots, v_8$ and vectors in $\ker(A - 7I)^2$,
and construct a third chain
$$v_9 \xrightarrow{A-7I} v_{10} \xrightarrow{A-7I} v_{11}$$
of length 3. Finally we have a chain of length 1, that is, a single
vector $v_{12} \in \ker(A - 7I)$. These twelve generalised eigenvectors
form a basis of $\mathbb{C}^{12}$, and we have $A = PJP^{-1}$, where
$$P = \begin{pmatrix} v_4 & v_3 & v_2 & v_1 & v_8 & v_7 & v_6 & v_5 & v_{11} & v_{10} & v_9 & v_{12} \end{pmatrix}$$
and $J$ is the Jordan matrix
$$J = J_4(7) \oplus J_4(7) \oplus J_3(7) \oplus J_1(7).$$
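The bookkeeping in this solution can be automated: the number of Jordan blocks of size at least $k$ equals $d_k - d_{k-1}$, so the block sizes follow from the nullity sequence. A small Python sketch (an illustrative addition, not part of the notes):

```python
def block_sizes(nullities):
    """Jordan block sizes for one eigenvalue, from the nullities of (A - lam*I)^k.

    The number of blocks of size >= k equals d_k - d_{k-1}, so the number of
    blocks of size exactly k is (d_k - d_{k-1}) - (d_{k+1} - d_k)."""
    d = [0] + list(nullities)
    diffs = [d[k] - d[k - 1] for k in range(1, len(d))]
    diffs.append(0)
    sizes = []
    for k in range(1, len(diffs)):
        sizes += [k] * (diffs[k - 1] - diffs[k])
    return sorted(sizes, reverse=True)

print(block_sizes([4, 7, 10, 12]))   # [4, 4, 3, 1], i.e. J4 + J4 + J3 + J1
```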


Comment. We shall not formally prove that the kernel diagram
method works, but we offer the following observations. First, we have
shown that when the successive kernels (or, equivalently, their dimensions) stop increasing, $\mathrm{nullity}(A - \lambda I)^m$ is equal to $a$, the algebraic multiplicity of $\lambda$. So our combined chains contain the right number of vectors
to be a basis for $GE_\lambda$. Proving that the vectors are independent (and
therefore do in fact form a basis) is rather messy to write down, so we
illustrate it for the vectors $v_1, \ldots, v_8$ which form the first two chains in
the previous example. Suppose that
$$\alpha_1 v_1 + \cdots + \alpha_8 v_8 = 0;$$
since many of the vectors in this expression are in $\ker(A - \lambda I)^3$, we have
$$(A - \lambda I)^3(\alpha_1 v_1 + \alpha_5 v_5) = 0.$$
This means that
$$\alpha_1 v_1 + \alpha_5 v_5 = w$$
for some $w$ in $\ker(A - \lambda I)^3$; but by our choice of $v_5$, this is possible
only if $\alpha_5 = 0$. Hence also $\alpha_1 = 0$, and we have shown that $v_1, \ldots, v_8$
are independent. Finally, note that for our production of chains to
work, each ring of the diagram must contain no more vectors than
the previous ring; but this is true as a consequence of our earlier result
$d_m - d_{m-1} \leq d_{m-1} - d_{m-2}$.
Examples/exercises, continued.
2. Use these ideas to find with minimal work Jordan forms for the
matrices $A$ and $B$ in example 3 on page 23.
3. Find a Jordan form for the matrix
$$C = \begin{pmatrix} 1 & 0 & 4 & 8 \\ 2 & 8 & 1 & 1 \\ 0 & 4 & 1 & 8 \\ 1 & 2 & 2 & 9 \end{pmatrix},$$
given that two of its eigenvalues are 1 and 3.
4. Suppose that $A$ is a $15 \times 15$ matrix with only one eigenvalue $\lambda$, and
that the nullities of $(A - \lambda I)^k$ are
$$4, 8, 11$$
for $k = 1, 2, 3$ respectively. Find all possible completions of the
sequence of nullities, and hence all possible Jordan forms of $A$.
5. A $13 \times 13$ matrix $A$ is known to have an eigenvalue $\lambda$ with
$$\mathrm{nullity}(A - \lambda I)^k = 3, 5 \quad\text{for } k = 1, 2;$$
and a (different) eigenvalue $\mu$ with
$$\mathrm{nullity}(A - \mu I)^k = 2, 4, 6, 7 \quad\text{for } k = 1, 2, 3, 4.$$
Find all possible Jordan forms of $A$. If $A$ were given and the values of $\lambda$ and $\mu$ were known, how could you decide with minimum
calculation which of these Jordan forms is the correct one?
6. Let $A$ be a square matrix and $\lambda$ an eigenvalue of $A$. Explain why
all of the following are the same:
• the number of Jordan blocks $J_k(\lambda)$ in the Jordan form of $A$;
• the number of separate Jordan chains forming a basis for $GE_\lambda$;
• the number of independent eigenvectors of $A$ corresponding to
eigenvalue $\lambda$;
• the dimension of $E_\lambda$;
• the geometric multiplicity of $\lambda$;
• the nullity of $A - \lambda I$;
• the number of parameters in the solution of $(A - \lambda I)v = 0$;
• the number of non-leading columns in a row-echelon form of
$A - \lambda I$.
7. One more example of finding both $P$ and $J$, with a bit of a hint to
reduce the workload: let
$$A = \begin{pmatrix} 3 & 1 & -3 & -1 \\ 3 & 4 & -8 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 0 & -2 & 3 \end{pmatrix}.$$
Given that 2 and 3 are eigenvalues of $A$, find an invertible matrix
$P$ and a Jordan matrix $J$ such that $A = PJP^{-1}$.
Solution. Finding the eigenspaces for $\lambda = 2$ and $\lambda = 3$, we obtain
$$A - 2I \to \begin{pmatrix} 1 & 0 & -2 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad E_2 = \mathrm{span}\left\{ \begin{pmatrix}2\\1\\1\\0\end{pmatrix}, \begin{pmatrix}-1\\2\\0\\1\end{pmatrix} \right\},$$
$$A - 3I \to \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad E_3 = \mathrm{span}\left\{ \begin{pmatrix}0\\1\\0\\1\end{pmatrix} \right\}.$$
Hence $A$ has eigenvalues 2, 2, 3, and from the trace we find that the
fourth eigenvalue is 2 again. Hence there will be three independent
generalised eigenvectors corresponding to $\lambda = 2$, and
$$GE_2 = \ker(A - 2I)^2.$$
So we find
$$(A - 2I)^2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & 1 & 2 \\ ? & ? & ? & ? \\ ? & ? & ? & ? \end{pmatrix},$$
noting that the remaining entries of $(A - 2I)^2$ need not be calculated.
Now we may choose, say,
$$v_1 = \begin{pmatrix}1\\0\\0\\0\end{pmatrix} \in \ker(A - 2I)^2 \setminus \ker(A - 2I);$$
and then take
$$v_2 = (A - 2I)v_1 = \begin{pmatrix}1\\3\\1\\1\end{pmatrix}.$$
It is a good idea to check that $v_2$ is an eigenvector. From a diagram
we see that another vector $v_3 \in E_2$ is needed, and we must ensure
that $v_1, v_2, v_3$ are independent. In fact
$$v_3 = \begin{pmatrix}2\\1\\1\\0\end{pmatrix}$$
will do. To complete the matrix $P$ we take, say,
$$v_4 = \begin{pmatrix}0\\1\\0\\1\end{pmatrix}$$
from $E_3$; this vector is guaranteed to be independent of $v_1, v_2, v_3$.
Now corresponding to $\lambda = 2$ we have two chains of generalised
eigenvectors
$$v_1 \xrightarrow{A-2I} v_2 \quad\text{and}\quad v_3;$$
and corresponding to $\lambda = 3$, just a single (generalised) eigenvector
$v_4$. So the required matrices are
$$P = \begin{pmatrix} v_2 & v_1 & v_3 & v_4 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 2 & 0 \\ 3 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}$$
and
$$J = J_2(2) \oplus J_1(2) \oplus J_1(3) = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.$$
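A quick sympy verification of the $P$ and $J$ just found (illustrative only):

```python
from sympy import Matrix

A = Matrix([[3, 1, -3, -1],
            [3, 4, -8, -1],
            [1, 1, -1, -1],
            [1, 0, -2, 3]])
P = Matrix([[1, 1, 2, 0],
            [3, 0, 1, 1],
            [1, 0, 1, 0],
            [1, 0, 0, 1]])
J = Matrix([[2, 1, 0, 0],
            [0, 2, 0, 0],
            [0, 0, 2, 0],
            [0, 0, 0, 3]])
assert A == P * J * P.inv()
```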

Jordan forms and similarity. The following theorem shows that the
Jordan form gives a definite test for similarity of two matrices. Compare
the determinant, characteristic polynomial and other similarity invariants which we studied in previous chapters: these can be used to
show that two matrices are not similar, but can never show that two
matrices are similar.
Theorem. The Jordan form is a complete similarity invariant. Two
$n \times n$ matrices $A$ and $B$ are similar if and only if they have the same
Jordan form, except possibly for a permutation of the Jordan blocks.
Proof. First, suppose that $A$ and $B$ are similar. Then they have the
same characteristic polynomial and hence the same eigenvalues with the
same algebraic multiplicities (see the lemma "computing eigenvalues",
chapter 5, page 5). Moreover, for any $\lambda$ and any $k$ the matrices $(A - \lambda I)^k$
and $(B - \lambda I)^k$ are similar; so they have the same nullity (chapter 2,
page 39). But we have seen that these nullities determine the Jordan
blocks; so, if we ignore the order of these blocks, $A$ and $B$ have the same
Jordan form.
Conversely, suppose that $A$ has Jordan form $J_1$ and $B$ has Jordan
form $J_2$ and that these Jordan forms consist of the same Jordan blocks.
Then $J_2$ can be obtained from $J_1$ by a reordering of the rows and the
same reordering of the columns; so $J_1$ is similar to $J_2$. Since $A$ is similar
to $J_1$ and $B$ is similar to $J_2$ and similarity is an equivalence relation,
this completes the proof.
Example. The matrices
$$A = \begin{pmatrix} 4 & 2 & 1 \\ 2 & 3 & 1 \\ -6 & -5 & -1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 4 & -4 \\ -2 & 6 & -4 \\ -1 & 2 & 0 \end{pmatrix}$$
from page 23 share all the similarity invariants that we have discussed
up to chapter 5 (rank, nullity, trace, determinant, eigenvalues and so
on). But they are not similar because they have different Jordan forms
(top and bottom of page 24).
Jordan forms and the minimal polynomial. Earlier in this chapter
we found some minimal polynomials more or less by trial and error.
Theorem. Jordan forms and the minimal polynomial. Let $A$ be an
$n \times n$ matrix; for each eigenvalue $\lambda_k$ of $A$, denote the size of the largest
Jordan block $J_b(\lambda_k)$ occurring in the Jordan form of $A$ by $b_k$. Then the
minimal polynomial of $A$ is
$$m(z) = (z - \lambda_1)^{b_1} \cdots (z - \lambda_s)^{b_s}.$$
Proof. Note that $b_k$ is the length of the longest Jordan chain corresponding to $\lambda_k$; and that $GE_{\lambda_k} = \ker(A - \lambda_k I)^{b_k}$. So for every $v_k$ in
$GE_{\lambda_k}$ we have
$$(A - \lambda_k I)^{b_k} v_k = 0;$$
since $(A - \lambda_k I)^{b_k}$ is one of the terms in the product for $m(A)$, and since
it commutes with all the other terms, we have
$$m(A)v_k = (\text{other factors})\,(A - \lambda_k I)^{b_k} v_k = 0.$$
But every $v \in \mathbb{C}^n$ is a linear combination of generalised eigenvectors,
and so $m(A)v = 0$ for all $v \in \mathbb{C}^n$; therefore $m(A)$ is the zero matrix.
To complete the proof we have to show that any polynomial $f(z)$
which is a proper factor of $m(z)$ does not give $f(A) = 0$. Such a polynomial has the form
$$f(z) = (z - \lambda_1)^{c_1} \cdots (z - \lambda_s)^{c_s}$$
with $c_k < b_k$ for at least one value of $k$. Now $\ker(A - \lambda_k I)^{c_k}$ will be a
proper subset of $GE_{\lambda_k}$, so there exists a vector
$$v \in \ker(A - \lambda_k I)^{c_k+1} \setminus \ker(A - \lambda_k I)^{c_k}.$$
As on page 19, we can write $f(z)$ in the form
$$f(z) = (z - \lambda_k)^{c_k} g(z) = (z - \lambda_k)^{c_k} \sum_{j=0}^{\deg g} \gamma_j (z - \lambda_k)^j$$
with $\gamma_0 \neq 0$; then we have
$$f(A)v = \sum_{j=0}^{\deg g} \gamma_j (A - \lambda_k I)^{c_k+j} v = \gamma_0 (A - \lambda_k I)^{c_k} v \neq 0,$$
and so $f(A)$ is not the zero matrix. This completes the proof.
Corollary. If $A$ is an $n \times n$ matrix, and if $b_1, b_2, \ldots, b_s$ are defined as
in the preceding theorem, then
$$\dim \mathbb{C}[A] = b_1 + b_2 + \cdots + b_s \leq n.$$
Example. If the matrix $A$ has Jordan form
$$J_5(2) \oplus J_2(2) \oplus J_3(-2) \oplus J_3(-2) \oplus J_2(-2),$$
then its minimal polynomial is
$$m(z) = (z - 2)^5 (z + 2)^3.$$

Jordan forms and powers. We return to the kind of problem which
motivated this chapter. Recall that if we can write a square matrix in
terms of a Jordan form, $A = PJP^{-1}$, then we have $A^n = PJ^nP^{-1}$, and
the problem remains of calculating $J^n$. We begin with the observation
that any Jordan block can be written
$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix} = \lambda I + N \quad\text{with}\quad N = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}.$$
Powers of the $k \times k$ matrix $N$ are easily calculated: we illustrate with
the $4 \times 4$ case:
$$N = \begin{pmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{pmatrix}, \quad N^2 = \begin{pmatrix} 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix}, \quad N^3 = \begin{pmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},$$
and $N^4 = N^5 = \cdots = 0$. Such a matrix is called nilpotent. To
calculate $J_k(\lambda)^n = (\lambda I + N)^n$ we use the matrix version of a very well
known theorem.

Theorem. The binomial theorem. Let $A$ and $B$ be $n \times n$ matrices for
which $AB = BA$. Then for any $n \geq 0$ we have
$$(A + B)^n = A^n + \binom{n}{1} A^{n-1}B + \binom{n}{2} A^{n-2}B^2 + \cdots + B^n.$$
Proof: exercise! An easy induction using Pascal's triangle formula
$$\binom{n+1}{r} = \binom{n}{r} + \binom{n}{r-1}.$$
We remark that it is easy to show by induction that
$$(A_1 \oplus \cdots \oplus A_s)^n = A_1^n \oplus \cdots \oplus A_s^n.$$

Examples.
1. We have
$$J_4(\lambda)^n = (\lambda I + N)^n = \lambda^n I + \binom{n}{1} \lambda^{n-1} N + \binom{n}{2} \lambda^{n-2} N^2 + \binom{n}{3} \lambda^{n-3} N^3,$$
the series stopping at this point since $N^4 = 0$. Hence
$$J_4(\lambda)^n = \begin{pmatrix} \lambda^n & \binom{n}{1}\lambda^{n-1} & \binom{n}{2}\lambda^{n-2} & \binom{n}{3}\lambda^{n-3} \\ 0 & \lambda^n & \binom{n}{1}\lambda^{n-1} & \binom{n}{2}\lambda^{n-2} \\ 0 & 0 & \lambda^n & \binom{n}{1}\lambda^{n-1} \\ 0 & 0 & 0 & \lambda^n \end{pmatrix}.$$
2. Consider the Jordan matrix $J = J_2(7) \oplus J_1(7) \oplus J_1(3)$. We have
$$J_2(7)^n = 7^n I + \binom{n}{1} 7^{n-1} N = \begin{pmatrix} 7^n & n\,7^{n-1} \\ 0 & 7^n \end{pmatrix},$$
since in this case $N^2$ is already zero. Therefore
$$J^n = \begin{pmatrix} 7^n & n\,7^{n-1} & 0 & 0 \\ 0 & 7^n & 0 & 0 \\ 0 & 0 & 7^n & 0 \\ 0 & 0 & 0 & 3^n \end{pmatrix}.$$
3. For the matrix
$$B = \begin{pmatrix} 0 & 4 & -4 \\ -2 & 6 & -4 \\ -1 & 2 & 0 \end{pmatrix}$$
on page 23 we found $B = PJP^{-1}$ with
$$P = \begin{pmatrix} v_2 & v_1 & v_3 \end{pmatrix} = \begin{pmatrix} -2 & 1 & 0 \\ -2 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix};$$
therefore
$$B^n = P \begin{pmatrix} 2^n & n\,2^{n-1} & 0 \\ 0 & 2^n & 0 \\ 0 & 0 & 2^n \end{pmatrix} P^{-1} = 2^{n-1}\begin{pmatrix} 2 - 2n & 4n & -4n \\ -2n & 4n + 2 & -4n \\ -n & 2n & 2 - 2n \end{pmatrix}.$$
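Finally, the closed form for $B^n$ can be checked numerically; a minimal numpy sketch (illustrative only):

```python
import numpy as np

B = np.array([[0.0, 4.0, -4.0],
              [-2.0, 6.0, -4.0],
              [-1.0, 2.0, 0.0]])

def B_power(n):
    # Closed form B^n = 2^(n-1) * [[2-2n, 4n, -4n], [-2n, 4n+2, -4n], [-n, 2n, 2-2n]].
    return 2.0**(n - 1) * np.array([[2 - 2*n, 4*n, -4*n],
                                    [-2*n, 4*n + 2, -4*n],
                                    [-n, 2*n, 2 - 2*n]])

for n in range(1, 6):
    assert np.allclose(np.linalg.matrix_power(B, n), B_power(n))
```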
