
The matrix exponential

Erik Wahlén
erik.wahlen@math.lu.se

October 3, 2014

1 Definition and basic properties


These notes serve as a complement to Chapter 7 in Ahmad and Ambrosetti. In
Chapter 7, the general solution of the equation

\[ x' = Ax \tag{1} \]

is computed using eigenvalues and eigenvectors. This method works when A has
n distinct eigenvalues or, more generally, when there is a basis for C^n consisting of
eigenvectors of A. In the latter case, we say that A is diagonalizable. In this case,
the general solution is given by

\[ x(t) = c_1 e^{\lambda_1 t} v_1 + \dots + c_n e^{\lambda_n t} v_n, \]

where λ_1, ..., λ_n are the, not necessarily distinct, eigenvalues of A and v_1, ..., v_n


the corresponding eigenvectors. Example 7.4.5 in the book illustrates what can
happen if A is not diagonalizable, i.e. if there is no basis of eigenvectors of A. In
these notes we will discuss this question in a more systematic way.
Our starting point is to write the general solution of (1) as

\[ x(t) = e^{tA} c, \]

where c is an arbitrary vector in C^n, in analogy with the scalar case n = 1. We
then have to begin by defining e^{tA} and show that the above formula actually gives
a solution of the equation. Note that it is simpler to work in the complex vector
space C^n than R^n, since even if A is real it might have complex eigenvalues and
eigenvectors. If A should happen to be real, we can restrict to real vectors c ∈ R^n
to obtain real solutions.
Recall from Taylor's formula that

\[ e^t = \sum_{k=0}^{n} \frac{t^k}{k!} + r_n(t), \]

where

\[ r_n(t) = \frac{e^s t^{n+1}}{(n+1)!}, \]

for some s between 0 and t. Since lim_{n→∞} r_n(t) = 0, we have

\[ e^t = \sum_{k=0}^{\infty} \frac{t^k}{k!}, \quad t \in \mathbb{R}. \]

This is an example of a power series. More generally, a power series is a series of the form

\[ \sum_{k=0}^{\infty} a_k (x - x_0)^k, \]

where the coefficients a_k are complex numbers. The number

\[ R := \sup\{ r \ge 0 : \{ |a_k| r^k \}_{k=0}^{\infty} \text{ is bounded} \} \]

is called the radius of convergence of the series. The significance of the radius of
convergence is that the power series converges inside of it, and that the series can
be manipulated as though it were a finite sum there (e.g. differentiated termwise).
To prove this we first show a preliminary lemma.
Lemma 1. Suppose that {f_n} is a sequence of continuously differentiable functions
on an interval I. Assume that f_n' → g uniformly on I and that {f_n(a)} converges
for some a ∈ I. Then {f_n} converges pointwise on I to some function f. Moreover,
f is continuously differentiable with f'(x) = g(x).

Proof. We have

\[ f_n(x) = f_n(a) + \int_a^x f_n'(s) \, ds, \quad \forall n. \]

Since f_n'(s) → g(s) uniformly on the interval between a and x, we can take the
limit under the integral and obtain that f_n(x) converges to f(x) defined by

\[ f(x) = f(a) + \int_a^x g(s) \, ds, \quad x \in I, \]

where f(a) = lim_{n→∞} f_n(a). The fundamental theorem of calculus now shows that
f is continuously differentiable with f'(x) = g(x).
Theorem 2. Assume that the power series

\[ \sum_{k=0}^{\infty} a_k (x - x_0)^k \tag{2} \]

has positive radius of convergence R. The series converges uniformly and absolutely
in the interval [x_0 - r, x_0 + r] whenever 0 < r < R and diverges when |x - x_0| > R.
The limit is infinitely differentiable and the series can be differentiated termwise.

Proof. If |x - x_0| > R the sequence a_k (x - x_0)^k is unbounded, so the series diverges.
Consider an interval |x - x_0| ≤ r, where r ∈ (0, R). Choose r < r̃ < R. Then

\[ |a_k (x - x_0)^k| \le |a_k| r^k \le C \left( \frac{r}{\tilde r} \right)^k, \]

when |x - x_0| ≤ r, where C is a constant such that |a_k| r̃^k ≤ C for all k. Since r/r̃ < 1
we find that

\[ \sum_{k=0}^{\infty} \left( \frac{r}{\tilde r} \right)^k < \infty. \]

Weierstrass' M-test then shows that (2) converges uniformly when |x - x_0| ≤ r
to some function S(x). S is at least continuous since it is the uniform limit of a
sequence of continuous functions. If S is differentiable and can be differentiated
termwise, then the derivative is

\[ \sum_{k=0}^{\infty} (k+1) a_{k+1} (x - x_0)^k. \]

This is again a power series with radius of convergence R (prove this!). Hence
it converges uniformly on [x_0 - r, x_0 + r], 0 < r < R. It follows from the previous
lemma that S is continuously differentiable on (x_0 - R, x_0 + R) and that the power
series can be differentiated termwise. An induction argument shows that S is
infinitely many times differentiable.
The interval (x_0 - R, x_0 + R) is called the interval of convergence. The power
series also converges for complex x with |x - x_0| < R (hence the name radius of
convergence). All of the above properties still hold for complex x, but the derivative
must now be understood as a complex derivative. This concept is studied in courses
in complex analysis; we will not linger on it here.

Let us now return to matrices. The set of complex n × n matrices is denoted
C^{n×n}.

Definition 3. If A ∈ C^{n×n}, we define

\[ e^{tA} = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!}, \quad t \in \mathbb{R}. \]

In particular,

\[ e^{A} = \sum_{k=0}^{\infty} \frac{A^k}{k!}. \]

For this to make sense, we have to show that the series converges for all t ∈ R.
A matrix sequence {A_k}, A_k ∈ C^{n×n}, is said to converge if it converges element-
wise, i.e. [A_k]_{ij} converges as k → ∞ for all i and j, where the notation [A]_{ij} is used
to denote the element on row i and column j of A (this will also be denoted a_{ij}
in some places). As usual, a matrix series is said to converge if the corresponding
(matrix) sequence of partial sums converges. Instead of checking if all the elements
converge, it is sometimes useful to work with the matrix norm.

Definition 4. The matrix norm of A is defined by

\[ \|A\| = \max_{x \in \mathbb{C}^n, \, |x| = 1} |Ax|. \]

Equivalently, ‖A‖ is the smallest K ≥ 0 such that

\[ |Ax| \le K |x|, \quad x \in \mathbb{C}^n, \]

since

\[ |Ax| = \left| A \frac{x}{|x|} \right| |x| \le \|A\| |x| \quad \text{and} \quad \left| \frac{x}{|x|} \right| = 1. \]
The following proposition follows from the definition (prove this!).

Proposition 5. The matrix norm satisfies

(1) ‖A‖ ≥ 0, with equality iff A = 0.

(2) ‖zA‖ = |z| ‖A‖, z ∈ C.

(3) ‖A + B‖ ≤ ‖A‖ + ‖B‖.

A more interesting property is the following.

Proposition 6.
\[ \|AB\| \le \|A\| \|B\|. \]

Proof. We have

\[ |ABx| \le \|A\| |Bx| \le \|A\| \|B\| |x| = \|A\| \|B\| \quad \text{if } |x| = 1. \]

Proposition 7.

(1) |a_{ij}| ≤ ‖A‖.

(2) ‖A‖ ≤ (Σ_{i,j} |a_{ij}|²)^{1/2}.

Proof. (1) Recall that a_{ij} = [A]_{ij}. Let {e_1, ..., e_n} be the standard basis. Then

\[ A e_j = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{pmatrix} \implies |a_{ij}| \le |A e_j| \le \|A\| |e_j| = \|A\|. \]

(2) We have that

\[ Ax = \begin{pmatrix} R_1 \\ \vdots \\ R_n \end{pmatrix} x = \begin{pmatrix} R_1 x \\ \vdots \\ R_n x \end{pmatrix}, \]

where

\[ R_i = \begin{pmatrix} a_{i1} & \cdots & a_{in} \end{pmatrix} \]

is row i of A. Hence, by the Cauchy–Schwarz inequality,

\[ |Ax|^2 = \sum_{i=1}^{n} |R_i x|^2 \le \left( \sum_{i=1}^{n} |R_i|^2 \right) |x|^2 = \left( \sum_{i,j} |a_{ij}|^2 \right) |x|^2. \]
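For a concrete matrix the two bounds in Proposition 7 are easy to check numerically. The following sketch is an illustration only (it assumes NumPy is available and uses an arbitrary example matrix): the 2-norm returned by NumPy is the largest singular value, which coincides with the matrix norm defined above, while the largest entry and the square-root-of-sum-of-squares give the bounds from (1) and (2).

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, -4.0]])       # arbitrary example matrix

    op_norm = np.linalg.norm(A, 2)                # matrix norm ||A|| (largest singular value)
    max_entry = np.max(np.abs(A))                 # max |a_ij|, a lower bound by Proposition 7 (1)
    frobenius = np.sqrt(np.sum(np.abs(A) ** 2))   # (sum |a_ij|^2)^(1/2), an upper bound by (2)

    print(max_entry <= op_norm <= frobenius)      # expected output: True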

The following corollary is an immediate consequence of the above proposition.

Corollary 8. Let {A_k}_{k=1}^∞ ⊂ C^{n×n} and A ∈ C^{n×n}. Then

\[ \lim_{k \to \infty} A_k = A \iff \lim_{k \to \infty} \|A_k - A\| = 0. \]

We can now show that our definition of the matrix exponential makes sense.

Proposition 9. The series Σ_{k=0}^∞ t^k A^k / k! defining e^{tA} converges uniformly on compact
intervals. Moreover, the function t ↦ e^{tA} is differentiable with derivative A e^{tA}.

Proof. Each matrix element of Σ_{k=0}^∞ t^k A^k / k! is a power series in t with coefficients
[A^k]_{ij} / k!. The radius of convergence is infinite, R = ∞, since

\[ \left| \frac{[A^k]_{ij}}{k!} \right| r^k \le \frac{\|A^k\| r^k}{k!} \le \frac{\|A\|^k r^k}{k!} \to 0 \]

as k → ∞ for all r ≥ 0. The series thus converges uniformly on any compact
interval and pointwise on R. Differentiating termwise, we obtain

\[ \frac{d}{dt} \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} = \sum_{k=1}^{\infty} \frac{t^{k-1} A^k}{(k-1)!} \overset{j = k-1}{=} A \sum_{j=0}^{\infty} \frac{t^j A^j}{j!} = A e^{tA}. \]

Since e^{0·A} = I, we obtain the following result.

Theorem 10. The general solution of (1) is given by

\[ x(t) = e^{tA} c, \quad c \in \mathbb{C}^n, \]

and the unique solution of the IVP

\[ x' = Ax, \quad x(0) = x_0, \]

is given by

\[ x(t) = e^{tA} x_0. \]
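For a numerical matrix A, a fixed time t and given initial data, this solution can be evaluated directly with a library routine for the matrix exponential. A minimal sketch, assuming SciPy is available (the matrix and initial condition are made-up examples):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # example coefficient matrix
    x0 = np.array([1.0, 0.0])                 # example initial condition x(0)

    def solve_at(t):
        """Return x(t) = e^{tA} x0, the solution of x' = Ax, x(0) = x0."""
        return expm(t * A) @ x0

    print(solve_at(np.pi / 2))                # for this A the exact solution is (cos t, -sin t)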
In certain cases it is possible to compute the matrix exponential directly from
the definition.
Example 11. Suppose that

\[ A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} \]

is a diagonal matrix. Then

\[ A^k = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix}, \]

so

\[ e^{tA} = \begin{pmatrix} \sum_{k=0}^{\infty} \frac{t^k \lambda_1^k}{k!} & 0 & \cdots & 0 \\ 0 & \sum_{k=0}^{\infty} \frac{t^k \lambda_2^k}{k!} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sum_{k=0}^{\infty} \frac{t^k \lambda_n^k}{k!} \end{pmatrix} = \begin{pmatrix} e^{\lambda_1 t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2 t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n t} \end{pmatrix}. \]

Example 12. Let's compute e^{tA} where

\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]

We find that

\[ A^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad A^3 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad A^4 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad A^{4j+r} = A^r. \]

Hence,

\[ e^{tA} = \begin{pmatrix} \sum_{j=0}^{\infty} \frac{(-1)^j t^{2j}}{(2j)!} & -\sum_{j=0}^{\infty} \frac{(-1)^j t^{2j+1}}{(2j+1)!} \\ \sum_{j=0}^{\infty} \frac{(-1)^j t^{2j+1}}{(2j+1)!} & \sum_{j=0}^{\infty} \frac{(-1)^j t^{2j}}{(2j)!} \end{pmatrix} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}. \]
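The closed form in Example 12 can be checked against truncated partial sums of the defining series. A small sketch, assuming NumPy is available (the truncation length 30 is an arbitrary choice, more than enough for moderate t):

    import numpy as np

    def expm_series(A, t, terms=30):
        """Approximate e^{tA} by the partial sum of (tA)^k / k! for k < terms."""
        n = A.shape[0]
        result = np.zeros((n, n))
        term = np.eye(n)                      # (tA)^0 / 0!
        for k in range(terms):
            result = result + term
            term = term @ (t * A) / (k + 1)   # (tA)^{k+1} / (k+1)!
        return result

    A = np.array([[0.0, -1.0], [1.0, 0.0]])
    t = 1.2
    exact = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    print(np.allclose(expm_series(A, t), exact))   # expected output: True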

Let's now look at the case of a general diagonalizable matrix A. As already
mentioned, A is diagonalizable if there is a basis for C^n consisting of eigenvectors
v_1, ..., v_n of A. The corresponding eigenvalues λ_1, ..., λ_n don't have to be distinct.
In this case we know from Chapter 7 in Ahmad and Ambrosetti that the general
solution of (1) is given by

\[ x(t) = c_1 e^{\lambda_1 t} v_1 + \dots + c_n e^{\lambda_n t} v_n. \]

In matrix notation, we can write this as

\[ x(t) = \begin{pmatrix} e^{\lambda_1 t} v_1 & \cdots & e^{\lambda_n t} v_n \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}, \]

or

\[ x(t) = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} \begin{pmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}. \]

Denote the matrix to the left by T. If we want to find the solution with x(0) = x_0,
we have to solve the system of equations

\[ T c = x_0 \iff c = T^{-1} x_0. \]

Thus the solution of the IVP is

\[ x(t) = T e^{tD} T^{-1} x_0, \]

where

\[ D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}. \]

This can also be seen by computing the matrix exponential e^{tA} using the fol-
lowing proposition.
Proposition 13.

\[ e^{T B T^{-1}} = T e^{B} T^{-1}. \]

Proof. This follows by noting that

\[ \begin{aligned} (T B T^{-1})^k &= (T B T^{-1})(T B T^{-1}) \cdots (T B T^{-1}) \\ &= T B (T^{-1} T) B (T^{-1} T) \cdots (T^{-1} T) B T^{-1} \\ &= T B^k T^{-1}, \end{aligned} \]

whence

\[ \sum_{k=0}^{\infty} \frac{(T B T^{-1})^k}{k!} = \sum_{k=0}^{\infty} \frac{T B^k T^{-1}}{k!} = T e^{B} T^{-1}. \]

Returning to the discussion above, if A is diagonalizable, then it can be written

\[ A = T D T^{-1} \tag{3} \]

with T and D as above. D is the matrix for the linear operator x ↦ Ax in the
basis v_1, ..., v_n. The matrix for this linear operator in the standard basis is simply
A. Equation (3) is the change of basis formula for matrices. For convenience we
will also refer to D as the matrix for A in the basis v_1, ..., v_n, although this is a
bit sloppy. From (3) and Proposition 13 it follows that

\[ e^{tA} = T e^{tD} T^{-1}. \]

The matrix for e^{tA} in the basis v_1, ..., v_n is thus given by e^{tD}. The solution of the
IVP is given by

\[ e^{tA} x_0 = T e^{tD} T^{-1} x_0, \]

confirming our previous result.
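Numerically this is a complete recipe whenever A is diagonalizable: compute eigenvalues and eigenvectors, exponentiate the eigenvalues and change back to the standard basis. A sketch assuming NumPy is available (it silently assumes that the eigenvector matrix T is invertible, i.e. that A really is diagonalizable); the test matrix and the expected closed form are those of Example 28 below.

    import numpy as np

    def expm_diagonalizable(A, t):
        """Compute e^{tA} = T e^{tD} T^{-1}; valid only when A has a basis of eigenvectors."""
        eigvals, T = np.linalg.eig(A)                 # columns of T are eigenvectors
        return T @ np.diag(np.exp(t * eigvals)) @ np.linalg.inv(T)

    A = np.array([[1.0, 2.0], [2.0, 1.0]])            # eigenvalues -1 and 3
    t = 0.7
    expected = 0.5 * np.array(
        [[np.exp(-t) + np.exp(3 * t), -np.exp(-t) + np.exp(3 * t)],
         [-np.exp(-t) + np.exp(3 * t), np.exp(-t) + np.exp(3 * t)]])
    print(np.allclose(expm_diagonalizable(A, t), expected))   # expected output: True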
We finally record some properties of the matrix exponential which will be useful
for matrices which can't be diagonalized.

Lemma 14. AB = BA ⟹ e^A B = B e^A.

Proof. A^k B = A^{k-1} B A = ... = B A^k. Thus

\[ \left( \lim_{N \to \infty} \sum_{k=0}^{N} \frac{A^k}{k!} \right) B = B \left( \lim_{N \to \infty} \sum_{k=0}^{N} \frac{A^k}{k!} \right). \]

Proposition 15.

(1) (e^A)^{-1} = e^{-A}.

(2) e^A e^B = e^{A+B} = e^B e^A if AB = BA.

(3) e^{tA} e^{sA} = e^{(t+s)A}.

Proof. (1) We have that

\[ \frac{d}{dt} \left( e^{tA} e^{-tA} \right) = A e^{tA} e^{-tA} + e^{tA} (-A e^{-tA}) = (A - A) e^{tA} e^{-tA} = 0, \]

where we have used the previous lemma to interchange the order of e^{-tA} and A.
Hence

\[ e^{tA} e^{-tA} \equiv C \quad \text{(constant matrix)}. \]

Setting t = 0, we find that C = I (identity matrix). Setting t = 1, we find that
e^A e^{-A} = I.

(2) We have that

\[ \begin{aligned} \frac{d}{dt} \left( e^{t(A+B)} e^{-tA} e^{-tB} \right) &= (A+B) e^{t(A+B)} e^{-tA} e^{-tB} - e^{t(A+B)} A e^{-tA} e^{-tB} - e^{t(A+B)} e^{-tA} B e^{-tB} \\ &= (A + B - (A + B)) \, e^{t(A+B)} e^{-tA} e^{-tB} \\ &= 0, \end{aligned} \]

where we have used the previous lemma in the second line. As in (1) we obtain
e^{A+B} e^{-A} e^{-B} = I and hence, using (1), e^{A+B} = e^B e^A. Interchanging the roles of
A and B gives e^{A+B} = e^A e^B.

(3) follows from (2) since tA and sA commute.
Let's use this to compute the matrix exponential of a matrix which can't be
diagonalized.
Example 16. Let

\[ D = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \]

and

\[ A = D + N = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. \]

The matrix A is not diagonalizable, since the only eigenvalue is 2 and Ax = 2x
has the solution

\[ x = z \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad z \in \mathbb{C}. \]

Since D is diagonal, we have that

\[ e^{tD} = \begin{pmatrix} e^{2t} & 0 \\ 0 & e^{2t} \end{pmatrix}. \]

Moreover, N² = 0 (confirm this!), so

\[ e^{tN} = I + tN = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}. \]

Since D and N commute, we find that

\[ e^{tA} = e^{tD + tN} = e^{tD} e^{tN} = \begin{pmatrix} e^{2t} & 0 \\ 0 & e^{2t} \end{pmatrix} \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{2t} & t e^{2t} \\ 0 & e^{2t} \end{pmatrix}. \]
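The splitting in Example 16 is easy to verify numerically for a fixed t: the library value of e^{tA} should agree with e^{2t}(I + tN). A sketch, assuming SciPy is available for the reference value:

    import numpy as np
    from scipy.linalg import expm

    D = np.array([[2.0, 0.0], [0.0, 2.0]])
    N = np.array([[0.0, 1.0], [0.0, 0.0]])                # nilpotent: N @ N is the zero matrix
    A = D + N
    t = 0.5

    via_splitting = np.exp(2 * t) * (np.eye(2) + t * N)   # e^{tD} e^{tN} with e^{tN} = I + tN
    print(np.allclose(expm(t * A), via_splitting))        # expected output: True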
Exercise 5 below shows that the hypothesis that A and B commute in Propo-
sition 15 (2) is essential.

2 Generalized eigenvectors
We know now how to compute the matrix exponential when A is diagonalizable. In
the next section we will discuss how this can be done when A is not diagonalizable.
In order to do that, we need to introduce some more advanced concepts from linear
algebra. When A is diagonalizable there is a basis consisting of eigenvectors. The
main idea when A is not diagonalizable is to replace this basis by a basis consisting
of generalized eigenvectors.
Definition 17. Let A ∈ C^{n×n}. A vector v ∈ C^n, v ≠ 0, is called a generalized
eigenvector corresponding to the eigenvalue λ if

\[ (A - \lambda I)^m v = 0 \]

for some integer m ≥ 1.

Note that according to this definition, an eigenvector also qualifies as a gen-
eralized eigenvector. We also remark that in the above definition λ has to be an
eigenvalue, since if m ≥ 1 is the smallest positive integer such that (A - λI)^m v = 0,
then w = (A - λI)^{m-1} v ≠ 0 and

\[ (A - \lambda I) w = (A - \lambda I)(A - \lambda I)^{m-1} v = (A - \lambda I)^m v = 0, \]

so w is an eigenvector and λ is an eigenvalue.

The goal of this section is to prove the following theorem.


Theorem 18. Let A ∈ C^{n×n}. Then there is a basis for C^n consisting of generalized
eigenvectors of A.
We will also discuss methods for constructing such a basis.

Before proving the main theorem, we discuss a number of preliminary results.


Let V be a vector space and let V_1, ..., V_k be subspaces. We say that V is the
direct sum of V_1, ..., V_k if each vector x ∈ V can be written in a unique way as

\[ x = x_1 + x_2 + \dots + x_k, \quad \text{where } x_i \in V_i, \ i = 1, \dots, k. \]

If this is the case we use the notation

\[ V = V_1 \oplus V_2 \oplus \dots \oplus V_k. \]

We say that a subspace W of V is invariant under A if

\[ x \in W \implies Ax \in W. \]

Example 19. Suppose that A has n distinct eigenvalues λ_1, ..., λ_n with cor-
responding eigenvectors v_1, ..., v_n. It then follows that the vectors v_1, ..., v_n are
linearly independent (see Theorem 7.1.2 in Ahmad and Ambrosetti) and thus form
a basis for C^n. Let

\[ \ker(A - \lambda_i I) = \operatorname{span}\{v_i\}, \quad i = 1, \dots, n, \]

be the corresponding eigenspaces. Here

\[ \ker B = \{ x \in \mathbb{C}^n : Bx = 0 \} \]

is the kernel (or null space) of an n × n matrix B. Then

\[ \mathbb{C}^n = \ker(A - \lambda_1 I) \oplus \ker(A - \lambda_2 I) \oplus \dots \oplus \ker(A - \lambda_n I) \]

by the definition of a basis. It is also clear that each eigenspace is invariant under
A.

More generally, suppose that A is diagonalizable, i.e. that it has k distinct
eigenvalues λ_1, ..., λ_k and that the geometric multiplicity of each eigenvalue λ_i
equals the algebraic multiplicity a_i. Let ker(A - λ_i I), i = 1, ..., k, be the corre-
sponding eigenspaces. We can then find a basis for each eigenspace consisting of
a_i eigenvectors. The union of these bases consists of a_1 + ... + a_k = n elements
and is linearly independent, since eigenvectors belonging to different eigenvalues
are linearly independent (this follows from an argument similar to Theorem 7.1.2
in Ahmad and Ambrosetti). We thus obtain a basis for C^n and it follows that

\[ \mathbb{C}^n = \ker(A - \lambda_1 I) \oplus \ker(A - \lambda_2 I) \oplus \dots \oplus \ker(A - \lambda_k I). \]

In this basis, A has the matrix

\[ D = \begin{pmatrix} \lambda_1 I_1 & & \\ & \ddots & \\ & & \lambda_k I_k \end{pmatrix}, \]

where each I_i is an a_i × a_i unit matrix. In other words, D is a diagonal matrix
with the eigenvalues on the diagonal, each repeated a_i times. This explains why
A is called diagonalizable.

Let A ∈ C^{n×n} be a square matrix and p(λ) = a_0 λ^m + a_1 λ^{m-1} + ... + a_{m-1} λ + a_m
a polynomial. We define

\[ p(A) = a_0 A^m + a_1 A^{m-1} + \dots + a_{m-1} A + a_m I. \]

Theorem 20 (Cayley–Hamilton). Let p_A(λ) = det(A - λI) be the characteristic
polynomial of A. Then p_A(A) = 0.

Proof. Cramer's rule from linear algebra says that

\[ (A - \lambda I)^{-1} = \frac{1}{p_A(\lambda)} \operatorname{adj}(A - \lambda I), \]

where the adjugate matrix adj B is a matrix whose elements are the cofactors of
B. More specifically, the element on row i and column j of the adjugate matrix for
B is C_{ji}, where C_{ji} = (-1)^{j+i} det B_{ji}, in which B_{ji} is the (n-1) × (n-1) matrix
obtained by eliminating row j and column i from B (see Sections 7.1.2–7.1.3 of
Ahmad and Ambrosetti). Note that each element of adj(A - λI) is a polynomial in λ
of degree at most n - 1 (since at least one element of the diagonal is eliminated).
Thus,

\[ p_A(\lambda) (A - \lambda I)^{-1} = \lambda^{n-1} B_{n-1} + \dots + \lambda B_1 + B_0, \]

for some constant n × n matrices B_0, ..., B_{n-1}. Multiplying with A - λI gives

\[ p_A(\lambda) I = p_A(\lambda)(A - \lambda I)(A - \lambda I)^{-1} = -\lambda^n B_{n-1} + \lambda^{n-1}(A B_{n-1} - B_{n-2}) + \dots + \lambda(A B_1 - B_0) + A B_0. \]

Thus,

\[ \begin{aligned} -B_{n-1} &= a_0 I, \\ A B_{n-1} - B_{n-2} &= a_1 I, \\ &\ \, \vdots \\ A B_1 - B_0 &= a_{n-1} I, \\ A B_0 &= a_n I, \end{aligned} \]

where

\[ p_A(\lambda) = a_0 \lambda^n + a_1 \lambda^{n-1} + \dots + a_{n-1} \lambda + a_n. \]

Multiplying the rows by A^n, A^{n-1}, ..., A, I and adding them, we get

\[ \begin{aligned} p_A(A) &= a_0 A^n + a_1 A^{n-1} + \dots + a_{n-1} A + a_n I \\ &= -A^n B_{n-1} + A^{n-1}(A B_{n-1} - B_{n-2}) + \dots + A(A B_1 - B_0) + A B_0 \\ &= -A^n B_{n-1} + A^n B_{n-1} - A^{n-1} B_{n-2} + \dots + A^2 B_1 - A B_0 + A B_0 \\ &= 0. \end{aligned} \]
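The theorem is easy to test numerically: evaluate the characteristic polynomial at A and check that the result is the zero matrix up to rounding errors. A sketch assuming NumPy is available; note that np.poly returns the coefficients of det(λI − A), which differs from p_A(λ) = det(A − λI) only by the factor (−1)^n, so the conclusion is the same.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 2.0, 1.0],
                  [1.0, 0.0, 3.0]])               # arbitrary example matrix

    coeffs = np.poly(A)                           # coefficients of det(lambda*I - A), highest degree first
    n = A.shape[0]
    P = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))

    print(np.allclose(P, np.zeros((n, n))))       # expected output: True, by Cayley-Hamilton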

Lemma 21. Suppose that p(λ) = p_1(λ) p_2(λ) where p_1 and p_2 are relatively prime.
If p(A) = 0 we have that

\[ \mathbb{C}^n = \ker p_1(A) \oplus \ker p_2(A) \]

and each subspace ker p_i(A) is invariant under A.

Proof. The invariance follows from p_i(A) A x = A p_i(A) x = 0, x ∈ ker p_i(A). Since
p_1 and p_2 are relatively prime, it follows by Euclid's algorithm that there exist
polynomials q_1, q_2 such that

\[ p_1(\lambda) q_1(\lambda) + p_2(\lambda) q_2(\lambda) = 1. \]

Thus

\[ p_1(A) q_1(A) + p_2(A) q_2(A) = I. \]

Applying this identity to the vector x ∈ C^n, we obtain

\[ x = \underbrace{p_1(A) q_1(A) x}_{x_2} + \underbrace{p_2(A) q_2(A) x}_{x_1}, \]

where

\[ p_2(A) x_2 = p_2(A) p_1(A) q_1(A) x = p(A) q_1(A) x = 0, \]

so that x_2 ∈ ker p_2(A). Similarly x_1 ∈ ker p_1(A). Thus C^n = ker p_1(A) + ker p_2(A).
On the other hand, if

\[ x_1 + x_2 = x_1' + x_2', \quad x_i, x_i' \in \ker p_i(A), \ i = 1, 2, \]

we obtain that

\[ y = x_1 - x_1' = x_2' - x_2 \in \ker p_1(A) \cap \ker p_2(A), \]

so that

\[ y = p_1(A) q_1(A) y + p_2(A) q_2(A) y = q_1(A) p_1(A) y + q_2(A) p_2(A) y = 0. \]

It follows that the representation x = x_1 + x_2 is unique and therefore

\[ \mathbb{C}^n = \ker p_1(A) \oplus \ker p_2(A). \]

Recall that the characteristic polynomial p_A(λ) = det(A - λI) can be factorized
as

\[ p_A(\lambda) = (-1)^n (\lambda - \lambda_1)^{a_1} \cdots (\lambda - \lambda_k)^{a_k}, \]

where λ_1, ..., λ_k are the distinct eigenvalues of A and a_1, ..., a_k the corresponding
algebraic multiplicities. Applying Theorem 20 and Lemma 21 to p_A(λ) we obtain
the following important result.

Theorem 22. We have that

\[ \mathbb{C}^n = \ker(A - \lambda_1 I)^{a_1} \oplus \dots \oplus \ker(A - \lambda_k I)^{a_k}, \]

where each ker(A - λ_i I)^{a_i} is invariant under A.

Proof. We begin by noting that the polynomials (λ - λ_i)^{a_i}, i = 1, ..., k, are rela-
tively prime. Repeated application of Lemma 21 therefore shows that

\[ \mathbb{C}^n = \ker(A - \lambda_1 I)^{a_1} \oplus \dots \oplus \ker(A - \lambda_k I)^{a_k}, \]

with each ker(A - λ_i I)^{a_i} invariant.

The space ker(A - λ_i I)^{a_i} is called the generalized eigenspace corresponding
to λ_i. We leave it as an exercise to show that this is the space spanned by all
generalized eigenvectors of A corresponding to λ_i.
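Numerically, a basis for a generalized eigenspace can be obtained as the null space of the corresponding matrix power. A sketch, assuming SciPy is available; the matrix is the one from Example 30 below, where the eigenvalue 0 has algebraic multiplicity 2.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 2.0, 0.0],
                  [-1.0, 0.0, -1.0]])

    lam, a = 0.0, 2                               # eigenvalue and its algebraic multiplicity
    M = np.linalg.matrix_power(A - lam * np.eye(3), a)
    basis = null_space(M)                         # columns: orthonormal basis for ker(A - lam*I)^a

    print(basis.shape[1])                         # expected output: 2 (= the algebraic multiplicity)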

We can now prove Theorem 18.

Proof of Theorem 18. If we select a basis {v_{i,1}, ..., v_{i,n_i}} for each subspace ker(A - λ_i I)^{a_i},
then the union {v_{1,1}, ..., v_{1,n_1}, v_{2,1}, ..., v_{2,n_2}, ..., v_{k,1}, ..., v_{k,n_k}} will be a
basis for C^n consisting of generalized eigenvectors. The fact that these vectors are
linearly independent follows from Theorem 22. Indeed, suppose that

\[ \sum_{i=1}^{k} \left( \sum_{j=1}^{n_i} \alpha_{i,j} v_{i,j} \right) = 0. \]

Then by the definition of a direct sum, we have Σ_{j=1}^{n_i} α_{i,j} v_{i,j} = 0 for each i. But then,
for each i, α_{i,j} = 0, j = 1, ..., n_i, since {v_{i,1}, ..., v_{i,n_i}} is linearly independent.
When A is diagonalizable it takes the form of a diagonal matrix in the eigen-
vector basis. We might therefore ask what a general matrix looks like in a basis
of generalized eigenvectors. Since each generalized eigenspace is invariant under A,
the matrix in the new basis will be block diagonal:

\[ B = \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_k \end{pmatrix}, \]

where each B_i is an n_i × n_i matrix, n_i = dim ker(A - λ_i I)^{a_i}. Moreover, B_i only has
one eigenvalue, λ_i. Indeed, B_i is the matrix for the restriction of A to ker(A - λ_i I)^{a_i}
in the basis v_{i,1}, ..., v_{i,n_i}, and if Av = λv for some non-zero v ∈ ker(A - λ_i I)^{a_i},
then

\[ 0 = (A - \lambda_i I)^{a_i} v = (\lambda - \lambda_i)^{a_i} v \implies \lambda = \lambda_i. \]

It follows that the dimension of ker(A - λ_i I)^{a_i} equals the algebraic multiplicity of
the eigenvalue λ_i, that is, n_i = a_i. This follows since

\[ (-1)^n (\lambda - \lambda_1)^{a_1} \cdots (\lambda - \lambda_k)^{a_k} = \det(A - \lambda I) = \det(B - \lambda I) = \det(B_1 - \lambda I_1) \cdots \det(B_k - \lambda I_k) = (-1)^n (\lambda - \lambda_1)^{n_1} \cdots (\lambda - \lambda_k)^{n_k}, \]

where we have used the facts that the determinant of a matrix is independent of
basis, and that the determinant of a block diagonal matrix is the product of the
determinants of the blocks.

Set N_i = B_i - λ_i I_i, where I_i is the n_i × n_i unit matrix. Then N_i^{a_i} = 0 by the
definition of the generalised eigenspaces. A linear operator N with the property
that N^m = 0 for some m is called nilpotent.
We can summarize our findings as follows.

Theorem 23. Let A ∈ C^{n×n}. There exists a basis for C^n in which A has the block
diagonal form

\[ B = \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_k \end{pmatrix}, \]

and B_i = λ_i I_i + N_i, where λ_1, ..., λ_k are the distinct eigenvalues of A, I_i is the
a_i × a_i unit matrix and N_i is nilpotent.

We remark that the matrix B in the above theorem is not unique. Apart from the
order of the blocks B_i, the blocks themselves depend on the particular bases chosen
for the generalized eigenspaces. There is a particularly useful way of choosing these
bases which gives rise to the Jordan normal form. Although the Jordan normal
form will not be required to compute the matrix exponential, we present it here for
completeness and since it is mentioned in Ahmad and Ambrosetti.

Theorem 24. Let A ∈ C^{n×n}. There exists an invertible n × n matrix T such that

\[ T^{-1} A T = J, \]

where J is a block diagonal matrix,

\[ J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_m \end{pmatrix}, \]

and each block J_i is a square matrix of the form

\[ J_i = \lambda I + N = \begin{pmatrix} \lambda & 1 & & 0 \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{pmatrix}, \]

where λ is an eigenvalue of A, I is a unit matrix and N has ones on the line directly
above the diagonal and zeros everywhere else. In particular, N is nilpotent.

See http://en.wikipedia.org/wiki/Jordan_normal_form or any advanced textbook
in linear algebra for more information. Note in particular that there is an
alternative version of the Jordan normal form for real matrices with complex eigen-
values, which is briefly mentioned in Ahmad and Ambrosetti and discussed in more
detail on the Wikipedia page.
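Computer algebra systems can produce the Jordan normal form exactly. A minimal sketch using SymPy, assuming it is available; the matrix is the one from Example 16, whose Jordan form is a single 2 × 2 block.

    from sympy import Matrix

    A = Matrix([[2, 1], [0, 2]])      # the non-diagonalizable matrix from Example 16
    T, J = A.jordan_form()            # A = T * J * T**(-1) with J in Jordan normal form

    print(J)                          # expected: Matrix([[2, 1], [0, 2]]), a single Jordan block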

3 Computing the matrix exponential


We will now use the results of the previous section in order to find an algorithm
for computing etA when A is not diagonalizable. Note that our main interest
is actually in solving the equation x0 = Ax, possibly with the initial condition
x(0) = x0 . As we will see, this doesnt actually require computing etA explicitly.
As previously mentioned, when A is diagonalizable, the general solution of
x0 = Ax is given by

(4) x(t) = c1 e1 t v1 + + cn en t vn .

If we want to solve the IVP

(5) x0 = Ax, x(0) = x0 ,

14
we simply choose c1 , . . . , cn so that

x(0) = c1 v1 + + cn vn = x0 .

In other words, the numbers ci are the coordinates for the vector x0 in the basis
v1 , . . . , vn . Note that each term ei t vi in the solution is actually etA vi . Since the
jth column of the matrix etA is given by etA ej , where {e1 , . . . , en } is the standard
basis, we can compute the matrix exponential by repeating the above steps with
initial data x0 = ej for j = 1, . . . , n.
The same approach works when A is not diagonalizable, with the difference
that the basis vectors v_i are now generalized eigenvectors instead of eigenvectors.
Denote the basis vectors v_{i,j}, i = 1, ..., k, j = 1, ..., a_i, as in the proof of Theo-
rem 18 (recall that the dimension of each generalized eigenspace is the algebraic
multiplicity of the corresponding eigenvalue). Then the general solution is

\[ x(t) = \sum_{i=1}^{k} \sum_{j=1}^{a_i} c_{i,j} e^{tA} v_{i,j}. \]

We thus need to compute

\[ e^{tA} v_{i,j}, \]

where v_{i,j} is a generalized eigenvector corresponding to the eigenvalue λ_i. Since

\[ (A - \lambda_i I)^{a_i} v_{i,j} = 0, \]

we find that

\[ \begin{aligned} e^{tA} v_{i,j} &= e^{t \lambda_i I + t(A - \lambda_i I)} v_{i,j} \\ &= e^{t \lambda_i I} e^{t(A - \lambda_i I)} v_{i,j} \\ &= e^{\lambda_i t} e^{t(A - \lambda_i I)} v_{i,j} \\ &= e^{\lambda_i t} \left( I + t(A - \lambda_i I) + \frac{t^2}{2} (A - \lambda_i I)^2 + \dots + \frac{t^{a_i - 1}}{(a_i - 1)!} (A - \lambda_i I)^{a_i - 1} \right) v_{i,j}, \end{aligned} \]

where we have also used the fact that λ_i I and A - λ_i I commute and the definition
of the matrix exponential. The general solution can therefore be written

\[ x(t) = \sum_{i=1}^{k} \sum_{j=1}^{a_i} c_{i,j} e^{\lambda_i t} \left( \sum_{\ell=0}^{a_i - 1} \frac{t^{\ell}}{\ell!} (A - \lambda_i I)^{\ell} v_{i,j} \right). \tag{6} \]

In order to solve the IVP (5), we simply have to express x_0 in the basis v_{i,j} to find
the coefficients c_{i,j}. Finally, to compute e^{tA} we repeat the above steps for each
standard basis vector e_j.
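The finite sum above translates directly into a small routine: for a generalized eigenvector v with eigenvalue λ and multiplicity a, the action of e^{tA} on v only requires the first a powers of A − λI. A sketch, assuming NumPy is available; the data are taken from Example 30 below.

    import numpy as np
    from math import factorial

    def exp_tA_on_gen_eigvec(A, lam, a, v, t):
        """Return e^{tA} v = e^{lam*t} * sum_{l<a} (t^l / l!) (A - lam*I)^l v,
        valid when (A - lam*I)^a v = 0."""
        n = A.shape[0]
        M = A - lam * np.eye(n)
        w = np.zeros(n)
        Mv = v.astype(float).copy()
        for l in range(a):
            w = w + (t ** l / factorial(l)) * Mv
            Mv = M @ Mv                          # next power of (A - lam*I) applied to v
        return np.exp(lam * t) * w

    A = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [-1.0, 0.0, -1.0]])
    v = np.array([1.0, 0.0, 0.0])                    # generalized eigenvector for lambda = 0
    print(exp_tA_on_gen_eigvec(A, 0.0, 2, v, 0.3))   # expected: (1 + t, 0, -t) at t = 0.3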
Remark 25. Formula (6) shows that the general solution is a linear combination
of exponential functions multiplied by polynomials. The polynomial factors can
only appear for eigenvalues with (algebraic) multiplicity two or higher. This is
precisely the same structure which we encountered for homogeneous higher order
scalar linear differential equations with constant coefficients.

Remark 26. When A is diagonalizable we see from (4) that the general solution
is just a linear combination of exponential functions, without polynomial factors.
This seems to contradict the previous remark. A closer look reveals that in this
case

\[ (A - \lambda_i I) v_{i,j} = 0 \tag{7} \]

for each j. Indeed, we know that ker(A - λ_i I) ⊂ ker(A - λ_i I)^{a_i}. If A is diagonal-
izable then the geometric multiplicity dim ker(A - λ_i I) equals the algebraic
multiplicity a_i = dim ker(A - λ_i I)^{a_i}, so the two subspaces coincide. This means
that every generalized eigenvector is in fact an eigenvector, so that (7) holds. This
implies that

\[ (A - \lambda_i I)^{\ell} v_{i,j} = 0, \quad \ell \ge 1, \]

in (6) even if a_i ≥ 2.

Remark 27. More generally, it might happen that (A - λ_i I)^ℓ vanishes on the
generalized eigenspace ker(A - λ_i I)^{a_i} for some ℓ < a_i. One can show that there
is a unique monic polynomial p_min(λ) of smallest degree such that p_min(A) = 0.
p_min(λ) is called the minimal polynomial. It divides the characteristic polynomial
p_A(λ) and can therefore be factorized as

\[ p_{\min}(\lambda) = (\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_k)^{m_k}, \]

with m_i ≤ a_i for each i. Repeating the proof of Theorem 22, we obtain that

\[ \mathbb{C}^n = \ker(A - \lambda_1 I)^{m_1} \oplus \dots \oplus \ker(A - \lambda_k I)^{m_k}, \]

so that ker(A - λ_i I)^{a_i} = ker(A - λ_i I)^{m_i}, i.e. (A - λ_i I)^{m_i} vanishes on ker(A - λ_i I)^{a_i}.
In fact, one can show that (A - λ_i I)^m vanishes on ker(A - λ_i I)^{a_i} if and only if
m ≥ m_i. In the diagonalizable case we have

\[ p_{\min}(\lambda) = (\lambda - \lambda_1) \cdots (\lambda - \lambda_k), \]

i.e. m_i = 1 for all i.

We now consider some examples.

Example 28. Let

\[ A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}. \]

The characteristic polynomial is (λ - 1)² - 4 = (λ + 1)(λ - 3), so the distinct
eigenvalues are λ_1 = -1 and λ_2 = 3. Since the eigenvalues are distinct, there is
a basis consisting of eigenvectors. Solving the equations Ax = -x and Ax = 3x,
we find the eigenvectors v_1 = (1, -1) and v_2 = (1, 1). The general solution of the
system x' = Ax is thus given by

\[ x(t) = c_1 e^{-t} v_1 + c_2 e^{3t} v_2 = c_1 e^{-t} \begin{pmatrix} 1 \\ -1 \end{pmatrix} + c_2 e^{3t} \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

In order to compute the matrix exponential, we find the solutions with x(0) = e_1
and x(0) = e_2, respectively. In the first case, we obtain the equations

\[ c_1 \begin{pmatrix} 1 \\ -1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \iff \begin{cases} c_1 + c_2 = 1 \\ -c_1 + c_2 = 0 \end{cases} \iff \begin{cases} c_1 = \tfrac12 \\ c_2 = \tfrac12. \end{cases} \]

In the second case, we find that

\[ c_1 \begin{pmatrix} 1 \\ -1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \iff \begin{cases} c_1 = -\tfrac12 \\ c_2 = \tfrac12. \end{cases} \]

Hence,

\[ e^{tA} e_1 = \frac12 e^{-t} \begin{pmatrix} 1 \\ -1 \end{pmatrix} + \frac12 e^{3t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac12 \begin{pmatrix} e^{-t} + e^{3t} \\ -e^{-t} + e^{3t} \end{pmatrix} \]

and

\[ e^{tA} e_2 = -\frac12 e^{-t} \begin{pmatrix} 1 \\ -1 \end{pmatrix} + \frac12 e^{3t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac12 \begin{pmatrix} -e^{-t} + e^{3t} \\ e^{-t} + e^{3t} \end{pmatrix}. \]

Finally,

\[ e^{tA} = \frac12 \begin{pmatrix} e^{-t} + e^{3t} & -e^{-t} + e^{3t} \\ -e^{-t} + e^{3t} & e^{-t} + e^{3t} \end{pmatrix}. \]
Example 29. Let

\[ A = \begin{pmatrix} -3 & 4 \\ -1 & 1 \end{pmatrix}. \]

The characteristic polynomial is (λ + 1)², so λ_1 = -1 is the only eigenvalue. This
means that any vector belongs to the generalized eigenspace ker(A + I)², so that

\[ \begin{aligned} e^{tA} v &= e^{-t} e^{t(A+I)} v \\ &= e^{-t} (I + t(A + I)) v \\ &= e^{-t} \left( \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + t \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix} \right) v \\ &= e^{-t} \begin{pmatrix} 1 - 2t & 4t \\ -t & 1 + 2t \end{pmatrix} v. \end{aligned} \]

In particular

\[ e^{tA} = e^{-t} \begin{pmatrix} 1 - 2t & 4t \\ -t & 1 + 2t \end{pmatrix}. \]

We could come to the same conclusion by using the standard basis vectors e_1 and
e_2 as our basis v_{1,1}, v_{1,2} in the solution formula (6). This would give the general
solution

\[ x(t) = c_1 e^{tA} e_1 + c_2 e^{tA} e_2 = c_1 e^{-t} \begin{pmatrix} 1 - 2t \\ -t \end{pmatrix} + c_2 e^{-t} \begin{pmatrix} 4t \\ 1 + 2t \end{pmatrix}. \]

There is one more possibility for 2 × 2 matrices. The matrix could be diago-
nalizable and still have a double eigenvalue. We leave it as an exercise to show
that this can happen if and only if A is a scalar multiple of the identity matrix,
i.e. A = λI for some number λ (which will be the only eigenvalue). In this case
e^{tA} = e^{λt} I.

For 3 × 3 matrices there are more possibilities.

Example 30. Let

\[ A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ -1 & 0 & -1 \end{pmatrix}. \]

The characteristic polynomial of A is p_A(λ) = -λ²(λ - 2). Thus, the only
eigenvalues of A are λ_1 = 0 and λ_2 = 2, with algebraic multiplicities a_1 = 2 and a_2 = 1,
respectively. We find that

\[ Ax = 0 \iff x = z(1, 0, -1), \qquad Ax = 2x \iff x = z(0, 1, 0), \]

z ∈ C. Thus v_1 = (1, 0, -1) and v_2 = (0, 1, 0) are eigenvectors corresponding to λ_1
and λ_2, respectively. We see that A is not diagonalizable.

The generalised eigenspace corresponding to λ_2 is simply the usual eigenspace
ker(A - 2I), but the one corresponding to λ_1 is ker A². Calculating

\[ A^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \]

we find e.g. the basis v_{1,1} = v_1 = (1, 0, -1), v_{1,2} = (1, 0, 0) for ker A², and we
previously found the basis v_{2,1} = v_2 = (0, 1, 0) for ker(A - 2I).
We have

\[ e^{tA} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = e^{2t} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \]

and

\[ e^{tA} \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = e^{0 \cdot t} \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}. \]

Finally,

\[ e^{tA} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = (I + tA) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1+t & 0 & t \\ 0 & 1+2t & 0 \\ -t & 0 & 1-t \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1+t \\ 0 \\ -t \end{pmatrix}. \]

We can thus already write the general solution as

\[ x(t) = c_1 \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + c_2 \begin{pmatrix} 1+t \\ 0 \\ -t \end{pmatrix} + c_3 e^{2t} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \tag{8} \]

where we chose to use the simpler notation c_1, c_2, c_3 for the coefficients instead of
c_{1,1}, c_{1,2}, c_{2,1} from eq. (6).
In order to compute e^{tA}, we need to compute e^{tA} e_i for the standard basis
vectors. Note that we have already computed

\[ e^{tA} e_1 = \begin{pmatrix} 1+t \\ 0 \\ -t \end{pmatrix} \]

and

\[ e^{tA} e_2 = e^{2t} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \]

so it remains to compute e^{tA} e_3. We thus need to solve the equation

\[ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \]

and a simple calculation gives c_1 = -1, c_2 = 1 and c_3 = 0, so that

\[ e^{tA} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = -\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + \begin{pmatrix} 1+t \\ 0 \\ -t \end{pmatrix} = \begin{pmatrix} t \\ 0 \\ 1-t \end{pmatrix}. \]

Thus,

\[ e^{tA} = \begin{pmatrix} 1+t & 0 & t \\ 0 & e^{2t} & 0 \\ -t & 0 & 1-t \end{pmatrix}. \]
Example 31. Suppose that we in the previous example wanted to solve the IVP

\[ x' = Ax, \quad x(0) = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}. \]

Of course, once we have the formula for the matrix exponential we can find the
solution by calculating

\[ x(t) = e^{tA} x(0) = \begin{pmatrix} 1+t & 0 & t \\ 0 & e^{2t} & 0 \\ -t & 0 & 1-t \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} t \\ e^{2t} \\ 1-t \end{pmatrix}. \]

We could however also solve the problem by using the general solution (8) and
finding c_1, c_2, c_3 to match the initial data. We thus need to solve the system

\[ \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \]

giving c_1 = -1, c_2 = 1 and c_3 = 1. Thus,

\[ x(t) = -\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + \begin{pmatrix} 1+t \\ 0 \\ -t \end{pmatrix} + e^{2t} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} t \\ e^{2t} \\ 1-t \end{pmatrix}, \]

which coincides with the result of our previous computation.

Example 32. Let

\[ A = \begin{pmatrix} 3 & 1 & 1 \\ 0 & 2 & 0 \\ -1 & -1 & 1 \end{pmatrix}. \]

The characteristic polynomial of A is p_A(λ) = -(λ - 2)³. Thus, A has the only
eigenvalue λ_1 = 2, with algebraic multiplicity 3. The corresponding generalized
eigenspace is the whole of C³. Just like in Example 29 we can therefore compute
the matrix exponential directly through the formula

\[ e^{tA} = e^{2t} e^{t(A - 2I)} = e^{2t} \left( I + t(A - 2I) + \frac{t^2}{2} (A - 2I)^2 \right). \]

We find that

\[ A - 2I = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{pmatrix} \quad \text{and} \quad (A - 2I)^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

Thus the term (t²/2)(A - 2I)² vanishes and we obtain

\[ \begin{aligned} e^{tA} &= e^{2t} (I + t(A - 2I)) \\ &= e^{2t} \left( \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + t \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{pmatrix} \right) \\ &= \begin{pmatrix} (1+t)e^{2t} & te^{2t} & te^{2t} \\ 0 & e^{2t} & 0 \\ -te^{2t} & -te^{2t} & (1-t)e^{2t} \end{pmatrix}. \end{aligned} \]

Example 33. Let

\[ A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{pmatrix}. \]

Again p_A(λ) = -(λ - 2)³ and thus 2 is the only eigenvalue. This time

\[ A - 2I = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad (A - 2I)^2 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \]

so that

\[ \begin{aligned} e^{tA} &= e^{2t} \left( I + t(A - 2I) + \frac{t^2}{2} (A - 2I)^2 \right) \\ &= e^{2t} \left( \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + t \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} + \frac{t^2}{2} \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \right) \\ &= \begin{pmatrix} e^{2t} & 0 & 0 \\ \frac{t^2}{2} e^{2t} & e^{2t} & te^{2t} \\ te^{2t} & 0 & e^{2t} \end{pmatrix}. \end{aligned} \]

The 4 × 4 case can be analyzed in a similar way. In general, the computations
will get more involved the higher n is. Most computer algebra systems have rou-
tines for computing the matrix exponential. In Maple this can be done using the
command MatrixExponential from the LinearAlgebra package.
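In Python, a symbolic analogue (assuming SymPy is available) is the exp method on matrices, while scipy.linalg.expm gives the numerical value for a fixed t. A sketch reproducing Example 33 symbolically:

    from sympy import Matrix, symbols

    t = symbols('t')
    A = Matrix([[2, 0, 0],
                [0, 2, 1],
                [1, 0, 2]])          # the matrix from Example 33

    print((t * A).exp())             # symbolic e^{tA}; should match the closed form above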

We can also formulate the above algorithm using the block diagonal represen-
tation of A from Theorem 23. Let T be the matrix for the corresponding change of
basis, i.e. T is the matrix whose columns are the coordinates for the basis vectors
in which A takes the block diagonal form B. Then A = T B T^{-1} and

\[ e^{tA} = T e^{tB} T^{-1}, \]

where

\[ e^{tB} = \begin{pmatrix} e^{tB_1} & & \\ & \ddots & \\ & & e^{tB_k} \end{pmatrix} \]

and

\[ e^{tB_i} = e^{t(\lambda_i I_i + N_i)} = e^{t \lambda_i I_i} e^{t N_i} = e^{\lambda_i t} \left( I_i + t N_i + \dots + \frac{t^{a_i - 1}}{(a_i - 1)!} N_i^{a_i - 1} \right), \]

since N_i^m = 0 for m ≥ a_i.


In the last two examples above, the matrices were already in block diagonal
form, so that we could take T = I. Let us instead consider Example 30 from this
perspective.

Example 34. For the matrix

\[ A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ -1 & 0 & -1 \end{pmatrix} \]

in Example 30 we found that the generalized eigenspace ker A² had the basis

\[ v_{1,1} = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad v_{1,2} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \]

and the eigenspace ker(A - 2I) the basis

\[ v_{2,1} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}. \]

Thus, we take

\[ T = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix} \]

and find that

\[ T^{-1} = \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}. \]

Since A v_{1,1} = 0, A v_{1,2} = v_{1,1} and A v_{2,1} = 2 v_{2,1}, the corresponding block diagonal
matrix is

\[ B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix} = \begin{pmatrix} B_1 & \\ & B_2 \end{pmatrix}, \]

where

\[ B_1 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B_2 = 2. \]

We find that

\[ e^{tB_1} = I + tB_1 = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \quad e^{tB_2} = e^{2t}. \]

Thus,

\[ e^{tB} = \begin{pmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{2t} \end{pmatrix} \]

and

\[ e^{tA} = T e^{tB} T^{-1} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{2t} \end{pmatrix} \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1+t & 0 & t \\ 0 & e^{2t} & 0 \\ -t & 0 & 1-t \end{pmatrix}, \]

in agreement with our previous calculation.

Exercises

1. Compute e^A by summing the power series when

\[ \text{a)} \ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \text{b)} \ A = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}. \]
 
2. Compute e^{tA} by diagonalising the matrix, where

\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]
3. Show that ‖e^A‖ ≤ e^{‖A‖}.

4. a) Show that (e^A)^* = e^{A^*}.

b) Show that e^S is unitary if S is skew symmetric, that is, S^* = -S.

5. Show that the following identities (for all t ∈ R) imply AB = BA.

a) A e^{tB} = e^{tB} A,   b) e^{tA} e^{tB} = e^{t(A+B)}.

6. Let

\[ A_1 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 1 & 3 \\ 0 & 2 & 1 \end{pmatrix}. \]

Calculate the generalized eigenspaces and determine a basis consisting of gen-
eralized eigenvectors in each case.

7. Calculate e^{tA_j} for the matrices A_j in the previous exercise.

8. Solve the initial-value problem

\[ x' = Ax, \quad x(0) = x_0, \]

where

\[ A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 1 & 4 \\ 5 & 1 & 4 \end{pmatrix} \quad \text{and} \quad x_0 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}. \]

9. The matrix

\[ A = \begin{pmatrix} 18 & 3 & 2 & 12 \\ 0 & 2 & 0 & 0 \\ 2 & 12 & 2 & 1 \\ 24 & 6 & 3 & 16 \end{pmatrix} \]

has the eigenvalues 1 and 2. Find the corresponding generalized eigenspaces
and determine a basis consisting of generalized eigenvectors.

10. Consider the initial value problem

\[ \begin{cases} x_1' = x_1 + 3x_2, \\ x_2' = 3x_1 + x_2, \end{cases} \qquad x(0) = x_0. \]

For which initial data x_0 does the solution converge to zero as t → ∞?

11. Can you find a general condition on the eigenvalues of A which guarantees
that all solutions of the IVP

\[ x' = Ax, \quad x(0) = x_0, \]

converge to zero as t → ∞?

12. The matrices A_1 and A_2 in Exercise 6 have the same eigenvalues. If you've
solved Exercise 7 correctly, you will notice that all solutions of the IVP cor-
responding to A_1 are bounded for t ≥ 0 while there are unbounded solutions
of the IVP corresponding to A_2. Explain the difference and try to formulate
a general principle.
