
Chapter 7

The Singular Value Decomposition


In an earlier chapter we looked at a procedure for diagonalizing a square matrix by using a change
of basis. At that time we saw that not every square matrix could be diagonalized. In this chapter
we will look at a generalization of that diagonalization procedure that will allow us to diagonalize
any matrix, square or not, invertible or not. This procedure is called the singular value
decomposition.
7.1 Singular Values
Let A be an m × n matrix. Then we know that A^T A will be a symmetric positive semi-definite
n × n matrix. We can therefore find an orthonormal basis of R^n consisting of eigenvectors of A^T A.
Let this orthonormal basis be v_1, v_2, ..., v_n and let λ_i be the eigenvalue of A^T A corresponding
to the eigenvector v_i. Since A^T A is positive semi-definite we must have λ_i ≥ 0.
Now notice that
    |Av_i|^2 = (Av_i)^T (Av_i) = v_i^T A^T A v_i = λ_i v_i^T v_i = λ_i
Therefore the length of Av_i is √λ_i. In other words, √λ_i is the factor by which the length of each
eigenvector of A^T A is scaled when multiplied by A.
Furthermore, notice that for i ≠ j we have
    Av_i · Av_j = (Av_i)^T (Av_j) = v_i^T A^T A v_j = λ_j v_i · v_j = 0
so Av_1, Av_2, ..., Av_n is an orthogonal set of vectors. If we want to normalize a non-zero vector
Av_i in this set we just have to scale it by 1/√λ_i. Note also that some of the vectors in this set
could be the zero vector if 0 happens to be an eigenvalue of A^T A. In fact one of these vectors will
definitely be the zero vector whenever Nul A ≠ {0} (that is, whenever the columns of A are linearly
dependent). The reason is as follows:
    Nul A ≠ {0}  ⟹  Ax = 0 for some x ≠ 0
                 ⟹  A^T Ax = 0
                 ⟹  x is an eigenvector of A^T A with eigenvalue 0.
The implication also works in the other direction:
    0 is an eigenvalue of A^T A  ⟹  A^T Ax = 0 for some x ≠ 0
                                 ⟹  x^T A^T Ax = 0
                                 ⟹  |Ax|^2 = 0
                                 ⟹  Ax = 0
                                 ⟹  the columns of A are linearly dependent.
The above comments lead to the following definition.

Definition 22. Let A be an m × n matrix. The singular values of A are defined to be the square
roots of the eigenvalues¹ of A^T A. The singular values of A will be denoted by σ_1, σ_2, ..., σ_n. It is
customary to list the singular values in decreasing order, so it will be assumed that
    σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0

¹ Some textbooks prefer to define the singular values of A as the square roots of the non-zero eigenvalues of A^T A.
Example 7.1.1
What are the singular values of A = [1 1; 1 1; 1 −1]?
The first step is to compute A^T A, which gives [3 1; 1 3]. This matrix has the characteristic polynomial
    λ^2 − 6λ + 8 = (λ − 4)(λ − 2)
which gives us the two eigenvalues 4 and 2. We take the square roots of these to get
the singular values, σ_1 = 2 and σ_2 = √2.
The vectors v_1 = [√2/2; √2/2] and v_2 = [√2/2; −√2/2] would be orthonormal eigenvectors
of A^T A. What happens when these two vectors are multiplied by A?
    Av_1 = [√2; √2; 0] and this vector has length σ_1 = 2.
    Av_2 = [0; 0; √2] and this vector has length σ_2 = √2.
So the lengths of v_1 and v_2 are scaled by the corresponding singular values when these
vectors are multiplied by A. Note also, as mentioned earlier, that Av_1 and Av_2 are
orthogonal.
Now consider the following problem: Let B = A^T. What are the singular values of B?
B^T B will be a 3 × 3 matrix so B has 3 singular values. It was shown earlier that A^T A
and AA^T will have the same non-zero eigenvalues, so the singular values of B will be 2,
√2, and 0.
Example 7.1.2
Let A = [1 2; 1 2]. What are the singular values of A? (Note that in this case the columns
of A are not linearly independent so, for reasons mentioned earlier in this section, 0 will
turn out to be a singular value.)
The procedure is straightforward. First we compute
    A^T A = [2 4; 4 8]
and this matrix has characteristic polynomial λ^2 − 10λ, which gives eigenvalues of 10
and 0, and so we get σ_1 = √10 and σ_2 = 0 as the singular values of A.
For λ = 10 we would have a unit eigenvector of v_1 = [1/√5; 2/√5].
Then Av_1 = [√5; √5], which has length σ_1 = √10.
For λ = 0 we would have a unit eigenvector of v_2 = [−2/√5; 1/√5].
Then Av_2 = [0; 0] and this vector has length σ_2 = 0.
The Singular Value Decomposition
Here is the main theorem for this chapter.
Theorem 7.1 (The Singular Value Decomposition). Let A be any m × n matrix. Then we can
write A = UΣV^T where U is an m × m orthogonal matrix, V is an n × n orthogonal matrix, and
Σ is an m × n matrix whose first r diagonal entries are the nonzero singular values σ_1, σ_2, ..., σ_r
of A and all other entries are zero. The columns of V are called the right singular vectors. The
columns of U are called the left singular vectors.
Proof.
Let A be any m × n matrix. Let σ_1, σ_2, ..., σ_n be the singular values of A (with σ_1, σ_2, ..., σ_r
the non-zero singular values) and let v_1, v_2, ..., v_n be the corresponding orthonormal eigenvectors
of A^T A. Let V = [v_1 v_2 ... v_n]. So V is an orthogonal matrix and
    AV = [Av_1 Av_2 ... Av_n] = [Av_1 Av_2 ... Av_r 0 ... 0]
We will mention here (the proof is left as an exercise) that r will be the rank of A. So it is possible
that r = n, in which case there will not be any columns of zeroes in AV.
Now let u_i = (1/σ_i) Av_i for 1 ≤ i ≤ r. As we saw earlier these vectors will form an orthonormal
set of r vectors in R^m. Extend this set to an orthonormal basis of R^m by adding m − r appropriate
vectors u_{r+1}, ..., u_m, and let U = [u_1 u_2 ... u_r u_{r+1} ... u_m]. Then U will be an orthogonal matrix
and
    UΣ = [u_1 u_2 ... u_m] [σ_1 0 0 ...; 0 σ_2 0 ...; 0 0 σ_3 ...; ...]
       = [σ_1 u_1  σ_2 u_2 ... σ_r u_r  0 ... 0]
       = [Av_1 Av_2 ... Av_r 0 ... 0]
(In case the above reasoning is unclear, remember that in the product UΣ the columns of Σ contain
the weights given to the columns of U, and after the r-th column all the entries in Σ are zeroes.)
Therefore AV = UΣ, and multiplying on the right by V^T gives us the singular value decomposition
A = UΣV^T.
The singular value decomposition (SVD) can also be written as
    A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... + σ_r u_r v_r^T
You should see a similarity between the singular value decomposition and the spectral decomposition.
In fact, if A is symmetric and positive definite they are equivalent.
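Here is a short argument for that last remark (our sketch, assuming the spectral decomposition A = PDP^T from the earlier chapter, with all eigenvalues λ_i positive):
\[
A^{T}A = (PDP^{T})^{T}(PDP^{T}) = PD^{2}P^{T},
\]
so the eigenvalues of \(A^{T}A\) are \(\lambda_i^{2}\) and the singular values are \(\sigma_i=\sqrt{\lambda_i^{2}}=\lambda_i\). Taking \(v_i = p_i\) gives
\[
u_i = \tfrac{1}{\sigma_i}Av_i = \tfrac{1}{\lambda_i}(\lambda_i p_i) = p_i,
\]
so \(U = V = P\), \(\Sigma = D\), and the SVD \(U\Sigma V^{T}\) is exactly the spectral decomposition \(PDP^{T}\).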
The singular value decomposition of a matrix is not unique. The right singular vectors are orthonormal
eigenvectors of A^T A. If an eigenspace of this matrix is 1-dimensional there are two choices
for the corresponding singular vector; these choices are negatives of each other. If an eigenspace has
dimension greater than 1 then there are infinitely many choices for the (orthonormal) eigenvectors,
but any of these choices would be an orthonormal basis of the same eigenspace. Furthermore, as
seen in the above proof, it might be necessary to add columns² to U to make up an orthonormal
basis for R^m. There will be a certain amount of freedom in choosing these vectors.

² Suppose we let W be the span of u_1, u_2, ..., u_r. Then the columns that we add are an orthonormal basis of W⊥.
Example 7.1.3
To illustrate the proof of Theorem 7.1 we will outline the steps required to find the
SVD of
    A = [1 2; 1 2]
In Example 7.1.2 we found the singular values of A to be σ_1 = √10 and σ_2 = 0, so we
know that
    Σ = [√10 0; 0 0]
If we take the right singular vectors (in the appropriate order) as columns then we have
    V = [1/√5 −2/√5; 2/√5 1/√5]
Take a moment to consider the following questions:
i. Are there any other possible answers for Σ in this example?
ii. Are there any other possible answers for V in this example?
The answer is no to the first question, and yes to the second. There are four possible
choices for V. (What are they?)
Now how can we find U? From the proof of Theorem 7.1 we see that
    u_1 = (1/σ_1) Av_1 = (1/√10) [1 2; 1 2] [1/√5; 2/√5] = (1/√10) [√5; √5] = [1/√2; 1/√2]
This gives us the first column of U, but we can't find u_2 the same way since σ_2 = 0. To
find u_2 we just have to extend u_1 to an orthonormal basis of R^2. It should be clear that
letting u_2 = [−1/√2; 1/√2] will work. So we now have
    U = [1/√2 −1/√2; 1/√2 1/√2]
Again, stop now and ask yourself if there are any other possible choices for U at this stage.
(The answer is yes; for any particular choice of V there are 2 choices for U.)
We now have the SVD
    A = UΣV^T = [1/√2 −1/√2; 1/√2 1/√2] [√10 0; 0 0] [1/√5 2/√5; −2/√5 1/√5]
You should recognize U and V as rotation matrices.
This SVD can also be written in the form
    σ_1 u_1 v_1^T + σ_2 u_2 v_2^T = √10 [1/√2; 1/√2] [1/√5 2/√5]
(the second term vanishes because σ_2 = 0).
Example 7.1.4
Find the SVD of A = [1 1; 1 1; 1 −1].
We used this matrix for an earlier example and so we already have most of the important
information. From the earlier results we know that
    Σ = [2 0; 0 √2; 0 0]   and   V = [√2/2 √2/2; √2/2 −√2/2]
The last step is to find U. The first column of U will be Av_1 normalized, so
u_1 = [√2/2; √2/2; 0]. Similarly u_2 = [0; 0; 1]. What about u_3? First notice that at this point we
can write the SVD as follows:
    UΣV^T = [√2/2 0 ∗; √2/2 0 ∗; 0 1 ∗] [2 0; 0 √2; 0 0] [√2/2 √2/2; √2/2 −√2/2]
          = [√2/2 0 ∗; √2/2 0 ∗; 0 1 ∗] [√2 √2; 1 −1; 0 0]
If we now carry out the last matrix multiplication, the entries in the third column of
U all get multiplied by 0. So in a sense it doesn't matter what entries go in that last
column.
This can also be seen if we write the SVD in the form σ_1 u_1 v_1^T + σ_2 u_2 v_2^T. Since there is
no σ_3 it follows that the value of u_3 is not relevant when the SVD is expressed in this
form. In this form the SVD gives
    σ_1 u_1 v_1^T + σ_2 u_2 v_2^T = 2 [√2/2; √2/2; 0] [√2/2 √2/2] + √2 [0; 0; 1] [√2/2 −√2/2]
    = 2 [1/2 1/2; 1/2 1/2; 0 0] + √2 [0 0; 0 0; √2/2 −√2/2]
    = [1 1; 1 1; 0 0] + [0 0; 0 0; 1 −1]
    = [1 1; 1 1; 1 −1]
But having said all this, U should have a third column, and if you wanted to find it how
could you do it? The set {u_1, u_2} is an orthonormal basis for a plane in R^3. To extend
these two vectors to an orthonormal basis for all of R^3 we want a third vector, u_3, that
is normal to this plane. One way of doing this would be to let u_3 = u_1 × u_2. This would
give u_3 = [√2/2; −√2/2; 0].
Exercises
1. Find a singular value decomposition of the following matrices.
   (a) [2 3; 0 2]   (b) [6 3; 1 2]   (c) [0 2; 0 0]
2. Find a singular value decomposition of the following matrices.
   (a) [0 2; 0 1; 0 0]   (b) [1 0; 0 1; 1 0]   (c) [1 2; 0 1; 1 0]
3. What are the singular values of the matrix [cos(θ) sin(θ); sin(θ) cos(θ)]?
4. Let A = [1 2 2; 1 2 2]. Find a SVD for A and A^T.
5. (a) Let A = [1 0; 0 −2]. This is a symmetric indefinite matrix. Find a spectral decomposition and
   a singular value decomposition for this matrix.
   (b) Let A = [1 3; 3 1]. This is a symmetric indefinite matrix. Find a spectral decomposition and a
   singular value decomposition for this matrix.
   (c) If A is a symmetric matrix show that the singular values of A are just the absolute values of
   the eigenvalues of A.
6. Find a singular value decomposition for the following matrices. Note that these matrices have
   different sizes, but they are all of rank 1 so in each case the SVD can be written σ_1 u_1 v_1^T.
   (a) [1; 1; 1]   (b) [1 1; 1 1; 1 1]   (c) [1 1 1; 1 1 1; 1 1 1]   (d) [1 1 1 1; 1 1 1 1; 1 1 1 1]
7. Find a singular value decomposition for the following matrices. Note that these matrices have
   different sizes, but they are all of rank 2 so in each case the SVD can be written σ_1 u_1 v_1^T + σ_2 u_2 v_2^T.
   (a) [1 0; 1 0; 0 1]   (b) [1 0 2; 1 0 2; 0 1 0]   (c) [1 0 2 0; 1 0 2 0; 0 1 0 2]
8. Find the SVD of A = [1 1 0; 0 0 1; 0 0 1].
9. The matrix A = [1 3/2; 0 1] is not diagonalizable. What is the singular value decomposition of this
   matrix?
10. Let A = [0 0; 0 0; 1 1]. Find the singular value decomposition A = UΣV^T. How many choices are
    there for the second and third columns of U?
11. Let A = UΣV^T be the singular value decomposition of A. Express the following in terms of U, Σ,
    and V.
    (a) A^T A
    (b) AA^T
    (c) (A^T A)^{-1} A^T (assuming A has linearly independent columns)
    (d) A(A^T A)^{-1} A^T (assuming A has linearly independent columns)
12. Suppose A is a square matrix with singular value decomposition A = UΣV^T.
    (a) What is the SVD of A^T?
    (b) If A is invertible, what is the SVD of A^{-1}?
    (c) Show that |det(A)| is the product of the singular values of A.
13. Let A = UΣV^T be the singular value decomposition of the m × n matrix A with U = [u_1 u_2 ... u_m]
    and V = [v_1 v_2 ... v_n]. Show that
    σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... + σ_k u_k v_k^T
    has rank k. (Hint: show that v_{k+1}, v_{k+2}, ..., v_n is a basis for the null space of this matrix.)
14. (a) Suppose A is a symmetric matrix with the spectral decomposition A = PDP^T. Show that
    the spectral decomposition of A + I is P(D + I)P^T.
    (b) Suppose A is a square matrix with the SVD A = UΣV^T. Is the SVD of A + I given by
    U(Σ + I)V^T?
15. Let Q be a matrix with orthonormal columns. What does a SVD of Q look like?
Using MAPLE
Example 1
The Maple command for computing the SVD is SingularValues and is illustrated below.
We will find the SVD of
    A = [0 1 2; 3 0 1; 2 3 0; 1 2 3]
>A:=<<0,3,2,1>|<1,0,3,2>|<2,1,0,3>>;
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]):
>U;
    [ -0.32302   0.49999   0.03034  -0.80296 ]
    [ -0.41841  -0.49999   0.74952  -0.11471 ]
    [ -0.55065  -0.5000   -0.65850  -0.11471 ]
    [ -0.64604   0.49999   0.06068   0.57354 ]
>S;
    [ 5.35768, 2.82843, 2.30115, 0.0 ]
>Vt;
    [ -0.56043  -0.60979          -0.56043 ]
    [ -0.70711  -2.6895 x 10^(-16)  0.70711 ]
    [  0.43119  -0.79256           0.43119 ]
The singular values are returned as a vector, not in the form of a diagonal matrix. If you want the
singular values in a matrix you can enter
>DiagonalMatrix(S[1..3],4,3);
>U.%.Vt;
This last command returns the following matrix
    [ 0.00000000036  1.0  2.0 ]
    [ 3.0            0    0.9999999997 ]
    [ 2.0            3.0  0.0000000001 ]
    [ 1.000000001    2.0  3.000000001 ]
This is matrix A with some small differences due to the accumulation of rounding errors in the floating
point arithmetic. The precision of our result could be improved by increasing the value of the Digits
variable in Maple.
We could also write the SVD in the form
    A = Σ_{i=1}^{3} σ_i u_i v_i^T
In Maple this sum could be entered as
>simplify(add(S[i]*Column(U,i).Row(Vt,i),i=1..3));
This will again give matrix A with some rounding errors.
Example 2
We will use Maple to find the singular values of
    A = [1 a; 1 0; 1 a]
and we will investigate how these singular values relate to the parameter a.
>A:=<<1,1,1>|<a,0,a>>;
>U,S,Vt:=SingularValues(A,output=[U,S,Vt],conjugate=false);
We now have the two singular values of A expressed in terms of the parameter a. We can visualize
the relationship between a and these singular values as follows:
>plot({ [a,S[1],a=-4..4],[a,S[2],a=-4..4]});
We get Figure 7.1.
Figure 7.1: The singular values of A versus a.
The plot seems to indicate that one of the singular values, S[2], approaches a limit as a becomes
large. We can compute this limit in Maple as follows
>limit(S[2], a=infinity);
    1
We look at a variation on the same type of problem. Suppose we want to investigate the singular
values of matrices of the form
    B = [cos(t) sin(t); sin(t) cos(t)]
We will first define
>f:=t-><<cos(t),sin(t)>|<sin(t),cos(t)>>;
This defines a function in Maple which returns a matrix of the desired form for any specified value
of t. For example the command
>f(1);
will return
    [cos(1) sin(1); sin(1) cos(1)]
and
>f(k);
will return
    [cos(k) sin(k); sin(k) cos(k)]
Next we enter
>g:=t->map( sqrt, eigenvals(transpose(f(t))&*f(t)) );
This will compute the singular values of our matrix for any specified value of t.
For example, the command
>g(.3);
    [ .659816, 1.250857 ]
returns the singular values of
    [cos(.3) sin(.3); sin(.3) cos(.3)]
So we can enter
>sv:=g(t):
>plot( [ sv[1], sv[2] ], t=-3..3);
These commands give Figure 7.2, which plots the singular values of our matrix as a function of t.
Figure 7.2: The singular values of B versus t.
Example 3
We have seen that the SVD of matrix A can be expressed as
    A = Σ_{i=1}^{r} σ_i u_i v_i^T
where r is the rank of A. For any integer n with 0 < n ≤ r the sum
    Σ_{i=1}^{n} σ_i u_i v_i^T
is called the rank n singular value approximation of A.
Before getting to the main problem we will look at a simple example to illustrate the basic idea.
Let A = [1.4 0.0 3.0; 1.1 0.0 0.0; 2.1 2.1 2.1]. This is a 3 × 3 matrix of rank 3. We will find the SVD of A using Maple.
>A:=<<1.4,1.1,2.1>|<0.0,0.0,2.1>|<3.0,0.0,2.1>>;
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
>u1:=Column(U,1): ### The left singular vectors
>u2:=Column(U,2):
>u3:=Column(U,3):
>v1:=Row(Vt,1): ### the right singular vectors as row vectors
>v2:=Row(Vt,2):
>v3:=Row(Vt,3):
The rank 1 singular value approximation would be
>A1:=S[1]*u1.v1;
>A1:=U.DiagonalMatrix(<S[1],0,0>).Vt; ### another way to get the same result
    A1 = [1.719 1.023 2.309; .348 .207 .468; 1.953 1.163 2.624]
How close is matrix A1 to A? This question makes sense only relative to an inner product. We will use
the inner product <A, B> = trace(A^T B).
The distance from A to A1 can now be computed as
>sqrt(Trace((A-A1)^%T.(A-A1)));
    1.9046
We will mention without proof that, in fact, matrix A1 is the closest you can get to A by a matrix
of rank 1 relative to this inner product.
The rank 2 approximation would be
>A2:=S[1]*u1.v1 + S[2]*u2.v2; ### one way
>A2:=U.DiagonalMatrix(<S[1],S[2],0>).Vt; ### another way
    A2 = [1.365 .0232 3.016; .433 .447 .299; 2.249 1.000 2.033]
If you compare the entries in this matrix with those in A you can see that it appears to be closer to A
than matrix A1. How far is A2 from A?
>sqrt(Trace((A-A2)^%T.(A-A2)));
    .8790
So we see that A2 is a better approximation to A than A1. A2 will be the closest you can get to A by
a rank 2 matrix.
If we were to continue this for one more step and compute the rank 3 singular value approximation
we would get A exactly. The distance from A3 to A would be 0.
We will extend this idea to a larger matrix.
In this example we will choose a random 12 × 12 matrix and compute the distance between the rank
n singular value approximation of A and A itself for n = 1..12. The distance will be computed relative
to the inner product <A, B> = trace(A^T B).
>A:=RandomMatrix(12,12, generator=0.0..9.0):
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
>ip:=(A,B)->Trace(A^%T.B); ### our inner product
We will now define our rank n approximations in Maple. Then we compute the distances (i.e., the
errors of our approximations) using the inner product.
>for n to 12 do
  B[n]:=eval(add(S[i]*Column(U,i).Row(Vt,i),i=1..n)) od:
>for n to 12 do
  err[n]:=sqrt(ip(A-B[n],A-B[n])) od;
We can visualize these errors using a plot.
>plot([seq( [i,err[i]],i=1..12)],style=point);
This gives Figure 7.3.
Figure 7.3: The errors of the SVD approximations.
Of course B_12 = A, so the final error must be 0. The above pattern is typical for any matrix. As n
increases the approximations become better and better, and the approximation becomes exact when n = r.
There is another interesting aspect to this example. We have found the singular values and placed
them in the vector S. From this vector we will define the following values
    e_1 = sqrt(σ_2^2 + σ_3^2 + σ_4^2 + ... + σ_12^2)
    e_2 = sqrt(σ_3^2 + σ_4^2 + ... + σ_12^2)
    e_3 = sqrt(σ_4^2 + ... + σ_12^2)
    ...
    e_11 = σ_12
    e_12 = 0
and plot them.
>for i to 12 do e[i]:=sqrt( add( S[j]^2, j=i+1..12)) od:
>plot( [seq( [i, e[i]], i=1..12)], style=point);
This plot turns out to be exactly the same as Figure 7.3. This illustrates a fact that is true in general
and whose proof is left as an exercise³: The error of the rank n singular value approximation is the square
root of the sum of the squares of the unused singular values. That is, if you look at the unused singular
values as a vector, then the error is the length of this vector.
³ The trickiest part of the proof depends on the fact that if v is a unit vector then the trace of vv^T is 1.
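Here is an outline of that exercise, using the hint in the footnote (our sketch, not the text's proof):
\[
\langle A-B_n,\,A-B_n\rangle
=\operatorname{trace}\Big(\big(\textstyle\sum_{i>n}\sigma_i u_i v_i^{T}\big)^{T}\big(\textstyle\sum_{j>n}\sigma_j u_j v_j^{T}\big)\Big)
=\operatorname{trace}\Big(\textstyle\sum_{i,j>n}\sigma_i\sigma_j\, v_i\,(u_i^{T}u_j)\,v_j^{T}\Big)
=\sum_{i>n}\sigma_i^{2}\operatorname{trace}(v_iv_i^{T})
=\sum_{i>n}\sigma_i^{2},
\]
since \(u_i^{T}u_j\) is 0 for \(i\neq j\) and 1 for \(i=j\), and the trace of \(v_iv_i^{T}\) is 1. Taking the square root gives exactly the values \(e_n\) plotted above.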
7.2 Geometry of the Singular Value Decomposition
Let A = [2 −1; 2 2]. This matrix has the following SVD:
    A = UΣV^T = [1/√5 −2/√5; 2/√5 1/√5] [3 0; 0 2] [2/√5 −1/√5; 1/√5 2/√5]^T
The matrices U and V^T are orthogonal matrices, and in this case they are simple rotation
matrices (i.e., there is no reflection). U corresponds to a counter-clockwise rotation by 63.4° and
V^T corresponds to a clockwise rotation of 26.6°. Finally, Σ is a diagonal matrix so it corresponds to
a scaling by the factors of 3 and 2 along the two axes. So what happens to the unit circle when it
is multiplied by A? We will look at the effect of multiplying the unit circle by each of the factors of
the SVD in turn. The steps are illustrated in Figures 7.4 - 7.7.
Figure 7.4: The unit circle with the right singular vectors.
Figure 7.5: The unit circle is rotated by V^T. The right singular vectors now lie on the axes.
Figure 7.6: The unit circle is scaled by Σ resulting in an ellipse.
Figure 7.7: The ellipse is rotated by U.
In Figure 7.4 we see the unit circle with the right singular vectors (the columns of V) plotted.
In Figure 7.5 the unit circle has been multiplied by V^T, which means it has been rotated
clockwise. There is something you should understand about this result. First, recall that the
columns of V form an orthonormal set of vectors, the right singular vectors. When these vectors
(arranged in matrix V) are multiplied by V^T we get the identity matrix. This means that the right
singular vectors have been reoriented (by a rotation and possibly a reflection) to lie along the axes
of the original coordinate system. So in Figure 7.5 we see that the right singular vectors have been
rotated to lie on the x and y axes. (This happens in every case: multiplying by V^T rotates, and
possibly flips, the right singular vectors so that they line up along the original axes.)
In Figure 7.6 the rotated unit circle is multiplied by Σ. Since Σ is a diagonal matrix we see
the expected result. The circle has been scaled by a factor of 3 along the x axis and by a factor of
2 along the y axis. The circle has now been transformed into an ellipse.
Finally, in Figure 7.7 we multiply by U. This is a rotation matrix, so the ellipse in Figure 7.6
is rotated so that it is no longer oriented along the x and y axes. The axes of the ellipse are now
along the left singular vectors. The vectors shown in Figure 7.7 are not the left singular vectors; they
are the vectors Av_1 and Av_2. The left singular vectors would be the result of normalizing these two
vectors.
To summarize the above: the unit circle is transformed into an ellipse when it is multiplied by
A. The axes of the ellipse are in the directions of u_1 and u_2. The points on the ellipse that are
furthest from the origin are Av_1 and its negative. The points on the ellipse that are closest to the
origin are Av_2 and its negative.
PROBLEM. Repeat the above example with A = [1 2; 1 2]. Use the fact that the SVD for this
matrix is
    A = UΣV^T = [1/√2 −1/√2; 1/√2 1/√2] [√10 0; 0 0] [1/√5 −2/√5; 2/√5 1/√5]^T
Suppose we try a similar analysis with the matrix A = [1 1; 1 1; 1 −1]. We have already computed
the SVD of A:
    A = UΣV^T = [√2/2 0 √2/2; √2/2 0 −√2/2; 0 1 0] [2 0; 0 √2; 0 0] [√2/2 √2/2; √2/2 −√2/2]
In this case notice that A is a 3 × 2 matrix, so multiplication by A would correspond to a linear
transformation from R^2 to R^3. In the SVD we have A = UΣV^T where U is a 3 × 3 matrix, V is a
2 × 2 matrix, and Σ is 3 × 2. So U corresponds to a transformation from R^3 to R^3, V^T corresponds
to a transformation from R^2 to R^2, and Σ corresponds to a transformation from R^2 to R^3.
So suppose we start with the unit circle in R^2. When we multiply by V^T the circle looks the
same; it has just been rotated so that the right singular vectors lie along the axes. Next we multiply
by Σ. Notice that for any vector in R^2 we have
    Σx = [2 0; 0 √2; 0 0] [x; y] = [2x; √2 y; 0]
So what happens here? We see that the x value is scaled by 2 and the y value is scaled by √2, so
again the circle is stretched into an ellipse. But something else happens: there is a third coordinate
of 0 that gets added on. In other words we still have an ellipse in the xy plane, but the ellipse
is now located in 3-dimensional space. In this case multiplying by Σ has the effect of scaling and
zero-padding. It is the zero-padding that results in the change of dimension.
Finally we multiply by U, which again will be a rotation matrix, but now the rotation is in R^3
so the ellipse is rotated out of the xy plane. The unit circle is again transformed into an ellipse,
but the resulting ellipse is located in 3-dimensional space. These transformations are illustrated in
Figures 7.8 - 7.11.
Figure 7.8: The unit circle with the right singular vectors.
Figure 7.9: The unit circle is multiplied by V^T. The right singular vectors now lie on the axes.
Figure 7.10: The unit circle is scaled into an ellipse by Σ and inserted into R^3.
Figure 7.11: The ellipse is rotated in R^3 by U.
PROBLEM. Do a similar analysis for multiplying the unit sphere by A^T. (There are a couple of
major differences with this example. In particular, what exactly do you end up with in this case?)
In summary, you should understand that finding the SVD of a matrix A can be interpreted
as factoring the matrix into a rotation followed by a scaling followed by another rotation. This
last sentence is a bit of an oversimplification in that there could also be reflections involved in the
orthogonal matrices. Also, if A is not a square matrix then multiplying by Σ will involve truncation
(decreasing the dimension) or zero padding (increasing the dimension).
The SVD and Linear Transformations
If A is an m × n matrix then T(x) = Ax would be a linear transformation from R^n to R^m.
              A
        R^n -----> R^m
Now when we find the singular value decomposition A = UΣV^T the matrices U and V^T can be
looked at as change of basis matrices giving the following diagram.
              Σ
        R^n -----> R^m
         ^           |
    V^T  |           |  U
         |           v
        R^n -----> R^m
              A
From this point of view you can look at A and Σ as corresponding to the same linear transformation
relative to different bases in the domain and codomain. More specifically, if the vectors in
the domain are expressed in terms of the columns of V and vectors in the codomain are expressed
in terms of the columns of U, then multiplication by A (in the standard basis) corresponds to
multiplication by Σ.
If the domain and codomain have different dimensions then the change in dimension is a result
of the operation of Σ. If the dimension is increased via the transformation, this is accomplished
through zero padding. If the dimension is decreased, this is accomplished through truncation.⁴
⁴ In fact we can write Σ = [D  O] = D [I  O] or Σ = [D; O] = [I; O] D, where D is a square diagonal matrix (with
possibly some zeroes on the diagonal). So Σ can be written as the product of a square matrix which scales the entries
in a vector and a truncation matrix [I  O] or a zero padding matrix [I; O].
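For the 3 × 2 matrix Σ used earlier in this section, the footnote's factorization looks like this (a concrete illustration we have added):
\[
\Sigma=\begin{bmatrix}2&0\\0&\sqrt2\\0&0\end{bmatrix}
=\begin{bmatrix}1&0\\0&1\\0&0\end{bmatrix}
\begin{bmatrix}2&0\\0&\sqrt2\end{bmatrix},
\]
so multiplying by Σ first scales a vector in R² by the diagonal factor and then zero-pads the result into R³; for a wide matrix the truncation factor \([\,I\;\;O\,]\) appears instead.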
Exercises
1. For A = [6 −2; 7 6] we have the SVD
   A = UΣV^T = [1/√5 −2/√5; 2/√5 1/√5] [10 0; 0 5] [2/√5 1/√5; −1/√5 2/√5]
   Plot the unit circle with the right singular vectors, then show the result of successively multiplying
   this circle by V^T, Σ, and U.
2. Let matrix A be the same as in question (1). Repeat the steps of question 1 for
   (a) A^T   (b) A^{-1}
3. For A = [2 3; 0 2] we have the SVD
   A = UΣV^T = [2/√5 −1/√5; 1/√5 2/√5] [4 0; 0 1] [1/√5 2/√5; −2/√5 1/√5]
   Plot the unit circle with the right singular vectors, then show the result of successively multiplying
   this circle by V^T, Σ, and U.
4. Let matrix A be the same as in question (3). Repeat the steps of question 1 for
   (a) A^T   (b) A^{-1}
5. Let A = [1 1; 1 1; 1 1]. This matrix has the following SVD
   A = UΣV^T = [1/√3 1/√6 1/√2; 1/√3 1/√6 −1/√2; 1/√3 −2/√6 0] [√6 0; 0 0; 0 0] [√2/2 √2/2; √2/2 −√2/2]
   (a) Describe the effect of multiplying the unit circle by A by looking at the effect of multiplying
   successively by each factor of the SVD.
   (b) The unit circle gets transformed into a line segment in R^3 with what end points?
   (c) What is a basis for Col A? How does this relate to the answer for (b)?
   (d) What are the furthest points from the origin on the transformed unit circle? How far are these
   points from the origin? What does this have to do with the singular values of A?
6. Let A = [1 1 1; 1 1 1]. This matrix has the following SVD
   A = UΣV^T = [√2/2 −√2/2; √2/2 √2/2] [√6 0 0; 0 0 0] [1/√3 1/√3 1/√3; 1/√6 1/√6 −2/√6; 1/√2 −1/√2 0]
   (a) Describe the effect of multiplying the unit sphere by A by looking at the effect of multiplying
   successively by each factor of the SVD.
   (b) The unit sphere gets transformed into a line segment in R^2 with what end points?
   (c) What is a basis for Col A? How does this relate to the answer for (b)?
   (d) What are the furthest points from the origin on the transformed unit sphere? How far are these
   points from the origin? What does this have to do with the singular values of A?
7. Let A = [0 0 1; 1 0 1; 1 0 0].
   (a) Find the SVD of A.
   (b) The unit sphere will be transformed into a filled ellipse in R^3. What is the equation of the
   plane containing this ellipse?
   (c) What are the points on the ellipse that are furthest from the origin? What is the distance of
   these points from the origin?
8. Are the following statements TRUE or FALSE?
   (a) A 2 × 2 matrix of rank 1 transforms the unit circle into a line segment in R^2.
   (b) A 3 × 2 matrix of rank 1 transforms the unit circle into a line segment in R^3.
   (c) A 2 × 2 matrix of rank 2 transforms the unit circle into an ellipse in R^2.
   (d) A 3 × 2 matrix of rank 2 transforms the unit circle into an ellipse in R^3.
   (e) A 3 × 3 matrix of rank 3 transforms the unit sphere into an ellipsoid in R^3.
   (f) A 3 × 3 matrix of rank 1 transforms the unit sphere into a line segment in R^3.
Using MAPLE
Example 1
In this example we will use Maple to illustrate the geometry of the SVD in R^3. We will let
    A = [1 0 1; 0 1 1; 1 0 −.5]
and show the effects of multiplying the unit sphere by this matrix.
We will use the following basic fact: the unit sphere can be plotted using the vector
    v = [cos(s) sin(t); sin(s) sin(t); cos(t)]
and letting the parameter s range over the interval [0, 2π] and the parameter t range over the interval
[0, π]. We will in fact write a Maple procedure that will plot the top and bottom halves in different
colors.
>showsphere:=proc(matr)
 local A,v1,v2,p1,p2;
 A:=matr:
 v1:=<cos(s)*sin(t), sin(s)*sin(t), cos(t)>:
 v2:=A.v1:
 p1:=plot3d(v2,s=0..2*Pi,t=0..Pi/2,color=grey):
 p2:=plot3d(v2,s=0..2*Pi,t=Pi/2..Pi,color=blue):
 plots[display]([p1,p2],scaling=constrained,orientation=[100,70]);
 end:
In this procedure, the input matr is assumed to be a 3 × 3 matrix and the procedure plots the unit
sphere after being multiplied by the matrix.
Next we will enter matrix A and find the SVD.
>A:=<<1|0|1>,<0|1|1>,<1|0|-.5>>;
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]):
>I3:=IdentityMatrix(3):
Next we will use the showsphere procedure above and apply the various transformations to a sphere.
Now to plot the results we just have to enter the following:
>showsphere(I3); #### the original sphere
>showsphere(Vt); #### apply Vt
>showsphere(DiagonalMatrix(S).Vt); ### now apply S
>showsphere(U.DiagonalMatrix(S).Vt); ### and finally apply U
This gives Figures 7.12 - 7.15.
Note that one of the singular values is .3099, which results in the sphere being flattened a lot in one
direction. To see this it is a good idea to use the mouse to rotate the plots once they have been drawn
in order to see them from different viewing angles.
Figure 7.12: The unit sphere.  Figure 7.13: Multiply by V^T.  Figure 7.14: Multiply by Σ.  Figure 7.15: Multiply by U.
Example 2
In this example we will let A = [1.2 1 1; 1 −1 1]. This corresponds to a transformation from R^3 to R^2.
Finding the SVD with Maple we get:
>A:=<<1.2,1>|<1,-1>|<1,1>>;
>U,S,Vt:=SingularValues(A, output=[U,S,Vt]);
So we have Σ = [2.107 0 0; 0 1.414 0], which involves scaling and truncation (dimension reduction). In
this case we will have to modify our approach since after multiplying by Σ we will be in R^2. We will still
use the plot3d command by adding a third component of zero, and choosing an appropriate viewing
angle.
>S1:=DiagonalMatrix(S,2,3):
>v:=<cos(s)*sin(t),sin(s)*sin(t),cos(t)>:
>SV:=S1.Vt.v;
>USV:=U.S1.Vt.v;
We now have a slight problem when it comes to plotting. The vectors ΣV^T v and UΣV^T v are
vectors in R^2 using 2 parameters. Maple doesn't have a command for plotting in two dimensions with
2 parameters, so we will use a trick as shown below.
>showsphere(Vt);
>plot3d( [SV[1],SV[2],0],s=0..2*Pi,t=0..Pi,
  orientation=[90,0],scaling=constrained);
>plot3d( [USV[1],USV[2],0],s=0..2*Pi,t=0..Pi,
  orientation=[90,0],scaling=constrained);
This gives Figures 7.16 - 7.19.
Figure 7.16: The unit sphere.  Figure 7.17: Multiply by V^T.  Figure 7.18: Multiply by Σ.  Figure 7.19: Multiply by U.
Multiplication by V^T gives, as expected, a rotation in R^3. Multiplication by Σ truncates the third
coordinate and scales the result. This gives a filled ellipse in R^2. Multiplying by U rotates this ellipse in
R^2. Notice that the plotting method we used makes it clear where the north pole and south pole of
the original sphere have ended up. They are in the interior of the ellipse at the points
    [1.2 1 1; 1 −1 1] [0; 0; 1] = [1; 1]
and
    [1.2 1 1; 1 −1 1] [0; 0; −1] = [−1; −1]
7.3 The Singular Value Decomposition and the Pseudoinverse
Consider the matrix A = [1 1; 1 0; 0 1]. This matrix has no inverse, but the pseudoinverse as defined
in Chapter 5 would be
    A⁺ = (A^T A)^{-1} A^T = [2 1; 1 2]^{-1} [1 1 0; 1 0 1] = [1/3 2/3 −1/3; 1/3 −1/3 2/3]
Now look at the SVD of A. From A^T A we get singular values of √3 and 1. Omitting the rest of
the details we get
    A = UΣV^T = [2/√6 0 1/√3; 1/√6 1/√2 −1/√3; 1/√6 −1/√2 −1/√3] [√3 0; 0 1; 0 0] [1/√2 1/√2; 1/√2 −1/√2]
Now suppose we ask ourselves why matrix A cannot be inverted. If we look at the SVD we see
that A has been decomposed into three factors. Of those three, both U and V are invertible (since
they are orthogonal, their inverse is just their transpose), so the reason that A is not invertible
must have something to do with Σ. What is the effect of Σ, the middle factor? It scales the first
component by √3, and this scaling can be inverted (just divide the first component by √3). It scales
the second component by 1, and again this scaling can be undone. There is a third effect of the
matrix: it takes vectors in R^2 and places them in R^3 by adding a 0 as a third component (zero
padding). It is this last effect of Σ that lies behind the non-invertibility of A in that it changes the
dimension of the vector. Every vector in R^2 gets transformed into a unique vector in R^3 by A, but
the reverse is not true. Not every vector in R^3 has a pre-image in R^2, since the column space
of A is two dimensional. It is precisely the vectors in R^3 that are not in the column space of A that
do not have a pre-image in R^2.
So we have A = UΣV^T, and if each factor were invertible the inverse of A would be VΣ^{-1}U^T.
This should be a 2 × 3 matrix which corresponds to a linear transformation from R^3 to R^2 that will
undo the effects of matrix A. The problem is the middle term: the matrix Σ has no inverse. How
close can we come to finding an inverse of Σ? To undo the effects of matrix A we want to do three
things: scale the first component by 1/√3, scale the second component by 1, and chop off (truncate)
the third component of an input vector in R^3. The matrix that would do this is
    [1/√3 0 0; 0 1 0]
and, for reasons that will become clear shortly, we will call this matrix Σ⁺. If we evaluate VΣ⁺U^T
we get
    [1/√2 1/√2; 1/√2 −1/√2] [1/√3 0 0; 0 1 0] [2/√6 1/√6 1/√6; 0 1/√2 −1/√2; 1/√3 −1/√3 −1/√3]
    = [1/√6 1/√2 0; 1/√6 −1/√2 0] [2/√6 1/√6 1/√6; 0 1/√2 −1/√2; 1/√3 −1/√3 −1/√3]
    = [1/3 2/3 −1/3; 1/3 −1/3 2/3]
In other words we get A⁺, the pseudoinverse of A.
Now, in general, when you find the SVD of an m × n matrix A = UΣV^T the matrix Σ will be an
m × n matrix of the form
    Σ = [D 0; 0 0]
where D stands for a square diagonal matrix with all non-zero diagonal entries. We will define the
pseudoinverse of Σ to be the n × m matrix
    Σ⁺ = [D^{-1} 0; 0 0]
The matrix D^{-1} will undo the scalings of D.
The principle behind the pseudoinverse is essentially how we deal with Σ. The principle is to
invert all scalings and then to undo any zero padding by a truncation and vice versa.
To clarify the point we are trying to make in this section, suppose A is an m × n matrix with
linearly independent columns with the singular value decomposition A = UΣV^T. The pseudoinverse
of A as defined in Chapter 5 would be
    A⁺ = (A^T A)^{-1} A^T
       = (VΣ^T U^T UΣV^T)^{-1} VΣ^T U^T
       = (VΣ^T ΣV^T)^{-1} VΣ^T U^T
       = (V diag(σ_1^2, σ_2^2, ..., σ_n^2) V^T)^{-1} VΣ^T U^T
       = V diag(1/σ_1^2, 1/σ_2^2, ..., 1/σ_n^2) V^T V Σ^T U^T
       = V diag(1/σ_1^2, 1/σ_2^2, ..., 1/σ_n^2) Σ^T U^T
       = V Σ⁺ U^T
The last step uses the fact that Σ^T is the n × m matrix with σ_1, σ_2, ..., σ_n down its diagonal, so
scaling its rows by 1/σ_1^2, ..., 1/σ_n^2 leaves the n × m matrix with 1/σ_1, ..., 1/σ_n down its diagonal,
and that is exactly Σ⁺.
In other words, the pseudoinverse as defined in this section in terms of the singular value decomposition
is consistent with our previous definition. But this new definition is more powerful because
it is always defined. It is not restricted to matrices with linearly independent columns.
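As a quick numerical check of this identity, both sides can be computed in Maple for the 3 × 2 matrix at the start of this section (a sketch using the same commands as elsewhere in the chapter; the variable names are ours):
>with(LinearAlgebra):
>A:=<<1.0,1.0,0.0>|<1.0,0.0,1.0>>:                   # the matrix [1 1; 1 0; 0 1]
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]):
>SigmaPlus:=Matrix(2,3,[[1/S[1],0,0],[0,1/S[2],0]]): # invert the scalings and truncate
>Vt^%T . SigmaPlus . U^%T;                           # the SVD form of the pseudoinverse
>MatrixInverse(A^%T . A) . A^%T;                     # the Chapter 5 formula; both should give approximately [1/3 2/3 -1/3; 1/3 -1/3 2/3]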
Example 7.3.5
What is the pseudoinverse of A = [1 2; 1 2]?
We have already found the SVD of this matrix in Example 7.1.3:
    A = UΣV^T = [1/√2 −1/√2; 1/√2 1/√2] [√10 0; 0 0] [1/√5 2/√5; −2/√5 1/√5]
From the above discussion we have the pseudoinverse
    A⁺ = VΣ⁺U^T = [1/√5 −2/√5; 2/√5 1/√5] [1/√10 0; 0 0] [1/√2 1/√2; −1/√2 1/√2]
       = [1/10 1/10; 1/5 1/5]
What happens if you multiply A by its pseudoinverse? Do you get the identity?
No. Simple computation gives
    AA⁺ = [1 2; 1 2] [1/10 1/10; 1/5 1/5] = [1/2 1/2; 1/2 1/2]
and
    A⁺A = [1/10 1/10; 1/5 1/5] [1 2; 1 2] = [1/5 2/5; 2/5 4/5]
Suppose we write the SVD of an m × n matrix A as
    A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + σ_3 u_3 v_3^T + ... + σ_r u_r v_r^T
where r is the number of non-zero singular values of A. Then the above comments mean that the
pseudoinverse of A can be written as
    A⁺ = (1/σ_1) v_1 u_1^T + (1/σ_2) v_2 u_2^T + (1/σ_3) v_3 u_3^T + ... + (1/σ_r) v_r u_r^T
Notice what happens when these two expressions are multiplied together. We leave it as a simple
exercise to show that
    AA⁺ = u_1 u_1^T + u_2 u_2^T + u_3 u_3^T + ... + u_r u_r^T
and
    A⁺A = v_1 v_1^T + v_2 v_2^T + v_3 v_3^T + ... + v_r v_r^T
These are just projectors onto Col U and Col V respectively⁵.

⁵ We will soon see that Col U = Col A and Col V = Row A.
Example 7.3.6
Consider the matrix A = [1 1; 1 1], which has the following SVD
    A = [√2/2 −√2/2; √2/2 √2/2] [2 0; 0 0] [√2/2 √2/2; −√2/2 √2/2]
Now it should be obvious that A is not invertible; in fact, since the columns of A are not
linearly independent you can't find the pseudoinverse of A from the formula (A^T A)^{-1} A^T.
But why is A not invertible? What insights can the SVD give us into this question?
We have A = UΣV^T, so it might seem that to invert A all we have to do is to invert
each of the factors of the SVD and then reverse the order of multiplication. If we try
this there is certainly no problem with U or V; since these matrices are orthogonal they
are certainly invertible. But what about Σ? This is just a scaling matrix and so it might
seem that to invert it we just have to undo the scalings. In particular, the x coordinate
is scaled by 2, so to undo that scaling we just have to multiply by 1/2. But the y value is
scaled by 0, and that means all y values are mapped to 0, so there is no way to undo this
scaling. That is one way of understanding why A is not invertible: one of the singular
values is equal to 0, and a scaling by 0 cannot be inverted.
If we proceed as outlined above, the pseudoinverse of A should be given by VΣ⁺U^T, which
gives
    [√2/2 −√2/2; √2/2 √2/2] [1/2 0; 0 0] [√2/2 √2/2; −√2/2 √2/2] = [1/4 1/4; 1/4 1/4]
Now suppose you had the following system of equations
    x_1 + x_2 = 1
    x_1 + x_2 = 3
This system is obviously inconsistent. The normal equations would be
    2x_1 + 2x_2 = 4
    2x_1 + 2x_2 = 4
The normal equations have infinitely many solutions, so the system we are looking at
doesn't have a unique least squares solution. It has infinitely many least squares solutions.
Figure 7.20: The two solid parallel lines represent the inconsistent system. The dotted line represents
the least-squares solutions to the system.
The normal equations imply that all the points on the line x_1 + x_2 = 2 would be least
squares solutions. This is illustrated in Figure 7.20. If we write this system as Ax = b
and try to find a least squares solution by multiplying by the pseudoinverse found above,
we get
    A⁺b = [1/4 1/4; 1/4 1/4] [1; 3] = [1; 1]
What is so special about this result? First of all it lies on the line x_1 + x_2 = 2, so it is
a least squares solution. More than that, it is the least squares solution of minimum
length (i.e., it is the least squares solution that is closest to the origin).
Although we won't prove it, what happened in this example will always happen. If
A has linearly independent columns then A⁺b will give the unique least squares
solution to the system Ax = b. If A has linearly dependent columns then the system will
have many least squares solutions and A⁺b will give the least squares solution to the
system of minimum norm.
You should see the pseudoinverse as a generalization of the idea of a matrix inverse. The following
points should clarify this.
• If A is square with linearly independent columns then A is invertible and the pseudoinverse of A would
be the same as the inverse. That is,
    A⁺ = A^{-1}
In this case, a linear system Ax = b would have the unique solution A^{-1}b.
• If A is not square but has linearly independent columns then A is not invertible but A does
have a pseudoinverse. The pseudoinverse can be computed as
    A⁺ = (A^T A)^{-1} A^T
In this case A⁺b gives the unique least-squares solution to Ax = b.
• If A does not have linearly independent columns then the pseudoinverse can be computed
using the SVD. In this case A⁺b gives the least-squares solution of minimum norm to the
system Ax = b. (A short Maple sketch of this case follows.)
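Here is that sketch, reusing the inconsistent system and the pseudoinverse from Example 7.3.6 (our own illustration):
>with(LinearAlgebra):
>A:=<<1.0,1.0>|<1.0,1.0>>:
>b:=<1.0,3.0>:
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]):
>SigmaPlus:=Matrix(2,2,[[1/S[1],0],[0,0]]):   # sigma_2 = 0, so that scaling is simply dropped
>Aplus:=Vt^%T . SigmaPlus . U^%T;             # approximately [1/4 1/4; 1/4 1/4]
>Aplus . b;                                   # approximately (1, 1), the minimum-norm least-squares solution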
Exercises
1. Suppose you are given the following SVD of A:
   A = [1 0 0; 0 1/√2 −1/√2; 0 1/√2 1/√2] [4 0; 0 2; 0 0] [2/√5 1/√5; −1/√5 2/√5]
   What is A⁺?
2. Suppose
   A = 3 [1/√2; 1/√2; 0] [2/3 1/3 2/3]
   What is A⁺?
3. Suppose
   A = 3 [2; 1; 1] [1 1 1]
   What is A⁺?
4. Use the SVD to find the pseudoinverse of
   (a) [1 1; 1 1; 1 1]   (b) [1 1 1; 1 1 1]   (c) [1 1; 1 1]
5. Find the pseudoinverse of
   (a) [1 2 3; 0 0 0; 0 0 0]   (b) [1 0 0; 2 0 0; 3 0 0]   (c) [0 0 1; 0 0 2; 0 0 3]
6. Let Σ = [3 0 0; 0 2 0]. Evaluate Σ⁺ and ΣΣ⁺.
7. Let Σ = [6 0; 0 4; 0 0]. Evaluate Σ⁺ and ΣΣ⁺.
8. Let Σ = [5 0 0; 0 2 0; 0 0 0; 0 0 0]. Evaluate Σ⁺ and ΣΣ⁺.
9. Use the pseudoinverse to find a least squares solution to the following system:
   x_1 + x_2 + x_3 = 0
   x_1 + x_2 + x_3 = 6
10. The system
    x_1 + 2x_2 + x_3 = 3
    x_2 − x_3 = 0
    is consistent and has infinitely many solutions.
    (a) Find an expression for the general solution of this system.
    (b) Find an expression for the magnitude squared of the general solution and use calculus to
    determine the smallest possible value of the magnitude squared.
    (c) If you write this system as Ax = b, evaluate A⁺b. How does this relate to the answer from
    (b)?
11. (a) What is the pseudoinverse of any n × 1 matrix A = [a_1; a_2; ...; a_n]? (Hint: use the fact that this
    matrix has rank 1.)
    (b) What is the pseudoinverse of any 1 × n matrix A = [a_1 a_2 ... a_n]?
12. Let A be an m × n matrix of rank r with SVD A = UΣV^T.
    (a) What is A⁺u_i for 1 ≤ i ≤ r?
    (b) What is A⁺u_i for r + 1 ≤ i ≤ m?
13. If A has orthonormal columns what is A⁺?
14. Show that if A is an invertible matrix then A⁺ = A^{-1}.
15. Show that A⁺A and AA⁺ are symmetric.
16. Show that AA⁺A = A and A⁺AA⁺ = A⁺. (Note: this result along with the previous problem shows
    that AA⁺ and A⁺A are projectors.)
Using MAPLE
Example 1.
In this example we will use Maple to find the least-squares solution to an overdetermined system with
the pseudoinverse. Our system of equations will represent an attempt to write e^x as a linear combination
of 1, x, x^2, and x^3. We will convert this into a discrete problem by sampling these functions 41 times
on the interval [−2, 2].
>f:=x->exp(x):
>g[1]:=x->1:
>g[2]:=x->x:
>g[3]:=x->x^2:
>g[4]:=x->x^3:
>xvals:=Vector(41,i->-2+.1*(i-1)):
>u:=map(f,xvals):
>for i to 4 do v[i]:=map(g[i],xvals) od:
We will now try to write u as a linear combination of the v_i. Now vector u is a discrete approximation
to e^x and vectors v_1, v_2, v_3, and v_4 are approximations to 1, x, x^2 and x^3, so our problem is the discrete
version of trying to write e^x as a cubic polynomial. Setting up this problem will result in an inconsistent
system of 41 equations in 4 unknowns.
In the following Maple commands we compute the pseudoinverse from the fact that if
    A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... + σ_r u_r v_r^T
then
    A⁺ = (1/σ_1) v_1 u_1^T + (1/σ_2) v_2 u_2^T + ... + (1/σ_r) v_r u_r^T
>A:=<v[1]|v[2]|v[3]|v[4]>:
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
>pinvA:=eval( add( 1/S[i] * Column(Vt^%T,i) . Row(U^%T,i),i=1..4)):
>soln:=pinvA.u;
    soln = [.92685821055486, .9606063839232, .6682692476746, .209303723666]
>p1:=add(soln[i]*x^(i-1),i=1..4);
>p2:=1+x+1/2*x^2+1/6*x^3;
>plot([exp(x),p1,p2],x=-2..2,color=[black,red,blue]);
The resulting plot is shown in Figure 7.21. By looking at the graphs it appears that by using
the weights computed above we get a better approximation to e^x than the Taylor polynomial. We can
quantify this a bit more clearly as follows:
>int((exp(x)-p1)^2,x=-2..2);
    .015882522833790110294
>int((exp(x)-p2)^2,x=-2..2);
    .27651433188568219389
These values show that p1 is closer to e^x than p2.
Figure 7.21: The graphs of e^x, p1, and p2 on the interval [−2, 2].
Example 2.
In this example we will illustrate how to write a Maple procedure that will compute the pseudoinverse
of a matrix. We will call our procedure pinv. We will first give the procedure and make some comments
afterwards. When you are entering the procedure you should end each line (until you are finished) with
SHIFT-ENTER rather than ENTER. This prevents a new prompt from appearing on each line.
>pinv:=proc(A)
 local U,S,Vt,sv2,i:
 U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
 sv2:=select(x->x>10^(-8),S);
 eval(add( 1/sv2[i]*Column(Vt^%T,i).Row(U^%T,i),i=1..Dimension(sv2)));
 end;
The first line gives the name of the procedure and indicates that the procedure will require one
input parameter. The A in this line is a dummy variable; it stands for whatever matrix is input to
the procedure.
The second line lists the local variables used in the procedure. These are basically all the symbols
used within the procedure.
The third line computes the SVD of the input matrix.
The fourth line is a bit tricky. Some of the singular values from the previous line could be zero.
We just want the non-zero singular values. But, unfortunately, due to rounding errors sometimes
singular values that should be 0 turn out to be small non-zero decimals. This line selects all the
singular values that are greater than 10^(-8). Even if a singular value is not zero but very small then
its reciprocal will be very large and this can result in numerical instability in the computation.
The fifth line computes the pseudoinverse as
    Σ (1/σ_i) v_i u_i^T
for the non-zero singular values, or at least for the singular values greater than our cut-off value.
The last line indicates that the procedure is finished. You can now use the pinv command to find
the pseudoinverse of any (numerical) matrix.
For example:
>M:=<<1,5>|<2,6>|<3,7>|<4,8>>;
>pinv(M);
This returns the matrix:
    [ -0.5500000002    0.2500000001 ]
    [ -0.2250000001    0.1250000000 ]
    [  0.1000000000    1.0 x 10^(-11) ]
    [  0.4250000002   -0.1250000001 ]
7.4 The SVD and the Fundamental Subspaces of a Matrix
Suppose A is an m × n matrix of rank r with the following SVD
    A = [u_1 ... u_r u_{r+1} ... u_m] Σ [v_1 ... v_r v_{r+1} ... v_n]^T
where Σ is the m × n matrix with σ_1, ..., σ_r, 0, ..., 0 down its diagonal. This can be written as
    A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... + σ_r u_r v_r^T
Now since A is m × n with rank r it follows that Nul A has dimension n − r. If we look at the
product Av_k where k > r then we have
    Av_k = (σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... + σ_r u_r v_r^T) v_k = 0
since the columns of V are orthogonal. It follows then that v_{r+1}, ..., v_n is an orthonormal basis
of Nul A. Since the row space of A is the orthogonal complement of the null space, it then follows
that v_1, ..., v_r is an orthonormal basis of Row A.
If we apply the above argument to A^T we then get that u_{r+1}, ..., u_m is an orthonormal basis of
Nul A^T, and u_1, ..., u_r is an orthonormal basis for Row A^T (which is the same as Col A).
Given any matrix A, the four fundamental subspaces of A are: Col A, Nul A, Col A^T, and Nul A^T.
So the SVD of A gives orthonormal bases for each of these subspaces.
The SVD also gives us projectors onto these four fundamental subspaces.
• AA⁺ projects onto Col A.
• A⁺A projects onto Row A.
• I − AA⁺ projects onto Nul A^T.
• I − A⁺A projects onto Nul A.
The following may help clarify some of the above comments (a Maple sketch follows the list):
• If A is an n × n matrix with linearly independent columns then A is invertible and
    A^{-1}A = I and AA^{-1} = I
  In this case we have Col A = Row A = R^n and Nul A = Nul A^T = {0}.
• If A is not square but has linearly independent columns then A has a pseudoinverse and
    A⁺A = I
    AA⁺ = the projector onto Col A
• If the columns of A are not linearly independent then A has a pseudoinverse and
    A⁺A = the projector onto Row A
    AA⁺ = the projector onto Col A
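In Maple the four bases can be read directly off the factors returned by SingularValues. Here is a sketch for the matrix of Example 7.4.8 below (our own illustration; r is found by counting the singular values above a small tolerance):
>with(LinearAlgebra):
>A:=<<0.,1.,0.>|<1.,0.,2.>|<0.,2.,0.>>:           # the matrix of Example 7.4.8, which has rank 2
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]):
>r:=nops(select(x->x>10^(-8),convert(S,list)));   # number of non-zero singular values
>ColBasis:=[seq(Column(U,i),i=1..r)];             # orthonormal basis of Col A
>RowBasis:=[seq(Column(Vt^%T,i),i=1..r)];         # orthonormal basis of Row A
>NulBasis:=[seq(Column(Vt^%T,i),i=r+1..3)];       # orthonormal basis of Nul A
>NulTBasis:=[seq(Column(U,i),i=r+1..3)];          # orthonormal basis of Nul A^T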
Example 7.4.7
Let
    A = [1 1 1; 1 1 1; 1 1 1; 1 1 1]
It should be perfectly clear that A has rank 1 and that [1; 1; 1] is a basis for the row space
of A and [1; 1; 1; 1] is a basis for the column space of A. If we find the SVD we get
    V = [1/√3 1/√2 1/√6; 1/√3 0 −2/√6; 1/√3 −1/√2 1/√6]
The first column is a unit vector that is a basis of Row A. Because the columns are
orthonormal, the second and third columns form a basis for the plane orthogonal to the
row space, and that is precisely Nul A.
We also have
    U = [1/2 1/√2 1/√6 1/√12; 1/2 0 0 −3/√12; 1/2 −1/√2 1/√6 1/√12; 1/2 0 −2/√6 1/√12]
Again it should be easy to see that the first column is a unit vector that is a basis for
Col A, and so the remaining columns must be an orthonormal basis of Nul A^T.
Example 7.4.8
L et A =
_
_
0 1 0
1 0 2
0 2 0
_
_
. Find the matrix that projects vectors orthogonally onto Col A.
One way of doing this would be to nd an explicit orthonormal basis for the column
space. In this particular case this is easy because it is clear that the rst two columns
form an orthogonal basis for the column space. If we normalize these columns then we
can compute the projector as
_
_
0 1/

5
1 0
0 2/

5
_
_
_
_
0 1/

5
1 0
0 2/

5
_
_
T
=
_
_
1/5 0 2/5
0 1 0
2/5 0 4/5
_
_
(If you look at this projector it should be clear that it has rank 2. You should remember
that this corresponds to the fact that it projects vectors onto a 2 dimensional subspace.)
Another way of nding the projector is by the SVD. In this case the SVD would be given
by
UV
T
=
_
_
0 1/

5 2/

5
1 0 0
0 2/

5 1/

5
_
_
_
_

5 0 0
0

5 0
0 0 0
_
_
_
_
1/

5 0 2/

5
0 1 0
2/

5 0 1/

5
_
_
T
326 7. The Singular Value Decomposition
The pseudoinverse is then given by
A

= V

U
T
=
_
_
0 1/5 0
1/5 0 2/5
0 2/5 0
_
_
The projector onto Col A will then be
AA

=
_
_
1/5 0 2/5
0 1 0
2/5 0 4/5
_
_
Note that in this case the rst method seems simpler because it was very easy to nd an
orthonormal basis for the column space. The second method has an advantage in that
it allows you to dene the projector strictly in terms of matrix A regardless of the size
of A.
Exercises
1. Let A = [1 1 0; 1 1 0; 0 0 1].
   (a) Find the SVD of A.
   (b) Find a basis for Col A and Row A.
   (c) Find a basis for Nul A and Nul A^T.
   (d) Evaluate A⁺A and AA⁺.
2. Let A = [1 0; 1 0; 1 0; 1 2].
   (a) Find the SVD of A.
   (b) Find a basis for Col A and Row A.
   (c) Find a basis for Nul A and Nul A^T.
   (d) Evaluate A⁺A and AA⁺.
7.5 The SVD and Statistics
There are deep connections between linear algebra and statistics. In this section we want to take
a brief look at the relationship between the SVD of a matrix and several statistical concepts.
Suppose a series of measurements results in several lists of related data. For example, in a study
of plant growth biologists might collect data about the temperature, the acidity of the soil, the
height of the plants, and the surface area of the leaves. The data collected can be arranged in the
form of a matrix called the matrix of observations. Each parameter that is measured can be
arranged along one row of the matrix, so an m × n matrix of observations consists of n observations
(i.e., measurements) of m different parameters.
Let X = [X_1 X_2 ... X_n] be an m × n matrix of observations. The sample mean, M, is given
by
    M = (1/n)(X_1 + X_2 + ... + X_n)
If we define X̂_j = X_j − M then the matrix
    B = [X̂_1 X̂_2 ... X̂_n]
is said to represent the data in mean-deviation form.
The covariance matrix, S, is defined to be
    S = (1/(n − 1)) B B^T
As an example suppose we measured the weights and heights of 10 individuals and got the results
shown in the following table.
    weight (kg): 23.1  16.2  18.4  24.2  12.4  20.0  25.2  11.1  19.3  25.1
    height (m):  1.10   .92   .98  1.24   .86   .99  1.21   .75  1.00  1.35
This would give a 2 × 10 matrix of observations. Each observation would involve the measurement
of 2 parameters.
The sample mean is the vector whose entries are the average weight and average height. Computing
these averages we get
    M = [19.5; 1.04]
If we now subtract this mean from each of the observations we get the data in mean-deviation
form
    B = [3.6 −3.3 −1.1 4.7 −7.1 .5 5.7 −8.4 −.2 5.6; .06 −.12 −.06 .20 −.18 −.05 .17 −.29 −.04 .31]
If we look at each column of the matrix of observations as a point in R^2 we can plot these points
in what is called a scatter plot. Figure 7.22 is a scatter plot of our data. In this plot the sample
mean is also plotted as a cross; it is located at the center of the data. For comparison, Figure
7.23 is a plot of the data in mean-deviation form. The only difference is that the data has been
shifted so that it is now centered around the origin. In mean-deviation form the sample mean will
be the origin. The entries in matrix B indicate how much above or below average each value lies.
The covariance matrix would be
    S = (1/9) B B^T = [25.807 .891; .891 .034]
Figure 7.22: Scatter plot of original data.
Figure 7.23: Data in mean-deviation form.
The entries down the diagonal of the covariance matrix represent the variance of the data. In particular, the diagonal entry s_jj of matrix S is the variance of the j-th parameter.
So, in the above example, 25.807 is the variance of the weight and .034 is the variance of the height.
The variance can be interpreted as a measure of the spread of the values of a certain parameter around the average value. For example, the average of 9 and 11 is 10, but the average of -100 and 120 is also 10. The difference is that the first pair of numbers lie much closer to 10 than the second pair, i.e., the variance of the first pair is much less than the variance of the second pair.
The total variance of the data is the sum of all the separate variances. That is, the total variance of the data is the sum of the diagonal entries of S (this is also the trace of S).
Each off-diagonal entry of matrix S, s_ij for i &#8800; j, is called the covariance between parameters x_i and x_j of the data matrix X. Notice that the covariance matrix is symmetric, so s_ij = s_ji. If the covariance is 0 it is said that the corresponding parameters are uncorrelated.
Principal Components
The covariance matrix is symmetric and positive definite so, as we've seen before, it can be diagonalized. To diagonalize S we would find the (positive) eigenvalues and then the corresponding eigenvectors. The eigenvectors of S determine a set of orthogonal lines. If u_i is one of these eigenvectors then the vector B^T u_i is called a principal component of the data. The principal component corresponding to the largest eigenvalue is called the first principal component. The second principal component corresponds to the second largest eigenvalue, and so on.
For our earlier example of weights and heights we would get the following eigenvalues and unit eigenvectors

    lambda_1 = 25.838,   lambda_2 = .0032
    u_1 = [ .9994, .0345 ]^T,   u_2 = [ -.0345, .9994 ]^T
The first principal component would then be

    B^T u_1 = [ 3.600  -3.302  -1.101  4.704  -7.102  .498  5.702  -8.405  -.201  5.607 ]^T

The second principal component would be

    B^T u_2 = [ -.064  -.006  -.022  .038  .065  -.067  -.027  .000  -.033  .117 ]^T
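Continuing the Maple sketch above (so B and S are as defined there), the principal components can be computed from the unit eigenvectors of S, which Maple returns as the singular vectors of the symmetric matrix S. Again this is only an illustration and the variable names are ours.
>U,sv,Vt:=SingularValues(evalf(S),output=['U','S','Vt']):
>u1:=Column(U,1):     ### unit eigenvector of S for the largest eigenvalue
>u2:=Column(U,2):
>pc1:=B.u1;           ### first principal component (B is 10 x 2 here, so B.u1 plays the role of B^T u_1)
>pc2:=B.u2;           ### second principal component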
Now this might seem confusing but all that is going on is a change of basis. We have our data in mean-deviation form and we are converting it to our eigenbasis&#8310;, which has been ordered according to the size of the eigenvalues. The first principal component is a vector that contains all the first coordinates of our data points relative to the eigenbasis. The second principal component is made up of the second coordinates of our data points relative to the eigenbasis.
In the above example the entries in the second principal component are fairly small. This means that most of the data points lie very near the first eigenspace. That is, relative to the eigenbasis our data is approximately 1-dimensional. This is connected to the relative sizes of the eigenvalues. The sum of the eigenvalues of S will equal the total variance.
In the following plot we see the data in mean-deviation form and the eigenspace of S corresponding to the first principal component (lambda_1 = 25.8374).
Figure 7.24: The data in mean-deviation form and the first principal component
The line through the origin along the first principal component has slope .0345. This would have equation h~ = .0345 w~ where w~ and h~ are the weight and height in mean-deviation form. Is this just the least-squares line? No&#8311;; the significance of this line and how it relates to the least-squares line will be explained in the next section.

&#8310; The eigenbasis consists of the eigenvectors of S, which are the right singular vectors of X.
&#8311; The least-squares line would be w~ = .0383 h~. You should try deriving this equation for a bit of review.
Exercises
1. Given the following data points (in mean-deviation form)
x -2 -1 1 2
y -3 0 2 1
(a) Find the least-squares line for this data.
(b) Find the total least-squares line for this data.
(c) Plot the data points and the two lines from (a) and (b) on the same set of axes.
(d) Consider the line y = x. Find the square root of the sum of the squares of the vertical
distances of the data points to this line. Find the square root of the sum of the squares of
the perpendicular distances of the data points to this line.
2. Given the following data points (in mean-deviation form)
x -2 -1 0 1 2
y 1 1 0 -2 0
(a) Find the least-squares line for this data.
(b) Find the total least-squares line for this data.
(c) Plot the data points and the two lines from (a) and (b) on the same set of axes.
(d) Consider the line y = x. Find the square root of the sum of the squares of the vertical
distances of the data points to this line. Find the square root of the sum of the squares of
the perpendicular distances of the data points to this line.
3. Given the following data points
x 1 3 5
y 3 1 2
(a) Find the least-squares line for this data.
(b) Find the total least-squares line for this data.
4. Let A =

    [ 3 4 1 ]
    [ 1 2 5 ]

be a data matrix.
(a) Convert A to mean-deviation form.
(b) Find the covariance matrix.
(c) Find the principal components.
(d) What fraction of the total variance is due to the first principal component?
5. Let A =

    [ 1 1 2 2 1 2 1 2 ]
    [ 3 5 7 9 11 13 15 17 ]
    [ 1 1 1 1 1 1 1 1 ]

be a data matrix.
(a) Convert A to mean-deviation form.
(b) Find the covariance matrix.
(c) Find the principal components.
(d) What fraction of the total variance is due to the first principal component?
Using MAPLE
Example 1.
We will use Maple to illustrate the idea of principal components.
We begin by generating 200 points using one of the random number routines in Maple .
>with(stats[random]):
>xv:=[seq( normald(),i=1..200)]: ### the x coordinates
>yv:=[seq(.9*xv[i]+normald(),i=1..200)]: ### the y coordinates
>mx:=add(xv[i],i=1..200)/200: ### the average x value
>my:=add(yv[i],i=1..200)/200: ### the average y value
>mxv:=[seq(xv[i]-mx,i=1..200)]: ### x in mean deviation form
>myv:=[seq(yv[i]-my,i=1..200)]: ### y in mean deviation form
>data:=[seq( [mxv[i],myv[i]],i=1..200)]:
>p1:=plot(data,style=point,color=black):
>B:=&lt; convert(mxv,Vector) | convert(myv,Vector) &gt;;
>M:=1/199*B^%T.B;
    M = [ 1.084  .9079 ]
        [ .9079  1.790 ]
>SingularValues(M,output=[U,S,Vt]);
    [ 2.4115, .4631 ]
    Vt = [  .5646   .8254 ]
         [ -.8254   .5646 ]
The first row of Vt gives the first principal component. We will compute the corresponding slope.
>m1:=Vt[1,2]/Vt[1,1]:
>p2:=plot([m1*x,-1/m1*x],x=-3..3,thickness=2,color=black):
>plots[display]([p1,p2],scaling=constrained);
Figure 7.25: The 200 data points and the principal components.
This gives Figure 7.25. We have a cloud of data points centered at the origin. These points lie in
a roughly elliptical region. The principal components correspond to the axes of that ellipse.
Example 2.
In this example we will begin by generating 30 data points and then put the data in mean-deviation form. The steps are similar to the first example.
>xv:=[seq(normald(),i=1..30)]:yv:=[seq(.5*normald()+.6*xv[i], i=1..30)]:
>mx:=add(xv[i],i=1..30)/30:
>my:=add(yv[i],i=1..30)/30:
>mxv:=convert([seq(xv[i]-mx,i=1..30)],Vector):
>myv:=convert([seq(yv[i]-my,i=1..30)],Vector):
>data:=[seq( [mxv[i],myv[i]],i=1..30)]:
We now have a collection of points centered at the origin. Look at any straight line drawn through the origin at angle t (the slope of this line would be tan t). We will find the sum of the squares of the orthogonal distances to this line and the sum of the squares of the vertical distances to this line.
A unit vector in the direction of the line would be [cos(t), sin(t)]^T. A unit vector normal to the line would be [-sin(t), cos(t)]^T. The orthogonal distance from a point [x_i, y_i]^T to this line is the length of the projection onto the normal vector, and this would be |-x_i sin(t) + y_i cos(t)|.
The vertical distance from [x_i, y_i]^T to this line would be |y_i - x_i tan(t)|.
We will use Maple to compute the sum of the squares of these distances and plot the results as functions of t. We will call the sum of the squares of the orthogonal distances D1, and the sum of the squares of the vertical distances will be called D2.
>D1:=expand(add( (-mxv[i]*sin(t)+myv[i]*cos(t))^2,i=1..30));
    D1 = 19.42397 cos^2(t) + 33.01234 sin^2(t) - 40.59837 sin(t) cos(t)
>D2:=expand(add( ( myv[i]-mxv[i]*tan(t) )^2,i=1..30));
    D2 = 33.01234 tan^2(t) - 40.59837 tan(t) + 19.42397
>plot( [ D1, D2 ], t=-Pi/2..Pi/2, 0..60, thickness=2);
The plots of D1 and D2 are shown in Figure 7.26. The plot shows that both of these functions take on a minimum at around t = .5. Using Maple we can find where these minima occur. We will find the derivatives (using the diff command), and find the critical values.
>fsolve(diff(D1,t)=0,t=0..1);
        .62391
>fsolve(diff(D2,t)=0,t=0..1);
        .55130
So the line which minimizes the sum of the squares of the orthogonal distances would lie at an angle of .62391 radians. For vertical distances the minimum would be when the line lies at .55130 radians.
Figure 7.26: The plot of D1 and D2
Now the line which minimizes the sum of the squares of the vertical distances would be the least-squares line. If our x coordinates are in mxv and our y coordinates are in myv then the least-squares line through the origin fitting these points would have slope

    (y . x) / (x . x)

To find the angle at which this line lies we then apply the inverse tangent. In Maple we have
>arctan(DotProduct(mxv,myv)/DotProduct(mxv,mxv));
.55130
>VectorAngle(mxv,myv); ### an easier way
Amazing! This is the same result that we obtained above using calculus to determine the mini-
mum value of D2.
Now what about the minimum of D1? How do we find this using linear algebra? The minimizing line here will be the eigenspace of the covariance matrix corresponding to the largest eigenvalue.
>B:=&lt;mxv|myv&gt;:
>S:=1/29*B^%T.B;
>U,Sig,Vt:=SingularValues(S,output=['U','S','Vt']);
The line we are looking for is determined by the first row of Vt. We will find the slope of this line and then apply the inverse tangent.
>arctan(Vt[1,2]/Vt[1,1]);
        .62391
This agrees with the previous result obtained using calculus.
7.6 Total Least Squares
Suppose you want to find the straight line that gives the best fit to a collection of data. One approach, as we saw earlier, is to find the least squares line. The assumption of this approach is that all the error of the data is located in the y values&#8312;. In some cases this assumption is valid, but in many cases it will turn out that there are errors of measurement in both the x and y values.
Suppose we have a data matrix X = [x_1 x_2 ... x_n] where each column is a point in R^2 and the data is already in mean-deviation form. Let u be a unit vector, and let x = tu be the line through the origin in the direction of u. We can find the square of the orthogonal distance of each point, x_i, to the line x = tu by using the projector I - uu^T:

    ||(I - uu^T)x_i||^2 = x_i^T (I - uu^T)(I - uu^T) x_i = x_i^T (I - uu^T) x_i

The sum of all such distances is therefore

    sum_{i=1}^n x_i^T (I - uu^T) x_i = sum_{i=1}^n ||x_i||^2 - sum_{i=1}^n x_i^T uu^T x_i
Look at the expression on the right hand side of the above equation. This represents the value that we want to minimize as the difference of two sums. If we wish to find the vector u which minimizes this quantity we must maximize the second sum. This is because the value of the first sum is fixed by the given data points, so we want to subtract as much as possible from this sum. But the second sum is

    sum x_i^T uu^T x_i = sum u^T x_i x_i^T u = u^T ( sum x_i x_i^T ) u = u^T XX^T u

and this can be seen as a quadratic form with the unknown u. The maximum will be taken on when u is a unit eigenvector corresponding to the largest eigenvalue of the matrix XX^T. Finally, this is just the first principal component of X.
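The recipe above is short enough to try on a toy data set in Maple. The following sketch uses made-up numbers that are already in mean-deviation form (all names are ours); it finds the unit eigenvector of XX^T for the largest eigenvalue and reads off the slope of the total least squares line through the origin.
>with(LinearAlgebra):
>X:=Transpose(&lt;&lt;1.,2.,-1.,-2.&gt;|&lt;.5,1.5,-1.,-1.&gt;&gt;):      ### a 2 x 4 data matrix, columns are the points
>U,sv,Vt:=SingularValues(X.X^%T,output=['U','S','Vt']):
>u1:=Column(U,1):                                       ### unit eigenvector for the largest eigenvalue of XX^T
>slope:=u1[2]/u1[1];                                    ### slope of the total least squares line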
As an example suppose we have the following data values:
x 1.1 1.2 1.3 1.8 1.9 2.1 2.3 2.4 2.5 3.0 3.3 3.8
y .2 .4 .4 .6 .7 .9 .8 1.0 1.3 1.1 2.5 2.8
We can put this data in mean-deviation form and then find the least squares line and the total least squares line. We will outline the steps.
First find the average x and y values.

    (sum_{i=1}^{12} x_i)/12 = 2.225        (sum_{i=1}^{12} y_i)/12 = 1.0583

&#8312; When we find the least squares line by solving a system of the form Xb = y we remove the error from y by projecting y onto the column space of X. The column space of X is determined by the x coordinates of the data points. If there are errors in these x values then this method would be flawed.
We subtract these averages from the x and y values to put the data in mean-deviation form and create matrix X.

    X = [ -1.125  -1.025  -0.925  -0.425  -0.325  -0.125  0.075   0.175   0.275  0.775  1.075  1.575 ]
        [ -0.8853 -0.6853 -0.6853 -0.4853 -0.3853 -0.1853 -0.2853 -0.0853 0.2147 0.0147 1.4147 1.7147 ]

This gives

    XX^T = [ 7.82250  6.94250 ]
           [ 6.94250  7.21791 ]

This matrix has eigenvalues of 14.4693 and .5711. A basis for the eigenspace corresponding to the largest eigenvalue is [ .72232, .69156 ]^T. This eigenspace would be a line whose slope is

    .69156 / .72232 = .95741

If we find the least squares line through these points in the usual way we would get a line whose slope is .88750.
If we plot the data and the lines we obtain Figure 7.27.

Figure 7.27: Comparison of the total least squares line (TLS) and the least squares line (LS).
Exercises
1. Find the pseudoinverse of A =

    [ 1 2 ]
    [ 1 2 ]

and use this pseudoinverse to find the least-squares solution of the system

    Ax = [ 1 ]
         [ 0 ]
2. Find the pseudoinverse of A =

    [ 1 2 ]
    [ 2 1 ]
    [ 1 1 ]

and use this pseudoinverse to find the least-squares solution of the system

    Ax = [ 1 ]
         [ 1 ]
         [ 1 ]
3. Find the least-squares line and the total least-squares line for the following data points. Plot both
lines on the same set of axes.
x 0 1 2
y 0 0 1
4. Find the least-squares line and the total least-squares line for the following data points. Plot both
lines on the same set of axes.
x -1 0 1 2
y 0 0 1 1
Using MAPLE
Example 1.
We will use Maple to illustrate another use of the SVD - data compression. We will begin by defining a 30 x 30 matrix of data.
>with(plots):
>f:=(x,y)-> if abs(x)+abs(y)<1 then 1 else 0 fi:
>A:=Matrix(30,30,(i,j)->f((i-15)/12,(j-15)/12)-f((i-15)/4,(j-15)/4)):
>matrixplot(A);
Figure 7.28: Matrixplot of A.
There is nothing special about this matrix other than the fact that it generates a nice picture.
The matrixplot command in Maple generates a 3 dimensional image where the values in the matrix
correspond to the heights of a surface. This gives us a nice way of visualizing the data in the matrix.
Now we will find the singular value decomposition of A and we will define new matrices

    B_1 = sigma_1 u_1 v_1^T
    B_2 = sigma_1 u_1 v_1^T + sigma_2 u_2 v_2^T
    B_3 = sigma_1 u_1 v_1^T + sigma_2 u_2 v_2^T + sigma_3 u_3 v_3^T
    ...

and plot them. The second line of Maple code below is a bit tricky but it just corresponds to the formula

    B_i = sum_{j=1}^{i} sigma_j u_j v_j^T

>U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
>for i to 12 do
   B[i]:=add(S[j]*Column(U,j).Row(Vt,j),j=1..i) od:
>matrixplot(B[1]);
>matrixplot(B[2]);
>matrixplot(B[3]);
>matrixplot(B[10]);
Figure 7.29: Matrixplot of B_1.
Figure 7.30: Matrixplot of B_2.
Figure 7.31: Matrixplot of B_3.
Figure 7.32: Matrixplot of B_10.
Now matrix A contains 900 entries. Matrix B_1 was computed from only 61 numbers - sigma_1, the 30 entries in u_1 and the 30 entries in v_1. With less than 7% of the original amount of information we were able to reconstruct a poor approximation to A - the plot of A is not recognizable from the plot of B_1. With matrix B_2 we use 122 numbers, about 14% of the original amount of data. The plot of B_2 is beginning to reveal the basic 3d structure of the original data. By adding more and more of the components of the SVD we can get closer and closer to A, and the plot of B_10 is very close to the plot of A. If you look at the singular values they become very small after the first 12, so all the components after sigma_12 should contribute very little to the reconstruction of A. To construct B_12 we need 732 numbers, which is about 80% of the amount of data in A. The point is that if we wanted to transmit the information in matrix A to another location we could save time by sending the 732 numbers needed to reconstruct B_12 rather than the 900 numbers of A, thereby reducing the amount of data that must be transferred. It is true that the matrix reconstructed would not be exactly the same as A but, depending on the context, it might be acceptably close.
How close is matrix B_1 to matrix A? If they were the same then A - B_1 would be the zero matrix. If B_1 is close to A then the entries in A - B_1 should all be small. The distance from A to B_1 could be measured in a way similar to how we measure the distance from one vector to another. We could subtract the matrices, square the entries, add them, and then take the square root&#8313;. In Maple we can do this with the Norm command with the frobenius option. (There are various ways of finding the norm of a matrix. The method we are using here is called the Frobenius norm.)
>Norm(A-B[1],frobenius);
>Norm(A-B[2],frobenius);
>Norm(A-B[3],frobenius);
>Norm(A-B[12],frobenius);

&#8313; This is in fact the same as finding the distance relative to the inner product &lt;A, B&gt; = trace(A^T B).
This gives us the values 6.791, 2.648, 2.278, .013426353, .764e-8. The conclusion is that B_12 is very close to A.
We can plot the error of the successive approximations and get the following graph. We also plot the singular values for comparison. Notice how the decrease in the errors parallels the decrease in the singular values.
Figure 7.33: Errors of the SVD reconstructions.
Figure 7.34: The singular values of A.
There is another way to visualize how the matrices B_j approximate A using an animation in Maple.
>for i to 12 do p[i]:=matrixplot(B[i]) od:
>display( [ seq( p[i],i=1..12) ], insequence=true );
Example 2.
We will now look at another example of data compression. We begin by defining an 8 x 8 matrix:
>M:=Matrix(8,8,[[0,0,0,0,0,0,0,0],
[0,1,1,1,1,1,1,0],
[0,1,0,0,0,0,1,0],
[0,1,0,1,1,0,1,0],
[0,1,0,0,0,0,1,0],
[0,1,0,0,0,0,1,0],
[0,1,1,1,1,1,1,0],
[0,0,0,0,0,0,0,0]]):
This matrix would correspond to the following image where 0=black and 1=white.
Figure 7.35: An 8 by 8 image.
The JPEG method of compressing an image involves converting it to the Discrete Cosine basis that we mentioned in Chapter 1. We will write a procedure that converts an 8 x 8 matrix to the Discrete Cosine basis and the corresponding inverse transformation. First we define a function f that gives cosine functions at various frequencies. We then generate a basis for R^8 by sampling f. We place these basis vectors in matrix A and let A1 be the inverse of A (these are the change of basis matrices). Then we define dct for the Discrete Cosine transform and idct for the inverse transform.
>f:=(k,t)->evalf(cos(Pi*k*(t-1/2)/8)):
>A:=Matrix(8,8, (i,j)->f(i-1,j));
>A1:=A^(-1):
>dct:=proc(mat)
local m1;
m1:=mat:
A1.m1.A1^%T;
end:
>idct:=proc(mat)
local m1;
m1:=mat:
A.m1.A^%T;
end:
We now will apply the dct procedure to M and call the result TM. This matrix contains all the information from the original image but relative to a different basis. Image compression is performed by reducing the amount of information in TM by making all small entries equal to 0. The following Maple code scans through the entries in TM and if an entry is less than 0.2 in absolute value then that entry is made 0.
>TM:=dct(M);
>for i to 8 do
   for j to 8 do if abs(TM[i,j])&lt;.2 then TM[i,j]:=0 fi
 od; od;
>print(TM);
This gives the following matrix

    [ .34375  0  0       0  .22097  0  .30272  0 ]
    [ 0       0  0       0  0       0  0       0 ]
    [ 0       0  0       0  0       0  .30936  0 ]
    [ 0       0  0       0  0       0  0       0 ]
    [ .22097  0  0       0  0       0  0       0 ]
    [ 0       0  0       0  0       0  0       0 ]
    [ .30272  0  .30936  0  0       0  0       0 ]
    [ 0       0  0       0  0       0  0       0 ]
Notice that we are keeping only 7 of the 64 entries in TM. We now transform back to the original basis.
>M2:=idct(TM):
This would correspond to Figure 7.36.

Figure 7.36: The DCT compressed image.     Figure 7.37: The SVD compressed image.

Now we will compare this with using the SVD to compress the image:
>U,S,Vt:=SingularValues(M,output=[U,S,Vt]):
>M3:=add(S[i]*Column(U,i).Row(Vt,i),i=1..2):
Here we are reconstructing the image from just two components of the SVD, and we get the image shown in Figure 7.37.
The idea is not to recreate the original image exactly. The idea is to create a reasonably good reproduction of the image by using significantly less data than that contained in the original image. Since some of the original information is lost in this process, this type of compression is called lossy compression.
Example 3.
We will use Maple to compare the total least squares line and the least squares line for a set of data.
We begin by generating two lists called xv and yv which contain the coordinates of our data points.
>with(stats[random],normald):
>f1:=x->.2*x+normald[0,.2]():
>f2:=x->2*sin(.1*x)+normald[0,.2]():
>xv:=[seq( f1(i), i=1..20)]: ## noisy x values
>yv:=[seq( f2(i),i=1..20)]: ## noisy y values
Next we have to put the data in mean-deviation form. We will write a procedure called mdform which
will take any list as an input and return the mean-deviation form of that list.
>mdform:=proc(L)
local n,m;
n:=nops(L): ## n is the number of points
m:=add(L[i],i=1..n)/n: ## m is the mean
convert([seq( L[i]-m, i=1..n)],Vector);
end:
>mx:=mdform(xv):
>my:=mdform(yv):
>A:=<mx|my>:
>M:=A^%T.A:
>U,S,Vt:=SingularValues(M,output=[U,S,Vt]);
The direction of the total least squares line is determined by the first row of Vt computed above. We will define the slope of this line and then define the plots of the TLS line and the data points. The plots won't be displayed until we find the least squares line as well.
>v1:=Row(Vt,1):
>mtls:=v1[2]/v1[1]: ### the slope of the TLS line
>data:=[seq( [mx[i],my[i]],i=1..20)]:
>p1:=plot(data,style=point):
>p2:=plot(mtls*x,x=-2..2): ### plot the TLS line
We can nd the least squares line as follows:
>mls:=DotProduct(mx,my)/DotProduct(mx,mx); ## slope of the LS line
>p3:=plot(mls*x,x=-2..2): ### plot the LS line
>plots[display]([p1,p2,p3]);
This gives the following plot:
Now the least squares line should minimize the sum of the squares of the vertical distances from the
data point to the line. We will use Maple to compute these distances for the TLS line and the LS line
>yls:=mls*mx:
>ytls:=mtls*mx:
>Norm(my-yls,2);
1.29529
>Norm(my-ytls,2);
1.30000
Each time you execute the above commands the numerical values obtained should vary because the data points were generated using random numbers, but the first value should always be smaller than the second.
We leave it as a final Maple exercise for the reader to compare the sum of the squares of the orthogonal distances from the data points to the lines. (See the discussion in section 7.6.)
Figure 7.38: The TLS line, LS line, and the data points.
Chapter 8
Calculus and Linear Algebra
In this chapter we will look at some connections between techniques of calculus (differentiation and integration) and the methods of linear algebra we have covered in this course.
8.1 Calculus with Discrete Data
Suppose we have the following experimental data which gives the vapor pressure (in torr) of
ethanol at various temperatures (in degrees Centigrade)
Temperature T 20.0 25.0 30.0 35.0 40.0 45.0 50.0 55.0 60.0
Pressure P 42.43 55.62 69.25 93.02 116.95 153.73 190.06 241.26 303.84
We have plotted this data in Figure 8.1, which shows that the pressure is clearly an increasing function of temperature, P = f(T), but any precise formulation of this function is unknown. Now suppose we want to answer the following questions:
What is the value of P when T = 42.7?
What is the value of dP/dT when T = 42.7?
What is the integral of f(T) dT from T = 20.0 to T = 40.0?
How could we answer these questions? We have seen two major approaches that could be taken.
We could find an interpolating polynomial and use that to answer each of the above questions. In Figure 8.1 we show the data, the interpolating polynomial, and the derivative of this polynomial. Looking at these plots should convince you that this approach is unsatisfactory. As the plot of the derivative clearly shows, the interpolating polynomial is not strictly increasing and so it violates our basic intuition of the physics of the problem. This is typical of what happens when you try to fit a high degree polynomial to a set of data.

Figure 8.1: Vapor pressure versus temperature (interpolation)

The problem with the above approach is that we tried to find a curve that fit the data values exactly, but experimental data is almost always guaranteed to contain errors of measurement. Another method would be to find a function that gives the best least-squares fit to the data. The problem here is to determine what type of function to fit. In many cases understanding the physical theory behind the phenomenon can indicate what type of function should be used. In the current example we can see that P grows with T, but should that growth be exponential, linear, quadratic, or some other form? If we assume quadratic growth and find the best fitting function of the form P = c_0 + c_1 T + c_2 T^2 we would get the plots shown in Figure 8.2. Clearly this is a more satisfactory result than simple interpolation.
We now want to take a different approach to this type of problem.
You should recall from calculus that if P = f(T) then the derivative dP/dT is defined as

    f'(T) = lim_{dT -> 0} [f(T + dT) - f(T)] / dT

This means that we have the following approximation

    dP/dT ~ [f(T + dT) - f(T)] / dT

This is not saying anything new or complicated. It is stating the obvious fact that the instantaneous rate of change can be approximated by an average rate of change over an interval (generally the smaller the interval the better the approximation). There is an interval of dT = 5.0 between each of our data values in the above table.
Figure 8.2: Vapor pressure versus temperature (least squares fit)
If we store the P values in a vector v then the finite differences can be computed by matrix multiplication

          [ -1  1  0  0  0  0  0  0  0 ]   [  42.43 ]   [  2.64 ]
          [  0 -1  1  0  0  0  0  0  0 ]   [  55.62 ]   [  2.73 ]
          [  0  0 -1  1  0  0  0  0  0 ]   [  69.25 ]   [  4.75 ]
    (1/5) [  0  0  0 -1  1  0  0  0  0 ]   [  93.02 ] = [  4.78 ]
          [  0  0  0  0 -1  1  0  0  0 ]   [ 116.95 ]   [  7.36 ]
          [  0  0  0  0  0 -1  1  0  0 ]   [ 153.73 ]   [  7.27 ]
          [  0  0  0  0  0  0 -1  1  0 ]   [ 190.06 ]   [ 10.24 ]
          [  0  0  0  0  0  0  0 -1  1 ]   [ 241.26 ]   [ 12.52 ]
                                           [ 303.84 ]
Notice that since it takes 2 data values to compute each finite difference, our nine data values give only eight finite differences.
The matrix in the above example could be called a differentiation matrix since for any vector generated by sampling a function (with, in this case, an interval of 5 between the samples) multiplication by the matrix results in a vector containing approximations to the derivative of the function. What is the null space of this matrix?
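As a sketch (the variable names are ours), the matrix-times-vector computation above can be carried out in Maple; the last line also hints at the answer to the null space question, since this matrix sends any constant vector to the zero vector.
>with(LinearAlgebra):
>P:=&lt;42.43,55.62,69.25,93.02,116.95,153.73,190.06,241.26,303.84&gt;:
>D1:=Matrix(8,9,(i,j)->piecewise(j=i,-1,j=i+1,1,0)):   ### -1 on the diagonal, 1 just to its right
>dPdT:=1/5*D1.P;                                       ### the eight finite differences
>D1.Vector(9,fill=1);                                  ### a constant vector is sent to the zero vector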
8.2 Differential Equations and Dynamical Systems
In this section we want to look at differential equations of the form

    dy/dt = f(y)

where f(y) is linear. The left hand side of this equation represents the instantaneous rate of change of y with respect to t. If we evaluate y at a sequence of equidistant values of t then this instantaneous rate of change at the particular value y_k can be approximated by

    dy/dt ~ (y_{k+1} - y_k) / dt

where dt represents the t interval from y_k to y_{k+1}.
For example, if we had the function y = 3t^2 then the value of dy/dt at t = 1 could be approximated by

    [y(1.1) - y(1)] / .1 = [3(1.1)^2 - 3(1)^2] / .1 = 6.3

where we are using a value of dt = .1. It should be clear that in general we will get a better approximation by using a smaller value of dt. So if we used dt = .02 we would have

    [y(1.02) - y(1)] / .02 = [3(1.02)^2 - 3(1)^2] / .02 = 6.06
Now suppose we have the differential equation

    dy/dt = 5y

At the particular value y_k this equation could be approximated by

    (y_{k+1} - y_k) / dt = 5 y_k

which can be rewritten as

    y_{k+1} = (1 + 5 dt) y_k

If we had some initial value (say y_0 = 3) and some fixed interval (say dt = .1) then we could approximate subsequent values of y from the order 1 difference equation y_{k+1} = 1.5 y_k. This would give y_0 = 3, y_1 = (1.5)(3) = 4.5, y_2 = (1.5)^2 (3) = 6.75, and so on. In general we would have y_n = 3(1.5)^n. Remember y_n stands for the value of y after n intervals of length .1.
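A short Maple loop reproduces these values (a sketch; the index is shifted by one only because Maple Vectors start at 1).
>y:=Vector(7):
>y[1]:=3.0:                          ### y[1] holds y_0 = 3
>for k to 6 do y[k+1]:=1.5*y[k] od:
>convert(y,list);
        [3.0, 4.5, 6.75, 10.125, 15.1875, 22.78125, 34.171875]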
Exercises
1. Given the differential equation

    dy/dt = -2y

We will look at several ways of solving this by a discrete approximation. The right hand side tells us that the rate of change at time k is given by -2y_k, but how can we approximate this rate of change by a finite difference? For the following problems use a time interval of dt = .25 with the initial values y_0 = 1, y_1 = 1/2.
(a) The rate of change at time k can be approximated by the forward difference

    (y_{k+1} - y_k) / dt

Use this approximation to solve the differential equation. (Remember a solution in this context is a sequence of values y_2, y_3, y_4, . . . . This sequence of values will be generated by a discrete dynamical system.)
(b) The rate of change at time k can be approximated by the backward difference

    (y_k - y_{k-1}) / dt

Use this approximation to solve the differential equation.
(c) The rate of change at time k can be approximated by the centered difference

    (y_{k+1} - y_{k-1}) / (2 dt)

Use this approximation to solve the system.
(d) Plot the three approximate solutions along with the exact solution.
2. Repeat the previous problem for

    dy/dt = -2y + 1

Use the same dt and initial values.
3. Repeat for

    d^2 y/dt^2 = -2y + 1
8.3 An Oscillating Spring
In this section we will consider a system composed of a mass connected to a spring with the other end of the spring connected to a fixed support. We will assume that the only relevant force is the spring force and will ignore gravity and friction. The mass is displaced from its equilibrium position and released. The result is that the mass will oscillate.
On a conceptual level this is one of the most important examples in this chapter. It illustrates how a problem can be analyzed in terms of basic laws of physics. These laws of physics can then be expressed in the form of a differential equation which can be solved in continuous time using calculus. Finally it shows how it can be converted to discrete time and solved using linear algebra. This last step might appear to be redundant, but in many applications it turns out that the equations involved are too complicated for a calculus solution and they have to be converted to discrete time approximations.
The use of calculus in such problems results in what is called an analytical solution; the solution is given as a function. The use of linear algebra as shown here is called a numerical solution; the solution is a list of numbers. The development of fast computers with large memories has had a revolutionary impact on applied mathematics. These technological improvements have made quick and accurate numerical solutions possible where they would have been impossible 30 years ago.
Figure 8.3: A mass on a spring
Continuous Time
First we will analyze this as a continuous time system. From physics you know that the position of the mass is governed by an equation of the form F = ma. Furthermore, in this example, we are assuming the only relevant force is the spring force which is given by F = -Kx where K > 0 is the spring constant and x is the displacement of the mass from equilibrium. Combining these equations we get

    ma = -Kx
    m d^2 x/dt^2 = -Kx
    d^2 x/dt^2 = -(K/m) x

This last equation has a general solution of the following form&#185;:

&#185; In fact, the set of solutions to this differential equation forms a vector space (i.e., the sum of any two solutions is also a solution, and a scalar multiple of a solution is a solution). This vector space is two dimensional and has a basis consisting of cos(sqrt(K/m) t) and sin(sqrt(K/m) t). So you can look at the general solution as being any possible combination of these basis vectors.
    x = C_1 cos(sqrt(K/m) t) + C_2 sin(sqrt(K/m) t)

Problem. Given that K = 1, m = 1, and at t = 0 you know that x = 0 and dx/dt = 1, find the values of C_1 and C_2.
Solution. Given the values of K and m the above equation would become

    x = C_1 cos t + C_2 sin t

Substituting x = 0 and t = 0 into this equation we have

    0 = C_1 cos(0) + C_2 sin(0) = C_1

Hence C_1 = 0.
Now find the derivative and substitute t = 0 and dx/dt = 1:

    dx/dt = -C_1 sin t + C_2 cos t
    1 = -C_1 sin(0) + C_2 cos(0)
    1 = C_2

Therefore the motion of the oscillating mass is described by

    x = sin t
Discrete Time
We've seen before that a first derivative can be approximated by a finite difference. The second derivative can be approximated in a similar way. Using the fact that the second derivative is the derivative of the first derivative we get

    d^2 x/dt^2 ~ [ (x_{k+2} - x_{k+1})/dt - (x_{k+1} - x_k)/dt ] / dt = (x_{k+2} - 2x_{k+1} + x_k) / dt^2

This finite difference expression would give an approximation to the second derivative at time k+1 (the midpoint of the values used). So if we use this discrete approximation in the place of the second derivative then the equation describing the motion becomes

    (x_{k+2} - 2x_{k+1} + x_k) / dt^2 = -(K/m) x_{k+1}

Solving this for x_{k+2} we get

    x_{k+2} = (2 - (K/m) dt^2) x_{k+1} - x_k

Or in matrix form:

    [ x_{k+1} ]   [  0     1    ] [ x_k     ]
    [ x_{k+2} ] = [ -1   2 - p  ] [ x_{k+1} ]

where p = (K/m) dt^2.
As an example of this discrete model suppose that K/m = 1 and dt = .8. We then have the dynamical system

    x_{k+1} = [  0    1   ] x_k
              [ -1   1.36 ]

To actually use this to compute values we would need x_0, which would require knowledge of the position of the mass at times t = 0 and t = .8 (i.e., at k = 0 and k = 1). From previous results we know that the position of the object is described by x = sin t, so at t = .8 we have x_1 = sin(.8) = .71736. The initial state of the dynamical system is then x_0 = [0, .71736]^T. Repeated multiplication by A gives the following values:
    [ 0, .71736 ] -> [ .71736, .97560 ] -> [ .97560, .60947 ] -> [ .60947, -.14673 ] -> [ -.14673, -.80902 ] ->
    [ -.80902, -.95354 ] -> [ -.95354, -.48779 ] -> [ -.48779, .29014 ] -> [ .29014, .88238 ] -> ...
If we draw a time plot of the discrete system along with the solution of the continuous model we get Figure 8.4. The values given by the discrete system also lie on a sine wave but at a slightly different frequency from that of the continuous time solution.
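The repeated multiplication is easy to do in Maple; the following sketch (names ours) reproduces the list of states above.
>with(LinearAlgebra):
>A:=&lt;&lt;0,-1&gt;|&lt;1,1.36&gt;&gt;:               ### the matrix [0 1; -1 1.36]
>x:=&lt;0,.71736&gt;:                      ### the initial state
>for k to 8 do x:=A.x: print(x) od: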
Figure 8.4: Plots of the discrete time and continuous time solutions with dt = .8. The horizontal axis is indexed by k not by t.
The continuous solution has a period of 2 pi. What is the period of the discrete solution? The characteristic polynomial would be lambda^2 - 1.36 lambda + 1 = 0 giving eigenvalues of

    [ 1.36 +/- sqrt(1.36^2 - 4) ] / 2 = .68 +/- .7333212 i

These complex eigenvalues have magnitude 1, and correspond to a rotation of arccos(.68) at each multiplication by A. One complete cycle would therefore require 2 pi / arccos(.68) steps. Since each step is .8 seconds the total period is

    .8 * 2 pi / arccos(.68) = 1.6 pi / arccos(.68) ~ 6.107342014

This is slightly less than the period of the continuous time model which had a period of 2 pi ~ 6.283185308.
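These period computations are quick to check in Maple (a sketch):
>evalf(1.6*Pi/arccos(.68));          ### period of the discrete solution
        6.107342014
>evalf(2*Pi);                        ### period of the continuous solution
        6.283185308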
Problem. You should realize that the finite difference approximation that we are using to generate our linear dynamical system becomes more exact as the time interval becomes smaller. Compute the period of the discrete time solution for dt = .4 and dt = .1 and compare the result with the period of the continuous time solution.
Here are the plots of these solutions for dt = .4 and dt = .1.
Figure 8.5: Plots of the discrete time and continuous time solutions with dt = .4.
Figure 8.6: Plots of the discrete time and continuous time solutions with dt = .1.
Exercises
1. In our analysis of the oscillating spring-mass system we ignored gravity. How would the analysis change if we include gravity?
2. The dynamical system which modelled the spring-mass system ignored friction. We can modify the system as follows to include the effect of friction:

    [ x_{k+1} ]   [   0          1       ] [ x_k     ]
    [ x_{k+2} ] = [ q - 1    2 - p - q   ] [ x_{k+1} ]

Here the parameter q is a (small) positive value that models the presence of friction.
(a) What are the magnitudes of the (complex) eigenvalues of this system?
(b) Let p = .64 and x_0 = [0, .717]^T. Use Maple to draw time plots of this system for q = 0.1, 0.2, 0.3, . . . , 0.9, 1.0. At what value of q do the eigenvalues become real? How does the behavior of the system change at that point?
3. Set up the differential equations for a system of two equal masses connected by three springs to two fixed supports. Assume the springs all have the same spring constant.
8.4 Differential Equations and Linear Algebra
We have already looked at dynamical systems of the form x_{k+1} = A x_k. Dynamical systems of this type are sometimes called discrete dynamical systems because the time variable (k) evolves in steps of some fixed finite size. There is another fundamental way of modeling systems which evolve over time using continuous dynamical systems, which are described by differential equations.
The simplest example of a continuous linear dynamical system would be an equation like

    dx/dt = .06x

The left side of this equation is a derivative which represents the instantaneous rate of change of x with respect to time, t. The equation says that this rate of change is equal to 6% of the value of x. The solution to this equation would be

    x = Ce^{.06t}

Checking this by taking the derivative we would get

    dx/dt = Ce^{.06t}(.06) = x(.06) = .06x

One key idea here is that the set of all possible solutions of the above differential equation can be seen as a 1 dimensional vector space&#178; with basis e^{.06t}.
We now show how the same dynamical system can be modeled as a discrete system. For simplicity we will choose some specific value for C, say C = 10. We would then have the solution x = 10e^{.06t}. If we let x_0 stand for the value of x at time 0 we have x_0 = 10. Choose a time interval, say dt = .5,

&#178; As a generalization of this example, the solution of dx/dt = kx would be x = Ce^{kt}, a one dimensional vector space consisting of all scalar multiples of e^{kt}.
and let x_k be the value of x at time t = .5k (that is, k time intervals after t = 0). Then the derivative can be approximated by a difference quotient

    (x_{k+1} - x_k) / .5 = .06 x_k

Solving this for x_{k+1} we get

    x_{k+1} = 1.03 x_k

This would now be a discrete time approximation to the original continuous time system. If we start off with x_0 = 10 and use this difference equation we get the following values

    x_0    10
    x_1    10.3
    x_2    10.609
    x_3    10.927
    x_4    11.255
    x_5    11.595
    x_6    11.941
Now remember that x_6 is the value after 6 time intervals, which corresponds to t = 3. In the continuous time model the value of x at t = 3 would be 10e^{.06(3)} = 11.972. So the continuous and the discrete model DO NOT AGREE. Here is a plot of the continuous time model and the values of the discrete time model.

Figure 8.7: Comparison of the continuous and discrete models with dt = .5.

The gap between the discrete and continuous models will generally increase as time goes on. At t = 10 the difference between the two models would be

    10e^{.06(10)} - 10(1.03)^{20} ~ .16008
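The two numbers in this difference are easy to recompute (a quick sketch):
>x:=10.0:
>for k to 20 do x:=1.03*x od:        ### 20 steps of size .5 reach t = 10
>x;                                  ### discrete value, about 18.061
>evalf(10*exp(.06*10));              ### continuous value, about 18.221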
In general, as the time interval gets smaller the solution given by the discrete model gets closer to the solution of the continuous model. To illustrate this we show the plots that would result if we used a larger time interval of dt = 1 and a smaller time interval of dt = .2.

Figure 8.8: Comparison of the continuous and discrete models with dt = 1.
Figure 8.9: Comparison of the continuous and discrete models with dt = .2.

By choosing a smaller time interval the discrete model becomes a better approximation to the continuous model. When we use a time interval of dt = .2 the difference between the two models at t = 10 would be .06496 (approximately one third of what it was with an interval of .5). When the time interval is dt = 1 the difference between the two models at t = 10 would be .31271.
There are two drawbacks to using very small intervals. First, due to the finite precision of computers there is a limit as to how small the interval can be, and the smaller the interval the more serious the rounding errors will be. Second, the smaller the time interval the greater the amount of data that will be produced. For example, if you chose dt to be a millionth of a second, it would take a million steps to compute just one second of the solution.
Systems of Two Differential Equations
Next suppose we had a system of differential equations like

    dx_1/dt = .01 x_1
    dx_2/dt = .07 x_2

This system would be easy to solve based on the earlier example. The solution here would be

    x_1 = C_1 e^{.01t}
    x_2 = C_2 e^{.07t}
That was pretty simple, but now look at a more complicated example

    dx_1/dt = x_1 + 2x_2
    dx_2/dt = -x_1 + 4x_2

The big difference in this example is that the variables are coupled. That is, the formula for the rate of change of x_1 involves the values of both x_1 and x_2, and similarly for the rate of change of x_2. To solve this problem we want to uncouple the equations, and this will just involve diagonalizing a matrix.
We begin by letting x = [x_1, x_2]^T and then we have

    dx/dt = [ dx_1/dt ]   [ x_1 + 2x_2  ]   [  1  2 ] [ x_1 ]
            [ dx_2/dt ] = [ -x_1 + 4x_2 ] = [ -1  4 ] [ x_2 ] = Ax
The matrix A in this case can be diagonalized in the usual way. The matrix that diagonalizes A is

    P = [ 2 1 ]    and    P^{-1} A P = [ 2 0 ]
        [ 1 1 ]                        [ 0 3 ]

So we introduce a new variable

    y = [ y_1 ] = P^{-1} x
        [ y_2 ]

We then get

    dx/dt = Ax
    dx/dt = P D P^{-1} x
    P^{-1} dx/dt = P^{-1} P D P^{-1} x
    dy/dt = Dy

In this case that leaves us with

    dy_1/dt = 2 y_1
    dy_2/dt = 3 y_2

The equations have been uncoupled. The solution now is simple:

    y_1 = C_1 e^{2t}
    y_2 = C_2 e^{3t}

But this is the solution in terms of our new variables. What is the solution in terms of the original variables? For this we just evaluate the following

    x = Py = [ 2 1 ] [ C_1 e^{2t} ]
             [ 1 1 ] [ C_2 e^{3t} ]

So

    x_1 = 2 C_1 e^{2t} + C_2 e^{3t}
    x_2 = C_1 e^{2t} + C_2 e^{3t}

This solution can be written as

    x = C_1 [ 2e^{2t} ] + C_2 [ e^{3t} ]
            [  e^{2t} ]       [ e^{3t} ]

When the solutions are written this way it is easier to see that they form a two dimensional vector space with basis [ 2e^{2t}, e^{2t} ]^T and [ e^{3t}, e^{3t} ]^T.
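It is worth confirming in Maple that these really are solutions. The short sketch below (names ours) finds the eigenvalues and eigenvectors of A and then checks the particular solution with C_1 = 1 and C_2 = -1 that is plotted next.
>with(LinearAlgebra):
>A:=&lt;&lt;1,-1&gt;|&lt;2,4&gt;&gt;:                              ### the coefficient matrix
>lambda,P:=Eigenvectors(A);                      ### eigenvalues 2, 3 and eigenvectors
>x:=t->&lt;2*exp(2*t)-exp(3*t),exp(2*t)-exp(3*t)&gt;:  ### the solution with C1 = 1, C2 = -1
>map(simplify,map(diff,x(t),t)-A.x(t));          ### should print the zero vector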
Now how can the solutions be visualized? First, to use a specific example, let's choose C_1 = 1 and C_2 = -1. Then we can draw time plots of the solutions where we view x_1 and x_2 as functions of time. This would give Figure 8.10.
We can also draw a phase plot where we plot x_1 values against the corresponding x_2 values. This gives Figure 8.11. In the phase plot we see the origin as a repellor.
Can we model this system as a discrete-time system? First, to simplify the notation&#179;, we will introduce new variables: let a = x_1 and b = x_2. Well, if we let dt = .1 then using a similar argument as our earlier example we get

&#179; If we keep the variable x_1, it is most common to represent the value of this variable at time t by x_1^t.
Figure 8.10: The time plots of x_1 = 2e^{2t} - e^{3t} and x_2 = e^{2t} - e^{3t}.
Figure 8.11: The phase plot of x_1 = 2e^{2t} - e^{3t} and x_2 = e^{2t} - e^{3t}.
Appendix A
Linear Algebra with Maple
We will summarize the basics of using the LinearAlgebra package to do linear algebra. For all the commands in this
section it will be assumed that you have entered the following command to load the LinearAlgebra package.
>with(LinearAlgebra):
Defining Vectors in Maple
There are several ways of defining a vector in Maple. Suppose we have the vector

    u = [ 4, 3, 0, 8 ]^T

The easiest way of defining this vector in Maple is
>u:=&lt;4, 3, 0, 8&gt;;
Note: You have to enclose the vector entries in angled brackets, &lt; and &gt;.
The Vector command also allows you to define a vector by giving a rule to generate the n-th entry in the vector. This method of defining a vector requires two input parameters. The first parameter is the size of the vector. The second is the rule to generate entries in the vector.
>u:=Vector(4, n->n^2);
u=[1, 4, 9, 16]
>v:=Vector(5, j->t/(j+1));
v=[t/2, t/3, t/4, t/5, t/6]
A vector is a one-dimensional array in Maple which means individual entries in the vector can be accessed by
specifying the index of that entry as shown below
>u:=<x,y,x*y,-1>:
>u[2];
y
>u[4];
-1
>u[1]-u[3];
x - xy
>v:=Vector(4, n->u[5-n]);
v=[-1, xy, y, x]
Defining Matrices in Maple
There are many ways of defining a matrix in Maple. Suppose we have the matrix

    A = [ 1 0 ]
        [ 3 5 ]
        [ 8 2 ]

Either of the first two following commands could be used to define this matrix.
>A:=&lt;&lt;1|0&gt;,&lt;3|5&gt;,&lt;8|2&gt;&gt;; #### row by row
>A:=&lt;&lt;1,3,8&gt;|&lt;0,5,2&gt;&gt;; #### column by column
>&lt;A|A&gt;;
>&lt;A,A&gt;;
A matrix is a two-dimensional array in Maple and each entry in the matrix can be accessed by specifying the two indices (row and column) of the entry. For example
>A:=&lt;&lt;2,0,8&gt;|&lt;5,1,4&gt;&gt;;

    [ 2 5 ]
    [ 0 1 ]
    [ 8 4 ]

>A[2,1];
        0
>A[3,1]*A[1,2];
        40
Matrices can also be generated by giving a rule which generates the entries. You first specify the size of the matrix and then give the rule
>B:=Matrix(2,3, (i,j)->i/j);
>C:=Matrix(2,2, (i,j)->t^(i*j));

    B = [ 1  1/2  1/3 ]      C = [ t    t^2 ]
        [ 2   1   2/3 ]          [ t^2  t^4 ]
Patterned Matrices
Some matrices have entries that fall into a particular pattern. For example, the following matrices are square diagonal matrices:

    B = [ 4 0 0  0 ]      C = [ 1 0 0 0 0 ]
        [ 0 8 0  0 ]          [ 0 1 0 0 0 ]
        [ 0 0 3  0 ]          [ 0 0 1 0 0 ]
        [ 0 0 0 -1 ]          [ 0 0 0 2 0 ]
                              [ 0 0 0 0 5 ]

The Maple command DiagonalMatrix can be used for this type of matrix. With this command you just have to input the entries on the diagonal of the matrix. So we could define B and C as
>B:=DiagonalMatrix(&lt;4,8,3,-1&gt;);
>C:=DiagonalMatrix(&lt;1,1,1,2,5&gt;);
The 10 x 10 identity matrix could be defined as
>I10:=IdentityMatrix(10);
Another type of patterned matrix is called a band matrix. The following are examples of band matrices:

    B1 = [  4  1  0  0  0 ]      B2 = [ b c d 0 0 0 ]
         [ -2  4  1  0  0 ]           [ a b c d 0 0 ]
         [  0 -2  4  1  0 ]           [ 0 a b c d 0 ]
         [  0  0 -2  4  1 ]           [ 0 0 a b c d ]
         [  0  0  0 -2  4 ]           [ 0 0 0 a b c ]
                                      [ 0 0 0 0 a b ]

In Maple we can enter
>B1:=BandMatrix(&lt;-2,4,1&gt;,1, 5);
>B2:=BandMatrix(&lt;0,a,b,c,d&gt;,1,6);
The BandMatrix command requires three inputs. The first input must be a vector containing the entries down the band. The next entry specifies how many diagonal bands extend below the main diagonal. The third entry (or third and fourth) specifies the size of the matrix.
Solving Systems of Equations
As usual with Maple there are many ways of solving a system of equations. We will only mention three ways.
Suppose we want to solve the system Ax = b where

    A = [ 1 2 1 ]        b = [  2 ]
        [ 2 1 2 ]            [  0 ]
        [ 3 4 5 ]            [ -5 ]

In this case A is a square invertible matrix so the solution is given by x = A^{-1} b. The Maple commands for this are as follows:
>A:=&lt;&lt;1,2,3&gt;|&lt;2,1,4&gt;|&lt;1,2,5&gt;&gt;;
>b:=&lt;2,0,-5&gt;:
>sol:=A^(-1).b;
        [7/2, 4/3, -25/6]
The method that was just used only works if A is invertible. We can solve any system by setting up the augmented matrix of the system and then putting it in reduced row echelon form. The last column of the reduced matrix will contain the solution.
>A:=&lt;&lt;1,2,3&gt;|&lt;2,1,4&gt;|&lt;1,2,5&gt;&gt;;
>b:=&lt;2,0,-5&gt;:
>ReducedRowEchelonForm(&lt;A|b&gt;):
>Column(%, 4);
        [7/2, 4/3, -25/6]
>LinearSolve(A,b); ### another option
Suppose A =

    [ 1 2 2 3 ]      and      b = [ 3 ]
    [ 2 1 3 4 ]                   [ 0 ]

then Ax = b must have infinitely many solutions. We can find these solutions in Maple as follows:
>A:=&lt;&lt;1,2&gt;|&lt;2,1&gt;|&lt;2,3&gt;|&lt;3,4&gt;&gt;:
>b:=&lt;3,0&gt;:
>LinearSolve(A,b);
        [-1-4/3*s-5/3*t, 2-1/3*s-2/3*t, s, t]
Matrix and Vector Operations
The simplest operations on matrices and vectors are scalar multiplication and addition. These two operations allow you to create linear combinations.
We will use the following matrices and vectors for our examples in this section:

    A = [ 2 -1 ]     B = [  3  4 ]     u = [ 1 ]     v = [  3 ]
        [ 1  0 ]         [  3  2 ]         [ 3 ]         [ -1 ]
        [ 1  2 ]         [ -1  1 ]

then we can evaluate 5A, -3u, 2A + 3B, 4u - 8v as follows
>A:=&lt;&lt;2,1,1&gt;|&lt;-1,0,2&gt;&gt;:
>B:=&lt;&lt;3,3,-1&gt;|&lt;4,2,1&gt;&gt;:
>u:=&lt;1,3&gt;:
>v:=&lt;3,-1&gt;:
>5*A;
>-3*u;
>2*A+3*B;
>4*u-8*v;
So addition and scalar multiplication are computed using the symbols + and *.
Matrix multiplication is not the same as scalar multiplication and is represented by a different symbol in Maple. Matrix multiplication is indicated by a dot, that is, the . symbol. So if we wanted to compute AB, BA, A^2, Au, B(u + v) using the same matrices and vectors as above we could enter
>A.B;
>B.A;
>A.A; ### one way of finding A^2
>A^2; ### an alternate way of finding A^2
>A.u;
>A.(u+v);
Finding the transpose or inverse of a matrix can be found as follows (we show two methods for nding the inverse).
>Transpose(A);
>A^(-1); ### this stands for the inverse of A but does not compute it
For example, using the same matrices as above suppose we want to find a matrix C such that

    A(C + B) = B^T A

Solving this equation symbolically we would get

    C = A^{-1} B^T A - B

We can then compute this result in Maple
>C:=A^(-1).Transpose(B).A-B;
The dot product can be found in two ways. To find u . v we can enter either
>DotProduct(u,v);
>Transpose(u).v;
These two methods result from the equation u . v = u^T v.
There is a similar command for the cross product. A cross product can be evaluated using
>CrossProduct(&lt;1,2,3&gt;,&lt;4,5,6&gt;);
        [-3, 6, -3]
>CrossProduct(&lt;A,B,C&gt;,&lt;X,Y,Z&gt;);
        [B*Z-C*Y, C*X-A*Z, A*Y-B*X]
Determinants
A determinant can be computed in Maple with the Determinant command. For example suppose we want to use Cramer's Rule to solve

    2x_1 + 3x_2 + 4x_3 = a
    3x_1 + 2x_2 + 3x_3 = b
    5x_1 + 5x_2 + 9x_3 = c

for x_2.
Cramer's Rule says that

    x_2 = det( [ 2 a 4 ; 3 b 3 ; 5 c 9 ] ) / det( [ 2 3 4 ; 3 2 3 ; 5 5 9 ] )

In Maple we could do
>a1:=&lt;2,3,5&gt;:
>a2:=&lt;3,2,5&gt;:
>a3:=&lt;4,3,9&gt;:
>y:=&lt;a,b,c&gt;:
>A:=&lt;a1|a2|a3&gt;:
>A2:=&lt;a1|y|a3&gt;:
>Determinant(A2)/Determinant(A); ### Cramer's Rule for x2
Examples
Example 1
We will solve the system

    x + y + z = 3
    3x - 2y + z = 1
    4x - y + 2z = 4

First we will show how to plot these equations.
>e1:=x+y+z=3:
>e2:=3*x-2*y+z=1:
>e3:=4*x-y+2*z=4:
>plots[implicitplot3d]({e1,e2,e3},x=-4..4,y=-4..4,z=-4..4,axes=boxed,style=patchnogrid,shading=zgrayscale);

Figure A.1: The plot of the system.

The plot shows that the three planes making up the system intersect along a line.
We could solve this system by
>solve({e1,e2,e3}, {x,y,z});
        {y = 8/5 - 2/5 z, z = z, x = 7/5 - 3/5 z}
This result means that z is free and so the solution would correspond to the line

    [ x ]   [ 7/5 ]     [ -3/5 ]
    [ y ] = [ 8/5 ] + t [ -2/5 ]
    [ z ]   [  0  ]     [   1  ]
We could also solve this system by setting up the augmented matrix and reducing.
>A:=&lt;&lt;1,3,4&gt;|&lt;1,-2,-1&gt;|&lt;1,1,2&gt;|&lt;3,1,4&gt;&gt;;
>ReducedRowEchelonForm(A);

    [ 1 0 3/5 7/5 ]
    [ 0 1 2/5 8/5 ]
    [ 0 0  0   0  ]

It should be clear that this reduced form gives the same solution as the previous method.
Example 2
Given

    v_1 = [1, 2, 3, 4]^T,  v_2 = [2, 3, 4, 5]^T,  v_3 = [3, 4, 5, 6]^T,  v_4 = [4, 5, 6, 7]^T

Find a basis for Span(v_1, v_2, v_3, v_4) from among these vectors.
We can solve this in Maple as follows
>v1:=&lt;1,2,3,4&gt;:
>v2:=&lt;2,3,4,5&gt;:
>v3:=&lt;3,4,5,6&gt;:
>v4:=&lt;4,5,6,7&gt;:
>A:=&lt;v1|v2|v3|v4&gt;:
>ReducedRowEchelonForm(A);

    [ 1 0 -1 -2 ]
    [ 0 1  2  3 ]
    [ 0 0  0  0 ]
    [ 0 0  0  0 ]

Maple has done the computation but it is up to us to give the correct interpretation to this result. In this case we see that two columns of the reduced form contain pivots. The corresponding columns of the original matrix would be the basis we are looking for. So our basis is {v_1, v_2}.
Example 3
Find all 2 x 2 matrices satisfying A^2 = 0.
We start by defining

    A = [ a b ]
        [ c d ]

>A:=&lt;&lt;a,c&gt;|&lt;b,d&gt;&gt;:
>B:=A^2:

    B = [ a^2 + bc    ab + bd  ]
        [ ca + dc     bc + d^2 ]

Now we want each entry in B to equal 0. The next line shows how we can refer to these entries in Maple and have Maple solve the desired equations.
>solve( {B[1,1]=0, B[1,2]=0, B[2,1]=0, B[2,2]=0}, {a,b,c,d} );
        {c = 0, d = 0, b = b, a = 0},  {c = c, d = d, a = -d, b = -d^2/c}
This result means that there are two basic solutions. If c = 0 then there is a solution of the form

    [ 0 b ]
    [ 0 0 ]

where b is free.
If c &#8800; 0 then there is a solution of the form

    [ -d  -d^2/c ]
    [  c     d   ]

where c and d are free.
Example 4
For what values of a and b do the vectors

    [ 1 ]    [ 2 ]    [ 1 ]
    [ 2 ] ,  [ 1 ] ,  [ a ]
    [ 2 ]    [ a ]    [ b ]

form a basis of R^3?
We will illustrate two methods of answering this question.
>A:=&lt;&lt;1,2,2&gt;|&lt;2,1,a&gt;|&lt;1,a,b&gt;&gt;:
>GaussianElimination(A);

    [ 1   2         1            ]
    [ 0  -3       a - 2          ]
    [ 0   0   b + 2/3 + (1/3)a^2 - 2a ]

In order for these vectors to be a basis of R^3 we need the entry in the third row, third column to be non-zero. We can state this condition as

    b &#8800; -(1/3)a^2 + 2a - 2/3

We could also do the following
>Determinant(A);
        -3b - a^2 + 6a - 2
For these vectors to be a basis of R^3 we want the determinant to be non-zero. This would give the same result as the first method.
Appendix B
Complex Numbers
Consider the equation x^2 + 1 = 0. If you try to solve this equation, the first step would be to isolate the x^2 term giving x^2 = -1. You would then take the square root and get x = sqrt(-1). Algebraically this would be the solution (or rather one of the solutions, the other being -sqrt(-1)). However there is no real number which satisfies this condition, since when you square a real number the result can never be negative. In the 16th century mathematicians introduced the symbol i to represent this algebraic solution, and referred to this solution as an imaginary number. In general, an imaginary number is any real multiple of i.
A complex number is a number that is the sum of a real number and an imaginary number. A complex number is usually represented as a + bi where a and b are real numbers. In this notation a is referred to as the real part, and b is referred to as the imaginary part of the complex number. There are special symbols that are commonly used to refer to the real and imaginary parts of a complex number. If z is a complex number then Re z indicates the real part of z and Im z indicates the imaginary part of z.
Complex numbers satisfy the usual rules of addition and multiplication. The one complication is that any occurrence of i^2 can be replaced by -1. Look at the following computations for example:

    (2 + 5i) + (7 - 2i) = 9 + 3i
    (2 + 5i)(7 - 2i) = 14 - 4i + 35i - 10i^2 = 14 + 31i - 10(-1) = 24 + 31i
    i^3 = i^2 * i = -1 * i = -i
Geometry of Complex Numbers
A correspondence can be set up between complex numbers and points in the plane. The real part gives the horizontal coordinate, and the imaginary part gives the vertical coordinate. So, for example, the complex number 3 + 2i would correspond to the point (3, 2). A purely real number would lie somewhere on the horizontal axis and a purely imaginary number would lie on the vertical axis. When plotting complex numbers in this way it is standard to call the horizontal axis the real axis and the vertical axis the imaginary axis. If we associate points in the plane with position vectors (that is, vectors whose starting point is the origin), then adding complex numbers is like adding the corresponding vectors. Multiplying a complex number by a real number is like multiplying the vector by a scalar.
Given a complex number z = a + bi, the complex conjugate of that number is z&#772; = a - bi. So the conjugate of a complex number is formed by changing the sign of the imaginary part. Geometrically, the conjugate of z is the mirror image of z through the real axis. Notice that z&#772; = z if and only if z is purely real. Two basic properties of the conjugate are: the conjugate of a sum is the sum of the conjugates, and the conjugate of a product is the product of the conjugates.
We will give a proof of the second of these properties. Let z_1 = a + bi and z_2 = c + di, then

    z_1 z_2 = (a + bi)(c + di) = ac + adi + bci + bdi^2 = ac - bd + (ad + bc)i
Figure B.1: The points a + bi and a - bi in the complex plane
and so we have

    z&#772;_1 z&#772;_2 = (a - bi)(c - di) = ac - adi - bci + bdi^2 = ac - bd - (ad + bc)i

which is exactly the conjugate of z_1 z_2 computed above.
The above result can be generalized to matrices and vectors with complex entries. For a complex matrix A and complex vector x we have:

    conjugate of Ax = A&#772; x&#772;

Or, more particularly, if Ax = lambda x then A&#772; x&#772; = lambda&#772; x&#772;. From this it follows that if A has only real entries then A x&#772; = lambda&#772; x&#772;. In other words, if A has only real entries and has complex eigenvalues then the eigenvalues and eigenvectors come in conjugate pairs: if lambda is an eigenvalue then so is lambda&#772;, and if x is an eigenvector (corresponding to lambda) then x&#772; is an eigenvector corresponding to lambda&#772;.
Another important property of the conjugate is that if z = a + bi then

    z z&#772; = (a + bi)(a - bi) = a^2 - abi + abi - b^2 i^2 = a^2 + b^2

which you should recognize as the distance of z from the origin squared (or the length of the corresponding vector squared). This distance is called the magnitude (or length, or absolute value) of the complex number and written |z| = sqrt(z z&#772;). This equation has an important consequence when dealing with complex vectors: recall that if v is a real vector then ||v||^2 = v^T v. But if v is a complex vector then ||v||^2 = v&#772;^T v.&#185;
For example, suppose v =

1
i

then
v
T
v =

1 i

1
i

= 1
2
+i
2
= 1 1 = 0
which would clearly be incorrect for the length. But

$$\overline{v}^T v = \begin{bmatrix} 1 & -i \end{bmatrix} \begin{bmatrix} 1 \\ i \end{bmatrix} = 1^2 - i^2 = 1 + 1 = 2$$

Taking the square root we then get the correct length, ‖v‖ = √2.
¹The conjugate of the transpose of a complex matrix A is usually written A*. So if v is a complex vector then
‖v‖² = v*v. This equation is also valid for real vectors since v* = v^T if all the entries are real.
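A quick Maple sketch of this computation, working entry by entry with the vector v above so that no extra packages are needed (output formatting may vary):
>v1 := 1:  v2 := I:                        # the entries of v
>v1^2 + v2^2;                              # v^T v, which is not the squared length
0
>conjugate(v1)*v1 + conjugate(v2)*v2;      # conj(v)^T v, the squared length
2
>sqrt(%);
2^(1/2)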
The conjugate also has some use with division of complex numbers. To rationalize the denominator of a complex
fraction means to eliminate any imaginary terms from the denominator. This can be done by multiplying the numerator
and denominator of the fraction by the conjugate of the denominator. For example:
(1 + i)/(2 + i) = (1 + i)(2 − i)/((2 + i)(2 − i)) = (3 + i)/5 = 3/5 + (1/5)i
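Maple carries out the same simplification; evalc forces a complex expression into the standard a + bi form. A minimal sketch, with the output shown approximately:
>evalc((1+I)/(2+I));
3/5 + 1/5 I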
Polar Representation of Complex Numbers
Any complex number (or, more generally, any point in the plane) can be characterized by the distance of the
point from the origin, r, and the angle measured from the positive x axis, θ. So, for example, the complex number
1 + i has r = √2 and θ = π/4. If we square this complex number we get (1 + i)² = 1 + 2i + i² = 2i. In this case the
value of r would be 2 and θ would be π/2.
In general, if a complex number lies at a distance r and an angle θ, the real coordinate would be given by r cos θ
and the imaginary coordinate would be r sin θ. So this complex number could be written as r cos θ + i r sin θ =
r(cos θ + i sin θ).
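The convert(...,polar) command, used again in the Using MAPLE section below, reports the same r and θ for this example; a quick sketch, with output shown approximately:
>convert(1+I, polar);
polar(2^(1/2), 1/4 Pi)
>convert((1+I)^2, polar);        # squaring: r becomes 2 and theta becomes Pi/2
polar(2, 1/2 Pi)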
There is another important notation for complex numbers that is related to the idea of power series. From calculus
you should recall that
e^x = 1 + x + x²/2 + x³/6 + x⁴/24 + ⋯
Substituting x = iθ into the above and simplifying we get

e^{iθ} = 1 + iθ + (iθ)²/2 + (iθ)³/6 + (iθ)⁴/24 + ⋯
       = 1 + iθ − θ²/2 − iθ³/6 + θ⁴/24 + ⋯
The real part of this last expression is 1 − θ²/2 + θ⁴/24 − ⋯, which is the power series for cos θ. The imaginary part
is θ − θ³/6 + θ⁵/120 − ⋯, which is the power series for sin θ. As a result we get what is called Euler's Formula:

e^{iθ} = cos θ + i sin θ
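Maple knows this identity: evalc rewrites a complex exponential in terms of cos and sin. A quick sketch (output formatting may vary):
>evalc(exp(I*theta));
cos(theta) + I sin(theta)
>exp(I*Pi);                      # the special case theta = Pi
-1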
As a result we have the fact that any complex number can be represented as re^{iθ}. The conjugate of this complex
number would be re^{−iθ}. The value of r is just the magnitude (absolute value) of the complex number, and the angle θ is
called the argument of the complex number.
This notation makes one important aspect of multiplication of complex numbers easy to see. Suppose we have
a complex number z₁ = r₁e^{iθ}. This point is located at a distance r₁ from the origin and at an angle θ from the
positive real axis. Now suppose we multiply this complex number by another complex number z₂ = r₂e^{iφ}. We get

z₁z₂ = r₁e^{iθ} r₂e^{iφ} = r₁r₂e^{i(θ+φ)}

What has happened to the original complex number? Its length has been scaled
by r₂ and the angle has been rotated to θ + φ. In other words, multiplication by a complex number can be seen as a
combination of a scaling and a rotation.
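For instance, multiplying 1 + i (length √2, angle π/4) by 2i (length 2, angle π/2) should double the length and rotate the point by 90°. A small Maple check of this assumed example (output shown approximately):
>z1 := 1 + I:   z2 := 2*I:
>evalc(z1*z2);
-2 + 2 I
>abs(z1*z2);                     # 2*sqrt(2): the lengths multiply
2 2^(1/2)
>argument(z1*z2);                # 3*Pi/4: the angles add
3/4 Pi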
Roots of Unity
Suppose you have the equation z³ = 1. One solution is clearly z = 1. This is the real cube root of 1, but there are
two other complex solutions. If we write z = re^{iθ}, then we want r³e^{i3θ} = 1 = e^{i2πN} for any integer N. This implies
that r = 1 and that 3θ = 2πN. We then have θ = 2πN/3, and this gives three different solutions: θ = 0, 2π/3, −2π/3.
(All the other values of θ would be coterminal with these angles.) If we plot these points in the complex plane along
with the unit circle we get the picture shown in Figure B.2.
In general, if we want the Nth roots of 1 we can start with w = e^{i2π/N}. Then w^N = (e^{i2π/N})^N = e^{i2π} = 1, so
w is an Nth root of 1. Then w^k is also an Nth root of 1. Thus 1, w, w², w³, . . . , w^{N−1} are the Nth roots. By earlier
remarks, these will be evenly spaced points on the unit circle.
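For the case N = 3, Maple's solve command returns exactly these three roots; a quick sketch, with output shown approximately:
>solve(z^3 = 1, z);
1, -1/2 + 1/2 I 3^(1/2), -1/2 - 1/2 I 3^(1/2)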
Figure B.2: The cube roots of 1.
Exercises
1. Let z₁ = 2 + i and z₂ = 1 + 2i. Find
(a) z₁z₂
(b) z₁z̄₁
(c) z₂²
2. Let z = 1 + √3 i.
(a) Write z in the form re^{iθ}.
(b) Write z̄ in the form re^{iθ}.
(c) Write z² in the form re^{iθ}.
(d) Write z⁶ in the form re^{iθ}.
3. Find all solutions of z³ = 1. Do this by rewriting the equation as z³ − 1 = 0. Then factor the left hand side:
(z − 1)(z² + z + 1) = 0. You should get 3 solutions. Give your solutions in both the standard form as a + bi
and in exponential form as re^{iθ}.
4. Find all four solutions to z⁴ = 1.
5. Start with the equation e^{iθ} = cos θ + i sin θ. Square both sides of this equation. Use this result to find
trigonometric identities for cos 2θ and sin 2θ.
6. Show that 1/z = z̄/|z|² for any complex number z ≠ 0.
7. (a) Find |e^i| and |i^e|.
(b) Plot the two points e^i and i^e in the complex plane.
Using MAPLE
We will use Maple to illustrate some of the aspects of complex numbers discussed in this section.
In Maple the symbol I is used to stand for √−1. In the following example we will begin by defining the complex
number z = 7.4 + 3.2i.
>z:=7.4+3.2*I:
>abs(z);
8.062257748
>Re(z);
7.4
>Im(z);
3.2
>conjugate(z);
7.4-3.2*I
>conjugate(z)*z;
65.00
>sqrt(conjugate(z)*z);
8.062257748
>convert(z,polar);
polar(8.062257748,.4081491038)
>8.062257748*exp(.4081491038*I);
7.399999999+3.200000000*I
>argument(z);
.4081491038
>convert(z^2,polar);
polar(65.00, .816298)
The command abs(z) computes |z|, the magnitude of z.
The commands Re and Im return the real and imaginary parts of a complex number.
The conjugate command returns z̄. Notice that the product conjugate(z)*z returns the square of abs(z).
The command convert(z,polar) returns the values of r and θ required to write z in the form re^{iθ}. The following
command computes this exponential form and returns the original z (with some rounding error). Notice that the values
returned by convert(z^2,polar) show that when z is squared the magnitude gets squared and the argument gets doubled.
The Maple command argument(z) will return just the argument of z.
Next we will use Maple to illustrate Euler's Formula.
>f:=exp(I*t);
>plot([Re(f), Im(f)],t=-9..9,linestyle=[1,2],thickness=2);
This gives Figure B.3.
You should understand where these plots came from. Since e^{it} = cos t + i sin t, plotting the real and imaginary parts
results in plots of a cosine and a sine function.
Compare the above with the following:
>w:=.3+.9*I;
>g:=exp(w*t);
>plot([Re(g), Im(g)],t=-9..9,linestyle=[1,2],thickness=2);
This gives Figure B.4.
To understand this result notice that we have

e^{(.3+.9i)t} = e^{.3t} e^{.9it} = e^{.3t}(cos(.9t) + i sin(.9t)) = e^{.3t} cos(.9t) + i e^{.3t} sin(.9t)
So plotting the real and imaginary parts returns a cosine and sine function but now they are being scaled by a function
which is increasing exponentially.
For one last example we will use Maple to plot the solutions to z²⁰ = 1 (that is, to plot the 20 twentieth roots of
1). The first command below uses Maple to compute the roots and place them in a list called sols. The second line uses
the complexplot procedure in Maple, which can be used to plot a list of complex numbers.
Figure B.3: The real and imaginary parts of e^{it}.
Figure B.4: The real and imaginary parts of e^{(.3+.9i)t}.
>sols:=[solve(z^20=1,z)];
>plots[complexplot](sols,style=point);
This gives Figure B.5.
Figure B.5: The solutions to z²⁰ = 1.
Appendix C
Linear Transformations
Let U and V be vector spaces and let T be a transformation (or function, or mapping) from U to V . That is,
T is a rule that associates each vector, u, in U with a unique vector, T(u), in V . The space U is called the domain
of the transformation and V is called the co-domain. The vector T(u) is called the image of the vector u under
transformation T.
Definition 23 A transformation is linear if:
1. T(u + v) = T(u) + T(v) for all u and v in the domain of T.
2. T(cu) = cT(u) for all u in the domain of T and all scalars c.
The combination of the two properties of a linear transformation implies that

T(c₁v₁ + c₂v₂ + ⋯ + cₙvₙ) = c₁T(v₁) + c₂T(v₂) + ⋯ + cₙT(vₙ)

for any set of vectors, vᵢ, and scalars, cᵢ.
We will just make a few observations about linear transformations.
Theorem C.1 If T is a linear transformation then T(0) = 0.
Proof Let T : U → V be a linear transformation and let u be any vector in U; then

T(0_U) = T(u − u) = T(u) − T(u) = 0_V

In the above 0_U stands for the zero vector in U and 0_V is the zero vector in V.
Theorem C.2 If T is a linear transformation from R^n to R^m then T(u) = Au for some m × n matrix A.
Proof Let $u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$ be any vector in R^n; then

$$T(u) = T(u_1 e_1 + u_2 e_2 + \cdots + u_n e_n) = u_1 T(e_1) + u_2 T(e_2) + \cdots + u_n T(e_n)
= \begin{bmatrix} T(e_1) & T(e_2) & \cdots & T(e_n) \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = Au$$
The matrix A in the above theorem is called the standard matrix of the linear transformation T. The
above proof in fact gives a method for finding the matrix A. The proof shows that the columns of A will be the images
of the standard basis under the transformation.
For example, suppose

$$A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix} \qquad\text{and}\qquad u = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}.$$

The linear transformation T(x) = Ax would be from R^3 to R^2. This is sometimes written T : R^3 → R^2. The image of u under this transformation would be

$$T(u) = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 10 \end{bmatrix}$$
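This matrix-vector product can be checked directly in Maple; the Matrix and Vector constructors and the . (dot) multiplication operator used here are assumed to be available, as they are in recent Maple versions. A sketch using the entries from the example above:
>A := Matrix([[1,1,2],[2,2,1]]):
>u := Vector([3,1,2]):
>A . u;                          # the image T(u), a vector in R^2
[ 8]
[10]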
So any linear transformation from R^n to R^m is equivalent to a matrix multiplication. What happens with other
vector spaces? There are many familiar operations which qualify as linear transformations. For example, in vector
spaces of differentiable functions the operation of finding a derivative is a linear transformation because

(f + g)′ = f′ + g′        (cf)′ = cf′

where f and g are functions and c is a scalar.
Or in the vector spaces of matrices, taking the transpose is a linear transformation because

(A + B)^T = A^T + B^T        (cA)^T = cA^T
When you take the determinant of a matrix the inputs are square matrices and the outputs are real numbers, so
computing a determinant is a transformation from the vector space of n × n matrices to R, but it is not linear since,
in general,

det(A + B) ≠ det(A) + det(B)        det(cA) ≠ c det(A)
It turns out that we can say something specific about linear transformations between finite-dimensional vector
spaces:
Suppose T is a linear transformation where the domain and co-domain are both finite dimensional vector spaces.
In this case if we represent each vector by coordinates in terms of some basis then the vector spaces will look like R^n
for some value of n (the dimension of the spaces).
For example, suppose we had T : P₃ → P₃ defined by T(p(x)) = p′(x). If we use a basis {1, x, x², x³} then the
polynomial c₀ + c₁x + c₂x² + c₃x³ would be represented by

$$\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}$$

and T(p) = c₁ + 2c₂x + 3c₃x² would be represented by

$$\begin{bmatrix} c_1 \\ 2c_2 \\ 3c_3 \\ 0 \end{bmatrix}$$

and this transformation would be equivalent to multiplying by the matrix

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
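A short Maple sketch of this correspondence multiplies the matrix above by a symbolic coordinate vector (the name Dmat is used because D is reserved in Maple; the Matrix and Vector constructors are assumed to be available as in recent Maple versions):
>Dmat := Matrix([[0,1,0,0],[0,0,2,0],[0,0,0,3],[0,0,0,0]]):
>p := Vector([c0, c1, c2, c3]):  # coordinates of c0 + c1*x + c2*x^2 + c3*x^3
>Dmat . p;                       # coordinates of the derivative c1 + 2*c2*x + 3*c3*x^2
[  c1]
[2 c2]
[3 c3]
[   0]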
It is also possible for one or both of the domain and co-domain to be infinite dimensional and in this case the
transformation is usually not represented by a matrix multiplication. But even here it is possible. Suppose for example
we had an infinite dimensional vector space where the transformation is just a shift in the coordinates, i.e.

$$T : \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \end{bmatrix} \mapsto \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \end{bmatrix}$$

This could be seen as multiplication by the matrix

$$\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$

In this case the matrix would have an infinite number of rows and columns.
Finally we point out why they are called linear transformations.
Theorem C.3 If T : U → V is a linear transformation and L is a straight line in U, then T(L) is either a straight
line in V or a single point in V.
Proof Any straight line L in U must have an equation of the form x = u₀ + tu₁. This is a line through u₀ in the
direction of u₁. If we apply T to this line we get:

T(L) = T(u₀ + tu₁) = T(u₀) + T(tu₁) = T(u₀) + tT(u₁)

This result can be seen as a line through T(u₀) in the direction of T(u₁). If T(u₁) = 0 then the transformation gives
just a single point.
You have to be careful in interpreting the above. For example, in the vector space of differentiable functions
the expression t sin x would correspond to a straight line through the origin. The points on this line would be
expressions such as 2 sin x, 3 sin x, 3.7 sin x. It is a straight line because it corresponds to all scalar multiples of a
vector. The usual plot of sin x as a waveform is totally irrelevant in this case. The origin in this case is not the
point (0, 0); the origin would be the zero function, f(x) = 0.
As pointed out earlier, taking the derivative of a function is a linear transformation. If we apply this linear transformation
to this line (by differentiating with respect to x) we get t cos x, which is another straight line.
Here's another example. The expression t sin x + (1 − t) cos x gives a straight line in the vector space of differentiable
functions. The points in this space are functions. When you plug in t = 0 you get cos x. When you plug in t = 1
you get sin x. So this is a straight line passing through the points cos x and sin x. This type of abstraction is one of
the basic features of higher mathematics. Here we have taken a simple, intuitive geometric idea from R² (the idea of
a line through two points) and extended it to an abstract space.
Appendix D
Partitioned Matrices
Suppose you have the 5 × 5 matrix

$$A = \begin{bmatrix} 1 & 1 & 4 & 3 & 2 \\ 6 & 3 & 1 & 7 & 8 \\ 9 & 0 & 1 & 2 & 2 \\ 8 & 7 & 6 & 5 & 8 \\ 1 & 1 & 3 & 4 & 2 \end{bmatrix}$$
This matrix can be partitioned, for example, as follows:

$$A = \left[\begin{array}{ccc|cc} 1 & 1 & 4 & 3 & 2 \\ 6 & 3 & 1 & 7 & 8 \\ 9 & 0 & 1 & 2 & 2 \\ \hline 8 & 7 & 6 & 5 & 8 \\ 1 & 1 & 3 & 4 & 2 \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
The entries in A can be divided into a group of submatrices. In this example A₁₁ is a 3 × 3 matrix, A₁₂ is a 3 × 2
matrix, A₂₁ is a 2 × 3 matrix, and A₂₂ is a 2 × 2 matrix. (This would not be the only way of partitioning A. Draw
any collection of horizontal and vertical lines through the matrix and you can create a partition.)
For another example let I₃ be the 3 × 3 identity matrix. The following are all ways of partitioning I₃:

$$\begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} \qquad \begin{bmatrix} e_1^T \\ e_2^T \\ e_3^T \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 \\ 0 & I_2 \end{bmatrix}$$
The important thing about partitioned matrices is that if the partitions have compatible sizes then the usual rules
for matrix addition and multiplication can be used with the partitions. For example we could write
$$A + B = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} + \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
= \begin{bmatrix} A_{11} + B_{11} & A_{12} + B_{12} \\ A_{21} + B_{21} & A_{22} + B_{22} \end{bmatrix}$$

if the various submatrices have compatible sizes for the additions to be defined (i.e., A₁₁ and B₁₁ must have the same
size, etc.).
Similarly we could write

$$AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
= \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}$$

provided that all the subsequent multiplications and additions are defined.
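As a numerical illustration of the block formula, the following Maple sketch builds a 4 × 4 product and checks that its top-left 2 × 2 block agrees with A₁₁B₁₁ + A₁₂B₂₁. The particular entries are just an assumed example, and a recent Maple with the Matrix constructor and the . operator is assumed.
>A := Matrix([[1,2,0,1],[3,4,1,0],[2,0,1,1],[0,2,1,1]]):
>B := Matrix([[1,0,2,0],[0,1,0,2],[1,1,0,0],[0,0,1,1]]):
>A11 := Matrix([[1,2],[3,4]]):  A12 := Matrix([[0,1],[1,0]]):   # top row of blocks of A
>B11 := Matrix([[1,0],[0,1]]):  B21 := Matrix([[1,1],[0,0]]):   # left column of blocks of B
>A . B;                          # the top-left 2 x 2 block of this product is [[1,2],[4,5]]
>A11 . B11 + A12 . B21;          # the block formula produces the same 2 x 2 matrix
[1  2]
[4  5]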
For example suppose A is an invertible n × n matrix, I is the n × n identity matrix, and O is the n × n zero matrix;
then

$$\begin{bmatrix} O & A \\ I & O \end{bmatrix} \begin{bmatrix} O & I \\ A^{-1} & O \end{bmatrix} = \begin{bmatrix} I & O \\ O & I \end{bmatrix}$$
Or suppose that matrix B is a 3 × 7 matrix. If you can find a pivot in each of the first 3 columns of B then the reduced
row echelon form of B would have the form $\begin{bmatrix} I & C \end{bmatrix}$ where I is the 3 × 3 identity matrix and C is a 3 × 4 matrix. Now
notice that

$$\begin{bmatrix} I & C \end{bmatrix} \begin{bmatrix} -C \\ I \end{bmatrix} = O$$
Ask yourself: what are the dimensions of the matrices in the above equation? The above equation also implies that
the columns of $\begin{bmatrix} -C \\ I \end{bmatrix}$ form a basis for Nul B. (Why?)
Two other familiar examples of multiplying partitioned matrices are when each row or column is a partition. For
example, if we have the matrix product AB and we let aᵢ^T be the rows of A and bᵢ be the columns of B then we can
write

$$AB = \begin{bmatrix} a_1^T \\ a_2^T \\ a_3^T \\ \vdots \end{bmatrix} \begin{bmatrix} b_1 & b_2 & b_3 & \cdots \end{bmatrix}
= \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & a_1^T b_3 & \cdots \\ a_2^T b_1 & a_2^T b_2 & a_2^T b_3 & \cdots \\ a_3^T b_1 & a_3^T b_2 & a_3^T b_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$

This is just the inner product form for matrix multiplication.
On the other hand, if we have the matrix product CD and we partition C into columns and D into rows we have

$$CD = \begin{bmatrix} c_1 & c_2 & c_3 & \cdots \end{bmatrix} \begin{bmatrix} d_1^T \\ d_2^T \\ d_3^T \\ \vdots \end{bmatrix}
= c_1 d_1^T + c_2 d_2^T + c_3 d_3^T + \cdots$$

This is the outer product form for matrix multiplication.
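The following Maple sketch verifies the outer product form on a small assumed example, writing the columns of C and the rows of D as 2 × 1 and 1 × 2 matrices so that the . operator produces the outer products (the name D1 is used because D is reserved in Maple):
>C := Matrix([[1,2],[3,4]]):   D1 := Matrix([[5,6],[7,8]]):
>c1 := Matrix([[1],[3]]):  c2 := Matrix([[2],[4]]):    # the columns of C
>d1 := Matrix([[5,6]]):    d2 := Matrix([[7,8]]):      # the rows of D1
>C . D1;
[19  22]
[43  50]
>c1 . d1 + c2 . d2;              # the sum of outer products gives the same matrix
[19  22]
[43  50]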
As a last example of using partitioned matrices we will give a proof that a symmetric matrix, A, is orthogonally
diagonalizable by some matrix P.
We will prove this by induction on the size of the matrix. If A is 1 × 1 then it is already diagonal and we can let
P = [1].
Now assume the statement is true for matrices of size (n − 1) × (n − 1). We have to show that it is true for n × n
matrices. We know that A has only real eigenvalues, so let λ₁ be some real eigenvalue of A with a corresponding
unit eigenvector v₁. We can find an orthonormal basis for R^n, v₁, v₂, . . . , vₙ (any such basis will do), and let
P = [v₁ v₂ . . . vₙ]. Now

$$P^T A P = \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix} \begin{bmatrix} Av_1 & Av_2 & \cdots & Av_n \end{bmatrix}
= \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix} \begin{bmatrix} \lambda_1 v_1 & Av_2 & \cdots & Av_n \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix}$$
where B is an (n − 1) × (n − 1) matrix. Furthermore, P^T AP is symmetric, so B must be symmetric. By the induction
hypothesis we now have

Q^T B Q = D

for some orthogonal matrix Q and diagonal matrix D.
Let $R = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$. We then have

$$R^T \begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix} R
= \begin{bmatrix} 1 & 0 \\ 0 & Q^T \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & Q^T B Q \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & D \end{bmatrix}$$
Finally, this means that

$$R^T P^T A P R = (PR)^T A (PR) = \begin{bmatrix} \lambda_1 & 0 \\ 0 & D \end{bmatrix}.$$

But PR is an orthogonal matrix since the product of two orthogonal matrices is orthogonal. Let's define S = PR. We then get that S^T AS is
diagonal and so A is orthogonally diagonalizable.