This theorem, in conjunction with Theorem (Quadratic form … about the eigenvalues of a Hermitian matrix) and the fact that the square Hermitian matrix A^T A is positive semidefinite, lets us conclude that the eigenvalues of A^T A are real and either zero or positive. The same is true for the eigenvalues of A A^T, which follows by taking B = A^T in Theorem 1.
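The eigenvalue facts above can be checked numerically. A small illustrative sketch (not part of the text; the matrix is an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

eig_ata = np.sort(np.linalg.eigvalsh(A.T @ A))   # 3 eigenvalues of A^T A
eig_aat = np.sort(np.linalg.eigvalsh(A @ A.T))   # 4 eigenvalues of A A^T

# Eigenvalues of A^T A are real and nonnegative.
assert np.all(eig_ata >= -1e-12)
# The nonzero eigenvalues of A^T A and A A^T coincide;
# A A^T has one extra zero eigenvalue here.
assert np.allclose(eig_aat[1:], eig_ata, atol=1e-10)
assert abs(eig_aat[0]) < 1e-10
```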
Proof. Recall that for any given m × n matrix A, rank(A) = rank(A^T A) = rank(A A^T), so there are exactly r nonzero eigenvalues of A^T A and of A A^T. Then A^T A g_i = λ_i g_i for i = 1, 2, …, r and A^T A g_i = 0 for i = r + 1, r + 2, …, n. Now, define m-vectors p_i by

    p_i = A g_i / √λ_i   for i = 1, 2, …, r.

Then

    A A^T p_i = A A^T A g_i / √λ_i = λ_i A g_i / √λ_i = λ_i p_i.    (10-1)

Furthermore

    p_i^T p_j = g_i^T A^T A g_j / (√λ_i √λ_j) = λ_j g_i^T g_j / (√λ_i √λ_j) = δ_ij.    (10-2)

Also, we can find f_i for i = r + 1, r + 2, …, m such that A A^T f_i = 0 and the f_i are orthonormal. Thus the eigenvectors f_i form an orthonormal basis for Vm and the eigenvectors g_i form an orthonormal basis for Vn.
Based on these facts, we can now prove the statements of the theorem.
(i) Since there are r eigenvalues λ_i, these must be the nonzero eigenvalues of A A^T. Hence (i) is proved.
(ii) Since for each i there is one normalized eigenvector, p_i can be taken equal to f_i, and (ii) is proved.
(iii) Since A^T A g_i = 0 for i = r + 1, r + 2, …, n, we have ||A g_i||_2^2 = g_i^T A^T A g_i = 0, so that A g_i = 0, and (iii) is proved.
(iv) Similarly, A A^T f_i = 0 for i = r + 1, r + 2, …, m; then ||A^T f_i||_2^2 = f_i^T A A^T f_i = 0, so A^T f_i = 0, and (iv) is proved.
(v) Finally, A^T f_i = A^T (A g_i / √λ_i) = A^T A g_i / √λ_i = λ_i g_i / √λ_i = √λ_i g_i, and (v) is proved.
Example. Verify Theorem 2 for A =
Theorem 3. Let A be an m × n matrix. Then, under conditions (i) through (v) of Theorem 2,

    A = Σ_{k=1}^r √λ_k f_k g_k^T.

Proof. As indicated by Theorem 2, A maps Vn into Vm. Also f_1, f_2, …, f_m and g_1, g_2, …, g_n form orthonormal bases for Vm and Vn respectively. So, for arbitrary v in Vn, we have

    v = Σ_{k=1}^n γ_k g_k,   where γ_k = g_k^T v.    (10-3)
We should note that Theorem 2 holds even when A is rectangular and has no spectral representation.
Based on the above analysis, we are now ready to discuss the Moore-Penrose generalized inverse in more detail; we need the following definition.
Example
Proof. Using the notation of Theorems 2 and 3, an arbitrary m-vector w and an arbitrary n-vector v can be written as

    w = Σ_{k=1}^m μ_k f_k   and   v = Σ_{k=1}^n γ_k g_k,    (10-5)

where μ_k = f_k^T w and γ_k = g_k^T v. Then using properties (1) and (2) of Theorem 2 gives

    Av − w = Σ_{k=1}^n γ_k A g_k − Σ_{k=1}^m μ_k f_k = Σ_{k=1}^r (√λ_k γ_k − μ_k) f_k − Σ_{k=r+1}^m μ_k f_k,    (10-6)

so that

    ||Av − w||_2^2 = Σ_{k=1}^r (√λ_k γ_k − μ_k)^2 + Σ_{k=r+1}^m μ_k^2.
The best we can do to minimize ||Av − w||_2^2 is to choose γ_k = μ_k/√λ_k for k = 1, 2, …, r, so that the vectors y in Vn that minimize ||Av − w||_2^2 can be expressed as

    y = Σ_{k=1}^r (μ_k/√λ_k) g_k + Σ_{k=r+1}^n γ_k g_k,

where γ_k for k = r + 1, …, n is arbitrary. Since

    ||y||_2^2 = Σ_{k=1}^r (μ_k/√λ_k)^2 + Σ_{k=r+1}^n γ_k^2,

the choice γ_k = 0 for k = r + 1, …, n gives the minimizing vector of smallest norm.
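The minimum-norm least-squares solution described above is exactly what the pseudoinverse produces. A small numpy sketch (illustrative data, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # 5x4, rank <= 3
w = rng.standard_normal(5)

y = np.linalg.pinv(A) @ w   # minimum-norm least-squares solution

# np.linalg.lstsq (SVD-based) also returns the minimum-norm solution,
# so the two computations should agree.
y2 = np.linalg.lstsq(A, w, rcond=None)[0]
assert np.allclose(y, y2, atol=1e-8)
```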
Theorem 5. Let A be any matrix such that Av = w is a consistent linear system. Then any solution y of this linear system can be expressed as

    y = A^+ w + (I − A^+ A)y.    (10-8)

Proof. Write an arbitrary solution as y = Σ_{k=1}^n ξ_k g_k, where ξ_k = g_k^T y. Also, from Definition 1 and Theorem 3, we have

    A^+ A = Σ_{i=1}^r Σ_{k=1}^r (√λ_i/√λ_k) g_k (f_k^T f_i) g_i^T = Σ_{i=1}^r Σ_{k=1}^r (√λ_i/√λ_k) δ_ki g_k g_i^T = Σ_{k=1}^r g_k g_k^T.    (10-9)

But since the g_k are orthonormal basis vectors for Vn, I = Σ_{k=1}^n g_k g_k^T. Hence

    (I − A^+ A)y = Σ_{k=r+1}^n g_k g_k^T y = Σ_{k=r+1}^n ξ_k g_k.    (10-10)

From Eq. (10-5), μ_k = f_k^T w, so that by substituting Eq. (10-10) into Eq. (10-8) and using Definition 1, we have the result.
Based on the above analysis, we are ready to formulate the defining conditions for the pseudoinverse of a matrix A in the following theorem.
(2) Represent A = Σ_{k=1}^r √λ_k f_k g_k^T and A^+ = Σ_{k=1}^r (1/√λ_k) g_k f_k^T. Then

    A A^+ A = (Σ_{i=1}^r √λ_i f_i g_i^T)(Σ_{j=1}^r (1/√λ_j) g_j f_j^T)(Σ_{k=1}^r √λ_k f_k g_k^T)
            = Σ_{i=1}^r Σ_{j=1}^r Σ_{k=1}^r (√λ_i √λ_k/√λ_j) f_i (g_i^T g_j)(f_j^T f_k) g_k^T.

Since g_i^T g_j = δ_ij and f_j^T f_k = δ_jk, the sum collapses to Σ_{k=1}^r √λ_k f_k g_k^T = A, which gives the result.
(3) The same procedure can be used here.
A generalized inverse exists for every matrix. If A has order m × n, then A^+ has order n × m and has the properties indicated by the following theorem.
Theorem 7. For each m × n matrix A, there exists a unique n × m matrix A^+ satisfying ten properties:
(1) A+ is unique
(2) A+ = A-1 for nonsingular A
(3) (A+)+ = A
(4) (kA)+ = (1/k)A+ for k ≠ 0
(5) (AH)+ = (A+)H
(6) 0+ = 0
(7) The rank of A^+ equals the rank of A.
(8) If P and Q are unitary matrices of appropriate orders so that the product PAQ is defined, then (PAQ)^+ = Q^H A^+ P^H.
(9) If A has order m × k, B has order k × n, and both matrices have rank k,
then
(AB)+ = B+A+.
(10) For square matrix A, AA+ = A+A if and only if A+ can be expressed as a polynomial in
A.
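Several of these properties are easy to spot-check numerically. A minimal sketch with a random real matrix (illustrative only, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
Ap = np.linalg.pinv(A)

# The defining conditions: AA+A = A, A+AA+ = A+, and AA+, A+A Hermitian.
assert np.allclose(A @ Ap @ A, A)
assert np.allclose(Ap @ A @ Ap, Ap)
assert np.allclose((A @ Ap).T, A @ Ap)
assert np.allclose((Ap @ A).T, Ap @ A)

# Spot-check some of the listed properties.
assert np.allclose(np.linalg.pinv(Ap), A)                     # (3) (A+)+ = A
assert np.allclose(np.linalg.pinv(2.5 * A), Ap / 2.5)         # (4) (kA)+ = (1/k)A+
assert np.linalg.matrix_rank(Ap) == np.linalg.matrix_rank(A)  # (7) equal ranks
```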
Proof.
(1) Assume that F and G are two generalized inverses of the same matrix A; we must show that F = G. Since F and G are assumed to satisfy conditions (1) through (3) of Theorem 6, FA, AF, GA, and AG are all Hermitian and

    AFA = A,    (10-11a)
    FAF = F,    (10-11b)
    AGA = A,    (10-11c)
    GAG = G.    (10-11d)

Multiplying both sides of Eq. (10-11a) on the right by G, we obtain AFAG = AG, from which (using the Hermitian property of AF and AG) we infer that AG = AF. Multiplying both sides of Eq. (10-11a) on the left by G, we obtain GAFA = GA, from which we similarly infer that GA = FA. Then

    G = GAG = (GA)G = (FA)G = F(AG) = F(AF) = FAF = F.
(2) For nonsingular A, the inverse A^-1 satisfies the conditions of Theorem 6:

    (AA^-1)^H = I^H = I = AA^-1,
    AA^-1A = A(A^-1A) = AI = A,

and

    A^-1AA^-1 = (A^-1A)A^-1 = IA^-1 = A^-1.

The result then follows from property (1) of this theorem.
(3) With respect to A and A^+, conditions (1) through (3) are symmetric, so that if A^+ is the generalized inverse of A, then A is also the generalized inverse of A^+, i.e., A = (A^+)^+.
(5) To prove this property we have to show that (A^+)^H satisfies conditions (1) through (3) of Theorem 6.
Take D = A^H. Checking condition (1) and the remaining conditions in turn shows that D^+ = (A^+)^H satisfies all the conditions for a generalized inverse of A^H. Since the generalized inverse is unique, it follows that (A^H)^+ = D^+ = (A^+)^H.
(8) Let F = PAQ. We must show that F^+ = Q^H A^+ P^H satisfies conditions (1) through (3), given that A and A^+ do. Condition (1) of Theorem 6 is verified directly, and the remaining conditions follow in the same way.
Theorem 8. If A can be factored into the product BC, where both B^H B and CC^H are invertible, then

    A^+ = C^H (CC^H)^-1 (B^H B)^-1 B^H.    (10-12)
Proof. To prove this theorem, we have to show that A^+ as given by Eq. (10-12) satisfies the three conditions of Theorem 6; each condition is verified by direct substitution. The resulting product is unique, although the factors B and C themselves are not unique.
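The full-rank factorization formula of Theorem 8 can be checked against a library pseudoinverse. A small sketch with assumed random factors B and C (real matrices, so ^H reduces to ^T):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 2))   # full column rank
C = rng.standard_normal((2, 4))   # full row rank
A = B @ C                          # 5x4 matrix of rank 2

# A+ = C^H (C C^H)^-1 (B^H B)^-1 B^H, Eq. (10-12).
Ap = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
assert np.allclose(Ap, np.linalg.pinv(A), atol=1e-8)
```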
The procedure for producing the generalized inverse of any matrix A is stated in Algorithm 1 as follows.
Algorithm 1
Step 5. A^+ = QC^H(CC^H)^-1(B^H B)^-1B^H P    (5)
When the columns of A form a linearly independent set of vectors, the equation in Step 5 reduces to
    A^+ = (A^H A)^-1 A^H.    (6)
Equations (5) and (6) are formulas for calculating generalized inverses, but they are not stable when roundoff error enters the calculation, because small errors in the elements of a matrix A can result in large errors in the computed elements of A^+. In such situations a better algorithm is needed.
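Formula (6) can be illustrated, along with the squaring of the condition number that makes it fragile. A hedged sketch on assumed random data:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))             # full column rank

Ap_formula = np.linalg.inv(A.T @ A) @ A.T   # Eq. (6)
assert np.allclose(Ap_formula, np.linalg.pinv(A), atol=1e-8)

# Forming A^T A squares the condition number: cond(A^T A) = cond(A)^2.
# This is the source of the instability mentioned above.
c = np.linalg.cond(A)
assert np.isclose(np.linalg.cond(A.T @ A), c**2, rtol=1e-6)
```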
For any matrix A, not necessarily square, the product A^H A is Hermitian (hence normal) and has nonnegative eigenvalues. The positive square roots of these eigenvalues are the singular values of A. Moreover, there exist unitary matrices U and V such that

    A = U [D 0; 0 0] V^H,    (7)

where D is a diagonal matrix having as its main diagonal all the positive singular values of A. The block diagonal matrix

    Σ = [D 0; 0 0]    (8)
Theorem B. Let A be a matrix of order m × n with m ≥ n. Then A can be factored as A = UΣV^H, where Σ is an n × n matrix and U is an m × n matrix with orthonormal columns.

    A^+ = V_1 D^-1 U_1^H    (10)

where V_1 and U_1 are defined by Step 3 and Step 4 respectively. For the purpose of calculating the generalized inverse, Steps 5 and 6 can be ignored.
Proof. P is Hermitian and similar to Σ, which has nonnegative eigenvalues. Since the columns of U are orthonormal, U^H U = I, and V is unitary.
Theorem D. Let A be any m × n matrix with m ≥ n. Then A can be factored as A = QP, where Q has orthonormal columns and P is positive semidefinite. Such a factorization is called a polar decomposition of A.
In order to read this chapter, you will need a comprehensive understanding of Chapter 8. In Section 10.2 we will study a variant of the QR algorithm, for which you will need to have read most of the material in Sections 4.5 to 4.8 and Section 5.3 as well. This chapter is largely independent of Chapter 6.
Throughout the chapter we will restrict our attention to real matrices. This is done solely to simplify the exposition; the generalization to the complex setting is routine.
Let A ∈ R^{n×n} be symmetric. Then by Corollary 4.4.14 there exists an orthonormal basis v_1, …, v_n of R^n consisting of eigenvectors of A. Each v_i satisfies Av_i = λ_i v_i, where λ_i is the (real) eigenvalue associated with v_i. These relationships can be expressed by the following diagram,

    A: v_k → λ_k v_k,   k = 1, …, n,    (10.1.1)

which portrays the action of A as a linear transformation mapping R^n into R^n. The diagram describes the action of A completely, since A is completely determined by its action on a basis of R^n. Eq. (10.1.1) is equivalent to the statement (Theorem 4.4.13) that there exist an orthogonal matrix V and a diagonal matrix D such that A = VDV^T. The columns of V are v_1, v_2, …, v_n, and the main-diagonal entries of D are λ_1, λ_2, …, λ_n.
It is reasonable to ask to what extent (10.1.1) can be generalized to nonsymmetric matrices. If A is nonsymmetric but normal, (10.1.1) continues to be true, except that some of λ_1, …, λ_n and v_1, …, v_n are complex. If A is not normal but simple, we must give up the orthogonality of the basis. If A is not simple, (10.1.1) ceases to hold. One well-known generalization is the Jordan canonical form; see, for example, Lancaster and Tismenetsky (1985). If A is not square, say A ∈ R^{n×m} with n ≠ m, then even the Jordan canonical form does not exist. In this section we will develop an extension of (10.1.1), valid for all A ∈ R^{n×m}, called the singular value decomposition (SVD).
Only a moment's thought reveals a significant change that will have to be made in (10.1.1) if we wish to extend it to nonsquare matrices. Every A ∈ R^{n×m} can be viewed as a linear transformation A: R^m → R^n, mapping R^m into R^n. The domain consists of m-tuples, while the range consists of n-tuples. Thus our extension of (10.1.1) will have to have different sets of vectors on the left and right.
For the rest of this section A will denote a matrix in R^{n×m}. Recall from Section 3.5 that A has two important spaces associated with it, the null space and the range, given by

    N(A) = { x ∈ R^m | Ax = 0 },
    R(A) = { Ax | x ∈ R^m }.

The null space is a subspace of R^m, and the range is a subspace of R^n. Recall that the range is also called the column space of A (Exercise 3.5.7), and its dimension is called the rank of A, denoted rank(A).
COROLLARY 10.1.6 A^T A and A A^T have the same nonzero eigenvalues, counting multiplicity.
The matrices A^T A and A A^T are both symmetric and hence simple. Thus Corollary 10.1.6 and Exercise 10.1.4 yield a second proof that they have the same rank, which equals the number of nonzero eigenvalues. Since A^T A and A A^T generally have different dimensions, they cannot have exactly the same eigenvalues. The difference is made up by the zero eigenvalue of appropriate multiplicity. If rank(A^T A) = rank(A A^T) = r and r < m, then A^T A has zero as an eigenvalue of multiplicity m − r.
THEOREM 10.1.7 (SVD Theorem) Let A ∈ R^{n×m} have rank r. Then there exist real numbers σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0, an orthonormal basis v_1, …, v_m of R^m, and an orthonormal basis u_1, …, u_n of R^n, such that

    A v_i = σ_i u_i,  i = 1, …, r        A^T u_i = σ_i v_i,  i = 1, …, r
    A v_i = 0,        i = r + 1, …, m    A^T u_i = 0,        i = r + 1, …, n
    (10.1.8)

Equations (10.1.8) imply that v_1, …, v_m are eigenvectors of A^T A, u_1, …, u_n are eigenvectors of A A^T, and σ_1^2, …, σ_r^2 are the nonzero eigenvalues of A^T A and A A^T.
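The relations (10.1.8) can be checked directly against a library SVD. A small illustrative sketch (the matrix is an assumed random example):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A)   # full SVD: U is 5x5, Vt is 3x3
r = len(s)                     # a random 5x3 matrix has full rank r = 3

for i in range(r):
    vi, ui = Vt[i], U[:, i]
    assert np.allclose(A @ vi, s[i] * ui)       # A v_i = sigma_i u_i
    assert np.allclose(A.T @ ui, s[i] * vi)     # A^T u_i = sigma_i v_i
for i in range(r, 5):
    assert np.allclose(A.T @ U[:, i], 0, atol=1e-10)   # A^T u_i = 0, i > r
```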
Proof. You can easily verify that the assertions in the final sentence are true. This determines how v_1, …, v_m must be chosen. Let v_1, …, v_m be an orthonormal basis of R^m consisting of eigenvectors of A^T A, and let λ_1, …, λ_m be the associated eigenvalues. Since A^T A is positive semidefinite, all of its eigenvalues are nonnegative. Assume v_1, …, v_m are ordered so that λ_1 ≥ λ_2 ≥ … ≥ λ_m. Since r = rank(A) = rank(A^T A), it must be that λ_r > 0 and λ_{r+1} = λ_{r+2} = … = λ_m = 0. For i = 1, …, r, define σ_i and u_i by

    σ_i = ||A v_i||_2   and   u_i = (1/σ_i) A v_i.

These definitions imply that A v_i = σ_i u_i and ||u_i||_2 = 1, i = 1, …, r. The result of Exercise 10.1.3 implies that u_1, …, u_r are orthogonal and hence orthonormal. It is easy to show that σ_i^2 = λ_i, i = 1, …, r. Indeed σ_i^2 = ||A v_i||_2^2 = (A v_i, A v_i) = (A^T A v_i, v_i) = (λ_i v_i, v_i) = λ_i. It now follows easily that A^T u_i = σ_i v_i, i = 1, …, r, for A^T u_i = (1/σ_i) A^T A v_i = (λ_i/σ_i) v_i = σ_i v_i.
The proof is now complete, except that we have not defined u_{r+1}, …, u_n, assuming r < n. By Theorem 10.1.5 the vectors u_1, …, u_r are eigenvectors of A A^T associated with nonzero eigenvalues. Since A A^T ∈ R^{n×n} and rank(A A^T) = r, A A^T must have a null space of dimension n − r. Let u_{r+1}, …, u_n be any orthonormal basis of N(A A^T). Noting that u_{r+1}, …, u_n are eigenvectors of A A^T associated with the eigenvalue zero, we see that u_{r+1}, …, u_n are orthogonal to u_1, …, u_r. Thus u_1, …, u_n is an orthonormal basis of R^n consisting of eigenvectors of A A^T. Since N(A A^T) = N(A^T), we have A^T u_i = 0 for i = r + 1, …, n. This completes the proof.
The numbers σ_1, …, σ_r are called the singular values of A. Let k = min{n, m}. If r < k, it is usual to adjoin k − r zero singular values σ_{r+1} = … = σ_k = 0. The vectors v_1, v_2, …, v_m are called right singular vectors of A, and u_1, u_2, …, u_n are called left singular vectors of A. Singular vectors are not uniquely determined; they are no more uniquely determined than any eigenvectors of length 1. Any singular vector can be replaced by its opposite, and if A^T A or A A^T happens to have an eigenspace of dimension 2 or more, an even greater loss of uniqueness occurs.
A^T has the same singular values as A. The right (left) singular vectors of A^T are the left (right) singular vectors of A.
Theorem 10.1.7 allows us to draw, for any A ∈ R^{n×m}, a diagram in the spirit of (10.1.1). An analogous diagram holds for A^T. Drawing the two diagrams side by side, we have

    A: v_i → σ_i u_i (i = 1, …, r),    v_i → 0 (i = r + 1, …, m);
    A^T: u_i → σ_i v_i (i = 1, …, r),  u_i → 0 (i = r + 1, …, n),
    (10.1.9)

which serves as a pictorial representation of the SVD Theorem.
The singular value decomposition displays orthonormal bases of the four fundamental spaces R(A), N(A), R(A^T), and N(A^T). It is clear from (10.1.9) that

    R(A^T) = span{v_1, …, v_r}       R(A) = span{u_1, …, u_r}
    N(A) = span{v_{r+1}, …, v_m}     N(A^T) = span{u_{r+1}, …, u_n}.

From these representations we see that R(A^T)^⊥ = N(A) and R(A)^⊥ = N(A^T); we proved these equalities in Theorem 3.5.3 by other means.
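These four bases can be read directly off a computed SVD. A small sketch on an assumed rank-2 example:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # 5x4, rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
assert r == 2

range_A  = U[:, :r]     # orthonormal basis of R(A)
null_At  = U[:, r:]     # orthonormal basis of N(A^T)
range_At = Vt[:r].T     # orthonormal basis of R(A^T)
null_A   = Vt[r:].T     # orthonormal basis of N(A)

assert np.allclose(A @ null_A, 0, atol=1e-10)           # A v_i = 0, i > r
assert np.allclose(A.T @ null_At, 0, atol=1e-10)        # A^T u_i = 0, i > r
assert np.allclose(range_A.T @ null_At, 0, atol=1e-10)  # R(A) perp N(A^T)
```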
The singular value decomposition is usually expressed as a matrix decomposition, as follows:

THEOREM 10.1.10 (SVD Theorem) Let A ∈ R^{n×m} have rank r. Then there exist U ∈ R^{n×n}, Σ ∈ R^{n×m}, and V ∈ R^{m×m}, such that U and V are orthogonal, Σ has the form

    Σ = [diag(σ_1, σ_2, …, σ_r)  0]
        [0                       0],   σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0,    (10.1.11)

and

    A = UΣV^T.

Proof. Let v_1, …, v_m and u_1, …, u_n be right and left singular vectors, and let σ_1, …, σ_r be the nonzero singular values of A. Let V = [v_1, …, v_m] ∈ R^{m×m} and U = [u_1, …, u_n] ∈ R^{n×n}. Then U and V are orthogonal. The equations

    A v_i = σ_i u_i,  i = 1, …, r;    A v_i = 0,  i = r + 1, …, m

can be combined into the single matrix equation

    A[v_1, …, v_r | v_{r+1}, …, v_m] = [u_1, …, u_r | u_{r+1}, …, u_n] [diag(σ_1, …, σ_r)  0]
                                                                       [0                  0].

That is, AV = UΣ. Since VV^T = I, we see immediately that A = UΣV^T.
In the product A = UΣV^T, the last n − r columns of U and m − r columns of V are superfluous because they interact only with blocks of zeros in Σ. This observation leads to the following variant of the SVD Theorem.
THEOREM 10.1.12 Let A ∈ R^{n×m} have rank r. Then there exist Û ∈ R^{n×r}, Σ̂ ∈ R^{r×r}, and V̂ ∈ R^{m×r} such that Û and V̂ are isometries (cf. Section 3.4), Σ̂ is a diagonal matrix with main-diagonal entries σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0, and

    A = ÛΣ̂V̂^T.
THEOREM 10.1.13 Let A ∈ R^{n×m} have rank r. Let σ_1, …, σ_r be the nonzero singular values of A, with associated right and left singular vectors v_1, …, v_r and u_1, …, u_r, respectively. Then

    A = Σ_{j=1}^r σ_j u_j v_j^T.

Proof. Let B = Σ_{j=1}^r σ_j u_j v_j^T. It suffices to show that A v_i = B v_i, i = 1, …, m, since v_1, …, v_m is a basis of R^m. If i ≤ r, we have A v_i = σ_i u_i and B v_i = Σ_{j=1}^r σ_j u_j (v_j^T v_i). Since v_1, …, v_m is an orthonormal set, v_j^T v_i = 0 unless j = i, in which case v_i^T v_i = 1. Thus all terms in the sum are zero except the ith term, and B v_i = σ_i u_i. If i > r, then A v_i = 0 and

    B v_i = Σ_{j=1}^r σ_j u_j (v_j^T v_i) = Σ_{j=1}^r σ_j u_j · 0 = 0.
Show that a simple relationship exists between the singular vectors of A and the eigenvectors of M. Show how to build an orthogonal basis of R^{n+m} consisting of eigenvectors of M, given the singular vectors of A.
It is clear that we can calculate the singular value decomposition of any matrix A by calculating the eigenvalues and eigenvectors of A^T A and A A^T. This approach is illustrated in the next example and the exercises that follow. In the next section we will discuss a different approach, in which the SVD is computed without forming A^T A or A A^T explicitly.
Example 10.1.14 Find the singular values and right and left singular vectors of the matrix

    A = [1 2 0]
        [2 0 2].

Since A^T A is 3 by 3 and A A^T is 2 by 2, it seems reasonable to work with the latter. We have

    A A^T = [5 2]
            [2 8].
The characteristic polynomial is (λ − 5)(λ − 8) − 4 = λ^2 − 13λ + 36 = (λ − 9)(λ − 4), so the eigenvalues of A A^T are λ_1 = 9 and λ_2 = 4. The singular values of A are therefore

    σ_1 = 3   and   σ_2 = 2.

The left singular vectors of A are eigenvectors of A A^T. Solving (9I − A A^T)u = 0, we find that every multiple of [1 2]^T is an eigenvector of A A^T associated with the eigenvalue λ_1 = 9. Then solving (4I − A A^T)u = 0, we find that the other eigenspace of A A^T consists of multiples of [2 −1]^T.
Since we want representatives with unit Euclidean norm, we take

    u_1 = (1/√5)[1 2]^T   and   u_2 = (1/√5)[2 −1]^T.

(What other choice for u_1 and u_2 could we have made?) These are the left singular vectors of A. Notice that they are orthogonal, as they must be. We can find the right singular vectors v_1, v_2, and v_3 by calculating the eigenvectors of A^T A. However, v_1 and v_2 are more easily found by the formula

    v_i = (1/σ_i) A^T u_i,   i = 1, 2,

which is a trivial variation of one of the equations in (10.1.8). Thus

    v_1 = (1/(3√5))[5 2 4]^T   and   v_2 = (1/√5)[0 2 −1]^T.
Notice that these vectors are orthonormal. The vector v_3 must satisfy A v_3 = 0. Solving the equation Av = 0 and normalizing the solution, we get

    v_3 = (1/3)[2 −1 −2]^T.

We could have found v_3 without reference to A by applying the Gram-Schmidt process, for example, to find a vector orthogonal to both v_1 and v_2. Normalizing that vector, we would get v_3.
Now that we have the singular values and singular vectors of A, we can easily construct the matrices U, Σ, and V of Theorem 10.1.10. We have

    U = [u_1 u_2] = (1/√5)[1  2]
                          [2 −1],

    Σ = [σ_1 0   0] = [3 0 0]
        [0   σ_2 0]   [0 2 0],

    V = [v_1 v_2 v_3] = (1/(3√5))[5  0   2√5]
                                 [2  6  −√5 ]
                                 [4 −3  −2√5].

You can easily check that A = UΣV^T. In so doing you will notice that v_3 plays no role in the computation. This is an instance of the remark made just prior to Theorem 10.1.12. It is an easy exercise for you to write down matrices Û, Σ̂, and V̂ satisfying the hypotheses of Theorem 10.1.12.
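The hand computation in Example 10.1.14 can be confirmed with a few lines of numpy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 0.0, 2.0]])

# Singular values found in the example: sigma_1 = 3, sigma_2 = 2.
s = np.linalg.svd(A, compute_uv=False)
assert np.allclose(s, [3.0, 2.0])

# The singular vectors found by hand satisfy A v_1 = sigma_1 u_1.
u1 = np.array([1, 2]) / np.sqrt(5)
v1 = np.array([5, 2, 4]) / (3 * np.sqrt(5))
assert np.allclose(A @ v1, 3 * u1)
```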
Diagram (10.1.9) gives a clear, complete picture of the action of A and of A^T. It is therefore reasonable to expect that the singular value decomposition will be very useful. This is indeed the case, and we will examine some applications in subsequent sections. Because the SVD employs orthogonal matrices (orthonormal bases), we can expect it to be not only an important theoretical device, but also a powerful computational tool. If this expectation is to be realized, we must have accurate, efficient means of calculating singular values and singular vectors.
The option of forming A^T A (or A A^T) and calculating its eigenvalues and (possibly) eigenvectors should not be overlooked. This approach has the advantage of being relatively inexpensive; we studied several good algorithms for calculating eigenvalues and eigenvectors of symmetric matrices in Chapters 4, 5, and 6. The disadvantage of this approach is that the smaller singular values will be calculated inaccurately. This is a consequence of the "loss of information through squaring" phenomenon (see Example 3.5.11), which occurs when we compute A^T A from A.
We can get some idea why this information loss occurs by considering an example. Suppose the entries of the matrix A are known to be correct to about six decimal places. If A has, say, σ_1 ≈ 1 and σ_17 ≈ 10^-3, then σ_17 is fairly small compared with 1, but it is still well above the error level 10^-5 or 10^-6. We ought to be able to calculate σ_17 with some precision, perhaps to two or three decimal places. The entries of A^T A also have about six-digit accuracy. Associated with the singular values σ_1 and σ_17, A^T A has the eigenvalues λ_1 = σ_1^2 ≈ 1 and λ_17 = σ_17^2 ≈ 10^-6. Notice that λ_17 is of about the same magnitude as the errors in the entries of A^T A, so we cannot expect to compute it with any precision.
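The same squaring effect shows up at machine precision. A hedged sketch (assumed example matrix, not from the text): a singular value near 10^-9 is recovered accurately by a direct SVD, but its square, 10^-18, is below the rounding errors committed in forming A^T A, so the route through eigenvalues loses it.

```python
import numpy as np

rng = np.random.default_rng(8)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Z, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([1.0, 1.0, 1.0, 1e-9]) @ Z.T     # sigma_min = 1e-9

s_direct = np.linalg.svd(A, compute_uv=False)[-1]
s_squared = np.sqrt(np.abs(np.linalg.eigvalsh(A.T @ A)[0]))

assert abs(s_direct - 1e-9) < 1e-12   # direct SVD keeps good accuracy
# Going through A^T A is far less accurate for the small singular value.
assert abs(s_squared - 1e-9) > abs(s_direct - 1e-9)
```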
In Chapter 4 we found that the eigenvalue problem can be made much easier if we first reduce the matrix to a simpler form, such as tridiagonal or Hessenberg form. The same turns out to be true of the singular value decomposition. The eigenvalue problem requires that the reduction be done via similarity transformations. For the singular value decomposition A = UΣV^T, it is clear that similarity transformations are not necessary, but the transforming matrices should be orthogonal. We will see that we can reduce any matrix to bidiagonal form by applying reflectors on both left and right. The algorithms that we are about to discuss work well for dense matrices. We will not cover the sparse case.
We continue to assume that we are dealing with a matrix A ∈ R^{n×m}, but we will now make the additional assumption that n ≥ m. This does not imply any loss of generality, for if n < m, we can operate on A^T instead of A. If the SVD of A^T is A^T = UΣV^T, then the SVD of A is A = VΣ^T U^T.
A matrix B ∈ R^{n×m} (n ≥ m) is said to be bidiagonal if b_ij = 0 whenever i > j or i < j − 1. This means that the nonzero entries of B are confined to the main diagonal and the superdiagonal:

    B = [* *          ]
        [  * *        ]
        [    *  .     ]
        [       .  *  ]
        [          *  ]
        [             ].
THEOREM 10.2.1 Let A ∈ R^{n×m} with n ≥ m. Then there exist orthogonal Û ∈ R^{n×n} and V̂ ∈ R^{m×m}, both products of a finite number of reflectors, and a bidiagonal B ∈ R^{n×m}, such that

    A = ÛBV̂^T.
Proof. Let Û_1 ∈ R^{n×n} be a reflector such that

    Û_1 [a_11, a_21, …, a_n1]^T = [*, 0, …, 0]^T.

Then the first column of Û_1A consists of zeros, except for the (1,1) entry. Now let [â_11, â_12, …, â_1m] denote the first row of Û_1A, and let V̂_1 ∈ R^{m×m} be a reflector of the form

    V̂_1 = [1  0 ]
          [0  Ṽ_1]

such that

    [â_12, …, â_1m] Ṽ_1 = [*, 0, …, 0].

Then the first row of Û_1AV̂_1 consists of zeros, except for the first two entries. Because the first column of V̂_1 is e_1, the first column of Û_1A is unaltered by right multiplication by V̂_1. Thus Û_1AV̂_1 has the form

    Û_1AV̂_1 = [*  *  0 … 0]
              [0          ]
              [⋮     Ã    ]
              [0          ].

The second step of the construction is identical to the first, except that it acts on the submatrix Ã. It is easy to show that the reflectors used on the second step do not destroy the zeros created on the first step.
After two steps we have

    Û_2Û_1AV̂_1V̂_2 = [*  *  0  0 … 0]
                    [0  *  *  0 … 0]
                    [0  0          ]
                    [⋮  ⋮     Ã̃    ]
                    [0  0          ].

The third step acts on the submatrix Ã̃, and so on. After m steps we have

    Û_m … Û_2Û_1AV̂_1V̂_2 … V̂_{m−2} = B,

where B is upper bidiagonal. Notice that steps m − 1 and m require multiplications on the left only. Let Û = Û_1Û_2 … Û_m and V̂ = V̂_1V̂_2 … V̂_{m−2}. Then Û^TAV̂ = B; that is, A = ÛBV̂^T.
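The reflector-based reduction just described can be sketched in a few dozen lines. This is an illustrative implementation under the stated assumptions (dense real A, n ≥ m), not tuned for efficiency; `householder` and `bidiagonalize` are hypothetical helper names:

```python
import numpy as np

def householder(x):
    """Unit vector v such that (I - 2 v v^T) x = -sign(x_1) ||x|| e_1."""
    v = x.astype(float).copy()
    s = np.sign(x[0]) if x[0] != 0 else 1.0
    v[0] += s * np.linalg.norm(x)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def bidiagonalize(A):
    """Reduce A (n x m, n >= m) to upper bidiagonal B with A = U B V^T,
    U and V orthogonal, by left and right reflectors."""
    n, m = A.shape
    B = A.astype(float).copy()
    U, V = np.eye(n), np.eye(m)
    for k in range(m):
        # Left reflector: zero out column k below the diagonal.
        v = householder(B[k:, k])
        B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
        U[:, k:] -= 2.0 * np.outer(U[:, k:] @ v, v)
        # Right reflector: zero out row k beyond the superdiagonal.
        if k < m - 2:
            w = householder(B[k, k+1:])
            B[:, k+1:] -= 2.0 * np.outer(B[:, k+1:] @ w, w)
            V[:, k+1:] -= 2.0 * np.outer(V[:, k+1:] @ w, w)
    return U, B, V

rng = np.random.default_rng(9)
A = rng.standard_normal((6, 4))
U, B, V = bidiagonalize(A)
assert np.allclose(U @ B @ V.T, A)
assert np.allclose(U.T @ U, np.eye(6)) and np.allclose(V.T @ V, np.eye(4))
# B is upper bidiagonal: zero below the diagonal and above the superdiagonal.
assert np.allclose(np.tril(B, -1), 0, atol=1e-12)
assert np.allclose(np.triu(B, 2), 0, atol=1e-12)
```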
Setting

    B = [B̃] ∈ R^{n×m}
        [0 ]

and V = Ṽ ∈ R^{m×m}, we have A = ÛBV̂^T.
The advantage of this arrangement is that the right multiplications are applied to the small matrix R̂ instead of the large matrix A. They therefore cost a lot less. The disadvantage is that the right multiplications destroy the upper-triangular form of R. Thus most of the left multiplications must be repeated on the small matrix R̂. If the ratio n/m is sufficiently large, the added cost of the extra left multiplications will be more than offset by the savings in the right multiplications.
It is tempting to look for a revised algorithm that exploits the special structure of R̂ rather than destroying it. Such a procedure, using plane rotators, was mentioned by Chan (1982), but as he pointed out, nothing is saved unless fast, scaled rotators are used. Using fast rotators, one can devise an algorithm whose asymptotic flop count is less than that of the original procedure for all ratios n/m > 1. The catch is that fast rotators have a large overhead, and they will not prove cost effective unless m is fairly large.
As we have already noted, the various applications of the SVD have different requirements. Some require only the singular values, while others require the right or the left singular vectors, or both. If any of the singular vectors are needed, then the matrices Û and/or V̂ have to be computed explicitly. Usually it is possible to avoid calculating Û, but there are numerous applications for which V̂ is needed. The flop counts in Exercises 10.2.2 and 10.2.6 do not include the cost of computing Û or V̂.
As a result of Exercise 10.2.9, we see that we can always assume, without loss of generality, that B is a properly bidiagonal matrix. A form of the implicit QR algorithm can be used to find the SVD of any such matrix. We will describe the algorithm first and then justify it.
Suppose B ∈ R^{m×m} is a properly bidiagonal matrix, with main-diagonal entries α_1, …, α_m and superdiagonal entries β_1, …, β_{m−1}. Then both BB^T and B^TB are properly tridiagonal matrices, so we could find their eigenvalues inexpensively by the QR algorithm. The algorithm that we are about to develop is equivalent to the QR algorithm applied to both BB^T and B^TB, but it is carried out without ever forming these matrices explicitly. We begin a QR step by choosing a shift. The lower right-hand 2-by-2 submatrix of BB^T is

    [α_{m−1}^2 + β_{m−1}^2   α_m β_{m−1}]
    [α_m β_{m−1}             α_m^2      ].

Calculate the eigenvalues of this submatrix and take the shift σ to be the eigenvalue that is closer to α_m^2. This is the Wilkinson shift on BB^T. It is a good choice because it guarantees convergence, and the convergence is rapid in practice. We could have chosen the shift from B^TB instead of BB^T. We chose the latter because its lower right-hand 2-by-2 submatrix has a slightly simpler form.
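The shift computation above is a simple closed-form 2-by-2 eigenvalue problem. A sketch under the notation assumed here (alpha for the main diagonal of B, beta for the superdiagonal; `wilkinson_shift` is a hypothetical helper name):

```python
import numpy as np

def wilkinson_shift(alpha, beta):
    """Eigenvalue of the trailing 2x2 block of B B^T closer to alpha[-1]**2."""
    a = alpha[-2]**2 + beta[-1]**2     # (m-1, m-1) entry of B B^T
    b = alpha[-1] * beta[-1]           # (m-1, m) entry
    c = alpha[-1]**2                   # (m, m) entry
    # Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]].
    d = (a - c) / 2.0
    root = np.hypot(d, b)
    lam1, lam2 = (a + c) / 2.0 + root, (a + c) / 2.0 - root
    return lam1 if abs(lam1 - c) < abs(lam2 - c) else lam2

alpha = np.array([2.0, 1.5, 1.0])
beta = np.array([0.5, 0.3])
mu = wilkinson_shift(alpha, beta)

# The shift is an exact eigenvalue of the trailing 2x2 submatrix of B B^T.
M = np.array([[alpha[1]**2 + beta[1]**2, alpha[2] * beta[1]],
              [alpha[2] * beta[1], alpha[2]**2]])
assert any(np.isclose(np.linalg.eigvalsh(M), mu))
```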
A QR step on B^TB with shift σ would perform the similarity transformation B^TB → Q^TB^TBQ, where Q is the orthogonal factor from the QR decomposition

    B^TB − σI = QR.    (10.2.3)

Since we plan to take an implicit step, all we need is the first column of Q. Because R is upper triangular, the first column of Q is proportional to the first column of B^TB − σI, which is

    [α_1^2 − σ, α_1β_1, 0, …, 0]^T.    (10.2.4)

Let V_12 be a rotator (or reflector) in the (1,2) plane whose first column is proportional to (10.2.4). Multiply B by V_12 on the right. The operation B → BV_12 alters only the first two columns of B, and as you can easily check, it creates a new nonzero entry (a "bulge") in the (2,1) position. (Draw the picture!)
Now find a rotator U_12 in the (1,2) plane such that U_12^T BV_12 has a zero in the (2,1) position. This operation acts on rows 1 and 2 and creates a new bulge in the (1,3) position. Let V_23 be a rotator acting on columns 2 and 3 such that U_12^T BV_12V_23 has a zero in the (1,3) position. This creates a bulge in the (3,2) position. Applying additional rotators U_23^T, V_34, U_34^T, …, we chase the bulge through positions (2,4), (4,3), (3,5), (5,4), …, (m, m − 1), and finally off of the matrix completely. The result is a bidiagonal matrix

    B̂ = U_{m−1,m}^T … U_23^T U_12^T B V_12V_23 … V_{m−1,m}    (10.2.5)

that is nearly proper.
Letting

    U = U_12U_23 … U_{m−1,m}   and   V = V_12V_23 … V_{m−1,m},    (10.2.6)

we can rewrite (10.2.5) as

    B̂ = U^TBV.    (10.2.7)

In addition we have B̂B̂^T = U^T BB^T U and B̂^TB̂ = V^T B^TB V. As we shall see, B̂B̂^T and B̂^TB̂ are essentially the same matrices as we would have obtained by taking one shifted QR step with shift σ, starting with BB^T and B^TB, respectively. If we set B ← B̂ and perform repeated QR steps, both BB^T and B^TB will tend to diagonal form. The main-diagonal entries will converge to the eigenvalues. If the Wilkinson shift is used, the (m, m − 1) and (m, m) entries of both BB^T and B^TB will converge very rapidly, the former to zero and the latter to an eigenvalue. Of course we do not deal with BB^T or B^TB directly; we deal with B. The rapid convergence of BB^T and B^TB translates into convergence of β_{m−1} to zero and of α_m to a singular value of B.
Once β_{m−1} becomes negligible, it can be considered to be zero, and the problem can be deflated. Performing shifted QR steps on the remaining (m − 1)-by-(m − 1) submatrix, we can force β_{m−2} quickly to zero, exposing another singular value in the (m − 1) position. Continuing in this manner, we soon find all the singular values of B.
During the whole procedure, all the β_k tend slowly toward zero. If at any point one of them becomes negligible, the problem should be reduced to two smaller subproblems.
If only singular values are needed, there is no need to keep a record of the many rotators used during the QR steps. However, if the right (or left) singular vectors are desired, we must keep track of the rotators V_{i,i+1} (or U_{i,i+1}). Let us suppose we wish to compute the right singular vectors of A ∈ R^{n×m} and we have already calculated a decomposition

    A = Û_1BV̂^T,    (10.2.8)

where B ∈ R^{m×m} is bidiagonal, Û_1 ∈ R^{n×m} has orthonormal columns, and V̂ ∈ R^{m×m} is orthogonal. Needing the right singular vectors, we have calculated V̂ explicitly and saved it. As we perform the QR steps on B, we need to take into account each rotator V_{ij} that multiplies B on the right. This can be done by making the update V̂ ← V̂V_{ij} along with the update B ← BV_{ij}. Since (BV_{ij})(V̂V_{ij})^T = BV̂^T, we see that this update preserves the overall product in (10.2.8). Of course this procedure should also be followed for the right rotators used in the reduction procedure described in Exercise 10.2.9. Once B has been reduced to diagonal form, the singular values lie on the main diagonal of B, and the right singular vectors of A are the columns of V̂. The singular values do not necessarily appear in descending order. If left singular vectors are needed, then Û_1 must be calculated explicitly and saved. Then for each rotator U_{ij}^T that is applied to B on the left, the update Û_1 ← Û_1U_{ij} should be made along with the update B ← U_{ij}^TB. In the end the m columns of Û_1 are the left singular vectors of A.
The updates of B are inexpensive because B is very sparse. By contrast, the updates of the full matrices V̂ and Û_1 are relatively expensive. While the cost of an entire QR step without updating Û_1 or V̂ is O(m) flops, the additional costs of updating V̂ and Û_1 are O(m^2) and O(nm) flops per QR step, respectively. It follows that if the right or left singular vectors are needed, the QR steps become much more expensive. The added cost can usually be decreased by employing the ultimate shift strategy suggested in Section 4.8.
10.3
SOME BASIC APPLICATIONS OF SINGULAR VALUES
In view of (10.1.9), it should not be surprising that ||A||_2 equals the maximum singular value of A. Writing x = c_1v_1 + … + c_mv_m, we have Ax = σ_1c_1u_1 + … + σ_rc_ru_r, where r is the rank of A. Since u_1, …, u_r are also orthonormal, ||Ax||_2^2 = |σ_1c_1|^2 + … + |σ_rc_r|^2. Thus ||Ax||_2^2 ≤ σ_1^2(|c_1|^2 + … + |c_r|^2) ≤ σ_1^2 ||x||_2^2; that is, ||Ax||_2/||x||_2 ≤ σ_1. This completes the proof.
Since A and A^T have the same singular values, we have the following corollary.
COROLLARY 10.3.2 ||A||_2 = ||A^T||_2.
Now suppose A is square, say A ∈ R^{n×n}, and nonsingular. The spectral condition number of A is defined by

    κ_2(A) = ||A||_2 ||A^-1||_2.

Let us see how κ_2(A) can be expressed in terms of the singular values of A. Since A has rank n, it has n strictly positive singular values, and its action is described completely by the following diagram:

    A: v_i → σ_i u_i,   i = 1, …, n;       A^-1: u_i → σ_i^-1 v_i,   i = 1, …, n.

In terms of matrices we have A = UΣV^T and A^-1 = VΣ^-1U^-1 = VΣ^-1U^T. Either way we see that the singular values of A^-1, in descending order, are σ_n^-1 ≥ σ_{n−1}^-1 ≥ … ≥ σ_1^-1 > 0. Applying Theorem 10.3.1 to A^-1 we conclude that ||A^-1||_2 = σ_n^-1. These observations imply the following theorem.

THEOREM 10.3.3 Let A ∈ R^{n×n} be a nonsingular matrix with singular values σ_1 ≥ σ_2 ≥ … ≥ σ_n > 0. Then

    κ_2(A) = σ_1/σ_n.
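Theorem 10.3.3 and the norm identities behind it are easy to confirm numerically. A minimal sketch on an assumed random nonsingular matrix:

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 5))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

assert np.isclose(np.linalg.norm(A, 2), s[0])                    # ||A||_2 = sigma_1
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1/s[-1])  # ||A^-1||_2 = 1/sigma_n
assert np.isclose(np.linalg.cond(A, 2), s[0] / s[-1])            # k_2 = sigma_1/sigma_n
```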
Another expression for the condition number, given in Chapter 2, is

    κ_2(A) = maxmag(A)/minmag(A),

where

    maxmag(A) = max_{x≠0} ||Ax||_2/||x||_2,
    minmag(A) = min_{x≠0} ||Ax||_2/||x||_2.

This gives a slightly different view of the condition number. From Theorem 10.3.1 we know that maxmag(A) = σ_1. It must therefore be true that minmag(A) = σ_n.
In Chapter 3 we observed that the equation
k2(A) = maxmag(A)/minmag(A)   (10.3.4)
can be used to extend the definition of k2 to certain nonsquare matrices. Specifically, if A
∈ R^{n×m}, n ≥ m, and rank(A) = m, then minmag(A) > 0, and we can take (10.3.4) as the
definition of the condition number of A. If A is nonzero but does not have full rank, then
(still assuming n ≥ m) minmag(A) = 0, and it is reasonable to define k2(A) = ∞. With this
convention the following theorem holds, regardless of whether or not A has full rank.
THEOREM 10.3.5 Let A ∈ R^{n×m}, n ≥ m, be a nonzero matrix with singular values σ1 ≥
σ2 ≥ … ≥ σm ≥ 0. Then maxmag(A) = σ1, minmag(A) = σm, and k2(A) = σ1/σm.
The proof is left as an easy exercise for you.
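A quick numerical illustration of Theorem 10.3.5 (a NumPy sketch; the full-rank test matrix is my own choice, not from the text): the extreme magnifications are attained at the extreme right singular vectors, and no other direction does better.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))                  # n = 6 >= m = 4, full rank

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
v1, vm = Vt[0], Vt[-1]                           # extreme right singular vectors

# maxmag(A) = sigma_1 is attained at v1; minmag(A) = sigma_m is attained at vm.
assert np.isclose(np.linalg.norm(A @ v1), sigma[0])
assert np.isclose(np.linalg.norm(A @ vm), sigma[-1])

# Every other direction gives a magnification between sigma_m and sigma_1.
for _ in range(100):
    x = rng.standard_normal(4)
    mag = np.linalg.norm(A @ x) / np.linalg.norm(x)
    assert sigma[-1] - 1e-12 <= mag <= sigma[0] + 1e-12
```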
In the absence of roundoff errors and uncertainties in the data, the singular value
decomposition reveals the rank of a matrix. Unfortunately the presence of errors and
uncertainties makes the question of rank meaningless. As we shall see, a small perturbation
in a matrix that is not of full rank can and typically will increase the rank.
The nonnegative number ‖A − Â‖₂ is a measure of the distance between the
matrices A and Â. Exercise 10.3.8 shows that every rank-deficient matrix has full-rank
matrices arbitrarily close to it; this suggests that matrices of full rank are abundant. This
impression is strengthened by the next theorem and its corollary.
THEOREM 10.3.6 Let A ∈ R^{n×m} with rank(A) = r > 0. Let A = UΣVᵀ be the singular
value decomposition of A. For k = 1, …, r − 1 define Ak = UΣkVᵀ, where Σk ∈ R^{n×m} is
defined by
Σk = diag(σ1, σ2, …, σk, 0, …, 0) ∈ R^{n×m}
(We assume as usual that σ1 ≥ σ2 ≥ … ≥ σr.) Then rank(Ak) = k, and
σk+1 = ‖A − Ak‖₂ = min{ ‖A − B‖₂ : rank(B) = k }
That is, of all the matrices of rank k, Ak is closest to A.
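Theorem 10.3.6 can be exercised directly with a truncated SVD. A NumPy sketch on a random matrix of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# A_k = U Sigma_k V^T keeps the k largest singular triplets and zeroes the rest.
Ak = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

assert np.linalg.matrix_rank(Ak) == k
# ||A - A_k||_2 = sigma_{k+1}  (sigma[k] with 0-based indexing).
assert np.isclose(np.linalg.norm(A - Ak, 2), sigma[k])
```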
Proof It is obvious that rank(Ak) = k. Since A − Ak = U(Σ − Σk)Vᵀ, it is clear that the
largest singular value of A − Ak is σk+1. Therefore ‖A − Ak‖₂ = σk+1. It remains to be shown
only that for any other matrix B of rank k, ‖A − B‖₂ ≥ σk+1.
Given such a B, note first that N(B) has dimension m − k, for dim(N(B)) =
dim(R^m) − dim(R(B)) = m − rank(B) = m − k. Also, the space span{v1, …, vk+1} has dimension
k + 1. (As usual, v1, …, vm denote the columns of V.) Since N(B) and span{v1, …, vk+1} are two
subspaces of R^m, the sum of whose dimensions exceeds m, they must have a nontrivial
intersection. Let x̂ be a nonzero vector in N(B) ∩ span{v1, …, vk+1}. We can and will assume
that ‖x̂‖₂ = 1. Since x̂ ∈ span{v1, …, vk+1}, there exist scalars c1, …, ck+1 such that x̂ =
c1v1 + … + ck+1vk+1. Because v1, …, vk+1 are orthonormal, |c1|² + … + |ck+1|² = ‖x̂‖₂² = 1. Since
x̂ ∈ N(B), Bx̂ = 0. Thus
(A − B)x̂ = Ax̂ = c1Av1 + … + ck+1Avk+1 = σ1c1u1 + … + σk+1ck+1uk+1
Since u1, …, uk+1 are also orthonormal,
‖(A − B)x̂‖₂² = |σ1c1|² + … + |σk+1ck+1|² ≥ σk+1²(|c1|² + … + |ck+1|²) = σk+1²
Therefore
‖A − B‖₂ ≥ ‖(A − B)x̂‖₂/‖x̂‖₂ ≥ σk+1
This completes the proof.
COROLLARY 10.3.7 Suppose A ∈ R^{n×m} has full rank. Thus rank(A) = r = min{n,m}. Let
σ1 ≥ σ2 ≥ … ≥ σr > 0 be the singular values of A. Let B ∈ R^{n×m} satisfy ‖A − B‖₂ < σr. Then
B also has full rank.
This result is an immediate consequence of Theorem 10.3.6. From Corollary
10.3.7 we see that if A has full rank, then all matrices sufficiently close to A also have full
rank. From Exercise 10.3.8 we know that every rank-deficient matrix has full-rank matrices
arbitrarily close to it. By Corollary 10.3.7, each of these full-rank matrices is surrounded by
other matrices of full rank. In topological language, the set of matrices of full rank is an
open dense subset of R^{n×m}. Its complement, the set of rank-deficient matrices, is therefore
closed and nowhere dense. This discussion is meant to convince you that almost all
matrices have full rank.
If a matrix does not have full rank, any small perturbation is almost certain to
transform it to a matrix that does have full rank. It follows that in the presence of uncertainty
in the data, it is impossible to calculate the rank of a matrix or even detect that it is rank
deficient. (This is a generalization of the assertion, made in Chapters 1 and 2, that it is
impossible to detect whether a square matrix is singular.) Nevertheless in certain applications
it is reasonable to call a matrix numerically rank deficient if it is close to a rank-deficient
matrix.
Let ε be some positive number that represents the magnitude of the data
uncertainties in the matrix A. If there exist matrices B of rank k such that ‖A − B‖₂ < ε
and, on the other hand, for every matrix C of rank k − 1 we have ‖A − C‖₂ >> ε, then we
will say that the numerical rank of A is k. From Theorem 10.3.6 we know that this condition
is satisfied if and only if the singular values of A satisfy
σ1 ≥ σ2 ≥ … ≥ σk >> ε > σk+1 ≥ …
Thus the numerical rank can be determined by examining the singular values. A matrix that
has k “large” singular values, the others being “tiny”, has numerical rank k. However, if
the set of singular values has no convenient gap, it may be impossible to assign a
meaningful numerical rank to the matrix.
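In code, determining the numerical rank amounts to counting singular values above the uncertainty level ε. The sketch below (NumPy; the nearly rank-2 test matrix is my own illustration) contrasts it with the exact rank, which a tiny perturbation has raised to full:

```python
import numpy as np

def numerical_rank(A, eps):
    """Number of singular values standing clearly above the uncertainty level eps."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(sigma > eps))

rng = np.random.default_rng(3)
# A rank-2 matrix plus a perturbation of size ~1e-6.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
A = A + 1e-6 * rng.standard_normal((5, 4))

exact = np.linalg.matrix_rank(A)        # 4: the perturbation restored full rank
numer = numerical_rank(A, eps=1e-3)     # 2: only two singular values exceed eps
assert (exact, numer) == (4, 2)
```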
We conclude this section by considering the implications of Theorem 10.3.6 for square,
nonsingular matrices. Let A ∈ R^{n×n} be nonsingular and let As denote the singular matrix
that is closest to A, in the sense that ‖A − As‖₂ is as small as possible. In Theorem 2.3.17 we showed
that
‖A − As‖₂/‖A‖₂ ≥ 1/k2(A)
for any induced matrix norm, and we mentioned that for the 2-norm, equality holds.
We now have the tools to prove this.
COROLLARY 10.3.10 Let A ∈ R^{n×n} be nonsingular. (Thus A has singular values σ1 ≥
σ2 ≥ … ≥ σn > 0.) Let As be the singular matrix that is closest to A, in the sense that ‖A − As‖₂
is minimal. Then ‖A − As‖₂ = σn and ‖A − As‖₂/‖A‖₂ = 1/k2(A).
These results are immediate consequences of Theorems 10.3.1, 10.3.3, and 10.3.6.
In words, the distance from A to the nearest singular matrix is equal to the smallest singular
value of A, and the “relative distance” to the nearest singular matrix is equal to the reciprocal
of the condition number.
Σ = [Σ̂ 0; 0 0] ∈ R^{n×m},  Σ̂ = diag(σ1, σ2, …, σr)
with σ1 ≥ σ2 ≥ … ≥ σr > 0. Because U is orthogonal, ‖b − Ax‖₂ = ‖Uᵀ(b − Ax)‖₂ =
‖Uᵀb − Σ(Vᵀx)‖₂. Letting c = Uᵀb and y = Vᵀx, we have
‖b − Ax‖₂² = ‖c − Σy‖₂² = Σ_{i=1}^{r} |ci − σiyi|² + Σ_{i=r+1}^{n} |ci|²   (10.4.2)
It is clear that this expression is minimized when and only when
yi = ci/σi,  i = 1, …, r
Notice that when r < m, yr+1, …, ym do not appear in (10.4.2). Thus they have no effect on
the residual and can be chosen arbitrarily. Among all the solutions so obtained, ‖x‖₂ is clearly
minimized when and only when yr+1 = … = ym = 0. Since x = Vy and V is orthogonal, ‖x‖₂ =
‖y‖₂. Thus ‖x‖₂ is minimized when and only when ‖y‖₂ is. This proves that the least
squares problem has exactly one minimum norm solution.
It is useful to repeat the development, using partitioned matrices. Let
c = [ĉ; d]  and  y = [ŷ; z]
where ĉ, ŷ ∈ R^r. Then (10.4.2) can be rewritten as
‖b − Ax‖₂² = ‖ [ĉ; d] − [Σ̂ 0; 0 0][ŷ; z] ‖₂² = ‖ [ĉ − Σ̂ŷ; d] ‖₂² = ‖ĉ − Σ̂ŷ‖₂² + ‖d‖₂²   (10.4.3)
This is minimized when and only when ŷ = Σ̂⁻¹ĉ; that is, yi = ci/σi, i = 1, …, r. We can choose
z arbitrarily, but we get the minimum norm solution by taking z = 0. The norm of the
minimal residual is ‖d‖₂. This solves the problem completely in principle. We summarize
the procedure:
1. Calculate c = Uᵀb = [ĉ; d].
2. Let ŷ = Σ̂⁻¹ĉ.
3. Let y = [ŷ; z] ∈ R^m, where z can be chosen arbitrarily. To get the minimum norm solution,
take z = 0.
(10.4.4)
4. Let x = Vy.
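Procedure (10.4.4) translates almost line for line into code. A NumPy sketch (function name and test data are my own) that compares the result against np.linalg.lstsq:

```python
import numpy as np

def min_norm_lstsq(A, b, eps=1e-12):
    """Minimum norm least-squares solution via the SVD, following (10.4.4)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(sigma > eps))     # numerical rank: "tiny" sigmas count as zero
    c_hat = U[:, :r].T @ b           # step 1 (only the first r columns of U needed)
    y_hat = c_hat / sigma[:r]        # step 2: y_hat = Sigma_hat^{-1} c_hat
    return Vt[:r, :].T @ y_hat       # steps 3-4 with z = 0, x = V y

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)

x = min_norm_lstsq(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ref)
```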
Practical Considerations
In practice we do not know the exact rank of A. It is best to use the numerical rank,
discussed in Section 10.3, instead. All “tiny” singular values should be set to zero.
We have solved the least-squares problem under the assumption that we have the
whole matrices U and V at hand. However, you can easily check that the calculation of ĉ
uses only the first r columns of U, where, in practice, r is the numerical rank. If only the
minimum norm solution is wanted, only the first r columns of V are used. While the
numerical rank is usually not known in advance, it can never exceed min{n,m}, so at
most min{n,m} columns of U and V are needed.
If n >> m, the computation of U can be expensive, even if we compute only the
first m columns. In fact the computation of U can be avoided completely. U is the product of
many reflectors and rotators that are generated during the reduction to bidiagonal form and
the subsequent QR steps. Since U is needed only so that we can calculate c = Uᵀb, we can
simply update b instead of assembling U. As each rotator or reflector Ui is generated, we
make the update b ← Uiᵀb. In the end b will have been transformed into c. In the process
we get not only ĉ, but also d, so we can compute the norm of the residual ‖d‖₂. If several least-
squares problems with the same A but different right-hand sides b(1), b(2), … are to be
solved, the updates must be applied to all of the b(i) at once, since the Ui will not be saved.
No matter how the calculations are organized, the SVD is an expensive way to
solve the least-squares problem. Its principal advantage is that it gives a completely reliable
means of determining the numerical rank for rank-deficient least-squares problems.
The Pseudoinverse
A⁺: u1 → (1/σ1)v1, u2 → (1/σ2)v2, …, ur → (1/σr)vr, ur+1 → 0, …, un → 0
We see immediately that rank(A⁺) = rank(A), that u1, …, un and v1, …, vm are right and left singular
vectors of A⁺, respectively, and that σ1⁻¹, …, σr⁻¹ are the nonzero singular values. The restricted
operators A: span{v1, …, vr} → span{u1, …, ur} and A⁺: span{u1, …, ur} → span{v1, …, vr} are true inverses of
one another.
What does A⁺ look like as a matrix? You can answer this question in the simplest
case by working the following exercise.
To see what A⁺ looks like in general, note that the equations
A⁺ui = (1/σi)vi, i = 1, …, r,  and  A⁺ui = 0, i = r + 1, …, n,
can be expressed as a single matrix equation
A⁺[u1, u2, …, ur | ur+1, …, un] = [v1, v2, …, vr | vr+1, …, vm] [diag(σ1⁻¹, σ2⁻¹, …, σr⁻¹) 0; 0 0]
or A⁺U = VΣ⁺. Thus
A⁺ = VΣ⁺Uᵀ   (10.4.5)
This is the SVD of A⁺ in matrix form, and it gives us a means of calculating A⁺ by
computing the SVD of A. However, there is seldom any reason to compute the
pseudoinverse; it is mainly a theoretical tool. In this respect the pseudoinverse plays a
role much like that of the ordinary inverse.
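When (10.4.5) is needed in code, it is a one-liner on top of the SVD. A NumPy sketch (the tolerance for deciding which singular values count as nonzero is my own illustrative choice):

```python
import numpy as np

def pinv_via_svd(A, eps=1e-12):
    """A+ = V Sigma+ U^T, inverting only the nonzero singular values (10.4.5)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    sigma_plus = np.array([1.0 / s if s > eps else 0.0 for s in sigma])
    return Vt.T @ np.diag(sigma_plus) @ U.T

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```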
It is easy to make the claimed connection between the pseudoinverse and the least-
squares problem.
The pseudoinverse is used in the study of the sensitivity of the rank-deficient least-
squares problem. See [SLS] or Stewart (1977).
Thus ‖u − v‖₂ = sin θ.
LEMMA 10.5.5 Let u, v ∈ R^n with ‖u‖₂ = ‖v‖₂ = 1. Let θ be the angle between u and v.
Then
‖u − v cos θ‖₂ = sin θ
and v cos θ is the multiple of v that is closest to u.
Proof Figure 10.2 shows that this lemma is almost obvious. Let us first show that v cos θ is the
multiple of v that is closest to u. Applying the projection theorem (Theorem 3.5.6) with S
= span{v}, we find that the multiple cv that is closest to u is characterized by (u − cv, v) = 0.
Solving this equation for c, we find that c = (u,v)/(v,v) = (u,v) = cos θ. Now applying
Lemma 10.5.4 with v replaced by v cos θ, we have ‖u − v cos θ‖₂ = sin θ.
The result of the next exercise will be used in the proof of the theorem that
follows.
THEOREM 10.5.6 The first principal angle θ1 and principal vectors u1 ∈ S and v1 ∈ T
satisfy
sin θ1 = ‖u1 − v1 cos θ1‖₂ = min_{v∈T} ‖u1 − v‖₂ = min_{u∈S, ‖u‖₂=1} min_{v∈T} ‖u − v‖₂
Proof The equation sin θ1 = ‖u1 − v1 cos θ1‖₂ is a consequence of Lemma 10.5.5. As for the
other two equations, it is obvious that
‖u1 − v1 cos θ1‖₂ ≥ min_{v∈T} ‖u1 − v‖₂ ≥ min_{u∈S, ‖u‖₂=1} min_{v∈T} ‖u − v‖₂
so it suffices to show that for every u ∈ S with ‖u‖₂ = 1 and every v ∈ T, ‖u − v‖₂ ≥
‖u1 − v1 cos θ1‖₂. Given such a u and v, let v̂ denote the best approximation to u from
T. Then certainly ‖u − v̂‖₂ ≤ ‖u − v‖₂. Let θ denote the angle between u and v̂. (If v̂ = 0, we
set θ = π/2.) By Exercise 10.5.3, θ1 ≤ θ ≤ π/2. Since (u − v̂, v̂) = 0 by Theorem 3.5.6, we have
‖u − v̂‖₂ = sin θ from Lemma 10.5.4. Since θ1 ≤ θ and sin is an increasing function on [0,
π/2], ‖u1 − v1 cos θ1‖₂ = sin θ1 ≤ sin θ = ‖u − v̂‖₂ ≤ ‖u − v‖₂. This proves the first
string of equations. The second string is proved by reversing the roles of S and T.
Theorem 10.5.6 is easily generalized to yield statements about the other principal
angles and vectors. For this purpose it is convenient to introduce some new notation. For i =
1, …, k let
S_i = S ∩ span{u1, …, ui−1}⊥ = span{ui, …, uk}
T_i = T ∩ span{v1, …, vi−1}⊥ = span{vi, …, vk}
Roughly speaking, S_i and T_i are just S and T with the first i − 1 principal vectors
removed.
In applications the subspaces S and T are usually provided in the form of a basis for each
subspace. If the bases are not orthonormal, they can be orthonormalized by one of the
techniques from Chapter 3. Let us therefore assume that we have orthonormal bases
p1, p2, …, pk and q1, q2, …, qk of S and T, respectively. Let P1 = [p1, p2, …, pk] ∈ R^{n×k} and Q1
= [q1, q2, …, qk] ∈ R^{n×k}. We wish to determine the principal angles and vectors between the
spaces. This is equivalent to determining the matrices U1, V1, and T1 defined in Corollary
10.5.9. Since u1, …, uk and p1, …, pk are both bases of the same space, R(U1) = R(P1) = S.
Similarly R(V1) = R(Q1) = T. The matrices P1, Q1, U1, and V1 are all isometries (cf. Section
3.4) because they have orthonormal columns.
By Exercise 10.5.6, there exist orthogonal matrices M1, N1 ∈ R^{k×k} such that
U1 = P1M1 and V1 = Q1N1
If we can figure out how to calculate M1 and N1, we can use them to determine U1 and
V1. Recalling from Corollary 10.5.9 that U1ᵀV1 = T1, we have
P1ᵀQ1 = M1U1ᵀV1N1ᵀ = M1T1N1ᵀ
Since M1 and N1 are orthogonal and T1 is diagonal, M1T1N1ᵀ is the SVD of P1ᵀQ1. This
gives us a means of calculating the principal angles and vectors:
1. Calculate P1ᵀQ1.
2. Calculate the SVD P1ᵀQ1 = M1T1N1ᵀ. Let t1 ≥ t2 ≥ … ≥ tk denote the singular values.
(10.5.10)
3. θi = arccos ti, i = 1, …, k (principal angles).
4. U1 = P1M1 and V1 = Q1N1 (principal vectors).
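The four steps above can be sketched in NumPy as follows (the clipping guard and the example subspaces are additions of mine, not part of the text):

```python
import numpy as np

def principal_angles(P1, Q1):
    """Principal angles and vectors between R(P1) and R(Q1), following (10.5.10).

    P1 and Q1 must have orthonormal columns."""
    M1, t, N1t = np.linalg.svd(P1.T @ Q1)   # steps 1-2
    t = np.clip(t, -1.0, 1.0)               # guard against rounding past 1
    theta = np.arccos(t)                    # step 3
    U1, V1 = P1 @ M1, Q1 @ N1t.T            # step 4
    return theta, U1, V1

# Two 2-dimensional subspaces of R^3 sharing the direction e1.
P1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Q1 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
theta, U1, V1 = principal_angles(P1, Q1)

# First angle 0 (the shared direction), second pi/2 (e2 versus e3).
assert np.allclose(theta, [0.0, np.pi / 2])
```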
This also settles a theoretical question. The principal angles are uniquely determined:
they are determined by the singular values of a matrix that is independent of u1, …, uk and
v1, …, vk, so they do not depend on how the principal vectors are chosen. By step 4 of
(10.5.10), the principal vectors have exactly as much arbitrariness as singular vectors do.
As the following exercise shows, the computation θi = arccos ti cannot deliver
accurate values for angles near zero.
If we wish to calculate small principal angles accurately, we must find another
method. The following lemma is a start in that direction.
LEMMA 10.5.11 Let W1 ∈ R^{n×k} be an isometry, and consider the partitioned form
W1 = [W11; W21],  W11 ∈ R^{n1×k}, W21 ∈ R^{n2×k}, n1 + n2 = n
Let γ1 ≥ γ2 ≥ … ≥ γk be the singular values of W11, and let σ1 ≤ σ2 ≤ … ≤ σk be the singular
values of W21, in ascending order. Then
γi² + σi² = 1,  i = 1, …, k
Proof Since W1 has orthonormal columns,
I = W1ᵀW1 = W11ᵀW11 + W21ᵀW21
It follows immediately that W11ᵀW11 and W21ᵀW21 have common eigenvectors: if W11ᵀW11v
= λv, then W21ᵀW21v = μv, where
λ + μ = 1   (10.5.12)
and vice versa. Since the eigenvalues of W11ᵀW11 and W21ᵀW21 are γ1² ≥ γ2² ≥ … ≥ γk² and
σ1² ≤ σ2² ≤ … ≤ σk², respectively, (10.5.12) implies that γi² + σi² = 1 for i = 1, 2, …, k.
THEOREM 10.5.13 Let S and T be k-dimensional subspaces of R^n with principal angles
θ1 ≤ θ2 ≤ … ≤ θk. Let p1, …, pk and pk+1, …, pn be orthonormal bases for S and S⊥,
respectively, and let q1, …, qk and qk+1, …, qn be orthonormal bases for T and T⊥,
respectively. Let
P1 = [p1, …, pk],  P2 = [pk+1, …, pn]
Q1 = [q1, …, qk],  Q2 = [qk+1, …, qn]
Then the singular values of P2ᵀQ1 and P1ᵀQ2 are
sin θ1, sin θ2, …, sin θk
Proof Let P = [P1 P2] ∈ R^{n×n}. Then P is an orthogonal matrix. Since Q1 has orthonormal
columns, the matrix
W1 = PᵀQ1
must also have orthonormal columns. W1 can be written in the partitioned form
W1 = [P1ᵀQ1; P2ᵀQ1]
The singular values of P1ᵀQ1 are γi = cos θi, i = 1, …, k, so by Lemma 10.5.11 P2ᵀQ1 has
singular values
σi = √(1 − γi²) = √(1 − cos²θi) = sin θi,  i = 1, …, k
Reversing the roles of P and Q, we find that Q2ᵀP1 also has singular values sin θi, i = 1, …, k;
since P1ᵀQ2 = (Q2ᵀP1)ᵀ, the same is true of P1ᵀQ2.
Theorem 10.5.13 shows that there are simple relationships between the singular
values of these submatrices. A more detailed statement of the relationships between the
singular values and singular vectors of the blocks of an orthogonal or unitary matrix is
given by Stewart (1977), Theorem A1.
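The computational point of Theorem 10.5.13 is that a small angle can be recovered from its sine, where arccos of its cosine fails. A minimal NumPy sketch (the one-dimensional "subspaces" of R² are a deliberately trivial illustration of mine):

```python
import numpy as np

a = 1e-9                                      # a tiny principal angle
p1 = np.array([1.0, 0.0])                     # orthonormal basis of S
p2 = np.array([0.0, 1.0])                     # orthonormal basis of S-perp
q1 = np.array([np.cos(a), np.sin(a)])         # orthonormal basis of T

theta_from_cos = np.arccos(min(abs(p1 @ q1), 1.0))  # P1^T Q1 route
theta_from_sin = np.arcsin(min(abs(p2 @ q1), 1.0))  # P2^T Q1 route (Thm 10.5.13)

# cos(a) has already rounded to 1 in double precision, so the arccos route
# loses the angle entirely; the sine route recovers it to full accuracy.
assert np.isclose(theta_from_sin, a, rtol=1e-6, atol=0.0)
assert not np.isclose(theta_from_cos, a, rtol=1e-6, atol=0.0)
```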
The following theorem makes it very easy to work with the distance function.
THEOREM 10.5.15 In the notation of Theorem 10.5.13,
d(S,T) = ‖Q2ᵀP1‖₂ = ‖Q1ᵀP2‖₂ = ‖P1ᵀQ2‖₂ = ‖P2ᵀQ1‖₂
Proof Let u ∈ S with ‖u‖₂ = 1. Since R^n = T ⊕ T⊥, u can be expressed uniquely as a
sum, u = v + v⊥, where v ∈ T and v⊥ ∈ T⊥. By the projection theorem (Theorem 3.5.6),
d(u,T) = ‖v⊥‖₂. Letting Q be the orthogonal matrix Q = [Q1 Q2], we have
Qᵀv⊥ = [Q1ᵀv⊥; Q2ᵀv⊥] = [0; Q2ᵀv⊥]
since v⊥ ∈ T⊥ implies Q1ᵀv⊥ = 0. Hence ‖v⊥‖₂ = ‖Qᵀv⊥‖₂ = ‖Q2ᵀv⊥‖₂. Notice also that
Q2ᵀv = 0, so Q2ᵀv⊥ = Q2ᵀu. Thus d(u,T) = ‖Q2ᵀu‖₂. Every such u has the form u = P1w
with ‖w‖₂ = 1, and maximizing d(u,T) over all such u gives d(S,T) = ‖Q2ᵀP1‖₂.
By Theorem 10.3.1, the spectral norm of a matrix is equal to the largest singular value.
Since Q1ᵀP2 has the same singular values as Q2ᵀP1 (Theorem 10.5.13), we have d(S,T) =
‖Q1ᵀP2‖₂ as well. Since P1ᵀQ2 and P2ᵀQ1 are the transposes of Q2ᵀP1 and Q1ᵀP2,
respectively, it is also true that d(S,T) = ‖P1ᵀQ2‖₂ = ‖P2ᵀQ1‖₂.
Here R1 denotes a matrix whose columns form an orthonormal basis of a third subspace U, so that
d(S,U) = ‖P2ᵀR1‖₂ by Theorem 10.5.15. Since Q1Q1ᵀ + Q2Q2ᵀ = QQᵀ = I,
‖P2ᵀR1‖₂ = ‖P2ᵀ(Q1Q1ᵀ + Q2Q2ᵀ)R1‖₂
= ‖(P2ᵀQ1)(Q1ᵀR1) + (P2ᵀQ2)(Q2ᵀR1)‖₂
≤ ‖P2ᵀQ1‖₂ ‖Q1ᵀR1‖₂ + ‖P2ᵀQ2‖₂ ‖Q2ᵀR1‖₂
≤ ‖P2ᵀQ1‖₂ + ‖Q2ᵀR1‖₂ = d(S,T) + d(T,U)
This completes the proof.
PROOF. If H ≥ 0 then, by definition, its eigenvalues {λi}_{i=1}^n are nonnegative and we can
define a real matrix D0 = diag[√λ1, √λ2, …, √λn]. The matrix H is normal and hence there
is a unitary matrix U such that H = UDU*, where D = D0². The required square root of H is
the matrix
H0 = UD0U*,   (1)
since H0² = UD0U*UD0U* = UD0²U* = H. Note that the representation (1) shows that the
eigenvalues of H0 are the (arithmetic) square roots of the eigenvalues of H. This proves that
rank H0 = rank H (Exercise 5.3.9), and H ≥ 0 yields H0 ≥ 0.
Conversely, if H = H0² and H0 ≥ 0, then the eigenvalues of H, being the squares of
those of H0, are nonnegative. Hence H ≥ 0. This fact also implies the equality of ranks.
The same argument proves the theorem for positive definite matrices. ■
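The construction in the proof is directly computable. A NumPy sketch for the real symmetric case (function name and test matrix are mine):

```python
import numpy as np

def psd_sqrt(H):
    """H0 = U D0 U* from Eq. (1): eigendecompose H, take square roots of eigenvalues."""
    lam, U = np.linalg.eigh(H)            # H = U diag(lam) U^T, lam real
    lam = np.clip(lam, 0.0, None)         # clear tiny negative rounding noise
    return (U * np.sqrt(lam)) @ U.T       # U D0 U^T

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
H = B @ B.T                               # a random positive semi-definite H

H0 = psd_sqrt(H)
assert np.allclose(H0 @ H0, H)                    # H0 is a square root of H
assert np.all(np.linalg.eigvalsh(H0) >= -1e-12)   # and H0 >= 0
```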
The following simple corollary gives a proof of the necessary part of Theorem
5.3.3.
Corollary 1. If H ≥ 0 (or H > 0), then (Hx, x) ≥ 0 (or (Hx, x) > 0) for all x ∈ F^n.
PROOF. Representing H = H0² and using the matrix version of Theorem 5.1.1, we have
(Hx, x) = (H0²x, x) = (H0x, H0x) ≥ 0
for all x ∈ F^n. ■
Note that the positive semi-definite square root H0 of H is unique.
PROOF. Let H1 satisfy H1² = H. By Theorem 4.11.3, the eigenvalues of H1 are square roots
of those of H and, since H1 ≥ 0, they must be nonnegative. Thus, the eigenvalues of H1 and of
H0 in Eq. (1) coincide. Furthermore, H1 is Hermitian and, therefore (Theorem 5.2.1), H1 =
VD0V* for some unitary V. Now H1² = H0² = H, so VD0²V* = UD0²U* and hence (U*V)D0²
= D0²(U*V) and, consequently, H1 = H0, as required. ■
In the sequel the unique positive semi-definite (or definite) square root of a
positive semi-definite (or definite) matrix H is denoted by H^{1/2}. Summarizing the above
discussion, note that λi ∈ σ(H^{1/2}) if and only if λi² ∈ σ(H), and the corresponding
eigenspaces of H^{1/2} and H coincide. The concept of a square root of a positive semi-definite
matrix allows us to introduce a spectral characteristic for rectangular matrices.
Consider an arbitrary m × n matrix A. The n × n matrix A*A is (generally)
positive semi-definite (see Exercise 5.3.5). Therefore by Theorem 1 the matrix A*A has a
positive semi-definite square root H1 such that A*A = H1². The eigenvalues λ1, λ2, …, λn
of the matrix H1 = (A*A)^{1/2} are referred to as the singular values s1, s2, …, sn of the
(generally rectangular) matrix A. Thus, for i = 1, 2, …, n,
si(A) = λi((A*A)^{1/2})
Obviously, the singular values of a matrix are nonnegative numbers.
Note that the singular values of A are sometimes defined as the eigenvalues of the
matrix (AA*)^{1/2} of order m. It follows from the next fact that the difference in definition is
not highly significant.
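The definition can be checked against a library SVD. A NumPy sketch on an illustrative random real matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))

# Eigenvalues of A*A come from eigvalsh in ascending order; flip to descending ...
lam = np.linalg.eigvalsh(A.T @ A)[::-1]
s_from_def = np.sqrt(np.clip(lam, 0.0, None))   # eigenvalues of (A*A)^{1/2}

# ... and they reproduce the singular values delivered by the SVD.
assert np.allclose(s_from_def, np.linalg.svd(A, compute_uv=False))
```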
Theorem 2. The nonzero eigenvalues of the matrices (A*A)^{1/2} and (AA*)^{1/2} coincide.
PROOF. First we observe that it suffices to prove the assertion of the theorem for the
matrices A*A and AA*. Furthermore, we select eigenvectors x1, x2, …, xn of A*A
corresponding to eigenvalues λ1, λ2, …, λn such that {x1, x2, …, xn} forms an orthonormal
basis in F^n. We have
(A*Axi, xj) = λi(xi, xj) = λiδij,  i, j = 1, 2, …, n.
On the other hand, (A*Axi, xj) = (Axi, Axj), and comparison shows that (Axi, Axi) = λi, i = 1,
2, …, n. Thus, Axi = 0 (1 ≤ i ≤ n) if and only if λi = 0. Since
AA*(Axi) = A(A*Axi) = λiAxi,
the preceding remark shows that for λi ≠ 0, the vector Axi is an eigenvector of AA*. Hence
if a nonzero λi ∈ σ(A*A) is associated with the eigenvector xi, then λi ∈ σ(AA*) and is
associated with the eigenvector Axi. In particular, every nonzero point of σ(A*A) belongs to
σ(AA*). The opposite inclusion is obtained by exchanging the roles of A and A*. ■
Thus the eigenvalues of A*A and AA*, as well as those of (A*A)^{1/2} and (AA*)^{1/2}, differ only
by the geometric multiplicity of the zero eigenvalue, which is n − r for A*A and m − r for AA*,
where r = rank(A*A) = rank(AA*). Also, for a square matrix A it follows immediately that
the eigenvalues of A*A and AA* coincide and have the same multiplicities. [Compare this
with the result of Exercise 4.14.10(b).]
Note that we proved more than was stated in Theorem 2.
Exercise 4. Verify that the nonzero singular values of the matrices A and A* are the
same. If, in addition, A is a square n x n matrix, show that si(A) = si(A*) for i = 1, 2, . . . , n.
□
Proposition 2. The singular values of a square matrix are invariant under unitary
transformation.
PROOF. By definition,
si(UA) = λi((A*U*UA)^{1/2}) = λi((A*A)^{1/2}) = si(A).
To prove the second equality in Eq. (2), use Exercise 4 and the part of the proposition
already proved. ■
PROOF. Let λi denote an eigenvalue of A corresponding to the eigenvector xi. Since A*A =
AA*, it follows (see Exercise 5.2.7) that
A*Axi = λiA*xi = λiλ̄ixi = |λi|²xi.   (3)
Exercise 6. Confirm that a square matrix is unitary if and only if all its singular
values are equal to one. □
The following result has its origin in the familiar polar form of a complex number:
λ = λ0e^{iγ}, where λ0 ≥ 0 and 0 ≤ γ < 2π.
PROOF. Let
λ1 ≥ λ2 ≥ … ≥ λr > 0 = λr+1 = … = λn
denote the eigenvalues of A*A (see Theorem 5.4.2), with corresponding eigenvectors x1, x2,
…, xn that comprise an orthonormal basis in F^n.
Then, by Exercise 5.4.3, the normalized elements
yi = (1/‖Axi‖)Axi = (1/√λi)Axi,  i = 1, 2, …, r,
are orthonormal eigenvectors of AA*
corresponding to the eigenvalues λ1, λ2, …, λr, respectively. We extend y1, …, yr to an
orthonormal eigenbasis {yi}_{i=1}^n for AA*.
Proceeding to the construction of the matrices H and U in Eq. (1), we write H =
(AA*)^{1/2} and note (see Section 5.4) that Hyi = √λi yi for i = 1, 2, …, n. Also, we
introduce an n × n (transition) matrix U by Uxi = yi, i = 1, 2, …, n.
Note that Corollary 5.6.1 asserts that U is unitary. We have
HUxi = Hyi = √λi yi = Axi,  i = 1, 2, …, r.   (3)
Since (Axi, Axi) = (A*Axi, xi) = λi = 0 for i = r + 1, r + 2, …, n, it follows that Axi
= 0 (r + 1 ≤ i ≤ n). Furthermore, as observed in Section 5.4, AA*yi = 0 implies Hyi = 0, so that
HUxi = Hyi = 0 = Axi,  i = r + 1, r + 2, …, n.   (4)
Thus the equalities (3) and (4) for basis elements clearly give HUx = Ax for all x ∈ F^n,
proving Eq. (1). ■
Note that if A is nonsingular, so is (AA*)^{1/2}, and in the polar decomposition (1) the
matrix H = (AA*)^{1/2} is positive definite. Observe that in this case the unitary matrix U can
be chosen to be H⁻¹A and the representation (1) is unique.
A dual polar decomposition can be established similarly.
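In practice the polar decomposition is conveniently computed from the SVD rather than from the eigenvector construction in the proof: if A = WSV*, then H = WSW* = (AA*)^{1/2} and U = WV* is unitary. A NumPy sketch for the real square case (names and test matrix are mine):

```python
import numpy as np

def polar(A):
    """Polar decomposition A = H U with H positive semi-definite, U orthogonal."""
    W, s, Vt = np.linalg.svd(A)
    H = (W * s) @ W.T          # H = W diag(s) W^T = (A A^T)^{1/2}
    U = W @ Vt                 # orthogonal factor
    return H, U

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))
H, U = polar(A)

assert np.allclose(H @ U, A)                      # A = H U
assert np.allclose(U @ U.T, np.eye(4))            # U is orthogonal
assert np.all(np.linalg.eigvalsh(H) >= -1e-12)    # H >= 0
```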
Proposition 1. A matrix A ∈ F^{n×n} is normal if and only if the matrices H and U in Eq. (1)
commute.
Note the structure of the matrices U and V in Eqs. (7) and (9): the columns of U
(respectively, V), viewed as vectors from F^m (respectively, F^n), constitute an orthonormal
eigenbasis of AA* in F^m (respectively, of A*A in F^n). Thus, a singular-value decomposition
of A can be obtained by solving the eigenvalue-eigenvector problem for the matrices AA*
and A*A.
Example 4. Consider the matrix A defined in Exercise 5.4.2; it has singular values s1 =
√2 and s2 = 1. To find a singular-value decomposition of A, we compute A*A and AA* and
construct orthonormal eigenbases for these matrices. The standard basis in F^2 can be used
for A*A, and the system {[α 0 α]ᵀ, [0 1 0]ᵀ, [α 0 −α]ᵀ}, where α = 1/√2, can be used
for AA*. Hence
A = [α 0 α; 0 1 0; α 0 −α] · [√2 0; 0 1; 0 0] · [1 0; 0 1]
is a singular-value decomposition of A. □
The relation (7) gives rise to the general notion of unitary equivalence of two m ×
n matrices A and B. We say that A and B are unitarily equivalent if there exist unitary
matrices U and V such that A = UBV*. It is clear that unitary equivalence is an equivalence
relation and that Theorem 2 can be interpreted as asserting the existence of a canonical
form (the matrix D) with respect to unitary equivalence in each equivalence class. Now the
next result is to be expected.
Proposition 2. Two m x n matrices are unitarily equivalent if and only if they have the
same singular values.
Corollary 1. Two m × n matrices A and B are unitarily equivalent if and only if the
matrices A*A and B*B are similar.