This theorem, in conjunction with Theorem (Quadratic form … about the eigenvalues of a Hermitian matrix) and the fact that the square Hermitian matrix A^T A is positive semidefinite, lets us conclude that the eigenvalues of A^T A are real and either zero or positive. The same is true for the eigenvalues of A A^T, which follows by taking B = A^T in Theorem 1.
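The eigenvalue facts above can be checked numerically. A small illustrative sketch (not part of the text; the matrix is an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

eig_ata = np.sort(np.linalg.eigvalsh(A.T @ A))   # 3 eigenvalues of A^T A
eig_aat = np.sort(np.linalg.eigvalsh(A @ A.T))   # 4 eigenvalues of A A^T

# Eigenvalues of A^T A are real and nonnegative.
assert np.all(eig_ata >= -1e-12)
# The nonzero eigenvalues of A^T A and A A^T coincide;
# A A^T has one extra zero eigenvalue here.
assert np.allclose(eig_aat[1:], eig_ata, atol=1e-10)
assert abs(eig_aat[0]) < 1e-10
```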
Proof. Recall that for any given m × n matrix A, rank(A) = rank(A^T A) = rank(A A^T), so there are exactly r nonzero eigenvalues of A^T A and of A A^T. Then A^T A g_i = λ_i g_i for i = 1, 2, …, r and A^T A g_i = 0 for i = r + 1, r + 2, …, n. Now, define m-vectors p_i by

    p_i = A g_i / √λ_i   for i = 1, 2, …, r.

Then

    A A^T p_i = A A^T A g_i / √λ_i = λ_i A g_i / √λ_i = λ_i p_i.    (10-1)

Furthermore

    p_i^T p_j = g_i^T A^T A g_j / (√λ_i √λ_j) = λ_j g_i^T g_j / (√λ_i √λ_j) = δ_ij.    (10-2)

Also, we can find f_i for i = r + 1, r + 2, …, m such that A A^T f_i = 0 and the f_i are orthonormal. Thus the eigenvectors f_i form an orthonormal basis for Vm and the eigenvectors g_i form an orthonormal basis for Vn.
Based on these facts, we can now prove the statements of the theorem.
(i) Since there are r eigenvalues λ_i, these must be the nonzero eigenvalues of A A^T. Hence (i) is proved.
(ii) Since for each i there is one normalized eigenvector, p_i can be taken equal to f_i, and (ii) is proved.
(iii) Since A^T A g_i = 0 for i = r + 1, r + 2, …, n, we have ||A g_i||_2^2 = g_i^T A^T A g_i = 0, so that A g_i = 0, and (iii) is proved.
(iv) Similarly, A A^T f_i = 0 for i = r + 1, r + 2, …, m; then ||A^T f_i||_2^2 = f_i^T A A^T f_i = 0, so A^T f_i = 0, and (iv) is proved.
(v) Finally, A^T f_i = A^T (A g_i / √λ_i) = A^T A g_i / √λ_i = λ_i g_i / √λ_i = √λ_i g_i, and (v) is proved.
Example. Verify Theorem 2 for A =
Theorem 3. Let A be an m × n matrix. Then, under conditions (i) through (v) of Theorem 2,

    A = Σ_{k=1}^r √λ_k f_k g_k^T.

Proof. As indicated by Theorem 2, A maps Vn into Vm. Also f_1, f_2, …, f_m and g_1, g_2, …, g_n form orthonormal bases for Vm and Vn respectively. So, for arbitrary v in Vn, we have

    v = Σ_{k=1}^n γ_k g_k,   where γ_k = g_k^T v.    (10-3)
We should note that Theorem 2 holds even when A is rectangular and has no spectral representation.
Based on the above analysis, we are now ready to discuss the Moore-Penrose generalized inverse in more detail; we need the following definition.
Example
Proof. Using the notation of Theorems 2 and 3, an arbitrary m-vector w and an arbitrary n-vector v can be written as

    w = Σ_{k=1}^m μ_k f_k   and   v = Σ_{k=1}^n γ_k g_k,    (10-5)

where μ_k = f_k^T w and γ_k = g_k^T v. Then using properties (1) and (2) of Theorem 2 gives

    Av − w = Σ_{k=1}^n γ_k A g_k − Σ_{k=1}^m μ_k f_k = Σ_{k=1}^r (√λ_k γ_k − μ_k) f_k − Σ_{k=r+1}^m μ_k f_k,    (10-6)

so that

    ||Av − w||_2^2 = Σ_{k=1}^r (√λ_k γ_k − μ_k)^2 + Σ_{k=r+1}^m μ_k^2.
The best we can do to minimize ||Av − w||_2^2 is to choose γ_k = μ_k/√λ_k for k = 1, 2, …, r, so that the vectors y in Vn that minimize ||Av − w||_2^2 can be expressed as

    y = Σ_{k=1}^r (μ_k/√λ_k) g_k + Σ_{k=r+1}^n γ_k g_k,

where γ_k for k = r + 1, …, n is arbitrary. Since

    ||y||_2^2 = Σ_{k=1}^r (μ_k/√λ_k)^2 + Σ_{k=r+1}^n γ_k^2,

the choice γ_k = 0 for k = r + 1, …, n gives the minimizing vector of smallest norm.
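The minimum-norm least-squares solution described above is exactly what the pseudoinverse produces. A small numpy sketch (illustrative data, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # 5x4, rank <= 3
w = rng.standard_normal(5)

y = np.linalg.pinv(A) @ w   # minimum-norm least-squares solution

# np.linalg.lstsq (SVD-based) also returns the minimum-norm solution,
# so the two computations should agree.
y2 = np.linalg.lstsq(A, w, rcond=None)[0]
assert np.allclose(y, y2, atol=1e-8)
```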
Theorem 5. Let A be any matrix such that Av = w is a consistent linear system. Then any solution y of this linear system can be expressed as

    y = A^+ w + (I − A^+ A)y.    (10-8)

Proof. Write an arbitrary solution as y = Σ_{k=1}^n ξ_k g_k, where ξ_k = g_k^T y. Also, from Definition 1 and Theorem 3, we have

    A^+ A = Σ_{i=1}^r Σ_{k=1}^r (√λ_i/√λ_k) g_k (f_k^T f_i) g_i^T = Σ_{i=1}^r Σ_{k=1}^r (√λ_i/√λ_k) δ_ki g_k g_i^T = Σ_{k=1}^r g_k g_k^T.    (10-9)

But since the g_k are orthonormal basis vectors for Vn, I = Σ_{k=1}^n g_k g_k^T. Hence

    (I − A^+ A)y = Σ_{k=r+1}^n g_k g_k^T y = Σ_{k=r+1}^n ξ_k g_k.    (10-10)

From Eq. (10-5), μ_k = f_k^T w, so that by substituting Eq. (10-10) into Eq. (10-8) and using Definition 1, we have the result.
Based on the above analysis, we are ready to formulate the defining conditions for the pseudoinverse of a matrix A in the following theorem.
(2) Represent A = Σ_{k=1}^r √λ_k f_k g_k^T and A^+ = Σ_{k=1}^r (1/√λ_k) g_k f_k^T. Then

    A A^+ A = (Σ_{i=1}^r √λ_i f_i g_i^T)(Σ_{j=1}^r (1/√λ_j) g_j f_j^T)(Σ_{k=1}^r √λ_k f_k g_k^T)
            = Σ_{i=1}^r Σ_{j=1}^r Σ_{k=1}^r (√λ_i √λ_k/√λ_j) f_i (g_i^T g_j)(f_j^T f_k) g_k^T.

Since g_i^T g_j = δ_ij and f_j^T f_k = δ_jk, the sum collapses to Σ_{k=1}^r √λ_k f_k g_k^T = A, which gives the result.
(3) The same procedure can be used here.
A generalized inverse exists for every matrix. If A has order m × n, then A^+ has order n × m and has the properties indicated by the following theorem.
Theorem 7. For each m × n matrix A, there exists a unique n × m matrix A^+ satisfying ten properties:
(1) A+ is unique
(2) A+ = A-1 for nonsingular A
(3) (A+)+ = A
(4) (kA)+ = (1/k)A+ for k ≠ 0
(5) (AH)+ = (A+)H
(6) 0+ = 0
(7) The rank of A^+ equals the rank of A.
(8) If P and Q are unitary matrices of appropriate orders so that the product PAQ is defined, then (PAQ)^+ = Q^H A^+ P^H.
(9) If A has order m × k, B has order k × n, and both matrices have rank k,
then
(AB)+ = B+A+.
(10) For square matrix A, AA+ = A+A if and only if A+ can be expressed as a polynomial in
A.
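Several of these properties are easy to spot-check numerically. A minimal sketch with a random real matrix (illustrative only, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
Ap = np.linalg.pinv(A)

# The defining conditions: AA+A = A, A+AA+ = A+, and AA+, A+A Hermitian.
assert np.allclose(A @ Ap @ A, A)
assert np.allclose(Ap @ A @ Ap, Ap)
assert np.allclose((A @ Ap).T, A @ Ap)
assert np.allclose((Ap @ A).T, Ap @ A)

# Spot-check some of the listed properties.
assert np.allclose(np.linalg.pinv(Ap), A)                     # (3) (A+)+ = A
assert np.allclose(np.linalg.pinv(2.5 * A), Ap / 2.5)         # (4) (kA)+ = (1/k)A+
assert np.linalg.matrix_rank(Ap) == np.linalg.matrix_rank(A)  # (7) equal ranks
```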
Proof.
(1) Assume that F and G are two generalized inverses of the same matrix A; we must show that F = G. Since F and G are assumed to satisfy conditions (1) through (3) of Theorem 6, FA, AF, GA, and AG are all Hermitian and

    AFA = A,    (10-11a)
    FAF = F,    (10-11b)
    AGA = A,    (10-11c)
    GAG = G.    (10-11d)

Multiplying both sides of Eq. (10-11a) on the right by G, we obtain AFAG = AG, from which (using the Hermitian property of AF and AG) we infer that AG = AF. Multiplying both sides of Eq. (10-11a) on the left by G, we obtain GAFA = GA, from which we similarly infer that GA = FA. Then

    G = GAG = (GA)G = (FA)G = F(AG) = F(AF) = FAF = F.
(2) For nonsingular A, the inverse A^-1 satisfies the conditions of Theorem 6:

    (AA^-1)^H = I^H = I = AA^-1,
    AA^-1A = A(A^-1A) = AI = A,

and

    A^-1AA^-1 = (A^-1A)A^-1 = IA^-1 = A^-1.

The result then follows from property (1) of this theorem.
(3) With respect to A and A^+, conditions (1) through (3) are symmetric, so that if A^+ is the generalized inverse of A, then A is also the generalized inverse of A^+, i.e., A = (A^+)^+.
(5) To prove this property we have to show that (A^+)^H satisfies conditions (1) through (3) of Theorem 6.
Take D = A^H. Checking condition (1) and the remaining conditions in turn shows that D^+ = (A^+)^H satisfies all the conditions for a generalized inverse of A^H. Since the generalized inverse is unique, it follows that (A^H)^+ = D^+ = (A^+)^H.
(8) Let F = PAQ. We must show that F^+ = Q^H A^+ P^H satisfies conditions (1) through (3), given that A and A^+ do. Condition (1) of Theorem 6 is verified directly, and the remaining conditions follow in the same way.
Theorem 8. If A can be factored into the product BC, where both B^H B and CC^H are invertible, then

    A^+ = C^H (CC^H)^-1 (B^H B)^-1 B^H.    (10-12)
Proof. To prove this theorem, we have to show that A^+ as given by Eq. (10-12) satisfies the three conditions of Theorem 6; each condition is verified by direct substitution. The resulting product is unique, although the factors B and C themselves are not unique.
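The full-rank factorization formula of Theorem 8 can be checked against a library pseudoinverse. A small sketch with assumed random factors B and C (real matrices, so ^H reduces to ^T):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 2))   # full column rank
C = rng.standard_normal((2, 4))   # full row rank
A = B @ C                          # 5x4 matrix of rank 2

# A+ = C^H (C C^H)^-1 (B^H B)^-1 B^H, Eq. (10-12).
Ap = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
assert np.allclose(Ap, np.linalg.pinv(A), atol=1e-8)
```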
The procedure for producing the generalized inverse of any matrix A is stated in Algorithm 1 as follows.
Algorithm 1
Step 5. A^+ = QC^H(CC^H)^-1(B^H B)^-1B^H P    (5)
When the columns of A form a linearly independent set of vectors, the equation in Step 5 reduces to
    A^+ = (A^H A)^-1 A^H.    (6)
Equations (5) and (6) are formulas for calculating generalized inverses, but they are not stable when roundoff error enters the calculation, because small errors in the elements of a matrix A can result in large errors in the computed elements of A^+. In such situations a better algorithm is needed.
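Formula (6) can be illustrated, along with the squaring of the condition number that makes it fragile. A hedged sketch on assumed random data:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))             # full column rank

Ap_formula = np.linalg.inv(A.T @ A) @ A.T   # Eq. (6)
assert np.allclose(Ap_formula, np.linalg.pinv(A), atol=1e-8)

# Forming A^T A squares the condition number: cond(A^T A) = cond(A)^2.
# This is the source of the instability mentioned above.
c = np.linalg.cond(A)
assert np.isclose(np.linalg.cond(A.T @ A), c**2, rtol=1e-6)
```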
For any matrix A, not necessarily square, the product A^H A is Hermitian (hence normal) and has nonnegative eigenvalues. The positive square roots of these eigenvalues are the singular values of A. Moreover, there exist unitary matrices U and V such that

    A = U [D 0; 0 0] V^H,    (7)

where D is a diagonal matrix having as its main diagonal all the positive singular values of A. The block diagonal matrix

    Σ = [D 0; 0 0]    (8)
Theorem B. Let A be a matrix of order m × n with m ≥ n. Then A can be factored as A = UΣV^H, where Σ is an n × n matrix and U is an m × n matrix with orthonormal columns.

    A^+ = V_1 D^-1 U_1^H    (10)

where V_1 and U_1 are defined by Step 3 and Step 4 respectively. For the purpose of calculating the generalized inverse, Steps 5 and 6 can be ignored.
Proof. P is Hermitian and similar to Σ, which has nonnegative eigenvalues. Since the columns of U are orthonormal, U^H U = I, and V is unitary.
Theorem D. Let A be any m × n matrix with m ≥ n. Then A can be factored as A = QP, where Q has orthonormal columns and P is positive semidefinite. Such a factorization is called a polar decomposition of A.
In order to read this chapter, you will need a comprehensive understanding of Chapter 8. In Section 10.2 we will study a variant of the QR algorithm, for which you will need to have read most of the material in Sections 4.5 to 4.8 and Section 5.3 as well. This chapter is largely independent of Chapter 6.
Throughout the chapter we will restrict our attention to real matrices. This is done solely to simplify the exposition; the generalization to the complex setting is routine.
Let A ∈ R^{n×n} be symmetric. Then by Corollary 4.4.14 there exists an orthonormal basis v_1, …, v_n of R^n consisting of eigenvectors of A. Each v_i satisfies Av_i = λ_i v_i, where λ_i is the (real) eigenvalue associated with v_i. These relationships can be expressed by the following diagram,

    A: v_k → λ_k v_k,   k = 1, …, n,    (10.1.1)

which portrays the action of A as a linear transformation mapping R^n into R^n. The diagram describes the action of A completely, since A is completely determined by its action on a basis of R^n. Eq. (10.1.1) is equivalent to the statement (Theorem 4.4.13) that there exist an orthogonal matrix V and a diagonal matrix D such that A = VDV^T. The columns of V are v_1, v_2, …, v_n, and the main-diagonal entries of D are λ_1, λ_2, …, λ_n.
It is reasonable to ask to what extent (10.1.1) can be generalized to nonsymmetric matrices. If A is nonsymmetric but normal, (10.1.1) continues to be true, except that some of λ_1, …, λ_n and v_1, …, v_n are complex. If A is not normal but simple, we must give up the orthogonality of the basis. If A is not simple, (10.1.1) ceases to hold. One well-known generalization is the Jordan canonical form; see, for example, Lancaster and Tismenetsky (1985). If A is not square, say A ∈ R^{n×m} with n ≠ m, then even the Jordan canonical form does not exist. In this section we will develop an extension of (10.1.1), valid for all A ∈ R^{n×m}, called the singular value decomposition (SVD).
Only a moment's thought reveals a significant change that will have to be made in (10.1.1) if we wish to extend it to nonsquare matrices. Every A ∈ R^{n×m} can be viewed as a linear transformation A: R^m → R^n, mapping R^m into R^n. The domain consists of m-tuples, while the range consists of n-tuples. Thus our extension of (10.1.1) will have to have different sets of vectors on the left and right.
For the rest of this section A will denote a matrix in R^{n×m}. Recall from Section 3.5 that A has two important spaces associated with it, the null space and the range, given by

    N(A) = { x ∈ R^m | Ax = 0 },
    R(A) = { Ax | x ∈ R^m }.

The null space is a subspace of R^m, and the range is a subspace of R^n. Recall that the range is also called the column space of A (Exercise 3.5.7), and its dimension is called the rank of A, denoted rank(A).
COROLLARY 10.1.6 A^T A and A A^T have the same nonzero eigenvalues, counting multiplicity.
The matrices A^T A and A A^T are both symmetric and hence simple. Thus Corollary 10.1.6 and Exercise 10.1.4 yield a second proof that they have the same rank, which equals the number of nonzero eigenvalues. Since A^T A and A A^T generally have different dimensions, they cannot have exactly the same eigenvalues. The difference is made up by the zero eigenvalue of appropriate multiplicity. If rank(A^T A) = rank(A A^T) = r and r < m, then A^T A has zero as an eigenvalue of multiplicity m − r.
THEOREM 10.1.7 (SVD Theorem) Let A ∈ R^{n×m} have rank r. Then there exist real numbers σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0, an orthonormal basis v_1, …, v_m of R^m, and an orthonormal basis u_1, …, u_n of R^n, such that

    A v_i = σ_i u_i,  i = 1, …, r        A^T u_i = σ_i v_i,  i = 1, …, r
    A v_i = 0,        i = r + 1, …, m    A^T u_i = 0,        i = r + 1, …, n
    (10.1.8)

Equations (10.1.8) imply that v_1, …, v_m are eigenvectors of A^T A, u_1, …, u_n are eigenvectors of A A^T, and σ_1^2, …, σ_r^2 are the nonzero eigenvalues of A^T A and A A^T.
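The relations (10.1.8) can be checked directly against a library SVD. A small illustrative sketch (the matrix is an assumed random example):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A)   # full SVD: U is 5x5, Vt is 3x3
r = len(s)                     # a random 5x3 matrix has full rank r = 3

for i in range(r):
    vi, ui = Vt[i], U[:, i]
    assert np.allclose(A @ vi, s[i] * ui)       # A v_i = sigma_i u_i
    assert np.allclose(A.T @ ui, s[i] * vi)     # A^T u_i = sigma_i v_i
for i in range(r, 5):
    assert np.allclose(A.T @ U[:, i], 0, atol=1e-10)   # A^T u_i = 0, i > r
```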
Proof. You can easily verify that the assertions in the final sentence are true. This determines how v_1, …, v_m must be chosen. Let v_1, …, v_m be an orthonormal basis of R^m consisting of eigenvectors of A^T A, and let λ_1, …, λ_m be the associated eigenvalues. Since A^T A is positive semidefinite, all of its eigenvalues are nonnegative. Assume v_1, …, v_m are ordered so that λ_1 ≥ λ_2 ≥ … ≥ λ_m. Since r = rank(A) = rank(A^T A), it must be that λ_r > 0 and λ_{r+1} = λ_{r+2} = … = λ_m = 0. For i = 1, …, r, define σ_i and u_i by

    σ_i = ||A v_i||_2   and   u_i = (1/σ_i) A v_i.

These definitions imply that A v_i = σ_i u_i and ||u_i||_2 = 1, i = 1, …, r. The result of Exercise 10.1.3 implies that u_1, …, u_r are orthogonal and hence orthonormal. It is easy to show that σ_i^2 = λ_i, i = 1, …, r. Indeed σ_i^2 = ||A v_i||_2^2 = (A v_i, A v_i) = (A^T A v_i, v_i) = (λ_i v_i, v_i) = λ_i. It now follows easily that A^T u_i = σ_i v_i, i = 1, …, r, for A^T u_i = (1/σ_i) A^T A v_i = (λ_i/σ_i) v_i = σ_i v_i.
The proof is now complete, except that we have not defined u_{r+1}, …, u_n, assuming r < n. By Theorem 10.1.5 the vectors u_1, …, u_r are eigenvectors of A A^T associated with nonzero eigenvalues. Since A A^T ∈ R^{n×n} and rank(A A^T) = r, A A^T must have a null space of dimension n − r. Let u_{r+1}, …, u_n be any orthonormal basis of N(A A^T). Noting that u_{r+1}, …, u_n are eigenvectors of A A^T associated with the eigenvalue zero, we see that u_{r+1}, …, u_n are orthogonal to u_1, …, u_r. Thus u_1, …, u_n is an orthonormal basis of R^n consisting of eigenvectors of A A^T. Since N(A A^T) = N(A^T), we have A^T u_i = 0 for i = r + 1, …, n. This completes the proof.
The numbers σ_1, …, σ_r are called the singular values of A. Let k = min{n, m}. If r < k, it is usual to adjoin k − r zero singular values σ_{r+1} = … = σ_k = 0. The vectors v_1, v_2, …, v_m are called right singular vectors of A, and u_1, u_2, …, u_n are called left singular vectors of A. Singular vectors are not uniquely determined; they are no more uniquely determined than any eigenvectors of length 1. Any singular vector can be replaced by its opposite, and if A^T A or A A^T happens to have an eigenspace of dimension 2 or more, an even greater loss of uniqueness occurs.
A^T has the same singular values as A. The right (left) singular vectors of A^T are the left (right) singular vectors of A.
Theorem 10.1.7 allows us to draw, for any A ∈ R^{n×m}, a diagram in the spirit of (10.1.1). An analogous diagram holds for A^T. Drawing the two diagrams side by side, we have

    A: v_i → σ_i u_i (i = 1, …, r),    v_i → 0 (i = r + 1, …, m);
    A^T: u_i → σ_i v_i (i = 1, …, r),  u_i → 0 (i = r + 1, …, n),
    (10.1.9)

which serves as a pictorial representation of the SVD Theorem.
The singular value decomposition displays orthonormal bases of the four fundamental spaces R(A), N(A), R(A^T), and N(A^T). It is clear from (10.1.9) that

    R(A^T) = span{v_1, …, v_r}       R(A) = span{u_1, …, u_r}
    N(A) = span{v_{r+1}, …, v_m}     N(A^T) = span{u_{r+1}, …, u_n}.

From these representations we see that R(A^T)^⊥ = N(A) and R(A)^⊥ = N(A^T); we proved these equalities in Theorem 3.5.3 by other means.
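These four bases can be read directly off a computed SVD. A small sketch on an assumed rank-2 example:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # 5x4, rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
assert r == 2

range_A  = U[:, :r]     # orthonormal basis of R(A)
null_At  = U[:, r:]     # orthonormal basis of N(A^T)
range_At = Vt[:r].T     # orthonormal basis of R(A^T)
null_A   = Vt[r:].T     # orthonormal basis of N(A)

assert np.allclose(A @ null_A, 0, atol=1e-10)           # A v_i = 0, i > r
assert np.allclose(A.T @ null_At, 0, atol=1e-10)        # A^T u_i = 0, i > r
assert np.allclose(range_A.T @ null_At, 0, atol=1e-10)  # R(A) perp N(A^T)
```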
The singular value decomposition is usually expressed as a matrix decomposition, as follows:

THEOREM 10.1.10 (SVD Theorem) Let A ∈ R^{n×m} have rank r. Then there exist U ∈ R^{n×n}, Σ ∈ R^{n×m}, and V ∈ R^{m×m}, such that U and V are orthogonal, Σ has the form

    Σ = [diag(σ_1, σ_2, …, σ_r)  0]
        [0                       0],   σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0,    (10.1.11)

and

    A = UΣV^T.

Proof. Let v_1, …, v_m and u_1, …, u_n be right and left singular vectors, and let σ_1, …, σ_r be the nonzero singular values of A. Let V = [v_1, …, v_m] ∈ R^{m×m} and U = [u_1, …, u_n] ∈ R^{n×n}. Then U and V are orthogonal. The equations

    A v_i = σ_i u_i,  i = 1, …, r;    A v_i = 0,  i = r + 1, …, m

can be combined into the single matrix equation

    A[v_1, …, v_r | v_{r+1}, …, v_m] = [u_1, …, u_r | u_{r+1}, …, u_n] [diag(σ_1, …, σ_r)  0]
                                                                       [0                  0].

That is, AV = UΣ. Since VV^T = I, we see immediately that A = UΣV^T.
In the product A = UΣV^T, the last n − r columns of U and m − r columns of V are superfluous because they interact only with blocks of zeros in Σ. This observation leads to the following variant of the SVD Theorem.
THEOREM 10.1.12 Let A ∈ R^{n×m} have rank r. Then there exist Û ∈ R^{n×r}, Σ̂ ∈ R^{r×r}, and V̂ ∈ R^{m×r} such that Û and V̂ are isometries (cf. Section 3.4), Σ̂ is a diagonal matrix with main-diagonal entries σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0, and

    A = ÛΣ̂V̂^T.
THEOREM 10.1.13 Let A ∈ R^{n×m} have rank r. Let σ_1, …, σ_r be the nonzero singular values of A, with associated right and left singular vectors v_1, …, v_r and u_1, …, u_r, respectively. Then

    A = Σ_{j=1}^r σ_j u_j v_j^T.

Proof. Let B = Σ_{j=1}^r σ_j u_j v_j^T. It suffices to show that A v_i = B v_i, i = 1, …, m, since v_1, …, v_m is a basis of R^m. If i ≤ r, we have A v_i = σ_i u_i and B v_i = Σ_{j=1}^r σ_j u_j (v_j^T v_i). Since v_1, …, v_m is an orthonormal set, v_j^T v_i = 0 unless j = i, in which case v_i^T v_i = 1. Thus all terms in the sum are zero except the ith term, and B v_i = σ_i u_i. If i > r, then A v_i = 0 and

    B v_i = Σ_{j=1}^r σ_j u_j (v_j^T v_i) = Σ_{j=1}^r σ_j u_j · 0 = 0.
Show that a simple relationship exists between the singular vectors of A and the eigenvectors of M. Show how to build an orthogonal basis of R^{n+m} consisting of eigenvectors of M, given the singular vectors of A.
It is clear that we can calculate the singular value decomposition of any matrix A by calculating the eigenvalues and eigenvectors of A^T A and A A^T. This approach is illustrated in the next example and the exercises that follow. In the next section we will discuss a different approach, in which the SVD is computed without forming A^T A or A A^T explicitly.
Example 10.1.14 Find the singular values and right and left singular vectors of the matrix

    A = [1 2 0]
        [2 0 2].

Since A^T A is 3 by 3 and A A^T is 2 by 2, it seems reasonable to work with the latter. We have

    A A^T = [5 2]
            [2 8].
The characteristic polynomial is (λ − 5)(λ − 8) − 4 = λ^2 − 13λ + 36 = (λ − 9)(λ − 4), so the eigenvalues of A A^T are λ_1 = 9 and λ_2 = 4. The singular values of A are therefore

    σ_1 = 3   and   σ_2 = 2.

The left singular vectors of A are eigenvectors of A A^T. Solving (9I − A A^T)u = 0, we find that every multiple of [1 2]^T is an eigenvector of A A^T associated with the eigenvalue λ_1 = 9. Then solving (4I − A A^T)u = 0, we find that the other eigenspace of A A^T consists of multiples of [2 −1]^T.
Since we want representatives with unit Euclidean norm, we take

    u_1 = (1/√5)[1 2]^T   and   u_2 = (1/√5)[2 −1]^T.

(What other choice for u_1 and u_2 could we have made?) These are the left singular vectors of A. Notice that they are orthogonal, as they must be. We can find the right singular vectors v_1, v_2, and v_3 by calculating the eigenvectors of A^T A. However, v_1 and v_2 are more easily found by the formula

    v_i = (1/σ_i) A^T u_i,   i = 1, 2,

which is a trivial variation of one of the equations in (10.1.8). Thus

    v_1 = (1/(3√5))[5 2 4]^T   and   v_2 = (1/√5)[0 2 −1]^T.
Notice that these vectors are orthonormal. The vector v_3 must satisfy A v_3 = 0. Solving the equation Av = 0 and normalizing the solution, we get

    v_3 = (1/3)[2 −1 −2]^T.

We could have found v_3 without reference to A by applying the Gram-Schmidt process, for example, to find a vector orthogonal to both v_1 and v_2. Normalizing that vector, we would get v_3.
Now that we have the singular values and singular vectors of A, we can easily construct the matrices U, Σ, and V of Theorem 10.1.10. We have

    U = [u_1 u_2] = (1/√5)[1  2]
                          [2 −1],

    Σ = [σ_1 0   0] = [3 0 0]
        [0   σ_2 0]   [0 2 0],

    V = [v_1 v_2 v_3] = (1/(3√5))[5  0   2√5]
                                 [2  6  −√5 ]
                                 [4 −3  −2√5].

You can easily check that A = UΣV^T. In so doing you will notice that v_3 plays no role in the computation. This is an instance of the remark made just prior to Theorem 10.1.12. It is an easy exercise for you to write down matrices Û, Σ̂, and V̂ satisfying the hypotheses of Theorem 10.1.12.
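The hand computation in Example 10.1.14 can be confirmed with a few lines of numpy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 0.0, 2.0]])

# Singular values found in the example: sigma_1 = 3, sigma_2 = 2.
s = np.linalg.svd(A, compute_uv=False)
assert np.allclose(s, [3.0, 2.0])

# The singular vectors found by hand satisfy A v_1 = sigma_1 u_1.
u1 = np.array([1, 2]) / np.sqrt(5)
v1 = np.array([5, 2, 4]) / (3 * np.sqrt(5))
assert np.allclose(A @ v1, 3 * u1)
```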
Diagram (10.1.9) gives a clear, complete picture of the action of A and of A^T. It is therefore reasonable to expect that the singular value decomposition will be very useful. This is indeed the case, and we will examine some applications in subsequent sections. Because the SVD employs orthogonal matrices (orthonormal bases), we can expect it to be not only an important theoretical device, but also a powerful computational tool. If this expectation is to be realized, we must have accurate, efficient means of calculating singular values and singular vectors.
The option of forming A^T A (or A A^T) and calculating its eigenvalues and (possibly) eigenvectors should not be overlooked. This approach has the advantage of being relatively inexpensive; we studied several good algorithms for calculating eigenvalues and eigenvectors of symmetric matrices in Chapters 4, 5, and 6. The disadvantage of this approach is that the smaller singular values will be calculated inaccurately. This is a consequence of the "loss of information through squaring" phenomenon (see Example 3.5.11), which occurs when we compute A^T A from A.
We can get some idea why this information loss occurs by considering an example. Suppose the entries of the matrix A are known to be correct to about six decimal places. If A has, say, σ_1 ≈ 1 and σ_17 ≈ 10^-3, then σ_17 is fairly small compared with 1, but it is still well above the error level 10^-5 or 10^-6. We ought to be able to calculate σ_17 with some precision, perhaps to two or three decimal places. The entries of A^T A also have about six-digit accuracy. Associated with the singular values σ_1 and σ_17, A^T A has the eigenvalues λ_1 = σ_1^2 ≈ 1 and λ_17 = σ_17^2 ≈ 10^-6. Notice that λ_17 is of about the same magnitude as the errors in the entries of A^T A, so we cannot expect to compute it with any precision.
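The same squaring effect shows up at machine precision. A hedged sketch (assumed example matrix, not from the text): a singular value near 10^-9 is recovered accurately by a direct SVD, but its square, 10^-18, is below the rounding errors committed in forming A^T A, so the route through eigenvalues loses it.

```python
import numpy as np

rng = np.random.default_rng(8)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Z, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([1.0, 1.0, 1.0, 1e-9]) @ Z.T     # sigma_min = 1e-9

s_direct = np.linalg.svd(A, compute_uv=False)[-1]
s_squared = np.sqrt(np.abs(np.linalg.eigvalsh(A.T @ A)[0]))

assert abs(s_direct - 1e-9) < 1e-12   # direct SVD keeps good accuracy
# Going through A^T A is far less accurate for the small singular value.
assert abs(s_squared - 1e-9) > abs(s_direct - 1e-9)
```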
In Chapter 4 we found that the eigenvalue problem can be made much easier if we first reduce the matrix to a simpler form, such as tridiagonal or Hessenberg form. The same turns out to be true of the singular value decomposition. The eigenvalue problem requires that the reduction be done via similarity transformations. For the singular value decomposition A = UΣV^T, it is clear that similarity transformations are not necessary, but the transforming matrices should be orthogonal. We will see that we can reduce any matrix to bidiagonal form by applying reflectors on both left and right. The algorithms that we are about to discuss work well for dense matrices. We will not cover the sparse case.
We continue to assume that we are dealing with a matrix A ∈ R^{n×m}, but we will now make the additional assumption that n ≥ m. This does not imply any loss of generality, for if n < m, we can operate on A^T instead of A. If the SVD of A^T is A^T = UΣV^T, then the SVD of A is A = VΣ^T U^T.
A matrix B ∈ R^{n×m} (n ≥ m) is said to be bidiagonal if b_ij = 0 whenever i > j or i < j − 1. This means that the nonzero entries of B are confined to the main diagonal and the superdiagonal:

    B = [* *          ]
        [  * *        ]
        [    *  .     ]
        [       .  *  ]
        [          *  ]
        [             ].
THEOREM 10.2.1 Let A ∈ R^{n×m} with n ≥ m. Then there exist orthogonal Û ∈ R^{n×n} and V̂ ∈ R^{m×m}, both products of a finite number of reflectors, and a bidiagonal B ∈ R^{n×m}, such that

    A = ÛBV̂^T.
Proof. Let Û_1 ∈ R^{n×n} be a reflector such that

    Û_1 [a_11, a_21, …, a_n1]^T = [*, 0, …, 0]^T.

Then the first column of Û_1A consists of zeros, except for the (1,1) entry. Now let [â_11, â_12, …, â_1m] denote the first row of Û_1A, and let V̂_1 ∈ R^{m×m} be a reflector of the form

    V̂_1 = [1  0 ]
          [0  Ṽ_1]

such that

    [â_12, …, â_1m] Ṽ_1 = [*, 0, …, 0].

Then the first row of Û_1AV̂_1 consists of zeros, except for the first two entries. Because the first column of V̂_1 is e_1, the first column of Û_1A is unaltered by right multiplication by V̂_1. Thus Û_1AV̂_1 has the form

    Û_1AV̂_1 = [*  *  0 … 0]
              [0          ]
              [⋮     Ã    ]
              [0          ].

The second step of the construction is identical to the first, except that it acts on the submatrix Ã. It is easy to show that the reflectors used on the second step do not destroy the zeros created on the first step.
After two steps we have

    Û_2Û_1AV̂_1V̂_2 = [*  *  0  0 … 0]
                    [0  *  *  0 … 0]
                    [0  0          ]
                    [⋮  ⋮     Ã̃    ]
                    [0  0          ].

The third step acts on the submatrix Ã̃, and so on. After m steps we have

    Û_m … Û_2Û_1AV̂_1V̂_2 … V̂_{m−2} = B,

where B is upper bidiagonal. Notice that steps m − 1 and m require multiplications on the left only. Let Û = Û_1Û_2 … Û_m and V̂ = V̂_1V̂_2 … V̂_{m−2}. Then Û^TAV̂ = B; that is, A = ÛBV̂^T.
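The reflector-based reduction just described can be sketched in a few dozen lines. This is an illustrative implementation under the stated assumptions (dense real A, n ≥ m), not tuned for efficiency; `householder` and `bidiagonalize` are hypothetical helper names:

```python
import numpy as np

def householder(x):
    """Unit vector v such that (I - 2 v v^T) x = -sign(x_1) ||x|| e_1."""
    v = x.astype(float).copy()
    s = np.sign(x[0]) if x[0] != 0 else 1.0
    v[0] += s * np.linalg.norm(x)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def bidiagonalize(A):
    """Reduce A (n x m, n >= m) to upper bidiagonal B with A = U B V^T,
    U and V orthogonal, by left and right reflectors."""
    n, m = A.shape
    B = A.astype(float).copy()
    U, V = np.eye(n), np.eye(m)
    for k in range(m):
        # Left reflector: zero out column k below the diagonal.
        v = householder(B[k:, k])
        B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
        U[:, k:] -= 2.0 * np.outer(U[:, k:] @ v, v)
        # Right reflector: zero out row k beyond the superdiagonal.
        if k < m - 2:
            w = householder(B[k, k+1:])
            B[:, k+1:] -= 2.0 * np.outer(B[:, k+1:] @ w, w)
            V[:, k+1:] -= 2.0 * np.outer(V[:, k+1:] @ w, w)
    return U, B, V

rng = np.random.default_rng(9)
A = rng.standard_normal((6, 4))
U, B, V = bidiagonalize(A)
assert np.allclose(U @ B @ V.T, A)
assert np.allclose(U.T @ U, np.eye(6)) and np.allclose(V.T @ V, np.eye(4))
# B is upper bidiagonal: zero below the diagonal and above the superdiagonal.
assert np.allclose(np.tril(B, -1), 0, atol=1e-12)
assert np.allclose(np.triu(B, 2), 0, atol=1e-12)
```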
Setting

    B = [B̃] ∈ R^{n×m}
        [0 ]

and V = Ṽ ∈ R^{m×m}, we have A = ÛBV̂^T.
The advantage of this arrangement is that the right multiplications are applied to the small matrix R̂ instead of the large matrix A. They therefore cost a lot less. The disadvantage is that the right multiplications destroy the upper-triangular form of R. Thus most of the left multiplications must be repeated on the small matrix R̂. If the ratio n/m is sufficiently large, the added cost of the extra left multiplications will be more than offset by the savings in the right multiplications.
It is tempting to look for a revised algorithm that exploits the special structure of R̂ rather than destroying it. Such a procedure, using plane rotators, was mentioned by Chan (1982), but as he pointed out, nothing is saved unless fast, scaled rotators are used. Using fast rotators, one can devise an algorithm whose asymptotic flop count is less than that of the original procedure for all ratios n/m > 1. The catch is that fast rotators have a large overhead, and they will not prove cost effective unless m is fairly large.
As we have already noted, the various applications of the SVD have different requirements. Some require only the singular values, while others require the right or the left singular vectors, or both. If any of the singular vectors are needed, then the matrices Û and/or V̂ have to be computed explicitly. Usually it is possible to avoid calculating Û, but there are numerous applications for which V̂ is needed. The flop counts in Exercises 10.2.2 and 10.2.6 do not include the cost of computing Û or V̂.
As a result of Exercise 10.2.9, we see that we can always assume, without loss of generality, that B is a properly bidiagonal matrix. A form of the implicit QR algorithm can be used to find the SVD of any such matrix. We will describe the algorithm first and then justify it.
Suppose B ∈ R^{m×m} is a properly bidiagonal matrix, with main-diagonal entries α_1, …, α_m and superdiagonal entries β_1, …, β_{m−1}. Then both BB^T and B^TB are properly tridiagonal matrices, so we could find their eigenvalues inexpensively by the QR algorithm. The algorithm that we are about to develop is equivalent to the QR algorithm applied to both BB^T and B^TB, but it is carried out without ever forming these matrices explicitly. We begin a QR step by choosing a shift. The lower right-hand 2-by-2 submatrix of BB^T is

    [α_{m−1}^2 + β_{m−1}^2   α_m β_{m−1}]
    [α_m β_{m−1}             α_m^2      ].

Calculate the eigenvalues of this submatrix and take the shift σ to be the eigenvalue that is closer to α_m^2. This is the Wilkinson shift on BB^T. It is a good choice because it guarantees convergence, and the convergence is rapid in practice. We could have chosen the shift from B^TB instead of BB^T. We chose the latter because its lower right-hand 2-by-2 submatrix has a slightly simpler form.
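The shift computation above is a simple closed-form 2-by-2 eigenvalue problem. A sketch under the notation assumed here (alpha for the main diagonal of B, beta for the superdiagonal; `wilkinson_shift` is a hypothetical helper name):

```python
import numpy as np

def wilkinson_shift(alpha, beta):
    """Eigenvalue of the trailing 2x2 block of B B^T closer to alpha[-1]**2."""
    a = alpha[-2]**2 + beta[-1]**2     # (m-1, m-1) entry of B B^T
    b = alpha[-1] * beta[-1]           # (m-1, m) entry
    c = alpha[-1]**2                   # (m, m) entry
    # Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]].
    d = (a - c) / 2.0
    root = np.hypot(d, b)
    lam1, lam2 = (a + c) / 2.0 + root, (a + c) / 2.0 - root
    return lam1 if abs(lam1 - c) < abs(lam2 - c) else lam2

alpha = np.array([2.0, 1.5, 1.0])
beta = np.array([0.5, 0.3])
mu = wilkinson_shift(alpha, beta)

# The shift is an exact eigenvalue of the trailing 2x2 submatrix of B B^T.
M = np.array([[alpha[1]**2 + beta[1]**2, alpha[2] * beta[1]],
              [alpha[2] * beta[1], alpha[2]**2]])
assert any(np.isclose(np.linalg.eigvalsh(M), mu))
```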
A QR step on B^TB with shift σ would perform the similarity transformation B^TB → Q^TB^TBQ, where Q is the orthogonal factor from the QR decomposition

    B^TB − σI = QR.    (10.2.3)

Since we plan to take an implicit step, all we need is the first column of Q. Because R is upper triangular, the first column of Q is proportional to the first column of B^TB − σI, which is

    [α_1^2 − σ, α_1β_1, 0, …, 0]^T.    (10.2.4)

Let V_12 be a rotator (or reflector) in the (1,2) plane whose first column is proportional to (10.2.4). Multiply B by V_12 on the right. The operation B → BV_12 alters only the first two columns of B, and as you can easily check, it creates a new nonzero entry (a "bulge") in the (2,1) position. (Draw the picture!)
Now find a rotator U_12 in the (1,2) plane such that U_12^T BV_12 has a zero in the (2,1) position. This operation acts on rows 1 and 2 and creates a new bulge in the (1,3) position. Let V_23 be a rotator acting on columns 2 and 3 such that U_12^T BV_12V_23 has a zero in the (1,3) position. This creates a bulge in the (3,2) position. Applying additional rotators U_23^T, V_34, U_34^T, …, we chase the bulge through positions (2,4), (4,3), (3,5), (5,4), …, (m, m − 1), and finally off of the matrix completely. The result is a bidiagonal matrix

    B̂ = U_{m−1,m}^T … U_23^T U_12^T B V_12V_23 … V_{m−1,m}    (10.2.5)

that is nearly proper.
Letting

    U = U_12U_23 … U_{m−1,m}   and   V = V_12V_23 … V_{m−1,m},    (10.2.6)

we can rewrite (10.2.5) as

    B̂ = U^TBV.    (10.2.7)

In addition we have B̂B̂^T = U^T BB^T U and B̂^TB̂ = V^T B^TB V. As we shall see, B̂B̂^T and B̂^TB̂ are essentially the same matrices as we would have obtained by taking one shifted QR step with shift σ, starting with BB^T and B^TB, respectively. If we set B ← B̂ and perform repeated QR steps, both BB^T and B^TB will tend to diagonal form. The main-diagonal entries will converge to the eigenvalues. If the Wilkinson shift is used, the (m, m − 1) and (m, m) entries of both BB^T and B^TB will converge very rapidly, the former to zero and the latter to an eigenvalue. Of course we do not deal with BB^T or B^TB directly; we deal with B. The rapid convergence of BB^T and B^TB translates into convergence of β_{m−1} to zero and of α_m to a singular value of B.
Once β_{m−1} becomes negligible, it can be considered to be zero, and the problem can be deflated. Performing shifted QR steps on the remaining (m − 1)-by-(m − 1) submatrix, we can force β_{m−2} quickly to zero, exposing another singular value in the (m − 1) position. Continuing in this manner, we soon find all the singular values of B.
During the whole procedure, all the β_k tend slowly toward zero. If at any point one of them becomes negligible, the problem should be reduced to two smaller subproblems.
If only singular values are needed, there is no need to keep a record of the many rotators used during the QR steps. However, if the right (or left) singular vectors are desired, we must keep track of the rotators V_{i,i+1} (or U_{i,i+1}). Let us suppose we wish to compute the right singular vectors of A ∈ R^{n×m} and we have already calculated a decomposition

    A = Û_1BV̂^T,    (10.2.8)

where B ∈ R^{m×m} is bidiagonal, Û_1 ∈ R^{n×m} has orthonormal columns, and V̂ ∈ R^{m×m} is orthogonal. Needing the right singular vectors, we have calculated V̂ explicitly and saved it. As we perform the QR steps on B, we need to take into account each rotator V_{ij} that multiplies B on the right. This can be done by making the update V̂ ← V̂V_{ij} along with the update B ← BV_{ij}. Since (BV_{ij})(V̂V_{ij})^T = BV̂^T, we see that this update preserves the overall product in (10.2.8). Of course this procedure should also be followed for the right rotators used in the reduction procedure described in Exercise 10.2.9. Once B has been reduced to diagonal form, the singular values lie on the main diagonal of B, and the right singular vectors of A are the columns of V̂. The singular values do not necessarily appear in descending order. If left singular vectors are needed, then Û_1 must be calculated explicitly and saved. Then for each rotator U_{ij}^T that is applied to B on the left, the update Û_1 ← Û_1U_{ij} should be made along with the update B ← U_{ij}^TB. In the end the m columns of Û_1 are the left singular vectors of A.
The updates of B are inexpensive because B is very sparse. By contrast, the updates of the full matrices V̂ and Û_1 are relatively expensive. While the cost of an entire QR step without updating Û_1 or V̂ is O(m) flops, the additional costs of updating V̂ and Û_1 are O(m^2) and O(nm) flops per QR step, respectively. It follows that if the right or left singular vectors are needed, the QR steps become much more expensive. The added cost can usually be decreased by employing the ultimate shift strategy suggested in Section 4.8.
10.3
SOME BASIC APPLICATIONS OF SINGULAR VALUES
In view of (10.1.9), it should not be surprising that ||A||_2 equals the maximum singular value of A. Writing x = c_1v_1 + … + c_mv_m, we have Ax = σ_1c_1u_1 + … + σ_rc_ru_r, where r is the rank of A. Since u_1, …, u_r are also orthonormal, ||Ax||_2^2 = |σ_1c_1|^2 + … + |σ_rc_r|^2. Thus ||Ax||_2^2 ≤ σ_1^2(|c_1|^2 + … + |c_r|^2) ≤ σ_1^2 ||x||_2^2; that is, ||Ax||_2/||x||_2 ≤ σ_1. This completes the proof.
Since A and A^T have the same singular values, we have the following corollary.
COROLLARY 10.3.2 ||A||_2 = ||A^T||_2.
Now suppose A is square, say A ∈ R^{n×n}, and nonsingular. The spectral condition number of A is defined by

    κ_2(A) = ||A||_2 ||A^-1||_2.

Let us see how κ_2(A) can be expressed in terms of the singular values of A. Since A has rank n, it has n strictly positive singular values, and its action is described completely by the following diagram:

    A: v_i → σ_i u_i,   i = 1, …, n;       A^-1: u_i → σ_i^-1 v_i,   i = 1, …, n.

In terms of matrices we have A = UΣV^T and A^-1 = VΣ^-1U^-1 = VΣ^-1U^T. Either way we see that the singular values of A^-1, in descending order, are σ_n^-1 ≥ σ_{n−1}^-1 ≥ … ≥ σ_1^-1 > 0. Applying Theorem 10.3.1 to A^-1 we conclude that ||A^-1||_2 = σ_n^-1. These observations imply the following theorem.

THEOREM 10.3.3 Let A ∈ R^{n×n} be a nonsingular matrix with singular values σ_1 ≥ σ_2 ≥ … ≥ σ_n > 0. Then

    κ_2(A) = σ_1/σ_n.
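Theorem 10.3.3 and the norm identities behind it are easy to confirm numerically. A minimal sketch on an assumed random nonsingular matrix:

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 5))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

assert np.isclose(np.linalg.norm(A, 2), s[0])                    # ||A||_2 = sigma_1
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1/s[-1])  # ||A^-1||_2 = 1/sigma_n
assert np.isclose(np.linalg.cond(A, 2), s[0] / s[-1])            # k_2 = sigma_1/sigma_n
```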
Another expression for the condition number, given in Chapter 2, is

    κ_2(A) = maxmag(A)/minmag(A),

where

    maxmag(A) = max_{x≠0} ||Ax||_2/||x||_2,
    minmag(A) = min_{x≠0} ||Ax||_2/||x||_2.

This gives a slightly different view of the condition number. From Theorem 10.3.1 we know that maxmag(A) = σ_1. It must therefore be true that minmag(A) = σ_n.
In Chapter 3 we observed that the equation
k2(A) = maxmag(A)/minmag(A)   (10.3.4)
can be used to extend the definition of k2 to certain nonsquare matrices. Specifically, if A
∈ R^{n×m}, n ≥ m, and rank(A) = m, then minmag(A) > 0, and we can take (10.3.4) as the
definition of the condition number of A. If A is nonzero but does not have full rank, then
(still assuming n ≥ m) minmag(A) = 0, and it is reasonable to define k2(A) = ∞. With this
convention the following theorem holds, regardless of whether or not A has full rank.
THEOREM 10.3.5 Let A ∈ R^{n×m}, n ≥ m, be a nonzero matrix with singular values σ1 ≥
σ2 ≥ … ≥ σm ≥ 0. Then maxmag(A) = σ1, minmag(A) = σm, and k2(A) = σ1/σm.
The proof is left as an easy exercise for you.
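A quick numerical illustration of Theorem 10.3.5 (a NumPy sketch; the full-rank test matrix is my own choice, not from the text): the extreme magnifications are attained at the extreme right singular vectors, and no other direction does better.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))                  # n = 6 >= m = 4, full rank

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
v1, vm = Vt[0], Vt[-1]                           # extreme right singular vectors

# maxmag(A) = sigma_1 is attained at v1; minmag(A) = sigma_m is attained at vm.
assert np.isclose(np.linalg.norm(A @ v1), sigma[0])
assert np.isclose(np.linalg.norm(A @ vm), sigma[-1])

# Every other direction gives a magnification between sigma_m and sigma_1.
for _ in range(100):
    x = rng.standard_normal(4)
    mag = np.linalg.norm(A @ x) / np.linalg.norm(x)
    assert sigma[-1] - 1e-12 <= mag <= sigma[0] + 1e-12
```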
In the absence of roundoff errors and uncertainties in the data, the singular value
decomposition reveals the rank of a matrix. Unfortunately the presence of errors and
uncertainties makes the question of rank meaningless. As we shall see, a small perturbation
in a matrix that is not of full rank can and typically will increase the rank.
The nonnegative number ‖A − Â‖₂ is a measure of the distance between the
matrices A and Â. Exercise 10.3.8 shows that every rank-deficient matrix has full-rank
matrices arbitrarily close to it; this suggests that matrices of full rank are abundant. This
impression is strengthened by the next theorem and its corollary.
THEOREM 10.3.6 Let A ∈ R^{n×m} with rank(A) = r > 0. Let A = UΣVᵀ be the singular
value decomposition of A. For k = 1, …, r − 1 define Ak = UΣkVᵀ, where Σk ∈ R^{n×m} is
defined by
Σk = diag(σ1, σ2, …, σk, 0, …, 0) ∈ R^{n×m}
(We assume as usual that σ1 ≥ σ2 ≥ … ≥ σr.) Then rank(Ak) = k, and
σk+1 = ‖A − Ak‖₂ = min{ ‖A − B‖₂ : rank(B) = k }
That is, of all the matrices of rank k, Ak is closest to A.
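Theorem 10.3.6 can be exercised directly with a truncated SVD. A NumPy sketch on a random matrix of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# A_k = U Sigma_k V^T keeps the k largest singular triplets and zeroes the rest.
Ak = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

assert np.linalg.matrix_rank(Ak) == k
# ||A - A_k||_2 = sigma_{k+1}  (sigma[k] with 0-based indexing).
assert np.isclose(np.linalg.norm(A - Ak, 2), sigma[k])
```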
Proof It is obvious that rank(Ak) = k. Since A − Ak = U(Σ − Σk)Vᵀ, it is clear that the
largest singular value of A − Ak is σk+1. Therefore ‖A − Ak‖₂ = σk+1. It remains to be shown
only that for any other matrix B of rank k, ‖A − B‖₂ ≥ σk+1.
Given such a B, note first that N(B) has dimension m − k, for dim(N(B)) =
dim(R^m) − dim(R(B)) = m − rank(B) = m − k. Also, the space span{v1, …, vk+1} has dimension
k + 1. (As usual, v1, …, vm denote the columns of V.) Since N(B) and span{v1, …, vk+1} are two
subspaces of R^m, the sum of whose dimensions exceeds m, they must have a nontrivial
intersection. Let x̂ be a nonzero vector in N(B) ∩ span{v1, …, vk+1}. We can and will assume
that ‖x̂‖₂ = 1. Since x̂ ∈ span{v1, …, vk+1}, there exist scalars c1, …, ck+1 such that x̂ =
c1v1 + … + ck+1vk+1. Because v1, …, vk+1 are orthonormal, |c1|² + … + |ck+1|² = ‖x̂‖₂² = 1. Since
x̂ ∈ N(B), Bx̂ = 0. Thus
(A − B)x̂ = Ax̂ = c1Av1 + … + ck+1Avk+1 = σ1c1u1 + … + σk+1ck+1uk+1
Since u1, …, uk+1 are also orthonormal,
‖(A − B)x̂‖₂² = |σ1c1|² + … + |σk+1ck+1|² ≥ σk+1²(|c1|² + … + |ck+1|²) = σk+1²
Therefore
‖A − B‖₂ ≥ ‖(A − B)x̂‖₂/‖x̂‖₂ ≥ σk+1
This completes the proof.
COROLLARY 10.3.7 Suppose A ∈ R^{n×m} has full rank. Thus rank(A) = r = min{n,m}. Let
σ1 ≥ σ2 ≥ … ≥ σr > 0 be the singular values of A. Let B ∈ R^{n×m} satisfy ‖A − B‖₂ < σr. Then
B also has full rank.
This result is an immediate consequence of Theorem 10.3.6. From Corollary
10.3.7 we see that if A has full rank, then all matrices sufficiently close to A also have full
rank. From Exercise 10.3.8 we know that every rank-deficient matrix has full-rank matrices
arbitrarily close to it. By Corollary 10.3.7, each of these full-rank matrices is surrounded by
other matrices of full rank. In topological language, the set of matrices of full rank is an
open dense subset of R^{n×m}. Its complement, the set of rank-deficient matrices, is therefore
closed and nowhere dense. This discussion is meant to convince you that almost all
matrices have full rank.
If a matrix does not have full rank, any small perturbation is almost certain to
transform it to a matrix that does have full rank. It follows that in the presence of uncertainty
in the data, it is impossible to calculate the rank of a matrix or even detect that it is rank
deficient. (This is a generalization of the assertion, made in Chapters 1 and 2, that it is
impossible to detect whether a square matrix is singular.) Nevertheless in certain applications
it is reasonable to call a matrix numerically rank deficient if it is close to a rank-deficient
matrix.
Let ε be some positive number that represents the magnitude of the data
uncertainties in the matrix A. If there exist matrices B of rank k such that ‖A − B‖₂ < ε
and, on the other hand, for every matrix C of rank k − 1 we have ‖A − C‖₂ >> ε, then we
will say that the numerical rank of A is k. From Theorem 10.3.6 we know that this condition
is satisfied if and only if the singular values of A satisfy
σ1 ≥ σ2 ≥ … ≥ σk >> ε > σk+1 ≥ …
Thus the numerical rank can be determined by examining the singular values. A matrix that
has k “large” singular values, the others being “tiny”, has numerical rank k. However, if
the set of singular values has no convenient gap, it may be impossible to assign a
meaningful numerical rank to the matrix.
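In code, determining the numerical rank amounts to counting singular values above the uncertainty level ε. The sketch below (NumPy; the nearly rank-2 test matrix is my own illustration) contrasts it with the exact rank, which a tiny perturbation has raised to full:

```python
import numpy as np

def numerical_rank(A, eps):
    """Number of singular values standing clearly above the uncertainty level eps."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(sigma > eps))

rng = np.random.default_rng(3)
# A rank-2 matrix plus a perturbation of size ~1e-6.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
A = A + 1e-6 * rng.standard_normal((5, 4))

exact = np.linalg.matrix_rank(A)        # 4: the perturbation restored full rank
numer = numerical_rank(A, eps=1e-3)     # 2: only two singular values exceed eps
assert (exact, numer) == (4, 2)
```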
We conclude this section by considering the implications of Theorem 10.3.6 for square,
nonsingular matrices. Let A ∈ R^{n×n} be nonsingular and let As denote the singular matrix
that is closest to A, in the sense that ‖A − As‖₂ is as small as possible. In Theorem 2.3.17 we showed
that
‖A − As‖₂/‖A‖₂ ≥ 1/k2(A)
for any induced matrix norm, and we mentioned that for the 2-norm, equality holds.
We now have the tools to prove this.
COROLLARY 10.3.10 Let A ∈ R^{n×n} be nonsingular. (Thus A has singular values σ1 ≥
σ2 ≥ … ≥ σn > 0.) Let As be the singular matrix that is closest to A, in the sense that ‖A − As‖₂
is minimal. Then ‖A − As‖₂ = σn and ‖A − As‖₂/‖A‖₂ = 1/k2(A).
These results are immediate consequences of Theorems 10.3.1, 10.3.3, and 10.3.6.
In words, the distance from A to the nearest singular matrix is equal to the smallest singular
value of A, and the “relative distance” to the nearest singular matrix is equal to the reciprocal
of the condition number.
Σ = [Σ̂ 0; 0 0] ∈ R^{n×m},  Σ̂ = diag(σ1, σ2, …, σr)
with σ1 ≥ σ2 ≥ … ≥ σr > 0. Because U is orthogonal, ‖b − Ax‖₂ = ‖Uᵀ(b − Ax)‖₂ =
‖Uᵀb − Σ(Vᵀx)‖₂. Letting c = Uᵀb and y = Vᵀx, we have
‖b − Ax‖₂² = ‖c − Σy‖₂² = Σ_{i=1}^{r} |ci − σiyi|² + Σ_{i=r+1}^{n} |ci|²   (10.4.2)
It is clear that this expression is minimized when and only when
yi = ci/σi,  i = 1, …, r
Notice that when r < m, yr+1, …, ym do not appear in (10.4.2). Thus they have no effect on
the residual and can be chosen arbitrarily. Among all the solutions so obtained, ‖x‖₂ is clearly
minimized when and only when yr+1 = … = ym = 0. Since x = Vy and V is orthogonal, ‖x‖₂ =
‖y‖₂. Thus ‖x‖₂ is minimized when and only when ‖y‖₂ is. This proves that the least
squares problem has exactly one minimum norm solution.
It is useful to repeat the development, using partitioned matrices. Let
c = [ĉ; d]  and  y = [ŷ; z]
where ĉ, ŷ ∈ R^r. Then (10.4.2) can be rewritten as
‖b − Ax‖₂² = ‖ [ĉ; d] − [Σ̂ 0; 0 0][ŷ; z] ‖₂² = ‖ [ĉ − Σ̂ŷ; d] ‖₂² = ‖ĉ − Σ̂ŷ‖₂² + ‖d‖₂²   (10.4.3)
This is minimized when and only when ŷ = Σ̂⁻¹ĉ; that is, yi = ci/σi, i = 1, …, r. We can choose
z arbitrarily, but we get the minimum norm solution by taking z = 0. The norm of the
minimal residual is ‖d‖₂. This solves the problem completely in principle. We summarize
the procedure:
1. Calculate c = Uᵀb = [ĉ; d].
2. Let ŷ = Σ̂⁻¹ĉ.
3. Let y = [ŷ; z] ∈ R^m, where z can be chosen arbitrarily. To get the minimum norm solution,
take z = 0.
(10.4.4)
4. Let x = Vy.
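Procedure (10.4.4) translates almost line for line into code. A NumPy sketch (function name and test data are my own) that compares the result against np.linalg.lstsq:

```python
import numpy as np

def min_norm_lstsq(A, b, eps=1e-12):
    """Minimum norm least-squares solution via the SVD, following (10.4.4)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(sigma > eps))     # numerical rank: "tiny" sigmas count as zero
    c_hat = U[:, :r].T @ b           # step 1 (only the first r columns of U needed)
    y_hat = c_hat / sigma[:r]        # step 2: y_hat = Sigma_hat^{-1} c_hat
    return Vt[:r, :].T @ y_hat       # steps 3-4 with z = 0, x = V y

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)

x = min_norm_lstsq(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ref)
```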
Practical Considerations
In practice we do not know the exact rank of A. It is best to use the numerical rank,
discussed in Section 10.3, instead. All “tiny” singular values should be set to zero.
We have solved the least-squares problem under the assumption that we have the
whole matrices U and V at hand. However, you can easily check that the calculation of ĉ
uses only the first r columns of U, where, in practice, r is the numerical rank. If only the
minimum norm solution is wanted, only the first r columns of V are used. While the
numerical rank is usually not known in advance, it can never exceed min{n,m}, so at
most min{n,m} columns of U and V are needed.
If n >> m, the computation of U can be expensive, even if we compute only the
first m columns. In fact the computation of U can be avoided completely. U is the product of
many reflectors and rotators that are generated during the reduction to bidiagonal form and
the subsequent QR steps. Since U is needed only so that we can calculate c = Uᵀb, we can
simply update b instead of assembling U. As each rotator or reflector Ui is generated, we
make the update b ← Uiᵀb. In the end b will have been transformed into c. In the process
we get not only ĉ, but also d, so we can compute the norm of the residual ‖d‖₂. If several least-
squares problems with the same A but different right-hand sides b(1), b(2), … are to be
solved, the updates must be applied to all of the b(i) at once, since the Ui will not be saved.
No matter how the calculations are organized, the SVD is an expensive way to
solve the least-squares problem. Its principal advantage is that it gives a completely reliable
means of determining the numerical rank for rank-deficient least-squares problems.
The Pseudoinverse
A⁺: u1 → (1/σ1)v1, u2 → (1/σ2)v2, …, ur → (1/σr)vr, ur+1 → 0, …, un → 0
We see immediately that rank(A⁺) = rank(A), that u1, …, un and v1, …, vm are right and left singular
vectors of A⁺, respectively, and that σ1⁻¹, …, σr⁻¹ are the nonzero singular values. The restricted
operators A: span{v1, …, vr} → span{u1, …, ur} and A⁺: span{u1, …, ur} → span{v1, …, vr} are true inverses of
one another.
What does A⁺ look like as a matrix? You can answer this question in the simplest
case by working the following exercise.
To see what A⁺ looks like in general, note that the equations
A⁺ui = (1/σi)vi, i = 1, …, r,  and  A⁺ui = 0, i = r + 1, …, n,
can be expressed as a single matrix equation
A⁺[u1, u2, …, ur | ur+1, …, un] = [v1, v2, …, vr | vr+1, …, vm] [diag(σ1⁻¹, σ2⁻¹, …, σr⁻¹) 0; 0 0]
or A⁺U = VΣ⁺. Thus
A⁺ = VΣ⁺Uᵀ   (10.4.5)
This is the SVD of A⁺ in matrix form, and it gives us a means of calculating A⁺ by
computing the SVD of A. However, there is seldom any reason to compute the
pseudoinverse; it is mainly a theoretical tool. In this respect the pseudoinverse plays a
role much like that of the ordinary inverse.
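When (10.4.5) is needed in code, it is a one-liner on top of the SVD. A NumPy sketch (the tolerance for deciding which singular values count as nonzero is my own illustrative choice):

```python
import numpy as np

def pinv_via_svd(A, eps=1e-12):
    """A+ = V Sigma+ U^T, inverting only the nonzero singular values (10.4.5)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    sigma_plus = np.array([1.0 / s if s > eps else 0.0 for s in sigma])
    return Vt.T @ np.diag(sigma_plus) @ U.T

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```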
It is easy to make the claimed connection between the pseudoinverse and the least-
squares problem.
The pseudoinverse is used in the study of the sensitivity of the rank-deficient least-
squares problem. See [SLS] or Stewart (1977).
Thus ‖u − v‖₂ = sin θ.
LEMMA 10.5.5 Let u, v ∈ R^n with ‖u‖₂ = ‖v‖₂ = 1. Let θ be the angle between u and v.
Then
‖u − v cos θ‖₂ = sin θ
and v cos θ is the multiple of v that is closest to u.
Proof Figure 10.2 shows that this lemma is almost obvious. Let us first show that v cos θ is the
multiple of v that is closest to u. Applying the projection theorem (Theorem 3.5.6) with S
= span{v}, we find that the multiple cv that is closest to u is characterized by (u − cv, v) = 0.
Solving this equation for c, we find that c = (u,v)/(v,v) = (u,v) = cos θ. Now applying
Lemma 10.5.4 with v replaced by v cos θ, we have ‖u − v cos θ‖₂ = sin θ.
The result of the next exercise will be used in the proof of the theorem that
follows.
THEOREM 10.5.6 The first principal angle θ1 and principal vectors u1 ∈ S and v1 ∈ T
satisfy
sin θ1 = ‖u1 − v1 cos θ1‖₂ = min_{v∈T} ‖u1 − v‖₂ = min_{u∈S, ‖u‖₂=1} min_{v∈T} ‖u − v‖₂
Proof The equation sin θ1 = ‖u1 − v1 cos θ1‖₂ is a consequence of Lemma 10.5.5. As for the
other two equations, it is obvious that
‖u1 − v1 cos θ1‖₂ ≥ min_{v∈T} ‖u1 − v‖₂ ≥ min_{u∈S, ‖u‖₂=1} min_{v∈T} ‖u − v‖₂
so it suffices to show that for every u ∈ S with ‖u‖₂ = 1 and every v ∈ T, ‖u − v‖₂ ≥
‖u1 − v1 cos θ1‖₂. Given such a u and v, let v̂ denote the best approximation to u from
T. Then certainly ‖u − v̂‖₂ ≤ ‖u − v‖₂. Let θ denote the angle between u and v̂. (If v̂ = 0, we
set θ = π/2.) By Exercise 10.5.3, θ1 ≤ θ ≤ π/2. Since (u − v̂, v̂) = 0 by Theorem 3.5.6, we have
‖u − v̂‖₂ = sin θ from Lemma 10.5.4. Since θ1 ≤ θ and sin is an increasing function on [0,
π/2], ‖u1 − v1 cos θ1‖₂ = sin θ1 ≤ sin θ = ‖u − v̂‖₂ ≤ ‖u − v‖₂. This proves the first
string of equations. The second string is proved by reversing the roles of S and T.
Theorem 10.5.6 is easily generalized to yield statements about the other principal
angles and vectors. For this purpose it is convenient to introduce some new notation. For i =
1, …, k let
S_i = S ∩ span{u1, …, ui−1}⊥ = span{ui, …, uk}
T_i = T ∩ span{v1, …, vi−1}⊥ = span{vi, …, vk}
Roughly speaking, S_i and T_i are just S and T with the first i − 1 principal vectors
removed.
In applications the subspaces S and T are usually provided in the form of a basis for each
subspace. If the bases are not orthonormal, they can be orthonormalized by one of the
techniques from Chapter 3. Let us therefore assume that we have orthonormal bases
p1, p2, …, pk and q1, q2, …, qk of S and T, respectively. Let P1 = [p1, p2, …, pk] ∈ R^{n×k} and Q1
= [q1, q2, …, qk] ∈ R^{n×k}. We wish to determine the principal angles and vectors between the
spaces. This is equivalent to determining the matrices U1, V1, and T1 defined in Corollary
10.5.9. Since u1, …, uk and p1, …, pk are both bases of the same space, R(U1) = R(P1) = S.
Similarly R(V1) = R(Q1) = T. The matrices P1, Q1, U1, and V1 are all isometries (cf. Section
3.4) because they have orthonormal columns.
By Exercise 10.5.6, there exist orthogonal matrices M1, N1 ∈ R^{k×k} such that
U1 = P1M1 and V1 = Q1N1
If we can figure out how to calculate M1 and N1, we can use them to determine U1 and
V1. Recalling from Corollary 10.5.9 that U1ᵀV1 = T1, we have
P1ᵀQ1 = M1U1ᵀV1N1ᵀ = M1T1N1ᵀ
Since M1 and N1 are orthogonal and T1 is diagonal, M1T1N1ᵀ is the SVD of P1ᵀQ1. This
gives us a means of calculating the principal angles and vectors:
1. Calculate P1ᵀQ1.
2. Calculate the SVD P1ᵀQ1 = M1T1N1ᵀ. Let t1 ≥ t2 ≥ … ≥ tk denote the singular values.
(10.5.10)
3. θi = arccos ti, i = 1, …, k (principal angles).
4. U1 = P1M1 and V1 = Q1N1 (principal vectors).
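The four steps above can be sketched in NumPy as follows (the clipping guard and the example subspaces are additions of mine, not part of the text):

```python
import numpy as np

def principal_angles(P1, Q1):
    """Principal angles and vectors between R(P1) and R(Q1), following (10.5.10).

    P1 and Q1 must have orthonormal columns."""
    M1, t, N1t = np.linalg.svd(P1.T @ Q1)   # steps 1-2
    t = np.clip(t, -1.0, 1.0)               # guard against rounding past 1
    theta = np.arccos(t)                    # step 3
    U1, V1 = P1 @ M1, Q1 @ N1t.T            # step 4
    return theta, U1, V1

# Two 2-dimensional subspaces of R^3 sharing the direction e1.
P1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Q1 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
theta, U1, V1 = principal_angles(P1, Q1)

# First angle 0 (the shared direction), second pi/2 (e2 versus e3).
assert np.allclose(theta, [0.0, np.pi / 2])
```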
This also settles a theoretical question. The principal angles are uniquely determined:
they are determined by the singular values of a matrix that is independent of u1, …, uk and
v1, …, vk, so they do not depend on how the principal vectors are chosen. By step 4 of
(10.5.10), the principal vectors have exactly as much arbitrariness as singular vectors do.
As the following exercise shows, the computation θi = arccos ti cannot deliver
accurate values for angles near zero.
If we wish to calculate small principal angles accurately, we must find another
method. The following lemma is a start in that direction.
LEMMA 10.5.11 Let W1 ∈ R^{n×k} be an isometry, and consider the partitioned form
W1 = [W11; W21],  W11 ∈ R^{n1×k}, W21 ∈ R^{n2×k}, n1 + n2 = n
Let γ1 ≥ γ2 ≥ … ≥ γk be the singular values of W11, and let σ1 ≤ σ2 ≤ … ≤ σk be the singular
values of W21, in ascending order. Then
γi² + σi² = 1,  i = 1, …, k
Proof Since W1 has orthonormal columns,
I = W1ᵀW1 = W11ᵀW11 + W21ᵀW21
It follows immediately that W11ᵀW11 and W21ᵀW21 have common eigenvectors: if W11ᵀW11v
= λv, then W21ᵀW21v = μv, where
λ + μ = 1   (10.5.12)
and vice versa. Since the eigenvalues of W11ᵀW11 and W21ᵀW21 are γ1² ≥ γ2² ≥ … ≥ γk² and
σ1² ≤ σ2² ≤ … ≤ σk², respectively, (10.5.12) implies that γi² + σi² = 1 for i = 1, 2, …, k.
THEOREM 10.5.13 Let S and T be k-dimensional subspaces of R^n with principal angles
θ1 ≤ θ2 ≤ … ≤ θk. Let p1, …, pk and pk+1, …, pn be orthonormal bases for S and S⊥,
respectively, and let q1, …, qk and qk+1, …, qn be orthonormal bases for T and T⊥,
respectively. Let
P1 = [p1, …, pk],  P2 = [pk+1, …, pn]
Q1 = [q1, …, qk],  Q2 = [qk+1, …, qn]
Then the singular values of P2ᵀQ1 and P1ᵀQ2 are
sin θ1, sin θ2, …, sin θk
Proof Let P = [P1 P2] ∈ R^{n×n}. Then P is an orthogonal matrix. Since Q1 has orthonormal
columns, the matrix
W1 = PᵀQ1
must also have orthonormal columns. W1 can be written in the partitioned form
W1 = [P1ᵀQ1; P2ᵀQ1]
The singular values of P1ᵀQ1 are γi = cos θi, i = 1, …, k, so by Lemma 10.5.11 P2ᵀQ1 has
singular values
σi = √(1 − γi²) = √(1 − cos²θi) = sin θi,  i = 1, …, k
Reversing the roles of P and Q, we find that Q2ᵀP1 also has singular values sin θi, i = 1, …, k;
since P1ᵀQ2 = (Q2ᵀP1)ᵀ, the same is true of P1ᵀQ2.
Theorem 10.5.13 shows that there are simple relationships between the singular
values of these submatrices. A more detailed statement of the relationships between the
singular values and singular vectors of the blocks of an orthogonal or unitary matrix is
given by Stewart (1977), Theorem A1.
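The computational point of Theorem 10.5.13 is that a small angle can be recovered from its sine, where arccos of its cosine fails. A minimal NumPy sketch (the one-dimensional "subspaces" of R² are a deliberately trivial illustration of mine):

```python
import numpy as np

a = 1e-9                                      # a tiny principal angle
p1 = np.array([1.0, 0.0])                     # orthonormal basis of S
p2 = np.array([0.0, 1.0])                     # orthonormal basis of S-perp
q1 = np.array([np.cos(a), np.sin(a)])         # orthonormal basis of T

theta_from_cos = np.arccos(min(abs(p1 @ q1), 1.0))  # P1^T Q1 route
theta_from_sin = np.arcsin(min(abs(p2 @ q1), 1.0))  # P2^T Q1 route (Thm 10.5.13)

# cos(a) has already rounded to 1 in double precision, so the arccos route
# loses the angle entirely; the sine route recovers it to full accuracy.
assert np.isclose(theta_from_sin, a, rtol=1e-6, atol=0.0)
assert not np.isclose(theta_from_cos, a, rtol=1e-6, atol=0.0)
```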
The following theorem makes it very easy to work with the distance function.
THEOREM 10.5.15 In the notation of Theorem 10.5.13,
d(S,T) = ‖Q2ᵀP1‖₂ = ‖Q1ᵀP2‖₂ = ‖P1ᵀQ2‖₂ = ‖P2ᵀQ1‖₂
Proof Let u ∈ S with ‖u‖₂ = 1. Since R^n = T ⊕ T⊥, u can be expressed uniquely as a
sum, u = v + v⊥, where v ∈ T and v⊥ ∈ T⊥. By the projection theorem (Theorem 3.5.6),
d(u,T) = ‖v⊥‖₂. Letting Q be the orthogonal matrix Q = [Q1 Q2], we have
Qᵀv⊥ = [Q1ᵀv⊥; Q2ᵀv⊥] = [0; Q2ᵀv⊥]
since v⊥ ∈ T⊥ implies Q1ᵀv⊥ = 0. Hence ‖v⊥‖₂ = ‖Qᵀv⊥‖₂ = ‖Q2ᵀv⊥‖₂. Notice also that
Q2ᵀv = 0, so Q2ᵀv⊥ = Q2ᵀu. Thus d(u,T) = ‖Q2ᵀu‖₂. Every such u has the form u = P1w
with ‖w‖₂ = 1, and maximizing d(u,T) over all such u gives d(S,T) = ‖Q2ᵀP1‖₂.
By Theorem 10.3.1, the spectral norm of a matrix is equal to the largest singular value.
Since Q1ᵀP2 has the same singular values as Q2ᵀP1 (Theorem 10.5.13), we have d(S,T) =
‖Q1ᵀP2‖₂ as well. Since P1ᵀQ2 and P2ᵀQ1 are the transposes of Q2ᵀP1 and Q1ᵀP2,
respectively, it is also true that d(S,T) = ‖P1ᵀQ2‖₂ = ‖P2ᵀQ1‖₂.
Here R1 denotes a matrix whose columns form an orthonormal basis of a third subspace U, so that
d(S,U) = ‖P2ᵀR1‖₂ by Theorem 10.5.15. Since Q1Q1ᵀ + Q2Q2ᵀ = QQᵀ = I,
‖P2ᵀR1‖₂ = ‖P2ᵀ(Q1Q1ᵀ + Q2Q2ᵀ)R1‖₂
= ‖(P2ᵀQ1)(Q1ᵀR1) + (P2ᵀQ2)(Q2ᵀR1)‖₂
≤ ‖P2ᵀQ1‖₂ ‖Q1ᵀR1‖₂ + ‖P2ᵀQ2‖₂ ‖Q2ᵀR1‖₂
≤ ‖P2ᵀQ1‖₂ + ‖Q2ᵀR1‖₂ = d(S,T) + d(T,U)
This completes the proof.
PROOF. If H ≥ 0 then, by definition, its eigenvalues {λi}_{i=1}^n are nonnegative and we can
define a real matrix D0 = diag[√λ1, √λ2, …, √λn]. The matrix H is normal and hence there
is a unitary matrix U such that H = UDU*, where D = D0². The required square root of H is
the matrix
H0 = UD0U*,   (1)
since H0² = UD0U*UD0U* = UD0²U* = H. Note that the representation (1) shows that the
eigenvalues of H0 are the (arithmetic) square roots of the eigenvalues of H. This proves that
rank H0 = rank H (Exercise 5.3.9), and H ≥ 0 yields H0 ≥ 0.
Conversely, if H = H0² and H0 ≥ 0, then the eigenvalues of H, being the squares of
those of H0, are nonnegative. Hence H ≥ 0. This fact also implies the equality of ranks.
The same argument proves the theorem for positive definite matrices. ■
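The construction in the proof is directly computable. A NumPy sketch for the real symmetric case (function name and test matrix are mine):

```python
import numpy as np

def psd_sqrt(H):
    """H0 = U D0 U* from Eq. (1): eigendecompose H, take square roots of eigenvalues."""
    lam, U = np.linalg.eigh(H)            # H = U diag(lam) U^T, lam real
    lam = np.clip(lam, 0.0, None)         # clear tiny negative rounding noise
    return (U * np.sqrt(lam)) @ U.T       # U D0 U^T

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
H = B @ B.T                               # a random positive semi-definite H

H0 = psd_sqrt(H)
assert np.allclose(H0 @ H0, H)                    # H0 is a square root of H
assert np.all(np.linalg.eigvalsh(H0) >= -1e-12)   # and H0 >= 0
```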
The following simple corollary gives a proof of the necessary part of Theorem
5.3.3.
Corollary 1. If H ≥ 0 (or H > 0), then (Hx, x) ≥ 0 (or (Hx, x) > 0) for all x ∈ F^n.
PROOF. Representing H = H0² and using the matrix version of Theorem 5.1.1, we have
(Hx, x) = (H0²x, x) = (H0x, H0x) ≥ 0
for all x ∈ F^n. ■
Note that the positive semi-definite square root H0 of H is unique.
PROOF. Let H1 satisfy H1² = H. By Theorem 4.11.3, the eigenvalues of H1 are square roots
of those of H and, since H1 ≥ 0, they must be nonnegative. Thus, the eigenvalues of H1 and of
H0 in Eq. (1) coincide. Furthermore, H1 is Hermitian and, therefore (Theorem 5.2.1), H1 =
VD0V* for some unitary V. Now H1² = H0² = H, so VD0²V* = UD0²U* and hence (U*V)D0²
= D0²(U*V) and, consequently, H1 = H0, as required. ■
In the sequel the unique positive semi-definite (or definite) square root of a
positive semi-definite (or definite) matrix H is denoted by H^{1/2}. Summarizing the above
discussion, note that λi ∈ σ(H^{1/2}) if and only if λi² ∈ σ(H), and the corresponding
eigenspaces of H^{1/2} and H coincide. The concept of a square root of a positive semi-definite
matrix allows us to introduce a spectral characteristic for rectangular matrices.
Consider an arbitrary m × n matrix A. The n × n matrix A*A is (generally)
positive semi-definite (see Exercise 5.3.5). Therefore by Theorem 1 the matrix A*A has a
positive semi-definite square root H1 such that A*A = H1². The eigenvalues λ1, λ2, …, λn
of the matrix H1 = (A*A)^{1/2} are referred to as the singular values s1, s2, …, sn of the
(generally rectangular) matrix A. Thus, for i = 1, 2, …, n,
si(A) = λi((A*A)^{1/2})
Obviously, the singular values of a matrix are nonnegative numbers.
Note that the singular values of A are sometimes defined as the eigenvalues of the
matrix (AA*)^{1/2} of order m. It follows from the next fact that the difference in definition is
not highly significant.
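The definition can be checked against a library SVD. A NumPy sketch on an illustrative random real matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))

# Eigenvalues of A*A come from eigvalsh in ascending order; flip to descending ...
lam = np.linalg.eigvalsh(A.T @ A)[::-1]
s_from_def = np.sqrt(np.clip(lam, 0.0, None))   # eigenvalues of (A*A)^{1/2}

# ... and they reproduce the singular values delivered by the SVD.
assert np.allclose(s_from_def, np.linalg.svd(A, compute_uv=False))
```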
Theorem 2. The nonzero eigenvalues of the matrices (A*A)^{1/2} and (AA*)^{1/2} coincide.
PROOF. First we observe that it suffices to prove the assertion of the theorem for the
matrices A*A and AA*. Furthermore, we select eigenvectors x1, x2, …, xn of A*A
corresponding to eigenvalues λ1, λ2, …, λn such that {x1, x2, …, xn} forms an orthonormal
basis in F^n. We have
(A*Axi, xj) = λi(xi, xj) = λiδij,  i, j = 1, 2, …, n.
On the other hand, (A*Axi, xj) = (Axi, Axj), and comparison shows that (Axi, Axi) = λi, i = 1,
2, …, n. Thus, Axi = 0 (1 ≤ i ≤ n) if and only if λi = 0. Since
AA*(Axi) = A(A*Axi) = λiAxi,
the preceding remark shows that for λi ≠ 0, the vector Axi is an eigenvector of AA*. Hence
if a nonzero λi ∈ σ(A*A) is associated with the eigenvector xi, then λi ∈ σ(AA*) and is
associated with the eigenvector Axi. In particular, every nonzero point of σ(A*A) belongs to
σ(AA*). The opposite inclusion is obtained by exchanging the roles of A and A*. ■
Thus the eigenvalues of A*A and AA*, as well as those of (A*A)^{1/2} and (AA*)^{1/2}, differ only
by the geometric multiplicity of the zero eigenvalue, which is n − r for A*A and m − r for AA*,
where r = rank(A*A) = rank(AA*). Also, for a square matrix A it follows immediately that
the eigenvalues of A*A and AA* coincide and have the same multiplicities. [Compare this
with the result of Exercise 4.14.10(b).]
Note that we proved more than was stated in Theorem 2.
Exercise 4. Verify that the nonzero singular values of the matrices A and A* are the
same. If, in addition, A is a square n x n matrix, show that si(A) = si(A*) for i = 1, 2, . . . , n.
□
Proposition 2. The singular values of a square matrix are invariant under unitary
transformation.
PROOF. By definition,
si(UA) = λi((A*U*UA)^{1/2}) = λi((A*A)^{1/2}) = si(A).
To prove the second equality in Eq. (2), use Exercise 4 and the part of the proposition
already proved. ■
PROOF. Let λi denote an eigenvalue of A corresponding to the eigenvector xi. Since A*A =
AA*, it follows (see Exercise 5.2.7) that
A*Axi = λiA*xi = λiλ̄ixi = |λi|²xi.   (3)
Exercise 6. Confirm that a square matrix is unitary if and only if all its singular
values are equal to one. □
The following result has its origin in the familiar polar form of a complex number:
λ = λ0e^{iγ}, where λ0 ≥ 0 and 0 ≤ γ < 2π.
PROOF. Let
λ1 ≥ λ2 ≥ … ≥ λr > 0 = λr+1 = … = λn
denote the eigenvalues of A*A (see Theorem 5.4.2), with corresponding eigenvectors x1, x2,
…, xn that comprise an orthonormal basis in F^n.
Then, by Exercise 5.4.3, the normalized elements
yi = (1/‖Axi‖)Axi = (1/√λi)Axi,  i = 1, 2, …, r,
are orthonormal eigenvectors of AA*
corresponding to the eigenvalues λ1, λ2, …, λr, respectively. We extend y1, …, yr to an
orthonormal eigenbasis {yi}_{i=1}^n for AA*.
Proceeding to the construction of the matrices H and U in Eq. (1), we write H =
(AA*)^{1/2} and note (see Section 5.4) that Hyi = √λi yi for i = 1, 2, …, n. Also, we
introduce an n × n (transition) matrix U by Uxi = yi, i = 1, 2, …, n.
Note that Corollary 5.6.1 asserts that U is unitary. We have
HUxi = Hyi = √λi yi = Axi,  i = 1, 2, …, r.   (3)
Since (Axi, Axi) = (A*Axi, xi) = λi = 0 for i = r + 1, r + 2, …, n, it follows that Axi
= 0 (r + 1 ≤ i ≤ n). Furthermore, as observed in Section 5.4, AA*yi = 0 implies Hyi = 0, so that
HUxi = Hyi = 0 = Axi,  i = r + 1, r + 2, …, n.   (4)
Thus the equalities (3) and (4) for basis elements clearly give HUx = Ax for all x ∈ F^n,
proving Eq. (1). ■
Note that if A is nonsingular, so is (AA*)^{1/2}, and in the polar decomposition (1) the
matrix H = (AA*)^{1/2} is positive definite. Observe that in this case the unitary matrix U can
be chosen to be H⁻¹A and the representation (1) is unique.
A dual polar decomposition can be established similarly.
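In practice the polar decomposition is conveniently computed from the SVD rather than from the eigenvector construction in the proof: if A = WSV*, then H = WSW* = (AA*)^{1/2} and U = WV* is unitary. A NumPy sketch for the real square case (names and test matrix are mine):

```python
import numpy as np

def polar(A):
    """Polar decomposition A = H U with H positive semi-definite, U orthogonal."""
    W, s, Vt = np.linalg.svd(A)
    H = (W * s) @ W.T          # H = W diag(s) W^T = (A A^T)^{1/2}
    U = W @ Vt                 # orthogonal factor
    return H, U

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))
H, U = polar(A)

assert np.allclose(H @ U, A)                      # A = H U
assert np.allclose(U @ U.T, np.eye(4))            # U is orthogonal
assert np.all(np.linalg.eigvalsh(H) >= -1e-12)    # H >= 0
```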
Proposition 1. A matrix A ∈ F^{n×n} is normal if and only if the matrices H and U in Eq. (1)
commute.
Note the structure of the matrices U and V in Eqs. (7) and (9): the columns of U
(respectively, V), viewed as vectors from F^m (respectively, F^n), constitute an orthonormal
eigenbasis of AA* in F^m (respectively, of A*A in F^n). Thus, a singular-value decomposition
of A can be obtained by solving the eigenvalue-eigenvector problem for the matrices AA*
and A*A.
Example 4. Consider the matrix A defined in Exercise 5.4.2; it has singular values s1 =
√2 and s2 = 1. To find a singular-value decomposition of A, we compute A*A and AA* and
construct orthonormal eigenbases for these matrices. The standard basis in F^2 can be used
for A*A, and the system {[α 0 α]ᵀ, [0 1 0]ᵀ, [α 0 −α]ᵀ}, where α = 1/√2, can be used
for AA*. Hence
A = [α 0 α; 0 1 0; α 0 −α] · [√2 0; 0 1; 0 0] · [1 0; 0 1]
is a singular-value decomposition of A. □
The relation (7) gives rise to the general notion of unitary equivalence of two m ×
n matrices A and B. We say that A and B are unitarily equivalent if there exist unitary
matrices U and V such that A = UBV*. It is clear that unitary equivalence is an equivalence
relation and that Theorem 2 can be interpreted as asserting the existence of a canonical
form (the matrix D) with respect to unitary equivalence in each equivalence class. Now the
next result is to be expected.
Proposition 2. Two m x n matrices are unitarily equivalent if and only if they have the
same singular values.
Corollary 1. Two m × n matrices A and B are unitarily equivalent if and only if the
matrices A*A and B*B are similar.