Appendix C

Symmetric matrices

C.1 Spaces of Matrices


Let Sm be the space of symmetric m × m matrices, and Mm,n be the space of rectangular m × n
matrices with real entries. From the viewpoint of their linear structure (i.e., the operations of
addition and multiplication by reals), Sm is just the arithmetic linear space R^{m(m+1)/2} of dimension
m(m+1)/2: by arranging the elements of a symmetric m × m matrix X in a single column, say,
in the row-by-row order, you get a usual m²-dimensional column vector; multiplication of a matrix
by a real and addition of matrices correspond to the same operations with the “representing
vector(s)”. When X runs through Sm, the vector representing X runs through the m(m+1)/2-
dimensional subspace of R^{m²} consisting of vectors satisfying the “symmetry condition”: the
coordinates coming from pairs of entries of X symmetric to each other are equal to each other.
Similarly, Mm,n as a linear space is just Rmn , and it is natural to equip Mm,n with the inner
product defined as the usual inner product of the vectors representing the matrices:
    ⟨X, Y⟩ = ∑_{i=1}^{m} ∑_{j=1}^{n} Xij Yij = Tr(X^T Y).

Here Tr stands for the trace – the sum of diagonal elements of a (square) matrix. With this inner
product (called the Frobenius inner product), Mm,n becomes a legitimate Euclidean space, and
we may use in connection with this space all notions based upon the Euclidean structure, e.g.,
the (Frobenius) norm of a matrix
    ∥X∥2 = √⟨X, X⟩ = √( ∑_{i=1}^{m} ∑_{j=1}^{n} Xij² ) = √Tr(X^T X)

and likewise the notions of orthogonality, orthogonal complement of a linear subspace, etc.
The same applies to the space Sm equipped with the Frobenius inner product; of course, the
Frobenius inner product of symmetric matrices can be written without the transposition sign:
⟨X, Y ⟩ = Tr(XY ), X, Y ∈ Sm .
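
For readers who like to experiment, here is a minimal numpy sketch (with arbitrary illustrative matrices, not data from the text) checking numerically that the elementwise sum and the trace formula give the same Frobenius inner product, and that the resulting norm agrees with numpy's built-in 'fro' norm:

    # A minimal numpy check of the Frobenius inner product and norm; X, Y are
    # arbitrary illustrative matrices, not data from the text.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 4))
    Y = rng.standard_normal((3, 4))

    # <X, Y> = sum_ij X_ij Y_ij = Tr(X^T Y)
    assert np.isclose((X * Y).sum(), np.trace(X.T @ Y))

    # ||X||_2 = sqrt(<X, X>) = sqrt(Tr(X^T X)) -- the Frobenius norm
    assert np.isclose(np.sqrt(np.trace(X.T @ X)), np.linalg.norm(X, 'fro'))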

C.2 Eigenvalue Decomposition


Let us focus on the space Sm of symmetric matrices. The most important property of these
matrices is as follows:


Theorem C.2.1 [Eigenvalue decomposition] An n × n matrix A is symmetric if and only if it
admits an orthonormal system of eigenvectors: there exists an orthonormal basis {e1, ..., en} such
that
Aei = λi ei , i = 1, ..., n, (C.2.1)
for reals λi .

In connection with Theorem C.2.1, it is worth recalling the following notions and facts:

C.2.A. Eigenvectors and eigenvalues. An eigenvector of an n × n matrix A is a nonzero


vector e (real or complex) such that Ae = λe for a (real or complex) scalar λ; this scalar is called
the eigenvalue of A corresponding to the eigenvector e.
Eigenvalues of A are exactly the roots of the characteristic polynomial

π(z) = det(zI − A) = z^n + b1 z^{n−1} + b2 z^{n−2} + ... + bn

of A.
Theorem C.2.1 states, in particular, that for a symmetric matrix A, all eigenvalues are real,
and the corresponding eigenvectors can be chosen to be real and to form an orthonormal basis
in Rn .

C.2.B. Eigenvalue decomposition of a symmetric matrix. Theorem C.2.1 admits an
equivalent reformulation as follows (check the equivalence!):

Theorem C.2.2 An n × n matrix A is symmetric if and only if it can be represented in the
form
A = U ΛU T , (C.2.2)
where

• U is an orthogonal matrix: U^{−1} = U^T (or, which is the same, U^T U = I, or, which is the
  same, U U^T = I, or, which is the same, the columns of U form an orthonormal basis in
  Rn, or, which is the same, the rows of U form an orthonormal basis in Rn).

• Λ is the diagonal matrix with the diagonal entries λ1 , ..., λn .

Representation (C.2.2) with orthogonal U and diagonal Λ is called the eigenvalue decomposition
of A. In such a representation,

• The columns of U form an orthonormal system of eigenvectors of A;

• The diagonal entries in Λ are the eigenvalues of A corresponding to these eigenvectors.
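
As a numerical illustration of Theorem C.2.2, the following numpy sketch (assuming a randomly generated symmetric matrix) computes an eigenvalue decomposition with numpy.linalg.eigh and verifies the properties listed above; note that eigh returns eigenvalues in ascending order, so they are reversed here to match the non-ascending convention used below:

    # Eigenvalue decomposition of a randomly generated symmetric matrix.
    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((5, 5))
    A = (B + B.T) / 2                        # a symmetric test matrix

    lam, U = np.linalg.eigh(A)               # eigh: lam in ascending order
    lam, U = lam[::-1], U[:, ::-1]           # reorder so lam_1 >= ... >= lam_n

    assert np.allclose(U.T @ U, np.eye(5))               # U is orthogonal
    assert np.allclose(A, U @ np.diag(lam) @ U.T)        # A = U Lambda U^T
    assert np.allclose(A @ U, U @ np.diag(lam))          # A e_i = lam_i e_i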

C.2.C. Vector of eigenvalues. When speaking about eigenvalues λi (A) of a symmetric n×n
matrix A, we always arrange them in the non-ascending order:

λ1 (A) ≥ λ2 (A) ≥ ... ≥ λn (A);

λ(A) ∈ Rn denotes the vector of eigenvalues of A taken in the above order.



C.2.D. Freedom in eigenvalue decomposition. Part of the data Λ, U in the eigenvalue


decomposition (C.2.2) is uniquely defined by A, while the other data admit certain “freedom”.
Specifically, the sequence λ1 , ..., λn of eigenvalues of A (i.e., diagonal entries of Λ) is exactly
the sequence of roots of the characteristic polynomial of A (every root is repeated according to
its multiplicity) and thus is uniquely defined by A (provided that we arrange the entries of the
sequence in the non-ascending order). The columns of U are not uniquely defined by A. What is
uniquely defined, are the linear spans E(λ) of the columns of U corresponding to all eigenvalues
equal to certain λ; such a linear span is nothing but the spectral subspace {x : Ax = λx} of
A corresponding to the eigenvalue λ. There are as many spectral subspaces as there are distinct
eigenvalues; spectral subspaces corresponding to different eigenvalues of a symmetric matrix are
orthogonal to each other, and their sum is the entire space. When building an orthogonal matrix
U in the spectral decomposition, one chooses an orthonormal eigenbasis in the spectral subspace
corresponding to the largest eigenvalue and makes the vectors of this basis the first columns in U ,
then chooses an orthonormal basis in the spectral subspace corresponding to the second largest
eigenvalue and makes the vectors from this basis the next columns of U , and so on.

C.2.E. “Simultaneous” decomposition of commuting symmetric matrices. Let


A1 , ..., Ak be n × n symmetric matrices. It turns out that the matrices commute with each
other (Ai Aj = Aj Ai for all i, j) if and only if they can be “simultaneously diagonalized”, i.e.,
there exist a single orthogonal matrix U and diagonal matrices Λ1 ,...,Λk such that

Ai = U Λi U T , i = 1, ..., k.

You are welcome to prove this statement by yourself; to simplify your task, here are two simple
statements, important in their own right, which help to reach the target:

C.2.E.1: Let λ be a real and A, B be two commuting n × n matrices. Then the


spectral subspace E = {x : Ax = λx} of A corresponding to λ is invariant for B
(i.e., Be ∈ E for every e ∈ E).
C.2.E.2: If A is an n × n matrix and L is an invariant subspace of A (i.e., L is a
linear subspace such that Ae ∈ L whenever e ∈ L), then the orthogonal complement
L⊥ of L is invariant for the matrix A^T. In particular, if A is symmetric and L is an
invariant subspace of A, then L⊥ is an invariant subspace of A as well.
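
The following small numpy experiment (with synthetic matrices built from a common orthogonal matrix) illustrates C.2.E: the two matrices commute, and diagonalizing a generic linear combination of them recovers a common orthonormal eigenbasis. This is only a sanity check in the generic case of simple eigenvalues, not a proof:

    # Two commuting symmetric matrices built from a common orthogonal U.
    import numpy as np

    rng = np.random.default_rng(2)
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))     # random orthogonal Q
    A1 = Q @ np.diag([3.0, 1.0, 0.0, -2.0]) @ Q.T
    A2 = Q @ np.diag([5.0, 5.0, 2.0, 1.0]) @ Q.T

    assert np.allclose(A1 @ A2, A2 @ A1)                 # A1 and A2 commute

    # Diagonalizing a generic combination yields a common eigenbasis U;
    # U^T Ai U is then diagonal for both matrices.
    _, U = np.linalg.eigh(A1 + np.pi * A2)
    for Ai in (A1, A2):
        D = U.T @ Ai @ U
        assert np.allclose(D, np.diag(np.diag(D)))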

C.3 Variational Characterization of Eigenvalues


Theorem C.3.1 [VCE – Variational Characterization of Eigenvalues] Let A be a symmetric
matrix. Then
λℓ(A) = min_{E∈Eℓ} max_{x∈E, x^T x=1} x^T Ax,    ℓ = 1, ..., n,    (C.3.1)

where Eℓ is the family of all linear subspaces in Rn of the dimension n − ℓ + 1.

VCE says that to get the largest eigenvalue λ1 (A), you should maximize the quadratic form
xT Ax over the unit sphere S = {x ∈ Rn : xT x = 1}; the maximum is exactly λ1 (A). To get
the second largest eigenvalue λ2 (A), you should act as follows: you choose a linear subspace E
of dimension n − 1 and maximize the quadratic form xT Ax over the cross-section of S by this

subspace; the maximum value of the form depends on E, and you minimize this maximum over
linear subspaces E of the dimension n − 1; the result is exactly λ2 (A). To get λ3 (A), you replace
in the latter construction subspaces of the dimension n − 1 by those of the dimension n − 2,
and so on. In particular, the smallest eigenvalue λn (A) is just the minimum, over all linear
subspaces E of the dimension n − n + 1 = 1, i.e., over all lines passing through the origin, of the
quantities xT Ax, where x ∈ E is unit (xT x = 1); in other words, λn (A) is just the minimum of
the quadratic form xT Ax over the unit sphere S.
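A quick numerical sanity check of the two boundary cases of VCE (largest and smallest eigenvalue) can be done in numpy; random unit vectors only approach the extreme values, so the sketch below checks one-sided bounds:

    # Sampling-based check: lambda_n(A) <= x^T A x <= lambda_1(A) on the sphere.
    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((6, 6))
    A = (B + B.T) / 2
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]           # lam[0] >= ... >= lam[-1]

    X = rng.standard_normal((10000, 6))
    X /= np.linalg.norm(X, axis=1, keepdims=True)        # points on the unit sphere
    quad = np.einsum('ij,jk,ik->i', X, A, X)             # x^T A x for each sample

    assert quad.max() <= lam[0] + 1e-9 and quad.min() >= lam[-1] - 1e-9
    # The bounds are attained at the corresponding unit eigenvectors.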
Proof of the VCE is pretty easy. Let e1 , ..., en be an orthonormal eigenbasis of A: Aeℓ =
λℓ (A)eℓ . For 1 ≤ ℓ ≤ n, let Fℓ = Lin{e1 , ..., eℓ }, Gℓ = Lin{eℓ , eℓ+1 , ..., en }. Finally, for
x ∈ Rn let ξ(x) be the vector of coordinates of x in the orthonormal basis e1 , ..., en . Note
that
xT x = ξ T (x)ξ(x),
since {e1 , ..., en } is an orthonormal basis, and that
    x^T Ax = x^T A(∑i ξi(x)ei) = x^T(∑i λi(A)ξi(x)ei) = ∑i λi(A)ξi(x)(x^T ei) = ∑i λi(A)ξi²(x),    (C.3.2)

where the concluding equality uses x^T ei = ξi(x).

Now, given ℓ, 1 ≤ ℓ ≤ n, let us set E = Gℓ ; note that E is a linear subspace of the dimension
n − ℓ + 1. In view of (C.3.2), the maximum of the quadratic form xT Ax over the intersection
of our E with the unit sphere is
    max{ ∑_{i=ℓ}^{n} λi(A)ξi² : ∑_{i=ℓ}^{n} ξi² = 1 },

and the latter quantity clearly equals max_{ℓ≤i≤n} λi(A) = λℓ(A). Thus, for an appropriately chosen
E ∈ Eℓ, the inner maximum in the right hand side of (C.3.1) equals λℓ(A), whence the
right hand side of (C.3.1) is ≤ λℓ (A). It remains to prove the opposite inequality. To this end,
consider a linear subspace E of the dimension n − ℓ + 1 and observe that it has nontrivial
intersection with the linear subspace Fℓ of the dimension ℓ (indeed, dim E + dim Fℓ =
(n − ℓ + 1) + ℓ > n, so that dim(E ∩ Fℓ) > 0 by the Dimension formula). It follows that there
exists a unit vector y belonging to both E and Fℓ . Since y is a unit vector from Fℓ , we have
y = ∑_{i=1}^{ℓ} ηi ei with ∑_{i=1}^{ℓ} ηi² = 1, whence, by (C.3.2),

    y^T Ay = ∑_{i=1}^{ℓ} λi(A)ηi² ≥ min_{1≤i≤ℓ} λi(A) = λℓ(A).

Since y is in E, we conclude that


    max_{x∈E: x^T x=1} x^T Ax ≥ y^T Ay ≥ λℓ(A).

Since E is an arbitrary subspace from Eℓ, we conclude that the right hand side in (C.3.1) is
≥ λℓ (A).
A simple and useful byproduct of our reasoning is the relation (C.3.2):
Corollary C.3.1 For a symmetric matrix A, the quadratic form x^T Ax is a weighted sum of
squares of the coordinates ξi(x) of x taken with respect to an orthonormal eigenbasis of A; the
weights in this sum are exactly the eigenvalues of A:

    x^T Ax = ∑i λi(A)ξi²(x).

Corollaries of the VCE


VCE admits a number of extremely important corollaries as follows:

C.3.A. Eigenvalue characterization of positive (semi)definite matrices. Recall that a


matrix A is called positive definite (notation: A ≻ 0), if it is symmetric and the quadratic form
xT Ax is positive outside the origin; A is called positive semidefinite (notation: A ≽ 0), if A is
symmetric and the quadratic form xT Ax is nonnegative everywhere. VCE provides us with the
following eigenvalue characterization of positive (semi)definite matrices:
Proposition C.3.1 A symmetric matrix A is positive semidefinite if and only if all its eigenvalues
are nonnegative; A is positive definite if and only if all its eigenvalues are positive.
Indeed, A is positive definite, if and only if the minimum value of xT Ax over the unit sphere
is positive, and is positive semidefinite, if and only if this minimum value is nonnegative; it
remains to note that by VCE, the minimum value of xT Ax over the unit sphere is exactly the
minimum eigenvalue of A.
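
Proposition C.3.1 suggests a simple numerical test for positive (semi)definiteness; a hedged numpy sketch is given below, where the tolerance 1e-10 is an arbitrary allowance for floating-point noise:

    # Numerical test for positive (semi)definiteness via eigenvalues.
    import numpy as np

    def is_psd(A, tol=1e-10):
        """A = A^T and all eigenvalues >= -tol."""
        return np.allclose(A, A.T) and np.linalg.eigvalsh(A).min() >= -tol

    def is_pd(A, tol=1e-10):
        """A = A^T and all eigenvalues > tol."""
        return np.allclose(A, A.T) and np.linalg.eigvalsh(A).min() > tol

    D = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
    assert is_psd(D.T @ D)                   # Gram matrices are always >= 0
    assert not is_pd(np.diag([1.0, 0.0]))    # a zero eigenvalue rules out > 0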

C.3.B. ≽-Monotonicity of the vector of eigenvalues. Let us write A ≽ B (A ≻ B) to


express that A, B are symmetric matrices of the same size such that A−B is positive semidefinite
(respectively, positive definite).
Proposition C.3.2 If A ≽ B, then λ(A) ≥ λ(B), and if A ≻ B, then λ(A) > λ(B).
Indeed, when A ≽ B, then, of course,
    max_{x∈E: x^T x=1} x^T Ax ≥ max_{x∈E: x^T x=1} x^T Bx

for every linear subspace E, whence


λℓ(A) = min_{E∈Eℓ} max_{x∈E: x^T x=1} x^T Ax ≥ min_{E∈Eℓ} max_{x∈E: x^T x=1} x^T Bx = λℓ(B),    ℓ = 1, ..., n,

i.e., λ(A) ≥ λ(B). The case of A ≻ B can be considered similarly.

C.3.C. Eigenvalue Interlacement Theorem. We shall formulate this extremely important


theorem as follows:
Theorem C.3.2 [Eigenvalue Interlacement Theorem] Let A be a symmetric n × n matrix and
Ā be the angular (n − k) × (n − k) submatrix of A. Then, for every ℓ ≤ n − k, the ℓ-th eigenvalue
of Ā separates the ℓ-th and the (ℓ + k)-th eigenvalues of A:
    λℓ(A) ≥ λℓ(Ā) ≥ λℓ+k(A).    (C.3.3)
Indeed, by VCE, λℓ(Ā) = min_{E∈Ēℓ} max_{x∈E: x^T x=1} x^T Ax, where Ēℓ is the family of all linear subspaces of
the dimension n − k − ℓ + 1 contained in the linear subspace {x ∈ Rn : xn−k+1 = xn−k+2 = ... =
xn = 0}. Since Ēℓ ⊂ Eℓ+k, we have

    λℓ(Ā) = min_{E∈Ēℓ} max_{x∈E: x^T x=1} x^T Ax ≥ min_{E∈Eℓ+k} max_{x∈E: x^T x=1} x^T Ax = λℓ+k(A).

We have proved the second inequality in (C.3.3). Applying this inequality to the matrix −A, we
get
    −λℓ(Ā) = λn−k−ℓ+1(−Ā) ≥ λn−ℓ+1(−A) = −λℓ(A),
or, which is the same, λℓ(Ā) ≤ λℓ(A), which is the first inequality in (C.3.3).
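
A numerical check of the Eigenvalue Interlacement Theorem on a random symmetric matrix and its angular submatrix (a sketch, with an arbitrary choice n = 7, k = 2):

    # Interlacement check: eigenvalues of the leading (n-k) x (n-k) block.
    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 7, 2
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    A_bar = A[:n - k, :n - k]                            # angular submatrix

    lam = np.sort(np.linalg.eigvalsh(A))[::-1]           # non-ascending order
    lam_bar = np.sort(np.linalg.eigvalsh(A_bar))[::-1]

    for ell in range(n - k):                             # 0-based index
        assert lam[ell] >= lam_bar[ell] - 1e-12          # lambda_l(A) >= lambda_l(A_bar)
        assert lam_bar[ell] >= lam[ell + k] - 1e-12      # lambda_l(A_bar) >= lambda_{l+k}(A)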

C.4 Positive Semidefinite Matrices and the Semidefinite Cone


C.4.A. Positive semidefinite matrices. Recall that an n × n matrix A is called positive
semidefinite (notation: A ≽ 0), if A is symmetric and produces nonnegative quadratic form:

A ≽ 0 ⇔ {A = AT and xT Ax ≥ 0 ∀x}.

A is called positive definite (notation: A ≻ 0), if it is positive semidefinite and the corresponding
quadratic form is positive outside the origin:

A ≻ 0 ⇔ {A = AT and xT Ax > 0 ∀x ≠ 0}.

It makes sense to list a number of equivalent definitions of a positive semidefinite matrix:

Theorem C.4.1 Let A be a symmetric n × n matrix. Then the following properties of A are
equivalent to each other:
(i) A ≽ 0
(ii) λ(A) ≥ 0
(iii) A = DT D for certain rectangular matrix D
(iv) A = ∆T ∆ for certain upper triangular n × n matrix ∆
(v) A = B 2 for certain symmetric matrix B;
(vi) A = B 2 for certain B ≽ 0.
The following properties of a symmetric matrix A also are equivalent to each other:
(i′ ) A ≻ 0
(ii′ ) λ(A) > 0
(iii′ ) A = DT D for certain rectangular matrix D of rank n
(iv′ ) A = ∆T ∆ for certain nondegenerate upper triangular n × n matrix ∆
(v′ ) A = B 2 for certain nondegenerate symmetric matrix B;
(vi′ ) A = B 2 for certain B ≻ 0.

Proof. (i)⇔(ii): this equivalence is stated by Proposition C.3.1.


(ii)⇒(vi): Let A = U ΛU^T be the eigenvalue decomposition of A, so that U is orthogonal and
Λ is diagonal with nonnegative diagonal entries λi(A) (we are in the situation of (ii)!). Let Λ^{1/2}
be the diagonal matrix with the diagonal entries λi^{1/2}(A); note that (Λ^{1/2})² = Λ. The matrix
B = U Λ^{1/2} U^T is symmetric with nonnegative eigenvalues λi^{1/2}(A), so that B ≽ 0 by Proposition
C.3.1, and

    B² = U Λ^{1/2} (U^T U) Λ^{1/2} U^T = U (Λ^{1/2})² U^T = U ΛU^T = A    (since U^T U = I),

as required in (vi).
(vi)⇒(v): evident.
(v)⇒(iv): Let A = B² with certain symmetric B, and let bi be the i-th column of B. Applying
the Gram-Schmidt orthogonalization process (see proof of Theorem A.2.3.(iii)), we can find an
orthonormal system of vectors u1, ..., un and a lower triangular matrix L such that bi = ∑_{j=1}^{i} Lij uj,
or, which is the same, B^T = LU, where U is the orthogonal matrix with the rows u1^T, ..., un^T. We
now have A = B² = B^T(B^T)^T = LU U^T L^T = LL^T. We see that A = ∆^T ∆, where the matrix
∆ = L^T is upper triangular.
(iv)⇒(iii): evident.

(iii)⇒(i): If A = DT D, then xT Ax = (Dx)T (Dx) ≥ 0 for all x.


We have proved the equivalence of the properties (i) – (vi). Slightly modifying the reasoning
(do it yourself!), one can prove the equivalence of the properties (i′ ) – (vi′ ).

Remark C.4.1 (i) [Checking positive semidefiniteness] Given an n × n symmetric matrix A,
one can check whether it is positive semidefinite by a purely algebraic finite algorithm (the so-
called Lagrange diagonalization of a quadratic form) which requires at most O(n³) arithmetic
operations. Positive definiteness of a matrix can also be checked by the Choleski factorization
algorithm which finds the decomposition in (iv′), if it exists, in approximately (1/6)n³ arithmetic
operations.
There exists another useful algebraic criterion (Sylvester’s criterion) for positive semidefi-
niteness of a matrix; according to this criterion, a symmetric matrix A is positive definite if
and only if its angular minors are positive, and A is positive semidefinite if and only if all its
principal minors are nonnegative. For example, a symmetric 2 × 2 matrix

    A = [ a  b ]
        [ b  c ]

is positive semidefinite if and only if a ≥ 0, c ≥ 0 and det(A) ≡ ac − b² ≥ 0.
(ii) [Square root of a positive semidefinite matrix] By the first chain of equivalences in
Theorem C.4.1, a symmetric matrix A is ≽ 0 if and only if A is the square of a positive
semidefinite matrix B. The latter matrix is uniquely defined by A ≽ 0 and is called the square
root of A (notation: A1/2 ).
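
The checks mentioned in Remark C.4.1 can be reproduced with numpy, assuming numpy's conventions: numpy.linalg.cholesky returns a lower triangular L with A = LL^T (so ∆ = L^T in the notation of (iv′)) and raises an error exactly when A is not positive definite. The sketch below also computes the square root A^{1/2} via the eigenvalue decomposition:

    # Cholesky factorization, Sylvester's 2x2 test, and the matrix square root.
    import numpy as np

    A = np.array([[4.0, 2.0], [2.0, 3.0]])               # a positive definite example

    # (iv'): A = Delta^T Delta with Delta = L^T upper triangular.
    L = np.linalg.cholesky(A)                            # raises LinAlgError if not > 0
    assert np.allclose(A, L @ L.T)

    # Sylvester's 2x2 test: a >= 0, c >= 0, ac - b^2 >= 0.
    a, b, c = A[0, 0], A[0, 1], A[1, 1]
    assert a >= 0 and c >= 0 and a * c - b * b >= 0

    # Square root A^{1/2} via the eigenvalue decomposition (Remark C.4.1.(ii)).
    lam, U = np.linalg.eigh(A)
    A_sqrt = U @ np.diag(np.sqrt(lam)) @ U.T
    assert np.allclose(A_sqrt @ A_sqrt, A)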

C.4.B. The semidefinite cone. When adding symmetric matrices and multiplying them by
reals, we add, respectively multiply by reals, the corresponding quadratic forms. It follows that

C.4.B.1: The sum of positive semidefinite matrices and a product of a positive


semidefinite matrix and a nonnegative real is positive semidefinite,

or, which is the same (see Section 2.1.4),

C.4.B.2: n × n positive semidefinite matrices form a cone Sn+ in the Euclidean space
Sn of symmetric n × n matrices, the Euclidean structure being given by the Frobenius
inner product ⟨A, B⟩ = Tr(AB) = ∑_{i,j} Aij Bij.

The cone Sn+ is called the semidefinite cone of size n. It is immediately seen that the semidefinite
cone Sn+ is “good,” specifically,

• Sn+ is closed: the limit of a converging sequence of positive semidefinite matrices is positive
semidefinite;

• Sn+ is pointed: the only n × n matrix A such that both A and −A are positive semidefinite
is the zero n × n matrix;

• Sn+ possesses a nonempty interior which is comprised of positive definite matrices.

Note that the relation A ≽ B means exactly that A − B ∈ Sn+ , while A ≻ B is equivalent to
A − B ∈ int Sn+ . The “matrix inequalities” A ≽ B (A ≻ B) match the standard properties of

the usual scalar inequalities, e.g.:

A≽A [reflexivity]
A ≽ B, B ≽ A ⇒ A = B [antisymmetry]
A ≽ B, B ≽ C ⇒ A ≽ C [transitivity]
A ≽ B, C ≽ D ⇒ A + C ≽ B + D [compatibility with linear operations, I]
A ≽ B, λ ≥ 0 ⇒ λA ≽ λB [compatibility with linear operations, II]
Ai ≽ Bi , Ai → A, Bi → B as i → ∞ ⇒ A ≽ B [closedness]

with evident modifications when ≽ is replaced with ≻, or

A ≽ B, C ≻ D ⇒ A + C ≻ B + D,

etc. Along with these standard properties of inequalities, the inequality ≽ possesses a nice
additional property:

C.4.B.3: In a valid ≽-inequality


A≽B
one can multiply both sides from the left and from the right by a (rectangular) matrix
and its transpose:

    A, B ∈ Sn, A ≽ B, V ∈ Mn,m ⇒ V^T AV ≽ V^T BV.
Indeed, we should prove that if A − B ≽ 0, then also V T (A − B)V ≽ 0, which is
immediate – the quadratic form y T [V T (A − B)V ]y = (V y)T (A − B)(V y) of y is
nonnegative along with the quadratic form xT (A − B)x of x.
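
A quick numerical confirmation of C.4.B.3 on synthetic data (A − B constructed as a Gram matrix, so that A ≽ B by Theorem C.4.1.(iii)):

    # If A >= B, then V^T A V >= V^T B V for a rectangular V.
    import numpy as np

    rng = np.random.default_rng(5)
    B0 = rng.standard_normal((4, 4))
    B = (B0 + B0.T) / 2                                  # arbitrary symmetric B
    C = rng.standard_normal((4, 4))
    A = B + C @ C.T                                      # A - B = C C^T >= 0

    V = rng.standard_normal((4, 3))                      # V in M^{4,3}
    diff = V.T @ (A - B) @ V
    assert np.linalg.eigvalsh(diff).min() >= -1e-10      # V^T A V - V^T B V >= 0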

An important additional property of the semidefinite cone is its self-duality:

Theorem C.4.2 A symmetric matrix Y has nonnegative Frobenius inner products with all pos-
itive semidefinite matrices if and only if Y itself is positive semidefinite.

Proof. “if” part: Assume that Y ≽ 0, and let us prove that then Tr(Y X) ≥ 0 for every X ≽ 0.
Indeed, the eigenvalue decomposition of Y can be written as

    Y = ∑_{i=1}^{n} λi(Y) ei ei^T,

where ei are the orthonormal eigenvectors of Y . We now have



    Tr(Y X) = Tr((∑_{i=1}^{n} λi(Y) ei ei^T)X) = ∑_{i=1}^{n} λi(Y) Tr(ei ei^T X) = ∑_{i=1}^{n} λi(Y) Tr(ei^T Xei),    (C.4.1)

where the concluding equality is given by the following well-known property of the trace:

C.4.B.4: Whenever matrices A, B are such that the product AB makes sense and
is a square matrix, one has
Tr(AB) = Tr(BA).

Indeed, we should verify that if A ∈ Mp,q and B ∈ Mq,p, then Tr(AB) = Tr(BA). The
left hand side quantity in our hypothetic equality is ∑_{i=1}^{p} ∑_{j=1}^{q} Aij Bji, and the right
hand side quantity is ∑_{j=1}^{q} ∑_{i=1}^{p} Bji Aij; they indeed are equal.

Looking at the concluding quantity in (C.4.1), we see that it indeed is nonnegative whenever
X ≽ 0 (since Y ≽ 0 and thus λi(Y) ≥ 0 by Proposition C.3.1).
“only if” part: We are given Y such that Tr(Y X) ≥ 0 for all matrices X ≽ 0, and we should
prove that Y ≽ 0. This is immediate: for every vector x, the matrix X = xxT is positive
semidefinite (Theorem C.4.1.(iii)), so that 0 ≤ Tr(Y xxT ) = Tr(xT Y x) = xT Y x. Since the
resulting inequality xT Y x ≥ 0 is valid for every x, we have Y ≽ 0.
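
A Monte-Carlo style illustration of the self-duality statement: for randomly generated positive semidefinite X and Y (built as Gram matrices), the Frobenius inner product Tr(XY) is indeed nonnegative:

    # Tr(XY) >= 0 for randomly generated positive semidefinite X and Y.
    import numpy as np

    rng = np.random.default_rng(6)
    for _ in range(100):
        DX = rng.standard_normal((5, 5))
        DY = rng.standard_normal((5, 5))
        X, Y = DX @ DX.T, DY @ DY.T                      # Gram matrices, hence >= 0
        assert np.trace(X @ Y) >= -1e-10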