

Eigenvalues of Matrices
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out
of print by their original publishers, though they are of continued importance and interest to the
mathematical community. SIAM publishes this series to ensure that the information presented in
these texts is not lost to today's students and researchers.

Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
John Boyd, University of Michigan
Peter Olver, University of Minnesota
Susanne Brenner, Louisiana State University
Philip Protter, Cornell University
Bernard Deconinck, University of Washington
Matthew Stephens, The University of Chicago
William G. Faris, University of Arizona
Divakar Viswanath, University of Michigan
Nicholas J. Higham, University of Manchester
Gerhard Wanner, L'Université de Genève
Mark Kot, University of Washington

Classics in Applied Mathematics

C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications
and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained
Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One,
Part Two, Supplement. Translated by G. W. Stewart
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for
Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems
in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and
Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Başar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis
and Design

*First time in print.

Classics in Applied Mathematics (continued)

Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New
Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, Vol. 1: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology
of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Alexander Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems
I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials
Galen R. Shorack and Jon A. Wellner, Empirical Processes with Applications to Statistics
Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone, The Linear Complementarity Problem
Rabi N. Bhattacharya and Edward C. Waymire, Stochastic Processes with Applications
Robert J. Adler, The Geometry of Random Fields
Mordecai Avriel, Walter E. Diewert, Siegfried Schaible, and Israel Zang, Generalized Concavity
Rabi N. Bhattacharya and R. Ranga Rao, Normal Approximation and Asymptotic Expansions
Françoise Chatelin, Spectral Approximation of Linear Operators

Classics in Applied Mathematics (continued)

Yousef Saad, Numerical Methods for Large Eigenvalue Problems, Revised Edition
Achi Brandt and Oren E. Livne, Multigrid Techniques: 1984 Guide with Applications to Fluid Dynamics,
Revised Edition
Bernd Fischer, Polynomial Based Iteration Methods for Symmetric Linear Systems
Pierre Grisvard, Elliptic Problems in Nonsmooth Domains
E. J. Hannan and Manfred Deistler, The Statistical Theory of Linear Systems
Françoise Chatelin, Eigenvalues of Matrices, Revised Edition
Eigenvalues of Matrices

Françoise Chatelin
CERFACS and the University of Toulouse
Toulouse, France

With exercises by
Mario Ahues
Université de Saint-Étienne, France
Françoise Chatelin

Translated with additional material by

Walter Ledermann
University of Sussex, UK

Financial assistance for the translation was given by

the French Ministry of Culture.

Society for Industrial and Applied Mathematics
Copyright © 2012 by the Society for Industrial and Applied Mathematics

This SIAM edition is a revised republication of the work first published by John
Wiley & Sons, Inc., in 1993.

This book was originally published in two separate volumes by Masson, Paris:
Valeurs propres de matrices (1988) and Exercises de valeurs propres de matrices (1989).

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book
may be reproduced, stored, or transmitted in any manner without the written
permission of the publisher. For information, write to the Society for Industrial and
Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688

MATLAB is a registered trademark of The MathWorks, Inc. For MATLAB product
information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA
01760-2098 USA, 508-647-7000, Fax: 508-647-7001, www.mathworks.com.

Library of Congress Cataloging-in-Publication Data

Chaitin-Chatelin, Françoise.
Eigenvalues of matrices / Françoise Chatelin ; with exercises by Mario Ahues ;
translated, with additional material, by Walter Ledermann. — Rev. ed.
p. cm. — (Classics in applied mathematics ; 71)
Includes bibliographical references and index.
ISBN 978-1-611972-45-0
1. Matrices. 2. Eigenvalues. I. Ahues, Mario. II. Ledermann, Walter, 1911-2009.
III. Title.
QA188.C44 2013

Hypatia of Alexandria,
A.D. 370-415,
stoned to death by the mob.

After profound studies of mathematics and philosophy
in Athens she established a school in Alexandria, her native city,
where Plato and Aristotle, as well as Diophantus,
Apollonius of Perga and Ptolemy were studied.
This displeased the clerics who incited the mob against her.

Preface to the Classics Edition xiii
Preface xv
Preface to the English Edition xix
Notation xxi
List of Errata xxiii

Chapter 1 Supplements from Linear Algebra 1

1.1 Notation and definitions 1
1.2 The canonical angles between two subspaces 5
1.3 Projections 8
1.4 The gap between two subspaces 10
1.5 Convergence of a sequence of subspaces 14
1.6 Reduction of square matrices 18
1.7 Spectral decomposition 27
1.8 Rank and linear independence 31
1.9 Hermitian and normal matrices 32
1.10 Non-negative matrices 33
1.11 Sections and Rayleigh quotients 34
1.12 Sylvester's equation 35
1.13 Regular pencils of matrices 42
1.14 Bibliographical comments 43
Exercises 43

Chapter 2 Elements of Spectral Theory 61

2.1 Revision of some properties of functions of a complex variable 61
2.2 Singularities of the resolvent 63
2.3 The reduced resolvent and the partial inverse 73
2.4 The block-reduced resolvent 76
2.5 Linear perturbations of the matrix A 79
2.6 Analyticity of the resolvent 82
2.7 Analyticity of the spectral projection 84
2.8 The Rellich-Kato expansions 85
2.9 The Rayleigh-Schrödinger expansions 86
2.10 Non-linear equation and Newton's method 89
2.11 Modified methods 92
2.12 The local approximate inverse and the method of residual
correction 95
2.13 Bibliographical comments 98
Exercises 98

Chapter 3 Why Compute Eigenvalues? 111

3.1 Differential equations and difference equations 111
3.2 Markov chains 114
3.3 Theory of economics 117
3.4 Factorial analysis of data 119
3.5 The dynamics of structures 120
3.6 Chemistry 122
3.7 Fredholm's integral equation 124
3.8 Bibliographical comments 126
Exercises 126

Chapter 4 Error Analysis 149

4.1 Revision of the conditioning of a system 149
4.2 Stability of a spectral problem 150
4.3 A priori analysis of errors 165
4.4 A posteriori analysis of errors 170
4.5 A is almost diagonal 177
4.6 A is Hermitian 180
4.7 Bibliographical comments 190
Exercises 191

Chapter 5 Foundations of Methods for Computing Eigenvalues 205

5.1 Convergence of a Krylov sequence of subspaces 205
5.2 The method of subspace iteration 208
5.3 The power method 213
5.4 The method of inverse iteration 217
5.5 The QR algorithm 221
5.6 Hermitian matrices 226
5.7 The QZ algorithm 226
5.8 Newton's method and the Rayleigh quotient iteration 227
5.9 Modified Newton's method and simultaneous inverse iterations 228
5.10 Bibliographical comments 235
Exercises 235

Chapter 6 Numerical Methods for Large Matrices 251

6.1 The principle of the methods 251
6.2 The method of subspace iteration revisited 253
6.3 The Lanczos method 257
6.4 The block Lanczos method 266
6.5 The generalized problem Kx = λMx 270
6.6 Arnoldi's method 272
6.7 Oblique projections 279
6.8 Bibliographical comments 280
Exercises 281

Chapter 7 Chebyshev's Iterative Methods 293

7.1 Elements of the theory of uniform approximation
for a compact set in C 293
7.2 Chebyshev polynomials of a real variable 299

7.3 Chebyshev polynomials of a complex variable 300

7.4 The Chebyshev acceleration for the power method 304
7.5 The Chebyshev iteration method 305
7.6 Simultaneous Chebyshev iterations (with projection) 308
7.7 Determination of the optimal parameters 311
7.8 Least squares polynomials on a polygon 312
7.9 The hybrid methods of Saad 314
7.10 Bibliographical comments 316
Exercises 316

Chapter 8 Polymorphic Information Processing with Matrices 323

8.1 Scalars in a field 324
8.2 Scalars in a ring 324
8.3 Square matrices are macro-scalars 327
8.4 The spectral and metric information stemming from A
of order n 328
8.5 Polar representations of A of order n 330
8.6 The yield of A Hermitian positive semi-definite under
spectral coupling 332
8.7 Homotopic deviation 340
8.8 Non-commutativity of the matrix product 342
8.9 Conclusion 346
8.10 Bibliographical comments 346
Exercises 346
Additional References 348

Appendices 351
A Solution to Exercises 351
B References for Exercises 395
C References 399

Index 406
Preface to the Classics Edition

The original French version of this book was published by Masson, Paris, in 1988.
The 24 years which have elapsed since 1988 until the present SIAM republication
of the English translation (Wiley, 1993) by Professor Ledermann have confirmed the
essential role played by matrices in intensive scientific computing. They lie at the
foundation of the digital revolution that is taking place worldwide at lightning speed.
During the past quarter of a century, the new field called qualitative computing
has emerged in mathematical computation, which can be viewed as a first step in the
direction of the polymorphic information theory that is required to decipher life phenomena on the planet. In this broader perspective, the backward analysis, which was
devised by Givens and Wilkinson in the late 1950s to assess the validity of matrix
computations performed in the finite precision arithmetic of scientific computers, becomes mandatory even when the arithmetic of the theory is exact. This is because classical linear algebra may yield local results which disagree with the global nonlinear algebraic context. Consequently, square matrices play, via their eigenvalues and
singular values, an even more fundamental role than that which was envisioned in
Chapter 3 of the original version.
This Classics Revised Edition describes this deeper role in a postface taking the
form of Chapter 8, which is accompanied by an updated bibliography. This is my third
book devoted to computational spectral theory to be published by SIAM, following
Lectures on Finite Precision Computations in 1996 (co-authored with Valérie Frayssé)
and Spectral Approximation of Linear Operators in 2011 (Classics 65). These books
form a trilogy which contains the theoretical and practical knowledge necessary to
acquire a sound understanding of the central role played by eigenvalues of matrices
in life information theory.
My gratitude goes to Beresford Parlett, U.C. Berkeley, for his perceptive reading
of a draft of the Postface. It is again my pleasure to acknowledge the highly professional support provided by Sara Murphy, Developmental and Acquisitions Editor at SIAM.
Françoise Chatelin
CERFACS and University of Toulouse,
July 2012.

Helmholtz (...) advises us to observe for a long time the waves of the sea and the
wakes of ships, especially at the moment when the waves cross each other (...). Through
such understanding one must arrive at this new perception, which brings more order
into the phenomena.

Alain, Éléments de philosophie radicale III

The calculation of eigenvalues is a problem of great practical and theoretical

importance. Here are two very different types of application: in the dynamics of
structures it is essential to know the resonance frequency of the structure; for
example, we mention the vibrations of the propeller blades in ships or helicopters,
the influence of the swell on drilling platforms in the sea, the reaction of buildings
in earthquakes. Another class of fundamental applications is related to the
determination of the critical value of a parameter for the stability of a dynamical
system such as a nuclear reactor.
A good understanding of the algorithms is necessary in order that they may
be used efficiently. One knows the fantastic advance of computers brought about
by technical developments: in 1957 transistors replaced valves; in the 1960s
printed circuits appeared and then the first integrated circuits with several dozen
transistors per microchip. In 1985 the VLSI (very large scale integration)
technology permitted the integration of a million transistors per chip.
What is less well known is the gain in performance due to progress in
mathematical methods. In some areas this is at least as important as the gain due
to technological revolution. For example, from 1973 to 1983 the capacities of the
most powerful computers were multiplied by 1000 and during the same period
the improvement of certain numerical techniques brought about a gain of
another factor of 1000. All this in the supersonic regime made it possible in 1983
to calculate a complete aircraft in less than a night's work on Cray I.
The object of this book is to give a modern and complete theory, on an
elementary level, of the eigenvalue problem of matrices. We present the
fundamental aspects of the theory of linear operators in finite dimensions and in
matrix notation. The use of the vocabulary of functional analysis has the effect

of demonstrating the profound similarity between the different methods of

approximation. At the same time, the use of the vocabulary of linear algebra, in
particular the systematic use of bases for the representation of invariant
subspaces, allows us to give a geometric interpretation that enhances the
traditional algebraic presentation of many algorithms in numerical matrix analysis.

The presentation of this work is organized around several salient ideas:

(a) treatment of the eigenvalue problem in complete generality: non-symmetric

matrices and multiple defective eigenvalues;
(b) influence of the departure from normality on spectral conditioning (Chapter 4);
(c) use of the Schur form in preference to the Jordan form;
(d) simultaneous treatment of several distinct eigenvalues (Chapters 2 and 4);
(e) presentation of the most efficient up-to-date algorithms (for sequential or
vectorial computers) in order to compute the eigenvalues of (i) dense matrices
of medium size and (ii) sparse matrices of large sizes, the algorithms being
divided into two families: (1) algorithms of the iterative type for subspaces
(Chapters 5,6 and 7) and (2) those of the incomplete Lanczos/Arnoldi type
(Chapter 6);
(f) analysis of the convergence of subspaces by means of the convergence of their
bases (Chapter 1);
(g) analysis of the quality of approximation with the help of two concepts:
approximation through orthogonal projection on a subspace and asymptotic
behaviour of the subspaces A^k S, k = 1, 2, ... (Chapters 5, 6 and 7);
(h) improvement of the efficiency of the numerical methods through spectral
preconditioning (Chapters 5,6 and 7).

The reader who wants to obtain a deeper understanding of this area will find
the study of the following classical books very enriching: Golub-Van Loan
(Chapters 7,8 and 9), Parlett and Wilkinson.
The present book is a work on numerical analysis in depth. It is addressed
especially to second-year students of the Maîtrise, to pupils of the Magistère, as
well as to those of the Grandes Écoles. It is assumed that the reader is familiar
with the basic facts of numerical analysis covered in the book Introduction à l'
Analyse Numérique Matricielle et à l'Optimisation* by P. G. Ciarlet. The Recueil
d'Exercices† is an indispensable pedagogic complement to the main text. It
consists of exercises of four types:

*An English translation was published in 1989 by the Cambridge University Press (Translator's
footnote).
†This collection of exercises is incorporated in the present volume (Translator's footnote).

(A) illustrations or supplements of points discussed in the text (a solution is given in appendix A);
(B) exercises for training and deepening understanding (bibliographical reference
is given in appendix B where the proofs can be found);
(C) computational exercises where the result is usually given in the text;
(D) problems (no solution is furnished).

This text has benefited from comments by a number of my colleagues and

friends. It took its starting point from a Licence course 'À propos de valeurs
propres' which Philippe Toint invited me to give at the University of Namur in
the Spring of 1983. I should like to thank him, and most of all I want to express
my thanks to Mario Ahues for the close and friendly collaboration throughout
the preparation of the text and the exercises. Equally, I am pleased to
acknowledge, in addition to their friendship, the influence of Beresford Parlett
and of Youcef Saad, which grew in the course of years. Finally, I should like to
thank Philippe Ciarlet and Jacques-Louis Lions for showing their confidence in
me by suggesting that I should write this volume for the series Mathématiques
Appliquées pour la Maîtrise.
Preface to the English Edition

I am very pleased to have this opportunity to acknowledge the very fine work
accomplished by Professor Ledermann. He certainly worked much harder than
is commonly expected from a translator, to transform a terse French text into
a more flowing English one. He even corrected some mathematical mistakes!
Two paragraphs have also been added in Chapter 4 to keep up to date with
new developments about the influence of non-normality and the componentwise
stability analysis. The list of references has been updated accordingly.

N   set of integers
R   set of real numbers
C   set of complex numbers
A = (a_ij)   matrix with element a_ij in the ith row and jth column (1 ≤ i ≤ n, 1 ≤ j ≤ m); linear map of C^m into C^n
A^T = (a_ji)   transposed matrix (denoted by ^tA in algebra)
A* = (ā_ji)   transposed conjugate matrix
C^{n×m}   set of n by m matrices over C
x = (ξ_i)_1^n = (ξ_1, ..., ξ_n)^T   column vector of C^n
{x_1, ..., x_r} = {x_i}_1^r   set of r vectors
A = [a_1, ..., a_m]   matrix of column vectors {a_i}_1^m
sp(A)   spectrum of A
{λ_i}_1^d   set of distinct eigenvalues of A, d ≤ n
{μ_i}_1^n   set of eigenvalues, possibly repeated, each counted with its algebraic multiplicity
res(A) = C − sp(A)   resolvent set of A
ρ(A) = max_i |λ_i|   spectral radius of A
det A   determinant of A
tr A = Σ_i a_ii   trace of A
r(A)   rank of A
adj A = (A_ij)   adjoint of A: A_ij is the cofactor of a_ji when A = (a_ij)
π(λ) = det(λI − A)   characteristic polynomial of A
‖x‖₂ = (Σ_i |ξ_i|²)^{1/2}   Euclidean norm of x
‖A‖_F = (Σ_{i,j} |a_ij|²)^{1/2}   Frobenius norm of A
cond(A) = ‖A‖ ‖A⁻¹‖   condition number of A (with respect to inversion)
σ: σ² ∈ sp(A*A)   singular value of A
A_{|M}: M → M   restriction of a linear map A to an invariant subspace M
lin(x_1, ..., x_r)   subspace over C generated by {x_i}_1^r
ω(M, N)   gap between the subspaces M and N
Θ = diag(θ_i)   diagonal matrix of canonical angles between M and N
Σ_1^0 = 0   notational convention
A ⊗ B   tensor (or Kronecker*) product of the matrices A and B
P_k   set of polynomials of degree ≤ k
T_k(t) = ½[(t + (t² − 1)^{1/2})^k + (t + (t² − 1)^{1/2})^{−k}]   Chebyshev polynomial of the first kind of degree k
Y*AX, Y*X = I   compression or Rayleigh quotient of A constructed upon X, Y
{x_i}, {y_j}   adjoint bases of C^n
complementary invariant subspaces
M   right invariant subspace
. . .   left invariant subspace
x   right eigenvector
. . .   left eigenvector
csp   spectral condition number
csp(λ)   spectral condition number of the eigenvalue λ
csp(x)   spectral condition number of the eigenvector x
csp(M)   spectral condition number of the invariant subspace M
meas(Γ)   Lebesgue measure of the curve Γ
di.n.   departure from normality

* Leopold Kronecker, 1823-1891, born at Liegnitz, died in Berlin.
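The closed-form expression for the Chebyshev polynomial T_k listed in the notation above can be checked numerically against the trigonometric characterization T_k(cos θ) = cos kθ on [−1, 1]. A small sketch (the function names are ours, not the book's):

```python
import cmath
import math

def cheb_closed_form(k: int, t: float) -> float:
    """T_k(t) = [(t + (t^2-1)^{1/2})^k + (t + (t^2-1)^{1/2})^{-k}] / 2.

    For |t| <= 1 the square root is imaginary, so we work in complex
    arithmetic and return the (numerically) real result.
    """
    w = t + cmath.sqrt(t * t - 1)
    return ((w ** k + w ** (-k)) / 2).real

def cheb_trig(k: int, t: float) -> float:
    """Equivalent form T_k(t) = cos(k arccos t) on [-1, 1]."""
    return math.cos(k * math.acos(t))

# The two forms agree to machine precision on the interval.
for k in (0, 1, 2, 5):
    for t in (-0.9, 0.0, 0.3, 1.0):
        assert abs(cheb_closed_form(k, t) - cheb_trig(k, t)) < 1e-12
```

The two branches of the square root give reciprocal values of w = t + (t² − 1)^{1/2}, since (t + s)(t − s) = 1 when s² = t² − 1, which is why the formula is insensitive to the branch chosen.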

List of Errata

p. xv line(-22) helicopters
p. xvi line(-2) indispensable

p. xxii definition A is regular ⇔ A is invertible ⇔ det A ≠ 0

Chapter 1

p. 3 line(-12) orthonormal
p. 10 line(13) basis
p. 11 line(2) x*x = 1
line(-7) ||(/-πΛτ)πΜ|| in (1.4.6)
p. 16 line(2) ck
line(10) as required.
p. 18 line(2) basis
line(5) = l/2ibr
line(8) A(ε) = [ x, −ε sin(2/ε); −ε sin(2/ε), x ]
p. 21 line(-7) ,w*)
p. 22 line(15) 1< j <i < m
p. 23 line(9) =0
p. 31 line(13) . . . of X, but it actually goes back to Laplace in the
first Supplement, pp. 505-512, to Theorie analy-
tique des Probabilites, 3rd edition, Paris, 1820.
p. 32 footnote 1854-

p. 38 line(18) \\&z\\
footnote *Karl Adolf Hessenberg, 1904-1959, born and died
in Frankfurt am Main.
p. 39 line(1) ‖T⁻¹‖_F
line(11) δ < 1
line(12) cond₂(X)
p. 40 line(16) ‖AA* − A*A‖_F
line(-8) of order r (not to be confused with the order of B
taken to be 1)
p. 42 line(-7) set of finite eigenvalues
p. 43 line(-13) Poincaré
p. 45 line(9) - f^) in 1.1.10
p. 46 line(13) ‖P‖₁ in 1.1.19
line(-3) θ_max = π/2 in 1.2.2
p. 47 line(-2) [U, Ū] in 1.2.6
p. 48 line(10) If ‖(P − Q)P‖₂ < 1 in 1.3.4
line(-5) ≤ r^{1/2} sin θ in 1.4.2

p. 49 line(-11) delete "and normal" in 1.6.4

p. 50 line(5) S₁ = . . . ∈ C^{n×m} in 1.6.8
p. 52 line(-7) upper triangular in 1.6.19
p. 56 line(3) Hermitian in 1.8.7
line(-12) ≤ λ₁(B) + λ₁(C) in 1.9.2
p. 57 line(-7) Brouwer's in 1.10.1

Chapter 2

p. 65 line(-3) constant
p. 66 Proposition 2.2.7 z ↦ S(z)
line(11) k ∈ N
line(12)
p. 68 line(7) A_n = AP_v
line(16) z ∈ Γ
p. 70 line(9) −p in (2.2.5)
line(16) = −Σ_{k=0}
p. 72 line(5) = P(A*, X)
p. 73 line(2) P(A*, X) = · · ·
dz̄ = −dz
p. 75 line(14)
p. 78 line(3) (b) ≥ δ⁻¹
p. 79 line(7) σ
p. 80 line(-l) A-zI
p. 83 line(14) (2.6.2)
line(-10) in exact arithmetic.
p. 84 line(-3) λ(ί) =
p. 86 line(3) k
p. 87 line(-6)
p. 88 line(4) S(t) = [.·.]- 1
p. 89 PROOF Take s = p defined in Theorem 2.9.3.
p. 90 line(12) (I - Y*X)B = 0.
line(-6) = Γ ∪ {0}.
p. 91 line(14) -0
p. 92 line(19) B = Y*AU
p. 93 line(3) \\vk\\ = *k
p. 96 lines(—7, —6) l(U)
p. 98 line(2) = Κχ^ + · · ·
line(5) +A'- 1 6
p. 103 line(-3) B in 2.9.1
p. 104 line(4) A*y − ξy in 2.9.1
line(6) •••IIQUe in(*)
p. 105 line(7) delete of in 2.10.2
p. 108 line(16) coefficient (1, n) in J is a₁ₙ in 2.11.5

Chapter 3

p. 111 line(-9) Tacoma
p. 114 line(17) P(...) = P(...)
p. 115 line(-14) P_u(i, j)
p. 117 line(-6) +l_j d_i.
line(-5) A + d1^T
p. 118 line(11) c = Σ_i a_i p_i + · · ·
line(-7) x_i = (1 + r)f_i
p. 126 line(-7) of eigenvalues

p. 132 line(9) π be the vector in 3.2.3
p. 135 line(-2) click in 3.2.9
p. 136 line(8) converge in 3.2.9
p. 140 line(5) S = [S₁,...] in 3.4.8
p. 145 3.7.1 [B:11]
3.7.2 [B:4,11]
p. 146 3.7.4 [B:3,11]

Chapter 4

p. 149 line(6) is based
p. 155 line(3) ‖ · ‖_P
line(11) (or eigenvectors)
line(17) (l − 1)/l
p. 156 line(-6) cond₂(X)
p. 160 line(6) sparsity
line(-8) πe = 1
p. 166 line(9)
p. 167 line(-5) λ′ = · · ·
p. 168 Theorem 4.3.7 ‖x′‖ = 1
p. 171 line(-14) A is . . .
line(-12) (1 + |λ − α|)^{l−1} . . .
p. 173 line(-16) U(0, B) · Y*
line(-15) [· · ·, sp(B)] > 0
p. 174 line(17) . Then
P· 178 line(10) = max
p. 182 line(9) a—X
P· 187 line(-5) min
7 (Pj " A )
In particular csp(x) = ΙΙΣ-1!^ <
P· 190 line(-17)
value and eigenvector condition
be related when A is not normal
ί 2'.>erstheeigen-
need not

P· 197 line(14) (I-P±)

P· 199 line(-l) (S,x)
P· 200 line(8) (S\x)
line(17) yn

p. 202 line(11) −ξv‖₂,
line(-3) Au − au, A*v − av
p. 203 line(3) (‖r(a)‖² + ‖s(a)‖²)^{1/2} = · · ·
p. 204 line(9) a matrix H
line(10) = UC and V*(A − H) =

Chapter 5

p. 208 line(-11) same

p. 210 line(-15) ω(· · ·, M_i) → 0
line(-14) Theorem 5.2.1
line(-4) −σ_ij a_i a_j
p. 212 line(-2) . . . , assume that x is the first column of X, then
footnote Andre-Louis Cholesky, 1875-1918, born in Montguyon, died in Bagneux.
p. 213 line(3) AQ = QT
p. 217 line(-5) A − σI =
p. 218 line(7) = [*,$](···
p. 219 line(2) 6 -c+i
line(11) (A − σI)y = 0
p. 220 line(4) . . . exists a vector u such that qi is an eigenvector
p. 221 line(17) (Q₁ · · · Q_k)
p. 222 line(-18) r < n
line(-14) £kDk^
p. 225 line(-8) Qken = ' ' ' ti^n
p. 227 line(-3) = −Ax_k + · · ·
p. 229 line(-l) B = Q*AQ
p. 231 line(9) eigenvalue λ
line(16) same
p. 233 line(-13) Xk
p. 236 line(2) Jti+i = <%i
p. 242 line(-5) (A - ujl)q
p. 243 line(-5) (T − σI) = · · ·
p. 244 line(9) that
line(ll) ... of 5 and
p. 247 line(13) x*B*Bx

Chapter 6

p. 253 line(9) G_i
p. 254 line(13)
p. 255 line(17) ≤ cα_i.
p. 258 line(3) K_n = lin(. . . , A^{n−1}u)
line(-2) b_j
line(-1) u := · · ·
p. 260 line(8) Let

p. 261 line(-l) H
p. 262 line(13) Ρ(λι) = · · · =
p. 267 line(4) J^l
line(14) (a) delete A
p. 269 line(4) s_k ∈ S
line(11) S_k = X_k + Σ_j P_j S_k
line(-l) (7i)
p. 271 line(14) required
line(15) same
p. 272 line(6) u := u — · · ·
p. 273 line(l) M
p. 275 line(3) sp(A) − {λ}
line(-4) centre c,
line(-2) denominator · · ·
p. 276 line(9) ≤ cα_i
line(-4) Delete the end of proof.
When dealing with ‖(P_i − P)x‖ in Theorem 6.3.4
we did not use the assumption that A is Hermitian.
Therefore the result sin θ_i ≤ ‖(P_i − P)x‖ ≤ cα_i
is valid. To bound λ − λ_i we also follow the proof
of Theorem 6.3.4, where x_i (respectively x_i′) is
replaced by x̂_i (respectively x̂_i′). The conclusion
follows accordingly.
p. 278 line(9) ioi = j
line(16) i = i* under Σ
line(19) ignore sketch of Hi
line(-7) \Χί
line(-5) h+i ι
line(-l) Hi
p. 279 lines(-11, -8, -1) w₁
line(-8) (6.7.1)
p. 281 line(-8) Sib in 6.1.2
p. 282 line(-12) Theorem 6.2.6 in 6.2.5
p. 283 line(l) (#*uj+i)i/i in 6.3.3
p. 284 line(8) change m to / in T/ in 6.3.5
p. 285 line(8)
p. 287 line(5) Exercise 6.3.13 in 6.3.14
p. 289 line(3) d₀, . . . , d_{m−1} in 6.3.18
line(10) rjrj = 0 in 6.3.18
line(15) tridiagonal matrix T_k in 6.3.18

p. 290 line(10) starting in 6.4.1

p. 291 line(12) zelmX in 6.5.1
line(-12) Ht in 6.6.3
line(-5) w₁ in 6.7.1

Chapter 7

p. 293 line(-5) v* of V the sense of

p. 294 line(10) Rivlin
line(-4) Theorem 7.1.1
p. 295 line(9) andf
Theorem 7.1.5 C(S)
lines(-3, -1) p_k
p. 296 lines(5, 9) p_k
line(10) ω(z)
p. 297 line(-2) change i to :
p. 298 line(2) p_k
p. 299 line(13) Chebyshev
p. 300 line(7) Rivlin
line(-l) v(k)

p. 301 line(-5) T_k[(z − c)/e] → T_k[(λ − c)/e]

line(-2) Figure 7.3.2
p. 302 line(-3) |h(z)|
line(-1) p(t) = t
footnote Sommières
p. 304 line(-2) P
p. 305 line(-10) for y_k:
p. 309 line(-3) y ∈ S_k
p. 311 line(15) max/i(/ii) =
p. 314 line(-14) ( < e , 9 * >^)
line(-10) >lk*IU

Appendix A

p. 352 line(-5) θι in 1.2.2

p. 355 line(-7) Q in 1.2.6
line(-3) [Q,Q]*[U,U]} in 1.2.6
p. 374 line(4) inequality (a) in 4.2.2
p. 381 line(8) CHAPTER 5

p. 389 line(-12) W_i G_i⁻¹ in 6.6.3
line(-10) (W_i G_i⁻¹)*W_i = G_i⁻¹ W_i* W_i in 6.6.3
p. 390 line(4) H_t in 6.6.3

Appendix B
p. 395 line(-12) [15] de la méthode . . .

Appendix C

p. 399 Balas
Baumgärtel, H. (1985)
p. 400 line(20) Bauwens
p. 402 line(24) . . . Sijthoff and Noordhoff.
line(25) Meyer, C.D. Jr. . . .
p. 403 line(20) Oeuvres
line(-11) Rutishauser, H. (1969) 'Computational aspects of F. L. Bauer's simultaneous iteration method', Numer. Math., 13, 4-13.

Supplements from Linear Algebra


Computation of an eigenvalue often leads to the investigation of the associated

invariant subspace, that is to say of a (possibly orthonormal) basis in that subspace. In this chapter we present the tools from linear algebra which are specific
for the treatment of eigenvalues. In particular, the canonical angles between two
subspaces are an appropriate measure of the 'distance' between these two
subspaces. The convergence of subspaces is translated into the convergence of
their bases up to a non-singular or unitary matrix, as the case may be. As often
as possible, we also use Schur's form in preference to that of Jordan; the latter is
important from a theoretical point of view but is numerically unstable.


1.1 Notation and definitions

Let C^n represent the space of column vectors x with complex components ξ_j (j = 1, ..., n). Then x* is the row vector with components ξ̄_j. Unless otherwise stated, the norm on C^n is the Euclidean norm

‖x‖₂ = (Σ_{j=1}^n |ξ_j|²)^{1/2}.
The norm

‖x‖_∞ = max_j |ξ_j|

is also useful. Unless the contrary is indicated, ‖ · ‖ denotes an arbitrary norm on C^n.
The scalar product on C^n is given by

(x, y) = y*x.

When (x, y) = 0, the vectors x and y are said to be orthogonal.

Let {x_i : i = 1, ..., n} be a basis of C^n, that is, a set of n linearly independent vectors. The basis is orthonormal if and only if

(x_i, x_j) = δ_ij   (i, j = 1, ..., n).
The representation of a vector x in an orthonormal basis is given by

When the basis is not necessarily orthonormal, the coefficients ξ] in the

= Σ Sjxj

can be written in the form

ij = (x>yj) 0'=l»2,...,n),
where {)/,·: 7 = 1,2,..., n} is another basis of C such that
(χ„30) = ^ (i.7= 1,2 n). (1.1.1)
The proof of the existence and uniqueness of the basis {y}) is given in Exercise 1.1.1.
The basis {yj} defined in equation (1.1.1) is called the adjoint basis of {x,}. We
also say that the 2n vectors {x,} and {yj} form a biorthogonal collection of
elements of C .
Let {ay.j = 1,..., r} be a set of r vectors of C . We denote by
A = \_a1,...,ar]
the rectangular matrix of order n x r whose columns are the vectors au..., ar.
To fix the ideas we assume that
r ^n.
<*j = (au) (i = 1,..., n; ; = 1,..., r),
The vector space which is generated by {a,} is denoted by
lin (a l 9 ..., ar)
We shall often identify the matrix A with the linear mapping

which is represented by the matrix A when <Cr and <D" are referred to their
respective canonical bases.

Example 1.1.1 The unit matrix I_n represents the identity map ℂⁿ → ℂⁿ. When there is no ambiguity we shall write I.

The matrix

A* = (ā_ji)

is the transposed conjugate complex of A; it reduces to the transposed

Aᵀ = (a_ji)

(or ᵗA) when A is real. Let A be an n × n square matrix. The trace of A is given by

tr A = Σ_{i=1}^n a_ii.

The matrix is said to be normal if

AA* = A*A

and it is called Hermitian* if

A = A*.

For a real matrix, being Hermitian means being symmetric, that is

A = Aᵀ.

The Hermitian matrix A is positive definite (respectively positive semi-definite) if x ≠ 0 implies that x*Ax > 0 (respectively x*Ax ≥ 0).
An n × r rectangular matrix Q is said to be orthogonal if

Q*Q = I_r.

An n × n square matrix Q is called unitary if

Q*Q = QQ* = I_n.

For a real matrix being unitary means being orthogonal, that is

QᵀQ = QQᵀ = I_n.

The columns of a unitary matrix form an orthonormal basis of ℂⁿ.
The set of all n × r matrices over ℂ is denoted by ℂ^{n×r}; it is isomorphic to L(ℂʳ, ℂⁿ), the set of all linear maps of ℂʳ into ℂⁿ.
Corresponding to any norms ||·||_{ℂʳ} and ||·||_{ℂⁿ} on ℂʳ and ℂⁿ respectively, we can define an induced norm (subordinated norm) on ℂ^{n×r} as follows:

||A|| = max_{0≠x∈ℂʳ} ||Ax||_{ℂⁿ}/||x||_{ℂʳ} (A ∈ ℂ^{n×r}).

* After Charles Hermite, 1822-1901, born in Dieuze, died in Paris.


Example 1.1.2 (see Isaacson and Keller, 1966, pp. 9-10) When the norm ||·||_1 is used for both ℂʳ and ℂⁿ, then

||A||_1 = max_{1≤j≤r} Σ_{i=1}^n |a_ij|.

When the norm ||·||_∞ is used for both ℂʳ and ℂⁿ, then

||A||_∞ = max_{1≤i≤n} Σ_{j=1}^r |a_ij| = ||A*||_1.

An induced norm possesses the submultiplicative property, namely

||AB|| ≤ ||A|| ||B||.

The condition number, cond(A), of a square matrix (relative to inversion) is defined by

cond(A) = ||A|| ||A⁻¹||.
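As a numerical illustration of the induced norms and of cond(A) (an example added here, not from the text; the matrices are made up and NumPy is assumed to be available):

```python
import numpy as np

# Hypothetical 3x3 example matrix (not from the text).
A = np.array([[1.0, -2.0, 0.0],
              [3.0,  1.0, 1.0],
              [0.0,  4.0, 2.0]])

# ||A||_1 = max column sum, ||A||_inf = max row sum (Example 1.1.2).
norm1 = max(abs(A).sum(axis=0))
norminf = max(abs(A).sum(axis=1))
assert np.isclose(norm1, np.linalg.norm(A, 1))
assert np.isclose(norminf, np.linalg.norm(A, np.inf))

# ||A||_inf = ||A*||_1.
assert np.isclose(norminf, np.linalg.norm(A.conj().T, 1))

# Submultiplicativity ||AB|| <= ||A|| ||B||, and cond(A) = ||A|| ||A^-1||.
B = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
assert np.linalg.norm(A @ B, 1) <= norm1 * np.linalg.norm(B, 1) + 1e-12
cond1 = norm1 * np.linalg.norm(np.linalg.inv(A), 1)
assert np.isclose(cond1, np.linalg.cond(A, 1))
```

The assertions simply re-check the closed-form expressions against NumPy's built-in norm and condition-number routines.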
Let A be a real or complex square matrix of order n. We consider the eigenvalue problem:

find λ ∈ ℂ and 0 ≠ x ∈ ℂⁿ such that Ax = λx. (1.1.2)

The scalar λ is called an eigenvalue of A and x is an eigenvector associated with λ. The complex number λ is an eigenvalue of A if and only if it is a zero of the characteristic polynomial

π(λ) = det(λI − A),

the determinant of λI − A. This polynomial has n zeros in ℂ, not necessarily distinct; together they form the spectrum of A, denoted by sp(A). Thus

sp(A) = {λ ∈ ℂ : λ is an eigenvalue of A}.

The real number

ρ(A) = max{|λ| : λ ∈ sp(A)}

is called the spectral radius of A. A subspace M of ℂⁿ is said to be invariant under A if

AM ⊂ M.

In particular, when λ is an eigenvalue of A, the eigenspace Ker(A − λI), which is generated by all the eigenvectors associated with λ, is invariant under A.
The singular values of the n × r rectangular matrix A are the non-negative square roots of the eigenvalues of the square matrix A*A of order r. The norm which is induced on A by ||·||_2 on ℂⁿ and ℂʳ is given by

||A||_2 = ρ^{1/2}(A*A) (1.1.3)

(see Exercise 1.1.6), where ρ^{1/2}(A*A) denotes √ρ(A*A). This norm is often called the spectral norm of A. It is majorized by ||A||_F, the Frobenius* norm, which is easier to calculate:

||A||_F = (tr(A*A))^{1/2} = (Σ_{i=1}^n Σ_{j=1}^r |a_ij|²)^{1/2}.

For if the eigenvalues of A*A are α_1 ≥ ⋯ ≥ α_r ≥ 0, we have that

tr(A*A) = α_1 + ⋯ + α_r ≥ α_1 = ρ(A*A).

According to the context, this norm is also called the Schur† norm or the Hilbert‡-Schmidt§ norm. It is the Euclidean norm of the vector (a_ij) in ℂ^{nr}.
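These relations between singular values, the spectral norm and the Frobenius norm can be checked numerically (an illustration added here, assuming NumPy; the matrix is random):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# Singular values = non-negative square roots of the eigenvalues of A*A.
eigs = np.linalg.eigvalsh(A.conj().T @ A)          # returned in increasing order
sing = np.sqrt(np.maximum(eigs, 0.0))[::-1]        # reorder decreasingly
assert np.allclose(sing, np.linalg.svd(A, compute_uv=False))

# ||A||_2 = rho(A*A)^{1/2}  (equation (1.1.3)).
norm2 = sing[0]
assert np.isclose(norm2, np.linalg.norm(A, 2))

# ||A||_F = (tr A*A)^{1/2} = (alpha_1 + ... + alpha_r)^{1/2} majorizes ||A||_2.
normF = np.sqrt(np.trace(A.conj().T @ A).real)
assert np.isclose(normF, np.linalg.norm(A, 'fro'))
assert norm2 <= normF + 1e-12
```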


Let M and N be two subspaces of ℂⁿ, each of dimension r. The relative position of two subspaces can be described with the help of canonical angles, which we are going to define after establishing a preliminary lemma. We suppose that the subspaces are determined by orthonormal bases consisting of the column vectors

{q_1,…,q_r} and {u_1,…,u_r}

respectively. We identify the bases with the n × r matrices

Q = [q_1 ⋯ q_r] and U = [u_1 ⋯ u_r],

which satisfy the equations

Q*Q = U*U = I.

Lemma 1.2.1 The singular values of U*Q lie between 0 and 1.

PROOF Let {c_i : i = 1,…,r} be the set of singular values of U*Q. Then

c_i² ≤ ||U*Q||_2² ≤ ||U||_2² ||Q||_2² = ρ(U*U)ρ(Q*Q) = 1.

Definition The r angles θ_j defined by

c_j = cos θ_j, 0 ≤ θ_j ≤ π/2 (j = 1,…,r),

are called the canonical angles between M and N.

* Georg Ferdinand Frobenius, 1849-1917, born and died in Berlin.
† Issai Schur, 1875-1941, born in Mogilev, died in Tel Aviv.
‡ David Hilbert, 1862-1943, born in Königsberg, died in Göttingen.
§ Erhard Schmidt, 1876-1959, born in Dorpat, died in Berlin.

We arrange the c_i as an increasing sequence and therefore the angles θ_j as a decreasing sequence. Thus

0 ≤ θ_r ≤ ⋯ ≤ θ_1 ≤ π/2.

We introduce the diagonal matrix

Θ = diag(θ_1,…,θ_r).

The canonical angle

θ_max = θ_1

is called the maximal angle between M and N. Let τ be a trigonometric function; we define

τΘ = diag(τθ_1,…,τθ_r).

By definition, the singular values of U*Q are the same as those of the matrix cos Θ. The property of having the same singular values establishes an equivalence relation on the set of all matrices, which we denote by ~. Thus we have proved

Proposition 1.2.2 U*Q ~ cos Θ.

In particular we deduce that

||U*Q||_2 = ||cos Θ||_2 and ||U*Q||_F = ||cos Θ||_F,

because the spectral norm ||·||_2 and the Frobenius norm ||·||_F depend only on the singular values.
In Exercise 1.2.1 the reader will be introduced to the case in which the common
dimension of M and N exceeds n/2.
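The canonical angles are readily computed from the singular values of U*Q, as this added numerical sketch shows (random subspaces, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2

# Orthonormal bases Q and U of two r-dimensional subspaces of R^n.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
U, _ = np.linalg.qr(rng.standard_normal((n, r)))

# Lemma 1.2.1: the singular values c_i of U*Q lie in [0, 1];
# the canonical angles are theta_i = arccos(c_i).
c = np.linalg.svd(U.T @ Q, compute_uv=False)
assert np.all(c <= 1 + 1e-12) and np.all(c >= 0)
theta = np.arccos(np.clip(c, -1.0, 1.0))

# ||U*Q||_2 = ||cos(Theta)||_2, since both depend only on singular values.
assert np.isclose(np.linalg.norm(U.T @ Q, 2), np.cos(theta).max())
```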
We now suppose that M is referred to an orthonormal basis Q and that N is referred to the adjoint basis Y (if it exists). Thus

Q*Q = Y*Q = I.

The following lemma ensures the existence of the adjoint basis provided that θ_max < π/2.

Lemma 1.2.3 There exist adjoint bases Q and Y in M and N respectively if and only if the maximal angle θ_max between M and N is less than π/2.

PROOF Let Q and U be orthonormal bases for M and N respectively. We seek an invertible matrix B of order r such that

Y = UB and Y*Q = I,

that is

B*U*Q = I.

Thus B* exists if and only if U*Q is invertible. By Proposition 1.2.2 (see Exercise 1.1.8), the matrix U*Q is invertible if and only if

cos θ_max > 0,

and then

B⁻¹ = (U*Q)* = Q*U.

Lemma 1.2.4 Let X, Y and X′, Y′ be two pairs of adjoint bases for M and N. Then there exists an invertible matrix C of order r such that

X′ = XC and Y′ = Y(C⁻¹)*.

PROOF Since X and X′ are bases for M, there exists an invertible matrix C such that

X′ = XC.

Similarly, there exists an invertible matrix D satisfying

Y′ = YD.

By hypothesis, the bases for M and N are adjoint in pairs; thus

Y*X = I and (Y′)*X′ = I.

Hence

(YD)*(XC) = D*Y*XC = D*C = I,

so that

D = (C*)⁻¹ = (C⁻¹)*.

Proposition 1.2.5 Let Q and Y be adjoint bases of M and N respectively, Q being orthonormal, and let Θ be the matrix of canonical angles between M and N. Then

Y ~ (cos Θ)⁻¹ and Q − Y ~ tan Θ.

PROOF As in the proof of Lemma 1.2.3, let U be an orthonormal basis for N. Then there exists an invertible matrix B such that

Y = UB and B⁻¹ = Q*U.

Hence

Y*Y = B*U*UB = B*B,

so the singular values of Y are the same as those of B, and the latter are inverse to the singular values of Q*U. However, Q*U and U*Q have the same singular values [see Exercise 1.1.18(c)], which, by definition, are cos θ_i. Hence Y ~ (cos Θ)⁻¹. Next,

(Q* − Y*)(Q − Y) = Q*Q − Y*Q − Q*Y + Y*Y = I − I − I + Y*Y = Y*Y − I.

Hence if τ_i is a singular value of Q − Y, we have that

τ_i² = 1/cos²θ_i − 1 = tan²θ_i.

The canonical angles enable us to extend to ℂⁿ some well-known trigonometric relations in the plane, where θ (< π/2) is the acute angle between two straight lines M and N (see Figure 1.2.1):

[Figure 1.2.1]

||q||_2 = ||u||_2 = 1,
||q − u||_2 = 2 sin(θ/2),
||y||_2 = 1/cos θ,
||q − y||_2 = tan θ,
||ŷ||_2 = cos θ,
||q − ŷ||_2 = sin θ,

where ŷ = π_N q denotes the orthogonal projection of q on N.

A projection P is a linear idempotent map:

P² = P.

To each projection there corresponds a decomposition of ℂⁿ into a direct sum:

ℂⁿ = M ⊕ W,

where M = Im P and W = Ker P; indeed, if x is an arbitrary element of ℂⁿ we can write

x = Px + (x − Px),

where Px ∈ Im P and x − Px ∈ Ker P. We say that P is a projection on M parallel to W. Conversely, if a direct decomposition of ℂⁿ is given, we can define a projection by stipulating that P is the identity map on M and the zero map on W. In what follows we take W of the form

W = N^⊥ = {x ∈ ℂⁿ : y*x = 0 for all y ∈ N}.

Lemma 1.3.1 The space ℂⁿ can be decomposed into the direct sum

ℂⁿ = M ⊕ N^⊥

if and only if θ_max < π/2, where θ_max is the maximal angle between M and N.

PROOF According to Exercise 1.2.2 the equality θ_max = π/2 is equivalent to the existence of a non-zero vector that belongs to both M and N^⊥ (see Figure 1.3.1).

Proposition 1.3.2 Let X and Y be adjoint bases for M and N (which exist when θ_max < π/2). Then the matrix

P = XY* (1.3.1)

represents the projection on M parallel to N^⊥ in the canonical basis of ℂⁿ.

PROOF By hypothesis

Y*X = I. (1.3.2)

Hence

P² = XY*XY* = XY* = P,

so P is a projection. Let x ∈ ℂⁿ; then

Px = XY*x = Σ_{i=1}^r (x, y_i) x_i. (1.3.3)

This shows that Im P ⊂ M. Conversely, if u = Xa is any vector of M, we can write

u = X(Y*X)a = XY*(Xa) = P(Xa).

Hence M = Im P, as claimed. We deduce from equation (1.3.3) that x ∈ Ker P if and only if

y_i*x = 0 (i = 1,…,r),

that is

Ker P = N^⊥.

Suppose that X′, Y′ is another pair of adjoint bases. Then X′ = XC, Y′ = Y(C*)⁻¹, and so

X′(Y′)* = XY*.
When M = N, we obtain the unique orthogonal projection of ℂⁿ on M; its matrix will be denoted by π_M. Suppose that M is given by the orthonormal basis X relative to the canonical basis of ℂⁿ (X*X = I). Then

π_M = XX*.

We remark that π_M is Hermitian and that

||π_M||_2 = 1;

this follows from equation (1.1.3) and from the fact that all eigenvalues of a projection matrix are zero or unity.
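The two kinds of projection introduced above can be illustrated numerically (an added sketch with made-up bases, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 5, 2

# Build adjoint bases X, Y for M and N with Y*X = I (equation (1.3.2)):
# Y = U B with B* = (U*X)^{-1} (real case), as in the proof of Lemma 1.2.3.
X = rng.standard_normal((n, r))                    # arbitrary basis of M
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of N
B = np.linalg.inv(U.T @ X)
Y = U @ B.T
assert np.allclose(Y.T @ X, np.eye(r))

# P = XY* is the (oblique) projection on M parallel to N-perp (Prop. 1.3.2).
P = X @ Y.T
assert np.allclose(P @ P, P)          # idempotent
assert np.allclose(P @ X, X)          # identity on M = Im P

# Orthogonal projection pi_M = QQ* for an orthonormal basis Q of M.
Q, _ = np.linalg.qr(X)
piM = Q @ Q.T
assert np.allclose(piM, piM.T) and np.allclose(piM @ piM, piM)
assert np.isclose(np.linalg.norm(piM, 2), 1.0)     # ||pi_M||_2 = 1
```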


Let M and N be two subspaces of ℂⁿ, not necessarily of the same dimension.

Definition The gap between M and N is the number

ω(M, N) = ||π_M − π_N||_2,

where π_M and π_N are the orthogonal projections on M and N respectively.
The shortest distance, in the ||·||_2 metric, of x from the subspace N is denoted by dist(x, N). This is the distance between x and π_N x, which is the foot of the perpendicular from x on to N. Thus

dist(x, N) = ||x − π_N x||_2 = ||(I − π_N)x||_2. (1.4.1)

Proposition 1.4.1

ω(M, N) = max{max dist(x, N), max dist(y, M)},

subject to the conditions that

x ∈ M, x*x = 1, y ∈ N, y*y = 1.

PROOF For the sake of brevity we shall write ||·|| in place of ||·||_2.

(a) We shall show that

max{dist(x, N) : x ∈ M, x*x = 1} = ||(I − π_N)π_M||. (1.4.2)

Since π_M x = x if and only if x ∈ M, we have

dist(x, N) = ||(I − π_N)x|| = ||(I − π_N)π_M x|| ≤ ||(I − π_N)π_M|| ||x|| = ||(I − π_N)π_M||.

In particular,

max{dist(x, N) : x ∈ M, x*x = 1} ≤ ||(I − π_N)π_M||. (1.4.3)

Conversely, by the definition of the matrix norm, there exists a unit vector u ∈ ℂⁿ such that

||(I − π_N)π_M|| = ||(I − π_N)π_M u||. (1.4.4)

Two cases have to be distinguished: (i) if π_M u = 0, then π_M = π_N π_M, which implies that M ⊂ N, and so both sides of equation (1.4.2) are zero; (ii) if π_M u ≠ 0, then x_0 = π_M u/||π_M u|| is a unit vector of M and we deduce from equation (1.4.4) that

||(I − π_N)π_M|| = ||π_M u|| dist(x_0, N) ≤ dist(x_0, N),

because

||π_M u|| ≤ ||π_M|| ||u|| = 1.

Hence

||(I − π_N)π_M|| ≤ max{dist(x, N) : x ∈ M, x*x = 1}. (1.4.5)

The assertion (1.4.2) follows from equations (1.4.3) and (1.4.5).
(b) Proposition 1.4.1 can now be reformulated as follows:

ω(M, N) = max{||(I − π_N)π_M||, ||(I − π_M)π_N||}. (1.4.6)

Since π_M² = π_M and ||π_M|| = 1, we have

||(I − π_N)π_M|| = ||(π_M − π_N)π_M|| ≤ ||π_M − π_N||,

and similarly

||(I − π_M)π_N|| ≤ ||π_N − π_M|| = ||π_M − π_N||.

Hence

ω(M, N) ≥ max{||(I − π_N)π_M||, ||(I − π_M)π_N||}. (1.4.7)

In order to establish the opposite inequality we recall that there exists a unit vector x ∈ ℂⁿ such that

||π_M − π_N|| = ||(π_M − π_N)x||. (1.4.8)

Now

(π_M − π_N)x = π_M(I − π_N)x − (I − π_M)π_N x = u − v,

say, where

u = π_M(I − π_N)x, v = (I − π_M)π_N x.

Since π_M² = π_M and π_M(I − π_M) = 0, it is readily verified that

u*v = 0,

whence

||(π_M − π_N)x||² = ||u||² + ||v||².

Using the relations (I − π_N)² = I − π_N and π_N² = π_N we obtain

||u||² ≤ ||π_M(I − π_N)||² ||(I − π_N)x||², ||v||² ≤ ||(I − π_M)π_N||² ||π_N x||².

Since AB and BA have the same non-zero eigenvalues, we have ||π_M(I − π_N)|| = ||(I − π_N)π_M||. Hence

||u||² + ||v||² ≤ max{||(I − π_N)π_M||², ||(I − π_M)π_N||²}(||(I − π_N)x||² + ||π_N x||²).

A simple calculation shows that

||(I − π_N)x||² + ||π_N x||² = ||x||² = 1.

On referring to equation (1.4.8) and taking square roots we obtain

ω(M, N) ≤ max{||(I − π_N)π_M||, ||(I − π_M)π_N||}. (1.4.9)

The statement (1.4.6) now follows from equations (1.4.7) and (1.4.9) (see Figure 1.4.1 in ℝ²).

Theorem 1.4.2 Let P and P′ be projections on M and N respectively. Then

||P − P′|| < 1 implies that dim M = dim N.

PROOF We shall prove that ||P − P′|| < 1 implies that dim M ≤ dim N. Let x_1,…,x_r be a basis of M. We shall show that the vectors P′x_1,…,P′x_r are linearly independent. If not, suppose that

Σ_{i=1}^r α_i P′x_i = 0,

where α_1,…,α_r are not all zero. Put

y = Σ_{i=1}^r α_i x_i.

Then y ≠ 0, P′y = 0 and Py = y, because y ∈ M. Hence

(P − P′)y = y.

However, this contradicts the hypothesis that ||P − P′|| < 1. By interchanging M and N, we conclude in the same way that dim N ≤ dim M.

Corollary 1.4.3

ω(M, N) < 1 implies that dim M = dim N.

PROOF Choose P = π_M and P′ = π_N.

Theorem 1.4.4 Suppose that dim M = dim N = r ≤ n/2. Then the 2r eigenvalues of π_M − π_N which are not necessarily zero are equal to

± sin θ_i (i = 1,…,r).

PROOF Let [Q, Q̃] and [U, Ũ] be the bases of ℂⁿ defined in Exercise 1.2.6. Relative to the orthonormal basis [Q, Q̃], the projection π_M is represented by

(I_r 0 0)
(0   0 0)
(0   0 0)

and the projection π_N is represented by

(C )              (C²   −CS  0)
(−S) (C −S 0) =   (−CS  S²   0).
(0 )              (0    0    0)

Thus the map π_M − π_N is represented by

Π = (S²   CS   0)
    (CS   −S²  0).
    (0    0    0)

By a suitable permutation of the rows and the columns one verifies that the eigenvalues of π_M − π_N, which are not necessarily zero, are equal to the eigenvalues of the r two-by-two matrices

(s_j²     c_j s_j)
(c_j s_j  −s_j² ),

that is ±s_j (j = 1,…,r). Since Π is symmetric, the s_j are also the singular values of π_M − π_N.

Corollary 1.4.5

ω(M, N) = sin θ_max.

PROOF We remark that π_M − π_N and sin Θ have the same non-zero singular values, whence the result follows at once.
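Corollary 1.4.5 is easy to verify numerically (an added check with random subspaces of equal dimension, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 7, 3                                        # r <= n/2
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of M
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of N

piM, piN = Q @ Q.T, U @ U.T

# Canonical angles from the singular values of U*Q (Lemma 1.2.1).
c = np.clip(np.linalg.svd(U.T @ Q, compute_uv=False), 0.0, 1.0)
theta_max = np.arccos(c.min())

# Corollary 1.4.5: omega(M, N) = ||pi_M - pi_N||_2 = sin(theta_max).
gap = np.linalg.norm(piM - piN, 2)
assert np.isclose(gap, np.sin(theta_max))
```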


Let {M_k : k = 1, 2,…} be a sequence of subspaces. We are going to define what is meant by saying that this sequence tends to a subspace M. We put

dim M_k = r_k, dim M = r.

Definition The sequence of subspaces M_k converges to the subspace M as k → ∞ if and only if

ω(M_k, M) = sin θ_max^(k) → 0,

where θ_max^(k) is the maximal acute angle between M_k and M, or, equivalently, if and only if

||π_{M_k} − π_M||_2 → 0,

that is

π_{M_k} → π_M.
Lemma 1.5.1 For sufficiently great k we have

r_k = r.

PROOF By hypothesis,

ω(M_k, M) = ||π_{M_k} − π_M||_2 → 0.

Thus, for sufficiently great k,

ω(M_k, M) < 1.

The result now follows from Corollary 1.4.3.

Theorem 1.5.2 Let Q_k and Q be orthonormal bases for the subspaces M_k and M respectively (k = 1, 2,…). Without loss of generality, assume that

dim M_k = dim M = r.

Then

M_k → M as k → ∞

if and only if there exists a sequence of unitary matrices

U_k (k = 1, 2,…)

of order r such that

Q_k U_k → Q as k → ∞.

PROOF
(a) Assume that M_k → M, so that π_{M_k} → π_M or, in terms of the corresponding orthonormal bases, Q_kQ_k* → QQ*. Multiplying on the right by Q and using the fact that Q*Q = I, we obtain

Q_kC_k → Q, where C_k = Q_k*Q.
Our hypothesis implies that

C_k*C_k → I;

in particular, all eigenvalues of C_k*C_k tend to unity. Hence, for sufficiently great values of k, the Hermitian matrix C_k*C_k is positive definite. Every positive definite Hermitian matrix possesses at least one positive definite square root (see Horn and Johnson, 1990, p. 405). We shall denote a positive definite square root of C_k*C_k by (C_k*C_k)^{1/2} and its inverse by (C_k*C_k)^{−1/2}. It is easy to verify that

U_k = C_k(C_k*C_k)^{−1/2}

is unitary; indeed,

U_k*U_k = (C_k*C_k)^{−1/2}(C_k*C_k)(C_k*C_k)^{−1/2} = I.

In accordance with our assumption, all eigenvalues of C_k*C_k tend to unity as k → ∞. The same is true for (C_k*C_k)^{1/2}; hence

(C_k*C_k)^{1/2} → I as k → ∞.

We have

Q_k(C_k − U_k) = Q_kU_k[(C_k*C_k)^{1/2} − I].

Since Q_k and U_k remain bounded in the spectral norm, it follows that Q_k(C_k − U_k) → 0, and hence

Q_kU_k → Q, as required.
(b) Conversely, suppose there exist unitary matrices U_k (k = 1, 2,…) such that Q_kU_k → Q. We deduce that

(Q_kU_k)(Q_kU_k)* = Q_kQ_k* → QQ*,

which is equivalent to M_k → M.

Corollary 1.5.3 Suppose that M_k → M as k → ∞. Then:

(a) If, for each k, an orthonormal basis Q_k is given for M_k, there exists a subsequence {Q_l} and an orthonormal basis V for M such that Q_l → V.
(b) If an orthonormal basis Q is given for M, there exists, for each k, an orthonormal basis V_k for M_k such that V_k → Q.

PROOF
(a) By Theorem 1.5.2 there exists a sequence {U_k} of unitary matrices of order r such that Q_kU_k → Q. Every unitary matrix is of spectral norm unity. Hence {U_k} is a bounded sequence (in the spectral norm) and therefore possesses a convergent subsequence, say

U_l → U,

where l runs through an increasing sequence of positive integers. Evidently, U is a unitary matrix. We now have that

Q_lU_l → Q and Q_l → QU* = V,

say, whence

V*V = UQ*QU* = I,

so V is an orthonormal basis for M.
(b) This follows at once by putting

V_k = Q_kU_k.

By Theorem 1.5.2 we have V_k → Q, as required.

Proposition 1.5.4 Suppose that the subspaces M_k (k = 1, 2,…) and M are equipped with orthonormal bases Q_k and Q respectively. Let X_k and X be arbitrary bases for M_k and M. Then the existence of unitary matrices U_k such that Q_kU_k → Q is equivalent to the existence of invertible matrices F_k such that X_kF_k → X.

PROOF
(a) Suppose that

Q_kU_k → Q. (1.5.1)

There exist invertible matrices B_k and B such that

X_k = Q_kB_k and X = QB.

The adjoint bases are

Y_k = Q_k(B_k*)⁻¹ and Y = Q(B*)⁻¹.

The matrices

F_k = Y_k*X = B_k⁻¹Q_k*QB (k = 1, 2,…)

are invertible. It follows from the condition (1.5.1) that Q_kQ_k* → QQ*, and so Q_kC_k → Q, where C_k = Q_k*Q. Hence

X_kF_k = Q_kB_kB_k⁻¹Q_k*QB = Q_kC_kB → QB = X.

(b) Assume that X_kF_k → X, that is Q_k(B_kF_kB⁻¹) → Q. Now B_kF_kB⁻¹ = Q_k*Q = C_k. Therefore the hypothesis states that Q_kC_k → Q, whence C_k*C_k → Q*Q = I, and the proof of Theorem 1.5.2 yields unitary matrices U_k such that Q_kU_k → Q.

Corollary 1.5.5 Let X_k and X be bases for M_k and M respectively. Then M_k → M if and only if there exist invertible matrices F_k such that

X_kF_k → X. (1.5.2)

PROOF By Proposition 1.5.4, the condition (1.5.2) is equivalent to Q_kU_k → Q, which, in turn, is equivalent to M_k → M by virtue of Theorem 1.5.2.

Example 1.5.1 The result of Corollary 1.5.3 is optimal, as is confirmed by the following example. Let M = ℝ². The orthonormal bases

{(cos(1/ε), sin(1/ε))ᵀ, (−sin(1/ε), cos(1/ε))ᵀ}, (1.5.3)

where ε is real, have no limit when ε tends to zero. The subsequence obtained by putting ε_k = 1/(2kπ) does converge when k tends to infinity. Indeed, (1.5.3) is equal to {e_1, e_2} for k = 1, 2,…. We remark that (1.5.3) consists of the eigenvectors of the matrix

A(ε) = (1 + ε cos(2/ε)   ε sin(2/ε)    )
       (ε sin(2/ε)       1 − ε cos(2/ε))

corresponding to the eigenvalues 1 + ε and 1 − ε. When ε tends to zero, the matrix A(ε) tends to I and both eigenvalues tend to 1, the double eigenvalue of I.
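The behaviour described in this example can be reproduced numerically (an added sketch, NumPy assumed; the sampling points ε_k are chosen as in the text):

```python
import numpy as np

def A(eps):
    # the 2x2 matrix of Example 1.5.1
    c, s = np.cos(2 / eps), np.sin(2 / eps)
    return np.array([[1 + eps * c, eps * s],
                     [eps * s, 1 - eps * c]])

# A(eps) -> I and its eigenvalues 1 +/- eps -> 1 as eps -> 0 ...
for eps in [1e-2, 1e-4]:
    w = np.linalg.eigvalsh(A(eps))
    assert np.allclose(np.sort(w), [1 - eps, 1 + eps])

# ... yet the eigenvector basis keeps rotating with angle 1/eps and has no
# limit; along the subsequence eps_k = 1/(2*k*pi) the matrix is diagonal,
# so its eigenvectors are exactly {e1, e2}.
k = 10
eps_k = 1 / (2 * k * np.pi)
assert np.allclose(A(eps_k), np.diag([1 + eps_k, 1 - eps_k]))
```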


We are interested in the theoretical problem of reducing a matrix A to simpler forms with the help of similarity transformations

A ↦ X⁻¹AX,

which preserve the eigenvalues. We return to the problem (1.1.2) (page 4). We denote by

{λ_1,…,λ_d} (d ≤ n)

the set of distinct eigenvalues of A. Let λ be a particular eigenvalue. Its geometric multiplicity, g, is the greatest number of linearly independent eigenvectors that correspond to λ, that is

g = dim Ker(A − λI).

The algebraic multiplicity, m, of λ is the multiplicity with which λ appears as a zero of the characteristic polynomial π(t), that is

π(t) = (t − λ)^m π_1(t), π_1(λ) ≠ 0.

It will be shown that

g ≤ m

[see (1.6.4), page 26]. We denote by

{μ_1,…,μ_n}

the set of eigenvalues with repetitions, each counted with its algebraic multiplicity; for example, if λ_1 is of algebraic multiplicity m_1, we might put

μ_1 = ⋯ = μ_{m_1} = λ_1, μ_{m_1+1} = ⋯ = λ_2.

An eigenvalue of algebraic multiplicity unity is called simple; otherwise it is said to be multiple.
An eigenvalue of multiplicity m greater than unity is said to be semi-simple if it admits m linearly independent eigenvectors; otherwise it is said to be defective.
If A − λI is singular, so is A* − λ̄I, and λ̄ is an eigenvalue of A*. Thus if Ax = λx, there exists x_* ≠ 0 such that A*x_* = λ̄x_*, or, alternatively,

x_**A = λx_**.

The eigenvector x_* of A*, which corresponds to λ̄, is also called a left eigenvector of A corresponding to λ.

1.6.1 Diagonalisable Matrices

The matrix A is diagonalisable if and only if it is similar to a diagonal matrix. Let

D = diag(μ_1,…,μ_n)

be the diagonal matrix consisting of the eigenvalues of A.

Theorem 1.6.1 The matrix A is diagonalisable if and only if it possesses n linearly independent eigenvectors x_i (i = 1,…,n). It can then be decomposed into the form

A = XDX⁻¹, (1.6.1)

where the ith column of X (respectively the ith column of (X⁻¹)*) is the right eigenvector x_i (respectively the left eigenvector x_{i*}) associated with the eigenvalue μ_i.

PROOF Let X = [x_1,…,x_n] be the invertible matrix whose columns are the n linearly independent eigenvectors. The relation X⁻¹X = I implies that the rows x_{i*}* of X⁻¹ satisfy the equations

x_{i*}*x_j = δ_ij (i, j = 1,…,n). (1.6.2)

Now Ax_i = μ_i x_i (i = 1,…,n) can be written

AX = XD or A = XDX⁻¹.

Thus A is diagonalisable. Now AX = XD is equivalent to X⁻¹A = DX⁻¹; so

x_{i*}*A = μ_i x_{i*}* (i = 1,…,n).

The x_{i*} are the eigenvectors of A* normalized by the relations (1.6.2). Moreover, X_* = [x_{i*}] is the adjoint basis of X = [x_i], and X_**X = I is equivalent to X_** = X⁻¹. Conversely, if A is diagonalisable, it possesses n linearly independent eigenvectors. We leave the proof to the reader.
Thus A is diagonalisable if and only if its eigenvalues are semi-simple. In that case A, too, is termed semi-simple. When A is not diagonalisable, it is called defective.
The decomposition (1.6.1), when it exists, is interesting on account of the simplicity of the diagonal form. However, in practice, even if A is diagonalisable, the matrix X, though invertible, may be ill-conditioned with regard to inversion, which renders X⁻¹AX difficult to compute. For this reason the next section will be devoted to similarity transformations by unitary transformations; in the Euclidean norm these have a conditioning coefficient equal to unity.

1.6.2 Unitary Transformations

We are going to show that every matrix A is unitarily similar to an upper triangular matrix; this is Schur's form for A. The following theorem is an existence result; a constructive algorithm, called QR, will be given in Chapter 5 for certain classes of matrices.

Theorem 1.6.2 There exists a unitary matrix Q such that Q*AQ is an upper triangular matrix whose diagonal elements are the eigenvalues μ_1,…,μ_n in this order.

PROOF The proof proceeds by induction on the order n of A. The theorem is trivial when n = 1. We assume that it holds for matrices of order n − 1. By means of a reduction technique we shall produce a matrix of order n − 1 that has the same eigenvalues as A except μ_1.
Let x_1 be an eigenvector of A such that

Ax_1 = μ_1 x_1, ||x_1||_2 = 1.

There exists a matrix U of size n by n − 1 such that [x_1, U] is unitary: the columns of U are orthogonal to x_1, that is U*x_1 = 0. Then

[x_1, U]*A[x_1, U] = (μ_1  x_1*AU)
                     (0    U*AU ).

The eigenvalues of U*AU are μ_2,…,μ_n. The result follows by induction.
The chosen order of the eigenvalues {μ_i} determines the Schur basis Q apart from a unitary block-diagonal matrix (see Exercise 1.6.5).

Remark If A is real and if one uses orthogonal transformations, then A can be reduced to an upper block-triangular matrix, where the order of the diagonal blocks is at most 2 by 2. The 2 by 2 diagonal blocks have conjugate complex eigenvalues.
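Both the complex Schur form of Theorem 1.6.2 and the real quasi-triangular form of the Remark can be computed in practice (an added illustration, assuming SciPy is available; the matrix is random):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

# Complex Schur form: Q unitary, T = Q*AQ upper triangular with the
# eigenvalues of A on its diagonal (Theorem 1.6.2).
T, Q = schur(A, output='complex')
assert np.allclose(Q.conj().T @ Q, np.eye(4))      # Q unitary
assert np.allclose(np.tril(T, -1), 0)              # T upper triangular
assert np.allclose(np.sort_complex(np.diag(T)),
                   np.sort_complex(np.linalg.eigvals(A)))

# Real Schur form: orthogonal Q, quasi-triangular T with 1x1 and 2x2
# diagonal blocks (the Remark above).
Tr, Qr = schur(A, output='real')
assert np.allclose(Qr @ Tr @ Qr.T, A)
```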

We recall that two m by n matrices A and B are said to be equivalent if there exist invertible matrices X and Y such that

B = XAY.

More especially, B is unitarily equivalent to A if

B = U*AV,

where U and V are unitary matrices of orders m and n respectively.
Extending the transformations on A so as to include equivalence transformations, we shall prove that every matrix is unitarily equivalent to a diagonal matrix with non-negative elements, the diagonal consisting of its singular values. This is the singular value decomposition (abbreviated to SVD).

Theorem 1.6.3 Let A be an m by n matrix and put q = min(m, n). There exist unitary matrices U and V of orders m and n respectively such that U*AV = diag(σ_i) is of order m by n and σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_q ≥ 0. [This means that U*AV = (s_ij), where s_ii = σ_i (i = 1,…,q) and s_ij = 0 otherwise.]

PROOF There are vectors x ∈ ℂⁿ and y ∈ ℂᵐ such that ||x||_2 = ||y||_2 = 1 and Ax = σ_1 y, where

σ_1 = ||A||_2.

We construct unitary matrices U and V of the form

U = [y, U_1] and V = [x, V_1].

Then

A_1 = U*AV = (σ_1  w*)
             (0    B ),

where

w* = y*AV_1 and B = U_1*AV_1.

Let φ = (σ_1² + w*w)^{−1/2}. The vector

u_1 = φ (σ_1)
        (w  )

is a unit vector such that

A_1 u_1 = φ (σ_1² + w*w)
            (Bw        ).

We deduce that

||A_1 u_1||_2² ≥ φ²(σ_1² + w*w)² = σ_1² + w*w.

On the other hand,

σ_1² = ||A||_2² = ||A_1||_2² ≥ ||A_1 u_1||_2².

It follows that

σ_1² ≥ σ_1² + w*w,

and so w = 0. The proof is now completed by induction, by virtue of the fact that ||B||_2 ≤ ||A_1||_2 = σ_1.
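The decomposition of Theorem 1.6.3 is exactly what a numerical SVD routine returns (an added check with a random matrix, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Theorem 1.6.3: U*AV = diag(sigma_i), sigma_1 >= ... >= sigma_q >= 0.
U, s, Vh = np.linalg.svd(A)               # A = U @ S @ Vh, with V = Vh*
S = np.zeros((m, n))
S[:n, :n] = np.diag(s)
assert np.allclose(U.conj().T @ A @ Vh.conj().T, S)
assert np.all(np.diff(s) <= 1e-12) and np.all(s >= 0)   # decreasing, >= 0

# sigma_1 = ||A||_2, the starting point of the proof.
assert np.isclose(s[0], np.linalg.norm(A, 2))
```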
Returning to similarity transformations of square matrices and employing transformations that are not necessarily unitary, we shall put an arbitrary matrix (diagonalisable or defective) into a particular block-diagonal form, known as Jordan's* form.

1.6.3 The Jordan Form

Numerous proofs have been given to establish the Jordan form of an arbitrary defective matrix (see Horn and Johnson, 1990, pp. 121-6). We present here a strongly computational proof which starts from Schur's form. A further definition and three preliminary lemmas are required.

Definition An m by m matrix U = (u_ij) is said to be strictly upper triangular if u_ij = 0 when 1 ≤ j ≤ i ≤ m. Evidently,

Uᵐ = 0.

Lemma 1.6.4 Let R be an upper triangular matrix. There exists an invertible matrix Z such that

Z⁻¹RZ = diag(R_i),

where

R_i = λ_i I + U_i (i = 1,…,d),

U_i is a strictly upper triangular matrix and the λ_i are distinct.

PROOF We use induction with respect to n. The theorem is true when n = 1. Assume now that the theorem is true for upper triangular matrices of order less than n. Let R be an upper triangular matrix of order n. Since the reduction to the Schur form allows us to arrange the eigenvalues in any manner, we may assume that

R = (R_1  S  )
    (0    R_2),

where

R_1 = λ_1 I + U_1

and R_1 and R_2 have no eigenvalue in common. There exists a matrix B with the property that

(I  B)(R_1  S  )(I  −B)   (R_1  0  )
(0  I)(0    R_2)(0  I ) = (0    R_2)

if and only if

S = R_1 B − B R_2. (1.6.3)

It will be shown in Section 1.12 (Proposition 1.12.1) that equation (1.6.3) has a unique solution for B provided that

sp(R_1) ∩ sp(R_2) = ∅,

as is indeed the case in this situation.

* Camille Jordan, 1838-1921, born in Lyon, died in Paris.

Lemma 1.6.5 Let E be a k × k matrix of the form

E = (0  I_{k−1})
    (0  0      ).

Then

Eᵏ = 0,
E e_{i+1} = e_i (i = 1,…,k − 1),
I − EᵀE = e_1 e_1ᵀ.

PROOF The above relations are verified immediately.

Lemma 1.6.6 Let U be a strictly upper triangular matrix of order m. Then there exists an invertible matrix Y such that

Y⁻¹UY = N = diag(E_j),

where each E_j has the form described in Lemma 1.6.5. The block E_j is of order k_j and these orders are arranged in decreasing order of magnitude. (When k_j = 1, then E_j is the zero matrix of order unity.)

PROOF We use induction with respect to m. The lemma is true when m = 1. We assume that it holds for all strictly upper triangular matrices of order less than m, and we write

U = (0  uᵀ )
    (0  U_1),

where U_1 is strictly upper triangular of order m − 1. By hypothesis, there exists an invertible matrix Y_1 such that

Y_1⁻¹U_1Y_1 = (E_1  0  )
              (0    N_2),

where

N_2 = diag(E_2,…,E_q), order(E_1) ≥ order(E_j) (j ≥ 2).

Then

(1  0    )(0  uᵀ )(1  0  )   (0  uᵀY_1       )
(0  Y_1⁻¹)(0  U_1)(0  Y_1) = (0  Y_1⁻¹U_1Y_1).

Let uᵀY_1 = [u_1ᵀ, u_2ᵀ] be partitioned in a manner consistent with the partitioning of Y_1⁻¹U_1Y_1. Then with the aid of Lemma 1.6.5 we find that

(1  −u_1ᵀE_1ᵀ  0)(0  u_1ᵀ  u_2ᵀ)(1  u_1ᵀE_1ᵀ  0)   (0  u_1ᵀ(I − E_1ᵀE_1)  u_2ᵀ)
(0  I          0)(0  E_1   0   )(0  I          0) = (0  E_1               0   )
(0  0          I)(0  0     N_2 )(0  0          I)   (0  0                 N_2 ),

where

u_1ᵀ(I − E_1ᵀE_1) = u_1ᵀe_1e_1ᵀ = σe_1ᵀ,

say, with

σ = u_1ᵀe_1.

We now have to distinguish two cases. First, when σ ≠ 0, conjugation by the invertible matrix diag(σ⁻¹, I, σ⁻¹I) shows that the matrix above is similar to

(0  e_1ᵀ  u_2ᵀ)
(0  E_1   0   ),
(0  0     N_2 )

which, on grouping the first 1 + k_1 rows and columns, takes the form

(Ñ  e_1u_2ᵀ)
(0  N_2    ),

where

Ñ = (0  e_1ᵀ)
    (0  E_1 )

and the order of Ñ exceeds that of E_1 by unity. Put

s_iᵀ = u_2ᵀN_2^{i−1} (i = 1, 2,…,k_2 + 1).

We observe that

s_{k_2+1}ᵀ = u_2ᵀN_2^{k_2} = 0,

because the blocks of N_2 are arranged in decreasing order. Consider the sequence of similarity transformations

(I  e_{i+1}s_iᵀ)(Ñ  e_is_iᵀ)(I  −e_{i+1}s_iᵀ)   (Ñ  e_{i+1}s_{i+1}ᵀ)
(0  I          )(0  N_2    )(0  I            ) = (0  N_2            ),

where we have used the fact that

Ñe_{i+1} = e_i

(Lemma 1.6.5). Since s_{k_2+1} = 0, this proves that U is indeed similar to

(Ñ  0  )
(0  N_2),

where

order(Ñ) > order(E_2).

Next, suppose that σ = 0. Then a simple permutation of the rows and columns, which amounts to a similarity transformation, shows that U is similar to

(E_1  0  0   )
(0    0  u_2ᵀ)
(0    0  N_2 ).

By the inductive hypothesis, there exists an invertible matrix X_2 such that

X_2⁻¹ (0  u_2ᵀ) X_2 = N_2′,
      (0  N_2 )

so U is similar to

(E_1  0   )
(0    N_2′),

where N_2′ has the block-diagonal form required.

Theorem 1.6.7 Let A be a defective matrix of order n with distinct eigenvalues λ_1,…,λ_d (d < n). Then there exists an invertible matrix X such that

X⁻¹AX = diag(J_ij),

where

J_ij = λ_i I + E_ij

and E_ij is a matrix of order k_ij of the form

E_ij = (0  I)
       (0  0)   (j = 1,…,g_i; i = 1,…,d);

that is, there are g_i blocks E_ij corresponding to a particular λ_i.

PROOF We start with the Schur form of A, namely

Q*AQ = R,

where R is an upper triangular matrix. Then R is transformed into the block-diagonal form diag(R_i) by Lemma 1.6.4. By Lemma 1.6.6 we have

B_i = Y_i⁻¹(λ_i I + U_i)Y_i = λ_i I + diag(E_{i1},…,E_{ig_i}) (i = 1,…,d).

The set of Jordan blocks J_ij (j = 1,…,g_i) associated with the same eigenvalue λ_i constitutes the Jordan box associated with λ_i. Its order is

m_i = k_{i1} + ⋯ + k_{ig_i};

it contains g_i blocks and so

g_i ≤ m_i. (1.6.4)

Let ℓ_i be the dimension of the largest Jordan block associated with λ_i. Then

(B_i − λ_i I)^{ℓ_i} = 0,

for there are no more than ℓ_i − 1 consecutive units along the first superdiagonal of B_i. As always, let λ_1,…,λ_d be the distinct eigenvalues of A. Theorem 1.6.7 shows that

ℂⁿ = M_1 ⊕ ⋯ ⊕ M_d,

where M_i is the invariant subspace associated with λ_i; we then have

dim M_i = m_i,

which is the algebraic multiplicity of λ_i. We call ℓ_i the index of λ_i.
It can be proved that the Jordan form is unique apart from the arrangements of the blocks along the diagonal.
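The quantities m, g and ℓ can be read off a Jordan form computed in exact arithmetic, which sidesteps the numerical instability mentioned earlier (an added sketch, assuming SymPy is available; the matrix is a made-up defective example, not from the text):

```python
from sympy import Matrix, eye, zeros

# A defective 4x4 matrix: single eigenvalue 2, algebraic multiplicity m = 4.
A = Matrix([[2, 1, 0, 1],
            [0, 2, 1, 0],
            [0, 0, 2, 0],
            [0, 0, 0, 2]])

N = A - 2*eye(4)
g = 4 - N.rank()                  # geometric multiplicity = number of blocks
assert g == 2
assert N**3 == zeros(4, 4) and N**2 != zeros(4, 4)   # index l = 3

# Jordan form: A = P J P^{-1}, here J = diag(J_3(2), J_1(2)).
P, J = A.jordan_form()
assert A == P * J * P.inv()
```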

Example 1.6.1 Let λ be an eigenvalue of algebraic multiplicity m = 7, geometric multiplicity g = 3 and index ℓ = 3. There are two possible forms for the Jordan box associated with λ; each contains three blocks and there cannot be more than two consecutive units on the superdiagonal. The block orders are therefore either 3, 3, 1 or 3, 2, 2, and the two possible boxes are

diag(J_3(λ), J_3(λ), J_1(λ)) and diag(J_3(λ), J_2(λ), J_2(λ)),

where J_k(λ) = λI_k + E denotes the Jordan block of order k associated with λ.


Let λ_1,…,λ_d be the distinct eigenvalues of A. The spectral projection associated with λ_i is the projection P_i on the invariant subspace M_i parallel to ⊕_{j≠i} M_j.

Theorem 1.7.1 Each matrix A possesses a spectral decomposition of the form

A = Σ_{i=1}^d (λ_i P_i + D_i), D_i^{ℓ_i} = 0. (1.7.1)

PROOF By Theorem 1.6.7, we have

A = XJX⁻¹,

where J is a block-diagonal matrix consisting of d Jordan boxes B_1,…,B_d. The box B_i is an m_i × m_i matrix of the form

B_i = λ_i I_{m_i} + N_i,

where N_i is a matrix whose only non-zero elements appear on the first superdiagonal and can be taken to be equal to unity.
Let X_i (respectively X_{i*}*) be the matrix formed by the m_i columns of X (respectively rows of X⁻¹) which are associated with λ_i. The column vectors of X_i furnish a basis for M_i which possesses as the adjoint basis the corresponding row vectors of X⁻¹, that is

X_{i*}*X_i = I_{m_i}.

The matrix

P_i = X_i X_{i*}*

represents the projection on M_i parallel to

⊕_{j≠i} M_j.

This is the spectral projection associated with λ_i and is illustrated by the following diagram:

X (0  0    0)
  (0  B_i  0) X⁻¹ = X_i B_i X_{i*}* = λ_i P_i + D_i.
  (0  0    0)


Hence

A = Σ_{i=1}^d X_i(λ_i I_{m_i} + N_i)X_{i*}* = Σ_{i=1}^d (λ_i P_i + D_i),

where D_i = X_i N_i X_{i*}*. According to Theorem 1.6.7, the matrix N_i has at most ℓ_i − 1 consecutive units along the superdiagonal. Therefore

N_i^{ℓ_i} = 0 and D_i^{ℓ_i} = X_i N_i^{ℓ_i} X_{i*}* = 0.

We leave it to the reader to verify the following relations:

P_iP_j = δ_ij P_i, D_iP_j = δ_ij D_i, D_iD_j = 0 if i ≠ j;
AP_i = P_iA = P_iAP_i = λ_iP_i + D_i, D_i = (A − λ_iI)P_i.

If, for an eigenvalue λ_i, the invariant subspace M_i is identical with the eigenspace Ker(A − λ_iI), then ℓ_i = 1 and D_i = 0 in (1.7.1). In this case we say that the spectral projection P_i reduces to the eigenprojection.

Corollary 1.7.2 The spectral decomposition of a diagonalisable matrix A is of the form

A = Σ_{i=1}^d λ_i P_i,

where P_i is the eigenprojection associated with λ_i.

PROOF If A is diagonalisable, then, for each eigenvalue λ_i, the invariant subspace M_i is identical with the eigenspace Ker(A − λ_iI), that is ℓ_i = 1 and D_i = 0.

There are evidently infinitely many ways in which a projection on M_i can be
defined. In particular one might consider the orthogonal projection. We have
chosen a projection that is related to the properties of

(A − zI)⁻¹   (z ∈ ℂ),

as we shall see in Chapter 2. In Chapter 4 this will enable us easily to derive the
convergence properties of a sequence of spectral projections from the convergence
of the corresponding sequence of matrices.

Proposition 1.7.3 The spectral decomposition of the transposed conjugate matrix

A* = Σ_{i=1}^d (λ̄_i P_i* + D_i*)

constitutes a Jordan decomposition of A*.


PROOF The proof is immediate, as can be seen from the following example:

J = (λ 1 0 0; 0 λ 0 0; 0 0 λ 1; 0 0 0 λ),   J* = (λ̄ 0 0 0; 1 λ̄ 0 0; 0 0 λ̄ 0; 0 0 1 λ̄) = P J' P⁻¹,

J' = (λ̄ 1 0 0; 0 λ̄ 0 0; 0 0 λ̄ 1; 0 0 0 λ̄),   P = (0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0).

Generally, the determination of P is given in Exercise 1.6.16.

We deduce from Proposition 1.7.3 that λ_i and λ̄_i have the same multiplicities
and indices in A and A* respectively.

Example 1.7.1 Consider a 2 × 2 matrix A with the eigenvalues λ₁ = 3 and λ₂ = −1
and corresponding eigenvectors x₁ and x₂. The matrix A* has eigenvectors y₁ and y₂,
normalized by the conditions

y₁* x₁ = y₂* x₂ = 1.

The eigenprojections are

P₁ = x₁ y₁*,   P₂ = x₂ y₂*.

It is readily verified that

A = 3P₁ − P₂,   A* = 3P₁* − P₂*.
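A computation of this kind is easy to reproduce. The sketch below uses the hypothetical stand-in A = (2 1; 3 0), which also has spectrum {3, −1} (it is not the matrix of the original example); the rows of X⁻¹ supply the adjoint eigenvectors directly:

```python
import numpy as np

A = np.array([[2.0, 1.0], [3.0, 0.0]])   # hypothetical; eigenvalues 3 and -1
lam, X = np.linalg.eig(A)                # columns of X: right eigenvectors x_i
Y = np.linalg.inv(X)                     # rows of X^{-1}: adjoint vectors y_i* (y_i* x_i = 1)
P = [X[:, [i]] @ Y[[i], :] for i in range(2)]   # eigenprojections P_i = x_i y_i*

print(np.allclose(A, lam[0] * P[0] + lam[1] * P[1]))  # True: A = 3 P_1 - P_2
print(np.allclose(P[0] @ P[1], 0))                    # True: P_1 P_2 = 0
```

Note that the P_i are oblique (not orthogonal) projections whenever A is not normal.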

Theorem 1.6.7 allows us to construct a basis X of the invariant subspace M
associated with λ:

AX = XB,   where B = λI_m + N

is the Jordan box of order m associated with λ. The matrix

B = X_*^* A X

represents the restriction A_{|M} in the adjoint bases X and X_*.

Let M_* be the subspace generated by the basis X_*. The relationship

X_*^* A = B X_*^*

or, alternatively,

A* X_* = X_* B*

implies that M_* is the invariant subspace for A* associated with λ̄; it is also called
the left invariant subspace of A associated with λ.

Let us change the bases of M and M_*. Define

X' = XC   and   X'_* = X_*(C⁻¹)*

so that

X'_*^* X' = I.

We obtain

AXC = XBC   or   AX' = X'B',

where

B' = C⁻¹BC = X'_*^* A X'.

Now B' represents A_{|M} in the new bases; it is, in general, no longer an upper
triangular matrix.

Lemma 1.7.4 Let (X, Y) and (X, Y') be two pairs of adjoint bases in (M, N) and
(M, N') respectively, where M is an invariant subspace for A. Then

B = Y*AX = Y'*AX.

PROOF Since AX = XB and Y'*X = I, we have

B' = Y'*AX = Y'*XB = B.



Definition Let X be a rectangular matrix of size n by m, where m < n. The rank
of X, denoted by r(X), is the largest order of an invertible matrix that can be
extracted from X by selecting some of its rows and columns and forming their inter-
section.

We quote the following results without proofs.

Lemma 1.8.1 The rank of X is equal to the number of non-zero singular values
of X.

Proposition 1.8.2 Let r(X) = m. Then there exist an n × m orthonormal matrix Q
and an upper triangular invertible matrix R such that

X = QR. (1.8.1)

PROOF The proof can be found in Horn and Johnson (1990, p. 112).

The formula (1.8.1) is called the Schmidt, or QR, factorization of X. Several
algorithms exist for obtaining the factorization (1.8.1). We mention the methods
of Gram* and Schmidt, the modified Gram–Schmidt process and those of
Householder and Givens (see Golub and Van Loan, 1989, pp. 146-62).
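A minimal sketch of the factorization (1.8.1), using NumPy's Householder-based QR on a generic random basis (the matrix is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))     # a generic full-rank basis, r(X) = m = 3
Q, R = np.linalg.qr(X)              # reduced QR: Q is 5x3 orthonormal, R is 3x3

print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q*Q = I
print(np.allclose(Q @ R, X))             # True: X = QR
print(np.allclose(R, np.triu(R)))        # True: R upper triangular
```

This is precisely the construction of an orthonormal basis Q of the space spanned by the columns of X mentioned below.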

Proposition 1.8.3 If r(X) = r < m, there exists a permutation matrix Π such that

XΠ = QR,

where Q is orthonormal,

R = (R₁₁ R₁₂; 0 0)

and R₁₁ is an upper triangular invertible matrix of order r.

PROOF For a proof see Exercise 1.8.1.

The QR factorization is a very important tool, both from a theoretical and a
practical point of view. It enables us to construct an orthonormal basis Q of a
vector space M by starting from an arbitrary basis X.

In a practical situation it may happen that the vectors of X, though independent
mathematically, are almost dependent numerically, that is cond₂(X) and also
cond₂(R) are large. This leads to the notion of a numerical rank for such a set of
vectors.

*Jörgen Pedersen Gram, 1850-1916, born at Nustrup, died in Copenhagen.


Definitions Let

σ₁ ≥ σ₂ ≥ ··· ≥ σ_m

be the singular values of X and let ε be a given positive number.

(a) The matrix X is said to be of ε-rank r if exactly r singular values satisfy

σ_i > ε   (i = 1, ..., r).

(b) The m column vectors of X are dependent within ε if there exists an invertible
matrix B of order m such that XB is of ε-rank less than m. A matrix X which
is of rank m but of ε-rank r is said to have a numerical rank equal to r.
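The ε-rank is read off the singular values. In the sketch below (the matrix is a hypothetical illustration), X has exact rank 2, but its second singular value is tiny, so its numerical rank for ε = 10⁻⁸ is 1:

```python
import numpy as np

X = np.array([[1.0, 1.0 + 1e-10],
              [1.0, 1.0],
              [0.0, 1e-12]])               # columns independent, but barely
s = np.linalg.svd(X, compute_uv=False)     # singular values, decreasing
eps = 1e-8
eps_rank = int(np.sum(s > eps))            # number of sigma_i > eps

print(np.linalg.matrix_rank(X), eps_rank)  # mathematical rank 2, eps-rank 1
```

This is the typical situation described above: cond₂(X) = σ₁/σ₂ is huge even though the columns are linearly independent in exact arithmetic.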


We recall that a matrix A is said to be normal if it commutes with its conjugate
transpose, that is

AA* = A*A.

A Hermitian matrix (A = A*) is a special case of a normal matrix.

With regard to the eigenvalue problem, Hermitian or normal matrices possess
numerous remarkable properties, which we quote for reference:

(a) A Hermitian or normal matrix possesses an orthonormal basis consisting of
eigenvectors.
(b) We have

A = Σ_{i=1}^d λ_i P_i,

where the eigenprojections P_i are orthogonal:

P_i = P_i*.

(c) ||A||₂ = ρ(A)

(see Exercises 1.9.3 and 1.9.4).
In addition, the eigenvalues of a Hermitian matrix possess the following
min-max representation (Fischer-Poincaro*).
Theorem 1.9.1 Let A be a Hermitian matrix with eigenvalues

μ]ί = min max (x*Ax; xe Vk, x*x = 1)
(k= 1,..., n), where Vk ranges over all k-dimensional subspaces of<Cn.

* Henri Poincare, 1954-1912, born in Nancy, died in Paris.


PROOF See, for example, Ciarlet (1989, Theorem 1.3.1, p. 16). The max-min
characterization, which is due to Courant* and Weyl†, is demonstrated in
Exercise 1.9.5.

The number

ℛ(x) = x*Ax / x*x,

defined for x ≠ 0, is called the Rayleigh‡ quotient of A for the vector x. This
number plays a very important part in the calculation of the eigenvalues of
Hermitian matrices. In particular, it is an immediate consequence of Theorem
1.9.1 that

μ₁ ≤ x*Ax ≤ μ_n,

where x is any vector satisfying x*x = 1.
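The enclosure of the Rayleigh quotient between the extreme eigenvalues is easy to observe numerically. A minimal sketch (the symmetric matrix is a hypothetical example):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # Hermitian (real symmetric)
mu = np.linalg.eigvalsh(A)                 # mu_1 <= mu_2 <= mu_3

rng = np.random.default_rng(2)
for _ in range(5):
    x = rng.standard_normal(3)
    x /= np.linalg.norm(x)                 # enforce x*x = 1
    r = x @ A @ x                          # Rayleigh quotient R(x)
    assert mu[0] - 1e-12 <= r <= mu[-1] + 1e-12
print("every Rayleigh quotient lies in [mu_1, mu_n]")
```

Equality at either end is attained exactly on the corresponding eigenvectors, which is what makes ℛ(x) useful for computing extreme eigenvalues.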

A matrix is said to be non-negative if all its elements are either positive or zero.
Such matrices occur in the numerical treatment of partial differential equations,
in the theory of probability, in physics, chemistry and economics (see Chapter 3).
We quote the principal result, which is known as the Perron§–Frobenius
theorem; it is concerned with what are called irreducible matrices.
This notion is defined as follows: an n by n matrix A = (a_{ij}) is said to be reducible
if the indices 1, 2, ..., n can be split into two non-empty disjoint sets

i₁, ..., i_r;   j₁, ..., j_s   (r + s = n)

in such a way that

a_{i_α j_β} = 0   (α = 1, ..., r; β = 1, ..., s).

If such a splitting is impossible the matrix is said to be irreducible.

Theorem 1.10.1 Let A be an n by n irreducible non-negative matrix with distinct
eigenvalues λ₁, ..., λ_d.

If ρ = max_j |λ_j|, then ρ is a simple eigenvalue of A and there exists an eigenvector
x, associated with ρ, all of whose components are positive. Every eigenvalue of
modulus ρ is simple, and every eigenvector with positive components is proportional
to x. Moreover, if the elements of A are strictly positive, then ρ > |λ_j| for every
eigenvalue λ_j other than ρ.

*Richard Courant, 1888-1972, born at Lublinitz, died at New Rochelle.
†Hermann Weyl, 1885-1955, born at Elmshorn, died at Zürich.
§Oskar Perron, 1880-1975, born at Frankenthal, died in Munich.
‡John William Strutt (Lord Rayleigh), 1842-1919, born at Langford Grove, died at Terling Place.

PROOF The reader is referred to the book by Varga (1962, p. 30) for the
definitions and the proof or to the book by Gantmacher (1960, Vol. II, Ch. 13)
or to Horn and Johnson (1990, Ch. 8).
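The conclusions of Theorem 1.10.1 are easy to check on a small example. The sketch below uses the hypothetical irreducible non-negative matrix A = (0 2; 3 1), whose eigenvalues are 3 and −2, so ρ = 3 is strictly dominant here:

```python
import numpy as np

A = np.array([[0.0, 2.0], [3.0, 1.0]])   # irreducible, non-negative (hypothetical)
lam, X = np.linalg.eig(A)
k = int(np.argmax(np.abs(lam)))
rho = lam[k].real                        # Perron root = spectral radius = 3
x = X[:, k].real
x = x / x[np.argmax(np.abs(x))]          # fix an overall sign

print(rho > abs(lam[1 - k]))             # True: rho strictly dominates here
print(np.all(x > 0))                     # True: the Perron vector is positive
```

In practice the Perron root and vector are often computed by the power method, which converges precisely because of this dominance.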


1.11.1 Hermitian Matrices

In Section 1.9 we defined the Rayleigh quotient for a single vector x. We shall
now generalize this definition by considering several vectors.
Let Q be an orthonormal basis for an arbitrary subspace M; thus Q*Q = I. Then
π_M = QQ* is the orthogonal projection on M.

Definition Let A be a Hermitian matrix; the restriction of A to M is denoted by
A_{|M}. The map π_M A_{|M} from M to M is called the section of A on M, and

ℛ(Q) = Q*AQ

is called the Rayleigh matrix quotient of A on Q (see Householder, 1964, p. 74).

Since π_M A π_M = Q(Q*AQ)Q*, it follows that ℛ(Q) represents π_M A_{|M} relative to
the orthonormal basis Q of M. If M is invariant under A, then the section π_M A_{|M}
is identical with the restriction A_{|M}.

1.11.2 Arbitrary Matrices

To be sure, the number ℛ(x) is still defined when A is no longer Hermitian. In
this case, however, it may be more interesting to consider

ℛ(x, y) = y*Ax,

which is the generalized Rayleigh quotient constructed for two vectors x, y such
that y*x = 1.

Let X be a basis for the subspace M and let Y be the adjoint basis for the subspace
N; so

Y*X = I.

Then

ℛ(X, Y) = Y*AX

is the matrix Rayleigh quotient for adjoint bases X and Y. Let

P = XY*;

then

PAP = X(Y*AX)Y*.

When M is an invariant subspace, we have

AX = XB,   where B = Y*AX.

However, in general,

AX − XB = R,

where R is interpreted as the residual matrix for A associated with X and Y.


1.12.1 Block-diagonalisation of T = (A C; 0 B)

Let A and B be square matrices of orders n and r respectively. Suppose that the
n × r matrix Z is a solution of Sylvester's* equation:

AZ − ZB = C. (1.12.1)

It is easy to verify that, if T = (A C; 0 B), then

(I Z; 0 I) T (I −Z; 0 I) = (A 0; 0 B).

The matrices

(I_n; 0)   and   (I_n; Z*)

are bases for the right and left invariant subspaces of T respectively, both being
associated with the matrix A. The corresponding spectral projection is given by

P = (I_n; 0)(I_n Z) = (I_n Z; 0 0).
1.12.2 Sylvester's Equation AZ-ZB = C

The above matrix equation may be regarded as a particular system of nr
equations for nr (scalar) unknowns. We define the function

vec: ℂ^{n×r} → ℂ^{nr},

which associates with every n × r matrix a column vector of order nr, as follows. Let

Z = [z₁, ..., z_r],

where z₁, ..., z_r are the columns of Z; then

vec Z = (z₁; ...; z_r) ∈ ℂ^{nr}.

Writing

z = vec Z,   c = vec C,

the equation

AZ − ZB = C

is equivalent to

𝒯z = c, (1.12.2)

where

𝒯 = I_r ⊗ A − Bᵀ ⊗ I_n,

the symbol ⊗ denoting the tensor product. Explicitly,

𝒯 = (A − b₁₁I_n  ···  −b_{r1}I_n; ⋮ ⋱ ⋮; −b_{1r}I_n  ···  A − b_{rr}I_n). (1.12.3)

Let T be the linear map on ℂ^{n×r} defined by

T: Z → AZ − ZB. (1.12.4)

By virtue of the isomorphism between ℂ^{n×r} and ℂ^{nr} we identify T with 𝒯.

* James Joseph Sylvester, 1814-1897, born and died in London.
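The identification of T with 𝒯 can be checked directly: with column-stacking vec, one has vec(AZ) = (I ⊗ A) vec Z and vec(ZB) = (Bᵀ ⊗ I) vec Z. A minimal sketch on random (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((r, r))
Z = rng.standard_normal((n, r))

# The matrix of (1.12.2)-(1.12.3): I_r (x) A - B^T (x) I_n
T = np.kron(np.eye(r), A) - np.kron(B.T, np.eye(n))

lhs = T @ Z.flatten(order="F")            # T (vec Z); order="F" stacks columns
rhs = (A @ Z - Z @ B).flatten(order="F")  # vec(AZ - ZB)
print(np.allclose(lhs, rhs))              # True
```

The `order="F"` flag matters: NumPy flattens row-wise by default, whereas vec as defined above stacks the columns z₁, ..., z_r.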

1.12.3 The Use of the Schur Form of B

We may think of equation (1.12.1) as a system of equations for the column
vectors z₁, ..., z_r. These unknowns are linked by the matrix B. We may separate
them by using the Schur form

B = QTQ*,

where T is an upper triangular matrix with diagonal

(μ₁, ..., μ_r) = sp(B).

Put

Z' = ZQ   and   C' = CQ.

Equation (1.12.1) is equivalent to

AZ' − Z'T = C'. (1.12.5)

Further, let

z' = vec Z'   and   c' = vec C'.

Then equation (1.12.5) is equivalent to

𝒯'z' = c',

where

𝒯' = I_r ⊗ A − Tᵀ ⊗ I_n = (A − μ₁I_n  0; ⋱; −t_{ij}I_n  A − μ_rI_n)

is a block-triangular matrix. More explicitly, we can write the equations as the
recursive system

(A − μ₁I)z'₁ = c'₁,
(A − μ₂I)z'₂ = c'₂ + t₁₂z'₁, (1.12.6)
⋮
(A − μ_rI)z'_r = c'_r + Σ_{i=1}^{r−1} t_{ir}z'_i.

This system may be solved recursively for z'₁, ..., z'_r provided that each of the
matrices A − μ_iI is invertible. The original unknowns can then be determined with
the aid of the equation

Z = Z'Q*.

Hence we have the following proposition.

Proposition 1.12.1 The map T is invertible if and only if sp(A) ∩ sp(B) = ∅.

The minimal distance between sp(A) and sp(B) is defined as

δ = min {|λ − μ|; λ ∈ sp(A), μ ∈ sp(B)}
  = min dist [sp(A), sp(B)].

This is a measure of the separation of the two sets sp(A) and sp(B) in ℂ. In
what follows we assume that δ is strictly positive; in that case equation (1.12.1)
has a unique solution

Z = T⁻¹C,

where T⁻¹, the inverse of T, is also denoted by (A, B)⁻¹.

1.12.4 Algorithms
Two algorithms are used in practice to solve equations (1.12.6):
(a) The algorithm of Bartels and Stewart (1972) which utilizes the Schur form

of A. Let A = U*SU, where 5 is a lower triangular matrix and U is unitary.

One is then led to solving
where Z" = JJZ' and C" = L/C, that is r triangular systems with matrices
5 - μ , / ( ί = l,...,r).
In a concrete situation the QÄ algorithm is used to compute the Schur
forms T and S, which, in the case of real matrices, may involve diagonal
blocks of order 2.
(b) The algorithm of Hessenberg*-Schur proposed in Golub, Nash and Van
Loan (1979), which uses the Hessenberg form of A, that is A = U*HU, where
U is unitary and H is an upper Hessenberg matrix. The r systems with
matrices H — μ,7 (i = 1,..., r) are solved by Gaussian^ elimination with partial
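SciPy exposes a Bartels–Stewart implementation as `scipy.linalg.solve_sylvester`, which solves AX + XB = Q; for the equation AZ − ZB = C of this section one therefore passes −B. A minimal sketch on random hypothetical data:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((4, 3))

# scipy's convention is A X + X B = Q, so solve A Z + Z (-B) = C
Z = solve_sylvester(A, -B, C)
print(np.allclose(A @ Z - Z @ B, C))    # True
```

The sign convention is the only trap here; the underlying computation is the Schur-form recursion of Section 1.12.3.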

1.12.5 Conditioning of Equation (1.12.1)


Since

||vec Z||₂ = (Σ_{i=1}^r ||z_i||₂²)^{1/2} = ||Z||_F

is the Frobenius norm of Z, we deduce that

||T⁻¹||_F = sup_{C≠0} ||T⁻¹C||_F / ||C||_F = sup_{c≠0} ||𝒯⁻¹c||₂ / ||c||₂ = ||𝒯⁻¹||₂,

and hence

cond_F(T) = cond₂(𝒯).

Proposition 1.12.2 Let δ = min dist [sp(A), sp(B)]. Then ||T⁻¹||_F ≥ δ⁻¹. More-
over, if A and B are Hermitian or normal, then ||T⁻¹||_F = δ⁻¹.

PROOF The expression for 𝒯' makes it plain that its eigenvalues, and therefore
those of 𝒯, are λ_i − μ_j, where λ_i ∈ sp(A) and μ_j ∈ sp(B). Hence ρ(𝒯⁻¹) = ρ(T⁻¹) = δ⁻¹
and ρ(T⁻¹) ≤ ||T⁻¹|| for any induced norm ||·|| (see Exercise 1.1.4).

Next, if A and B are normal, so is 𝒯; this may be deduced from (1.12.3) by
showing that 𝒯*𝒯 = 𝒯𝒯*, provided that A*A = AA* and B*B = BB*.
A normal matrix can be diagonalised by a unitary similarity transformation.
Since the norm ||·||₂ is invariant under unitary transformations we may assume
for the present purpose that 𝒯 is in diagonal form. It now becomes plain that

ρ(𝒯⁻¹) = ||𝒯⁻¹||₂ = ||T⁻¹||_F.

*Gerhard Wilhelm Hessenberg, 1874-1925, born in Frankfurt am Main, died in Berlin.
†After Carl Friedrich Gauss, 1777-1855, born in Brunswick, died in Göttingen.

We wish to examine ||(A, B)⁻¹||_F = ||T⁻¹||_F and we begin with the case in
which B is of order r = 1.

Case r = 1

Let δ = dist [b, sp(A)] = min_{λ∈sp(A)} |b − λ| > 0.

Lemma 1.12.3 Suppose that A is diagonalisable and that A = XDX⁻¹, where D
is a diagonal matrix. Then

δ⁻¹ ≤ ||(A − bI)⁻¹||₂ ≤ cond₂(X) δ⁻¹.

PROOF The result is evident from the relation

(A − bI)⁻¹ = X(D − bI)⁻¹X⁻¹.

Proposition 1.12.4 Suppose that A has the Jordan decomposition A = XJX⁻¹.
If δ (< 1) is sufficiently small, then

||(A − bI)⁻¹||₂ ≤ cond₂(X) (1/δ) ((1 + δ)/δ)^{L−1},

where L is the greatest index of eigenvalues μ of A such that |μ − b| < 1.

PROOF Put J_b = J − bI and ||J_b⁻¹||₂⁻¹ = σ_min(J_b), where J_b is a direct sum of
Jordan blocks of the form

G = (μ−b  1; ⋱ ⋱; μ−b  1; μ−b)

of order less than or equal to the index ℓ_μ of the eigenvalue μ ∈ sp(A). For such
a block, the least singular value is the square root of the least eigenvalue of
H = G*G. Put ε = μ − b, where |ε| ≥ δ. Then

H = (|ε|²  ε̄; ε  1+|ε|²  ε̄; ⋱ ⋱ ⋱; ε  1+|ε|²).

We wish to obtain a lower bound for the eigenvalues of H. Now

det H = |det G|² = |ε|^{2r},

where r is the order of the block. By Gershgorin's theorem (Theorem 4.5.1) the
eigenvalues α_i of H satisfy

0 < α_i ≤ 1 + |ε|² + 2|ε| = (1 + |ε|)²   (i = 1, ..., r).

Hence

α_min ≥ det H / (1 + |ε|)^{2(r−1)} = |ε|^{2r} / (1 + |ε|)^{2(r−1)}   (i = 1, ..., r),

and therefore

||(A − bI)⁻¹||₂ ≤ cond₂(X) max_μ [(1 + |ε|)^{ℓ_μ−1} |ε|^{−ℓ_μ}]. (1.12.7)

The graph of the function

|ε| → (1 + |ε|)^{r−1} |ε|^{−r}

is given in Figure 1.12.1, when r > 1. This is an increasing function with respect
to the exponent r.

If we confine ourselves to values of |ε| less than unity, we have

(1/|ε|) ((1 + |ε|)/|ε|)^{ℓ_μ−1} ≤ (1/δ) ((1 + δ)/δ)^{L−1},

where

L = max [ℓ_μ; μ ∈ sp(A), |ε| < 1].

When δ is sufficiently small, the maximum that appears in (1.12.7) is attained
at δ (= |ε|), the term in δ^{−L} being dominant.

When A is normal, cond₂(X) = 1. Exercise 1.6.19 shows that cond₂(X)
increases as a function of ν(A) = ||AA* − A*A||_F, the departure from normality
of A. Similarly, it is seen in Exercise 1.6.20 that ||N||_F increases simultaneously
with ν(A), where N is the strictly upper triangular part of the Schur form of A.

[Figure 1.12.1: graph of the function |ε| → (1 + |ε|)^{r−1}|ε|^{−r} for r > 1.]

Example 1.12.1 Let

A = (a c; 0 b),

where a, b and c are real numbers and c ≠ 0. Then

AA* − A*A = (c²  (b−a)c; (b−a)c  −c²).

When b ≠ a, the eigenbasis is given by the matrix

X = (1  1; 0  (b−a)/c),

and when a = b, the Jordan basis is given by the matrix

X = (1  0; 0  c⁻¹).

It can be verified that cond₂(X) → ∞ when |c| → ∞.
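This growth of cond₂(X) with |c| is easy to exhibit. The sketch below uses the hypothetical values a = 0, b = 1; the eigenvector matrix is the X of the case b ≠ a:

```python
import numpy as np

a, b = 0.0, 1.0                  # distinct eigenvalues (hypothetical choice)
conds = []
for c in [1.0, 1e3, 1e6]:
    A = np.array([[a, c], [0.0, b]])
    X = np.array([[1.0, 1.0], [0.0, (b - a) / c]])   # eigenvector basis
    # check that X really diagonalises A
    assert np.allclose(np.linalg.inv(X) @ A @ X, np.diag([a, b]))
    conds.append(np.linalg.cond(X))
print(conds)                     # cond_2(X) grows roughly like |c|
```

The eigenvalues stay fixed while the eigenbasis degenerates, which is the point of the remark following this example: conditioning of inversion is governed by ||N||_F, not by the distance of the spectrum from the origin.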

We conclude from this investigation that a regular triangular matrix
T = D + N, in which ||N||_F is large, is ill-conditioned relative to inversion,
irrespective of the distances of the eigenvalues from the origin.

Case r > 1

||(A, B)⁻¹||_F depends on cond₂(X), cond₂(Y) and on δ = min dist [sp(A), sp(B)],
where X and Y are the Jordan bases (bases of eigenvectors) of A and B
respectively.

Example 1.12.2 Let r = 2 and

B = (a c; 0 b).

Since

𝒯 = (A − aI_n  0; −cI_n  A − bI_n),

it is clear that cond₂(𝒯) depends on |c| and on cond₂(A − μI), where μ = a or
b. We shall examine a particular case. Let A be the bidiagonal matrix of order 6
with diagonal elements 1 and superdiagonal elements α:

A = (1 α; 1 α; ⋱ ⋱; 1).

We note that |α| is related to cond₂(X) and |c| is related to cond₂(Y);
let δ = min(|a − 1|, |b − 1|). Table 1.12.1 shows the dependence of γ = ||(A, B)⁻¹||_F

Table 1.12.1.

α   −1        −5       −10        −15
γ   5 × 10⁵   2 × 10⁸  2.3 × 10⁸  7 × 10⁸

c   0         1        10         100
γ   1.6 × 10⁴ 5 × 10⁵  5 × 10⁶    5 × 10⁷

as a function of cond₂(X) and cond₂(Y), when

a = b = 0.8;   δ = 0.2.

Analogous results are obtained when a ≠ b.


Let A and B be matrices of type n by m. The set of matrices A − λB, where λ ∈ ℂ,
is called a pencil of matrices.

Definition The matrix pencil A − λB is said to be regular if

(a) A and B are square matrices of the same order, and
(b) det (A − λB) does not vanish identically for all complex numbers λ.

In all other cases [m ≠ n, or m = n and det (A − λB) = 0 for all λ] the pencil
is said to be singular.

We are interested in regular pencils of square matrices: for some values of λ
there exists a vector x ≠ 0, x ∈ ℂⁿ, such that

Ax = λBx. (1.13.1)

The problem (1.13.1) is called the generalized eigenvalue problem. The set

sp[A, B] = {z ∈ ℂ | det (A − zB) = 0}

forms the set of eigenvalues of the pencil. If 0 ≠ λ ∈ sp[A, B], then 1/λ ∈ sp[B, A].
Moreover, if B is regular, then

sp[A, B] = sp[B⁻¹A, I] = sp(B⁻¹A).

When B is singular, sp[A, B] can be finite, empty or infinite.
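Generalized eigenvalues are computed in practice from the pencil directly (via the QZ algorithm), never by forming B⁻¹A. A minimal SciPy sketch, with hypothetical matrices; when B is singular, LAPACK reports the missing eigenvalue as infinite:

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.eye(2)
w = eig(A, B, right=False)            # regular B: sp[A,B] = sp(B^{-1}A) = {2, 3}
print(sorted(w.real))

B_sing = np.array([[1.0, 0.0], [0.0, 0.0]])   # singular B
w2 = eig(A, B_sing, right=False)      # det(A - zB) = 3(2 - z): sp[A,B] = {2}
print(w2[np.isfinite(w2)].real)       # the finite part of the computed spectrum
```

Here det(A − zB_sing) has degree 1, so the pencil is regular but has only one finite eigenvalue; the second value returned is infinite (β = 0 in the α/β representation).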

Example 1.13.1 With

A = (1 0; 0 2), B = (1 0; 0 0):   det (A − zB) = 2(1 − z),   sp[A, B] = {1};
A = (0 1; 1 0), B = (1 0; 0 0):   det (A − zB) = −1,          sp[A, B] = ∅;
A = (1 0; 0 0), B = (1 0; 0 0):   det (A − zB) = 0 for all z,  sp[A, B] = ℂ.

In applications, A and B are often symmetric. Without further assumptions
about A and B, the spectrum sp[A, B] may be complex.
Example 1.13.2 A case that occurs very frequently in applications is that in
which A is indefinite (eigenvalues positive and negative) and B is singular and
positive semi-definite (eigenvalues positive and zero). This is found in structural
mechanics (see Chapter 3).


The canonical angles between two subspaces were used in statistics before they
were used in numerical analysis (see Afriat, 1957; Björck and Golub, 1973;
Davis and Kahan, 1968; Golub and Van Loan, 1989, Ch. 2; and the article
by Stewart, 1973a). Regarding the notion of gap in a Banach space, see Chatelin
(1983). The relationship between the norm of the strictly upper triangular part
of the Schur form and the departure from normality was studied by Henrici
(1962). The computational influence of a very large departure from normality
on matrix calculations is discussed in Chatelin (1992). The proof of the existence
of the Jordan form given in Section 1.6.3 was inspired by Fletcher and Sorensen.

The first variational characterization of the eigenvalues of a self-adjoint
operator is due to Weber (1869) and to Lord Rayleigh (1899). The min-max
characterization is due to Poincaré (1890) and to Fischer (1905), while the
max-min characterization is due to Weyl (1911) and to Courant (1920).

In Stewart's article (1973a) and in Varah (1979) the quantity ||(A, B)⁻¹||⁻¹ is
called the separation between the matrices A and B. We have not adopted this
nomenclature in order to avoid possible confusion with the separation between
spectra represented by δ in the non-normal cases. As regards the definition in
a Hilbert space, see Stewart (1971). The proof of Proposition 1.12.4 was inspired
by Kahan, Parlett and Jiang (1982).


Section 1.1 Notation and Definitions

1.1.1 [A] Prove that every basis of ℂⁿ possesses an adjoint basis and that the
latter is unique. Examine the existence and uniqueness of an adjoint basis for
a basis of a subspace of ℂⁿ of dimension r, when r < n.

1.1.2 [B:8] Show that for each A in ℂ^{n×r} we have

Ker A* = (Im A)^⊥,
Im A* = (Ker A)^⊥.
1.1.3 [D] Prove that, if A ∈ ℂ^{n×n} is Hermitian and positive definite, then the map

(x, y) ∈ ℂⁿ × ℂⁿ → y*Ax ∈ ℂ

defines a scalar product.
1.1.4 [B:25] Prove the following inequalities:

||A||₂ ≤ (||A||₁ ||A||_∞)^{1/2},   ∀A ∈ ℂ^{n×n},

(1/√n) ||A||₁ ≤ ||A||₂ ≤ √n ||A||₁,   ∀A ∈ ℂ^{n×n},

||A||₂ ≤ ||A||_F ≤ √(r(A)) ||A||₂,   where r(A) denotes the rank of A,

|u*Au| / (u*u) ≤ ||A||₂,   ∀A ∈ ℂ^{n×n}, ∀u ∈ ℂⁿ\{0},

ρ(A) ≤ ||A||,   ∀A ∈ ℂ^{n×n}, ∀ induced norms.
1.1.5 [B:8,11] Let A ∈ ℂ^{n×n} be Hermitian with spectrum

sp(A) = {λ₁, ..., λ_d}.

Prove that:
(a) λ_i ∈ ℝ (i = 1, ..., d).
(b) The singular values of A are σ_i = |λ_i| (i = 1, ..., d).
(c) There exists an orthonormal basis of ℂⁿ that consists of the eigenvectors of A.
(d) ||A||₂ = ρ(A).
1.1.6 [A] Show that, for all A in ℂ^{n×r},

1.1.7 [A] Let Q ∈ ℂ^{n×r} and suppose that the columns of Q form an orthonormal
system. Prove that ||Q||₂ = 1. Deduce that, if Q is a unitary matrix, then
cond₂(Q) = 1.

1.1.8 [A] Prove that if A is a singular matrix, then at least one of its singular
values is equal to zero.

1.1.9 [D] Prove that:

(a) P² = P ⇒ sp(P) ⊂ {0, 1}.
(b) D^k = 0 for some integer k ⇒ ρ(D) = 0.
(c) Q*Q = QQ* = I and λ ∈ sp(Q) ⇒ |λ| = 1.
(d) cond_F (I Z; 0 I) = 2n + ||Z||_F².

1.1.10 [D] Let A be a regular matrix of order n with singular values

σ₁ ≥ σ₂ ≥ ··· ≥ σ_n > 0.

Show that

cond₂(A) = cond₂^{1/2}(A*A) = σ₁/σ_n,

where

σ_i = max_{dim V = i} min_{x ∈ V, x ≠ 0} (x*A*Ax / x*x)^{1/2}.

1.1.11 [D] Let A and B be two matrices of order n. Prove that

σ_i(A + B) ≤ σ_i(A) + σ₁(B)   (1 ≤ i ≤ n),

where, for each matrix M, the singular values are denoted by σ₁(M) ≥ ··· ≥ σ_n(M).
1.1.12 [D] Let A be a Hermitian matrix. Show that if A is positive definite
(semi-definite), then its eigenvalues are positive (non-negative).

1.1.13 [D] The matrix C = (c_{ij}) ∈ ℝ^{n×n} is defined by

c_{ij} = −a_{i−1}   if j = n and 1 ≤ i ≤ n,
c_{ij} = 1          if j = i − 1 and 2 ≤ i ≤ n,
c_{ij} = 0          otherwise.

Show that the characteristic polynomial of C is given by

π(λ) = Σ_{i=0}^{n−1} a_i λ^i + λⁿ.

C is the companion matrix associated with the polynomial π.

1.1.14 [D] Construct two real 2 by 2 matrices A and B such that

AB = BA and ρ(AB) < ρ(A)ρ(B).

1.1.15 [D] Let A and B be two n by n matrices. Prove that

where σ₁ ≥ σ₂ ≥ ··· ≥ σ_n are the singular values.


1.1.16 [D] Prove that if A and B are n by n matrices, then

tr(AB) = tr(BA).

Deduce that if A and B are similar matrices, then they have the same trace.

1.1.17 [B:10] Let ||·|| be the norm in ℂ^{n×r} which is induced by the norms
||·||_{ℂʳ} in ℂʳ and ||·||_{ℂⁿ} in ℂⁿ. Prove that, for each A ∈ ℂ^{n×r},
1.1.18 [D] Prove that:
(a) The set of regular matrices is dense in ℂ^{n×n}.
(b) For all A and B in ℂ^{n×n}, sp(AB) = sp(BA).
(c) If both products AB and BA are defined and are square matrices, then
ρ(AB) = ρ(BA).

1.1.19 [B:11] Show that if A = (a_{ij}) ∈ ℂ^{n×n}, then

||A||_∞ = max_i Σ_j |a_{ij}|.
1.1.20 [D] Show that in ℂ^{n×n} the formula

⟨x, y⟩ = tr(y*x)

defines a scalar product whose derived norm is the Frobenius norm. Is it an
induced norm? Is it sub-multiplicative?

Section 1.2 The Canonical Angles between Two Subspaces

1.2.1 [A] Let M and N be subspaces of ℂⁿ such that

dim M = dim N > n/2.

Show that of the canonical angles between M and N at most [n/2] are non-zero.
(Notation: [a] = min {j ∈ ℕ; j ≥ a}, ∀a ∈ ℝ.)

1.2.2 [A] Let M and N be subspaces of ℂⁿ such that

dim M = dim N ≤ n/2

and let θ_max be the greatest canonical angle between M and N. Show that if
θ_max = π/2, then M ∩ N^⊥ contains at least one non-zero vector.

1.2.3 [A] Let X ∈ ℂ^{n×r} be an orthonormal basis of M and let Y ∈ ℂ^{n×r} be an
orthonormal basis of N. Suppose that the canonical angles between M and N
are given by

Θ = diag(θ₁, ..., θ_r).

Put

T = YY*X.

Prove that T is a basis of N and that the singular values of T and of T − X are,
respectively,

cos θ_i   and   sin θ_i   (i = 1, ..., r).
1.2.4 [B:6] Let M and N be subspaces of ℂⁿ such that

dim M = dim N = r ≤ n/2.

Define x_j ∈ ℂⁿ and y_j ∈ ℂⁿ (j = 1, ..., r) by the following conditions:

|y₁*x₁| = max_{x∈M} max_{y∈N} {|y*x|; x*x = y*y = 1}

and, when 2 ≤ j ≤ r,

|y_j*x_j| = max_{x∈M} max_{y∈N} {|y*x|; x*x = y*y = 1, x_i*x = y_i*y = 0 (i = 1, ..., j−1)}.

Let θ₁ ≥ θ₂ ≥ ··· ≥ θ_r be the canonical angles between M and N. Show that

cos θ_{r−i+1} = |y_i*x_i|   (i = 1, ..., r).
1.2.5 [D] Let θ₁ ≥ ··· ≥ θ_r be the canonical angles between the subspaces M
and N of dimension r. Show that

sin θ₁ = max_{x∈M} min_{y∈N} {||x − y||₂; x*x = 1},

sin θ_r = min_{x∈M} min_{y∈N} {||x − y||₂; x*x = 1}.

1.2.6 [A] Let Θ be the diagonal matrix of the canonical angles between the
subspaces M and N of ℂⁿ of dimension r < n/2. Put C = cos Θ and S = sin Θ.
Prove that there exist orthonormal bases Q of M, Q̃ of M^⊥, U of N and Ũ of
N^⊥ such that

[Q Q̃]*[U Ũ] = (C −S 0; S C 0; 0 0 I_{n−2r}).

Deduce how to calculate the canonical angles when r ≥ n/2.


Section 1.3 Projections

1.3.1 [D] Let X and Y be two matrices in ℂ^{n×r} such that X*X = Y*X = I_r
and let P = XY*. Prove that

||P||_p = ||Y||_p,   where p = 2 or F.

1.3.2 [B:11] Prove that P is an orthogonal projection if and only if P is
idempotent and Hermitian.

1.3.3 [B:35] Suppose that P and Q are orthogonal projections. Prove that if
P is non-zero, then ||P||₂ = 1. Also show that ||P − Q||₂ ≤ 1.

1.3.4 [B:35] Let M and N be arbitrary subspaces of ℂⁿ and let P and Q be
the orthogonal projections on M and N respectively. If ||(P − Q)P||₂ < 1, then

(a) either dim M = dim N and ||(P − Q)P||₂ = ||(P − Q)Q||₂ = ||P − Q||₂,
(b) or dim M < dim N and Q maps M on to a proper subspace N₀ of N; if Q₀
is the orthogonal projection on N₀, then

||(P − Q₀)P||₂ = ||(P − Q)P||₂ = ||P − Q₀||₂ < 1,
||(P − Q)Q||₂ = ||P − Q||₂ = 1.

Section 1.4 The Gap between Two Subspaces

1.4.1 [B:35] Let M and N be arbitrary subspaces of ℂⁿ and let P and Q be
projections on M and N respectively. Prove that

ω(M, N) ≤ max {||(P − Q)P||₂, ||(P − Q)Q||₂}.

Examine the maximum when

dim M = dim N.

1.4.2 [D] Let M and N be subspaces of ℂⁿ of dimension r and let

θ₁ ≥ ··· ≥ θ_r

be the canonical angles between them. Prove that if Π_M and Π_N are the
orthogonal projections on M and N respectively, then

||Π_M − Π_N||₂ = sin θ₁   and   ||Π_M − Π_N||_F = (2 Σ_{i=1}^r sin² θ_i)^{1/2}.

1.4.3 [D] Let M and N be subspaces of ℂⁿ. Prove that

ω(M, N) = ω(M^⊥, N^⊥).

Deduce an extension of Theorem 1.4.4 for the case in which

dim M = dim N ≥ n/2.

Section 1.5 Convergence of a Sequence of Subspaces

1.5.1 [B:11] Let {M_k} be a sequence of subspaces of ℂⁿ. Prove that M_k
converges to a subspace M of ℂⁿ if and only if the following conditions are
satisfied:
(a) Given a basis Y of M and a complement Ỹ of Y, and given a basis X_k of
M_k, there exist, for sufficiently great k, a regular matrix F_k and a matrix D_k
such that

X_k = YF_k + ỸD_k.

(b) D_kF_k⁻¹ → 0 as k → ∞.

1.5.2 [D] Assume that, in the definition of convergence given in Exercise 1.5.1,
the bases Y (= Q) and X_k (= Q_k) are orthonormal. Prove that we may choose F_k
to be unitary such that it has the same singular values as cos Θ_k, where Θ_k is
the diagonal matrix of canonical angles between M_k and M. Deduce that M_k → M
if and only if cos Θ_k → I_r.

Section 1.6 Reduction of Square Matrices

1.6.1 [B:8,11] Prove that eigenvectors that are associated with distinct eigen-
values are linearly independent.

1.6.2 [D] Let A ∈ ℝ^{n×n} and let λ ∈ ℂ be an eigenvalue with a non-zero imaginary
part and x an associated eigenvector. Prove that x̄ is an eigenvector associated
with λ̄ and show that x and x̄ are linearly independent.

1.6.3 [D] Let λ be an eigenvalue of A and let M and M_* be the right and left
invariant subspaces of A. Prove that, corresponding to any orthonormal basis
X of M, there exists a basis X_* of M_* such that X_*^*X = I_m, where m is the
algebraic multiplicity of λ.

1.6.4 [D] Prove that:
(a) A matrix that is both normal and nilpotent is zero.
(b) A matrix H that is skew-Hermitian (H* = −H) has a purely imaginary
spectrum.

1.6.5 [A] Prove that the order of the eigenvalues on the diagonal of the Schur
triangular form determines the corresponding Schur basis apart from a unitary
block-diagonal matrix.

1.6.6 [B:25] Let A = UΣV* be the singular value decomposition (SVD) of A.
Show that U and V consist of the eigenvectors of AA* and A*A respectively.

1.6.7 [D] Let σ_min be the least singular value of A. Prove that, if A is regular,

σ_min = ||A⁻¹||₂⁻¹.

Deduce that σ_min is the distance of A from the nearest singular matrix.

1.6.8 [B:9] Let A ∈ ℂ^{m×n}, r(A) = r and D = diag(σ₁, ..., σ_r), where the σ_i are the
non-zero singular values of A, and let U and V be the matrices of the SVD:

U*AV = Σ = (D 0; 0 0) ∈ ℂ^{m×n}.

Define

A† = V (D⁻¹ 0; 0 0) U*.

Show that:
(a) AA†A = A and A†AA† = A†.
(b) AA† and A†A are Hermitian.
(c) If r = n, then A† = (A*A)⁻¹A*.
(d) P = AA† is the orthogonal projection on Im A.

Give an example in which (AB)† ≠ B†A†. The matrix A† is the pseudo-inverse
(or Moore–Penrose inverse) of A (a particular case of the generalized inverse),
which is used in the solution of Ax = b by the method of least squares.

1.6.9 [D] Let ℓ be the order of the greatest Jordan block associated with an
eigenvalue λ of A ∈ ℂ^{n×n}. Prove that if ℓ > 1, then

i < ℓ ⇒ Ker(A − λI)^i ⊂ Ker(A − λI)^{i+1},

the inclusion being strict, and

i ≥ ℓ ⇒ Ker(A − λI)^i = Ker(A − λI)^{i+1} = M,

where M is the maximal invariant subspace under A associated with λ.

1.6.10 [B:11] By using the theorem on the Jordan form establish the equivalence

lim_{k→∞} A^k = 0 ⟺ ρ(A) < 1.

1.6.11 [B:62] Let A be a complex matrix of order n and ε a positive real
number. Construct a norm ||·|| in ℂⁿ such that the induced norm ||·|| in ℂ^{n×n}
has the property that

||A|| ≤ ρ(A) + ε.

1.6.12 [B:10] Let A ∈ ℂ^{n×n}. Define formally

e^A = I + Σ_{k=1}^∞ (1/k!) A^k.

(a) Prove that the series converges uniformly.
(b) Prove that if V is an arbitrary regular matrix, then e^{V⁻¹AV} = V⁻¹e^A V.
(c) Show how to compute e^A by using the Jordan form of A.

1.6.13 [B:31] Prove the Cayley–Hamilton theorem: if π is the characteristic
polynomial of A, then π(A) is the zero matrix.

1.6.14 [B:31] Prove that the matrix A is diagonalisable if and only if the product
Π_{i=1}^d (A − λ_iI) is the zero matrix, where λ₁, ..., λ_d are the distinct eigenvalues
of A.

1.6.15 [B:31] Let A ∈ ℂ^{n×n}. Denote the characteristic polynomial of A by

π(t) = det (tI − A) = tⁿ + Σ_{i=0}^{n−1} a_i t^i.

Prove that

a_{n−1} = −tr A   and   a₀ = (−1)ⁿ det A.

1.6.16 [A] Show that every Jordan block J is similar to its transpose Jᵀ:

Jᵀ = P⁻¹JP,

where P is a permutation matrix. Determine P.
1.6.17 [B:8] This exercise furnishes an alternative proof of the Jordan form.
Let L ∈ ℂ^{n×n} be a nilpotent matrix of index ℓ, that is

L^{ℓ−1} ≠ 0, but L^ℓ = 0.

Put

M_i = Ker L^i,   N_i = Im L^i,   L⁰ = I.

(a) Show that

M_i ⊂ M_{i+1} (strict inclusion)

when i = 0, 1, ..., ℓ − 1.
(b) Prove that there exists a basis of ℂⁿ in which L is represented by a
block-diagonal matrix whose diagonal blocks N^{(j)}, each repeated p_j times,
are the nilpotent shift matrices of order j,

(N^{(j)})_{αβ} = 1 if β = α + 1, and 0 otherwise   (j = ℓ, ℓ−1, ..., 1),

with the convention that the blocks N^{(j)} are omitted when p_j = 0.
(c) Let A ∈ ℂ^{n×n} be an arbitrary matrix. Prove that A can be represented by a
block-diagonal matrix

diag(A₁, B₁),

where A₁ is nilpotent and B₁ is regular.
(d) Let sp(A) = {λ₁, ..., λ_d} be the spectrum of A. Prove that A can be represented
by a block-diagonal matrix

diag(A₁, ..., A_d),

where A_i − λ_iI_{m_i} is nilpotent, m_i being the algebraic multiplicity of λ_i.
(e) Deduce the existence of the Jordan form.

1.6.18 [D] Prove that the Jordan form is unique apart from the ordering of
the diagonal blocks.
1.6.19 [A] Suppose that A is diagonalisable by X:

D = X⁻¹AX,

and that Q is a Schur basis:

Q*AQ = D̃ + N,

where N is a strictly upper triangular matrix. Prove the inequalities relating
cond₂²(X) to the departure from normality

ν(A) = ||AA* − A*A||_F.

1.6.20 [A] Let A = QRQ* be the Schur form of A, where R is an upper triangular
matrix and N its strictly upper triangular part. Establish the bounds

ν²(A)/(4||A||_F²) ≤ ||N||_F² ≤ ((n³ − n)/12)^{1/2} ν(A).
1.6.21 [D] Prove that two diagonalisable matrices are similar if they have the
same spectrum. What can be said about defective matrices?

1.6.22 [D] Let D be a diagonal matrix of order n and let X be a regular matrix
of order n. Consider the matrix

A = X⁻¹DX.

(a) Determine a similarity transformation Y that diagonalises the matrix

M = (A I; I A)

of order 2n.
(b) Prove that Y diagonalises

B = (p(A) q(A); q(A) p(A)),

where p and q are arbitrary polynomials.
(c) Express the eigenvalues of B in terms of those of A.

1.6.23 [D] Determine the connection between the singular values of X and
the Schur factorizations of the matrices

X*X,   XX*.

Section 1.7 Spectral Decomposition

1.7.1 [ D ] Let A = R" xw . Suppose that λ = γ + \μ is an eigenvalue of A and

that x = y + iz is a corresponding eigenvector, where y and μ are real, and y.
and z are vectors in R n . Prove that lin(y,z) is a real invariant subspace.
1.7.2 [D] Let
A = Σ_{i=1}^{d} (λ_i P_i + D_i)
be the spectral decomposition of A. Prove that
P_i D_j = δ_{ij} D_j,
D_i D_j = 0 when i ≠ j,
A P_i = P_i A = P_i A P_i = λ_i P_i + D_i,
D_i = (A − λ_i I) P_i.
1.7.3 [D] Let A ∈ C^{n×n}. Prove the existence of a basis
W = [W_1, ..., W_d]
of C^n such that W_i^* W_i = I_{m_i} and
W^{-1} A W = diag(T_1, ..., T_d),
where T_i is an upper triangular matrix of order m_i, all of whose diagonal elements are equal to the eigenvalue λ_i of algebraic multiplicity m_i. If λ_1, ..., λ_d are the distinct eigenvalues of A, interpret the matrix W_i W_i^*.
1.7.4 [C] Obtain the spectral decomposition of the matrix
A = ( 0  1 )
    ( 0  0 ).
Section 1.8 Rank and Linear Independence
1.8.1 [A] Let X ∈ C^{n×m}, where m ≤ n, and suppose that r(X) = r < m. Prove that there exists a permutation matrix Π such that XΠ = QR, where Q is orthonormal and
R = ( R_{11}  R_{12} )
    (   0       0    ),
R_{11} being a regular upper triangular matrix of order r.
1.8.2 [D] Prove that if X = QR, where Q is orthonormal, then
r(X) = r(R) and cond_2(X) = cond_2(R).
1.8.3 [D] Suppose that X = QR, where Q is orthonormal and
R = ( R_{11}  R_{12} )
    (   0     R_{22} )
is an upper triangular matrix. If R_{11} is of order r and σ_1, σ_2, ... are the singular values of X arranged in decreasing order, prove that

1.8.4 [D] Suppose that the ε-rank of the matrix X is equal to r for every positive ε. Prove that there exists a matrix X̃ of rank r such that
‖X − X̃‖_p = min {‖X − Y‖_p : r(Y) = r},
where p = 2 or F.
1.8.5 [B:39] The Householder algorithm is defined as follows. Let A^{(1)} = A be a given matrix.
(*) Given A^{(k)} = (a_{ij}^{(k)}):
If k = n, STOP.
If k < n, define
α = (a_{kk}^{(k)2} + ··· + a_{nk}^{(k)2})^{1/2},
u = (a_{1k}^{(k)}, ..., a_{nk}^{(k)})^T,
v = u − Σ_{j=1}^{k−1} u_j e_j + α e_k,  H_k = I − 2vv^T/(v^Tv),  A^{(k+1)} = H_k A^{(k)}.
(a) Prove that H_k is symmetric and orthogonal.
(b) Prove that the matrix H_k and the vector u satisfy the following equations:
H_1 u = −α e_1,
H_k u = Σ_{j=1}^{k−1} u_j e_j − α e_k  if k ≥ 2.
(c) Prove that
w = Σ_{j=1}^{k−1} w_j e_j ⇒ H_k w = w.
(d) Let
R = H_{n−1}H_{n−2} ··· H_2H_1 A,
Q = H_1H_2 ··· H_{n−2}H_{n−1}.
Prove that R is an upper triangular matrix and that Q is orthogonal.
(e) Prove that A = QR.
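The algorithm above can be sketched in NumPy as follows. This is an illustrative implementation, not the book's code; the sign choice in the reflector (made to avoid cancellation) is a standard convention assumed here.

```python
import numpy as np

def householder_qr(A):
    """QR factorization of a real square matrix by Householder
    reflections, in the spirit of Exercise 1.8.5."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    Q = np.eye(n)
    R = A.copy()
    for k in range(n - 1):
        x = R[k:, k]
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue                          # column already reduced
        u = x.copy()
        u[0] += alpha if x[0] >= 0 else -alpha  # avoid cancellation
        v = u / np.linalg.norm(u)
        Hk = np.eye(n)
        Hk[k:, k:] -= 2.0 * np.outer(v, v)    # H_k symmetric orthogonal
        R = Hk @ R                            # R = H_{n-1} ... H_1 A
        Q = Q @ Hk                            # Q = H_1 ... H_{n-1}
    return Q, R

A = np.array([[4.0, 1.0, 2.0],
              [2.0, 3.0, 0.0],
              [1.0, 1.0, 5.0]])
Q, R = householder_qr(A)
```

Part (a) corresponds to each H_k being symmetric with H_k^2 = I, and part (e) to Q @ R reproducing A.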
1.8.6 [D] Let σ_min(X) be the least singular value of X. Prove that there exists a permutation matrix Π such that, if XΠ = QR is the Schmidt factorization, then

1.8.7 [D] Let A ∈ C^{n×p}, where n ≥ p.
(a) Prove that there exists a factorization, known as the polar decomposition,
A = QH,
where Q ∈ C^{n×p}, Q*Q = I_p, and where H ∈ C^{p×p} is Hermitian and positive semi-definite.
(b) Prove that the matrix Q in (a) satisfies the conditions
‖A − Q‖_j = min {‖A − U‖_j : lin U = lin A and U*U = I_p},
where j = 2 or F.
(c) Compare the applications of the polar decomposition and of the Schmidt
factorization in relation to the orthonormalization of a set of linearly
independent vectors.
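A concrete route to the factorization in (a) is through the singular value decomposition; the construction below is the standard SVD-based one, assumed here for illustration rather than taken from the exercise.

```python
import numpy as np

def polar(A):
    """Polar decomposition A = QH from the SVD:
    A = U S V*  =>  Q = U V*  and  H = V S V*."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    Q = U @ Vh                          # n-by-p, orthonormal columns
    H = Vh.conj().T @ np.diag(s) @ Vh   # Hermitian positive semi-definite
    return Q, H

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
Q, H = polar(A)
```

For orthonormalization, Q is the closest matrix with orthonormal columns to A in the sense of (b), whereas the Schmidt factor depends on the ordering of the columns.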

Section 1.9 Hermitian and Normal Matrices

1.9.1 [A] Prove that if A is Hermitian and if B is Hermitian positive semi-definite, then
ρ(A + B) ≥ ρ(A).
1.9.2 [A] Prove the monotonicity theorem of Weyl: let A, B and C be Hermitian matrices such that
A = B + C,
and assume that their spectra are arranged in decreasing order. Then:
(a) When i = 1, 2, ..., n,
λ_i(B) + λ_n(C) ≤ λ_i(A) ≤ λ_i(B) + λ_1(C).
(b) If C is positive semi-definite,
λ_i(B) ≤ λ_i(A)  (i = 1, 2, ..., n).
1.9.3 [A] Prove that the matrix A ∈ C^{n×n} is normal if and only if there exists an orthonormal basis of C^n that consists of eigenvectors of A.
1.9.4 [A] Prove that if A is normal then
‖A‖_2 = ρ(A).
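A quick numerical check of 1.9.4, using a real skew-symmetric matrix (an arbitrarily chosen normal matrix, assumed for this sketch):

```python
import numpy as np

# Skew-symmetric, hence normal: A A^T = -A^2 = A^T A.
A = np.array([[0.0, 2.0, -1.0],
              [-2.0, 0.0, 3.0],
              [1.0, -3.0, 0.0]])
norm2 = np.linalg.norm(A, 2)            # spectral norm
rho = max(abs(np.linalg.eigvals(A)))    # spectral radius
```

Here the eigenvalues are purely imaginary, and both quantities equal sqrt(14).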
1.9.5 [A] Let A ∈ C^{n×n} be a Hermitian matrix and put
ρ(u, A) = u*Au/(u*u)  (u ≠ 0).
Prove that the spectrum of A can be characterized as follows:
λ_i(A) = max_S min_u ρ(u, A),
subject to the conditions that
dim S = i − 1,  u ⊥ S.
1.9.6 [A] Establish the following consequence of Exercise 1.9.2: if A and B are Hermitian matrices of order n, then their eigenvalues λ_i(A) and λ_i(B) can be enumerated in such a way that
λ_i(B) ≤ λ_i(A) + ‖A − B‖  (i = 1, ..., n).
1.9.7 [B:67] Let π_j be the characteristic polynomial of the real symmetric matrix
A_j = (a_{ik})_{1 ≤ i,k ≤ j}  (1 ≤ j ≤ n),
the leading principal submatrix of order j of A_n, and put π_0(t) = 1.
(a) Show that {π_0, ..., π_n} is a Sturm sequence, that is, if r_1 and r_2 are zeros of π_{j+1} such that r_1 < r_2, then there exists a zero of π_j in [r_1, r_2].
(b) Show that if A_n is tridiagonal, then
π_{j+1}(t) = (t − a_{j+1,j+1})π_j(t) − a_{j+1,j}^2 π_{j−1}(t)  (j = 1, ..., n − 1).
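The recurrence in (b) underlies the classical bisection method for symmetric tridiagonal matrices: the number of sign agreements between consecutive terms of the Sturm sequence counts the eigenvalues below t. A minimal sketch (the test matrix and the agreement-counting convention are illustrative assumptions):

```python
import numpy as np

def eigs_below(d, e, t):
    """Count the eigenvalues smaller than t of the symmetric tridiagonal
    matrix with diagonal d and off-diagonal e, by evaluating the Sturm
    sequence pi_0(t), ..., pi_n(t) with the recurrence of (b) and counting
    sign agreements between consecutive terms.  Assumes t is not a zero
    of any pi_j; no overflow guard is included."""
    p_prev, p_cur = 1.0, t - d[0]          # pi_0(t), pi_1(t)
    count = 1 if p_cur > 0 else 0          # agreement with pi_0(t) = 1
    for j in range(1, len(d)):
        p_next = (t - d[j]) * p_cur - e[j - 1] ** 2 * p_prev
        if p_next * p_cur > 0:             # consecutive terms agree in sign
            count += 1
        p_prev, p_cur = p_cur, p_next
    return count

d = np.array([2.0, 2.0, 2.0, 2.0])         # eigenvalues: 2 - 2cos(k*pi/5)
e = np.array([-1.0, -1.0, -1.0])
```

Bisection on t with this counter brackets any prescribed eigenvalue.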

Section 1.10 Non-negative Matrices

1.10.1 [B:18] Let
S = {x ∈ R^n : x_i ≥ 0, Σ_{i=1}^{n} x_i = 1}
and define
T(x) = Ax/p(x),
where A is a non-negative irreducible matrix and p(x) is a continuous function which is non-zero on S and such that T(S) ⊆ S. Use Brouwer's fixed-point theorem to prove the Perron-Frobenius theorem.

1.10.2 [B:18] Let A be a real non-negative matrix. Prove that:
(a) If there exists a vector x > 0 such that Ax ≤ λx, then λ ≥ ρ(A).
(b) (λI − A)^{-1} is non-negative if and only if λ > ρ(A).

1.10.3 [B:18] Suppose that A is a non-negative irreducible matrix. Prove that if there exist m eigenvalues μ_j of A such that |μ_j| = ρ(A), then (with suitable labelling) μ_j = ω_j ρ(A), where ω_j = exp(2ijπ/m) and i^2 = −1.

Section 1.11 Sections and Rayleigh Quotients

1.11.1 [D] Prove that if M is an invariant subspace of A, then the section of A on M can be identified with the restriction of A to M.
1.11.2 [D] Let X and Y be complex n by r matrices such that r(X) = r and Y*X = I_r. Give a formula for the matrix that represents the section of a linear map A on the subspace Im X.

Section 1.12 Sylvester's Equation

1.12.1 [D] Let P ∈ C^{n×n} and Q ∈ C^{m×m} be regular matrices. Suppose that A ∈ C^{n×n} and B ∈ C^{m×m} are such that sp(A) ∩ sp(B) = ∅. Put
R = (A, B)^{-1},  S = (PAP^{-1}, QBQ^{-1})^{-1}.
Show that
Establish a stronger result when P and Q are unitary and ‖·‖ is the norm induced either by ‖·‖_2 or by ‖·‖_F.
1.12.2 [D] Suppose that T: Z ↦ AZ − ZB is regular and that A is regular. Prove that if β ≥ 0 and δ > 0 are such that ‖B‖ ≤ β and ‖A^{-1}‖ ≤ (β + δ)^{-1}, then

Suppose now that A and B are Hermitian and positive definite and that, for all x,
x*x = 1 ⇒ x*Ax ≥ β + δ > β ≥ x*Bx.
1.12.3 [B:27] Examine the spectrum and determine the spectral radius of the map T: Z ↦ AZ − ZB as a function of the spectra and spectral radii of A and B.
1.12.4 [D] Let
T = I_r ⊗ A − B^T ⊗ I_n,
T′ = I_r ⊗ A − S^T ⊗ I_n,
where B is of order r, A is of order n and S is a Schur form of B, say B = QSQ*.
(a) Prove that T and T′ are similar; in fact
T′ = (Q^T ⊗ I_n) T (Q̄ ⊗ I_n).
(b) Prove that
sp(T) = sp(T′) = {λ − μ : λ ∈ sp(A), μ ∈ sp(B)}.
1.12.5 [B:27] Establish the following properties of the Kronecker product:
(a) A ⊗ (B + C) = A ⊗ B + A ⊗ C,
(A + B) ⊗ C = A ⊗ C + B ⊗ C,
provided that the above sums are defined.
(b) For all α ∈ C, A ⊗ (αB) = α(A ⊗ B).
(c) (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD),
provided that the above products are defined.
(d) A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C.
(e) (A ⊗ B)* = A* ⊗ B*.
(f) Let A and B be regular matrices of orders m and n respectively. Then
(A ⊗ B)(A^{-1} ⊗ B^{-1}) = I_m ⊗ I_n.
(g) If λ(A) and λ(B) are eigenvalues of A and B respectively with corresponding eigenvectors φ(A) and φ(B), then λ(A)λ(B) is an eigenvalue of A ⊗ B and φ(A) ⊗ φ(B) is a corresponding eigenvector.
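Properties (c) and (g) are easy to check numerically; the random test matrices below are an arbitrary choice for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))
C = rng.standard_normal((3, 3))
D = rng.standard_normal((2, 2))

# (c) the mixed-product rule
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)

# (g) every product of an eigenvalue of A with one of B is an
# eigenvalue of the Kronecker product A (x) B
eig_kron = np.linalg.eigvals(np.kron(A, B))
products = np.array([a * b for a in np.linalg.eigvals(A)
                     for b in np.linalg.eigvals(B)])
```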
1.12.6 [B:17] Let V = (V_0, V_1) be an orthonormal basis of C^n and let P = V_0V_0^* be the orthogonal projection on M = Im V_0 and Q an orthogonal projection on a subspace N such that dim N = dim M. A unitary solution U ∈ C^{n×n} of the equation QU = UP is called a direct rotation of M on N if and only if, relative to the basis V, it is represented by
U = ( C_0  −S_1 )
    ( S_0   C_1 ),
where
(a) C_0 ≥ 0 and C_1 ≥ 0, and
(b) S_1 = S_0^*.
Prove that if N′ = Im(I − Q) and M′ = Im(I − P) satisfy the conditions
M ∩ N′ = M′ ∩ N = {0},
then the direct rotation of M on N exists; it is unique and (a) implies (b).

Section 1.13 Regular Pencils of Matrices

1.13.1 [D] Let A and B be matrices of order n such that
Prove that det(A − λB) = 0 for all λ ∈ C.
1.13.2 [D] Let A and B be symmetric matrices in R^{n×n} such that A is regular and B is positive semi-definite and singular. Let U be an orthonormal basis of Ker B. Prove that:
(a) Zero is an eigenvalue of A^{-1}B of algebraic multiplicity
m = dim Ker B + dim Ker U^TAU
and of geometric multiplicity
g = dim Ker B.
(b) The non-zero eigenvalues of A^{-1}B are real and non-defective: there exists a diagonal matrix Λ of order r and an n by r matrix X, with X^TBX = I_r, such that
AX = BXΛ.
(c) The matrix A^{-1}B is non-defective if and only if U^TAU is regular.
1.13.3 [B:45] Let A and B be symmetric matrices. Prove that the pencil A − λB is definite if and only if there exist real numbers α and β such that the matrix αA + βB is definite, and that this is equivalent to the condition that U^TAU is definite, where U is an orthonormal basis of Ker B.
1.13.4 [B:45] Prove that every diagonalisable matrix C can be factorized in the form
C = AB,
where A and B are symmetric and B is regular. Comment on this result.

Elements of Spectral Theory

In this chapter we present the elements of the spectral theory of finite-dimensional operators: mainly the expansion in a Laurent series of the resolvent (A − zI)^{-1} in the neighbourhood of an eigenvalue, and the expansions in the perturbation series of Rellich-Kato and of Rayleigh-Schrödinger for the eigenelements of the family of operators A(t) = A + tH, where t is a complex parameter. We introduce the fundamental tool of a block-reduced resolvent in order to treat simultaneously several distinct eigenvalues, which arise most frequently in the approximation of a multiple eigenvalue.


Let f: z ↦ f(z) be a function of a complex variable. We say that f(z) is holomorphic (or analytic) in the neighbourhood V of z_0 if and only if f is continuous in V and df/dz exists at every point of V.
Let Γ be a closed Jordan curve lying in V (that is, a rectifiable simple closed curve), positively oriented and surrounding z_0 (see Figure 2.1.1). Then f(z_0) is given by Cauchy's* integral formula:
f(z_0) = (1/2iπ) ∫_Γ f(z)/(z − z_0) dz.
By differentiation,
f^{(k)}(z_0) = (k!/2iπ) ∫_Γ f(z)/(z − z_0)^{k+1} dz.
The expansion of f as a Taylor* series in the neighbourhood of z_0 is as follows:
f(z) = Σ_{k=0}^∞ (f^{(k)}(z_0)/k!)(z − z_0)^k;

*Augustin Louis Cauchy, 1789-1857, born in Paris, died at Sceaux.
*Brook Taylor, 1685-1731, born at Edmonton, died in London.

Figure 2.1.1

it converges absolutely and uniformly with respect to z in the interior of any disk that lies inside Γ. Conversely, every series of the form
f(z) = Σ_{k=0}^∞ a_k(z − z_0)^k
defines a function that is holomorphic in the open disk {z; |z − z_0| < ρ}, where
ρ = (lim sup_k |a_k|^{1/k})^{-1}.
This series converges absolutely and uniformly with respect to z in every disk {z; |z − z_0| ≤ r}, where r < ρ. Moreover, this series is uniquely determined by f:
a_k = f^{(k)}(z_0)/k!,  k = 0, 1, ....
The coefficients a_k of the Taylor expansion can be bounded by Cauchy's inequalities:
|a_k| ≤ M r^{-k},  k ≥ 0,  where M = max_{|z−z_0|=r} |f(z)|.
Next we suppose that f is holomorphic in the annulus
{z; α < |z − z_0| < β},  α ≥ 0.
Then f can be expanded in a Laurent* series in the neighbourhood of z_0; thus
f(z) = Σ_{k=−∞}^{∞} a_k(z − z_0)^k.
The series converges absolutely and uniformly in z in the annulus
{z; α + ε < |z − z_0| < β − ε},
where ε > 0.
If f is holomorphic in {z; 0 < |z − z_0| < β} but not in {z; |z − z_0| < β}, then z_0 is said to be an isolated singularity of f. If the expansion as a Laurent series about z_0 contains infinitely many non-zero coefficients a_k for which k < 0, then z_0 is said to be an essential singularity of f. In the opposite case, z_0 is a pole of f; the greatest integer ℓ such that a_{−ℓ} ≠ 0 is called the order of the pole.

*Paul Mathieu Laurent, 1841-1908, born at Echternach, died in Paris.
The definitions and properties that have been recalled above can be extended without difficulty to a function f with values in the vector space C^{n×n}, that is, for example, to a square matrix A of order n whose n^2 coefficients depend on the complex variable z. It suffices to replace the absolute value |·| on C by the chosen norm ‖·‖ on C^{n×n}. In particular, one can apply Liouville's* theorem and Cauchy's integral formula.


The subsequent spectral theory will be established independently of the work presented in Chapter 1.
The resolvent set of A consists of those points z of C at which (A − zI)^{-1} exists; it will be denoted by res(A). The matrix
R(A, z) = (A − zI)^{-1},  z ∈ res(A),
is called the resolvent of A. If there is no ambiguity, we shall denote R(A, z) simply by R(z). The unique solution of the equation Ax − zx = b is then written as x = R(z)b.
The complement of res(A) in C is called the spectrum of A and is denoted by sp(A): at a point λ of sp(A) the matrix A − λI is not invertible. Hence there exists a vector x ≠ 0 such that Ax = λx; thus λ is an eigenvalue of A and x is an associated eigenvector.
In this section we will investigate the properties of the resolvent.

Lemma 2.2.1 The resolvent R(z) satisfies two identities, known as the first and second resolvent equations respectively:
R(z_1) − R(z_2) = (z_1 − z_2)R(z_1)R(z_2) = (z_1 − z_2)R(z_2)R(z_1)  (2.2.1)
for all z_1 and z_2 in res(A), and
R(A_1, z) − R(A_2, z) = R(A_1, z)(A_2 − A_1)R(A_2, z)
= R(A_2, z)(A_2 − A_1)R(A_1, z)  (2.2.2)
for z ∈ res(A_1) ∩ res(A_2).

PROOF The results are easily deduced from the identities
(z_1 − z_2)I = (A − z_2I) − (A − z_1I),
A_2 − A_1 = (A_2 − zI) − (A_1 − zI).

*Joseph Liouville, 1809-1882, born at Saint-Omer, died in Paris.


Proposition 2.2.2 The resolvent R(z) is holomorphic throughout res(A), where it possesses the following expansion as a Taylor series in a neighbourhood of z_0:
R(z) = Σ_{k=0}^∞ (z − z_0)^k R^{k+1}(z_0).  (2.2.3)

PROOF Let z_0 ∈ res(A). We have the formal identity
(A − zI)^{-1} = R(z_0)[I − (z − z_0)R(z_0)]^{-1}.
For every z such that |z − z_0| < ‖R(z_0)‖^{-1}, the series
R(z) = R(z_0) Σ_{k=0}^∞ [(z − z_0)R(z_0)]^k
converges absolutely.

Theorem 2.2.3 The number
a = lim_{k→∞} ‖A^k‖^{1/k} = inf_k ‖A^k‖^{1/k}
exists.

PROOF Let a_k = log ‖A^k‖. We wish to show that
lim_{k→∞} (a_k/k) = inf_k (a_k/k) = b.
The inequality
‖A^{m+k}‖ ≤ ‖A^m‖ ‖A^k‖
implies that
a_{m+k} ≤ a_m + a_k.
When m is a fixed positive integer, we can put k = mq + r, where q and r are integers such that 0 ≤ r < m. Hence
a_k ≤ q a_m + a_r  and  a_k/k ≤ (q/k) a_m + a_r/k.
When m is fixed and k tends to infinity, we have
q/k → 1/m  and  a_r/k → 0.
Hence lim sup_k (a_k/k) ≤ a_m/m, where m is arbitrary. Therefore lim sup_k (a_k/k) ≤ b. On the other hand, since a_k/k ≥ b, it follows that
lim inf_k (a_k/k) ≥ b.
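Theorem 2.2.3, together with Corollary 2.2.6 below, says that ‖A^k‖^{1/k} converges to the spectral radius ρ(A). A small numerical illustration (the defective test matrix is an arbitrary choice for this sketch):

```python
import numpy as np

# For this defective matrix, rho(A) = 0.5 although ||A||_2 > 1, so the
# convergence of ||A^k||^(1/k) down towards rho(A) is clearly visible.
A = np.array([[0.5, 1.0],
              [0.0, 0.5]])
vals = []
Ak = np.eye(2)
for k in range(1, 201):
    Ak = Ak @ A
    vals.append(np.linalg.norm(Ak, 2) ** (1.0 / k))
rho = max(abs(np.linalg.eigvals(A)))
```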

Theorem 2.2.4 If |z| > a, then R(z) exists and is given by
R(z) = −Σ_{k=0}^∞ A^k/z^{k+1}.  (2.2.4)

PROOF By Theorem 2.2.3, ‖A^k‖^{1/k} → a when k → ∞. Hence, if |z| > a + ε, where ε > 0, we have
|z|^{-k} ‖A^k‖ ≤ [(a + ε/2)/(a + ε)]^k < 1,
provided that k is sufficiently great. Hence the series
S(z) = z^{-1} Σ_{k=0}^∞ (z^{-1}A)^k
converges when |z| > a. On multiplying on the left or on the right by A − zI, we verify that
(A − zI)S(z) = S(z)(A − zI) = −I,
so that R(z) = −S(z). This proves equation (2.2.4).
The identity (2.2.4) is the expansion of R(z) as a Taylor series in the neighbourhood of z = ∞, the radius of convergence being lim sup_k ‖A^k‖^{1/k} = a. Hence the series (2.2.4) diverges when |z| < a.
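The partial sums of the series (2.2.4) can be compared with a directly computed resolvent; a quick check, with matrix and evaluation point chosen arbitrarily so that ρ(A) = 0.5 < |z| = 2:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.25, 0.0]])           # eigenvalues +/- 0.5
z = 2.0
S = np.zeros((2, 2))
term = np.eye(2) / z                  # A^k / z^(k+1) at k = 0
for k in range(60):
    S -= term                         # accumulate -sum A^k / z^(k+1)
    term = term @ A / z
R = np.linalg.inv(A - z * np.eye(2))  # resolvent computed directly
```

Sixty terms reduce the truncation error to about (rho/|z|)^60, far below rounding level.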

Corollary 2.2.5 The sets res(A) and sp(A) are not empty.

PROOF By virtue of Theorem 2.2.4,
res(A) ⊇ {z; |z| > a}.
Now res(A) is not empty because a exists. When |z| > ‖A‖, we deduce from equation (2.2.4) that
‖R(z)‖ ≤ Σ_{k=0}^∞ ‖A‖^k/|z|^{k+1} = (|z| − ‖A‖)^{-1}.
Hence ‖R(z)‖ → 0 when |z| → ∞. If sp(A) were empty, then R(z) would be analytic and bounded throughout C. By Liouville's theorem R(z) would be constant, and this constant would be the zero matrix. This would lead to the contradiction
I = (A − zI)R(z) = 0.

Corollary 2.2.6 We have
a = max {|λ|; λ ∈ sp(A)} = ρ(A).

PROOF We shall show that there exists at least one point of sp(A) (that is, an eigenvalue of A) on the circle {z; |z| = a}. Since the domain of convergence of equation (2.2.4) is {z; |z| > a}, there is at least one singularity of R(z) on the circle of convergence, provided that a > 0.
On the other hand, if a = 0, then sp(A) = {0}, unless the spectrum of A is empty, which is impossible. Hence we conclude that a = ρ(A).

Proposition 2.2.7 Let z ↦ S(z) be a continuous function of z in a domain G with values in C^{n×n}. Then z ↦ ρ(S(z)) is upper semi-continuous in G.

PROOF For every k ∈ N, the function z ↦ ‖S^k(z)‖^{1/k} is continuous in G. For every ε > 0 and z in G, there exists ν ∈ N such that
‖S^ν(z)‖^{1/ν} ≤ ρ(S(z)) + ε/2.
There exists δ > 0 such that
|z′ − z| < δ ⇒ ‖S^ν(z′)‖^{1/ν} ≤ ‖S^ν(z)‖^{1/ν} + ε/2 ≤ ρ(S(z)) + ε.
Since
ρ(S(z′)) = inf_k ‖S^k(z′)‖^{1/k} ≤ ‖S^ν(z′)‖^{1/ν},
we obtain
|z′ − z| < δ ⇒ ρ(S(z′)) ≤ ρ(S(z)) + ε.
We have established the fact that R(z) is holomorphic in the exterior of the disk {z; |z| ≤ ρ(A)}, which contains the spectrum of A (see Figure 2.2.1), and that R(z) has the expansions (2.2.3) and (2.2.4). Next we are going to establish the form of the Laurent expansion of R(z) in the neighbourhood of an eigenvalue λ.
Let Γ and Γ′ be two Jordan curves surrounding λ. The curve Γ′ lies in the exterior of Γ. Both curves lie in the set res(A) and contain no other point of sp(A).

Figure 2.2.1

Figure 2.2.2

Let τ denote the remainder of sp(A), that is,
sp(A) = {λ} ∪ τ
(see Figure 2.2.2).

Theorem 2.2.8 The matrix
P = −(1/2iπ) ∫_Γ R(z) dz
has the following properties:
(a) P is a projection on M = Im P along M̂ = Ker P.
(b) M and M̂ are invariant subspaces of A.
(c) A↾M : M → M has the spectrum {λ} and A↾M̂ : M̂ → M̂ has the spectrum τ.

(a) We shall show that P^2 = P:
P^2 = (1/2iπ)^2 ∫_Γ ∫_{Γ′} R(z)R(z′) dz′ dz,
where z ∈ Γ and z′ ∈ Γ′. By equation (2.2.1) we obtain
R(z)R(z′) = [R(z) − R(z′)]/(z − z′).
We remark that
∫_{Γ′} dz′/(z′ − z) = 2iπ  and  ∫_Γ dz/(z′ − z) = 0.
On changing the order of integration we immediately deduce that
P^2 = −(1/2iπ) ∫_Γ R(z) dz = P.
(b) Put M = Im P and M̂ = Ker P. We will show that M is invariant under A. Since AR(z) = R(z)A, we deduce that PA = AP. If u ∈ Im P, then u = Pv for some v. Hence Au = APv = PAv ∈ M. Thus AM ⊆ M.
By a similar argument (see Section 1.3), it is shown that
M̂ = Ker P = Im(I − P)
is invariant under A.
(c) Let [X, X̂] and [X_*, X̂_*] be adjoint bases of C^n such that X and X̂ are bases of M and M̂ respectively. Relative to these bases the map A↾M : M → M is represented by the matrix B = X_*^*AX. Similarly, the map A↾M̂ : M̂ → M̂ is represented by the matrix B̂ = X̂_*^*AX̂.
When z ∈ res(A), z ∉ Γ and t ∈ Γ, we have, by equation (2.2.1),
R(z)P = −(1/2iπ) ∫_Γ R(z)R(t) dt = −(1/2iπ) ∫_Γ [R(z) − R(t)]/(z − t) dt.
When z is in the exterior of Γ, we obtain
R(z)P = (1/2iπ) ∫_Γ R(t)/(z − t) dt,
which is a holomorphic function of z.

Since M and M̂ are complementary invariant subspaces, A becomes a block-diagonal matrix relative to the adjoint bases [X, X̂] and [X_*, X̂_*]; in fact
A = X B X_*^* + X̂ B̂ X̂_*^*,
and sp(A) = sp(B) ∪ sp(B̂). Therefore
R(z) = X(B − zI)^{-1}X_*^* + X̂(B̂ − zI)^{-1}X̂_*^*.
Since P = X X_*^* (see Exercise 2.2.3) we deduce that
R(z)P = X(B − zI)^{-1}X_*^*  and  R(z)(I − P) = X̂(B̂ − zI)^{-1}X̂_*^*.
Now (B − zI)^{-1} is a holomorphic function of z in the exterior of Γ and in particular at every point of τ; thus res(B) ⊇ τ. On the other hand, R(z)(I − P) is holomorphic in the interior of Γ, whence λ ∈ res(B̂). We deduce that
sp(B) = {λ}  and  sp(B̂) = τ.
The fundamental theorem 2.2.8 enables us to block-diagonalize A. We remark first of all that the definition of the spectral projection given in Theorem 2.2.8 is identical with that given in Theorem 1.7.1; indeed, P does not depend on Γ, but uniquely on λ, which is a singularity of R(z).

Example 2.2.1 Consider
A(ε) = ( 1  ε )   →   ( 1  0 )   as ε → 0,
       ( 1  1 )       ( 1  1 )
whose resolvent is
R(z) = [(1 − z)^2 − ε]^{-1} ( 1 − z    −ε   )
                            (  −1     1 − z ).
The eigenvalues of A(ε) are the two roots of (1 − z)^2 − ε = 0, say
λ_1(ε) = 1 + √ε  and  λ_2(ε) = 1 − √ε.
By integrating around λ_1(ε) and λ_2(ε) respectively, we obtain
P_1(ε) = (1/2) (   1     √ε )      P_2(ε) = (1/2) (   1     −√ε )
               ( 1/√ε    1  ),                    ( −1/√ε    1  ).
We remark that P_1(ε) + P_2(ε) = I.
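The projections of this example can be reproduced numerically by discretizing the contour integral of Theorem 2.2.8 with the trapezoidal rule; the circle radii, the value ε = 0.25 and the number of quadrature nodes are arbitrary choices of this sketch:

```python
import numpy as np

eps = 0.25
A = np.array([[1.0, eps],
              [1.0, 1.0]])

def spectral_projection(A, center, radius, m=400):
    """P = -(1/2i*pi) * integral of (A - zI)^(-1) over a circle that
    isolates one eigenvalue, by the trapezoidal rule (m nodes)."""
    n = A.shape[0]
    P = np.zeros((n, n), dtype=complex)
    for j in range(m):
        theta = 2 * np.pi * j / m
        z = center + radius * np.exp(1j * theta)
        dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / m)
        P -= np.linalg.inv(A - z * np.eye(n)) * dz / (2j * np.pi)
    return P

P1 = spectral_projection(A, 1 + np.sqrt(eps), 0.4)  # around 1.5
P2 = spectral_projection(A, 1 - np.sqrt(eps), 0.4)  # around 0.5
```

For an analytic periodic integrand the trapezoidal rule converges geometrically, so 400 nodes are far more than enough here.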

We return to the notation employed in Theorem 2.2.8 and its proof. By hypothesis, λ is not an eigenvalue of B̂, and so (B̂ − λI)^{-1} exists.

Definition The matrix
S = X̂(B̂ − λI)^{-1}X̂_*^*
is called the reduced resolvent of A at λ.
This matrix represents the extension to the whole space of the inverse (B̂ − λI)^{-1}. In fact,
S(A − λI) = (A − λI)S = I − P  and  SP = PS = 0.
Lemma 2.2.9 The matrix
D = (A − λI)P
is nilpotent: D^ℓ = 0, where ℓ is the index of the eigenvalue λ.

PROOF For any positive integer k we have
D^k = (A − λI)^k P = X(B − λI)^k X_*^*.
Now B has λ as its sole eigenvalue. Hence ρ(B − λI) = 0 and N = B − λI is nilpotent: there exists a positive integer ℓ such that N^ℓ = 0 but N^{ℓ−1} ≠ 0. Thus (A − λI)^ℓ X = 0 but (A − λI)^{ℓ−1} X ≠ 0, and ℓ is the index of the eigenvalue λ (Theorem 1.6.7).

Theorem 2.2.10 The expansion of R(z) as a Laurent series, valid in a neighbourhood of λ, can be written as
R(z) = −P/(z − λ) − Σ_{k=1}^{ℓ−1} D^k/(z − λ)^{k+1} + Σ_{k=0}^∞ (z − λ)^k S^{k+1}.  (2.2.5)

PROOF We have
(B − zI)^{-1} = [B − λI − (z − λ)I]^{-1}.
By Theorem 2.2.4 the expansion
(B − zI)^{-1} = −Σ_{k=0}^{ℓ−1} (z − λ)^{−k−1}(B − λI)^k
converges when |z − λ| > ρ(B − λI) = 0, that is, when z ≠ λ. We deduce that, when z ≠ λ,
R(z)P = (A − zI)^{-1}P = X(B − zI)^{-1}X_*^*
= −Σ_{k=0}^{ℓ−1} (z − λ)^{−k−1} X(B − λI)^k X_*^*
= −P/(z − λ) − Σ_{k=1}^{ℓ−1} D^k/(z − λ)^{k+1}.
On the other hand, the expansion
R(z)(I − P) = X̂(B̂ − zI)^{-1}X̂_*^* = Σ_{k=0}^∞ (z − λ)^k X̂(B̂ − λI)^{−k−1}X̂_*^*
is the Taylor series for R(z)(I − P), valid near λ. On using the definition S = X̂(B̂ − λI)^{-1}X̂_*^*, we infer that R(z) = R(z)P + R(z)(I − P) satisfies equation (2.2.5).
Without referring to the characteristic polynomial we have established the fact that a pole λ of order ℓ of the resolvent R(z) is an eigenvalue of A of index ℓ and algebraic multiplicity m = dim M.
We will now mention the form that Cauchy's integral formula takes in this context. Let Γ be a Jordan curve lying in res(A) and enclosing sp(A), and let f be a function that is holomorphic in the neighbourhood of sp(A). Then Cauchy's integral formula enables us to define
f(A) = (1/2iπ) ∫_Γ f(z)(zI − A)^{-1} dz.  (2.2.6)

With an entirely analogous technique one can discuss isolated eigenvalues of finite multiplicity for linear operators in Banach* spaces (see Chatelin, 1983).
Without difficulty, the preceding investigation can be extended to the case in which the spectrum of A is partitioned into disjoint subsets of eigenvalues.
Let {λ_i} (1 ≤ i ≤ d) be the distinct eigenvalues of A. We denote by P_i (respectively ℓ_i, D_i, S_i) the spectral projection (respectively index, nilpotent matrix, reduced resolvent) associated with each λ_i.

Lemma 2.2.11 We have
Σ_{i=1}^{d} P_i = I.

PROOF Let Γ be a Jordan curve enclosing the set {λ_i} (1 ≤ i ≤ d). Then
Σ_{i=1}^{d} P_i = −(1/2iπ) ∫_Γ R(z) dz.
Since R(z) is holomorphic in the exterior of Γ, we can use the expansion (2.2.4) of R(z) and make the change of variable z = 1/t. From the identity
R(z) dz = Σ_{k=0}^∞ A^k t^{k−1} dt
and from the fact that t traverses a Jordan curve Γ′ in the negative sense
(z = ρe^{iθ} ⇒ z^{-1} = ρ^{-1}e^{−iθ}), we deduce that
∫_Γ R(z) dz = Σ_{k=0}^∞ A^k ∫_{Γ′} t^{k−1} dt = −2iπ I,
whence Σ_{i=1}^{d} P_i = I.

Corollary 2.2.12 Let z ∈ res(A). Then R(z) can be expanded as follows:
R(z) = −Σ_{i=1}^{d} [P_i/(z − λ_i) + Σ_{k=1}^{ℓ_i−1} D_i^k/(z − λ_i)^{k+1}].

PROOF If z ∈ res(A), we have, by Lemma 2.2.11,
R(z) = Σ_{i=1}^{d} R(z)P_i.
On applying equation (2.2.5) and noting that S_iP_i = 0, we obtain the result.

Corollary 2.2.13 We recover the spectral decomposition (1.7.1), namely
A = Σ_{i=1}^{d} (λ_i P_i + D_i).

PROOF For each i, we have
A P_i = λ_i P_i + D_i.
We obtain the result by summing over i and using Lemma 2.2.11.

*Stefan Banach, 1892-1945, born in Krakow, died in Lvov.

Proposition 2.2.14 When z ∈ res(A), we have
R*(A, z) = R(A*, z̄)  and  P*(A, λ) = P(A*, λ̄).

PROOF The identity (A − zI)* = A* − z̄I immediately yields the relation
R*(A, z) = R(A*, z̄).
Next, let λ be an eigenvalue of A and choose ρ so small that the circle
Γ: {z; z − λ = ρe^{iθ}, 0 ≤ θ ≤ 2π}
isolates λ. Let Γ̄ be the complex conjugate circle (see Figure 2.2.3); as z traverses Γ in the positive sense, z̄ traverses Γ̄ in the negative sense. The contour Γ̄ is denoted by Γ̄^− or Γ̄^+ according to whether its orientation is negative or positive.

Figure 2.2.3

For all x and y in C^n we have
y*[∫_Γ R(A, z) dz]* x = ∫_Γ [y* R*(A, z) x] dz̄,
so that
[∫_Γ R(A, z) dz]* = ∫_Γ R*(A, z) dz̄.
Hence
P*(A, λ) = [−(1/2iπ) ∫_{Γ^+} R(A, z) dz]* = (1/2iπ) ∫_{Γ̄^−} R*(A, z̄) dz
= −(1/2iπ) ∫_{Γ̄^+} R(A*, z) dz = P(A*, λ̄).

Corollary 2.2.15 When A is Hermitian, R(z) is normal, P = P* and ℓ = 1.

PROOF Since R*(z) = R(z̄), it follows from equation (2.2.1) that R*(z)R(z) = R(z)R*(z) and
‖R(z)‖_2 = ρ[(A − zI)^{-1}] = dist^{-1}[z, sp(A)].
Since λ is real, it may be assumed that the contour in Figure 2.2.3 is symmetric with respect to the real axis; thus P* = P. Now
D = (A − λI)P = D*.
Since D is a Hermitian nilpotent matrix it is zero, that is, D = 0 and ℓ = 1.



2.3.1 The Reduced Resolvent

Let λ be an eigenvalue of multiplicity m. We recall that the reduced resolvent with respect to λ is given by the matrix
S = X̂(B̂ − λI)^{-1}X̂_*^*,
and we put
δ = dist[λ, sp(A) − {λ}] > 0.

Proposition 2.3.1 The matrix S has the following properties:
(a) ‖S‖_2 ≥ δ^{-1}.
(b) If A is Hermitian, then ‖S‖_2 = δ^{-1}.
(c) If X̂ is an orthonormal basis, then
‖S‖_2 ≤ ‖X̂_*‖_2 ‖(B̂ − λI)^{-1}‖_2.

PROOF
(a) The non-zero eigenvalues of S are the eigenvalues of (B̂ − λI)^{-1}. Therefore
‖S‖_2 ≥ ρ(S) = δ^{-1}.
(b) If A is Hermitian, so is B̂ if one chooses X̂_* = X̂; then S is Hermitian and ‖S‖_2 = ρ(S) = δ^{-1}.
(c) This is evident because ‖X̂‖_2 = 1 and ‖X̂_*‖_2 ≥ 1.
Thus ‖S‖_2 depends on δ and also on the conditioning of the Jordan basis (or eigenvectors) of B̂.
Let b be a vector such that X_*^*b = 0. The matrix S clearly serves to solve the system
(A − λI)z = b,
X_*^*z = 0,  (2.3.1)
consisting of n + m equations in n unknowns, of rank n; indeed the unique solution of system (2.3.1) is z = Sb [this may be seen by using the relations (A − λI)X̂ = X̂(B̂ − λI) and XX_*^* + X̂X̂_*^* = I].
In a practical situation system (2.3.1) can be solved by adapting the factorizations of Gauss or Schmidt (see Exercise 2.3.1). However, it may be preferable to reduce the problem, when possible, to the solution of a regular system of n equations in n unknowns; this will be demonstrated in the next lemma.

Lemma 2.3.2 Suppose that X_*^*b = 0 and that λ ≠ 0. Then the unique solution z of system (2.3.1) is a solution of the system
(I − P)Az − λz = b.  (2.3.2)

PROOF Relative to the adjoint bases, system (2.3.2) can be written in the form
−λX_*^*z = 0,
(B̂ − λI)X̂_*^*z = X̂_*^*b.
When λ ≠ 0, we obtain X_*^*z = 0 and X̂_*^*z = (B̂ − λI)^{-1}X̂_*^*b, whence z = Sb.
System (2.3.2) of rank n can be solved in a standard fashion.
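Lemma 2.3.2 can be checked on a small symmetric example, where the spectral projection and the reduced-resolvent solution are both available from the eigendecomposition (the matrix and right-hand side are arbitrary choices of this sketch):

```python
import numpy as np

# Symmetric A with eigenvalues 1, 2, 4; target eigenvalue lambda = 1.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lams, V = np.linalg.eigh(A)
lam, v = lams[0], V[:, 0]              # nonzero eigenvalue, as required
P = np.outer(v, v)                     # spectral projection (A symmetric)

b = np.array([1.0, -2.0, 0.5])
b -= (v @ b) * v                       # enforce the condition X_*^* b = 0

# regular n-by-n system (2.3.2): ((I - P)A - lam*I) z = b
z = np.linalg.solve((np.eye(3) - P) @ A - lam * np.eye(3), b)

# reduced-resolvent solution S b assembled from the other eigenpairs
Sb = sum((V[:, j] @ b) / (lams[j] - lam) * V[:, j] for j in (1, 2))
```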

2.3.2 A Partial Inverse

The spectral projection P presupposes a knowledge of the right and left invariant
subspaces M and M*, which may be costly. We are going to introduce the notion

of a partial inverse which requires only the knowledge of the right invariant
subspace M.
Let X be a basis for M and let 7 be an adjoint basis for a subspace N, which
need not be M + . We suppose that ω(Μ, N) < 1, that is 0max < π/2. Then Π = X Y*
is the projection on M along N1 = W. Let [X, X] and [7, 7 ] be adjoint bases of
C . The matrix A is similar to

where ß = 7 Μ Χ = ATJ/IA' (Lemma 1.7.4) and B = ΓΜΛΓ.

We remark that

sp(ß)nsp(ß) = 0 .

Theorem 2.3.3 If A is of the block-triangular form
( B  C )
( 0  B̂ )
relative to the adjoint bases [X, X̂] and [Y, Ŷ], then there exist adjoint bases [X, X̃] and [X_*, X̃_*], defined by
X̃ = X̂ − XR,  X_* = Y + ŶR*,  X̃_* = Ŷ,
where R = (B, B̂)^{-1}C, such that A becomes block-diagonal of the form
( B  0 )
( 0  B̂ ).

PROOF It is easy to verify that the transformed matrix is block-diagonal if and only if R = (B, B̂)^{-1}C.
With the notation of Section 2.3.1, the matrix representing A↾M in the new bases is
X_*^*AX = (Y + ŶR*)*AX = Y*AX = B,
unchanged. As we shall see in Proposition 2.3.4, this is due to a particular choice of the bases X_*, X̃_*, starting from the bases Y, Ŷ in Theorem 2.3.3.
When arbitrary bases [X, X̂] and [X_*, X̂_*] are associated with the direct decompositions M ⊕ N^⊥ and M ⊕ M̂, the corresponding matrices B = Y*AX and B̃ = X_*^*AX are similar because they have the same spectral structure (see Exercise 2.3.2). We shall study this similarity more precisely in the following proposition.

Proposition 2.3.4 Let (X, X_*) and (X, Y) be pairs of adjoint bases. Then there exists a regular matrix G such that
X_* = YG  and  B̃ = G*B(G^{-1})*.

PROOF Let X, Y (respectively X, X_*) be adjoint bases associated with the decompositions M ⊕ N^⊥ (respectively M ⊕ M̂). Now Y and X_* are two bases for the same space, so there exists a regular matrix G such that X_* = YG. Denote by X′ the adjoint basis of X_* in N. It is known that X′ = X(G^{-1})* and that
B̃ = X_*^*AX = X_*^*AX′.
Hence
B̃ = G*Y*AX(G^{-1})* = G*B(G^{-1})*.
The matrix G depends on the choice of the bases Y and X_*. For example, in Theorem 2.3.3 we have G = I with the choice of bases X_*, X̃_*.
Since λ is not an eigenvalue of B̂, it is not an eigenvalue of B̃ either. We define the partial inverse Σ (with respect to N^⊥) by
Σ = X̂(B̂ − λI)^{-1}Ŷ*.

Lemma 2.3.5 If λ ≠ 0 and the right-hand side is a vector b satisfying Y*b = 0, then the equation
(I − Π)Az − λz = b
has the unique solution z = Σb.

PROOF See Exercise 2.3.3.

Let [Q, Q̂] be an orthonormal basis of C^n such that Q and Q̂ are orthonormal bases of M and M^⊥ respectively. The projection P_1 = QQ* is the orthogonal projection on M. The corresponding partial inverse (with respect to M^⊥) is defined by
Σ^⊥ = Q̂(B̂ − λI)^{-1}Q̂*,
where B̂ = Q̂*AQ̂. We leave it to the reader to verify that
‖Σ^⊥‖_2 = ‖(B̂ − λI)^{-1}‖_2.

2.4.1 Definition of the Block-Reduced Resolvent

The reduced resolvent which we have defined previously refers to the case of a multiple eigenvalue for which bases of the left and right invariant subspaces are known. Numerically, a multiple eigenvalue is, in general, approached by a set of neighbouring eigenvalues, and the resolvent that is associated with each of these eigenvalues individually is ill-conditioned, because their distance from the remainder of the spectrum is small. One may therefore wish to treat globally the cluster of eigenvalues which are close to a multiple eigenvalue.
Let {μ_i : 1 ≤ i ≤ r} be the block of the r eigenvalues of A, counted with their multiplicities and distinct from the rest of the spectrum; we wish to treat the eigenvalues {μ_i} simultaneously. The corresponding right invariant subspace is given by
M = Σ_{i=1}^{r} M_i,
the sum of the invariant subspaces associated with the μ_i, and the left invariant subspace is denoted by M_*. The spectral projection is represented by the matrix
P = XX_*^*.
The complementary invariant subspace is denoted by
M̂ = Im(I − P) = M_*^⊥.
The matrix B = X_*^*AX represents A↾M relative to the bases X and X_*; it satisfies the equations
AX = XB  and  X_*^*A = BX_*^*.
Similarly, the matrix B̂ = X̂_*^*AX̂ represents A↾M̂ relative to the bases X̂ and X̂_*, which are complementary to X and X_* respectively.
We put
σ = sp(B) = {μ_i : 1 ≤ i ≤ r},
τ = sp(B̂),
δ = dist(σ, τ),
which is positive by hypothesis.
In the case of a block of eigenvalues, the eigenvalues of B may include distinct ones. We will generalize the notion of a reduced resolvent in the following definition.

Definition The linear map
S = X̂(B̂, B)^{-1}X̂_*^*
is called the block-reduced resolvent relative to {μ_i : 1 ≤ i ≤ r}, where (B̂, B)^{-1} is the inverse of the linear map
(B̂, B): Z ↦ B̂Z − ZB,
defined on C^{(n−r)×r}.

Proposition 2.4.1 The map S possesses the following properties:
(a) ‖S‖_p ≤ ‖X̂‖_2 ‖X̂_*‖_2 ‖(B̂, B)^{-1}‖_p when p = 2 or F.
(b) ‖S‖_p ≥ δ^{-1}.
(c) If A is Hermitian, then ‖S‖_F = δ^{-1}.

PROOF
(a) Put U = (B̂, B)^{-1}(X̂_*^*R), so that SR = X̂U. When p = 2, the result is immediate because the spectral norm is induced by the Euclidean norm. When p = F, we have
‖X̂U‖_F^2 = Σ_{i=1}^{r} ‖X̂u_i‖_2^2 ≤ ‖X̂‖_2^2 ‖U‖_F^2,
where the u_i are the columns of U, whence the result follows.
(b) This is a consequence of Proposition 1.12.2 and the fact that
sp((B̂, B)) = {μ̂ − μ : μ̂ ∈ τ, μ ∈ σ}.
(c) We can choose bases X̂ and X̂_* in M^⊥ and M_*^⊥ respectively such that
X̂_*^*X̂ = X̂*X̂ = I,
because ω(M̂, M̂_*) < 1. Then, by Proposition 1.2.5, ‖X̂_*‖_2 = ‖(cos Θ)^{-1}‖_2, where Θ is the diagonal of the canonical angles. We deduce that ‖X̂‖_2 = 1 and ‖X̂_*‖_2 ≥ 1. If M̂ = M̂_*, then ‖X̂‖_2 = ‖X̂_*‖_2 = 1. If A is Hermitian, so are B and B̂ if we choose an orthonormal basis [Q, Q̂]. Thus
‖S‖_F = ‖(B̂, B)^{-1}‖_F = δ^{-1}.
Lemma 2.4.2 Let R be an n by r matrix such that X_*^*R = 0. If B is regular, the systems
AZ − ZB = R,
X_*^*Z = 0,
and
(I − P)AZ − ZB = R,
where P = XX_*^*, have the same solution, namely Z = SR.

PROOF This is analogous to the proof of Lemma 2.3.2. The equation λX_*^*z = 0 is replaced by (X_*^*Z)B = 0, which has the unique solution X_*^*Z = 0 if and only if B is regular.

What becomes of the notion of a block-reduced resolvent when σ = {λ} consists of a single eigenvalue of algebraic multiplicity m? Two cases have to be distinguished:
(a) λ is semi-simple and B = λI_m. Sylvester's equation is then entirely uncoupled and the block-reduced resolvent is identical with the reduced resolvent of Section 2.3.
(b) λ is defective. In this case the two notions of the reduced resolvent and the block-reduced resolvent are distinct. We shall see in Section 2.9 that in this case it is the notion of a block-reduced resolvent that plays a part in the theory of analytic perturbations.
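For a block of eigenvalues, the system of Lemma 2.4.2 can be solved in vectorized form with the Kronecker identity of Exercise 1.12.4. A sketch for a symmetric matrix, where X_* = X; the test matrix and the block size are arbitrary choices:

```python
import numpy as np

A = np.array([[1.0, 0.2, 0.0, 0.0],
              [0.2, 1.2, 0.1, 0.0],
              [0.0, 0.1, 3.0, 0.3],
              [0.0, 0.0, 0.3, 4.0]])
lams, V = np.linalg.eigh(A)
X = V[:, :2]                 # invariant subspace of the two smallest eigenvalues
B = X.T @ A @ X              # represents A restricted to that subspace
P = X @ X.T                  # spectral projection (A symmetric: X_* = X)

rng = np.random.default_rng(1)
R = (np.eye(4) - P) @ rng.standard_normal((4, 2))   # enforce X_*^* R = 0

# (I - P) A Z - Z B = R as an ordinary linear system, using the
# column-major identity vec(M Z N) = (N^T kron M) vec(Z)
M = np.kron(np.eye(2), (np.eye(4) - P) @ A) - np.kron(B.T, np.eye(4))
Z = np.linalg.solve(M, R.flatten(order="F")).reshape((4, 2), order="F")
```

By Lemma 2.4.2, Z also solves the pair AZ − ZB = R, X_*^*Z = 0, which is what the assertions below check.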

2.4.2 The Partial Block Inverse

As in the case of the reduced resolvent, the use of the block-reduced resolvent presupposes a knowledge of the spectral projection, that is, of the right and of the left invariant subspaces. We shall extend the notion of a partial inverse to a block of eigenvalues.
Again, let Π = XY* be a projection on M, not necessarily the spectral projection. Let N^⊥ = Ker Π.

Definition The linear map Σ = X̂(B̂, B)^{-1}Ŷ* is called the block partial inverse with respect to {μ_i : 1 ≤ i ≤ r}, defined on N^⊥, where
B̂ = Ŷ*AX̂.
In particular, we may choose N = M. When Q̂ is an orthonormal basis of M^⊥, we have B̂ = Q̂*AQ̂ and Σ^⊥ = Q̂(B̂, B)^{-1}Q̂*. We leave to the reader the task of verifying that
‖Σ^⊥‖_p = ‖(B̂, B)^{-1}‖_p  when p = 2 or F.


Let A be a matrix whose eigenvalues we wish to compute. The existence of rounding errors and/or systematic errors makes us determine numerically the eigenvalues of a 'neighbouring' matrix A′ = A + H, where H is a matrix of 'small' norm, termed a perturbation. In order to deduce some information on the eigenelements of A from a knowledge of the eigenelements of A′, it is useful to consider the family of matrices A(t) = A′ − tH, where t is a complex parameter. Then A(0) = A′ and A(1) = A: this is a homotopy between A′ and A. Consider the problems
[A(t) − zI]x(t) = b,  A(t)x(t) = λ(t)x(t).
If the functions x(t) and λ(t) are analytic in a disk around 0 which contains the point t = 1, then the solutions of the problems
(A − zI)x = b,  Ax = λx
can be computed recursively by starting from the known solutions of
(A′ − zI)x′ = b,  A′x′ = λ′x′.
Let Γ be a Jordan curve drawn in res(A′) and isolating λ′, which is assumed to be an eigenvalue of A′ of multiplicity m and index ℓ′. We define formally
R(t, z) = [A(t) − zI]^{-1}  when z ∈ res(A′),
P(t) = −(1/2iπ) ∫_Γ R(t, z) dz,
and we shall examine the analyticity of these functions of t in the neighbourhood of t = 0.

Lemma 2.5.1 Let R'(z) = (A' - zI)^{-1} when z ∈ res(A'). Then

ρ[HR'(z)] = ρ[R'(z)H] = ρ[I - (A - zI)R'(z)].  (2.5.1)

PROOF We have
HR'(z) = (A' - A)R'(z) = I - (A - zI)R'(z).
On the other hand, from the definition of ρ(A) as lim_k sup ||A^k||^{1/k} it follows that
ρ[HR'(z)] = ρ[R'(z)H]. See also Section 2.12.

Lemma 2.5.2 The resolvent R(t,z) is analytic in the disk {t: |t| < ρ^{-1}[HR'(z)]}

and it has the Taylor expansion

R(t,z) = Σ_{k=0}^∞ t^k [R'(z)H]^k R'(z) = R'(z) Σ_{k=0}^∞ t^k [HR'(z)]^k.

PROOF We have
A(t) - zI = A' - zI - tH = (A' - zI)[I - tR'(z)H].
Hence

R(t,z) = [(A' - zI)(I - tR'(z)H)]^{-1} = Σ_{k=0}^∞ t^k [R'(z)H]^k R'(z)

if and only if |t| ρ[R'(z)H] < 1.
We deduce that t = 1 belongs to the domain of analyticity provided that
ρ[R'(z)H] < 1. This holds under the classic condition ||R'(z)H|| < 1,
but it may also hold for matrices of greater norm, as the following example shows.

Example 2.5.1 In ℝ² let

A = ( 0 0 ; 0 2 ),  H = ( 0 x ; y 0 )

(rows separated by semicolons). We wish to determine the set of matrices H which ensure that t = 1 belongs to
the domain of analyticity of R(t,z) when z describes the circle Γ = {z: |z| = 1}
surrounding the eigenvalue 0. We have

R(z) = (A - zI)^{-1} = ( -1/z 0 ; 0 1/(2-z) ),

HR(z) = ( 0 x/(2-z) ; -y/z 0 ),

ρ[HR(z)] = ( |xy| / (|z| |2-z|) )^{1/2},

max_{z∈Γ} ||R(z)||₂ = max_{z∈Γ} max( 1/|z|, 1/|2-z| ) = 1.

As z describes Γ, the condition

||H||₂ < [ max_{z∈Γ} ||R(z)||₂ ]^{-1} = 1

is satisfied provided that max(|x|, |y|) < 1, while the condition

ρ[HR(z)] < 1 for all z ∈ Γ

is satisfied provided that |xy| < 1.

Figure 2.5.1

The corresponding sets of points in the (x,y) plane are bounded by the
circumference of the square and by the two hyperbolas shown in Figure 2.5.1. If
we choose x = √n, y = 1/n, then xy = 1/√n → 0 as n → ∞; but ||H||₂ = n^{1/2} → ∞.
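This can be checked numerically. The sketch below (an illustration, not the book's own computation) assumes the matrices of Example 2.5.1, A = diag(0, 2) and H with off-diagonal entries x and y, and samples ρ[HR(z)] on the circle Γ:

```python
import numpy as np

def max_rho_HR(x, y, npts=720):
    """Max over z on the unit circle Gamma of the spectral radius of H R(z),
    for A = diag(0, 2) and H = [[0, x], [y, 0]] as in Example 2.5.1."""
    A = np.diag([0.0, 2.0])
    H = np.array([[0.0, x], [y, 0.0]])
    worst = 0.0
    for th in np.linspace(0.0, 2.0 * np.pi, npts, endpoint=False):
        z = np.exp(1j * th)
        R = np.linalg.inv(A - z * np.eye(2))
        worst = max(worst, max(abs(np.linalg.eigvals(H @ R))))
    return worst

# x = sqrt(n), y = 1/n: the 2-norm of H grows like sqrt(n),
# yet rho[H R(z)] = sqrt(|xy| / (|z| |2 - z|)) stays below 1 on Gamma.
n = 100.0
x, y = np.sqrt(n), 1.0 / n
print(np.linalg.norm([[0.0, x], [y, 0.0]], 2))  # 10.0: the norm condition fails
print(max_rho_HR(x, y))                          # about n**(-1/4), well below 1
```

The spectral-radius condition is thus genuinely weaker than the norm condition, as the hyperbola region in Figure 2.5.1 indicates.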


2.6 Analyticity of the Resolvent

By Lemma 2.5.2, R(t,z) is analytic when |t| < ρ^{-1}[HR'(z)]. Now x(t) = R(t,z)b
can be computed in various ways by starting from x' = R'(z)b.

Proposition 2.6.1 When |t| < ρ^{-1}[HR'(z)], the series

x(t) = Σ_{k=0}^∞ t^k y_k

converges, where y_0 = x' and
y_k = [R'(z)H]^k x' = R'(z)H y_{k-1}  (k ≥ 1).

PROOF The assertion follows immediately from Lemma 2.5.2.

For t = 1 we consider the partial sums

x_k = Σ_{i=0}^k y_i → x as k → ∞,

x_0 = y_0 = x'.

Lemma 2.6.2 We have the identities

x_k = x_{k-1} + R'(z)[b - (A - zI)x_{k-1}]  (k ≥ 1)  (2.6.1)

x_k = x' + R'(z)(A' - A)x_{k-1}  (k ≥ 1)  (2.6.2)

PROOF The identities are proved recursively as follows. First,

Hy_0 = (A' - zI)x_0 - (A - zI)x_0 = b - (A - zI)x_0,

so that y_1 = R'(z)Hy_0 = R'(z)[b - (A - zI)x_0] = x_1 - x_0, which is (2.6.1) for k = 1. Next,
y_{k+1} = R'(z)Hy_k = R'(z)[(A' - zI) - (A - zI)]y_k
= y_k - R'(z)(A - zI)y_k
= R'(z)[b - (A - zI)x_{k-1} - (A - zI)y_k]
by making the inductive hypothesis that identity (2.6.1) holds. Hence
y_{k+1} = R'(z)[b - (A - zI)x_k] = x_{k+1} - x_k,
which is (2.6.1) for k + 1. Next,
y_1 + ⋯ + y_k = x_k - x' = R'(z)H(x' + y_1 + ⋯ + y_{k-1})
= R'(z)Hx_{k-1},
which is (2.6.2).

The formula (2.6.2) requires a knowledge of A' - A, while formula (2.6.1)
requires only a knowledge of the residual error for x_{k-1}. It is worth noting that,
since x_k - x_{k-1} → 0 as k → ∞, the residual error must be computed with increasing
precision in order to obtain an effective convergence to x.

Example 2.6.1 Consider the equation Ax = b and suppose we know an approximate
solution x' that is an exact solution of A'x' = b. For example, x' is known
in single precision as a result of applying the Gauss factorization LU to A. Put
x_0 = x' and δ_k = x_k - x_{k-1} (k ≥ 1). The iterative method of refinement consists in
solving the equation
A'δ_k = b - Ax_{k-1} = r_k  (k ≥ 1).  (2.6.3)
We have x_k → x as k → ∞ if and only if ρ[(A' - A)A'^{-1}] = ρ(I - AA'^{-1}) =
ρ(I - A'^{-1}A) < 1. This is true in exact arithmetic. In floating-point arithmetic
we have convergence towards an approximate solution with the precision utilized
in calculating r_k. We pose the question: 'When calculating the residual in double
precision, can we obtain the solution x in double precision whilst solving the
equation (2.6.3) in single precision?' The answer is 'yes' if we proceed as follows:

(a) k = 0: x_0 = x' is known in single precision (through the factorization LU); it
is extended by zeros to obtain numbers in double precision. Then the vectors
Ax' and r_1 = b - Ax' are calculated in double precision.
(b) k = 1: r_1 is truncated to single precision. We solve A'δ_1 = r_1 and we calculate
x_1 = x_0 + δ_1 in double precision.

We repeat the procedure until the residual is zero in double precision (see
Exercise 2.6.1).
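A minimal numerical sketch of this procedure (illustrative, not the book's own code): the single-precision solve of Example 2.6.1 is imitated by solving against a float32 copy of A, which plays the role of A', while the residuals r_k are accumulated in float64.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)

A32 = A.astype(np.float32)                        # "A'": single-precision model of A
# k = 0: x0 = x' solves A'x' = b in single precision, then is carried in double
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

for k in range(10):
    r = b - A @ x                                 # residual r_k in double precision
    # solve A' d_k = r_k with r_k truncated to single precision
    d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    x = x + d                                     # update in double precision

print(np.linalg.norm(b - A @ x))                  # residual at double-precision level
```

Computing r in float32 instead stalls the iteration at single-precision accuracy, which is the point of the remark above about computing the residual with increasing precision.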


2.7 Analyticity of the Spectral Projection

According to Proposition 2.2.7 the function z ↦ ρ[HR'(z)] is upper semi-continuous
in res(A'). Hence it attains its maximum when z traverses the compact set Γ.

Proposition 2.7.1 The spectral projection

P(t) = -(1/2iπ) ∮_Γ R(t,z) dz

is analytic in the disk

δ'_Γ = { t: |t| < [ max_{z∈Γ} ρ[HR'(z)] ]^{-1} }.

PROOF The matrix P(t) is defined for each t in δ'_Γ. Let t_0 ∈ δ'_Γ and consider the disk

{ t: |t - t_0| < [ max_{z∈Γ} ||HR(t_0,z)|| ]^{-1} }.

Now the series

R(t,z) = R(t_0,z) Σ_{k=0}^∞ (t - t_0)^k [HR(t_0,z)]^k

converges uniformly for z on Γ. On integrating over Γ we immediately verify

that P(t) is analytic near t_0.

Lemma 2.7.2 If the projection P(t) depends continuously on t, when t varies over
a connected region of ℂ, then the dimension of Im P(t) remains constant.

PROOF Theorem 1.4.2 shows that the map t ↦ dim Im P(t) is a continuous
function. Since its values lie in ℕ, the function is constant.
Therefore the dimension of the invariant subspace M(t) = Im P(t) is constant
and equal to m (say) when t varies in δ'_Γ: the total algebraic multiplicity of the
eigenvalues of A(t) inside Γ is preserved. Let B(t) be an m by m matrix representing
A(t) restricted to M(t). Put λ̂(t) = (1/m) tr B(t); then λ̂(t) represents the arithmetic mean of the
eigenvalues of A(t) inside Γ.

Proposition 2.7.3 The functions λ̂(t) and B(t) are analytic in δ'_Γ.

PROOF Let t_0 ∈ δ'_Γ. Suppose that P(t_0) can be written as

P(t_0) = X_0Y_0*, with Y_0*X_0 = I.
When t → t_0 we have P(t) → P(t_0) and
G(t) = Y_0*P(t)X_0 → Y_0*X_0Y_0*X_0 = I.
According to Lemma 1.2.4 the vectors P(t)X_0 form a basis whose adjoint basis
in the space generated by Y_0 is given by Z(t) = Y_0[G*(t)]^{-1}.
The matrix
B(t) = Z*(t)A(t)P(t)X_0
represents A(t) restricted to M(t) in the adjoint bases P(t)X_0 and Z(t). We have
B(t) = G(t)^{-1}Y_0*(A' - tH)P(t)X_0.
Now G(t)^{-1} is analytic in the neighbourhood of t_0 because G(t) → I when t → t_0.
Hence B(t) is analytic and so is mλ̂(t) = tr B(t).
Proposition 2.7.3 allows us to conclude that the simple eigenvalues are analytic
functions of t. However, in general, this is not true for multiple eigenvalues.

Example 2.7.1 The matrix

A(t) = ( 0 1 ; t 0 )

has the eigenvalues ±√t, and

A(0) = ( 0 1 ; 0 0 )

has zero as an eigenvalue of index 2. We remark that the arithmetic mean of the
eigenvalues remains constant at zero. The matrix

A(t) = ( 0 1 ; t t )

has the eigenvalues ½(t ± √(t² + 4t)), whose arithmetic mean is equal to ½t.
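The non-smoothness of ±√t at t = 0, alongside the analyticity of the mean, is easy to observe numerically (a small check, not taken from the book):

```python
import numpy as np

def eig_pair(t):
    """Eigenvalues of A(t) = [[0, 1], [t, 0]] from Example 2.7.1."""
    return np.linalg.eigvals(np.array([[0.0, 1.0], [t, 0.0]]))

for t in [1e-2, 1e-4, 1e-6]:
    lam = eig_pair(t)
    # the largest eigenvalue behaves like sqrt(t): not differentiable at t = 0,
    # while the arithmetic mean stays identically 0 (analytic in t)
    print(max(abs(lam)) / np.sqrt(t), abs(lam.sum()))
```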


2.8 The Rellich-Kato Expansions

We give below the form of the Taylor expansions in t of P(t) and λ̂(t) for all t in
δ'_Γ. These expansions are known as the Rellich*-Kato expansions. Let P' represent
the spectral projection of A' associated with the eigenvalue λ' of index ℓ'.

* Franz Rellich, 1906-1955, born in Tramin, died in Göttingen.


Theorem 2.8.1 For every t in δ'_Γ we have the expansions

P(t) = P' - Σ_{k=1}^∞ t^k Σ_{(p)} S'^{(p_1)} H S'^{(p_2)} H ⋯ H S'^{(p_{k+1})},

λ̂(t) = λ' + (1/m) Σ_{k=1}^∞ (t^k/k) Σ_{(p)} tr[H S'^{(p_1)} H S'^{(p_2)} ⋯ H S'^{(p_k)}],

where the inner sums run over the admissible multi-indices (p) (see Kato, 1976), and

S'^{(0)} = -P',  S'^{(-p)} = D'^p,  S'^{(p)} = S'^p when p > 0,

D' = (A' - λ'I)P' and S' = lim_{z→λ'} R'(z)(I - P').

PROOF The reader is referred to the proof given in Kato (1976, pp. 74-80).
On account of the complexity of their coefficients, these expansions are chiefly
of theoretical interest. In what follows we shall meet the Rayleigh-Schrödinger*
expansions, which possess the double advantage of having coefficients that can
be calculated recursively and of covering the case in which Γ encloses several
eigenvalues {μ'_i}_1^r of A'.


2.9 The Rayleigh-Schrödinger Expansions

Let {μ'_i}_1^r be the block of r eigenvalues, each counted with its multiplicity, which
is isolated from the rest of the spectrum and which we wish to treat simultaneously.
Let Γ be a Jordan curve lying in res(A') which isolates this block from the rest
of sp(A'). Let X' be a basis for the associated invariant subspace M' and let Y
be an adjoint basis in the subspace N [such that ω(M', N) < 1]. Thus Y*X' = I.
Then Π' = X'Y* is the projection on M' along N⊥.
Let M(t) be the invariant subspace associated with the eigenvalues of A(t)
inside Γ. If ω(M(t), N) < 1, it is possible to choose a basis X(t) in M(t) such that
Y*X(t) = I. Then
B(t) = Y*A(t)X(t)
is the r by r matrix that represents A(t) restricted to M(t) relative to the bases X(t) and Y. If X(t)
and B(t) are analytic in t, there exist expansions of the form

X(t) = Σ_{k=0}^∞ t^k Z_k and B(t) = Σ_{k=0}^∞ t^k C_k,

whose coefficients we are going to determine.

* Erwin Schrödinger, 1887-1961, born in Vienna, died at Alpbach.


Let [X', X̂'] and [Y, Ŷ] be adjoint bases of ℂⁿ. We put

B' = Y*A'X' and B̂' = Ŷ*A'X̂'.
By hypothesis σ' = sp(B') forms a block of eigenvalues of A'. Furthermore, we
define the partial block inverse
Σ' = X'(B', B̂')^{-1}Y*  (in N⊥)
relative to the block σ' (see Section 2.4.2).

Proposition 2.9.1 The coefficients Z_k and C_k are formally the solutions of the
recurrence relations
Z_0 = X',  Z_k = Σ'(HZ_{k-1} + Σ_{j=1}^{k-1} Z_j C_{k-j})  (k ≥ 1),
C_0 = B',  C_k = Y*(A'Z_k - HZ_{k-1})  (k ≥ 1),

with the convention that Σ_{j=1}^{0} = 0.

PROOF The proof proceeds recursively by comparing coefficients of t^k in the identities

(A' - tH)X(t) = X(t)Y*(A' - tH)X(t),
Y*X(t) = I.
When k = 0, we may choose Z_0 = X', and so
Y*Z_0 = I, C_0 = B'.
When k = 1, we obtain the system
(I - Π')A'Z_1 - Z_1B' = (I - Π')HZ_0,
which has the unique solution Z_1 = Σ'HZ_0. Hence C_1 = Y*(A'Z_1 - HZ_0).
The remainder of the proof is left to the reader.

Remark If we choose Y = X'⁺, then Π' and Σ' become P' and Ŝ' respectively, and
X'⁺*A'Z_k = B'X'⁺*Z_k = 0  (k ≥ 1).
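The recursion of Proposition 2.9.1 is easy to run numerically. The sketch below is an illustration under simplifying assumptions, not the book's algorithm: it takes r = 1 and Y = X' orthonormal, and realizes the application of Σ' by solving the projected Sylvester system together with the constraint Y*Z = 0 as one stacked least-squares problem. Summing the series at t = 1 then recovers an eigenpair of A = A' - H.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = np.diag(np.arange(1.0, n + 1))           # A: well-separated eigenvalues 1..6
H = 0.05 * rng.standard_normal((n, n))       # perturbation; A' = A + H, A(1) = A
Ap = A + H

w, V = np.linalg.eig(Ap)
i = int(np.argmin(abs(w - 1.0)))             # block of size r = 1 near 1.0
Xp = V[:, [i]].real
Xp = Xp / np.linalg.norm(Xp)                 # X' orthonormal, and we take Y = X'
Y = Xp
Pi = Xp @ Y.T                                # Pi' (orthogonal here)
Bp = Y.T @ Ap @ Xp                           # B' (1 x 1)
r = Bp.shape[0]

def sigma(R):
    """Apply the block partial inverse: solve (I - Pi')A'Z - Z B' = (I - Pi')R
    together with Y*Z = 0 (a stacked least-squares realization of Sigma')."""
    M = np.vstack([
        np.kron(np.eye(r), (np.eye(n) - Pi) @ Ap) - np.kron(Bp.T, np.eye(n)),
        np.kron(np.eye(r), Y.T),
    ])
    rhs = np.concatenate([((np.eye(n) - Pi) @ R).flatten('F'), np.zeros(r * r)])
    z, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return z.reshape((n, r), order='F')

Z, C = [Xp], [Bp]
for k in range(1, 25):
    S = H @ Z[k - 1] + sum(Z[j] @ C[k - j] for j in range(1, k))
    Z.append(sigma(S))                       # Z_k = Sigma'(H Z_{k-1} + sum Z_j C_{k-j})
    C.append(Y.T @ (Ap @ Z[k] - H @ Z[k - 1]))

X, B = sum(Z), sum(C)                        # the series at t = 1
print(np.linalg.norm(A @ X - X @ B))         # X is (numerically) invariant under A
```

With H small relative to the separation of the eigenvalues, the partial sums converge geometrically, as Corollary 2.9.4 predicts.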


Lemma 2.9.2 If ρ[(P(t) - P')Π'] < 1 when t lies in δ'_Γ, then the matrix
S(t) = [Y*P(t)X']^{-1}
is analytic in δ'_Γ and Π(t) = P(t)X'S(t)Y* defines the projection on M(t) along N⊥.

PROOF By Proposition 2.7.1, P(t) is analytic at all t in δ'_Γ; moreover,

ρ{Y*[P(t) - P']X'} = ρ{[P(t) - P']X'Y*} < 1.
Hence I + Y*[P(t) - P']X' = Y*P(t)X' is invertible for all t in δ'_Γ, and
its inverse S(t) = [Y*P(t)X']^{-1} is analytic. The columns of X'S(t) form a
basis of M' and P(t)X'S(t) is a set of r vectors of M(t) such that Y*P(t)X'S(t) = I.
This is the basis X(t) that is adjoint to Y. Incidentally, we have shown that, under
the hypothesis of the lemma, we have ω(M(t), N) < 1.

Theorem 2.9.3 Let s = (|Γ|/π) ||Π'||₂ r'_Γ² ||H||₂, where |Γ| denotes the length of Γ and

r'_Γ = max_{z∈Γ} ||R'(z)||₂.

Then the disk {t: |t| < 1/s} belongs to the domain of analyticity of X(t) and B(t).

PROOF Since P' and Π' are projections (in general oblique) we have

||Π'||₂ ≥ 1 and 1 ≤ ||P'||₂ ≤ (|Γ|/2π) r'_Γ.

Hence if |t| < 1/s, we have

|t| ||H||₂ r'_Γ < π / (|Γ| ||Π'||₂ r'_Γ) ≤ 1/(2||Π'||₂) ≤ ½.

We deduce that

[P(t) - P']Π' = -(1/2iπ) ∮_Γ tR'(z)H Σ_{k=0}^∞ [tR'(z)H]^k R'(z)Π' dz,

whence

||[P(t) - P']Π'||₂ ≤ (|Γ|/2π) ||Π'||₂ |t| ||H||₂ r'_Γ² [ Σ_{k=0}^∞ (|t| ||H||₂ r'_Γ)^k ] < 1.

On the other hand, since |t| ||H||₂ r'_Γ < ½, it is clear that t ∈ δ'_Γ. Hence, by Lemma
2.9.2, S(t) is analytic and so also are X(t) = P(t)X'S(t) and B(t) = Y*(A' - tH)X(t).

Corollary 2.9.4 If ||H||₂ satisfies the condition

(|Γ|/π) ||Π'||₂ r'_Γ² ||H||₂ < 1,  (2.9.1)

then t = 1 belongs to the domain of analyticity of X(t) and B(t). In particular,

X = Σ_{k=0}^∞ Z_k is the basis of the subspace M which is invariant under A; it is
associated with the eigenvalues of B = Y*AX, which represents A restricted to M, where Y*X = I.

PROOF The condition 1/s > 1 is satisfied under the hypothesis (2.9.1), which can
also be written as

||H||₂ < π / (|Γ| ||Π'||₂ r'_Γ²) ≤ 1/(2 r'_Γ),

by virtue of the relations mentioned at the beginning of the proof of Theorem 2.9.3.
Let q be such that s < q < 1, so that

1 < 1/q < 1/s.
Now X(t) and B(t) are analytic on the circle {t: |t| = 1/q}. Let
α = max( ||X(t) - X'||₂ : |t| = 1/q ),
β = max( ||B(t) - B'||₂ : |t| = 1/q ).
By Cauchy's inequalities (page 62) we have
||Z_k||₂ ≤ αq^k,  ||C_k||₂ ≤ βq^k.
This demonstrates that the convergence is at least as fast as that of a geometric
progression of common ratio q, where q is arbitrarily close to s provided that
||H||₂ satisfies condition (2.9.1).
If we choose Y = X' = Q', where Q' is an orthonormal basis of the invariant
subspace M', then Π' is an orthogonal projection and ||Π'||₂ = 1 in condition
(2.9.1). This condition is satisfied when ||H||₂ is sufficiently small. This furnishes
a theoretical framework which suffices for the error analysis that we shall undertake
in Chapter 4. Nevertheless, it is interesting to remark that the sufficient
condition for analyticity given in Lemma 2.9.2 may be satisfied even when ||H||₂
is not 'small' (see Exercise 2.9.1).


2.10 Non-linear Equation and Newton's Method

As we have seen, the perturbation theory which we have expounded in Sections
2.5 to 2.9 enables us to calculate iteratively the eigenelements of A by starting
with those of a neighbouring matrix A', provided that t = 1 belongs to the domain
of analyticity. In what follows we shall present a different class of iterative
calculations. They arise from the formulation of the eigenvalue problem as a
non-linear equation in the basis of the invariant subspace, normalized with the
help of linear forms.

Let σ = {μ_i: 1 ≤ i ≤ r} be a block of r eigenvalues of A, each counted with its
algebraic multiplicity and separated from the remainder of the spectrum. Let M
be the associated invariant subspace and let Y be a basis of a subspace N such
that ω(M, N) < 1. Next, let X be the basis of M satisfying Y*X = I; then Π = XY*
is the projection on M along N⊥.

Theorem 2.10.1 If 0 ∉ σ, the basis X which satisfies AX = XB and which is normalized
by Y*X = I is a solution of
F(X) = AX - X(Y*AX) = 0.  (2.10.1)

PROOF Equation (2.10.1) expresses the fact that there exists a matrix B = Y*AX such that
AX = XB. Hence Im X is invariant under A. On multiplying on the left by Y* we
deduce that
(I - Y*X)B = 0,
which implies that Y*X = I since B is regular by virtue of the assumption
0 ∉ σ = sp(B).
The Fréchet* differential of F, which is a quadratic function of X, is easily
calculated, namely
Z ↦ (D_X F)Z = J(X)Z = (I - XY*)AZ - Z(Y*AX)
= (I - Π)AZ - ZB,  (2.10.2)
where Z ranges over ℂ^{n×r}.

Theorem 2.10.2 On the assumption that 0 ∉ σ, we have:

(a) J(X) is invertible.
(b) J satisfies a Lipschitz† condition over ℂ^{n×r}.

PROOF
(a) Let τ = sp(A) - σ. Now
sp((I - Π)A) = τ ∪ {0}
and
σ ∩ sp((I - Π)A) = ∅,
because 0 ∉ σ. Hence sp((I - Π)A) ∩ sp(B) = ∅ and the Sylvester map
Z ↦ (I - Π)AZ - ZB is invertible.
(b) Let X_1 and X_2 be elements of ℂ^{n×r}. Then
[J(X_1) - J(X_2)]Z = (X_2 - X_1)Y*AZ + ZY*A(X_2 - X_1)

* Maurice Fréchet, 1878-1973, born at Maligny, died in Paris.

† Rudolf Otto Sigmund Lipschitz, 1832-1903, born at Königsberg, died in Bonn.

and so
||J(X_1) - J(X_2)|| ≤ 2||Y*A|| ||X_2 - X_1||.
We define
𝒲 = {Z: Y*Z = 0};
this is a subspace of ℂ^{n×r}.

Lemma 2.10.3 Let V = [v_1, ..., v_r] be a matrix satisfying Y*V = I. Then F(V) ∈ 𝒲.
If J(V) and Y*AV are invertible, the space 𝒲 is invariant under J^{-1}(V).

PROOF We have
Y*F(V) = Y*(I - VY*)AV = 0,
which proves the first assertion.
Next consider the equation
(I - VY*)AZ - Z(Y*AV) = C,  (2.10.3)
where C is such that Y*C = 0. Since we suppose that J(V) is invertible, we have
sp((I - VY*)A) ∩ sp(Y*AV) = ∅.
The solution Z of equation (2.10.3) exists and is such that
Y*Z(Y*AV) = -Y*C = 0.
This implies that Y*Z = 0 because Y*AV is invertible. Thus we have shown that
C ∈ 𝒲 implies that J^{-1}(V)C ∈ 𝒲.

Equation (2.10.1) can be solved by Newton's* method: starting from an approximate
solution U such that Y*U = I, we define a sequence (X^k) such that
Y*X^k = I (k ≥ 1) by formally applying Newton's iteration:
X^0 = U,  X^{k+1} = X^k - J^{-1}(X^k)F(X^k)  (k ≥ 0).  (2.10.4)

Theorem 2.10.4 On the assumption that 0 ∉ σ, there exists ρ > 0 such that, for
every U satisfying ||X - U|| < ρ, the sequence defined in (2.10.4) is meaningful and
converges quadratically to X as k tends to infinity.

PROOF Using Lemma 2.10.3 we show recursively that Y*X^k = I, that is, X^{k+1} - X^k
lies in 𝒲, if we suppose that, at each step, J(X^k) and Y*AX^k are invertible. Since
we assume that 0 ∉ σ, the matrix B = Y*AX is invertible. Now the function
B ↦ det B is continuous: hence for some a > 0 there exists ρ_1 such that, for all V
satisfying ||X - V|| < ρ_1, we have |det(Y*AV)| ≥ a > 0.

* Sir Isaac Newton, 1642-1727, born at Woolsthorpe, died in London.

On the other hand, since J(X) is invertible and J satisfies a Lipschitz condition,
there exists ρ_2 ≤ ρ_1 such that, for every U such that ||X - U|| < ρ_2, the sequence
(2.10.4) converges quadratically to X (see Exercises 2.10.1 and 2.10.2).
Finally, we remark that when X^k → X, then F(X^k) tends to zero and, in practice,
must therefore be calculated with increasing precision (see Sections 2.6 and 2.12).
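A compact numerical sketch of the iteration (2.10.4) (an illustration, not the book's code): the Newton correction Z solves the Sylvester-type equation J(X^k)Z = F(X^k), which is handled here by vectorizing J(X^k) with Kronecker products. The test matrix and the starting basis U are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 2
A = np.diag([1.0, 2.0, 5.0, 6.0, 7.0, 8.0]) + 0.05 * rng.standard_normal((n, n))

# Exact invariant subspace for the two smallest eigenvalues, then a perturbed
# starting basis U; Y is a fixed orthonormal basis, normalized so that Y* U = I.
w, V = np.linalg.eig(A)
U = V[:, np.argsort(w.real)[:r]].real + 0.02 * rng.standard_normal((n, r))
Y, _ = np.linalg.qr(U)
U = U @ np.linalg.inv(Y.T @ U)                    # normalization Y* U = I

X = U.copy()
for k in range(8):
    B = Y.T @ A @ X
    F = A @ X - X @ B                             # F(X) of (2.10.1)
    # J(X) Z = (I - X Y*) A Z - Z B, vectorized column-wise
    J = np.kron(np.eye(r), (np.eye(n) - X @ Y.T) @ A) - np.kron(B.T, np.eye(n))
    Z = np.linalg.solve(J, F.flatten('F')).reshape((n, r), order='F')
    X = X - Z                                     # Newton step (2.10.4)

B = Y.T @ A @ X
print(np.linalg.norm(A @ X - X @ B))              # A X = X B at machine precision
```

By Lemma 2.10.3 the correction Z satisfies Y*Z = 0, so Y*X^k = I is preserved automatically along the iteration.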

Iterations of the type (2.10.4) are costly because they require the solution of a
different Sylvester equation at each step. We can modify this type of iteration by
keeping the system that has to be solved fixed in the course of the iterations. Thus
we define a family of modified Newton methods (with fixed gradient) whose
convergence is linear. We present these methods in terms of the deviation
V_k = X^k - X^0, for the study of convergence is simpler in this version.

2.11.1 The Simplified Method of Newton (Method of the Tangent)

In (2.10.4) we fix J(X^k) to be equal to J_0 = J(U). Let V_k = X^k - U. Then
V_0 = 0,
V_1 = -J_0^{-1}R,  (2.11.1)
V_{k+1} = V_1 + J_0^{-1}[V_k Y*AV_k]  (k ≥ 1),
where
J_0 = J(U),  R = AU - UB,  B = Y*AU.
Here R is the matrix residual in U; S = Y*A - BY* is the left matrix residual in
Y. Put
γ = ||J_0^{-1}||,  ρ = ||R||,  s = ||S||.

Let g be the function defined by

g(t) = [1 - (1 - 4t)^{1/2}] / (2t),  0 < t ≤ ¼.  (2.11.2)

We have 1 ≤ g(t) ≤ 2 and g(0) = 1 (by continuity; see Figure 2.11.1).

Figure 2.11.1

Lemma 2.11.1 When ε ≤ ¼, where ε = γs||V_1||, we have

||V_k|| ≤ g(ε)γρ  (k ≥ 1).

PROOF Put ||V_1|| = π_1 ≤ γρ and suppose that ||V_k|| ≤ π_k. Then

||V_{k+1}|| ≤ π_1 + γs||V_k||²,
for Y*AV_k = SV_k.
Define a sequence (x_k) by putting x_0 = 0 and
π_k = π_1(1 + x_k)  (k ≥ 1).
Then x_{k+1} = ε(1 + x_k)², where ε = γs||V_1||.
It can be shown inductively that (x_k) is a monotonic increasing sequence (k ≥ 0);
it tends to x, where
1 + x = g(ε)
(see Figure 2.11.2). Since x_k ≤ x, we conclude that
||V_k|| ≤ π_k ≤ π_1(1 + x) = g(ε)π_1 ≤ g(ε)γρ.

Theorem 2.11.2 When ε < ¼, the map

G: V ↦ V_1 + J_0^{-1}(VY*AV)
is a contraction map in the ball
ℬ = {V: ||V|| ≤ g(ε)||V_1||}.

PROOF We have
G(V) - G(V') = J_0^{-1}[VY*AV - V'Y*AV']
= J_0^{-1}[(V - V')Y*AV + V'Y*A(V - V')] = ξ.
By hypothesis, max(||V||, ||V'||) ≤ g(ε)||V_1|| = g(ε)π_1. Hence
||ξ|| ≤ 2γsg(ε)π_1 ||V - V'|| = 2εg(ε)||V - V'||.

Thus G is a contraction with constant
α = 2εg(ε) < 4ε < 1
when ε < ¼.
Consequently, G possesses a unique fixed point V in ℬ which satisfies
A(U + V) = (U + V)Y*A(U + V),
that is,
AX = XB̂, where X = U + V and B̂ = Y*AX.
One can calculate V by successive approximations starting from V_0 = 0; the
iteration (2.11.1) converges linearly to V provided that ||R|| < 1/(4γ²s).
Since α is the contraction constant of G, we have

||V - V_k|| ≤ [α^k/(1 - α)] ||V_1||  (k ≥ 1),

where
α = 2εg(ε),  ε = γs||V_1||.

Similarly, if B_k = Y*AX^k, then

B̂ - B_k = Y*A(X - X^k) = Y*A(V - V_k)

and

||B̂ - B_k|| = ||Y*A(V - V_k)|| ≤ [α^k/(1 - α)] s ||V_1||.

Corollary 2.11.3 The upper bounds ||X - U|| ≤ g(ε)||J_0^{-1}R|| and ||B̂ - B|| ≤
g(ε)s||J_0^{-1}R|| are valid when ε ≤ ¼.

PROOF The matrix V = X - U belongs to ℬ and B̂ - B = Y*A(X - U) = SV.

If the initial basis U is orthonormal, so that U*U = I, we can choose Y = U,
and the limit basis satisfies U*X = I. Then ||X - U||_p = ||tan Θ||_p, where p = 2 or
F, and Θ is the diagonal matrix consisting of the canonical angles between the
exact and the approximate invariant subspaces.
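The simplified Newton iteration (2.11.1) reuses one fixed operator J_0 for every step. A small sketch under the same illustrative assumptions as before (assumed test matrix, Y fixed with Y*U = I):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 6, 2
A = np.diag([1.0, 2.0, 5.0, 6.0, 7.0, 8.0]) + 0.05 * rng.standard_normal((n, n))

w, Vec = np.linalg.eig(A)
U = Vec[:, np.argsort(w.real)[:r]].real + 0.005 * rng.standard_normal((n, r))
Y, _ = np.linalg.qr(U)
U = U @ np.linalg.inv(Y.T @ U)                    # normalization Y* U = I

B = Y.T @ A @ U
R = A @ U - U @ B                                 # matrix residual F(U)
J0 = np.kron(np.eye(r), (np.eye(n) - U @ Y.T) @ A) - np.kron(B.T, np.eye(n))

def solve_J0(M):
    """Apply J0^{-1}; J0 stays fixed, so one factorization could be reused."""
    return np.linalg.solve(J0, M.flatten('F')).reshape((n, r), order='F')

V1 = -solve_J0(R)                                 # V_1 = -J0^{-1} R
Vk = V1.copy()
for k in range(40):
    Vk = V1 + solve_J0(Vk @ (Y.T @ A @ Vk))       # V_{k+1} = V_1 + J0^{-1}[V_k Y*A V_k]

X = U + Vk                                        # fixed point of G: A X = X (Y* A X)
print(np.linalg.norm(A @ X - X @ (Y.T @ A @ X)))  # linear convergence to zero
```

Compared with the full Newton sketch after Theorem 2.10.4, only one linear operator is ever factorized, at the price of linear instead of quadratic convergence.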

2.11.2 The Modified Method of Newton

(a) Let A' = A + H and let X' be a basis of the subspace M' which is invariant
under A'. Suppose that Y*X' = I. The Fréchet differential which was defined
in equation (2.10.2) can be approximated by the map
Z ↦ J'Z = (I - X'Y*)A'Z - ZY*A'X'.

We obtain the iteration

V_1 = J'^{-1}HX',  (2.11.3)
V_{k+1} = V_1 + J'^{-1}[V_k Y*AV_k + HV_k - V_k Y*HX'].
In Chapter 4, this iteration will be used to obtain bounds for the error linking
the eigenelements of A with those of A'.
(b) We can modify J_0 in the following way:
Z ↦ J̄Z = (I - UY*)AZ - ZB̄,
where B̄ is defined as follows. Let
T = Q*BQ
be the Schur form of
B = Y*AU,
and let T̄ be derived from T by replacing its diagonal entries d_1, ..., d_r by ζ,
where the d_i are the r eigenvalues of B and ζ is their arithmetic mean. Then

B̄ = QT̄Q*.

We obtain the iteration:

V_1 = -J̄^{-1}R,  V_{k+1} = V_1 + J̄^{-1}[V_k Y*AV_k + V_k(B - B̄)]  (k ≥ 1).  (2.11.4)

This iteration will be used in Chapter 5 in order to study the speed of
convergence of certain computational methods of the inverse iteration type.
We remark that the iteration (2.11.1) rests on the quadratic term V_k Y*AV_k,
which explains why the iteration converges provided that ||R|| is sufficiently
small. By way of contrast, the iterations (2.11.3) and (2.11.4) contain terms that are
linear in V_k; consequently, the convergence can be established only when ||H||,
or ||B̄ - B|| and ||R|| respectively, are small enough.


A very large number of methods for the approximate solution of equations can
be presented in the general context of correcting the residual (Stetter, 1978), for
which we shall give here a formal presentation.

Let F be a map, not necessarily linear, of ℂⁿ into ℂⁿ, defined on Dom F. We
consider the equation
F(x) = 0.  (2.12.1)
Suppose F is defined on the ball

ℬ = {y: ||y - x|| ≤ ρ}.
Suppose, further, that we are able to evaluate the map F and that we know an
approximate inverse G defined in the following manner.

Definition We say that G is an approximate local inverse of F if and only if the
following three conditions are satisfied:
(a) F(ℬ) ⊂ Dom G.
(b) G(0) ∈ ℬ.
(c) U = I - G∘F is a contraction map on ℬ.
The method of residual correction consists in producing the sequence
x^(0) = G(0),  x^(k+1) = G(0) + U(x^(k))  (k ≥ 0).  (2.12.2)

Proposition 2.12.1 If G is an approximate local inverse of F, then x^(k) → x as
k → ∞ and F(x) = 0.

PROOF The existence of lim x^(k) is assured because the sequence defined in
(2.12.2) is a Cauchy sequence. Let
x = lim x^(k) as k → ∞.
We shall show that x is a fixed point of the map
V(y) = G(0) + U(y).
On letting k tend to ∞ in (2.12.2) we obtain
x = G(0) + U(x),
that is,
V(x) = G(0) + U(x) = x.

For each y of ℬ we have
||V(y) - x|| = ||V(y) - V(x)|| ≤ t(U)||y - x|| < ρ,
where t(U) < 1 is the contraction constant of U.
Thus V(x) ∈ ℬ and V(ℬ) ⊂ ℬ. It is easy to see that F is injective on ℬ. Hence
x is the unique zero of F in ℬ.
The iteration (2.12.2) can also be interpreted as follows: x^(0) = G(0) is an
approximation of x with residual F(x^(0)) and error x - G(0) = U(x) = e. In its turn
the error e can be approximated by e^(1) = x^(1) - G(0) = U(x^(0)) = x^(0) - G(F(x^(0))).
On iterating we obtain the method of correction of the residual F(x^(k)) at the
iterate x^(k). The method is the more effective the smaller the contraction constant of U.
A particular case of importance is that of the linear equation
Ax = b,
for which F is the affine map
F(y) = Ay - b.
Let B be an approximate inverse of A: thus U = I - BA has the property that
ρ(U) < 1. For a given u in ℬ define
G(y) = By + u.
Then G is an approximate inverse of F.
The ensuing method of residual correction is as follows:
x^(0) = u,  x^(k+1) = x^(k) - B(Ax^(k) - b),  k ≥ 0.
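As a small illustration (assumed data, not from the book), take B to be a crude approximate inverse of A; then ρ(I - BA) < 1 and the correction converges to the solution:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # A close to the identity
b = rng.standard_normal(n)

B = np.eye(n)                   # approximate inverse: rho(I - BA) = rho(I - A) < 1
x = np.zeros(n)                 # u = 0
for k in range(200):
    x = x - B @ (A @ x - b)     # residual correction step
print(np.linalg.norm(A @ x - b))
```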

Example 2.12.1 Suppose we wish to solve the equation

Ax = b.
We assume that there exists a matrix K such that the equation is equivalent to
x = Kx + Hb,
where
H = (I - K)A^{-1}.
For a given vector u, define G by
G(y) = Ku + Hb + Hy.
Then G is an approximate inverse if and only if ρ(K) < 1.
We have the decomposition
A = D(L + I + U),
where D is the diagonal of A and L and U are the strictly lower and upper triangular
parts of D^{-1}A. The iterative methods of Jacobi,* Gauss-Seidel† and the accelerated
(relaxation) iteration correspond respectively to:
(a) K_J = -(L + U), Hb = D^{-1}b;
(b) K_GS = -(I + L)^{-1}U, Hb = (I + L)^{-1}D^{-1}b;

(c) K_ω = (I + ωL)^{-1}[(1 - ω)I - ωU], Hb = ω(I + ωL)^{-1}D^{-1}b, where 0 < ω < 2.

* Carl Gustav Jacob Jacobi, 1804-1851, born at Potsdam, died in Berlin.

† Philipp Ludwig von Seidel, 1821-1896, born at Zweibrücken, died in Munich.

We leave it to the reader to verify that the iteration of the residual correction
x^(0) = Ku + Hb,  x^(k+1) = Kx^(k) + Hb
reverts to the three well-known iterative methods.
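The verification is immediate to sketch numerically for Jacobi and Gauss-Seidel (a check under the assumption of a diagonally dominant test matrix, so that ρ(K) < 1):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = np.diag(5.0 + rng.random(n)) + 0.3 * rng.standard_normal((n, n))
b = rng.standard_normal(n)

D = np.diag(np.diag(A))
LplusU = np.linalg.solve(D, A) - np.eye(n)       # L + U = D^{-1}A - I
L = np.tril(LplusU, -1)
U = np.triu(LplusU, 1)

def residual_correction(K, Hb, steps=200):
    x = Hb.copy()                                # x^(0) = Ku + Hb with u = 0
    for _ in range(steps):
        x = K @ x + Hb                           # x^(k+1) = K x^(k) + Hb
    return x

K_j, Hb_j = -(L + U), np.linalg.solve(D, b)                    # Jacobi
M = np.linalg.inv(np.eye(n) + L)
K_gs, Hb_gs = -M @ U, M @ np.linalg.solve(D, b)                # Gauss-Seidel

for x in (residual_correction(K_j, Hb_j), residual_correction(K_gs, Hb_gs)):
    print(np.linalg.norm(A @ x - b))             # both iterations solve A x = b
```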

Example 2.12.2 We return to the context of Example 2.6.1. On choosing B = A'^{-1}

and G(y) = A'^{-1}y + A'^{-1}b, we obtain the iteration
x^(0) = x',  x^(k+1) = x' + A'^{-1}(A' - A)x^(k),
which should be compared with the identity (2.6.2).
Other important examples of the use of the notion of an approximate inverse
in linear algebra are the techniques of preconditioning (Exercise 2.12.2) and the
multigrid methods (Exercise 2.12.3). It should be borne in mind that the precision
obtained for x^(k+1) is the same as that for the residual F(x^(k)).


The spectral theory of closed operators in a Banach space (for the calculation of
isolated eigenvalues of finite multiplicity) is found in the expositions of Chatelin
(1983) and Kato (1976). The use of Cauchy's integral formula (2.2.6) for a matrix
goes back to Poincaré (1935). The theory of analytic perturbations for an operator
is treated in Chatelin (1983), Kato (1976), Baumgärtel (1985) and Wasow (1978).
The reduced resolvent of Kato is known in other contexts as the Drazin inverse
and, in the case of a semi-simple eigenvalue, as the group inverse (see the book by
Campbell and Meyer, 1991). The notion of the block-reduced resolvent is new (Chatelin,
1984), as are the Rayleigh-Schrödinger expansions in Section 2.9. The use of a
quadratic formulation for the computation of an eigenvector associated with a
simple eigenvalue is in Anselone and Rall (1968). The technique of proving
Lemma 2.11.1 and Theorem 2.11.2 is borrowed from Stewart's article (1973a).
The special case y = e_n in (2.11.1) is treated in Dongarra, Moler and Wilkinson.
Stetter's paper (1978) is fundamental for the methods of residual correction.
The methods for iterative refinement are the subject of the book by Kulisch and

Section 2.1 Revision of Some Properties of Functions of a Complex Variable

2.1.1 [B:41] Let Ω be an open set in ℂ. Suppose the maps f: Ω → ℂ, p: ℝ² → ℝ,
q: ℝ² → ℝ and g: ℝ² → ℂ are such that
f(z) = p(x, y) + iq(x, y) = g(x, y)
for all z = x + iy in Ω.

Prove that f is holomorphic in Ω if and only if g is differentiable in Ω and

satisfies the Cauchy-Riemann conditions
∂p/∂x = ∂q/∂y and ∂p/∂y = -∂q/∂x
in Ω. Deduce that p and q satisfy the equation

∂²u/∂x² + ∂²u/∂y² = 0  (u = p, q).
2.1.2 [D] Let Ω be an open set in ℂ. Prove that
z ∈ Ω ↦ A(z) = [a_{ij}(z)] ∈ ℂ^{n×m}
is analytic if and only if the nm functions z ∈ Ω ↦ a_{ij}(z) are analytic.

Section 2.2 Singularities of the Resolvent

2.2.1 [B:35] Prove that the resolvent R(z) satisfies the equation

(d/dz)R(z) = [R(z)]²
and, more generally,

(d^k/dz^k)R(z) = k![R(z)]^{k+1}.
2.2.2 [B:63] Show directly that the function z ↦ ρ(R(z)) is upper semi-continuous
on the resolvent set.
2.2.3 [D] Let A ∈ ℂ^{n×n} and let M be the invariant subspace associated with an
eigenvalue λ of A. Let X be a basis of M and let X₊ be a basis of the orthogonal
complement of the complementary invariant subspace M̂ (M₊ = M̂⊥, M ⊕ M̂ = ℂⁿ),
such that X₊*X = I. Prove that

XX₊* = -(1/2iπ) ∮_Γ R(z) dz,
where Γ is a Jordan curve lying in res(A) and isolating λ.
2.2.4 [B:35] Let P be the spectral projection associated with an eigenvalue λ.
Prove that
lim_{z→λ} R(z)(I - P) = S,

where S is the reduced resolvent associated with λ.


2.2.5 [D] Let f: ℂ → ℂ be a holomorphic function. For any square matrix A we

define f(A) by Cauchy's formula (page 61). Show that if V is an invertible matrix,

then f(V^{-1}AV) = V^{-1}f(A)V.
2.2.6 [B:10] Investigate sufficient conditions for matrices A and B to satisfy the

2.2.7 [D] Let A be a square matrix. Let λ ∈ sp(A) and let P be the associated
spectral projection. Prove that if 0 ∉ sp(A), then A^{-1} exists, λ^{-1} is an eigenvalue
of A^{-1} and P is the spectral projection of A^{-1} associated with λ^{-1}.

2.2.8 [B:38] Consider Sylvester's equation

AX + XB = C.  (1)
(a) Prove that if the eigenvalues of A and of B have negative real parts, then the
unique solution of equation (1) is given by

X = -∫_0^∞ e^{At} C e^{Bt} dt.
(b) Let α be a non-zero number in
res(A) ∩ res(B).
Put
f(z) = (z + α)(z - α)^{-1},
U = f(A),  V = f(B),  D = -2α(A - αI)^{-1}C(B - αI)^{-1}.

Prove that X is a solution of equation (1) if and only if X is a solution of
X - UXV = D.  (2)
(c) Prove that if the eigenvalues of A and of B have negative real parts, then
ρ(U) < 1 and ρ(V) < 1.
(d) Prove that if ρ(U)ρ(V) < 1, then the solution of equation (2) can be expanded
in the series

X = Σ_{n=0}^∞ U^n D V^n.
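Parts (b)-(d) can be sketched numerically: with stable A and B, the Cayley transforms U and V have spectral radii below 1, and the series Σ UⁿDVⁿ converges to the solution of (1). (Random test data are an assumption of the sketch.)

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
A = -2.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))  # Re(sp(A)) < 0
B = -3.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))  # Re(sp(B)) < 0
C = rng.standard_normal((n, n))

alpha = 1.0
I = np.eye(n)
U = (A + alpha * I) @ np.linalg.inv(A - alpha * I)         # U = f(A)
V = (B + alpha * I) @ np.linalg.inv(B - alpha * I)         # V = f(B)
D = -2.0 * alpha * np.linalg.inv(A - alpha * I) @ C @ np.linalg.inv(B - alpha * I)

assert max(abs(np.linalg.eigvals(U))) < 1.0                # part (c)
assert max(abs(np.linalg.eigvals(V))) < 1.0

X = np.zeros((n, n))
T = D.copy()
for k in range(200):                                       # part (d): X = sum U^n D V^n
    X += T
    T = U @ T @ V
print(np.linalg.norm(A @ X + X @ B - C))                   # X solves (1)
```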

Section 2.3 The Reduced Resolvent and the Partial Inverse

2.3.1 [A] Adapt the factorization of Gauss or of Schmidt in order to solve the
system …, where λ is an eigenvalue of A and X₊ is a basis of the orthogonal
complement of the complementary invariant subspace associated with λ.
2.3.2 [A] Let B and B̂ be the matrices introduced in Sections 2.3.1 and 2.3.2
(pages 73 and 74). Prove that they have the same Jordan form and are therefore similar.
2.3.3 [A] Prove Lemma 2.3.5 (page 76).
2.3.4 [D] Let A be an invertible matrix. Let λ ∈ sp(A), and let P be the associated
spectral projection and S(A, λ) the associated reduced resolvent. Calculate
S(A^{-1}, λ^{-1}) as a function of S(A, λ) in two different ways, as follows:
(a) By the formula
S(A^{-1}, λ^{-1}) = lim_{z→λ^{-1}} R(A^{-1}, z)(I - P).

(b) By the formula

S(A^{-1}, λ^{-1}) = (1/2iπ) ∮_Γ R(A^{-1}, z)(z - λ^{-1})^{-1} dz,
where Γ is a closed Jordan curve which isolates λ^{-1} from the rest of the
spectrum of A^{-1}.
2.3.5 [C] Let A be the matrix
( … ).
(a) Verify that λ = 25 is an eigenvalue of A.
(b) Calculate the reduced resolvent S associated with λ.
(c) Calculate the partial inverse Σ⊥ associated with λ and the orthogonal projection
upon the invariant subspace.
(d) Compare ||S||₂ with ||Σ⊥||₂.
2.3.6 [A] Let λ ∈ sp(A). Let M be the invariant subspace associated with λ, Π
any projection on M, Π⊥ the orthogonal projection on M, and Σ(Π) the partial
inverse associated with λ and Π. Prove that
||Σ(Π⊥)||_j = min_Π ||Σ(Π)||_j  (j = 2 or F).

2.3.7 [B:9] Identify S with the Drazin inverse (A - λI)^D and, when λ is semi-simple,
with the group inverse (A - λI)^#. If A is normal, prove that
(A - λI)^† = (A - λI)^D = (A - λI)^# = S,
where † denotes the Moore-Penrose inverse, or pseudo-inverse (Exercise 1.6.8).

Section 2.4 The Block-Reduced Resolvent

2.4.1 [B:24] Solve the equation
(I - P)AZ - ZB = R
with the help of the Schur form of B and the Hessenberg form of (I - P)A.
2.4.2 [D] Solve the equation
(I - P)AZ - ZB = R
by Gauss's method, reducing it to r systems of n + r equations with n unknowns
and of rank n, with the matrices

( … ),

where P is the spectral projection associated with the block of eigenvalues {μ_i: i = 1, …, r}.
2.4.3 [A] Examine the difference between the block-reduced resolvent Ŝ and
the reduced resolvent S associated with the same eigenvalue when it is double and defective.

Section 2.5 Linear Perturbations of the Matrix A

2.5.1 [D] Examine the analyticity of the solution of the equation
AX - XB(t) = C
in the neighbourhood of t = 0 when
B(t) = B + tH,  ||H||₂ = 1,  t ∈ ℂ.

Section 2.6 Analyticity of the Resolvent

2.6.1 [A] Examine the convergence of the iteration (2.6.3) defined in Example
2.6.1 (page 83).
2.6.2 [D] Adapt the algorithm (2.6.3) (page 83) to the case of an almost triangular
matrix and to the case of an almost diagonal matrix.

2.6.3 [D] Suggest an algorithm based on Exercise 2.5.1 for solving the equation
AX - XB = C
when B is almost triangular.
2.6.4 [C] Solve the system

( … )

by successive iterations, starting from the vector ( … ), which is the solution of

( … ).
Section 2.7 Analyticity of the Spectral Projection
2.7.1 [B:35] Let P(t) be the spectral projection of the perturbed matrix A(t) =
A - tH. Give an example which shows that Kato's proof for the expansion of P(t)
cannot be extended to the case in which the Jordan curve Γ defining P(t) encloses
several distinct eigenvalues of the matrix A(0) = A.

Section 2.8 The Rellich-Kato Expansions

2.8.1 [D] Write down the Rellich-Kato expansions for a simple eigenvalue and
for a semi-simple eigenvalue.

Section 2.9 The Rayleigh-Schrödinger Expansions

2.9.1 [B:11] Consider two vectors x and y in ℂⁿ such that
y*x = x*x = 1.
Put Q = xy*. Suppose that ξ is a simple eigenvalue of the matrix
Â = ξQ + (I - Q)A(I - Q).
(a) Show that
B̂ = [(I - Q)(A - ξI)] restricted to {y}⊥: {y}⊥ → {y}⊥ is invertible, and put
Σ = (I - Q)B̂^{-1}(I - Q).

Let ||·|| be an induced matrix norm. Put σ = ||Σ|| and

u = Ax - ξx,
v = A*y - ξ̄y.
Suppose there exist a (≥ σ) and ε such that
(*) |v*Σ^k u| ≤ a^k ||Q|| ε  (k ≥ 1).
Put
η = a²||Q||ε.
(b) Prove that if η < ¼, then there exists a simple eigenvalue λ of A such that …,
which is the only eigenvalue of A in the disk ….
Let P be the spectral projection of A associated with λ and suppose that
y*Px ≠ 0. Prove that there exists an eigenvector φ of A associated with λ and
normalized by y*φ = 1 such that ….
2.9.2 [D] Verify that in the case of a simple eigenvalue the expansions of
Rellich-Kato and Rayleigh-Schrödinger can be made to coincide, and that this
is also true in the case of a semi-simple eigenvalue.
2.9.3 [D] In the proof of Proposition 2.9.1 (page 87) we calculated first Z_k and
then C_k. Suggest a way of first calculating C_k and then Z_k.
2.9.4 [D] Verify the identity satisfied by Π(t)
by using the Rellich-Kato series for P(t) and S(t) and the Rayleigh-Schrödinger
series for Π(t) = X(t)Y*.

Section 2.10 Non-linear Equation and Newton's Method

2.10.1 [A] Let F′(x) be the Fréchet derivative at the point x of an operator F
which is Fréchet-differentiable in the neighbourhood V of a zero x*. Suppose:
(H1) F′(x*) is regular.
(H2) x ↦ F′(x) is uniformly continuous on V.

Prove that there exists ρ > 0 such that, for every x_0 satisfying ||x* − x_0|| < ρ, the
Newton sequence
x_{k+1} = x_k − F′(x_k)^{-1} F(x_k) (k ≥ 0)
satisfies ||x* − x_k|| < ρ and converges to x* superlinearly.

2.10.2 [A] Using the same notation as in Exercise 2.10.1, we now suppose that:
(H1) F′(x*) is regular.
(H2) There exist numbers p and ℓ such that, in a neighbourhood V of x*,
the map x ↦ F′(x) satisfies the inequality
||F′(x) − F′(x*)|| ≤ ℓ||x − x*||^p.
(a) Prove that there exists ρ > 0 such that, for every x_0 satisfying ||x* − x_0|| < ρ,
the sequence
x_{k+1} = x_k − F′(x_k)^{-1} F(x_k) (k ≥ 0)
satisfies ||x_k − x*|| < ρ and
sup_k ||x_{k+1} − x*|| / ||x_k − x*||^{1+p} = c < +∞.
(b) Deduce the quadratic convergence of Newton's method when the derivative
satisfies a Lipschitz condition in the neighbourhood of the required root.
2.10.3 [D] Write down the method (2.10.4) (page 91), taking as the variable
v_k = x_k − x_0.
2.10.4 [B:48] Consider the equation
F(x) = 0
in a finite-dimensional space (B, ||·||) on the following assumptions: there exist
x_0 ∈ B, ℓ > 0, r > 0, m > 0 and c > 0 such that:
(H1) ℓ is a Lipschitz constant of the operator x ↦ F′(x) on the ball
B_r(x_0) = {x ∈ B : ||x − x_0|| ≤ r},
where L(B) is the space of linear operators of B into itself.
(H2) F′(x_0) is regular.
(H3) ||F′(x_0)^{-1}|| ≤ m and ||F′(x_0)^{-1} F(x_0)|| ≤ c.

(H4) mℓc < 0.5.

(H5) m, ℓ and c satisfy the inequality r ≥ ρ, where ρ is defined below.

Define the constants
ρ = (1 − √(1 − 2mℓc)) / (mℓ),
γ = (1 − mℓc − √(1 − 2mℓc)) / (mℓc).
Prove that:
(a) There exists x* ∈ {x ∈ B : ||x − x_0|| ≤ ρ} such that F(x*) = 0.
(b) 0 < γ < 1.
(c) The Newton sequence {x_k} satisfies
||x_k − x_0|| < ρ.
2.10.5 [B:19] Consider the equation (2.10.1) (page 90)
AX − X(Y*AX) = 0,
where Y is of full rank m. Change the basis in such a way that the matrix Y is
replaced by Ŷ = [I_m 0]^T.
(a) Prove that relative to this basis the unknown X is replaced by
X̂ = [I_m R]^T,
since it satisfies the condition Y*X = I_m. Let
A′ = ( A′_11  A′_12
       A′_21  A′_22 )
be the representation of A relative to the new basis.

(b) Show that R satisfies the equation
A′_22 R − R A′_11 − R A′_12 R = −A′_21,
known as Riccati's equation.

2.10.6 [B:19] Consider Riccati's equation
AR − RB = RCR − D
(Exercise 2.10.5). Define the iteration
AR_{k+1} − R_{k+1}B = R_k C R_k − D (k ≥ 0).
Let σ be the least singular value of the operator R ↦ AR − RB and put
κ = ||C|| ||D|| / σ².
(a) Show that if κ < ¼, then the proposed method of iteration converges linearly
towards a solution R which is the only solution in the closed ball with centre
0 and radius
σ(1 − √(1 − 4κ)) / (2||C||).
(b) Show that if κ < 1/12, then Newton's method, applied to Riccati's equation,
converges quadratically towards the unique solution in the closed ball
defined above.

Section 2.11 Modified Methods

2.11.1 [D] Let (x, λ) ∈ C^n × C be the unique solution of the equation
Ax − λx = 0, with a fixed normalization of x,
where λ is a simple eigenvalue of A.
(a) Write out Newton's method, applied to this problem.
(b) Propose a simplified method with fixed gradient.
(c) Examine the convergence of these methods.
2.11.2 [A] Give sufficient conditions for the convergence of the method (2.11.3)
(page 95).
2.11.3 [A] Give sufficient conditions for the convergence of the method (2.11.4)
(page 95). Establish the contraction constant of the map
in a manner similar to Exercise 2.11.2, provided that ||B − B̃|| and ||K|| are
sufficiently small.

2.11.4 [A] Let F be an operator and let x* be a point in its domain such that
F(x*) = 0.
Suppose that F is differentiable in a neighbourhood of x* and that T is a regular
linear operator such that
γ = sup_{||x − x*|| ≤ ρ} ||T − F′(x)|| ≤ ½||T^{-1}||^{-1},
where ||·|| is a vector norm and also the corresponding induced norm for linear
operators, and where ρ is a given positive real number. Prove that if ||x_0 − x*|| < ρ,
then the sequence
x_{k+1} = x_k − T^{-1} F(x_k) (k ≥ 0)
converges to x* linearly.

2.11.5 [D] Let λ be a simple eigenvalue of a matrix A and let u be an eigenvector
of A associated with λ and normalized by e_n^* u = 1. Consider the problem
F(u, λ) = 0, where F(u, λ) = (Au − λu, e_n^* u − 1).
(a) Show that, by means of a permutation, the Fréchet differential of F at (u, λ)
can be written as
( B(λ, u)  *
      0    1 ),
where B(λ, u) denotes the matrix obtained by replacing the last column of
A − λI by −u.
(b) Show that B(λ, u) is singular if the eigenvalue λ is of multiplicity ≥ 2.
(c) Apply Newton's method with fixed slope to this problem.
(d) Extend the preceding results to the case of a double eigenvalue by taking the
equation AU − UB = 0 with the unknowns (U, B), normalized by E^* U = I_2, where
E = (e_{n−1}, e_n) and B = ( a  b
                             c  d ).

2.11.6 [A] Consider the equation
F(x) = 0
as in Exercise 2.10.4. Retain the hypotheses (H1), (H2) and (H3) of that exercise
and add:
(H4) mℓc < 0.25.
(H5) The constants m, ℓ and c satisfy the condition
r ≥ (1 − √(1 − 4mℓc)) / (2mℓ).
Define the numbers
ρ = (1 − √(1 − 4mℓc)) / (2mℓ),
γ = (1 − √(1 − 4mℓc)) / 2.
Prove that:
(a) There exists x* ∈ {x ∈ B : ||x − x_0|| ≤ ρ} such that F(x*) = 0.
(b) 0 < γ < 0.5.
(c) The Newton sequence with fixed slope defined by
x_{k+1} = x_k − F′(x_0)^{-1} F(x_k) (k ≥ 0)
satisfies ||x_k − x_0|| < ρ.
2.11.7 [D] Compare the Rayleigh-Schrödinger iterates for t = 1 with those
defined in (2.11.1) (page 92) with initial basis U. Comment on these results.

Section 2.12 The Local Approximate Inverse and the Method

of Residual Correction
2.12.1 [D] Let λ be a simple eigenvalue of the matrix A and let y be a vector
that is not orthogonal to the eigenspace associated with λ. Propose a method of
residual correction for the solution of the equation
Ax — xy*Ax = 0,
when A is almost diagonal.

2.12.2 [A] Consider the system Ax = b, where A is regular. Let B be a regular

matrix such that
cond_2(BA) ≪ cond_2(A).
Instead of the original system consider the equivalent preconditioned system
BAx = Bb.
Interpret this preconditioning as an approximate inverse.
2.12.3 [A] Study the multigrid method [B:28]. Interpret this method in relation
to the notion of an approximate inverse.

Why Compute Eigenvalues?

...Analytical mechanics is much more than an efficient tool for the solution of dynamical
problems that we encounter in physics and engineering. There is hardly any other branch
of the mathematical sciences in which abstract mathematical speculation and concrete
physical evidence go so beautifully together and complement each other so perfectly.
... There is a tremendous treasure of philosophical meaning behind the great theories
of Euler and Lagrange, and of Hamilton and Jacobi ... a source of the greatest intellectual
enjoyment to every mathematically-minded person.

Cornelius Lanczos, from the Preface to The Variational Principles of Mechanics

(Toronto University Press, 1962)

The eigenvalues of matrices or linear operators play a part in a very large number
of applications, both theoretical and practical. We shall try to convey an idea of
the extent of applications by citing examples deliberately chosen from very diverse
disciplines: they range from mathematics to chemistry and to the dynamics of
structures; they touch on economics.
While the theoretical applications are fundamental, the industrial applications
are no less important. We mention only the accident of the suspension bridge at
Tacoma, in the state of Washington on the West Coast of the United States. This
bridge, with a span of 700 m, collapsed in 1940 under the effect of aeroelastic
vibrations, only four months after it was brought into service. At the moment of
collapse it showed a torsion of 45° against the horizontal in both directions under
the effect of a 70 km per hour wind.


We propose to study the stability of systems of linear differential equations and
of difference equations.

*Leonhard Euler, 1707-1783, born at Basle, died in St Petersburg.
†Joseph Louis Lagrange, 1736-1813, born at Turin, died in Paris.
‡Sir William Rowan Hamilton, 1805-1865, born in Dublin, died near Dublin.

3.1.1 Linear Differential Equations

Consider the system of linear differential equations of the first order:

u(0) = u_0, du/dt = Au (t ≥ 0), (3.1.1)

where u is a vector in R^n depending on the time t and where A is a constant n
by n matrix.
Let A = XJX^{-1} be the spectral decomposition of A. If we put
v = X^{-1}u,
then (3.1.1) becomes
dv/dt = Jv, (3.1.2)
where J is the Jordan form of A; let ℓ be the size of the greatest block.
The reader will verify (Exercise 3.1.1) that the solution of (3.1.1) for which
u(0) = u_0 is given by
u(t) = e^{At}u_0 = Xe^{Jt}X^{-1}u_0, (3.1.3)
where e^{Jt} is an upper-triangular matrix whose elements are of the form t^j e^{λ_i t}
[λ_i ∈ sp(A)] with 0 ≤ j < ℓ_i, the index of λ_i being ℓ_i.

On examining the solution (3.1.3) as t → ∞, we can show that:
(a) The system (3.1.1) is stable and u(t) → 0 if and only if
max_i Re λ_i < 0.
(b) The system is unstable and u(t) is unbounded if there exists an eigenvalue λ
such that Re λ > 0.
(c) When max_i Re λ_i = 0, the solution u(t) is bounded or unbounded according
to whether the eigenvalues for which Re λ_i = 0 are semi-simple or include at
least one defective eigenvalue (see Exercise 3.1.3).
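The criterion (a)-(c) is easy to check numerically. A minimal sketch (the matrix A and the vector u_0 below are illustrative choices, not taken from the text):

```python
import numpy as np

# A stable matrix: every eigenvalue has negative real part.
A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])
w, X = np.linalg.eig(A)             # A = X diag(w) X^{-1}; this A is diagonalizable
assert w.real.max() < 0

u0 = np.array([1.0, 1.0])

def u(t):
    # u(t) = X e^{Jt} X^{-1} u0, where J = diag(w) for a diagonalizable A
    return (X @ np.diag(np.exp(w * t)) @ np.linalg.inv(X) @ u0).real

# max_i Re(lambda_i) < 0, so the solution decays towards 0.
print(np.linalg.norm(u(0.1)), np.linalg.norm(u(10.0)))
```

For a defective A the same formula holds with the polynomial factors t^j e^{λ_i t} of (3.1.3) replacing the pure exponentials.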

Remark The equation (3.1.1) models a diffusion problem when the time is regarded
as a continuous variable. If one considers only discrete values of the time, then the
formulation involves a linear recurrence equation.

3.1.2 Difference Equations

Consider the linear recurrence system
u_0 given, u_k = Au_{k-1} (k ≥ 1). (3.1.4)

It is clear that u_k = A^k u_0 = XJ^kX^{-1}u_0, where J^k is an upper-triangular matrix
whose elements are of the form binomial coefficients times λ_i^{k-j} with
0 ≤ j < ℓ_i, λ_i ∈ sp(A), where ℓ_i is the index of λ_i (see Exercise 3.1.6).
On examining u_k as k → ∞ we can show that:
(a) The system (3.1.4) is stable and u_k → 0 if and only if
ρ(A) = max_i |λ_i| < 1.
(b) The system is unstable and u_k is unbounded when there exists an eigenvalue
λ such that |λ| > 1.
(c) When ρ(A) = 1, the solution u_k is bounded or unbounded as k → ∞, according
as the eigenvalues λ_j for which |λ_j| = 1 are semi-simple or include
at least one defective eigenvalue.
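The discrete criterion can be checked in the same way; a small sketch with a randomly generated matrix rescaled so that ρ(A) = 0.9 < 1:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A *= 0.9 / np.abs(np.linalg.eigvals(A)).max()   # force spectral radius 0.9

u = np.ones(4)
for _ in range(200):        # iterate u_k = A u_{k-1}
    u = A @ u
# rho(A) < 1, so u_k -> 0.
print(np.linalg.norm(u))
```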
Example 3.1.1 Consider the Fibonacci* sequence 0, 1, 1, 2, 3, 5, 8, 13, ...:
f_0 = 0, f_1 = 1, f_k = f_{k-1} + f_{k-2} (k ≥ 2). (3.1.5)

Put w_k = (f_{k+1}, f_k)^T. Then (3.1.5) becomes
w_0 = (1, 0)^T, w_k = ( 1 1 ; 1 0 ) w_{k-1} (k ≥ 1).
The eigenvalues of the matrix are
(1 + √5)/2 and (1 − √5)/2.
When k tends to infinity, f_{k+1}/f_k tends to (1 + √5)/2, the number of the golden
section. The interested reader will find references to the remarkable properties of
the Fibonacci numbers in Strang (1980, pp. 196-8).
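The powers of the 2 × 2 matrix generate the sequence, so f_k can be computed by repeated multiplication; a small sketch in exact integer arithmetic:

```python
def fib_pair(k):
    """Apply w_j = F w_{j-1}, F = [[1, 1], [1, 0]], k times to w_0 = (f_1, f_0) = (1, 0)."""
    a, b = 1, 0              # (f_1, f_0)
    for _ in range(k):
        a, b = a + b, a      # one multiplication by F
    return a, b              # (f_{k+1}, f_k)

a, b = fib_pair(10)          # (f_11, f_10) = (89, 55)
golden = (1 + 5 ** 0.5) / 2
print(a / b - golden)        # the ratio approaches the golden section
```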

Example 3.1.2 Consider the method for calculating √2 proposed by Theon of
Smyrna (second century A.D.). Starting from (1, 1), iterate the transformation
x ↦ x + 2y,
y ↦ x + y.

* Leonardo Fibonacci, about 1170-1250, born and died at Pisa.


It will be found that x_k²/y_k² → 2. The procedure can be formalized as follows:
( x_k ; y_k )^T = ( 1 2 ; 1 1 ) ( x_{k-1} ; y_{k-1} )^T (k ≥ 1).
The reader should verify that x_k/y_k → √2 and that the values of this quotient are
alternately greater and less than √2.
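Theon's iteration is short enough to verify directly; the sketch below also checks the alternation of the quotients around √2:

```python
def theon(k):
    """k steps of the map (x, y) -> (x + 2y, x + y), starting from (1, 1)."""
    x, y = 1, 1
    ratios = []
    for _ in range(k):
        x, y = x + 2 * y, x + y
        ratios.append(x / y)     # 3/2, 7/5, 17/12, 41/29, ...
    return ratios

r = theon(8)
s = 2 ** 0.5
signs = [(q - s) > 0 for q in r]   # successive quotients straddle sqrt(2)
print(r[-1], signs)
```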
According to whether a phenomenon is modelled by a system of differential
equations (continuous time) or by a difference system (discrete time), the stability
of the system depends, respectively, on the real parts or on the moduli of the
eigenvalues of the matrix describing the system.


A set of random variables X_t, where t ∈ T, is called a stochastic process. For
example, X_t might represent the number of persons queuing at a Post Office
counter at the instant t. The theory of stochastic processes is used in the study
of queuing problems (telephone exchanges, road traffic, information counters and so
on). A stochastic process without memory is called a Markov* chain:
P(X_k = j | X_l = i_l, l = 0, ..., k−1) = P(X_k = j | X_{k−1} = i_{k−1}),
where P(E) denotes the probability of the event E.
The following terminology is often used in relation to Markov chains. The
system which evolves in time (taking the discrete values k = 0, 1, ...) is said to be
in the state j at the instant k if X_k = j. The probabilities
p_ij^{(k)} = P(X_k = j | X_{k−1} = i)
are called transition probabilities. For a homogeneous chain, p_ij^{(k)} is independent
of k. A homogeneous chain which can take n states is associated with the transition
matrix P of order n, given by
P = (p_ij), p_ij ≥ 0, Σ_j p_ij = 1.

Such a matrix is called a stochastic matrix.

Example 3.2.1 Random walk on a triangular grid of side n + 1 (see Figure 3.2.1).
A particle moves at random on the grid by jumping from one point to one of
its (at most four) neighbouring points (N-S-E-W).

*Andrei Andreivitch Markov, 1856-1922, born at Ryazan, died at Petrograd.


Figure 3.2.1 The triangular grid for n = 2, with the six points numbered
x(1), ..., x(6) and moves to the four neighbouring points (N-S-E-W)

(a) The probability of passing from (i, j) to (i − 1, j) or to (i, j − 1) is
p_d(i, j) = ¼.
This probability is doubled when i = 0 or j = 0.
(b) The probability of passing from (i, j) to (i + 1, j) or to (i, j + 1) is
p_u(i, j) = ¼.
We observe that if i + j = n, then p_u(i, j) = 0.

This random walk models Brownian movement in the plane. The reader
can verify that, when n = 2 and the points of the grid are numbered in accordance
with Figure 3.2.1, P is a 6 × 6 stochastic matrix whose entries are 0, ¼, ½ and 1.
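The transition matrix for small n can be generated mechanically. The sketch below chooses uniformly among the neighbours that remain on the grid, which is one way of realizing the boundary behaviour described above; the exact boundary convention of the text may differ in detail:

```python
import numpy as np

def walk_matrix(n):
    """Random walk on the triangular grid {(i, j): i, j >= 0, i + j <= n},
    jumping uniformly to one of the (at most four) N-S-E-W neighbours
    that stay on the grid (an illustrative rule)."""
    pts = [(i, j) for j in range(n + 1) for i in range(n + 1 - j)]
    idx = {p: k for k, p in enumerate(pts)}
    P = np.zeros((len(pts), len(pts)))
    for p, k in idx.items():
        i, j = p
        nbrs = [q for q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)) if q in idx]
        for q in nbrs:
            P[k, idx[q]] = 1.0 / len(nbrs)
    return P

P = walk_matrix(2)
print(P.shape)             # 6 grid points when n = 2
```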

Example 3.2.2 A renewal process, due to breakdown of a piece of equipment

(light bulb, electronic component, television set,...), can be associated with a
Markov chain if the life span of the equipment is a random variable that is
independent of the life span of the preceding equipment and has the same
probability distribution. The variables Ak ( = age of the equipment at the instant
k) form a Markov chain.
The evolution in time of a system that is associated with a homogeneous
Markov chain is characterized by the transition matrix P and the initial
distribution.

We put
q_j^{(k)} = P(X_k = j) (k = 0, 1, ...).
We remark that
Σ_j q_j^{(k)} = ||q^{(k)}||_1 = 1.

Proposition 3.2.1
q^{(k)} = q^{(0)} P^k. (3.2.1)

PROOF We have
q_j^{(k)} = Σ_{i=1}^n P(X_k = j | X_{k-1} = i) P(X_{k-1} = i) = Σ_i q_i^{(k-1)} p_ij,
that is, q^{(k)} = q^{(k-1)} P, and (3.2.1) follows by induction on k.

The asymptotic behaviour of a homogeneous Markov chain is determined by
the stationary distribution (if it exists), say
π = lim_{k→∞} q^{(k)},
which satisfies the conditions
π = πP, ||π||_1 = 1, π_j ≥ 0. (3.2.2)
Proposition 3.2.2 If the matrix P is irreducible, there exists a stationary distribution
π satisfying (3.2.2).

PROOF Since P is stochastic, one of its eigenvalues is equal to unity, and π^T is
the corresponding left eigenvector. When P is irreducible, the Perron-Frobenius
theorem 1.10.1 asserts that unity is a simple eigenvalue and that all components
of π are positive.
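For a concrete irreducible P, the stationary distribution can be obtained by iterating (3.2.1); a minimal sketch (the matrix is an illustrative choice):

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.5],      # an irreducible stochastic matrix
              [0.3, 0.0, 0.7],
              [0.4, 0.6, 0.0]])

q = np.array([1.0, 0.0, 0.0])       # initial distribution q^(0)
for _ in range(200):                # q^(k) = q^(k-1) P
    q = q @ P
pi = q
print(pi)                           # the stationary distribution
```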
The knowledge of the stationary distribution π is useful for studying the
queuing problem modelled by the above Markov chain. In computer science
(communication systems) and in the social sciences, the number of states can be
very large (n > 10^5) and the matrix P is non-symmetric. Moreover, for very large
systems, there is often a large number of eigenvalues close to unity. The method

of iterative aggregation/disaggregation enables us to accelerate the iteration of

the powers represented by (3.2.1) (see Exercises 3.2.2 and 3.2.3).


We present here a formal account (Morishima, 1971) of what is known as the
Marx*-von Neumann† theory in economics. The models of Leontiev (1941) and
von Neumann (1945-6) are studied in Exercises 3.3.5 and 3.3.8.
Consider a productive economy which is divided into n branches in such a
way that each branch produces only one kind of article and, conversely, each
kind of article is produced by only one branch. We denote by a_ij the quantity of
article i required for the production of one unit by branch j. The matrix A = (a_ij),
where a_ij ≥ 0, is called the matrix of technical coefficients. For a production
x = (x_1, ..., x_n)^T
the vector Ax represents the quantities necessary for this production. Hence
y = x — Ax
represents the net production.
We shall briefly present the linear production model due to Leontiev. The
hypothesis of linearity has two consequences; there are no possible substitutions
between the products consumed in the production and there are constant returns
to scale. Exercise 3.3.1 deals with the choice of units in the definition of A.
We shall now take into account work and wages. In order to produce one unit
of the article j, the branch j uses ℓ_j workers. We put
ℓ^T = (ℓ_1, ..., ℓ_n).
We suppose that the salary w is the same in each branch and that it is entirely
devoted to consumption: each worker uses up the quantity d_i of the article i. We put
d^T = (d_1, ..., d_n).
The total consumption of the article i for the production of the article j is therefore
a_ij + d_i ℓ_j. Hence we have the sociotechnical matrix
B = A + dℓ^T.
Additional assumptions of the model are that the consumptions of the workers
are the same at whichever branch they are employed, that there are no luxury
articles, that there is no fixed capital, that all the profit is accumulated and finally
that there exists a price system that renders each branch profitable.

*Karl Marx, 1818-1883, born at Trier, died in London.
†John von Neumann, 1903-1957, born in Budapest, died in Washington.

In the framework of this model we seek to know

(a) whether there exists a price system p that ensures an equal rate of profit r for
each branch, (3.3.1)
(b) whether there exists a structure of production x that ensures the same rate
of growth τ for each branch (balanced growth). (3.3.2)

Proposition 3.3.1 If the sociotechnical matrix B is irreducible, there exist p, x and

τ = r that satisfy the conditions (3.3.1) and (3.3.2) and are such that
pB = (1/(1 + r))p, Bx = (1/(1 + τ))x.


PROOF
(a) Let p = (p_1, ..., p_n) be a price system. The production cost of one unit of j is
c_j = Σ_i a_ij p_i + wℓ_j = Σ_i (a_ij + d_i ℓ_j) p_i = Σ_i b_ij p_i,
since the salary w = Σ_i d_i p_i is spent entirely on consumption.
It is required that p_j = (1 + r)c_j. Hence p satisfies (3.3.1) if and only if
pB = (1/(1 + r))p.

Since B is an irreducible matrix with non-negative coefficients, it possesses
a simple eigenvalue ρ > 0 such that
pB = ρp, p > 0,
and we must have
ρ = 1/(1 + r).
Moreover, ρ < 1, because by hypothesis all branches are profitable (Exercise
3.3.2). Hence r = 1/ρ − 1 > 0.
(b) Let x = (x_1, ..., x_n)^T be the production structure. The total quantity of the
article i required for the production x is given by
c_i = Σ_j (a_ij + d_i ℓ_j) x_j = Σ_j b_ij x_j.
In order that x_i = (1 + τ)c_i, it is necessary and sufficient that Bx = ρx, x > 0
and ρ = (1 + τ)^{-1}, whence τ = r.
The rate of growth τ is equal to the rate of profit r. This property is known
as the golden rule of growth. We remark that the quantities p, x and τ = r are
obtained from the Perron eigenelements of the matrix B.
The models for economic planning involve the matrix of technical coefficients,
which can be calculated from accounting tables. When we are concerned with

forecasting models comprising a group of countries or the whole world, then the
matrices involved are structured in blocks, but they are not symmetric and are
of gigantic size. To give an indication, the input-output array for the years 1970
to 1979 supplied by INSEE for France corresponds to a division into about 600
branches of activity. Aggregated versions exist comprising 91, 35 and 15 branches.
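Proposition 3.3.1 reduces the computation of r, p and x to a Perron eigenvalue problem for B. A minimal numerical sketch (A, ℓ and d below are made-up illustrative data, not from the text):

```python
import numpy as np

A = np.array([[0.2, 0.1],           # technical coefficients (illustrative)
              [0.3, 0.2]])
l = np.array([0.5, 0.4])            # labour inputs per unit produced
d = np.array([0.3, 0.2])            # workers' consumption of each article

B = A + np.outer(d, l)              # sociotechnical matrix B = A + d l^T

vals, vecs = np.linalg.eig(B.T)     # left eigenvectors of B
k = np.argmax(vals.real)
rho = vals.real[k]                  # Perron root, rho = 1/(1 + r)
p = np.abs(vecs[:, k].real)         # Perron price vector, p B = rho p
r = 1.0 / rho - 1.0                 # common rate of profit (= rate of growth)
print(rho, r)
```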


We shall describe the analysis into principal components of a cluster of points.
Other methods of factorial analysis are described in Exercises 3.4.5 to 3.4.7.
We wish to analyse a set of data which are presented in the form of n vectors
{s_j} in R^k carrying masses a_1, ..., a_n respectively.
The space is endowed with the norm defined by a positive-definite matrix B of
order k. Put
s̄ = Σ_j a_j s_j and x_j = s_j − s̄.

The matrix
X = [x_1, ..., x_n]
of order k by n represents the data referred to the mass centre. The principal
components analysis of the cluster of points consists in projecting the points on to
the plane which minimizes their dispersion in R^k in the sense of the norm defined
by B. We use the notation
A = diag(a_i).
The method consists in calculating the two greatest eigenvalues of the matrix
U = XAX^T B, together with the corresponding two eigenvectors. In general, the matrix U is
not symmetric.

Lemma 3.4.1 (Barra, 1981) Let X, A and B be three matrices of orders k × n,
n × n and k × k respectively, where A and B are symmetric positive definite. We
suppose that k ≤ n. The matrices U = XAX^T B and V = X^T BXA of orders k and
n respectively have s (≤ k) positive eigenvalues in common; they are the non-zero
eigenvalues of the positive semi-definite matrix
W = B^{1/2} XAX^T B^{1/2}
of order k.

PROOF We have W = B^{1/2} U B^{-1/2}. Let u ≠ 0 be an eigenvector of U associated
with λ ≠ 0. Then
Uu = XAX^T Bu = λu.
Put v = X^T Bu; then
Vv = X^T BXAv = λv and XAv = λu ≠ 0.
Hence v ≠ 0 and λ is an eigenvalue of V. Since W is similar to U,
λ is clearly an eigenvalue of W; therefore λ > 0.
The eigenvectors are connected by the relations
u = B^{-1/2}w, v = X^T B^{1/2}w, Ww = λw.
If w^T w = 1, then u^T Bu = 1 and v^T Av = λ. Let
W = E^T E, where E = A^{1/2} X^T B^{1/2}.
On the other hand,
Z = EE^T = A^{1/2} X^T BX A^{1/2} = A^{1/2} V A^{-1/2}.
Hence W and Z are of the same rank, s (≤ k); consequently, so are U and V.
The eigenelements of U can be calculated with the help of the eigenelements of
the symmetric matrix W of order k (≤ n). It is shown in Exercises 3.4.4 to 3.4.7 that the
methods of correspondence analysis, canonical correlation and discriminant
analysis revert to the same pattern of calculation with appropriate choices of X,
A and B.
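Lemma 3.4.1 is easy to confirm numerically; a sketch with random data (the sizes and matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 5
X = rng.standard_normal((k, n))                 # full row rank k, so s = k here
Ma = rng.standard_normal((n, n)); A = Ma @ Ma.T + n * np.eye(n)   # SPD, order n
Mb = rng.standard_normal((k, k)); B = Mb @ Mb.T + k * np.eye(k)   # SPD, order k

U = X @ A @ X.T @ B                             # order k
V = X.T @ B @ X @ A                             # order n

eU = np.sort(np.linalg.eigvals(U).real)
eV = np.sort(np.linalg.eigvals(V).real)[-k:]    # the other n - k eigenvalues are 0
print(eU, eV)                                   # the common positive eigenvalues
```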


The structural conception of industrial machines makes increasing use of
mathematical models for the structural behaviour of the different mechanical
constituents. The model is then numerically approximated on a computer.
For an analysis of rotating machines we distinguish the rotating parts, the
fixed parts and the connection devices. Each of these types of constituents is
associated with particular physical phenomena. The fixed parts are studied in
accordance with the principles of structure analysis; the presence of rotating parts
makes it necessary to take account of the gyroscopic effect; similarly, the
connecting pieces may introduce circulatory forces (Geradin and Kill, 1984). Thus we

arrive at an equation of the form

M(d²u/dt²) + B(du/dt) + Ku = f, (3.5.1)
where M is the mass matrix of the structure, which is symmetric and positive
(semi-) definite, K is the stiffness matrix and B = G + C is the matrix that takes
account of the gyroscopic effect G and the damping C. The matrices K and B
need not be symmetric; / represents an exterior force.

3.5.1 Free Undamped Vibrations

If B is neglected, equation (3.5.1) reduces to
M(d²u/dt²) + Ku = 0, (3.5.2)
where the stiffness matrix K is, in general, symmetric and positive (semi-) definite.
If we seek a solution u of the form
u(t) = e^{iωt}x,
we obtain the generalized eigenvalue problem
Kx = ω²Mx.
In general we want to find the least eigenvalues, or those that lie in a given
interval, in order to determine whether a known oscillatory perturbation force
can create a resonance.
Most frequently the matrices K and M are positive definite. Nevertheless, it
can happen that M and K are singular. For example, a structure that admits free
rigid movements (an aeroplane or a ship) has a stiffness matrix of rank n — r,
where r is the number of independent rigid movements.
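A sketch of the computation Kx = ω²Mx for a small illustrative mass-stiffness pair (the matrices are not taken from the text):

```python
import numpy as np
from scipy.linalg import eigh

M = np.diag([2.0, 1.0, 1.0])            # mass matrix (symmetric positive definite)
K = np.array([[ 2.0, -1.0,  0.0],       # stiffness matrix of a 3-mass chain
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])

w2, Xv = eigh(K, M)                     # generalized problem K x = w^2 M x
omega = np.sqrt(w2)                     # natural frequencies, smallest first
print(omega)
```

In practice one asks an iterative solver for only the least eigenvalues, as the text remarks.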

3.5.2 The Quadratic Problem of Eigenvalues

In the general case we seek a solution u of the form u(t) = e^{μt}x, so that
(μ²M + μB + K)x = 0. (3.5.3)
We can reduce this quadratic eigenvalue problem to a classical generalized
eigenvalue problem of order 2n, namely
Pz = λQz,
where we put λ = 1/μ and, for instance,
z = (μx, x)^T, P = ( 0 M ; M B ), Q = ( M 0 ; 0 −K ).

The matrices P and Q are symmetric if B and K are symmetric. However, in

general, the eigenvalues λ (and hence μ) of the problem are complex numbers.
The size of the problem obtained by finite element approximation of structures
from civil engineering or aeronautics reaches several hundred or even several thousand degrees
of freedom. Engineers have devised various techniques of reducing the size of the
problem by static condensation; these are described in Exercise 3.5.8.
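A sketch of the linearization above for a small damped system (the matrices are illustrative; the block form follows the reduction just described):

```python
import numpy as np
from scipy.linalg import eig

M = np.eye(2)
Bd = np.array([[0.2, 0.0],          # damping/gyroscopic matrix
               [0.0, 0.1]])
K = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])
n = 2

P = np.block([[np.zeros((n, n)), M],
              [M, Bd]])
Q = np.block([[M, np.zeros((n, n))],
              [np.zeros((n, n)), -K]])

lam, Z = eig(P, Q)                  # P z = lambda Q z, with lambda = 1/mu
mu = 1.0 / lam
x = Z[n:, 0]                        # each z = (mu x, x)
res = mu[0] ** 2 * (M @ x) + mu[0] * (Bd @ x) + K @ x
print(np.abs(res).max())            # quadratic residual, essentially zero
```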


3.6.1 Quantum Chemistry

In quantum theory the properties of particles, such as electrons, atoms or molecules,
in the stationary state, are described by the wave function ψ; this is a
solution of the Schrödinger equation
Hψ = Eψ,
where H is the energy operator and E is the energy of the particle. The operator
H is called the Hamiltonian; for a single particle it can be written as
H = −(h²/2m)Δ + q,
where h is Planck's constant, m is the mass of the particle and q is the potential
energy (depending on the spatial variables). Starting from a configuration with
basis {χ_i}_1^N, we have the representation
ψ = Σ_{i=1}^N c_i χ_i.

If we confine ourselves to the first n vectors {χ_i}_1^n, we obtain the generalized
eigenvalue problem
Hc = ESc, c = (c_i)_1^n, (3.6.1)
where
H = ((Hχ_j, χ_i)) and S = ((χ_j, χ_i))
(i, j = 1, ..., n). This is the method of configuration interaction, which is more
precise than the Hartree-Fock approximation (see below), but which leads to
very large problems:
(a) When the chosen χ_i are non-orthogonal, the size of the problem is usually
less than 1000; but the matrix S is ill-conditioned (the basis vectors are almost
linearly dependent).
(b) When the χ_i are chosen to be orthogonal, we obtain the classical problem
Hc = Ec, whose dimension can vary from 10³ to 10⁶ (or more). The matrix
H is usually sparse and diagonally dominant but with an unstructured distribution
of the non-zero elements.

Davidson (1983) has proposed a method of projection on a Krylov subspace;

this differs from Lanczos's method, which we shall present in Chapter 6 (see
Exercise 6.3.17).
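A minimal sketch of a Davidson-type iteration for the smallest eigenvalue of a symmetric, diagonally dominant matrix; the diagonal preconditioner is the characteristic ingredient. This is an illustration of the idea, not Davidson's full algorithm:

```python
import numpy as np

def davidson_smallest(A, tol=1e-8, max_iter=60):
    """Smallest eigenpair of a symmetric, diagonally dominant A by
    subspace expansion with Davidson's diagonal correction."""
    n = A.shape[0]
    d = np.diag(A)
    V = np.zeros((n, 0))
    t = np.zeros(n)
    t[np.argmin(d)] = 1.0                  # start at the smallest diagonal entry
    theta, x = d.min(), t
    for _ in range(max_iter):
        for _ in range(2):                 # re-orthogonalize against the basis
            t = t - V @ (V.T @ t)
        norm = np.linalg.norm(t)
        if norm < 1e-10:
            break
        V = np.hstack([V, (t / norm)[:, None]])
        H = V.T @ A @ V                    # Rayleigh-Ritz on the subspace
        w, S = np.linalg.eigh(H)
        theta, x = w[0], V @ S[:, 0]
        r = A @ x - theta * x              # residual
        if np.linalg.norm(r) < tol:
            break
        denom = d - theta                  # Davidson's diagonal preconditioner
        denom[np.abs(denom) < 1e-10] = 1e-10
        t = -r / denom                     # new search direction
    return theta, x

rng = np.random.default_rng(0)
n = 100
R = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * (R + R.T)
lam, x = davidson_smallest(A)
print(lam)
```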
In the approximation of Hartree-Fock we are led to the problem Fc = ESc
whose size is in general less than 300. We wish to determine all the eigenvalues;
the problem should be solved iteratively, as the matrices change little from one
iteration to the next. The solution obtained with one iteration allows us to define
an orthonormal basis with respect to which the matrix F is almost diagonal.

3.6.2 Spectra of Graphs

A very different approach for molecules of hydrocarbons has been proposed by
Hückel (1931): he uses the graph associated with a molecule. In this graph the
vertices are the carbon atoms and the edges are the bonds between the σ-electrons
of the two atoms under consideration. We associate with the graph a symmetric
matrix A = (a_ij), where a_ij represents the number of edges joining i to j.
According to Hückel we can approximate H by αI + βA and S by I + σA,
where α, β and σ are supposed to be known constants. The equation (3.6.1)
then reduces to
Ac = ((E − α)/(β − Eσ))c.
Thus E and c can be deduced from the eigenelements of the matrix A, which
corresponds to the graph of bonds between the carbon atoms.
The interest of this approach lies in the fact that the size of A is considerably
reduced. However, although the results obtained by this method are qualitatively
good, they are far less precise than those obtained by the method of the preceding
subsection.
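For benzene the Hückel graph is the 6-cycle, whose adjacency eigenvalues are known exactly; a small sketch (taking σ = 0, i.e. an orthogonal basis, is a simplifying assumption made here, in which case the orbital energies are E = α + βλ):

```python
import numpy as np

n = 6                                   # six carbon atoms on a ring
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

lam = np.sort(np.linalg.eigvalsh(A))
print(lam)                              # 2 cos(2 pi k / 6): -2, -1, -1, 1, 1, 2
```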

3.6.3 Chemical Reactions

A very active area of research is concerned with the spontaneous emergence of
space-time schemes of organization in chemical and biochemical reactions. The
understanding of these phenomena of self-organization is a fundamental stage
in the study of morphogenesis in open and non-linear biological systems. A
classical example of a chemical reaction that displays this phenomenon of self-
organization is the reaction of Belousov-Zhabotinski (Nicolis and Prigogine,
1977): a homogeneous chemical mixture, when left at rest at constant temperature,
can organize itself spontaneously into spirals.
We present here a model of a trimolecular chemical reaction known as the
Brusselator; this is one of the simplest models possessing the property of self-organization.
We suppose that the reaction takes place in a test tube. Let r be the space variable,
0 ≤ r ≤ 1. The concentrations x(t, r) and y(t, r) with diffusion coefficients D_1 and
D_2 satisfy the equations

∂x/∂t = D_1(∂²x/∂r²) + A − (B + 1)x + x²y,
∂y/∂t = D_2(∂²y/∂r²) + Bx − x²y, (3.6.2)

with the initial conditions x(0, r) = x_0(r), y(0, r) = y_0(r) when 0 < r < 1, and the
Dirichlet boundary conditions
x(t, 0) = x(t, 1) = A,
y(t, 0) = y(t, 1) = B/A.

The system (3.6.2) has the trivial stationary solution x = A, y = B/A. The linear
stability of (3.6.2) around the equilibrium solution is studied by putting
x = A + ξ, y = B/A + η and linearizing with respect to ξ and η.
The stability is therefore related to that of the Jacobian of the right-hand side of
(3.6.2), evaluated at the equilibrium solution.
Let J be the corresponding linear operator. There exists a stable periodic solution if the
eigenvalues of J with largest real part are purely imaginary (and semi-simple). The
reader will verify that

J = ( D_1(∂²/∂r²) + B − 1        A²
      −B                  D_2(∂²/∂r²) − A² ).

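On the Dirichlet modes sin(nπr) the operator ∂²/∂r² acts as multiplication by −n²π², so the spectrum of J can be scanned mode by mode; a sketch (the parameter values are illustrative):

```python
import numpy as np

def max_growth_rate(A, B, D1, D2, modes=50):
    """Largest real part of the eigenvalues of the Brusselator Jacobian J
    over the Dirichlet modes sin(m pi r), where d^2/dr^2 -> -m^2 pi^2."""
    worst = -np.inf
    for m in range(1, modes + 1):
        k2 = (m * np.pi) ** 2
        J = np.array([[-D1 * k2 + B - 1.0, A * A],
                      [-B, -D2 * k2 - A * A]])
        worst = max(worst, np.linalg.eigvals(J).real.max())
    return worst

print(max_growth_rate(A=1.0, B=1.5, D1=0.01, D2=0.01))   # negative: stable
print(max_growth_rate(A=1.0, B=3.0, D1=0.01, D2=0.01))   # positive: unstable
```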

The examples that we have given so far involve only differential or partial
differential equations. However, most equations of mathematical physics possess
an integral formulation, for example the problem of the Laplacian† with the help
of the Green‡ function (Exercise 3.7.1).
A simple example of an integral (Fredholm*) operator is the following:
K: x(t) ↦ (Kx)(t) = ∫_0^1 k(t, s)x(s) ds (0 ≤ t ≤ 1),

*Erik Ivar Fredholm, 1866-1927, born at Stockholm, died at Mörby.
†After Count Pierre Simon de Laplace, 1749-1827, born at Beaumont-en-Auge, died in Paris.
‡George Green, 1793-1841, born and died at Nottingham.

and the corresponding eigenvalue problem is given by the equation

(Kx)(t) = ∫_0^1 k(t, s)x(s) ds = λx(t) (0 ≤ t ≤ 1). (3.7.1)

Within an appropriate framework of functional analysis it is easy to verify that
if (λ, x) are eigenelements of an integral operator, then (1/λ, x) are eigenelements
of the associated differential problem. Under appropriate conditions the inverse
of an elliptic differential operator is a compact integral operator (see Chatelin,
1983, Ch. 4).
We shall describe a method of approximation of (3.7.1) by approximate quadratures.
This method is very often used and is known as the method of Nyström (1930). We
consider a formula for approximate quadrature on [0, 1], defined at points
0 < s_i^{(n)} ≤ 1 with weights w_i^{(n)} (i = 1, ..., n). For the sake of simplicity the superscript
(n) will henceforth be omitted. Equation (3.7.1) is approximated by
Σ_{j=1}^n w_j k(t, s_j)x_n(s_j) = λ_n x_n(t) (0 ≤ t ≤ 1). (3.7.2)
The scalar λ_n and the vector x_n = [x_n(s_j)]_1^n are calculated by discretizing the
variable t at the same points t_i = s_i. We obtain
Σ_{j=1}^n w_j k(t_i, s_j)x_n(s_j) = λ_n x_n(t_i) (i = 1, ..., n).

Let A_n be the matrix with elements w_j k(t_i, s_j) (i, j = 1, ..., n). Then λ_n and x_n are
solutions of the matrix eigenvalue problem
A_n x_n = λ_n x_n.
The value of x_n(t) when t ≠ t_i is obtained by substituting the value of x_n in
equation (3.7.2), on condition that λ_n ≠ 0:

x_n(t) = (1/λ_n) Σ_{j=1}^n w_j k(t, s_j)x_n(s_j). (3.7.3)

The formula (3.7.3), which enables us to determine the value of x_n at an arbitrary
point t, is known under the name of Nyström's natural interpolation.
Under appropriate assumptions on K and on the formula of approximate
quadratures, it can be shown that (λ_n, x_n) tends to the solution (λ, x) of (3.7.1) (see
Chatelin, 1983, for a rigorous treatment).
The reader will notice that, in contrast to what takes place in the discretization
of differential equations, the matrix An obtained here is dense. This computational
difficulty can be surmounted by keeping n of modest size and using the techniques
of iterative refinement (see Chatelin, 1984a).
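A sketch of Nyström's method for the kernel k(t, s) = min(t, s), whose exact largest eigenvalue is 4/π²; the kernel and the quadrature rule are illustrative choices, not from the text:

```python
import numpy as np

def nystrom_largest(n):
    """Largest eigenvalue of A_n for k(t, s) = min(t, s) on [0, 1]."""
    s, w = np.polynomial.legendre.leggauss(n)   # Gauss-Legendre rule on [-1, 1]
    s = 0.5 * (s + 1.0)                         # nodes mapped to [0, 1]
    w = 0.5 * w
    K = np.minimum.outer(s, s)                  # k(t_i, s_j) with t_i = s_i
    An = K * w                                  # elements w_j k(t_i, s_j)
    return np.sort(np.linalg.eigvals(An).real)[-1]

lam = nystrom_largest(40)
print(lam, 4.0 / np.pi ** 2)                    # approximation vs exact value
```

Note that A_n is dense, in agreement with the remark above.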
This survey of different situations involving the calculation of eigenvalues
shows that it is required either to compute them all or else only some of them,
in particular those that lie in a given region of the plane (interval, half-plane of

positive real parts and so on). The methods presented in Chapter 5 (dense
matrices of moderate size) and those of Chapters 6 and 7 (large sparse matrices)
will enable us to carry out these numerical calculations.


In the preceding pages we have presented a selection of practical applications
and theories of eigenvalues. There are many other industrial and scientific
applications that require the calculation of eigenvalues. Without claiming to be
exhaustive we mention the following:
(a) Network analysis of the distribution of electric power (Erisman, Neves and
Dwarakanath, 1980).
(b) Plasma physics (Rappaz, 1979).
(c) The physics of nuclear reactors (Wachpress, 1966, and, more recently,
Kavenoky and Lautard, 1983; see also Hageman and Young, 1982).
(d) The control of large space structures (Balas, 1982).
(e) Automatic layout of electronic circuits in technology VLSI (Barnes, 1982).
(f) Oceanography (Platzmann, 1978).
The iterative aggregation/disaggregation method for Markov chains is
described in Chatelin (1984). The presentation of the Marx-von Neumann model
was inspired by a lecture course on balanced growth given by J. Laganier at the
University of Grenoble. The von Neumann model is described in Aubin (1984).
Supplements on the theory and applications of matrices with non-negative
coefficients can be found in Berman and Plemmons (1979).
As regards the dynamics of structures, the reader will find further information
in the books by Meirovitch (1980) and Sanchez-Palencia (1980). A description
of the maritime applications (naval design and drilling platforms in the high sea)
can be found in Aasland and Björstad (1983).
Finally, the reader will find useful developments in the books by Chatelin
(1983) on the numerical approximation of eigenvectors of differential and integral
operators, by Thompson (1982) and by Cvetkovic, Doob and Sacks (1980).


Section 3.1 Differential Equations and Difference Equations

3.1.1 [A] Consider the system of first-order differential equations

    du/dt = Au   (t > 0),   u(0) = u_0,

where u is a vector of R^n which is differentiable with respect to t and where A is
a constant matrix of order n. Let J be the Jordan form of A and let V be the
corresponding basis. Show that

    u(t) = V e^{Jt} V^{-1} u_0,

and determine the elements of e^{Jt}. In particular, discuss the case in which A is
diagonalisable.
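A minimal numerical sketch of the formula u(t) = V e^{Jt} V^{−1} u_0 in the diagonalisable case (J is then diagonal); the matrix and initial vector below are illustrative assumptions.

```python
import numpy as np

# For a diagonalisable A, u(t) = V e^{Dt} V^{-1} u0 with A = V D V^{-1}.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])            # illustrative: eigenvalues -1 and -2
u0 = np.array([1.0, 0.0])

lam, V = np.linalg.eig(A)
t = 0.5
u_t = (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V) @ u0).real

# Cross-check against the power series of e^{At}, truncated at high order.
E = np.eye(2)
term = np.eye(2)
for k in range(1, 30):
    term = term @ (A * t) / k
    E = E + term
print(np.allclose(u_t, E @ u0))         # True
```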
3.1.2 [C] Compute u(t) when the data in Exercise 3.1.1 are
1 2\
A= 0 0 1 and un = 1
3.1.3 [D] Show that the solutions of the system of differential equations proposed
in Exercise 3.1.1 remain bounded or become unbounded according to whether the
eigenvalues of A having a zero real part are semi-simple or not.
3.1.4 [D] Consider the one-dimensional heat equation

    ∂u/∂t = ∂²u/∂x²   (0 < x < 1, t > 0)

with the boundary conditions

    u(t, 0) = u(t, 1) = 0   (t > 0)

and the initial condition

    u(0, x) = f(x)   (0 ≤ x ≤ 1).

(a) Write down the problem that arises when the second derivative is discretized
by finite differences:

    (∂²u/∂x²)(t, x) ≈ [u(t, x − h) − 2u(t, x) + u(t, x + h)]/h².

(b) Cast the discretization in the form

    dφ/dt + Aφ = 0,   φ(0) = φ_0,   φ(t) = (u_1(t),...,u_n(t))^T,

u_i(t) being an approximate value of u(t, ih). Put t_j = jΔt (j = 0, 1,...) and let
u_k^j be an approximation to u_k(jΔt). Integrate with respect to the time over
[t_j, t_{j+1}].
(c) Rewrite the system. Use the trapezium rule to evaluate approximately the
integral:

    ∫_c^d f(t) dt ≈ [(d − c)/2][f(c) + f(d)]   (c < d).

(d) Show that the system can now be written as

    (A + (2/Δt)I) u^{j+1} = (−A + (2/Δt)I) u^j.

(e) Given that the eigenvalues of A are

    λ_k = (4/h²) sin²(kπh/2)   (k = 1,...,n),

show that the sequence u^j is bounded provided that h and Δt are sufficiently
small.
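The scheme of part (d) can be sketched numerically; the initial condition f(x) = sin πx, for which the exact solution is e^{−π²t} sin πx, is an illustrative assumption.

```python
import numpy as np

# Crank-Nicolson scheme of part (d) for u_t = u_xx, u(t,0) = u(t,1) = 0.
n = 49                                  # interior points, h = 1/(n + 1)
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

# A: finite-difference matrix of -d^2/dx^2 (tridiagonal).
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

dt = 0.001
L = A + (2.0 / dt) * np.eye(n)          # (A + (2/dt)I) u^{j+1} = (-A + (2/dt)I) u^j
R = -A + (2.0 / dt) * np.eye(n)

u = np.sin(np.pi * x)                   # illustrative f(x)
steps = 100
for _ in range(steps):
    u = np.linalg.solve(L, R @ u)

# Exact solution at t = steps * dt is e^{-pi^2 t} sin(pi x).
err = np.max(np.abs(u - np.exp(-np.pi**2 * dt * steps) * np.sin(np.pi * x)))
print(err)                              # small: the scheme is second order
```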
3.1.5 [B:39] Discretize the equation

    −(∂²u/∂x² + ∂²u/∂y²) = f   in Ω = (0,1) × (0,1),
    u = 0   on Γ,

where Γ is the boundary of Ω, using finite differences and a step of h = 1/N for
both x and y.
(a) Let u_{ij} be approximations of u(ih, jh) and f_{ij} = f(ih, jh). Show that the resulting
system becomes

    −(1/h²)(u_{i−1,j} + u_{i+1,j} + u_{i,j−1} + u_{i,j+1} − 4u_{ij}) = f_{ij}

when 0 < i, j < N, and

    u_{ij} = 0

when i or j are 0 or N.
(b) Write down the system in matrix form

    A_h u = b,

the matrix being block-tridiagonal with invertible blocks.
(c) Show that Jacobi's method, applied to this system, converges if h is sufficiently
small; it may be assumed that the eigenvalues of A_h are given by

    λ_{k,l} = (2/h²)(2 − cos kπh − cos lπh)   (1 ≤ k, l ≤ N − 1).
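Parts (a) and (c) can be sketched as follows; the right-hand side f = 1 and the grid size N = 20 are illustrative assumptions.

```python
import numpy as np

# Jacobi's method for the five-point scheme of part (a), with f = 1, N = 20.
N = 20
h = 1.0 / N
f = np.ones((N - 1, N - 1))             # f_ij at the interior points

u = np.zeros((N + 1, N + 1))            # u = 0 on the boundary (kept fixed)
for sweep in range(3000):
    u_new = u.copy()
    # u_ij <- (u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} + h^2 f_ij)/4
    u_new[1:N, 1:N] = 0.25 * (u[:N - 1, 1:N] + u[2:, 1:N]
                              + u[1:N, :N - 1] + u[1:N, 2:] + h**2 * f)
    if np.max(np.abs(u_new - u)) < 1e-10:
        u = u_new
        break
    u = u_new

print(u[N // 2, N // 2])                # approx 0.0737 at the centre
```

The convergence is slow (the Jacobi iteration matrix has spectral radius cos πh), which motivates the convergence discussion of part (c).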
3.1.6 [D] Let J be a Jordan block. Determine the elements of Jk (k = 1,2,...).
3.1.7 [D] Let A be a real positive definite matrix. Consider the system

    d²u/dt² = −Au   (t > 0).

Let X = (x_1,...,x_n) be a basis consisting of eigenvectors of A. Investigate the
existence of a solution of the form

    u(t) = Σ_{j=1}^{n} α_j e^{iω_j t} x_j   (i² = −1).

3.1.8 [D] The chemical reaction

    2H_2 + O_2 → 2H_2O

is decomposed into elementary reactions involving the radicals O, OH and H:

    O + H_2 → OH + H,
    OH + H_2 → H_2O + H,
    H + O_2 → OH + O.
At the kth stage let x_k, y_k and z_k represent the number of radicals O, OH and H
respectively, in such a way that

    x_{k+1} = z_k,
    y_{k+1} = x_k + z_k,
    z_{k+1} = x_k + y_k,

and x_0 = 1, y_0 = z_0 = 0. Put

    u_k = (x_k, y_k, z_k)^T,   u_0 = (1, 0, 0)^T.

Determine the matrix A such that

    u_{k+1} = A u_k.

By analysing the spectrum of A, deduce the limit

    u_∞ = lim_{k→∞} u_k.
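A short numerical sketch of this recurrence; applying the power method shows that the growth rate of u_k is the dominant eigenvalue of A, which turns out to be the golden ratio.

```python
import numpy as np

# The recurrence u_{k+1} = A u_k of Exercise 3.1.8.
A = np.array([[0, 0, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
u = np.array([1.0, 0.0, 0.0])

for _ in range(60):
    u = A @ u
    u /= np.linalg.norm(u)              # normalize to track the direction only

# Rayleigh quotient: the dominant eigenvalue of A.
rate = (A @ u) @ u / (u @ u)
print(rate)                             # ~ (1 + sqrt(5)) / 2
```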

3.1.9 [D] Study the discretization through finite differences of the eigenvalue
problem

    −u''(x) = λu(x)   (0 < x < 1),

when u(0) = u(1) = 0.
Calculate the exact solutions and compare them with the results furnished by
a discretization at five points (associated matrix of order 3).
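A minimal sketch of this comparison between the discrete and the exact eigenvalues:

```python
import numpy as np

# Five-point discretization (h = 1/4, matrix of order 3) of -u'' = lambda u.
h = 0.25
A = (2.0 * np.eye(3) - np.eye(3, k=1) - np.eye(3, k=-1)) / h**2

approx = np.sort(np.linalg.eigvalsh(A))               # (4/h^2) sin^2(k pi h/2)
exact = np.array([(k * np.pi)**2 for k in (1, 2, 3)])  # (k pi)^2
print(approx)     # 9.37..., 32.0, 54.6...
print(exact)      # 9.87..., 39.5..., 88.8...
```

The lowest eigenvalue is already reasonable on this very coarse grid; the higher ones are not.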
3.1.10 [B:11] Let T be a bounded linear operator in a Hilbert space (H, ⟨·,·⟩). Let V
be an orthonormal system in H and let

    S = lin V

be the subspace generated by V. The orthogonal projection on S is denoted by
π. Consider the eigenvalue problem

    Tφ = λφ,   0 ≠ φ ∈ H,   λ ∈ C,

and the approximation, named after Galerkin, that is associated with the subspace S:

    π(Tφ_n − λ_nφ_n) = 0,   0 ≠ φ_n ∈ S,   λ_n ∈ C.

Show that the approximate problem is equivalent to a matrix problem

    Au = ωu,   0 ≠ u ∈ C^n,   ω ∈ C.

Determine the matrix A and the relation between u and φ_n.

Section 3.2 Markov Chains

3.2.1 [B:36] Let P be the transition matrix of a homogeneous Markov chain.
(a) Prove that each complex eigenvalue is of modulus less than or equal to unity.
(b) Prove that for each eigenvalue of modulus unity, that is for each eigenvalue
of the form λ = e^{iθ}, θ ∈ R, there exists an integer q such that e^{iqθ} = 1.
(c) Prove that if all the elements of P are positive, then unity is a simple
eigenvalue of P and all the other eigenvalues of P are of modulus less than
unity.
3.2.2 [B:12] Consider a discrete Markov chain with n states. Let

Ρ = (Ρυ)

be the associated transition matrix. Suppose that the chain is irreducible and
non-periodic. Kolmogorov's equations

    π = πP,   πe = 1,   e = (1, 1,..., 1)^T,

have a unique solution π*. Jacobi's iteration can be written as

    π_{k+1} = π_k P.

(a) Show that Jacobi's iteration corresponds to the power method applied to P^T
with the normalization condition πe = 1.
Let {Ω(i): i = 1,...,p} be a partition of the set {1, 2,...,n}. With each state
π of the chain such that π_i > 0 (i = 1,...,n) we associate a matrix P^a(π) (called
the aggregated matrix) defined by

    (P^a(π))_{ij} = [Σ_{k∈Ω(i)} Σ_{l∈Ω(j)} π_k p_{kl}] / [Σ_{k∈Ω(i)} π_k]   (1 ≤ i, j ≤ p).

(b) Show that P^a(π) is a transition matrix. Let π^a be defined by

    π^a = π^a P^a(π),   π^a e = 1,

and let π̃ be the disaggregated vector with components

    π̃_l = (π^a_j / G_j) π_l   for l ∈ Ω(j),   where G_j = Σ_{l∈Ω(j)} π_l.

(c) Show that π̃e = 1. The new stationary state of the chain is defined by one
Jacobi step

    π̂ = π̃P.

(d) Show that

    π̂_k = Σ_{j=1}^{p} (π^a_j / G_j) Σ_{l∈Ω(j)} π_l p_{lk}   (1 ≤ k ≤ n).
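Part (a) can be sketched as follows; the 3 × 3 transition matrix is an illustrative assumption.

```python
import numpy as np

# Jacobi iteration / power method for a stationary distribution.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.7, 0.2],
              [0.2, 0.3, 0.5]])         # illustrative: row-stochastic, irreducible

pi = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    pi = pi @ P                         # pi_{k+1} = pi_k P

# At convergence, pi P = pi and pi e = 1.
print(np.allclose(pi @ P, pi), abs(pi.sum() - 1.0) < 1e-12)
```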

3.2.3 [D] We retain the notations of Exercise 3.2.2. Consider a Markov chain
which is almost completely reducible:

    P = D + E,
    D = diag(D_1, D_2,..., D_p),   ‖E‖_2 = ε;

D_i is the transition matrix of an irreducible non-periodic chain, whose stationary
state is denoted by π_i and satisfies the condition

    π_i e = 1.

Let π be the vector with blocks π_i. Consider one step in the aggregation/disaggregation
method based on π:

    P^a_{ij} = Σ_{k∈Ω(i)} Σ_{l∈Ω(j)} π_k p_{kl}   (1 ≤ i, j ≤ p),
    π^a e = 1,   π^a = π^a P^a.

Show that the resulting approximation π̂ satisfies

    ‖π̂ − π*‖_2 = O(ε).
3.2.4 [C] A message has to go from a point A to a point B by passing through
n intermediary points. Suppose the message can take only two states: either 0 or
1. Each intermediary has the probability p of correctly transmitting the
message received and the probability q = 1 − p of transmitting the opposite
message. We say the system is in the state E(0) at the kth stage if the intermediary k
transmits 0 to the next intermediary, and that it is in the state E(1) if the
intermediary transmits 1.
(a) Prove that the sequence of observed states is a Markov chain.
(b) Calculate the transition matrix.
(c) Calculate the probability of receiving the correct message at B and determine
the limit of this probability when the number, n, of intermediaries tends to
infinity.
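A sketch of the chain of parts (b) and (c); the value p = 0.75 is an illustrative assumption.

```python
import numpy as np

# Two-state transmission chain; p = 0.75 is illustrative, q = 1 - p.
p = 0.75
P = np.array([[p, 1 - p],
              [1 - p, p]])

# Probability that the message is unchanged after n intermediaries: (P^n)[0, 0].
# The eigenvalues of P are 1 and 2p - 1, so (P^n)[0, 0] = 1/2 + (2p - 1)^n / 2.
for n in (1, 5, 50):
    print(n, np.linalg.matrix_power(P, n)[0, 0])   # tends to 1/2
```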

Section 3.3 Theory of Economics

3.3.1 [B:37] Let A = (a_{ij}) be the matrix of technical coefficients. Let d_j be numbers
that enable us to pass from one physical unit to another or else to a monetary

unit. Let Ā be the new matrix obtained in this way. Prove that

    Ā = D^{−1}AD,   D = diag(d_1,...,d_n).
3.3.2 [B:37] Let A be the matrix of technical coefficients defined in monetary
units. Suppose there exists a price system that makes each branch of the economy
profitable. Show that ρ(A) < 1, and hence ρ(Ā) < 1, where Ā is the matrix of
coefficients defined in whatever system of units.
3.3.3 [B:37] Suppose the number of employees in the branch j of the economy
decreases, thereby causing the intensity of work in this branch to increase. Prove
that the rate of profit and the rate of growth will then increase.
3.3.4 [B:37] In the Marx-von Neumann model wages are indexed by prices:
w = pd,
where d is the employees' basket of consumption goods. Suppose that d is varied
by an amount Ad.
(a) Show that an increase of consumption by the employees is equivalent to an
increase in the wage costs.
(b) Show that an increase of the wage costs implies a decrease in the rates of
profit and growth.
3.3.5 [B:5] We present here what is known as the closed Leontiev model. The
set of goods is supposed to be equal to the set of products. The matrix A of
technical coefficients is a non-negative square matrix. If x is the vector of products
and y is the vector of goods, then
y = Ax.
The system is viable if y ≤ x, and the equilibrium of the quantities is given by

    (I − A)x = 0   (x ≥ 0).

(a) Determine a sufficient condition for equilibrium when the matrix A is
irreducible. Let p be the row vector of the prices of the goods. The row vector of the
costs of manufacturing the goods is therefore

    c = pA.

Hence the equilibrium of the prices is given by

    p(I − A) = 0   (p ≥ 0).

(b) Show that, when A is irreducible, the equilibrium of the prices is equivalent
to the equilibrium of the quantities.
3.3.6 [B:37] Next, we shall present the open model of Leontiev. We now have
n goods which are also products, but there exists a type of goods which is not a
product (in general, the work). The matrix A of technical coefficients is non-
negative and irreducible. The net produce is given by the equation
q = (I-A)x.
(a) Given a demand vector c ≥ 0, determine a sufficient condition for the existence
of a vector x ≥ 0 such that

    q = c.
(b) Prove that if there exists a row vector p > 0 such that pA < p, then (I − A)^{−1}
exists and is positive.
(c) Examine and interpret the sequence

    x^{(0)} = c,   x^{(k+1)} = Ax^{(k)} + c,

when the dominant eigenvalue λ* of A is such that 0 < λ* < 1.
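Part (c) can be sketched as follows; the coefficient matrix A and demand c are illustrative assumptions.

```python
import numpy as np

# Open Leontief model: x = (I - A)^{-1} c via the Neumann-type iteration.
A = np.array([[0.2, 0.3],
              [0.1, 0.4]])              # illustrative, with rho(A) = 0.5 < 1
c = np.array([10.0, 5.0])

x = c.copy()
for _ in range(200):
    x = A @ x + c                       # x^{(k+1)} = A x^{(k)} + c

# The iterates converge to the solution of (I - A) x = c.
print(np.allclose(x, np.linalg.solve(np.eye(2) - A, c)))   # True
```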
3.3.7 [B:37] In the growth model of von Neumann, the production is defined
by two matrices:
A = coefficient matrix of the goods,
B = coefficient matrix of the products,
where it is supposed that there are m techniques for the production of n goods.
During a given period of time we consider a column vector x ∈ R^m of the activity
levels of the techniques and a row vector p, p^T ∈ R^n, of the prices of the goods. If α
is the growth rate and β is the interest rate, then

    (B − αA)x ≥ 0   (x ≥ 0),
    p(B − βA) ≤ 0   (p ≥ 0).

(a) Show that the surplus has zero price and that, if the profit is less than the
interest rate, the activity level is nil.
(b) Show that, if the technology (B, A) consists of non-negative irreducible matrices,
there exists a unique number α* = β* > 0 such that

    α*Ax ≤ Bx   (x > 0),
    β*pA ≥ pB   (p > 0),
    p(α*A − B)x = 0.

(c) What can be said about the maximum rate of growth in relation to the
minimum rate of interest?
3.3.8 [C] We shall treat here the case of a farmer whose economy is confined to
the raising of chickens. We are concerned with two goods (chickens and eggs)
and two processes (laying and brooding). It will be assumed that a laying hen
will lay a dozen eggs per month, while a brooding hen will hatch four eggs per
month.

(b) Study the farmer's situation at the end of two months after he started with
three chickens and eight eggs.
(c) Repeat part (b) for the case when he started with two chickens and four eggs.
(d) Calculate the rate of growth when the economy is balanced.
(e) Study the balance of prices when one chicken is worth 10 units and an egg is
worth 1 unit.
(f) Repeat part (e) for the case when the price of a chicken is 6 units and that of
an egg is 1 unit, and calculate the rate of interest.
3.3.9 [B:37] Consider the Marx-von Neumann model defined in Section 3.3
(page 117): we formalize here the process of absolute price formation and hence
the propagation of inflation. Given two vectors
x = (x_1,...,x_n) and y = (y_1,...,y_n)

we define the vector

    z = (z_1,...,z_n) = max{x, y}

by putting

    z_i = max{x_i, y_i}   (i = 1,...,n).
Let s be the marginal rate and put

Define the following sequence of row vectors:


This formalizes the effect of the 'cliquet' (ratchet), that is, the downward rigidity of prices.
(a) Show that if B is irreducible and if p = (p_1,...,p_n) > 0 is the price system,
then p_k converges to αp, where

    α = max_i (p_0)_i / p_i.

(b) Prove that, for a given λ, the sequence

    p_{k+1}(λ) = max{ p_k(λ), (1/λ) p_k(λ)B }

converges if λ ≥ ρ and diverges if λ < ρ, in which case the relative prices
converge to p and the absolute prices increase at the rate ρ/λ.
3.3.10 [D] Consider Samuelson's oscillator: let r_k be the national revenue, c_k
the national consumption, d_k the national expenditure and i_k the national
investment, all during year k.
Let s be the marginal tendency for consumption and let v be the ratio of
investment to the increase of consumption. Then γ = vs is the coefficient of
acceleration:

    i_k = γ(r_{k−1} − r_{k−2}).

In addition we have the relations

    r_k = c_k + i_k + d_k,

which correspond to the plan of consumption and investment, and

    c_k = s r_{k−1},

which is the delay of one year between the evolution of consumption and revenue
(with vital minimum zero).
(a) Taking d_k = 1 for all k, show that the national revenue satisfies the equation

    r_{k+2} − s(1 + v)r_{k+1} + svr_k = 1.

(b) Study the solution of this equation in relation to the values of (s, v). It is
convenient to consider the four regions in the (s, v) plane defined by the curves

    s = 1/v   and   s = 4v/(1 + v)².
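The difference equation of part (a) can be sketched numerically; the values s = 0.5, v = 0.8 (a damped-oscillation case) are illustrative assumptions.

```python
import numpy as np

# Samuelson's oscillator  r_{k+2} - s(1+v) r_{k+1} + s v r_k = 1.
def revenue(s, v, n=60, r0=1.0, r1=1.0):
    r = [r0, r1]
    for _ in range(n):
        r.append(1.0 + s * (1 + v) * r[-1] - s * v * r[-2])
    return np.array(r)

# With s = 0.5, v = 0.8, the roots of z^2 - s(1+v) z + s v lie inside the
# unit circle, so r_k converges (with damped oscillations) to 1/(1 - s).
r = revenue(0.5, 0.8)
print(r[-1])                            # close to 2.0
```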

3.3.11 [C] Consider an economy that is divided into N regions. We study the
interregional movements of the immigrant workforce. Its redistribution during
the period k (0 ≤ k ≤ T) is given by a row vector

    x_k = (x_{k1},..., x_{kN}),

where x_{kj} is the effective workforce of the immigrant population in the region j
during the period k. We suppose that the workers freely change their location
according to taste and different market conditions for work. Let A_k = (a_{ij}^{(k)}) be the
'matrix of migration', where a_{ij}^{(k)} is the rate of migration of workers from the region
i to the region j during the period k.
(a) Show that

    x_{k+1} = x_0 A_0 A_1 ··· A_k.

(b) Suppose that N = 3 and that A_k = A for every k. Calculate the matrix that
represents the rate of migration from one region to another at the end of T
periods.
(c) Examine the behaviour of the matrix calculated in part (b) when T → ∞.

Section 3.4 Factorial Analysis of Data

3.4.1 [B:14] Let X, A, B, U, V, W, Z and E be the matrices defined in Lemma
3.4.1. We order the real eigenvalues of U and W and of V and Z (of orders k and
n respectively) by decreasing magnitude; thus

    λ_1 ≥ λ_2 ≥ ··· ≥ λ_r > λ_{r+1} = λ_{r+2} = ··· = λ_k = ··· = λ_n = 0.

(a) Show that the associated eigenvectors u_i, w_i, v_i and z_i satisfy the equations

    u_i = B^{−1/2}w_i,   v_i = A^{−1/2}z_i.

(b) Show that the SVD (Exercise 1.6.8) of E is

3.4.2 [B:14] Retaining the notations of Exercise 3.4.1, define

    ρ(f, g) = f^T X g / [(f^T B^{−1}f)(g^T A^{−1}g)]^{1/2}.

Show that if

    f_i = Bu_i and g_i = Av_i,

then

    ρ(f_i, g_i) = max{ρ(f, g) | u_j^T f = v_j^T g = 0, 1 ≤ j < i}   (1 ≤ i ≤ r).
3.4.3 [B:14] We retain the notations of Exercise 3.4.1. Let

    S ∈ R^{k×N} and T ∈ R^{n×N},

where max(k, n) ≤ N, and define X = ST^T, A^{−1} = TT^T, B^{−1} = SS^T. Let θ_i be the
ith canonical angle between the subspaces lin S^T and lin T^T in R^N. Show that

    √λ_i = cos θ_i   (i = 1,...,k).
3.4.4 [B:14] In the method of correspondence analysis the matrix X of order
k × n represents a contingency table:

    x_{ij} ≥ 0 and Σ_{i,j} x_{ij} = 1.

For example, x_{ij} represents the empirical frequency of two discrete variables I
and J which take values in {1, 2,..., k} and {1, 2,..., n} respectively. We define

    a_j = Σ_{i=1}^{k} x_{ij} > 0,   b_i = Σ_{j=1}^{n} x_{ij} > 0,

    A = diag(a_j^{−1}),   B = diag(b_i^{−1}).

(a) Show that λ_1 = 1 is the dominant eigenvalue of the matrix U
associated with the triplet (X, A, B). Let

    a = (a_1,...,a_n)^T,   b = (b_1,...,b_k)^T,
    X_0 = X − ba^T,   U_0 = X_0 A X_0^T B.

(b) Show that

    sp(U_0) = sp(U)\{1}.

(c) Show that U_0 is associated with the triplet (X_0 A, A^{−1}, B) as well as with the
triplet (X_0, A, B).
(d) Interpret factorial correspondence analysis of the columns of X as determining
the principal inertial axes in R^k, equipped with the norm B, of the set of points
defined by the columns of X' = X_0 A weighted by {a_j}_1^n. The row analysis is
obtained by duality.
3.4.5 [B:14] Continuing Exercise 3.4.4, assume now that x_{ij} represents the
probability of the event {I = i and J = j}, where I and J are two discrete random
variables. Suppose that the vectors f and g of Exercise 3.4.2 represent the
corresponding functions. Associated with the variables I and J we have the
functions f(I) and g(J). Establish the following results:

    f^T X g = E[f(I)g(J)],

where E is the mathematical expectation.
Show that the probabilistic interpretation of the correspondence analysis is
now to determine the functions of I and J, defined by f_i = Bu_i and g_i = Av_i, that
are uncorrelated with the preceding ones and that have maximal correlation
equal to √λ_i, i ≥ 2.
3.4.6 [B:14] In the principal components analysis we consider a random vector
S ∈ R^k of zero mean. We define

    X = E(SS^T),   A = B = I,   U = V = X,

    f^T X f = E(f^T S S^T f) = σ²(f^T S).

Show that the method determines u_i^T S, the linear combination of the components
of S that has the greatest variance and is uncorrelated with u_j^T S when j < i.
In this situation, is it necessarily true that λ_i ∈ [0,1]?
For the geometric interpretation, we consider n weights {a_j}_1^n such that Σ_{j=1}^n a_j = 1
and n vectors {S_j}_1^n in R^k, centred with respect to the {a_j}_1^n. Set X = [S_1,...,S_n]
and A = diag(a_j); the norm B on R^k is given. The principal component analysis
(PCA) of the set of n points {S_j}_1^n in R^k finds the principal inertial axes in R^k,
normed with B, of the n points weighted by {a_j}_1^n. The two particular choices
B = I [respectively B = diag(b_i^{−1})] lead to the unitary PCA [respectively normalized
PCA with b_i = Σ_{j=1}^n a_j S_{ij}², which is the empirical variance].
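A minimal sketch of principal component analysis in the case of equal weights and B = I; the synthetic data set is an illustrative assumption.

```python
import numpy as np

# PCA sketch: the dominant eigenvector of the covariance gives the
# linear combination of components with the greatest variance.
rng = np.random.default_rng(0)
S = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                          [1.0, 0.5]])   # illustrative data
S = S - S.mean(axis=0)                  # centre the data

X = S.T @ S / len(S)                    # empirical covariance
lam, U = np.linalg.eigh(X)              # eigenvalues in ascending order

u1 = U[:, -1]                           # dominant principal axis
# u1^T S has the greatest variance among the computed axes.
print(np.var(S @ u1) >= np.var(S @ U[:, 0]))    # True
```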
3.4.7 [B:14] Consider the method of discriminant analysis. We retain the notations
of the Exercises 3.4.1 and 3.4.2. Let S be a random centred vector in R^k and
let J be an integer random variable. Put π(j) = Prob(J = j). We define

    X = (x_1,...,x_n).

(a) Show that

    f^T X g = Σ_j E(f^T S | J = j) π(j) g(j) = E[f^T S g(J)],

    g^T A^{−1}g = E[g²(J)].

(b) Show that sp(U) ⊂ [0, 1].
(c) Show that ρ(f, g) is the correlation coefficient of f^T S and g(J). The canonical
analysis of a random vector S and a random discrete variable J determines
the linear functional of S and the function of J that have maximal correlation.
3.4.8 [B:14] In the canonical analysis of the correlation between two vectors
one considers N centred vectors {S_j}_1^N in R^k and N centred vectors {T_j}_1^N in R^n.
Set S = [S_1,...,S_N], T = [T_1,...,T_N], B^{−1} = (1/N)SS^T, A^{−1} = (1/N)TT^T and
X = (1/N)ST^T. Explain why the direct geometrical interpretation in terms of minimal
inertia is no longer possible. Give an interpretation in terms of canonical angles
between the subspaces in R^N spanned by S^T and T^T.
3.4.9 [A] Let A, B, X, W, Z, U, V and E be the matrices of Exercise 3.4.1. Let R_A
and R_B be upper triangular matrices such that
(a) Show that U and W are similar.
(b) Show that the eigenvectors u of U and v of V can be calculated from the
eigenvectors w of W.

Section 3.5 The Dynamics of Structures

3.5.1 [D] A vertical bar of length unity is fixed at its lower end. At the other
end (zero of the x axis) a device prevents displacements at right angles to the axis
of the bar. At this end a downward force P is applied which causes the bar to be
deformed. Let u(x) be the displacement of the point situated at the abscissa x at
right angles to the axis of the bar.
On the assumption that the displacements remain small, it can be shown that

    (d/dx)[a(x)(du/dx)(x)] + Pu(x) = 0,

where a(x) depends on the physical properties of the bar. In accordance with the
conditions for fixing the bar we have

    u(0) = u(1) = 0.

(a) Prove that if a(x) = 1, this differential problem has a non-trivial solution only
if

    P ∈ {(πk)²: k = 1, 2,...},

and that, when P = P_k = (πk)², the solution is any function that is linearly
dependent on

    u_k(x) = sin πkx   (0 ≤ x ≤ 1).
Suppose now that the function x ↦ a(x) is not constant. We may seek an
approximate solution by discretizing the bar into n + 1 segments of length
h = 1/(n + 1). Let u_i denote the approximation for u(ih) and put a_{i+1/2} =
a((i + 1/2)h). The discretized problem can be written as

    (1/h²)[a_{i+1/2}(u_{i+1} − u_i) − a_{i−1/2}(u_i − u_{i−1})] + Pu_i = 0,
    u_0 = u_{n+1} = 0.

(b) Show that this discretization is equivalent to a matrix problem

    Au = λu.

Determine the matrix A.
3.5.2 [B:66] The lower ends of the two bars in Figure 3.5.1 are equipped with
springs in such a way that, in the absence of forces, the position of the bars is in
the same vertical line. A downward vertical force F is applied to the upper end
of the second bar, which causes the angles θ_1 and θ_2 to appear. Both bars are of
length l and mass m, and the two springs have the same characteristic constant k.
(a) Show that, if the force of gravity is neglected, the kinetic energy K and the
potential energy V are given by

    K = (ml²/2)[(4/3)θ̇_1² + θ̇_1θ̇_2 cos(θ_1 − θ_2) + (1/3)θ̇_2²],
    V = (1/2)kθ_1² + (1/2)k(θ_2 − θ_1)² − Fl(2 − cos θ_1 − cos θ_2).
(b) Write down Lagrange's equations

    (d/dt)(∂K/∂θ̇_i) − ∂K/∂θ_i + ∂V/∂θ_i = 0   (i = 1, 2)

for this case.

Figure 3.5.1

Discuss the solution that corresponds to the perturbed initial conditions

    θ_i(0) = εα_i,   θ̇_i(0) = εβ_i   (i = 1, 2).

Assume the existence of a solution θ_i(t, ε) which is differentiable with respect
to ε at ε = 0. Put

    φ_i(t) = (∂θ_i/∂ε)(t, 0).
(c) Show that the φ_i satisfy the system of linear differential equations obtained
by linearizing Lagrange's equations with respect to ε at ε = 0 (i = 1, 2).
(d) Write the preceding system of differential equations in matrix form:

    Bφ̈ + Aφ = 0,

and show that A and B are symmetric and that B is positive definite.
(e) Show that if F > 0 and k > 0, then the roots of the polynomial

    p(λ) = det(A − λB)

are real and distinct.
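The generalized eigenvalue problem det(A − λB) = 0 of part (e), with B positive definite, can be sketched via the Cholesky factorization B = LL^T; the matrices below are illustrative assumptions, not those of the exercise.

```python
import numpy as np

# Symmetric generalized eigenproblem det(A - lambda B) = 0, B positive
# definite: reduce to a standard symmetric problem via B = L L^T.
A = np.array([[2.0, -1.0],
              [-1.0, 1.0]])             # illustrative symmetric matrix
B = np.array([[4.0 / 3.0, 0.5],
              [0.5, 1.0 / 3.0]])        # illustrative, positive definite

L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
C = Linv @ A @ Linv.T                   # same eigenvalues as the pencil (A, B)
lam = np.linalg.eigvalsh(C)
print(lam)                              # real and distinct
```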
3.5.3 [B:66] Generalize the problem 3.5.2 to the case of n bars, for which the
kinetic and potential energies (neglecting gravity) are again quadratic forms in
the angular velocities θ̇_i and the angles θ_i respectively.

3.5.4 [B:66] Consider an elastic solid. When the equations of elasticity are
linearized, they take the form

    σ_{ij} = Σ_{k,l=1}^{3} A_{ijkl} (1/2)(∂u_k/∂x_l + ∂u_l/∂x_k),
    A_{ijkl} = A_{jikl} = A_{ijlk} = A_{klij},

    Σ_{j=1}^{3} ∂σ_{ij}/∂x_j = ρ ∂²u_i/∂t²,

where u(x) is the displacement vector and ρ(x) is the density of the material.
Write down the system of differential equations that enable us to determine
the normal modes of vibration of the form

    u_λ(x, t) = exp(−i√λ t) w(x).

3.5.5 [D] Consider the vibrations of an elastic disk (homogeneous and isotropic)
whose normal displacement component u(x, t) is a solution of

    ∂²u/∂t² + Δ²u = 0,

where Δ² is the biharmonic operator (the Laplacian applied twice). The method
of the separation of variables yields normal modes of vibration of the form

    u(x, t, λ) = exp(i√λ t) w(x).

Write down the equation satisfied by w.
3.5.6 [A] Consider the system of differential equations

    Mu'' + Bu' + Ku = 0,

whose unknown u is a vector function of a real variable t > 0. Find a solution u
of the form

    u(t) = e^{λt}φ,

where φ is a constant vector. Prove that the pair (λ, φ) satisfies the equation

    (λ²M + λB + K)φ = 0.
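A minimal sketch of solving this quadratic eigenvalue problem by linearization to a standard eigenproblem of order 2n; the matrices M, B, K below are illustrative assumptions.

```python
import numpy as np

# Quadratic eigenproblem (lambda^2 M + lambda B + K) phi = 0, linearized
# via the companion form of the first-order system in (u, u').
M = np.eye(2)
B = np.array([[0.2, 0.0],
              [0.0, 0.3]])              # illustrative damping
K = np.array([[2.0, -1.0],
              [-1.0, 2.0]])             # illustrative stiffness

n = 2
Z = np.zeros((n, n))
I = np.eye(n)
Minv = np.linalg.inv(M)
L = np.block([[Z, I],
              [-Minv @ K, -Minv @ B]])  # companion (linearized) matrix
lam = np.linalg.eigvals(L)

# Each eigenvalue annihilates det(lambda^2 M + lambda B + K).
print(abs(np.linalg.det(lam[0]**2 * M + lam[0] * B + K)))   # ~ 0
```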
3.5.7 [D] Consider the differential equation

    Mu'' + Bu' + Ku = 0

with the initial conditions

    u(0) = u_0,   u'(0) = u_1.

Define the polynomial

    p(λ) = det(λ²M + λB + K).

Suppose that M, B and K are Hermitian. Prove that:
(a) If M, B and K are positive semi-definite and if M and K are positive definite,
then no root of p(λ) has a positive real part.
(b) If M and K are positive semi-definite and B is positive definite, then λ = 0 is
the only root with a zero real part.
3.5.8 [B:22] We present here a method known as static condensation in relation
to the problem
(P)   Kq = ω²Mq,   0 ≠ q ∈ C^n,
which models the natural frequencies and modal forms of a structure considered
globally; K is the rigidity matrix and M is the matrix of masses (page 121).
We choose a subset qc of coordinates that are to be eliminated and we denote
by qR the subset of coordinates that are to be retained. This induces a partitioning

of the equations (P):

    K_{RR}q_R + K_{RC}q_C = ω²(M_{RR}q_R + M_{RC}q_C),
    K_{CR}q_R + K_{CC}q_C = ω²(M_{CR}q_R + M_{CC}q_C).

Suppose that q_C can be decomposed as follows:

    q_C = q_S + q_D,

where q_S is the static part:

    q_S = −K_{CC}^{−1}K_{CR}q_R.

The method of static condensation consists in neglecting q_D, so that

    q_C ≈ −K_{CC}^{−1}K_{CR}q_R.

(a) Prove that, if q_D = 0, then (ω, q_R) is a solution of

    K̃_{RR}q_R = ω²M̃_{RR}q_R,

    K̃_{RR} = K_{RR} − K_{RC}K_{CC}^{−1}K_{CR},
    M̃_{RR} = M_{RR} − M_{RC}K_{CC}^{−1}K_{CR} − K_{RC}K_{CC}^{−1}M̃_{CR},
    M̃_{CR} = M_{CR} − M_{CC}K_{CC}^{−1}K_{CR}.

(b) Show that q_D satisfies the equation

    (K_{CC} − ω²M_{CC})q_D = ω²M̃_{CR}q_R.
Let q_R = 0 and let (μ_i, φ_i) be the solutions of

    K_{CC}φ = μ²M_{CC}φ   (φ ≠ 0).

Suppose that

    μ_1² ≤ μ_2² ≤ ···,   φ_i^* M_{CC}φ_j = δ_{ij}.

Let ε > 0 be the order of magnitude of the error acceptable for the modal
forms associated with the low frequencies.
(c) Show that the method of static condensation furnishes acceptable
approximations for the solutions (ω, q) such that

    ω² ≤ εμ_1² ≪ 1.

Section 3.6 Chemistry

3.6.1 [D] Prove that equation (3.6.1) is equivalent to Galerkin's method when
{χ_1,...,χ_n} is an orthogonal system. Determine the linear operator which is
represented by the matrix H in (3.6.1) (page 122).

3.6.2 [D] Verify the expression given for the Jacobian of the right-hand side of
(3.6.2) (page 124).

Section 3.7 Fredholm's Integral Equation

3.7.1 [B:12] Consider the differential eigenvalue problem

    −x'' = λx,   x(0) = 0,   x(1) = 0.

(a) Determine the associated Green kernel and formulate the eigenvalue problem
associated with the integral operator.
(b) Show that the discretization by finite differences for the differential problem
is equivalent to Fredholm's approximation for the integral problem.
3.7.2 [B:6,12] We present here what is known as the collocation method by
discussing an example. Let B = C°[0,1] be the Banach space of functions

    x: t ∈ [0,1] ↦ x(t) ∈ C

which are continuous on [0,1], equipped with the uniform norm

    ‖x‖ = max_{0≤t≤1} |x(t)|.

In this space we consider the set of functions {e_1,...,e_n} defined as follows.
Let n > 2 be a given integer and put

    h = (n − 1)^{−1} and t_j = (j − 1)h   (j = 1, 2,...,n).

    e_1(t) = 1 − t/h   if 0 ≤ t ≤ h,   e_1(t) = 0 otherwise;

when j = 2,...,n − 1:

    e_j(t) = 1 − |t − t_j|/h   if t_{j−1} ≤ t ≤ t_{j+1},   e_j(t) = 0 otherwise;

    e_n(t) = 1 − (1 − t)/h   if 1 − h ≤ t ≤ 1,   e_n(t) = 0 otherwise.

Let B_n = lin{e_1,...,e_n} be the subspace generated by these functions.

(a) Prove that x ∈ B_n if and only if x ∈ B and x is a polynomial of degree less than
or equal to unity on [t_j, t_{j+1}] when j = 1, 2,...,n − 1.
(b) Prove that

    π_n: B → B,   x ↦ Σ_{j=1}^{n} x(t_j) e_j

is a projection on B_n; determine its kernel. Let T be a bounded linear operator
defined on B. Consider the eigenvalue problem

    Tφ = λφ,   0 ≠ φ ∈ B,   λ ∈ C,

and the approximation (of oblique Galerkin type)

    π_n(Tφ_n − λ_nφ_n) = 0,   0 ≠ φ_n ∈ B_n,   λ_n ∈ C.

(c) Prove that this approximation is equivalent to a matrix problem of order n:

    Au = ωu,   0 ≠ u ∈ C^n,   ω ∈ C.

Calculate the matrix A and make explicit the connection between u and φ_n.
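A minimal sketch of the projection π_n of part (b), which amounts to piecewise-linear interpolation at the nodes t_j; the function x = cos is an illustrative choice.

```python
import numpy as np

# pi_n x = sum_j x(t_j) e_j: piecewise-linear interpolation at the nodes.
n = 11
h = 1.0 / (n - 1)
t_nodes = np.linspace(0.0, 1.0, n)      # t_j = (j - 1) h

def hat(j, t):
    """Hat function e_j: 1 at t_j, 0 at the other nodes, piecewise linear."""
    return np.clip(1.0 - np.abs(t - t_nodes[j]) / h, 0.0, None)

x = np.cos                              # a continuous function on [0, 1]
t = np.linspace(0.0, 1.0, 1001)
pi_x = sum(x(t_nodes[j]) * hat(j, t) for j in range(n))

# pi_n reproduces x at the nodes; elsewhere the error is O(h^2).
print(np.max(np.abs(pi_x - x(t))))
```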
3.7.3 [A] Retain the notation of Exercise 3.7.2. Let
Prove that φ_n is an eigenvector of the operator Tπ_n associated with the eigenvalue
λ_n.
3.7.4 [B:5,12] Let

    l_n(x) = Σ_{j=1}^{n} w_{jn} x(t_{jn})

be an approximate quadrature formula for

    l(x) = ∫_0^1 x(t) dt.

Let T: C°[0,1] → C°[0,1] be the integral operator

    (Tx)(t) = ∫_0^1 k(t, s) x(s) ds

having a continuous kernel k.
We define the Nyström approximation of T associated with the given quadrature
formula, namely

    (T_n x)(t) = Σ_{j=1}^{n} w_{jn} k(t, t_{jn}) x(t_{jn}).

Let λ_n and x_{jn} be defined by the equations

    Σ_{j=1}^{n} w_{jn} k(t_{in}, t_{jn}) x_{jn} = λ_n x_{in}   (1 ≤ i ≤ n),

    max_{1≤j≤n} |x_{jn}| = 1.

(a) Prove that

    φ_n(t) = Σ_{j=1}^{n} w_{jn} k(t, t_{jn}) x_{jn}

is an eigenvector of T_n associated with the eigenvalue λ_n.
(b) Prove that

    φ_n(t_{in}) = λ_n x_{in}   (1 ≤ i ≤ n).

Error Analysis

This chapter begins with a topic of great practical importance, namely the
stability of a spectral problem and the notion that is derived from it: the spectral
conditioning for a set of distinct eigenvalues and for the associated invariant
subspace. This will be considered in the most general case, involving a non-normal
matrix and defective eigenvalues.
The analysis of a priori errors is based on spectral theory, which enables us
to give concise and elegant proofs. The analysis of a posteriori errors furnishes
bounds that are fairly easy to calculate as a function of the residual matrix
AU − UC constructed upon the matrix C of order m and the m vectors of U.


Suppose we wish to solve the linear system Ax = b: it possesses a unique solution
x = A^{−1}b if and only if A is non-singular. This is all that the pure mathematician
is interested in. But the numerical analyst wants to know whether, within the realm
of possibility, the solution is insensitive to perturbations in the data A and b. If
A is perturbed by ΔA and b by Δb, then the new solution x + Δx is such that

    ‖Δx‖/‖x‖ ≤ [cond(A)/(1 − ‖A^{−1}‖‖ΔA‖)] (‖Δb‖/‖b‖ + ‖ΔA‖/‖A‖),

provided that ‖ΔA‖ < 1/‖A^{−1}‖ (see Schwarz, p. 27, or Horn and Johnson, 1990,
p. 338).
The condition number cond(A) (= ‖A‖‖A^{−1}‖) is therefore a measure of the
relative error ‖Δx‖/‖x‖ of the solution as a function of the relative errors of the
data. If cond(A) is large then, in a certain manner, A is close to a singular matrix:
there exists a matrix ΔA of rank unity such that

    ‖ΔA‖/‖A‖ = 1/cond(A)

and A + ΔA is singular (see Exercise 4.1.1; also see Chapter 1, Section 1.12).
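A minimal numerical sketch of this sensitivity; the nearly singular matrix below is an illustrative assumption.

```python
import numpy as np

# Sensitivity of Ax = b for an ill-conditioned matrix.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])           # illustrative, nearly singular
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)               # solution (1, 1)

cond = np.linalg.cond(A)                # about 4e4
db = np.array([0.0, 1e-6])              # tiny perturbation of b
dx = np.linalg.solve(A, b + db) - x

# The relative error in x is amplified by a factor up to cond(A).
print(cond, np.linalg.norm(dx) / np.linalg.norm(x))
```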
Before solving a linear system it is useful in practice to scale the matrix in order
to reduce its condition number. The scaling process for A consists in finding
diagonal matrices D_1 and D_2 such that

    cond(D_1 A D_2) = inf {cond(Δ_1 A Δ_2): Δ_1, Δ_2 diagonal}.
As we have seen, the stability of the solution of a linear system depends on the
regularity of A. The situation is much more complex for the problem of eigenvalues.
The notion that corresponds to the regularity of A is the property of A to be
diagonalisable or non-defective.

Example 4.1.1 The matrix

    A = ( 2  1
          0  2 )

is defective. Put

    A(ε) = ( 2  1
             ε  2 )   (ε > 0).

The eigenvalues of A(ε) are

    λ_1(ε) = 2 + √ε,   λ_2(ε) = 2 − √ε,

    dλ_1(ε)/dε = 1/(2√ε),   dλ_2(ε)/dε = −1/(2√ε).

The rate of change of the eigenvalues at 0 is infinite. However, A(ε) is much nearer
to non-diagonalisability than might be predicted from the distance of the eigenvalues:

    ‖A − A(ε)‖ = ε and λ_1(ε) − λ_2(ε) = 2√ε ≫ ε

for small ε.
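The sensitivity described in Example 4.1.1 can be observed numerically:

```python
import numpy as np

# Eigenvalues of A(eps) = [[2, 1], [eps, 2]]: a perturbation of size eps
# splits the defective double eigenvalue 2 by 2*sqrt(eps) >> eps.
for eps in (1e-4, 1e-8, 1e-12):
    lam = np.linalg.eigvals(np.array([[2.0, 1.0],
                                      [eps, 2.0]]))
    print(eps, abs(lam[0] - lam[1]))    # ~ 2 * sqrt(eps)
```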


We shall study the variation of the eigenvalues and eigenvectors as a function of
a variation AA in the matrix A. We shall begin with a simple eigenvalue.

4.2.1 The Case m = 1

Suppose the eigenelements λ, x, x_* satisfy the equations

    Ax = λx,   A*x_* = λ̄x_*,   ‖x‖_2 = x_*^* x = 1.

Let P = xx_*^* be the eigenprojection, S the reduced resolvent, P_1 = xx^* the
orthogonal projection on the eigendirection M = lin(x), and Σ_1 the partial inverse

Figure 4.2.1

in M^⊥; let ξ be the acute angle between the eigendirections lin(x) and lin(x_*)
(see Figure 4.2.1). If [x, Q] is a unitary basis of C^n such that Q is a basis of
M^⊥, then

    Σ_1 = Q(B − λI)^{−1}Q*,   B = Q*AQ,
    δ = dist[λ, sp(A) − {λ}].

Proposition 4.2.1 If the matrix A is subjected to a perturbation ΔA, then the
simple eigenvalue λ varies (to the first order) by Δλ = x_*^* ΔA x, and the eigendirection
lin(x) turns (to the first order) through an angle θ such that

    tan θ = ‖Σ_1 ΔA x‖_2.

PROOF Let A' = A + ΔA, where ε = ‖ΔA‖_2. We have seen in Chapter 2 that the
Rayleigh-Schrödinger series converge for sufficiently small ε (Corollary 2.9.4).
Hence there exist eigenelements λ' and x' of A' such that x*x' = 1 and

    |λ' − (λ + x_*^* ΔA x)| = O(ε²),
    ‖x' − (x − Σ_1 ΔA x)‖_2 = O(ε²).

In accordance with Figure 4.2.2 we have

    ‖x' − x‖_2 = tan θ.

Figure 4.2.2

(a) The spectral condition number of the simple eigenvalue λ is defined by

csp(λ) = ||x_*||₂.

(b) The spectral condition number of the eigendirection lin(x) is defined by

csp(x) = ||Σ⊥||₂.

We recall that

||x_*||₂ = ||P||₂ = (cos ξ)⁻¹,
δ⁻¹ ≤ ||(B̂ − λI)⁻¹||₂ = ||Σ⊥||₂ ≤ 2 cond₂(V̂) δ^{−ℓ̂},

where ℓ̂ is the index of the eigenvalue of B̂ that is nearest to λ, provided that δ is
sufficiently small (see Exercise 4.2.1), and where V̂ is the Jordan basis for B̂.
When B̂ is diagonalisable, ℓ̂ = 1 and V̂ is the basis of the eigenvectors of B̂. Then:

(a) If the left eigendirection lin(x_*) is almost orthogonal to lin(x) (that is, if ξ is
almost a right angle), then λ is ill-conditioned.
(b) If δ is small and/or cond₂(V̂) is large, then x is ill-conditioned.

When A is Hermitian or normal, then

M_* = M,   ||x_*||₂ = 1   and   ||Σ⊥||₂ = δ⁻¹.

In this case, the sole cause for the ill-conditioning of lin(x) is δ, the distance of λ
from the rest of the spectrum of A. For an arbitrary matrix, the departure of B̂
from normality also plays a part (see Chapter 1, Section 1.12).
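For a simple eigenvalue these quantities are directly computable: with the normalization y*x = 1, csp(λ) = ||x_*||₂ = 1/cos ξ. A minimal NumPy sketch, taking the left eigenvectors as the rows of the inverse of the eigenvector matrix (the 2×2 test matrix is an illustrative assumption):

```python
import numpy as np

def eigenvalue_condition_numbers(A):
    """csp(lambda_i) = ||y_i||_2 * ||x_i||_2 for right/left eigenvectors with y_i^* x_i = 1.

    The rows of inv(X) are left eigenvectors already normalized by y_i^* x_i = 1."""
    w, X = np.linalg.eig(A)
    Xi = np.linalg.inv(X)
    return w, np.array([np.linalg.norm(Xi[i]) * np.linalg.norm(X[:, i])
                        for i in range(len(w))])

# non-normal 2x2 example: both simple eigenvalues have csp of order 1e3
w, c = eigenvalue_condition_numbers(np.array([[1.0, 1e3], [0.0, 2.0]]))
assert np.all(c > 1e2)
```

For a normal matrix the routine returns values equal to 1, in agreement with the Hermitian case above.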

Example 4.2.1 The condition number of a multiple eigenvalue λ (of multiplicity
m) is not defined. When we perturb the matrix, we obtain in general m simple
eigenvalues {λⱼ}₁ᵐ. When λ is semi-simple, maxⱼ csp(λⱼ) may be moderate, because
λ corresponds to an orthogonal basis of eigenvectors. When λ is defective, then
maxⱼ csp(λⱼ) is necessarily large because λ corresponds to fewer than m indepen­dent
eigenvectors. We conclude that a defective eigenvalue is necessarily ill-conditioned
when considered individually (see Section 4.2.2).

Example 4.2.2 Consider the 2 × 2 matrix

T = ( a   (b − a)/ε
      0   b ),

where a ≠ b. The matrix has two simple eigenvalues, namely a and b; their
condition number is of order ε⁻¹. When b − a is small, then T is close to a matrix
with a double eigenvalue ½(a + b), which is semi-simple or defective according to
the value of ε.

(a) Suppose that b − a is small and ε⁻¹ is moderate. Let T' = ½(a + b)I; then

T − T' = ( (a − b)/2   (b − a)/ε
           0           (b − a)/2 )

and ||T − T'||₂ is of the order of b − a.

(b) Suppose that both b − a and ε are small. The matrix

( ½(a + b)   (b − a)/ε
  0          ½(a + b) )

has a defective double eigenvalue ½(a + b), and the Jordan basis is given by

V = ( 1   0
      0   ε/(b − a) ).

If (1/ε)(b − a) is moderate, then cond₂(V) is moderate. This condition number
is large when (1/ε)(b − a) is large; in this case V is of rank unity up to the term
ε/(b − a), and the departure of T from normality is (b − a)/ε and is therefore large.
Example 4.2.3 Consider the matrix

A = ( −149   −50   −154
       537   180    546
       −27    −9    −25 )

whose eigenvalues are {1, 2, 3} (the example is due to C. Moler). We shall disturb
a₂₂ = 180:

(a) a₂₂ = 180.01: the eigenvalues become 0.2072655; 2.3008349; 3.5018994
(Figure 4.2.3).

Figure 4.2.3

(b) a₂₂ = 179.997769: the eigenvalues are now 1.5509456 ± i × 7.99921 × 10⁻²;
2.8958779 (Figure 4.2.4).

Figure 4.2.4

It can be seen that a small perturbation of a₂₂ can produce very different
perturbations of the eigenvalues. It can be verified that csp(λᵢ) ≈ 10³ (i = 1, 2, 3).
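The example is easy to reproduce. A short NumPy sketch; the 10³ figure is checked with the same left/right eigenvector formula as before:

```python
import numpy as np

A = np.array([[-149.0,  -50.0, -154.0],
              [ 537.0,  180.0,  546.0],
              [ -27.0,   -9.0,  -25.0]])
assert np.allclose(np.sort(np.linalg.eigvals(A).real), [1.0, 2.0, 3.0])

# perturb a22 by 1e-2: the spectrum moves by far more than 1e-2
Ap = A.copy(); Ap[1, 1] += 0.01
wp = np.sort(np.linalg.eigvals(Ap).real)
assert np.max(np.abs(wp - np.array([1.0, 2.0, 3.0]))) > 0.2

# individual eigenvalue condition numbers csp(lambda_i), all of order 1e3
w, X = np.linalg.eig(A)
Xi = np.linalg.inv(X)
csp = np.array([np.linalg.norm(Xi[i]) * np.linalg.norm(X[:, i]) for i in range(3)])
assert np.all(csp > 1e2) and np.all(csp < 1e5)
```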

4.2.2 The Case m > 1

We pass on to the case of a block σ of eigenvalues of total algebraic multiplicity
m, and we suppose that the number
£ = dist πιίη(σ, sp A — o)
is positive. The block σ may consist of a cluster of eigenvalues, or of a single
multiple eigenvalue.
Let M be the m-dimensional subspace which is invariant under A and is
associated with σ. Let Q be an orthonormal basis of M; then P1 = QQ* is the
orthogonal projection on M. The matrix B — Q*AQ, which represents the
mapping A>M relative to the base Q, has σas its spectrum.
Let [Q, Q] be a unitary basis for C . The partial block-inverse in M1 is given by
1<L = Q{B,B)Q\
B = Q*AQ.
Let Ξ = diag {£,·} be the matrix of canonical angles between M and M„,, the
left invariant subspace. Let X+ be the basis of Μ + normalized by the condition

P = QXt
is the spectral projection on M.

Proposition 4.2.2 If the matrix A is subjected to the perturbation ΔA, then σ and
M become σ' and M' respectively, which (to the first order) are defined as follows:

(a) σ' is the spectrum of B' = B + X_*^*ΔAQ.
(b) M' has the basis X', normalized by Q*X' = I_m, such that X' = Q − Σ⊥ΔAQ.

PROOF When p equals 2 or F, and if ε_p = ||ΔA||_p is sufficiently small, then
the expansions in the Rayleigh-Schrödinger series, established in Corollary
2.9.4, yield

||B' − [B + X_*^*(ΔA)Q]||_p = O(ε_p²),
||X' − [Q − Σ⊥(ΔA)Q]||_p = O(ε_p²).

We remark that if Θ is the diagonal matrix of canonical angles between M
and M', then

||X' − Q||_p = ||tan Θ||_p = ||Σ⊥ΔAQ||_p.

Since the matrices B and B' are such that ||B' − B|| = O(ε₂),
can we deduce anything about the proximity between the members of σ and σ'?
We denote by V the Jordan basis (of eigenvectors) of B.

Property 4.2.3 For each λ'∈σ', there exists λ∈σ of index ℓ such that

|λ' − λ|^ℓ ≤ 2 cond₂(V) ||X_*||₂ ε₂

for sufficiently small ε₂.

PROOF We refer to the subsequent Theorem 4.4.2, applied to B' = B + X_*^*ΔAQ,
for which ||B' − B||₂ ≤ ||X_*||₂ ε₂ to first order. If ε₂ = ||ΔA||₂ is sufficiently
small, then

(1 + |λ' − λ|)^{ℓ−1} ≤ 2

(see Exercise 4.4.1).
When B is diagonalisable, then ℓ = 1, V is a basis of eigenvectors and the bound
is linear in ε₂.

Corollary 4.2.4 If ε₂ is sufficiently small, then

max_{λ'∈σ'} min_{λ∈σ} |λ' − λ| ≤ 2 cond₂(V) ||X_*||₂ ε₂^{1/m}.

PROOF The result is true when ℓ = 1. When 1 < ℓ ≤ m, then ε₂^{1/ℓ} ≤ ε₂^{1/m} < 1,
and the bound follows from Property 4.2.3.
We denote by λ̄ (respectively λ̄') the arithmetic mean of the eigenvalues in σ
(respectively σ'):

λ̄ = (1/m) Σ_{μ∈σ} μ,   λ̄' = (1/m) Σ_{μ∈σ'} μ.

Corollary 4.2.5 When ε₂ is small enough,

|λ̄' − λ̄| = O(ε₂).

PROOF This is an immediate consequence of the inequalities

|λ̄' − λ̄| = (1/m)|tr(B' − B)| ≤ ρ(B' − B) ≤ ||B' − B||.
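Corollary 4.2.5 can be observed numerically: under a perturbation of size ε the individual eigenvalues of a defective block scatter like ε^{1/m}, while their arithmetic mean, being 1/m times the trace, moves only like ε. A small sketch (assuming, for illustration, a single 4 × 4 Jordan block perturbed in its corner):

```python
import numpy as np

m, eps = 4, 1e-12
J = np.diag(np.full(m, 2.0)) + np.diag(np.ones(m - 1), 1)   # Jordan block, lambda = 2
E = np.zeros((m, m)); E[-1, 0] = eps                         # corner perturbation
w = np.linalg.eigvals(J + E)

spread = np.max(np.abs(w - 2.0))        # of order eps**(1/m) = 1e-3
mean_shift = abs(w.mean() - 2.0)        # of order eps (here the trace is unchanged)
assert spread > 1e-4
assert mean_shift < 1e-10
```

The individual eigenvalues move nine orders of magnitude more than their mean.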


(a) The global spectral condition number of σ is defined as

csp(σ) = cond₂(V) ||X_*||₂.

(b) The spectral condition number of the invariant subspace M is defined as

csp(M) = ||Σ⊥||_F.

These definitions are independent of the choice of bases in M and M⊥. We
recall that

||X_*||₂ = ||P||₂ = ||(cos Ξ)⁻¹||₂ = (cos ξ_max)⁻¹

and that

||Σ⊥||_F = ||(B̂, B)⁻¹||_F ≥ δ⁻¹.

In the special case in which A is Hermitian or normal, we obtain the following:

Ξ = 0,   V is unitary,   cond₂(V) = 1,
P = P₁,   S = Σ⊥,   ||S||_F = δ⁻¹,

where S is the block-reduced resolvent. The eigenvalues are always well-conditioned;
an invariant subspace is ill-conditioned if and only if it corresponds
to a block of eigenvalues that are close to the rest of the spectrum.

Example 4.2.4 Consider the special case in which σ contains a single eigenvalue
λ of multiplicity m; then λ is globally well-conditioned when cond₂(V)||X_*||₂ is
moderate (note that we may put V = I_m if λ is semi-simple).
A defective eigenvalue is globally ill-conditioned when cond₂(V)||X_*||₂ is large (see
Exercise 4.2.2). On the other hand, it may be globally well-conditioned, in contrast
to the case in which it is treated individually (see Exercise 4.2.1).

4.2.3 The Study of csp (M)

Since csp(M) = ||(β,β) _ 1 || Ρ , the investigation carried out in Chapter 1, Section
1.12, applies generally: csp(M) depends on cond 2 (K), cond 2 (K) and δ, where V
is the Jordan basis (eigenbasis) of B in C m and V is a Jordan (eigen-) basis of B
in C n " m . We shall consider two cases:
(a) λ is a multiple eigenvalue: if λ is semi-simple, then B = A/m and csp (M) reduces
to ||Σ 1 ||F, which depends only on £and cond 2 (F). On the other hand, if λ is
defective, then B = VJVl and ||Σ Χ || Ρ depends also on cond 2 (V).

(b) σ consists of m neighbouring eigenvalues {jU,}7* The dependence of csp (M)

as a function of δand cond 2 (V) is not a new phenomenon. We are interested
here in the dependence of csp(M) as a function of cond 2 (K). Now B is
B = VDV~l = Q(RDR1)Q* = β Τ β * ,
and cond 2 (F) = cond 2 (K). The number cond 2 (V) is related to the departure
from normality of the matrix B and is therefore related to the norm of the
strictly upper-diagonal part of T.
When cond 2 (V) is large, there exists a matrix close to B that has a defective
eigenvalue of multiplicity m and whose Jordan vectors are almost parallel.

Example 4.2.5 Consider the eigenvalues {1, 0} of

A = ( 1   10⁴
      0   0 ).

The matrix of the eigenvectors of A is

V = ( 1   1
      0   −10⁻⁴ ),   cond₂(V) ≈ 10⁴.

It is easy to verify that

A' = ( 1          10⁴
       −10⁻⁴/4    0 )

is defective; it has ½ as a double eigenvalue, with the Jordan basis

V' = ( 1          0
       −10⁻⁴/2    10⁻⁴/2 ),

for which cond₂(V') ≈ 10⁴.
We remark that the departure from normality of A is of the order 10⁴ and that
the relative distance between the eigenvalues is 1/||A|| ≈ 10⁻⁴.
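A quick numerical check (a sketch; the perturbed matrix below is chosen with trace 1 and determinant ¼, so that ½ is an exact double eigenvalue):

```python
import numpy as np

A  = np.array([[1.0, 1e4], [0.0, 0.0]])
Ap = np.array([[1.0, 1e4], [-0.25e-4, 0.0]])   # trace 1, det 1/4: double eigenvalue 1/2

V = np.linalg.eig(A)[1]
assert np.linalg.cond(V) > 1e3                  # cond_2(V) of order 1e4

w = np.linalg.eigvals(Ap)
assert np.allclose(w, 0.5, atol=1e-4)           # defective double eigenvalue 1/2
assert np.linalg.norm(Ap - A) < 1e-4 * np.linalg.norm(A)   # tiny perturbation of A
```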

4.2.4 Balancing of a Non-normal Matrix

The number ||Χ„|| 2 is not invariant when A is subjected to a diagonal similarity
transformation. The notion which corresponds to the scaling of a matrix is the
balancing in relation to the eigenvalue problem. It consists in trying to find a
diagonal matrix D such that
\\D~lAD\\2= inf ||Δ"ΜΔ|| 2 .
From a practical point of view the value of ||XJ| 2 is significant if it corresponds
to a matrix Δ~ιΑΑ for which ||Δ - 1 ,4Δ|| 2 is close to its minimum.

Example 4.2.6 Let

A = ( 1   10⁴
      0   10⁻⁴ )   and   Δ = ( 1   0
                               0   10⁻⁴ ).

Then

A' = Δ⁻¹AΔ = ( 1   1
               0   10⁻⁴ ).

The eigenvectors of A are

X = ( 1   1
      0   −10⁻⁴ )

and those of A' are

X' = ( 1   1
       0   −1 ).

It can be verified that the balancing of A by Δ has diminished the condition
number of the basis of eigenvectors, as well as the departure from normality of
A (see Exercise 4.2.3).
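The effect of the balancing in Example 4.2.6 can be verified directly. A NumPy sketch, with Δ the diagonal matrix of the example:

```python
import numpy as np

A = np.array([[1.0, 1e4], [0.0, 1e-4]])
D = np.diag([1.0, 1e-4])                        # balancing transformation Delta
Ab = np.linalg.inv(D) @ A @ D                   # = [[1, 1], [0, 1e-4]]

condA  = np.linalg.cond(np.linalg.eig(A)[1])    # of order 1e4
condAb = np.linalg.cond(np.linalg.eig(Ab)[1])   # of order 1

# the similarity preserves the spectrum but shrinks cond of the eigenbasis
assert np.allclose(np.sort(np.linalg.eigvals(Ab).real),
                   np.sort(np.linalg.eigvals(A).real))
assert condAb < condA / 100
```

In practice one does not construct Δ by hand; library routines such as scipy.linalg.matrix_balance automate the choice of a diagonal scaling.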

4.2.5 Clustering of Eigenvalues

In order to compute ill-conditioned eigenvalues and/or eigenvectors we try to
group those eigenvalues that are 'tied' to one another, that is which are such that
they are strongly affected by a perturbation in order that the spectral condition
numbers are diminished as much as possible.
When A is normal, then csp (σ) = 1 and csp (M) = δ ~l: clustering neighbouring
eigenvalues diminishes csp(M).
When A is not normal, we have
csp(a) = cond 2 (K)||X,J 2 and csp(M)=||Z J
and the inequalities
1<II*J| 2 <1 + ΙΣ !
(5- 1 ^||I 1 || 2 ^cond 2 (K)(l+<5)^- 1 (5- (m=l),
where L is the maximal index of the eigenvalues of B (Proposition 1.12.4).
Grouping certain eigenvalues causes 1/Jand \\Χ+\\2 to diminish, but cond 2 (K)
and cond 2 (K) remain unchanged.

Example 4.2.7 Let

A = ( 1   1     0
      0   1−ε   0
      0   0     2 ).

If we group the two neighbouring eigenvalues {1, 1 − ε}, both the spectral condi­tion
numbers of the eigenvalues and of the eigenvectors pass from 1/ε to 1.

Example 4.2.8 The matrix

A = ( 1   10⁴   0
      0   0     0
      0   0     ½ )

has the separated eigenvalues {1, 0, ½}. The eigenvalues 1 and 0 are ill-conditioned
(condition number of the order 10⁴) while the corresponding eigenvectors

(1, 0, 0)ᵀ   and   (1, −10⁻⁴, 0)ᵀ

are well-conditioned (condition number of the order 1). In fact, the matrix

A' = ( 1          10⁴   0
       1.1×10⁻⁵   0     0
       2×10⁻⁵     0     ½ )

has the eigenvalues {1.1; −0.1; ½}. The first two eigenvectors are

(1, 10⁻⁵, ⅓×10⁻⁴)ᵀ   and   (1, −1.1×10⁻⁴, −⅓×10⁻⁴)ᵀ.

When we group the two ill-conditioned eigenvalues into σ = {0, 1}, we find
that csp(σ) ≈ 10⁴ and csp(M) ≈ 10⁴, where M = lin(e₁, e₂) is the associated in­variant
subspace. This is due to the fact that the matrix

B = ( 1   10⁴
      0   0 )

has as its matrix of eigenvectors

V = ( 1   1
      0   −10⁻⁴ ),

which has the property that cond₂(V) ≈ 10⁴.
Denote by X' the basis of the invariant subspace associated with σ' = {−0.1;
1.1} and normalized by Q*X' = I, where Q = [e₁, e₂]. The reader can verify that
the third row of X' contains an entry of order unity
(see Exercise 4.2.7). The grouping did not improve the spectral condition numbers,
because cond₂(V) remains unchanged. One may ponder on the apparent paradox
that makes it possible for two well-conditioned eigenvectors to generate an
ill-conditioned invariant subspace.
It should be remarked that the grouped eigenvalues {0, 1} are not consecutive:
the eigenvalue ½ lies between them. The relative distance 1/||A|| ≈ 10⁻⁴ is small.

4.2.6 More on Conditioning

It should become clear by now that the condition numbers we have introduced
so far have been chosen as the coefficients which can be arbitrarily large in the
first-order bounds derived from perturbation theory. This is because it is essential
to determine the circumstances under which a spectral computation can be ill-conditioned,
that is, when a small perturbation of the data induces a large perturbation
of the output. We have seen that, as a rule, a large departure from normality is
at the root of unremovable spectral ill-conditioning.
A rigorous general theory of conditioning is beyond the scope of this book.
The interested reader is referred to Rice (1966), Geurts (1982) and Fraysse (1992).
See also Exercises 4.2.13 to 4.2.19.
It is important to realize that notions such as stability and conditioning are
relative to the choice of measures adopted to quantify the perturbations. In
particular, the instability induced by 'normwise' perturbations ΔA such that
||ΔA|| ≤ ε||A||, for a given ε > 0 and for an arbitrary matrix norm, can be much
greater than that induced by 'componentwise' perturbations ΔA such that |ΔA| ≤
ε|A|, where the inequalities are defined componentwise. The latter type of pertur­bation,
which in particular preserves the sparsity pattern of A, is often more
appropriate for studying the influence of finite-precision arithmetic on algorithms.
Our next remark concerns the spectral condition number of an eigenvector
(or an invariant subspace). If x is an eigenvector, then αx, α ≠ 0, is also an
eigenvector. The non-uniqueness of the definition of an eigenvector is reflected
in its condition number, as we illustrate now. We suppose that the eigenvalue is
simple (m = 1, see Figure 4.2.2). If one is merely interested in the eigendirection,
the stability of the computation of x can be analysed by looking at tan θ = ||Δx||₂,
where Δx = x' − x = −Σ⊥ΔAx lies in M⊥ = lin(x)⊥. This gives csp(x) = ||Σ⊥||₂,
and corresponds to the normalization x*x' = 1 on the computed eigenvector x'.
In this case the eigenvalue and eigenvector condition numbers are not related.
However, one may wish to have a different normalization, for example y*x' = 1,
where y is an arbitrary vector, non-orthogonal to x. With this choice, the
perturbation Δx lies in the subspace W = lin(y)⊥, and Δx = −ΣΔAx, where Σ is
the corresponding partial inverse. The condi­tion
number is now ||Σ||₂. In particular, one may choose for y the left eigenvector
x_*; then Δx = −SΔAx lies in the complementary invariant subspace W = lin(x_*)⊥.
The resulting condition number ||S||₂ was proposed by Wilkinson in 1965. It is
large as soon as the eigenvalue condition number ||P||₂ = ||x_*||₂ is large. The two
condition numbers are no longer independent. See Example 4.2.8 and Exercises
4.2.8 and 4.2.9.

Example 4.2.9 Consider the eigenvector problem associated with a homogeneous
irreducible Markov chain. The stationary state π satisfies

π = πP,   ||π||₁ = 1,   π ≥ 0,

as shown in Chapter 3, Section 3.2, equation (3.2.2). The matrix P is stochas­tic
and irreducible: the simple eigenvalue 1 is associated with the eigenvector
e = (1,…,1)ᵀ. Since π is a row of probabilities, the condition ||π||₁ = Σᵢπᵢ = 1 is
the normalization of choice. It can be rewritten as πe = 1. We suppose that
P + ΔP remains stochastic; hence, to first order,

π'e = 1,   Δπ = π' − π = πΔP(I − P)^#,

where S = (I − P)^# is the reduced resolvent associated with I − P. In the Markov
chain literature, S is often referred to under the equivalent name of group inverse
(because 1 is simple; see Exercise 2.3.7).
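A small numerical check of this first-order formula (a sketch; the chain and the perturbation are illustrative assumptions, and the group inverse is formed through the classical fundamental-matrix identity (I − P)^# = (I − P + eπ)⁻¹ − eπ):

```python
import numpy as np

def stationary(P):
    """Stationary row vector: pi (I - P) = 0 together with pi e = 1."""
    n = P.shape[0]
    M = np.vstack([(np.eye(n) - P).T, np.ones(n)])
    rhs = np.zeros(n + 1); rhs[-1] = 1.0
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])
pi = stationary(P)
e = np.ones(3)
group_inv = np.linalg.inv(np.eye(3) - P + np.outer(e, pi)) - np.outer(e, pi)

dP = 1e-6 * np.array([[-1.0,  1.0,  0.0],    # rows sum to 0: P + dP stays stochastic
                      [ 0.0,  1.0, -1.0],
                      [ 1.0, -1.0,  0.0]])
dpi_exact = stationary(P + dP) - pi
dpi_first = pi @ dP @ group_inv               # first-order prediction
assert np.linalg.norm(dpi_exact - dpi_first) < 1e-3 * np.linalg.norm(dpi_exact)
```

The discrepancy between the exact and the predicted change is of second order in ||ΔP||.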
One way to illustrate the global eigenvalue ill-conditioning is to look at the
following set.

Definition The normwise ε-pseudo-spectrum of A is the set

σ_ε = {λ∈ℂ; λ is an eigenvalue of A + ΔA, ||ΔA|| ≤ ε||A||}.

If σ_ε is much larger than sp(A), then the eigenvalues of A are globally ill-conditioned
with respect to a normwise perturbation. One can similarly define
a componentwise ε-pseudo-spectrum (see Example 4.2.11).
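The definition lends itself to a Monte-Carlo illustration: sample perturbations with ||ΔA||₂ = ε||A||₂ and superpose the spectra of A + ΔA. A sketch on a nilpotent Jordan block, whose ε-pseudo-spectrum is a disk of radius roughly ε^{1/n}, far larger than ε (for dense grids one can instead use the 2-norm characterization σ_min(A − λI) ≤ ε||A||₂):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 8, 1e-8
A = np.diag(np.ones(n - 1), 1)          # nilpotent Jordan block: sp(A) = {0}

pts = []
for _ in range(200):
    E = rng.standard_normal((n, n))
    E *= eps * np.linalg.norm(A, 2) / np.linalg.norm(E, 2)   # ||E||_2 = eps * ||A||_2
    pts.extend(np.linalg.eigvals(A + E))

radius = max(abs(z) for z in pts)
# the sampled pseudo-spectrum has radius ~ eps**(1/n), here ~ 0.1 >> eps
assert radius > 10 * eps
```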

4.2.7 More on Non-normality

The influence of non-normality has been a recurring theme throughout this
section on spectral stability. The departure from normality (abbreviated d.f.n.)
has been quantified in Chapter 1, Section 1.12, by v(A) or ||N|| F . It is related to
cond (X), where X is the Jordan (or eigen-) basis of A, depending on whether A
is defective or diagonalisable.
Indeed, an arbitrary matrix has (at least) two canonical forms.
(a) The Schur form A = Q(D + JV)Q*, where N is the strictly upper triangular
part of the Schur form and D is the diagonal of eigenvalues. Since Q is unitary,
cond 2 (Q)= 1: the normalization is borne by the Schur basis Q. However,
|| N ||F can be arbitrarily large.
(b) The Jordan form A = X(D + K)X⁻¹, where K is a matrix whose only non­zero
elements (located on the first superdiagonal) are unity: the normalization
is now borne by K, whereas cond₂(X) can be arbitrarily large.

We wish to warn the reader that, for a diagonalisable matrix, a large cond(X),
where X is the eigenbasis, is not always an indication of a large d.f.n.

Example 4.2.10 Consider the 2 × 2 matrix

A = ( a   c
      0   b )

for c > 0, of Example 1.12.2. When b ≠ a, A is diagonalisable and

X = ( 1   c
      0   b − a ),

and when b = a, A is under the Jordan form

( a   1
  0   a ),   with   X' = ( 1   0
                           0   1/c ).

The d.f.n. is ν(A) = √2 c √(c² + (b − a)²). When c is fixed and moderate, then
cond(X) → ∞ when b → a, and ν(A) decreases towards √2 c². In this case, the
large cond(X) merely indicates that A is close to being a defective matrix with a
moderate d.f.n.
On the other hand, if c → ∞, then the d.f.n. ν(A) ≥ √2 c² → ∞; cond(X) and
cond(X') tend to infinity, according as b ≠ a or b = a.

We conclude with a spectacular example of the effect of an increasing d.f.n.

Example 4.2.11 Let S_ν be the real Schur form of order n = 2p whose diagonal
consists of the 2 × 2 blocks

( x_k    y_k
  −y_k   x_k ),   k = 1,…,p,

and whose entries above these diagonal blocks are filled with the parameter ν.
It is easy to check that the d.f.n. ν(S_ν) is larger than √(n − 2) ν², so that it increases
as the parameter ν increases, when n is fixed.
The real values x_k, y_k have been chosen so that the eigenvalues x_k ± iy_k lie on
the parabola x = −10y², that is

x_k = −(2k − 1)²/1000,   y_k = (2k − 1)/100   (k = 1,…,p).

Now let A_ν = QS_νQ, where Q is the symmetric orthonormal matrix consisting of
the eigenvectors of the second-order difference matrix:

q_ij = √(2/(n + 1)) sin(ijπ/(n + 1))   (i, j = 1,…,n).

The matrix A_ν has the same spectrum and d.f.n. as S_ν. For ν = 1, 10, 10², 10³,
and n fixed equal to 20, the following computations on A_ν were performed by
means of the QR algorithm (see Chapter 5), under MATLAB* on a workstation
working with a machine precision of the order of 2 × 10⁻¹⁶. Figure 4.2.5 shows
the exact (+) and computed (°) spectra of the four 20 × 20 matrices A_ν. The
increasing instability of the spectrum is clear as ν increases. The * represents the
exact and computed means of the eigenvalues; they are equal to within machine
precision, as a consequence of Corollary 4.2.5.
Apart from two of them, 18 of the 20 computed eigenvalues lie, for ν = 10² and
10³, on a circle centred at this arithmetic mean. This suggests that the matrix A_ν
behaves approximately like one Jordan block of size 18, completed by two
diagonal elements.
In order to test this hypothesis, we compute the spectra of a sample of matrices
A' = A + ΔA, randomly perturbed from A, in the following componentwise way.

*MATLAB is a numeric computation system, trademark of The MathWorks Inc.

Figure 4.2.5 Exact and computed spectra

Thus a_ij becomes a'_ij = a_ij(1 + αt), where α is a random variable taking the values
±1 with probability ½ and t = 2⁻ᵏ, the integer k varying from 40 to 50. Therefore
|ΔA| = t|A|, where t ranges from 2⁻⁴⁰ ≈ 10⁻¹² to 2⁻⁵⁰ ≈ 10⁻¹⁵. For each t, the
sample size is 30, so that the total number of matrices is 30 × 11 = 330. The
superposition of the corresponding 330 spectra is plotted in Figure 4.2.6. The
transformation of the perturbed spectra as ν varies is dramatic. The 36 spikes
around λ̄ for ν = 10² and 10³ confirm the hypothesis that A_ν becomes computa­tionally
close to a Jordan form as ν increases. The computed eigenvalues λ' are
solutions of (λ' − λ̄)¹⁸ = ε = O(t), ε being positive or negative with equal probability.
The computed eigenvalues appear at the vertices of two families of regular
polygons with 18 sides, which are symmetric with respect to the vertical axis,



Figure 4.2.6 Perturbed spectra

hence the 36 spikes. Note that, although the spectra for v = 102 and 103 are
qualitatively similar, the one for v = 103 is more ill-conditioned than the one for
v = 102. The fact that the matrix approximately behaves like a Jordan block of
size 18 (rather than 20, for example) is a consequence of the particular structure
of 5V. It remains true under normwise perturbations (Fraysse 1992).
The perturbed spectra are part of the componentwise ε pseudo-spectrum, with
ε = t = 2⁻ᵏ. However, the information on the underlying Jordan structure, in finite
arithmetic, of A_ν (which is diagonalisable in exact arithmetic) is too detailed to
be retrievable by looking at the global pseudo-spectrum. The interested reader
can find other suggestive examples in Chatelin (1989).
Example 4.2.11 shows that truly difficult problems arise when the d.f.n. is
unbounded under the variation of some parameter. The parameter can be merely
the size n of the matrix, as is the case when the matrix is the discretization of a
highly non-normal operator.
The true difficulty of computing the spectrum in the absence of normality has
been known to mathematicians for a long time. Its practical implications should
not be overlooked, since such examples appear in essential industrial applications,
as well as in physics: fluid dynamics and plasma physics, for example.


Let λ be an eigenvalue of A of algebraic multiplicity m, geometric multiplicity g
and index ℓ. Then

M = Ker(A − λI)^ℓ   and   E = Ker(A − λI)

are the associated invariant and eigen-subspaces respectively.
Put A' = A + H, where ||H|| = ε; c will denote a generic constant.

Lemma 4.3.1 When ε is sufficiently small, then, in a given neighbourhood of λ,
there exist m eigenvalues {μ'ᵢ}₁ᵐ of A' (each counted with its algebraic multiplicity).

PROOF Let Γ be a closed Jordan curve isolating λ and lying in res(A). Consider

P' − P = −(1/2πi) ∮_Γ [R'(z) − R(z)] dz,
R'(z) = (A' − zI)⁻¹,
R(z) = (A − zI)⁻¹,
R'(z) − R(z) = R'(z)(A − A')R(z).

Hence

||P' − P|| ≤ (meas Γ/2π) max_{z∈Γ} [||R'(z)|| ||R(z)||] ||H|| = cε,

where meas Γ denotes the Lebesgue* measure of the curve Γ (see Exercise 4.3.1).
If ε is such that ||P' − P|| < 1, then

dim Im P' = dim Im P = m,

and A' has m eigenvalues μ'ᵢ inside Γ.
We define M' = Im P', the subspace invariant under A' and associated with
{μ'ᵢ}₁ᵐ.
*Henri Lebesgue, 1875-1941, born at Beauvais, died in Paris.


Corollary 4.3.2 When ε is sufficiently small, then

ω(M, M') = O(ε).

PROOF We have

ω(M, M') ≤ max [||(P − P')P||₂, ||(P − P')P'||₂] ≤ cε

by virtue of Lemma 4.3.1 and the fact that all norms are equivalent. In particular,
if x∈M, then dist(x, M') = O(ε), and if x'∈M', then dist(x', M) = O(ε).

Lemma 4.3.3 If ε is sufficiently small, then P' defines a bijection of M on to M'.

PROOF Let F be the map P'↾M : M → M'. Suppose that x∈M and ||x|| = 1. Then

|1 − ||P'x||| = | ||Px|| − ||P'x|| | ≤ ||(P − P')Px|| ≤ ||(P − P')P|| ≤ ½

if ε is sufficiently small. Hence

||F|| ≤ 3/2   and   ||F⁻¹|| ≤ 2.

We note that A↾M and F⁻¹A'F are maps of M into itself. Let B = Y*AX and
B' = Y*F⁻¹A'FX be square matrices of order m representing these maps in a
chosen basis X of M and an adjoint basis Y.

Lemma 4.3.4 We have

sp(B) = {λ},   sp(B') = {μ'ᵢ}₁ᵐ.

PROOF It is evident that sp(B) = {λ}. Let φ'∈M' be an eigenvector of A' associated
with μ'; thus A'φ' = μ'φ', and hence

(F⁻¹A'F)F⁻¹φ' = μ'F⁻¹φ',

that is

B'η' = μ'η',

where η' = Y*F⁻¹φ' and η' ≠ 0 because F⁻¹φ' ≠ 0. We conclude that sp(B') = {μ'ᵢ}₁ᵐ.

Choose an arbitrary vector ξ∈ℂᵐ and put y = Xξ∈M. Then

(B − B')ξ = Y*(Ay − F⁻¹A'Fy) = Y*F⁻¹P'(A − A')Xξ,

since FAy = P'Ay (Ay lies in M) and P'A' = A'P'. It follows that ||B − B'|| ≤ cε.

Let f be a function of a complex variable z, and suppose that f is holomorphic
in a neighbourhood of λ, which constitutes the spectrum of B. By using Cauchy's
integral formula (2.2.6) we define

f(B) = −(1/2πi) ∮_Γ f(z)(B − zI)⁻¹ dz.

Lemma 4.3.5

(1/m)|tr [f(B') − f(B)]| ≤ ||f(B') − f(B)|| ≤ c||B' − B||.


PROOF We use the identity

(B' − zI)⁻¹ − (B − zI)⁻¹ = (B' − zI)⁻¹(B − B')(B − zI)⁻¹.

Hence

||f(B') − f(B)|| ≤ (meas Γ/2π) max_{z∈Γ} [|f(z)| ||(B' − zI)⁻¹|| ||(B − zI)⁻¹||] ||B − B'||,

because, for sufficiently small ε, the contour Γ contains sp(B'). Finally, we apply
the inequality

(1/m)|tr C| ≤ ||C||

to the matrix C = f(B') − f(B).

Theorem 4.3.6 When ε is sufficiently small we have

(a) maxᵢ |λ − μ'ᵢ| = O(ε^{1/ℓ}), (4.3.1)

(b) |λ − λ̄'| = O(ε), (4.3.2)

where λ̄' = (1/m) Σᵢ₌₁ᵐ μ'ᵢ.

PROOF
(a) Let f(z) = (z − λ)^ℓ, which is holomorphic in a neighbourhood of λ. Since
(z − λ)^ℓ(B − zI)⁻¹ is holomorphic inside Γ, it follows that

f(B) = (B − λI)^ℓ = 0.

For any vectors x and x' of ℂᵐ we have the identity

f(B')x' − f(B)x = [f(B') − f(B)]x + f(B')(x' − x).

If η' is an eigenvector of B' associated with μ', we have

f(B')η' = (μ' − λ)^ℓ η'.

Taking x = x' = η', we deduce that

|λ − μ'|^ℓ ||η'||₂ ≤ ||f(B') − f(B)||₂ ||η'||₂ ≤ cε,

by virtue of Lemma 4.3.5, because f(B) = 0 on ℂᵐ, which is invariant under B and associated
with λ.
(b) Choose f(z) = z and apply Lemma 4.3.5. Thus

(1/m)|tr(B' − B)| = |λ̄' − λ| ≤ ||B' − B|| = O(ε).

Let λ' be one of the eigenvalues of A' inside Γ. Put

E_k = Ker(A − λI)^k,   E'_j = Ker(A' − λ'I)^j,

1 ≤ j ≤ k ≤ ℓ ≤ m.

Theorem 4.3.7 Let x'_j∈E'_j and ||x'_j|| = 1. Then, for sufficiently small ε,

dist(x'_j, E_k) = O(ε^{(k−j+1)/ℓ}).

PROOF Let F_k be a subspace of ℂⁿ such that

ℂⁿ = E_k ⊕ F_k,

and let Π be the projection on E_k along F_k. Since Ker(A − λI)^k ∩ F_k = {0}, the equation

(A − λI)^k x = y

has at most one solution x in F_k.
Let x'_j∈E'_j and ||x'_j|| = 1. Put x_k = Πx'_j, and so

x'_j − x_k = (I − Π)x'_j ∈ F_k,
x'_j − x_k = [(A − λI)^k↾F_k]⁻¹(A − λI)^k(x'_j − x_k),
||x'_j − x_k|| ≤ c||(A − λI)^k(x'_j − x_k)|| = c||(A − λI)^k x'_j||.

Consider the identities

(A − λI)^k − (A' − λI)^k = Σᵢ₌₀^{k−1} (A' − λI)^i (A − A')(A − λI)^{k−1−i},

(A' − λI)^k = [(A' − λ'I) + (λ' − λ)I]^k = Σᵢ₌₀^k C_k^i (λ' − λ)^i (A' − λ'I)^{k−i}.

We deduce that

||(A − λI)^k x'_j − (A' − λI)^k x'_j|| ≤ cε.

Also, when j ≤ k, we have

||(A' − λI)^k x'_j|| = || Σᵢ₌ₖ₋ⱼ₊₁^k C_k^i (λ' − λ)^i (A' − λ'I)^{k−i} x'_j || ≤ c|λ − λ'|^{k−j+1},

because (A' − λ'I)^{k−i} x'_j = 0 when i ≤ k − j. Since |λ − λ'| ≤ cε^{1/ℓ}, we have

dist(x'_j, E_k) ≤ c||[(A − λI)^k − (A' − λI)^k]x'_j + (A' − λI)^k x'_j||₂ ≤ c(ε + ε^{(k−j+1)/ℓ}) = O(ε^{(k−j+1)/ℓ}),

since k − j + 1 ≤ ℓ.

Remark In particular, when j = k = 1, then x'₁ is an eigenvector of A' and

E₁ = E = Ker(A − λI),   dist(x'₁, E) = O(ε^{1/ℓ}).

The distance between the eigenvectors is of the order of the distance between
λ and the individual eigenvalues, while the distance between the invariant
subspaces is of the order of the distance between λ and the arithmetic mean λ̄'.

Proposition 4.3.8 When ε is sufficiently small, we have

minᵢ |λ − μ'ᵢ| = O(ε^{g/m}). (4.3.3)

PROOF We remark that (4.3.3) is an improvement of (4.3.1) only if g/m > 1/ℓ.
Suppose now that m < gℓ, which is satisfied when the Jordan box B_λ, associated
with λ, contains g blocks of different sizes.
We choose a Jordan basis of B in M. Then B' is similar to C = B_λ + εK, where
εK is the perturbation matrix induced by H, and we may suppose that
||K|| = O(1). We shall show that there exists at least one eigenvalue μ' of C (and
hence also of B') such that |λ − μ'| ≤ cε^{g/m}.
Consider the characteristic polynomial

π(μ') = det(μ'I − C + λI),

whose zeros are the eigenvalues μ'ᵢ − λ of the matrix C − λI. The constant term
is the product of the roots:

π(0) = ± Πᵢ₌₁ᵐ (μ'ᵢ − λ).

The Jordan box B_λ contains g − 1 zeros, not necessarily consecutive, on its
superdiagonal and m − 1 − (g − 1) = m − g units. The constant term in π(μ')
cannot contain terms in ε^k, where k < g, because m − g + k = m is equivalent to
k = g. On the other hand, there exists at least one perturbation for which the
constant term contains a term in ε^g. Hence

Πᵢ₌₁ᵐ |μ'ᵢ − λ| = O(ε^g),

and there exists necessarily at least one μ' such that |λ − μ'| ≤ cε^{g/m}.

Remark If A is Hermitian or normal, then g = m, ℓ = 1, E = M and 1/ℓ = g/m = 1.

The bounds which we have established determine only the order relative to ε.
Next we shall establish a posteriori bounds, which will enable us to estimate the
actual size of the error.

We use the knowledge of approximate eigenelements in order to obtain bounds
for the errors. In order to be useful, these bounds must be reasonably easy to compute
as functions of the known eigenelements.

In this section, the norm ||·|| on ℂⁿ is supposed to be monotonic, that is, for each
diagonal matrix D = diag(λ₁,…,λₙ) the induced norm satisfies ||D|| = maxᵢ|λᵢ|.
The Euclidean norm and the norm ||·||_∞ are monotonic.
The definition of monotonicity we have given is equivalent to the following
characterization. Let x = (ξᵢ) and y = (ηᵢ) be vectors in ℂⁿ; then ||·|| is monotonic
if and only if |ξᵢ| ≤ |ηᵢ| (i = 1,…,n) implies that ||x|| ≤ ||y|| (see Horn and Johnson,
1990, p. 310). Let α be a scalar and let u be a vector such that ||u|| = 1. In order
to test how accurately these data represent eigenelements of A, it is natural to
consider the residual vector r = Au − αu for A based on α and u. If A is diagonalis­able,
then A = XDX⁻¹; if not, then A = XJX⁻¹, where J is the Jordan form of A.

Theorem 4.4.1 Let α and u be given, where ||u|| = 1. Put r = Au − αu. Then there
exists an eigenvalue λ of A, of index ℓ ≥ 1, such that:

(a) If A is diagonalisable,

|λ − α| ≤ cond(X)||r||. (4.4.1)

(b) If A is not diagonalisable,

|λ − α|^ℓ / (1 + |λ − α|)^{ℓ−1} ≤ cond(X)||r||. (4.4.2)

PROOF The result is trivial when r = 0. We suppose that D − αI or J − αI,
respectively, is regular.

(a) If A is diagonalisable, we have

r = Au − αu = X(D − αI)X⁻¹u,
1 = ||u|| = ||X(D − αI)⁻¹X⁻¹r|| ≤ cond(X)||r|| max_{λ∈sp(A)} |λ − α|⁻¹.

(b) When A is defective, we have

1 ≤ cond(X)||r|| max_{λ∈sp(A)} (1 + |λ − α|)^{ℓ−1} / |λ − α|^ℓ,

by virtue of a bound established in the proof of Proposition 1.12.4, where λ
is an eigenvalue of A of index ℓ. We remark that when λ is defective (ℓ > 1),
the maximum is not necessarily attained for an eigenvalue λ that is closest
to α. We remark also that (4.4.2) often reduces to (4.4.1) when λ is semi-simple.

Theorem 4.4.2 When A' = A + H, then for each eigenvalue λ' of A', there exists
an eigenvalue λ of A, of index ℓ ≥ 1, such that:

(a) If A is diagonalisable, then

|λ' − λ| ≤ cond(X)||H||. (4.4.3)

(b) If A is not diagonalisable, then

|λ' − λ|^ℓ / (1 + |λ' − λ|)^{ℓ−1} ≤ cond(X)||H||. (4.4.4)

PROOF The assertion is a simple consequence of Theorem 4.4.1, if we choose
for α the eigenvalue λ' of A' and for u an associated eigenvector x' such that
||x'|| = 1; then

Ax' − λ'x' = r = −Hx'.

The inequality (4.4.3) is known as the Bauer-Fike (1960) theorem.

Example 4.4.1 Let

A = ( 2   −10¹⁰
      0    2 ),   α = 1,   u = ( 1
                                 10⁻¹⁰ ),

where ||u||_∞ = 1 and ||Au − αu||_∞ = 10⁻¹⁰.
Nevertheless, α = 1 is not close to the double eigenvalue 2 (!); one suspects the
factor cond(X) that appears in (4.4.1). The reader will verify that A = XJX⁻¹ with

X = ( 1   1         J = ( 2   1       X⁻¹ = ( 1   10¹⁰
      0   −10⁻¹⁰ ),       0   2 ),            0   −10¹⁰ ).

The number cond(X) = ||X|| ||X⁻¹|| is the condition number (with respect to
inversion) of the matrix X of the Jordan basis (of eigenvectors) of A. This number
is often taken as a measure of the conditioning of the spectrum of a matrix A; this
is justified by Theorem 4.4.2. When A is normal, cond₂(X) = 1; this confirms the
fact that the eigenvalues of a normal matrix are always well-conditioned.
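The Bauer-Fike bound (4.4.3) is easy to verify numerically. A sketch (the matrix and the random perturbation are illustrative assumptions; cond₂(X) is computed from the eigenvector basis returned by NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
# a diagonalisable but non-normal matrix and a small perturbation H
A = np.array([[2.0, 1e3,  0.0],
              [0.0, 1.0,  0.0],
              [0.0, 0.0, -1.0]])
H = 1e-8 * rng.standard_normal((3, 3))

w, X = np.linalg.eig(A)              # A = X diag(w) inv(X)
bound = np.linalg.cond(X, 2) * np.linalg.norm(H, 2)   # cond_2(X) * ||H||_2

wp = np.linalg.eigvals(A + H)
# every perturbed eigenvalue lies within `bound` of some eigenvalue of A
worst = max(min(abs(l2 - l1) for l1 in w) for l2 in wp)
assert worst <= bound
```

Here cond₂(X) is of order 10³, so the guaranteed enclosure is three orders of magnitude wider than ||H||₂.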
Theorem 4.4.1 furnishes bounds that are valid whatever the value of ||r||.
However, in order to estimate cond(X) for a non-normal matrix, we have to
know an approximation to a Jordan basis (or to a basis of eigenvectors). When
||r|| is sufficiently small we shall show, in what follows, that there are bounds that
require only the knowledge of a single approximate eigenvector. We shall present
this result in the more general case when an approximate basis of an invariant
subspace of dimension m is known.
Two situations will be treated successively:

(a) Only the approximate invariant subspace is known, and we associate with it
the Rayleigh quotient matrix.
(b) The approximate invariant subspace that is known belongs to a neighbour­ing
matrix A' = A + H, which is also known.

Suppose we know U = [u1, ..., um], which is a basis of a subspace close to
the invariant subspace M; we suppose further that the vectors are normalized by

Y*U = I,

where Y is a given matrix. The matrices U and Y are augmented to [U, Û] and
[Y, Ŷ] so as to become adjoint bases of ℂ^n. Relative to these bases, A takes the
form

Ā = (B  S*; R0  D),

where B = Y*AU is the Rayleigh quotient of A constructed upon the matrices U
and Y, and

D = Ŷ*AÛ,  R0 = Ŷ*AU  and  S* = Y*AÛ.

The right residual matrix is defined as

R = AU - UB = (I - UY*)AU = ÛR0.
Similarly, the left residual matrix is given by

Y*A - BY* = Y*A(I - UY*) = S*Ŷ*.
If U is the basis of an approximate invariant subspace, then ||R|| is small. We
shall establish precise bounds as a function of ||R||, where ||·|| now denotes the
spectral norm or the Frobenius norm. First, we introduce the following notation:
X is the basis of the subspace M invariant under A, normalized by Y*X = I;
B̂ = Y*AX represents the map A restricted to M with respect to the adjoint bases X
and Y; Σ = Û(D, B)^-1 Ŷ* is the partial block inverse with respect to sp(B), which is
defined if and only if dist[sp(D), sp(B)] > 0; Θ = diag(θi), i = 1, ..., m, is the
diagonal matrix of the canonical angles between M and the subspace spanned by U.
We put

γ = ||Σ||,  ρ = ||R||,  s = ||S||  and  w = ||ΣR||.

Theorem 4.4.3 If sp(D) ∩ sp(B) = ∅ and if γsw < 1/4, then there exists a basis X of
M normalized by Y*X = I such that

||U - X|| ≤ g(ε)w  and  ||B - B̂|| ≤ g(ε)sw,

where ε = γsw and 1 ≤ g(ε) ≤ 2.

PROOF It suffices to apply Corollary 2.11.3. Since R satisfies the equation
Y*R = 0, we have ΣR = Û(D, B)^-1 R0. The function g was defined in the formula (2.11.2).

Corollary 4.4.4 If sp(D) ∩ sp(B) = ∅ and if ρ < 1/(4γ²s), then

||U - X|| ≤ 2γρ  and  ||B - B̂|| ≤ 2γsρ.

PROOF The assertion is a simple consequence of the inequalities w ≤ γρ and
ε = γsw ≤ γ²sρ < 1/4, which ensures g(ε) ≤ 2.

If the basis U is chosen to be orthonormal, then for p = 2, F, ||U - X||_p =
||tan Θ||_p, when X is normalized by U*X = I; moreover, ||Σ||_p = ||(D, B)^-1||_p.
Since 1 ≤ g(ε) ≤ 2, the knowledge of γ = ||Σ|| is necessary only to ensure that the
condition ε < 1/4 is satisfied.
The bounds of Theorem 4.4.3 are optimal with regard to the data U and Y;
this will be demonstrated in the following example.

Example 4.4.2 In ℝ² let

A = (0  -a; a  1/b),   a, b ≠ 0,

and let u = y = e1. Then e1ᵀAe1 = 0 and e1e1ᵀ is the orthogonal projection on the
direction lin{e1}. We have

||Σ|| = γ = |b|,  S*ΣR = -a²b,  ||ΣR|| = w = |ab|,  s = |a|.

The eigenvalues of A are λ = (1/2b)(1 ± √(1 - 4a²b²)). Let θ be the acute angle
between e1 and the eigenvector associated with the eigenvalue λ of smaller
modulus. Then

tan θ = |λ|/|a|.

We put ε = γsw = (ab)². Then the bounds given in Theorem 4.4.3 are attained:

|λ| = g(ε)|b|a²  and  tan θ = g(ε)|ab|.
The bounds that were established in Theorem 4.4.3 can be computed from the
solution W of the equation
(I-UY*)AW-WB = R,
and they apply when p is sufficiently small. It should be borne in mind that it is
w and not p that serves as a good indicator for the quality of the approximation
of X by U and of B by B. If Y also happens to be an approximate basis of the
left invariant subspace, then s, too, is small and the Rayleigh quotient B
approximates B to the second order 0(sw).
In the very special case in which A is almost triangular, it is possible to obtain
bounds that can be computed from the eigenvalues without knowledge of the
approximate eigenvectors (Exercise 4.4.3).

We now suppose that we know the matrix A' = A + H, close to A, for which M'
is an exact invariant subspace with a given orthonormal basis Q'. We put

B' = Q'*A'Q'  and  B̂' = Q̂'*A'Q̂',

where [Q', Q̂'] is a unitary completion of Q'; thus A'Q' = Q'B'.
Let Σ' = Q̂'(B̂', B')^-1 Q̂'*; the right residual matrix for A, based upon B' and Q',
is defined as

AQ' - Q'B' = (A - A')Q' = -HQ'.

We put γ' = ||Σ'||, s = ||Q'*AQ̂'||, t = ||Q'|| and u = ||Q̂'||; Θ is the diagonal
matrix of canonical angles between M and M'.

Theorem 4.4.5 If sp(B̂') ∩ sp(B') = ∅ and if ||H|| < (1/4)[γ'(1 + tu + 2γ'st)]^-1, then

there exists a basis X of M, normalized by Q'*X = I, such that

||X - Q'|| ≤ 2||Σ'HQ'||  and  ||B̂ - B'|| ≤ 2s||Σ'HQ'|| + ||Q'*HQ'||,

where B̂ = Q'*AX.

PROOF Consider the map

G' : V ↦ V1 + Σ'[VQ'*A'V + HV - VQ'*HQ'],

where V1 = -Σ'HQ'. It can be shown that, when ||H|| γ'(1 + tu + 2γ'st) < 1/4,

(a) the sequence {Vk} defined in (2.11.3), with Y = X' = Q', satisfies ||Vk|| ≤ 2||Σ'HQ'||;
(b) G' is a contraction map in a closed ball ℬ;
(c) G' has a unique fixed point V in ℬ which satisfies

X = Q' + V,  AX = XB̂,  Q'*X = I.

For a proof the reader is referred to Exercise 2.11.2. It is easy to deduce that

||X - Q'|| = ||V|| ≤ 2||Σ'HQ'||.
On the other hand, the identity

B̂ - B' = Q'*AX - Q'*A'Q'
       = Q'*A(X - Q') + Q'*(A - A')Q'
       = Q'*AQ̂'Q̂'*(X - Q') - Q'*HQ'

furnishes a bound for ||B̂ - B'||.

Corollary 4.4.6 With the Euclidean norm and under the hypothesis that

||H||_2 < (1/8)[γ'(1 + γ's)]^-1,

we have the inequalities

||X - Q'||_2 ≤ 2γ'||H||_2  and  ||B̂ - B'||_2 ≤ (1 + 2γ's)||H||_2.

PROOF This is immediate since ||Q'||_2 = ||Q̂'||_2 = 1; hence t = u = 1.

The bounds established in Theorem 4.4.5 and Corollary 4.4.6 can be computed
from the solution W' = Σ'HQ' of the equation

(I - Q'Q'*)A'W' - W'B' = (I - Q'Q'*)HQ'.

Finding an estimate for γ' can turn out to be costly when the matrix A' is not
normal. In Section 4.6 we shall get to know the simplifications brought about
by the assumption that A' is Hermitian.

Having established the a posteriori bounds for ||B - B̂|| and ||B̂ - B'||, one would
like to deduce bounds for the distance between their respective spectra. This is
a very difficult question for which no solution has yet been found that is
sufficiently simple to be altogether satisfactory.
In Exercise 4.3.2 the following qualitative result will be established:

dist[sp(A), sp(A')] ≤ c||A - A'||^(1/n),

if ||A - A'|| is small enough, where n is the order of A and of A'. The constant c
is difficult to estimate in the general case (see, for example, Ostrowski, 1957) and
is of little use in practice because it increases with the order n of the matrix. The
exponent 1/n is deleted when A and A' are assumed to be diagonalisable. In
particular, when A and A' are Hermitian,

dist[sp(A), sp(A')] ≤ ||A - A'||_2,

as we shall see in Section 4.6.
The simplicity of such a result is preserved even in the context of arbitrary
matrices if we replace the distance between the spectra by the distance between
the arithmetic means of the eigenvalues:

λ̄ = (1/n) tr A,   λ̄' = (1/n) tr A'.

In fact,

|λ̄ - λ̄'| = (1/n)|tr(A - A')| ≤ ||A - A'||.
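The trace inequality above is easy to check in a few lines. The sketch below (plain Python, not from the original text; the matrices are an arbitrary small example) compares the gap between the arithmetic means of the spectra with the Frobenius norm of A - A', which is a computable upper bound for the spectral norm:

```python
# |mean(sp A) - mean(sp A')| = |tr(A - A')|/n <= ||A - A'||.
# Triangular matrices are used so the eigenvalues are the diagonal entries.
n = 3
A  = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 5.0]]
Ap = [[2.1, 1.0, 0.0], [0.0, 2.9, 1.0], [0.0, 0.0, 5.2]]

diff = [[A[i][j] - Ap[i][j] for j in range(n)] for i in range(n)]
mean_gap = abs(sum(diff[i][i] for i in range(n))) / n   # |tr(A - A')|/n
fro = sum(diff[i][j]**2 for i in range(n) for j in range(n)) ** 0.5
```

Here mean_gap = 0.2/3 while the Frobenius norm of the difference is √0.06, so the inequality holds with room to spare.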
Let us now return to the matrices B̂, B and B' in which we are interested. We
denote their spectra by

{μi}1^m,  {ζi}1^m  and  {μ'i}1^m,

respectively, and their arithmetic means by

μ̄ = (1/m) Σi μi,   ζ̄ = (1/m) Σi ζi   and   μ̄' = (1/m) Σi μ'i.

We deduce immediately that

|μ̄ - ζ̄| ≤ ||B - B̂||  and  |μ̄ - μ̄'| ≤ ||B̂ - B'||.

In particular, when m = 1,

|λ - ζ| ≤ 2||A||_2 ||Σr||_2,   r = Au - ζu,  u*u = 1,  ζ = u*Au,
|λ - λ'| ≤ 2||A||_2 ||Σ'Hq'||_2 + ||Hq'||_2,   q'*q' = 1.
We conclude this section on bounds for a posteriori errors by connecting it
with a related problem, namely the localization of the eigenvalues of a matrix,
that is the determination of the region in the complex plane in which the required
eigenvalues are situated. The localization obtained depends on the data at our
disposal. For example, if we wish to find the simple eigenvalue λ of A, the localization problem might take the following form.
Given the vectors u and y such that y*u = 1, and also the complex number
σ = y*Au, is it possible to determine the radius of the smallest disk centred at σ
(or u) and containing λ (or x), in such a way that Ax = λx, x ≠ 0? We may also
wish to obtain localization results by starting with a given vector u and a given
scalar σ. The results we shall establish will provide partial answers to this
question in the general context when it is desired to localize together a set of m
eigenvalues. In Section 4.6 we shall return to this problem of localization in the
case of Hermitian matrices.


For a diagonal matrix, the bases X = Y = I_n are the bases of right and of left
eigenvectors respectively. We shall establish localization results for the eigenvalues
of a matrix that is close to a diagonal matrix; this will be based on the following
general result.

Theorem 4.5.1 Each eigenvalue of A = (a_ij) lies in at least one of the Gershgorin
disks

{z : |z - a_ii| ≤ Σ_{j≠i} |a_ij|}   (i = 1, ..., n).

PROOF Write A = D + H, where D = diag(a_ii) is the diagonal of A and H is its

off-diagonal part. The theorem is true when λ = a_ii for at least one index i.
Suppose now that λ ≠ a_ii, i = 1, ..., n. Then λI - D is regular and we have

λI - A = λI - D - H = (λI - D)[I - (λI - D)^-1 H].

The condition ||(λI - D)^-1 H|| < 1 is sufficient for λI - A to be regular. Hence
for each eigenvalue λ of A we have ||(λI - D)^-1 H||_∞ ≥ 1, that is,
max_i Σ_{j≠i} |a_ij|/|λ - a_ii| ≥ 1, which yields the result.

Corollary 4.5.2 If each pair of the n Gershgorin disks has an empty intersection,
then each disk contains exactly one eigenvalue of A, which is therefore simple.

PROOF By virtue of the hypothesis, the a_ii are distinct. Put A(ε) = D + εH, where
0 ≤ ε ≤ 1. When ε = 0, the disks reduce to the points a_ii. By continuity, as ε
increases, each disk contains one eigenvalue as long as the disks remain disjoint.
When applied to a matrix A which is almost diagonal, that is, for which
||H||_∞ = max_i Σ_{j≠i} |a_ij| is small, the above results enable us to find bounds for
|λ - a_ii| provided that the disks are disjoint. In some cases of non-empty inter-
sections we can use diagonal similarity transformations on A in order to render
the disks disjoint (see Exercises 4.5.1 and 4.5.2).
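Computing the Gershgorin disks is a one-liner per row. The following sketch (plain Python, not part of the original text; the 2 x 2 symmetric test matrix is an illustration, with exact eigenvalues 1.9 and 2.1) builds the disks of Theorem 4.5.1 and checks that every eigenvalue lies in at least one of them:

```python
def gershgorin_disks(A):
    """Return the list of (centre, radius) of the Gershgorin disks of A."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

# Almost-diagonal symmetric example: eigenvalues 2 - 0.1 and 2 + 0.1.
A = [[2.0, 0.1], [0.1, 2.0]]
disks = gershgorin_disks(A)
eigs = [1.9, 2.1]
# small tolerance to absorb floating-point rounding at the disk boundary
ok = all(any(abs(lam - c) <= r + 1e-12 for (c, r) in disks) for lam in eigs)
```

For this matrix both disks are centred at 2 with radius 0.1, so the two eigenvalues sit exactly on the boundary, as Theorem 4.5.1 permits.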
We shall now establish a result that enables us to treat the case of a matrix A
that is close to a block-diagonal matrix.

Theorem 4.5.3 For any ordering of the eigenvalues μi of A, put

μ̄ = (1/m) Σ_{i=1}^{m} μi.

By suitably permuting the rows and the columns of A it is possible to obtain a
partitioned form

(A11  A12; A21  A22),

in which A11 is a block of order m such that

|tr A11 - m μ̄| ≤ Σ_{j,k} |(A12)_jk|,

the sum of the moduli of the elements of A12.
PROOF Let T be the Schur form of A:

AQ = QT,

and let X be the n by m matrix formed by the first m columns of Q: AX = XT11,
so that

tr T11 = tr X*AX = m μ̄.

Let

X(i1, ..., im; 1, ..., m)

be the m by m matrix extracted from X by selecting the elements in the rows
i1, ..., im and the columns 1, 2, ..., m.
We permute the rows of X in such a way that

X = (X11; X21),

where X11 is the square matrix with the property that

|det X11| = max |det X(i1, ..., im; 1, ..., m)|

for all possible choices of i1, ..., im from the set 1, ..., n.

We partition A in the same fashion; thus

A11 X11 + A12 X21 = X11 T11,
A11 + A12 X21 X11^-1 = X11 T11 X11^-1.

It follows that

tr A11 + tr A12 Y21 = m μ̄,  where  Y21 = X21 X11^-1  or  Y21 X11 = X21.

By Cramer's* rule, the general element of Y21 is

y_kj = det X(1, ..., j-1, k, j+1, ..., m; 1, ..., m)/det X11

* Gabriel Cramer, 1704-1752, born in Geneva, died at Bagnols-sur-Cèze.


(k = m + 1, ..., n; j = 1, ..., m). By virtue of the choice of X11, we have |y_kj| ≤ 1,

and so

|tr A12 Y21| = |Σ_{j=1}^{m} Σ_{k=m+1}^{n} a_jk y_kj|
            ≤ max |y_kj| Σ_{j} Σ_{k} |a_jk| ≤ Σ_{j,k} |(A12)_jk|.

We leave it to the reader to apply this result to a matrix that is close to a

block-diagonal matrix (see Exercise 4.5.3).

Numerous simplifications occur in this context, which enable us to obtain more
precise results.

4.6.1 The Rayleigh Quotient ρ = u*Au, u*u = 1

We recall what Theorem 4.4.1 becomes: if α and u are given such that ||u||_2 = 1
and r = Au - αu, then there exists an eigenvalue λ of A such that |λ - α| ≤ ||r||_2. The
Rayleigh quotient ρ = u*Au, constructed with u, possesses the following optimality
property.

Lemma 4.6.1 When A is Hermitian, the problem

min_z ||Au - zu||_2,  u fixed,  u*u = 1,

is solved by z = ρ = u*Au.

PROOF We have

||Au - zu||_2² = (u*A* - z̄u*)(Au - zu)
             = u*A*Au - z u*A*u - z̄ u*Au + z̄z
             = u*A*Au - u*Au u*A*u + |u*Au - z|²,

the minimum of which is attained for z = u*Au = ρ.
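Lemma 4.6.1 can be observed numerically. The sketch below (plain Python, not from the original text; the symmetric matrix and unit vector are arbitrary illustrations) computes the Rayleigh quotient ρ and checks that the residual ||Au - zu||_2 is never smaller than at z = ρ over a small scan of nearby values:

```python
import math

# Fixed unit vector u and symmetric A; z = rho = u^T A u minimizes ||Au - z u||_2.
A = [[2.0, 1.0], [1.0, 3.0]]
u = [1.0/math.sqrt(2.0), 1.0/math.sqrt(2.0)]

Au = [A[0][0]*u[0] + A[0][1]*u[1], A[1][0]*u[0] + A[1][1]*u[1]]
rho = Au[0]*u[0] + Au[1]*u[1]                    # Rayleigh quotient, here 3.5

def resid(z):
    return math.hypot(Au[0] - z*u[0], Au[1] - z*u[1])

eps_min = resid(rho)
scan_ok = all(resid(rho + d) >= eps_min for d in (-1.0, -0.1, 0.1, 1.0))
```

Since Au - ρu is orthogonal to u, the residual satisfies resid(ρ + d)² = eps_min² + d², which is what the scan confirms.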
In particular, Theorem 4.4.1 now asserts that, given u such that ||u||_2 = 1, there
exists an eigenvalue λ of A with the property that

|λ - ρ| ≤ ||Au - ρu||_2.

This property is often attributed to Krylov (Krylov and Bogolioubov, 1929) and
Weinstein (1934).
If additional information is available about the distance of p from the other
eigenvalues of A, then the Krylov-Weinstein inequality can be improved; this
will now be done after a preliminary lemma. We put

ε = ||Au - ρu||_2.

Lemma 4.6.2 Let a and b be two real numbers such that a < ρ < b and suppose
that the open interval (a, b) contains no eigenvalues of A. Then

ε² ≥ (b - ρ)(ρ - a).
PROOF We have A = QDQ*, where Q is a basis of eigenvectors of A. Put v = Q*u;

then ||v||_2 = ||u||_2 = 1. If

u = Σi ξi xi,

where the xi are the eigenvectors of A, then

v = Σi ξi ei,

where ξi = xi*u. We now have

(Au - bu)*(Au - au) = (Dv - bv)*(Dv - av)
                    = Σi (μi - b)(μi - a)|ξi|²
                    ≥ 0,

because (μi - b)(μi - a) ≥ 0 for all i. On the other hand, a direct computation gives

(Au - bu)*(Au - au) = ε² + (ρ - b)(ρ - a),

whence the result.

We denote by θ the acute angle made by the direction of u and the eigen-
subspace M associated with λ (see Figure 4.6.1).

Figure 4.6.1

Theorem 4.6.3 Suppose the open interval (λ̲, λ̄) contains ρ and precisely one
eigenvalue λ of A. Then

ρ - ε²/(λ̄ - ρ) ≤ λ ≤ ρ + ε²/(ρ - λ̲),    (4.6.1)

sin θ ≤ [(ρ - (λ̲ + λ̄)/2)² + ε²]^(1/2) / [(λ̄ - λ̲)/2].    (4.6.2)

PROOF Apply Lemma 4.6.2.

(a) If λ < ρ < λ̄, put a = λ and b = λ̄. Then ε² ≥ (λ̄ - ρ)(ρ - λ), so that

ρ - ε²/(λ̄ - ρ) ≤ λ < ρ.

(b) If λ̲ < ρ < λ, put a = λ̲ and b = λ. Then

ρ < λ ≤ ρ + ε²/(ρ - λ̲).

Hence (4.6.1) is always true when λ̲ < ρ < λ̄. With the notations of Lemma 4.6.2
we have

(Au - λ̲u)*(Au - λ̄u) = ε² + (ρ - λ̲)(ρ - λ̄)
                    = (Dv - λ̲v)*(Dv - λ̄v)
                    = Σ_{i=1}^{n} (μi - λ̲)(μi - λ̄)|ξi|².

By hypothesis (z - λ̲)(z - λ̄) ≥ 0 when z ranges over sp(A), except when z = λ.
Suppose that μ1 = ··· = μm = λ. We can choose a basis of eigenvectors in M such
that

ξ1 = cos θ,  ξ2 = ··· = ξm = 0.

Retaining only the negative terms of the sum, we conclude that

ε² + (ρ - λ̲)(ρ - λ̄) ≥ (λ - λ̲)(λ - λ̄) cos²θ,

and so

sin²θ = 1 - cos²θ ≤ 1 + [ε² + (ρ - λ̲)(ρ - λ̄)] / [(λ - λ̲)(λ̄ - λ)].

Writing c = (λ̲ + λ̄)/2 and h = (λ̄ - λ̲)/2, so that (ρ - λ̲)(ρ - λ̄) = (ρ - c)² - h²
and (λ - λ̲)(λ̄ - λ) = h² - (λ - c)², this reads

sin²θ ≤ [(ρ - c)² + ε² - (λ - c)²] / [h² - (λ - c)²] ≤ [(ρ - c)² + ε²] / h²

(the last step holds when (ρ - c)² + ε² ≤ h²; otherwise (4.6.2) is trivially true),
whence (4.6.2) is readily deduced.

Corollary 4.6.4 Let

δ = dist[ρ, sp(A) - {λ}] = min{|ρ - μ| : μ ∈ sp(A) - {λ}}.

If ε < δ, then

|λ - ρ| ≤ ε²/δ  and  sin θ ≤ ε/δ.

PROOF Apply Theorem 4.6.3 with

λ̲ = ρ - δ  and  λ̄ = ρ + δ.
The inequality (4.6.1) is known as the Kato-Temple inequality (Kato, 1949;
Temple, 1928). It improves the Krylov-Weinstein inequality when ε² < (λ̄ - ρ)·
(ρ - λ̲). This inequality is often used together with a priori bounds λ̲ and λ̄.

Example 4.6.1 Let

A = (1      10^-5  10^-5
     10^-5  2      10^-5
     10^-5  10^-5  3).

On putting u = ei (i = 1, 2, 3) we obtain |λi - i| ≤ √2 × 10^-5 by using the
Krylov-Weinstein inequality. Next, on applying (4.6.1) with δ = 1 - √2 × 10^-5 we
obtain

|λi - i| ≤ (2 × 10^-10)/(1 - √2 × 10^-5) ≈ 2 × 10^-10.
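The gain promised by Example 4.6.1 can be verified end to end. The sketch below (plain Python, not part of the original text; the matrix is that of the example and the eigenvalue near 1 is located by bisection on the characteristic polynomial) computes both bounds and checks that the Kato-Temple bound is the one that is actually sharp:

```python
import math

# Example 4.6.1 data: u = e1 gives rho = 1, eps = sqrt(2)*1e-5.
t = 1e-5
A = [[1.0, t, t], [t, 2.0, t], [t, t, 3.0]]

rho = A[0][0]                                  # Rayleigh quotient for u = e1
eps = math.hypot(A[1][0], A[2][0])             # ||A e1 - rho e1||_2
delta = 1.0 - eps                              # lower bound on the gap to sp(A)-{lam}

def charpoly(x):
    """det(A - x I) for this 3x3 symmetric matrix (cofactor expansion)."""
    a, b, c = A[0][0] - x, A[1][1] - x, A[2][2] - x
    return a*(b*c - t*t) - t*(t*c - t*t) + t*(t*t - t*b)

# bisection for the eigenvalue near 1 (sign change on [1 - 1e-3, 1 + 1e-3])
lo, hi = 1.0 - 1e-3, 1.0 + 1e-3
for _ in range(200):
    mid = 0.5*(lo + hi)
    if charpoly(lo)*charpoly(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
lam = 0.5*(lo + hi)

kw_bound = eps                # Krylov-Weinstein: |lam - rho| <= eps
kt_bound = eps*eps/delta      # Kato-Temple:      |lam - rho| <= eps^2/delta
```

The true gap |λ1 - 1| is about 1.5 × 10^-10, well inside the Kato-Temple bound of roughly 2 × 10^-10 and five orders of magnitude below the Krylov-Weinstein bound.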

4.6.2 The Matrix Rayleigh Quotient B = Q*AQ, Q*Q = I

We shall now turn our attention to the approximation of a set σ of m eigenvalues
of A by means of the spectrum of a Hermitian matrix C of order m. Let Q be a
basis consisting of m orthonormal vectors. The residual matrix associated with
C and Q is given by R(C) = AQ - QC. The spectrum of C is denoted by {αi}1^m.

Theorem 4.6.5 There exists an ordered set of m eigenvalues {μi}1^m of A such that

(a) max_i |μi - αi| ≤ ||R(C)||_2,    (4.6.3)

(b) Σi (μi - αi)² ≤ ||R(C)||_F².    (4.6.4)


PROOF
(a) Let G = [Q, Q̂] be an orthonormal basis of ℂ^n. Then write

Ā = G*AG = (B  S0*; S0  E),

where B = Q*AQ, S0 = Q̂*AQ and E = Q̂*AQ̂. On the other hand,

R̄(C) = G*R(C) = (B - C; S0),

so that

||R̄(C)||_2 = ||R(C)||_2.

We now adopt the basis defined by G and apply the dilation theorem (see
Exercise 4.6.1) to the matrix R̄(C); thus we construct the matrix

H = H* = (B - C  S0*; S0  W),

where W = W* is chosen in such a way that

||H||_2 = ||R̄(C)||_2.

The spectrum of the matrix

Ā - H = (C  0; 0  E - W)

is denoted by {αi}1^n, with the proviso that

{αi}1^m = sp(C).

By Weyl's theorem (Exercise 1.9.6) there is a set {μi}1^n of eigenvalues of A
such that

|μi - αi| ≤ ||H||_2 = ||R(C)||_2   (i = 1, ..., m).
(b) We diagonalise A and C; thus

V*AV = D = diag(μi),
U*CU = Λ = diag(αi),

and we put

Q̃ = V*QU,  R̃(C) = V*R(C)U,  so that  R̃(C) = DQ̃ - Q̃Λ.

We may therefore confine ourselves to the case in which A and C are the diagonal
matrices D and Λ respectively. We put

w_ij = |q̃_ij|²,  W = (w_ij),

d_ij = (μi - αj)²  when 1 ≤ j ≤ m, 1 ≤ i ≤ n,  and  d_ij = 0  when j > m

(the basis Q̃ being completed to a unitary matrix of order n, so that W is doubly
stochastic). Put R̃ = R̃(C) = DQ̃ - Q̃Λ. Then

||R̃||_F² = tr R̃*R̃ = tr(Q̃*D²Q̃ - ΛQ̃*DQ̃ - Q̃*DQ̃Λ + Λ²)
        = Σ_{j=1}^{m} (Σ_{i=1}^{n} μi² w_ij - 2αj Σ_{i=1}^{n} μi w_ij + αj²)
        = Σ_{j=1}^{m} Σ_{i=1}^{n} w_ij (μi - αj)²,  because Σ_{i=1}^{n} w_ij = 1,
        = Σ_{i} Σ_{j} w_ij d_ij,  because d_ij = 0 when j > m.

We wish to find a matrix W = (w_ij) which minimizes ||R̃||_F² under the
constraint that W be doubly stochastic, when A and C are fixed and {μi}1^m is a
set of eigenvalues of A. Hoffman and Wielandt (1953) have shown that the
minimum is attained at a permutation matrix W of order n. In fact, it is easy to
verify that the minimum of

Σ_{i} Σ_{j} w_ij d_ij

over the permutation matrices is attained for a suitable numbering of the
eigenvalues, giving

Σ_{j=1}^{m} (μj - αj)² ≤ ||R(C)||_F².

In particular, suppose that the basis Q is chosen to be the basis of eigenvectors
Q' of A' = A + H, and that C is taken to be B' = Q'*A'Q'. Then

R(B') = (A - A')Q' = -HQ',

and if {μ'i} = sp(B'), there exists a numbering of the eigenvalues of A such that

max_i |μi - μ'i| ≤ ||HQ'||_2,   Σi (μi - μ'i)² ≤ ||HQ'||_F².
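These two perturbation bounds are easy to test on a symmetric 2 x 2 example, where the eigenvalues are available in closed form. The sketch below (plain Python, not from the original text; the matrix entries are arbitrary illustrations) checks the Hoffman-Wielandt-type inequality Σ(μi - μ'i)² ≤ ||H||_F² and the spectral-norm analogue:

```python
import math

def sym2_eigs(a, b, c):
    """Eigenvalues (ascending) of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    half_tr = 0.5*(a + c)
    disc = math.hypot(0.5*(a - c), b)
    return half_tr - disc, half_tr + disc

# A' = A + H with a small symmetric perturbation; entries are (a11, a12, a22).
A = (2.0, 0.3, 1.0)
H = (0.01, -0.02, 0.03)
Ap = tuple(x + y for x, y in zip(A, H))

mu = sym2_eigs(*A)
mup = sym2_eigs(*Ap)

h_fro = math.sqrt(H[0]**2 + 2*H[1]**2 + H[2]**2)   # ||H||_F (off-diagonal twice)
sq_sum = sum((x - y)**2 for x, y in zip(mu, mup))  # sum of (mu_i - mu'_i)^2
max_gap = max(abs(x - y) for x, y in zip(mu, mup))
```

Both ascending orderings pair the eigenvalues optimally here, so the sums come out well within the Frobenius bound.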

Next, we shall show that the Rayleigh quotient B = Q*AQ, constructed on Q,

possesses the following extremal property.

Lemma 4.6.6 The matrix B = Q*AQ represents the minimum of ||AQ - QZ||_p

(p = 2 or F), when Q is fixed and such that Q*Q = I and when Z ranges over ℂ^(m×m).

PROOF We have

(AQ - QZ)*(AQ - QZ) = Q*A²Q - Z*B - BZ + Z*Z
                    = Q*A²Q + (B - Z)*(B - Z) - B²
                    = (AQ - QB)*(AQ - QB) + (B - Z)*(B - Z).

Put

F = AQ - QZ,  G = AQ - QB,  H = B - Z.

(a) ||F||_2² = ρ(F*F); since H*H is positive semi-definite, it follows that
ρ(F*F) ≥ ρ(G*G) (see Exercise 1.9.1).

(b) ||F||_F = tr^(1/2)(F*F); we have tr(F*F) ≥ tr(G*G).

In both cases the minimum is attained when Z = B.
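Lemma 4.6.6 lends itself to a direct numerical spot check in the Frobenius norm. The sketch below (plain Python, not from the original text; the symmetric matrix is arbitrary and Q is simply the first two columns of I_3, so that B is the leading 2 x 2 block of A) compares ||AQ - QB||_F with the residual for a few perturbed matrices Z:

```python
# B = Q^T A Q minimizes ||AQ - QZ||_F over 2x2 matrices Z; spot check.
A = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # orthonormal basis of span{e1, e2}

def resid_fro(Z):
    s = 0.0
    for i in range(3):
        for j in range(2):
            aq = sum(A[i][k]*Q[k][j] for k in range(3))
            qz = sum(Q[i][k]*Z[k][j] for k in range(2))
            s += (aq - qz)**2
    return s ** 0.5

B = [[4.0, 1.0], [1.0, 3.0]]               # Q^T A Q: the leading block of A
best = resid_fro(B)
perturbed_ok = all(
    resid_fro([[B[0][0]+d0, B[0][1]+d1], [B[1][0]+d2, B[1][1]+d3]]) >= best
    for (d0, d1, d2, d3) in [(0.1, 0, 0, 0), (0, -0.2, 0, 0), (0, 0, 0.3, 0),
                             (0, 0, 0, -0.1), (0.05, 0.05, -0.05, 0.05)])
```

With this Q the minimal residual is exactly the norm of the off-block row (0.5, 0.2), in line with the orthogonal decomposition used in the proof.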
The interested reader can refer to Exercises 4.6.5 and 4.6.6 where extremal
results with regard to the approximation of the eigenelements of A by those of
B will be found; these results complement Lemma 4.6.6. As in the preceding
section, the inequalities of Theorem 4.6.5 can be sharpened for C = B, if we
possess supplementary information about sp(B).
Suppose we know the spectral decomposition of B: B = VΔV*, where
Δ = diag(ρi). Put U = QV; then

R = AU - UΔ  and  ||R||_F = (Σ_{i=1}^{m} εi²)^(1/2),

where εi = ||Aui - ρi ui||_2.

Theorem 4.6.7 Let (λ̲, λ̄) be an open interval that contains sp(B) and precisely the
set σ of eigenvalues of A. Then there exists a numbering {μi}1^m of these eigenvalues
such that

ρi - Σj εj²/(λ̄ - ρj) ≤ μi ≤ ρi + Σj εj²/(ρj - λ̲).

PROOF The reader is referred to Kato's article (1949). We remark that in the
denominators there are no quantities of the type ρi - ρj, which could be small in
the case of neighbouring eigenvalues.
The above inequalities use the knowledge of a basis of eigenvectors of B.
We deduce from them other inequalities that are less precise but require only a
knowledge of \\R\\F and (A, I).

Corollary 4.6.8 Under the assumptions of Theorem 4.6.7 we have

ρi - ||R||_F²/min_j(λ̄ - ρj) ≤ μi ≤ ρi + ||R||_F²/min_j(ρj - λ̲)

(i = 1, ..., m).

PROOF This is evident from Theorem 4.6.7.

Corollary 4.6.9 Let

δ = dist[sp(B), sp(A) - σ].

Then, for i = 1, ..., m,

|μi - ρi| ≤ ||R||_F²/δ.

PROOF Evident.

The various results that we have enunciated in this section generalize those
established in Section 4.6.1 which deal with the relationship between a single
eigenvalue and the Rayleigh quotient. In order to complete the analogy it only
remains for us to mention the results which enable us to find bounds for the
matrix of the canonical angles between eigenspaces.

4.6.3 The Inequalities of Davis-Kahan

Among the results proved in Davis and Kahan (1968) we mention those that are
most relevant for our purpose.
(a) σ is approximated by sp(B) = {ρi}1^m and Θ denotes the diagonal matrix of
canonical angles between the subspace generated by the given basis Q and
the eigenspace associated with σ. Let R = R(B).

Theorem 4.6.10 Let

δ = dist[sp(B), sp(A) - σ].

Then

||sin Θ||_p ≤ ||R||_p/δ   (p = 2 or F).
(b) σ' is approximated by sp(B') = {μ'i}1^m and Θ' denotes the diagonal matrix of
canonical angles between the eigenspaces of A and A' respectively. Let
R' = R(B').

Theorem 4.6.11 Let

δ' = dist[sp(B'), sp(A) - σ'].

Then

||sin Θ'||_p ≤ ||R'||_p/δ'   (p = 2 or F).
The interested reader is most strongly advised to refer to the article by Davis
and Kahan (1968), which contains a large number of other results.
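A sin θ bound of this type can be watched at work on a 2 x 2 symmetric matrix, where the exact eigenvector angle is computable in closed form. The sketch below (plain Python, not from the original text; the matrix and the choice Q = e1 are illustrations) checks sin θ ≤ ||R||_2/δ in the setting of Theorem 4.6.10 with m = 1:

```python
import math

# sigma = {lam1} approximated by sp(B) = {2} with Q = e1.
d = 0.1
A = [[2.0, d], [d, 1.0]]

B = A[0][0]                                   # Rayleigh quotient for Q = e1
R_norm = abs(A[1][0])                         # R = A e1 - e1 B = (0, d)^T

half_tr = 0.5*(A[0][0] + A[1][1])
disc = math.hypot(0.5*(A[0][0] - A[1][1]), d)
lam1, lam2 = half_tr + disc, half_tr - disc   # lam1 is the eigenvalue near 2

delta = abs(B - lam2)                         # dist[sp(B), sp(A) - sigma]

# exact angle between e1 and the eigenvector for lam1:
# (A - lam1 I)x = 0  =>  x2/x1 = (lam1 - A[0][0]) / d
tangent = (lam1 - A[0][0]) / d
sin_theta = abs(tangent) / math.sqrt(1.0 + tangent*tangent)
bound = R_norm / delta
```

For d = 0.1 the bound 0.0990 is nearly attained by the true value 0.0985, showing how sharp the Davis-Kahan estimate can be.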

4.6.4 The Importance of an Orthonormal Basis

The residual matrix R(C) = AQ - QC is computed from an orthonormal basis
Q. In fact, the bound (4.6.3) can no longer be guaranteed in the absence of ortho-

normality between the basis vectors. Let U be a basis consisting of m linearly

independent unit vectors that are not orthogonal. The singular values of U lie
between 0 and √m; the least singular value of U, say σ_min(U), is a good indicator
of the linear independence of the columns of U (see Chapter 1, Section 1.8). Let
R(C, U) = AU - UC be the residual matrix associated with C and U.
Theorem 4.6.12 For given matrices C and U, there exists an ordered set of m
eigenvalues of A such that

(a) max_i |μi - αi| ≤ (√2/σ_min(U)) ||R(C, U)||_2,

(b) Σi (μi - αi)² ≤ (1/σ_min²(U)) ||R(C, U)||_F².

PROOF We shall apply Theorem 4.6.5. Let

U = Y(D; 0)X*

be the singular value decomposition of U, where D = diag(σ1, ..., σm) and
Y and X are unitary matrices of orders n and m respectively. Put

Ã = Y*AY,  C̃ = X*CX,  R̃(C, U) = Y*R(C, U)X.

Then

R̃(C, U) = ÃŨ - ŨC̃,  Ũ = Y*UX = (D; 0),

and

||R̃(C, U)||_p = ||R(C, U)||_p   (p = 2 or F).

Consider the partitioning

Ã = (B'  S'*; S'  E'),

so that

R̃(C, U) = (B'D - DC̃; S'D).

(a) The matrix DC̃D^-1 has the same spectrum {αi}1^m as C. On applying (4.6.3)
to Ã with the orthonormal basis (I_m; 0) and the matrix DC̃D^-1, we obtain

max_i |μi - αi| ≤ ||L||_2,  L = (B' - DC̃D^-1; S').

Now

||L||_2² ≤ ||B' - DC̃D^-1||_2² + ||S'||_2²,

and if σ = min(σ1, ..., σm) = σ_min(U), we have

σ||S'||_2 ≤ ||S'D||_2,  σ||B' - DC̃D^-1||_2 ≤ ||B'D - DC̃||_2.

Hence

||L||_2² ≤ (1/σ²)(||B'D - DC̃||_2² + ||S'D||_2²) ≤ (2/σ²)||R̃||_2².

(b) It is clear that

||R̃||_F² = ||B'D - DC̃||_F² + ||S'D||_F².

It can also be shown that

||MD||_F ≥ σ_min(D)||M||_F

(see Exercise 4.6.13). Hence

||R̃||_F² ≥ σ²(||B' - DC̃D^-1||_F² + ||S'||_F²) = σ²||L||_F².

On applying (4.6.4) to Ã we obtain

Σi (μi - αi)² ≤ ||L||_F² ≤ (1/σ²)||R(C, U)||_F².
The notion of spectral conditioning given here for a matrix that is not necessarily
diagonalisable generalizes the notion for a simple eigenvalue given by Wilkinson
(1965) or Golub and Van Loan (1989).
The thorough analysis of the influence of the departure from normality on the
spectral stability of matrices is presented for the first time.
The adopted framework is the traditional normwise stability analysis. The
componentwise analysis has been actively developed in the 1980s. See Geurts
(1982) and Fraysse (1992) for a theoretical account, and the LAPACK Users'
Guide (1992) for a more practical viewpoint.
The random componentwise real perturbation of the matrix, defined in
Example 4.2.11, has been used in Chatelin (1989) to explore the topological
neighborhood of a Jordan form. Trefethen (1991) has used similar complex
perturbations in connection with normwise pseudo-spectra.
The influence of non-normality is spectacularly exemplified in the two
following PhD theses: Godet-Thobie (1992) contains an industrial application in
aeronautics and Reddy (1991) studies the spectrum of the Orr-Sommerfeld
operator for the stability of parallel shear flows. Another example of spectral
instability is described by Kerner (1989). It concerns the Alfvén spectrum for
decreasing resistivity in magnetohydrodynamics.

Example 4.2.8 and Theorem 4.4.3 were inspired by Stewart (1971). The analysis
of a priori errors presented in Section 4.3 was previously given by Wilkinson
(1965) using Gershgorin's disks. The inequalities (4.4.2) and (4.4.4) were inspired
by Kahan, Parlett and Jiang (1982). The a posteriori bounds given in Theorem
4.4.5 are new. Theorem 4.5.3 was proved by Ruhe (1970a). The proof of Theorems
4.6.5 and 4.6.12, due to W. Kahan, is here published for the first time following
his report 'Inclusion theorems for clusters of eigenvalues of Hermitian matrices',
Computer Science Department, University of Toronto, Ontario (1967), kindly
supplied by the author.


Section 4.1 Revision of the Conditioning of a System

4.1.1 [A] Let A be a regular matrix. Show that there exists a matrix ΔA of
rank 1 such that A + ΔA is singular and

||ΔA||_2 = 1/||A^-1||_2.
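A sketch of the construction behind this exercise (plain Python, not from the original text; the diagonal test matrix is an illustration): subtracting σ_min times the corresponding singular pair produces a rank-one perturbation of norm 1/||A^-1||_2 that makes A singular.

```python
# For the diagonal matrix A = diag(3, 1): sigma_min = 1, and the singular
# direction is e2, so DeltaA = -sigma_min * e2 e2^T is rank 1.
A = [[3.0, 0.0], [0.0, 1.0]]
sigma_min = 1.0                      # smallest singular value of this diagonal A
DeltaA = [[0.0, 0.0], [0.0, -sigma_min]]

A2 = [[A[i][j] + DeltaA[i][j] for j in range(2)] for i in range(2)]
det_A2 = A2[0][0]*A2[1][1] - A2[0][1]*A2[1][0]   # 0: A + DeltaA is singular
inv_norm = 1.0/sigma_min             # ||A^-1||_2 = 1/sigma_min for diagonal A
```

In general one takes ΔA = -σ_min u v* with (u, v) the singular pair of σ_min; the diagonal case makes the mechanism visible at a glance.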

4.1.2 [C] Compute cond_2(A) in the following cases:

(a) A = (…  …; 0  ε),

(b) A = (…),

(c) A = (1  10^k; 0  ε).

4.1.3 [D] Consider a rectangular matrix A ∈ ℂ^(n×m), where n ≥ m, and suppose
that the columns of A are linearly independent. Define

κ2(A) = max{||Ax||_2 : ||x||_2 = 1} / min{||Ax||_2 : ||x||_2 = 1}.

Denote the singular values of A by σi.

(a) Prove that if m = n, then κ2(A) = cond_2(A).
(b) Prove that

κ2(A) = cond_2^(1/2)(A*A) = (max σi² / min σi²)^(1/2).

(c) Let

A = (…).

Compute κ2(A).
4.1.4 [D] Let A' = A + εH, where ||H|| = 1 with respect to the induced norm
||·||. Suppose that A is a regular matrix such that ε||A^-1|| < 1.

Prove that A' is regular and that

cond(A') ≤ cond(A)(1 + 2ε||A^-1||) + O(ε²).

4.1.5 [C] The row scaling of the system Ax = b consists in solving the equivalent
system D^-1 Ax = D^-1 b, where D is a diagonal matrix such that all the rows of
D^-1 A have approximately the same norm relative to ||·||_∞. Investigate the effect
of the scaling (in arithmetic with base 10 rounded to three decimal places),
when the data are

A = (10  10^5; 1  1)

and when we take D = diag(10^-4, 1).

Section 4.2 Stability of a Spectral Problem

4.2.1 [A] Let λ be an eigenvalue of A and let δ be the distance of λ from the
rest of the spectrum of A. Let M be the eigenspace associated with λ and let Q
be an orthonormal basis of M^⊥. Define

B = Q*AQ,  Σ1 = Q(B - λI)^-1 Q*.

Let l be the index of the eigenvalue of B which is nearest to λ. Prove that, if δ is
sufficiently small, then

δ^-1 ≤ ||(B - λI)^-1||_2 = ||Σ1||_2 ≤ 2 cond_2(V) δ^-l,

where V is the Jordan basis of B.
4.2.2 [A] Let A ∈ ℂ^(n×n), ε ∈ ℂ and H ∈ ℂ^(n×n) such that ||H||_2 = 1. Define

A(ε) = A + εH.

Let λ be a non-zero eigenvalue of A whose algebraic multiplicity is m. Let M be

the associated subspace and Φ an orthonormal basis of M. Define

θ = Φ*AΦ,
λI_m + N = J = V^-1 θV, the Jordan form of θ,
λI_m + Ñ = T = Q̃*θQ̃, the Schur form of θ.

Suppose that λ is of index l.
(a) Prove that, for all k ≥ 0,

||N^k||_2 ≤ cond_2(V).

(b) Prove that, for sufficiently small ε, there exist Φ*, Φ(ε) and θ(ε) such that

θ(ε) = Φ**A(ε)Φ(ε),
A*Φ* = Φ*θ*,
||θ(ε) - θ||_2 ≤ ||P||_2 |ε| + O(ε²),

P being the spectral projection associated with λ.
(c) Let λ(ε) be the eigenvalue of θ(ε) that is closest to λ and suppose that λ(ε) ≠ λ.
Prove that |λ(ε) - λ|^l = O(|ε|) and deduce the behaviour of |λ(ε) - λ|^l/|ε| as
ε → 0.

(d) Prove that, for sufficiently small ε,

|λ(ε) - λ| ≤ [l cond_2(V) ||P||_2 |ε|]^(1/l) + O(|ε|^(2/l)).

(e) Compare this with the property 4.2.3 (page 155).
4.2.3 [A] Let

A = (…)  and  Δ = (…).

Verify that the balancing of A by Δ decreases the condition number of the basis
of eigenvectors as well as the departure from normality.
4.2.4 [B:25] Let x and x* be the right and left eigenvectors associated with a
simple eigenvalue of a matrix A. Prove that if no normalization is imposed on x

or x*, then the condition number of λ is given by

c_sp(λ) = ||x||_2 ||x*||_2 / |x**x|.

4.2.5 [D] Suppose that A is diagonalisable in a basis V whose columns are unit
vectors in the Euclidean norm. Let |si|^-1 be the condition number of the
eigenvalue λi as it was defined in Exercise 4.2.4. Prove that when all eigenvalues
are simple, then

…

and that for each simple eigenvalue λi,

1 ≤ |si|^-1 ≤ (1/2)[cond_2(V) + cond_2(V)^-1].
4.2.6 [A] Show that the condition number of a semi-simple eigenvalue is of the
Lipschitz type while that of a defective eigenvalue is of the Hölder type.
4.2.7 [A] Investigate the relative error of a non-zero eigenvalue.
4.2.8 [B:25,67] Suppose A is diagonalisable. Compare the condition number
c_sp(x) of an eigenvector corresponding to a simple eigenvalue with that deduced
from Wilkinson's formula

x_j(ε) = x_j - ε Σ_{i=1, i≠j}^{n} [x_i**H x_j / ((λ_j - λ_i) x_i**x_i)] x_i + O(ε²)

       = x_j - εSHx_j + O(ε²),

where x_j is the eigenvector corresponding to λ_j, ||x_j||_2 = ||x_j*||_2 = 1, S is the
associated reduced resolvent and A(ε) = A + εH, ||H||_2 = 1.
Investigate ||x_j(ε)||_2 and comment.
4.2.9 [C] Compute the condition numbers of Chatelin and of Wilkinson
(Exercise 4.2.8) for

A = (…),

and comment.
4.2.10 [C] Let

A = (1  10^4  0; 0  0  0; 0  0  1)  and  A' = (1  10^4  0; 1.1 × 10^-5  0  0; 2 × 10^-5  0  1).

Verify that the basis X' of the invariant subspace M' of A' which is associated
with the block σ' = {-0.1, 1.1} and normalized by Q*X' = I, where Q = (e1, e2),

is equal to

X' = (1  0; 0  1; 10^-3/36  5/9).
4.2.11 [A] Investigate the departure from normality of the Schur form as a
function of the condition number (relative to inversion) of the matrix representing
the Jordan form, when the block σ consists of a double eigenvalue λ or of two
distinct eigenvalues λ and μ.
4.2.12 [D] Verify that the Jordan form is numerically unstable. The computation
of the Jordan form is an ill-posed problem.
4.2.13 [A] Let A have the simple eigenvalue λ:

Ax = λx (x ≠ 0),

A*x* = λ̄x* (x* ≠ 0).

Prove that

lim_{||ΔA||→0} |Δλ|/||ΔA|| ≤ ||x*||_* ||x|| / |x**x|,

where ||·||_* is the dual norm of ||·||. Deduce the value of the relative condition
number for λ ≠ 0:

K(λ) = lim_{||ΔA||→0} (|Δλ|/|λ|)(||A||/||ΔA||).

4.2.14 [B:20,23] The space of matrices is now equipped with the relative
componentwise distance:

ε = min{ω : |ΔA| ≤ ω|A|}  with  ΔA = A - A',

where the inequality is taken componentwise. Show that, for the same eigenprob-
lem, the relative condition number for λ ≠ 0 is given by

K_C(λ) = lim_{ε→0} (1/ε) |Δλ|/|λ| = |x*|ᵀ|A||x| / (|λ| |x**x|),

where |A| denotes the matrix with the (i, j)th element equal to |a_ij|.
4.2.15 [A] Let y be an arbitrary vector, non-orthogonal to x. We suppose that
x and the perturbed eigenvector x' are normalized such that y*x = y*x' = 1, so
Δx = -ΣΔAx lies in lin{y}^⊥.
Show that

lim_{||ΔA||→0} ||Δx||/||ΔA|| ≤ ||Σ|| ||x||.

Deduce that the relative condition number for x ≠ 0 is

K(x) = ||Σ|| ||A|| = lim_{||ΔA||→0} (||Δx||/||x||)(||A||/||ΔA||).

4.2.16 [B:20,23] With the distance defined in Exercise 4.2.14, show that

K_C(x) = lim_{ε→0} (1/ε) ||Δx||/||x|| = || |Σ||A||x| || / ||x||.
4.2.17 [C] Compute the normwise relative condition numbers defined in
Exercises 4.2.13 and 4.2.15 for the three norms ||·||_1, ||·||_2, ||·||_∞ and the matrix

A = (…).

Compute the componentwise condition numbers and compare with the normwise
ones obtained with ||·||_∞.
4.2.18 [B:20] Let λ', x' be approximate eigenelements for A; r = Ax' - λ'x' is the
residual vector. The backward error is defined as the minimal size of a perturbation
ΔA such that (A + ΔA)x' = λ'x'. Prove that the normwise backward error is given
by ||r||/(||A|| ||x'||) for any subordinate norm, and that the componentwise
backward error is given by max_{1≤i≤n} |r_i|/(|A||x'|)_i. Verify that the backward
errors are independent of the normalization chosen for x'.
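The achievability half of this exercise can be sketched directly: a rank-one perturbation built from the residual maps x' to λ'x' exactly. The example below (plain Python, not from the original text; the matrix and approximate pair are arbitrary illustrations) verifies (A + ΔA)x' = λ'x' for ΔA = -r x'ᵀ/(x'ᵀx'):

```python
import math

# Approximate eigenpair (lam', x') for A, with residual r = A x' - lam' x'.
A = [[2.0, 1.0], [0.0, 3.0]]
lam_p = 2.1
x_p = [1.0, 0.05]

r = [A[0][0]*x_p[0] + A[0][1]*x_p[1] - lam_p*x_p[0],
     A[1][0]*x_p[0] + A[1][1]*x_p[1] - lam_p*x_p[1]]

xx = x_p[0]**2 + x_p[1]**2
DeltaA = [[-r[i]*x_p[j]/xx for j in range(2)] for i in range(2)]   # rank 1

lhs = [sum((A[i][j] + DeltaA[i][j])*x_p[j] for j in range(2)) for i in range(2)]
err = math.hypot(lhs[0] - lam_p*x_p[0], lhs[1] - lam_p*x_p[1])
# ||DeltaA||_2 = ||r||_2/||x'||_2, which yields the normwise backward error
# ||r||_2 / (||A||_2 ||x'||_2) after normalization by ||A||_2.
```

Since ΔA x' = -r by construction, the residual of the perturbed problem vanishes up to roundoff.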
4.2.19 [D] We suppose that A is diagonalisable: A = XDX^-1. Prove the
following componentwise version of the Bauer-Fike theorem:

min |Δλ| ≤ || |X^-1||ΔA||X| || ≤ ε || |X^-1||A||X| ||

for all ΔA such that |ΔA| ≤ ε|A|.
Apply this result to the highly non-normal matrix

A = (2  10^9  -2 × 10^9; -10^-9  5  -3; 2 × 10^-9  -3  2).

Check that

|X^-1||A||X| = (19  20  14; 20  21  15; 14  15  11)

and compare with ||X^-1|| ||X|| ||A||. Conclude.
4.2.20 [B:43] We suppose that the elements of A are complex-valued differentiable
functions of a complex parameter t varying in D ⊂ ℂ. The matrix A = A(t)

admits the eigenelements λ = λ(t), x = x(t), defined in D. Let t0 ∈ D be such that
λ(t0) is simple and A', λ', x' exist. If we impose the Euclidean normalization
x*(t)x(t) = 1 in D, then prove that the derivative x' at t = t0 is

x' = (x*SA'x)x - SA'x,

where S is the reduced resolvent.
Suppose now that A(t) = A + tΔA for t in a neighbourhood of t = 0 including
t = 1. Check that the first-order Taylor expansion around t = 0 is

x(t) = x + t[(x*SΔAx)x - SΔAx] + O(t²).

Set A1 = A + ΔA, x(1) = x1; interpret geometrically the identity x1 - x =
(I - P1)(-SΔAx) = (I - P1)Δx, where P1 = xx*, and Δx = x2 - x is the
variation of x induced by the Wilkinson normalization x**x2 = 1, where x* is
the left eigenvector.
Check that

||x1||_2² = 1 + ||(I - P1)x2||_2² > 1.
4.2.21 [A] Let λ belong to the normwise ε-pseudo-spectrum of A. Show that
this is equivalent to any of the following three statements:
(a) There exists a vector y such that

||(A - λI)y|| ≤ ε||A||,  ||y|| = 1.

(b) ||(A - λI)^-1|| ≥ 1/(ε||A||).

(c) For the Euclidean norm ||·||_2,

σ_min(A - λI) ≤ ε||A||_2.
Section 4.3 A Priori Analysis of Errors

4.3.1 [A] Let

A' = A + H,

where ||H||_2 = ε and R(z) = (A - zI)^-1. Suppose the Jordan curve Γ isolates the
eigenvalue λ of A from the rest of the spectrum of A. Put

c(Γ) = max_{z∈Γ} ||R(z)||_2.

Prove that if ε is such that

εc(Γ) < 1,

then the matrix R'(z) = (A' - zI)^-1 exists for z ∈ Γ and

max_{z∈Γ} ||R'(z)||_2 ≤ c(Γ)/(1 - εc(Γ)).

4.3.2 [A] The distance between two finite sets σ and τ is defined as

dist(σ, τ) = max{ max_{t∈τ} min_{s∈σ} |t - s|, max_{s∈σ} min_{t∈τ} |t - s| }.

Prove that for any two matrices A and A' we have

dist[sp(A), sp(A')] ≤ c||A - A'||^(1/n),

where c is a constant.
4.3.3 [D] Prove that A ↦ det A is Fréchet differentiable and that its derivative
det' A is given by

(det' A)H = Σ_{i=1}^{n} det(A1, ..., A_{i-1}, H_i, A_{i+1}, ..., A_n),

where A = (A1, A2, ..., An) and H = (H1, ..., Hn) are written in terms of their columns.
4.3.4 [B:11] Using the notation of Proposition 4.3.8 prove that, for sufficiently
small ε,

… = O(ε).

4.3.5 [D] Suppose the function f is holomorphic in a neighbourhood of the

eigenvalue λ and let ε be a positive number. Using the notation of Proposition
4.3.8, show that, for sufficiently small ε,

… = O(ε),

f(λ') - f(λ(ε)) = O(ε).

4.3.6 [B:38] Consider the map T: X ↦ AX − XB on the assumption that
sp(A) ∩ sp(B) = ∅.
Show that, if λ ∈ res(T),
(T − λI)⁻¹X = (1/2πi) ∮_Γ (A − zI)⁻¹ X [B − (z − λ)I]⁻¹ dz,
where Γ is a closed Jordan curve isolating sp(B) from sp(A).

Section 4.4 A Posteriori Analysis of Errors

4.4.1 [A] Using the notation of Proposition 4.2.3 prove that
(1 + μ / - Α | ) ί / - 1 ) / / < 2 ,
provided that ||ΔA|| is sufficiently small.

4.4.2 [D] Prove that
|λ − λ'| ≤ ||X_*||_2 ε_2,
where λ' is the arithmetic mean of the m eigenvalues of A' close to the eigenvalue λ
of A of multiplicity m, and where ε_2 = ||ΔA||_2.
4.4.3 [A] Suppose that the matrix A is sufficiently close to a triangular matrix
and that no other diagonal element is equal to a_11. Prove that the condition (*) of
Exercise 2.9.1 is satisfied when x = y = e_1 and the norm is || · ||_1.
4.4.4 [B:42] Consider the generalized problem
Ax = λBx,
where A and B are real symmetric matrices and B is positive definite. The space
R^n is equipped with the inner product
⟨u, v⟩_B = u^T B v,
and || · ||_B is the norm associated with this product. Let
S = B⁻¹A.
(a) Show that S is symmetric with respect to the inner product ⟨·, ·⟩_B.
(b) Let μ_1 ≤ μ_2 ≤ ⋯ ≤ μ_n be the eigenvalues of S; let μ ∈ R and x ∈ R^n. Show that
min_i |μ_i − μ| ≤ ||Sx − μx||_B / ||x||_B   (x ≠ 0).
(c) Let P_i be the eigenprojection associated with μ_i. Show that
min_{j≠i} |μ_j − μ| · ||x − P_i x||_B ≤ ||Sx − μx||_B.
(d) Define the Rayleigh quotient associated with ⟨·, ·⟩_B and a non-zero vector x
as follows:
R_B(S, x) = ⟨Sx, x⟩_B / ||x||_B².
Prove that
min_μ ||Sx − μx||_B = ||Sx − R_B(S, x)x||_B.

(e) Suppose that ||x||_B = 1 and that the real numbers α and β are such that
α < R_B(S, x) < β and the interval (α, β) contains no eigenvalue of S other than μ_i.
Show that
R_B(S, x) − Δ_β ≤ μ_i ≤ R_B(S, x) + Δ_α,
where
Δ_α = Δ²/(R_B(S, x) − α),  Δ_β = Δ²/(β − R_B(S, x))  and  Δ = ||Sx − R_B(S, x)x||_B.
(f) Deduce the a posteriori bounds:
min_i |μ_i − R_B(S, x)| ≤ [R_B(S², x) − R_B(S, x)²]^{1/2},
min_{j≠i} |μ_j − R_B(S, x)| · ||x − P_i x||_B ≤ [R_B(S², x) − R_B(S, x)²]^{1/2}.
4.4.5 [D] Consider the results of Section 4.4.2. Find bounds for
||X_1 − X||, ||X'_1 − X|| and ||B_1 − B||, where
X_1 = V + W, X'_1 = Q' + W and B_1 = Y*AX_1.

Section 4.5 A Is Almost Diagonal

4.5.1 [A] Let {d_i}_1^n be a set of n positive numbers. Show that for every eigenvalue
λ of A there exists an index i such that
|λ − a_ii| ≤ (1/d_i) Σ_{j≠i} |a_ij| d_j.

4.5.2 [A] Let
A = ( 1      10⁻⁴   0    )
    ( 10⁻⁴   2      10⁻⁴ )
    ( 0      10⁻⁴   3    ).
Use similarity transformations by diagonal matrices and Gershgorin's disks in
order to localize the eigenvalues of A around 1, 2 and 3 with a precision of the
order of 10⁻⁸.
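The mechanism of the exercise can be illustrated numerically; a minimal NumPy sketch (the scaling d = (2·10⁻⁴, 1, 2·10⁻⁴), which isolates the middle eigenvalue, is one possible choice among many):

```python
import numpy as np

def gershgorin_disks(A):
    """Return the (center, radius) pairs of the Gershgorin row disks of A."""
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), radii))

e = 1e-4
A = np.array([[1.0, e, 0.0],
              [e, 2.0, e],
              [0.0, e, 3.0]])

# The similarity D^{-1} A D multiplies the entry a_ij by d_j/d_i.
# With d = (2e, 1, 2e) the second disk shrinks to radius 4e^2 = 4e-8,
# while disks 1 and 3 keep radius 1/2, so the three disks stay disjoint.
d = np.array([2 * e, 1.0, 2 * e])
B = np.diag(1.0 / d) @ A @ np.diag(d)    # similar to A: same spectrum
disks = gershgorin_disks(B)
print(disks[1])    # center 2.0, radius of order 1e-8
```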
4.5.3 [D] Let A = D + H, where D is a block-diagonal matrix possessing a
single block of order r. The row indices of the block constitute the set I. Define

r isi r&

(a) With the help of Theorem 4.5.3 obtain a bound of the type

(b) Complete the study of the localization of the eigenvalues of A by means of
Corollary 4.5.2 when \\Η\\γ is sufficiently small.
4.5.4 [B:51] Let A = (a_ij) ∈ C^{n×n} and suppose that μ ∈ C differs from all the
diagonal elements of A. Define recursively
h_1(μ) = Σ_{j=2}^n |a_1j|,
h_i(μ) = Σ_{j=1}^{i−1} |a_ij| h_j(μ)/|a_jj − μ| + Σ_{j=i+1}^n |a_ij|   (i = 2, …, n).
The set
K_i = {μ ∈ C: |μ − a_ii| ≤ h_i(μ)}   (i = 1, 2, …, n)
is called the ith Gudkov region. The Gershgorin disks are denoted by G_i
(i = 1, 2, …, n).
(a) Prove that
⋃_{i=1}^n K_i ⊆ ⋃_{i=1}^n G_i.

(b) Construct the Gershgorin disks and the Gudkov regions associated with the
matrix
A = ( −1   1   0 )
    (  1   1   1 )
    (  2   0   3 )
and compare their precision of localization.
(c) Put K_i = K_i(A), G_i = G_i(A) and D = diag(d_1, …, d_n). Consider the minimal
regions
G(A) = ⋂_D ⋃_{i=1}^n G_i(D⁻¹AD),
K(A) = ⋂_D ⋃_{i=1}^n K_i(D⁻¹AD).
Prove that G(A) = K(A).


Section 4.6 A Is Hermitian

4.6.1 [A] Let
T = ( A )
    ( B ),
where A is Hermitian. Show that there exists a Hermitian matrix W such that
T̃ = ( A   B* )
    ( B   W  )
satisfies
||T̃||_2 = ||T||_2.
This result is known as the dilation theorem.
4.6.2 [D] In the case of a simple eigenvalue, establish an inequality, computable
with the help of the generalized Rayleigh quotient, as a function of
ρ = ||Au − ξu||_2 and s = ||A*v − ξ̄v||_2,
where
ξ = v*Au and v*u = u*u = 1.
4.6.3 [B:34] Let u ∈ R^n, α ∈ R and
r(α) = Au − αu,
where u*u = 1 and A is symmetric. Define
H = {H symmetric: (A − H)u = αu}.
(a) Prove that
min_{H∈H} ||H||_2 = ||r(α)||_2,
min_{H∈H} ||H||_F² = 2||r(α)||_2² − (α − u*Au)².
(b) Prove that if ρ = u*Au and r = r(ρ), then the minima are attained for
H = ru* + ur*,
and ||r(ρ)||_2 is the minimum of ||r(α)||_2 over α.
For a non-symmetric matrix A the situation is as follows: let u and v be
vectors in C^n such that v*u = u*u = 1. For α ∈ C define
r(α) = Au − αu,  s(α) = A*v − ᾱv,
H = {H: (A − H)u = αu, (A* − H*)v = ᾱv}.
Then
min_{H∈H} ||H||_2 = max{||r(α)||_2, ||s(α)||_2},
min_{H∈H} ||H||_F² = ||r(α)||_2² + ||s(α)||_2² − |α − v*Au|².
The minima are attained by
H = ru* + vs*,
where r = r(z), s = s(z) and z = v*Au. However, z does not entail a minimization
of the residuals.
4.6.4 [B:45] Let A ∈ C^{n×n} be a Hermitian matrix and let Q ∈ C^{n×m} be an
orthonormal basis of the space S = lin(Q), that is, the space generated by the
columns of Q. Prove that the best set of m numbers for approximating sp(A) that
can be deduced from A and Q is sp(Q*AQ) = {ρ_j}, in the sense that
ρ_j = min_{S'⊆S, dim S'=j} max_{y∈S', ||y||_2=1} y*Ay.

4.6.5 [A] Let A ∈ C^{n×n} be a Hermitian matrix, Λ ∈ C^{m×m} a diagonal matrix,
Q ∈ C^{n×m} an orthonormal basis and V ∈ C^{m×m} a unitary matrix such that
Q*AQ = VΛV*.
Prove that the best set of approximate eigenvectors that can be deduced from A
and Q is QV, in the sense that
||AQV − QVΛ||_2 = min {||AU − UD||_2: U*U = I_m, D diagonal}.
4.6.6 [C] Let
A = ( 0     0.1   0 )
    ( 0.1   0     1 )
    ( 0     1     0 ).
Compute the eigenelements of B = Q*AQ, where Q = (e_1, e_2). Deduce that
the property of optimality of the Rayleigh-Ritz vectors is only collective: none
of these vectors minimizes the distance from an eigenvector of A.
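A quick numerical check of this exercise; a minimal NumPy sketch computing the Ritz values and Ritz vectors:

```python
import numpy as np

A = np.array([[0.0, 0.1, 0.0],
              [0.1, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
Q = np.eye(3)[:, :2]            # Q = (e1, e2)
B = Q.T @ A @ Q                 # B = [[0, 0.1], [0.1, 0]]
theta, V = np.linalg.eigh(B)    # Ritz values: -0.1 and 0.1
ritz_vectors = Q @ V            # Ritz vectors of A deduced from Q
print(theta)
```

The Ritz values ±0.1 are far from the outer eigenvalues of A (which are close to ±1), and one can check that neither Ritz vector is close to an eigenvector of A: the optimality of Exercise 4.6.5 is indeed only collective.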
4.6.7 [D] Compare the localization (4.6.1) with that of Theorem 4.5.1 when A
is a Hermitian matrix close to a diagonal matrix.
4.6.8 [B:32] Let
W = {(w_ij): ∃ V = (v_ij) such that V*V = I and w_ij = |v_ij|², 1 ≤ i, j ≤ n}.

(a) Prove that W is contained in the set X of doubly stochastic matrices.

(b) Prove that the vertices of X are the permutation matrices and that these
matrices belong to W.
(c) Use this result to prove the following property: let D and Δ be diagonal
matrices and let P be a matrix such that
||D − P*ΔP||_F = min {||D − V*ΔV||_F: V*V = I}.
Then P is a permutation matrix.
4.6.9 [D] What do the bounds of Theorem 4.4.3 imply when A is Hermitian?
4.6.10 [B:34] Let U and V be two orthonormal bases in C^{n×m} such that V*U
is regular. Prove that there exist a matrix H and a diagonal matrix D such that
(A − H)U = UC and V*(A − H) = DV*,
where C = (V*U)⁻¹ D (V*U).
4.6.11 [B:34] With the notation of the preceding Exercise 4.6.10 prove that
there exist matrices H of minimal norm:
||H||_2 = max {||R||_2, ||S*||_2},
||H||_F² = ||R||_F² + ||S*||_F² − ||Z||_F²,
where R = AU − UC, S = V*A − DV* and Z = V*R = SU.
4.6.12 [B:17,33] Let ρ be the Rayleigh approximation of the largest eigenvalue
of A, which we assume to be simple. Prove that the bound
sin θ ≤ ||r||_2/δ
of Corollary 4.6.4 can be improved to become
tan θ ≤ ||r||_2/δ,
where r is the associated residual vector and δ is the gap between ρ and the rest
of the spectrum.

4.6.13 [A] Consider the proof of Theorem 4.6.12. Show that
||B'D − DC||_j ≥ a||B' − C||_j   (j = 2 or F).

Foundations of Methods
for Computing Eigenvalues

The present methods for computing eigenvalues are based on the convergence
of a Krylov sequence of subspaces towards the dominant invariant subspace of
the same dimension. The fundamental QR method is interpreted as a collection
of methods of subspace iteration. In this chapter we shall give a geometric
presentation of the convergence of these methods with the help of the tools
introduced in Chapter 1. This presentation will complement the traditional
algebraic study of convergence and illuminate it in a new light. For example, it will
enable us to give a natural explanation of the condition upon the matrix of
eigenvectors of A which is necessary and sufficient for the convergence of the
basic QR algorithm.


In classical terminology a vectorial Krylov sequence is a sequence of vectors
u, Au, A²u, …,
where u ≠ 0. Given r linearly independent vectors u_1, …, u_r which generate a
subspace S, we call the sequence of subspaces
S, AS, A²S, …, AᵏS, …
a Krylov sequence (of subspaces), and we are interested in the convergence of
this sequence when k tends to infinity.
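This convergence can be observed numerically. A minimal NumPy sketch, measuring the distance between AᵏS and the dominant invariant subspace by the sine of the largest principal angle (the diagonal test matrix and the starting vectors are illustrative assumptions):

```python
import numpy as np

def subspace_gap(U, V):
    """Sine of the largest principal angle between range(U) and range(V)."""
    Qu, _ = np.linalg.qr(U)
    Qv, _ = np.linalg.qr(V)
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)
    return np.sqrt(max(0.0, 1.0 - min(s) ** 2))

A = np.diag([4.0, 2.0, 1.0])
X = np.eye(3)[:, :2]                 # dominant invariant subspace
S = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])           # starting subspace of dimension 2
gaps = []
for _ in range(20):
    S = A @ S                        # next term of the Krylov sequence
    gaps.append(subspace_gap(S, X))
print(gaps[-1])    # decays linearly at the rate |mu_3/mu_2| = 1/2
```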
Let {μ_i}_1^n be the eigenvalues of A, each counted with its algebraic multiplicity.
We make the assumption that there exists an integer r with
1 ≤ r < n such that
|μ_1| ≥ |μ_2| ≥ ⋯ ≥ |μ_r| > |μ_{r+1}| ≥ ⋯ ≥ |μ_n| > 0.   (5.1.1)

The vectors of the Jordan basis are denoted by {x_i}_1^n and the vectors of the adjoint
basis by {x_{i*}}_1^n.

Definition The eigenvalues {μ_i}_1^r are the dominant eigenvalues of A; under the
assumption (5.1.1) they form the dominant block. The subspace M = lin(x_1, …, x_r)
is the associated dominant invariant subspace and M_* = lin(x_{1*}, …, x_{r*}) is the
left dominant invariant subspace. Let X be a basis of M and let X_* be the adjoint
basis in M_*. The dominant spectral projection is given by P = XX_*^*.
We now suppose that we are given r linearly independent vectors U = [u_1, …, u_r]
which generate the subspace S = lin(u_1, …, u_r).
We shall establish the following fundamental theorem.
We shall establish the following fundamental theorem.

Theorem 5.1.1 On the assumption (5.1.1) we have
ω(AᵏS, M) → 0 as k → ∞,
if and only if dim PS = r.

PROOF Let X be a basis of M and let X̂ be a basis of a complementary invariant
subspace such that PX̂ = 0, and let [X_*, X̂_*] be the adjoint basis of [X, X̂]. Then U
can be decomposed as
U = XF + X̂G,
where F = X_*^*U and PU = XF. We have dim PS = r if and only if PU is of rank r,
that is, if and only if F is regular.
(a) We suppose that F is regular. Now U_k = AᵏU is a basis of AᵏS and, according
to Proposition 1.5.4, we have ω(AᵏS, M) → 0 if and only if there exists a regular
matrix F_k of order r such that U_kF_k → X. Since both invariant subspaces are invariant
under A we have
AX = XB, where B = X_*^*AX,
AX̂ = X̂B̂, where B̂ = X̂_*^*AX̂.
Hence
AᵏU = XBᵏF + X̂B̂ᵏG.
Note that the eigenvalues of B and B̂ are {μ_i}_1^r and {μ_i}_{r+1}^n respectively, so
that B is invertible by virtue of (5.1.1). Then
U_k(F⁻¹B⁻ᵏ) = X + X̂(B̂ᵏGF⁻¹B⁻ᵏ).
For every ε > 0 there exists an integer K such that k ≥ K implies that
(||B̂ᵏ|| ||B⁻ᵏ||)^{1/k} ≤ ρ(B̂)ρ(B⁻¹) + ε = |μ_{r+1}/μ_r| + ε.
We can choose ε such that |μ_{r+1}/μ_r| + ε < 1, which implies that ω(AᵏS, M) → 0;
for, given the bases U_k of AᵏS and X of M, there exists a regular matrix of
order r, namely F_k = F⁻¹B⁻ᵏ, such that
||U_kF_k − X|| = O(|μ_{r+1}/μ_r|ᵏ).
The convergence is linear at the rate |μ_{r+1}/μ_r|.
(b) Suppose there exists a sequence of regular matrices F_k such that
AᵏUF_k → X, and so PAᵏUF_k → X. Now
PAᵏU = AᵏPU = AᵏXF = XBᵏF.
On multiplying on the left by X_*^* we obtain
BᵏFF_k → I_r as k → ∞,
and we conclude that F is necessarily regular.
The necessary and sufficient condition
dim PS = r
is satisfied for a particular choice of the matrix A and of the subspace S which, as
we shall see later, is of great theoretical and practical importance.

Definition The matrix H is said to be of the upper Hessenberg form if h_ij = 0
when i > j + 1. It is irreducible if h_{i,i−1} ≠ 0 for i = 2, …, n.

Every matrix is unitarily similar to an upper Hessenberg matrix by means of
the algorithms of Givens or Householder (see Ciarlet, 1989, p. 364, or Golub and
Van Loan, 1989, p. 222). If some of the elements h_{i,i−1} are zero, then the problem
of eigenvalues of H is reduced to a set of subproblems for irreducible Hessenberg
matrices.
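The Householder reduction to Hessenberg form is available in standard libraries; a minimal sketch, assuming SciPy is installed:

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H, Q = hessenberg(A, calc_q=True)   # A = Q H Q^T with Q unitary

assert np.allclose(Q @ H @ Q.T, A)        # unitary similarity
assert np.allclose(np.tril(H, -2), 0.0)   # zeros below the subdiagonal
```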

Let E_r = lin(e_1, …, e_r) = [e_1, …, e_r], where the e_i are the first r vectors of the
canonical basis of C^n. We are interested in the Krylov sequence HᵏE_r.
Lemma 5.1.2 Under the hypothesis (5.1.1) we have
dim PEr = r,
where P is the dominant spectral projection of rank r associated with an irreducible
Hessenberg matrix H.

PROOF Suppose that x ∈ E_r but x ∉ E_{r−1}. Then Hx ∈ E_{r+1} but Hx ∉ E_r. Repeating
this argument, we deduce that the n − r + 1 vectors x, Hx, …, H^{n−r}x are linearly
independent: every subspace that is invariant under H and contains x must be
of dimension greater than n − r. Hence E_r has zero intersection with every
invariant subspace of dimension less than or equal to n − r. In particular, Ker P,
which is of dimension n − r and is invariant under H, has the property that
C^n = E_r ⊕ Ker P.
Hence
Im P = PC^n = PE_r
and so
C^n = Im P ⊕ Ker P,
which proves that dim PE_r = r.

Corollary 5.1.3 If H is an irreducible Hessenberg matrix and if (5.1.1) holds, then
ω(HᵏE_r, M) → 0 as k → ∞.

PROOF This is evident from Theorem 5.1.1 and Lemma 5.1.2.

The convergence of the Krylov sequence towards the dominant invariant
subspace can also occur when |μ_r| = |μ_{r+1}|. However, the presence of distinct
eigenvalues with the same modulus may prevent such convergence.


The subspace AᵏS is generated by the r vectors
Aᵏu_1, …, Aᵏu_r   (r < n).
These vectors tend to become parallel when k → ∞, because, in general, they
converge towards the dominant eigenvector x_1 (for the power method, see
Section 5.3). The method of subspace iteration consists in iteratively constructing
an orthonormal basis Q_k of AᵏS in the following manner:
(a) U = Q_0R_0,
(b) for k ≥ 1, let U_k = AQ_{k−1} = Q_kR_k,   (5.2.1)
where the R_k are upper triangular matrices of order r.
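Steps (a) and (b) can be sketched in a few lines, using NumPy's QR factorization for the Schmidt orthogonalization (the diagonal test matrix and starting basis are illustrative assumptions):

```python
import numpy as np

def subspace_iteration(A, U, k):
    """k steps of the iteration (5.2.1): orthonormalize, multiply by A."""
    Q, _ = np.linalg.qr(U)             # (a) U = Q0 R0
    for _ in range(k):
        Q, _ = np.linalg.qr(A @ Q)     # (b) U_k = A Q_{k-1} = Q_k R_k
    return Q

A = np.diag([5.0, 2.0, 1.0, 0.5])
U = np.array([[2.0, 1.0],
              [1.0, 2.0],
              [1.0, 1.0],
              [1.0, 1.0]])             # generic start: dim PS = r = 2
Q = subspace_iteration(A, U, 60)
B = Q.T @ A @ Q                        # sp(B_k) -> {5, 2}
print(np.sort(np.linalg.eigvalsh(B)))
```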

Schmidt's orthogonalization, U_k = Q_kR_k, can be carried out by the Householder
method (see Ciarlet, 1989, p. 155) or by the Givens method (see Golub
and Van Loan, 1989, p. 211). We obtain the following result regarding convergence.

Theorem 5.2.1 Suppose (5.1.1) holds. Given orthonormal bases Q_k and Q of AᵏS
and M respectively, there exists a sequence of unitary matrices Z_k such that
Q_kZ_k → Q as k → ∞, if and only if dim PS = r.

PROOF It is clear that the matrix Q_k defined in (5.2.1) is a basis of AᵏS. The
assertion follows from Theorem 1.5.2 because ω(AᵏS, M) → 0 as k → ∞.
The matrix B_k = Q_k^*AQ_k is a matrix of order r whose spectrum converges
towards the r dominant eigenvalues.

Corollary 5.2.2 If (5.1.1) holds and if dim PS = r, then sp(B_k) converges to {μ_i}_1^r
as k → ∞.

PROOF The matrix Q_kZ_k furnishes an orthonormal basis for AᵏS. We note that
B'_k = Z_k^*Q_k^*AQ_kZ_k is similar to B_k and converges to B,
where B = Q*AQ is the matrix that represents the map A restricted to M relative
to the basis Q. Hence
sp(B) = {μ_i}_1^r.
Since B'_k → B we have sp(B_k) = sp(B'_k) → {μ_i}_1^r.
The linear convergence of Q_k towards Q and of sp(B_k) towards {μ_i}_1^r is
controlled by |μ_{r+1}/μ_r|. In Chapter 6 we shall meet more precise results: in fact,
the rate of convergence of μ_1^{(k)} towards μ_1 is of the order of |μ_{r+1}/μ_1|, when μ_1 is
simple. An iteration on the subspace S = S_r carries with it simultaneously an
iteration on each of the subspaces
S_f = lin(u_1, …, u_f),   1 ≤ f ≤ r.
This remarkable fact will have important consequences: suppose the eigenvalues
are such that
|λ_1| > |λ_2| > ⋯ > |λ_r| > |μ_{r+1}| ≥ ⋯ ≥ |μ_n| > 0.   (5.2.2)
On this assumption we define a strictly increasing sequence of invariant
subspaces M_f where
M_f = lin(x_1, …, x_f)   (1 ≤ f ≤ r).
In M = M_r we choose an orthonormal basis Q = [q_1, …, q_r] such that Q_f =
[q_1, …, q_f] is a basis of M_f, 1 ≤ f ≤ r. Put
X_f = [x_1, …, x_f],  X_{f*} = [x_{1*}, …, x_{f*}],
P_f = X_fX_{f*}^*,  X_r = X,  X_{r*} = X_*,  P_r = P.
The reader can verify that
B = Q*AQ
is an upper triangular matrix: Q is a Schur basis of M, which is unique apart
from a unitary diagonal matrix.
Therefore Theorem 5.2.1 may be made more precise as follows.

Theorem 5.2.3 Suppose that (5.2.2) holds. Given orthonormal bases Q_k of AᵏS and
a Schur basis Q of M, there exists a sequence of unitary diagonal matrices D_k such
that Q_kD_k → Q, if and only if X_*^*U is regular and its r − 1 principal minors are non-zero.

PROOF Let U_f = [u_1, …, u_f]. When 1 ≤ f ≤ r, the condition that dim P_fS_f = f
is equivalent to the assertion that X_{f*}^*U_f is regular. Now X_{f*}^*U_f is the principal
submatrix of order f extracted from X_*^*U. Hence we know that, when 1 ≤ f ≤ r,
ω(AᵏS_f, M_f) → 0 as k → ∞.
By a recurrence argument on f we can show that the matrix Z_k of Theorem 5.2.1
can be taken to be in diagonal form by a suitable choice of Q.

Corollary 5.2.4 Suppose that (5.2.2) holds. If X_*^*U is regular and if its r − 1
principal minors are non-zero, then the limit form of B_k, as k tends to infinity, is an
upper triangular matrix whose diagonal consists of the eigenvalues {λ_i}_1^r in this
order.

PROOF By Corollary 5.2.2 and Theorem 5.2.3 we have D_k^*B_kD_k → B = Q*AQ,
which is an upper triangular matrix having {λ_i}_1^r as its diagonal elements in
this order. Since D_k is a unitary diagonal matrix, the matrices B'_k = D_k^*B_kD_k and B_k have
the same form; in particular, their diagonal elements are identical.

We can say that, modulo a unitary diagonal matrix, the matrix Bk converges
towards an upper triangular matrix. This is the 'essential' convergence of
Wilkinson* (see Ciarlet, 1989, p. 205, and Wilkinson, 1965, p. 517).

* James Hardy Wilkinson, 1919-1986, born at Strood, died at Teddington.


The condition (5.2.2) is not satisfied when there exist eigenvalues having the
same modulus. Without further investigation we can no longer deduce the
convergence ω(AᵏS_f, M_f) → 0 when f is an index corresponding to a set of f
eigenvalues that includes eigenvalues having the same modulus. The matrix D_k of
Theorem 5.2.3 becomes block-diagonal unitary; the limit form of B_k becomes
block-triangular, although it remains triangular in certain cases.
An important case in which there exist eigenvalues having the same modulus
is the case of real matrices which may have pairs of conjugate complex
eigenvalues. We shall return to this question in greater detail when we discuss
the power method in Section 5.3.
The method of subspace iteration is characterized by the subspace AkS, the
fcth iterate by A of the subspace S. Starting from there one can study the
convergence of the method and, if it converges, at what speed it does so.
The reader will verify that the method is still defined when r = n and A is
regular; we then have
dim S_n = dim M_n = dim AᵏS_n = n,
and ω(AᵏS_n, M_n) = 0 whatever the value of k.
In practice, several ways of constructing a basis for AᵏS can be thought of. In
(5.2.1) we presented the construction of an orthonormal basis Q_k arising from the
Schmidt factorization QR.
By way of an example we shall give the 'staircase' iteration ('Treppeniteration',
Bauer, 1957), which rests on the Gaussian LR factorization (see Ciarlet,
1989, p. 138).

Example 5.2.1 If possible, a basis L_k of AᵏS is constructed, where L_k is an
n × r unit lower trapezoidal matrix (ones on the diagonal, zeros above it,
arbitrary entries x below) of the form shown below:

( 1          )
( x  1       )
( x  x  1    )
( x  x  x  1 )
( x  x  x  x )

The following procedure is adopted:

(a) U = L_0R_0,
(b) when k ≥ 1, put U_k = AL_{k−1} = L_kR_k,
where R_k is an upper triangular matrix of order r. It is known that the Schmidt
factorization (by the Householder or Givens algorithms) is stable, while the
Gaussian factorization without pivoting, even if it is possible, may not be
stable. It is for this reason that we presented Example 5.2.1 only for its historical
interest.

Example 5.2.2 The method of bi-iteration may be regarded as a method of
constructing a basis X_k of AᵏE_n and an adjoint basis Y_k of (A*)ᵏE_n, where
E_n = lin(e_1, …, e_n) = C^n. We put
X_0 = Y_0 = I,  G_{k+1} = AX_k,  H_{k+1} = A*Y_k.
The factorization of Gauss, if it exists, given by
Y_k^*AX_k = L_{k+1}R_{k+1},
enables us to compute the adjoint bases X_{k+1} and Y_{k+1} such that
AX_k = X_{k+1}R_{k+1},
A*Y_k = Y_{k+1}L_{k+1}^*.

When A is Hermitian, we can use the Cholesky factorization,
and the algorithm becomes identical with the subspace iteration (5.2.1) when
r = n; in this case X_k is unitary and AX_k = X_{k+1}R_{k+1}.
The method of subspace iteration is most frequently used in conjunction with
the auxiliary techniques of deflation and spectral preconditioning whose principles
we are going to describe. These are techniques that modify the spectrum of A,
but not the eigenvectors or the Schur basis; their purpose is to facilitate the
computation of the spectrum.

5.2.1 Deflation
The object is to eliminate those eigenvalues that have already been computed,
the computation being carried out one by one.

Proposition 5.2.5 Let x and x_* be the right and left eigenvectors corresponding to
the simple eigenvalue λ, normalized so that x*x = x_*^*x = 1. The matrices
A' = A − σxx_*^* and Ã = A − σxx*
have the same eigenvalues as A except λ, which is replaced by λ − σ.
When A is diagonalisable, A' and A have the same eigenbasis.
The matrices A and Ã have the same Schur basis.


(a) If A is diagonalisable, then
AX = XD,
where x is the first column of X. Hence
A'X = XD − σxe_1^T = X(D − σe_1e_1^T).

(b) If T is the Schur form of A, then
AQ = QT.
Suppose x is the first column of Q; it follows that
ÃQ = QT − σxe_1^T = Q(T − σe_1e_1^T).
The deflations we have defined are deflations by subtraction. Other deflations
in use are deflations by restriction and deflations by similarity (see Wilkinson,
1965).
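Proposition 5.2.5 is easy to verify numerically; a minimal sketch for a symmetric matrix, where x_* = x and the two deflations coincide:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
lam, V = np.linalg.eigh(A)          # eigenvalues in increasing order
x = V[:, -1]                        # unit eigenvector of the largest one
sigma = 10.0
A1 = A - sigma * np.outer(x, x)     # deflation by subtraction

# The largest eigenvalue is replaced by lam[-1] - sigma;
# the other eigenvalue is unchanged.
print(np.sort(np.linalg.eigvalsh(A1)))
```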

5.2.2 Spectral Preconditioning

We operate with f(A), where the function f is holomorphic in a neighbourhood

of sp(A) and is chosen in such a way that the spectrum is modified so as to
facilitate computation. We shall discuss three examples of applying this idea:
(a) f(t) = t — a: translation of the origin for the QR algorithm, considered in
Section 5.5.
(b) f(t) = (t − σ)⁻¹: the inverse iteration will be discussed in Section 5.4 and the
spectral transformation in Chapter 6, Section 6.5.
(c) f(t) is a polynomial in t, chosen in such a way that the eigenvalues with the
greatest real part become the eigenvalues with the greatest modulus; this will
be discussed in Chapter 7, Section 7.9.
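Example (b) can be illustrated directly: replacing A by (A − σI)⁻¹ leaves the eigenvectors unchanged and makes the eigenvalue closest to σ dominant. A minimal NumPy sketch (the diagonal matrix and the shift are illustrative assumptions):

```python
import numpy as np

A = np.diag([10.0, 3.0, 1.0])
sigma = 2.9                          # approximation of the eigenvalue 3
M = np.linalg.inv(A - sigma * np.eye(3))
mu = np.linalg.eigvals(M)            # eigenvalues 1/(lambda_i - sigma)
# 1/(3 - 2.9) = 10 now dominates 1/(10 - 2.9) and 1/(1 - 2.9).
print(np.sort(np.abs(mu))[::-1])
```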


If in the preceding method we consider the special case in which r = 1, we recover
one of the oldest methods used for computing the dominant eigenvalue λ_1; this
is known as the power method.
We suppose that
|λ_1| > |μ_2| ≥ ⋯ ≥ |μ_n| > 0.   (5.3.1)
The following theorem is fundamental.

Theorem 5.3.1 Suppose that condition (5.3.1) holds. The sequence
q_0 = u/||u||_2,  q_k = Aq_{k−1}/||Aq_{k−1}||_2
is such that q_k converges, up to scalar factors of modulus one, to the direction of x_1,
if and only if x_{1*}^*u ≠ 0.

PROOF This is a special case of Theorem 5.2.3. Moreover, in general, the
convergence is linear at the rate |μ_2/λ_1|
(see Exercise 5.3.1).
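The power method itself takes only a few lines; a minimal NumPy sketch, assuming (5.3.1) holds and that the starting vector satisfies x_{1*}^*u ≠ 0 (the test matrix is an illustrative assumption):

```python
import numpy as np

def power_method(A, u, k):
    """k steps of the power method; returns the normalized iterate and
    the Rayleigh quotient approximating the dominant eigenvalue."""
    q = u / np.linalg.norm(u)
    for _ in range(k):
        z = A @ q
        q = z / np.linalg.norm(z)
    return q, q @ A @ q

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # eigenvalues 3 and 1
q, mu = power_method(A, np.array([1.0, 0.0]), 50)
print(mu)                            # close to the dominant eigenvalue 3
```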

What happens when condition (5.3.1) is not satisfied? In the following we shall
suppose that A is diagonalisable. Two cases are possible:
(a) There exists a multiple semi-simple eigenvalue and the method still converges.
(b) There exist distinct eigenvalues with the same modulus and, in general, the
method fails to converge.
We discuss these cases in more detail:
(a) μ_1 = ⋯ = μ_r,  |μ_1| > |μ_{r+1}| ≥ ⋯ . Then
Aᵏu = μ_1ᵏ [ Σ_{i=1}^r (x_{i*}^*u)x_i + Σ_{i=r+1}^n (μ_i/μ_1)ᵏ(x_{i*}^*u)x_i ].
If P_ru ≠ 0, we have convergence towards a vector in the dominant eigenspace
associated with μ_1 = ⋯ = μ_r, that is, towards an eigenvector.
(b) Let |λ_1| = |λ_2| > |μ_3| ≥ ⋯ . We suppose that λ_1 ≠ λ_2 and that P_2u ≠ 0. If
x_{1*}^*u ≠ 0, we put λ_1 = e^{iθ}λ_2, where 0 < θ < 2π. Then
Aᵏu = λ_2ᵏ [ e^{ikθ}(x_{1*}^*u)x_1 + (x_{2*}^*u)x_2 + Σ_{i=3}^n (μ_i/λ_2)ᵏ(x_{i*}^*u)x_i ].
If there exists a rational number p = t/s such that θ = 2π/p, then there exist
t subsequences that converge to the vectors
e^{2πiks/t}(x_{1*}^*u)x_1 + (x_{2*}^*u)x_2   (k = 1, …, t).

Example 5.3.1 Let
A = ( 1+i   2i    0 )
    ( 0     1−i   0 )
    ( −i    −3    1 ).
The spectrum is {1 + i, 1 − i, 1}. It will be found that the power method furnishes
four convergent subsequences of vectors because 1 + i = exp(2πi/4)(1 − i).

Example 5.3.2 Let A = XDX⁻¹, where
X = (  1   0  −1   0 )
    (  0   1  −1   0 )
    (  1   2   1   1 )
    ( −1   0   0   1 ),
D = diag(3, 2, 1, −3).
In this case
λ_2 = e^{iπ}λ_1 = −3,
and we obtain two convergent subsequences by applying the power method to
the matrix A with the initial vector u = (1, 1, 1, 1)^T.
For example, by projecting A on the subspace generated by the two limit
vectors, we obtain the matrix
( −2.76   1.18 )
(  1.18   2.76 ),
whose eigenvalues are 3 and −3 (see Exercise 5.3.7).
If θ is not of the form 2π/p, where p is rational, then no convergence can take
place when x_{1*}^*u ≠ 0. The only possibility consists in iterating upon two
vectors u_1 and u_2 such that P_2u_1 and P_2u_2 are independent.

In this section we present some aspects of the behaviour of subspace iteration,
subject to condition (5.1.1), in the light of the behaviour of the power method.
First of all, if the necessary and sufficient condition for convergence, that is
dim PS = r, is not satisfied, then convergence will occur towards an invariant
subspace that is no longer dominant.
Example 5.3.3 Let A be a matrix with
sp(A) = {3, 2, 1}.
When r = 2 and
U = ( 1  0 )
    ( 1  1 )
    ( 0  0 ),
we have convergence of B_k.
As we should expect from the preceding discussion, we notice the convergence
of B_k to an upper triangular matrix having the eigenvalues {1, 3} (in this order)
upon its diagonal. In fact, the power method applied to A with the initial vector
u = (1, 1, 0)^T yields the eigenvalue 1, since u happens to be the eigenvector
associated with 1.
When there are eigenvalues having the same modulus, we can obtain several
distinct block-triangular limit matrices.

Example 5.3.4 Let the matrix A = XDX⁻¹ of Example 5.3.2 be defined by
means of
D = diag(3, −3, 2, 1).
By simultaneous iterations on the two vectors e_1 and e_2, the eigenvalues
{3, −3} are obtained by calculating the spectrum of one of the two limit blocks
( 0.913   1.57  )      (  0.296  −5.30  )
( 5.19   −0.913 )  or  ( −1.68   −0.296 ).
These are obtained by projecting the matrix A on the invariant subspace
associated with {3, −3} by means of one of the limit bases
( −0.626   0.417   0.209   0.626 )^T      ( −0.356  −0.237  −0.831   0.356 )^T
( −0.251  −0.314  −0.880   0.251 )   or   (  0.573  −0.465  −0.358  −0.578 ).
On the other hand, if we iterate in the same way with a matrix A defined by
D = diag(3, 2, 1, −3),
we obtain two limit bases
( −0.447   0   0       0.894 )^T      ( −0.447   0  −0.894   0     )^T
(  0.365   0   0.913   0.183 )   and  (  0.365   0  −0.183  −0.913 ),
but only a single limit block
(  0.600  −2.94  )
( −2.94   −0.600 ).
The reader will verify that this peculiarity is due to the fact that in this case the
eigenvectors associated with 3 and −3 are orthogonal.


We have seen that the convergence of the power method is linear, with rate
|μ_2/λ_1|. Since the eigenvectors are invariant under a translation of the origin, it is
natural to think about improving this factor through an appropriate shift of
origin. Let σ be an approximation to a simple eigenvalue λ, with eigenvector x and
spectral projection P. Then the eigenvalues of A − σI are μ_i − σ; if σ is not close
to the eigenvalues of A other than λ, then the dominant eigenvalue of (A − σI)⁻¹
is 1/(λ − σ), whose modulus is significantly greater than max {|μ_i − σ|⁻¹: μ_i ≠ λ}.
This fact is exploited in the method of inverse iteration designed to compute the
eigenvector x associated with λ whose approximation σ is known: thus we put
q_0 = u/||u||_2;  (A − σI)z_k = q_{k−1},  q_k = z_k/||z_k||_2   (k ≥ 1).   (5.4.1)
This iteration is none other than the power method applied to (A − σI)⁻¹; the
convergence rate is |λ − σ|/min_{μ_i≠λ}|μ_i − σ|, which is closer to zero the closer σ
is to λ. Be this as it may, A − σI will be closer to being singular; at first sight
this could pose a problem for the solution of (5.4.1). In what follows we shall
examine this apparent paradox.
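Iteration (5.4.1) is a few lines of code; a minimal NumPy sketch with a fixed shift, solving the linear system by a dense factorization (the matrix and the shift are illustrative assumptions):

```python
import numpy as np

def inverse_iteration(A, sigma, u, k):
    """k steps of inverse iteration (5.4.1) with fixed shift sigma."""
    n = A.shape[0]
    q = u / np.linalg.norm(u)
    for _ in range(k):
        z = np.linalg.solve(A - sigma * np.eye(n), q)  # (A - sigma I) z = q
        q = z / np.linalg.norm(z)
    return q

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
sigma = 4.7                   # rough approximation of the largest eigenvalue
q = inverse_iteration(A, sigma, np.array([1.0, 1.0, 1.0]), 5)
residual = np.linalg.norm(A @ q - (q @ A @ q) * q)
print(residual)               # small: q is close to an eigenvector
```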

Lemma 5.4.1 If the vector x is well conditioned, the error made in solving (5.4.1)
is mainly in the direction generated by x, which is the direction required.
PROOF Let Q be a basis for (x)^⊥ such that [x, Q] is unitary. Then
[x, Q]*(A − σI)[x, Q] = ( λ−σ   x*AQ )
                        ( 0     B−σI ),
where
B = Q*AQ,
Σ_λ = Q(B − λI)⁻¹Q*,  csp(x) = ||Σ_λ||_2.

We wish to examine the solution of (A − σI)y = u. The computed vector y is
the exact solution of the neighbouring problem
(A − σI − H)y = u + f,
where H and f are of small norm. We write
(A − σI)y = u + Hy + f = u + g,
where g = Hy + f. The error made on y is therefore e = (A − σI)⁻¹g. Now
(A − σI)⁻¹ = [x, Q] ( 1/(λ−σ)   −(1/(λ−σ)) x*AQ(B−σI)⁻¹ ) [x, Q]*
                    ( 0          (B−σI)⁻¹               )
and, on decomposing
e = x(x*e) + Q(Q*e),
we obtain
x*e = (1/(λ−σ)) x*[I − AQ(B − σI)⁻¹Q*]g,
Q*e = (B − σI)⁻¹Q*g.
From
B − σI = B − λI − (σ − λ)I,
(B − σI)⁻¹ = [I − (σ − λ)(B − λI)⁻¹]⁻¹(B − λI)⁻¹,
we see that s = ||(B − σI)⁻¹||_2 is close to ||Σ_λ||_2 when |λ − σ| is small enough. We deduce

||Q*e||_2 ≤ s||g||_2 and |x*e| ≤ (1/|λ − σ|)(1 + ||A||_2 s)||g||_2.

When ||Σ_λ||_2 is of moderate size, we see that the closer σ is to λ, the more will
the error lie in the direction of lin(x) (see Figure 5.4.1). We know that
||Σ_λ||_2 ≥ δ⁻¹, where δ = min_{μ_i≠λ}|μ_i − σ|, with equality when A is Hermitian or
normal.


Figure 5.4.1

Consider a computer that operates with c digits in the base b. By 'machine
precision' we mean a quantity of the order ε = b⁻ᶜ. We say that σ is an exact
eigenvalue 'up to machine precision' if there exists E such that A − σI + E is
singular and ||E|| = O(ε). Similarly, y is an exact eigenvector 'up to machine
precision' if there exists F such that y is an eigenvector of A + F and ||F|| = O(ε).
The following remarkable fact is a consequence of Lemma 5.4.1. Starting with
an eigenvalue exact up to machine precision and with (almost) any vector, we
obtain with one step of inverse iteration an eigenvector that is exact up to
machine precision. More exactly, we shall prove the following result.

Proposition 5.4.2 Suppose σ is such that A − σI + E is singular, where ||E||_2 = ε.

Then there exists at least one vector u such that the exact solution y of (A − σI)y = u
determines an eigenvector of A that is exact with precision ε.

PROOF Let y be a solution of the singular system (A − σI + E)y = 0 such that
||y||_2 = 1. Put
u = (A − σI)y and z = Ey.
Then
(A + zy*)y = σy and ||zy*||_2 = ||z||_2 ≤ ε.
Hence σ and y are eigenelements of the matrix A + zy*, whose norm differs from
that of A by less than ε.
The exact solution of a neighbouring problem is the best we can hope to
achieve in practice. However, we must not lose sight of the fact that when λ is
ill-conditioned, there may be a great difference between σ and λ.

Example 5.4.1 Let A be a 2 × 2 defective matrix with the double eigenvalue
λ = 2, whose entries are of the order of 10¹⁰ (A is highly non-normal). Note that
σ = 1 and y = (1 + 10⁻²⁰)^{−1/2}(10⁻¹⁰, 1)^T
are exact eigenelements of a matrix A' with
||A' − A||_2 ≈ 10⁻¹⁰.
Now A has the double eigenvalue λ = 2 and |σ − λ| = 1, which is not small.
However, λ is an ill-conditioned eigenvalue, for it is defective and cond(V) ≈ 10¹⁰, V
being the Jordan basis.
Let us return to the iteration (5.4.1). Proposition 5.4.2 has shown that if σ is
exact up to machine precision, then there exists an eigenvector that is exact up
to machine precision. This remains true even if q_1 is only computed up to machine
precision. One might hope that q_2 will be better still. However, if λ is
ill-conditioned, this turns out very often to be false; one cannot obtain anything
better than machine precision, even if it is assumed that the solution of (5.4.1) is
exact. Let us look at this phenomenon in an example.

Example 5.4.2 Let
A = ( 1        1 )
    ( 10⁻¹⁰    1 )
and σ = 1. The exact eigenvalues are 1 ± 10⁻⁵. We compute the iterates defined
by (5.4.1), starting with
u^T = (0, 1),
and observe the residuals
r_k = (A − I)q_k.
It will be observed that ||r_k|| oscillates between 10⁻¹⁰ and 1.

The eigenvalues of A are ill-conditioned, for they possess eigenvectors that are
almost parallel.
An important variant of inverse iteration is the iteration of the Rayleigh
quotient:
q_0 = u/||u||_2,  ρ_0 = q_0^*Aq_0;
(A − ρ_{k−1}I)z_k = q_{k−1},  q_k = z_k/||z_k||_2,  ρ_k = q_k^*Aq_k   (k ≥ 1).   (5.4.2)
The properties of local convergence of (5.4.2) have been studied by Ostrowski*
(1958-9) (see also Parlett, 1980, pp. 71-9).

* Alexander Markowitsch Ostrowski, 1893-1986, born at Kiev, died at Lugano.


The results may be summarized as follows:

(a) If ρ_0 is an approximation to a semi-simple eigenvalue λ of A and if ρ_k → λ,
then this convergence is essentially quadratic.
(b) When A is normal, the convergence of ρ_k towards λ, if it takes place, is
essentially cubic.
We remark that the translation ρ_k in (5.4.2) varies with k; this enables us to
obtain a convergence whose order is asymptotically greater than unity. In the
subsequent sections we shall return to the relationships between the iteration of
the Rayleigh quotient, the QR method with origin shifts and Newton's method.
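Iteration (5.4.2) differs from inverse iteration only in that the shift is updated at every step; a minimal NumPy sketch (the symmetric test matrix is an illustrative assumption, chosen so that the convergence is essentially cubic, as in (b) above):

```python
import numpy as np

def rayleigh_quotient_iteration(A, u, k):
    """k steps of the Rayleigh quotient iteration (variable shift)."""
    n = A.shape[0]
    q = u / np.linalg.norm(u)
    rho = q @ A @ q
    for _ in range(k):
        z = np.linalg.solve(A - rho * np.eye(n), q)
        q = z / np.linalg.norm(z)
        rho = q @ A @ q            # the shift is the new Rayleigh quotient
    return rho, q

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])         # eigenvalues (5 +/- sqrt(5))/2
rho, q = rayleigh_quotient_iteration(A, np.array([1.0, 0.3]), 5)
print(rho)                         # converges to one of the two eigenvalues
```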


5.5.1 The Basic Algorithm

The basic QR algorithm consists of the construction of a sequence {A_k} of unitarily
similar matrices:
A_1 = A = Q_1R_1,  A_{k+1} = R_kQ_k = Q_{k+1}R_{k+1}   (k ≥ 1),
where the Q_k are unitary and the R_k are upper triangular matrices. Since
R_k = Q_k^*A_k, we have
A_{k+1} = Q_k^*A_kQ_k.
Put
Q̂_k = Q_1 ⋯ Q_k and R̂_k = R_k ⋯ R_1.
Then it can be verified that
Aᵏ = Q̂_kR̂_k.
We assume that the eigenvalues are simple and of strictly positive distinct
moduli; thus
|λ_1| > |λ_2| > ⋯ > |λ_n| > 0.   (5.5.1)
We point out immediately that the assumption 0 ∉ sp(A) is not restrictive, since
one can satisfy it by making a translation of the spectrum.
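The basic algorithm is a few lines of code; a minimal NumPy sketch (unshifted, applied to an illustrative symmetric matrix with distinct eigenvalue moduli):

```python
import numpy as np

def qr_algorithm(A, k):
    """k steps of the basic QR algorithm: A_k = Q_k R_k, A_{k+1} = R_k Q_k.
    All iterates are unitarily similar to A."""
    Ak = A.copy()
    for _ in range(k):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])
Ak = qr_algorithm(A, 100)
# The diagonal converges to the eigenvalues in decreasing order of modulus.
print(np.round(np.diag(Ak), 6))
```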

Lemma 5.5.1 The behaviour of A_k is determined by that of Q̂_k when k tends to
infinity.

PROOF This is evident because

A_{k+1} = Q̂_k^*AQ̂_k.

Lemma 5.5.2 The first r columns of Q̂_k generate the subspace AᵏE_r, where
E_r = lin(e_1, …, e_r), r = 1, …, n.

PROOF This follows from the triangular form of the matrix R̂_k which appears
in the factorization Aᵏ = Q̂_kR̂_k; we observe that both A and R̂_k are regular.

The QR method may therefore be regarded as a collection of n methods of

subspace iterations starting with the n nested subspaces
    E_1 ⊂ E_2 ⊂ ⋯ ⊂ E_n = ℂ^n.
The condition (5.5.1) implies that (5.1.1) is satisfied for r = 1, ..., n.
First, we shall study the convergence of the QR algorithm when applied to an
irreducible Hessenberg matrix, making use of Corollary 5.1.3.

Lemma 5.5.3 Let H be a Hessenberg matrix and let H = QR be its Schmidt

factorization. Then both Q and RQ are Hessenberg matrices.

PROOF The proof is left to the reader.
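Lemma 5.5.3 is easy to check numerically; the sketch below builds a random upper Hessenberg matrix (the size and random seed are our own choices) and verifies that both Q and RQ are again Hessenberg.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
H = np.triu(rng.standard_normal((n, n)), -1)   # upper Hessenberg matrix

Q, R = np.linalg.qr(H)                         # Schmidt (QR) factorization

def is_hessenberg(M, tol=1e-12):
    """True when all entries below the first subdiagonal vanish."""
    m = M.shape[0]
    return all(abs(M[i, j]) < tol for i in range(m) for j in range(m) if i > j + 1)

print(is_hessenberg(Q), is_hessenberg(R @ Q))
```

The structural reason is visible in the code: Q = H R^{-1} is the product of a Hessenberg matrix and an upper triangular one, and so is RQ.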

Theorem 5.5.4 Under the hypothesis (5.5.1) the QR algorithm, when applied to an irreducible Hessenberg matrix, produces a sequence of unitarily similar Hessenberg matrices which converges (modulo a unitary diagonal matrix) to an upper triangular matrix whose diagonal consists of the eigenvalues {λ_i}_1^n in this order.

PROOF Let Q be the Schur basis of H (unique up to a unitary diagonal matrix). When r < n, we can apply Theorem 5.2.3 to the first r columns of 𝒬_k, which form a basis of H^k E_r. When r = n, the recurrence still applies, because ω(H^k E_n, M_n) = 0 for all k. Hence we deduce that, given the bases Q and 𝒬_k of ℂ^n, there exists a unitary diagonal matrix D_k such that 𝒬_k D_k → Q as k → ∞. Now H_{k+1} is similar to
    D_k^* H_{k+1} D_k = D_k^* 𝒬_k^* H 𝒬_k D_k,
which converges to
    Q^* H Q = R.
This matrix is the Schur form of H whose diagonal consists of the eigenvalues {λ_i}_1^n ordered by decreasing modulus.

This theorem illuminates the theoretical importance of the Hessenberg form.

Its practical importance stems from Lemma 5.5.3, for in the course of the iterations
the matrices Qk and Hk are such that their subdiagonal parts contain at most
n — 1 elements which are not necessarily zero. This fact considerably diminishes
the volume of calculation to be effected.
We return to a diagonalisable matrix A whose eigenvalues satisfy (5.5.1):
    A = XDX^{-1},   where X = [x_1, ..., x_n].


Theorem 5.5.5 On the assumption (5.5.1) the QR algorithm, when applied to A, produces a sequence of unitarily similar matrices whose limit form is an upper triangular matrix having {λ_i}_1^n as its diagonal elements in this order, under the necessary and sufficient condition that the n − 1 principal minors of X^{-1} are non-zero.

PROOF It suffices to consider the necessary and sufficient condition of convergence given in Theorem 5.2.3 when r = n and U = [e_1, ..., e_n] = I. The speed of linear convergence is controlled by max_{r=1,...,n−1} |λ_{r+1}/λ_r|.

Remark The subspace iteration is an incomplete QR method in the sense to be

made more precise in Chapter 6, where we shall present the incomplete methods of
Lanczos and Arnoldi.

5.5.2 Eigenvalues with Equal Moduli

When there exist distinct eigenvalues with the same modulus, the limit form of
Ak need no longer be upper triangular, but may become upper block-triangular.
The limit form depends only on the eigenvalues and their multiplicity. It is beyond
the scope of this book to include a complete discussion of the convergence of the
basic QR algorithm in the presence of distinct eigenvalues with the same modulus.
The following result can be proved.

Theorem 5.5.6 When the QR algorithm is applied to an irreducible Hessenberg

matrix, it produces a sequence of matrices whose limit form is triangular, if and only
if there do not exist two distinct eigenvalues with the same modulus and with
algebraic multiplicities of the same parity.

PROOF See Parlett (1968) and Parlett and Poole (1972). Parlett's article of 1968
gives a necessary and sufficient condition for convergence to a quasi-triangular
form, that is having blocks on the diagonal of order at most two.

5.5.3 The QL Algorithm on A^{-*}

If A is invertible, the Euclidean scalar product satisfies the relation
    y^*x = (x, y) = (Ax, A^{-*}y).
The subspaces A^k E_r and A^{-*k} E_r^⊥ are orthogonal complements in pairs. Note that if E_r = lin(e_1, ..., e_r), then
    E_r^⊥ = lin(e_{r+1}, ..., e_n).
The QR algorithm, which we have interpreted as a collection of n subspace iteration methods for A, can also be interpreted with the help of iterations for A^{-*}.

Proposition 5.5.7 The QR algorithm for A is equivalent to the QL algorithm for A^{-*}.

PROOF The QL factorization of a matrix A amounts to orthonormalizing its columns by beginning with the last rather than with the first, as is the case in the QR factorization; L is a regular lower triangular matrix. Let A_k = Q_k R_k, so that A_k^{-*} = Q_k R_k^{-*} and
    L_k = R_k^{-*},
which is a lower triangular matrix. Put
    ℒ_k = L_k ⋯ L_1 = ℛ_k^{-*}.

Just as in the method of subspace iteration, different bases can be constructed in A^k E_r, r = 1, ..., n. Thus we obtain different algorithms which converge in the same manner but enjoy different numerical stability.
same manner but enjoy different numerical stability.

Example 5.5.1 Rutishauser's* (1958) LR algorithm. We construct (if possible) the following sequence {A_k} of similar matrices:
    A = A_1 = L_1 R_1,   A_{k+1} = R_k L_k = L_{k+1} R_{k+1}   (k ≥ 1),
where L_k is a lower triangular matrix, all of whose diagonal elements are equal to unity (Gauss's factorization without pivoting). If ℛ_k = R_k ⋯ R_1 and ℒ_k = L_1 ⋯ L_k, then A^k = ℒ_k ℛ_k and A_{k+1} = ℒ_k^{-1} A ℒ_k. It is easy to verify that the first r columns of ℒ_k generate the subspace A^k E_r, r = 1, ..., n (see Lemma 5.5.2). One

♦Heinz Rutishauser, 1918-1970, born at Weinfelden, died in Zürich.


might compare the LR algorithm with the method of 'staircase' iteration

presented in Example 5.2.1.
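A minimal sketch of the LR iteration follows (the test matrix and step count are ours); it relies on Gauss factorization without pivoting, so it can break down for a general matrix.

```python
import numpy as np

def lu_nopivot(A):
    """Gauss factorization A = L R without pivoting; L is unit lower
    triangular, R upper triangular (may break down in general)."""
    n = A.shape[0]
    L = np.eye(n)
    R = np.array(A, dtype=float)
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = R[i, j] / R[j, j]
            R[i, :] -= L[i, j] * R[j, :]
    return L, R

def lr_algorithm(A, steps=100):
    """Rutishauser's LR iteration: A_k = L_k R_k, A_{k+1} = R_k L_k."""
    Ak = np.array(A, dtype=float)
    for _ in range(steps):
        L, R = lu_nopivot(Ak)
        Ak = R @ L
    return Ak

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
print(np.diag(lr_algorithm(A)))   # approximates the eigenvalues
```

Each step performs a similarity transformation A_{k+1} = L_k^{-1} A_k L_k, so the spectrum is preserved while the subdiagonal decays.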

5.5.4 Shifts of Origin

The linear rate of convergence of the basic algorithm can be improved by using shifts of origin that vary at each step. Thus consider the algorithm:
    A_1 = A,   A_k − σ_k I = Q_k R_k,   A_{k+1} = R_k Q_k + σ_k I   (k ≥ 1),
where {σ_k} is a sequence of scalars.
Two strategies are often used in practice. They are defined by the following choices:
(a) σ_k = a_nn^{(k)} = e_n^* A_k e_n.
(b) σ_k is the eigenvalue of the submatrix E_2^T A_k E_2 that is closest to a_nn^{(k)}, where E_2 = [e_{n−1}, e_n]. This shift is known as Wilkinson's shift.
There exists no global result on the convergence of the shifted QR algorithm for a non-normal Hessenberg matrix. With strategy (a) the convergence of a_nn^{(k)} to an eigenvalue is asymptotically quadratic. When strategy (b) is applied to a symmetric tridiagonal matrix, the convergence of a_nn^{(k)} to an eigenvalue is at least quadratic and almost always cubic.
Strategy (a) is related to the iteration of the Rayleigh quotient in the following way.

Lemma 5.5.8 If we choose σ_k = a_nn^{(k)}, then Q_k e_n is proportional to (A_k^* − σ̄_k I)^{-1} e_n.

PROOF The relation
    A_k − σ_k I = Q_k R_k
implies that
    Q_k^* = R_k (A_k − σ_k I)^{-1},
    Q_k e_n = (A_k^* − σ̄_k I)^{-1} R_k^* e_n
            = (A_k^* − σ̄_k I)^{-1} r̄_nn^{(k)} e_n,
where r_nn^{(k)} (≠ 0) is the element of R_k in the position (n, n). Hence
    Q_k e_n = (A_k^* − σ̄_k I)^{-1} e_n / ||(A_k^* − σ̄_k I)^{-1} e_n||.
Starting from q_{k−1} = e_n and ρ_{k−1} = ā_nn^{(k)}, one iteration of the Rayleigh quotient upon A_k^* yields q_k = Q_k e_n; when strategy (a) is followed, the coefficient of A_{k+1} in position (n, n) is therefore the result of one iteration of the Rayleigh quotient on A_k^* starting with e_n, which is an approximate eigenvector.
In practice, the shift of origin is applied together with deflation: the last row and the last column of the matrix are suppressed when the coefficient a_{n,n−1}^{(k)} is considered to be sufficiently small; a new shift of origin is then applied to the matrix of order n − 1.
It is evident that when shifts of origin are employed, the eigenvalues will no
longer necessarily appear in order of decreasing moduli.
The interested reader will find a description of the practical implementation
of the QR algorithm in Golub and Van Loan (1989, pp. 228-37).
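The shift-plus-deflation loop can be sketched as follows, using strategy (a); the tolerances and the symmetric test matrix are our own illustrative choices, not a production implementation.

```python
import numpy as np

def shifted_qr_eigenvalues(A, tol=1e-12, maxit=100):
    """QR with shift sigma_k = a_nn^(k) and deflation: when a_{n,n-1}
    is negligible, accept a_nn as an eigenvalue and drop row/column n."""
    A = np.array(A, dtype=complex)
    eigs = []
    while A.shape[0] > 1:
        n = A.shape[0]
        for _ in range(maxit):
            sigma = A[-1, -1]
            Q, R = np.linalg.qr(A - sigma * np.eye(n))
            A = R @ Q + sigma * np.eye(n)
            if abs(A[-1, -2]) < tol:
                break
        eigs.append(A[-1, -1])
        A = A[:-1, :-1]                 # deflation
    eigs.append(A[0, 0])
    return np.array(eigs)

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])
print(np.sort(shifted_qr_eigenvalues(A).real))
```

As the text notes, with shifts the eigenvalues no longer necessarily emerge in order of decreasing modulus, which is why the result is sorted only for display.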


5.6 Hermitian Matrices

When the matrix A is Hermitian, numerous simplifications occur, as we have
indicated from time to time. We recall that among the most important ones are
the facts that A is diagonalisable, that the eigenvalues are real and that the
eigenvectors are orthogonal. The simplifications relating to the QR algorithm can
be summarized as follows:
(a) Complexity. The tridiagonal (or symmetric band) form is preserved (see Exercise 5.6.1).
(b) Convergence. When the QR algorithm with Wilkinson's shift is applied to
an irreducible tridiagonal matrix the convergence is always at least linear.
The asymptotic rate of convergence is almost always at least cubic.
The reader will find a detailed treatment of the symmetric eigenvalue problem
in Golub and Van Loan (1989, Ch. 8) and particularly in Parlett (1980, Chs. 8
and 9).


5.7 The QZ Algorithm

In this section we consider the generalized problem
    Ax = λBx,
whose eigenvalues can be computed by the QZ algorithm (Moler and Stewart,
1973). This algorithm generalizes QR: when B is regular, QZ reduces essentially to the application of QR to AB^{-1}, without the need to compute B^{-1}.
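That reduction can be checked numerically: for a regular B, the eigenvalues of the pencil (A, B) are the ordinary eigenvalues of AB^{-1}. The random matrices below are our own illustration; a practical QZ code never forms B^{-1} explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # safely regular

# Generalized eigenvalues obtained via the ordinary problem for A B^{-1}.
pencil_eigs = np.linalg.eig(A @ np.linalg.inv(B))[0]

# Each computed lambda makes A - lambda*B singular.
residuals = [abs(np.linalg.det(A - lam * B)) for lam in pencil_eigs]
print(max(residuals))   # small
```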

Theorem 5.7.1 There exist unitary matrices Q and Z such that Q^*AZ = T and Q^*BZ = S are upper triangular matrices. If for some value of i we have t_ii = s_ii = 0, then sp[A, B] = ℂ; otherwise
    sp[A, B] = {t_ii/s_ii : s_ii ≠ 0}.

PROOF Let {B_k} be a sequence of regular matrices that converges to B as k → ∞ (B need not be regular). For each k we define:
(a) the Schur decomposition Q_k^*(A B_k^{-1}) Q_k = R_k,
(b) the Schmidt factorization B_k^{-1} Q_k = Z_k S_k^{-1}, where R_k and S_k are upper triangular matrices.
Therefore Q_k^* A Z_k = R_k S_k and Q_k^* B_k Z_k = S_k are also upper triangular matrices. The sequence of pairs of unitary matrices {(Q_k, Z_k)} possesses a convergent subsequence Q_ℓ → Q and Z_ℓ → Z, where ℓ ∈ ℕ′ ⊂ ℕ. It can be verified that the limit matrices are unitary and that both Q^*AZ and Q^*BZ are upper triangular matrices. The statement about sp[A, B] follows from the identity
    det(A − λB) = det(QZ^*) ∏_{i=1}^n (t_ii − λs_ii).

The stability of the generalized eigenvalue problem will be studied in Exercise

The QZ algorithm is divided into two stages:
(a) The determination of the two unitary matrices Q and Z such that Q*AZ and
Q*BZ are in upper Hessenberg and triangular forms respectively. This is the
preparatory stage.
(b) The iteration: A_k − λB_k = Q_k^*(A_{k−1} − λB_{k−1})Z_k, k ≥ 1, where A_k and B_k are in upper Hessenberg and triangular forms respectively. The matrix A_k B_k^{-1} is essentially the result of applying one step of the QR process to A_{k−1} B_{k−1}^{-1}. It can be shown that, as k → ∞, the limit form of A_k is upper block-triangular.
The reader will find a complete description of the QZ algorithm in Golub and
Van Loan (1989, pp. 251-66).


Let λ be a simple eigenvalue of A; the eigenvector x, normalized by y^*x = 1, satisfies the equation
    F(x) = Ax − x(y^*Ax) = 0.   (5.8.1)
We apply Newton's method to (5.8.1) in the manner defined in Section 2.10 of Chapter 2; thus we write
    x^0 = u/(y^*u),   z = x^{k+1} − x^k,
    (I − x^k y^*)Az − z(y^*Ax^k) = −Ax^k + x^k(y^*Ax^k)   (k ≥ 0),
where the superscript k represents the number of the iteration. We deduce that
    Ax^{k+1} − x^{k+1}(y^*Ax^k) = x^k[y^*A(x^{k+1} − x^k)]   (k ≥ 0).
If we put μ_k = y^*Ax^k, k ≥ 0, we obtain the equivalent equation
    (A − μ_k I)x^{k+1} = τ_k x^k,   (5.8.2)
where τ_k is a non-zero scalar.
This may be interpreted as follows: x^{k+1} is the solution, normalized by y^*x^{k+1} = 1, of a system of linear equations with matrix A − μ_k I and right-hand side parallel to x^k.
Proposition 5.8.1 The application of Newton's method to (5.8.1) is equivalent to
an iteration of the right Rayleigh quotient.

PROOF We define the iteration of the right Rayleigh quotient as follows:
    q_0 = u/||u||,   ν_k = y^*Aq_k / y^*q_k,
    (A − ν_{k−1} I) z_k = q_{k−1},   q_k = z_k/||z_k||
(that is, only the right-hand vector in the Rayleigh quotient is modified).
We shall show inductively that μ_k = ν_k and that the vectors x^k and q_k generate the same direction; they differ only in their normalizations, which are y^*x^k = 1 and ||q_k|| = 1 respectively. This is true when k = 0; indeed,
    μ_0 = y^*Ax^0 = y^*Au / y^*u = ν_0.
Suppose the assertion is true for the (k − 1)st iteration; hence the vector
    q'_k = q_k / (y^*q_k)
satisfies y^*q'_k = 1. Thus q'_k = x^k and μ_k = ν_k.

This leads to the conclusion that the local convergence of the iteration (5.8.2)
is quadratic.
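A minimal numpy sketch of this Newton iteration follows; the matrix A and the vectors y, u below are illustrative choices of ours.

```python
import numpy as np

def newton_eigen(A, y, u, steps=10):
    """Newton's method on F(x) = Ax - x(y*Ax): each step solves a system
    with A - mu_k I, then renormalizes so that y* x_{k+1} = 1."""
    x = u / (y.conj() @ u)
    for _ in range(steps):
        mu = y.conj() @ A @ x                 # mu_k = y* A x^k
        try:
            z = np.linalg.solve(A - mu * np.eye(len(x)), x)
        except np.linalg.LinAlgError:
            break                             # mu is (numerically) an eigenvalue
        x = z / (y.conj() @ z)
    return y.conj() @ A @ x, x

A = np.array([[2.0, 1.0],
              [0.0, 5.0]])
y = np.array([0.3, 0.7])
mu, x = newton_eigen(A, y, np.array([1.0, 2.0]))
print(mu)   # close to an eigenvalue of A
```

The quadratic convergence shows up as a roughly doubling number of correct digits in mu per step.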



We continue to assume that λ is a simple eigenvalue and that the corresponding eigenvector x is normalized by y^*x = 1. We consider Newton's modified method:
    x^0 = u/(y^*u),   z = x^{k+1} − x^k,
    (I − x^k y^*)Az − z(y^*Ax^0) = −Ax^k + x^k(y^*Ax^k)   (k ≥ 0),   (5.9.1)
or else, if ζ = y^*Ax^0,
    (A − ζI)x^{k+1} = x^k[y^*A(x^{k+1} − x^0)].

Proposition 5.9.1 Newton's modified method (5.9.1) is equivalent to the method of inverse iteration on A, starting with u and ζ = y^*Au/y^*u.

PROOF The vectors defined by these two methods correspond respectively to

the normalizations y*xk = 1 and || qk ||2 = 1. They generate the same direction.

Remark The two methods are mathematically equivalent, but not numerically. In fact, we have seen in Proposition 5.4.2 that one cannot surpass the machine precision for q_k when it is calculated by the inverse iteration method. On the other hand, we can obtain x^k to a precision equal to that used to compute the residual Ax^k − x^k(y^*Ax^k), while the linear systems are solved in single precision.

5.9.2 Simultaneous Inverse Iterations

Let σ be an approximation to an eigenvalue λ of algebraic multiplicity m and let U be a set of m linearly independent vectors. We can generalize (5.4.1) by defining the inverse subspace iteration (or block-inverse iteration, or simultaneous inverse iterations) in the following manner:
(a) U = Q_0 R_0;
(b) when k ≥ 0, put
    (A − σI)Y_{k+1} = Q_k,   Y_{k+1} = Q_{k+1} R_{k+1}.   (5.9.2)
Let Q be an orthonormal basis of the invariant subspace M associated with λ and let Q̂ be an orthonormal basis of the orthogonal complement M^⊥. Put
    α = λ − σ,   ε = |α|,
so that A − σI is almost singular for small enough ε.
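For a semi-simple eigenvalue the iteration behaves well; here is a minimal numpy sketch of (5.9.2). The diagonal test matrix, with λ = 1 of multiplicity two, is our own choice.

```python
import numpy as np

def inverse_subspace_iteration(A, U, sigma, steps=20):
    """Block-inverse iteration (5.9.2): orthonormalize U, then repeatedly
    solve (A - sigma I) Y_{k+1} = Q_k and re-orthonormalize."""
    n = A.shape[0]
    Q, _ = np.linalg.qr(U)
    for _ in range(steps):
        Y = np.linalg.solve(A - sigma * np.eye(n), Q)
        Q, _ = np.linalg.qr(Y)
    return Q

A = np.diag([1.0, 1.0, 3.0, 7.0])      # lambda = 1, multiplicity m = 2
U = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 1.0],
              [1.0, 0.0]])
Q = inverse_subspace_iteration(A, U, sigma=0.9)
R = A @ Q - Q @ (Q.T @ A @ Q)          # residual of the invariant subspace
print(np.linalg.norm(R))               # small: Q spans lin(e1, e2)
```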
The following analogue of Lemma 5.4.1 is easily proved.

Lemma 5.9.2 If csp(M) = ||Σ_λ||_F is moderate, the error committed in solving
    (A − σI)Y = U   (5.9.3)
lies mainly in the required subspace M.

PROOF This is entirely analogous to the proof of Lemma 5.4.1. The system we wish to solve can be written in the block form induced by the bases Q and Q̂, where B = Q^*AQ and B̂ = Q̂^*AQ̂. It remains to compare ||(B̂ − σI)^{-1}||_F with ||(B − σI)^{-1}||_F when σ is close to λ. If λ is semi-simple, then B = λI and we are brought back to the situation of Lemma 5.4.1. If λ is not semi-simple, it can be verified that when csp(M) is moderate, so is ||(B̂ − σI)^{-1}||_F (see Exercise 5.9.1).
However, if σ is too close to a defective eigenvalue λ, then the basis Y may consist of vectors that are almost linearly dependent.

Lemma 5.9.3 Let

    G = ( α  1        )
        (    α  ⋱     )
        (       ⋱  1  )
        (          α  )

be of order r. If ε = |α| is sufficiently small, then G^{-1} is of rank unity up to ε^{1/2}.

PROOF It is easy to verify that

    G^{-1} = ( α^{-1}  −α^{-2}  ⋯  (−1)^{r+1}α^{-r} )
             (         α^{-1}  ⋱        ⋮           )
             (                 ⋱     −α^{-2}        )
             (                       α^{-1}         )
           = (−1)^{r+1} α^{-r} K, say.

Let Π be the matrix e_1 e_r^T, which maps ℂ^r onto lin(e_1). Π is of rank unity: one of its singular values is equal to unity while the other r − 1 singular values are equal to zero. In fact, Π^TΠ = e_r e_1^T e_1 e_r^T = e_r e_r^T, which is a diagonal matrix of rank one. On the other hand, G^{-1} and K have the same rank: the columns of G^{-1} are multiplied by α^r, apart from a sign, in order to obtain K. Now
    ||K − Π||_2 ≤ c(2ε + 3ε^2 + ⋯ + rε^{r−1}) ≤ cε,
where c is a generic constant depending on r.
We conclude that the difference between the singular values of K and those of Π is of the order of ε^{1/2} (see Exercise 5.9.2).
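The rank collapse described in Lemma 5.9.3 can be observed numerically; the order r and the value of α below are our own choices.

```python
import numpy as np

# G = alpha*I + N, a Jordan block shifted by a sigma close to lambda:
# its inverse has one dominant singular value, the others are smaller
# by factors of order alpha, so G^{-1} is numerically of rank one.
r, alpha = 4, 1e-3
G = alpha * np.eye(r) + np.diag(np.ones(r - 1), 1)

s = np.linalg.svd(np.linalg.inv(G), compute_uv=False)
print(s / s[0])   # first ratio is 1, the rest are tiny
```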
In view of Lemma 5.9.2 we are interested in that part of the solution Y of (5.9.3) which lies in M, that is
    Q^*Y = (B − σI)^{-1}[Q^*U − Q^*AQ̂(B̂ − σI)^{-1}Q̂^*U].
For that reason we study the solution Z of the system (A − σI)Z = Q.

Theorem 5.9.4 If λ is defective, the m vectors that are the solutions of
    (A − σI)Z = Q
are linearly dependent up to ε^{1/2}, for small enough ε.

PROOF With the notations of the proof of Lemma 5.9.2 we have
    Z = Q(B − σI)^{-1}.
The rank of Z is equal to the rank of (B − σI)^{-1}. Let V be the Jordan basis of B = Q^*AQ and let B_λ be the Jordan box of B associated with the eigenvalue λ of geometric multiplicity g < m, that is
    B_λ = diag(J_1, ..., J_g),
where the {J_k}_1^g are the g Jordan blocks of λ.
For each k (1 ≤ k ≤ g), J_k − σI is a matrix of the type G in Lemma 5.9.3 of order r_k, equal to the grade of the eigenvector that is associated with the Jordan block J_k, r_k ≤ t, t being the index of λ.
For sufficiently small ε, the rank of (B_λ − σI)^{-1} is therefore equal to g up to the order of ε^{1/2}; the same holds for the rank of Z.

Corollary 5.9.5 If 'λ is defective, the method of simultaneous inverse iterations is

unstable when ε is sufficiently small.

PROOF The basis calculated by means of (5.9.2) is close to being degenerate: Y_k tends to become of rank g < m as Q_k → Q, which is an orthonormal basis of M.
Hence there is no analogue of Proposition 5.4.2 when λ is a defective eigenvalue; this is confirmed by the following example.

Example 5.9.1 Let A = VJV^{-1}, where

    J = ( 4 1 0 0 0 )              ( 1 0 0 0 −1 )
        ( 0 4 0 0 0 )              ( 1 1 0 0 −1 )
        ( 0 0 1 1 0 )   and   V =  ( 1 1 1 0 −1 )
        ( 0 0 0 1 0 )              ( 1 1 1 1 −1 )
        ( 0 0 0 0 2 )              ( 1 1 1 1  0 ).

We wish to compute an orthogonal basis of the invariant subspace M associated with the defective eigenvalue λ = 1, that is, the subspace spanned by the rows of
    M^T = ( 0 0  a  a  a )
          ( 0 0 −2b b  b ).

Table 5.9.1. Residuals and computed bases Q_k for three shifts: for σ = 0.85, ρ_15 < 0.2 × 10^{-10}; for σ = 0.99, ρ_k < 0.25 × 10^{-10}; for σ = 0.9999999, ρ_k does not decrease (2 < ρ_k < 4).

We use the simultaneous inverse iterations defined by (5.9.2), starting with
    U^T = (  1  1  1  0  0 )
          ( −1  1  0  1  0 )
and σ. We consider the residual matrix
    R_k = AQ_k − Q_k(Q_k^*AQ_k)
and the norm ||R||_* = Σ_{i,j} |r_{ij}|, where R = (r_{ij}). The numerical results are summarized in Table 5.9.1.
We observe that when σ is far from λ (λ − σ = 0.15), the residual ρ_k becomes small, but the solution has only one correct digit. On the other hand, when σ is close to λ (λ − σ = 10^{-7}), the eigenvector has seven correct significant digits while the principal vector has none.

5.9.3 A Modified Newton Method

We present a modified Newton method which is mathematically equivalent to
(5.9.2) in the sense of the following proposition.

Proposition 5.9.6 Let
    X_0 = U,   Y^*U = I,   Z = X_{k+1} − X_k,
    (I − X_k Y^*)AZ − σZ = −F(X_k)   (k ≥ 0).   (5.9.4)
Then the bases calculated by (5.9.2) and by (5.9.4) generate the same subspace.

PROOF We can write (5.9.4) as
    (A − σI)X_{k+1} = X_k E_k,   where E_k = Y^*AX_{k+1} − σI.
Now E_k is regular, for, if not, X_{k+1} would be of rank less than m, which is impossible because A − σI is regular and, by (5.9.4), X_{k+1} satisfies Y^*X_{k+1} = I. We conclude that X_{k+1} and Y_{k+1} generate the same subspace.

When λ is defective, the basis calculated by (5.9.4) is of rank m, which remains constant throughout the iterations; on the other hand, the basis Y_k, calculated by (5.9.2), tends to become numerically of rank g when σ is sufficiently close to λ. This is essentially due to the fact that, when the projection I − X_k Y^* is applied to A, it will in the course of the solution eliminate the contribution of the subspace upon which A − σI approaches singularity.
How does one study the convergence of (5.9.4)? For example, one might
compare (5.9.4) with the simplified Newton method (2.11.1) whose convergence
was demonstrated in Section 2.11 on the condition that the initial basis U yields
a residual matrix R = AU — UB of sufficiently small norm, where B = Y*AU.
The Newton iteration (2.10.4), the simplified Newton iteration (2.11.1) and the modified Newton iteration (5.9.4) lead, at the kth iteration, to the solution of a Sylvester equation for Z defined respectively by
    (I − X_k Y^*)AZ − Z(Y^*AX_k)   for (2.10.4),
    (I − UY^*)AZ − ZB              for (2.11.1),
    (I − X_k Y^*)AZ − σZ           for (5.9.4).
We may interpret (5.9.4) by saying that B has been replaced by σI, which is legitimate when ||B − σI|| is sufficiently small. However, we shall see that this is no longer always true when λ is defective.
Suppose B = Y^*AU is diagonalisable: B = WΔW^{-1}. Then
    ||B − σI|| = ||W(Δ − σI)W^{-1}||
               ≤ cond(W) max_i |δ_i − σ|,
where the δ_i are the eigenvalues of B.

Proposition 5.9.7 If λ is defective and B is diagonalisable, then cond_2(W) is necessarily large as soon as ||U − X||_2 is sufficiently small.

PROOF F = UW is the matrix of eigenvectors of the matrix A'_0 = UBY^*; in fact, we suppose that the basis U is chosen to be orthonormal. Then W^*W = F^*F and
    cond_2(W) = σ_max(F)/σ_min(F),
where σ_max(F) and σ_min(F) are the greatest and the least singular values of F.
Let E_λ be the eigenspace of A associated with the eigenvalue λ. We have dim E_λ = g < m. Let Π_λ be the matrix representing the orthogonal projection on E_λ. The matrix G = Π_λ F is of rank at most g because each of its columns lies in E_λ. On the other hand, on putting F = [f_1, ..., f_m] we have
    ||(I − Π_λ)f_i||_2 = dist(f_i, E_λ)   (i = 1, ..., m).
The reader can verify that E_λ is also the eigenspace of A_0 = XBY^*, where B = Y^*AX.
We deduce from Theorem 4.3.7 that
    ||F − G||_F^2 = Σ_i ||(I − Π_λ)f_i||_2^2 ≤ c||A'_0 − A_0||_2^{2/t},
where t is the index of λ, provided that η = ||A'_0 − A_0||_2 is sufficiently small. Now
    A'_0 − A_0 = UBY^* − XBY^*
               = (U − X)BY^* + X Y^*A(U − X)Y^*.
Hence ||A'_0 − A_0||_2 is small as soon as ||U − X||_2 is sufficiently small.
The m − g smallest eigenvalues σ_i^2 of F^*F satisfy σ_i^2 = O(η^{1/t}) (see Exercise 5.9.3). In conclusion, it remains to obtain a lower bound for the greatest σ_i^2; in fact,
    cond_2(W) ≥ η^{−1/2t}(σ_max^2(G) − cη^{1/t})^{1/2}/c = cη^{−1/2t},
where c is a generic constant; the quantity η^{−1/2t} is greater the smaller η is. As a result, B cannot be well approximated by the diagonal matrix σI.
The main interest in the iteration (5.9.2) lies in the fact that the matrix A − σI remains fixed throughout the iterations; in contrast, the iteration (2.11.1) requires the solution of a Sylvester equation. We now put forward a modification of (2.11.1) that does not require such an expensive step, while remaining stable when λ is defective. When λ is defective, as we have seen, B is not well approximated by σI. Let B = QTQ^* be the Schur decomposition of B, where
    T = diag(t_ii) + N,
N being the strictly upper triangular part of T. We put
    B̃ = QT̃Q^*,   T̃ = σI + N.
Then ||B − B̃||_2 = ||T − T̃||_2 = max_i |t_ii − σ|.
This enables us to put forward a modified Newton method which is stable when λ is defective and which has almost the same complexity as (5.9.2):
    X_0 = U,   Z = X_{k+1} − X_k,
    (I − UY^*)AZ − ZB̃ = −F(X_k)   (k ≥ 0).   (5.9.5)
In this context a natural choice for σ is the arithmetic mean
    σ = (1/m) Σ_{i=1}^m t_ii.

In contrast to (5.9.2), the interest in (5.9.5) lies in the fact that it allows us to compute the set of all m basis vectors of M with the precision required. See Exercise 2.11.3 for a sufficient condition of convergence for (5.9.5). This sufficient condition requires that λ is not too ill-conditioned.

The geometrical illumination of Sections 5.1 to 5.5 was inspired by the fundamental
article of Parlett and Poole (1973) and by Watkins' paper (1982). The presentation
of the inverse iteration method was adapted from the article by Peters and
Wilkinson (1979), in particular Proposition 5.4.2 and Example 5.4.2. The study
of the simultaneous inverse iterations for calculating a defective eigenvalue is
new (Chatelin, 1986).
Here is a point of nomenclature: the method of simultaneous iterations
(Rutishauser, 1969) has different names according to the context: subspace
iteration in Parlett's (1980) book (following the custom of structural engineers)
and orthogonal iteration in the book by Golub and Van Loan (1989, Ch. 7). The
practical implementation of this method always involves a step of projection (see
Chapter 6).


Section 5.1 Convergence of a Krylov Sequence of Subspaces

5.1.1 [A] Prove that if an irreducible Hessenberg matrix is diagonalisable, then
all its eigenvalues are simple. Deduce that this is also true for a symmetric
irreducible tridiagonal matrix.
5.1.2 [D] Let A be a regular matrix and S_0 a vector subspace on which A acts. Denote the corresponding Krylov sequence by 𝒦_k. Prove that this sequence becomes stationary, that is, 𝒦_{k+1} = 𝒦_k for all sufficiently large k, if and only if one subspace of the sequence is invariant under A.

Section 5.2 The Method of Subspace Iteration

5.2.1 [C] Study the convergence of the method (5.2.1) for the matrix
(\ 0 0
A=\ -1 0
by using as the initial subspace
(a) S = lin(e_1, e_2);
(b) S = lin(e_3).
5.2.2 [B:67] Investigate the possibility of computing the eigenvalues of A by
using the LR method described in Example 5.2.1.
5.2.3 [D] Consider the matrices A′ and Â defined in Proposition 5.2.5. Prove that
    A′ = A(I − P) + (λ − σ)P,
    Â = A(I − P^⊥) + (λ − σ)P^⊥,
where P and P^⊥ are, respectively, the spectral projection and the orthogonal projection on the one-dimensional eigenspace associated with λ.
5.2.4 [D] Study the convergence of the subspace iteration for an irreducible Hessenberg matrix by using an initial basis U_0 ∈ ℂ^{n×r} of the form

    U_0 = ( *  x  ⋯  x )
          ( 0  *  ⋱  ⋮ )
          ( ⋮  ⋱  ⋱  x )
          ( 0  ⋯  0  * )
          ( 0  ⋯  ⋯  0 )
          ( ⋮           )
          ( 0  ⋯  ⋯  0 ),

where * is a non-zero element, x is an element that is not necessarily zero and all other elements are zero.

Section 5.3 The Power Method

5.3.1 [A] Prove that on the assumptions of Theorem 5.3.1 we have
    |q_k^* A q_k − λ_1| = O(|λ_2/λ_1|^k).

5.3.2 [D] Suppose that A is Hermitian and that the conditions of Theorem 5.3.1 are satisfied. Prove that
    |q_k^* A q_k − λ_1| = O(|λ_2/λ_1|^{2k}).
5.3.3 [D] How can the power method be used to compute the eigenvalue(s) of
least modulus of a regular matrix?
5.3.4[C] Study the behaviour of the power method for the matrices

*-(; i) a-d - ( ί Λ) »**

5.3.5 [B:49] For a given polynomial
    p(z) = z^n + a_1 z^{n−1} + ⋯ + a_n
we define its companion matrix (c_ij) as follows (see Exercise 1.1.13):
    c_ij = −a_{n−i+1}   if j = n and 1 ≤ i ≤ n,
    c_ij = 1            if j = i − 1 and 2 ≤ i ≤ n,
    c_ij = 0            otherwise.
Prove that the power method for C is equivalent to Bernoulli's* method for p(z). This method consists in computing
    z_{n+k} = −(a_1 z_{n+k−1} + a_2 z_{n+k−2} + ⋯ + a_n z_k)
for k = 0, 1, 2, ..., when z_0, ..., z_{n−1} are given.
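The equivalence can be checked numerically. The cubic p(z) = (z − 1)(z − 2)(z − 3) below is our own example; by Cayley–Hamilton, the first components of the power iterates C^k v obey Bernoulli's recurrence, and their successive ratios converge to the dominant root.

```python
import numpy as np

# Companion matrix of p(z) = z^n + a1 z^{n-1} + ... + an, as in the
# exercise: ones on the subdiagonal, -a_{n-i+1} in column n of row i.
a = np.array([-6.0, 11.0, -6.0])   # p(z) = z^3 - 6z^2 + 11z - 6
n = len(a)
C = np.zeros((n, n))
C[np.arange(1, n), np.arange(n - 1)] = 1.0
C[:, -1] = -a[::-1]

# Power method on C: iterate v -> C v and record the first component.
v = np.array([1.0, 0.0, 0.0])
z = []
for _ in range(40):
    z.append(v[0])
    v = C @ v
z = np.array(z)

# z_{k+3} = 6 z_{k+2} - 11 z_{k+1} + 6 z_k, and z_{k+1}/z_k -> 3.
print(z[-1] / z[-2])
```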

5.3.6 [D] With the help of the power method and Exercise 5.3.2, propose a
method with quadratic convergence for the calculation of the spectral radius of
A without having to calculate the product A*A explicitly.
5.3.7 [C] Carry out the computations relating to Example 5.3.2. Verify that the two limit vectors are
    v_1 = (−0.115, 0, 0.577, 0.808)^T,
    v_2 = (−0.115, 0, −0.808, −0.577)^T,
and that the vectors v_1 + v_2 and v_1 − v_2 are proportional to the eigenvectors associated with 3 and −3.
5.3.8 [B:67] Let λ_1 and λ_2 be the first two dominant eigenvalues of A. Suppose
    λ_1 = e^{iθ}λ_2,   λ_1 ≠ λ_2,   θ = 2πp   (p ∈ ℚ).


♦Daniel Bernoulli, 1700-1782, born at Groningen, died in Basle.


Prove that every pair of linearly independent vectors chosen from the limit vectors of the power method enables us to construct a 2 × 2 matrix with spectrum {λ_1, λ_2}.
5.3.9 [B:67] Revert to Exercise 5.3.8 in the case in which A is real and λ_1 = −λ_2. Prove that if v and w are two limits of convergent subsequences, then v + w and v − w are the corresponding eigenvectors.

Section 5.4 The Method of Inverse Iteration

5.4.1 [D] Propose a method of subspace iteration in order to compute the
eigenvalues of least modulus of a regular matrix.
5.4.2 [B:45] Establish the quadratic and cubic rates of convergence of the
Rayleigh quotient in the general case and in the Hermitian case respectively.

Section 5.5 The QR Algorithm

5.5.1 [D] Let H be a singular irreducible Hessenberg matrix. Prove that the
eigenvalue zero is recovered after one step of the QR algorithm.
5.5.2 [B:64,67] Study the equivalence between the algebraic proof of the QR
method (given, for example, in [B:67]) and the geometric proof given in Theorem
5.5.5. In particular, prove that if H is an irreducible diagonalisable Hessenberg
matrix and V is the matrix of eigenvectors, then all the principal minors of V~l
are non-zero.
5.5.3 [C] Apply the QR algorithm to the matrix

- G -?>
Show that one obtains two distinct constant subsequences. Comment.
5.5.4 [D] Let A ∈ ℂ^{n×n}. Consider the following algorithm, known as additive reduction (AR):
    A_0 = A,
    A_{k+1} = E_k^{-1} A_k E_k,
where E_k is the lower triangular part of A_k, including the diagonal.
(a) Prove that if A is an irreducible lower Hessenberg matrix with eigenvalues of distinct moduli, and if the matrices A_k = (a_ij^{(k)}) generated by the AR algorithm are such that
    ∀k: |a_11^{(k)}| > |a_22^{(k)}| > ⋯ > |a_nn^{(k)}| > 0,
then, as k → ∞, the diagonal elements of A_k tend to the eigenvalues of A.
(b) Compare the complexity of this algorithm with that of the QR method.

5.5.5 [D] Examine the potential instability of the AR algorithm (Exercise 5.5.4).
In particular, examine the case of defective eigenvalues and the case of matrices
with a greatly extended spectrum.
5.5.6 [A] Compare the basis Q_k defined by (5.2.1) with the basis 𝒬_k of the QR algorithm.
5.5.7 [B:21,26] Let 𝒫 be the vector space of polynomials with real coefficients, endowed with a scalar product ⟨·,·⟩. Consider an orthonormal system (p_0, p_1, ..., p_n, ...), where p_k is of degree k.
(a) Prove that the polynomials p_k satisfy the relation
    p_{n+1}(x) = (A_n x + B_n)p_n(x) − C_n p_{n−1}(x).
(b) Prove that if a_k and b_k are such that
    p_k(x) = a_k x^k + b_k x^{k−1} + ⋯,
then
    A_k = a_{k+1}/a_k,
    B_k = A_k (b_{k+1}/a_{k+1} − b_k/a_k),
    C_k = A_k a_{k−1}/a_k.
(c) Define
    α_k = −B_k/A_k   and   β_k = 1/A_{k−1},
and construct the symmetric tridiagonal matrix

    T_n = ( α_0  β_1               )
          ( β_1  α_1  β_2          )
          (      β_2  α_2  ⋱       )
          (           ⋱    ⋱  β_n  )
          (               β_n  α_n ).
Suppose the scalar product is of the form
    ⟨p, q⟩ = ∫_a^b w(x)p(x)q(x) dx,
where w is a non-negative function defined on [a, b] such that, for every polynomial p, the Lebesgue integral
    ||p||_w^2 = ∫_a^b w(x)|p(x)|^2 dx
is finite.
(d) Show that, for each k ≥ 1, the polynomial p_k has k simple real roots in the interval [a, b]; we denote these roots by x_{jk} (j = 1, 2, ..., k).
(e) Prove the identity
    T_n φ_n(x) = x φ_n(x) − (1/A_n) p_{n+1}(x) e_{n+1},
where e_{n+1} is the (n + 1)th canonical vector of ℝ^{n+1} and
    φ_n(x) = [p_0(x), p_1(x), ..., p_n(x)]^T.
Deduce that the roots of p_{n+1} are the eigenvalues of T_n and that, by means of a shift of origin, the basic QR method, when applied to T_n, is convergent.
Consider Gauss's quadrature formula
    ∫_a^b w(x)f(x) dx = Σ_{j=1}^n w_{jn} f(x_{jn}) + E_n(f),
where the weights w_{jn} are such that the error E_n(f) is zero when f is a polynomial of degree ≤ 2n − 1.
(f) Show that the weights w_{jn} can be deduced from the first component of the eigenvector φ_n(x_{jn}), provided that the moments of the function w, that is
    m_k = ∫_a^b w(x)x^k dx,
are known.

5,5.8 [B:65] We generalize here the basic QR algorithm with the help of the
notion of isospectral flow.
(a) Show that every matrix A ∈ ℂ^{n×n} has a unique decomposition
    A = π_1(A) + π_2(A),
where π_1(A) is skew-Hermitian [π_1(A)^* = −π_1(A)] and π_2(A) is an upper triangular matrix with real diagonal elements.
For all B and X in ℂ^{n×n} define
    [B, X] = BX − XB.
Let B_0 ∈ ℂ^{n×n} and suppose that f is analytic in an open set containing the spectrum of B_0. Consider the matrix differential equation
    Ḃ(t) = [B(t), π_1(f(B(t)))],   B(0) = B_0.
We call t ↦ B(t) the flow defined by f. Let R and Q be the solutions, respectively, of the equations
    Q̇(t) = Q(t)π_1(f(B(t))),   Q(0) = I_n,
    Ṙ(t) = π_2(f(B(t)))R(t),   R(0) = I_n.
(b) Prove that Q(t) is unitary and that R(t) is an upper triangular matrix whose
diagonal elements are real and positive.
(c) Prove that
    B(t) = Q(t)^*B_0Q(t) = R(t)B_0R(t)^{-1},
    Q(t)R(t) = e^{f(B_0)t}.
Let λ_j (j = 1, 2, ..., n) be the eigenvalues of B_0 and let v_j be the associated eigenvectors. We suppose that
    lin(e_1, ..., e_k) ∩ lin(v_{k+1}, ..., v_n) = {0}   when k = 1, ..., n − 1.
(d) Show that, when t → ∞, the behaviour of B(t) is as follows: its strictly lower triangular part tends to zero, its diagonal elements tend to λ_1, ..., λ_n in this order and its upper triangular part remains bounded.
(e) Show that when the basic QR algorithm is applied to
    A_0 = e^{f(B_0)},
it produces the sequence
    A_k = e^{f(B(k))}   (k = 1, 2, ...).
We recover the QR method for B_0 by taking f(z) = ln z.

Section 5.6 Hermitian Matrices

5.6.1 [A] Prove that the QR method preserves the Hermitian (or symmetric)
tridiagonal form.
5.6.2 [B:45] Let A ∈ ℂ^{n×n} be a regular symmetric matrix.
(a) Prove that there exist a unitary matrix Q and a regular lower triangular matrix L such that A = QL: this is the QL factorization of A.
We define the QL algorithm with shift σ_k of origin as follows: given A_k and σ_k, let
    A_k − σ_k I = Q_k L_k
be the QL factorization of A_k − σ_k I. Define
    A_{k+1} = L_k Q_k + σ_k I.
(b) Prove that A_{k+1} is unitarily similar to A_k. Put
    J = [e_n, e_{n−1}, ..., e_1].
Let (A_k^{(L)}) be the sequence of matrices produced by the QL method and let (A_k^{(R)}) be the sequence produced by the QR method.
(c) Prove that
    A_k^{(R)} = J A_k^{(L)} J,
where J^* = J = J^{-1}.
Let A be a real symmetric tridiagonal matrix:

    A = ( α_1  β_1                     )
        ( β_1  α_2  β_2                )
        (      β_2  ⋱      ⋱           )
        (           ⋱   α_{n−1}  β_{n−1} )
        (               β_{n−1}  α_n     ).

Define the Wilkinson shift ω by
    ω = α_1 − β_1                                        if α_1 = α_2,
    ω = α_1 − sgn(δ)β_1^2/(|δ| + (δ^2 + β_1^2)^{1/2})    otherwise,
where δ = (α_1 − α_2)/2. Define the vector p by
    (A − ωI)p = e_1
and the vector q by
    (A − ωI)q = τp.

(d) Prove that

\\(Α-ω^ί\\22 = ^^τηϊη(2β2ι,β22Αβ1βι\/^β)'
Deduce that if A is an irreducible real symmetric tridiagonal matrix, then the QL algorithm, with Wilkinson's shift of origin, generates a sequence of irreducible real symmetric tridiagonal matrices A_k, with entries α_i^{(k)} and β_i^{(k)}, such that
    β_1^{(k)} → 0 as k → ∞,
and therefore converges.

5.6.3 [D] Propose a QL method for the computation of the eigenvalues of an
arbitrary matrix, based on the ideas of Exercise 5.6.2 and examine the convergence
of this method.
5.6.4 [B:49,54] Consider Jacobi's method for a symmetric matrix A ∈ R^{n×n}:
A_0 = A.
Given the matrix A_k = (a_{ij}^{(k)}), let (p, q) be such that

|a_{pq}^{(k)}| = max_{i≠j} |a_{ij}^{(k)}|.

Let J_k be the matrix obtained from the identity matrix by changing

the zero in position (p, q) to sin θ_k,
the zero in position (q, p) to −sin θ_k,
the units in positions (p, p) and (q, q) to cos θ_k,

where θ_k is such that

A_{k+1} = J_k*A_kJ_k

satisfies the condition

a_{pq}^{(k+1)} = 0.
(a) Study the convergence of this method.

(b) Apply this method to a 3 x 3 matrix and prove that its asymptotic behaviour
is the same as that under the inverse iteration method.
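A minimal sketch of the iteration of this exercise (added illustration; the explicit angle formula is the standard choice that annihilates the pivot entry, not stated in the text):

```python
import numpy as np

# One step of the classical Jacobi method: annihilate the off-diagonal
# entry of largest modulus with a plane rotation J_k as described above.
def jacobi_step(A):
    n = A.shape[0]
    off = np.abs(A - np.diag(np.diag(A)))
    p, q = np.unravel_index(np.argmax(off), off.shape)
    # theta_k chosen so that (J_k* A J_k)[p, q] = 0
    theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
    J = np.eye(n)
    J[p, p] = J[q, q] = np.cos(theta)
    J[p, q], J[q, p] = np.sin(theta), -np.sin(theta)
    return J.T @ A @ J

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 1.0]])
for _ in range(20):
    A = jacobi_step(A)

off_norm = np.linalg.norm(A - np.diag(np.diag(A)))
assert off_norm < 1e-8        # A has been driven to (nearly) diagonal form
```

The ultimately quadratic decrease of the off-diagonal mass, visible here after a handful of rotations, is the asymptotic behaviour studied in part (a).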
5.6.5 [B:45] Let T be a real symmetric tridiagonal matrix of order n. Let σ be
a shift of origin. The QL factorization (Exercise 5.6.2) of T − σI can be
accomplished by n − 1 rotations J_k; that is, J_k is a rotation in the plane of the
coordinates (k, k + 1) (in Exercise 5.6.4 take p = k and q = k + 1). Thus

J_1J_2 ⋯ J_{n−1}(T − σI) = L.

(a) Show that the calculation of

LJ*_{n−1} ⋯ J*_1

can be started as soon as J_{n−2}J_{n−1}T has been calculated, by applying J*_{n−1}
on the right at this stage.
(b) Study the following shifts:

σ = π_T(0)/π_T'(0)   (Newton),
σ = ω   (Saad),

where π_T is the characteristic polynomial of T.

Section 5.7 The QZ Algorithm

5.7.1 [A] Let A, B ∈ C^{n×n}. Consider the generalized problem

Ax = λBx   (x ≠ 0),

where det(A − λB) is not identically zero. Let S be a subspace of C^n of dimension
m such that

dim[A(S) + B(S)] ≤ m;

then S is called a deflation subspace.
(a) Prove that there exist a unitary matrix U ∈ C^{n×n} and a unitary matrix
V ∈ C^{n×n} such that the first m columns of V form a basis of S and

U*AV = ( A_11  A_12 )        U*BV = ( B_11  B_12 )
       (  0    A_22 ),              (  0    B_22 ),

where A_11, B_11 ∈ C^{m×m}.
Define

dif(A_11, B_11; A_22, B_22) = min max{Λ(A_11, A_22), Λ(B_11, B_22)},

the minimum being taken over pairs X, Y normalized by max{||X||_F, ||Y||_F} = 1,
where

Λ(A_11, A_22) = ||A_22Y − XA_11||_F,   Λ(B_11, B_22) = ||B_22Y − XB_11||_F.
Now consider two arbitrary unitary matrices in C^{n×n}:

U = (U_1, U_2) and V = (V_1, V_2),

where U_1, V_1 ∈ C^{n×m}. Define

A_{ij} = U_i*AV_j and B_{ij} = U_i*BV_j,

and, for given X, Y in C^{(n−m)×m}:

U'_1 = (U_1 + U_2X)(I + X*X)^{−1/2},
U'_2 = (U_2 − U_1X*)(I + XX*)^{−1/2},
V'_1 = (V_1 + V_2Y)(I + Y*Y)^{−1/2},
V'_2 = (V_2 − V_1Y*)(I + YY*)^{−1/2}.

(b) Prove that U' = (U'_1, U'_2) and V' = (V'_1, V'_2) are unitary.
(c) Prove that U'_2*AV'_1 = U'_2*BV'_1 = 0 if and only if the pair (X, Y) satisfies the
system

A_22Y − XA_11 = XA_12Y − A_21,
B_22Y − XB_11 = XB_12Y − B_21.
Define the constants

γ = max{||A_21||_F, ||B_21||_F},
δ = dif(A_11, B_11; A_22, B_22),
ν = max{||A_12||_2, ||B_12||_2}.

(d) Prove that if

γν/δ² < 1/4,

then there exist X, Y in C^{(n−m)×m} such that V'_1 is the basis of a deflation
subspace. Prove that sp(A, B) is the disjoint union of sp(A_11 + A_12Y, B_11 +
B_12Y) and sp(A_22 − XA_12, B_22 − XB_12).
(e) Deduce the following result: let U = (U_1, U_2) and V = (V_1, V_2) be such that
A_21 = B_21 = 0. For any two matrices E and F in C^{n×n} we define
E_{ij} = U_i*EV_j, F_{ij} = U_i*FV_j,

ε_{ij} = max{||E_{ij}||_F, ||F_{ij}||_F},
γ = ε_21,
ν = max{||A_12||_2, ||B_12||_2} + ε_12,
δ = dif(A_11, B_11; A_22, B_22) − (ε_11 + ε_22).

Then, if

γν/δ² < 1/4,

there exist X, Y such that V_1 + V_2Y is a basis of a deflation subspace
associated with the perturbed problem

(A + E)x = λ(B + F)x   (x ≠ 0)

and the spectrum sp(A + E, B + F) is the disjoint union of the spectra

sp(A_11 + E_11 + (A_12 + E_12)Y, B_11 + F_11 + (B_12 + F_12)Y),
sp(A_22 + E_22 − X(A_12 + E_12), B_22 + F_22 − X(B_12 + F_12)).

(f) Prove that the matrices X and Y of parts (c) and (d) satisfy the conditions

max{||X||_F, ||Y||_F} ≤ 2γ/δ.

5.7.2 [B:53] Let A ∈ C^{n×n} and B ∈ C^{n×n} be two symmetric matrices. We suppose
that B is positive definite. Consider the Rayleigh quotient

μ(x) = x*Ax / x*Bx   (x ≠ 0).

For a given vector x_k we use the notation

μ_k = μ(x_k),
C_k = A − μ_kB.

Let

C_k = D_k − E_k − F_k

be the decomposition of C_k into a diagonal matrix (D_k), the strictly lower
triangular part (−E_k) and the strictly upper triangular part (−F_k). For a given
value of the parameter ω (> 0) we define

M_k = I − ω(D_k − ωE_k)^{-1}C_k.

Consider the iteration

x_{k+1} = M_kx_k.

(a) Prove that

x_{k+1} = x_k − ω(D_k − ωE_k)^{-1}r_k,

where r_k is the residual defined by

r_k = (A − μ_kB)x_k.
(b) Prove that if

μ_0 = min_i (a_{ii}/b_{ii}) = min_i μ(e_i),

then

lim_{k→∞} r_k = 0,   where r_k = (A − μ_kB)x_k.
5.7.3 [D] Suppose B is a regular matrix. Let

B*B = LL*

be the Cholesky factorization of B*B. Show that

Ax = λBx ⟺ L(C − λI)L*x = 0,

where C = L^{-1}B*AL^{-*}.
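The equivalence can be verified numerically. The sketch below (added illustration, for real matrices so that * is transposition) compares the spectrum of C with that of the pencil (A, B):

```python
import numpy as np

# Check that C = L^{-1} B* A L^{-*}, with B*B = LL* (Cholesky), has the
# same eigenvalues as the generalized problem Ax = lambda Bx, i.e. as B^{-1}A.
rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 2 * n * np.eye(n)   # regular matrix

L = np.linalg.cholesky(B.T @ B)                       # B*B = L L*
C = np.linalg.solve(L, B.T @ A) @ np.linalg.inv(L.T)

pencil = np.linalg.eigvals(np.linalg.solve(B, A))     # eigenvalues of (A, B)
for z in np.linalg.eigvals(C):
    assert np.min(np.abs(pencil - z)) < 1e-8          # same spectrum
```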
5.7.4 [B:52] Let A ∈ C^{m×n} and B ∈ C^{m×n}. Suppose that n ≤ m and that

Ker A ∩ Ker B = {0}.

Let x ∈ C^n and suppose that Bx ≠ 0. Define

F(x) = ½||Ax − ρ(x)Bx||_2²,
ρ(x) = (Bx)*Ax / ||Bx||_2².

(a) Prove that the gradient of F at the point x is given by

∇F(x) = [A − ρ(x)B]*[A − ρ(x)B]x.

The gradient method for minimizing F is defined by

x_{k+1} = x_k + g(x_k)   (k = 0, 1, ...),

g(x_k) = −(2F(x_k)/||∇F(x_k)||_2²)∇F(x_k)   if ||∇F(x_k)||_2 ≠ 0,
g(x_k) = 0   if ||∇F(x_k)||_2 = 0,
g(x_k) = 0   if Bx_k = 0.
(b) Prove that

||x_{k+1}||_2² = ||x_k||_2² − ||g(x_k)||_2².
(c) Examine the convergence of the sequence (xk).

Section 5.8 Newton's Method and the Rayleigh Quotient Iteration

5.8.1 [B:40] Let A' be a matrix close to A. Suppose we know a simple non-zero
eigenvalue λ' of A', a right eigenvector φ' and a left eigenvector ψ' such that

||φ'||_2 = ψ'*φ' = 1;

hence P' = φ'ψ'* is the spectral projection and S', the inverse of A' − λ'I on
the range of I − P', is the associated reduced resolvent.

Define the following algorithm:

φ_0 = φ',
λ_k = ψ'*Aφ_k,
φ_{k+1} = (1/λ_k)[Aφ_k + S'(λ_kφ_k − Aφ_k)]   (k = 0, 1, ...).

(a) Study the convergence of this method.

(b) Prove that the sequence (φ_k) satisfies

ψ'*φ_k = 1   (k = 0, 1, ...).
(c) Interpret this method as a power method with defect correction.
(d) Interpret this method as a modified Newton method.
5.8.2 [D] Let φ be such that ||φ||_2 = 1 and Aφ = λBφ. Suppose there exists a
vector φ_0 such that φ_0*Bφ = 1. Study the convergence of Newton's method
applied to the equation

Ax − (φ_0*Ax)Bx = 0.

Section 5.9 Modified Newton's Method and Simultaneous Inverse Iterations

5.9.1 [A] Use the notation of Lemma 5.9.2. Compare ||(B, B)^{-1}||_F with
||(A − σI)^{-1}||_2 when ε ≪ 1.

5.9.2 [A] Consider Lemma 5.9.3. Prove that the difference between the singular
values of the matrices K and Π is of order ε^{1/2}.

5.9.3 [A] Consider Proposition 5.9.7. Let σ_i² be the eigenvalues of F*F. Prove
that σ_i = O(η^{1/2}).
5.9.4 [C] Let

A = ( 1 1 0 )
    ( 0 1 1 )
    ( 0 0 1 )

and let M be the invariant subspace associated with the defective eigenvalue
λ = 1: M = lin X, where X = (e_1, e_2). In the method (2.11.1) take Y = X. Choose

as the initial basis

U = ( 1 0 )
    ( 0 1 )
    ( ε ε ).

(a) Show that ||U − X||_2 = O(ε).
(b) Show that B = Y*AU is diagonalisable.
(c) Let W be the basis of eigenvectors of B. Show that

cond_2(W) = O(ε^{-1/2}).

Numerical Methods for

Large Matrices

The numerical methods for large matrices are based on the principle of projection
onto an appropriate subspace; they require only products of the matrix A with
vectors, the matrix being stored in a secondary memory. The methods we are
going to propose in this and the next chapter are at present the most efficient
ones for computers of traditional construction (sequential computers), possibly
equipped with a vector unit.
What is a large eigenvalue problem? Evidently, there exists no precise and
absolute answer to this question, for the notion of size depends on the computer
used. We could propose the following answer: an eigenvalue problem is regarded
as large when it is much cheaper to compute only those eigenvalues and
eigenvectors that are required than to compute them all.
The eigenvalue problems of large sparse matrices arise mainly from the
discretisation of partial differential equations. The most frequent requirements are
to find (a) the least eigenvalues of a symmetric matrix or (b) the eigenvalues of
greatest real part of a non-symmetric matrix. For example, in structural
mechanics one may wish to compute several hundred eigenvalues of matrices
whose orders exceed 105. In quantum chemistry the order may reach 106 and
more. The majority of the spectral problems that have been solved up to now
are symmetric, but the share of non-symmetric matrices is increasing (problems
of stability and bifurcation are considered in Chapter 3).
The next two chapters present the state of the art with regard to the algorithms
for large eigenvalue problems. Several theoretical questions are open at present,
which explains the heuristic aspect of some of the algorithms that will be
presented. Chapter 6 is concerned with the extreme eigenvalues, while Chapter 7
treats the eigenvalues of greatest real part when the matrix is non-symmetric.


The principal idea is to approximate the eigenelements of the matrix A of
order n by those of a matrix of much smaller order ν, most frequently obtained
by orthogonal projection onto a subspace G_l of dimension ν ≪ n. Let π_l be the
matrix of the orthogonal projection in question.
The spectral problem:

find λ ∈ C and 0 ≠ x ∈ C^n such that Ax = λx   (6.1.1)

is approximated by the problem in G_l:

find λ_l ∈ C and 0 ≠ x_l ∈ G_l such that π_l(Ax_l − λ_lx_l) = 0.   (6.1.2)

The problem (6.1.2) is the Galerkin* approximation of the problem (6.1.1) (see
Raviart and Thomas, 1983). It is called the Rayleigh–Ritz* approximation in the
special case in which A is Hermitian.
The numerical method consists in constructing an orthonormal basis of G_l and
in solving (6.1.2) with respect to this basis. Let Q_l be the n × ν matrix representing
this basis. Put x_l = Q_lξ_l, where ξ_l ∈ C^ν and where ξ_l is a solution of

Q_l*(AQ_lξ_l − λ_lQ_lξ_l) = 0,   (6.1.3)

or, again, on putting B_l = Q_l*AQ_l, we consider the problem

find λ_l ∈ C, 0 ≠ ξ_l ∈ C^ν such that B_lξ_l = λ_lξ_l.   (6.1.4)

The matrix B_l is of order ν and represents the map

𝒜_l = π_lA|_{G_l}: G_l → G_l

with respect to the basis Q_l.
In practice the subspace G_l is constructed by starting with a Krylov sequence
generated either by a vector u or by a set of r linearly independent vectors {u_i}_1^r:

S = lin(u_1, ..., u_r)   (1 ≤ r < n).

There are two main classes of methods that arise from the following choices of G_l.
(a) S_l = A^lS for l = 1, 2, .... The dimension dim S_l = r is constant throughout
the iterations. This choice leads to the power method when r = 1 and to the
method of simultaneous iterations when r > 1.
(b) K_l = lin(S, AS, ..., A^{l−1}S) for l = 1, 2, ..., with ν < n. The space K_l is called
the Krylov subspace generated by {u_1, ..., u_r}; it is of dimension rl. When A is
Hermitian, this choice leads to the Lanczos method (r = 1) or to the block
Lanczos method (r > 1). When A is not Hermitian, the method reduces to the
Arnoldi method (r = 1) or the block Arnoldi method (r > 1).
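The projection scheme (6.1.2)–(6.1.4) on a Krylov subspace can be sketched as follows (an added illustration; the test matrix, whose dominant eigenvalue 10 is known by construction, is artificial):

```python
import numpy as np

# Rayleigh-Ritz sketch: orthonormal basis Q of the Krylov subspace K_l,
# projected matrix B = Q* A Q of order l, Ritz values as approximations.
rng = np.random.default_rng(0)
n, l = 60, 10
Qr, _ = np.linalg.qr(rng.standard_normal((n, n)))
evals = np.concatenate(([10.0], rng.uniform(0.0, 1.0, n - 1)))
A = (Qr * evals) @ Qr.T                    # Hermitian, dominant eigenvalue 10

u = rng.standard_normal(n)
K = np.column_stack([np.linalg.matrix_power(A, j) @ u for j in range(l)])
Q, _ = np.linalg.qr(K)                     # orthonormal basis of K_l
B = Q.T @ A @ Q                            # projected matrix of order l
ritz = np.linalg.eigvalsh(B)

assert abs(ritz[-1] - 10.0) < 1e-8         # extreme eigenvalue captured
```

Forming the powers A^j u explicitly is acceptable only for a demonstration; the Lanczos and Arnoldi methods of this chapter build the same basis stably.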

Remark The subspace K_l = lin{p(A)S}, where p is a polynomial of degree
≤ l − 1, is evidently richer than S_l. In particular, it contains the subspace
S_l = p_l(A)S, where p_l is a Chebyshev* polynomial, which will be used in Chapter
7 in order to accelerate the convergence of the method of subspace iteration.

* Boris Grigorievich Galerkin, 1871–1945, born at Polotsk, died in Moscow.
Walter Ritz, 1878–1909, born at Sion, died at Göttingen.
Supposing that the eigenelements λ, x are given, one would like to know whether
there exists a sequence λ_l, x_l of eigenelements of 𝒜_l = π_lA|_{G_l} (or of A_l = π_lA, see
Exercise 6.1.1) that converges rapidly towards λ, x when l increases. Since the
method of approximation presented here is a method of projection, it will be
seen that it is possible to bound the errors |λ − λ_l| and ||x − x_l||_2 as a function of

dist(x, G_l) = ||(I − π_l)x||_2 = sin θ(x, G_l),

where θ(x, G_l) is the acute angle between lin(x) and G_l. In fact, the study of the
convergence is carried out in two stages:
(a) to show that for the choices of G_l considered, there exist one or more
eigenvectors x such that α_l = ||(I − π_l)x||_2 is small;
(b) to bound |λ − λ_l| and ||x − x_l||_2 as a function of α_l.
The convergence rate of the methods is therefore derived from α_l, which plays
a key role. We shall generally suppose that λ is simple. When λ is multiple, the
analysis is more delicate and the exponent involves the index of λ (see Chapter 4).


Once again we suppose that

|μ_1| ≥ |μ_2| ≥ ⋯ ≥ |μ_r| > |μ_{r+1}| ≥ ⋯ ≥ |μ_n| ≥ 0,   (6.2.1)

where 1 ≤ r < n. The definitions of M and P were given in Chapter 5. The reader
is referred to Section 5.2 for the definition and discussion of the convergence of
this method.
It was shown there that

ω(A^lS, M) = O(|μ_{r+1}/μ_r|^l),

provided that (6.2.1) is satisfied and that dim PS = r. This is a global rate of
convergence; we shall establish more precise and detailed upper bounds.

6.2.1 Estimation of ||(I − π_l)x_i||_2

Lemma 6.2.1 Let dim PS = r and let x_i be an eigenvector associated with μ_i. For
every ε > 0, there exists a unique vector s_i in S and an index k such that

dist(x_i, A^lS) ≤ (ρ(i) + ε)^l ||x_i − s_i||_2,   ρ(i) = |μ_{r+1}/μ_i|,

when l > k.

* Pafnuty Lvovich Chebyshev, 1821–1894, born at Okatovo, died at St Petersburg.


PROOF Each s ∈ S can be written in the form

s = Σ_{j=1}^r s_ju_j.

Since the {Pu_j}_1^r are linearly independent, there exists for any x_i ∈ M a unique
s_i in S such that Ps_i = x_i. In what follows x_i will be an eigenvector:

Ax_i = μ_ix_i.

By definition

||(I − π_l)x_i||_2 = min_{y ∈ A^lS} ||x_i − y||_2 ≤ ||x_i − y_l||_2,

where

y_l = μ_i^{-l}A^ls_i = x_i + C^l(s_i − x_i),   C = μ_i^{-1}A(I − P),   ρ(i) = |μ_{r+1}/μ_i|.

For every ε > 0, there exists an integer k such that, when l > k, we have

||C^l||_2^{1/l} ≤ ρ(C) + ε.

When A is diagonalisable, A = XDX^{-1} and ||C^l|| ≤ cond_2(X)ρ^l(C). See Exercise
6.2.1 for a study of the constant when A is not diagonalisable.
When l → ∞, then dist(x_i, S_l) tends to zero like |μ_{r+1}/μ_i|^l. The constant
||x_i − s_i||_2 diminishes as the acute angle between the eigenvector x_i and the initial
subspace S becomes smaller (see Figure 6.2.1 for the case r = 1).

Figure 6.2.1 (x = Ps)

When A is Hermitian, P is orthogonal and

||x_i − s_i||_2 = ||(I − P)s_i||_2 = tan θ(s_i, x_i),

where θ(s_i, x_i) is the acute angle between the directions (s_i) and (x_i).

6.2.2 Speed of Convergence

Lemma 6.2.2 Let π be a projection matrix onto a subspace S. Then πA and πAπ
have the same eigenvectors associated with a non-zero eigenvalue.

PROOF The equation πAx = λx (λ ≠ 0) implies that x = πx, and so x ∈ S. Let B
be the matrix of πAπ in an orthonormal basis V of S. Put x = Vξ; then Bξ = λξ.

Let λ be a simple eigenvalue of A, associated with an eigenvector x such that
||x||_2 = 1. We choose λ to be among the r dominant eigenvalues of A. Put

α_l = ||(I − π_l)x||_2.

A generic constant will be denoted by c.

Lemma 6.2.3 Let λ and x be eigenelements of A. Then, if l is sufficiently great,
there exist eigenelements λ_l, x_l of A_l = π_lA satisfying x*x_l = 1 such that

|λ − λ_l| ≤ cα_l and ||x − x_l||_2 ≤ cα_l.

When A is Hermitian, we have

|λ − λ_l| ≤ cα_l².

PROOF By Lemma 6.2.2 the eigenelements of A_l satisfy

x_l = Q_lξ_l   and   B_lξ_l = λ_lξ_l.

Let π be the orthogonal projection on M. We are going to apply Theorem 4.4.5
to the matrices A_l = π_lA and A' = πA, which will play, respectively, the roles of
A and A' in that theorem. Put

H_l = A_l − A' = (π_l − π)A.

Since ω(A^lS, M) → 0, we deduce that

||H_l||_2 → 0   when l → ∞.

When x is the eigenvector under consideration, we have

H_lx = (π_l − π)Ax = λ(π_l − π)x = λ(π_l − I)x.

Figure 6.2.2

For sufficiently great l there exists an eigenvector x̂_l of A_l, normalized by

x*x̂_l = 1, such that

||x − x̂_l||_2 ≤ cα_l,   |λ − λ_l| ≤ cα_l,

where λ_l is the associated eigenvalue. This eigenvalue is simple when l is
sufficiently great, because ||H_l||_2 → 0 as l → ∞.
Let x_l = x̂_l/||x̂_l||_2 be the normalized eigenvector (see Figure 6.2.2). When A is
Hermitian,

λ − λ_l = x*(A − A_l)x_l/(x*x_l).

We now have

(A − A_l)x_l = λ(I − π_l)x + (I − π_l)A(x_l − x),

which implies that

|λ − λ_l| ≤ cα_l²,

where λ may be a multiple eigenvalue.

Theorem 6.2.4 On the assumption that (6.2.1) holds and that dim PS = r, the
method of simultaneous iterations on r vectors converges. Moreover, if the ith
dominant eigenvalue μ_i is simple, then the convergence rate of the ith pair of
eigenelements of A_l is of the order of |μ_{r+1}/μ_i|^l, i = 1, ..., r.
When A is Hermitian, the convergence rate of the ith eigenvalue becomes
|μ_{r+1}/μ_i|^{2l}.

PROOF The assertions are simple consequences of Lemmas 6.2.1 and 6.2.3. See
Exercise 6.2.3.

6.2.3 Subspace Iteration with Projection

The algorithm (5.2.1), presented in Chapter 5, constructs the basis Q_l of A^lS. We
present here the version with projection on the subspace A^lS. For the sake of
simplicity we suppose that A is diagonalisable. The method consists of constructing
in A^lS a basis X_l of eigenvectors of A_l in the following manner (see Exercise 6.2.4):

(a) U = Q_0R_0,  B_0 = Q_0*AQ_0 = Y_0D_0Y_0^{-1},  X_0 = Q_0Y_0.
When l ≥ 1 let
(b) AX_{l−1} = Q_lR_l,   (6.2.2)
(c) B_l = Q_l*AQ_l = Y_lD_lY_l^{-1},  X_l = Q_lY_l,

where Y_l is the matrix of eigenvectors of B_l.

Lemma 6.2.5 The matrices Q_l and X_l are bases of A^lS.

PROOF The proof is left to the reader, who will observe that the bases Q_l in
(6.2.2) and in (5.2.1) are different.

Theorem 6.2.6 Let X_l and X be the bases of eigenvectors of A_l and A, respectively.
Then there exists a sequence of regular block-diagonal matrices Δ_l such that
X_lΔ_l → X when l → ∞.

PROOF We use Corollary 1.5.5 to assert that there exists a sequence of regular
matrices Δ_l such that X_lΔ_l → X.
The proof that Δ_l is block-diagonal is carried out by induction (see the
exercises of Section 6.2).
We suppose that μ_i is a simple eigenvalue of A. Then μ_i^{(l)} → μ_i, where μ_i^{(l)} is the
ith (simple) eigenvalue of B_l associated with the eigenvector ξ_i^{(l)}. We put
x_i^{(l)} = Q_lξ_i^{(l)}.

Corollary 6.2.7 If μ_i is a simple eigenvalue, the convergences of μ_i^{(l)} → μ_i and of
lin(x_i^{(l)}) → lin(x_i) take place at a rate |μ_{r+1}/μ_i|^l.

PROOF This is a reformulation of Theorem 6.2.4.

In practice one does not calculate B_l at each iteration. If the step of projection
takes place every k iterations, this amounts to projecting on the subspace A^{kl}S.
Since the dimension r of B_l is moderate in comparison with n, the methods of
Chapter 5 can be employed in order to diagonalise B_l (for example by the QR
algorithm or by inverse iteration).
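The steps (a)–(c) above can be sketched as follows (an added illustration; the test matrix, with dominant eigenvalues 9, 7, 5 known by construction, is artificial):

```python
import numpy as np

# Subspace iteration with projection, following (6.2.2):
# orthonormalize A X_{l-1}, project, diagonalise the small matrix B_l.
rng = np.random.default_rng(1)
n, r = 50, 3
Qr, _ = np.linalg.qr(rng.standard_normal((n, n)))
evals = np.concatenate(([9.0, 7.0, 5.0], rng.uniform(0.0, 1.0, n - 3)))
A = (Qr * evals) @ Qr.T                  # dominant eigenvalues 9, 7, 5

X = rng.standard_normal((n, r))          # basis of the initial subspace S
for _ in range(40):
    Q, _ = np.linalg.qr(A @ X)           # step (b): A X_{l-1} = Q_l R_l
    B = Q.T @ A @ Q                      # step (c): B_l = Q_l* A Q_l
    mu, Y = np.linalg.eigh(B)            # B_l = Y_l D_l Y_l^{-1}
    X = Q @ Y                            # X_l = Q_l Y_l

assert np.allclose(np.sort(mu)[::-1], [9.0, 7.0, 5.0], atol=1e-8)
```

Here B_l is recomputed at every step for simplicity; as noted above, in practice the projection step is performed only every k iterations.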


In this and the two subsequent sections we suppose that the matrix A is
Hermitian.
6.3.1 The Tridiagonalisation Method of Lanczos

Let u be a non-zero vector. The Krylov subspace generated by u is denoted by

K_n = lin(u, Au, ..., A^{n−1}u).

We have dim K_n = n if and only if the minimal polynomial of A relative to u is
of degree n; thus p(A)u ≠ 0 for every polynomial p of degree less than n.
In exact arithmetic, the Lanczos algorithm constructs iteratively an orthonormal
basis

V_n = [v_1, ..., v_n]

of C^n in which A is represented by a tridiagonal matrix T_n = V_n*AV_n as follows:

(a) v_1 = u/||u||_2,  a_1 = v_1*Av_1,  b_1 = 0.
(b) When j = 1, 2, ..., n − 1 put

x_{j+1} = Av_j − a_jv_j − b_jv_{j−1},   b_{j+1} = ||x_{j+1}||_2 > 0,
v_{j+1} = b_{j+1}^{-1}x_{j+1},   a_{j+1} = v_{j+1}*Av_{j+1}.

T_n is a Hermitian tridiagonal matrix with diagonal elements a_i (i = 1, ..., n) and
off-diagonal elements b_i (i = 2, ..., n).
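The three-term recurrence (a)–(b) can be written directly as follows (an added sketch in the exact-arithmetic spirit, without any reorthogonalization; for a full run of n steps the extreme eigenvalues of T_n should match those of A):

```python
import numpy as np

# Basic Lanczos tridiagonalisation by local orthogonalization only.
def lanczos(A, u, m):
    n = len(u)
    V = np.zeros((n, m))
    a, b = np.zeros(m), np.zeros(m)      # b[j] couples columns j and j+1
    V[:, 0] = u / np.linalg.norm(u)
    for j in range(m):
        w = A @ V[:, j] - (b[j - 1] * V[:, j - 1] if j > 0 else 0.0)
        a[j] = V[:, j] @ w
        w = w - a[j] * V[:, j]
        if j + 1 < m:
            b[j] = np.linalg.norm(w)
            V[:, j + 1] = w / b[j]
    T = np.diag(a) + np.diag(b[:m - 1], 1) + np.diag(b[:m - 1], -1)
    return T, V

rng = np.random.default_rng(2)
n = 12
A = rng.standard_normal((n, n)); A = A + A.T
T, V = lanczos(A, rng.standard_normal(n), n)    # full run: T similar to A

evT, evA = np.linalg.eigvalsh(T), np.linalg.eigvalsh(A)
assert abs(evT[0] - evA[0]) < 1e-8 and abs(evT[-1] - evA[-1]) < 1e-8
```

The loss of global orthogonality among the v_j discussed in the next paragraph is precisely what such a bare implementation exhibits for larger m.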
This algorithm was proposed by Lanczos (1950) as a method for tridiagonalising
a Hermitian matrix. However, in practice the vectors v_i, which are
constructed by local orthogonalization (with respect to v_j and v_{j−1}), rather
quickly lose the property of global orthogonality. For this reason practitioners
have been led to prefer Householder's method, which is much more stable for
tridiagonalising a Hermitian matrix of medium size.

6.3.2 The Incomplete Lanczos Method

Since the vectors v_i are computed iteratively, the process may be terminated after
the computation of l (< n) of these vectors. Then V_l = [v_1, ..., v_l] is an orthonormal
basis of the space

K_l = lin(u, Au, ..., A^{l−1}u)

in which the Rayleigh–Ritz approximation 𝒜_l is represented by the matrix

T_l = V_l*AV_l,

which is a tridiagonal matrix of order l. This leads to the following algorithm:

(a) u ≠ 0,  b_1 = √(u*u),  v_0 = 0.
(b) When j = 1, 2, ..., l, put

v_j = u/b_j,   u := Av_j − b_jv_{j−1},   a_j = v_j*u,   (6.3.1)
u := u − a_jv_j,   b_{j+1} = √(u*u).

We suppose that the quantities b_j (j = 2, ..., l) are positive, that is to say, that
dim K_l = l. This assumption is not restrictive, for if it is not fulfilled, the
eigenvalue problem for A reduces to two subproblems (see Exercise 6.3.1).
The dimension l of the approximate problem is either fixed in advance or else
is determined dynamically by the algorithm according to the value of b_{l+1} (see
Section 6.3.5).
The tridiagonal matrix T_l is then diagonalised:

T_l = Y_lD_lY_l*.
The eigenvalues D_l and the eigenvectors X_l = V_lY_l of A_l = π_lA are called the
Ritz values and Ritz vectors of A; they are the required approximations of certain
eigenelements of A, as we shall see. They possess the global optimality properties
envisaged in Section 4.6 of Chapter 4.
In the basic Lanczos method we seek to keep l small in relation to n, and there
can be no question of convergence in the classical sense of this term, since l takes
only a finite number of values. When l is of modest size compared with n, we
shall see that K_l contains eigenvectors of A_l which are sufficiently close to certain
eigenvectors associated with extreme eigenvalues of A. This property of approximation
justifies the choice of the Krylov subspace K_l, but it is not the only
justification. There is also a computational reason: the Schmidt orthogonalization
process is particularly simple in the Krylov subspace; this is the reason why the
matrix T_l is tridiagonal (see Exercise 6.3.3).
We shall now show that the Lanczos method can approximate only one
eigendirection associated with a multiple eigenvalue. The irreducible tridiagonal
Hermitian matrix T_l possesses only simple real eigenvalues (Exercise 5.1.1), which
may be arranged in decreasing order of magnitude:

λ_1^{(l)} > λ_2^{(l)} > ⋯ > λ_l^{(l)}.

Let {λ_i; 1 ≤ i ≤ d} be the distinct eigenvalues of A, also arranged in decreasing
order of magnitude:

λ_1 > λ_2 > ⋯ > λ_d = λ_min.

Let P_i be the eigenprojection associated with λ_i and let E be the subspace
generated by the vectors {P_iu: 1 ≤ i ≤ d}; thus

E = lin(P_1u, ..., P_du).

If these vectors are not zero, they are eigenvectors of A corresponding to the
distinct eigenvalues; they are linearly independent.

Lemma 6.3.1 The Lanczos process amounts to approximating the eigenvalues of

A' = A|_E,

which are simple eigenvalues.

PROOF We have

u = Σ_{i=1}^d P_iu,   Au = Σ_{i=1}^d λ_iP_iu,   A^ku = Σ_{i=1}^d λ_i^kP_iu;

hence K_l ⊆ E, whatever the value of l. The Lanczos method applied to A or to
A' produces the same matrix π_lAπ_l. If dim E = d' < d, then A' is of order d' and
possesses d' independent eigenvectors associated with distinct eigenvalues, which
can only be simple.

6.3.3 Estimation of tan θ(x_i, K_l), l < n

Let P_{l−1} be the vector space of polynomials of degree less than or equal to l − 1.
Suppose that P_iu ≠ 0 and put

x_i = P_iu/||P_iu||_2;

x_i is an eigenvector associated with λ_i. Put

y_i = (I − P_i)u/||(I − P_i)u||_2

if (I − P_i)u ≠ 0 and y_i = 0 otherwise.

Lemma 6.3.2 If P_iu ≠ 0, we have

tan θ(x_i, K_l) = min_{p ∈ P_{l−1}, p(λ_i)=1} ||p(A)y_i||_2 · tan θ(x_i, u).

PROOF Each v ∈ K_l can be written as v = q(A)u, where q ∈ P_{l−1}. Since
u = Σ_j P_ju, it follows that

u = P_iu + Σ_{j≠i} P_ju   and   v = q(λ_i)P_iu + Σ_{j≠i} q(λ_j)P_ju.

Since the eigenprojections are orthogonal, it is readily shown that

tan² θ(P_iu, v) = (Σ_{j≠i} q²(λ_j)||P_ju||_2²)/(q²(λ_i)||P_iu||_2²).

If (I − P_i)u ≠ 0, we have

Σ_{j≠i} q²(λ_j)||P_ju||_2² = ||q(A)y_i||_2² ||(I − P_i)u||_2².

If u = P_iu, we put y_i = 0; then θ(P_iu, v) = 0. We define

p(·) = q(·)/q(λ_i);

then p ∈ P_{l−1} and p(λ_i) = 1. Hence

tan θ(P_iu, K_l) = min_{v ∈ K_l} tan θ(P_iu, v)
= min_{p ∈ P_{l−1}, p(λ_i)=1} ||p(A)y_i||_2 (||(I − P_i)u||_2/||P_iu||_2).

Finally, we observe that

tan θ(P_iu, u) = ||(I − P_i)u||_2/||P_iu||_2.
It remains to obtain an estimate for the number

t_{il} = min_p ||p(A)y_i||_2.

This can be accomplished with the help of Chebyshev polynomials. We recall
that for real t and |t| > 1, the Chebyshev polynomial of the first kind and of
degree k is defined by

T_k(t) = ½[(t + √(t² − 1))^k + (t + √(t² − 1))^{−k}].

In Chapter 7, Sections 7.2 and 7.3, we shall collect the properties of Chebyshev
polynomials of a real or complex variable which are required in Chapters 6
and 7.

Theorem 6.3.3 When P_iu ≠ 0, then

tan θ(x_i, K_l) ≤ (Δ_i/T_{l−i}(γ_i)) tan θ(x_i, u),

where

Δ_1 = 1,   Δ_i = Π_{j<i} (λ_j − λ_min)/(λ_j − λ_i)   (i > 1),

γ_i = 1 + 2(λ_i − λ_{i+1})/(λ_{i+1} − λ_min).

PROOF We have

t_{il} = min_{p ∈ P_{l−1}, p(λ_i)=1} ||p(A)y_i||_2 ≤ min_{p ∈ P_{l−1}, p(λ_i)=1} max_{j≠i} |p(λ_j)|,

because A = QDQ* (see Chapter 7).

(a) The case i = 1:

max_{j>1} |p(λ_j)| ≤ max_{t ∈ [λ_min, λ_2]} |p(t)|.

By Theorem 7.2.1,

min_{p ∈ P_{l−1}, p(λ_1)=1} max_{t ∈ [λ_min, λ_2]} |p(t)| = 1/T_{l−1}(γ_1),

γ_1 = 1 + 2(λ_1 − λ_2)/(λ_2 − λ_min).

(b) The case i > 1:

t_{il} ≤ min_{p ∈ P_{l−1}, p(λ_i)=1} max_{j≠i} |p(λ_j)|
≤ min_{p(λ_1)=⋯=p(λ_{i−1})=0, p(λ_i)=1} max_{j>i} |p(λ_j)|.

One such p can be written in the form

p(t) = q(t) Π_{j<i} (λ_j − t)/(λ_j − λ_i),

where q ∈ P_{l−i} and q(λ_i) = 1. Hence

max_{j>i} |p(λ_j)| ≤ (Π_{k<i} (λ_k − λ_min)/(λ_k − λ_i)) max_{j>i} |q(λ_j)|.

We conclude that

t_{il} ≤ Δ_i min_{q ∈ P_{l−i}, q(λ_i)=1} max_{t ∈ [λ_min, λ_{i+1}]} |q(t)| = Δ_i/T_{l−i}(γ_i).

When l increases, the decrease of tan θ(x_i, K_l) is of the order of the decrease
of 1/T_{l−i}(γ_i). The quantity γ_i depends on the relative distance
(λ_i − λ_{i+1})/(λ_{i+1} − λ_min). Put

τ_i = γ_i + √(γ_i² − 1).

For sufficiently great l, the value of T_{l−i}(γ_i) is of the order of ½τ_i^{l−i}, and the rate
of decrease of θ(x_i, K_l) is 1/τ_i. This rate is the better the greater γ_i is, and γ_i is the
greater the smaller i is in comparison with l; that is, x_i is an eigenvector associated
with one of the greatest eigenvalues of A.

6.3.4 Approximation

We seek to estimate the precision of the Lanczos method as a function of l, u and
the spectrum of A, for a pair of eigenelements λ, x, ||x||_2 = 1. We put

α_l = ||(I − π_l)x||_2 = sin θ(x, K_l) ≤ tan θ(x, K_l),

which can be majorized with the help of Theorem 6.3.3.

Theorem 6.3.4 Suppose that λ and x are given. If α_l is sufficiently small, there
exist eigenelements λ_l and x_l of A_l such that |λ − λ_l| ≤ cα_l and sin θ_l ≤ cα_l, where
c is a generic constant and where θ_l is the acute angle formed by the eigendirections
lin(x_l) and lin(x).

PROOF Let λ_l be the eigenvalue of A_l = π_lA closest to λ and let x_l be a
corresponding eigenvector. Let Γ_l be a Jordan curve lying in res(A) ∩ res(A_l)
which isolates λ and λ_l. Let P_l and R_l(z) be the eigenprojection and the resolvent
associated with A_l and λ_l. Then

(P_l − P)x = −(1/2πi) ∫_{Γ_l} [R_l(z) − R(z)]x dz
           = −(1/2πi) ∫_{Γ_l} (λ/(λ − z)) R_l(z) dz (I − π_l)x,

since

R(z)x = (λ − z)^{-1}x.

Hence

||(P_l − P)x||_2 ≤ (μ_l c_l d_l |λ|/2π) α_l,

where

μ_l = meas(Γ_l),   c_l = max_{z ∈ Γ_l} ||R_l(z)||,   d_l = [dist(λ, Γ_l)]^{-1}.

If we put

c = (1/2π) max_l (μ_l c_l d_l |λ|),

we conclude that ||(P_l − P)x||_2 ≤ cα_l.
Let x_l = P_lx and let θ_l be the acute angle between lin(x) and lin(x_l). Then

||x_l||_2 = cos θ_l,   x_l*x = cos² θ_l,
||x_l − x||_2 = sin θ_l ≤ cα_l

(see Figure 6.3.1). Put

x̂_l = x_l/||x_l||_2.

We have the relations

λ − λ_l = x*Ax − x̂_l*A_lx̂_l
        = x*A(x − x̂_l) + x*(A − A_l)x̂_l + (x − x̂_l)*A_lx̂_l.

Since ||x − x̂_l||_2 = 2 sin(θ_l/2), we deduce that |λ − λ_l| ≤ cα_l.
Finally, as in Lemma 6.2.3, we show that |λ − λ_l| ≤ cα_l² when A is Hermitian.
Theorem 6.3.4 demonstrates that, for moderate values of l, the Lanczos method
enables us to approximate the extreme eigenvalues of A (the greatest and the
least, the latter result being obtained by arranging the eigenvalues of A and of
T_l in increasing order of magnitude), on the condition that consecutive eigenvalues
are well separated. The constants that occur in Theorem 6.3.4 can be made precise
(see Exercise 6.3.6).

Figure 6.3.1
When λ_i is close to λ_{i+1}, the bounds deduced in Theorem 6.3.3 become very
pessimistic. We can improve them by taking into account the particular structure
of the spectrum. For example, it can be shown (Saad, 1980a) that

tan θ(x_i, K_l) ≤ c_i/T_{l−i−1}(γ_i'),   γ_i' = 1 + 2(λ_i − λ_{i+2})/(λ_{i+2} − λ_min),

but the constant c_i, which contains (λ_i − λ_{i+1})^{-1}, is large.
When λ is a multiple eigenvalue, the Lanczos method (in exact arithmetic) does
not enable us to compute the set of eigenvectors associated with λ. However, in
practice, rounding errors have the effect that, from a certain number of iterations
onwards, the Lanczos method is applied to a neighbouring matrix having
neighbouring eigenvalues that are distinct (no longer multiple). In fact, it will be
noticed that a second copy of λ appears, which corresponds to a second eigenvector
that is not proportional to the first. This second copy appears as a result of the
Lanczos method being applied with an initial vector that has a zero component
(to machine precision) in the desired eigenspace. When λ is of multiplicity m, then
m copies will appear successively as l increases.
We shall return to the question of multiple eigenvalues in Section 6.4, where
we shall present the Lanczos block method.

6.3.5 A Posteriori Bounds

In accordance with the construction of the basis V_l we have the identity

AV_l = V_lT_l + b_{l+1}v_{l+1}e_l*,   (6.3.2)

where e_l is the lth vector of the canonical basis of C^l. Recall that X_l = V_lY_l. Thus

AX_l = X_lD_l + b_{l+1}v_{l+1}e_l*Y_l.

The residual for A, calculated in terms of λ_i^{(l)} and x_i^{(l)}, is given by

||Ax_i^{(l)} − λ_i^{(l)}x_i^{(l)}||_2 = b_{l+1}|e_l*ξ_i^{(l)}| = β_{li}.

There exists an eigenvalue λ_j of A such that

|λ_j − λ_i^{(l)}| ≤ β_{li}.

Moreover, if d_{li} = dist[λ_i^{(l)}, sp(A) − {λ_j}], we have

|λ_j − λ_i^{(l)}| ≤ β_{li}²/d_{li}   and   sin θ_{li} ≤ β_{li}/d_{li},

where θ_{li} denotes the acute angle formed by the eigendirections lin(x_i^{(l)}) and
lin(x_j).
In practice, it will be known that the eigenvalue λ_i^{(l)} (respectively the
eigenvector x_i^{(l)}) has 'converged' by observing the last component of the eigenvector
ξ_i^{(l)} of T_l. When the QL algorithm with shift of origin is used to compute the
eigenvalues of T_l, it is possible to compute this last component without having
to compute the whole vector ξ_i^{(l)} (for details see Parlett, 1980, Ch. 13).
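The identity (6.3.2) and the resulting residual formula can be checked numerically (added sketch; the Python index b[l−1] below corresponds to b_{l+1} of the text):

```python
import numpy as np

# Run l Lanczos steps, keep the (l+1)th vector, and verify
# A V_l = V_l T_l + b_{l+1} v_{l+1} e_l*, hence the residual of a Ritz
# pair (mu, x = V_l y) has norm b_{l+1} |last component of y|.
rng = np.random.default_rng(4)
n, l = 30, 8
A = rng.standard_normal((n, n)); A = A + A.T
v = rng.standard_normal(n); v /= np.linalg.norm(v)

V, a, b = [v], [], []
for j in range(l):
    w = A @ V[j] - (b[j - 1] * V[j - 1] if j > 0 else 0.0)
    a.append(V[j] @ w); w = w - a[j] * V[j]
    b.append(np.linalg.norm(w)); V.append(w / b[j])
Vl, vnext = np.column_stack(V[:l]), V[l]
T = np.diag(a) + np.diag(b[:l - 1], 1) + np.diag(b[:l - 1], -1)

assert np.allclose(A @ Vl, Vl @ T + b[l - 1] * np.outer(vnext, np.eye(l)[-1]),
                   atol=1e-8)                  # identity (6.3.2)
mu, Y = np.linalg.eigh(T)
x = Vl @ Y[:, -1]                              # Ritz vector, top Ritz value
res = np.linalg.norm(A @ x - mu[-1] * x)
assert np.isclose(res, b[l - 1] * abs(Y[-1, -1]), atol=1e-10)
```

This is exactly the cheap a posteriori test used in practice: only the last component of each eigenvector of T_l is needed.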

6.3.6 Effects of Finite Precision Arithmetic

The study we have just carried out supposes that the vectors {v_i}_1^l remain
mutually orthogonal. This is not true in practice when a computer is used whose
arithmetic is only finite. In particular, this orthogonality disappears when λ_i^{(l)},
x_i^{(l)} begin to approach λ_i, x_i (see Exercises 6.3.7 and 6.3.8). The exact Lanczos
algorithm terminates at l (≤ n), although in practice it can be continued indefinitely.
The error induced by the absence of orthogonality can only retard the
'convergence'; it does not prevent it. In this situation, three strategies have been
employed to implement the Lanczos method; we shall briefly recall their
advantages and drawbacks:
(a) Strategy of non-reorthogonalization. The memory space required is minimal.
However, more than n steps are needed to obtain all the eigenvalues
(including the multiple ones); in general between 2.5n and 6n steps are
necessary. The stopping criterion is delicate, and the eigenvectors have to be
computed separately.
(b) Strategy of complete reorthogonalization. The behaviour of this algorithm
is very close to that of the exact algorithm; in particular, it requires a minimal
number of steps. On the other hand, it needs more memory space, since it is
necessary to store the v_j, and it needs more calculations (but the Gram–Schmidt
process is well vectorized).
(c) Strategy with partial reorthogonalization. This is a compromise strategy; the
reader can find a description in Parlett (1980, Ch. 13). Essentially it provides
the advantages of complete reorthogonalization at a much lower cost. This
is achieved by monitoring the precision obtained with the help of the
computed level of orthogonality among the vectors v_j.

In exact arithmetic the Lanczos method cannot detect the multiplicity of the
eigenvalue which is being computed. This fact has led to the proposal of the block
Lanczos method, which enables us to determine multiplicities that are less than
or equal to the size of the block.

6.4.1 The Algorithm

Let {u_1, ..., u_r} be a set of linearly independent vectors that generate a subspace
S. The Krylov subspace generated by S is defined as

K_l = lin(S, AS, ..., A^{l−1}S).

We construct an orthonormal basis V_l of K_l in which A_l is represented by the
matrix

T_l = V_l*AV_l,

which is a block tridiagonal matrix, the order of each block being r. Moreover,
the blocks form a band of size r + 1 (see Figure 6.4.1). Here

V_l = [Q_0, ..., Q_{l−1}],

where the orthonormal basis Q_j of A^jS is constructed from the orthonormal
basis Q_0 of S in the following manner:

(a) A_1 = Q_0*AQ_0, B_1 = 0, Q_{−1} = 0;
(b) when j = 1, 2, ..., l − 1,
(i) put X_j = AQ_{j−1} − Q_{j−1}A_j − Q_{j−2}B_j*,
(ii) carry out the Schmidt factorization

X_j = Q_jR_j,

where R_j is an upper triangular matrix of order r,
(iii) put B_{j+1} = R_j, A_{j+1} = Q_j*AQ_j.

If dim K_l = lr, the matrices R_j are regular provided that the degree of the
minimal polynomial of the vectors {u_i}_1^r is at least equal to l.
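A minimal sketch of the block construction above (added; it uses complete reorthogonalization, cf. Section 6.3.6, for numerical safety, and an artificial matrix with a double eigenvalue so that the multiplicity-detection property can be seen):

```python
import numpy as np

# Block Lanczos with block size r = 2: a double eigenvalue is recovered
# with its multiplicity, which single-vector Lanczos cannot do.
rng = np.random.default_rng(5)
n, r, l = 40, 2, 8
Qfull, _ = np.linalg.qr(rng.standard_normal((n, n)))
evals = np.concatenate(([6.0, 6.0], rng.uniform(0.0, 1.0, n - 2)))
A = (Qfull * evals) @ Qfull.T                 # 6 is a double eigenvalue

Q, _ = np.linalg.qr(rng.standard_normal((n, r)))    # Q_0
Q_prev, B_prev = np.zeros((n, r)), np.zeros((r, r))
blocks = [Q]
for _ in range(l - 1):
    X = A @ Q - Q @ (Q.T @ A @ Q) - Q_prev @ B_prev.T   # step (b)(i)
    W = np.column_stack(blocks)
    X = X - W @ (W.T @ X)                     # complete reorthogonalization
    Qn, R = np.linalg.qr(X)                   # Schmidt factorization (b)(ii)
    Q_prev, B_prev, Q = Q, R, Qn              # step (b)(iii)
    blocks.append(Q)

V = np.column_stack(blocks)
T = V.T @ A @ V                               # block tridiagonal, order r*l
ritz = np.linalg.eigvalsh(T)
assert np.allclose(ritz[-2:], [6.0, 6.0], atol=1e-8)   # both copies found
```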

Figure 6.4.1

Lemma 6.4.1 The eigenvalues of T_l are of multiplicity less than or equal to r.

PROOF Since the matrices R_j are regular, the matrix T_l − λI is, for every λ, of
rank at least rl − r. Hence each eigenvalue of T_l is of multiplicity at most r.
Now let E be the subspace generated by the {P_iS}_1^d.

Lemma 6.4.2 The block Lanczos process amounts to approximating the
eigenvalues of A' = A|_E, whose eigenvalues are of multiplicity ≤ r.

PROOF This is analogous to the proof of Lemma 6.3.1. The eigenspace
associated with λ_i is P_iS and dim P_iS ≤ r.

Let {μ_i}_1^n be the eigenvalues of A, each counted with its multiplicity and
arranged in decreasing order of magnitude:

μ_1 ≥ μ_2 ≥ ⋯ ≥ μ_n.

The set of indices {i, i + 1, ..., i + r − 1} is denoted by I. Each eigenvalue μ_j is
associated with the eigenvector x_j and the eigenprojection P_j = x_jx_j*. Then

P = Σ_{j ∈ I} P_j

is the projection associated with the block {μ_i, ..., μ_{i+r−1}} of total algebraic
multiplicity r.

6.4.2 Estimation of tan θ(x_k, K_l), k ∈ I

Theorem 6.4.3 If dim PS = r, there exists s_k ∈ S such that

tan θ(x_k, K_l) ≤ (Δ_i / T_{l-i}(γ_k)) tan θ(x_k, s_k)   (k ∈ I),

where

γ_k = (2μ_k - μ_{i+r} - μ_n)/(μ_{i+r} - μ_n),

Δ_1 = 1,   Δ_i = Π_{j=1}^{i-1} (μ_j - μ_n)/(μ_j - μ_{i+r})   (i > 1).

PROOF A vector s ∈ S can be written as

s = Σ_{j=1}^r s_j u_j,

and so

Ps = Σ_{j=1}^r s_j Pu_j.

Since the r vectors {Pu_j}_1^r are independent by hypothesis, there exists a unique
vector s_k ∈ S such that Ps_k = x_k for each given eigenvector x_k (k ∈ I). We put

v_k = (I - P)s_k = s_k - x_k,   ||s_k - x_k||_2 = tan θ(x_k, s_k).

For a given x_k we consider a vector v ∈ K_l, which can be written in the form
v = q(A)s_k, where q ∈ P_{l-1}. Since

s_k = x_k + Σ_{j∉I} P_j s_k,

we distinguish two cases.

(a) The case i = 1. Here s_1 is defined by x_1, and

tan^2 θ(x_1, v) = ||(I - P)v||_2^2 / ||Pv||_2^2 = Σ_{j≥1+r} q^2(μ_j) ||P_j s_1||_2^2 / q^2(μ_1).

The minimum of the right-hand side for q ∈ P_{l-1} is attained for p. We put
s = p(A)s_1 ∈ K_l, where

p(μ) = T_{l-1}(α_1 μ - β_1)/T_{l-1}(γ_1),
α_1 = 2/(μ_{1+r} - μ_n),   β_1 = (μ_{1+r} + μ_n)/(μ_{1+r} - μ_n),

so that

γ_1 = α_1 μ_1 - β_1 = (2μ_1 - μ_{1+r} - μ_n)/(μ_{1+r} - μ_n).

Hence, for j ≥ 1 + r,

α_1 μ_j - β_1 = 1 - 2(μ_{1+r} - μ_j)/(μ_{1+r} - μ_n),

|α_1 μ_j - β_1| ≤ 1   and   |T_{l-1}(α_1 μ_j - β_1)| ≤ 1.

Therefore

tan^2 θ(x_1, K_l) ≤ Σ_{j≥1+r} [T_{l-1}(α_1 μ_j - β_1)/T_{l-1}(γ_1)]^2 ||P_j s_1||_2^2
                 ≤ T_{l-1}^{-2}(γ_1) Σ_{j≥1+r} ||P_j s_1||_2^2,

and

Σ_{j≥1+r} ||P_j s_1||_2^2 = ||(I - P)s_1||_2^2 = ||s_1 - x_1||_2^2.
3> 1+r

(b) The case i > 1. We now put

γ_k = α_i μ_k - β_i = (2μ_k - μ_{i+r} - μ_n)/(μ_{i+r} - μ_n)   (k ∈ I),

where α_i = 2/(μ_{i+r} - μ_n) and β_i = (μ_{i+r} + μ_n)/(μ_{i+r} - μ_n). We define

p_i(μ) = [Π_{j=1}^{i-1} (μ_j - μ)/(μ_j - μ_k)] T_{l-i}(α_i μ - β_i)/T_{l-i}(γ_k);

then p_i(μ_j) = 0 when j < i. Let s = p_i(A)s_k, where s_k is defined by x_k. Arguing
as in case (a),

tan θ(x_k, K_l) ≤ [Π_{j=1}^{i-1} (μ_j - μ_n)/(μ_j - μ_{i+r})] T_{l-i}^{-1}(γ_k) ||x_k - s_k||_2,

which is the required bound.
We remark that the bounds of Theorem 6.4.3 reduce to those of Theorem 6.3.3
when r = 1 and the eigenvalues are distinct. The angle θ(x_k, K_l) decreases like
T_{l-i}^{-1}(γ_k), where γ_k depends on the distance μ_k - μ_{i+r}. The generalization of the
Lanczos method to the block Lanczos method has an effect that is comparable
to the transition from the power method to the method of simultaneous iterations
(see Section 6.2).
Bounds for |μ_k - μ_k^{(l)}| and ||x_k - x_k^{(l)}||_2 (k ∈ I) can be established as in
Theorem 6.3.4.

6.5 The Generalized Problem

It is required to solve the generalized eigenvalue problem

Kx = λMx   (x ≠ 0),   (6.5.1)

where K is symmetric and M is positive definite symmetric (see Chapter 3). This
can be reduced to a standard form Ay = νy in different ways.

6.5.1 The Cholesky Factorization of M

Let M = R^T R. We put

A = R^{-T} K R^{-1},

where the product is not evaluated explicitly for large matrices. Equation (6.5.1)
is equivalent to

Ay = λy,

where x = R^{-1} y. This reduction preserves the eigenvalues. Suppose we wish to
compute some of the smallest eigenvalues; the convergence factor for the least
eigenvalue λ_1 is determined by

γ_1 = (λ_2 - λ_1)/(λ_max - λ_1).

Now this number may be very small. In structural mechanics it is not rare to
have λ_1 = 10^5, λ_2 = 2 × 10^5 and λ_max = 10^19, which leads to γ_1 = 10^{-14}. It
requires about n Lanczos steps to separate λ_1 from λ_2, even in exact arithmetic.
An efficient remedy is provided by the spectral transformation, which is a natural
generalization of inverse iteration.
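For a small dense problem the Cholesky reduction can be carried out explicitly, even though this is precisely what one avoids for large matrices. The following sketch (our own illustration; the matrices are random choices) checks that the eigenvalues of A = R^{-T} K R^{-1} coincide with the generalized eigenvalues of the pencil (K, M).

```python
import numpy as np

# Cholesky reduction of K x = lambda M x to standard form, for a small demo.
rng = np.random.default_rng(1)
n = 6
K = rng.standard_normal((n, n)); K = K + K.T                 # symmetric K
B = rng.standard_normal((n, n)); M = B @ B.T + n * np.eye(n)  # s.p.d. M

R = np.linalg.cholesky(M).T        # M = R^T R with R upper triangular
Rinv = np.linalg.inv(R)            # explicit inverse: acceptable for a demo only
A = Rinv.T @ K @ Rinv              # A = R^{-T} K R^{-1}, symmetric

lam = np.sort(np.linalg.eigvalsh(A))
# generalized eigenvalues of (K, M), for comparison
lam_gen = np.sort(np.real(np.linalg.eigvals(np.linalg.solve(M, K))))
```

The two sorted spectra agree to machine precision, confirming that the reduction preserves the eigenvalues.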

6.5.2 Spectral Transformation

Let us choose σ close to the eigenvalues required and such that K - σM is regular.
Equation (6.5.1) has the same solutions as

(K - σM)^{-1} M x = (1/(λ - σ)) x   (x ≠ 0).   (6.5.2)

It is natural to put

A = (K - σM)^{-1} M   and   ν = 1/(λ - σ).
However, A is no longer symmetric with respect to the Euclidean scalar
product. We have the following lemma.

Lemma 6.5.1 The matrix A is self-adjoint with respect to the scalar product
defined by M.

PROOF Define the scalar product

⟨u, v⟩_M = v^T M u.

Then, by the symmetry of M and of K - σM,

⟨Au, v⟩_M = v^T M A u = v^T M (K - σM)^{-1} M u
          = [(K - σM)^{-1} M v]^T M u = ⟨u, Av⟩_M.
Therefore we may define the algorithm (6.3.1) for the problem (6.5.2) provided
that the Euclidean scalar product is replaced by ⟨·,·⟩_M. This yields the following
algorithm:

(a) u ≠ 0, w = Mu, b_1 = (u^T w)^{1/2}, v_0 = 0;

(b) when j = 1, 2,..., l, put

    v_j = u/b_j,   w := w/b_j,
    solve (K - σM)u = w,                    (6.5.3)
    u := u - b_j v_{j-1},
    a_j = u^T w,   u := u - a_j v_j,   w = Mu,
    b_{j+1} = (u^T w)^{1/2}.

The additional cost in relation to (6.3.1) at each step consists in the evaluation
of w = Mu and in the solution of (K - σM)u = w. This solution is carried out with
the help of the factorization K - σM = LDL^T, where L is a lower triangular
matrix with a unit diagonal. The strategy of complete reorthogonalization is here
recommended in order to keep l as small as possible. The spectral transformation
maps the part of the spectrum that is close to σ into the extremities of the
spectrum of A. Hence the algorithm (6.5.3) will efficiently compute the eigenvalues
in an interval containing σ. If required, one could use different shifts σ. This
method enables us to determine any eigenvalue whatsoever in the interior of the
spectrum, provided we know an approximation σ.
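A minimal sketch of algorithm (6.5.3) is given below. It is our own reconstruction under stated assumptions: the function name is hypothetical, the systems (K - σM)u = w are solved afresh at each step instead of reusing an LDL^T factorization, and no reorthogonalization is performed. An eigenvalue λ of (6.5.1) near σ is recovered from an eigenvalue ν of the tridiagonal matrix as λ = σ + 1/ν.

```python
import numpy as np

def spectral_lanczos(K, M, sigma, u, l):
    """Sketch of (6.5.3): Lanczos for A = (K - sigma M)^{-1} M, which is
    self-adjoint in the <.,.>_M scalar product.  In practice K - sigma M
    is factorized once (LDL^T); this demo simply calls solve each step."""
    C = K - sigma * M
    n = len(u)
    a = np.zeros(l)
    b = np.zeros(l + 1)
    V = np.zeros((n, l + 1))           # column 0 plays the role of v_0 = 0
    u = u.astype(float).copy()
    w = M @ u
    b[0] = np.sqrt(u @ w)              # b_1 = (u^T w)^{1/2}
    for j in range(1, l + 1):
        V[:, j] = u / b[j - 1]
        w = w / b[j - 1]
        u = np.linalg.solve(C, w)      # solve (K - sigma M) u = w
        u = u - b[j - 1] * V[:, j - 1]
        a[j - 1] = u @ w
        u = u - a[j - 1] * V[:, j]
        w = M @ u
        b[j] = np.sqrt(max(u @ w, 0.0))   # b_{j+1} = (u^T w)^{1/2}
    T = np.diag(a) + np.diag(b[1:l], 1) + np.diag(b[1:l], -1)
    return T, V[:, 1:]
```

Running the full n steps on a small symmetric positive definite pencil makes the eigenvalues of T reproduce ν = 1/(λ - σ) exactly, up to roundoff.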

Remark For very large problems, the triangular factorization cannot be kept in
the central memory. If the transfer time between the central and the secondary
memories is significant, then it may be advantageous to use the block Lanczos
method, which involves the solution of r systems of the form (K - σM)u_i = w_i
(i = 1,..., r).
For the sake of simplicity we have assumed that M is positive definite, but in
practice it may be singular (see Exercises 6.5.1 and 6.5.2).


6.6 Arnoldi's Method

We shall now describe a method that extends the Lanczos method to the case
of a non-Hermitian matrix. It is based on Arnoldi's algorithm, which iteratively
transforms a matrix into a Hessenberg matrix (Arnoldi, 1951).

6.6.1 The Incomplete Arnoldi Method

Again, if u ≠ 0, let

K_l = lin(u, Au,..., A^{l-1}u)

be the Krylov subspace generated by u. Arnoldi's method computes an ortho-
normal basis {v_i}_1^l of K_l in which the map A_l is represented by an upper
Hessenberg matrix:

(a) v_1 = u/||u||_2, h_{11} = v_1^* A v_1;
(b) when j = 1,..., l - 1, put

    x_{j+1} = A v_j - Σ_{i=1}^j h_{ij} v_i,   h_{j+1,j} = ||x_{j+1}||_2,   (6.6.1)
    v_{j+1} = x_{j+1}/h_{j+1,j},   h_{i,j+1} = v_i^* A v_{j+1}   (i ≤ j + 1).

The algorithm terminates when x_j = 0, which is impossible if the minimal
polynomial of A with respect to u is of degree ≥ l. If this condition is satisfied,
H_l = (h_{ij}) is an irreducible Hessenberg matrix.
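The recurrence (6.6.1) translates directly into NumPy. The sketch below is our own illustration (names and test data are ours); it uses the modified Gram-Schmidt form of the orthogonalization, which is mathematically equivalent to h_ij = v_i^* A v_j.

```python
import numpy as np

def arnoldi(A, u, l):
    """Minimal sketch of (6.6.1): returns V with orthonormal columns
    spanning K_l and the l x l upper Hessenberg matrix H_l = V^* A V."""
    n = len(u)
    V = np.zeros((n, l), dtype=complex)
    H = np.zeros((l, l), dtype=complex)
    V[:, 0] = u / np.linalg.norm(u)
    for j in range(l - 1):
        x = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = np.vdot(V[:, i], x)   # h_ij = v_i^* A v_j
            x = x - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(x)     # h_{j+1,j} = ||x_{j+1}||_2
        V[:, j + 1] = x / H[j + 1, j]       # assumes no breakdown (x != 0)
    x = A @ V[:, l - 1]                     # last column of H_l
    for i in range(l):
        H[i, l - 1] = np.vdot(V[:, i], x)
    return V, H
```

By construction V^* V = I, H_l = V^* A V, and H_l is upper Hessenberg (zero below the first subdiagonal).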
In what follows we shall suppose that A (and hence A_l) is diagonalisable, with
eigenvalues {λ_i}_1^n. Since H_l is diagonalisable, it necessarily possesses l simple
eigenvalues, which we shall denote by {λ_i^{(l)}}_1^l. Let P_i be the eigenprojection
associated with λ_i. If P_i u ≠ 0, we put x_i = P_i u/||P_i u||_2.

6.6.2 Estimation of ||(I - π_l)x_i||_2

Lemma 6.6.1 If P_i u ≠ 0 and if A is diagonalisable, then

||(I - π_l)x_i||_2 ≤ min_{p ∈ P_{l-1}, p(λ_i)=1}  max_{λ ∈ sp(A)-{λ_i}} |p(λ)| · c_i.

PROOF We have

||(I - π_l)x_i||_2 = dist_2(x_i, K_l) = min_{q ∈ P_{l-1}} ||x_i - q(A)u||_2.

In K_l we choose a vector of the form

v = (1/||P_i u||_2)[q(A)P_i u + q(A(I - P_i))(I - P_i)u],   q ∈ P_{l-1}, q(λ_i) = 1.

Since q(A)P_i = q(λ_i)P_i = P_i, we obtain

||(I - π_l)x_i||_2 ≤ min_{q ∈ P_{l-1}, q(λ_i)=1} ||q(A(I - P_i))(I - P_i)u||_2 / ||P_i u||_2.

Now A is diagonalisable, say A = XDX^{-1}, and so A(I - P_i) = XD'X^{-1}, where
D' is a diagonal matrix consisting of the eigenvalues λ_j (j ≠ i) of A. Hence

||p(A(I - P_i))||_2 ≤ max_{j≠i} |p(λ_j)| cond_2(X).

The proof is concluded by putting

c_i = (||(I - P_i)u||_2 / ||P_i u||_2) cond_2(X).

We denote

ε_i^{(l)} = min_{p ∈ P_{l-1}, p(λ_i)=1}  max_{λ ∈ sp(A)-{λ_i}} |p(λ)|.

This is the uniform norm of the best approximation of the zero function on the
set sp(A) - {λ_i} by means of polynomials in a complex variable of degree < l
satisfying the condition p(λ_i) = 1.
In Chapter 7, Theorem 7.1.6 and Example 7.1.2, it will be shown that if the
spectrum of A consists of d distinct eigenvalues, then among the d - 1 eigenvalues
of A that are distinct from λ_i there exist l eigenvalues, denoted by λ_1,..., λ_l, such
that

ε_i^{(l)} = min_{p ∈ P_{l-1}, p(λ_i)=1} max |p(z)| = (Σ_{j=1}^l Π_{k≠j} |λ_k - λ_i|/|λ_k - λ_j|)^{-1},

provided that l ≤ d - 1; otherwise ε_i^{(l)} = 0.

It is seen that, when l is large, ε_i^{(l)} decreases as the terms of the form
|(λ_k - λ_i)/(λ_k - λ_j)| (k ≠ j) in the denominator increase. Therefore it is likely that ε_i^{(l)}
will be smaller for those eigenvalues situated at the periphery of the spectrum
(which includes the dominant eigenvalues) than for those that lie in the
interior of the spectrum. This is confirmed in practice.

The discussion of the decrease of ε_i^{(l)} when l increases is a difficult problem in the
approximation theory of functions of a complex variable. Except in particular
cases in which the spectrum is of a very special form, it is not easy to establish
upper bounds for ε_i^{(l)} that are both simple and precise. The two examples below
show that ε_i^{(l)} depends on sp(A) in an important manner.

Example 6.6.1 When the eigenvalues are uniformly distributed over [0, 1], we
have

λ_j = (j - 1)/(n - 1)   (j = 1,..., n)   and   ε_1^{(n-1)} = 1/(2^{n-1} - 1).

Example 6.6.2 When the eigenvalues are uniformly distributed over the circle
|z| = 1, we have λ_j = exp[2(j - 1)πi/n] (j = 1,..., n) and ε_1^{(l)} = 1/l.

It is seen that the decrease of ε_i^{(l)} with increasing l can be quite moderate for
certain spectral distributions. The study of ε_i^{(l)} is pursued by studying the
upper bound η^{(l)}; this is obtained by letting z vary in a domain D which contains
sp(A) - {λ} and excludes λ. In fact,

max_{z ∈ sp(A)-{λ}} |p(z)| ≤ max_{z ∈ D} |p(z)|,

so that

ε^{(l)} ≤ η^{(l)} = min_{p ∈ P_{l-1}, p(λ)=1}  max_{z ∈ D} |p(z)|.

When the matrix A is real, its spectrum is symmetric with respect to the real
axis. If the eigenvalue λ is also real, we may choose for D a domain that is
symmetric with respect to the real axis.
The following theorem determines η{1) in three particular cases: λ is real and
D consists of
(a) a line segment,
(b) a disk,
(c) the interior of an ellipse with real major axis.

Theorem 6.6.2 We have the following characterizations, where a, λ - c, e and ρ
are positive real numbers:

(a) When D is the real interval {t; |t - c| ≤ a},

η^{(l)} = 1/T_{l-1}((λ - c)/a).

(b) When D is the disk {z; |z - c| ≤ ρ},

η^{(l)} = (ρ/(λ - c))^{l-1}.

(c) When D is bounded by the ellipse with centre c, focal distance e and semi-
major axis a,

η^{(l)} ≈ T_{l-1}(a/e)/T_{l-1}((λ - c)/e).

PROOF See Chapter 7, Section 7.3, and Figure 6.6.1.

Figure 6.6.1
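To get a feeling for the size of these bounds, case (a) can be evaluated numerically. The sketch below is illustrative only (the values of λ, c, a and l are our arbitrary choices); it assumes the standard form η^{(l)} = 1/T_{l-1}((λ - c)/a), attained by the shifted Chebyshev polynomial p(t) = T_{l-1}((t - c)/a)/T_{l-1}((λ - c)/a), which satisfies p(λ) = 1.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Case (a): D = {t : |t - c| <= a}, real eigenvalue lambda outside D.
lam, c, a, l = 2.0, 0.5, 0.5, 6
gamma = (lam - c) / a                 # here gamma = 3
T = C.Chebyshev.basis(l - 1)          # the Chebyshev polynomial T_{l-1}
eta = 1.0 / T(gamma)                  # the bound eta^{(l)}

# sample p over D and check that max_D |p| equals eta
t = np.linspace(c - a, c + a, 2001)
p = T((t - c) / a) / T(gamma)
max_p = np.max(np.abs(p))
```

Since |T_{l-1}| ≤ 1 on [-1, 1] with the value 1 attained at the endpoints, max_D |p| coincides with η^{(l)}; with these data T_5(3) = 3363, so η^{(6)} = 1/3363 ≈ 3·10^{-4}: a polynomial of modest degree is already very small on D while being 1 at λ.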

6.6.4 Approximation

We shall prove the analogue of Theorem 6.3.4 after putting

α_l = ||(I - π_l)x||_2,

where x is the eigenvector associated with the simple eigenvalue λ.

Theorem 6.6.3 Suppose that λ and x are given and that λ is simple. Then if α_l is
sufficiently small, there exist eigenelements λ_l and x_l of A_l such that |λ - λ_l| ≤ cα_l
and sin θ_l ≤ cα_l, where c is a generic constant.

PROOF We revert to the proof of Theorem 6.3.4, where it was shown that
||P_l - P||_2 ≤ cα_l. We suppose that the eigenvectors x and x_l are such that
||x||_2 = ||x_l||_2 = 1.
We introduce a vector x̃ which is proportional to x and satisfies x_l^* x̃ = 1, so that
P_l x̃ = x_l. Similarly, define

P_l x = x'_l

(see Figure 6.6.2), whence ||x'_l - x||_2 ≤ ||P_l - P||_2.

We obtain

λ = x̃* A x,

λ - λ_l = x̃*(I - π_l)A x^{(l)} + x̃* π_l A(x^{(l)} - x_l)
        = (1/||x̃_l||_2)[x̃_l^*(I - π_l)Ax + x̃_l^* π_l A(x'_l - x)].

Figure 6.6.3

Hence

|λ - λ_l| ≤ c (max_l ||x̃_l||_2^{-1}) α_l ≤ c α_l.

Suppose we wish to approximate the dominant eigenvalue λ = λ_1. We assume
that the remainder of the spectrum is real (or nearly real; see Figure 6.6.3). Then
Theorem 6.6.2 enables us to deduce an error bound equal (or nearly equal) to
that of the Lanczos method, which amounts to 1/T_{l-1}(γ_1) (without, however,
the exponent two for the eigenvalues).
Supposing that the dominant eigenvalue is real, we can obtain an approxima-
tion for the ith eigenvalue which is still very close to that provided by Lanczos,
on the condition that the remainder of the spectrum is real or nearly real.
When A possesses complex eigenvalues, the study of the precision of Arnoldi's
algorithm is far less conclusive than that of the Lanczos method. The reader will
understand that to a large extent this is due to the lesser degree of perfection of
the theory of uniform approximation on a compact set in the complex plane.

6.6.5 A Posteriori Bounds

We have the identity

A V_l = V_l H_l + h_{l+1,l} v_{l+1} e_l^*,

whence we deduce that

||A x_i^{(l)} - λ_i^{(l)} x_i^{(l)}||_2 = h_{l+1,l} |e_l^* y_i^{(l)}|,   x_i^{(l)} = V_l y_i^{(l)},

for the ith pair λ_i^{(l)}, x_i^{(l)}.

Now A is diagonalisable by hypothesis, say A = XDX^{-1}. By Theorem 4.4.1
there exists an eigenvalue λ such that

|λ - λ_i^{(l)}| ≤ cond_2(X) ||A x_i^{(l)} - λ_i^{(l)} x_i^{(l)}||_2.

In contrast to what happens when A is Hermitian, this bound cannot be entirely
calculated a posteriori, on account of the presence of the factor cond_2(X).
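The residual identity is cheap to verify numerically. The sketch below (our own illustration; the matrix and sizes are arbitrary) runs a few Arnoldi steps while keeping the extended (l+1) x l Hessenberg matrix, then checks that each Ritz pair (θ, x = V_l y) satisfies ||Ax - θx||_2 = h_{l+1,l} |e_l^* y|.

```python
import numpy as np

# A posteriori residual check for Arnoldi Ritz pairs.
rng = np.random.default_rng(3)
n, l = 10, 4
A = rng.standard_normal((n, n))
V = np.zeros((n, l + 1))
H = np.zeros((l + 1, l))                 # extended Hessenberg matrix
V[:, 0] = np.eye(n)[:, 0]
for j in range(l):
    x = A @ V[:, j]
    for i in range(j + 1):
        H[i, j] = V[:, i] @ x
        x -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(x)
    V[:, j + 1] = x / H[j + 1, j]

theta, Y = np.linalg.eig(H[:l, :l])      # Ritz values and vectors
X = V[:, :l] @ Y                         # Ritz vectors of A
err = max(abs(np.linalg.norm(A @ X[:, k] - theta[k] * X[:, k])
              - H[l, l - 1] * abs(Y[l - 1, k])) for k in range(l))
```

The discrepancy err is at roundoff level, so the residual of every Ritz pair can be read off from the last row of the recurrence without forming A x explicitly.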

6.6.6 The Practical Algorithm

The computation of the matrix H_l is extremely costly in practice. The technique
of incomplete orthogonalization, which we now describe, is often preferable.
This technique is based on the (heuristic) observation that the elements h_{ij} of H_l
decrease, for fixed j, when i decreases.
We construct a basis {w_j}_1^l of K_l in the following way: let q be a given positive
integer, w ≠ 0 a vector and w_1 = w/||w||_2. When j = 1,..., l put

(a) y = A w_j;
(b) for i = max(1, j - q) up to i = j, put y := y - h_{ij} w_i, where h_{ij} = w_i^* A w_j;   (6.6.2)
(c) w_{j+1} = y/h_{j+1,j}, where h_{j+1,j} = ||y||_2.
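The following sketch (our own illustration, with arbitrary test data) implements this incomplete orthogonalization and checks the local orthogonality w_i^* w_j = δ_ij for |i - j| ≤ q + 1 stated in Theorem 6.6.4.

```python
import numpy as np

def incomplete_arnoldi(A, w, l, q):
    """Sketch of (6.6.2): orthogonalize A w_j only against the last q+1
    vectors w_i, i = max(1, j-q),...,j.  Returns W and the band
    Hessenberg matrix H with h_ij = w_i^T A w_j on the band."""
    n = len(w)
    W = np.zeros((n, l))
    H = np.zeros((l, l))
    W[:, 0] = w / np.linalg.norm(w)
    for j in range(l - 1):
        aw = A @ W[:, j]
        y = aw.copy()
        for i in range(max(0, j - q), j + 1):
            H[i, j] = W[:, i] @ aw        # h_ij = w_i^T A w_j
            y -= H[i, j] * W[:, i]
        H[j + 1, j] = np.linalg.norm(y)   # assumes no breakdown
        W[:, j + 1] = y / H[j + 1, j]
    return W, H
```

Vectors further than q + 1 apart are in general not orthogonal, which is exactly the price paid for the reduced cost.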

Theorem 6.6.4 The vectors defined in (6.6.2) form a basis W_l of K_l such that
w_i^* w_j = δ_ij when |i - j| ≤ q + 1.

PROOF Put

i* = max(1, j - q) = { 1     if j ≤ q,
                       j - q if j > q.

When j = 1,..., l, the vector w_{j+1} is orthogonal to w_{i*},..., w_j by construction,
and the stated local orthogonality follows by induction.

Let H_l be the band Hessenberg matrix whose non-zero elements are h_{ij}
for i - 1 ≤ j ≤ i + q. We have the identity

A W_l = W_l H_l + h_{l+1,l} w_{l+1} e_l^*.   (6.6.3)

Proposition 6.6.5 The map A_l = π_l A_{|K_l} is represented by the matrix H̃_l, which
satisfies

H̃_l = H_l + r_l e_l^*,   r_l = h_{l+1,l} G_l^{-1} W_l^* w_{l+1}.
i ν

PROOF Put G_l = W_l^* W_l and B_l = W_l^* A W_l. We leave it to the reader to verify
that the map A_l is represented by the matrix

H̃_l = G_l^{-1} B_l

with respect to the adjoint bases W_l and W_l G_l^{-1} (Exercise 6.6.3). On multiplying
the identity (6.6.3) by G_l^{-1} W_l^* we deduce that

H̃_l = H_l + h_{l+1,l} G_l^{-1} W_l^* w_{l+1} e_l^*.

This yields the result required.

From Proposition 6.6.5 we can deduce two implementations of the Arnoldi
algorithm with incomplete orthogonalization that are useful in practice. They
use the band Hessenberg matrix with or without the correction term r_l e_l^* (see
Exercise 6.6.4).
Another way of limiting the cost of computation in practice is to use the
iterative variant of Arnoldi's algorithm in a heuristic manner. Starting with u
and fixing a moderate value of l, we compute the eigenvectors φ_i^{(l)} of A_l. We begin
again, using as a starting vector a linear combination of the φ_i^{(l)}. No proof exists
for the convergence of this method (see also Chapter 7, Section 7.9).


6.7 Oblique Projections

In Section 6.1 we expounded the principle of approximation of the eigenelements
of A by means of an orthogonal projection upon a judiciously chosen subspace.
When A is not Hermitian, we may be led to consider oblique projections. We
propose the following formal presentation, which uses two non-orthogonal
subspaces G_l^1 and G_l^2 of dimension l « n. Let ŵ_l be the orthogonal projection on
G_l^2. The problem (6.1.1) is approximated in G_l^1 by the problem: find λ_l ∈ C,
0 ≠ x_l ∈ G_l^1 such that

ŵ_l(A x_l - λ_l x_l) = 0.   (6.7.1)

The problem (6.7.1) is known as the Petrov approximation of (6.1.1) (Chatelin,
1983, p. 64 and Ch. 4).
We construct orthonormal bases V_l^1 and V_l^2 in G_l^1 and G_l^2 respectively.
Equation (6.7.1) becomes

(V_l^2)^* A V_l^1 y_l = λ_l (V_l^2)^* V_l^1 y_l,   x_l = V_l^1 y_l,

which is a generalized eigenvalue problem.
The reader can verify (Exercise 6.7.1) that the orthogonal projection ŵ_l on G_l^2
defines an oblique projection π'_l on G_l^1 (see Figure 6.7.1); this justifies the name
of oblique projection for this method.

Figure 6.7.1

Example 6.7.1 The non-symmetric and incomplete Lanczos algorithm is a
method of oblique projection on the subspaces

K_l = lin(u, Au,..., A^{l-1}u)   and   L_l = lin(v, A^*v,..., (A^*)^{l-1}v).

(a) Choose v_1 = u and w_1 = v such that

w_1^* v_1 = 1,   b_1 = c_1 = 0.

(b) When j = 1, 2,..., l put

    a_j = w_j^* A v_j,
    ṽ_{j+1} = A v_j - a_j v_j - b_j v_{j-1},
    w̃_{j+1} = A^* w_j - ā_j w_j - c_j w_{j-1},
    v_{j+1} = ṽ_{j+1}/c_{j+1},   w_{j+1} = w̃_{j+1}/b̄_{j+1},

where the scalars b_{j+1} and c_{j+1} are chosen so that w_{j+1}^* v_{j+1} = 1.

The matrix W_l^* A V_l is tridiagonal with elements (c_i, a_i, b_{i+1}). This is a delicate
method to use in practice, since no proof of 'convergence' exists. Its great
advantage over Arnoldi's method is the fact that only a small storage space is
necessary. (See J. Cullum and R. Willoughby, 'A practical procedure for
computing eigenvalues of large sparse nonsymmetric matrices', in Cullum and
Willoughby, 1986, pp. 193-240.)
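A minimal sketch of these two-sided recurrences is given below. It is our own reconstruction for the real case, with one particular (and by no means unique) normalization choice spelled out in the comments; serious breakdown (w̃^T ṽ ≈ 0) is not handled, which is precisely the delicate point mentioned above.

```python
import numpy as np

def two_sided_lanczos(A, u, v, l):
    """Sketch of the oblique-projection (two-sided) Lanczos recurrences,
    real case, with the normalization w_j^T v_j = 1.  One concrete choice:
    c_{j+1} = ||v~||_2 and b_{j+1} = w~^T v_{j+1}, so that w_{j+1}^T v_{j+1} = 1."""
    n = len(u)
    V = np.zeros((n, l + 1)); W = np.zeros((n, l + 1))   # column 0 is v_0 = w_0 = 0
    a = np.zeros(l); b = np.zeros(l + 1); c = np.zeros(l + 1)
    V[:, 1] = u
    W[:, 1] = v / (v @ u)                 # ensures w_1^T v_1 = 1
    for j in range(1, l):
        a[j - 1] = W[:, j] @ A @ V[:, j]
        vt = A @ V[:, j] - a[j - 1] * V[:, j] - b[j - 1] * V[:, j - 1]
        wt = A.T @ W[:, j] - a[j - 1] * W[:, j] - c[j - 1] * W[:, j - 1]
        c[j] = np.linalg.norm(vt)
        V[:, j + 1] = vt / c[j]
        b[j] = wt @ V[:, j + 1]           # breakdown if this vanishes
        W[:, j + 1] = wt / b[j]
    a[l - 1] = W[:, l] @ A @ V[:, l]
    return V[:, 1:], W[:, 1:], a, b[1:l], c[1:l]
```

In exact arithmetic W_l^T V_l = I_l and W_l^T A V_l is tridiagonal with subdiagonal (c_2,...,c_l), diagonal (a_1,...,a_l) and superdiagonal (b_2,...,b_l), which a small random example confirms to roundoff.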


The general presentation of this chapter is adapted from Chatelin (1983); see also
Saad (1980a, 1980b). The convergence theorem 6.2.4 is due to Chatelin and Saad.
Theorem 6.3.4 was proved in Saad (1980a) by means of a variational formulation.
The proof based on the spectral theory, which is given here, does not rest on the
fact that A is symmetric; it therefore applies to Theorem 6.6.3.
Paige's dissertation (1971) is the origin of papers in the 1970s on the effect of
finite precision in the Lanczos algorithm. The selective reorthogonalization is
presented in Parlett and Scott (1979), the partial reorthogonalization in Simon
(1984) and the algorithm without reorthogonalization is studied in the book by
Cullum and Willoughby (1985). The block Lanczos method was introduced by
Golub (1973).
The spectral transformation studied in Ericsson and Ruhe (1980) is much used

in structural mechanics. The case of a complex transformation (for a real

non-symmetric matrix) is discussed in Parlett and Saad (1987).
A comparative study of the performances of the method of simultaneous
iterations (with or without Chebyshev acceleration) and the Lanczos method is
proposed in Nour-Omid, Parlett and Taylor (1983). Although quite often the
Lanczos method reveals itself as having a better performance than the iteration
of subspaces, even when accelerated, this advantage generally disappears when
the matrix is no longer symmetric, as we shall see in Chapter 7. Finally, Parlett's
article (1984) is directed towards the existing numerical software.
The practical use of the non-symmetric Lanczos algorithm is studied in Parlett,
Taylor and Liu (1985). See also Saad (1982a).


Exercises

Section 6.1 The Principle of the Methods

6.1.1 [A] Let π_l, G_l and 𝒜_l be the mathematical objects defined in Section 6.1.
Let A_l = π_l A. Prove that 𝒜_l and A_l have the same non-zero eigenvalues.

6.1.2 [D] Let N > n. Consider three matrices a_α ∈ C^{n×n}, a_β ∈ C^{N×n} and a_γ ∈ C^{n×N}
such that

a_α = r a_β = a_γ p,

where p ∈ C^{N×n} and r ∈ C^{n×N} are such that

r p = I_n.

Define the following square matrices of order N:

A_α = p a_α r,   A_β = a_β r,   A_γ = p a_γ.

Let μ be a non-zero eigenvalue of a_α of algebraic multiplicity m. Let u ∈ C^{n×m} be a basis
of the right invariant subspace and let v ∈ C^{n×m} be a basis such that v^* u = I_m. Put

σ = v^* a_α u   and   Π = p r.

Let s_a be the block-reduced resolvent of a_α associated with the eigenvalue μ.
Let r_a(z) be the inverse operator of y → a_α y - y z, where z ∈ C^{m×m} is a given matrix
whose spectrum is disjoint from that of a_α.
(a) Prove that σ is regular.
(b) Prove that for each x ∈ C^{n×m} we have

s_a(x) = lim_{z→μI_m} r_a(z)[(I_n - u v^*)x].

(c) Prove that μ is an eigenvalue of algebraic multiplicity m of the matrices A_α, A_β
and A_γ.

(d) Obtain the spectral projections for A_α, A_β and A_γ as functions of a_α, a_β, a_γ,
p, r, u and v.
(e) Prove that for each X ∈ C^{N×m} the block-reduced resolvents of A_α, A_β and A_γ
associated with μ are given by the formulae

S_α(X) = p s_a(rX) - (I_N - Π)Xσ^{-1},
S_β(X) = [p s_a(a_γ X) - (I_N - p u σ^{-1} v^* a_γ)X]σ^{-1},
S_γ(X) = [a_β s_a(rX) - (I_N - a_β u σ^{-1} v^* r)X]σ^{-1}.

Section 6.2 The Method of Subspace Iteration Revisited

6.2.1 [A] Study the constant \\Cl\\2 of Lemma 6.2.1 when the matrix A is not
6.2.2 [D] We retain the notation of Exercise 6.1.2.
(a) Prove that the basis of the right invariant subspace of A_γ associated with μ
can be derived from that of A_α by subjecting the latter matrix to fixed point
iterations.
(b) Study the convergence of the eigenelements of A_γ when interpreted as
approximations of the eigenelements of A.
6.2.3 [A] Prove Theorem 6.2.4 on the assumption that |μ_i| > |μ_{i+1}|.
6.2.4 [A] Prove that the matrix Q_k constructed by the algorithm (5.2.1) is a
basis of the subspace A^k S.
6.2.5 [A] Prove inductively that the matrix Δ_l of Theorem 6.2.4 is block-
Section 6.3 The Lanczos Method

6.3.1 [A] Prove that if the Krylov subspace K_l in the Lanczos tridiagonalisa-
tion method is of dimension less than l, then the process reduces to two
eigenvalue problems of order less than n.
6.3.2 [D] Prove that the basis V_n constructed by the Lanczos method is an
orthonormal basis.
6.3.3 [A] Let (u_1,..., u_n) be a basis of the vector space S of dimension n. The
vectors y_j constructed by the Gram-Schmidt algorithm are defined as follows:

y_1 = u_1/||u_1||_2,
ỹ_{j+1} = u_{j+1} - Σ_{i=1}^j (y_i^* u_{j+1}) y_i,   y_{j+1} = ỹ_{j+1}/||ỹ_{j+1}||_2;

prove that they form an orthonormal basis of S.

Now consider the Krylov subspace K generated by

u_1 = v_1,   u_j = A^{j-1} v_1   (j = 2,..., n),

where v_1 is such that ||v_1||_2 = 1. Let x_j be the basis vectors constructed by the Lanczos
algorithm and let y_j be the vectors obtained in the Gram-Schmidt orthogonali-
zation process.
Show that, for j = 1, 2,..., n, there exists a real non-negative number θ_j such
that

y_j = e^{iθ_j} x_j.

Deduce that the orthonormal bases of Lanczos and Gram-Schmidt coincide.

Why is the Lanczos algorithm preferable in finite precision arithmetic?
6.3.4 [D] Retain the notation of Lemma 6.3.1. Show that the Lanczos method,
when applied to A or A', produces the same matrix π_l A π_l.
6.3.5 [D] We propose here a generalization of the Lanczos method to the
case of a non-Hermitian matrix A. Let v_1 and w_1 be such that w_1^* v_1 = 1. Define
v_0 = w_0 = 0 and β_1 = δ_1 = 0; when j = 1, 2,..., l, put

α_j = w_j^* A v_j,
ṽ_{j+1} = A v_j - α_j v_j - β_j v_{j-1},
w̃_{j+1} = A^* w_j - ᾱ_j w_j - δ̄_j w_{j-1},
v_{j+1} = ṽ_{j+1}/δ_{j+1},   w_{j+1} = w̃_{j+1}/β̄_{j+1},

where δ_{j+1} and β_{j+1} are chosen so that w_{j+1}^* v_{j+1} = 1.

(a) Prove that


(b) Prove that if ||v_1||_2 = ||w_1||_2, then

||v_j||_2 = ||w_j||_2   (j = 1, 2,..., l).

Put

V_l = (v_1,..., v_l),   W_l = (w_1,..., w_l),
K_l(A, v_1) = lin(v_1, Av_1,..., A^{l-1}v_1),
K_l(A^*, w_1) = lin(w_1, A^*w_1,..., (A^*)^{l-1}w_1),

and let T_l be the tridiagonal matrix with diagonal (α_1,..., α_l), superdiagonal
(β_2,..., β_l) and subdiagonal (δ_2,..., δ_l).

(c) Prove that if the algorithm proceeds to the lth step (δ_{j+1} ≠ 0, j = 1, 2,..., l),
then

W_l^* V_l = I_l,
lin(V_l) = K_l(A, v_1),
lin(W_l) = K_l(A^*, w_1),
A V_l = V_l T_l + δ_{l+1} v_{l+1} e_l^*,
A^* W_l = W_l T_l^* + β̄_{l+1} w_{l+1} e_l^*.

(d) What happens if δ_{j+1} = 0?
(e) Interpret the matrix T_l in relation to a representation of the linear map A.
6.3.6 [A] Find estimates for the constants in the bounds given in Theorem 6.3.4.
6.3.7 [B:46] Suppose the calculations are carried out in finite precision arithmetic
with machine error of order ε. Thus the recurrence formulae (6.3.1) become

A V_l = V_l T_l + b_{l+1} v_{l+1} e_l^T + F_l,
V_l^T V_l = L_l + I + L_l^T,

where L_l is a lower triangular matrix. Suppose that there exists local orthogona-
lity, in such a way that the diagonal and the first subdiagonal of L_l are zero.
Finally, suppose that

(a) Show that

ΙΙ^ + ι^ + ιΙΐ2 = Κ + 1 ^ Ι ΐ 2 = ΙΙ^ +1 ^ΙΙ 2 .

(b) Show that the ith column x_i^{(l)} of X_l satisfies the equation

(A - λ_i^{(l)} I) x_i^{(l)} = β_li v_{l+1} + F_l ξ_i^{(l)}   (i = 1, 2,..., l),

where β_li = b_{l+1} ξ_li^{(l)}, and that

γ_ik = ξ_i^{(l)T} K_l ξ_k^{(l)},

K_l being the strictly triangular part of F_l^T V_l - V_l^T F_l.
(c) Show that ||K_l||_2 = O(ε||A||_2).
(d) Show that the relations

γ_ii = O(ε||A||_2)   and   x_i^{(l)T} v_{l+1} ~ 1

imply that

β_li = O(ε||A||_2).

Deduce that 'the loss of orthogonality entails convergence'.
6.3.8 [D] Retain the notation of Exercise 6.3.7.
(a) Prove that if i ≠ k and i, k ≤ l, then

(λ_i^{(l)} - λ_k^{(l)}) x_i^{(l)T} x_k^{(l)} = γ_ii (ξ_lk^{(l)}/ξ_li^{(l)}) - γ_kk (ξ_li^{(l)}/ξ_lk^{(l)}) - (γ_ik - γ_ki).

(b) Deduce that the Ritz vectors x_i^{(l)} and x_k^{(l)} which are not good approximations
for the eigenvectors x_i and x_k (because ξ_li and ξ_lk are too great) are
orthogonal up to machine precision.
6.3.9 [B:15] Given a real symmetric matrix A = (a_ij) of order n, choose an
arbitrary vector v_1^{(1)} such that ||v_1^{(1)}||_2 = 1, and an integer k_0 « n. Let

λ → C(λ)

be a function of the type Ω ⊂ R → R^{n×n}. Consider the following algorithm:

For l = 1, 2,... and k = 1, 2,..., k_0 put

(a) W_k^{(l)} = A V_k^{(l)},
(b) H_k^{(l)} = V_k^{(l)T} W_k^{(l)},
(c) compute an eigenvalue λ_k^{(l)} of H_k^{(l)} and an associated eigenvector y_k^{(l)} of norm
1,
(d) x_k^{(l)} = V_k^{(l)} y_k^{(l)},
(e) r_k^{(l)} = (A - λ_k^{(l)} I) x_k^{(l)}.
If ||r_k^{(l)}||_2 is sufficiently small, terminate the process; if not,
(f) t_k^{(l)} = C(λ_k^{(l)}) r_k^{(l)},
(g) w_k^{(l)} = (I - V_k^{(l)} V_k^{(l)T}) t_k^{(l)}.
If ||w_k^{(l)}||_2 is sufficiently small, then put

w_k^{(l)} = r_k^{(l)};

(h) v_{k+1}^{(l)} = w_k^{(l)}/||w_k^{(l)}||_2,   V_{k+1}^{(l)} = (V_k^{(l)}, v_{k+1}^{(l)});

when k = k_0, restart with V_1^{(l+1)} = x_{k_0}^{(l)}.
Prove the following inequalities:

||y_k^{(l)}||_2 = 1,
||x_k^{(l)}||_2 = 1,
λ_k^{(l)} ≤ λ_max(A)   (the greatest eigenvalue of A).
6.3.10 [D] Consider the algorithm of Exercise 6.3.9. Prove that
(a) V_k^{(l)} V_k^{(l)T} is an orthogonal projection for all l and k.
(b) If λ → C(λ) is continuous on a compact set containing the spectrum of A, then
the sequence t_k^{(l)} is bounded with respect to l and k.
6.3.11 [D] Investigate the convergence of the algorithm proposed in Exercise
6.3.9 when λ_k^{(l)} is the greatest eigenvalue of the matrix H_k^{(l)}.
(a) Show that, for each k ∈ {1, 2,..., k_0}, the sequence (λ_k^{(l)})_{l∈N} is increasing and
bounded.
(b) Show that lim_{l→∞} λ_k^{(l)} exists

independently of k.
6.3.12 [D] Show that if in Exercise 6.3.9 C(λ) is symmetric and positive or
negative definite, then the sequences r_k^{(l)} and w_k^{(l)} tend to zero as l tends to infinity.
6.3.13 [C] Study the behaviour of the algorithm described in Exercise 6.3.9
when

C(λ) = (λI - D)^{-1},

where D is the diagonal of A.
6.3.14 [B:15,16] The choice made in Exercise 6.3.9, namely

C(λ) = (λI - D)^{-1},

where D is the diagonal of A, corresponds to what is called Davidson's algorithm.
We suppose the aim is to compute the greatest eigenvalue of A. Show that if v_1^{(1)}
is such that λ_1^{(1)}I - D is positive definite, then the algorithm converges.
6.3.15 [B:15] Let (λ, v) be a pair of eigenelements of A, where λ is not the
greatest eigenvalue of A. Let w ∈ R^n and let ε be a non-zero real number. Put

v_ε = v + εw.

(a) Show that

(v_ε^T A v_ε)/||v_ε||_2^2 - λ = ε^2 (w^T A w - λ||w||_2^2)/||v_ε||_2^2.

Let

S_+ = {w ∈ R^n : w^T A w - λ||w||_2^2 > 0},
S_- = R^n \ S_+.

(b) Show that S_+ is a non-empty open cone.
(c) Consider the algorithm defined in Exercise 6.3.9. Show that the convergence
of x_k^{(l)} towards v can take place only if
(d) Deduce that the method is unstable when λ is not the greatest eigenvalue of A.
6.3.16 [D] Consider the basis V_k^{(l)} of Exercise 6.3.9. Can this basis be associated
with a Krylov subspace?
6.3.17 [A] Consider the classical Davidson algorithm (see Exercises 6.3.9 and
6.3.14) when applied to the real symmetric sparse matrix A = (a_ij). Let i_0 be an
index such that

a_{i_0 i_0} = max_{1≤i≤n} a_ii,

and let j_0 be an index such that

a_{i_0 j_0} ≠ 0   (j_0 ≠ i_0).

(a) What happens when no such j_0 exists? Let v_1^{(1)} = e_{i_0}.

(b) Obtain the explicit form of H_2^{(1)}.

(c) Deduce the convergence of the algorithm.
6.3.18 [B:61] Let A be a real symmetric positive definite matrix of order n. Consider
the following two methods for solving the problem Ax = b.
Lanczos method: x_0 ∈ R^n is given.

r_0 = b - A x_0,
δ_0 = ||r_0||_2,   q_{-1} = 0,   u_{-1} = r_0.

For k = 0, 1, 2,...:
if δ_k = 0, terminate;
if not, let

q_k = u_{k-1}/δ_k,
γ_k = q_k^T A q_k,
u_k = A q_k - γ_k q_k - δ_k q_{k-1},
δ_{k+1} = ||u_k||_2.

Method of the conjugate gradient: x_0 ∈ R^n is given.

r_0 = d_0 = b - A x_0.

For k = 0, 1, 2,...:
if d_k = 0, terminate: x = x_k is the solution of Ax = b; if not, put

α_k = ||r_k||_2^2 / (d_k^T A d_k),
x_{k+1} = x_k + α_k d_k,
r_{k+1} = r_k - α_k A d_k,
β_k = ||r_{k+1}||_2^2 / ||r_k||_2^2,
d_{k+1} = r_{k+1} + β_k d_k.

Let d be the number of distinct eigenvalues of A.

(a) Prove the existence of an integer m ≤ d such that d_m = 0.
(b) Prove that the vectors d_0,..., d_{m-1} are linearly independent.
(c) Prove that

lin(d_0,..., d_k) = lin(r_0, A r_0,..., A^k r_0)
                  = lin(r_0, r_1,..., r_k)   (0 ≤ k ≤ m - 1).

(d) Prove the following properties:

d_i^T A d_j = 0   (0 ≤ i < j ≤ m),
||r_i||_2 > 0   (0 ≤ i < m),
d_i^T r_j = 0   if 0 ≤ i < j ≤ m.

(e) Prove that the minimum of the function g_k(α) = (x_k - x + αd_k)^T A(x_k - x
+ αd_k) is attained at α_k.
Consider the tridiagonal symmetric matrix T_k of order k with diagonal
(γ_0,..., γ_{k-1}) and off-diagonal (δ_1,..., δ_{k-1}), and put

D_k = diag(α_0^{-1},..., α_{k-1}^{-1}),
Q_k = (q_0,..., q_{k-1}),
τ_j = -√β_j   (0 ≤ j ≤ k - 2),

and let L_k be the unit lower bidiagonal matrix of order k with subdiagonal
(τ_0,..., τ_{k-2}).

(f) Prove the following relations:

A Q_k - Q_k T_k = δ_k q_k e_k^T,
Q_k^T Q_k = I_k,
T_k = L_k D_k L_k^T,

and deduce the relations between the parameters γ_i, δ_i, β_i and α_i.
(g) Show that the iterate x_k of the conjugate gradient method can be obtained
from the Lanczos method by the equation

x_k = x_0 + δ_0 Q_k T_k^{-1} e_1,

where e_1 is the first column of I_k.
(h) Propose a method for computing x_k - x_0 based on the Cholesky factorization
of T_k.
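The conjugate gradient recurrences above can be checked experimentally. The sketch below (our own illustration; the matrix and right-hand side are random choices) runs CG, keeps the search directions d_k, and verifies both the solution and the A-conjugacy property d_i^T A d_j = 0 of part (d).

```python
import numpy as np

def conjugate_gradient(A, b, x0):
    """Sketch of the CG recurrences of Exercise 6.3.18; also returns the
    search directions d_k so that their A-conjugacy can be checked."""
    x = x0.astype(float).copy()
    r = b - A @ x
    d = r.copy()
    D = []
    for _ in range(len(b)):
        if np.linalg.norm(d) == 0 or np.linalg.norm(r) < 1e-13:
            break
        D.append(d.copy())
        alpha = (r @ r) / (d @ A @ d)     # alpha_k = ||r_k||^2 / d_k^T A d_k
        x = x + alpha * d
        r_new = r - alpha * (A @ d)
        beta = (r_new @ r_new) / (r @ r)  # beta_k = ||r_{k+1}||^2 / ||r_k||^2
        r = r_new
        d = r + beta * d
    return x, np.array(D).T
```

For a well-conditioned symmetric positive definite matrix the off-diagonal entries of D^T A D are at roundoff level, confirming the conjugacy of the directions.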


Section 6.4 The Block Lanczos Method

6.4.1 [D] Recover Theorem 6.3.3 by starting from Theorem 6.4.3 when r = 1 and
when the eigenvalues are distinct.
6.4.2 [D] Can one generalize the block Lanczos method to a non-Hermitian
matrix by reverting to the ideas of Exercise 6.3.5?

Section 6.5 The Generalized Problem

6.5.1 [B:47] Consider the problem

(K - λM)z = 0   (z ≠ 0)

when K is a real symmetric positive definite matrix and

M = diag(M_+, 0),

where M_+ is real symmetric positive definite. The structure of M induces a
partition of K and z, namely

K = [K_11  K_12; K_12^T  K_22],   z = [z_1; z_2].

(a) Show that the original problem can be written as

(K_11 - λM_+)z_1 + K_12 z_2 = 0,
K_12^T z_1 + K_22 z_2 = 0.

(b) Show that it may be supposed that K_22 is regular.
(c) Show that z_1 is an eigenvector of the matrix

K_11 - K_12 K_22^{-1} K_12^T.

(d) Show that z_2 is completely determined by z_1, K_12 and K_22.
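The condensation of parts (a)-(c) is easy to verify numerically. The sketch below (our own illustration with arbitrary block sizes) checks that the finite eigenvalues of the singular pencil (K, M) coincide with the eigenvalues of the condensed pencil (K_11 - K_12 K_22^{-1} K_12^T, M_+).

```python
import numpy as np

# Static condensation for M = diag(M_+, 0): eliminate z_2 from
# K_12^T z_1 + K_22 z_2 = 0 to get (K_11 - K_12 K_22^{-1} K_12^T) z_1 = lam M_+ z_1.
rng = np.random.default_rng(8)
n1, n2 = 4, 3
B = rng.standard_normal((n1 + n2, n1 + n2))
K = B @ B.T + (n1 + n2) * np.eye(n1 + n2)      # symmetric positive definite K
C = rng.standard_normal((n1, n1))
Mp = C @ C.T + n1 * np.eye(n1)                 # symmetric positive definite M_+

K11, K12, K22 = K[:n1, :n1], K[:n1, n1:], K[n1:, n1:]
S = K11 - K12 @ np.linalg.solve(K22, K12.T)    # condensed matrix

lam_cond = np.sort(np.real(np.linalg.eigvals(np.linalg.solve(Mp, S))))

# finite eigenvalues of the full pencil: nonzero mu of K^{-1}M give lam = 1/mu
M = np.zeros((n1 + n2, n1 + n2)); M[:n1, :n1] = Mp
mu = np.sort(np.real(np.linalg.eigvals(np.linalg.solve(K, M))))
lam_full = np.sort(1.0 / mu[-n1:])
```

Since M has rank n1, the pencil has exactly n1 finite eigenvalues, and the two computations agree.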

6.5.2 [D] Generalize the study made in Exercise 6.5.1 to a matrix M which is
real symmetric semi-definite.
6.5.3 [D] Consider the problem

Kx = λMx,

where K and M are symmetric, K is regular and M is positive semi-definite and
singular. Let X be the basis of eigenvectors normalized by X^T M X = I.
(a) What is the result of the inverse iteration

(K - σM)z = M y_k,   y_{k+1} = z/τ?

(b) Use Exercise 1.13.2 to show that when y is arbitrary and

z = (K - σM)^{-1} M y,

then either
(i) K^{-1}M is non-defective (eigenvalue 0 of index 1) and z ∈ Im X; or
(ii) K^{-1}M is defective (eigenvalue zero of index 2) and (K - σM)^{-1} M z ∈ Im X.
(c) Deduce that, whatever the initial vector y_0, after at most two iterations the
vectors y_k lie in Im X and everything takes place as if M were regular.

Section 6.6 Arnoldi's Method

6.6.1 [D] Prove that H_l represents the map 𝒜_l in the orthonormal basis
{v_i}_1^l defined by Arnoldi's method (6.6.1).
6.6.2 [D] Arnoldi's method is used to approximate the eigenvalue λ_i. Establish
the rate of convergence when the remainder of the spectrum is real.
6.6.3 [A] Prove that in Proposition 6.6.5 the map 𝒜_l is represented by
the matrix H̃_l = G_l^{-1} B_l relative to the basis W_l.
6.6.4 [A] Consider the algorithm (6.6.2). Arnoldi's method with incomplete
orthogonalization without the correction term r_l e_l^* consists in using the eigen-
elements of the matrix H_l in order to approximate the eigenelements of A in K_l.
Study the corresponding error bounds.

Section 6.7 Oblique Projections

6.7.1 [A] Consider the Petrov approximation defined in (6.7.1). Prove that the
orthogonal projection ŵ_l on G_l^2 defines an oblique projection π'_l on G_l^1 if
6.7.2 [D] Study the Petrov approximation when G_l^2 = A G_l^1.
6.7.3 [D] Show that the methods of incomplete orthogonalization and of
aggregation/disaggregation for a Markov chain are methods of oblique projection.

Chebyshev's Iterative Methods

The Chebyshev polynomials play a central part in the acceleration techniques
for the convergence of linear iterations, for they furnish the optimum of the
problem

min_{p∈P_k} max_{z∈S} |p(z)|,

where S is a set in the complex plane that does not contain λ and is bounded by
an ellipse.
In this chapter we have collected a certain number of methods inspired by this
principle in order to compute the eigenvalues of greatest real part of a
non-symmetric matrix.


Let S be a compact set of C (or R) and let C(S) be the set of continuous
functions on S with real or complex values, endowed with the uniform norm

||f||_∞ = max_{z∈S} |f(z)|.

Let V be a subspace of C(S) of dimension k. Our aim is to characterize
those elements v* of V which are closest (in the sense of the uniform norm)
to a given function f.

(a) v* is a best approximation of f over S in V if

min_{v∈V} max_{z∈S} |f(z) - v(z)| = ||f - v*||_∞.

(b) Each f possesses a non-empty set of critical points

E(f, S) = {t ∈ S; ||f||_∞ = |f(t)|}.
As regards the existence of a best approximation v* the reader is referred to
Exercise 7.1.1. We have the following fundamental characterization.

Theorem 7.1.1 The function v* is a best approximation of f if and only if
there exists a subset σ of S consisting of r distinct points z_1,..., z_r of S together
with r positive numbers α_1,..., α_r such that

Σ_{i=1}^r α_i [f(z_i) - v*(z_i)] v(z_i) = 0   (∀v ∈ V),

where r ≤ k + 1 in the real case and r ≤ 2k + 1 in the complex case.

PROOF The reader is referred to Rivlin (1990, p. 74). The {z_i}_1^r are the
critical points of the error f - v*.

When z ≠ 0, we put sgn z = z̄/|z|, so that (sgn z)z = |z|.

Corollary 7.1.2 The function v* is a best approximation of f if and only if
there exists

σ = {z_i}_1^r ⊂ E(f - v*, S)

such that

Σ_{i=1}^r α_i ε_i v(z_i) = 0   (∀v ∈ V),

where

(a) ε_i = sgn (f - v*)(z_i),
(b) ε_i (f - v*)(z_i) = ||f - v*||_∞   (i = 1,..., r).

PROOF Corollary 7.1.2 is a reformulation of Theorem 7.1.1.

Theorem 7.1.3 If v* is a best approximation of f on S, then it is also a best
approximation of f on σ, and

min_{v∈V} max_{z∈S} |f(z) - v(z)| = min_{v∈V} max_{z_i∈σ} |f(z_i) - v(z_i)|.

PROOF See Rivlin (1990, p. 75).

Definition The subspace V of dimension k is said to satisfy the Haar (or the
Chebyshev) condition on S, if every non-zero function of V possesses at most
k — 1 zeros in S.

This definition is equivalent to the interpolation condition at k distinct
points of S: for every set of k distinct points {t_i}_1^k of S and k values {y_i}_1^k
in R or C, there exists a unique function v ∈ V,

v = Σ_{i=1}^k a_i v_i

(where the {v_i}_1^k are a basis of V), which satisfies the equations

v(t_i) = y_i   (i = 1,..., k).

Example 7.1.1 For every given λ ∈ ℂ, the set V = P_k^λ = {p ∈ P_k : p(λ) = 0} is a
vector space of dimension k which satisfies the Haar condition on every compact
subset of ℂ that does not contain λ.

Theorem 7.1.5 (Haar*) Every function f ∈ C(S) possesses a unique best approximation
v* in V if and only if V satisfies the Haar condition.

PROOF See Rivlin (1990, p. 78).

The special case of uniform approximation of real functions of a real variable
is treated in Laurent (1972, Ch. 3).
We are interested in the following problem: given a set S = {λ_i}_1^l of l distinct
points, determine the polynomial p* with p*(λ) = 1 such that

    ‖p*‖_∞ = min_{p∈P_k, p(λ)=1} max_{z∈S} |p(z)|.  (7.1.1)

The polynomials considered belong to an affine subspace; nevertheless, we can
revert to the preceding topic by putting q* = 1 − p*, where q* is the best
approximation of the constant function unity on S by polynomials of P_k^λ:

    ‖1 − q*‖_∞ = min_{q∈P_k^λ} max_{z∈S} |1 − q(z)|.

*Alfred Haar, 1885-1933, born in Budapest, died at Szeged.


Theorem 7.1.6 When k < l, there exist k + 1 points λ_1,…,λ_{k+1} of S such that

    ‖p*‖_∞ = [Σ_{j=1}^{k+1} Π_{i=1, i≠j}^{k+1} |(λ_i − λ)/(λ_i − λ_j)|]^{−1}.

PROOF By Theorem 7.1.1 there exists a subset of r points {λ_i}_1^r of S that are
critical points of the error p* = 1 − q* and have the property that

    Σ_{i=1}^r α_i ε_i v(λ_i) = 0  (∀v ∈ P_k^λ),

where k + 1 ≤ r ≤ 2k + 1. We shall show that for this particular problem we have
r = k + 1.
Choose k + 1 points λ_1,…,λ_{k+1} among the r ≥ k + 1 critical points of p*.
Consider the following basis of P_k^λ:

    ω_j(z) = (z − λ) l_j(z)  (j = 1,…,k),

where l_j is the Lagrange polynomial

    l_j(z) = Π_{i=1, i≠j}^{k} (z − λ_i)/(λ_j − λ_i)

of degree k − 1 such that

    l_j(λ_i) = 0  (i ≠ j and i ≠ k + 1),  l_j(λ_{k+1}) ≠ 0.

We verify immediately that

    ω_j(λ_j) = λ_j − λ,  ω_j(λ_{k+1}) ≠ 0,

and ω_j(λ_i) = 0 when i ≠ j and i ≠ k + 1. By virtue of Haar's condition we have

    det[ω_j(λ_i)] ≠ 0  (i, j = 1,…,k).
Hence the system

    Σ_{s=1}^{k+1} β_s ω_j(λ_s) = 0  (j = 1,…,k)  (7.1.2)

of k equations with k + 1 unknowns β_s has a non-zero solution in which β_{k+1}
is arbitrary.
Now consider the Lagrange polynomials of degree k:

    l'_j(z) = Π_{i=1, i≠j}^{k+1} (z − λ_i)/(λ_j − λ_i),

such that

    l'_j(λ_j) = 1,  l'_j(λ_i) = 0  (i ≠ j)

(j = 1,…,k + 1). We verify that the system (7.1.2) has a particular solution
β_j = l'_j(λ) ≠ 0 (j = 1,…,k + 1). In fact, the jth equation can be written as

    ω_j(λ_j)β_j + ω_j(λ_{k+1})β_{k+1} = (λ_j − λ)l'_j(λ) + (λ_{k+1} − λ) l_j(λ_{k+1}) l'_{k+1}(λ) = 0.

We may choose β_{k+1} to be non-zero; hence

    β_j = −β_{k+1} (λ_{k+1} − λ) l_j(λ_{k+1})/(λ_j − λ)  (j = 1,…,k).

The reader will verify that β_j can be identified with

    l'_j(λ) = Π_{i=1, i≠j}^{k+1} (λ − λ_i)/(λ_j − λ_i),

provided that we choose β_{k+1} = l'_{k+1}(λ).
We consider the polynomial p ∈ P_k defined by the interpolation conditions
p(λ) = 1 and

    p(λ_s) = ρ e^{iθ_s}  (s = 1,…,k + 1),

where (since β_s ≠ 0)

    e^{iθ_s} = sgn β_s = sgn l'_s(λ).

On the other hand, ρ is a positive real number; in fact, p(λ) = 1 gives

    ρ = |Σ_{s=1}^{k+1} [sgn l'_s(λ)] l'_s(λ)|^{−1} = [Σ_{s=1}^{k+1} |l'_s(λ)|]^{−1} > 0.

Thus the polynomial p, defined above, has the property that

    Σ_{s=1}^{k+1} |β_s| ε_s v(λ_s) = 0  (∀v ∈ P_k^λ),

where ε_s = sgn[p(λ_s)] = e^{−iθ_s}, and

    ε_s p(λ_s) = ρ  (ρ > 0).

This proves that, when k < l, the polynomial q = 1 − p is the best approximation
required: q* = 1 − p, therefore p* = p. For this optimal polynomial we have

    ‖p*‖_∞ = ‖1 − q*‖_∞ = [sgn p*(λ_s)] p*(λ_s) = |p*(λ_s)| = ρ  (s = 1,…,k + 1).

When k ≥ l, we have ‖p*‖_∞ = 0, because there always exists at least one
polynomial of degree ≤ l such that

    p(λ) = 1 and p(λ_i) = 0  (i = 1,…,l).
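This degenerate case is easy to check numerically: a product of linear factors normalized at λ vanishes on all of S. The sketch below uses hypothetical points λ and S (illustrative values, not taken from the text).

```python
# When k >= l we can interpolate: build p of degree l with p(lam) = 1 and
# p(lam_i) = 0 on S, so min max |p| over S is 0. Hypothetical data:
lam = 3.0
S = [0.5, 1.0, 2.0]                 # l = 3 distinct points; lam is not in S

def p(z):
    # product form: vanishes on S, normalized so that p(lam) = 1
    val = 1.0
    for li in S:
        val *= (z - li) / (lam - li)
    return val

assert abs(p(lam) - 1.0) < 1e-12    # normalization at lam
assert all(abs(p(li)) < 1e-12 for li in S)   # p vanishes on S
```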

Example 7.1.2 Suppose that S = sp(A) − {λ}, where sp(A) = {λ_i}_1^d represents
the d distinct eigenvalues of a matrix A. In Chapter 6, Section 6.6, we defined

    ε^{(l)} = min_{p∈P_{l−1}, p(λ)=1} max_{z∈S} |p(z)|.

According to Theorem 7.1.6, the d − 1 eigenvalues of A that are distinct from
λ include l eigenvalues λ_1,…,λ_l such that

    ε^{(l)} = [Σ_{j=1}^{l} Π_{k=1, k≠j}^{l} |(λ_k − λ)/(λ_k − λ_j)|]^{−1}  if l ≤ d − 1,  (7.1.3)
    ε^{(l)} = 0  otherwise.

When all the eigenvalues other than λ lie in a circle that is well separated from
λ (see Figure 7.1.1), then ε^{(l)} is small.

Figure 7.1.1

In order to obtain a bound for ε^{(l)} which involves only a localization of the
spectrum (not all the eigenvalues) we consider a compact connected region D
containing S. Then:

(a) for all p ∈ V we have

    max_{z∈S} |p(z)| ≤ max_{z∈D} |p(z)|;

(b) the maximum is attained on the boundary ∂D because the polynomial p is
analytic in D.

Particular optimal results were cited in Theorem 6.6.2. They involve the
Chebyshev polynomials of the first kind which we are now going to study.


7.2 Chebyshev Polynomials of a Real Variable

Polynomials are simple functions which are useful for approximating more
complicated functions. Now if p ∈ P_k, say

    p(t) = a_0 + a_1 t + ⋯ + a_k t^k,

then p is entirely determined by the k + 1 coefficients a_0, a_1,…,a_k. Amongst
all polynomials the set of Chebyshev polynomials has interesting approximation
properties with regard to the uniform norm.

7.2.1 Definition
The kth Chebyshev polynomial of the first kind T_k(t) is defined as follows:

    T_k(t) = cos(k cos⁻¹ t)    when |t| ≤ 1,
    T_k(t) = cosh(k cosh⁻¹ t)  when |t| > 1
           = ½[(t + √(t² − 1))^k + (t + √(t² − 1))^{−k}]  (|t| > 1).

When |t| > 1 we can put T_k(t) = cosh ku, where cosh u = t or, equivalently,
u = cosh⁻¹ t. The change of variable e^u = w yields

    T_k(t) = (w^k + w^{−k})/2,

where t = (w + w^{−1})/2. Thus w² − 2tw + 1 = 0; we may choose

    w = t + √(t² − 1).

7.2.2 Properties

    T_k(−t) = (−1)^k T_k(t),
    T_0(t) = 1,  T_1(t) = t,  T_k(t) = 2t T_{k−1}(t) − T_{k−2}(t)  (k = 2, 3,…),
    |T_k(t)| ≤ 1 when |t| ≤ 1.
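The closed forms of Definition 7.2.1 and the three-term recurrence above can be cross-checked in a few lines of Python (the sample arguments are arbitrary):

```python
import math

def cheb_T(k, t):
    """Evaluate T_k(t) by the three-term recurrence
    T_k = 2 t T_{k-1} - T_{k-2}, with T_0 = 1 and T_1 = t."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, t
    for _ in range(2, k + 1):
        prev, cur = cur, 2.0 * t * cur - prev
    return cur

# Agreement with cos(k arccos t) on |t| <= 1 and cosh(k arccosh t) for t > 1:
assert math.isclose(cheb_T(5, 0.3), math.cos(5 * math.acos(0.3)),
                    rel_tol=1e-12, abs_tol=1e-12)
assert math.isclose(cheb_T(5, 1.7), math.cosh(5 * math.acosh(1.7)),
                    rel_tol=1e-9)
# |T_k(t)| <= 1 on [-1, 1]:
assert all(abs(cheb_T(7, -1 + i / 50)) <= 1 + 1e-12 for i in range(101))
```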

Theorem 7.2.1 Let a < b < λ. The optimum

    min_{p∈P_k, p(λ)=1} max_{t∈[a,b]} |p(t)|

is attained by

    t̂_k(t) = T_k[1 + 2(t − b)/(b − a)] / T_k[1 + 2(λ − b)/(b − a)],

with

    ‖t̂_k‖_∞ = 1 / T_k[1 + 2(λ − b)/(b − a)].

PROOF See Rivlin (1990, Ch. 2).


7.3 Chebyshev Polynomials of a Complex Variable

We can define T_k for a complex variable z by the formula (for example)

    T_k(z) = cosh(k cosh⁻¹ z).

When k → ∞, we have |T_k(z)| → ∞, except when z is real and |z| ≤ 1.
In what follows we suppose that λ is real.

Definition Let E = E(c, e, a) denote the ellipse with centre c, semi-major axis a,
and focal distance e, where c is real and a, e, λ − c > 0; let ℰ denote the region
bounded by E (see Figure 7.3.1).

Lemma 7.3.1

    η^{(k)} = min_{p∈P_k, p(λ)=1} max_{z∈ℰ} |p(z)| ≤ T_k(a/e) / T_k[(λ − c)/e].

Figure 7.3.1

PROOF Put z' = (z − c)/e. Then z' lies in the region ℰ' which is bounded by the
ellipse E'(0, 1, a/e) of centre 0, focal distance 1 and semi-major axis a/e. Therefore,
by the maximum principle,

    η^{(k)} ≤ max_{z'∈E'} |T_k(z')| / T_k[(λ − c)/e].

Now z' ∈ E'(0, 1, a/e) if and only if

    z' = ½(w + w^{−1}) with |w| = ρ, where ½(ρ + ρ^{−1}) = a/e.

On the other hand, T_k(z') = (w^k + w^{−k})/2, whence

    max_{z'∈E'} |T_k(z')| = max_{|w|=ρ} ½|w^k + w^{−k}| = max_{0≤θ<2π} ½|ρ^k e^{ikθ} + ρ^{−k} e^{−ikθ}|.

It is easy to see that this maximum has the value

    ½(ρ^k + ρ^{−k}) = T_k(a/e).

Theorem 7.3.2 The optimum η^{(k)} is attained for the polynomial

    t̂_k(z) = T_k[(z − c)/e] / T_k[(λ − c)/e]  and  ‖t̂_k‖_∞ = T_k(a/e) / T_k[(λ − c)/e].  (7.3.1)

PROOF See Clayton (1963).

This result is based on the assumption that c and a are real; it is no longer true
when these quantities become complex numbers (see Figure 7.3.2 and Exercise 7.3.9).
Nevertheless, the result remains asymptotically true, as we shall now show. Let

Figure 7.3.2

E' = E'(c, e, a) be an arbitrary ellipse that does not contain λ. Let p* be the
extremal polynomial that satisfies

    max_{z∈E'} |p*(z)| = min_{p∈P_k, p(λ)=1} max_{z∈E'} |p(z)|.

(As regards the existence and uniqueness of p*, the reader is referred to Exercise
7.3.12.)
Proposition 7.3.3

    lim_{k→∞} max_{z∈E'} |p*(z)|^{1/k} = lim_{k→∞} max_{z∈E'} |t̂_k(z)|^{1/k}.

PROOF We begin by proving that

    min_{z∈E'} |t̂_k(z)| ≤ max_{z∈E'} |p*(z)| ≤ max_{z∈E'} |t̂_k(z)|.  (7.3.2)

The inequality on the right is evident. Suppose that the inequality on the left is
false; then

    max_{z∈E'} |p*(z)| < min_{z∈E'} |t̂_k(z)|

implies that

    |p*(z)| < |t̂_k(z)|  when z ∈ E'.

By Rouché's* theorem, t̂_k(z) − p*(z) has as many zeros in the interior of E' as t̂_k(z).
Now t̂_k has k zeros on the segment joining the foci c − e and c + e (Exercise 7.3.5).
On the other hand, t̂_k(λ) − p*(λ) = 0 and λ is exterior to E'. This proves that
t̂_k − p* is the zero polynomial, because its degree does not exceed k and it has at
least k + 1 distinct zeros. Thus t̂_k(z) = p*(z) on E', which contradicts our
assumption.
By virtue of (7.3.2) it suffices to prove that

    lim_{k→∞} min_{z∈E'} |t̂_k(z)|^{1/k} = lim_{k→∞} max_{z∈E'} |t̂_k(z)|^{1/k}.

All the points of the ellipse E' are such that lim_{k→∞} |t̂_k(z)|^{1/k} is constant for z ∈ E'.

Theorem 7.3.4 When 0 ≤ r < 1, we have

    min_{p∈P_k, p(1)=1} max_{|z|≤r} |p(z)| = r^k.

*Eugène Rouché, 1832-1910, born at Sommières, died at Lunel.


PROOF Put Q_k = {p ∈ P_k : p(1) = 1}. It is known that

    max_{|z|≤r} |p(z)| = lim_{s→∞} [(1/2π) ∫_0^{2π} |p(re^{iθ})|^{2s} dθ]^{1/2s}.

Now if p ∈ Q_k and if we define

    q(z) = Σ_{i=0}^{ks} a_i z^i = [p(z)]^s,

then q ∈ Q_{ks}. Thus

    min_{p∈Q_k} { ∫_0^{2π} |p(re^{iθ})|^{2s} dθ } ≥ min_{q∈Q_{ks}} { ∫_0^{2π} |q(re^{iθ})|² dθ }.

Since 0 ≤ r < 1 we have

    [Σ_{i=0}^{ks} |a_i|² r^{2i}]^{1/2} ≥ r^{ks} [Σ_{i=0}^{ks} |a_i|²]^{1/2}

and

    Σ_{i=0}^{ks} |a_i| ≤ √(ks + 1) [Σ_{i=0}^{ks} |a_i|²]^{1/2};

since q(1) = Σ_{i=0}^{ks} a_i = 1, the latter inequality gives [Σ|a_i|²]^{1/2} ≥ (ks + 1)^{−1/2}, whence

    [(1/2π) ∫_0^{2π} |q(re^{iθ})|² dθ]^{1/2} = [Σ_{i=0}^{ks} |a_i|² r^{2i}]^{1/2} ≥ r^{ks} (ks + 1)^{−1/2}

and

    min_{p∈Q_k} [(1/2π) ∫_0^{2π} |p(re^{iθ})|^{2s} dθ]^{1/2s} ≥ r^k (ks + 1)^{−1/2s}.

Therefore when s → ∞ we conclude that

    min_{p∈Q_k} max_{|z|≤r} |p(z)| ≥ r^k.

This lower bound is attained when p(z) = z^k.
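Theorem 7.3.4 lends itself to a quick numerical sanity check: sampling |z| = r (which suffices, by the maximum principle) shows that p(z) = z^k attains the value r^k, while another polynomial normalized by p(1) = 1 does no better. The competing polynomial ((z + 1)/2)^k below is an arbitrary illustrative choice:

```python
import cmath
from math import comb

def max_on_circle(coeffs, r, n=2000):
    """Maximum modulus of p(z) = sum coeffs[j] z^j on |z| = r
    (by the maximum principle, also the max over the disk |z| <= r)."""
    best = 0.0
    for i in range(n):
        z = r * cmath.exp(2j * cmath.pi * i / n)
        best = max(best, abs(sum(c * z**j for j, c in enumerate(coeffs))))
    return best

r, k = 0.5, 4
# The extremal polynomial of the theorem is p(z) = z^k (p(1) = 1):
opt = max_on_circle([0] * k + [1], r)
assert abs(opt - r**k) < 1e-12
# Another polynomial with p(1) = 1, here ((z + 1)/2)^k, does no better:
coeffs = [comb(k, j) / 2**k for j in range(k + 1)]
assert max_on_circle(coeffs, r) >= r**k - 1e-12
```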

Corollary 7.3.5 Let D be the disk

    D = {z : |z − c| ≤ ρ},

where ρ and c are real and λ > c + ρ. The optimum

    η^{(k)} = min_{p∈P_k, p(λ)=1} max_{z∈D} |p(z)|

is attained for the polynomial

    t̂_k(z) = [(z − c)/(λ − c)]^k  and  ‖t̂_k‖_∞ = [ρ/(λ − c)]^k.

PROOF Carry out the change of variable

    z' = (z − c)/(λ − c).


7.4 Chebyshev Acceleration for the Power Method

In the power method we construct the sequence

    q_k = A q_{k−1}/‖A q_{k−1}‖  (k = 1, 2,…),

starting from q_0 = u/‖u‖. This iteration can also be written y_k = β_k A^k u, k ≥ 1.
One might think of using a more general polynomial iteration y_k = p_k(A)u,
where p_k ∈ P_k is a polynomial of degree k.
Suppose A is diagonalisable and has the eigenvectors {x_i}_1^n. Moreover, we
make the assumption (5.3.1) that

    |λ_1| > max_{i≥2} |μ_i|.


If

    u = Σ_{i=1}^n ξ_i x_i,

then

    y_k = ξ_1 p_k(λ_1) x_1 + Σ_{i=2}^n ξ_i p_k(μ_i) x_i.  (7.4.1)

Put λ = λ_1. It turns out that y_k yields a good approximation of lin(x_1) if |p_k(μ_i)|
is small compared with |p_k(λ)| when i ≥ 2. This suggests finding a polynomial p
that satisfies

    min_{p∈P_k, p(λ)=1} max_{z∈sp(A)−{λ}} |p(z)| = ε^{(k+1)}.

We suppose that the dominant eigenvalue λ is real and that the remainder of the
spectrum lies in the ellipse E(c, e, a), where c, e, a are real and λ − c > a.
By Theorem 7.3.2 the optimum polynomial is

    t̂_k(z) = T_k[(z − c)/e] / T_k[(λ − c)/e].

The computation of y_k is simplified by the three-term recurrence relation

    T_{k+1}(x) = 2x T_k(x) − T_{k−1}(x)

satisfied by T_k. On putting

    ρ_k = T_k[(λ − c)/e]  (k = 0, 1, 2,…)

we obtain

    ρ_{k+1} t̂_{k+1}(z) = 2ρ_k ((z − c)/e) t̂_k(z) − ρ_{k−1} t̂_{k−1}(z).

Again, on putting σ_{k+1} = ρ_k/ρ_{k+1} we have

    t̂_{k+1}(z) = (2σ_{k+1}/e)(z − c) t̂_k(z) − σ_k σ_{k+1} t̂_{k−1}(z).  (7.4.2)

Hence the σ_k are recursively defined by

    σ_1 = e/(λ − c),  σ_{k+1} = 1/[(2/σ_1) − σ_k]  (k = 1, 2,…).  (7.4.3)

The two recurrence relations (7.4.2) and (7.4.3) can be combined to define an
algorithm for the computation of

    y_k = t̂_k(A)u  (k = 1, 2,…).

Although the value of λ is not known, we remark that it occurs only in σ_1
(via the term 2/σ_1), which acts as a normalization factor for t̂_k: in practice λ
may be replaced by an approximate value.
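The recurrences (7.4.2) and (7.4.3) can be verified on a scalar argument; the values λ, c, e, z below are arbitrary illustrative choices:

```python
import math

def T(k, x):
    # Chebyshev polynomial: cosh continuation for x > 1, cos form on [-1, 1]
    return math.cosh(k * math.acosh(x)) if x > 1 else math.cos(k * math.acos(x))

lam, c, e = 5.0, 1.0, 2.0           # hypothetical spectrum parameters
x0 = (lam - c) / e                   # = 2.0 > 1

# sigma recurrence (7.4.3): sigma_1 = e/(lam - c), sigma_{k+1} = 1/(2/sigma_1 - sigma_k)
sig = [None, e / (lam - c)]
for k in range(1, 8):
    sig.append(1.0 / (2.0 / sig[1] - sig[k]))

z = 0.3                              # arbitrary scalar test argument

def t(k):
    # normalized polynomial t_k(z) = T_k((z-c)/e) / T_k((lam-c)/e)
    return T(k, (z - c) / e) / T(k, x0)

# check the three-term recurrence (7.4.2) for several k
for k in range(1, 7):
    lhs = t(k + 1)
    rhs = 2 * sig[k + 1] / e * (z - c) * t(k) - sig[k] * sig[k + 1] * t(k - 1)
    assert abs(lhs - rhs) < 1e-12
```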


7.5 The Chebyshev Iteration Method

We wish to determine the dominant eigenvalue λ of A, which we suppose to be
real, positive and simple: this is the eigenvalue of greatest real part. The ellipse
E(c, e, a) contains the remainder of the spectrum of A.

7.5.1 Definition
The following method of calculating y_k is known as the Chebyshev iteration:

(a) For y_0 = u ≠ 0, compute σ_1 = e/(λ − c) and y_1 = (σ_1/e)(A − cI)y_0.
(b) For j = 1, 2,…, k − 1 put

    σ_{j+1} = 1/[(2/σ_1) − σ_j],
    y_{j+1} = (2σ_{j+1}/e)(A − cI)y_j − σ_j σ_{j+1} y_{j−1}.  (7.5.1)


(a) We have assumed that the parameters c, e and a, which define the ellipse, are
real. If E, still centred on the real axis, has its major axis parallel to the
imaginary axis, then e and a are imaginary. Nevertheless, the computations
of (7.5.1) can always be carried out in real arithmetic. In fact, the σ_j are pure
imaginary, and so σ_{j+1}/e and σ_j σ_{j+1} are real.
We remark that when a and e are imaginary, the polynomial

    t̂_k(z) = T_k[(z − c)/e] / T_k[(λ − c)/e]

is no longer optimal but remains asymptotically optimal for large k.
(b) When the eigenvalues are real and |μ_i − c| ≤ a (i = 2,…,n), then the optimal
polynomial is

    t̂_k(z) = T_k[(z − c)/a] / T_k[(λ − c)/a],

by virtue of Theorem 7.2.1. Therefore, in order to obtain the Chebyshev
iteration in this case it suffices to replace e by a.
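A minimal sketch of iteration (7.5.1) in plain Python, run on a hypothetical diagonal test matrix whose dominant eigenvalue is 10 and whose remaining eigenvalues lie in the segment [0.5, 2] (so e = a, as in remark (b)); all names and numbers are illustrative:

```python
def matvec(A, v):
    # dense matrix-vector product on plain lists
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def cheb_iter(A, u, lam_est, c, e, k):
    """Chebyshev iteration (7.5.1): returns y_k = t_k(A) u with
    t_k(z) = T_k((z-c)/e) / T_k((lam-c)/e)."""
    sig1 = e / (lam_est - c)
    y_prev = u[:]
    w = matvec(A, y_prev)
    y = [sig1 / e * (wi - c * ui) for wi, ui in zip(w, y_prev)]   # y_1
    sig = sig1
    for _ in range(1, k):
        sig_next = 1.0 / (2.0 / sig1 - sig)                       # (7.4.3)
        w = matvec(A, y)
        y, y_prev = [2 * sig_next / e * (wi - c * yi) - sig * sig_next * pi
                     for wi, yi, pi in zip(w, y, y_prev)], y      # (7.5.1)(b)
        sig = sig_next
    return y

# Hypothetical diagonal A: dominant eigenvalue 10, the rest in [0.5, 2]
A = [[10, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0.5]]
u = [1.0, 1.0, 1.0, 1.0]
y = cheb_iter(A, u, lam_est=10.0, c=1.25, e=0.75, k=12)
# t_k(lam) = 1, so the dominant component stays at 1 while the others are damped
assert abs(y[0] - 1.0) < 1e-9
assert all(abs(yi) < 1e-3 for yi in y[1:])
```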

7.5.2 Convergence
The Chebyshev iteration (7.5.1) can be interpreted as a method of projection
on the direction generated by u_k = t̂_k(A)u (k = 1, 2,…), where

    t̂_k(z) = T_k[(z − c)/e] / T_k[(λ − c)/e].

In Section 7.6 we shall present a convergence theorem similar to those
established in Sections 5.2 and 6.2 for the method of subspace iteration.
We make here a more qualitative study of the convergence. If ξ_1 ≠ 0, then
(7.4.1) implies that lin(y_k) converges to the direction lin(x_1) if and only if
max_{i≥2} |t̂_k(μ_i)| → 0 when k → ∞.
If we define w_i as the root of

    ½(w_i + w_i^{−1}) = (μ_i − c)/e

of greatest modulus, then we know that t̂_k(μ_i) behaves asymptotically like
(w_i/w_1)^k, where

    ½(w_1 + w_1^{−1}) = (λ − c)/e  and  w_1 > 0.

Definition The damping coefficient of μ_i relative to the parameters c and e is
defined as κ(μ_i) = |w_i/w_1|. The convergence rate towards λ is

    τ(λ) = max_{i≥2} κ(μ_i).

The coefficient κ(μ_i) is constant on the ellipse E(c, e, a_i) of semi-major axis

    a_i = ½(ρ_i + ρ_i^{−1}) e,

where ρ_i = |w_i|; with a_1 = λ − c this gives

    κ(μ_i) = [a_i + √(a_i² − e²)] / [a_1 + √(a_1² − e²)]

(see Figure 7.5.1).

When A is symmetric we may choose

    c = ½(μ_2 + μ_min),  a = e = ½(μ_2 − μ_min).

The convergence rate towards λ is then 1/w_1, where

    w_1 = [a_1 + √(a_1² − a²)]/a,

which is the same as that of the Lanczos method defined for the Krylov subspace

    𝒦_{k+1} = lin(u, Au,…, A^k u).

Figure 7.5.1

Hence we have just shown the remarkable property that, without knowledge of
c or e, the Lanczos method determines automatically the vector u_k = t̂_k(A)u in the
Krylov subspace 𝒦_{k+1} which is the best possible vector with regard to the speed
of convergence towards λ.
This remains true when A is no longer symmetric and λ is the dominant
eigenvalue, replacing the Lanczos method by the Arnoldi method (when i = 1): the
convergence rate towards λ, that is

    max_{i≥2} |w_i| / |w_1| = [a + √(a² − e²)] / [a_1 + √(a_1² − e²)],

is the same as that for the Arnoldi method (i = 1), defined for 𝒦_{k+1}, when k is
sufficiently great (see Exercise 7.5.2).
It should be borne in mind that such a performance of the Chebyshev iteration
can be attained only when optimal parameters c and e are used. It is unrealistic
in practice to assume that these quantities are known. It is necessary to determine
them dynamically in the course of the iteration. This will be treated in Section 7.7.


7.6 Simultaneous Chebyshev Iterations (with Projection)

Let the eigenvalues {μ_i}_1^n be ordered by decreasing real parts. We wish to compute
the r eigenvalues {μ_i}_1^r of greatest real parts. We suppose that Re(μ_r) > Re(μ_{r+1})
(see Figure 7.6.1, in which r = 4).
Let M denote the invariant subspace associated with {μ_i}_1^r and let P denote
the spectral projection upon it. We shall now consider the projection method on
the subspace S_k = t_k(A)S, where t_k(z) = T_k[(z − c)/e]; this is determined by the
parameters c and e which define the ellipse E(c, e, a) containing the remainder
of the spectrum {μ_i}_{r+1}^n (see Section 7.7). The algorithm consists in constructing
an orthonormal basis Q_k of S_k starting from a basis U of the subspace S of


Figure 7.6.1

dimension m ≥ r, the constants σ_1, k and ε being given. This is carried out as
follows:

(a) U_0 = U, U_1 = (σ_1/e)(A − cI)U;
(b) for j = 1,…,k − 1, put

    σ_{j+1} = 1/[(2/σ_1) − σ_j],
    U_{j+1} = (2σ_{j+1}/e)(A − cI)U_j − σ_j σ_{j+1} U_{j−1};  (7.6.1)

(c) U_k = Q_k R_k;
(d) B_k = Q_k* A Q_k = F_k D_k F_k^{−1}
(projection and diagonalization).
From the m eigenvalues of B_k retain the r eigenvalues of greatest real parts;
form the diagonal matrix D'_k with them and let F'_k comprise the associated r
eigenvectors. Put X'_k = Q_k F'_k.
(e) If ‖A X'_k − X'_k D'_k‖_F > ε, then U = Q_k F'_k; substitute in (a).
For the computation we use the polynomial

    t̂_k(z) = T_k[(z − c)/e] / T_k[(λ − c)/e],

because t̂_k(λ) = 1 [see (7.4.2) and (7.4.3)]. We assume that the matrix A is
diagonalisable. The orthogonal projection on S_k is denoted by π_k.
The following result is a consequence of Lemma 6.2.1.

Lemma 7.6.1 Suppose that dim PS = r. Then for each eigenvector x_i associated
with μ_i there exists a unique vector s_i of S such that Ps_i = x_i and

    ‖(I − π_k)x_i‖_2 ≤ c_i T_k(a/e) / |T_k[(μ_i − c)/e]|  (i = 1,…,r),

provided that the eigenvalues {μ_i}_{r+1}^n lie in the ellipse E(c, e, a).

PROOF We revert to the proof of Lemma 6.2.1. By definition,

    ‖(I − π_k)x_i‖_2 = min_{y∈S_k} ‖x_i − y‖_2 ≤ ‖x_i − y_i‖_2,

where

    y_i = [1/t̂_k(μ_i)] t̂_k(A)s_i = x_i + [1/t̂_k(μ_i)] t̂_k(A)(I − P)s_i.

Since A is diagonalisable, we have

    ‖t̂_k(A)(I − P)‖_2 ≤ max_{j>r} |t̂_k(μ_j)| cond_2(X) ≤ max_{z∈E} |t̂_k(z)| cond_2(X).

The constant c_i has the value cond_2(X) ‖(I − P)s_i‖_2.

Corollary 7.6.2 Suppose that the eigenvalues {μ_i}_1^n are real and that they are
arranged in decreasing order. Then under the hypotheses of Lemma 7.6.1 we have

    ‖(I − π_k)x_i‖_2 ≤ c_i / T_k(γ_i)  (i = 1,…,r).

PROOF There are n − r eigenvalues in the interval [μ_n, μ_{r+1}]. Apply Lemma
7.6.1 with

    c = ½(μ_{r+1} + μ_n),  a = e = ½(μ_{r+1} − μ_n),  γ_i = (μ_i − c)/a.

Theorem 7.6.3 Suppose that the assumptions of Lemma 7.6.1 are satisfied. Then
the method of simultaneous Chebyshev iterations with optimal parameters converges:
(a) If the ith eigenvalue of greatest real part is simple and if the {μ_j}_{r+1}^n lie in
the ellipse E(c, e, a), then the error bounds for the ith pair of eigenelements are of
the order of T_k(a/e)/|T_k[(μ_i − c)/e]|.
(b) If A is Hermitian, the bound for the ith greatest eigenvalue becomes of order
{T_k(a/e)/T_k[(μ_i − c)/e]}².

PROOF ω(S_k, M) → 0.

The convergence rates that we have found are, respectively, those of the
block Arnoldi method (when the dominant eigenvalues are real and positive) and
those of the block Lanczos method. For a non-symmetric matrix the cost of
simultaneous Chebyshev iterations is well below that of the block Arnoldi
method. The former method will be preferred in practice if a satisfactory
technique is available for estimating the optimal parameters.
The gain due to the Chebyshev acceleration is measured by comparing
|μ_{r+1}/μ_i|^k with T_k(a/e)/T_k[(μ_i − c)/e], which is equivalent to (max_{j>r}|w_j|/|w_i|)^k
(i = 1,…,r) when k is sufficiently great.


7.7 Determination of the Optimal Parameters

The Chebyshev iterations (7.5.1) and (7.6.1) depend on the parameters c and e
(respectively a = e in the case of a real spectrum) which determine the ellipse
(respectively the segment [c − a, c + a]) containing those eigenvalues that we do
not wish to compute.
Let us begin by studying the case in which r = 1. The optimal parameters c
and e are those that satisfy

    min_{c,e} τ(λ) = min_{c,e} max_{i>1} κ(μ_i) = min_{c,e} max_{i>1} |w_i/w_1|.  (7.7.1)

We assume that the eigenvalue of greatest real part is real. The set sp(A) − {λ}
is symmetric with respect to the real axis.
The problem (7.7.1) consists in seeking the minimum of a finite number of
functions of two real variables c and e if sp(A) − {λ} is supposed to be known.
When λ = 0, the problem has been studied in Manteuffel (1977), where an
algorithm for the computation of c and e is proposed.
When r > 1, a natural idea consists in seeking to solve

    min_{c,e} τ(μ_r) = min_{c,e} max_{i>r} |w_i/w_r|.  (7.7.2)

However, the situation is more complicated than in the case in which r = 1, for
there may be conjugate complex eigenvalues.
(a) μ_r is real. It suffices to fulfil (7.7.2) (see Figure 7.7.1).
(b) μ_{r−1} and μ_r are complex conjugates. It may happen that the best ellipse
constructed upon μ_{r−1} and μ_r contains in its interior some of the desired



Figure 7.7.1



Figure 7.7.2

eigenvalues. A good compromise consists in finding the best ellipse for

μ = Re(μ_r); see Figure 7.7.2 and Saad (1984) for further details. The case of
complex μ_r is treated in full generality in Ho, Chatelin and Bennani (1990).
In practice, sp(A) is unknown and one has to resort to estimations of the
eigenvalues in order to determine the optimal parameters dynamically. This can
be done by using the required information about the spectrum which is found
in the various methods for computing the eigenvalues (power method, simul­
taneous iterations, Arnoldi's method).
For example, in the algorithm of Section 7.6 we choose m>r and in step (d)
of (7.6.1) we bring to light the parameters c and e for determining the new ellipse
by using the m — r eigenvalues not retained in Bk.


7.8 Least Squares Polynomials on a Polygon

As we have just seen in detail, the Chebyshev iteration method serves to
accelerate the linear iteration methods which are used to solve linear systems
(Exercise 7.8.1) or to compute some eigenvalues of greatest real parts. The
underlying problem of approximation theory is concerned with the determination
of the polynomial that satisfies

    min_{p∈P_k, p(λ)=1} max_{z∈S} |p(z)|,  (7.8.1)

where S is a set in the complex plane containing the spectrum of A except λ. (If
the problem is to solve a system, then λ = 0 and S contains the whole spectrum.)
When S is bounded by an ellipse, the solution of (7.8.1) is the Chebyshev
polynomial t̂_k(z), whence the term 'Chebyshev iterations'.
However, the problem (7.8.1) has many other applications apart from the
acceleration of linear systems. In fact, in very diverse contexts we meet the more
general problem of determining a polynomial that is large (in a certain sense) on
some eigenvalues {μ_i}_1^r of A while it is as small as possible on the remainder τ of
the spectrum. For example, we mention the filtering or the techniques of

Figure 7.8.1

In the case of a complex spectrum the uniform norm that appears in (7.8.1) is
not necessarily the norm that leads to the best polynomial in practice. The
Chebyshev polynomial depends on the optimal ellipse which contains the part
τ of the spectrum that is to be eliminated. This ellipse may turn out to be far too
large in relation to τ (see Figure 7.8.1).
It might be more interesting to consider the polygon H which is the convex
hull of the set of eigenvalues in τ and to determine the least squares polynomial
that satisfies

    min_{p∈P_k, Σ_{i=1}^r α_i p(μ_i)=1} ‖p‖_w,  (7.8.2)

where the {α_i}_1^r are given coefficients and ‖·‖_w is the L² norm relative to
a weight function w defined on the boundary ∂H.

Theorem 7.8.1 Let {s_j}_0^k be the first k + 1 orthogonal polynomials with respect to
w. The polynomial q* that satisfies (7.8.2) can be written as

    q*(z) = Σ_{j=0}^k η_j s_j(z),  η_j = c Σ_{i=1}^r α_i s_j(μ_i)  (j = 0,…,k),

where c is a normalization constant.

PROOF This generalizes the known result for r = 1. Consider the degenerate
kernel

    l_k(t, z) = Σ_{j=0}^k s_j(t) s_j(z).

Then

    ⟨p(z), l_k(t, z)⟩_w = ∫_{∂H} p(z) l_k(t, z) w(z) dz = p(t)

for every p ∈ P_k.

Now q* satisfies the constraint

    Σ_{i=1}^r α_i q*(μ_i) = 1

and can be rewritten as

    q*(z) = c Σ_{i=1}^r α_i l_k(μ_i, z),

where the constant c is determined by the constraint.
Let p ∈ P_k and suppose that

    Σ_{i=1}^r α_i p(μ_i) = 1.

We put p = q* + e; clearly

    Σ_{i=1}^r α_i e(μ_i) = 0.

On the other hand,

    ⟨e, q*⟩_w = c Σ_{i=1}^r α_i ⟨e(z), l_k(μ_i, z)⟩_w = c Σ_{i=1}^r α_i e(μ_i) = 0.

We conclude that ‖p‖_w ≥ ‖q*‖_w.

As regards the determination of //, the choice of w and the practical
computation of q*, the interested reader is referred to the paper by Saad


7.9 The Hybrid Methods of Saad

We continue to suppose that one wishes to compute the r eigenvalues {μ_i}_1^r of A
which have the greatest real parts. We shall briefly describe how Arnoldi's method
can be combined with the techniques of deflation and polynomial transformation
(for acceleration or spectral preconditioning) in order to obtain efficient hybrid
methods. These methods are described in greater detail by Saad (1989).

7.9.1 The Method of Arnoldi-Chebyshev

We are concerned with Arnoldi's iterative version, where Chebyshev iteration is
used to compute the new initial vector. We fix m as the size of the Arnoldi method
and k as the degree of the Chebyshev polynomial. We choose an initial vector u.
The algorithm consists in carrying out the following three steps:
(a) Starting from u we compute the Hessenberg matrix of order m obtained by
Arnoldi. Its eigenvalues are divided into two groups: the r eigenvalues that
approximate those we wish to find and the m − r eigenvalues that allow
us to determine the optimal parameters.
(b) Let z_0 be a suitable linear combination of the eigenvectors associated with
the r retained eigenvalues. Starting with z_0, carry out k steps of the Chebyshev
iteration in order to obtain z_k = t̂_k(A)z_0.
(c) Put u = z_k/‖z_k‖ and return to step (a).
Step (b) serves to diminish those components of z_0 that are associated with the
unwanted eigenvalues.
The practical choice of the parameters m, k and r is discussed in Bennani (1991).
The influence of the non-normality of A is discussed in Chatelin and Godet-

7.9.2 Arnoldi's Method with Spectral Preconditioning

We apply Arnoldi's method to the matrix B_k = p_k(A), where the polynomial p_k
has been determined in such a way that the r eigenvalues of A with greatest real
parts become the r dominant eigenvalues of B_k. Let V_k be the Arnoldi basis
computed in this way; the eigenvalues of A are approximated by those of the
matrix V_k* A V_k, by virtue of the principle that B_k and A have the same invariant
subspace, associated respectively with the r dominant eigenvalues and with those
of greatest real parts.
We fix m and k, and we choose u. The algorithm consists in carrying out the
following steps in succession:
(a) Initialization. Starting with u, apply Arnoldi's method to A and divide its
eigenvalues into two groups; go to step (c).
(b) Compute V_k* A V_k, divide its eigenvalues into two groups and obtain the
optimal parameters.
(c) Compute the Chebyshev polynomial of degree k and compute the vector v
as a linear combination of the eigenvectors associated with the retained
eigenvalues.
(d) Starting with v, apply Arnoldi's method to B_k = p_k(A) in order to obtain the
basis V_k; return to step (b).

(a) It is unnecessary to compute Bk explicitly in order to apply Arnoldi's method
to it; only the product pk(A)x is required, where x is a given vector.

(b) The following observations refer to each of the two methods we have just
described:
(i) The Chebyshev polynomial associated with the ellipse containing the
unwanted eigenvalues may be replaced by the least squares polynomial
associated with the convex hull of these eigenvalues.
(ii) If the number r of required eigenvalues exceeds a certain size, then it may
be of interest to use a deflation technique (Exercises 7.9.1 and 7.9.2).
(c) In addition, the spectral transformation λ ↦ (λ − σ)^{−1} may be used for
preconditioning. In order to solve (A − σI)x = y, we employ a direct method
(Gauss factorization with pivoting) or an iterative method
of the conjugate gradient type with preconditioning (see Golub and Van
Loan, 1989, p. 373). The algorithm of minimal residual of Saad and Schultz
(1986) makes no particular hypothesis about the matrix A − σI.


This chapter has been inspired mainly by Saad (1984). The literature on
Chebyshev polynomials of a complex variable is very poor (Wrigley, 1963; Rivlin,
1990; and Manteuffel, 1977). Theorem 7.1.6 is due to Saad (1982b); Theorem 7.3.3
is due to Manteuffel (1977); Theorem 7.3.4 is due to Zarantonello (see Varga,
1957). The use of Chebyshev iteration for the computation of the critical value of a
nuclear reactor is very old (see, for example, Wrigley, 1963, or the book by
Wachspress, 1966). The aggregation/disaggregation methods are used in this
context (see Stettari and Aziz, 1973; see also F. Chatelin and Miranker, 1982).
An algorithm for computing the optimal parameters when the reference eigen-
value μ_r is complex is given by Ho (1990).


Section 7.1 Elements of the Theory of Uniform Approximation

for a Compact Set in <C
7.1.1 [A] Let S be a compact set in ℂ, let C(S) be the set of continuous functions
on S with values in ℂ and let V be a subspace of C(S) of dimension k. Prove that

    ∀f ∈ C(S), ∃v* ∈ V such that
    ‖f − v*‖_∞ ≤ ‖f − v‖_∞  ∀v ∈ V,

where ‖·‖_∞ is the uniform norm on C(S):

    ‖f‖_∞ = max_{z∈S} |f(z)|  ∀f ∈ C(S).

7.1.2 [B:50] Consider the fundamental interpolation polynomial of a function
f: [−1, 1] → ℝ in the points x_j ∈ [−1, 1] (1 ≤ j ≤ k):

    L_{k−1}(x) = Σ_{j=1}^k f(x_j) l_j(x),

where the l_j are the Lagrange polynomials of degree k − 1. Let p*_{k−1} be a best
approximation of f in P_{k−1}. Define

    ρ_k = ‖f − L_{k−1}‖_∞,
    ε_k = ‖f − p*_{k−1}‖_∞.

Show that

    ρ_k ≤ [1 + max_{x∈[−1,1]} Σ_{j=1}^k |l_j(x)|] ε_k.

7.1.3 [D] Show that Haar's condition is equivalent to the interpolation
condition.

Section 7.2 Chebyshev Polynomials of a Real Variable

7.2.1 [C] Show that the first five Chebyshev polynomials are

    T_0(t) = 1,
    T_1(t) = t,
    T_2(t) = 2t² − 1,
    T_3(t) = 4t³ − 3t,
    T_4(t) = 8t⁴ − 8t² + 1.
7.2.2 [B:50] Show that the Chebyshev polynomials satisfy

    T_k(t) = 2t T_{k−1}(t) − T_{k−2}(t).

Deduce that T_k is of degree k and that the coefficient of t^k in T_k(t) is equal to 2^{k−1}.
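The degree and leading-coefficient claims of Exercise 7.2.2 can be checked by carrying the coefficients through the recurrence (a small illustrative script, not part of the text):

```python
def cheb_coeffs(k):
    """Coefficients of T_k (lowest degree first), built from the
    recurrence T_k = 2 t T_{k-1} - T_{k-2}."""
    prev, cur = [1], [0, 1]                       # T_0 and T_1
    if k == 0:
        return prev
    for _ in range(2, k + 1):
        shifted = [0] + [2 * c for c in cur]      # coefficients of 2 t T_{k-1}
        padded = prev + [0] * (len(shifted) - len(prev))
        prev, cur = cur, [a - b for a, b in zip(shifted, padded)]
    return cur

assert cheb_coeffs(2) == [-1, 0, 2]               # T_2 = 2t^2 - 1
assert cheb_coeffs(4) == [1, 0, -8, 0, 8]         # T_4 = 8t^4 - 8t^2 + 1
# degree k, with leading coefficient 2^(k-1) for k >= 1
assert all(cheb_coeffs(k)[-1] == 2 ** (k - 1) for k in range(1, 10))
```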
7.2.3 [C] Show that the Chebyshev polynomials satisfy

    ∫_{−1}^{1} T_l(t) T_k(t) (1 − t²)^{−1/2} dt = 0   if l ≠ k,
                                            = π/2 if l = k ≠ 0,
                                            = π   if l = k = 0.
7.2.4 [D] Show that the Chebyshev polynomials satisfy

    (1 − t²) T'_k(t) = k T_{k−1}(t) − k t T_k(t)  (k ≥ 1),
    (1 − t²) T''_k(t) = t T'_k(t) − k² T_k(t)  (k ≥ 0).

7.2.5 [B:50] Show that the Chebyshev polynomials satisfy

    Σ_{k≥0} T_k(t) s^k = (1 − st)/(1 − 2st + s²).

7.2.6 [D] Prove that, for fixed k and sufficiently great real t,

    T_k(t) ≈ ½(2t)^k.

7.2.7 Prove that

    T_k(1 + 2ε) ≈ ½ e^{2k√ε}  if ε > 0 is sufficiently small and k ≥ 1/√ε,
    T_k(½(ω + ω^{−1})) ≈ ½ ω^k  if k is sufficiently great.

Section 7.3 Chebyshev Polynomials of a Complex Variable

7.3.1 [D] Define

    T_k(z) = cosh(k cosh⁻¹ z).

Study this definition with the help of the function

    τ: (x, y) ↦ (cosh x cos y, sinh x sin y).

7.3.2 [D] With the help of the definition of T_k(z) given in Exercise 7.3.1, recover
the definition of the Chebyshev polynomials of a real variable:

    x > 1 ⇒ T_k(x) = cosh(k cosh⁻¹ x),
    x < −1 ⇒ T_k(x) = (−1)^k cosh[k cosh⁻¹(−x)],
    |x| ≤ 1 ⇒ T_k(x) = cos(k cos⁻¹ x).
7.3.3 [D] Show that the Chebyshev polynomials can also be defined by

    T_k(z) = cos(k cos⁻¹ z).

7.3.4 [A] Show that the Chebyshev polynomials satisfy the recurrence relation

    T_{k+1}(z) = 2z T_k(z) − T_{k−1}(z)  (z ∈ ℂ).

7.3.5 [A] Show that T_k(z) has k zeros on the real segment [−1, 1].
7.3.6 [D] The Joukowski transformation is defined by

    J: w ↦ t = ½(w + w^{−1}).

Show that J transforms the circle |w| = ρ into the ellipse

    E(0, 1, ½(ρ + ρ^{−1})).
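Exercise 7.3.6 can be checked numerically: points of the circle |w| = ρ are mapped by J onto the ellipse with semi-axes ½(ρ + ρ⁻¹) and ½(ρ − ρ⁻¹) and foci ±1 (ρ = 1.6 below is an arbitrary choice):

```python
import cmath

rho = 1.6
a = (rho + 1 / rho) / 2            # semi-major axis of the image ellipse
b = (rho - 1 / rho) / 2            # semi-minor axis
for i in range(360):
    w = rho * cmath.exp(2j * cmath.pi * i / 360)
    t = (w + 1 / w) / 2            # Joukowski map J
    # the image point lies on the ellipse x^2/a^2 + y^2/b^2 = 1
    assert abs((t.real / a) ** 2 + (t.imag / b) ** 2 - 1) < 1e-12
# focal distance: a^2 - b^2 = 1, i.e. foci at +/-1
assert abs(a * a - b * b - 1) < 1e-12
```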
7.3.7 [D] Prove that

    (t − c)/e ∈ E(0, 1, d) ⟺ t ∈ E(c, e, de).

7.3.8 [D] Show that

    T_k(z) = ½(w^k + w^{−k}),  where z = cosh ξ and w = e^ξ.
7.3.9 [A] Show that the polynomial t̂_k of Theorem 7.3.2 is no longer optimal
when the parameters c and a are not real.
7.3.10 [D] Study the limit of

    G_k(z) = T_{k+1}(z)/T_k(z)  (z ∈ ℂ)

when k tends to infinity.
7.3.11 [D] Show that, when the ellipse E(c, e, a) becomes a circle (e → 0), we have

    t̂_k(z) → [(z − c)/(λ − c)]^k.
7.3.12 [A] Prove the existence and uniqueness of the polynomial p* such that

    max_{z∈D} |p*(z)| = min_{p∈P_k, p(λ)=1} max_{z∈D} |p(z)|,

where D is a compact set in the complex plane, not containing λ.

7.3.13 [D] Consider the ellipse E(0, 1, a) and the number

    ρ = a + √(a² − 1).

Let E_λ be the confocal ellipse which passes through λ and has a semi-major
axis equal to a_λ; let

    ρ_λ = a_λ + √(a_λ² − 1).

Prove that

    ρ^k/ρ_λ^k ≤ min_{p∈P_k} max_{z∈E(0,1,a)} |p(z)|/|p(λ)| ≤ (ρ^k + ρ^{−k})/(ρ_λ^k + ρ_λ^{−k}).

Section 7.4 Chebyshev Acceleration for the Power Method

7.4.1 [D] Study the Chebyshev acceleration for a defective matrix.
7.4.2 [D] Write down the algorithm for the computation of

    y_k = t̂_k(A)u  (k = 1, 2,…),

which is deduced from the formulae (7.4.2) and (7.4.3) by replacing the exact
eigenvalue λ by an approximation.

Section 7.5 The Chebyshev Iteration Method

7.5.1 [D] Prove that if A is symmetric then the Lanczos method defined by the
Krylov subspace

    𝒦_{k+1} = lin[p_k(A)u, p_k ∈ P_k]

determines the vector u_k = t̂_k(A)u, which is optimal for the speed of convergence
towards the dominant eigenspace, no knowledge being required of the parameters
c and e of the ellipse associated with the Chebyshev iteration method.
7.5.2 [A] Prove that for large k

    (max_{i≥2} |w_i|/|w_1|)^k ≈ T_k(a/e)/T_k[(λ − c)/e].

7.5.3 [D] Consider the Chebyshev iteration method. Show that if the eigenvalues
of A, other than λ, lie in the disk |z| ≤ ρ, then the convergence of the power
method is not improved by Chebyshev.

Section 7.6 Simultaneous Chebyshev Iterations (with Projection)

7.6.1 [D] Study the validity of Lemma 7.6.1 when A is not diagonalisable.
7.6.2 [D] Compare the cost of the simultaneous Chebyshev iterations with that
of the block Arnoldi method.
7.6.3 [B:44,58] Let A ∈ ℝ^{n×n}, b ∈ ℝ^n and μ_1,…,μ_r ∈ ℂ. Find f ∈ ℝ^n such that

    μ_i ∈ sp(A − b f^T)  (1 ≤ i ≤ r).

Let Q ∈ ℂ^{n×r} be an orthonormal basis and let R ∈ ℂ^{r×r} be an upper triangular
matrix such that

    A^T Q = QR,  lin(Q) = M*,

where M* is the invariant subspace of A^T associated with the eigenvalues
λ_1,…,λ_r of A^T with greatest real parts.
(a) Propose one or more algorithms to compute the basis Q of this partial Schur
factorization of A^T.
(b) Show that the choice

    f = Qs, s ∈ ℂ^r,  t = Q^T b,

reduces the proposed problem to the following problem of size r:
Find s ∈ ℂ^r such that R^T − t s^T has the eigenvalues μ_1,…,μ_r. This problem
is called the partial pole assignment in control theory.

Section 7.7 Determination of the Optimal Parameters

7.7.1 [B:29,57] Write down an algorithm for the dynamic computation of the
optimal parameters for the algorithm (7.6.1) by using the m — r eigenvalues of Bk
that have not been retained.
7.7.2 [B:29,30] Suppose μΓ is complex. Compare the approximate solution of
(7.7.2) with the exact solution proposed by Ho in [B:29].

Section 7.8 Least Squares Polynomials on a Polygon

7.8.1 [A] Let x_0 be an approximate solution of the problem Ax = b. Starting with a set of constants γ_{ni} (i = 1, 2, …, n − 1), define

    x_n = x_{n−1} + Σ_i γ_{ni} r_i,    r_i = b − A x_i.

Let the error be denoted by e_n = x − x_n, where x is the exact solution.

(a) Prove that

    e_n = p_n(A) e_0,

where p_n is a polynomial of degree n such that p_n(0) = 1. We are interested in a sequence of polynomials p_n such that ‖p_n(A)‖_2 tends to zero as fast as possible when n tends to infinity.
(b) Prove that if A is diagonalisable, then

    ‖p_n(A)‖_2 → 0 as n → ∞

if and only if

    |p_n(λ)| → 0 as n → ∞

for all λ in sp(A). Now determine the polynomial that satisfies (7.8.1) when λ = 0.
(c) Extend the preceding result to the case of an arbitrary matrix by using its Jordan form. For a given sequence of polynomials p_n we define the asymptotic rate of convergence at a point λ ∈ C as r(λ) = lim sup_{n→∞} |p_n(λ)|^{1/n}. Consider the polynomials

    p_n(λ) = T_n[(c − λ)/e] / T_n(c/e).

(d) Prove that for this polynomial p_n we have

    r(λ) = exp{ Re cosh⁻¹[(c − λ)/e] − cosh⁻¹(c/e) }
         = |(c − λ) + [(c − λ)² − e²]^{1/2}| / (c + (c² − e²)^{1/2});

we call optimal parameters those that minimize max_{λ∈sp(A)} r(λ).
(e) Show that for a given pair (c, e) one has to compute the sequence

    x_0 ∈ R^n,  r_0 = b − A x_0,
    Δ_0 = (1/c) r_0,  x_1 = x_0 + Δ_0,

and, starting from x_n,

    r_n = b − A x_n,  Δ_n = α_n r_n + β_n Δ_{n−1},  x_{n+1} = x_n + Δ_n,

with

    α_1 = 2c/(2c² − e²),  α_n = (2/e) T_n(c/e) [T_{n+1}(c/e)]^{−1},  β_n = c α_n − 1.
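The recurrences of Exercise 7.8.1(e) can be checked numerically. A minimal sketch on a 2×2 symmetric positive definite system, with the enclosing interval [2, 5] for sp(A) (hence c and e) chosen by hand:

```python
# Chebyshev iteration for Ax = b, following the recurrences of 7.8.1(e).
# The test matrix, right-hand side and the bracket [2, 5] are illustrative.
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def cheby_solve(A, b, c, e, steps=40):
    x = [0.0] * len(b)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    delta = [ri / c for ri in r]             # Delta_0 = (1/c) r_0
    x = [xi + di for xi, di in zip(x, delta)]
    d = c / e
    rho_prev, rho = 1.0, d                   # T_0(c/e), T_1(c/e)
    for _ in range(1, steps):
        rho_next = 2 * d * rho - rho_prev    # three-term Chebyshev recurrence
        alpha = (2 / e) * rho / rho_next     # alpha_n = (2/e) T_n/T_{n+1}
        beta = c * alpha - 1                 # beta_n = c alpha_n - 1
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        delta = [alpha * ri + beta * di for ri, di in zip(r, delta)]
        x = [xi + di for xi, di in zip(x, delta)]
        rho_prev, rho = rho, rho_next
    return x

A = [[4.0, 1.0], [1.0, 3.0]]                 # sp(A) = {(7 ± sqrt(5))/2} in [2, 5]
b = [1.0, 1.0]
x = cheby_solve(A, b, c=3.5, e=1.5)          # exact solution is (2/11, 3/11)
assert abs(x[0] - 2 / 11) < 1e-6 and abs(x[1] - 3 / 11) < 1e-6
```

Since |(c − λ)/e| ≤ 1 on [2, 5] while c/e > 1, the error polynomial is uniformly damped at the rate of part (d).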

Section 7.9 The Hybrid Methods of Saad

7.9.1 [A] Suppose a Schur factorization AQ = QR is given corresponding to a
fixed order of the eigenvalues. Define a deflation with several vectors.
7.9.2 [A] Propose an algorithm of progressive deflation for computing the
eigenvalues with greatest real parts.
7.9.3 [D] Propose a polynomial preconditioning for Ax = b which is suitable
for the method of conjugate gradient.

Polymorphic Information
Processing with Matrices

The 25 years which separate the writing in 1987 of the original French version of this
textbook and the present Classics Revised Edition have enabled scientists and soft­
ware developers to progress significantly in the understanding of the role played by
matrices in intensive scientific computing. The evolution in computing know-how is
fuelled by the necessity to translate mathematical computation into numerical soft­
ware which should be fast and reliable enough to meet the ever-growing demands of
high-tech industries. In this endeavour, computations which rest upon an explicit or
implicit spectral decomposition of highly non-normal matrices represent a formidable
challenge. The spectra are inherently unstable: this is convincingly illustrated by Example 4.2.11, pp. 162-164 in Section 4.2.7. And the subject is developed more thoroughly in Chapter 10 of the book [4], which addresses the specific difficulties created by high non-normality for the necessary backward assessment of finite precision computations. In practice, high non-normality does arise in matrix computation when the
underlying equations express a strong coupling between two phenomena observed in
physics or technology. This coupling creates mathematical instabilities which have a
serious impact on the computed results. Various tools such as pseudo-spectra [4,21]
have been designed to assess the validity of such results when obtained with a reli­
able numerical software run on a computer with a classical architecture. But when
codes are run on massively parallel architectures, the validity assessment of computer
simulations remains today an open, yet pressing, problem. An extensive survey of
concurrent advances in numerical software for eigenvalues is found in [18]. All ref­
erences which do not appear in Appendix B or in C are listed in the text as [n] and
appear at the end of the chapter for reference number n. An analysis of what works
best in software practice is a precious source of information about the theoretical
reasons why matrices are such dependable tools in scientific computing.
This chapter attempts to present some of these reasons in a way which does justice to the power of mathematics to compute and model our world. Thus it is equally
important to consider what conceptual tools are at work in today's (2012) view of the
phenomenological world presented by theoretical physics [1,2]. To set the scene, let
us review the rich variety of basic building blocks, generically called scalars, which
are used in computation understood in a broad enough sense, running from engineer­
ing practice to physical theory.


8.1.1 The Real Field R
The field of real numbers underlies almost all computations taking place inside scien­
tific computers worldwide.
We recall that, more generally, a field K is an algebraic structure where two distinct operations, denoted + and ×, are defined, both associative and commutative, yielding an additive group structure (0 neutral for +) and for K* = K \ {0} a multiplicative group structure (1 ≠ 0 unit for ×). Beside the field R, children in high school are introduced to Z_2 = {0, 1} and Q, and sometimes to complex numbers, turning R² into C.

8.1.2 The Skew Field H

Multiplication need not be commutative in a skew field, as is illustrated by the skew field H = C × C of quaternions, which are 4D real vectors. Quaternions are heavily used in all engineering domains which depend on 3D rotations, from computer graphics for the movie industry to orbital mechanics for geolocalisation. They also underlie electromagnetism (Maxwell 1870; see [8]) and special relativity.
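Non-commutativity is visible already on the quaternion units i, j, k. A minimal sketch of the Hamilton product, with quaternions coded as 4-tuples (an implementation choice):

```python
# Quaternion arithmetic: (a, b, c, d) stands for a + bi + cj + dk.
# The Hamilton product is bilinear and associative but not commutative.
def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert qmul(i, j) == k                       # ij = k
assert qmul(j, i) == (0, 0, 0, -1)           # ji = -k: not commutative
assert qmul(k, k) == (-1, 0, 0, 0)           # k^2 = -1
```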


8.2.1 The Integer Ring Z
We commonly count with integers 1, 2, 3, …, leading to the ring Z = {0, ±1, ±2, …}: we can multiply but not divide in Z* (for n > 1, 1/n ∉ Z). This indicates that scalars may belong to a ring in practice.

8.2.2 The Ring ℋ of Hyperbolic Numbers

The real plane R² can be classically endowed with the complex field structure C. This makes it the Euclidean plane pioneered by Euler with the celebrated formula

    e^{iθ} = cos θ + i sin θ,  |e^{iθ}| = 1,

resulting from considering the imaginary unit i = (0, 1), i² = −1, Arg i = π/2, and circular trigonometry on the unit circle.
The 1848 proposal by J. Cockle to equip R² with an alternative ring structure ℋ related to hyperbolic trigonometry on the two dual unit hyperbolas x² − y² = ±1 is not widely appreciated. It rests on the introduction of a non-real unipotent u = (0, 1), u² = 1, u ≠ ±1, such that w = x + yu ∈ ℋ.
Some properties of complex vs. hyperbolic numbers are contrasted below at a point M = (x, y) in R²:

    C                                    ℋ
    i, i² = −1                           u, u² = 1, u ≠ ±1
    z = x + iy                           w = x + yu
    z̄ = x − iy                           w* = x − yu
    z z̄ = x² + y² ≥ 0                    w w* = x² − y², indefinite
    z⁻¹ = z̄/(z z̄), z ≠ 0                 w⁻¹ = w*/(w w*), |x| ≠ |y|
    |z| = √(z z̄)                         |w|_h = √(x² − y²)   if |x| > |y|,
                                               = 0             if |x| = |y|,
                                               = i√(y² − x²)   if |x| < |y|
    |i| = 1                              |u|_h = ±i
    e^{iθ} = cos θ + i sin θ             e^{uφ} = cosh φ + u sinh φ, |e^{uφ}|_h = 1
    z = |z| e^{iθ},                      w = ±e^{uφ},   tanh φ = y/x, |y| < |x|,
    tan θ = y/x, x ≠ 0                   w = ±u e^{uφ}, tanh φ = x/y, |y| > |x|

The algebraic structures C (field) and ℋ (ring) yield respective information at the point M which differ markedly:
• The Euclidean norm or modulus |z| ∈ R⁺ is replaced by a threefold mark function w ↦ |w|_h ∈ R⁺ ∪ iR, which is not a norm and depends on the location of w for its expression. The mark relates implicitly ℋ to the axes R and iR in C. The hyperbolic "distance" of M from the origin is ||w|_h| = ρ = √|x² − y²| ≥ 0. It remains constant at ρ > 0 (respectively at 0) when M varies on the four-branched hyperbola with hyperbolic radius ρ > 0 (respectively on the asymptotes).

• In circular (respectively hyperbolic) trigonometry the polar form is unique (respectively fourfold for |y| ≠ |x|), with a single Euclidean angle θ defined mod 2π (respectively two hyperbolic angles φ in R).

These differences are induced by the two quadratic forms x² + y² vs. x² − y². The plane R² is treated as a whole by means of the positive definite form x² + y²; it is divided into four distinct quadrants by the two asymptotes y = ±x on which the indefinite form x² − y² is 0. For example, in the quadrant where 0 < y < x, the angles θ (Euclidean) and φ (hyperbolic) are related by 0 < tan θ = tanh φ = A′A with OA′ = 1 in Figure 8.2.1. The point B = e^{iθ} on the unit circle defines OB′ = cos θ < 1, B′B = sin θ; alternatively the point C = e^{uφ} on the unit hyperbola defines OC′ = cosh φ > 1 and C′C = sinh φ. Moreover, θ < π/4 and φ > 0 represent respectively twice the area of the circular and hyperbolic sectors OA′B and OA′C.
Figure 8.2.1

The structure ℋ equips the plane R² with a model of hyperbolic geometry. The hyperbolic number w = x + yu models the addition of the two heterogeneous numbers x1 and yu, where 1 and u span two distinct real categories such as space and time. Not surprisingly, hyperbolic numbers are well suited to describe special relativity in one spatial dimension [19].
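The ring operations of ℋ and the mark |w|_h are easy to simulate. A minimal sketch, with a hyperbolic number coded as the pair (x, y) (an implementation choice):

```python
import math

# Hyperbolic numbers w = x + y*u with u*u = +1, stored as pairs (x, y):
# a sketch of the ring product, the conjugate w* and the threefold mark.
def hmul(w1, w2):
    x1, y1 = w1
    x2, y2 = w2
    return (x1*x2 + y1*y2, x1*y2 + y1*x2)    # uses u*u = +1

def hconj(w):                                # w* = x - y*u
    return (w[0], -w[1])

def mark(w):                                 # |w|_h in R+ or i*R+
    q = w[0]**2 - w[1]**2                    # w w* = x^2 - y^2, indefinite
    return math.sqrt(q) if q >= 0 else complex(0, math.sqrt(-q))

u = (0.0, 1.0)
assert hmul(u, u) == (1.0, 0.0)              # u^2 = 1 although u != +-1
x, y = 3.0, 1.0
assert hmul((x, y), hconj((x, y))) == (x*x - y*y, 0.0)   # w w* = x^2 - y^2
phi = 0.7
w = (math.cosh(phi), math.sinh(phi))         # e^{u phi} on the unit hyperbola
assert abs(mark(w) - 1.0) < 1e-12            # |e^{u phi}|_h = 1
```

On the asymptotes |x| = |y| the mark vanishes, which is precisely why |·|_h fails to be a norm.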

The standard model for particle physics is set in the multiplicative and associative algebras due to Clifford [1,9]. However we cannot leave this foundational topic [8,9] without touching on non-associativity, i.e. on algebraic structures beyond rings. This is because recursive complexification of R leads to the non-associative algebras A_k, k ≥ 3, proposed by Graves (k = 3) and Dickson to go beyond A_2 = H. These weaker algebraic structures offer new computational possibilities which may appear at odds with classical logic [3]. Computation paradoxes should not be feared: they reveal new phenomena which invite us to extend the current logic of computation [3,7,8,9]. We shall be facing yet another computational contradiction later in Section 8.8.
The smallest non-associative Dickson algebra is the division algebra A_3 = G consisting of octonions in R⁸. This algebra would be a field if it were associative. It stands at the crossroads of many computational phenomena in geometry, theoretical physics and number theory [1,2,8].
But it is high time to refocus on our central theme: the associative ring of square matrices defined over R or C.


8.3.1 A "Pseudo-field" of Square Matrices
Let A be a matrix of order n ≥ 2 over C with rank r = r(A), 1 ≤ r ≤ n. All non-singular matrices (r = n) form a field inside the ring C^{n×n} with

    A A⁻¹ = A⁻¹ A = I_n.    (8.3.1)

We set n = m in Exercise 1.6.8 on p. 50, where A = UΣV* and A† = VΣ†U*.

Lemma 8.3.1 When 1 ≤ r < n, the identity (8.3.1) is replaced by the two following identities:

    A A† = P_U,  A† A = P_V,    (8.3.2)

where P_U and P_V are orthogonal projections on Im A.

PROOF For 1 ≤ r < n, ΣΣ† = Σ†Σ = Ĩ_r, the diagonal matrix diag(I_r, 0), and hence AA† = U Ĩ_r U* and A†A = V Ĩ_r V* are two similar matrices representing the orthogonal projection Ĩ_r on Im A, expressed in different bases. Observe that sp(I_n) = {1} is replaced by sp(P_U) = sp(P_V) = {0, 1}, where the algebraic multiplicity of 1 has become r < n.

The existence of a pseudo-inverse A† for any A ≠ 0 with rank r, 1 ≤ r < n, allows us to expand the ring structure into that of a "pseudo-field", provided that we extend the set of unit matrices to contain all orthogonal projections with rank r, 1 ≤ r ≤ n.
For reasons which will become clear as we proceed, we call macro-scalars the elements in a ring of square matrices.
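Lemma 8.3.1 can be seen in the smallest possible setting: a rank-1 real 2×2 matrix whose pseudo-inverse is written down by hand.

```python
# Pseudo-inverse sketch for the rank-1 matrix A = u v^T with u = v = (1, 1)^T:
# A^+ = v u^T / (|u|^2 |v|^2) = A^T / 4, and A A^+ is the orthogonal
# projection on Im A, illustrating (8.3.2) for n = 2, r = 1.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, 1.0], [1.0, 1.0]]
A_pinv = [[0.25, 0.25], [0.25, 0.25]]        # A^+ = A^T / 4 here
P = matmul(A, A_pinv)                        # P = A A^+
assert P == matmul(P, P)                     # P^2 = P: a projection ...
assert P == [[0.5, 0.5], [0.5, 0.5]]         # ... onto span{(1, 1)}, and P = P^T
```

The eigenvalues of P are {0, 1}, with 1 of multiplicity r = 1, as the proof of the lemma states.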

8.3.2 Another Look at the QR Algorithm

Convergence of the QR algorithm has been presented in Chapter 5, Section 5.5, pp. 221-226. It was independently invented in 1961 by Vera Kublanovskaya in Russia and John Francis in England, who replaced Gauss' LR factorisation used in (Rutishauser 1958) by the more robust alternative provided by Laplace's QR. Before this invention, scientists would mainly use Newton's method to find the roots z ∈ C of the characteristic polynomial π(z), or x ∈ C^n of (5.8.1) on p. 227 (Anselone and Rall 1968).
We observe that the QR algorithm rests on the QR factorisation (1.8.1) on p. 31 applied successively on a sequence of matrices unitarily similar to A_1 = A as indicated in Section 5.5.1 on p. 221. In other words, the algorithm works directly on square matrices treated as scalars. The result is a paradigm shift in the history of the practical computation of eigenvalues: at long last, it was possible to get all eigenvalues of A without omission.
By comparison, Newton's method produces a sequence of numbers in C or of vectors in C^n which may not converge unless the starting data are close enough to one of the solutions. The QR method for eigenvalues provides a clear illustration of the conceptual benefit obtained by working over scalars which consist of matrices.
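The QR algorithm itself fits in a few lines for a 2×2 symmetric matrix. A sketch using classical Gram-Schmidt for the QR step (a pedagogical shortcut, not the robust Householder variant used in practice):

```python
import math

# Unshifted QR iteration on a 2x2 symmetric matrix: A_{k+1} = R_k Q_k stays
# unitarily similar to A_1 = A, and its diagonal converges to the eigenvalues.
def qr_2x2(A):
    (a, b), (c, d) = A
    n1 = math.hypot(a, c)
    q1 = (a / n1, c / n1)                     # first orthonormal column
    r12 = q1[0] * b + q1[1] * d
    v = (b - r12 * q1[0], d - r12 * q1[1])    # second column minus projection
    n2 = math.hypot(*v)
    q2 = (v[0] / n2, v[1] / n2)
    return (q1, q2), [[n1, r12], [0.0, n2]]   # Q (by columns) and R

def rq(Q, R):                                 # form A' = R Q
    q1, q2 = Q
    return [[R[0][0] * q1[0] + R[0][1] * q1[1], R[0][0] * q2[0] + R[0][1] * q2[1]],
            [R[1][1] * q1[1],                   R[1][1] * q2[1]]]

A = [[2.0, 1.0], [1.0, 2.0]]                  # eigenvalues 3 and 1
for _ in range(30):
    Q, R = qr_2x2(A)
    A = rq(Q, R)
assert abs(A[0][0] - 3.0) < 1e-9 and abs(A[1][1] - 1.0) < 1e-9
assert abs(A[1][0]) < 1e-9                    # the off-diagonal entry tends to 0
```

The whole iteration manipulates the matrix as a single scalar-like object, which is exactly the shift of viewpoint described above.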

8.3.3 The BLAS

On the practical side, the benefit can be no less impressive. Block-partitioning is a successful technique to get a good time-efficiency on parallel architectures. The basic linear algebra subroutines (BLAS) exploit to their fullest the capabilities of computation over matrices. Among many others, one can cite two examples of theoretical significance:
(1) the Schur complement formula (1917-1918),
(2) the Sherman-Morrison formula (1950).
Section 8.7 below and Chapter 7 in [8] offer some applications of these identities to matrix analysis.
The rest of the chapter will review some of the ways by which matrices provide
specific dynamical information about themselves as quantification of endomorphisms,
i.e. linear maps from C n into itself.


8.4.1 The Jordan Form
The Jordan form A = XJX⁻¹ is detailed in Section 1.6.3, pp. 22-26. The Jordan matrix J [14] can take one of two forms:

• J is diagonal iff A is diagonalisable or semi-simple. Then the spectral information is the unique vector μ = (μ_i) ∈ C^n displaying the n eigenvalues.
• J is bidiagonal iff A is defective. The spectral information consists of two vectors, the complex μ in C^n and a binary one in Z_2^{n−1} specifying the Jordan block structure.

8.4.2 The Eigencomplexity of λ

Let λ be an eigenvalue of A of algebraic (respectively geometric) multiplicity m (respectively g) and index l, with g, l ∈ [1, m].
The Jordan structure of λ can be specified by the Segre characteristic {s_1 = l ≥ ··· ≥ s_g ≥ 1}, where s_j represents the order of the j-th Jordan block for λ (in decreasing order). Alternatively it can be specified by the Weyr characteristic {w_1 = g ≥ ··· ≥ w_l ≥ 1}. The Weyr numbers w_j are defined as w_j = ν_j − ν_{j−1}, j = 1 to l, for ν_j = dim Ker(A − λI)^j = the nullity of (A − λI)^j, with ν_0 = 0, so that ν_0 = 0 < ν_1 = g < ··· < ν_l = m.
It is clear that m = Σ_{i=1}^{g} s_i = Σ_{j=1}^{l} w_j.

Definition The eigencomplexity of λ is given by the binary structural matrix C_λ = (c_ij) ∈ Z_2^{g×l} such that Σ_{j=1}^{l} c_ij = s_i and Σ_{i=1}^{g} c_ij = w_j.

When λ is simple (respectively semi-simple), C_λ reduces to (1) (respectively (1 ··· 1)^T ∈ Z_2^{g×1}). The rectangular matrix C_λ of size g × l offers the dual ways proposed independently by Segre in Italy (1884) and Weyr in Austria (1885) to represent the block structure which had been defined by Jordan (1870) under the form of a binary sequence of length m − 1 consisting of s_i − 1 ones and 1 zero, i = 1 to g, and displaying altogether g zeros.
The spectral information carried by A is expressed with numbers in the fields C and Z_2. By comparison, the metric information is set in R⁺.
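The Segre and Weyr characteristics are conjugate partitions of m: w_j = ν_j − ν_{j−1} counts the Jordan blocks of order at least j. A sketch:

```python
# Weyr numbers from the Segre characteristic: w_j = #{ i : s_i >= j },
# the conjugate-partition formulation of w_j = nu_j - nu_{j-1}.
def weyr_from_segre(segre):
    l = segre[0]                          # index of the eigenvalue = largest block
    return [sum(1 for s in segre if s >= j) for j in range(1, l + 1)]

segre = [3, 2]                            # two Jordan blocks, of orders 3 and 2
weyr = weyr_from_segre(segre)
assert weyr == [2, 2, 1]                  # w_1 = g = 2, index l = 3
assert sum(segre) == sum(weyr) == 5       # both sum to the algebraic multiplicity m
```

The binary matrix C_λ of the definition is the staircase c_ij = 1 iff j ≤ s_i, whose row sums are the s_i and column sums the w_j.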

8.4.3 The Singular Values of A

The singular values σ_i ≥ 0, i = 1 to n, are the non-negative square roots of the eigenvalues of the unitarily similar matrices AA* or A*A introduced by Beltrami (1873) and Jordan (1874). They form the vector σ = (σ_i) ∈ R₊^n.
Let ‖·‖_p denote the Hölder norm where p is specialised to be p = 1, 2 or ∞. Thus for x = (x_i) ∈ C^n, ‖x‖_1 = Σ |x_i|, ‖x‖_∞ = max_i |x_i|.

Lemma 8.4.1 ‖μ‖_p ≤ ‖σ‖_p, p = 1, 2, ∞, with equality iff A is normal.

PROOF See Statements 257-260 on p. 194 in Chapter III of [12].
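Lemma 8.4.1 is strict for the 2×2 Jordan block at 0, the archetype of non-normality; a sketch with eigenvalues and singular values entered by hand:

```python
# Lemma 8.4.1 on J = [[0, 1], [0, 0]]: mu = (0, 0) while sigma = (1, 0),
# since J J^T = diag(1, 0); hence ||mu||_p < ||sigma||_p strictly for
# p = 1, 2, oo, reflecting the non-normality of J.
mu = [0.0, 0.0]
sigma = [1.0, 0.0]
for p in (1, 2):
    assert sum(abs(m)**p for m in mu)**(1 / p) < sum(abs(s)**p for s in sigma)**(1 / p)
assert max(map(abs, mu)) < max(map(abs, sigma))      # the case p = oo
```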


8.5.1 The Left and Right Polar Representations
Given A ∈ C^{n×n}, we define H_L = (AA*)^{1/2} and H_R = (A*A)^{1/2}. Each Hermitian matrix H_L or H_R, which is uniquely defined, is the left or right module for A.

Definition The matrix A enjoys two forms of polar representation

    A = H_L U_L = U_R H_R,

where the modules H_L and H_R are Hermitian positive semi-definite, and the unitary matrices U_L and U_R are called left and right phase factors.

Proposition 8.5.1 Unlike H, U is uniquely defined iff A is invertible. When A is singular, then the left (say) phase factor U_L = U has the form U = (I − T)U_0, where U_0 is one of the left phase factors for A and T satisfies TT* = T + T*, Im T ⊂ Ker A*.

PROOF See Statement 203 p. 183 in Chapter III [12].

Observe that I − T is unitary and A*Tx = 0 for x ∈ C^n. Let 0 ∈ sp(A) with 1 ≤ g ≤ m ≤ n; then I − T has at most g eigenvalues distinct from 1. Moreover, U_0 − U = TU_0 and ‖U_0 − U‖_2 = ‖T‖_2 ≤ 2.
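The two representations can be verified on a hand-sized example where AA^T and A^T A happen to be diagonal; the matrix below is illustrative:

```python
# Left and right polar representations of A = [[0, -2], [1, 0]]:
# A A^T = diag(4, 1) and A^T A = diag(1, 4), so the modules can be written
# down by hand; the (unique, A being invertible) phase factor is a rotation.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A   = [[0.0, -2.0], [1.0, 0.0]]
H_L = [[2.0, 0.0], [0.0, 1.0]]       # (A A^T)^{1/2}
H_R = [[1.0, 0.0], [0.0, 2.0]]       # (A^T A)^{1/2}
U   = [[0.0, -1.0], [1.0, 0.0]]      # plane rotation by pi/2

U_T = [[U[0][0], U[1][0]], [U[0][1], U[1][1]]]
assert matmul(H_L, U) == A           # left  form: A = H_L U
assert matmul(U, H_R) == A           # right form: A = U H_R
assert matmul(U, U_T) == [[1.0, 0.0], [0.0, 1.0]]   # U U^T = I
```

Note that the same unitary factor U serves in both forms, while the two modules differ (A is not normal here).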

8.5.2 A Is Normal
The factors H and U commute iff A is normal (Statement 208, p. 184, Chapter III in [12], or Proposition 1, p. 191, Section 5.7 in (Lancaster and Tismenetsky 1985)). Therefore the polar factorisation has a unique form: left = right.

Lemma 8.5.2 The factors H and U of A normal have a common orthonormal eigenbasis together with A.

PROOF H and U are semi-simple and commute. See Exercise 6, p. 240 in (Lancaster and Tismenetsky 1985) or Statement 193, p. 127, Chapter II in [12].

Lemma 8.5.3 If λ = σe^{iθ} ≠ 0 is an eigenvalue of a normal matrix A = HU, then 0 < σ ∈ sp(H) and e^{iθ} ∈ sp(U).

PROOF Left to the reader (Exercise 2, p. 192 in Lancaster and Tismenetsky 1985).

8.5.3 From C = C^{1×1} to C^{n×n}, n ≥ 2

The (circular) polar representation z = ρe^{iθ} in C can be generalised to A ∈ C^{n×n} with the following new possibilities:

• The phase factors are unique iff A is invertible.

• The form of the representation is unique iff A is normal ⟺ ‖μ‖_p = ‖σ‖_p, p = 1, 2, ∞. Otherwise ‖μ‖_p < ‖σ‖_p; the left and right forms differ.

Remark The absolute condition numbers for H and U are studied with the Frobenius norm in [5] in the case of a rectangular matrix A ∈ C^{m×n}, r(A) = n ≤ m. Interestingly, for m = n the explicit values for the condition numbers of the phase factor U depend on the ground field for A. More precisely (1991),

    C_R(U) = 2/(σ_1 + σ_2) ≤ C_C(U) = 1/σ_1,

where the n singular values σ_i are ordered by non-decreasing values. We observe that, over C, the absolute condition numbers for A ↦ A⁻¹ in the 2-norm and for A ↦ U in the F-norm take the same value 1/σ_1. By contrast, when m > n, the unique value C(U) = 1/σ_1 is valid over R or C.
As for the condition of the module H, its value

    C(H) = √2 √(1 + K(A)²) / (1 + K(A)),

for m ≥ n, K(A) = σ_n/σ_1, does not depend on the field R or C [5]. We mention for future reference in Section 8.6 that

    C²(H) = 1 + ((σ_n − σ_1)/(σ_1 + σ_n))².    (8.5.1)

8.5.4 Application to the Real Multiplication Maps in A_k, k ≥ 0

We consider the multiplication maps by a given vector x:

    L_x : y ↦ x × y,  R_x : y ↦ y × x,

where x, y ∈ A_k, the Dickson algebra of dimension n = 2^k, k ≥ 0.

For k ≥ 2 (respectively ≥ 3) multiplication is not commutative (respectively associative). The corresponding matrices (also denoted by L_x and R_x) are normal: let x = a + X, X being the imaginary part in ℑA_k; then x̄ = a − X, L_x = aI + L_X, L_x^T = L_x̄ = aI − L_X and L_x^T L_x = a²I − L_X² = L_x L_x^T. Moreover, for k ≥ 2, L_x and R_x differ but commute and ‖L_x‖ = ‖R_x‖ [8]. Therefore there exists a common orthogonal eigenbasis for L_x, R_x, their module H_x and the respective phase factors U_x, V_x, which differ only by their eigenvalues lying on the unit circle.
(1) For k ≤ 3, L_x^T L_x = R_x^T R_x = ‖x‖² I_n. The common module is H_x = ‖x‖ I_n. Therefore L_x = ‖x‖ U_x and R_x = ‖x‖ V_x.
(2) A significant change occurs when k ≥ 4. The common module of the normal real matrices L_x and R_x is more complicated than ‖x‖ I_n. The spectrum sp(L_x^T L_x) = sp(R_x^T R_x) contains at least the three values ‖x‖² and α, β such that 0 < α < ‖x‖² < β ≤ 2^{k−3}‖x‖² when x is not alternative, i.e. ‖x × y‖ ≠ ‖x‖ · ‖y‖ for some y in A_k. In particular, L_x and R_x can be singular when x ≠ 0 is a zerodivisor (∃ y ≠ 0 : x × y = 0). When this occurs, their rank satisfies 2^k − 4(k − 1) ≤ r(L_x) = r(R_x) ≤ 2^k − 4 and the phase factors are ambiguously determined (see Exercises 8.5.2 and …).
The evolution of the polar representation of L_x and R_x over R displays three major stages corresponding to k = 1, k ∈ {2, 3} and finally k ≥ 4. See [8] for more details.


Unless otherwise stated, we assume throughout this section that A is Hermitian positive (semi-)definite. The application in mind is to the left or right module H for an arbitrary matrix. The better part of the section is adapted from Chapter 3 in [13], where A is mainly restricted to be real.
When x ∈ C^n is an eigenvector for A, Ax = λx for λ ≥ 0, i.e. the angle ∠(x, Ax) is 0 for λ > 0 and not determined for λ = 0.
When u ≠ 0 is not an eigenvector, the scalar product (Au, u) is positive. Over R (A real) the coplanar real vectors u and Au define an acute Euclidean angle θ(A, u) = ∠(u, Au) such that (Au, u) = ‖Au‖ ‖u‖ cos θ(A, u) > 0. Over C (A complex), θ(A, u) is the unique canonical (or principal) angle between the complex lines spanned by u and Au (Definition p. 5, Section 1.2). Below we are interested in this acute "angle" θ(A, u) defined for Au ≠ 0 and its maximal value φ(A), 0 ≤ φ(A) ≤ π/2.

Definition The yield of A is the complex number of unit modulus defined as

    a(A) = cos φ(A) + i sin φ(A) = e^{iφ(A)}.

The argument φ(A) of the yield is the maximal dynamical play produced by A.

8.6.1 Characterisation of a (A)

Let 0 ≤ λ_min = λ_1 = μ_1 ≤ ··· ≤ μ_n = λ_d = λ_max denote the repeated eigenvalues μ_i of A ordered by increasing value: λ_min and λ_max are the two extreme eigenvalues for A, which are distinct iff A ≠ λI_n; λ̄ = (λ_1 + λ_d)/2 is the arithmetic mean. The Euclidean scalar product is written as (x, y) = y*x in matrix notation (x, y ∈ C^{n×1}) and (x, x)^{1/2} = ‖x‖ denotes the Euclidean norm ‖·‖_2.
We define two real-valued functions:
• u ∈ C^n ↦ cos θ(A, u) = (Au, u)/(‖Au‖ ‖u‖) for Au ≠ 0,
• (u, ε) ∈ C^n × R ↦ n(A, u, ε) = ‖(εA − I)u‖.
The quotient defining cos θ(A, u) should be contrasted with the Rayleigh quotient M(A, x) = (Ax, x)/(x, x), x ≠ 0, defined on p. 33, which yields metric information in [μ_1, μ_n]. The norm function n satisfies the min-max equality

    sup_{‖u‖≤1} inf_{ε∈R} ‖(εA − I)u‖_2 = inf_{ε>0} sup_{‖u‖≤1} ‖(εA − I)u‖_2

proved by Gustafson (1968-1972) in the broader context of strongly accretive bounded operators on a Hilbert space, which contains Hermitian positive definite matrices as a special case. The right-hand side deals with the norm curve ε ∈ R⁺ ↦ ‖εA − I‖ ∈ R⁺, which is continuous and convex in ε. The minimum in ε is unique and achieved for ε_m with ‖ε_m A − I‖ < 1.

Theorem 8.6.1 Let A be Hermitian positive definite. The components of a(A) are

    cos φ(A) = 2√(λ_1 λ_d)/(λ_1 + λ_d) = min_{u≠0} cos θ(A, u),
    sin φ(A) = (λ_d − λ_1)/(λ_d + λ_1) = ‖ε_m A − I‖,  ε_m = 2/(λ_1 + λ_d) = 1/λ̄.

PROOF Chapter 3 in [13].

Letting λ_1 = 0 in Theorem 8.6.1 yields φ(A) = π/2, and φ(A) < π/2 when λ_1 > 0 by assumption on A: the quadratic form x*Ax is non-negative for all x ≠ 0. Hence (Ax, x) = 0 for x ≠ 0 ⟺ Ax = 0 ⟺ x ∈ Ker A. The vectors x and Ax = 0 are trivially orthogonal when x ∈ Ker A. In the limit λ_1 → 0, one can set θ(A, u) = π/2 for all eigenvectors u in Ker A which are associated with 0. On the other hand, θ(A, x) = 0 for all eigenvectors x associated with λ > 0. This introduces a sharp distinction between the kernel Ker A and all eigenspaces Ker(A − λI), λ > 0. In the following examples, the positive (semi-)definite matrices of interest are the modules of a non-Hermitian matrix.
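Theorem 8.6.1 can be probed numerically on the illustrative matrix A = diag(1, 4); the sampling below is a crude check, not a proof:

```python
import math

# Theorem 8.6.1 on A = diag(1, 4): cos phi = 2*sqrt(l1*ld)/(l1+ld) = 0.8 and
# sin phi = (ld-l1)/(ld+l1) = 0.6; cos theta(A, u) is sampled on the unit circle.
l1, ld = 1.0, 4.0
cos_phi = 2 * math.sqrt(l1 * ld) / (l1 + ld)
sin_phi = (ld - l1) / (ld + l1)
assert abs(cos_phi**2 + sin_phi**2 - 1) < 1e-12      # a(A) has unit modulus

def cos_theta(t):                                     # u = (cos t, sin t), |u| = 1
    u = (math.cos(t), math.sin(t))
    Au = (l1 * u[0], ld * u[1])
    return (Au[0] * u[0] + Au[1] * u[1]) / math.hypot(*Au)

samples = [cos_theta(0.001 + 3.14 * i / 1000) for i in range(1000)]
assert min(samples) > cos_phi - 1e-6                  # cos phi is the minimum ...
assert min(samples) < cos_phi + 1e-3                  # ... and is (nearly) attained
```

The minimising directions mix the two extreme eigenvectors, anticipating the sets S_0 introduced below.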

Example 8.6.1 Let A invertible and non-normal have two polar representations A = H_L U_L = U_R H_R. The eigenvalues of H_L and H_R are the singular values for A: the extreme ones define

    sin φ(H) = (λ_d − λ_1)/(λ_1 + λ_d) = (σ_n − σ_1)/(σ_1 + σ_n),

where H (respectively U) stands for H_L or H_R (respectively U_L or U_R).
Assuming that d ≥ 2, we set w_1² = λ_d/(λ_1 + λ_d) and w_d² = λ_1/(λ_1 + λ_d), 0 < w_d² ≤ 1/2 ≤ w_1². We choose the square roots w_1 ≥ 1/√2 and 0 < w_d ≤ 1/√2. Then cos φ(A) = 2 w_1 w_d.
Example 8.6.2 Let us apply this trigonometric view to the Hermitian factor H for A = HU (say) not invertible. Then φ(H) = π/2 and the phase factor is not uniquely defined (Proposition 8.5.1). The value π/2 for φ(H) signals the singularity of H and hence of A.
Now let us go back to the Remark in Section 8.5.3 and let m = n = r(A). We can readily interpret (8.5.1) as

    1 ≤ C²(H) = 1 + sin²φ(H) ≤ 2,

where H is the module of an invertible A (so that 0 < λ_1 = σ_1 ≤ λ_d = σ_n). Equivalently, C(H) = (1 + ‖ε_m(H) H − I‖²)^{1/2} with ε_m(H) = 2/(σ_1 + σ_n) (Theorem 8.6.1); this provides an interesting metric way to look at the condition number C(H). Similarly, C_R(U) = ε_m(H̃), where H̃ is the module restricted to the plane spanned by two singular vectors associated with the lowest singular values σ_1 < σ_2 when they differ.
Actually, the three quantities 1 ≤ K(A) < ∞, 1 ≤ C(H) < √2 and 0 ≤ sin φ(H) < 1 are merely different ways to measure the distance of the original matrix A to singularity by means of its extreme singular values σ_1 ≤ σ_n. The minimal values, respectively 1, 1 and 0, correspond to A = aQ, Q unitary, a ≠ 0 in C, so that H = |a|I, φ(H) = 0. The upper bounds, respectively ∞, √2 and 1, correspond to A singular (λ_1 = σ_1 = 0 and φ(H) = π/2). It is important to keep in mind that K(A) is the relative condition number for A ↦ A⁻¹ in the 2-norm, whereas C(H) is the (absolute = relative) condition number for A ↦ H in the F-norm.

Let A = UΣV* ∈ C^{n×n} and ΔA = UBV*, and let H′(A) denote the Fréchet derivative of H : A ↦ (AA*)^{1/2}; then

    C(H) = ‖H′(A)‖_F = max_{‖ΔA‖_F = 1} ‖H′(A) ΔA‖_F,

with ‖ΔA‖_F = ‖B‖_F. It can be shown that C(H) is achieved for the rank-2 matrix B = w_1 e_1 e_n^T + w_d e_n e_1^T, w_1² = σ_n/(σ_1 + σ_n) and w_d² = σ_1/(σ_1 + σ_n); hence ‖B‖_F² = tr BB^T = w_1² + w_d² = 1 and ‖B‖_2 = w_1 < 1. Moreover, B² = w_1 w_d (e_1 e_1^T + e_n e_n^T), where 2 w_1 w_d = cos φ(H). Thus 0 < √(w_1 w_d) = ρ(B) ≤ 1/√2 ≤ ‖B‖_2 = w_1 < 1.
When λ_1 → 0, so does w_d, and B tends to the nilpotent matrix e_1 e_n^T: r(B) = 2 drops to 1 in the limit.
More details are found in [5] and in Section 4.6 of [13]. Actually the practising numerical analyst will find ample food for theoretical thought in all of Chapter 4 of [13].
Example 8.6.3 Let A represent the real matrices L_x and R_x defined in Section 8.5.4. The common module is H_x. For k ≤ 3, H_x = ‖x‖ I_n. Therefore φ(H_x) = 0: there is no dynamical play for the multiplication maps. Alternatively, the yield is real: a(H_x) = 1.
A positive play occurs for multiplication by non-alternative vectors in higher dimensional Dickson algebras. Then the norm stops being multiplicative: ‖x × y‖ ≠ ‖x‖ ‖y‖ [8]. If x is a zerodivisor, φ(H_x) = π/2 and the yield is pure imaginary: a(H_x) = i.
The above discussion shows that the linear map which multiplies by x in A_k is best (respectively worst) conditioned when x is alternative (respectively x is a zerodivisor).

Going back to A Hermitian, we only assume from now on that it is positive semi-definite, so that 0 ≤ φ(A) ≤ π/2. If A = λI, λ = λ_1 = λ_d and φ(A) = 0 because all vectors are eigenvectors. In conclusion, 0 < φ(A) < π/2 iff A is invertible and A ≠ λI.
Let x_1 and x_d be a choice of normalised eigenvectors in C^n for A associated with λ_1 and λ_d respectively. We denote S = S(x_1, x_d) = {u ∈ C^n; u = z x_1 + z′ x_d, z, z′ ∈ C, |z|² + |z′|² = 1} ≅ S³, the unit sphere in R⁴, and S_0 = S_0(x_1, x_d) = {v ∈ C^n; v = e^{iθ} w_1 x_1 + e^{iθ′} w_d x_d, θ, θ′ ∈ R} ≅ S¹ × S¹, where S¹ is the unit circle in R². Note that e^{iθ} w_1 and e^{iθ′} w_d represent arbitrary square roots of λ_d/(λ_1 + λ_d) and λ_1/(λ_1 + λ_d) respectively.

Theorem 8.6.2 Let A be invertible; then sin φ(A) = ‖((2/(λ_1 + λ_d)) A − I) u‖ for any u ∈ S, and cos φ(A) is achieved for any v in S_0 ⊂ S.

PROOF By an easy adaptation to the complex case of Theorem 3.1 on pp. 31-32 in [13].

When A is symmetric rather than Hermitian, the eigenvectors x_1 and x_d, as well as u and v, are real vectors in R^n. The analogues of S and S_0 are respectively R = R(x_1, x_d) = {u ∈ R^n; u = a x_1 + a′ x_d, a, a′ ∈ R, a² + a′² = 1} ≅ S¹, and R_0 = R_0(x_1, x_d) reduced to the four points {v ∈ R^n; v = ±w_1 x_1 ± w_d x_d} lying on R.

Corollary 8.6.3 When A is symmetric positive definite, the sets S and S_0 in Theorem 8.6.2 are replaced by the sets R and R_0 respectively.

PROOF Clear.
Several remarks are in order:
(1) If A is singular (λ_1 = w_d = 0), the set S (respectively R) is unchanged, with Au = z′ λ_d x_d (respectively a′ λ_d x_d), so that sin φ(A) = 1. The sets S_0 and R_0 are reduced, according to w_d = 0, w_1 = 1, to subsets of Ker A, leading to cos φ(A) = 0, as expected.
(2) When d = 1, λ_1 = λ_d and w_1 = w_d = 1/√2.

(3) The different behaviour of cos θ(A, u) and n(A, u, ε) with respect to minimisation is quite remarkable. It deserves further study [10].
(4) cos φ(A) is called the first "antieigenvalue", with "antieigenvectors" in R_0 (A real), in Gustafson's parlance. And the general study of cos φ(A) and sin φ(A) is referred to as matrix trigonometry by its inventor. More generally, over C, φ(A) is merely a particular case of canonical angle between complex subspaces of dimension r = 1 generated by u and Au. For 1 ≤ r ≤ n, the associated trigonometry was presented earlier in Chapter 1, Sections 1.2 to 1.5, as a handy tool devised to quantify the convergence of eigensolvers in Chapters 5 to 7. But Gustafson had a different goal in mind for A symmetric.
The difference is mentioned in Exercise 5, Section 6.7, pp. 119-120 of [13]. However, the maximal canonical angle shown in Figure 1.3.1, p. 9 has lost in [13] the proper reference to its twofold origin: statistical over R (Afriat 1957) and numerical over C (Section 1.14 on p. 43). Many examples of the use of the angle φ(A) to analyse the convergence of iterative linear solvers are provided in Chapter 4 of [13].
(5) Unlike the Euclidean angle ∠(u, Au) in the real 2D-plane, the canonical angle θ(A, u) derived from the complex Euclidean scalar product has no familiar interpretation in real 4D-geometry. Therefore the map u ↦ Au expresses, when u is not an eigenvector, a change of direction only over R. Over C, the change appears, in real terms, as an evolution of a different, more intricate, nature. It incorporates the complex structure of the ground field C in a computational manner to be studied elsewhere [10].

(6) The argument φ(A) measures the maximal dynamical play that the matrix A can induce on a vector u. This viewpoint is a move away from the more familiar search for colinearity, that is, for eigenvectors of A. This alternative viewpoint does not look for the directional invariance expressed by A through its eigenvectors. It looks rather for the directional laxity that A can express by an inner coupling of any two distinct eigenvalues, as we shall see in Section 8.6.3, the maximal laxity being achieved by φ(A) for the extreme eigenvalues λ_min, λ_max. The larger the φ(A), the greater the evolution that is possible under A by coupling λ_1, λ_d. The number λ̄ = (λ_1 + λ_d)/2 represents the middle point in the spectrum. Let A = QDQ*; the maximal value (λ_d − λ_1)/2 = ‖A − λ̄I‖ = ‖D − λ̄I‖ = max_{1≤i≤n} |μ_i − λ̄| is achieved by ‖(A − λ̄I)u‖ for u ∈ S.

8.6.2 The Euler Equation Associated with cos θ(A, u)

We recall that the Euler equation associated with the Rayleigh quotient M(A, x) for a Hermitian matrix A is given by Ax = λx, x ≠ 0: x and Ax are colinear in C^n with a real scalar λ. The solutions are the eigenvectors, for which the dimension of the subspace lin(x, Ax) drops from 2 to 1.

Proposition 8.6.4 The Euler equation for cos θ(A, u) is given by

    A²u/(A²u, u) − 2 Au/(Au, u) + u = 0,  Au ≠ 0.    (8.6.1)

PROOF See Section 3.3, pp. 33-37 in [13].

Observe that A²u is a real linear combination of Au and u.

Corollary 8.6.5 The Euler equation for cos θ(A, u) is satisfied by eigenvectors of A as well as by linear combinations of normalised eigenvectors x_k and x_l associated with 0 < λ_k < λ_l, 1 ≤ k < l ≤ d, lying in S_0(x_k, x_l) = {v = e^{iθ} w_k x_k + e^{iθ′} w_l x_l}, where w_k² = λ_l/(λ_k + λ_l) ≥ 1/2, w_l² = λ_k/(λ_k + λ_l) ≤ 1/2.

PROOF See Section 3.4 on pp. 39-44 in [13].

Generally speaking the three vectors u, Au, A²u are independent. They are colinear if u is an eigenvector, or they have rank 2 if u ∈ S_0(x_k, x_l). In the latter case, (8.6.1) entails ‖u‖ = 1, A²u = (λ_k + λ_l) Au − λ_k λ_l u.
There are d(d − 1)/2 distinct positive critical values for cos θ(A, u) when A is invertible (λ_1 > 0). Can we relax the condition λ_k > 0 in Corollary 8.6.5? For the choice 0 = λ_1 < λ_l, the vectors in S_0(x_1, x_l) are normalised eigenvectors e^{iθ} x_1 in Ker A and (8.6.1) is not defined. However, φ_{1l} = π/2, 1 < l ≤ d, can be inferred in the limit λ_1 → 0.
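The Euler equation (8.6.1) and the rank-2 identity can be checked in the eigenbasis on the illustrative matrix A = diag(1, 4), with the weights of Corollary 8.6.5:

```python
import math

# Corollary 8.6.5 on A = diag(1, 4): v = w1 x1 + wd xd with w1^2 = ld/(l1+ld),
# wd^2 = l1/(l1+ld) satisfies the Euler equation (8.6.1), and
# A^2 v = (l1+ld) A v - l1*ld*v.
l1, ld = 1.0, 4.0
w1, wd = math.sqrt(ld / (l1 + ld)), math.sqrt(l1 / (l1 + ld))
v   = (w1, wd)                        # coordinates in the eigenbasis (x1, xd)
Av  = (l1 * v[0], ld * v[1])
A2v = (l1**2 * v[0], ld**2 * v[1])
ip  = lambda a, b: a[0] * b[0] + a[1] * b[1]

# Euler equation: A^2 v/(A^2 v, v) - 2 A v/(A v, v) + v = 0
res = [A2v[i] / ip(A2v, v) - 2 * Av[i] / ip(Av, v) + v[i] for i in range(2)]
assert all(abs(r) < 1e-12 for r in res)
# rank-2 identity entailed by (8.6.1):
assert all(abs(A2v[i] - (l1 + ld) * Av[i] + l1 * ld * v[i]) < 1e-12 for i in range(2))
```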

8.6.3 The Local Dynamical Play 4>ki for 0 < \k < Xi

Given any pair λ_k < λ_l, the dynamical play φ_kl satisfies the relations

0 < cos φ_kl = 2√(λ_k λ_l)/(λ_k + λ_l),  sin φ_kl = (λ_l − λ_k)/(λ_k + λ_l) < 1.

It is clear that φ_min(A) ≤ φ_kl ≤ φ(A) = φ_max(A) < π/2, where φ_min corresponds
to min(λ_l/λ_k, 1 ≤ k < l ≤ d), just as φ_max corresponds to max(λ_l − λ_k) =
λ_d − λ₁.

Corollary 8.6.6 sin φ_kl = ‖((2/(λ_k + λ_l))A − I)u‖ is constant for u ∈ S(x_k, x_l) = {u =
z x_k + z′ x_l ; |z|² + |z′|² = 1}.

PROOF Straightforward calculation.

Let λ_kl = (λ_k + λ_l)/2 represent the middle point of [λ_k, λ_l]. Corollary 8.6.6 indicates
that ‖(A − λ_kl I)u‖ assumes the constant value (λ_l − λ_k)/2 = λ_kl sin φ_kl for any u ∈ S(x_k, x_l).
Given the two normalised eigenvectors x_k and x_l, a vector in S(x_k, x_l) (Corollary
8.6.6) is defined by z, z′ ∈ ℂ, which describe the unit sphere in ℝ⁴. By contrast, the
vectors v in S₀(x_k, x_l) (Corollary 8.6.5) are defined by two independent real variables
θ, θ′, corresponding to s = e^{iθ}w_k and s′ = e^{iθ′}w_l; now |s| and |s′| are determined
by λ_k < λ_l, and only the phases are arbitrary and independent on the unit circle S¹.
Over ℝ, the dynamical play resulting from spectral coupling is illustrated in Fig­ure 8.6.1. We use the notation 0 < λ = λ_k = OA < λ′ = λ_l = OB. Then

OM = CM = (λ + λ′)/2 = a,  AM = (λ′ − λ)/2,  AC = √(λλ′) = g,  CD = 2λλ′/(λ + λ′) = h > λ,

and φ = ∠(CA, CM) = φ_kl. Observe that cos φ = g/a. We shall meet the
triangle ACM again in Section 8.6.4.
The three quantities h ≤ g ≤ a are known as the Pythagorean means of λ and λ′,
being respectively the harmonic, geometric and arithmetic means.

Figure 8.6.1: The Pythagorean means
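The three means and the angle φ_kl can be illustrated with a short numerical sketch (the values λ = 1, λ′ = 4 are an arbitrary choice; any 0 < λ < λ′ behaves identically):

```python
import numpy as np

# The Pythagorean means of lam1 < lam2 and the dynamical play phi_kl:
# h <= g <= a, cos(phi) = g/a, sin(phi) = (lam2 - lam1)/(lam1 + lam2).
lam1, lam2 = 1.0, 4.0
a = (lam1 + lam2) / 2                 # arithmetic mean  = OM = CM
g = np.sqrt(lam1 * lam2)              # geometric mean   = AC
h = 2 * lam1 * lam2 / (lam1 + lam2)   # harmonic mean    = CD
print(h <= g <= a)                    # True

cos_phi = 2 * np.sqrt(lam1 * lam2) / (lam1 + lam2)
sin_phi = (lam2 - lam1) / (lam1 + lam2)
print(np.isclose(cos_phi, g / a))             # True: cos(phi) = g/a
print(np.isclose(cos_phi**2 + sin_phi**2, 1)) # True: phi is a genuine angle
print(np.isclose(g**2, a * h))                # True: g is the geometric mean of a and h
```

The last line records the classical relation g² = ah between the three means, which underlies the triangle ACM of Figure 8.6.1.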

Another useful geometric picture is given in Figure 8.6.2 by means of the trapez­ium ABCD, where the opposite sides AB and CD are parallel with lengths λ and
λ′. The diagonals BD and AC meet at E; the line through E parallel to the
sides AB and CD meets the two other sides at F and G. Then Thales tells us that
2FE = 2EG = FG = 2λλ′/(λ + λ′) = h, the harmonic mean of the side lengths λ and λ′.
Observe that Figure 8.6.2 concerns the particular case of a complete quadrilateral for
which one pair of lines meets at infinity.

Figure 8.6.2: The harmonic mean

8.6.4 An Epistemological Conclusion

In classical spectral analysis geared towards invariance of direction, the variational
properties of the Rayleigh quotient for Hermitian matrices have been known for al­most 150 years (Weber 1869). They were instrumental in the early developments of
spectroscopy, which eventually led to quantum mechanics and to the discovery of the
DNA structure in molecular biology. Spectral theory for A Hermitian compares x and
Ax when they are collinear. There are n real ratios μ_k, k = 1 to n, provided by the
min-max Theorem 1.9.1 on p. 32: these ratios are the n repeated eigenvalues of A.
If we restrict our attention to Hermitian positive definite matrices A ≠ λI, the
coupling of any two distinct eigenvalues λ_k < λ_l yields d(d−1)/2 critical values cos φ_kl
for cos θ(A, u), each value being achieved in S₀(x_k, x_l). At the same time, sin φ_kl
remains constant on the much larger surface S(x_k, x_l).
We observe that the dynamical plays φ_kl, 1 ≤ k < l ≤ d, can be quantified in
dual algebraic/geometric ways:
(i) cos φ_kl = (Au, u)/(‖Au‖ ‖u‖) = 2√(λ_k λ_l)/(λ_k + λ_l) for any u ∈ S₀(x_k, x_l),
(ii) sin φ_kl = (λ_l − λ_k)/(λ_l + λ_k) = ‖((1/λ_kl)A − I)u‖ for any u ∈ S(x_k, x_l).
The algebraic formulae are equivalent. This is not true of the geometric versions
because of the strict inclusion S₀(x_k, x_l) ⊂ S(x_k, x_l). Can φ_kl be interpreted over ℝ
(respectively ℂ) as the Euclidean (respectively canonical) angle θ(A, u)? Not always.
It can be checked that (Au, u)/(‖Au‖ ‖u‖) = 2√(λ_k λ_l)/(λ_k + λ_l) for u = z x_k + z′ x_l, 0 < λ_k < λ_l, iff
|z|² = w_k² and |z′|² = w_l²: ξ = |z|² = 1 − |z′|² is the unique solution in ]0, 1[ of the
quadratic equation (ξ − λ_l/(λ_k + λ_l))² = 0, and hence |z′|² = λ_k/(λ_k + λ_l).

The trigonometric approach of Gustafson has uncovered a geometric difference
between the real and imaginary components of the yield and, more generally, of e^{iφ_kl}.
Because (Au, u) = ‖A^{1/2}u‖², we understand that the difference comes from the shift
which takes place in (1/λ_kl)‖(A − λ_kl I)u‖. To account for this geometric difference, we
call catchvectors all vectors in S₀(x_k, x_l): they are captured/predicted by the Euler
equation for cos θ(A, u). Catchvectors v guarantee that the dynamical play is an
"angle" θ(A, v). Moreover, the three vectors λ_kl v, Av and V_kl = Av − λ_kl v have the
respective norms ‖λ_kl v‖ = a, ‖Av‖ = √(λ_k λ_l) = g and ‖V_kl‖ = (λ_l − λ_k)/2.
Over ℝ, the triangle ACM of Figure 8.6.1 tells us that V_kl is orthogonal to Av, as
can be checked directly. See Figure 8.6.3 in the real plane {v, Av}, where the indices
k and l are omitted.

Figure 8.6.3: v is a real catchvector
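The three norms and the orthogonality V_kl ⟂ Av can be checked directly, as the text says. A minimal numerical sketch (the eigenvalue pair 2, 8 is an arbitrary illustrative choice):

```python
import numpy as np

# Check the norms of lam_kl*v, Av and V_kl = Av - lam_kl*v for a
# catchvector v, and the orthogonality V_kl ⟂ Av (triangle ACM).
lam_k, lam_l = 2.0, 8.0               # illustrative eigenvalue pair
A = np.diag([lam_k, lam_l])           # x_k = e1, x_l = e2
lam_kl = (lam_k + lam_l) / 2          # arithmetic mean a
v = np.array([np.sqrt(lam_l / (lam_k + lam_l)),
              np.sqrt(lam_k / (lam_k + lam_l))])   # catchvector
Av = A @ v
V = Av - lam_kl * v
print(np.isclose(np.linalg.norm(lam_kl * v), lam_kl))            # True: a
print(np.isclose(np.linalg.norm(Av), np.sqrt(lam_k * lam_l)))    # True: g
print(np.isclose(np.linalg.norm(V), (lam_l - lam_k) / 2))        # True
print(np.isclose(V @ Av, 0.0))        # True: V_kl is orthogonal to Av
```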

It is remarkable that sin φ_kl = (λ_l − λ_k)/(λ_l + λ_k) remains constant even when φ_kl does not
represent the angle θ(A, u): the vector U_kl = Au − λ_kl u describes the sphere of radius
λ_kl sin φ_kl as u describes S(x_k, x_l). The Rayleigh quotient of A on the invariant subspace
M = lin(x_k, x_l) is a 2 × 2 matrix with eigenvalues λ_k < λ_l whose properties are
examined in Exercises 8.6.1 and 8.6.2.
Future research may unravel more of the computational significance of the dis­symmetry between norm and scalar product, quantified as sin φ_kl and cos φ_kl respec­tively. It is clear that classical (circular) trigonometry in ℂ erases a difference which
is expressed only in ℂ^{n×n}, n ≥ 2.


8.7.1 Outlook
The previous section has presented the effect on cosine and norm quantifiers related
to A when two distinct eigenvalues are coupled, under the assumption that A is Her­mitian positive semi-definite.
The present section offers a different perspective on coupling. It considers the
direct coupling of two arbitrary matrices A and E of order n, which takes the linear
form A(t) = A + tE, where the complex parameter t, which describes Ĉ = ℂ ∪ {∞},
represents the intensity of the coupling between the original matrix A and the
deviation matrix E of rank r(E) = r, 1 ≤ r ≤ n. The theory of homotopic deviation
(HD) looks at the question: "What is the fate of sp(A(t)) as |t| → ∞?" When r < n,
some eigenvalues λ(t) of A(t) may not escape to ∞ but converge in ℂ. These limits
at finite distance in ℂ form the set Lim = lim_{|t|→∞} sp(A(t))\{∞} ⊂ ℂ, which
may be empty.
The HD theory represents a vast generalisation of the classical analytic perturba­tion theory, where the parameter t is bounded, |t| ≤ 1, so that ‖tE‖ ≤ ‖E‖ < ∞,
which was presented in Chapter 2 (see also Chatelin 1983, Kato 1976, Baumgärtel
1985). In HD, ‖tE‖ → ∞ and the limit set Lim is characterised by means of A
and the singular representation E = UV*, where U, V ∈ ℂ^{n×r} have rank r and are
deduced from the SVD of E.

8.7.2 Analyticity of the Map t ↦ (A(t) − zI)⁻¹ around 0 and ∞

The word "homotopic" stems from the formal factorisation

R(t, z) = (A(t) − zI)⁻¹ = (A − zI)⁻¹[I + tE(A − zI)⁻¹]⁻¹,

which is considered for z ∈ res(A), t ∈ ℂ. The Sherman-Morrison formula is used
to write

R(t, z) = R(0, z)[I_n − tU(I_r − tM_z)⁻¹V*R(0, z)],

where M_z = V*(zI − A)⁻¹U ∈ ℂ^{r×r} is the bottom-up communication matrix
between the two levels of information corresponding to ℂ^r and ℂⁿ, r < n. Because
of the above form of factorisation chosen for the resolvent of A(t) at z in res(A),
R(t, z) exists for t ∈ res(M_z⁻¹) when r(M_z) = r. Any z in res(A) such that tμ_z = 1
for some μ_z ∈ sp(M_z) is an eigenvalue λ(t) of A(t). The homotopic spectral link
tμ_z = 1 between t and z entails that |t| → ∞ iff μ_z → 0.
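The Sherman-Morrison form of the resolvent can be checked against a direct inversion. The following sketch (random illustrative data, r = 2; the seed and the point z are arbitrary assumptions) is one way to do so:

```python
import numpy as np

# Check the Sherman-Morrison factorisation of the resolvent of
# A(t) = A + t UV* against a direct inversion (random data, r = 2).
rng = np.random.default_rng(0)
n, r, t, z = 6, 2, 3.7, 0.5 + 1.2j
A = rng.standard_normal((n, n))
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))

R0 = np.linalg.inv(A - z * np.eye(n))          # R(0, z)
Mz = V.conj().T @ np.linalg.inv(z * np.eye(n) - A) @ U
inner = np.linalg.inv(np.eye(r) - t * Mz)      # (I_r - t M_z)^{-1}
R_factored = R0 @ (np.eye(n) - t * U @ inner @ V.conj().T @ R0)

R_direct = np.linalg.inv(A + t * U @ V.conj().T - z * np.eye(n))
print(np.allclose(R_factored, R_direct))       # True
```

Note that only the small r × r matrix I_r − tM_z has to be inverted once R(0, z) is known, which is the computational point of the factorisation.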
Altogether, HD theory is a 3-level information processing, in which the third level
is ℂ^{n+r}. The top-down information processing is realised by the augmented matrix

A(z) = ( A − zI_n  U   )
       ( V*        0_r )  ∈ ℂ^{(n+r)×(n+r)}.

Definition π(z) = det A(z) is the homotopic polynomial, with degree d, 0 ≤ d ≤
n − r. Its zeroset is Z = {z ∈ ℂ; π(z) = 0}.

By the Schur complement formula,

π(z) = π_A(z) det M_z  for z ∈ res(A),

where π_A denotes the characteristic polynomial of A.


The resolvent R(t, z) is analytic in t around 0 (|t| small enough) and around ∞ (|t| large
enough) for z in res(A)\F(A, E), where F(A, E) = {z ∈ res(A); r(M_z) < r} =
Z ∩ res(A) = Lim ∩ res(A).
The points in F(A, E) are frontier points, i.e. those points in res(A) where
R(t, z) is not analytic around |t| = ∞: they are the limits of the λ(t) which exist in
res(A). As |t| → ∞ they attract the flow of spectral information consisting of
the spectral rays t = |t|e^{iθ} ↦ λ(t), θ fixed, which do not escape to ∞ or to an
eigenvalue of A.
Critical points are particular frontier points z where M_z is nilpotent ⟺ ρ(M_z) =
0. At such a point z, the map t ↦ R(t, z) is a matrix polynomial in t of degree ≤ r.
The critical points repel the spectral flow for |t| < ∞ but become asymptotically
attractive as |t| → ∞. Observe that if r = 1, all frontier points are critical because
M_z reduces to a complex scalar. In computational practice, the asymptotic
regime is reached as soon as |t| ≥ 300 or so.
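The case r = 1 is easy to experiment with: as |t| grows, one eigenvalue of A + t uv* escapes to ∞ while the remaining ones converge to points z with M_z = v*(zI − A)⁻¹u = 0. A sketch (random illustrative data; the seed and the value t = 10⁸ are arbitrary assumptions):

```python
import numpy as np

# Homotopic deviation for r(E) = 1: the finite limits of sp(A + t u v*)
# as |t| grows are frontier points, i.e. zeros of M_z = v*(zI - A)^{-1} u.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))

t = 1e8                                    # deep in the asymptotic regime
eigs = np.linalg.eigvals(A + t * (u @ v.T))
finite = sorted(eigs, key=abs)[:-1]        # drop the escaping eigenvalue

def M(z):
    # the scalar communication "matrix" M_z for r = 1
    return (v.T @ np.linalg.solve(z * np.eye(n) - A, u))[0, 0]

# Each finite limit (numerically) annihilates M_z: a frontier point.
print(max(abs(M(z)) for z in finite))      # ~ 0
```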
This sketchy summary attempts to convey the flavour of the easy part of HD the­ory, which is able to characterise Lim ∩ res(A) with a few more analytic tools than
the classical spectral theory presented in Chapter 2. The study of Lim ∩ sp(A) is
more subtle: it calls for a significant generalisation of spectral theory and uncovers
new computational phenomena. For example, when 2 ≤ r < n, it is possible that
a limit eigenvalue λ (lim λ(t) = λ ∈ sp(A)) does not belong to Z (π(λ) ≠ 0).
When this is the case, a local bottom-up organisation of the information occurs at λ.
This contrasts with the top-down organisation which is the rule at all frontier points
in res(A).
The interested reader is referred to Chapter 7, pp. 247-346 in [8] for an in-depth
treatment of HD. Reference [6] treats an example arising from the discretisation of
the acoustic wave equation, where the homotopy parameter is the complex admittance.
The critical points correspond to frequencies for which no finite value of the admit­tance can cause a resonance.


8.8.1 Contrasting AB and BA

When A (or B) of order n is invertible, then AB and BA are similar: A(BA)A⁻¹ =
AB. When det A = det B = 0, AB and BA still share the same characteristic poly­nomial: they are isospectral, and hence they are certainly similar when they are diago­nalisable (for example if B = Aᵀ or A*). Even if AB and BA are not similar, the
following augmented matrices of order 2n are always similar:

( I_n  A  )( 0_n  0  )( I_n  −A  )   ( AB  0   )
( 0    I_n )( B    BA )( 0    I_n ) = ( B   0_n ).

The Jordan forms for AB and BA differ only at the eigenvalue 0, for which the sizes s_i
of the Jordan blocks satisfy |s_i(AB) − s_i(BA)| ≤ 1, i = 1 to max(g(AB), g(BA)),
while keeping

m = Σ_{i=1}^{g(AB)} s_i(AB) = Σ_{i=1}^{g(BA)} s_i(BA)

invariant as the common alge­braic multiplicity of 0 [11].
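Both facts — the 2n × 2n similarity and the isospectrality of AB and BA for singular A, B — can be verified numerically. A sketch (random data made singular by zeroing a column and a row; these choices are illustrative):

```python
import numpy as np

# Check the 2n x 2n similarity between the augmented matrices built on
# AB and BA, and the isospectrality of AB and BA (singular A, B).
rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A[:, 0] = 0     # make A singular
B = rng.standard_normal((n, n)); B[0, :] = 0     # make B singular
I, Z = np.eye(n), np.zeros((n, n))

S = np.block([[I, A], [Z, I]])                   # similarity transform
S_inv = np.block([[I, -A], [Z, I]])
M1 = np.block([[Z, Z], [B, B @ A]])
M2 = np.block([[A @ B, Z], [B, Z]])
print(np.allclose(S @ M1 @ S_inv, M2))           # True

# AB and BA share the same spectrum even though A and B are singular.
eig_AB = np.sort_complex(np.linalg.eigvals(A @ B))
eig_BA = np.sort_complex(np.linalg.eigvals(B @ A))
print(np.allclose(eig_AB, eig_BA, atol=1e-6))    # True
```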
Let 0 be a defective eigenvalue of AB characterised by the three integers g, l, m,
with g, l ∈ [1, m]. Then the structural matrix C₀(AB) of size g × l for AB de­fined in Section 8.4.2 can be associated with finitely many possible structural matri­ces C₀(BA) of size g′ × l′ for BA. The total number N(g, l, m) of possibilities is
minimal at 1 when AB and BA are similar. When the similarity exists only at the
augmented level 2n, N can grow exponentially with m. The reader can find in [16]
an algorithmic description of the possibilities C₀(AB) ↦ C₀(BA).
We exploit below the associativity of the matrix product A(BA) = (AB)A and
denote E = AB and F = BA.

Proposition 8.8.1 The resolvents R_E(z) = (E − zI)⁻¹ and R_F(z) = (F − zI)⁻¹
are connected by the relation

B R_E(z) A − z R_F(z) = I  (8.8.1)

at any z ∈ res(E) = res(F).

PROOF A right multiplication of (8.8.1) by F − zI yields the identity BA − zI =
F − zI: indeed A(F − zI) = (E − zI)A, so that B R_E(z)A(F − zI) − zR_F(z)(F − zI) = BA − zI.


Lemma 8.8.2 If A and B are invertible, the spectral projections associated with
0 ≠ λ ∈ sp(E) = sp(F) satisfy

λP_F = B P_E A.  (8.8.2)

PROOF Let (C) be a Jordan curve isolating λ ≠ 0 from the rest of the spectrum.
Then F⁻¹ = (BA)⁻¹ = A⁻¹B⁻¹ and B(AB − zI)⁻¹A = (I − zF⁻¹)⁻¹ = −(1/z) R_{F⁻¹}(1/z).
We set u = 1/z, du = −dz/z², and thus ∮_C B R_E(z) A dz = ∮_{C′} z R_{F⁻¹}(u) du,
where (C′) is the image of (C) isolating 1/λ. It
follows that B P_E A = λ P_{F⁻¹} = λ P_F.
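The bond (8.8.1) and the projection identity (8.8.2) are easy to test for a semi-simple eigenvalue. A sketch (random invertible A, B, so that E = AB is generically diagonalisable with simple eigenvalues; seed and the point z are arbitrary assumptions):

```python
import numpy as np

# Check the resolvent bond (8.8.1) between E = AB and F = BA, and the
# projection identity lam*P_F = B P_E A for a *simple* eigenvalue.
rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
E, F = A @ B, B @ A
z = 0.3 + 0.7j                                   # a point in res(E)
RE = np.linalg.inv(E - z * np.eye(n))
RF = np.linalg.inv(F - z * np.eye(n))
print(np.allclose(B @ RE @ A - z * RF, np.eye(n)))   # True: (8.8.1)

# Rank-one spectral projection of E at one (simple) eigenvalue.
w, X = np.linalg.eig(E)
Y = np.linalg.inv(X)
k = 0
lam = w[k]
PE = np.outer(X[:, k], Y[k, :])                  # P_E for lambda
PF = np.linalg.inv(A) @ PE @ A                   # P_F, since F = A^{-1} E A
print(np.allclose(lam * PF, B @ PE @ A))         # True: (8.8.2)
```

For a defective λ the second identity breaks down, which is precisely the paradox discussed in Section 8.8.3.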

8.8.2 A Local Spectral Analysis of (8.8.1) around a Nonzero Eigen­value. Part I

In the general case, when E and F can be singular, we can perform the series expansion
(2.2.5) given in Theorem 2.2.10 on p. 70 in the neighbourhood of any nonzero λ, with
common index l, 1 ≤ l ≤ m, for each resolvent R_E(z) and R_F(z).

Proposition 8.8.3 The identification of the matrix coefficients for 1/(z − λ)^k, k ≥
0, in the expansions (2.2.5) for R_E and R_F satisfying (8.8.1) entails the following:

k = 0:           (I − P_F)(I + λS_F) = B S_E A,   (8.8.3)
k = 1:           D_F + λP_F = B P_E A,            (8.8.4)
2 ≤ k ≤ l − 1:   D_F^k + λD_F^{k−1} = B D_E^{k−1} A,
k = l:           λD_F^{l−1} = B D_E^{l−1} A.

PROOF We equate in (8.8.1) the l + 1 matrix coefficients for 1/(z − λ)^k, k = 0
to l, and write zR_F(z) = (z − λ + λ)R_F(z).
For k = 0, R_E(z) yields B S_E A, R_F(z) yields S_F and (λ − z)R_F(z) yields P_F.
Therefore the identity B S_E A − λS_F + P_F = I should hold, which can be rewritten
as (8.8.3). The relations for k = 1 to l follow in a like manner.

8.8.3 Local vs. Global Information

We observe that if λ ≠ 0 is defective (l > 1), we run into a contradiction when A
and B are invertible: (8.8.2) λP_F = B P_E A contradicts (8.8.4) if l > 1 (D_F ≠ 0).
The two equalities are at odds; this results from two different ways of performing
the spectral analysis:
(i) In Lemma 8.8.2, the spectral projections are directly deduced from the resol­vents R_E and R_F by Cauchy integration on (C) around λ.
(ii) In Proposition 8.8.3, the Laurent series expansions, which converge for z close
enough to λ, are used in place of the resolvents.
Depending on whether we choose to reason along the line (i) or (ii), we get con­flicting conclusions when E and F are invertible but λ is defective. This unexpected
paradox takes place in the associative ring of matrices and stems from an inherently
nonlinear spectral analysis around λ. Recall that λ is a root of a polynomial of de­gree n which can be arbitrarily large, and that the difference E − F is the commutator
AB − BA.
An alternative way to present the paradox is to write P_{F⁻¹} = P_F + (1/λ)D_F, an
equality which challenges the classical result that P_{F⁻¹} = P_F for λ ≠ 0 even if
D_F ≠ 0. The additional term (1/λ)D_F comes from the identification of the results of
integration (i) and analytic expansion (ii) applied to the bond (8.8.1) between AB and
BA; it disappears when the product F = BA is studied in isolation as a whole. The
Flanders result that the Jordan forms for AB and BA may disagree only at λ = 0
[11] could be interpreted as an echo of the presence of (1/λ)D_F (resulting from the bond
between AB and BA), which only becomes tangible in the limit λ → 0 (F⁻¹ does
not exist) because the augmented matrices

( AB  0   )        ( 0_n  0  )
( B   0_n )  and   ( B    BA )

are similar.
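The paradox is concrete enough to exhibit numerically. In the sketch below (an illustration, not from the book), E is built as a 2 × 2 Jordan block at λ = 2 plus the simple eigenvalue 5, and B = A⁻¹E for a random invertible A, so that E = AB with λ = 2 defective:

```python
import numpy as np

# For a defective eigenvalue lam != 0 of E = AB with A, B invertible,
# (8.8.4) D_F + lam*P_F = B P_E A holds, while (8.8.2) lam*P_F = B P_E A
# would force D_F = 0 -- the paradox of Section 8.8.3.
lam = 2.0
E = np.array([[lam, 1.0, 0.0],       # Jordan block J_2(lam), plus {5}
              [0.0, lam, 0.0],
              [0.0, 0.0, 5.0]])
PE = np.diag([1.0, 1.0, 0.0])        # spectral projection of E at lam
DE = (E - lam * np.eye(3)) @ PE      # nilpotent part of E at lam

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))      # invertible with probability 1
B = np.linalg.inv(A) @ E             # so that AB = E
PF = np.linalg.inv(A) @ PE @ A       # F = A^{-1} E A
DF = np.linalg.inv(A) @ DE @ A

print(np.allclose(DF + lam * PF, B @ PE @ A))   # True: (8.8.4) holds
print(np.allclose(lam * PF, B @ PE @ A))        # False: (8.8.2) fails
print(np.linalg.norm(DF) > 1e-6)                # True: because D_F != 0
```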

The paradox induced by Cauchy integration could become even more puzzling if
we were to progress further in the direction (i) with only the incomplete knowledge
of the existence of λ, wrongly taken to be the unique singularity of the resolvents R_E
and R_F. One would be led to the (most often spurious) conclusion that

AB = BA = λI.  (8.8.5)

The equalities (8.8.2) and (8.8.5) are valid only if l = 1 and sp(E) = sp(F) = {λ},
respectively. In mathematical theory the first contradiction has gone undetected so
far because spectral consequences have not been drawn from the bond (8.8.1). Conven­tional wisdom treats the two matrices AB and BA as separate global entities. The
second contradiction is easily resolved by processing local and global information at
the same time. The mind has access to the global information about the spectrum
provided by the common characteristic polynomial π_E(z) = π_F(z). Such is not the
case in experimental sciences, where experimentalists have access only to the local
phenomenological reality. The computational paradox stemming from local contour
integration is a welcome warning against the dangers of a naive inference from local
to global information.

Remark The warning may not be as far-out as it looks if we ponder the matrices
L_x and R_x representing the left and right multiplications by x in the Dickson algebra A_k, k ≥ 0. For
k ≤ 3, these matrices have the simple module ‖x‖I_{2^k}, corresponding to
the unique singular value ‖x‖. And the development of our physical intuition of the
manifested world is primarily based on the validity in ℝ, ℂ, ℍ and 𝔾 of this extremely
special situation. It is therefore reasonable to expect that our 3D-based intuition can
be challenged in higher dimensional algebras (k ≥ 4), where the multiplication maps
have not only module factors, which carry polymorphic information about themselves
[3, 8], but also phase factors, which can be ambiguously defined.

8.8.4 Local Spectral Analysis of (8.8.1). Part II

The complete spectral analysis of (8.8.1) around λ ≠ 0 involves, in addition to the non-positive powers, the positive ones (z − λ)^k, k ≥ 1.

Proposition 8.8.4 Spectral analysis of (8.8.1) around λ ≠ 0 entails for k ≥ 1 the
infinite sequence of relations

B S_E^{k+1} A = (I + λS_F) S_F^k.  (8.8.6)

PROOF Direct identification to 0.

When z → λ, the l + 1 relations in Proposition 8.8.3 are important because
(z − λ)^{−k} converges either to 1 or to ∞. We conjecture that these relations underlie
many phenomena commonly observed in the world. But for k ≥ 1, (z − λ)^k converges
to 0, and this tends to hide the sequence (8.8.6) when z is close enough to λ.

8.8.5 The Exceptional Case λ = 0

When λ = 0, theory tells us that the indices l_E and l_F for 0 satisfy l_E ∈ {1, 2}
for l_F = 1, and l_E ∈ {l_F − 1, l_F, l_F + 1} for l_F ≥ 2 [16]. Only additional in­formation about E and F (such as numerical experimentation) can indicate which
possibility is actually the case. The global connection (8.8.1) which exists between
the non-commuting square matrices A and B (AB ≠ BA) provides a rational basis
for the many holistic aspects of life which are more and more frequently observed in
experimental biology and ecology. It also suggests a fundamental difference between
the two cases λ = 0 and λ ≠ 0.

This chapter has offered a selection of snapshots taken in the booming domain of ma­trix computation which contain eigenvalues at their inner core — in an explicit and,
at times, implicit fashion. Due to the fast, bush-like evolution of the domain (from
pure algebra to finance and the Internet), many other aspects could have been pre­sented. Admittedly, the chapter betrays the author's personal views on the evolution
of mathematical computation over the ages [7, 8, 9, 10].
In her view of computation, matrices will prove themselves to be even more essen­tial tools than vectors in the scientific understanding of the ever-changing scheme of
living organisms. The polymorphic and possibly ambiguous character of the dynam­ical information that matrices carry makes them versatile macro-scalars upon which
a complex multilevel processing of information can be built.


The attention of the reader is drawn to [1, 2, 11, 12, 13, 16, 19, 22], whose popularity
within the numerical analysis community may not parallel the depth of their signif­icance for matrix computation. On the theoretical side, Dickson algebras and HD
are studied at length in [8] as two aspects of a developing theory of mathematical
computation called "qualitative computing".

Section 8.5 Polar Representations of A

8.5.1[C] When r(A) = n − 1 for A ∈ ℂ^{n×n}, show that there exist exactly two
left phase factors U₁ and U₂ = (I − T)U₁, where T = 2aa*, a ∈ Ker A*, ‖a‖₂ = 1. Interpret
½T. Deduce that ‖U₁ − U₂‖₂ = 2.

8.5.2[D] When r(A) ≤ n − 2, show that there exist uncountably many distinct
phase factors for A.

8.5.3[D] Use Exercises 8.5.1 and 8.5.2 to show that, for A ∈ ℂ^{n×n}, the number
of distinct phase factors for A can take the values {1, 2} for n = 2 and {1, 2, ∞} for
n ≥ 3. In the latter case, ∞ is uncountable.

8.5.4[D] When 1 ≤ r(A) < n, show that the matrix T defined in Proposition
8.5.1 satisfies max ‖T‖₂ = 2.

Section 8.6 The Yield of A Hermitian Positive Semi-definite under Spectral Coupling


8.6.1[C] Let 0 < λ < λ′ be two distinct eigenvalues of A Hermitian positive
semi-definite, with associated normalised eigenvectors x, x′ which span the invariant
subspace M = lin(x, x′). Let P = QQ* be the orthogonal projection on M,
Q*Q = I₂. Let B = Q*AQ = ℬ(Q) represent the 2 × 2 Rayleigh quotient of A on
M; see p. 34 in Chapter 1, Section 1.11.1.
(1) Show that B is Hermitian positive definite with sp(B) = {λ, λ′}.
(2) Set B = ( a b ; b̄ c ). Show that tr B = a + c = λ + λ′ > 0 and det B =
λλ′ = ac − |b|² > 0.
(3) Prove that |b|² > 0 ⟺ λ < {a, c} < λ′.
(4) Suppose that λ < a ≤ λ̄ = (λ + λ′)/2 ≤ c < λ′. Show that 0 < |b| ≤ (λ′ − λ)/2.
(5) Show that sgn b = b/|b|, b ≠ 0, is not specified by the triple {a, λ, λ′}.

8.6.2[C] Let Q = [q₁, q₂] and choose q₁ to be the catchvector t = wx + w′x′
over ℂ.
(1) Show that ‖At‖ = √(λλ′) = g.
(2) Deduce that a = (At, t) = ‖At‖ cos φ = ‖A^{1/2}t‖² = h, c = (λ² + λ′²)/(λ + λ′) and
|b| = √(λλ′) sin φ.
(3) Explain why q₂ is necessarily of the form zx + z′x′ with |z|² = λ/(λ + λ′) and
|z′|² = λ′/(λ + λ′).
(4) Confirm the value c = (λ² + λ′²)/(λ + λ′) = (Au, u) with the choice q₂ = u = √(λ/(λ + λ′)) x −
√(λ′/(λ + λ′)) x′. Conclude that B can take the remarkable form B = √(λλ′) C with

C = (  cos φ                               −sin φ                              )
    ( −sin φ   (1/(λ + λ′))(λ√(λ/λ′) + λ′√(λ′/λ)) ).

Check that det C = 1.

Solutions to Exercises

1.1.1 The columns of (V⁻¹)* are the adjoint basis of the columns of V when
V is non-singular. We remark that when V is not a square matrix, then the
adjoint basis either does not exist or else is not unique. For example, when

V = ( 1 0 )
    ( 0 1 )
    ( 1 0 ),

then there exist at least two adjoint bases.
1.1.6 The spectral radius of a Hermitian positive semi-definite matrix T can
be characterised by

ρ(T) = max_{‖x‖₂=1} x*Tx.

If we put T = A*A, it is found that ‖A‖₂² = max_{‖x‖₂=1} ‖Ax‖₂² = ρ(A*A).

1.1.7 If Q*Q = I, then

‖Q‖₂ = ‖Q⁻¹‖₂ = ‖Q*‖₂ = 1.
1.1.8 If A is a singular matrix, so is A*A. Hence zero is an eigenvalue of A*A
and therefore a singular value of A. Let n ≥ r and A ∈ ℂ^{n×r}; if one column of A
depends linearly on the other columns, then zero is a singular value of A.
1.2.1 Let

r = dim M = dim N > n/2 and t = dim(M ∩ N).

Then

t ≥ 2r − n > 0.

Let Q ∈ ℂ^{n×r} be an orthonormal basis of M whose first t columns form a basis
of M ∩ N. Let U ∈ ℂ^{n×r} be an orthonormal basis of N whose first t columns are
those of Q. Then

U*Q = ( I_t  0 )
      ( 0    W ),

where W is of order r − t.
This shows that U*Q has at least t singular values which are equal to unity;
hence there are at most r − t non-zero canonical angles between M and N, and

r − t ≤ n − r < n/2.

Write

Q = (V, Q′) and U = (V, U′), with lin(V) = M ∩ N,

and put M′ = lin(Q′) and N′ = lin(U′). Then

M′ ∩ N′ = {0},  (M ∩ N) + M′ + N′ = M + N,

and the non-zero canonical angles between M and N are the same as those
between M′ and N′.
1.2.2 Let Q ∈ ℂ^{n×m} be an orthonormal basis of M and let U ∈ ℂ^{n×m} be an
orthonormal basis of N. Let θ₁ be the greatest canonical angle between M and
N, and put c₁ = cos θ₁. Then

θ₁ = π/2 ⟺ c₁ = 0 ⟺ U*Q is singular
⟺ ∃ u ∈ ℂ^m such that u ≠ 0 and U*Qu = 0
⟺ ∃ x ∈ ℂⁿ such that x ≠ 0 and x ∈ M ∩ N^⊥.

We remark that the vector u represents the coordinates of x in the basis Q of M.

1.2.3 T is a basis of N if and only if the matrix Y*X is regular. However,
according to Proposition 1.2.2, Y*X ~ cos Θ is invertible since θᵢ < π/2. Now
one computes:

(T* − X*)(T − X) = I − (X*Y)(Y*X).

If σᵢ is a singular value of T − X, then

σᵢ² = 1 − cos² θᵢ = sin² θᵢ.
1.2.6 Let X = (X₁, X₂) and Y = (Y₁, Y₂) be orthonormal bases of ℂⁿ such that
X₁ is a basis of M and Y₁ is a basis of N:

X₁ ∈ ℂ^{n×r}, Y₁ ∈ ℂ^{n×r}, X*X = Y*Y = I,

where r ≤ n/2. Let

W = X*Y = ( X₁*Y₁  X₁*Y₂ ) = ( W₁₁  W₁₂ )
          ( X₂*Y₁  X₂*Y₂ )   ( W₂₁  W₂₂ ).

There exist a unitary matrix Z₁ ∈ ℂ^{r×r} and a unitary matrix V₁ ∈ ℂ^{r×r} such that

C = Z₁*W₁₁V₁ = diag(c₁,…,c_r)

is the singular value decomposition of W₁₁. Hence, by the definition of canonical
angles, we have c_i = cos θ_i, with c_i < 1 for i = 1,…,k and c_i = 1 for i > k,
where k ≤ r. We define

C′ = diag(c₁,…,c_k).

The columns of the matrix

( W₁₁V₁ )
( W₂₁V₁ )

are orthonormal; hence

I = V₁*W₁₁*W₁₁V₁ + (W₂₁V₁)*(W₂₁V₁) = C² + (W₂₁V₁)*(W₂₁V₁),

and so

(W₂₁V₁)*(W₂₁V₁) = diag(s₁²,…,s_k²; 0,…,0),
s_i ≠ 0 and s_i² + c_i² = 1 (i = 1,…,k).

Let Z₂ ∈ ℂ^{(n−r)×(n−r)} be a unitary matrix whose first k columns are those of W₂₁V₁
when normalised. Then

Z₂*W₂₁V₁ = S = diag(s₁,…,s_k; 0,…,0) ∈ ℂ^{(n−r)×r}.

Let S′ = diag(s₁,…,s_k). Then S′ is regular and

( Z₁*  0  )( W₁₁ ) V₁ = ( C )
( 0   Z₂* )( W₂₁ )      ( S ).

In an analogous manner we determine a unitary matrix V₂ ∈ ℂ^{(n−r)×(n−r)} such that

Z₁*W₁₂V₂ = (T, 0),

where T is a diagonal matrix with non-positive elements such that

T² + C² = I_r.

Thus T = −S. Let

Z = ( Z₁  0 )   and   V = ( V₁  0 ).
    ( 0  Z₂ )             ( 0  V₂ )

Then

Z*WV = ( C′  0        −S′  0    0   )
       ( 0   I_{r−k}   0   0    0   )
       ( S′  0        X₃₃  X₃₄  X₃₅ )
       ( 0   0        X₄₃  X₄₄  X₄₅ )
       ( 0   0        X₅₃  X₅₄  X₅₅ ).

The columns of this matrix are orthonormal; thus S′X₃₄ = 0 and so X₃₄ = 0. Also
we deduce that X₃₅ = 0, X₄₃ = 0 and X₅₃ = 0; moreover,

−C′S′ + S′X₃₃ = 0,

whence X₃₃ = C′. The matrix

Z₃ = ( X₄₄  X₄₅ )
     ( X₅₄  X₅₅ )

is unitary and

Z*WV = ( C′  0        −S′  0  )   ( I_k  0        0    0  )( C′  0        −S′  0         )
       ( 0   I_{r−k}   0   0  ) = ( 0    I_{r−k}  0    0  )( 0   I_{r−k}   0   0         )
       ( S′  0         C′  0  )   ( 0    0        I_k  0  )( S′  0         C′  0         )
       ( 0   0         0   Z₃ )   ( 0    0        0    Z₃ )( 0   0         0   I_{n−r−k} ).

Redefining

Z₂ := Z₂ ( I_k  0 )   and   Z = ( Z₁  0 ),
         ( 0   Z₃ )             ( 0  Z₂ )

we obtain

[Q Q̃]*[U Ũ] = ( C  −S  0        )
              ( S   C  0        )
              ( 0   0  I_{n−2r} ),

where
Q = X₁Z₁ is an orthonormal basis of M,
Q̃ = X₂Z₂ is an orthonormal basis of M^⊥,
U = Y₁V₁ is an orthonormal basis of N,
Ũ = Y₂V₂ is an orthonormal basis of N^⊥,
and Q*U = C.

We conclude that, in order to compute the canonical angles between two
subspaces of common dimension greater than n/2, it suffices to compute the
canonical angles between their orthogonal complements.
1.6.5 Suppose that all the eigenvalues λ₁,…,λₙ of A are distinct. Put
D = diag(λ₁,…,λₙ).
Let Q be a unitary matrix such that

Q*AQ = D + N₁

is a Schur form of A. Let U = (u_{ij}) be a unitary matrix such that

(QU)*A(QU) = D + N₂

is another Schur form of A. The matrices N₁ = (n_{ij}^{(1)}) and N₂ = (n_{ij}^{(2)}) are strictly
upper triangular matrices.
We have that U(D + N₂) = (D + N₁)U, whence, for all i and j,

(λⱼ − λᵢ)u_{ij} + Σ_{k=1}^{j−1} u_{ik} n_{kj}^{(2)} − Σ_{k=i+1}^{n} n_{ik}^{(1)} u_{kj} = 0.

Put j = 1 and i = n; we find that u_{n1} = 0. Suppose that for k ≥ 2 we have

i ∈ {k, k+1,…,n−1, n} ⟹ u_{i1} = 0;

we deduce that u_{k−1,1} = 0. This shows that

u_{i1} = 0 for i = 2, 3,…,n.

Now suppose that for j = 2, 3,…,l and k ≥ j + 1 we have

i ∈ {k, k+1,…,n−1, n} ⟹ u_{ij} = 0.

It then follows that

u_{ij} = 0 for j = 1, 2,…,n−1 and i = j+1,…,n.

We leave to the reader the task of verifying that, in the presence of repeated
eigenvalues, the diagonal of U will contain blocks.

1.6.16 Let P = (p_{ij}) be defined by

p_{ij} = 1 if i + j = n + 1, 0 otherwise.

Then

P⁻¹ = P = P*  and  J* = P*JP.
1.6.19 Let X be a basis of eigenvectors of A and let Q be a Schur basis:

A = XDX⁻¹,  Q*AQ = D + N,

where D is the diagonal of eigenvalues and N is a strictly upper triangular
matrix. Then one uses the identities

‖DD*‖_F = ‖D*D‖_F = ‖D²‖_F ≤ ‖A²‖_F

and

‖AA* − A*A‖²_F = 2(‖A*A‖²_F − ‖A²‖²_F).

1.6.20 Let Q be a Schur basis. Thus

Q*AQ = R = D + N,

where D is a diagonal matrix and N is a strictly upper triangular matrix. Define

Γ = R*R − RR* = (γ_{ij}).

By induction on the size of A it can be proved that

‖N‖²_F ≤ ((1−n)/2)γ₁₁ + ((3−n)/2)γ₂₂ + … + ((n−1)/2)γ_{nn}.

By the Cauchy-Schwarz inequality,

‖N‖²_F ≤ ((n³ − n)/12)^{1/2} ‖Γ‖_F.

Moreover

A*A − AA* = QΓQ*,

and so ‖Γ‖_F = ‖A*A − AA*‖_F = ν(A).
If D ≠ λI, then ‖D‖_F ≠ 0 and the number

δ = max |λᵢ − λⱼ|

is positive, where the λᵢ are the distinct eigenvalues of A. Put

a = ‖N‖_F/‖A‖_F,  b = ν(A)/(√2 ‖A‖²_F),  c = δ/(√2 ‖D‖_F).

It is easy to show that

b − 1 + a² ≤ √2 c a √(1 − a²).

If b − 1 + a² < 0, then b²/3 ≤ b < 1 − a², and we obtain the inequality required.
If b − 1 + a² ≥ 0, then

(1 + 2c²)a⁴ − 2(1 − b + c²)a² + (b² − 2b + 1) ≤ 0,

whence

a² ≤ (1/(1 + 2c²))[1 − b + c² + c(c² + 2b − b²)^{1/2}].

Since c² ≤ 1, we have a² ≤ 1 − b²/3, that is

‖N‖²_F ≤ ‖A‖²_F − ν²(A)/(6‖A‖²_F).
1.8.1 Let X ∈ ℂ^{n×m} and suppose that X is of rank r < m. Hence there exists a
permutation matrix Π such that the singular value decomposition of XΠ can
be written

V*XΠU = ( Σ  0 )
        ( 0  0 ),

where V is a unitary matrix of order n, U is a unitary matrix of order m and Σ
is a non-singular diagonal matrix of order r. We may write

U = ( U₁₁  U₁₂ )
    ( U₂₁  U₂₂ ),

where U₁₁ is of order r, and hence

XΠ = V ( ΣU₁₁*  ΣU₂₁* )
       ( 0      0     ).

Let ΣU₁₁* = Q₁₁R₁₁ be the Schmidt factorization of the non-singular matrix ΣU₁₁*. Hence Q₁₁ is
unitary and R₁₁ is an upper triangular matrix, both of order r. Hence XΠ = QR with

Q = V ( Q₁₁  0       )   and   R = ( R₁₁  Q₁₁*ΣU₂₁* )
      ( 0    I_{n−r} )             ( 0    0         ),

Q being unitary and R an upper triangular matrix.
1.9.1 Let C = A + B, where A and B are Hermitian and B is positive semi-definite. For all u ∈ ℂⁿ such that ‖u‖₂ = 1, we have u*Bu ≥ 0 and

u*Cu = u*Au + u*Bu ≥ u*Au.

On taking the maximum on the left subject to ‖u‖₂ = 1 (and to the constraints of
the min-max characterisation), we conclude that λᵢ(C) ≥ λᵢ(A) for each i.
1.9.2 Let C = A − B, where A and B are Hermitian. We consider the spectra
arranged in decreasing order. Thus

λᵢ(A) = min_{v₁,…,v_{i−1}} max_{‖u‖₂=1, u*vⱼ=0 (j=1,…,i−1)} u*Au.

Moreover

u*Au = u*Bu + u*Cu,
λ₁(C) = max_{‖u‖₂=1} u*Cu,
λₙ(C) = min_{‖u‖₂=1} u*Cu = −max_{‖u‖₂=1}(−u*Cu).

When ‖u‖₂ = 1, we have

u*Bu + λₙ(C) ≤ u*Au ≤ u*Bu + λ₁(C).

Now take the maximum subject to

‖u‖₂ = 1 and u*vⱼ = 0 (j = 1,…,i−1),

and the minimum extended over all v₁,…,v_{i−1}; it is found that

λₙ(C) + λᵢ(B) ≤ λᵢ(A) ≤ λᵢ(B) + λ₁(C).

If C is positive semi-definite, then λₙ(C) ≥ 0, and so λᵢ(A) ≥ λᵢ(B).

1.9.3 Let A be normal, D diagonal, N strictly upper triangular and Q unitary,
such that

Q*AQ = D + N.

Then N*N = NN*; if N = (n_{ij}), then i ≥ j ⟹ n_{ij} = 0 and

Σ_{k=1}^{n} n̄_{ki} n_{kj} = Σ_{k=1}^{n} n_{ik} n̄_{jk}  for all i, j.

On putting i = j = 1, we deduce that n_{1k} = 0 for all k. Suppose we have proved
that n_{jk} = 0 for all k when j ≤ l; we then conclude that n_{l+1,k} = 0 for all k.
Hence N = 0 and the eigenvectors of A form a unitary matrix. This implies
that all spectral projections are orthogonal and that A is diagonalisable.
Nevertheless, a normal non-Hermitian matrix can have complex eigenvalues.
For example, a diagonal matrix containing at least one non-real element is
normal without being Hermitian.
1.9.4 According to Exercise 1.9.3 we have

Q*AQ = D and Q*A*Q = D*.

Hence Q is also a matrix of eigenvectors of A*. Thus

A*AQ = A*QD = QD*D,
A*A = Q(D*D)Q*,
ρ(A*A) = ρ(D*D),
‖A‖₂ = ρ(A).

1.9.5 Suppose the spectrum is arranged in increasing order of magnitude. We
start with the characterization

λⱼ(A) = min_{dim S=j} max_{u∈S} (u*Au)/(u*u)
      = max_{dim S=n−j+1} min_{u∈S} (u*Au)/(u*u)
      = max_{dim S^⊥=j−1} min_{u⊥S^⊥} (u*Au)/(u*u)
      = max_{dim S=j−1} min_{u⊥S} (u*Au)/(u*u).

1.9.6 Let A and B be two Hermitian matrices. Since

ρ(T) ≤ ‖T‖₂ for all T ∈ ℂ^{n×n},

we conclude that

C = A − B + ‖A − B‖₂ I

is positive semi-definite. Let

A′ = A + ‖A − B‖₂ I.

Then A′ = B + C, and we apply Exercise 1.9.2 in order to deduce that the eigenvalues of A and
of B may be numbered in such a way that

λᵢ(A′) = λᵢ(A) + ‖A − B‖₂ ≥ λᵢ(B).

Exchanging the roles of A and B, we have that

|λᵢ(A) − λᵢ(B)| ≤ ‖A − B‖₂.

2.2.3 First we state the following facts. If

J = (x_{αβ})

is a square matrix such that

x_{αβ} = λ if α = β,  1 if β = α + 1,  0 otherwise,

then
(a) J is non-singular if and only if λ ≠ 0;
(b) if λ ≠ 0 and J⁻¹ = (y_{αβ}), then

y_{αβ} = 0 if α > β,  (−1)^{β−α} λ^{−(β−α+1)} if α ≤ β.

On the other hand, let V be a Jordan basis of A: V = (X₁,…,X_d). If V⁻¹ is
partitioned conformally into block rows (Y₁*;…;Y_d*), then

P_j = X_j Y_j*

is the spectral projection of A associated with the eigenvalue λ_j. If z ∉ sp(A), then
P_j is also the spectral projection of (A − zI)⁻¹ associated with the eigenvalue
(λ_j − z)⁻¹. We conclude that

A = Σ_{j=1}^{d} X_j J_j Y_j*

is the spectral decomposition of A, and therefore that of (A − zI)⁻¹ is given by

(A − zI)⁻¹ = Σ_{j=1}^{d} X_j (J_j − zI)⁻¹ Y_j*,
(J_j − zI)⁻¹ = −Σ_{q=0}^{l_j−1} (J_j − λ_j I)^q / (z − λ_j)^{q+1},

l_j being the least integer such that (J_j − λ_j I)^{l_j} = 0.
Finally, let Γ_k be a Jordan curve which isolates λ_k from the rest of the spectrum
of A. Then

∮_{Γ_k} dz/(z − λ_j) = 2πi if j = k, 0 otherwise,

and

∮_{Γ_k} dz/(z − λ_j)^{q+1} = 0  (j = 1, 2,…,d; q > 0),

so that

−(1/2πi) ∮_{Γ_k} (A − zI)⁻¹ dz = P_k = X_k Y_k*.

2.3.1 The system

has n unknowns, n + r equations and is of rank n. If we use Gaussian elimination

with partial pivoting, then we obtain a system of the form

where T is an upper triangular matrix of order n. The matrix T is obtained by
premultiplying in turn by permutation matrices and Gaussian elementary
matrices. If, instead of the latter, we use Householder matrices (Exercise 1.8.5),
then we obtain the Schmidt factorization.
The final structure is of the same form.
2.3.2 It is trivial that diagonalisable matrices with the same spectrum are
similar. Defective matrices are similar if and only if they possess the same
spectral structure (eigenvalues with the same algebraic and geometric multi­plicities and the same indices); that is, if and only if they possess the same
Jordan form.
2.3.3 We show that λ is not an eigenvalue of (I − Π)A: if λ were an eigenvalue
of (I − Π)A, we should have

(I − Π)Au = λu

for some u ≠ 0. If λ ≠ 0, then

0 = Π(I − Π)Au = λΠu ⟹ Πu = 0,

whence u = (I − Π)u ≠ 0 and

(I − Π)A(I − Π)u = λu;

that is, λ is an eigenvalue of (I − Π)A(I − Π), which is impossible because λ is
not an eigenvalue of B. We deduce that the unique solution is

x = Σb.
2.3.6 We refer to the identities
$$(I - \Pi)(I - \Pi_1) = I - \Pi, \qquad (I - \Pi_1)(I - \Pi) = I - \Pi_1$$
in order to establish the equations
$$\Sigma(\Pi_1) = (I - \Pi_1)\Sigma(\Pi), \qquad \Sigma(\Pi) = (I - \Pi)\Sigma(\Pi_1),$$
whence we deduce the inequalities
$$\|\Sigma(\Pi_1)\|_2 \leq \|\Sigma(\Pi)\|_2.$$
2.4.3 Consider the Sylvester equation
$$(I - P)AX - XB = R$$
for a given matrix $R \in \mathbb{R}^{n \times 2}$. The $2 \times 2$ matrix $B$ is the Rayleigh compression
corresponding to the spectral projection $P$ associated with a double non-zero
eigenvalue $\lambda$. Let $V = (V_1\ V_2)$ be the Jordan basis of $B$. The Jordan form of $B$ is
$$J = \begin{pmatrix} \lambda & \alpha \\ 0 & \lambda \end{pmatrix},$$
where $\alpha = 0$ if $\lambda$ is semi-simple and $\alpha = 1$ if $\lambda$ is defective. Now carry out the
change of unknowns
$$Y = (Y_1\ Y_2) = XV,$$
and the new equation becomes
$$(I - P)AY - YJ = RV.$$
An easy computation yields
$$Y_1 = SRV_1, \qquad Y_2 = SRV_2 + \alpha SY_1 = SRV_2 + \alpha S^2RV_1.$$
Hence the reduced resolvent $S$ and the block-reduced resolvent $\hat S$ satisfy
$$\hat SR = X = YV^{-1} = SR + \alpha S^2R\,(0\ V_1)\,V^{-1}.$$
Hence $\hat SR = SR$ if $\lambda$ is semi-simple (that is, $\alpha = 0$).
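In practice a Sylvester equation of this form can be solved directly. The sketch below is a small illustration with made-up matrices ($M$ stands in for $(I-P)A$, and $B$ is a defective $2 \times 2$ block); it uses `scipy.linalg.solve_sylvester`, whose convention is $AX + XB = Q$.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Illustrative data: sp(M) must be disjoint from sp(B) for a unique solution.
M = np.diag([3.0, 4.0, 5.0, 6.0])        # stands in for (I - P)A
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])               # defective block: alpha = 1
R = np.arange(8.0).reshape(4, 2)

# M X - X B = R  is  M X + X (-B) = R  in solve_sylvester's convention.
X = solve_sylvester(M, -B, R)
residual = np.linalg.norm(M @ X - X @ B - R)
```

The spectra of $M$ and $B$ being disjoint is exactly the condition under which the equation has a unique solution.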

2.6.1 According to Lemma 2.6.2,
$$A'(x_k - x_{k-1}) = b - Ax_{k-1} \Longleftrightarrow A'(x_k - x) = (A' - A)(x_{k-1} - x).$$
Hence $x_k$ converges to $x = A^{-1}b$ if and only if $\rho[A'^{-1}(A' - A)] < 1$, in exact
arithmetic. Why can the computation of $b - Ax_k$ present a problem in finite
precision arithmetic?
Verify that, in arithmetic with three decimal places, the solution of
$$\begin{pmatrix} 0.986 & 0.579 \\ 0.409 & 0.237 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 0.235 \\ 0.107 \end{pmatrix}$$
gives rise to the iterates
$$\begin{pmatrix} 2.11 \\ -3.17 \end{pmatrix}, \quad \begin{pmatrix} 1.99 \\ -2.99 \end{pmatrix}, \quad \begin{pmatrix} 2.00 \\ -3.00 \end{pmatrix}, \ \ldots$$
The exact solution is $(2, -3)$.
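The point of the question can be seen in a small experiment. The sketch below (an assumed setup, not from the text) uses a float32 factorization as the "inexact" $A'$ while computing the residual $b - Ax_k$ in float64, which is exactly where finite precision can otherwise destroy the correction:

```python
import numpy as np

A = np.array([[0.986, 0.579], [0.409, 0.237]])
b = np.array([0.235, 0.107])

A32 = A.astype(np.float32)                # plays the role of A'
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
for _ in range(5):
    r = b - A @ x                         # residual in double precision
    d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    x = x + d                             # refinement step
```

If the residual were accumulated in the working (low) precision, cancellation in $b - Ax_k$ would leave no correct digits to drive the correction.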
2.10.1 In accordance with hypothesis (H2), there exists $r_1 \in (0, r)$ such that
$F'(x)$ is non-singular for each $x$ such that $\|x - x^*\| < r_1$. Now the map
$x \mapsto F'(x)^{-1}$ is continuous in a neighbourhood of $x^*$. Hence there exist
$r_2 \in (0, r_1)$ and $\mu > 0$ such that $\|F'(x)^{-1}\| \leq \mu$ whenever $\|x - x^*\| < r_2$.
Finally, there exists $\rho \in (0, r_2)$ such that
$$\|x - x^*\| < \rho \ \text{ and } \ \|y - x^*\| < \rho \Longrightarrow \|F'(x) - F'(y)\| < \frac{1}{2\mu}.$$
Set
$$x_k(t) = x^* + t(x_k - x^*) \quad (0 \leq t \leq 1),$$
so that
$$F(x_k) = \int_0^1 F'(x_k(t))(x_k - x^*)\,\mathrm{d}t.$$
Suppose that $\|x_k - x^*\| < \rho$ (which is true when $k = 0$). Hence
$$x_{k+1} - x^* = -F'(x_k)^{-1}\int_0^1 [F'(x_k(t)) - F'(x_k)](x_k - x^*)\,\mathrm{d}t$$
and
$$\|x_{k+1} - x^*\| \leq \tfrac{1}{2}\|x_k - x^*\|.$$
On the one hand, this shows that $x_{k+1}$ satisfies $\|x_{k+1} - x^*\| < \rho$,
and, on the other hand, that
$$\lim_{k \to \infty} x_k = x^*.$$

Now, on using the sharper bound
$$\|x_{k+1} - x^*\| \leq \mu\|x_k - x^*\| \sup_{0 \leq t \leq 1} \|F'(x_k(t)) - F'(x_k)\|$$
and the fact that
$$\lim_{k \to \infty}\ \sup_{0 \leq t \leq 1} \|F'(x_k(t)) - F'(x_k)\| = 0,$$
we conclude that the convergence is superlinear.


2.10.2 By virtue of (H3) we have
$$\sup_{0 \leq t \leq 1} \|F'(x_k(t)) - F'(x_k)\| \leq l\|x_k - x^*\|^p,$$
whence, by Exercise 2.10.1,
$$\|x_{k+1} - x^*\| \leq \mu l\|x_k - x^*\|^{1+p}.$$
If the Jacobian matrix satisfies the Lipschitz condition, then $p = 1$ and the
convergence is quadratic.
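A quick numerical sketch of this quadratic convergence, on an assumed toy system (not from the text) whose Jacobian is Lipschitz: Newton's method roughly squares the error at every step until it hits machine precision.

```python
import numpy as np

def F(x):
    # Intersect the unit circle with the line x1 = x2.
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x_star = np.array([np.sqrt(0.5), np.sqrt(0.5)])
x = np.array([1.0, 0.5])
errors = []
for _ in range(6):
    x = x - np.linalg.solve(J(x), F(x))   # Newton step
    errors.append(np.linalg.norm(x - x_star))
```

Printing `errors` shows the exponent of the error roughly doubling from one step to the next, the signature of quadratic convergence.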
2.11.2 We denote $\gamma' = \|J'^{-1}\|$, $p = \|HX'\|$, $t = \|X'\|$, $u = \|Y^*\|$, $s = \|Y^*A(I -
X'Y^*)\|$, $V_1 = J'^{-1}HX'$; $\|V_1\| \leq \gamma'p = \pi_1$ by definition. We suppose that $\|V_k\| \leq
\pi_k$, and set
$$\varepsilon_1 = \|J'^{-1}H\|, \quad \varepsilon_2 = \|Y^*HX'\|, \quad \eta = \varepsilon_1 + \gamma'\varepsilon_2 \quad \text{and} \quad \varepsilon = \gamma'^2sp.$$
Then
$$\|V_{k+1}\| \leq \pi_1 + \varepsilon_1\pi_k + \gamma'\varepsilon_2\pi_k + \gamma's\pi_k^2 = \pi_1 + (\varepsilon_1 + \gamma'\varepsilon_2)\pi_k + \gamma's\pi_k^2 = \pi_{k+1} \ \text{(say)}.$$
Set $\pi_k = \pi_1(1 + x_k)$ for $k \geq 1$. For $k = 1$, $x_1 = 0$; for $k = 2$,
$$\pi_2 = \pi_1(1 + \eta + \gamma's\pi_1) = \pi_1(1 + \eta + \varepsilon) = \pi_1(1 + x_2);$$
in general
$$\pi_{k+1} = \pi_1[1 + \eta(1 + x_k) + \varepsilon(1 + x_k)^2] = \pi_1[1 + x_{k+1}],$$
which defines the recurrence relation $x_1 = 0$, $x_{k+1} = \eta + \varepsilon + (\eta + 2\varepsilon)x_k + \varepsilon x_k^2$.
The limit $x$ satisfies $x = f(x) = \varepsilon x^2 + (\eta + 2\varepsilon)x + \eta + \varepsilon$; $x = f(x)$ has two real
roots if $2\sqrt{\varepsilon} + \eta < 1$. (One can verify that $2\sqrt{\varepsilon} + \eta < 1$ implies that the
discriminant is positive.) Let $x^*$ denote the smallest root. When $k \to \infty$, $x_k$
converges monotonically from $x_1 = 0$ towards $x^*$, and $\pi_k$ converges to $\pi_1(1 + x^*)$.
We determine a sufficient condition under which $G'$ is a contractive map in the
closed ball
$$\mathscr{B} = \{V;\ \|V\| \leq (1 + x^*)\pi_1\}:$$
$$G'(V) - G'(V') = J'^{-1}[H(V - V') - (V - V')Y^*HX' + (V - V')Y^*AV + V'Y^*A(V - V')],$$
so that $\xi = G'(V) - G'(V')$ satisfies
$$\|\xi\| \leq \|V - V'\|\,[\varepsilon_1 + \gamma'\varepsilon_2 + 2\gamma's(1 + x^*)\pi_1].$$
It is easy to check that if $\eta + 2\varepsilon < \tfrac{1}{2}$ (which implies $\eta + 2\sqrt{\varepsilon} < 1$) then $x^* < 1$
and the Lipschitz constant $k$ for $G'$ in $\mathscr{B}$ satisfies $k = \eta + 2\varepsilon(1 + x^*) < 1$. Therefore,
there exists a unique fixed point $V = X - X'$ in $\mathscr{B}$, with $Y^*V = 0$, such that
$$\|V\| = \|X - X'\| \leq 2\|V_1\| = 2\|J'^{-1}HX'\|.$$
Moreover,
$$B - B' = Y^*AX - Y^*A'X' = Y^*A(X - X') + Y^*(A - A')X'.$$
The condition $\eta + 2\varepsilon < \tfrac{1}{2}$ is rewritten as
$$\gamma'\|H\| + \gamma'\|H\|tu + 2\gamma'^2s\|H\|t < \tfrac{1}{2},$$
that is,
$$\gamma'\|H\|\,[1 + tu + 2\gamma'st] < \tfrac{1}{2}.$$
If we choose the Euclidean norm $\|\cdot\|_2$ and the bases $Y = X' = Q$ to be
orthonormal, then $t = u = 1$, and the sufficient condition becomes
$$\gamma'\|H\|_2\,[2 + 2\gamma's] < \tfrac{1}{2}.$$
2.11.3 We denote $\gamma = \|J^{-1}\|$, $\|\tilde B - B\| = \delta$, $\|R\| = p$ with $R = AU - UB$, and
$\|S\| = s$, where $S = Y^*A - BY^* = Y^*A(I - UY^*)$ is the left residual matrix. We
define $V_1 = -J^{-1}R$ and $G: V \mapsto V_1 + J^{-1}[V(Y^*AV) + V(\tilde B - B)]$.
(2.11.4) is a modification of (2.11.1) in which the block $B$ of the block
decomposition has been replaced by $\tilde B$. We have $\|V_1\| \leq \gamma p = \pi_1$, and we suppose that $\|V_k\| \leq
\pi_k$. Then
$$\|V_{k+1}\| \leq \pi_1 + \gamma s\pi_k^2 + \gamma\pi_k\delta = \pi_{k+1}.$$
Upon setting $\pi_k = \pi_1(1 + x_k)$,
$\varepsilon = \gamma^2sp$ and $\delta' = \gamma\delta$, we obtain $x = \lim_{k \to \infty} x_k$, which satisfies $x = f(x) = \varepsilon x^2 +
(2\varepsilon + \delta')x + \varepsilon$. This equation has two real roots if $\delta' + 4\varepsilon < 1$. Under this
condition, $x_k$ converges monotonically from $x_1 = 0$ towards $x^*$, the smallest
root. We consider now $G$ on the ball
$$\mathscr{B} = \{V;\ \|V\| \leq (1 + x^*)\pi_1\}:$$
$$G(V) - G(V') = J^{-1}[(V - V')Y^*AV + V'Y^*A(V - V') + (V - V')(\tilde B - B)],$$
$$\|G(V) - G(V')\| \leq \|V - V'\|\,[2\varepsilon(1 + x^*) + \delta'].$$
We leave it to the reader to check that the condition $\delta' + 4\varepsilon < 1$ implies $x^* < 1$;
then
$$k = 2\varepsilon(1 + x^*) + \delta' < 4\varepsilon + \delta' < 1.$$
This condition is written as $4\gamma^2sp + \gamma\delta < 1$. One should remark that, without
perturbation, (2.11.1) converges if $4\gamma^2sp < 1$, whereas, with the perturbation
$\tilde B - B$, (2.11.4) converges if
$$4\gamma^2sp + \gamma\delta < 1, \quad \text{where } \delta = \|\tilde B - B\|.$$
We conclude that the unique fixed point $V = X - U$ satisfies
$$\|V\| = \|X - U\| \leq 2\|J^{-1}R\|.$$

2.11.4 We have the identity
$$x_{k+1} - x^* = -T^{-1}\int_0^1 [F'(x_k(t)) - T](x_k - x^*)\,\mathrm{d}t,$$
$$x_k(t) = x^* + t(x_k - x^*) \quad (0 \leq t \leq 1).$$
Hence, if $\|x_k - x^*\| < \rho$, we have
2.11.6 Let $x \in \Omega = \{x \in B: \|x - x_0\| \leq \rho\}$. Define the operators
$$G(x) = x - F'(x_0)^{-1}F(x),$$
$$L(x) = F(x) - F(x_0) - F'(x_0)(x - x_0).$$
It is easy to prove the following inequalities:
$$\|L(x) - L(y)\| \leq l\rho\|x - y\| \leq l\rho^2,$$
$$\|G(x) - x_0\| \leq ml\rho^2 + c \leq \rho,$$
$$\|G'(x)\| \leq ml\rho = \gamma,$$
provided that $x, y \in \Omega$. The sequel is left to the reader, who must apply the fixed
point theorem to the operator $G$.
2.12.2 Consider the system
Ax = b (1)
and a non-singular matrix B such that
$$\mathrm{cond}_2(BA) \ll \mathrm{cond}_2(A).$$
Hence the equivalent system
BAx = Bb
is better conditioned: the matrix B is a preconditioner for the solution of
equation (1).
If $\rho(I - RA) \ll 1$, where $R$ is an approximate inverse operator (Exercise 2.6.1),
then it can be shown that
$$\mathrm{cond}(RA) \ll \mathrm{cond}(A),$$
and we may choose B = R. Therefore, B appears as an approximate inverse of
A. The vector
x0 = Bb

may be considered as a starting point for the refinement process
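A minimal numerical sketch of the idea (with a made-up ill-conditioned tridiagonal $A$, and the Jacobi choice $R = D^{-1}$ as the approximate inverse):

```python
import numpy as np

n = 50
# Ill-conditioned example: diagonal growing from 1 to 1000, weak coupling.
A = (np.diag(np.linspace(1.0, 1000.0, n))
     + 0.5 * np.eye(n, k=1) + 0.5 * np.eye(n, k=-1))
R = np.diag(1.0 / np.diag(A))            # Jacobi approximate inverse

cond_A = np.linalg.cond(A)
cond_RA = np.linalg.cond(R @ A)
rho = max(abs(np.linalg.eigvals(np.eye(n) - R @ A)))
```

Here `rho` $= \rho(I - RA) < 1$ and `cond_RA` is orders of magnitude smaller than `cond_A`, so $B = R$ is an effective preconditioner for this example.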

2.12.3 Consider the system
$$Ax = b,$$
where $A$ corresponds to a discretization of a linear operator in infinite dimension.
We associate with $A$ a step $h$ that characterizes the discretization. The order of
$A$ is a decreasing function $n(h)$ of $h$. Let $h'$ be a coarser step:
$$h' \gg h,$$
and let
$$A'x' = b'$$
be the associated system of order $m$, where
$$N = n(h), \quad m = n(h').$$
Suppose there exist matrices
$$r \in \mathbb{C}^{m \times N} \quad \text{and} \quad p \in \mathbb{C}^{N \times m}$$
such that
$$rp = I_m \quad \text{and} \quad A' = rAp,$$
and that $A'$ is non-singular. Then we may take
as an approximate inverse, where $K$ is an operator such that
provided that $\nu$ is sufficiently large. Often $K$ is chosen to be the iteration
matrix of Jacobi, Gauss-Seidel or relaxation (see [B:9, 27]).

3.3.1 If $J = V^{-1}AV$, then $J^k = V^{-1}A^kV$, and we deduce that
$$\mathrm{e}^{At} = V\mathrm{e}^{Jt}V^{-1}.$$
The result follows from the identity
$$\mathrm{e}^{J(t+h)} - \mathrm{e}^{Jt} = (\mathrm{e}^{Jh} - I)\mathrm{e}^{Jt} = hJ\mathrm{e}^{Jt}\,[I + O(h)].$$
The computation of the elements of $\mathrm{e}^{Jt}$ follows from Exercise 3.1.6.
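For a diagonalisable $A$, the identity $\mathrm{e}^{At} = V\mathrm{e}^{Jt}V^{-1}$ can be checked directly; in the sketch below (an assumed small example), `scipy.linalg.expm` serves as an independent reference:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # eigenvalues -1 and -2
t = 0.7

w, V = np.linalg.eig(A)                    # A = V diag(w) V^{-1}
# V e^{Jt} V^{-1} with J = diag(w): scale the columns of V by exp(w t).
E_spectral = (V * np.exp(w * t)) @ np.linalg.inv(V)
E_reference = expm(A * t)
```

Both routes agree to machine precision here; the spectral route is only safe when $V$ is well conditioned.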

3.4.9 We have the relations
$$W = B^{1/2}UB^{-1/2}, \qquad U = X\Lambda X^{\mathrm{T}}B,$$
whence we obtain
$$\widehat W = (B^{1/2}RB_1)\,W\,(B^{1/2}RB_1)^{-1}.$$
The eigenvectors satisfy
$$u_i = R^{-1}w_i, \qquad v_i = \frac{1}{\lambda_i}R\bar A_1 E w_i \quad (\lambda_i \neq 0).$$
3.5.6 The derivatives of $u$ are
$$u'(t) = \lambda\mathrm{e}^{\lambda t}\phi \quad \text{and} \quad u''(t) = \lambda^2\mathrm{e}^{\lambda t}\phi.$$
Hence $(\lambda^2M + \lambda B + K)\phi = 0$ if $Mu'' + Bu' + Ku = 0$.
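The quadratic eigenvalue problem $(\lambda^2M + \lambda B + K)\phi = 0$ is commonly solved by linearization. A sketch (assuming $M$ invertible, with made-up diagonal data): build the $2n \times 2n$ companion matrix whose eigenpairs $(\lambda, (\phi, \lambda\phi))$ recover those of the quadratic problem.

```python
import numpy as np

n = 3
M = np.eye(n)
B = np.diag([1.0, 2.0, 3.0])
K = np.diag([4.0, 5.0, 6.0])

# Companion linearization: with y = lambda * phi,
#   [ 0      I  ] [phi]        [phi]
#   [-M^-1 K  -M^-1 B] [ y ] = lambda [ y ].
Minv = np.linalg.inv(M)
C = np.block([[np.zeros((n, n)), np.eye(n)],
              [-Minv @ K, -Minv @ B]])

lams, W = np.linalg.eig(C)
# Each eigenvector of C stacks (phi, lambda*phi); check the quadratic residual.
residuals = [np.linalg.norm((lam**2 * M + lam * B + K) @ W[:n, j])
             for j, lam in enumerate(lams)]
```

All $2n$ eigenvalues of the companion matrix satisfy the original quadratic problem to machine precision.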
3.7.3 The result follows directly: setting $\phi = T\phi_n$,
$$T\pi_n\phi = T\pi_n(T\phi_n) = T(\pi_nT\pi_n)\phi_n = T(\lambda_n\phi_n) = \lambda_nT\phi_n = \lambda_n\phi.$$

4.1.1 Let $u$ be such that $\|u\|_2 = 1$ and
$$\|A^{-1}u\|_2 = \max_{\|x\|_2 = 1} \|A^{-1}x\|_2 = \|A^{-1}\|_2.$$
Define
$$v = \frac{1}{\|A^{-1}\|_2}A^{-1}u \quad \text{and} \quad \Delta A = -\frac{1}{\|A^{-1}\|_2}uv^*.$$
Hence $\|v\|_2 = 1$; also $\Delta A$ is of rank unity and
$$(A + \Delta A)v = 0.$$
Hence $A + \Delta A$ is singular. On the other hand,
$$\frac{\|\Delta A\|_2}{\|A\|_2} = \frac{1}{\mathrm{cond}_2(A)}.$$
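Numerically, the optimal $u$ and $v$ are singular vectors of $A$, so the construction can be verified with an SVD (a sketch with an arbitrary $2 \times 2$ example):

```python
import numpy as np

A = np.array([[3.0, 1.0], [0.0, 0.5]])
U, s, Vt = np.linalg.svd(A)

u = U[:, -1]                    # attains the max of ||A^{-1} x||_2
v = Vt[-1]                      # equals A^{-1} u / ||A^{-1}||_2
dA = -s[-1] * np.outer(u, v)    # rank one, ||dA||_2 = sigma_min = 1/||A^{-1}||_2

singular_residual = np.linalg.det(A + dA)   # A + dA is singular
ratio = s[-1] / s[0]                        # ||dA||_2 / ||A||_2
```

The ratio $\sigma_{\min}/\sigma_{\max}$ is exactly $1/\mathrm{cond}_2(A)$: the relative distance to singularity.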

4.2.1 Let $(Q, \hat Q)$ be an orthonormal basis of $\mathbb{C}^n$, where $Q$ is a basis of $M$,
the invariant subspace associated with $\lambda$, so that $\mathrm{sp}(Q^*AQ) = \{\lambda\}$. Let
$$B = \hat Q^*A\hat Q, \qquad \mathrm{sp}(B) = \mathrm{sp}(A)\setminus\{\lambda\}, \qquad \delta = \min_{\mu \in \mathrm{sp}(B)} |\mu - \lambda|.$$
Hence
$$\delta^{-1} \leq \max_{\mu \in \mathrm{sp}(B)} \frac{1}{|\mu - \lambda|} = \rho[(B - \lambda I)^{-1}] \leq \|(B - \lambda I)^{-1}\|_2$$
and
$$\|\Sigma_1\|_2 = \|(B - \lambda I)^{-1}\|_2.$$
Let
$$\hat\lambda \in \mathrm{sp}(B) \quad \text{be such that} \quad \delta = |\hat\lambda - \lambda|,$$
and let
$$J = V^{-1}BV$$
be the Jordan form of $B$. Hence
$$(B - \lambda I)^{-1} = V(J - \lambda I)^{-1}V^{-1},$$
and so
$$[\mathrm{cond}_2(V)]^{-1}\|(J - \lambda I)^{-1}\|_2 \leq \|(B - \lambda I)^{-1}\|_2 \leq \mathrm{cond}_2(V)\|(J - \lambda I)^{-1}\|_2.$$
Let $l$ be the index of $\hat\lambda$ and let $J(\hat\lambda)$ be the corresponding $l \times l$ Jordan block. Then
$$\|(J(\hat\lambda) - \lambda I)^{-1}\|_F = \delta^{-l}\,(1 + 2\delta^2 + \cdots + l\delta^{2(l-1)})^{1/2} \geq \tfrac{1}{2}\delta^{-l},$$
provided that $\delta$ is sufficiently small. Now
$$\|(J - \lambda I)^{-1}\|_2 = \max_i \|(J_{ii} - \lambda I)^{-1}\|_2 \leq \max_i \|(J_{ii} - \lambda I)^{-1}\|_F,$$
where the $J_{ii}$ are the different Jordan blocks of $J$. For sufficiently small $\delta$ the last
maximum is attained by $J_{00} = J(\hat\lambda)$, whence we have the result that
$$\tfrac{1}{2}[\mathrm{cond}_2(V)]^{-1}\delta^{-l} \leq \|(B - \lambda I)^{-1}\|_2 = \|\Sigma_1\|_2 \leq 2\,\mathrm{cond}_2(V)\,\delta^{-l}.$$
4.2.2 The function
$$\varepsilon \mapsto P(\varepsilon) = -\frac{1}{2\pi i}\oint_\Gamma (A(\varepsilon) - zI)^{-1}\,\mathrm{d}z$$
is analytic, where
$$R(z) = (A - zI)^{-1}$$
and $\Gamma$ is a Jordan curve isolating $\lambda$. Hence
$$\lim_{\varepsilon \to 0} \|P(\varepsilon) - P\|_2 = 0,$$
and
$$x(\varepsilon) = P(\varepsilon)\phi$$
can be normalized as follows:
$$\phi(\varepsilon) = [\phi_*^*x(\varepsilon)]^{-1}x(\varepsilon),$$
where the left eigenvectors satisfy
$$A^*\Phi_* = \Phi_*\Theta^*, \qquad \Phi_*^*\Phi = I_m.$$
In fact, for sufficiently small $\varepsilon$, we have $\phi_*^*x(\varepsilon) \neq 0$, because
$$\lim_{\varepsilon \to 0} |\phi_*^*x(\varepsilon) - 1| = 0.$$
Set
$$\theta(\varepsilon) = \phi_*^*A\phi(\varepsilon),$$
and we can prove that
$$\left.\frac{\mathrm{d}\theta(\varepsilon)}{\mathrm{d}\varepsilon}\right|_{\varepsilon = 0} = \phi_*^*H\phi.$$
Hence
$$|\theta(\varepsilon) - \lambda| \leq \|\phi_*\|_2\,|\varepsilon| + O(\varepsilon^2).$$
This proves (b).
The inequality (9) is proved as follows. Since
$$QNQ^* = VN_JV^{-1},$$
it follows that
$$\|N^k\|_2 \leq \mathrm{cond}_2(V)\|N_J^k\|_2.$$
However, $\mathrm{sp}[(N_J^k)^*N_J^k] \subset \{0, 1\}$. Hence
$$\|N^k\|_2 \leq \mathrm{cond}_2\,V \quad (\text{for all } k \geq 0).$$
The inequality (c) is a consequence of the identity
$$[\lambda(\varepsilon)I_m - \Theta]^{-1}[\lambda(\varepsilon)I_m - \Theta(\varepsilon)] = I_m - [\lambda(\varepsilon)I_m - \Theta]^{-1}[\Theta(\varepsilon) - \Theta],$$
because $\lambda(\varepsilon)I_m - \Theta(\varepsilon)$ is singular. The inequality (d) is a consequence of
$$1 \leq \mathrm{cond}_2(V)\,\|\Theta(\varepsilon) - \Theta\|_2\,\max\left\{\frac{1}{|\lambda(\varepsilon) - \lambda|},\ \frac{1}{|\lambda(\varepsilon) - \lambda|^l}\right\}.$$
4.2.3 We calculate
$$B = A^{-1}\Delta A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}.$$
The departures from normality are
$$\nu(A) = \|A^*A - AA^*\|_F = 10^8\sqrt{2}\,(1 + 10^{-8})^{1/2} \approx \sqrt{2} \times 10^8,$$
$$\nu(B) = \|B^*B - BB^*\|_F = 2.$$
The bases of eigenvectors are
$$X(A) = \begin{pmatrix} 1 & 1 \\ 0 & 10^{-4} \end{pmatrix} \quad \text{and} \quad X(B) = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}$$
respectively, whence
$$\|X(A)\|_2 \geq \sqrt{2}, \qquad \|X(A)^{-1}\|_2 \geq \sqrt{2} \times 10^4,$$
$$\|X(B)\|_2^2 = \frac{3 + \sqrt{5}}{2}, \qquad \|X(B)^{-1}\|_2^2 = \frac{3 + \sqrt{5}}{2}.$$
Hence
$$\mathrm{cond}_2[X(A)] \geq 2 \times 10^4,$$
$$\mathrm{cond}_2[X(B)] = \frac{3 + \sqrt{5}}{2} < 2.62.$$
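These numbers are easy to confirm numerically; the sketch below checks the two eigenvector bases and the departure from normality of $B$ given above:

```python
import numpy as np

X_A = np.array([[1.0, 1.0], [0.0, 1e-4]])   # eigenvector basis of A
X_B = np.array([[1.0, 1.0], [0.0, -1.0]])   # eigenvector basis of B
B = np.array([[1.0, 1.0], [0.0, 0.0]])

cond_XA = np.linalg.cond(X_A)
cond_XB = np.linalg.cond(X_B)
nu_B = np.linalg.norm(B.T @ B - B @ B.T)    # Frobenius norm by default
```

`cond_XA` comes out near $2 \times 10^4$ while `cond_XB` equals $(3 + \sqrt 5)/2 \approx 2.618$, matching the bounds in the text.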

4.2.6 Let $f$ be a function defined on an open set $\Omega$ of a normed vector space
$(E, \|\cdot\|_E)$ and suppose that $f$ takes values in a normed vector space $(F, \|\cdot\|_F)$. We
recall that $f$ satisfies the Lipschitz condition if there exists a number $\kappa \geq 0$ such that
$$\|f(x) - f(y)\|_F \leq \kappa\|x - y\|_E \quad \text{for all } x, y \in \Omega,$$
and that $f$ is Hölder-continuous of order $1/p$ if there exist a number $\kappa \geq 0$ and an
integer $p > 1$ such that
$$\|f(x) - f(y)\|_F \leq \kappa\|x - y\|_E^{1/p} \quad \text{for all } x, y \in \Omega.$$
If in the above definitions we take $\Omega$ to be a neighbourhood of $0_E$ and if we
fix $x = 0_E$, then we obtain the corresponding notions relative to a scalar. By
virtue of the bounds established in Exercise 4.2.2, part (d), the function $\varepsilon \mapsto \lambda(\varepsilon)$,
defined in a neighbourhood of $\varepsilon = 0$, satisfies the Lipschitz condition at $0$ when
the eigenvalue is semi-simple ($l = 1$), and it is Hölder-continuous at $0$ when
the eigenvalue is defective with index $l$, the order of the continuity being equal
to $1/l$.
4.2.7 The quantity $\varepsilon$ in Exercise 4.2.2 measures the absolute error of $A(\varepsilon)$ in
relation to the approximation of $A$. Hence the relative error of $A(\varepsilon)$ is given by
$$\varepsilon_R = \frac{\varepsilon\|H\|_2}{\|A\|_2}.$$
On the other hand, for each eigenvalue $\lambda \neq 0$ of a non-singular matrix we have
$$0 < \frac{1}{|\lambda|} \leq \|A^{-1}\|_2.$$
Hence when $\lambda$ is semi-simple ($l = 1$ and $V = I_m$) we have
$$\frac{|\lambda(\varepsilon) - \lambda|}{|\lambda|} \leq \mathrm{cond}_2(A)\,\|P\|_2\,\varepsilon_R + O(\varepsilon^2).$$

The study of the defective case is left to the reader.

4.2.11 Consider the block of eigenvalues
An upper triangular matrix of the form