
Eigenvalues of Matrices

Books in the Classics in Applied Mathematics series are monographs and textbooks declared out

of print by their original publishers, though they are of continued importance and interest to the

mathematical community. SIAM publishes this series to ensure that the information presented in

these texts is not lost to today's students and researchers.

Editor-in-Chief

Robert E. O'Malley, Jr., University of Washington

Editorial Board

John Boyd, University of Michigan Peter Olver, University of Minnesota

Susanne Brenner, Louisiana State University Philip Protter, Cornell University

Bernard Deconinck, University of Washington Matthew Stephens, The University of Chicago

William G. Faris, University of Arizona Divakar Viswanath, University of Michigan

Nicholas J. Higham, University of Manchester Gerhard Wanner, L'Université de Genève

Mark Kot, University of Washington

C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences

Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications

and Computational Methods

James M. Ortega, Numerical Analysis: A Second Course

Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained

Minimization Techniques

F. H. Clarke, Optimization and Nonsmooth Analysis

George F. Carrier and Carl E. Pearson, Ordinary Differential Equations

Leo Breiman, Probability

R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding

Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences

Olvi L. Mangasarian, Nonlinear Programming

*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One,

Part Two, Supplement. Translated by G. W. Stewart

U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for

Ordinary Differential Equations

K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems

in Differential-Algebraic Equations

Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems

J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and

Nonlinear Equations

Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability

Cornelius Lanczos, Linear Differential Operators

Richard Bellman, Introduction to Matrix Analysis, Second Edition

Beresford N. Parlett, The Symmetric Eigenvalue Problem

Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow

Peter W. M. John, Statistical Design and Analysis of Experiments

Tamer Başar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition

Emanuel Parzen, Stochastic Processes

Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis

and Design


Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New

Statistical Methodology

James A. Murdock, Perturbations: Theory and Methods

Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems

Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II

J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables

David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their

Applications

F. Natterer, The Mathematics of Computerized Tomography

Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging

R. Wong, Asymptotic Approximations of Integrals

O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation

David R. Brillinger, Time Series: Data Analysis and Theory

Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems

Philip Hartman, Ordinary Differential Equations, Second Edition

Michael D. Intriligator, Mathematical Optimization and Economic Theory

Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems

Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue

Computations, Vol. 1: Theory

M. Vidyasagar, Nonlinear Systems Analysis, Second Edition

Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice

Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology

of Selecting and Ranking Populations

Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods

Leah Edelstein-Keshet, Mathematical Models in Biology

Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations

J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition

George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and Technique

Friedrich Pukelsheim, Optimal Design of Experiments

Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications

Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics

Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues

Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics

Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties

Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations

Alexander Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems

I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials

Galen R. Shorack and Jon A. Wellner, Empirical Processes with Applications to Statistics

Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone, The Linear Complementarity Problem

Rabi N. Bhattacharya and Edward C. Waymire, Stochastic Processes with Applications

Robert J. Adler, The Geometry of Random Fields

Mordecai Avriel, Walter E. Diewert, Siegfried Schaible, and Israel Zang, Generalized Concavity

Rabi N. Bhattacharya and R. Ranga Rao, Normal Approximation and Asymptotic Expansions

Françoise Chatelin, Spectral Approximation of Linear Operators


Yousef Saad, Numerical Methods for Large Eigenvalue Problems, Revised Edition

Achi Brandt and Oren E. Livne, Multigrid Techniques: 1984 Guide with Applications to Fluid Dynamics,

Revised Edition

Bernd Fischer, Polynomial Based Iteration Methods for Symmetric Linear Systems

Pierre Grisvard, Elliptic Problems in Nonsmooth Domains

E. J. Hannan and Manfred Deistler, The Statistical Theory of Linear Systems

Françoise Chatelin, Eigenvalues of Matrices, Revised Edition

Eigenvalues

of Matrices

REVISED EDITION

Françoise Chatelin

CERFACS and the University of Toulouse

Toulouse, France

With exercises by

Mario Ahues
Université de Saint-Étienne, France

and

Françoise Chatelin

Translated, with additional material, by
Walter Ledermann
University of Sussex, UK

Published with the support of the French Ministry of Culture.

Society for Industrial and Applied Mathematics

Philadelphia

Copyright © 2012 by the Society for Industrial and Applied Mathematics

This SIAM edition is a revised republication of the work first published by John

Wiley & Sons, Inc., in 1993.

This book was originally published in two separate volumes by Masson, Paris:

Valeurs propres de matrices (1988) and Exercices de valeurs propres de matrices (1989).

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book

may be reproduced, stored, or transmitted in any manner without the written

permission of the publisher. For information, write to the Society for Industrial and

Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688

USA.

MATLAB is a registered trademark of The MathWorks, Inc. For MATLAB product
information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA
01760-2098 USA, 508-647-7000, Fax: 508-647-7001, info@mathworks.com,
www.mathworks.com.

Chaitin-Chatelin, Françoise.

Eigenvalues of matrices / Françoise Chatelin ; with exercises by Mario Ahues ;

translated, with additional material, by Walter Ledermann. — Rev. ed.

p. cm. — (Classics in applied mathematics ; 71)

Includes bibliographical references and index.

ISBN 978-1-611972-45-0

1. Matrices. 2. Eigenvalues. I. Ahues, Mario. II. Ledermann, Walter, 1911-2009.

III. Title.

QA188.C44 2013

512.9'436--dc23

2012033049


To

Hypatia of Alexandria,
A.D. 370-415,
stoned to death by the mob.

Educated in Athens, she established a school in Alexandria, her native city,
where Plato and Aristotle, as well as Diophantus,
Apollonius of Perga and Ptolemy were studied.
This displeased the clerics, who incited the mob against her.

Contents

Preface to the Classics Edition xiii

Preface xv

Preface to the English Edition xix

Notation xxi

List of Errata xxiii

1.1 Notation and definitions 1

1.2 The canonical angles between two subspaces 5

1.3 Projections 8

1.4 The gap between two subspaces 10

1.5 Convergence of a sequence of subspaces 14

1.6 Reduction of square matrices 18

1.7 Spectral decomposition 27

1.8 Rank and linear independence 31

1.9 Hermitian and normal matrices 32

1.10 Non-negative matrices 33

1.11 Sections and Rayleigh quotients 34

1.12 Sylvester's equation 35

1.13 Regular pencils of matrices 42

1.14 Bibliographical comments 43

Exercises 43

2.1 Revision of some properties of functions of a complex variable 61

2.2 Singularities of the resolvent 63

2.3 The reduced resolvent and the partial inverse 73

2.4 The block-reduced resolvent 76

2.5 Linear perturbations of the matrix A 79

2.6 Analyticity of the resolvent 82

2.7 Analyticity of the spectral projection 84

2.8 The Rellich-Kato expansions 85

2.9 The Rayleigh-Schrödinger expansions 86

2.10 Non-linear equation and Newton's method 89

2.11 Modified methods 92

2.12 The local approximate inverse and the method of residual

correction 95

2.13 Bibliographical comments 98

Exercises 98


3.1 Differential equations and difference equations 111

3.2 Markov chains 114

3.3 Theory of economics 117

3.4 Factorial analysis of data 119

3.5 The dynamics of structures 120

3.6 Chemistry 122

3.7 Fredholm's integral equation 124

3.8 Bibliographical comments 126

Exercises 126

4.1 Revision of the conditioning of a system 149

4.2 Stability of a spectral problem 150

4.3 A priori analysis of errors 165

4.4 A posteriori analysis of errors 170

4.5 A is almost diagonal 177

4.6 A is Hermitian 180

4.7 Bibliographical comments 190

Exercises 191

5.1 Convergence of a Krylov sequence of subspaces 205

5.2 The method of subspace iteration 208

5.3 The power method 213

5.4 The method of inverse iteration 217

5.5 The QR algorithm 221

5.6 Hermitian matrices 226

5.7 The QZ algorithm 226

5.8 Newton's method and the Rayleigh quotient iteration 227

5.9 Modified Newton's method and simultaneous inverse iterations 228

5.10 Bibliographical comments 235

Exercises 235

6.1 The principle of the methods 251

6.2 The method of subspace iteration revisited 253

6.3 The Lanczos method 257

6.4 The block Lanczos method 266

6.5 The generalized problem Kx = λMx 270

6.6 Arnoldi's method 272

6.7 Oblique projections 279

6.8 Bibliographical comments 280

Exercises 281

7.1 Elements of the theory of uniform approximation

for a compact set in C 293

7.2 Chebyshev polynomials of a real variable 299


7.4 The Chebyshev acceleration for the power method 304

7.5 The Chebyshev iteration method 305

7.6 Simultaneous Chebyshev iterations (with projection) 308

7.7 Determination of the optimal parameters 311

7.8 Least squares polynomials on a polygon 312

7.9 The hybrid methods of Saad 314

7.10 Bibliographical comments 316

Exercises 316

8.1 Scalars in a field 324

8.2 Scalars in a ring 324

8.3 Square matrices are macro-scalars 327

8.4 The spectral and metric information stemming from A

of order n 328

8.5 Polar representations of A of order n 330

8.6 The yield of A Hermitian positive semi-definite under

spectral coupling 332

8.7 Homotopic deviation 340

8.8 Non-commutativity of the matrix product 342

8.9 Conclusion 346

8.10 Bibliographical comments 346

Exercises 346

Additional References 348

Appendices 351

A Solution to Exercises 351

B References for Exercises 395

C References 399

Index 406

Preface to the Classics Edition

The original French version of this book was published by Masson, Paris, in 1988.

The 24 years which have elapsed since 1988 until the present SIAM republication

of the English translation (Wiley, 1993) by Professor Ledermann have confirmed the

essential role played by matrices in intensive scientific computing. They lie at the

foundation of the digital revolution that is taking place worldwide at lightning speed.

During the past quarter of a century, the new field called qualitative computing

has emerged in mathematical computation, which can be viewed as a first step in the

direction of the polymorphic information theory that is required to decipher life phenomena on the planet. In this broader perspective, the backward analysis, which was

devised by Givens and Wilkinson in the late 1950s to assess the validity of matrix

computations performed in the finite precision arithmetic of scientific computers, becomes mandatory even when the arithmetic of the theory is exact. This is because

classical linear algebra may yield local results which disagree with the global nonlinear algebraic context. Consequently, square matrices play, via their eigenvalues and

singular values, an even more fundamental role than that which was envisioned in

Chapter 3 of the original version.

This Classics Revised Edition describes this deeper role in a postface taking the

form of Chapter 8, which is accompanied by an updated bibliography. This is my third

book devoted to computational spectral theory to be published by SIAM, following

Lectures on Finite Precision Computations in 1996 (co-authored with Valérie Frayssé)

and Spectral Approximation of Linear Operators in 2011 (Classics 65). These books

form a trilogy which contains the theoretical and practical knowledge necessary to

acquire a sound understanding of the central role played by eigenvalues of matrices

in life information theory.

My gratitude goes to Beresford Parlett, U.C. Berkeley, for his perceptive reading

of a draft of the Postface. It is again my pleasure to acknowledge the highly professional support provided by Sara Murphy, Developmental and Acquisitions Editor at

SIAM.

Françoise Chatelin

CERFACS and University of Toulouse,

July 2012.

Preface

Helmholtz (...) advises us to observe for a long time the waves of the sea and the

wakes of ships, especially at the moment when the waves cross each other (...). Through

such understanding one must arrive at this new perception, which brings more order

into the phenomena.

importance. Here are two very different types of application: in the dynamics of

structures it is essential to know the resonance frequency of the structure; for

example, we mention the vibrations of the propeller blades in ships or helicopters,

the influence of the swell on drilling platforms in the sea, the reaction of buildings

in earthquakes. Another class of fundamental applications is related to the

determination of the critical value of a parameter for the stability of a dynamical

system such as a nuclear reactor.

A good understanding of the algorithms is necessary in order that they may

be used efficiently. One knows of the fantastic advances in computers brought about

by technical developments: in 1957 transistors replaced valves; in the 1960s

printed circuits appeared and then the first integrated circuits with several dozen

transistors per microchip. In 1985 the VLSI (very large scale integration)

technology permitted the integration of a million transistors per chip.

What is less well known is the gain in performance due to progress in

mathematical methods. In some areas this is at least as important as the gain due

to technological revolution. For example, from 1973 to 1983 the capacities of the

most powerful computers were multiplied by 1000 and during the same period

the improvement of certain numerical techniques brought about a gain of

another factor of 1000. All this made it possible in 1983 to calculate the flow past a complete aircraft in the supersonic regime in less than a night's work on a Cray 1.

The object of this book is to give a modern and complete theory, on an

elementary level, of the eigenvalue problem of matrices. We present the

fundamental aspects of the theory of linear operators in finite dimensions and in

matrix notation. The use of the vocabulary of functional analysis has the effect


approximation. At the same time, the use of the vocabulary of linear algebra, in

particular the systematic use of bases for the representation of invariant

subspaces, allows us to give a geometric interpretation that enhances the

traditional algebraic presentation of many algorithms in numerical matrix

analysis.

The presentation of this work is organized around several salient ideas:

(a) treatment of non-normal matrices and multiple defective eigenvalues;

(b) influence of the departure from normality on spectral conditioning (Chapter 4);

(c) use of the Schur form in preference to the Jordan form;

(d) simultaneous treatment of several distinct eigenvalues (Chapters 2 and 4);

(e) presentation of the most efficient up-to-date algorithms (for sequential or

vectorial computers) in order to compute the eigenvalues of (i) dense matrices

of medium size and (ii) sparse matrices of large sizes, the algorithms being

divided into two families: (1) algorithms of the iterative type for subspaces

(Chapters 5,6 and 7) and (2) those of the incomplete Lanczos/Arnoldi type

(Chapter 6);

(f) analysis of the convergence of subspaces by means of the convergence of their

bases (Chapter 1);

(g) analysis of the quality of approximation with the help of two concepts:

approximation through orthogonal projection on a subspace and asymptotic

behaviour of the subspaces A^k S, k = 1, 2, ... (Chapters 5, 6 and 7);

(h) improvement of the efficiency of the numerical methods through spectral

preconditioning (Chapters 5,6 and 7).

The reader who wants to obtain a deeper understanding of this area will find

the study of the following classical books very enriching: Golub-Van Loan

(Chapters 7,8 and 9), Parlett and Wilkinson.

The present book is a work on numerical analysis in depth. It is addressed

especially to second-year students of the Maîtrise, to pupils of the Magistère, as

well as to those of the Grandes Ecoles. It is assumed that the reader is familiar

with the basic facts of numerical analysis covered in the book Introduction à l'Analyse Numérique Matricielle et à l'Optimisation* by P. G. Ciarlet. The Recueil d'Exercices† is an indispensable pedagogic complement to the main text. It

consists of exercises of four types:

*An English translation was published in 1989 by Cambridge University Press (Translator's footnote).
†This collection of exercises is incorporated in the present volume (Translator's footnote).


(A) exercises of application (a solution is furnished);

(B) exercises for training and deepening understanding (bibliographical reference

is given in appendix B where the proofs can be found);

(C) computational exercises where the result is usually given in the text;

(D) problems (no solution is furnished).

friends. It took its starting point from a Licence course 'À propos de valeurs propres' which Philippe Toint invited me to give at the University of Namur in the Spring of 1983. I should like to thank him, and most of all I want to express

my thanks to Mario Ahues for the close and friendly collaboration throughout

the preparation of the text and the exercises. Equally, I am pleased to

acknowledge, in addition to their friendship, the influence of Beresford Parlett

and of Youcef Saad, which grew in the course of years. Finally, I should like to

thank Philippe Ciarlet and Jacques-Louis Lions for showing their confidence in

me by suggesting that I should write this volume for the series Mathématiques Appliquées pour la Maîtrise.

Preface to the English Edition

I am very pleased to have this opportunity to acknowledge the very fine work

accomplished by Professor Ledermann. He certainly worked much harder than

is commonly expected from a translator, to transform a terse French text into

a more flowing English one. He even corrected some mathematical mistakes!

Two paragraphs have also been added in Chapter 4 to keep up to date with

new developments about the influence of non-normality and the componentwise

stability analysis. The list of references has been updated accordingly.

Notation

N   set of integers
R   set of real numbers
C   set of complex numbers
A = (a_ij)   matrix with element a_ij in the ith row and jth column (1 ≤ i ≤ n, 1 ≤ j ≤ m); linear map of C^m into C^n
A^T = (a_ji)   transposed matrix (denoted by ^tA in algebra)
A* = (ā_ji)   transposed conjugate matrix
C^{n×m}   set of n by m matrices over C
x = (ξ_i)_1^n = (ξ_1, ..., ξ_n)^T   column vector of C^n
{x_1, ..., x_r} = {x_i}_1^r   set of r vectors
A = [a_1, ..., a_m]   matrix of column vectors {a_i}_1^m
sp(A)   spectrum of A
{λ_i}_1^d   set of distinct eigenvalues of A, d ≤ n
{μ_i}_1^n   set of eigenvalues, possibly repeated, each counted with its algebraic multiplicity
res(A) = C - sp(A)   resolvent set of A
ρ(A) = max_i |λ_i|   spectral radius of A
det A   determinant of A
tr A = Σ_{i=1}^n a_ii   trace of A
r(A)   rank of A
adj A = (A_ij)   adjugate of A: A_ij is the cofactor of a_ji when A = (a_ij)
π(λ) = det(λI - A)   the characteristic polynomial of A
||x||_2 = (Σ_i |ξ_i|²)^{1/2}   Euclidean norm of x
A_{|M} : M → M   restriction of a linear map A to an invariant subspace M
lin(x_1, ..., x_r)   subspace over C generated by {x_i}_1^r
ω(M, N)   gap between the subspaces M and N
Θ = diag(θ_i)   diagonal matrix of canonical angles between M and N
Σ_{i=1}^0 = 0   notational convention
A ⊗ B   tensor (or Kronecker) product of the matrices A and B
P_k   set of polynomials of degree ≤ k
T_k(t) = ½[(t + (t² - 1)^{1/2})^k + (t + (t² - 1)^{1/2})^{-k}]   Chebyshev polynomial of the first kind of degree k
X, Y with Y*X = I   adjoint bases of C^n; B = Y*AX
M, M_+   complementary invariant subspaces
M   right invariant subspace
M_*   left invariant subspace
x   right eigenvector
x_*   left eigenvector
csp   spectral condition number
csp(λ)   spectral condition number of the eigenvalue λ
csp(x)   spectral condition number of the eigenvector x
csp(M)   spectral condition number of the invariant subspace M
meas(Γ)   Lebesgue measure of the curve Γ
d.n.   departure from normality

List of Errata

Preface
p. xv, line(-22): helicopters
p. xvi, line(-2): indispensable

Notation
p. xxii, definition: A is regular <=> A is invertible <=> det A ≠ 0

Chapter 1
p. 3, line(-12): orthonormal
p. 10, line(13): basis
p. 11, line(2): x*x = 1; line(-7): ||(I - π_N)π_M|| in (1.4.6)
p. 16, line(2): C_k; line(10): as required.
p. 18, line(2): basis; line(5): = 1/2kπ; line(8): A(ε) with off-diagonal entries -ε sin(2/ε)
p. 21, line(-7): , w*)
p. 22, line(15): 1 ≤ j ≤ i ≤ m
p. 23, line(9): = 0
p. 31, line(13): ... of X, but it actually goes back to Laplace in the first Supplement, pp. 505-512, to Théorie analytique des Probabilités, 3rd edition, Paris, 1820.
p. 32, footnote: 1854-

p. 38, line(18): ||Δz||; footnote: Karl Adolf Hessenberg, 1904-1959, born and died in Frankfurt am Main.
p. 39, line(1): ||T^{-1}||_F; line(11): δ < 1; line(12): cond_2(X)
p. 40, line(16): ||AA* - A*A||_F; line(-8): of order r (not to be confused with the order of B taken to be 1)
p. 42, line(-7): set of finite eigenvalues
p. 43, line(-13): Poincaré
p. 45, line(9): - f(λ) in 1.1.10
p. 46, line(13): ||·||_1 in 1.1.19; line(-3): θ_max = π/2 in 1.2.2
p. 47, line(-2): [U, Ũ] in 1.2.6
p. 48, line(10): If ||(P - Q)P||_2 < 1 in 1.3.4; line(-5): (Σ_{i=1}^r sin²θ_i)^{1/2} in 1.4.2
p. 50, line(5): S_t = ... ∈ C^{n×m} in 1.6.8
p. 52, line(-7): upper triangular in 1.6.19
p. 56, line(3): Hermitian in 1.8.7; line(-12): ≤ λ_1(B) + λ_1(C) in 1.9.2
p. 57, line(-7): Brouwer's in 1.10.1

Chapter 2
p. 65, line(-3): constant
p. 66, Proposition 2.2.7: z ↦ S(z); line(11): k ∈ N
p. 68, line(7): A_π = AP_π; line(16): z ∈ Γ
p. 70, line(9): -P/(z - λ) in (2.2.5); line(16): = -Σ_{k=0}^{l-1}
p. 72, line(5): = P(A*, λ̄)
p. 73, line(2): P(A*, λ̄) = ; line(4): dz̄ = -dz
p. 78, line(3): (b) > δ^{-1}
p. 79, line(7): σ
p. 80, line(-1): A - zI
p. 83, line(14): (2.6.2); line(-10): in exact arithmetic.
p. 84, line(-3): λ(t) =
p. 86, line(3): k
p. 88, line(4): S(t) = [...]^{-1}
p. 89, PROOF: Take s = p defined in Theorem 2.9.3.
p. 90, line(12): (I - Y*X)B = 0.; line(-6): = Γ ∪ {0}.
p. 91, line(14): = 0
p. 92, line(19): B = Y*AU
p. 93, line(3): ||v_k|| = ε_k
p. 96, lines(-7, -6): l(U)
p. 98, line(2): = λx_1 + ···; line(5): + λ^{l-1}b
p. 103, line(-3): B in 2.9.1
p. 104, line(4): A*y - ξ̄v in 2.9.1; line(6): ··· ||Q||_F in (*)
p. 105, line(7): delete "of" in 2.10.2
p. 108, line(16): coefficient (1, n) of J is a_{1n} in 2.11.5

Chapter 3
p. 114, line(17): P(...) = P(...)
p. 115, line(-14): P_u(i, j)
p. 117, line(-6): + 1_j d_i.; line(-5): A + d1^T
p. 118, line(11): c_j = Σ_i a_i p_{ij} + ···; line(-8): λ = Σ ···; line(-7): x_i = (1 + r)f_i
p. 126, line(-7): of eigenvalues
p. 132, line(9): π be the vector in 3.2.3
p. 135, line(-2): click in 3.2.9
p. 136, line(8): converge in 3.2.9
p. 140, line(5): S = [S_1, ...] in 3.4.8
p. 145, 3.7.1: [B:11]; 3.7.2: [B:4, 11]
p. 146, 3.7.4: [B:3, 11]

Chapter 4
p. 155, line(3): ||·||_p; line(11): (or eigenvectors); line(17): (l - 1)/l
p. 156, line(-6): cond_2(X)
p. 160, line(6): sparsity; line(-8): πe = 1
p. 167, line(-5): λ' = ···
p. 168, Theorem 4.3.7: ||x'_i|| = 1
p. 171, line(-14): A is ...; line(-12): (1 + |λ - α|)^{l-1} ...
p. 173, line(-16): U(0, B)^{-1} Y*; line(-15): [···, sp(B)] > 0
p. 174, line(17): . Then
p. 178, line(10): = max
p. 182, line(9): α - λ
p. 187, line(-5): min_j (ρ_j - λ). In particular csp(x) = ||Σ^{-1}||_2 ≤ ···
p. 190, line(-17): how the eigenvalue and eigenvector condition numbers need not be related when A is not normal
p. 199, line(-1): (S, x)
p. 200, line(8): (S', x); line(17): y_n; line(-3): Au - αu, A*v - ᾱv
p. 203, line(3): (||r(α)||² + ||s(α)||²)^{1/2} ···
p. 204, line(9): a matrix H; line(10): = UC and V*(A - H) =

Chapter 5
p. 210, line(-15): ω(···, M_l) → 0; line(-14): Theorem 5.2.1; line(-4): - σ_{ij} a_i a_j
p. 212, line(-2): ..., assume that x is the first column of X, then; footnote: André-Louis Cholesky, 1875-1918, born in Montguyon, died in Bagneux.
p. 213, line(3): AQ = QT
p. 217, line(-5): A - σI =
p. 218, line(7): = [x, S](···
p. 219, line(2): -c + 1; line(11): (A - σI)y = 0
p. 220, line(4): ... exists a vector u such that q_1 is an eigenvector
p. 221, line(17): (Q_1 ··· Q_k)
p. 222, line(-18): r < n; line(-14): L_k D_k ···
p. 225, line(-8): Q_k e_n = ···
p. 227, line(-3): = -Ax_k + ···
p. 229, line(-1): B = Q*AQ
p. 231, line(9): eigenvalue λ; line(16): same
p. 233, line(-13): X_k
p. 236, line(2): x_{i+1} = ···
p. 242, line(-5): (A - u_j I)q
p. 243, line(-5): (T - σI) = ···
p. 244, line(9): that; line(11): ... of S and
p. 247, line(13): x*B*Bx

Chapter 6
p. 253, line(9): G_l
p. 255, line(17): ≤ cα_i.
p. 258, line(3): K_m = lin(..., A^{m-1}u); line(-2): b_j; line(-1): u := u - ···
p. 260, line(8): Let
p. 261, line(-1): H
p. 262, line(13): P(λ_1) = ··· =
p. 267, line(4): J_l; line(14): (a) delete A
p. 269, line(4): s_k ∈ S; line(11): S_k = X_k + Σ_j P_j S_k; line(-1): (γ_1)
p. 271, line(14): required; line(15): same
p. 272, line(6): u := u - ···
p. 273, line(1): M
p. 275, line(3): sp(A) - {λ}; line(-4): centre c,; line(-2): denominator
p. 276, line(9): ≤ cα_i; line(-4): Delete the end of the proof and replace it by the following: When dealing with ||(P_l - P)x|| in Theorem 6.3.4 we did not use the assumption that A is Hermitian. Therefore the result sin θ_l ≤ ||(P_l - P)x|| ≤ cα_l is valid. To bound λ - λ_l we also follow the proof of Theorem 6.3.4, where x_l (respectively x'_l) is replaced by x̃_l (respectively x̃'_l). The conclusion follows accordingly.
p. 278, line(9): i_0 = j; line(16): i = i* under Σ; line(19): ignore sketch of H_l; line(-7): |λ_i; line(-5): h_{l+1,l}; line(-1): H_l
p. 279, lines(-11, -8, -1): w_l; line(-8): (6.7.1)
p. 281, line(-8): S_l b in 6.1.2
p. 282, line(-12): Theorem 6.2.6 in 6.2.5
p. 283, line(1): (v*u_{j+1})v_1 in 6.3.3
p. 284, line(8): change m to l in T_l in 6.3.5
p. 287, line(5): Exercise 6.3.13 in 6.3.14
p. 289, line(3): d_0, ..., d_{m-1} in 6.3.18; line(10): r_i^T r_j = 0 in 6.3.18; line(18): tridiagonal matrix T_k in 6.3.18
p. 291, line(12): z ∈ Im X in 6.5.1; line(-12): H_l in 6.6.3; line(-5): w_1 in 6.7.1

Chapter 7
p. 294, line(10): Rivlin; line(-4): Theorem 7.1.1
p. 295, line(9): and f; Theorem 7.1.5: C(S); lines(-3, -1): p_k
p. 296, lines(5, 9): p_k; line(10): ω(z)
p. 297, line(-2): change i to :
p. 298, line(2): p_k
p. 299, line(13): Chebyshev
p. 300, line(7): Rivlin; line(-1): v(k); line(-2): Figure 7.3.2
p. 302, line(-3): |h(z)|; line(-1): p(t) = t; footnote: Sommières
p. 304, line(-2): P/(λ - c)
p. 305, line(-10): for y_k :
p. 309, line(-3): y ∈ S_k
p. 311, line(15): max h(μ_i) =
p. 314, line(-14): (⟨e, q_k⟩ ···); line(-10): ≥ ||q_k|| ···

Appendix A
p. 355, line(-7): Q̃ in 1.2.6; line(-3): [Q, Q̃]*[U, Ũ] in 1.2.6
p. 374, line(4): inequality (a) in 4.2.2
p. 381, line(8): CHAPTER 5
p. 389, line(-10): (W_i G_i^{1/2})* W_i = G_i^{1/2} W_i* W_i in 6.6.3
p. 390, line(4): H_l in 6.6.3

Appendix B
p. 395, line(-12): [15] de la méthode ...

Appendix C
p. 399: Balas; Barra; Baumgärtel, H. (1985)
p. 400, line(20): Bauwens
p. 402, line(24): ... Sijthoff and Noordhoff.; line(25): Meyer, C.D. Jr. ...
p. 403, line(20): Oeuvres; line(-11): Rutishauser, H. (1969) 'Computational aspects of F. L. Bauer's simultaneous iteration method', Numer. Math., 13, 4-13.

CHAPTER 1

Supplements from Linear Algebra

invariant subspace, that is to say of a (possibly orthonormal) basis in that subspace. In this chapter we present the tools from linear algebra which are specific

for the treatment of eigenvalues. In particular, the canonical angles between two

subspaces are an appropriate measure of the 'distance' between these two

subspaces. The convergence of subspaces is translated into the convergence of

their bases up to a non-singular or unitary matrix, as the case may be. As often

as possible, we also use Schur's form in preference to that of Jordan; the latter is

important from a theoretical point of view but is numerically unstable.

1.1 NOTATION AND DEFINITIONS

Let $\mathbb{C}^n$ represent the space of column vectors $x$ with complex components $\xi_j$ ($j = 1,\dots,n$). Then $x^*$ is the row vector with components $\bar{\xi}_j$. Unless otherwise stated, the norm on $\mathbb{C}^n$ is the Euclidean norm

$$\|x\|_2 = \Bigl(\sum_{j=1}^n |\xi_j|^2\Bigr)^{1/2}.$$

The norms

$$\|x\|_1 = \sum_{j=1}^n |\xi_j| \quad\text{and}\quad \|x\|_\infty = \max_j |\xi_j|$$

are also useful. Unless the contrary is indicated, $\|\cdot\|$ denotes an arbitrary norm on $\mathbb{C}^n$.

The scalar product on $\mathbb{C}^n$ is given by

$$(x, y) = y^*x.$$

When $(x, y) = 0$, the vectors $x$ and $y$ are said to be orthogonal.

A basis of $\mathbb{C}^n$ is a set of $n$ linearly independent vectors. The basis is orthonormal if and only if

$$(x_i, x_j) = \delta_{ij} \quad (i, j = 1,\dots,n).$$

The representation of a vector $x$ in an orthonormal basis is given by

$$x = \sum_{j=1}^n \xi_j x_j, \qquad \xi_j = (x, x_j) \quad (j = 1,\dots,n).$$

In an arbitrary basis $\{x_j\}_1^n$ of $\mathbb{C}^n$ we have the representation

$$x = \sum_{j=1}^n \xi_j x_j, \qquad \xi_j = (x, y_j) \quad (j = 1, 2,\dots,n),$$

where $\{y_j : j = 1, 2,\dots,n\}$ is another basis of $\mathbb{C}^n$ such that

$$(x_i, y_j) = \delta_{ij} \quad (i, j = 1, 2,\dots,n). \tag{1.1.1}$$

The proof of the existence and uniqueness of the basis $\{y_j\}$ is given in Exercise 1.1.1. The basis $\{y_j\}$ defined in equation (1.1.1) is called the adjoint basis of $\{x_j\}$. We also say that the $2n$ vectors $\{x_j\}$ and $\{y_j\}$ form a biorthogonal collection of elements of $\mathbb{C}^n$.
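In computational terms, the adjoint basis is simply the conjugate transpose of the inverse of the basis matrix. A minimal Python sketch, assuming NumPy and an arbitrary test basis:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))      # columns x_j: an arbitrary basis of R^4
Y = np.linalg.inv(X).conj().T        # columns y_j: the adjoint basis, Y*X = I
assert np.allclose(Y.conj().T @ X, np.eye(4))

x = rng.standard_normal(4)
xi = Y.conj().T @ x                  # coefficients xi_j = (x, y_j) = y_j* x
assert np.allclose(X @ xi, x)        # x = sum_j xi_j x_j, as in (1.1.1)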

Let $\{a_j : j = 1,\dots,r\}$ be a set of $r$ vectors of $\mathbb{C}^n$. We denote by

$$A = [a_1,\dots,a_r]$$

the rectangular matrix of order $n \times r$ whose columns are the vectors $a_1,\dots,a_r$. To fix the ideas we assume that $r \le n$. If

$$a_j = (a_{ij}) \quad (i = 1,\dots,n;\ j = 1,\dots,r),$$

then $A = (a_{ij})$. The vector space which is generated by $\{a_j\}$ is denoted by

$$\operatorname{lin}(a_1,\dots,a_r).$$

We shall often identify the matrix $A$ with the linear mapping $A\colon \mathbb{C}^r \to \mathbb{C}^n$ which is represented by the matrix $A$ when $\mathbb{C}^r$ and $\mathbb{C}^n$ are referred to their respective canonical bases.


Example 1.1.1 The unit matrix $I_n$ represents the identity map $\mathbb{C}^n \to \mathbb{C}^n$. When there is no ambiguity we shall write $I$.

The matrix $A^* = (\bar{a}_{ji})$ is the transposed conjugate of $A$; it reduces to the transpose $A^{\mathrm{T}} = (a_{ji})$ (or ${}^{\mathrm{t}}A$) when $A$ is real. Let $A$ be an $n \times n$ square matrix. The trace of $A$ is given by

$$\operatorname{tr} A = \sum_{i=1}^n a_{ii}.$$

The matrix $A$ is said to be normal if

$$AA^* = A^*A,$$

and it is called Hermitian if

$$A = A^*.$$

For a real matrix, being Hermitian means being symmetric, that is $A = A^{\mathrm{T}}$. The Hermitian matrix $A$ is positive definite (respectively positive semi-definite) if $x \neq 0$ implies that $x^*Ax > 0$ (respectively $x^*Ax \ge 0$).

An $n \times r$ rectangular matrix $Q$ is said to be orthogonal if

$$Q^*Q = I_r.$$

An $n \times n$ square matrix $Q$ is called unitary if

$$Q^*Q = QQ^* = I_n.$$

The columns of a unitary matrix form an orthonormal basis of $\mathbb{C}^n$.

The set of all $n \times r$ matrices over $\mathbb{C}$ is denoted by $\mathbb{C}^{n \times r}$; it is isomorphic to $\mathscr{L}(\mathbb{C}^r, \mathbb{C}^n)$, the set of all linear maps of $\mathbb{C}^r$ into $\mathbb{C}^n$. Corresponding to any norms $\|\cdot\|_{\mathbb{C}^r}$ and $\|\cdot\|_{\mathbb{C}^n}$ on $\mathbb{C}^r$ and $\mathbb{C}^n$ respectively, we can define an induced norm (subordinate norm) on $\mathbb{C}^{n \times r}$ as follows:

$$\|A\| = \max_{0 \neq x \in \mathbb{C}^r} \frac{\|Ax\|_{\mathbb{C}^n}}{\|x\|_{\mathbb{C}^r}}.$$


Example 1.1.2 (see Isaacson and Keller, 1966, pp. 9-10) When the norm $\|\cdot\|_1$ is used for both $\mathbb{C}^r$ and $\mathbb{C}^n$, then

$$\|A\|_1 = \max_j \sum_{i=1}^n |a_{ij}|.$$

When the norm $\|\cdot\|_\infty$ is used for both $\mathbb{C}^r$ and $\mathbb{C}^n$, then

$$\|A\|_\infty = \max_i \sum_{j=1}^r |a_{ij}| = \|A^{\mathrm{T}}\|_1.$$

An induced norm is submultiplicative:

$$\|AB\| \le \|A\|\,\|B\|.$$

The condition number, $\operatorname{cond}(A)$, of a square matrix (relative to inversion) is defined by

$$\operatorname{cond}(A) = \|A\|\,\|A^{-1}\|.$$
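A quick numerical illustration of these formulas, as a minimal Python sketch assuming NumPy (the matrix is an arbitrary example):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.norm(A, 1))        # max column sum: 6.0
print(np.linalg.norm(A, np.inf))   # max row sum: 7.0, equals ||A^T||_1
print(np.linalg.cond(A, 2))        # ||A||_2 ||A^{-1}||_2, about 14.93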

Let $A$ be a real or complex square matrix of order $n$. We consider the eigenvalue problem: find the pairs $(\lambda, x)$, where $\lambda \in \mathbb{C}$ and $x \in \mathbb{C}^n$, $x \neq 0$, such that

$$Ax = \lambda x. \tag{1.1.2}$$

The complex number $\lambda$ is an eigenvalue of $A$ if and only if it is a zero of the characteristic polynomial

$$\pi(\lambda) = \det(\lambda I - A),$$

the determinant of $\lambda I - A$. This polynomial has $n$ zeros in $\mathbb{C}$, not necessarily distinct; together they form the spectrum of $A$, denoted by $\operatorname{sp}(A)$. Thus

$$\operatorname{sp}(A) = \{\lambda \in \mathbb{C} : \lambda \text{ is an eigenvalue of } A\}.$$

The spectral radius of $A$ is

$$\rho(A) = \max\{|\lambda| : \lambda \in \operatorname{sp}(A)\}.$$

A subspace $M$ of $\mathbb{C}^n$ is said to be invariant under $A$ if

$$AM \subset M.$$

In particular, when $\lambda$ is an eigenvalue of $A$, the eigenspace $\operatorname{Ker}(A - \lambda I)$, which is generated by all the eigenvectors associated with $\lambda$, is invariant under $A$.

The singular values of the $n \times r$ rectangular matrix $A$ are the non-negative square roots of the eigenvalues of the square matrix $A^*A$ of order $r$. The norm

$$\|A\|_2 = \rho^{1/2}(A^*A) \tag{1.1.3}$$

(see Exercise 1.1.6), where $\rho^{1/2}(A^*A)$ denotes $\sqrt{\rho(A^*A)}$, is often called the spectral norm of $A$. It is majorized by $\|A\|_{\mathrm{F}}$, the Frobenius norm, which is easier to calculate:

$$\|A\|_{\mathrm{F}} = \Bigl(\sum_{i=1}^n \sum_{j=1}^r |a_{ij}|^2\Bigr)^{1/2}.$$

For if the eigenvalues of $A^*A$ are $\alpha_1 \ge \cdots \ge \alpha_r \ge 0$, we have that

$$\|A\|_{\mathrm{F}}^2 = \operatorname{tr}(A^*A) = \alpha_1 + \cdots + \alpha_r \ge \alpha_1 = \rho(A^*A) = \|A\|_2^2.$$

According to the context, this norm is also called the Schur norm or the Hilbert-Schmidt norm. It is the Euclidean norm of the vector $(a_{ij})$ in $\mathbb{C}^{nr}$.

(Issai Schur, 1875-1941, born in Mogilev, died in Tel Aviv. David Hilbert, 1862-1943, born in Königsberg, died in Göttingen. Erhard Schmidt, 1876-1959, born in Dorpat, died in Berlin.)
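Both the trace formula and the majorization can be checked numerically; a minimal Python sketch, assuming NumPy and an arbitrary random matrix:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
spec = np.linalg.norm(A, 2)                            # rho(A*A)^{1/2}
frob = np.linalg.norm(A, 'fro')
assert np.isclose(frob**2, np.trace(A.conj().T @ A))   # tr(A*A) = ||A||_F^2
assert spec <= frob + 1e-12                            # ||A||_2 <= ||A||_F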

1.2 THE CANONICAL ANGLES BETWEEN TWO SUBSPACES

Let $M$ and $N$ be two subspaces of $\mathbb{C}^n$, each of dimension $r$. The relative position of the two subspaces can be described with the help of canonical angles, which we are going to define after establishing a preliminary lemma. We suppose that the subspaces are determined by orthonormal bases consisting of the column vectors

$$\{q_1,\dots,q_r\} \quad\text{and}\quad \{u_1,\dots,u_r\}$$

respectively. We identify the bases with the $n \times r$ matrices

$$Q = [q_1 \cdots q_r] \quad\text{and}\quad U = [u_1 \cdots u_r],$$

which satisfy the equations

$$Q^*Q = U^*U = I.$$

Lemma 1.2.1 The singular values of $U^*Q$ lie between 0 and 1.

PROOF Let $\{c_i : i = 1,\dots,r\}$ be the set of singular values of $U^*Q$. Then

$$0 \le c_i^2 \le \rho(Q^*UU^*Q) = \|U^*Q\|_2^2 \le \|U\|_2^2\,\|Q\|_2^2 = \rho(U^*U)\,\rho(Q^*Q) = 1.$$

Definition The $r$ angles $\theta_j$ defined by

$$c_j = \cos\theta_j, \qquad 0 \le \theta_j \le \pi/2 \quad (j = 1,\dots,r),$$

are called the canonical angles between $M$ and $N$; the singular values $c_j$ are arranged in a decreasing sequence, so that the angles $\theta_j$ form an increasing sequence. We introduce the diagonal matrix

$$\Theta = \operatorname{diag}(\theta_1,\dots,\theta_r).$$

The canonical angle $\theta_{\max} = \theta_r$ is the maximal angle between $M$ and $N$. Analogously to $\cos\Theta$, we define

$$\tan\Theta = \operatorname{diag}(\tan\theta_1,\dots,\tan\theta_r).$$

By definition, the singular values of $U^*Q$ are the same as those of the matrix

$$\cos\Theta = \operatorname{diag}(\cos\theta_i).$$

The property of having the same singular values establishes an equivalence relation on the set of all matrices, which we denote by $\sim$. Thus we have proved

Proposition 1.2.2

$$U^*Q \sim \cos\Theta.$$

In particular we deduce that

$$\|U^*Q\|_2 = \|\cos\Theta\|_2 \quad\text{and}\quad \|U^*Q\|_{\mathrm{F}} = \|\cos\Theta\|_{\mathrm{F}},$$

because the spectral norm $\|\cdot\|_2$ and the Frobenius norm $\|\cdot\|_{\mathrm{F}}$ depend only on the singular values.

In Exercise 1.2.1 the reader will be introduced to the case in which the common

dimension of M and N exceeds n/2.
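In floating-point practice the canonical angles are obtained exactly as in Lemma 1.2.1 and Proposition 1.2.2, from the singular values of $U^*Q$. A minimal Python sketch, assuming NumPy and two arbitrary random subspaces:

import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of M
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of N

c = np.linalg.svd(U.conj().T @ Q, compute_uv=False)  # singular values of U*Q
c = np.clip(c, 0.0, 1.0)              # they lie in [0, 1] by Lemma 1.2.1
theta = np.arccos(c)                  # the canonical angles theta_j
print(np.degrees(theta))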

We now suppose that $M$ is referred to an orthonormal basis $Q$ and that $N$ is referred to the adjoint basis $Y$ (if it exists). Thus

$$Q^*Q = Y^*Q = I.$$

The following lemma ensures the existence of the adjoint basis provided that $\theta_{\max} < \pi/2$.

Lemma 1.2.3 The adjoint basis $Y$ exists if and only if the maximal angle $\theta_{\max}$ between $M$ and $N$ is less than $\pi/2$.

PROOF $Y$ exists if and only if there is an invertible matrix $B$ of order $r$ such that

$$Y = UB \quad\text{and}\quad Y^*Q = I,$$

that is

$$B^*U^*Q = I.$$

Thus $B^*$ exists if and only if $U^*Q$ is invertible. By Proposition 1.2.2 (see Exercise 1.1.8), the matrix $U^*Q$ is invertible if and only if

$$\cos\theta_{\max} > 0,$$

and then

$$B^{-1} = (U^*Q)^* = Q^*U.$$

Lemma 1.2.4 Let $X, Y$ and $X', Y'$ be two pairs of adjoint bases for $M$ and $N$. Then there exists an invertible matrix $C$ of order $r$ such that

$$X' = XC \quad\text{and}\quad Y' = Y(C^{-1})^*.$$

PROOF Since $X$ and $X'$ are bases for $M$, there exists an invertible matrix $C$ such that $X' = XC$. Similarly, there exists an invertible matrix $D$ satisfying $Y' = YD$. By hypothesis, the bases for $M$ and $N$ are adjoint in pairs; thus

$$Y^*X = I \quad\text{and}\quad (Y')^*X' = I.$$

Hence

$$(YD)^*(XC) = D^*Y^*XC = D^*C = I,$$

so that

$$D = (C^*)^{-1} = (C^{-1})^*.$$

Proposition 1.2.5 Let $Y$ and $Q$ be adjoint bases of $M$ and $N$ respectively, $Q$ being orthonormal, and let $\Theta$ be the matrix of canonical angles between $M$ and $N$. Then, if $\theta_{\max} < \pi/2$,

$$Y \sim (\cos\Theta)^{-1} \quad\text{and}\quad Q - Y \sim \tan\Theta.$$

PROOF By Lemma 1.2.3 there exists an invertible matrix $B$ such that

$$Y = UB \quad\text{and}\quad B^{-1} = Q^*U.$$

Now

$$Y^*Y = B^*U^*UB = B^*B.$$

Hence the singular values of $Y$ are the same as those of $B$, and the latter are inverse to the singular values of $Q^*U$. However, $Q^*U$ and $U^*Q$ have the same singular values, $\cos\theta_j$. Hence

$$Y \sim (\cos\Theta)^{-1}.$$

Next

$$(Q^* - Y^*)(Q - Y) = Q^*Q - Y^*Q - Q^*Y + Y^*Y = I - I - I + Y^*Y = Y^*Y - I.$$

Hence if $\tau_j$ is a singular value of $Q - Y$, we have that

$$\tau_j^2 = \cos^{-2}\theta_j - 1 = \tan^2\theta_j.$$

The canonical angles enable us to extend to $\mathbb{C}^n$ some well-known trigonometric relations in the plane, where $\theta$ ($< \pi/2$) is the acute angle between two straight lines $M$ and $N$ (see Figure 1.2.1):

$$\|q\|_2 = \|u\|_2 = 1, \qquad \|q - u\|_2 = 2\sin\frac{\theta}{2}, \qquad \|y\|_2 = \frac{1}{\cos\theta},$$

$$\|q - y\|_2 = \tan\theta, \qquad \|\hat{y}\|_2 = \cos\theta, \qquad \|q - \hat{y}\|_2 = \sin\theta.$$

Figure 1.2.1

1.3 PROJECTIONS

A projection $P$ is a linear idempotent map:

$$P^2 = P.$$

To each projection there corresponds a decomposition of $\mathbb{C}^n$ into a direct sum:

$$\mathbb{C}^n = M \oplus W, \qquad M = \operatorname{Im} P, \quad W = \operatorname{Ker} P.$$

Indeed, we can write

$$x = Px + (x - Px),$$

where $Px \in \operatorname{Im} P$ and $x - Px \in \operatorname{Ker} P$. We say that $P$ is a projection on $M$ parallel to $W$. Conversely, if a direct decomposition of $\mathbb{C}^n$ is given, we can define a projection by stipulating that $P$ is the identity map on $M$ and the zero map on $W$. We put $N = W^\perp$, and so

$$W = N^\perp = \{x \in \mathbb{C}^n : x^*y = 0 \text{ for all } y \in N\}.$$

Lemma 1.3.1 The space $\mathbb{C}^n$ can be decomposed into the direct sum

$$\mathbb{C}^n = M \oplus N^\perp$$

if and only if $\theta_{\max} < \pi/2$, where $\theta_{\max}$ is the maximal angle between $M$ and $N$.

PROOF According to Exercise 1.2.2 the equality $\theta_{\max} = \pi/2$ is equivalent to the existence of a non-zero vector that belongs to both $M$ and $N^\perp$ (see Figure 1.3.1).

Figure 1.3.1

Proposition 1.3.2 Let $X$ and $Y$ be adjoint bases for $M$ and $N$ (which exist when $\theta_{\max} < \pi/2$). Then the matrix

$$P = XY^* \tag{1.3.1}$$

represents the projection on $M$ parallel to $N^\perp$ in the canonical basis of $\mathbb{C}^n$.

PROOF By hypothesis

$$Y^*X = I. \tag{1.3.2}$$

Hence

$$P^2 = XY^*XY^* = XY^* = P;$$

so $P$ is a projection. Let $x \in \mathbb{C}^n$; then

$$Px = X(Y^*x) = \sum_{i=1}^r (y_i^*x)\,x_i \in M. \tag{1.3.3}$$

Conversely, if $u \in M$, then $u = Xa$ for some vector $a$, and by (1.3.2) we can write

$$u = X(Y^*X)a = XY^*(Xa) = P(Xa).$$

Hence $M = \operatorname{Im} P$, as claimed. We deduce from equation (1.3.3) that $x \in \operatorname{Ker} P$ if and only if

$$y_i^*x = 0 \quad (i = 1,\dots,r),$$

that is

$$\operatorname{Ker} P = N^\perp.$$

Suppose that $X', Y'$ is another pair of adjoint bases. Then $X' = XC$, $Y' = Y(C^*)^{-1}$, and so

$$X'(Y')^* = XY^*.$$
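A minimal Python sketch of the construction (1.3.1), assuming NumPy; the bases X and W are arbitrary random examples, and B is chosen so that Y = WB is adjoint to X:

import numpy as np

rng = np.random.default_rng(3)
n, r = 5, 2
X = rng.standard_normal((n, r))               # a basis of M
W = rng.standard_normal((n, r))               # a basis of N
B = np.linalg.inv(W.conj().T @ X).conj().T    # so that Y = W B gives Y*X = I
Y = W @ B
P = X @ Y.conj().T                            # P = X Y*, as in (1.3.1)

assert np.allclose(P @ P, P)                  # idempotent: P is a projection
assert np.allclose(P @ X, X)                  # P is the identity on M = Im P
v = rng.standard_normal(n)
z = v - W @ np.linalg.pinv(W) @ v             # z orthogonal to N
assert np.allclose(P @ z, 0)                  # Ker P = N-perp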

When $M = N$, we obtain the unique orthogonal projection of $\mathbb{C}^n$ on $M$; its matrix will be denoted by $\pi_M$. Suppose that $M$ is given by the orthonormal basis $X$ relative to the canonical basis of $\mathbb{C}^n$ ($X^*X = I$). Then

$$\pi_M = XX^*.$$

We remark that $\pi_M$ is Hermitian and that

$$\|\pi_M\|_2 = 1;$$

this follows from equation (1.1.3) and from the fact that all eigenvalues of a projection matrix are zero or unity.

1.4 THE GAP BETWEEN TWO SUBSPACES

Let $M$ and $N$ be two subspaces of $\mathbb{C}^n$, not necessarily of the same dimension. The gap between $M$ and $N$ is defined by

$$\omega(M, N) = \|\pi_M - \pi_N\|_2,$$

where $\pi_M$ and $\pi_N$ are the orthogonal projections on $M$ and $N$ respectively. The shortest distance, in the $\|\cdot\|_2$ metric, of $x$ from the subspace $N$ is denoted by $\operatorname{dist}(x, N)$. This is the distance between $x$ and $\pi_N x$, which is the foot of the perpendicular from $x$ on to $N$. Thus

$$\operatorname{dist}(x, N) = \|x - \pi_N x\|_2 = \|(I - \pi_N)x\|_2. \tag{1.4.1}$$

Proposition 1.4.1

$$\omega(M, N) = \max\{\max_x \operatorname{dist}(x, N),\ \max_y \operatorname{dist}(y, M)\},$$

where the maxima are taken over $x \in M$, $x^*x = 1$, and $y \in N$, $y^*y = 1$.

PROOF

(a) We shall show that

$$\max\{\operatorname{dist}(x, N) : x \in M,\ x^*x = 1\} = \|(I - \pi_N)\pi_M\|. \tag{1.4.2}$$

Since $\pi_M x = x$ if and only if $x \in M$, we have

$$\operatorname{dist}(x, N) = \|(I - \pi_N)x\| = \|(I - \pi_N)\pi_M x\| \le \|(I - \pi_N)\pi_M\|\,\|x\| = \|(I - \pi_N)\pi_M\|.$$

In particular,

$$\max\{\operatorname{dist}(x, N) : x \in M,\ x^*x = 1\} \le \|(I - \pi_N)\pi_M\|. \tag{1.4.3}$$

Conversely, by the definition of the matrix norm, there exists a unit vector $u \in \mathbb{C}^n$ such that

$$\|(I - \pi_N)\pi_M\| = \|(I - \pi_N)\pi_M u\|. \tag{1.4.4}$$

Two cases have to be distinguished: (i) if $\pi_M u = 0$, then $\pi_M = \pi_N\pi_M$, which implies that $M \subset N$, and so both sides of equation (1.4.2) are zero; (ii) if $\pi_M u \neq 0$, then $x_0 = \pi_M u/\|\pi_M u\|$ is a unit vector of $M$ and we deduce from equation (1.4.4) that

$$\|(I - \pi_N)\pi_M\| = \|\pi_M u\|\operatorname{dist}(x_0, N) \le \operatorname{dist}(x_0, N),$$

because

$$\|\pi_M u\| \le \|\pi_M\|\,\|u\| = 1.$$

Hence

$$\|(I - \pi_N)\pi_M\| \le \max\{\operatorname{dist}(x, N) : x \in M,\ x^*x = 1\}. \tag{1.4.5}$$

The assertion (1.4.2) follows from equations (1.4.3) and (1.4.5).

(b) Proposition 1.4.1 can now be reformulated as follows:

$$\omega(M, N) = \max\{\|(I - \pi_N)\pi_M\|,\ \|(I - \pi_M)\pi_N\|\}. \tag{1.4.6}$$

Since $\|(I - \pi_N)\pi_M\| = \|(\pi_M - \pi_N)\pi_M\| \le \|\pi_M - \pi_N\|$ and, similarly,

$$\|(I - \pi_M)\pi_N\| \le \|\pi_N - \pi_M\| = \|\pi_M - \pi_N\|,$$

we have

$$\omega(M, N) \ge \max\{\|(I - \pi_N)\pi_M\|,\ \|(I - \pi_M)\pi_N\|\}. \tag{1.4.7}$$

Conversely, by the definition of the norm, there exists a unit vector $x \in \mathbb{C}^n$ such that

$$\|\pi_M - \pi_N\| = \|(\pi_M - \pi_N)x\|. \tag{1.4.8}$$

Now

$$(\pi_M - \pi_N)x = \pi_M(I - \pi_N)x - (I - \pi_M)\pi_N x = u - v,$$

say, where

$$u = \pi_M(I - \pi_N)x, \qquad v = (I - \pi_M)\pi_N x.$$

Since $\pi_M^* = \pi_M$ and $\pi_M(I - \pi_M) = 0$, it is readily verified that

$$u^*v = 0,$$

whence

$$\|(\pi_M - \pi_N)x\|^2 = \|u\|^2 + \|v\|^2.$$

Using the relations $(I - \pi_N)^2 = I - \pi_N$ and $\pi_N^2 = \pi_N$ we obtain

$$\|u\|^2 \le \|\pi_M(I - \pi_N)\|^2\,\|(I - \pi_N)x\|^2$$

and

$$\|v\|^2 \le \|(I - \pi_M)\pi_N\|^2\,\|\pi_N x\|^2.$$

Hence

$$\|u\|^2 + \|v\|^2 \le \max\{\|\pi_M(I - \pi_N)\|^2,\ \|(I - \pi_M)\pi_N\|^2\}\,\{\|(I - \pi_N)x\|^2 + \|\pi_N x\|^2\},$$

and $\|\pi_M(I - \pi_N)\| = \|(I - \pi_N)\pi_M\|$, since the two matrices are adjoints of one another. A simple calculation shows that

$$\|(I - \pi_N)x\|^2 + \|\pi_N x\|^2 = \|x\|^2 = 1.$$

On referring to equation (1.4.8) and taking square roots we obtain

$$\omega(M, N) \le \max\{\|(I - \pi_N)\pi_M\|,\ \|(I - \pi_M)\pi_N\|\}. \tag{1.4.9}$$

The statement (1.4.6) now follows from equations (1.4.7) and (1.4.9) (see Figure 1.4.1 in $\mathbb{R}^2$).

Lemma 1.4.2 Let $P$ and $P'$ be projections on $M$ and $N$ respectively. Then $\|P - P'\| < 1$ implies that $\dim M = \dim N$.

PROOF We shall prove that $\|P - P'\| < 1$ implies that $\dim M \le \dim N$. Let $x_1,\dots,x_r$ be a basis of $M$. We shall show that the vectors $P'x_1,\dots,P'x_r$ are linearly independent. If not, there exist scalars $\alpha_1,\dots,\alpha_r$, not all zero, such that $P'y = 0$, where

$$y = \sum_{i=1}^r \alpha_i x_i.$$

Since $y \in M$, we have $Py = y$, and so

$$(P - P')y = y.$$

However, this contradicts the hypothesis that $\|P - P'\| < 1$. By interchanging $M$ and $N$, we conclude in the same way that $\dim N \le \dim M$.

Corollary 1.4.3 $\omega(M, N) < 1$ implies that $\dim M = \dim N$.

Theorem 1.4.4 Suppose that $\dim M = \dim N = r \le n/2$. Then the $2r$ eigenvalues of $\pi_M - \pi_N$ which are not necessarily zero are equal to

$$\pm\sin\theta_i \quad (i = 1,\dots,r).$$

PROOF Let $[Q, \tilde{Q}]$ and $[U, \tilde{U}]$ be the bases of $\mathbb{C}^n$ defined in Exercise 1.2.6. Relative to the orthonormal basis $[Q, \tilde{Q}]$, the projection $\pi_M$ is represented by

$$\begin{pmatrix} I_r & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

and the projection $\pi_N$ is represented by

$$\begin{pmatrix} C \\ -S \\ 0 \end{pmatrix}\begin{pmatrix} C & -S & 0 \end{pmatrix},$$

where $C = \cos\Theta$ and $S = \sin\Theta$. Hence $\Pi := \pi_M - \pi_N$ is represented by

$$\Pi = \begin{pmatrix} S^2 & CS & 0 \\ CS & -S^2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

By a suitable permutation of the rows and the columns one verifies that the eigenvalues of $\pi_M - \pi_N$ which are not necessarily zero are equal to the eigenvalues of the $r$ two-by-two matrices

$$\begin{pmatrix} s_j^2 & c_j s_j \\ c_j s_j & -s_j^2 \end{pmatrix},$$

that is $\pm s_j$ ($j = 1,\dots,r$). Since $\Pi$ is symmetric, the $s_j$ are also the singular values of $\Pi$.

Corollary 1.4.5

$$\omega(M, N) = \sin\theta_{\max}.$$

PROOF We remark that $\pi_M - \pi_N$ and $\sin\Theta$ have the same non-zero singular values, whence the result follows at once.
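Corollary 1.4.5 lends itself to a direct numerical check; a minimal Python sketch, assuming NumPy and two arbitrary random subspaces of the same dimension:

import numpy as np

rng = np.random.default_rng(4)
n, r = 7, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
U, _ = np.linalg.qr(rng.standard_normal((n, r)))

gap = np.linalg.norm(Q @ Q.conj().T - U @ U.conj().T, 2)   # omega(M, N)
c = np.clip(np.linalg.svd(U.conj().T @ Q, compute_uv=False), 0.0, 1.0)
theta_max = np.arccos(c).max()
assert np.isclose(gap, np.sin(theta_max))                  # Corollary 1.4.5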

1.5 CONVERGENCE OF A SEQUENCE OF SUBSPACES

Let $\{M_k : k = 1, 2,\dots\}$ be a sequence of subspaces. We are going to define what is meant by saying that this sequence tends to a subspace $M$. We put

$$\dim M_k = r_k, \qquad \dim M = r.$$

Definition $M_k \to M$ as $k \to \infty$ if and only if

$$\omega(M_k, M) = \sin\theta_{\max}^{(k)} \to 0,$$

where $\theta_{\max}^{(k)}$ is the maximal acute angle between $M_k$ and $M$, or, again, if and only if

$$\sin\Theta^{(k)} \to 0,$$

or, equivalently, if and only if

$$\|\pi_{M_k} - \pi_M\|_2 \to 0.$$

Proposition 1.5.1 If $M_k \to M$, then, for sufficiently great $k$,

$$r_k = r.$$

PROOF By hypothesis,

$$\omega(M_k, M) = \|\pi_{M_k} - \pi_M\|_2 \to 0.$$

Thus, for sufficiently great $k$,

$$\|\pi_{M_k} - \pi_M\|_2 < 1.$$

The result now follows from Corollary 1.4.3.

Theorem 1.5.2 Let $Q_k$ and $Q$ be orthonormal bases for the subspaces $M_k$ and $M$ respectively ($k = 1, 2,\dots$). Without loss of generality, assume that

$$\dim M_k = \dim M = r.$$

Then

$$M_k \to M \quad\text{as } k \to \infty$$

if and only if there exists a sequence of unitary matrices $U_k$ ($k = 1, 2,\dots$) of order $r$ such that

$$Q_k U_k \to Q \quad\text{as } k \to \infty.$$

PROOF

(a) Assume that $M_k \to M$, so that $\pi_{M_k} \to \pi_M$, or, in terms of the corresponding matrices,

$$Q_k Q_k^* - QQ^* \to 0.$$

Multiplying on the right by $Q$ and using the fact that $Q^*Q = I$, we obtain

$$Q_k C_k \to Q,$$

where

$$C_k = Q_k^* Q.$$

Our hypothesis implies that $C_k^* C_k \to I$; in particular

$$|\det C_k|^2 \to 1.$$

Hence, for sufficiently great values of $k$, the Hermitian matrix $C_k^* C_k$ is positive definite. Every positive definite Hermitian matrix possesses at least one positive definite square root (see Horn and Johnson, 1990, p. 405). We shall denote a positive definite square root of $C_k^* C_k$ by $(C_k^* C_k)^{1/2}$ and its inverse by $(C_k^* C_k)^{-1/2}$.

It is easy to verify that

$$U_k = C_k (C_k^* C_k)^{-1/2}$$

is unitary; indeed,

$$U_k U_k^* = C_k (C_k^* C_k)^{-1/2}(C_k^* C_k)^{-1/2} C_k^* = C_k (C_k^* C_k)^{-1} C_k^* = I.$$

In accordance with our assumption, all eigenvalues of $C_k^* C_k$ tend to unity as $k \to \infty$. The same is true for $(C_k^* C_k)^{1/2}$; hence

$$(C_k^* C_k)^{1/2} \to I \quad\text{as } k \to \infty.$$

We have

$$Q_k(C_k - U_k) = Q_k U_k\bigl[(C_k^* C_k)^{1/2} - I\bigr].$$

Since $Q_k$ and $U_k$ remain bounded in the spectral norm, it follows that

$$Q_k U_k \to Q,$$

as required.

(b) Conversely, suppose there exist unitary matrices $U_k$ ($k = 1, 2,\dots$) such that

$$Q_k U_k \to Q.$$

We deduce that

$$(Q_k U_k)(Q_k U_k)^* = Q_k Q_k^* \to QQ^*,$$

which is equivalent to $M_k \to M$.
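The factor $U_k = C_k(C_k^*C_k)^{-1/2}$ constructed in part (a) is the unitary factor in a polar decomposition of $C_k$; it aligns the basis $Q_k$ with $Q$. A minimal Python sketch, assuming NumPy and SciPy, with an arbitrary perturbed basis standing in for $Q_k$:

import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(5)
n, r = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
Qk, _ = np.linalg.qr(Q + 1e-3 * rng.standard_normal((n, r)))  # basis of a nearby M_k

C = Qk.conj().T @ Q                      # C_k = Q_k* Q
Uk, H = polar(C)                         # C_k = U_k (C_k* C_k)^{1/2}
assert np.allclose(Uk.conj().T @ Uk, np.eye(r))   # U_k is unitary
print(np.linalg.norm(Qk @ Uk - Q, 2))    # small: Q_k U_k is aligned with Q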

Proposition 1.5.3 Suppose that $M_k \to M$.

(a) If, for each $k$, an orthonormal basis $Q_k$ is given for $M_k$, there exists a subsequence $\{Q_l\}$ and an orthonormal basis $V$ for $M$ such that $Q_l \to V$.

(b) If an orthonormal basis $Q$ is given for $M$, there exists, for each $k$, an orthonormal basis $V_k$ for $M_k$ such that $V_k \to Q$.

PROOF

(a) By Theorem 1.5.2 there exists a sequence $\{U_k\}$ of unitary matrices of order $r$ such that

$$Q_k U_k \to Q.$$

Every unitary matrix is of spectral norm unity. Hence $\{U_k\}$ is a bounded sequence (in the spectral norm) and therefore possesses a convergent subsequence, say

$$\{U_l\} \to U,$$

where $l$ runs through an increasing sequence of positive integers. Evidently, $U$ is a unitary matrix. We now have that

$$Q_l U_l \to Q \quad\text{and}\quad Q_l \to QU^* = V,$$

say, whence

$$V^*V = I;$$

so $V$ is an orthonormal basis for $M$.

(b) This follows at once by putting

$$V_k = Q_k U_k.$$

By Theorem 1.5.2 we have $V_k \to Q$, as required.

Proposition 1.5.4 Suppose that the subspaces $M_k$ ($k = 1, 2,\dots$) and $M$ are equipped with orthonormal bases $Q_k$ and $Q$ respectively. Let $X_k$ and $X$ be arbitrary bases for $M_k$ and $M$. Then the existence of unitary matrices $U_k$ such that $Q_k U_k \to Q$ is equivalent to the existence of invertible matrices $F_k$ such that $X_k F_k \to X$.

PROOF

(a) Assume that

$$Q_k U_k \to Q. \tag{1.5.1}$$

There exist invertible matrices $B_k$ and $B$ such that

$$X_k = Q_k B_k \quad\text{and}\quad X = QB.$$

The adjoint bases are

$$Y_k = Q_k(B_k^*)^{-1} \quad\text{and}\quad Y = Q(B^*)^{-1}.$$

The matrices

$$F_k = Y_k^* X = B_k^{-1} Q_k^* Q B \quad (k = 1, 2,\dots)$$

are invertible. It follows from the condition (1.5.1) that $Q_k Q_k^* \to QQ^*$, and so $Q_k C_k \to Q$, where $C_k = Q_k^* Q$. Hence

$$X_k F_k = Q_k B_k B_k^{-1} Q_k^* Q B = Q_k C_k B \to QB = X.$$

(b) Assume that $X_k F_k \to X$, that is, $Q_k(B_k F_k B^{-1}) \to Q$. Now $B_k F_k B^{-1} = Q_k^* Q = C_k$. Therefore the hypothesis states that $Q_k C_k \to Q$, whence $C_k^* C_k \to Q^*Q = I$, and the construction in the proof of Theorem 1.5.2 yields unitary matrices $U_k$ with $Q_k U_k \to Q$, which, in turn, is equivalent to $M_k \to M$ by virtue of Theorem 1.5.2. Thus

$$M_k \to M \quad\text{if and only if there exist invertible matrices } F_k \text{ such that } X_k F_k \to X. \tag{1.5.2}$$

The convergence $Q_k U_k \to Q$ cannot, in general, be replaced by $Q_k \to Q$, as is shown by the following example. Let $M = \mathbb{R}^2$. The orthonormal basis

$$\Bigl\{\Bigl(\cos\frac{1}{\varepsilon},\ -\sin\frac{1}{\varepsilon}\Bigr)^{\mathrm{T}},\ \Bigl(\sin\frac{1}{\varepsilon},\ \cos\frac{1}{\varepsilon}\Bigr)^{\mathrm{T}}\Bigr\}, \tag{1.5.3}$$

where $\varepsilon$ is real, has no limit when $\varepsilon$ tends to zero. The subsequence obtained by putting $\varepsilon_k = 1/(2k\pi)$ does converge when $k$ tends to infinity: indeed, (1.5.3) is equal to $\{e_1, e_2\}$ for $k = 1, 2,\dots$. We remark that (1.5.3) consists of the eigenvectors of the matrix

$$A(\varepsilon) = \begin{pmatrix} 1 + \varepsilon\cos(2/\varepsilon) & -\varepsilon\sin(2/\varepsilon) \\ -\varepsilon\sin(2/\varepsilon) & 1 - \varepsilon\cos(2/\varepsilon) \end{pmatrix}$$

corresponding to the eigenvalues $1 + \varepsilon$ and $1 - \varepsilon$. When $\varepsilon$ tends to zero, the matrix $A(\varepsilon)$ tends to $I$ and both eigenvalues tend to 1, the double eigenvalue of $I$.
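The same phenomenon is easy to observe numerically; a minimal Python sketch, assuming NumPy:

import numpy as np

def A(eps):
    c, s = np.cos(2/eps), np.sin(2/eps)
    return np.array([[1 + eps*c, -eps*s],
                     [-eps*s, 1 - eps*c]])

for eps in [1e-1, 1e-2, 1e-3]:
    mu, V = np.linalg.eigh(A(eps))
    print(eps, mu, V[:, 0])   # eigenvalues tend to 1; eigenvectors keep rotating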

1.6 REDUCTION OF SQUARE MATRICES

We are interested in the theoretical problem of reducing a matrix $A$ to simpler forms with the help of similarity transformations

$$A \to X^{-1}AX,$$

which preserve the eigenvalues. We return to the problem (1.1.2). We denote by

$$\{\lambda_1,\dots,\lambda_d\}, \qquad d \le n,$$

the set of distinct eigenvalues of A. Let λ be a particular eigenvalue. Its geometric

multiplicity, $g$, is the greatest number of linearly independent eigenvectors that correspond to $\lambda$, that is

$$g = \dim\operatorname{Ker}(A - \lambda I).$$

The algebraic multiplicity, $m$, of $\lambda$ is the multiplicity with which $\lambda$ appears as a zero of the characteristic polynomial $\pi(t)$, that is

$$\pi(t) = (t - \lambda)^m\pi_1(t), \qquad \pi_1(\lambda) \neq 0.$$

It will be shown that

$$g \le m$$

[see (1.6.4)]. We denote by

$$\{\mu_1,\dots,\mu_n\}$$

the set of eigenvalues with repetitions, each counted with its algebraic multiplicity; for example, if $\lambda_1$ is of algebraic multiplicity $m_1$, we might put

$$\mu_1 = \cdots = \mu_{m_1} = \lambda_1, \qquad \mu_{m_1+1} = \cdots = \lambda_2,$$

and so on. An eigenvalue of algebraic multiplicity one is said to be simple; otherwise it is said to be multiple. An eigenvalue of multiplicity $m$ greater than unity is said to be semi-simple if it admits $m$ linearly independent eigenvectors; otherwise it is said to be defective.

If $A - \lambda I$ is singular, so is $A^* - \bar{\lambda}I$, and $\bar{\lambda}$ is an eigenvalue of $A^*$. Thus if $Ax = \lambda x$, there exists $x_* \neq 0$ such that $A^*x_* = \bar{\lambda}x_*$ or, alternatively,

$$x_*^*A = \lambda x_*^*.$$

The eigenvector $x_*$ of $A^*$ which corresponds to $\bar{\lambda}$ is also called a left eigenvector of $A$ corresponding to $\lambda$.

The matrix $A$ is diagonalisable if and only if it is similar to a diagonal matrix. Let

$$D = \operatorname{diag}(\mu_1,\dots,\mu_n)$$

be the diagonal matrix consisting of the eigenvalues of $A$. Then $A$ is diagonalisable if and only if it possesses $n$ linearly independent eigenvectors $x_i$ ($i = 1,\dots,n$). It can then be decomposed into the form

$$A = XDX^{-1}, \tag{1.6.1}$$

where the $i$th column of $X$ (respectively the $i$th row of $X^{-1}$) is the right eigenvector $x_i$ (respectively the left eigenvector $x_{*i}^*$) associated with the eigenvalue $\mu_i$.

PROOF Suppose that $A$ possesses $n$ linearly independent eigenvectors. The relation $X^{-1}X = I$ implies that the rows $x_{*i}^*$ of $X^{-1}$ satisfy the equations

$$x_{*i}^*x_j = \delta_{ij} \quad (i, j = 1,\dots,n). \tag{1.6.2}$$

Now $Ax_i = \mu_i x_i$ ($i = 1,\dots,n$) can be written

$$AX = XD \quad\text{or}\quad A = XDX^{-1}.$$

Thus $A$ is diagonalisable. Now $AX = XD$ is equivalent to $X^{-1}A = DX^{-1}$; so

$$x_{*i}^*A = \mu_i x_{*i}^* \quad (i = 1,\dots,n).$$

The $x_{*i}$ are the eigenvectors of $A^*$ normalized by the relations (1.6.2). Moreover, $X_* = [x_{*i}]$ is the adjoint basis of $X = [x_i]$, and $X_*^*X = I$ is equivalent to $X_*^* = X^{-1}$. Conversely, if $A$ is diagonalisable, it possesses $n$ linearly independent eigenvectors. We leave the proof to the reader.
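A minimal Python sketch of the decomposition (1.6.1) and the relations (1.6.2), assuming NumPy; the matrix is an arbitrary diagonalisable example:

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
mu, X = np.linalg.eig(A)                 # columns of X: right eigenvectors
Xinv = np.linalg.inv(X)
assert np.allclose(X @ np.diag(mu) @ Xinv, A)     # A = X D X^{-1}, (1.6.1)
for i in range(2):                       # rows of X^{-1}: left eigenvectors
    assert np.allclose(Xinv[i] @ A, mu[i] * Xinv[i])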

Thus $A$ is diagonalisable if and only if its eigenvalues are semi-simple. In that case $A$, too, is termed semi-simple. When $A$ is not diagonalisable, it is called defective.

The diagonal form is attractive because of its simplicity. However, in practice, even if $A$ is diagonalisable, the matrix $X$, though invertible, may be ill-conditioned with regard to inversion, which renders $X^{-1}AX$ difficult to compute. For this reason the next section will be devoted to similarity transformations by unitary matrices; in the Euclidean norm these have a conditioning coefficient equal to unity (see Exercise 1.1.7).

We are going to show that every matrix A is unitarily similar to an upper

triangular matrix; this is Schur's form for A. The following theorem is an existence

result; a constructive algorithm, called QR, will be given in Chapter 5 for certain

classes of matrices.

Theorem 1.6.2 There exists a unitary matrix $Q$ such that $Q^*AQ$ is an upper triangular matrix whose diagonal elements are the eigenvalues $\mu_1,\dots,\mu_n$, in this order.

PROOF We proceed by induction on $n$; the assertion is trivial when $n = 1$. We assume that it holds for matrices of order $n - 1$. By means of a reduction technique we shall produce a matrix of order $n - 1$ that has the same eigenvalues as $A$ except $\mu_1$.

Let $x_1$ be an eigenvector of $A$ such that

$$Ax_1 = \mu_1 x_1, \qquad \|x_1\|_2 = 1.$$

There exists a matrix $U$ of size $n$ by $n - 1$ such that $[x_1, U]$ is unitary: the columns of $U$ are orthogonal to $x_1$, that is, $U^*x_1 = 0$. Then

$$A[x_1, U] = [\mu_1 x_1, AU]$$

and

$$[x_1, U]^*A[x_1, U] = \begin{pmatrix} \mu_1 & x_1^*AU \\ 0 & U^*AU \end{pmatrix}.$$

The eigenvalues of $U^*AU$ are $\mu_2,\dots,\mu_n$. The result follows by induction.

The chosen order of the eigenvalues $\{\mu_i\}$ determines the Schur basis $Q$ apart from a unitary block-diagonal matrix (see Exercise 1.6.5).
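In practice the Schur form is computed by the QR algorithm of Chapter 5; a minimal Python sketch, assuming SciPy (whose schur routine implements such an algorithm) and an arbitrary test matrix:

import numpy as np
from scipy.linalg import schur

A = np.array([[3.0, 1.0, 0.0],
              [0.5, 2.0, 1.0],
              [0.0, 0.5, 1.0]])
T, Q = schur(A, output='complex')        # A = Q T Q*, T upper triangular
assert np.allclose(Q @ T @ Q.conj().T, A)
print(np.diag(T))                        # the eigenvalues of A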

When $A$ is real, it can be reduced, by a real orthogonal similarity, to an upper block-triangular matrix, where the order of the diagonal blocks is at most 2 by 2. The diagonal blocks of order 2 have conjugate complex eigenvalues.

Two matrices $A$ and $B$ are said to be equivalent if there exist invertible matrices $X$ and $Y$ such that

$$B = XAY.$$

More especially, $B$ is unitarily equivalent to $A$ if $X$ and $Y$ may be chosen unitary.

Extending the transformations on $A$ so as to include equivalence transformations, we shall prove that every matrix is unitarily equivalent to a diagonal matrix with non-negative elements, the diagonal consisting of its singular values. This is the singular value decomposition (abbreviated to SVD).

Theorem 1.6.3 Let $A$ be an $m$ by $n$ matrix and put $q = \min(m, n)$. There exist unitary matrices $U$ and $V$ of orders $m$ and $n$ respectively such that $U^*AV = \operatorname{diag}(\sigma_i)$ is of order $m$ by $n$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_q \ge 0$. [This means that $U^*AV = (s_{ij})$, where $s_{ii} = \sigma_i$ ($i = 1,\dots,q$) and $s_{ij} = 0$ otherwise.]

PROOF There are vectors $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^m$ such that $\|x\|_2 = \|y\|_2 = 1$ and $Ax = \sigma_1 y$, where

$$\sigma_1 = \|A\|_2.$$

We construct unitary matrices $U$ and $V$ of the form

$$U = [y, U_1] \quad\text{and}\quad V = [x, V_1].$$

Then

$$A_1 := U^*AV = \begin{pmatrix} \sigma_1 & w^* \\ 0 & B \end{pmatrix},$$

where

$$w^* = y^*AV_1 \quad\text{and}\quad B = U_1^*AV_1.$$

Let $\phi = (\sigma_1^2 + w^*w)^{-1/2}$. The vector

$$u_1 = \phi\begin{pmatrix} \sigma_1 \\ w \end{pmatrix}$$

is a unit vector such that

$$A_1 u_1 = \phi\begin{pmatrix} \sigma_1^2 + w^*w \\ Bw \end{pmatrix}.$$

We deduce that

$$\|A_1 u_1\|_2^2 \ge \phi^2(\sigma_1^2 + w^*w)^2 = \sigma_1^2 + w^*w.$$

However,

$$\sigma_1^2 = \|A\|_2^2 = \|A_1\|_2^2 \ge \|A_1 u_1\|_2^2.$$

It follows that

$$\sigma_1^2 \ge \sigma_1^2 + w^*w,$$

and so $w = 0$. The proof is now completed by induction by virtue of the fact that

$$\|A\|_2 \ge \|B\|_2.$$
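A minimal Python sketch of the SVD, assuming NumPy and an arbitrary rectangular test matrix:

import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))
U, s, Vh = np.linalg.svd(A)              # A = U diag(s) V*, s decreasing
S = np.zeros(A.shape)
S[:3, :3] = np.diag(s)
assert np.allclose(U @ S @ Vh, A)
assert np.isclose(s[0], np.linalg.norm(A, 2))   # sigma_1 = ||A||_2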

Returning to similarity transformations of square matrices and employing transformations that are not necessarily unitary, we shall put an arbitrary matrix (diagonalisable or defective) into a particular block-diagonal form, known as Jordan's form. Numerous proofs have been given to establish the Jordan form of an arbitrary defective matrix (see Horn and Johnson, 1990, pp. 121-6). We present here a strongly computational proof which starts from Schur's form. A further definition and three preliminary lemmas are required.

Definition The matrix $U = (u_{ij})$ of order $m$ is said to be strictly upper triangular if $u_{ij} = 0$ when $1 \le j \le i \le m$. Evidently,

$$U^m = 0.$$

Lemma 1.6.4 Let $R$ be an upper triangular matrix of order $n$. Then there exists an invertible matrix $Z$ such that

$$Z^{-1}RZ = \operatorname{diag}(R_i),$$

where

$$R_i = \lambda_i I + U_i \quad (i = 1,\dots,d),$$

$U_i$ is a strictly upper triangular matrix and the $\lambda_i$ are distinct.

PROOF We use induction on $n$; the result is trivial when $n = 1$. Assume now that the theorem is true for upper triangular matrices of order less than $n$. Let $R$ be an upper triangular matrix of order $n$. Since the reduction to the Schur form allows us to arrange the eigenvalues in any manner, we may assume that

$$R = \begin{pmatrix} R_1 & S \\ 0 & R_2 \end{pmatrix},$$

where

$$R_1 = \lambda_1 I + U_1$$

and $\lambda_1$ does not belong to the spectrum of $R_2$.

There exists a matrix $B$ with the property that

$$\begin{pmatrix} I & B \\ 0 & I \end{pmatrix}\begin{pmatrix} R_1 & S \\ 0 & R_2 \end{pmatrix}\begin{pmatrix} I & -B \\ 0 & I \end{pmatrix} = \begin{pmatrix} R_1 & 0 \\ 0 & R_2 \end{pmatrix}$$

if and only if

$$S = R_1 B - BR_2. \tag{1.6.3}$$

It will be shown in Section 1.12 (Proposition 1.12.1) that equation (1.6.3) has a unique solution for $B$ provided that

$$\operatorname{sp}(R_1) \cap \operatorname{sp}(R_2) = \emptyset,$$

as is indeed the case in this situation.

< V}

Then

£* = 0,

£*i+i=*i (i=l,...,fe-l),

/ - £ T £ = βχβϊ.

Lemma 1.6.6 Let U be a strictly upper triangular matrix of order m. Then there

exists an invertible matrix Y such that

y - 1 l / y = N = diag(£ J ),

where

The block Ej is of order kj and these orders are arranged in decreasing order of

magnitude. (When kj=\, then Ej is the zero matrix of order unity.)

24 SUPPLEMENTS FROM LINEAR ALGEBRA

assume that it holds for all strictly upper triangular matrices of order less than m.

Let

0

7'«,y,-(«' JV,

where

N 2 = diag(£ 2 ,.. ·>£«,)

and

order (£ χ ) ^ order (Ej) (;>2).

Now

(l 0 VO M T\/1 0\ /0 u

Vo yrvVo ι/.Λο r,; "Vo rr

T

Let u Y1 = [M{W2] be partitioned in a manner consistent with the partitioning of

Y~1 Ul Y. Then with the aid of Lemma 1.6.5 we find that

0 I 0 0 £i 0 I 0 0

0 0 /) 0

\

0 N2, 0 / o N

2j

and

u](I — £{£j) = wjejej = σ^[

say, where

σ

= «Τ«ι·

We now have to distinguish two cases. First, when σ ^ 0, we verify that

1

/σ 0 0 \ ft ael «I\ σ 0 0\ (0 e\

N exu\

0 / 0 0 Ei 0 / 0 = 0 Ei 0

0 0 σ-1/ 0 0 0 W2

N V 0 0 alj Γό ό NJ

V \

where the order of N exceeds that of E1 by unity. Put

sJ = uT2Ni2-1 (i=l,2,...,fc 2 + l).

We observe that

N22 = 0,

because the blocks of N2 are arranged in decreasing order; on the other hand,

e,s: eM~

REDUCTION OF SQUARE MATRICES 25

/l ei+1sJ\(N eiS]\(l -ei+1sJ\jN ei+isj+l\

\0 I Λθ JV 2 Ao / / \0 N2 f

where we have used the fact that

and

order (N) > order (E2).

Next, suppose that σ = 0. Then a simple permutation of the rows and columns,

which amounts to a similarity transformation, shows that U is similar to

(El 0 0 \

0 0 u\\.

^0 0 N2)

so U is similar to

(El °)

where N'2 has the block-diagonal form required.

Theorem 1.6.7 Let Abe a defective matrix of order n with distinct eigenvalues

λί9...9λά (d<n).

Then there exists an invertible matrix X such that

X-MX = diag(J0),

where

J.j = XJ -f Eu

and Eu is a matrix of order k^ of the form

E

»-(o V ) U-'·*■■·■«*

that is there are gt blocks Ei} corresponding a particular λ(.

26 SUPPLEMENTS FROM LINEAR ALGEBRA

Q*AQ = R,

where R is an upper triangular matrix. Then JR is transformed into the block-

triangular form by Lemma 1.6.4. By Lemma 1.6.6 we have

Βί=Υ7ΐ(λίΙ+υί)Υί = λίΙ + άϊ^(Ε1,...,Ε9) (i=l,...,<*).

The set of Jordan blocks J 0 (j = 1,..., gt) associated with the same eigenvalue

λι constitutes the Jordan box associated with Af. Its order is

m£ = fca + — + fcfrf;

it contains gt blocks and so

/,<mf (1.6.4)

Let t{ be the dimension of the largest Jordan block associated with kr Then

( ^ - ^ = 0,

for there are no more than ί{ — 1 consecutive units along the first superdiagonal

of Bt. As always, let ku...., kd be the distinct eigenvalues of A.

Theorem 1.6.7 shows that

C=©Mf,

i= 1

where

M^KerM-V/«;

we then have

dim M, = mh

which is the algebraic multiplicity of k{. We call /, the index of kt.

It can be proved that the Jordan form is unique apart from the arrangements

of the blocks along the diagonal.

multiplicity g = 3 and index ί = 3. There are two possible forms for the Jordan

box associated with A; each contains three blocks and there cannot be more than

two consecutive units on the superdiagonal:

\λ 0 \λ 1

0 λ 0

λ 1 0

0 λ 1 λ 1

0 0 A 0 0 λ 0

A 1 θ! λ 1 0

0 A l ο λ ι!

0 0 A 0 0 A

SPECTRAL DECOMPOSITION 27

Let λί9..., λά be the distinct eigenvalues of A. The spectral projection associated

with lt is the projection Pt on the invariant subspace M, parallel to

A= XJX~\

where J is a block-diagonal matrix consisting of d Jordan boxes

Bl9...,Bd9

The box Bi is an mf x mt matrix of the form

Bt = XtImi + Nl9

where Nt is a matrix whose only non-zero elements appear on the first super-

diagonal and can be taken to be equal to unity.

Let Xi (respectively Xf*) be the matrix formed by the mf columns of X

(respectively rows of X'1) which are associated with Af. The column vectors of

Xi furnish a basis for M4 which possesses as the adjoint basis the corresponding

row vectors of X~ \ that is

Xt*Xi = / m ..

The matrix

This is the spectral projection associated with kt and is illustrated by the following

diagram:

0 0

0 Bi 0

0 0

X J x-1

28 SUPPLEMENTS FROM LINEAR ALGEBRA

where

i=l i=l

along the superdiagonal. Therefore

Νγ = 0 and D(' = XtN{* X* = 0.

We leave it to the reader to verify the following relations:

PiPj = SijPi, DiPj^ötjDi, DiDj = Q if i*j;

APt = P,A = PiAPi = λ,Ρ, + A , Dt = (A~ A,/)P,.

If, for an eigenvalue Xh the invariant subspace N{ is identical with the

eigenspace Ker {A — λ(Ι)9 then ί·χ = 1 and Df = 0 in (1.7.1). In this case we say that

the spectral projection P{ reduces to the eigenprojection.

form

Α

=Σλιρι,

t=l

subspace M, is identical with the eigenspace Ker (A — A,·/), that is / f = l ,

Dt = 0.

defined. In particular one might consider the orthogonal projection. We have

chosen a projection that is related to the properties of

{A-ziy1 (ze<C),

as we shall see in Chapter 2. In Chapter 4 this will enable us easily to derive the

convergence properties of a sequence of spectral projections from the convergence

of the corresponding sequence of matrices.

A*=t$ipf + DT)

i= 1

SPECTRAL DECOMPOSITION 29

PROOF The proof is immediate, as can be seen from the following example:

(X 1 0 0 \ /Ä 0 0 0 \

0 A 0 0 1 A 0 0

J= J* = = PJT,

0 0 A 1 •0 0 Ä 0

Vo o o xl Vo o i V

where

/Ä 1 0 0\ / 0 1 0 0\

0 Ä 0 0 10 0 0

J' = P=

0 0 Ä 1 0 0 0 1

Vo o o xJ V0 0 1 θ7

Generally, the determination of P is given in Exercise 1.6.16.

We deduce from Proposition 1.7.3 that Af and 3, have the same multiplicities

and indices in A and A* respectively.

-CO

has the eigenvalues λχ = 3 and λ2 = — 1; the corresponding eigenvectors can be

taken to be

x

'-(i) ''-(-2)

The matrix A* has the eigenvectors

1* 1 — ^ 2 * 2 —

and

'-.-»ft 'ί2)

*2 = *2*ί. = 2 ( _ 2 1 }

30 SUPPLEMENTS FROM LINEAR ALGEBRA

A = 3Pl-P2, A* = 3P*-P*.

associated with λ:

AX = XB, where B = XIm + N

is the Jordan box of order m associated with λ. The matrix

B = X$AX

represents the restriction

AW:M^M

in the adjoint bases X and X+.

Let Λί^ be the subspace generated by the basis X+. The relationship

X%A = BX%

or, alternatively,

A*X+ = X+B*

implies that M + is the invariant subspace for A* associated with I; it is also called

the left-invariant subspace of A associated with λ.

Let us change the bases of M and M+. Define

X' = XC and * ; = X^(C'1)*

so that

X'fX' = /.

We obtain

AXC = XBC or AX1 = XB\

where

B' = C-1BC = X'*AX'.

Now B' represents A ^M in the new bases; it is, in general, no longer an upper

triangular matrix.

Lemma 1.7.4 Let (X, Y) and (X, Y') be two pairs of adjoint bases in (M,N) and

(M, ΛΓ) respectively, where M is an invariant subspace for A. Then

B=Y*AX= Y'*AX.

PROOF

RANK AND LINEAR INDEPENDENCE 31

of X, denoted by r(X\ is the largest order of an invertible matrix that can be

extracted from X by selecting some of its row and columns and forming their inter

section.

Lemma 1.8.1 The rank ofX is equal to the number of non-zero singular values

ofX.

Proposition 1.8.2 Let r(X) = m. Then there exists annxm orthonormal matrix Q

and upper triangular invertible matrix R such that

X = QR. (1.8.1)

PROOF The proof can be found in Horn and Johnson (1990, p. 112).

The formula (1.8.1) is called the Schmidt, or QR9 factorization of X. Several

algorithms exist for obtaining the factorization (1.8.1). We mention the methods

of Gram,* Schmidt, the modified Gram-Schmidt processes and those of

Householder and Givens (see Golub and Van Loan, 1989, pp. 146-62).

Proposition 1.8.3 Ifr(X) = r <m, there exists a permutation matrix Π such that

XTL = QR,

where Q is orthonormal,

practical point of view. It enables us to construct an orthonormal basis Q of a

vector space M by starting from an arbitrary basis X.

In a practical situation it may happen that the vectors of X, though independent

mathematically, are almost dependent numerically, that is cond 2 (X) and also

cond2(jR) are large. This leads to the notion of a numerical rank for such a set of

vectors.

32 SUPPLEMENTS FROM LINEAR ALGEBRA

Definitions Let

be the singular values ofX and let ebe a given positive number.

(a) The matrix X is said to be oft-rank r if exactly r singular values satisfy

^ ε (i=l r).

σι

(b) The m column vectors ofX are dependent within ε, if there exists an invertible

matrix B of order m such that XB is oft-rank less than m. A matrix X which

is of rank m but oft-rank r is said to have a numerical rank equal to r.

We recall that a matrix A is said to be normal if it commutes with its conjugate

transpose, that is

AA* = A*A.

A Hermitian matrix (A = A*) is a special case of a normal matrix.

With regard to the eigenvalue problem, Hermitian or normal matrices possess

numerous remarkable properties, which be quote for reference:

(a) A Hermitian or normal matrix possesses an orthonormal basis consisting of

eigenvectors.

(b) We have

Ρ( = Ρ*.

(c) \\A\\2 = p(A)

(see Exercises 1.9.3 and 1.9.4).

In addition, the eigenvalues of a Hermitian matrix possess the following

min-max representation (Fischer-Poincaro*).

Theorem 1.9.1 Let A be a Hermitian matrix with eigenvalues

Then

μ]ί = min max (x*Ax; xe Vk, x*x = 1)

vk

(k= 1,..., n), where Vk ranges over all k-dimensional subspaces of<Cn.

NON-NEGATIVE MATRICES 33

PROOF See, for example, Ciarlet (1989, Theorem 1.3.1, p. 16). The max-min

characterization, which is due to Courant* and Weyl* is demonstrated in

Exercise 1.9.5.

The number

^, χ x*Ax

^(x) = — - ,

x*x

defined for x Φ 0, is called the Rayleigh* quotient of A for the vector x. This

number plays a very important part in the calculation of the eigenvalues of

Hermitian matrices. In particular, it is an immediate consequence of Theorem

1.9.1 that

μί^χ*Αχ^μη,

where x is an any vector satisfying x*x = 1.

1.10 N O N - N E G A T I V E MATRICES

A matrix is said to be non-negative if all its elements are either positive or zero.

Such matrices occur in the numerical treatment of partial differential equations,

in the theory of probability, in physics, chemistry and economics (see Chapter 3).

We quote the principal result, which is known as the Perron § -Frobenius

theorem; it is concerned with what are called irreducible matrices.

This notion is defined as follows: an n by n matrix A = (a0) is said to be reducible

if the indices 1,2,..., n can be split into two non-empty disjoint sets:

il9...9ir; Ji,...Js (r + s = n)

in such a way that

a

.W/> = ° (a=l,...,r;j?=l,...,s).

If such a splitting is impossible the matrix is said to be irreducible.

eigenvalues:

x, associated with p, all of whose components are positive. Every eigenvalue of

modulus p is simple, and every eigenvector with positive components is proportional

to x. Moreover, if the elements of A are strictly positive, then p > \Xj\for every

eigenvalue λ] other than p.

f

Hermann Weyl, 1885-1955, born at Elmshorn, died at Zürich.

♦Oscar Perron, 1880-1975, born at Frankenthal, died in Munich.

§

John William Strutt (Lord Rayleigh), 1849-1919, born at Langford Grove, died at Terling Place.

34 SUPPLEMENTS FROM LINEAR ALGEBRA

PROOF The reader is referred to the book by Varga (1962, p. 30) for the

definitions and the proof or to the book by Gantmacher (1960, Vol. II, Ch. 13)

or to Horn and Johnson (1990, Ch. 8).

In Section 1.9 we defined the Rayleigh quotient for a single vector x. We shall

now generalize this definition by considering several vectors.

Let Q be an orthonormal basis for an arbitrary subspace; thus Q*Q = I. Then

π

Μ — QQ* is th e orthogonal projection on M.

A^M. The map nMA^Mfrom M to M is called the section of A on M, and

@(Q) = Q*AQ

is called the Rayleigh matrix quotient of A on Q (see Householder, 1964, p. 74).

the orthonormal basis Q of M. If M is invariant under A, then the section nMA^M

is identical with the restriction A^M.

To be sure, the number 0l{x) is still defined when A is no longer Hermitian. In

this case, however, it may be more interesting to consider

@(x,y) = y*Ax,

which is the generalized Rayleigh quotient constructed for two vectors x, y such

that

y*x = 1.

Let X be a basis for the subspace M and let Y be the adjoint basis for the subspace

N\ so

Y*X = I.

Then

®{X, Y) = Y*AX

is the matrix Rayleigh quotient for adjoint bases X and Y. Let

P = XY*;

then

PAP = X(Y*AX)Y*.

SYLVESTER'S EQUATION 35

AX = XB, where B = Y*AX.

However, in general,

AX - XB = R,

where R is interpreted as the residual matrix for A associated with X and Y,

1.12.1 Block-diagonalisation of J =

ES

Let A and B be square matrices of orders n and r respectively. Suppose that the

nxr matrix Z is a solution of Sylvester's* equation:

AZ-ZB = C. (1.12.1)

It is easy to verify that, if

and

-ft a - - f t r>

The matrices

(0) - (ζ')

are bases for the right and left invariant subspaces of T respectively, both being

associated with the matrix A. The corresponding spectral projection is given by

The above matrix equation may be regarded as a particular system of nr

equations for nr (scalar) unknowns. We define the function

vec:C'lxr-><C',r

which associates with every nxr matrix a column vector of order nr as follows.

36 SUPPLEMENTS FROM LINEAR ALGEBRA

Let

Z = [z 1 ,...,z r ],

where z l,..., zr are the columns of Z; then

Zl

vecZ = f ) eC"

Put

Is

z = vec Z, c = vec C,

Then

ΛΖ - ZB = C

is equivalent to

^z = c, (1.12.2)

where

^ = /r®^-BT(8)/„,

the symbol ® denoting the tensor product. Explicitly,

pjA-b^K ... - U \ (1123)

xr

Let T be the linear map o n C " defined by

Ί.Ζ-+ΑΖ-ΖΒ (1.12.4)

By virtue of the isomorphism between C" x r and C nr we identify T with ^".

We may think of equation (1.12.1) as a system of equations for the column

vectors zx,..., zr. These unknowns are linked by the matrix B. We may separate

them by using the Schur form

B = QTQ*,

where Tis an upper triangular matrix with diagonal

(/z 1 ,...,^ r ) = sp(ß).

Put

Z' = ZQ and C = CQ.

Equation (1.12.1) is equivalent to

^1Z/-Z'T=C/. (1.12.5)

SYLVESTER'S EQUATION 37

Further, let

z' = vec Z' and d = vec C".

Then equation (1.12.5) is equivalent to

9"i = c\

where

(Α-μχΙη 0

T

f' = Ir®A-T ®I =

V -*ιΛ Α-μϊ%)

is a block-triangular matrix. More explicitly, we can write the equation as the

system

(Α-μιΙ)ζ,ί=ο\

(Α-μ2Ι)ζ'2 = €'2 + ίί2ζ\ (1.12.6)

r-l

( Λ - / ν θ ζ ; = < + Σ iirz;

This system may be solved recursively for z\,...,z'r provided that each of the

matrices

Α-μχΙ„...,Α-μΤΙη

is invertible. The original unknowns can then be determined with the aid of

the equation

Z = Z'Q*.

Hence we have the following proposition.

The minimal distance between sp(A) and sp(2J) is defined as

δ = {min|/l — μ\, Xesp(A), μβ$ρ(Β)}

= min dist [sp(/4), sp(ß)].

This is a measure of the separation of the two sets sp(A) and sp(B) in C. In

what follows we assume that δ is strictly positive; in that case equation (1.12.1)

has a unique solution

Z = TlC,

where T " 1 , the inverse of T, is also denoted by (A9B)~l.

1.12.4 Algorithms

Two algorithms are used in practice to solve equations (1.12.6):

(a) The algorithm of Bartels and Stewart (1972) which utilizes the Schur form

38 SUPPLEMENTS FROM LINEAR ALGEBRA

One is then led to solving

SZ'-ZT=X\

where Z" = JJZ' and C" = L/C, that is r triangular systems with matrices

5 - μ , / ( ί = l,...,r).

In a concrete situation the QÄ algorithm is used to compute the Schur

forms T and S, which, in the case of real matrices, may involve diagonal

blocks of order 2.

(b) The algorithm of Hessenberg*-Schur proposed in Golub, Nash and Van

Loan (1979), which uses the Hessenberg form of A, that is A = U*HU, where

U is unitary and H is an upper Hessenberg matrix. The r systems with

matrices H — μ,7 (i = 1,..., r) are solved by Gaussian^ elimination with partial

pivoting.

Since

ΖΦΟ ||Z|| F z*o ||z|| 2

and

condF (T) = cond2 3Γ

Proposition 1.12.2 Let δ = min dist Ορ(Λ), sp(ß)]. Then \\ T"11| ^ δ" ι . More

over, if A and B are Hermitian or normal, then ||Τ _1 ||ρ = δ _ 1 .

PROOF The expression for 2Γ' makes it plain that its eigenvalues and therefore

those of ^" are A, —/ij, where At6Sp(y4) and μ^€8ρ(β). Hence p(^""x) = p(T "x) = δ~ x

and p ( T - 1 ) ^ ||T _1 1| for any induced norm |||| (see Exercise 1.1.4).

Next, if A and B are normal, so is ΖΓ\ this may be deduced from (1.12.3) by

showing that ^ * ^ = «T.T*, provided that A*A = AA* and B*B = BB*.

A normal matrix can be diagonalised by a unitary similarity transformation.

Since the norm || · || 2 is invariant under unitary transformation we may assume

for the present purpose that 5" is in diagonal form. It now becomes plain that

ρ ( τ - 1 ) = | | ^ - 1 | | 2 = 1ΐτ-1||ρ.

+

After Carl Friedrich Gauss, 1777-1855, born in Brunswick, died in Göttingen.

SYLVESTER'S EQUATION 39

which B is of order r = 1.

1.12.5.1 Caser=\

Let δ = dist [b,sp(/4)] = |fr - λ\ > 0.

is a diagonal matrix. Then

(5- 1 ^||(>l-W)- 1 |l2^cond 2 (X)<5- 1 .

(A-bI)-l=X(D-bl)X-1.

If δ(<\) is sufficiently small, then

\\{A-bI)-l\\2^cona2X(\+d)L-lo-L,

where L is the greatest index of eigenvalues μ of A such that \μ — b\ < 1.

Jordan blocks of the form

(μ-b l 0 \

^ 0 ··· 'μ-b)

of order less than or equal to the index έμ of the eigenvalue μβ$ρ(Α). For such

a block, the least singular value is the square root of the least eigenvalue of

H = G*G. Put ε = μ - b, where \ε\^δ. Then

I e.t 1 + |ε|2'χ

'ε

V0 '''·ε l + |e|2J

We wish to obtain a lower bound for the eigenvalues of H. Now

det// = |detG| 2 = |6| 2r .

By Gershgorin's theorem (Theorem 4.5.1) the eigenvalues a, of H satisfy

0 < α ί ^ 1 + |ε| 2 + 2|ε| = (1 + |ε|) 2

40 SUPPLEMENTS FROM LINEAR ALGEBRA

det H |ε| 2 (i=l,...,r).

Therefore

\\(A-biyl\\2^cona2(.X) max [(1 + |β|/"- 1 |β| - ''] (1.12.7)

nespM)

1+ΙεΐΥ"1

/:|6|

is given in Figure 1.12.1, when r > 1. This is an increasing function with respect

to the exponent r.

If we confine ourselves to values of |ε| less than unity, we have

ΐ / ΐ + |ε|Υ" -1 1/Ί+<5 JL-1

|e|

where

L=max[t'ßesp(A),\s\ < 1].

When δ is sufficiently small, the maximum that appears in (1.12.7) is attained

at <5(^|ε|), the term in S~L being dominant.

When A is normal, cond 2 (A r )=l. Exercise 1.6.19 shows that cond2(X)

increases as a function of v(A) = || A A* - A*F\\¥9 the departure from normality

of A. Similarly, it is seen in Exercise 1.6.20 that \\N\\¥ increases simultaneously

with v(A\ where N is the strictly upper triangular part of the Schur form of A.

A= a c

0 b

Figure 1.12.1

SYLVESTER'S EQUATION 41

c2 {b-a)c

AA* A*A =

db-a)c -c2

When b φ a, the eigenbasis is given by the matrix

x-(x l

\

\0 (b-a)/c)

and when a = b, the Jordan basis is given by the matrix

*-G .-■)

It can be verified that cond2(A')-^ oo when |c| -* oo.

T = D + N, in which ||N|| F is large, is ill-conditioned relative to inversion

irrespective of the distances of the eigenvalues from the origin.

1.12.5.2 Caser>\

1

ΙΙ(Λ,β)" 1|F depends on cond2(X), cond 2 (F) and on δ = mindist[sp(/l),sp(J5)],

where X and V are the Jordan bases (bases of eigenvectors) of A and B

respectively.

B= a c

0 b

Since

(A-aIn 0 \

V -clm A-blJ

it is clear that cond2(«^") depends on |c| and on cond 2 (/l — μΐ). where μ = a or

b. We shall examine a particular case. Let

(\ OL \

1# a

A =

'a

K '■ij

of order 6. We note that |a| is related to cond2(^f) and \c\ is related to cond2(K);

let δ = min(\a - 11, \b - 11). Table 1.12.1 shows the dependence of y = ||(Ay B)"1 ||F

42 SUPPLEMENTS FROM LINEAR ALGEBRA

Table 1.12.1.

a -1 -5 -10 -15

y 5 x 105 2 x 108 2.3 x 108 7 x 108

c 0 1 10 100

y 1.6 xlO 4 5xl0 5 5 x 106 5 x 107

a = i, = 0.8; 6 = 0.2.

Analogous results are obtained when a Φ b.

Let A and B be matrices of type n by m. The set of matrices A — λΒ, where AeCC,

is called a pencil of matrices.

(a) A and B are square matrices of the same order and

(b) det (A — λΒ) does not vanish for all complex numbers λ.

In all other cases [m Φ n or m = n and det (A — λΒ) = 0 for all λ] the pencil

is said to be singular.

We are interested in regular pencils of square matrices: for some values of λ

there exists a vector x Φθ,χβ<£η such that

Αχ = λΒχ. (1.13.1)

The problem (1.13.1) is called the generalized eigenvalue problem. The set

sp|\4, JB] = {ze<C|det(/4 - zB) = 0}

forms the set of eigenvalues of the pencil. If 0 Φ Aesptyl,/*], then 1/Aesp[ß,/1].

Moreover, if B is regular, then

sp [A, B] = sp \_B ~l A, / ] = sp(ß " lA).

When B is singular, s p [ ^ , ß ] can be finite, empty or infinite.

Example 1.13.1

,aM

-G 3) B =C 0) »wfl-(u

(bM

-C 3) Mo i) s »t^ = 0

EXERCISES 43

(cM

"C o) B-(l o) »Wfl-c

In applications, A and B are often symmetric. Without further assumptions

about A and B9 the spectrum sp[i4,£] may be complex.

Example 1.13.2

indefinite (eigenvalues positive and negative) and B is singular and positive

semi-definite (eigenvalues positive and zero). This is found in structural mechanics

(see Chapter 3).

The canonical angles between two subspaces were used in statistics before they

were used in numerical analysis (see Afriat, 1957; Björck and Golub, 1973;

Davis and Kahan, 1968; Golub and van Loan, 1989, Ch. 2; and the article

by Stewart, 1973a). Regarding the notion of gap in a Banach space, see Chatelin

(1983). The relationship between the norm of the strictly upper triangular part

of the Schur form and the departure from normality was studied by Henrici

(1962). The computational influence of a very large departure from normality

on matrix calculations is discussed in Chatelin (1992). The proof of the existence

of the Jordan form given in Section 1.6.3 was inspired by Fletcher and Sorensen

(1983).

The first variational characterization of the eigenvalues of a self-adjoint

operation is due to Weber (1869) and to Lord Rayleigh (1899). The min-max

characterization is due to Pioncare (1890) and to Fischer (1905), while the

max-min characterization is due to Weyl (1911) and to Courant (1920).

In Stewart's article (1973a) and in Varah (1979) the quantity \\(A9B)'1 \\~χ is

called separation between the matrices A and B. We have not adopted this

nomenclature in order to avoid possible confusion with the separation between

spectra represented by δ in the non-normal cases. As regards the definition in

a Hubert space, see Stewart (1971). The proof of Proposition 1.12.4 was inspired

by Kahan, Parlett and Jiang (1982).

EXERCISES

1.1.1 [A] Prove that every basis of C" possesses an adjoint basis and that the

latter is unique. Examine the existence and uniqueness of an adjoint basis for

a basis of a subspace of C" of dimension r, when r < n.

44 SUPPLEMENTS FROM LINEAR ALGEBRA

KerA* = (lmA)L

and

I m ^ * = (Ker>l) 1 .

1.1.3 [D] Prove that, if Ae<£nxn is Hermitian and positive definite, then the

function

(x9y)G(Cn x (Cn^y*Axe(C

defines a scalar product.

1.1.4 [B:25] Prove the following inequalities:

Mh^iMIIJ/lU 1 ' 2 , WleC"*",

1

Al\\^\\A\\2^y/n\\A\\ii V/1GC W X W ,

M||2<Mi|F<v/K^M|l2-V/ie(C''1'<'\

where r(A) denotes the rank of A,

it An

*ζρ(Α), Vi4e<P,x", VueC\{0}

u*u

p(A) < || A ||, V X e C x", V induced norms.

x

1.1.5 [B:8,11] Let A e C " be Hermitian with spectrum

sp(/l) = {A 1 ,... ) A d }.

Prove that:

(a) A f G R ( i = l , . . . , 4

(b) The singular values of A are σ{ = | λ,·| (i = 1,..., d).

(c) There exists an orthonormal basis of <CW that consists of the eigenvectors of A.

(d) \\A\\2 = p(A).

1.1.6 [A] Show that, for all A in <C"xr,

Μ||*=ρ(ΛΜ).

n xr

1.1.7 [A] Let Qe<E and suppose that the columns of Q form an orthonormal

system. Prove that | | β | | 2 = 1. Deduce that, if Q is a unitary matrix, then

cond2(0=l.

1.1.8 [A] Prove that if A is a singular matrix, then at least one of its singular

values is equal to zero.

EXERCISES

(a) p2 = p ^ s p ( p ) c = { o , i } .

(b) Dk = 0 for some integer k=>p(D) = 0.

(c) e*e = ßß* = /andAesp(Q)=>|>l| = l.

(d) condpf7; f ) = 2n+||Z||£.

σ1^σ2^ ··· ^σ„>0.

show that

cond2 A = cond*/2(/4M) = ( ^ J ,

Gi = SUp

dimK =-i xeV\ χ*χ )

af(i4 + B) ^ σ,(Λ) + ^(B) (1 ^ i ^ n\

where, for each matrix M, the singular values are denoted by

σι(Μ)^σ2(Μ)^··^ση(Μ).

1.1.12 [D] Let A be a Hermitian matrix. Show that if Λ is positive

(semi-definite), then its eigenvalues are positive (non-negative).

1.1.13 [D] The matrix CeR"xn is defined by

[—af_ x if 7 = n and 1 <; i < n

1 if 7 = i — 1 and 2 ^ i ^ n

0 otherwise

Show that the characteristic polynomial of C is given by

j=o

1.1.14 [D] Construct two real 2 by 2 matrices A and B such that

AB = ΒΛ and ρ(Λ£) < ρ(Λ)ρ(£)

tv(AB)^£ai(A)ai(B)9

46 SUPPLEMENTS FROM LINEAR ALGEBRA

tr(,4£) = tr(BA).

Deduce that if A and B are similar matrices, then they have the same trace.

1.1.17 [B:10] Let |||| be the norm in <Cnxr which is induced by the norms

|| · || C r in C r and || · || Cn in C . Prove that, for each Ae<En x r ,

MI|=inf{c>0:Mx||CM^c||x||Cr,VxeCr}.

1.1.18 [D] Prove that:

(a) The set of regular matrices is dense in C x "

(b) For all A and B in <Cn x", sp(AB) = sp(BA).

(c) If both products AB and BA are defined and are square matrices, then

p(AB) = p(BA).

xr

1.1.19 [B:ll] Show that if A = (a^eC , then

n

M | | = max £ \ai3\.

1.1.20 [D] Show that in C" x n the formula

< x , y > = tr(y*x)

defines a scalar product whose derived norm is the Frobenius norm. Is it an

induced norm? Is it sub-multiplicative?

2

Show that of the canonical angles between M and N at most [M/2] are non-zero.

(Notation: [a] = min {jeK'J > a} VaeR.)

dim M = dim N ^ -

2

and let 0max be the greatest canonical angle between M and N. Show that if

#max < π /2, then MnN1 contains at least one non-zero vector.

orthonormal basis of N. Suppose that the canonical angles between M and N

EXERCISES 47

are given by

2

Put

0 = diag(Ö1,...,Ör).

Let

7 = YY*X.

Prove that Γ is a basis of N and that

r~cos©, Τ-ΛΓ-sin©.

1.2.4 [B:6] Let M and JV be subspaces of <P such that

2

Define χ,-eC and j ^ e C O ' = l,...,r) by the following conditions:

lyjxj = maxmax(|j;*x|;x*x = y*y = 1)

xeM yeN

| y*xA = max max (| y*x\; x*x = y*y = 1, x*x = yfy = 0) (i = 1,... J - 1).

3 J

xeM yeN

Let 0X ^ 02 ^ ··· ^ 0r be the canonical angles between M and JV. Show that

cos0 r _ i + 1 = |j/*x.| (i= 1 r).

1.2.5 [D] Let 0X ^ ·· · ^ 0r be the canonical angles between the subspaces M

and JV of dimension r. Show that

sin 0X = max min (|| x — y || 2; x*x =1),

xeM yeN

xeM yeN

1.2.6 [A] Let Θ be the diagonal matrix of the canonical angles between the

subspaces M and JV of C of dimension r < n/2. Put C = cos Θ and S = sin Θ.

Prove that there exist orthonormal bases Q of M, Q of M 1 , (7 of ΛΓ and ζ/ of

JV1 such that

(C -S 0 \

[Qß]*[t/i/]* = S C O

V0 0 'n-2ry

48 SUPPLEMENTS FROM LINEAR ALGEBRA

1.3.1 [D] Let X and Y be two matrices in C x r such that X*X = Y*X = lr

and let P = XY*. Prove that

||Ρ||ρ=||7||ρ, wherep = 2 o r F .

1.3.2[Bill] Prove that P is an orthogonal projection if and only if P is

Hermitian.

1.3.3 [B:35] Suppose that P and Q are orthogonal projections. Prove that if

P is non-zero, then ||P|| 2 = 1. Also show that \\P - Q ||2 ^ 1.

1.3.4 [B:35] Let M and N be arbitrary subspaces of <Cn and let P and Q be

the orthogonal projections on M and N respectively. If \\P — Q\\2 < 1, then

(a) either dim M = dim N and \\(P - Q)P \\ 2 = || (P - Q)Q \\2 = || P - Q \\2

(b) or dim M < dim N and Q maps M on to a proper subspace N0 of N; if Q 0

is the orthogonal projection on ΛΓ0, then

11(^-00)^112 = 11(^-6)^112 = ΙΙ^-6οΙΙ 2 <ΐ

and

11(^-0)0112 = 11^-6112 = 1.

1.4.1 [B:35] Let M and N be arbitrary subspaces of C" and let P and Q be

projections on M and N respectively. Prove that

co(M,iV)<max{||(P-Q)P|| 2 ,||(P-Q)ß|| 2 }.

Examine the maximum when

dim M = dim N.

1.4.2 [D] Let M and N be subspaces of C of dimension r and let

be the canonical angles between them. Prove that if Π Μ and Π Ν are the

orthogonal projections on M and N respectively, then

ω(Μ,Ν) = ω ( Μ 1 , Ν 1 ) .

Deduce an extension of Theorem 1.4.4 for the case in which

dim M = dim N ^ -.

EXERCISES 49

1.5.1 [B:ll] Let {Mk} be a sequence of subspaces of C . Prove that Mk

converges to a subspace M of <C" if and only if the following conditions are

satisfied:

(a) Given a basis Y of M and a complement Ύ of Y and given a basis Xk of

Mk9 there exist, for sufficiently greatfc,a regular matrix Ffc and a matrix Dfc

such that

Xk=YFix + ?Dk.

(b) DfcFk->0as fc->oo.

1.5.2 [D] Assume that, in the definition of convergence given in Exercise 1.5.1,

the bases Y( = Q) and Xk{ = Qk) are orthonormal. Prove that we may choose Fk

to be unitary such that it has the same singular values as cos ®k where Θ* is

the diagonal matrix of canonical angles between Mk and M. Deduce that Mk-+M

if and only if cos Θ* -+ Ir.

1.6.1 [B:8,ll] Prove that eigenvectors that are associated with distinct eigen

values are linearly independent.

1.6.2 [D] Let Ae<Cn xn and let λβ<£ be an eigenvalue with a non-zero imaginary

part and x an associated eigenvector. Prove that x is an eigenvector associated

with I and show that x and x are linearly independent.

1.6.3 [D] Let λ be an eigenvalue of A and let M and M^ be the right and left

invariant subspaces of X, Prove that corresponding to any orthonormal basis

X of M, there exists a basis X+ of M such that X*X = /m, where m is the

algebraic multiplicity of λ.

1.6.4 [D] Prove that:

(a) A matrix that is both normal and nilpotent is zero.

(b) A matrix H that is skew-Hermitian (//* = — H) and normal has a purely

imaginary spectrum.

1.6.5 [A] Prove that the order of the eigenvalues on the diagonal of the Schur

triangular form determine the corresponding Schur basis apart from a unitary

block-diagonal matrix.

1.6.6 [B:25] Let A = l/ΣΚ* be the singular value decomposition (SVD) of A.

Show that U and V consist of the eigenvectors of AA* and A*A respectively.

1.6.7 [D] Let amin be the least singular value of A. Prove that, if A is regular,

Deduce that amin is the distance of A from the nearest singular matrix.

50 SUPPLEMENTS FROM LINEAR ALGEBRA

1.6.8 [B:9] Let AeCm x", r(A) = r and D = d i a g ^ , . . . , σΓ), where the oi are the

non-zero singular values of A, and let U and V be the matrices of the SVD:

VO 0/

Put

\G(£mxn

V o o

and

Αΐ=νΣ*υ*

Show that:

(a) AAU = A and AUAI = A f.

1

(b) ΛΛ " and A*A are Hermitian.

(c) If r = n, then Λ1" = μ Μ ) - Μ * .

(d) P = AA^ is the orthogonal projection on Im A.

Give an example in which [ΑΕγ ΦΒ^Α\ The matrix A^ is the pseudo-inverse

(or Moore-Penrose inverse) of A (a particular case of the generalized inverse),

which is used in the solution of Ax = b by the method of least squares.

1.6.9 [D] Let ί be the order of the greatest Jordan block associated with an

eigenvalue λ of Ae<Enxn. Prove that if / > 1, then

l

i ^ /=>Ker(/4 - λΐΥ' c Ker(/1 - A/)',

the inclusion being strict, and

i ^S=>Ker(A - λΐ)1 = Ker(A - λΙ)ί+ι = M,

where M is the maximal invariant subspace under A associated with λ.

1.6.10 [B:ll] By using the theorem on the Jordan form establish the following

results:

p(A)=\r&\\Ak\\llk=\\m\\Ak\\u\

k^ 1 k-> oo

k

lim A = Oop(A) < 1.

fc-*oo

number. Construct a norm | | | | in C" such that the induced norm |||| in C n x n

has the property that

\\Α\\<ρ(Α) + ε.

x

1.6.12 [B:10] Let A e l " ". Define formally

oo 1

eA = I+ f-Ak.

* = i/c!

EXERCISES 51

(b) Prove that if V is an arbitrary regular matrix, then

ev'lAV=V~leAV.

(c) Show how to compute eA by using the Jordan form of A.

1.6.13[B:31] Prove the Cayley-Hamilton theorem: if π is the characteristic

polynomial of A, then n(A) is the zero matrix.

1.6.14 [B:31] Prove that the matrix A is diagonalisable, if and only if the product

Yldi=1(A — XJ) is the zero matrix, where λί9...,λά are the distinct eigenvalues

of A.

Prove that

a„_1 = - t r A and a0 = {-\)ndetA.

1.6.16 [A] Show that every Jordan block J is similar to J*:

J* = P~lJP9

where P is a permutation matrix. Determine P.

1.6.17 [B:8] This exercise furnishes an alternative proof of the Jordan form.

Let Le<Cnxn be a nilpotent matrix of index /, that is

iZ-^O, but L' = 0.

Define

M^KeriJ, N^ImL', L° = /.

(a) Show that

Mi c:Mi+1 (strict inclusion)

when i = 0, l,...,<f — 1.

(b) Prove that there exists a basis of C" in which L is represented by

(N? 0

1

"ft

J=

N[l)

M1»

\°

52 SUPPLEMENTS FROM LINEAR ALGEBRA

where

NU) = Νψ = ... = No> = y e C i ^ ;

;i, ifj? = a + l

(0, otherwise

when j — 2,3,..., Λ and

with the convention that the blocks Νψ,..., Ν^ are omitted when pj = 0.

(c) Let Ae<Cnxn be an arbitrary matrix. Prove that A can be represented by a

matrix

'Alm 0

0 'Bly

where AY is nilpotent and Βγ is regular.

(d) Let sp(/l) = {λ 1 ,..., λά) be the spectrum of A. Prove that >1 can be represented

by a block-diagonal matrix

U 0 N

Vo 'Aj

where Λ, — λ(ΙΜί is nilpotent, mt being the algebraic multiplicity of Af.

(e) Deduce the existence of the Jordan form.

1.6.18 [D] Prove that the Jordan form is unique apart from the ordering of

the diagonal blocks.

1.6.19 [A] Suppose that A is diagonalisable by X:

D = X~lAX

and that Q is a Schur basis

ß M ß = D + iV,

where N is a strictly upper-diagonal matrix.

Prove the inequalities

\\A\\l

cond 2 2 (X):sl+--^%

2 WAX

where

v(i4)=||i4M-i4i4*|| F .

1.6.20 [A] Let A = QRQ* be the Schur form of A, where R is an upper diagonal

EXERCISES 53

matrix and N its strictly upper triangular part. Establish the bounds

v2(A) ^UKJll2 ^

m viA)lr?-n

,mr 'y^ -

where

v(A)=\\A*A-AA*\\F.

1.6.21 [D] Prove that two diagonalisable matrices are similar if they have the

same spectrum. What can be said about defective matrices?

1.6.22 [D] Let D be a diagonal matrix of order n and let X be a regular matrix

of order n. Consider the matrix

A= X~lDX.

(a) Determine a similarity transformation Y that diagonalises the matrix

" < : : )

of order In.

(b) Prove that Y diagonalises

B = fp(A) q(A)\

Kq(A) p(A)f

where p and q are arbitrary polynomials.

(c) Express the eigenvalues of B in terms of those of A.

1.6.23 [D] Determine the connection between the singular values of X and

the Schur factorization of the matrices

Section 1.7

X*X9 XX*

Spectral Decomposition

and

a

(~

\X

X*

0

that x = y + iz is a corresponding eigenvector, where y and μ are real, and y.

and z are vectors in R n . Prove that lin(y,z) is a real invariant subspace.

1.7.2 [D] Let

P,PJ = SUPJ,

Dfj = 3uDj,

54 SUPPLEMENTS FROM LINEAR ALGEBRA

Dflj = 0 when i # ; ,

APi = P,A = PiAPi = AfPt + Dh

Di = (A-XiI)Pi.

n xn

1.7.3 [D] Let Ae<£ . Prove the existence of a basis

Λ <Λ

1

W~ AW=

0

w

where Tf is an upper triangular matrix of order m„ all of whose diagonal elements

are equal to the eigenvalue λ( of algebraic multiplicity mx. If λί9...,λά are the

distinct eigenvalues of A> interpret the matrix WiWf.

1.7'.4 [C] Obtain the spectral decompositions of the matrix

Λ= 0 1

0 0

\

Section 1.8 Rank and Linear Independence

1.8.1 [A] Let Xe<Cnxm9 where m < n, and suppose that r(X) < m. Prove that

there exists a permutation matrix Π such that ΧΠ = QR, where Q is orthonormal

and

- ( V *;>

K n being a regular upper triangular matrix of order r.

1.8.2 [D] Prove that if X = QK, where Q is orthonormal, then

r (X) = r(R) and cond 2 (X) = cond 2 (R).

1.8.3 [D] Suppose that X = QK, where Q is orthonormal and

*u ^12

R=

0 R22J

is an upper triangular matrix. If Rx x is of order r and σ χ , σ 2 ,... are the singular

values of X arranged in decreasing order, prove that

1.8.4 [D] Suppose that the ε-rank of the matrix X is equal to r for every

EXERCISES 55

\\X-X\\p = min | | j r - y | i

Up

r(Y) = r"

where p = 2 or F.

1.8.5 [B:39] The Householder algorithm is defined as follows. Let A(l) = A be

a given matrix.

(*) given A{k) = (β{*>):

Iffe= Π, STOP.

If k < n, define

a = (M2+...+M2)l/2

Π

» «to

^Hl,GOTO(*).

(a) Prove that Hk is symmetric and orthogonal.

(b) Prove that the matrix Hk and the vector u satisfy the following equations:

H j M = — (xel,

fc-1

Hku = £ WJ^J — (xek if fe ^ 2.

fc-1

w = Σ Wjej=>Hkw = w.

(d) Let

R

= Hn-1Hn-2-H2H1A,

Q= H1H2-'Hn-2Hn-i.

Prove that R is an upper triangular matrix and that Q is orthogonal.

(e) Prove that A = QR.

1.8.6 [D] Let amin(X) be the least singular value of X. Prove that there exists

a permutation matrix Π such that, if ATI = QR is the Schmidt factorization, then

x

1.8.7 [D] Let AeC *, where n^p.

56 SUPPLEMENTS FROM LINEAR ALGEBRA

(a) Prove that there exists a factorization, known as the polar decomposition,

A = QH,

where Qe<Enxp, Q*Q = IP and where He<Epxp is symmetric and positive

semi-definite.

(b) Prove that the matrix Q in (a) satisfies the conditions

|| A - Q\\j = min {\\A-U ||,·:Ηη U = lin A and U*U = / p } ,

where j = 2 or F.

(c) Compare the applications of the polar decomposition and of the Schmidt

factorization in relation to the orthonormalization of a set of linearly

independent vectors.

1.9.1 [A] Prove that if A is Hermitian and if B is Hermitian semi-positive, then

p(A + B)>p(A)

1.9.2 [A] Prove the monotonicity theorem of Weyl: let A,B and C be

Hermitian matrices such that

A = B + C,

and assume that their spectra are arranged in decreasing order. Then:

(a) When i'= 1,2,...,*,

λ((Β) + λη(0 < λ((Α) ^ λΑΒ) + λΑΟ,

\λΑΑ)-λ({Β)\^\\Α-Β\\2.

(b) If C is semi-positive definite,

λΑΒχλΑΑ) (i=l,2,...,n).

xn

1.9.3 [A] Prove that the matrix Ae<C" is normal if and only if there exists

an orthonormal basis of <E" that consists of eigenvectors of A.

1.9.4 [A] Prove that if A is normal then

\\A\\2 = p(A).

nx

1.9.5[A] Let Ae<E " be a Hermitian matrix and put

JV u*Au

p(u,A) = —- (u#0).

u*u

Prove that the spectrum of A can be characterized as follows:

k{(A) = max min p(u, A),

s u

EXERCISES 57

dimS = / - l , MlS.

1.9.6 [A] Establish the following consequence of Exercise 1.9.2: if A and B are

Hermitian matrices of order n, then their eigenvalues λ^Α) and λ^Β) can be

enumerated in such a way that

λί(Β)^λί(Α)+\\Λ-Β\\ (i=l,...,n).

1.9.7 [B:67] Let π, be the characteristic polynomial of the real symmetric

matrix

fax '12 *iA

Aj = an (l^;^n),

faj u

jj)

(a) Show that {π 0 ,...,π η } is a Sturm sequence, that is if rx and r 2 are zeros of

Uj+χ such that rx < r 2 , then there exists a zero of π,- in [rur^\.

(b) Show that if An is tridiagonal, then

^j+i(t):=(t-aj+ij+i)nj(t)-aj+ljnj_l(t) (j = l , . . . , n - 1).

S = \xe1Rn:xi^Q, Σ x f = l [.

Put

T(x) = Λχ,

which is non-zero in S and such that T(S) ^ S. Use Brower's fixed-point theorem

to prove the Perron-Frobenius Theorem.

(a) If there exists a vector x > 0 such that

Ax ^ Ax, then λ ^ p(/l).

(b) (λΐ — A)'1 is non-negative if and only if λ > p(A).

58 SUPPLEMENTS FROM LINEAR ALGEBRA

1.11.1 [D] Prove that if M is an invariant subspace of A, then the section of

A on M can be identified with the restriction A^M.

1.12.2 [D] Let X and Y be complex n by r matrices such that r(X) = r and

Y*X = lr. Give a formula for the matrix that represents the section of a linear

map A on the subspace Im X.

1.12.1 [D] Let Pe<P ΧΛ and ße(C m x m be regular matrices. Suppose that Ae

<Cnxn and Be€mxm are such that sp(,4)nsp(£) = 0 . Put

R = (A,B)\ S= (PAP-\QBQlyl.

Show that

IISir^llRll^cond^condCQ).

Establish a stronger result when P and Q are unitary and || · || is the norm induced

either by ||-||2 or by ||-||F.

1.12.2 [D] Suppose that T:Z -► AZ — ZB is regular and that A is regular. Prove

that if ß^0 and δ >0 are such that ||0|| ^ β and \\Α~ι\\Κ(β + δ)~\ then

Suppose now that A and B are Hermitian and positive definite and that, for all

Show that

IIT-MI^«"1.

1.12.3 [B:27] Examine the spectrum and determine the spectral radius of the

operator

Ί.Χ^ΑΧΒ

as a function of the spectra and spectral radii of A and B.

1.12.4 [D] Let

3T = lr®A-BT®In,

3T' = Ir®A-ST®I„

EXERCISES 59

^ ' = (0®/J^(Q*®U

(b) Prove that

sp(^) = sp(iT') = μ - μ\ Xesp(A)^esp(B)}.

1.12.5 [B:27] Establish the following properties of the Kronecker product:

(a) A®{B + C) = A®B + A®C,

(A + B)®C = A®C + B®C,

provided that the above sums are defined.

(b) For all aeC, A ® (aß) = OL(A ® Β).

(c) (A®B)(C®D) = (AC)®(BD\

provided that the above products are defined.

(d) A®(B®C) = (A®B)®C.

(e) (A®B)* = A*®B*.

(f) Let A and B be regular matrices of orders m and n respectively. Then

(A®B)(A-l®B~1) = Im®In.

(g) If λ(Α) and λ(Β) are eigenvalues of A and B respectively with corresponding

eigenvectors φ(Α) and φ(Β), then λ(Α)λ(Β) is an eigenvalues of A ® B and

φ(Α)®φ(Β) is a corresponding eigenvector.

1.12.6 [B:17] Let V= (F 0 , Vx) be an orthonormal basis of <Cn and let P = K0K*

be the orthogonal projection on M = Im K0 and Q an orthogonal projection on

a subspace JV such that dimN = dimM. A unitary solution Ue(Cnxn of the

equation

UP-QU=0

is called a direct rotation of M on iV if and only if, relative to the basis V, it is

represented by

ÜJC° -M,

\s0 cj

where

(a) C 0 > 0 and Cj > 0 and

(b) S 1 = S * .

Prove that if N' = Im (/ — Q) and M' — Im (/ — P) satisfy the conditions

MniV' = M'n/V = {0},

then the direct rotation of M on N exists; it is unique and (a) implies (b).

60 SUPPLEMENTS FROM LINEAR ALGEBRA

1.13.1 [D] Let A and B be matrices of order n such that

Ker/lnKer£#{0}.

Prove that det(/l - λΒ) = 0 for all AeC.

1.13.2 [D] Let A and B be symmetric matrices in R " x n such that A is regular

and B is positive semi-definite and singular. Let U be an orthonormal basis of

Ker£. Prove that:

(a) Zero is an eigenvalue of A ~ 1B of algebraic multiplicity

m = dim Ker B + dim Ker UTA U

and of geometric multiplicity

g = dim Ker B.

(b) The non-zero eigenvalues of A ~ lB are real and non-defective: there exists a

diagonal matrix Λ of order r and an n by r matrix X satisfying XTX = In

such that

AX = BXA and XTBX = Jr.

(c) The matrix Λ~ ι Β is non-defective if and only if UTAU is regular.

1.13.3 [B:45] Let A and B be symmetric matrices. Prove that the pencil A — λΒ

is definite if and only if there exist real numbers a and ß such that the matrix

OLA + ßB is definite and that this is equivalent to the condition that UTAU is

definite, where U is an orthonormal basis of Ker B.

1.13.4 [B:45] Prove that every diagonalisable matrix C can be factorized in the

form

C= AB\

where A and B are symmetric and B is regular. Comment on this result.

CHAPTER 2

operators: mainly the expansion in a Laurent series of the resolvent (A — zl)~1

in the neighbourhood of an eigenvalue, and the expansion in the perturbation

series of Rellich-Kato and of Rayleigh-Schrödinger for the eigenelements of the

set of operators A(t) = A + tH, where t is a complex parameter. We introduce the

fundamental tool of a block-reduced resolvent in order to treat simultaneously

several distinct eigenvalues, which arise most frequently in the approximation of

a multiple eigenvalue.

O F A COMPLEX VARIABLE

Let f:z\-+f(z) be a function of a complex variable. We say that f(z) is holomorphic

(or analytic) in the neighbourhood V of z 0 if and only if / is continuous in V and

d//dz exists at every point of V.

Let Γ be a closed Jordan curve lying in V (that is a rectifiable simple curve)

positively oriented and surrounding z0 (see Figure 2.1.1). Then f(z0) is given by

Cauchy's* integral formula:

2inJrz — z0

By differentiation,

2inJr(z-z0)k+l

The expansion off as a Taylor* series in the neighbourhood of z 0 is as follows:

+

Brook Taylor, 1685-1731, born at Edmonton, died in London.

62 ELEMENTS OF SPECTRAL THEORY

Figure 2.1.1

it converges absolutely and uniformly with respect to z in the interior of any disk

that lies inside Γ. Conversely, every series of the form

00

f{z)= Σαάζ-zvt

k= 0

defines a function that is holomorphic in the open disk {z;|z — z 0 | < p}, where

p = (limsup|aj 1/fc )- 1 .

This series converges absolutely and uniformly with respect to z in every disk

{z;|z — z 0 | ^ r}, where r < p. Moreover, this series is uniquely determined by /

because

a — /c = 0 , 1 ,

k\

The coefficients ak of the Taylor expansion can be bounded by Cauchy's

inequalities:

\ak\ ^ Mr~\ k^O, where M = max |/(z)|.

Next we suppose that / is holomorphic in the annulus

{ζ;α<|ζ-ζο|<0}, a > 0.

Then / can be expanded in a Laurent* series in the neighbourhood of z 0 ; thus

f(z)= Yak{z-z0f.

— oo

{ζ;α + ε < \z — z0\ <β — ε}

where ε > 0.

If / is holomorphic in {z;0 < | z — z01 < β} but not in {z; | z — z01 < /?}, then z 0

is said to be an isolated singularity of/. If the expansion as a Laurent series about

z0 contains infinitely many non-zero coefficients ak for which k < 0, then z0 is

SINGULARITIES OF THE RESOLVENT 63

said to be an essential singularity of/. In the opposite case, z 0 is a pole of/; the

greatest integer { such that a_t Φ 0 is called the order of the pole.

The definitions and properties that have been recalled above can be extended

without difficulty to a function / with values in the vector space C n xw, that is,

for example, to a square matrix A of order n whose n2 coefficients depend on the

complex variable z. It suffices to replace the absolute value | · | on <C by the chosen

norm | | | | on C n x n . In particular, one could apply Liouville's* theorem and

Cauchy's integral formula.

The subsequent spectral theory will be established independently of the work

presented in Chapter 1.

The resolvent set of A consists of those points z of (C at which (A — z/)~ * exists;

it will be denoted by res(/4). The matrix.

R(A,z) = (A-ziy\ zeres(A)

is called the resolvent οϊΑ. If there is no ambiguity, we shall denote R(A9 z) simply

by R(z). The unique solution of the equation Ax — zx = b is then written as

x = R{z)b.

The complement of res (A) in C is called the spectrum of A and is denoted by

sp(/l): at a point λ of sp(/4) the matrix A — XI is not invertible. Hence there exists

a vector x φ 0 such that Ax = λχ\ thus λ is an eigenvalue of A and x is an

eigenvector.

In this section we will investigate the properties of the resolvent.

Lemma 2.2.1 The resolvent R(z) satisfies two identities, known as the first and

second resolvent equations respectively:

R(Zl) - R(z2) = (zx - z2)R(Zl)R(z2) = (zx - z2)R(z2)R(Zl) (2.2.1)

for all zx and z2 in res(/4), and

R(Ai,z)-R(A2,z) = R(Al,z)(A2-Al)R(A2iz)

^R(A29z)(A2-Al)R(Al,z) (2.2.2)

for zeres (A x )n res (A2).

(z1-z2)I = A-z2I-(A-zlI)

and

A2 — Ax — A2 — zI — (Al — zl).

64 ELEMENTS OF SPECTRAL THEORY

Proposition 2.2.2 The resolvent R(z) is holomorphic throughout res (,4), where it

possesses the following expansion as a Taylor series in a neighbourhood ofz0:

*=o

PROOF Let z0eres(y4). We have the formal identity

(A - ziy * = R(z0)U - (z - z 0 )K(z 0 )]" l .

For every z such that \z — z0\ < \\ R{z0) || ~ \ the series

R(z) = R(z0)f^l(z-z0)R(z0)f

fc = 0

converges absolutely.

a=limM f c || 1 / f c = inf||/l k || 1/k

fc-*oo k

exists.

*^inl*

inf* = b.

* k

The inequality

M- + *IKM-Ihll^ll

implies that

am + k<:am + ak.

When m is a fixed positive integer, we can put k = mq + r, where q and r are

integers such that 0 ^ r < m. Hence

k m

Hence lim sup (ak/k) ^ ajm, where m is arbitrary. Therefore sup(ak//c) ^ b. On

the other hand, since ak/k ^ fr, it follows that

liminfi — \^b.

SINGULARITIES OF THE RESOLVENT 65

PROOF By Theorem 2.2.3, \\Ak\\1/k -*a when k-> oo. Hence, if |z| > a + ε, where

ε > 0, we have

ΙζΙ-Μΐ^ΙΙ^^ία + εΓΗα + Η

and so

k=0

converges when \z\ > a. On multiplying on the left or on the right by A — z/, we

verify that

(A - zI)S{z) = S(z)(A -zl)=-1.

This proves equation (2.2.4).

The identity (2.2.4) is the expansion of R(z) as a Taylor series in the

neighbourhood of z = oo, the radius of convergence being limfc sup || Ak ||1/k = a.

Hence equation (2.2.4) diverges when |z| < a.

Corollary 2.2.5 The sets res(A) and sp(A) are not empty.

res (A) =) {z; \z\ > a}

and

sp(i4)c{z;|z|^a}.

Now res {A) is not empty because a exists. When \z\ > \\A\\, we deduce from

equation (2.2.4) that

ll*WII< Σ Μ Τ Τ ^ ^ Ι - Μ Ι Ι ) " 1 ·

k = 0\Z\

Hence || R(z)\\ -►O when \z\ -> oo. If sp(A) is empty, then R(z) would by analytic

and bounded throughout <C. By Liouville's theorem R(z) would be constants, and

this constant would be the zero matrix. This would lead to the contradiction that

/ = (,4-zJ)K(z) = 0.

66 ELEMENTS OF SPECTRAL THEORY

OL = max(|vl|;Aesp(,4)} = p(A).

PROOF We shall show that there exists at least one point of sp(/l) (that is an

eigenvalue of A) on the circle {z; \z\ = a}. Since the domain of convergence of

equation (2.2.4) is {z; |z| > a}, there is at least one singularity of R(z) on the circle

of convergence, provided that a > 0.

On the other hand, if a = 0, then sp (A) = {0} unless the spectrum of A is empty,

which is impossible. Hence we conclude that a = p(A).

values in <C"xn. Then z\->p(S(z)) is upper semi-continuous in G.

PROOF For every /ceN, the function z\-> || Sk(z) || l,k is continuous in G. For every

ε > 0 and z in G, there exists veN such that

||S v (z)|| 1/v ^p(S(z) + ie.

There exists of δ > 0 such that

|ζ'-ζ|<<5^||5 ν (ζ')ΙΙ 1 / ν ^ΙΙ5 ν (ζ)|| 1 / ν + ^

^ p(S(z)) + ε.

Since

p(S(z'))= inf || S V ) II1/fc,

fc$sl

we obtain

\z' - z\ <S=>p(S(z')) ^ p(S(z)) + ε.

We have established the fact that R{z) is holomorphic in the exterior of the

disk {z; \z\ ^ p(A)} which contains the spectrum of A (see Figure 2.2.1) and that

R(z) has the expansions (2.2.3) and (2.2.4). Next we are going to establish the form

of the Laurent expansion of R(z) in the neighbourhood of an eigenvalue x, where

\M^P(A).

Let Γ and Γ' be two Jordan curves surrounding λ. The curve P lies in the

exterior of Γ. Both curves lie in the set res (A) and contain no other point of sp(A).

Figure 2.2.1

SINGULARITIES OF THE RESOLVENT

Figure 2.2.2

sp(i4)= {A}UT

(see Figure 2.2.2).

p

-{^)lmdz

has the following properties :

(a) P is a projection on M — Im P, along M = Ker P.

(b) M and M are invariant subspaces of A.

(c) AfM:M-+M has the spectrum λ and A^:M^M has the spectrum τ.

PROOF

P2 R(z)R(z')dz'dz,

\2in) JJ r

where zeT and z ' e F . By equation (2.2.1) we obtain

p2J±Y[\w)-mdz,az

\2inJ JrJr, z'-z

We remark that

dz

f ' 2ίπ and

f—-o·

J r ,z'-z

On changing the order of integration we immediately deduce that

urn*-*

68 ELEMENTS OF SPECTRAL THEORY

Also

C

Ri / Γ

az \2i

AT' C

l il,7^) - '\ Γ

R(z)dz.

Thus

-1

\[ R(z)dz = P.

2/π

(b) Put M = Im P and M = Ker P. We will show that M is invariant under A.

Since AR(z) = R(z)A, we deduce that PA = AP. If welm P, then u = Pv for

some v. Hence Au = PAv = PAveM. Thus

MgM.

By a similar argument (see Section 1.3), it is shown that

M = KerP = I m ( / - P )

is invariant under A.

(c) Let [X, X] and [Χ%, X + ] be adjoint bases of C such that X and X are bases

of M and M respectively. Relative to these bases the map A^M:M-^M is

represented by the matrix B = X*AX. Similarly, the map A^:M-^M is

represented by the matrix B = X*AX.

When zeres(,4), zeT and teT, we have

2ίπ >**-©ί»·

When z is in the exterior of Γ, we obtain

df

*-(£)/,

Ä(z)P= — Κ(ί)

ζ - ί

Since M and M are complementary invariant subspaces, A becomes a

block-diagonal matrix relative to the adjoint bases [X,X] and [Χ*,Χ*]; in

fact,

[^•«■κ a

*

and sp(/4) = sp(ß)usp(£). Therefore

(B

^--^{ r\s-um

Since P = XX* (see Exercise 2.2.3) we deduce that

*

R(z)P = X{B - ziy lX* and R{z)(I -P) = X(B - zl)~lX

SINGULARITIES OF THE RESOLVENT 69

particular at every point of τ; thus res(J5) 3 τ. On the other hand, R(z)(I - P) is

holomorphic in the interior of Γ, whence Aeres(B). We deduce that

sp(£) = {>l} and sp(5) = T.

The fundamental theorem 2.2.8 enables us to block-diagonalize A. We remark

first of all that the definition of the spectral projection given in Theorem 2.2.8 is

identical with that given in Theorem 1.7.1; indeed, P does not depend on Γ, but

uniquely on λ, which is a singularity of R(z).

The eigenvalues of Α(ε) are the two roots of (1 — z)2 — ε = 0, say

λχ(ε) = 1 + ^/6 and λ2(ε)=1—^/ε.

By integrating around λ^ε) and λ2(ε) respectively, we obtain

We return to the notation employed in Theorem 2.2.8 and its proof. By

hypothesis, λ is not an eigenvalue of J3, and so (B — λΙ)~ι exists.

s^xfi-xiy'xi

is called the reduced resolvent of A at λ.

This matrix represents the extension to the whole space of the inverse

( £ - Xiy\ In fact,

G j w : «..·«-}

Lemma 2.2.9 T/ie matrix

70 ELEMENTS OF SPECTRAL THEORY

Dk = (A - XlfP = X(B - XlfX*.

Now B has λ as its sole eigenvalue. Hence p(B — λΙ) = 0 and Ν = Β — λΐ is

nilpotent: there exists a positive integer ί such that N{ = 0 but N{~l φ 0. Thus

(Α-λΐΥΧ = 0 but (Α-λί)'-ιΧΦϋ, and ί is the index of the eigenvalue λ

(Theorem 1.6.7).

hood ο/λ, can be written as

D

m=—- 'Σ , L+i + Σ (* - tfs*+>. (2.2.5)

z —z k=i(z — A) k=o

PROOF We have

(B - ziyl = (B - λΐ - (z - λ)Ι)~ \

By Theorem 2.2.4 the expansion

(B-ziyi = - f^iz-xy^^B-xif

fc = 0

R(z)P = (A - zl) ~'P = X(B - zl)"lX%

= X {z-kyk-xX(B-XI)kXl

k= 0

P - *

ζ-Λ *ΐ-!(ζ-λ)* + 1 '

Ä(z)(/-P) = X ( B - z / ) " 1 X * = £ ( z - A ^ ß - A / ) - ^ 1 ^ *

fc = 0

is the Jaylor series for R{z)(I — P), valid near λ. On using the definition

S = X(B - λΙ)~ lX% we infer that R(z)P + K(z)(/ - P) satisfies equation (2.2.5).

Without referring to the characteristic polynomial we have established the fact

that a pole λ of order { of the resolvent R(z) is an eigenvalue of A of index *f and

algebraic multiplicity m = dim M.

We will now mention the form that Cauchy's integral formula takes in this

context. Let Γ be a Jordan curve lying in res (A) and enclosing sp(,4), and let /

be a function that is holomorphic in the neighbourhood of sp(y4). Then Cauchy's

integral formula enables us to define

2ιπ J r

SINGULARITIES OF THE RESOLVENT 71

finite multiplicity for linear operators in Banach* spaces (see Chatelin, 1983).

Without difficulty, the preceding investigation can be extended to the case in

which the spectrum of A is partitioned into disjoint subsets of eigenvalues.

Let {A,·} (1 < i ^ d] be the distinct eigenvalues of A. We denote by Pt (respec

tively (fhDh S^ the spectral projection (respectively index, nilpotent matrix,

reduced resolvent) associated with each λ(.

d

ZJW-

i=l

PROOF Let Γ be a Jordan curve enclosing the set {Λ,,} (1 ^ i ^ d). Then

1

R(z)dz.

i=i 2ιπ, r

Since R(z) is holomorphic in the exterior of Γ, we can use the expansion (2.2.4)

of R{z) and make the change of variable: z = 1/i. From the identity

R(z)dz= £ tk+lAk-2

*=o t

and from the fact that t traverses a Jordan curve Γ' in the negative sense

(z = pew => z " l = p " * e ~ie) we deduce that

^i«(z)d Z = ^ f /* = · - 2ΐπ,/ = /.

i m j rr

2i7rJ 2i*7r J rj-- tt —2in

Ä(z) = R(z)Pi + K ( z ) £ P . .

(2.2.7).

72 ELEMENTS OF SPECTRAL THEORY

AP^XtPt + Dt.

We obtain the result by summing over i.

R*(A, z) = R(A*, z) and P%4, λ) = Ρ(Α*> λ).

R*(A9z) = R{A*,z).

Next, let A be an eigenvalue of A and choose p so small that the circle

Γ:{ζ;ζ-λ = ρ&θ,0^θ^2π}

isolates A. Let Γ be the complex conjugate circle positively oriented (see Figure

2.2.3); thus

r:z-A = pe- ,e :

Then

dz + dz = 0.

For all x and y in C we have

= I y*R*{A,z)xdz= I [R(A,z)yYxaz.

Hence

i f ÄM,z)dzT= f R*(A,z)dz.

Γ"

Figure 2.2.3

THE REDUCED RESOLVENT AND THE PARTIAL INVERSE 73

Now

negative or positive. Since άζ = dz, we obtain

— K%4, z) dz = — K%4, z) άζ

2mJr_ 2mJr_

PROOF Since R*{z) = R(z), it follows from equation (2.2.1) that R*(z)R(z) =

R(z)R*(z) and

l|RWIl2 = p [ M - ^ ) " " 1 ] = d i s t - 1 [ z , s P M ) ] .

Since λ is real, it may be assumed that the contour in Figure 2.2.3 is symmetric

with respect to the real axis; thus P* = P. Now

D = (A - λΙ)Ρ = D*.

Since D is a Hermitian nilpotent matrix it is zero, that is D = 0 and ί = 1.

THE PARTIAL INVERSE

Let λ be an eigenvalue of multiplicity m. We recall that the reduced resolvent

with respect to λ is given by the matrix

Put

5 = dist[A,sp(i4)-{A}]>0.

(a) i i s i i ^ a - 1 .

(b) IfA_is Hermitian, then \\S\\2 = δ'1.

(c) IfX is an orthonormal basis, then

ΙΙ5||2^ΙΙ**ΙΙ2ΙΙ(£-^ΓΊΙ2.

74 ELEMENTS OF SPECTRAL THEORY

PROOF

(c) This is evident because \\X\\2 — 1 and ||Α^|| 2 ^ 1.

Thus || S || 2 depends on δ and also on the conditioning of the Jordan base (or

eigenvectors) of B.

Let b be a vector such that

(μ-Α/)ζ=0,

(2.3.1)

X%z = 0,

consisting of n + m equations in n unknowns of rank n\ indeed the unique solution

of system (2.3.1) is z = Sb [this may be seen by using the relations (A — Xl)X =

X(B - λΐ) and XX% + XX% = / ] .

In a practical situation system (2.3.1) can be solved by adapting the factoriza

tions of Gauss or Schmidt (see Exercise 2.3.1). However, it may be preferable to

reduce the problem, when possible, to the solution of a regular system of n

equations in n unknowns; this will be demonstrated in the next lemma.

Lemma 2.3.2 Suppose that X*b = 0 and that λΦθ. Then the unique solution z

of system (2.3.1) is a solution of the system

(I - P)Az -kz = b. (2.3.2)

L 0 Β-λΐ]ΐΧ*]

or, again,

-λΧ*ζ =0,

*

{Β-λΙ)Χζζ=Χ*ο.

When λ Φ 0, we obtain Xp = 0 and X%z = (B - λ1)~ lX*b, whence z = Sb.

System (2.3.2) of rank n can be solved in a standard fashion.

The spectral projection P presupposes a knowledge of the right and left invariant

subspaces M and M*, which may be costly. We are going to introduce the notion

THE REDUCED RESOLVENT AND THE PARTIAL INVERSE 75

of a partial inverse which requires only the knowledge of the right invariant

subspace M.

Let X be a basis for M and let 7 be an adjoint basis for a subspace N, which

need not be M + . We suppose that ω(Μ, N) < 1, that is 0max < π/2. Then Π = X Y*

is the projection on M along N1 = W. Let [X, X] and [7, 7 ] be adjoint bases of

C . The matrix A is similar to

We remark that

sp(ß)nsp(ß) = 0 .

relativeto the adjoint bases IX, X~\ and[Y, 7 ] , then there exist adjoint bases [X, X~\

and [X^9X^\ defined by

X = X-XR, X*=Y+YR*, X* = Y

l

where R = (B, B)~ C, such that A becomes block-diagonal of the form

K ίΐ

PROOF It is easy to verify that

ifandonlyifÄ = (B,jB)"1C.

With the notation of Section 2.3.1 we have

B = X%AX = Y*A(X - XR) = Y*AX = B.

As we shall see in Proposition 2.3.4, this is due to a particular choice of the bases

X, X+, starting from the bases X, 7in Theorem 2.3.3.

When arbitrary bases IX, X] and [X, X~\ are associated with the direct decom

positions MφNL and Λ ί φ Μ , the corresponding matrices B=Y*AX and

B = X*AX are similar because they have the same spectral structure (see Exercise

2.3.2). We shall study this similarity more precisely in the following proposition.

Proposition 2.3.4 Let {X9 X+) and (X, 7) be a pair of adjoint bases. Then there

76 ELEMENTS OF SPECTRAL THEORY

X*=YG and B = G*B(G~1)*.

(respectively M,M+). Now M 1 = Μ + ; therefore 7and X+ are two bases for the

same space. There exists a regular matrix G such that X# = YG. Denote by X'

the adjoint basis of X+ in N. It is known that X' = X(G~*)* and that

B = X%AX = X%AX'.

Therefore

B = G*Y*AX(G~l)* = G*B(G~x)*.

The matrix G depends on the choice of bases Y and Χ+{οτ_Μλ. For example, in

Theorem 2.3.3 we have G = I with the choice of bases X, X^.

Since λ is not an eigenvalue of B, it is not an eigenvalue of B either. We define

the partial inverse Σ (with respect to N1) by

Σ= Χ(Β-λΙΓιΥ*.

then the equation

(/ - Yl)Az -kz = b

has the unique solution z = Σb.

Let [β, Q] be an orthonormal basis of C" such that Q and Q are orthonormal

bases of M and M1 respectively. The projection P1 = QQ* is the orthogonal

projection on M. The corresponding partial inverse (with respect to M 1 ) is

defined by

Σ^ρίΒ-//)- 1 ^*,

where B = Q*AQ. We leave it to the reader to verify that

Ι|Σ1||2=||(Β-λ/Γ1|Ι2.

The reduced resolvent which we have defined previously refers to the case of a

multiple eigenvalue for which bases for the left and right invariant subspaces are

known. Numerically, a multiple eigenvalue is, in general, approached by a set of

THE BLOCK-REDUCED RESOLVENT 77

neighbouring eigenvalues, and the resolvent that is associated with each of these

eigenvalues individually is ill-conditioned because their distance from the re

mainder of the spectrum is small. One may therefore wish to treat globally the

cluster of eigenvalues which are close to a multiple eigenvalue.

Let {μ,·:| ^ Ι ^ Γ } be the block of the r eigenvalues of A, counted with their

multiplicity and distinct from the rest of the spectrum; we wish to treat the

eigenvalues {μ^ simultaneously. The corresponding right invariant subspace is

given by

i= 1

represented by the matrix

P = XX%.

The complementary invariant subspace is denoted by

M = I m ( / - P ) = M^.

the matrix B = X$AX represents A^M relative to the bases X and X+; it satisfies

the equations

AX = XB and X*A = BX*.

Similarly, the matrix B = X^AX represents A^ relative to the bases X and X^

which are complementary to X and X+ respectively.

We put

<T = s p ( B ) = { / i i : l ^ i < r } ,

t = sp(£)

and

δ = dist min (σ, τ),

which is positive by hypothesis.

In the case of a block of eigenvalues, the eigenvalues of B may include distinct

ones. We will generalize the notion of a reduced resolvent in the following

manner.

S=X(B,B)~lXl

is called the block-reduced resolvent relative to {μ,·: 1 ^ i < r}, where (Β,Β)'1 is

the inverse of the linear map

Zv-*BZ-ZB

in )xr

defined on <E -' .

78 ELEMENTS OF SPECTRAL THEORY

(a) | | 5 | | p ^ | | X | | 2 | | X J | 2 | | ( 5 , ß ) - 1 | | P > w / l e n p = 2 o r F .

(b) usii^-δ- 1 .

ι

(c) If A is Hermitian, then \\S\\F = δ .

PROOF

||5Ζ||ρ=||ίί/||ρ.

When p = 2, the result is immediate because the spectral norm is induced by

the Euclidean norm.

When p = F, we have

where

wxuwi^iwxutwi

Hence

Next,

H^ll^ll^llltli^llHll^llillfll?·

i= 1

ΙΙΙ/ΙΙΡ^ΙΚΑΒΓΜΙΡΙΙ^ΖΙΙΡ^ΙΙΑΒΓΜΙΡΙΙ^Ι^ΙΙΖΙΙΡ·

whence the result follows:

(b) This_is a consequence of Proposition 1.12.2 and the fact that sp(S) =

spKftBr^ufO}.

(c) We can choose bases X and X+ in M 1 and M£ respectively such that

Jf * i = X*X = /,

λ

because ω(Μ ,Μ^) < 1. Then by Proposition 1.2.5 we have

J^-cos1©,

where Θ is the diagonal of the canonical angles. We deduce that || X ||2 = 1

and \\XJ2J* 1. If Af = M # , then ||Äf||2 = \\XJ2 = \. If A is Hermitian, so

are B and J5 if we choose an orthonormal basis [ β , β ] . Thus

| | 5 | | Ρ = | | ( 5 , 5 ) - 1 | | ρ = δ- 1 .

Lemma 2.4.2 Let R be an n by r matrix such that X*R = 0.IfB is regular, the

equations

\AZ-ZB = R,

X%Z = 0,

LINEAR PERTURBATIONS OF THE MATRIX A 79

and

(I-P)AZ-ZB = R,

where P = XX*, have the same solution, namely Z = SR.

PROOF This is analoguous to the proof of Lemma 2.3.2. The equation λΧζζ = 0

is replaced by {X*tZ)B = 0, which has the unique solution AT*Z = 0 if and only if

B is regular.

consists of a single eigenvalue of algebraic multiplicity m? Two cases have to be

distinguished:

(a) λ is semi-simple and B = XIm. Sylvester's equation is then entirely uncoupled

and S is identical with the reduced resolvent S.

(b) λ is defective. In this case the two notions of the reduced resolvent S and the

block-reduced resolvent S are distinct. We shall see in Section 2.9 that in

this case it is the notion of a block-reduced resolvent that plays a part in the

theory of analytic perturbations.

As in the case of the reduced resolvent, the use of the block-reduced resolvent

presupposes a knowledge of the spectral projection, that is of the right and of

the left invariant subspaces. We shall extend the notion of a partial inverse to a

block of eigenvalues.

Again, let Π = X Y* be a projection on M, not necessarily the spectral projection.

Let N 1 = K e r n .

Definition The linear map Σ = X(B,B)~1Y* is called the block partial inverse

with respect to {μ,: 1 ^ i < r}, defined in N1, where

B = Y*AX.

In particular, we may choose N = M. When Q is an orthogonal basis of Mx,

we have B = Q*AQ and Σ1 = Q(B,B)~ lQ*. We~leave to the reader the task of

verifying that

l|E- L llp=ll(B,B)- 1 ||p when p = 2orF.

Let A be a matrix whose eigenvalues we wish to compute. The existence of

rounding errors and/or systematic errors make us determine numerically the

eigenvalues of a 'neighbouring' matrix, Ä = A + H, where H is a matrix of'small'

norm, termed a perturbation. In order to deduce some information on the

80 ELEMENTS OF SPECTRAL THEORY

consider the family of matrices A(t) — A' — ί/ί, where t is a complex parameter.

Then A(0) = A' and A(\) = A: this is a homotopy between A and A'. Consider the

problems

lA(t) - z/]x(i) = ft, Λ(ί)χ(0 = λ(ήχ(ή.

If the functions x(r) and λ(ή are analytic in a disk around 0 which contains the

point t = 1, then the solutions of the problems

(A — zl)x = b, Ax — λχ

can be computed recursively by starting from the known solutions of

(A1 - zl)x' = ft, A'x' = λ'χ'.

Let Γ be a Jordan curve drawn in res(/T) and isolating A', which is assumed

to be an eigenvalue of A' of multiplicity m and index £'. We define formally

R(t,z) = lA(t)-ziyl when zeres(zi')

and

P(t)=-~{ R(t,z)dz

ofO.

plHR'iz)] = p[R\z)H] = p[I -(A- zI)R'(z)l (2.5.1)

PROOF We have

HR\z) = {A' - A)R\z) = I-{A-zl)R\z).

On the other hand, from the definition of p(A) as limfc sup || Ak ||l/k it follows that

p[HK'(z)]=p[Ä'(z)Jf].

See also Section 2.12.

and it has the Taylor expansion

k=0 k=0

PROOF We have

A(t) = zI = A'-zI-tH = (Α' - zI)U - tR\z)H\

LINEAR PERTURBATIONS OF THE MATRIX A 81

Hence

k=0

if and only if

\t\plR'(z)Hl<l.

We deduce that t = 1 belongs to the domain of analyticity provided that

p[R'(z)H~\ < 1. This holds under the classic condition

IIHIKHK'Wir1;

but it may also hold for matrices of greater norm, as the following example shows.

the domain of analyticity of R(t,z) when z describes the circle Γ:{ζ;|ζ| = 1}

surrounding the eigenvalue 0. We have

V 0 1/2-z

>.-z)\

-x/z 0 /

and

1/2

p[HR(z)] = ^

z(2-z)

max |Ä(z)||2 = m a x f l _ L ) B L

ζεΓ »r V|z| | 2 - z | /

As z describes Γ, the condition

maxp[HK(z)]<l

zer

is satisfied provided that \xy\ < 1.

82 ELEMENTS OF SPECTRAL THEORY

Figure 2.5.1

The corresponding sets of points in the (x,y) plane are bounded by the

circumference of the square and by the two hyperbolas shown in Figure 2.5.1. If

we choose x = y/n,y= l/n,thenx>/ = l/ x /tt->Oasw->oo;but ||H|| 2 = H 1 / 2 -»OO.

By Lemma 2.5.2, R(t,z) is analytic when \t\ < p~l[HR'(z)~\. Now x(t) = R(r,z)b

can be computed in various ways by starting from x' = R\z)b.

x(t) = ttkyk

o

converges, where y0 = x' and

y^lR'Wfx^R'Wy^, (fc^l).

PROOF The assertion follows immediately from Lemma 2.5.2.

Ifp(HK'(z)]<l,then

k

x

k= Σyi-* x > as

^-*°°·

/= 0

Also

*o = Jo = *'.

xk = x k _! + R'(z)V> -(A- z/)xfc_ J (k > 1) (2.6.1)

ANALYTICITY OF THE RESOLVENT 83

and

xk = x' + k\z){A' - A)xk.1 (k ^ 1) (2.6.2)

yi=R'(z)Hy0

yk + 1 = R\z)Hyk = R\z)l(A' - zl) -(A- zlfty,

= yk-R\z){A-zI)yk

= R\z)ib-(A-zl)xk_x -(A-zl)yk-\

by making the inductive hypothesis that identity (2.6.1) holds. Hence

yk+1 = R'(z)lb ~ (A - z / ) x j = xk+! - xk,

which is (2.6.1) for k + 1. Next

yl + - + yk = xk- x' = R'(z)H(x' + ■- + y^J

= ^(z)Hx k _ 1 .

requires only a knowledge of the residual error for xk_ x. It is worth noting that,

since xk — xk_ l -»O as k -» oo, the residual error must be computed with increasing

precision, in order to obtain an effective convergence to x.

mate solution x' that is an exact solution of Α'χ' = b. For example, x' is known

in simple precision as a result of applying the Gauss factorization LU to A. Put

x 0 = x' and Sk — xk — xk-i(k ^ 1). The iterative method of refinement consists in

solving the equation

A'ök = b-Axk_x=rk (fe^l) (2.6.3)

We have x k -»x as /c->oo, if and only if p[(A' — A)A'~ ] = p ( / — AA'~X) =

l

p(I — A'~ 1A) < 1. This is true in precision arithmetic. In floating-point arithmetic

we have convergence towards an approximate solution with the precision utilized

in calculating rk. We pose the question: 'When calculating the residue in double

precision can we obtain the solution x in double precision whilst solving the

equation (2.6.3) in simple precision?' The answer is 'yes' if we proceed as follows:

is extended by zeros to obtain numbers in double precision. Thus the vectors

by Ax' and rl = b — Ax' are calculated in double precision.

(b) k = l.rl is truncated to simple precision. We solve Α'δί = rt and v/e calculate

xl = x 0 -f <$! in double precision.

ELEMENTS OF SPECTRAL THEORY

We repeat the procedure until the residue is zero in double precision (see

Exercise 2.6.1).

According to Proposition 2.2.7 the function z\-*p[HR\zy\ is upper semi-continuous

in res {A'). Hence it attains its maximum when z traverses the compact set Γ.

R(t,z)dz

« · - ( = ) ;

is analytic in the disk

ζεΓ

anc

PROOF The matrix P(t) is defined for each t in δ'Γ Let t0e&r * consider its

neighbourhood

fc = 0

that P(t) is analytic near t0.

Lemma 2.7.2 If the projection P(t) depends continuously on t, when t varies over

a connected region o/C, then the dimension of Im P(t) remains constant.

function. Since its values lie in IN, the function is constant.

Therefore the dimension of the invariant subspace M(t) = Im P(t) is constant

and equal to m (say) when t varies in δ'Γ: the total algebraic multiplicity of the

eigenvalues of A(t) inside Γ is preserved. Let B(t) be an m by m matrix representing

A(t)^my Put l(t) = (\/m)trB(t); then 1(f) represents the arithmetic mean of the

eigenvalues of A(t) inside Γ.

Proposition 2.7.3 The functions l(t) and B(t) are analytic in δ'Γ.

THE RELLICH-KATO EXPANSIONS 85

P(t0) = X0Y*.

When t -► t0 we have P(t) -► P(t0) and

G(t) = Y*P(t)X0->Y*X0Y*X0 = /.

According to Lemma 1.2.4 the vectors P(t)X0 form a basis whose adjoint basis

in the space generated by Y0 is given by

Z(t)=Y0(G(ty1)*.

The matrix

B(t) = Z*(t)A(t)P(t)X0

represents A(t)^m) in the adjoint bases P{t)X0 and Z(i). We have

B(t) = G(t)-lY*(A'-tH)P(t)X0.

Now G(i)~* is analytic in the neighbourhood of t0 because G(t) -» / when t -+ i 0 .

Hence J5(i) is analytic and so is ml(i) = tr B(t).

Proposition 2.7.3 allows us to conclude that the simple eigenvalues are analytic

functions of t. However, in general, this is not true for multiple eigenvalues.

«-C i)

has the eigenvalues ± J~t and

*»-(: i)

has zero as an eigenvalue of index 2. We remark that the arithmetic mean of the

eigenvalues remains constant at zero. The matrix

2

«-C 0)

has the eigenvalues \{t ± y/t + At) whose arithmetic mean is equal to \i.

We give below the form of the Taylor expansions in t of P(t) and 2(f) for all t in

δ'Γ. These expansions are known as the Rellich*-Kato expansions. Let F represent

the spectral projection of A' associated with the eigenvalue X of index Λ

86 ELEMENTS OF SPECTRAL THEORY

k=2 *

where

ζ-*λ'

PROOF The reader is referred to the proof given in Kato (1976, pp. 74-80).

On account of the complexity of their coefficients these expansions are chiefly

of theoretical interest. In what follows we shall meet the Rayleigh-Schrödinger*

expansions which possess the double advantage of having coefficients that can

be calculated recursively and of covering the case in which Γ encloses several

eigenvalues {μ$\9 of A'.

Let {μ\}\ be the block of r eigenvalues, each counted with its multiplicity, which

is isolated from the rest of the spectrum and which we wish to treat simultaneously.

Let Γ be a Jordan curves lying in res (Α') which isolates this block from the rest

of sp(y4'). Let X' be a basis for the associated invariant subspace M' and let Y

be an adjoint basis in the subspace N [such that ω(Μ\ N) < 1]. Thus Y*X' = /.

Then IT = X T * is the projection on M' along N1 = W.

Let M(i) be the invariant subspace associated with the eigenvalues of A(t)

inside Γ. If ω(Μ(ί), N) < 1, it is possible to choose a basis X(t) in M(i) such that

Y*X(t) = /.

Then

B(t)=Y*A(t)X(t)

is the r by r matrix that represents A(i) tM(I) relative to the base X(t) and Y. UX(t)

and B(t) are analytic in t, there exist expansions of the form

k=0 fc=0

THE RAYLEIGH-SCHRÖDINGER EXPANSIONS 87

B' = Γ Μ ' * ' and & = ΓΜ'Χ'.

By hypothesis σ' = sp(£') forms a block of eigenvalues of Λ'. Furthermore, we

define the partial block inverse

Σ! = Χ'(Ι?,ΒΤ)-ιΥ' (iniV 1 )

relative to the block σ' (see Section 2.4.2).

Proposition 2.9.1 The coefficients Zk and Ck are formally the solutions of the

recurrence relations

C0 = B\ Ck = Y*(A'Zk -HZk-x) (k>\\

equations

(A' - tH)X(t) = X(t)Y*(A' - tH)X(t\

Y*X(t) = /.

When k = 0, we may choose Z 0 = X\ and so

Y*Z0 = I, C0 = B'.

When k = 1, we obtain the system

(/ - ITM'Zi - ZXB' = (/ - Π')#Ζ 0

y*z1=o,

which has the unique solution Zx = ΣΉΖ0. Hence Cx = Y*(A'Zl - HZ 0 ).

The remainder of the proof is left to the reader.

Remark If we choose Y=X'+, then ΓΓ and Σ' become ?' and S' respectively and

x'?A'Zk = B'X'fZk = 0,

whence

Lemma 2.9.2 If p\_(P(t) - Ρ')ΓΓ] < 1 w/ien ί //es in <5r, ί/ι<?η ίΛέ? mairix

S(t) = lY*P(t)XT1

is analytic in δ'Γ and Π(ί) = P(t)X'S(t) Y* defines the projection on M(t) along N1.

88 ELEMENTS OF SPECTRAL THEORY

PlY*iP(t)- Ρ1ΧΊ = p{LP(t)-ΡΊΧΎ*} < i.

Hence / + Y*[P(t) - P']X' = Y*P(t)X\ exists and is invertible for all t in δ'Γ.

Its inverse S(t) = lY*P(t)X'] represents Yl'P(t)lM,. The columns of X'S(t) form a

basis of M' and P(t)X'S(t) is a set of r vectors of M(t) such that Κ*Ρ(ί)Χ'5(ί) = /.

This is the basis X(i) that is adjoint to Y. Incidentally, we have shown that, under

the hypothesis of the lemma, we have ω(Μ(ί), N) < 1.

and

r'r = max||R'(z)|| 2 .

ζεΓ

77ien r/ie disk {t; \t\ < \/p} belongs to the domain of analyticity ofX(t) and B(t).

Hence if |i| < 1/p, we have

U Ml J^ II 2 ^ < - -- -—*— ^ -1 -

r

2|Γ|/·ί-||Π|| 2 2

We deduce that

p{LP(t)-PW'}^\\iP(t)-PW'\\2

and

2ιπ fc = 0

Thus

On the other hand, since |r| \\H\\2r'r < | , it is clear that teS'r. Hence, by Lemma

2.9.2, P{t) is analytic and so also are X(t) and B(t) = Y*(A' - tH)X(t).

--)||ΠΊΙ2Γ'Γ2||Η||2<1, (2.9.1)

nJ

NON-LINEAR EQUATION AND NEWTON'S METHOD 89

X = Y£=0Zk is the basis of the subspace M which is invariant under A; it is

associated with the eigenvalues ofB = Y*AX, which represents A fM, where Y*X = /.

PROOF The condition 1/s > 1 is satisfied under the hypothesis (2.9.1), which can

also be written as

m/iu<-—--U-

|Γ|||Π'||2Γ'Γ2 2r'r

by virtue of the relations mentioned at the beginning of the proof of Theorem

2.9.3.

Let q be such that s < q < 1, so that

q s

Now X(i) and B(t) are analytic on the circle {i; |i| = \/q}. Let

0L = max(\\X(t)-X'\\2;\t\=\/q\

ß = max(\\B(t)-B'\\2;\t\ = \/q).

By Cauchy's inequalities (page 62) we have

IIZJ2W, \\Ck\\2^ßqk.

This demonstrates that the convergence is at least as fast as that of a geometric

progression of common ratio q, where q is arbitrarily close to s provided that

|| H || 2 satisfies condition (2.9.1).

If we choose Y = X' = Q', where Q' is an orthonormal basis of the invariant

subspace M', then ΓΓ is an orthogonal projection and ||Π'|| 2 = 1 on condition

(2.9.1). This condition is satisfied when \\H\\2 is sufficiently small. This furnishes

a theoretical framework which suffices for the error analysis that we shall under

take in Chapter 4. Nevertheless, it is interesting to remark that the sufficient

condition for analyticity given in Lemma 2.9.2 may be satisfied even when ||H|| 2

is not 'small' (see Exercise 2.9.1).

As we have seen, the perturbation theory which we have expounded in Sections

2.5 to 2.9 enables us to calculate iteratively the eigenelements of A by starting

with those of a neighbouring matrix A' provided that t = 1 belongs to the domain

of analyticity. In what follows we shall present a different class of iterative

calculations. They arise from the formulation of the eigenvalue problem as a

non-linear equation in the basis of the invariant subspace, normalized with the

help of linear forms.

90 ELEMENTS OF SPECTRAL THEORY

algebraic multiplicity and separated from the remainder of the spectrum. Let M

be the associated invariant subspace and let Y be a basis of a subspace N such

that ω(Μ, N) < 1. Next, let X be the basis of M satisfying Y*X = /; then Π = X Y*

is the projection on M along N1 — W.

Theorem 2.10.1 Ι/Οφσ, the basis X which satisfies AX — XB and which is norma

lized by Y*X = I is a solution of

F(X) = AX- X{Y*AX) = 0. (2.10.1)

PROOF Equation (2.10.1) expresses the fact that there exists a matrix B such that

AX = XB. Hence X is invariant under A. On multiplying on the left by Y* we

deduce that

(/ - Y*X) = 0,

which implies that Y*X = / since B is regular by virtue of the assumption

0£σ = sp(ß).

The Frochet* differential of F, which is a quadratic function of X, is easily

calculated, namely

Zh+(DXF)Z = J(X)Z = (/ -XY*)AZ - Z(Y*AX)

= (/-ΠμΖ-Ζ£, (2.10.2)

xr

where Z ranges over C .

(a) J(X) is invertible.

(b) J satisfies a Lipschitz1' condition over (P x r.

PROOF

(a) Let τ = sp(>l) — σ. Now

sp((J-IIM) = a-{0}.

Therefore

ansp((/-n).4) = 0 ,

because Ο^σ.

xr

(b) Let Xx and X2 be elements of C . Then

[ J ^ ) - J(X 2 )]Z = (X2 - XX)Y*AZ + ZY*A(X2 -X,)

+

Rudolf Otto Sigmund Lipschitz, 1832-1903, born at Bonkeim, died in Bonn.

NON-LINEAR EQUATION AND NEWTON'S METHOD 91

and so

|| J&J - J ( * 2 ) K 2|| Y*A|| \\X2 - Xt ||.

We define

HT = [Z; 7*Z = 0];

xr

this is a subspace of C .

Lemma 2.10.3 Let V= [vx,..., i?r] be a matrix satisfying Y*V= I. Then F(V)eW.

IfJ(V) and Y*AV are invertible, the space iV is invariant under J _ 1 ( F ) .

PROOF We have

y*F(F)= y*(/- VY*)AV=O,

which proves the first assertion.

Next consider the equation

( / - VY*)AZ-Z(Y*AV) = C9 (2.10.3)

where C is such that Y*C = 0. Since we suppose that J(K) is invertible, we have

sp ((/ - V Y*)A) n sp (7*A V) = 0.

The solution Z of equation (2.10.3) exists and is such that

Y*Z(Y*AV)= y*c = o.

This implies that Y*Z = 0 because Y*A V is invertible. Thus we have shown that

Ce1T implies that J'l(V)CeiT.

ximate solution U such that Y*U = I we define a sequence (Xk) such that

Y*Xk = / (k > l) by formally applying Newton's iteration:

X° = U, Xk +1

=Xk-J- ^JFCY*) (k > 0). (2.10.4)

Theorem 2.10.4 On the assumption that 0$G, there exists p>0 such that, for

every U satisfying \\X — U\\ < p, the sequence defined in (2.10.4) is meaningful and

converges quadratically to X as k tends to infinity.

PROOF Using Lemma 2.10.3 we show recursively that Y*Xk — I, that is Xk+1 — Xk

lies in iV if we suppose that, at each step, J(Xk) and Y*AXk are invertible. Since

we assume that Ο^σ, the matrix B = Y*AX is invertible. Now the function

Bv-►det B is continuous: hence for some a > 0 there exists ργ such that, for all V

satisfying \\X - V\\ < pu we have that det(Y*AV) ^ a > 0.

92 ELEMENTS OF SPECTRAL THEORY

On the other hand, since 3(X) is invertible and J satisfies a Lipschitz condition,

there exists p2 ^ px such that, for every U such that \\X - l / | | < Ρι·>the sequence

(2.10.4) converges quadratically to X (see Exercises 2.10.1 and (2.10.2).

Finally, we remark that when Xk -* X, then F(Xk) tends to zero and, in practice,

must therefore be calculated with increasing precision (see Sections 2.6 and 2.12).

2.11 MODIFIED M E T H O D S

Iterations of the type (2.10.4) are costly because they require the solution of a

different Sylvester equation at each step. We can modify this type of iteration by

keeping the system that has to be solved fixed in the course of the iterations. Thus

we define a family of modified Newton methods (with fixed gradient) whose

convergence is linear. We present these methods in relation to the variable

deviation Vk = Xk — X°\ for the study of convergence is simpler in this version.

In (2.10.4) we fix J(Xk) to be equal to J 0 = J(U). Let Vk = Xk - U. Then

^0 = 0,

K1== - J " 1 / * , (2.11.1)

Vk+x = Vi+^liVkY*AVk-] (k2*l),

where

J 0 = J(l/), R = AU-UB, B=Y'AU = @(U,Y)

Here R is the matrix residual in U; S = Y*A — BY* is the left matrix residual in

K Put

y = II Jo Ml, P = 11*11, s=||S||

and

l-yr^i (2.11.2)

g(t) =

it

«Ut^i).

We have 1 < g(t) ^ 2 and ^(0) = 1 (see Figure 2.11.1).

0 1/4 t

Figure 2.11.1

MODIFIED METHODS 93

\Wk\\^9(e)yp (k>l).

iorY*AVk = SVk.

Define a sequence (xfc) by putting x 0 = 0 and

% = π1(1 + xk) (fc^l).

2

Then xk+ x = ε(1 + xk) , where ε = ys\\ Vl ||.

It can be shown inductively that xfc is a monotonic increasing sequence (k ^ 0);

it tends to x where

1 + x = g(s)

(see Figure 2.11.2). Since xk < x, we conclude that

\\Vk\\^Kk^n1(l+x) = g{e)nl.

G\V->Vl+iil(VY+AV)

is a contraction map in the ball

® = {V;\\V\\^g(e)\\Vi\\}.

PROOF We have

G(V)- G(V) = J - ' I V Y * A V - V'Y*A V\

= J ^ ' C i ^ - V')Y*AV+ V'Y*A(V- ν')]=ξ.

By hypothesis, max( || V \\, || V'\\) < «(e)|| Kt || = g(*)*i■ Hence

H\\ <2^(ε)ν 5πι ||Κ- K'H =2εοτ(ε)|| F - HI-

94 ELEMENTS OF SPECTRAL THEORY

Hence

a = 2sg(e) < 4ε < 1

when ε < \.

Consequently, G possesses a unique fixed point V in & which satisfies

A(U + V) = (U + V)Y*A(U + V\

or

AX = X£.

One can calculate V by successive approximations starting from V0 = 0; the

iteration (2.11.1) converges linearly to V provided that ||R|| < \y2s:

Since a is the contraction constant of G, we have

\\V~Vk\\^^—\\Vi\\ (*>1),

1 —a

where

α = 2ε0(ε), e = ys||K1||.

B - Bk = Y*A(X - Xk) = Y*A(V- Vk\

and

\\B-Bk\\ = \\Y*A(V-Vk)\\^^s\\Vxl

1 —a

Corollary 2.11.3 The upper bounds \\V - X\\ ^g(z)\\3-lR\\ and \\B-B\\^

g(e)s\\Jo XR\\ are valid when ε<\.

and the limit basis satisfies U*X = /. Then \\X — U\\p = ||tan Θ|| ρ , where p = 2 or

F, and Θ is the diagonal matrix consisting of the canonical angles between the

exact and the approximate invariant subspaces.

(a) Let A'' = A + H and let X' be a basis of the subspace Λί' which is invariant

under A, Suppose that 7*X' = /. The Frechet differential which was defined

in equation (2.10.2) can be approximated by the map

Zi-*J'Z = (/ - X'Y*)A'Z - ZY*A'X'.

THE LOCAL APPROXIMATE INVERSE 95

Vl=J"lHX\ (2.11.3)

Vk+l = V, + 3'-l[VkY*AVk + HVk- VkY*HX'l

In Chapter 4, this iteration will be used to obtain bounds for the error linking

the eigenelements of A with those of A'.

(b) We can modify J 0 in the following way:

Z -> J Z = (/ - U Y*)AZ - ZB,

where B is defined as follows: let

T = Q*BQ

be the Schur form of

B= Y*AU.

Put

i=T+diag(C-Q,

where the d are the r eigenvalues of B and ζ is their arithmetic mean. Then

define

B = QTQ*.

V^-J'R, (2.11.4)

convergence of certain computational methods of the inverse iteration type.

We remark that the iteration (2.11.1) rests on the quadratic form VkY*AVk9

which explains why the iteration converges provided that ||£|| is sufficiently

small. By way of contrast, the iterations (2.11.3) or (2.11.4) contain terms that are

linear in Vk; consequently, the convergence can be established only when ||H||,

or ||2? — B || and ||Ä|| respectively, are small enough.

T H E M E T H O D O F RESIDUAL CORRECTION

A very large number of methods for the approximate solution of equations can

be presented in the general context of correcting the residual (Stetter, 1978), for

which we shall give here a formal presentation.

96 ELEMENTS OF SPECTRAL THEORY

consider the equation

F(x) = 0 (2.12.1)

Suppose F is defined on the ball

# = {y,\\y-x\\<p}.

Suppose, further, that we are able to evaluate the map F and that we know an

approximate inverse G defined in the following manner.

Definition We say that G is an approximate local inverse ofF if and only if the

following three conditions are satisfied:

(a) F ( J ) c D o m G .

(b) G(0)e<^.

(c) U = 1 — G°F is a contraction map on $.

The method of residual correction consists in producing the sequence

x (0) = G(0), x(k + υ = G(0) + U(xik)) (k ^ 0). (2.12.2)

k -+ oc and F(x) = 0.

PROOF The existence of lim xik) is assured because the sequence defined in

(2.12.2) is a Cauchy sequence. Let

x = limx (k) as fc-»oo.

We shall show that x is a fixed point of the map

K(y) = G(0)+l/(y).

On letting k tend to oo in (2.12.2) we obtain

x = G(0)+l/(x).

Hence

For each y of 3t we have

II V(y)-x|| = || V(y)- V(x)|| ^ φ ) \ \ y - x \ \ < p ,

where t(u) < 1 is the contraction constant of U,

Thus V(x)e& and V(&) <= ^ . It is easy to see that F is injective on ^ . Hence

x is the unique zero of F in ^ .

The iteration (2.12.2) can also be interpreted as follows: x(0) = G(0) is an

approximation of x with residual F(x(0)) and error x — G(0) = U(x) = e. In its turn

the error e can be approximated by <?(1) = x (1) - G(0) = C/(x(0)) = x (0) - GF(x(0)).

THE LOCAL APPROXIMATE INVERSE 97

iterate xik). The method is the more effective the smaller the contraction constant

oft/.

A particular case of importance is that of the linear equation

Ax = b,

for which F is the affine map

F(y) = Ay-b.

Let B be an approximate inverse of A: thus Ü = / — BA has the property that

p(U) < 1. For a given u in & define

G(y) = By + u.

Then G is an approximate inverse of F.

The ensuing method of residual correction is as follows:

x<°> = u, x(k+1) = x(*> - B(Ax(k) -b), k^ 0.

Ax = b.

We assume that there exists a matrix K such that

x = Kx + Hb,

where

H= (I-K)~lA.

For a given vector w, define G by

G(y) = Ku + Hb + Hy.

Then G is an approximate inverse if and only if

p(K)<\.

We have the decomposition

A = D(L+1 + U\

where D is the diagonal of A and L and U are the lower and upper triangular

parts of D~ lA. The iterative methods of Jacobi,* Gauss-Seide^ and the accelera

ted iteration correspond respectively to

(a) Kj= -(L+U\Hb = D-lb\

(b) K G S =-(/ + L)- u,fffc = (/ + Lr 1 0- 1 fc;

1

f

Philipp Ludwig von Seidel, 1821-1896, born at Zweibrücken, died in Munich.

98 ELEMENTS OF SPECTRAL THEORY

We leave it to the reader to verify that the iteration of the residual correction

x (0) = Ku + Hb, xik+l) = K{k) + Hb

reverts to the three well-known iterative methods.

and G(y) = A'~ly + A"2b, we obtain the iteration

x<o> = X'f x<*+ i) = X' + A'-\A' - A)x(k\

which should be compared with the identity (2.6.2).

Other important examples of the use of the notion of an approximate inverse

in linear algebra are the techniques of preconditioning (Exercise 2.12.2) and the

multigrid methods (Exercise 2.12.3). It should be borne in mind that the precision

obtained for xik + υ is the same as that for the residual F(x{k)).

The spectral theory of closed operators in a Banach space (for the calculation of

isolated eigenvalues of finite multiplicity) is found in the expositions of Chatelin

(1983) and Kato (1976). The use of Cauchy's integral formula (2.2.6) for a matrix

goes back to Poincare (1935). The theory of analytic perturbations for an operator

is treated in Chatelin (1983), Kato (1976), Baumgärtel (1985) and Wasow(1978).

The reduced resolvent of Kato is known in other contexts as the Drazin inverse

and, in the case of a semi-simple eigenvalue, as the group inverse (see the book by

Campbell-Meyer, 1991). The notion of block-reduced resolvent is new (Chatelin,

1984), as are the Rayleigh-Schrödinger expansions in Section 2.9. The use of a

quadratic formulation for the computation of an eigenvector associated with a

simple eigenvalue is in Anselone and Rail (1968). The technique of proving

Lemma 2.11.1 and Theorem 2.11.2 is borrowed from Stewart's article (1973a).

The special case y = en in (2.11.1) is treated in Dongarra, Moler and Wilkinson

(1983).

Stetter's paper (1978) is fundamental for the methods of residual correction.

The methods for iterative refinement are the subject of the book by Kulisch and

Miranker(1981).

EXERCISES

Section 2.1 Revision of Some Properties of Functions of a Complex Variable

2.1.1 [B:41] Let Ω be an open set in C Suppose the maps /;Ω->(Ε, p:R 2 -+R,

<?:R2 ->R and #:R 2 ->C are such that

f(z) = p(x, y) 4- iq(x, y) = g(x9 y)

for all z = x 4- iy in Ω.

EXERCISES 99

satisfies the Cauchy-Riemann conditions

dp dq dp dq

—= — andΛ —=

dx dy dy dx

in Ω. Deduce that p and q satisfy the equation

δχ2 + dy2""

ίηΩ.

2.1.2 [D] Let Ω be an open set in C Prove that

zeüv-*A(z) = [α0·(ζ)]€€"xm

is analytic if and only if the mn functions ζθΩι->α0·(ζ) are analytic.

2.2.1 [B:35] Prove that the resolvent R(z) satisfies the equation

! * ( * ) = _[K(z)] a f

dz

and, more generally,

^-kR(z) = kl(-\)klR(z)f +1

(/c=l,2,...).

dz*

2.2.3 [B:63] Show directly that the function zi-»p(R(z)) is upper semi-conti

nuous on the resolvent set.

2.2.3 [D] Let As<£n xn and let M be the invariant subspace associated with an

eigenvalue λ of A. Let X be a basis of M and let X+ be_a basis of the orthogonal

complement of the complementary invariant subspace M{M+ = ML,M®M = C),

such that XIX = /. Prove that

XX* =

* — L f; R(z)dz,

2niJr

where Γ is a Jordan curve lying in res(,4) and isolating λ.

2.2.4 [B:35] Let P be the spectral projection associated with an eigenvalue λ.

Prove that

limK(z)(/-P) = S,

z-*k

100 ELEMENTS OF SPECTRAL THEORY

define f(A) by Cauchy's formula (page 61). Show that if V is an invertible matrix,

then

f(V-lAV)=V~lf{A)V.

2.2.6 [B:10] Investigate sufficient conditions for matrices A and B to satisfy the

equation

2.2.7 [D] Let A be a square matrix. Let Xesp(A) and let P be the associated

spectral projection. Prove that if 0£sp(/4), then A~ί exists, λ~* is an eigenvalue

οϊ A'1 and P is the spectral projection of A~l associated with λ~ι.

AX + XB = C. (1)

(a) Prove that if the eigenvalues of A and of B have negative real parts, then the

unique solution of equation (1) is given by

ΛΓ= - \ QAtCeBtat.

Jo

(b) Let a be a non-zero number in

res (A) n res (B).

Define

/ ( ζ ) = (ζ + α ) ( ζ - α Γ \

U = f(A),

D=-±-(U-I)C(V-I).

2a

Prove that X is a solution of equation (1) if and only if X is a solution of

X-UXV=D. (2)

(c) Prove that if the eigenvalues of A and of B have negative real parts, then

p(U)<\ and p(V)<l·

(d) Prove that if p(U)p(V) < 1, then the solution of equation (2) can be expanded

in the series

00

X= £ Un~lDVn~l.

n= 0

EXERCISES 101

2.3.1 [A] Adapt the factorization of Gauss or of Schmidt in order to solve the

equation

of the complement of the invariant subspace associated with A.

2.3.2 [A] Let B and B be the matrices introduced in Sections 2.3.1 and 2.3.2

(pages 73 and 74). Prove that they have the same Jordan form and are therefore

similar.

2.3.3 [A] Prove Lemma 2.3.5 (page 76).

2.3.4 [D] Let A be an invertible matrix. Let Xesp(A) and let P be the associa

ted spectral projection and S(A, A) the associated reduced resolvent. Calculate

S(A ~ \ A~*) as a function of S(A, A) in two different ways, as follows:

(a) By the formula

S(A-\X~l)= lim R(A~\z)(I-P).

ι

2mJr ζ—λ

where Γ is a closed Jordan curve which isolates A -1 from the rest of the

spectrum of A"1.

2.3.5 [C] Let

22

A-( V—21 Λ

53/

(a) Verify that A = 25 is an eigenvalue of A.

(b) Calculate the reduced resolvent S associated with A.

(c) Calculate the partial inverse Σ 1 associated with A and the orthogonal projec

tion upon the invariant subspace.

(d) Compare ||5|| 2 with ||Σ Χ || 2 .

2.3.6 [A] Let Aesp(,4). Let M be the invariant subspace associated with A, Π

any projection on M, Π 1 the orthogonal projection on Μ,Σ(Π) the partial

inverse associated with A and Π. Prove that

imn 1 )«j = min ||Σ(Π)||,. (j = 2 or F).

102 ELEMENTS OF SPECTRAL THEORY

2.3.7 [B:9] Identify S with the Drazin inverse (A - λΙ)Ό and, when λ is semi-

simple, with the group inverse (A - λΐψ. If A is normal, prove that

(A - λϊγ = (Α- λΙ)Ό = (A- Xlf = 5

where f denote the Moore-Penrose inverse, or pseudo-inverse (Exercise 1.6.8).

2.4.1 [B:24] Solve the equation

(/ - P)AZ -ZB =R

with the help of the Schur form of B and the Hessenberg form of (/ — P)A.

2.4.2 [D] Solve the equation

(l-P)AZ-ZB =R

by Gauss's method, reducing it to r systems of n + r equations with r unknowns

and of rank n, with the matrices

where P is the spectral projection associated with the bloc of eigenvalues {μ,: ι =

l,...,r}.

2.4.3 [A] Examine the difference between the block-reduced resolvent S and

the reduced resolvent S associated with the same eigenvalue that is double and

defective.

2.5.1 [D] Examine the analyticity of the solution of the equation

AX - XB(t) = C

in the neighbourhood of t = 0 when

β(ί) = £ + ίΗ, ||Η||2=1, ieC.

2.6.1 [A] Examine the convergence of the iteration (2.6.3) defined in Example

2.6.1 (page 83).

2.6.2 [D] Adapt the algorithm (2.6.3) (page 83) to the case of an almost triangular

matrix and to the case of an almost diagonal matrix.

EXERCISES 103

2.6.3 [D] Suggest an algorithm based on Exercise 2.5.1 for solving the equation

AX - XB = C

when B is almost triangular.

2.6.4 [C] Solve the system

(-; TO-C)

by successive iterations, starting from ( I, which is the solution of

("I ?XK>

Section 2.7 Analyticity of the Spectral Projection

2.7.1 [B:35] Let P(t) be the spectral projection of the perturbed matrix A(t) =

A — tH. Give an example which shows that Kato's proof for expansion of P{t)

cannot be extended to the case in which the Jordan curve Γ defining P(t) encloses

several distinct eigenvalues of the matrix A(0) = A.

2.8.1 [D] Write down the Rellich-Kato expansions for a simple eigenvalue and

for a semi-simple eigenvalue.

2.9.1 [B:ll] Consider two vectors x and y in Cw such that

y*x = x*x = 1.

Put Q = xy*. Suppose that ξ is a simple eigenvalue of the matrix

Α = ξ<2 + (1-0)Α(1-<2).

(a) Show that

B:i(I - Q)(A - { / ) ] W i : { j i } ^ {y}1 is invertible.

Define

Σ = (/-β)2Τι(/-6).

/ 04 ELEMENTS OF SPECTRAL THEORY

ίχ =

g{r)

l - yX-r ^ 4 r ,

2r

u = Ax — ξχ,

v = A*y = ξγ.

Suppose there exists a( ^ σ) and ε such that

(*) |ι>*Σ*ιι|^<ι*||β||6 (k>l).

Define

? = a2\\Q\\e.

(b) Prove that if f < {, then there exists a simple eigenvalue λ of A such that

\λ-ξ\*ί9(η\Ό*Ση\9

which is the only eigenvalue of A in the disk

\ζ-ξ\^$α.

Let P be the spectral projection of A associated with λ and suppose that

y*Px Φ 0. Prove that there exists an eigenvector φ of A associated with λ and

normalized by γ*φ = 1 such that

||φ-*Κ0(?)||Σιι||.

2.9.2 [D] Verify that in the case of a simple eigenvalue the expansions of

Rellich-Kato and Rayleigh-Schrödinger can be made to coincide, and that this

is also true in the case of a semi-simple eigenvalue.

2.9.3 [D] In the proof of Proposition 2.9.1 (page 87) we calculated first Zk and

then Ck. Suggest a way of first calculating Ck and then Zk.

2.9.4 [D] Verify the identity

n(i) = P(i)Ar,S(i)y*

by using the Rellich-Kato series for P(t) and S(t) and the Rayleigh-Schrödinger

series for Tl(t) = X(t)Y*.

2.10.1 [A] Let F'{x) be the Frechet derivative at the point x of an operator F

which is Frechet-differentiable in the neighbourhood V of a zero x*. Suppose

that:

(HI) F'(x*) is regular.

(H2) x\-+F'(x) is uniformly continuous on V.

EXERCISES 105

Prove that there exists p > 0 such that, for every x0 satisfying ||x* — x 0 II < P, the

sequence

2.10.2 [A] Using the same notation as in Exercise 2.10.1 we now suppose that:

(HI) F(x*) is regular.

(H2) There exist numbers of p and ί such that in a neighbourhood V of x*,

the map xi-*F(x) satisfies the inequality

\\F(x)-F(y)\\*iS\\x-yV.

(a) Prove that there exists p > 0 such that, for every x 0 satisfying ||x* — x0 II < P,

the sequence

χΛ+1=χ*-Γ(χΛ)-^(χ,) (fc^O)

satisfies

||x f c -x*||<p

and

ΙΙ**+ι-*ΊΙ

supJUili ^L = c < __| oo.

*>ο||χ,-χ*||1+*

(b) Deduce the quadratic convergence of Newton's method when the derivative

satisfies a Lipschitz condition in the neighbourhood of the required root.

2.10.3 [D] Write down the method (2.10.4) (p. 91) by taking as the variable the

deviation

vk = xk - x°.

2.10.4 [B:48] Consider the equation

F(x) = 0

in a finite-dimensional space (£, ||·||) on the following assumptions: there exist

x0eB, / > 0 , r > 0 , m > 0 and c> 0 such that:

(HI) / is a Lipschitz constant of the operator

Γ:ΩΓ(χ0)^(β),

where

nr(x0) = {xeB:||x-x 0 ||<r}

and if (B) is the space of linear operators of B into itself.

(H2) F(x 0 ) is regular.

(H3) ΙΙΠχοΓΜΐ ^mand | | Κ ( χ 0 Γ ^ ( χ 0 ) Κ α

106 ELEMENTS OF SPECTRAL THEORY

(H5) mj and c satisfy the inequality

r^X-{\-JV-2mfc).

ΥΥϊί

1 - J\ - Imtc

p= ¥■ ,

mi

1 — m{ c — x / l — 2m/c

2^/1-2m7c

v= — .

mt

Prove that:

(a) 3x*e{xeB: \\ x - x0 || ^ p} such that F(x*) = 0.

(b) 0 < y < l .

(c) The Newton sequence {xk} satisfies

ll**-*oll <P

and

||xk-x*Kvy2k/(l-72k).

2.10.5 [B:19] Consider the equation (2.10.1) (page 90)

AX-X(Y*AX) = 0,

where Y is of full rank m. Change the basis in such a way that the matrix is

replaced by Γ = [/m 0] T .

(a) Prove that relative to this basis the unknown X is replaced by

Α' A'

A' =

^ 2 1 ^22>

(b) Show that R satisfies the equation

A'22R-RA'n -RA\2R = - A'2l,

known as Riccatis equation.

EXERCISES 107

AR-RB = RCR-D

(Exercise 2.10.5). Define the iteration

Let σ be the least singular value of the operator Rt-*AR — RB and let

_]1C]|F||Z>||F

K—

(a) Show that if κ < £, then the proposed method of iteration converges linearly

towards a solution R which is the only solution in the closed ball with centre

0 and radius

1-^/1 - 4 K

(b) Show that if κ < 1/12, then Newton's method, applied to Riccati's equation,

converges in a quadratic manner towards a unique solution in the closed ball

defined above.

2.11.1 [D] Let (x, A)e(C x C be the unique solution of

™-C£T)-0

where λ is a simple eigenvalue of A.

(a) Write out Newton's method, applied to this problem.

(b) Propose a simplified method with fixed gradient.

(c) Examine the convergence of these methods.

2.11.2 [A] Give sufficient conditions for the convergence of the method (2.11.3)

(page 95).

2.11.3 [A] Give sufficient conditions for the convergence of the method (2.11.4)

(page 95). Establish the contraction constant of the map

G:F^K1-hJ-1[Fy*/4F+F(5-i)],

in a manner similar to Exercise 2.11.2, provided that \\B — B\\ and ||K|| are

sufficiently small.

108 ELEMENTS OF SPECTRAL THEORY

2.11.4 [A] Let F be an operator and let x* be a point in its domain such that

F(x*) = 0.

Suppose that F is differentiable in a neighbourhood of x* and that T is a regular

linear operator such that

y= sup IIT-FMII^ir-Mr1,

ll*-**l!<P

where || · || is a vector norm and also the corresponding induced norm for linear

operators, and where p is a given positive real number. Prove that if ||x0 — x* || < p

then the sequence

2.11.5 [D] Let A be a simple eigenvalue of a matrix A and let u be an eigenvector

of A associated with λ and normalized by e*u = 1. Consider the problem

can be written as

a

/ \n \

, _ \B(X,u) :

{ 0 1 j

where Β(λ, ύ) denotes the matrix obtained by replacing the last column of

A -λΐ by - i i .

(b) Show that Β(λ, u) is singular if the eigenvalue λ is of multiplicity ^ 2.

(c) Apply Newton's method with fixed slope to

(d) Extend the preceding results to the case of a double eigenvalue by taking the

operator

(AV -UB\

where

a b

E = (e„^iye„) and B=

c d

EXERCISES 109

F(x) = 0.

as in Exercise 2.10.4. Retain the hypotheses (HI), (H2) and (H3) of that exercise

and add:

(H4 m£c< 0.25.

(H5) The constants mj and c satisfy the condition

1 - J\ - 4m/c

r> y- .

2m/c

Define the numbers

_ 1 — χ/1 — 4m^fc

P

~ Tmfc '

l-J\-4mifc

y=

2 ·

Prove that:

(a) 3x*e{xeB: ||x - x 0 II < P} s u c h t h a t F(**) = °-

(b) 0<y<0.5.

(c) The Newton sequence with fixed slope defined by

Χ*+ι=**-^'(*οΓ 1 ^(**)

satisfies

||χΛ-χ0|| <ρ

and

yk

1-7

2.11.7[D] Compare the Rayleigh-Schrödinger iterates for i = 1 with those

defined in (2.11.1) (page 92) with initial bases U. Comment on these results.

of Residual Correction

2.12.1 [D] Let λ be a simple eigenvalue of the matrix A and let y be a vector

that is not orthogonal to the eigenspace associated with λ. Propose a method of

residual correction for the solution of the equation

Ax — xy*Ax = 0,

when A is almost diagonal.

110 ELEMENTS OF SPECTRAL THEORY

matrix such that

cond2 (BA)« cond2 (A).

Instead of the original system consider the equivalent preconditioned system

BAx = Bb.

Interpret this preconditioning as an approximate inverse.

2.12.3 [A] Study the multigrid method [B, 28]. Interpret this method in relation

to the notion of an approximate inverse.

CHAPTER 3

...Analytical mechanics is much more than an efficient tool for the solution of dynamical

problems that we encounter in physics and engineering There is hardly any other branch

of the mathematical sciences in which abstract mathematical speculation and concrete

physical evidence go so beautifully together and complement each other so perfectly.

... There is a tremendous treasure of philosophical meaning behind the great theories

ofEuler* and Lagrange? and of Hamilton* and Jacobi ...a source of the greatest intellectual

enjoyment to every mathematically-minded person.

(Toronto University Press, 1962)

The eigenvalues of matrices or linear operators play a part in a very large number

of applications, both theoretical and practical. We shall try to convey an idea of

the extent of applications by citing examples deliberately chosen from very diverse

disciplines: they range from mathematics to chemistry and to the dynamics of

structures; they touch on economics.

While the theoretical applications are fundamental, the industrial applications

are no less important. We mention only the accident of the suspension bridge at

Tacome, in the state of Washington on the West Coast of the United States. This

bridge, of a span of 700 m, collapsed in 1940 under the effect of aeroelastic

vibrations, only four months after it was brought into service. At the moment of

collapse it showed a torsion of 45° against the horizontal in both directions under

the effect of a 70 km per hour wind.

DIFFERENCE EQUATIONS

We propose to study the stability of systems of linear differential equations and

of difference equations.

+

Joseph Louis Lagrange, 1736-1813, born at Turin, died in Paris.

♦Sir William Rowan Hamilton, 1805-1865, born in Dublin, died near Dublin.

112 WHY COMPUTE EIGENVALUES?

Consider the system of linear differential equations of the first order:

di

where u is a vector in Rw depending on the time t and where A is a constant n

by n matrix.

Let A = XJX " * be the spectral decomposition of A If we put

O= X~1U9

^ = Jv, (3.1.2)

di

where J is the Jordan form of A: let ί be the size of the greatest block.

The reader will verify (Exercise 3.1.1) that the solution of (3.1.1) for which

u(0) = u0 is given by

u(t) = eAtu0 = XeJtX~lu09 (3.1.3)

where e is an upper-triangular matrix whose elements are of the form tjeXit

Jt

On examining the solution (3.1.3) when t -* oo, we can show that:

(a) The system (3.1.1) is stable and u(t)-^0 if and only if

max Re Xt < 0.

X,esp(A)

(b) The system is unstable and u(t) is unbounded if there exists an eigenvalue λ

such that

ReA>0.

(c) When maxA. Re A, = 0, the solution u{t) is bounded or unbounded according

to whether the eigenvalues for which Re A, = 0 are semi-simple or include at

least one defective eigenvalue (see Exercise 3.1.3).

Remark The equation (3.1.1) models a diffusion problem when the time is regarded

as a continuous variable. If one considers only discrete values of the time, then the

formulation involves a linear recurrence equation.

Consider the linear recurrence system

u09 uk = Auk„l (k>l). (3.1.4)

DIFFERENTIAL EQUATIONS AND DIFFERENCE EQUATIONS 113

whose elements are of the form λ{, Afesp (A), k — ^i+l ^j^k, where i{ is the

index of A,· (see Exercise 3.1.6).

On examining uk when k -> oo we can show that:

(a) The system (3.1.4) is stable and uk->0 if and only if

p(y4) = max|/lf| < 1.

i

(b) The system is unstable and uk is unbounded when there exists an eigenvalue

λ such that

W>i.

(c) When p(A) = 1, the solution uk is bounded or unbounded as k -► oo, according

as to whether the eigenvalues ks for which \Xj\ = 1 are semi-simple or include

at least one defective eigenvalue.

Example 3.1.1 Consider the Fibonacci* sequence 0, 1, 1, 2, 3, 5, 8, 13,...:

/o = 0,

h = 1. (3-1-5)

Put

Wo== M

(o} *vi o) W f c _ 1 (k>V-

The eigenvalues of the matrix are

2 2

When k tends to infinity, fk + Jfk tends to (1 + y/S)/!, the number of the golden

section. The interested reader will find references to the remarkable properties of

the Fibonacci numbers in Strang (1980, pp. 196-8).

Example 3.1.2 Consider the method for calculating yfl, proposed by Theon of

Smyrna (second century B.C.). Starting from (1,1), iterate the transformation

xf->x + 2y,

y\-+x + y.

114 WHY COMPUTE EIGENVALUES?

It will be found that x2/y2 ->2. The procedure can be formalized as follows:

where

The reader should verify that xk/yk -> Jl and that the values of this quotient are

alternately greater and less than yjl.

According to whether a phenomenon is modelled by a system of differential

equations (continuous time) or by a difference system (discrete time), the stability

of the system depends, respectively, on the real parts or on the moduli of the

eigenvalues of the matrix describing the system.

A set of random variables Xn where teT, is called a stochastic process. For

example, Xt might represent the number of persons queuing at a Post Office

counter at the instant t. The theory of stochastic processes is used in the study

of queuing theory (telephone exchange, road traffic, information counters and so

on). A stochastic process without memory is called a Markov* chain:

nxk=j\x, = i,j=o,...,k-\)=nxk=j\xk-i = /*-!),

where P(£) denotes the probability of the event E.

The following terminology is often used in relation to Markov chains. The

system which evolves in time (taking the discrete values k = 0,1,...) is said to be

in the state j at the instant k if Xk =j. The probabilities

pg> = P ( X k = 7 l * * - i = 0

are called transition probabilities. For a homogeneous chain, pf) is independent

of/c.

A homogeneous chain which can take n states is associated with the transition

matrix P of order «, given by

Example 3.2.1 Random walk on a triangular grid of side n + / ( see Figure 3.2.1).

A particle moves at random on the grid by jumping from one point to one of

its (at most four) neighbouring points (N-S-E-W).

MARKOV CHAINS 115

l

i

2 x<6>

1 x(4) x<5)

(1)

0 x x(2) x (3>

/WO 1 2 n= 2

/ - I . /

X /, / + 1

X4-X/, j

up T

1 x -* x /; + 1, /

x down

W-1

Figure 3.2.1

i+j

Pd(iJ) =

In

This probability is doubled when / = 0 o r ; = 0.

(b) The probability of passing from (i, j) to (i + 1, j) or (i,;-f 1) is

This random walk models the Brownian movement in the plane. The reader

can verify that when n = 2 and the points of the grid are numbered in accordance

with Figure 3.2.1, then P is given by

/ Λ

\ 0 \ 0 θ\

0 J 0 i 0

1 0 0 0 0

P =

0 0 0 \ \

\ 0 \ 0 0

Vo 0 0 1 0 0/

(light bulb, electronic component, television set,...), can be associated with a

Markov chain if the life span of the equipment is a random variable that is

independent of the life span of the preceding equipment and has the same

probability distribution. The variables Ak ( = age of the equipment at the instant

k) form a Markov chain.

The evolution in time of a system that is associated with a homogeneous

Markov chain is characterized by the transition matrix P and the initial

116 WHY COMPUTE EIGENVALUES?

probabilities

We put

where

qf* = P(Xk=j) (fc = 0,l,...).

We remark that

£<??'=11^11 ι = ι·

Proposition 3.2.1

qw = q(0)pk (3.2-1)

PROOF We have

i= 1

the stationary distribution (if it exists), say

π = lim qik\

fc-00

π = πΡ, W l ^ 1, π,-^0. (3.2.2)

Proposition 3.2.2 If the matrix P is irreducible, there exists a stationary distribu

tion π satisfying (3.2.2).

the corresponding left eigenvector. When P is irreducible, the Perron-Frobenius

theorem 1.10.1 asserts that unity is a simple eigenvector and that all components

of π are positive.

The knowledge of the stationary distribution π is useful for studying the

queuing problem modelled by the above Markov chain. In computer science

(communication systems) and in the social sciences, the number of states can be

very large (n > 105) and the matrix P is non-symmetric. Moreover, for very large

systems, there is often a large number of eigenvalues close to unity. The method

THEORY OF ECONOMICS 117

the powers represented by (3.2.1) (see Exercises 3.2.2 and 3.2.3).

We present here a formal account (Morishima, 1971) of what is known as the

Marx*-von Neumann* theory in economics. The models of Leontiev (1941) and

von Neumann (1945-6) are studied in Exercises 3.3.5 and 3.3.8.

Consider a productive economy which is divided into n branches in such a

way that each branch produces only one kind of article and, conversely, each

kind of article is produced only by one branch. We denote by au the quantity of

articles required for the unique production of the branch/ The matrix A = (αί7),

where ai} ^ 0, is called the matrix of technical coefficients. For a production

x = (x!,..., xn)

the vector Ax represents the quantities necessary for this production. Hence

y = x — Ax

represents the net production.

We shall briefly present the linear production model due to Leontiev. The

hypothesis of linearity has two consequences; there are no possible substitutions

between the products consumed in the production and there are constant returns

to scale. Exercise 3.3.1 deals with the choice of units in the definition of A.

We shall now take into account work and wages. In order to produce one unit

of the article;, the branchy uses ^ workers. We put

/ τ =(Λ,...,α

We suppose that the salary w is the same in each branch and that it is entirely

devoted to consumption: each worker uses up the quantity dt of the article i. We

put

(F = (dl9...,dJ.

The total consumption of the article i for the production of the article; is therefore

atj + üf,·^·. Hence we have the sociotechnical matrix

B = A + ίΊά.

Additional assumptions of the model are that the consumptions of the workers

are the same at whichever branch they are employed, that there are no luxury

articles, that there is no fixed capital, that all the profit is accumulated and finally

that there exists a price system that renders each branch profitable.

+

John von Neumann, 1903-1957, born in Budapest, died in Washington.

118 WHY COMPUTE EIGENVALUES?

(a) whether there exists a price system p that ensures an equal rate of profit r for

each branch, (3.3.1)

(b) whether there exists a structure of production x that ensures the same rate

of growth τ for each branch (balanced growth). (3.3.2)

τ — r that satisfy the conditions (3.3.1) and (3.3.2) and are such that

1 1

pB = p, Bx = x.

1+r 1+τ

PROOF

(a) Let p = (pl9..., pn) be a price system. The production cost of one unit of; is

n n

C

J = Σ auPj + W^J = Σ (au + *A)Pi

i- 1 i=l

= Σ buPi·

i= 1

It is required that p} = (1 + r)cy Hence p satisfies (3.3.1) if and only if

a simple eigenvalue p > 0 such that

pB = pp, p>0,

p= -—.

1 +r

Moreover, p < 1, because by hypothesis all branches are profitable (Exercise

3.3.2). Hence r = l / p - l > 0 .

(b) Let x = (x!,...,xn)T be the production structure. The total quantity of n

articles required for the production of one unit of i is given by

n n

di = Σ (fly + tjddXj = X buXj.

and p = (1 + τ)" \ whence τ = r.

The rate of growth τ is equal to the rate of profit r. This property is known

as the golden rule of growth. We remark that the quantities /?, x and τ = r are

unique.

The models for economic planning involve the matrix of technical coefficients,

which can be calculated from accounting tables. When we are concerned with

FACTORIAL ANALYSIS OF DATA 119

forecasting models comprising a group of countries or the whole world, then the

matrices involved are structured in blocks, but they are not symmetric and are

of gigantic size. To give an indication, the input-output array for the years 1970

to 1979 supplied by INSEE for France corresponds to a division into about 600

branches of activity. Aggregated versions exist comprising 91,35 and 15 branches

respectively.

We shall describe the analysis into principal components of a cluster of points.

Other methods of factorial analysis are described in Exercises 3.4.5 to 3.4.7.

We wish to analyse a set of data which are presented in the form of n vectors

{Sj} in Rfc carrying masses ax,..., an respectively.

The space is endowed with a norm defined as a positive-definite matrix B of

order k. Put

s = Σ aJsJ and X

J = SJ - £

The matrix

of order k by n represents the data referred to the mass centre. The principal

components analysis of the cluster of points consists in projecting them on to

the plane which minimizes their dispersion in R* in the sense of the norm defined

by B. We use the notation

/t = diag(at).

The method consists in calculating the two greatest eigenvalues of the matrix

U = ΧΑΧΎΒ

together with the corresponding two eigenvectors. In general, the matrix U is

not symmetric.

Lemma 3.4.1 (Barra, 1981) Let X, A and B be three matrices of orders kxn,

n xn and kxk respectively, where A and B are symmetric positive definite. We

suppose that k^n. The matrices U = XAXTB and V= XTBXA of orders k and

n respectively have s( < k) positive eigenvalues in common; they are the non-zero

eigenvalues of the positive semi-definite matrix

W=B1,2XAXTBl/2

of order k.

120 WHY COMPUTE EIGENVALUES?

with A φ 0. Then

Uu = XAXTBu = Aw.

Put

Then

X/li; = Aw#0.

Hence v φ 0 and

The eigenvectors are connected by the relations

T

W=E E, where E= ΑΙΙ2ΧΎΒ1/2.

On the other hand,

Z = EET = A1/2VAl/2.

Hence W and Z are of the same rank, s(^/c); consequently, so are U and V.

The eigenelements of U can be calculated with the help of the eigenelements of

the matrix W of order k(^n). It is shown in Exercises 3.4.4 to 3.4.7 that the

methods of correspondence analysis, canonical correlation and discriminant

analysis revert to the same pattern of calculation with appropriate choices of X,

A and B.

The structural conception of industrial machines makes increasing use of mathe

matical models for the structural behaviour of the different mechanical consti

tuents. The model is then numerically approximated on a computer.

For an analysis of rotating machines we distinguish the rotating parts, the

fixed parts and the connection devices. Each of these types of constituents is

associated with particular physical phenomena. The fixed parts are studied in

accordance with the principles of structure analysis; the presence of rotating parts

makes it necessary to take account of the gyroscopic effect; similarly, the connec

ting pieces may introduce circulatory forces (Geradin and Kill, 1984). Thus we

THE DYNAMICS OF STRUCTURES 121

d2u du

M—

—r + B—

2 + B— + Ku =f, (3.5.1)

dt dt

where M is the mass matrix of the structure, which is symmetric and positive

(semi-) definite, K is the stiffness matrix and B = G + C is the matrix that takes

account of the gyroscopic effect G and the damping C. The matrices K and B

need not be symmetric; / represents an exterior force.

If B is neglected, equation (3.5.1) reduces to

d2u

M — + Ku = 0, (3.5.2)

dr

where the stiffness matrix K is, in general, symmetric and positive (semi-) definite.

If we seek a solution u of the form

u(t) = β ί ω 'χ,

then

Kx = ω2Μχ.

In general we want to find the least eigenvalues, or those that lie in a given

interval, in order to determine whether a known oscillatory perturbation force

can create a resonance.

Most frequently the matrices K and M are positive definite. Nevertheless, it

can happen that M and K are singular. For example, a structure that admits free

rigid movements (an aeroplane or a ship) has a stiffness matrix of rank n — r,

where r is the number of independent rigid movements.

In the general case we seek a solution u of the form u(t) = βμίχ so that

(μ 2 Μ + μΒ + K)x = 0. (3.5.3)

We can reduce this quadratic eigenvalue problem to a classical generalized

eigenvalue problem of order 2«, namely

Pz = XQz9

where we put λ = l/μ and

M 0 λ

={:}· ' - ( : : ) - ·■( 0 -KJ

122 WHY COMPUTE EIGENVALUES?

general, the eigenvalues λ (and hence μ) of the problem are complex numbers.

The size of the problem obtained by finite element approximation of structures

from civil engineering or aeronautics reaches several hundred or thousand degrees

of freedom. Engineers have devised various techniques of reducing the size of the

problem by static condensation; these are described in Exercise 3.5.8.

3.6 CHEMISTRY

In quantum theory the properties of particles, such as electrons, atoms or mole

cules, in the stationary state, are described by the wave function φ\ this is a

solution of the Schrödinger equation

Ηφ = Εψ,

where H is the energy operator and E is the energy of the particle. The operator

H is called the Hamiltonian; for a single particle it can be written as

h2

H=-—A + q,

2m

where h is Planck's constant, m is the mass of the particle and q is the potential

energy (depending on the spatial variables). Starting from a configuration with

basis {Xi}N, we have the representation

00

Φ = Σ ciXi-

£= 1

eigenvalue problem

Hc = ESc, c = (c/)"1, (3.6.1)

where

H = (HXpXi) and S = (XpXi)

(/, j = 1,..., n). This is the method of the interaction configurator which is more

pecise then the approximation of Hartree-Fock (see below), but which leads to

very large problems:

(a) When the chosen Xi are non-orthogonal, the size of the problem is usually

less than 1000; but the matrix S is ill-conditioned (the basis vectors are almost

linearly dependent.)

(b) When the Xi are chosen to be orthogonal, we obtain the classical problem

He = £c, whose dimension can vary from 103 to 106 (or more). The matrix

H is usually sparse and diagonally dominant but with an unstructured distri

bution of the non-zero elements.

CHEMISTRY 123

this differs from Lanczos's method, which we shall present in Chapter 6 (see

Exercise 6.3.17).

In the approximation of Hartree-Fock we are led to the problem Fc = ESc

whose size is in general less than 300. We wish to determine all the eigenvalues;

the problem should be solved iteratively, as the matrices change little from one

iteration to the next. The solution obtained with one iteration allows us to define

an orthonormal basis with respect to which the matrix F is almost diagonal.

A very different approach for molecules of hydrocarbon has been proposed by

Hiickel (1931): he uses the graph associated with a molecule. In this graph the

vertices are the carbon atoms and the edges are the bonds between the σ-electrons

of the two atoms under consideration. We associate with the graph a symmetric

matrix A = (alV), where atj represents the number of edges joining i to;.

According to Hiickel we can approximate H by OLI 4- ßA and 5 by / + σΑ,

where a,/? and σ are supposed to be known constants. The equation (3.6.1)

becomes

Ac = c.

β-Εσ

Thus E and c can be deduced from the eigenelements of the matrix A, which

corresponds to the graph of bonds between the carbon atoms.

The interest of this approach lies in the fact that the size of A is evidently

reduced. However, although the results obtained by this method are qualitatively

good, they are far less precise then those obtained by the method of the preceding

section.

A very active area of research is concerned with the spontaneous emergence of

space-time schemes of organization in chemical and biochemical reactions. The

understanding of these phenomena of self-organization is a fundamental stage

in the study of morphogenesis in open and non-linear biological systems. A

classical example of a chemical reaction that displays this phenomenon of self-

organization is the reaction of Belousov-Zhabotinski (Nicolis and Prigogine,

1977): a homogeneous chemical mixture, when left at rest at constant temperature,

can organize itself spontaneously into spirals.

We present here a model of a trimolecular chemical reaction known as Brusse-

lator; this is one of the simplest models possessing the property of self-organization.

We suppose that reaction takes place in a test tube. Let r be the space variable,

0 ^ r ^ 1. The concentrations x(i, r) and y(t, r) with diffusion coefficients D t and

124 WHY COMPUTE EIGENVALUES?

dx _ d2x

di dr2 + A-(B+l)x + x*y9

(3.6.2)

dy Λ 5 2 j; „ ,

_ , 2, + ß x - x 2 v .

3ί dr

with the initial conditions x(0, r) = x0(r), y(0,r) = ^o(r) when 0 < r < 1, and the

Dirichlet boundary conditions

x(i,0) = x(r, l) = i4,

y(i,0) = y(i,l) = ^.

The system (3.6.2) has the trivial stationary solution x = A, y = β/Λ. The linear

stability of (3.6.2) around the equilibrium solution is studied by putting

dt dt

The stability is therefore related to that of the Jacobian of the right-hand side of

(3.6.2), evaluated at the equilibrium solution.

Let J be the linear operator. There exists a stable periodic solution if the

eigenvalues of J with largest real part are pure imaginary (and semi-simple). The

reader will verify that

l

Dx--2 + B-l A2

J= or1

2

-B D2—-

2 2

dr

The examples that we have given so far involve only differential or partial

differential equations. However, most equations of mathematical physics possess

an integral formulation, for example the problem of the Laplacian+ with the help

of the Green* function (Exercise 3.7.1).

A simple example of an integral operator is the following:

K:x(r)-(Kx)(r) i = fc(r,s)x(s)ds

k(t9s)xi (0 ^ t ^ 1),

Jo

f

After Count Pierre Simon de Laplace, 1749-1827, born at Beaumont-en-Auge, died in Paris.

♦George Green, 1793-1841, born and died at Nottingham.

FREDHOLM'S INTEGRAL EQUATION 125

Jo

Within an appropriate framework of functional analysis it is easy to verify that

if (Λ,,χ) are eigenelements of an integral operator, then (1/Λ.,χ) are eigenelements

of the associated differential problem. Under appropriate conditions the inverse

of an elliptic differential operator is a compact integral operator (see Chatelin,

1983, Ch. 4).

We shall describe a method of approximation by approximate quadratures of

(3.7.1). This is very often used and is known as the method ofNyström (1930). We

consider a formula for approximate quadrature on [0,1], defined at points

0 < sf] ^ 1 with weights νν{.Λ), (i = 1,..., n). For the sake of simplicity the superscript

(n) will henceforth be omitted. Equation (3.7.1) is approximated by

n

Σ wjk(t,Sj)xn(Sj) = Xn(t) (O^i^l) (3.7.2)

The scalar λη and the vector xn = [xn(s7)]" are calculated by discretizing the

variable t at the same points if = st. We obtain

n

X wjk(th Sj)xn(Sj) = λΗχΜ (i = 1,. · ·, n).

Let An be the matrix with elements Wjk(th sj) (i, j? = 1,..., n). Then λη and xn are

solutions of

equation (3.7.2) on condition that λη Φ 0:

point t is known under the name of Nyström's natural interpolation.

Under appropriate assumptions on K and on the formula of approximate

quadratures, it can be shown that (λη,χη) tends to the solution (A,x) of (3.7.1) (see

Chatelin, 1983, for a rigorous treatment).

The reader will notice that, in contrast to what takes place in the discretization

of differential equations, the matrix An obtained here is dense. This computational

difficulty can be surmounted by keeping n of modest size and using the techniques

of iterative refinement (see Chatelin, 1984a).

This survey of different situations involving the calculation of eigenvalues

shows that it is required either to compute them all or else only some of them,

in particular those that lie in a given region of the plane (interval, half-plane of

126 WHY COMPUTE EIGENVALUES?

positive real parts and so on). The methods presented in Chapter 5 (dense

matrices of moderate size) and those of Chapters 6 and 7 (large sparse matrices)

will enable us to carry out these numerical calculations.

In the preceding pages we have presented a selection of practical applications

and theories of eigenvalues. There are many other industrial and scientific appli

cations that require the calculation of eigenvalues. Without claiming to be

exhaustive we mention the following:

(a) Network analysis of the distribution of electric power (Erisman, Neves and

Dwarakanath, 1980).

(b) Plasma physics (Rappaz, 1979).

(c) The physics of nuclear reactors (Wachpress, 1966, and, more recently,

Kavenoky and Lautard, 1983; see also Hageman and Young, 1982).

(d) The control of large space structures (Balas, 1982).

(e) Automatic layout of electronic circuits in technology VLSI (Barnes, 1982).

(f) Oceanography (Platzmann, 1978).

The iterative aggregation/disaggregation method for Markov chains is des

cribed in Chatelin (1984). The presentation of the Marx-von Neumann model

was inspired by a lecture course on balanced growth given by J. Laganier at the

University of Grenoble. The von Neumann model is described in Aubin (1984).

Supplements on the theory and applications of matrices with non-negative

coefficients can be found in Berman and Plemmons (1979).

As regards the dynamics of structures, the reader will find further information

in the books by Meirovitch (1980) and Sanchez-Palencia (1980). A description

of the maritime applications (naval design and drilling platforms in the high sea)

can be found in Aasland and Björstad (1983).

Finally, the reader will find useful developments in the books by Chatelin

(1983) on the numerical approximation of eigenvectors of differential and integral

operators, by Thompson (1982) and by Cvetkovic, Doob and Sacks (1980).

EXERCISES

3.1.1 [A] Consider the system of first-order differential equations

dw

w(0) = u o , — = Au (i>0),

di

where u is a vector of R" which is diiferentiable with respect to t and where A is

EXERCISES 727

a constant matrix of order n. Let J be the Jordan form of A and let V be the

corresponding basis. Show that

M(i)=Ke J i F- 1 u 0 ,

and determine the elements of eJ*. In particular, discuss the case in which A is

diagonalisable.

3.1.2 [C] Compute u(t) when the data in Exercise 3.1.1 are

/0

1 2\

A= 0 0 1 and un = 1

v1/

3.1.3 [D] Show that the system of differential equations proposed in Exercise

3.1.1 can be bounded or unbounded according to whether the eigenvalues of A

having a zero real part are semi-simple or not.

3.1.4 [D] Consider the one-dimensional heat equation

du d2u

—= — (0<χ<1, ί>0

2

dt dx

with the boundary conditions

ii(r,0) = u(i,l) = 0 (i>0)

and the initial condition

M (0,x)=/(x) (O^x^l).

(a) Write down the problem that arises when the second derivative is discretized

by finite differences:

2

_d _u(u

_ _ x)

_ ^ u(t, x-h)-

Ä

2u{U

__ x) + w(i, x + h)

άφ

+ Αφ = 0,

dt

Φ(0) = φ0>

where

φ(ή =

uN(t)

Ui(t) being an approximate value of w(i, ih). Put tj =jAt {j - 0,1,...) and let

u[ be an approximation to uk(jAt). Integrate with respect to the time over

128 WHY COMPUTE EIGENVALUES?

[tj(iJ+1]andput

1 <;♦■

uk(t)dt.

"i =

AtJ-

(c) Rewrite the system. Use the trapezium rule to evaluate approximately the

integrals

1

(\t)dt*$ Wc) + Ml (c<d).

d-c

(d) Show that the system can now be written as

A + — l)uJ+1 = (-A + — I W,

At J V At

where

u\

w

(e) given that the eigenvalues of A are

2

^k "- ^ i ,s2 i- n ^

2

(/c=l,2,...,N),

show that the sequence uj is bounded provided that h and Δί are sufficiently

small.

3.1.5 [B:39] Discretize the equation

Id2u d2u\ „ Λ «0

+ _/ Ω (0 )

"(i? v) °" - ·' ·

where Γ is the boundary of Ω, using finite differences and a step of h = 1/N for

both x and y.

(a) Let utj be approximations oiu(ihJh) and / y = f(ihjh). Show that the resulting

system becomes

«J) =Λ/

uu = 0

when i o r ; are 0 or N.

EXERCISES 129

Ahu = b,

the matrix being block-tridiagonal with invertible blocks.

(c) Show that Jacobi's method, applied to this system, converges if h is sufficiently

small; it may be assumed that the eigenvalues of Ah are given by

4

2 i+cos -

f = —2

cos 2 2

h\

3.1.6 [D] Let J be a Jordan block. Determine the elements of Jk (k = 1,2,...).

3.1.7 [D] Let A be a real positive definite matrix. Consider the system

d2u

= Au (t>0).

at2

Let X = (x1,...,x„) be a basis consisting of eigenvectors of A. Investigate the

existence of a solution of the form

7=1

2H 2 + 0 2 - 2 H 2 0

is decomposed into elementary reactions involving the radicals O, OH and H:

0 + H 2 ->OH + H,

OH + H 2 ->H 2 0 + H,

H + 0 2 ->OH + 0.

At the kth stage let xk, yk and zk represent the number of radicals O, OH and H

respectively, in such a way that

X = Z

fc+1 k->

= X

yk+1 k + Z*>

z

fc + 1 = x

k + yk

and x0 = 1, y0 = z0 = 0. Put

(l\

(Xk)

"fc = yk u0 = 0

\zk) {o)

Determine the matrix A such that

"* + Ι = Λ Μ * .

130 WHY COMPUTE EIGENVALUES?

M^ = lim uk.

k-* oo

3.1.9 [D] Study the discretization through finite differences of the eigenvalue

problem:

d2u(x)

Y = ku(x) (0<x<l),

dx

whenu(0) = u(l) = 0.

Calculate the exact solutions and compare them with the results furnished by

a discritization at five points (associated matrix of order 3).

3.1.10 [B:l 1] Let T be a bounded linear operator in Hubert space (H, <·,·»· Let

V=(vi,...,vn)

be an orthonormal system in H and let

S = lin V

be the subspace generated by V. The orthogonal projection on S is denoted by

π. Consider the eigenvalue problem

Τφ = λφ, ΟΦφβΗ, λβ<£

and the approximation, named after Galerkin, that is associated with the sub-

space S:

π(Τφη- ληφη) = 0, 0 Φ φη6S, Xne<£.

Show that the approximate problem is equivalent to a matrix problem

Au = ωκ, O^iieC", coeC.

Determine the matrix A and the relation between u and φη.

3.2.1 [B:36] Let P be the transition matrix of a homogeneous Markov chain.

(a) Prove that each complex eigenvalue is of modulus less than or equal to unity.

(b) Prove that for each eigenvalue of modulus unity, that is for each eigenvalue

of the form λ = e10, 0eR, there exists on integer q such that eiqe = 1.

(c) Prove that if all the elements of P are positive, then unity is a simple

eigenvalue of P and all the other eigenvalues of P are of modulus less than

unity.

3.2.2 [B:12] Consider a discrete Markov chain with n states. Let

Ρ = (Ρυ)

EXERCISES 131

be the associated transition matrix. Suppose that the chain is irreducible and

non-periodic. KolmogorofFs equations

ι=Σ*ι

have a unique solution π*. Jacobi's iteration can be written as

(a) Show that Jacobi's iteration corresponds to the power method applied to PT

with the normalization condition

7c(fc)e=l,

where

e = (l,l 1)T.

Let {Ω(ι):ι = 1,...,/?} be a partition of the set {1,2,...,«}. With each state

π of the chain such that n{ > 0 (i = 1,..., n) we associate a matrix Ρ3(π) (called

the aggregated matrix) defined by

Σ Σ *'pfk

η^^ιφ^ι— (ΐ</,^ρ).

π* = π*Ρ\

and let

p π*

π= Σ v - J n- GJn>

i=i Σ f

where

Gj= Σ <νϊ·

(c) Show that ne = 1. The new stationary state of the chain is defined by one

Jacobi step

π = πΡ.

(d) Show that

** = Σ ^ τ Σ MV* (K*<n).

^ΩΟ)

132 WHY COMPUTE EIGENVALUES?

3.2.3 [D] We retain the notations of Exercise 3.2.2. Consider a Markov chain

which is almost completely reducible:

P = D + £,

where

D = diag(D 1 ,Z) 2 ,...,D / ,), ||£|| 2 = ε;

£>, is the transition matrix of an irreducible non-periodic chain, whose stationary

state is denoted by π, and satisfies the condition

nte— 1.

Let π be a vector of blocks π,. Consider one step in the aggregation/disaggregation

method based on π:

Ρβ= Σ Σ MV* (ΐ^υ^ρ),

ΛεΩ(ί') / 6 Ω 0 )

π^= 1,

where

π3 = πΛΡ\

Show that

| | π - π * | | 2 = 0(ε).

3.2.4 [C] A message has to go from a point A to a point B by passing through

n intermediary points. Suppose the message can take only two states: either 0 or

1. Each intermediary has the probability p = \ of correctly transmitting the

message received and a probability q = f of transmitting the opposite message.

We say the system is in the state £(0) at the kth stage if the intermediary k

transmits 0 to the next intermediary, and that it is in the state £(1) if the

intermediary transmits 1.

(a) Prove that the sequence of observed states is a Markov chain.

(b) Calculate the transition matrix.

(c) Calculate the probability of receiving the correct message at B and determine

the limit of this probability when the number, n, of intermediaries tends to

infinity.

3.3.1 [B:37] Let A = (a^) be the matrix of technical coefficients. Let dj be numbers

that enable us to pass from one physical unit to another or else to a monetary

EXERCISES 133

unit. Let A be the new matrix obtained in this way. Prove that

Ä= D'lAD,

where

D = diag(d1,...,rfw).

3.3.2 [B:37] Let A be the matrix of technical coefficients defined in monetary

units. Suppose there exists a price system that makes each branch of the economy

profitable. Show that

p(A)<\

and hence p(A) < 1, where A is the matrix of coefficients defined in whatever

system of units.

3.3.3 [B:37] Suppose the number of employees in the branch j of the economy

decreases, thereby causing the intensity of work in this branch to increase. Prove

that the rate of profit and the rate of growth will then increase.

3.3.4 [B:37] In the Marx-von Neumann model wages are indexed by prices:

w = pd,

where d is the employees' basket of consumption goods. Suppose that d is varied

by an amount Ad.

(a) Show that the increases of consumption by the employees is equivalent to an

increase in the wage costs.

(b) Show that an increase of the wage costs implies a decrease in the rates of

profit and growth.

3.3.5 [B:5] We present here what is known as the closed Leontiev model. The

set of goods is supposed to be equal to the set of products. The matrix A of

technical coefficients is a non-negative square matrix. If x is the vector of products

and y is the vector of goods, then

y = Ax.

The system is viable if y ^ x and the equilibrium of the quantities is given by

(J - A)x = 0 (x ^ 0).

(a) Determine a sufficient condition for equilibrium when the matrix A is irredu

cible.

Let p be the row vector of the prices of the goods. The row vector of the

costs of manufacturing the goods is therefore

c = pA.

Hence the equilibrium of the prices is given by

ρ(/-Λ) = 0 (ρ>0).

134 WHY COMPUTE EIGENVALUES?

(b) Show that, when A is irreducible, the equilibrium of the prices is equivalent

to the equilibrium of the quantities.

3.3.6 [B:37] Next, we shall present the open model of Leontiev. We now have

n goods which are also products, but there exists a type of goods which is not a

product (in general, the work). The matrix A of technical coefficients is non-

negative and irreducible. The net produce is given by the equation

q = (I-A)x.

(a) Given a demand vector c ^ 0, determine a sufficient condition for the exis

tence of a vector x ^ 0 such that

q = c.

(b) Prove that if there exists a row vector p > 0 such that

pA>p,

x

then (I — A)~ exists and is positive.

(c) Examine and interpret the sequence

x (0) = c,

when the dominant eigenvalue λ* of A is such that 0 < λ* < 1.

3.3.7 [B:37] In the growth model of von Neumann, the production is defined

by two matrices:

A = coefficient matrix of the goods,

B = coefficient matrix of the products,

where it is supposed that there are m techniques for the production of n goods.

During a given period of time we consider a column vector xeR m of the activity

level of the techniques and a row vector p,pTeRn, of the prices of the goods. If a

is the growth rate and β is the interest rate, then

(B - <xA)x ^ 0 (x ^ 0),

ρ(Β-βΑ)^0 (ρ»0).

(a) Show that the surplus has zero price and that, if the profit is less than the

interest rate, the activity level is nil.

(b) Show that, if the technology (B9 A) consists of non-negative irreducible matri

ces, there exists a unique number α* = β* > 0 such that

a* Ax ^Bx (x> 0),

β*ρΑ^ρΒ (ρ>0)

and

p(a*A - B)x = 0.

EXERCISES 135

(c) What can be said about the maximum rate of growth in relation to the

minimum rate of interest?

3.3.8 [C] We shall treat here the case of a farmer whose economy is confined to

the raising of chickens. We are concerned with two goods (chickens and eggs)

and two processes (laying and brooding). It will be assumed that a laying hen

will lay a dozen eggs per month, while a brooding hen will hatch four eggs per

month.

(a) Show that the matrices A and B of Exercise 3.3.7 are in this case

(b) Study the farmer's situation at the end of two months after he started with

three chickens and eight eggs.

(c) Repeat part (b) for the case when he started with two chickens and four eggs.

(d) Calculate the rate of growth when the economy is balanced.

(e) Study the balance of prices when one chicken is worth 10 units and an egg is

worth 1 unit.

(f) Repeat part (e) for the case when the price of a chicken is 6 units and that of

an egg is 1 unit, and calculate the rate of interest.

3.3.9 [B:37] Consider the Marx-von Neumann model defined in Section 3.3

(page 117): we formalize here the process of absolute price formation and hence

the propagation of inflation. Given two vectors

x = (x 1 ,...,x n ) and }> = (j>i,...,)'„)

we define the vector

z = (z 1 ,...,z n ) = max{x,<y}

by putting

z^maxjxi,^} (ι = Ι,.,.,η).

Let s be the marginal rate and put

A—U

1+5

Define the following sequence of row vectors:

pk+i=max<pk9-pkB>9

This formalizes the effect of the 'clich', that is the rigidity at the decline of prices.

(a) Show that if B is irreducible and if p = (p (1) ,... ,p(n)) > 0 is the price system,

136 WHY COMPUTE EIGENVALUES?

1-,—'-

1+r

and

P(i)

a = max -£-.

converges to p and the absolute prices increases at the rate ρ/λ.

3.3.10 [D] Consider Samuelson's oscillator: let rk be the national revenue, ck

the national consumption, dk the national expenditure and ik the national invest

ment, all during year k.

Let s be the marginal tendency for consumption and let v be the ratio of

investment versus the increase of consumption. Then y — vs is the coefficient of

acceleration:

^ = 7(^-1-^-2)·

In addition we have the relations

rk = ck + ik + dfc,

which correspond to the plan of consumption and investment, and

c

k — srk - 1 Ϊ

which is the delay of one year between the evolution of consumption and revenue

(with vital minimum zero).

(a) Taking dk = 1 for all /c, show that the national revenue satisfies the equation

rfc + 2 - s ( l +v)rk+l+svrk= 1.

(b) Study the solution of this equation in relation to the values of (s, v). It is

convenient to consider the four regions in the (s, 1;) plane defined by the curves

1 Δ 4ν

s = -v and s = (1+tO 2

3.3.11 [C] Consider an economy that is divided into N regions. We study the

interregional movements of the immigrant workforce. Its redistribution during

the period k (0 ^ k ^ T) is given by a row vector

x

k — ( x kl>* · -*xkNh

EXERCISES 137

where xkj is the effective workforce of the immigrant population in the region j

during the period k. We suppose that the workers freely change their location

according to taste and different market conditions for work. Let Ak = (α(9) be the

'matrix of migration', where af) is the rate of migration of workers from the region

i to the region j during the period k.

(a) Show that

X =

k+1 ^0^0^*1 '"Ak.

w

4 2

Ak = A = 0 ^ ^U

3 3

\I i I /

Calculate the matrix that represents \the

4 4 2/

rate of migration from one region

to another at the end of T periods.

(c) Examine the behaviour of the matrix calculated in part (b) when T-> oo.

3.4.1 [B:14] Let X9 A, B9 U, V, W9 Z and E be the matrices defined in Lemma

3.4.1. We order the real eigenvalues of U and W and of V and Z (of orders k and

n respectively) by decreasing magnitude; thus

= =

Ai ^ A 2 ^ ' * * ^ Ar > Ar + i = Ar + 2 "" ^k = = ^n ==

"·

(a) Show that the associated eigenvectors uh wh vt and zf satisfy the equations

Ui = B-1/2wh vi = A-li2zi.

(b) Show that the SVD (Exercise 1.6.8) of E is

E=ijIiZiw].

i=l

{fJXgl

if \=

Ρυ,ΰ) i{fTB-WA-ig)1iir

Show that if

fi = BUi and 9i = AVi9

then

p(fi,gd = max {p(/,0)|uJ/= »Jo = 0; 1 < j < i} (1 < i < r).

3.4.3 [B:14] We retain the notations of Exercise 3.4.1. Let

SeR*x" and TeJR.nxN,

/38 WHY COMPUTE EIGENVALUES?

where max {k,n) ^ JV, and define X = STT, A~' = TT T , B~* = SST. Let 0, be the

ith canonical angle between the subspaces lin ST and lin T T in R N . Show that

yfki = cos 0i (i = 1,..., k).

3.4.4 [B:14] In the method of correspondence analysis the matrix X of order

k x n represents a contingency table:

Xij^O and Σ^ο=^·

and J which take values in {1,2,..., k} and {1,2,..., it} respectively. We define

k n

a

j = Σ XU > °» b

i= Σ X

ij > °>

A = diag(ar 1 ), JB = diag(br 1 ).

U = ΧΛΑ^Β

associated with the triplet (X, A, B). Let

α = (α 1 ,...,α π ) τ , f? = (fc1,...,fcfc)T,

X0 = X - baT, U0 = XOAXJJB.

sp(l/ 0 ) = sp(l/)\{l}.

(c) Show that (70 is associated with the triplet (X0A9 A ~ l9 B) as well as with the

triplet (X09A,B).

(d) Interpret factorial correspondence analysis of the columns of X as determining

the principal inertial axis in R k equipped with the norm B of the set of points

defined by the columns of X' — X0A weighted by {a,}". The row analysis is

obtained by duality.

3.4.5 [B:14] Continuing Exercise 3.4.4, assume now that xu represents the

probability of the event {/ = i and J =7}, where / and J are two discrete random

variables. Suppose that the vectors / and g of Exercise 3.4.2 represent the corre

sponding functions

/:{l,...,fc}->R,

0:{l,...,n}-]R.

Associated with the variables / and J we have the functions / ( / ) and g(J).

Establish the following results:

fTXg = <$lf(I)g(J)l

EXERCISES 139

fTB-if=£{u(m2h

Show that the probabilistic interpretation of the correspondence analysis is

now to determine the functions of / and J defined by fx = Bux and gt = Avt that

are uncorrelated with the preceding ones and that have maximal correlation

equal to y/Xi9 i ^ 2.

3.4.6 [B:14] In the principal components analysis we consider a random vector

SeR" of zero mean. We define

X = £(SST)9 A = B = I, u=V=X\

fTXf=£(fTSSTf) = a2(fTS).

Show that the method determines uJS, the linear combination of the compo

nents of 5 that has the greatest variance and is uncorrelated with u?S when j < i.

In this situation, is it necessarily true that λ,·€[0,1]?

For the geometric interpretation, we consider n weights {α^Ί such that Σ" = laj= 1

and n vectors {S}}\ in R \ centred with respect to the {a,·}". Set X = [ S l 5 . . . , S J

and A = diag(a7); the norm B on R* is given. The principal component analysis

(PCA) of the set of n points {Sj)nx in R" finds the principal inertial axes in R*

normed with B of the n points weighted by {a^\. The two particular choices

B = I [respectively B = diag(br*)] lead to the unitary PCA [respectively norma

lized PCA with bi = Σ"= i<*jSfj which is the empirical variance].

3.4.7 [B:14] Consider the method of discriminant analysis. We retain the nota

tions of the Exercises 3.4.1 and 3.4.2. Let S be a random centred vector in R* and

let J be an integer random variable. Put n(j) = Prob(J =y). We define

* = (*!>·..,*„),

A-^dmgWj))

B-l=£(SST)

(a) Show that

fTB-lf=a2(fTS),

gTA-lg = #[g2(J)l

(b) Show that sp(l/) c [0,1].

(c) Show that p(f, g) is the correlation coefficient for fTS and g(J). The canonical

140 WHY COMPUTE EIGENVALUES?

the linear functional of S and the function of J that have maximal correlation.

3.4.8 [B:14] In the canonical analysis of the correlation between two vectors

one considers N centred vectors {Sy}* in R* and N centred vectors {7>}* in R".

SetS = [S i ,...,S N ],T=[T 1 ,...,T A r ], J ß- 1 =(l/iV)S5 T ,/4- 1 =(l/iV)7T T andX =

(\/N)STT. Explain why the direct geometrical interpretation in terms of minimal

inertia is no longer possible. Give an interpretation in terms of canonical angles

between the subspaces in R" spanned by ST and TT.

3.4.9 [A] Let A, B, X, W, Z, (7, V and E be the matrices of Exercise 3.4.1. Let RA

and RB be upper triangular matrices such that

A = RTARAi B = RTBRB.

Define

E = RAXTRTB, W=ETE.

(a) Show that W and W are similar.

(b) Show that the eigenvectors u of U and v of V can be calculated from the

eigenvectors vv of W.

3.5.1 [D] A vertical bar of length unity is fixed at its lower end. At the other

end (zero of the x axis) a device prevents displacements at right angles to the axis

of the bar. At this end a downward force P is applied which causes the bar to be

deformed. Let u(x) be the displacement of the point situated at the abscissa x at

right angles to the axis of the bar.

On the assumption that the displacements remain small it can be shown that

d

a(x) —-«(χ)Ί

u(x) 1 + Pu(x) = 0,

dx dx

where a(x) depends on the physical properties of the bar. In accordance with the

conditions for fixing the bar we have

M(0) = w(l) = 0.

(a) Prove that if a(x) = 1, this differential problem has a non-trivial solution only

if

Pe{(7i/e) 2 :/c=l,2,...}

and that, when P = Pk = (nk)2, the solution is any function that is linearly

dependent on

uk{x) = sin nkx (0 ^ x ^ 1).

Suppose now that the function χι—>α(χ) is not constant. We may seek an

approximate solution by discretizing the bar into n + 1 segments of length

EXERCISES 141

h= l/(n + 1). Let wf denote the approximation for u(ih) and put ai+l/2 =

a((i +1)). The discretized problem can be written as

"o = "«= 0.

(b) Show that this discretization is equivalent to a matrix problem

Au = Xu.

Determine the matrix A,

3.5.2 [B:66] The lower ends of the two bars in Figure 3.5.1 are equipped with

springs in such a way that in the absence of forces, the position of the bars is in

the same vertical line. A downward vertical force F is applied to the upper end

of the second bar, which causes the angles 0X and 0 2 to appear. Both bars are of

length / and mass m, and the two springs have the some characteristic constant k.

(a) Show that, if the force of gravity is neglected, the kinetic energy K and the

potential energy V are given by

K = > / 2 [ 4 ö ? + Wxe2 cosφ χ - θ2) + θ221

V= \kQ\ -h \k{Q2 - Θ,)2 - Fl(2 - cos θχ - cos 02).

(b) Write down Lagrange's equations

dfdK\ dK dV Λ ,. t ^

— —r- +— =0 0=1,2)

dt\BeJ de, dßi

for this case. Discuss the solution that corresponds to the initial perturbed

Figure 3.5.1

142 WHY COMPUTE EIGENVALUES?

conditions

0,(0) = εα„ 0,(0) = eft 0=1,2).

Assume the existence of a solution 0,(ί,ε) which is differentiable with respect

to ε at ε = 0. Put

ΦΜ = —-—

οε

(c) Show that </>, satisfies the equation

+ W, = 0, 0 = 1,2).

30,50,/

(d) Write the preceding system of differential equation in matrix form:

Βφ + Αφ = 0,

and show that A and £ are symmetric and that B is positive definite.

(e) Show that if F > 0 and k> 0, then the roots of the polynomial

p(/4) = d e t ( A - ^ ß )

are real and distinct.

3.5.3 [B:66] Generalize the problem 3.5.2 to the case of n bars. In this case the

kinetic and potential energies (neglecting gravity) are given by

ml2 n

· ·

K

= ΤΓ Σ (6« + 3 - 6max

{'' J) - δ»)θ&cos (0.· - ej)>

3.5.4 [B:66] Consider an elastic solid. When the equations of elasticity are

linearized, they take the form

1

£ A / Jduk duA

2 M =i \dxl dxkJ

Aijki — Ajikl — i4yIfc — v4kiij,

,·= i dx,. dr

where u(x) is the displacement vector and p(x) is the density of the material.

Write down the system of differential equations that enable us to determine

the normal modes of vibration of the form

ux(x, t) = exp ( - ί^Τήω(χ).

EXERCISES 143

3.3.5 [D] Consider the vibrations of an elastic disk (homogeneous and isotropic)

whose normal displacement component u(x, t) is a solution of

dt2

where Δ 2 is the biharmonic operator (the Laplacian is applied twice). The method

of the separation of variables yields two normal modes of vibration of the form

u(x, t, λ) = exp (iy/Tt)w(x).

Write down the equation satisfied by w.

3.5.6 [A] Consider the differential equations

Mu" + Bu' + Ku = 0,

whose unknown u is a vector function of a real variable t > 0. Find a solution u

of the form

"(0 = eA'4>,

where φ is a constant vector. Prove that the pair (A, φ) satisfies the equation

(λ2Μ + λΒ + £)</> = 0.

3.5.7 [D] Consider the differential equation

MM" + J3u' + Ku = 0

with the initial conditions

w(0) = wo, ii'iOHtii.

Define the polynomial

ρ(λ) = ά<Χ(λ2Μ + λΒ + Κ).

Suppose that M, B and K are Hermitian. Prove that:

(a) If M, 1? and K are positive semi-definite and if M and X are positive definite,

then no root of ρ(λ) has a positive real part.

(b) If M and K are positive semi-definite and B is positive definite, then λ = 0 is

the only root with a zero real part.

3.5.8 [B:22] We present here a method known as static condensation in relation

to the problem

(P) Kq = aj2Mq, O^eC,

which models the natural frequencies and modal forms of a structure considered

globally; K is the rigidity matrix and M is the matrix of masses (page 121).

We choose a subset qc of coordinates that are to be eliminated and we denote

by qR the subset of coordinates that are to be retained. This induces a partitioning

144 WHY COMPUTE EIGENVALUES?

KRR4R + KRC<?C = ^ 2 (M RR g R + MRCqc),

*CR<7R + Kcc<7c = w2(MCR^fR + Mccqc).

Suppose that qc can be decomposed as follows:

<7c = <?s + <7D,

where qs is the static part:

<Zs= -^CC^CR^R·

The method of static condensation consists in neglecting qD so that

9c = -

KccKCRgR.

(a) Prove that, if qD = 0, then (ω, qR) is a solution of

KRR<ZR = <^2MRRgR,

where

K — K — K κ~ικ

^RR "" A R R ^RC^CC^CR'

=

^*RR ^*RR ~" ^ R C ^ C C ^CR ~~ ^ R C ^ C C ^*CR>

=

^*CR ^ C R ~" ^ C C ^ C C ^CR·

(b) Show that qD satisfies the equation

(Kcc - a)2Mcc)qD = co2MCR<?R.

Let qR = 0 and let (μ,, </>,) be the solutions of

Κ€€φ = μ2Μ€€φ (φ*0).

Suppose that

.2 *> . . . ^ „2

··<«

and

4>*Mcc<t>J = oij.

Let ε > 0 be the order of magnitude of the error acceptable for the modal

forms associated with the low frequencies.

(c) Show that the method of static condensation furnishes approximations ac

ceptable for the solutions (ω, q) such that

ω2 = εμ2« 1.

3.6.1 [D] Prove that equation (3.6.1) is equivalent to Galerkin's method when

{Χι>···>&■} is a n orthogonal system. Determine the linear operator which is

represented by the matrix H is (3.6.1) (page 122).

EXERCISES 145

3.6.2 [D] Verify that the Jacobian of the right-hand side of (3.6.2) (page 124) is

given by

d2 \

A2

J =

d2

-B D

^

\ 1

3.7.1 [B:12] Consider the differential eigenvalue problem:

- x " = Ax, x(0)=0, x(l) = 0.

(a) Determine the associated Green kernel and formulate the eigenvalue problem

associated with the integral operator.

(b) Show that the discretization by finite differences for the differential problem

is equivalent to Fredholm's approximation for the integral problem.

3.7.2 [B:6,12] We present here what is known as the collocation method by

discussing an example. Let B = C°[0,1] be the Banach space of functions

x:ie[0, l]i->x(i)eC

which are continuous on [0,1] and are equipped with the uniform norm

| | x | | = max |x(i)|.

Let n > 2 be a given integer and put

/i = ( n - l ) - and 0 = 0 - l)h 0=l,2,..,n).

Define

if 0 ^ t ^ h9

*i(0=-

otherwise;

whenj = 2,...,n— 1:

l--|i-r,.| ύ tj.1^t^tj+1,

*;(') =

0 otherwise,

en(t)=S h

0 otherwise.

146 WHY COMPUTE EIGENVALUES?

(a) Prove that xeBn if and only if xeB and x is a polynomial of degree less than

or equal to unity on [tp tj+ J wheny = 1,2,...,n — 1.

(b) Prove that

n

πη:Β-*Β, XH-> £ x(i;)e,·

defined on B. Consider the eigenvalue problem

Τφ = λφ, ΟΦφεΒ, AeC.

and the approximation (of oblique Galerkin type)

πη(Τφη - ληφη) = 0, 0 Φ φηβΒη, AneC.

(c) Prove that this approximation is equivalent to a matrix problem of order n:

Au = COM, 0 Φ weC, ωβ€.

Calculate the matrix A and make explicit the connection between u and φη.

3.7.3 [A] Retain the notation of Exercise 3.7.2. Let

Φ„=Τφη.

Prove that φη is an eigenvector of the operator Τπ„ associated with the eigenvalue

n

hi*) = Σ MjnMtjn)

/(x)= I x(i)di.

Jo

Let T:C°[0,1] -> C°[0,1] be the integral operator

(Tx)(t)= k(t9s)x(s)ds

Jo

having a continuous kernel k.

We define the Nyström approximation of T associated with the given quadra

ture formula, namely

(Tnx)(t)= t<*jnk(t,tjn)x(tjn).

EXERCISES 147

n

max | x j = l.

1 </<«

<M0 = Σ °>jnKt,tjn)Xjn

(b) Prove that

<l>»(tin) = Xin (1 </<«).

CHAPTER 4

Error Analysis

This chapter begins with a topic of great practical importance, namely the

stability of a spectral problem and the notion that is derived from it: the spectral

conditioning for a set of distinct eigenvalues and for the associated invariant

subspace. This will be considered in the most general case involving a non-

normal matrix and defective eigenvalues.

The analysis of a priori errors in based on spectral theory, which enables us

to give concise and elegant proofs. The analysis of a posteriori errors furnishes

bounds that are fairly easy to calculate as a function of the residual matrix

AU — UC constructed upon the matrix C of order m and the m vectors of V.

Suppose we wish to solve the linear system Ax = b: it possesses a unique solution

x = A " lb if and only if A is non-singular. This is all that the pure mathematician

is interested in. But the numerical analyst wants to know that, within the realm

of possibility, the solution is insensitive to perturbations in the data A and b. If

A is perturbed by ΔΑ and b by Ab, then the new solution x -f Δχ is such that

||Ax|| condM) /«Aft« | ||ΑΛ|[\

ΐ-Μ-ΊΙΙΔΛΙΓIV 11*11 Mil /

provided that ||ΔΛ|| < 1/M~ * II ( see Schwarz, p. 27, or Horn and Johnson, 1990,

p. 338).

The condition number cond(i4)(= Mil MMl) is therefore a measure of the

relative error ||Ax||/||x|| of the solution as a function of the relative errors of the

data. If cond (A) is large, then, in a certain manner, A is close to a singular matrix:

there exists a matrix AA of rank unity such that

UA, - '"'

cond A

and A + A A is singular (see Exercise 4.1.1; also see Chapter 1, Section 1.12).

Before solving a linear system it is useful in practice to scale the matrix in order

to reduce its condition number. The scaling process of A consists in finding

150 ERROR ANALYSIS

cond(DlAD2) = inf condiA^Aj)

As we have seen, the stability of the solution of a linear system depends on the

regularity of A. The situation is much more complex for the problem of eigenvalues.

The notion that corresponds to the regularity of A is the property of A to be

diagonalisable or non-defective.

is defective. Put

m-(] J) (e>o).

The eigenvalues of Α{ε) are

Αχ(ε) = 2 + y/ε, λ2(ε) = 2 - ^/ε,

d/l1(g)_ 1 άλ2(ε)_ 1

dß 2^/έ' de iyß

The rate of change of the eigenvalues at 0 is infinite. However, Α(ε) is much nearer

to non-diagonalisation than might be predicted from the distance of the eigen

values:

\\A — Α(ε)\\ = ε and Α^ε) — Α2(ε) = 2 χ / ε » ε

for small ε.

We shall study the variation of the eigenvalues and eigenvectors as a function of

a variation AA in the matrix A. We shall begin with a simple eigenvalue.

Suppose the eigenelements A, x, x+ satisfy the equations

AX =λ Χ

_ 1||χ||2 = χ * χ = 1 .

,4*x+ = AxJ *

Let P = xx* be the eigenprojection, S the reduced resolvent, Ρλ = χχ* the

orthogonal projection on the eigendirection M = Ηη(χ),Σ± the partial inverse

STABILITY OF A SPECTRAL PROBLEM 151

Figure 4.2.1

in Mλ and let ξ be the acute angle between the eigendirections lin (x) and lin (xj

(see Figure 4.2.1). If [x,Q] is a unitary basis of C such that Q is a basis of

M \ then

Σ 1 = ρ(Β^Λ/Γ 1 ρ*,

where

Β = βΜρ.

Put

<5 = dist[A,sp(>l)-{A}].

simple eigenvalue λ varies (to thefirstorder) by Αλ = x*AAx, and the eigendirection

lin(x) turns (to the first order) through an angle Θ such that

tan0= ΙΙΣ-^Λχ^.

PROOF Let A' = A + AA, where ε = \\AA \\ 2 . We have seen in Chapter 2 that the

Rayleigh-Schrödinger series converge for sufficiently small ε (Corollary 2.9.4).

Hence there exist eigenelements X' and x' of A' such that x*x' = 1 and

μ'-(Α + χ*Δ.4χ)| = 0(ε2),

| | χ ' - ( χ - Σ χ Δ Λ χ ) | | 2 = 0(ε2).

In accordance with Figure 4.2.2 we have

||x'_x|| 2 = tan0

Figure 4.2.2

152 ERROR ANALYSIS

Definitions

(a) The spectral condition number of the simple eigenvalue λ is defined by

csp(/l)= | | x j | 2 .

(b) The spectral condition number of the eigendirection lin(x) is defined by

ο*ρ(χ)=\\Σ1\\2.

We recall that

Il**ll2 = 11^112 = (cos ξ)~\

and

(3-1^||(B-A/)-1||2 = l|X1||2<2cond2(F)^',

where £ is the index of the eigenvalue of B that is nearest to A, provided that δ is

sufficiently small (see Exercise 4.2.1), and where V is the Jordan basis for B.

When B is diagonalisable,

<5-1^||Z1||2<cond2(F)^-1,

where V is the basis of the eigenvectors of B. Then

(a) If there exists a vector of M that is almost parallel to x (that is if ξ is almost

a right angle), then λ is ill-conditioned.

(b) If δ is small and/or cond 2 (K) is large, then x is ill-conditioned.

When A is Hermitian or normal, then

M 1 = M, ||xj|2 = l and ΙΙΣ1!!, =-cT'.

In this case, the sole cause for the ill-conditioning of lin (x) is δ, the distance of /

from the rest of the spectrum of A. For an arbitrary matrix, the departure of B

from normality also plays a part (see Chapter 1, Section 1.12).

m) is not defined. When we perturb the matrix, we obtain in general m simple

eigenvalues {AJ.}™. When λ is semi-simple, maXjCsp(Aj) may be moderate, because

λ corresponds to an orthogonal basis of eigenvectors. When λ is defective, then

maXfCspiA;) is necessarily large because λ corresponds to fewer than m indepen

dent eigenvectors. We conclude that a defective eigenvalue is necessarily ill-

conditioned when considered individually (see Section 4.2.2).

b

v° )

where a φ b. The matrix has two simple eigenvalues, namely a and b; their

STABILITY OF A SPECTRAL PROBLEM 153

with a double eigenvalue \(a + ft), which is semi-simple or defective according to

the value of ε.

(a) Suppose that ft — a is small and ε" * is moderate. Let T — \{a + ft)/, then

/a —ft b — a\

2 ε

T-T' =\

b-a

0

\ 2 /

and

|| T - T || 2 is of the order of ft - a.

(b) Suppose that both ft — a and ε are small. The matrix

b a

( „ ~ \

-φ-ά)

\ )

has a defective double eigenvalue \ (a + b), and the Jordan basis is given by

n i \

V=\ ε

U b>-a\ 2 /

If (l/s)(b — a) is moderate, then cond2 (V) is moderate. This condition number

is large when (1/ε)(ί? — a) is large; in this case V is of rank unity up to the term

ε/(ί> — a), and the departure of T from normality is (ft — α)/ε and is therefore

large.

/—149 - 5 0 —154>

A= 537 180 546 L

-27 -9 —25 J

\

whose eigenvalues are {1,2,3} (the example is due to C. Moler). We shall disturb

a 2 2 =180:

(a) a 22 = 180.01: the eigenvalues become 0.207 265 5; 2.3008349; 3.5018994

(Figure 4.23).

-X · · K · K ^

I 2 3

Figure 4.2.3

154 ERROR ANALYSIS

—._jT· r. ^

I 2 3

Figure 4.2.4

(b) aT22 = 179.997 769: the eigenvalues are now 1.550 945 6 ± i x 7.999 21 x 10" 2 ;

2.895 877 9 (Figure 4.2.4).

It can be seen that a small perturbation around a22 can produce very different

perturbations of the eigenvalues. It can be verified that csp (Af) % 103 (i = 1,2,3).

We pass on to the case of a block σ of eigenvalues of total algebraic multiplicity

m, and we suppose that the number

£ = dist πιίη(σ, sp A — o)

is positive. The block σ may consist of a cluster of eigenvalues, or of a single

multiple eigenvalue.

Let M be the m-dimensional subspace which is invariant under A and is

associated with σ. Let Q be an orthonormal basis of M; then P1 = QQ* is the

orthogonal projection on M. The matrix B — Q*AQ, which represents the

mapping A>M relative to the base Q, has σas its spectrum.

Let [Q, Q] be a unitary basis for C . The partial block-inverse in M1 is given by

1<L = Q{B,B)Q\

where

B = Q*AQ.

Let Ξ = diag {£,·} be the matrix of canonical angles between M and M„,, the

left invariant subspace. Let X+ be the basis of Μ + normalized by the condition

that

Then

P = QXt

is the spectral projection on M.

Proposition 4.2.2 If the matrix A is subjected to the perturbation ΔΑ, then aand

M become σ' and M' respectively, which (to the first order) are defined asfollows:

(a) σ' is the spectrum ofB' = B + X$AAQ.

(b) M' has the basis X', normalized by Q*X' = Im such that X' = Q - E±AAQ.

STABILITY OF A SPECTRAL PROBLEM 155

2.9.4, yield

\\B'-lB + Xl(AA)Q]\\=0(e2p\

\\X' -lQ-ZHAA)Q-]\\p = 0(a2p).

We remark that if Θ is the diagonal matrix of canonical angles between M

and M', then

II*' -ßllp=Htan0|| p = ΙΙΣ^ΛβΙΙ,.

Since the matrices B and B' are such that

||B-F||2<||^||2||Ai4||2,

can we deduce anything about the proximity between the members of σ and σ'?

We denote by V the Jordan basis of eigenvectors of B.

Property 4.2.3 For each λΈσ', there exists λβσ of index { such that

^-A|^2[cond2(F)||ZJ|t||2]1^,

for sufficiently small ε2.

small, then

(ΐ + μ ' - λ Ι ) ' - 1 " ^

(see Exercise 4.4.1).

When B is diagonalisable, then / = 1, V is a basis of eigenvectors and

\X'-X\^cond(V)\\XJ2s2.

maxminIA' - λ\ ^ 2cond2(K)||Zs|t||2eJ/,n.

λ'εσ ' λεσ

1 < ί ^ m, then εψ ^ e\,m < 1,

while

l^(cond2(V)\\XJ2)l"^cond2(V)\\XJ2.

We denote by λ (respectively λ') the arithmetic mean of the eigenvalues in σ

(respectively &):

1 = - X μ, 1' = - £ μ.

m μεσ m μεσ'

156 ERROR ANALYSIS

|2'-2| = 0(ε2).

PROOF This is an immediate consequence of the inequalities:

m

Definitions

(a) The global spectral condition number ο/σΪ8 defined as

csp(a) = cond2(V)\\XJ2.

(b) The spectral condition number of the invariant subspace M is defined as

csp(M)=||Z 1 || F .

These definitions are independent of the choice of bases in M and M 1 . We

recall that

ll*JI 2 = Wp\\2 = ll(cosE)- 1 || 2 = (cos^ max )- 1

and that

||I- L || F =||(fl,JB)- 1 |lF>*" 1 -

In the special case, in which A is Hermitian or normal, we obtain the following

simplifications:

^ * = 6» V is unitary, cond 2 (V)= 1,

0,P = P 1 , S = L 1 ,||S|| F = i - 1 ,

S =

conditioned; an invariant subspace is ill-conditioned if and only if it corresponds

to a block of eigenvalues that are close to the rest of the spectrum.

Example 4.2.4 Consider the special case in which σ contains a single eigenvalue

λ of multiplicity m; then λ is globally well-conditioned when cond 2 (K)||X 3|c || 2 is

moderate (note that we may put V= Im if λ is semi-simple).

A defective eigenvalue is ill-conditioned when cond2(K)||ArJ|t||2 is large (see

Exercise 4.2.2). On the other hand, it may be well-conditioned in contrast to the

case in which it is treated individually (see Exercise 4.2.1).

Since csp(M) = ||(β,β) _ 1 || Ρ , the investigation carried out in Chapter 1, Section

1.12, applies generally: csp(M) depends on cond 2 (K), cond 2 (K) and δ, where V

is the Jordan basis (eigenbasis) of B in C m and V is a Jordan (eigen-) basis of B

in C n " m . We shall consider two cases:

(a) λ is a multiple eigenvalue: if λ is semi-simple, then B = A/m and csp (M) reduces

to ||Σ 1 ||F, which depends only on £and cond 2 (F). On the other hand, if λ is

defective, then B = VJVl and ||Σ Χ || Ρ depends also on cond 2 (V).

STABILITY OF A SPECTRAL PROBLEM 157

as a function of δand cond 2 (V) is not a new phenomenon. We are interested

here in the dependence of csp(M) as a function of cond 2 (K). Now B is

diagonalisable:

B = VDV~l = Q(RDR1)Q* = β Τ β * ,

and cond 2 (F) = cond 2 (K). The number cond 2 (V) is related to the departure

from normality of the matrix B and is therefore related to the norm of the

strictly upper-diagonal part of T.

When cond 2 (V) is large, there exists a matrix close to B that has a defective

eigenvalue of multiplicity m and whose Jordan vectors are almost parallel.

1 1044\

0 /

The matrix of the eigenvectors of A is

'«i1

\0

l

)

-10-4/

cond 2 (K)~10 4 .

104

It is easy to verify that A' = I _4 Λ ) is defective; it has | as a double

10 _ 4 /2 0

eigenvalue and the Jordan basis V = ( . I, for which cond 2

6

V-10_4/2 10"4/2/

(F')~10 4 .

We remark that the departure from normality of A is of the order 104 and that

the relative distance between the eigenvalues is \/\\A\\ ~ 10~ 4 .

The number ||Χ„|| 2 is not invariant when A is subjected to a diagonal similarity

transformation. The notion which corresponds to the scaling of a matrix is the

balancing in relation to the eigenvalue problem. It consists in trying to find a

diagonal matrix D such that

\\D~lAD\\2= inf ||Δ"ΜΔ|| 2 .

Adiag

From a practical point of view the value of ||XJ| 2 is significant if it corresponds

to a matrix Δ~ιΑΑ for which ||Δ - 1 ,4Δ|| 2 is close to its minimum.

Λ.(V1O '°Ί

0 J

and A.f\01 ° \

ΚΓ 4 /

Then

A' = A~lAA = (1 l

\

158 ERROR ANALYSIS

1 1

X =

0 -10"4

and those of A' are

* — * - ( ; ' , )

It can be verified that the balancing of A by Δ has diminished the condition

number of the basis of eigenvectors, as well as the departure from normality of

A (see Exercise 4.2.3).

In order to compute ill-conditioned eigenvalues and/or eigenvectors we try to

group those eigenvalues that are 'tied' to one another, that is which are such that

they are strongly affected by a perturbation in order that the spectral condition

numbers are diminished as much as possible.

When A is normal, then csp (σ) = 1 and csp (M) = δ ~l: clustering neighbouring

eigenvalues diminishes csp(M).

When A is not normal, we have

csp(a) = cond 2 (K)||X,J 2 and csp(M)=||Z J

and the inequalities

1

1<II*J| 2 <1 + ΙΣ !

(5- 1 ^||I 1 || 2 ^cond 2 (K)(l+<5)^- 1 (5- (m=l),

where L is the maximal index of the eigenvalues of B (Proposition 1.12.4).

Grouping certain eigenvalues causes 1/Jand \\Χ+\\2 to diminish, but cond 2 (K)

and cond 2 (K) remain unchanged.

(\ 1 0\

0 1-ε 0

v° 0

If we group the two neighbouring eigenvalues {1,1 — ε}, both the spectral condi

tion numbers of the eigenvalues and of the eigenvectors pass from l/ε to 1.

(\ 104 0\

A= 0 0 0

0 2/

STABILITY OF A SPECTRAL PROBLEM 159

has the separated eigenvalues {1,0, j}. The eigenvalues 1 and 0 are ill-conditioned

(condition number of the order 104) while the corresponding eigenvectors

(1, 0, 0)T and (1, - Κ Γ 4 , 0)T

are well-conditioned (condition number of the order 1). In fact, the matrix

/ 1 104 θ\

5

A' = 1.1 x K T 0 0

^2xl0"5 0 \j

has the eigenvalues {1.1; —0.1; ^}. The first two eigenvectors are (1, 10 5,

±x 10" 4 ) T and(l, - l . l x l O " 4 , -±xlO" 4 ) T .

When we group the two ill-conditioned eigenvalues into σ= {0, 1}, we find

that csp(^~10 4 and csp(M)~104, where M = lin(e1,^2) is the associated in-

(\ 10 4 \

variant subspace. This is due to the fact that the matrix B = has as its

vo o ;

matrix of eigenvectors V= I _4 1, which has the property that cond2 (K)~

104.

Denote by X' the basis of the invariant subspace associated with σ' = {—0.1;

1.1} and normalized by Q*X' = /, where Q = [eXie^\. The reader can verify that

*·-[!: * x r]

(see Exercise 4.2.7). The grouping did not improve the spectral condition numbers,

because cond2 (V) remains unchanged. One may ponder on the apparent paradox

that makes it possible for two well-conditioned eigenvectors to generate on

ill-conditioned invariant subspace.

It should be remarked that the grouped eigenvalues {0,1} are not consecutive:

the eigenvalue \ lies between them. The relative distance 1/||>4|| ~ 10 ~4 is small.

It should become clear by now that the condition numbers we have introduced

so far have been chosen as the coefficients which can be arbitrarily large in the

first-order bounds derived from perturbation theory. This is because it is essential

to determine the circumstances under which a spectral computation can be ill-

conditioned, that is a small perturbation on the data induces a large perturbation

on the output. We have seen that, as a rule, a large departure from normality is

at the root of unremovable spectral ill-conditioning.

A rigorous general theory of conditioning is beyond the scope of this book.

The interested reader is referred to Rice (1966), Geurts (1982) and Fraysse (1992).

See also Exercises 4.2.13 to 4.2.19.

It is important to realize that notions such as stability and conditioning are

160 ERROR ANALYSIS

particular, the instability induced by 'normwise' perturbations ΔΑ such that

|| A A || ^ ε || A ||, for a given ε > 0 and for an arbitrary matrix norm, can be much

greater than that induced by 'componentwise' perturbations A A such that \AA\ ^

ε\Α\, where the inequalities are defined componentwise. The latter type of pertur

bations, which in particular preserves the sparcisity pattern of A, is often more

appropriate for studying the influence on algorithms offiniteprecision arithmetic.

Our next remark concerns the spectral condition number of an eigenvector

(or an invariant subspace). If x is an eigenvector, then ax, a φ 0 is also an

eigenvector. The non-uniqueness of the definition of an eigenvector is reflected

in its condition number, as we illustrate now. We suppose that the eigenvalue is

simple (m = 1, see Figure 4.2.2). If one is merely interested in the eigendirection,

the stability of the computation of x can be analysed by looking at tan θ = ||Δχ|| 2 ,

where Δχ = χ' — x — — ΣλΑΑχ lies in M 1 = lin(x) 1 . This gives csp(x) = ΙΙΣ11|2,

and corresponds to the normalization x*x' = 1 on the computed eigenvector x'.

In this case the eigenvalue and eigenvector condition numbers are not related.

However, one may wish to have a different normalization. For example y*x' = 1,

where y is an arbitrary vector, non-orthogonal to x. With this choice, the

perturbation Δχ lies in the subspace W= lin (y) 1 , and Δχ = — ΣΑΑχ. The condi

tion number is now ||Σ||2. In particular, one may choose for y the left eigenvector

x+; then Δχ = — SAAx lies in the complementary invariant subspace M = lin (x*) 1 .

The resulting condition number ||S||2 was proposed by Wilkinson in 1965. It is

large as soon as the eigenvalue condition number ||P|| 2 = | | x j | 2 is large. The two

condition numbers are no longer independent. See Example 4.2.8 and Exercises

4.2.8 and 4.2.9.

irreducible Markov chain. The stationary state π satisfies

tic and irreducible: the simple eigenvalue 1 is associated with the eigenvector

e = (1,..., 1)T. Since π is a row of probabilities, the condition ||π|| t = Σ£π4 = 1 is

the normalization of choice. It can be rewritten as π τ £=1. We suppose that

P + AP remains stochastic; hence

chain literature, S is often referred to under the equivalent name of group inverse

(because 1 is simple; see Exercise 2.3.7).

One way to illustrate the global eigenvalue ill-conditioning is to look at the

pseudo-spectrum.

STABILITY OF A SPECTRAL PROBLEM 161

σε = {AeC;i is an eigenvalue of Λ + ΔΛ,||ΔΛ|Κε||Λ||}.

If σε is much larger than sp (A) then the eigenvalues of A are globally ill-

conditioned with respect to a normwise perturbation. One can similarly define

a componentwise ε pseudo-spectrum (see Example 4.2.11).

The influence of non-normality has been a recurring theme throughout this

section on spectral stability. The departure from normality (abbreviated d.f.n.)

has been quantified in Chapter 1, Section 1.12, by v(A) or ||N|| F . It is related to

cond (X), where X is the Jordan (or eigen-) basis of A, depending on whether A

is defective or diagonalisable.

Indeed, an arbitrary matrix has (at least) two canonical forms.

(a) The Schur form A = Q(D + JV)Q*, where N is the strictly upper triangular

part of the Schur form and D is the diagonal of eigenvalues. Since Q is unitary,

cond 2 (Q)= 1: the normalization is borne by the Schur basis Q. However,

|| N ||F can be arbitrarily large.

(b) The Jordan form A = X(D + K)X " *, where K is a matrix whose only non

zero elements (located on the first upper diagonal) are unity: the normalization

is now borne by K, whereas cond 2 (X) can be arbitrarily large.

We wish to warn the reader that for a diagonalisable matrix a large cond (X%

where X is the eigenbasis, is not always an indication of a large d.f.n.

'a c

A=.

,0 b

for c> 0, of Example 1.12.2. When b φ a, A is diagonalisable and

A l

X = L0 b-a

c

V )

and when b = a, A is under the Jordan form ( 1 and X' — I 1.

VO a) V0 lie)

The d.f.n. is v(A) = y/lc^Jc2 +(b — a)2. When c is fixed and moderate, then

cond (X)-+ oo when b-*a, and v(A) decreases towards yflc2. In this case, the

large cond (X) merely indicates that A is close to being a defective matrix with a

moderate d.f.n.

On the other hand, if c-+ oo, then the d.f.n. v(A) ^ yjlc2 -► oo; cond(X) and

c o n d ^ ' ) tend to infinity, according as to whether b¥^ a or b = a.

162 ERROR ANALYSIS

/ * 1 3Ί

r yi *1 V o

x2 y2

-y2 XjJ

\v

It is easy to check that the di.n. v(Sv) is larger than y/n — 2 v2, so that it increases

as the parameter v increases, when n is fixed.

The real values xk,yk have been chosen so that the eigenvalues xk ± iyk lie on

the parabola x = — 10 y2, that is

(2/c-l)2 2/c-l

**= - yk = - löö" (/c=l,...,p).

1000

Now let v4v = ßSvQ, where Q is the symmetric orthonormal matrix consisting of

the eigenvectors of the second-order difference matrix

W (ij= Ι,.,.,η).

4u = . -sin

n+ 1 n+ 1

The matrix Λν has the same spectrum and d.f.n. as 5V. For v = 1, 10, 102, 103,

and n fixed equal to 20, the following computations on Av were performed by

means of the QR algorithm (see Chapter 5), under MATLAB* on a workstation

working with a machine precision of the order of 2 x 10~ 16 . Figure 4.2.5 shows

the exact ( + ) and computed (°) spectra of the four 20x20 matrices Av. The

increasing instability of the spectrum is clear, as v increases. The * represents the

exact and computed means of the eigenvalues; they are equal within machine

precision, as a consequence of Corollary 4.2.5.

Apart from two of them, 18 of the 20 computed eigenvalues lie, for v= 102 and

103, on a disk centred at this arithmetic mean 1 This suggests that the matrix Av

behaves approximately like one Jordan block of size 18, completed by two

diagonal elements.

In order to test this hypothesis, we compute the spectra of a sample of matrices

A' = A + A A, randomly perturbed from A, in the following componentwise way.

♦MATLAB is a numeric computation system, trade mark of The Math Works Inc.

STABILITY OF A SPECTRAL PROBLEM 163

U- I V- 10

•0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 2 -1.5 -1 -0.5 0 0.5 1 1.5 2

Exact and computed spectra

Thus Ay becomes a'u = alV(l + αί), where a is a random variable taking the values

± 1 with probability £ and t = 2~k, the integer k varying from 40 to 50. Therefore

|Δ>1| = ί|>4| where t ranges from 2 - 4 0 ~ 1 0 " 1 2 to 2 - 5 O ~ 1 0 " 1 5 . For each i, the

sample size is 30, so that the total number of matrices is 30x 11=330. The

superposition of the corresponding 330 spectra are plotted in Figure 4.2.6. The

transformation of the perturbed spectra as v varies is dramatic. The 36 spikes

around 1 for v = 102 and 103 confirm the hypothesis that Av becomes computa

tionally close to a Jordan form as v increases. The computed eigenvalues λ' are

solutions of (λ' - A)18 = ε = 0(t\ ε being positive or negative with equal probability.

The computed eigenvalues appear at the vertices of two families of regular

polygons with 18 sides, which are symmetric with respect to the vertical axis—

164 ERROR ANALYSIS

iSIttll

V^VfC^Vf^'-:.

;^Φ**ΐ-:

■%$m

-05 -0.4 -0.3 0.2 -0.1 -0.4 -0.3 0.2 -0.1

U= 10

. ■A T

L i i#" 4

10s 10s

Perturbed spectra

hence the 36 spikes. Note that, although the spectra for v = 102 and 103 are

qualitatively similar, the one for v = 103 is more ill-conditioned than the one for

v = 102. The fact that the matrix approximately behaves like a Jordan block of

size 18 (rather than 20, for example) is a consequence of the particular structure

of 5V. It remains true under normwise perturbations (Fraysse 1992).

The perturbed spectra are part of the componentwise ε pseudo-spectrum, with

40

£ - 2 " . However, the information on the underlying Jordan structure, in finite

arithmetic, of Av (which is diagonalisable in exact arithmetic) is too detailed to

be retrievable by looking at the global pseudo-spectrum. The interested reader

can find other suggestive examples in Chatelin (1989).

Example 4.2.11 shows that truly difficult problems arise when the d.f.n. is

A PRIORI ANALYSIS OF ERRORS 165

unbounded under the variation of some parameter. The parameter can be merely

the size n of the matrix, as is the case when the matrix is the discretization of a

highly non-normal operator.

The true difficulty of computing the spectrum in the absence of normality has

been known to mathematicians for a long time. Its practical implications should

not be overlooked, since such examples appear in essential industrial etpplications,

as well as in physics—fluid dynamics and plasma physics, for example.

Let λ be an eigenvalue of A of algebraic multiplicity m, geometric multiplicity g

and index <f. Then

M = Ker (A - λΐγ and E = Ker (A - λΐ)

are the associated invariant and eigensubspaces respectively.

Put A = A + ff, where \\H || = ε; c will denote a generic constant.

there exist m eigenvalues {μ·}7 of A (each counted with its algebraic multiplicity).

PROOF Let Γ be a closed Jordan curve isolating λ and lying in res (A). Consider

Ρ ' - Ρ = - - ί - ί [K'(z)-K(z)]dz,

2mJr

where

R'(z) = (A-ziy\

R(z) = (A-ziy\

R'{z) - R(z) = R'(z)(A - A)R(z).

Hence

||P' _ p|| ^ i i ^ l L azerx ||Ä'(2)|| ||R(z)||1||H|| = cs9

In L J

where meas Γ denotes the Lebesgue* measure of the curve Γ (see Exercise 4.3.1).

If ε is such that \\P'-P\\ <U then

dim ImP = dim Im F = m,

and A has m eigenvalues μ'. inside Γ.

We define Af' = Im F , the subspace invariant under A and associated with

166 ERROR ANALYSIS

ω(Μ,Μ') = 0(ε).

PROOF We have

ω(Μ,Μ') ^ max [||(P - P')P|| 2 , ||(P - P')P'U2]

^c\\F-P\\2^ce

by virtue of Lemma 4.3.1 and the fact that all norms are equivalent. In particular,

if xeM, then dist (x, Af') = Ο(ε), and if x'eM', then dist (χ', Af) = Ο(ε).

Lemma 4.3.3 Ife is sufficiently small, then P' defines a bijection ofM on to M'.

PROOF Let F be the map P\M:M-+M'. Suppose that xeM and ||x|| = 1. Then

11 - HP'xIl | = I IIPxIl - HP'xIl I < ||(P - F)Px ||

^ ||(P-P')PK±

if ε is sufficiently small. Hence

HP'HH and HP"1!!^.

We note that A]M and F~ 1A'F are maps of M into itself. Let B = Y*AX and

B' = Y*F~1A'FX be square matrices of order m representing these maps in a

chosen basis X of M and an adjoint basis Y.

sp(B) = {A), sp(F) = {/ii}7

and

\\Β-Β'\\=0(ε).

PROOF It is evident that sp (B) = {λ}. Let φ'εΜ' be an eigenvector of A' associated

with μ'; thus Α'φ' = μ'φ', and hence

(Ρ'-1Α,Ρ')Ρ'-ίφ = μ'Ρ'-1φ\

that is

£ V = μ'η',

where ^' = Y*F~^' and iy' ^ 0 because F~ ιφ' Φ 0. We conclude that sp(J3') =

= γ*Ρ>-ΐρ'(Α-Α')Χξ,

l

since AF = Ρ'Λ' and F~ FAX = ΛΧ.

A PRIORI ANALYSIS OF ERRORS 167

Therefore

\\B-Bf\\^ce.

Let / be a function of a complex variable z, and suppose that / is holomorphic

in a neighbourhood οϊλ, which constitutes the spectrum of B. By using Cauchy's

integral formula (2.2.6) we define

f(B) = ^{f(z)(B-zyldz.

Lemma 4.3.5

m

(Β' - ziy l-(B- zl)-l = (£' - ziy l(B - B'){B - ziy1.

Hence

because, for sufficiently small ε, the contour Γ contains sp(2*'). Finally, we apply

the inequality

-|trC|^p(CK||C||

m

to the matrix

C =/(£')-/(£).

(a) maxlA-zi^Ote 1 ") (4.3.1)

where

m

1

PROOF

(z — Xf(B — zl)'1 is holomorphic inside Γ, it follows that

f(B) = (B-XlY = 0.

168 ERROR ANALYSIS

f(B')x' -f(B)x = U(B') ~/(B)]x + /(£')(*' - x).

If η' is an eigenvector of B' associated with λ\ we have

/(£>?'==(A'-A) V

We deduce that

\λ - i f ||ff'||2 ^ \\f(B) -f(B)\\2 ||x|| 2 + ||/(F)|| 2 dist(iy'.C-) ^ cs,

by virtue of Corollary 4.3.2, because C m is invariant under B and associated

with λ.

(b) Choose f(z) = z and apply Lemma 4.3.5. Thus

m

Let λ' be one of the eigenvalues of A' inside Γ. Put

Ek = Ker (A - λΙ)\ Ε) = Ker (Α' - λ'ΐγ,

where

1 ^7 < k ^ t ^ m.

disX(x'pEk) = 0(e{k-j+w).

<£" = Ek®Fk,

and let Π be the projection on Ek along Fk. The equation

(/I - klfx = y

has a unique solution in Fk.

Let x'jeE'j and ||X;|| = 1. Put

xk = Πχ;,

and so

x;.-x fc = ( / - n ) x ; . e i v

Now

*i - x t = i(A - urtF j - H^ - A/)"(X;. - x»).

Hence

\\χ) -xJ^cUA- XI)k(x'j - xk) \\=cUA- Xlfx) ||.

A PRIORI ANALYSIS OF ERRORS 169

1= 0

and

(A - Xlf = l(A' - ΧΊ) + (Χ' - X)If = X Ck(X' - X)\A' - Λ7)*-\

i=0

We deduce that

|| (A - Xlfx) - (A'-Xlfx) II ^ce.

Also, when j < k, we have

||i = fc-j+l ||

k j+

^c\X-X'\ ~ \

because (A' — A'/)*""'*} = 0 when i < k —j.

Since | X — X'\^ cel,e we have

dist(x;.,£k;<iix;.-xj|2

< c || (A - Xlf - (A' - A/)k]x; + (A' - Xlfx) || 2

£ 1 = =£ = Ker(A-A/).

Then

dist(x/1,£) = O(e1/0·

The distance between the eigenvectors is of the order of the distance between

X and the individual eigenvalues, while the distance between the invariant

subspaces is of the order of the distance between X and the arithmetic mean X'.

πιίημ-μ'.Ι^Οίε^). (4.3.3)

i

PROOF We remark that (4.3.3) is an improvement of (4.3.1) only if g/m > 1/7.

Suppose now that m < g£, which is satisfied when the Jordan box Βλ, associated

with X9 contains g blocks of different sizes.

We choose a Jordan basis of B in M. Then B' is similar to C = Βλ + εΚ, where

εΚ is the perturbation matrix induced by H, and we may suppose that

|| K || = 0(1). We shall show that there exists at least one eigenvalue μ' of C (and

170 ERROR ANALYSIS

\λ-μ'\^α?ΐΜ.

Consider the characteristic polynomial

π(μ') = άα(μΊ-σ + λΙ)

whose zeros are the eigenvalues μ\ — λ of the matrix C — XL The constant term

is the product of the roots

m

t= 1

superdiagonal and m— 1— (g— \) = m — g units. The constant term in π(μ')

cannot contain terms in ε*, where k < g, because m — g + k = m is equivalent to

k = g. On the other hand, there exists at least one perturbation for which the

constant term contains a term in ε9. Hence

Πμ-μίΙ^ε*

/= 1

and there exists necessarily at least one μ' such that | λ — μ'\ ^ ce9,m.

The bounds which we have established determine only the order relative to ε.

Next we shall establish a posteriori bounds, which will enable us to estimate the

constants.

We use the knowledge of approximate eigenelements in order to obtain bounds

for errors. In order to be useful these bounds must be reasonably easy to compute

as functions of the known eigenelements.

4.4.1

In this section, the norm || · || on <C" is supposed to be monotonic, that is for each

diagonal matrix D = diag(A1,...,yln) the induced norm satisfies ||D|| =max,|A l |.

The Euclidian norm and the norm || · || ^ are monotonic.

The definition of monotonicity we have given is equivalent to the following

characterization. Let x = (£,-} and y = (η^ be vectors in C"; then || · || is monotonic

if and only if | ξί{\ ^ | ηίf | (i = 1,..., m) implies that || x || < y || (see Horn and Johnson,

1990, p. 310). Let a be a scalar and let u be a vector such that ||M|| = 1. In order

to test how accurately these data represent eigenelements of A, it is natural to

A POSTERIORI ANALYSIS OF ERRORS 111

able, then A = XDX ~ l\ if not, then A = XJX " \ where J is the Jordan form of A.

Theorem 4.4.1 Let a and u be given, where \\u\\ = 1. Put r = Au — OLU. Then there

exists an eigenvalue XofA of index ί ^ 1, such that:

(a) If A is diagonalisable,

|A-a|^cond(J0||r||. (4.4.1)

|A

~ a | , , <cond(X)\\r\\. (4.4.2)

(Ι+μ-αΙ)'"1

respectively is regular.

(a) If A is diagonalisable, we have

r = A u - ocu = X(D - aI)X ~x u

l = \\u\\ = \\X(D-*I)-lX-lr\\

llrll

^condW

min

AespM)l^- a l

(b) When A s defective, we have

i-WuW^wxv-ziy'x-'rw

ι

^ cond (X) max Ο + Ι ^ + ^ ^

AespM) \X — 0L\f

is an eigenvalue of A of index Y. We remark that when λ is defective {ί > 1),

the maximum is not necessarily attained for an eigenvalue λ that is closest

to a. We remark also that (4.4.2) often reduces to (4.4.1) when λ is semi-simple

Theorem 4.4.2 When A' = A + H, then for each eigenvalue X of A', there exists

an eigenvalue Xof A, of index ί^\, such that:

(a) If A is diagonalisable, then

| A ' - A | < cond (JO || fl ||. (4·4·3)

(b) If A is not diagonalisable, then

μ'-Λ|'

— ^cond(X)||tf||. (4.4.4)

(ΐ + μ ' - λ | >

172 ERROR ANALYSIS

for a the eigenvalue X of A and for u an associated eigenvector x' such that

|| x j = l;then

Ax' — λ'χ = r = — Hx'

and

l|rKI|H||.

The inequality (4.4.3) is known as the Bauer~Fike (1960) theorem.

V0 2 / No

where | | I I | L = land M t i - i i | L =10" 1 0 .

Nevertheless, a = 1 is not close to the double eigenvalue 2 (!); one suspects the

factor condiX) that appears in (4.4.1). The reader will verify that

1 1 \/2 l\/l 1010N

A =

\ o — i0"loyvo 2 Λ 0 - 1 0 1 0

The number cond(X) = \\X \\ \\X~11| is the condition number (with respect to

inversion) of the matrix X of the Jordan basis (of eigenvectors) of A. This number

is often taken as a measure of the conditioning of the spectrum of a matrix A; this

is justified by Theorem 4.4.2. When A is normal, cond2 (X) = 1; this confirms the

fact that the eigenvalues of a normal matrix are always well-conditioned.

Theorem 4.4.1 furnishes bounds that are valid whatever the value of ||r||.

However, in order to estimate condiX) for a non-normal matrix, we have to

know an approximation to a Jordan basis (or of a basis of eigenvectors). When

|| r || is sufficiently small we shall show in what follows that there are bounds that

require only the knowledge of a single approximate eigenvector. We shall present

this result in the more general case when an approximate basis of an invariant

subspace of dimension m is known.

Two situations will be treated successively:

(a) Only the approximate invariant subspace is known, and we associate with it

the Rayleigh quotient matrix.

(b) The approximate invariant subspace is known that belongs to a neighbour

ing matrix A' = A + H, which is also known.

4.4.2

Suppose we know U — [u l 5 ..., uM], which is a basis of the subspace M close to

the invariant subspace M; we suppose further that the vectors are normalized by

A POSTERIORI ANALYSIS OF ERRORS 173

where Y is a given matrix. The matrices U and Y are augmented to [C/, (/] and

[Y, J ] so as to become adjoint bases of <Cm. Relative to these bases, A takes the

form

and Y, and

D = YM U, R0=Y*AU and S* = Y*A U.

The right residual matrix is defined as

R = AU - UB = (/ - t/γ*μι/ = L/K0.

Similarly, the left residual matrix is given by

shall establish precise bounds as a function of \\R\\, where || · || now denotes the

spectral norm or the Frobenius norm. First, we introduce the following notation:

X is the basis of the subspace M invariant under A, normalized by Y*X = /;

B = Y*AX represents the map A^M:M-+M with respect to the adjoint bases X

and Y; Σ = U(D, B)Y* is the partial block inverse with respect to sp(B), which is

defined if and only if dist min [sp (D), sp (B)'] > 0; Θ = diag (0,), i = 1,..., m, is the

diagonal matrix of the canonical angles between M and M. We put

y=||f||, p=||K||, s=||S|| and w=||LR||.

Theorem 4.4,3 //sp (D) n sp (B) = 0 and ifysw < \, then there exists a basis X of

M normalized by Y*X = / such that

\\U-X\\^g(s)w

and

||B-B||«0(e)sw,

where ε = ysw and 1 < g(e) < 2.

Y*K = 0, we have %R = J^ 1R. The function g was defined in the formula (2.11.2).

\\U-X\\^2yp and \\B-B\\^2ysp.

g(e)<2.

174 ERROR ANALYSIS

||tan©|| p , when X is normalized by U*X = I; moreover, ||£|| p = ||(D,B) _1 ||p.

Since 1 < g(e) < 2, the knowledge of y = || Σ || is necessary only to ensure that the

condition ε < \ is satisfied.

The bounds of Theorem 4.4.3 are optimal with regard to the data U and Y;

this will be demonstrated in the following example.

\a l/b)

and let u = y - ev Then e[Aex = 0 and exe\ is the orthogonal projection on the

direction lin {e^. We have

and

||Z||=y = |fc|, 5*Ir=-a2fe, ||ΣΓ|| = w = |ofr|.

2

The eigenvalues of A are A = (l/2fc)(l ± ^/l — 4α ΊΡ). Let Ö be the acute angle

between the eigenvectors and eue2. Then

tan Ö = —.

Ifll

We put ε = y \\ s \\ w = (ab)2. Than the bounds given in Theorem 4.4.3 are attained:

μ| = g(e)\b\a2 and tan0 = g(e)\ab\.

The bounds that were established in Theorem 4.4.3 can be computed from the

solution W of the equation

(I-UY*)AW-WB = R,

and they apply when p is sufficiently small. It should be borne in mind that it is

w and not p that serves as a good indicator for the quality of the approximation

of X by U and of B by B. If Y also happens to be an approximate basis of the

left invariant subspace, then s, too, is small and the Rayleigh quotient B

approximates B to the second order 0(sw).

In the very special case in which A is almost triangular, it is possible to obtain

bounds that can be computed from the eigenvalues without knowledge of the

approximate eigenvectors (Exercise 4.4.3).

4.4.3

We now suppose that we know the matrix A — A + /i, close to A, for which M'

is an exact invariant subspace with a given orthonormal basis Q'. We put

A POSTERIORI ANALYSIS OF ERRORS 175

Let Σ' = ß ' ( F , B')~ lQ*; the right residual matrix for A, based upon B' and ρ',

is defined as

AQ' -Q'B' = (A - A')Q' = - HQ'.

We put / = ||Σ'||,s = ||Q'*AQ'Q'*|| and ί = ||β'||,« = IIQ*II; Θ is the diagonal

matrix of canonical angles between M and M'.

there exists a basis X ofM, normalized by Q'*X = / such that

||Χ-β'Κ2||Σ7/ρ'||

and

\\B-B\\^s\\X-Q\\+u\\HQ!\l

where B = Q'*AX.

G :w-ν Λ + Σ'[κρ*Λκ+ //Κ- κρ'*//ρ'],

where Kx = - ΣΉζ)'. It can be shown, that when || H || / ( l + iu + 2/si) < £,

(a) The sequence {Vk} defined in (2.11.3), with Y= Χ' = ρ', satisfies

11^11^211^11.

(b) G' is a contraction map in the sphere

α^{ν\\\ν\\^2\\νχ\\γ

(c) G' has a unique fixed point V in 0$ which satisfies

χ = ρ'+κ, ΛΑ^ΧΒ, ρ'*χ = /.

For a proof the reader is referred to Exercise 2.11.2. It is easy to deduce that

ΙΙ*-6ΊΙ = ΙΙ*ΊΙ^2||Σ7/ρι.

On the other hand, the identity

B-B' = Q'*AX - Q'*A'Q

= Q'*A(X-Q') + Q:*(A-A')Q:

= ο:*Ανν*(χ-<ζ)+ο:*Η<2'

furnishes a bound for || B — B'\\.

176 ERROR ANALYSIS

Corollary 4.4.6 With the Euclidian norm and under the hypothesis that

iiHii2<Ky'0+y'*)]~ 1

||tan0||2^2y'||tf||2

and

||B-BJ2^(2y's+l)||H||2.

The bounds established in Theorem 4.4.5 and Corollary 4.4.6 can be computed

from the solution W = ZHQ' of the equation

(/ - Q'Q'*)A'W - W'B' = (/ - QQ*)HQ'.

Finding an estimate for / can turn out to be costly when the matrix A' is not

normal. In Section 4.6 we shall get to know the simplifications brought about

by the assumption that A' is Hermitian.

4.4.4

Having established the a posteriori bounds for || B — B || and \\B — B' || one would

like to deduce bounds for the distance between their respective spectra. This is

a very difficult question for which no solution has yet been found that is

sufficiently simple to be altogether satisfactory.

In Exercise 4.3.2 the following qualitative result will be established:

dist [sp(/4), sp (A')-] ^ c \\A - A'\\1/n,

if || A — A'\\ is small enough, where n is the order of A and of A'. The constant c

is difficult to estimate in the general case (see, for example, Ostrowski, 1957) and

is of little use in practice because it increases with the order n of the matrix. The

exponent 1/n is deleted when A and A' are assumed to be diagonalisable. In

particular, when A and A' are Hermitian,

dist[sp(/l),spM')]<|M-^|l25

as we shall see in Section 4.6.

The simplicity of such a result is preserved even in the context of arbitrary

matrices if we replace the distance between the spectra by the distance between

the arithmetic means of the eigenvalues:

A = -tri4, ;: = - t r / l \

n n

A IS ALMOST DIAGONAL 177

In fact,

\l-l'\ = -\tr(A-A')\^\\A-A'\

n

Let us now return to the matrices B, B and B' in which we are interested. We

denote their spectra by

{μ,.}™ KJ7 and M7

respectively, and their arithmetic means by

m i m i nii

\μ-ζ\^\\Β-Β\\ and \μ-μ'\ ^ \\B- B'\\.

In particular, when m = 1,

U-CI^2M||2||Er||2, r = Au-Cu,

u*u = 1, C = u*Au

and

μ-ΑΊ^[2Μΐι 2 ||ΣΉ^ΊΙ 2 + ΙΙ^Ίΐ2], *"V = i.

We conclude this section on bounds for a posteriori errors by connecting it

with a related problem, namely the localization of the eigenvalues of a matrix,

that is the determination of the region in the complex plane in which the required

eigenvalues are situated. The localization obtained depends on the data at our

disposal. For example, if we wish tofindthe simple eigenvalue λ of Λ, the localiza

tion problem might take the following form.

Given the vectors u and y such that y*u = 1, and also the complex number

σ = y*Au, is it possible to determine the radius of the smallest disk centred at σ

(or u) and containing λ (or x) in such a way that Ax = λχ, χΦθΡ. We may also

wish to obtain localization results by starting with a given vector u and a given

scalar a. The results we shall establish will provide partial answers to this

question in the general context when it is desired to localize together a set of m

eigenvalues. In Section 4.6 we shall return to this problem of localization in the

case of Hermitian matrices.

For a diagonal matrix, the bases X = Y= In are the bases of right and of left

eigenvectors respectively. We shall establish localization results for the eigenvalues

of a matrix that is close to a diagonal matrix; this will be based on the following

general result.

178 ERROR ANALYSIS

Theorem 4.5.1 Each eigenvalue ofA= (au) lies in at least one of the Gershgorin

disks

off-diagonal part. The theorem is true when λ = au for at least one index i.

Suppose now that λ φ aih i = 1,..., n. Then / / — D is regular and we have

λΙ-Α=λΙ-ϋ-Η = (λΙ- D)U - (λΐ - D)- lHl

The condition || (λΐ — D)~x || || H || < 1 is sufficient for λΐ — A to be regular. Hence

for each eigenvalue λ of Λ, we have

Corollary 4.5.2 If each pair of the n Gershgorin disks has an empty intersection,

then each disk contains exactly one eigenvalue of A, which is therefore simple.

PROOF By virtue of the hypothesis, the au are distinct. Put Α(ε) = D + εΗ when

Ο ^ ε ^ 1. When ε = 0, the disks reduce to the points au. By continuity, as ε

increases, each disk contains one eigenvalue as long as the disks remain disjoint.

When applied to a matrix A which is almost diagonal, that is for which

|| H ||^ = maXiEj^ilfl/jl is small, the above results enable us to find bounds for

\λ — au\ provided that the disks are disjoint. In some cases of non-empty inter

sections we can use diagonal similarity transformations on A in order to render

the disks disjoint (see Exercises 4.5.1 and 4.5.2).

We shall now establish a result that enables us to treat the case of a matrix A

that is close to a block-diagonal matrix.

1 m

β = ~Σ Pi

rn i= i

By suitably permuting the rows and the columns of A it is possible to obtain a

partitioning

Alx Al2

A2\ A22J

Im/l-tMnl^M^II*,

A IS ALMOST DIAGONAL 179

where, generally,

ΜΙΙ* = ΣΚΙ·

AQ = QT,

and let X be the nby m matrix formed by the first m columns of Q: AX = XTlx.

Put

t r T u =tr X* AX = mfi.

Let

Λ U,...,m)

be the m by m matrix extracted from X by selecting the elements in the rows

iah*--»im anc* the columns 1,2,...,m.

We permute the rows of X in such a way that

*-G::}

where X u is the square matrix with the property that

We partition A in the same fashion; thus

^ l i ^ n + AxlX2i = ΧπΤη,

whence

=

^11 + ^12*21^11 ^11^11^11 *

It follows that

tr^-trT^-tr^X^*-/).

Put

^21=^21^11 Ο Γ

^21*11 =

^2V

s

By Cramer's* rule, the general element of y 2 i *

/l....,ft.....*\

Vl,.-7,··.,«/

n,= detX n

180 ERROR ANALYSIS

and so

m n

|tr>t 1 2 y 2 1 | = Σ Σ wu

j'=lk = r+l

m n

^max|)>fcj.| £ X |fljjkl^Mi2ll*

block-diagonal matrix (see Exercise 4.5.3).

4.6 A IS HERMITIÄN

Numerous simplifications occur in this context, which enable us to obtain more

precise results.

We recall what Theorem 4.4.1 becomes: if a and u are given such that || u\\2 = 1

and r = Au — OLU, there exists an eigenvalue λ of A such that | λ — α| ^ || r || 2 . The

Rayleigh quotient p — u*Au, constructed with w, possesses the following optimality

property.

min \\ Au — zu \\2, u fixed, u*u=\,

is solved by p = u*Au.

PROOF We have

\\Au-zu\\22= (u*A* - zu*)(Au - zu)

— u*A*Au — zu*A*u — zu*Au + zz

= u*A*Au + (u*Auu*A*u — u*Auu*A*u) — zu*A*u — zu*Au + zz

= u*A*Au — u*Auu*A*u + u*Au(u*A*u — z) — z(u*A*u — z)

— u*A*(Au — uu*Au) + \u*Au — z| 2 ,

The minimum of which is attained for z = u*Au = p.

In particular, Theorem 4.4.1 now asserts that, given u such that || u || 2 = 1, there

exists an eigenvalue / of A with the property that

μ - p K M u - p n ||2.

A IS HERMITIAN 181

This property is often attributed to Krylov (Krylov and Bogolioubov, 1929) and

Weinstein (1934).

If additional information is available about the distance of p from the other

eigenvalues of A, then the Krylov-Weinstein inequality can be improved; this

will now be done after a preliminary lemma. We put

e= \\Au-pu\\2.

Lemma 4.6.2 Let a and b be two real numbers such that a<p<b and suppose

that the open interval (a,b) contains no eigenvalues of A. Then

(b-p)(p-a)^e2.

t h e n | M | 2 = |M| 2 = l.If

n

n

v = Σ to,

i=l

(A - bu)*(Au - au) = (DO - bv)*(Dv - av)

= Z(ft-*>fa-*)liil 2

= ε2 + ( ρ - ί ? ) ( ρ - α ) ^ 0 ,

because (μί —ft)(ju,·— a) > 0 for all i.

We denote by Θ the acute angle made by the direction of u and the eigen-

subspace M associated with λ (see Figure 4.6.1).

Figure 4.6.1

182 ERROR ANALYSIS

Theorem 4.6.3 Suppose the open interval (λ,λ) contains p and precisely one

eigenvalue λ of A. Then

ε2 ε2

p-= <λ^ρ + - (4.6.1)

λ—p p—λ

and

2 1/2

. + ÄV 2Ί

sin0<: Ρ-^Γ1) + «Ί (4-6.2)

'λ-λ

(a) If λ < ρ < I, put a = λ and b = I. Then

p-= ^λ<ρ.

λ-ρ

(b) If λ < ρ < λ, put a = λ and ί? = λ. Then

ε2

ρ<Λ<ρ+

Ρ~λ

Hence (4.6.1) is always true when λ < p < λ. With the notations of Lemma 4.6.2

we have

(Au - ku)*(Au - 'λύ) = ε2 + (p - λ)(ρ - λ)

= (Dv - λν)*(ϋν - λν)

i= 1

conclude that

that

Pu

1

II Pull 2

and so

£1=cos0, i 2 = ... = {m = 0.

Hence

ε2 + ( ρ - Α ) ( ρ - Ι )

in2 0 = 1 - cos 2 Θ ^ 1 + -

sin

(λ-λ)(λ-ϊ)

A IS HERMITIAN 183

Since

(W

It follows that

siπ^,

'*(ή)T( ! i i ) ^+ε,+0 '- ä,( ''- 3, }

whence (4.6.2) is readily deduced.

^ = dist[p,sp(/l)-μ}] = min[|p~μ|,μ€spμ)-{yl}].

ε s

\λ — ρ\^ιζ and sin0^=:

δ δ

X= p—δ and 1 = ρ + δ.

The inequality (4.6.1) is known as the Kato-Temple inequality (Kato, 1949;

Temple, 1928). It improves the Krylov-Weinstein inequality when ε2<(λ — p)·

(p —A). This inequality is often used to obtain the bounds λ and λ.

/ 1 10"5 1(τΛ

5

A = 10" 2 10" 5

1(T 5 1(T 5 3

On putting M = ef (i = 1,2,3) we obtain |Af· — i\ ^y/ΐ x 10~ 5 by using the

Krylov-Weinstein inequality. Next, on applying (4.6.1) we obtain

2xl0~10 , 4

5

l-V^xlO"

M <*> 2xlO"10

l-^/lx 10"5

ERROR ANALYSIS

, , , 2χ10~10

\-yfix 10"5

We shall now turn our attention to the approximation of a set σοί m eigenvectors

of A by means of the spectrum of a Hermitian matrix C of order m. Let Q be a

basis consisting of m orthonormal vectors. The residual matrix associated with

C and Q is given by R(C) = AQ- QC. The spectrum of C is denoted by {aj?.

Theorem 4.6.5 There exists an ordered set ofm eigenvalues {μ^™ of A such that

(a) m a x | f t - a i | < | | A ( C ) | | 2 , (4.6.3)

i

PROOF

A = G*AG = (B S

*°S

'B-Cs

R(C) = G*R(C) = [

where

||Ä(C)|| 2 = ||Ä(C)|| 2 ,

and

Ö = G*G

-(';>

We now adopt the basis defined by G and apply the dilation theorem (see

Exercise 4.6.1) to the matrix R(C); thus we construct the matrix

H = H* = (S-C S

*\

lltf|l2 = IIÄ(C)||2.

A IS HERMITIAN 185

A-H-lC ° )

\0 E-WJ

is denoted by {aj", with the proviso that

{<*,.}? = sp(C).

By Weyl's theorem (Exercise 1.9.6) there is a set {μ,.}? of eigenvalues of A

such that

Ιμ,-α,Ι < ||if ||2 = |lR(C)||2 (/=l,...,m).

(b) We diagonalise A and C; thus

F M ^ = D = diag(^,·),

i/*Cl/ = A = diag(ai),

and we put

Q = V*QU, R(Q = V*R(C)U.

Hence

R(C) = D ß - ß A

and

||R(Q||F=||R(C)||F.

We may therefore confine ourselves to the case in which A and C are diagonal

matrices D and Λ respectively.

Next, consider the change of basis

ο=[ο,6]=(0,;)·

We put

ΗΌ=|0Ο·Ι2> ß = (4«A

and

_j(^i-aj)2 whenl^j^m, l^i^n.

tJ

\o when;>n, Ui$«.

Put R = R{C) = DQ- QA. Then

|| R || I = tr K *K = tr (Q*D2Q - AQ*DQ - Q*DQA + Λ2)

m / n n

= Σ Σ^Ι^·Ι2-2Σ«Λ·Ι^Ι2 + «,2

m n n

= Σ Σ Wy(M!-a/, because X w 0 = 1

j=li=l i=l

R ft

=

Σ Σ wi/A/> because dy = 0 when j >m.

i=lt=l

186 ERROR ANALYSIS

find aa ιιιαιιΐΛ

matrix rvW=(w.j) which uniiniiiz.c5

= yw..) wniuii minimizes ||\\R\\?

rv [ under the

constraints

of order n. In fact, it is easy to verify that the minimum of

Σ Σ wudu

is attained for

Σ du.

Hence

min\\R(C)\\2F=YJ(pi-oii)\

In particular, suppose that the basis Q is chosen to be the basis of eigenvectors

Q' of A' = A + H, and that C is taken to be B' = Q'M'Q'. Then

R(B') = (A-A')Q\

and if {μ·}7 = sp(ß')> there exists a numbering of the eigenvalues of A such that

«' »= i

possesses the following extremal property.

(p = 2 or F), when Q is fixed and such that Q*Q = I and when Z ranges over <Cm x m.

PROOF We have

(AQ - QZ)*(AQ - QZ) = Q*A2Q - Z*Q*AQ - (Q*AQ)Z + Z*Z

= Q*A2Q + (B - Z)*(B -Z)-B2

= (AQ - QB)*(AQ - QB) + (B - Z)*(B - Z).

Put

F = AQ-QZ, G = AQ-QB, H= B-Z.

1/2 i,

(a) ||F|| 2 = p (/ *F); since H*H is positive semi-definite, it follows that

p(F*F)^p(G*G) (see Exercise 1.9.1).

A IS HERMITIAN 187

In both cases the minimum is attained when Z = B.

The interested reader can refer to Exercises 4.6.5 and 4.6.6 where extremal

results with regard to the approximation of the eigenelements of A by those of

B will be found; these results complement Lemma 4.6.6. As in the preceding

section, the inequalities of Theorem 4.6.5 can be sharpened for C = B, if we

possess supplementary information about sp(2*).

Suppose we know the spectral decomposition of B: B=VAV*, where

Δ = diag(pi). Put U = QV; then

\\R\\F=\\AQ-QB\\F=\\AU-UA\\F

/ m \l/2

where e, = \\Α^-ΡΜ\\2.

Theorem 4.6.7 Let (λ, J) be an open interval that contains sp (B) and precisely the

set σ of eigenvalues of A. Then there exists a numbering {μ^™, of these eigenvalues

such that

ε2

pi- Z f ^ ^ i ^ i + Σ

(i=l,...,m).

PROOF The reader is referred to Kato's article (1949). We remark that in the

denominator there are no quantities of the type pf — pj9 which could be small in

the case of neighbouring eigenvalues.

The above inequalities use the knowledge of a basis of eigenvectors of B.

We deduce from them other inequalities that are less precise but require only a

knowledge of \\R\\F and (A, I).

\\R\\2F ^ ^ , \\R\\2F

ιηιη,.μ-ρ,.) maijiPi-Q

0 = l,...,m).

δ = dist min [sp (£), sp (A) — &].

188 ERROR ANALYSIS

If\\Rh<S9then

maxlp.-^.l^-^-A

i O

PROOF Evident.

The various results that we have enunciated in this section generalize those

established in Section 4.6.1 which deal with the relationship between a single

eigenvalue and the Rayleigh quotient. In order to complete the analogy it only

remains for us to mention the results which enable us to find bounds for the

matrix of the canonical angles between eigenspaces.

Among the results proved in Davis and Kahan (1968) we mention those that are

most relevant for our purpose.

(a) σΐ8 approximated by sp(£) = {pJ7 anc * ® denotes the diagonal matrix of

canonical angles between the subspace generated by the given basis Q and

the eigenspace associated with σ. Let R = R(B).

δ= dist min [sp(#),sp(/l) — &].

Then

||sin©||p^^ (p = 2 o r F ) .

o

(b) a'\s approximated by sp(£') = {μ[}™ and Θ denotes the diagonal matrix of

canonical angles between the eigenspaces of A and A' respectively. Let

R = R(B).

δ' = dist min [sp (&), sp (A) — &].

Then

||sin©||^^ (p = 2 o r F ) .

o

The interested reader is most strongly advised to refer to the article by Davis

and Kahan (1968), which contains a large number of other results.

The residual matrix R(C) = AQ — QC is computed from an orthonormal basis

Q. In fact, the bound (4.6.3) can no longer be guaranteed in the absence of ortho-

A IS HERMITIAN 189

independent unit vectors that are not orthogonal. The singular values of U lie

between 0 and yfm; the least singular value of U, say tfmin((7), is a good indicator

of the linear independence of the columns of U (see Chapter 1, Section 1.8). Let

R(Cy U) = AU — UC be the residual matrix associated with C and U.

Theorem 4.6.12 For given matrices C and U, there exists an ordered set of m

eigenvalues of A such that

{

<r min (L0

be the decomposition of U into singular values, where D = diag (σ,,..., σ„) and

Y and X are unitary matrices of orders n and m respectively. Put

A = YAY*, C = X*CX, R(C, U) = YR(C, U)X.

Then

R(C,U) = ÄU-UC

and

||Ä(C,l/)||,= ||Ä(C,l/)||„ (p = 2 o r F ) .

Consider the partitioning

(B S'*\

whence

'£7)-DC

^(C,C7):

1

S"D

(a) On applying (4.6.3) to A, we obtain

m a x l ^ - a . - K \\L\\2,

where

Now

■<r)

\\L\\=\\L*L\\

11 f

Ί2 =

2*i\V

II ^ ^ II2 ^ II ** -C\\

~ ^112 +

"^ H \S\\%

^ II2

190 ERROR ANALYSIS

a||S'|| 2 <||S'D|| 2 , a\\B'-C\\2^\\B'D-DC\\2.

Hence

\\L\\l^\(\\B'D-DC\\22 + \\S'D\\l)^-2\\R\\22.

o o

(b) It is clear that

\\B'D-DC\\l>a2\\B'-C\\l

(see Exercise 4.6.13). Hence

|| A \\l = || B'D- DC \\l + || S'D\\j

^a2(\\B'-C\\l+\\S'\\2F) = a2\\L\\2F

On applying (4.6.4) to A we obtain

The notion of spectral conditioning given here for a matrix that is not necessarily

diagonalisable generalizes the notion for a simple eigenvalue given by Wilkinson

(1965) or Golub and Van Loan (1989).

The thorough analysis of the influence of the departure from normality on the

spectral stability of matrices is presented for the first time.

The adopted framework is the traditional normwise stability analysis. The

componentwise analysis has been actively developed in the 1980s. See Geurts

(1982) and Fraysse (1992) for a theoretical account, and the LAPACK User's

Guide (1992) for a more practical viewpoint.

The random componentwise real perturbation of the matrix, defined in

Example 4.2.11, has been used in Chatelin (1989) to explore the topological

neighborhood of a Jordan form. Trefethen (1991) has used similar complex

perturbations in connection with normwise pseudo-spectra.

The influence of non-normality is spectacularly examplified in the two

following PhD theses: Godet-Thobie (1992) contains an industrial application in

aeronautics and Reddy (1991) studies the spectrum of the Orr-Sommerfeld

operator for the stability of parallel shear flows. Another example of spectral

instability is described by Kerner (1989). It concerns the Alfven spectrum for

decreasing resistivity in magnetohydrodynamics.

EXERCISES 191

Example 4.2.8 and Theorem 4.4.3 were inspired by Stewart (1971). The analysis

of a priori errors presented in Section 4.3 was previously given by Wilkinson

(1965) using Gershgorini's disks. The inequalities (4.4.2) and (4.4.4) were inspired

by Kahan, Parlett and Jiang (1982). The a posteriori bounds given in Theorem

4.4.5 are new. Theorem 4.5.3 was proved by Ruhe (1970a). The proof of Theorems

4.6.5 and 4.6.12, due to W. Kahan, is here published for the first time following

his report 'Inclusion theorems for clusters of eigenvalues of Hermitian matrices',

Computer Science Department, University of Toronto, Ontario (1967), kindly

supplied by the author.

EXERCISES

4.1.1 [A] Let A be a regular matrix. Show that there exists a matrix AA of

rank 1 such that A + AA is singular and

IM II2

»ΔΛ|| 2 <

cond2(A)

(aM = (

,0 ef

(b) A = (

:: r>

1 10* \

(c) Λ = (

,0 ε/

4.1.3 [D] Consider a rectangular matrix Ae<Cn xm, where n ^ m, and suppose

that the columns of A are linearly independent. Define

min{Mx|| 2 :||x|| 2 = l}

(a) Prove that if m — n, then κ2(Α) = cond2 (A).

(b) Prove that

\ min <xf /

192 ERROR ANALYSIS

(c) Let

A = (! 1 1

'■" 1

)G&"+]

Compute κ2(Λ).

4.1.4[D] Let A' = A+ εΗ, where ||H|| = 1 with respect to the induced norm

|| · ||. Suppose that A is a regular matrix such that

0<ε<

\\A'l\\

4.1.5 [C] The row scaling of the system Ax = b consists in solving the equivalent

system D~1Ax = D~lb9 where D is a diagonal matrix such that all the rows of

D~lA have approximately the same norm relative to || · || ^. Investigate the effect

of the scaling (in the arithmetic with base 10 rounded to three decimal places),

when the data are

10 105N

1 1

4

and

(T)

and when we take D = diag(10" ,1).

4.2.1 [A] Let λ be an eigenvalue of A and let δ be the distance of λ from the

rest of the spectrum of A. Let M be the eigenspace associated with λ and let g

be an orthonormal basis of M 1 . Define

B = Q*AQ9

and

Z1 = Q(B-U)-1Q*.

Let / be the index of the eigenvalue of B which is nearest to X. Prove that, if b is

sufficiently small, then

5-1^||(B-A/)-1|l2 = P1||2^2cond2(K^-/,

where V is the Jordan basis of B.

4.2.2 [A] Let Ae<Cnx\ee<£ and HeC" x n such that || H ||2 = 1. Define

Α(ε) = Α±εΗ.

EXERCISES 193

the associated subspace and φ an orthonormal basis of M. Define

0 = φ*Αφ,

Mm + n = j=V-*eVy the Jordan form of 0,

A/m + N = R = β*0β, the Schur form of 0.

Suppose that A is of index t.

(a) Prove that, for all k ^ 0,

||N f c || 2 ^cond 2 (n

(b) Prove that, for sufficiently small ε, there exist φ„ φ{ε) and 0(ε) such that

θ(ε) = φχΑφ(ε),

Φ:Φ=Φ:Φ(*)=ΙΜ,

Α*φ* = φ*θ\

\\θ(ε)-θ\\2^\\Ρ\\2\ε\ + 0(ε2\

where

IIHa-IWJIa.

P being the spectral projection associated with A.

(c) Let Α(ε) be the eigenvector of 0(ε) that is closest to A and suppose that Α(ε) Φ A.

Prove that

i^\\imm-erlie(z)-e^\\2

and deduce that

|A( e )-A|^[/cond 2 (F)||P|| 2 | £ |]^ + 0 ( | £ m

(e) Compare this with the property 4.2.3 (page 155).

4.2.3 [A] Let

Λ-(1 10

Ί and Δ^1 ° \

Verify that the balancing of A by Δ decreases the condition number of the basis

of eigenvectors as well as the departure from normality.

4.2.4 [B:25] Let x and x+ be the right and left eigenvectors associated with a

simple eigenvalue of a matrix A. Prove that if no normalization is imposed on x

194 ERROR ANALYSIS

csp(A) = ■m\-

\χ*χ\

4.2.5 [D] Suppose that A is diagonalisable in a basis V whose columns are unit

vectors in the Euclidean norm. Let Is,·!"1 be the condition number of the

eigenvalue Xt as it was defined in Exercise 4.2.4. Prove that when all eigenvalues

are simple, then

condF(K)=t|5ir1

i=l

l ^ | s i | " 1 < i [ c o n d 2 ( K ) + cond 2 (K)- 1 ].

4.2.6 [A] Show that the condition number of a semi-simple eigenvalue is of the

Lipschitz type while a defective eigenvalue is of the Holder type.

4.2.7 [A] Investigate the relative error of a non-zero eigenvalue.

4.2.8 [B:25,67] Suppose A is diagonalisable. Compare the condition number

csp(x) of an eigenvector corresponding to a simple eigenvalue with that deduced

from Wilkinson's formula

xjLHXf

xj(e) = Xj-e X xt + 0(ε2)

i = 1,ίφ j\(Aj — AtfXi^Xj

where x} is the eigenvector corresponding to λί9 \\Xj\\2 = II^j*II2 = 1, S is the

associated reduced resolvent and Α(ε) = A 4- εΗ, \\ H ||2 = 1.

Investigate ||χ/ε)|| 2 and comment.

4.2.9 [C] Compute the condition numbers of Chatelin and of Wilkinson

(Exercise 4.2.8) for

103

-G 1.1

and comment.

4.2.10 [C] Let

(I 104 o\ 1 104 0 \

A= 0 0 0 and A' = 1.1 x lO-5 0 0

.0 0 i 2xl0"5 0 1 Ύ 1

Verify that the basis X' of the invariant subspace M' of A which is associated

with the block a' = {— 0.1; 1.1} and normalized by Q*X' = /, where Q = (el9e2)9

EXERCISES 195

is equal to

( 1 θλ

X' = 0 1

\ 10" 3 /36 5/9/

4.2.11 [A] Investigate the departure from normality of the Schur form as a

function of the condition number (relative to inversion) of the matrix representing

the Jordan form when the block σα)η8Ϊ8ί8 of a double eigenvalue λ or two distinct

eigenvalues λ and μ.

4.2.12 [D] Verify that the Jordan form is numerically unstable. The computation

of the Jordan form is an ill-posed problem.

4.2.13 [A] Let A have the simple eigenvalue λ:

Ax = λχ (χφ 0),

Prove that

|ΔΑ| JlxJUIxll

lim

ΙΙΔΛ||-Ο||Δ/1|| |χ*χ|

where || · || # is the dual norm of || · ||. Deduce the value of the relative condition

number for λ Φ 0:

KW(A)= Ihn I ^ I J I L .

ΙΙΔΛΙΙ-Ο |A| ||Ai4||

4.2.14 [B:20,23] The space of matrices is now equipped with the relative

componentwise distance:

ε = ππη(ω;|Δ/1| ^ ω | Λ | ) with AA = A — A\

where the inequality is taken componentwise. Show that, for the same eigenprob-

lem, the relative condition number for λ Φ 0 is given by

1 |ΑλΜχ;||Λ||χ|

Κα(λ) = lim

ε-οε |Λ| |χ*χ||Α|

where \A\ denotes the matrix with the (ϊ, j)th element equal to |aiV|.

4.2.15 [A] Let y be an arbitrary vector, non-orthogonal to x. We suppose that

x and the perturbed eigenvector x' are normalized such that y*x = y*x' = 1, so

that

Δχ = — ΣΔΑχ lies in lin(y) 1 .

Show that

On ί££-|Σ||*|.

lAiin-o Ai4||

196 ERROR ANALYSIS

|Δχ|| Mil

/Cc(x)=||Z||MH= lim

ΙΙΔΛΙΙ-Ο ||χ|| ||ΔΛ||

4.2.16 [B:20,23] With the distance defined in Exercise 4.2.14, show that

KG(x) = lim

HIAxIL IIPPIIXIIL

ε-οε || x | IML

4.2.17 [C] Compute the normwise relative condition numbers defined in

Exercises 4.2.13 and 4.2.15 for the three norms || · ||s, || · || 2 , || * ||«, and the matrix

104

A=

2

Compute the componentwise condition numbers and compare with the normwise

ones obtained with || · || ^.

4.2.18 [B:20] Let λ\ x' be approximate eigenelements for A; r = Αχ' — λ'χ' is the

residual vector. The backward error is defined as the minimal size of perturbation

AA such that (A + AA)x' = λ'χ'. Prove that the normwise backward error is given

by IMI/(M|| ||x'||), for any subordinate norm, and that the componentwise

backward error is given by max 1 < i < n |r ( |/(|/l||x'|) l . Verify that the backward

errors are independent of the normalizations chosen for x'.

4.2.19 [D] We suppose that A is diagonalisable: A=XDX~ Prove the

following componentwise version of the Bauer-Fike theorem:

min ΙΔλΙ^ΙΐμΤ-ΜΐΔΛΐμτΐΙΙ

Xesp(A)

<ε|||Α Γ -ΊΜΙΙ^ΙΙΙ

for all AA s uchthat |ΔΛ|<ε|Λ|.

Apply th<s result to the highly non-noirmal matrix

f

2 109 -2xl09\

A= -10"9 5 -3 with ei|

9

^2xl0" -3 2 1

Check that

/l9 20 14^

Ι^-ΊΙ^ΙΙ^Ι = 20 21 15

1^14 15 l l y

and compare with || X"l \\ \\ X \\ \\A ||. Conclude.

4.2.20 [B:43] We suppose that the elements of A are complex-valued differentia-

ble functions of a complex parameter t varying in D a C The matrix A = A(t)

EXERCISES 197

admits the eigenelements λ = λ(ή, χ = x(t)9 defined in D. Let t0eD be such that

X(t0) is simple and A\ λ\ χ' exist. If we impose the Euclidean normalization

x*(t)x(t) = 1 in Z), then prove that the derivative x' at t = t 0 is

x' = (x*SA'x)x - SA'x,

where S is the reduced resolvent.

Suppose now that A(t) = A + ίΔΑ for ί in a neighbourhood of ί == 0 including

t = 1. Check that the first-order Taylor expansion around t = 0 is

x(i) = x + i[(x*SA4x)x - SAAx] 4- 0(t2).

Set Ax= A + AA, x(l) = x 1 , interpret geometrically the identity xx — x =

(I-P1) (~SAAx) = {I-P±)Ax, where P 1 = x x * , and Δχ = χ 2 - χ is the

variation on x induced by the Wilkinson normalization xjx 2 = 1, where χ # is

the left eigenvector

Check that

||χ 1 |Ι1 = 1 + ||(/-ί , ) 1 Χ2ΐΐ2>1·

4.2.21 [A] Let λ belong to the normwise ε-pseudo-spectrum of A. Show that

this is equivalent to any of the following three statements:

(a) There exists a vector y such that

\\(A-Xl)y\\^E\\A\l ||y|| = l.

«Mil

(c) For the Euclidean norm | | · | | 2 ,

<rmin(A-M)^e\\A\\2.

4.3.1 [A] Let

A' = A + H,

where \\Η\\2 = ε and R(z) = (A-zI)~l. Suppose the Jordan curve Γ isolates the

eigenvalue λ of A from the rest of the spectrum of A. Put

c(r) = max||K(z)|| 2 .

zeT

c(r)'

then the matrix JR'(z) = (A' — zl)~l exists and

c(T)

max\\ R'(z)\\ 2 <

zer 1 — yc{r)

198 ERROR ANALYSIS

4.23 [A] The distance between two finite sets σ and r is defined as

(^ fer see see tex J

dist [sp (A\ sp (A')-] ^ c \\A - A'\\ "\

where c is a constant.

4.3.3 [D] Prove that >4i->det A is Frechet differentiable and that its derivative

det' A is given by

(det^)H= £ aei{Au...,Ai_l,HhAi+u...,An)

where

A=(Al9A2,...,AJ and H = (/f l5 ...,H n ).

4.3.4 [B:ll] Using the notation of Proposition 4.3.8 prove that, for sufficiently

small ε,

= 0(e).

eigenvalue λ and let ε be a positive number. Using the notation of Proposition

4.3.8, show that, for sufficiently small ε,

= 0(8),

rw-um) = 0(8).

sp(A)nsp(B) = 0.

Show that, if Aeres(T),

(T - AI)" lX = -?- f (A - ziy XX[_B - (z - λ)Π " x dz,

2πυΓ

where Γ is a closed Jordan curve isolating sp(ß) from sp(4).

4.4.1 [A] Using the notation of Proposition 4.2.3 prove that

(1 + μ / - Α | ) ί / - 1 ) / / < 2 ,

provided that ||ΔΛ|| is sufficiently small.

EXERCISES 199

| A - 2 ' | < 11**112*2,

where λ' is the arithmetic mean of the m eigenvalues of A' close to λ of multiplicity

m and where ε2 = || A A || 2.

4.4.3[A] Suppose that the matrix A is sufficiently close to a triangular matrix

and no two diagonal elements are equal to au. Prove that the condition (*) of

Exercise 2.9.1 is satisfied when x = y = ex and the norm is || · || i.

4.4.4 [B:42] Consider the generalized problem

Ax = λΒχ,

where A and B are real symmetric matrices and B is positive definite. The space

Rw is equipped with the inner product

<w,u>B = MTJ3t;,

and || · ||fl is the norm associated with this product. Let

S = B~lA.

(a) Show that S is symmetric with respect to the inner product <·, · >B.

(b) Let μί < μ2 ^ · · · ^ μ„ be the eigenvalues of S; let /ieR and xeR n . Show that

1 1

min^-^ ^*"* (x#0),

f

11*11*

**o |μ.| ||Sx||B

(c) Let P; be the eigenprojection associated with λ(. Show that

min \μ] - μ\ \\χ - Ptx \\B ^ ||Sx - μχ\\Β.

(d) Define the Rayleigh quotient associated with < v> B and a non-zero vector x

as follows:

<Sx x >B

RB(S9x) = >2 .

Prove that

min || Sx - μχ \\B = || Sx - RB(S9 x)x ||B.

ß

and

oi<RB(Sux)<ß.

200 ERROR ANALYSIS

Show that

RB(S9 JC) - Δ, < μ( ^ RB(S9 x) + Δα,

where

Δ = \\Sx-RB(S9 xMl

lRB(S9x)-*]\\x\\2B

and

\\Sx-RB{S9xMl

(ß-RB(S9x))\\x\\2B

(f) Deduce the a posteriori bounds:

min|ft - RB(S9x)\ ^ lRB(s\x) - RB(S9x)2V/29

ft-KB(S,x)

mm 2

0.· Λ Ä B (s ,x);

4.4.5 [D] Consider the results of Section 4.4.2. Find bounds for

\\XX-X\\9 || X\- X\\ and H^-BU,

where

XX = V+W9 X\=Q!+W and Bl = Y*AXl.

4.5.1 [A] Let {d^\ be a set of n positive numbers. Show that for every eigenvalue

λ of A there exists an index i such that

1

Ι*-β„|<7 Σ \*iMr

1

10~ 4 0 ^

1

A = 210" 4

1(T 4

1(T 4 3

0

/

Use similarity transformations by diagonal matrices and Gershgorin's disks in

order to localize the eigenvalues of A around 1, 2 and 3 with a precision of the

order of 10~ 8 .

4.5.3 [D] Let A — D + H, where D is a block-diagonal matrix possessing a

single block of order r. The row indices of the block constitute the set /. Define

and

r isi r&

EXERCISES 201

(a) With the help of Theorem 4.5.3 obtain a bound of the type

|2-4|<-IIIH«illi.

TisI

(b) Complete the study of the localization of the eigenvalues of A by means of

Corollary 4.5.2 when \\Η\\γ is sufficiently small.

4.5.4 [B:51] Let A = (ay)e<C" xn and suppose that /ieC differs from all the

diagonal elements of A. Define the map

μ»-+[Κι(μλ...,ΑΛμ)],

where

*ι(μ))=ί>ιΑ

Λ|(μ)=Σΐ^ΐΓ^1+Σΐ^Ι (* = 2,...,n-l)

and

«.<,>-Σ'Ι«.Α

;=ι 1%-μΙ

The set

K, = {jieC:|n-a„| <«,(*«)} (i = 1,2,...,«)

is called the ith Gudkov region. The Gershgorin disks are denoted by Gf

(i = l,2,...,n).

(a) Prove that

sp(A)s\jKts\jGt.

i=l i=l

(b) Construct the Gershgorin disks and the Gudkov regions associated with the

matrix

i-l 1 0\

A=\ 1 1 1

\ 2 0 3/

and compare their precision of localization.

(e) Put Ki = Kt(A)9Gi = Gi(A) and D = diag(d1,...,i/w). Consider the minimal

sets

G(A)= Π Ü G i(ö"^D),

D>Oi=l

κ(/ΐ)= π lU.P""1^)·

D>Oi=l

202 ERROR ANALYSIS

4.6.1 [A] Let

- ( ;

where A is Hermitian. Show that there exists a Hermitian matrix W such that

T-■ = ( A

" )

satisfies

imi2 = imi 2 .

This result is known as the dilation theorem.

4.6.2 [D] In the case of a simple eigenvalue, establish an inequality, computable

with the help of the generalized Rayleigh quotient, as a function of

p = M M - ξ ι ι ||2 and s= \\A*v- <fc||2,

where

ξ = ν*Αιι and v*u = u*ii = 1.

4.6.3 [B:34] Let iieR", a e R and

r(a) = Au — an,

where u*u = 1 and A is symmetric. Define

J f = {H symmetric: (A — H)u = ati}.

(a) Prove that

min||ii|| 2 = ||r(a)|| 2 ,

HeJiT

min||//|| F = 2 | | r ( a ) | | 2 - ( a - « M M ) 2 .

HeJT

(b) Prove that if p = MMM and r = r(p\ then the minima are attained for

H = ru* + ur*,

and r(p) is the minimum of r(a).

For a non-symmetric matrix Λ the situation is as follows: Let u and v be

vectors in C n such that v*u = u*u = 1. For a e C define

r(a) = Au + an, 5(a) = /l*t; = öiv

and

^ = {H:(A - H)u = au,M* - H*)v = äi;},

EXERCISES 203

Then

min||/i|| 2 = max{||r(a)|| 2 ,||s(a)|| 2 } )

min \\H\\F=\\r(a)\\22+\\s(<x)\\22-(*-v*Au)2.

HeJfT

H = ru* + vs*9

where r = r(z), s = s(z) and z = Ι;*ΛΜ. However, z does not entail a minimization

of the residuals.

4.6.4[B:45] Let Ae<Enxn be a Hermitian matrix and let Q e C n x m be an

orthonormal basis of the space 5 = lin (Q), that is the space generated by the

columns of Q. Prove that the best set of m numbers for approximating sp(A) that

can be deduced from A and Q is sp (Q*AQ) = {/?,}, in the sense that

ßj= min max y*Ay.

Q€(£n xm a n orthonormal basis and Ke(Cw x m a unitary matrix such that

Q*AQ = VAV*.

Prove that the best set of approximate eigenvectors that can be deduced from A

and Q is QV, in the sense that

WAQV- QVA\\2 = min {\\AU - UD\\2:U*U = Jm,D diagonal}.

4.6.6 [C] Let

/0 0.1 0\

4 = 0.1 0 1 .

vo i oy

Compute the eigenelements of B = Q*AQ, where Q = (el9e2). Deduce that

the property of optimality of the Rayleigh-Ritz vectors is only collective: none

of these vectors minimizes the distance from an eigenvector of A.

4.6.7 [D] Compare the localization (4.6.1) with that of Theorem 4.5.1 when A

is a Hermitian matrix close to a diagonal matrix.

4.6.8 [B:32] Let

(a) Prove that i T c ar.

204 ERROR ANALYSIS

(b) Prove that the vertices of 3C are the permutation matrices and the matrices

belonging to W.

(c) Use this result to prove the following property: let D and Δ be diagonal

matrices and let P be a matrix such that

||D - P*AP||F = min {|| D - Κ*ΔΚ||Ρ: V*V= /}.

Then P is a permutation matrix.

4.6.9 [D] What do the bounds of Theorem 4.4.3 imply when A is Hermitian?

4.6.10[B:34] Let U and V be two orthonormal bases in <P x m such that V*U

is regular. Prove that there exists a basis H and a diagonal matrix D such that

(A-H)U-UC and V*(A -U) = DV*,

where

C= (V*Uy1D(V*U).

4.6.11 [B:34] With the notation of the preceding Exercise 4.6.10 prove that

there exist matrices H of minimal norm:

||H|| 2 = max{||R|| 2 ,||S*|| 2 },

||H|| F = ||R\\* + | | 5 * \ \ 2 F - \\Z\\%,

where

R = AV-UC, S=V*A-DV* and Z=V*R = S*U.

4.6.12 [B:17,33] Let p be the Rayleigh approximation of the largest eigenvalue

of i4, which we assume to be simple. Prove that the bound

sin Θ ^ ~

δ

tan θ ^ ^.

\\ffD-DC\\J>a\\B'-C\\j (; = 2 o r F ) .

CHAPTER 5

Foundations of Methods

for Computing Eigenvalues

The present methods for computing eigenvalues are based on the convergence

of a Krylov sequence of subspaces towards the dominant invariant subspace of

the same dimension. The fundamental QR method is interpreted as a collection

of methods of subspace iteration. In this chapter we shall give a geometric

presentation of the convergence of these methods with the help of the tools

introduced in Chapter 1. This presentation will complement and illuminate in a

new light the traditional algebraic study of convergence. For example, it will

enable us to give a natural explanation of the condition upon the matrix of

eigenvectors of A which is necessary and sufficient for the convergence of the

basic QR algorithm.

SEQUENCE OF SUBSPACES

In classical terminology a vectorial Krylov sequence is a sequence of vectors

M, Au, A2u,...,

where w^O. Given r linearly independent vectors u!,...,ur which generate a

subspace 5, we call the sequence of subspaces

o , /T.4J, A o,..., A O)...)

this sequence when k tends to infinity.

Let {MJ" be the eigenvalues of A each counted with its algebraic multiplicity.

We make the assumption that there exists an integer r such that

1 ^r<n

and

\μι\ > \μι\ > - > \μλ > \^+ι\> - > lft.1 > 0 (5.1.1)

206 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

The vectors of the Jordan basis are denoted by {xj" and the vectors of the adjoint

basis by {xij|r}".

assumption (5.1.1) they form the dominant block. The subspace M = lin (xx,..., xr)

is the associated dominant invariant subspace and M+ = lin (xx „,..., xr<t) is the

left invariant dominant subspace. Let X be a basis ofM and let X+ be the adjoint

basis in M+. The dominant spectral projection is given by P — XX%.

We now suppose that we are given r linearly independent vectors U = [u l,..., ur]

which generate the subspace S = lin (ul9..., ur).

We shall establish the following fundamental theorem.

ω(Α%Μ)->0 as fc->oo,

if and only if dim PS — r.

PX = 0 and let [Χ+,Χ+'] be the adjoint basis of [ X , * ] . Then U can be

decomposed as

U = XF + XG,

where F = X%V and PU = XF. We have

dim PS = r,

if and only if PU is of rank r, that is if and only if F is regular.

(a) We suppose that F is regular. Now Uk = AkU is a basis of AkS and, according

to Proposition 1.5.4, we have ω(Λ*5, Μ) -* 0 if and only if there exists a regular

matrix of order r such that UkFk -► X. Since both M and M are invariant

under A we have

AX = XB, where B = X £Λ*,

and

AX = ΧΪ*, where B = ΛΓ *Λ JT.

Hence

= Xß*F + X5kG.

Note that the eigenvalues of B and B are {μ.}^ and {^J;+ j respectively, so

that B is invertible by virtue of (5.1.1).

Hence

Uk(F~ lB~k) = X + X{BkGF-lBk).

CONVERGENCE OF A KRYLOV SEQUENCE OF SUBSPACE 207

Now

||B'[GF-1ß-tK||fk||||ß-''||||GF-1||.

For every ε > 0, there exists an integer K such that k ^ K implies that

(\\Bk\\\\B-k\\)1/k^p(B)p(B'i)^e = Z^r-f

+ ε.

Vr

We can choose ε such that |μΓ+ 1/μτ\ + ε < 1, which implies that co(/4*S, M)->0;

for, given the bases Uk of /1*S and X of M, there exists a regular matrix of

order r, namely

such that

|ί/Λ-;πι = ο(|—Π.

The convergence is linear at the rate |μΓ+ί/μΓ\.

(b) Suppose there exists a sequence of regular matrices Fk such that

AkUFk -► X, and so PA k i/F k -♦ X.

Now

ΡΛ*17 = ^ k P(7 = AkXF = XBkF.

Hence

XBkFFk->X.

On multiplying on the left by X * we obtain

BkFFk->Iy asfc->oo.

we conclude that F is necessarily regular.

The necessary and sufficient condition that

dim PS = r

is satisfied by a particular choice of the matrix A and the subspace S, which, as

we shall see later, is of great theoretical and practical importance.

when i>j + 1. It is irreducible ifhii-1 Φ0 when i = 2,...,n.

the algorithms of Givens or Householder (see Ciarlet, 1989, p. 364, or Golub and

Van Loan 1989, p. 222). If some of the elements hu.1 are zero, then the problem

of eigenvalues of if is reduced to a set of subproblems for irreducible Hessenberg

matrices.

208 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

Let Er = lin (ex,..., er) = [e A ,..., e r ], where the et are the first r vectors of the

canonical basis of C". We are interested in the Krylov sequence HkEr.

Lemma 5.1.2 Under the hypothesis (5.1.1) we have

dim PEr = r,

where P is the dominant spectral projection of rank r associated with an irreducible

Hessenberg matrix H.

PROOF Suppose that xeEr but x$Er_ t. Then HxeEr+l but Hx$Er. Repeating

this argument, we deduce that the n - r + 1 vectors x, Hx,..., Hn~rx are linearly

independent: every subspace that is invariant under H and contains x must be

of dimension greater than n — r. Hence Er has zero intersection with every

invariant subspace of dimension less than or equal to n — r. In particular, Ker P,

which is of dimension n — r and is invariant under H, has the property that

Cn = £ r 0 K e r P .

Now

lmP = P<En = PEr

and so

(Cn = I m P 0 K e r P ,

which proves that dim PEr = r.

Corollary 5.1.3 IfH is an irreducible Hessenberg matrix and if (5.1.1) holds, then

ω(//*£ Γ ,Μ)->0 as /c-> oo.

The convergence of the Krylov sequence towards the dominant invariant

subspace can also occur when |μΓ| = |μ Γ + ι|. However, the presence of distinct

eigenvalues with the some modulus may prevent such a convergence.

The subspace AkS is generated by the r vectors

Akul9...,Akur (r<n).

These vectors tend to become parallel when fc-*oo, because, in general, they

converge towards the dominant eigenvector xx (for the power method, see

Section 5.3). The method of subspace iteration consists in iteratively constructing

an orthonormal basis Qk of AkS in the following manner:

(a) l/ = QoKo,

(b) for k > 1, let Uk = AQk_, = QkRk, (5.2.1)

where the Rk are upper triangular matrices of order r.

THE METHOD OF SUBSPACE ITERATION 209

holder method (see Ciarlet, 1989, p. 155) or by the Givens method (see Golub

and Van Loan, 1989, p. 211). We obtain the following result regarding

convergence.

Theorem 5.2.1 Suppose (5.1.1) holds. Given orthonormal bases Qk and Q of AkS

and M respectively, there exists a sequence of unitary matrices Zk such that

QkZk-+Q

if and only if dim PS — r.

PROOF It is clear that the matrix Qk defined in (5.2.1) is a basis of AkS. The

assertion follows from Theorem 1.5.2 because co(AkSiM)->0 asfc->oo.

The matrix Bk = Q*AQk is a matrix of order r whose spectrum converges

towards the r dominant eigenvalues.

Corollary 5.2.2 If (5.1.1) holds and if dim PS = r, then sp (Bk) converges to {μ^

as k^co.

PROOF The matrix QkZk furnishes an orthonormal basis for AkS. We note that

B'k = Z*Q*AQkZk is similar to Bk and

where B = Q*AQ is the matrix that represents the map A restricted to M relative

to the basis Q. Hence

sp(ßMQ) = {^}·;.

Since Bk -* B we have

sp(ig-+sp(£).

The linear convergence of Qk towards Q and of sp(Bk) towards {μ.}^ is

controlled by |μΓ+ χ/μτ\. In Chapter 6 we shall meet more precise results: in fact,

the rate of convergence of μ(*} towards μχ is of the order of |μ Γ+1 /μ 1 |, when μχ is

simple. An iteration on the subspace S = Sr carries with it simultaneously an

iteration on each of the subspaces

S 7 = 1111(11!,...,ii/), 1 ^/^r-

This remarkable fact will have important consequences: suppose the eigenvalues

are such that

ΙλΛ > \λ2\ > > \K\ > lAWiI > - > lM.1 > 0. (5.2.2)

On this assumption we define a strictly increasing sequence of invariant

subspaces Mf where

Mf = \m(xl9...,xf) (K/<r).

210 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

Mi»···><?/] *s a Das * s of My, 1 ^ / ^ r . Put

=

A y = [*1>· ··>*/]> ^/* LX1*>· · > * / * ϋ

and

=

Pf XfXf*> * r= *> ^Γ =

^» r* =

^ '

The reader can verify that

B = Q*AQ

is an upper triangular matrix: Q is a Schur basis of M, which is unique apart

from a unitary diagonal matrix.

Therefore Theorem 5.2.1 may be made more precise as follows.

Theorem 5.2.3 Suppose that (5.2.2) holds. Given orthonormal bases Qk ofAkS and

a Schur basis Q ofM, there exists a sequence of unitary diagonal matrices Dk such

that

ÖA->ß

if and only ifX*U is regular and its r — 1 principal minors are non-zero.

PROOF Let Uf = [ux,..., uf~\. When 1 ^ / ^ r, the condition that dim PfSf = /

is equivalent to the assertion that X*+Uf is regular. Now X%Uf is the principal

submatrix of order / extracted from X * U. Hence we know that, when 1 ^ / ^ r,

(o(AkSf9Mj)-+0.

By a recurrence argument on / we can show that the matrix Zk of Theorem 5.3.1

can be taken to be in diagonal form by a suitable choice of Q.

Corollary 5.2.4 Suppose that (5.2.2) holds. If X*U is regular and if its r— 1

principal minors are non-zero, then the limit form ofBk, as k tends to infinity, is an

upper triangular matrix whose diagonal consists of the eigenvalues {λ.}\ in this

order.

which is an upper triangular matrix having {λ.}\ as its diagonal elements in

this order. Since Dk is a diagonal matrix, the matrices Bk = D^BkDk and Bk have

the same form; in particular, the diagonal elements are identical. In fact,

We can say that, modulo a unitary diagonal matrix, the matrix Bk converges

towards an upper triangular matrix. This is the 'essential* convergence of

Wilkinson* (see Ciarlet, 1989, p. 205, and Wilkinson, 1965, p. 517).

THE METHOD OF SUBSPACE ITERATION 211

The condition (5.2.2) is not satisfied when there exist eigenvalues having the

same modulus. Without further investigations we can no longer deduce the

convergence co(AkSf9Mf)-+09 when / is an index corresponding to a set o f /

eigenvalues that include eigenvalues having the same modulus. The matrix Dk of

Theorem 5.2.3 becomes block-diagonal unitary; the limit form of Bk becomes

block-triangular, although it remains triangular in certain cases.

An important case in which there exist eigenvalues having the same modulus

is the case of real matrices which may have pairs of conjugate complex

eigenvalues. We shall return to this question in greater detail when we discuss

the power method in Section 5.3.

The method of subspace iteration is characterized by the subspace AkS, the

fcth iterate by A of the subspace S. Starting from there one can study the

convergence of the method and, if it converges, at what speed it does so.

The reader will verify that the method is still defined when r = n and A is

regular; we then have

dim Sn = dim Mn = dim AkS = n,

and a)(AkSn9 Mn) = 0 whatever the value of k.

In practice, several ways of constructing a basis for AkS can be throught of. In

(5.2.1) we presented the construction of an orthonormal basis Qk arising from the

Schmidt factorization QR.

By way of an example we shall give the 'staircase' iteration (Bauer, 1957)

(Treppeniteration') which rests on the Gaussian LR factorization (see Ciarlet,

1989, p. 138).

of the form shown below:

1 O O "ol

x 1I O o

: x 1

X

o1

X

X )< X "x"

(a) U = L0R0i

(b) When k^ 1, put Uk = ALk.x = LkRk9

where Rk is an upper triangular matrix of order r. It is known that the Schmidt

factorization (by the Householder or Givens algorithms) is stable, while the

Gaussian factorization without pivoting, even if it is possible, may not be

212 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

stable. It is for this reason that we presented Example 5.2.1 only for its historical

interest.

constructing a basis Xk of AkEn and an adjoint basis Yk of (A*)kEn, where

£ n = lin(e 1 ,...,e n ) = C w .Weput

=

^ο ^ο = f* G k + 1 =i4A' k , Hk + 1=A Yk.

The factorization of Gauss, if it exists, given by

Y*A Xk — Lk + lRk + i,

AXk — Xk+iRk+i

and

Λ l

k 'k+l^fc+l'

and the algorithm becomes identical with the subspace iteration (5.2.1) when

r = n; in this case Xk is unitary and AXk = Xk+ xRk + v

The method of subspace iteration is most frequently used in conjunction with

the auxiliary techniques of deflation and spectral preconditioning whose principles

we are going to describe. These are techniques that modify the spectrum of A,

but not the eigenvectors or the Schur basis; their purpose is to facilitate the

computation of the spectrum.

5.2.1 Deflation

The object is to eliminate those eigenvalues that have already been computed,

the computation being carried out one by one.

Proposition 5.2.5 Let x and x+ be the right and left eigenvectors corresponding to

the simple eigenvalue λ. The matrices

A' = A — σχχ* and A = A — σχχ*

have the same eigenvalues as A except λ, which is replaced by λ — σ.

When A is diagonalisable, A' and A have the same eigenbasis.

The matrices A and A have the same Schur basis.

PROOF

AX = XD.

THE POWER METHOD 213

(b) If T is the Schur form of A9 then

AQ = WT.

Suppose x is the first column of Q; it follows that

ÄQ = QT- axe] = Q(T- aexe[).

The deflations we have defined are deflations by subtractions. Other deflations

in use are deflations by restriction and deflations by similarity (see Wilkinson,

1965).

of sp(,4) and is chosen in such a way that the spectrum is modified so as to

facilitate computation. We shall discuss three examples of applying this idea:

(a) f(t) = t — a: translation of the origin for the QR algorithm, considered in

Section 5.5.

(b) f(t) = (t — σ)~ι: the inverse iteration will be discussed in Section 5.5 and the

spectral transformation in Chapter 6, Section 6.5.

(c) f(t) is a polynomial in t, chosen in such a way that the eigenvalues with the

greatest real part become the eigenvalues with the greatest modulus; this will

be discussed in Chapter 7, Section 7.9.

If in the preceding method we consider the special case in which r = I, we recover

one of the oldest methods used for computing the dominant eigenvalue λχ\ this

is known as the power method.

We suppose that

\λι\>\μ2\>->\μ.\>0 (5.3.1)

The following theorem is fundamental.

Hub Mfc-ill*

is such that

214 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

5.3.1

What happens when condition (5.3.1) is not satisfied? In the following we shall

suppose that A is diagonalisable. Two cases are possible:

(a) There exists a multiple semi-simple eigenvalue and the method still converges.

(b) There exist distinct eigenvalues with the same modulus, and, in general, the

method fails to converge.

We discuss these cases in more detail:

(a) μ1 = ··.=μ Γ , |μιΙ>|μ Γ +ιΙ>···.

If

i

then

If Pru Φ 0, we have convergence towards a vector in the dominant eigenspace

associated with μ1 = ··· = μΓ, that is towards an eigenvector.

(b) Let \λ1\ = \λ2\>\μ3\> --. We suppose that λγφλ2 and that Ρ2ιιφ0. If

x\u φ 0, we put λί = εωλ2, where 0 < θ < 2π. Then

If there exists a rational number p = t/s such that θ = 2π/ρ, then there exist

t subsequences that converge to the vectors

e 2ikÄt/s (x*^)x 1 + (x**u)x2 (k = 1,..., t)

respectively.

/1+i 2i 0\

A= \ 0 1-i 0 .

\ -i -3 1/

The spectrum is {1 -h i, 1 — i, 1}. It will be found that the power method furnishes

four convergent subsequences of vectors because 1 + i = exp (2πί/4) (1 — i).

THE POWER METHOD 215

( i o - 1 0\

0 1 -1 0

x= 1 2 1 1

{-l 0 0 V

and

D = diag(3,2,l,-3).

In this case

λ2 — e / j — 3,

and we obtain two convergent subsequences by applying the power method to

the matrix A with the initial vector u = (1,1,1,1 )T.

For example, by projecting A on the subspace generated by the two limit

vectors, we obtain the matrix

Z-2.76 1.18\

B-

V 1.18

118 2.76,

2.76/

whose eigenvalues are 3 and - 3 (see Exercise 5.3.7).

If 0 is not of the form 2π/ρ, where p is rational, then no convergence can take

place when x^u Φ 0. The only possibility consists in iterating upon two

vectors ux and u2 such that P2ui and P2u2 are independent.

5.3.2

In this section we present some aspects of the behaviour of subspace iteration,

subject to condition (5.1.1), in the light of the behaviour of the power method.

First of all, if the necessary and sufficient condition for convergence, that is

dim PS = r, is not satisfied, then convergence will occur towards an invariant

subspace that is no longer dominant.

Example 5.33 Let

where

sp(^) = {3,2,1}.

Whenr = 2,

216 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

Choosing

n o\

U = 1 1

Vo o)

we have

Ή:

As we should expect from the preceding discussion, we notice the convergence

of Bk to an upper triangular matrix having the eigenvalues {1, 3} (in this order)

upon its diagonal. In fact, the power method applied to A with the initial vector

u= {1, 1, 0)T yields the eigenvalue 1, since u happens to be the eigenvector

associated with 1.

When there are eigenvalues having the same modulus, we can obtain several

distinct block-triangular limit matrices.

means of

D = (3,-3,2,1).

By simultaneous iterations on the two vectors e1 and e2i the eigenvalues

{3, —3} are obtained by calculating the spectrum of one of the two limit blocks

0.913 1.57 /0.2<296 -5.30

5.19 -0.913 or

V - l , 68 -0.296

These are obtained by projecting the matrix A on the invariant subspace

associated with {3, —3} by means of one of the limit bases

-0.626 0.417 0.209 0.626Y

-0.251 -0.314 -0.880 0.251

or

-0.356 -0.237 -0.831 0.356^T

0.573 -0.465 -0.358 -0.578,

On the other hand, if we iterate in the same way with a matrix A defined by

Z) = (3,2, 1, - 3 ) ,

we obtain two limit bases

/-0.447 0 0 0.894 γ

V 0.365 0 0.913 0.183/

THE METHOD OF INVERSE ITERATION 217

and

-0.447 0 -0.894 0 V

0.365 0 -0.183 -0.913/'

but only a single limit block

0.600 -2.94 \

-2.94 -0.600/

The reader will verify that this pecularity is due to the fact that in this case the

eigenvectors associated with 3 and — 3 are orthogonal.

We have seen that the convergence of the power method is linear, with rate

\μ2/λχ |. Since the eigenvectors are invariant under translation of the origin, it is

natural to think about improving this factor through an appropriate shift of

origin. Let σ be an approximation to a simple eigenvalue A, with eigenvector x and

spectral projection P. Then the eigenvalues of A — σΐ are μ, — σ; if σ is not close

to the eigenvalues of A, other than A, than the dominant eigenvalue of (A — σΙ)~ι

is 1/(λ — er), whose modulus is significantly greater than max {\μ( — σ\"1,^ Φ λ).

This fact is exploited in the method of inverse iteration designed to compute the

eigenvector x associated with λ whose approximation σ is known: thus we put

(k>1) (5A1)

io = 7 V · {A-aI)zk = qk_.u & = 7ΠΓ

z

I N II2 Il kll2

This iteration is none other but the power method applied to (A — σ/)" 1 ; the

convergence rate is \λ — σ\/τηίημ.¥:λ\μί — σ|, which is closer to zero the closer σ

is to λ. Be this as it may, Α — σΙ will be closer to becoming singular; at first sight

this could pose a problem for the solution of (5.4.1). In what follows we shall

examine this apparent paradox.

Lemma 5.4.1 If the vector x is well-conditioned, the error made in solving (5.4.1)

is mainly in the direction generated by x, which is the direction required.

PROOF Let Q be a basis for (x) 1 such that [x, Q] is unitary. Then

, r πίλ~σ X A

* Q Μ~Χ*Ί

where

B = Q*AQ

and

Σ ± = ρ(Β-Α/)- 1 ρ*, csp(x)=P1||2.

218 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

the exact solution of the neighbouring problem

(4-ff/-tf)y = !!+/,

where H and / are of small norm. We write

(A - al)y = u + Hy +f=u + g,

where g = Hy +f. The error made on y is therefore e = (A — aI)~1g. Now

z\m

1 -1

ι x*AQ(B -

(Α-σΙ) = [χ,0]\λ-σ λ-σ

0 (Β-σΙ)

Therefore

e = x(x*e) + Q{Q*e\

where

λ—σ

and

Q*e = {B-ci)lq*g.

Now

B - σΐ = B - λΐ - (σ - λ)1

whence

{Β-σΙ)-χ = [Ι-{σ- λ)(Β- λΙ)~ι]{Β-λΙ)~χ

1

ands=\\(B — σΐ) \\2isclose to ||Σ λ ||2 when |λ — σ| is small enough. We deduce

that

\λ-σ\

1

When || Σ 1| 2 is of moderate size, we see that the closer σ is to A, the more will

the error lie in the direction of lin(x) (see Figure 5.4.1). We know that

IIΣ11|2 > δ~ \ where δ = minM. # λ |μ, - σ\, with equality when A is Hermitian or

normal.

(**'l·

V 0.

"u«e#-

Figure 5.4.1

THE METHOD OF INVERSE ITERATION 219

precision' we mean a quantity of the order ε = b~c. We say that a is an exact

eigenvalue 'up to machine precision' if there exists E such that Α — σΙ + Ε is

singular and ||E|| =0(ε). Similarly, y is an exact eigenvector 'up to machine

precision' if there exists F such that y is an eigenvector of A + F and || F \\ = 0(ε).

The following remarkable fact is a consequence of Lemma 5.4.1. Starting with

an eigenvalue exact up to machine precision and with (almost) any vector, we

obtain with one step of inverse iteration an eigenvector that is exact up to

machine precision. More exactly, we shall prove the following result.

Then there exists at least one vector u such that the exact solution y of(A + al)y — u

determines an eigenvector of A that is exact wih precision ε.

||j>||2 = l.Put

u = (A — al)y and z = Ey.

Then

(A + zy*)y = ay and || zy* \\ 2 = || z \\ 2 ^ ε.

Hence σ and y are eigenelements of the matrix A + zy* whose norm differs from

that of A by less than ε.

The exact solution of a neighbouring problem is the best we can hope to

achieve in practice. However, we must not lose sight of the fact that when λ is

ill-conditioned, there may be a great difference between σ and λ.

Note that

σ-l and ^ = (1 + 1 0 - 2 0 Γ 1 / 2 ( 1 0 ^ 1 0 )

2

A' = ( "1010 ^

V - 1010/(1 + 1(T 20 ) 2 - 1(T 2 7(1 4- lO" 2 0 )/

and || A' — A ||2 ~ 10" 10 . Now A has the double eigenvalue λ = 2 and |σ — λ\ = 1,

which is not small. However,

Mil

220 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

being the Jordan basis (A is highly non-normal).

Let us return to the iteration (5.4.1). Proposition 5.4.2 has shown that if σ is

exact up to machine precision, then there exists an eigenvector that is exact up

to machine precision. This remains true even if qi is computed up to machine

precision. One might hope that q2 will be better still. However, if λ is

ill-conditioned, this turns out very often to be false; one cannot obtain anything

better than machine precision, even if it is assumed that the solution of (5.4.1) is

exact. Let us look at this phenomenon in an example.

ViO" 1 0 1/

5

and σ = 1. The exact eigenvalues are 1 ± 10 . We compute the iterates defined

by (5.4.1), starting with

wT = (0, 1).

Put

rk = (A-I)qk.

We obtain

<?0 Zl 9l Ί *2 12 r2 *3 <?3 r3

0 10 10 1 0 0 0 1 1010 1 0

1 0 0 lO-io 1 1 0 1 0 lO-io

The eigenvalues of A are ill-conditioned, for they possess eigenvectors that are

almost parallel.

An important variant of inverse iteration is the iteration of the Rayleigh

quotient:

MM

p0= " (5.4.2)

« 0= ϊ«? u*u

z

z

k t^zk

-1' Qk |, zz zz

„>

IUJI *t kk

The properties of local convergence of (5.4.2) have been studied by Ostrowski*

(1958-9) (see also Parlett, 1980, pp. 71-9).

THE QR ALGORITHM 221

(a) if p 0 is an approximation to a semi-simple eigenvalue A of A and if pk -+ A,

then this convergence is essentially quadratic.

(b) When A is normal, the convergence of pk towards A, if it takes place, is

essentially cubic.

We remark that the translation pk in (5.4.2) varies with k; this enables us to

obtain a convergence whose order is asymptotically greater than unity. In the

subsequent sections we shall return to the relationships between the iteration of

the Rayleigh quotient, the QR method with origin shifts and Newton's method.

The basic QR algorithm consits of the construction of a sequence {Ak} of unitarily

similar matrices:

A^A^Q,RU Ak+l=RkQk = Qk+1Rk + 1 (k>l)

where the Qk are unitary and the Rk are upper triangular matrices. Since

Rk = Q*Ak9 we have

Put

Ä* = ß i - ß * and ak = Rk...Rt.

Then it can be verified that

We assume that the eigenvalues are simple and of strictly positive distinct

moduli; thus

|A 1 |>|A 2 |>..->|AJ>0. (5.5.1)

We point out immediately that the assumption 0£sp(y4) is not restrictive, since

one can satisfy it by making a translation of the spectrum.

infinity.

=

Ak+i ^k^l^k'

Lemma 5.5.2 The first r columns of £k generate the subspace AkEr, where

Er = \m(e1,...,er), r = l , . . . , n .

222 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

PROOF This follows from the triangular form of the matrix Mk which appears

in the factorization Ak = &k&k\ we observe that both A and $k are regular.

subspace iterations starting with the n nested subspaces

£ ι ζ =£ 2 <=:.-er £„ = €".

The condition (5.5.1) implies that (5.1.1) is satisfied for r = 1,..., n.

First, we shall study the convergence of the QR algorithm when applied to an

irreducible Hessenberg matrix, making use of Corollary 5.1.3.

factorization. Then both Q and RQ are Hessenberg matrices.

Theorem 5.5.4 Under the hypothesis (5.5.1) the QR algorithm, when applied to

an irreducible Hessenberg matrix, produces a sequence of unitarily similar

Hessenberg matrices which converges (modulo a unitary diagonal matrix) to an

upper triangular matrix whose diagonal consists of the eigenvalues {/l,.}" in this

order.

When r > n, we can apply Theorem 5.2.3 to the first r columns of Jfc, which is a

basis of HkEr. When r = n, the recurrence still applies, because a>(HkEn, Mn) = 0

for all k. Hence we deduce that, given be basis Q and £k of <P, there exists a

unitary diagonal matrix Dk such that

£k-+Dk^Q.

Now Hk + l is similar to

D*Hk + 1Dk = D*Q*HQkDk,

which converges to

Q*HQ = R.

This matrix is the Schur form of H whose diagonal consists of the eigenvalues

{Au}" ordered by decreasing modulus.

Its practical importance stems from Lemma 5.5.3, for in the course of the iterations

the matrices Qk and Hk are such that their subdiagonal parts contain at most

n — 1 elements which are not necessarily zero. This fact considerably diminishes

the volume of calculation to be effected.

We return to a diagonisable matrix A whose eigenvalues satisfy (5.5.1):

A = XDX~\ where X = [x l 5 .. . , x j ,

THE QR ALGORITHM 223

and

'■■-Kb

Theorem 5.5.5 On the assumption (5.5.1) the QR algorithm, when applied to A,

produces a sequence of unitarily similar matrices whose limit form is an upper

triangular matrix having {A.}" as its diagonal elements in this order, under the

necessary and sufficient condition that the n—\ principal minors of X~l are

non-zero.

gence given in Theorem 5.2.3 when r = n and U = [el9...,e„] = /. The speed of

linear convergence is controlled by max r = x n-i\K+JK\·

made more precise in Chapter 6, where we shall present the incomplete methods of

Lanczos and Arnoldi.

When there exist distinct eigenvalues with the same modulus, the limit form of

Ak need no longer be upper triangular, but may become upper block-triangular.

The limit form depends only on the eigenvalues and their multiplicity. It is beyond

the scope of this book to include a complete discussion of the convergence of the

basic QR algorithm in the presence of distinct eigenvalues with the same modulus.

The following result can be proved.

matrix, it produces a sequence of matrices whose limit form is triangular, if and only

if there do not exist two distinct eigenvalues with the same modulus and with

algebraic multiplicities of the same parity.

PROOF See Parlett (1968) and Parlett and Poole (1972). Parlett's article of 1968

gives a necessary and sufficient condition for convergence to a quasi-triangular

form, that is having blocks on the diagonal of order at most two.

If A is invertible, the Euclidean scalar product satisfies the relation

>>*x = (x, y) = (Λχ, A ~*y\

where

224 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

The subspaces

S,AS,A2S,...

and

S\A-*S\(A*)-2S\...

are orthogonal complements in pairs. Note that if Er = lin (ex,..., er\ then

£:r1 = lin(e r + x,..., e„).

The QR algorithm, which we have interpreted as a collection of n subspace

iteration methods for A, can also be interpreted with the help of iterations for

A~*.

A'*.

columns by beginning with the last rather than with the first, as is the case in the

QR factorization, L is a regular lower triangular matrix.

Let Ak = QkRk, A* = R*Q*9 and so

where

/ = R~*

t£k = Lk-Ll= 0tk .

Then

in AkEr, r = 1,..., n. Thus we obtain different algorithms which converge in the

same manner but enjoy different numerical stability.

the following sequence {Ak} of similar matrices:

A = A, = L.R^ Ak+1 = RkLk = Lk+lRk+1 (k > 1),

where Lk is a lower triangular matrix, all of whose diagonal elements are equal

to unity (Gauss's factorization without pivoting). If &k = Rk-··/?! and S£k =

Li · · · Lfc, then Ak = S£kMk and Ak+l= S£kxAS£k. It is easy to verify that the first r

columns of 5£k generate the subspace AkEr,r—\,...,n (see Lemma 5.5.2). One

THE QR ALGORITHM 225

presented in Example 5.2.1.

The linear rate of the convergence of the basic algorithm can be improved by

using shifts of origin that vary at each step. Thus consider the algorithm:

AX=A, Ak-akI = QkRk, Ak + 1 = RkQk + akI (k ^ 1),

where {ak} is a sequence of scalars.

Two strategies are often used in practice. They are defined by the following

choices:

( 3 )σ, = <> = β ί ν „ . , .

(b) σ^ is the eigenvalue of the submatrix ET2AkE2 that is closest to -aJJ*, where

E2 = [£„_!,£„]· This shift is known as Wilkinson's shift.

There exists no global result on convergence of the shifted QR algorithm for a

non-normal Hessenberg matrix. With strategy (a) the convergence of ajj* to an

eigenvalue is asymptotically quadratic. When strategy (b) is applied to a

symmetric tridiagonal matrix, the convergence of a{^ to an eigenvalue is at least

quadratic and almost always cubic.

Strategy (a) is related to the iteration of the Rayleigh quotient in the following

manner.

Lemma 5.5.8 If we choose ok — a™, then Qkek is proportional ίο(Α% — äkI) ~1ek.

implies that

Q* = Rk(Ak-akI)-K

Hence

Qke„ = (At-ökI)-lRken

= (A*-äkI)-lf<»e„,

where r*'( #0) is the element of Rk in the position (n, n). Hence

Vt

" MAt-ajrwi,

Starting from qk_ i and pk„l= a'*', one iteration of the Rayleigh quotient upon

A* yields qk = QkeH and

226 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

the result of one iteration of the Rayleigh quotient on A* starting with en, which

is an approximate eigenvector.

In practice, the shift of origin is applied together with deflation: the last row

and the last column of the matrix are suppressed when the coefficient a(^_ln is

considered to be sufficiently small; a new shift of origin is then applied to the

matrix of order n — 1.

It is evident that when shifts of origin are employed, the eigenvalues will no

longer necessarily appear in order of decreasing moduli.

The interested reader will find a description of the practical implementation

of the QR algorithm in Golub and Van Loan (1989, pp. 228-37).

When the matrix A is Hermitian, numerous simplifications occur, as we have

indicated from time to time. We recall that among the most important ones are

the facts that A is diagonalisable, that the eigenvalues are real and that the

eigenvectors are orthogonal. The simplifications relating to the QR algorithm can

be summarized as follows:

(a) Complexity. The tridiagonal form (or band symmetric) is preserved (see

Exercise 5.6.1).

(b) Convergence. When the QR algorithm with Wilkinson's shift is applied to

an irreducible tridiagonal matrix the convergence is always at least linear.

The asymptotic rate of convergence is almost always at least cubic.

The reader will find a detailed treatment of the symmetric eigenvalue problem

in Golub and Van Loan (1989, Ch. 8) and particularly in Parlett (1980, Chs. 8

and 9).

In this section we consider the generalized problem

Ax — λΒχ,

whose eigenvalues can be computed by the QZ algorithm (Moler and Stewart,

1973). This algorithm generalizes QR: when B is regular, QZ reduces essentially

to the application of QR to AB~ \ without the need to compute B~l.

Theorem 5.7.1 There exist unitary matrices Q and Z such that Q*AZ = T and

Q*BZ = S are upper triangular matrices. If for some values of i, we have

ίί£ = sit = 0, then sp [Λ, B~] = C; otherwise

s p [ ^ B ] = {i„/sfl;sl|9feO}.

NEWTON'S METHOD AND THE RAYLEIGH QUOTIENT ITERATION 227

(B need not be regular). For each k we define:

(a) the Schur decomposition Ql(ABk l)Qk = Rk,

(b) the Schmidt factorization BklQk = ZkSkl, where Rk and Sk are upper

triangular matrices.

Therefore QkAZk = RkSk and Q^BkZk = Sfc are also upper triangular matrices.

The sequence of pairs of unitary matrices {(Qk,Zk)} possesses a convergent

subsequence Qf->Q and Z^^Z, where ^eNl ^ N . It can be verified that the

limit matrices are unitary and that both Q*AZ and Q*BZ are upper triangular

matrices. The statement about sp [Λ, JB] follows from the identity

5.7.1.

The QZ algorithm is divided into two stages:

(a) The determination of the two unitary matrices Q and Z such that Q*AZ and

Q*BZ are in upper Hessenberg and triangular forms respectively. This is the

preparatory stage.

(b) The iteration: Ak - kBk = Qt{Ak_l - XBk_l)Zk, k ^ 1, where Ak and Bk are

in upper Hessenberg and triangular forms respectively. The matrix AkBk i is

essentially the result of applying one step of the QR process to Ak_lBk}v It

can be shown that, as k-+ oo, the limit form of Ak is upper block-triangular.

The reader will find a complete description of the QZ algorithm in Golub and

Van Loan (1989, pp. 251-66).

Q U O T I E N T ITERATION

Let A be a simple eigenvalue of A; the eigenvector x, normalized by y*x = 1,

satisfies the equation

F(x) = Ax- x(y*Ax) = 0. (5.8.1)

We apply Newton's method to (5.8.1) in the manner defined in Section 2.10 of

Chapter 2; thus we write

x° = — , z = xfc+1-xk,

y*u

(I - xky*)Az - z(y*Axk) = - Axb + xk(y*Axk) (k > 0),

where the superscript k represents the number of the iteration. We deduce that

Axk+1-xk +

\y*Axk) = xkly*A(xk +x

- x k )] (k ^ 0).

228 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

μ - μ * / ) χ * + 1 =τ*χ*,

where xk is a non-zero scalar.

This may be interpreted as follows: x fc+1 is the solution, normalized by

y*xk + 1 = 1, of a system of linear equations with matrix A — μ*Ι and right-hand

side parallel to x \

Proposition 5.8.1 The application of Newton's method to (5.8.1) is equivalent to

an iteration of the right Rayleigh quotient.

q (A-vk_1I)zk = qk_u

°~U' y*io

(5.8.2)

y*Mk

q k

~ π π'

y*<ik

II zk II

(that is only the right-hand vector in the Rayleigh quotient is modified).

We shall show inductively that μ* = vk and that the vectors xk and qk generate

the same direction; they differ only in their normalizations, which are y*xk = 1

and || qk \\ = 1 respectively. This is true when k = 0; indeed,

o *Λ o y*Au

= Vn.

y*u

Suppose the assertion is true for the (k — l)st iteration; hence the vector

This leads to the conclusion that the local convergence of the iteration (5.8.2)

is quadratic.

INVERSE ITERATIONS

5.9.1

We continue to assume that λ is a simple eigenvalue and that the corresponding

eigenvector x is normalized by y*x = 1. We consider Newton's modified method:

v - y

X Z

0 — ~T~' — Xk+1 ~ X

fc>

y*u

(5A1)

(I - xky*)Az - z(y*Ax0) =-Axk + xk(y*Axk) (k > 0),

NEWTON'S METHOD AND SIMULTANEOUS INVERSE ITERATIONS 229

or else, if C = y M x 0 ,

(A - CI)xk + i = xkly*A(xk+ x - x 0 )]·

of inverse iteration on A, starting with u and ζ — y*Au/y*u.

the normalizations y*xk = 1 and || qk ||2 = 1. They generate the same direction.

Remark The two methods are mathematically equivalent, but not numerically.

In fact we have seen in Proposition 5.4.2 that one cannot surpass the machine

precision for qk when it is calculated by the inverse iteration method. On the other

hand, we can obtain xk to a precision equal to that used to compute the residual

Axk — xk(y*Axk)9 while the linear systems are solved in simple precision.

Let σ be an approximation for an eigenvalue λ of algebraic multiplicity m and

let U be a set of m linearly independent vectors. We can generalize (5.4.1) by

defining the inverse subspace iteration (or block-inverse iteration or simultaneous

inverse iterations) in the following manner:

(a) U = Q0R0;

(b) when k ^ 0, put (5.9.2)

{A — al)Yk + l = 6 k , Yk+i = Q* + i#*+i·

Let Q be an orthonormal basis of the invariant subspace M associated with λ

and let Q be an orthonormal basis of the orthogonal complement M 1 . Put

(χ = λ — σ, ε = |α|,

so that A — σΐ is almost singular for small enough ε.

The following analogue of Lemma 5.4.1 is easily proved.

(A-aI)Y=U (5.9.3)

lies mainly in the required subspace M.

PROOF This is entirely analogous to the proof of Lemma 5.4.1. The system we

wish to solve can be written

X

where B = Q*AQ and B = QAQ. It remains to compare ||(B,B) \\F with

230 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

brought back to the situation of Lemma 5.4.1. If A is not semi-simple, it can be

verified that when || (£, B) || F is moderate, sois ||(B — σ/)" 1 ||F (see Exercise 5.9.1).

However, if σ is too close to a defective eigenvalue λ, then the basis Y may

consist of vectors that are almost linearly dependent.

OL 1 o\

1

Vo a

l

be of order r. Ife = |a| is sufficiently small, then G is of rank unity up to ε1/2.

/a"1 -a"2 (-l)r+1a"r^

Vo

(-i) r + 1 or 1

(-l)'af~r-2 1

= (-l)r+1a'r

·. ( - l ) r a r ~ 2

V .(-I)'*1«'-1

r+1 r

= (-l) oT K,say.

Let Π be the matrix e^J, which maps C on to lin (ej. Π is of rank unity: one

of its singular values is equal to unity while the other r — 1 singular values are

equal to zero. In fact, Π Τ Π = ejie^ejej = ereTr, which is the diagonal matrix

diag(0,...,l).

On the other hand, G l and K have the same rank: the columns of G~1 are

multiplied by ar, apart from a sign, in order to obtain K. Now

|| K - Π || 2 ^ c(2e + 3ε2 + · · · + rer' *) ^ ce,

where c is a generic constant depending on r.

We conclude that the difference between the singular values of K and those of

Π is of the order of ε 1/2 (see Exercise 5.9.2).

In view of Lemma 5.9.2 we are interested in that part of the solution Y of (5.9.3)

which lies in M, that is

ß * y = (B - σΙ)~ lQ*lV - AQ(B - aiy x

Q*Ul

For that reason we study the solution Z of the system (A — σΙ)Ζ = Q.

NEWTON'S METHOD AND SIMULTANEOUS INVERSE ITERATIONS 231

Theorem 5.9.4 Ifk is defective, the m vectors that are the solutions of

(A-aI)Z =Q

1/2

are linearly dependent up to ε , for small enough ε.

Ζ = ρ(£-σ/Γ1.

The rank of Z is equal to the rank of

{Β-σΙ)-1-ν{Βλ-σΙ)-1ν-\

if V is the Jordan basis of B = Q*AQ and Bx is the Jordan box of B associated

with the eigenvector λ of geometric multiplicity g < m, that is

ß A = diag(J 1 ,...,J g ),

where the {Jk}\ are the g Jordan blocks of λ.

For each k (1 < k ^ g\ Jk — σΐ is a matrix of the type G in Lemma 5.9.3 of

order rfc, equal to the grade of the eigenvector that is associated with the Jordan

block Jk,rk^t,t being the index of A.

For sufficiently small ε, the rank of (Βλ — σΐ)'1 is therefore equal to g up to

the order of ε1/2; the some holds for the rank of Z.

unstable when ε is sufficiently small.

Yk tends to become of rank g < m as Qk-+Q, which is an orthonormal basis

ofM.

Hence there is no analogue of Proposition 5.4.2 when λ is a defective

eigenvalue; this is confirmed by the following example.

/4 1 0 0 0\ l 1 0 0 0 -1\

0 4 0 0 0 1 1 0 0 - 1 I

J= 0 0 1 1 0 and V= 1 1 1 1 0 -1

0 0 0 1 0 1 1 1 1 -1

^0 0 0 0 2 ) U 1 1 1 o)

We wish to compute an orthogonal basis of the invariant subspace M associated

with the defective eigenvalue λ = 1, that is

__ /0 0 a a α\τ

\0 0 -2b b bj

232 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

Table 5.9.1.

A-IHU*

0.85 ρ15<0.2χ10"10 e„-[;33 0

0

0.581

0.814

0.575

-0.471

0.575 T

-0.411 J

0 0.5779 0.5771 0.5771 Ί

0.99 Pi < 0.25 x 10" 10

-[: 0 0.8161

0

-0.4086

0.5 x 10" 4

-0.4086 J

0 - 0 . 2 x ΗΓ 4

p2<0.2xKT3

0.999999

does not decrease

e100= 0.5773503 0.8164965

0.5773503 -0.4082488

JX5773503 -0.4082486 _

0 0.81

0 -0.064

0.9999999 2<pk<4 ßioo "~ 0.5773503 0.42

0.5773503 -0.050

0.5773503 -0.37

/ i i i o oy

V-l 1 0 1 0/

and c. We consider the residual matrix

Rk = AQk-Qk(Q*AQk)

Γ

and the norm ||Α||* = Σι,./Ι υΙ> when K = (r0). The numerical results are sum

marized in Table 5.9.1.

We observe that when σ is far from λ (λ — σ = 0.15), the residual pk becomes

small, but the solution has only one correct digit. On the other hand, when a is

close to λ {λ — σ = 10" 7 ), the eigenvector has seven correct significant digits while

the principal vector has none.

We present a modified Newton method which is mathematically equivalent to

(5.9.2) in the sense of the following proposition.

X0=U,Y*U=I, Z = Xk + 1-Xk,

(I - XkY*)AZ -σΖ=- F(Xk) (k^0).

NEWTON'S METHOD AND SIMULTANEOUS INVERSE ITERATIONS 233

Then the bases calculated by (5.9.2) and by (5.9.4) generate the same subspace

(A-aI)~kS.

=

AXjt + ι ~~ G^k+i ^fc£*>

where

Ek=Y*AXk+1-aI.

Now Ek is regular, for, if not, Xk+i would be of rank less than m, which is

impossible because A — σΐ is regular by (5.9.4): Xk+i satisfies Y*Xk+1 = /. We

conclude that

Xk+1=(A-aiylXkEk.

When λ is defective, the basis calculated by (5.9.4) is of rank m, which remains

constant throughout the iterations; on the other hand, the basis Yk9 calculated

by (5.9.2), tends to become numerically of rank g when σ is sufficiently close to

λ. This is essentially due to the fact that, when the projection I — XkY* is applied

to A, it will in the course of the solution eliminate the contribution of the subspace

upon which A — σΐ approaches singularity.

How does one study the convergence of (5.9.4)? For example, one might

compare (5.9.4) with the simplified Newton method (2.11.1) whose convergence

was demonstrated in Section 2.11 on the condition that the initial basis U yields

a residual matrix R = AU — UB of sufficiently small norm, where B = Y*AU.

The Newton iteration (2.10.4), the simplified Newton iteration (2.11.1) and the

modified Newton iteration (5.9.4) lead, at thefethiteration, to the solution of a

Sylvester equation for Z defined respectively by

(/ - XkY*)AZ - Z(Y*AXk) for (2.10.4),

(/ - UY*)AZ - ZB for(2.11.1),

(/ - Xk Y*)AZ - σΖ for (5.9.4).

We may interpretJ5.9.4) by saying that B has been replaced by σ/, which is

legitimate when \\Β — σΙ || is sufficiently small. However, we shall see that this is

no longer always true when λ is defective.

Suppose 5 = Y*AU is diagonalisable: B= WAW1. Then

ΙΙΒ-σ/ΙΙΗΙ^ίΔ-σ/)^" 1 !!

^ cond (W) max | ζ, — σ\.

i

necessarily large as soon as || U — X \\2 is sufficiently small.

A'0F=UBW=FA.

234 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

we suppose that the basis U is chosen to be orthonormal. Then W*W= F*F and

cond2(^) = ^ g ,

where amax(F) and ^ i n (F) are the greatest and the least singular values of F

respectively.

Let Ex be the eigenspace of A associated with the eigenvalue λ. We have

dim Ex-g<m.

Let Π λ be the matrix representing the orthogonal projection on Ex. The matrix

G = FLXF is of rank g because each of its columns is an eigenvector of A. On the

other hand,

F-G^Z-IIJF.

Hence on putting F = [mfl,..., / m ] we have

|| ( / - n j / i || 2 =dist (UEX) (i=l,...,m).

The reader can verify that Ex is also the eigenspace of A0 = XBY*, where

B = Y*AX.

We deduce from Theorem 4.3.7 that

\\F-G\\2F= Σ MI-IlJfiWl^cWA^-AJ2/,

A'0-A0 = UBY*-XBY*

= (U- X)BY* + XY*A(U - X)Y*.

Hence || A'0 — A0 \\2 is small as soon as || U — X ||2 is sufficiently small.

The m — g smallest eigenvalues af of F*F satisfy σ1 = 0(r/1/<r) (see Exercise

5.9.3). In conclusion, it remains to obtain a lower bound for the greatest σ?; in fact,

Hence

cond2(HO > ^-l^(a2max(G) - αη1")1*2 = αη~ ^2',

where cjs a generic constant; the quantity r\~1/2/ is greater the smaller η is. As a

result, B cannot be well approximated by the diagonal σΐ,

The main interest in the iteration (5.9.2) lies in the fact that the matrix A — σΐ

remains fixed throughout the iterations; in contrast, the iteration (2.11.1) requires

the solution of a Sylvester equation. We now put forward a modification of

(2.11.1) that does not require such an expensive step, while remaining stable when

λ is defective. When λ is defective, as we have seen, B is not well approximated

by σΐ. Let B = QTQ* be the Schur decomposition of ß, where

r = d i a g ( Q + N,

EXERCISES 235

B = QfQ*.

T h e n | | 5 - 5 | | 2 = | | T - f | | 2 = maxl|f1-a|.

This enables us to put forward a modified Newton method which is stable

when λ is defective and which has almost the same complexity as (5.9.2):

X0 = U, Z = Xk + i — Xk, (^QS)

(I - UY*)AZ -ZB=- F(Xk) (fc ^ 0).

In this context a natural choice for σ is the arithmetic mean

m

- 1 1

rrii=i m

In contrast to (5.9.2), the interest in (5.9.5) lies in the fact that it allows us to

compute the set of all m basis vectors of M with the precision required. See

exercise 2.11.3 for a sufficient condition of convergence for (5.9.5). This sufficient

condition requires that λ is not too ill-conditioned.

5.10 BIBLIOGRAPHICAL C O M M E N T S

The geometrical illumination of Sections 5.1 to 5.5 was inspired by the fundamental

article of Parlett and Poole (1973) and by Watkins' paper (1982). The presentation

of the inverse iteration method was adapted from the article by Peters and

Wilkinson (1979), in particular Proposition 5.4.2 and Example 5.4.2. The study

of the simultaneous inverse iterations for calculating a defective eigenvalue is

new (Chatelin, 1986).

Here is a point of nomenclature: the method of simultaneous iterations

(Rutishauser, 1969) has different names according to the context: subspace

iteration in Parlett's (1980) book (following the custom of structural engineers)

and orthogonal iteration in the book by Golub and Van Loan (1989, Ch. 7). The

practical implementation of this method always involves a step of projection (see

Chapter 6).

EXERCISES

5.1.1 [A] Prove that if an irreducible Hessenberg matrix is diagonalisable, then

all its eigenvalues are simple. Deduce that this is also true for a symmetric

irreducible tridiagonal matrix.

5.1.2 [D] Let A be a regular matrix and S0 a vector subspace on which A acts.

Denote the corresponding Krylov sequence by Jfk. Prove that this sequence

236 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

5.2.1 [C] Study the convergence of the method (5.2.1) for the matrix

(\ 0 0

A=\ -1 0

V

by using as the initial subspace

(a) S = \in(eve2\

(b) S = lin(£g.

5.2.2 [B:67] Investigate the possibility of computing the eigenvalues of A by

using the LR method described in Example 5.2.1.

5.2.3 [D] Consider the matrices A' and A defined in Proposition 5.2.5. Prove

that

Α' = Α(Ι-Ρ) + (λ-σ)Ρ,

Α = Α(Ι-Ρ1) + (λ-σ)Ρ\

1

where P and P are, respectively, the spectral projection and the orthogonal

projection on the one-dimensional eigenspace associated with A.

5.2.4 [D] Study the convergence of the subspace iteration for an irreducible

Hessenberg matrix by using an initial basis of the form

/ \

* X X

0. *.. x

0 0. '*.

U0 = e(Cn

Ό. x

'*

0 0

V I

where * is a non-zero element, x is an element that is not necessarily zero and

all other elements are zero.

5.3.1 [A] Prove that on the assumptions of Theorem 5.3.1 we have

Ι ^ * ^ - Λ 1 | = 0(|μ 2 /μ 1 |*)

EXERCISES 237

5.3.2 [D] Suppose that A is Hermitian and that the conditions of Theorem 5.3.1

are satisfied. Prove that

| ^ k - A J = 0(|^2//iJ2*)

5.3.3 [D] How can the power method be used to compute the eigenvalue(s) of

least modulus of a regular matrix?

5.3.4[C] Study the behaviour of the power method for the matrices

5.3.5[B:49] For a given polynomial

ρ(ζ) = ζπ + α 1 ζ π ~ 1 + · · · + α , ι

we define its companion matrix (cy) as follows (see Exercise 1.1.13):

f—a„_ i + 1 if 7 = nand 1 ^ ΐ ^ η

c0=<l if j = i — 1 a n d 2 < i ^ n

(0 otherwise

Prove that the power method for C is equivalent to Bernoulli's* method for p(z).

This method consists in computing

Z a z

n +k = ln + k-l + a2Zn + k-2 + "* + a Z

n k>

5.3.6 [D] With the help of the power method and Exercise 5.3.2, propose a

method with quadratic convergence for the calculation of the spectral radius of

A without having to calculate the product A*A explicitly.

5.3.7 [C] Carry out the computations relating to Example 5.3.2. Verify that the

two limit vectors are

i?x =(—0.115, 0, 0.577, 0.808)T

and

t; 2 =(_0.115, 0, -0.808, -0.577) 7 ,

and that the vectors vt + v2 and vl — v2 are proportional to the eigenvectors

associated with 3 and — 3.

5.3.8 [B:67] Let λ1 and λ2 be the first two dominant eigenvalues of A. Suppose

that

P

238 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

Prove that every pair of linearly independent vectors chosen from the limit

vectors of the power method enables us to construct a 2 x 2 matrix with spectrum

Ui,A 2 }·

5.3.9 [B:67] Revert to Exercise 5.3.8 in the case in which A is real and λχ = — A2.

Prove that if v and w are two limits of convergent subsequences, then v -f w and

v — w are the corresponding eigenvectors.

5.4.1 [D] Propose a method of subspace iteration in order to compute the

eigenvalues of least modulus of a regular matrix.

5.4.2 [B:45] Establish the quadratic and cubic rates of convergence of the

Rayleigh quotient in the general case and in the Hermitian case respectively.

5.5.1 [D] Let H be a singular irreducible Hessenberg matrix. Prove that the

eigenvalue zero is recovered after one step of the QR algorithm.

5.5.2 [B:64,67] Study the equivalence between the algebraic proof of the QR

method (given, for example, in [B:67]) and the geometric proof given in Theorem

5.5.5. In particular, prove that if H is an irreducible diagonalisable Hessenberg

matrix and V is the matrix of eigenvectors, then all the principal minors of V~l

are non-zero.

5.5.3 [C] Apply the QR algorithm to the matrix

- G -?>

Show that one obtains two distinct constant subsequences. Comment.

5.5.4[D] Let Ae<Enxn. Consider the following algorithm, known as additive

reduction (AR):

A0 = A,

A — F'1A F

Λ _£,

Η1 fc ^k^k'

where Ek is the lower triangular part of Ak, including the diagonal.

(a) Prove that if A is an irreducible lower Hessenberg matrix with eigenvalues

of distinct moduli, and if the matrices Ak = (a|*)) generated by the AR

algorithm are such that

Vfc: |<{ | > \af21>...> I O >0,

then, as k -► oo, the diagonal elements of Ak tend to the eigenvalues of A.

(b) Compare the complexity of this algorithm with that of the QR method.

EXERCISES 239

5.5.5 [D] Examine the potential instability of the AR algorithm (Exercise 5.5.4).

In particular, examine the case of defective eigenvalues and the case of matrices

with a greatly extended spectrum.

5.5.6 [A] Compare the basis Qk defined by (5.2.1) with the basis Qk of the QR

method.

5.5.7 [B:21,26] Let Ψ be the vector space of polynomials with real coefficients,

endowed with a scalar product < ·, · > . Consider an orthonormal system

(Po> Pi»..., p«,...), where pk is of degree k.

(a) Prove that the polynomials pk satisfy the relation

Pn +1 (x) = (Anx + Bn)pn(x) - Cnpn _! (x).

(b) Prove that if ak and bk are such that

pk(x) = afcxfc + fokxfc_1 + ···,

then

Ak = -

'k+l

Ble = Al

<*k+lak-l

ck = al

(c) Define

B„

a* =

Ak

and construct the symmetric tridiagonal matrix

/ 0 0■ π

a

0 ßl U

α 0- π

ßl ι /»2. "U

T = 0 ßl. «2. 0

••A

0

••A '·«.

<P><7>= w(x)p(x)<?(x)dx,

240 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

polynomial p, the Lebesgue integral

P\\22w=\ vv(x)|p(x)|2dx

' - - ! >

exists.

(d) Show that, for each k ^ 1, the polynomial pk has k simple real roots in the

interval [a, b]; we denote these roots by xjik(j = 1,2,..., fc).

(e) Prove the identity

Φη(χ) = [PoM> Pi (*)> · · ·»Pn W ] T

Deduce that the roots of p n + 1 are the eigenvalues of Tn and that, by means

of a shift of origin, the basic QR method, when applied to T„, is convergent.

Consider Gauss's quadrature formula

where the weights wjn are such that the error En(f) is zero when / is a

polynomial of degree ^ In — 1.

(f) Show that the weights vv, „ can be deduced from the first component of the

eigenvector </>„(*,,„) provided that the moments of the functions w, that is

integrals

1w(x)x dx,

are known.

mk =

4' * k

5,5.8 [B:65] We generalize here the basic QR algorithm with the help of the

notion of isospectral flow.

(a) Show that every matrix Ae<Cn xn has a unique decomposition

Α = πι(Α) + π2(Α\

where π^Λ) is skew-Hermitian [πι{Α)* — — π{(Α)~\ and π2{Α) is an upper

triangular matrix with real diagonal elements.

For all B and X in <P x n define

[£, X] = BX - XB.

n xn

Let B0e<C and suppose that / is analytic in an open set containing the

EXERCISES 241

Β(ί) = [Β(ί),π 1 (/(Β(ί)))], Β(0) = Βο.

We call t\-+B(t) the flow defined by / . Let R and Q be the solutions,

respectively, of the equations

6(0 = 6(0π1(/(β(ί))Χ 6(0) = /Π,

R(t) = n2(f(B(t)))R(t), R(0) = In.

(b) Prove that Q(t) is unitary and that R(t) is an upper triangular matrix whose

diagonal elements are real and positive.

(c) Prove that

B(t) = Q(t)*B0Q(t) = R(t)B0R(ty \

Q(t)R(t) = e /(Bo)i .

Let A,(y = l,2,...,n) be the eigenvalues of B0 and let Vj be the associated

eigenvectors. We suppose that

|e/Wl)|>->|e/w")|,

when k= Ι , . , . , η - 1, lin(^ 1 ,...,e k )nlin(t; fc+1 ,...,i; n ) = {0}.

(d) Show that, when t -+ oo, the behaviour of B(t) is as follows: its strictly lower

triangular part tends to zero, its diagonal elements tend to A l9 ... ,λη in this

order and its upper triangular part remains bounded.

(e) Show that when the basic QR algorithm is applied to

it produces a sequence

,l k = e/(B(fc)) (fc=l,2,...).

We recover the QR method for B0 by taking f(z) = In z.

5.6.1 [A] Prove that the QR method preserves the Hermitian (or symmetric)

tridiagonal form.

xn

5.6.2 [B:45] Let A e C be a regular symmetric matrix.

(a) Prove that there exists a unitary matrix Q and a regular lower triangular

matrix L such that A = QL: this is the QL factorization of A.

We define the QL algorithm with shift ak of origin as follows: given Ak and

σλ, let

Ak-akI = QkLk

be the QL factorization of Ak — akl. Define

Ak+i ^LkQk + Gk1-

242 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

J = [>,.> *,«-1>···>*ΐ]·

L)

Let (/4^ ) be the sequence of matrices produced by the QL method and let

(A[R)) be the sequence produced by the QR method.

(c) Prove that

A[R) = 7A[L)I

and

7* = / = /- 1 .

Let A be a real symmetric tridiagonai matrix:

(*> ßl. \

I R.

ßl «1

A=

V "A- •a«

Define the Wilkinson shift by

ifa t = a 2 ,

2 1 otherwise,

Ul-sffl(S)ßl(\S\ + y/S + ß\)-

where δ = (α1 — a2)/2. Define the vector p by

(A —wl)p = el

and the vector q by

(A — ω/)<? = τρ,

where

\P\\2

(d) Prove that

\\(Α-ω^ί\\22 = ^^τηϊη(2β2ι,β22Αβ1βι\/^β)'

Deduce that if A is an irreducible real symmetric tridiagonal matrix, then the

QL algorithm, with Wilkinson's shift of origin, generates a sequence of

irreducible real symmetric tridiagonal matrices

/ a„<*) \

1 ß\k).

A = ft»(*) a <*>

Ä*',

··«<*>

ΐ,

EXERCISES 243

such that

5.6.3 [D] Propose a QL method for the computation of the eigenvalues of an

arbitrary matrix, based on the ideas of Exercise 5.6.2 and examine the convergence

of this method.

5.6.4 [B:49,54] Consider Jacobi's method for a symmetric matrix A elR" X ":

A0 = A.

Given the matrix Ak = (α^), let (p, q) be such that

|^>|= max |a}J>|.

the zero in position (p, q) by sin 0k,

the zero in position (q,p) by —sin0k,

the units in positions (p, p) and (q, q) by cos 0k,

(b) Apply this method to a 3 x 3 matrix and prove that its asymptotic behaviour

is the same as that under the inverse iteration method.

5.6.5 [B:45] Let T be a real symmetric tridiagonal matrix of order n. Let σ be

a shift of origin. The QL factorization (Exercise 5.6.2) of Τ—σΙ can be

accomplished by n — 1 rotations Jk; that is Jk is a rotation in the plane of the

coordinates (/c, k + 1) (in Exercise 5.6.4 take p = k and q = k 4-1). Thus

J 1 J 2 - J ( l - 1 ( r - ( T / ) = L.

(a) Show that the calculation of

on the right at this stage.

(b) Study the following shifts:

π τ (0)

σ = -^-' (Newton),

7^(0)

244 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

σ = ω_ίτΜ (Saadx

π'τ(ω)

where π τ is the characteristic polynomial of T.

5.7.1 [A] Let A, BE<C" X ". Consider the generalized problem

Ax = λΒχ (χ Φ 0),

where det (A — XB) is not identically zero. Let S be a subspace of C of dimension

m such that

dim[>l(S) + £(S)]^m;

than S is called a deflation subspace.

(a) Prove that there exists a unitary matrix Ue<Cnxn and a unitary matrix

Ve<Enxn such that the first m columns of V form a basis of K and

U*AV=['Axi Axl

0 Λ 22

#11 #12

L/*£K .

0 £22

m xm

where A x 1? J ^ x eC . Define

dif(i4 l l ,ß 1 1 ;i4 2 2 ,ß 2 2 )= min max{A(i4 1 1 ,/l22)^(ß 1 1 ,ß 2 2 )},

||X||F=I

nyiiF=i

where

*(A11M22)=\\A22Y-XAli\\F

and

A(B 1 1 ,B 2 2 )=||B 2 2 y"-JfB 1 1 || F .

Now consider two arbitrary unitary matrices in C x":

1/ = (1/!,1/ 2 ) and K=(K lf K 2 ),

nxm

where L/ 1 ,K 1 eC . Define

/*,.,. = £/|M V. and B y = l/fßK,.,

( n _ m ) xm

and, for given X, y in C :

£/; = ( l / x + I/ 2 X)(/ + X*X)~ 1/2 ,

l/'2 = (U2 - l / ^ M / + JlfX*)"1/2,

K , 1 =(K 1 + K 2 y)(/+ y*y)~ 1 / 2 ,

EXERCISES 245

(b) Prove that U' = (U\, U'2) and V = (V\, V2) are unitary.

(c) Prove that U*AV\ = U*BV\ = 0 if and only if the pair (X, Y) satisfies the

conditions:

A22Y — Ai4u = XAl2Y— A21,

D22Y— A B [ ] = XBl2 Y— ^21·

Define the constants

y = max{M 2 1 || F ,||B 2 1 || F },

(5 = d i f ( / l U ) B n M 2 2 , B 2 2 ) )

v = max{||/l 1 2 || 2 ) ||B 1 2 || 2 }.

(d) Prove that if

yv 1

subspace. Prove that sp(A,B) is the disjoint union of sp(All-\- Al2Y,B11 +

B12Y) and sp(>l22 - xA129B22 - XBu)·

(e) Deduce the following result: let U = (Uu U2) and K= (Vi9 V2) be such that

A2l= B2i=0. For any two matrices E and FinC" x "we define

e

y = 2i»

v = max{||412||2,||B12||}+ß12,

5 = dif(^11,ß11M22,B22)-(e11+622).

Then, if

yv 1

associated with the perturbed problem

(A + E)x = Λ(£ + F)x (x#0)

and the spectrum sp(/t + £, £ + F) is the disjoint union of the spectra

δ ρ ( ^ 1 1 + £ 1 1 + μ ΐ 2 + £ 1 2 ) 7 > β 1 1 + £ 1 1 + ( β 1 2 + £ 1 2 )7)

and.

sp(A 22 + F 2 2 - X(A12 + £ 1 2 ),B 2 2 + F 2 2 - X(£ 1 2 + F 12 )).

246 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

(f) Prove that the matrices X and Y of parts (c) and (d) satisfy the conditions

max{||Jf||F,||y||F}<2^

o

5.7.2 [B:53] Let A e ( P x n and Be€nxn be two symmetric matrices. We suppose

that B is positive definite. Consider the Rayleigh quotient

, x x*Ax

μ(χ) = -~— (x*0).

x*Bx

For a given vector xk we use the notation

to = Μ**)>

Ck = A- μβ.

Let

C* = Ö* — Ek — Fk

be the descomposition of Ck into a diagonal matrix (£>k), the strictly lower

triangular part ( — Ek) and the strictly upper triangular part ( — Fk). For a given

value of the parameter ω(>0) we define

*** = / - ^ - c » .

Consider the iteration

X

k+l ~ MkXk.

X X r

k + 1 ^ fc ~~ * k *>

r

k = (A-VkB)xk·

(b) Prove that if

0<ω<2

and

μo = min-^ = minμ(βi),

i bu

then

lim fk = 0,

k-+ao

where

^ = (A ~ VkB)xk

EXERCISES 247

and

A 1

B*B = LL*

be the Cholesky factorization of B*B. Show that

Ax = A£xoL(C - λΙ)1*χ = 2x,

1

where C = L" J?ML-*.

5.7.4 [B:52] Let AeCm x " and £e<Cm x n. Suppose that n ^ m and that

Ker/4nKer£ = {0}.

Let xeC" and suppose that Bx Φ 0. Define

F{x) = \\\Ax-p{x)Bx\\l

where

x*B*Ax

:

PW =

x*fc*£x

(a) Prove that the gradient of F at the point x is given by

VF(x) = [A - p(x)B~]*[_A - p(x)J5]x

The gradient method for minimizing F is defined by

*k+i=*fc + 0(**) (/c = 0 , l , . . . ) ,

where

* -2F(x k )

VF(xJ if||VF(x k )|| 2 #0,

0(**)H 0 if||VF(xfc)||2 = 0,

0 if£xk = 0.

(b) Prove that

Il^ + ill2 = KII2 -11^)112·

(c) Examine the convergence of the sequence (xk).

5.8.1 [B:40] Let A' be a matrix close to A. Suppose we know a simple non-zero

eigenvalue λ' of A', a right eigenvector φ' and a left eigenvector φ' such that

ΙΙ0ΊΙ2 = ^ ν = ΐ;

248 FOUNDATIONS OF METHODS FOR COMPUTING EIGENVALUES

S/=lim(/-F)(i4/-z/)-1

ζ-λ'

Define the following algorithm:

Φο = Φ'*

λΗ = ψ'*ΑφΗ9

(b) Prove that the sequence (</>k) satisfies

φ'*φ,= \ (* = 0,1,...).

(c) Interpret this method as a power method with defect correction.

(d) Interpret this method under the aspect of a modified Newton method.

5.8.2 [D] Let φ be such that || φ ||2 = 1 and Αφ — λΒφ. Suppose there exists a

vector φ0 such that φ^Βφ — 1. Study the convergence of Newton's method

applied to the equation

Ax - φ*ΑχΒχ = 0.

5.9.1 [A] Use the notations of Lemma 5.9.2. Compare ΙΙ(Β,Β)"1 ||F with

ΙΚα-σ/ΓΜΐζ when e x - !

5.9.2 [A] Consider Lemma 5.9.3. Prove that the difference between the singular

values of the matrices K and Π is of order ε1/2.

5.9.3 [A] Consider Proposition 5.9.7. Let of be the eigenvalues of F*F. Prove

that

σί = 0(η*").

5.9.4 [C] Let

(l

0 θ\

A= 0 1 1

yO 0 1

and let M be the invariant subspace associated with the defective eigenvalue

χ = l: M = linZ, where X = (e2,e2). In the method (2.11.1) take Y= X. Choose

EXERCISES

0 °l

17 = 1 0

ε

ν υ

(a) Show that ||_l/ - X ||2 = 0(ε).

(b) Show that B = Y*AU is diagonahsable.

(c) Let W be the basis of eigenvectors of B. Show that

cond2(W0 = O(6 -1 ' 2 ).

CHAPTER 6

Large Matrices

The numerical methods for large matrices are based on the principle of projection

on an appropriate subspace; they require only the product of the matrix A and

a vector, the matrix being stored in a secondary memory. The methods we are

going to propose in this and the next chapter are at present the most efficient

ones for computers of traditional construction (sequential computers) which can

dispose of a vectorial unit.

What is a large eigenvalue problem? Evidently, there exists no precise and

absolute answer to this question, for the notion of size depends on the computer

used. We could propose the following answer: an eigenvalue problem is regarded

to be large when it is much cheaper to compute only those eigenvalues and

vectors that are required, rather than to compute them all.

The eigenvalue problems of large sparse matrices arise mainly from the dis

cretisation of partial differential equations. The most frequent requirements are

to find (a) the least eigenvalues of a symmetric matrix or (b) the eigenvalues of

greatest real part of a non-symmetric matrix. For example, in structural

mechanics one may wish to compute several hundred eigenvalues of matrices

whose orders exceed 105. In quantum chemistry the order may reach 106 and

more. The majority of the spectral problems that have been solved up to now

are symmetric, but the share of non-symmetric matrices is increasing (problems

of stability and bifurcation are considered in Chapter 3).

The next two chapters present the state of the art with regard to the algorithms

for large eigenvalue problems. Several theoretical questions are open at present,

which explains the heuristic aspect of some of the algorithms that will be

described.

Chapter 6 is concerned with the extreme eigenvalues while Chapter 7 treats

the eigenvalues of greatest real part when the matrix is non-symmetric.

The principal idea is to approximate to the eigenelements of the matrix A of

order n by those of a matrix of much smaller order v, most frequently obtained

252 NUMERICAL METHODS FOR LARGE MATRICES

of the orthogonal projection in question.

The spectral problem:

find /leC and 0 Φ xe<P such that Ax = λχ (6.1.1)

is approximated by the problem in G,:

find v^eC and 0 Φ xleGl such that πι(Αχι — Λ,,χ,) = 0. (6.1.2)

The problem (6.1.2) is the Galerkin* approximation of the problem (6.1.1) (see

Raviart and Thomas, 1983). It is called the Rayleigh-Ritz* approximation in the

special case in which A is Hermitian.

The numerical method consists in constructing an orthonormal basis in G^ and

in solving (6.1.2) with respect to this basis. Let g, be the n x v matrix representing

this basis. Put xl — Q£h where £ z eC v and where ξι is a solution of

(6.1.3)

or, again, an putting Bt = QfAQt we consider the problem

find Aze<C, 0 Φ £ZCV such that B& = λ&. (6.1.4)

The matrix Bl is of order v and represents the map

<tfl = nlttf]Gi:Gl-+Gl

with respect to the basis Qv

In practice the subspace G, is constructed by starting with a Krylov sequence

generated either by a vector u or by a set of r linearly independent vectors {u.}rr

Let

S= lin (tip..., ur) (1 ^r<n).

There are two main classes of methods that arise from the following choices of G{.

(a) Sz = AlS when / = 1,2, — The dimension dim St = r is constant throughout

the iterations. This choice leads to the power method when r = 1 and to the

method of simultaneous iterations when r > 1.

(b) Jfz = (S, AS,..., AlS) when / = 1,2,..., v < n. The space Jf, is called the Krylov

subspace generated by {t^,..., ur}\ it is of dimension rl. When A is Hermitian,

this choice leads to the Lanczos method ( r = 1) or to the block-Lanczos

method (r > 1). When A is not Hermitian, the method reduces to the Arnoldi

method (r = 1) or the block-Arnoldi method (r > 1).

^ / — 1 , is evidently richer than St. In particular, it contains the subspace

f

Walter Ritz, 1878-1909, born at Sion, died at Göttingen.

THE METHOD OF SUBSPACE ITERATION REVISITED 253

7 in order to accelerate the convergence of the method ofsubspace iteration.

Supposing that the eigenelements X, x are given, one would like to know whether

there exists a sequence Xl9xt of eigenelements of s/t = nlA^Gl (or of Al = ntA, see

Exercise 6.1.1) that converges rapidly towards X,x when / increases. Since the

method of approximation presented here is a method of projection, it will be

seen that it is possible to bound the errors | λ — λι | and || x — xz || 2 as a function of

dist (x, Gt) =\\(I- π,)χ \\ 2 = sin 0(x, G,),

where 0(x, G0) is the acute angle between lin(x) and GL. In fact, the study of the

convergence is carried out in two stages:

(a) to show that for the choices of G, considered, there exist one or more

eigenvectors x such that <xt = \\(I — nl)x\\1 is small;

(b) to bound \λ — Xt\ and ||x — xl ||2 as a function of (xt.

The convergence rate of the methods is therefore derived from α,, which plays

a key role. We shall generally suppose that X is simple. When X is multiple, the

analysis is more delicate and the exponent involves the index of X (see Chapter 4).

Once again we suppose that

\μι\> \μι\ >-> brl > ΐΛ+ι I > '" > I/O > 0, (6.2.1)

where 1 < r < n. The definitions of M and P were given in Chapter 5. The reader

is referred to Section 5.2 for the definition and discussion of the convergence of

this method.

It was shown there that

ω(Α%Μ) = 0(|μ Γ+1 /μ Γ |'),

provided that (6.2.1) is satisfied and that dimP5 = r. This is a global rate of

convergence; we shall establish more precise and detailed upper bounds.

Lemma 6.2.1 Let dim PS = r and let xf be an eigenvector associated with μ(. For

every ε > 0, there exists a unique vector s( in S and an index k such that

254 NUMERICAL METHODS FOR LARGE MATRICES

r

S= Y^SjUj.

Hence

r

Since the {Puj}ri are linearly independent, there exists for any xteM a unique

Si in S such that Pst = x t . In what follows xf will be an eigenvector:

Axt = μ,χ,..

By definition

||(/-^)xi||2 = min||x/-);||2^||xi->;/||2,

where

Hence

\\^-yi\\2^v]^l\\L^-P)y\\2\\i-P)si\\2.

\\ßi\

Put

C = -,4(/-P), ρ(η = \μ,+ ί/μί\.

For every ε > 0, there exists an integer k such that, when / > /c, we have

IIC'H^piQ + e.

When /I is diagonalisable, X = XDX "* and || Cl || ^ cond 2 (X)pl(C). See Exercise

6.2.1 for a study of the constant, when A is not diagonalisable.

When /-»oo, then dist(xi}Si) tends to zero like Ιμ,+ ^μ,Ι'. The constant

II *i — Si || 2 diminishes as the acute angle between the eigenvector xt and the initial

subspace S becomes smaller (see Figure 6.2.1 for the case r = 1).

X I ^M

' x=Ps

Figure 6.2.1

THE METHOD OF SUBSPACE ITERATION REVISITED 255

II Xi-st II2 = || (I-P)si\\ 2 = tan 0(5ί,χ,·),

where 0(sf, xt) is the acute angle between the directions (st) and (xt).

Lemma 6.2.2 Let π be a projection matrix on a subspace S. Then nA and nAn

have the same eigenvectors associated with a non-zero eigenvalue.

PROOF The equation πΑχ = λχ (λ Φ 0) implies that x = πχ, and so xeS. Let B

be the matrix of nAn in an orthonormal basis V of S. Put x = Υξ; then Βξ = λξ.

|| x || 2 = 1. We choose λ to be among the r dominant eigenvalues of A. Put

χχ=\\(Ι-ηχ)χ\\2.

there exist eigenelements λχ, xt of Ax = nxA satisfying x*xx = 1 such that

\λ — λl\^:C0ίl and ||x — x j 2 ^ c a , .

When A is Hermitian, we have

\λ — λ{\ ^caf.

xx = Q,{, and Βχξχ = λχξχ.

Let π be the orthogonal projection on M. We are going to apply Theorem 4.4.5

to the matrices Ax = nxA and A' = nA, which will play, respectively, the roles of

A and A' in that theorem. Put

Η, = 4 | - Λ ' = ( π , - π Μ .

Since co(AlS, M) -+ 0, we deduce that

1|π,-π||2^0.

Hence

Ι|Η/ΙΙ2-+0 when/->oo.

When x is the eigenvector under consideration, we have

Hxx = (nx — n)Ax = λ(ηχ — n)x = λ(ηχ — I)x.

For sufficiently great / there exists an eigenvector Xj of Ah normalized by

256 NUMERICAL METHODS FOR LARGE MATRICES

Figure 6.2.2

\x-xl\\2^c(xh Ιλ-AjKca,,

where λι is the associated eigenvalue. This eigenvalue is simple when / is

sufficiently great, because || Hl || 2 -► 0 as / -> oo.

Let xt = χ,/ΙΙ Χχ II2 be the normalized eigenvector (see Figure 6.2.2). When A is

Hermitian,

Λι — Λ ι ΛΛι

and

W-X^cHA-AfaW

We now have

(A - At)At = A(/ - πζ)χ + (/ - π,Μίί, - x),

which implies that

Theorem 6.2.4 On the assumption that (6.2.1) holds and that dim PS = r, the

method of simultaneous iterations on r vectors converges. Moreover, if the ith

dominant eigenvalue μ( is simple, then the convergence rate of the ith pair of

eigenelements of Ax is of the order of\μr+1/μi\,i=l,...7r.

When A is Hermitian, the convergence rate of the ith eigenvalue becomes

l^r+lMI 2 .

PROOF The assertions are simple consequences of Lemmas 6.2.1 and 6.2.3. See

Exercise 6.2.3.

The algorithm (5.2.1), presented in Chapter 5, constructs the basis Qt of AlS. We

present here the version with projection on the subspace AlS. For the sake of

THE LANCZOS METHOD 257

in AlS a basis Xt of eigenvectors of A{ in the following manner (see Exercise 6.2.4):

(a) 17 = Q0R0i B0 = Q*AQ0 = Y0D0Y~\ X0 = Q0Y0.

W h e n / ^ l let

(b) AX^^Qfl» (6.2.2)

(c) B^QfAQ^Ypj-1, X^QtYl9

where Yt is the matrix of eigenvectors of Br

PROOF The proof is left to the reader, who will observe that the bases Qt in

(6.2.2) and in (5.2.1) are different.

Then there exists a sequence of regular block-diagonal matrices At such that

XtAi -► X when I -> oo.

PROOF We use Corollary 1.5.5 to assert that there exists a sequence of regular

matrices Δ, such that XtAt -► X.

The proof that Δζ is block-diagonal is carried out by induction (see Exercise

(6.2.5).

We suppose that μ( is a simple eigenvalue of A. Then μ|0 ->μ·, where μ{.° is the

ith (simple) eigenvalue of Bt associated with the eigenvector ξ{Ρ. We put

lin (x{P) -► lin (xf) take place at a rate |μΓ+1/μ,·|·

In practice one does not calculate Bt at each iteration. If the step of projection

takes place at every k iterations, this amounts to projecting on the subspace AklS.

Since the dimension r of Bx is moderate in comparison with n, the methods of

Chapter 5 can be employed in order to diagonalise B (for example by the QR

algorithm or by inverse iteration).

In this and the two subsequent sections we suppose that the matrix A is

Hermitian.

258 NUMERICAL METHODS FOR LARGE MATRICES

Let u be a non-zero vector. The Krylov subspace generated by u is denoted by

Kn = \\n(u,Au,...,Anu-1).

We have dim JT„ = n if and only if the minimal polynomial of A relative to u is

of degree n\ thus p{A)u φ 0 for every polynomial p of degree less than n.

In exact arithmetic, the Lanczos algorithm constructs iteratively an orthonor-

mal basis

(a) vl=u/\\u\\2, a1=v*Av1> bx^0.

(b) W h e n ; = 1 , 2 , . . . , n - l put

j+ 1 = II Xj+ 1 II 2 > 0>

vj+1 = b7+\xj+1, <*J+I=V*+1AOJ+1.

Tn is a Hermitian tridiagonal matrix with diagonal elements at(i = 1,...,n) and

off-diagonal elementsfc,(i= 2,..., n).

This algorithm was proposed by Lanczos (1950) as a method for tridiagonali-

sing a Hermitian matrix. However, in practice the vectors vt which are

constructed by local orthogonalization (with respect to t ^ and t;,·^), rather

quickly lose the property of global orthogonality. For this reason practitioners

have been led to prefer Householder's method, which is much stabler for

tridiagonalising a Hermitian matrix of medium size.

Since the vectors vt are computed iteratively, the process may be terminated after

the computation of /(< n) of these vectors. Then Vx = [v!,..., v{] is an orthonormal

basis of the space

jri = lin(u9Au9...9Al-1u)

in which the Rayleigh-Ritz approximation $4X is represented by the matrix

Τι=ν*Ανη

which is a tridiagonal matrix of order Z. This leads to the following algorithm:

(a) u Φ 0, £>! = y/ü*ü, v0 - 0,

(b) When; = l,2,...,/,put

v

u

j = 7Z> u = Avj~ bjVj_ ί, a. = vju, (6.3.1)

uj = u-ajVj, bJ+1=y/ü*ü.

THE LANCZOS METHOD 259

We suppose that the quantities 6,0* = 2,...,/) are positive, that is to say, that

dim J f j = L This assumption is not restrictive, for if it is not fulfilled, the

eigenvalue problem for A reduces to two subproblems (see Exercise 6.3.1).

The dimension / of the approximate problem is either fixed in advance or else

is determined dynamically by the algorithm according to the value of bl+1 (see

Section 6.3.5).

The tridiagonal matrix Tt is then diagonalised.

71 = W 7 ·

The eigenvalues Dt and the eigenvectors Xt = VlYl of At = ntA are called the

Ritz values and Ritz vectors of A; they are the required approximations of certain

eigenelements of A9 as we shall see. They possess the global optimal properties

envisaged in Section 4.6 of Chapter 4.

In the basic Lanczos method we seek to keep / small in relation to n, and there

can be no question of convergence in the classical sense of this term since / takes

only a finite number of values. When / is of modest size compared with n, we

shall see that Jf, contains eigenvalues of Ax which are sufficiently close to certain

eigenvectors associated with extreme eigenvalues of A. This property of approxi

mation justifies the choice of the Krylov subspace Jf,, but this is not the only

justification. There is also a computational reason: the Schmidt orthogonalization

process is particularly simple in the Krylov subspace; this is the reason why the

matrix Τχ is tridiagonal (see Exercise 6.3.3).

We shall now show that the Lanczos method can approximate only one

eigendirection associated with a multiple eigenvalues. The irreducible tridiagonal

Hermitian matrix Tt possesses only simple real eigenvalues (Exercise 5.1.1), which

may be arranged in decreasing order of magnitude:

λ®>λ{2>->λ®.

Let {2f; 1 ^ ii < d] be the distinct eigenvalues of A9 also arranged in decreasing

order of magnitude:

λί > λ2 > ' · ' > λά = Amin.

Let Pt be the eigenprojection associated with Af and let E be the subspace

generated by the vectors {Ρ,Μ: 1 ^ Ϊ ' ^ d } ; thus

£ = lin(P 1 w,...,P d M).

If these vectors are not zero, they are eigenvectors of A corresponding to the

distinct eigenvalues; they are linearly independent.

A' = A]E

which are simple eigenvalues.

260 NUMERICAL METHODS FOR LARGE MATRICES

PROOF We have

i=l i= 1 i= 1

A' produces the same matrix niAnl. If dim E = d' < </, then A' is of order d' and

possesses </' independent eigenvectors associated with distinct eigenvalues, which

can only be simple.

Ley Pz_ l be the vector space of polynomials of degree less than or equal to / — 1.

Suppose that P-u Φ 0 and put

ιΙΛ·"ΙΙ 2

x, is an eigenvector associated with λ·χ. Put

tan 9(xh ΧΊ) = min || p(A)yt || 2 tan d(xh u).

ρ(λ,)=1

d

it follows that

u = PiU + £pjU and i; = q^P.u + Σ ^ Ρ , " .

tan20(P|M,tO = ( £ . ^

If(/-P l )M#0, we have

Σ «2(^)l|P/i|li = Il^)yj*||(/-P>||*.

THE LANCZOS METHOD 261

Ρ(·) = - .,

Now

tan 0(Pfu, Jff) = min tan Θ(Ρμ, ν)

veJfi

-Γ min „ P ( „ y , l l 2 l f c i ^ .

LPePi-i

"p(Ai)=l J II Λ" II2

Finally, we observe that

tan0(P,. M , M )J l ( / - P ) ""*.

II P.« II2

It remains to obtain an estimate for the number

i„ = min \\p(A)yt\\2.

pePi-i

p(Ai)=l

This can be accomplished with the help of Chebyshev polynomials. We recall

that for real t and |i| > 1, the Chebyshev polynomial of the first kind and of

degree k is defined by

Tk(t)=Η(ί+yfi^r+(t+y/F^in

In Chapter 7, Sections 7.2 and 7.3, we shall collect the properties of Cheby

shev polynomials of a real or complex variable which are required in Chapters 6

and 7.

wfter^

Amin

Δ1 = 1, Δ | = Π^ (i>D

J<i kj — kt

and

M ~~ M+1

7i = l + 2

X| + 1 — / „

262 NUMERICAL METHODS FOR LARGE MATRICES

PROOF We have

t ü= min ||ρ(Λ)^|| 2

pePi-i

Ρ(λ,)=1

= min 1 JTF l

pePi-

ρ(λ,)=1

(a) The case i = 1:

max|p(A,)|< max \p(t)\.

j>l i€[Aml„.A2]

By Theorem 7.2.1,

min max |p(i)l = ,

Ρ(λ,)= 1

where

^1 ~ ^ 2

?i = l + 2

A? Ami

il7 = min max|p(A;)|

pePi-ι i/i

p(A,)=l

^ min maxIpWI·

pePi-i j>i

P(A<)= · = ρ ( λ , - ι ) = 0

PU,)=1

Pit)

maxlpi^O^max

, r v J

j>i " i>i

«(*,)

<(Π "m,n max

*<* A k — A; / j>i

«Λ)

We conclude that

When / increases, the decrease of tan d(xh Jfj) is of the order of the decrease

THE LANCZOS METHOD 263

of 1/T, _;(}>;). The quantity y( depends on the relative distance (Λ· —A i+1 )/

v^i+i ~ - ^ m i n ) .

Put

*i = Vi + y/yi-i-

For sufficiently great /, the value of T, _,·()>;) is of the order of ^τ\~\ and the rate

of decrease of 0(xh ΧΊ) is Ι/τ,. This rate is the better the greater yt is and γχ is the

greater the smaller i is in comparison with /; that is X; is an eigenvector associated

with one of the greatest eigenvalues of A.

6.3.4 Approximation

We seek to estimate the precision of the Lanczos method as a function of Uu and

the spectrum of A, for a pair of eigenelements λ, x, || x || 2 = 1. We put

«i —\\U — π ι ) * II2 = si*10(x, #Ί) < t a n 0(x, ^i)>

which can be majorized with the help of Theorem 6.3.3.

Theorem 6.3.4 Suppose that λ and x are given. If a, is sufficiently small, there

exist eigenelements λι and xt of Ax such that \ λ — Λ,| ^ cocf and sin 0, ^ caf, where

c is a generic constant and where 6t is the acute angle formed by the eigendirections

lin (x,) and lin (x).

corresponding eigenvector. Let Γ, be a Jordan curve lying in res(/4)nres(y4,)

which isolates λ and λχ. Let Px and Rx(z) be the eigenprojection and the resolvent

associated with Ax and λχ. Then

2niJn

(z)(/-^MÄ(z)xdz

2πυΓι

- ^ f ^αζ(/-π()χ,

2πΐ J r ,A — z

because

R(z)x =

Now

|| ( P , - P ) x || < ^ V , a „

2π

264 NUMERICAL METHODS FOR LARGE MATRICES

where

μ, = meas (Γ,), ct = max || Rt(z) ||, dt = [dist (A, Γ,)] \

ZBTI

If we put

μΐ

c = — max (Mat),

2πκι<η

we conclude that

\\(Pl-P)x\\^coil.

Let Xj = Pzx and let 0, be the acute angle between lin(x) and lin(xj). Then

|| X/1| 2 = cos 0j, xfx = cos 2 0,

and

| Xj — x || 2 = sin 0j ^ co^

(see Figure 6.3.1). Put

Put

1

(xtxt)112'

We have the relations

Λ Ai -— X ./1X Χ · /ΊιΛι

whence

μ-Αί|<|Μ||2(2||χ-χ/||2+||(/-πι)χ||2).

Since || x — x, || 2 = 2 sin 0,/2, we deduce that

μ-λ,Ι^ΙΙχ-Λ,ΙΙ^οα,.

Finally, as in Lemma 6.2.3, we show that | λ — Α,Κ ca2 when A is Hermitian.

Theorem 6.3.4 demonstrates that, for moderate values of/, the Lanczos method

enables us to approximate the extreme eigenvalues of A (the greatest and the

least, the latter result being obtained by arranging the eigenvalues of A and of

Figure 6.3.1

THE LANCZOS METHOD 265

are well separated. The constants that occur in Theorem 6.3.4 can be made precise

(see Exercise 6.3.6).

When A, is close to Ai+1, the bounds deduced in Theorem 6.3.3 become very

pessimistic. We can improve them by taking into account the particular structure

of the spectrum. For example, it can be shown (Saad, 1980a) that

where

/t

i + 2~~ / min

but the constant c„ which contains (Af — Ai+x)~ \ is large.

When A is a multiple eigenvalue, the Lanczos method (in exact arithmetic) does

not enable us to compute the set of eigenvectors associated with A. However, in

practice, rounding errors have the effect that from a certain number of iterations

onwards, the Lanczos method is applied to a neighbouring matrix having neigh

bouring eigenvalues that are distinct (no longer multiple). In fact, it will be noticed

that a second copy of A appears which corresponds to a second eigenvector that

is not proportional to the first. This second copy appears as a result of the

Lanczos method being applied with an initial vector that has a zero component

(to machine precision) in the desired eigenspace. When A is of multiplicity m, then

m copies will appear successively as / increases.

We shall return to the question of multiple eigenvalues in Section 6.4, where

we shall present the Lanczos block method.

In accordance with the construction of the basis Kz we have the identity

AV^Vfi + bt+M+tf, (6.3.2)

z

where ex is the /th vector of the cononical basis of C . Recall that Xx = Vt Yv Thus

AX^Xfr + bt+yO^ejYt.

The residual for A, calculated in terms of A}0 and xj°, is given by

whence

Mx<'>-^xf>||2 = f> 1+1 |^| = ^ .

There exists an eigenvalue A,· of A such that

266 NUMERICAL METHODS FOR LARGE MATRICES

ß2 ß

\λ,-λ®\^ and

sinö,,^»

d

du u

where θη denotes the acute angle formed by the eigendirections lin(xj°) and

Ιίη(χ,).

In practice, it will be known that the eigenvalue A{.° (respectively the eigen

vector xf}) has 'converged' by observing the last component of the eigenvector

ξ{}} of Tr When the QL algorithm with shift of origin is used to compute the

eigenvalues of T,, then it is possible to compute ξ{{) without having to compute

the whole vector ξ® (for details see Parlett, 1980, Ch. 13).

The study we have just carried out supposes that the vectors {i;.}^ remain

mutually orthogonal. This is not true in practice when a calculator is used whose

arithmetic is only finite. In particular, this orthogonality disappears when λ{!\

χ|° begin to approach Λ,.,χ. (see Exercises 6.3.7 and 6.3.8). The exact Lanczos

algorithm terminates at /(< n), although in practice it can be continued indefinitely.

The error induced by the absence of orthogonality can only retard the

'convergence'; it does not prevent it. In this situation, three strategies have been

employed to implement the Lanczos method; we shall briefly recall their

advantages and inconveniences:

(a) Strategy of non-reorthogonalization. The memory space required is minimal.

However, more than n steps are needed to obtain all the eigenvalues

(including the multiple ones); in general between 2.5« and 6n steps are

necessary. The stopping criterion is delicate, and the eigenvectors have to be

computed separately.

(b) Strategy of complete reorthogonalization. The behaviour of this algorithm

is very close to that of the exact algorithm; in particular, it requires a minimal

number of steps. On the other hand, it needs more memory space, since it is

necessary to store the Vj and it needs more calculations (but the Gram-Schmidt

process is well vectorized).

(c) Strategy with partial reorthogonalization. This is a compromise strategy; the

reader can find a description in Parlett (1980, Ch. 13). Essentially it provides

the advantages of complete reorthogonalization at a much lower cost. This

is achieved by watching the precision obtained with the help of the value of

A,.

In exact arithmetic the Lanczos method cannot detect the multiplicity of the

eigenvalue which is being computed. This fact has led to proposing the block

Lanczos method. This enables us to determine multiplicities that are less than

or equal to the size of the block.

THE BLOCK LANCZOS METHOD 267

Let {wj,..., ur} be a set of linearly independent vectors that generate a subspace

S. The Krylov subspace generated by S is defined as

j r l = Iin(SMS,...,i4 , - 1 S).

We construct an orthonormal basis Vx of Kt in which Ax is represented by the

matrix

which is a block tridiagonal matrix, the order of each block being r. Moreover,

the blocks form a band of size r + 1 (see Figure 6.4.1).

Write

»Ί = [βο,..·,βι-ι].

where the orthonormal basis Qj of AJS is constructed from the orthonormal

basis Q0 of S in the following manner:

(a) χ Ι , - β ί ^ ρ ο , Β ^ Ο , ρ . ^ Ο ;

(b) when ./= 1,2,...,/-1,

(i) put X, = AQj_, - β,._ Jj- Qj_2B*,

(ii) carry out the Schmidt factorization

(iii) put BJ+ x = R., i . + x = ß7MQ ..

If dim Jf j = /r, the matrices Rj are regular provided that the degree of the

minimal polynomial of the vectors {w}^ is at least equal to /.

Figure 6.4.1

268 NUMERICAL METHODS FOR LARGE MATRICES

D

Lemma 6.4.1 The eigenvalues ofT1 are of multiplicity less than or equal to r.

PROOF Since the matrices Rj are regular, the matrix t{ is of rank ^ n — r. For

each eigenvalue λ of A, the matrix Tl — λΐ is still of rank ^ n — r. Hence

dimKer^-AJKr.

Now let E be the subspace generated by the {Ρβ}\.

Lemma 6.4.2 The block Lanczos process amounts to approximating the eigen

values of A' = A\E whose eigenvalues are of multiplicity ^ r .

associated with A, is PtS and dim PtS < r.

Let {μ.}" be the eigenvectors of A9 each counted with its multiplicity and

arranged in decreasing order of magnitude:

associated with the eigenvector x. and the eigenprojection P. = x.xf. Then

multiplicity r.

Ti-i(yk)

where

ί

Δ 1 = 1, Δ ί = Π ^ (*>1λ

and

Γ

s= Σ^Ρ

J-i

THE BLOCK LANCZOS METHOD 269

and so

Ps = tsjPuJ-

Since the r vectors {PUj}\ are independent by hypothesis, there exists a unique

vector sk < S such that Psk — xk for each given eigenvector xk (fee/). We put

vk = (I-P)sk = sk-xk.

Hence

Us k -x fc || 2 = tan0(xfc,sfc).

For a given xk we consider the vector veX{ which can be written in the form

v = q(A)sk9 where gePj_ x.

Since

Sk = xk + 2., -Pj*fc>

we have

W

(a) TTie cose i = 1. Here Sj is defined by Xj

||(/-P)i,||>= «^illPjSillS

II ^ II2 J>1+- Λ )

The minimum of the right-hand side for gePj-i is attained for p. We put

s = p(A)sleJfh

2 0 μι+Γ + Μπ

PI — Vl+r-Vn

~Vn

Hence, for j ^ 1 + r,

<*lVj-ßl = 1- 2AW

Thus

tan2 0(x ^ . Τ , Κ

IPs"22

ji> 1 + r p2(Aii)

< y ll p ^"2

270 NUMERICAL METHODS FOR LARGE MATRICES

and

Σ \\PjSl\\22=W-P)sl\\22 = \\s1-x1\\22.

3> 1+r

2 μί+ μη

β. = '*

ßi + r-ßn

therefore α,·μ& — β( = yk. We define

PM = T^fat-ßü

Lj<i

then Ρί(μ,) = 0 when; < i. Let s = Pi(A)sk9 where sfc is defined by xk. Now

K

<_.. '.~.llx»-sJ5·

TU7ky

We remark that the bounds of Theorem 6.4.2 reduce to those of Theorem 6.3.3

when r = 1 and the eigenvalues are distinct. The angle 9(xk,Jft) decreases like

T^_\(yk), where yk depends on the distance μΙί — μ ί+Γ . The generalization of the

Lanczos method to the block Lanczos method has an effect that is comparable

to the transition of the power method to the method of simultaneous iterations

(see Section 6.2).

Bounds for \μΗ — μ[1)\ and || xk — x[l) ||2 (kel) can be established as in Theorem

6.3.4.

It is required to solve the generalized eigenvalue problem

Κχ = λΜχ (χ#0), (6.5.1)

where K is symmetric and M is positive definite symmetric (see Chapter 3). This

can be reduced to the standard form A — vl in different ways.

Let M = RTR. We put

A= RTKR\

THE GENERALIZED PROBLEM Kx = λΜχ 271

where the product is not evaluated explicitly for large matrices. Equation (6.5.1)

is equivalent to

Ay = Xy,

l

where x — R~ y. This reduction preserves the eigenvalues. Suppose we wish to

compute some of the smallest eigenvalues; the convergence factor for the least

eigenvalue kx is determined by

Now this number may be very small. In structural mechanics, it is not rare to

have A! = 105, A2 = 2 x l 0 5 and i m a x = 10 19 , which leads to y1 = 10" 1 4 . It

requires about n Lanczos steps to separate λχ from λ29 even in exact arithmetic.

An efficient remedy is provided by the spectral transformation, which is a natural

generalization of inverse iteration.

Let us choose σ close to the eigenvalues rquired and such that K — aM is regular.

Equation (6.5.1) has the some solutions as

λ—σ

It is natural to put

Α = (Κ-σΜ)~ιΜ and v=

λ-σ

However, A is no longer symmetric with respect to the Euclidean scalar

product. We have the following lemma.

Lemma 6.5.1 The matrix A is self-adjoint with respect to the scalar product

defined by M.

<u9vyM = vTMu.

Then

(Au, v}M = vTMAu = ντΜ(Κ - σΜ)~ lMu

= [(X - aMylMv]JMu = <w, Av}M.

Therefore we may define the algorithm (6.3.1) for the problem (6.5.2) provided

that the Euclidean scalar product is replaced by <·, > M . This yields the following

scheme:

272 NUMERICAL METHODS FOR LARGE MATRICES

(b) When; = 1,2,...,/, put

u w

w:=—.

bj

Solve (6.5.3)

(K — aM)u = w, u:=u-bjVj_iy

T

a. = u w, u:u - ajVp w = MM,

T

b

j+l = u w.

The additional cost in relation to (6.3.1) at each step consists in the evaluation

of w = Mu and in the solution of (X — aM)u = v. This solution is carried out with

the help of the factorization K - σΜ = LDlJ, where L is a lower triangular

matrix with a unit diagonal. The strategy of complete reorthogonalization is here

recommended in order to keep / as small as possible. The spectral transformation

1

t-a

transforms the part of the spectrum that is close to σ into the extremities of the

spectrum of A. Hence the algorithm (6.5.3) will efficiently compute the eigenvalues

in an interval containing σ. If required, one could use different shifts σ. This

method enables us to determine any eigenvalue whatsoever in the interior of the

spectrum if we know an approximation σ.

Remark For very large problems, the triangular factorization cannot be kept in

the central memory. If transfer time between the central and the secondary memories

is important then it may be advantageous to use the block Lanczos method, which

involves the solution ofr systems of the form (K — σΜ)uf = w,· (i = 1,..., r).

For the sake of simplicity we have assumed that M is positive definite, but in

practice it may be singular (see Exercises 6.5.1 and 6.5.2).

We shall now describe a method that extends the Lanczos method to the case

of a non-Hermitian matrix. It is based on Arnoldi's algorithm which iteratively

transforms a matrix into a Hessenberg matrix (Arnoldi, 1951).

Again, if u Φ 0, let

j r l = lin(ii,i4M,...M , " 1 w)

be the Krylov subspace generated by u. Arnoldi's method computes an ortho-

ARNOLD'S METHOD 273

Hessenberg matrix:

(a) O^u/Wu^hui^v+AOi,

(b) when 7 = 1,...,/ — 1, put

j

xj+1=Avj- Σ huvi9 hj+ij=\\xj+1\\2, (6.6.1)

V

J+i=Λ;Λ.Λ·+1> hj+1=**AVJ+i e <i+*)·

The algorithm terminates when χά = 0, which is impossible if the minimal

polynomial of A with respect to u is of degree > /. If this condition is satisfied,

Hi = (hij) is an irreducible Hessenberg matrix.

In what follows we shall suppose that A (and hence At) is diagonalisable with

eigenvalues {A,·}*}. Since Ht is diagonalisable, it necessarily possesses / simple

eigenvalues, which we shall denote by {Aj0}^. Let Pf be the eigenprojection

associated with λ{. ΙίΡμφ 0, we put x, = Pf w/ll P»w II2 ·

■p(Ai)=l

PROOF We have

||(/-7r l )x i || 2 = dist 2 (x i ,Jf / ) = min \\Xi-q(A)u\\2.

qePi-i

1

-lq(A)Piu + q(A(I-Pi))ul

\\PiU\\2

Since q(A) = ^(Λ^Ρ, and

we obtain

1

Ιΐα-π^χ,ΙΙ,^ min \\p(A(I-P,))(/-PtM2-

i.Pi-1 IIΛ" II2

pUi)=i

Now v4 is diagonalisable, say Λ = XDX~\ and so A(I - Pf) = XD'X'1, where

£>' is a diagonal matrix consisting of the eigenvalues Xj(j Φ i) of A. Hence

||p(A(I - P,))|| 2 ^ min|p(^-)| cond 2 (X).

274 NUMERICAL METHODS FOR LARGE MATRICES

I K / - W I 2 cond (X).

2

\\Pi*L

Put

ε(.°= min max \ρ(λ)\.

pePi-i sp(A)-{ki}

pU,)=l

This is the uniform norm of the best approximation to the zero function on the

set sp(A) — {AJ by means of polynomials in a complex variable of degree </

satisfying the condition ρ(λ() = 1.

In Chapter 7, Theorem 7.1.6 and Example 7.1.2, it will be shown that if the

spectrum of A consists of d distinct eigenvalues, then among the d — 1 eigenvalues

of A that are distinct from A, there exist / eigenvalues, denoted by A 1? ..., Aj such

that

K — λ]

PU)=1 \ k*j

It is seen that, when / is large, ε(0 decreases as the terms of the form

|(Ak — X)l(Xk — Xj)\ (k Φ]) in the denominator increase. Therefore it is likely that ε(Ζ)

will be smaller for those eigenvalues situated at the periphery of the spectrum

(which includes the dominant eigenvalues) rather than for those that lie in the

interior of the spectrum. This is confirmed in practice.

6.6.3

The discussion of the decrease of ε(/) when / increases is a difficult problem in the

approximation theory of functions of a complex variable. Except in particular

cases in which the spectrum is of a very special form, it is not easy to establish

upper bounds of ε(° that are both simple and precise. The two examples below

show that ε{Ρ depends on sp (A) in an important manner.

Example 6.6.1 When the eigenvalues are uniformly distributed over [0, 1], we

have

^ =^ 7 . (7=l,...,n)

w

and ε(Λ"1> = : 1 1

n-\ ' 2"- -!

Example 6.6.2 When the eigenvalues are uniformly distributed over the circle

\z\ = /, we have A,· = exp [2(; - 1)πί/η] (; = 1,...,n) and ε(0 = 1//.

It is seen that the decrease of ε(0 with increasing / can be quite moderate for

ARNOLD'S METHOD 275

certain cases of spectral distribution. The study of ε(,) is pursued by studying the

upper bound η{1); this is obtained by letting z vary in a domain D which contains

sp(A) = {λ} and excludes λ. In fact,

max |p(z)|<maxi0( 2 )|

sp(A)-{X) D

and

e ( l ) ^ij ( , ) = min max|p(z)|,

pePi-i zeD

Ρ(λ)=1

When the matrix A is real, its spectrum is symmetric with respect to the real

axis. If the eigenvalue λ is also real, we may choose for D a domain that is

symmetric with respect to the real axis.

The following theorem determines η{1) in three particular cases: λ is real and

D consists of

(a) a line segment,

(b) a disk,

(c) the interior of an ellipse with real major axis.

Theorem 6.6.2 We have the following characterizations, where a,X — c,e and p

are positive real numbers:

(a) When D is the real interval {i; 11 — c\ ^ a},

(c) When D is bounded by the ellipse with centre e, focal distance e and semi-

major axis a,

Figure 6.6.1

276 NUMERICAL METHODS FOR LARGE MATRICES

6.6.4 Approximation

We shall prove the analogue of Theorem 6.3.4 after putting

α/=||(/-π,)χ||2,

Theorem 6.6.3 Suppose that λ and x are given and that λ is simple. Then ι/α, is

sufficiently small, there exist eigenelements λι and x, of At such that \λ — AJ< ca(

and sin 0, ^ calf where c is a generic constant.

PROOF We revert to the proof of Theorem 6.3.4, where it was shown that

\\?i — PII2 < ca 2· We suppose that the eigenvectors x and x, are such that

ΙΙχ|| 2 = ΙΙχ/ΙΙ 2 = ΐ .

(/)

We introduce a vector x which is proportional to x and satisfies xf. x = 1 so that

P,x (i) = x,.

Similarly, define

Ptx = x\

(see Figure 6.6.2).

Now

(

ΙΙ(ρ,-ρ)χΙΙ

I 2 = ΙΙχ;-χ|Ι 2 = ΙΙχ;ΐΙ2ΙΙχ "-χ(ΙΙ2.

We obtain

A —— X»# /\X ,

= X

ΙΓΊΓ **V - πι)Αχ + χΐ*πιΑ(χΊ- χ

) \

ll-^il^L J

ARNOLD'S METHOD 277

x λ2

X * λ,

* * »»

λ

X 4 X

X λ3

Figure 6.6.3

Hence

μ - λ.\ ^ c ( max I M l ] α ^ ca

1

ν<'<»ΙΙχίΙΙ2/ ' '

Suppose we wish to approximate the dominant eigenvalue λ = λν We assume

that the remainder of the spectrum is real (or nearly real; see Figure 6.6.3). Then

Theorem 6.6.2 enables us to deduce an error bound equal (or nearly equal) to

that of the Lanczos method which amounts to Ι/Τ^^γ^ (without, however,

the exponent two for the eigenvalues).

Supposing that the dominant eigenvalue is real; we can obtain an approxima

tion for the ith eigenvalue which is still very close to that provided by Lanczos,

on the condition that the remainder of the spectrum is real or nearly real.

When A possesses complex eigenvalues, the study of the precision of Arnoldi's

algorithm is far less conclusive than that of the Lanczos method. The reader will

understand that to a large extent this is due to the lesser degree of perfection of

the theory of uniform approximation on a compact set in the complex

plane.

We have the identity

AV

l=VlHl + h

l+LlVl+iel>

whence we deduce that

and

Now A is diagonalisable by hypothesis, say A = XDX~l. By Theorem 4.4.1

there exists an eigenvalue λ such that

\X-X«>\<cond2(X)hi+l9lffl>\.

In contrast to what happens when A is Hermitian, this bound cannot be entirely

calculated a posteriori on account of the presence of the factor cond 2 (X).

278 NUMERICAL METHODS FOR LARGE MATRICES

The computation of the matrix Ht is extremely costly in practice. The technique

of incomplete orthogonalisation is often preferable to it, which we are now going

to describe. This technique is based on the (heuristic) observation that the

elements Λ0 of Ht decrease for fixed i when j decreases.

We construct a basis {w.}\ oiXl in the following way: let q be a given positive

integer, w ^ O a vector and wi =M/||W|| 2 · W h e n ; = l , . . . , / p u t

(a) y = Awß

(b) when i = max (1, j — q\ then up to ί = / put y: = y — wfi^ where h{j = wf Aw:,

(c) Wj+i=y/hj+Uj, where hj+l = \\y\\2.

Theorem 6.6.4 The vectors defined in (6.6.2) form a basis Wt for Xl such that

wfwj — di} when \i —j\ ^ q + 1.

PROOF Put

i* = max(l,;-<7) = ^ .

[j-q if; > q.

when ; = 1,...,/, we have

i = 2*

Let Ht be the band Hessenberg matrix such that its non-zero elements are hi}

when i —\^j^i + q:

\ 0

we have the identity

AW^Wfit + K^w^e]. (6.6.3)

satisfies

H^H^rtf, r^^^G-'Wfw^^

ψψ

i ν

that the map Ax is represented by the matrix

Hl = G~lBl

OBLIQUE PROJECTIONS 279

with respect to the adjoint bases Wl and WlGl * (Exercise 6.6.3). On multiplying

the identity (6.6.3) by Wf we deduce that

algorithm with incomplete orthogonalization that are useful in practice. They

use the band Hessenberg matrix with or without the correction term rtej (see

Exercise 6.6.4).

Another way of limiting the cost of computation in practice is to use the

iterative variant of Arnoldi's algorithm in a heuristic manner. Starting with u

and fixing a moderate value of Z we compute the eigenvectors φψ οίΑν We begin

again, using as a starting vector a linear combination of the φ{}\ No proof exists

for the convergence of this method (see also chapter 7, Section 7.9).

In Section 6.1 we have expounded the principle of approximation of eigenelements

of A by means of an orthogonal projection upon a judiciously chosen subspace.

When A is not Hermitian, we might be led to consider oblique projections. We

propose the following formal presentation which uses two non-orthogonal

subspaces Gf and Gf of dimension v « n. Let ώι be the orthogonal projection on

Gf. The problem (6.1.1) is approximated in G\ by the problem: find A,eC,

OT^X/GGJ such that

ώ^Αχχ — λιΧι) = 0

The problem (6.7.1) is known as the Petrov approximation of (6.1.1) (Chatelin,

1983), p. 64 and Ch. 4).

We construct orthonormal bases V\ and Vf in G\ and Gf respectively.

Equation (6.7.1) becomes

[νί*Αν})ξι^λ,νί*ν}ξρ

which is a generalized eigenvalue problem.

The reader can verify (Exercise 6.7.1) that the orthogonal projection ώι on Gf

Figure 6.7.1

280 NUMERICAL METHODS FOR LARGE MATRICES

defines an oblique projection π' on G) (see Figure 6.7.1); this justifies the name

of oblique projection for this method.

method of oblique projection on the subspaces

Jf^lin^/h*,...,^"1) and 2'l = l\n(vuA*v,...XA*)l-1v).

(a) Choose υλ = u and wl = v such that

w*vl — 1, fc1=c1=0.

(b) When; = l,2,...,/put

uj+^AVj-ajVj-bjVj.^

cJ+1=(|*i+1ii+1|)l/2,

v

V

-ϋ-α±

j+l— >

C

J+1

method to use in practice since no proof of 'convergence' exists. Its great

advantage over Arnoldi's method is the fact that only a small storage space is

necessary. (See J. Cullum and R. WiUoughby, Ά practical procedure for

computing eigenvalues of large sparse nonsymmetric matrices', Cullum and

WiUoughby, 1986, pp. 193-240).

The general presentation of this chapter is adapted from Chatelin (1983); see also

Saad (1980a, 1980b). The convergence theorem 6.2.4 is due to Chatelin and Saad.

Theorem 6.3.4 was proved in Saad (1980a) by means of a variational formulation.

The proof based on the spectral theory, which is given here, does not rest on the

fact that A is symmetric; it therefore applies to Theorem 6.6.3.

Paige's dissertation (1971) is the origin of papers in the 1970s on the effect of

finite precision in the Lanczos algorithm. The selective reorthogonalization is

presented in Parlett and Scott (1979), the partial reorthogonalization in Simon

(1984) and the algorithm without reorthogonalization is studied in the book by

Cullum and WiUoughby (1985). The block Lanczos method was introduced by

Golub (1973).

The spectral transformation studied in Ericsson and Ruhe (1980) is much used

EXERCISES 281

non-symmetric matrix) is discussed in Parlett and Saad (1987).

A comparative study of the performances of the method of simultaneous

iterations (with or without Chebyshev acceleration) and the Lanczos method is

proposed in Nour-Omid, Parlett and Taylor (1983). Although quite often the

Lanczos method reveals itself as having a better performance than the iteration

of subspaces, even when accelerated, this advantage generally disappears when

the matrix is no longer symmetric, as we shall see in Chapter 7. Finally, Parlett's

article (1984) is directed towards the existing numerical software.

The practical use of the non-symmetric Lanczos algorithm is studied in Parlett,

Taylor and Liu (1985). See also Saad (1982a).

EXERCISES

6.1.1 [A] Let nb Gx and s/t be the mathematical objects defined in Section 6.1.

Let At = πχΑ. Prove that jrft and Ax have the same non-zero eigenvalues.

6.1.2 [D] Let N > n. Consider three matrices aaeCn x", aßeCN x n and aveCn x N

such that

<*« = aßP = ra

v

Nxn nx N

where peC and reC are such that

rp = /„,

Define the following square matrices of order N:

Aa = paar, Aß = paß, Ay = a/.

Let μ be a non-zero eigenvalue of algebraic multiplicity m. Let weCn x m be a basis

of the right invariant subspace and let i;eC Xm be a basis such that v*u = Im.

Define

σ = ν*αα and n = pr.

Let Sa be the block-reduced resolvent of aa associated with the eigenvalue μ.

Let ra(z) be the inverse operator of y -»a^y — yz, where zeCm x m is a given matrix

whose spectrum is disjoint from that of αα.

(a) Prove that σ is regular.

(b) Prove that for each xeC" x m we have

sa(x) = lim re(z)[(/m - tw*)x].

(c) Prove that μ is an eigenvalue of algebraic multiplicity m of the matrices Aa, Αβ

and Ay.

282 NUMERICAL METHODS FOR LARGE MATRICES

(d) Obtain the spectral projections for Aa,Aß and Ay as functions of αα9αβ,αν

p, r, u and v.

(e) Prove that for each XeCN x m the block-reduced resolvents of Aa, Aß and Av

associated with μ, are given by the formulae

Sa(X) = psa(rX)-(IN-n)XG-\

Sß(X) = (psa(aßX) - l(IN - ρησ-ιν*αβ)Χ1σ- \

l l

Sy(X) = laysa(rX) - (IN - ayua~ v*r)X^-

respectively.

6.2.1 [A] Study the constant \\Cl\\2 of Lemma 6.2.1 when the matrix A is not

diagonalisable.

6.2.2 [D] We retain the notation of Exercise 6.1.2.

(a) Prove that the basis of the right invariant subspace of Ar associated with μ,

can be derived from that of Aa by subjecting the latter matrix to fixed point

iteration.

(b) Study the convergence of the eigenelements of Ar when interpreted as the

approximations of the eigenelements of A.

6.2.3 [A] Prove Theorem 6.2.4 on the assumption that |μ,| > |μ ί+ J.

6.2.4 [A] Prove that the matrix Qk constructed by the algorithm (5.2.1) is a

basis of the subspace AkS.

6.2.5 [A] Prove inductively that the matrix Δζ of Theorem 6.2.4 is block-

diagonal.

6.3.1 [A] Prove that if the Krylov subspace JTZ in the Lanczos tridiagonalisa-

tion method is of dimension less than /, than the process reduces to two

eigenvalue problems of order less than n.

6.3.2 [D] Prove that the basis Vn constructed by the Lanczos method is an

orthonormal basis.

6.3.3 [A] Let (uu..., u„) be a basis of the vector space S of dimension n. The

vectors pj constructed by the Gram-Schmidt algorithm are defined as follows:

EXERCISES 283

yj+i=Uj+i· Σ (yfuj+im

Now consider the Krylov subspace Jf generated by

u1=v1

uj = Aj~1v1 (j = 2,...,n),

where vx is such that §vx \\ 2 = 1. Let Xj be basis vectors constructed by the Lanczos

algorithm and let ys be the vectors obtained in the Gram-Schmidt orthogonali-

zation process.

Show that, for j = 1,2,..., n, there exists a real non-negative number Oj such

that

e x

yj = jr

Why is the Lanczos algorithm preferable in finite precision arithmetic?

6.3.4 [D] Retain the notation of Lemma 6.3.1. Show that the Lanczos method,

when applied to A or A\ produces the same matrix πχΑπχ.

6.3.5 [D] We are proposing here a generalization of the Lanczos method to the

case of a non-Hermitian matrix A. Let vx and wx be such that w^vl = 1. Define

vj+i=Avj-0CjVj-ßjVj-u

j

«i+i.

ßj+1

v -5£±i

*J+lßj+l=*J+l*J+V

284 NUMERICAL METHODS FOR LARGE MATRICES

Let

^ = (1?!,...,^),

Jfl(A,vl) = \in(vuAv1,.-.,Al~1vl),

jr / (^*,w 1 ) = lin(w 1 M*w 1 ,...,(>4*r i w 1 ),

(OLK β2 0 ... 0^

α

\&2 " 2 ß3 ··· ;

r,= o ··.. '·.. '■·.. o

i '■■•■..'"V'-A.

0 ·· 0 'öm'ctm

(c) Prove that if the algorithm terminates at the Zth step (<5jf+ x Φ 0, ; = 1,2,..., Z),

then

\ν*ν, = ι„

lm(V,) = jrl(A,Vl)

\m(W,) = X-l(A*,Wl)

^i=y,T, + Sl+lVl+le*,

A*W,= WtT* = ßl+lwI+le*,

Tl=W*AVl.

(d) what happens if Sj+1= 0?

(e) Interpret the matrix T, in relation to a representation of the linear map A.

6.3.6 [A] Find estimates for the constants in the bounds given in Theorem

6.3.4.

63.7 [B: 46] Suppose the calculations are carried out in finite precision arithmetic

with machine error of order ε. Thus the recurrence formulae (6.3.1) become

AV^V^ + b^^rf + F,,

F*F, = L, + / + L*,

where L, is a lower triangular matrix. Suppose that there exists local orthogona

lity:

v,llin(Oi-i,v,-2)

in such a way that the diagonal and the first subdiagonal of L, are zero.

EXERCISES 285

IIFil^eMII*

(a) Show that

(b) Show that the ith column xj.° of Xt satisfies the equation

Λ + ι 4 0=1,2,...,/),

Pn

where

yiJt = £<"■%£<«,

Kt being the strictly triangular part of Fj Vt - VjFr

(c) Show that ||KJ|2 = 0(ε||Λ|| 2 ).

(d) Show that the relations

7α = 0(ε\\Λ\\2) and xj°Vi~l

imply that

ßu = 0(e\\A\\2).

Deduce that 'the loss of orthogonality entails convergence'.

6.3.8 [D] Retain the notation of Exercise 6.3.7.

(a) Prove that if i φ k and i, k < /, then

(b) Deduce that the Ritz vectors xf) and x%\ which are not good approximations

for the eigenvectors xf and xk (because ξη and ξΙΗ are too great), are

orthogonal up to machine precision.

6.3.9 [B:15] Given a real symmetric matrix A - (atJ) of order n, choose an

arbitrary vector v{*] such that \\ν^]\\ 2 = 1, and an integer fc0 « n.

Let

(a) Wf = AVkl\

(b) Hkl^Vk^Wkl\

286 NUMERICAL METHODS FOR LARGE MATRICES

unity.

(d) x i l ) = ^ i l )

(e) r<'> = M-A<'>/)x<<>.

If lK°ll2 xs sufficiently small, terminate the process; if not,

(fK'> = C(A<«)r«>,

(g) w«'> = (/-F<'>K<'>T)i<".

If II w{k II2 i s sufficiently small, then

w (0 = r (0

(h) d = — ^ <+i>

ΚΊ1ΙΙ2

K? + 1> = x g (/+-/+1).

Prove the following inequalities:

ιιη°ιι 2 =ι>

Wlla^MII*

Κ°|| 2 = ι,

4 ° ^ ^max(^) (the greatest eigenvalue of A\

K%^\\A\\r

6.3.10 [D] Consider the algorithm of Exercise 6.3.9. Prove that

(a) V^ V(l] is an orthogonal projection for all / and k.

(b) If λ -► C(A) is continuous on a compact set containing the spectrum of A, then

the sequence 4° is bounded with respect of / and k.

6.3.11 [D] Investigate the convergence of the algorithm proposed in Exercise

6.3.9 when λ^ is the greatest eigenvalue of the matrix H%\

(a) Show that, for each fce{l,2,...,/c0}, the sequence (A^)leN is increasing and

bounded^.

(b) Show that

independently of k.

6.3.12 [D] Show that if in Exercise 6.3.9 C(A) is symmetric and positive or

negative definite, then the sequences r^ and rjjj tend to zero as / tends to infinity.

6.3.13 [C] Study the behaviour of the algorithm described in Exercise 6.3.9

EXERCISES 287

when

A=

0(λ) = (λΙ

where D is the diagonal of A.

6.3.14 [B:15,16] The choice made in Exercise 6.3.9, namely

C(A) = (^/-D)" 1 ,

where D is the diagonal of A, corresponds to what is called Davidson's algorithm.

We suppose the aim is to compute the greatest eigenvalue of A. Show that if v{^]

is such that λψΐ — D is positive definite, then the algorithm converges.

6.3.15 [B:15] Let (λ,ν) be a pair of eigenelements of A, where λ is not the

greatest eigenvalue of A. Let weUn and let ε be a non-zero real number. Put

vE = v + ε W.

(a) Show that

νΎεΑνε_ , t

T 2

2w Aw-λ\\\ν\\ 2

kiis ιι^.ιΐί

Define

S + = {wGRn:wT/lw-A||w||2>0}

S_ = Rn\S + .

(b) Show that S + is a non-empty open cone.

(c) Consider the algorithm defined in Exercise 6.3.9. Show that the convergence

of x[° towards V can take place only if

X^-VGS_.

(d) Deduce that the method is unstable when λ is not the greatest eigenvalue of A.

6.3.16 [D] Consider the basis KjJ* of Exercise 6.3.9. Can this basis be associated

with a Krylov subspace?

6.3.17 [A] Consider the classical Davidson algorithm (see Exercises 6.3.9 and

6.3.14) when applied to the real symmetric sparse matrix A = (α^). Let i0 be an

index such that

^ο,ο * 0 .

288 NUMERICAL METHODS FOR LARGE MATRICES

(c) Deduce the convergence of the algorithm.

6.3.18 [B: 61] Let A be a real symmetric positive matrix of order n. Consider

the following two methods for solving the problem Ax = b.

Lanczos method: x 0 eR n is given.

r0 = b- Ax0,

ϊ-ι=0

^o = Ikolla»

w-i=r0.

Forfc = 0,l,2,...:

if Sk = 0, terminate;

if not, let

Sk

yk = <iIA<iki

uk = Aqk-ykqk-ökqk.u

r0 = d0 = b - Ax0.

For/c = 0,l,2,...:

If dk = 0, terminate: x = xk is the solution of Ax = b; if not, put

ff*

_ II»-*«!

d\Adi

X

fc+1 = xk + akdk,

r

k+l = rk- akAdk,

K + 1 ll 2 2

ßk ii- n2 '

Il'fcll2

^k + i = r * + i + ßkdk-

EXERCISES

(a) Prove the existence of an integer m ^ d such that dm = 0.

(b) Prove that the vectors d0,...,dm are linearly independent.

(c) Prove that

lin (</ 0 ,..., dk) = lin (r0, Ar0,..., A kr0)

= lin(r0,ru...,rk) (0 < k ^ m - 1),

(0^i<j^m),

l|ril| 2 >0 (0<i<m),

ifO < i < 7 ^ m ,

djrj =

(e) Prove that the minimum of the function gk(o) = (xk — x + cdk)TA(xk

+ adk) is attained at ak.

Consider the tridiagonal symmetric matrix

ί^ο <*i. 0

^

^ι Vi. '··..

'··.'·. A-i

\0 'h-i'Jk-i)

Let

Dk = dmg(a-\...iak~}1),

β* = (<7ο>· ··><?*-1)>

*j = - % / £ / (0<j<fc-lX

A o .. o^

τ0 i o ;

** = 0 t! 1.

AQk-QkTk = ökqkel

Ö*ß* = / *

290 NUMERICAL METHODS FOR LARGE MATRICES

Tk = LkDkLl

and deduce the relations between the parameters yi9 <5„ ft and ai9

(g) Show that the iterate xk of the conjugate gradient method can be obtained

from the Lanczos method by the equation

ofTk.

6.4.1 [D] Recover Theorem 6.3.3 by staring from Theorem 6.4.3 when r = 1 and

when the eigenvalues are district.

6.4.2 [D] Can one generalize the block Lanczos method to a non-Hermitian

matrix by reverting to the ideas of Exercise 6.3.5?

6.5.1 [B: 47] Consider the problem

(K - λΜ)ζ = 0 (z Φ 0)

when K is a real symmetric positive definite matrix and

M = diag(M + ,0),

where M + is real symmetric positive definite. The structure of M induces a

partition of K and z, namely

( X 1 1 - A M + )z1 + /C12z2 = 0,

K]2Zl + K22z2 = 0.

(b) Show that it may be supposed that K22 is regular.

(c) Show that zx is an eigenvector of the matrix

=

^11 ^ U ""^12^22 ^12·

(d) Show that z 2 is completely determined by zl9Kl2 and K22.

EXERCISES 291

6.5.2 [D] Generalize the study made in Exercise 6.5.1 to a matrix M which is

real symmetric semi-definite.

6.5.3 [D] Consider the problem

Kx = AM x,

where K and M are symmetric, K is regular and M is positive semi-definite

singular. Let X be the basis of eigenvectors normalized by ΧτΜΧ = /.

(a) What is the result of the inverse iteration

(K-aM)z = Myk, Λ + ι=/τ?

11*11

(b) Use Exercise 1.13.2 to show that when y is arbitrary and

z = (K-aM)~lMy,

then either

(i) K ~ lM is non-defective (eigenvalue 0 of index 1) and zelmX;

or

(ii) K ~l M is defective (eigenvalue zero of index 2) and (K — σΜ)"1 Mz e Im X.

(c) Deduce that, whatever the initial vector y 0 , after at most two iterations the

vectors yk lie in ImX and everything takes place as if M were regular.

6.6.1 [D] Prove that Ht represents the map stt in the orthonormal basis

{vt } l v defined by Arnoldi's method (6.6.1).

6.6.2 [D] Arnoldi's method is used to approximate the eigenvalue A£. Establish

the rate of convergence when the remainder of the spectrum is real.

6.6.3 [A] Prove that in Proposition 6.6.5 the map s/t is represented by

the matrix Ht = G^1Bl relative to the basis Wr

6.6.4 [A] Consider the algorithm (6.6.2). Arnoldi's method with incomplete

orthogonalization without the correction term rtej consists in using the eigen-

elements of the matrix Ht in order to approximate the eigenelements of A in J f f.

Study the corresponding error bounds.

6.7.1 [A] Consider the Petrov approximation defined in (6.7.1). Prove that the

orthogonal projection ώχ on Gf defines an oblique projection π' on Gf if

co(Gl9Gf)<l.

6.7.2 [D] Study the Petrov approximation when Gf = AG\.

6.7.3 [D] Show that the methods of incomplete orthogonalization and of

aggregation/disaggregation for a Markov chain are methods of oblique projection.

CHAPTER 7

Chebyshev's Iterative

Methods

for the convergence of linear iterations, for they furnish the optimum of the

problem

min max|p(z)|,

pePk zeS

P(A)=1

where S is a set in the complex plane that does not contain A and is bounded by

an ellipse.

In the chapter we have collected a certain number of methods inspired by this

principle in order to compute the eigenvalues of greatest real part of a

non-symmetric matrix.

APPROXIMATION FOR A COMPACT SET IN C

Let S be a compact set of C (or R) and let C(S) be the set of continuous

functions on S with real or complex values, endowed with the uniform

norm

||/IL = max|/(z)|.

those elements V* of V which are closest (in the same of the uniform norm)

to a given function /.

Definitions

(a) v* is a best approximation of / over S in V if

min max \f(z) - v(z)\ = \\f-v*\\ „.

veV zeS

294 CHEBYSHEV'S ITERATIVE METHODS

£(/,S) = {reS;||/|L = |/(OI}·

As regards the existence of a best approximation v* the reader is referred to

Exercise 7.1.1. We have the following fundamental characterization.

there exists a subset σ of S consisting of distinct points zu...,zr of S together

with r positive numbers a 1? ..., ar such that

PROOF The reader is referred to Revlin (1990, p. 74). The {zj'i are the

critical points of the error / — v*.

there exists

a= {zi}r1czE(f-v*,S}

and

such that

£«AP(*i) = 0 WveV),

i= 1

where

(a) e^sgnif-O*)^)

(b) £i(f -v*)(zt) = | | / - v * | | „ (i=l r).

approximation off on σ, and

min max |/(z) — v(z)\ = min max |/(z r ) — ν(ζ{)\.

veV zeS veV Ziea

ELEMENTS OF THE THEORY OF UNIFORM APPROXIMATION 295

Definition The subspace V of dimension k is said to satisfy the Haar (or the

Chebyshev) condition on S, if every non-zero function of V possesses at most

k — 1 zeros in S.

points of S: for every set of k distinct points {tt}\ of S and k values {yj*

in R or C, there exists a unique function veV such that

k

ν=Σ <*iVi

i=l

v(ti) = yi (i = l,...,fc).

Example 7.1.1 For every given AeC, the set K = Pfc = {pePk9 ρ(λ) = 0} is a

vector space of dimension k which satisfies the Haar condition on every compact

subset of C that does not contain λ.

Theorem 7.1.5 (Haar) Every function fe C(s) possesses a unique best approxima

tion v* in V if and only if V satisfies the Haar condition.

is treated in Laurent (1972, Ch. 3).

We are interested in the following problem: given a set S = {Af}{ of / distinct

points, determine the polynomial p* such that

pePk zeS

P(A)=1

revert to the preceding topic by putting q* = 1 — p*9 where q* is the best

approximation of the constant function unity on S by polynomials of P.

Thus

II l — « * II oo = m i n m a x I * - Φ)\-

qePk zeS

296 CHEBYSHEVS ITERATIVE METHODS

Theorem 7.1.6 When k </, there exist k + 1 points λί,...,λίί+ί of S such that

ΙΡ·Ι--(Σ Π f - j ) ·

PROOF By Theorem 7.1.1 there exists a subset of r points {A,}^ of S that are

critical points of the error p* = 1 — q* and have the property that

i=l

where /c + 1 < r ^ 2/c + 1. We shall show that for this particular problem we have

r = /c+l.

Choose k + 1 points λί9..., kk + x among the r ^ k + 1 critical points of p*.

Consider the following basis of Pk\

Wj(z) = (z-X)lj(z) (; = l,...,/c),

where /, is the Lagrange polynomial

f = 1 Λ; /f

/J(AI) = 0 (i#./'andi#fc+l)

/,.(Afc + 1 )#0.

We verify immediately that

ω,(Λ,) = A,· - λ, ω/ΑΛ + i) Φ 0

and ω,·^,·) = 0 when i Φ] and i φ k + 1. By virtue of Haar's condition we have

detCo^^O (U=l,...,fc).

Hence the system

s=l

is arbitrary.

Now consider the Lagrange polynomials of degree k:

fc+1 7 _ 2

r=l A : - /f

ELEMENTS OF THE THEORY OF UNIFORM APPROXIMATION 297

such that

l'j(Xj)=l /;.(Ar) = 0 (t*j)

0 = l , . . . , f c + 1). We verify that the system (7.1.2) has a particular solution

ßj = 1)(λ) Φ 0 (j = 1,..., k +1). In fact, the ;th equation can be written as

t- 1 Λί Λ,

β^-β^ψ^ήψ^ o=i,...,fc).

The reader will verify that ßj can be identified with

r = i A.· — Af

where

e i e ' = -^- (s=l,...,fc+l)

eie- = sgniSs-sgn/;(>l).

Evidently, p(A)=l. Put

«.-[I!e"-w]"''

It is clear that

p ( ^ = pe»' = p Ä

\ßs\

On the other hand, p is a positive real number i in fact

r*+i *)-i r*+i Ί-ι

p = | £ [sgn/;μ)] ς(Α)| =^ |/;wiJ > a

298 CHEBYSHEV'S ITERATIVE METHODS

Jlc+l

because

lft|p&) = PÄ (P>0).

This proves that, when k < f, the polynomial q = 1 — p is the best approximation

required: q* = 1 — p*, therefore p* = p. For this optimal polynomial we have

When fc ^ / , we have || p* ||«, = 0, because there always exists at least one

polynomial of degree ^ / such that

p(A)=landp(A i ) = 0 (i=l,...,/).

Example 7.1.2 Suppose that S = sp(/4) —{A}, where sp(/l) = {AJ^ represents

the d distinct eigenvalues of a matrix A. In Chapter 6, Section 6.6, we defined

ε ( 0 = min max \p(z)\.

p&i-x S

PU)=1

λ include / eigenvalues λί9..., λι such that

Ak — λ

Σ Π λ,,-λ;

ε<'>=< j=lfc=l (7.1.3)

0 otherwise

when all the eigenvalues, other than A, are in a circle that is well separated from

λ (see Figure 7.1.1), then ε(,) is small.

Figure 7.1.1

In order to obtain a bound for ε(/) which involves only a localization of the

spectrum (not all the eigenvalues) we consider a compact connected region D

containing S. Then:

CHEBYSHEV POLYNOMIALS OF A REAL VARIABLE 299

max \p(z)\ < max \p(z)\

zeS zeD

and

(b) the maximum is attained on the boundary dD because the polynomial p is

analytic in D.

Particular optimal results were cited in Theorem 6.6.2. They involve the

Chebyshev polynomials of the first kind which we are now going to study.

Polynomials are simple functions which are useful for approximating more

complicated functions. Now if peP k , say

p(t) = a0 + a0 + alt-\ \-aktk,

then p is entirely determined by the fc-f 1 coefficients a 0 ,a!,...,a k . Amongst

all polynomials the set of Chelyshev polynomials has interesting approximation

properties with regard to the uniform norm.

7.2.1 Definition

ThefcthChebyshev polynomial of the first kind Tk(t) is defined as follows:

Jcos(fccos_1i) when|i|^|

( cosh (k cosh _ 1 i) when|i|>|

when | ί | > 1 , we can put coshku = Tk(t), where coshw = i or equivalently,

u = cosh" * t. The change of variable eu = w yields

^/x wk + w~k

TM-—;—.

where ί = (w + w~ *)/2. Thus w2 — 2tw + 1 = 0; we may choose

W= t+ y/t2~l.

7.2.2 Properties

Tk(-t) = (-lfTk(t),

7ΌΜ-1, Tl(t) = t, Tk(t) = 2tTk.1(t)-Tk-2(t) (k = 2,3,...),

\Tk(t)\^l when|t|<l.

300 CHEBYSHEV'S ITERATIVE METHODS

min max \p(t)\

peP k te[a,b]

P(A)=1

is attained by

T»[l+2(f-ft)/(ft-fl)]

tk(t) =

ΓΛ[1+2(Λ-*>)/(*>-a)]

and

1

I 4 II oo =

T t [l+2(A-6)/(fc-<i)]'

VARIABLE

We can define Tk for a complex variable z by the formula (for example)

Tk(z) = cosh (k cosh ~* z).

when /c -► oo, we have | Tk(z)\ -► oo, except when z is real and |z| < |.

In what follows we suppose that λ is real.

Definition Let E = E(c, e, a) denote the ellipse with centre c, semi-major axis a,

and focal distance e, where c is real and a, e, λ — c> 0; let $ denote the region

bounded by E (see Figure 7.3.1).

Lemma 7.3.1

Ua/e)

f/*= min max \p(z)\ ^

pePu ze<f T k [(A-c)/e]

Figure 7.3.1

CHEBYSHEV POLYNOMIALS OF A COMPLEX VARIABLE 301

PROOF Put z' = (z — c)/e. Then z' lies in the region &' which is bounded by the

ellipse £'(0,1, a/c) of centre 0, focal distance 1 and semi-major axis a/c. Therefore,

by the maximum principle,

m&x\Tk(z')\.

z'eS' T*[(A-c)/e] Tki(X-c)/e-]ze*

Put

HK>

Then ze£'(0,1, a/c) if and only if

»*c>-{«-^-<--e+JW7i\

On the other hand, Tk(z') = (w* + w"*)/2, whence

max | Tk(z')| == max £| w* + w~k\= max ||p*eik* + p~*e~

z'eE' weCp 0 < 0 < 2*

i(p' + p-*)=7\(j|).

Uz-c)le\ _A „_ Tk{a/e)

hiz) = and M „ = ·T tX-c)e] (7.3.1)

TkU-c)le] k

This result is based on the assumption that c and a are real; it is no longer true

when these quantities become complex numbers (see the 7.3.2 and Exercise 7.3.9).

Nevertheless, the result remains asymptotically true as we shall now show. Let

Figure 73.2

302 CHEBYSHEV'S ITERATIVE METHODS

extremal polynomial that satisfies

max | p*(z)| = min max|p(z)|.

zeE' pepk zeE'

P(A)=1

(As regards the existence and uniqueness of p*, the reader is referred to Exercise

7.3.9.)

Proposition 7.3.3

lim max|p*(z)11/fc= lim max if MI 1 /*

kV

fc->oo zeE' fc->oo zeE'

min | ffc(z)| ^ max |p*(z)| ^ max | tk(z)\. (7.3.2)

zeE' zeE' zeE'

The inequality on the right is evident. Suppose that the inequality on the left is

false:

max | n*( z )|< min | fk(z) |

zeE' zeE'

implies that

p*(z)<tk(z) when zeE'.

By Rouche's* theorem. tk(z) — p*(z) has as many zeros in the interior of £' as f(z).

Now tk has k zeros on the segment joining the foci c — e and c + e (Exercise 7.3.5).

On the other hand, tkW — P*W = 0 and λ is exterior to E'. This proves that

tk — p* is the zero polynomial, because its degree does not exceed k and it has at

least k 4-1 distinct zeros. Thus tk(z) = p*(z) on E\ which contradicts our

hypothesis.

By virtue of (7.3.2) it suffices to prove that

lim min|ffc(z)1/fc= lim max if.(z)\1,k

fcV

k - o o zeE' fc-oo zeE' '

All the points of the ellipse E are such that limk_ „ |/(z)| 1/k is constant for zeE.

min max|n( z )| = r*.

pePk | z | < r , F V "

Ρ(λ)=1

CHEBYSHEV POLYNOMIALS OF A COMPLEX VARIABLE

2 ie 2 1/2S

maxX |p(z)|=limrf "lp(re )| M0l .

mil

min |p(reie)l2sd0j > ™ η { | J Ι^' β )Ι 2 ( 1 θ | }

peQkLJo

Let

ks

qiz) = £ atzl.

z=o

Then

Jo i=o

/ ks \l/2 / ks \,1/2

1

2 2 2

(l><l '· ') ^(Σ>(Ι )

and

fcs ks t / ks \l/2

1= 2

Σΐβιΐ<Λ/&+ϊ( Σ Ν ·

1= 0

0 \/ = 0 /

Hence

, ,l/2s

min f

min max | p(z)^r k .

peQ k |z|<r

D = {z;|z-c|^p},

where p and c are real and k>c + p. The optimum

n{k)— min max|p(z)|

pePk zeD

P(A)=1

304 CHEBYSHEV'S ITERATIVE METHODS

Μ4*ΙΙ- = ( χ τ λ 1

PROOF Carry out the change of variable

, z-c

z =-

λ-c

FOR THE POWER M E T H O D

In the power method we construct the sequence

starting from q0 = u/ \\ u ||. This iteration can also be written yk = ßkAku, k > 1.

One might think of using a more general polynomial iteration yk = pk(A)u,

where pkePk is a polynomial of degree k.

Suppose A is diagonalisable and has the eigenvectors {xf}". Moreover, we

make the assumption (5.3.1) that

|A 1 |>max|/i i |.

Now

w = Σ £i*i

and

Π

i=2

is small compared with \pk(X)\ when i ^ 2. This suggests finding a polynomial p

that satisfies

min max | p(z) | = e{k+*).

peP k reep(i4)-{A}

Ρ(λ)=1

We suppose that the dominant eigenvalue λ is rea/ and that the remainder of the

spectrum lies in the ellipse E(c, e, a) where c, e, a are real and λ — οα.

By Theorem 7.3.2 the optimum polynomial is

r ,v 7U(z-c)/eJ

THE CHEBYSHEV ITERATION METHOD 305

P* = T / ~ ) (k = 0,1,2,...)

we obtain

P k + l i k + l ( z ) = Tk+i

Z "—" C

m

e

Again, on putting ak+1 = pjpk+i we have

z—c

tk+l(z) = 2ak+1 tk(z) - akak+1 tk.x(z). (7.4.2)

λ-c (2/aJ - ak

The two recurrence relations (7.4.2) and (7.4.3) can be combined to define an

algorithm for the computation of

yk = tk(A)u (fc = l,2,...).

Although the value of λ is not known, we remark that it occurs only in the

denominator of ak (in 2/σ^, which is a normalization factor for kk: in practice λ

may be replaed by an approximate value.

We wish to determine the dominant eigenvalue λ of A which we suppose to be

real positive and simple: this is the eigenvalue of greatest real part. The ellipse

E(c9 e, a) contains the remainder of the spectrum of A.

7.5.1 Definition

The following method of calculating yk is known as the Chebyshev iteration:

λ—c

yi=^(A-cI)u.

e

306 CHEBYSHEV'S ITERATIVE METHODS

(b) F o r ; = l , 2 , . . . , / c - l p u t (7.5.1)

1

e

Remarks

(a) We have assumed that the parameters c, e and a, which define the ellipse, are

real. If £, still centred on the real axis, has its major axis parallel to the

imaginary axis, then e and a are imaginary. Nevertheless, the computations

of (7.5.1) can always be carried out in real arithmetic. In fact, the σ, are pure

imaginary and so σ,+ l/e and σ]σ]+ j are real.

We remark that when a and e are imaginary, the polynomial

k

Tfc[(A-c)/£>]

is no longer optimal but remains asymptotically optimal for large k.

(b) When the eigenvalues are real and \μ( — c\ ^ a (i = 2,..., n\ then the optimal

polynomial is

f T f c [ (r-c)/a]

Tkl{X-c)ld\

by virtue of Theorem 7.2.1. Therefore, in order to obtain the Chebyshev

iteration in this case it suffices to replace e by a.

7.5.2 Convergence

The Chebyshev iteration (7.5.1) can be interpreted as a method of projection

on the direction generated by ük = tk(A)u (fc = 1,2,...), where

established in Sections 5.2 and 6.2 for the method of subspace iteration.

We make here a more qualitative study of the convergence. If ξχ Φ 0, then

(7.4.1) implies that lin(yk) converges to the direction lin(x) if and only

maXf^ 2 ΙΛΟΌΙ -^Ο when fe-> oo.

If we define wf as the root of

i(vVi + Wi-i) = * ^

THE CHEBYSHEV ITERATION METHOD 307

(Wf/Wi))*, where

defined as κ^μ,) = | wjwi |. The convergence rate towards λ is

τ(λ) = max κίμ,).

i>l

α , ^ ϊ ί Ρ ί + ΡΓ 1 )^

where pf = | νν,·|, αχ = λ — c, whence

α

ι + V αι ~ e

When >4 is symmetric we may choose

^2 + ^min

c =

and

^2 "~ *min

a =

a

w, = T t = -

al + y/al-a2'

which is the same as that of the Lanczos method defined for the Krylov subspace

Jtk+! = (u, >4M, ..., Aku).

Figure 73.1

308 CHEBYSHEV'S ITERATIVE METHODS

Hence, we have just shown the remarkable property that, without knowledge of

c or e, the Lanczos method determines automatically the vector uk = tk(A)u in the

space

Jfk+i=(pk(A)u;pGPk)9

which is the best possible vector with regard to the speed of convergence towards

liniXi).

This remains true when A is no longer symmetric and λ is the dominant

eigenvalue, replacing the Lanczos by the Arnoldi method (when i = 1): the

convergence rate towards λ9 that is

max^ 11 wy| _ a + ^/a2 — e1

is the same as that for the Arnoldi method (i =1), defined for Jffc + 1 when k is

sufficiently great (see Exercise 7.5.2).

It should be borne in mind that such a performance of the Chebyshev iteration

can be attained only when optimal parameters c and e are used. It is unrealistic

in practice to assume that these quantities are known. It is necessary to determine

them dynamically in the course of the iteration. This will be treated in Section 7.7.

(WITH PROJECTION)

Let the eigenvalues {μ.}" be ordered by decreasing real parts. We wish to compute

the r eigenvalues {μ(}\ of greatest real parts. We suppose that Re(^r) > Re(pr+ x)

(see Figure 7.6.1, in which r = 4).

Let M denote the invariant subspace associated with {μ.}\ and let P denote

the spectral projection upon it. We shall now consider the projection method on

the subspace Sk = tk{A)S9 where tk(z) = Tk[(z — c)/e]; this is determined by the

parameters c and e which defined the ellipse £(c, e, a) containing the remainder

of the spectrum {μ,·}"+ χ (see Section 7.7). The algorithm consists in constructing

an orthonormal basis Qk in Sk starting from a basis U of the subspace S of

J*r+l

X

f X X^2

-a x—»

x c

V x

X

f*r+2

r=4

Figure 7.6.1

SIMULTANEOUS CHEBYSHEV ITERATIONS 309

dimension m^r9 the constants σΐ9 k and ε being given. This is carried out as

follows:

(a) ( / 0 = C / , t / 1 = ^ - c / ) ( 7 ;

e

(b) when; = 1,...,k — 1, put

'"■-(£-'') ■

'

υ,-,,-2-ίϋ(/1-ε;)ϋ,-σΛ,,ϋ;_,; (7.6.1)

e

(c) Uk = QkRk;

(d)Bk = QtAQk^FkDkF~l

(projection and diagonalization).

From the m eigenvalues of Bk retain the r eigenvalues of greatest real parts;

form the diagonal matrix Dk with them and let Fk comprise the associated r

eigenvectors. Put Xk = QkFk.

(e) If || AX'k - X'kD'k ||F > ε, then U = QkFk; substitute in (a).

For the computation we use the polynomial

Tkl{z-cye]

h(z) =

r*[(A-c)/e]

because tk(X) = 1 [see (7.4.2) and (7.4.3)]. We assume that the matrix A is

diagonalisable. The orthogonal projection on Sk is denoted by nk.

The following result is a consequence of Lemma 6.2.1.

Lemma 7.6,1 Suppose that dim PS = r. Then for each eigenvector xf associated

with μ, ί/iere exists a unique vector st ofS such that PSi = x, and

\TklUii-c)/e]

provided that the eigenvalues {/xj"+ x lie in the ellipse £(c, e, a).

||(/-^)xi||2 = min||xi-3;||2<||xi->;i||2,

yeSk

where

310 CHEBYSHEVS ITERATIVE METHODS

Hence

UM)\

Since A is diagonalisable, we have

IItklA(I - P)] ||2 ^max|rfc^,.)|cond2(X).

and

max|ik(z)|

zeE o

The constant c, has the value of cond2 (X) || (/ — P)sf || 2 .

Corollary 7.6.2 Suppose that the eigenvalues {μ.}" are rea/ and that they are

arranged in decreasing order. Then under the hypotheses of Lemma 7.6.1 we have

MI-*Jx,h*c,-l— (i=l,...,r),

Tk(yd

where

PROOF There are n — r eigenvalues in the interval [μ„,μ Γ +ι]. Apply Lemma

7.6.1 when

c= , a= , y,= .

Theorem 7.6.3 Suppose that the assumptions of Lemma 7.6.1 are satisfied. Then

the method of simultaneous Chebyshev interations with optimal parameters

converges:

(a) If the ith eigenvalue of the greatest real part is simple and if the {μ7}"+ χ lie in

the ellipse E (c, e, a), then the error boundsfor the ith pair of eigenelements are of

the order

Tk{a/e)

\TkUni-c)le\\

(b) If A is Hermitian, the bound for the ith greatest eigenvalue becomes of order

Tk~2(U

PROOF co(Sk,M)-»0.

The convergence rates that we have found, are, respectively, those of the

block Arnoldi method (when the dominant eigenvalues are real and positve) and

DETERMINATION OF THE OPTIMAL PARAMETERS 311

those of the block Lanczos method. For a non-symmetric matrix the cost of

simultaneous Chebyshev iterations is well below that of the block Arnoldi

method. The former method will be preferred in practice if a satisfactory

technique is available for estimating the optimal parameters.

The gain due to the Chebyshev acceleration is measured by comparing

|μΓ+1/Αΐ,|* with Tk(a/e)/Tk[frt-c)/e]9 which is equivalent to (max^lvv^/KI)*

(i = 1,..., r), when k is sufficiently great.

The Chebyshev iterations (7.5.1) and (7.6.1) depend on the parameters c and e

(respectively a = e in the case of a real spectrum) which determine the ellipse

(respectively the segment [c — a, c + a]) containing those eigenvalues that we do

not wish to compute.

Let us begin by studying the case in which r = 1. The optimal parameters c

and e are those that satisfy

c,e c,e ί>1 c,e i> 1

We assume that the eigenvalue of the greatest real part is real. The set sp (A) — {λ}

is symmetric with respect to the real axis.

The problem (7.7.1) consists in seeking the minimum of a finite number of

functions of two real variables c and e if sp(A) — {λ} is supposed to be known.

When /l = 0, the problem has been studied in Manteufifel (1977), where an

algorithm for the computation of c and e is proposed.

When r > 1, a natural idea consists in seeking to solve

w,

However, the situation is more complicated than in the case in which r = 1, for

there may be conjugate complex eigenvalues.

(a) μΓ is real. It suffices to fulfil (7.7.2) (see Figure 7.7.1).

(b) μΓ_! and μΓ are conjugate complex. It may happen that the best ellipse

constructed upon μΤ.χ and μΓ contains in its interior some of the desired

λ2

X

Χλ,

r=4

Figure 7.7.1

312 CHEBYSHEV'S ITERATIVE METHODS

-X-

r=5

Figure 7.7.2

μ = Re(/xr); see Figure 7.7.2 and Saad (1984) for further details. The case of

complex μΓ is treated in full generality in Ho, Chatelin and Bennani (1990).

In practice, sp(A) is unknown and one has to resort to estimations of the

eigenvalues in order to determine the optimal parameters dynamically. This can

be done by using the required information about the spectrum which is found

in the various methods for computing the eigenvalues (power method, simul

taneous iterations, Arnoldi's method).

For example, in the algorithm of Section 7.6 we choose m>r and in step (d)

of (7.6.1) we bring to light the parameters c and e for determining the new ellipse

by using the m — r eigenvalues not retained in Bk.

As we have just seen in detail, the Chebyshev iteration method serves to

accelerate the linear iteration methods which are used to solve linear systems

(Exercise 7.8.1) or to compute some eigenvalues of greatest real parts. The

underlying problem of approximation theory is concerned with the determination

of the polynomial that satisfies

min max|p(z)|, (7.8.1)

pePk zeS

PU)=1

where S is a set in the complex plane containing the spectrum of A except A. (If

the problem is to solve a system, then λ = 0 and S contains the whole spectrum.)

When S is bounded by an ellipse, the solution of (7.8.1) is the Chebyshev

polynomial tk(z\ whence the term 'Chebyshev iterations'.

However, the problem (7.8.1) has many other applications apart from the

acceleration of linear systems. In fact, in very diverse contexts we meet the more

general problem of determining a polynomial that is large (in a certain sense) on

some eigenvalues {μ^\ of A while it is as small as possible on the remainder r of

the spectrum. For example, we mention the filtering or the techniques of

preconditioning.

LEAST SQUARES POLYNOMIALS ON A POLYGON 313

*μ 2

-X *

Χμ3

Figure 7.8.1

In the case of a complex spectrum the uniform norm that appears in (7.8.1) is

not necessarily the norm that leads to the best polynomial in practice. The

Chebyshev polynomial depends on the optimal ellipse which contains the part

τ of the spectrum that is to be eliminated. This ellipse may turn out to be far too

large in relation to r(see Figure 7.8.1).

It might be more interesting to consider the polygon H which is the convex

hull of the set of eigenvalues in τ and to determine the least squares polynomial

that satisfies

™° HPL, (7-8.2)

Σ^ι«ίΡ(/*ι) = ι

where the {aj^ are given coefficients and |||| w is the I2 norm relative to

a weight function w defined on the boundary dH.

Theorem 7.8.1 Let {SJ}Q be the first k + 1 orthogonal polynomials with respect to

w. The polynomial that satisfies (7.8.2) can be written as

where

^=Z«.s>i) (j = o,...,k).

i=l

PROOF This generalizes the known result for r = l . Consider the degene

rate kernel

Jlc

k(**z)= Σ Sj(t)sj(z).

Then

</Kz),/fc(i,z)>w= P(z)lk(t,z)w(z)dz

JdH

= p(t)

for every pePk.

314 CHEBYSHEVS ITERATIVE METHODS

i=l

r

q*(z) = c£ α,/^,,ζ),

where !

-(W'

Let pePk and suppose that

Σ α«Ρ(μ,·)=1·

We put p = q* + e; clearly

Σ«Λ)=ο.

On the other hand,

iipiii=ii«*ni+ikiii+2Re(<e,i*>w).

Now

r

i=l

r

= c X a^/i;) = 0.

i=l

As regards the determination of //, the choice of w and the practical

computation of q*, the interested reader is referred to the paper by Saad

(1987).

We continue to suppose that one wishes to compute the r eigenvalues {μ·}\ of A

which have the greatest real part. We shall briefly describe how Arnoldi's method

can be combined with the techniques of deflation and polynomial transformations

(for acceleration of spectral preconditioning) in order to obtain efficient hybrid

methods. These methods are described in greater detail by Saad (1989).

THE HYBRID METHODS OF SAAD 315

We are concerned with Arnoldi's iterative version, where Chebyshev iteration is

used to compute the new initial vector. Wefixm as the size of the Arnoldi method

and k as the degree of the Chebyshev polynomial. We choose an initial vector u.

The algorithm consists in carrying out the following three steps:

(a) Starting from u we compute the Hessenberg matrix of order m obtained by

Arnoldi. Its eigenvalues are divided into two groups: the r eigenvalues that

approximate to those we wish to find and the m — r eigenvalues that allow

us to determine the optimal parameters.

(b) Let z0 be a suitable linear combination of the eigenvectors associated with

the r retained eigenvalues. Starting with z0, carry out k steps of the Chebyshev

iteration in order to obtain zk = tk(A)z0.

(e) Put u = zk/\\ zk || and return to step (a).

Step (b) serves to diminish those components of z0 that are associated with the

unwanted eigenvalues.

The practical choice of the parameters m, k and r is discussed in Bennani (1991).

The influence of the non-normality of A is discussed in Chatelin and Godet-

Thobie(1991).

We apply Arnoldi's method to the matrix Bk = pk(A), where the polynomial pk

has been determined in such a way that the r eigenvalues of A with greatest real

parts become the r dominant eigenvalues of Bk. Let Vk be the Arnoldi basis

computed in this way; the eigenvalues of A are approximated by those of the

matrix V*AVk by virtue of the principle that Bk and A have the same invariant

subspace associated respectively by the r dominating eigenvalues and by those

of greatest real parts.

We fix m andfe,and we choose u. The algorithm consists in carrying out the

following steps in succession:

(a) Initialization. Starting with u, apply Arnoldi's method to A and divide its

eigenvalues into two groups; go to step (c).

(b) Compute V*AVk, divide its eigenvalues into two groups and obtain the

optimal parameters.

(c) Compute the Chebyshev polynomial of degree k and compute the vector v

as a linear combination of the eigenvectors associated with the retained

eigenvalues.

(d) Starting with v apply Arnoldi's method to Bk = pk(A) in order to obtain the

basis vk; return to step (b).

Remarks

(a) It is unnecessary to compute Bk explicitly in order to apply Arnoldi's method

to it; only the product pk(A)x is required, where x is a given vector.

316 CHEBYSHEV'S ITERATIVE METHODS

(b) The following observations refer to each of the two methods we have just

described.

(i) The Chebyshev polynomial associated with the ellipse containing the

unwanted eigenvalues may be replaced by the least squares polynomial

associated with the convex hull of these eigenvalues.

(ii) If the number r of required eigenvalues exceeds a certain size, then it may

be of interest to use a deflation technique (Exercises 7.9.1 and 7.9.2).

(c) In addition, the spectral transformation λν-+{λ — σ)" 1 may be used for

preconditioning. In order to solve (A — σΙ)χ = y9 we employ a direct method

(Gauss factorizaion with pivot and preconditioning) or an iterative method

of the conjugate gradient type with preconditioning (see Golub and van

Loan, 1989, p. 373). The algorithm of minimal residual of Saad and Schultz

(1986) makes no particular hypothesis about the matrix A — al.

This chapter has been inspired mainly by Saad (1984). The literature on

Chebyshev polynomials of a complex variable is very poor (Wrigley, 1963; Rivlin,

1990; and Manteuffel, 1977). Theorem 7.1.6 is due to Saad (1982b); Theorem 7.3.3

is due to Manteuffel (1977); Theorem 7.3.4 is due to Zarantonello (see Varga,

1957). The use of Chebyshev iteration for the computation of the critical value of a

nuclear reactor is very old (see, for example, Wrigley, 1963, or the book by

Wachpress, 1966). The aggregation/disaggregation methods are used in this

context (see Stettari and Aziz, 1973; see also F. Chatelin and Miranker, 1982).

An algorithm for computing the optimal parameters when the reference eigen

value μΓ is complex is given by Ho (1990).

EXERCISES

for a Compact Set in <C

7.1.1 [A] Let S be a compact set in (C, let C(S) be the set of continuous functions

on S with values in C and let V be a subspace of C(S) of dimension k. Prove that

»/-»ΊΙοο^ΙΙ/-^«,. Vi?eK,

where || · || ^ is the uniform norm on C(S):

zeZ

EXERCISES 317

U-i(x)= Σ/(*Λ(*λ

where the l} are the Lagrange polynomials of degreefc— 1. Let p^_1 be a best

approximation of / in Pk _ t . Define

/>* = liy— i-fc-i lloo,

«* = Ι Ι / - Ρ Γ - 1 Ι Ι - -

Show that

condition

detCco^l^O.

7.2.1 [C] Show that the first five Chebyshev polynomials are

T0(t)=l, ■

Ti(f) = t,

T2(t) = 2t2-l,

T3(t) = 4 t 3 - 3 i ,

T4(i) = 8f 4 -8f 2 + l.

7.2.2 [B:50] Show that the Chebyshev polynomials satisfy

Tk(t) = 2tTk.l{t)-Tk.2(f),

T0(t)=h

Deduce that Tk is of degreefcand that the coefficient oft* in Tk(t) is equal to 2*.

7.2.3 [C] Show that the Chebyshev polynomials satisfy

1 "0 if ΙΦΚ

I at

Tt(t)Tk(t)-—=:

π/2

π

if / = fc#0,

if / =fc= 0.

7.2.4 [D] Show that the Chebyshev polynomials satisfy

(1 - t2)T'k(t) = kTk. ,(t) - ktTk(t), (fc S* 1),

2 2

(1 - ί )Τ'ί(ή = tT'k(t) - k Tk(t) (fc > 0).

318 CHEBYSHEV'S ITERATIVE METHODS

00

1 — si

y Tfc(t)s*= ,·

A l - 2 s r + s2

7.2.6 [D] Prove that, for fixed k and sufficiently great real i,

T k (i)-i(2i) k .

7.2.7 Prove that

7.3.1 [D] Define

Tk(z) = cosh (k cosh " 1 z).

Study this definition with the help of the function

τ:(χ, }>)!-► (cosh x cos y, sinh x sin y).

7.3.2 [D] With the help of the definition of Tk(z) given in Exercise 7.3.1, recover

the definition of the Chebyshev polynomials of a real variable:

x > 1 => Tk(x) = cosh (k cosh~ 1 x),

x < - 1 =>Tk(x) = ( - l ) * c o s h [ / c c o s h " ^ - x ) ] ,

|x| ^ 1 =>rfc(x) = cos(fccos -1 x).

7.3.3 [D] Show that the Chebyshev polynomials can also be defined by

Tk{z) = cos (k cos - 1 z ) .

7.3.4 [A] Show that the Chebyshev polynomials satisfy the recurrence relation

Tk+l(z) = 2zTk(z)-Tk^(z) (ze<E).

7.3.5[A] Show that Tk(z) has k zeros on the real segment [—1,1].

7.3.6 [D] The Joukoviski transformation is defined by

J:wy-+t = i(w + w _ 1 ).

Show that J transforms the circle | w\ = p into the ellipse

ΒίΟ,Ι,^ρ + ρ" 1 )).

7.3.7 [D] Prove that

—^-eE(Oy Ud)^>teE(c,e,de).

e

EXERCISES 319

Tk(z) = ±(wk + w~%

where

z = cosh ξ and w = e5.

7.3.9 [A] Show that the polynomial tk of Theorem 7.3.2 is no longer optimal

when the parameters c and a are not real.

7.3.10 [D] Study the limit of

G*(z) = % ^ (ZGCC),

Tk(z)

when fc tends to infinity.

7.3.11 [D] Show that, when the ellipse E(c,e,a) becomes a circle, we have

^0 and ^«^ϊ.

7.3.12 [A] Prove the existence and uniqueness of the polynomial p* such that

max|p*(z)|= min max|p(z)|,

zeD pePk zeD

p(A)=l

7.3.13 [D] Consider the ellipse £(0,1, a) and the number

p= a+ yJa2-\.

Let Ex be the conformal ellipse which passes through λ and has a semi-major

axis equal to αλ; let

Px = <*λ + χ / β ί ί - Ι ·

Prove that

P< · [p(z)|^p fc + p- fc

-<min max \^~z rr·

p* pePk ze£<0.1 ,Λ) | ρ(λ) \ Ρλ + Ρλ

7.4.1 [D] Study the Chebyshev acceleration for a defective matrix.

7.4.2 [D] Write down the algorithm for the computation of

yk = tk(A)u (fc=l,2,...),

which is deduced from the formulae (7.4.2) and (7.4.3) by replacing the exact

eigenvalue λ by an approximation.

320 CHEBYSHEV'S ITERATIVE METHODS

7.5.1 [D] Prove that if A is symmetric than the Lanczos method defined by the

Krylov subspace

jrfc + 1 =lin[p k (,4),p fc eP k ]

determines the vector ük = tk(A)u, which is optimal for the speed of convergence

towards the dominant eigenspace, no knowledge being required of the parameters

c and e of the ellipse associated with the Chebyshev iteration method.

7.5.2 [A] Prove that for large k

/maxj^lwjiy ^ Tk(a/e)

\ |wj / Tkl(X-c)/e]'

7.5.3 [D] Consider the Chebyshev iteration method. Show that if the eigen

values of A, other than A, lie in the disk | z | < p, then the convergence of the power

method is not improved by Chebyshev.

7.6.1 [D] Study the validity of Lemma 7.6.1 when A is not diagonalisable.

7.6.2 [D] Compare the cost of the simultaneous Chebyshev iterations with that

of the block Arnoldi method.

7.6.3[B:44,58] Let AeR"*", beTR" and μ 1 ,...,μ Γ 6€. Find / e R " such that

p f esp(/l-fc/ T ) (l^i^r).

Let Qe<C be an orthonormal basis and let K e C r x r be an upper triangular

nxr

Ηη(β) = Μ*,

where M + is the invariant subspace of A1 associated with the eigenvalues

λ1,..., Ι Γ of A1 with greatest real parts.

(a) Propose one or more algorithms to compute the basis Q of this partial Schur

factorization of A1.

(b) Show that the choice

/=ß5, see,

τ

ί = β ί>

reduces the proposed problem to the following problem of size r.

Find s e C such that RT — tsT has the eigenvalues μΐ9...,μτ. This problem

is called the partial pole assignment in control theory.

EXERCISES 321

7.7.1 [B:29,57] Write down an algorithm for the dynamic computation of the

optimal parameters for the algorithm (7.6.1) by using the m — r eigenvalues of Bk

that have not been retained.

7.7.2 [B:29,30] Suppose μΓ is complex. Compare the approximate solution of

(7.7.2) with the exact solution proposed by Ho in [B:29].

7.8.1 [A] Let x 0 be an approximate solution of the problem Ax = b. Starting

with a set of constants yni(i = 1,2,..., n — 1), define

w - l

xn = x„-i+ Σ ym'i,

i=l

where

rf = b — Axt.

Let the error be denoted by

(a) Prove that

en = P„(A)e0i

where p„ is a polynomial of degree n such that pn(Q) = 1. We are interested in

a sequence of polynomials pn such that ||p„(>4)||2 tends to zero as fast as

possible when n tends to infinity.

(b) Prove that if A is diagonalisable, then

ΙΙΡ„(Λ)ΙΙ2->0 as n->oo,

if and only if

|/>rt(A)|-»0 as H-+00

for all λ in sp (A).

Now determine the polynomial that satisfies (7.8.1) when λ = 0.

(c) Extend the preceding result to the case of an arbitrary matrix by using its

Jordan form. For a given sequence of polynomials pn we define the asymptotic

rate of convergence at a point AeC as follows:

r(X)=\im\pnW.

n-*oc

Consider the polynomials

„ , » . _Tml(c-xye]

ρΛλ)

—Tjim-

322 CHEBYSHEV'S ITERATIVE METHODS

| ( c - ^ + [(c-A) 2 -e 2 ] 1 / 2 l

c + (c 2 -e 2 ) 1 / 2

we call optimal parameters those that minimize maxZ€sp(/1)r(A).

(e) Show that for a given pair (c, e) one has to compute the sequence

x0 eR", r0 = b — Ax0,

Δ0 —-r0, Xj — x 0 + Δ 0 ,

rn = b- Axni

where

2c

«1 =

2c2 - e 2 '

i?1=ca1-l,

"^l·"©^-1] '

ßn = C0tn-l.

7.9.1 [A] Suppose a Schur factorization AQ = QR is given corresponding to a

fixed order of the eigenvalues. Define a deflation with several vectors.

7.9.2 [A] Propose an algorithm of progressive deflation for computing the

eigenvalues with greatest real parts.

7.9.3 [D] Propose a polynomial preconditioning for Ax = b which is suitable

for the method of conjugate gradient.

CHAPTER 8

Polymorphic Information

Processing with Matrices

The 25 years which separate the writing in 1987 of the original French version of this

textbook and the present Classics Revised Edition have enabled scientists and soft

ware developers to progress significantly in the understanding of the role played by

matrices in intensive scientific computing. The evolution in computing know-how is

fuelled by the necessity to translate mathematical computation into numerical soft

ware which should be fast and reliable enough to meet the ever-growing demands of

high-tech industries. In this endeavour, computations which rest upon an explicit or

implicit spectral decomposition of highly non-normal matrices represent a formidable

challenge. The spectra are inherently unstable: this is convincingly illustrated by Ex

ample 4.2.11, pp. 162-164 in Section 4.2.7. And the subject is developed more thor

oughly in Chapter 10 of the book [4J, which addresses the specific difficulties created

by high non-normality for the necessary backward assessment of finite precision com

putations. In practice, high non-normality does arise in matrix computation when the

underlying equations express a strong coupling between two phenomena observed in

physics or technology. This coupling creates mathematical instabilities which have a

serious impact on the computed results. Various tools such as pseudo-spectra [4,21]

have been designed to assess the validity of such results when obtained with a reli

able numerical software run on a computer with a classical architecture. But when

codes are run on massively parallel architectures, the validity assessment of computer

simulations remains today an open, yet pressing, problem. An extensive survey of

concurrent advances in numerical software for eigenvalues is found in [18]. All ref

erences which do not appear in Appendix B or in C are listed in the text as [n] and

appear at the end of the chapter for reference number n. An analysis of what works

best in software practice is a precious source of information about the theoretical

reasons why matrices are such dependable tools in scientific computing.

This chapter attempts to present some of these reasons in a way which does jus-

324 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

tice to the power of mathematics to compute and model our world. Thus it is equally

important to consider what conceptual tools are at work in today's (2012) view of the

phenomenological world presented by theoretical physics [1,2]. To set the scene, let

us review the rich variety of basic building blocks, generically called scalars, which

are used in computation understood in a broad enough sense, running from engineer

ing practice to physical theory.

8.1.1 The Real Field R

The field of real numbers underlies almost all computations taking place inside scien

tific computers worldwide.

We recall that, more generally, a field K is an algebraic structure where two dis

tinct operations, denoted + and x, are defined, both associative and commutative,

yielding an additive group structure (0 neutral for +) and for K* = K\{0} a mul

tiplicative group structure ( 0 ^ 1 unit for x). Beside the field R, children in high

school are introduced to Z2 = {0,1} and Q, and sometimes to complex numbers,

turning R2 into C.

Multiplication need not be commutative in a field, as is illustrated by the field HI =

C x C of quaternions which are 4D-real vectors. Quaternions are heavily used in all

engineering domains which depend on 3D-rotations, from computer graphics for the

movie industry to orbital mechanics for geolocalisation. They also underlie electro-

magnetism (Maxwell 1870; see [8]) and special relativity.

8.2.1 The Integer Ring Z

We commonly count with integers 1,2,3,..., leading to the ring Z = {0, ± 1 , ± 2 , . . . } :

we can multiply but not divide in Z* (for n > 1,1/n ^ Z). This indicates that scalars

may belong to a ring in practice.

The real plane R2 can be classically endowed with the complex field structure C. This

makes it the Euclidean plane pioneered by Euler with the celebrated formula

ew = cosθ + isin0, \el&\ = 1

SCALARS IN A RING 325

resulting from considering the imaginary unit i = (0,1), i2 = —1, Arg i = π/2, and

circular trigonometry on the unit circle.

The 1848 proposal by J. Cockle to equip R 2 with an alternative ring structure

related to hyperbolic trigonometry on the two dual unit hyperbolas x2 — y2 = ±1 is

not widely appreciated. It rests on the introduction of a non real unipotent u — (0,1),

u2 = 1, u Φ ± 1 , such that w = x + yu eV..

Some properties of complex vs. hyperbolic numbers are contrasted below at a

point M = (x,y) inR 2 :

c u

2 2

i, i = - 1 u, w = 1, u 7^ ± 1

z = x + iy w = x + yu

z = x — iy w* = x — yu

Vx<2 - y2 if

\χ\ > \vl

\z\ = Λ/ΖΊΪ k k = < 1 0 if \x\ = |y|,

V

|i| = l \u\h = ± i

w _ )

The algebraic structures C (field) and Ή. (ring) yield respective information at the

point M which differ markedly:

• The Euclidean norm or modulus \z\ G R + is replaced by a threefold mark

function w *-> \w\h £ {R + , iR} which is not a norm and depends on the loca

tion of w for its expression. The mark relates implicitly H to the axes R and

326 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

yj\x2 - y2\ > 0. It remains constant at p > 0 (respectively at 0) when M

varies on the four-branched hyperbola with hyperbolic radius p > 0 (respec

tively on the asymptotes).

spectively fourfold for |y\ φ \χ\) with a single Euclidean angle Θ defined mod

2π (respectively two hyperbolic angles φ in R).

These differences are induced by the two quadratic forms x2 + y2 vs. x2 — y2.

The plane R 2 is treated as a whole by means of the positive definite form x2 4- y2\ it

is divided into four distinct quadrants by the two asymptotes y — ±x on which the

indefinite form x2 — y2 is 0. For example, in the quadrant where 0 < y < x, the

angles Θ (Euclidean) and φ (hyperbolic) are related by 0 < tan# = tanh</? = A'A

with OA! = 1 in Figure 8.2.1. The point B — el6 on the unit circle defines OB' =

cos# < 1, B'B — sin#; alternatively the point C = βηφ on the unit hyperbola

defines OC = coshy? > 1 and C'C = sinh(/?. Moreover, θ < π/4 and φ > 0

represent respectively twice the area of the circular and hyperbolic sectors OA'B and

OA'C.

Figure 8.2.1

The structure Ή equips the plane R 2 with a model of hyperbolic geometry. The

hyperbolic number w — χΛ-yu models the addition of the two heterogeneous numbers

xl and yu, where 1 and u span two distinct real categories such as space and time.

Not surprisingly, hyperbolic numbers are well-suited to describe special relativity in

one spatial dimension [19J.

SQUARE MATRICES ARE MACRO-SCALARS 327

The standard model for particle physics is set in the multiplicative and associative

algebras due to Clifford [1,9]. However we cannot leave this foundational topic [8,9]

without touching on non-associativity, i.e. on algebraic structures beyond rings. This

is because recursive complexification of x leads to the non-associative algebras Ak,

k > 3, proposed by Graves (k = 3) and Dickson to go beyond A^ = H. These weaker

algebraic structures offer new computational possibilities which may appear at odds

with classical logic [3]. Computation paradoxes should not be feared: they reveal new

phenomena which invite us to extend the current logic of computation [3,7,8,9]. We

shall be facing yet another computational contradiction later in Section 8.8.

The smallest non-associative Dickson algebra is the division algebra As = G

consisting of octonions in R 8 . This algebra would be a field if it were associative. It

stands at the crossroads of many computational phenomena in geometry, theoretical

physics and number theory [1,2,8].

But it is high time to refocus on our central theme: the associative ring of square

matrices defined over R or C

8.3.1 A "Pseudo-field" of Square Matrices

Let A be a matrix of order n > 2 over C with rank r — r(A), 1 < r < n. All

non-singular matrices (r = n) form a field inside the ring C n x n with

AA-1 =Α~1Α = Ιη. (8.3.1)

We set n = m in Exercise 1.6.8 on p. 50, where A = UEV* and A* = VY^U*.

Lemma 8.3.1 When 1 < r < n, the identity (8.3.1) is replaced by the two following

ones:

AA* = PUy A* A = i V , (8.3.2)

where P\j and Py are orthogonal projections on Im A.

n ) = 7 r , and hence AA* = UIrU*

and A*A = VIrV* are two similar matrices representing the orthogonal projection

Ir on Im A expressed in different bases. Observe that sp (J n ) = {1} is replaced

by sp (Pu) = sp (Pv) = {0,1}, where the algebraic multiplicity of 1 has become

r < n.

The existence of a pseudo-inverse for any A φ 0 with rank r, 1 < r < n, allows

us to expand the ring structure into that of a "pseudo-field", provided that we extend

the set of unit matrices to contain all orthogonal projections with rank r, 1 < r < n.

For reasons which will become clear as we proceed, we call macro-scalars the

elements in a ring of square matrices.

328 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

Convergence of the QB, algorithm has been presented in Chapter 5, Section 5.5,

pp. 221-226. It was independently invented in 1961 by Vera Kublanovskaya in

Russia and John Francis in England, who replaced Gauss'Li? factorisation used in

(Rutishauser 1958) by the more robust alternative provided by Laplace's QB. Before

this invention, scientists would mainly use Newton's method to find the roots z e C

of the characteristic polynomial π(ζ), or x e C n of (5.8.1) on p. 227 (Anselone and

Rail 1968).

We observe that the QB, algorithm rests on the QB, factorisation (1.8.1) on p. 31

applied successively on a sequence of matrices unitarily similar to A\ = A as in

dicated in Section 5.5.1 on p. 221. In other words, the algorithm works directly on

square matrices treated as scalars. The result is a paradigm shift in the history of the

practical computation of eigenvalues: at long last, it was possible to get all eigenval

ues of A without omission.

By comparison, Newton's method produces a sequence of numbers in C or of

vectors in C n which may not converge unless the starting data are close enough to

one of the solutions. The QB method for eigenvalues provides a clear illustration of

the conceptual benefit obtained by working over scalars which consist of matrices.

On the practical side, the benefit can be no less impressive. Block-partitioning is a

successful technique to get a good time-efficiency on parallel architectures. The basic

linear algebra subroutines (BLAS) exploit to their fullest the capabilities of compu

tation over matrices. Among many others, one can cite two examples of theoretical

importance:

(1) the Schur complement formula (1917-1918),

(2) the Sherman-Morrison formula (1950).

Section 8.7 below and Chapter 7 in [8] offer some applications of these identities to

matrix analysis.

The rest of the chapter will review some of the ways by which matrices provide

specific dynamical information about themselves as quantification of endomorphisms,

i.e. linear maps from C n into itself.

STEMMING FROM A OF ORDER n

8.4.1 The Jordan Form

The Jordan form A = XJX-1 is detailed in Section 1.6.3, pp. 22-26. The Jordan

matrix J [14J can take one of two forms:

SPECTRAL AND METRIC INFORMATION STEMMING FROM A OF ORDER n 329

mation is the unique vector μ = (μ$) G C n displaying the n eigenvalues.

• J is bidiagonal iff A is defective. The spectral information consists of two

vectors, the complex μ in C n and a binary one in Z ^ - 1 specifying the Jordan

block structure.

Let λ be an eigenvalue of A of algebraic (respectively geometric) multiplicity m (re

spectively g) and index I with g, I G [1, m].

The Jordan structure of λ can be specified by the Segre characteristic {si =

/ > · · · > s9 > 1}, where Sj represents the order of the j , t h Jordan block for λ

(in decreasing order). Alternatively it can be specified by the Weyr characteristic

{w\ = g > - - · > wi > 1}. The Weyr numbers Wj are defined as Wj = v3- — Vj-i,

j = 1 to Z, for i/j = dim Ker (A - XI)j = the nullity of (A - XI)j, with i/0 = 0, so

that vo = 0 < v\ — g < · · · < v\ = m.

g i

It is clear that m = Y^ Si = Y J Wj.

l 9

x

(cij) G Z | such that \^ °ϋ ~ *> z Z c ^

s = w f

3 an

^

j=l i=l

Cx = 1

(1 · · · 1) T G Z | x l ) . The rectangular matrix C\ of size g x / offers the dual ways

proposed independently by Segre in Italy (1884) and Weyr in Austria (1885) to rep

resent the block structure which had been defined by Jordan (1870) under the form of

a binary sequence of length ra — 1 consisting of s^_i ones and 1 zero, i = 1 to g, and

displaying altogether g zeros.

The spectral information carried by A is expressed with numbers in the fields C

and Z2. By comparison, the metric information is set in M + .

The singular values σ* > 0, i = 1 to n, are the non-negative square roots of the

eigenvalues of the unitarily similar matrices A A* or A* A introduced by Beltrami

(1873) and Jordan (1874). They form the vector σ = (σ<) G R+ n .

330 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

Let || · Up denote the Holder norm where p is specialised to be p = 1,2 or oo. Thus

n

for a; = (xi) e C n , ||x||i = ^ | x » | , ||z||oo = max*|x;|.

2=1

Lemma 8.4.1 \\μ\\ρ < \\σ\\ρ, p = 1,2, oo, with equality iff A is normal.

ORDER n

8.5.1 The Left and Right Polar Representations

Given A e C n x n , we define HL = (AA*)1/2 and HR = (AM) 1 / 2 . Each Hermitian

matrix HL or HR, which is uniquely defined, is the left or right module for A.

A = HLUL = URHR,

where the modules HL and HR are Hermitian positive semi-definite, and the unitary

matrices UL and UR are called left and right phase factors.

singular, then the left (say) phase factor UL = U has the form U — (I — T)U$,

where U$ is one of the left phase factors for A and T satisfies TT* = T + T*,

Im T c Ker A*.

Observe that I - T is unitary and A*Tx = 0 for x G C n . Let 0 e sp (A) with

1 < g < m < n; then I — T has at most g eigenvalues distinct from 1. Moreover,

U0-U = TU0 and \\U0 - U\\2 = ||T|| 2 < 2.

8.5.2 A Is Normal

The factors H and U commute iff A is normal (Statement 208, p. 184, Chapter III

in [12] or Proposition 1, p. 191, Section 5.7 in (Lancaster and Tismenetsky 1985).

Therefore the polar factorisation has a unique form: left = right.

Lemma 8.5.2 The factors H and U of A normal have a common orthonormal eigen-

basis together with A.

POLAR REPRESENTATIONS OF A OF ORDER n 331

PROOF H and U are semi-simple and commute. See Exercise 6, p. 240 in (Lan

caster and Tismenetsky, 1985) or Statement 193, p. 127, Chapter II in [12].

0 < σ e sp (H) and eiQ e sp (17).

PROOF Left to the reader (Exercise 2, p. 192 in Lancaster and Tismenetsky 1985).

The (circular) polar representation z = pe%0 in C can be generalised to A e Cnxn

with the following new possibilities:

• The form of the representation is unique iff A is normal <<==> \\μ\\ρ — \\o-\\pi

p = 1,2, oo. Otherwise ||μ|| ρ < ||cr||p; the left and right forms differ.

Remark The absolute condition numbers for H and U are studied with the Frobe-

nius norm in [5] in the case of a rectangular matrix A G C m x n , r(A) = n < m.

Interestingly for m — n the explicit values for the condition numbers of the phase

factor U depend on the ground field for A. More precisely (1991),

CR(U) = —?—>Cc(U) = —,

that, over C, the absolute condition numbers for A -> A~x in the 2-norm and for

A —> U in the F-norm take the same value 1/σ\. By contrast, when m > n, the

unique value C{U) — ^- is valid over R or C

As for the condition of module H its value

C{H) V2

~ 1 + K{A)

K(A)

for m > n, K(A) = ση/σ\ does not depend on the field R or C [5]. We mention for

future reference in Section 8.6 that

332 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

We consider the multiplication maps by a given vector x

For k > 2 (respectively > 3) multiplication is not commutative (respectively

associative). The corresponding matrices (also denoted by Lx and Rx) are normal:

let x = a + X, X being the imaginary part in 5sAk\ then x — a — X, Lx — al + Lx,

IJT = Lx = al — Lx and I / ^ I ^ = a2I — L2X = LXL^. Moreover, for k > 2, Lx

and Rx differ but commute and \\LX\\ — \\RX\\ [8]. Therefore there exists a common

orthogonal eigenbasis for Lx, Rx, their module ϋί^ and the respective phase factors

Ux, Vx which differ only by their eigenvalues lying on the unit circle.

(1) For k < 3, L^LX = R%RX = ||x|| 2 / n · The common module is Hx — ||x||/ n ·

Therefore Lx — \\x\\Ux and Rx — ||x||V^.

(2) A significant change occurs when k > 4. The common module of the normal

real matrices Lx and Rx is more complicated than ||x||/ n · The spectrum sp (L^LX) =

sp (R%RX) contains at least the three values ||x|| 2 and a, /3 such that 0 < a < \\x\\2 <

β < 2^ _3 ||χ|| 2 when x is not alternative, i.e. \\x x y\\ φ ||χ|| · ||y|| for some y in A^.

In particular, Lx and Rx can be singular when x ^ 0 is a zerodivisor (By ^ 0 :

x x y — 0). When this occurs, their rank satisfies 2k — 4(/c — 1) < r(Lx) — r(Rx) <

2k — 4 and the phase factors are ambiguously determined (see Exercises 8.5.2 and

8.5.3).

The evolution of the polar representation of Lx and Rx over E displays three

major stages corresponding to k = 1, k € { 2 , 3 } and finally k > 4. See |8J for more

details.

DEFINITE UNDER SPECTRAL COUPLING

Unless otherwise stated, we assume throughout this section that A is Hermitian

positive (semi-) definite. The application in mind is to the left or right module H

for an arbitrary matrix. The better part of the section is adapted from Chapter 3 in

[13], where A is mainly restricted to be real.

When x e Cn is an eigenvector for A, Ax — \x for λ > 0, i.e. the angle

Z(x, Ax) is 0 for λ > 0 and not determined for λ = 0.

When u φ 0 is not an eigenvector, the scalar product (Au, u) is positive. Over R

(A real) the coplanar real vectors u and Au define an acute Euclidean angle Θ(Α, u) —

Z(u, Au) such that (Au,u) = ||Aw||||u|| cos6(A,u) > 0. Over C (A complex),

YIELD OF A HERMITIAN POSITIVE SEMI-DEFINITE UNDER SPECTRAL COUPLING 333

Θ(Α, u) is the unique canonical (or principal) angle between the complex lines spanned

by u and Au (Definition p. 5, Section 1.2). Below we are interested in this acute "an

gle" Θ(Α, u) defined for Au φ 0 and its maximal value φ(Α), 0 < φ(Α) < π/2.

The argument φ{Α) for the yield is the maximal dynamical play produced by A.

LetO < Xmin — λι = μι < · · · < μη = λά = A m a x denote the repeated eigenvalues

μί of A ordered by increasing value: \min and A m a x are the two extreme eigenvalues

for A which are distinct iff A φ λ/ η ; λ = Xl+Xd is the arithmetic mean. The

Euclidean scalar product is written as (x, y) = y*x in matrix notation (x,y eCnxl)

1 2

and (χ,χ) / = \\x\\ denotes the Euclidean norm || · ||2.

We define two real-valued functions:

• O ^ e C M cose(Au) = ^ g ^ for Au φ 0,

The quotient defining cos Θ(Α, u) should be contrasted with the Rayleigh quotient

M(A,x) = [uffi > x φ 0, defined on p. 33, which yields metric information in

[μι, μη], The norm function n satisfies the min-max equality

||η||<1 ε € Μ ε>0

|Μ|<1

operators on a Hubert space, which contains Hermitian positive definite matrices as a

special case. The right-hand side deals with the norm curve ε G M+ \-> \\εΑ — I\\ 6

R + which is continuous and convex in e. The minimum in ε is unique and achieved

for e m with ||s m A — I\\ < 1.

Theorem 8.6.1 Let A be Hermitian positive definite. The components ofa(A) are

cos φ(Α) = ^ ^ f = min u ^ o cos0(A,Tz) and sin φ(Α) = ^ ^ - \\emA - I\\,

2

ε - - Id

Letting λι = 0 in Theorem 8.6.1 yields φ(Α) — π/2 and φ(Α) < π/2 when

λι > 0 by assumption on A: the quadratic form x* Ax is non-negative for all x ^ O .

Hence (Ax, x) = 0 for x φ 0 <<==> Ax = 0 <=> x £ Ker A. The vectors x and

334 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

Ax — 0 are trivially orthogonal when x G Ker A. In the limit λι —> 0, one can set

Θ(Α, u) = π/2 for all eigenvectors u in Ker A which are associated with 0. On the

other hand, Θ(Α, x) — 0 for all eigenvectors x associated with λ > 0. This introduces

a sharp distinction between the kernel Ker A and all eigenspaces Ker {A — XI),

λ > 0. In the following examples, the positive (semi-) definite matrices of interest

are the modules of a non-Hermitian matrix.

Example 8.6.1 Let A invertible and non-normal have two polar representations:

A = HLUL = URHR.

The eigenvalues of HL and HR are the singular values for A: the extreme ones define

Ai + Xd σι + ση X

where H (respectively U) stands for HL or HR (respectively UL or UL)·

Assuming that d > 2, we set w\ = Λ +fA and w\ = XX+X , 0 < w\ < 1/2 <

w\. We choose the square roots w\ > -4= and 0 < Wd < -j=. Then cos φ(Α) =

2w1Wd-

Example 8.6.2 Let us apply this trigonometric view to the Hermitian factor H for

A — HU (say) not invertible. Then φ(Η) = π/2 and the phase factor is not uniquely

defined (Proposition 8.5.1). The value π/2 for φ(Η) signals the singularity of H and

hence of A.

Now let us go back to the Remark in Section 8.5.3 and let m = n = r(A). We

can readily interpret (8.5.1) as

Equivalent^, C(H) = f / ä ^ L with em{H) = γ-^τ- = i (Theorem 8.6.1); this

provides an interesting metric way to look at the condition number C(H). Similarly,

Cu(U) — em(H), where H is the module restricted to the plane spanned by two

singular vectors associated with the lowest singular values σ\ < σ<ι when they differ.

Actually, the three quantities 1 < K(A) < oo, 1 < C(H) < \/2 and 0 <

sin φ(Η) < 1 are merely different ways to measure the distance of the original matrix

A to singularity by means of its extreme singular values σ\ < ση. The minimal

values, respectively 1,1 and 0, correspond to A = αζ), Q unitary, a φ 0 in C, so that

H = \a\I, φ(Η) = 0. The upper bounds, respectively oo, \/2 and 1 correspond to A

singular (Ai = σ\ = 0 and φ(Η) = π/2). It is important to keep in mind that K(A)

is the relative condition number for A ^ A~l in the 2-norm, whereas C(H) is the

(absolute = relative) condition number for A \-> H in the F-norm.

YIELD OF A HERMITIAN POSITIVE SEMI-DEFINITE UNDER SPECTRAL COUPLING 335

Let A = UYV* e C n x n and AA = UBV\ and let H'(A) denote the Freenet

derivative of H : A\-+ (ΑΑ*)1/2; then

||AA|| F =1

with ||ΔΑ||,ρ = ||J5||i?· It can be shown that C(H) is achieved for the rank 2-matrix

B = wxeiel + wdene{, w\ = j+fa and w\ = χ^χ^; hence ||S|||, = tr BBT =

w

i + wd = 1 anc* ll-^lb = wi < 1. Moreover, £?2 = wiu^(eief 4- ene^)9 where

^ i ^ d = I cos(/>(#). ThusO< ^wrwj = p(B) < 75 < II-BH2 = wi < 1.

When λι -> 0, so does Wd and B tends to the nilpotent matrix e\e^\ r(B) — 2

drops to 1 in the limit.

More details are found in [5] and in Section 4.6 of [13]. Actually the practising

numerical analyst will find ample food for theoretical thought in all of Chapter 4 of

[13].

Example 8.6.3 Let A represent the real matrices Lx and Rx defined in Section 8.5.4.

The common module is Hx. For k < 3, Hx = ||x||/ n · Therefore φ(Ηχ) = 0:

there is no dynamical play for the multiplication maps. Alternatively the yield is real:

a(Hx) = 1.

A positive play occurs for multiplication by non-alternative vectors in higher di

mensional Dickson algebras. Then the norm stops being multiplicative: \\x x y\\ φ

IMIIMI [8]· If a; is a zerodivisor, φ(Ηχ) = π/2 and the yield is pure imaginary:

a(Hx) = i.

The above discussion shows that the linear map which multiplies by x in Ak is best

(respectively worst) conditioned when x is alternative (respectively x is a zerodivisor,

fc>4).

Going back to A Hermitian, we only assume from now on that it is positive semi-

definite, so that 0 < φ(Α) < 7Γ/2. If A = XI, X = λι = λ^ and φ(Α) = 0 because

all vectors are eigenvectors. In conclusion, 0 < φ(Α) < π/2 iff A is invertible ^ XL

Let x\ and xd be a choice of normalised eigenvectors in C n for A associated

with λι and λ^ respectively. We denote S — S(x\,Xd) = {u G Cn;u = zx\ 4-

z'xd, ζ,ζ' G C, |z| 2 +|2/| 2 = 1} = 5 3 the unit sphere in E 4 , and 5 0 = S0(xi,Xd) =

{v G C n ; u = eiew1x1 + e*'wdx&, Θ, ff G R} 9* S1 x S 1 , where S1 is the unit circle

in R 2 . Note that e%ew\ and e%e Wd represent arbitrary square roots of Λ ^ Λ and

λΙΤΧ^ respectively.

Theorem 8.6.2 Let A be invertible, then sin</>(j4) = 11(A +A ^ ~ ^UW for αγν

$

u G S and cos φ(Α) is achieved for any v in So C S.

PROOF By an easy adaptation to the complex case of Theorem 3.1 on pp. 31-32

in [13].

336 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

When A is symmetric rather than Hermitian, the eigenvectors x\ and Xd, as well

as u and v are real vectors in R n . The analogues of S and So are respectively R =

R(xuxd) = {u € Rn;u = axi + α'α^,α,α' e R, a2 + a / 2 = 1} ^ 5 1 , and

i? 0 = Ro(xi,Xd) reduced to the four points {v e Rn;u = ±wix\ + (±WdXd)}

lying on R.

Corollary 8.6.3 When A is symmetric positive definite, the sets S and So in Theorem

8.6.2 are replaced by the sets R and Ro respectively.

PROOF Clear.

Several remarks are in order:

(1) If A is singular (λι = Wd = 0), the set S (respectively i?) is unchanged with

Au — z'xd (respectively a'xd) so that sin φ(Α) = 1. The sets So and Ro are

reduced, according to Wd = 0, w\ — 1 to be subsets of Ker A, leading to

cos φ(Α) = 0, as expected.

(2) When d = 1, λι = λ^ and w\ = Wd = 4^.

(3) The different behaviour of cos Θ(Α, u) and n(A, u, e) with respect to minimi

sation is quite remarkable. It deserves further study |10J.

(4) cos φ(Α) is called the first "antieigenvalue" with "antieigenvectors" in RQ

(A real) in Gustafson's parlance. And the general study of cos φ(Α) and sin φ(Α)

is referred to as matrix trigonometry by its inventor. More generally, over C

φ(Α) is merely a particular case of canonical angle between complex subspaces

of dimension r = 1 generated by u and Au. For 1 < r < n, the associated

trigonometry was presented earlier in Chapter 1, Sections 1.2 to 1.5, as a handy

tool devised to quantify the convergence of eigensolvers in Chapters 5 to 7. But

Gustafson had a different goal in mind for A symmetric.

The difference is mentioned in Exercise 5, Section 6.7, pp. 119-120 of [13J.

However, the maximal canonical angle shown in Figure 1.3.1, p. 9 has lost in

|13| the proper reference to its twofold origin: statistical over R (Afriat 1957)

and numerical over C (Section 1.14 on p. 43). Many examples of the use of the

angle φ{Α) to analyse the convergence of iterative linear solvers are provided

in Chapter 4 of [13].

(5) Unlike the Euclidean angle Z(u, Au) in the real 2D-plane, the canonical angle

Θ(Α, u) derived from the complex Euclidean scalar product has no familiar

interpretation in real 4D-geometry.

Therefore the map u i-> Au expresses, when u is not an eigenvector, a change

of direction only over R. Over C, the change appears, in real terms, as an evolu

tion of a different, more intricate, nature. It incorporates the complex structure

of the ground field C in a computational manner to be studied elsewhere [10].

YIELD OF A HERMITIAN POSITIVE SEMI-DEFINITE UNDER SPECTRAL COUPLING 337

(6) The argument φ(Α) measures the maximal dynamical play that the matrix A

can induce on a vector u. This viewpoint is a move away from the more fa

miliar search for colinearity, that is for eigenvectors of A. This alternative

viewpoint does not look for the directional invariance expressed by A through

its eigenvectors. It looks rather for the directional laxity that A can express by

an inner coupling of any two distinct eigenvalues, as we shall see in Section

8.6.3, the maximal laxity being achieved by φ(Α) for the extreme eigenvalues

λπιίη, Xmax> The larger the φ(Α), the greater the evolution that is possible un

der A by coupling λι, λ^. The number λ = Λι+Λ<* represents the middle point

in the spectrum. Let A = QDQ*; the maximal value Ad ~ Al = ||A - λ/|| =

\\D — XI \\ = maxi<-K n \ßi — X\ is achieved by \\(A — XI)u\\ for u e S.

We recall that the Euler equation associated with the Rayleigh quotient M(A, x) for

a Hermitian matrix A is given by Ax = Xx, x φ 0: x and Ax are colinear in C n

with a real scalar λ. The solutions are the eigenvectors for which the dimension of

the subspace lin (x, Ax) drops from 2 to 1.

A2u Λ Au Λ

T7ö Γ-2τι r+u = 0, Αυ,φΟ. (8.6.1)

{A2u,u) (Au,u)

PROOF See Section 3.3, pp. 33-37 in [13].

Observe that A2u is a real linear combination of An and u.

Corollary 8.6.5 The Euler equation for cos Θ(Α, u) is satisfied by eigenvectors of A

as well as by linear combinations of normalised eigenvectors Xk and x\ associated

with 0 < Xk < Xi, 1 < k < I < d, lying in So(xk,xi) = {v = eieWkXk + £τθ wixi}>

where w

t = x^xl> 1/2. wf = ^ < 1/2

Generally speaking the three vectors u, Au, A2u are independent. They are col

inear if u is an eigenvector, or they have rank 2 if u E So(xk,xi). In the latter case,

(8.6.1) entails ||u|| = 1, A2u = (Xk + Xi)Au - XkXiu.

There are ά^ψ- distinct positive critical values for cos Θ(Α, u) when A is invert-

ible (λι > 0). Can we relax the condition λ& > 0 in Corollary 8.6.5? For the choice

0 = λι < λ/, the vectors in So(xi,xi) are normalised eigenvectors e%ex\ in Ker A

and (8.6.1) is not defined. However, φ\\ = π/2, 1 < / < d, can be inferred in the

limit λι -Λ 0.

338 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

Given any pair Xk < λ/, the dynamical play φω satisfies the relations

It is clear that φηύη{Α) < φω < φ(Α) = ^ m a x (A) < π/2, where φηιιη corresponds

to minAfc<A; (A|/Äfe, 1 < fc, / < d), just like φτηαχ corresponds to max(A/ — Xk) =

Xd - λι.

zxk + z'xr, \z\2 + \z'\2 = 1}.

Let Xk\ = Xk+Xl represent the middle point in [λ&, λ/]. Corollary 8.6.6 indicates

that ||(A - Xkil)u\\ assumes the constant value Xl~2Xh = Xkl sinφ^ for any u e

S(xk,xi).

Given the two normalised eigenvectors xk and x\, a vector in S(xk,xi) (Corollary

8.6.6) is defined by z, z' <G C, which describe the unit sphere in R 4 . By contrast the

vectors v in So(xk,%i) (Corollary 8.6.5) are defined by two real independent variables

θ,θ', corresponding to s — elGWk and s' = e%e w\\ now |s| and \s'\ are determined

by Xk < Xi and only the signs are arbitrary and independent on the unit circle S 1 .

Over M, the dynamical play resulting from spectral coupling is illustrated on Fig

ure 8.6.1. We use the notation 0 < λ = Xk = OA < Xf = λ/ = AB. Then

A+A' λ ' - ^ , AC = Λ/ΛΑ7 = g, CD = 2 ^ 7

OM = CM 2 a, AM λ+λ'

= h > λ,

g

triangle ACM again in Section 8.6.4.

The three quantities h < g < a are known as the pythagorean means for λ and λ',

being respectively the harmonic, geometric and arithmetic means.

Another useful geometric picture is given in Figure 8.6.2 by means of the trapez-

YIELD OF A HERMITIAN POSITIVE SEMI-DEFINITE UNDER SPECTRAL COUPLING 339

ium ABCD, where the opposite sides AB and CD are parallel with lengths λ and

λ'. The diagonal lines BD and AC meet at E: the parallel line through E to the

sides AB and CD meets the concurrent sides at F and G. Then Thales tells us that

2FE = 2EG = FG = 2 ^ 7 = h, the harmonic mean of the side lengths λ and λ'.

Observe that Figure 8.6.2 concerns the particular case of a complete quadrilateral for

which one pair of lines meets at infinity.

In classical spectral analysis geared towards invariance of direction, the variational

properties of the Rayleigh quotient for Hermitian matrices have been known for al

most 150 years (Weber 1869). They were instrumental in the early developments of

spectroscopy which eventually led to quantum mechanics and to the discovery of the

DNA structure in molecular biology. Spectral theory for A Hermitian compares x and

Ax when they are colinear. There are n real ratios μ&, k = 1 to n, provided by the

min-max Theorem 1.9.1 on p. 32: these ratios are the n repeated eigenvalues of A.

If we restrict our attention to Hermitian positive definite matrices A Φ XI, the

coupling of any two distinct eigenvalues Xk < Χι yields d^^- critical values cos φ^ι

for cos 0(A, u), each value being achieved in So(xk, #z). At the same time, sin 0(A, u)

remains constant on the much larger surface S(xk, xi).

We observe that the dynamical plays ΦΜΛ < k < I < d, can be quantified in

dual algebraic/geometric ways:

(i) c o s f e = ψ ^

(A )

=

\\AuMu\\ > ^ τ ^ Ο , for any ue S0(xk,xi),

(Ü) s i n f e = Α ^ = ΐ _ ^

= \\(τ±-Α-I)u\\ forany ue S(xk,xi).

The algebraic formulae are equivalent. This is not true of the geometric versions

because of the strict inclusion 5ο(χ&, %ι) C S(xk,xi). Can </>Μ be interpreted over R

(respectively C) as the Euclidean (respectively canonical) angle Θ(Α, u)l Not always.

It can be checked that j j ^ ^ y = 2 A ^ ^ for u = zxk + z'xu 0 < Xk < Xi, iff

\z\2 = w\ and |z'| 2 = wf: ξ = \z\2 = 1 - \zf\2 is the unique solution in ]0,1[ of the

340 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

The trigonometric approach of Gustafson has uncovered a geometric difference

between the real and imaginary components of the yield and, more generally, of el(^kl.

Because (Au, u) — \\ Al/2u\\2, we understand that the difference comes from the shift

which takes place in -J-1| (A — XMI)U\\ . To account for this geometric difference, we

call catchvectors all vectors in So{xk,xi)'- they are ca/?ta red/predicted by the Euler

equation for cos 9(A,u). Catchvectors v guarantee that the dynamical play is any

"angle" Θ(Α, v). Moreover, the three vectors Xkiv, Av and Vki = Av — Xkiv have the

respective norms \\Xkiv\\ = a, ||At;|| = \Λ/~λ/ = g and ||Vfej|| = λί~2Xh.

Over R, the triangle ACM of Figure 8.6.1 tells us that Vki is orthogonal to Av, as

can be checked directly. See Figure 8.6.3 in the real plane {v, Av}, where the indices

k and I are omitted.

represent the angle Θ(Α, u), letting Uki = Au — Xkiu describe the sphere of radius

Xki s i n ^ / for u G S(xk,xi). The Rayleigh quotient of A on the invariant subspace

M = \in(xk,xi) is a 2 x 2 matrix with eigenvalues λ^ < Χι whose properties are

examined in Exercises 8.6.1 and 8.6.2.

Future research may unravel more of the computational significance for the dis

symmetry between norm and scalar product quantified as sin φ^ι and cos φ^ respec

tively. It is clear that classical (circular) trigonometry in C erases the difference ex

pressed in C n x n , n > 2, only.

8.7.1 Outlook

The previous section has presented the effect on cosine and norm quantifiers related

to A when two distinct eigenvalues are coupled, under the assumption that A is Her-

mitian positive semi-definite.

The present section offers a different perspective on coupling. It considers the

direct coupling of two arbitrary matrices A and E of order n, which takes the linear

HOMOTOPIC DEVIATION 341

{oo}, represents the intensity of the coupling between the original matrix A and the

deviation matrix E of rank r(E) = r, 1 < r < n. The theory of homotopic deviation

(HD) looks at the question: "What is the fate of sp (A(t)) as \t\ -> oo?" When r < n,

some eigenvalues X(t) of A(t) may not escape to oo but converge in C. These limits

at finite distance in C form the set Lim = lim^i^oo sp (A(£))\{oo} c C, which

may be empty.

The HD theory represents a vast generalisation of the classical analytic perturba

tion theory where the parameter t is bounded, |t| < 1, so that \\tE\\ < \\E\\ < oo,

which was presented in Chapter 2 (see also Chatelin 1983, Kato 1976, Baumgärtel

1985). In HD, \\tE\\ —> oo and the limit set Lim is characterised by means of A

and the singular representation E = UV*, where U, V G c n x r have rank r and are

deduced from the SVD for E.

oo

The word "homotopic" stems from the formal factorisation

to write

R(t, z) = B(0, z)[In - tU{Ir - t M 2 ) - V * # ( 0 , *)],

where Mz = V*(zl — A)~1U e C r x r is the bottom-up communication matrix

between the two levels of information corresponding to C r and C n , r < n. Because

of the above form of factorisation chosen for the resolvent of A(t) at z in res(-A),

R(t, z) exists for t G res(M~ x ) when r(Mz) — r. Any z in res(A) such that ίμζ — 1

for some μζ G sp(Mz) is an eigenvalue X(t) for A(t). The homotopic spectral link

ίμζ = 1 between t and z entails that |t| —> oo iff μζ -ϊ 0.

Altogether HD theory is a 3-level information processing, in which the third level

is C n + r . The top-down information processing is realised by the augmented matrix

) £ (£(n+r)x(n+r)^

M^) = y y* 0r

Definition π(ζ) = det A(z) is the homotopic polynomial with degree d, 0 < d <

n — r. Its zeroset is Z = {z G C; π(ζ) = 0}.

342 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

The resolvent R(t, z) is analytic in t around 0 (|£| small enough) and oc (\t\ large

enough) for z in res (A)\F(A, E), where F(A, E) = {z e res (A); r(Mz) < r} =

Z Π res (A) = Lim Π res (A).

The points in F(A, E) are frontier points, i.e. those points in res (A) where

R(t,z) is not analytic around \t\ = oo: they are the limits of X(t) which exist in

res (A). As \t\ —> oc they attract the flow of spectral information consisting of

the spectral rays t — \t\eie i-> X(t) for Θ fixed, which do not escape to oc or to an

eigenvalue of A.

Critical points are particular frontier points z where Mz is nilpotent<^=> p(Mz) =

0. At such a point z, the map: t >-» Ä(t, 2) is a matrix polynomial in t of degree < r.

The critical points repel the spectral flow for |t| < oo but become asymptotically

attractive as \t\ -» oo. Observe that if r = 1, all frontier points are critical because

Mz is reduced to be a complex scalar. In computational practice, the asymptotic

regime is reached as soon as |t| > 300 or so.

This sketchy summary attempts to convey the flavour of the easy part of HD the

ory, which is able to characterise Lim Π res (A) with a few more analytic tools than

the classical spectral theory presented in Chapter 2. The study of Lim Π sp (A) is

more subtle: it calls for a significant generalisation of spectral theory and uncovers

new computational phenomena. For example, when 2 < r < n, it is possible that

a limit eigenvalue λ ( lim X(t) — X G sp (A)) does not belong to Z (π(λ) Φ 0).

|£|—>oo

When this is the case, a local bottom-up organisation of the information occurs at λ.

This contrasts with the top-down organisation which is the rule at all frontier points

in res (A).

The interested reader is referred to Chapter 7, pp. 247-346 in [8]] for an in-depth

treatment of HD. Reference [6J treats an example arising from the discretisation of

the acoustic wave equation where the homotopy parameter is the complex admittance.

The critical points correspond to frequencies for which no finite value of the admit

tance can cause a resonance.

When A (or B) of order n is invertible, then AB and BA are similar: A(BA)A~1 =

AB. When det A = det B = 0, AB and BA share the same characteristic poly

nomial: they are isospectral, and hence they are surely similar when they are diago-

nalisable (for example if B = AT or A*). Even if AB and BA are not similar, the

following augmented matrices of order 2n are always similar:

/ In A \ ( 0n 0 \( In -A\ ( AB 0 \

V 0 In J\ B BA ) \ 0 In ) \ B 0n J-

NON-COMMUTATIVITY OF THE MATRIX PRODUCT 343

The Jordan forms for AB and BA differ only at the eigenvalue 0 for which the sizes Si

of the Jordan blocks satisfy \si(AB) - Si(BA)\ < 1, i = 1, to max(g(AB),g(BA))

g(AB) g(BA)

while keeping m = ^ Si(AB) = Y^ Si(BA) invariant as the common alge-

i=l i=l

braic multiplicity of 0 [ l l ] .

Let 0 be a defective eigenvalue of AB characterised by the three integers g, /, m,

with g,l e [l,m]. Then, the structural matrix Co (AB) of size g x I for AB de

fined in Section 8.4.2 can be associated with finitely many possible structural matri

ces CQ(BA) of size g' x /' for BA. The total number N(g, Z,TO)of possibilities is

minimal at 1 when AB and BA are similar. When the similarity exists only at the

augmented level 2n, N can grow exponentially withTO.The reader can find in [16]

an algorithmic description of the possibilities Co (AB) \-+ CQ(BA).

We exploit below the associativity of the matrix product A(BA) = (AB)A and

denote E = AB and F = BA.

are connected by the relation

F-zI.

Lemma 8.8.2 If A and B are invertible, the spectral projections associated with

0 ± X e sp (E) = sp (F) satisfy

PROOF Let (C) be a Jordan curve isolating λ φ 0 from the rest of the spectrum.

Then (BA)"1 = A~XB~X a n d £ ( A B -ziyxA - ( / - z F " 1 ) " 1 - -\RF-i (|).

We set u = \, du = - # , and thus / BRE(z)Adz = / zRF-i(u)du. It

7c Jc

follows that BPtfA = \PF-i = XPF.

value. Part I

In the general case when E and F can be singular we can perform the series expansion

(2.2.5) given in Theorem 2.2.10 on p.70, in the neighbourhood of any nonzero λ, with

common index /, 1 < Z < m, for each resolvent RE(z) and RF(z).

344 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

Proposition 8.8.3 The identification of the matrix coefficients for l/(z — X)k, k >

0, in the expansion (2.2.5) for RF and RF satisfying (8.8.1) entails the following

relations:

k= l DF + XPF = BPEA, (8.8.4)

1

2<k<l-l Dp + XDp- = BDk£lA,

k =l XDlfl = BDlFxA.

to I and write zRF(z) — (z — X + X)RF{Z).

For k = 0, RE(z) yields BSEA, RF(z) yields SF and (λ - z)RF(z) yields PF.

Therefore the identity BSFA - XSF + PF = I should hold, which can be rewritten

as (8.8.3). The relations for k — 1 to / follow in a like manner.

We observe that if λ φ 0 is defective (I > 1), we run into a contradiction when A

and B are invertible: (8.8.2) XPF = BPEA contradicts (8.8.4) if / > 1 (DF φ 0).

The two equalities are at odds, which result from two different ways of performing

the spectral analysis:

(i) In Lemma 8.8.2, the spectral projections are directly deduced from the resol

vents RF and RF by Cauchy integration on (C) around λ.

(ii) In Proposition 8.8.3, the Laurent series expansions which converge for z close

enough to λ are used in place of the resolvents.

Depending on whether we choose to reason along the line (i) or (ii), we get con

flicting conclusions when E and F are invertible but λ is defective. This unexpected

paradox takes place in the associative ring of matrices and stems from an inherently

nonlinear spectral analysis around λ. Recall that λ is the root of a polynomial of de

gree n which can be arbitrarily large, and that the difference E — F is the commutator

{AB}.

An alternative way to present the paradox is to write PF-\ = PF + \DF, an

equality which challenges the classical result that PF-\ — PF for λ Φ 0 even if

DF φ 0. The additional term jDF comes from the identification of the results of

integration (i) and analytic expansion (ii) applied to the bond (8.8.1) between AB and

BA; it disappears when the product F — BA is studied in isolation as a whole. The

Flanders result that the Jordan forms for AB and BA may disagree only at λ = 0

[11] could be interpreted as an echo of the presence of \DF (resulting from the bond

between AB and BA) which only becomes tangible in the limit λ -> 0 (F~l does

not exist) because the augmented matrices ( J and ( R n ) a r e s ^ m ^ ar -

NON-COMMUTATIVITY OF THE MATRIX PRODUCT 345

The paradox induced by Cauchy integration could become even more puzzling if

we were to progress further into the direction (i) with only the incomplete knowledge

of the existence of λ, taken wrongly to be the unique singularity for the resolvents RE

and RF. One would be led to the (most often spurious) conclusion that

AB = BA = XI. (8.8.5)

The equalities (8.8.2) and (8.8.5) are valid only if / = 1 and sp (E) = sp (F) = {X}

respectively. In mathematical theory the first contradiction has gone undetected so

far because spectral consequences have not been drawn for the bond (8.8.1). Conven

tional wisdom treats the two matrices AB and BA as separate global entities. The

second contradiction is easily resolved by processing local and global information at

the same time. The mind has access to the global information about the spectrum

provided by the common characteristic polynomial TTE(Z) = TTF(Z). Such is not the

case in experimental sciences when experimentalists have access only to the local

phenomenological reality. The computational paradox stemming from local contour

integration is a welcome warning against the dangers of a naive inference from local

to global information.

Remark The warning may not be as far-out as it looks if we ponder on the matrices

Lx and Rx representing the left and right multiplications by x in Ak, k > 0. For

k < 3, these matrices have the simple module Hx = \\x\\I2k, corresponding to

the unique singular value \\x\\. And the development of our physical intuition of the

manifested world is primarily based on the validity in R, C, El and G of this extremely

special situation. It is therefore reasonable to expect that our 3D-based intuition can

be challenged in higher dimensional algebras (k > 4) when the multiplication maps

have not only module factors which carry polymorphic information about themselves

{3, 8} but also phase factors which can be ambiguously defined.

The complete spectral analysis of (8.8.1) around λ Φ 0 involves, in addition to non-

positive powers, the positive ones (z — X)k, k > 1.

Proposition 8.8.4 Spectral analysis of (8.8.1) around X ^ 0 entails for k > 1 the

infinite sequence of relations

When z -> λ, the Z + 1 relations in Proposition 8.8.3 are important because

(z — X)~k converges either to 1 or to oo. We conjecture that these relations underlie

many phenomena commonly observed in the world. But for k > 1, (z—X)k converges

to 0, and it tends to hide the sequence (8.8.6) when z is close enough to λ.

346 POLYMORPHIC INFORMATION PROCESSING WITH MATRICES

When λ = 0, theory tells us that the indices IE and IF for 0 satisfy IE € {1,2}

for lF = 1 and lE e {IF - IJFJF + 1} for lF > 2 [16]. Only additional in

formation about E and F (such as numerical experimentation) can indicate which

possibility is actually the case. The global connection (8.8.1) which exists between

the non-commuting square matrices A and B (AB φ ΒΑ) provides a rational basis

for the many holistic aspects of life which are more and more frequently observed in

experimental biology and ecology. It also suggests a fundamental difference between

the two cases λ = 0 and λ φ 0.

8.9 CONCLUSION

This chapter has offered a selection of snapshots taken in the booming domain of ma

trix computation which contain eigenvalues in their inner core - in an explicit and,

at times, implicit fashion. Due to the fast, bush-like, evolution of the domain (from

pure algebra to finance and the Internet) many other aspects could have been pre

sented. Admittedly, the chapter betrays the author's personal views on the evolution

of mathematical computation over the ages [7,8,9,10].

In her view of computation, matrices will prove themselves to be even more essen

tial tools than vectors in the scientific understanding of the ever-changing scheme of

living organisms. The polymorphic and possibly ambiguous character of the dynam

ical information that matrices carry makes them versatile macro-scalars upon which

can be built a complex multilevel processing of information.

The attention of the reader is drawn to [1,2,11,12,13,16,19,22], whose popularity

within the numerical analysis community may not parallel the depth of their signif

icance for matrix computation. On the theoretical side, Dickson algebras and HD

are studied at length in [8] as two aspects of a developing theory of mathematical

computation called "qualitative computing".

EXERCISES

Section 8.5 Polar Representations of A

8.5.1[C] When r(A) = n - 1 for A e C n x n , show that there exist exactly two

left phase factors Ui and U2 = (/ - T)U\, where T = 2 Ä , a e Ker A*. Interpret

\T. Deduce that \ΌΧ - C/2||2 = 2.

EXERCISES 347

8.5.2[D] When r(A) < n — 2, show that there exist uncountably many distinct

phase factors for A.

8.5.3[D] Use Exercises 8.5.1 and 8.5.2 to show that for A e C n x n , the number

of distinct phase factors for A can take the values {1,2} for n = 2 and {1,2, oo} for

n > 3. In the latter case, oo is uncountable.

8.5.4[D] When 1 < r(A) < n, show that the matrix T defined in Proposition

8.5.1 satisfies max ||T|| 2 = 2.

Coupling

8.6.1[C] Let 0 < λ < λ' be two distinct eigenvalues for A Hermitian positive

semi-definite, with associated normalised eigenvectors x, x' which span the invariant

subspace M = lin (χ,χ'). Let P = QQ* be the orthogonal projection on M,

Q*Q = J 2 . Let B = Q*AQ = &(Q) represent the 2 x 2 Rayleigh quotient of A on

M; see p. 34 in Chapter 1, Section 1.11.1.

(1) Show that B is Hermitian positive definite with sp(.B) = {λ, λ'}.

b

(2) Set B = ( ? J. Show that tr B = a + c = λ + λ' > 0 and det B =

λλ' = ac- \b\2 > 0.

(3) Prove that \b\2 >0<=>\<{a,c}< X'.

(4) Suppose that λ < a < X = ^ - < c < λ'. Show that 0 < \b\ < ^f^.

(5) Show that sgn6 = τ|τ, b Φ 0, is not specified by the triple {α, λ, λ'}.

over C.

(1) Show that \\At\\ = vXV = g.

(2) Deduce that a = (At,t) = \\At\\ cosφ = \\A^H\\ = h, c = ^ ± ^ and

\b\ = VXX/sm(j).

(3) Explain why q<i is necessarily of the form zx + z'x' with \z\2 — j£y and

U'|2 _ A;

λ+λ'-

(4) Confirm the value c = Χχ+χ, = (Au,u) with the choice q2 =u= J-χ-^χτΧ-

A

'■χτχ'. Conclude that B can take the remarkable form B = vXVC with

/ A+A

cos φ — sin c

C

=\ -sin0 i(xJT + x>J»\ )-CheckthatdetC = l.

Ä+Ä7 1 'XV A7

APPENDIX A

Solution to Exercises

CHAPTER 1

1.1.1 The columns of (V~l)* are the adjoint basis of the columns of V when

V is non-singular. We remark that when V is not a square matrix, then the

adjoint basis either does not exist or else is not unique. For example, when

Λ 0\

V= 01

VI 0 /

than these exist at least two adjoint bases, namely

1 0\

and

be characterized by

p(A*A)=\\A\\22.

1.1.7 If Q*ß = J, then

I I Ö I I 2 = l i e - 1 h=ii O * » 2 = I .

1.1.8 If A is a singular matrix, so is A*A. Hence zero is an eigenvalue of A*A

and therefore a singular value of A. Let n ^ r and i 4 e C x r; if one column of A

depends linearly on the other columns, then zero is a singular value of A.

1.2.1 Let

352 APPENDIX A

Thus

t ^ 2r - n > 0.

n r

Let Qe<C * be an orthonormal basis of M whose first t columns form a basis

of Mr\N. Let Ue<£nXr be an orthonormal basis of N whose first t columns are

those of Q. Then

* \o w)

where W is of order r — t.

This shows that U*Q has at least t singular values which are equal to unity;

hence there are at most r — t non-zero canonical angles between M and N.

However,

n

r — t^n — r < - ,

2

that is

Put

Q = (V,Q') and l/ = (F,£/'),

where

lin(K) = M n N .

Let

M' = lin(Q') and N' = lin(l/');

then

M'nN' = {0},

M n N + M' + i V ^ M + N',

and the non-zero canonical angles between M and N are the same as those

between M' and N'.

1.2.2 Let QeC n x m be an orthonormal basis of M and let Ue<Cnxm be an

orthonormal basis of N. Let 0, be the greatest canonical angle between M and

N, and put cx = cosöj. Then

öi = i^^Ci = 0 o l / * Q is singular

o 3 w 6 C m such that u Φ 0 and t/*ßw = 0

o3xe<C n such that χφθ and x e M n N 1 .

SOLUTION TO EXERCISES 353

1.2.3 T is a basis of N if and only if the matrix Y*X is regular. However,

according to Proposition 1.2.2, y * Z ~ c o s 0 is invertible since Θί < π / 2 . Now

one computes:

(T* -X*)(T- X) = I -(X*Y)(Y*X).

If a, is a singular value of T — X,

a? = 1 - cos 2 0f = sin2 0f.

1.2.6 Let X = (Xu X2) and Y = (Yu Y2) be orthonormal bases of <Cn such that

Xx is a basis of M and 7X is a basis of N:

Jf1eCXr> y^C^, * * * χ = yjYj = / ,

where r < w/2. Let

x n rx r

There exist a unitary matrix Zx e C and a unitary matrix Vx e<C such that

C = Z*^11K1=diag(c1,...,cr)

is the singular value decomposition of Wi t . Hence, by the definition of canonical

angles, we have

where fc ^ r. We define

C' = diag(c 1 ,...,c k ),

and therefore

= V*W*1WUV1+(W21V1)*(W21V1)

2

= c + (w*ivlnw2lv1)

354 APPENDIX A

and so

(^ 2 1 K 1 )*(^ 2 1 F 1 ) = diag(si,...,sk2;0,...,0)

where

5.^0 and sf + cf = l (i=l,...,fc).

{n r) x (n r)

Let Z2<E ~ ~ be a unitary matrix whose first k columns are those of W2\ V\

when normalized. Then

where

Z^W

Ml)·

S = diag(s1,...,sfc;0,...,0)e<Cr>

Let S' = diag(s lv ..,s k ). Then S' is regular and

We therefore have

(C\

l s

Vo zj \w2J

In an analogous manner we determine a unitary matrix K2e(C<n r)x<

" r)

such

that

Z*W12V2 = (T,0),

where T is a diagonal matrix with non-positive elements such that

T2 + C2 = Ir.

Thus T= -S. Let

Then

(CO -S' 0 0 \

0 I,_k 0 0 0

Z*WV = S 0 X33 X34. X35

0 0 Λ

43 Λ44 Λ

45

0 0 -*53 -*54 ^55

\

The columns of the matrix are orthogonal; thus S'X34. = 0 and so X34. = 0. Also

SOLUTION TO EXERCISES

-C'S' + S'X33 = 0,

whence X33 = C. The matrix

Z3 = (X" X

4s\e<p«-r-k)xin-r-k)

\A54 Xss/

in unitary and

(c o -s' o \

0 /,_* 0 0

ZWV = S' 0 CO

\0 0 0 Z3

/7» ο o o \ /c o -S' 0

0 /,_» 0 0 0 /,_, 0 0

0 0 lk 0 S' 0 C O

\o o o z3y v0 0 0 /„

Put

\o z j \o zj

Then

(c -s o ^

Z*WV-- S CO

0 0 /„ - * /

Finally,

6 = XjZ, is an orthonormal basis of M,

Q = X2Z2 is an orthormal basis of M 1 ,

17=7!^! is an orthonormal basis of N,

U = Y2V2 is an orthonormal basis of N1,

and

(c -s o \

[ρρ]*[ΐ/ι/] = S C O

0 0 /„_ 2'J

Since

Q*U = C

356 APPENDIX A

and

subspaces of common dimension greater than n/2, is suffices to compute the

canonical angles between their orthogonal complements.

1.6.5 Suppose that all the eigenvalues λί9...9λη of A are distinct. Put

D = diag(A1,...,/lII).

Let Q be a unitary matrix such that

Q*AQ = D + Nl

is a Schur form of A, Let u = (tiy) be a unitary matrix such that

(QU)*A(QU) = D + N2

is another Schur form of A. The matrices Nx = (n\V) and N2 = (nff) are strictly

upper triangular matrices.

We have that

whence

j - l n

k=l k=i+l

where

0 n

Σ - Σ =o.

fc=l k=n+l

Put j = 1 and ί = n; we find that unl = 0. Suppose that when fc>2 we have

ie{/c,fc+ Ι , . , . , η - l,n}=>wtti = 0;

we deduce that Μ^-!! = 0 . This shows that

wMi = 0 when i = 2,3,...,n.

Now suppose that when j = 2,3,...,/ and k>j + 1 we have

ie{/c,/e+ l,...,w— l,n}=>u o = 0.

It then follows that

Uij = 0 when j = 1,2,...,n— 1 and / = . / + Ι,.,.,η.

We leave to the reader the task of verifying that in the presence of repeated

eigenvalues, the diagonal of U will contain blocks.

SOLUTION TO EXERCISES 357

fl ifi + j = n + l ,

Pi / = 1

[0 otherwise.

Then

p - i = P = p*

and

J* = P* JP.

1.6.19 Let X be a basis of eigenvectors of A and let Q be a Schur basis:

A = XDX~\

Q*AQ = D + N\

where D is the diagonal of eigenvectors and N is a strictly upper triangular

matrix. Then

\\N\\2¥=\\A\\2F-\\D\\2F

and

Hence

, , ^ Λ JIAMIFV 7 2

cond2{x

H^Ü¥J ■

On the other hand,

\\A*A\\F^\\X-1\\l\\D*X*XD\\

2 II ^ Λ Λ

** IIF

1 II 2 || v II 2 |

= \\Χ-'\\\\\Χ\\\\\ΌΌ*\\Υ.

However,

||DD*|| F =||D*D|| F =||D 2 || F ^||>1 2 || F ;

hence

2

cond2(AT)

' Λ . ^^

M 2 HF

Moreover,

M M - ~ ^ * | | F = 2(MM|| F -M 2 || F ),

whence

cond^(X)^l+-^^r.

2

2V

' 2O IIII JA2 | |||2F

358 APPENDIX A

Q*AQ = R = R = D + N,

where D is a diagonal matrix and N is a strict upper triangular matrix. Define

r = K*K-KK* = (y0).

By induction on the size of A it can be proved that

j=2

Since

we conclude that

Ι Ι ^ Ι Ι ρ ^ - γ - ) Ί ΐ + - - ^ - ) ; 2 2 + - + — -7«n.

V

12 Ä

However,

/4*/ί - AA* = βΓβ*.

Hence

and so

s= max \λί-λ·.\

Let

„ II^IIF . vM) 5

fl = , 0 = —— , C-

\M\F j2\\A\\l Jl\\D\

SOLUTION TO EXERCISES 359

b - 1 + a2 ^ Jlca^Jl-a2.

If b — 1 + a2 <0 then b2/3 ^b<l —a2, and we obtain the inequality required.

If b - 1 + a2 ^ 0, then

(1 + 2c2)a4 - 2(1 - b + c2)a2 + (fc2 - 2b + 1)^ 0.

Hence

1 + 2c '

However,

<,♦»-*,»-.(,-£♦»)«(,-*♦')

hence

i>2

α2«ζ1

l+2c2

Since c2 < 1, we have a2 ^ 1 — ^b2, that is

-^<ii"ii2.

6MII 2

1.8.1 Let XeC""" and suppose that X is of rank r<m. Hence there exists a

permutation matrix Π such that the singular value decomposition of ATI can

be written

v*xnu ■e :>

where V is a unitary matrix of order n, U is a unitary matrix of order m and Σ

is a non-singular diagonal matrix of order r. W e may write

Vt/21 i / 2 J

where l / u is of order r, and hence

ATI

= "(7' ο>

Let

360 APPENDIX A

where

e

=i ":) and R =.

'An

0

0

0

Q being unitary and K an upper triangular matrix.

1.9.1 Let C = A + By where /I and 5 are Hermitian and B is positive semi-

definite. For all ue<En such that ||u\\2 = 1, we have u*Bu^0 and

u*Cu = uMw + u*Bu ^ WMM.

On taking the maximum on the left subject to ||u|| 2 = 1, we obtain

p(C)^u*Au.

We conclude that

p(Q>p(A).

arranged in decreasing order. Thus

Ai(>l)= min max u*Au (j = l,...,i — 1)

fi,...,t>i- i l|u||2= 1

u*t>j = 0

and

u*Au = M*£M -I- u*Cu.

Now

X1(C)= max M*CM,

l|u||2=l

I|u|| 2 = l l|u||2=l

u*Bu + Xn(C) ^ uMu ^ u*Bu + AX(C).

Now take the maximum subject to

IM|2=1 and ΐ4*ι;,. = 0 (;= l,...,i-1)

and the minimum extended over 3llvl9...,vi^l;itis found that

λη(0 + Xt(B) ^ kt\A) < km + Ai(C).

SOLUTION TO EXERCISES 361

Hence

\λί(Α)-λί(Β)\^\\€\\2.

If C is positive semi-definite, then Art(C) > 0, and so λ{(Α) ^ λ^Β).

such that

Q*AQ = D + N.

Then

N*N = NN*;

if N = (iiy), then

ΐ<;=>η ι 7 = 0

and

π η

k=l k=l

that nkj = 0 for all k when i = j ^ l\ we than conclude that nkll + x = 0.

Hence N = 0 and the eigenvectors of A form a unitary matrix. This implies

that all spectral projections are orthogonal and that A is diagonalisable.

Nevertheless, a normal non-Hermitian matrix can have complex eigenvalues.

For example, a diagonal matrix containing at least one non-zero complex

element is normal.

1.9.4 According to Exercise 1.9.3 we have

Q*AQ = D and ß*/i*Q = D*.

Hence Q is also a matrix of eigenvectors of A*. Thus

A*AQ = A*QD = QD*D,

whence

A* A = 6(2>*D)ß*,

p(A*A) = p(D*D)9

\\A\\2 = p(A).

start with the characterization

u*Au

λΛΑ)= max min .

dimS=j ueS U*U

362 APPENDIX A

Then

. u*Au

= max mm

dimS = n-j+l ueS U*U

. U*Ali

= max mm

dimS1=j-l «IS1 U*U

. u*Au

= max min .

dimS = j - l ulS U*U

p(T)^\\T\\2 for all Γe<C,,x',

we conclude that

C= A-B+\\A-B\\21

is positive semi-definite. Let

Λ' = 4 + Μ - Β | | 2 / .

Then

A =£ +C

and we apply Exercise 1.9.2 in order to deduce that the eigenvalues of A and

of B may be numbered in such a way that

λΜ^λ^Α).

Since

ki(A) = ki(A)+\\A-B\\2,

we have that

λΑΒ)^λΛΑ)+\\Α-Β\\2.

CHAPTER 2

2.2.3 First we state the following facts: If

J = (xM)

is a square matrix such that

U ifa = )3,

x*ß= \ 1 if P = a + 1,

[0 otherwise,

SOLUTION TO EXERCISES 363

then:

(a) J is non-singular, if and only if λ Φ 0;

(b) if λ Φ 0 and J " l = (^), then

0 ifa>0,

if a ^j?.

A'"«"1

(K ) = (Xiiti,...9X^d)i

Then

Ρ^ = XjX+j

is the spectral projection of A associated with the eigenvalue Xj. If zesp(A\ then

Pj is also the spectral projection of (A — zl)~l associated with the eigenvalue

(Aj — z)~i. We conclude that

where

Ö;W=

VΣ (-ö/

lj being the least integer such that

Finally, let Tk be a Jordan curve which isolates Xk from the rest of the spectrum

of A. Then

ifyVfc,

Hence

I)rAh-z)

dz

—=0 0 = 1,2,...,<*;«>()).

-ΛΓ μ - ζ / ) - 1 α ζ = ρλ = ^ χ ^ .

2πι Jrk

364 APPENDIX A

with partial pivoting, then we obtain a system of the form

GK>

where T is an upper triangular matrix of order n. The matrix T is obtained by

premultiplying in turn by permutation matrices and Gaussian elementary

matrices. If, instead of the latter, we use Householder matrices (Exercise 1.8.5),

then we obtain the Schmidt factorization.

The final structure is of the same form.

2.3.2 It is trivial that diagonalisable matrices with the same spectrum are

similar. Defective matrices are similar if and only if they possess the same

spectral structure (eigenvalues with the same algebraic and geometric multi

plicities and the some indices); that is if and only if they possess the same

Jordan form.

2.3.3 We show that λ is not an eigenvalue of (/ — Π) A: if λ were an eigenvalue

of (/ — Tl)A, we should have

(/ - U)Au = Xu

for some u Φ 0. If λ φ 0, then

0 = Π(/ - U)Au = /UIw=>nu = 0,

whence

u = (I - Π)Μ Φ0

and

(l-U)A(l-Ti)u = ku\

that is λ is an eigenvalue of (/ — U)A(I — IT), which is impossible because λ is

not an eigenvalue of B. We deduce that the unique solution is

ζ = Σί>.

2.3.6 We refer to the identities

(/-Π)(/-Π1) = / - Π ,

(/-Π1)(/-Π) = / - Π 1

Σ(Π 1 ) = (/~Π 1 )Σ(Π),

SOLUTION TO EXERCISES 365

Ι!Σ(Π^)|| 2 ^ ||Σ(Π)|| 2 ,

ΙΙΣίΠ^ϊρ^ΙΙΣίΐυΐΙρ.

(/ - P)AX ~XB = R

3

for a given matrix RelR"* . The 2 x 2 matrix B is the Rayleigh compression

corresponding to the spectral projection P associated with a double non-zero

eigenvalue λ. Let V = (Vt V2) be the Jordan basis of B. The Jordan form of B is

change of unknowns:

Y = (YlY2) = XV9

(1-P)AY-YJ = RV.

An easy computation yields

Yi=SRVu

Y2 = SRV2 + ocSYl

= SRV2 + ocS2RVl.

Hence the reduced resolvent S and the block-reduced resolvent S satisfy the

relation

SR = X = YV~x = SR + OLS2R(0 VX)V~X.

A'(xk -xk-x) = b- Axk_loAf(xk - x') = (Α' - A)xk-V

l l

Hence xk converges to x = A~ b if and only if p[A'~ (A' — A)"] < 1, in exact

arithmetic. Why can the computation of b — Axk present a problem in finite

precision arithmetic?

Verify that, in arithmetic with three decimal places, the solution of

0.986 0.579\/u\/0.235\

0.409 0 . 2 3 7 / W ~ MU07/

366 APPENDIX A

/ 2.11 \ / : N

1.99 \ / 2.00

V-3.17 A - 2 . 9 9 / V --3.00>

:

The exact solution is (.3).

2.10.1 In accordance with hypothesis (H2), there exists rle(0,r) such that

\\x-x^\\<rl^\\F(x)-F(x*)\\<\\F(x*)-1\\~\

whence F'(x) is non-singular for each x such that || x — x* || < r x . Now the map

xh-+F'(x) -1 is continuous in the neighbourhood of x*. Hence there exists

r 2 e(0, rt) and μ > 0 such that

||χ-χ*||<Γ2=>Γ(χΓ1||<μ.

Finally, there exists pe(0,r 2 ) such that

2μ

Define

xk = x* + t(xk - x*) (0 ^ t < 1).

Then

Jo

Jo

Suppose that || xk - x* || < p (which is true when k = 0). Hence

0

and

llx k+ i-x*ll<illx*-x*l·

On the one hand, this shows that x k + 1 satisfies

||xk+i-x*||<P

and, on the other hand, that

lim xk = x*.

||xk+1-x*KHx*-x*ll sup \\F(xk(t))-F(xp)

0<i< 1

lim sup ||F'(x k (i))-F(x k )||=0,

fc-QO 0 < ί < 1

SOLUTION TO EXERCISES 367

sup || F(xk(t)) - F'(xk) \\<l\\xk- x* ||',

Ο^ί^ 1

If the Jacobian matrix satisfies the Lipschiz condition, then p = 1 and the

convergence is quadratic.

2.11.2 W e d e n o t e / = | | J ' - 1 | | , P = l | / i ^ H , i = I I ^ H , w = | | y * | | , s = | | y M ( / -

X' y*) ||, v, = J ' " l HX\ || Vx || ^ γ'ρ = πχ by definition. We suppose that || Vk \\ ^

nk, and set

β! = H J ' " 1 » ||. ε2 = \\Υ*ΗΧ'\\, η = ε1^γ'ε2 and e = yt2sp.

Then

|| Vk+11| ^ nx 4- exnk 4- )>'ε2πΛ 4- /STT*

= π! -f fo -f /ε 2 )π* + /$π* = π*+ x (say).

Set nk = π χ (1 4- xk) for k ^ 1. Forfc= 1, Xj = 0; forfc= 2,

π 2 = π ^ Ι 4- η 4- v's^) = π ^ Ι 4- ?y 4- ε) = π ^ Ι 4- x 2 )

and

π* +1 = ^ [ 1 + ^(1 4- xk) 4- ε(1 4- x k ) 2 ]

= π 1 [1 - x k + 1 ]

which defines the recurrence relation Χχ=0, x k + 1 — η 4-ε4-(*7 4- 2ε)χΙς + εχ£,

fc^l.

The limit x satisfies x = /(x) = εχ2 4- (η 4- 2ε)χ 4- ff 4- ε; χ = /(x) has two real

roots if 2 Ν /ε + ^ < 1 . (One can verify that lyfz + η < 1 implies that the

discriminant is positive.) Let x* denote the smallest root. When fc-+oo, xk

converges monotonically from xx = 0 towards x*, and nk converges to π χ (1 -f x*).

Let

G'\V*-*Vx+i'-llHV-V(Y*HX')+V(Y*AVy\.

We determine a sufficient condition under which G' is a contractive map in the

closed ball:

®= {ν;\\ν\\^(\+χ*)π1};

G\V) - G'(K') = J ' " * [tf(K - V) -(V- V)Y*HX'

+ (κ- V)Y*AV+ VY*A(V- vy\

It is easy to check that if η 4- 2ε < \ (which implies */ 4- 2^/ε < 1) then x* < 1

and the Lipschitz constant k for G' in & satisfies k = ^ 4- 2ε(1 4- x*) < 1. Therefore,

these exists a unique fixed point V= X - X' in ^ , with y*K= 0, such that

||F|| = | | X - X ' H < 2 | | F 1 | | = 2 | | J ' - 1 / / X ' H .

368 APPENDIX A

Now

B - Β' = Y*AX - Y*A'X' = Y*A(X - X') + Y*(A - A')X'

implies

\\B-B'\\^s\\X-X'\\+u\\HX'\\.

The condition */ + 2ε < | is rewritten as

y'||H||+/||H||iu + 2y'25||//||i<i,

that is

/||H||[l+iii + 2/si]<i.

If we choose the Euclidean norm | | | | 2 and the bases Y = X' = Q to be

orthonormal, then t = u = 1, and the sufficient condition becomes

|| 51| = s, where 5 = Y*A - BY* = y M ( / - (7 y*) is the left residual matrix. We

define K1 = - J " 1 Ä and G I K I — ^ + J - 1 [K(KMK)+ K ( B - £ ) ] . Clearly

/ B |S*

(2.11.4) is a modification of (2.11.1) where the block decomposition I — —

(B \S*\

has been modified into I —h— I. \\Vl\\ ^γρ = πχ and we suppose that || Vk || ^

nk. Then || Vk + x || < π t + f5π^ + ?πΛ<5 = πΛ + x. Upon setting nk = n1(\ + xfc),

ε = y2sp and <5' = f<5 we obtain x = H m ^ ^ xh which satisfies x = /(x) = εχ2 +

(2ε + <5')χ + ε. This equation has two real roots if <5' + 4 ε < 1 . Under this

condition, xk converges monotonically from xx = 0 towards x*, the smallest

root. We consider now G on the ball

« = {K;||HK(l+x*)Ä1};

G(V)- G(V') = yil(V- V')Y*AV + ΥΎ*(¥- V') + (V- V')(B - B)]

= | | { Κ Ι | Κ - Κ ' | | [ 2 ε ( 1 + χ * ) + 5'].

We leave it to the reader to check that the condition δ' + 4ε < 1 implies x* < 1;

hence

& = 2ε(1+χ*) + <5'<4ε + <5'<1.

This condition is written as 4y2sp + γδ < 1. One should remark that, without

perturbation, (2.11.1) converges if 4y2sp < 1, whereas, with the perturbation,

B — B, (2.11.4) converges if

4fsp + yo< 1, where(5= \\B-B\\.

We conclude that the unique fixed point V = X — U satisfies

||K|| = | | X - C / | | < 2 | | J - 1 J R | | .

369

SOLUTION TO EXERCISES

Jo

where

χΛ(ί) = x* + i(xk - x*) (0 < t < 1).

Hence, if || xfc — x* || < p, we have

\\xu+i-x*\\*iy\\T-lnxk-x*\\,

where

0<7||T-1||<1.

2.11.6 Let xeQ = {xeB: \\ x — x0 || ^ p}. Define the operators

G(x) = x-F'(x0rlF(x),

L(x) = F(x) - F(x0) - F'(x0)(x - x0).

It is easy to prove the following inequalities:

|| L(x) - L(x0) K lp II x ~ x0 II < P2,

\\G(x)-x0\\^mlp2 + c = p,

\\G'(x)\\=mlp = y,

\\G(x)-G(y)\\^y\\x-y\\,

provided that x, yeQ. The sequel is left to the reader, who must apply the fixed

point theorem to the operator G.

2.12.2 Consider the system

Ax = b (1)

and a non-singular matrix B such that

cond2 (BA)« cond204).

Hence the equivalent system

BAx = Bb

is better conditioned: the matrix B is a preconditioner for the solution of

equation (1).

If p(I — RA)« 1, where R is an approximate inverse operator (Exercise 2.6.1),

then it can be shown that

cond (RA)« cond (A),

and we may choose B = R. Therefore, B appears as an approximate inverse of

A. The vector

x0 = Bb

370 APPENDIX A

Ax = b,

where A corresponds to a discretization of a linear operator in infinite dimension.

We associate with A a step that characterizes the discretization. The order of

A is a decreasing function n(h) of h. Let b! be a coarser step:

h'» ft,

and let

Ax' = V

be the associated system of order

n(h')«n(h).

Put

N = n(/i), m = n(W).

Suppose there exist matrices

re<CmxN and pe€Nxm

such that

rp = Im and Λ' = rAp

and that /Γ is non-singular. Then we may take

R^pA'-'rK"

as an approximate inverse, where K is an operator such that

p(I-RA)«\

provided that v is sufficiently great. Often K is chosen to be the interaction

matrix of Jacobi, Gauss-Seidel or relaxation (see [B:9,27]).

CHAPTER 3

3.3.1 If J = V ~ XA V, then Jk=V~lAkV, and we deduce that

A J

e =Ve V-K

The result follows from the identity

e J(i+h) - e Ji = (eJ* - /)e J i = hJeJt [/ + 0(h)l

The computation of the elements of e Ji follows from Exercise 3.1.6.

SOLUTION TO EXERCISES

ΐν = Β1,2υΒ-1ΐ2,

W = RBXAXTRTB,

U = ΧΑΧτΒ,

B = RBRB,

whence we obtain

W= (Bll2RBl)W{Bll2RBx)-\

The eigenvectors satisfy

v ^i

3.5.6 The derivatives of w are

u'(i) = λελίφ and w"(i) = λ2ελίφ.

Hence

(A2M + Aß + K)</> = 0 if Mu" + £u' + KM = 0.

3.7.3 The result follows directly:

Τπηφ = Τπ,,ίΤφ,,) = Τ(πηΤπη)φη

= Τ(ληφη) = ληΤφη = ληφη.

CHAPTER 4

4.1.1 Let u be such that || u ||2 = 1 and

\\A~lu\\2 = max \\A'lx\\2 = IM' 1 1| 2 .

IWl2 = i

Let

t> = —— Λ *u and Δ/4 = —uv*

\\A~l\ 12 II Λ

112

(A + AA)v = 0.

Hence A + AA is singular. On the other hand,

4.2.1 Let (ß, Q) be an orthonormal basis of C , where Q is a basis of M. Let

372 APPENDIX A

where

sp(B) = {A}.

and

B = Q*AQy

where

sp(B) = sp(>l)\U}.

Let

δ= min \μ — λ\.

ß€sp(B)

Hence

( Γ 1 ^ max -^ρΚΒ-λΙΓ^^ΗΒ-λΙΓ1^.

με*ρ{Β)\μ — λ\

Let

Hence

ΙΙΣ^ΙΙ^ΙΙΒ-λ/ΓΜΐί-

However,

Therefore

whence

||ΣΧ ||2 = | | ( B - A / ) - ' ||2.

Let

Aesp(ß) suchthat δ = \λ — λ\,

and let

J = V'lBV

be the Jordan form of B. Hence

(B-Xl)~i = V(J-XI)-1V-1,

and so

\\(B-U)-lh*cotid2(V)Ml-U)-

Let / be the index of λ and let ./(A) be the corresponding / x / Jordan block.

Hence

SOLUTION TO EXERCISES 373

liy-A/)- 1 |l2 = max||(J lV -/)- 1 |l2

Jij

< max | | ( J y - * / ) - % ,

where the Ju are the different Jordan blocks of J. For sufficiently small δ the last

maximum is attained by J 0 = J(A), whence we have the result that

^ - 1 ^ | | ( B - A / ) - 1 | | 2 = IIS1||2<2cond2(K)o-z.

4.2.2 The function

Ρ(δ)=

~2πϊ' (Me)-ziyldz

nJr

is analytic for

\e\<minplR(z)Hy\

where

R(z) = (A-ziy\

and Γ is a Jordan curve isolating λ. Hence

lim||P(e)-P|| 2 = 0

and

χ(ε) = Ρ(ε)φ

can be normalized as follows:

Φ(ε) = [0^(ε)]-^(β),

where

Α*φ* = φ>θ*, ΦΙΦ = ΙΜ.

In fact, for sufficiently small ε, we have φ*χ(ε) Φ 0, because

lim|0*x(e)-l| = O.

Hence

θ(ε) = φΙΑφ(ε)

and we can prove that

döl

= ΦΙΗφ.

*

da ε = 0

374

APPENDIX A

whence

\\θ(ε)-φ\\2*!ί\\φ*\\2\ε\ + 0(ε2).

This proves (b).

The inequality (9) is proved as follows. Since

QNQ* = Vny-it

it follows that

|| Nk || 2 ^ cond 2 V (for all k ^ 0).

The inequality (c) is a consequence of the indentity

lMßiIm -ΘΤ1 IMßVm - β(β)] = / . - [Α(ε)/„ - θ] -1 [0(c) - 0]

because A(e)/m — θ(ε) is singular. The inequality (d) is a consequence of

l^°nd*(K)ll*(£)-^maxil,- l

\λ(ε)-λ\ l \λ(ε)-λ\'

(see[B:l,25]).

4.2.3 We calculate

B = A~lAA = (1 l

\0 0

The departures from normality are

v(A) = || A*A - AA* ||F = 10*72(1 + 108) > / 2 x 108,

v(B) = \\B*B-BB*\\F = 2.

The bases of eigenvectors are

X(A) = ( 1 l

A ) and X(B) = (

' \0 10-*/ \0 0

respectively, whence

Hence

cond 2 [ΛΤμ)] > 2 x 104,

cond 2 [X(ß)] = 3 + v 5

< 2.62.

SOLUTION TO EXERCISES 375

(£, || · || E) and suppose that / takes values in a normed vector space (F, || · ||F). We

recall that / satisfies the Lipschitz condition if there exists a number κ ^ 0 such

that

ll/(x)-/(y)llF<*ll*-yllE for all x,yeCl

and that / is Holder-continuous of order pf if there exists a number κ ^ 0 and an

integer p > 1 such that

Wm-fiyUF^KWx-yVE for all x,yeQ.

If in the above definitions we take Ω to be a neighbourhood of 0E and if we

fix x = 0 E , then we obtain the corresponding notions relative to a scalar. By

virtue of the bounds established in Exercise 4.2.2, part (d), the function e»-*A(6),

defined in a neighbourhood of ε — 0, satisfies the Lipschitz condition at 0 when

the eigenvalue is semi-simple for (/ = 1), and it is Holder-continuous at 0 when

the eigenvalue is defective with index /, the order p of the continuity being equal

to 1//.

4.2.7 The quantity ε in Exercise 4.2.2 measures the absolute error of Α(ε) in

relation to the approximation of A. Hence the relative error of Α(ε) is given by

ε» =

1ΙΛΙΙ2

On the other hand, for each eigenvalue λ φ 0 of a non-singular matrix we have

1

o< — <\\Α~1\\2.

\λ\

Hence when λ is semi-simple (/ = 1 and V= Im) we have

m M

~ ^ cond 2 (A) || P || 2εκ + 0(ε 2 ).

|Λ|

4.2.11 Consider the block of eigenvalues

σ={λ,μ}.

An upper triangular matrix of the form

## Molto più che documenti.

Scopri tutto ciò che Scribd ha da offrire, inclusi libri e audiolibri dei maggiori editori.

Annulla in qualsiasi momento.