
Numerical Linear Algebra

and
Applications

Biswa Nath Datta


Department of Mathematical Sciences

Northern Illinois University

DeKalb, IL 60115

e-mail: dattab@math.niu.edu
The book is dedicated to my parents, father-in-law and mother-in-law,
whose endless blessings have made the writing of this book possible.
PREFACE
Numerical Linear Algebra is no longer just a subtopic of Numerical Analysis; it has grown
into an independent topic for research over the past few years. Because of its crucial role in
scientific computing, which is a major component of modern applied and engineering research,
numerical linear algebra has become an integral component of undergraduate and graduate curricula
in mathematics and computer science, and is increasingly becoming so in other curricula as well,
especially in engineering.
The currently available books completely devoted to the subject of numerical linear algebra are
Introduction to Matrix Computations by G. W. Stewart, Matrix Computations by G.
H. Golub and Charles Van Loan, Fundamentals of Matrix Computations by David Watkins,
and Applied Numerical Linear Algebra by William Hager. These books, along with the
most celebrated book, The Algebraic Eigenvalue Problem by J. H. Wilkinson, are sources of
knowledge in the subject. I personally salute the books by Stewart and Golub and Van Loan because
I have learned "my numerical linear algebra" from them. Wilkinson's book is a major reference,
and the books by Stewart and Golub and Van Loan are considered mostly to be "graduate texts"
and reference books for researchers in scientific computing.
I have taught numerical linear algebra and numerical analysis at Northern Illinois University,
the University of Illinois, Pennsylvania State University, the University of California, San Diego,
and the State University of Campinas, Brazil. I have used with great success the books by Golub
and Van Loan and by Stewart in teaching courses at the graduate level.
As for introductory undergraduate numerical linear algebra courses, I, like many other instruc-
tors, have taught topics of numerical linear algebra from the popular "numerical analysis" books.
These texts typically treat numerical linear algebra merely as a subtopic, so I have found they do
not adequately cover all that needs to be taught in a numerical linear algebra course. In some under-
graduate books on numerical analysis, numerical linear algebra is barely touched upon. Therefore,
in frustration I have occasionally prescribed the books by Stewart and by Golub and Van Loan as
texts at the introductory level, although only selected portions of these books have been used in
the classroom, and frequently, supplementary class notes had to be provided. When I have used
these two books as "texts" in introductory courses, a major criticism (or compliment, in the view
of some) coming from students on these campuses has been that they are "too rich" and "too vast"
for students new to the subject.
As an instructor, I have always felt the need for a book that is geared toward the undergrad-
uate, and which can be used as an independent text for an undergraduate course in Numerical
Linear Algebra. In writing this book, I hope to fulfill such a need. The more recent books Fun-
damentals of Matrix Computations, by David Watkins, and Applied Numerical Linear
Algebra, by William Hager, address this need to some extent.
This book, Numerical Linear Algebra and Applications, is more elementary than most
existing books on the subject. It is an outgrowth of the lecture notes I have compiled over the years
for use in undergraduate courses in numerical linear algebra, and which have been "class-tested"
at Northern Illinois University and at the University of California, San Diego. I have deliberately
chosen only those topics which I consider essential to a study of numerical linear algebra. The
book is intended for use as a textbook at the undergraduate and beginning graduate levels in
mathematics, computer science and engineering. It can also serve as a reference book for scientists
and engineers. However, it is primarily written for use in a first course in numerical linear algebra,
and my hope is that it will bring numerical linear algebra to the undergraduate classroom.
Here the principal topics of numerical linear algebra, such as Linear Systems, the Matrix Eigen-
value Problem, Singular Value Decomposition, Least Squares Methods, etc., have been covered
at a basic level. The book focuses on development of the basic tools and concepts of numerical
linear algebra and their effective use in algorithm development. The algorithms are explained in
a "step-by-step" fashion. Wherever necessary, I have referred the reader to the exact locations of
advanced treatment of the more difficult concepts, relying primarily on the aforementioned books
by Stewart, Golub and Van Loan, and occasionally on that of Wilkinson.
I have also drawn heavily on applications from different areas of science and engineering, such
as: electrical, mechanical and chemical engineering; physics and chemistry; statistics; control theory
and signal and image processing. At the beginning of each chapter, some illustrative case studies
from applications of practical interest are provided to motivate the student. The algorithms are
then outlined, followed by implementational details. MATLAB codes are provided in the appendix
for some selected algorithms. A MATLAB toolkit, called MATCOM, implementing the major
algorithms in Chapters 4 through 8 of the book, is included with the book.
I will consider myself successful and my efforts rewarded if the students taking a first course in
numerical linear algebra and applications, using this book as a text, develop a firm grasp of the
basic concepts of round-off errors, stability, condition and accuracy, and leave with a knowledge
and appreciation of the core numerical linear algebra algorithms, their basic properties and
implementations. I truly believe that the book will serve as the right text for most of the existing
undergraduate and first year graduate courses in numerical linear algebra. Furthermore, it will
provide enough incentives for the educators to introduce numerical linear algebra courses in their
curricula, if such courses are not in existence already. Prerequisites are a first course in linear
algebra and good knowledge of scientific programming.
Following is a suggested format for instruction using Numerical Linear Algebra and Ap-
plications as a text. These guidelines have been drawn on the basis of my own teaching and of
discussions with several other colleagues.
1. A First Course in Numerical Linear Algebra (Undergraduate { one semester)

Chapter 1: 1.4, 1.6, 1.7, 1.8


Chapter 2
Chapter 3: (except possibly Section 3.8)
Chapter 4
Chapter 5: 5.1{5.4, 5.5.1, 5.6
Chapter 6: 6.2, (some selected topics from Section 6.3), 6.4, 6.5.1, 6.5.3, 6.5.4, 6.6, 6.7, 6.9,
6.10.1{6.10.4
Chapter 7: 7.2, 7.3, 7.4, 7.5, 7.6, 7.8.1, 7.8.2
Chapter 8: 8.2, (some selected topics of Section 8.3), 8.4, 8.5.1, 8.5.3, 8.6.1, 8.6.2, 8.7, 8.9.1,
8.9.2, 8.9.3, 8.9.4

Possibly also some very selected portions of Chapter 9 and Chapter 10, depending upon the
availability of time and the interests of the students and instructors
2. A Second Course in Numerical Linear Algebra (Advanced Undergraduate, First Year
Graduate { one semester)
Chapter 1: 1.3.5, 1.3.6
Chapter 5: 5.5, 5.6, 5.7, 5.8
Chapter 6: 6.3, 6.5.2, 6.5.5, 6.8, 6.10.5, 6.10.6
Chapter 7: 7.7, 7.8, 7.9, 7.10, 7.11
Chapter 8: 8.3, 8.8, 8.9, 8.10, 8.11, 8.12
Chapter 9: 9.2, 9.3, 9.4, 9.5, 9.6.1, 9.8, 9.9, 9.10
Chapter 10
Chapter 11

3. A Course on Numerical Linear Algebra for Engineers (Graduate { one semester)

Chapter 1
Chapter 2
Chapter 3 (except possibly Section 3.8)
Chapter 4
Chapter 5: 5.1,5.2, 5.3, 5.4
Chapter 6: 6.2, 6.3, 6.4, 6.5.1, 6.5.3, 6.6.3 (only the statement and implication of Theorem
6.6.3), 6.7.1, 6.7.2, 6.7.3, 6.7.8, 6.8, 6.9, 6.10.1, 6.10.2, 6.10.3, 6.10.4, 6.10.5
Chapter 7: 7.3, 7.5, 7.8.1, 7.8.2
Chapter 8: 8.2, 8.3, 8.4, 8.5, 8.6.1, 8.6.2, 8.7.1, 8.9.1, 8.9.2, 8.9.3, 8.9.4, 8.9.6, 8.12
Chapter 9
Chapter 10: 10.2, 10.3, 10.4, 10.5, 10.6.1, 10.6.3, 10.6.4, 10.8.1, 10.9.1, 10.9.2
CHAPTER-WISE BREAKDOWN
Chapter 1, Some Required Concepts from Core Linear Algebra, describes some important results
from theoretical linear algebra. Of special importance here are vector and matrix norms, special
matrices and convergence of the sequence of matrix powers, etc., which are essential to the under-
standing of numerical linear algebra, and which are not usually covered in an introductory linear
algebra course.
Chapter 2 is on Floating Point Numbers and Errors in Computations. Here the concepts of
floating point number systems and rounding errors have been introduced, and it has been shown
through examples how round-off errors due to cancellation and recursive computations can "pop
up", even in simple calculations, and how these errors can be reduced in certain cases. The IEEE
floating point standard has been discussed.
Chapter 3 deals with Stability of Algorithms and Conditioning in Problems. The basic concepts
of conditioning and stability, including strong and weak stability, have been introduced and exam-
ples have been given on unstable and stable algorithms and ill-conditioned and well-conditioned
problems. It has been my experience, as an instructor, that many students, even after taking a
few courses on numerical analysis, do not clearly understand that "conditioning" is a property of
the problem, stability is a property of the algorithm, and both have effects on the accuracy of the
solution. Attempts have been made to make this as clear as possible.
It is important to understand the distinction between a "bad" algorithm and a numerically
effective algorithm and the fact that popular mathematical software is based only on numerically
effective algorithms. This is done in Chapter 4, Numerically Effective Algorithms and Mathematical
Software. The important properties such as efficiency, numerical stability, storage-economy, etc.,
that make an algorithm and the associated software "numerically effective" are explained with
examples. In addition, a brief statement regarding important matrix software such as LINPACK,
EISPACK, IMSL, MATLAB, NAG, LAPACK, etc., is given in this chapter.
Chapter 5 is on Some Useful Transformations in Numerical Linear Algebra and Their Appli-
cations. The transformations such as elementary transformations, Householder reflections, and
Givens rotations form the principal tools of most algorithms of numerical linear algebra. These
important tools are introduced in this chapter and it is shown how they are applied to achieve the
important decompositions such as LU and QR, and reduction to Hessenberg forms. This chapter
is a sort of "preparatory" chapter for the rest of the topics treated in the book.
Chapter 6 deals with the most important topic of numerical linear algebra, Numerical Solutions
of Linear Systems. Discussed in this chapter are the direct methods, such as Gaussian elimination
with and without pivoting, QR factorization methods, the method based on Cholesky decomposition,
and methods that take advantage of the special structure of matrices; the standard iterative
methods, such as Jacobi, Gauss-Seidel, successive overrelaxation, and iterative refinement; and the
perturbation analysis of linear systems and the computation of determinants, inverses and leading
principal minors. Some motivating examples from applications areas are given
before the techniques are discussed.
The Least Squares Solutions to Linear Systems, discussed in Chapter 7, are so important in
applications that the techniques for finding them should be discussed as much as possible, even
in an introductory course in numerical linear algebra. There are users who still routinely use the
normal equations method for computing the least squares solution; the numerical difficulties
associated with this approach are described in some detail, and then a better method, based on the
QR decomposition, is discussed for the least squares problem. The most reliable general-purpose
method, based on the singular value decomposition, is mentioned in this chapter and treated in full
in Chapter 10. The QR methods for the rank-deficient least squares problem and for the
underdetermined problem, and iterative refinement procedures, are also discussed in this chapter.
Some discussion of perturbation analysis is also included.
Chapter 8 picks up another important topic, probably the second most important topic,
Numerical Matrix Eigenvalue Problems. There are users who still believe that eigenvalues should be
computed by finding the zeros of the characteristic polynomial. It is clearly explained why this is
not a good general rule. The standard and most widely used techniques for eigenvalue computations,
the QR iteration with and without shifts, are then discussed in some detail. The popular
techniques for eigenvector computations, such as the inverse power method and the Rayleigh
Quotient Iteration, are described, along with techniques for eigenvalue location. The most common
methods for the symmetric eigenvalue problem and the symmetric Lanczos method are described
very briefly at the end. Discussions of the stability of differential and difference equations, and
engineering applications to the vibration of structures and a stock market example from statistics,
are included, which will serve as motivating examples for the students.
Chapter 9 deals with The Generalized Eigenvalue Problem (GEP). The GEP arises in many
practical applications such as mechanical vibrations, design of structures, etc. In fact, in these
applications almost all eigenvalue problems are generalized eigenvalue problems, and most of them
are symmetric definite problems. We first present a generalized QR iteration for the pair (A, B),
commonly known as the QZ iteration, for the GEP. Then we discuss in detail techniques of
simultaneous diagonalization for generalized symmetric definite problems. Some applications of
simultaneous diagonalization techniques, such as decoupling of a system of second-order differential
equations, are described in some detail. Since several practical applications, e.g. the design of
large sparse structures, give rise to very large-scale generalized definite eigenvalue problems, a
brief discussion of Lanczos-based algorithms for such problems is also included. In addition, several
case studies from vibration and structural engineering are presented. A brief mention is made of
how to reduce a quadratic eigenvalue problem to a standard eigenvalue problem, or to a generalized
eigenvalue problem.
The Singular Value Decomposition (SVD) and singular values play important roles in a wide
variety of applications. In Chapter 10, we first show how the SVD can be used effectively to solve
computational linear algebra problems arising in applications, such as finding the structure of a
matrix (rank, nearness to rank-deficiency, orthonormal bases for the range and the null space of a
matrix, etc.), finding least squares solutions to linear systems, computing the pseudoinverse, etc.
We then describe the most widely used method for computing the SVD, the Golub-Kahan-Reinsch
method, and its modification by Chan. The chapter concludes with the description of a very recent
method, by Demmel and Kahan, for computing the smallest singular values of a bidiagonal matrix
with high accuracy. A real-life example on separating the fetal ECG from the maternal ECG is
provided in this chapter as a motivating example.
The stability (or instability) of an algorithm is usually established by means of backward
round-off error analysis, introduced and made popular by James Wilkinson. Working out the details of
round-off error analysis of an algorithm can be quite tedious, and presenting such analysis for
every algorithm is certainly beyond the scope of this book. At the same time, I feel that every
student of numerical linear algebra should have some familiarity with the way rounding analysis
of an algorithm is performed. We have given the readers A Taste of Round-off Error Analysis
in Chapter 11 of the book by presenting such analyses of two popular algorithms: solution of a
triangular system and Gaussian elimination for triangularization. For other algorithms, we just
present the results (without proof) in the appropriate places in the book, and refer the readers to
the classic text The Algebraic Eigenvalue Problem by James H. Wilkinson and occasionally
to the book of Golub and Van Loan for more details and proofs.
The appendix contains MATLAB codes for a selected number of basic algorithms. Students
will be able to use these codes as a template for writing codes for more advanced algorithms. A
MATLAB toolkit containing implementation of some of the most important algorithms has been
included in the book, as well. Students can use this toolkit to compare different algorithms for the
same problem with respect to efficiency, accuracy, and stability. Finally, some discussion of how to
write MATLAB programs will be included.
Some Basic Features of the Book, Numerical Linear Algebra and Applications
• Clear explanation of the basic concepts. The two most fundamental concepts of Numerical
Linear Algebra, namely, the conditioning of the problem and the stability of an algorithm
via backward round-off error analysis, are introduced at a very early stage of the book with
simple motivating examples.
Specific results on these concepts are then stated with respect to each algorithm and problem
in the appropriate places, and their influence on the accuracy of the computed results is
clearly demonstrated. The concepts of weak and strong stability, recently introduced by
James Bunch, will appear for the first time in this book.
Most undergraduate numerical analysis textbooks are somewhat vague in explaining these
concepts which, I believe, are fundamental to numerical linear algebra.
• Discussion of fundamental tools in a separate chapter. Elementary, Householder and
Givens matrices are the three most basic tools in numerical linear algebra. Most computationally
effective numerical linear algebra algorithms have been developed using these basic tools
as principal ingredients. A separate chapter (Chapter 5) has been devoted to the introduction
and discussion of these basic tools. It has been clearly demonstrated how a simple, but
very powerful, property of these matrices, namely, the ability of introducing zeros in specific
positions of a vector or of a matrix, can be exploited to develop algorithms for useful matrix
factorizations such as LU and QR and for reduction of a matrix to a simple form such as
Hessenberg.
In my experience as a teacher, I have seen that once students have been made familiar with
these basic tools and have learned some of their most immediate applications, the remainder
of the course goes very smoothly and quite fast.
Throughout the text, soon after describing a basic algorithm, it has been shown how the
algorithm can be made cost-effective and storage-efficient using the rich structures of these
matrices.
• Step-by-step explanation of the algorithms. The following approach has been adopted
in the book for describing an algorithm: the first few steps of the algorithm are described in
detail and in an elementary way, and then it is shown how the general kth step can be written
following the pattern of these first few steps. This is particularly helpful to the understanding
of an algorithm at an undergraduate level.
Before presenting an algorithm, the basic ideas, the underlying principles and a clear goal
of the algorithm have been discussed. This approach appeals to the student's creativity and
stimulates his interest. I have seen from my own experience that once the basic ideas, the
mechanics of the development and goals of the algorithm have been laid out for the student,
he may then be able to reproduce some of the well-known algorithms himself, even before
learning them in the class.
• Clear discussion of numerically effective algorithms and high-quality mathematical
software. Along with mathematical software, a clear and concise definition of a "numerically
effective" algorithm has been introduced in Chapter 3, and the important properties,
such as efficiency, numerical stability, storage-economy, etc., that make an algorithm and
associated software numerically effective, have been explained with ample simple examples. This
will help students not only to understand the distinction between a "bad" algorithm and a
numerically effective one, but also to learn how to transform a bad algorithm into a good
one, whenever possible. These ideas are not clearly spelled out in undergraduate texts and,
as a result, I have seen students who, despite having taken a few basic courses in numerical
analysis, remain confused about these issues.
For example, an algorithm which is only efficient is often mistaken by students for a "good"
algorithm, without understanding the fact that an efficient algorithm can be highly unstable
(e.g., Gaussian elimination without pivoting).
• Applications. A major strength of the book is applications. As a teacher, I have often
been faced with questions such as: "Why is it important to study such-and-such problems?",
"Why do such-and-such problems need to be solved numerically?", or "What is the physical
significance of the computed quantities?" Therefore, I felt it important to include practical
life examples as often as possible, for each computational problem discussed in the book.
I have done so at the outset of each chapter where numerical solutions of a computational
problem have been discussed. The motivating examples have been drawn from applications
areas, mainly from engineering; however, some examples from statistics, business, bioscience,
and control theory have also been given. I believe these examples will provide sufficient
motivation to the curious student to study numerical linear algebra.
After a physical problem has been posed, the physical and engineering significance of its
solution has been explained to some extent. The currently available numerical linear algebra
and numerical analysis books do not provide sufficiently motivating examples.
• MATLAB codes and the MATLAB toolkit. The use of MATLAB is becoming increasingly
popular in all areas of scientific and engineering computing. I feel that numerical linear
algebra courses should be taught using MATLAB wherever possible. Of course, this does
not mean that the students should not learn to write FORTRAN codes for their favorite
algorithms; knowledge of FORTRAN is a great asset to a numerical linear algebra student.
MATLAB codes for some selected basic algorithms have therefore been provided to help the
students use these codes as templates for writing codes for more advanced algorithms. Also,
a MATLAB toolkit implementing the major algorithms presented in the book has been provided.
The students will be able to compare different algorithms for the same problem with
regard to efficiency, stability, and accuracy. For example, the students will be able to see
instantly, through numerical examples, why Gaussian elimination is more efficient than the QR
factorization method for linear systems problems; why the computed Q in QR factorization
may be more accurate with the Householder or Givens method than with the Gram-Schmidt
methods, etc.
• Thorough discussions and the most up-to-date information. Each topic has been
very thoroughly discussed, and the most current information on the state of the problem has
been provided. The most frequently asked questions by the students have also been answered.
• Solutions and answers to selected problems. Partial solutions for selected important
problems and, in some cases, complete answers, have been provided. I feel this is important
for our undergraduate students. In selecting the problems, emphasis has been placed on those
problems that need proofs.
Above all, I have imparted to the book my enthusiasm and my unique style of presenting
material in an undergraduate course at the level of the majority of students in the class, which
have made me a popular teacher. My teaching evaluations at every school at which I have taught
(e.g., State University of Campinas, Brazil; Pennsylvania State University; the University of Illinois
at Urbana-Champaign; University of California, San Diego; Northern Illinois University, etc.) have
been consistently "excellent" or "very good". As a matter of fact, the consistently excellent feedback
that I receive from my students provided me with enough incentive to write this book.
0. LINEAR ALGEBRA PROBLEMS, THEIR IMPORTANCE AND COMPUTA-
TIONAL DIFFICULTIES
0.1 Introduction  1
0.2 Fundamental Linear Algebra Problems and Their Importance  1
0.3 Computational Difficulties of Solving Linear Algebra Problems Using Obvious Approaches  4
CHAPTER 0
LINEAR ALGEBRA PROBLEMS, THEIR IMPORTANCE AND COMPUTATIONAL DIFFICULTIES
0.1 Introduction
The main objectives of this chapter are to state the fundamental linear algebra problems at the
outset, make a brief mention of their importance, and point out the difficulties that one faces in
a computational setting when trying to solve these problems using obvious approaches.

0.2 Fundamental Linear Algebra Problems and Their Importance


The fundamental linear algebra problems are:

A. The Linear System Problem: Given an n × n nonsingular matrix A and an
n-vector b, the problem is to find an n-vector x such that Ax = b.

A practical variation of the problem requires solutions of several linear systems with the same
matrix A on the left hand side. That is, the problem there is to find a matrix X = [x1, x2, ..., xm]
such that

    AX = B,

where B = [b1, b2, ..., bm] is an n × m matrix.

Associated with linear system problems are the problems of finding the inverse of a matrix;
finding the rank, the determinant, the leading principal minors, an orthonormal basis for the range
and the null space of A; and finding various projection matrices associated with A. Solutions of some
of these latter problems require matrix factorizations, and the problem of matrix factorization and
the linear system problem are intimately related.
It is perhaps not an exaggeration to say that the linear system problem arises in almost all
branches of science and engineering: applied mathematics, biology, chemistry, physics, electrical,
mechanical, civil, and vibration engineering, etc.
The most common source is the numerical solution of differential equations. Many mathematical
models of physical and engineering systems are systems of differential equations, ordinary
and partial. A system of differential equations is normally solved numerically by discretizing the
system by means of finite difference or finite element methods. The process of discretization, in
general, leads to a linear system, the solution of which is an approximate solution to the differential
equations. (See Chapter 6 for more details.)

B. The Least Squares Problem: Given an m × n matrix A and an m-vector b,
the least squares problem is to find an n-vector x such that the norm of the residual
vector, ||Ax − b||₂, is as small as possible.

Least squares problems arise in statistical and geometric applications that require fitting a
polynomial or curve to experimental data, and engineering applications such as signal and image
processing. See Chapter 7 for some specific applications of least squares problems. It is worth
mentioning here that methods for numerically solving least squares problems invariably lead to
solutions of linear system problems (see again Chapter 7 for details).

C. The Eigenvalue Problem: Given an n × n matrix A, the problem is to find n
numbers λi and n-vectors xi such that

    A xi = λi xi,   i = 1, ..., n.

The eigenvalue problem typically arises in the explicit solution and stability analysis of a
homogeneous system of first order differential equations. The stability analysis requires only implicit
knowledge of the eigenvalues, whereas the explicit solution requires the eigenvalues and eigenvectors
explicitly.

Applications such as buckling problems, stock market analysis, study of the behavior of dynamical
systems, etc. require computations of only a few eigenvalues and eigenvectors, usually the few
largest or smallest ones.

In many practical instances, the matrix A is symmetric, and thus the eigenvalue problem
becomes a symmetric eigenvalue problem. For details of some specific applications see Chapter 8.
A great number of eigenvalue problems arising in engineering applications are, however, generalized
eigenvalue problems, as stated below.

D. The Generalized and Quadratic Eigenvalue Problems: Given the n × n
matrices A, B, and C, the problem is to find λi and xi such that

    (λi² A + λi C + B) xi = 0,   i = 1, ..., n.

This is known as the quadratic eigenvalue problem. In the special case when C is a zero
matrix, the problem reduces to a generalized eigenvalue problem. That is, if we are given n × n
matrices A and B, we must find λ and x such that

    Ax = λBx.
The leading equations of vibration engineering (a branch of engineering dealing with vibrations of
structures, etc.) are systems of homogeneous or nonhomogeneous second-order differential
equations. A homogeneous second-order system has the form

    A z'' + C z' + B z = 0,

where z' and z'' denote the first and second time derivatives of z; the solution and stability analysis
of this system lead to a quadratic eigenvalue problem.

Vibration problems are usually solved by setting C = 0. Moreover, in many practical instances,
the matrices A and B are symmetric and positive definite. This leads to a symmetric definite
generalized eigenvalue problem.

See Chapter 9 for details of some specific applications of these problems.

E. Singular Value Decomposition Problem: Given an m × n matrix A, the
problem is to find unitary matrices U and V, and a "diagonal" matrix Σ, such that

    A = U Σ V*.

The above decomposition is known as the Singular Value Decomposition of A. The diagonal entries
of Σ are the singular values. The column vectors of U and V are called the singular vectors.
Many areas of engineering, such as control and systems theory, biomedical engineering, signal
and image processing, and statistical applications, give rise to the singular value decomposition
problem. These applications typically require the rank of A, an orthonormal basis, projections, the
distance of a matrix from another matrix of lower rank, etc., in the presence of certain impurities
(known as noise) in the data. The singular values and singular vectors are the most numerically
reliable tools for finding these entities. The singular value decomposition is also the most numerically
effective approach for solving the least squares problem, especially in the rank-deficient case.

0.3 Computational Difficulties of Solving Linear Algebra Problems Using Obvious Approaches

In this section we would like to point out some computational difficulties one might face while
attempting to solve some of the above-mentioned linear algebra problems using "obvious" ways.
• Solving a Linear System by Cramer's Rule: Cramer's Rule, as taught in an
undergraduate linear algebra course, is of significant theoretical and historical importance (for
a statement of this rule, see Chapter 6). Unfortunately, it cannot be recommended as a
practical computational procedure.
Solving a 20 × 20 linear system with this rule, even on a fast modern-day computer, might
take more than a million years.
• Computing the unique solution of a linear system by matrix inversion: The unique
solution of a nonsingular linear system can be written explicitly as x = A^{-1} b.
Unfortunately, computing a solution to a linear system by first explicitly computing the
matrix inverse is not practical.
The computation of the matrix inverse is about three times as expensive as solving the linear
system problem itself using a standard elimination procedure (see Chapter 6), and often leads
to more inaccuracies. Consider a trivial example: solve 3x = 27. An elimination procedure
will give x = 9 and require only one division. On the other hand, solving the equation using
matrix inversion will be cast as x = (1/3) · 27, giving x = 0.3333 · 27 = 8.999 (in four-digit
arithmetic), and will require one division and one multiplication.
Note that the computer time consumed by an algorithm is theoretically measured by the number
of arithmetic operations needed to execute the algorithm.
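The following is a minimal MATLAB sketch (not from the book) contrasting the two approaches;
the test matrix, its size, and the right hand side are made up and chosen to be well conditioned.

    n = 500;
    A = randn(n) + n*eye(n);        % a made-up, well-conditioned test matrix
    x_true = ones(n, 1);
    b = A*x_true;

    x_elim = A\b;                   % Gaussian elimination (LU factorization) via backslash
    x_inv  = inv(A)*b;              % explicit inverse followed by a matrix-vector product

    fprintf('error, elimination : %e\n', norm(x_elim - x_true));
    fprintf('error, inversion   : %e\n', norm(x_inv  - x_true));

The elimination route does less work and typically gives a solution at least as accurate as the one
obtained through the explicit inverse.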
• Solving a least squares problem by normal equations: If the m × n matrix A has
full rank, and m is greater than or equal to n, then the least squares problem has a unique
solution, and this solution is theoretically given by the solution x of the linear system

    A^T A x = A^T b.

The above equations are known as the normal equations. Unfortunately, this procedure
has some severe numerical limitations. First, in finite precision arithmetic, during an explicit
formation of A^T A, some vital information might be lost. Second, the normal equations are
more sensitive to perturbations than the ordinary linear system Ax = b, and this sensitivity,
in certain instances, corrupts the accuracy of the computed least squares solution to an extent
not warranted by the data. (See Chapter 7 for more details.)
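A minimal MATLAB sketch (not from the book) of this comparison follows; the polynomial-fitting
test problem and its dimensions are made up for illustration.

    m = 100; n = 8;
    t = linspace(0, 1, m)';
    A = t.^(0:n-1);                 % Vandermonde-type matrix; increasingly ill-conditioned with n
    x_true = ones(n, 1);
    b = A*x_true;

    x_ne = (A'*A) \ (A'*b);         % normal equations: solve A'Ax = A'b
    [Q, R] = qr(A, 0);              % "economy-size" QR factorization
    x_qr = R \ (Q'*b);              % QR-based least squares solution

    fprintf('error, normal equations : %e\n', norm(x_ne - x_true));
    fprintf('error, QR factorization : %e\n', norm(x_qr - x_true));

On problems of this kind the normal-equations solution usually loses noticeably more accuracy than
the QR-based solution.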
• Computing the eigenvalues of a matrix by finding the roots of its characteristic
polynomial: The eigenvalues of a matrix A are the zeros of its characteristic polynomial.
Thus an "obvious" procedure for finding the eigenvalues would be to compute the characteristic
polynomial of A and then find its zeros by a standard, well-established root-finding
procedure. Unfortunately, this is not a numerically viable approach. The round-off errors
produced during the process of computing the characteristic polynomial will very likely produce
some small perturbations in the computed coefficients. These small errors in the coefficients
can affect the computed zeros very drastically in certain cases. The zeros of certain
polynomials are known to be extremely sensitive to small perturbations in the coefficients. A
classic example of this is the Wilkinson polynomial (see Chapter 3). Wilkinson took a polynomial
of degree 20 with the distinct roots 1 through 20, and perturbed the coefficient of x^19
by a very small amount. The zeros of this slightly perturbed polynomial were then
computed by a well-established root-finding procedure, only to find that some zeros became
totally different. Some even became complex.
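A minimal MATLAB sketch (not from the book) of a Wilkinson-type experiment is given below;
the perturbation size 2^(-23) follows Wilkinson's classical example.

    p = poly(1:20);                  % coefficients of (x-1)(x-2)...(x-20)
    p_pert = p;
    p_pert(2) = p_pert(2) + 2^(-23); % perturb the coefficient of x^19 very slightly
    r = roots(p_pert);               % zeros of the perturbed polynomial
    disp(sort(r));                   % several zeros move far from 1,...,20; some become complex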
• Solving the Generalized Eigenvalue Problem and the Quadratic Eigenvalue Problem
by Matrix Inversion: The generalized eigenvalue problem

    Ax = λBx,

in the case where B is nonsingular, is theoretically equivalent to the ordinary eigenvalue problem

    B^{-1} A x = λx.

However, if the nonsingular matrix B is sensitive to perturbations, then forming the matrix
on the left hand side by explicitly computing the inverse of B will lead to inaccuracies that
in turn will lead to computations of inaccurate generalized eigenvalues.
Similar results hold for the quadratic eigenvalue problem. In major engineering applications,
such as in vibration engineering, the matrix A is symmetric positive definite, and is thus
nonsingular. In that case the quadratic eigenvalue problem is equivalent to the eigenvalue
problem

    E u = λu,   where   E = [      0           I      ]
                            [ -A^{-1}B    -A^{-1}C    ].

But numerically it is not advisable to solve the quadratic eigenvalue problem by actually
computing the matrix E explicitly. If A is sensitive to small perturbations, the matrix E
cannot be formed accurately, and the computed eigenvalues will then be inaccurate.
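A minimal MATLAB sketch (not from the book) of the generalized problem follows; the matrices
A and B are made up, and B is assumed to be nonsingular.

    n = 4;
    A = randn(n);
    B = randn(n);

    lam_qz  = eig(A, B);            % works on the pair (A, B) directly, without inverting B
    lam_inv = eig(B\A);             % "obvious" approach: convert to an ordinary eigenvalue problem

    disp(sort(lam_qz));
    disp(sort(lam_inv));            % close here, but accuracy degrades when B is ill-conditioned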
• Finding the Singular Values by computing the eigenvalues of A^T A: Theoretically,
the singular values of A are the nonnegative square roots of the eigenvalues of A^T A. However,
finding the singular values this way is not advisable. Again, explicit formation of the matrix
might lead to the loss of significant relevant information. Consider a rather trivial example:

    A = [ 1  1 ]
        [ ε  0 ]
        [ 0  0 ],

where ε is such that, in finite precision computation, 1 + ε² = 1. Then computationally we have

    A^T A = [ 1  1 ].
            [ 1  1 ]

The eigenvalues now are 2 and 0, so the computed singular values will be given by √2 and 0.
The exact singular values, however, are approximately √2 and ε/√2. (See
Chapter 10 for details.)
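A minimal MATLAB sketch (not from the book) of the example above follows; the value of ε is
made up and chosen so that 1 + ε² rounds to 1 in double precision.

    ep = 1.0e-8;
    A = [1 1; ep 0; 0 0];

    sv_direct  = svd(A);                      % computed directly from A
    sv_squared = sqrt(max(eig(A'*A), 0));     % via the explicitly formed A'*A (guarding tiny negatives)

    disp(sv_direct');                         % approximately [sqrt(2), ep/sqrt(2)]
    disp(sort(sv_squared, 'descend')');       % the small singular value comes out as 0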

Conclusion: Above we have merely pointed out how certain obvious theoretical approaches to
linear algebra problems might lead to computational difficulties and inaccuracies in computed
results. Numerical linear algebra deals with in-depth analysis of such difficulties, investigations of
how these difficulties can be overcome in certain instances, and with formulation and implementations
of viable numerical algorithms for scientific and engineering use.

1. A REVIEW OF SOME REQUIRED CONCEPTS FROM CORE LINEAR ALGEBRA
1.1 Introduction  7
1.2 Vectors  7
1.2.1 Subspace and Basis  8
1.3 Matrices  9
1.3.1 Range and Null Spaces  13
1.3.2 Rank of a Matrix  13
1.3.3 The Inverse of a Matrix  14
1.3.4 Similar Matrices  15
1.3.5 Orthogonality and Projections  15
1.3.6 Projection of a Vector onto the Range and the Null Space of a Matrix  17
1.4 Some Special Matrices  18
1.5 The Cayley-Hamilton Theorem  26
1.6 Singular Values  27
1.7 Vector and Matrix Norms  28
1.7.1 Vector Norms  28
1.7.2 Matrix Norms  30
1.7.3 Convergence of a Matrix Sequence and Convergent Matrices  34
1.7.4 Norms and Inverses  37
1.8 Norm Invariant Properties of Orthogonal Matrices  40
1.9 Review and Summary  41
1.10 Suggestions for Further Reading  42
CHAPTER 1
A REVIEW OF SOME REQUIRED CONCEPTS FROM CORE LINEAR ALGEBRA
1.1 Introduction
Although a first course in linear algebra is a prerequisite for this book, for the sake of completeness
we establish some notation and quickly review the basic definitions and concepts on matrices and
vectors in this chapter, and then discuss in somewhat greater detail the concepts and fundamental
results on vector and matrix norms and their applications to the study of convergent
matrices. These results will be used frequently in the later chapters of the book.
1.2 Vectors
An ordered set of numbers is called a vector; the numbers themselves are called the components
of the vector. A lower case italic letter is usually used to denote a vector. A vector v having n
components has the form

    v = [ v1 ]
        [ v2 ]
        [ .. ]
        [ vn ]

A vector in this form is referred to as a column vector and its transpose is a row vector. The
set of all n-vectors (that is, each vector having n components) will be denoted by R^(n×1) or simply
by R^n. The transpose of a vector v will be denoted by v^T. Unless otherwise stated, a column vector
will simply be called a vector.
If u and v are two vectors in R^n, then their sum u + v is defined by

    u + v = (u1 + v1, u2 + v2, ..., un + vn)^T.

If c is a scalar, then cu = (cu1, cu2, ..., cun)^T. The inner product of two vectors u and v is
the scalar given by

    u^T v = u1 v1 + u2 v2 + ... + un vn.

The length of a vector v, denoted by ||v||, is √(v^T v); that is, the length of v (or Euclidean length of
v) is √(v1² + v2² + ... + vn²).
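A minimal MATLAB sketch (not from the book) of these definitions, with made-up vectors:

    u = [1; 2; 3];
    v = [4; 5; 6];
    ip  = u'*v;            % inner product u^T v  (= 32 here)
    len = sqrt(v'*v);      % Euclidean length of v; the same value as norm(v)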
A set of vectors {m1, ..., mk} in R^n is said to be linearly dependent if there exist scalars
c1, ..., ck, not all zero, such that

    c1 m1 + ... + ck mk = 0   (the zero vector).

Otherwise, the set is called linearly independent.
Example 1.2.1

The set of vectors

    ei = (0, 0, ..., 0, 1, 0, ..., 0)^T,   i = 1, ..., n,

where the 1 appears in the ith component, is linearly independent, because

    c1 e1 + c2 e2 + ... + cn en = (c1, c2, ..., cn)^T = 0

is true if and only if

    c1 = c2 = ... = cn = 0.
Example 1.2.2

The vectors (1, -2)^T and (-3, 6)^T are linearly dependent, because

    3 (1, -2)^T + 1 (-3, 6)^T = (0, 0)^T.

Thus, c1 = 3, c2 = 1.

1.2.1 Subspace and Basis

Orthogonality of two vectors: The angle θ between two vectors u and v is given by

    cos(θ) = u^T v / (||u|| ||v||).

Two vectors u and v are orthogonal if θ = 90°, that is, if u^T v = 0. The symbol ⊥ is used to denote
orthogonality.
Let S be a set of vectors in R^n. Then S is called a subspace of R^n if s1, s2 ∈ S implies
c1 s1 + c2 s2 ∈ S, where c1 and c2 are any scalars. That is, S is a subspace if any linear combination
of two vectors in S is also in S. Note that the space R^n itself is a subspace of R^n. For every
subspace there is a unique smallest positive integer r such that every vector in the subspace can be
expressed as a linear combination of at most r vectors in the subspace; r is called the dimension
of the subspace and is denoted by dim[S]. Any set of r linearly independent vectors from S, where
dim[S] = r, forms a basis of the subspace.
Orthogonality of Two Subspaces. Two subspaces S1 and S2 of R^n are said to be orthogonal
if s1^T s2 = 0 for every s1 ∈ S1 and every s2 ∈ S2. Two orthogonal subspaces S1 and S2 will be
denoted by S1 ⊥ S2.

1.3 Matrices
A collection of n vectors in R^m, arranged as a rectangular array of m rows and n columns, is called
a matrix. A matrix A, therefore, has the form

    A = [ a11  a12  ...  a1n ]
        [ a21  a22  ...  a2n ]
        [ ...                ]
        [ am1  am2  ...  amn ]

It is denoted by A = (aij)_{m×n}, or simply by A = (aij), where it is understood that i = 1, ..., m
and j = 1, ..., n. A is said to be of order m × n. The set of all m × n matrices is denoted by R^(m×n).
A matrix A having the same number of rows and columns is called a square matrix. The
square matrix having 1's along the main diagonal and zeros everywhere else is called the identity
matrix and is denoted by I .
The sum of two matrices A = (aij) and B = (bij) in R^(m×n) is a matrix of the same order as A
and B and is given by

    A + B = (aij + bij).

If c is a scalar, then cA is the matrix given by cA = (c aij).

Let A be m × n and B be n × p. Then their product AB is the m × p matrix given by

    AB = ( Σ_{k=1}^{n} a_ik b_kj ),   i = 1, ..., m;  j = 1, ..., p.
Note that if b is a column vector, then Ab is a column vector. On the other hand, if a is a
column vector and b^T is a row vector, then a b^T is a matrix, known as the outer product of the
two vectors a and b. Thus, if a = (a1, ..., an)^T and b = (b1, ..., bm)^T, then a b^T is the n × m matrix

    a b^T = [ a1 b1  a1 b2  ...  a1 bm ]
            [ a2 b1  a2 b2  ...  a2 bm ]
            [  ...                     ]
            [ an b1  an b2  ...  an bm ]
Example 1.3.1

    a = (1, 2, 3)^T,    b = (2, 3, 4)^T.

    Outer product:  a b^T = [ 2  3  4 ]
                            [ 4  6  8 ]
                            [ 6  9 12 ]   (a matrix).

    Inner product:  a^T b = 1·2 + 2·3 + 3·4 = 20   (a scalar).
The transpose of a matrix A of order m × n, denoted by A^T, is a matrix of order n × m with
rows and columns interchanged:

    A^T = (a_ji),   i = 1, ..., n;  j = 1, ..., m.

Note that the matrix product is not commutative; that is, in general

    AB ≠ BA.

Also, (AB)^T = B^T A^T.

Hermitian (Symmetric) Matrix: A complex matrix A is called Hermitian if

    A = (Ā)^T,

where Ā is the complex conjugate of A. A real matrix A is symmetric if A^T = A.

An alternative way of writing the matrix product

Writing B = (b1, ..., bp), where bi is the ith column of B, the matrix product AB can be written as

    AB = (A b1, ..., A bp).

Similarly, if ai is the ith row of A, then

    AB = [ a1 B ]
         [ a2 B ]
         [ ...  ]
         [ am B ]
Block Matrices

If two matrices A and B can be partitioned as

    A = [ A11  A12 ],    B = [ B11  B12 ],
        [ A21  A22 ]         [ B21  B22 ]

then, considering each block as an element of the matrix, we can perform addition, scalar
multiplication and matrix multiplication in the usual way. Thus,

    A + B = [ A11 + B11   A12 + B12 ]
            [ A21 + B21   A22 + B22 ]

and

    AB = [ A11 B11 + A12 B21   A11 B12 + A12 B22 ],
         [ A21 B11 + A22 B21   A21 B12 + A22 B22 ]

assuming that the partitioning has been done conformably, so that the corresponding matrix
multiplications are possible. This 2 × 2 block partitioning can be easily generalized.

Thus, if A = (Aij) and B = (Bij) are two block matrices, then C = AB is given by

    C = (Cij) = ( Σ_{k=1}^{n} A_ik B_kj ),

where each A_ik, B_kj, and C_ij is a block matrix.

A block diagonal matrix is a diagonal matrix where each diagonal element is a square matrix.
That is,

    A = diag(A11, ..., Ann),

where the Aii are square matrices.
The Determinant of a Matrix

For every square matrix A, there is a unique number associated with the matrix called the
determinant of A, which is denoted by det(A). For a 2 × 2 matrix A, det(A) = a11 a22 - a12 a21;
for a 3 × 3 matrix A = (aij), det(A) = a11 det(A11) - a12 det(A12) + a13 det(A13), where A1i
is the 2 × 2 submatrix obtained by eliminating the first row and the ith column. This can be easily
generalized. For an n × n matrix A = (aij) we have

    det(A) = (-1)^{i+1} a_i1 det(A_i1) + (-1)^{i+2} a_i2 det(A_i2) + ... + (-1)^{i+n} a_in det(A_in),

where A_ij is the submatrix of A of order (n - 1) obtained by eliminating the ith row and jth column.
Example 1.3.2

    A = [ 1 2 3 ]
        [ 4 5 6 ]
        [ 7 8 9 ]

Set i = 1. Then

    det(A) = 1 · det[5 6; 8 9] - 2 · det[4 6; 7 9] + 3 · det[4 5; 7 8]
           = 1(-3) - 2(-6) + 3(-3) = 0.
Theorem 1.3.1 The following simple properties of det(A) hold:

1. det(A) = det(A^T).
2. det(αA) = α^n det(A), where α is a scalar.
3. det(AB) = det(A) · det(B).
4. If two rows or two columns of A are identical, then det(A) = 0.
5. If B is a matrix obtained from A by interchanging two rows or two columns, then det(B) = -det(A).
6. The determinant of a triangular matrix is the product of its diagonal entries.
   (A square matrix A is triangular if its elements below or above the diagonal are all zero.)
The Characteristic Polynomial, the Eigenvalues and Eigenvectors of a Matrix

Let A be an n × n matrix. Then the polynomial p_n(λ) = det(λI - A) is called the characteristic
polynomial. The zeros of the characteristic polynomial are called the eigenvalues of A. Note
that this is equivalent to the following: λ is an eigenvalue of A if and only if there exists a nonzero
vector x such that Ax = λx. The vector x is called a right eigenvector (or just an eigenvector), and a
vector y satisfying y* A = λ y* is called a left eigenvector associated with λ.

Definition 1.3.1 An n × n matrix A having fewer than n linearly independent eigenvectors is
called a defective matrix.
Example 1.3.3

The matrix

    A = [ 1 2 ]
        [ 0 1 ]

is defective. Its only eigenvalue is 1, and the eigenvectors, such as (1, 0)^T and (-1, 0)^T, are
linearly dependent, so A has fewer than two linearly independent eigenvectors.
The Determinant of a Block Matrix

Let

    A = [ A11  A12 ],
        [  0   A22 ]

where A11 and A22 are square matrices. Then det(A) = det(A11) · det(A22).

1.3.1 Range and Null Spaces


For every m × n matrix A, there are two important associated subspaces: the Range of A, denoted
by R(A), and the Null Space of A, denoted by N(A):

    R(A) = { b ∈ R^m : b = Ax for some x ∈ R^n },
    N(A) = { x ∈ R^n : Ax = 0 }.

Let S be a subspace of R^m. Then the subspace S⊥ defined by

    S⊥ = { y ∈ R^m : y^T x = 0 for all x ∈ S }

is called the orthogonal complement of S. It can be shown (Exercise) that

(i) N(A) = R(A^T)⊥;

(ii) R(A)⊥ = N(A^T).

The dimension of N(A) is called the nullity of A and is denoted by null(A).

1.3.2 Rank of a Matrix


Let A be an m × n matrix. Then the subspace spanned by the row vectors of A is called the row
space of A. The subspace spanned by the columns of A is called the column space of A.

The rank of a matrix A is the dimension of the column space of A. It is denoted by rank(A).

A square matrix A ∈ R^(n×n) is called nonsingular if rank(A) = n. Otherwise it is singular.

An m × n matrix A ∈ R^(m×n) is said to have full column rank if its columns are linearly
independent. Full row rank is similarly defined. A matrix A is said to have full rank if it
has either full row rank or full column rank. If A does not have full rank, it is rank deficient.
Example 1.3.4

    A = [ 1 2 ]
        [ 3 4 ]
        [ 5 6 ]

has full rank; rank(A) = 2 (it has full column rank); null(A) = 0.

Example 1.3.5

    A = [ 1 2 ]
        [ 2 4 ]
        [ 0 0 ]

is rank deficient; rank(A) = 1; null(A) = 1.

Some Rank Properties

Let A be an m × n matrix. Then

1. rank(A) = rank(A^T).
2. rank(A) + null(A) = n.
3. rank(AB) ≥ rank(A) + rank(B) - n, where B is n × p.
4. rank(BA) = rank(A) = rank(AC), where B and C are nonsingular matrices
   of orders m and n, respectively.
5. rank(AB) ≤ min{rank(A), rank(B)}.

1.3.3 The Inverse of a Matrix


Let A be an n × n matrix. Then a matrix B such that

    AB = BA = I,

where I is the n × n identity matrix, is called the inverse of A. The inverse of A is denoted by
A^{-1}. The inverse is unique.

An interesting property of the inverse of the product of two matrices is:

    (AB)^{-1} = B^{-1} A^{-1}.

Theorem 1.3.2 For an n × n matrix A, the following are equivalent:

1. A is nonsingular.
2. det(A) is nonzero.
3. rank(A) = rank(A^T) = n.
4. N(A) = {0}.
5. A^{-1} exists.
6. A has linearly independent rows and columns.
7. The eigenvalues of A are nonzero.

1.3.4 Similar Matrices


Two matrices A and B are called similar if there exists a nonsingular matrix T such that

    T^{-1} A T = B.

An important property of similar matrices: two similar matrices have the same
eigenvalues. (For a proof, see Chapter 8, Section 8.2.)
1.3.5 Orthogonality and Projections
A set of vectors {v1, ..., vm} in R^n is orthogonal if

    v_i^T v_j = 0,   i ≠ j.

If, in addition, v_i^T v_i = 1 for each i, then they are called orthonormal.

A basis for a subspace that is also orthonormal is called an orthonormal basis for the subspace.
Example 1.3.6

    A = [ 0    0 ]
        [ 1/2  1 ]
        [ 1    2 ]

The vector (0, 1/√5, 2/√5)^T forms an orthonormal basis for R(A). (See Section 5.6.1.)

Example 1.3.7
    A = [ 1 2 ]
        [ 0 1 ]
        [ 1 0 ]

The columns of the matrix

    V = [ 1/√2    1/√3 ]
        [  0      1/√3 ]
        [ 1/√2   -1/√3 ]

form an orthonormal basis for R(A). (See Section 5.6.1.)
Orthogonal Projection
Let S be a subspace of R^n. Then an n × n matrix P having the properties

(i) R(P) = S,
(ii) P^T = P (P is symmetric),
(iii) P² = P (P is idempotent)

is called the orthogonal projection onto S, or simply the projection matrix. We denote the
orthogonal projection P onto S by PS. The orthogonal projection onto a subspace is unique.

Let the columns of V = (v1, ..., vk) form an orthonormal basis for a subspace S. Then

    PS = V V^T

is the unique orthogonal projection onto S. Note that V is not unique, but PS is.

A relationship between PS and PS⊥

If PS is the orthogonal projection onto S, then I - PS, where I is the identity matrix of the
same order as PS, is the orthogonal projection onto S⊥. (Exercise 14(a))
The Orthogonal Projections onto R(A) and N(A^T)

When the subspace S is R(A) or N(A^T) associated with the matrix A, we will denote the unique
orthogonal projections onto R(A) and N(A^T) by PA and PN, respectively.

It can be shown (Exercise 14(b)) that if A is m × n (m ≥ n) and has full rank, then

    PA = A (A^T A)^{-1} A^T,
    PN = I - A (A^T A)^{-1} A^T.
Example 1.3.8

    A = [ 1 2 ]
        [ 0 1 ]
        [ 1 0 ]

    A^T A = [ 2 2 ],      (A^T A)^{-1} = [ 5/6   -1/3 ]
            [ 2 5 ]                      [ -1/3   1/3 ]

    PA = A (A^T A)^{-1} A^T = [ 5/6    1/3    1/6 ]
                              [ 1/3    1/3   -1/3 ]
                              [ 1/6   -1/3    5/6 ]
1.3.6 Projection of a Vector onto the Range and the Null Space of a Matrix
Any vector b can be written as

    b = bS + bS⊥,

where bS ∈ S and bS⊥ ∈ S⊥. Let S be the range R(A) of a matrix A. Then bS ∈ R(A) and
bS⊥ ∈ N(A^T). We will therefore denote bS by bR and bS⊥ by bN, meaning that bR is in the range
of A and bN is in the null space of A^T.

It can be shown (Exercise 14(c)) that

    bR = PA b   and   bN = PN b.

bR and bN are called the orthogonal projection of b onto R(A) and the orthogonal projection
of b onto N(A^T), respectively. From the above, we easily see that

    bR^T bN = 0.
Example 1.3.9

    A = [ 0    0 ],     b = [ 1 ]
        [ 1/2  1 ]          [ 1 ]
        [ 1/2  1 ]          [ 1 ]

    V = an orthonormal basis for R(A) = [ 0    ]
                                        [ 1/√2 ]
                                        [ 1/√2 ]

    PA = V V^T = [ 0   0    0  ]
                 [ 0  1/2  1/2 ]
                 [ 0  1/2  1/2 ]

    bR = PA b = [ 0   0    0  ] [ 1 ]   [ 0 ]
                [ 0  1/2  1/2 ] [ 1 ] = [ 1 ]
                [ 0  1/2  1/2 ] [ 1 ]   [ 1 ]

    PN = I - PA = [ 1    0     0  ]
                  [ 0   1/2  -1/2 ]
                  [ 0  -1/2   1/2 ]

    bN = PN b = [ 1 ]
                [ 0 ]
                [ 0 ]

Note that b = bR + bN.

1.4 Some Special Matrices


1. Diagonal Matrix. A square matrix A = (aij) is a diagonal matrix if aij = 0 for i ≠ j.
We write A = diag(a11, ..., ann).

2. Triangular Matrix. A square matrix A = (aij) is an upper triangular matrix if aij = 0 for
i > j.

The transpose of an upper triangular matrix is lower triangular; that is, A = (aij) is lower
triangular if aij = 0 for i < j.

    [ *         ]        [ *  *  * ]
    [ *  *   0  ]        [    *  * ]
    [ *  *  *   ]        [ 0     * ]
    LOWER TRIANGULAR     UPPER TRIANGULAR

Some Useful Properties of Triangular Matrices


The following properties of triangular matrices are useful.
1. The product of two upper (lower) triangular matrices is an upper (lower) triangular matrix.
The diagonal entries of the product matrix are just the products of the diagonal entries of
the individual matrices. (Exercise 19(a) ).
2. The inverse of a nonsingular upper (lower) triangular matrix is an upper (lower) triangular
matrix. The diagonal entries of the inverse are the reciprocals of the diagonal entries of the
original matrix. (Exercise 19(b)).
3. The eigenvalues of a triangular matrix are its diagonal entries (Exercise 19(d)).
4. The determinant of a triangular matrix is the product of its diagonal entries. (Exercise
(19(c)).
Thus, a triangular matrix is nonsingular if and only if all of its diagonal entries are nonzero.
3. Unitary (Orthogonal) Matrix. A square complex matrix U is unitary if

    U* U = U U* = I,

where U* = (Ū)^T and Ū is the complex conjugate of U.

If U is real, then U is orthogonal if

    U^T U = U U^T = I.
Orthogonal matrices play a very important role in numerical matrix computations.

The following two important properties of orthogonal matrices make them so attractive for
numerical computation:

1. The inverse of an orthogonal matrix O is just its transpose: O^{-1} = O^T.
2. The product of two orthogonal matrices is an orthogonal matrix.
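A minimal MATLAB sketch (not from the book) illustrating these two properties, using made-up
random matrices:

    [Q1, ~] = qr(randn(4));        % Q1 is orthogonal
    [Q2, ~] = qr(randn(4));        % so is Q2

    norm(Q1'*Q1 - eye(4))          % ~0: the inverse of Q1 is its transpose
    Q = Q1*Q2;
    norm(Q'*Q - eye(4))            % ~0: the product is again orthogonal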

4. Permutation Matrix. A nonzero square matrix P is called a permutation matrix if
there is exactly one nonzero entry in each row and column, which is 1, and the rest are all zero.
Thus, if (i1, ..., in) is a permutation of (1, 2, ..., n), then

    P = [ e_i1 ]
        [ ...  ]
        [ e_in ],

where e_k is the kth row of the n × n identity matrix I, is a permutation matrix. Similarly,

    P = (e_i1, e_i2, ..., e_in),

where e_k is the kth column of I, is a permutation matrix.


Example 1.4.1

    P1 = [ 0 1 0 ],   P2 = [ 1 0 0 ],   P3 = [ 1 0 0 ]
         [ 0 0 1 ]         [ 0 1 0 ]         [ 0 0 1 ]
         [ 1 0 0 ]         [ 0 0 1 ]         [ 0 1 0 ]

are all permutation matrices.
Effects of Pre-multiplication and Post-multiplication by a permutation matrix.

If

    P1 = [ e_i1 ]
         [ ...  ]
         [ e_in ],

then

    P1 A = [ i1-th row of A ]
           [ i2-th row of A ]
           [      ...       ]
           [ in-th row of A ].

Similarly, if P2 = (e_i1, e_i2, ..., e_in), where e_k is the kth column of I, then

    A P2 = ( i1-th column of A, i2-th column of A, ..., in-th column of A ).
Example 1.4.2

1. Let A = (aij) be a 3 × 3 matrix and

    P1 = [ 0 1 0 ]   [ e2 ]
         [ 0 0 1 ] = [ e3 ]
         [ 1 0 0 ]   [ e1 ].

Then

    P1 A = [ a21 a22 a23 ]   [ 2nd row of A ]
           [ a31 a32 a33 ] = [ 3rd row of A ]
           [ a11 a12 a13 ]   [ 1st row of A ].

2. The same P1, written in terms of its columns, is P1 = (e3, e1, e2), and

    A P1 = [ a13 a11 a12 ]
           [ a23 a21 a22 ] = ( 3rd column of A, 1st column of A, 2nd column of A ).
           [ a33 a31 a32 ]

An important property of a permutation matrix is that a permutation matrix is orthogonal.


Thus:

1. The inverse of a permutation matrix P is its transpose, and it is also a permutation matrix.

2. The product of two permutation matrices is a permutation matrix, and therefore is orthogonal.
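A minimal MATLAB sketch (not from the book); the permutation and the test matrix are made up.

    A = magic(3);
    P1 = eye(3);
    P1 = P1([2 3 1], :);        % permutation matrix built from the rows e2, e3, e1

    P1*A                        % the rows of A appear in the order 2, 3, 1
    A*P1'                       % the columns of A are permuted instead
    norm(inv(P1) - P1')         % ~0: the inverse of P1 is its transpose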

5. Hessenberg Matrix (almost triangular) | A square matrix A is upper Hessenberg if aij = 0


for i > j + 1. The transpose of an upper Hessenberg matrix is a lower Hessenberg matrix, that is,
a square matrix A = (aij ) is a lower Hessenberg matrix if aij = 0 for j > i + 1. A square matrix A
that is both upper and lower Hessenberg is tridiagonal.
(Schematically, a lower Hessenberg matrix has zeros above its first superdiagonal, and an upper
Hessenberg matrix has zeros below its first subdiagonal.)
An upper Hessenberg matrix A = (a_{ij}) is unreduced if
$$a_{i,i-1} \neq 0 \quad \text{for } i = 2, 3, \ldots, n.$$
Similarly, a lower Hessenberg matrix A = (a_{ij}) is unreduced if
$$a_{i,i+1} \neq 0 \quad \text{for } i = 1, 2, \ldots, n-1.$$
Example 1.4.3
$$A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 3 & 4 \\ 1 & 1 & 1 \end{pmatrix} \ \text{is an unreduced lower Hessenberg matrix.}$$
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 0 & 2 & 3 \end{pmatrix} \ \text{is an unreduced upper Hessenberg matrix.}$$
Some Useful Properties
1. Every square matrix A can be transformed to an upper (or lower) Hessenberg matrix by
means of a unitary similarity; that is, given a complex matrix A, there exists a unitary
matrix U such that
$$UAU^* = H,$$
where H is a Hessenberg matrix.
Proof. (A constructive proof in the case where A is real is given in Chapter 5.)
2. If A is symmetric (or complex Hermitian), then the transformed Hessenberg matrix as ob-
tained in 1 is tridiagonal.
3. An arbitrary Hessenberg matrix can always be partitioned into diagonal blocks such that each
diagonal block is an unreduced Hessenberg matrix.
Example 1.4.4
$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} A_1 & A_2 \\ 0 & A_3 \end{pmatrix}.$$
Note that
$$A_1 = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \quad \text{and} \quad A_3 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$
are unreduced Hessenberg matrices.
Companion Matrix | A normalized upper Hessenberg matrix of the form
$$C = \begin{pmatrix}
0 & 0 & \cdots & 0 & a_1 \\
1 & 0 & \cdots & 0 & a_2 \\
0 & 1 & \cdots & 0 & a_3 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & a_n
\end{pmatrix}$$
is called an upper companion matrix. The transpose of an upper companion matrix is a lower
companion matrix.
The characteristic polynomial of a companion matrix can be easily written down:
$$\det(C - \lambda I) = \det(C^T - \lambda I)
= (-1)^n(\lambda^n - a_n\lambda^{n-1} - a_{n-1}\lambda^{n-2} - \cdots - a_2\lambda - a_1).$$

6. Nonderogatory Matrix | A matrix A is nonderogatory if A is similar to a companion
matrix; that is, A is nonderogatory if there exists a nonsingular T such that $TAT^{-1}$ is a companion
matrix.
There are, of course, other equivalent characterizations of a nonderogatory matrix. For example,
a matrix A is nonderogatory if and only if there exists a vector b such that
$$\operatorname{rank}(b, Ab, \ldots, A^{n-1}b) = n.$$
The matrix $(b, Ab, \ldots, A^{n-1}b)$ is called the controllability matrix in control theory. If the
rank condition is satisfied, then the pair (A, b) is called controllable.

Remark: An unreduced Hessenberg matrix is nonderogatory, but the converse is not
true.
Example 1.4.5
$$A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}$$
is an upper Hessenberg matrix with (2,1) entry equal to zero, but A is nonderogatory.
Pick $b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. Then $(b, Ab) = \begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix}$ is nonsingular.
A matrix that is not nonderogatory is called derogatory. A derogatory matrix is similar to a
direct sum of companion matrices
$$\begin{pmatrix} C_1 & & 0 \\ & \ddots & \\ 0 & & C_k \end{pmatrix},$$
where each $C_i$ is a companion matrix, $k > 1$, and the characteristic polynomial of each $C_i$ divides
the characteristic polynomial of all the preceding $C_i$'s. The above form is also known as the Frobenius
Canonical Form.
7. Diagonally Dominant Matrix | A matrix A = (a_{ij}) is row diagonally dominant if
$$|a_{ii}| > \sum_{j \neq i} |a_{ij}| \quad \text{for all } i.$$
A column diagonally dominant matrix is similarly defined. The matrix
$$A = \begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$$
is both row and column diagonally dominant.
Note: Sometimes in the linear algebra literature, a matrix A having the above property
is called a strictly diagonally dominant matrix.

8. Positive Definite Matrix | A symmetric matrix A is positive definite if for every nonzero
vector x,
$$x^T A x > 0.$$
Let $x = (x_1, x_2, \ldots, x_n)^T$. Then $x^T A x = \sum_{i,j=1}^{n} a_{ij} x_i x_j$ is called the quadratic form associated
with A.
Example 1.4.6
Let
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 5 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
Then
$$x^T A x = (x_1\ x_2)\begin{pmatrix} 2 & 1 \\ 1 & 5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= 2x_1^2 + 2x_1x_2 + 5x_2^2
= 2\left(x_1^2 + x_1x_2 + \tfrac{1}{4}x_2^2\right) + \tfrac{9}{2}x_2^2
= 2\left(x_1 + \tfrac{1}{2}x_2\right)^2 + \tfrac{9}{2}x_2^2 > 0.$$
A positive semidefinite matrix is similarly defined: a symmetric matrix A is positive semidef-
inite if $x^T A x \geq 0$ for every x.
A commonly used notation for a symmetric positive definite (positive semidefinite)
matrix A is $A > 0$ ($A \geq 0$).
Some Characterizations and Properties of Positive Definite Matrices
Here are some useful characterizations of positive definite matrices:
1. A matrix A is positive definite if and only if all its eigenvalues are positive. Note that in the
above example the eigenvalues are 1.6972 and 5.3028.
2. A matrix A is positive definite if and only if all its leading principal minors are positive.
There are n leading principal minors of an $n \times n$ matrix A. The ith leading principal minor,
denoted by
$$\det\left[A\begin{pmatrix} 1 & 2 & \cdots & i \\ 1 & 2 & \cdots & i \end{pmatrix}\right],$$
is the determinant of the submatrix of A formed out of the first i rows and i columns.
Example: for $A = \begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$,
the first leading principal minor = 10;
the second leading principal minor = $\det\begin{pmatrix} 10 & 1 \\ 1 & 10 \end{pmatrix} = 99$;
the third leading principal minor = det A = 972.
3. A symmetric diagonally dominant matrix with positive diagonal entries is positive definite. Note
that the matrix A in the example above is diagonally dominant.
4. If A = (a_{ij}) is positive definite, then $a_{ii} > 0$ for all i.
5. If A = (a_{ij}) is positive definite, then the largest element (in magnitude) of the whole matrix
must lie on the diagonal.
6. The sum of two positive definite matrices is positive definite.
Remarks: Note that (4) and (5) are only necessary conditions for a symmetric matrix to be
positive definite. They can serve only as initial tests for positive definiteness. For example, the
matrices
$$A = \begin{pmatrix} 4 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 \\ 1 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \end{pmatrix}, \qquad
B = \begin{pmatrix} 20 & 12 & 25 \\ 12 & 15 & 2 \\ 25 & 2 & 5 \end{pmatrix}$$
cannot be positive definite, since in the matrix A there is a zero entry on the diagonal, and in B
the largest entry, 25, is not on the diagonal. A quick numerical check of these tests is sketched below.
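The following short sketch (illustration only, not from the text; numpy is assumed) applies the
eigenvalue test and the leading-principal-minor test to the $3 \times 3$ matrix used in the example above.

```python
import numpy as np

# Illustrative check: eigenvalue and leading-principal-minor tests for
# positive definiteness, applied to the diagonally dominant example matrix.
A = np.array([[10., 1., 1.],
              [1., 10., 1.],
              [1., 1., 10.]])

print(np.linalg.eigvalsh(A))                            # all eigenvalues positive
print([np.linalg.det(A[:i, :i]) for i in range(1, 4)])  # approximately [10, 99, 972]
```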

1.5 The Cayley-Hamilton Theorem
A square matrix A satisfies its own characteristic equation; that is, if A = (a_{ij}) is an $n \times n$ matrix
and $P_n(\lambda)$ is the characteristic polynomial of A, then
$P_n(A)$ is a ZERO matrix.
Proof. (See Matrix Theory by Franklin, pp. 113-114.)
Example 1.5.1
Let
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}.$$
Then $P_2(\lambda) = \lambda^2 - 2\lambda - 1$, and
$$P_2(A) = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix} - 2\begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
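As a quick numerical illustration (assumed check, not part of the text), the identity above can be
verified in a few lines:

```python
import numpy as np

# Verify Example 1.5.1: P(A) = A^2 - 2A - I should be the zero matrix.
A = np.array([[0., 1.],
              [1., 2.]])
print(A @ A - 2*A - np.eye(2))   # zero matrix (up to rounding)
```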
1.6 Singular Values
Let A be $m \times n$. Then the eigenvalues of the $n \times n$ Hermitian matrix $A^*A$ are real and non-negative.
Let these eigenvalues be denoted by $\sigma_i^2$, where $\sigma_1^2 \geq \sigma_2^2 \geq \cdots \geq \sigma_n^2$. Then $\sigma_1, \sigma_2, \ldots, \sigma_n$ are called
the singular values of A. Every $m \times n$ matrix A can be decomposed into
$$A = U\Sigma V^T,$$
where U ($m \times m$) and V ($n \times n$) are unitary and $\Sigma$ is an $m \times n$ "diagonal" matrix. This decomposition is
called the Singular Value Decomposition or SVD. The singular values $\sigma_i$, $i = 1, \ldots, n$, are the
diagonal entries of $\Sigma$. The number of nonzero singular values is equal to the rank of the
matrix A. The singular values of A are the nonnegative square roots of the eigenvalues
of $A^TA$ (see Chapter 10, Section 10.3).
Example 1.6.1
Let
$$A = \begin{pmatrix} 0 & 1 \\ 2 & 2 \end{pmatrix}. \quad \text{Then} \quad
A^T A = \begin{pmatrix} 4 & 4 \\ 4 & 5 \end{pmatrix}.$$
The eigenvalues of $A^TA$ are $\frac{9 \pm \sqrt{65}}{2}$, so
$$\sigma_1 = \sqrt{\tfrac{9+\sqrt{65}}{2}}, \qquad \sigma_2 = \sqrt{\tfrac{9-\sqrt{65}}{2}}.$$
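A small numerical cross-check (illustrative only; numpy assumed) compares these square roots with
the singular values returned by an SVD routine:

```python
import numpy as np

# Singular values as square roots of the eigenvalues of A^T A,
# compared with the values numpy's SVD returns directly.
A = np.array([[0., 1.],
              [2., 2.]])
eigs = np.linalg.eigvalsh(A.T @ A)         # ascending order
print(np.sqrt(eigs[::-1]))                 # sigma_1 >= sigma_2
print(np.linalg.svd(A, compute_uv=False))  # the same values
```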
1.7 Vector and Matrix Norms
1.7.1 Vector Norms
Let
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
be an n-vector and V be a vector space. Then a vector norm, denoted by the symbol $\|x\|$, is a
real-valued continuous function of the components $x_1, x_2, \ldots, x_n$ of x, defined on V, that has the
following properties:
1. $\|x\| > 0$ for every nonzero x, and $\|x\| = 0$ if and only if x is the zero vector.
2. $\|\alpha x\| = |\alpha|\,\|x\|$ for all x in V and for all scalars $\alpha$.
3. $\|x + y\| \leq \|x\| + \|y\|$ for all x and y in V.
Property (3) is known as the Triangle Inequality.
Note:
$$\|-x\| = \|x\|, \qquad \|x\| - \|y\| \leq \|x - y\|.$$
It is simple to verify that the following are vector norms.

Some Easily Computed Vector Norms
(a) $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$ (sum norm or one norm)
(b) $\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$ (Euclidean norm or two norm)
(c) $\|x\|_\infty = \max_i |x_i|$ (infinity norm or maximum norm)
In general, if p is a real number greater than or equal to 1, the p-norm or Holder norm is
defined by
$$\|x\|_p = \left(|x_1|^p + \cdots + |x_n|^p\right)^{1/p}.$$
Example 1.7.1
Let $x = (1, 1, -2)^T$. Then
$$\|x\|_1 = 4, \qquad \|x\|_2 = \sqrt{1^2 + 1^2 + (-2)^2} = \sqrt{6}, \qquad \|x\|_\infty = 2.$$
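These values are easy to confirm numerically (illustration only, using numpy's norm routine):

```python
import numpy as np

# Quick check of Example 1.7.1.
x = np.array([1., 1., -2.])
print(np.linalg.norm(x, 1))       # 4.0
print(np.linalg.norm(x, 2))       # sqrt(6) = 2.449...
print(np.linalg.norm(x, np.inf))  # 2.0
```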
An important property of the Holder norm is the Holder inequality
$$|x^Ty| \leq \|x\|_p\|y\|_q, \qquad \text{where } \frac{1}{p} + \frac{1}{q} = 1.$$
A special case of the Holder inequality is the Cauchy-Schwarz inequality
$$|x^Ty| \leq \|x\|_2\|y\|_2,$$
that is,
$$\left|\sum_{j=1}^{n} x_jy_j\right| \leq \sqrt{\sum_{j=1}^{n} x_j^2}\,\sqrt{\sum_{j=1}^{n} y_j^2}.$$
Equivalence Property of Vector Norms
All vector norms are equivalent in the sense that for any two vector norms $\|\cdot\|$ and $\|\cdot\|'$ there
exist positive constants $\alpha$ and $\beta$ such that
$$\alpha\|x\|' \leq \|x\| \leq \beta\|x\|'$$
for all x. For the 2, 1, and $\infty$ norms, we can compute $\alpha$ and $\beta$ easily:
$$\|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\,\|x\|_2,$$
$$\|x\|_\infty \leq \|x\|_2 \leq \sqrt{n}\,\|x\|_\infty,$$
$$\|x\|_\infty \leq \|x\|_1 \leq n\,\|x\|_\infty.$$
1.7.2 Matrix Norms
Let A be an $m \times n$ matrix. Then, analogous to the vector norm, we define a matrix norm $\|A\|$ with
the following properties:
1. $\|A\| > 0$; $\|A\| = 0$ if and only if A is the zero matrix.
2. $\|\alpha A\| = |\alpha|\,\|A\|$ for any scalar $\alpha$.
3. $\|A + B\| \leq \|A\| + \|B\|$.
4. $\|AB\| \leq \|A\|\,\|B\|$
for all A and B.
Subordinate Matrix Norms
Given a matrix A and a vector norm $\|\cdot\|_p$, the non-negative number defined by
$$\|A\|_p = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}$$
satisfies all the properties of a matrix norm. This norm is called the matrix norm subordinate to
the vector norm.
A very useful and frequently used property of a subordinate matrix norm (we shall sometimes
call it the p-norm of a matrix A) is
$$\|Ax\|_p \leq \|A\|_p\|x\|_p.$$
This property follows easily from the definition of p-norms: for any particular nonzero vector x,
$$\|A\|_p \geq \frac{\|Ax\|_p}{\|x\|_p},$$
and multiplying both sides by $\|x\|_p$ gives the inequality.
The two easily computable p-norms are:
$$\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^{m} |a_{ij}| \quad \text{(maximum column-sum norm)},$$
$$\|A\|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \quad \text{(maximum row-sum norm)}.$$
Example 1.7.2
$$A = \begin{pmatrix} 1 & -2 \\ 3 & 4 \\ -5 & 6 \end{pmatrix}, \qquad \|A\|_1 = 12, \qquad \|A\|_\infty = 11.$$
Another useful p-norm is the spectral norm:
Definition 1.7.1
$$\|A\|_2 = \sqrt{\text{maximum eigenvalue of } A^TA}.$$
(Note that the eigenvalues of $A^TA$ are real and non-negative.)
Example 1.7.3
$$A = \begin{pmatrix} 2 & 5 \\ 1 & 3 \end{pmatrix}, \qquad
A^TA = \begin{pmatrix} 5 & 13 \\ 13 & 34 \end{pmatrix}.$$
The eigenvalues of $A^TA$ are 0.0257 and 38.9743, so
$$\|A\|_2 = \sqrt{38.9743} = 6.2429.$$
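For illustration (assumed numpy usage, not from the text), the norms in the last two examples can
be computed directly:

```python
import numpy as np

# The subordinate 1-, infinity-, and 2-norms used in Examples 1.7.2 and 1.7.3.
A = np.array([[1., -2.],
              [3.,  4.],
              [-5., 6.]])
print(np.linalg.norm(A, 1))       # 12 : maximum column sum
print(np.linalg.norm(A, np.inf))  # 11 : maximum row sum

B = np.array([[2., 5.],
              [1., 3.]])
print(np.linalg.norm(B, 2))       # 6.2429 : sqrt of largest eigenvalue of B^T B
```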
The Frobenius Norm
An important matrix norm compatible with the vector norm $\|x\|_2$ is the Frobenius norm:
$$\|A\|_F = \left[\sum_{j=1}^{n}\sum_{i=1}^{m} |a_{ij}|^2\right]^{1/2}.$$
A matrix norm $\|\cdot\|_M$ and a vector norm $\|\cdot\|_v$ are compatible if
$$\|Ax\|_v \leq \|A\|_M\|x\|_v.$$
Example 1.7.4
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad \|A\|_F = \sqrt{30}.$$
Notes:
1. For the identity matrix I, $\|I\|_F = \sqrt{n}$, whereas $\|I\|_1 = \|I\|_2 = \|I\|_\infty = 1$.
2. $\|A\|_F^2 = \operatorname{trace}(A^TA)$, where trace(A) is defined as the sum of the diagonal
entries of A; that is, if A = (a_{ij}), then trace(A) = $a_{11} + a_{22} + \cdots + a_{nn}$.

Equivalence Property of Matrix Norms
As in the case of vector norms, the matrix norms are also related: for any two matrix norms there
exist scalars $\alpha$ and $\beta$ such that
$$\alpha\|A\| \leq \|A\|' \leq \beta\|A\|.$$
In particular, the following inequalities relating various matrix norms are true and are used very
frequently in practice.
Theorem 1.7.1 Let A be $m \times n$. Then
(1) $\frac{1}{\sqrt{n}}\|A\|_\infty \leq \|A\|_2 \leq \sqrt{m}\,\|A\|_\infty$.
(2) $\|A\|_2 \leq \|A\|_F \leq \sqrt{n}\,\|A\|_2$.
(3) $\frac{1}{\sqrt{m}}\|A\|_1 \leq \|A\|_2 \leq \sqrt{n}\,\|A\|_1$.
(4) $\|A\|_2 \leq \sqrt{\|A\|_1\|A\|_\infty}$.
We prove here inequalities (1) and (2) and leave the rest as exercises.
Proof of (1)
By definition,
$$\|A\|_\infty = \max_{x \neq 0} \frac{\|Ax\|_\infty}{\|x\|_\infty}.$$
From the equivalence property of the vector norms, we have
$$\|Ax\|_\infty \leq \|Ax\|_2 \quad \text{and} \quad \|x\|_2 \leq \sqrt{n}\,\|x\|_\infty.$$
From the second inequality we get
$$\frac{1}{\|x\|_\infty} \leq \frac{\sqrt{n}}{\|x\|_2}.$$
It therefore follows that
$$\frac{\|Ax\|_\infty}{\|x\|_\infty} \leq \sqrt{n}\,\frac{\|Ax\|_2}{\|x\|_2},$$
or
$$\max_{x \neq 0}\frac{\|Ax\|_\infty}{\|x\|_\infty} \leq \sqrt{n}\,\max_{x \neq 0}\frac{\|Ax\|_2}{\|x\|_2} = \sqrt{n}\,\|A\|_2,$$
i.e.,
$$\frac{1}{\sqrt{n}}\|A\|_\infty \leq \|A\|_2.$$
The first part is proved. To prove the second part, we again use the definition of $\|A\|_2$ and the
appropriate equivalence property of the vector norms:
$$\|A\|_2 = \max_{x \neq 0}\frac{\|Ax\|_2}{\|x\|_2}, \qquad \|Ax\|_2 \leq \sqrt{m}\,\|Ax\|_\infty, \qquad \|x\|_\infty \leq \|x\|_2.$$
Thus,
$$\frac{\|Ax\|_2}{\|x\|_2} \leq \sqrt{m}\,\frac{\|Ax\|_\infty}{\|x\|_\infty},$$
so $\max_{x \neq 0}\dfrac{\|Ax\|_2}{\|x\|_2} \leq \sqrt{m}\,\max_{x \neq 0}\dfrac{\|Ax\|_\infty}{\|x\|_\infty}$, or $\|A\|_2 \leq \sqrt{m}\,\|A\|_\infty$.
The proof of (1) is now complete.
We prove (2) using a different technique. Recall that
$$\|A\|_F^2 = \operatorname{trace}(A^TA).$$
Since $A^TA$ is symmetric, there exists an orthogonal matrix O such that
$$O^T(A^TA)O = D = \operatorname{diag}(d_1, \ldots, d_n).$$
(See Chapter 8.)
Now, the trace is invariant under similarity transformation (Exercise). We then have
$$\operatorname{trace}(A^TA) = \operatorname{trace}(D) = d_1 + \cdots + d_n.$$
Let $d_k = \max_i(d_i)$. Then, since $d_1, \ldots, d_n$ are also the eigenvalues of $A^TA$, we have
$$\|A\|_2^2 = d_k.$$
Thus,
$$\|A\|_F^2 = \operatorname{trace}(A^TA) = d_1 + \cdots + d_n \geq d_k = \|A\|_2^2.$$
To prove the other part, we note that
$$\|A\|_F^2 = d_1 + \cdots + d_n \leq d_k + d_k + \cdots + d_k = n\,d_k.$$
That is, $\|A\|_F^2 \leq n\,d_k = n\,\|A\|_2^2$. So, $\|A\|_F \leq \sqrt{n}\,\|A\|_2$.
1.7.3 Convergence of a Matrix Sequence and Convergent Matrices
A sequence of vectors $v^{(1)}, v^{(2)}, \ldots$ is said to converge to the vector v if
$$\lim_{k \to \infty} v_i^{(k)} = v_i, \qquad i = 1, \ldots, n.$$
A sequence of matrices $A^{(1)}, A^{(2)}, \ldots$ is said to converge to the matrix A = (a_{ij}) if
$$\lim_{k \to \infty} a_{ij}^{(k)} = a_{ij}, \qquad i, j = 1, 2, \ldots, n.$$
If the sequence $\{A^{(k)}\}$ converges to A, then we write
$$\lim_{k \to \infty} A^{(k)} = A.$$
We now state, without proof, necessary and sufficient conditions for the convergence of vector
and matrix sequences. The proofs can easily be worked out.
Theorem 1.7.2 The sequence $v^{(1)}, v^{(2)}, \ldots$ converges to v if and only if for any vector norm
$$\lim_{k \to \infty} \|v^{(k)} - v\| = 0.$$
A similar theorem holds for a matrix sequence.
Theorem 1.7.3 The sequence of matrices $A^{(1)}, A^{(2)}, \ldots$ converges to the matrix A if and only if
for every matrix norm
$$\lim_{k \to \infty} \|A^{(k)} - A\| = 0.$$
We now state and prove a result on the convergence of the sequence of powers of a matrix to
the zero matrix.
Theorem 1.7.4 The sequence $A, A^2, \ldots$ of the powers of the matrix A converges to the zero matrix
if and only if $|\lambda_i| < 1$ for each eigenvalue $\lambda_i$ of A.
Proof. For every $n \times n$ matrix A, there exists a nonsingular matrix T such that
$$T^{-1}AT = J = \begin{pmatrix} J_1 & & 0 \\ & \ddots & \\ 0 & & J_r \end{pmatrix},$$
where each $J_i$ has the form
$$J_i = \begin{pmatrix}
\lambda_i & 1 & & 0 \\
& \lambda_i & \ddots & \\
& & \ddots & 1 \\
0 & & & \lambda_i
\end{pmatrix}.$$
The above form is called the Jordan Canonical Form of A, and the diagonal block matrices are
called Jordan matrices. It is an easy computation to see that
$$J_i^k = \begin{pmatrix}
\lambda_i^k & k\lambda_i^{k-1} & \binom{k}{2}\lambda_i^{k-2} & \cdots \\
0 & \lambda_i^k & k\lambda_i^{k-1} & \cdots \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_i^k
\end{pmatrix},$$
from which we see that $J_i^k \to 0$ if and only if $|\lambda_i| < 1$. This means that $\lim_{k \to \infty} A^k = 0$ if and
only if $|\lambda_i| < 1$ for each i.
Definition 1.7.2 A matrix A is called a convergent matrix if $A^k \to 0$ as $k \to \infty$.
We now prove a sufficient condition for a matrix A to be a convergent matrix in terms of a
norm of the matrix A. We first prove the following result.
A Relationship between Norms and Eigenvalues
Theorem 1.7.5 Let $\lambda$ be an eigenvalue of a matrix A. Then for any subordinate matrix norm,
$$|\lambda| \leq \|A\|.$$
Proof. By definition, there exists a nonzero vector x such that
$$Ax = \lambda x.$$
Taking the norm of each side, we have
$$\|Ax\| = \|\lambda x\| = |\lambda|\,\|x\|.$$
However, $\|Ax\| \leq \|A\|\,\|x\|$, so $|\lambda|\,\|x\| = \|Ax\| \leq \|A\|\,\|x\|$, giving $|\lambda| \leq \|A\|$.
Definition 1.7.3 The quantity $\rho(A)$ defined by
$$\rho(A) = \max_i |\lambda_i|$$
is called the spectral radius of A.
As a particular case of Theorem 1.7.5, we have
$$\rho(A) \leq \|A\|.$$
In view of the above, we can now state:
Corollary 1.7.1 A matrix A is convergent if $\|A\| < 1$, where $\|\cdot\|$ is a subordinate
matrix norm.
Convergence of an Infinite Matrix Series
Theorem 1.7.6 The matrix series
$$I + A + A^2 + \cdots$$
converges if and only if A is a convergent matrix. When it converges, it converges to $(I - A)^{-1}$.
Proof. For the series to converge, $A^k$ must approach the zero matrix as k approaches infinity.
Thus, the necessity is obvious.
Next, let A be a convergent matrix; that is, $A^k \to 0$ as $k \to \infty$. Then from Theorem 1.7.4 we
must have $|\lambda_i| < 1$ for each eigenvalue $\lambda_i$ of A. This means that the matrix $(I - A)$ is nonsingular:
the eigenvalues of $I - A$ are $1 - \lambda_1, 1 - \lambda_2, \ldots, 1 - \lambda_n$, and $|\lambda_i| < 1$ implies that none of them is
zero (recall that a matrix is nonsingular if and only if its eigenvalues are nonzero). Thus from the
identity
$$(I - A)(I + A + A^2 + \cdots + A^k) = I - A^{k+1},$$
we have
$$I + A + A^2 + \cdots + A^k = (I - A)^{-1} - (I - A)^{-1}A^{k+1}.$$
Since A is a convergent matrix,
$$A^{k+1} \to 0 \text{ as } k \to \infty.$$
Thus as $k \to \infty$, $I + A + A^2 + \cdots + A^k \to (I - A)^{-1}$.
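The following sketch (illustrative, not from the text) shows the partial sums approaching $(I - A)^{-1}$
for a convergent matrix:

```python
import numpy as np

# Partial sums of I + A + A^2 + ... approach (I - A)^{-1} when rho(A) < 1.
A = np.array([[0.5, 0.4],
              [0.1, 0.3]])
S, term = np.eye(2), np.eye(2)
for _ in range(200):
    term = term @ A
    S += term
print(S)
print(np.linalg.inv(np.eye(2) - A))   # agrees with the partial sum
```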

1.7.4 Norms and Inverses


While analyzing the errors in an algorithm, we sometimes need to know, given a nonsingular matrix
A, how much it can be perturbed so that the perturbed matrix A + E is nonsingular, and how to
estimate the error in the inverse of the perturbed matrix.
We start with the identity matrix. In the following, k k is a matrix norm for which kI k = 1.
Theorem 1.7.7 Let $\|E\| < 1$. Then $(I - E)$ is nonsingular and
$$\|(I - E)^{-1}\| \leq (1 - \|E\|)^{-1}.$$
Proof. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of E. Then the eigenvalues of $I - E$ are
$$1 - \lambda_1, 1 - \lambda_2, \ldots, 1 - \lambda_n.$$
Since $\|E\| < 1$, $|\lambda_i| < 1$ for each i. Thus, none of the quantities $1 - \lambda_1, 1 - \lambda_2, \ldots, 1 - \lambda_n$ is
zero. This proves that $I - E$ is nonsingular. (Note that a matrix A is nonsingular if and only if all
its eigenvalues are nonzero.)
To prove the second part, we write
$$(I - E)^{-1} = I + E + E^2 + \cdots$$
Since $\|E\| < 1$,
$$\lim_{k \to \infty} E^k = 0.$$
Thus, the series on the right side is convergent. Taking the norm on both sides, we have
$$\|(I - E)^{-1}\| \leq \|I\| + \|E\| + \|E\|^2 + \cdots = (1 - \|E\|)^{-1} \quad (\text{since } \|I\| = 1).$$
(Note that the infinite series $1 + x + x^2 + \cdots$ converges to $\frac{1}{1 - x}$ if $|x| < 1$.)
Theorem 1.7.8 If $\|E\| < 1$, then
$$\|(I - E)^{-1} - I\| \leq \frac{\|E\|}{1 - \|E\|}.$$
Proof. For any two nonsingular matrices A and B, we can write
$$A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}.$$
In the above equation, substitute $B = I$ and $A = I - E$; then we have
$$(I - E)^{-1} - I = (I - E)^{-1}E.$$
(Note that $I^{-1} = I$.) Taking the norm on both sides yields
$$\|(I - E)^{-1} - I\| \leq \|(I - E)^{-1}\|\,\|E\|.$$
Now from Theorem 1.7.7 we know
$$\|(I - E)^{-1}\| \leq (1 - \|E\|)^{-1},$$
and thus we have the result.
Implication of the Result
If the matrix E is very small, then 1 ; kE k is close to unity. Thus the above result
implies that if we invert a slightly perturbed identity matrix, then the error in the
inverse of the perturbed matrix does not exceed the order of the perturbation.

Theorem 1.7.9 Let A be nonsingular and let $\|A^{-1}E\| < 1$. Then $A - E$ is nonsingular, and
$$\frac{\|A^{-1} - (A - E)^{-1}\|}{\|A^{-1}\|} \leq \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}.$$
Proof. We can write
$$A - E = A(I - A^{-1}E).$$
Since $\|A^{-1}E\| < 1$, from Theorem 1.7.7 we have that $I - A^{-1}E$ is nonsingular. Thus $A - E$, which
is the product of the nonsingular matrices A and $I - A^{-1}E$, is also nonsingular.
To prove the second part, we again recall the identity
$$A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}.$$
Substituting $B = A - E$, we then have
$$A^{-1} - (A - E)^{-1} = -A^{-1}E(A - E)^{-1}.$$
Taking the norm on each side yields
$$\|A^{-1} - (A - E)^{-1}\| \leq \|A^{-1}E\|\,\|(A - E)^{-1}\|.$$
Now, since
$$B = A - (A - B) = A[I - A^{-1}(A - B)],$$
we have
$$B^{-1} = [I - A^{-1}(A - B)]^{-1}A^{-1}.$$
(Note that $(XY)^{-1} = Y^{-1}X^{-1}$.) If we now substitute $B = A - E$, we then have
$$(A - E)^{-1} = [I - A^{-1}E]^{-1}A^{-1}.$$
Taking norms, we get
$$\|(A - E)^{-1}\| \leq \|A^{-1}\|\,\|(I - A^{-1}E)^{-1}\|.$$
But from Theorem 1.7.7, we know that
$$\|(I - A^{-1}E)^{-1}\| \leq (1 - \|A^{-1}E\|)^{-1}.$$
So, we have
$$\|(A - E)^{-1}\| \leq \frac{\|A^{-1}\|}{1 - \|A^{-1}E\|}.$$
We therefore have
$$\|A^{-1} - (A - E)^{-1}\| \leq \frac{\|A^{-1}E\|\,\|A^{-1}\|}{1 - \|A^{-1}E\|},$$
or
$$\frac{\|A^{-1} - (A - E)^{-1}\|}{\|A^{-1}\|} \leq \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}.$$
Implication of the Result
The above result states that if $\|A^{-1}E\|$ is small and much less than unity, then
the relative error in $(A - E)^{-1}$ is bounded essentially by $\|A^{-1}E\|$.
1.8 Norm Invariant Properties of Orthogonal Matrices


We conclude the chapter by listing some very useful norm properties of orthogonal matrices that
are often used in practice.
Theorem 1.8.1 Let O be an orthogonal matrix. Then
$$\|O\|_2 = 1.$$
Proof. By definition,
$$\|O\|_2 = \sqrt{\rho(O^TO)} = \sqrt{\rho(I)} = 1.$$
Theorem 1.8.2
$$\|AO\|_2 = \|A\|_2.$$
Proof.
$$\|AO\|_2 = \sqrt{\rho(O^TA^TAO)} = \sqrt{\rho(A^TA)} = \|A\|_2.$$
(Note that the spectral radius is invariant under similarity transformation.) (See Chapter 8.)
Theorem 1.8.3
$$\|AO\|_F = \|A\|_F.$$
Proof. $\|AO\|_F^2 = \operatorname{trace}(O^TA^TAO) = \operatorname{trace}(A^TA) = \|A\|_F^2$.

1.9 Review and Summary


The very basic concepts that will be required for smooth reading of the rest of the book have been
brie y summarized in this chapter. The most important ones are:
1. Special Matrices: Diagonal, triangular, orthogonal, permutation, Hessenberg, tridiagonal,
diagonally dominant, and positive de nite matrices have been de ned and important proper-
ties discussed.
2. Vector and Matrix Norms: Some important matrix norms are: row-sum norm, column-
sum norm, Frobenius-norm, and spectral norm.
A result on the relationship between di erent matrix norms is stated and proved in Theorem
1.7.1.
Of special importance are the norm properties of orthogonal matrices. Three simple but impor-
tant results have been stated and proved in Section 1.8 (Theorems 1.8.1, 1.8.2, and 1.8.3).
These results say that
(i) the spectral norm of an orthogonal matrix is 1, and
(ii) the spectral and the Frobenius norms remain invariant under multiplication by an orthogonal
matrix.
3. Convergence of a Matrix Sequence. The notion of the convergence of the sequence of
matrix powers $\{A^k\}$ is important in the study of convergence of iterative methods for linear
systems.
The most important results in this context are:
(i) The sequence $\{A^k\}$ converges to the zero matrix if and only if $|\lambda_i| < 1$ for
each eigenvalue $\lambda_i$ of A (Theorem 1.7.4).
(ii) The sequence $\{A^k\}$ converges to the zero matrix if $\|A\| < 1$ for some subordinate
matrix norm (Corollary 1.7.1).
4. Norms and Inverses. If a nonsingular matrix A is perturbed by a matrix E , it is sometimes
of interest to know if the perturbed matrix A + E remains nonsingular and how to estimate
the error in the inverse of A + E .
Three theorems (Theorems 1.7.7, 1.7.8, and 1.7.9) are proved in this context in Section 1.7.4.
These results will play an important role in the perturbation analysis of linear systems (Chap-
ter 6).

1.10 Suggestions for Further Reading


The material covered in this chapter can be found in any standard book on Linear Algebra and
Matrix Theory. In particular, we suggest the following books for further reading:
1. Matrix Theory by Joel N. Franklin, Prentice Hall, Englewood Cliffs, NJ, 1968.
2. Linear Algebra With Applications by Steven J. Leon, McMillan, New York, 1986.
3. Linear Algebra and its Applications, (Second Edition), by Gilbert Strang, Academic
Press, New York, 1980.
4. Introduction to Linear Algebra by Gilbert Strang, Wellesley-Cambridge Press, 1993.
5. The Algebraic Eigenvalue Problem by James H. Wilkinson, Clarendon Press, Oxford,
1965 (Chapter 1).
6. Matrix Analysis by Roger Horn and Charles Johnson, Cambridge University Press, 1985.
7. The Theory of Matrices by Peter Lancaster, Academic Press, New York, 1969.
8. The Theory of Matrices with Applications by Peter Lancaster and M. Tismenetsky,
2nd ed., Academic Press, Dover, New York, 1985.
9. The Theory of Matrices in Numerical Analysis by A. S. Householder, Dover Publi-
cations, Inc., New York, 1964.
10. Matrices and Linear Algebra by Hans Schneider and George Philip Barker, Dover
Publications Inc., New York, 1989.
42
11. Linear Algebra and Its Applications by David C. Lay, Addison-Wesley New York,
1994.
12. Elementary Linear Algebra with Applications by Richard O. Hill, Jr., Harcourt-
Brace-Jovanovich, 1991.

43
Exercises on Chapter 1
PROBLEMS ON SECTIONS 1.2 AND 1.3
1. Prove that
(a) a set of n linearly independent vectors in Rn is a basis for Rn.
(b) the set fe1; e2; : : :; eng is a basis of Rn.
(c) a set of m vectors in Rn, where m > n, is linearly dependent.
(d) any two bases in a vector space V have the same number of vectors.
(e) dim(Rn) = n.
(f) spanfv1; : : :; vng is a subspace of V, where spanfv1 ; : : :; vng is the set of linear combina-
tions of the n vectors v1 ; : : :; vn from a vector space V.
(g) spanfv1; : : :; vng is the smallest subspace of V containing v1 ; : : :; vn.
2. Prove that if S = fs; : : :; sk g is an orthogonal set of nonzero vectors, then S is linearly
independent.
3. Let S be an m-dimensional subspace of Rn. Then prove that S has an orthonormal basis.
(Hint: Let $\{v_1, \ldots, v_m\}$ be a basis of S.) Define a set of vectors $\{u_k\}$ by:
$$u_1 = \frac{v_1}{\|v_1\|}, \qquad u_{k+1} = \frac{v'_{k+1}}{\|v'_{k+1}\|},$$
where
$$v'_{k+1} = v_{k+1} - (v_{k+1}^Tu_1)u_1 - (v_{k+1}^Tu_2)u_2 - \cdots - (v_{k+1}^Tu_k)u_k, \qquad k = 1, 2, \ldots, m-1.$$
Then show that $\{u_1, \ldots, u_m\}$ is an orthonormal basis of S. This is the classical Gram-
Schmidt process.
4. Using the Gram-Schmidt process construct an orthonormal basis of R3 .
5. Construct an orthonormal basis of R(A), where
Schmidt process.
4. Using the Gram-Schmidt process construct an orthonormal basis of R3 .
5. Construct an orthonormal basis of R(A), where
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 4 & 5 \end{pmatrix}.$$

44
6. Let S1 and S2 be two subspaces of Rn . Then prove that
dim(S1 + S2 ) = dim(S1) + dim(S2) ; dim(S1 \ S2):

7. Prove Theorem 1.3.1 on the properties of the determinant of a matrix.


8. Prove that
(a) null(A) = 0 i A has linearly independent columns.
(b) rank(A) = rank(AT ).
(c) rank(A) + null(A) = n.
(d) if A is m  n and m < n, then rank(A)  m.
(e) if A and B are m  n and n  p matrices then
rank(AB )  minfrank(A); rank(B )g:
(f) the rank of a matrix remains unchanged when the matrix is multiplied by an invertible
matrix.
(g) if B = UAV , where U and V are invertible, then rank(B ) = rank(A).
(h) N (A) = R(AT )? and R(A)? = N (AT )
9. Let A be m  n. Then A has rank 1 i A can be written as A = abT , where a and b are
column vectors.
10. Prove the following basic facts on nonsingularity and the inverse of A:
(a) (A;1);1 = A
(b) (AT );1 = (A;1 )T
(c) (cA);1 = 1c A;1 , where c is a nonzero scalar
(d) (AB );1 = B ;1 A;1
11. Suppose a matrix A can be written as
A = LU;
where L is a lower triangular matrix with 1's along the diagonal and U = (uij ) is an upper
triangular matrix. Prove that
Y
n
det A = uii :
i=1

45
12. Let $A = \begin{pmatrix} A_1 & 0 \\ A_2 & A_3 \end{pmatrix}$, where $A_1$ and $A_3$ are square. Prove that
$\det(A) = \det(A_1)\det(A_3)$.

13. Suppose that A can be written as
$$A = LDL^T,$$
where L is lower triangular with 1's as diagonal entries and $D = \operatorname{diag}(d_{11}, \ldots, d_{nn})$ is a diag-
onal matrix. Prove that the leading principal minors (determinants) of A are
$d_{11},\ d_{11}d_{22},\ \ldots,\ d_{11}d_{22}\cdots d_{nn}$.

14. (a) Show that if PS is an orthogonal projection onto S , then I ; PS is the orthogonal
projection onto S? .
(b) Prove that
i. PA = A(AT A);1 AT .
ii. PN = I ; A(AT A);1AT .
iii. kPA k2 = 1
(c) Prove that
i. bR = PA b
ii. bN = PN b
15. (a) Find $P_A$ and $P_N$ for the matrices
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 0 & 0 \end{pmatrix}, \qquad
A = \begin{pmatrix} 1 & 1 \\ 10^{-4} & 0 \\ 0 & 10^{-4} \end{pmatrix}.$$
(b) For the vector $b = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$, find $b_R$ and $b_N$ for each of the above matrices.
(c) Find an orthonormal basis for each of the above matrices using the Gram-Schmidt
process and then nd PA ; PN ; bR , and bN . For a description of the Gram-Schmidt
algorithm, see Chapter 7 or problem #3 of this chapter.
16. Let A be an m  n matrix with rank r. Consider the Singular Value Decomposition of
A:
A = U V T
= (Ur ; U^r )(Vr ; V^r )T :
Then prove that
46
(a) Vr VrT is the orthogonal projection onto N (A)? = R(AT ).
(b) Ur UrT is the orthogonal projection onto R(A).
(c) $\hat U_r(\hat U_r)^T$ is the orthogonal projection onto $R(A)^\perp = N(A^T)$.
(d) V^r (V^r)T is the orthogonal projection onto N (A).
17. (Distance between two subspaces). Let S1 and S2 be two subspaces of Rn such that
dim(S1 ) = dim(S2). Let P1 and P2 be the orthogonal projections onto S1 and S2 respectively.
Then kP1 ; P2k2 is de ned to be the distance between S1 and S2. Prove that the distance
between S1 and S2 , dist(S1 ; S2) = sin() where  is the angle between S1 and S2 .
18. Prove that if PS is an orthogonal projection onto S , then I ; 2PS is an orthogonal projec-
tion.

PROBLEMS ON SECTIONS 1.4{1.6


19. Prove the following.
(a) The product of two upper (lower) triangular matrices is an upper (lower) triangular
matrix . In general, if A = (aij ) is an upper (lower) triangular matrix , then p(A), where
p(x) is a polynomial, is an upper (lower) triangular matrix whose diagonal elements are
p(aii); i = 1; : : :; n.
(b) The inverse of a lower (upper) triangular matrix is another lower (upper) triangular
matrix, whose diagonal entries are the reciprocals of the diagonal entries of the triangular
matrix.
(c) The determinant of a triangular matrix is the product of its diagonal entries.
(d) The eigenvalues of a triangular matrix are its diagonal entries.
(e) If A 2 Rnn is strictly upper triangular, then An = 0.
(A = (aij ) is strictly upper triangular if A is upper triangular and aii = 0 for each i.)
(f) The inverse of a nonsingular matrix A can be written as a polynomial in A (use the
Cayley-Hamilton Theorem).
20. Prove that the product of an upper Hessenberg matrix and an upper triangular matrix is an
upper Hessenberg matrix.
21. Prove that a symmetric Hessenberg matrix is symmetric tridiagonal.

47
22. A square matrix A = (aij ) is a band matrix of bandwidth 2k + 1 if ji ; j j > k implies that
aij = 0. What are the bandwidths of tridiagonal and pentadiagonal matrices? Is the product
of two banded matrices having the same bandwidth a banded matrix of the same bandwidth?
Give reasons for your answer.
23. (a) Show that the matrix
$$H = I - 2\,\frac{uu^T}{u^Tu},$$
where u is a column vector, is orthogonal. (The matrix H is called a Householder
matrix.)
(b) Show that the matrix
$$J = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \qquad \text{where } c^2 + s^2 = 1,$$
is orthogonal. (The matrix J is called a Givens matrix.)
(c) Prove that the product of two orthogonal matrices is an orthogonal matrix.
(d) Prove that a triangular matrix that is orthogonal is diagonal.
24. Let A and B be two symmetric matrices.
(a) Prove that (A + B ) is symmetric.
(b) Prove that AB is not necessarily symmetric. Derive a condition under which AB is
symmetric.
(c) If A and B are symmetric positive de nite, prove that (A + B ) is positive de nite. Is
AB also positive de nite? Give reasons for your answer. When is (A ; B) symmetric
positive de nite?
25. Let A = (aij ) be an n  n symmetric positive de nite matrix. Prove the following.
(a) Each diagonal entry of A must be positive.
(b) A is nonsingular.
(c) $a_{ij}^2 < a_{ii}a_{jj}$ for $i, j = 1, \ldots, n$, $i \neq j$.
(d) The largest element of the matrix must lie on the diagonal.
26. Let A be a symmetric positive de nite matrix and x be a nonzero n-vector. Prove that
A + xxT is positive de nite.
48
27. Prove that a diagonally dominant matrix is nonsingular, and a diagonally dominant symmetric
matrix with positive diagonal entries is positive de nite.
28. Prove that if the eigenvalues of a matrix A are all distinct, then A is nonderogatory.
29. Prove that a symmetric matrix A is positive de nite i A;1 exists and is positive de nite.
30. Let A be an m  n matrix (m  n) having full rank. Then AT A is positive de nite.
31. Prove the following basic facts on the eigenvalues and eigenvectors.
(a) A matrix A is nonsingular if and only if A does not have a zero eigenvalue.
(Hint: $\det A = \lambda_1\lambda_2\cdots\lambda_n$.)
(b) The eigenvalues of AT and A are the same.
(c) If two matrices have the same eigenvalues, they need not be similar (construct an example
to show this).
(d) A symmetric matrix is positive de nite i all its eigenvalues are positive.
(e) The eigenvalues of a triangular matrix are its diagonal elements.
(f) The eigenvalues of a unitary (orthogonal) matrix have moduli 1.
(g) The eigenvectors of a symmetric matrix are orthogonal.
(h) Let A be a symmetric matrix and let Q be orthogonal such that QAQT is diagonal.
Then show that the columns of Q are the eigenvectors of A.
32. Let
$$C = \begin{pmatrix}
c_1 & c_2 & \cdots & c_{n-1} & c_n \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}.$$
Then show that
(a) the matrix V defined by
$$V = \begin{pmatrix}
\lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \\
\lambda_1^{n-2} & \lambda_2^{n-2} & \cdots & \lambda_n^{n-2} \\
\vdots & & & \vdots \\
\lambda_1 & \lambda_2 & \cdots & \lambda_n \\
1 & 1 & \cdots & 1
\end{pmatrix},$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of C with $\lambda_i \neq \lambda_j$ for $i \neq j$,
is such that $V^{-1}CV = \operatorname{diag}(\lambda_i)$, $i = 1, \ldots, n$;
(b) the eigenvector $x_i$ corresponding to the eigenvalue $\lambda_i$ of C is given by
$x_i^T = (\lambda_i^{n-1}, \lambda_i^{n-2}, \ldots, \lambda_i, 1)$.
33. Let H be an unreduced upper Hessenberg matrix. Let X = (x1 ; : : :; xn) be de ned by
011
BB 0 CC
x1 = e1 = BBB .. CCC ; xi+1 = Hxi; i = 1; 2; : : :; n ; 1:
@.A
0
Then prove that X is nonsingular and X ;1HX is a companion matrix (upper Hessenberg
companion matrix.)
34. What are the singular values of a symmetric matrix? What are the singular values of a
symmetric positive de nite matrix? Prove that a square matrix A is nonsingular i it has no
zero singular value.
35. Prove that
(a) trace(AB) = trace(BA).
(b) $\operatorname{trace}(AA^*) = \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2$, where A = (a_{ij}) is $m \times n$.
(c) trace(A + B) = trace(A) + trace(B).
(d) $\operatorname{trace}(TAT^{-1}) = \operatorname{trace}(A)$.

PROBLEMS ON SECTIONS 1.7 AND 1.8


36. Show that kxk1; kxk1 ; kxk2 (as de ned in section 1.7 of the book) are vector norms.
37. Show that, if x and y are two vectors, then
$$\|x\| - \|y\| \leq \|x - y\| \leq \|x\| + \|y\|.$$
38. If x and y are two n-vectors, then prove that
(a) $|x^Ty| \leq \|x\|_2\|y\|_2$ (Cauchy-Schwarz inequality)
(b) $\|xy^T\|_2 \leq \|x\|_2\|y\|_2$ (Schwarz inequality)
39. Let x and y be two orthogonal vectors. Then prove that
$$\|x + y\|_2^2 = \|x\|_2^2 + \|y\|_2^2.$$
40. Prove that for any vector x, we have
$$\|x\|_\infty \leq \|x\|_2 \leq \|x\|_1.$$

41. Prove that kAk1 ; kAk1 ; kAk2 are matrix norms.


42. Let A = (aij ) be m  n. De ne A` = maxij jaij j. Is A` a matrix norm? Give reasons for your
answer.
43. (a) Prove that the vector length is preserved by orthogonal matrix multiplication. That is,
if x 2 Rn and Q 2 Rnn be orthogonal, then kQxk2 = kxk2 (Isometry Lemma).
(b) Is the statement in part (a) true if the one and in nity norms are used? Give reasons.
What if the Frobenius norm is used?
44. Prove that $\|I\|_2 = 1$. Prove that $\|I\|_F = \sqrt{n}$.
45. Prove that if Q and P are orthogonal matrices, then


(a) kQAP kF = kAkF
(b) kQAP k2 = kAk2
46. Prove that the spectral norm of a symmetric matrix is the same as its spectral radius.
47. Let $A \in R^{n \times n}$ and let x, y and z be n-vectors such that $Ax = b$ and $Ay = b + z$. Then prove
that
$$\frac{\|z\|_2}{\|A\|_2} \leq \|x - y\|_2 \leq \|A^{-1}\|_2\|z\|_2$$
(assuming that $A^{-1}$ exists).
48. Prove that kAk2  kAkF (use Cauchy-Schwarz inequality).
49. Prove that kAk2 is just the largest singular value of A. How is kA;1 k2 related with a singular
value of A?
50. Prove that kAT Ak2 = kAk22 .
51. Prove that
$$\|AB\|_F \leq \|A\|_F\|B\|_2 \quad \text{and} \quad \|AB\|_F \leq \|A\|_2\|B\|_F.$$
52. Let $A = (a_1, \ldots, a_n)$, where $a_j$ is the jth column of A. Then prove that
$$\|A\|_F^2 = \sum_{i=1}^{n} \|a_i\|_2^2.$$

53. Prove that if A and A + E are both nonsingular, then
$$\|(A + E)^{-1} - A^{-1}\| \leq \|E\|\,\|A^{-1}\|\,\|(A + E)^{-1}\|$$
(Banach Lemma). What is the implication of this result?
54. Let $A \in R^{m \times n}$ have full rank. Then prove that $A + E$ also has full rank if E is such that
$\|E\|_2 < \frac{1}{\|A^\dagger\|_2}$, where $A^\dagger = (A^TA)^{-1}A^T$.

55. Show that the matrices
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 4 & 5 & 6 \end{pmatrix}, \qquad
A = \begin{pmatrix} 1 & \tfrac12 & \tfrac13 \\ \tfrac12 & \tfrac13 & \tfrac14 \\ \tfrac13 & \tfrac14 & \tfrac15 \end{pmatrix}, \qquad
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 5 & 4 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 2 & 3 \end{pmatrix}$$
are not convergent matrices.
56. Construct a simple example where the norm test for convergent matrices fails, but still the
matrix is convergent.
57. Prove that the series $(I + A + A^2 + \cdots)$ converges if $\|B\| < 1$, where $B = PAP^{-1}$ for some
nonsingular matrix P. What is the implication of this result? Construct a simple example to see
the usefulness of the result in practical computations. (For details, see Wilkinson AEP, p. 60.)

52
2. FLOATING POINT NUMBERS AND ERRORS IN COMPUTATIONS
2.1 Floating Point Number Systems : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 53
2.2 Rounding Errors : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 56
2.3 Laws of Floating Point Arithmetic : : : : : : : : : : : : : : : : : : : : : : : : : : : : 59
2.4 Addition of n Floating Point Numbers : : : : : : : : : : : : : : : : : : : : : : : : : : 63
2.5 Multiplication of n Floating Point Numbers : : : : : : : : : : : : : : : : : : : : : : : 65
2.6 Inner Product Computation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 66
2.7 Error Bounds for Floating Point Matrix Computations : : : : : : : : : : : : : : : : : 69
2.8 Round-o Errors Due to Cancellation and Recursive Computations : : : : : : : : : : 71
2.9 Review and Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 75
2.10 Suggestions for Further Reading : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 77
CHAPTER 2
FLOATING POINT NUMBERS
AND ERRORS IN COMPUTATIONS
2. FLOATING POINT NUMBERS AND ERRORS IN COMPU-
TATIONS
2.1 Floating Point Number Systems
Because of limited storage capacity, a real number may or may not be represented exactly on a
computer. Thus, while using a computer, we have to deal with approximations of the real number
system using nite computer representations. This chapter will be con ned to the study of the
arithmetic of such approximate numbers. In particular, we will examine the widely accepted IEEE
standard for binary oating-point arithmetic (IEEE 1985).
A nonzero normalized floating point number in base 2 has the form
$$(-1)^s\,d_1.d_2d_3\cdots d_t \times 2^e,$$
that is,
$$x = \pm\,.d_1d_2\cdots d_t \times 2^e \quad \text{or} \quad x = \pm\,r \times 2^e,$$
where e is the exponent, r is the significand, $d_2d_3\cdots d_t$ is called the fraction, t is the precision,
and $(-1)^s$ is the sign. (Note that t is finite.) Here
$$d_1 = 1, \qquad d_i = 0 \text{ or } 1, \quad 2 \leq i \leq t.$$
Three parameters specify all numerical values that can be represented. These are: the precision,
t, and L and U , the minimum and maximum exponents. The numbers L and U vary among
computers, even those that adhere to the IEEE standard, since the standard recommends only
minimums. As an example, the standard recommends, for single precision, that t = 24; L = ;126,
and U = 127. The recommendation for double precision is t = 53; L = ;1022, and U = 1023.
Consider the layout of a 32-bit word:
      1 bit      8 bits      23 bits
        s           e            f
Here s is the sign of the number, e is the field for the exponent, and f is the fraction.
Note that for normalized floating point numbers in base 2, it is known that $d_1 = 1$ and can thus
be stored implicitly.
The actual storage of the exponent is accomplished by storing the true exponent plus an offset,
or bias. The bias is chosen so that e is always nonnegative. The IEEE standard also requires that
the unbiased exponent have two unseen values, L - 1 and U + 1. L - 1 is used to encode 0 and
denormalized numbers (i.e., those for which $d_1 \neq 1$). U + 1 is used to encode $\infty$ and nonnumbers,
such as $(+\infty) + (-\infty)$, which are denoted by NaN.
Note that for the single precision example given above, the bias is 127. Thus, if the biased
exponent is 255, then $\infty$ or a NaN is inferred. Likewise, if the biased exponent is 0, then 0 or
a denormalized number is inferred. The standard specifies how to determine the different cases
for the special situations. It is not important here to go into such detail. Curious readers should
consult the reference (IEEE 1985).
From the discussion above, one sees that the IEEE standard for single precision provides approx-
imately 7 decimal digits of accuracy, since $2^{-23} \approx 1.2 \times 10^{-7}$. Similarly, double precision provides
approximately 16 decimal digits of accuracy ($2^{-52} \approx 2.2 \times 10^{-16}$).
There is also an IEEE standard for floating point numbers which are not necessarily of base 2.
By allowing one to choose the base, $\beta$, we see that the set of all floating point numbers, called the
Floating Point Number System, is thus characterized by four parameters:
$\beta$ | the number base,
t | the precision,
L, U | lower and upper limits of the exponent.
This set has exactly
$$2(\beta - 1)\beta^{t-1}(U - L + 1) + 1$$
numbers in it.
We denote the set of normalized floating point numbers of precision t by $F_t$. The set $F_t$ is
NOT closed under arithmetic operations; that is, the sum, difference, product, or quotient
of two floating point numbers in $F_t$ is not necessarily a number in $F_t$. To see this, consider the
simple example in the floating point system with $\beta = 10$, $t = 3$, $L = -1$, $U = 2$:
$$a = 11.2 = .112 \times 10^2, \qquad b = 1.13 = .113 \times 10^1.$$
The product $c = a \times b = 12.656 = .12656 \times 10^2$ is not in $F_t$.
The above example shows that during the course of a computation, a computed number may
very well fall outside of $F_t$.
There are, of course, two ways a computed number can fall outside the range of $F_t$: first, the
exponent of the number may fall outside the interval [L, U]; second, the fractional part may contain
more than t digits (this is exactly what happened in the above example).
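A tiny enumeration (illustration only; it uses the toy parameter values from the example above)
confirms the count formula for the number of elements of such a system:

```python
# Enumerate the system beta=10, t=3, L=-1, U=2 and check the count
# 2(beta-1)*beta**(t-1)*(U-L+1) + 1 (normalized nonzero numbers plus zero).
beta, t, L, U = 10, 3, -1, 2
F = {0.0}
for sign in (+1, -1):
    for e in range(L, U + 1):
        for m in range(beta**(t - 1), beta**t):          # fractions .100 through .999
            F.add(sign * m * float(beta)**(e - t))
print(len(F), 2*(beta - 1)*beta**(t - 1)*(U - L + 1) + 1)  # both print 7201
```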
If the computations produce an exponent too large (too small) to fit in a given computer, then
the situation is called overflow (underflow).
Overflow is a serious problem; for most systems the result of an overflow is $\infty$. Underflow
is usually considerably less serious. The result of an underflow can be to set the value to zero, a
denormalized number, or $2^L$.
Example 2.1.1 Overflow and Underflow
1. Let $\beta = 10$, $t = 3$, $L = -3$, $U = 3$, and
$$a = .111 \times 10^3, \qquad b = .120 \times 10^3.$$
Then $c = a \times b = .133 \times 10^5$ will result in an overflow, because the exponent 5 is too large.
2. Let $\beta = 10$, $t = 3$, $L = -2$, $U = 3$, and
$$a = .1 \times 10^{-1}, \qquad b = .2 \times 10^{-1}.$$
Then $c = ab = 2 \times 10^{-4} = .2 \times 10^{-3}$ will result in an underflow, because the exponent $-3$ is
smaller than $L = -2$.
Simple mathematical computations such as finding a square root or the exponential of a number,
or computing factorials, can give overflow. For example, consider computing
$$c = \sqrt{a^2 + b^2}.$$
If a or b is very large, then we will get overflow while computing $a^2 + b^2$.
The IEEE standard also sets forth the results of operations with in nities and NaNs. All oper-
ations with in nities correspond to the limiting case in real analysis. Those ambiguous situations,
such as 0  1, result in NaNs, and all binary operations with one or two NaNs result in a NaN.

55
Computing the Length of a Vector
Overflow and underflow can sometimes be avoided just by organizing the computations differ-
ently. Consider, for example, the task of computing the length of an n-vector x with components
$x_1, \ldots, x_n$:
$$\|x\|_2^2 = x_1^2 + x_2^2 + \cdots + x_n^2.$$
If some $x_i$ is too big or too small, then we can get overflow or underflow with the usual way
of computing $\|x\|_2$. However, if we normalize each component of the vector by dividing it by
$m = \max(|x_1|, \ldots, |x_n|)$ and then form the squares and the sum, then overflow problems can be
avoided. Thus, a better way to compute $\|x\|_2$ is:
1. $m = \max(|x_1|, \ldots, |x_n|)$
2. $y_i = x_i/m, \quad i = 1, \ldots, n$
3. $\|x\|_2 = m\sqrt{y_1^2 + y_2^2 + \cdots + y_n^2}$
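The scaling idea can be sketched as follows (the helper function below is an illustration, not from
the text):

```python
import math

# Scale by the largest magnitude before squaring to avoid overflow/underflow.
def safe_norm2(x):
    m = max(abs(xi) for xi in x)
    if m == 0.0:
        return 0.0
    return m * math.sqrt(sum((xi / m)**2 for xi in x))

print(safe_norm2([3e200, 4e200]))   # 5e200; squaring these naively overflows to inf
```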
2.2 Rounding Errors
If a computed result of a given real number is not machine representable, then there are two ways
it can be represented in the machine. Consider
$$\pm\,.d_1d_2\cdots d_td_{t+1}\cdots \times \beta^e.$$
The first method, chopping, is the method in which the digits from $d_{t+1}$ on are simply
chopped off. The second method is rounding, in which the digits from $d_{t+1}$ on are not
only chopped off, but the digit $d_t$ is also rounded up or down depending on whether $d_{t+1} \geq \beta/2$ or
$d_{t+1} < \beta/2$.
Let (x) denote the oating point representation of a real number x.
Example 2.2.1 Rounding
Consider base 10. Let x = 3:141596
t=2
(x) = 3:1
t=3
(x) = 3:14
t=4
(x) = 3:142

56
We now give an expression to measure the error made in representing a real number x on the
computer, and then show how this measure can be used to give bounds for errors in other floating
point computations.
Definition 2.2.1 Let $\hat x$ denote an approximation of x. Then there are two ways we can measure
the error:
$$\text{Absolute Error} = |\hat x - x|, \qquad \text{Relative Error} = \frac{|\hat x - x|}{|x|}, \quad x \neq 0.$$
Note that the relative error makes more sense than the absolute error. The following
simple example shows this:
Example 2.2.2
Consider
$$x_1 = 1.31, \quad \hat x_1 = 1.30 \qquad \text{and} \qquad x_2 = 0.12, \quad \hat x_2 = 0.11.$$
The absolute errors in both cases are the same:
$$|\hat x_1 - x_1| = |\hat x_2 - x_2| = 0.01.$$
On the other hand, the relative error in the first case is
$$\frac{|\hat x_1 - x_1|}{|x_1|} = 0.0076335,$$
and the relative error in the second case is
$$\frac{|\hat x_2 - x_2|}{|x_2|} = 0.0833333.$$
Thus, the relative errors show that $\hat x_1$ is closer to $x_1$ than $\hat x_2$ is to $x_2$, whereas the absolute errors
give no indication of this at all.
The relative error gives an indication of the number of significant digits in an approximate
answer. If the relative error is about $10^{-s}$, then x and $\hat x$ agree to about s significant digits. More
specifically,
Definition 2.2.2 $\hat x$ is said to approximate x to s significant digits if s is the largest non-negative
integer for which the relative error satisfies $\frac{|x - \hat x|}{|x|} < 5 \times 10^{-s}$; that is, s is given by
$s = \left\lfloor -\log_{10}\frac{|x - \hat x|}{|x|} + \frac{1}{2}\right\rfloor$.
57
Thus, in the above examples, x^1 and x1 agree to two signi cant digits, while x^2 and x2 agree
to about only one signi cant digit.
We now give an expression for the relative error in representing a real number x by its floating
point representation fl(x).
Theorem 2.2.1 Let fl(x) denote the floating point representation of a real number x. Then
$$\frac{|fl(x) - x|}{|x|} \leq \mu = \begin{cases} \tfrac{1}{2}\beta^{1-t} & \text{for rounding,} \\ \beta^{1-t} & \text{for chopping.} \end{cases} \qquad (2.2.1)$$
Proof. We establish the bound for rounding and leave the other part as an exercise.
Let x be written as
$$x = (.d_1d_2\cdots d_td_{t+1}\cdots) \times \beta^e,$$
where $d_1 \neq 0$ and $0 \leq d_i < \beta$. When we round off x we obtain one of the following floating point
numbers:
$$x' = (.d_1d_2\cdots d_t) \times \beta^e, \qquad x'' = [(.d_1d_2\cdots d_t) + \beta^{-t}] \times \beta^e.$$
Obviously we have $x \in (x', x'')$. Assume, without any loss of generality, that x is closer to $x'$.
We then have
$$|x - x'| \leq \frac{1}{2}|x' - x''| = \frac{1}{2}\beta^{e-t}.$$
Thus, the relative error
$$\frac{|x - x'|}{|x|} \leq \frac{\frac{1}{2}\beta^{-t}}{.d_1d_2\cdots d_t\cdots} \leq \frac{\frac{1}{2}\beta^{-t}}{\beta^{-1}}
\quad (\text{since } d_1 \geq 1 \text{ implies } .d_1d_2\cdots \geq \beta^{-1})
= \frac{1}{2}\beta^{1-t}.$$
Example 2.2.3

58
Consider the three digit representation of the decimal number x = 0.2346 ($\beta = 10$, t = 3).
Then, if rounding is used, we have:
$$fl(x) = 0.235, \qquad \text{Relative Error} = 0.001705 < \tfrac{1}{2} \times 10^{-2}.$$
Similarly, if chopping is used, we have:
$$fl(x) = 0.234, \qquad \text{Relative Error} = 0.0025575 < 10^{-2}.$$
Definition: The number $\mu$ in (2.2.1) is called the machine precision or unit roundoff error.
It is the smallest positive floating point number such that
$$fl(1 + \mu) > 1.$$
$\mu$ is usually between $10^{-16}$ and $10^{-6}$ (on most machines), for double and single precision, respec-
tively. For the IBM 360 and 370, $\beta = 16$, $t = 6$, $\mu = 4.77 \times 10^{-7}$.
The machine precision is very important in scientific computations. If the particulars $\beta$, t, L
and U for a computer are not known, the following simple FORTRAN program can be run to
estimate $\mu$ for that computer (Forsythe, Malcolm and Moler (1977), p. 14).
C     Halve MEU until adding it to 1.0 no longer changes the result;
C     the final MEU approximates the machine precision.
      REAL MEU, MEU1
      MEU = 1.0
   10 MEU = 0.5*MEU
      MEU1 = MEU + 1.0
      IF (MEU1 .GT. 1.0) GO TO 10

The above FORTRAN program computes an approximation of $\mu$ which differs from $\mu$ by at most a
factor of 2. This approximation is quite acceptable, since an exact value of $\mu$ is not that important
and is seldom needed.
The book by Forsythe, Malcolm and Moler (CMMC 1977) also contains an extensive list of L
and U for various computers.

2.3 Laws of Floating Point Arithmetic
The formula
$$\frac{|fl(x) - x|}{|x|} \leq \mu = \begin{cases} \beta^{1-t} & \text{for chopping,} \\ \tfrac{1}{2}\beta^{1-t} & \text{for rounding} \end{cases}$$
can be written as
$$fl(x) = x(1 + \delta), \qquad \text{where } |\delta| \leq \mu.$$
Assuming that the IEEE standard holds, we can easily derive the following simple laws of floating
point arithmetic.
Theorem 2.3.1 Let x and y be two floating point numbers, and let $fl(x + y)$, $fl(x - y)$, $fl(xy)$,
and $fl(x/y)$ denote the computed sum, difference, product and quotient. Then
1. $fl(x \pm y) = (x \pm y)(1 + \delta)$, where $|\delta| \leq \mu$.
2. $fl(xy) = (xy)(1 + \delta)$, where $|\delta| \leq \mu$.
3. If $y \neq 0$, then $fl(x/y) = (x/y)(1 + \delta)$, where $|\delta| \leq \mu$.
On computers that do not use the IEEE standard, the following floating point law
of addition might hold:
4. $fl(x + y) = x(1 + \delta_1) + y(1 + \delta_2)$, where $|\delta_1| \leq \mu$ and $|\delta_2| \leq \mu$.

Example 2.3.1 Simple Floating Point Operations with Rounding
Let $\beta = 10$, $t = 3$ in examples 1 through 3.
1. $x = .999 \times 10^2$, $y = .111 \times 10^0$.
$$x + y = 100.011 = .100011 \times 10^3, \qquad fl(x + y) = .100 \times 10^3.$$
Thus, $fl(x + y) = (x + y)(1 + \delta)$, where
$$\delta = -1.0999 \times 10^{-4}, \qquad |\delta| < \tfrac{1}{2}(10^{-2}).$$
2. $x = .999 \times 10^2$, $y = .111 \times 10^0$.
$$xy = 11.0889, \qquad fl(xy) = .111 \times 10^2.$$
Thus, $fl(xy) = xy(1 + \delta)$, where
$$\delta = 1.00100 \times 10^{-3}, \qquad |\delta| \leq \tfrac{1}{2}(10^{1-3}).$$
3. $x = .999 \times 10^2$, $y = .111 \times 10^0$.
$$\frac{x}{y} = 900, \qquad fl\!\left(\frac{x}{y}\right) = .900 \times 10^3, \qquad \delta = 0.$$
4. Let $\beta = 10$, $t = 4$, $x = 0.1112$, $y = .2245 \times 10^5$. Then
$$xy = 2496.44, \qquad fl(xy) = .2496 \times 10^4.$$
Thus, $|fl(xy) - xy| = .44$ and
$$|\delta| = 1.7625 \times 10^{-4} < \tfrac{1}{2} \times 10^{-3}.$$
Computing Without a Guard-Digit


Theorem 2.3.1 and the examples following this theorem show that the relative errors in com-
puting the sum, di erence, product and quotient in oating point arithmetic are small. However,
there are computers without guard digits, such as the Cybers and the current CRAYS (the CRAY
arithmetic is changing), in which additions and subtractions may not be accurate. We describe this
aspect in some detail below.
A guard digit is an extra digit on the lower end of the arithmetic register whose purpose is
to catch the low order digit which would otherwise be pushed out of existence when the decimal
points are aligned. The following example shows the di erence between two models.
Examples of oating point additions
Let
= 10; t = 3;  = 0:001
61
Example 2.3.2 Addition with a Guard Digit
$$x = 0.101 \times 10^2, \qquad y = -0.994 \times 10^1.$$
Step 1. Align the two numbers (the last position is the guard digit):
$$x = 0.101\,0 \times 10^2, \qquad y = -0.099\,4 \times 10^2.$$
Step 2. Add (with the extra digit):
$$0.1010 \times 10^2 - 0.0994 \times 10^2 = 0.0016 \times 10^2.$$
Step 3. Normalize:
$$fl(x + y) = 0.160 \times 10^0.$$
Result: $fl(x + y) = (x + y)(1 + \delta)$ with $\delta = 0$.
Example 2.3.3 Addition without a Guard Digit
$$x = 0.101 \times 10^2, \qquad y = -0.994 \times 10^1.$$
Step 1. Align the two numbers:
$$x = 0.101 \times 10^2, \qquad y = -0.099[4] \times 10^2.$$
The low order digit [4] is pushed out!
Step 2. Add:
$$0.101 \times 10^2 - 0.099 \times 10^2 = 0.002 \times 10^2.$$
Step 3. Normalize:
$$fl(x + y) = 0.200 \times 10^0.$$
Result: $fl(x + y) = (x + y)(1 + \delta)$ with $\delta = 0.25 = 250\mu$.

62
Thus, we repeat that for computers with a guard digit,
$$fl(x \pm y) = (x \pm y)(1 + \delta), \qquad |\delta| \leq \mu.$$
However, for those without a guard digit,
$$fl(x \pm y) = x(1 + \delta_1) \pm y(1 + \delta_2), \qquad |\delta_1| \leq \mu, \quad |\delta_2| \leq \mu.$$

A FINAL REMARK: Throughout this book, we will assume that the computations have been
performed with a guard digit, as they are on almost all available machines.
We shall call results 1 through 3 of Theorem 2.3.1, along with (2.2.1), the fundamental laws
of floating point arithmetic. These fundamental laws form the basis for establishing bounds for
floating point computations.
For example, consider the floating point computation of $x(y + z)$:
$$fl(x(y + z)) = [x \cdot fl(y + z)](1 + \delta_1) = x(y + z)(1 + \delta_2)(1 + \delta_1)
= x(y + z)(1 + \delta_1\delta_2 + \delta_1 + \delta_2) \approx x(y + z)(1 + \delta_3),$$
where $\delta_3 = \delta_1 + \delta_2$; since $\delta_1$ and $\delta_2$ are small, their product is neglected.
We can now easily establish the bound of $\delta_3$. Suppose $\beta = 10$, and that rounding is used. Then
$$|\delta_3| = |\delta_1 + \delta_2| \leq |\delta_1| + |\delta_2| \leq \tfrac{1}{2} \times 10^{1-t} + \tfrac{1}{2} \times 10^{1-t} = 10^{1-t}.$$
Thus, the relative error due to round-off in computing $fl(x(y + z))$ is about $10^{1-t}$ in the
worst case.
2.4 Addition of n Floating Point Numbers
Consider adding n floating point numbers $x_1, x_2, \ldots, x_n$ with rounding. Define $s_2 = fl(x_1 + x_2)$.
Then
$$s_2 = fl(x_1 + x_2) = (x_1 + x_2)(1 + \delta_2),$$
where $|\delta_2| \leq \frac{1}{2}\beta^{1-t}$; that is, $s_2 - (x_1 + x_2) = \delta_2(x_1 + x_2)$. Define $s_3, s_4, \ldots, s_n$ recursively by
$$s_{i+1} = fl(s_i + x_{i+1}), \qquad i = 2, 3, \ldots, n - 1.$$
Then $s_3 = fl(s_2 + x_3) = (s_2 + x_3)(1 + \delta_3)$. That is,
$$s_3 - (x_1 + x_2 + x_3) = (x_1 + x_2)\delta_2 + (x_1 + x_2)(1 + \delta_2)\delta_3 + x_3\delta_3
\approx (x_1 + x_2)\delta_2 + (x_1 + x_2 + x_3)\delta_3$$
(neglecting the term $\delta_2\delta_3$, which is small, and so on). Thus, by induction we can show that
$$s_n - (x_1 + x_2 + \cdots + x_n) \approx (x_1 + x_2)\delta_2 + (x_1 + x_2 + x_3)\delta_3 + \cdots + (x_1 + x_2 + \cdots + x_n)\delta_n$$
(again neglecting the terms $\delta_i\delta_j$, which are small).
The above can be written as
$$s_n - (x_1 + x_2 + \cdots + x_n) \approx x_1(\delta_2 + \delta_3 + \cdots + \delta_n)
+ x_2(\delta_2 + \cdots + \delta_n) + x_3(\delta_3 + \cdots + \delta_n) + \cdots + x_n\delta_n,$$
where each $|\delta_i| \leq \frac{1}{2}\beta^{1-t} = \mu$. Defining $\delta_1 = 0$, we can write:
Theorem 2.4.1 Let $x_1, x_2, \ldots, x_n$ be n floating point numbers. Then
$$fl(x_1 + x_2 + \cdots + x_n) - (x_1 + x_2 + \cdots + x_n)
\approx x_1(\delta_1 + \delta_2 + \cdots + \delta_n) + x_2(\delta_2 + \cdots + \delta_n) + \cdots + x_n\delta_n,$$
where each $|\delta_i| \leq \mu$, $i = 1, 2, \ldots, n$.

Remark: From the above formula we see that we should expect a smaller error, in general, when
adding n floating point numbers in ascending order of magnitude:
$$|x_1| \leq |x_2| \leq |x_3| \leq \cdots \leq |x_n|.$$
If the numbers are arranged in ascending order of magnitude, then the larger errors will be associ-
ated with the smaller numbers.
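The effect of summation order can be illustrated as follows (illustration only; single precision
float32 stands in for working precision, and the data are made up for the demonstration):

```python
import numpy as np

# Summing the small terms before the large one loses less accuracy
# than adding them, one by one, to an already large running sum.
big = np.float32(1.0e7)
small = np.full(10000, np.float32(0.1), dtype=np.float32)

ascending = np.float32(small.sum(dtype=np.float32)) + big  # small terms first
descending = big
for s in small:                                            # each 0.1 is swamped by 1.0e7
    descending = np.float32(descending + s)

print(ascending, descending)  # the ascending-order sum is close to 1.0001e7
```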

2.5 Multiplication of n Floating Point Numbers
Proceeding as in the case of addition of n floating point numbers in the last section, it can be shown
that
Theorem 2.5.1
$$fl(x_1 \cdot x_2 \cdots x_n) = (1 + \eta)\prod_{i=1}^{n} x_i,$$
where $|\eta| = |(1 + \delta_2)(1 + \delta_3)\cdots(1 + \delta_n) - 1|$ and $|\delta_i| \leq \mu$, $i = 2, \ldots, n$.
A bound for $\eta$
Assuming that $(n - 1)\mu < .01$, we will prove that $\eta < 1.06(n - 1)\mu$. (This assumption is
quite realistic; on most machines it will hold for fairly large values of n.)
Since $|\delta_i| \leq \mu$, we have $\eta \leq (1 + \mu)^{n-1} - 1$. Again, since
$$\ln(1 + \mu)^{n-1} = (n - 1)\ln(1 + \mu) < (n - 1)\mu,$$
we have
$$(1 + \mu)^{n-1} < e^{(n-1)\mu}.$$
Thus,
$$(1 + \mu)^{n-1} - 1 < e^{(n-1)\mu} - 1 = (n - 1)\mu + \frac{((n - 1)\mu)^2}{2} + \cdots
= (n - 1)\mu\left(1 + \frac{(n - 1)\mu}{2} + \frac{((n - 1)\mu)^2}{6} + \cdots\right)
< (n - 1)\mu\left(1 + \frac{0.05}{1 - 0.05}\right).$$
(Note that $(n - 1)\mu < .01$.)
Thus,
$$\eta \leq (1 + \mu)^{n-1} - 1 < (n - 1)\mu\left(1 + \frac{0.05}{1 - 0.05}\right) < 1.06(n - 1)\mu. \qquad (2.5.1)$$
Thus, combining Theorem 2.5.1 and (2.5.1), we can write
Theorem 2.5.2 The relative error in computing the product of n floating point
numbers is at most $1.06(n - 1)\mu$, assuming that $(n - 1)\mu < .01$.
2.6 Inner Product Computation
A frequently arising computational task in numerical linear algebra is the computation of the inner
product of two n-vectors x and y:
$$x^Ty = x_1y_1 + x_2y_2 + \cdots + x_ny_n, \qquad (2.6.1)$$
where $x_i$ and $y_i$, $i = 1, \ldots, n$, are the components of x and y.
Let $x_i$ and $y_i$, $i = 1, \ldots, n$, be floating-point numbers. Define
$$S_1 = fl(x_1y_1), \qquad (2.6.2)$$
$$S_2 = fl(S_1 + fl(x_2y_2)), \qquad (2.6.3)$$
$$\vdots$$
$$S_k = fl(S_{k-1} + fl(x_ky_k)), \qquad k = 3, 4, \ldots, n. \qquad (2.6.4)$$
We then have, using Theorem 2.3.1,
$$S_1 = x_1y_1(1 + \delta_1), \qquad (2.6.5)$$
$$S_2 = [S_1 + x_2y_2(1 + \delta_2)](1 + \epsilon_2), \qquad (2.6.6)$$
$$\vdots$$
$$S_n = [S_{n-1} + x_ny_n(1 + \delta_n)](1 + \epsilon_n), \qquad (2.6.7)$$
where each $|\delta_i| \leq \mu$ and $|\epsilon_i| \leq \mu$. Substituting the values of $S_1$ through $S_{n-1}$ in $S_n$ and making
some rearrangements, we can write
$$S_n = \sum_{i=1}^{n} x_iy_i(1 + e_i), \qquad (2.6.8)$$
where
$$1 + e_i = (1 + \delta_i)(1 + \epsilon_i)(1 + \epsilon_{i+1})\cdots(1 + \epsilon_n)
\approx 1 + \delta_i + \epsilon_i + \epsilon_{i+1} + \cdots + \epsilon_n \qquad (\epsilon_1 = 0) \qquad (2.6.9)$$
(ignoring the products $\delta_i\epsilon_j$ and $\epsilon_j\epsilon_k$, which are small).
For example, when n = 2, it is easy to check that
$$S_2 = x_1y_1(1 + e_1) + x_2y_2(1 + e_2), \qquad (2.6.10)$$
where $1 + e_1 \approx 1 + \delta_1 + \epsilon_2$ and $1 + e_2 \approx 1 + \delta_2 + \epsilon_2$ (neglecting the products $\delta_1\epsilon_2$ and $\delta_2\epsilon_2$,
which are small).
As in the last section, it can be shown (see Forsythe and Moler CSLAS, pp. 92-93) that if
$n\mu < 0.01$, then
$$|e_i| \leq 1.01(n + 1 - i)\mu, \qquad i = 1, 2, \ldots, n. \qquad (2.6.11)$$
From (2.6.8) and (2.6.11), we have
$$|fl(x^Ty) - x^Ty| \leq \sum_{i=1}^{n} |x_i|\,|y_i|\,|e_i| \leq 1.01\,n\mu\,|x|^T|y| \leq 1.01\,n\mu\,\|x\|_2\|y\|_2$$
(using the Cauchy-Schwarz inequality (Chapter 1, Section 1.7)),
where $|x| = (|x_1|, |x_2|, \ldots, |x_n|)^T$ and $|y|$ is defined similarly.
Theorem 2.6.1
$$|fl(x^Ty) - x^Ty| \leq c\,n\mu\,|x|^T|y| \leq c\,n\mu\,\|x\|_2\|y\|_2,$$
where c is a constant of order unity.

Computing Inner Product in Double Precision


While talking about inner product computation, let's mention that since most computers allow
double precision computations, it is recommended that the inner product be computed in double
precision (using 2t digits arithmetic) to retain greater accuracy. The rationale here is that if we use
single precision to compute xT y , then there will be (2n ; 1) single precision rounding errors (one
for each multiplication and each addition). A better strategy is to convert each xi and yi to double
precision by extending their mantissa with zeros, multiply them in double precision, add them in
double precision and, nally, round the nal result in single precision. This process is known as
accumulation of inner product in double precision (or extended precision). We summarize
the process in the following.
67
Accumulation of Inner Product in Double Precision
1. Convert each xi and yi in double precision.
2. Compute the individual products xi yi in double precision.
X
n
3. Compute the sum xiyi in double precision.
i=1
4. Round the sum in single precision.
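A minimal sketch of this idea (illustration only; float64 and float32 stand in for double and single
precision here, and the vectors are randomly generated):

```python
import numpy as np

# Accumulate the products and the running sum in float64, then round the
# final result back to float32 (the assumed working precision).
x = np.random.rand(10000).astype(np.float32)
y = np.random.rand(10000).astype(np.float32)

single = np.float32(0.0)
for xi, yi in zip(x, y):                      # every product and sum rounded in float32
    single = np.float32(single + np.float32(xi * yi))

accumulated = np.float32(np.dot(x.astype(np.float64), y.astype(np.float64)))
exact = np.dot(x.astype(np.float64), y.astype(np.float64))
print(abs(single - exact), abs(accumulated - exact))  # accumulated result is closer
```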
The process gives low round-off error at little extra cost. It can be shown (Wilkinson AEP, pp.
117-118) that the error in this case is essentially independent of n. Specifically, it can be
shown that if the inner product is accumulated in double precision and $fl_2(x^Ty)$ denotes the result
of such computations, then
Theorem 2.6.2
$$|fl_2(x^Ty) - x^Ty| \leq c\,\mu\,|x^Ty|,$$
unless severe cancellation takes place in any of the terms of $x^Ty$.
Remark: The last sentence in Theorem 2.6.2 is important. One can construct a very simple
example (Exercise #6(b)) to see that if cancellation takes place, the conclusion of Theorem 2.6.2
does not hold. The phenomenon of catastrophic cancellation is discussed in the next
section.
2.7 Error Bounds for Floating Point Matrix Computations
Theorem 2.7.1 Let $|M| = (|m_{ij}|)$. Let A and B be two floating point matrices
and c a floating point number. Then
1. $fl(cA) = cA + E$, $|E| \leq \mu|cA|$.
2. $fl(A + B) = (A + B) + E$, $|E| \leq \mu|A + B|$.
If A and B are two matrices compatible for matrix multiplication, then
3. $fl(AB) = AB + E$, $|E| \leq n\mu|A|\,|B| + O(\mu^2)$.
Proof. See Wilkinson AEP, pp. 114-115, Golub and Van Loan MC (1989, p. 66).

Meaning of $O(\mu^2)$
In the above expression the notation $O(\mu^2)$ stands for a complicated expression that is bounded
by $c\mu^2$, where c is a constant depending upon the problem. The expression $O(\mu^2)$ will be used
frequently in this book.

Remark: The last result shows that the matrix multiplication in oating point arithmetic can be
very inaccurate, since jAj jB j may be much larger than jAB j itself (exercise #9). For this reason,
whenever possible, while computing matrix-matrix or matrix-vector product, accumu-
lation of inner products in double precision should be used, because in this case the
entries of the error matrix can be shown to be bounded predominantly by the entries
of the matrix jABj, rather than those of jAjjBj; see Wilkinson AEP, p. 118.
Error Bounds in Terms of Norms

Traditionally, for matrix computations the bounds for error matrices are given in terms of the norms of the matrices, rather than in terms of absolute values of the matrices as given above. Here we rewrite the bound for the error matrix of matrix multiplication using norms, for easy reference later in the book. We must note, however, that entry-wise error bounds are more meaningful than norm-wise errors (see remarks in Section 3.2).

Consider again the equation:

    fl(AB) = AB + E,   |E| ≤ nμ |A| |B| + O(μ^2).

Since ||E|| ≤ || |E| ||, we may rewrite the equation as:

    fl(AB) = AB + E,

where

    ||E|| ≤ || |E| || ≤ nμ || |A| || || |B| || + O(μ^2).

In particular, for the ||·||_2 and ||·||_∞ norms, we have

    ||E||_∞ ≤ nμ ||A||_∞ ||B||_∞ + O(μ^2)
    ||E||_2 ≤ n^2 μ ||A||_2 ||B||_2 + O(μ^2).

Theorem 2.7.2  fl(AB) = AB + E, where ||E||_2 ≤ n^2 μ ||A||_2 ||B||_2 + O(μ^2).

Two Important Special Cases

A. Matrix-vector multiplication

Corollary 2.7.1 If b is a vector, then from above we have

    fl(Ab) = Ab + e,

where

    ||e||_2 ≤ n^2 μ ||A||_2 ||b||_2 + O(μ^2).

(See also Problem #11 and the remarks made there.)

B. Matrix multiplication by an orthogonal matrix

Corollary 2.7.2 Let A ∈ R^(n×n) and let Q ∈ R^(n×n) be orthogonal. Then

    fl(QA) = Q(A + E),

where ||E||_2 ≤ n^2 μ ||A||_2 + O(μ^2).
Implication of the above result
The result says that, although matrix multiplication can be inaccurate in general, if one of the matrices is orthogonal then the floating point matrix multiplication gives only a small and acceptable error. As we will see in later chapters, this result forms the basis of many numerically viable algorithms discussed in this book.
For example, the following result, to be used very often in this book, forms the basis of the QR factorization of a matrix A (see Chapter 5) and is a consequence of the above result.

Corollary 2.7.3 Let P be an orthogonal matrix defined by

    P = I − 2 u u^T / (u^T u),

where u is a column vector. Let P̂ be the computed version of P in floating point arithmetic. Then

    fl(P̂A) = P(A + E),

where

    ||E||_2 ≤ c n^2 μ ||A||_2

and c is a constant of order unity.

Moreover, if the inner products are accumulated in double precision, then the bound will be independent of n^2.
Proof. (See Wilkinson AEP, pp. 152-160).
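A short sketch (not from the book) of Corollary 2.7.2: multiply a random matrix by an orthogonal matrix in float32 (standing in for working precision), recover the backward error E in double precision, and compare with the bound n^2 μ. The matrices and sizes are only illustrative.

```python
# Sketch: floating point multiplication by an orthogonal matrix is very accurate.
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # an orthogonal matrix

u = np.finfo(np.float32).eps / 2                   # unit roundoff of float32
computed = (Q.astype(np.float32) @ A.astype(np.float32)).astype(np.float64)
E = Q.T @ computed - A                             # fl(QA) = Q(A + E)

print("||E||_2 / ||A||_2 :", np.linalg.norm(E, 2) / np.linalg.norm(A, 2))
print("bound n^2 * u     :", n**2 * u)
```

The observed relative backward error is tiny and comfortably below the bound of the corollary.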

2.8 Round-off Errors Due to Cancellation and Recursive Computations

Intuitively, it is clear that if a large number of floating point computations is done, then the accumulated error can be quite large. However, round-off error can be disastrous even at a single step of computation. For example, consider the subtraction of two numbers:

    x = .54617
    y = .54601

The exact value is

    d = x − y = .00016.

Suppose now we use four-digit arithmetic with rounding. Then we have

    x̂ = .5462   (correct to four significant digits)
    ŷ = .5460   (correct to five significant digits)
    d̂ = x̂ − ŷ = .0002.

How good is the approximation of d̂ to d? The relative error is

    |d − d̂| / |d| = .25   (quite large!)

What happened above is the following. In four-digit arithmetic, the numbers .5462 and .5460 are of almost the same size. So, when the second number was subtracted from the first, the most significant digits canceled and only the very least significant digit was left in the answer. This phenomenon, known as catastrophic cancellation, occurs when two numbers of approximately the same size are subtracted. Fortunately, in many cases catastrophic cancellation can be avoided. For example, consider the case of solving the quadratic equation:

    a x^2 + b x + c = 0,   a ≠ 0.

The usual way the two roots x_1 and x_2 are computed is:

    x_1 = (−b + √(b^2 − 4ac)) / (2a)
    x_2 = (−b − √(b^2 − 4ac)) / (2a).

It is clear from the above that if a, b, and c are numbers such that −b is about the same size as √(b^2 − 4ac) (with respect to the arithmetic used), then a catastrophic cancellation will occur in computing x_2 and, as a result, the computed value of x_2 can be completely erroneous.
Example 2.8.1
As an illustration, take a = 1, b = −10^5, c = 1 (Forsythe, Malcolm and Moler CMMC, pp. 20-22). Then using β = 10, t = 8, L = −U = −50, we see that

    x_1 = (10^5 + √(10^10 − 4)) / 2 = 10^5      (true answer)
    x_2 = (10^5 − 10^5) / 2 = 0                 (completely wrong).

The true x_2 = 0.000010000000001 (correctly rounded to 11 significant digits). The catastrophic cancellation took place in computing x_2, since −b and √(b^2 − 4ac) are of the same order. Note that in 8-digit arithmetic, √(10^10 − 4) = 10^5.
How Cancellation Can be Avoided
Cancellation can be avoided if an equivalent pair of formulas is used:

    x_1 = −(b + sign(b) √(b^2 − 4ac)) / (2a)
    x_2 = c / (a x_1),

where sign(b) is the sign of b. Using these formulas, we easily see that:

    x_1 = 100000.00
    x_2 = 1.0000000 / 100000.00 = 0.000010000
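A small Python sketch (not from the book) of the two formulas, using float32 to mimic the book's limited-precision arithmetic; the function names are purely illustrative.

```python
# Sketch: naive vs. cancellation-free quadratic formulas on x^2 - 1e5 x + 1 = 0.
import numpy as np

def roots_naive(a, b, c, dtype=np.float32):
    a, b, c = dtype(a), dtype(b), dtype(c)
    d = dtype(np.sqrt(b * b - dtype(4) * a * c))
    return (-b + d) / (dtype(2) * a), (-b - d) / (dtype(2) * a)

def roots_stable(a, b, c, dtype=np.float32):
    a, b, c = dtype(a), dtype(b), dtype(c)
    d = dtype(np.sqrt(b * b - dtype(4) * a * c))
    x1 = -(b + dtype(np.sign(b)) * d) / (dtype(2) * a)
    return x1, c / (a * x1)

a, b, c = 1.0, -1.0e5, 1.0
print("naive :", roots_naive(a, b, c))    # second root comes out as 0.0
print("stable:", roots_stable(a, b, c))   # second root is about 1.0e-5
```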
Example 2.8.2
For yet another example of how cancellation can lead to inaccuracy, consider the problem of evaluating

    f(x) = e^x − x − 1   at x = .01.

Using five-digit arithmetic, the correct answer is .000050167. If f(x) is evaluated directly from the expression, we have

    f(.01) = 1.0101 − (.01) − 1 = .0001
    Relative Error = (.0001 − .000050167) / .000050167 ≈ .99,

indicating that we cannot trust even the first significant digit.
Fortunately, cancellation can again be avoided using the convergent series

    e^x = 1 + x + x^2/2! + x^3/3! + ···

In this case we have

    e^x − x − 1 = (1 + x + x^2/2! + x^3/3! + ···) − x − 1
                = x^2/2! + x^3/3! + x^4/4! + ···

For x = .01, this formula gives

    (.01)^2/2! + (.01)^3/3! + (.01)^4/4! + ···
    = .00005 + .000000166666 + .0000000004166 + ···
    = .000050167   (correct to five significant figures)
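The following sketch (not from the book) repeats the experiment in Python, with float32 standing in for the five-digit arithmetic of the text.

```python
# Sketch: e^x - x - 1 at x = 0.01, directly vs. via the series x^2/2! + x^3/3! + ...
import numpy as np
import math

x32 = np.float32(0.01)
direct = np.exp(x32) - x32 - np.float32(1.0)

series = np.float32(0.0)
for k in range(2, 8):                               # x^2/2! + ... + x^7/7!
    series += np.float32(x32**k / math.factorial(k))

exact = math.expm1(0.01) - 0.01                     # reference, about 5.0167e-05
print("direct :", direct, " relative error:", abs(float(direct) - exact) / exact)
print("series :", series, " relative error:", abs(float(series) - exact) / exact)
```

The direct evaluation loses several digits to cancellation; the series evaluation is accurate to roughly the precision of float32.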
Remark: Note that if x were negative, then use of the convergent series for e^x would not have helped. For example, to compute e^x for a negative value of x, cancellation can be avoided by using:

    e^(−x) = 1 / e^x = 1 / (1 + x + x^2/2! + x^3/3! + ···).

Recursive Computations
In the above examples, we saw how subtractive cancellations can give inaccurate answers. There are, however, other common sources of round-off errors, e.g., recursive computations, which are computations performed recursively so that the computation of one step depends upon the results of previous steps. In such cases, even if the error made in the first step is negligible, due to the accumulation and magnification of error at every step, the final error can be quite large, giving a completely erroneous answer.
Certain recursions propagate errors in very unhealthy fashions. Consider a very nice example involving recursive computations, again from the book by Forsythe, Malcolm, and Moler CMMC [pp. 16-17].
Example 2.8.3
Suppose we need to compute the integral

    E_n = ∫_0^1 x^n e^(x−1) dx

for different values of n. Integrating by parts gives

    E_n = ∫_0^1 x^n e^(x−1) dx = [x^n e^(x−1)]_0^1 − n ∫_0^1 x^(n−1) e^(x−1) dx

or

    E_n = 1 − n E_(n−1),   n = 2, 3, ...

Thus, if E_1 is known, then for different values of n, E_n can be computed using the above recursive formula.
Indeed, with β = 10 and t = 6, and starting with E_1 = 0.367879 as a six-digit approximation to E_1 = 1/e, we have from above:

    E_1 = 0.367879
    E_2 = 0.264242
    E_3 = 0.207274
    E_4 = 0.170904
    ...
    E_9 = −0.068480
Although the integrand is positive throughout the interval [0, 1], the computed value of E_9 is negative. This phenomenon can be explained as follows.
The error in computing E_2 was −2 times the error in computing E_1; the error in computing E_3 was −3 times the error in E_2 (therefore, the error at this step was exactly six times the error in E_1). Thus, the error in computing E_9 was (−2)(−3)(−4)···(−9) = 9! times the error in E_1. The error in E_1 was due to the rounding of 1/e using six significant digits, which is about 4.412 × 10^(−7). However, this small error multiplied by 9! gives 9! × 4.412 × 10^(−7) ≈ .1601, which is quite large.

Rearranging the Recurrence

Again, for this example, it turned out that we could get a much better result by simply rearranging the recursion so that the error at every step, instead of being magnified, is reduced. Indeed, if we rewrite the recursion as

    E_(n−1) = (1 − E_n) / n,   n = ..., 3, 2,

then the error at each step will be reduced by a factor of 1/n. Thus, starting with a large value of n (say, n = 20) and working backward, we will see that E_9 will be accurate to full six-digit precision. To obtain a starting value, we note that

    E_n = ∫_0^1 x^n e^(x−1) dx ≤ ∫_0^1 x^n dx = 1/(n + 1).

With n = 20, E_20 ≤ 1/21. Let's take E_20 = 0. Then, starting with E_20 = 0, it can be shown (Forsythe, Malcolm, and Moler CMMC, p. 17) that E_9 = 0.0916123, which is correct to full six-digit precision. The reason for obtaining this accuracy is that the error in E_20 was at most 1/21; this error was multiplied by 1/20 in computing E_19, giving an error of at most (1/20) × (1/21) ≈ 0.0024 in the computation of E_19, and so on.
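The following sketch (not from the book) runs both directions of the recursion in Python; the starting values are exactly those used in the text.

```python
# Sketch: forward recursion E_n = 1 - n*E_{n-1} blows up the initial rounding
# error by n!; the backward recursion E_{n-1} = (1 - E_n)/n damps it.
import math

# Forward: start from E_1 = 1/e rounded to six digits.
E = round(1.0 / math.e, 6)            # 0.367879
for n in range(2, 10):
    E = 1.0 - n * E
print("forward  E_9 :", E)            # negative -- completely wrong

# Backward: start from the crude guess E_20 = 0 and recur down to E_9.
E = 0.0
for n in range(20, 9, -1):
    E = (1.0 - E) / n                 # this value is E_{n-1}
print("backward E_9 :", E)            # about 0.0916123
```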

2.9 Review and Summary


The concepts of floating point numbers and rounding errors have been introduced and discussed in this chapter.

1. Floating Point Numbers: A normalized floating point number has the form

       x = r × β^e,

   where e is called the exponent, r is the significand, and β is the base of the number system.
   The floating point number system is characterized by four parameters:

       β    — the base
       t    — the precision
       L, U — the lower and upper limits of the exponent.

2. Errors: The error in a computation is measured either by absolute error or relative error.
   Relative errors make more sense than absolute errors.
   The relative error gives an indication of the number of significant digits in an approximate answer.
   The relative error in representing a real number x by its floating point representation fl(x) is bounded by a number μ, called the machine precision (Theorem 2.2.1).

3. Laws of Floating Point Arithmetic:

       fl(x * y) = (x * y)(1 + δ),

   where * indicates any of the four basic arithmetic operations +, −, ×, or ÷, and |δ| ≤ μ.

4. Addition, Multiplication, and Inner Product Computations. The results of addition and multiplication of n floating point numbers are given in Theorems 2.4.1 and 2.5.1, respectively.
   - While adding n floating point numbers, it is advisable that they be added in ascending order of magnitude.
   - While computing the inner product of two vectors, accumulation of inner products in double precision, whenever possible, is suggested.

5. Floating Point Matrix Multiplications. The entry-wise and norm-wise error bounds for the matrix multiplication of two floating point matrices are given in Theorems 2.7.1 and 2.7.2, respectively.
   - Matrix multiplication in floating point arithmetic can be very inaccurate, unless one of the matrices is orthogonal (or unitary, if complex). Accumulation of inner products is suggested, whenever possible, in computing a matrix-matrix or a matrix-vector product.
   - The high accuracy in a matrix product computation involving an orthogonal matrix makes the use of orthogonal matrices in matrix computations quite attractive.

6. Round-off Errors Due to Cancellation and Recursive Computation.
   Two major sources of round-off errors are subtractive cancellation and recursive computations.
   They have been discussed in some detail in Section 2.8.
   Examples have been given to show how these errors come up in many basic computations.
   An encouraging message here is that in many instances, computations can be reorganized so that cancellation can be avoided, and the error in recursive computations can be diminished at each step of computation.

2.10 Suggestions for Further Reading


For details of the IEEE standard, see the monograph "An American National Standard: IEEE Standard for Binary Floating-Point Arithmetic," IEEE publication, 1985.
For results on error bounds for basic floating point matrix operations, the books by James H. Wilkinson, (i) The Algebraic Eigenvalue Problem (AEP) and (ii) Rounding Errors in Algebraic Processes (Prentice-Hall, New Jersey, 1963), are extremely useful and valuable resources.
Discussions of basic floating point operations and of rounding errors due to cancellation and recursive computations are given nowadays in many elementary numerical analysis textbooks. We name a few here which we have used and found useful.
1. Elementary Numerical Analysis by Kendall Atkinson, John Wiley and Sons, 1993.
2. Numerical Mathematics and Computing by Ward Cheney and David Kincaid, Brooks/Cole Publishing Company, California, 1980.
3. Computer Methods for Mathematical Computations by George E. Forsythe, Michael A. Malcolm and Cleve B. Moler, Prentice Hall, Inc., 1977.
4. Numerical Methods: A Software Approach by R. L. Johnston, John Wiley and Sons, Toronto, 1982.
5. Numerical Methods and Software by D. Kahaner, C. B. Moler and S. Nash, Prentice Hall, Englewood Cliffs, NJ, 1988.

Exercises on Chapter 2

1. (a) Show that

       |fl(x) − x| / |x| ≤ μ = { (1/2) β^(1−t)  for rounding
                                 β^(1−t)        for chopping.

   (b) Show that (a) can be written in the form

       fl(x) = x(1 + δ),   |δ| ≤ μ.

2. Let x be a floating point number and k a positive integer. Then

       fl(x^k / k!) = (x^k / k!)(1 + e_k),

   where

       |e_k| ≤ 2kμ + O(μ^2).
3. Construct examples to show that the distributive law for floating point addition and multiplication does not hold. What can you say about the commutativity and associativity of these operations? Give reasons for your answers.

4. Let x_1, x_2, ..., x_n be n floating point numbers. Define

       s_2 = fl(x_1 + x_2),   s_k = fl(s_(k−1) + x_k),   k = 3, ..., n.

   Then from Theorem 2.4.1 show that

       s_n = fl(x_1 + x_2 + ··· + x_n) = x_1(1 + δ_1) + x_2(1 + δ_2) + ··· + x_n(1 + δ_n).

   Give a bound for each δ_i, i = 1, ..., n.

5. (a) Construct an example to show that, when adding a list of floating point numbers, in general the rounding error will be less if the numbers are added in order of increasing magnitude.
   (b) Find another example to show that this is not always necessarily true.

6. (a) Prove that the error in computing an inner product with accumulation in double precision is essentially independent of n. That is, show that if fl_2(x^T y) denotes the computation of the inner product with accumulation in double precision, then, unless severe cancellation takes place,

       |fl_2(x^T y) − x^T y| ≤ cμ |x^T y| + O(μ^2)

   (Wilkinson AEP, pp. 116-117).
   (b) Show by means of a simple example that if there is cancellation, then fl_2(x^T y) can differ significantly from x^T y (take t = 3).
   (c) If s is a scalar, then prove that

       fl_2(x^T y / s) = (x_1 y_1 (1 + δ_1) + ··· + x_n y_n (1 + δ_n)) / (s / (1 + δ)).

   Find bounds for δ and δ_i, i = 1, ..., n. (See Wilkinson AEP, p. 118.)
7. Show that
   (a) fl(cA) = cA + E,       |E| ≤ μ |cA|
   (b) fl(A + B) = (A + B) + E,   |E| ≤ μ (|A| + |B|)
   (c) fl(AB) = AB + E,       |E| ≤ nμ |A| |B| + O(μ^2)
   (Wilkinson AEP, p. 115.)

8. Construct a simple example to show that matrix multiplication in floating point arithmetic need not be accurate. Rework your example using accumulation of inner products in double precision.

9. Let A and B be n × n matrices. Then show that

       fl(AB) = AB + E,

   where

       |e_ij| ≤ nμ |f_ij| + O(μ^2),

   f_ij = inner product of the ith row of A and the jth column of B.

10. Prove that if Q is orthogonal then

        fl(QA) = Q(A + E),   where ||E||_2 ≤ n^2 μ ||A||_2 + O(μ^2).

11. Let b be a column vector and x = Ab. Let x̂ = fl(x). Then show that

        ||x̂ − x|| / ||x|| ≤ p(n) μ ||A^(−1)|| ||A||,

    where p(n) is a polynomial in n of low degree.
    (The number ||A^(−1)|| ||A|| is called the condition number of A. There are matrices for which this number can be very big. For those matrices we then conclude that the relative error in the matrix-vector product can be quite large.)

12. Using Theorem 2.7.1, prove that, if B is nonsingular,

        ||fl(AB) − AB||_F / ||AB||_F ≤ nμ ||B||_F ||B^(−1)||_F + O(μ^2).

13. Let y_1, ..., y_n be n column vectors defined recursively:

        y_(i+1) = A y_i,   i = 1, 2, ..., n − 1.

    Let ŷ_i = fl(y_i). Find a bound for the relative error in computing each y_i, i = 1, ..., n.

14. Let β = 10, t = 4. Compute

        fl(A^T A),

    where

        A = ( 1        1
              10^(−4)  0
              0        10^(−4) ).

    Repeat your computation with t = 9. Compare the results.

15. Show how to arrange the computation in each of the following so that the loss of significant digits can be avoided. Do one numerical example in each case to support your answer.
    (a) e^x − x − 1, for negative values of x.
    (b) √(x^4 + 1) − x^2, for large values of x.
    (c) 1/x − 1/(x + 1), for large values of x.
    (d) x − sin x, for values of x near zero.
    (e) 1 − cos x, for values of x near zero.
16. What are the relative and absolute errors in approximating
    (a) π by 22/7?
    (b) 1/3 by .333?
    (c) 1/6 by .166?
    How many significant digits are there in each computation?

17. Let β = 10, t = 4. Consider computing

        a = (1/6 − .1666) / .1666.

    How many correct digits of the exact answer will you get?

18. Consider evaluating

        e = √(a^2 + b^2).

    How can the computation be organized so that overflow in computing a^2 + b^2 for large values of a or b can be avoided?
19. What answers will you get if you compute the following numbers on your calculator or computer?
    (a) √(10^8 − 1),
    (b) √(10^(−20)) − 1,
    (c) 10^16 − 50.
    Compute the absolute and relative errors in each case.


20. What problem do you foresee in solving the quadratic equations
    (a) x^2 − 10^6 x + 1 = 0,
    (b) 10^(−10) x^2 − 10^10 x + 10^10 = 0
    using the well-known formula

        x = (−b ± √(b^2 − 4ac)) / (2a)?

    What remedy do you suggest? Now solve the equations using your suggested remedy, using t = 4.

21. Show that the integral

        y_i = ∫_0^1 x^i / (x + 5) dx

    can be computed by using the recursion formula:

        y_i = 1/i − 5 y_(i−1).

    Compute y_1, y_2, ..., y_10 using this formula, taking

        y_0 = [ln(x + 5)]_(x=0)^(x=1) = ln 6 − ln 5 = ln(1.2).

    What abnormalities do you observe in these computations? Explain what happened.
    Now rearrange the recursion so that the values of y_i can be computed more accurately.

22. Suppose that x approximates 10^4, 50000 and 55596 to five significant figures. Find the largest interval in each case containing x.

3. STABILITY OF ALGORITHMS AND CONDITIONING OF PROBLEMS
3.1 Some Basic Algorithms .............................................. 82
    3.1.1 Computing the Norm of a Vector ............................... 82
    3.1.2 Computing the Inner Product of Two Vectors ................... 83
    3.1.3 Solution of an Upper Triangular System ....................... 83
    3.1.4 Computing the Inverse of an Upper Triangular Matrix .......... 84
    3.1.5 Gaussian Elimination for Solving Ax = b ...................... 86
3.2 Definitions and Concepts of Stability .............................. 91
3.3 Conditioning of the Problem and Perturbation Analysis .............. 95
3.4 Conditioning of the Problem, Stability of the Algorithm, and Accuracy of the Solution ... 96
3.5 The Wilkinson Polynomial ........................................... 98
3.6 An Ill-conditioned Linear System Problem ........................... 100
3.7 Examples of Ill-conditioned Eigenvalue Problems .................... 100
3.8 Strong, Weak and Mild Stability .................................... 103
3.9 Review and Summary ................................................. 105
3.10 Suggestions for Further Reading ................................... 106
CHAPTER 3
STABILITY OF ALGORITHMS
AND CONDITIONING OF PROBLEMS
3.1 Some Basic Algorithms
Definition 3.1.1 An algorithm is an ordered set of operations, logical and arithmetic, which, when applied to a computational problem defined by a given set of data, called the input data, produces a solution to the problem. A solution is comprised of a set of data called the output data.
In this book, for the sake of convenience and simplicity, we will very often describe algorithms by means of pseudocodes, which can be translated into computer codes easily. Describing algorithms by pseudocodes has been made popular by Stewart through his book IMC (1973).

3.1.1 Computing the Norm of a Vector

Given x = (x_1, ..., x_n)^T, compute ||x||_2.

Algorithm 3.1.1 Computing the Norm of a Vector

Input Data: n, x_1, ..., x_n.
Step 1: Compute r = max(|x_1|, ..., |x_n|).
Step 2: Compute y_i = x_i / r, i = 1, ..., n.
Step 3: Compute s = ||x||_2 = r √(y_1^2 + ··· + y_n^2).
Output Data: s.

Pseudocodes
    r = max(|x_1|, ..., |x_n|)
    s = 0
    For i = 1 to n do
        y_i = x_i / r
        s = s + y_i^2
    s = r (s)^(1/2)

G. W. Stewart, a former student of the celebrated numerical analyst Alston Householder, is a professor of computer science at the University of Maryland. He is well known for his many outstanding contributions in numerical linear algebra and statistical computations. He is the author of the book Introduction to Matrix Computations.
An Algorithmic Note
In order to avoid overflow, each entry of x has been normalized before using the formula

    ||x||_2 = √(x_1^2 + ··· + x_n^2).
3.1.2 Computing the Inner Product of Two Vectors

Given two n-vectors x and y, compute the inner product

    x^T y = x_1 y_1 + x_2 y_2 + ··· + x_n y_n.

Algorithm 3.1.2 Computing the Inner Product of Two Vectors

Input Data: n; x_1, ..., x_n; y_1, ..., y_n.
Step 1: Compute the partial products: s_i = x_i y_i, i = 1, ..., n.
Step 2: Add the partial products: Sum = Σ_{i=1}^{n} s_i.
Output Data: Sum.

Pseudocodes
    Sum = 0
    For i = 1, ..., n do
        Sum = Sum + x_i y_i
3.1.3 Solution of an Upper Triangular System
Consider the system

    T y = b,

where T = (t_ij) is a nonsingular upper triangular matrix and y = (y_1, y_2, ..., y_n)^T. Specifically,

    t_11 y_1 + t_12 y_2 + ··· + t_1n y_n = b_1
               t_22 y_2 + ··· + t_2n y_n = b_2
               t_33 y_3 + ··· + t_3n y_n = b_3
                                    ...
      t_(n−1,n−1) y_(n−1) + t_(n−1,n) y_n = b_(n−1)
                                  t_nn y_n = b_n,

where each t_ii ≠ 0 for i = 1, 2, ..., n.
The last equation is solved first to obtain y_n; then this value is inserted into the next-to-last equation to obtain y_(n−1), and so on. This process is known as back substitution. The algorithm can easily be written down.

Algorithm 3.1.3 Back Substitution
Input Data: T = (t_ij), an n × n upper triangular matrix, and b, an n-vector.
Step 1: Compute y_n = b_n / t_nn.
Step 2: Compute y_(n−1) through y_1 successively:

    y_i = (1 / t_ii) ( b_i − Σ_{j=i+1}^{n} t_ij y_j ),   i = n − 1, ..., 2, 1.

Output Data: y = (y_1, ..., y_n)^T.

Pseudocodes
    For i = n, n − 1, ..., 3, 2, 1 do
        y_i = (1 / t_ii) ( b_i − Σ_{j=i+1}^{n} t_ij y_j )

Note: When i = n, the summation (Σ) is skipped.
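A short NumPy sketch of Algorithm 3.1.3 (not from the book; the function name and test data are illustrative):

```python
# Sketch: back substitution for an upper triangular system T y = b.
import numpy as np

def back_substitution(T, b):
    """Solve T y = b for a nonsingular upper triangular matrix T."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n - 1, -1, -1):                   # rows n, n-1, ..., 1 (0-based here)
        s = b[i] - np.dot(T[i, i + 1:], y[i + 1:])   # the sum is empty when i = n
        y[i] = s / T[i, i]
    return y

T = np.array([[5.0, 2.0, 3.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 4.0]])
b = np.array([1.0, 2.0, 4.0])
y = back_substitution(T, b)
print(y, np.allclose(T @ y, b))
```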


3.1.4 Computing the Inverse of an Upper Triangular Matrix
Finding the inverse of an n × n matrix A is equivalent to finding a matrix X such that

    AX = I.

Let X = (x_1, ..., x_n) and I = (e_1, ..., e_n), where x_i is the ith column of X and e_i is the ith column of I. Then the matrix equation AX = I amounts to solving n linear systems:

    A x_i = e_i,   i = 1, ..., n.

The job is particularly simple when A is a triangular matrix. Let T be an upper triangular matrix. Then finding its inverse S = (s_1, ..., s_n) amounts to solving n upper triangular linear systems:

    T s_i = e_i,   i = 1, ..., n.

Let s_i = (s_1i, s_2i, ..., s_ni)^T.

For i = 1: T s_1 = e_1 gives

    s_11 = 1 / t_11.

(The entries s_21 through s_n1 are all zero.)

For i = 2: T s_2 = e_2 gives

    s_22 = 1 / t_22,   s_12 = −(1 / t_11)(t_12 s_22).

(The entries s_32 through s_n2 are all zero.)

For i = k: T s_k = e_k gives

    s_kk = 1 / t_kk,   s_ik = −(1 / t_ii)(t_(i,i+1) s_(i+1,k) + ··· + t_ik s_kk),   i = k − 1, k − 2, ..., 1.

(The other entries of the column s_k are all zero.)
The pseudocodes of the algorithm can now be easily written down.

A Convention: From now onwards, we shall use the following format for algorithm descriptions.

Algorithm 3.1.4 The Inverse of an Upper Triangular Matrix

Let T be an n × n nonsingular upper triangular matrix. The following algorithm computes S, the inverse of T.
For k = n, n − 1, ..., 1 do
    (1) s_kk = 1 / t_kk
    (2) s_ik = −t_ii^(−1) Σ_{j=i+1}^{k} t_ij s_jk   (i = k − 1, k − 2, ..., 1).
Example 3.1.1

    T = ( 5  2  3
          0  2  1
          0  0  4 )

k = 3:
    s_33 = 1/4
    s_23 = −(1/t_22)(t_23 s_33) = −(1/2)(1 × 1/4) = −1/8
    s_13 = −(1/t_11)(t_12 s_23 + t_13 s_33)
         = −(1/5)(2 × (−1/8) + 3 × 1/4) = −1/10

k = 2:
    s_22 = 1/t_22 = 1/2
    s_12 = −(1/t_11)(t_12 s_22) = −(1/5)(2 × 1/2) = −1/5

k = 1:
    s_11 = 1/t_11 = 1/5

    T^(−1) = S = ( 1/5  −1/5   −1/10
                   0     1/2   −1/8
                   0     0      1/4 )
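A NumPy sketch of Algorithm 3.1.4 (not from the book; the function name is illustrative), checked on the matrix T of Example 3.1.1:

```python
# Sketch: inverse of an upper triangular matrix, column by column.
import numpy as np

def upper_triangular_inverse(T):
    """Inverse of a nonsingular upper triangular matrix (Algorithm 3.1.4)."""
    n = T.shape[0]
    S = np.zeros_like(T, dtype=float)
    for k in range(n - 1, -1, -1):
        S[k, k] = 1.0 / T[k, k]
        for i in range(k - 1, -1, -1):
            S[i, k] = -np.dot(T[i, i + 1:k + 1], S[i + 1:k + 1, k]) / T[i, i]
    return S

T = np.array([[5.0, 2.0, 3.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 4.0]])
S = upper_triangular_inverse(T)
print(S)                                  # matches the S of Example 3.1.1
print(np.allclose(T @ S, np.eye(3)))      # True
```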

3.1.5 Gaussian Elimination for Solving Ax = b

Consider the problem of solving the linear system of n equations in n unknowns:

    a_11 x_1 + a_12 x_2 + ··· + a_1n x_n = b_1
    a_21 x_1 + a_22 x_2 + ··· + a_2n x_n = b_2
                                      ...
    a_n1 x_1 + a_n2 x_2 + ··· + a_nn x_n = b_n

or, in matrix notation,

    Ax = b,

where A = (a_ij) and b = (b_1, ..., b_n)^T.
A well-known approach for solving the problem is the classical elimination scheme known as Gaussian elimination. A detailed description and mechanism of development of this historical algorithm and its important practical variations will appear in Chapters 5 and 6. However, for a better understanding of some of the material presented in this chapter, we just give a brief description of the basic Gaussian elimination scheme.

Basic idea. The basic idea is to reduce the system to an equivalent upper triangular system, so that the reduced upper triangular system can be solved easily using the back substitution algorithm (Algorithm 3.1.3).

Reduction process. The reduction process consists of (n − 1) steps.
Step 1: At step 1, the unknown x_1 is eliminated from the second through the nth equations. This is done by multiplying the first equation by −a_21/a_11, −a_31/a_11, ..., −a_n1/a_11 and adding it, respectively, to the 2nd through nth equations. The quantities

    m_i1 = −a_i1 / a_11,   i = 2, ..., n

are called multipliers. At the end of step 1, the system Ax = b becomes A^(1) x = b^(1), where the entries of A^(1) = (a_ij^(1)) and those of b^(1) are related to the entries of A and b as follows:

    a_ij^(1) = a_ij + m_i1 a_1j   (i = 2, ..., n; j = 2, ..., n)
    b_i^(1)  = b_i + m_i1 b_1     (i = 2, ..., n).

(Note that a_21^(1), a_31^(1), ..., a_n1^(1) are all zero.)
Step 2: At step 2, x_2 is eliminated from the 3rd through the nth equations of A^(1) x = b^(1) by multiplying the second equation of A^(1) x = b^(1) by the multipliers

    m_i2 = −a_i2^(1) / a_22^(1),   i = 3, ..., n

and adding it, respectively, to the 3rd through nth equations. The system now becomes A^(2) x = b^(2), whose entries are given as follows:

    a_ij^(2) = a_ij^(1) + m_i2 a_2j^(1)   (i = 3, ..., n; j = 3, ..., n)
    b_i^(2)  = b_i^(1) + m_i2 b_2^(1)     (i = 3, ..., n)

and so on.
Step k: At step k, the (n − k) multipliers m_ik = −a_ik^(k−1) / a_kk^(k−1), i = k + 1, ..., n, are formed and, using them, x_k is eliminated from the (k + 1)th through the nth equations of A^(k−1) x = b^(k−1). The entries of A^(k) and those of b^(k) are given by

    a_ij^(k) = a_ij^(k−1) + m_ik a_kj^(k−1)   (i = k + 1, ..., n; j = k + 1, ..., n)
    b_i^(k)  = b_i^(k−1) + m_ik b_k^(k−1)     (i = k + 1, ..., n).

Step n−1: At the end of the (n − 1)th step, the reduced matrix A^(n−1) is upper triangular and the original vector b is transformed to b^(n−1).
We are now ready to write down the pseudocodes of the Gaussian elimination scheme.

The following summarized observations will help write the pseudocodes:

1. There are (n − 1) steps (k = 1, 2, ..., n − 1).
2. For each value of k, there are (n − k) multipliers: m_ik (i = k + 1, ..., n).
3. For each value of k, only the (n − k)^2 entries of A^(k) with i = k + 1, ..., n and j = k + 1, ..., n are modified. The (n − k) entries below the (k, k)th entry of the kth column are zeros, and the remaining entries that are not modified stay the same as the corresponding entries of A^(k−1).

Algorithm 3.1.5 Basic Gaussian Elimination

For k = 1, 2, ..., n − 1 do
    For i = k + 1, ..., n do
        m_ik = −a_ik^(k−1) / a_kk^(k−1)
        For j = k + 1, ..., n do
            a_ij^(k) = a_ij^(k−1) + m_ik a_kj^(k−1)
        b_i^(k) = b_i^(k−1) + m_ik b_k^(k−1)

(Note that A = (a_ij) = (a_ij^(0)), b = b^(0).)
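A NumPy sketch of Algorithm 3.1.5 followed by back substitution (not from the book; the function name is illustrative). It is tried here on the system of Example 3.1.2 worked out below.

```python
# Sketch: basic Gaussian elimination (no pivoting) plus back substitution.
import numpy as np

def gauss_no_pivot(A, b):
    """Reduce Ax = b to triangular form and solve; assumes nonzero pivots."""
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = -A[i, k] / A[k, k]          # multiplier m_ik
            A[i, k + 1:] += m * A[k, k + 1:]
            A[i, k] = 0.0
            b[i] += m * b[k]
    x = np.zeros(n)                          # back substitution
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(A[i, i + 1:], x[i + 1:])) / A[i, i]
    return x

A = np.array([[5.0, 1.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 3.0]])
b = np.array([7.0, 3.0, 6.0])
print(gauss_no_pivot(A, b))                 # [1. 1. 1.]
```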
Remarks:
1. The above basic Gaussian elimination algorithm is commonly known as the Gaussian elimination algorithm without row interchanges or the Gaussian elimination algorithm without pivoting. The reason for having such a name will be clear from the discussion of this algorithm again in Chapter 5.
2. The basic Gaussian elimination algorithm as presented above is not commonly used in practice. Two practical variations of this algorithm, known as Gaussian elimination with partial and complete pivoting, will be described in Chapters 5 and 6.
3. We have assumed that the quantities a_11, a_22^(1), ..., a_nn^(n−1) are different from zero. If any of them is computationally zero, the algorithm will stop.
Example 3.1.2

    5x_1 + x_2 + x_3 = 7
     x_1 + x_2 + x_3 = 3
    2x_1 + x_2 + 3x_3 = 6

or

    ( 5  1  1 ) ( x_1 )   ( 7 )
    ( 1  1  1 ) ( x_2 ) = ( 3 )
    ( 2  1  3 ) ( x_3 )   ( 6 )

    Ax = b.

Step 1: k = 1.
i = 2, 3:

    m_21 = −a_21/a_11 = −1/5,   m_31 = −a_31/a_11 = −2/5.

j = 2, 3:

    i = 2, j = 2:  a_22^(1) = a_22 + m_21 a_12 = 4/5
    i = 2, j = 3:  a_23^(1) = a_23 + m_21 a_13 = 4/5
    i = 3, j = 2:  a_32^(1) = a_32 + m_31 a_12 = 3/5
    i = 3, j = 3:  a_33^(1) = a_33 + m_31 a_13 = 13/5
                   b_2^(1)  = b_2 + m_21 b_1 = 8/5
                   b_3^(1)  = b_3 + m_31 b_1 = 16/5

(Note: b_1^(1) = b_1; a_21^(1) = a_31^(1) = 0, a_11^(1) = a_11, a_12^(1) = a_12, a_13^(1) = a_13.)

    ( 5   1     1   ) ( x_1 )   ( 7    )
    ( 0   4/5   4/5 ) ( x_2 ) = ( 8/5  )
    ( 0   3/5  13/5 ) ( x_3 )   ( 16/5 )

    A^(1) x = b^(1).

Step 2: k = 2.
i = 3:

    m_32 = −a_32^(1) / a_22^(1) = −3/4

    i = 3, j = 3:  a_33^(2) = a_33^(1) + m_32 a_23^(1) = 2
                   b_3^(2)  = b_3^(1) + m_32 b_2^(1) = 2

    ( 5   1    1  ) ( x_1 )   ( 7   )
    ( 0   4/5  4/5) ( x_2 ) = ( 8/5 )
    ( 0   0    2  ) ( x_3 )   ( 2   )

    A^(2) x = b^(2).

Note that A^(2) is upper triangular.

Back Substitution: The above triangular system is easily solved using back substitution:

    2x_3 = 2                        ⇒ x_3 = 1
    (4/5)x_2 + (4/5)x_3 = 8/5       ⇒ x_2 = 1
    5x_1 + x_2 + x_3 = 7            ⇒ x_1 = 1
3.2 Definitions and Concepts of Stability


The examples on catastrophic cancellations and recursive computations in the last chapter had one thing in common: the inaccuracy of the computed result in each case was due entirely to the algorithm used, because as soon as the algorithm was changed or rearranged and applied to the problem with the same data, the computed result became very satisfactory. Thus, we are talking about two different types of algorithms for a given problem. The algorithms of the first type—giving inaccurate results—are examples of unstable algorithms, while the ones of the second type—giving satisfactory results—are stable algorithms.
The study of stability is very important. This is done by means of round-off error analysis. There are two types: backward error analysis and forward error analysis.
In forward analysis an attempt is made to see how the computed solution obtained by the algorithm differs from the exact solution based on the same data.
Definition 3.2.1 An algorithm will be called forward stable if the computed solution x̂ is close to the exact solution, x, in some sense.
The round-off error bounds obtained in Chapter 2 for various matrix operations are the result of forward error analyses.
On the other hand, backward analysis relates the error to the data of the problem rather than to the problem's solution. Thus we define backward stability as follows:
Definition 3.2.2 An algorithm is called backward stable if it produces an exact solution to a nearby problem.
Backward error analysis, introduced in the literature by J. H. Wilkinson, is nowadays widely used in matrix computations, and using this analysis the stability (or instability) of many algorithms in numerical linear algebra has been established in recent years. In this book, by "stability" we will imply "backward stability", unless otherwise stated.
James H. Wilkinson, a British mathematician, is well known for his pioneering work on backward error analysis for matrix computations. He was affiliated with the National Physical Laboratory in Britain, and held visiting appointments at Argonne National Laboratory, Stanford University, etc. Wilkinson died an untimely death in 1986. A fellowship in his name has since been established at Argonne National Laboratory. Wilkinson's book The Algebraic Eigenvalue Problem is an extremely important and very useful book for any numerical analyst.
As a simple example of backward stability, consider the case of computing the sum of two floating point numbers x and y. We have seen before that

    fl(x + y) = (x + y)(1 + δ)
              = x(1 + δ) + y(1 + δ)
              = x' + y'.

Thus, the computed sum of two floating point numbers x and y is the exact sum of two other floating point numbers x' and y'. Since

    |δ| ≤ μ,

both x' and y' are close to x and y, respectively. Thus we conclude that the operation of adding two floating point numbers is backward stable. Similar statements, of course, hold for other floating point arithmetic operations.
For yet another type of example, consider the problem of solving the linear system Ax = b:
Definition 3.2.3 An algorithm for solving Ax = b will be called stable if the computed solution x̂ is such that

    (A + E) x̂ = b + δb

with E and δb small.
How Do We Measure Smallness?
The "smallness" of a matrix or a vector is measured either by looking into its entries or by computing its norm.

Norm-wise vs. Entry-wise Errors

While measuring errors in computations using norms is traditional in matrix computations, component-wise measurement of errors is becoming increasingly important. It really does make more sense.
An n × n matrix A has n^2 entries, but the norm of A is a single number. Thus the smallness or largeness of the norm of an error matrix E does not truly reflect the smallness or largeness of the individual entries of E. For example, if E = (10, .00001, 1)^T, then ||E|| = 10.0499. Thus the small entry .00001 is not reflected in the norm measure.
Examples of Stable and Unstable Algorithms by Backward Error Analysis
Example 3.2.1 A Stable Algorithm: Solution of an Upper Triangular System by Back Substitution
Consider Algorithm 3.1.3 (the back substitution method). Suppose the algorithm is implemented using accumulation of inner products in double precision. Then it can be shown (see Chapter 11) that the computed solution x̂ satisfies

    (T + E) x̂ = b,

where the entries of the error matrix E are quite small. In fact, if E = (e_ij) and T = (t_ij), then

    |e_ij| ≤ |t_ij| 10^(−t),   i, j = 1, ..., n,

showing that the error can be even smaller than the error made in rounding the entries of T. Thus, the back substitution process for solving an upper triangular system is stable.
Example 3.2.2 An Unstable Algorithm: Gaussian Elimination Without Pivoting
Consider the problem of solving the nonsingular linear system Ax = b using Gaussian elimination (Algorithm 3.1.5).
It has been shown by Wilkinson (see Chapter 11 of this book) that, when the process does not break down, the computed solution x̂ satisfies

    (A + E) x̂ = b,

with

    ||E||_∞ ≤ c n^3 ρ μ ||A||_∞ + O(μ^2),

where

    A^(k) = (a_ij^(k))

are the reduced matrices in the elimination process and ρ, known as the growth factor, is given by

    ρ = ( max_k max_{i,j} |a_ij^(k)| ) / ( max_{i,j} |a_ij| ).

More specifically, if α = max_{i,j} |a_ij| and α_k = max_{i,j} |a_ij^(k)|, then the growth factor ρ is given by

    ρ = max(α, α_1, ..., α_(n−1)) / α.

Now for an arbitrary matrix A, ρ can be quite large, because the entries of the reduced matrices A^(k) can grow arbitrarily. To see this, consider the simple matrix

    A = ( 10^(−10)  1
          1         2 ).

One step of Gaussian elimination using 9-decimal-digit floating point arithmetic will yield the reduced matrix

    A^(1) = ( 10^(−10)  1            )   ( 10^(−10)  1        )
            ( 0         2 − 10^10    ) = ( 0        −10^10    ).

The growth factor for this problem is then

    ρ = max(α, α_1) / α = max(2, 10^10) / 2 = 10^10 / 2,

which is quite large. Thus, if we now proceed to solve a linear system with this reduced upper triangular matrix, we cannot expect a small error matrix E. Indeed, if we wish to solve

    10^(−10) x_1 + x_2 = 1
             x_1 + 2x_2 = 3

using the above A^(1), then the computed solution will be x_1 = 0, x_2 = 1, whereas the exact solution is x_1 = x_2 = 1. This shows that Gaussian elimination is unstable for an arbitrary linear system.

Note: Gaussian elimination without pivoting is not unstable for all matrices. There are certain classes of matrices, such as symmetric positive definite matrices, for which Gaussian elimination is stable. We shall discuss this special system in Chapter 6 in some detail.
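A short sketch (not from the book) of the 2 × 2 instability above, run in float32 so the effect is visible, and then with the rows interchanged first (which is what partial pivoting would do here). The helper function is purely illustrative.

```python
# Sketch: Gaussian elimination without pivoting on Example 3.2.2, in float32.
import numpy as np

def solve_2x2_no_pivot(A, b):
    m = -A[1, 0] / A[0, 0]
    a22 = A[1, 1] + m * A[0, 1]
    b2 = b[1] + m * b[0]
    x2 = b2 / a22
    x1 = (b[0] - A[0, 1] * x2) / A[0, 0]
    return np.array([x1, x2])

A = np.array([[1.0e-10, 1.0], [1.0, 2.0]], dtype=np.float32)
b = np.array([1.0, 3.0], dtype=np.float32)

print("no pivoting       :", solve_2x2_no_pivot(A, b))             # about [0, 1]
print("rows interchanged :", solve_2x2_no_pivot(A[::-1], b[::-1])) # about [1, 1]
print("growth factor     :", 1.0e10 / 2.0)   # max|a_ij^(1)| / max|a_ij|
```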

If an algorithm is stable for a given matrix A, then one would like to see that the algorithm is stable for every matrix A in a given class. Thus, we may give a formal definition of stability as follows:
Definition 3.2.4 An algorithm is stable for a class of matrices C if for every matrix A in C, the computed solution by the algorithm is the exact solution of a nearby problem.
Thus, for the linear system problem

    Ax = b,

an algorithm is stable for a class of matrices C if for every A ∈ C and for each b, it produces a computed solution x̂ that satisfies

    (A + E) x̂ = b + δb

for some E and δb, where (A + E) is close to A and b + δb is close to b.

3.3 Conditioning of the Problem and Perturbation Analysis


From the preceding discussion we should not form the opinion that if a stable algorithm is used to solve a problem, then the computed solution will be accurate. A property of the problem called conditioning also contributes to the accuracy or inaccuracy of the computed result.
The conditioning of a problem is a property of the problem itself. It is concerned with how the solution of the problem will change if the input data contains some impurities. This concern arises from the fact that in practical applications the data very often come from experimental observations where the measurements can be subject to disturbances (or "noise"). There are other sources of error also, for example, round-off errors (discussed in Chapter 11), discretization errors, etc. Thus, when a numerical analyst has a problem in hand to solve, he or she must frequently solve the problem not with the original data, but with data that has been perturbed. The question naturally arises: What effects do these perturbations have on the solution?
A theoretical study done by numerical analysts to investigate these effects, which is independent of the particular algorithm used to solve the problem, is called perturbation analysis. This study helps one detect whether a given problem is "bad" or "good" in the sense of whether small perturbations in the data will create a large or small change in the solution. Specifically, we define:
Definition 3.3.1 A problem (with respect to a given set of data) is called an ill-conditioned or badly conditioned problem if a small relative error in the data causes a large relative error in the computed solution, regardless of the method of solution. Otherwise, it is called well-conditioned.
Suppose a problem P is to be solved with an input c. Let P(c) denote the computed value of the problem with the input c. Let δc denote the perturbation in c. Then P will be said to be ill-conditioned for the input data c if the relative error in the answer,

    |P(c + δc) − P(c)| / |P(c)|,

is much larger than the relative error in the data,

    |δc| / |c|.

Note: The definition of conditioning is data-dependent. Thus, a problem which is ill-conditioned for one set of data could be well-conditioned for another set.
3.4 Conditioning of the Problem, Stability of the Algorithm, and Accuracy of
the Solution
As stated in the previous section, the conditioning of a problem is a property of the problem itself, and has nothing to do with the algorithm used to solve the problem. To a user, of course, the accuracy of the computed solution is of primary importance. However, the accuracy of a computed solution by a given algorithm is directly connected with both the stability of the algorithm and the conditioning of the problem. If the problem is ill-conditioned, no matter how stable the algorithm is, the accuracy of the computed solution cannot be guaranteed.

Backward Stability and Accuracy

Note that the definition of backward stability does not say that the computed solution x̂ by a backward stable algorithm will be close to the exact solution of the original problem. However, when a stable algorithm is applied to a well-conditioned problem, the computed solution should be near the exact solution.

The ill-conditioning of a problem contaminates the computed solution, even with the use of a stable algorithm, thereby yielding an unacceptable solution. When a computed solution is unsatisfactory, some users (who are not usually concerned with conditioning) tend to put the blame on the algorithm for the inaccuracy. To be fair, we should test an algorithm for stability only on well-conditioned matrices. If the algorithm passes the test of stability on well-conditioned matrices, then it should be declared a stable algorithm. However, if a "stable" algorithm is applied to an ill-conditioned problem, it should not introduce more error than what the data warrants.
From the previous discussion, it is quite clear now that investigating the conditioning of a problem is very important.
The Condition Number of a Problem
Numerical analysts usually try to associate a number called the condition number with a problem. The condition number indicates whether the problem is ill- or well-conditioned. More specifically, the condition number gives a bound for the relative error in the solution when a small perturbation is applied to the input data.

In numerical linear algebra, condition numbers for many (but not all) problems have been identified. Unfortunately, computing the condition number is often more involved and time-consuming than solving the problem itself. For example (as we shall see in Chapter 6), for the linear system problem Ax = b, the condition number is

    Cond(A) = ||A|| ||A^(−1)||.

Thus, computing the condition number in this case involves computing the inverse of A; it is more expensive to compute the inverse than to solve the system Ax = b. In Chapter 6 we shall discuss methods for estimating Cond(A) without explicitly computing A^(−1).
We shall discuss the conditioning of each problem in detail in the relevant chapter. Before closing this section, however, let's mention several well-known examples of ill-conditioned problems.
An Ill-Conditioned Subtraction
Consider the subtraction c = a − b:

    a = 12354101
    b = 12345678
    c = a − b = 8423.

Now perturb a in the sixth place:

    â = 12354001
    ĉ = c + δc = â − b = 8323.

Thus, a perturbation in the sixth digit of the input value caused a change in the second digit of the answer. Note that the relative error in the data is

    (a − â)/a = .000008,

while the relative error in the computed result is

    (c − ĉ)/c ≈ .0118722.
An Ill-Conditioned Root-Finding Problem
Consider solving the simple quadratic equation:

    f(x) = x^2 − 2x + 1 = 0.

The roots are x = 1, 1. Now perturb the coefficient 2 by 0.00001. The computed roots of the perturbed polynomial f̂(x) = x^2 − 2.00001x + 1 are x_1 = 1.0032 and x_2 = .9968. The relative errors in x_1 and x_2 are .0032. The relative error in the data is 5 × 10^(−6).
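A two-line sketch (not from the book) using NumPy's root finder reproduces the effect:

```python
# Sketch: the double root of x^2 - 2x + 1 under a tiny coefficient perturbation.
import numpy as np

print(np.roots([1.0, -2.0, 1.0]))         # both roots essentially equal to 1
print(np.roots([1.0, -2.00001, 1.0]))     # roughly 1.0032 and 0.9968
```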

3.5 The Wilkinson Polynomial

The above example involved multiple roots. Multiple roots or roots close to each other invariably make the root-finding problem ill-conditioned; however, the problem can be ill-conditioned even when the roots are very well separated. Consider the following well-known example by Wilkinson (see also Forsythe, Malcolm and Moler CMMC, pp. 18-19).

    P(x) = (x − 1)(x − 2)···(x − 20)
         = x^20 − 210 x^19 + ···

The zeros of P(x) are 1, 2, ..., 20 and are distinct. Now perturb the coefficient of x^19 from −210 to −210 + 2^(−23), leaving the other coefficients unchanged. Wilkinson used a binary computer with t = 30; this change therefore amounted to a change in the 30th significant base-2 digit. The roots of the perturbed polynomial, carefully computed by Wilkinson, were found to be (reproduced from CMMC, p. 18):

    1.00000 0000      10.09526 6145 ± 0.64350 0904i
    2.00000 0000      11.79363 3881 ± 1.65232 9728i
    3.00000 0000      13.99235 8137 ± 2.51883 0070i
    4.00000 0000      16.73073 7466 ± 2.81262 4894i
    4.99999 9928      19.50243 9400 ± 1.94033 0347i
    6.00000 6944
    6.99969 7234
    8.00726 7603
    8.91725 0249
    20.84690 8101
The table shows that certain zeros are more sensitive to the perturbation than others. The following analysis, due to Wilkinson (see also Forsythe, Malcolm and Moler CMMC, p. 19), attempts to explain this phenomenon.
Let the perturbed polynomial be

    P(x, α) = x^20 − α x^19 + ···

Then the (condition) number

    ∂x/∂α evaluated at x = i

measures the sensitivity of the root x = i, i = 1, 2, ..., 20. To compute this number, differentiate the equation P(x, α) = 0 with respect to α:

    ∂x/∂α = −(∂P/∂α)/(∂P/∂x) = x^19 / ∏_{j=1, j≠i}^{20} (x − j).

The values of ∂x/∂α at x = i for i = 1, ..., 10 are listed below. (For the complete list, see CMMC, p. 19.)

    Root i    ∂x/∂α at x = i        Root i    ∂x/∂α at x = i
    1         −8.2 × 10^(−18)       11        −4.6 × 10^7
    2          8.2 × 10^(−11)       12         2.0 × 10^8
    3         −1.6 × 10^(−6)        13        −6.1 × 10^8
    4          2.2 × 10^(−3)        14         1.3 × 10^9
    5         −6.1 × 10^(−1)        15        −2.1 × 10^9
    6          5.8 × 10^1           16         2.4 × 10^9
    7         −2.5 × 10^3           17        −1.9 × 10^9
    8          6.0 × 10^4           18         1.0 × 10^9
    9         −8.3 × 10^5           19        −3.1 × 10^8
    10         7.6 × 10^6           20         4.3 × 10^7
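The flavour of Wilkinson's experiment can be reproduced with NumPy's root finder (a sketch, not from the book). Note that numpy.roots works from the stored coefficients, which are themselves rounded, so the output will not match Wilkinson's carefully computed table exactly, but it shows the same dramatic sensitivity.

```python
# Sketch: perturb the x^19 coefficient of (x-1)(x-2)...(x-20) by 2**-23.
import numpy as np

p = np.poly(np.arange(1, 21))        # coefficients of (x-1)(x-2)...(x-20)
p_pert = p.copy()
p_pert[1] += 2.0**-23                # the coefficient of x^19 is -210

for r in np.sort_complex(np.roots(p_pert)):
    print(f"{r.real:12.6f}  {r.imag:+.6f}i")
```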
Root-Finding and Eigenvalue Computation
The above examples teach us a very useful lesson: it is not a good idea to compute the eigenvalues of a matrix by explicitly finding the coefficients of the characteristic polynomial and evaluating its zeros, since the round-off errors in the computations will invariably put some small perturbations in the computed coefficients of the characteristic polynomial, and these small perturbations in the coefficients may cause large changes in the zeros. The eigenvalues will then be computed inaccurately.

3.6 An Ill-conditioned Linear System Problem


The matrix

    H = ( 1     1/2      1/3     ···   1/n
          1/2   1/3      1/4     ···   1/(n+1)
          ...                          ...
          1/n   1/(n+1)  ···           1/(2n−1) )

is called the Hilbert matrix after the celebrated mathematician David Hilbert. The linear system problem, even with a Hilbert matrix of moderate order, is extremely ill-conditioned. For example, take n = 5 and consider solving

    Hx = b,

where b ≈ (2.2833, 1.4500, 1.0929, 0.8845, 0.7456)^T. The exact solution is x = (1, 1, 1, 1, 1)^T. Now perturb the (5,1) element of H (= 1/5 = .2) in the fifth place to obtain .20001. The computed solution with this very slightly perturbed matrix is (0.9937, 1.2857, −0.2855, 2.9997, 0.0001)^T. Note that Cond(H) = O(10^5).
For more examples of ill-conditioned linear system problems see Chapter 6.
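A NumPy sketch of the 5 × 5 Hilbert experiment (not from the book). The exact digits of the perturbed solution depend on how b is formed and rounded, but the qualitative behaviour is the same: a change of 10^(−5) in one entry of H destroys the solution.

```python
# Sketch: ill-conditioning of the 5 x 5 Hilbert system.
import numpy as np

n = 5
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
b = H @ np.ones(n)

print("Cond(H) =", np.linalg.cond(H))               # about 4.8e5
print("solution with exact H    :", np.linalg.solve(H, b))

H_pert = H.copy()
H_pert[4, 0] = 0.20001                               # perturb the (5,1) entry
print("solution with perturbed H:", np.linalg.solve(H_pert, b))
```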

3.7 Examples of Ill-conditioned Eigenvalue Problems


Example 3.7.1

Consider the 10 × 10 upper bidiagonal matrix

    A = ( 1  1
             1  1           0
                ·  ·
                   ·  ·
          0           1  1
                         1 )

with 1's on the diagonal and on the superdiagonal.
The eigenvalues of A are all 1. Now perturb the (10,1) coefficient of A by a small quantity ε = 10^(−10). Then the eigenvalues of the perturbed matrix, computed using the software MATLAB to be described in the next chapter (which uses a numerically effective eigenvalue-computation algorithm), were found to include:

    1.0184 + 0.0980i
    0.9506 + 0.0876i
    1.0764 + 0.0632i
    0.9051 + 0.0350i
    1.0999 + 0.0000i
    1.0764 − 0.0632i
    0.9051 − 0.0350i
    1.0184 − 0.0980i
    0.9506 − 0.0876i

(Note the change in the eigenvalues.)
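The same experiment in NumPy (a sketch, not from the book); the computed values will differ slightly from those above, since they depend on the eigenvalue routine and on rounding, but the qualitative picture is identical.

```python
# Sketch: eigenvalues of the 10 x 10 bidiagonal matrix and of its perturbation.
import numpy as np

n = 10
A = np.eye(n) + np.diag(np.ones(n - 1), 1)    # 1's on diagonal and superdiagonal
A_pert = A.copy()
A_pert[n - 1, 0] = 1.0e-10                     # perturb the (10,1) entry

print("eigenvalues of A          :", np.linalg.eigvals(A))       # all equal to 1
print("eigenvalues of perturbed A:", np.sort_complex(np.linalg.eigvals(A_pert)))
```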
Example 3.7.2 The Wilkinson Bidiagonal Matrix
Again, it should not be thought that an eigenvalue problem can be ill-conditioned only when the eigenvalues are multiple or close to each other. An eigenvalue problem with well-separated eigenvalues can be very ill-conditioned too. Consider the 20 × 20 triangular matrix (known as the Wilkinson bidiagonal matrix)

    A = ( 20  20
              19  20          0
                  ·   ·
                      ·   20
          0                1 ),

that is, the diagonal entries are 20, 19, ..., 1 and every superdiagonal entry is 20.
The eigenvalues of A are 1, 2, ..., 20. Now perturb the (20,1) entry of A by ε = 10^(−10). If the eigenvalues of this slightly perturbed matrix are computed using a stable algorithm (such as the QR iteration method to be described in Chapter 8), it will be seen that some of them change drastically; they will even become complex.
In this case also, certain eigenvalues are more ill-conditioned than others. To explain this, Wilkinson computed the condition number of each of the eigenvalues. The condition number of the eigenvalue λ_i of A is defined to be (see Chapter 8):

    Cond(λ_i) = 1 / |y_i^T x_i|,

where y_i and x_i are, respectively, the normalized left and right eigenvectors of A corresponding to the eigenvalue λ_i (recall that x is a right eigenvector of A associated with an eigenvalue λ if Ax = λx, x ≠ 0; similarly, y is a left eigenvector associated with λ if y^T A = λy^T).
In our case, the right eigenvector x_r corresponding to λ_r = r has components (see Wilkinson AEP, p. 91):

    ( 1,  (20 − r)/(−20),  (20 − r)(19 − r)/(−20)^2,  ...,  (20 − r)!/(−20)^(20−r),  0, ..., 0 ),

while the components of y_r are

    ( 0, 0, ..., 0,  (r − 1)!/20^(r−1),  ...,  (r − 1)(r − 2)/20^2,  (r − 1)/20,  1 ).

These vectors are not quite normalized, but still, the reciprocal of their product gives us an estimate of the condition numbers.
In fact, K_r, the condition number of the eigenvalue λ = r, is

    K_r = 1 / (y_r^T x_r) = (−1)^r 20^19 / ((20 − r)! (r − 1)!).

The number K_r is large for all values of r. The smallest K_i for the Wilkinson matrix are K_1 = K_20 ≈ 4.31 × 10^7, and the largest ones are K_11 = K_10 ≈ 3.98 × 10^12.
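A tiny sketch (not from the book) evaluating |K_r| from the formula above confirms these figures:

```python
# Sketch: Wilkinson's eigenvalue condition numbers K_r = 20^19 / ((20-r)!(r-1)!).
from math import factorial

for r in (1, 2, 10, 11, 20):
    K = 20.0**19 / (factorial(20 - r) * factorial(r - 1))
    print(f"K_{r:<2d} = {K:9.2e}")     # K_1 = K_20 ~ 4.31e7, K_10 = K_11 ~ 3.98e12
```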
Example 3.7.3 (Wilkinson AEP, p. 92)

    A = ( n      n−1    n−2    ···   3   2   1
          n−1    n−1    n−2    ···   3   2   1
          0      n−2    n−2    ···   ·   ·   ·
          ·        ·      ·      ·   ·   ·   ·
          ·               ·      ·   2   2   1
          0      ···            0        1   1 )

As n increases, the smallest eigenvalues become progressively ill-conditioned. For example, when n = 12, the condition numbers of the first few eigenvalues are of order unity, while those of the last three are of order 10^7.

3.8 Strong, Weak and Mild Stability


While establishing the stability of an algorithm by backward error analysis, we sometimes get much more than the above definition of stability calls for. For example, it can be shown that when Gaussian elimination with partial or complete pivoting (for discussions of partial and complete pivoting, see Chapter 5) is applied to solve a nonsingular system Ax = b, the computed solution x̂ not only satisfies

    (A + E) x̂ = b

with a small error matrix E, but also (A + E) is nonsingular. The standard definition of stability, of course, does not require that.
On the other hand, if Gaussian elimination without pivoting is applied to a symmetric positive definite system, the computed solution x̂ satisfies (A + E) x̂ = b with a small error matrix E, but the standard backward error analysis does not show that (A + E) is symmetric positive definite.
Thus, we may talk about two types of stable algorithms: one type giving not only a small error matrix, but also a perturbed matrix A + E belonging to the same class as the matrix A itself, and the other type just giving a small error matrix without any restriction on A + E.
To distinguish between these two types of stability, Bunch (1987) has recently introduced in the literature the concept of strong stability. Following Bunch we define:
Definition 3.8.1 An algorithm for solving the linear system problem Ax = b is strongly stable for a class of matrices C if, for each A in C, the computed solution is the exact solution of a nearby problem, and the matrix (A + E) also belongs to C.

Examples (Bunch).
1. Gaussian elimination with pivoting is strongly stable on the class of nonsingular matrices.
2. The Cholesky algorithm for computing the factorization of a symmetric positive definite matrix in the form A = HH^T is strongly stable on the class of symmetric positive definite matrices. (See Chapter 6 for a description of the Cholesky algorithm.)
James R. Bunch is a professor of mathematics at the University of California at San Diego. He is well known for his work on efficient factorization of symmetric matrices (popularly known as the Bunch-Kaufman and Bunch-Parlett factorization procedures), and for his work on stability and conditioning.
3. Gaussian elimination without pivoting is strongly stable on the class of nonsingular diagonally dominant matrices.
By analogy with this definition of strong stability, Bunch also introduced the concept of weak stability, which depends upon the conditioning of the problem.
Definition 3.8.2 An algorithm is weakly stable for a class of matrices C if for each well-conditioned matrix in C the algorithm produces an acceptably accurate solution.
Thus, an algorithm for solving the linear system Ax = b is weakly stable for a class of matrices C if for each well-conditioned matrix A in C and for each b, the computed solution x̂ to Ax = b is such that

    ||x − x̂|| / ||x||   is small.

Bunch was motivated to introduce this definition to point out that the well-known (and frequently used by engineers) Levinson algorithm for solving linear systems involving TOEPLITZ matrices (a matrix T = (t_ij) is Toeplitz if the entries along each diagonal are the same) is weakly stable on the class of symmetric positive definite Toeplitz matrices. This very important and remarkable result was proved by Cybenko (1980). The result was important because the signal processing community had been using the Levinson algorithm routinely for years, without fully investigating the stability behavior of this important algorithm.

Remarks on Stability, Strong Stability and Weak Stability

1. If an algorithm is strongly stable, it is necessarily stable.
2. Note that stability implies weak stability. Weak stability is good enough for most users.
3. If a numerical analyst can prove that a certain algorithm is not weakly stable, then it follows that the algorithm is not stable, because "not weakly stable" implies "not stable".

Mild Stability: We have defined an algorithm to be backward stable if the algorithm produces a solution that is an exact solution of a nearby problem. But it might very well happen that an algorithm produces a solution that is only close to the exact solution of a nearby problem.
George Cybenko is a professor of electrical engineering and computer science at Dartmouth College. He has made substantial contributions in numerical linear algebra and signal processing.

What should we then call such an algorithm?
Van Dooren, following de Jong (1977), has called such an algorithm a mixed stable algorithm, and Stewart (IMC, 1973) has defined such an algorithm as simply a stable algorithm, under the additional restriction that the data of the nearby problem and the original data belong to the same class.
We believe that it is more appropriate to call such stability mild stability. After all, such an algorithm is stable in a mild sense.
We thus define:
Definition 3.8.3 An algorithm is mildly stable if it produces a solution that is close to the exact solution of a nearby problem.
Example 3.8.1
1. The QR algorithm for rank-deficient least squares problems is mildly stable (see Lawson and Hanson SLP, p. 95, and Chapter 7 of this book).
2. The QR algorithm for the full-rank underdetermined least squares problem is mildly stable (see Lawson and Hanson SLP, p. 93, and Chapter 7 of this book).

3.9 Review and Summary


In this chapter we have introduced two of the most important concepts in numerical linear algebra, namely, the conditioning of the problem and the stability of the algorithm, and have discussed how they affect the accuracy of the solution.
1. Conditioning of the Problem: The conditioning of the problem is a property of the problem. A problem is said to be ill-conditioned if a small change in the data causes a large change in the solution; otherwise it is well-conditioned.
The conditioning of a problem is data dependent. A problem can be ill-conditioned with respect to one set of data while it may be quite well-conditioned with respect to another set.
Ill-conditioning or well-conditioning of a matrix problem is generally measured by means of a number called the condition number. The condition number for the linear system problem Ax = b is ||A|| · ||A^(−1)||.
Well-known examples of ill-conditioned problems are: the Wilkinson polynomial for the root-finding problem, the Wilkinson bidiagonal matrix for the eigenvalue problem, the Hilbert matrix for the algebraic linear system problem, etc.
Paul Van Dooren is a professor of electrical engineering at the University of Illinois at Urbana-Champaign. He has received several prestigious awards and fellowships, including the Householder award and the Wilkinson fellowship, for his important contributions to numerical linear algebra, which have turned out to be extremely valuable for solving computational problems arising in control and systems theory and signal processing.

2. Stability of an Algorithm: An algorithm is said to be a backward stable algorithm if it computes the exact solution of a nearby problem. Some examples of stable algorithms are: back substitution and forward elimination for triangular systems, Gaussian elimination with pivoting for linear systems, QR factorization using Householder and Givens transformations, the QR iteration algorithm for eigenvalue computations, etc.
The Gaussian elimination algorithm without row interchanges is unstable for arbitrary matrices.
3. Effects of conditioning and stability on the accuracy of the solution: The conditioning of the problem and the stability of the algorithm both have effects on the accuracy of the solution computed by the algorithm.
If a stable algorithm is applied to a well-conditioned problem, it should compute an accurate solution. On the other hand, if a stable algorithm is applied to an ill-conditioned problem, there is no guarantee that the computed solution will be accurate; the definition of backward stability does not imply that. However, if a stable algorithm is applied to an ill-conditioned problem, it should not introduce more error than what the data warrants.

3.10 Suggestions for Further Reading


For computer codes of standard matrix computations, see the book Handbook for Matrix Computations by T. Coleman and Charles Van Loan, SIAM, 1988. The concepts of stability and conditioning have been very thoroughly discussed in the book An Introduction to Matrix Computations by G. W. Stewart, Academic Press, New York, 1973 (Chapter 2). Note that Stewart's definition of backward stability is slightly different from the usual definition of backward stability introduced by Wilkinson. We also strongly suggest that readers read an illuminating paper by James R. Bunch in this area: The Weak and Strong Stability of Algorithms in Numerical Linear Algebra, Lin. Alg. Appl. (1987), volume 88-89, 49-66.
Wilkinson's AEP is a rich source of knowledge for results on backward stability for matrix algorithms.
An important paper discussing notions and concepts of different types of stability in general is the paper by L. S. de Jong (1977).

Exercises on Chapter 3
Note: Use MATLAB (see Chapter 4 and the Appendix), whenever appropriate.
1. (a) Show that the floating point computations of the sum, product and division of two numbers are backward stable.
(b) Are the floating point computations of the inner and outer product of two vectors backward stable? Give reasons for your answer.
2. Find the growth factor of Gaussian elimination without pivoting for the following matrices.
$$\begin{pmatrix} .00001 & 1 \\ 1 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 \\ .00001 & 1 \end{pmatrix}, \quad
\begin{pmatrix}
1 & -1 & \cdots & \cdots & -1 \\
0 & 1 & -1 & \cdots & -1 \\
\vdots & & \ddots & \ddots & \vdots \\
\vdots & & & \ddots & -1 \\
0 & \cdots & \cdots & 0 & 1
\end{pmatrix}_{10 \times 10},$$

$$\begin{pmatrix} 1 & 1 \\ 1 & .9 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 \\ 1 & .99 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 \\ 1 & .999 \end{pmatrix}, \quad
\begin{pmatrix} 1.0001 & 1 \\ 1 & 1 \end{pmatrix},$$

$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & .9 & .81 \\ 1 & 1.9 & 3.61 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 & 1 \\ .9 & .9 & .9 \\ 1 & 1.9 & 3.61 \end{pmatrix}, \quad
\begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix}.$$

3. Find the condition number of each of the matrices of problem 2.


4. Show that $\mathrm{Cond}(cA) = \mathrm{Cond}(A)$ for all nonzero scalars c.
Show that if $\|I\| \geq 1$, then $\mathrm{Cond}(A) \geq 1$.
5. Prove that
$$\frac{\|\mathrm{fl}(AB) - AB\|_F}{\|AB\|_F} \leq n\,\mu\,\mathrm{Cond}_F(B) + O(\mu^2),$$
where A and B are matrices, B is nonsingular, $\mu$ is the machine precision, and $\mathrm{Cond}_F(B) = \|B\|_F\,\|B^{-1}\|_F$.

6. Are the following floating point computations backward stable? Give reasons for your answer in each case.
(a) $\mathrm{fl}(x(y + z))$
(b) $\mathrm{fl}(x_1 + x_2 + \cdots + x_n)$
(c) $\mathrm{fl}(x_1x_2\cdots x_n)$

(d) $\mathrm{fl}(x^Ty/c)$, where x and y are vectors and c is a scalar
(e) $\mathrm{fl}\left(\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}\right)$
7. Find the growth factor of Gaussian elimination for each of the following matrices and hence
conclude that Gaussian elimination for linear systems with these matrices is backward stable.
(a) $\begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$

(b) $\begin{pmatrix} 4 & 0 & 2 \\ 0 & 4 & 0 \\ 2 & 0 & 5 \end{pmatrix}$

(c) $\begin{pmatrix} 10 & 1 & 1 \\ 1 & 15 & 5 \\ 1 & 5 & 14 \end{pmatrix}$

8. Show that Gaussian elimination without pivoting for the matrix
$$\begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$$
is strongly stable.
9. Let H be an unreduced upper Hessenberg matrix. Find a diagonal matrix D such that $D^{-1}HD$ is a normalized upper Hessenberg matrix (that is, all subdiagonal entries are 1). Show that the transforming matrix D must be ill-conditioned if one or several subdiagonal entries of H are very small. Do a numerical example of order 5 to verify this.
10. Show that the roots of the following polynomials are ill-conditioned:
(a) $x^3 - 3x^2 + 3x - 1$
(b) $(x - 1)^3(x - 2)$
(c) $(x - 1)(x - .99)(x - 2)$
11. Using the result of problem #5, show that matrix-vector multiplication with an ill-conditioned matrix may give rise to a large relative error in the computed result. Construct your own 2 × 2 example to see this.
12. Write the following small SUBROUTINES for future use:
(1) MPRINT(A,n) to print a square matrix A of order n.
(2) TRANS(A,TRANS,n) to compute the transpose of a matrix.
(3) TRANS(A,n) to compute the transpose of a matrix, where A is overwritten by its transpose.
(4) MMULT(A,B,C,m,n,p) to multiply $C = A_{m \times n}B_{n \times p}$.
(5) SDOT(n,x,y,answer) to compute the inner product of two n-vectors x and y in single precision.
(6) SAXPY(n,A,x,y) to compute $y \leftarrow ax + y$ in single precision, where a is a scalar and x and y are vectors. (The symbol $y \leftarrow ax + y$ means that the computed result of a times x plus y will be stored in y.)
(7) IMAX(n,x,MAX) to find $|x_i| = \max\{|x_j| : j = 1, \ldots, n\}$.
(8) SWAP(n,x,y) to swap two vectors x and y.
(9) COPY(n,x,y) to copy a vector x to y.
(10) NORM2(n,x,norm) to find the Euclidean length of a vector x.
(11) SASUM(n,x,sum) to find $\mathrm{sum} = \sum_{i=1}^{n}|x_i|$.
(12) NRMI(x,n) to compute the infinity norm of an n-vector x.
(13) Rewrite the above routines in double precision.
(14) SNORM(m,n,A,LDA) to compute the 1-norm of a matrix $A_{m \times n}$. LDA is the leading dimension of the array A.
(15) Write subroutines to compute the infinity and Frobenius norms of a matrix.
(16) Write a subroutine to find the largest element in magnitude in a column vector.
(17) Write a subroutine to find the largest element in magnitude in a matrix.
(Note: Some of these subroutines are a part of BLAS (LINPACK). See also the book Handbook for Matrix Computations by T. Coleman and Charles Van Loan, SIAM, 1988.)

4. NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE
4.1 Definitions and Examples ................................................. 110
4.2 Flop-Count and Storage Considerations for Some Basic Algorithms .......... 113
4.3 Some Existing High-Quality Mathematical Software for Linear Algebra Problems ... 122
4.3.1 LINPACK ............................................................... 122
4.3.2 EISPACK ............................................................... 122
4.3.3 LAPACK ................................................................ 123
4.3.4 NETLIB ................................................................ 124
4.3.5 NAG ................................................................... 125
4.3.6 IMSL .................................................................. 125
4.3.7 MATLAB ................................................................ 125
4.3.8 MATLAB Codes and MATLAB Toolkit ....................................... 126
4.3.9 The ACM Library ....................................................... 126
4.3.10 ITPACK (Iterative Software Package) .................................. 126
4.4 Review and Summary ....................................................... 126
4.5 Suggestions for Further Reading .......................................... 127
CHAPTER 4
NUMERICALLY EFFECTIVE ALGORITHMS
AND MATHEMATICAL SOFTWARE
4. NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE
4.1 Definitions and Examples
Solving a problem on a computer involves the following major steps performed in sequence:
1. Making a mathematical model of the problem, that is, translating the problem into the language of mathematics. For example, mathematical models of many engineering problems are sets of ordinary and partial differential equations.
2. Finding or developing constructive methods (theoretical numerical algorithms) for solving the mathematical model. This step usually consists of a literature search to find what methods are available for the problems.
3. Identifying the best method from a numerical point of view (the best one may be a combination of several others). We call it the numerically effective method.
4. Finally, implementing on the computer the numerically effective method identified in step 3. This amounts to writing and executing a reliable and efficient computer program based on the identified numerically effective method, and may also require exploitation of the target computer architecture.
The purpose of creating a mathematical software is to provide a scientist or engineer with a piece of a computer program he can use with confidence to solve the problem for which the software was designed. Thus, a mathematical software should be of high quality.
Let's be specific about what we mean when we call a software a high quality mathematical software. A high quality mathematical software should have the following features. It should be
1. Powerful and flexible: It can be used to solve several different variations of the original problem and the closely associated problems. For example, closely associated with the linear system problem Ax = b are:
(a) Computing the inverse of A, i.e., finding an X such that AX = I. Though finding the inverse of A and solving Ax = b are equivalent problems, solving a linear system using the inverse of the system matrix is not advisable. Computing the inverse explicitly should be avoided, unless a specific application really calls for it.
(b) Finding the determinant and rank of A.
(c) Solving AX = B, where B is a matrix, etc.
Also, a matrix problem may have some special structures. It may be positive de nite, banded,
Toeplitz, dense, sparse, etc. The software should state clearly what variations of the problem
it can handle and whether it is special-structure oriented.
2. Easy to read and modify: The software should be well documented. The documentation should be clear and easy to read, even for a non-technical user, so that if some modifications are needed, they can be made easily. To quote from the cover page of Forsythe, Malcolm and Moler (CMMC):
"... it is an order of magnitude easier to write two good subroutines than to decide which one is best. In choosing among the various subroutines available for a particular problem, we placed considerable emphasis on the clarity and style of programming. If several subroutines have comparable accuracy, reliability, and efficiency, we have chosen the one that is the least difficult to read and use."
3. Portable: Should be able to run on different computers with few or no changes.
4. Robust: Should be able to deal with an unexpected situation during execution.
5. Based on a numerically effective algorithm: Should be based on an algorithm that has attractive numerical properties.
We have used the expression "numerically effective" several times without qualification. This is the most important component of a high quality mathematical software. We shall call a matrix algorithm numerically effective if it is:
(a) General Purpose: The algorithm should work for a wide class of matrices.
(b) Reliable: The algorithm should give a warning whenever it is on the verge of breakdown due to excessive round-off errors or not being able to meet some specified criterion of convergence. There are algorithms which produce completely wrong answers without giving any warning at all. Gaussian elimination without pivoting (for the linear system or an equivalent problem) is one such algorithm. It is not reliable.
(c) Stable: Total rounding errors of the algorithm should not exceed the errors that are inherent in the original problem (see the earlier section on stability).
(d) Efficient: The efficiency of an algorithm is measured by the amount of computer time consumed in its implementation. Theoretically, the number of floating-point operations needed to implement the algorithm indicates its efficiency.
Definition 4.1.1 A floating-point operation, or flop, is the amount of computer time required to execute the Fortran statement

    A(I,J) = A(I,J) + t * A(I,J)

A flop involves one multiplication, one addition, and some subscript manipulations. Similarly, one division coupled with an addition or subtraction will be counted as one flop. This definition of a flop has been used in the popular software package LINPACK (this package is briefly described in Section 4.3.1).

A note on the definition of flop-count

With the advent of supercomputing technology, there is a tendency to count an addition or subtraction as a flop as well. This definition of a flop has been adopted in the second edition of the book by Golub and Van Loan (MC 1989). However, we have decided to stick to the original LINPACK definition. Note that if an addition (or subtraction) is counted as a flop as well, the flop-count in the "new" convention is roughly twice that in the "old" one.

Definition 4.1.2 A matrix algorithm involving computation with matrices of order n will be called an efficient algorithm if it takes no more than $O(n^3)$ flops. (The historical Cramer's rule for solving a linear system is therefore not efficient, since O(n!) flops are required for its execution.) (See Chapter 6.)

One point is well worth mentioning here. An algorithm may be efficient, but still unstable. For example, Gaussian elimination without pivoting requires about $\frac{n^3}{3}$ flops for an n × n matrix. Therefore, while it is efficient, it is unreliable and unstable for an arbitrary matrix.
(e) Economic in the use of storage: Usually, about $n^2$ storage locations are required to store a dense matrix of order n. Therefore, if an algorithm requires storage of several matrices during its execution, a large number of storage locations will be needed even when n is moderate. Thus, it is important to give special attention to economy of storage while designing an algorithm.

By carefully rearranging an algorithm, one can greatly reduce its storage requirement (examples of this will be presented later). In general, if a matrix generated during execution of the algorithm is not needed for future use, it should be overwritten by another computed element.

Notation for Overwriting and Interchange

We will use the notation
$$a \leftarrow b$$
to denote that "b overwrites a". Similarly, if two computed quantities a and b are interchanged, this will be written symbolically as
$$a \leftrightarrow b.$$

4.2 Flop-Count and Storage Considerations for Some Basic Algorithms


Numerical linear algebra often deals with triangular matrices, which can be stored using only $\frac{n(n+1)}{2}$ locations rather than $n^2$ locations. This useful fact should be kept in mind while designing an algorithm in numerical linear algebra, so that the extra available space can be used for something else.
Sparse matrices, in general, have lots of zero entries. A convenient scheme for storing a sparse matrix is one in which only the nonzero entries are stored. In the following, we illustrate the flop-count and storage scheme for some simple basic matrix computations with dense matrices.

In determining the flop-count of an algorithm, we note that the numbers of multiplications and additions in a matrix algorithm are roughly the same. Thus, a count of only the number of multiplications in a matrix algorithm gives us an idea of the total flop-count for that algorithm. Also, the counts involving zero elements can be omitted.
Example 4.2.1 Inner Product Computation

Let
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$
be two n-vectors. Then the inner product $z = x^Ty = \sum_{i=1}^{n} x_iy_i$ can be computed as (Algorithm 3.1.2):

For i = 1, 2, ..., n do
    z = z + x_i y_i

Just one flop is needed for each i. Thus a total of n flops is needed to execute the algorithm.
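A minimal MATLAB sketch of this loop follows (our illustration only; the sample vectors are arbitrary, and the built-in expression x'*y computes the same quantity):

    % Inner product z = x'*y computed with an explicit loop:
    % one multiplication-addition pair (one flop) per entry.
    x = [1; 2; 3];
    y = [4; 5; 6];
    z = 0;
    for i = 1:length(x)
        z = z + x(i)*y(i);    % one flop per pass
    end
    disp(z)                    % same value as x'*y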
Example 4.2.2 Outer Product Computation

The outer product $xy^T$ is an n × n matrix Z, as shown below. The (i, j)th component of the matrix is $x_iy_j$. Since there are $n^2$ components and each component requires one multiplication, the outer-product computation requires $n^2$ flops and $n^2$ storage locations. However, very often one does not require the matrix from the outer product explicitly.
$$Z = xy^T = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}(y_1\ y_2\ \cdots\ y_n) = \begin{pmatrix}
x_1y_1 & x_1y_2 & \cdots & x_1y_n \\
x_2y_1 & x_2y_2 & \cdots & x_2y_n \\
\vdots & & & \vdots \\
x_ny_1 & x_ny_2 & \cdots & x_ny_n
\end{pmatrix}.$$
Example 4.2.3 Matrix-Vector Product

Let $A = (a_{ij})$ be an n × n matrix and let
$$b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$
be an n-vector. Then
$$Ab = \begin{pmatrix}
a_{11}b_1 + a_{12}b_2 + \cdots + a_{1n}b_n \\
a_{21}b_1 + a_{22}b_2 + \cdots + a_{2n}b_n \\
\vdots \\
a_{n1}b_1 + a_{n2}b_2 + \cdots + a_{nn}b_n
\end{pmatrix}$$

Flop-count. Each component of the vector Ab requires n multiplications and additions. Since there are n components, the computation of Ab requires $n^2$ flops.
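The corresponding MATLAB sketch (our illustration; in practice one simply writes A*b) makes the n flops per component visible:

    % Matrix-vector product computed component by component:
    % n flops per component, n^2 flops in all.
    A = [1 2; 3 4];
    b = [5; 6];
    n = size(A,1);
    c = zeros(n,1);
    for i = 1:n
        for j = 1:n
            c(i) = c(i) + A(i,j)*b(j);    % one flop
        end
    end
    disp(c)    % equals A*b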

Example 4.2.4 Matrix-Matrix Product with Upper Triangular Matrices

Let $U = (u_{ij})$ and $V = (v_{ij})$ be two upper triangular matrices of order n. Then the following algorithm computes the product C = UV. The algorithm overwrites V with the product UV.

For i = 1, 2, ..., n do
    For j = i, i+1, ..., n do
        $v_{ij} \leftarrow c_{ij} = \sum_{k=i}^{j} u_{ik}v_{kj}$

An Explanation of the Above Pseudocode
Note that in the above pseudocode j represents the inner-loop and i represents the outer-loop.
For each value of i from 1 to n, j takes the values i through n.

Flop-Count.
1. Computing $c_{ij}$ requires $j - i + 1$ multiplications.
2. Since j runs from i to n and i runs from 1 to n, the total number of multiplications is
$$\sum_{i=1}^{n}\sum_{j=i}^{n}(j - i + 1) = \sum_{i=1}^{n}\bigl(1 + 2 + \cdots + (n - i + 1)\bigr) = \sum_{i=1}^{n}\frac{(n-i+1)(n-i+2)}{2} \approx \frac{n^3}{6} \quad \text{(for large } n\text{)}.$$
(Recall that $1 + 2 + 3 + \cdots + r = \frac{r(r+1)}{2}$ and $1^2 + 2^2 + 3^2 + \cdots + r^2 = \frac{r(r+1)(2r+1)}{6}$.)

Flop-count for the Product of Two Triangular Matrices

For large n, the product of two n × n triangular matrices requires about $\frac{n^3}{6}$ flops.
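A MATLAB sketch of the algorithm above, in which the product overwrites V, might look as follows (our illustration; the random upper triangular test matrices and the comparison with U*V are only for verification):

    % Product of two upper triangular matrices; V is overwritten by U*V.
    U = triu(rand(4));
    V = triu(rand(4));
    C_ref = U*V;                          % reference, for checking only
    n = size(U,1);
    for i = 1:n
        for j = i:n
            V(i,j) = U(i,i:j)*V(i:j,j);   % c_ij = sum_{k=i}^{j} u_ik*v_kj
        end
    end
    disp(norm(V - C_ref))                 % should be of the order of roundoff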

Example 4.2.5 Matrix-Matrix Multiplication

Let A be an m × n matrix and B be an n × p matrix. Then the following algorithm computes the product C = AB.

For i = 1, 2, ..., m do
    For j = 1, 2, ..., p do
        $c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$

Flop-count. There are n multiplications in computing each $c_{ij}$. Since j runs from 1 to p and i runs from 1 to m, the total number of multiplications is mnp. Thus, for two square matrices A and B, each of order n, this count is $n^3$.

Flop-Count for the Product of Two Matrices

Let A be m × n and B be n × p. Then computing C = AB requires mnp flops. In particular, it takes $n^3$ flops to compute the product of two n × n square matrices.

The algorithm above for matrix-matrix multiplication of two n × n matrices will obviously require $n^2$ storage locations for each matrix. However, it can be rewritten in such a way that there will be a substantial saving in storage, as illustrated below.
The following algorithm overwrites B with the product AB, assuming that an additional column has been annexed to B (alternatively, one can have a work vector to hold values temporarily).
Example 4.2.6 Matrix-Matrix Product with Economy in Storage

For j = 1, 2, ..., n do
    1. $h_i = \sum_{k=1}^{n} a_{ik}b_{kj}$  (i = 1, 2, ..., n)
    2. $b_{ij} \leftarrow h_i$  (i = 1, 2, ..., n)

(h is a temporary work vector.)
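In MATLAB this scheme can be sketched as follows (our illustration for square matrices; h is the temporary work vector):

    % Overwrite B with A*B one column at a time, using a single work vector h.
    A = rand(4);
    B = rand(4);
    C_ref = A*B;                 % reference, for checking only
    n = size(A,1);
    for j = 1:n
        h = zeros(n,1);
        for i = 1:n
            h(i) = A(i,:)*B(:,j);    % h_i = sum_k a_ik*b_kj (old column j)
        end
        B(:,j) = h;                  % now overwrite column j
    end
    disp(norm(B - C_ref))            % should be of the order of roundoff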
!
Example 4.2.7 Computation of $\left(I - \frac{2uu^T}{u^Tu}\right)A$

In numerical linear algebra very often we need to compute
$$\left(I - \frac{2uu^T}{u^Tu}\right)A,$$
where I is an m × m identity matrix, u is an m-vector, and A is m × n. The matrix $I - \frac{2uu^T}{u^Tu}$ is called a Householder matrix (see Chapter 5). Naively, one would form the matrix $I - \frac{2uu^T}{u^Tu}$ from the vector u, then form the matrix product explicitly with A. This will require $O(n^3)$ flops. We show below that this matrix product can implicitly be performed with $O(n^2)$ flops. The key observation here is that we do not need to form the matrix $I - \frac{2uu^T}{u^Tu}$ explicitly.
The following algorithm computes the product. The algorithm overwrites A with the product.
Let
$$\beta = \frac{2}{u^Tu}.$$
Then
$$\left(I - \frac{2uu^T}{u^Tu}\right)A$$
becomes $A - \beta uu^TA$. Let
$$u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}.$$
Then the (i, j)th entry of $(A - \beta uu^TA)$ is equal to $a_{ij} - \beta(u_1a_{1j} + u_2a_{2j} + \cdots + u_ma_{mj})u_i$. Thus, we have the following algorithm.

Algorithm 4.2.1 Computing $\left(I - \frac{2uu^T}{u^Tu}\right)A$

1. Compute $\beta = \frac{2}{u^Tu}$.
2. For j = 1, 2, ..., n do
       $\alpha = u_1a_{1j} + u_2a_{2j} + \cdots + u_ma_{mj}$
       $\alpha \leftarrow \beta\alpha$
       For i = 1, 2, ..., m do
           $a_{ij} \leftarrow a_{ij} - \alpha u_i$

Flop-Count.
1. It takes (m + 1) flops to compute $\beta$ (m flops to compute the inner product $u^Tu$ and 1 flop to divide 2 by the inner product).
2. There are n $\alpha$'s, and each costs (m + 1) flops. Thus, we need n(m + 1) flops to compute the $\alpha$'s.
3. There are mn entries $a_{ij}$ to compute. Each $a_{ij}$ costs just one flop, once the $\alpha$'s are computed.

Total flop-count: (m + 1) + 2mn + n.

We now summarize the above very important result (which will be used repeatedly in this book)
in the following:

Flop-count for the Product $\left(I - \frac{2uu^T}{u^Tu}\right)A$

Let A be m × n (m ≥ n) and let u be an m-vector. Then the above product can be computed with only (m + 1) + 2mn + n flops. In particular, if m = n, then it takes roughly $2n^2$ flops, compared to $n^3$ flops if done naively.

A Numerical Example

Let
$$u = (1, 1, 1)^T, \qquad A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 0 & 0 \end{pmatrix}.$$
Then
$$\beta = \frac{2}{3}.$$

j = 1: (Compute the first column of the product)
$$\alpha = u_1a_{11} + u_2a_{21} + u_3a_{31} = 1 + 2 = 3$$
$$\alpha = \tfrac{2}{3} \cdot 3 = 2$$
$$a_{11} \leftarrow a_{11} - \alpha u_1 = 1 - 2 = -1$$
$$a_{21} \leftarrow a_{21} - \alpha u_2 = 2 - 2 = 0$$
$$a_{31} \leftarrow a_{31} - \alpha u_3 = 0 - 2 = -2$$

j = 2: (Compute the second column of the product)
$$\alpha = u_1a_{12} + u_2a_{22} + u_3a_{32} = 1 + 1 = 2$$
$$\alpha = 2 \cdot \tfrac{2}{3} = \tfrac{4}{3}$$
$$a_{12} \leftarrow a_{12} - \alpha u_1 = 1 - \tfrac{4}{3} = -\tfrac{1}{3}$$
$$a_{22} \leftarrow a_{22} - \alpha u_2 = 1 - \tfrac{4}{3} = -\tfrac{1}{3}$$
$$a_{32} \leftarrow a_{32} - \alpha u_3 = 0 - \tfrac{4}{3} = -\tfrac{4}{3}$$

Thus,
$$A \leftarrow \left(I - \frac{2uu^T}{u^Tu}\right)A = \begin{pmatrix} -1 & -\tfrac{1}{3} \\ 0 & -\tfrac{1}{3} \\ -2 & -\tfrac{4}{3} \end{pmatrix}.$$
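The computation of this example can be reproduced with a few lines of MATLAB implementing Algorithm 4.2.1 (a sketch of ours, not a MATCOM routine):

    % Implicit product (I - 2*u*u'/(u'*u))*A, overwriting A column by column,
    % following the steps of Algorithm 4.2.1.
    u = [1; 1; 1];
    A = [1 1; 2 1; 0 0];
    [m, n] = size(A);
    beta = 2/(u'*u);                      % beta = 2/3
    for j = 1:n
        alpha = beta*(u'*A(:,j));         % alpha = beta*(u1*a1j + ... + um*amj)
        A(:,j) = A(:,j) - alpha*u;        % a_ij <- a_ij - alpha*u_i
    end
    disp(A)                               % [-1 -1/3; 0 -1/3; -2 -4/3]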

Example 4.2.8 Flop-count for Algorithm 3.1.3 (Back Substitution Process)

From the pseudocode of this algorithm, we see that it takes one flop to compute $y_n$, two flops to compute $y_{n-1}$, and so on. Thus, to compute $y_1$ through $y_n$, we need $(1 + 2 + 3 + \cdots + n) = \frac{n(n+1)}{2} \approx \frac{n^2}{2}$ flops.

Flop-count for the Back Substitution Process

It requires roughly $\frac{n^2}{2}$ flops to solve an upper triangular system using the back-substitution process.
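A standard back-substitution loop can be sketched in MATLAB as follows (an illustration of ours; the precise statement of Algorithm 3.1.3 appears earlier in the book):

    % Solve the upper triangular system T*y = b by back substitution:
    % roughly n^2/2 flops.
    T = [2 1 1; 0 3 1; 0 0 4];
    b = [5; 7; 8];
    n = length(b);
    y = zeros(n,1);
    for i = n:-1:1
        y(i) = (b(i) - T(i,i+1:n)*y(i+1:n))/T(i,i);
    end
    disp(norm(T*y - b))    % should be of the order of roundoff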

Example 4.2.9 Flop-count and Storage Considerations for Algorithm 3.1.4 (The Inverse of an Upper Triangular Matrix)

Let's state the algorithm once more here.

For k = n, n-1, ..., 2, 1
    $s_{kk} = \frac{1}{t_{kk}}$
    For i = k-1, k-2, ..., 1
        $s_{ik} = -\frac{1}{t_{ii}}\sum_{j=i+1}^{n} t_{ij}s_{jk}$

Flop-count.
k = 1: 1 flop
k = 2: 3 flops
k = 3: 6 flops
  ...
k = n: $\frac{n(n+1)}{2}$ flops

Total flops: $1 + 3 + 6 + \cdots + \frac{n(n+1)}{2} = \sum_{r=1}^{n}\frac{r(r+1)}{2} = \sum_{r=1}^{n}\frac{r^2}{2} + \sum_{r=1}^{n}\frac{r}{2} \approx \frac{n^3}{6}$ (approximately).

Flop-count for Computing the Inverse of an Upper Triangular Matrix

It requires about $\frac{n^3}{6}$ flops to compute the inverse of an upper triangular matrix.

Storage Considerations. Since the inverse of an upper triangular matrix is an upper triangular
matrix, and it is clear from the algorithm that we can overwrite tik by sik , the algorithm can be
rewritten so that it overwrites T with the inverse S . Thus we can rewrite the algorithm as

Computing the Inverse of an Upper Triangular Matrix with Economy in Storage

For k = n, n-1, ..., 2, 1
    $t_{kk} \leftarrow s_{kk} = \frac{1}{t_{kk}}$
    For i = k-1, k-2, ..., 1
        $t_{ik} \leftarrow s_{ik} = -\frac{1}{t_{ii}}\sum_{j=i+1}^{k} t_{ij}s_{jk}$
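A MATLAB sketch of this in-place version follows (our illustration; the call to inv is used only to check the result):

    % Invert an upper triangular matrix T in place: T is overwritten by inv(T).
    T = [2 1 1; 0 3 1; 0 0 4];
    S_ref = inv(T);                     % reference, for checking only
    n = size(T,1);
    for k = n:-1:1
        T(k,k) = 1/T(k,k);                              % s_kk = 1/t_kk
        for i = k-1:-1:1
            T(i,k) = -(T(i,i+1:k)*T(i+1:k,k))/T(i,i);   % s_ik
        end
    end
    disp(norm(T - S_ref))               % should be of the order of roundoff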

Example 4.2.10 Flop-count and Storage Considerations for Algorithm 3.1.5 (Gaussian Elimination)

It will be shown in Chapter 5 that the Gaussian elimination algorithm takes about $\frac{n^3}{3}$ flops. The algorithm can overwrite A with the upper triangular matrix $A^{(n-1)}$; in fact, it can overwrite A with each $A^{(k)}$. The multipliers $m_{ik}$ can be stored in the lower half of A as they are computed. Each $b^{(k)}$ can overwrite the vector b.

Gaussian Elimination Algorithm with Economy in Storage

For k = 1, 2, ..., n-1 do
    For i = k+1, ..., n do
        $a_{ik} \leftarrow m_{ik} = -\frac{a_{ik}}{a_{kk}}$
        For j = k+1, ..., n do
            $a_{ij} \leftarrow a_{ij} + m_{ik}a_{kj}$
        $b_i \leftarrow b_i + m_{ik}b_k$
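A MATLAB sketch of this storage scheme on a small 3 × 3 test system is given below (our illustration; the final back substitution uses the upper triangular part left in A):

    % Gaussian elimination without pivoting: multipliers are stored in the
    % lower half of A, and b is updated in place.
    A = [2 2 3; 4 5 6; 1 2 4];
    b = [1; 2; 3];
    x_ref = A\b;                 % reference solution, for checking only
    n = size(A,1);
    for k = 1:n-1
        for i = k+1:n
            A(i,k) = -A(i,k)/A(k,k);                       % m_ik overwrites a_ik
            A(i,k+1:n) = A(i,k+1:n) + A(i,k)*A(k,k+1:n);
            b(i) = b(i) + A(i,k)*b(k);
        end
    end
    x = zeros(n,1);              % back substitution on the upper part of A
    for i = n:-1:1
        x(i) = (b(i) - A(i,i+1:n)*x(i+1:n))/A(i,i);
    end
    disp(norm(x - x_ref))        % should be of the order of roundoff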

4.3 Some Existing High-Quality Mathematical Software for Linear Algebra Problems
Several high quality mathematical software packages for various types of matrix computations are
in existence. These are LINPACK, EISPACK, MATLAB, NETLIB, IMSL, NAG, and the most
recently released LAPACK.

4.3.1 LINPACK
LINPACK is "a collection of Fortran subroutines which analyze and solve various systems of simultaneous linear algebraic equations. The subroutines are designed to be completely machine independent, fully portable, and to run at near optimum efficiency in most operating environments." (Quotation from the LINPACK Users' Guide.)
Though primarily intended for linear systems, the package also contains routines for the singular value decomposition (SVD) and problems associated with linear systems, such as computing the inverse, the determinant, and the linear least squares problem. Most of the routines are for square matrices, but some handle rectangular coefficient matrices associated with overdetermined or underdetermined problems.
The routines are meant for small and dense problems of order less than a few hundred and band matrices of order less than several thousand. There are no routines for iterative methods.

4.3.2 EISPACK
EISPACK is an eigensystem package. The package is primarily designed to compute the eigenvalues and eigenvectors of a matrix; however, it contains routines for the generalized eigenvalue problem of the form $Ax = \lambda Bx$ and for the singular value decomposition.

The eigenvalues of an arbitrary matrix A are computed in several sequential phases. First, the matrix A is balanced. If it is nonsymmetric, the balanced matrix is then reduced to an upper Hessenberg matrix by similarity transformations (if it is symmetric, it is reduced to a symmetric tridiagonal matrix). Finally, the eigenvalues of the transformed upper Hessenberg or symmetric tridiagonal matrix are computed using the implicit QR iterations or the Sturm-sequence method. There are EISPACK routines to perform all these tasks.

4.3.3 LAPACK
The building blocks of numerical linear algebra algorithms are the three levels of BLAS (Basic Linear Algebra Subroutines). They are:
Level 1 BLAS: These are for vector-vector operations. A typical Level 1 BLAS operation is of the form $y \leftarrow \alpha x + y$, where x and y are vectors and $\alpha$ is a scalar.
Level 2 BLAS: These are for matrix-vector operations. A typical Level 2 BLAS operation is of the form $y \leftarrow Ax + y$.
Level 3 BLAS: These are for matrix-matrix operations. A typical Level 3 BLAS operation is of the form $C \leftarrow AB + C$.
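In MATLAB notation the three levels correspond, for example, to the following operations (an illustration of ours, with arbitrary data):

    alpha = 2;  x = rand(5,1);  y = rand(5,1);
    A = rand(5);  B = rand(5);  C = rand(5);
    y = alpha*x + y;    % Level 1: vector-vector (axpy), O(n) work on O(n) data
    y = A*x + y;        % Level 2: matrix-vector, O(n^2) work on O(n^2) data
    C = A*B + C;        % Level 3: matrix-matrix, O(n^3) work on O(n^2) data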
Level 1 BLAS is used in LINPACK. Unfortunately, algorithms composed of Level 1 BLAS are not suitable for achieving high efficiency on most supercomputers of today, such as CRAY computers.
While Level 2 BLAS can give good speed (sometimes almost peak speed) on many vector computers such as the CRAY X-MP or CRAY Y-MP, they are not suitable for high efficiency on other modern supercomputers (e.g., the CRAY 2).
The Level 3 BLAS are ideal for most of today's supercomputers. They can perform $O(n^3)$ floating-point operations on $O(n^2)$ data.
Therefore, during the last several years, an intensive attempt was made by numerical linear algebraists to restructure the traditional BLAS-1 based algorithms into algorithms rich in BLAS-2 and BLAS-3 operations. As a result, there now exist algorithms of these types, called blocked algorithms, for most matrix computations. These algorithms have been implemented in a software package called LAPACK.
"LAPACK is a transportable library of Fortran 77 subroutines for solving the most common problems in numerical linear algebra: systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. It has been designed to be efficient on a wide range of modern high-performance computers.
LAPACK is designed to supersede LINPACK and EISPACK, principally by restructuring the software to achieve much greater efficiency on vector processors, high-performance "super-scalar" workstations, and shared memory multiprocessors. LAPACK also adds extra functionality, uses some new or improved algorithms, and integrates the two sets of algorithms into a unified package.
The LAPACK Users' Guide gives an informal introduction to the design of the algorithms and software, summarizes the contents of the package, describes conventions used in the software and its documentation, and includes complete specifications for calling the routines." (Quotations from the cover page of the LAPACK Users' Guide.)

4.3.4 NETLIB
Netlib stands for network library. LINPACK, EISPACK, and LAPACK subroutines are available
electronically from this library, along with many other types of software for matrix computations.
A user can obtain software from these packages by sending electronic mail to
netlib@netlib.ornl.gov

Also, files can be transferred to a local directory by anonymous ftp:


ftp research.att.com
ftp netlib2.cs.utk.edu

To find out how to use netlib, send e-mail:


send index

Information on the subroutines available in a given package can be obtained by sending e-mail:
send index from {library}

Thus, to obtain a description of subroutines for LINPACK, send the message:


send index from LINPACK

To obtain a piece(s) of software from a package, send the message:


send {routines} from {library}

Thus, to get a subroutine called SGECO from LINPACK, send the message:
send SGECO from LINPACK

(This message will send you SGECO and other routines that call SGECO.) Xnetlib, which uses an
X Window interface for direct downloading, is also available (and more convenient) once installed.
Further information about Netlib can be obtained by anonymous FTP to either of the following
sites:
netlib2.cs.utk.edu
research.att.com

4.3.5 NAG
NAG stands for Numerical Algorithms Group. This group has developed a large software library (also called NAG) containing routines for most computational problems, including numerical linear algebra problems, numerical differential equations problems (both ordinary and partial), optimization problems, integral equations problems, statistical problems, etc.

4.3.6 IMSL
IMSL stands for International Mathematical and Statistical Libraries. As the title suggests, this
library contains routines for almost all mathematical and statistical computations.

4.3.7 MATLAB
The name MATLAB stands for MATrix LABoratory. It is an interactive computing system designed for easy computations of various matrix-based scientific and engineering problems. MATLAB provides easy access to matrix software developed by the LINPACK and EISPACK software projects.
MATLAB can be used to solve a linear system and associated problems (such as inverting a matrix or computing the rank and determinant of a matrix), to compute the eigenvalues and eigenvectors of a matrix, to find the singular value decomposition of a matrix, to compute the zeros of a polynomial, to compute generalized eigenvalues and eigenvectors, etc. MATLAB is an extremely useful and valuable package for testing algorithms for small problems and for use in the classroom. It has indeed become an indispensable tool for teaching applied and numerical linear algebra in the classroom. A remarkable feature of MATLAB is its graphics capabilities (see more about MATLAB in the Appendix).
There is a student edition of MATLAB, published by Prentice Hall, 1992. It is designed for easy use in the classroom.

4.3.8 MATLAB Codes and MATLAB Toolkit
MATLAB codes for selected algorithms described in this book are provided for beginning students
in the APPENDIX.
Furthermore, an interactive MATLAB Toolkit called MATCOM, implementing all the major algorithms (to be taught in the first course), will be provided along with the book, so that students can compare different algorithms for the same problem with respect to numerical efficiency, stability, accuracy, etc.

4.3.9 The ACM Library


The library provided by the Association for Computing Machinery contains routines for basic matrix-vector operations, linear systems and associated problems, nonlinear systems, zeros of polynomials, etc. The journal TOMS (ACM Transactions on Mathematical Software) publishes these algorithms.

4.3.10 ITPACK (Iterative Software Package)


The package contains routines for solving linear systems problems iteratively, mainly in the large and sparse cases.

4.4 Review and Summary


The purpose of this chapter has been to introduce the concepts of numerically effective algorithms and the associated high quality mathematical software.
1. Writing code for a given algorithm is a rather trivial task. However, not all software is high quality software.
We have defined a high quality mathematical software as one which is (1) powerful and flexible, (2) easy to read and modify, (3) portable, (4) robust, and, more importantly, (5) based on a numerically effective algorithm.
2. Like software, there may exist many algorithms for a given problem. However, not all algorithms are numerically effective. We have defined a numerically effective algorithm as one which is (1) general purpose, (2) reliable, (3) stable, and (4) efficient in terms of both time and storage.
3. The efficiency of a matrix algorithm is measured by the computer time consumed by the algorithm. A theoretical measure is the number of flops required to execute the algorithm. Roughly, one multiplication (or a division) together with one addition (or a subtraction) has been defined to be one flop.
A matrix algorithm involving matrices of order n requiring no more than $O(n^3)$ flops has been defined to be an efficient algorithm; stability of an algorithm was defined in Chapter 3.
An important point has been made: An algorithm may be efficient without being stable. Thus, an efficient algorithm may not necessarily be a numerically effective algorithm. There are algorithms which are fast but not stable.
4. Several examples (Section 4.2) have been provided to show how certain basic matrix operations can be reorganized rather easily to make them both storage and time efficient, rather than implementing them naively. Example 4.2.7 is the most important one in this context. Here we have shown how to organize the computation of the product of an n × n matrix A with the matrix $H = I - \frac{2uu^T}{u^Tu}$, known as the Householder matrix, so that the product can be computed with $O(n^2)$ flops rather than the $O(n^3)$ flops that would be needed if it were computed naively, ignoring the structure of H. This computation forms a basis for many other matrix computations described later in the book.
5. A statement on each of several high quality mathematical software packages, such as LINPACK, EISPACK, LAPACK, MATLAB, IMSL, NAG, etc., has been provided in the final section.

4.5 Suggestions for Further Reading


A clear and nice description of how to develop high quality mathematical software packages is included in the book Matrix Computations and Mathematical Software by John Rice, McGraw-Hill Book Company, 1981. This book contains a chapter (Chapter 12) on software projects. The students (and the instructors) interested in comparative studies of various software packages may find them interesting. An excellent paper by Demmel (1984) on the reliability of numerical software is worth reading.
Another recent book in the area is Handbook for Matrix Computations by T. Coleman and C. Van Loan, SIAM, 1988. These two books are a must for readers interested in software development for matrix problems.
The books by Forsythe, Malcolm and Moler (CMMC) and by Hager (ANAL) contain some useful subroutines for matrix computations. See also the books by Johnston, and by Kahaner, Moler and Nash referenced in Chapter 2.

Stewart's book (IMC) is a valuable source for learning how to organize and develop algorithms
for basic matrix operations in time-ecient and storage-economic ways.
Each software package has its own users' manual that describes in detail the functions of the
subroutines, and how to use them, etc. We now list the most important ones.
1. The LINPACK Users' Guide by J. Dongarra, J. Bunch, C. Moler and G. W. Stewart,
SIAM, Philadelphia, PA, 1979.
2. The EISPACK Users' Guide can be obtained from either NESC or IMSL. Matrix Eigen-
system Routines EISPACK Guide by B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S.
Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler, has been published by Springer-Verlag,
Berlin, 1976, as volume 6 of Lecture Notes in Computer Science. A later version, prepared by
Garbow, Boyle, Dongarra and Moler in 1977, is also available as a Springer-Verlag publication.
3. LAPACK Users' Guide, prepared by Chris Bischof, James Demmel, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, and Danny Sorensen, is available from SIAM.
SIAM's address and telephone number are: SIAM (Society for Industrial and Applied Math-
ematics), 3600 University City Science Center, Philadelphia, PA 19114-2688; Tel: (215) 382-
9800.
4. The NAG Library and the associated users' manual can be obtained from:
Numerical Algorithms Group, Inc.
1400 Opus Place, Suite 200
Downers Grove, IL 60151-5702
5. IMSL: The IMSL software library and documentation are available from:
IMSL, Houston, Texas
6. MATLAB: The MATLAB software and Users' Guide are available from:
The MathWorks, Inc.
Cochituate Place
24 Prime Park Way
Natick, MA 01760-1520
TEL: (508) 653-1415; FAX: (508) 653-2997
e-mail: info@mathworks.com
The student edition of MATLAB has been published by Prentice Hall, Englewood Cliffs, NJ 07632. A 5 1/4-inch disk is included with the book for MS-DOS personal computers.

For more information on accessing mathematical software electronically, the paper "Distribution of Mathematical Software via Electronic Mail" by J. Dongarra and E. Grosse, Communications of the ACM, 30(5) (1987), 403-407, is worth reading.
Finally, a nice survey of the blocked algorithms has been given by James Demmel (1993).

Exercises on Chapter 4
1. Develop an algorithm to compute the product C = AB in each of the following cases. Your
algorithm should take advantage of the special structure of the matrices in each case. Give
flop-count and storage requirement in each case.
(a) A and B are both lower triangular matrices.
(b) A is arbitrary and B is lower triangular.
(c) A and B are both tridiagonal.
(d) A is arbitrary and B is upper Hessenberg.
(e) A is upper Hessenberg and B is tridiagonal.
(f) A is upper Hessenberg and B is upper triangular.
2. A square matrix $A = (a_{ij})$ is said to be a band matrix of bandwidth 2k + 1 if
$$a_{ij} = 0 \quad \text{whenever } |i - j| > k.$$
Develop an algorithm to compute the product C = AB, where A is arbitrary and B is a band matrix of bandwidth 2, taking advantage of the structure of the matrix B. Overwrite A with AB and give the flop-count.
3. Using the ideas in Algorithm 4.2.1, develop an algorithm to compute the product $A(I + xy^T)$, where A is an n × n matrix and x and y are n-vectors. Your algorithm should require roughly $2n^2$ flops.
4. Rewrite the algorithm of problem #3 in the special cases when the matrix $I + xy^T$ is
(a) an elementary matrix: $I + me_k^T$, where $m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T$ and $e_k^T$ is the kth row of I;
(b) a Householder matrix: $I - \frac{2uu^T}{u^Tu}$, where u is an n-vector.
5. Let A and B be two symmetric matrices of the same order. Develop an algorithm to compute
C = A + B , taking advantage of symmetry for each matrix. Your algorithm should overwrite
B with C. What is the flop-count?
6. Let $A = (a_{ij})$ be an unreduced lower Hessenberg matrix of order n. Then, given the first row $r_1$, it can be shown (Datta and Datta [1976]) that the successive rows $r_2$ through $r_n$ of $A^k$ (k a positive integer $\leq n$) can be computed recursively as follows:
$$r_{i+1} = \frac{r_iB_i - \sum_{j=1}^{i-1} a_{ij}r_j}{a_{i,i+1}}, \quad i = 1, 2, \ldots, n-1, \quad \text{where } B_i = A - a_{ii}I.$$
Develop an algorithm to implement this. Give the flop-count for the algorithm.
7. Let $a_r$ and $b_r$ denote, respectively, the rth columns of the matrices A and B. Then develop an algorithm to compute the product AB from the formula
$$AB = \sum_{i=1}^{n} a_ib_i^T.$$
Give the flop-count and storage requirement of the algorithm.
8. Consider the matrix
$$A = \begin{pmatrix}
12 & 11 & 10 & \cdots & 3 & 2 & 1 \\
11 & 11 & 10 & \cdots & 3 & 2 & 1 \\
0 & 10 & 10 & \cdots & 3 & 2 & 1 \\
\vdots & \ddots & \ddots & \ddots & & \vdots & \vdots \\
\vdots & & \ddots & \ddots & 2 & 2 & 1 \\
0 & \cdots & \cdots & \cdots & 0 & 1 & 1
\end{pmatrix}.$$
Find the eigenvalues of this matrix. Use MATLAB.
Now perturb the (1,12) element to $10^{-9}$ and compute the eigenvalues of this perturbed matrix. What conclusion do you make about the conditioning of the eigenvalues?

MATLAB AND MATCOM PROGRAMS AND
PROBLEMS ON CHAPTER 4

You will need the program housmul from MATCOM

1. Using the MATLAB function `rand', create a 5 × 5 random matrix and then print out the following outputs:
A(2,:), A(:,1), A(:,5),
A(1, 1:2:5), A([1, 5]), A(4:-1:1, 5:-1:1).
2. Using the function `for', write a MATLAB program to find the inner product and outer product of two n-vectors u and v:
[s] = inpro(u,v)
[A] = outpro(u,v)
Test your program by creating two different vectors u and v using rand(4,1).
3. Learn how to use the following MATLAB commands to create special matrices:
compan      companion matrix
diag        diagonal matrix
ones        constant matrix
zeros       zero matrix
rand        random matrix
tril        lower triangular part
hankel      Hankel matrix
toeplitz    Toeplitz matrix
hilb        Hilbert matrix
triu        upper triangular part
vander      Vandermonde matrix
4. Write MATLAB programs to create the following well-known matrices:
(a) [A] = wilk(n) to create the Wilkinson bidiagonal matrix $A = (a_{ij})$ of order n:
$a_{ii} = n - i + 1$, $i = 1, 2, \ldots, n$;
$a_{i-1,i} = n$, $i = 2, 3, \ldots, n$;
$a_{ij} = 0$ otherwise.
(b) [A] = pie(n) to create the Pie matrix $A = (a_{ij})$ of order n:
$a_{ii} = \alpha$, where $\alpha$ is a parameter near 1 or $n - 1$;
$a_{ij} = 1$ for $i \neq j$.
5. Using the "help" command for "flops", "clock", "etime", etc., learn how to measure the flop-count and timing for an algorithm.
6. Using the MATLAB functions `for', `size', `zeros', write a MATLAB program to find the product of two upper triangular matrices A and B of order m × n and n × p, respectively. Test your program using
A = triu(rand(4,3)),
B = triu(rand(3,3)).
7. Run the MATLAB program housmul(A, u) from MATCOM by creating a random matrix A of order 6 × 3 and a random vector u with six elements. Print the output and the number of flops and elapsed time.
8. Modify the MATLAB program housmul to housxy(A, x, y) to compute the product $(I + xy^T)A$.
Test your program by creating a 15 × 15 random matrix A and vectors x and y of appropriate dimensions. Print the product and the number of flops and elapsed time.

5. SOME USEFUL TRANSFORMATIONS IN NUMERICAL LINEAR ALGEBRA
AND THEIR APPLICATIONS
5.1 A Computational Methodology in Numerical Linear Algebra .................. 135
5.2 Elementary Matrices and LU Factorization ................................. 135
5.2.1 Gaussian Elimination without Pivoting .................................. 136
5.2.2 Gaussian Elimination with Partial Pivoting ............................. 147
5.2.3 Gaussian Elimination with Complete Pivoting ............................ 155
5.3 Stability of Gaussian Elimination ........................................ 160
5.4 Householder Transformations .............................................. 163
5.4.1 Householder Matrices and QR Factorization .............................. 167
5.4.2 Householder QR Factorization of a Non-Square Matrix .................... 173
5.4.3 Householder Matrices and Reduction to Hessenberg Form .................. 174
5.5 Givens Matrices .......................................................... 180
5.5.1 Givens Rotations and QR Factorization .................................. 186
5.5.2 Uniqueness in QR Factorization ......................................... 188
5.5.3 Givens Rotations and Reduction to Hessenberg Form ...................... 191
5.5.4 Uniqueness in Hessenberg Reduction ..................................... 193
5.6 Orthonormal Bases and Orthogonal Projections ............................. 194
5.7 QR Factorization with Column Pivoting .................................... 198
5.8 Modifying a QR Factorization ............................................. 203
5.9 Summary and Table of Comparisons ......................................... 205
5.10 Suggestions for Further Reading ......................................... 209
CHAPTER 5
SOME USEFUL TRANSFORMATIONS IN
NUMERICAL LINEAR ALGEBRA
AND THEIR APPLICATIONS
5. SOME USEFUL TRANSFORMATIONS IN NUMERICAL LINEAR ALGEBRA AND THEIR APPLICATIONS
Objectives
The major objective of this chapter is to introduce fundamental tools such as elementary,
Householder, and Givens matrices and their applications. Here are some of the highlights of the
chapter.
• Various LU-type matrix factorizations: LU factorization using Gaussian elimination without pivoting (Section 5.2.1), MA = U factorization using Gaussian elimination with partial pivoting (Section 5.2.2), and MAQ = U factorization using Gaussian elimination with complete pivoting (Section 5.2.3).
• QR factorization using Householder and Givens matrices (Section 5.4.1 and Section 5.5.1).
• Reduction to Hessenberg form by orthogonal similarity using Householder and Givens matrices (Sections 5.4.3 and 5.5.3).
• Computations of orthonormal bases and orthogonal projections using QR factorizations (Section 5.6).
Background Material Needed for this Chapter
The following background material and tools developed in earlier chapters will be needed for
comprehension of this chapter.
1. Subspace and basis (Section 1.2.1)
2. Rank properties (Section 1.3.2)
3. Orthogonality and projections: Orthonormal basis and orthogonal projections (Sections
1.3.5 and 1.3.6)
4. Special matrices: Triangular, permutation, Hessenberg, orthogonal (Section 1.4)
5. Basic Gaussian elimination (Algorithm 3.1.5)
6. Stability concepts of algorithms (Section 3.2)

5.1 A Computational Methodology in Numerical Linear Algebra
Most computational algorithms to be presented in this book have a common basic structure that
can be described as follows:
1. The problem is first transformed to a "reduced" problem.
2. The reduced problem is then solved exploiting the special structure exhibited by the problem.
3. Finally, the solution of the original problem is recovered from the solution of the reduced
problem.
The reduced problem typically involves a "condensed" form or forms of the matrix A, such as triangular, Hessenberg (almost triangular), tridiagonal, Real Schur Form (quasi-triangular), or bidiagonal. It is the structures of these condensed forms which are exploited in the solution of the reduced problem. For example, the solution of the linear system Ax = b is usually obtained first by triangularizing the matrix A and then solving an equivalent triangular system. In eigenvalue computations, the matrix A is transformed to a Hessenberg form before applying the QR iterations. To compute the singular value decomposition, the matrix A is first transformed to a bidiagonal matrix, and then the singular values of the bidiagonal matrix are computed. These condensed forms are normally achieved through a series of transformations known as elementary, Householder or Givens transformations. We will study these transformations here and show how they can be applied to achieve various condensed forms.

5.2 Elementary Matrices and LU Factorization


In this section we show how to triangularize a matrix A using the classical elimination scheme
known as the Gaussian elimination scheme. The tools of Gaussian elimination are elementary
matrices.
Definition 5.2.1 An elementary lower triangular matrix of order n is a matrix of the form
$$E = \begin{pmatrix}
1 & 0 & \cdots & 0 & \cdots & 0 & 0 \\
0 & 1 & & 0 & & 0 & 0 \\
\vdots & & \ddots & \vdots & & & \vdots \\
0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & m_{k+1,k} & 1 & & \vdots \\
\vdots & & & \vdots & & \ddots & 0 \\
0 & 0 & \cdots & m_{n,k} & 0 & \cdots & 1
\end{pmatrix}$$
Thus, it is an identity matrix except possibly for a few nonzero elements below the diagonal of a single column. If the nonzero elements lie in the kth column, then E has the form
$$E = I + me_k^T,$$
where I is the identity matrix of order n, $m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T$, and $e_k$ is the kth unit vector.
Elementary matrices can be very conveniently used to create zeros in a vector, as shown in the
following lemma.
Lemma 5.2.1 Given
$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \qquad a_1 \neq 0,$$
there is an elementary matrix E such that Ea is a multiple of $e_1$.

Proof. Define
$$E = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-\frac{a_2}{a_1} & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
-\frac{a_n}{a_1} & 0 & 0 & \cdots & 1
\end{pmatrix}.$$
Then E is an elementary lower triangular matrix and is such that
$$Ea = \begin{pmatrix} a_1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Definition 5.2.2 The elements $m_i = -\frac{a_i}{a_1}$, $i = 2, \ldots, n$, are called multipliers.
5.2.1 Gaussian Elimination without Pivoting


We described the basic Gaussian elimination scheme (Gaussian elimination without pivoting) in
Chapter 3 (Algorithm 3.1.5). We will see in this section that this process yields an LU factorization
of A, A = LU , whenever the process can be carried to completion. The key observation is that
the matrix $A^{(k)}$ is the result of premultiplication of $A^{(k-1)}$ by a suitable elementary lower triangular matrix.
Set $A = A^{(0)}$.
Set A = A(0).
Step 1. Find an elementary matrix $E_1$ such that $A^{(1)} = E_1A$ has zeros below the (1,1) entry in the first column. That is, $A^{(1)}$ has the form
$$A^{(1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)}
\end{pmatrix}.$$
Note that it is sufficient to find $E_1$ such that
$$E_1\begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} = \begin{pmatrix} a_{11} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Then $A^{(1)} = E_1A$ will have the above form and is the same as the matrix $A^{(1)}$ obtained at the end of step 1 of Algorithm 3.1.5.
Record the multipliers: $m_{21}, m_{31}, \ldots, m_{n1}$; $m_{i1} = -\frac{a_{i1}}{a_{11}}$, $i = 2, \ldots, n$.

Step 2. Find an elementary matrix $E_2$ such that $A^{(2)} = E_2A^{(1)}$ has zeros below the (2,2) entry in the second column. The matrix $E_2$ can be constructed as follows:

First, find an elementary matrix $\hat{E}_2$ of order (n - 1) such that
$$\hat{E}_2\begin{pmatrix} a_{22}^{(1)} \\ a_{32}^{(1)} \\ \vdots \\ a_{n2}^{(1)} \end{pmatrix} = \begin{pmatrix} a_{22}^{(1)} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Record the multipliers: $m_{32}, \ldots, m_{n2}$; $m_{i2} = -\frac{a_{i2}^{(1)}}{a_{22}^{(1)}}$, $i = 3, \ldots, n$. Then define
$$E_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat{E}_2 \end{pmatrix}.$$
$A^{(2)} = E_2A^{(1)}$ will then have zeros below the (2,2) entry in the second column:
$$A^{(2)} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2n}^{(1)} \\
0 & 0 & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & a_{n3}^{(2)} & \cdots & a_{nn}^{(2)}
\end{pmatrix}.$$
Note that premultiplication of $A^{(1)}$ by $E_2$ does not destroy the zeros already created in $A^{(1)}$. This matrix $A^{(2)}$ is the same as the matrix $A^{(2)}$ of Algorithm 3.1.5.

Step k. In general, at the kth step, an elementary matrix $E_k$ is found such that $A^{(k)} = E_kA^{(k-1)}$ has zeros below the (k, k) entry in the kth column. $E_k$ is computed in two successive steps. First, an elementary matrix $\hat{E}_k$ of order n - k + 1 is constructed such that
$$\hat{E}_k\begin{pmatrix} a_{kk}^{(k-1)} \\ a_{k+1,k}^{(k-1)} \\ \vdots \\ a_{nk}^{(k-1)} \end{pmatrix} = \begin{pmatrix} a_{kk}^{(k-1)} \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
and then $E_k$ is defined as
$$E_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{E}_k \end{pmatrix}.$$
Here $I_{k-1}$ is the matrix of the first (k - 1) rows and columns of the n × n identity matrix I. The matrix $A^{(k)} = E_kA^{(k-1)}$ is the same as the matrix $A^{(k)}$ of Algorithm 3.1.5.
Record the multipliers:
$$m_{k+1,k}, \ldots, m_{nk}; \qquad m_{ik} = -\frac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}}, \quad i = k+1, \ldots, n.$$

Step n-1. At the end of the (n - 1)th step, the matrix $A^{(n-1)}$ is upper triangular and is the same as the matrix $A^{(n-1)}$ of Algorithm 3.1.5:
$$A^{(n-1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & \cdots & a_{2n}^{(1)} \\
0 & 0 & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}^{(n-1)}
\end{pmatrix}.$$

Obtaining L and U.
$$A^{(n-1)} = E_{n-1}A^{(n-2)} = E_{n-1}E_{n-2}A^{(n-3)} = \cdots = E_{n-1}E_{n-2}E_{n-3}\cdots E_2E_1A.$$
Set
$$U = A^{(n-1)}, \qquad L_1 = E_{n-1}E_{n-2}\cdots E_2E_1.$$
Then from above we have
$$U = L_1A.$$
Since each $E_k$ is a unit lower triangular matrix (a lower triangular matrix having 1's along the diagonal), so is the matrix $L_1$ and, therefore, $L_1^{-1}$ exists. (Note that the product of two triangular matrices of one type is a triangular matrix of the same type.)
Set $L = L_1^{-1}$. Then the equation $U = L_1A$ becomes
$$A = LU.$$
This factorization of A is known as LU factorization.
Definition 5.2.3 The entries $a_{11}, a_{22}^{(1)}, \ldots, a_{nn}^{(n-1)}$ are called pivots, and the above process of obtaining the LU factorization is known as Gaussian elimination without row interchanges. It is commonly known as Gaussian elimination without pivoting.
An Explicit Expression for L
Since $L = L_1^{-1} = E_1^{-1}E_2^{-1}\cdots E_{n-1}^{-1}$ and $E_i^{-1} = I - me_i^T$, where
$$m = (0, 0, \ldots, 0, m_{i+1,i}, \ldots, m_{n,i})^T,$$
we have
$$L = \begin{pmatrix}
1 & 0 & \cdots & \cdots & 0 \\
-m_{21} & 1 & 0 & & 0 \\
-m_{31} & -m_{32} & 1 & \ddots & 0 \\
\vdots & \vdots & \vdots & \ddots & 0 \\
-m_{n1} & -m_{n2} & \cdots & -m_{n,n-1} & 1
\end{pmatrix}.$$
Thus, L can be formed without finding any matrix inverse.


Existence and Uniqueness of LU Factorization
It is very important to note that for an LU factorization to exist, the pivots must be di erent
!
0 1
from zero. Thus, LU factorization may not exist even for a very simple matrix. Take A = .
1 0
It can be shown (exercise #3) that if Ar is the submatrix of A consisting of the rst r rows and
columns, then
(r ;1)
22    arr
det Ar = a11a(1) :
Karl Friedrich Gauss (1777-1855) was a German mathematician and astronomer, noted for development of many
classical mathematical theories, and for his calculation of the orbits of the asteroids Ceres and Pallas. Gauss is still
regarded as one of the greatest mathematicians the world has ever produced.

Thus, if $\det A_r$, $r = 1, 2, \ldots, n$, is nonzero, then an LU factorization always exists. Indeed, the LU factorization in this case is unique, for if
$$A = L_1U_1 = L_2U_2,$$
then, since $L_1$, $L_2$ and the matrix A are nonsingular, from
$$\det A = \det(L_1U_1) = \det L_1 \cdot \det U_1$$
and
$$\det A = \det(L_2U_2) = \det L_2 \cdot \det U_2,$$
it follows that $U_1$ and $U_2$ are also nonsingular. Hence
$$L_2^{-1}L_1 = U_2U_1^{-1},$$
where $L_2^{-1}L_1$ is the product of two unit lower triangular matrices and is therefore unit lower triangular; $U_2U_1^{-1}$ is the product of two upper triangular matrices and is therefore upper triangular. Since a unit lower triangular matrix can be equal to an upper triangular matrix only if both are the identity, we have
$$L_1 = L_2, \qquad U_1 = U_2.$$
Theorem 5.2.1 (LU Theorem) Let A be an n × n matrix with all nonzero leading principal minors. Then A can be decomposed uniquely in the form
$$A = LU,$$
where L is unit lower triangular and U is upper triangular.

Remark: Note that in the above theorem, if the diagonal entries of L are not specified, then the factorization is not unique.
Algorithm 5.2.1 LU Factorization Using Elementary Matrices
Given an n × n matrix A, the following algorithm computes, whenever possible, elementary matrices $E_1, E_2, \ldots, E_{n-1}$ and an upper triangular matrix U such that, with
$$L = E_1^{-1}E_2^{-1}\cdots E_{n-1}^{-1},$$
A = LU. The algorithm overwrites A with U.

For k = 1, 2, ..., n - 1 do
1. If $a_{kk} = 0$, stop.
2. Find an elementary matrix $\hat{E}_k = I + me_k^T$ of order (n - k + 1), where
$$m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T,$$
such that
$$\hat{E}_k\begin{pmatrix} a_{kk} \\ \vdots \\ a_{nk} \end{pmatrix} = \begin{pmatrix} a_{kk} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
3. Define $E_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{E}_k \end{pmatrix}$, where $I_{k-1}$ is the matrix of the first (k - 1) rows and columns of the n × n identity matrix I.
4. Save the multipliers $m_{k+1,k}, \ldots, m_{n,k}$. Overwrite $a_{ik}$ with $m_{i,k}$, $i = k+1, \ldots, n$.
5. Compute $A^{(k)} = E_kA$.
6. Overwrite A with $A^{(k)}$.
Example 5.2.1
Find an LU factorization of
$$A = \begin{pmatrix} 2 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 2 & 4 \end{pmatrix}.$$
Step 1. Compute $E_1$. The multipliers are: $m_{21} = -2$, $m_{31} = -\frac{1}{2}$.
$$E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\frac{1}{2} & 0 & 1 \end{pmatrix}$$
$$A \leftarrow A^{(1)} = E_1A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\frac{1}{2} & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 2 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & \frac{5}{2} \end{pmatrix}$$

Step 2. Compute $E_2$. The multiplier is: $m_{32} = -1$.
$$\hat{E}_2 = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}, \qquad E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \hat{E}_2 \end{pmatrix}$$
$$A \leftarrow A^{(2)} = E_2A^{(1)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}\begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & \frac{5}{2} \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}$$
Thus
$$U = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}.$$

Compute L:
$$L_1 = E_2E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ \frac{3}{2} & -1 & 1 \end{pmatrix}$$
$$L = L_1^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ -m_{31} & -m_{32} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ \frac{1}{2} & 1 & 1 \end{pmatrix}$$
(Note that neither L1 nor its inverse needs to be computed explicitly.)
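The factorization just computed can be checked in a line or two of MATLAB (a verification sketch of ours, not part of the algorithm):

    % Check the LU factorization of Example 5.2.1.
    A = [2 2 3; 4 5 6; 1 2 4];
    L = [1 0 0; 2 1 0; 1/2 1 1];     % unit lower triangular factor
    U = [2 2 3; 0 1 0; 0 0 5/2];     % upper triangular factor
    disp(norm(L*U - A))              % should be 0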

Forming the Matrix L and Other Computational Details


1. Each elementary matrix $E_k$ is uniquely determined by the (n - k) multipliers $m_{k+1,k}, \ldots, m_{nk}$. Thus, to construct $E_k$, it is sufficient to save these multipliers.

2. The premultiplication of $A^{(k-1)}$ by $E_k$ affects only the rows (k+1) through n. The first k rows of the product remain the same as those of $A^{(k-1)}$, and only the last (n - k) rows are modified. Thus, if $E_kA^{(k-1)} = A^{(k)} = (a_{ij}^{(k)})$, then
(a) $a_{ij}^{(k)} = a_{ij}^{(k-1)}$ (i = 1, 2, ..., k; j = 1, 2, ..., n) (the first k rows),
(b) $a_{ij}^{(k)} = a_{ij}^{(k-1)} + m_{ik}a_{kj}^{(k-1)}$ (i = k+1, ..., n; j = k+1, ..., n) (the last (n - k) rows).
(Note that this is how the entries of $A^{(k)}$ were obtained from those of $A^{(k-1)}$ in Algorithm 3.1.5.)
For example, let n = 3.
$$E_1A = A^{(1)} = (a_{ij}^{(1)}),$$
$$E_1 = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix}$$
$$E_1A = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix}\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22}^{(1)} & a_{23}^{(1)} \\ 0 & a_{32}^{(1)} & a_{33}^{(1)} \end{pmatrix},$$
where
$$a_{22}^{(1)} = a_{22} + m_{21}a_{12}, \qquad a_{23}^{(1)} = a_{23} + m_{21}a_{13},$$
and so on.
3. As soon as A(k) is formed, it can overwrite A.
4. The vector $(m_{k+1,k}, \ldots, m_{nk})$ has (n - k) elements, and at the kth step exactly (n - k) zeros are produced; an obvious storage scheme is then to store these (n - k) elements in the positions $(k+1, k), (k+2, k), \ldots, (n, k)$ of A below the diagonal.
5. The entries of the upper triangular matrix U can then be stored in the upper half of A, including the diagonal. With this storage scheme, the matrix $A^{(n-1)}$ at the end of the (n - 1)th step will be
$$A \equiv A^{(n-1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & \cdots & a_{1n} \\
-m_{21} & a_{22}^{(1)} & \cdots & \cdots & a_{2n}^{(1)} \\
\vdots & \ddots & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \vdots \\
-m_{n1} & \cdots & \cdots & -m_{n,n-1} & a_{nn}^{(n-1)}
\end{pmatrix}$$
Thus a typical step of Gaussian elimination for LU factorization consists of
(1) forming the multipliers and storing them in the appropriate places below diagonal,
(2) updating the entries of the rows (k + 1) through n and saving them in the upper half of A.
Based on the above discussion, we now present the following algorithm.
Algorithm 5.2.2 Triangularization Using Gaussian Elimination without Pivoting
Let A be an n × n matrix. The following algorithm computes a triangularization of A, whenever
it exists. The algorithm overwrites the upper triangular part of A, including the diagonal, with U,
and the entries of A below the diagonal are overwritten with the multipliers needed to compute L.
For k = 1, 2, ..., n-1 do
1. (Form the multipliers)
       a_ik ≡ m_ik = -a_ik / a_kk   (i = k+1, k+2, ..., n)
2. (Update the entries)
       a_ij ≡ a_ij + m_ik a_kj   (i = k+1, ..., n; j = k+1, ..., n).

Remark: The algorithm does not give the matrix L explicitly; however, it can be formed out of
the multipliers saved at each step, as shown earlier (see the explicit expression for L).
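For readers who wish to experiment on a computer, the following short NumPy sketch (ours, not part of the text; the function name lu_nopivot is our own) follows Algorithm 5.2.2: the multipliers are stored below the diagonal of the working array, and L and U are rebuilt from that single array only for verification.

    import numpy as np

    def lu_nopivot(A):
        """Gaussian elimination without pivoting (Algorithm 5.2.2 style).
        Multipliers m_ik = -a_ik/a_kk are stored below the diagonal of the
        working copy; U occupies the upper triangle (including the diagonal).
        Returns (L, U) with L unit lower triangular such that A = L U."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        for k in range(n - 1):
            if A[k, k] == 0.0:
                raise ZeroDivisionError("zero pivot; the factorization may not exist")
            # Form the multipliers and store them in place of the zeroed entries.
            A[k+1:, k] = -A[k+1:, k] / A[k, k]
            # Update the trailing block:  a_ij <- a_ij + m_ik * a_kj.
            A[k+1:, k+1:] += np.outer(A[k+1:, k], A[k, k+1:])
        U = np.triu(A)
        L = np.eye(n) - np.tril(A, -1)   # L carries -m_ik below the diagonal
        return L, U

    # Example 5.2.1 revisited:
    A = np.array([[2., 2., 3.], [4., 5., 6.], [1., 2., 4.]])
    L, U = lu_nopivot(A)
    print(np.allclose(L @ U, A))         # True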

Flop-Count. The algorithm requires roughly n^3/3 flops. This can be seen as follows.
At step 1, we compute (n-1) multipliers and update (n-1)^2 entries of A. Each multiplier
requires one flop and updating each entry also requires 1 flop. Thus, step 1 requires
(n-1)^2 + (n-1) flops.
Step 2, similarly, requires [(n-2)^2 + (n-2)] flops, and so on. In general, step k requires
[(n-k)^2 + (n-k)] flops. Since there are (n-1) steps, we have

    Total flops = Σ_{k=1}^{n-1} (n-k)^2 + Σ_{k=1}^{n-1} (n-k)
                = n(n-1)(2n-1)/6 + n(n-1)/2
                ≈ n^3/3 + O(n^2).

Recall
    1. 1^2 + 2^2 + ... + r^2 = r(r+1)(2r+1)/6,
    2. 1 + 2 + ... + r = r(r+1)/2.

Gaussian Elimination for a Rectangular Matrix


The Gaussian elimination process described above for an n × n matrix A can be easily extended
to an m × n matrix to compute its LU factorization, when it exists. The process is identical; only
the number of steps in this case is k = min{m-1, n}. The following is an illustrative example. Let
        [ 1  2 ]
    A = [ 3  4 ],       m = 3, n = 2.
        [ 5  6 ]
The number of steps is k = min(2, 2) = 2.

Step 1. The multipliers are m_21 = -3, m_31 = -5:
    a_22 ≡ a_22^(1) = a_22 + m_21 a_12 = -2,
    a_32 ≡ a_32^(1) = a_32 + m_31 a_12 = -4,
                [ 1   2 ]
    A ≡ A^(1) = [ 0  -2 ].
                [ 0  -4 ]
Step 2. The multiplier is m_32 = -2;  a_32 ≡ a_32^(2) = 0:
                [ 1   2 ]                   [ 1   2 ]
    A ≡ A^(2) = [ 0  -2 ],    so that   U = [ 0  -2 ].
                [ 0   0 ]                   [ 0   0 ]
Note that U in this case is upper trapezoidal rather than an upper triangular matrix.
        [  1      0      0 ]   [ 1  0  0 ]
    L = [ -m_21   1      0 ] = [ 3  1  0 ].
        [ -m_31  -m_32   1 ]   [ 5  2  1 ]
Verify that
         [ 1  0  0 ] [ 1   2 ]   [ 1  2 ]
    LU = [ 3  1  0 ] [ 0  -2 ] = [ 3  4 ] = A.
         [ 5  2  1 ] [ 0   0 ]   [ 5  6 ]
Difficulties with Gaussian Elimination without Pivoting
As we have seen before, Gaussian elimination without pivoting fails if any of the pivots is zero.
However, it is worse yet if any pivot becomes close to zero: in this case the method can be
carried to completion, but the obtained results may be totally wrong.
Consider the following celebrated example from Forsythe and Moler (CSLAS, p. 34):
Let Gaussian elimination without pivoting be applied to
    A = [ 0.0001  1 ; 1  1 ],
and use three-digit arithmetic. There is only one step. We have: multiplier m_21 = -1/10^{-4} = -10^4,
    U = A^(1) = [ 0.0001  1 ; 0  -10^4 ],
    L = [ 1  0 ; 10^4  1 ].
The product of the computed L and U gives
    LU = [ 0.0001  1 ; 1  0 ],
which is different from A. Who is to blame?
Note that the pivot a_11^(1) = 0.0001 is very close to zero (in three-digit arithmetic). This small
pivot gave a large multiplier. The large multiplier, when used to update the entries, eliminated the
small entries (e.g., 1 - 10^4 gave -10^4). Fortunately, we can avoid this small pivot just by
row interchanges. Consider the matrix with the first row written second and the second written
first:
    A' = [ 1  1 ; 0.0001  1 ].
Gaussian elimination now gives
    U = A^(1) = [ 1  1 ; 0  1 ],      L = [ 1  0 ; 0.0001  1 ].
Note that the pivot in this case is a_11^(1) = 1. The product LU = [ 1  1 ; 0.0001  1.0001 ] = A'.
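The same effect is easy to reproduce in ordinary floating-point arithmetic. The sketch below is our own illustration (not from the text); it uses a pivot of 1e-20 instead of 0.0001 so that the loss of information shows up even in IEEE double precision. Without pivoting the computed factors reproduce a matrix whose (2,2) entry is 0 instead of 1, exactly as in the three-digit example above, while one row interchange repairs the problem.

    import numpy as np

    eps_pivot = 1e-20                    # plays the role of 0.0001 in three-digit arithmetic
    A = np.array([[eps_pivot, 1.0],
                  [1.0,       1.0]])

    # Gaussian elimination without pivoting (one step).
    m21 = -A[1, 0] / A[0, 0]             # huge multiplier, about -1e20
    U = np.array([[A[0, 0], A[0, 1]],
                  [0.0,     A[1, 1] + m21 * A[0, 1]]])
    L = np.array([[1.0,  0.0],
                  [-m21, 1.0]])
    print(L @ U)                         # (2,2) entry is 0.0, not 1.0: original data wiped out

    # Interchanging the rows first (partial pivoting) keeps the multiplier small.
    Ap = A[[1, 0], :]
    m21 = -Ap[1, 0] / Ap[0, 0]           # now about -1e-20
    Up = np.array([[Ap[0, 0], Ap[0, 1]],
                   [0.0,      Ap[1, 1] + m21 * Ap[0, 1]]])
    Lp = np.array([[1.0,  0.0],
                   [-m21, 1.0]])
    print(Lp @ Up)                       # reproduces the permuted matrix A' accurately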
5.2.2 Gaussian Elimination with Partial Pivoting
In the example above, we have found a factorization of the matrix A0 which is a permuted version
of A in the sense that the rows have been swapped. A primary purpose of factorizing a matrix A
into LU is to use this factorization to solve a linear system. It is easy to see that the solution of the
system Ax = b and that of the system A0 x = b0, where b0 has been obtained in a manner similar to
that used to generate A0 , are the same. Thus, if the row interchanges can help avoid a small pivot,
it is certainly desirable to do so.
As the above example suggests, disaster in Gaussian elimination without pivoting can perhaps
be avoided by identifying a \good pivot" (a pivot as large as possible) at each step, before the
process of elimination is applied. The good pivot may be located among the entries in a column
or among all the entries in a submatrix of the current matrix. In the former case, since the search
is only partial, the method is called partial pivoting; in the latter case, the method is called
complete pivoting. It is important to note that the purpose of pivoting is to prevent
large growth in the reduced matrices which can wipe out original data. One way to
do this is to keep multipliers less than one in magnitude, and this is exactly what is accomplished
by pivoting. However, large multipliers do not necessarily mean instability (see our discussion of
Gaussian elimination without pivoting for symmetric positive definite matrices in Chapter 6). We
first describe Gaussian elimination with partial pivoting.
The process consists of (n ; 1) steps.

Step 1. Scan the first column of A to identify the largest element in magnitude in that column.
Let it be a_{r_1,1}.

• Form a permutation matrix P_1 by interchanging the rows 1 and r_1 of the identity matrix and
  leaving the other rows unchanged.
• Form P_1 A by interchanging the rows r_1 and 1 of A.
• Find an elementary lower triangular matrix M_1 such that A^(1) = M_1 P_1 A has zeros below the
  (1,1) entry in the first column.

It is sufficient to construct M_1 such that
    M_1 (a_11, a_21, ..., a_n1)^T = (a_11, 0, ..., 0)^T.
Note that
          [ 1     0  0  ...  0 ]
          [ m_21  1  0  ...  0 ]
    M_1 = [ m_31  0  1  ...  0 ]
          [  .    .  .   .   . ]
          [ m_n1  0  0  ...  1 ]
where m_21 = -a_21/a_11, m_31 = -a_31/a_11, ..., m_n1 = -a_n1/a_11. Note that a_ij refers to the
(i,j)th entry of the permuted matrix P_1 A. Save the multipliers m_i1, i = 2, ..., n, and record the
row interchange.
            [ *  *  ...  * ]
            [ 0  *  ...  * ]
    A^(1) = [ .  .       . ]
            [ 0  *  ...  * ]
Step 2. Scan the second column of A^(1) below the first row to identify the largest element
in magnitude in that column. Let the element be a_{r_2,2}^(1). Form the permutation matrix P_2 by
interchanging the rows 2 and r_2 of the identity matrix and leaving the other rows unchanged. Form
P_2 A^(1).
Next, find an elementary lower triangular matrix M_2 such that A^(2) = M_2 P_2 A^(1) has zeros below
the (2,2) entry. M_2 is constructed as follows. First, construct an elementary matrix M̂_2 of order
(n-1) such that
    M̂_2 (a_22, a_32, ..., a_n2)^T = (a_22, 0, ..., 0)^T,
then define
          [ 1  0    ]
    M_2 = [ 0  M̂_2 ].
Note that a_ij refers to the (i,j)th entry of the current matrix P_2 A^(1). At the end of Step 2,
we will have
                           [ *  *  *  ...  * ]
                           [ 0  *  *  ...  * ]
    A^(2) = M_2 P_2 A^(1) = [ 0  0  *  ...  * ],
                           [ .  .  .       . ]
                           [ 0  0  *  ...  * ]
          [ 1  0     0  ...  0 ]
          [ 0  1     0  ...  0 ]
    M_2 = [ 0  m_32  1  ...  0 ]
          [ .   .    .   .   . ]
          [ 0  m_n2  0  ...  1 ]
where m_i2 = -a_i2/a_22, i = 3, 4, ..., n. Save the multipliers m_i2 and record the row interchange.

Step k. In general, at the kth step, scan the entries of the kth column of the matrix A^(k-1)
below the row (k-1) to identify the pivot a_{r_k,k}, form the permutation matrix P_k, and find an
elementary lower triangular matrix M_k such that A^(k) = M_k P_k A^(k-1) has zeros below the (k,k)
entry. M_k is constructed first by constructing M̂_k of order (n-k+1) such that
    M̂_k (a_kk, a_{k+1,k}, ..., a_nk)^T = (a_kk, 0, ..., 0)^T,
and then defining
          [ I_{k-1}  0   ]
    M_k = [ 0        M̂_k ],
where `0' is a matrix of zeros. The elements a_{i,k} refer to the (i,k)th entries of the matrix
P_k A^(k-1).

Step n-1. At the end of the (n-1)th step, the matrix A^(n-1) will be an upper triangular
matrix.

Form U: Set
    A^(n-1) = U.                                                            (5.2.1)

Then
    U = A^(n-1) = M_{n-1} P_{n-1} A^(n-2)
      = M_{n-1} P_{n-1} M_{n-2} P_{n-2} A^(n-3) = ... = M_{n-1} P_{n-1} M_{n-2} P_{n-2} ... M_2 P_2 M_1 P_1 A.
Set
    M_{n-1} P_{n-1} M_{n-2} P_{n-2} ... M_2 P_2 M_1 P_1 = M.                (5.2.2)
Then we have from above the following factorization of A:
    U = MA.

Theorem 5.2.2 (Partial Pivoting Factorization Theorem) Given an n  n


nonsingular matrix A, Gaussian elimination with partial pivoting gives an upper
triangular matrix U and a \permuted" lower triangular matrix M such that
MA = U;
where M and U are given by (5.2.2) and (5.2.1), respectively.

From (5.2.2) it is easy to see that there exists a permutation matrix P such that PA = LU.
Define
    P = P_{n-1} ... P_2 P_1,                                                (5.2.3)
    L = P (M_{n-1} P_{n-1} ... M_1 P_1)^{-1}.                               (5.2.4)
Then PA = LU .

Corollary 5.2.1 (Partial Pivoting LU Factorization Theorem). Given an


n  n nonsingular matrix A, Gaussian elimination with partial pivoting yields LU
factorization of a permuted version of A:
PA = LU;
where P is a permutation matrix given by (5.2.3), L is a unit lower triangular matrix
given by (5.2.4), and U is an upper triangular matrix given by (5.2.1).

Example 5.2.2
    A = [ 0.0001  1 ; 1  1 ].
Only one step is needed. The pivot entry is 1; r_1 = 2:
    P_1 = [ 0  1 ; 1  0 ],      P_1 A = [ 1  1 ; 0.0001  1 ],
    m_21 = -0.0001/1 = -10^{-4},
    M_1 = [ 1  0 ; m_21  1 ] = [ 1  0 ; -10^{-4}  1 ],
    M_1 P_1 A = [ 1  0 ; -10^{-4}  1 ] [ 1  1 ; 0.0001  1 ] = [ 1  1 ; 0  1 ] = U,
    M = M_1 P_1 = [ 1  0 ; -10^{-4}  1 ] [ 0  1 ; 1  0 ] = [ 0  1 ; 1  -10^{-4} ],
    MA = [ 0  1 ; 1  -10^{-4} ] [ 0.0001  1 ; 1  1 ] = [ 1  1 ; 0  1 ] = U.
Example 5.2.3
Triangularize 0 0 1 11
B 1 2 3 CC
A=B
@ A
1 1 1
using partial pivoting. Express A = MU . Find also P and L such that PA = LU .

Step 1. The pivot entry is a = 1, r = 2


21 1

00 1 01
B C
P = B @ 1 0 0 CA
1

0 0 1
01 2 31
B C
PA = B
1 @ 0 1 1 CA ;
1 1 1

151
0 1 0 01
B C
M = B
@ 0 1 0 CA
1

;1 0 1
01 2 3 1
A = M P A=B
B 0 1 1 CC
(1)
@ A 1 1

0 ;1 ;2
Step 2. The pivot entry is a = 1 22

01 2 3 1
PA = B
B 0 1 1 CC ;
2 @
(1)
A
0 ;1 ;2
P = I (no interchange is necessary)
!
2 3

c = 1 0
M ;
2
1 1
01 0 01
!
M =
I 0 1
=
BB 0 1 0 CC
2
c
0 M @ A
0 1 1 2

01 2 3 1
B C
U = A(2) = M2 P2 A(1) = B
@ 0 1 1 CA
0 0 ;1
00 1 01
M = M2 P2 M1P1 = B
B 1 0 0 CC :
@ A
1 ;1 1
It is easily veri ed that A = MU .

Form L and P :
00 1 01
B 1 0 0 CC
P = P P =B
@ 2 A 1

0 0 1
01 0 01
B C
L = P (M P M P ); = B
@ 0 1 0 CA :
2 2 1 1
1

1 ;1 1
It is easy to verify that PA = LU .

152
Forming the Matrix M and Other Computational Details
1. Each permutation matrix Pk can be formed just by recording the index rk ; since Pk is the
permuted identity matrix in which rows k and rk have been interchanged. However, neither
the permutation matrix Pk nor the product Pk A(k;1) needs to be formed explicitly. This is
because the matrix Pk A(k;1) is just the permuted version of A(k;1) in which the
rows rk and k have been interchanged.
2. Each elementary matrix Mk can be formed just by saving the (n ; k + 1) multipliers. The
matrices MkPk A(k;1) = Mk B also do not need to be computed explicitly. Note that
the elements in the rst k rows of the matrix Mk B are the same as the elements of the rst
k rows of the matrix B, and the elements in the remaining (n ; k) rows are given by:
bij + mik bkj (i = k + 1; : : :; n; j = k + 1; : : :; n):

3. The multipliers can be stored in the appropriate places of lower triangular part of A (below
the diagonal) as they are computed.
4. The nal upper triangular matrix U = A(n;1) is stored in the upper triangular part.
5. The pivot indices rk are stored in a separate single subscripted integer array.
6. A can be overwritten with each A(k) as soon as the latter is formed.
Again, the major programming requirement is a subroutine that computes an elemen-
tary matrix M such that, given a vector a, Ma is a multiple of the rst column of the
identity matrix.
In view of our above discussion, we can now formulate the following practical algorithm for
LU factorization with partial pivoting.
Algorithm 5.2.3 Triangularization Using Gaussian Elimination with Partial Pivoting
Let A be an n  n nonsingular matrix. Then the following algorithm computes the triangu-
larization of A with rows permuted, using Gaussian elimination with partial pivoting. The upper
triangular matrix U is stored in the upper triangular part of A, including the diagonal. The multi-
pliers needed to compute the permuted triangular matrix M such that MA = U are stored in the
lower triangular part of A. The permutation indices rk are stored in a separate array.
For k = 1, 2, ..., n-1 do
1. Find r_k so that |a_{r_k,k}| = max_{k ≤ i ≤ n} |a_ik|. Save r_k.
   If a_{r_k,k} = 0, then stop. Otherwise, continue.
2. (Interchange the rows r_k and k)  a_kj ↔ a_{r_k,j}  (j = k, k+1, ..., n).
3. (Form the multipliers)  a_ik ≡ m_ik = -a_ik / a_kk  (i = k+1, ..., n).
4. (Update the entries)  a_ij ≡ a_ij + m_ik a_kj = a_ij + a_ik a_kj  (i = k+1, ..., n; j = k+1, ..., n).

Flop-count. The algorithm requires about n^3/3 flops and O(n^2) comparisons.
(Note that the search for the pivot at step k requires (n-k) comparisons.)

Note: Algorithm 5.2.3 does not give the matrices M and P explicitly. However, they
can be constructed easily as explained above, from the multipliers and the permutation
indices, respectively.
Remark: The above algorithm accesses the rows of A in the innermost loop and that is why
it is known as the row-oriented Gaussian elimination (with partial pivoting) algorithm.
It is also known as the kij algorithm; note that i and j appear in the inner loops. The column-
oriented algorithm can be similarly developed. Such a column-oriented algorithm has been used in
LINPACK (LINPACK routine SGEFA).
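A minimal row-oriented NumPy sketch of Algorithm 5.2.3 follows (the function name lu_partial_pivot and the return convention are our choices, not from the text). It stores the multipliers below the diagonal, records the pivot indices r_k in a separate array, and, purely for verification, assembles P and L so that P A = L U can be checked.

    import numpy as np

    def lu_partial_pivot(A):
        """Gaussian elimination with partial pivoting (Algorithm 5.2.3 style).
        Returns (piv, L, U): piv[k] records the interchanges, L is unit lower
        triangular, U upper triangular, and P A = L U for P built from piv."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        piv = np.arange(n)
        for k in range(n - 1):
            r = k + np.argmax(np.abs(A[k:, k]))   # 1. pivot search in column k
            if A[r, k] == 0.0:
                raise ValueError("matrix is singular")
            A[[k, r], :] = A[[r, k], :]           # 2. interchange rows k and r_k
            piv[[k, r]] = piv[[r, k]]
            A[k+1:, k] = -A[k+1:, k] / A[k, k]    # 3. multipliers, stored below the diagonal
            A[k+1:, k+1:] += np.outer(A[k+1:, k], A[k, k+1:])   # 4. update
        U = np.triu(A)
        L = np.eye(n) - np.tril(A, -1)
        return piv, L, U

    A = np.array([[1., 2., 4.], [4., 5., 6.], [7., 8., 9.]])
    piv, L, U = lu_partial_pivot(A)
    P = np.eye(3)[piv]
    print(np.allclose(P @ A, L @ U))              # True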
Example 5.2.4 01 2 41
B C
A=B
@ 4 5 6 CA :
7 8 9
Step 1. k = 1.
1. The pivot entry is 7: r1 = 3.
2. Interchange rows 3 and 1: 07 8 91
B C
AB
@ 4 5 6 CA :
1 2 4
3. Form the multipliers:
a  m = ; 74 ; a  m = ; 17 :
21 21 31 31

154
4. Update: 07 8 9 1
B CC
AB
@0 A: 3
7
6
7

0 6
7
19
7

Step 2. k = 2.
1. The pivot entry is 67 .
2. Interchange rows 2 and 3:
07 8 9 1
B CC
AB
@0 A: 6
7
19
7

0 3
7
6
7

3. Form the multipliers:


m = ; 12 :
32

4. Update: 07 8 9 1
B0
A = B
CC :
@ A6
7
19
7

0 0 ; 1
2

Form M . 00 0 11
M =B
B 1 0 ; CC :
@ A 1
7

; 1 ;
1
2
1
2

5.2.3 Gaussian Elimination with Complete Pivoting


In Gaussian elimination with complete pivoting, at the kth step, the search for the pivots is made
among all the entries of the submatrix below the rst (k ; 1) rows. Thus, if the pivot is ars , to
bring this pivot to the (k; k) position, the interchange of the rows r and k has to be followed by
the interchange of the columns k and s. This is equivalent to premultiplying the matrix A(k;1) by
a permutation matrix Pk obtained by interchanging rows k and r and post-multiplying Pk A(k;1)
by another permutation matrix Qk obtained by interchanging the columns k and s of the identity
matrix I . The ordinary Gaussian elimination is then applied to the matrix Pk A(k;1)Qk ; that is, an
elementary lower triangular matrix Mk is sought such that the matrix
    A^(k) = M_k P_k A^(k-1) Q_k

has zeros on the kth column below the (k; k) entry. The matrix Mk can of course be computed in
two smaller steps as before.
At the end of the (n-1)th step, the matrix A^(n-1) is an upper triangular matrix. Set
    A^(n-1) = U.                                                            (5.2.5)
Then
    U = A^(n-1) = M_{n-1} P_{n-1} A^(n-2) Q_{n-1}
      = M_{n-1} P_{n-1} M_{n-2} P_{n-2} A^(n-3) Q_{n-2} Q_{n-1}
      = ... = M_{n-1} P_{n-1} M_{n-2} P_{n-2} ... M_1 P_1 A Q_1 Q_2 ... Q_{n-1}.
Set
    M_{n-1} P_{n-1} ... M_1 P_1 = M,                                        (5.2.6)
    Q_1 ... Q_{n-1} = Q.                                                    (5.2.7)
Then we have
    U = MAQ.

Theorem 5.2.3 (Complete Pivoting Factorization Theorem) Given an nn


matrix A, Gaussian elimination with complete pivoting yields an upper triangular
matrix U , a permuted lower triangular matrix M and a permutation matrix Q such
that
MAQ = U;
where U; M , and Q are given by (5.2.5){(5.2.7).

As in the case of partial pivoting, it is easy to see from (5.2.4) and (5.2.7) that the factorization
MAQ = U can be expressed in the form:
PAQ = MU:

Corollary 5.2.2 (Complete Pivoting LU Factorization Theorem). Gaussian
elimination with complete pivoting yields the factorization PAQ = LU , where P
and Q are permutation matrices given by
    P = P_{n-1} ... P_1,
    Q = Q_1 ... Q_{n-1},
and L is a unit lower triangular matrix given by
    L = P (M_{n-1} P_{n-1} ... M_1 P_1)^{-1}.

Example 5.2.5
Triangularize 00 1 1 1
B1 2 3
A=B
CC
@ A
1 1 1
using complete pivoting.

Step 1. The pivot entry is a = 3.


23

00 1 01
B C
P =B
1 @ 1 0 0 CA ;
0 0 1
00 0 11 03 2 11
B C B C
Q =B
1 @ 0 1 0 CA ; P AQ = B
1 @ 1 1 0 CA ;
1

1 0 0 1 1 1
0 1 0 01
M =B
B; C
@
1 1 0C
A; 1
3

; 0 1 1

03 2 1 3
1
B CC
A = M P AQ = B
(1)
1 1@0 ;1
1
3
1
3 A:
0 1
3
2
3

157
Step 2. The pivot entry is a = 23 .
(1)
33

01 0 01 01 0 01
B C B C
P =B @ 0 0 1 CA ;
2 Q =B
2 @ 0 0 1 CA :
0 1 0 0 1 0
03 2 11
!
B C c2 = 1 1 0
@ 0 23 13 CA ; M
P2A(1) Q2 = B :
1
0 ;3 31 1 2

01 0 01
M2 = B
B 0 1 0 CC ;
@ A
0 21 1
03 2 11
B 2 1 CC :
U = A(2) = M2P2A(1)Q2 = M2P2(M1P1AQ1 )Q2 = B @0 3 3 A
0 0 12
(Using Corollary 5.2.2, nd for yourself P , Q, L, and U such that PAQ = LU .)
Forming the Matrix M and other Computational Details
Remarks similar to those as in the case of partial pivoting hold. The matrices Pk ; Qk Pk A(k;1)Qk ,
Mk and Mk PkA(k;1)Qk do not have to be formed explicitly wasting storage unnecessarily. It is
enough to save the indices and the multipliers.
In view of our discussion on forming the matrices Mk and the permutation matrix Pk , we now
present a practical Gaussian elimination algorithm with complete pivoting, which does not show
the explicit formation of the matrices Pk ; Qk ; Mk ; MkA and Pk AQk . Note that partial pivoting
is just a special case of complete pivoting.
Algorithm 5.2.4 Triangularization Using Gaussian Elimination with Complete Pivoting
Given an n × n matrix A, the following algorithm computes the triangularization of A with rows and
columns permuted, using Gaussian elimination with complete pivoting. The algorithm overwrites
A with U. U is stored in the upper triangular part of A (including the diagonal) and the multipliers
m_ik are stored in the lower triangular part. The permutation indices r_k and s_k are saved separately.
For k = 1, 2, ..., n-1 do
1. Find r_k and s_k such that |a_{r_k,s_k}| = max { |a_ij| : i, j ≥ k }, and save r_k and s_k.
   If a_{r_k,s_k} = 0, then stop. Otherwise, continue.
2. (Interchange the rows r_k and k)  a_kj ↔ a_{r_k,j}  (j = k, k+1, ..., n).
3. (Interchange the columns s_k and k)  a_ik ↔ a_{i,s_k}  (i = 1, 2, ..., n).
4. (Form the multipliers)  a_ik ≡ m_ik = -a_ik / a_kk  (i = k+1, ..., n).
5. (Update the entries of A)  a_ij ≡ a_ij + m_ik a_kj = a_ij + a_ik a_kj  (i = k+1, ..., n; j = k+1, ..., n).

Note: Algorithm 5.2.4 does not give the matrices M, P, and Q explicitly; they have
to be formed, respectively, from the multipliers m_ik and the permutation indices r_k
and s_k, as explained above.

Flop-count: The algorithm requires n^3/3 flops and O(n^3) comparisons.
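The following is a small NumPy sketch along the lines of Algorithm 5.2.4 (the name lu_complete_pivot and the return convention are ours). It records the row and column interchanges and verifies at the end that P A Q = L U.

    import numpy as np

    def lu_complete_pivot(A):
        """Gaussian elimination with complete pivoting (Algorithm 5.2.4 style)."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        rowp = np.arange(n)          # accumulated row permutation
        colp = np.arange(n)          # accumulated column permutation
        for k in range(n - 1):
            sub = np.abs(A[k:, k:])  # 1. search the whole trailing submatrix
            r, s = np.unravel_index(np.argmax(sub), sub.shape)
            r += k; s += k
            if A[r, s] == 0.0:
                break
            A[[k, r], :] = A[[r, k], :]; rowp[[k, r]] = rowp[[r, k]]   # 2. row interchange
            A[:, [k, s]] = A[:, [s, k]]; colp[[k, s]] = colp[[s, k]]   # 3. column interchange
            A[k+1:, k] = -A[k+1:, k] / A[k, k]                          # 4. multipliers
            A[k+1:, k+1:] += np.outer(A[k+1:, k], A[k, k+1:])           # 5. update
        U = np.triu(A)
        L = np.eye(n) - np.tril(A, -1)
        return rowp, colp, L, U

    A = np.array([[1., 2.], [3., 4.]])
    rowp, colp, L, U = lu_complete_pivot(A)
    P, Q = np.eye(2)[rowp], np.eye(2)[:, colp]
    print(np.allclose(P @ A @ Q, L @ U))          # True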

Example 5.2.6
1 2
!
A= :
3 4
Just one step is needed.

k=1: The pivot entry is 4.


r = 2
1

s = 2:
1

First, the second and rst rows are switched and this is then followed by the switch of the second
and rst column to obtain the pivot entry 4 in the (1,1) position:
3 4
!
A
1 2
(After the interchange of the rst and second rows).
4 3
!
A
2 1
(After the interchange of the rst and second columns).
Multiplier is: a21  m21 = ; aa21 = ; 42 = ; 12
11

4 3
!
A
0 ; 12

159
(After updating the entries of A).
0 1
! 0 1
!
P1 = ; Q = Q1 = ;
1 0 1 0
!
1 0 0 1
! 0 1
!
M = M1P1 = 1 = :
;2 1 1 0 1 ; 12
5.3 Stability of Gaussian Elimination
The stability of Gaussian elimination algorithms is better understood by measuring the growth of
the elements in the reduced matrices A(k). (Note that although pivoting keeps the multipli-
ers bounded by unity, the elements in the reduced matrices still can grow arbitrarily).
We remind the readers of the definition of the growth factor in this context, given in Chapter 3.
Definition 5.3.1 The growth factor ρ is the ratio of the largest element (in magnitude) of
A, A^(1), ..., A^(n-1) to the largest element (in magnitude) of A:
    ρ = max(α, α_1, α_2, ..., α_{n-1}) / α,
where α = max_{i,j} |a_ij| and α_k = max_{i,j} |a_ij^(k)|.
Example 5.3.1
    A = [ 0.0001  1 ; 1  1 ].
1. Gaussian elimination without pivoting gives
       A^(1) = U = [ 0.0001  1 ; 0  -10^4 ],
       max_{i,j} |a_ij^(1)| = 10^4,   max_{i,j} |a_ij| = 1,
       ρ = the growth factor = 10^4.
2. Gaussian elimination with partial pivoting yields
       A^(1) = U = [ 1  1 ; 0  1 ],
       max_{i,j} |a_ij^(1)| = 1,   max_{i,j} |a_ij| = 1,
       ρ = the growth factor = 1.
The question naturally arises: how large can the growth factor ρ be for an arbitrary
matrix? We answer this question in the following.

1. Growth Factor of Gaussian Elimination for Complete Pivoting

For Gaussian elimination with complete pivoting,
    ρ ≤ { n · 2 · 3^{1/2} · 4^{1/3} ⋯ n^{1/(n-1)} }^{1/2}.
This is a slowly growing function of n. Furthermore, in practice this bound is never attained.
Indeed, there was an unproven conjecture by Wilkinson (AEP, p. 213) that the growth
factor for complete pivoting was bounded by n for real n × n matrices. Unfortunately,
this conjecture has recently been settled by Gould (1991) negatively. Gould (1991)
exhibited a 13 × 13 matrix for which Gaussian elimination with complete pivoting gave the growth
factor ρ = 13.0205. In spite of Gould's recent result, Gaussian elimination with complete
pivoting is a stable algorithm.

2. Growth Factor of Gaussian Elimination for Partial Pivoting

For Gaussian elimination with partial pivoting, ρ ≤ 2^{n-1}; that is,
    ρ can be as big as 2^{n-1}.
Unfortunately, one can construct matrices for which this bound is attained.
Consider the following example:
        [  1   0   0  ...  0  1 ]
        [ -1   1   0  ...  0  1 ]
    A = [ -1  -1   1  ...  0  1 ]
        [  .   .   .   .   .  . ]
        [ -1  -1  ... -1  -1  1 ]
That is,
    a_ij = 1   for j = i or j = n,
         = -1  for j < i,
         = 0   otherwise.
Wilkinson (AEP, p. 212) has shown that the growth factor ρ for this matrix with partial pivoting
is 2^{n-1}. To see this, take the special case with n = 4:
          [  1   0   0  1 ]
    A   = [ -1   1   0  1 ]
          [ -1  -1   1  1 ]
          [ -1  -1  -1  1 ]
            [ 1   0   0  1 ]
    A^(1) = [ 0   1   0  2 ]
            [ 0  -1   1  2 ]
            [ 0  -1  -1  2 ]
            [ 1   0   0  1 ]
    A^(2) = [ 0   1   0  2 ]
            [ 0   0   1  4 ]
            [ 0   0  -1  4 ]
            [ 1   0   0  1 ]
    A^(3) = [ 0   1   0  2 ]
            [ 0   0   1  4 ]
            [ 0   0   0  8 ]
Thus the growth factor
    ρ = 8/1 = 2^3 = 2^{4-1}.
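The doubling is easy to watch numerically. The sketch below is our own illustration (all names are ours): it builds the matrix above and performs the elimination; for this matrix no entry below a pivot ever exceeds the pivot in magnitude, so partial pivoting performs no interchanges and the plain elimination reproduces the partial-pivoting growth.

    import numpy as np

    def wilkinson_matrix(n):
        """a_ij = 1 for j = i or j = n (last column), -1 for j < i, 0 otherwise."""
        A = np.eye(n) + np.tril(-np.ones((n, n)), -1)
        A[:, -1] = 1.0
        return A

    def growth_factor_no_interchange(A):
        """Growth factor of Gaussian elimination for a matrix on which partial
        pivoting would perform no row interchanges (true for wilkinson_matrix)."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        alpha = np.abs(A).max()
        biggest = alpha
        for k in range(n - 1):
            A[k+1:, k+1:] -= np.outer(A[k+1:, k] / A[k, k], A[k, k+1:])
            A[k+1:, k] = 0.0
            biggest = max(biggest, np.abs(A).max())
        return biggest / alpha

    print(growth_factor_no_interchange(wilkinson_matrix(4)))    # 8.0   = 2**3
    print(growth_factor_no_interchange(wilkinson_matrix(10)))   # 512.0 = 2**9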

Remarks: Note that this is not the only matrix for which ρ = 2^{n-1}. Higham and Higham (1987)
have identified a set of matrices for which ρ = 2^{n-1}. The matrix
        [  0.7248   0.7510   0.5241   0.7510 ]
    B = [  0.7317   0.1889   0.0227  -0.7510 ]
        [  0.7298  -0.3756   0.1150   0.7511 ]
        [ -0.6993  -0.7444   0.6647  -0.7500 ]
is such a matrix. See Higham (1993).
Examples of the above type are rare. Indeed, in many practical examples, the elements of the
matrices A(k) very often continue to decrease in size. Thus, Gaussian elimination with partial
pivoting is not unconditionally stable in theory, but in practice it can be considered a
stable algorithm.
3. Growth Factor and Stability of Gaussian Elimination without Pivoting

For Gaussian elimination without pivoting, ρ can be arbitrarily large, except for a few
special cases, as we shall see later, such as symmetric positive definite matrices. Thus
Gaussian elimination without pivoting is, in general, a completely unstable algorithm.

5.4 Householder Transformations


Definition 5.4.1 A matrix of the form
    H = I - 2 uu^T / (u^T u),
where u is a nonzero vector, is called a Householder matrix, after the celebrated numerical analyst
Alston Householder.
Alston Householder, an American mathematician, was the former Director of the Mathematics and Computer
Science Division of Oak Ridge National Laboratory at Oak Ridge, Tennessee and a former Professor of Mathematics
at the University of Tennessee, Knoxville. A research conference on Linear and Numerical Linear Algebra
dedicated to Dr. Householder, called \HOUSEHOLDER SYMPOSIUM" is held every three years around the world.
Householder died in 1993 at the age of 89. A complete biography of Householder appears in the SIAM Newsletter,
October 1993.

A Householder matrix is also known as an Elementary Re ector or a Householder transformation.
We now give a geometric interpretation of a Householder transformation.

[Figure 5.1: Hx is the reflection of the vector x across the hyperplane P orthogonal to u.
 With u a unit vector, Hx = (I - 2uu^T)x = x - 2u(u^T x).]

With this geometric interpretation the following results become clear:

• ||Hx||_2 = ||x||_2 for every x ∈ R^n.
  A reflection does not change the length of the vector.
• H is an orthogonal matrix.
  ||Hx||_2 = ||x||_2 for every x implies that H is orthogonal.
• H^2 = I.
  Hx reflects x to the other side of P, but H^2 x = H(Hx) reflects it back to x.
• Hy = y for every y ∈ P.
  Vectors in P cannot be reflected away.
• H has a simple eigenvalue -1 and an (n-1)-fold eigenvalue 1.
  P = { v ∈ R^n : v^T u = 0 } has n-1 linearly independent vectors y_1, ..., y_{n-1}, and
  Hy_i = y_i, i = 1, ..., n-1. So 1 is an (n-1)-fold eigenvalue. Also, H reflects u to -u,
  i.e., Hu = -u. Thus -1 is an eigenvalue of H, which must be a simple eigenvalue because
  H can have only n eigenvalues.
• det(H) = -1.
  det(H) = (-1) · 1 · ... · 1 = -1.

Also from Figure 5.1, for given x, y ∈ R^n with ||x||_2 = ||y||_2, if we choose u to be a unit vector
parallel to x - y, then H = I - 2uu^T reflects x to y.
The importance of Householder matrices lies in the fact that they can also be used to create
zeros in a vector.

Lemma 5.4.1 Given a nonzero vector x ≠ e_1, there always exists a Householder matrix H such
that Hx is a multiple of e_1.

Proof. Define
    H = I - 2 uu^T / (u^T u)
with u = x + sign(x_1) ||x||_2 e_1; then it is easy to see that Hx is a multiple of e_1.

Note: If x_1 is zero, its sign can be chosen to be either + or -. Any possibility of overflow or
underflow in the computation of ||x||_2 can be avoided by scaling the vector x. Thus the vector u
should be determined from the vector x / max_i{|x_i|} rather than from the vector x itself.
Algorithm 5.4.1 Creating zeros in a vector with a Householder matrix
Given an n-vector x, the following algorithm replaces x by Hx = (α, 0, ..., 0)^T, where H is a
Householder matrix.
1. Scale the vector: x ≡ x / max_i{|x_i|}.
2. Compute u = x + sign(x_1) ||x||_2 e_1.
3. Form Hx, where H = I - 2uu^T / (u^T u).

Remark on Step 3: Hx in step 3 should be formed by exploiting the structure of H, as shown
in Example 7 in Chapter 4.
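A NumPy sketch of Algorithm 5.4.1 follows (the helper names house and apply_house are ours). It returns the vector u that defines H = I - 2uu^T/(u^T u) and applies H to x without ever forming H explicitly, which is how the matrix is meant to be used in practice.

    import numpy as np

    def house(x):
        """Return u defining H = I - 2*u*u^T/(u^T*u) such that H @ x is a
        multiple of e_1 (Algorithm 5.4.1)."""
        u = np.asarray(x, dtype=float) / np.max(np.abs(x))   # 1. scale
        sigma = np.linalg.norm(u)
        sign = 1.0 if u[0] >= 0 else -1.0                    # sign(0) taken as +, as allowed
        u[0] += sign * sigma                                 # 2. u = x + sign(x_1)||x||_2 e_1
        return u

    def apply_house(u, x):
        """Compute H @ x = x - 2*u*(u^T x)/(u^T u) without forming H."""
        return x - 2.0 * u * (u @ x) / (u @ u)

    x = np.array([0.0, 4.0, 1.0])                            # Example 5.4.1
    u = house(x)
    print(apply_house(u, x))                                 # approximately [-4.1231, 0, 0]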

Example 5.4.1
    x = (0, 4, 1)^T,
    x ≡ x / max_i{|x_i|} = (0, 1, 1/4)^T,
    u = (0, 1, 1/4)^T + (√17/4) e_1 = (√17/4, 1, 1/4)^T,
                           [  0       -0.9701  -0.2425 ]
    H = I - 2uu^T/(u^Tu) = [ -0.9701   0.0588  -0.2353 ],
                           [ -0.2425  -0.2353   0.9412 ]
and
    Hx = (-4.1231, 0, 0)^T.
Flop-Count and Round-off Property. Creating zeros in a vector by a Householder matrix
is a cheap and numerically stable procedure.
It takes only 2(n-2) flops to create zeros in the positions 2 through n in a vector, and it can
be shown (Wilkinson AEP, pp. 152-162) that if Ĥ is the computed Householder matrix, then
    ||H - Ĥ|| ≤ 10μ.
Moreover,
    fl(Ĥx) = H(x + e),
where
    ||e||_2 ≤ c n μ ||x||_2,
c is a constant of order unity, and μ is the machine precision.

166
5.4.1 Householder Matrices and QR Factorization

Theorem 5.4.1 (Householder QR Factorization Theorem) Given an n  n


matrix A, there exists an orthogonal matrix Q and an upper triangular matrix R
such that
A = QR:
The matrix Q can be written as Q = H1H2    Hn;1, where each Hi is a Householder
matrix.

As we will see later, the QR factorization plays a very signi cant role in numerical solutions
of linear systems, least-squares problems, eigenvalue and singular value computations.
We now show how the QR factorization of A can be obtained using Householder matrices, which
will provide a constructive proof of Theorem 5.4.1.
As in the process of LU factorization, this can be achieved in (n ; 1) steps; however, unlike
the Gaussian elimination process, the Householder process can always be carried out to
completion.
Step 1. Construct a Householder matrix H such that H A has zeros below the (1,1) entry
1 1

in the lst column: 0   1


BB 0      CC
BB CC
H A = B0   C
B :
BB .. .. .. CCC
1

@. . .A
0   
Note that it is sucient to construct H = I ; 2unuTn =(uTn un ) such that
1

0a 1 01
BB a CC BB 0 CC
11

HB BB .. CCC = BBB .. CCC ;


21

@ . A @.A
1

an 1 0
for then H1 A will have the above form.
Overwrite A with A = H1A for use in the next step.
(1)

167
Since A overwrites A(1) , A(1) can be written as:
0a a  a n1
BB 0 a    11 12 1
C
a nC
AA =BBB .. .. . . .
(1)
22 2 C
.. C :
@ . . . C
A
0 an    2 ann
Step 2. Construct a Householder matrix H such that H A has zeros below the (2,2) entry
2 2
(1)

in the 2nd column and the zeros already created in the rst column of A(1) in step 1 are not
destroyed: 0 1
    
BB 0    C CC
BB C
A =H A =B
(2)
BB 0. (1)
0   C C
. . . ... C
2

B@ .. .. CA
.
0 0   
H can be constructed as follows:
2

First, construct a Householder matrix Hb 2 = In;1 ; 2un;1uTn;1 =(uTn;1un;1); of order n ; 1 such


that 0 1 0 1 a 
BB a 22
CC BB 0 CC
B CC BB CC
bH BBB ...
32

CC = BB 0 CC ;
2
BB .. CC BB .. CC
@ . A @.A
an 2 0
and then de ne 01 0   01
BB CC
B 0 CC :
H =B
2
B@ ... bH CA 2

0
A(2) = H2A(1) will then have the form above.
Overwrite A with A . (2)

Note: Since H also has zeros below the diagonal on the lst column, premultiplication of A by
2
(1)

H preserves the zeros already created in step 1.


2

Step k. In general, at the kth step, first create a Householder matrix
    Ĥ_k = I_{n-k+1} - 2 u_{n-k+1} u_{n-k+1}^T / (u_{n-k+1}^T u_{n-k+1})
of order n-k+1 such that
    Ĥ_k (a_kk, a_{k+1,k}, ..., a_nk)^T = (*, 0, ..., 0)^T,
and then, defining
          [ I_{k-1}  0   ]
    H_k = [ 0        Ĥ_k ],
compute A^(k) = H_k A^(k-1). Overwrite A with A^(k).
The matrix A^(k) will have zeros in the kth column below the (k,k) entry, and the zeros already
created in previous columns will not be destroyed. At the end of the (n-1)th step, the resulting
matrix A^(n-1) will be an upper triangular matrix R.
Now, since
    A^(k) = H_k A^(k-1),   k = n-1, ..., 2,
we have
    R = A^(n-1) = H_{n-1} A^(n-2) = H_{n-1} H_{n-2} A^(n-3)                  (5.4.1)
      = ... = H_{n-1} H_{n-2} ... H_2 H_1 A.
Set
    Q^T = H_{n-1} H_{n-2} ... H_2 H_1.                                       (5.4.2)
Since each Householder matrix H_k is orthogonal, so is Q^T. Therefore, from above, we have
    R = Q^T A   or   A = QR.                                                 (5.4.3)
(Note that Q = H_1^T H_2^T ... H_{n-1}^T is also orthogonal.)
Forming the Matrix Q and Other Computational Details
1. Since each Householder matrix Hk is uniquely determined by the vector un;k+1; to construct
Hk it is sucient just to save the vector un;k+1:
2. A(k) = Hk A can be constructed using the technique described in Chapter 4 (Example 4.7)
which shows that A(k) can be constructed without forming the product explicitly.
3. The vector un;k+1 has (n ; k +1) elements, whereas only (n ; k) zeros are produced at the kth
step. Thus, one possible scheme for storage will be to store the elements (un;k+1;1; un;k+1;2; : : :;
un;k+1;n;k) in positions (k + 1; k); : : :; (n ; k + 1; k) of A: The last element un;k+1;n;k+1 has
to be stored separately.
169
4. The matrix Q, if needed, can be constructed from the Householder matrices H1 through Hn;1.
The major programming requirement is a subroutine for computing a Householder
matrix H such that, for a given vector x, Hx is a multiple of e . 1

Algorithm 5.4.2 Householder QR Factorization


Given an n × n matrix A, the following algorithm computes Householder matrices H_1 through
H_{n-1} and an upper triangular matrix R such that, with Q = H_1 ... H_{n-1}, A = QR. The
algorithm overwrites A with R.
For k = 1 to n-1 do
1. Find a Householder matrix Ĥ_k = I_{n-k+1} - 2u_{n-k+1}u_{n-k+1}^T / (u_{n-k+1}^T u_{n-k+1}) of
   order n-k+1 such that
       Ĥ_k (a_kk, ..., a_nk)^T = (r_kk, 0, ..., 0)^T.
2. Define
             [ I_{k-1}  0   ]
       H_k = [ 0        Ĥ_k ].
3. Save the vector u_{n-k+1}.
4. Compute A^(k) = H_k A.
5. Overwrite A with A^(k).
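Here is a compact NumPy sketch of Algorithm 5.4.2 (the function name is ours). Each Ĥ_k is represented only by its vector u; H_k A is formed as a rank-one update of the trailing rows, and Q is accumulated explicitly only so that A = QR can be checked.

    import numpy as np

    def householder_qr(A):
        """Householder QR factorization (Algorithm 5.4.2 style).
        Returns (Q, R) with Q orthogonal and R upper triangular, A = Q @ R."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        Q = np.eye(n)
        for k in range(n - 1):
            x = A[k:, k]
            u = x.copy()
            u[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
            if np.all(u == 0.0):
                continue                          # column already zero below the diagonal
            beta = 2.0 / (u @ u)
            # A <- H_k A : only rows k..n-1 change (rank-one modification)
            A[k:, :] -= beta * np.outer(u, u @ A[k:, :])
            # accumulate Q = H_1 H_2 ... H_{n-1} (only needed for verification)
            Q[:, k:] -= beta * np.outer(Q[:, k:] @ u, u)
        return Q, np.triu(A)

    A = np.array([[0., 1., 1.], [1., 2., 3.], [1., 1., 1.]])
    Q, R = householder_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True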
Example 5.4.2
Let 00 1 11
B 1 2 3 CC :
A=B
@ A
1 1 1
Step 1.
k = 1:

170
Construct H :
001 01
1

B CC BB CC
HB
@1A = @0A
1

1 0
001 0 1 1 0 p2 1
B 1 CC + p BB CC BB CC
u =B
3 @ A 2@0A = @ 1 A
1 0 1
01 0 01 0 1 p12 p12
1
H = I ; 2uuT uu
T B 0 1 0 CC ; BB p
= B
CC
@ A @ A
3 3 1 1 1
1 3 2 2 2
3
0 0 1
3
p12 12 1

0 0 ; p1 ; p1 1 2

B p1 2 2
C
= B
@; 2 1
2
; C
A 1
2

;p 1
2
; 1
2
1
2

Form A :
(1)
0 ;p2 ;3p2 p 1
2 2
B
A =H A=B
p
2
1; 2 ;p C
CA
(1)
@ 0 1 2
p
2
2
p
2

0 ; (1+ 2)
2
; (2+ 2)
2

Overwrite: 0 ;1:414 ;2:1213 ;2:8284 1


B 0 ;0:2071 0:2929 CC
AA B
@ (1)
A
0 ;1:2071 ;1:7071
Step 2.
k=2
c:
Construct H
! !
2

bH ;0:2071 = 
;1:2071 02

;0:2071 ! 1
! ;1:4318 !
u = ; 1:2247 =
2
;1:2071 0 ;1:2071
;0:1691 ;0:9856 !
Hb =
;0:9856 0:1691
2

Construct H :2 01 1
0 0
H =B
B C
@ 0 ;0:1691 ;0:9856 CA
2

0 ;0:9856 0:1691
171
Form A : (2)

0 ;1:4142 ;2:1213 ;2:8284 1


B C
H A =H A =H H A=B
(2)
2
(1)
@ 0 2 1 1:2247 1:6330 CA=R
0 0 ;0:5774
Form Q: 0 0 1
0:8165 0:5774
Q = H H = BB@ ;0:7071 0:4082 ;0:5774 CCA
1 2

;0:7071 ;0:4082 0:5774


Flop-Count. The algorithm requires approximately (2/3)n^3 flops just to compute the triangular
matrix R. This can be seen as follows.
The construction of Ĥ_k (and therefore of H_k) requires about 2(n-k) flops, while that of A^(k)
from A^(k) = H_k A (taking advantage of the special structure of H_k) requires roughly 2(n-k)^2 flops.
Thus
    Total number of flops = 2 Σ_{k=1}^{n-1} [(n-k)^2 + (n-k)]
        = 2[(n-1)^2 + (n-2)^2 + ... + 1^2] + 2[(n-1) + (n-2) + ... + 1]
        = 2 n(n-1)(2n-1)/6 + 2 n(n-1)/2
        ≈ (2/3) n^3.

Note: The above count does not take into account the explicit construction of Q. Q is available
only in factored form. It should be noted that in a majority of practical applications, it is sufficient
to have Q in this factored form and, in many applications, Q is not needed at all. If Q is needed
explicitly, another (2/3)n^3 flops will be required. (Exercise #22)

Round-off Property. In the presence of round-off errors the algorithm computes the QR de-
composition of a slightly perturbed matrix. Specifically, it can be shown (Wilkinson AEP, p. 236)
that if R̂ denotes the computed R, then there exists an orthogonal Q̂ such that
    A + E = Q̂ R̂.
The error matrix E satisfies
    ||E||_F ≤ φ(n) μ ||A||_F,
where φ(n) is a slowly growing function of n and μ is the machine precision. If the inner
products are accumulated in double precision, then it can be shown (Golub and Wilkinson (1966))
that φ(n) = 12.5 n. The algorithm is thus stable.
5.4.2 Householder QR Factorization of a Non-Square Matrix
In many applications (such as in least-squares problems), one requires the QR factorization
of an m × n matrix A. The above Householder method can be applied to obtain the QR factorization
of such an A as well. The process consists of s = min{n, m-1} steps; the Householder matrices
H_1, H_2, ..., H_s are constructed successively so that
    H_s H_{s-1} ... H_2 H_1 A = Q^T A = [ R ; 0 ]  (R on top of a block of zeros)  if m ≥ n,
    H_s H_{s-1} ... H_2 H_1 A = Q^T A = (R, S)                                     if m ≤ n.

Flop-Count and Round-off Property
The Householder method in this case requires
1. n^2 (m - n/3) flops if m ≥ n;
2. m^2 (n - m/3) flops if m ≤ n.
The round-off property is the same as in the previous case. The QR factorization of a rectangular
matrix using Householder transformations is stable.
Example 5.4.3

0 1 1
1
B C
A = B
@ 0:0001 0 C A
0 0:0001
s = min(2; 3) = 2:

Step 1. Form H 1
0 1 1 011 0 2 1
B 0:0001 CC + p1 + (0:0001)
u =B
BB 0 CC = BB CC
@
2 A 2
@ A @ 0 :0001 A
0 0 0
0 ;1 ;0:0001 0 1
uu B ;0:0001
=B 0C
C
H =I;
1
2 2 T
2
uT2 u2 @ 1 A
0 0 1
0 ;1 ;1 1
B 0 ;0:0001 CC
A =H A=B
(1)
@ 1A
0 0:0001

173
Step 2: Form H 2

;0:0001 ! q 1
!
u = ; (;0:0001) + (0:0001) 2 2
1
0:0001 0
;2:4141 !
= 10; 4

:1000
1 0
! ;0:7071 0:7071 !
Hb = u u T
; 2 uT u =
1 1
2
0 1 1 0:7071 0:7071
1
01 0 0
1
BB C
H = 2 @ 0 ;0:7071 0:7071 CA
0 0:7071 0:7071
Form R 0 ;1 ;1 1
B C R
!
H A =B @ 0 0:0001 CA =
(1)
:
2
0
0 0
Form
0 ;1 0:0001 ;0:0001
1
B C
Q = H H =B
@ ;0:0001 ;0:7071 0:7071 CA
1 2

0 0:7071 0:7071
;1 ;1 !
R =
0 0:0001
5.4.3 Householder Matrices and Reduction to Hessenberg Form

Theorem 5.4.2 (Hessenberg Reduction Theorem) An arbitrary n  n ma-


trix can always be transformed to an upper Hessenberg matrix Hu by orthogonal
similarity:
PAP T = Hu:

As noted before, reduction to Hessenberg form is very important in eigenvalue computations.


The matrix A is routinely transformed to a Hessenberg matrix before the process of eigenvalue
computations (known as the QR iterations) starts. Hessenberg forms are also useful tools in
many other applications such as in control theory, signal processing, etc.

174
The idea of orthogonal factorization using Householder matrices described in the previous sec-
tion can be easily extended to obtain P and Hu .
The matrix P is constructed as the product of (n ; 2) Householder matrices P1 through Pn;2:
P1 is constructed to create zeros in the rst column of A below the entry (2,1), P2 is determined
to create zeros below the entry (3,2) of the second column of the matrix P1 AP1T , and so on.
The process consists of (n-2) steps. (Note that an n × n upper Hessenberg matrix contains at
least (n-2)(n-1)/2 zeros.)

Step 1. Find a Householder matrix Pb of order n ; 1 such that


1

0a 1 01
BB a CC BB 0 CC
21

Pb BBB .. CCC = BBB .. CCC :


31

@ . A @.A
1

an 1 0
De ne
I 0
!
P = 1
1
0 Pb1
and compute
A = P AP T :
(1)
1 1

Overwrite A with A(1) : Then 0   1


BB    C
C
BB CC
AA =B BB 0.
(1)
  C CC
B@ .. .. CA
.
0   
Step 2. Find a Householder matrix Pb of order (n ; 2) such that
2

0a 1 01
BB ... CC BB 0 CC
32

Pb BBB .. CCC = BBB .. CCC :


@ . A @.A
2

an 2 0
De ne
I 0
!
P = 2
2
0 Pb2
and compute A(2) = P2 A(1)P2T :

175
Overwrite A with A : Then
(2)
0    1
BB     CCC
BB C
B0    C
AA =B BB (2) CC :
BB 0. 0   C CC
B@ .... .. CA
. .
0 0   
The general Step k can now easily be written down.
At the end of (n-2) steps, the matrix A^(n-2) is an upper Hessenberg matrix H_u.
Now,
    H_u = A^(n-2) = P_{n-2} A^(n-3) P_{n-2}^T
        = P_{n-2} (P_{n-3} A^(n-4) P_{n-3}^T) P_{n-2}^T
        ...
        = (P_{n-2} P_{n-3} ... P_1) A (P_1^T P_2^T ... P_{n-3}^T P_{n-2}^T).     (5.4.4)
Set
    P = P_{n-2} P_{n-3} ... P_1.                                                 (5.4.5)
We then have H_u = PAP^T. Since each Householder matrix P_i is orthogonal, the matrix P, which
is the product of (n-2) Householder matrices, is also orthogonal.

Note: It is important to note that since Pk has the form


Ik 0
!
;
0 PbK
post-multiplication of Pk A by PkT does not destroy the zeros already created in Pk A: For example,
let n = 4 and k = 1: Then
01 0 0 01
BB 0    CC
P1 = B BB CC
CA
@ 0   
0   
0   1
BB     CC
PA = BBB CC
@ 0    CA
1

0   
176
and 0   101 0 0 0
1 0   1
B
B    C
C BB 0   C
C BB    C
C
B
P AP = B
T CC BB CC = BB CC :
1 1
B
@ 0    CA B@ 0   C A B@ 0   C A
0    0    0   

Forming the Matrix P And Other Computational Details


1. Each Householder matrix Pk is uniquely determined by the vector un;k : It is therefore su-
cient to save the vector un;k to recover Pk later. If the matrix P is needed explicitly, it can
be computed from the Householder matrices P1 through Pn;2:
2. The vector un;k has (n ; k) elements, whereas the number of zeros produced at the kth step is
(n ; k ; 1). Thus, the (n ; k) elements of un;k can be stored in the appropriate lower triangular
part of A below the diagonal if the subdiagonal entry at that step is stored separately. This is
indeed a good arrangement, since subdiagonal entries in a Hessenberg matrix are very special
and play a special role in many applications. Thus, all the information needed to compute P
can be stored in the lower triangular part of A below the diagonal, storing the subdiagonal
entries separately in a linear array of (n ; 1) elements.
Other arrangements of storage are also possible.
Algorithm 5.4.3 Householder Hessenberg Reduction
Given an n × n matrix A, the following algorithm computes Householder matrices P_1 through
P_{n-2} such that, with P^T = P_1 ... P_{n-2}, PAP^T is an upper Hessenberg matrix H_u. The
algorithm overwrites A with H_u.
For k = 1, 2, ..., n-2 do
1. Determine a Householder matrix P̂_k = I_{n-k} - 2u_{n-k}u_{n-k}^T / (u_{n-k}^T u_{n-k}), of order n-k,
   such that
       P̂_k (a_{k+1,k}, a_{k+2,k}, ..., a_{nk})^T = (*, 0, ..., 0)^T.
2. Save the vector u_{n-k}.
3. Define
             [ I_k  0   ]
       P_k = [ 0    P̂_k ].
4. Compute A^(k) = P_k A P_k^T.
5. Overwrite A with A^(k).
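A NumPy sketch of Algorithm 5.4.3 follows (function name ours). At step k a Householder vector is built from the part of column k below the subdiagonal, and the similarity transformation P_k A P_k^T is applied as two rank-one updates, so P_k is never formed explicitly; P is accumulated only for verification.

    import numpy as np

    def householder_hessenberg(A):
        """Reduce A to upper Hessenberg form H_u = P A P^T (Algorithm 5.4.3 style).
        Returns (H, P) with P orthogonal."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        P = np.eye(n)
        for k in range(n - 2):
            x = A[k+1:, k]
            u = x.copy()
            u[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
            if np.all(u == 0.0):
                continue
            beta = 2.0 / (u @ u)
            A[k+1:, :] -= beta * np.outer(u, u @ A[k+1:, :])     # premultiply by P_k
            A[:, k+1:] -= beta * np.outer(A[:, k+1:] @ u, u)     # postmultiply by P_k^T
            P[k+1:, :] -= beta * np.outer(u, u @ P[k+1:, :])     # accumulate P = P_{n-2}...P_1
        return A, P

    A = np.array([[0., 1., 2.], [1., 2., 3.], [1., 1., 1.]])
    H, P = householder_hessenberg(A)
    print(np.allclose(P @ A @ P.T, H))     # True; H is upper Hessenberg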

Flop-Count. The algorithm requires (5/3)n^3 flops to compute H_u. This count does not include
the explicit computation of P. P can be stored in factored form. If P is computed explicitly,
another (2/3)n^3 flops will be required. However, when n is large, the storage required to form P is
prohibitive.

Round-Off Property. The algorithm is stable. It can be shown (Wilkinson AEP, p. 351)
that the computed H_u is orthogonally similar to a nearby matrix A + E, where
    ||E||_F ≤ c n^2 μ ||A||_F.
Here c is a constant of order unity.
If the inner products are accumulated in double precision at the appropriate places in the
algorithm, then the term n^2 in the above bound can be replaced by n, so in this case
    ||E||_F ≤ c n μ ||A||_F,
which is very desirable.
Example 5.4.4
Let 00 1 21
B C
A=B
@ 1 2 3 CA :
1 1 1
Since n = 3; we have just one step to perform.
Form Pb1:
1
! !
Pb1 =
1 0
! ! p
1 p 1 p 1! 1 + 2!
u2 = + 2e1 = + 2 =
1 1 0 1
! ! !
bP1 = I2 ; 2uu2uu22  1 0 ; :2929 5:8284 2:4142 = ;0:7071 ;0:7071 :
T

T
2 0 1 2:4142 1 ;0:7071 0:7071

178
Form P : 1
01 1 01
0 0 0 0
1
B0
P =B
CC = BB 0 ;0:7071 ;0:7071 CC
1 @ A @ A
0 Pb 1 0 ;0:7071 0:7071
0 0 ;2:1213 0:7071 1
A  A = P AP T = B
B ;1:4142 3:5000 ;0:5000 CC = H :
(1)
@
1 1 A u
0 1:5000 ;0:5000
All computations are done using 4-digit arithmetic.
Tridiagonal Reduction
If the matrix A is symmetric, then from
PAP T = Hu
it follows immediately that the upper Hessenberg matrix Hu is also symmetric and, therefore, is
tridiagonal. Thus, if the algorithm is applied to a symmetric matrix A, the resulting matrix Hu
will be a symmetric tridiagonal matrix T . Furthermore, one obviously can take advantage of the
symmetry of A to modify the algorithm. For example, a signi cant savings can be made in storage
by taking advantage of the symmetry of each A(k):
The symmetric algorithm requires only (2/3)n^3 flops to compute T, compared to the (5/3)n^3 flops needed to
compute H_u. The round-off property is essentially the same as the nonsymmetric algorithm. The
algorithm is stable.
Example 5.4.5
Let 00 1 11
B 1 2 1 CC :
A=B
@ A
1 1 1
Since n = 3; we have just one step to perform.
Form Pb1:
1
! !
Pb1 =
1 0
1
! p 1
! p 1 ! 1 + p2 !
u2 = + 2e1 = + 2 =
1 1 0 1
! ! !
bP1 = I2 ; 2uu2uu22  1 0 ; :2929 5:8284 2:4142 = ;0:7071 ;0:7071 :
T
T
2 0 1 2:4142 1:0000 ;0:7071 0:7071
179
Form P : 1 01 1 01 1
0 0 0 0
B0
P =B
CC = BB 0 ;0:7071 ;0:7071 CC :
1@ A @ A
0 Pb 1 0 ;0:7071 0:7071
Thus 0 0 ;1:4142 0 1
Hu = P AP T = B
B ;1:4142 2:5000 0:5000 CC :
1 @ 1 A
0 0:5000 0:5000
(Note that Hu is symmetric tridiagonal.)

5.5 Givens Matrices


Definition 5.5.1 A matrix of the form
                              ith col.        jth col.
                                 ↓               ↓
                    [ 1  0  ...  0  ...  0  ...  0 ]
                    [ 0  1  ...  0  ...  0  ...  0 ]
                    [ .  .       .       .       . ]
    J(i, j, c, s) = [ 0  0  ...  c  ...  s  ...  0 ]   ← ith row
                    [ .  .       .       .       . ]
                    [ 0  0  ... -s  ...  c  ...  0 ]   ← jth row
                    [ .  .       .       .       . ]
                    [ 0  0  ...  0  ...  0  ...  1 ]
where c^2 + s^2 = 1, is called a Givens matrix, after the numerical analyst Wallace Givens.
Since one can choose c = cos θ and s = sin θ for some θ, a Givens matrix as above can be
conveniently denoted by J(i, j, θ). Geometrically, the matrix J(i, j, θ) rotates a pair of coordinate
axes (the ith unit vector as its x-axis and the jth unit vector as its y-axis) through the given angle
θ in the (i, j) plane. That is why the Givens matrix J(i, j, θ) is commonly known as a Givens
Rotation or Plane Rotation in the (i, j) plane. This is illustrated in the following figure.

W. Givens was director of the Applied Mathematics Division at Argonne National Laboratory. His pioneering
work done in 1950 on computing the eigenvalues of a symmetric matrix by reducing it to a symmetric tridiagonal
form in a numerically stable way forms the basis of many numerically backward stable algorithms developed later.
Givens held appointments at many prestigious institutes and research institutions (for a complete biography, see the
July 1993 SIAM Newsletter). He died in March, 1993.

[Figure: rotating the unit vector u = (cos α, sin α)^T through the angle θ in the (e_1, e_2) plane gives
 v = (cos(α + θ), sin(α + θ))^T = [ cos θ  -sin θ ; sin θ  cos θ ] u.]

Thus, when an n-vector
    x = (x_1, x_2, ..., x_n)^T
is premultiplied by the Givens rotation J(i, j, θ), only the ith and jth components of x are affected;
the other components remain unchanged.
Note that since c^2 + s^2 = 1, J(i, j, θ) J(i, j, θ)^T = I; thus the rotation J(i, j, θ) is orthogonal.
If x = (x_1, x_2)^T is a 2-vector, then it is a matter of simple verification that, with
    c = x_1 / √(x_1^2 + x_2^2),    s = x_2 / √(x_1^2 + x_2^2),
the Givens rotation J(1, 2, θ) = [ c  s ; -s  c ] is such that J(1, 2, θ) x = (*, 0)^T.
The above formula for computing c and s might cause some underflow or overflow. However,
the following simple rearrangement of the formula might prevent that possibility:

    If |x_2| ≥ |x_1|, compute t = x_1/x_2,  s = 1/√(1 + t^2),  c = st.
    Otherwise, compute t = x_2/x_1,  c = 1/√(1 + t^2),  s = ct.

(Note that the computations of c and s do not involve θ.)
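The safeguarded formulas translate directly into code; the sketch below (function name ours) returns c and s such that the rotation zeros the second component.

    import numpy as np

    def givens(x1, x2):
        """Return (c, s) with c^2 + s^2 = 1 such that
        [[c, s], [-s, c]] @ [x1, x2] = [r, 0], using the safeguarded formulas."""
        if x2 == 0.0:
            return 1.0, 0.0                      # trivial case: nothing to zero
        if abs(x2) >= abs(x1):
            t = x1 / x2
            s = 1.0 / np.sqrt(1.0 + t * t)
            c = s * t
        else:
            t = x2 / x1
            c = 1.0 / np.sqrt(1.0 + t * t)
            s = c * t
        return c, s

    c, s = givens(1.0, 0.5)                      # Example 5.5.1
    print(np.array([[c, s], [-s, c]]) @ np.array([1.0, 0.5]))   # [sqrt(5)/2, 0]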

Example 5.5.1

181
    x = (1, 1/2)^T.
Since |x_1| > |x_2|, we use t = 1/2, c = 1/√(1 + 1/4) = 2/√5, s = 1/√5:
    [ c  s ; -s  c ] x = [ 2/√5  1/√5 ; -1/√5  2/√5 ] [ 1 ; 1/2 ] = [ √5/2 ; 0 ].
5 5

Zeroing Speci ed Entries in a Vector


Givens rotations are especially useful in creating zeros in a speci ed position in a vector. Thus,
if 0x 1
BB x CC
1

BB . CC 2

BB .. CC
BB x CC
x=B BB ..i CCC
BB . CC
BB xk CC
BB .. CC
@.A
xn
and if we desire to zero xk only, we can construct the rotation J (i; k; ) (i < k) such that J (i; k; )x
will have zero in the kth position. !
c s
To construct J (i; k; ), rst construct a 2  2 Givens rotation such that
;s c
! ! !
c s xi
=
;s c xk 0
and then form the matrix J (i; k; ) by inserting c in the positions (i; i) and (k; k), s and ;s
respectively in the positions (i; k) and (k; i), and lling the rest of the matrix with entries of the
identity matrix.
Example 5.5.2
011
B CC
x=B
@ ;1 A :
3
Suppose we want to create a zero in the third position, that is, k = 3. Choose i = 2.
182
1. Form a 2  2 rotation such that
c s ! ;1 ! !
= ; c = p;1 ; s = p3 :
;s c 3 0 10 10
01 0 0
1
B p; CC
2. Form J (2; 3; ) = B
@0 1
10
p310 A:
0 p;103 p;10
1

Then 01 0 10 1 1 0 1 1
0
B p; CC BB CC BB p CC
J (2; 3; )x = B
@0 1
10
p310 A @ ;1 A = @ 10 A :
0 p;103 p;10
1
3 0

Creating Zeros in a Vector Except Possibly in the First Place


Given an n-vector x, if we desire to zero all the entries of x except possibly the rst one, we
can construct J (1; 2; ), x(1) = J (1; 2; )x, x(2) = J (1; 3; )x(1), x(3) = J (1; 4; )x(2), etc., so that
with
P = J (1; n; )    J (1; 3; )J (1; 2; );
we will have Px a multiple of e1 . Since each rotation is orthogonal, so is P .
Example 5.5.3

011
B CC
x=B
@ ;1 A
2
0p 1
0 ;p1 1
Bp
J (1; 2; ) = B
2

0C
C p2
2

@ 1
2 A 1

0 0 1
0 p2 1
B CC
x(1) = J (1; 2; )x = B
@0A
2
0 q 2 0 p2 1
B 06 1 06 CC :
J (1; 3; ) = B
@ q2 A
;2 0
p 6 6

183
Then
0 p6 1
B CC
J (1; 3; )x = B
(1)
@0A
0
0 p16 ;
p16 p26
1
B
P = J (1; 3; )J (1; 2; ) = B
CC
q@ p12 p12
q q0 A
; 2 2 2

0 p6 1
6 6 6

B 0 CC :
Px = B
@ A
0
Flop-Count and Round-off Property. Creating zeros in a vector using Givens rotations
is about twice as expensive as using Householder matrices. To be precise, the process requires
about 1½ times as many flops as Householder's, but it requires O(n^2/2) square roots, whereas the
Householder method requires O(n) square roots.
The process is as stable as the Householder method.

Creating Zeros in Speci ed Positions of a Matrix


The idea of creating zeros in speci ed positions of a vector can be trivially extended to create
zeros in speci ed positions of a matrix as well. Thus, if we wish to create a zero in the (j; i)[(j > i)]
position of a matrix A, one way to do this is to construct the rotation J (i; j; ) a ecting the ith
and j th rows only, such that J (i; j; )A will have a zero in the (j; i) position. The procedure then
is as follows.
Algorithm 5.5.1 Creating Zeros in a Speci ed Position of a Matrix Using Givens Ro-
tations
Given an n  n matrix A, the following algorithm overwrites A by J (i; j; )A such that the latter
has a zero in the (j; i) position.
1. Find c = cos(); s = sin() such that
c s ! aii ! !
=
;s c aji 0
2. Form J (i; j; ):

184
Remark: Note that there are other ways to do this as well. For example, we can form J (k; j; )
a ecting the kth and j th rows, such that J (k; j; )A will have a zero in the (j; i) position.
Example 5.5.4
Let 01 2 31
B C
A=B
@ 3 3 4 CA :
4 5 6
Create a zero in the (2,1) position of A using J (1; 2; ):
1. Find c and s such that
c s! 1
! x !
=
;s c 3 0
c= p ; 1
10
s = p310

2. Form J (1; 2; ).


0p 1 p310 0
1
B p; 10
C
J (1; 2; ) = B
@ 3
10
p110 0CA
0 0 1
0p p 10 11 p1510
1
B 0 ;p
J (1; 2; )A = B
10 10
5 C
p;10
C:
@ 3
10 A
4 5 6
Example 5.5.5
Let 01 2 31
B C
A=B
@ 2 3 4 CA :
4 5 6
Create a zero in the (3,1) position using J (2; 3; ):
1. Find c and s such that
c s 2
!  ! !
=
;s c 4 0
c= p ; s= p
2
20
4
20

185
2. Form
01 0 0
1
B0 p
J (2; 3; ) = B p420
CC
@ 2
20 A
0 p;204 p220
01 0 0
101 2 31 0 1 2 3 1
B0 p
A = J (2; 3; )A = B
C BB 2 3 4 CC = BB p20 p26 p32 CC
p420 C
(1)
@ 2
20 A@ A @ 20 20 A

0 p;204 p220 4 5 6 0 p;202 p;208


5.5.1 Givens Rotations and QR Factorization
It is clear from the foregoing discussion that, like Householder matrices, Givens rotations can also
be applied to nd the QR factorization of a matrix. The Givens method, however, is almost twice
as expensive as the Householder method. In spite of this, QR factorization using Givens rotations
seems to be particularly useful in QR iterations for eigenvalue computations and in the solution of
linear systems with structured matrices, such as Toeplitz, etc. Givens rotations are also emerging
as important tools in parallel computations of many important linear algebra problems.
Here we present the algorithm for QR factorization of an m  n matrix A, m  n, using Givens'
method.
The basic idea is just like Householder's: compute orthogonal matrices Q_1, Q_2, ..., Q_s, using
Givens rotations, such that A^(1) = Q_1 A has zeros below the (1,1) entry in the first column, A^(2) =
Q_2 A^(1) has zeros below the (2,2) entry in the second column, and so on. Each Q_i is generated as a
product of Givens rotations. One way to form the {Q_i} is:
    Q_1 = J(1, m, θ) J(1, m-1, θ) ... J(1, 2, θ),
    Q_2 = J(2, m, θ) J(2, m-1, θ) ... J(2, 3, θ),
and so on.
Let s = min{n, m-1}. Then
    R = A^(s) = Q_s A^(s-1) = Q_s Q_{s-1} A^(s-2) = ... = Q_s Q_{s-1} ... Q_2 Q_1 A = Q^T A.
We now have A = QR with Q^T = Q_s Q_{s-1} ... Q_2 Q_1.

Theorem 5.5.1 (Givens QR Factorization Theorem) Given an m × n matrix A, m ≥ n,
there exist s = min{n, m-1} orthogonal matrices Q_1, Q_2, ..., Q_s, defined by
Q_i = J(i, m, θ) J(i, m-1, θ) ... J(i, i+1, θ), such that if
    Q = Q_1^T Q_2^T ... Q_s^T,
we have
    A = QR,
where
        [ R_1 ]
    R = [ 0   ]
and R_1 is upper triangular.

Forming the Matrix Q and Other Computational Details


In practice there is no need to form the n  n matrices J (k; `; ) and J (k; `; )A explicitly. Note
that J (k; `; ) can be determined by knowing c and s only, and J (k; `; )A replaces the kth and `th
rows of A by their linear combinations. Speci cally, the kth row of J (k; `; )A is c times the kth
row of A plus s times the `th row. Similarly, the `th row of J (k; `; ) is ;s times the kth row of A
plus c times the `th row. If the orthogonal matrix Q is needed explicitly, then it can be computed
from the product:
Q = QT1 QT2    QTs;1;
where each Qi is the product of (m ; i) Givens rotations:
Qi = J (i; m; )J (i; m ; 1; )    J (i; i + 1; ):

Algorithm 5.5.2 QR Factorization Using Givens Rotations


Given an m × n matrix A, the following algorithm computes an orthogonal matrix Q, using
Givens rotations, such that A = QR. The algorithm overwrites A with R.
For k = 1, 2, ..., min{n, m-1} do
    For ℓ = k+1, ..., m do
        1. Find c and s such that
               [ c  s ; -s  c ] (a_kk, a_ℓk)^T = (*, 0)^T.
        2. Save the indices k and ℓ and the numbers c and s.
        3. Form J(k, ℓ, θ).
        4. Overwrite A with J(k, ℓ, θ) A.
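A NumPy sketch of Algorithm 5.5.2 for a square or tall matrix is given below (names ours). Each rotation touches only rows k and ℓ of the working array; c and s are obtained from the standard formulas via np.hypot (which is itself overflow-safe), and Q is accumulated only so that A = QR can be verified.

    import numpy as np

    def givens_qr(A):
        """QR factorization using Givens rotations (Algorithm 5.5.2 style).
        Returns (Q, R) with Q orthogonal (m x m), R upper triangular/trapezoidal."""
        R = np.array(A, dtype=float)
        m, n = R.shape
        Q = np.eye(m)
        for k in range(min(n, m - 1)):
            for l in range(k + 1, m):
                if R[l, k] == 0.0:
                    continue
                r = np.hypot(R[k, k], R[l, k])
                c, s = R[k, k] / r, R[l, k] / r
                # rows k and l of R are replaced by their combinations: R <- J(k,l,theta) R
                Rk, Rl = R[k, :].copy(), R[l, :].copy()
                R[k, :] = c * Rk + s * Rl
                R[l, :] = -s * Rk + c * Rl
                # accumulate Q <- Q J(k,l,theta)^T so that Q @ R stays equal to A
                Qk, Ql = Q[:, k].copy(), Q[:, l].copy()
                Q[:, k] = c * Qk + s * Ql
                Q[:, l] = -s * Qk + c * Ql
        return Q, R

    A = np.array([[0., 1., 1.], [1., 2., 3.], [1., 1., 1.]])
    Q, R = givens_qr(A)
    print(np.allclose(Q @ R, A))      # True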

Flop-Count. The algorithm requires 2n^2 (m - n/3) flops. This count, of course, does not
include the computation of Q. Thus, the algorithm is almost twice as expensive as the Householder
algorithm for QR factorization.

Round-off Property. The algorithm is quite stable. It can be shown that the computed Q̂
and R̂ satisfy
    R̂ = Q̂^T (A + E),
where
    ||E||_F ≤ c μ ||A||_F, and c is a constant of order unity (Wilkinson AEP, p. 240).

5.5.2 Uniqueness in QR Factorization


We have seen that the QR factorization of a matrix A always exists and that this factorization may be
obtained in different ways. One therefore wonders if this factorization is unique. In the following we
will see that if A is nonsingular then the factorization is essentially unique. Moreover, if
the diagonal entries of the upper triangular matrices of the factorizations obtained by two different
methods are positive, then these two QR factorizations of A are exactly the same.
To see this, let
    A = Q_1 R_1 = Q_2 R_2.                                                  (5.5.1)
Since A is nonsingular, R_1 and R_2 are also nonsingular. Note that
    det A = det(Q_1 R_1) = det Q_1 det R_1
and
    det A = det(Q_2 R_2) = det Q_2 det R_2.
Since Q_1 and Q_2 are orthogonal, their determinants are ±1. Thus det R_1 and det R_2 are different
from zero. From (5.5.1), we have
    Q_2^T Q_1 = R_2 R_1^{-1} = V,
so that V^T V = Q_1^T Q_2 Q_2^T Q_1 = I. Now V is upper triangular (since R_1^{-1} and R_2 are both upper
triangular), so equating elements on both sides, we see that V must be a diagonal matrix with ±1
as diagonal elements. Thus,
    R_2 R_1^{-1} = V = diag(d_1, d_2, ..., d_n),   with d_i = ±1, i = 1, 2, ..., n.
This means that R_2 and R_1 are the same except for the signs of their rows. Similarly, from
    Q_2^T Q_1 = V
we see that Q_1 and Q_2 are the same except for the signs of their columns. (For a proof, see Stewart
IMC, p. 214.)
If the diagonal entries of R_1 and R_2 are positive, then V is the identity matrix, so that
    R_2 = R_1
and
    Q_2 = Q_1.
The above result can be easily generalized to the case where A is m × n and has linearly
independent columns.
independent columns.

Theorem 5.5.2 (QR Uniqueness Theorem) Let A have linearly independent


columns. Then there exist a unique matrix Q with orthonormal columns and a
unique upper triangular matrix R with positive diagonal entries such that
A = QR:

Example 5.5.6
We nd the QR factorization of 00 1 11
B C
A=B
@ 1 2 3 CA
1 1 1
using Givens Rotations and verify the uniqueness of this factorization.

189
Step 1. Find c and s such that
c s a ! ! !
= 11

;s c a 0 21

a = 0; a = 1
11 21

c = 0; s=1
0 0 1 01
B ;1 0 0 CC
J (1; 2; ) = B
@ A
0 0 1
0 0 1 0100 1 11 01 2 3 1
B CB C B C
A  J (1; 2; )A = B
@ ;1 0 0 CA B@ 1 2 3 CA = B@ 0 ;1 ;1 CA
0 0 1 1 1 1 1 1 1
Find c and s such that
c s a  ! ! !
= 11

;s c a 0 31

a = 1; c = p ; s = p
11
1 1

0p 0 p 1
2 2

1 1

B C 2 2

J (1; 3; ) = B
@ 0 1 0 CA
0 ;1
p p12
0 p1 0 p1 1 0 1 2 1 0p p 1
2

3 2 p32 2 2
B 2

A  J (1; 3; )A = @ 0 1 0 C
B
2
C BB 0 ;1 C = BB 0 ;1 ;1 CC
;1 C
A@ A @ p A
p;12 0 p12 1 1 1 0 p;12 ; 2
Step 2. Find c and s such that
c s! a ! !
=
22

;s c a 0 32
p
a = ;1;
22 a = ;p ; c = ;p ; s = ;p
32
1
2
2
3
1
3

01 0
10p p 0
p 1
2 3
2 2
B ;pp2 C BB C 2

J (2; 3; )A = B
@0 3
; pp C
A @ 0 ;1 ;p1 CA
1
3

0 p13 ;p 2
3
0 ;p ; 2 1
2
0 p2 p3
p 1 0
2 2 1:4142 2:1213 2:8284
1
B p23 p C B C
= B
@0 p2 p C2
A = B@ 0 1:2247 1:6330 CA = R
2
3

0 0 p13 0 0 0:5774
(using four digit computations).
190
Remark: Note that the upper triangular matrix obtained here is essentially the same as the
one given by the Householder method earlier (Example 5.4.2), di ering from it only in the signs of
the rst and third rows.

5.5.3 Givens Rotations and Reduction to Hessenberg Form


The Givens matrices can also be employed to transform an arbitrary n  n matrix A to an up-
per Hessenberg matrix Hu by orthogonal similarity: PAP T = Hu : However, to do this, Givens
rotations must be constructed in a certain special manner. For example, in the rst step,
Givens rotations J (2; 3; ); J (2; 4; ); : : :; J (2; n; ) are successively computed so that with P1 =
J (2; n; )    J (2; 4; )J (2; 3; ),
0    1
BB     C CC
BB C
P AP T = A = B BB 0.    C
.. C
(1)
1 1
B@ .. .. C
. .C A
0    
In the second step, Givens rotations J (3; 4; ), J (3; 5; ) : : :; J (3; n; ) are successively computed so
that with P = J (3; n; )J (3; n ; 1; )    J (3; 4; ),
2

0     1
BB           CC
BB CC
B 0     C
P A PT = A = B
(1) (2)
BB .. C C
2
0
BB . . .0  .C
.. C
2

B@ .. .. .. C
.C A
0 0    
and so on. At the end of the (n-2)th step, the matrix A^(n-2) is the upper Hessenberg matrix H_u. The
transforming matrix P is given by
    P = P_{n-2} P_{n-3} ... P_2 P_1,
where P_i = J(i+1, n, θ) J(i+1, n-1, θ) ... J(i+1, i+2, θ).


Algorithm 5.5.3 Givens Hessenberg Reduction
Given an n × n matrix A, the following algorithm overwrites A with PAP^T = H_u, where H_u is
an upper Hessenberg matrix.
For p = 1, 2, ..., n-2 do
    For q = p+2, ..., n do
        1. Find c = cos θ and s = sin θ such that
               [ c  s ; -s  c ] (a_{p+1,p}, a_{q,p})^T = (*, 0)^T.
        2. Save c and s and the indices p and q.
        3. Overwrite A with J(p+1, q, θ) A J(p+1, q, θ)^T.
Forming the Matrix P and Other Computational Details
There is no need to form J (p + 1; q; ) and J (p + 1; q; )A explicitly, since they are completely
determined by p; q and c and s (see the section at the end of the QR factorization algorithm using
Givens rotations). If P is needed explicitly, it can be formed from
P = Pn;    P P
2 2 1

where Pi = J (i + 1; n; )J (i + 1; n ; 1; )    J (i + 1; i + 2; ):
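To make the loop structure above concrete, here is a minimal MATLAB sketch of Algorithm 5.5.3
(my own illustration, not the MATCOM routine givhs referred to in the exercises). For readability
it forms each rotation $J(p+1, q, \theta)$ as a full matrix, which the algorithm deliberately avoids doing
in practice.

    % Sketch of Algorithm 5.5.3: reduce a real n-by-n matrix A to upper
    % Hessenberg form by Givens rotations, accumulating P so that P*A0*P' = Hu.
    function [Hu, P] = givens_hess(A)
      n = size(A,1);  P = eye(n);
      for p = 1:n-2
        for q = p+2:n
          x = A(p+1,p);  y = A(q,p);
          if y ~= 0
            r = hypot(x,y);  c = x/r;  s = y/r;        % rotation zeroing A(q,p)
            G = eye(n);  G([p+1 q],[p+1 q]) = [c s; -s c];
            A = G*A*G';                                % orthogonal similarity
            P = G*P;
          end
        end
      end
      Hu = A;
    end

Applied to the matrix of Example 5.5.7 below, this sketch reproduces the upper Hessenberg matrix
computed there.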

Flop-Count. The algorithm requires about $\frac{10}{3}n^3$ flops to compute $H_u$, compared to the $\frac{5}{3}n^3$ flops
required by the Householder method. Thus, the Givens reduction to Hessenberg form is
about twice as expensive as the Householder reduction. If the matrix $A$ is symmetric, then
the algorithm requires about $\frac{4}{3}n^3$ flops to transform $A$ to a symmetric tridiagonal matrix $T$; again,
this is twice as much as required by the Householder method to do the same job.

Round-off Property. The round-off property is essentially the same as that of the Householder
method. The method is numerically stable.
Example 5.5.7
$$A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}$$
Step 1. Find $c$ and $s$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a_{21} \\ a_{31} \end{pmatrix} = \begin{pmatrix} * \\ 0 \end{pmatrix}:
\quad a_{21} = a_{31} = 1 \;\Rightarrow\; c = \tfrac{1}{\sqrt{2}},\; s = \tfrac{1}{\sqrt{2}}.$$
$$J(2,3,\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ 0 & -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix}$$
$$A \equiv J(2,3,\theta)\,A\,J(2,3,\theta)^T = \begin{pmatrix} 0 & 2.1213 & 0.7071 \\ 1.4142 & 3.5000 & 0.5000 \\ 0 & -1.5000 & -0.5000 \end{pmatrix}
= \text{Upper Hessenberg}.$$

Observation: Note that the upper Hessenberg matrix obtained here is essentially the same as
that obtained by Householder's method (Example 5.4.4); the subdiagonal entries differ only in sign.

5.5.4 Uniqueness in Hessenberg Reduction

The above example and the observation made therein bring up the question of uniqueness in
Hessenberg reduction. To this end, we state a simplified version of what is known as the Implicit
Q Theorem. For a complete statement and proof, see Golub and Van Loan (MC 1983, pp.
223-234).

Theorem 5.5.3 (Implicit Q Theorem) Let $P$ and $Q$ be orthogonal matrices
such that $P^T A P = H_1$ and $Q^T A Q = H_2$ are two unreduced upper Hessenberg
matrices. Suppose that $P$ and $Q$ have the same first columns. Then $H_1$ and $H_2$ are
essentially the same in the sense that $H_2 = D^{-1} H_1 D$, where
$$D = \mathrm{diag}(\pm 1, \ldots, \pm 1).$$
Example 5.5.8
Consider the matrix
$$A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}$$
once more. The Householder method (Example 5.4.4) gave
$$H_1 = P_1 A P_1^T = \begin{pmatrix} 0 & -2.1213 & 0.7071 \\ -1.4142 & 3.5000 & -0.5000 \\ 0 & 1.5000 & -0.5000 \end{pmatrix}.$$
The Givens method (Example 5.5.7) gave
$$H_2 = J(2,3,\theta)\,A\,J(2,3,\theta)^T = \begin{pmatrix} 0 & 2.1213 & 0.7071 \\ 1.4142 & 3.5000 & 0.5000 \\ 0 & -1.5000 & -0.5000 \end{pmatrix}.$$
In the notation of Theorem 5.5.3 we have
$$P = P_1, \qquad Q^T = J(2,3,\theta).$$
Both $P$ and $Q$ have the same first columns, namely, the first column of the identity. We verify that
$$H_2 = D^{-1} H_1 D$$
with
$$D = \mathrm{diag}(1, -1, 1).$$
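As a quick numerical check of Theorem 5.5.3 (my own sketch, using the four-digit entries displayed
above), the relation $H_2 = D^{-1} H_1 D$ can be verified directly in MATLAB:

    % Implicit Q Theorem check for Example 5.5.8; D\H1*D computes inv(D)*H1*D.
    H1 = [ 0 -2.1213  0.7071; -1.4142  3.5 -0.5;  0  1.5 -0.5];
    H2 = [ 0  2.1213  0.7071;  1.4142  3.5  0.5;  0 -1.5 -0.5];
    D  = diag([1 -1 1]);
    norm(H2 - D\H1*D)     % zero (up to the rounding of the displayed entries)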

5.6 Orthonormal Bases and Orthogonal Projections

Let $A$ be $m \times n$, where $m \geq n$.
Consider the QR factorization of $A$:
$$Q^T A = \begin{pmatrix} R \\ 0 \end{pmatrix}.$$
Assume that $A$ has full rank $n$. Partition $Q = (Q_1, Q_2)$, where $Q_1$ has $n$ columns. Then the
columns of $Q_1$ form an orthonormal basis for $R(A)$. Similarly, the columns of $Q_2$ form
an orthonormal basis for the orthogonal complement of $R(A)$. Thus, the matrix
$$P_A = Q_1 Q_1^T$$
is the orthogonal projection onto $R(A)$, and the matrix $P_A^{\perp} = Q_2 Q_2^T$ is the projection onto
the orthogonal complement of $R(A)$. Since the orthogonal complement of $R(A)$ is
$R(A)^{\perp} = N(A^T)$, we shall denote $P_A^{\perp}$ by $P_N$.
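In MATLAB these quantities can be obtained directly from the built-in qr function. The following
is a minimal sketch (mine, not one of the algorithms of this chapter); note that the projectors $P_A$
and $P_N$ do not depend on the particular sign choices MATLAB makes for the columns of $Q$.

    A  = [1 1; 0.0001 0; 0 0.0001];   % the matrix used in Example 5.6.1 below
    [m, n] = size(A);
    [Q, R] = qr(A);                   % full QR factorization, Q is m-by-m
    Q1 = Q(:, 1:n);                   % orthonormal basis for R(A)
    Q2 = Q(:, n+1:m);                 % orthonormal basis for the complement of R(A)
    PA = Q1*Q1';                      % orthogonal projection onto R(A)
    PN = Q2*Q2';                      % projection onto the orthogonal complement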
Example 5.6.1
$$A = \begin{pmatrix} 1 & 1 \\ 0.0001 & 0 \\ 0 & 0.0001 \end{pmatrix}$$
Using the results of Example 5.4.3 we have
$$Q = \begin{pmatrix} -1 & 0.0001 & -0.0001 \\ -0.0001 & -0.7071 & 0.7071 \\ 0 & 0.7071 & 0.7071 \end{pmatrix},$$
$$Q_1 = \begin{pmatrix} -1 & 0.0001 \\ -0.0001 & -0.7071 \\ 0 & 0.7071 \end{pmatrix}, \qquad
Q_2 = \begin{pmatrix} -0.0001 \\ 0.7071 \\ 0.7071 \end{pmatrix},$$
$$P_A = Q_1 Q_1^T = \begin{pmatrix} 1.0000 & 0.00003 & 0.00007 \\ 0.00003 & 0.5000 & -0.5000 \\ 0.00007 & -0.5000 & 0.5000 \end{pmatrix},$$
$$P_N = P_A^{\perp} = Q_2 Q_2^T = \begin{pmatrix} -0.0001 \\ 0.7071 \\ 0.7071 \end{pmatrix}(-0.0001 \;\; 0.7071 \;\; 0.7071)
= \begin{pmatrix} 0.00000001 & -0.0001 & -0.0001 \\ -0.0001 & 0.5000 & 0.5000 \\ -0.0001 & 0.5000 & 0.5000 \end{pmatrix}.$$

Projection of a vector
Given an $m$-vector $b$, the vector $b_R$, the projection of $b$ onto $R(A)$, is given by
$$b_R = P_A b.$$
Similarly, $b_{\perp}$, the projection of $b$ onto the orthogonal complement of $R(A)$, is given by
$$b_{\perp} = P_A^{\perp} b = P_N b,$$
where we denote $P_A^{\perp}$ by $P_N$.

Note that $b = b_R + b_{\perp}$. Again, since the orthogonal complement of $R(A)$ is $N(A^T)$, the null
space of $A^T$, we denote $b_{\perp}$ by $b_N$ for notational convenience.

Note: Since $b_N = b - b_R$, it is tempting to compute $b_N$ just by subtracting $b_R$ from $b$ once
$b_R$ has been computed. This is not advisable, since in the computation of $b_N$ from $b_N = b - b_R$,
cancellation can take place when $b_R \approx b$.
Example 5.6.2
$$A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},$$
$$Q = \begin{pmatrix} -0.7071 & -0.5774 & -0.4082 \\ 0 & -0.5774 & 0.8165 \\ -0.7071 & 0.5774 & 0.4082 \end{pmatrix}, \qquad
Q_1 = \begin{pmatrix} -0.7071 & -0.5774 \\ 0 & -0.5774 \\ -0.7071 & 0.5774 \end{pmatrix},$$
$$P_A = Q_1 Q_1^T = \begin{pmatrix} 0.8334 & 0.3334 & 0.1666 \\ 0.3334 & 0.3334 & -0.3334 \\ 0.1666 & -0.3334 & 0.8334 \end{pmatrix}, \qquad
b_R = P_A b = \begin{pmatrix} 1.3334 \\ 0.3334 \\ 0.6666 \end{pmatrix},$$
$$P_N = \begin{pmatrix} 0.1667 & -0.3333 & -0.1667 \\ -0.3333 & 0.6667 & 0.3333 \\ -0.1667 & 0.3333 & 0.1667 \end{pmatrix}, \qquad
b_N = P_N b = \begin{pmatrix} -0.3334 \\ 0.6666 \\ 0.3334 \end{pmatrix}.$$
Example 5.6.3
$$A = \begin{pmatrix} 1 & 1 \\ 0.0001 & 0 \\ 0 & 0.0001 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ 0.0001 \\ 0.0001 \end{pmatrix},$$
$$Q = \begin{pmatrix} -1 & 0.0001 & -0.0001 \\ -0.0001 & -0.7071 & 0.7071 \\ 0 & 0.7071 & 0.7071 \end{pmatrix},$$
$$P_A = Q_1 Q_1^T = \begin{pmatrix} 1 & 0.0001 & 0.0001 \\ 0.0001 & 0.5000 & -0.5000 \\ 0.0001 & -0.5000 & 0.5000 \end{pmatrix},$$
$$b_R = P_A\, b = \begin{pmatrix} 2 \\ 0.0001 \\ 0.0001 \end{pmatrix}, \qquad
b_N = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
Projection of a Matrix Onto the Range of Another Matrix
Let $B = (b_1, \ldots, b_n)$ be an $m \times n$ matrix. We can then think of projecting each column of $B$
onto $R(A)$ and onto its orthogonal complement. Thus, the matrix
$$B_R = P_A (b_1, \ldots, b_n) = P_A B$$
is the orthogonal projection of $B$ onto $R(A)$. Similarly, the matrix
$$B_N = P_N B$$
is the projection of $B$ onto the orthogonal complement of $R(A)$.
Example 5.6.4
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 4 & 5 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{pmatrix}.$$
$A = QR$ gives
$$Q = \begin{pmatrix} -0.2182 & -0.8165 & -0.5345 \\ -0.4364 & -0.4082 & 0.8018 \\ -0.8729 & 0.4082 & -0.2673 \end{pmatrix}.$$
Orthonormal basis for $R(A)$:
$$Q_1 = \begin{pmatrix} -0.2182 & -0.8165 \\ -0.4364 & -0.4082 \\ -0.8729 & 0.4082 \end{pmatrix},$$
$$P_A = Q_1 Q_1^T = \begin{pmatrix} 0.7143 & 0.4286 & -0.1429 \\ 0.4286 & 0.3571 & 0.2143 \\ -0.1429 & 0.2143 & 0.9286 \end{pmatrix}.$$
Orthogonal projection of $B$ onto $R(A)$:
$$P_A B = \begin{pmatrix} 1.1429 & 2.1429 & 3.1429 \\ 1.7857 & 2.7857 & 3.7857 \\ 3.0714 & 4.0714 & 5.0714 \end{pmatrix}.$$
5.7 QR Factorization with Column Pivoting
If an $m \times n$ ($m \geq n$) matrix $A$ has rank $r < n$, then the matrix $R$ is singular. In this case the QR
factorization cannot be employed to produce an orthonormal basis of $R(A)$.
To see this, just consider the following simple $2 \times 2$ example from Bjorck (1992, p. 31):
$$A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} c & -s \\ s & c \end{pmatrix} \begin{pmatrix} 0 & s \\ 0 & c \end{pmatrix} = QR.$$
If $c$ and $s$ are chosen such that $c^2 + s^2 = 1$, then $\mathrm{rank}(A) = 1 < 2$, and the columns of $Q$ form
an orthonormal basis neither for $R(A)$ nor for its complement.
Fortunately, however, the process of QR factorization (for example, the Householder method)
can be modified to yield an orthonormal basis. The idea here is to generate a permutation matrix
$P$ such that
$$AP = QR, \qquad \text{where} \qquad R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}.$$
Here $R_{11}$ is $r \times r$ upper triangular, $r$ is the rank of $A$, and $Q$ is orthogonal. The first $r$ columns
of $Q$ will then form an orthonormal basis of $R(A)$. The following theorem guarantees the existence
of such a factorization:

Theorem 5.7.1 (QR Column Pivoting Theorem) Let $A$ be an $m \times n$ matrix
with $\mathrm{rank}(A) = r \leq \min(m, n)$. Then there exist an $n \times n$ permutation matrix $P$
and an $m \times m$ orthogonal matrix $Q$ such that
$$Q^T A P = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix},$$
where $R_{11}$ is an $r \times r$ upper triangular matrix with positive diagonal entries.

Proof. Since $\mathrm{rank}(A) = r$, there exists a permutation matrix $P$ such that
$$AP = (A_1, A_2),$$
where $A_1$ is $m \times r$ and has linearly independent columns. Consider now the QR factorization
of $A_1$:
$$Q^T A_1 = \begin{pmatrix} R_{11} \\ 0 \end{pmatrix},$$
where by the uniqueness theorem (Theorem 5.5.2), $Q$ and $R_{11}$ are uniquely determined and $R_{11}$
has positive diagonal entries. Then
$$Q^T A P = (Q^T A_1, Q^T A_2) = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}.$$
Since $\mathrm{rank}(Q^T A P) = \mathrm{rank}(A) = r$ and $\mathrm{rank}(R_{11}) = r$, we must have $R_{22} = 0$.

Creating the Permutation Matrix P

There are several ways one can think of creating this permutation matrix $P$. We present here
one such way, which is now known as QR factorization with column pivoting.
The permutation matrix $P$ is formed as the product of $r$ permutation matrices $P_1$ through $P_r$.
These permutation matrices are applied to $A$ one by one, before forming Householder matrices to
create zeros in appropriate columns. More specifically, the following is done:

Step 1. Find the column of $A$ having the maximum norm. Permute now the columns of $A$
so that the column of maximum norm becomes the first column.
This is equivalent to forming a permutation matrix $P_1$ such that the first column of $AP_1$ has
the maximum norm. Form now a Householder matrix $H_1$ so that
$$A_1 = H_1 A P_1$$
has zeros in the first column below the (1,1) entry.
Step 2. Find the column with the maximum norm of the submatrix $\hat{A}_1$ obtained from $A_1$
by deleting the first row and the first column. Permute the columns of this submatrix so that
the column of maximum norm becomes the first column. This is equivalent to constructing a
permutation matrix $\hat{P}_2$ such that the first column of $\hat{A}_1 \hat{P}_2$ has the maximum norm. Form
$P_2$ from $\hat{P}_2$ in the usual way, that is:
$$P_2 = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & \hat{P}_2 & \\ 0 & & & \end{pmatrix}.$$
Now construct a Householder matrix $H_2$ so that
$$A_2 = H_2 A_1 P_2 = H_2 H_1 A P_1 P_2$$
has zeros in the second column of $A_2$ below the (2,2) entry. As before, $H_2$ can be constructed in
two steps as in Section 5.3.1.
• The $k$th step can now easily be written down.
• The process is continued until the entries below the diagonal of the current matrix all become
  zero.
Suppose $r$ steps are needed. Then at the end of the $r$th step, we have
$$A^{(r)} = H_r \cdots H_1 A P_1 \cdots P_r = Q^T A P = R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}.$$
Flop-Count and Storage Considerations. The above method requires $2mnr - r^2(m+n) + \frac{2}{3}r^3$
flops. The matrix $Q$, as in the Householder factorization, is stored in factored form. The
matrix $Q$ can be stored in factored form in the subdiagonal part of $A$, and $A$ can be overwritten by $R$.
Example 5.7.1
$$A = \begin{pmatrix} 0 & 0 \\ \tfrac{1}{2} & 1 \\ \tfrac{1}{2} & 1 \end{pmatrix} = (a_1, a_2).$$
Step 1. $a_2$ has the largest norm. Then
$$P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
AP_1 = \begin{pmatrix} 0 & 0 \\ 1 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} \end{pmatrix},$$
$$H_1 = \begin{pmatrix} 0 & -\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{\sqrt{2}} & -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix},$$
$$A^{(1)} = H_1 A P_1 = \begin{pmatrix} 0 & -\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{\sqrt{2}} & -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}
\begin{pmatrix} 0 & 0 \\ 1 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} \end{pmatrix}
= \begin{pmatrix} -\sqrt{2} & -\tfrac{1}{\sqrt{2}} \\ 0 & 0 \\ 0 & 0 \end{pmatrix} = R
= \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}.$$
Thus, for this example,
$$Q = H_1^T = H_1, \qquad P = P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
The matrix $A$ has rank 1, since $R_{11} = -\sqrt{2}$ is $1 \times 1$. The column vector
$\begin{pmatrix} 0 \\ -\tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} \end{pmatrix}$ forms an
orthonormal basis of $R(A)$.
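MATLAB's built-in qr carries out a column-pivoted factorization when called with three output
arguments; the following sketch (my own) applies it to the matrix of the example above and reads
the numerical rank off the diagonal of R (the tolerance eps*norm(A) is simply an illustrative choice).

    A = [0 0; 1/2 1; 1/2 1];
    [Q, R, P] = qr(A);                       % A*P = Q*R with column pivoting
    r  = sum(abs(diag(R)) > eps*norm(A))     % numerical rank; here r = 1
    Q1 = Q(:, 1:r)                           % orthonormal basis for R(A)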

Complete Orthogonal Factorization

It is easy to see that the submatrix $(R_{11}, R_{12})$ can further be reduced by using orthogonal
transformations, yielding
$$\begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix}.$$

Theorem 5.7.2 (Complete Orthogonalization Theorem) Given $A_{m \times n}$ with
$\mathrm{rank}(A) = r$, there exist orthogonal matrices $Q_{m \times m}$ and $W_{n \times n}$ such that
$$Q^T A W = \begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix},$$
where $T$ is an $r \times r$ upper triangular matrix with positive diagonal entries.

Proof. The proof is left as an exercise (Exercise #38).

The above decomposition of $A$ is called the complete orthogonal decomposition.
Rank-Revealing QR
The above process, known as the QR factorization with column pivoting, was developed
by Golub (1965). The factorization is known as the rank-revealing QR factorization, since in
exact arithmetic it reveals the rank of the matrix $A$, which is the order of the nonsingular upper
triangular matrix $R_{11}$. However, in the presence of rounding errors, we will actually have
$$R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix},$$
and if $R_{22}$ is "small" in some measure (say, $\|R_{22}\|$ is of $O(\mu)$, where $\mu$ is the machine precision),
then the reduction will be terminated. Thus, from the above discussion, we note that, given an
$m \times n$ matrix $A$ ($m \geq n$), if there exists a permutation matrix $P$ such that
$$Q^T A P = R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix},$$
where $R_{11}$ is $r \times r$ and $R_{22}$ is small in some measure, then we will say that $A$ has numerical rank $r$.
(For more on numerical rank, see Chapter 10, Section 10.5.5.)
Unfortunately, the converse is not true.
A celebrated counterexample due to Kahan (1966) shows that a matrix can be nearly rank-
deficient without having $\|R_{22}\|$ small at all.

Gene H. Golub, an American mathematician and computer scientist, is well known for his outstanding contributions
in numerical linear algebra, especially in the area of the singular value decomposition (SVD), least squares, and
their applications in statistical computations. Golub is a professor of computer science at Stanford University and
is the co-author of the celebrated numerical linear algebra book "Matrix Computations". Golub is a member of the
National Academy of Sciences and a past president of SIAM (Society for Industrial and Applied Mathematics).

Consider
$$A = \mathrm{diag}(1, s, \ldots, s^{n-1})
\begin{pmatrix} 1 & -c & -c & \cdots & -c \\ 0 & 1 & -c & \cdots & -c \\ \vdots & & \ddots & \ddots & \vdots \\ \vdots & & & 1 & -c \\ 0 & \cdots & & 0 & 1 \end{pmatrix} = R,$$
with $c^2 + s^2 = 1$, $c, s > 0$. For $n = 100$, $c = 0.2$, it can be shown that $A$ is nearly singular (the
smallest singular value is $O(10^{-8})$). On the other hand, $r_{nn} = s^{n-1} = 0.133$, which is not small at
all; so the near rank-deficiency of $R$ is not revealed by a small $R_{22}$.

The question of whether $R_{22}$ does become really small at some stage for any matrix has been investigated
by Chan (1987) and more recently by Hong and Pan (1992).
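Kahan's example is easy to reproduce numerically; the following MATLAB sketch (mine) builds the
matrix for n = 100, c = 0.2 and confirms that the last diagonal entry is far from small even though
the matrix is nearly singular.

    n = 100;  c = 0.2;  s = sqrt(1 - c^2);
    R = eye(n) + triu(-c*ones(n), 1);     % unit upper triangular, -c above diagonal
    R = diag(s.^(0:n-1)) * R;             % scale rows by 1, s, ..., s^(n-1)
    R(n,n)                                % = s^(n-1) = 0.1333..., not small
    min(svd(R))                           % tiny (O(1e-8)): R is nearly singular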

5.8 Modifying a QR Factorization

Suppose the QR factorization of an $m \times k$ matrix $A = (a_1, \ldots, a_k)$, $m \geq k$, has been obtained. A
vector $a_{k+1}$ is now appended to obtain a new matrix:
$$A' = (a_1, \ldots, a_k, a_{k+1}).$$
It is natural to wonder how the QR factorization of $A'$ can be obtained from the given QR factor-
ization of $A$, without finding it from scratch.
The problem is called the updating QR factorization problem. The downdating QR fac-
torization is similarly defined. The updating and downdating QR factorizations arise in a variety of
practical applications, such as signal and image processing.
We present below a simple algorithm using Householder matrices to solve the updating problem.
Algorithm 5.8.1 Updating QR Factorization Using Householder Matrices
The following algorithm computes the QR factorization of $A' = (a_1, \ldots, a_k, a_{k+1})$ given the
Householder QR factorization of $A = (a_1, \ldots, a_k)$.
Step 1. Compute $b_{k+1} = H_k \cdots H_1 a_{k+1}$, where $H_1$ through $H_k$ are Householder matrices such that
$$Q^T A = H_k \cdots H_1 A = \begin{pmatrix} R \\ 0 \end{pmatrix}.$$
Step 2. Compute a Householder matrix $H_{k+1}$ so that $H_{k+1} b_{k+1} = r_{k+1}$ is zero in entries
$k+2, \ldots, m$.
Step 3. Form $R' = \left[ \begin{pmatrix} R \\ 0 \end{pmatrix},\; r_{k+1} \right]$.
Step 4. Form $Q' = H_{k+1} \cdots H_1$.

Theorem 5.8.1 $R'$ and $Q'$ defined above are such that $Q' A' = R'$, that is, $A' = (Q')^T R'$.
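A compact MATLAB sketch of the updating procedure is given below (my own rendering of Steps 1-4;
it uses a small built-in qr in place of forming the Householder matrix $H_{k+1}$ explicitly, and assumes
$m > k$ so that there is something to annihilate). Here Q is the full $m \times m$ orthogonal factor and R
is the $m \times k$ stacked factor, exactly as produced by [Q, R] = qr(A) in MATLAB.

    function [Q, R] = qr_append(Q, R, anew)
      [m, k] = size(R);
      b = Q' * anew;                   % Step 1: transform the appended column
      [U, g] = qr(b(k+1:m));           % Step 2: U'*b(k+1:m) = (gamma, 0, ..., 0)'
      R = [R, [b(1:k); g]];            % Step 3: new m-by-(k+1) triangular factor
      Q = Q * blkdiag(eye(k), U);      % Step 4: now Q'*[A, anew] = R
    end

For the data of Example 5.8.1 below ($A = (1, 2, 3)^T$ with appended column $(1, 4, 5)^T$), this sketch
reproduces $R'$ and $Q'$ up to the usual sign ambiguities.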

Example 5.8.1
$$A = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},$$
$$Q^T = H_1 = \begin{pmatrix} -0.2673 & -0.5345 & -0.8018 \\ -0.5345 & 0.7745 & -0.3382 \\ -0.8018 & -0.3382 & 0.4927 \end{pmatrix},
\qquad R = \begin{pmatrix} -3.7417 \\ 0 \\ 0 \end{pmatrix},$$
$$A' = \begin{pmatrix} 1 & 1 \\ 2 & 4 \\ 3 & 5 \end{pmatrix}.$$
Step 1.
$$b_2 = H_1 a_2 = \begin{pmatrix} -6.4143 \\ 0.8727 \\ 0.3091 \end{pmatrix}.$$
Step 2.
$$H_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -0.9426 & -0.3339 \\ 0 & -0.3339 & 0.9426 \end{pmatrix}, \qquad
r_2 = H_2 b_2 = \begin{pmatrix} -6.4143 \\ -0.9258 \\ 0 \end{pmatrix}.$$
Step 3.
$$R' = (R, r_2) = \begin{pmatrix} -3.7417 & -6.4143 \\ 0 & -0.9258 \\ 0 & 0 \end{pmatrix}.$$
Step 4.
$$Q' = H_2 H_1 = \begin{pmatrix} -0.2673 & -0.5345 & -0.8018 \\ 0.7715 & -0.6172 & 0.1543 \\ -0.5773 & -0.5774 & 0.5774 \end{pmatrix}.$$
Verification: $(Q')^T R' = \begin{pmatrix} 1 & 1 \\ 2 & 4 \\ 3 & 5 \end{pmatrix} = A'$.
5.9 Summary and Table of Comparisons
For easy reference we now review the most important aspects of this chapter.

1. Three Important Matrices: Elementary, Householder and Givens.

• Elementary lower triangular matrix: An $n \times n$ matrix $E$ of the form $E = I + m e_k^T$,
  where $m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T$, is called an elementary lower triangular matrix.
  If $E$ is as given above, then $E^{-1} = I - m e_k^T$.
• Householder matrix: An $n \times n$ matrix $H = I - \frac{2uu^T}{u^T u}$, where $u$ is an $n$-vector, is called a
  Householder matrix. A Householder matrix is symmetric and orthogonal.
• Givens matrix: A Givens matrix $J(i, j, c, s)$ is an identity matrix except for the $(i,i)$, $(i,j)$, $(j,i)$
  and $(j,j)$ entries, which are, respectively, $c$, $s$, $-s$, and $c$, where $c^2 + s^2 = 1$.
  A Givens matrix is an orthogonal matrix.

2. Two Important Matrix Factorizations: LU and QR.

• LU factorization: A factorization of $A$ in the form $A = LU$, where $L$ is unit lower triangular
  and $U$ is upper triangular, is called an LU factorization of $A$. An LU factorization of a matrix
  $A$ does not always exist. If the leading principal minors of $A$ are all different from zero, then
  the LU factorization of $A$ exists and is unique (Theorem 5.2.1).
  The LU factorization of a matrix $A$, when it exists, is achieved using elementary lower trian-
  gular matrices. The process is called Gaussian elimination without row interchanges.
  The process is efficient, requiring only $\frac{n^3}{3}$ flops, but is unstable for arbitrary matrices. Its use
  is not recommended in practice unless $A$ is symmetric positive definite or column diagonally
  dominant. For decomposition of $A$ into $LU$ in a stable way, row interchanges (Gaussian
  elimination with partial pivoting) or both row and column interchanges (Gaussian elimination
  with complete pivoting) to identify an appropriate pivot will be needed. Gaussian elimination
  with partial and complete pivoting yield the factorizations $MA = U$ and $MAQ = U$, respectively.
• QR factorization: Every matrix $A$ can always be written in the form $A = QR$, where $Q$ is
  orthogonal and $R$ is upper triangular. This is called the QR factorization of $A$.
  The QR factorization of a matrix $A$ is unique if $A$ has linearly independent columns (Theo-
  rem 5.4.2).
  The QR factorization can be achieved using either Householder or Givens matrices. Both
  methods have guaranteed numerical stability. The Householder method is more efficient
  than the Givens method ($\frac{2n^3}{3}$ flops versus $\frac{4n^3}{3}$ flops, approximately), but the Givens matri-
  ces are emerging as useful tools in parallel matrix computations and for computations with
  structured matrices. The Gram-Schmidt and modified Gram-Schmidt processes for QR
  factorization are described in Chapter 7.

3. Hessenberg Reduction.
The Hessenberg form of a matrix is a very useful condensed form. We will see its use throughout
the whole book.
An arbitrary matrix $A$ can always be transformed to an upper Hessenberg matrix by orthogonal
similarity: given an $n \times n$ matrix $A$, there always exists an orthogonal matrix $P$ such that $PAP^T =
H_u$, an upper Hessenberg matrix (Theorem 5.4.2).
This reduction can be achieved using elementary, Householder or Givens matrices. We have
described here methods based on Householder and Givens matrices (Algorithms 5.4.3 and 5.5.3).
Both methods have guaranteed stability, but again, the Householder method is more efficient
than the Givens method.
For the aspect of uniqueness in Hessenberg reduction, see the statement of the Implicit Q
Theorem (Theorem 5.5.3). This theorem basically says that if a matrix $A$ is transformed by
orthogonal similarity to two different unreduced upper Hessenberg matrices $H_1$ and $H_2$ using two
transforming matrices $P$ and $Q$, then $H_1$ and $H_2$ are essentially the same, provided that $P$ and $Q$
have the same first columns.

4. Orthogonal Bases and Orthogonal Projections.
If $Q^T A = \begin{pmatrix} R \\ 0 \end{pmatrix}$ is the QR factorization of an $m \times n$ matrix ($m \geq n$), then the columns of $Q_1$ form
an orthonormal basis for $R(A)$ and the columns of $Q_2$ form an orthonormal basis for the orthogonal
complement of $R(A)$, where $Q = (Q_1, Q_2)$ and $Q_1$ has $n$ columns.
If we let $B = (b_1, \ldots, b_n)$ be an $m \times n$ matrix, then the matrix $B_R = P_A(b_1, \ldots, b_n) = P_A B$,
where $P_A = Q_1 Q_1^T$, is the orthogonal projection of $B$ onto $R(A)$.
Similarly, the matrix $B_N = P_N B$ is the projection of $B$ onto the orthogonal complement of
$R(A)$, where $P_N = P_A^{\perp} = Q_2 Q_2^T$.

5. QR Factorization with Column Pivoting, Rank-Revealing QR, and Modifying a QR
Factorization.
If $A$ is rank-deficient, then the ordinary QR factorization cannot produce an orthonormal basis of
$R(A)$.
In such a case, the QR factorization process needs to be modified. A modification, called the
QR factorization with column pivoting, has been discussed in this chapter. Such a factorization
always exists (Theorem 5.7.1).
A process to achieve such a factorization using Householder matrices, due to Golub, has been
described briefly in Section 5.7.
The QR factorization with column pivoting, in exact arithmetic, reveals the rank of $A$. That
is why it is called the rank-revealing QR factorization. However, in the presence of rounding errors,
such a rank-determination procedure is complicated and not reliable. Finally, we have presented a
simple algorithm to modify the QR factorization of a matrix (updating QR).

6. Table of Comparisons.
We now summarize in the following table the efficiency and stability properties of some of these major
computations. We assume that $A$ is $m \times n$ ($m \geq n$).
TABLE 5.1
TABLE OF COMPARISONS

PROBLEM                      METHOD                        FLOP-COUNT (APPROXIMATE)          STABILITY
LU factorization             Gaussian elimination          mn^2/2 - n^3/6                    Unstable
                             without row interchanges
Factorization: MA = U        Gaussian elimination          mn^2/2 - n^3/6                    Stable in practice
                             with partial pivoting         (+ O(n^2) comparisons)
Factorization: MAQ = U       Gaussian elimination          mn^2/2 - n^3/6                    Stable
                             with complete pivoting        (+ O(n^3) comparisons)
QR factorization             Householder                   n^2(m - n/3)                      Stable
QR factorization             Givens                        2n^2(m - n/3)                     Stable
Hessenberg reduction of      Householder                   5n^3/3                            Stable
an n x n matrix
Hessenberg reduction of      Givens                        10n^3/3                           Stable
an n x n matrix
QR factorization with        Householder                   2mnr - r^2(m+n) + 2r^3/3,         Stable
column pivoting                                            r = rank(A)
Concluding Remarks: Gaussian elimination without pivoting is unstable; Gaussian
elimination with partial pivoting is stable in practice; Gaussian elimination with com-
plete pivoting is stable. From the above table, we see that the process of QR factorization
and that of reduction to a Hessenberg matrix using Householder transformations are the most efficient.
The methods are numerically stable. However, as remarked earlier, Givens transformations are
useful in matrix computations with structured matrices, and they are emerging as important tools
for parallel matrix algorithms. Also, it is worth noting that Gaussian elimination can be
used to transform an arbitrary matrix to an upper Hessenberg matrix by similarity.
For details, see Wilkinson AEP, pp. 353-355.
5.10 Suggestions for Further Reading
The topics covered in this chapter are standard and can be found in any numerical linear algebra
text. The books by Golub and Van Loan (MC) and by G. W. Stewart (IMC) are rich sources
of further knowledge in this area.
The book MC in particular contains a thorough discussion of QR factorization with column
pivoting using Householder transformations (Golub and Van Loan MC, 1984, pp. 162-167).
The book SLP by Lawson and Hanson contains an in-depth discussion of triangularization using
Householder and Givens transformations, and of QR factorization with column pivoting (Chapters 10
and 15).
The details of the error analysis of the Householder and the Givens methods for QR factorization
and reduction to Hessenberg forms are contained in AEP by Wilkinson.
For error analyses of QR factorization using Givens transformations and variants of Givens
transformations, see Gentleman (1975).
A nice discussion of orthogonal projections is given in the book Numerical Linear Algebra
and Optimization by Philip E. Gill, Walter Murray, and Margaret H. Wright, Addison Wesley,
1991.
Exercises on Chapter 5
(Use MATLAB, whenever appropriate and necessary)
PROBLEMS ON SECTIONS 5.2 and 5.3
1. (a) Show that an elementary lower triangular matrix has the form
       $$E = I + m e_k^T,$$
       where $m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T$.
   (b) Show that the inverse of $E$ in (a) is given by
       $$E^{-1} = I - m e_k^T.$$
2. (a) Given
       $$a = \begin{pmatrix} 0.00001 \\ 1 \end{pmatrix},$$
       using 3-digit arithmetic, find an elementary matrix $E$ such that $Ea$ is a multiple of $e_1$.
   (b) Using your computations in (a), find the LU factorization of
       $$A = \begin{pmatrix} 0.00001 & 1 \\ 1 & 2 \end{pmatrix}.$$
   (c) Let $\hat{L}$ and $\hat{U}$ be the computed $L$ and $U$ in part (b). Find
       $$\frac{\|A - \hat{L}\hat{U}\|_F}{\|A\|_F},$$
       where $\|\cdot\|_F$ is the Frobenius norm.
3. Show that the pivots $a_{11}, a_{22}^{(1)}, \ldots, a_{nn}^{(n-1)}$ are nonzero iff the leading principal minors of $A$ are
   nonzero.
   Hint: Show that
   $$\det A_r = a_{11}\, a_{22}^{(1)} \cdots a_{rr}^{(r-1)}.$$
4. Let $A$ be a symmetric positive definite matrix. At the end of the first step of the LU factor-
   ization of $A$, we have
   $$A^{(1)} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & & & \\ \vdots & & A' & \\ 0 & & & \end{pmatrix}.$$
   Prove that $A'$ is also symmetric and positive definite. Hence show that the LU factorization of a
   symmetric positive definite matrix using Gaussian elimination without pivoting always exists
   and is unique.
5. (a) Repeat exercise #4 when $A$ is a diagonally dominant matrix; that is, show that the LU
       factorization of a diagonally dominant matrix always exists and is unique.
   (b) Using (a), show that a diagonally dominant matrix is nonsingular.
6. Assuming that the LU factorization of $A$ exists, prove that
   (a) $A$ can be written in the form
       $$A = L D U_1,$$
       where $D$ is diagonal and $L$ and $U_1$ are unit lower and upper triangular matrices, respec-
       tively.
   (b) If $A$ is symmetric, then $A = LDL^T$.
   (c) If $A$ is symmetric and positive definite, then
       $$A = HH^T,$$
       where $H$ is a lower triangular matrix with positive diagonal entries. (This is known
       as the Cholesky decomposition.)
7. Assuming that the LU factorization of $A$ exists, develop an algorithm to compute $U$ by rows and
   $L$ by columns directly from the equation
   $$A = LU.$$
   This is known as Doolittle reduction.
8. Develop an algorithm to compute the factorization
   $$A = LU,$$
   where $U$ is unit upper triangular and $L$ is lower triangular. This is known as Crout reduc-
   tion.
   Hint: Derive the algorithm from the equation $A = LU$.
9. Compare the Doolittle and Crout reductions with Gaussian elimination with respect to flop-
   count, storage requirements and the possibility of accumulating inner products in double preci-
   sion.
10. A matrix $G$ of the form
    $$G = I - g e_k^T$$
    is called a Gauss-Jordan matrix. Show that, given a vector $x$ with the property that
    $e_k^T x \neq 0$, there exists a Gauss-Jordan matrix $G$ such that $Gx$ is a multiple of $e_k$.
    Develop an algorithm to construct Gauss-Jordan matrices $G_1, G_2, \ldots, G_n$ successively such
    that
    $$(G_n G_{n-1} \cdots G_2 G_1) A \ \text{is a diagonal matrix.}$$
    This is known as Gauss-Jordan reduction.
    Derive conditions under which the Gauss-Jordan reduction can be carried to completion.
    Give a flop-count for the algorithm and compare it with those of Gaussian elimination, Crout
    reduction and Doolittle reduction.
11. Given
    $$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 3 & 4 & 5 \end{pmatrix},$$
    find
    (a) the LU factorization of $A$ using Gaussian elimination and Doolittle reduction;
    (b) the LU factorization of $A$ using Crout reduction (note that $U$ is unit upper triangular and
        $L$ is lower triangular).
12. Apply the Gauss-Jordan reduction to $A$ of problem #11.
13. (a) Let $A$ be $m \times n$ and let $r = \min\{m-1, n\}$. Develop an algorithm to construct elementary
        matrices $E_1, \ldots, E_r$ such that
        $$E_r E_{r-1} \cdots E_2 E_1 A$$
        is an upper trapezoidal matrix $U$. The algorithm should overwrite $A$ with $U$.
    (b) Show that the algorithm requires about $\frac{r^3}{3}$ flops.
    (c) Apply the algorithm to
        $$A = \begin{pmatrix} 1 & 2 \\ 4 & 5 \\ 6 & 7 \end{pmatrix}.$$
14. Given a tridiagonal matrix $A$ with nonzero off-diagonal entries, write down a set of simple
    conditions on the entries of $A$ that guarantees that Gaussian elimination can be carried to
    completion.
15. Assuming that the LU decomposition exists, from
    $$A = LU$$
    show that, when $A$ is tridiagonal, $L$ and $U$ are both bidiagonal. Develop a scheme for
    computing $L$ and $U$ in this case and apply your scheme to find the LU factorization of
    $$A = \begin{pmatrix} 4 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 0 & 0 & 1 & 4 \end{pmatrix}.$$
16. Prove that the matrix $L$ in each of the factorizations $PA = LU$ and $PAQ = LU$, obtained
    by using Gaussian elimination with partial and complete pivoting, respectively, is unit lower
    triangular.
17. Given
    $$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 2 & 3 & 4 & 5 \end{pmatrix},$$
    find a permutation matrix $P$, a unit lower triangular matrix $L$, and an upper triangular
    matrix $U$ such that $PA = LU$.
18. For each of the following matrices find
    (a) permutation matrices $P_1$ and $P_2$ and elementary matrices $M_1$ and $M_2$ such that
        $MA = M_2 P_2 M_1 P_1 A$ is an upper triangular matrix;
    (b) permutation matrices $P_1, P_2, Q_1, Q_2$ and elementary matrices $M_1$ and $M_2$ such that
        $MAQ = M_2(P_2(M_1 P_1 A Q_1)Q_2)$ is an upper triangular matrix.
        i.   $A = \begin{pmatrix} 1 & \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{4} \\ \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} \end{pmatrix};$
        ii.  $A = \begin{pmatrix} 100 & 99 & 98 \\ 98 & 55 & 11 \\ 0 & 10 & 1 \end{pmatrix};$
        iii. $A = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 1 & 1 \\ -1 & -1 & 1 \end{pmatrix};$
        iv.  $A = \begin{pmatrix} 0.00003 & 1.566 & 1.234 \\ 1.5660 & 2.000 & 1.018 \\ 1.2340 & -3.000 & 1.018 \end{pmatrix};$
        v.   $A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & -1 & 2 \end{pmatrix}.$
    (c) Express each factorization in the form $PAQ = LU$ (note that for Gaussian elimination
        without and with partial pivoting, $Q = I$).
    (d) Compute the growth factor in each case.

PROBLEMS ON SECTIONS 5.4-5.6

19. Let $x$ be an $n$-vector. Give an algorithm to compute a Householder matrix $H = I - \frac{2uu^T}{u^T u}$
    such that $Hx$ has zeros in positions $(r+1)$ through $n$, $r < n$.
    How many flops will be required to implement this algorithm?
    Given $x = (1, 2, 3)^T$, apply your algorithm to construct $H$ such that $Hx$ has a zero in the 3rd
    position.

20. Let $H$ be an unreduced upper Hessenberg matrix of order $n$.
    Develop an algorithm to triangularize $H$ using
    (a) Gaussian elimination,
    (b) Householder transformations,
    (c) Givens rotations.
    Compute the flop-count in each case and compare.
21. Let
    $$H = \begin{pmatrix} 10 & 1 & 1 & 1 \\ 2 & 10 & 1 & 1 \\ 0 & 1 & 10 & 1 \\ 0 & 0 & 1 & 10 \end{pmatrix}.$$
    Triangularize $H$ using
    (a) Gaussian elimination,
    (b) the Householder method,
    (c) the Givens method.
22. Let $\hat{H}_k = I - \frac{2uu^T}{u^T u}$, where $u$ is an $(n-k+1)$-vector. Define
    $$H_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{H}_k \end{pmatrix}.$$
    How many flops will be required to compute $H_k A$, where $A$ is arbitrary and $n \times n$? Your
    count should take into account the special structure of the matrix $\hat{H}_k$.
    Using this result, show that the Householder method requires about $\frac{2}{3}n^3$ flops to obtain $R$
    and another $\frac{2}{3}n^3$ flops to obtain $Q$ in the QR factorization of $A$.
23. Show that it requires $n^2\left(m - \frac{n}{3}\right)$ flops to compute $R$ in the QR factorization of an $m \times n$
    matrix $A$ ($m \geq n$) using Householder's method.
    Given
    $$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix},$$
    (a) find Householder matrices $H_1$ and $H_2$ such that
        $$H_2 H_1 A = \begin{pmatrix} R \\ 0 \end{pmatrix},$$
        where $R$ is $2 \times 2$ upper triangular;
    (b) find orthonormal bases for $R(A)$ and for the orthogonal complement of $R(A)$;
    (c) find the orthogonal projections of $A$ onto $R(A)$ and onto its orthogonal complement; that
        is, find $P_A$ and $P_A^{\perp}$.

24. Given
    $$b = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$
    and $A$ as in problem #23, find $b_R$ and $b_N$.
25. Given
    $$B = \begin{pmatrix} 1 & 3 \\ 2 & 4 \\ 3 & 5 \end{pmatrix}$$
    and $A$ as in problem #23, find $B_R$ and $B_N$.
26. Let $H$ be an $n \times n$ upper Hessenberg matrix and let
    $$H = QR,$$
    where $Q$ is orthogonal and $R$ is upper triangular, obtained by Givens rotations.
    Prove that $Q$ is also upper Hessenberg.
27. Develop an algorithm to compute $AH$, where $A$ is $m \times n$ arbitrary and $H$ is a Householder
    matrix. How many flops will the algorithm require? Your algorithm should exploit the
    structure of $H$.
28. Develop algorithms to compute $AJ$ and $JA$, where $A$ is $m \times n$ and $J$ is a Givens rotation.
    (Your algorithms should exploit the structure of the matrix $J$.)
    How many flops are required in each case?
29. Show that the flop-count to compute $R$ in the QR factorization of an $m \times n$ matrix $A$ ($m \geq n$)
    using Givens rotations is about $2n^2\left(m - \frac{n}{3}\right)$.
30. Give an algorithm to compute
    $$Q = H_1 H_2 \cdots H_n,$$
    where $H_1$ through $H_n$ are Householder matrices in the QR factorization of an $m \times n$ matrix
    $A$, $m \geq n$. Show that the algorithm can be implemented with $2(m^2 n - mn^2 + n^3/3)$ flops.
31. Let $A$ be $m \times n$. Show that the orthogonal matrix
    $$Q = Q_1^T Q_2^T \cdots Q_{s-1}^T,$$
    where each $Q_i$ is the product of $(m - i)$ Givens rotations, can be computed with $2n^2\left(m - \frac{n}{3}\right)$
    flops.
32. Let
    $$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.$$
    Find the QR factorization of $A$ using
    (a) the Householder method,
    (b) the Givens method.
    Compare the results.
33. Apply both the Householder and the Givens methods of reduction to the matrix
    $$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}$$
    to reduce it to a Hessenberg matrix by similarity. Compare the results.
34.
    (a) Show that it requires $\frac{5}{3}n^3$ flops to compute the upper Hessenberg matrix $H_u$ using the
        Householder method of reduction.
        (Hint: $\sum_{k=1}^{n-2} 2(n-k)^2 + \sum_{k=1}^{n-2} 2n(n-k) \approx \frac{2n^3}{3} + n^3 = \frac{5n^3}{3}$.)
    (b) Show that if the transforming matrix $P$ is required explicitly, another $\frac{2}{3}n^3$ flops will be
        needed.
    (c) Work out the corresponding flop-count for reduction to Hessenberg form using Givens
        rotations.
    (d) If $A$ is symmetric, then show that the corresponding count in (a) is $\frac{2n^3}{3}$.
35. Given an unreduced upper Hessenberg matrix $H$, show that the matrix $X$ defined by $X =
    (e_1, He_1, \ldots, H^{n-1}e_1)$ is nonsingular and is such that $X^{-1}HX$ is a companion matrix in upper
    Hessenberg form.
    (a) What are the possible numerical difficulties with the above computations?
    (b) Transform
        $$H = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 \times 10^{-5} & 4 & 4 & 4 \\ 0 & 1 \times 10^{-3} & 1 & 2 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$
        to a companion form.
36.
    (a) Given the pair $(A, b)$, where $A$ is $n \times n$ and $b$ is a column vector, develop an algorithm
        to compute an orthogonal matrix $P$ such that $PAP^T = H_u$ is upper Hessenberg and $Pb$
        is a multiple of the vector $e_1$.
    (b) Show that $H_u$ is unreduced and $Pb$ is a nonzero multiple of $e_1$ iff
        $\mathrm{rank}(b, Ab, \ldots, A^{n-1}b) = n$.
    (c) Apply your algorithm in (a) to
        $$A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 2 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \qquad
        b = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}.$$

PROBLEMS ON SECTION 5.7 AND SECTION 5.8

37. Given
    $$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix},$$
    find a permutation matrix $P$, an orthogonal matrix $Q$, and an upper triangular matrix $R$,
    using the Householder method, such that
    $$AP = QR.$$
    The permutation matrix $P$ is to be chosen according to the criteria given in the book.
38. Give a proof of the Complete Orthogonalization Theorem (Theorem 5.7.2) starting from the
    QR factorization theorem (Theorem 5.4.1).
39. Work out an algorithm to modify the QR factorization of a matrix $A$ from which a column
    has been removed.
MATLAB AND MATCOM PROGRAMS AND
PROBLEMS ON CHAPTER 5

You will need the programs housmul, compiv, givqr and givhs from MATCOM.

1.
   (a) Write a MATLAB function called elm(v) that creates an elementary lower triangular
       matrix $E$ so that $Ev$ is a multiple of $e_1$, where $v$ is an $n$-vector.
   (b) Write a MATLAB function elmul(A, E) that computes the product $EA$, where $E$ is an
       elementary lower triangular matrix and $A$ is an arbitrary matrix.
       Your program should be able to take advantage of the structure of the matrix $E$.
   (c) Using elm and elmul, write a MATLAB program elmlu that finds the LU factorization
       of a matrix, when it exists:
       [L, U] = elmlu(A).
       (This program should implement Algorithm 5.2.1 of the book.)

   Using your program elmlu, find $L$ and $U$ for the following matrices:
   $$A = \begin{pmatrix} 0.00001 & 1 \\ 1 & 1 \end{pmatrix}; \quad
   A = \begin{pmatrix} 1 & 1 \\ 0.00001 & 1 \end{pmatrix}; \quad
   A = \begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}; \quad
   A = \mathrm{diag}(1, 2, 3); \quad
   A = 5 \times 5 \ \text{Hilbert matrix}.$$
   Now compute (i) the product $LU$ and (ii) $\|A - LU\|_F$ in each case and print your results.
2. Modify your program elmlu to incorporate partial pivoting:
   [M, U] = elmlupp(A).
   Test your program with each of the matrices of problem #1.
   Compare your results with those obtained by the MATLAB built-in function
   [L, U] = lu(A).

3. Write a MATLAB program called parpiv to compute $M$ and $U$ such that $MA = U$, using
   partial pivoting:
   [M, U] = parpiv(A).
   (This program should implement Algorithm 5.2.3 of the book.)
   Print $M$, $U$, $\|MA - U\|_F$ and $\|MA - U\|_2$ for each of the matrices $A$ of problem #1.

4. Using the program compiv from MATCOM, print $M$, $Q$, and $U$, and
   $\|MAQ - U\|_2$ and $\|MAQ - U\|_F$,
   for each of the matrices of problem #1.

5.
   (a) Write a MATLAB program called housmat that creates a Householder matrix $H$ such
       that $Ha$ is a multiple of $e_1$, where $a$ is an $n$-vector.
   (b) Using housmat and housmul (from the Appendix of the book), write a MATLAB
       program housqr that finds the QR factorization of $A$:
       [Q, R] = housqr(A).
       (This program should implement Algorithm 5.4.2 of the book.)
   (c) Using your program housqr, find $Q$ and $R$ such that
       $A = QR$
       for each of the matrices in problem #1.

   Now compute
   i.  $\|I - Q^T Q\|_F$,
   ii. $\|A - QR\|_F$,
   and compare the results with those obtained using the MATLAB built-in function
   [Q, R] = qr(A).

   (d) Repeat (c) with the program givqr from MATCOM in place of housqr, which computes
       the QR factorization using Givens rotations.

6. Run the program givqr(A) from MATCOM with each of the matrices in problem #1. Then,
   using [Q, R] = qr(A) from MATLAB on those matrices again, verify the uniqueness of the QR
   factorization for each $A$.

7. Using givhs(A) from MATCOM and the MATLAB function hess(A) on each of the matrices
   from problem #1, verify the Implicit Q Theorem: Theorem 5.5.3 (uniqueness of
   Hessenberg reduction).

8. Using the results of problem #5, find an orthonormal basis for $R(A)$, an orthonormal basis for
   the orthogonal complement of $R(A)$, the orthogonal projection onto $R(A)$, and the projection
   onto the orthogonal complement of $R(A)$ for each of the matrices of problem #1.

9. Incorporate "maximum norm column pivoting" in housqr to write a MATLAB program
   housqrp that computes the QR factorization with column pivoting of a matrix $A$. Test your
   program with each of the matrices of problem #1:
   [Q, R, P] = housqrp(A).
   Compare your results with those obtained by using the MATLAB function
   [Q, R, P] = qr(A).

   Note: Some of the programs you have been asked to write, such as parpiv, housmat, housqr,
   etc., are in MATCOM or in the Appendix. But it is a good idea to write your own programs.
6. NUMERICAL SOLUTIONS OF LINEAR SYSTEMS
6.1  Introduction ... 223
6.2  Basic Results on Existence and Uniqueness ... 224
6.3  Some Applications Giving Rise to Linear Systems Problems ... 226
     6.3.1  An Electric Circuit Problem ... 226
     6.3.2  Analysis of a Processing Plant Consisting of Interconnected Reactors ... 228
     6.3.3  Linear Systems Arising from Ordinary Differential Equations (Finite Difference Scheme) ... 231
     6.3.4  Linear Systems Arising from Partial Differential Equations: A Case Study on Temperature Distribution ... 233
     6.3.5  Special Linear Systems Arising in Applications ... 238
     6.3.6  Linear Systems Arising From Finite Element Methods ... 243
     6.3.7  Approximation of a Function by a Polynomial: Hilbert System ... 247
6.4  Direct Methods ... 248
     6.4.1  Solution of a Lower Triangular System ... 249
     6.4.2  Solution of the System Ax = b Using Gaussian Elimination without Pivoting ... 249
     6.4.3  Solution of Ax = b Using Pivoting Triangularization ... 250
     6.4.4  Solution of Ax = b without Explicit Factorization ... 256
     6.4.5  Solution of Ax = b Using QR Factorization ... 258
     6.4.6  Solving Linear Systems with Multiple Right-Hand Sides ... 260
     6.4.7  Special Systems ... 262
     6.4.8  Scaling ... 284
     6.4.9  LU Versus QR and Table of Comparisons ... 286
6.5  Inverses, Determinant and Leading Principal Minors ... 288
     6.5.1  Avoiding Explicit Computation of the Inverses ... 288
     6.5.2  The Sherman-Morrison and Woodbury Formulas ... 290
     6.5.3  Computing the Inverse of a Matrix ... 292
     6.5.4  Computing the Determinant of a Matrix ... 295
     6.5.5  Computing the Leading Principal Minors of a Matrix ... 297
6.6  Perturbation Analysis of the Linear System Problem ... 299
     6.6.1  Effect of Perturbation in the Right-Hand Side Vector b ... 300
     6.6.2  Effect of Perturbation in the Matrix A ... 304
     6.6.3  Effect of Perturbations in both the Matrix A and the Vector b ... 306
6.7  The Condition Number and Accuracy of Solution ... 308
     6.7.1  Some Well-known Ill-conditioned Matrices ... 309
     6.7.2  Effect of the Condition Number on Accuracy of the Computed Solution ... 310
     6.7.3  How Large Must the Condition Number be for Ill-Conditioning? ... 311
     6.7.4  The Condition Number and Nearness to Singularity ... 312
     6.7.5  Conditioning and Pivoting ... 313
     6.7.6  Conditioning and the Eigenvalue Problem ... 313
     6.7.7  Conditioning and Scaling ... 314
     6.7.8  Computing and Estimating the Condition Number ... 315
6.8  Component-wise Perturbations and the Errors ... 320
6.9  Iterative Refinement ... 321
6.10 Iterative Methods ... 326
     6.10.1 The Jacobi Method ... 328
     6.10.2 The Gauss-Seidel Method ... 332
     6.10.3 Convergence of Iterative Methods ... 334
     6.10.4 The Successive Overrelaxation (SOR) Method ... 342
     6.10.5 The Conjugate Gradient Method ... 349
     6.10.6 The Arnoldi Process and GMRES ... 356
6.11 Review and Summary ... 359
6.12 Some Suggestions for Further Reading ... 366
CHAPTER 6
NUMERICAL SOLUTIONS
OF LINEAR SYSTEMS
6. NUMERICAL SOLUTIONS OF LINEAR SYSTEMS
Objectives
The major objectives of this chapter are to study numerical methods for solving linear systems
and associated problems. Some of the highlights of this chapter are:
• Theoretical results on existence and uniqueness of the solution (Section 6.2).
• Some important engineering applications giving rise to linear systems problems (Section 6.3).
• Direct methods (Gaussian elimination with and without pivoting) for solving linear systems
  (Section 6.4).
• Special systems: positive definite, Hessenberg, diagonally dominant, tridiagonal and block
  tridiagonal (Section 6.4.7).
• Methods for computing the determinant and the inverse of a matrix (Section 6.5).
• Sensitivity analysis of linear systems problems (Section 6.6).
• The iterative refinement procedure (Section 6.9).
• Iterative methods (Jacobi, Gauss-Seidel, Successive Overrelaxation, Conjugate Gradient)
  for linear systems (Section 6.10).
Required Background
The following major tools and concepts developed in earlier chapters will be needed for smooth
learning of the material of this chapter.
1. Special Matrices (Section 1.4), and concepts and results on matrix and vector norms
   (Section 1.7). Convergence of a matrix sequence and convergent matrices (Section 1.7.3).
2. LU factorization using Gaussian elimination without pivoting (Section 5.2.1, Algo-
   rithms 5.5.1 and 5.5.2).
3. MA = U factorization with partial pivoting (Section 5.2.2, Algorithm 5.2.3).
4. MAQ = U factorization with complete pivoting (Section 5.2.3, Algorithm 5.2.4).
5. The concept of the growth factor (Section 5.3).
6. QR factorization of a matrix (Section 5.4.1, Section 5.5.1).
7. Concepts of conditioning and stability (Section 3.3 and Section 3.4).
8. Basic knowledge of differential equations.

6.1 Introduction
In this chapter we will discuss methods for numerically solving the linear system
$$Ax = b,$$
where $A$ is an $n \times n$ matrix and $x$ and $b$ are $n$-vectors. $A$ and $b$ are given and $x$ is unknown.
The problem arises in a very wide variety of applications. As a matter of fact, it might be
said that numerical solutions of almost all practical engineering and applied science
problems routinely require the solution of a linear system problem. (See Section 6.3.)
We shall discuss methods for nonsingular linear systems only in this chapter. The case where
the matrix $A$ is not square or the system has more than one solution will be treated in Chapter 7.
A method called Cramer's Rule, taught in an elementary undergraduate linear algebra course,
is of high significance from a theoretical point of view.

CRAMER'S RULE
Let $A$ be a nonsingular matrix of order $n$ and $b$ be an $n$-vector. The solution $x$ to
the system $Ax = b$ is given by
$$x_i = \frac{\det A_i}{\det A}, \qquad i = 1, \ldots, n,$$
where $A_i$ is the matrix obtained by replacing the $i$th column of $A$ by the vector $b$, and
$x = (x_1, x_2, \ldots, x_n)^T$.

Remarks on Cramer's Rule: Cramer's Rule is, however, not at all practical from
a computational viewpoint. For example, solving a linear system with 20 equations and 20
unknowns by Cramer's rule, using the usual definition of the determinant, would require more than a
million years even on a fast computer (Forsythe, Malcolm and Moler CMMC, p. 30). For an $n \times n$
system, it would require about $O(n!)$ flops.
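For a small system, Cramer's rule is nevertheless easy to state in code. The following MATLAB
fragment (my own toy illustration, with arbitrarily chosen data) compares it with the backslash
operator:

    A = [2 1 1; 1 3 2; 1 0 0];  b = [4; 5; 6];    % any nonsingular example
    x = zeros(3,1);
    for i = 1:3
        Ai = A;  Ai(:,i) = b;       % replace the i-th column of A by b
        x(i) = det(Ai)/det(A);      % Cramer's rule
    end
    [x, A\b]                        % the two solutions agree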
Two types of methods are normally used for numerical computations:
(1) Direct methods
(2) Iterative methods
The direct methods consist of a finite number of steps, and one needs to perform all the steps in a
given method before the solution is obtained. On the other hand, iterative methods are based on
computing a sequence of approximations to the solution $x$, and a user can stop whenever a certain
desired accuracy is obtained or a certain number of iterations are completed. The iterative
methods are used primarily for large and sparse systems.
The organization of this chapter is as follows:
In Section 6.2 we state the basic theoretical results (without proofs) on the existence and
uniqueness of solutions for linear systems.
In Section 6.3 we discuss several engineering applications giving rise to linear systems problems.
In Section 6.4 we describe direct methods for solving linear systems.
In Section 6.5 we show how the LU and QR factorization methods can be used to compute the
determinant, the inverse and the leading principal minors of a matrix.
In Section 6.6 we study the sensitivity issues of the linear systems problem and their effects
on the solutions.
In Section 6.9 we briefly describe an iterative refinement procedure for improving the accuracy
of a computed solution.
In Section 6.10 we discuss iterative methods: the Jacobi, Gauss-Seidel, SOR, conjugate
gradient and GMRES methods.

6.2 Basic Results on Existence and Uniqueness

Consider the system of $m$ equations in $n$ unknowns:
$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m.$$
In matrix form, the system is written as
$$Ax = b,$$
where
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.$$
Given an $m \times n$ matrix $A$ and an $m$-vector $b$, if there exists a vector $x$ satisfying $Ax = b$, then we
say that the system is consistent. Otherwise, it is inconsistent. It is natural to ask whether a given
system $Ax = b$ is consistent and, if it is consistent, how many solutions there are and when the
solution is unique. To this end, we state the following theorem.

Theorem 6.2.1 (Existence and Uniqueness Theorem for a Nonhomoge-
neous System)
(i) The system $Ax = b$ is consistent iff $b \in R(A)$; in other words, $\mathrm{rank}(A) = \mathrm{rank}(A, b)$.
(ii) If the system is consistent and the columns of $A$ are linearly independent,
then the solution is unique.
(iii) If the system is consistent and the columns of $A$ are linearly dependent,
then the system has an infinite number of solutions.
Homogeneous Systems
If the vector $b = 0$, then the system $Ax = 0$ is called a homogeneous system. A homogeneous
system always has a solution, namely $x = 0$. This is the trivial solution.

Theorem 6.2.2 (Existence Theorem for a Homogeneous System)
(i) The system $Ax = 0$ has a nontrivial solution iff the columns of $A$ are linearly
dependent. If $Ax = 0$ has a nontrivial solution, it has infinitely many solutions.
(ii) If $m = n$, then $Ax = 0$ has a nontrivial solution iff $A$ is singular.
Theorem 6.2.3 (Solution Invariance Theorem) A solution of a consistent sys-
tem
$$Ax = b$$
remains unchanged under any of the following operations:
(i) Any two equations are interchanged.
(ii) An equation is multiplied by a nonzero constant.
(iii) A nonzero multiple of one equation is added to another equation.

Two systems obtained from one another by applying any of the above operations are called
equivalent systems. Theorem 6.2.3 then says that two equivalent systems have the same
solution.

6.3 Some Applications Giving Rise to Linear Systems Problems
It is probably not an overstatement that linear systems problems arise in almost all practical
applications. We will give examples here from electrical, mechanical, chemical and civil engineering.
We start with a simple problem: an electric circuit.

6.3.1 An Electric Circuit Problem

Consider the following diagram of an electrical circuit:

[Figure 6-1: An electric circuit with nodes A1 through A6, branch currents I1, I2, I3, I4,
resistances R12 = 1, R23 = 2, R34 = 3, R45 = 4, R56 = 5, R25 = 10, and voltages V1 = 100, V6 = 0.]

We would like to determine the currents between the nodes A1, A2, A3, A4, A5,
and A6. The famous Kirchhoff's Current Law tells us that the algebraic sum of all currents
entering a node must be zero. Applying this law at node A2, we have
$$I_1 - I_2 + I_4 = 0 \qquad (6.3.1)$$
At node A5,
$$I_2 - I_3 - I_4 = 0 \qquad (6.3.2)$$
At node A3,
$$I_2 - I_2 = 0 \qquad (6.3.3)$$
At node A4,
$$I_2 - I_2 = 0 \qquad (6.3.4)$$
Now consider the voltage drop around each closed loop of the circuit: A1A2A3A4A5A6A1,
A1A2A5A6A1, and A2A3A4A5A2. Kirchhoff's Voltage Law tells us that the net voltage drop
around each closed loop is zero. Thus, for the loop A1A2A3A4A5A6A1, substituting the values
of the resistances and voltages, we have
$$I_1 + 9I_2 + 5I_3 = 100 \qquad (6.3.5)$$
Similarly, for A1A2A5A6A1 and A2A3A4A5A2 we have, respectively,
$$I_1 - 10I_4 + 5I_3 = 100 \qquad (6.3.6)$$
$$9I_2 + 10I_4 = 0 \qquad (6.3.7)$$
Note that (6.3.6) + (6.3.7) = (6.3.5). Thus we have four equations in four unknowns:
$$I_1 - I_2 + I_4 = 0 \qquad (6.3.8)$$
$$I_2 - I_3 - I_4 = 0 \qquad (6.3.9)$$
$$I_1 - 10I_4 + 5I_3 = 100 \qquad (6.3.10)$$
$$9I_2 + 10I_4 = 0 \qquad (6.3.11)$$
Equations (6.3.8)-(6.3.11) can be written as
$$\begin{pmatrix} 1 & -1 & 0 & 1 \\ 0 & 1 & -1 & -1 \\ 1 & 0 & 5 & -10 \\ 0 & 9 & 0 & 10 \end{pmatrix}
\begin{pmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ 100 \\ 0 \end{pmatrix},$$
the solution of which yields the currents between the nodes.
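For completeness, the system can be solved at once with MATLAB's backslash operator (a sketch,
using the coefficient matrix exactly as written above):

    A = [1 -1 0   1;
         0  1 -1 -1;
         1  0  5 -10;
         0  9  0  10];
    b = [0; 0; 100; 0];
    currents = A\b            % currents = [I1; I2; I3; I4]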

6.3.2 Analysis of a Processing Plant Consisting of Interconnected Reactors

Many mathematical models are based on conservation laws such as conservation of mass, conser-
vation of momentum, and conservation of energy. In mathematical terms, these conservation laws
lead to conservation or balance or continuity equations, which relate the behavior of a system or
the response of the quantity being modeled to the properties of the system and the external forcing
functions or stimuli acting on the system.
As an example, consider a chemical processing plant consisting of six interconnected chemical
reactors (Figure 6-2), with different mass flow rates of a component of a mixture into and out of
the reactors. We are interested in knowing the concentration of the mixture at different reactors.
The example here is similar to that given in Chapra and Canale (1988), pp. 295-298.
[Figure 6-2: Six interconnected reactors with concentrations C1 through C6. The labeled volumetric
flow rates are Q01 = 6, Q03 = 8, Q12 = 3, Q15 = 3, Q23 = 1, Q24 = 1, Q25 = 1, Q31 = 1, Q34 = 8,
Q36 = 10, Q44 = 11, Q54 = 2, Q55 = 2, Q66 = 2, with inlet concentrations C01 = 12 and C03 = 20.]

Application of conservation of mass to all these reactors results in a linear system of equations, as
shown below, consisting of six equations in six unknowns. The solution of the system will tell us
the concentration of the mixture at each of these reactors.

Steady-state, completely mixed reactor. Consider first a reactor with two flows coming
in and one flow going out, as shown in Figure 6-3.

[Figure 6-3: A completely mixed reactor with two incoming streams ($\dot{m}_1, Q_1, C_1$ and $\dot{m}_2, Q_2, C_2$)
and one outgoing stream ($\dot{m}_3, Q_3, C_3$).]

Application of the steady-state conservation of mass to the above reactor gives
$$\dot{m}_1 + \dot{m}_2 = \dot{m}_3. \qquad (6.3.12)$$
Noting that
$$\dot{m}_i = Q_i \, C_i,$$
where
$\dot{m}_i$ = mass flow rate of the mixture at the inlet and outlet sections $i$, $i = 1, 2, 3$;
$Q_i$ = volumetric flow rate at section $i$, $i = 1, 2, 3$;
$C_i$ = density or concentration at section $i$, $i = 1, 2, 3$;
we get from (6.3.12)
$$Q_1 C_1 + Q_2 C_2 = Q_3 C_3 \qquad (6.3.13)$$
For given inlet flow rates and concentrations, the outlet concentration $C_3$ can be found from equa-
tion (6.3.13). Under steady-state operation, this outlet concentration also represents the spatially
uniform or homogeneous concentration inside the reactor. Such information is necessary for design-
ing the reactor to yield mixtures of a specified concentration. For details, see Chapra and Canale
(1988).
Referring now to Figure 6-2, where we consider the plant consisting of six reactors, we have
the following equations (the derivation of each is similar to that of (6.3.13)). The
derivation of each equation is based on the fact that the net mass flow rate into a reactor
is equal to the net mass flow rate out of the reactor.
For reactor 1:
$$6C_1 - C_3 = 72 \qquad (6.3.14)$$
(Note that for this reactor, the flow at the inlet is $72 + C_3$ and the flow at the outlet is $6C_1$.)
For reactor 2:
$$3C_1 - 3C_2 = 0 \qquad (6.3.15)$$
For reactor 3:
$$-C_2 + 11C_3 = 200 \qquad (6.3.16)$$
For reactor 4:
$$C_2 - 11C_4 + 2C_5 + 8C_6 = 0 \qquad (6.3.17)$$
For reactor 5:
$$3C_1 + C_2 - 4C_5 = 0 \qquad (6.3.18)$$
For reactor 6:
$$10C_3 - 10C_6 = 0 \qquad (6.3.19)$$
Equations (6.3.14)-(6.3.19) can be rewritten in matrix form as
$$\begin{pmatrix}
6 & 0 & -1 & 0 & 0 & 0 \\
3 & -3 & 0 & 0 & 0 & 0 \\
0 & -1 & 9 & 0 & 0 & 0 \\
0 & 1 & 8 & -11 & 2 & 8 \\
3 & 1 & 0 & 0 & -4 & 0 \\
0 & 0 & 10 & 0 & 0 & -10
\end{pmatrix}
\begin{pmatrix} C_1 \\ C_2 \\ C_3 \\ C_4 \\ C_5 \\ C_6 \end{pmatrix} =
\begin{pmatrix} 72 \\ 0 \\ 200 \\ 0 \\ 0 \\ 0 \end{pmatrix} \qquad (6.3.20)$$
or
$$AC = D.$$
The $i$th coordinate of the unknown vector $C$ represents the mixture concentration at reactor $i$ of
the plant.
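Again, the concentrations are obtained in MATLAB with one backslash solve (the coefficients below
are simply copied from (6.3.20) as displayed):

    A = [ 6  0  -1   0   0    0;
          3 -3   0   0   0    0;
          0 -1   9   0   0    0;
          0  1   8 -11   2    8;
          3  1   0   0  -4    0;
          0  0  10   0   0  -10];
    d = [72; 0; 200; 0; 0; 0];
    C = A\d                   % C(i) = concentration in reactor i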

6.3.3 Linear Systems Arising from Ordinary Differential Equations (Finite Difference
Scheme)
A Case Study on a Spring-Mass Problem
Consider a system of three masses suspended vertically by a series of springs, as shown below,
where $k_1$, $k_2$, and $k_3$ are the spring constants, and $x_1$, $x_2$, and $x_3$ are the displacements of each
spring from its equilibrium position.

[Figure: three masses $m_1$, $m_2$, $m_3$ hanging in series from springs with constants $k_1$, $k_2$, $k_3$;
$x_i$ denotes the displacement of mass $i$.]

The free-body diagram for these masses can be represented as follows:


[Free-body diagrams: each mass $m_i$ is acted on by its weight $m_i g$, the restoring force of the spring
above it ($k_1 x_1$ for $m_1$, $k_2(x_2 - x_1)$ for $m_2$, $k_3(x_3 - x_2)$ for $m_3$), and the pull of the spring below it.]

Referring to the above diagram, the equations of motion, by Newton's second law, are:
$$m_1 \frac{d^2 x_1}{dt^2} = k_2(x_2 - x_1) + m_1 g - k_1 x_1$$
$$m_2 \frac{d^2 x_2}{dt^2} = k_3(x_3 - x_2) + m_2 g - k_2(x_2 - x_1)$$
$$m_3 \frac{d^2 x_3}{dt^2} = m_3 g - k_3(x_3 - x_2)$$
Suppose we are interested in knowing the displacements of these springs when the system eventually
returns to the steady state, that is, when the system comes to rest. Then, by setting the second-
order derivatives to zero, we obtain
$$k_1 x_1 + k_2(x_1 - x_2) = m_1 g$$
$$k_2(x_2 - x_1) + k_3(x_2 - x_3) = m_2 g$$
$$k_3(x_3 - x_2) = m_3 g$$
This system of equations in the three unknowns $x_1$, $x_2$, and $x_3$ can be rewritten in matrix form as
$$\begin{pmatrix} k_1 + k_2 & -k_2 & 0 \\ -k_2 & k_2 + k_3 & -k_3 \\ 0 & -k_3 & k_3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} m_1 g \\ m_2 g \\ m_3 g \end{pmatrix}$$
or
$$Kx = w.$$
The matrix
$$K = \begin{pmatrix} k_1 + k_2 & -k_2 & 0 \\ -k_2 & k_2 + k_3 & -k_3 \\ 0 & -k_3 & k_3 \end{pmatrix}$$
is called the stiffness matrix.
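As a small numerical illustration (the spring constants and masses below are hypothetical values
chosen by me, not data from the text), the steady-state displacements follow from one linear solve:

    k1 = 10; k2 = 10; k3 = 10;          % spring constants (assumed)
    m  = [2; 2; 2];  g = 9.81;          % masses and gravity (assumed)
    K = [k1+k2  -k2     0;
         -k2   k2+k3  -k3;
          0     -k3    k3];
    x = K \ (m*g)                       % displacements x1, x2, x3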

6.3.4 Linear Systems Arising from Partial Differential Equations: A Case Study on
Temperature Distribution
Many engineering problems are modeled by partial differential equations. Numerical approaches
to these equations typically require discretization by means of difference equations; that is, partial
derivatives in the equations are replaced by approximate differences. This process of discretization
in turn gives rise to linear systems of many interesting types. We shall illustrate this with a problem
in heat transfer theory.
A major objective in a heat transfer problem is to determine the temperature distribution
$T(x, y, z, t)$ in a medium resulting from imposed boundary conditions on the surface of the medium.
Once this temperature distribution is known, the heat transfer rate at any point in the medium or
on its surface may be computed from Fourier's law, which is expressed as
$$q_x = -K \frac{\partial T}{\partial x}, \qquad q_y = -K \frac{\partial T}{\partial y}, \qquad q_z = -K \frac{\partial T}{\partial z},$$
where $q_x$ is the heat transfer rate in the $x$ direction, $\frac{\partial T}{\partial x}$ is the temperature gradient in the $x$
direction, and the positive constant $K$ is called the thermal conductivity of the material; similarly
for the $y$ and $z$ directions.
Consider a homogeneous medium in which temperature gradients exist and in which the temperature
distribution $T(x, y, z, t)$ is expressed in Cartesian coordinates. The heat diffusion equation which
governs this temperature distribution is obtained by applying conservation of energy over an in-
finitesimally small differential element, from which we obtain the relation
$$\frac{\partial}{\partial x}\left(K \frac{\partial T}{\partial x}\right) + \frac{\partial}{\partial y}\left(K \frac{\partial T}{\partial y}\right) + \frac{\partial}{\partial z}\left(K \frac{\partial T}{\partial z}\right) + \dot{q} = \rho\, C_p \frac{\partial T}{\partial t}, \qquad (6.3.21)$$
where $\rho$ is the density, $C_p$ is the specific heat, and $\dot{q}$ is the energy generated per unit volume.
This equation, usually known as the heat equation, provides the basic tool for solving heat
conduction problems.
It is often possible to work with a simplified form of equation (6.3.21). For example, if the
thermal conductivity is a constant, the heat equation is
$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} + \frac{\dot{q}}{K} = \frac{1}{\alpha} \frac{\partial T}{\partial t} \qquad (6.3.22)$$
where \alpha = K/(\rho C_p) is a thermophysical property known as the thermal diffusivity.
Under steady state conditions there can be no change of energy storage, i.e., the unsteady
term \partial T/\partial t can be dropped, and equation (6.3.21) reduces to the 3-D Poisson's equation
\[
\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} + \frac{\dot{q}}{K} = 0
\tag{6.3.23}
\]
If the heat transfer is two-dimensional (e.g., in the x and y directions) and there is no energy
generation, then the heat equation reduces to the famous Laplace's equation
\[
\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = 0
\tag{6.3.24}
\]
If the heat transfer is unsteady and one-dimensional without energy generation, then the heat
equation reduces to
\[
\frac{\partial^2 T}{\partial x^2} = \frac{1}{\alpha} \frac{\partial T}{\partial t}
\tag{6.3.25}
\]
Analytical solutions to the heat equation can be obtained for simple geometries and boundary
conditions. Very often there are practical situations where the geometry or boundary conditions are
such that an analytical solution has not been obtained, or, if it is obtained, it involves complex series
solutions that require tedious numerical evaluation. In such cases, the best alternatives are finite
difference or finite element methods, which are well suited for computers.

Finite Difference Scheme

A well-known scheme for solving a partial differential equation is to use finite differences. The
idea is to discretize the partial differential equation by replacing the partial derivatives with their
approximations, i.e., finite differences. We will illustrate the scheme with Laplace's equation in the
following.
Let us divide a two-dimensional region into small regions with increments in the x and y
directions given as \Delta x and \Delta y, as shown in the figure below.

[Figure: a two-dimensional region subdivided into a grid of nodal points with spacings \Delta x and \Delta y.]
Each nodal point is designated by a numbering scheme (i, j), where i indicates the x increment and
j indicates the y increment:

[Figure: the five-point stencil around the node (i, j), with neighbors (i+1, j), (i-1, j), (i, j+1), and (i, j-1).]
The temperature distribution in the medium is assumed to be represented by the nodal point
temperatures. The temperature at each nodal point (x_i, y_j) (which is symbolically denoted by (i, j)
as in the diagram above) is the average temperature of the surrounding hatched region. As the
number of nodal points increases, greater accuracy in the representation of the temperature distribution
is obtained.
A finite difference equation suitable for the interior nodes of a steady two-dimensional system
can be obtained by considering Laplace's equation at the nodal point (i, j) as
\[
\left.\frac{\partial^2 T}{\partial x^2}\right|_{i,j} + \left.\frac{\partial^2 T}{\partial y^2}\right|_{i,j} = 0
\tag{6.3.26}
\]
The second derivatives at the nodal point (i, j) can be expressed as
\[
\left.\frac{\partial^2 T}{\partial x^2}\right|_{i,j} \approx
\frac{\left.\frac{\partial T}{\partial x}\right|_{i+\frac12,j} - \left.\frac{\partial T}{\partial x}\right|_{i-\frac12,j}}{\Delta x}
\tag{6.3.27}
\]
\[
\left.\frac{\partial^2 T}{\partial y^2}\right|_{i,j} \approx
\frac{\left.\frac{\partial T}{\partial y}\right|_{i,j+\frac12} - \left.\frac{\partial T}{\partial y}\right|_{i,j-\frac12}}{\Delta y}
\tag{6.3.28}
\]
As shown in the figure, the temperature gradients can be approximated (as derived from the
Taylor series) as a linear function of the nodal temperatures:
\[
\left.\frac{\partial T}{\partial x}\right|_{i+\frac12,j} \approx \frac{T_{i+1,j} - T_{i,j}}{\Delta x}
\tag{6.3.29}
\]
\[
\left.\frac{\partial T}{\partial x}\right|_{i-\frac12,j} \approx \frac{T_{i,j} - T_{i-1,j}}{\Delta x}
\tag{6.3.30}
\]
\[
\left.\frac{\partial T}{\partial y}\right|_{i,j+\frac12} \approx \frac{T_{i,j+1} - T_{i,j}}{\Delta y}
\tag{6.3.31}
\]
\[
\left.\frac{\partial T}{\partial y}\right|_{i,j-\frac12} \approx \frac{T_{i,j} - T_{i,j-1}}{\Delta y}
\tag{6.3.32}
\]
where T_{i,j} = T(x_i, y_j). Substituting (6.3.29)-(6.3.32) into (6.3.27)-(6.3.28), we get
\[
\left.\frac{\partial^2 T}{\partial x^2}\right|_{i,j} = \frac{T_{i+1,j} - 2T_{i,j} + T_{i-1,j}}{(\Delta x)^2}
\tag{6.3.33}
\]
\[
\left.\frac{\partial^2 T}{\partial y^2}\right|_{i,j} = \frac{T_{i,j+1} - 2T_{i,j} + T_{i,j-1}}{(\Delta y)^2}
\tag{6.3.34}
\]
Equation (6.3.26) then gives
\[
\frac{T_{i+1,j} - 2T_{i,j} + T_{i-1,j}}{(\Delta x)^2} + \frac{T_{i,j+1} - 2T_{i,j} + T_{i,j-1}}{(\Delta y)^2} = 0.
\]
Assume \Delta x = \Delta y. Then the finite difference approximation of Laplace's equation for
interior regions can be expressed as
\[
T_{i,j+1} + T_{i,j-1} + T_{i+1,j} + T_{i-1,j} - 4T_{i,j} = 0
\tag{6.3.35}
\]
More accurate higher order approximations for interior nodes and boundary nodes are also obtained
in a similar manner.
Example 6.3.1

A two-dimensional rectangular plate (0 <= x <= 1, 0 <= y <= 1) is subjected to the uniform
temperature boundary conditions shown in the figure below (the top surface is maintained at 100°C
and all other surfaces at 0°C); that is, T(0, y) = 0, T(1, y) = 0, T(x, 0) = 0, and T(x, 1) = 100°C.
Suppose we are interested only in the values of the temperature at the nine interior nodal points
(x_i, y_j), where x_i = i \Delta x and y_j = j \Delta y, i, j = 1, 2, 3, with \Delta x = \Delta y = 1/4.

[Figure: the 5 x 5 grid of nodal points (i, j), i, j = 0, 1, ..., 4, on the unit square; the top boundary is held at 100°C and the other three boundaries at 0°C.]
However, we assume symmetry to simplify the problem. That is, we assume that T_{3,3} = T_{1,3},
T_{3,2} = T_{1,2}, and T_{3,1} = T_{1,1}. We thus have only six unknowns: (T_{1,1}, T_{1,2}, T_{1,3}) and
(T_{2,1}, T_{2,2}, T_{2,3}). Writing equation (6.3.35) at each of these six nodes gives
4T_{1,1} - 0 - 100 - T_{2,1} - T_{1,2} = 0
4T_{1,2} - 0 - T_{1,1} - T_{2,2} - T_{1,3} = 0
4T_{1,3} - 0 - T_{1,2} - T_{2,3} - 0 = 0
4T_{2,1} - T_{1,1} - 100 - T_{1,1} - T_{2,2} = 0
4T_{2,2} - T_{1,2} - T_{2,1} - T_{1,2} - T_{2,3} = 0
4T_{2,3} - T_{1,3} - T_{2,2} - T_{1,3} - 0 = 0
After suitable rearrangement, these equations can be written in the following form:
\[
\begin{pmatrix}
4 & -1 & -1 & 0 & 0 & 0 \\
-2 & 4 & 0 & -1 & 0 & 0 \\
-1 & 0 & 4 & -1 & -1 & 0 \\
0 & -1 & -2 & 4 & 0 & -1 \\
0 & 0 & -1 & 0 & 4 & -1 \\
0 & 0 & 0 & -1 & -2 & 4
\end{pmatrix}
\begin{pmatrix} T_{1,1} \\ T_{2,1} \\ T_{1,2} \\ T_{2,2} \\ T_{1,3} \\ T_{2,3} \end{pmatrix}
=
\begin{pmatrix} 100 \\ 100 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]
The solution of this system will give us temperatures at the nodal points.
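As an illustration (not part of the original text, and assuming NumPy), the 6 x 6 system of Example 6.3.1 can be solved directly:

    import numpy as np

    # Unknown ordering (T11, T21, T12, T22, T13, T23), as above
    A = np.array([[ 4, -1, -1,  0,  0,  0],
                  [-2,  4,  0, -1,  0,  0],
                  [-1,  0,  4, -1, -1,  0],
                  [ 0, -1, -2,  4,  0, -1],
                  [ 0,  0, -1,  0,  4, -1],
                  [ 0,  0,  0, -1, -2,  4]], dtype=float)
    b = np.array([100, 100, 0, 0, 0, 0], dtype=float)

    T = np.linalg.solve(A, b)
    print(T)   # nodal temperatures in degrees C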

6.3.5 Special Linear Systems Arising in Applications

Many practical applications give rise to linear systems having special properties and structures, such
as tridiagonal, block tridiagonal, diagonally dominant, and positive definite systems.
The solution methods for these special systems are described in Section 6.4.7. We first state
a situation which gives rise to a tridiagonal system.
Tridiagonal Systems

Consider one-dimensional steady conduction of heat, such as heat conduction through a wire.
In such a case, the temperature remains constant with respect to time. The equation here is:
\[
\frac{\partial^2 T}{\partial x^2} = 0
\]
The difference analog of this equation is:
\[
T(x + \Delta x) - 2\,T(x) + T(x - \Delta x) = 0
\]
where \Delta x is the increment in x, as shown below.

[Figure: the interval divided into four equal subintervals of length \Delta x, with interior nodes 1, 2, 3 and prescribed boundary temperatures T_0 and T_4 at the two ends.]
Using a similar numbering scheme as before, the temperature at any interior node satisfies
\[
T_{i+1} - 2T_i + T_{i-1} = 0,
\]
that is, the temperature at any point is just the average of the temperatures of the two nearest
neighboring points.
Suppose the domain of the problem is 0 <= x <= 1. Divide now the domain into four segments of
equal length, say \Delta x. Thus \Delta x = 0.25. Then T at x = i \Delta x will be denoted by T_i. Suppose that
we know the temperatures at the end points x = 0 and x = 1, that is, T_0 and T_4 are given.
These are then the boundary conditions of the problem.
From the equation, the temperature at each node, x = 0, x = \Delta x, x = 2\Delta x, x = 3\Delta x, x = 1,
is calculated as follows:
At x = 0:          T_0 = (given)
At x = \Delta x:   T_0 - 2T_1 + T_2 = 0
At x = 2\Delta x:  T_1 - 2T_2 + T_3 = 0
At x = 3\Delta x:  T_2 - 2T_3 + T_4 = 0
At x = 1:          T_4 = (given)
In matrix form these equations can be written as:
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & -2 & 1 & 0 & 0 \\
0 & 1 & -2 & 1 & 0 \\
0 & 0 & 1 & -2 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} T_0 \\ T_1 \\ T_2 \\ T_3 \\ T_4 \end{pmatrix}
=
\begin{pmatrix} T_0 \\ 0 \\ 0 \\ 0 \\ T_4 \end{pmatrix}
\]
where the first and last entries of the right-hand side are the given boundary values.
The matrix of this system is tridiagonal.

Symmetric Tridiagonal and Diagonally Dominant Systems
In order to see how such systems arise, consider now the unsteady conduction of heat. This
condition implies that the temperature T varies with the time t. The heat equation in this case is
\[
\frac{1}{\alpha}\,\frac{\partial T}{\partial t} = \frac{\partial^2 T}{\partial x^2}.
\]
Let us divide the (x, t) plane into a grid with spacing \Delta x in the x-direction and \Delta t in the t-direction.

[Figure: the grid in the (x, t) plane, with x_{i+1} - x_i = \Delta x and t_{j+1} - t_j = \Delta t; the spatial nodes are 0, x_1, x_2, ..., x_n, 1 and the time levels are t_1, t_2, ....]
Let the temperature at the nodal point x_i = i \Delta x, t_j = j \Delta t, as before, be denoted by T_{i,j}.
Approximating \partial T/\partial t and \partial^2 T/\partial x^2 by the finite differences
\[
\frac{\partial T}{\partial t} \approx \frac{1}{\Delta t}\,(T_{i,j+1} - T_{i,j}),
\qquad
\frac{\partial^2 T}{\partial x^2} \approx \frac{1}{(\Delta x)^2}\,(T_{i+1,j+1} - 2T_{i,j+1} + T_{i-1,j+1}),
\]
we obtain the following difference analog of the heat equation:
\[
(1 + 2C)\,T_{i,j+1} - C\,(T_{i+1,j+1} + T_{i-1,j+1}) = T_{i,j}, \qquad i = 1, 2, \ldots, n,
\]
where C = \alpha\,\Delta t/(\Delta x)^2.
These equations enable us to determine the temperatures at a time step j = k + 1, knowing the
temperatures at the previous time step j = k.
For i = 1, j = k:  (1 + 2C)T_{1,k+1} - C T_{2,k+1} = C T_{0,k+1} + T_{1,k}
For i = 2, j = k:  (1 + 2C)T_{2,k+1} - C T_{3,k+1} - C T_{1,k+1} = T_{2,k}
...
For i = n, j = k:  (1 + 2C)T_{n,k+1} - C T_{n-1,k+1} = T_{n,k} + C T_{n+1,k+1}
Suppose now the temperatures at the two vertical sides are known, that is,
T_{0,t} = T_{W1},  T_{n+1,t} = T_{W2}.
Then the above equations can be written in matrix notation as
\[
\begin{pmatrix}
1+2C & -C & 0 & \cdots & 0 \\
-C & 1+2C & -C & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & \ddots & -C \\
0 & \cdots & & -C & 1+2C
\end{pmatrix}
\begin{pmatrix} T_{1,k+1} \\ T_{2,k+1} \\ \vdots \\ \vdots \\ T_{n,k+1} \end{pmatrix}
=
\begin{pmatrix} T_{1,k} + C\,T_{W1} \\ T_{2,k} \\ \vdots \\ \vdots \\ T_{n,k} + C\,T_{W2} \end{pmatrix}
\]
The matrix of the above system is clearly symmetric, tridiagonal and diagonally dominant
(note that C > 0).
For example, when C = 1, we have
\[
\begin{pmatrix}
3 & -1 & 0 & \cdots & 0 \\
-1 & 3 & -1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & \ddots & -1 \\
0 & \cdots & & -1 & 3
\end{pmatrix}
\begin{pmatrix} T_{1,k+1} \\ T_{2,k+1} \\ \vdots \\ \vdots \\ T_{n,k+1} \end{pmatrix}
=
\begin{pmatrix} T_{1,k} + T_{W1} \\ T_{2,k} \\ \vdots \\ \vdots \\ T_{n,k} + T_{W2} \end{pmatrix}
\]
or
Ax = b.
The matrix A is symmetric, tridiagonal, diagonally dominant and positive definite.
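One implicit time step of this scheme amounts to one such tridiagonal solve. The sketch below (assuming NumPy; n, C, the wall temperatures and the initial profile are made-up illustrative values) builds the (1 + 2C) matrix and advances one step.

    import numpy as np

    n, C = 5, 1.0
    TW1, TW2 = 100.0, 0.0
    T_old = np.zeros(n)                      # temperatures at time step k

    A = np.diag((1 + 2*C) * np.ones(n)) \
      + np.diag(-C * np.ones(n - 1), 1) \
      + np.diag(-C * np.ones(n - 1), -1)

    b = T_old.copy()
    b[0]  += C * TW1                         # boundary contributions
    b[-1] += C * TW2

    T_new = np.linalg.solve(A, b)            # temperatures at time step k+1
    print(T_new)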
Block Tridiagonal Systems

To see how block tridiagonal systems arise in applications, consider the two-dimensional
Poisson's equation:
\[
\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = f(x, y), \qquad 0 \le x \le 1, \quad 0 \le y \le 1.
\]
A discrete analog to this equation, similar to the one derived earlier for Laplace's equation, is
\[
T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} - 4T_{i,j} = (\Delta x)^2 f_{i,j}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, n.
\]
This will give rise to a linear system of (n + 2)^2 variables.
Assume now that the values of T at the four sides of the unit square are known and we are interested
in the values of T at the interior grid points; that is,
T_{0,j}, T_{n+1,j} and T_{i,0}, T_{i,n+1}  (j = 0, 1, \ldots, n+1; i = 0, 1, \ldots, n+1)
are given, and T_{11}, \ldots, T_{n1}; T_{12}, \ldots, T_{n2}; \ldots; T_{1n}, \ldots, T_{nn} are to be found. Then we have an
n^2 x n^2 system with n^2 unknowns, which can be written, after suitable rearrangement, as

\[
\begin{pmatrix}
4 & -1 & & \cdots & -1 & & & \\
-1 & 4 & -1 & & & -1 & & \\
 & \ddots & \ddots & \ddots & & & \ddots & \\
 & & -1 & 4 & 0 & & & -1 \\
-1 & & & 0 & 4 & -1 & & \\
 & -1 & & & -1 & 4 & -1 & \\
 & & \ddots & & & \ddots & \ddots & \ddots
\end{pmatrix}
\begin{pmatrix}
T_{11} \\ T_{21} \\ \vdots \\ T_{n1} \\ T_{12} \\ \vdots \\ T_{n2} \\ \vdots \\ T_{nn}
\end{pmatrix}
=
\begin{pmatrix}
T_{01} + T_{10} - (\Delta x)^2 f_{11} \\
T_{20} - (\Delta x)^2 f_{21} \\
\vdots \\
T_{n-1,0} - (\Delta x)^2 f_{n-1,1} \\
T_{n+1,1} + T_{n,0} - (\Delta x)^2 f_{n,1} \\
T_{02} - (\Delta x)^2 f_{12} \\
\vdots
\end{pmatrix}
\]
(each row has a 4 on the diagonal and -1's in the columns corresponding to the neighboring unknowns; known boundary values appear on the right-hand side),
or in matrix form,
\[
\begin{pmatrix}
A_n & -I_n & & & \\
-I_n & A_n & -I_n & & \\
 & \ddots & \ddots & \ddots & \\
 & & \ddots & \ddots & -I_n \\
 & & & -I_n & A_n
\end{pmatrix}
\begin{pmatrix}
T_{11} \\ T_{21} \\ \vdots \\ T_{n1} \\ T_{12} \\ \vdots \\ T_{n2} \\ \vdots \\ T_{nn}
\end{pmatrix}
=
\begin{pmatrix}
T_{01} + T_{10} - (\Delta x)^2 f_{11} \\
T_{20} - (\Delta x)^2 f_{21} \\
\vdots \\
T_{n-1,0} - (\Delta x)^2 f_{n-1,1} \\
T_{n+1,1} + T_{n,0} - (\Delta x)^2 f_{n,1} \\
T_{02} - (\Delta x)^2 f_{12} \\
\vdots
\end{pmatrix}
\tag{6.3.36}
\]
where
\[
A_n = \begin{pmatrix}
4 & -1 & & 0 \\
-1 & \ddots & \ddots & \\
 & \ddots & \ddots & -1 \\
0 & & -1 & 4
\end{pmatrix}
\tag{6.3.37}
\]
The system matrix above is block tridiagonal, and each diagonal block A_n is symmetric,
tridiagonal, and positive definite. For details, see Ortega and Poole (INMDE, pp. 268-272).

6.3.6 Linear System Arising From Finite Element Methods

We have seen in the last few sections how discretization of differential equations using finite
differences gives rise to various types of linear systems problems. The finite element technique is
another popular way to discretize differential equations, and it also results in linear systems problems.
Just to give the readers a taste, we illustrate this below by means of a simple differential equation.
The interested reader is referred to some of the well-known books on the subject: Ciarlet (1981),
Strang and Fix (1973), Becker, Carey and Oden (1981), Reddy (1993).
1. Variational formulation of a two-point boundary value problem.

Let us consider the following two-point boundary value problem
\[
-u'' + u = f(x), \qquad 0 < x < 1
\tag{6.3.38}
\]
\[
u(0) = u(1) = 0
\tag{6.3.39}
\]
where u' = du/dx and f is a continuous function on [0,1]. We further assume that f is such that
problem (6.3.38)-(6.3.39) has a unique solution.
We introduce the space
V = {v : v is a continuous function on [0,1], v' is piecewise continuous and bounded on [0,1], and v(0) = v(1) = 0}.
Now, if we multiply the equation -u'' + u = f(x) by an arbitrary function v \in V (v is called a
test function) and integrate the left-hand side by parts, we get
\[
\int_0^1 (-u''(x) + u(x))\,v(x)\,dx = \int_0^1 f(x)\,v(x)\,dx,
\]
that is,
\[
\int_0^1 (u'v' + uv)\,dx = \int_0^1 f(x)\,v(x)\,dx,
\tag{6.3.40}
\]
since v \in V and v(0) = v(1) = 0. We write (6.3.40) as:
a(u, v) = (f, v)  for every v \in V,
where
\[
a(u, v) = \int_0^1 (u'v' + uv)\,dx
\qquad \text{and} \qquad
(f, v) = \int_0^1 f(x)\,v(x)\,dx.
\]
(Notice that the form a(\cdot,\cdot) is symmetric (i.e. a(u, v) = a(v, u)) and bilinear.) These two properties
will be used later. It can be shown that u is a solution of (6.3.40) if and only
if u is a solution of (6.3.38)-(6.3.39).
2. The Discrete Problem

We now discretize problem (6.3.40). We start by constructing a finite dimensional subspace V_n
of the space V.
Here we will only consider the simple case where V_n consists of continuous piecewise linear
functions. For this purpose, we let 0 = x_0 < x_1 < x_2 < \cdots < x_n < x_{n+1} = 1 be a partition of the
interval [0,1] into subintervals I_j = [x_{j-1}, x_j] of length h_j = x_j - x_{j-1}, j = 1, 2, \ldots, n+1. With
this partition, we associate the set V_n of all functions v(x) that are continuous on the interval [0,1],
linear in each subinterval I_j, j = 1, \ldots, n+1, and satisfy the boundary conditions v(0) = v(1) = 0.
We now introduce the basis functions \{\phi_1, \phi_2, \ldots, \phi_n\} of V_n.
We define \phi_j(x) by
(i) \phi_j(x_i) = 1 if i = j, and \phi_j(x_i) = 0 if i \ne j;
(ii) \phi_j(x) is a continuous piecewise linear function.
\phi_j(x) can be computed explicitly to yield:

[Figure: the "hat" basis function \phi_j(x), rising linearly from 0 at x_{j-1} to 1 at x_j and falling back to 0 at x_{j+1}.]

\[
\phi_j(x) =
\begin{cases}
\dfrac{x - x_{j-1}}{h_j}, & \text{when } x_{j-1} \le x \le x_j, \\[2mm]
\dfrac{x_{j+1} - x}{h_{j+1}}, & \text{when } x_j \le x \le x_{j+1}.
\end{cases}
\]
Since \phi_1, \ldots, \phi_n are the basis functions, any function v \in V_n can be written uniquely as
\[
v(x) = \sum_{i=1}^{n} v_i \phi_i(x), \qquad \text{where } v_i = v(x_i).
\]
We easily see that V_n \subset V.
The discrete analog of Problem (6.3.40) then reads: find u_n \in V_n such that
\[
a(u_n, v) = (f, v) \qquad \forall\, v \in V_n
\tag{6.3.41}
\]
Now, if we let u_n = \sum_{i=1}^{n} u_i \phi_i(x) and notice that equation (6.3.41) is in particular true for every
basis function \phi_j(x), j = 1, \ldots, n, we get n equations, namely
\[
a\Bigl(\sum_{i=1}^{n} u_i \phi_i,\ \phi_j\Bigr) = (f, \phi_j), \qquad j = 1, 2, \ldots, n.
\]
Now using the linearity of a(\cdot, \phi_j) leads to n linear equations in n unknowns:
\[
\sum_{i=1}^{n} u_i\, a(\phi_i, \phi_j) = (f, \phi_j), \qquad j = 1, 2, \ldots, n,
\]
which can be written in matrix form as
\[
A u_n = f_n
\tag{6.3.42}
\]
where (f_n)_i = (f, \phi_i) and A = (a_{ij}) is a symmetric matrix given by
a_{ij} = a_{ji} = a(\phi_i, \phi_j), and u_n = (u_1, \ldots, u_n)^T.
The entries of the matrix A can be computed explicitly. We first notice that
\[
a_{ij} = a_{ji} = a(\phi_i, \phi_j) = 0 \quad \text{if } |i - j| \ge 2
\]
(this is due to the local support of the functions \phi_i(x)). A direct computation now leads to
\[
a_{j,j} = a(\phi_j, \phi_j)
= \int_{x_{j-1}}^{x_j} \Bigl[ \frac{1}{h_j^2} + \frac{(x - x_{j-1})^2}{h_j^2} \Bigr] dx
+ \int_{x_j}^{x_{j+1}} \Bigl[ \frac{1}{h_{j+1}^2} + \frac{(x_{j+1} - x)^2}{h_{j+1}^2} \Bigr] dx
= \frac{1}{h_j} + \frac{1}{h_{j+1}} + \frac{1}{3}\,(h_j + h_{j+1}),
\]
\[
a_{j,j-1} = \int_{x_{j-1}}^{x_j} \Bigl[ -\frac{1}{h_j^2} + \frac{(x_j - x)(x - x_{j-1})}{h_j^2} \Bigr] dx
= -\frac{1}{h_j} + \frac{h_j}{6}.
\]
Hence, the system (6.3.42) can be written as:
\[
\begin{pmatrix}
a_1 & b_1 & & & 0 \\
b_1 & a_2 & b_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & \ddots & \ddots & b_{n-1} \\
0 & & & b_{n-1} & a_n
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ \vdots \\ u_n \end{pmatrix}
=
\begin{pmatrix} (f_n)_1 \\ (f_n)_2 \\ \vdots \\ \vdots \\ (f_n)_n \end{pmatrix}
\]
where a_j = \frac{1}{h_j} + \frac{1}{h_{j+1}} + \frac{1}{3}(h_j + h_{j+1}) and b_j = a(\phi_j, \phi_{j+1}) = -\frac{1}{h_{j+1}} + \frac{h_{j+1}}{6}. In the special case of a uniform grid,
h_j = h = \frac{1}{n+1}, the matrix A then takes the form
\[
A = \frac{1}{h}
\begin{pmatrix}
2 & -1 & & 0 \\
-1 & 2 & \ddots & \\
 & \ddots & \ddots & -1 \\
0 & & -1 & 2
\end{pmatrix}
+ \frac{h}{6}
\begin{pmatrix}
4 & 1 & & 0 \\
1 & 4 & \ddots & \\
 & \ddots & \ddots & 1 \\
0 & & 1 & 4
\end{pmatrix}
\tag{6.3.43}
\]
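The uniform-grid matrix (6.3.43) is easy to assemble and test numerically. The sketch below is not from the text: it assumes NumPy, takes f(x) = 1 purely as an illustration (for which the load (f, \phi_i) equals h exactly, the area of each hat function), and solves for the nodal values.

    import numpy as np

    n = 9
    h = 1.0 / (n + 1)

    K = (1/h) * (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
    M = (h/6) * (4*np.eye(n) +   np.eye(n, k=1) +   np.eye(n, k=-1))
    A = K + M                              # the matrix of (6.3.43)

    fn = h * np.ones(n)                    # (f, phi_i) = h for f = 1
    u = np.linalg.solve(A, fn)             # nodal values u_1, ..., u_n
    print(u)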
6.3.7 Approximation of a Function by a Polynomial: Hilbert System

In Chapter 3 (Section 3.6) we cited an ill-conditioned linear system with the Hilbert matrix. In
this section we show how such a system arises. The discussion here has been taken from Forsythe
and Moler (CSLAS, pp. 80-81).
Suppose a continuous function f(x) defined on the interval 0 <= x <= 1 is to be approximated by
a polynomial of degree n - 1:
\[
\sum_{i=1}^{n} p_i x^{i-1},
\]
such that the error
\[
E = \int_0^1 \Bigl[ \sum_{i=1}^{n} p_i x^{i-1} - f(x) \Bigr]^2 dx
\]
is minimized. The coefficients p_i of the polynomial are easily determined by setting
\[
\frac{\partial E}{\partial p_i} = 0, \qquad i = 1, \ldots, n.
\]
(Note that the error is a differentiable function of the unknowns p_i and that a minimum occurs
when all the partial derivatives are zero.) Thus we have
\[
\frac{\partial E}{\partial p_i} = 2 \int_0^1 \Bigl[ \sum_{j=1}^{n} p_j x^{j-1} - f(x) \Bigr] x^{i-1} dx = 0, \qquad i = 1, \ldots, n,
\]
or
\[
\sum_{j=1}^{n} \Bigl( \int_0^1 x^{i+j-2}\,dx \Bigr) p_j = \int_0^1 f(x)\,x^{i-1}\,dx, \qquad i = 1, \ldots, n.
\]
(To obtain the latter form we have interchanged the summation and integration.)
Letting
\[
h_{ij} = \int_0^1 x^{i+j-2}\,dx
\]
and
\[
b_i = \int_0^1 f(x)\,x^{i-1}\,dx \qquad (i = 1, 2, \ldots, n),
\]
we have
\[
\sum_{j=1}^{n} h_{ij} p_j = b_i, \qquad i = 1, \ldots, n.
\]
That is, we obtain the linear system
\[
Hp = b,
\]
where H = (h_{ij}) and b = (b_1, b_2, \ldots, b_n)^T. The matrix H is easily identified as the Hilbert matrix (see
Chapter 3, Section 3.6), since
\[
h_{ij} = \int_0^1 x^{i+j-2}\,dx = \frac{1}{i + j - 1}.
\]
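The ill-conditioning of H is easy to observe numerically. The sketch below is illustrative only (it assumes NumPy; the helper name hilbert is ours, and scipy.linalg.hilbert could be used instead): it builds h_{ij} = 1/(i + j - 1) and prints the condition number, which grows explosively with n.

    import numpy as np

    def hilbert(n):
        i = np.arange(1, n + 1)
        return 1.0 / (i[:, None] + i[None, :] - 1.0)

    for n in (4, 8, 12):
        print(n, np.linalg.cond(hilbert(n)))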

6.4 Direct Methods


In this section we will study direct methods for solving the problem Ax = b. These methods include
• Gaussian elimination without pivoting, based on LU factorization of A (Section 6.4.2).
• Gaussian elimination with partial pivoting, based on MA = U factorization of A (Section 6.4.3).
• Gaussian elimination with complete pivoting, based on MAQ = U factorization of A (Section 6.4.3).
• The method based on the QR factorization of A (Section 6.4.5).
• The method based on the Cholesky decomposition of a symmetric positive definite matrix (Section 6.4.7).
• Gaussian elimination for special systems: Hessenberg, positive definite, tridiagonal, diagonally dominant (Section 6.4.7).
For a comparison of these methods and their relative merits and demerits, see Section 6.4.9 and
the accompanying Tables of Comparison. These methods are primarily used to solve small
and dense linear system problems of order up to 200 or so.
The basic idea behind all the direct methods is to first reduce the linear system Ax = b to
equivalent triangular system(s) by finding triangular factors of the matrix A, and then to solve the
triangular system(s), which is (are) much easier to solve than the original problem. In Chapter 5
we described various methods for computing triangular factors of A. In Chapter 3 (Section 3.1) we
described the back substitution method for solving an upper triangular system. We now describe
in the following an analogous method, called forward elimination, for solving a lower triangular
system.

6.4.1 Solution of a Lower Triangular System


The solution of the nonsingular lower triangular system
Ly = b
can be obtained analogously to that of an upper triangular system. Here y1 is obtained from the first
equation; then, inserting its value in the second equation, y2 is obtained, and so on. This process
is called forward elimination.

Algorithm 6.4.1 Forward Elimination

For i = 1, 2, ..., n do
\[
y_i = \frac{1}{l_{ii}} \Bigl( b_i - \sum_{j=1}^{i-1} l_{ij} y_j \Bigr)
\]

Note: When i = 1, the summation is empty and is skipped.

Flop-count and stability. The algorithm requires about n^2/2 flops (1 flop to compute y1, 2
flops to compute y2, 3 flops to compute y3, and so on). The algorithm is as stable as the back
substitution process (Algorithm 3.1.3).
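A direct transcription of Algorithm 6.4.1 into code looks as follows; this is only an illustrative sketch, assuming NumPy (the helper name forward_elimination is ours).

    import numpy as np

    def forward_elimination(L, b):
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            # y_i = (b_i - sum_{j<i} l_ij * y_j) / l_ii
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        return y

    L = np.array([[2.0, 0.0], [3.0, 4.0]])
    b = np.array([2.0, 11.0])
    print(forward_elimination(L, b))   # -> [1., 2.]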

6.4.2 Solution of the System Ax = b Using Gaussian Elimination without Pivoting


The Gaussian elimination method for the linear system Ax = b is based on LU factorization of the
matrix A. Recall from Chapter 5 (Section 5.2.1) that triangularization using Gaussian elimination
without pivoting, when carried out to completion, yields an LU factorization of A. Once we have
this factorization of A, the system Ax = b becomes equivalent to two triangular systems:
Ly = b;
Ux = y:

249
Solving Ax = b using Gaussian Elimination Without Pivoting

The solution of the system Ax = b using Gaussian elimination (without pivoting)
can be achieved in two stages:
• First, find an LU factorization of A.
• Second, solve two triangular systems: the lower triangular system Ly = b first,
followed by the upper triangular system Ux = y.

Flop-count. Since an LU factorization requires about n^3/3 flops and the solution of each triangular
system needs only n^2/2 flops, the total flop count for solving the system Ax = b using Gaussian
elimination is about n^3/3 + n^2.
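A minimal sketch of the two stages, assuming NumPy (the helper lu_nopivot is ours, and np.linalg.solve stands in for the dedicated forward/back substitution routines):

    import numpy as np

    def lu_nopivot(A):
        A = A.astype(float).copy()
        n = A.shape[0]
        L, U = np.eye(n), A
        for k in range(n - 1):
            L[k+1:, k] = U[k+1:, k] / U[k, k]           # multipliers
            U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])
        return L, np.triu(U)

    A = np.array([[2.0, 3.0], [4.0, 7.0]])
    b = np.array([5.0, 11.0])
    L, U = lu_nopivot(A)
    y = np.linalg.solve(L, b)     # forward elimination: Ly = b
    x = np.linalg.solve(U, y)     # back substitution:   Ux = y
    print(x)                      # -> [1., 1.]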

6.4.3 Solution of Ax = b Using Pivoting Triangularization


A. Solution with Partial Pivoting
If Gaussian elimination with partial pivoting is used to triangularize A, then as we have seen
in Chapter 5 (Section 5.2.2), this process yields a factorization:
MA = U:
In this case, the system Ax = b is equivalent to the triangular system
Ux = Mb = b'.

Solving Ax = b Using Gaussian Elimination With Partial Pivoting

To solve Ax = b using Gaussian elimination with partial pivoting:
Step 1. Find the factorization MA = U by the triangularization algorithm
using partial pivoting (Algorithm 5.2.3).
Step 2. Solve the triangular system by back substitution (Algorithm 3.1.3):
Ux = Mb = b'.
Implementation of Step 2

The vector
b' = Mb = M_{n-1} P_{n-1} M_{n-2} P_{n-2} \cdots M_1 P_1 b
can be computed as follows:
(1) s_1 = b
(2) For k = 1, 2, ..., n-1 do
        s_{k+1} = M_k P_k s_k
(3) b' = s_n.
Computational Remarks

The practical Gaussian elimination (with partial pivoting) algorithm does not give M_k and P_k
explicitly. But we really do not need them. The vector s_{k+1} can be computed immediately from
s_k once the index r_k of the row interchange and the multipliers m_{ik} have been saved at the kth step.
This is illustrated with the following 3 x 3 example.

Example 6.4.1

Let n = 3,
\[
P_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad
M_1 = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix},
\]
and let
\[
s_2 = M_1 P_1 s_1 = \begin{pmatrix} s_1^{(2)} \\ s_2^{(2)} \\ s_3^{(2)} \end{pmatrix}.
\]
Then we have
\[
P_1 s_1 = \begin{pmatrix} s_1 \\ s_3 \\ s_2 \end{pmatrix}
\]
and the entries of s_2 are then given by
\[
s_1^{(2)} = s_1, \qquad
s_2^{(2)} = m_{21} s_1 + s_3, \qquad
s_3^{(2)} = m_{31} s_1 + s_2.
\]

B. Solution with Complete Pivoting


Gaussian elimination with complete pivoting (Section 5.2.3) gives
MAQ = U:
Using this factorization, the system Ax = b can be written as
Uy = Mb = b'
where
y = Q^T x.
Thus, we have the following.
Solving Ax = b Using Gaussian Elimination With Complete Pivoting
To solve Ax = b using complete pivoting:
Step 1. Find the factorization MAQ = U by the factorization algorithm using
complete pivoting (Algorithm 5.2.4).
Step 2. Solve the triangular system (for y) (Algorithm 3.1.3):
Uy = b',
computing b' as shown above.
Step 3. Finally, recover x from y: x = Qy.

Implementation of Step 3

Since x = Qy = Q_1 Q_2 \cdots Q_{n-1} y, the following scheme can be adopted to compute x from y in Step 3:
Set w_n = y.
For k = n-1, ..., 2, 1 do
    w_k = Q_k w_{k+1}
Then x = w_1.

Note: Since Qk is a permutation matrix, the entries of wk are simply those of wk+1
reordered according to the permutation index.
Example 6.4.2

Solve Ax = b with
\[
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}, \qquad
b = \begin{pmatrix} 2 \\ 6 \\ 3 \end{pmatrix},
\]
(a) using partial pivoting,
(b) using complete pivoting.
(a) Partial Pivoting

With the results obtained earlier (Section 5.2.2, Example 5.2.3), we compute
\[
P_1 b = \begin{pmatrix} 6 \\ 2 \\ 3 \end{pmatrix}, \quad
P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix};
\qquad
M_1 P_1 b = \begin{pmatrix} 6 \\ 2 \\ -3 \end{pmatrix}, \quad
M_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix};
\]
\[
P_2 M_1 P_1 b = \begin{pmatrix} 6 \\ 2 \\ -3 \end{pmatrix}, \quad
P_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix};
\qquad
b' = M_2 P_2 M_1 P_1 b = \begin{pmatrix} 6 \\ 2 \\ -1 \end{pmatrix}, \quad
M_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.
\]
The solution of the system Ux = b',
\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} 6 \\ 2 \\ -1 \end{pmatrix},
\]
is x_1 = x_2 = x_3 = 1.

(b) Complete Pivoting


Using the results obtained in the example in Section 5.2.4, we have
061 00 1 01
B CC
P1 b = B
B C
@ 2 A ; P1 = B@ 1 0 0 CA
3 0 0 1
061 061
B CC B C
M1P1b = B @ 0 A ; P2M1P1b = B@ 1 CA
1 0
061
B CC
b0 = M2 P2 M1P1 b = B
@1A:
1
2

254
The solution of the system
Uy = b0
03 2 110x 1 061
BB 0 2 1 CC BB 1 CC BB CC
@ 3 3 A @ x2 A = @ 1 A
0 0 12 x3 1
2
is y1 = y2 = y3 = 1. Since fxk g; k = 1; 2; 3 is simply the rearrangement of fyk g, we have
x1 = x2 = x3 = 1.
Some Details of Implementation
Note that it is not necessary to store the vectors si and wi separately, because all we need is
the vector sn for partial pivoting and w1 for complete pivoting. So starting with x = b, each new
vector can be stored in x as it is computed.
Also note that if we use the practical algorithms, the matrices Pk ; Qk and Mk are not available
explicitly; they have to be formed respectively out of indices rk ; sk and the multipliers mik . In this
case, the statements for computing the sk 's and wk 's are to be modi ed accordingly.

Flop-count. We have seen in Chapter 5 (Section 5.2.1) that the triangularization process
using elementary matrices requires n^3/3 flops. The triangular system Ux = b' or Uy = b' can be
solved by back substitution with n^2/2 flops, and the vector b' can be computed with n^2/2 flops,
taking into account the special structures of the matrices M_k and P_k. Recovering x from y in Step
3 of the complete pivoting process does not need any flops; note that x is obtained from y just by
reshuffling the entries of y. Thus, the solution of the linear system Ax = b using Gaussian
elimination with complete or partial pivoting requires n^3/3 + O(n^2) flops. However, Gaussian
elimination with complete pivoting requires about n^3/3 comparisons to identify the (n-1) pivots,
compared to only O(n^2) comparisons needed by the partial pivoting method.


Round-off Property

In Chapter 5 (Section 5.3) we discussed the round-off property of Gaussian elimination for
triangularization of A. We have seen that the growth factor ρ determines the stability of the
triangularization procedure. The next question is: how does the growth factor affect the
solution procedure of Ax = b using such a triangularization? The answer is given in the
following:

Round-off Error Result for the Linear System Problem with Gaussian Elimination

It can be shown that the computed solution x̂ of the linear system Ax = b, using
Gaussian elimination, satisfies
(A + E) x̂ = b,
where ‖E‖∞ ≤ c (n³ρ + 3n²) μ ‖A‖∞, ρ is the growth factor, μ is the machine precision,
and c is a small constant. For a proof see Chapter 11, Section 11.4.

Remark: The size of the above bound is really determined by ρ, since when n is not too large,
the term 3n² is considerably small compared to n³ρ and can therefore be neglected. Thus the growth
factor ρ is again the deciding factor.
6.4.4 Solution of Ax = b without Explicit Factorization
As we have just seen, the Gaussian elimination method for solving Ax = b comes in two stages.
First, the matrix A is explicitly factorized:
A = LU (without pivoting)
MA = U (with partial pivoting)
MAQ = U (with complete pivoting).
Second, the factorization of A is used to solve Ax = b. However, it is easy to see that these two
stages can be combined so that the solution can be obtained by solving an upper triangular
system by processing the matrix A and the vector b simultaneously. In this case, the
augmented matrix (A; b) is triangularized and the solution is then obtained by back substitution.
We illustrate this implicit process for Gaussian elimination with partial pivoting.
Algorithm 6.4.2 Solving Ax = b With Partial Pivoting Without Explicit Factorization

Given an n x n matrix A and a vector b, the following algorithm computes the triangular
factorization of the augmented matrix (A, b) using Gaussian elimination with partial pivoting. A
is overwritten by the transformed triangular matrix and b is overwritten by the transformed vector.
The multipliers are stored in the lower-half part of A.
For k = 1, 2, ..., n-1 do

(1) Choose the largest element in magnitude in column k on or below the (k, k) entry; call it a_{r_k,k}:
        |a_{r_k,k}| = max { |a_{ik}| : i >= k }.
    If a_{r_k,k} = 0, stop.

(2) Otherwise, interchange rows k and r_k of A and the kth and r_k-th entries of b:
        a_{r_k,j} <-> a_{kj}   (j = k, k+1, ..., n)
        b_{r_k} <-> b_k.

(3) Form the multipliers:
        a_{ik} := m_{ik} = -a_{ik}/a_{kk}   (i = k+1, ..., n).

(4) Update the entries of A:
        a_{ij} := a_{ij} + m_{ik} a_{kj}   (i = k+1, ..., n; j = k+1, ..., n).

(5) Update the entries of b:
        b_i := b_i + m_{ik} b_k   (i = k+1, ..., n).

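The following is one way to transcribe Algorithm 6.4.2 into code; it is only an illustrative sketch (assuming NumPy, with the helper name solve_partial_pivoting ours), written for clarity rather than efficiency.

    import numpy as np

    def solve_partial_pivoting(A, b):
        A, b = A.astype(float).copy(), b.astype(float).copy()
        n = len(b)
        for k in range(n - 1):
            r = k + np.argmax(np.abs(A[k:, k]))     # pivot row r_k
            if A[r, k] == 0.0:
                raise ValueError("matrix is singular")
            A[[k, r], k:] = A[[r, k], k:]           # interchange rows of A
            b[[k, r]] = b[[r, k]]                   # ... and entries of b
            m = -A[k+1:, k] / A[k, k]               # multipliers m_ik
            A[k+1:, k+1:] += np.outer(m, A[k, k+1:])
            b[k+1:] += m * b[k]
        x = np.zeros(n)                             # back substitution
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x

    A = np.array([[0., 1., 1.], [2., 2., 3.], [4., 1., 1.]])
    b = np.array([2., 6., 3.])
    print(solve_partial_pivoting(A, b))   # -> [0.25, 0.5, 1.5], as in Example 6.4.3 below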
Example 6.4.3

\[
A = \begin{pmatrix} 0 & 1 & 1 \\ 2 & 2 & 3 \\ 4 & 1 & 1 \end{pmatrix}, \qquad
b = \begin{pmatrix} 2 \\ 6 \\ 3 \end{pmatrix}.
\]

Step 1. The pivot entry is a_{31} = 4, r_1 = 3.
Interchange rows 3 and 1 of A and the 3rd and 1st entries of b:
\[
A \equiv \begin{pmatrix} 4 & 1 & 1 \\ 2 & 2 & 3 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
b \equiv \begin{pmatrix} 3 \\ 6 \\ 2 \end{pmatrix},
\qquad
m_{21} = -\frac{a_{21}}{a_{11}} = -\frac12,
\]
\[
A \equiv A^{(1)} = \begin{pmatrix} 4 & 1 & 1 \\ 0 & \tfrac32 & \tfrac52 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
b \equiv b^{(1)} = \begin{pmatrix} 3 \\ \tfrac92 \\ 2 \end{pmatrix}.
\]

Step 2. The pivot entry is a_{22} = \tfrac32, and m_{32} = -\frac{a_{32}}{a_{22}} = -\frac23,
\[
A \equiv A^{(2)} = \begin{pmatrix} 4 & 1 & 1 \\ 0 & \tfrac32 & \tfrac52 \\ 0 & 0 & -\tfrac23 \end{pmatrix}, \qquad
b \equiv b^{(2)} = \begin{pmatrix} 3 \\ \tfrac92 \\ -1 \end{pmatrix}.
\]
The reduced triangular system A^{(2)} x = b^{(2)} is:
\[
4x_1 + x_2 + x_3 = 3, \qquad
\tfrac32 x_2 + \tfrac52 x_3 = \tfrac92, \qquad
-\tfrac23 x_3 = -1.
\]
The solution is:
x_3 = \tfrac32, \quad x_2 = \tfrac12, \quad x_1 = \tfrac14.

6.4.5 Solution of Ax = b Using QR Factorization


If we have A = QR, the system Ax = b then becomes
QRx = b
or
Rx = Q^T b = b'.
Thus, once we have the QR factorization of A, the solution of the system Ax = b can be obtained
just by solving the equivalent upper triangular system:
Rx = b' = Q^T b.

Solving Ax = b using QR

To solve Ax = b using QR factorization:
1. Find the QR factorization of A: Q^T A = R (Sections 5.4 and 5.5).
2. Form b' = Q^T b.
3. Solve Rx = b'.
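An illustrative sketch of these three steps, using library routines in place of the Householder code of Chapter 5 (assuming NumPy and SciPy are available):

    import numpy as np
    from scipy.linalg import solve_triangular

    A = np.array([[0., 1., 1.], [1., 2., 3.], [1., 1., 1.]])
    b = np.array([2., 6., 3.])

    Q, R = np.linalg.qr(A)        # A = QR
    b_prime = Q.T @ b             # b' = Q^T b
    x = solve_triangular(R, b_prime)
    print(x)                      # -> [1., 1., 1.], as in Example 6.4.4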
Forming b'

To compute b' we do not need Q explicitly; it can be computed from the factored form of Q.
For example, if the QR factorization is obtained using the Householder method (Chapter 5, Section 5.4.1), then
Q^T = H_{n-1} H_{n-2} \cdots H_2 H_1,
and b' = Q^T b can be computed as
(1) y_1 = b
(2) For k = 1, 2, ..., n-1 do
        y_{k+1} = H_k y_k
(3) b' = y_n.
Example 6.4.4

Consider
\[
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}, \qquad
b = \begin{pmatrix} 2 \\ 6 \\ 3 \end{pmatrix}.
\]
From the example of Section 5.4.1, we know that the Householder method gives us
\[
R = \begin{pmatrix} -1.4142 & -2.1213 & -2.8284 \\ 0 & 1.2247 & 1.6330 \\ 0 & 0 & -0.5774 \end{pmatrix},
\]
\[
H_1 = \begin{pmatrix} 0 & -\tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2} \\ -\tfrac{1}{\sqrt2} & \tfrac12 & -\tfrac12 \\ -\tfrac{1}{\sqrt2} & -\tfrac12 & \tfrac12 \end{pmatrix}
= \begin{pmatrix} 0 & -0.7071 & -0.7071 \\ -0.7071 & 0.5000 & -0.5000 \\ -0.7071 & -0.5000 & 0.5000 \end{pmatrix},
\qquad
H_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -0.1691 & -0.9856 \\ 0 & -0.9856 & 0.1691 \end{pmatrix}.
\]
Compute b':
\[
y_1 = b = \begin{pmatrix} 2 \\ 6 \\ 3 \end{pmatrix},
\qquad
y_2 = H_1 y_1 = \begin{pmatrix} -6.3640 \\ 0.0858 \\ -2.9142 \end{pmatrix},
\qquad
b' = y_3 = H_2 y_2 = \begin{pmatrix} -6.3640 \\ 2.8560 \\ -0.5770 \end{pmatrix}.
\]
(Note that b' above has been computed without explicitly forming the matrix Q.)
Solve Rx = b':
\[
x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]

Flop-Count. If the Householder method is used to factor A into QR, the solution of
Ax = b requires about \frac{2}{3} n^3 + O(n^2) flops (Chapter 5, Section 5.4.1); on the other hand, the Givens
rotations technique requires about twice as many (Chapter 5, Section 5.5.1).

Round-off Property

We know from Chapter 5 that both the Householder and the Givens methods for QR factorization
of A are numerically stable. The back-substitution process for solving an upper triangular
system is also numerically stable (Section 3.1.3). Thus, the method for solving Ax = b using
QR factorization is likely to be stable. Indeed, this is so.

Round-off Error Result for Solving Ax = b using QR

It can be shown (Lawson and Hanson, SLP, p. 92) that the computed solution x̂ is
the exact solution of
(A + E) x̂ = b + δb,
where ‖E‖_F ≤ (3n² + 41n) μ ‖A‖_F + O(μ²) and ‖δb‖ ≤ (3n² + 40n) μ ‖b‖ + O(μ²),
μ being the machine precision.

Remark: There is no "growth factor" in the above expression.
6.4.6 Solving Linear Systems with Multiple Right-Hand Sides

Consider the problem
AX = B
where B = (b_1, ..., b_m) is an n x m matrix (m <= n) and X = (x_1, x_2, ..., x_m). Here b_i and
x_i, i = 1, ..., m, are n-vectors.
Problems of this type arise in many practical applications (one such application is considered
in Section 6.4.7: Computing the Frequency Response Matrix).
Once the matrix A has been factorized using any of the methods described in Chapter 5, the
factorization can be used to solve the m linear systems above. We state the procedure only with
partial pivoting.

Solving AX = B (Linear System with Multiple Right-Hand Sides)

Step 1. Factorize: MA = U, using Gaussian elimination with partial pivoting.
Step 2. Solve the m upper triangular systems:
        U x_i = b'_i = M b_i,   i = 1, ..., m.

Flop-Count: The algorithm will require about n^3/3 + m n^2 flops.

Example: Solve AX = B where
\[
A = \begin{pmatrix} 1 & 2 & 4 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix},
\]
\[
U = \begin{pmatrix} 7 & 8 & 9 \\ 0 & \tfrac67 & \tfrac{19}{7} \\ 0 & 0 & -\tfrac12 \end{pmatrix}, \qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -\tfrac17 \\ -\tfrac12 & 1 & -\tfrac12 \end{pmatrix}
\]
(see Example 5.2.4 in Chapter 5).

Solve:
\[
U x_1 = M b_1 = \begin{pmatrix} 5 \\ 0.2857 \\ 0 \end{pmatrix}, \qquad
x_1 = \begin{pmatrix} 0.3333 \\ 0.3333 \\ 0 \end{pmatrix};
\]
\[
U x_2 = M b_2 = \begin{pmatrix} 6 \\ 1.1429 \\ 0 \end{pmatrix}, \qquad
x_2 = \begin{pmatrix} -0.6667 \\ 1.3333 \\ 0 \end{pmatrix}.
\]
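In code, the point of the procedure is that the factorization is computed once and reused for every right-hand side. The sketch below (illustrative only; it assumes SciPy, whose lu_factor/lu_solve implement LU with partial pivoting) reproduces the example above.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[1., 2., 4.], [4., 5., 6.], [7., 8., 9.]])
    B = np.array([[1., 2.], [3., 4.], [5., 6.]])

    lu, piv = lu_factor(A)          # factor once (~ n^3/3 flops)
    X = lu_solve((lu, piv), B)      # one pair of triangular solves per column (~ n^2 each)
    print(X)                        # columns x1, x2 as computed above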
6.4.7 Special Systems

In this subsection we will study some special systems. They are
(1) Symmetric positive definite systems.
(2) Hessenberg systems.
(3) Diagonally dominant systems.
(4) Tridiagonal and block tridiagonal systems.
We have seen in Section 6.3 that these systems occur very often in practical applications, such
as in the numerical solution of partial differential equations. Indeed, it is very often said by
practicing engineers that there are hardly any systems in practical applications which
are not of one of the above types. These systems therefore deserve special treatment.
Symmetric Positive Definite Systems

First, we show that for a symmetric positive definite matrix A there exists a unique
factorization
A = H H^T,
where H is a lower triangular matrix with positive diagonal entries. The factorization is
called the Cholesky factorization, after the French engineer Cholesky. (Andre-Louis Cholesky
(1875-1918) served as an officer in the French military; his work there involved geodesy and surveying.)
The existence of the Cholesky factorization for a symmetric positive definite matrix A can be
seen either via the LU factorization of A or by computing the matrix H directly from the above relation.
To see this via LU factorization, we note that A, being positive definite and therefore having
positive leading principal minors, has a unique factorization
A = LU.
The upper triangular matrix U can be written as
U = D U_1,
where
D = diag(u_{11}, u_{22}, ..., u_{nn}) = diag(a_{11}, a_{22}^{(1)}, ..., a_{nn}^{(n-1)}),
and U_1 is a unit upper triangular matrix. Thus
A = L D U_1.
Since A is symmetric, from above we have
U_1^T D L^T = L D U_1,
or
D = (U_1^T)^{-1} L D U_1 (L^T)^{-1}.
The matrix (U_1^T)^{-1} L is a unit lower triangular matrix and the matrix U_1 (L^T)^{-1} is unit upper
triangular. It therefore follows from above that
(U_1^T)^{-1} L = U_1 (L^T)^{-1} = I,
that is,
U_1 = L^T,
so A can be written as
A = L D L^T,
where L is unit lower triangular. Since the leading principal minors of A are a_{11}, a_{11} a_{22}^{(1)}, ...,
a_{11} a_{22}^{(1)} \cdots a_{nn}^{(n-1)}, A is positive definite iff the pivots a_{11}, a_{22}^{(1)}, ..., a_{nn}^{(n-1)} are positive. This means that when
A is positive definite the diagonal entries of D are positive and, therefore, we can write
D = D^{1/2} \cdot D^{1/2},
where
D^{1/2} = diag(\sqrt{a_{11}}, \sqrt{a_{22}^{(1)}}, ..., \sqrt{a_{nn}^{(n-1)}}).
So,
A = L D L^T = L D^{1/2} D^{1/2} L^T = H \cdot H^T.
Note that the diagonal entries of H = L D^{1/2} are positive.
The above discussion can be summarized in the following theorem:
Theorem 6.4.1 (The Cholesky Factorization Theorem) Let A be a symmetric
positive definite matrix. Then A can be written uniquely in the form
A = H H^T,
where H is a lower triangular matrix with positive diagonal entries. An explicit
expression for H is given by
H = L D^{1/2},
where L is the unit lower triangular matrix in the LU factorization of A obtained
by Gaussian elimination without pivoting and
D^{1/2} = diag(u_{11}^{1/2}, ..., u_{nn}^{1/2}).
The above constructive procedure suggests the following algorithm to compute the Cholesky
factorization of a symmetric positive definite matrix A:

Algorithm 6.4.3 Gaussian Elimination for the Cholesky Factorization

Step 1. Compute the LU factorization of A using Gaussian elimination without pivoting.
Step 2. Form the diagonal matrix D from the diagonal entries of U:
        D = diag(u_{11}, u_{22}, ..., u_{nn}).
Step 3. Form H = L D^{1/2}.


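A compact sketch of Algorithm 6.4.3 (illustrative only; it assumes NumPy, the helper name cholesky_via_lu is ours, and np.linalg.cholesky computes the same factor directly):

    import numpy as np

    def cholesky_via_lu(A):
        A = A.astype(float).copy()
        n = A.shape[0]
        L, U = np.eye(n), A
        for k in range(n - 1):                       # LU without pivoting
            L[k+1:, k] = U[k+1:, k] / U[k, k]
            U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])
        D_half = np.sqrt(np.diag(U))
        return L * D_half                            # column scaling: H = L D^{1/2}

    A = np.array([[2., 3.], [3., 5.]])
    H = cholesky_via_lu(A)
    print(H)            # compare with np.linalg.cholesky(A)
    print(H @ H.T)      # recovers A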
Example 6.4.5

Find the Cholesky factorization of
\[
A = \begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix}
\]
using Gaussian elimination without pivoting.
\[
M_1 = \begin{pmatrix} 1 & 0 \\ -\tfrac32 & 1 \end{pmatrix}, \qquad
A^{(1)} = M_1 A = \begin{pmatrix} 1 & 0 \\ -\tfrac32 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix}
= \begin{pmatrix} 2 & 3 \\ 0 & \tfrac12 \end{pmatrix},
\]
\[
U = \begin{pmatrix} 2 & 3 \\ 0 & \tfrac12 \end{pmatrix}, \qquad
D = \mathrm{diag}\bigl(2, \tfrac12\bigr), \qquad
L = \begin{pmatrix} 1 & 0 \\ \tfrac32 & 1 \end{pmatrix},
\]
\[
H = L D^{1/2} = \begin{pmatrix} 1 & 0 \\ \tfrac32 & 1 \end{pmatrix}
\begin{pmatrix} \sqrt2 & 0 \\ 0 & \tfrac{1}{\sqrt2} \end{pmatrix}
= \begin{pmatrix} \sqrt2 & 0 \\ \tfrac{3}{\sqrt2} & \tfrac{1}{\sqrt2} \end{pmatrix}.
\]
Verify:
\[
H H^T = \begin{pmatrix} \sqrt2 & 0 \\ \tfrac{3}{\sqrt2} & \tfrac{1}{\sqrt2} \end{pmatrix}
\begin{pmatrix} \sqrt2 & \tfrac{3}{\sqrt2} \\ 0 & \tfrac{1}{\sqrt2} \end{pmatrix}
= \begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix} = A.
\]
Stability of Gaussian Elimination for the Cholesky Factorization

We now show that Gaussian elimination without pivoting is stable for symmetric positive definite
matrices by exhibiting some remarkable invariant properties of symmetric positive definite
matrices. The following example illustrates that, even when there is a small pivot, Gaussian
elimination without pivoting does not give rise to growth in the entries of the
matrices A^{(k)}. Let
\[
A = \begin{pmatrix} 0.00003 & 0.00500 \\ 0.00500 & 1.0000 \end{pmatrix}.
\]
There is only one step. The pivot entry is 0.00003. It is small. The multiplier m_{21} is large:
\[
m_{21} = -\frac{a_{21}}{a_{11}} = -\frac{0.00500}{0.00003} = -\frac{500}{3}.
\]
But
\[
A^{(1)} = \begin{pmatrix} 0.00003 & 0.00500 \\ 0 & 0.166667 \end{pmatrix}.
\]
The entries of A^{(1)} did not grow. In fact, max |a_{ij}^{(1)}| = 0.166667 < max |a_{ij}| = 1. This interesting
phenomenon of Gaussian elimination without pivoting applied to the simple 2 x 2 (positive
definite) example above can be explained through the following result.

Theorem 6.4.2 Let A = (aij ) be an n  n symmetric positive de nite matrix and


let A(k) = (aij(k) ) be the reduced matrices obtained by applying Gaussian elimination
without pivoting to A. Then
1. each A(k); k = 1; : : :; n ; 1, is symmetric positive de nite,
2. max ja(ijk)j  max ja(ijk;1)j; k = 1; 2; : : :; n ; 1.

265
Proof. We prove the results just for the first step of elimination, because the results for the other
steps then follow inductively.
After the first step of elimination, we have
\[
A^{(1)} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)}
\end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} \cdots a_{1n} \\
0 & \\
\vdots & B \\
0 &
\end{pmatrix},
\]
where B denotes the trailing (n-1) x (n-1) block. To prove that A^{(1)} is positive definite,
consider the quadratic form
\[
x^T B x = \sum_{i=2}^{n} \sum_{j=2}^{n} a_{ij}^{(1)} x_i x_j
= \sum_{i=2}^{n} \sum_{j=2}^{n} \Bigl( a_{ij} - \frac{a_{i1} a_{1j}}{a_{11}} \Bigr) x_i x_j
= \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j - a_{11} \Bigl( x_1 + \sum_{i=2}^{n} \frac{a_{i1}}{a_{11}} x_i \Bigr)^2 .
\]
If A^{(1)} were not positive definite, then there would exist x_2, ..., x_n, not all zero, such that
\[
\sum_{i=2}^{n} \sum_{j=2}^{n} a_{ij}^{(1)} x_i x_j \le 0.
\]
With these values of x_2 through x_n, if we define
\[
x_1 = -\sum_{i=2}^{n} \frac{a_{i1}}{a_{11}}\, x_i,
\]
then the quadratic form
\[
\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \le 0,
\]
which contradicts the fact that A is positive definite. Thus A^{(1)} is positive definite.
Also, we note that
\[
a_{ii}^{(1)} \le a_{ii}, \qquad i = 2, ..., n,
\]
for
\[
0 \le a_{ii}^{(1)} = a_{ii} - \frac{a_{i1}^2}{a_{11}} \le a_{ii} \qquad (\text{because } a_{11} > 0).
\]
Thus each diagonal entry of A^{(1)} is less than or equal to the corresponding diagonal entry of
A. Since the largest element (in magnitude) of a symmetric positive definite matrix lies on the
diagonal, we have max |a_{ij}^{(1)}| <= max |a_{ij}|.

A Consequence of Theorem 6.4.2

From Theorem 6.4.2 we immediately conclude that if |a_{ij}| <= 1, then |a_{ij}^{(k)}| <= 1. This
means that the growth factor ρ in this case is 1.

The Growth Factor and Stability of Gaussian Elimination for a Positive Definite Matrix

The growth factor ρ of Gaussian elimination without pivoting for a symmetric
positive definite matrix is 1. Thus, Gaussian elimination without pivoting
for a symmetric positive definite matrix is stable.

Example 6.4.6

\[
A = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 5 \end{pmatrix}, \qquad
A^{(1)} = \begin{pmatrix} 5 & 1 & 1 \\ 0 & \tfrac45 & \tfrac45 \\ 0 & \tfrac45 & \tfrac{24}{5} \end{pmatrix}.
\]
The leading principal minors of A^{(1)} are 5, 4, 16. Thus A^{(1)} is symmetric positive definite. Also,
max |a_{ij}^{(1)}| = \tfrac{24}{5} < max |a_{ij}| = 5.
\[
A^{(2)} = \begin{pmatrix} 5 & 1 & 1 \\ 0 & \tfrac45 & \tfrac45 \\ 0 & 0 & 4 \end{pmatrix}.
\]
The leading principal minors of A^{(2)} are 5, 4, 16. Thus A^{(2)} is also positive definite. Furthermore,
max |a_{ij}^{(2)}| = 5 = max |a_{ij}^{(1)}|. The growth factor ρ = 5/5 = 1.

Solution of a Symmetric Positive Definite System Using the LDL^T Decomposition

We have seen in the beginning of this section that a symmetric matrix A having nonzero leading
principal minors can be written uniquely in the form
A = L D L^T,
where L is unit lower triangular and D has positive diagonal entries. Furthermore, this decomposition
can be obtained in a numerically stable way using Gaussian elimination without pivoting.
In several circumstances one prefers to solve the symmetric positive definite system Ax = b directly
from the factorization A = LDL^T, without computing the Cholesky factorization. The advantage
is that the process will not require the computation of any square roots. The process then
is as follows:
Gaussian Elimination for the Symmetric Positive Definite System Ax = b

Step 1. Compute the LDL^T factorization of A: A = L D L^T.
Step 2. Solve Lz = b.
Step 3. Solve Dy = z.
Step 4. Solve L^T x = y.
Example 6.4.7

\[
A = \begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix}, \qquad b = \begin{pmatrix} 5 \\ 8 \end{pmatrix}.
\]
Step 1.
\[
L = \begin{pmatrix} 1 & 0 \\ \tfrac32 & 1 \end{pmatrix}, \qquad
D = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac12 \end{pmatrix}.
\]
Step 2. Solve Lz = b:
\[
\begin{pmatrix} 1 & 0 \\ \tfrac32 & 1 \end{pmatrix}
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
= \begin{pmatrix} 5 \\ 8 \end{pmatrix}
\quad\Rightarrow\quad z_1 = 5, \ z_2 = \tfrac12.
\]
Step 3. Solve Dy = z:
\[
\begin{pmatrix} 2 & 0 \\ 0 & \tfrac12 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
= \begin{pmatrix} 5 \\ \tfrac12 \end{pmatrix}
\quad\Rightarrow\quad y_1 = \tfrac52, \ y_2 = 1.
\]
Step 4. Solve L^T x = y:
\[
\begin{pmatrix} 1 & \tfrac32 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} \tfrac52 \\ 1 \end{pmatrix}
\quad\Rightarrow\quad x_2 = 1, \ x_1 = 1.
\]

The Cholesky Algorithm

We now show how the Cholesky factorization can be computed directly from A = H H^T. From
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
=
\begin{pmatrix}
h_{11} & 0 & \cdots & 0 \\
h_{21} & h_{22} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
h_{n1} & \cdots & & h_{nn}
\end{pmatrix}
\begin{pmatrix}
h_{11} & h_{21} & \cdots & h_{n1} \\
0 & h_{22} & \cdots & h_{n2} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & h_{nn}
\end{pmatrix}
\]
we have
\[
h_{11} = \sqrt{a_{11}}, \qquad h_{i1} = \frac{a_{i1}}{h_{11}}, \quad i = 2, ..., n,
\]
\[
\sum_{k=1}^{i} h_{ik}^2 = a_{ii}, \qquad
a_{ij} = \sum_{k=1}^{j} h_{ik} h_{jk}, \quad j < i.
\]
This leads to the following algorithm, known as the Cholesky algorithm.
Algorithm 6.4.4 Cholesky Algorithm*

Given an n x n symmetric positive definite matrix A, the following algorithm computes the
Cholesky factor H. The matrix H is computed row by row and is stored in the lower triangular
part of A.

For k = 1, 2, ..., n do
    For i = 1, 2, ..., k-1 do
        a_{ki} := h_{ki} = \frac{1}{h_{ii}} \Bigl( a_{ki} - \sum_{j=1}^{i-1} h_{ij} h_{kj} \Bigr)
    a_{kk} := h_{kk} = \sqrt{ a_{kk} - \sum_{j=1}^{k-1} h_{kj}^2 }

*This algorithm in some applications (such as in statistics) is known as the square-root algorithm. A
square-root-free algorithm, however, can be developed.

Remarks:
1. In the above pseudocode, an empty sum \sum_{j=1}^{0}(\cdot) is taken to be 0. Also, when k = 1, the inner loop is skipped.
2. The Cholesky factor H is computed row by row.
3. The positive definiteness of A will make the quantities under the square-root sign positive.
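A direct, illustrative transcription of Algorithm 6.4.4 (assuming NumPy; H is built in a separate array here instead of overwriting A, and np.linalg.cholesky returns the same factor):

    import numpy as np

    def cholesky(A):
        n = A.shape[0]
        H = np.zeros_like(A, dtype=float)
        for k in range(n):
            for i in range(k):
                H[k, i] = (A[k, i] - H[i, :i] @ H[k, :i]) / H[i, i]
            H[k, k] = np.sqrt(A[k, k] - H[k, :k] @ H[k, :k])
        return H

    A = np.array([[1., 1., 1.], [1., 5., 5.], [1., 5., 14.]])
    H = cholesky(A)
    print(H)            # lower triangular; H @ H.T == A (cf. Example 6.4.8)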

Round-off property. Let the computed Cholesky factor be denoted by Ĥ. Then it can be
shown (Demmel 1989) that
A + E = Ĥ Ĥ^T,
where E = (e_{ij}) and
|e_{ij}| <= \frac{(n+1)\mu}{1 - (n+1)\mu}\,(a_{ii} a_{jj})^{1/2},
μ being the machine precision.
Thus, the Cholesky factorization algorithm is stable.
Solution of Ax = b using the Cholesky Factorization


Having the Cholesky factorization A = H H^T at hand, the positive definite linear system Ax = b
can now be solved by solving the lower triangular system Hy = b first, followed by the upper
triangular system H^T x = y.
Algorithm 6.4.5 The Cholesky Algorithm for the Positive Definite System Ax = b

Step 1. Find the Cholesky factorization A = H H^T, using Algorithm 6.4.4.
Step 2. Solve the lower triangular system Hy = b for y.
Step 3. Solve the upper triangular system H^T x = y for x.

Example 6.4.8

Let
\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 5 & 5 \\ 1 & 5 & 14 \end{pmatrix}, \qquad
b = \begin{pmatrix} 3 \\ 11 \\ 20 \end{pmatrix}.
\]
A. The Cholesky Factorization

1st row (k = 1):  h_{11} = 1.

2nd row (k = 2):
h_{21} = a_{21}/h_{11} = 1,
h_{22} = \sqrt{a_{22} - h_{21}^2} = \sqrt{5 - 1} = 2
(since the diagonal entries of H have to be positive, we take the + sign).

3rd row (k = 3):
h_{31} = a_{31}/h_{11} = 1,
h_{32} = \frac{1}{h_{22}} (a_{32} - h_{21} h_{31}) = \frac12 (5 - 1) = 2,
h_{33} = \sqrt{a_{33} - (h_{31}^2 + h_{32}^2)} = \sqrt{14 - 5} = \sqrt{9} = +3
(we take the + sign).
\[
H = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 3 \end{pmatrix}
\]

B. Solution of the Linear System Ax = b

(1) Solution of Hy = b:
\[
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 3 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 3 \\ 11 \\ 20 \end{pmatrix}
\quad\Rightarrow\quad y_1 = 3, \ y_2 = 4, \ y_3 = 3.
\]
(2) Solution of H^T x = y:
\[
\begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 3 \\ 4 \\ 3 \end{pmatrix}
\quad\Rightarrow\quad x_3 = 1, \ x_2 = 1, \ x_1 = 1.
\]

Flop-Count. The Cholesky algorithm requires n^3/6 flops to compute H, one half of the number
of flops required to do the same job using LU factorization. Note that the process also requires
n square roots. The solution of each triangular system Hy = b and H^T x = y requires n^2/2 flops.
Thus the solution of the positive definite system Ax = b using the Cholesky algorithm
requires n^3/6 + n^2 flops and n square roots.

Round-off property. If x̂ is the computed solution of the system Ax = b using the Cholesky
algorithm, then it can be shown that x̂ satisfies
(A + E) x̂ = b,
where ‖E‖₂ ≤ c μ ‖A‖₂, μ is the machine precision, and c is a small constant depending upon n. Thus the Cholesky
algorithm for solving a symmetric positive definite system is quite stable.

Relative Error in the Solution by the Cholesky Algorithm

Let x̂ be the computed solution of the symmetric positive definite system Ax = b
using the Cholesky algorithm followed by the triangular system solutions as described
above. Then it can be shown that
‖x - x̂‖₂ / ‖x̂‖₂ ≤ μ · Cond(A).
(Recall that Cond(A) = ‖A‖ · ‖A⁻¹‖.)

Remark: Demmel (1989) has shown that the above bound can be replaced by O(μ) Cond(Ã),
where Ã = D⁻¹AD⁻¹, D = diag(√a₁₁, ..., √aₙₙ). The latter may be much better than the previous
one, since Cond(Ã) may be much smaller than Cond(A). (See the discussions on conditioning and
scaling in Section 6.5.)
Hessenberg System

Consider the linear system
Ax = b,
where A is an upper Hessenberg matrix of order n. If Gaussian elimination with partial pivoting
is used to triangularize A, and if |a_{ij}| <= 1, then |a_{ij}^{(k)}| <= k + 1 (Wilkinson, AEP, p. 218). Thus we
can state the following:

Growth Factor and Stability of Gaussian Elimination for a Hessenberg System

The growth factor for a Hessenberg matrix using Gaussian elimination with partial
pivoting is bounded by n. Thus a Hessenberg system can be safely solved
using partial pivoting.
Flop-count: It requires only n^2 flops to solve a Hessenberg system, significantly fewer than the
n^3/3 flops required to solve a system with an arbitrary matrix. This is because at each step of
the triangularization process only one element needs to be eliminated, and since
there are (n-1) steps, the triangularization process requires about n^2/2 flops. Once we have the
factorization
MA = U,
the upper triangular system
Ux = Mb = b'
can be solved in n^2/2 flops. Thus a Hessenberg system can be solved with only n^2 flops in
a stable way using Gaussian elimination with partial pivoting.
Example 6.4.9

Triangularize
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 0 & 5 & 6 \end{pmatrix}
\]
using partial pivoting.

Step 1.
\[
P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
P_1 A = \begin{pmatrix} 2 & 3 & 4 \\ 1 & 2 & 3 \\ 0 & 5 & 6 \end{pmatrix}, \qquad
\hat{M}_1 = \begin{pmatrix} 1 & 0 \\ -\tfrac12 & 1 \end{pmatrix}, \qquad
M_1 = \begin{pmatrix} 1 & 0 & 0 \\ -\tfrac12 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]
\[
A^{(1)} = M_1 P_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -\tfrac12 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 \\ 1 & 2 & 3 \\ 0 & 5 & 6 \end{pmatrix}
= \begin{pmatrix} 2 & 3 & 4 \\ 0 & \tfrac12 & 1 \\ 0 & 5 & 6 \end{pmatrix}.
\]

Step 2.
\[
P_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad
P_2 A^{(1)} = \begin{pmatrix} 2 & 3 & 4 \\ 0 & 5 & 6 \\ 0 & \tfrac12 & 1 \end{pmatrix}, \qquad
\hat{M}_2 = \begin{pmatrix} 1 & 0 \\ -\tfrac1{10} & 1 \end{pmatrix}, \qquad
M_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\tfrac1{10} & 1 \end{pmatrix},
\]
\[
U = A^{(2)} = M_2 P_2 A^{(1)} = \begin{pmatrix} 2 & 3 & 4 \\ 0 & 5 & 6 \\ 0 & 0 & \tfrac25 \end{pmatrix},
\qquad
M = M_2 P_2 M_1 P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -\tfrac12 & -\tfrac1{10} \end{pmatrix}.
\]

Computation of the Growth Factor ρ

ρ = max(6, 6, 6)/6 = 1.

An Application of Hessenberg Systems: Computing the Frequency Response Matrix

In control theory it is often required to compute the matrix
G(jω) = C (jωI - A)⁻¹ B
for many different values of ω, in order to study the response of a control system. The matrices A, B, C here
are the matrices of a control system and are of order n x n, n x m, and r x n, respectively (m <= n).
The matrix G(jω) is called the frequency response matrix.
Since computing (jωI - A)⁻¹B is equivalent to solving m linear systems (see Section 6.5.1):
(jωI - A) x_i = b_i,  i = 1, ..., m,
where b_i is the ith column of B, for each ω we need about (n³/3 + mn² + rnm) flops to compute
G(jω) (using Gaussian elimination with partial pivoting). Typically, G(jω) needs to be computed
for a very large number of values of ω, and thus such a computation would be formidable.
On the other hand, if A is transformed initially to a Hessenberg matrix,
A = P H P^T,
then
G(jω) = C P (jωI - H)⁻¹ P^T B.
The computation of (jωI - H)⁻¹ P^T B now requires the solution of m Hessenberg systems, each
of which requires only n² flops. Then, for computing G(jω) for each ω,
there is a saving of order O(n) per value of ω. This count does not include the reduction to Hessenberg
form. Note that the matrix A is transformed to a Hessenberg matrix once, and the same
Hessenberg matrix is used to compute G(jω) for each ω.
Thus, the computation that uses an initial reduction of A to a Hessenberg matrix
is much more efficient. Moreover, as we have seen before, reduction to Hessenberg form and
the solution of Hessenberg systems using partial pivoting are both stable computations. This approach
was suggested by Laub (1981).
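An illustrative sketch of the Laub-style computation, assuming SciPy (the random system matrices below are made up; note also that the generic solver used here does not exploit the Hessenberg structure, whereas a specialized Hessenberg solver would bring the cost per ω down to about n²m flops):

    import numpy as np
    from scipy.linalg import hessenberg, solve

    n, m, r = 6, 2, 2
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, m))
    C = rng.standard_normal((r, n))

    H, P = hessenberg(A, calc_q=True)       # one-time reduction: A = P H P^T
    PTB, CP = P.T @ B, C @ P                # formed once

    for w in (0.1, 1.0, 10.0):
        G = CP @ solve(1j*w*np.eye(n) - H, PTB)   # frequency response at jw
        print(w, G.shape)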

Remarks: Reduction of A to a Hessenberg matrix serves as a frontier to major computations


in control theory. Numerically viable algorithms for important control problems such as controlla-
bility and observability, matrix equations (Lyapunov, Sylvester, Observer-Sylvester, Riccati, etc.)
routinely transform the matrix A to a Hessenberg matrix before actual computations start. Readers
familiar and interested in these problems might want to look into the book \Numerical Methods
in Control Theory" by B. N. Datta for an account of these methods. For references on the
individual papers, see Chapter 8. The recent reprint book \Numerical Linear Algebra Tech-
niques for Systems and Control," edited by R. V. Patel, Alan Laub and Paul van Dooren,
IEEE Press, 1994, also contains all the relevant papers.
Diagonally Dominant Systems

Recall that a matrix A = (a_{ij}) is column diagonally dominant if
|a_{11}| > |a_{21}| + |a_{31}| + ... + |a_{n1}|
|a_{22}| > |a_{12}| + |a_{32}| + ... + |a_{n2}|
    ...
|a_{nn}| > |a_{1n}| + |a_{2n}| + ... + |a_{n-1,n}|.
A column diagonally dominant matrix, like a symmetric positive definite matrix, possesses
the attractive property that no row interchanges are necessary at any step
during the triangularization procedure for Gaussian elimination with partial pivoting;
the pivot element is already there in the right place. Thus, to begin with, at the first step, a_{11} being
the largest in magnitude of all the elements in the first column, no row interchange is necessary.
At the end of the first step, we then have
\[
A^{(1)} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & \vdots & & \vdots \\
0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)}
\end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} \cdots a_{1n} \\
0 & \\
\vdots & A' \\
0 &
\end{pmatrix},
\]
and it can be shown [exercise] that A' is again column diagonally dominant; therefore a_{22}^{(1)}
is the pivot for the second step. This process can obviously be continued, showing that pivoting
is not needed for column diagonally dominant matrices. Furthermore, the following can be easily
proved.

Growth Factor and Stability of Gaussian Elimination for Diagonally Dominant Systems

The growth factor ρ for a column diagonally dominant matrix with partial pivoting
is bounded by 2 (Exercise 16): ρ <= 2.
Thus, for column diagonally dominant systems, Gaussian elimination with
partial pivoting is stable.
Example 6.4.10

\[
A = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 5 \end{pmatrix}.
\]
Step 1.
\[
A^{(1)} = \begin{pmatrix} 5 & 1 & 1 \\ 0 & \tfrac{24}{5} & \tfrac45 \\ 0 & \tfrac45 & \tfrac{24}{5} \end{pmatrix}.
\]
Step 2.
\[
A^{(2)} = \begin{pmatrix} 5 & 1 & 1 \\ 0 & \tfrac{24}{5} & \tfrac45 \\ 0 & 0 & \tfrac{14}{3} \end{pmatrix}.
\]
The growth factor ρ = 1.
(Note that for this example the matrix A is column diagonally dominant and positive definite;
thus ρ = 1.)
The next example shows that the growth factor of Gaussian elimination for a column
diagonally dominant matrix can be greater than 1, but is always less than 2.

Example 6.4.11

\[
A = \begin{pmatrix} 5 & -8 \\ 1 & 10 \end{pmatrix}, \qquad
A^{(1)} = \begin{pmatrix} 5 & -8 \\ 0 & 11.6 \end{pmatrix}.
\]
The growth factor ρ = max(10, 11.6)/10 = 11.6/10 = 1.16.
Tridiagonal Systems

The LU factorization of a tridiagonal matrix T, when it exists, may yield L and U having
very special simple structures: both bidiagonal, L having 1's along the main diagonal and the
superdiagonal entries of U the same as those of T. Specifically, if we write
\[
T = \begin{pmatrix}
a_1 & b_1 & & 0 \\
c_2 & a_2 & \ddots & \\
 & \ddots & \ddots & b_{n-1} \\
0 & & c_n & a_n
\end{pmatrix}
=
\begin{pmatrix}
1 & & & 0 \\
\ell_2 & 1 & & \\
 & \ddots & \ddots & \\
0 & & \ell_n & 1
\end{pmatrix}
\begin{pmatrix}
u_1 & b_1 & & 0 \\
 & u_2 & \ddots & \\
 & & \ddots & b_{n-1} \\
0 & & & u_n
\end{pmatrix},
\]
then, by equating the corresponding elements of the matrices on both sides, we see that
a_1 = u_1,
c_i = \ell_i u_{i-1},  i = 2, ..., n,
a_i = u_i + \ell_i b_{i-1},  i = 2, ..., n,
from which the \ell_i and u_i can be easily computed:

Computing the LU Factorization of a Tridiagonal Matrix

u_1 = a_1
For i = 2, ..., n do
    \ell_i = c_i / u_{i-1}
    u_i = a_i - \ell_i b_{i-1}

Flop-count: The above procedure takes only (2n - 2) flops.
Solving a Tridiagonal System

Once we have the above simple factorization of T, the solution of the tridiagonal
system Tx = b can be found by solving the two special bidiagonal systems
Ly = b  and  Ux = y.

Flop-count: The solutions of these two bidiagonal systems also require (2n - 2) flops. Thus,
a tridiagonal system can be solved by the above procedure in only 4n - 4 flops, a very
cheap procedure indeed.
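The whole O(n) procedure fits in a few lines. The sketch below is illustrative only (it assumes NumPy; the helper name tridiag_solve is ours, and the data are those of Example 6.4.12 with an arbitrary right-hand side):

    import numpy as np

    def tridiag_solve(a, b, c, rhs):
        n = len(a)
        u, l = np.zeros(n), np.zeros(n)
        u[0] = a[0]
        for i in range(1, n):                 # bidiagonal LU factors
            l[i] = c[i] / u[i-1]
            u[i] = a[i] - l[i] * b[i-1]
        y = np.zeros(n)                       # forward solve Ly = rhs
        y[0] = rhs[0]
        for i in range(1, n):
            y[i] = rhs[i] - l[i] * y[i-1]
        x = np.zeros(n)                       # back solve Ux = y
        x[-1] = y[-1] / u[-1]
        for i in range(n - 2, -1, -1):
            x[i] = (y[i] - b[i] * x[i+1]) / u[i]
        return x

    a = np.array([0.9, 0.5, 0.5])             # main diagonal
    b = np.array([0.1, 0.1])                  # superdiagonal
    c = np.array([0.0, 0.8, 0.1])             # subdiagonal (c[0] unused)
    print(tridiag_solve(a, b, c, np.array([1.0, 1.0, 1.0])))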

Stability of the Process: Unfortunately, the above factorization procedure breaks down if
any u_i is zero. Even if all the u_i are theoretically nonzero, the stability of the process in general
cannot be guaranteed. However, in many practical situations, such as in discretizing Poisson's
equation, the tridiagonal matrices are symmetric positive definite, in which case, as we
have seen before, the above procedure is quite stable.
In fact, in the symmetric positive definite case, this procedure should be preferred
over the Cholesky-factorization technique, as it does not involve the computation of any square
roots. It is true that the Cholesky factorization of a symmetric positive definite tridiagonal matrix
can also be computed in O(n) flops; however, an additional n square roots have to be computed
(see Golub and Van Loan, MC 1984, p. 97).
In the general case, to maintain stability, Gaussian elimination with partial
pivoting should be used.
If |a_i|, |b_i|, |c_i| <= 1 (i = 1, ..., n), then it can be shown (Wilkinson, AEP, p. 219) that the entries
of A^{(k)} at each step will be bounded by 2.
Growth Factor and Stability of Gaussian Elimination for a Tridiagonal System

The growth factor for Gaussian elimination with partial pivoting for a tridiagonal
matrix is bounded by 2:
ρ <= 2.
Thus, Gaussian elimination with partial pivoting for a tridiagonal system
is very stable.

The flop-count in this case is a little higher; it takes about 7n flops to solve the system Tx = b (3n
flops for the decomposition and 4n for solving the two triangular systems), but it is still an O(n) procedure.
If T is symmetric, one naturally wants to take advantage of the symmetry; however, Gaussian
elimination with partial pivoting does not preserve symmetry. Bunch (1971, 1974) and
Bunch and Kaufman (1977) have proposed symmetry-preserving algorithms. These algorithms can
be arranged to have a flop-count comparable to that of Gaussian elimination with partial pivoting
and require less storage than the latter. For details see the papers by Bunch and by Bunch and
Kaufman.
Example 6.4.12 Triangularize
\[
A = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0.8 & 0.5 & 0.1 \\ 0 & 0.1 & 0.5 \end{pmatrix}
\]
using (i) the formula A = LU and (ii) Gaussian elimination.

(i) From A = LU:
u_1 = 0.9.
i = 2:
\ell_2 = c_2/u_1 = 0.8/0.9 = 0.8889,
u_2 = a_2 - \ell_2 b_1 = 0.5 - 0.8889 x 0.1 = 0.4111.
i = 3:
\ell_3 = c_3/u_2 = 0.1/0.4111 = 0.2432,
u_3 = a_3 - \ell_3 b_2 = 0.5 - 0.2432 x 0.1 = 0.4757.
\[
L = \begin{pmatrix} 1 & 0 & 0 \\ 0.8889 & 1 & 0 \\ 0 & 0.2432 & 1 \end{pmatrix}, \qquad
U = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.4111 & 0.1 \\ 0 & 0 & 0.4757 \end{pmatrix}.
\]

(ii) Using Gaussian Elimination with Partial Pivoting

Step 1. Multiplier m_{21} = -0.8/0.9 = -0.8889,
\[
A^{(1)} = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.4111 & 0.1 \\ 0 & 0.1 & 0.5 \end{pmatrix}.
\]
Step 2. Multiplier m_{32} = -0.1/0.4111 = -0.2432,
\[
A^{(2)} = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.4111 & 0.1 \\ 0 & 0 & 0.4757 \end{pmatrix} = U,
\qquad
L = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ 0 & -m_{32} & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0.8889 & 1 & 0 \\ 0 & 0.2432 & 1 \end{pmatrix}.
\]

Block Tridiagonal Systems

In this section we consider solving the block tridiagonal system
Tx = b,
where T is a block tridiagonal matrix and b = (b_1, b_2, ..., b_N)^T is a block vector. The number of
components of the block vector b_i is the same as the dimension of the ith diagonal block of T.

A. Block LU Factorization

The factorization procedure given in the beginning of this section may be easily extended to
the case of the block tridiagonal matrix
\[
T = \begin{pmatrix}
A_1 & B_1 & & 0 \\
C_2 & A_2 & \ddots & \\
 & \ddots & \ddots & B_{N-1} \\
0 & & C_N & A_N
\end{pmatrix}.
\]
Thus, if T has the block LU factorization
\[
T = \begin{pmatrix}
I & & & 0 \\
L_2 & I & & \\
& \ddots & \ddots & \\
0 & & L_N & I
\end{pmatrix}
\begin{pmatrix}
U_1 & B_1 & & 0 \\
& U_2 & \ddots & \\
& & \ddots & B_{N-1} \\
0 & & & U_N
\end{pmatrix} = LU,
\]
then the matrices $L_i$, $i = 2, \ldots, N$ and $U_i$, $i = 1, \ldots, N$ can be computed as follows:
Algorithm 6.4.6 Block LU Factorization
Set
U1 = A1
For i = 2; : : :; N do
(1) Solve for $L_i$ from
    $L_i U_{i-1} = C_i$
(2) Compute $U_i$:
    $U_i = A_i - L_i B_{i-1}.$
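As a concrete illustration, the following Python/NumPy sketch carries out Algorithm 6.4.6 when the diagonal blocks A_i, superdiagonal blocks B_i and subdiagonal blocks C_i are supplied as lists of square matrices; the function and variable names are illustrative, not part of the text, and each step solves $L_i U_{i-1} = C_i$ with a general-purpose solver rather than exploiting any structure in $U_{i-1}$.

import numpy as np

def block_lu(A_blocks, B_blocks, C_blocks):
    # A_blocks[i]: diagonal block A_{i+1}; B_blocks[i]: superdiagonal block B_{i+1};
    # C_blocks[i]: subdiagonal block C_{i+2}.
    N = len(A_blocks)
    U = [A_blocks[0]]                 # U_1 = A_1
    L = [None]                        # no L_1; keep indices aligned with the text
    for i in range(1, N):
        # Solve L_i U_{i-1} = C_i, i.e. U_{i-1}^T L_i^T = C_i^T
        Li = np.linalg.solve(U[i - 1].T, C_blocks[i - 1].T).T
        L.append(Li)
        U.append(A_blocks[i] - Li @ B_blocks[i - 1])
    return L, U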

B. Solution of Block Systems


Once we have the above factorization, we can find the solution x of the system
Tx = b
by solving Ly = b and Ux = y successively. The solution of Ly = b can be achieved by Block Forward Elimination, and that of Ux = y by Block Back Substitution.
Algorithm 6.4.7 Block Forward Elimination
Set $y_0 = 0$ (so that the term $L_1 y_0$ is absent).
For $i = 1, \ldots, N$ do
    $y_i = b_i - L_i y_{i-1}$.

Algorithm 6.4.8 Block Back Substitution
Set $x_{N+1} = 0$ (so that the term $B_N x_{N+1}$ is absent).
For $i = N, \ldots, 1$ do
    Solve $U_i x_i = y_i - B_i x_{i+1}$ for $x_i$.
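Continuing the sketch above (same illustrative names), the factorization can be used to solve $Tx = b$ by block forward elimination followed by block back substitution, as in Algorithms 6.4.7 and 6.4.8; b_blocks is the right-hand side partitioned conformally with the diagonal blocks.

import numpy as np

def block_solve(L, U, B_blocks, b_blocks):
    N = len(U)
    # Block forward elimination: y_1 = b_1,  y_i = b_i - L_i y_{i-1}
    y = [b_blocks[0]]
    for i in range(1, N):
        y.append(b_blocks[i] - L[i] @ y[i - 1])
    # Block back substitution: U_N x_N = y_N,  U_i x_i = y_i - B_i x_{i+1}
    x = [None] * N
    x[N - 1] = np.linalg.solve(U[N - 1], y[N - 1])
    for i in range(N - 2, -1, -1):
        x[i] = np.linalg.solve(U[i], y[i] - B_blocks[i] @ x[i + 1])
    return x

For the data of Example 6.4.13 below, block_lu followed by block_solve reproduces $x_1 = x_2 = (1, 1)^T$.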
Example 6.4.13

\[
A = \begin{pmatrix} 4 & -1 & 1 & 0 \\ -1 & 4 & 0 & 1 \\ 1 & 0 & 2 & -1 \\ 0 & 1 & -1 & 2 \end{pmatrix}, \qquad
b = \begin{pmatrix} 4 \\ 4 \\ 2 \\ 2 \end{pmatrix},
\]
\[
A_1 = \begin{pmatrix} 4 & -1 \\ -1 & 4 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \quad
B_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
b_1 = \begin{pmatrix} 4 \\ 4 \end{pmatrix}, \quad
b_2 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.
\]
Block LU Factorization
Set $U_1 = A_1 = \begin{pmatrix} 4 & -1 \\ -1 & 4 \end{pmatrix}$.
i = 2:
(1) Solve for $L_2$ from $L_2 U_1 = C_2 = I_2$:
\[
L_2 = U_1^{-1} = \begin{pmatrix} 0.2667 & 0.0667 \\ 0.0667 & 0.2667 \end{pmatrix}.
\]
(2) Compute $U_2$:
\[
U_2 = A_2 - L_2 B_1
    = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} - \begin{pmatrix} 0.2667 & 0.0667 \\ 0.0667 & 0.2667 \end{pmatrix}
    = \begin{pmatrix} 1.7333 & -1.0667 \\ -1.0667 & 1.7333 \end{pmatrix}.
\]
Block Forward Elimination
\[
y_1 = b_1 = \begin{pmatrix} 4 \\ 4 \end{pmatrix}, \qquad
y_2 = b_2 - L_2 y_1 = \begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix}.
\]
Block Back Substitution
\[
U_2 x_2 = y_2 - B_2 x_3 = y_2 = \begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix} \quad (B_2 x_3 = 0),
\]
\[
x_2 = U_2^{-1} \begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix}
    = \begin{pmatrix} 0.9286 & 0.5714 \\ 0.5714 & 0.9286 \end{pmatrix} \begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix}
    = \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\]
\[
U_1 x_1 = y_1 - B_1 x_2 = \begin{pmatrix} 4 \\ 4 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix},
\]
\[
x_1 = U_1^{-1} \begin{pmatrix} 3 \\ 3 \end{pmatrix}
    = \begin{pmatrix} 0.2667 & 0.0667 \\ 0.0667 & 0.2667 \end{pmatrix} \begin{pmatrix} 3 \\ 3 \end{pmatrix}
    = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]

Block Cyclic Reduction


Frequently in practice, the block tridiagonal matrix of a system may possess some special
properties that can be exploited to reduce the system to a single lower-order system by using a
technique called Block Cyclic Reduction. For details see the book by Golub and Van Loan (MC
1983 pp. 110{117) and the references therein.
6.4.8 Scaling
If the entries of the matrix A vary widely, then there is a possibility that a very small number needs to be added to a very large number during the process of elimination. This can influence the accuracy greatly, because "the big one can kill the small one." To circumvent this difficulty, it is often suggested that the rows of A be properly scaled before the elimination process begins. The following simple example illustrates this.
Consider the system
\[
\begin{pmatrix} 10 & 10^6 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} =
\begin{pmatrix} 10^6 \\ 2 \end{pmatrix}.
\]
Apply now Gaussian elimination with pivoting. Since 10 is the largest entry in the first column, no interchange is needed. We have, after the first step of elimination,
\[
\begin{pmatrix} 10 & 10^6 \\ 0 & -10^5 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} =
\begin{pmatrix} 10^6 \\ -10^5 \end{pmatrix},
\]
which gives $x_2 = 1$, $x_1 = 0$. The exact solution, however, is approximately $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Note that the above system is exactly equal to the system in Section 6.3.4, with the first equation multiplied by $10^6$. Therefore, even choosing the false pivot 10 did not help us. However, if we scale the entries of the first row of the matrix by dividing it by $10^6$ and then solve the system (after modifying the first entry of b appropriately) using partial pivoting, we will then have the accurate solution, as we have seen before.
Scaling of the rows of a matrix A is equivalent to finding an invertible diagonal matrix $D_1$ so that the largest element (in magnitude) in each row of $D_1^{-1}A$ is about the same size. Once such a $D_1$ is found, the solution of the system
$Ax = b$
is found by solving the scaled system
$\tilde{A}x = \tilde{b},$
where
$\tilde{A} = D_1^{-1}A, \qquad \tilde{b} = D_1^{-1}b.$
The process can easily be extended to scale both the rows and the columns of A. Mathematically, this is equivalent to finding diagonal matrices $D_1$ and $D_2$ such that the largest (in magnitude) element in each row and column of $D_1^{-1}AD_2$ lies between two fixed numbers, say $1/\beta$ and 1, where $\beta$ is the base of the number system. Once such $D_1$ and $D_2$ are found, the solution of the system
$Ax = b$
is obtained by solving the equivalent system
$\tilde{A}y = \tilde{b},$
and then computing
$x = D_2 y,$
where
$\tilde{A} = D_1^{-1}AD_2, \qquad \tilde{b} = D_1^{-1}b.$

The above process is known as equilibration (Forsythe and Moler, CSLAS, pp. 44-45).
In conclusion, we note that scaling or equilibration is recommended in general, but it should be applied on an ad hoc basis, depending upon the data of the problem. "The round-off error analysis for Gaussian elimination gives the most effective results when a matrix is equilibrated." (Forsythe and Moler, CSLAS)
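The following Python/NumPy sketch shows one simple way to carry out the row scaling described above: each row is divided by its largest entry in magnitude, and the right-hand side is modified accordingly. This is only an illustration of the idea (the function name and the choice of scaling factors are ours, not a prescription from the text), after which the scaled system can be solved by Gaussian elimination with partial pivoting.

import numpy as np

def row_equilibrate(A, b):
    # Scale each row of A (and the matching entry of b) by the row's
    # largest entry in magnitude, so every row of the scaled matrix
    # has maximum magnitude 1.
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.abs(A).max(axis=1)            # row scaling factors (assumed nonzero)
    return A / d[:, None], b / d

# For the example above:
A = np.array([[10.0, 1e6], [1.0, 1.0]])
b = np.array([1e6, 2.0])
A_tilde, b_tilde = row_equilibrate(A, b)
x = np.linalg.solve(A_tilde, b_tilde)    # close to (1, 1)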

6.4.9 LU Versus QR and Table of Comparisons


We have just seen that the method for solving Ax = b using the QR factorization is about twice as expensive as Gaussian elimination with partial pivoting if the Householder method is used to factor A, and about four times as expensive if Givens rotations are used. On the other hand, the QR factorization technique is unconditionally stable, whereas, from a theoretical point of view, with Gaussian elimination with partial or complete pivoting there is always some risk involved. Thus, if stability is the main concern and cost is not a major factor, one can certainly use the QR factorization technique. However, considering both efficiency and stability from a practical point of view, it is currently agreed that Gaussian elimination with partial pivoting is the most practical approach for the solution of Ax = b. If one really insists on using an orthogonalization technique, the Householder method is certainly to be preferred over the Givens method. Furthermore, Gaussian elimination without pivoting should not be used unless the matrix A is symmetric positive definite or diagonally dominant.
We summarize the above discussion in the following two tables.

TABLE 6.1
(COMPARISON OF DIFFERENT METHODS
FOR THE LINEAR SYSTEM PROBLEM WITH ARBITRARY MATRICES)

METHOD                           FLOP-COUNT (APPROX.)            GROWTH FACTOR rho                                STABILITY
Gaussian elimination
  without pivoting               n^3/3                           arbitrary                                        Unstable
Gaussian elimination
  with partial pivoting          n^3/3 (+ O(n^2) comparisons)    rho <= 2^(n-1)                                   Stable in practice
Gaussian elimination
  with complete pivoting         n^3/3 (+ O(n^3) comparisons)    rho <= {n 2^1 3^(1/2) ... n^(1/(n-1))}^(1/2)     Stable
QR factorization using
  Householder transformations    2n^3/3 (+ n square roots)       none                                             Stable
QR factorization using
  Givens rotations               4n^3/3 (+ n^2/2 square roots)   none                                             Stable

TABLE 6.2
(COMPARISON OF DIFFERENT METHODS FOR THE LINEAR
SYSTEM PROBLEM WITH SPECIAL MATRICES)

MATRIX TYPE            METHOD                        FLOP-COUNT (APPROX.)       GROWTH FACTOR rho    STABILITY
Symmetric positive     1) Gaussian elimination       n^3/3                      rho = 1              Stable
  definite                without pivoting
                       2) Cholesky factorization     n^3/6 (+ n square roots)   none                 Stable
Diagonally dominant    Gaussian elimination          n^3/3                      rho <= 2             Stable
                         with partial pivoting
Hessenberg             Gaussian elimination          n^2                        rho <= n             Stable
                         with partial pivoting
Tridiagonal            Gaussian elimination          O(n)                       rho <= 2             Stable
                         with partial pivoting

6.5 Inverses, Determinant and Leading Principal Minors


Associated with the problem of solving the linear system Ax = b are the problems of nding the
determinant, the inverse and the leading principal minors of the matrix A. In this section
we will see how these problems can be solved using the methods of various factorizations developed
earlier.

6.5.1 Avoiding Explicit Computation of the Inverses


The inverse of a matrix A very seldom needs to be computed explicitly. Most computa-
tional problems involving inverses can be reformulated in terms of solution of linear systems. For
example, consider
1. A;1b (inverse times a vector)
2. A;1B (inverse times a matrix)
3. bT A;1 c (vector times inverse times a vector).

288
The rst problem, the computation of A;1 b; is equivalent to solving the linear system:
Ax = b:
Similarly, the second problem can be formulated in terms of solving sets of linear equations. Thus,
if A is of order n  n and B is of order n  m, then writing C = A;1 B = (c1; c2; : : :; cm). We see
that the columns c1 through cm of C can be found by solving the systems
Aci = bi; i = 1; : : :; m;
where bi ; i = 1; : : :; m are the successive columns of the matrix B .
The computation of bT A;1c can be done in two steps:
1. Find A;1c; that is, solve the linear system: Ax = c
2. Compute bT x.
As we will see later in this section, computing $A^{-1}$ is three times as expensive as solving the linear system Ax = b. Thus, all such problems mentioned above can be solved much more efficiently by formulating them in terms of linear systems rather than naively solving them using matrix inversion.
The explicit computation of the inverse should be avoided whenever pos-
sible. A linear system should never be solved by explicit computation of
the inverse of the system matrix.

Signi cance of the Inverse of a Matrix in Practical Applications


Having said that most computational problems involving inverses can be reformulated in terms of linear systems, let us remark that there are, however, certain practical applications where the inverse of a matrix needs to be computed explicitly; in fact, the entries of the inverse matrices in these applications have some physical significance.
Example 6.5.1
Consider once again the spring-mass problem discussed in Section 6.3.3. The (i, j)th entry of the inverse of the stiffness matrix K tells us what the displacement of mass i will be if a unit external force is imposed on mass j. Thus the entries of $K^{-1}$ tell us how the system's components will respond to external forces.
289
Let's take a specific instance when the spring constants are all equal:
$k_1 = k_2 = k_3 = 1.$
Then
\[
K = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}, \qquad
K^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}.
\]
Since the entries of the first column of $K^{-1}$ are all 1's, a downward unit load on the first mass will displace all the masses by 1 inch downward. Similar interpretations can be given for the elements of the other columns of $K^{-1}$.
Some Easily Computed Inverses
Before we discuss the computation of A;1 for an arbitrary matrix A, we note that the inverses
of some well-known matrices can be trivially computed.
(1) The inverse of the elementary lower triangular matrix M = I ; meTk is given by M ;1 = I + meTk
(2) The inverse of an orthogonal matrix Q is its transpose QT (note that a Householder matrix
and a permutation matrix are orthogonal).
(3) The inverse of triangular matrix T of one type is again a triangular matrix of the same type,
the diagonal entries being the reciprocals of the diagonal entries of the matrix T .

6.5.2 The Sherman-Morrison and Woodbury Formulas


In many applications once the inverse of a matrix A is computed, it is required to nd the inverse
of another matrix B which di ers from A only by a rank-one perturbation. The question naturally
arises if the inverse of B can be computed without starting all over again. That is, if the inverse of
B can be found using the inverse of A which has already been computed. The Sherman-Morrison
formula shows us how to do this.

290
The Sherman-Morrison Formula
If u and v are two n-vectors and A is a nonsingular matrix, then
\[
(A - uv^T)^{-1} = A^{-1} + \beta (A^{-1} u v^T A^{-1}),
\]
where
\[
\beta = \frac{1}{1 - v^T A^{-1} u}, \quad \text{provided } v^T A^{-1} u \ne 1.
\]

Remarks: a) (1) above is a special case of this formula.


b) The Sherman-Morrison formula shows how to compute the inverse of the matrix
obtained from a matrix A by rank-one change, once the inverse of A has been computed,
without explicitly computing the inverse of the new matrix.
The Sherman-Morrison formula can be extended to the case where U and V are matrices. This
generalization is known as the Woodbury formula:

The Woodbury Formula


\[
(A - UV^T)^{-1} = A^{-1} + A^{-1} U (I - V^T A^{-1} U)^{-1} V^T A^{-1}, \quad \text{if } I - V^T A^{-1} U \text{ is nonsingular.}
\]
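A small Python/NumPy sketch of the Sherman-Morrison update is given below; it assumes $A^{-1}$ is already available and simply evaluates the formula above. The function name is an illustrative choice, not one from the text.

import numpy as np

def sherman_morrison(A_inv, u, v):
    # Inverse of (A - u v^T), given A_inv = A^{-1}, via the Sherman-Morrison formula.
    w = A_inv @ u                    # A^{-1} u
    z = v @ A_inv                    # v^T A^{-1}
    denom = 1.0 - v @ w              # 1 - v^T A^{-1} u, assumed nonzero
    return A_inv + np.outer(w, z) / denom

For the data of Example 6.5.2 below, sherman_morrison(A_inv, u, v) with u = v = e_1 reproduces the matrix displayed there.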
Example 6.5.2
Given
\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 4 & 5 \\ 6 & 7 & 8 \end{pmatrix}, \qquad
A^{-1} = \begin{pmatrix} -3 & -1 & 1 \\ 14 & 2 & -3 \\ -10 & -1 & 2 \end{pmatrix},
\]
find $(A - uv^T)^{-1}$, where $u = v = (1, 0, 0)^T$.
\[
\beta = \frac{1}{1 - v^T A^{-1} u} = \frac{1}{4},
\]
\[
A^{-1} + \beta A^{-1} u v^T A^{-1} =
\begin{pmatrix} -\tfrac{3}{4} & -\tfrac{1}{4} & \tfrac{1}{4} \\[2pt] \tfrac{7}{2} & -\tfrac{3}{2} & \tfrac{1}{2} \\[2pt] -\tfrac{5}{2} & \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}.
\]
Thus
\[
(A - uv^T)^{-1} =
\begin{pmatrix} -\tfrac{3}{4} & -\tfrac{1}{4} & \tfrac{1}{4} \\[2pt] \tfrac{7}{2} & -\tfrac{3}{2} & \tfrac{1}{2} \\[2pt] -\tfrac{5}{2} & \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}.
\]

6.5.3 Computing the Inverse of a Matrix


Computing the inverse of a matrix A is equivalent to solving the sets of linear systems:
Axi = ei ; i = 1; : : :; n:
These n linear systems can now be solved using any of the techniques discussed earlier. In practice, however, Gaussian elimination with partial pivoting will be used.
Since the matrix A is the same for all the systems, A has to be factorized only once. Taking this fact into account, and taking advantage of the special form of the right-hand side of each system, the computation of the inverse of A requires about $n^3$ flops.
We may also compute the inverse of A directly from any of the triangular factorizations (note that this process is completely equivalent to the process of finding $A^{-1}$ by solving the n linear systems $Ax_i = e_i$, $i = 1, \ldots, n$, differing only in the arrangement of the computations).
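The next sketch (Python/NumPy with SciPy, illustrative names) computes $A^{-1}$ by factoring A once with partial pivoting and then solving $Ax_i = e_i$ for every column $e_i$ of the identity; this illustrates the strategy just described, not a routine from the text.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_via_lu(A):
    # Factor A once (Gaussian elimination with partial pivoting),
    # then solve A x_i = e_i for all i at once.
    lu_piv = lu_factor(A)
    return lu_solve(lu_piv, np.eye(A.shape[0]))

In practice one would rarely form the inverse at all; as emphasized above, a linear system should be solved from the factorization directly.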
A. If Gaussian elimination with partial pivoting is used, we have
MA = U;
then
A;1 = U ;1M:
Recall from Chapter 5 that
M = Mn;1Pn;1    M2P2 M1 P1 ;
so that we have

Computing the Inverse From Partial


Pivoting Factorization

A;1 = U ;1(Mn;1Pn;1    M2P2 M1 P1 ):

292
B. If complete pivoting is used,
MAQ = U:
Then
A = M ;1 UQT :
(Note that Q;1 = QT .) Thus,

Computing the Inverse From Complete Pivoting


Factorization

A;1 = QU ;1M = (Q1Q2    Qn;1)U ;1(Mn;1Pn;1    M2P2 M1 P1):


C. If orthogonal factorization is used,
A = QR
and

Computing the Inverse From QR Factorization

A;1 = R;1 QT :
For reasons stated earlier, Gaussian elimination with partial pivoting should be used
in practice.

Remark: In practical computations, the structure of the elementary lower triangular ma-
trices Mi and the fact that Pi are permutation matrices should be taken into consideration
in forming the product M .
D. If A is a symmetric positive de nite matrix, then
A = HH T (the Cholesky Factorization)
A;1 = (H T );1H ;1 = (H ;1)T  H ;1

293
Computing the inverse of a symmetric positive de nite matrix A:
Step 1. Compute the Cholesky factorization A = HH T .
Step 2. Compute the inverse of the lower triangular matrix H : H ;1.
Step 3. Compute (H ;1)T H ;1.
E. A is a tridiagonal matrix
0a b 0
1 ja1j > jb1j
1 1
B
B c1 a2 . . .
CC ja2j > jb2j + jc1j
B
A=B ... ... b C
C;
B ..
@ n;1 CA .
0 cn;1 an janj > jcn;1j:
Then A has the bidiagonal factorization:
A = LU;
where L is lower bidiagonal and U is upper bidiagonal.
A;1 = U ;1 L;1

Example 6.5.3
Let
0 2 ;1 0 1
B C
A = B
@ ;1 2 ;1 CA
0 ;1 1
0 1 0 01
B ; 1 1 0 CC ;
L = B
@ 2 A
0 ; 23 1
0 2 ;1 0 1
B 0 3 ;1 CC
U = B
@ 2 A
0 0 1
3
01 1 1
101 0 0
1 0
1 1 1
1
B2 3
C B C B C
A ; 1= U ;1 L;1 = B
@0 2
3
C B 1 C B
2A@ 2 1 0A = @1 2 2C A:
0 0 3 1 2 1 1 2 3
3 3
A is tridiagonal and L and U are bidiagonals.
294
Example 6.5.4
Compute A;1 using partial pivoting when
00 1 11
A = B
B 1 2 3 CC
@ A
1 1 1
A;1 = U ;1 M = U ;1M2 P2 M1P1 :
Using the results of Example 5.2.4, we have
00 1 01
B C
M = M2P2 M1 P1 = B
@ 1 0 0 CA
1 ;1 1
0 1 ;2 1 1
B C
U ;1 = B
@ 0 1 1 CA :
0 0 ;1
So, 0 1 ;2 1 1 0 0 1 0 1 0 ;1 0 1 1
B CB C B C
A;1 = B
@ 0 1 1 CA B@ 1 0 0 CA = B@ 2 ;1 1 CA :
0 0 ; 1 1 ;1 1 ;1 1 ;1

6.5.4 Computing the Determinant of a Matrix


The determinant of a matrix A, denoted by det(A); is its nth leading principal minor pn. Thus,
the method for computing the leading principal minors to be described in the next section,
can in particular be used to compute the determinant of A. Furthermore, the ordinary Gaussian
elimination (with partial, complete or without pivoting) and the Householder triangularization
methods can also be applied to compute det(A).

Note: The determinant of a matrix is seldom needed in practice.


A. If Gaussian elimination without pivoting is used, we have
\[
A = LU, \qquad \det(A) = \det(L) \cdot \det(U).
\]
U is an upper triangular matrix, so $\det(U) = u_{11} u_{22} \cdots u_{nn} = a_{11} a_{22}^{(1)} \cdots a_{nn}^{(n-1)}$; L is a unit lower triangular matrix, so $\det(L) = 1$. Thus,

Computing det(A) from LU Factorization
\[
\det(A) = a_{11} a_{22}^{(1)} \cdots a_{nn}^{(n-1)} = \text{the product of the pivots.}
\]

B. If the Gaussian elimination with partial pivoting is used, we have


MA = U:
Then, det(M ) det(A) = det(U ).
Now $M = M_{n-1} P_{n-1} \cdots M_2 P_2 M_1 P_1$. Since the determinant of each of the elementary lower triangular matrices is 1 and the determinant of each of the permutation matrices is $\pm 1$, we have
\[
\det(M) = (-1)^r,
\]
where r is the number of row interchanges made during the pivoting process. So, we have
\[
\det(A) = \frac{1}{\det(M)} \det(U) = (-1)^r u_{11} u_{22} \cdots u_{nn} = (-1)^r a_{11} a_{22}^{(1)} \cdots a_{nn}^{(n-1)}.
\]

Computing det(A) from the MA = U Factorization
\[
\det(A) = (-1)^r a_{11} a_{22}^{(1)} \cdots a_{nn}^{(n-1)},
\]
where r is the number of interchanges.
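As an illustration, determinant computation from a partial-pivoting factorization can be sketched in Python/NumPy as follows: SciPy's lu_factor returns the factors and pivot indices, from which the number of interchanges r and the product of the pivots give det(A) as above. The helper name is ours.

import numpy as np
from scipy.linalg import lu_factor

def det_via_lu(A):
    lu, piv = lu_factor(A)                       # MA = U with partial pivoting
    r = np.sum(piv != np.arange(len(piv)))       # number of row interchanges
    return (-1.0) ** r * np.prod(np.diag(lu))    # (-1)^r times the product of the pivots

For a well-scaled matrix this agrees with numpy.linalg.det up to rounding; as the text notes, the determinant itself is seldom needed in practice.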


C. If the Gaussian elimination with complete pivoting is used, we have
MAQ = U:
So,
det(M )  det(A)  det(Q) = det(U ):
Let r and s be, respectively, the number of row and column interchanges. Then
det(M ) = (;1)r
det(Q) = (;1)s
Thus we state the following:

296
Computing det(A) from MAQ = U factorization

det(A) = (;1)r+s det(U )


= (;1)r+s a11a(1) (n;1)
22    ann ;

where r and s are the number of row and column interchanges.

Example 6.5.5
00 1 11
B 1 2 3 CC
A=B
@ A
1 1 1
A. Gaussian elimination with partial pivoting.
01 2 3 1
B C
U =B @ 0 1 1 CA
0 0 ;1
only one interchange; therefore r = 1. det(A) = (;1) det(U ) = (;1)(;1) = 1.
B. Gaussian elimination with complete pivoting.
03 2 11
B 2 1 CC
U =B @0 3 3 A
0 0 12
In the rst step, there were one row interchange and one column interchange.
In the second step, there are one row interchange and one column interchange. Thus r =
2; s = 2
det(A) = (;1)r+s det(U ) = (;1)43  23  12 = 1:

6.5.5 Computing The Leading Principal Minors of a Matrix


There are applications, such as nding the eigenvalue distribution in a given region of the complex
plane (see Datta and Datta (1986), Datta (1987), etc.), that require knowledge of the leading
principal minors of a matrix. Also, as we will see in Chapter 8, the leading principal minors are
important to the eigenvalue computations of a symmetric matrix where the signs of the leading
principal minors of a symmetric tridiagonal matrix are needed. We will discuss here a numerical
method for determining the leading principal minors of a matrix.
297
The Gaussian elimination with partial and complete pivoting and Householder
triangularization, in general, can not be used to obtain the leading principal minors
(unless, of course, one nds the factor matrices explicitly, computes all the minors of each of the
factor matrices and then uses the Binet-Cauchy theorem (Gantmacher, Theory of Matrices, Vol. I)
to compute the leading principal minors, which is certainly a very expensive procedure). However,
Wilkinson (AEP, pp. 237{296) has shown that the steps of the partial pivoting method and those
of the Givens triangularization method can be rearranged so that the modi ed procedures yield
the kth leading principal minor at the end of the kth step. We will describe here only the Givens
triangularization method.
The Givens Method for the Leading Principal Minors
The zeros are created in the positions (2,1); (3,1), (3,2); (4,1), (4,2) (4,3), etc., in order, by
applying successively the Givens rotations J (1; 2; ) to A; J (1; 3; ) to A(1) = J (1; 2; )A; J (2; 3; )
to A(2) = J (1; 3; )J (1; 2; )A; and so on.
The leading principal minor of order (k+1) is obtained at the end of the kth step as the product of the first (k+1) diagonal entries of the current matrix $A^{(k)}$:
\[
P_{k+1} = a_{11}^{(k)} a_{22}^{(k)} \cdots a_{k+1,k+1}^{(k)}.
\]
Of course, each $A^{(k)}$ can overwrite A. In this case we have
\[
P_{k+1} = a_{11} a_{22} \cdots a_{k+1,k+1}.
\]
Algorithm 6.5.1 Givens Method For the Leading Principal Minors
Given an n  n matrix A; the following algorithm computes the (i + 1)th principal minor at the
end of ith step.
For i = 2, ..., n do
    For j = 1, 2, ..., i-1 do
        Find c and s such that
        \[
        \begin{pmatrix} c & s \\ -s & c \end{pmatrix}
        \begin{pmatrix} a_{jj} \\ a_{ij} \end{pmatrix} =
        \begin{pmatrix} * \\ 0 \end{pmatrix}
        \]
        Overwrite A with $J(j, i, \theta) A$
    $P_i = a_{11} a_{22} \cdots a_{ii}$

298
Flop-count and stability. The algorithm requires about $4n^3/3$ flops, four times the expense of the Gaussian elimination method. The algorithm has guaranteed stability (Wilkinson, AEP, p. 246).
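A short Python/NumPy sketch of this Givens-based computation is given below; it forms the rotations explicitly and records $P_i$ from the current diagonal at the end of each step, which is adequate for illustration even though a careful implementation would apply each rotation only to the affected rows. Names are illustrative.

import numpy as np

def leading_principal_minors(A):
    # Returns [P_1, ..., P_n], zeroing positions in the order
    # (2,1); (3,1),(3,2); (4,1),(4,2),(4,3); ...
    A = np.array(A, dtype=float)
    n = A.shape[0]
    minors = [A[0, 0]]                               # P_1 = a_11
    for i in range(1, n):
        for j in range(i):
            r = np.hypot(A[j, j], A[i, j])
            if r != 0.0:
                c, s = A[j, j] / r, A[i, j] / r
                Aj, Ai = A[j, :].copy(), A[i, :].copy()
                A[j, :] = c * Aj + s * Ai            # rows j and i of J(j,i,theta) A
                A[i, :] = -s * Aj + c * Ai
        minors.append(np.prod(np.diag(A)[: i + 1]))  # P_{i+1} from the current diagonal
    return minors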
Example 6.5.6
01 0 01
B 1 1 0 CC
A=B
@ A
0 0 1
only one step needed
c = p1 ; s = p1
2 2
0 1 0 1
p12 p12
! 1 ! B 0:7071 0:7071 C 1 ! B 1:4142 C
=B CA = =B
@ 0 CA
; p12 p12 1 @ 1
;0:7071 0:7071 0
0 0:7071 0:7071 0 1 0 1 0 0 1 0 1:4142 0:7071 0 1
B CB C B C
A  J (1; 2; )A = B
@ ;0:7071 0:7071 0 CA B@ 1 1 0 CA = B@ 0 0:7071 0 CA
0 0 1 0 0 1 0 0 1
1st leading principal minor: p1 = a11 = 1
2nd leading principal minor: p2 = a11 a22 = 1:4142  0:7071 = 1:0000
3rd leading principal minor: p3 = a11a22a33 = 1.

6.6 Perturbation Analysis of the Linear System Problem


In practice the input data A and b may be contaminated by error. This error may be experimental,
may come from the process of discretization, etc. In order to estimate the accuracy of the computed
solution, the error in the data should be taken into account. As we have seen in Chapter 3, there
are problems whose solutions may change drastically even with small changes in the input data, and
this phenomenon of ill-conditioning is independent of the algorithms used to solve these problems.
We have discussed ill-conditioning of several problems in Chapter 3. Let's take another simple
example of an ill-conditioned linear system.
Consider the following linear system:
x1 + 2x2 = 3
2x1 + 3:999x2 = 5:999
299
The exact solution is x1 = x2 = 1. Now make a small perturbation in the right-hand side obtaining
the system:
x1 + 2x2 = 3
2x1 + 3:999x2 = 6
The solution of the perturbed system now, obtained by Gaussian elimination with pivoting (con-
sidered to be a stable method in practice) is:

x1 = 3; x2 = 0:

Thus, a very small change in the right hand side changed the solution altogether.
In this section we study the e ect of small perturbations of the input data A and b on the
computed solution x of the system Ax = b. This study is very useful. Not only will this help
us in assessing an amount of error in the computed solution of the perturbed system,
regardless of the algorithm used, but also, when the result of a perturbation analysis
is combined with that of backward error analysis of a particular algorithm, an error
bound in the computed solution by the algorithm can be obtained.
Since in the linear system problem Ax = b, the input data are A and b, there could be impurities
either in b or in A or in both. We will therefore consider the e ect of perturbations on the solution
x in each of these cases separately.

6.6.1 E ect of Perturbation in the Right-Hand Side Vector b


We assume here that there are impurities in b but the matrix A is exact.

Theorem 6.6.1 (Right Perturbation Theorem) If $\delta b$ and $\delta x$ are, respectively, the perturbations of b and x in the linear system Ax = b, and A is assumed to be nonsingular and $b \ne 0$, then
\[
\frac{1}{\mathrm{Cond}(A)} \frac{\|\delta b\|}{\|b\|} \;\le\; \frac{\|\delta x\|}{\|x\|} \;\le\; \mathrm{Cond}(A)\, \frac{\|\delta b\|}{\|b\|}.
\]
Proof. We have
$Ax = b$
and
$A(x + \delta x) = b + \delta b.$
The last equation can be written as
$Ax + A\,\delta x = b + \delta b,$
or
$A\,\delta x = \delta b, \quad \text{since } Ax = b,$
that is,
$\delta x = A^{-1}\delta b.$
Taking a subordinate matrix-vector norm we get
\[
\|\delta x\| \le \|A^{-1}\|\,\|\delta b\|. \tag{6.6.1}
\]
Again, taking the same norm on both sides of Ax = b, we get
\[
\|b\| = \|Ax\| \le \|A\|\,\|x\|. \tag{6.6.2}
\]
Combining (6.6.1) and (6.6.2), we have
\[
\frac{\|\delta x\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|\delta b\|}{\|b\|}. \tag{6.6.3}
\]
On the other hand, $A\,\delta x = \delta b$ gives
\[
\|\delta x\| \ge \frac{\|\delta b\|}{\|A\|}. \tag{6.6.4}
\]
Also, from Ax = b, we have
\[
\frac{1}{\|x\|} \ge \frac{1}{\|A^{-1}\|\,\|b\|}. \tag{6.6.5}
\]
Combining (6.6.4) and (6.6.5), we have
\[
\frac{\|\delta x\|}{\|x\|} \ge \frac{\|\delta b\|}{\|A\|\,\|A^{-1}\|\,\|b\|}.
\]
Recall from Chapter 3 that $\|A\|\,\|A^{-1}\|$ is the condition number of A and is denoted by Cond(A). The theorem is therefore proved.

Interpretation of Theorem 6.6.1
It is important to understand the implication of Theorem 6.6.1 quite well. Theorem 6.6.1 says that a relative change in the solution can be as large as Cond(A) multiplied by the relative change in the vector b. Thus, if the condition number is not too large, then a small perturbation in the vector b will have very little effect on the solution. On the other hand, if the condition number is large, then even a small perturbation in b might change the solution drastically.
Example 6.6.1 An ill-conditioned problem
\[
A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4.0001 & 2.002 \\ 1 & 2.002 & 2.004 \end{pmatrix}, \qquad
b = \begin{pmatrix} 4 \\ 8.0021 \\ 5.006 \end{pmatrix}.
\]
The exact solution is $x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$. Change b to $b' = \begin{pmatrix} 4 \\ 8.0020 \\ 5.0061 \end{pmatrix}$.
Then the relative change in b is
\[
\frac{\|b' - b\|}{\|b\|} = \frac{\|\delta b\|}{\|b\|} = 1.879 \times 10^{-5} \quad \text{(small)}.
\]
If we solve the system $Ax' = b'$, we get
\[
x' = x + \delta x = \begin{pmatrix} 3.0850 \\ -0.0436 \\ 1.0022 \end{pmatrix}.
\]
($x'$ is completely different from x.)
Note: $\dfrac{\|\delta x\|}{\|x\|} = 1.3461.$
It is easily verified that the inequality in Theorem 6.6.1 is satisfied:
\[
\mathrm{Cond}(A) \cdot \frac{\|\delta b\|}{\|b\|} = 4.4434.
\]
However, the predicted change is a considerable overestimate.
Example 6.6.2 A well-conditioned problem
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad
b = \begin{pmatrix} 3 \\ 7 \end{pmatrix}.
\]
The exact solution is $x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Let $b' = b + \delta b = \begin{pmatrix} 3.0001 \\ 7.0001 \end{pmatrix}$.
The relative change in b:
\[
\frac{\|b' - b\|}{\|b\|} = 1.875 \times 10^{-5} \quad \text{(small)},
\qquad \mathrm{Cond}(A) = 14.9330 \quad \text{(small)}.
\]
Thus a drastic change in the solution x is not expected. In fact, $x'$ satisfying $Ax' = b'$ is
\[
x' = \begin{pmatrix} 0.9999 \\ 1.0001 \end{pmatrix} \approx x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
Note: $\dfrac{\|\delta x\|}{\|x\|} = 10^{-5}.$
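The effect described by Theorem 6.6.1 is easy to reproduce numerically. The Python/NumPy sketch below uses the data of Example 6.6.1 and checks that the computed relative change in x, although large, stays below Cond(A) times the relative change in b (the variable names are ours).

import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0001, 2.002],
              [1.0, 2.002, 2.004]])
b  = np.array([4.0, 8.0021, 5.006])
bp = np.array([4.0, 8.0020, 5.0061])     # perturbed right-hand side

x  = np.linalg.solve(A, b)               # close to (1, 1, 1)
xp = np.linalg.solve(A, bp)

rel_db = np.linalg.norm(bp - b) / np.linalg.norm(b)
rel_dx = np.linalg.norm(xp - x) / np.linalg.norm(x)
bound  = np.linalg.cond(A) * rel_db      # Cond(A) * ||db|| / ||b||
print(rel_db, rel_dx, bound)             # rel_dx is large, but still below the bound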
6.6.2 E ect of Perturbation in the matrix A
Here we assume that there are impurities in A only and as a result we have A + A in hand, but
b is exact.

Theorem 6.6.2 (Left Perturbation Theorem) Assume A is nonsingular and $b \ne 0$. Suppose that $\Delta A$ and $\delta x$ are, respectively, the perturbations of A and x in the linear system
$Ax = b.$
Furthermore, assume that $\Delta A$ is such that $\|\Delta A\| < \dfrac{1}{\|A^{-1}\|}$. Then
\[
\frac{\|\delta x\|}{\|x\|} \;\le\; \frac{\mathrm{Cond}(A)\,\dfrac{\|\Delta A\|}{\|A\|}}{1 - \mathrm{Cond}(A)\,\dfrac{\|\Delta A\|}{\|A\|}}.
\]

Proof. We have
(A + A)(x + x) = b;
or
(A + A)x + (A + A)x = b: (6.6.6)

304
Since
Ax = b;
we have from (6.6.6)
(A + A)x = ;Ax (6.6.7)
or
x = ;A;1 A(x + x): (6.6.8)
Taking the norm on both sides, we have
kxk  kA;1k kAk  (kxk + kxk) (6.6.9)
= kA k kkAAkk kAk (kxk + kxk)
;1

that is,  kAk;1 kAk kAk 


1; k x k  kAk kA;1k kAk kxk: (6.6.10)
kAk kAk
Since
kA;1k kAk < 1;
the expression under parenthesis of the left hand side is positive. We can thus divide both sides of
the inequality by this number without changing the inequality. After this, if we also divide by kxk,
we obtain
kxk  kAkkA;1k kAk
kAk
kxk (1 ; kAkkA;1k kAk ) (6.6.11)
kAk
k
= Cond(A) kAk (1 ; Cond(A) kkAAkk )
Ak
which proves the theorem.

Remarks: Because of the assumption that $\|\Delta A\| < \dfrac{1}{\|A^{-1}\|}$ (which is quite reasonable), the denominator on the right-hand side of the inequality in Theorem 6.6.2 is positive and less than one. Thus, even if $\dfrac{\|\Delta A\|}{\|A\|}$ is small, there could be a drastic change in the solution if Cond(A) is large.
Example 6.6.3
Consider the previous example once more. Change a2;3 to 2.0001; keep b xed. Thus
00 0 01
B C
A = ;10;4 B@ 0 0 1 CA (small):
0 0 0
305
Now solve the system: (A + A)x0 = b :
0 ;1:0002 1
B 2:0002 CC
x0 = B
@ A
0:9998
0 ;2:0002 1
x = x0 ; x = B
B 1:0002 CC
@ A
;0:0003
Relative Error = kkx
xk
k = 1:2911 (quite large).

6.6.3 E ect of Perturbations in both the matrix A and the vector b


Finally, we assume now that both the input data A and b have impurities. As a result we have the
system with A + A as the matrix and b + b as the right hand side vector.

Theorem 6.6.3 (General Perturbation Theorem) Assume that A is nonsingular, $b \ne 0$, and $\|\Delta A\| < \dfrac{1}{\|A^{-1}\|}$. Then
\[
\frac{\|\delta x\|}{\|x\|} \;\le\; \left( \frac{\mathrm{Cond}(A)}{1 - \mathrm{Cond}(A)\,\dfrac{\|\Delta A\|}{\|A\|}} \right) \left( \frac{\|\Delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right).
\]

Proof. Subtracting
Ax = b
from
(A + A)(x + x) = b + b
we have
(A + A)(x + x) ; Ax = b
or
(A + A)(x + x) ; (A + A)x + (A + A)x ; Ax = b
or
(A + A)(x) ; Ax = b

306
or
A(I ; A;1(;A))x = b + Ax: (6.6.12)
Let A;1 (;A) = F . Then
kF k = kA;1(;A)k  kA;1k kAk < 1 (by assumption):
Since kF k < 1, I ; F is invertible, and
k(I ; F );1k  1 ;1kF k (see Chapter 1, Section 1.7, Theorem 1.7.7): (6.6.13)

>From (6.6.12), we then have


x = (I ; F );1 A;1(b + Ax)
or
kxk  1k;A kFkk (kbk + kAk kxk)
;1

or
kxk  kA;1k   kbk + kAk (6.6.14)
kxk (1 ; kF k) kxk
;1  
 (1k;A kFkk) kbkkbkkAk + kAk (Note that kx1k  kkAbkk ).
That is,
kxk  kA;1k kAk  kbk + kAk  : (6.6.15)
kxk (1 ; kF k) kbk kAk
Again
kF k = kA;1(;A)k  kA;1k kAk = kA kAkkkAk  kAk:
;1
(6.6.16)
Since kF k  1, we can write from (6.6.15) and (6.6.16)
0 1 0 1
kxk  B
BB kA;1k kAk C
C  kbk kAk  BB Cond(A) CC  kbk kAk 
C +
kxk @ (1 ; ( kA;1k kAk )  kAk) A kbk kAk = B
@ (1 ; Cond(A)  kAk) CA kbk + kAk :
kAk kAk
(6.6.17)

Remarks: We again see from (6.6.17) that even if the perturbations kkbbkk and kkAAkk are small,
there might be a drastic change in the solution, if Cond(A) is large. Thus, Cond(A) plays the
crucial role in the sensitivity of the solution.

307
De nition 6.6.1 Let A be a nonsingular matrix. Then
Cond(A) = kAk  kA;1k:

A Convention
Unless otherwise stated, when we write Cond(A), we will mean Cond2 (A),
that is, the condition number with respect to 2-norm. The condition number
of a matrix A with respect to a subordinate p norm (p = 1; 2; 1) will be denoted by
Condp (A), that is, Cond1 (A) will stand for the condition number of A with respect
to 1-norm, etc.

6.7 The Condition Number and Accuracy of Solution


The following are some important (but easy to prove) properties of the condition number of a
matrix.
I. Cond(A) with respect to any p-norm is at least 1.
II. If A is an orthogonal matrix, then Cond(A) with respect to the 2-norm is 1.
(Note that this property of an orthogonal matrix A makes the matrix so attractive
for its use in numerical computations.)
III. Cond(AT A) = (Cond(A))2.
IV. Cond(A) = Cond(AT ).
V. Cond(AB )  Cond(A)Cond(B ).
VI. Cond( A) = Cond(A), where is a nonzero scalar.
VII. Cond(A)  j1j=jnj, where j1j  j2j  : : :  jnj, and 1; : : :; n are the eigenvalues of A.
VIII. Cond2 (A) = 1 =n, where 1  2      n are the singular values of A.
We now formally de ne the ill-conditioning and well-conditioning in terms of the condition
number.
De nition 6.7.1 The system Ax = b is ill-conditioned if Cond(A) is quite large. Otherwise, it
is well-conditioned.
Remarks: Though the condition number, as defined above, is norm-dependent, the condition numbers with respect to two different norms are related (see Golub and Van Loan MC, 1984, p. 26). For example, it can be shown that if A is an $n \times n$ matrix, then
\[
\frac{1}{n} \;\le\; \frac{\mathrm{Cond}_2(A)}{\mathrm{Cond}_1(A)} \;\le\; n.
\]
In general, if a matrix is well-conditioned (or ill-conditioned) with respect to one norm, it is also well-conditioned (or ill-conditioned, respectively) with respect to other norms.
Example 6.7.1
(a) Consider
1 0:9999
!
A = ;
0:9999 1
5:0003 ;4:99997
!
;
A = 10
1 3 :
;4:9997 5:0003
1. The condition numbers with respect to the in nity norm and 1-norm:
kAk1 = kAk1 = 1:9999
kA;1k1 = kA;1k1 = 104
Cond1 (A) = Cond1 (A) = 1:9999  104

2. The condition number with respect to the 2-norm


q
kAk2 = (A) = 1:9999
q
kA;1k2 = (A;1) = 104
Cond2 (A) = 1:9999  104:

Remark: For the above example, it turned out that the condition number with respect to any
norm is the same. This is, however, not always the case. In general, however, they are closely
related. (See below the condition number of the Hilbert matrix with respect to di erent
norms.)
6.7.1 Some Well-known Ill-conditioned Matrices
1. The Hilbert matrix
\[
A = \begin{pmatrix}
1 & \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{n} \\
\frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \cdots & \frac{1}{n+1} \\
\vdots & & & & \vdots \\
\frac{1}{n} & \frac{1}{n+1} & \cdots & \cdots & \frac{1}{2n-1}
\end{pmatrix}.
\]
For n = 10: $\mathrm{Cond}_2(A) = 1.6025 \times 10^{13}$, $\mathrm{Cond}_1(A) = 3.5353 \times 10^{13}$, $\mathrm{Cond}_\infty(A) = 3.5353 \times 10^{13}$.
2. The Pei matrix A with $a_{ii} = \alpha$, $a_{ij} = 1$ for $i \ne j$. The matrix becomes ill-conditioned when $\alpha$ is close to 1 or to $1 - n$. For example, when $\alpha = 0.9999$ and n = 5, $\mathrm{Cond}(A) = 5 \times 10^4$.
3. The Wilkinson bidiagonal matrix of order 20 (see Chapter 3):

6.7.2 E ect of The Condition Number on Accuracy of the Computed Solution


Once a solution $\hat{x}$ of the system Ax = b has been computed, it is natural to ask how accurate the computed solution $\hat{x}$ is. If the exact solution x is known, then one could of course compute the relative error $\|x - \hat{x}\|/\|x\|$ to test $\hat{x}$. However, in most practical situations the exact solution is not known. In such cases, the most obvious thing to do is to compute the residual $r = b - A\hat{x}$ and see how small the relative residual $\|r\|/\|b\|$ is. Interestingly, we should note that the solution obtained by the Gaussian elimination process in general produces a small residual. (WHY?) Unfortunately, a small relative residual does not guarantee the accuracy of the solution. The following example illustrates this fact.
Example 6.7.2
Let
\[
A = \begin{pmatrix} 1.0001 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
b = \begin{pmatrix} 2.0001 \\ 2 \end{pmatrix}.
\]
Let
\[
\hat{x} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}.
\]
Then
\[
r = b - A\hat{x} = \begin{pmatrix} 0.0001 \\ 0 \end{pmatrix}.
\]
Note that r is small. However, the vector $\hat{x}$ is nowhere close to the exact solution $x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
The above phenomenon can be explained mathematically from the following theorem. The
proof can be easily worked out.

Theorem 6.7.1 (Residual Theorem)
\[
\frac{\|\hat{x} - x\|}{\|x\|} \;\le\; \mathrm{Cond}(A)\,\frac{\|r\|}{\|b\|}.
\]

Interpretation of Theorem 6.7.1


Theorem 6.7.1 tells us that the relative error in x^ does not depend only on the relative residual
but also on the condition number of the matrix A as well. A computed solution can be
guaranteed to be accurate only when the product of both Cond(A) and the relative
residual is small. Note that in the above example,
Cond(A) = 4:0002  104: (large!)

6.7.3 How Large Must the Condition Number be for Ill-Conditioning?


A frequently asked question is: how large Cond(A) has to be before the system Ax = b is
considered to be ill-conditioned. We restate Theorem 6.6.3 to answer the question.

Theorem 6.6.3 (Restatement of Theorem 6.6.3)


\[
\frac{\|\delta x\|}{\|x\|} \;\le\; \left( \frac{\mathrm{Cond}(A)}{1 - \mathrm{Cond}(A)\,\dfrac{\|\Delta A\|}{\|A\|}} \right) \left( \frac{\|\Delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right).
\]
Suppose for simplicity that
\[
\frac{\|\Delta A\|}{\|A\|} = \frac{\|\delta b\|}{\|b\|} = 10^{-d}.
\]
Then $\dfrac{\|\delta x\|}{\|x\|}$ is approximately less than or equal to $2 \cdot \mathrm{Cond}(A) \cdot 10^{-d}$.
This says that if the data have a relative error of $10^{-d}$ and if the relative error in the solution has to be guaranteed to be less than or equal to $10^{-t}$, then Cond(A) has to be less than or equal to $\frac{1}{2} \cdot 10^{d-t}$. Thus, whether a system is ill-conditioned or well-conditioned depends on the accuracy of the data and how much error in the solution can be tolerated.
For example, suppose that the data have a relative error of about $10^{-5}$ and an accuracy of about $10^{-3}$ is sought; then $\mathrm{Cond}(A) \le \frac{1}{2} \cdot 10^{2} = 50$. On the other hand, if an accuracy of about $10^{-2}$ is sought, then $\mathrm{Cond}(A) \le \frac{1}{2} \cdot 10^{3} = 500$. Thus, in the first case the system will be well-conditioned if Cond(A) is less than or equal to 50, while in the second case the system will be well-conditioned if Cond(A) is less than or equal to 500.
Estimating Accuracy from the Condition Number
In general, if the data are approximately accurate and if $\mathrm{Cond}(A) = 10^s$, then there will be only about $t - s$ significant digits of accuracy in the computed solution when the solution is computed in t-digit arithmetic.
For better understanding of conditioning, stability and accuracy, we again refer the readers to
the paper of Bunch (1987).

6.7.4 The Condition Number and Nearness to Singularity


The condition number also gives an indication when a matrix A is computationally close to a
singular matrix: if Cond(A) is large, A is close to singular.
This measure of nearness to singularity is a more accurate measure than the determinant of A.
For example, consider the well-known $n \times n$ triangular matrix
\[
A = \begin{pmatrix}
1 & -1 & -1 & \cdots & -1 \\
0 & 1 & -1 & \cdots & -1 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1 & -1 \\
0 & \cdots & \cdots & 0 & 1
\end{pmatrix}.
\]
The matrix has determinant equal to 1; however, it is nearly singular for large n:
\[
\mathrm{Cond}_1(A) = n\,2^{n-1}.
\]
Similarly, the smallness of the determinant of a matrix does not necessarily mean that A is close to a singular matrix. Consider $A = \mathrm{diag}(0.1, 0.1, \ldots, 0.1)$ of order 1000. Then $\det(A) = 10^{-1000}$, which is a very small number. However, A is perfectly nonsingular, because $\mathrm{Cond}_2(A) = 1$.

6.7.5 Conditioning and Pivoting


It is natural to wonder if ill-conditioning can be detected during the triangularization process using
the Gaussian elimination with partial pivoting. By a normalized matrix here we mean that
kAk2 = 1. Suppose that A; and b have been normalized. Then there are certain symptoms
for ill-conditioning.
Symptoms for Ill-Conditioning
1. A small pivot,
2. A large computed solution,
3. A large residual vector, etc.
Justification: Suppose there is a small pivot; then M in the triangular factorization of A will be large (see Algorithm 5.2.3 or Algorithm 5.2.4), and this large M will make $A^{-1}$ large. (Note that if partial pivoting is used, then $A^{-1} = U^{-1}M$.) Similarly, if the computed solution $\hat{x}$ is large, then from $A\hat{x} = \hat{b}$ we have $\|\hat{x}\| = \|A^{-1}\hat{b}\| \le \|A^{-1}\|\,\|\hat{b}\|$, showing that $\|A^{-1}\|$ is possibly large. A large $\|A^{-1}\|$, of course, means ill-conditioning, because $\mathrm{Cond}(A) = \|A\|\,\|A^{-1}\|$ will then be large.
Remark: There are matrices which do not show any of these symptoms and still are ill-conditioned (see Wilkinson AEP, pp. 254-255).
6.7.6 Conditioning and the Eigenvalue Problem
If a normalized matrix A has a small eigenvalue, it must be ill-conditioned. For, if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A, then it can be shown that
\[
\|A^{-1}\|_2 \;\ge\; \max_i |\text{eigenvalue of } A^{-1}| \;=\; \frac{1}{\min_i |\lambda_i|}.
\]
Thus, a normalized matrix A is ill-conditioned if and only if $\|A^{-1}\|_2$ is large. (See Wilkinson AEP, p. 195.)

313
Example 6.7.3
Consider the linear system
Ax = b
with
01 0 0
1
B 0 0:00001
A = B 0 C
C
@ A
0 0 0:00001
0 0:1 1
B CC
b = B
@ 0:1 A
0:1
0 0:00001 1
B C
x = 104 B
@ 1 CA :
1
which is quite large.
The eigenvalues of A are 1, 0.00001 and 0.00001. Thus, A has a small eigenvalue.
0 0:00001 1 0 1
B C
A;1 = 105 B@ 0 1 0C A;
0 0 1
which is large. Thus, for this example (i) the computed solution is large, (ii) A has a small
eigenvalue, and (iii) A;1 is large. A is, therefore, likely to be ill-conditioned. It is indeed true:
Cond(A) = 105:

6.7.7 Conditioning and Scaling


In Section 6.4.8 we discussed scaling and the message there was \scaling is in general rec-
ommended if the entries of the matrix A vary widely". Scaling followed by a strategy of
pivoting is helpful. One thing that we did not make clear is that scaling has some e ect on the
condition number of the matrix. For example, consider again the example of Section 6.4.8:
\[
A = \begin{pmatrix} 10 & 10^6 \\ 1 & 1 \end{pmatrix}, \qquad \mathrm{Cond}(A) = 10^6.
\]
However, if the first row of A is scaled to obtain
\[
\tilde{A} = \begin{pmatrix} 0.00001 & 1 \\ 1 & 1 \end{pmatrix},
\]
then
\[
\mathrm{Cond}(\tilde{A}) = 2.
\]
The question naturally arises, \Given a matrix A how can one choose the diagonal
matrices D1 and D2 such that Cond(D1;1AD2) will be as small as possible?"
There is an (almost) classical solution due to Bauer (1963) for the above problem. Unfortunately
the solution is not practical. For example the in nity-norm solution requires knowing the eigenvalue
of maximum modulus and the corresponding eigenvector of the nonnegative matrix C = jAjjA;1j,
and to solve Ax = b, we will not know A;1 in advance. For details of the method, see Forsythe and
Moler (CSLAS, pp. 43{44).

Remark: Demmel (1989) has shown that scaling to improve the condition number is not
necessary when solving a symmetric positive de nite system using the Cholesky algo-
rithm. The error bound obtained for the solution by the algorithm for the unscaled system Ax = b
is almost the same as that of the scaled system with A~ = D;1AD;1 , D = diag(pa11; : : :; pann).

6.7.8 Computing and Estimating the Condition Number


The obvious way to compute the condition number will be to compute it from its de nition:
1. Compute A;1
2. Compute kAk; kA;1k; and multiply them.
We have seen that computing the inverse of A requires about $n^3$ flops. Thus, this approach is three times as expensive as finding the solution of Ax = b itself. On the other hand, to compute Cond(A) we only need to know $\|A^{-1}\|$, not the inverse itself. Furthermore, the exact value of Cond(A) itself is seldom needed; an estimate is sufficient. The question, therefore, arises whether we can get a reasonable estimate of $\|A^{-1}\|$ without computing the inverse of A explicitly.
In this context, we note that if y is any nonzero n-vector, then from
$Az = y$
we have
$\|z\| = \|A^{-1}y\| \le \|A^{-1}\|\,\|y\|.$
Then
\[
\|A^{-1}\| \ge \frac{\|z\|}{\|y\|}, \qquad y \ne 0.
\]
Thus, if we choose y such that $\|z\|/\|y\|$ is quite large, we could have a reasonably good estimate of $\|A^{-1}\|$. Rice (MCMS, p. 93) remarks: there is a heuristic argument which says that if y is picked at random, then the expected value of $\|z\|/\|y\|$ is about $\frac{1}{2}\|A^{-1}\|$.
A systematic way to choose y has been given by the Linpack Condition Number Estimator
(LINPACK (1979)). It is based on an algorithm by Cline, Moler, Stewart and Wilkinson (1979).
The process involves solving two systems of equations
AT y = e
and
Az = y;
where e is a scalar multiple of a vector with components 1 chosen in such a way that the possible
growth is maximum.
To avoid overflow, the LINPACK condition estimator routine SGECO actually computes an estimate of $\dfrac{1}{\mathrm{Cond}(A)}$, called
\[
\mathrm{RCOND} = \frac{\|y\|}{\|A\|\,\|z\|}.
\]
The procedure for finding RCOND, therefore, can be stated as follows:
1. Compute $\|A\|$.
2. Solve $A^T y = e$ and $Az = y$, choosing e such that the growth is maximum (see LINPACK (1979) for the details of how to choose e).
3. $\mathrm{RCOND} = \dfrac{\|y\|}{\|A\|\,\|z\|}$.

Flop-count. Once A has been triangularized to solve a linear system involving A, the actual cost of estimating Cond(A) by the above procedure is quite cheap. The same triangularization can be used to solve both systems in step 2. Also, the $\ell_1$ vector norm can be used, so that the subordinate matrix norm can be computed from the columns of the matrix A. The process of estimating Cond(A) in this way requires only $O(n^2)$ flops.
Round-off error. According to LINPACK (1979), ignoring the effects of round-off error, it can be proved that
\[
\frac{1}{\mathrm{RCOND}} \le \mathrm{Cond}(A).
\]
In the presence of round-off error, if the computed RCOND is not zero, $\dfrac{1}{\mathrm{RCOND}}$ is almost always a lower bound for the true condition number.
An Optimization Technique for Estimating kA;1k1
Hager (1984) has proposed a method for estimating kA;1 k based on an optimization technique.
This technique seems to be quite suitable for randomly generated matrices. Let A;1 = B = (bij ).
Define a function f(x):
\[
f(x) = \|Bx\|_1 = \sum_{i=1}^{n} \left| \sum_{j=1}^{n} b_{ij} x_j \right|.
\]
Then
\[
\|B\|_1 = \|A^{-1}\|_1 = \max\{ f(x) : \|x\|_1 = 1 \}.
\]
Thus, the problem is to find the maximum of the convex function f over the convex set
\[
S = \{ x \in \mathbb{R}^n : \|x\|_1 \le 1 \}.
\]
It is well known that the maximum of a convex function is attained at an extreme point. Hager's method consists in finding this maximum systematically. We present the algorithm below (for details see Hager (1984)). Hager remarks that the algorithm usually stops after two iterations. An excellent survey of different condition number estimators, including Hager's, and their performance has been given by Higham (1987).

Algorithm 6.7.1 Hager's norm-1 condition number estimator
Set $\gamma = 0$ (the current estimate of $\|A^{-1}\|_1$).
Set $b = \left( \dfrac{1}{n}, \dfrac{1}{n}, \ldots, \dfrac{1}{n} \right)^T$.
1. Solve Ax = b.
2. Test if $\|x\|_1 \le \gamma$. If so, go to step 6. Otherwise set $\gamma = \|x\|_1$ and go to step 3.
3. Solve $A^T z = y$, where
   $y_i = 1$ if $x_i \ge 0$, and $y_i = -1$ if $x_i < 0$.
4. Set $j = \arg\max\{ |z_i| : i = 1, \ldots, n \}$.
5. If $|z_j| > z^T b$, update $b \leftarrow e_j$ (the vector with 1 in the jth entry and 0 elsewhere) and return to step 1. Else go to step 6.
6. Set $\|A^{-1}\|_1 \approx \gamma$. Then $\mathrm{Cond}_1(A) \approx \|A\|_1 \gamma$.
(William Hager is a professor of mathematics at the University of Florida. He is the author of the book Applied Numerical Linear Algebra.)
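A direct transcription of Hager's estimator into Python/NumPy is sketched below; for simplicity it reuses a dense LU factorization through SciPy and caps the number of sweeps, choices that are ours rather than Hager's.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def hager_norm1_inverse(A, max_iter=5):
    # Estimates ||A^{-1}||_1 (Algorithm 6.7.1); Cond_1(A) ~ ||A||_1 * estimate.
    n = A.shape[0]
    lu_piv = lu_factor(A)                  # factor once; reuse for A and A^T solves
    b = np.full(n, 1.0 / n)
    gamma = 0.0
    for _ in range(max_iter):
        x = lu_solve(lu_piv, b)            # step 1: solve A x = b
        if np.linalg.norm(x, 1) <= gamma:  # step 2
            break
        gamma = np.linalg.norm(x, 1)
        y = np.where(x >= 0, 1.0, -1.0)    # step 3: solve A^T z = y
        z = lu_solve(lu_piv, y, trans=1)
        j = int(np.argmax(np.abs(z)))      # step 4
        if np.abs(z[j]) <= z @ b:          # step 5: stopping test
            break
        b = np.zeros(n); b[j] = 1.0        # b <- e_j
    return gamma

# Cond_1 estimate:  hager_norm1_inverse(A) * np.linalg.norm(A, 1)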
Example 6.7.4
We illustrate Hager's method by means of a very ill-conditioned matrix.
01 2 31
A=B
B 3 4 5 CC ; Cond(A) = 3:3819  1016:
@ A
6 7 8
Iteration 1:
011
B
B
3
C:
b = @ 13 C
A
1
3
0 1:0895 1
B C
x = B
@ ;2:5123 CA
1:4228
 = 5:0245
011
BB CC
y = @ ;1 A
1
0 2:0271 1
z =
B ;3:3785 CC
1016 B
@ A
1:3514
j = 2
jz2j = 1016(3:3785) > zT b = ;1:3340:

318
Update 001
B CC
bB
@1A:
0
Iteration 2:
001
B 1 CC ;
b = B
@ A
0
0 ;1:3564 1
B CC
x = 1017 B
@ 2: 7128 A;
;1:3564
kxk1 = 5:4255  1017:
Since kxk1 > , we set  = 5:4255  1017.

Comment. It turns out that this current value is an excellent estimate of kA;1k1.
Condition Estimation from Triangularization
If one uses the QR factorization to solve a linear system or the Cholesky factorization to solve
a symmetric positive de nite system, then as a by-product of the triangularization, one can obtain
an upper bound of the condition number with just a little additional cost.
If QR factorization with column pivoting is used, then from
R
!
QT AP = ;
0
we have Cond2(A) = Cond2(R).
If the Cholesky factorization is used then from
A = HH T ;
we have Cond2 (A) = (Cond2(H ))2. Thus, the Cond2 (A) can be determined if Cond2(R) or
Cond2 (H ) is known. Since kRk2 or kH k2 is easily computed, all that is needed is an algorithm
to estimate kR;1k2 or kH ;1k2. There are several algorithms for estimation of the norms of the
inverses of triangular matrices. We just state one from a paper of Higham (1987). For details, see
Higham (1987).
Algorithm 6.7.2 Condition Estimation of an Upper Triangular Matrix
Given a nonsingular upper triangular matrix $T = (t_{ij})$ of order n, the following algorithm computes CE such that $\|T^{-1}\|_\infty \le CE$.
1. Set $z_n = \dfrac{1}{|t_{nn}|}$.
2. For $i = n-1$ down to 1 do
       $s \leftarrow 1$
       $s \leftarrow s + |t_{ij}|\,z_j$  $(j = i+1, \ldots, n)$
       $z_i = \dfrac{s}{|t_{ii}|}$
3. Compute $CE = \|z\|_\infty$, where $z = (z_1, z_2, \ldots, z_n)^T$.

Flop-count. The algorithm requires about $\dfrac{n^2}{2}$ flops.

Remark: Once kT ;1k1 is estimated by the above algorithm, kT ;1k2 can be estimated from the
relation:
1
kT ;1k2  (kM (T );1k1CE2 ;
where M (T ) = (mij ) are de ned by:
8
< jt j; i = j
mij = : ii
;jtij j; i =6 j:
kM (T );1k1 can be estimated by using Hager's algorithm described in the last section.

6.8 Component-wise Perturbations and the Errors


If the component-wise bounds of the perturbations are known, then the following perturbation
result obtained by Skeel (1979) holds.

Theorem 6.8.1 (Skeel (1979)) Let $Ax = b$ and $(A + \Delta A)(x + \delta x) = b + \delta b$. Let
$|\Delta A| \le \epsilon |A|$, $|\delta b| \le \epsilon |b|$. Then
\[
\frac{\|\delta x\|}{\|x\|} \;\le\; \frac{\epsilon\, \big\|\,|A^{-1}|\,|A|\,|x| + |A^{-1}|\,|b|\,\big\|}{\big(1 - \epsilon\,\|\,|A^{-1}|\,|A|\,\|\big)\,\|x\|}.
\]

Robert Skeel is a professor of Computer Science at the University of Illinois at Urbana-Champaign.

320
Definition 6.8.1 We shall call the number
\[
\mathrm{Cond}(A, x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|}{\|x\|}
\]
Skeel's condition number, and $\mathrm{Cond}_s(A) = \|\,|A^{-1}|\,|A|\,\|$ the upper bound of Skeel's condition number.
An important property of Cond(A; x): Skeel's condition number is invariant un-
der row-scaling. It can, therefore, be much smaller than the usual condition number Cond(A).
Cond(A; x) is useful when the column norms of A;1 vary widely.
Chandrasekaran and Ipsen (1994) have recently given an analysis of how do the individual
components of the solution vector x get a ected when the data is perturbed. Their analysis can
e ectively be combined with the Skeel's result above, when the component-wise perturbations of
the data are known.. We give an example.

Theorem 6.8.2 (Component-wise Perturbation Theorem).


Let $(A + \Delta A)(x + \delta x) = b$, and $|\Delta A| \le \epsilon |A|$. Then
\[
\frac{|\delta x_i|}{|x_i|} \;\le\; \epsilon\, \frac{|r_i^T|\,|A|\,|x|}{|x_i|},
\]
where $r_i^T = e_i^T A^{-1}$.

Thus, the component-wise perturbations in the error expressions have led to the componenet-
wise version of Skeel's condition number. Similar results also hold for right hand side perturbation.
For details see Chandrasekaran and Ipsen (1994).

6.9 Iterative Re nement


Suppose a computed solution x^ of the system Ax = b is not acceptable. It is then natural to wonder
if x^ can be re ned cheaply by making use of the triangularization of the matrix A already available
at hand.
The following process, known as iterative re nement, can be used to re ne x^ iteratively up
to some desirable accuracy.
Iterative Re nement Algorithm
The process is based on the following simple idea:
Let xb be a computed solution of the system
Ax = b:
321
If xb were exact solution, then
r = b ; Axb
would be zero. But in practice we shall not expect that. Let us now try to solve the system again
with the computed residual r(6= 0), that is, let c satisfy
Ac = r:
Then, y = xb + c is the exact solution of Ax = b, provided that c is the exact solution of Ac = r,
because
Ay = A(xb + c) = Axb + Ac = b ; r + r = b:
It is true that c again will not be an exact solution of Ac = r in practice; however, the above
discussion suggests that y might be a better approximation than xb. If so, we can continue the
process until a desired accuracy is achieved.
Algorithm 6.9.1 Iterative Re nement
Set x(0) = x^.
For k = 0; 1; 2; : : : do
1. Compute the residual vector r(k):

r(k) = b ; Ax(k) :
2. Calculate the correction vector c(k) by solving the system:
Ac(k) = r(k);
using the triangularization of A obtained to get the computed solution x^.
3. Form x(k+1) = x(k) + c(k).
4. If $\dfrac{\|x^{(k+1)} - x^{(k)}\|_2}{\|x^{(k)}\|_2}$ is less than a prescribed tolerance $\epsilon$, stop.

Remark: If the system is well-conditioned, then iterative refinement using Gaussian elimination with pivoting will ultimately produce a very accurate solution.
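A minimal Python/NumPy sketch of Algorithm 6.9.1 is given below. It reuses a single LU factorization (SciPy's lu_factor) for the initial solve and every correction step; the tolerance, iteration cap and names are illustrative choices. In a careful implementation the residual $b - Ax^{(k)}$ would be accumulated in higher precision.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_refinement(A, b, tol=1e-12, max_iter=10):
    lu_piv = lu_factor(A)                # triangularization reused throughout
    x = lu_solve(lu_piv, b)              # computed solution x^(0)
    for _ in range(max_iter):
        r = b - A @ x                    # step 1: residual
        c = lu_solve(lu_piv, r)          # step 2: correction from A c = r
        x_new = x + c                    # step 3
        if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x):   # step 4
            return x_new
        x = x_new
    return x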
Example 6.9.1

322
01 1 01 0 0:0001 1
A=B
B 0 2 1 CC ; b = BB 0:0001 CC :
@ A @ A
0 0 3 ;1:666
0 ;0:2777 1
B 0:2778 C
The exact solution x = B C
@ A (correct up to four gures).
;0:5555
011
B CC
x(0) = B
@1A:
1
k=0: 0 ;1:9999 1
B C
r(0) = b ; Ax(0) = B
@ ;2:9999 CA :
;4:6666
The solution of Ac(0) = r(0) is
0 ;1:2777 1
B C
c(0) = B
@ ;0:7222 CA
;1:5555
0 ;0:2777 1
x(1)
B 0:2778 CC :
= x(0) + c(0) = B
@ A
;0:5555
Note that Cond(A) = 3:8078. A is well-conditioned.
Accuracy Obtained by Iterative Refinement
Suppose that the iteration converges. Then the error at the (k+1)th step will be less than the error at the kth step.
Relative Accuracy from Iterative Refinement
Let
\[
\frac{\|\hat{x} - x^{(k+1)}\|}{\|\hat{x}\|} \;\le\; c\, \frac{\|\hat{x} - x^{(k)}\|}{\|\hat{x}\|}.
\]
Then if $c \le 10^{-s}$, there will be a gain of approximately s figures per iteration.
Flop-count. The procedure is quite cheap. Since A has already been triangularized to solve the original system Ax = b, each iteration requires only $O(n^2)$ flops.
323
Remarks: Iterative re nement is a very useful technique. Gaussian elimination with partial
pivoting followed by iterative re nement is the most practical approach for solving a
linear system accurately. Skeel (1979) has shown that in most cases even one step of
iterative re nement is sucient.
Example 6.9.2 (Stewart IMC, p. 205)

7 6:990
! 34:97
!
A= ; b=
4 4 20:00
Cond2 (A) = 3:2465  103:
2
!
The exact solution is x = .
3
Let x(0) be
1:667
!
= x(0)
3:333
(obtained by Gaussian elimination without pivoting.)
k=0: !
0:333  10;2
r(0) = b ; Ax(0) =
0
The solution of Ac(0) = r(0) is
0:3167
!
c(0) =
;0:3167
1:9837
!
x(1) = x(0) + c(0) = :
3:0163
k=1:
;0:0292 !
r(1) = b ; Ax(1) =
;0:0168
The solution of Ac(1) = r(1) is
0:0108
!
c(1) =
;0:0150
1:9992
!
x =x +c =
(2) (1) (1)
3:0008

324
Iterative Re nement of the Computed Inverse
As in the procedure of re ning the computed solution of the system Ax = b; a computed inverse
of A can also be re ned iteratively.
Let X (0) be an approximation to a computed inverse of A. Then the matrices X (k) de ned by
the following iterative procedure:
\[
X^{(k+1)} = X^{(k)} + X^{(k)}(I - AX^{(k)}), \qquad k = 0, 1, 2, \ldots \tag{6.9.1}
\]
converge to a limit (under certain conditions), and the limit, when it exists, is a better inverse of A.
Note the resemblance of the above iteration to the Newton-Raphson method for finding a zero of f(x):
\[
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.
\]
Like the Newton-Raphson method, the iteration (6.9.1) has convergence of order 2. This can be seen as follows:
\[
I - AX^{(k+1)} = I - A\big(X^{(k)} + X^{(k)}(I - AX^{(k)})\big)
             = I - AX^{(k)} - AX^{(k)} + (AX^{(k)})^2
             = (I - AX^{(k)})^2.
\]
Continuing k times, we get
\[
I - AX_{k+1} = (I - AX_0)^{2^{k+1}},
\]
from which we conclude that if $\|I - AX_0\| = \epsilon < 1$, then the iteration converges to a limit, because in this case $\|I - AX_{k+1}\| \le \epsilon^{2^{k+1}}$. A necessary and sufficient condition is that $\rho(I - AX_0) < 1$.
We summarize the above discussion as follows:

Theorem 6.9.1 The sequence of matrices $\{X_k\}$ defined by
\[
X_{k+1} = X_k + X_k(I - AX_k), \qquad k = 0, 1, 2, \ldots
\]
converges to the inverse of the matrix A for an initial approximation $X_0$ of the inverse if and only if
\[
\rho(I - AX_0) < 1.
\]
A sufficient condition for convergence is that
\[
\|I - AX_0\| = \epsilon < 1.
\]
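The iteration of Theorem 6.9.1 (often called the Newton or Schulz iteration for the inverse) is easy to sketch in Python/NumPy; the starting guess used here, $X_0 = A^T / (\|A\|_1 \|A\|_\infty)$, is a common choice that guarantees $\rho(I - AX_0) < 1$ for nonsingular A, but it is our choice rather than one made in the text.

import numpy as np

def newton_inverse(A, tol=1e-12, max_iter=100):
    n = A.shape[0]
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))  # X_0
    I = np.eye(n)
    for _ in range(max_iter):
        R = I - A @ X                 # residual of the inverse
        if np.linalg.norm(R) < tol:
            break
        X = X + X @ R                 # X_{k+1} = X_k + X_k (I - A X_k)
    return X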

325
Example 6.9.3

01 2 31
B C
A = B
@ 2 3 4 CA
7 6 8
0 0 ;0:6667 0:3333 1
B C
A;1 = B
@ ;4 4:3333 ;0:6667 CA (in four-digit arithmetic).
3 ;2:6667 0:3333
Let us take 0 0 ;0:6660 0:3333 1
B ;3:996 4:3290 ;0:6660 CC
X0 = B
@ A
2:9970 ;2:6640 0:3330
then
(I ; AX0) = 0:001 < 1:
(Note that the eigenvalues of I ; AX0 are 0.001, 0.001, 0.001.)
0 0 ;0:6667 0:3333 1
B ;4 4:3333 ;0:6667 CC
X1 = X0 + X0(I ; AX0) = B
@ A
3 ;2:6667 0:3333
(exact up to four signi cant gures).
Estimating Cond(A) from Iterative Refinement
A very crude estimate of Cond(A) may be obtained from the iterative refinement procedure (Algorithm 6.9.1). Let k be the number of iterations required for the refinement procedure to converge, and let t be the number of digits used in the arithmetic; then (Rice MCMS, p. 98)
a rough estimate of Cond(A) is $10^{\,t(1 - \frac{1}{k})}$.

Thus, if the iterative refinement procedure converges very slowly, then A is ill-conditioned.
6.10 Iterative Methods
In this section, we study iterative methods, primarily used to solve a very large and sparse linear
system Ax = b, arising from engineering applications. These include

326
1. The Jacobi method (Section 6.10.1).
2. The Gauss-Seidel method (Section 6.10.2).
3. The Successive Overrelaxation method (Section 6.10.4).
4. The Conjugate Gradient method with and without preconditioner (Section 6.10.5).
5. The GMRES method (Section 6.10.6).
The Gauss-Seidel method is a modification of the Jacobi method, and is a special case of the Successive Overrelaxation method (SOR). The Conjugate Gradient method is primarily used to solve a symmetric positive definite system. The Jacobi and Gauss-Seidel methods converge for diagonally dominant matrices; in addition, the Gauss-Seidel method converges if A is symmetric positive definite. Note that the diagonally dominant and the symmetric positive definite matrices are among the most important classes of matrices arising in practical applications.
The direct methods based on triangularization of the matrix A become prohibitive in terms of computer time and storage if the matrix A is quite large. On the other hand, there are practical situations, such as the discretization of partial differential equations, where the matrix size can be as large as several hundred thousand. For such problems, the direct methods become impractical. For example, if A is of order 10,000 x 10,000, it may take as long as 2-3 days for an IBM 370 to solve the system Ax = b using Gaussian elimination or the orthogonalization techniques of Householder and Givens. Furthermore, most large problems are sparse, and the sparsity gets lost to a considerable extent during the triangularization procedure, so that at the end we have to deal with a very large matrix with too many nonzero entries, and storage becomes a crucial issue. For such problems, it is advisable to use a class of methods called ITERATIVE METHODS that never alter the matrix A and require the storage of only a few vectors of length n at a time.

Basic Idea
The basic idea behind an iterative method is to rst write the system Ax = b in an equivalent form:
x = Bx + d (6.10.1)
and then starting with an initial approximation x(1) of the solution-vector x to generate a sequence
of approximations fx(k)g iteratively de ned by
x(k+1) = Bx(k) + d; k = 1; 2; : : : (6.10.2)
327
with a hope that under certain mild conditions, the sequence fx(k)g converges to the solution as
k ! 1.
To solve the linear system Ax = b iteratively using the idea, we therefore need to know
(a) how to write the system Ax = b in the form (6.10.1), and
(b) how should x(1) be chosen so that the iteration (6.10.2) converges to the limit
or under what sort of assumptions, the iteration converges to the limit with any
arbitrary choice of x(1).
Stopping Criteria for the Iteration (6.10.2)
It is natural to wonder when the iteration (6.10.2) can be terminated. Since when convergence
occurs, $x^{(k+1)}$ is a better approximation than $x^{(k)}$, a natural stopping criterion will be:
Stopping Criterion 1
I. Stop the iteration (6.10.2) if
\[
\frac{\|x^{(k+1)} - x^{(k)}\|}{\|x^{(k)}\|} < \epsilon
\]
for a prescribed small positive number $\epsilon$ ($\epsilon$ should be chosen according to the accuracy desired).
In cases where the iteration does not seem to converge or the convergence is too slow, one might
wish to terminate the iteration after a number of steps. In that case the stopping criterion will be:
Stopping Criterion 2
II. Stop the iteration (6.10.2) as soon as the number of iteration exceeds a prescribed
number, say N .
6.10.1 The Jacobi Method
The System Ax = b
or
a11x1 + a12x2 +    + a1nxn = b1
a21x1 + a22x2 +    + a2nxn = b2
..
.
an1x1 + an2x2 +    + annxn = bn
can be rewritten (under the assumption that $a_{ii} \ne 0$, $i = 1, \ldots, n$) as
\begin{align*}
x_1 &= \frac{1}{a_{11}}(b_1 - a_{12}x_2 - \cdots - a_{1n}x_n) \\
x_2 &= \frac{1}{a_{22}}(b_2 - a_{21}x_1 - a_{23}x_3 - \cdots - a_{2n}x_n) \\
&\;\;\vdots \\
x_n &= \frac{1}{a_{nn}}(b_n - a_{n1}x_1 - \cdots - a_{n,n-1}x_{n-1}).
\end{align*}
In matrix notation,
\[
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} =
\begin{pmatrix}
0 & -\frac{a_{12}}{a_{11}} & \cdots & -\frac{a_{1n}}{a_{11}} \\
-\frac{a_{21}}{a_{22}} & 0 & \cdots & -\frac{a_{2n}}{a_{22}} \\
\vdots & & \ddots & \vdots \\
-\frac{a_{n1}}{a_{nn}} & \cdots & -\frac{a_{n,n-1}}{a_{nn}} & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} +
\begin{pmatrix} \frac{b_1}{a_{11}} \\ \frac{b_2}{a_{22}} \\ \vdots \\ \frac{b_n}{a_{nn}} \end{pmatrix},
\]
or
\[
x = Bx + d.
\]
If we write the matrix A in the form
\[
A = L + D + U,
\]
where
\[
L = \begin{pmatrix}
0 & 0 & \cdots & 0 \\
a_{21} & 0 & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
a_{n1} & \cdots & a_{n,n-1} & 0
\end{pmatrix}, \qquad
D = \mathrm{diag}(a_{11}, \ldots, a_{nn}),
\]
and
\[
U = \begin{pmatrix}
0 & a_{12} & \cdots & a_{1n} \\
0 & 0 & \ddots & \vdots \\
\vdots & & \ddots & a_{n-1,n} \\
0 & 0 & \cdots & 0
\end{pmatrix},
\]
then it is easy to see that
\[
B = -D^{-1}(L + U) = I - D^{-1}A, \qquad d = D^{-1}b.
\]
(Note that, because of our assumption that $a_{ii} \ne 0$, $i = 1, \ldots, n$, D is nonsingular.)
We shall call the matrix
    B = -D^{-1}(L + U) = I - D^{-1}A
the Jacobi iteration matrix and denote it by B_J. Similarly, we shall denote the vector D^{-1}b by
b_J and call it the Jacobi vector.

The Jacobi Iteration Matrix and the Jacobi Vector

Let A = L + D + U. Then
    B_J = -D^{-1}(L + U),
    b_J = D^{-1} b.

With the Jacobi iteration matrix and the Jacobi vector as defined above, the iteration (6.10.2)
becomes:

The Jacobi Iteration

    x_i^(k+1) = (1/aii)(bi - ai1 x_1^(k) - ... - a_{i,i-1} x_{i-1}^(k) - a_{i,i+1} x_{i+1}^(k) - ... - ain x_n^(k)),   i = 1, 2, ..., n.

We thus have the following iterative procedure, called the Jacobi Method.
Algorithm 6.10.1 The Jacobi Method
(1) Choose an initial approximation x^(1) = (x_1^(1), x_2^(1), ..., x_n^(1))^T.
(2) For k = 1, 2, ..., do until a stopping criterion is satisfied:
    x^(k+1) = B_J x^(k) + b_J     (6.10.3)
or, componentwise,
    x_i^(k+1) = (1/aii)(bi - Σ_{j=1, j≠i}^{n} aij x_j^(k)),   i = 1, ..., n.     (6.10.4)

Example 6.10.1

    A = [ 5  1  1 ]        b = [ 7 ]
        [ 1  5  1 ],           [ 7 ]
        [ 1  1  5 ]            [ 7 ]

    x^(1) = (0, 0, 0)^T

    B_J = [   0    -0.2   -0.2 ]        b_J = [ 1.4 ]
          [ -0.2     0    -0.2 ],             [ 1.4 ]
          [ -0.2   -0.2     0  ]              [ 1.4 ]

k = 1:  x^(2) = B_J x^(1) + b_J = (1.4000, 1.4000, 1.4000)^T
k = 2:  x^(3) = B_J x^(2) + b_J = (0.8400, 0.8400, 0.8400)^T
k = 3:  x^(4) = B_J x^(3) + b_J = (1.0640, 1.0640, 1.0640)^T
k = 4:  x^(5) = B_J x^(4) + b_J = (0.9744, 0.9744, 0.9744)^T
k = 5:  x^(6) = B_J x^(5) + b_J = (1.0102, 1.0102, 1.0102)^T
k = 6:  x^(7) = B_J x^(6) + b_J = (0.9959, 0.9959, 0.9959)^T
k = 7:  x^(8) = B_J x^(7) + b_J = (1.0016, 1.0016, 1.0016)^T
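A minimal Python/NumPy sketch of Algorithm 6.10.1 may help here (the function name jacobi and the arguments eps and N are our own illustrative choices, not part of the algorithm statement). It forms B_J and b_J from A and b, applies the iteration (6.10.3), and uses Stopping Criteria 1 and 2; run on the data of Example 6.10.1 it reproduces the iterates shown above.

    import numpy as np

    def jacobi(A, b, x1, eps=1e-4, N=100):
        # Split A = L + D + U and form B_J = -D^{-1}(L + U), b_J = D^{-1} b.
        D = np.diag(np.diag(A))
        BJ = -np.linalg.solve(D, A - D)
        bJ = np.linalg.solve(D, b)
        x = x1.astype(float)
        for k in range(N):
            x_new = BJ @ x + bJ                       # iteration (6.10.3)
            if np.linalg.norm(x_new - x) < eps * np.linalg.norm(x):
                return x_new, k + 1                   # Stopping Criterion 1
            x = x_new
        return x, N                                   # Stopping Criterion 2

    A = np.array([[5., 1., 1.], [1., 5., 1.], [1., 1., 5.]])
    b = np.array([7., 7., 7.])
    x, iters = jacobi(A, b, np.zeros(3))
    print(x, iters)    # converges to (1, 1, 1)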
The Gauss-Seidel Method*

In the Jacobi method, only the components of the vector x^(k) are used to compute the components of
the vector x^(k+1); however, note that in computing x_i^(k+1) we could have used the updated components
x_1^(k+1) through x_{i-1}^(k+1), which are already available to us. Thus, a natural modification of the
Jacobi iteration (6.10.4) is the following:

The Gauss-Seidel Iteration

    x_i^(k+1) = (1/aii)(bi - Σ_{j=1}^{i-1} aij x_j^(k+1) - Σ_{j=i+1}^{n} aij x_j^(k)).     (6.10.5)

The idea is to use each new component, as soon as it is available, in the computation
of the next component.
The iteration (6.10.5) is known as the Gauss-Seidel iteration, and the iterative method based
on this iteration is called the Gauss-Seidel method.
In the notation used earlier, the Gauss-Seidel iteration is:
    x^(k+1) = -(D + L)^{-1} U x^(k) + (D + L)^{-1} b.
(Note that the matrix D + L is a lower triangular matrix with a11, ..., ann on the diagonal; since we
have assumed that these entries are nonzero, the matrix D + L is nonsingular.)
We will call the matrix
    B = -(D + L)^{-1} U
the Gauss-Seidel matrix and denote it by the symbol B_GS. Similarly, the Gauss-Seidel vector
(D + L)^{-1} b will be denoted by b_GS. That is,
* Association of Seidel's name with Gauss for this method does not seem to be well-documented in history.

The Gauss-Seidel Matrix and the Gauss-Seidel Vector
Let A = L + D + U. Then
    B_GS = -(D + L)^{-1} U,
    b_GS = (D + L)^{-1} b.

Algorithm 6.10.2 The Gauss-Seidel Method

(1) Choose an initial approximation x^(1).
(2) For k = 1, 2, ..., do until a stopping criterion is satisfied:
    x^(k+1) = B_GS x^(k) + b_GS     (6.10.6)
or, componentwise,
    x_i^(k+1) = (1/aii)(bi - Σ_{j=1}^{i-1} aij x_j^(k+1) - Σ_{j=i+1}^{n} aij x_j^(k)),   i = 1, 2, ..., n.     (6.10.7)

Example 6.10.2

    A = [ 5  1  1 ]        b = [ 7 ]
        [ 1  5  1 ],           [ 7 ]
        [ 1  1  5 ]            [ 7 ]

    B_GS = [ 0   -0.2     -0.2   ]        b_GS = [ 1.4000 ]
           [ 0    0.04    -0.16  ],              [ 1.1200 ]
           [ 0    0.032    0.072 ]               [ 0.8960 ]

k = 1:  x^(2) = B_GS x^(1) + b_GS = (1.4000, 1.1200, 0.8960)^T
k = 2:  x^(3) = B_GS x^(2) + b_GS = (0.9968, 1.0214, 0.9964)^T
k = 3:  x^(4) = B_GS x^(3) + b_GS = (0.9964, 1.0014, 1.0004)^T
k = 4:  x^(5) = B_GS x^(4) + b_GS = (0.9996, 1.0000, 1.0001)^T
k = 5:  x^(6) = B_GS x^(5) + b_GS = (1, 1, 1)^T.
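A minimal Python/NumPy sketch of the componentwise Gauss-Seidel formula (6.10.7), written without forming B_GS explicitly, is given below (the function name and the stopping parameters are our own choices); on the data of Example 6.10.2 it produces the iterates shown above.

    import numpy as np

    def gauss_seidel(A, b, x1, eps=1e-4, N=100):
        n = len(b)
        x = x1.astype(float).copy()
        for k in range(N):
            x_old = x.copy()
            for i in range(n):
                # Use the newest values x[0..i-1] and the old values x[i+1..n-1]  (eq. 6.10.7)
                s1 = A[i, :i] @ x[:i]
                s2 = A[i, i+1:] @ x_old[i+1:]
                x[i] = (b[i] - s1 - s2) / A[i, i]
            if np.linalg.norm(x - x_old) < eps * np.linalg.norm(x_old):
                return x, k + 1
        return x, N

    A = np.array([[5., 1., 1.], [1., 5., 1.], [1., 1., 5.]])
    b = np.array([7., 7., 7.])
    print(gauss_seidel(A, b, np.zeros(3)))    # (1, 1, 1) in a few iterations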
Computer Implementations
In actual computer implementations it is certainly more economical to use equations (6.10.4) and
(6.10.7). The use of (6.10.3) and (6.10.6) would necessitate the explicit storage of D, L, and U, which
is wasteful.

6.10.3 Convergence of Iterative Methods


It is often hard to make a good guess of the initial approximation x^(1). Thus, it is desirable to have
conditions that guarantee the convergence of the iteration (6.10.2) for any arbitrary choice of
the initial approximation.
In the following we derive such a condition.

Theorem 6.10.1 (Iteration Convergence Theorem) The iteration

    x^(k+1) = Bx^(k) + c

converges for an arbitrary choice of the initial approximation x^(1) if and only if
B^k -> 0 as k -> infinity, that is, if and only if B is a convergent matrix.

Proof. From
    x = Bx + c
and
    x^(k+1) = Bx^(k) + c,
we have
    x - x^(k+1) = B(x - x^(k)).     (6.10.8)
Since this is true for any value of k, we can write
    x - x^(k) = B(x - x^(k-1)).     (6.10.9)
Substituting (6.10.9) into (6.10.8), we have
    x - x^(k+1) = B^2(x - x^(k-1)).     (6.10.10)
Continuing this process, we can write
    x - x^(k+1) = B^k(x - x^(1)).
This shows that {x^(k)} converges to the solution x for any arbitrary choice of x^(1) if and only if
B^k -> 0 as k -> infinity.
Recall now from Chapter 1 (Section 1.7, Theorem 1.7.4) that B is a convergent matrix
if and only if the spectral radius of B, ρ(B), is less than 1. Now ρ(B) = max{|λi| : i = 1, ..., n}, where
λ1 through λn are the eigenvalues of B. Since |λi| ≤ ||B|| for each i and for any matrix norm (see
Chapter 8), we have, in particular, ρ(B) ≤ ||B||. Thus, a good way to see if B is convergent is to compute
||B|| with the row-sum or column-sum norm and check whether it is less than one. (Note that the converse
is not true: B may be convergent even when such a norm is greater than or equal to one.)
We combine the result of Theorem 6.10.1 with the observation just made in the following:

Conditions for Convergence of the Iteration (6.10.2)

A necessary and sufficient condition for the convergence of the iteration (6.10.2), for
any arbitrary choice of x^(1), is that ρ(B) < 1. A sufficient condition is that ||B|| < 1
for some matrix norm.

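As a quick computational check of these conditions, one can first try an easily computed norm of B and, only if necessary, compute the spectral radius itself. A minimal Python/NumPy sketch (the function name is our own choice) is:

    import numpy as np

    def is_convergent(B):
        # Sufficient check: an easily computed norm below 1; otherwise fall back on rho(B).
        if min(np.linalg.norm(B, np.inf), np.linalg.norm(B, 1)) < 1:
            return True
        return max(abs(np.linalg.eigvals(B))) < 1    # necessary and sufficient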
We now apply the above result to identify classes of matrices for which the Jacobi and/or
Gauss-Seidel methods converge for any choice of initial approximation x(1).

The Jacobi and Gauss-Seidel Methods for Diagonally Dominant Matrices

Corollary 6.10.1 If A is row diagonally dominant, then the Jacobi method


converges for any arbitrary choice of the initial approximation x(1) .

Proof. Since A = (aij) is row diagonally dominant, we have, by definition,

    |aii| > Σ_{j=1, j≠i}^{n} |aij|,   i = 1, 2, ..., n.     (6.10.11)
Recall that the Jacobi iteration matrix B_J is given by

    B_J = [   0            -a12/a11      ...       -a1n/a11           ]
          [ -a21/a22          0       -a23/a22 ...  -a2n/a22          ]
          [    .               .           .            .             ]
          [    .               .           .    -a_{n-1,n}/a_{n-1,n-1} ]
          [ -an1/ann         ...    -a_{n,n-1}/ann       0            ]

From (6.10.11) it therefore follows that the absolute row sum (that is, the row sum taken with absolute
values) of each row of B_J is less than 1, which means
    ||B_J||_inf < 1.
Thus, by Theorem 6.10.1, we have Corollary 6.10.1.

Corollary 6.10.2 If A is a row diagonally dominant matrix, then the


Gauss-Seidel method converges for any arbitrary choice of x(1).

Proof. The Gauss-Seidel iteration matrix is given by
    B_GS = -(D + L)^{-1} U.
Let λ be an eigenvalue of this matrix and x = (x1, ..., xn)^T a corresponding eigenvector. Then from
    B_GS x = λx
or
    -Ux = λ(D + L)x,
we have
    -Σ_{j=i+1}^{n} aij xj = λ Σ_{j=1}^{i} aij xj,   1 ≤ i ≤ n,
which can be rewritten as
    λ aii xi = -λ Σ_{j=1}^{i-1} aij xj - Σ_{j=i+1}^{n} aij xj,   1 ≤ i ≤ n.
Let xk be a component of x of largest magnitude, normalized so that |xk| = 1 (thus |xj| ≤ 1 for every j).
Then from the above equation with i = k we have
    |λ||akk| ≤ |λ| Σ_{j=1}^{k-1} |akj| + Σ_{j=k+1}^{n} |akj|,     (6.10.12)
that is,
    |λ|(|akk| - Σ_{j=1}^{k-1} |akj|) ≤ Σ_{j=k+1}^{n} |akj|     (6.10.13)
or
    |λ| ≤ (Σ_{j=k+1}^{n} |akj|) / (|akk| - Σ_{j=1}^{k-1} |akj|).     (6.10.14)
Since A is row diagonally dominant,
    |akk| > Σ_{j=1, j≠k}^{n} |akj|
or
    |akk| - Σ_{j=1}^{k-1} |akj| > Σ_{j=k+1}^{n} |akj|.
Thus from (6.10.14) we conclude that |λ| < 1, that is,
    ρ(B_GS) < 1.
Thus, from Theorem 6.10.1, we have Corollary 6.10.2.
A Remark on Corollary 6.10.1
It is usually true that the greater the diagonal dominance of A, the faster the convergence
of the Jacobi method. However, there are simple counterexamples showing that this does not
always happen.
The following simple 2 x 2 example in support of this statement appears in Golub and Van
Loan MC (1989, p. 514). The example was supplied by Richard S. Varga.

    A1 = [   1    -1/2 ]        A2 = [   1    -3/4 ]
         [ -1/2     1  ],            [ -1/2     1  ]

It is easy to verify that ρ(B_J) for A1 is greater than ρ(B_J) for A2.
We now discuss the convergence of the Gauss-Seidel method for yet another very important
class of matrices, namely, the symmetric positive definite matrices.
The Gauss-Seidel Method for a Symmetric Positive Definite Matrix
We show that the Gauss-Seidel method converges, with an arbitrary choice of x^(1), for a
symmetric positive definite matrix.

Theorem 6.10.2 Let A be a symmetric positive definite matrix. Then the Gauss-
Seidel method converges for any arbitrary choice of the initial approximation x^(1).

Proof. As A is symmetric, we have
    A = L + D + L^T.
Thus B_GS = -(D + L)^{-1} L^T.
We will now show that ρ(B_GS) < 1.
Let -λ be an eigenvalue of B_GS and u a corresponding eigenvector. Then
    (D + L)^{-1} L^T u = λu
or
    L^T u = λ(D + L)u.
Thus
    u*L^T u = λ u*(D + L)u
or, since u*Au = u*(L + D)u + u*L^T u,
    u*Au - u*(L + D)u = λ u*(L + D)u
or
    u*Au = (1 + λ) u*(L + D)u.     (6.10.15)
(Note that λ ≠ -1; otherwise L^T u = -(D + L)u would give Au = 0, contradicting the positive
definiteness of A.) Taking the conjugate transpose of both sides of (6.10.15), and writing λ̄ for
the complex conjugate of λ, we have
    u*Au = (1 + λ̄) u*(L^T + D^T)u.     (6.10.16)
From (6.10.15) and (6.10.16) we obtain
    (1/(1 + λ) + 1/(1 + λ̄)) u*Au = u*(L + D)u + u*(L^T + D^T)u     (6.10.17)
                                 = u*(L + D + L^T + D^T)u          (6.10.18)
                                 = u*(A + D^T)u                    (6.10.19)
                                 = u*(A + D)u > u*Au.              (6.10.20)
(Note that since A is positive definite, so is D, and therefore u*Du > 0.)
Dividing both sides of (6.10.20) by u*Au (> 0), we have
    1/(1 + λ) + 1/(1 + λ̄) > 1
or
    (2 + λ + λ̄) / ((1 + λ)(1 + λ̄)) > 1.     (6.10.21)
Let λ = α + iβ. Then λ̄ = α - iβ. From (6.10.21) we then have
    2(1 + α) / ((1 + α)^2 + β^2) > 1,
from which it follows that α^2 + β^2 < 1, that is, |λ| = sqrt(α^2 + β^2) < 1. Since the eigenvalues
of B_GS are the numbers -λ, this shows ρ(B_GS) < 1.
Rates of Convergence and a Comparison Between
the Gauss-Seidel and the Jacobi Methods
We have just seen that for row diagonally dominant matrices both the Jacobi and the
Gauss-Seidel methods converge for an arbitrary x^(1). The question naturally arises whether this is true
for some other matrices as well. Also, when both methods converge, another question arises:
which one converges faster?
From our discussion in the last section we know that it is the iteration matrix B that plays
a crucial role in the convergence of an iterative method. More specifically, recall from the proof of
Theorem 6.10.1 that e_{k+1} = error at the (k+1)th step = x - x^(k+1) and e_1 = initial error = x - x^(1)
are related by
    ||e_{k+1}|| ≤ ||B^k|| ||e_1||,   k = 1, 2, 3, ....
Thus, ||B^k|| gives us an upper bound on the ratio of the error at the (k+1)th step to the
initial error.
Definition 6.10.1 If ||B^k|| < 1, then the quantity
    -(ln ||B^k||) / k
is called the Average Rate of Convergence for k iterations, and the quantity
    -ln ρ(B)
is called the Asymptotic Rate of Convergence.
If both methods are known to converge and the asymptotic rate of convergence of one
iterative method is greater than that of the other, then the one with the larger asymptotic
rate of convergence converges asymptotically faster than the other.
The following theorem, known as the Stein-Rosenberg Theorem, identifies a class of matrices
for which the Jacobi and the Gauss-Seidel methods are either both convergent or both divergent. We shall
state the theorem below without proof. The proof involves the Perron-Frobenius Theorem
from matrix theory and is beyond the scope of this book. The proof of the theorem and related
discussions can be found in an excellent reference book on the subject (Varga MIA, Chapter 3):

Theorem 6.10.3 (Stein-Rosenberg) If the matrix A is such that its diagonal entries
are all positive and its off-diagonal entries are nonpositive (so that the Jacobi matrix
B_J is nonnegative), then one and only one of the following statements holds:
(a) ρ(B_J) = ρ(B_GS) = 0;
(b) 0 < ρ(B_GS) < ρ(B_J) < 1;
(c) ρ(B_J) = ρ(B_GS) = 1;
(d) 1 < ρ(B_J) < ρ(B_GS).

Corollary 6.10.3 If 0 < ρ(B_J) < 1, then the asymptotic rate of convergence of
the Gauss-Seidel method is larger than that of the Jacobi method.

Two immediate consequences of the above results are noted below.


Richard Varga is a university professor at Kent State University. He is also the director of the Institute for
Computational Mathematics at that university. He is well known for his outstanding contributions in iterative
methods. He is the author of the celebrated book Matrix Iterative Analysis.

If the matrix A satisfies the hypothesis of Theorem 6.10.3, then
(i) the Jacobi and the Gauss-Seidel methods either both converge or both diverge;
(ii) when both methods converge, the asymptotic rate of convergence of
the Gauss-Seidel method is larger than that of the Jacobi method.
Remarks: Note that in (ii) we are talking about the asymptotic rate of convergence, not the
average rate of convergence.
Unfortunately, in the general case no such statements about the convergence and the asymptotic
rates of convergence of the two methods can be made. In fact, there are examples where one
method converges but the other diverges (see the example below). However, when both the
Gauss-Seidel and the Jacobi methods converge, because of the lower storage requirement
and the better asymptotic rate of convergence, the Gauss-Seidel method should be preferred
over the Jacobi method.
Example 6.10.3
The following example shows that the Jacobi method can converge even when the Gauss-Seidel
method does not.

    A = [ 1  2  -2 ]        b = [ 1 ]
        [ 1  1   1 ],           [ 2 ]
        [ 2  2   1 ]            [ 5 ]

    B_J = [  0  -2   2 ]        B_GS = [ 0  -2   2 ]
          [ -1   0  -1 ],              [ 0   2  -3 ]
          [ -2  -2   0 ]               [ 0   0   2 ]

    ρ(B_GS) = 2,
    ρ(B_J) = 6.7815 x 10^{-6} (computed in finite precision; the exact value is 0, since B_J^3 = 0).

A Few Iterations with Gauss-Seidel

    x^(1) = (0, 0, 0)^T,  x^(2) = (1, 1, 1)^T,  x^(3) = (1, 0, 3)^T,
    x^(4) = (7, -8, 7)^T,  x^(5) = (31, -36, 15)^T,
etc. This shows that the Gauss-Seidel method is clearly diverging. The exact solution is
    x = (7, -4, -1)^T.
On the other hand, the Jacobi method converges in only two iterations:
    x^(1) = (0, 0, 0)^T,
    x^(2) = (1, 2, 5)^T,
    x^(3) = (7, -4, -1)^T.
6.10.4 The Successive Overrelaxation (SOR) Method
The Gauss-Seidel method is frustratingly slow when ρ(B_GS) is close to unity. However, the rate
of convergence of the Gauss-Seidel iteration can, in certain cases, be improved by introducing a
parameter ω, known as the relaxation parameter. The following modified Gauss-Seidel iteration,

The SOR Iteration

    x_i^(k+1) = (ω/aii)(bi - Σ_{j=1}^{i-1} aij x_j^(k+1) - Σ_{j=i+1}^{n} aij x_j^(k)) + (1 - ω)x_i^(k),   i = 1, 2, ..., n,     (6.10.22)

is known as the successive overrelaxation iteration, or in short the SOR iteration, if ω > 1.
From (6.10.22) we note the following:
(1) When ω = 1, the SOR iteration reduces to the Gauss-Seidel iteration.
(2) When ω > 1, in computing the (k+1)th iterate more weight is placed on
the most current values than when ω < 1, with the hope that the convergence
will be faster.
In matrix notation the SOR iteration is
    x^(k+1) = (D + ωL)^{-1}[(1 - ω)D - ωU]x^(k) + ω(D + ωL)^{-1}b,   k = 1, 2, ....     (6.10.23)
(Note that since aii ≠ 0, i = 1, ..., n, the matrix D + ωL is nonsingular.)
The matrix (D + ωL)^{-1}[(1 - ω)D - ωU] will be called the SOR matrix and will be denoted by
B_SOR. Similarly, the vector ω(D + ωL)^{-1}b will be denoted by b_SOR. That is,

The SOR Matrix and the SOR Vector

    B_SOR = (D + ωL)^{-1}[(1 - ω)D - ωU],
    b_SOR = ω(D + ωL)^{-1}b.

In matrix notation the SOR iteration algorithm is:

Algorithm 6.10.3 The Successive Overrelaxation Method
(1) Choose x^(1).
(2) For k = 1, 2, ..., do until a stopping criterion is satisfied:
    x^(k+1) = B_SOR x^(k) + b_SOR.
Example 6.10.4
Let A and b be the same as in Example 6.10.2. Let
    x^(1) = (0, 0, 0)^T,   ω = 1.2.
Then
    B_SOR = [ -0.2000  -0.2400  -0.2400 ]        b_SOR = [ 1.6800 ]
            [  0.0480  -0.1424  -0.1824 ],              [ 1.2768 ]
            [  0.0365   0.0918  -0.0986 ]               [ 0.9704 ]

k = 1:  x^(2) = B_SOR x^(1) + b_SOR = (1.6800, 1.2768, 0.9704)^T
k = 2:  x^(3) = B_SOR x^(2) + b_SOR = (0.8047, 0.9986, 1.0531)^T
k = 3:  x^(4) = B_SOR x^(3) + b_SOR = (1.0266, 0.9811, 0.9875)^T.
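A minimal Python/NumPy sketch of the componentwise SOR formula (6.10.22) follows (the function name and the stopping parameters are our own choices); with ω = 1.2 and the data of Example 6.10.4 it reproduces the iterates shown above.

    import numpy as np

    def sor(A, b, x1, omega, eps=1e-4, N=100):
        n = len(b)
        x = x1.astype(float).copy()
        for k in range(N):
            x_old = x.copy()
            for i in range(n):
                # Gauss-Seidel-type update blended with the old value  (eq. 6.10.22)
                gs = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x_old[i+1:]) / A[i, i]
                x[i] = omega * gs + (1 - omega) * x_old[i]
            if np.linalg.norm(x - x_old) < eps * np.linalg.norm(x_old):
                return x, k + 1
        return x, N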
Choice of ω in the Convergent SOR Iteration
It is natural to ask for what range of ω the SOR iteration converges, and what the optimal
choice of ω is. To this end, we first prove the following important result due to William
Kahan (1958).

Theorem 6.10.4 (Kahan) For the SOR iteration to converge for every initial
approximation x^(1), ω must lie in the interval (0, 2).

Proof. Recall that the SOR iteration matrix B_SOR is given by
    B_SOR = (D + ωL)^{-1}[(1 - ω)D - ωU],
where A = L + D + U.
The matrix (D + ωL)^{-1} is a lower triangular matrix with 1/aii, i = 1, ..., n, as its diagonal entries,
and the matrix (1 - ω)D - ωU is an upper triangular matrix with (1 - ω)aii, i = 1, ..., n, as its
diagonal entries. So
    det(B_SOR) = (1 - ω)^n.

William Kahan is a professor of mathematics and computer science at the University of California-Berkeley. He
has made significant contributions in several areas of numerical linear algebra, including computer arithmetic. He
received the prestigious ACM Turing Award in 1989.

Since the determinant of a matrix is equal to the product of its eigenvalues, we conclude that
    ρ(B_SOR) ≥ |1 - ω|,
where ρ(B_SOR) is the spectral radius of the matrix B_SOR.
Since, by Theorem 6.10.1, the spectral radius of the iteration matrix has to be less than 1 for the
convergence of an iterative method, we conclude that ω must lie in the interval (0, 2).
The next theorem, known as the Ostrowski-Reich Theorem, shows that the above condition
is also sufficient when the matrix A is symmetric positive definite.
The theorem was proved by Reich for the Gauss-Seidel iteration (ω = 1) in 1949 and subsequently
extended by Ostrowski in 1954.
We state the theorem without proof. For a proof, see Varga MIA, Section 3.4, or Ortega,
Numerical Analysis: A Second Course, p. 123. The Ostrowski-Reich Theorem complements
Theorem 6.10.4 for symmetric positive definite matrices.

Theorem 6.10.5 (Ostrowski-Reich) Let A be a symmetric positive definite matrix
and let 0 < ω < 2. Then the SOR method will converge for any arbitrary choice
of x^(1).

Optimal Choice of ω in the Convergent SOR Iteration

We now turn to the question of the optimal choice of ω. Again, for an arbitrary matrix, no criterion
has been developed so far. However, such a criterion is known for a very useful class of matrices
that arises in many practical applications. The matrices in this class are called consistently ordered
matrices.
Definition 6.10.2 A matrix A = L + D + U is consistently ordered if the eigenvalues of the
matrix
    C(α) = -D^{-1}((1/α)L + αU)
do not depend on α, α ≠ 0.
Young has defined a consistently ordered matrix differently (see Young (1971)); the latter
definition does not involve eigenvalues.
Definition 6.10.3 The matrix A is 2-cyclic if there is a permutation matrix P such that
    P A P^T = [ A11  A12 ]
              [ A21  A22 ],
where A11 and A22 are diagonal.
David Young has called such a matrix a matrix having "Property (A)". This definition
can be generalized to block matrices, where the diagonal blocks A11 and A22 are block diagonal
matrices; we call such matrices block 2-cyclic matrices. A well-known example of a consistently
ordered block 2-cyclic matrix is the block tridiagonal matrix

    A = [  T  -I          0 ]
        [ -I   T  -I        ]
        [       .   .    -I ]
        [  0       -I     T ],
where
    T = [  4  -1          0 ]
        [ -1   4  -1        ]
        [       .   .    -1 ]
        [  0       -1     4 ].

Recall that this matrix arises in the discretization of Poisson's equation
    d^2T/dx^2 + d^2T/dy^2 = f
on the unit square. In fact, it can be shown (Exercise) that every block tridiagonal matrix
with nonsingular diagonal blocks is consistently ordered and 2-cyclic.
The following important and very well-known theorem on the optimal choice of ω for consistently
ordered matrices is due to David Young (Young (1971)).
David Young, Jr., is a professor of mathematics and Director of the Numerical Analysis Center at the University
of Texas at Austin. He is widely known for his pioneering contributions in the area of iterative methods for linear
systems. He is also one of the developers of the software package ITPACK.

Theorem 6.10.6 (Young) Let A be consistently ordered and 2-cyclic with nonzero
diagonal elements. Then
    ρ(B_GS) = (ρ(B_J))^2.
Furthermore, if the eigenvalues of B_J are real and ρ(B_J) < 1, then the optimal
choice of ω, in terms of producing the smallest spectral radius of B_SOR, denoted by
ω_opt, is given by
    ω_opt = 2 / (1 + sqrt(1 - ρ(B_J)^2)),
and ρ(B_SOR) = ω_opt - 1.

As an immediate consequence, we get:

Corollary 6.10.4 For consistently ordered 2-cyclic matrices, if the Jacobi method
converges, so does the Gauss-Seidel method, and the Gauss-Seidel method converges
asymptotically twice as fast as the Jacobi method.

Example 6.10.5

    A = [  4  -1   0  -1   0   0 ]        b = [ 1 ]
        [ -1   4  -1   0  -1   0 ]            [ 0 ]
        [  0  -1   4   0   0  -1 ]            [ 0 ]
        [ -1   0   0   4  -1   0 ],           [ 0 ]
        [  0  -1   0  -1   4  -1 ]            [ 0 ]
        [  0   0  -1   0  -1   4 ]            [ 0 ]

The eigenvalues of B_J are 0.1036, 0.2500, -0.1036, -0.2500, 0.6036, -0.6036.
    ρ(B_J) = 0.6036,
    ω_opt = 2 / (1 + sqrt(1 - (0.6036)^2)) = 1.1128,
    ρ(B_GS) = 0.3643.

It took five iterations for the SOR method with ω_opt to converge to the exact solution (up to
four significant figures), starting with x^(1)_SOR = (0, 0, 0, 0, 0, 0)^T:
    x^(5)_SOR = (0.2939, 0.0901, 0.0184, 0.0855, 0.0480, 0.0166)^T.
With the same starting vector x^(1), the Gauss-Seidel method required 12 iterations (Try it!).
Also find out how many iterations are required by the Jacobi method.
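For matrices satisfying the hypotheses of Theorem 6.10.6, ω_opt can be computed directly from ρ(B_J). A minimal Python/NumPy sketch follows (the function name is our own choice; the formula assumes a consistently ordered 2-cyclic A with real Jacobi eigenvalues and ρ(B_J) < 1).

    import numpy as np

    def omega_opt(A):
        # Optimal SOR parameter from Theorem 6.10.6.
        D = np.diag(np.diag(A))
        BJ = -np.linalg.solve(D, A - D)          # Jacobi iteration matrix
        rho = max(abs(np.linalg.eigvals(BJ)))    # spectral radius of B_J
        return 2.0 / (1.0 + np.sqrt(1.0 - rho**2))

For the 6 x 6 matrix of Example 6.10.5 this returns approximately 1.1128.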

Comparison of Rates of Convergence Between the Gauss-Seidel and the SOR Methods

The following theorem, also due to D. Young (see Young (1971)), relates the rates of convergence
of the Gauss-Seidel method and the SOR method with the optimum relaxation factor ω_opt,
for consistently ordered matrices.

Theorem 6.10.7 (Young) Let A be a consistently ordered matrix with nonzero
diagonal elements. Assume that the eigenvalues of B_J are all real and ρ(B_J) < 1.
Let R_GS and R_SOR denote, respectively, the asymptotic rates of convergence of the
Gauss-Seidel and the SOR methods with optimum relaxation factor ω_opt. Then
    2 ρ(B_J) R_GS^{1/2} ≤ R_SOR ≤ R_GS + 2[R_GS]^{1/2};
the second inequality holds if R_GS ≤ 3.

Remark: The above theorem basically states that the SOR method with the
optimum relaxation factor converges much faster (as expected) than the Gauss-
Seidel method when the asymptotic rate of convergence of the Gauss-Seidel method
is small.
Example 6.10.6
Consider again the matrix arising in the discretization of Poisson's equation with mesh size h
(the matrix (6.3.36)).
For this matrix it is easily verified that
    ρ(B_J) = cos(πh).
We also know that ρ(B_GS) = ρ^2(B_J). Thus
    R_GS = 2R_J = 2(-log cos πh) = 2(π^2 h^2 / 2 + O(h^4)) = π^2 h^2 + O(h^4).
For small h, R_GS is small, and Theorem 6.10.7 then gives
    R_SOR ≈ 2 R_GS^{1/2} ≈ 2πh,
so that
    R_SOR / R_GS ≈ 2/(πh).
Thus, when h is small, the asymptotic rate of convergence of the SOR method with the optimum
relaxation factor is much greater than that of the Gauss-Seidel method. For example, when h = 1/50,
    R_SOR / R_GS ≈ 100/π ≈ 31.8.
Thus, in this case the SOR method converges about 31 times faster than the Gauss-Seidel
method. Furthermore, the improvement becomes greater as h decreases; it is really remarkable
when h is very small.

6.10.5 The Conjugate Gradient Method

We describe here a method, known as the Conjugate Gradient Method (CG), for solving a
symmetric positive definite linear system
    Ax = b.
The method was originally devised by Hestenes and Stiefel (1952) and is nowadays widely used
to solve large and sparse symmetric positive definite systems. It is direct in theory, but
iterative in practice. The method is based on the following well-known result on optimization
(see the book by Luenberger (1973)).

Theorem 6.10.8 If A is a real symmetric positive definite matrix, then solving
    Ax = b
is equivalent to minimizing the quadratic function
    φ(x) = (1/2) x^T A x - x^T b.
Furthermore, the minimum value of φ(x) is -(1/2) b^T A^{-1} b and is attained at
    x = A^{-1} b.
There are a large number of iterative methods in the optimization literature for solving
this minimization problem (see Luenberger (1973)). In these iterative methods the successive
approximations x_k are computed recursively:
    x_{k+1} = x_k + α_k p_k,
where the vectors p_k are called the direction vectors and the scalars α_k are chosen to minimize
φ along the direction p_k; that is, α_k is chosen to minimize the function φ(x_k + α p_k). It can be
shown that this happens if we choose
    α = α_k = p_k^T (b - A x_k) / p_k^T A p_k = p_k^T r_k / p_k^T A p_k,
where r_k = b - A x_k.
How to choose the direction vectors?
The next question, therefore, is how to choose the direction vectors p_k. The conjugate gradient
method (denoted by CG) is a method that automatically generates the direction vectors. The
direction vector needed at each step is generated at the previous step. Moreover, the direction
vectors p_k have the remarkable property that
    p_i^T A p_j = 0,   i ≠ j.
That is, these vectors are orthogonal with respect to the inner product x^T A y defined
by A. Direction vectors p_k satisfying the above property are called conjugate vectors.
Algorithm 6.10.4 The Basic (Classical) Conjugate Gradient Method
1. Choose x_0 and a tolerance ε. Set p_0 = r_0 = b - A x_0.
2. For i = 0, 1, 2, 3, ... do
    w = A p_i
    α_i = ||r_i||_2^2 / p_i^T w
    x_{i+1} = x_i + α_i p_i
    r_{i+1} = r_i - α_i w
    Test for convergence: if ||r_{i+1}||_2^2 ≤ ε, stop; otherwise continue.
    β_i = ||r_{i+1}||_2^2 / ||r_i||_2^2
    p_{i+1} = r_{i+1} + β_i p_i
Example 6.10.7

    A = [ 5  1  1 ]        b = [ 7 ]
        [ 1  5  1 ],           [ 7 ]
        [ 1  1  5 ]            [ 7 ]

    x_0 = (0, 0, 0)^T,   p_0 = r_0 = b - A x_0 = (7, 7, 7)^T.

i = 0:
    w = A p_0 = (49, 49, 49)^T
    α_0 = ||r_0||_2^2 / p_0^T w = 0.1429
    x_1 = x_0 + α_0 p_0 = (1.0003, 1.0003, 1.0003)^T
    r_1 = r_0 - α_0 w = (-0.0021, -0.0021, -0.0021)^T
    β_0 = 9 x 10^{-8}
    p_1 = r_1 + β_0 p_0 = (-0.0021, -0.0021, -0.0021)^T

i = 1:
    w = A p_1 = (-0.0147, -0.0147, -0.0147)^T
    α_1 = 0.1429,   x_2 = x_1 + α_1 p_1 = (1.0000, 1.0000, 1.0000)^T.
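A minimal Python/NumPy sketch of Algorithm 6.10.4 is given below (the function name and the convergence tolerance are our own choices). On the data of Example 6.10.7 it converges to (1, 1, 1)^T; in exact arithmetic a single step suffices here, since r_1 = 0.

    import numpy as np

    def conjugate_gradient(A, b, x0, eps=1e-10, max_iter=None):
        # Basic CG for a symmetric positive definite A.
        max_iter = max_iter or len(b)
        x = x0.astype(float).copy()
        r = b - A @ x
        p = r.copy()
        for i in range(max_iter):
            w = A @ p
            alpha = (r @ r) / (p @ w)
            x = x + alpha * p
            r_new = r - alpha * w
            if r_new @ r_new <= eps:            # convergence test on ||r||^2
                return x, i + 1
            beta = (r_new @ r_new) / (r @ r)
            p = r_new + beta * p
            r = r_new
        return x, max_iter

    A = np.array([[5., 1., 1.], [1., 5., 1.], [1., 1., 5.]])
    b = np.array([7., 7., 7.])
    print(conjugate_gradient(A, b, np.zeros(3)))    # converges to (1, 1, 1)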
Convergence
In the absence of round-off errors, the conjugate gradient method converges
in no more than n iterations. Thus, in theory, the conjugate gradient method requires at most
n iterations. In fact, it can be shown that the error decreases at every step. Specifically, it can be
proved (see Ortega IPVL, p. 277) that:

Theorem 6.10.9
    ||x - x_k||_2 < ||x - x_{k-1}||_2,
where x is the exact solution, unless x_{k-1} = x.

However, the convergence is usually extremely slow when A is ill-conditioned. This can
be seen from the following result.
Rate of Convergence of the Conjugate Gradient Method

Define ||s||_A = sqrt(s^T A s). Then an estimate of the rate of convergence is:
    ||x_k - x||_A ≤ 2 α^k ||x_0 - x||_A,
where
    α = (sqrt(κ) - 1)/(sqrt(κ) + 1)   and   κ = Cond(A) = ||A||_2 ||A^{-1}||_2 = λ_n / λ_1;
here λ_n and λ_1 are the largest and smallest eigenvalues of the symmetric positive
definite matrix A (note that the eigenvalues of A are all positive).

Note. α = 0 when Cond(A) = 1, and α -> 1 as Cond(A) -> infinity. Thus, the larger
Cond(A) is, the slower the rate of convergence.
(For a proof, see Luenberger (1973, p. 187).)

Preconditioning
Since a large condition number of A slows down the convergence of the conjugate gradient
method, it is natural to ask whether the conditioning can be improved before the method is
applied; in that case we can apply the basic conjugate gradient method to a preconditioned
system. Indeed, the use of a good preconditioner can accelerate the rate of convergence of the method
substantially.
The basic idea of preconditioning is to find a nonsingular matrix S such that Cond(Ã) < Cond(A), where
Ã = S A S^T. Once such an S is found, we solve Ã x̃ = b̃, where x̃ = (S^{-1})^T x and b̃ = Sb, and
then recover x from x̃ via
    x = S^T x̃.
The matrix S is usually defined implicitly, for simplicity, through
    (S^T S)^{-1} = M.
Note that M is symmetric positive definite; it is called a preconditioner.
Algorithm 6.10.5 The Preconditioned Conjugate Gradient Method (PCG)
Find a preconditioner M.
Choose x_0 and ε.
Set r_0 = b - A x_0,  p_0 = y_0 = M^{-1} r_0.
For i = 0, 1, 2, 3, ... do
(a) w = A p_i
(b) α_i = y_i^T r_i / p_i^T w
(c) x_{i+1} = x_i + α_i p_i
(d) r_{i+1} = r_i - α_i w
(e) Test for convergence: if ||r_{i+1}||_2^2 ≤ ε, stop; otherwise continue.
(f) y_{i+1} = M^{-1} r_{i+1}
(g) β_i = y_{i+1}^T r_{i+1} / y_i^T r_i
(h) p_{i+1} = y_{i+1} + β_i p_i

Note: If M = I, then the preconditioned conjugate gradient method reduces to the
basic conjugate gradient method.
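A minimal Python/NumPy sketch of Algorithm 6.10.5 follows (the function name and the M_solve callable, which should apply M^{-1} to a vector, are our own interface choices, not part of the algorithm statement).

    import numpy as np

    def pcg(A, b, x0, M_solve, eps=1e-10, max_iter=None):
        # Preconditioned CG; M_solve(r) should return M^{-1} r.
        max_iter = max_iter or len(b)
        x = x0.astype(float).copy()
        r = b - A @ x
        y = M_solve(r)
        p = y.copy()
        for i in range(max_iter):
            w = A @ p                               # step (a)
            alpha = (y @ r) / (p @ w)               # step (b)
            x = x + alpha * p                       # step (c)
            r_new = r - alpha * w                   # step (d)
            if r_new @ r_new <= eps:                # step (e)
                return x, i + 1
            y_new = M_solve(r_new)                  # step (f)
            beta = (y_new @ r_new) / (y @ r)        # step (g)
            p = y_new + beta * p                    # step (h)
            r, y = r_new, y_new
        return x, max_iter

Passing M_solve = lambda r: r (that is, M = I) reduces it to the basic conjugate gradient sketch given earlier.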
The next question, therefore, is:
How to Find a Preconditioner M
Several possibilities have been explored in the literature. Among them are
1. polynomial preconditioning, and
2. incomplete Cholesky factorization (ICF).
We shall describe ICF in the following. For a description of polynomial preconditioning, see
Ortega IPVL, pp. 206-208.
Incomplete Cholesky Factorization
Since A is symmetric positive definite, it admits a factorization
    A = L D L^T,
where L is unit lower triangular and D is diagonal.
If A is sparse, L is generally less sparse than A, because fill-in can occur. However, we can
ignore the fill-in and obtain what is known as an Incomplete Cholesky Factorization.
The basic principle of the incomplete Cholesky factorization of A = (aij) is:
    if aij ≠ 0, compute lij;
    if aij = 0, set lij = 0.
Algorithm 6.10.6 Incomplete Cholesky Factorization (Ortega IPVL, p. 212)
Set l11 = sqrt(a11).
For i = 2, ..., n do
    For j = 1, 2, ..., i - 1 do
        If aij = 0, then lij = 0; else
            lij = (1/ljj)(aij - Σ_{k=1}^{j-1} lik ljk)
    lii = (aii - Σ_{k=1}^{i-1} lik^2)^{1/2}

Remark: The above algorithm requires the computation of square roots, and for a general sparse
symmetric positive definite matrix the quantities under the square roots are not guaranteed to remain
positive, so the algorithm may not run to completion. However, we can obtain a no-fill incomplete
LDL^T factorization of A which avoids square root computations altogether.
Algorithm 6.10.7 No-Fill Incomplete LDL^T Factorization
(Here L is unit lower triangular, so lii = 1.)
Set d11 = a11.
For i = 2, ..., n do
    For j = 1, 2, ..., i - 1 do
        If aij = 0, then lij = 0; else
            lij = (aij - Σ_{k=1}^{j-1} lik dkk ljk) / djj
    dii = aii - Σ_{k=1}^{i-1} lik^2 dkk
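A minimal Python/NumPy sketch of the no-fill incomplete LDL^T factorization of Algorithm 6.10.7 is given below (the function name is our own choice; no breakdown check is included). The factors can then be used as the preconditioner M = L D L^T in PCG, as described next.

    import numpy as np

    def no_fill_ldlt(A):
        # l_ij is computed only where a_ij != 0; other strictly lower entries stay zero.
        n = A.shape[0]
        L = np.eye(n)
        d = np.zeros(n)
        d[0] = A[0, 0]
        for i in range(1, n):
            for j in range(i):
                if A[i, j] != 0.0:
                    L[i, j] = (A[i, j] - np.sum(L[i, :j] * d[:j] * L[j, :j])) / d[j]
            d[i] = A[i, i] - np.sum(L[i, :i] ** 2 * d[:i])
        return L, d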
Use of Incomplete Cholesky Factorization in Preconditioning
Note that the incomplete Cholesky factorization algorithm mathematically gives a factorization
of A in the form
    A = L L^T + R,
where R ≠ 0. Since the best choice for a preconditioner M is the matrix A itself, after the
matrix L is obtained through the incomplete Cholesky factorization, the preconditioner M is taken as
    M = L L^T.
In the Preconditioned Conjugate Gradient algorithm (Algorithm 6.10.5), a symmetric positive
definite system of the form
    M y = r
needs to be solved at each iteration, with M as the coefficient matrix (Step (f)). Since M = L L^T,
this is equivalent to solving
    L x = r,   L^T y = x.
Since the coefficient matrix is the same at every iteration, the incomplete Cholesky factorization
needs to be computed only once. If the no-fill incomplete LDL^T factorization is used, then mathematically
we have
    A = L D L^T + R.
In this case we take the preconditioner M as
    M = L D L^T.
Then at each iteration of PCG one needs to solve a system of the form
    M y = r,
which is equivalent to L x = r, D z = x, and L^T y = z. Again, L and D have to be computed
only once.
6.10.6 The Arnoldi Process and GMRES
In the last few years, a method called GMRES has received a considerable amount of attention from
numerical linear algebraists in the context of solving large and sparse linear systems. The method
is based on a classical scheme due to Arnoldi, which constructs an orthonormal basis of the
Krylov subspace span{v1, Av1, ..., A^{m-1}v1}, where A is n x n and v1 is a vector of unit length. The
Arnoldi method can be implemented using only matrix-vector multiplications and is, therefore,
suitable for sparse matrices, because the zero entries of A are preserved.
The basic idea behind solving a large and sparse problem using the Arnoldi method
is to project the problem onto the Krylov subspace of dimension m < n using the
orthonormal basis constructed by the method, solve the m-dimensional projected problem using
a standard approach, and then recover the solution of the original problem from the
solution of the projected problem.
We now summarize the essentials of the Arnoldi method, followed by an algorithmic description
of GMRES.

Algorithm 6.10.8 The Arnoldi Method

(1) Start: Choose a vector v1 of norm 1 and an integer m ≤ n.
(2) Iterate: For j = 1, 2, ..., m do
    h_{ij} = v_i^T A v_j,   i = 1, 2, ..., j,
    v̂_{j+1} = A v_j - Σ_{i=1}^{j} h_{ij} v_i,
    h_{j+1,j} = ||v̂_{j+1}||_2,
    v_{j+1} = v̂_{j+1} / h_{j+1,j}.

Notes: (i) The scalars h_{ij} have been chosen so that the vectors v_j are orthonormal.
(ii) Let V_m be the n x m matrix whose jth column is the vector v_j, i.e., V_m = [v1, v2, ..., vm].
Then the columns of V_m form an orthonormal basis of the Krylov subspace span{v1, Av1, ..., A^{m-1}v1}.
(iii) Define H̃_m as the (m+1) x m matrix whose nonzero entries are the coefficients h_{ij}, and let
H_m be the m x m matrix obtained from H̃_m by deleting its last row. Then the matrix H_m is such that
    A V_m - V_m H_m = h_{m+1,m} [0, 0, ..., 0, v_{m+1}].     (6.10.24)
(iv) A numerically viable way to implement the Arnoldi method is to use modified Gram-Schmidt,
or complete or partial reorthogonalization, in step (2).

Example

    A = [ 1  2  3 ]
        [ 1  2  3 ],   m = 2,   v1 = (1, 0, 0)^T.
        [ 1  1  1 ]

j = 1:
    i = 1:  h11 = 1
    v̂2 = A v1 - h11 v1 = (0, 1, 1)^T
    h21 = 1.4142
    v2 = v̂2 / h21 = (0, 0.7071, 0.7071)^T
j = 2:
    i = 1, 2:  h12 = 3.5355,  h22 = 3.5000.
Form
    V2 = [ 1   0      ]        H2 = [ 1       3.5355 ]
         [ 0   0.7071 ],            [ 1.4142  3.5000 ]
         [ 0   0.7071 ]

VERIFY:
    A V2 - V2 H2 = [ 0   0      ]
                   [ 0   1.0607 ]
                   [ 0  -1.0607 ].
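A minimal Python/NumPy sketch of Algorithm 6.10.8, using modified Gram-Schmidt as suggested in note (iv), is given below (the function name is our own choice); on the 3 x 3 example above it reproduces V_2 and H_2.

    import numpy as np

    def arnoldi(A, v1, m):
        # m steps of the Arnoldi process with modified Gram-Schmidt.
        n = len(v1)
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))            # the (m+1) x m matrix H-tilde
        V[:, 0] = v1 / np.linalg.norm(v1)
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] == 0:            # breakdown: Krylov subspace is invariant
                return V[:, :j + 1], H[:j + 2, :j + 1]
            V[:, j + 1] = w / H[j + 1, j]
        return V, H

    A = np.array([[1., 2., 3.], [1., 2., 3.], [1., 1., 1.]])
    V, H = arnoldi(A, np.array([1., 0., 0.]), 2)
    print(H[:2, :2])    # approximately [[1, 3.5355], [1.4142, 3.5]]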
GMRES (Generalized Minimal Residual) Method
The GMRES method is designed to minimize the norm of the residual vector b - Ax over all
vectors of the affine subspace x0 + K_m, where x0 is the initial vector and K_m is the Krylov subspace
of dimension m.

Algorithm 6.10.9 Generalized Minimal Residual Method (GMRES) (Saad and Schultz (1986))
(1) Start: Choose x0 and a dimension m of the Krylov subspace. Compute r0 = b - A x0.
(2) Arnoldi process: Perform m steps of the Arnoldi algorithm starting with v1 = r0/||r0||
to generate the Hessenberg matrix H̃_m and the matrix V_m with orthonormal columns.
(3) Form the approximate solution:
    - Find the vector y_m that minimizes the function J(y) = ||β e1 - H̃_m y||, where
      e1 = [1, 0, ..., 0]^T and β = ||r0||, among all vectors y of R^m.
    - Compute x_m = x0 + V_m y_m.

Youcef Saad is a professor of computer science at the University of Minnesota. He is well known for his
contributions to large-scale matrix computations based on the Arnoldi method.
Martin Schultz is a professor of computer science at Yale University.
Remark: Although not clear from the above description, the number m of steps needed for the
above algorithm to converge is not fixed beforehand but is determined as the Arnoldi algorithm
is run. A formula that gives the residual norm without explicitly computing the residual vector
makes this possible. For details, see the paper by Saad and Schultz (1986).
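A minimal Python/NumPy sketch of one GMRES cycle (no restarting, and without the recursive residual-norm formula mentioned above) is given below; it builds on the arnoldi sketch given after Algorithm 6.10.8, and the least-squares step uses a dense solver, which is adequate only for illustration.

    import numpy as np

    def gmres_basic(A, b, x0, m):
        # One GMRES cycle: minimize ||beta*e1 - H~ y|| and set x_m = x0 + V_m y.
        r0 = b - A @ x0
        beta = np.linalg.norm(r0)
        V, H = arnoldi(A, r0 / beta, m)         # uses the sketch above
        e1 = np.zeros(H.shape[0]); e1[0] = beta
        y, *_ = np.linalg.lstsq(H, e1, rcond=None)
        return x0 + V[:, :H.shape[1]] @ y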
Solving a Shifted System: An important observation is that the Arnoldi basis V_m is
invariant under a diagonal shift of A: if we were to use A - σI instead of A in the Arnoldi process,
we would obtain the same sequence v1, ..., vm. This is because the Krylov subspace K_m is the same for A
and A - σI, provided the initial vector v1 is the same. Note that from (6.10.24) we have
    (A - σI)V_m = V_m(H_m - σI) + h_{m+1,m} v_{m+1} e_m^T,
which means that if we run the Arnoldi process on the matrix A - σI, we obtain the same
matrix V_m, but the matrix H_m has its diagonal shifted by σ.
This idea has been exploited in the context of ordinary differential equation methods by Gear
and Saad (1983).
Solving shifted systems arises in many applications, such as the computation of the frequency
response matrix of a control system (see Section 6.4.7).

6.11 Review and Summary

For easy reference, we now state the most important results discussed in this chapter.

1. Numerical Methods for Arbitrary Linear System Problems. Two types of methods,
direct and iterative, have been discussed.

(a) Direct Methods: The Gaussian elimination and QR factorization methods.

- Gaussian elimination without row interchanges, when it exists, gives a factorization of A:
    A = LU.
The system Ax = b is then solved by first solving the lower triangular system Ly = b, followed
by solving the upper triangular system Ux = y.
The method requires n^3/3 flops. It is unstable for arbitrary matrices and is not
recommended for practical use unless the matrix A is symmetric positive definite.
The growth factor can be arbitrarily large for an arbitrary matrix.
- Gaussian elimination with partial pivoting gives a factorization of A:
    MA = U.
Once this factorization is available, Ax = b can be solved by solving the upper triangular system
Ux = b', where b' = Mb.
The process requires n^3/3 flops and O(n^2) comparisons. In theory there are some risks involved,
but in practice this is a stable algorithm. It is the most widely used practical algorithm
for solving a dense linear system.
- Gaussian elimination with complete pivoting gives
    MAQ = U.
Once this factorization is available, Ax = b can be solved by first solving
    Uy = b',   where y = Q^T x, b' = Mb,
and then recovering x from
    x = Qy.
The process requires n^3/3 flops and O(n^3) comparisons. Thus it is more expensive than Gaussian
elimination with partial pivoting, but it is more stable (the growth factor ρ in this case is
bounded by a slowly growing function of n, whereas the growth factor ρ of Gaussian
elimination with partial pivoting can be as large as 2^{n-1}).
- The orthogonal triangularization methods are based on the QR factorization of A:
    A = QR.
Once this factorization is available, Ax = b can be solved by solving the upper triangular system
Rx = b', where b' = Q^T b.
One can use either the Householder method or the Givens method to achieve this factorization.
The Householder method is more economical than the Givens method (2n^3/3 flops versus 4n^3/3
flops). Both methods have guaranteed stability.
(2) Iterative Methods: The Jacobi, Gauss-Seidel, and SOR methods have been discussed.
A generic formulation of an iterative method is:
    x^(k+1) = B x^(k) + d.
Different methods differ in the way B and d are chosen. Writing A = L + D + U, we have:
- For the Jacobi method:
    B = B_J = -D^{-1}(L + U),
    d = b_J = D^{-1} b.
- For the Gauss-Seidel method:
    B = B_GS = -(D + L)^{-1} U,
    d = b_GS = (D + L)^{-1} b.
- For the SOR method:
    B = B_SOR = (D + ωL)^{-1}[(1 - ω)D - ωU],
    d = b_SOR = ω(D + ωL)^{-1} b
(ω is the relaxation parameter).
- The iteration
    x^(k+1) = B x^(k) + d
converges for any arbitrary choice of the initial approximation x^(1) if and only if the spectral
radius of B is less than 1 (Theorem 6.10.1).
A sufficient condition for convergence is ||B|| < 1 (Theorem 6.10.1).
- Using this sufficient condition, it has been shown that both the Jacobi and the Gauss-Seidel
methods converge when A is a row diagonally dominant matrix (Corollaries 6.10.1 and 6.10.2 of
Theorem 6.10.1).
- The Gauss-Seidel method also converges when A is symmetric positive definite (Theorem
6.10.2).
- For the SOR iteration to converge for any arbitrary choice of the initial approximation, the
relaxation parameter ω has to lie in (0, 2) (Theorem 6.10.4).
- If the matrix A is symmetric positive definite, then the SOR iteration is guaranteed to
converge for any arbitrary choice of ω in the interval (0, 2) (Theorem 6.10.5).
- For a consistently ordered and 2-cyclic matrix A with nonzero diagonal entries, the optimal
choice of ω, denoted by ω_opt, is given by
    ω_opt = 2 / (1 + sqrt(1 - ρ(B_J)^2)),
assuming that the eigenvalues of B_J are real and ρ(B_J) < 1. Here ρ(.) denotes the spectral
radius (Theorem 6.10.6).

2. Special Systems: Symmetric positive definite, diagonally dominant, Hessenberg, and
tridiagonal systems have been discussed.

(a) Symmetric positive definite systems. Two methods, Gaussian elimination without
pivoting and the Cholesky factorization, are described.
- Gaussian elimination without pivoting gives a factorization of A:
    A = L D L^T.
The system Ax = b is then solved by first solving the lower triangular system LDy = b,
followed by solving the upper triangular system L^T x = y.
The method requires n^3/3 flops and does not require any square root evaluations. It is stable
(ρ ≤ 1).
- The Cholesky factorization algorithm computes a factorization of A in the form A = H H^T,
where H is lower triangular with positive diagonal entries. Once this factorization is available, the
system Ax = b is solved by first solving the lower triangular system Hy = b, where y = H^T x,
followed by solving the upper triangular system H^T x = y. The method requires n^3/6 flops and
n square root evaluations. It is stable.

(b) Diagonally dominant systems. Gaussian elimination with partial pivoting is stable
(ρ ≤ 2).

(c) Hessenberg systems. Gaussian elimination with partial pivoting requires only O(n^2)
flops to solve an n x n Hessenberg system. It is stable (ρ ≤ n).

(d) Tridiagonal systems. Gaussian elimination with partial pivoting requires only O(n) flops.
It is stable (ρ ≤ 2).

3. Inverse, Determinant, and Leading Principal Minors. The inverse and the determinant
of a matrix A can be readily computed once a factorization of A is available.

(a) Inverses.
- If Gaussian elimination without pivoting is used, we have A = LU; then
    A^{-1} = U^{-1} L^{-1}.
- If Gaussian elimination with partial pivoting is used, we have MA = U; then
    A^{-1} = (M^{-1}U)^{-1} = U^{-1} M.
- If Gaussian elimination with complete pivoting is used, we have MAQ = U; then
    A^{-1} = (M^{-1} U Q^T)^{-1} = Q U^{-1} M.
- If an orthogonal factorization A = QR is used, then
    A^{-1} = (QR)^{-1} = R^{-1} Q^T.
Note that most problems involving inverses can be recast so that the inverse does
not have to be computed explicitly.
Furthermore, there are matrices (such as triangular matrices) whose inverses are easily computed.
- The inverse of a matrix B which differs from a matrix A only by a rank-one perturbation can
be readily computed, once the inverse of A has been found, by using the Sherman-Morrison
formula: let B = A - uv^T; then
    B^{-1} = A^{-1} + β(A^{-1} u v^T A^{-1}),   where β = 1/(1 - v^T A^{-1} u).
(A short computational sketch is given below.)
There is a generalization of this formula, known as the Woodbury formula.
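A minimal Python/NumPy sketch of the Sherman-Morrison update (the function name is our own choice):

    import numpy as np

    def sherman_morrison(A_inv, u, v):
        # Inverse of B = A - u v^T, given A^{-1}.
        beta = 1.0 / (1.0 - v @ A_inv @ u)
        return A_inv + beta * np.outer(A_inv @ u, v @ A_inv)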

(b) Determinant. The determinant is rarely required in practical applications.
However, the determinant of A can be computed immediately once a factorization of A has
been obtained.
- If we use Gaussian elimination with partial pivoting, giving MA = U, then
    det(A) = (-1)^r a11 a22^(1) ... ann^(n-1),
where r is the number of row interchanges made during the elimination process, and
a11, a22^(1), ..., ann^(n-1) are the pivot entries, which appear as the diagonal entries of U.
Similarly, the other factorizations can readily be used to find the determinant.

(c) Leading Principal Minors. The Givens triangularization method has been described.
4. The Condition Number and Accuracy of Solution. In the linear system problem Ax = b,
the input data are A and b. There may exist impurities either in A or in b, or in both.
We have presented perturbation analyses for all three cases. The results are contained in
Theorems 6.6.1, 6.6.2, and 6.6.3; Theorem 6.6.3 is the most general.
In all three cases, it turns out that
    Cond(A) = ||A|| ||A^{-1}||
is the deciding factor. If this number is large, then a small perturbation in the input data may
cause a large relative error in the computed solution. In this case the system is called an
ill-conditioned system; otherwise it is well-conditioned. A matrix A having a large condition
number is called an ill-conditioned matrix.
Some important properties of the condition number of a matrix have been listed. Some well-
known ill-conditioned matrices are the Hilbert matrix, the Pie matrix, the Wilkinson bidiagonal
matrix, etc.
The condition number, of course, has a noticeable effect on the accuracy of the solution.
A computed solution can be considered accurate only when the product of
Cond(A) and the relative residual is small (Theorem 6.7.1). Thus, a small relative
residual alone does not guarantee the accuracy of the solution.
A frequently asked question is: how large does Cond(A) have to be before the system
Ax = b is considered ill-conditioned?
The answer depends upon the accuracy of the input data and the level of tolerance of the error
in the solution.
In general, if the data are approximately accurate and Cond(A) = 10^s, then there
are about (t - s) significant digits of accuracy in the solution if it is computed in t-digit
arithmetic.
- Computing the condition number from its definition is clearly expensive; it involves finding
the norm of the inverse of A, and finding the inverse of A is about three times as expensive as
solving the linear system itself.
- Two condition number estimators, the LINPACK condition number estimator and
Hager's norm-1 condition number estimator, have been described.
- There are symptoms exhibited during Gaussian elimination with pivoting, such as a
small pivot, a large computed solution, or a large residual, that merely indicate that
a system might be ill-conditioned; they are not sure tests.
- When component-wise perturbations are known, Skeel's condition number can be useful,
especially when the norms of the columns of the inverse matrix vary widely.

5. Iterative Refinement. Once a solution has been computed, an inexpensive way to refine the
solution iteratively, known as the iterative refinement procedure, has been described (Section 6.8).
Iterative refinement is a very useful technique.

6. The Conjugate Gradient and GMRES Methods. The conjugate gradient method, when
used with an appropriate preconditioner, is one of the most widely used methods for solving a large
and sparse symmetric positive definite linear system.
Only one type of preconditioner, namely the incomplete Cholesky factorization (ICF), has been
described in this book. The basic idea of ICF is to compute the Cholesky factorization of A for the
nonzero entries of A only, leaving the zero entries as zeros.
We have described only the basic Arnoldi and Arnoldi-based GMRES methods. In practice, the
modified Gram-Schmidt process needs to be used in the implementation of the Arnoldi method, and
GMRES needs to be used with a proper preconditioner. The conjugate gradient method is direct in
theory, but iterative in practice. It is extremely slow when A is ill-conditioned, and a preconditioner
is needed to accelerate the convergence.

6.12 Some Suggestions for Further Reading
The books on numerical methods in the engineering literature routinely discuss how various engineering
applications give rise to linear systems problems. We have used the following two in our discussions
and found them useful:
1. Numerical Methods for Engineers, by Steven C. Chapra and Raymond P. Canale
(second edition), McGraw-Hill, Inc., New York, 1988.
2. Advanced Engineering Mathematics, by Peter V. O'Neil (third edition), Wadsworth
Publishing Company, Belmont, California, 1991.
Direct methods (such as Gaussian elimination, QR factorization, etc.) for linear systems and
related problems, discussions on perturbation analysis and conditioning of linear systems problems,
iterative refinement, etc. can be found in any standard numerical linear algebra text (in
particular, the books by Golub and Van Loan, MC, and by Stewart, IMC, are highly recommended).
Most numerical analysis texts contain some discussion, but none of the existing books
provides a thorough and in-depth treatment. For discussion on solutions of linear systems with
special matrices such as diagonally dominant, Hessenberg, positive definite, etc., see Wilkinson (AEP,
pp. 218-220).
For proofs and analyses of backward error analyses of various algorithms, AEP by Wilkinson is
the most authentic book. Several recent papers by Nick Higham (1986, 1987) on condition-number
estimators are interesting to read.
For iterative methods, two (by now) almost classical books on the subject:
1. Matrix Iterative Analysis, by Richard Varga, Prentice Hall, Englewood Cliffs, New
Jersey, 1962, and
2. Iterative Solution of Large Linear Systems, by David Young, Academic Press, New
York, 1971,
are a must.
Another important book in the area is Applied Iterative Methods, by L. A. Hageman and
D. M. Young, Academic Press, New York, 1981. The most recent book in the area is Templates
for the Solution of Linear Systems: Building Blocks for Iterative Methods, by
Richard Barrett, Mike Berry, Tony Chan, James Demmel, June Donato, Jack Dongarra, Victor
Eijkhout, Roldan Pozo, Chuck Romine, and Henk van der Vorst, SIAM, 1994. The book incorporates
state-of-the-art computational methods for solving large and sparse nonsymmetric systems.
The books Introduction to Parallel and Vector Solutions of Linear Systems and
Numerical Analysis: A Second Course, by James Ortega, also contain very clear expositions
on the convergence of iterative methods.
The conjugate gradient method, originally developed by M. R. Hestenes and E. Stiefel (1952),
has received considerable attention in the context of solving large and sparse positive definite
linear systems. A considerable amount of work has been done by numerical linear algebraists and
researchers in applications areas (such as optimization).
The following books contain some in-depth discussion:
1. Introduction to Linear and Nonlinear Programming, by David G. Luenberger,
Addison-Wesley, New York, 1973.
2. Introduction to Parallel and Vector Solutions of Linear Systems, by James
Ortega.
An excellent survey paper on the subject by Golub and O'Leary (1989) is also recommended
for further reading. See also another survey paper by Axelsson (1985).
The interesting survey paper by Young, Jea, and Mai (1988), in the book Linear Algebra in
Signals, Systems, and Control, edited by B. N. Datta, et al., SIAM, 1988, is also worth reading.
The development of conjugate gradient type methods for nonsymmetric and symmetric
indefinite linear systems is an active area of research.
A nice discussion on "scaling" appears in Forsythe and Moler (CSLAS, Chapter 11). See also
a paper by Skeel (1979) for the relationship between scaling and stability of Gaussian elimination.
A recent paper of Chandrasekaran and Ipsen (1994) describes the sensitivity of individual
components of the solution vector when the data are subject to perturbations.
To learn more about the Arnoldi method and the Arnoldi-based GMRES method, see the papers
of Saad and his coauthors (Saad (1981), Saad (1982), Saad and Schultz (1986)). Walker (1988) has
proposed an implementation of the GMRES method using Householder matrices.
Exercises on Chapter 6
Use MATLAB Wherever Needed and Appropriate
PROBLEMS ON SECTION 6.3
1. An engineer requires 5000, 5500, and 6000 cubic yards of sand, cement, and gravel for a building
project. He buys his material from three stores. The distribution of each material in these
stores is given as follows:
Store Sand Cement Gravel
% % %
1 60 20 20
2 40 40 20
3 20 30 50
How many cubic yards of each material must the engineer take from each store to meet his
needs?
2. If the input to reactor 1 in the "reactor" problem of Section 6.3.2 is decreased 10%, what is
the percent change in the concentrations in the other reactors?
3. Consider the following circuit diagram:
[Circuit with six nodes, numbered 1 through 6; node 1 is held at V1 = 200 V and node 6 at
V6 = 50 V, and the nodes are connected by resistors of 5, 10, 15, and 20 ohms.]
Set up a linear system to determine the current between nodes.


4. Using the difference equation (6.3.35), set up a linear system for the heat distribution at the
interior points of a heated plate whose boundary temperatures are held constant:
[Figure: a 3 x 3 grid of interior points (i, j), i, j = 1, 2, 3, with boundary temperatures
50 C on the top edge, 100 C on the left edge, 75 C on the right edge, and 0 C on the bottom edge.]

5. Derive the linear system for the finite difference approximation of the elliptic equation
    d^2T/dx^2 + d^2T/dy^2 = f(x, y).
The domain is the unit square, Δx = 0.25, and the boundary conditions are given by
    T(x, 0) = 1 - x,   T(1, y) = y,   T(0, y) = 1,   T(x, 1) = 1.
6. For the previous problem, if
    f(x, y) = -π^2 sin(πx) sin(πy),
then the analytic solution of the elliptic equation
    d^2T/dx^2 + d^2T/dy^2 = f(x, y),
with the same boundary conditions as in problem #5, is given by
    T(x, y) = 1 - x + xy + (1/2) sin(πx) sin(πy)
(Celia and Gray, Numerical Methods for Differential Equations, Prentice Hall, Inc.,
NJ, 1992, pp. 105-106).
(a) Use the finite difference scheme of Section 6.3.4 to approximate the values of T at the
interior points with Δx = Δy = 1/n, n = 4, 8, 16.
(b) Compare the values obtained in (a) with the exact solution.
(c) Write down the linear system arising from the finite element method for the solution of the
two-point boundary value problem: -2u'' + 3u = x^2, 0 ≤ x ≤ 1, u(0) = u(1) = 0, using
the same basis functions φ_j(x) as in the book and a uniform grid.
PROBLEMS ON SECTION 6.4
7. Solve the linear system Ax = b, where b is a vector with each component equal to 1 and A is
each matrix from problem #18 of Chapter 5, using
(a) Gaussian elimination without pivoting,
(b) Gaussian elimination with partial and complete pivoting,
(c) the QR factorization.
8. (a) Solve each of the systems of problem #7 using partial pivoting but without explicit
factorization (Section 6.4.4).
(b) Compute the residual vector in each case.
9. Solve
    [ 0.00001  1  1 ] [ x1 ]   [ 2.0001 ]
    [ 1        1  1 ] [ x2 ] = [ 3      ]
    [ 1        2  3 ] [ x3 ]   [ 3      ]
using Gaussian elimination without and with partial pivoting and compare the answers.
10. Consider the m linear systems
    A x_i = b_i,   i = 1, 2, ..., m.
(a) Develop an algorithm to solve the above systems using Gaussian elimination with complete
pivoting. Your algorithm should take advantage of the fact that all m systems have
the same system matrix A.
(b) Determine the flop-count of your algorithm.
(c) Apply your algorithm to the special case where b_i is the ith column of the identity matrix,
i = 1, ..., n.
(d) Use the algorithm in (c) to show that the inverse of an n x n matrix A can be computed
using Gaussian elimination with complete pivoting in about (4/3)n^3 flops.
(e) Apply the algorithm in (c) to compute the inverse of
    A = [ 1    1/2  1/3 ]
        [ 1/2  1/3  1/4 ]
        [ 1/3  1/4  1/5 ].
11. Consider the system Ax = b, where both A and b are complex. Show how the system can be
solved using real arithmetic only. Compare the flop-count in this case with that needed to
solve the system with Gaussian elimination using complex arithmetic.
12. (i) Compute the Cholesky factorization of
    A = [ 1  1      1     ]
        [ 1  1.001  1.001 ]
        [ 1  1.001  2     ]
using
(a) Gaussian elimination without pivoting,
(b) the Cholesky algorithm.
(ii) In part (a) verify that
    max |a_ij^(k)| ≤ max |a_ij^(k-1)|,   k = 1, 2,
and that A^(1) and A^(2) are also positive definite.
(iii) What is the growth factor?
13. Using the results of problem #12, solve the system Ax = b, where
    b = (3, 3.0020, 4.0010)^T
and A is the same as in problem #12.
14. (a) Show that
    A = [  4  -1  -1   0 ]
        [ -1   4   0  -1 ]
        [ -1   0   4  -1 ]
        [  0  -1  -1   4 ]
is positive definite, with and without finding the Cholesky factorization.
(b) Solve the system Ax = b, where b = (2, 2, 2, 2)^T.
15. (a) Develop an algorithm to solve a tridiagonal system using Gaussian elimination with
partial pivoting.
(b) Show that the growth factor in this case is bounded by two (Hint: max |a_ij^(1)| ≤ 2 max |a_ij|).
16. (a) Show that Gaussian elimination applied to a column diagonally dominant matrix
preserves diagonal dominance at each step of reduction; that is, if A = (aij) is column
diagonally dominant, then so is A^(k) = (a_ij^(k)), k = 1, 2, ..., n - 1.
(b) Show that the growth factor of Gaussian elimination with partial pivoting for such a matrix
is bounded by 2. (Hint: max_k max_{i,j} |a_ij^(k)| ≤ 2 max_i |a_ii|.)
(c) Verify the statement of part (a) with the matrix of problem #14.
(d) Construct a 2 x 2 column diagonally dominant matrix whose growth factor with Gaussian
elimination without pivoting is larger than 1 but less than or equal to 2.
17. Solve the tridiagonal system
    [  2  -1   0   0 ] [ x1 ]   [ 1 ]
    [ -1   2  -1   0 ] [ x2 ] = [ 1 ]
    [  0  -1   2  -1 ] [ x3 ]   [ 1 ]
    [  0   0  -1   2 ] [ x4 ]   [ 1 ]
(a) using Gaussian elimination,
(b) by computing the LU factorization of A directly from A = LU.
18. Solve the diagonally dominant system
    [ 10   1   1   1 ] [ x1 ]   [ 13 ]
    [  1  10   1   1 ] [ x2 ] = [ 13 ]
    [ -1   0  10   1 ] [ x3 ]   [ 10 ]
    [ -1  -1  -1  10 ] [ x4 ]   [  7 ]
using Gaussian elimination without pivoting. Compute the growth factor.
19. (a) Develop ecient algorithms for triangularizing (i) an upper Hessenberg matrix (ii) a
lower Hessenberg matrix, using Gaussian elimination with partial pivoting.
(b) Show that when A is upper Hessenberg, Gaussian elimination with partial pivoting gives
ja(ijk)j  k + 1; if jaij j  1;
hence deduce that the growth factor in this case is bounded by n.
(c) Apply0 1your2 algorithms
1to0 solve
1 the0 systems:
1
2 3 x1 6
B
B 3 4 5 7C
C BB x CC BB 7 CC
B
i. B C
C BB 2 CC = BB CC
B
@ 0 0:1 2 3 C A B@ x3 CA B@ 8 CA
0 0 0:1 1 x4 9
0 1 0:0001 0 1 0 x 1 0 1:0001 1
1
B C B C B C
@ 0 2 0:0001 CA B@ x2 CA = B@ 2:0001 CA
ii. B
0 0 3 x3 3:0000

372
        iii. [ 0  0  0  1 ] [ x1 ]   [ 0 ]
             [ 1  0  0  2 ] [ x2 ] = [ 0 ]
             [ 0  1  0  3 ] [ x3 ]   [ 1 ]
             [ 0  0  1  4 ] [ x4 ]   [ 1 ]
(d) Compute the growth factor in each case.
    (e) Suppose the data in the above problems are accurate to four digits and you seek an accuracy
        of three digits in your solution. Identify which problems are ill-conditioned.
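
For problem 19(a), the following MATLAB sketch triangularizes an upper Hessenberg matrix by Gaussian elimination with partial pivoting; since only rows k and k+1 can compete for the pivot at step k, the reduction costs only O(n^2) flops. The function name hesstri is mine, and the right-hand side is carried along so that the systems of part (c) can be finished with back substitution.

    function [H, b] = hesstri(H, b)
    % Sketch: reduce an upper Hessenberg H to upper triangular form with
    % partial pivoting, updating the right-hand side b as well.
      n = size(H, 1);
      for k = 1:n-1
        if abs(H(k+1,k)) > abs(H(k,k))          % pivot: swap rows k and k+1 only
          H([k k+1], k:n) = H([k+1 k], k:n);
          b([k k+1])      = b([k+1 k]);
        end
        if H(k,k) ~= 0
          m          = H(k+1,k) / H(k,k);       % single multiplier per step
          H(k+1,k:n) = H(k+1,k:n) - m * H(k,k:n);
          b(k+1)     = b(k+1) - m * b(k);
        end
      end
    end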
20. (a) Find the QR factorization of

            A = [ 1      1
                  10^-5  0
                  0      10^-5 ].

    (b) Using the results of (a), solve Ax = b, where b = (2, 10^-5, 10^-5)^T.
21. Find the LU factorization of

        A = [ T  I  0
              I  T  I
              0  I  T ],   where   T = [  2  -1   0
                                         -1   2  -1
                                          0  -1   2 ].

    Use this factorization to solve Ax = b, where each element of b is 2.

PROBLEMS ON SECTION 6.5
22. Find the determinant and the inverse of each of the matrices of problem #18 of Chapter 5
using
(a) Gaussian elimination with partial and complete pivoting.
(b) the QR factorization.
23. (a) Let

            A = [ A11  A12
                  A21  A22 ].

        Assume that A11 and A22 are square and that A11 and A22 - A21 A11^{-1} A12 are nonsingular.
        Let

            B = [ B11  B12
                  B21  B22 ]

        be the inverse of A. Show that

            B22 = (A22 - A21 A11^{-1} A12)^{-1},
            B12 = -A11^{-1} A12 B22,
            B21 = -B22 A21 A11^{-1},
            B11 = A11^{-1} - B12 A21 A11^{-1}.
    (b) How many flops are needed to compute A^{-1} using the results of (a) if A11 and A22 are,
        respectively, m × m and p × p?
    (c) Use your results above to compute A^{-1}, where

            A = [  4   0  -1  -1
                   0   4  -1  -1
                  -1  -1   4   0
                  -1  -1   0   4 ].
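
As an illustration of the formulas in part (a) of problem 23, here is a small MATLAB sketch; the function name blockinv and the use of inv and backslash are illustrative choices, not the book's code, and the blocks are assumed to be passed separately with A11 and the Schur complement nonsingular.

    function B = blockinv(A11, A12, A21, A22)
    % Sketch: invert the 2 x 2 block matrix [A11 A12; A21 A22] via the
    % Schur complement S = A22 - A21*inv(A11)*A12 (assumed nonsingular).
      X   = A11 \ A12;                 % A11^{-1} A12
      S   = A22 - A21 * X;             % Schur complement of A11
      B22 = inv(S);
      B12 = -X * B22;
      Y   = A21 / A11;                 % A21 A11^{-1}
      B21 = -B22 * Y;
      B11 = inv(A11) - B12 * Y;
      B   = [B11, B12; B21, B22];      % assembled inverse of A
    end

Counting the operations in this sketch (solves with A11, one inversion of an m × m and one of a p × p matrix, and a few block multiplications) is one way to approach part (b).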
24. Let

        A = [ 1  2       1                 B = [ 0  2       1
              2  4.0001  2.0002     and          2  4.0001  2.0002
              1  2.0002  2.0004 ]                1  2.0002  2.0004 ].

    Write B in the form B = A - uv^T, then compute B^{-1} using the Sherman-Morrison formula,
    knowing that

        A^{-1} = 10^4 × [  4.0010  -2.0006   0.0003
                          -2.0006   1.0004  -0.0002
                           0.0003  -0.0002   0.0001 ].
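
A small MATLAB sketch of the Sherman-Morrison update used in problem 24 (the function name shermanmorrison is illustrative): if B = A - uv^T and A^{-1} is already known, then B^{-1} = A^{-1} + A^{-1} u v^T A^{-1} / (1 - v^T A^{-1} u), provided the denominator is nonzero.

    function Binv = shermanmorrison(Ainv, u, v)
    % Sketch: rank-one update of a known inverse, for B = A - u*v'.
      w     = Ainv * u;                  % A^{-1} u
      z     = (v' * Ainv)';              % (v^T A^{-1})^T
      denom = 1 - v' * w;                % must be nonzero for B to be nonsingular
      Binv  = Ainv + (w * z') / denom;   % A^{-1} + A^{-1} u v^T A^{-1} / (1 - v^T A^{-1} u)
    end

For problem 24, the vectors u and v can be read off from the entries in which B differs from A.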
25. Suppose you have already solved a linear system with A as the system matrix. Show how you
    can then solve the augmented system

        [ A    a ] [ x       ]   [ b       ]
        [ c^T  α ] [ x_{n+1} ] = [ b_{n+1} ],

    where A is nonsingular and n × n, α is a scalar, and a, b, and c are vectors, using the solution
    you have already obtained. Apply your result to the solution of

        [ 1  2  3  1 ]
        [ 4  5  6  1 ]  y = ( 6  15  3  1 )^T.
        [ 1  1  1  1 ]
        [ 0  0  1  2 ]
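
The following MATLAB sketch relates to problem 25. The names bordered, alpha, and bnp1 are mine, not the book's, and the backslash solves stand in for whatever factorization of A has already been computed; only one additional solve with A is needed.

    function [x, xnp1] = bordered(A, a, c, alpha, b, bnp1)
    % Sketch: solve [A a; c' alpha] [x; xnp1] = [b; bnp1] by reusing solves with A.
      xb   = A \ b;                          % already available from the solved system Ax = b
      y    = A \ a;                          % one extra solve with the same matrix A
      xnp1 = (bnp1 - c' * xb) / (alpha - c' * y);
      x    = xb - xnp1 * y;                  % eliminate the border variable
    end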
PROBLEMS ON SECTIONS 6.6 and 6.7
26. Consider the symmetric system Ax = b, where

        A = [  0.4445   0.4444  -0.2222
               0.4444   0.4445  -0.2222
              -0.2222  -0.2222   0.1112 ]

    and b = (0.6667, 0.6667, -0.3332)^T. The exact solution of the system is x = (1, 1, 1)^T.
    (a) Make a small perturbation δb in b, keeping A unchanged. Solve the system Ax' = b + δb.
        Compare x' with x. Compute Cond(A) and verify the appropriate inequality in the text.
    (b) Make a small perturbation δA in A such that ||δA|| ≤ 1/||A^{-1}||. Solve the system
        (A + δA)x' = b. Compare x' with x and verify the appropriate inequality in the text.
        (Hint: ||A^{-1}||_2 = 10^4.)
27. Prove the inequality

        ||δx|| / ||x + δx|| ≤ Cond(A) ||δA|| / ||A||,

    where Ax = b and (A + δA)(x + δx) = b. Verify the inequality for the system

        [ 1    1/2  1/3 ] [ x1 ]   [ 1 ]
        [ 1/2  1/3  1/4 ] [ x2 ] = [ 1 ]
        [ 1/3  1/4  1/5 ] [ x3 ]   [ 1 ]

    using

        δA = [ 0  0  0.00003
               0  0  0
               0  0  0       ].
28. (a) How are Cond(A) and Cond(A^{-1}) related?
    (b) Show that
        i.  Cond(A) ≥ 1,
        ii. Cond(A^T A) = (Cond(A))^2.
29. (a) Let O be an orthogonal matrix. Then show that Cond(O) with respect to the 2-norm is
one.
    (b) Show that Cond(A) with respect to the 2-norm is one if and only if A is a scalar
multiple of an orthogonal matrix.
30. Let U = (u_{ij}) be a nonsingular upper triangular matrix. Show that, with respect to the
    infinity norm,

        Cond(U) ≥ max(u_{ii}) / min(u_{ii}).

    Hence construct a simple example of an ill-conditioned non-diagonal symmetric positive
    definite matrix.
31. Let A = LDL^T be a symmetric positive definite matrix, and let D = diag(d_{ii}). Show that,
    with respect to the 2-norm,

        Cond(A) ≥ max(d_{ii}) / min(d_{ii}).

    Hence construct an example of an ill-conditioned non-diagonal symmetric positive definite
    matrix.
32. (a) Show that for any matrix A, Cond(A) with respect to the 2-norm is given by

            Cond(A) = σ_max / σ_min,

        where σ_max and σ_min are, respectively, the largest and the smallest singular values of A.
    (b) Use the above expression for Cond(A) to construct an example of an ill-conditioned
        matrix as follows: choose two non-diagonal orthogonal matrices U and V (in particular,
        they can be chosen as Householder matrices) and a diagonal matrix Σ with one or several
        small diagonal entries. Then the matrix

            A = U Σ V^T

        has the same condition number as Σ and is ill-conditioned.
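
A few MATLAB lines sketching the construction in part (b) of problem 32; the size n = 5, the random unit vectors, and the value 1e-8 are arbitrary illustrative choices.

    % Sketch: build A = U*Sigma*V' with Householder U, V and one tiny singular value.
    n = 5;
    u = randn(n,1);  u = u / norm(u);
    v = randn(n,1);  v = v / norm(v);
    U = eye(n) - 2*(u*u');               % Householder matrix: orthogonal and non-diagonal
    V = eye(n) - 2*(v*v');
    Sigma = diag([1 1 1 1 1e-8]);        % one very small singular value
    A = U * Sigma * V';
    disp(cond(A))                        % should be about 1e8, the same as cond(Sigma)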
33. (a) Construct your own example to show that a small residual does not necessarily guarantee
that the solution is accurate.
(b) Give a proof of Theorem 6.7.1 (Residual Theorem).
(c) Using the Residual Theorem prove that if an algorithm produces a small residual for
every well-conditioned matrix, it is weakly stable.
34. (a) For what values of a is the matrix

            A = [ 1  a
                  a  1 ]

        ill-conditioned?
    (b) Let a = 0.999. Solve the system Ax = (1, 1)^T using Gaussian elimination without
        pivoting.
    (c) What is the condition number of A?
PROBLEMS ON SECTION 6.9

35. Apply iterative refinement to the system of problem #9 using x^{(1)} = (0, 0, 0)^T. Estimate
    Cond(A) from x^{(1)} and x^{(2)} and compare it with the actual Cond_2(A) = 16.2933.

PROBLEMS ON SECTION 6.10
36. Consider the linear system of problem #14(a).
    (a) Why do both the Jacobi and Gauss-Seidel methods converge for this system with an
        arbitrary choice of the initial approximation?
    (b) Carry out five iterations of both methods with the same initial approximation
        x^{(1)} = (0, 0, 0, 0)^T and compare the rates of convergence.
37. Construct an example to show that the convergence of the Jacobi method does not necessarily
imply that the Gauss-Seidel method will converge.
38. Let the n × n matrix A be partitioned into the form

        A = [ A11  A12  ...  A1N
              A21  A22  ...  A2N
              ...  ...       ...
              AN1  AN2  ...  ANN ],

    where each diagonal block Aii is square and nonsingular. Consider the linear system Ax = b
    with A as above and x and b partitioned commensurately.
    (a) Write down the Block Jacobi, Block Gauss-Seidel, and Block SOR iterations for the
        linear system Ax = b. (Hint: Write A = L + D + U, where D = diag(A11, ..., ANN),
        and L and U are strictly block lower and upper triangular matrices.)
    (b) If A is symmetric positive definite, show that U = L^T and D is positive definite. In
        this case, from the corresponding results in the scalar case, prove that, with an arbitrary
        choice of the initial approximation, Block Gauss-Seidel always converges and Block SOR
        converges if and only if 0 < ω < 2.
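
In connection with problem 38(a), the following MATLAB sketch performs one Block Jacobi sweep. Purely for simplicity it assumes that all diagonal blocks have the same size nb (so n = N*nb); the names blockjacobi_sweep and nb are mine, not the book's.

    function xnew = blockjacobi_sweep(A, b, x, nb)
    % Sketch: one sweep of Block Jacobi, A_ii * xnew_i = b_i - sum_{j ~= i} A_ij * x_j.
      n    = length(b);
      N    = n / nb;                                    % number of equal-sized diagonal blocks
      xnew = zeros(n, 1);
      for i = 1:N
        I       = (i-1)*nb+1 : i*nb;                    % indices of the i-th block
        r       = b(I) - A(I,:)*x + A(I,I)*x(I);        % b_i - sum over j ~= i of A_ij x_j
        xnew(I) = A(I,I) \ r;                           % solve with the diagonal block A_ii
      end
    end

Updating x block by block as soon as each block is computed gives the Block Gauss-Seidel sweep, and combining that update with a relaxation parameter ω gives Block SOR.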
39. Consider the block system arising in the solution of the discrete Poisson equation
    u_xx + u_yy = f:

        [  T  -I           ]
        [ -I   T  -I       ]
        [      .   .   .   ]
        [          -I   T  ]
    where

        T = [  4  -1           ]
            [ -1   4  -1       ]
            [      .   .   .   ]
            [          -1   4  ].

    Show that the Block Jacobi iteration in this case is

        T x_i^{(k+1)} = x_{i+1}^{(k)} + x_{i-1}^{(k)} + b_i,   i = 1, ..., N.
Write down the Block Gauss-Seidel and Block SOR iterations for this system.
40. (a) Prove that √λ_1 ||x||_2 ≤ ||x||_A ≤ √λ_n ||x||_2, where A is a symmetric positive definite
        matrix with eigenvalues 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_n.
    (b) Using the result in (a), prove that ||x - x_k||_2 ≤ 2 √κ α^k ||x - x_0||_2 for the conjugate
        gradient method (here κ = λ_n/λ_1 = Cond_2(A) and α is the rate in the A-norm error
        bound of the text).
41. Show that the Jacobi method converges for a 2 × 2 symmetric positive definite system.
42. For the system of problem #39, compute ρ(B_J), ρ(B_GS), and ω_opt for N = 50, 100, 1000.
    Compare the rate of convergence of the SOR iteration using the optimal value ω_opt in each
    case with that of Gauss-Seidel, without actually performing the iterations.
43. Consider the 25 × 25 block diagonal system

        [ A  0  ...  0 ]         [ 1 ]
        [ 0  A  ...  0 ]         [ 1 ]
        [ .  .   .   . ]   x  =  [ . ]
        [ 0  0  ...  A ]         [ 1 ]

    where A is the 5 × 5 tridiagonal matrix

        A = [  2  -1           ]
            [ -1   2  -1       ]
            [      .   .   .   ]
            [          -1   2  ].

    Compute ρ(B_J) and ρ(B_GS) and find how they are related. Solve the system using 5 iterations
    of Gauss-Seidel and of SOR with the optimal value of ω. Compare the rates of convergence.
44. Prove that the choice

        α = p^T (Ax - b) / p^T A p

    minimizes the quadratic function

        φ(α) = Φ(x - αp) = (1/2)(x - αp)^T A (x - αp) - b^T (x - αp).
45. Show that the eigenvectors of A are the direction vectors.
46. (a) Apply the Incomplete Cholesky Factorization algorithm to an unreduced tridiagonal
        matrix T and show that the result is the usual Cholesky factorization of T. Verify this
        with the 5 × 5 matrix

            T = [ 4  1           ]
                [ 1  4  1        ]
                [    .  .  .     ]
                [          1  4  ].

    (b) Apply the SOR iteration to the matrix T in (a) with ω = 1.5, using x^{(0)} = (0, 0, 0, 0, 0)^T,
        and make a table of the results of the iterations.
47. Let p_0, p_1, ..., p_{n-1} be the direction vectors generated by the basic conjugate gradient
    algorithm, and let r_k = b - Ax_k, k = 0, 1, ..., n-1. Prove that
    (a) r_k ∈ span(p_0, ..., p_k), k = 0, 1, 2, ..., n-1;
    (b) span(p_0, ..., p_i) = span(r_0, Ar_0, ..., A^i r_0), i = 0, 1, ..., n-1 (read the proof in
        Ortega, IPVL, pp. 271-273, and then reproduce it in your own words);
    (c) r_0, ..., r_{n-1} are mutually orthogonal.
48. (Multisplitting) Consider the iteration

        x^{(k+1)} = B x^{(k)} + d,

    where

        B = Σ_{i=1}^{k} D_i B_i^{-1} C_i,        d = ( Σ_{i=1}^{k} D_i B_i^{-1} ) b,

    and

        A = B_i - C_i,  i = 1, ..., k;        Σ_{i=1}^{k} D_i = I   (D_i ≥ 0).
    Develop the Jacobi, Gauss-Seidel, and SOR methods based on this multisplitting of A. This
    type of multisplitting has been considered by O'Leary and White (1985) and by Neumann
    and Plemmons (1987).
49. Apply the Jacobi, Gauss-Seidel, and SOR (with optimal relaxation factor) methods to the
    system in the example following Theorem 6.10.6 (Example 6.10.5) and verify the statement
    made there about the number of iterations required by the different methods.
50. Give a proof of Theorem 6.10.8.
51. Read the proof of Theorem 6.10.9 from Ortega, IPVL, p. 277, and then reproduce the proof
    in your own words.
MATLAB AND MATCOM PROGRAMS AND PROBLEMS ON CHAPTER 6

You will need the programs lugsel, inlu, inparpiv, incompiv, givqr, compiv,
invuptr, iterref, jacobi, gaused, sucov, nichol from MATCOM.
1. (a) Write a MATLAB program called forelm, based on Algorithm 6.4.1:

           [y] = forelm(L, b)

       to solve a nonsingular lower triangular system Ly = b using forward elimination.
   (b) Write a MATLAB program called backsub, based on Algorithm 3.1.3:

           [x] = backsub(U, b)

       to solve a nonsingular upper triangular system Ux = b.

   Use randomly generated test matrices as well as matrices L and U created with one or more
   small diagonal entries.
   (Note: forelm and backsub are also in MATCOM or in the Appendix.)
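
For reference, here is a possible sketch of the two routines of problem 1; MATCOM's own versions may differ in detail, and in practice each function would go in its own M-file.

    function y = forelm(L, b)
    % Forward elimination: solve L*y = b for lower triangular, nonsingular L.
      n = length(b);  y = zeros(n, 1);
      for i = 1:n
        y(i) = (b(i) - L(i, 1:i-1) * y(1:i-1)) / L(i, i);
      end
    end

    function x = backsub(U, b)
    % Back substitution: solve U*x = b for upper triangular, nonsingular U.
      n = length(b);  x = zeros(n, 1);
      for i = n:-1:1
        x(i) = (b(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
      end
    end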
Test Matrices for Problems 2 Through 8

For problems #2 through 8 use the following matrices as test matrices. When the problem
is a linear system problem Ax = b, create a vector b such that the solution vector x has
all components equal to 1.
1. Hilbert matrix of order 10
2. Pie matrix of order 10
3. Hankel matrix of order 10
4. Randomly generated matrix of order 10
5. A = [ 0.00001  1        1
         0        0.00001  1
         0        0        0.00001 ].
6. Vandermonde matrix of order 10.
2. (a) Using lugsel from MATCOM, backsub, and forelm, write the MATLAB program

           [x] = linsyswp(A, b)

       to solve Ax = b using Gaussian elimination without pivoting.
       Compute the growth factor, elapsed time, and flop-count for each system.
   (b) Run the program inlu from MATCOM and multiply the result by the vector b to obtain
       the solution vector x = A^{-1} b. Compute the flop-count.
   (c) Compare the computed solutions and flop-counts of (a) and (b).
3. (a) Using parpiv and elmul from Chapter 5, and backsub, write a MATLAB program

           [x] = linsyspp(A, b)

       to solve Ax = b using Gaussian elimination with partial pivoting.
       Compute the growth factor, elapsed time, and flop-count for each system.
   (b) Run the program inparpiv from MATCOM and multiply the result by b to compute
       the solution vector x = A^{-1} b. Compute the flop-count for each system.
   (c) Compare the computed solutions, flop-counts, and elapsed times of (a) and (b).
4. (a) Using compiv from MATCOM, elmul from Chapter 5, and backsub, write the MATLAB
       program

           [x] = linsyscp(A, b)

       to solve Ax = b using Gaussian elimination with complete pivoting.
       Compute the flop-count, elapsed time, and growth factor for each system.
   (b) Run the program incompiv from MATCOM and multiply the result by b to compute
       the solution vector x = A^{-1} b. Compute the flop-count for each system.
   (c) Compare the computed solutions, flop-counts, and elapsed times of (a) and (b).
5. (a) Implement the algorithm in Section 6.4.4 to solve Ax = b without explicit factorization,
       using partial pivoting:

           [x] = linsyswf(A, b).

   (b) Compute A^{-1} using this approach.
6. (a) Using housqr from Chapter 5 (or the MATLAB function qr) and backsub, write the
       MATLAB program

           [x] = linsysqrh(A, b)

       to solve Ax = b using QR factorization with Householder matrices. Compute the
       flop-count for each system.
   (b) Repeat (a) with givqr in place of housqr; that is, write a MATLAB program called
       linsysqrg to solve Ax = b using the Givens method for QR factorization.
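
A minimal sketch of the QR-based solver asked for in problem 6(a), using MATLAB's built-in qr in place of housqr; backsub is the routine from problem 1.

    function x = linsysqrh(A, b)
    % Sketch: solve Ax = b via A = Q*R, then R*x = Q'*b by back substitution.
      [Q, R] = qr(A);              % Q orthogonal, R upper triangular
      x = backsub(R, Q' * b);      % triangular solve
    end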
7. (The purpose of this exercise is to make a comparative study with respect to
   accuracy, elapsed time, flop-count, and growth factor of the different methods for
   solving Ax = b.)
   Tabulate the results of problems 2 through 6 in the following form, making one table for
   each matrix; x̂ stands for the computed solution.
                                      TABLE 6.1
            (Comparison of Different Methods for the Linear System Problems)

    Method      | Computed      | Rel. error        | Residual      | Growth  | Elapsed
                | solution x̂    | ||x - x̂||/||x||   | ||b - Ax̂||    | factor  | time
    ------------+---------------+-------------------+---------------+---------+--------
    linsyswp    |               |                   |               |         |
    linsyspp    |               |                   |               |         |
    linsyscp    |               |                   |               |         |
    linsyswf    |               |                   |               |         |
    linsysqrh   |               |                   |               |         |
    linsysqrg   |               |                   |               |         |
    A^{-1} b    |               |                   |               |         |
8. (a) Write a MATLAB program to find the inverse of A using housqr (or the MATLAB
       function qr) and invuptr (from MATCOM):

           [A] = invqrh(A).

       Compute the flop-count for each matrix.
   (b) Repeat (a) using givqr and invuptr:

           [A] = invqrg(A).

       Compute the flop-count for each matrix.
   (c) Run inlu, inparpiv, and incompiv from MATCOM with each of the data matrices. Make
       a table for each matrix A to compare the different methods for finding the inverse with
       respect to accuracy and flop-count. Denote the computed inverse by Â, and obtain A^{-1}
       by using the MATLAB command inv(A).
                                      TABLE 6.2
            (Comparison of Different Methods for Computing the Inverse)

    Method      | Relative error ||A^{-1} - Â|| / ||A^{-1}||  | Flop-count
    ------------+---------------------------------------------+-----------
    inlu        |                                             |
    inparpiv    |                                             |
    incompiv    |                                             |
    invqrh      |                                             |
    invqrg      |                                             |
9. (a) Modify the program elmlu to find the Cholesky factorization of a symmetric positive
       definite matrix A using Gaussian elimination without pivoting:

           [H] = cholgauss(A).

       Create a 15 × 15 lower triangular matrix L with positive diagonal entries, taking some of
       the diagonal entries small enough to be very close to zero; multiply it by L^T and take
       A = LL^T as your test matrix. Compute H and the flop-count.
   (b) Run the MATLAB program chol on the same matrix as in (a) and denote the transpose
       of the result by Ĥ. Compute the flop-count.
   (c) Compare the results of (a) and (b). (Note that chol(A) gives an upper triangular matrix
       H such that A = H^T H.)
10. Run the program linsyswp with the diagonally dominant, symmetric tridiagonal, and block
    tridiagonal matrices encountered in Section 6.3.5, choosing the right-hand side vector b so
    that the solution vector x is known a priori. Compare the exact solution x with the computed
    solution x̂.
    (The purpose of this exercise is to verify that, to solve a symmetric positive definite
    system, no pivoting is needed to ensure stability in Gaussian elimination.)
11. (a) Write a MATLAB program to implement Algorithm 6.7.2, which finds an upper bound
        for the 2-norm of the inverse of an upper triangular matrix:

            [CEBOUND] = norminvtr(U).

        Test your program by randomly creating a 10 × 10 upper triangular matrix with several
        small diagonal entries, and then compare your result with that obtained by running the
        MATLAB command norm(inv(U)).
    (b) Now compute the condition number of U as norm(U) * norminvtr(U). Compare your
        result with that obtained by running the MATLAB command cond(U). Verify that

            Cond(U) ≥ max(u_{ii}) / min(u_{ii}).

        (Use the same test matrix U as in part (a).)
12. (a) (The purpose of this exercise is to compare different approaches for estimating
        the condition number of a matrix.) Compute and/or estimate the condition
        number of each of the following matrices A of order 10: Hilbert, Pie, randomly
        generated, Vandermonde, and Hankel, using the following approaches:
        i.   Find the QR factorization of A with column pivoting: Q^T AP = R.
             Estimate the 2-norm of the inverse of R by running norminvtr on R.
             Now compute norm(R) * norminvtr(R). Compute the flop-count.
        ii.  Compute norm(A) * norm(inv(A)). Compute the flop-count.
        iii. Compute cond(A). Compute the flop-count.
    (b) Now compare the results and flop-counts.
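
A short MATLAB sketch of the three estimates in part (a) for one of the test matrices; qr with three outputs performs QR with column pivoting (A*P = Q*R), and norminvtr is the routine written in problem 11.

    A = hilb(10);                         % one of the test matrices of order 10
    [Q, R, P] = qr(A);                    % QR factorization with column pivoting
    est1 = norm(R) * norminvtr(R);        % estimate via the triangular factor R
    est2 = norm(A) * norm(inv(A));        % condition number via the explicit inverse
    est3 = cond(A);                       % MATLAB's built-in 2-norm condition number
    disp([est1 est2 est3])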
13. (a) Run the iterative refinement program iterref from MATCOM on each of the 15 × 15
        systems Hilbert, Pie, Vandermonde, randomly generated, and Hankel, using the
        solution obtained from the program linsyspp (problem #3) as the initial approximation
        x^{(0)}.
    (b) Estimate the condition number of each of the matrices above using the iterative
        refinement procedure.
    (c) Compare your results on condition number estimation with those obtained in problem
        #12.
14. Run the programs jacobi, gaused, and sucov from MATCOM on the 6 × 6 matrix A of
    Example 6.10.5 with the same starting vector x^{(0)} = (0, 0, ..., 0)^T. Find how many iterations
    each method takes to converge. Verify the statement of the example that it takes five
    iterations for SOR to converge with ω_opt, compared to twelve iterations for Jacobi.
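
In case MATCOM's jacobi is not at hand, the following is a plain point-Jacobi iteration sketch that can be used for the comparisons in problems 14 and 15; the function name, the residual-based stopping test, and the parameters maxit and tol are my choices, not the book's.

    function [x, k] = jacobi_sketch(A, b, x, maxit, tol)
    % Sketch: point Jacobi iteration  x_{k+1} = D^{-1} (b - (A - D) x_k).
      D = diag(diag(A));                        % diagonal part of A
      for k = 1:maxit
        x = D \ (b - A*x + D*x);                % one Jacobi sweep
        if norm(b - A*x) <= tol * norm(b)       % relative residual test
          break
        end
      end
    end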
15. Run the programs jacobi, gaused, and sucov from MATCOM on Example 6.10.3 and verify
the statement of the example.
16. Run the program nichol from MATCOM, implementing the "No-Fill Incomplete Cholesky
    Factorization", on the tridiagonal symmetric positive definite matrix T of order 20 arising
    in the discretization of Poisson's equation. Compare your result with that obtained by
    running chol(T) on T.
17. Write a MATLAB program called arnoldi, based on the Arnoldi method (Algorithm 6.10.8),
    using the modified Gram-Schmidt algorithm (modified Gram-Schmidt has been implemented
    in the MATCOM program mdgrsch; see Chapter 7).
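
A sketch of the Arnoldi process with modified Gram-Schmidt orthogonalization, as a starting point for problem 17; the argument list and the handling of breakdown are my choices and may differ from Algorithm 6.10.8 in the text.

    function [Q, H] = arnoldi(A, q1, m)
    % Sketch: m steps of Arnoldi. Q has orthonormal columns spanning the Krylov
    % space of A and q1, and H is the (m+1) x m upper Hessenberg matrix with
    % A*Q(:,1:m) = Q*H.
      n = length(q1);
      Q = zeros(n, m+1);  H = zeros(m+1, m);
      Q(:,1) = q1 / norm(q1);
      for k = 1:m
        w = A * Q(:,k);                      % expand the Krylov space
        for j = 1:k                          % modified Gram-Schmidt
          H(j,k) = Q(:,j)' * w;
          w      = w - H(j,k) * Q(:,j);
        end
        H(k+1,k) = norm(w);
        if H(k+1,k) == 0, break, end         % breakdown: invariant subspace found
        Q(:,k+1) = w / H(k+1,k);
      end
    end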
18. Using Arnoldi and a suitable least-squares routine from Chapter 7, write a MATLAB program
    called gmres to implement the GMRES algorithm (Algorithm 6.10.9).